Reliability of Telephone Acquisition of the PROMIS Upper Extremity Computer Adaptive Test

Published: November 24, 2020. DOI: https://doi.org/10.1016/j.jhsa.2020.09.014

      Purpose

      Our primary purpose was to evaluate the reliability of telephone administration of the Patient-Reported Outcomes Measurement Information System (PROMIS) Upper Extremity (UE) Computer Adaptive Test (CAT) version 2.0 in a hand and upper extremity population, and secondarily to make comparisons with the abbreviated version of the Disabilities of the Arm, Shoulder, and Hand (QuickDASH).

      Methods

      Patients more than 1 year out from hand surgeries performed at a single tertiary institution were enrolled. Half of the patients completed telephone PROMIS UE CAT and QuickDASH surveys first, followed by computer-based surveys 1 to 10 days later, and the other half completed them in the reverse order. Telephone surveys were readministered 2 to 6 weeks later to evaluate test-retest reliability. Concordance correlation coefficients (CCCs) were used to assess agreement between telephone and computer-based scores, and intraclass correlation coefficients (ICCs) were used to assess test-retest reliability. The proportion of patients with discrepancies in follow-up scores that exceeded estimates of the minimal clinically important difference (MCID) was evaluated.

      Results

      For the 89 enrolled patients, the PROMIS UE CAT CCC was 0.82 (83% confidence interval [83% CI], 0.77–0.86; good), which was significantly lower than 0.92 (83% CI, 0.89–0.94; good to excellent) for the QuickDASH. The PROMIS UE CAT ICC did not differ significantly from the QuickDASH (0.85 and 0.91, respectively). Differences in telephone versus computer scores exceeded 5 points (MCID estimate) for the PROMIS UE CAT in 34% of patients versus 5% of patients exceeding 14 points (MCID estimate) for the QuickDASH.

      Conclusions

      Significantly better reliability was observed for the QuickDASH than the PROMIS UE CAT when comparing telephone with computer-based score acquisition. Over one-third of patients demonstrated a clinically relevant difference in scores between the telephone and the computer-administered tests. We conclude that the PROMIS UE CAT should only be administered through computer-based methods.

      Clinical relevance

      These findings suggest that differences in collection methods for the PROMIS UE CAT may systematically affect the scores obtained, which may erroneously influence the interpretation of postoperative scores for hand surgery patients.

      In today’s evolving health care landscape, there continues to be emphasis placed upon providing quality, high-value health care [Lee VS, Kawamoto K, Hess R, et al. Implementation of a value-driven outcomes program to identify high variability in clinical costs and outcomes and association with reduced cost and improved quality]. Recently, the Hand Surgery Quality Consortium identified patient-reported outcome measures (PROMs) as an important component in defining quality [Kamal RN; Hand Surgery Quality Consortium. Quality and value in an evolving health care landscape], and PROMs data are likely to influence reimbursement of health care services under proposed pay-for-performance payment models [Clough JD, McClellan M. Implementing MACRA: implications for physicians and for physician leadership]. In 2004, the National Institutes of Health initiated the Patient-Reported Outcomes Measurement Information System (PROMIS) with the goal of developing and validating PROMs to assess a diverse array of health domains [Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years; Brodke DJ, Saltzman CL, Brodke DS. PROMIS for orthopaedic outcomes measurement]. These instruments were designed to be administered to patients in digital format (desktop, laptop, or tablet computers) to allow for Computer Adaptive Testing (CAT). The CAT involves the administration of a subset of questions from a testing bank to a patient, while tailoring subsequent questions based upon responses to previous items [Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years; Cella D, Riley W, Stone A, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008].
      The PROMIS Upper Extremity (UE) CAT has recently received increasing attention in the hand and upper extremity literature. Specifically, PROMIS UE CAT scores have been demonstrated to correlate with traditional legacy scales including the Disabilities of the Arm, Shoulder, and Hand (DASH) and its abbreviated version, QuickDASH [Tyser AR, Beckmann J, Franklin JD, et al. Evaluation of the PROMIS physical function computer adaptive test in the upper extremity; Doring AC, Nota SP, Hageman MG, Ring DC. Measurement of upper extremity disability using the Patient-Reported Outcomes Measurement Information System; Beckmann JT, Hung M, Voss MW, Crum AB, Bounsanga J, Tyser AR. Evaluation of the Patient-Reported Outcomes Measurement Information System Upper Extremity Computer Adaptive Test], which were designed to be completed on a paper questionnaire form [Beaton DE, Wright JG, Katz JN; Upper Extremity Collaborative Group. Development of the QuickDASH: comparison of three item-reduction approaches; Gummesson C, Ward MM, Atroshi I. The shortened disabilities of the arm, shoulder and hand questionnaire (QuickDASH): validity and reliability based on responses within the full-length DASH].
      Situations exist in which verbal administration of follow-up PROMs would be advantageous. Previous studies have demonstrated greater rates of response and survey completeness using telephone-acquired PROMs, compared with other methods of administration [London DA, Stepan JG, Boyer MI, Calfee RP. Performance characteristics of the verbal QuickDASH; Schwartzenberger J, Presson A, Lyle A, O'Farrell A, Tyser AR. Remote collection of patient-reported outcomes following outpatient hand surgery: a randomized trial of telephone, mail, and e-mail]. Furthermore, outcome assessment for patients lacking computer access, those who are illiterate, or those who do not regularly use e-mail may be more successful by telephone. Requesting that patients return to the office for collection of mid- to long-term PROMs may also be impractical. PROMs are often obtained via telephone in clinical research, and conclusions about the long-term efficacy of various treatments may rest upon scores collected this way; it is therefore important to understand the implications of changing the administration format.
      Despite the advantages of telephone acquisition of PROMs, authors have cautioned against administering surveys in a format other than that for which the measure was originally designed and validated [Cella D, Hahn EA, Jensen SE, et al. Patient-Reported Outcomes in Performance Measurement]. As the performance of an instrument administered in such a way cannot be predicted without prior investigation [Cella D, Hahn EA, Jensen SE, et al. Patient-Reported Outcomes in Performance Measurement], our main purpose was to assess the reliability of the PROMIS UE CAT administered by telephone compared with administration using the intended computer-based format. Secondarily, we aimed to compare the reliability of the PROMIS UE CAT with that of the QuickDASH.

      Materials and Methods

      This institutional review board–approved study was performed at a single tertiary academic center. Patients were identified who were more than 1 year out from surgery performed by 1 of 5 fellowship-trained orthopedic hand surgeons (J.T.W. and J.W.C.). We excluded non–English-speaking patients, minors (<18 years), and patients undergoing nonsurgical procedures. In addition, we excluded patients indicating on a screening questionnaire that they had developed an upper extremity injury, or had undergone additional upper extremity treatment, since the listed date of their index surgery. This exclusion, together with the requirement of a minimum of 1 year of postoperative follow-up, was implemented to exclude patients who could be expected to demonstrate changes in PROMs for upper extremity function or disability. In this specific sample of patients meeting the selection criteria, we assumed that any changes in PROM scores between telephone surveys would be due to variation in how patients rate their function (which would reflect instrument reliability), rather than actual variation in their function.
      Postoperative patients were identified by surgical Current Procedural Terminology codes performed by the 5 surgeons. Surgical procedures, postoperative diagnoses, and baseline demographic data were verified and recorded through manual chart review. Eligible patients were mailed opt-out consent forms. After excluding those who opted out, patients were informed that study participation would involve 3 contacts for survey completion over a several-week period. E-mail addresses were confirmed, and all participants were asked the screening question regarding new upper extremity conditions or treatments. Patients were informed that their responses would be helpful in understanding how to measure their surgical outcome without specific discussion of the planned comparisons.
      Remaining patients were electronically randomized to 1 of 2 study arms in blocks of 4: (1) arm 1 received the telephone survey first, followed by the computer-based survey, and (2) arm 2 received the computer-based survey first, followed by the telephone survey. Specifically, arm 1 patients were administered the PROMIS UE CAT v2.0 followed by the QuickDASH during this first contact, which was by telephone. Computer-based administration of both instruments was solicited using e-mail with a personalized link within a day following the telephone survey. The link was valid 1 to 10 days from completion of the initial telephone survey. Reminder e-mails were sent daily until completion, or until the window of eligibility had lapsed. Arm 2 patients were e-mailed a personal link for completion of the PROMIS UE CAT and QuickDASH, with reminder e-mails every other day until completed. Once this computer-based administration was completed, attempts to contact each participant by telephone were made the next day to complete the same surveys verbally and up to 10 days after. Attempts to contact participants were made daily until completed or until outside of the eligibility window.
      All participants in both study arms were then contacted by telephone between 2 and 6 weeks following the completion of their second set of surveys, as previously recommended for test-retest analysis [Streiner D, Norman G, Cairney J. Health Measurement Scales: A Practical Guide to Their Development and Use. 5th ed]. Verbal administration of both the PROMIS UE CAT and QuickDASH was performed during this third contact.
      Basic descriptive statistics were calculated. Continuous variables were compared using the Student t test, and categorical variables were compared using either chi-square or Fisher exact tests. Response rate was calculated as the number of patients who responded to the PROMIS UE CAT and QuickDASH for both telephone and computer-based administrations, divided by the total number of patients for whom contact was attempted.
      Differences in PROMIS UE CAT and QuickDASH scores between the first 2 contacts were calculated as the first telephone contact score minus the computer-based score. The concordance correlation coefficient (CCC) was used to assess agreement between the first verbal score and the computer-based score. Ninety-five percent confidence intervals (CIs) for the CCC were calculated for all participants, and separately for arms 1 and 2. For comparisons of scores between the 2 instruments and between the 2 arms, we used 83% CIs to assess statistical significance because, in this situation, overlapping 95% CIs do not necessarily indicate a lack of statistical significance [Austin PC, Hux JE. A brief note on overlapping confidence intervals]. The reason is that there are 2 sets of CIs indicating variability among 2 different sample means, and thus narrower 83% (or 83.7%) CIs are needed to equate overlap of CIs with a lack of statistical significance at the .05 level [Austin PC, Hux JE. A brief note on overlapping confidence intervals; Goldstein H, Healy MJR. The graphical presentation of a collection of means; Payton ME, Greenstone MH, Schenker N. Overlapping confidence intervals or standard error intervals: what do they mean in terms of statistical significance?].
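The agreement statistic and the interval-level argument above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the sample scores are hypothetical, the CCC follows Lin's standard definition, and the interval-level calculation assumes two independent estimates with equal standard errors, under which the level comes out near 83%.

```python
from math import sqrt
from statistics import NormalDist, fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired scores."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Population (divide-by-n) moments, as in Lin's definition.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean shift in the denominator penalizes systematic bias.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def overlap_ci_level(alpha=0.05):
    """CI level at which 'intervals do not overlap' matches a two-sided
    test at `alpha`. Assumes two independent estimates with equal SEs."""
    z_test = NormalDist().inv_cdf(1 - alpha / 2)
    # Non-overlap means |d| > 2 * z_ci * SE; the test rejects when
    # |d| > z_test * sqrt(2) * SE, so z_ci = z_test / sqrt(2).
    z_ci = z_test / sqrt(2)
    return 2 * NormalDist().cdf(z_ci) - 1


# Hypothetical paired scores, for illustration only.
phone = [51.0, 44.0, 38.5, 60.2, 47.3]
computer = [48.0, 45.5, 36.0, 57.0, 44.1]
print(round(lins_ccc(phone, computer), 3))
print(round(overlap_ci_level(), 3))
```

Unlike a Pearson correlation, the CCC drops when one mode of administration scores systematically higher than the other, which is why it suits a telephone versus computer comparison.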
      Pearson correlation coefficients were calculated to evaluate the relationship between patient age and difference in telephone versus computer-based scores in which P value calculation utilized the Fisher z transformation. Univariate linear regression was used to model the change in the differences between scores at the first 2 contacts (first telephone score minus computer-based score) for both instruments. Predictor variables included study arm, age, sex, race, anesthetic type, body mass index, surgical location (ambulatory surgery center versus main operating room), diagnosis, provider, American Society of Anesthesiologists score, and marital status. Owing to the potential for unreliable estimates, we did not provide coefficients for predictor variable categories with fewer than 5 observations.
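As an illustration of the correlation test described above, the following sketch computes a Pearson coefficient and a two-sided P value via the Fisher z transformation. The function names and the age and score-difference values are hypothetical, and the sample size of 89 used in the last line is an assumption taken from the number of enrolled patients.

```python
from math import atanh, sqrt
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Two-sided P value for H0: rho = 0 using the Fisher z transformation.
    atanh(r) is approximately normal with standard error 1 / sqrt(n - 3)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical ages and telephone-minus-computer score differences.
ages = [34, 45, 52, 61, 70, 28]
diffs = [1.0, 2.5, 2.0, 4.5, 6.0, -0.5]
r = pearson_r(ages, diffs)
print(round(r, 2), round(fisher_z_p_value(r, len(ages)), 3))

# A coefficient of 0.38 with an assumed n of 89 falls below the .05 threshold.
print(fisher_z_p_value(0.38, 89) < 0.05)
```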
      Test-retest reliability was used to assess agreement between the 2 telephone scores using intraclass correlation coefficients (ICCs). Differences between scores collected at the 2 telephone contacts were calculated as the second telephone score minus the first telephone score for both instruments. Estimates and 95% CIs were calculated for the entire cohort, and separately for arms 1 and 2. For statistical comparison of ICCs between the 2 instruments and between the 2 arms, 83% CIs were calculated.
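A test-retest ICC of the kind described above can be sketched as follows. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), is an assumption here, and the function name and example scores are hypothetical.

```python
from statistics import fmean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` holds one row per patient and one column per occasion.
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-patient mean square
    msc = ss_cols / (k - 1)               # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical first and second telephone scores for five patients.
telephone_scores = [[51.0, 49.5], [44.0, 46.0], [38.5, 37.0],
                    [60.2, 58.8], [47.3, 50.1]]
print(round(icc_2_1(telephone_scores), 3))
```

The absolute-agreement form counts a consistent shift between the two contacts as disagreement, which matches the stated goal of assessing agreement between the 2 telephone scores.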
      Bland-Altman plots were created to illustrate differences between telephone and computer-based scores for both instruments. Bland-Altman plots were also created to demonstrate differences between the 2 telephone scores. The number and percentage of patients with differences in scores exceeding estimates of the minimal clinically important difference (MCID) were calculated for each plot, as has been done previously in orthopedic outcomes studies [Cvetanovich GL, Gowd AK, Liu JN, et al. Establishing clinically significant outcome after arthroscopic rotator cuff repair; Nwachukwu BU, Fields K, Chang B, Nawabi DH, Kelly BT, Ranawat AS. Preoperative outcome scores are predictive of achieving the minimal clinically important difference after arthroscopic treatment of femoroacetabular impingement; Franchignoni F, Vercelli S, Giordano A, Sartorio F, Bravini E, Ferriero G. Minimal clinically important difference of the disabilities of the arm, shoulder and hand outcome measure (DASH) and its shortened version (QuickDASH); Liu JN, Gowd AK, Redondo ML, et al. Establishing clinically significant outcomes after meniscal allograft transplantation]. For the QuickDASH, an MCID estimate of 14 was used [Sorensen AA, Howard D, Tan WH, Ketchersid J, Calfee RP. Minimal clinically important differences of 3 patient-rated outcomes instruments]. Because the MCID for the PROMIS UE CAT v2.0 has not yet been determined, an estimate of 5 was used based upon the one-half SD method (the expected SD is 10 in a normative population) [Northwestern University. Healthmeasures, Interpret Scores: PROMIS].
      One-sided paired t tests and associated 95% CIs were used to determine whether absolute differences in telephone versus computer-based scores, or the pair of telephone scores, were different within the tolerance of the MCID estimate for the study population.
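The half-SD MCID estimate, the Bland-Altman summary, and the proportion of patients exceeding the MCID can be sketched as below. The function names and scores are hypothetical, and the 95% limits of agreement use the conventional mean difference plus or minus 1.96 SD, which the text does not spell out.

```python
from statistics import fmean, stdev


def half_sd_mcid(reference_sd):
    """One-half SD estimate of the MCID (for example, 0.5 * 10 = 5)."""
    return 0.5 * reference_sd


def bland_altman(first, second):
    """Bias and conventional 95% limits of agreement (first minus second)."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def proportion_exceeding(first, second, mcid):
    """Share of patients whose absolute score difference exceeds the MCID."""
    diffs = [abs(a - b) for a, b in zip(first, second)]
    return sum(d > mcid for d in diffs) / len(diffs)


# Hypothetical telephone and computer-based T-scores.
telephone = [50.0, 40.0, 60.0, 45.0]
computer = [44.0, 41.0, 60.0, 51.0]
mcid = half_sd_mcid(10)
print(bland_altman(telephone, computer))
print(proportion_exceeding(telephone, computer, mcid))
```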
      The CCCs and ICCs were interpreted as follows in terms of strength: poor reliability (<0.50), moderate reliability (0.50–0.75), good reliability (0.75–0.90), and excellent reliability (>0.90) [Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research; Portney L, Watkins M. Foundations of Clinical Research: Applications to Practice. 3rd ed]. Absolute values of Pearson correlation r values were interpreted as follows in terms of strength of the association: negligible (r < 0.3), low (0.3 ≤ r < 0.5), moderate (0.5 ≤ r < 0.7), high (0.7 ≤ r < 0.9), and very high (≥0.9) [Mukaka MM. Statistics corner: a guide to appropriate use of correlation coefficient in medical research; Hinkle D, Wiersma W, Jurs S. Applied Statistics for the Behavioral Sciences].
      A significance level of α equal to 0.05 was used throughout.
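The interpretation bands above can be written as two small lookup functions. This is an illustrative sketch with hypothetical function names; the stated ranges share the 0.75 boundary, so assigning exactly 0.75 to "good" is an assumption.

```python
def reliability_band(coefficient):
    """Strength label for a CCC or ICC using the bands stated in the text."""
    if coefficient < 0.50:
        return "poor"
    if coefficient < 0.75:
        return "moderate"
    if coefficient <= 0.90:  # assumption: 0.75 itself is labeled "good"
        return "good"
    return "excellent"


def correlation_band(r):
    """Strength label for a Pearson r, applied to its absolute value."""
    magnitude = abs(r)
    if magnitude < 0.3:
        return "negligible"
    if magnitude < 0.5:
        return "low"
    if magnitude < 0.7:
        return "moderate"
    if magnitude < 0.9:
        return "high"
    return "very high"


print(reliability_band(0.82), reliability_band(0.92))
print(correlation_band(0.38), correlation_band(-0.09))
```

Applying the same function to both ends of a confidence interval gives range labels such as "good to excellent".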
      An a priori sample size estimate was performed based upon a previously published ICC between the written and the verbal QuickDASH (0.91; 95% CI, 0.84–0.95) [London DA, Stepan JG, Boyer MI, Calfee RP. Performance characteristics of the verbal QuickDASH]. We conservatively calculated the sample size needed to differentiate the lower bound of this 95% CI (0.84) from the upper limit of good-to-fair reproducibility (0.74) [Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies]. A sample size of 86 subjects, with 2 observations per subject, achieves 80% power to detect a difference between an ICC of 0.84 and an ICC of 0.74 using an F test and significance level of .05 [Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies].

      Results

      Out of 465 reviewed patients, 112 were excluded owing to injury or additional upper extremity treatment or surgery that occurred after their index surgery. A total of 15 patients opted out (10 in arm 1, 5 in arm 2). An additional 8 patients were excluded owing to recent additional surgery (3 in each arm) or injury (2 in arm 2) after the initial survey was completed. Among the remaining 330 patients, only 89 (27%) completed all 3 sets of questionnaires. Of the 89 included patients, the mean age was 50.6 ± 15.9 years and 54 (59%) were female. Baseline characteristics for patients in arm 1 and arm 2 are provided in Table 1.
      Table 1. Baseline Patient Characteristics
      Variable | Type/Level | Summary | Arm 1 (n = 43) | Arm 2 (n = 46)
      Age | Mean (SD) | 50.6 (15.9) | 48.2 (16.4) | 52.9 (15.4)
      Anesthesia type | General | 62 (68%) | 29 (64%) | 33 (72%)
       | Local | 1 (1%) | 0 (0%) | 1 (2%)
       | MAC | 22 (24%) | 13 (29%) | 9 (20%)
       | Regional | 6 (7%) | 3 (7%) | 3 (7%)
      ASA score | Healthy | 24 (26%) | 13 (29%) | 11 (24%)
       | Mild systemic disease | 54 (59%) | 25 (56%) | 29 (63%)
       | Severe systemic disease | 13 (14%) | 7 (16%) | 6 (13%)
      BMI | Mean (SD) | 27.8 (6) | 27.9 (7) | 27.7 (6)
      Diagnosis category | Osteoarthritis | 20 (22%) | 7 (16%) | 13 (28%)
       | Nerve compression | 24 (26%) | 13 (29%) | 11 (24%)
       | Tendinitis | 8 (9%) | 3 (7%) | 5 (11%)
       | Fracture | 12 (13%) | 6 (13%) | 6 (13%)
       | Dupuytren | 3 (3%) | 0 (0%) | 3 (7%)
       | Ligament/tendon injury | 10 (11%) | 6 (13%) | 4 (9%)
       | Mass/cyst | 9 (10%) | 3 (7%) | 6 (13%)
       | Nerve laceration | 5 (5%) | 2 (4%) | 3 (7%)
       | Ulnar impaction | 3 (3%) | 2 (4%) | 1 (2%)
       | Infection | 3 (3%) | 3 (7%) | 0 (0%)
       | Other | 7 (8%) | 4 (9%) | 3 (7%)
      Insurance type | Commercial | 69 (76%) | 33 (73%) | 36 (78%)
       | Medicaid | 2 (2%) | 1 (2%) | 1 (2%)
       | Medicare | 18 (20%) | 11 (24%) | 7 (15%)
       | Workers’ compensation | 2 (2%) | 0 (0%) | 2 (4%)
      Marital status | Single | 20 (22%) | 15 (33%) | 5 (11%)
       | Married | 61 (67%) | 25 (56%) | 36 (78%)
       | Divorce | 5 (5%) | 2 (4%) | 3 (7%)
       | Other/unknown | 5 (5%) | 3 (7%) | 2 (4%)
      Race | White/Caucasian | 84 (92%) | 39 (87%) | 45 (98%)
       | Other | 7 (8%) | 6 (13%) | 1 (2%)
      Sex | Female | 54 (59%) | 26 (58%) | 28 (61%)
       | Male | 37 (41%) | 19 (42%) | 18 (39%)
      Surgeon | Provider A | 8 (9%) | 6 (13%) | 2 (4%)
       | Provider B | 12 (13%) | 4 (9%) | 8 (17%)
       | Provider C | 21 (23%) | 10 (22%) | 11 (24%)
       | Provider D | 37 (41%) | 19 (42%) | 18 (39%)
       | Provider E | 13 (14%) | 6 (13%) | 7 (15%)
      Surgery location | Ambulatory surgery center | 84 (92%) | 42 (93%) | 42 (91%)
       | Main operating room | 7 (8%) | 3 (7%) | 4 (9%)
      ASA, American Society of Anesthesiologists; BMI, body mass index; MAC, monitored anesthesia care.
      Missing values: BMI = 1.
      Comparison of telephone and computer-based PROMIS UE CAT scores revealed a CCC of 0.82 (83% CI, 0.77–0.86) in the good reliability range, which was significantly lower than a QuickDASH CCC of 0.92 (83% CI, 0.89–0.94) in the good to excellent range (Table 2). Given no overlap between respective 83% CIs, the CCC for the QuickDASH should be considered significantly greater than that for the PROMIS UE CAT at a .05 significance level. Bland-Altman plots illustrating differences between telephone and computer-based scores are provided in Figure 1 (PROMIS UE CAT) and Figure 2 (QuickDASH). For the PROMIS UE CAT, 34% of patients had a score difference that exceeded an MCID estimate of 5, and for the QuickDASH 5% had score differences beyond an MCID estimate of 14. One-sided paired t test results demonstrated that the overall difference between telephone and computer-based scores was below MCID estimates for both the PROMIS UE CAT (mean difference, 4.17; 95% CI, 0–4.99; P = .049) and the QuickDASH (mean difference, 4.76; 95% CI, 0–5.63; P < .05).
      Table 2. Comparison of Telephone and Computer-Based Scores
      Instrument | Type/Level | Arm 1 (n = 43) | Arm 2 (n = 46) | Summary
      PROMIS UE CAT | Phone, mean (SD) | 51 (10.3) | 48.9 (10.4) | 49.9 (10.4)
       | E-mail, mean (SD) | 47.2 (10.9) | 47.4 (10.3) | 47.3 (10.6)
       | Difference, mean (SD) | 3.7 (5.5) | 1.5 (5.8) | 2.6 (5.8)
       | CCC estimate | 0.81 | 0.83 | 0.82
       | CCC 95% CI | 0.70–0.89 | 0.72–0.90 | 0.75–0.88
       | CCC 83% CI (a) | 0.74–0.87 | 0.76–0.89 | 0.77–0.86
      QuickDASH | Phone, mean (SD) | 15.8 (19.8) | 13.2 (13.2) | 14.5 (16.7)
       | E-mail, mean (SD) | 16.4 (19.7) | 14.6 (14.1) | 15.5 (17)
       | Difference, mean (SD) | –0.4 (7.2) | –1.4 (6.5) | –0.9 (6.8)
       | CCC estimate | 0.93 | 0.88 | 0.92
       | CCC 95% CI | 0.88–0.96 | 0.80–0.93 | 0.88–0.94
       | CCC 83% CI (a) | 0.90–0.96 | 0.83–0.92 | 0.89–0.94
      Days elapsed | Mean (SD) | 2.6 (2.3) | 2.5 (1.9) | 2.5 (2.1)
       | 1–7 | 41 (91%) | 46 (100%) | 87 (96%)
       | 8–10 | 4 (9%) | 0 (0%) | 4 (4%)
      Response rate | n/total (%) | 45/171 (26) | 46/154 (30) | 91/325 (28)
      CCC, concordance correlation coefficient.
      Missing values: QuickDASH e-mail = 1; QuickDASH difference = 1.
      (a) Lack of overlap between 83% CIs indicates that the QuickDASH CCC is significantly greater than the PROMIS UE CAT CCC.
      Figure 1. PROMIS UE CAT Bland-Altman plot: telephone versus computer-based administration. The mean score for the x axis label represents the mean of the first telephone score and the computer-based score.
      Figure 2. QuickDASH Bland-Altman plot: telephone versus computer-based administration. The mean score for the x axis label represents the mean of the first telephone score and the computer-based score.
      In the PROMIS UE CAT univariate regression analysis, increasing age, American Society of Anesthesiologists class 2 and class 3, monitored anesthesia care, and Medicare insurance status were significantly associated with a greater score difference between telephone and computer-based modalities (P < .05 for all comparisons; Table 3). These factors were not significant for the QuickDASH (P = .30–.95). A diagnosis of a ligament/tendon injury was significantly associated with a greater score difference for the QuickDASH (P < .05). Differences in all other baseline patient characteristics under study were not significant (all P > .05).
      Table 3. Univariate Regression of Differences in Telephone and Computer-Based Scores
      Variable | Level | PROMIS UE CAT difference: coefficient (95% CI) | P value | QuickDASH difference: coefficient (95% CI) | P value
      Arm | Arm 1 | Reference | | Reference |
       | Arm 2 | –2.19 (–4.53 to 0.14) | .07 | –0.94 (–3.77 to 1.89) | .51
      Age | mean (SD) | 0.14 (0.07 to 0.21) | <.05 | –0.05 (–0.14 to 0.04) | .30
      Anesthesia type | General | Reference | | Reference |
       | Local | (a) | (a) | (a) | (a)
       | MAC | 3.45 (0.72 to 6.18) | <.05 | –0.41 (–3.75 to 2.92) | .81
       | Regional | –1.95 (–6.65 to 2.75) | .42 | 4.55 (–1.19 to 10.28) | .12
      ASA score | Healthy | Reference | | Reference |
       | Mild systemic disease | 2.86 (0.16 to 5.55) | <.05 | 0.73 (–2.59 to 4.05) | .67
       | Severe systemic disease | 4.73 (0.95 to 8.51) | <.05 | 0.15 (–4.50 to 4.80) | .95
      BMI | mean (SD) | 0.00 (–0.20 to 0.21) | .97 | 0.22 (–0.02 to 0.45) | .07
      Diagnosis category (b) | Osteoarthritis | –2.04 (–4.88 to 0.80) | .16 | 0.87 (–2.60 to 4.33) | .63
       | Nerve compression | 0.83 (–1.86 to 3.52) | .55 | –1.20 (–4.39 to 2.00) | .47
       | Tendinitis | 2.88 (–1.28 to 7.03) | .18 | 1.64 (–3.34 to 6.61) | .52
       | Fracture | 0.70 (–2.81 to 4.21) | .70 | –1.45 (–5.61 to 2.71) | .50
       | Ligament/tendon injury | 1.13 (–2.66 to 4.93) | .56 | –5.10 (–9.48 to –0.72) | <.05
       | Mass excision | –0.67 (–4.65 to 3.31) | .74 | 3.55 (–1.12 to 8.22) | .14
       | Nerve laceration | –2.60 (–7.80 to 2.59) | .33 | 1.94 (–4.24 to 8.12) | .54
       | Other | –1.67 (–6.12 to 2.78) | .46 | –3.23 (–8.48 to 2.03) | .23
      Insurance type | Commercial | Reference | | Reference |
       | Medicaid | (a) | (a) | (a) | (a)
       | Medicare | 4.35 (1.47 to 7.23) | <.05 | 0.85 (–2.70 to 4.40) | .64
       | Workers’ compensation | (a) | (a) | (a) | (a)
      Marital status | Single | Reference | | Reference |
       | Married | 1.77 (–1.12 to 4.66) | .23 | –1.65 (–5.10 to 1.80) | .35
       | Divorce | 3.21 (–2.40 to 8.82) | .27 | 3.64 (–3.04 to 10.31) | .29
       | Other/unknown | 5.25 (–0.36 to 10.86) | .07 | –0.45 (–7.13 to 6.22) | .89
      Race | White/Caucasian | Reference | | Reference |
       | Other | 1.52 (–2.94 to 5.97) | .51 | –1.46 (–6.75 to 3.82) | .59
      Sex | Female | Reference | | Reference |
       | Male | –2.22 (–4.59 to 0.16) | .07 | 1.67 (–1.19 to 4.53) | .26
      Surgeon | Provider A | Reference | | Reference |
       | Provider B | –2.66 (–7.83 to 2.51) | .32 | 0.38 (–5.84 to 6.60) | .91
       | Provider C | 1.00 (–3.71 to 5.70) | .68 | –1.02 (–6.72 to 4.68) | .73
       | Provider D | 0.26 (–4.16 to 4.68) | .91 | –0.69 (–6.00 to 4.62) | .80
       | Provider E | 0.20 (–4.89 to 5.29) | .94 | 0.74 (–5.38 to 6.87) | .81
      Surgery location | Ambulatory surgery center | Reference | | Reference |
       | Main operating room | –1.15 (–5.60 to 3.31) | .62 | –0.41 (–5.70 to 4.88) | .88
      P values <.05 are statistically significant.
      (a) Coefficient not provided owing to small counts (<5) in the given category.
      (b) Diagnosis categories with <5 counts were not included in this analysis.
      Test-retest results comparing the 2 telephone scores are presented in Table 4. The PROMIS UE CAT ICC was 0.85 (83% CI, 0.80–0.89) in the good to excellent range. The QuickDASH ICC was 0.91 (83% CI, 0.88–0.93), which was not significantly different, and also in the good to excellent range. Given overlap between 83% CIs, there was no significant difference between PROMIS UE CAT and QuickDASH ICCs at a .05 significance level. Bland-Altman plots illustrating differences between the first and second telephone scores are provided in Figure 3 (PROMIS UE CAT) and Figure 4 (QuickDASH). For the PROMIS UE CAT, 29% of patients had a score difference that exceeded an MCID estimate of 5, and for the QuickDASH 6% had score differences beyond an MCID estimate of 14. One-sided paired t test results demonstrated that the overall difference between the 2 telephone scores was below MCID estimates for both the PROMIS UE CAT (mean difference, 3.58; 95% CI, 0–4.36; P < .05) and the QuickDASH (mean difference, 4.89; 95% CI, 0–5.85; P < .05).
      Table 4. Test-Retest: Comparison of First and Second Telephone Scores
      Instrument | Type/Level | Arm 1 (n = 43) | Arm 2 (n = 43) | Summary
      PROMIS UE CAT | Phone, mean (SD) | 51.0 (10.3) | 48.9 (10.4) | 49.9 (10.4)
       | E-mail, mean (SD) | 48.4 (10.4) | 48.1 (10) | 48.3 (10.2)
       | Difference, mean (SD) | –2.4 (5.0) | –1.1 (5.7) | –1.7 (5.4)
       | ICC estimate | 0.86 | 0.84 | 0.85
       | ICC 95% CI | 0.72–0.93 | 0.73–0.91 | 0.77–0.90
       | ICC 83% CI (a) | 0.77–0.91 | 0.77–0.89 | 0.80–0.89
      QuickDASH | Phone, mean (SD) | 15.8 (19.8) | 13.2 (13.2) | 14.5 (16.7)
       | E-mail, mean (SD) | 16.9 (20.4) | 13.1 (12.9) | 15.0 (17.1)
       | Difference, mean (SD) | 0.7 (8.5) | 0.1 (5.8) | 0.4 (7.3)
       | ICC estimate | 0.91 | 0.90 | 0.91
       | ICC 95% CI | 0.84–0.95 | 0.83–0.95 | 0.87–0.94
       | ICC 83% CI (a) | 0.87–0.94 | 0.86–0.94 | 0.88–0.93
      Days between score collections | Mean (SD) | 20.4 (5.4) | 19.6 (6.1) | 20.0 (5.8)
      Response rate | n/total (%) | 43/171 (25%) | 43/154 (28%) | 86/325 (26%)
      ICC, intraclass correlation coefficient.
      Missing values: QuickDASH e-mail = 1, QuickDASH difference = 1.
      (a) Overlap between 83% CIs indicates that the QuickDASH ICC does not significantly differ from the PROMIS UE CAT ICC.
      Figure 3. PROMIS UE CAT Bland-Altman plot: telephone test-retest. The mean score for the x axis label represents the mean of the first and second telephone scores.
      Figure 4. QuickDASH Bland-Altman plot: telephone test-retest. The mean score for the x axis label represents the mean of the first and second telephone scores.
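As a rough sketch of the quantities behind a Bland-Altman plot and the MCID comparison, the following computes the mean difference (bias), the conventional 95% limits of agreement (bias ± 1.96 SD), and the share of patients whose absolute difference exceeds a threshold. The function names and the five score pairs are hypothetical and are not study data; the threshold of 5 is the PROMIS UE CAT MCID estimate used in the text.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    bias is the mean of (second - first); the limits are bias +/- 1.96 SD.
    """
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def share_exceeding(first, second, threshold):
    """Fraction of pairs whose absolute difference is strictly above threshold."""
    diffs = [abs(b - a) for a, b in zip(first, second)]
    return sum(d > threshold for d in diffs) / len(diffs)


# Hypothetical scores for five patients; NOT study data.
first_call = [50.0, 42.0, 55.0, 61.0, 47.0]
second_call = [48.0, 49.0, 54.0, 60.0, 41.0]

bias, lower, upper = bland_altman(first_call, second_call)
share = share_exceeding(first_call, second_call, 5)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
print(f"share beyond MCID estimate of 5: {share:.0%}")
```

A small bias can coexist with a large share of patients beyond the threshold, which is the pattern the test-retest results describe for the PROMIS UE CAT.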
      Pearson correlations between age and difference in scores are provided in Table 5. The only statistically significant finding was a low correlation between increasing age and score difference between the first telephone score and the computer-based score for the PROMIS UE CAT (P < .05). This association is illustrated as a scatterplot in Figure 5.
      Table 5. Correlations Between Age and Difference in Scores
                                  First Telephone Versus         First Versus Second Telephone
                                  Computer-Based Score           Score (Test-Retest)
      Comparison                  Pearson Coefficient  P Value   Pearson Coefficient  P Value
      PROMIS UE CAT versus age    0.38                 <.05*     –0.09                .39
      QuickDASH versus age        –0.11                .30       0.03                 .82
      * Statistically significant (P < .05).
      Figure 5. Scatterplot of differences in PROMIS UE CAT telephone and computer-based scores by age.
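The age analysis rests on the Pearson coefficient between each patient's age and score difference. A minimal sketch is given below, with hypothetical function names and made-up values; whether the study used signed or absolute differences is not stated here, so the example makes no claim about that choice. The t statistic shown is the standard one for testing a zero correlation and would be referred to a t distribution with n − 2 degrees of freedom.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical ages and score differences; NOT study data.
ages = [34, 45, 52, 61, 68, 73]
score_diffs = [1.0, 2.5, 2.0, 4.0, 3.5, 6.0]

r = pearson_r(ages, score_diffs)
print(f"r = {r:.2f}, t = {t_statistic(r, len(ages)):.2f}")
```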

      Discussion

      The main finding of this study was that, when telephone (verbal) administration was compared with computer-based administration, the PROMIS UE CAT had significantly lower reliability than the QuickDASH. Although the PROMIS UE CAT reliability was considered good, it is noteworthy that the difference between the telephone and the computer scores exceeded the PROMIS UE CAT MCID estimate in approximately one-third of patients (vs 5% for the QuickDASH). These findings illustrate the clinical relevance of the mode of survey administration.
      Although test-retest reliability did not significantly differ between instruments, the difference between the 2 telephone scores for the PROMIS UE CAT exceeded an MCID estimate in 29% of cases as opposed to only 6% for the QuickDASH. As a final concern, a weak but significant correlation was observed such that increasing age was associated with greater differences between telephone and computer-based PROMIS UE CAT scores—this association was absent for the QuickDASH.
      In light of these observations regarding the proportion of patients whose scores differed by more than an MCID estimate upon repeated surveys, it is noteworthy that applying the MCID concept to individual patients may be controversial. However, multiple studies across several orthopedic specialties have published analyses of this nature (Cvetanovich et al, 2019; Nwachukwu et al, 2017; Franchignoni et al, 2014; Liu et al, 2019).
      The optimal way to apply the MCID and the most appropriate way to report outcomes results are yet to be determined and have recently been the focus of increased research effort. Despite these findings, it should be noted that the paired differences between repeat survey scores were less than the chosen MCID estimate for the PROMIS UE CAT and QuickDASH comparisons of telephone with computer scores, and for comparisons between the 2 telephone scores. Because the MCID for the PROMIS UE CAT version 2.0 has yet to be elucidated (ie, the chosen estimate of 5 is hypothetical), it is possible that an estimate lower than the value of 5 used here may lead to significant differences for the PROMIS UE CAT. Mean differences between score acquisitions for the PROMIS UE CAT were in the range of 3.6 to 4.2 on the paired t test analysis; therefore, our conclusion that the overall difference in scores is lower than an MCID could be reversed should new estimates arise that fall within or below these values. Although scores from different versions of the PROMIS UE CAT are not interchangeable and cannot be directly compared (Northwestern University, Healthmeasures), one recent study provided MCID estimates for the PROMIS UE CAT v1.2 in the range of 4.2 to 8.0 in a carpal tunnel population (Bernstein et al, 2019).
      Although we could not identify a publication with PROMIS UE CAT v2.0 MCID values, one published abstract suggested that this value falls in the range of 3.0 to 4.1 (Kazmers et al, presented at the 74th Annual Meeting of the American Society for Surgery of the Hand, 2020). This range, which was obtained in a general hand and upper extremity population similar to that of the current study, is interestingly below the paired differences between computer and telephone scores. It is therefore possible that the differences between telephone and computer scores observed in the current study actually exceed that MCID estimate, which raises further concerns about telephone acquisition of the PROMIS UE CAT. In contrast, paired differences for the QuickDASH were in the range of 4.8 to 4.9, which is subjectively well below even the lowest of the MCID estimates in the literature (6.8–19; Franchignoni et al, 2014; Sorensen et al, 2013; Mintken et al, 2009; Polson et al, 2010; Kazmers et al, 2020).
      Although further investigation is warranted, we hypothesize that a negative trade-off for scale brevity for the PROMIS UE CAT could be slightly lower reliability. That is, if a patient completes a second survey and provides a slightly different response to even the same initial question, the computer adaptive algorithm may place that patient on a different track of subsequent questions, which may pertain to higher or lower overall function than the track of questioning for their initial survey. It should be noted that this is a hypothesis and formally commenting upon this is beyond the scope of the study results.
      Although our findings for the PROMIS UE CAT are unique, our QuickDASH findings may be compared with those of the prior literature. London et al (2014) used ICCs to compare QuickDASH scores obtained by telephone and by paper form 1 day apart. This yielded a value of 0.91 and a 95% CI in the good to excellent reliability range (0.84–0.94), which is subjectively similar to the value of 0.92 and the 95% CI spanning the good to excellent range in the current study. Similarly, the authors did not observe a correlation between patient age and difference in QuickDASH scores. Minor differences between the studies should be noted, however. The current study differs in its use of CCCs (although they are very similar to ICCs) and in comparing telephone scores with computer-based scores rather than with the written version of the QuickDASH. London et al observed good test-retest reliability, as opposed to the good to excellent reliability noted in the current study for the QuickDASH; the ICC for test-retest reliability observed in the current study was subjectively greater (0.91 vs 0.68). It is possible that our shorter time frame between the test and the retest administrations (mean, 20 days vs 5 months), or comparison with computer-based scores rather than those from paper forms, could contribute to these differences. Nonetheless, as in the current study, the authors concluded that these differences were not clinically relevant.
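For readers less familiar with the CCC mentioned above, a minimal sketch of Lin's concordance correlation coefficient follows. It uses population (1/n) moments, as in Lin's original definition; the function name and the example values are hypothetical, and this is not the software used in the study. Unlike the Pearson coefficient, the CCC is reduced by any systematic shift between the two sets of scores.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    ccc = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with 1/n moments.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical paired scores; NOT study data.
telephone = [1.0, 2.0, 3.0]
computer = [2.0, 3.0, 4.0]  # same ordering, shifted upward by 1 point

ccc = concordance_ccc(telephone, computer)
print(f"CCC = {ccc:.3f}")
```

In this toy example the two score sets are perfectly correlated, yet the constant 1-point shift lowers the CCC well below 1, which is the property that makes it a measure of agreement and not only of association.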
      Although the QuickDASH demonstrated superior performance over the PROMIS UE CAT in certain aspects of reliability, both telephone and computer-based acquisition methods may potentially be suitable for both instruments in clinical research. Frost et al (2007) recommended a minimum reliability threshold of 0.70 for an instrument to be used for clinical trial outcomes, and both instruments exceeded this threshold when comparing telephone with computer-based scores and for test-retest reliability. Furthermore, other studies have evaluated the reliability of select PROMIS instruments with favorable results, but we were unable to locate prior studies on the PROMIS UE CAT for comparison. Deyo et al (2016) observed good to excellent test-retest reliability for the 29-item PROMIS short form in patients with chronic musculoskeletal pain. Bjorner et al (2014) evaluated multiple modes of administration (interactive voice response, paper form, hand-held computer, or personal computer) for 8-item short forms derived from the PROMIS Physical Function, Fatigue, and Depression item banks. High levels of reliability (average ICC of 0.90) between different modes of administration were noted, with no clinically relevant effect attributed to the mode of administration within a score tolerance of ±2 points. Broderick et al (2013) collected several PROMIS instruments (Pain Intensity, Pain Interference, Physical Function, and Fatigue) in CAT and short-form formats for a large cohort of patients with osteoarthritis. Good to excellent test-retest performance was observed for both modes of administration, and less than 25% of patients had differences in CAT and short-form scores that exceeded MCID estimates. For a sample of patients with spinal cord or traumatic brain injury, Kisala et al (2019) observed no effect of mode of administration for multiple PROMIS fixed-length short forms (Physical Function, Fatigue, Pain Interference, Anger, Anxiety, Depression), which were administered by an interviewer (in-person form or telephone) or in a computer-based format. Lastly, Magnus et al (2016) observed similarity between telephone and computer-based PROMIS Depressive Symptoms, Fatigue, and Mobility short-form scores in a pediatric population.
      Our study limitations deserve mention. Although our findings should not be generalized to all PROMIS instruments or all study populations, we agree with the notion that nonstandard administration methods should be evaluated before clinical or research use (Cella et al, 2015). Our study sample age should be considered when interpreting the results because it is possible that the results could differ with a younger population more comfortable with technology. Our study did not investigate the reasons why the PROMIS UE CAT performs less favorably than the QuickDASH. It is unclear whether a higher response rate would affect our results. It remains possible that the advantage of fewer questions for PROMIS UE CAT completion could decrease precision or reliability, although future work is needed to elucidate and potentially improve upon this. In addition, it remains unclear what level of reliability is acceptable, which makes it more challenging to interpret the study results with regard to reliability between different modes of survey administration. Our finding that patients with diagnoses related to tendon/ligament pathology demonstrated a greater score difference between telephone and computer administration of the QuickDASH on univariate analysis likely represents alpha error (that is, erroneously concluding that there is a difference when one does not actually exist), although it may instead reflect the possibility that these patients were older. However, this study is limited in that it was not designed to specifically evaluate reliability in outcome collection between diagnostic categories. Lastly, it is unclear whether our response rate of 26% to 28% had an impact on the study findings because it is possible that responders and nonresponders differed in terms of treatment or patient factors.
      In conclusion, the QuickDASH and PROMIS UE CAT demonstrated similar test-retest reliability for telephone administration, with both in the good to excellent range. However, the QuickDASH demonstrated significantly better reliability than the PROMIS UE CAT when comparing telephone and computer-based administration. In addition, we observed a high rate of clinically important differences with the PROMIS UE CAT in both the verbal test-retest and the telephone versus computer portions of this study. Based on these findings, we recommend that the PROMIS UE CAT only be administered in a computer-based format.

      Acknowledgments

      This investigation was supported by the University of Utah Study Design and Biostatistics Center, with funding in part from the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant UL1TR002538.

      References

        • Lee V.S.
        • Kawamoto K.
        • Hess R.
        • et al.
        Implementation of a value-driven outcomes program to identify high variability in clinical costs and outcomes and association with reduced cost and improved quality.
        JAMA. 2016; 316: 1061-1072
        • Kamal R.N.
        • Hand Surgery Quality Consortium
        Quality and value in an evolving health care landscape.
        J Hand Surg Am. 2016; 41: 794-799
        • Clough J.D.
        • McClellan M.
        Implementing MACRA: implications for physicians and for physician leadership.
        JAMA. 2016; 315: 2397-2398
        • Cella D.
        • Yount S.
        • Rothrock N.
        • et al.
        The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years.
        Med Care. 2007; 45: S3-S11
        • Brodke D.J.
        • Saltzman C.L.
        • Brodke D.S.
        PROMIS for orthopaedic outcomes measurement.
        J Am Acad Orthop Surg. 2016; 24: 744-749
        • Cella D.
        • Riley W.
        • Stone A.
        • et al.
        The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008.
        J Clin Epidemiol. 2010; 63: 1179-1194
        • Tyser A.R.
        • Beckmann J.
        • Franklin J.D.
        • et al.
        Evaluation of the PROMIS physical function computer adaptive test in the upper extremity.
        J Hand Surg Am. 2014; 39: 2047-2051.e4
        • Doring A.C.
        • Nota S.P.
        • Hageman M.G.
        • Ring D.C.
        Measurement of upper extremity disability using the Patient-Reported Outcomes Measurement Information System.
        J Hand Surg Am. 2014; 39: 1160-1165
        • Beckmann J.T.
        • Hung M.
        • Voss M.W.
        • Crum A.B.
        • Bounsanga J.
        • Tyser A.R.
        Evaluation of the Patient-Reported Outcomes Measurement Information System Upper Extremity Computer Adaptive Test.
        J Hand Surg Am. 2016; 41: 739-744.e4
        • Beaton D.E.
        • Wright J.G.
        • Katz J.N.
        Upper Extremity Collaborative Group. Development of the QuickDASH: comparison of three item-reduction approaches.
        J Bone Joint Surg Am. 2005; 87: 1038-1046
        • Gummesson C.
        • Ward M.M.
        • Atroshi I.
        The shortened disabilities of the arm, shoulder and hand questionnaire (QuickDASH): validity and reliability based on responses within the full-length dash.
        BMC Musculoskelet Disord. 2006; 7: 1-7
        • London D.A.
        • Stepan J.G.
        • Boyer M.I.
        • Calfee R.P.
        Performance characteristics of the verbal QuickDASH.
        J Hand Surg Am. 2014; 39: 100-107
        • Schwartzenberger J.
        • Presson A.
        • Lyle A.
        • O'Farrell A.
        • Tyser A.R.
        Remote collection of patient-reported outcomes following outpatient hand surgery: a randomized trial of telephone, mail, and e-mail.
        J Hand Surg Am. 2017; 42: 693-699
        • Cella D.
        • Hahn E.A.
        • Jensen S.E.
        • et al.
        Patient-Reported Outcomes in Performance Measurement.
        RTI International, Research Triangle Park, NC; 2015
        • Streiner D.
        • Norman G.
        • Cairney J.
        Health Measurement Scales: A Practical Guide to Their Development and Use. 5th ed.
        Oxford University Press, New York; 2015
        • Austin P.C.
        • Hux J.E.
        A brief note on overlapping confidence intervals.
        J Vasc Surg. 2002; 36: 194-195
        • Goldstein H.
        • Healy M.J.R.
        The graphical presentation of a collection of means.
        J R Stat Soc A. 1995; 158: 175-177
        • Payton M.E.
        • Greenstone M.H.
        • Schenker N.
        Overlapping confidence intervals or standard error intervals: what do they mean in terms of statistical significance?
        J Insect Sci. 2003; 3: 1-6
        • Cvetanovich G.L.
        • Gowd A.K.
        • Liu J.N.
        • et al.
        Establishing clinically significant outcome after arthroscopic rotator cuff repair.
        J Shoulder Elbow Surg. 2019; 28: 939-948
        • Nwachukwu B.U.
        • Fields K.
        • Chang B.
        • Nawabi D.H.
        • Kelly B.T.
        • Ranawat A.S.
        Preoperative outcome scores are predictive of achieving the minimal clinically important difference after arthroscopic treatment of femoroacetabular impingement.
        Am J Sports Med. 2017; 45: 612-619
        • Franchignoni F.
        • Vercelli S.
        • Giordano A.
        • Sartorio F.
        • Bravini E.
        • Ferriero G.
        Minimal clinically important difference of the disabilities of the arm, shoulder and hand outcome measure (DASH) and its shortened version (QuickDASH).
        J Orthop Sports Phys Ther. 2014; 44: 30-39
        • Liu J.N.
        • Gowd A.K.
        • Redondo M.L.
        • et al.
        Establishing clinically significant outcomes after meniscal allograft transplantation.
        Orthop J Sports Med. 2019; 7 (2325967118818462)
        • Sorensen A.A.
        • Howard D.
        • Tan W.H.
        • Ketchersid J.
        • Calfee R.P.
        Minimal clinically important differences of 3 patient-rated outcomes instruments.
        J Hand Surg Am. 2013; 38: 641-649
        • Northwestern University
        Healthmeasures—Interpret Scores: PROMIS.
        (Available at:)
        • Koo T.K.
        • Li M.Y.
        A guideline of selecting and reporting intraclass correlation coefficients for reliability research.
        J Chiropr Med. 2016; 15: 155-163
        • Portney L.
        • Watkins M.
        Foundations of Clinical Research: Applications to Practice. 3rd ed.
        Prentice Hall, Upper Saddle River, NJ; 2009
        • Mukaka M.M.
        Statistics corner: a guide to appropriate use of correlation coefficient in medical research.
        Malawi Med J. 2012; 24: 69-71
        • Hinkle D.
        • Wiersma W.
        • Jurs S.
        Applied Statistics for the Behavioral Sciences.
        Houghton Mifflin, Boston, MA; 2003
        • Walter S.D.
        • Eliasziw M.
        • Donner A.
        Sample size and optimal designs for reliability studies.
        Stat Med. 1998; 17: 101-110
        • Bernstein D.N.
        • Houck J.R.
        • Mahmood B.
        • Hammert W.C.
        Minimal clinically important differences for PROMIS physical function, upper extremity, and pain interference in carpal tunnel release using region- and condition-specific PROM tools.
        J Hand Surg Am. 2019; 44: 635-640
        • Mintken P.E.
        • Glynn P.
        • Cleland J.A.
        Psychometric properties of the shortened disabilities of the arm, shoulder, and hand questionnaire (QuickDASH) and numeric pain rating scale in patients with shoulder pain.
        J Shoulder Elbow Surg. 2009; 18: 920-926
        • Polson K.
        • Reid D.
        • McNair P.J.
        • Larmer P.
        Responsiveness, minimal importance difference and minimal detectable change scores of the shortened disability arm shoulder hand (QuickDASH) questionnaire.
        Man Ther. 2010; 15: 404-407
        • Kazmers N.H.
        • Qiu Y.
        • Yoo M.
        • Stephens A.R.
        • Tyser A.R.
        • Zhang Y.
        The minimal clinically important difference of the PROMIS and QuickDASH instruments in a nonshoulder hand and upper extremity patient population.
        J Hand Surg Am. 2020; 45: 399-407.e6
        • Frost M.H.
        • Reeve B.B.
        • Liepa A.M.
        • Stauffer J.W.
        • Hays R.D.
        Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. What is sufficient evidence for the reliability and validity of patient-reported outcome measures?
        Value Health. 2007; 10: S94-S105
        • Deyo R.A.
        • Katrina R.
        • Buckley D.I.
        • et al.
        Performance of a patient reported outcomes measurement information system (PROMIS) short form in older adults with chronic musculoskeletal pain.
        Pain Med. 2016; 17: 314-324
        • Bjorner J.B.
        • Rose M.
        • Gandek B.
        • Stone A.A.
        • Junghaenel D.U.
        • Ware Jr., J.E.
        Method of administration of PROMIS scales did not significantly impact score level, reliability, or validity.
        J Clin Epidemiol. 2014; 67: 108-113
        • Broderick J.E.
        • Schneider S.
        • Junghaenel D.U.
        • Schwartz J.E.
        • Stone A.A.
        Validity and reliability of Patient-Reported Outcomes Measurement Information System instruments in osteoarthritis.
        Arthritis Care Res (Hoboken). 2013; 65: 1625-1633
        • Kisala P.A.
        • Boulton A.J.
        • Cohen M.L.
        • et al.
        Interviewer- versus self-administration of PROMIS measures for adults with traumatic injury.
        Health Psychol. 2019; 38: 435-444
        • Magnus B.E.
        • Liu Y.
        • He J.
        • et al.
        Mode effects between computer self-administration and telephone interviewer-administration of the PROMIS pediatric measures, self- and proxy report.
        Qual Life Res. 2016; 25: 1655-1665