We sought to compare preferences for arthroplasty versus arthrodesis among patients with proximal interphalangeal (PIP) joint osteoarthritis (OA) by quantifying the patient-assigned utility of each operation’s attributes.
We undertook a multistep process to identify relevant surgical attributes, including a literature review, surgeon survey, and patient pilot test, to build a set of discrete choice experiments. Patients with PIP joint osteoarthritis were identified using a single university electronic medical record and were recruited via electronic message or postcard. Participants completed a demographic survey and 11 discrete choice experiments designed using Sawtooth Software’s Discover tool. Utility and importance scores were generated for each attribute.
Pretest analysis identified out-of-pocket cost, joint stiffness, need for future surgery, change in grip strength, and total recovery time as the most important surgical attributes. Initial response rate to the conjoint survey was 75% and survey completion rate was 61%. The study sample was predominantly white (91%) and female (72%), mean age 64.3 years (range, 34–90 years), and mean daily pain score was 4.32 (range, 0–10). Attribute importance scores demonstrated that joint stiffness (32%) and grip strength (29%) were most important to patients. Cost (17%) and need for future surgery (19%) were intermediate patient-preference drivers. Recovery time was the least important attribute (2%).
In aggregate, patients prefer surgical attributes characteristic of arthroplasty (ability to preserve joint motion and grip strength) relative to those associated with arthrodesis (decreased need for reoperation, lower costs, and shorter recovery times).
Symptomatic hand osteoarthritis (OA) is a costly and debilitating disease for many Americans; 13 million United States adults have symptomatic or radiographic hand OA, and 1 in 1,000 people are newly diagnosed each year.
This large and growing disease burden results in substantial personal and socioeconomic costs. Affected individuals report difficulty engaging in social activities, lower job productivity, and increased caregiver reliance; thus interventions that can reduce patients’ pain or improve their function have great potential value.
Currently, nonsteroidal anti-inflammatory medications remain the frontline treatment for most patients; however, individuals who develop persistent pain or deformity that limits their ability to perform activities of daily living may require surgery.
Little is known regarding how other factors that may affect patient experience, such as cost, complication rates, or recovery time, influence individuals’ surgical preferences. Furthermore, because it is difficult to measure patient-reported outcomes accurately, surgeons often favor objective metrics such as arc of motion or grip strength to gauge results, which may not correlate to meaningful differences in patients’ daily lives.
Conjoint analysis (CA) offers a novel way to integrate objective metrics and patient preference. A well-established technique in market economics, CA is a survey method that seeks to determine what elements of a given product are most valuable and what aspects people are willing to trade. Conjoint analysis rests on the theory that every product’s value can be determined by adding the value of its component attributes.
For example, when selecting a car, potential buyers might consider gas mileage, safety, cost, and aesthetics; however, these features are not equally desirable. By varying the value-level of each of these attributes (20 vs 30 miles per gallon; $25,000 vs $30,000) and then repeatedly shuffling the combinations and asking subjects to choose between hypothetical cars A and B (discrete choice experiments), researchers can determine not only what participants value most, but at what point they are willing to make trade-offs. In this case, a buyer might select the car with the best gas mileage until the cost exceeds a certain threshold, at which time the car would also have to come with a better safety profile to win. Figure 1 describes this process as a schematic.
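The additive logic of the car example can be illustrated with a short sketch. The part-worth numbers below are hypothetical placeholders chosen only to reproduce the trade-off described in the text; they are not estimates from any study, and the function names are our own.

```python
# Hypothetical part-worth utilities for the car example (illustrative values only).
PART_WORTHS = {
    "mpg": {20: -10, 30: 10},
    "price": {25000: 15, 30000: -15},
    "safety": {"standard": -8, "enhanced": 8},
}

def total_utility(profile):
    """Additive model: a product's utility is the sum of its attribute part-worths."""
    return sum(PART_WORTHS[attr][level] for attr, level in profile.items())

def choose(option_a, option_b):
    """Return the label of the option with the higher total utility (ties go to B)."""
    return "A" if total_utility(option_a) > total_utility(option_b) else "B"

car_a = {"mpg": 30, "price": 30000, "safety": "standard"}
car_b = {"mpg": 20, "price": 25000, "safety": "standard"}
car_a_safer = {**car_a, "safety": "enhanced"}

print(choose(car_a, car_b))        # B: better mileage does not offset the higher price
print(choose(car_a_safer, car_b))  # A: added safety tips the balance
```

Under these assumed values, the higher-mileage car loses once its price rises unless it also offers the better safety profile, which is the kind of trade-off threshold that discrete choice experiments are designed to reveal.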
This study seeks to leverage these principles to bridge the existing gap between known clinical outcomes and unknown patient preferences. Using a combination of surgeon-identified attributes and patient preferences, we created a series of discrete choice experiments to determine the patient-assigned utility of arthrodesis versus arthroplasty at the population level. The results of this study may provide an important foundation for surgeons discussing the relative benefits of each procedure, and thus facilitate patient-centered decision making.
Materials and Methods
This was a cross-sectional study using an online, single-administration, adaptive choice-based survey. We searched the electronic medical record (Michart, Epic Systems, Verona, WI) using International Classification of Diseases, Ninth and 10th Revisions codes to identify all persons seen at our institution who had a hand OA diagnosis between January 1, 2015 and August 18, 2017 (Appendix A, available on the Journal’s Web site at www.jhandsurg.org). Participants were then primarily recruited through the electronic medical record’s messaging system. An additional 110 patients who had not enabled this functionality received postcards inviting them to enroll through the study Web site. Because PIP joint OA often affects multiple fingers (meaning that patients may undergo multiple operations over time), we included patients who both had and had not undergone surgery. Owing to poor specificity in diagnosis codes, before formal enrollment, patients had to answer additional screening questions confirming that they had been diagnosed with OA at the PIP joint, and that they had no concomitant rheumatologic disorders (Appendix B, available on the Journal’s Web site at www.jhandsurg.org). Participants who did not meet these criteria, those aged less than 18, and those who could not speak English were excluded from the study.
Pre-deployment testing: attribute and value-level determination
To identify potential attributes, we conducted a literature review surveying arthrodesis and arthroplasty outcomes and identified 12 candidate attributes.
An inherent constraint in CA is that each additional attribute and value-level increases choice complexity; thus, we decided to cap the final attribute number at 6, which is consistent with previous studies of CA in health care.
We then surveyed 8 hand surgeons in Michigan (ranging from 2 to 22 years post-fellowship) to determine which 6 of the remaining 11 attributes they believed most strongly drove procedure selection. Surgeons identified out-of-pocket cost, joint stiffness, joint fracture-dislocation, need for prosthetic removal, change in grip strength, and total recovery time (time of immobilization and time until mobility without restrictions). Using data from our literature review, we then assigned 2 to 3 preliminary value-levels to these attributes, again capping possible value-levels to reduce choice complexity.
Consistent with International Society for Pharmacoeconomics and Outcomes Research guidelines for CA, we conducted a pilot test to solicit patient feedback.
Our aims were to ensure that surgeon-identified attributes matched those that were important to patients, confirm participants had similar conceptions of what each attribute entailed, and certify that preliminary value-levels pulled from the literature were meaningful and distinguishable. For this purpose, we created a novel interview guide to structure qualitative interviews. To identify potential unrecognized attributes, we asked participants broad open-ended questions (“What worries you about surgery?”). We then performed direct member-checking to assess: (1) terminology use (eg, “What does ‘joint stiffness’ mean to you?”), (2) distinguishability (eg, “What would be a meaningful or big difference to you, in terms of recovery time?”), and (3) reasonability (“What would be the absolute highest amount you would pay?”). The full interview guide can be found on-line (Appendix C, available on the Journal’s Web site at www.jhandsurg.org).
The primary investigator conducted 10 one-on-one interviews: 5 participants had previously undergone surgery (3 arthroplasty and 2 arthrodesis) and 5 had not. Interviews lasted approximately 20 minutes and the investigator concluded each session by reading back participants’ responses to make sure the data were accurate in all cases. Participants did not identify new attributes; however, they had difficulty distinguishing between joint fracture-dislocation and need for prosthetic removal, so this was combined into a single item, the need for future surgery to remove the prosthesis. Table 1 lists final attributes and value-levels incorporating findings from the literature review, surgeon surveys, and patient interviews.
Table 1. Attribute and Value-Levels
After surgery, your joint stiffness will be:
After surgery, your grip strength will:
Surgery will cost you (out-of-pocket)
Your recovery will last
You will need future surgery to remove your joint hardware
This set of attributes and value-levels generated 72 (3 × 3 × 2 × 2 × 2) scenarios and 2,556 pairwise comparisons ([72 × (72 – 1)] / 2). Given the massive number of pairwise choices this model produces, testing all of them was not feasible.
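The scenario and pairwise-comparison counts above can be checked with a few lines of Python. This is only an arithmetic sketch: the list of level counts follows the 3 × 3 × 2 × 2 × 2 structure given in the text, and the variable names are our own.

```python
from itertools import product
from math import comb

# Number of value-levels per attribute, as stated in the text (3 x 3 x 2 x 2 x 2).
levels_per_attribute = [3, 3, 2, 2, 2]

# Enumerate every full profile (one level index per attribute).
profiles = list(product(*(range(n) for n in levels_per_attribute)))
n_profiles = len(profiles)

# Unordered pairs of distinct profiles: n(n - 1) / 2.
n_pairs = comb(n_profiles, 2)

print(n_profiles, n_pairs)  # 72 2556
```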
To address this, we used Sawtooth Discover software (version 9.4.0; Orem, UT) to administer adaptive discrete choice experiments (DCEs). Discover’s adaptive design uses a random starting point and multiple attribute/value iterations to present users with customized choices based on their previous selections. The software is able to generate high relative D-efficiency (a ratio of the precision of the test design to that of the reference design) while maintaining level balance (value-levels appear an equal number of times). It is near-orthogonal for each participant, meaning that participants see attribute pairs an equal number of times, and provides high statistical efficiency.
In this case, comparisons were presented as a choice between hypothetical surgery A versus surgery B, each of which was made up of 5 attributes with differing value-level combinations. Figure 2 presents an example. Critically, as demonstrated in the figure, participants were not explicitly choosing between arthrodesis and arthroplasty; they were weighing joint motion, grip strength, cost, recovery time, and need for future surgery against one another. Which surgery best matched the most important attributes was then assigned post hoc by the study team.
None of the possible choice combinations were conditional (every combination was a valid real-world outcome), so we did not check for plausibility, nor did we institute prohibitions that would prevent certain attributes or value-level combinations from appearing together.
To increase response efficiency (ie, to make individual choices easier for participants) while preserving our ability to estimate main effects, we used a balanced overlap design. Balanced overlap means that in some instances, surgery A and B had the same value-level for some proportion of attributes. This approach also probes respondents’ preferences more deeply because it prevents participants from using noncompensatory (cutoff) tactics (eg, always choosing the lowest cost option).
Finally, we assumed the desirability of value-levels was known and consistent across all respondents (eg, higher cost was always less desirable than lower cost); thus, we did not ask participants to rank the desirability of individual value-levels before completing DCEs.
Before beginning discrete choice experiments, participants completed the Michigan Hand Questionnaire (MHQ) pain domain and indicated the average pain level (0–10 scale) at the affected joint during the past 2 weeks.
Participants were also provided written instructions explaining the DCEs, an example question, and definitions for each attribute and value-level (Appendix D, available on the Journal’s Web site at www.jhandsurg.org). Finally, after completing the DCEs, subjects were asked to provide demographic information. In pre-deployment testing, surveys took less than 20 minutes to complete; however, timing was not recorded once surveys went live. Upon completion of the study, participants were compensated with a $10 gift card.
Sample size was determined using parameters proposed by Johnson and Orme, which state that to estimate main effects, N > [1,000 × (highest number of value-levels)] / [(number of DCEs) × (number of options within each DCE)].
Here, N > (1,000 × 3) / (11 × 2) ≈ 136 participants. To provide a margin of error, target enrollment was set at 200. The number of DCE tasks (11) was determined using the Sawtooth Discover software wizard, which uses logit computation of standard error to make recommendations based on the number of concepts per task (attributes and value-levels).
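The sample size arithmetic can be reproduced as follows. This is a minimal sketch of the rule of thumb as it is stated in the text; the function name and parameter names are our own, and the constant of 1,000 is taken from the formula above.

```python
import math

def sample_size_threshold(max_levels, n_tasks, n_alternatives, constant=1000):
    """Threshold from the rule of thumb N > constant * c / (t * a)."""
    return constant * max_levels / (n_tasks * n_alternatives)

# Values from the text: 3 levels at most, 11 DCE tasks, 2 options per task.
threshold = sample_size_threshold(max_levels=3, n_tasks=11, n_alternatives=2)

print(round(threshold, 2))        # 136.36
print(math.floor(threshold) + 1)  # 137, the smallest whole number of participants above the threshold
```

The target enrollment of 200 sits comfortably above this threshold.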
Overall response rate was calculated by dividing the number of people who clicked the survey link by the total number contacted. However, given the known nonspecificity of the recruitment codes, we then calculated the survey completion rate excluding ineligible respondents (total completed divided by total eligible to complete). Descriptive statistics were performed to calculate means, SDs, and ranges of respondents as appropriate.
Sawtooth’s Discover tool employs empirical Bayes techniques and maximum likelihood estimation to compute individual-level logit models and generate utilities. Individual rather than pooled aggregate logit analysis avoids violations of the independence of irrelevant alternatives assumption (ie, that a preference for A over B would not be changed by adding C). To generate aggregate utilities, the software uses empirical Bayes methodology to calculate a logit solution using all of the DCE data and then adjusts each participant’s response by the population average. The software also uses monotonicity constraints, which means that if the individual logit search procedure includes part-worths that violate the known preference order within an attribute, their value is averaged and 0.01 is added to the part-worth of the known-preferred option and subtracted from the less preferred one. The combination of these techniques provides results that closely approximate those generated by hierarchical Bayes.
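The monotonicity adjustment described above can be sketched for a single pair of value-levels. This is a simplified illustration of the rule as it is worded in the text, not Sawtooth's implementation; the function name, the two-level scope, and the treatment of exact ties (left unchanged) are assumptions.

```python
def enforce_order(less_preferred, more_preferred, eps=0.01):
    """Repair a pair of part-worths that violates the known preference order.

    If the level known to be preferred has the lower part-worth, both values
    are replaced by their average, then eps is added to the preferred level
    and subtracted from the less preferred one. Pairs already in order are
    returned unchanged.
    """
    if more_preferred < less_preferred:
        mean = (less_preferred + more_preferred) / 2
        return mean - eps, mean + eps
    return less_preferred, more_preferred

# Hypothetical part-worths: the known-preferred level scored lower (3.0) than
# the less preferred level (5.0), so the pair is averaged and nudged apart.
low, high = enforce_order(less_preferred=5.0, more_preferred=3.0)
print(round(low, 2), round(high, 2))  # 3.99 4.01
```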
A total of 200 participants completed the survey from August to October, 2017. Recruitment results are summarized in the study flow diagram (Fig. 3) and participant demographic data are reported in Table 2. The overall study response rate was 75% and the eligible participant completion rate was 62%. The study sample was predominantly white (91%) and female (72%). Mean age was 64.3 years (range, 34–90 years). A total of 44% of the sample reported that they had full- or part-time employment or were a homemaker; 53% reported an average household income of more than $75,000. The mean daily pain score was 4.3 (range, 0–10) and the mean MHQ pain domain score was 59 (range, 0–85) of 100, with 0 representing no pain.
Table 2. Patient Demographic Characteristics
MHQ pain domain score
Pain intensity rating, 0 to 10 on numerical rating scale
Utilities (also called part-worths) represent the relative desirability of each value-level within an attribute. Utility scores are anchored to 0, with negative scores indicating that the value-level deters respondents from selecting options in which it appears and higher numbers signaling higher desirability. A larger range in attribute score suggests that it has a bigger role in driving patient selection. Similarly, asymmetry in value-level magnitude (–80 to +10 vs –10 to +10) suggests that one end of the spectrum (eg, high cost) may strongly influence patient choice when present (as indicated by the –80 part-worth). However, at the other end of the spectrum (eg, low cost), other attributes are more important (+10 is a relatively low value, thus signifying that other attributes with more extreme value magnitudes swing the decision between options A and B). From part-worth utilities, we computed the attribute importance score in which importance = [(utility range) / (sum of utility ranges for all attributes)] × 100, thus further quantifying the relative influence of each attribute on participants’ decision making.
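The importance formula can be illustrated with a short sketch. The two attributes below are hypothetical and simply reuse the illustrative ranges mentioned above (–80 to +10 and –10 to +10); they are not the study's estimated part-worths, and the function and attribute names are our own.

```python
def importance_scores(part_worths):
    """Importance = (utility range of an attribute / sum of all ranges) * 100."""
    ranges = {attr: max(vals) - min(vals) for attr, vals in part_worths.items()}
    total = sum(ranges.values())
    return {attr: 100 * rng / total for attr, rng in ranges.items()}

# Hypothetical part-worths using the illustrative magnitudes from the text.
example = {
    "attribute_1": [-80.0, 10.0],  # range 90
    "attribute_2": [-10.0, 10.0],  # range 20
}

scores = importance_scores(example)
for attr, score in scores.items():
    print(attr, round(score, 1))
# attribute_1 81.8
# attribute_2 18.2
```

Because each score is a share of the summed ranges, the importance scores across all attributes add up to 100.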
Table 3 reports the relative utility of each attribute and importance scores. Attribute importance scores demonstrated that avoiding joint stiffness (32%) was the most critical aspect of surgery for study participants. The slight asymmetry noted in the part-worth scores (severe: –89.27, moderate: 17.39, and mild: 71.88) indicated that severe stiffness was a stronger deterrent compared with the prospect of mild stiffness being an incentive. Grip strength emerged as the second most important attribute (29%). Cost and need for future surgery were intermediate patient-preference drivers (17% and 19%, respectively). Recovery time was the least important attribute (3%).
This study used discrete choice experiments and CA to examine which aspects of PIP joint surgery matter most to respondents. Joint stiffness and grip strength emerged as the leading patient preference drivers, need for future surgery and cost were moderate influencing factors, and recovery time proved to be least important. Linking these attributes to their respective operations (arthroplasty offers better grip strength and range of motion; and arthrodesis has a lower cost, requires fewer reoperations, and has a shorter recovery period), these results suggest that offering arthroplasty as the first-line surgical option is a highly patient-centered approach.
regarding PIP joint arthroplasty versus arthrodesis at the index finger. Although the authors demonstrated that arthroplasty preserved joint motion better than did arthrodesis, based on their finding that patients who had arthroplasty had significantly higher reoperation rates, they advocated changing practice to make arthrodesis the first-line operative choice. Our results suggest that many patients are willing to accept future additional operations to lengthen periods of better function. Of course, patient preference is only one aspect of this question, albeit an important one; additional factors such as increased health care use and higher costs are also important system-level considerations. However, these results illustrate the importance of understanding trade-offs in complex decision making.
Differences in our study findings compared with those reported by Vitale et al further highlight the importance of finger-specific considerations when selecting an operation for PIP joint OA. In the study by Vitale et al, postoperative surveys demonstrated no differences in surgery satisfaction or MHQ scores between patients who had undergone arthroplasty versus arthrodesis. Given that PIP joint immobility does not greatly impede the key pinch functions of the index finger, these patients likely would not experience functional differences at the same rate as patients with little or ring finger disease, who need joint motion to preserve grip strength. Helping patients understand the relationship between motion and function at different fingers may overcome patients’ initial concerns that joint stiffness will unduly impair function.
Although this study establishes baseline population preference data, these results are not meant to substitute for thoughtful discussions between patients and surgeons regarding individual circumstances. There will certainly be cases in which arthroplasty is not appropriate. Thus, the best application of our findings may actually lie in crafting better individual decision tools. Although decision aids have been successfully developed to guide operative decisions in knee and shoulder OA, to the best of our knowledge, similar tools do not exist for PIP joint OA.
Furthermore, a recent analysis of elderly patients’ experiences using conjoint software to make decisions regarding knee OA found that participants’ low comfort with computers impeded their desire to use Web-based tools.
(video) that preferentially focus on areas that matter most to patients. By spending more time illuminating the trade-offs among joint stiffness, grip strength, and future surgery while limiting questions about recovery time, researchers can maximize the benefit of the current conjoint results while remaining patient-centered.
Response inefficiency is a well-characterized issue in CA and likely had some effect on our study findings. Because DCEs require a higher degree of attention than many other stated-preference surveys, respondents might employ heuristic approaches that may confound the results. Sources of measurement error in this design include simplifying the decision such that respondents no longer maximize utility (eg, deciding to find the lowest-cost option in each DCE rather than weighing the full merits of choice A vs choice B); respondent fatigue (picking a choice just to be done); and heterogeneity in attribute interpretation (person A taking “stiffness” to mean “pain with motion,” whereas person B takes it to mean “limited arc of motion”).
We attempted to address these issues by conducting pre-deployment pilot-testing to ensure the meaning of our attributes and value-levels was clear. We provided participants with a definition list before DCEs. As discussed previously, we capped both total attribute number and value-levels to reduce choice complexity and used a balanced overlap design to minimize heuristic use. Finally, we bolded relevant text in each question stem to try to decrease responder burden. However, despite these precautions, almost 40% of initial respondents did not complete the full survey, which suggests that responder burden remained an issue.
This study was also limited because we enrolled individuals from a single academic medical center, which may restrict these findings’ broad applicability. Furthermore, because we recruited predominantly online and administered the survey electronically, we also may have introduced some selection bias, particularly because our target population was largely older adults. Although the exact relationship between technology familiarity and preference for arthroplasty-associated attributes is not obvious, it is possible that respondents who were more comfortable with online communication were also more activated in their hobbies or jobs, and thus prioritized functionality attributes over lower-maintenance attributes. Furthermore, latency bias or hyperbolic discounting (a phenomenon that occurs when individuals discount future harms in favor of short-term gains, even if small) may have biased our results toward immediately applicable attributes such as joint motion and grip strength and inappropriately contributed to patients discounting the need for future surgery.
Finally, because half of our study population had previously undergone surgery, it is possible that those respondents biased the overall results. We suggest that this was likely not the case because, as with the question of technology familiarity, there is no robust evidence that previous surgery would have directed patient preference in a consistent direction. For example, patients who were unsatisfied with the results of the arthroplasty may have been more likely to discount joint mobility in favor of avoiding future surgery, whereas patients who had excellent results may have continued to rank grip strength and mobility highly. Adequately addressing this question through subgroup analysis would have required us to power the study for 4 additional groups (arthroplasty vs arthrodesis at the index/middle finger and ring/little finger) because joint stability and grip strength differ based on the affected finger. Future analyses examining preferences among these discrete populations are certainly warranted.
Ultimately, our study demonstrates that CA can provide surgeons with nuanced patient preference data that can structure preoperative decision making. As treatment options continue to become more complex, it is important to develop methodologies that can move beyond simple preference rankings to delineate the value patients place on different surgical outcomes relative to one another. As shown in this study, discrete choice experiments and CA may prove to be powerful and efficient tools in creating road maps for decision aids, and work should continue to leverage these findings to assist patients in making the best operative decisions for their values.
This work was supported by American Foundation for Surgery of the Hand Resident Fast Track Grant N022677/387290, the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National Institutes of Health under Award 2 K24-AR053120-06, and T32 Training Grant 5T32GM008616-17. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Estimates of the prevalence of arthritis and other rheumatic conditions in the United States: part II.
EULAR evidence based recommendations for the management of hand osteoarthritis: report of a Task Force of the EULAR Standing Committee for International Clinical Studies Including Therapeutics (ESCISIT).