Five important concerns about the use of anchor-based methods have emerged from the literature: 1) global estimates of change are consistently biased toward the present state; 2) the use of static current state global measures, while not subject to artifacts of recall, may exacerbate the problem of estimating clinically meaningful change; 3) the anchor assessment response(s) that indicates meaningful change usually involves an arbitrary judgment; 4) the calculated interpretation thresholds are sensitive to the proportion of patients who have improved; and 5) for anchor-based regression methods, the correlation between the COA change scores and the anchor directly determines the magnitude of the derived interpretation threshold, with stronger correlations yielding larger thresholds. Each of these five concerns is discussed below.
1. Global Estimates of Change Are Consistently Biased Toward the Present State
Anchor-based methods based on a patient global estimate of change (e.g., “Please choose the response that best describes the overall change in your [condition] since you started taking the study medication: Much better, A little better, No change, A little worse, Much worse” [6]) have consistently demonstrated bias by overweighting the present state and underweighting the initial state. Empirically, if reports of within-patient change are an unbiased estimator of the difference between study baseline and the present condition, they should show a high positive correlation with the present state and a negative correlation of equal magnitude with the baseline scores [7]. However, empirical investigations correlating the assessment of change on the anchor with independent measures of baseline and present state have consistently demonstrated a high positive correlation with the patients’ current status and a near-zero, occasionally positive, correlation with baseline assessments [7–10].
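The benchmark for an unbiased change report can be made concrete with a small simulation. This is a minimal sketch with hypothetical, independently drawn standardized scores (not data from any study): when change is computed as present state minus baseline and the two states have equal variance, the change score correlates about +0.71 with the present state and about −0.71 with the baseline, the equal-magnitude, opposite-sign pattern described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical standardized scores; baseline and present state are drawn
# independently here purely for illustration.
baseline = rng.normal(0.0, 1.0, n)
present = rng.normal(0.0, 1.0, n)
change = present - baseline  # an unbiased report of change

r_present = np.corrcoef(change, present)[0, 1]
r_baseline = np.corrcoef(change, baseline)[0, 1]

print(f"corr(change, present)  ~ {r_present:+.2f}")   # ~ +0.71
print(f"corr(change, baseline) ~ {r_baseline:+.2f}")  # ~ -0.71
```

With equal variances, the two correlations have equal magnitude and opposite sign regardless of how strongly baseline and present state are correlated; the empirically observed pattern (large positive correlation with present state, near zero with baseline) is what signals the bias.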
The fundamental problem with the approach is that remembering and estimating change from a baseline several weeks or months earlier can be an extremely difficult recall task; as a consequence, people devise alternative, albeit unconscious, strategies [7]. One identified strategy is implicit theory of change [11]. Using numerous examples from the social science literature, Ross [12] documented how individuals do not directly recall the initial state; instead, they use implicit theories based on their current state to estimate their initial state and then reconstruct the estimate of change over time. As a result, implicit theories of patient-perceived stability and/or change lead to recall bias and overweight current status in the change estimation. Ross’s work provides a framework to understand the empirical evidence [7–10] that retrospective ratings are a reflection of the patient’s perception of the current status rather than an accurate assessment of change over time.
2. The Use of Static Current State Global Measures May Exacerbate the Problem of Estimating Clinically Meaningful Change
The challenge of appropriately identifying the patients who have experienced a meaningful change is perhaps greater when a static (current state) patient global impression of severity (PGIS) scale is used (e.g., “Please choose the response below that best describes the severity of your illness over the past week: None, Mild, Moderate, Severe”), as recommended by the FDA [6]. This approach does not directly elicit information from patients about the magnitude of meaningful change.
Moreover, these patient assessments of present state may also suffer from a bias analogous to the implicit theory of change or stability. The related bias is called response shift: as a patient’s health state changes, their expectation of ideal health may change with it. Patients with chronic or degenerative diseases may acclimatize to their health state and so report good or excellent health despite obvious infirmities. As a consequence, “HRQoL scores can be stable despite changes in HRQoL” [14]. That is, the static PGIS response given at baseline may not reflect the health framework the patient uses at later PGIS assessments.
3. The Anchor Assessment Response(s) That Indicates Meaningful Change Usually Involves an Arbitrary Judgment
Using the patient global estimate of change (e.g., “Please choose the response that best describes the overall change in your [condition] since you started taking the study medication: Much better, A little better, No change, A little worse, Much worse” [6]) to understand and interpret meaningful change requires selecting a specific global response (or responses) to anchor the change score analyses. What then constitutes a meaningful change? Does a patient need to be Much better, or is A little better sufficient for the COA change/improvement to be meaningful? If the disease or condition is known for rapid patient deterioration on the COA’s concept of interest, should a response of No change indicate a meaningful result given the historically known downward disease trajectory? Moreover, the selected meaningful change level(s) must identify the patients who have changed while excluding those who have changed too much. That is, it is important to identify the subset of patients who have experienced a meaningful change, but at the same time not to overestimate the important change threshold through the inclusion of patients with large changes in the meaningful change estimation process [13].
The situation is exacerbated when static state measures are used, because relevant change levels are determined by computing the difference between the two states, yet meaningful change is not directly estimated by patients. Recognizing this, the FDA asks sponsors to specify and justify “the anchor response category that represents a clinically meaningful change to patients on the anchor scale, e.g., a 2-category decrease on a 5-category patient global impression of severity scale” [1]. However, when the criterion judgment of meaningful change over time on this static scale is left to the investigating team or an expert panel, the process of identifying meaningful patient-informed change using the static anchor is undermined.
It has been suggested [1, 5] that the interpretation of meaningful change may be assisted by graphic display of the empirical cumulative distribution function (eCDF) at each global change value. While the eCDF curves for each change level provide descriptive information on the relationship of the COA to the anchor’s change score, these graphic displays do not directly inform the anchor’s meaningful change threshold. That is, the statement that “The meaningful within-patient threshold of the target COA should be explored by the eCDF of the anchor category where the patients are defined and judged (by the anchor measure) as having experienced meaningful change in their condition” [1] assumes that the meaningful change level for the anchor is known or has been established.
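The eCDF display described above can be sketched as follows. The anchor categories, sample sizes, and score distributions here are all hypothetical; the sketch shows only how the per-category curves are computed, not how a meaningful change level on the anchor would be justified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical COA change scores grouped by global (PGIC) anchor response;
# negative values represent improvement on this hypothetical COA.
groups = {
    "Much better":     rng.normal(-12.0, 6.0, 200),
    "A little better": rng.normal(-6.0, 6.0, 200),
    "No change":       rng.normal(0.0, 6.0, 200),
}

def ecdf(scores, grid):
    """Empirical CDF of `scores` evaluated at each value in `grid`."""
    ordered = np.sort(scores)
    return np.searchsorted(ordered, grid, side="right") / ordered.size

grid = np.linspace(-30.0, 15.0, 91)
curves = {label: ecdf(scores, grid) for label, scores in groups.items()}

# Proportion of each anchor group at or below a candidate COA change of -6.
# This is descriptive only: the curves separate the groups but do not say
# which anchor category constitutes meaningful change.
for label, scores in groups.items():
    print(f"{label:16s} P(change <= -6) = {ecdf(scores, np.array([-6.0]))[0]:.2f}")
```

In practice the curves would be plotted, one per anchor category, with candidate COA thresholds read off against the vertical axis.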
Indeed, without an adequate qualitative investigation [15] of the global item’s response options to understand what patients within the target population consider a meaningful change in how they feel or function on the global item’s scale, the response level(s) selected to indicate meaningful change may rest on an arbitrary judgment by the investigating team, and that judgment can differ over time [3, 16]. The use of a static anchor (e.g., PGIS) and eCDF displays does not address the crucial issue of how the anchor’s meaningful change level is established.
Finally, the reliability of anchor ratings is generally unknown, with limited evidence supporting the test–retest reliability of anchor item(s). The paucity of evidence of reliability for anchor assessments was first noted in 1997 by Norman, Stratford, and Regehr [7], and, as described by Lavigne in 2016, if the anchor item(s) used to assess meaningful change are not reliable, the resulting change threshold for meaningful improvement or decline may not be reliable either [17].
4. The Calculated Interpretation Threshold Is Sensitive to the Proportion of Patients Who Have Improved
Terluin et al [18] examined the impact of the proportion of improved patients on minimally important change (MIC) thresholds in 1) multiple simulations of patient samples from anchor-based MIC studies and 2) a clinical study dataset. A group-level MIC was compared with the average of the individual MIC levels reported by patients who indicated an important change/improvement on a global change anchor; the group MIC was calculated using two methods, receiver operating characteristic (ROC) curves and predictive modeling [18]. Not surprisingly, when the proportion of improved patients was less than 50%, the group MIC underestimated the average of the individual MICs because proportionately more observations came from the unchanged group. Conversely, when more than half the patients had an important change/improvement, the group MIC overestimated the average of the individual MICs for the same reason [18].
The FDA has discouraged the use of ROC analysis as the primary method for understanding the meaningful within-patient change threshold for similar reasons, noting that this method is “partially a distribution-based approach” and that “the most sensitive threshold identified by ROC [analysis] may not actually be the most clinically meaningful threshold to patients” [5].
5. The Strength of the Relationship Between Changes in the Anchor and Changes in the COA Has a Direct Impact on the Magnitude of the Meaningful Change Threshold
An important and often overlooked source of bias in anchor-based methods that directly influences the magnitude of the meaningful change threshold is the correlation between change assessed by the anchor and the COA change scores [13]. It is self-evident that there should be some relationship between change in the COA and change assessed by the anchor scale [1, 5]. This can be visualized by considering extreme cases. If there is no relationship between change on the COA and change on the anchor (rxy = 0.0), then no amount of change in the anchor will lead to a non-zero predicted change in the COA; the two measures are independent. Conversely, if there is a perfect linear relationship (rxy = 1.0), then a 1 standard deviation (SD) change in the anchor will lead to a predicted 1 SD change in the COA. Presumably, intermediate correlations between the two change scores must result in correspondingly intermediate values for the meaningful change on the COA.
Without providing specific thresholds, the FDA notes that an anchor should be “sufficiently correlated to the targeted COA” [1, 5]. Hays, Farivar, and Liu reported in 2005 that a correlation coefficient of r ≥ 0.371 (equivalent to an effect size of 0.80) defines “a noteworthy (large effect) association” between change on the anchor and change on the target COA measure [19]. Other authors have recommended a range of 0.30–0.70 for the magnitude of this change score correlation [20, 21]. Leaving aside the breadth of these recommendations, what is not recognized is that selecting a correlation range does not nullify the impact of the correlation on the calculation of the meaningful change threshold. Quite the opposite: the magnitude of the association is a direct determinant of the magnitude of this threshold when regression analysis is used to estimate the change in the COA that results from meaningful change on the anchor [13]. Expressed in SD units, the meaningful change threshold equals the meaningful change on the anchor scale, in SD units, multiplied by the correlation coefficient.
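This relationship can be written out directly. In a standardized regression of COA change on anchor change, the slope equals the correlation, so the predicted COA change (in COA SD units) for an anchor change of k anchor SD units is simply r × k. A minimal sketch with hypothetical values (the 1 SD anchor change and the grid of correlations are assumptions, not values from any study):

```python
def regression_threshold(anchor_change_sd: float, r: float) -> float:
    """Predicted COA change, in COA SD units, for an anchor change of
    `anchor_change_sd` anchor SD units; the standardized regression
    slope equals the correlation r."""
    return r * anchor_change_sd

# Assume the meaningful change on the anchor corresponds to 1 SD.
for r in (0.0, 0.3, 0.5, 0.7, 1.0):
    threshold = regression_threshold(1.0, r)
    print(f"r = {r:.1f} -> regression-based threshold = {threshold:.2f} COA SD")
```

The grid makes the direct dependence on r visible: the derived threshold scales linearly with the correlation across its full 0-to-1 range.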
This key issue makes clear that the magnitude of the meaningful change threshold that emerges from the regression analysis will increase with the strength of this relationship [13]. Moreover, the effect is non-trivial: depending on the strength of the correlation, the meaningful change threshold can vary from 0 to 1 SD, and it is smallest when the correlation is weakest. Setting arbitrary correlation ranges such as 0.30 to 0.70 constrains the impact of the correlation, but it remains a major determinant.
While the magnitude of this bias can be directly estimated, it is unclear how to address the role of this key determinant. Fayers and Hays recommended a strategy called linking that equates the standardized change in the COA and the anchor; this strategy is equivalent to assuming a perfect linear relationship (rxy = 1.0) between the change scores [13]. This strategy is, of course, difficult to justify if the association between the two change scores is notably weak. Moreover, the effect of this strategy is not trivial. Using the results from Suner et al [22], Fayers and Hays report that the estimated minimal important difference (MID) on the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25; scored 0 (worst) to 100 (best vision-related function)), using at least a 15-letter change in visual acuity as the anchor, was 4.3 points with linear regression models (r < 0.3) and 21.8 points with the linking approach: two widely different thresholds for interpreting change over time in patients with neovascular age-related macular degeneration [13, 22].
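The two NEI VFQ-25 thresholds can be checked for internal consistency with a back-of-envelope calculation. Under the simplification that both approaches standardize against the same SDs, the regression-based MID is the linking-based MID scaled by the correlation, so their ratio recovers the implied correlation. This reconstruction is an illustration, not a reanalysis of the Suner et al data:

```python
# MID estimates for the NEI VFQ-25 as reported by Fayers and Hays [13].
mid_regression = 4.3  # points, linear regression approach
mid_linking = 21.8    # points, linking approach (implicitly assumes r = 1)

# If regression scales the linking estimate by r, the ratio recovers r.
implied_r = mid_regression / mid_linking
print(f"implied correlation ~ {implied_r:.2f}")
```

The implied value of roughly 0.20 is consistent with the reported correlation of r < 0.3.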
This finding is worrisome in that a stronger association between change on the COA and change on the anchor will yield a higher meaningful change threshold, while a weaker anchor relationship yields a lower threshold for demonstrating meaningful change when regression analysis is used.
The importance of this finding may give cause for the reconsideration of all meaningful change thresholds computed to date using an anchor-based approach and regression analysis.