2.1 Samples
Data were collected online by an independent polling company (Ipsos) in April and May 2015. Quota sampling was employed to obtain samples representative of the general population with respect to sex, age, occupation, region, and population density of the UK (n=1,509), France (n=1,501), and Germany (n=1,502). Sample weights were calculated using the random iterative method (RIM) to match the latest data available in each country (census 2011 for the UK and Germany, census 2012 for France).
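The weighting step can be illustrated with a minimal sketch of RIM weighting (also known as raking or iterative proportional fitting). This is not the procedure Ipsos ran; the function name `rake_weights`, the variable names, and the target shares are hypothetical and serve only to show the principle of repeatedly rescaling weights until each weighted margin matches its population target.

```python
def rake_weights(rows, targets, n_iter=100, tol=1e-8):
    """Rake respondent weights to categorical population margins.

    rows    : list of dicts, one per respondent, e.g. {"sex": "f", "age": "young"}
    targets : dict mapping variable -> {category: population share}; the
              shares of each variable are assumed to sum to 1
    Returns one weight per respondent, scaled to a mean of 1.
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(n_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            grand = sum(totals.values())
            # Rescale so the weighted share of each category hits its target
            for i, row in enumerate(rows):
                factor = shares[row[var]] * grand / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins matched within tolerance
            break
    mean_w = sum(weights) / n
    return [w / mean_w for w in weights]


# Hypothetical toy sample and targets (illustrative values only)
sample = [
    {"sex": "f", "age": "young"}, {"sex": "f", "age": "old"},
    {"sex": "m", "age": "young"}, {"sex": "m", "age": "old"},
    {"sex": "m", "age": "old"},
]
population = {"sex": {"f": 0.5, "m": 0.5}, "age": {"young": 0.4, "old": 0.6}}
weights = rake_weights(sample, population)
```

In practice, the margins would be the census distributions of sex, age, occupation, region, and population density named above.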
Participation in our general population samples was voluntary, and Ipsos complied with data protection laws. If a respondent dropped out at any point, the data provided up to that point were not included. As skipping items was not possible, there were no missing data.
2.2 Measures
PROMIS domains and item banks
We used the PROMIS-29 v2.0 Profile to assess seven core domains of health, each with four items: physical function, fatigue, pain interference, anxiety, depression, sleep disturbance, and the ability to participate in social roles and activities (referred to as participation in the remainder of this article), plus a visual analogue scale (VAS) expressing pain intensity on a scale from 0 to 10(28). Compared with other short forms, the PROMIS-29 has enough items to achieve a sufficient degree of precision while maintaining a reasonable response burden. Items are answered on five-level response scales (e.g., “never”, “rarely”, “sometimes”, “often”, “always” or “not at all”, “a little bit”, “somewhat”, “quite a bit”, “very much”) and refer to the past 7 days (except physical function). Each answer is coded as a number from one to five; once entered into the online PROMIS converter (http://www.healthmeasures.net/score-and-interpret/calculate-scores), these codes yield one corresponding PROMIS T-score (M = 50, SD = 10) per domain, with the US general population as the reference. Note that, owing to the invariance property of item response theory (IRT), T-scores obtained from the PROMIS-29 are on the same metric as the scores Revicki used in his analysis, even though those scores were generated using different items. For desirable constructs (e.g., physical function), higher T-scores indicate better health, whereas for undesirable constructs (e.g., depression), higher T-scores indicate poorer health.
The psychometric properties of the PROMIS-29 profile, including evidence of construct and criterion validity, have been reported elsewhere(29–32). An earlier analysis of the data used in this study revealed that scores on the seven health domains of the PROMIS-29 are measurement invariant across the UK, France, and Germany except for one item(33).
EQ-5D-3L and EQ-5D-5L questionnaires and index value
The EQ-5D-5L and the EQ-5D-3L are standardized, patient-reported HRQoL questionnaires. From the answers to these questionnaires, the preference-based generic EQ-5D-5L or EQ-5D-3L index value can be derived(4–7,26). Both questionnaires cover five health dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension of the EQ-5D-3L has three levels (i.e., response options): “No problems” (or 1), “Some problems” (2), and “Extreme problems” (3), defining 3^5 = 243 different health states. Each dimension of the EQ-5D-5L has five levels: “No problems” (or 1), “Slight problems” (2), “Moderate problems” (3), “Severe problems” (4), and “Extreme problems” (5), defining 5^5 = 3,125 different health states. The 5L version can therefore differentiate more health states and is more sensitive than the 3L version, which is why we chose the EQ-5D-5L questionnaire for our study.
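The number of health states follows directly from combining the levels of the five dimensions. As a small illustration (the helper name `health_states` is hypothetical), the states can be enumerated as five-digit profiles such as 11111 or 55555:

```python
from itertools import product


def health_states(n_levels, n_dimensions=5):
    """Enumerate all EQ-5D profiles as digit strings, e.g. '11111'."""
    levels = range(1, n_levels + 1)
    return ["".join(map(str, state)) for state in product(levels, repeat=n_dimensions)]


states_3l = health_states(3)  # 3 levels on each of 5 dimensions
states_5l = health_states(5)  # 5 levels on each of 5 dimensions
print(len(states_3l), len(states_5l))  # prints "243 3125"
```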
Note that people in different countries value health states differently, so both EQ-5D index values are country-specific(25,26,34–36). They can be derived from the EQ-5D-5L questionnaire using either the crosswalk to the 3L value set or the new 5L value sets(26). Crosswalks to the 3L value sets are available for the UK, France, and Germany(4,26). A 5L value set is available for Germany(35). There is also one for England, which is not equivalent to our UK sample, and none yet for France(36,37). We therefore used the 3L crosswalk value set for all three samples, thereby ensuring comparability among our samples and with Revicki’s model, which used the 3L value set for the US(19,25,26).
The value assigned to each of these health states with the 3L value set is determined using time trade-off (TTO) and visual analogue scale (VAS) as preference elicitation methods(4,26). The maximum value, assigned to the best health state (11111), is 1.00 or “full health”, while 0.00 is considered “dead”. The minimum value, assigned to the worst health state (55555), is negative and thus considered “worse than dead”: -0.594 in the UK, -0.530 in France, and -0.205 in Germany. In the remainder of this paper, when referring to our EQ-5D-3L index value, we use the term EQ-5D.
2.3 Statistical analysis
2.3.1 Relationships among individual health domains and health state utility across the UK, France, and Germany
To obtain a first impression of the form of the relationships among individual health domains and HSU and to judge whether the relationships are stable across the three countries under investigation, we plotted the seven domain scores against HSU in the UK, France, and Germany.
2.3.2 Optimal models for predicting health state utility in the three countries
We applied stepwise regression with backward selection to find the best models to predict the EQ-5D for the UK, France, and Germany, starting with full models that incorporated linear, quadratic, and cubic effects for all seven PROMIS-29 domains. We included polynomials up to the third degree as we expected such polynomials to fit the observed data more flexibly, e.g., in the case of nonlinear relationships between predictors and outcome. We used raw polynomials for linear, quadratic, and cubic effects in order to obtain coefficients that can be used for prediction independently.
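The following sketch illustrates backward elimination over raw polynomial terms guided by the BIC, the criterion named in this section. It is not the authors' analysis code. It assumes the Gaussian form BIC = n ln(RSS/n) + k ln(n), fits ordinary least squares through naive normal equations (a QR-based routine would be preferable for raw powers of T-scores, which are badly scaled), and all function names are hypothetical.

```python
import math


def raw_polynomial(name, values, degree=3):
    """Raw (non-orthogonal) polynomial terms: x, x^2, ..., x^degree."""
    return {f"{name}^{d}": [v ** d for v in values] for d in range(1, degree + 1)}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_bic(columns, y):
    """OLS with intercept; returns (coefficients, BIC). Assumes RSS > 0."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))
    return beta, n * math.log(rss / n) + k * math.log(n)


def backward_select(terms, y):
    """Drop one term at a time for as long as doing so lowers the BIC."""
    kept = dict(terms)
    best = fit_bic(list(kept.values()), y)[1]
    while kept:
        trials = {
            name: fit_bic([v for k, v in kept.items() if k != name], y)[1]
            for name in kept
        }
        drop = min(trials, key=trials.get)
        if trials[drop] >= best:  # no removal improves the BIC: stop
            break
        best = trials[drop]
        del kept[drop]
    return list(kept), best
```

A full model in the sense described above would merge the `raw_polynomial` terms of all seven domains into one dictionary before calling `backward_select`.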
Because sociodemographic factors such as age and sex are known to be useful in predicting HSU, they were also entered as possible predictors(17). The PROMIS pain intensity VAS was not included, as pain is already covered by the pain interference domain, which proved to be superior to the VAS(38). Also, while all other domains comprise four items, the pain intensity domain within the PROMIS-29 has only this single item, which is not measured on a T-score metric.
The Bayesian information criterion (BIC) was used to steer the inclusion and exclusion of predictors in the stepwise regression analyses(39). We chose the normalized root mean squared error (nRMSE) and the normalized mean absolute error (nMAE) as measures of prediction precision and bias, as they are preferred over either the R² or the BIC used by Revicki(19,40). The nRMSE is the normalized square root of the mean of the squared residuals between observed and predicted scores, and the nMAE is the normalized mean of the absolute residuals. Both are normalized with respect to the different scale ranges of the EQ-5D in the UK, France, and Germany. We also determined the width between the 95% empirical limits of agreement and compared it to the width between the 95% theoretical limits of agreement (i.e., ± 1.96 * SD(residuals)). To check the prediction performance along the HSU continuum, Bland-Altman plots were used.
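A minimal sketch of these accuracy measures is given below. It rests on two assumptions that the text does not spell out: that normalization means division by the theoretical scale range of the country-specific EQ-5D (e.g., 1.00 - (-0.594) = 1.594 for the UK), and that the empirical limits of agreement are the 2.5th and 97.5th percentiles of the residuals. The function name is hypothetical.

```python
import math
import statistics


def prediction_errors(observed, predicted, scale_min, scale_max=1.0):
    """Normalized error measures and widths of the limits of agreement."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    scale_range = scale_max - scale_min  # assumed normalization constant
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    # Theoretical limits: +/- 1.96 * SD(residuals), so the width is 2 * 1.96 * SD
    loa_theoretical_width = 2 * 1.96 * statistics.stdev(residuals)
    # Empirical limits (assumption): 2.5th and 97.5th percentiles of the residuals
    cuts = statistics.quantiles(residuals, n=40)
    loa_empirical_width = cuts[-1] - cuts[0]
    return {
        "nRMSE": rmse / scale_range,
        "nMAE": mae / scale_range,
        "loa_theoretical_width": loa_theoretical_width,
        "loa_empirical_width": loa_empirical_width,
    }


# Hypothetical observed and predicted EQ-5D values on the UK scale
result = prediction_errors(
    observed=[1.0, 0.8, 0.6, 0.4],
    predicted=[0.9, 0.9, 0.5, 0.5],
    scale_min=-0.594,
)
```

Dividing by the scale range makes the errors comparable across countries whose value sets span different intervals.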
We used cross-validation to check for overfitting(41). With this in-sample cross-validation technique, the initial dataset is randomly split into 10 subsamples of approximately equal size. One of these subsamples is kept for validation, while the other nine subsamples are used for parameter estimation. This process is repeated ten times, so that each subsample serves once for validation, and the results are averaged across repetitions. Overfitting would be indicated if a model’s nRMSE were substantially smaller than the average nRMSE across the 10 validation subsamples.
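The 10-fold procedure can be sketched as follows. The helper names, the fixed random seed, and the simple straight-line model used as a stand-in for the mapping function are all illustrative assumptions; the sketch reports an unnormalized RMSE for brevity.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly split indices 0..n-1 into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_rmse(x, y, fit, predict, k=10, seed=0):
    """Average hold-out RMSE over k folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Estimate parameters on the other k-1 folds only
        model = fit([x[i] for i in train], [y[i] for i in train])
        squared = [(y[i] - predict(model, x[i])) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return sum(scores) / len(scores)


def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my - slope * mx, slope


def predict_line(model, x):
    return model[0] + model[1] * x
```

Comparing the value returned by `cross_validated_rmse` with the error of the model fitted to the full sample gives the overfitting check described above.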
We used R version 3.4.1, IBM SPSS Statistics version 23, and Microsoft Excel version 15 to run the analyses.
2.3.3 Impact of misspecified mapping functions on the prediction performance
To the best of our knowledge, as of September 2020, the mapping function by Revicki was the only one available for predicting the EQ-5D from the PROMIS-29 T-scores(19):
EQ-5D=1.0266+0.0077*Physical Functioning-0.0021*Fatigue-0.0040*Pain Interference-0.0023*Anxiety-0.0022*Depression
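For illustration, the equation above can be written as a small function. The coefficients are copied from the equation; the function and argument names are hypothetical, and all inputs are PROMIS T-scores.

```python
def revicki_eq5d(physical_function, fatigue, pain_interference, anxiety, depression):
    """Predict the EQ-5D index value from five PROMIS T-scores (equation above)."""
    return (
        1.0266
        + 0.0077 * physical_function
        - 0.0021 * fatigue
        - 0.0040 * pain_interference
        - 0.0023 * anxiety
        - 0.0022 * depression
    )


# A respondent at the US reference mean (T = 50) on all five domains
print(round(revicki_eq5d(50, 50, 50, 50, 50), 4))  # prints 0.8816
```

Sleep disturbance and participation do not enter this function, which is one of the limitations examined in this section.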
We were interested in quantifying the detrimental effect of applying this foreign mapping function to the data collected in Europe. Note that application of Revicki’s model to the data collected in the UK, France and Germany (i) disregards the country specificity of the EQ-5D, (ii) does not utilize the potential predictive value of the two PROMIS-29 health domains not used by Revicki, (iii) does not take higher-order effects into account, and in combination with the foregoing, (iv) disregards country dependency of the form of relationships (i.e., the specific values of the regression coefficients used).
Because we were also interested in which factor is mainly responsible for the differences in prediction performance, we moved stepwise from Revicki’s model to our models as follows: First, we used the five health domains of Revicki’s model, but with regression coefficients optimized towards the data collected in each country separately. Second, we investigated the incremental value of adding either sleep disturbance, participation, or both to the prediction equation. Third, we allowed for incorporation of quadratic and/or cubic effects.