2.1 Samples
Data were collected online by an independent polling company (Ipsos) in April and May 2015. Quota sampling was employed to obtain samples representative of the general population with respect to sex, age, occupation, region, and population density of the UK (n=1,509), France (n=1,501), and Germany (n=1,502). Sample weights were calculated using the random iterative method (RIM) to match the latest data available in each country (census 2011 for the UK and Germany, census 2012 for France).
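The RIM weighting step can be illustrated with a minimal raking sketch. It assumes that each respondent is described by categorical quota variables and that the target population shares per category are known. The function name `rim_weights`, the toy respondents, and the 50/50 target shares are hypothetical and serve only as illustration; the actual implementation and census margins used by Ipsos are not described here.

```python
def rim_weights(respondents, targets, max_iter=100, tol=1e-8):
    """Rake weights so weighted category shares match the target shares.

    respondents: list of dicts mapping variable name -> category
    targets: dict mapping variable name -> {category: population share}
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each category of this variable.
            current = {}
            for w, r in zip(weights, respondents):
                current[r[var]] = current.get(r[var], 0.0) + w
            # Rescale every respondent so this variable's margin matches.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * total / current[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins matched within tolerance
            break
    mean_w = sum(weights) / n
    return [w / mean_w for w in weights]  # weights average to 1


# Hypothetical toy sample: six respondents, two quota variables.
respondents = (
    [{"sex": "f", "age": "young"}] * 3
    + [{"sex": "f", "age": "old"}]
    + [{"sex": "m", "age": "young"}]
    + [{"sex": "m", "age": "old"}]
)
targets = {"sex": {"f": 0.5, "m": 0.5}, "age": {"young": 0.5, "old": 0.5}}
weights = rim_weights(respondents, targets)
```

The procedure adjusts one margin at a time and cycles through all variables until the adjustment factors are close to one, which is why it is called iterative.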
Participation in our general population samples was voluntary, and Ipsos complied with data protection laws. If a respondent dropped out at any point, the data provided up to that point were not included. As skipping items was not possible, there were no missing data.
2.2 Measures
PROMIS domains and item banks
We used the PROMIS-29 v2.0 Profile to assess seven core domains of health, each measured with four items: physical function, fatigue, pain interference, anxiety, depression, sleep disturbance, and the ability to participate in social roles and activities (referred to as participation in the remainder of this article), plus a visual analogue scale (VAS) expressing pain intensity on a scale ranging from 0 to 10(28). Compared to other short forms, PROMIS-29 has enough items to achieve a sufficient degree of precision while maintaining a reasonable response burden. Items are measured on five levels (e.g. “never”, “rarely”, “sometimes”, “often”, “always” or “not at all”, “a little bit”, “somewhat”, “quite a bit”, “very much”) and refer to the past 7 days (except physical function). Each answer is scored from one to five; entering these scores into the online PROMIS converter (http://www.healthmeasures.net/score-and-interpret/calculate-scores) yields one corresponding PROMIS T-score per domain (M = 50, SD = 10), with the US general population as the reference. Note that due to the invariance property of IRT, T-scores obtained from the PROMIS-29 are on the same metric as the scores Revicki used in his analysis, even though those scores were generated using different items. For desirable constructs (e.g., physical function), higher T-scores indicate better health, whereas for undesirable domains (e.g., depression), higher T-scores indicate poorer health states.
The psychometric properties of the PROMIS-29 profile, including evidence of construct and criterion validity, have been reported elsewhere(29–32). An earlier analysis of the data used in this study revealed that scores on the seven health domains of the PROMIS-29 are measurement invariant across the UK, France, and Germany except for one item(33).
EQ-5D-5L crosswalk value set
The EuroQoL EQ-5D is a standardized patient-reported HRQoL questionnaire measuring five dimensions of health: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Its original version, the EQ-5D-3L, differentiates three levels per dimension, defining 3^5 = 243 health states. Its revised version, the EQ-5D-5L, has five levels: “No problems” (or 1), “Slight problems” (2), “Moderate problems” (3), “Severe problems” (4), and “Extreme problems” (5), defining 5^5 = 3,125 different health states. We chose the EQ-5D-5L questionnaire because it can differentiate more health states and is more sensitive. Each health state is assigned an HSU by a value set that reflects the preferences of the general population in the respective country. For many countries, there is not yet a value set for the 5L version. An EQ-5D-5L crosswalk value set was therefore developed so that 3L value sets can be applied to health states described by the 5L version. We used these EQ-5D-5L crosswalk value sets because they are available for all three countries in our samples(4,34–36).
The maximum HSU for the best health state of 11111 is 1.00 or “full health” while 0.00 is considered “dead”. The minimum HSU of the worst health state of 55555 is negative, considered “worse than dead”: -0.594 in the UK, -0.530 in France, and -0.205 in Germany(26).
2.3 Statistical analysis
2.3.1 Relationships among individual health domains and health state utility across the UK, France, and Germany
To obtain a first impression of the form of the relationships among individual health domains and HSU and to judge whether the relationships are stable across the three countries under investigation, we plotted the seven domain scores against HSU in the UK, France, and Germany.
2.3.2 Optimal models for predicting health state utility in the three countries
We applied stepwise regression with backward selection to find the best models to predict the EQ-5D-5L crosswalk for the UK, France, and Germany, starting with full models that incorporated linear, quadratic, and cubic effects for all seven PROMIS-29 domains. We included polynomials up to the third degree as we expected that such polynomials can more flexibly fit the observed data, e.g. in case of nonlinear relationships between predictors and outcome. We used raw polynomials for linear, quadratic and cubic effects in order to obtain coefficients which can be used for prediction independently.
Because sociodemographic factors such as age and sex are known to be useful in predicting HSU, they were also entered as possible predictors(17). The PROMIS pain intensity VAS was not included because pain is already covered by the pain interference domain, which proved to be superior to the VAS(37). Also, while all other domains comprise four items, the pain intensity domain within PROMIS-29 has only this single item, which is not measured on a T-score metric.
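The selection procedure can be sketched as backward elimination over raw polynomial terms, here steered by the BIC as in this study. This is a minimal illustration under stated assumptions, not the code used for the analyses: the helper names (`solve`, `bic`, `raw_polynomial_terms`, `backward_select`) are hypothetical, ordinary least squares is solved with a toy normal-equations routine, and the data are synthetic. A real analysis would rely on a statistics package and a numerically more robust solver.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def bic(columns, y):
    """BIC of an OLS fit (with intercept) of y on the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))
    return n * math.log(rss / n) + p * math.log(n)


def raw_polynomial_terms(name, scores):
    """Raw (non-orthogonal) linear, quadratic, and cubic terms of one domain."""
    return {f"{name}^{d}": [s ** d for s in scores] for d in (1, 2, 3)}


def backward_select(terms, y):
    """Drop the term whose removal lowers the BIC most; stop when none does."""
    kept = dict(terms)
    best = bic(list(kept.values()), y)
    while kept:
        trials = {
            name: bic([col for key, col in kept.items() if key != name], y)
            for name in kept
        }
        drop = min(trials, key=trials.get)
        if trials[drop] >= best:
            break
        best = trials[drop]
        del kept[drop]
    return list(kept), best


# Synthetic illustration: y depends on x; z is an unrelated candidate predictor.
x = [i / 10 for i in range(100)]
z = [math.sin(i) for i in range(100)]
y = [1 + 2 * xi + 0.1 * math.cos(3 * i) for i, xi in enumerate(x)]
kept, best_bic = backward_select({"x": x, "z": z}, y)
```

In the setting of this study, `raw_polynomial_terms` would be called once per PROMIS-29 domain and the resulting terms merged, together with age and sex, into the starting model.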
The Bayesian information criterion (BIC) was used to steer the inclusion and exclusion of predictors in the stepwise regression analyses(38). We chose nRMSE and nMAE as measures of prediction precision and bias because they are preferred over R² or BIC, as used by Revicki(19,39). The nRMSE is the normalized square root of the mean of the squared residuals between observed and predicted scores, and the nMAE is the normalized mean of the absolute residuals. Both are normalized with respect to the different scale ranges of the EQ-5D-5L crosswalk in the UK, France, and Germany(40–42). We also determined the width between the 95% empirical limits of agreement and compared it to the width between the 95% theoretical limits of agreement (i.e., ± 1.96 * SD(residuals)). Bland-Altman plots were used to check the prediction performance along the HSU continuum.
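The precision measures can be written out as a short sketch. It assumes that normalization means division by the scale range of the country-specific crosswalk (1.00 minus the value of state 55555 given in Section 2.2) and that the empirical limits of agreement are the 2.5th and 97.5th percentiles of the residuals. Both readings, as well as the function name `prediction_metrics`, are assumptions made for illustration.

```python
import math
import statistics

# Scale range per country: best state (1.00) minus worst state 55555.
SCALE_RANGE = {
    "UK": 1.00 - (-0.594),
    "France": 1.00 - (-0.530),
    "Germany": 1.00 - (-0.205),
}


def prediction_metrics(observed, predicted, country):
    """Return nRMSE, nMAE, and the widths of the 95% limits of agreement."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    scale = SCALE_RANGE[country]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    sd = statistics.stdev(residuals)
    # 39 cut points at 2.5%, 5%, ..., 97.5%; the first and last are the limits.
    cuts = statistics.quantiles(residuals, n=40, method="inclusive")
    return {
        "nRMSE": rmse / scale,
        "nMAE": mae / scale,
        "loa_width_theoretical": 2 * 1.96 * sd,
        "loa_width_empirical": cuts[-1] - cuts[0],
    }


# Hypothetical observed and predicted utilities for four respondents.
metrics = prediction_metrics([1.0, 0.5, 0.0, -0.5], [0.9, 0.6, 0.1, -0.6], "UK")
```

Dividing by the scale range makes the error measures comparable across countries whose value sets span different intervals.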
We used cross-validation to check for overfitting(43). With this in-sample cross-validation technique, the initial dataset is randomly split into 10 subsamples of approximately equal size. One of these subsamples is kept for validation, while the other nine subsamples are used for parameter estimation. This process is repeated ten times, with each subsample used once for validation, and the results are averaged across repetitions. Overfitting would be indicated if a model’s nRMSE in the full sample were substantially smaller than the average nRMSE across the 10 validation subsamples.
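The cross-validation scheme can be sketched as follows. For brevity, the sketch assumes a single predictor and a simple linear model and reports the unnormalized RMSE; the function names and the synthetic data are hypothetical and do not reproduce the models of this study.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(xs, ys, model):
    """Root mean squared error of the fitted line on the given data."""
    intercept, slope = model
    return math.sqrt(
        sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
    )


def k_fold_rmse(xs, ys, k=10, seed=0):
    """Average validation RMSE over k random folds of roughly equal size."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([xs[i] for i in held_out], [ys[i] for i in held_out], model))
    return sum(scores) / k


# Synthetic illustration: T-scores from 20 to 79 and a noisy linear utility.
rng = random.Random(1)
t_scores = list(range(20, 80))
utility = [1.2 - 0.01 * t + rng.gauss(0, 0.05) for t in t_scores]
in_sample = rmse(t_scores, utility, fit_line(t_scores, utility))
cross_validated = k_fold_rmse(t_scores, utility)
```

Comparing `in_sample` with `cross_validated` mirrors the check described above: a much smaller in-sample error would point to overfitting.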
We used R version 3.4.1, IBM SPSS Statistics version 23, and Microsoft Excel version 15 to run the analyses.
2.3.3 Impact of misspecified mapping functions on the prediction performance
To the best of our knowledge, as of September 2020, the mapping function by Revicki was the only one available for predicting the EQ-5D-3L index value from the PROMIS-29 T-scores(19):
EQ-5D = 1.0266 + 0.0077 * Physical Functioning - 0.0021 * Fatigue - 0.0040 * Pain Interference - 0.0023 * Anxiety - 0.0022 * Depression
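Written as a function, the mapping equation above takes five PROMIS T-scores and returns a predicted EQ-5D index value. The coefficients are copied from the equation; the function and argument names are hypothetical.

```python
def revicki_eq5d(physical_function, fatigue, pain_interference, anxiety, depression):
    """Predicted EQ-5D index value from five PROMIS T-scores (Revicki's equation)."""
    return (
        1.0266
        + 0.0077 * physical_function
        - 0.0021 * fatigue
        - 0.0040 * pain_interference
        - 0.0023 * anxiety
        - 0.0022 * depression
    )


# A respondent at the reference mean (T = 50) on all five domains.
reference_value = revicki_eq5d(50, 50, 50, 50, 50)
```

For a respondent scoring 50 on all five domains, the equation yields a predicted value of approximately 0.88.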
We were interested in quantifying the detrimental effect of applying this foreign mapping function to the data collected in Europe. Note that application of Revicki’s model to the data collected in the UK, France and Germany (i) disregards the country specificity of any version of the EQ-5D, (ii) does not utilize the potential predictive value of the two PROMIS-29 health domains not used by Revicki, (iii) does not take higher-order effects into account, and in combination with the foregoing, (iv) disregards country dependency of the form of relationships (i.e., the specific values of the regression coefficients used).
Because we were also interested in which factor is mainly responsible for the differences in prediction performance, we moved stepwise from Revicki’s model to our models as follows: First, we used the five health domains of Revicki’s model, but with regression coefficients optimized towards the data collected in each country separately. Second, we investigated the incremental value of adding either sleep disturbance, participation, or both to the prediction equation. Third, we allowed for incorporation of quadratic and/or cubic effects.