Health Utility Measurement with Index Scores for People Living with HIV/AIDS Under Combined Antiretroviral Therapy: A Comparison of EQ-5D-5L and SF-6D

The EQ-5D-5L and SF-6D are two widely used generic index score measures. We compared the discriminative validity, agreement and sensitivity of EQ-5D-5L and SF-6D utility scores in people living with HIV/AIDS (PLWHIV). We conducted a cross-sectional survey among PLWHIV aged 18 years or older in 9 municipalities in Yunnan Province, China. A convenience sample was enrolled. We administered the SF-12 and EQ-5D-5L to measure health-related quality of life (HRQoL). The utility index of the SF-6D was derived from the SF-12. The covariate data included demographic, clinical and social-psychological components. To evaluate the agreement between the EQ-5D-5L and SF-6D, intraclass correlation coefficients (ICCs) were computed, and scatter plots and Bland-Altman plots were drawn. To evaluate the capacity to discriminate between different categories of clinical components, social support and anxiety and depression status, mean and median scores were calculated and compared using one-way ANOVA and the Kruskal-Wallis test, respectively. The effect size for the difference between the categories of each characteristic was computed as Z/√N. We also used receiver operating characteristic (ROC) curves to compare the discriminative properties and sensitivity of the two indices. A total of 1,797 respondents, with a mean age of 45.6±11.7 years (range 18 to 80), were interviewed. The distribution of EQ-5D-5L scores was skewed towards full health, with a skewness of -3.316. The distribution of SF-6D scores was nearly symmetric around its mean, with a skewness of 0.084. The effect size was smaller for the EQ-5D-5L than for the SF-6D across the social support, anxiety and depression subgroups. The overall correlation between EQ-5D-5L and SF-6D index scores was 0.46 (P<0.001). An ICC of 0.59 between the EQ-5D-5L and SF-6D indicated a moderate correlation and general agreement. The Bland-Altman plot was consistent with the scatter plot. 
The ROC curve showed that the AUC for the SF-6D was 0.776 (95% CI: 0.757, 0.796) and that for the EQ-5D-5L was 0.732 (95% CI: 0.712, 0.752) by the PCS-12, and it was 0.782 (95% CI: 0.763, 0.802) for the SF-6D and 0.690 (95% CI: 0.669, 0.711) for the EQ-5D-5L by the MCS-12.

dimensions: mobility, ability to self-care, ability to undertake usual activities, pain and discomfort, and anxiety and depression [9]. The original version of the EQ-5D allows respondents to indicate the degree of impairment in each dimension on three levels: no problems, some problems and extreme problems. A newer version, the EQ-5D-5L, includes five levels to indicate the degree of severity for each dimension: no problems, slight problems, moderate problems, severe problems and extreme problems [5]. Regardless of whether the EQ-5D-3L or EQ-5D-5L is used, a preference-based index can be generated. The SF-12 is the abbreviated version of the SF-36 (the Short Form-36 Health Survey) with 12 items, and it provides two component summary scores related to physical and mental health [10]. The SF-6D (the Short Form-6 Dimension) is a preference-based index derived from the SF-36 or SF-12 [11]. The SF-6D may provide a useful alternative to the EQ-5D, and it has shown values comparable to those from the EQ-5D.
Several studies have compared the EQ-5D with the SF-12 or SF-6D in the general population and different disease groups [8-13]. Some studies reported that the different instruments generated widely differing HRQoL scores for the same patient groups. Some studies supported the usage of the SF-6D, which had lower floor and ceiling effects and could better detect the different stages of the disease. Some studies suggested that the EQ-5D could be recommended for use in severe conditions and that the SF-6D could be recommended for use in mild conditions. Together, these studies highlighted the variation in the results generated from the different instruments.
Currently, HRQoL is believed to be a dynamic and relative concept for PLWHIV in the cART era [14]. Moreover, the considerable number of people living with HIV/AIDS and the limited resources for prevention and therapy make resource allocation critical for decision making. A well-founded choice of instrument is therefore urgently needed to obtain accurate results. Few studies have compared the EQ-5D-5L with the SF-6D in terms of their power to distinguish health status in PLWHIV. The main purpose of our study was to describe EQ-5D-5L and SF-6D health state index scores in PLWHIV and to evaluate the relationship, accuracy and applicability of the two measures.

Study design
We conducted a cross-sectional survey among PLWHIV aged 18 years or older in 9 municipalities in Yunnan Province from October 2019 to May 2020. A convenience sample of 1,797 participants was enrolled. Rigorously trained investigators from local CDCs (Centers for Disease Control and Prevention) and social organizations conducted the survey face to face.

HRQoL assessment
We administered the 12-item Short Form Health Survey (SF-12), which is the shortened version of the 36-item Short Form Health Survey (SF-36) and can explain at least 90% of the variance in the SF-36 summary scores. The SF-12 consists of eight domains and generates two separate summary scores, the physical component summary (PCS) and the mental component summary (MCS), each ranging from 0 to 100. Higher scores indicated better HRQoL [15,16]. Cronbach's α was 0.434.
We also administered the EQ-5D-5L (EuroQoL 5-dimensions) simultaneously. The EQ-5D-5L comprised two components: the utility index (UI) and the EQ-VAS (visual analog scale, VAS). We calculated the UI from the respondents' scores on the five dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression). For each dimension, respondents were asked to mark a level from 1 "no problems" to 5 "extreme problems" [17]. The responses were combined to form a five-digit number describing the health status; for example, 11111 represents no problems in any of the five dimensions. Each health state was converted to a UI based on the EQ-5D-5L value set for the Chinese population (see Table 1) [18]. The UI ranged from -0.391 (the worst possible health status) to 1 (the best possible health status). At the same time, all the participants were asked to complete the EQ-VAS. They recorded their self-rated health status on a vertical VAS with the end point "the worst health you can imagine" at the bottom ("0") and "the best health you can imagine" at the top ("100"). A higher HRQoL was associated with a higher UI and EQ-VAS score [18]. Cronbach's α was 0.813.

Covariate data collection
The covariate data comprised three parts: a demographic component, a clinical component and a social-psychological component.
All the participants completed a demographic questionnaire designed by the study staff; it included information on age, marital status, education, ethnicity, and household income per year. The clinical components included time since HIV diagnosis, infectious status at diagnosis (HIV infection; AIDS), CD4 count level at diagnosis, self-reported mode of HIV transmission, and time when ART was begun. We also recorded each participant's ART number to retrieve the most recent CD4 counts and ART regimens from the electronic follow-up system. Social support was assessed by the Social Support Rating Scale (SSRS) designed by Xiao Shuiyuan in 1986, primarily for the Chinese population [20]. It comprised ten items examining three dimensions: objective social support, subjective social support and support utilization. The final score was obtained by averaging the three dimension scores. A higher score indicated higher perceived social support. Cronbach's α for the SSRS was 0.684. Anxiety and depression were assessed by the Chinese version of the Hospital Anxiety and Depression Scale (HADS), which comprises an anxiety subscale and a depression subscale with seven items each. Each item was rated on a four-point Likert-type scale from 0 to 3, so each subscale score ranged from 0 to 21. A participant with a subscale score ≥ 8 or a total score ≥ 13 was defined as pathological. A higher score indicated more serious anxiety and/or depression symptoms [21]. Cronbach's α for the HADS in our study was 0.237.

Statistical analysis

EQ-5D-5L scoring
The EQ-5D-5L can define 3,125 possible health states through the different answer combinations. We adopted the Chinese population-based time trade-off (TTO) value set (Table 1) to transform the responses into a UI, thereby producing a single preference-based index ranging from -0.391 to 1.000, where 0 was equal to death and negative values meant states worse than death. For example, for the combination "21145", the UI equaled 1-0.066-0-0-0.252-0.258=0.424 [17].
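
As a minimal sketch of the additive scoring described above, the following Python snippet reproduces the worked example for state "21145". Only the three decrements quoted in the text are filled in; the names `DIMENSIONS`, `DECREMENTS` and `eq5d5l_index` are illustrative assumptions, and a complete implementation would need every level decrement from Table 1.

```python
# Partial decrement table: only the values quoted in the worked example.
# Keys are (dimension, level); level 1 always carries a decrement of 0.
# The remaining entries would have to be filled in from Table 1.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")  # hypothetical short names

DECREMENTS = {
    ("MO", 2): 0.066,
    ("PD", 4): 0.252,
    ("AD", 5): 0.258,
}


def eq5d5l_index(state: str) -> float:
    """Return 1 minus the sum of level decrements for a five-digit state."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    total = 0.0
    for dim, ch in zip(DIMENSIONS, state):
        level = int(ch)
        if level == 1:
            continue  # "no problems": no decrement
        # A KeyError here means the decrement is missing from this partial table.
        total += DECREMENTS[(dim, level)]
    return round(1.0 - total, 3)


print(eq5d5l_index("21145"))  # worked example from the text
print(eq5d5l_index("11111"))  # full health
```

The lookup is deliberately strict: a state whose decrement is not in the partial table raises an error instead of silently scoring it as zero.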

SF-12 scoring
Scores for the two component summaries (physical and mental component summaries, PCS-12 and MCS-12) were calculated using the second edition of the standard US instrument scoring algorithms (SF-12v2) [15].

SF-6D scoring
The UI of the SF-6D can be derived from the SF-36 or SF-12, although the number of items used differs: 11 items and 7 items, respectively. We used the SF-12-based version with the value set derived from the UK general population by Brazier et al. (see Table 2 and Figure 1). The UI of the SF-6D can be calculated as U = 1 + (sum of the coefficients for the level of each dimension) + adjustment coefficient (MOST = -0.085). For example, when the combination was "223123" for the SF-6D, the UI equaled 1-0.012-0.068-0.069-0-0.004-0.065-0.085=0.697. If the most serious level was chosen for any dimension, the MOST adjustment was applied to the final result [22,23].
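
The SF-6D formula can be sketched in the same way. The snippet below is a hedged illustration that takes the dimension coefficients directly from the worked example, in the order in which the text lists them; the function name `sf6d_index` is hypothetical. Because the rule for which levels trigger the MOST term depends on Table 2, which is not reproduced here, the sketch takes that decision as an explicit argument and simply follows the worked example, which applies the term.

```python
MOST = -0.085  # adjustment coefficient quoted in the text


def sf6d_index(coefficients, most_applies):
    """U = 1 + sum of (non-positive) dimension coefficients + MOST if it applies."""
    utility = 1.0 + sum(coefficients)
    if most_applies:
        utility += MOST
    return round(utility, 3)


# Coefficients for state "223123", in the order given in the worked example.
example = [-0.012, -0.068, -0.069, 0.0, -0.004, -0.065]
print(sf6d_index(example, most_applies=True))
```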

Data analysis
We described the sample characteristics by calculating the number of individuals and the percentage in each category group. We also computed descriptive statistics for the EQ-5D-5L and SF-6D index scores, including the mean, standard deviation (SD), 95% confidence interval (CI), median, interquartile range (IQR), minimum and maximum. Ceiling and floor effects, defined as the proportions of respondents with the best (11111 for the EQ-5D-5L and 111111 for the SF-6D) and the worst (55555 for the EQ-5D-5L and 345555 for the SF-6D) possible theoretical scores, respectively, were calculated for the EQ-5D-5L and SF-6D. If the distribution of index scores was highly skewed, differences between the EQ-5D-5L and SF-6D index scores were examined with the Wilcoxon signed-rank test. The Mann-Whitney test was used to compare index scores across participants' characteristics for two groups, and the Kruskal-Wallis test was used for more than two groups.
To evaluate the agreement between the EQ-5D-5L and SF-6D, intraclass correlation coefficients (ICCs) were computed, and scatter plots and Bland-Altman plots were drawn. ICC=1 indicated complete correlation; 0.7≤ICC≤0.9 indicated a strong correlation; 0.4≤ICC≤0.69 indicated a moderate correlation; 0.1≤ICC≤0.39 indicated a slight correlation; and ICC=0 indicated no correlation [24]. A Bland-Altman plot was used to compare the two measurements of the same variable. Agreement is generally considered acceptable when about 95% of the points lie within the 95% limits of agreement (LOA) and the limits themselves do not exceed a professionally acceptable range [25].
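
As an illustration of the Bland-Altman quantities referred to above, the following sketch computes the mean difference, the 95% limits of agreement (mean difference ± 1.96 SD of the paired differences) and the share of observations outside those limits. It uses only the Python standard library; the function name `bland_altman` and the toy scores are assumptions for illustration and are not study data.

```python
from statistics import mean, stdev


def bland_altman(a, b, z=1.96):
    """Return (mean difference, lower LOA, upper LOA, share outside the LOA)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    lower, upper = bias - z * sd, bias + z * sd
    outside = sum(d < lower or d > upper for d in diffs) / len(diffs)
    return bias, lower, upper, outside


# Toy paired index scores (made up for illustration).
eq5d = [1.00, 0.94, 0.89, 0.95, 0.78, 1.00, 0.62, 0.91]
sf6d = [0.86, 0.80, 0.76, 0.79, 0.70, 0.92, 0.61, 0.74]
bias, lower, upper, outside = bland_altman(eq5d, sf6d)
print(f"bias={bias:.3f}, LOA=({lower:.3f}, {upper:.3f}), outside={outside:.1%}")
```

In a plot, the differences would be drawn against the pairwise means, with horizontal lines at the bias and at the two limits.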
To evaluate the capacity to discriminate between different categories of clinical components, social support and anxiety and depression statuses, the mean scores and median scores were calculated and compared using one-way ANOVA and Kruskal-Wallis tests, respectively. The effect size for the difference between the categories of each characteristic was computed as Z/√N, where Z was the Mann-Whitney Z statistic and N was the total sample size [26]. For variables with more than two categories, the effect size between the extreme groups was calculated. An effect size of 0.2 was considered small, 0.5 moderate, and 0.8 large. The effect size could explain the difference in the discriminative capacity of the studied measurements: the greater the effect size, the stronger the discriminative capacity. We defined a relevant difference between the instruments as different effect sizes for the same group category, which could also be explained by disagreement on the amount of health burden. In this study, the discriminative properties of the two indices were also compared using receiver operating characteristic (ROC) curves. We used the SF-12 component summaries as external indicators of the performance of the EQ-5D-5L and SF-6D. Each external indicator was dichotomized at the median of the PCS-12 or MCS-12. The utility measure with the largest area under the ROC curve (AUC) was considered the most sensitive in detecting differences in the external indicators. The F-ratio of the significance test for the AUC was referenced to 1.0 for the EQ-5D-5L index. If a value was greater than 1.0, we considered the SF-6D index to be more efficient than the EQ-5D-5L at detecting differences between the categories [27].
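
The effect size based on the Mann-Whitney Z statistic can be sketched as follows. This is an illustrative standard-library implementation, assuming the common convention r = |Z|/√N and the normal approximation without tie or continuity correction; the helper names `average_ranks` and `mann_whitney_effect_size` and the toy scores are hypothetical. A statistical package would normally be used in practice.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_effect_size(group_a, group_b):
    """Return (Z, r) with r = |Z| / sqrt(N); no tie or continuity correction."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group_a
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return z, abs(z) / sqrt(n1 + n2)


# Toy index scores for two subgroups (made up for illustration).
no_anxiety = [0.95, 1.00, 0.90, 0.88, 0.97]
anxiety = [0.70, 0.82, 0.60, 0.75, 0.91]
z, r = mann_whitney_effect_size(no_anxiety, anxiety)
print(f"Z={z:.3f}, r={r:.3f}")
```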

Research field and subjects
A total of 1,797 respondents, with a mean age of 45.6±11.7 years (range 18 to 80), were interviewed. Of these, 68.1% were of Han ethnicity, while the others were from minority ethnic groups, including Yi, Dai, Zhuang, Jingpo, Lisu, and Bai. In total, 58.7% of respondents reported being unmarried, divorced or separated, 69.5% had less than the 9 years of compulsory education, and 53.6% worked as farmers or migrant workers. The average yearly income per capita of their households was 10,871 yuan in 2019. Regarding HIV-related clinical characteristics, more than two-thirds of respondents were in the HIV stage at diagnosis (70.4%), 28.3% were in the AIDS stage, and only 1.2% were unclear about their status when they were first diagnosed. A large proportion of the sample (68.8%) acquired HIV via heterosexual transmission, and 20.3% reported a history of intravenous drug use (IDU). A total of 98.7% of patients were taking ART; among those, 59.2% had been treated for more than four years. The majority of patients had sustained high CD4 cell counts, and 48.5% had counts of more than 500 cells/μl. Table 3 shows the details of the sociodemographic and clinical characteristics of all the respondents.
Descriptive statistics for the EQ-5D-5L and SF-6D
The mean EQ-5D-5L index score was 0.896±0.150 (median 0.942, IQR 0.115). The distribution of EQ-5D-5L scores was skewed towards full health, with a skewness of -3.316. The index score ranged from -0.391 to 1.000. The percentages of respondents at the floor and ceiling were 0.1% (n=2) and 33.0% (n=593), respectively (Figure 2). The mean SF-6D index score was 0.772±0.137 (median 0.762, IQR 0.241). The distribution of SF-6D scores was nearly symmetric around its mean, with a skewness of 0.084 and a range from 0.374 to 1.000. The percentages of respondents at the floor and ceiling were 0.06% (n=1) and 6.5% (n=116), respectively (Figure 3). The mean SF-6D index score for respondents with the best health state on the EQ-5D-5L descriptive system (11111) was 0.862, and those with the worst health state (55555) had a mean SF-6D index score of 0.797. Conversely, the mean EQ-5D-5L index score for those with the best SF-6D health state (111111) was 0.990, and for the one respondent with the worst health state (345555), it was 0.364. Overall, the mean EQ-5D-5L index score exceeded the mean SF-6D index score by 0.124, and the difference between the medians was 0.180. The difference between the EQ-5D-5L and SF-6D index scores was significant for the entire sample and for some examined sociodemographic and infectious status subgroups (P<0.05) (Table 4). Both the EQ-5D-5L and SF-6D index scores were significantly different across groups of age, race/ethnicity, education level, occupation, household income per year, transmission mode and duration of ART. EQ-5D-5L index scores were significantly different across initial infectious status. SF-6D index scores were significantly different across the most recent CD4 counts.
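
The floor and ceiling percentages follow directly from the reported counts. The short sketch below recomputes them from the counts given in the text (out of 1,797 respondents) and shows how the same proportion could be obtained from a list of health-state strings; the function name `share_at_state` and the toy list of states are illustrative assumptions.

```python
def share_at_state(states, target):
    """Proportion of respondents whose health-state string equals target."""
    return sum(s == target for s in states) / len(states)


# Counts reported in the text, out of 1,797 respondents.
N = 1797
reported = {
    "EQ-5D-5L floor": 2,
    "EQ-5D-5L ceiling": 593,
    "SF-6D floor": 1,
    "SF-6D ceiling": 116,
}
for label, count in reported.items():
    print(f"{label}: {100 * count / N:.2f}%")

# Toy usage on a handful of made-up EQ-5D-5L states.
toy_states = ["11111", "11121", "11111", "21232", "55555"]
print(share_at_state(toy_states, "11111"))
```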
Comparison of the SF-12 scores across different dimensions of the EQ-5D-5L and SF-6D
Both the PCS-12 and MCS-12 scores differed significantly across all EQ-5D-5L dimensions, with differences ranging from 0.10 to 0.29 across the EQ-5D-5L dimensions for the PCS-12 and from 0.05 to 0.28 for the MCS-12. The relationships of the mobility, self-care, usual activities and pain/discomfort dimensions with the PCS-12, and of the anxiety/depression dimension with the MCS-12, were stronger. The relationships between the less comparable dimensions and component scores were weaker (Table 5). Both the PCS-12 and MCS-12 scores also differed significantly across all SF-6D dimensions, with differences ranging from 0.04 to 0.61 across the SF-6D dimensions for the PCS-12 and from 0.02 to 0.52 for the MCS-12. The relationships of the physical function, role limitation, vitality and social function dimensions with the PCS-12, and of the bodily pain and mental health dimensions with the MCS-12, were stronger. The relationships between the less comparable dimensions and component scores were weaker (Table 6). The effect size was smaller for the EQ-5D-5L than for the SF-6D across the social support, anxiety and depression subgroups (Table 7).
Relationship between the EQ-5D-5L and SF-6D
The overall correlation between EQ-5D-5L and SF-6D index scores was 0.46 (P<0.001). The association between the two scales appeared stronger at the upper end. We also observed a degree of dispersion in which some very low EQ-5D-5L scores were associated with very high scores on the SF-6D and, conversely, some very high EQ-5D-5L index scores were associated with very low scores on the SF-6D (Figure 4). An ICC of 0.59 between the EQ-5D-5L and SF-6D indicated a moderate correlation and general agreement. The Bland-Altman plot was consistent with the scatter plot, with a mean difference between the EQ-5D-5L and SF-6D index scores of 0.124. Three percent of observations were outside the 95% limits of agreement (-0.170, 0.418), which indicated an overall acceptable agreement. However, the agreement seemed weaker at the lower end of the scale, where most of the observations outside the limits of agreement were located. The scatter showed a linear trend: the differences between EQ-5D-5L and SF-6D index scores were most pronounced for observations with good or poor health status, whereas for observations with average health status the agreement between the two scales seemed good (Figure 5).

Sensitivity of EQ-5D-5L and SF-6D index scores
We set the PCS-12 and MCS-12 as the gold standards for measuring health status and dichotomized each at its median, so that respondents were divided into two categories against which the EQ-5D-5L and SF-6D index scores were evaluated. The ROC analysis results were as follows (Figure 6 and Figure 7): according to the PCS-12, the AUC for the SF-6D was 0.776 (95% CI: 0.757, 0.796), while that for the EQ-5D-5L was 0.732 (95% CI: 0.712, 0.752); according to the MCS-12, the AUC for the SF-6D was 0.782 (95% CI: 0.763, 0.802), while that for the EQ-5D-5L was 0.690 (95% CI: 0.669, 0.711). Both AUC differences between the two measures were significant (P<0.05). The AUCs for both the SF-6D and EQ-5D-5L were more than 0.5, which revealed a good ability to discriminate health status, and the F-ratios were 1.06 (PCS-12) and 1.13 (MCS-12), with the EQ-5D-5L as the reference. The SF-6D seemed more sensitive than the EQ-5D-5L in discriminating health status defined by the SF-12.
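
The ROC comparison can be illustrated with a small standard-library sketch. It computes the AUC as the probability that a respondent in the better-health group has a higher index score than one in the poorer-health group (ties counted as one half), with the external indicator dichotomized at its median. The function names, the choice of "strictly above the median" as the positive class and the toy data are assumptions for illustration. The last line checks, as an observation only, that the reported ratios of 1.06 and 1.13 are consistent with simple ratios of the reported AUCs.

```python
from statistics import median


def auc(scores, labels):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dichotomize_at_median(values):
    """True where the value lies strictly above the sample median."""
    cut = median(values)
    return [v > cut for v in values]


# Toy data (made up): an external indicator and one utility index.
pcs = [31.0, 44.5, 52.0, 38.2, 56.1, 47.3]
index = [0.62, 0.80, 0.95, 0.85, 1.00, 0.78]
print(auc(index, dichotomize_at_median(pcs)))

# Ratios of the reported AUCs, with the EQ-5D-5L as the reference.
print(round(0.776 / 0.732, 2), round(0.782 / 0.690, 2))
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sketch; a rank-based formula or a statistics library would be preferable for a sample of 1,797.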

Discussion
Our study provided evidence on the performance of EQ-5D-5L and SF-6D index scores in measuring health utility in people living with HIV/AIDS, showing a moderate correlation between the two measurements. Both showed discriminative capacity and validity in measuring the health status of PLWHIV. However, although the two measurements overlapped considerably, there were significant differences in their performance, in accordance with the results reported in previous studies on the differences between the EQ-5D and SF-6D in the general population and several patient groups, such as rural residents in China and patients with Pompe disease, diabetes, mental health conditions, chronic low back pain, stroke and breast cancer [11,17,25,26,28-30].
In our study, when the same sample of people living with HIV/AIDS was assessed, the mean and median EQ-5D-5L values exceeded the SF-6D values regardless of whether the whole sample or any of the subgroups was considered, with a mean difference of 0.124 and a median difference of 0.180. This result was consistent with previous studies that reported such differences in values [11,17,25,26,28-30]. The ICC for the whole sample was 0.59, which indicated a moderate correlation. We could consider this an acceptable but not very good level of agreement between the two measurements, especially at the severe and mild ends of the scales. The two kinds of plots also revealed the details of the marked differences between the two measures. The lack of agreement highlighted the importance of considering the reasons behind the differences in order to assess the suitability of the instruments within a population of PLWHIV, which is important for health technology assessment and policy making. Some previous studies have explored the reasons for the differences between the EQ-5D and SF-6D in measuring health utility. We discuss the following three points. First, valuation methods were considered to explain the difference. The EQ-5D-5L is based on the time trade-off (TTO) method, whereas the SF-6D makes use of the standard gamble (SG) technique [26]. Previous studies have shown that the SG technique produced higher values than the TTO method [11,17,29], and a crossover has also been reported in which TTO values for milder states were higher than the SG values [8,25,30]. Our study was in accordance with this result. HIV/AIDS has transformed into a chronic disease. With scaled-up ART, PLWHIV can maintain good physical health. We considered that this population had milder health states than those with some disabling diseases.
Second, in our study, both the EQ-5D-5L and SF-6D performed better in detecting differences in social and psychological aspects than in physical aspects for people living with HIV/AIDS; of the two, the SF-6D appeared to detect more differences and had larger effect sizes than the EQ-5D-5L. This result is not entirely surprising, in that the richer descriptive system of the SF-6D might make it easier to identify differences in psychological aspects, which are often smaller and less noticeable than those in physical aspects. Based on the ROC curves and AUCs, both measures showed good ability in discriminating health status, and the SF-6D seemed more sensitive. One previous study demonstrated that the difference in SE was inherently driven by the smaller SD of the SF-6D, which was a consequence of the narrower range of the index scores [26]. We considered that the reason lies in the discrepancies in the contents of the descriptive systems. In our sample, all participants completed the two measures simultaneously; their health status was described by the EQ-5D-5L in the five areas of mobility, self-care, usual activities, pain/discomfort and anxiety/depression, and by the SF-6D in the areas of physical functioning, role limitations, bodily pain, vitality, social functioning and mental health. The different descriptive contents define the application and appropriateness of each instrument. The EQ-5D-5L places more emphasis on the physical aspects of health, while the SF-6D places more emphasis on mental health and social adaptation. With combined antiretroviral therapy greatly improving the survival of people living with HIV/AIDS, HIV/AIDS has transformed from a terminal illness into a chronic disease. A growing challenge for this population is achieving full health, which requires greater attention to mental health and to rehabilitation within family and society. Therefore, the results implied that researchers have to choose between the two instruments based on the appropriateness of the descriptive system for the severity of the potential problems the patient group may encounter. From this perspective, we preferred the SF-6D for measuring health utility in PLWHIV in the cART era.
Third, we also considered that the different scoring algorithms contributed partly to the discrepancy between the two measures [19,31]. The scoring algorithms for the EQ-5D-5L and SF-6D are presented in Tables 1 and 2 in the Methods section. These two different kinds of algorithms affected the generation of the index scores. For the same health status, the different scoring algorithms assigned different index scores; the worst health status measured by the EQ-5D-5L was -0.391 (worse than death), while the corresponding SF-6D index score was 0.331. These variations resulted from the different descriptive systems and the different theories underlying the scoring systems. One previous study showed that the interpretation of the constant terms and the interaction terms were the two key factors [8]. The SF-6D interpreted the constant as an expected value that was equal to one, whereas the EQ-5D interpreted the difference between the constant and one as 'any move away from full health'. For the interaction effects, the SF-6D had a simple dummy named 'MOST', which was applied as a further deduction if any dimension was at the 'most severe' level. The EQ-5D had a dummy named N3, which was similar to 'MOST'. MOST had a coefficient of -0.085, while N3 had a coefficient of 0 in the Chinese value set for the EQ-5D-5L.
In addition, the preferences of the source population may also be a possible reason for the difference. The EQ-5D-5L values reflected the preferences of the Chinese population, while the SF-6D values reflected the preferences of the UK general population.
Based on these factors, users need to pay more attention to the characteristics of the target population. We can summarize some principles for making a selection. First, for the general population or a mildly affected patient population with generally good health, the EQ-5D-5L and SF-6D are likely to perform similarly, but for a sicker population, the performance of the two measures seems to differ. Second, for a patient population with greatly impacted mental health and mildly or minimally impacted physical health, we suggest selecting the SF-6D; such patients could include those with mental health problems, HIV/AIDS, or early-stage breast cancer and patients whose disease is under control. Conversely, for a patient population with greatly impacted physical health, we suggest selecting the EQ-5D-5L; such populations could include patients who have lost functional capacity because of disease and patients in the advanced stage of disease. Third, when using cost-utility analysis to inform local decisions, we should also consider the availability of the scoring algorithm, the origin of the population used for the value set, the extent of change in health status, and resource allocation.
There were some limitations in our study. First, the results are limited to our sample of people living with HIV/AIDS who were doing well on ART, and thus they may not be generalizable to all people living with HIV/AIDS, including patients with ART failure. Second, we used the SF-12 as the gold standard for the comparisons; however, the SF-6D is itself derived from the SF-12, which could bias the results to some extent. Third, we conducted a cross-sectional study and could not capture the responsiveness of the two measures. Fourth, depressive and anxiety symptoms were measured by self-report, which could over- or underestimate these symptoms.

Conclusion
Despite these limitations, our study has provided evidence to inform instrument choice and preference measurement in PLWHIV under cART. Both the EQ-5D-5L and SF-6D have shown discriminative capacity and validity in measuring health status. However, there were significant differences in their performance. The differences between the measures could generate different health utilities for the same sample population, which is critical for cost-utility analyses that guide resource allocation and decision making.

Figure 4. Scatter plot of EQ-5D-5L and SF-6D index scores
Figure 5. Bland-Altman plot of EQ-5D-5L and SF-6D index scores