Mapping the FACT-G to EQ-5D-3L utility index in cancer: data from a cross-sectional study in Guangxi, China

This study aimed to develop a function for mapping a cancer-specific instrument (FACT-G) to a preference-based measure (the EQ-5D-3L utility index) of HRQoL, with utility scores generated using the Chinese value set. The data come from a cross-sectional survey of 243 patients in China with different cancer types. Cancer patients who completed the EQ-5D-3L and FACT-G questionnaires, and for whom demographic and clinical characteristics were available, were included in this study. EQ-5D-3L utility index values were predicted from the four subscale scores of the FACT-G using ordinary least squares (OLS), generalized linear model (GLM), censored least absolute deviations (CLAD), Tobit, and two-part model (TPM) regression approaches. The performance and predictive power of each model were evaluated using r² and adjusted r², mean absolute error (MAE), and root mean squared error (RMSE). Linear equating, a mapping technique that avoids regression to the mean, was also applied. Mapping the FACT-G to the EQ-5D-3L utility index is feasible. We recommend that OLS models be used to estimate patients' health utility for cost-utility analysis in China when the population is in moderate to good health.


Introduction
Cancer is considered a leading cause of human death. With the rapid growth and aging of the population, cancer incidence and mortality are rising rapidly worldwide. According to GLOBOCAN 2018 estimates, there were 18.1 million new cases of cancer and 9.6 million deaths from cancer in 2018 [1]. Cancer also causes serious public health problems and represents a significant economic burden in China. With the continuous improvement of medicine and technology, the survival rate of cancer patients has improved markedly. However, although early screening and treatment, as well as advanced medical technologies, can significantly improve survival, they can place a huge sociopsychological and economic burden on patients. It is therefore very important to understand cancer patients' health-related quality of life (HRQoL), which is increasingly used in health economic evaluations and has recently gained significant attention in cancer studies [2; 3].
HRQoL refers to the state of physical, mental, and social well-being of an individual, and an accurate analysis of the cancer-specific impact on HRQoL is expected to contribute to health economic evaluation [4; 5]. Utility is a component of quality-adjusted life years (QALYs): a QALY is the product of the time spent in a health state and the utility of that state (measured by health state utility instruments, e.g. EQ-5D), where utility is scaled such that 1 indicates full health and 0 indicates death. Values below 0 are also allowed and represent a state of health worse than death [6; 7]. The three-level EuroQol-5-dimension questionnaire (EQ-5D-3L) is a standardized instrument used to measure preference-based HRQoL and is highly recommended in health economic evaluations [8]. Liu et al. developed a Chinese population-specific EQ-5D-3L value set using the time trade-off (TTO) method in 2014 [9]. We therefore used the value set based on the Chinese general population for this study.
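As a worked illustration of how utility enters a QALY calculation (the numbers are hypothetical and not taken from this study), consider a patient who spends $t_1 = 2$ years in a state with utility $u_1 = 0.8$, followed by $t_2 = 1$ year in a state with utility $u_2 = 0.5$:

$$\text{QALYs} = \sum_i u_i\, t_i = 0.8 \times 2 + 0.5 \times 1 = 2.1$$

Three life years therefore count as 2.1 QALYs, which is why an instrument that yields the utilities $u_i$ is needed before a cost-utility analysis can be carried out.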
However, preference-based instruments are not always available in clinical trials because many of their dimensions may not be relevant or sensitive to therapeutic effects. The Functional Assessment of Cancer Therapy-General (FACT-G) is one of the most widely used cancer-specific HRQoL instruments and is a non-preference-based measure for cancer patients [10]. The FACT-G is used in clinical trials to assess quality of life (QoL), with higher values representing a better QoL, and its reliability and validity have been demonstrated [11]. Disease-specific instruments are used more often than generic preference-based measures to assess HRQoL because they provide more specific details about patients' assessment of a particular disease. However, the FACT-G does not yield health utility scores or QALYs directly, which limits its use in health economics research [12]. One solution is to develop a mapping algorithm that converts HRQoL data collected by non-preference-based instruments into scores on generic preference-based instruments [13]. A growing literature suggests that mapping functions from disease-specific instruments to generic preference-based measures can be obtained using regression models in health economic evaluation [4; 13-15].
A few studies have developed and evaluated mappings from the FACT-G to the EQ-5D-3L using regression models in health economic research. A Canadian study derived mapping functions from the FACT-G to both the EQ-5D-3L and SF-6D health utility indices [16]. Another study evaluated the validity of both the FACT-G and preference-based instruments (including the EQ-5D-3L, SF-6D, HUI-2, and HUI-3) in assessing cancer severity levels using Canadian patient data [17]. A mapping from the FACT-G to the EQ-5D-3L health utility index in Singapore showed that a single equation can be applied to different versions of the FACT-G [18]. However, no mapping algorithm is available for converting the FACT-G to the EQ-5D-3L in the Chinese population, and utility value sets are inconsistent between countries [19]. Therefore, it is necessary to develop a health utility mapping from the FACT-G to the EQ-5D-3L for Chinese patients. Some studies have shown that including socio-demographic and clinical factors can improve the accuracy of mapping models, thus affecting health utility in cost-utility analysis [10]. These studies also compared different regression methods to identify more accurate models [14].
In 2014, Fayers and Hays pointed out that regression-based models typically under-predict high scores and over-predict low scores when mapping profile-based measures to preference-based measures, because of regression to the mean [20]. According to a review of mapping studies, predicted values from mapping functions tend to have lower variance than the original observed values because of regression to the mean [21]. Linking strategies such as simple linear equating, equipercentile equating, or item-response theory (IRT) methodology have therefore been suggested as alternatives. While regression-based models seek to predict the most likely true preference-based score from the profile-based score, linking seeks to find the preference-based score that is equivalent to the profile-based score by aligning the score distributions of the two scales [22]. In a few mapping studies, linear equating was implemented by first predicting utility with a regression-based method and then aligning the scale of the predicted values with that of the observed values so that they have the same mean and variance [23].
The objective of the present study was to develop a mapping algorithm to estimate EQ-5D-3L health utility values from the FACT-G for the Chinese population, to select the most appropriate model for estimating patients' health status from a single assessment using the FACT-G, and to make recommendations for future mapping studies.

Method And Materials
Study design and data collection
The Cancer Screening Program in Urban China, a major public health service project supported by the central government of China beginning in August 2012, was designed to provide screening for lung, breast, colorectal, liver, stomach, and esophageal cancers [24]. Meanwhile, a multicenter cross-sectional study was conducted in 12 provinces between September 2013 and December 2014, with appropriate screening interventions targeted at specific types of cancer [25]. The study protocol was approved by the Institutional Review Board of the Cancer Hospital of the Chinese Academy of Medical Sciences (Approval No. 15-071/998). All participants gave written informed consent.
This study involved 243 subjects who met the following criteria: aged 40-74 years; diagnosed with lung, breast, stomach, esophageal, colorectal, or liver cancer; free of any mental disorder and able to understand the survey procedure and complete the survey questionnaire; and completed both the EQ-5D-3L and the FACT-G, including all subscales. Data were collected from one center and multiple oncology hospitals in China. The questionnaire survey was conducted through face-to-face interviews between the investigator and the followed-up subjects.
Patients completed a health and demographic questionnaire covering age, gender, marital status, level of education, family size, employment, family financial pressure, and significant life events. In addition, clinical characteristics were retrieved from medical records, including tumor site, treatment protocol, age at diagnosis, time point of the survey, and clinical stage according to the 6th edition of the American Joint Committee on Cancer/International Union Against Cancer staging system [35].
HRQoL Instruments
EQ-5D-3L Scale
The EQ-5D-3L is a generic preference-based instrument that provides a simple and universal health measurement method for clinical and economic evaluation. The EQ-5D-3L descriptive system consists of five dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression), plus a vertical 0-100-point visual analogue scale for rating overall health status. Each dimension has three levels (i.e., no problems, some or moderate problems, and extreme problems) [26]. The EQ-5D-3L index scores were calculated using an algorithm based on societal preferences from a general population valuation. We used the Chinese value set, the most widely used utility algorithm for China, which is based on a TTO survey of 1147 Chinese respondents [9]. Using this value set, the EQ-5D-3L utility index ranges from -0.149 to 1, where 1 indicates full health, 0 indicates a state equivalent to death, and a negative value implies that the respondent's health state is worse than death [27].

FACT-G Scale
The FACT-G consists of 27 items that reflect the patient's QoL and produces four subscale scores: physical well-being (PWB) (7 items), social/family well-being (SFWB) (7 items), emotional well-being (EWB) (6 items), and functional well-being (FWB) (7 items) [28]. All items are rated on a 5-point Likert scale, with higher scores indicating better HRQoL. An overall score and four dimension scores are obtained by summing the responses to the individual items they comprise. The reliability and validity of the instrument have been well demonstrated in cancer trials and clinical settings.
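The summation described above can be sketched in Python. This is a minimal illustration, not the scoring code used in the study: it assumes items are coded 0 to 4 and have already been oriented so that higher values mean better well-being, and the function name `score_fact_g` and the example responses are hypothetical.

```python
from typing import Dict, List

# Item counts per subscale as stated in the text (27 items in total).
SUBSCALE_ITEMS = {"PWB": 7, "SFWB": 7, "EWB": 6, "FWB": 7}
MAX_ITEM_SCORE = 4  # assumption: 5-point Likert items coded 0..4


def score_fact_g(responses: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum item responses into four subscale scores and an overall score.

    Assumes every item is already oriented so that higher = better.
    """
    scores: Dict[str, int] = {}
    for name, n_items in SUBSCALE_ITEMS.items():
        items = responses[name]
        if len(items) != n_items:
            raise ValueError(f"{name} expects {n_items} items, got {len(items)}")
        if any(not 0 <= x <= MAX_ITEM_SCORE for x in items):
            raise ValueError(f"{name} items must lie in 0..{MAX_ITEM_SCORE}")
        scores[name] = sum(items)
    # The overall score is the sum of the four subscale scores.
    scores["FACT-G"] = sum(scores[name] for name in SUBSCALE_ITEMS)
    return scores


# Hypothetical responses for one patient.
example = {
    "PWB": [4, 4, 3, 4, 4, 4, 4],
    "SFWB": [3, 3, 2, 3, 3, 2, 3],
    "EWB": [4, 3, 4, 3, 4, 4],
    "FWB": [3, 3, 3, 2, 3, 3, 3],
}
print(score_fact_g(example))
```

Under the 0 to 4 coding assumed here, the subscale maxima would be 28, 28, 24, and 28, giving a maximum overall score of 108.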

Statistical analyses
The objective of the statistical analysis was to derive a direct mapping algorithm by regressing the EQ-5D-3L utility on the FACT-G domain scores (i.e., physical, emotional, functional, and social/family) in order to predict EQ-5D-3L health utility. Squared terms and interaction terms of the subscale scores were also explored. In previous studies [29; 30], most health utility scores had non-normal, negatively skewed distributions with a ceiling effect, which violated the assumptions of the ordinary least squares (OLS) method. Even though its theoretical assumptions were not met, OLS outperformed other regression-based models [30]. In our sample, the distribution of EQ-5D-3L utility is negatively skewed, and there is a large ceiling effect (62.7% at ceiling). We therefore compared different regression models to find the best fit for mapping the FACT-G to the EQ-5D-3L health utility.
Five regression approaches for mapping the FACT-G to the EQ-5D-3L were included in this study: the ordinary least squares (OLS) model, the generalized linear model (GLM), the Tobit model, the censored least absolute deviations (CLAD) model, and the two-part model (TPM). Previous studies indicate that OLS is the most commonly used method in mapping algorithms, but it may not be appropriate when preference-based scores are highly skewed [16]. The ceiling effect may also invalidate the normality assumption of OLS [31]. The GLM relaxes the assumptions of OLS, allowing for a skewed distribution of utility values and accommodating a non-linear relationship with the predictor variables. We therefore examined whether the GLM produced more accurate predictions than OLS. The Tobit model is an alternative that accounts for the ceiling effect, thus limiting predictions to a credible range. However, it is sensitive to departures from normality and to heteroscedasticity. The CLAD model, which minimizes the sum of absolute differences between observed and predicted values, relies on the median being more robust than the mean to ceiling effects and is also a possible solution to the heteroscedasticity problem [31; 32]. The TPM is specifically designed to deal with limited dependent variables; it divides the data into two parts to separate respondents in perfect health from those who are not. In the first part, logistic regression predicts the probability that the EQ-5D-3L utility is at the ceiling; in the second part, a truncated OLS predicts the EQ-5D-3L index for individuals whose utility is below the ceiling; the two parts are then combined to obtain the overall utility value [32; 33]. Each of the five regression approaches was fitted under five model specifications. Squared terms and interaction terms were added to improve model accuracy, as suggested in the literature [4].
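The way the two parts of the TPM are usually combined can be written out explicitly. The notation below is introduced here for illustration and is an assumption rather than the authors' own: $\hat{p}_i$ denotes the first-part (logistic) probability that respondent $i$ is at the ceiling utility of 1, and $\hat{U}_i^{<1}$ denotes the second-part prediction for respondents below the ceiling.

$$\hat{U}_i = \hat{p}_i \times 1 + (1 - \hat{p}_i)\,\hat{U}_i^{<1}$$

For example, with the hypothetical values $\hat{p}_i = 0.6$ and $\hat{U}_i^{<1} = 0.80$, the combined prediction is $0.6 + 0.4 \times 0.80 = 0.92$.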
Model 1 regresses the EQ-5D-3L utility index on the FACT-G overall score. Model 2 uses all four FACT-G domain scores. Model 3 includes only the domains that were statistically significant in Model 2. Model 4 is Model 3 plus the squared terms of those domains, and Model 5 is Model 4 plus the interactions of the linear terms. Covariate adjustment (e.g., for age and gender) was omitted in this study because it may reduce the practical value of the resulting mapping function [30].
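The five specifications differ only in which regressors are built from the FACT-G scores, which the following sketch makes concrete. It is an illustration under stated assumptions: the helper `model_features` is hypothetical, the default set of "significant" domains (PWB, EWB, FWB) is taken from the pattern reported in the Discussion rather than estimated here, and interactions are taken to be all pairwise products of the linear terms.

```python
from itertools import combinations
from typing import Dict, List, Sequence

DOMAINS = ("PWB", "SFWB", "EWB", "FWB")


def model_features(
    scores: Dict[str, float],
    model: int,
    significant: Sequence[str] = ("PWB", "EWB", "FWB"),  # assumed selection
) -> List[float]:
    """Return the regressor values (without intercept) for Models 1 to 5."""
    if model not in (1, 2, 3, 4, 5):
        raise ValueError("model must be an integer from 1 to 5")
    if model == 1:  # overall FACT-G score only
        return [sum(scores[d] for d in DOMAINS)]
    if model == 2:  # all four domain scores
        return [scores[d] for d in DOMAINS]
    linear = [scores[d] for d in significant]  # Model 3: selected domains
    features = list(linear)
    if model >= 4:  # Model 4 adds the squared terms
        features += [x * x for x in linear]
    if model == 5:  # Model 5 adds pairwise interactions of the linear terms
        features += [a * b for a, b in combinations(linear, 2)]
    return features


# Hypothetical subscale scores for one patient.
patient = {"PWB": 27, "SFWB": 19, "EWB": 22, "FWB": 20}
for m in range(1, 6):
    print(m, model_features(patient, m))
```

Each returned row would be passed, together with an intercept, to whichever estimator (OLS, GLM, Tobit, CLAD, or TPM) is being fitted.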
The purpose of prediction is usually to estimate the most likely true preference-based score from the profile-based score that is known about the respondent. However, regression-based mapping models produce biased estimates with lower variance than the original observed values because of regression to the mean [20]. Simple linear equating, which involves equating the mean and standard deviation of the two scales, can help to alleviate the typical problem of over-prediction of low scores and under-prediction of high scores [34]. Linear equating attempts to find the preference-based score that is equivalent to the profile-based score by aligning the score distributions of the two scales [22]. We therefore applied linear equating to the regression predictions, linearly transforming the predicted EQ-5D-3L scores so that they had the same mean and standard deviation as the observed EQ-5D-3L scores. In other words, given the utility predicted by a regression model ($Y_R$), the linearly equated prediction ($Y_{LE}$) is

$$Y_{LE} = \bar{Y}_O + \frac{s_O}{s_R}\,(Y_R - \bar{Y}_R),$$

where $\bar{Y}_O$ and $s_O$ are the mean and standard deviation of the observed EQ-5D-3L utility scores, and $\bar{Y}_R$ and $s_R$ are the mean and standard deviation of the predicted EQ-5D-3L utility scores obtained from the regression model. Then, using scatter plots and the Bland-Altman plot, we compared the predicted (equated) scores with the observed scores to show the difference between them [23].
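The linear equating step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Stata code: the function name `linear_equate` and the example values are hypothetical, and the population standard deviation is used for both series (the ratio of standard deviations is the same if the sample version is used for both).

```python
from statistics import mean, pstdev
from typing import List, Sequence


def linear_equate(predicted: Sequence[float], observed: Sequence[float]) -> List[float]:
    """Rescale regression predictions to share the observed mean and SD."""
    m_pred, s_pred = mean(predicted), pstdev(predicted)
    m_obs, s_obs = mean(observed), pstdev(observed)
    if s_pred == 0:
        raise ValueError("predictions have zero variance and cannot be equated")
    # Shift to the observed mean and stretch by the ratio of the SDs.
    return [m_obs + (s_obs / s_pred) * (y - m_pred) for y in predicted]


# Hypothetical observed utilities and regression predictions.
observed = [1.0, 1.0, 0.875, 0.783, 0.364, 1.0]
predicted = [0.96, 0.94, 0.90, 0.85, 0.70, 0.95]
equated = linear_equate(predicted, observed)
print([round(v, 3) for v in equated])
print(round(mean(equated), 3), round(pstdev(equated), 3))
```

Because the transformation only shifts and stretches the predictions, it widens their spread without changing their rank order; note that equated values are not constrained to stay at or below 1.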
We calculated the goodness of fit of each model to assess how well the responses to the FACT-G predicted EQ-5D-3L utility.
Examining the difference between predicted and observed values is a better way of evaluating the predictive performance of regression-based mapping functions [21]. Model goodness of fit was measured using mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC) to examine the differences between observed and predicted EQ-5D-3L utility; lower values indicate better model performance. The coefficient of determination, R², and adjusted R² indicate how well the model explains the values in OLS, but they are not available for the other regression models. Instead, we computed the square of the correlation coefficient (r) between the observed and predicted values of each model, with r² being equivalent to R² in OLS [19]. To penalize model complexity, we defined the adjusted r² as follows: adjusted $r^2 = 1 - (1 - r^2)\,\frac{n - 1}{n - p - 1}$, where n represents the sample size and p is the number of parameters in the model. Predictive ability was evaluated by a paired t-test comparing the observed and mapped EQ-5D-3L utility scores. We selected the models with the lowest MAE/RMSE and the highest r² and adjusted r² as the best performing models. Observed and predicted EQ-5D-3L utility values were compared across patients with different demographic and clinical characteristics using the Wilcoxon test and the Kruskal-Wallis H test for each model. Finally, scatter plots and the Bland-Altman plot were used to visualize each model's predictive performance.
All statistical analyses were performed in STATA version 14.1. All hypothesis tests were two-tailed, and a p-value < 0.05 was considered statistically significant.
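The main error and fit measures named above can be sketched as follows. This is an illustrative implementation, not the authors' Stata code: the function names and example values are hypothetical, and the adjusted r² uses the common form 1 - (1 - r²)(n - 1)/(n - p - 1) with p counted as the number of predictors, which is an assumption about the intended definition.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted utilities."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted utilities."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    m_o, m_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - m_o) * (p - m_p) for o, p in zip(obs, pred))
    var_o = sum((o - m_o) ** 2 for o in obs)
    var_p = sum((p - m_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalize r2 for model size (assumed form; p = number of predictors)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed and predicted utilities.
obs = [1.0, 0.8, 0.6, 0.4]
pred = [0.9, 0.8, 0.7, 0.4]
r2 = r_squared(obs, pred)
print(mae(obs, pred), rmse(obs, pred), r2, adjusted_r_squared(r2, n=len(obs), p=1))
```

MAE weights all errors equally, whereas RMSE penalizes large errors more heavily, which is one reason the two criteria can favor different models.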

Results
A total of 243 cancer patients were included in this analysis. Demographic and clinical features are summarized in Table 1.
The average age of participants was 56.34 years (SD = 8.36 years) and the majority were female (71.6%). Among all age groups, the 55-59 years age group (19.3%) comprised the largest proportion of participants. Most of the cancer patients were married (80.7%) and had completed secondary education or less (72.9%). Most (55.1%) reported that the illness did not place financial stress on the family, and the majority (94.7%) reported no significant life events. Of all the patients, 46 (18.9%) underwent surgical treatment, 122 (50.2%) received heteropathy, and 75 (30.9%) received other therapies. EQ-5D-3L utility did not differ significantly (p > 0.05) across demographic and clinical characteristics, except for family financial pressure and significant life events; differences were examined using the Wilcoxon rank-sum test or the Kruskal-Wallis H test, as appropriate.
Table 1 (summary row and notes): EQ-5D-3L, mean (SD) 0.935 (0.099); median 1 (0.875, 1). P values are from Wilcoxon rank-sum tests for two categories or the Kruskal-Wallis test for more than two categories. SD, standard deviation; EQ-5D-3L, three-level EuroQol-5-dimension questionnaire.
The distributions of the EQ-5D-3L utility index and the FACT-G total and subscale scores are described in Table 2. The mean EQ-5D-3L utility and the mean FACT-G total score were 0.935 and 82.7, respectively. The EQ-5D-3L utility ranged from 0.364 to 1.000, and the median was 1.000, with 62.7% of the subjects having the highest score. Ceiling effects were observed for the FACT-G subscales, namely PWB (27.2%), EWB (6.2%), SFWB (7.8%), and FWB (5.8%), whereas the ceiling effect for the FACT-G total score was negligible (0.8%). Cronbach's α exceeded the threshold for good reliability (α ≥ 0.8) for all scales except the SFWB subscale (α = 0.683), which also fell slightly below the threshold usually considered satisfactory (α ≥ 0.7). Both the EQ-5D-3L utility and FACT-G scores were negatively skewed.
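The two descriptive statistics reported above, Cronbach's α and the share of respondents at the ceiling, can be computed as in the sketch below. The function names and the small data set are hypothetical and serve only to show the calculation; they do not reproduce the study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows[i][j] is respondent i's answer to item j."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item and of the summed scale score.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def ceiling_share(values: Sequence[float], ceiling: float) -> float:
    """Fraction of respondents scoring at (or above) the maximum value."""
    return sum(1 for v in values if v >= ceiling) / len(values)


# Hypothetical item responses (4 respondents, 3 items) and utilities.
items = [[4, 3, 4], [2, 2, 3], [3, 3, 3], [1, 2, 1]]
utilities = [1.0, 1.0, 0.875, 0.783]
print(round(cronbach_alpha(items), 3))
print(ceiling_share(utilities, ceiling=1.0))
```

A subscale whose items vary largely independently of one another yields a low α, which is one possible reading of the weaker internal consistency of a heterogeneous domain such as SFWB.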
Spearman's rank correlation coefficients between the EQ-5D-3L and FACT-G (including the total and four subscales) are shown in Table 3. Most of the scores showed moderate to high correlations. The correlation coefficient between the EQ-5D-3L utility index and the FACT-G total score was 0.5382. SFWB showed negligible correlations with the other scores, except for the FACT-G total, EWB, and FWB scores. All correlations are significant at the 0.05 level after Bonferroni correction.
In the sample, the models with the lowest MAE and RMSE were the CLAD (0.0423) and the OLS (0.0624) models, respectively. In terms of MAE, the CLAD model predicted more accurately than the other models. The best-performing algorithm according to the lowest MAE/RMSE scores was Model 5 for the FACT-G under the CLAD approach. Across all criteria, the OLS model performed the best among the regression-based models, while the Tobit model performed the worst. For observed utilities ≥1, the predictive accuracy of the OLS, GLM, and CLAD models was fairly similar. This result was also supported by scatter plots of observed and predicted EQ-5D-3L utility for the best fitting models in Fig. 1. This was also evident in [...] <0.05). However, the OLS model over-predicted utility in severe health states and under-predicted it in better health states. Similarly, the standard deviation of the estimated scores was always much lower for the OLS model than the observed standard deviation. When linearly equated scores were used for model OLS 5, the predicted values were closer to the mean of the actual values, the standard deviation best matched the observed value, and both under-prediction of high scores and over-prediction of low scores were significantly reduced (Fig. 2).
Table 7 presents the mean observed and predicted EQ-5D-3L utility values by the statistically significant demographic and clinical characteristics for the best models from the different regression algorithms. Compared with TPM 3 and Tobit 3, the estimated health utilities from OLS 5, GLM 5, and CLAD 5 were closer to the observed values from the EQ-5D-3L. We also found that all models tended to overpredict at the top and bottom ends of the EQ-5D-3L utility range, because few respondents in the data set reported severe problems. In addition, the predictive performance of the OLS model was more accurate than that of the other models in the presence of heteroscedasticity and various misspecification and confounding factors.
Finally, the predicted EQ-5D-3L utilities for the OLS 5 model using simple linear equating were closer to the mean of the actual values.

Discussion
The overall objective was to examine the feasibility and effectiveness of mapping algorithms, as well as the circumstances under which they should be considered, and to draw lessons for future mapping studies [21]. We developed an algorithm that maps the FACT-G to the EQ-5D-3L health utility index for the general cancer population, based on data collected in a cross-sectional study in China. The current study suggested that agreement between the predicted and observed EQ-5D-3L utility scores was achievable across the five regression approaches. It also confirmed that non-preference-based FACT-G scores can be used to estimate EQ-5D-3L health utility scores through a mapping function for HRQoL, with utility scores generated using the Chinese value set. Our findings suggest that the OLS model predicts EQ-5D-3L utilities better than the other models in terms of goodness of fit and model performance, which is consistent with most existing research findings.
In this study, the coefficients of PWB, EWB, and FWB were significant in most models for all the regression algorithms, whereas the coefficient of SFWB was not significant in most models. Hence, the SFWB score of the FACT-G was not included in the final regression models. The SFWB was not statistically significant and showed weaker correlations with the EQ-5D-3L utility index than the other FACT-G subscales. Previous research showed that mapping studies tended to confirm the predictive ability for health utility more easily when exploring the correlation between the EQ-5D-3L and FACT-G scales [3; 31; 35]. Furthermore, the SFWB may not be entirely captured in the mapping, which may be attributed to the fact that the EQ-5D-3L does not explicitly cover social well-being [16]. Previous mapping studies of the FACT-P [8], FACT-L [31], and FACT-B [33] also found that the SFWB was not statistically associated with the EQ-5D-3L utility index in regression models.
In this study, we found that the OLS model had the largest r² and adjusted r² among the regression models. The r² and adjusted r² values of all models were larger than 0.5, except for Model 1, which indicated that the models had good explanatory power. The validity of such models, in terms of goodness of fit and prediction error, is highly variable. Such variability is clearly observed in the present study as well: RMSE ranged from 0.0624 for OLS through 0.234 for GLM to 0.366 for Tobit. A similar pattern was observed for MAE. In previous studies, the models' explanatory power ranged from 0.417 to 0.909 in terms of r² [13]. The r² of model OLS 5 reached 0.623, indicating that the model performed well. Predictive performance was assessed by examining the difference between the predicted and observed values using MAE and RMSE. From the reported results (Tables 4 and 5), both CLAD and TPM performed much better than Tobit on the three criteria of r², RMSE, and MAE. CLAD also performed better than GLM on both RMSE and MAE.
More than half of the observations in this study reached the EQ-5D-3L ceiling of 1. Because the data were severely skewed and showed a ceiling effect, which hampers the estimation of patients' utility from descriptive HRQoL data, we used several regression models to estimate the EQ-5D utility. The GLM allows for a skewed distribution of utility values. Tobit and CLAD models are appropriate for censored or bounded data and may be used if the utility scores exhibit a ceiling effect, that is, when a large proportion of subjects are in full health with a utility score of 1 [32].
In a previous study [21], OLS, Tobit, and CLAD were used, and CLAD was found to perform the best. However, although the values of r², MAE, and RMSE were higher than those of previous studies, the overall predictive ability was not satisfactory. This could be due to differences between studies in the target instrument used and in the additional covariates used to predict EQ-5D-3L utility values [34]. We used the mean scores of the FACT-G instead of the patient's prediction when predicting the mean of the EQ-5D-3L utility index [37]. In addition, uncertainties or errors in the economic assessment may affect the accuracy of the utility value, leading to an incorrect estimation of the patient's HRQoL, and further research is needed to assess their impact on the mapping algorithm [2].
Our study demonstrated that the FACT-G predicted EQ-5D-3L utility effectively with the OLS model in the Chinese cancer population. Although the OLS model is a common mapping algorithm and its predicted values are close to the true values, it requires very strict assumptions, namely normality and homogeneity of variance. In addition, previous studies have shown that OLS can have low predictive ability. As in the previous literature, our models overestimated utility for those in poor health and underestimated it for those in good health. Regression-based models generally tend to over-predict for respondents in poor health and under-predict for respondents in better health because of regression to the mean. Linear equating, which forces predicted values to have the same mean and standard deviation as observed values, can help to alleviate this common problem of under-prediction of high scores and over-prediction of low scores [34]. The minimum, 10th percentile, and 50th percentile of the predicted EQ-5D-3L utility values were significantly reduced with linear equating, as shown in Table 6. Given that the majority of observations were in perfect health, it is not surprising that fewer than half of the EQ-5D-3L values were overestimated. Applying linear equating to the predicted values therefore reduces bias, resulting in similar variability between estimated and observed values [22]. The linear equating model aligns the score distributions of the two measures on a similar scale, corresponding to condition- or disease-specific scores. As a result, the estimated EQ-5D-3L scores should only be used for group-level analysis and should not be applied at the individual level [20; 22; 34].
While linear equating cannot directly estimate EQ-5D-3L item scores, it can be applied after the best model has been selected from the various regression-based models used to estimate predicted values. More research is needed to determine the best modeling strategy before the linear equating method is applied. In this cross-sectional study, our sample consists of patients with different types of cancer from one center and multiple oncology hospitals, and the sample size is relatively small, which may limit generalizability. The overall EQ-5D-3L utility index for Chinese cancer patients was estimated at 0.935, which was higher than other utility indexes reported for the Chinese population. Perhaps because most of the cancer patients we examined were in the early stages of their cancer, their quality of life was not seriously affected. Furthermore, because data on disease stage and presentation status classification were not available for all cancer patients, we were unable to assess the potential association between these patient parameters and quality of life scores. To some extent, our model can reflect the purpose and outcome of our research. Further validation of the mapping function should consider larger sample sizes and multiple treatment centers in future studies, which would help to explain more clearly the performance of the mapping function and the generalizability of the comparison results.
Our findings also suggest that family financial pressure and significant life events might influence the HRQoL of cancer patients. Higher financial pressure and the occurrence of significant life events were associated with a lower utility index, which may reflect a more severe disease burden affecting the patient's HRQoL. These findings may help to identify the underlying factors that affect patients' HRQoL or utility values.
There are several limitations to this study. First, the study suffers from a high ceiling effect in the health utility index, which was 62.7% for Chinese cancer patients, leading to a high mean EQ-5D-3L value of 0.9353. The minimum EQ-5D-3L score is 0.364, which is well above the theoretical minimum of -0.149 for the Chinese value set. The range of utility values in this study is therefore relatively narrow, and Cheung et al. suggested that the range of utility values contributes to the relative performance of the different methods [30]. They also developed the Mean Rank Method, a relatively new method that results in less over-estimation (under-estimation) of health utility among people in poor (good) health states [30]. Previous research indicated that OLS mapping of utility in a narrow range may be more accurate than in a broad range [29]. Compared with previous studies, the current study is limited because the full range of utility values is not covered. This may be due to the small proportion of people in poorer health and the absence of negative EQ-5D-3L utility values, which limits the generalizability of the outcomes to more severe patients. That is, the study did not take full advantage of the potential range of scores of the Liu et al. algorithm. Recently published studies have suggested that an increasing number of countries are using the EQ-5D-5L as a preference-based measure instead of the EQ-5D-3L because of its ability to reduce ceiling effects and improve sensitivity [39]. In addition, the Chinese value set of the EQ-5D-5L was published in 2017 [40]. Second, the EQ-5D-3L has a high ceiling effect that may seriously affect the results, limiting the generalizability of the results to more severe patients. Finally, the study collected a relatively small sample, and the data consisted of Chinese patients with five different types of cancer.
Therefore, future studies need a larger sample size and external data to verify the generalizability of this study.

Conclusion
We developed a FACT-G to EQ-5D-3L mapping algorithm for the economic evaluation of cancer patients in China. Among all the regression models, the OLS model showed good predictive ability when observed and predicted EQ-5D-3L utility scores were compared. The linear equating model can be used to more accurately estimate EQ-5D-3L utility scores in cost-utility studies. More research is needed to determine the reliability of the estimated values. This mapping algorithm may provide policymakers and researchers with a reference for the economic evaluation of specific health conditions in cost-utility analysis when estimating the health utilities of Chinese cancer patients with the Chinese value set.

Figure 1
Scatter plots and Bland-Altman plot of observed and predicted EQ-5D-3L utility. OLS Ordinary least squares, GLM Generalized linear models, CLAD Censored least absolute deviations, TPM two-part model. Broken line is the best fitting regression line between observed value and predicted value.

Figure 2
Scatter plots of observed and predicted EQ-5D-3L value sets for the preferred model. Broken line is the best fitting regression line between observed value and predicted value.