**4.1 The research subjects**

Based on the data sources and the purpose of this study, the final sample comprised 1200 cases. **Supplementary Material Spreadsheets 2-4** provide information on the overall demographic characteristics, health behavior characteristics, and related health status of the study population.

**4.2 CHS Scale measurement results**

The theoretical structure of the CHS Scale consists of nine aspects: energy, pain, diet, stool, urine, sleep, body mass, emotion, and health evaluation. The scale comprises 33 items, each with four response levels. After analysis, the highest CHS Scale score among the subjects in this study was 19.43, with a mean (standard deviation) of 2.72 (0.12) (**Supplementary Material Spreadsheets 5-6**).

**4.3 EQ-5D Measurement Results**

**4.3.1 Distribution of health status levels in each dimension of EQ-5D**

Of the 1200 subjects in this study, 8.2%, 5.5%, 8%, 19.7%, and 16.7% reported problems in the mobility, self-care, daily activities, pain or discomfort, and anxiety or depression dimensions, respectively. Across the five dimensions, the proportion reporting problems was highest for pain/discomfort, followed by anxiety/depression. Across all five dimensions, the most frequent of the three levels was "no difficulty", followed by "some distress" (**Supplementary Material Spreadsheet 7**). Thus, the main problem areas in the overall health status of the study population were pain/discomfort and anxiety/depression.

**4.3.2 EQ-5D health utility value results**

We calculated EQ-5D health utility values with the Chinese version of the scoring system, which was jointly designed and developed by the Chinese scholars Wu Hongyan and Liu Guoen. Its theoretical range is -0.149 to 1. In the total sample of this study, the maximum and minimum EQ-5D health utility values were 1 and -0.03, respectively; the mean (standard deviation) was 0.923 (0.158); and the median (interquartile range) was 1 (0.875, 1) (**Supplementary Material Spreadsheet 8**). In the distribution of EQ-5D health utility values, values equal to 1 were the most common, at 67.0%, followed by values in the range 0.8-1, at 19.8%. Values below 0 were the least common, at 0.2% (**Supplementary Material Spreadsheet 9**).

**4.3.3 Sample Diversity Results**

The total sample was divided into two subsets, the construction set (derivation set) and the validation set, in proportions of 2/3 and 1/3. The subsets were created by stratified randomization: age was used as the stratification factor, the population was stratified by age group, and each stratum was randomized in a ratio of 2:1. The final result was 800 cases in the construction set and 400 cases in the validation set, an overall ratio of 2:1. The relative proportions of men and women in the construction and validation sets were 63% and 55%, respectively. A fourfold (2 × 2) table chi-square test gave *χ*2 = 0.600, *P* > 0.05, indicating no statistically significant difference in the distribution of men and women. The median ages of the construction and validation sets were 41 and 43 years, respectively; the rank-sum test gave Z = -0.128, *P* > 0.05, so age can be considered balanced between the two sets. The median BMI of the construction and validation sets was 22.86 and 23.13, respectively; the rank-sum test gave Z = -0.592, *P* > 0.05, so the two sets can also be considered balanced on this factor. Thus, the construction and validation sets were balanced with regard to gender, age, and BMI. In addition, the central tendency of the total scale scores and EQ-5D health utility values in the construction and validation sets was analyzed and tested. The results showed no statistically significant difference between the two sets for these indicators (*P* > 0.05) (**Supplementary Material Spreadsheet 10**). Thus, the construction and validation sets were both representative of the total sample.
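The stratified 2:1 split and the fourfold table test described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the toy age bands, the random seed, and the choice of the Pearson chi-square without continuity correction are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(records, stratum_of, seed=0):
    """Split records 2:1 within each stratum; returns (derivation, validation)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    derivation, validation = [], []
    for key in sorted(strata):
        members = strata[key][:]
        rng.shuffle(members)
        cut = round(len(members) * 2 / 3)  # two thirds go to the derivation set
        derivation.extend(members[:cut])
        validation.extend(members[cut:])
    return derivation, validation


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: 1200 subjects in four equally sized age bands.
subjects = [{"id": i, "age_band": i % 4} for i in range(1200)]
derivation, validation = stratified_split(subjects, lambda r: r["age_band"])
print(len(derivation), len(validation))  # prints: 800 400
```

Because the cut is applied inside each stratum, the age-band proportions in the two subsets match those of the full sample up to rounding, which is the property the balance tests then check.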

**4.3.4 Alternative model construction**

**(1) Construction of the alternative model**

Six model types were constructed using the EQ-5D-3L health utility values as the dependent variable (**Appendix 3 of the Supplementary Material**). Five econometric methods (OLS, Tobit, GLM, QR, and CLAD) were each used to fit regressions on the construction set of 800 samples. **Spreadsheet 11 in the Supplementary Material** shows the variables involved in the model types and their characteristics.

**(2) Alternative model construction results**

Regression fits were performed on the construction set sample, with EQ-5D-3L health utility values as the dependent variable of the models. **Appendix 4 and Spreadsheet 12 in the Supplementary Material** show the results of the regression fits. Ranked by goodness of fit from lowest to highest, the six model types were: model 1 < model 2 < model 3 < model 4 < model 5 < model 6. Ranked by RMSE from lowest to highest, they were: model 6 < model 5 < model 4 < model 3 < model 2 < model 1. Therefore, model type 6 was the best in terms of goodness of fit. Tobit model 6 had a pseudo R2 of 0.52 (> 0.5), indicating that the nine aspect scores and covariates explained more than 50% of the dependent variable, the health utility value (U). Thus, this model type was used to build the regression equation.

**4.3.5 Optimal model type testing and selection**

**(1) Alternative model prediction results**

The 24 alternative models were applied to the validation set samples to predict EQ-5D-3L utility values, and the predicted values were compared with the observed values. Of the 24 models, only six had mean values lower than the observed mean; only two had upper quartile values lower than the observed upper quartile; and only six had median values higher than the observed median. Only five models had minimum values lower than the observed minimum, and seven had maximum values lower than the observed maximum. In terms of absolute error (AE), model 4 of the QR method had the fewest cases with AE > 0.05 and AE > 0.1, with 62 cases (31%) and 31 cases (15.5%), respectively. Model 1 of the Tobit method had the most cases with AE > 0.05 and AE > 0.1, with 194 (97%) and 175 (87.5%) cases, respectively (**Supplementary Material Spreadsheet 13**).
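The AE counts reported above amount to thresholding the absolute difference between each observed and predicted utility. A minimal sketch follows, assuming AE is defined as |observed − predicted|; the function name and the toy values are hypothetical.

```python
def ae_exceedance(observed, predicted, thresholds=(0.05, 0.1)):
    """Return {threshold: (count, proportion)} of cases with |obs - pred| > threshold."""
    abs_errors = [abs(o - p) for o, p in zip(observed, predicted)]
    n = len(abs_errors)
    result = {}
    for t in thresholds:
        count = sum(1 for e in abs_errors if e > t)
        result[t] = (count, count / n)
    return result


# Hypothetical observed and predicted utilities for four cases.
observed = [1.0, 1.0, 0.8, 0.5]
predicted = [0.98, 0.92, 0.95, 0.5]
summary = ae_exceedance(observed, predicted)
print(summary)
```

A lower count at both thresholds means that fewer individual predictions miss by a practically relevant margin, which is how the models are compared in this paragraph.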

**(2) Comparison of prediction accuracy of alternative models**

Among the OLS, Tobit, and QR regression methods, the MAE and RMSE values of models 4, 5, and 6 were smaller than those of models 1, 2, and 3; that is, regressions using the scores of each CHS Scale aspect as the main-effect independent variables predicted more accurately than those using the CHS Scale total score. The overall MAE ranking was Tobit < GLM < OLS < QR, with model 4 of the QR method being the best. Model 6 was optimal: in each method, adding demographic information (e.g., age, gender, and BMI) or health behavior information (e.g., smoking status and alcohol consumption) as covariates further reduced the RMSE values and improved the prediction accuracy. Combining the MAE and RMSE metrics, the prediction accuracy from highest to lowest was OLS > QR > GLM > Tobit (**Supplementary Material Spreadsheet 14**). Thus, the OLS method with model 6 (OLS6) had the highest accuracy, followed by the QR method with model 6 (QR6). The final mapping model was therefore constructed on the total sample data set using the model form of OLS6.
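The comparison above rests on two error metrics computed on the validation set. The sketch below shows how MAE and RMSE can be computed and used to order candidate models; the model names and values are hypothetical, and sorting by MAE with RMSE as a tie-breaker is an assumption rather than the combination rule used in the study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rank_models(observed, predictions_by_model):
    """Sort model names by (MAE, RMSE), smallest error first."""
    scores = {
        name: (mae(observed, pred), rmse(observed, pred))
        for name, pred in predictions_by_model.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical validation data and two hypothetical candidate models.
observed = [1.0, 0.9, 0.7, 1.0]
candidates = {
    "model_a": [0.9, 0.9, 0.8, 0.9],
    "model_b": [1.0, 0.9, 0.7, 0.8],
}
ranking = rank_models(observed, candidates)
print(ranking)
```

In this toy case model_b has the smaller MAE but the larger RMSE, because RMSE penalizes its single large error more heavily. This is why the two metrics can produce different rankings of the same models.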

**4.3.6 Final mapping model and effectiveness evaluation**

**(1) Final mapping model construction**

First, the OLS method was used to fit the model on the total sample data set, with the main effect of each aspect of the CHS Scale as the independent variables (step 1). Squared terms were then added (step 2), followed by interaction terms (step 3). Finally, based on step 3, age, BMI, gender, smoking status, alcohol consumption, physical exercise status, and diet control status were added as covariates to adjust the model (step 4). **Supplementary Material Spreadsheet 15** shows the fit at each step.

The regression results showed that after the squared terms were added in step 2, the scores of the individual aspects became less significant, but the new squared terms were significant; the goodness of fit R2 increased to 0.4901 (adjusted R2 = 0.4743), and the MAE decreased to 0.0622. After the interaction terms were added in step 3, the physical fitness and health evaluation aspects became significant, and the explanatory power of the model improved further, with R2 and adjusted R2 of 0.5603 and 0.5203, respectively, while the MAE decreased again to 0.059. After the demographic and health behavior covariates were added in step 4, the goodness of fit decreased, with R2 = 0.5325, adjusted R2 = 0.5078, and MAE = 0.0617135, indicating that the covariates did not improve the predictive power or accuracy of the model. Therefore, the step 3 model was identified as optimal and used as the final mapping model for the overall sample.

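$$
\begin{aligned}
\text{EQ-5D-3L} ={} & 0.9879 + 0.053\,\mathrm{EnT} - 0.0199\,\mathrm{PaT} + 0.0263\,\mathrm{DiT} + 0.0384\,\mathrm{StT} - 0.0718\,\mathrm{PiT} \\
& - 0.0251\,\mathrm{SlT} - 0.106\,\mathrm{PhT} - 0.0215\,\mathrm{MoT} + 0.0437\,\mathrm{GhT} \\
& - 0.0759\,\mathrm{EnT}^2 - 0.0357\,\mathrm{PaT}^2 - 0.0380\,\mathrm{StT}^2 \\
& + 0.165\,\mathrm{EnT}\cdot\mathrm{PhT} - 0.0729\,\mathrm{EnT}\cdot\mathrm{GhT} - 0.0829\,\mathrm{DiT}\cdot\mathrm{GhT} \\
& - 0.0809\,\mathrm{StT}\cdot\mathrm{PhT} + 0.1111\,\mathrm{PhT}\cdot\mathrm{GhT}
\end{aligned}
$$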
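The mapping equation can be evaluated directly once the nine aspect scores are available. The sketch below transcribes the coefficients as printed; the function name and the input values are hypothetical, and because this section does not state how the aspect scores are scaled, the example inputs are placeholders rather than realistic CHS scores.

```python
INTERCEPT = 0.9879

# Main-effect coefficients, as printed in the equation.
MAIN = {
    "EnT": 0.053, "PaT": -0.0199, "DiT": 0.0263, "StT": 0.0384, "PiT": -0.0718,
    "SlT": -0.0251, "PhT": -0.106, "MoT": -0.0215, "GhT": 0.0437,
}

# Squared-term coefficients.
SQUARED = {"EnT": -0.0759, "PaT": -0.0357, "StT": -0.0380}

# Interaction-term coefficients; "PhTGht" in the source is read as PhT x GhT.
INTERACTION = {
    ("EnT", "PhT"): 0.165,
    ("EnT", "GhT"): -0.0729,
    ("DiT", "GhT"): -0.0829,
    ("StT", "PhT"): -0.0809,
    ("PhT", "GhT"): 0.1111,
}


def predict_eq5d(scores):
    """Predicted EQ-5D-3L utility from a dict of the nine aspect scores."""
    u = INTERCEPT
    u += sum(c * scores[k] for k, c in MAIN.items())
    u += sum(c * scores[k] ** 2 for k, c in SQUARED.items())
    u += sum(c * scores[a] * scores[b] for (a, b), c in INTERACTION.items())
    return u


# Placeholder input: every aspect score set to zero returns the intercept.
zero_scores = {name: 0.0 for name in MAIN}
print(predict_eq5d(zero_scores))  # prints: 0.9879
```

The function does not clip its output to the theoretical range of the value set, so predictions above 1 are possible, as with any unconstrained OLS mapping.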

Among the main effects of this regression model, the coefficients of PhT and GhT were significant, and the *P*-value of PiT was 0.063, close to the significance threshold. The squared terms EnT2, PaT2, and StT2 were significant, as were the interaction terms EnTPhT, EnTGhT, DiTGhT, StTPhT, and PhTGhT. **Supplementary Material Spreadsheet 16** shows the regression coefficients, *P*-values, and test statistics for each independent variable.

**4.3.7 Final mapping model effectiveness evaluation**

**(1) Results of the distribution of observed values and predicted values**

The predicted values of the final model ranged from 0.1 to 1.045, with a mean of 0.923, a lower quartile of 0.902, a median of 0.962, and an upper quartile of 0.988. A comparison of the distributions shows that both the observed and predicted values fell mostly in the band above 0.8, accounting for 86.83% and 89.50%, respectively, followed by the 0.6-0.8 band, accounting for 8.83% and 7.83%, respectively (**Supplementary Material Spreadsheets 17-18**).

**(2) Correlation analysis between observed and predicted values**

Spearman's rank correlation test was performed because the observed and predicted values did not follow a normal distribution. The test gave r = 0.6181, *P* < 0.0001, so the null hypothesis of no correlation between the observed and predicted values was rejected; the two showed a medium to high correlation.
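Spearman's coefficient is the Pearson correlation computed on ranks, which is why it suits non-normal utility data with many ties at 1. The following is a minimal stdlib sketch; the function names are hypothetical, and ties are given average ranks, which is the usual convention but is not stated in this section.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Two observed utilities tied at 1.0 share the average rank 3.5.
print(average_ranks([1.0, 0.8, 1.0, 0.5]))  # prints: [3.5, 2.0, 3.5, 1.0]
```

The sketch returns only the coefficient; the *P*-value reported in the text would come from a separate significance test, which is not reproduced here.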

**(3) Analysis of prediction error sign direction**

In the full sample, underestimation (e < 0) was more frequent than overestimation (e > 0), with 397 cases (66.16%) and 203 cases (33.83%), respectively. Stratified analysis by observed value showed that underestimation was more severe in the group with observed utility values > 0.8. In the full health state, more than half of the cases (57%) showed underestimation (**Supplementary Material Spreadsheet 19**).
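The sign analysis can be sketched as a simple count over prediction errors. Since the text labels e < 0 as underestimation, the sketch assumes e = predicted − observed. The function name, the tolerance for treating an error as zero, and the toy values are hypothetical.

```python
def sign_direction(observed, predicted, tol=1e-12):
    """Count under-, over-, and exact predictions, with e = predicted - observed."""
    counts = {"under": 0, "over": 0, "exact": 0}
    for o, p in zip(observed, predicted):
        e = p - o
        if e < -tol:
            counts["under"] += 1  # prediction below the observed utility
        elif e > tol:
            counts["over"] += 1   # prediction above the observed utility
        else:
            counts["exact"] += 1
    return counts


# Hypothetical observed and predicted utilities for four cases.
observed = [1.0, 1.0, 0.8, 0.6]
predicted = [0.95, 0.9, 0.85, 0.6]
counts = sign_direction(observed, predicted)
print(counts)
```

Applying the same count within bands of the observed value gives the stratified view used to show where underestimation concentrates.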

**(4) Analysis of prediction accuracy indicators**

Previous studies have shown that, in published mapping models, MAE, RMSE, and ME most commonly fall within [0.0011, 0.19], [0.084, 0.2], and [0.0007, 0.042], respectively [5]. In this study, in the total sample of 1200 cases, MAE = 0.0590, RMSE = 0.1045, ME = 4.64e-09, the proportion of cases with AE > 0.05 was 37.83%, and the proportion with AE > 0.1 was 17.33%. MAE and RMSE fell within the above-mentioned ranges, and ME was below the lower end of its range, placing all three at a favorable level. Therefore, the model prediction accuracy was good and the model fit was acceptable.

A stratified analysis by observed value showed that ME, MAE, and RMSE were largest in the (0.2, 0.40] range, at 0.40557, 0.40557, and 0.44547, respectively, followed by the interval of observed utility values ≤ 0.2. The proportions of cases with AE > 0.05 and AE > 0.1 were also largest in the (0.2, 0.40] range, followed by the interval of observed utility values ≤ 0.2 (**Supplementary Material Spreadsheet 20**).
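The stratified indicators can be reproduced by grouping cases into bands of the observed utility and computing ME, MAE, and RMSE within each band. The sketch below is illustrative only: the band edges beyond those named in the text, the right-closed intervals, the sign convention e = predicted − observed, and all names and values are assumptions.

```python
import math
from bisect import bisect_left

EDGES = [0.2, 0.4, 0.6, 0.8]  # assumed band edges, intervals closed on the right
LABELS = ["<=0.2", "(0.2,0.4]", "(0.4,0.6]", "(0.6,0.8]", ">0.8"]


def band_of(utility):
    """Label of the observed-utility band that contains the given value."""
    return LABELS[bisect_left(EDGES, utility)]


def stratified_errors(observed, predicted):
    """ME, MAE, and RMSE of e = predicted - observed within each observed band."""
    groups = {}
    for o, p in zip(observed, predicted):
        groups.setdefault(band_of(o), []).append(p - o)
    out = {}
    for label, errs in groups.items():
        n = len(errs)
        out[label] = {
            "n": n,
            "ME": sum(errs) / n,
            "MAE": sum(abs(e) for e in errs) / n,
            "RMSE": math.sqrt(sum(e * e for e in errs) / n),
        }
    return out


# Hypothetical cases: two low observed utilities that are both overestimated.
observed = [0.3, 0.4, 1.0, 0.9]
predicted = [0.7, 0.8, 0.95, 0.95]
by_band = stratified_errors(observed, predicted)
print(by_band["(0.2,0.4]"])
```

When every error in a band has the same sign, ME and MAE coincide, which matches the identical ME and MAE values reported for the (0.2, 0.40] range.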