Study Participants and Features
A total of 256 cancer survivors were included in the dataset after preprocessing. Tables 1 and 2 show the mean or prevalence of the sociodemographic and clinical features and of the SAT score features. The mean age of the survivors was approximately 51 years; 64.5% were women; 51.2% earned 4,000,000 won or more per month; 62.1% had a university degree or higher; 60.2% lived in a metropolitan area; 85.5% were married; 63.3% had a religion; 36.3% had breast cancer, 27.0% had lung cancer, 22.3% had colon cancer, 6.6% had gastric cancer, and 7.8% had other cancers; 53.9% had stage III or IV cancer; and 66.4% had completed treatment more than 5 years earlier. The mean SAT baseline and change scores are described in Table 2.
Table 1
Sociodemographic and clinical features in the dataset
| Features | Category | N (%) or Mean ± SD | Min-Max |
|---|---|---|---|
| Age | | 51.28 ± 9.64 | 23-77 |
| Sex | Male | 91 (35.5) | |
| | Female | 165 (64.5) | |
| Monthly income (won) | <4,000,000 | 125 (48.8) | |
| | ≥4,000,000 | 131 (51.2) | |
| Education | High school graduate or lower | 97 (37.9) | |
| | University graduate or higher | 159 (62.1) | |
| Residence | Other areas | 102 (39.8) | |
| | Metropolitan area | 154 (60.2) | |
| Marriage | Single | 37 (14.5) | |
| | Married | 219 (85.5) | |
| Religion | No | 94 (36.7) | |
| | Yes | 162 (63.3) | |
| Cancer type | Breast cancer | 93 (36.3) | |
| | Lung cancer | 69 (27.0) | |
| | Colon cancer | 57 (22.3) | |
| | Gastric cancer | 17 (6.6) | |
| | Other | 20 (7.8) | |
| Cancer stage | I, II | 118 (46.1) | |
| | III, IV | 138 (53.9) | |
| Treatment stage | ≤5 years after treatment | 86 (33.6) | |
| | >5 years after treatment | 170 (66.4) | |
SD, standard deviation
Table 2
Smart Management Strategies for Health Assessment Tool (SAT) features in the dataset
| Features | Mean ± SD | Min to Max |
|---|---|---|
| Baseline | | |
| SAT-C | | |
| Proactive problem-solving strategy | 64.1 ± 18.93 | 10 to 100 |
| Positive-reframing strategy | 68.42 ± 21.44 | 14.8 to 100 |
| Creating empowered relationship strategy | 77.06 ± 18.73 | 11.1 to 100 |
| Experience-sharing strategy | 48.78 ± 25.19 | 0 to 100 |
| SAT-P | | |
| Goal and action setting | 48.68 ± 21.19 | 0 to 100 |
| Rational decision-making strategy | 63.54 ± 17.74 | 22.2 to 100 |
| Healthy environment-creating strategy | 57.32 ± 19.81 | 0 to 100 |
| Priority-based planning strategy | 54.69 ± 21.22 | 0 to 100 |
| Life value-pursuing strategy | 61.12 ± 20.68 | 6.7 to 100 |
| SAT-I | | |
| Self-motivating strategy | 57.63 ± 19.49 | 4.8 to 100 |
| Self-implementing strategy | 54.13 ± 21.58 | 0 to 100 |
| Reflecting strategy | 44.79 ± 27.16 | 0 to 100 |
| Energy-conserving strategy | 58.42 ± 19.27 | 0 to 100 |
| Activity-coping strategy | 57.58 ± 22.70 | 0 to 100 |
| Change | | |
| SAT-C | | |
| Proactive problem-solving strategy | -1.18 ± 16.84 | -100 to 36.7 |
| Positive-reframing strategy | -1.35 ± 18.03 | -91.6 to 51.9 |
| Creating empowered relationship strategy | -4.32 ± 17.38 | -94.4 to 38.9 |
| Experience-sharing strategy | -1.74 ± 24.28 | -88.9 to 66.7 |
| SAT-P | | |
| Goal and action setting | 0.13 ± 19.62 | -90 to 53.3 |
| Rational decision-making strategy | -3.08 ± 17.10 | -88.9 to 55.6 |
| Healthy environment-creating strategy | -0.62 ± 18.09 | -60 to 46.7 |
| Priority-based planning strategy | -1.04 ± 19.83 | -75 to 58.3 |
| Life value-pursuing strategy | -3.33 ± 18.82 | -73.3 to 46.7 |
| SAT-I | | |
| Self-motivating strategy | -1.10 ± 17.99 | -76.2 to 61.9 |
| Self-implementing strategy | -0.42 ± 19.79 | -75 to 58.3 |
| Reflecting strategy | -3.32 ± 27.98 | -100 to 100 |
| Energy-conserving strategy | -0.95 ± 20.78 | -77.8 to 20.78 |
| Activity-coping strategy | -0.31 ± 21.38 | -86.7 to 73.3 |
SD, standard deviation
Table 3 shows the outcome features in the dataset: 88.67% of the survivors had a high global QoL (primary outcome), and 66.41%, 82.03%, 88.67%, and 75.39% had good physical, mental, social, and spiritual health status, respectively (secondary outcomes).
Table 3
Outcome features in the dataset
| Features | Category | N (%) |
|---|---|---|
| Primary outcome | | |
| Global quality of life | High | 227 (88.67) |
| | Low | 29 (11.33) |
| Secondary outcomes | | |
| Physical health status | Good | 170 (66.41) |
| | Bad | 86 (33.59) |
| Mental health status | Good | 210 (82.03) |
| | Bad | 46 (17.97) |
| Social health status | Good | 227 (88.67) |
| | Bad | 29 (11.33) |
| Spiritual health status | Good | 193 (75.39) |
| | Bad | 63 (24.61) |
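Because the primary outcome is imbalanced (227 high vs. 29 low in Table 3), a trivial reference point can help when reading accuracy, F1, and AUPRC values. The following minimal sketch computes what a hypothetical classifier that labels every survivor as "High" would score. Treating "High" as the positive class is an assumption made for illustration; it is not stated in this section.

```python
# Class counts for the primary outcome, taken from Table 3.
n_high, n_low = 227, 29
n_total = n_high + n_low

# Hypothetical baseline: predict "High" for every survivor.
# Assumption: "High" is the positive class.
true_pos, false_pos, false_neg = n_high, n_low, 0

accuracy = true_pos / n_total
precision = true_pos / (true_pos + false_pos)
recall = true_pos / (true_pos + false_neg)
f1 = 2 * precision * recall / (precision + recall)

# For a classifier that ranks survivors at random, the expected AUPRC
# is approximately the prevalence of the positive class.
prevalence = n_high / n_total

print(f"accuracy={accuracy:.4f}  F1={f1:.4f}  prevalence={prevalence:.4f}")
```

Under these assumptions, accuracy and F1 are driven largely by the class balance, which is consistent with the AUROC and AUPRC being treated as the main performance measures.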
Model Performance
Table 4 compares XGBoost with the other tree-based classifiers for the primary outcome (global QoL). XGBoost produced the best results for the main performance measures, i.e., the AUROC and AUPRC, although its AUPRC was tied with those of Random Forest and LightGBM. The accuracy and F1 score of XGBoost were the second best. XGBoost showed an AUROC of 0.80 (95% CI, 0.78 to 0.82), AUPRC of 0.96 (95% CI, 0.95 to 0.97), accuracy of 0.88 (95% CI, 0.84 to 0.92), and F1 score of 0.93 (95% CI, 0.91 to 0.95). The final tuned hyperparameters are described in SI 3.
Table 4
Comparison of prediction performance for the primary outcome across methods
| Methods | AUROC (mean, 95% CI) | AUPRC (mean, 95% CI) | Accuracy (mean, 95% CI) | F1 score (mean, 95% CI) |
|---|---|---|---|---|
| Decision Tree | 0.63 (0.61-0.65) | 0.91 (0.90-0.92) | 0.88 (0.83-0.90) | 0.94 (0.92-0.95) |
| Random Forest | 0.78 (0.76-0.80) | 0.96 (0.95-0.97) | 0.88 (0.87-0.90) | 0.94 (0.93-0.95) |
| Gradient Boosting | 0.78 (0.76-0.80) | 0.95 (0.94-0.96) | 0.87 (0.81-0.92) | 0.93 (0.89-0.95) |
| XGBoost | 0.80 (0.78-0.82) | 0.96 (0.95-0.97) | 0.88 (0.84-0.92) | 0.93 (0.91-0.95) |
| LightGBM | 0.79 (0.77-0.81) | 0.96 (0.95-0.97) | 0.89 (0.88-0.90) | 0.94 (0.94-0.95) |
AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; CI, confidence interval.
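Each metric in Table 4 is reported as a mean with a 95% CI, but this section does not state how the intervals were obtained. As an illustration only, the sketch below computes an AUROC from labels and scores using its rank interpretation, then summarizes several repeated evaluations with a normal-approximation interval (mean ± 1.96 × standard error). The function names, the toy labels and scores, and the per-run AUROC values are all hypothetical, and the interval method is an assumption rather than the procedure used in the study.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% CI across repeated evaluations."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


# Toy example (hypothetical data): 1 = high QoL, 0 = low QoL.
toy_labels = [1, 1, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.7, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)

# Hypothetical AUROC values from five repeated evaluation runs.
run_aurocs = [0.70, 0.74, 0.72, 0.71, 0.73]
m, lower, upper = mean_ci95(run_aurocs)
print(f"toy AUROC={toy_auroc:.3f}  mean={m:.2f}  95% CI=({lower:.3f}, {upper:.3f})")
```

A bootstrap over the test predictions would be an equally plausible way to obtain such intervals; the normal approximation is used here only because it is short.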
Table 5 shows the prediction performance of XGBoost for the health statuses (secondary outcomes). Overall, the predictive performance for the secondary outcomes was slightly lower than that for the primary outcome. For physical health status, the AUROC was 0.66 (95% CI, 0.65 to 0.67), AUPRC 0.77 (95% CI, 0.75 to 0.79), accuracy 0.69 (95% CI, 0.57 to 0.78), and F1 score 0.79 (95% CI, 0.72 to 0.84). For mental health status, the AUROC was 0.71 (95% CI, 0.69 to 0.73), AUPRC 0.90 (95% CI, 0.89 to 0.92), accuracy 0.83 (95% CI, 0.78 to 0.86), and F1 score 0.90 (95% CI, 0.88 to 0.93). For social health status, the AUROC was 0.77 (95% CI, 0.75 to 0.79), AUPRC 0.96 (95% CI, 0.95 to 0.97), accuracy 0.88 (95% CI, 0.84 to 0.92), and F1 score 0.94 (95% CI, 0.92 to 0.96). For spiritual health status, the AUROC was 0.75 (95% CI, 0.74 to 0.76), AUPRC 0.87 (95% CI, 0.85 to 0.89), accuracy 0.79 (95% CI, 0.71 to 0.88), and F1 score 0.87 (95% CI, 0.81 to 0.92).
Table 5
Prediction performance of the XGBoost model for the health status outcomes
| Outcomes | AUROC (mean, 95% CI) | AUPRC (mean, 95% CI) | Accuracy (mean, 95% CI) | F1 score (mean, 95% CI) |
|---|---|---|---|---|
| Physical health status | 0.66 (0.65-0.67) | 0.77 (0.75-0.79) | 0.69 (0.57-0.78) | 0.79 (0.72-0.84) |
| Mental health status | 0.71 (0.69-0.73) | 0.90 (0.89-0.92) | 0.83 (0.78-0.86) | 0.90 (0.88-0.93) |
| Social health status | 0.77 (0.75-0.79) | 0.96 (0.95-0.97) | 0.88 (0.84-0.92) | 0.94 (0.92-0.96) |
| Spiritual health status | 0.75 (0.74-0.76) | 0.87 (0.85-0.89) | 0.79 (0.71-0.88) | 0.87 (0.81-0.92) |
AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; CI, confidence interval.
The receiver operating characteristic and precision-recall curves are depicted in Figures 2 and 3 and SI 4. Figure 2 shows the AUROC and AUPRC for predicting the primary outcome (global QoL) with the XGBoost model, and Figure 3 shows the AUROC and AUPRC for predicting the secondary outcomes (overall health statuses) with the XGBoost models. SI 4 shows the AUROC and AUPRC of the other algorithms for predicting the global QoL, for comparison with XGBoost.
Feature Importance and Individual Prediction
We applied the final tuned XGBoost models to extract SHAP values for each of the primary and secondary outcome models. Figures 4 and 5 show the feature importance bar plots and the beeswarm plots for the top ten features in each model. For predicting the global QoL, the top three features were the activity-coping strategy, the 6-month change in the self-implementing strategy, and the proactive problem-solving strategy. For predicting physical health status, the top three features were the activity-coping strategy, the healthy environment-creating strategy, and age. For predicting mental health status, the top three features were the proactive problem-solving, positive-reframing, and rational decision-making strategies. For predicting social health status, the top three features were the activity-coping, healthy environment-creating, and self-motivating strategies. For predicting spiritual health status, the top three features were the positive-reframing strategy, religion, and income. Thus, the top three features differed across the predicted outcomes. Baseline SAT scores were more important than the 6-month SAT change scores in predicting the outcomes.
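The bar-plot ranking described above is conventionally obtained by averaging the absolute SHAP value of each feature over all samples. The sketch below illustrates that calculation on a small, entirely hypothetical SHAP matrix; the feature names, the values, and the helper `rank_features` are illustrative assumptions and are not taken from the study's models.

```python
def rank_features(shap_rows, feature_names, top_k=3):
    """Rank features by mean absolute SHAP value across samples."""
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values: one row per survivor, one column per feature.
feature_names = ["activity_coping", "age", "religion", "income"]
shap_rows = [
    [0.40, -0.10, 0.05, 0.00],
    [-0.60, 0.20, -0.05, 0.10],
    [0.50, -0.30, 0.00, -0.10],
]

top_three = rank_features(shap_rows, feature_names)
for name, value in top_three:
    print(f"{name}: mean |SHAP| = {value:.3f}")
```

Because the absolute value is taken before averaging, a feature that pushes some predictions up and others down still ranks as important; the beeswarm plot is what shows the direction of each effect.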
We selected two samples from the global QoL model and examined how their individual predictions were composed. Blue arrows indicate variables that contributed to decreasing the predicted global QoL, whereas red arrows indicate variables that contributed to increasing it. In Figure 6(A), the predicted value was 2.07, suggesting that the global QoL was high (true) for this survivor, even though living alone and a low baseline score for the rational decision-making strategy in SAT-P had a negative influence on the QoL. In Figure 6(B), the predicted value was -2.58, suggesting that the QoL was low (false) for this survivor. Although some features had positive effects on the QoL, such as no reduction in the use of the self-implementing strategy over 6 months, the low baseline scores for the proactive problem-solving strategy in SAT-C, the activity-coping strategy in SAT-I, and the self-motivating strategy in SAT-I, together with the reduced use of the activity-coping strategy over 6 months, had larger negative effects on the global QoL. As a result, this survivor’s global QoL was predicted to be low (false).
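The predicted values 2.07 and -2.58 are not probabilities. A common convention for SHAP explanations of gradient-boosted binary classifiers is to report the model output on the log-odds scale, where the prediction equals a base value plus the sum of the per-feature SHAP contributions. Assuming that convention applies here (this section does not confirm it), the sketch below converts the two reported values to probabilities with the logistic function. The base value and contributions passed to `explain` are hypothetical and serve only to show the additive structure.

```python
import math


def margin_to_probability(margin):
    """Convert a log-odds margin to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-margin))


def explain(base_value, contributions):
    """Additive explanation: margin = base value + sum of SHAP contributions."""
    margin = base_value + sum(contributions.values())
    return margin, margin_to_probability(margin)


# Reported predicted values for Figure 6(A) and 6(B), assumed to be log-odds.
p_a = margin_to_probability(2.07)
p_b = margin_to_probability(-2.58)
print(f"Figure 6(A): {p_a:.3f}   Figure 6(B): {p_b:.3f}")

# Hypothetical base value and contributions, for illustration only.
margin, probability = explain(1.0, {"feature_a": 0.8, "feature_b": -0.3})
print(f"margin={margin:.2f}  probability={probability:.3f}")
```

Under this assumption, a positive margin corresponds to a predicted probability above 0.5 (high QoL) and a negative margin to one below 0.5 (low QoL), which matches the high and low labels given for the two survivors.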