Patients’ demographics
Overall, 71,506 patients with endometrial cancer were included in the OS analysis and 66,368 in the CSS analysis. The patient selection process for the OS dataset is shown in Fig. 1, and the clinicopathological variables of the OS dataset are summarized in Table 1. The mean age at diagnosis was 61.2 years (SD, 12.2 years). By race, 84.9% of patients were white, 6.7% were black, and 8.3% were of other races. The most common histological type was endometrioid adenocarcinoma (57.1%), followed by serous adenocarcinoma (6.1%); in 29.8% of cases the tumor was recorded as adenocarcinoma without a detailed subtype. Grades 1 and 2 were the most frequent pathological grades (67.2% combined). Most patients presented with early-stage endometrial cancer, with 75.3% classified as Stage I. In the TNM classification, T1/T2 accounted for 86.5%, N0 for 90.3%, and M0 for 92.9%. The mean tumor size was 26.0 mm (SD, 37.0 mm). In the OS population, 15,977 patients (22.3%) died during the 5-year follow-up. The median time to event for patients who died of cancer was 23.0 months, and the median time to event in the overall population was 114 months.
Table 1
Baseline characteristics in 5-year overall survival (OS) dataset
| | N | % |
|---|---|---|
| All | 71,506 | |
| Mean year at diagnosis | 2002 (7.7) | |
| Mean age at diagnosis | 61.2 (12.2) | |
| Race | | |
| White | 60,738 | 84.9 |
| Black | 4,972 | 6.7 |
| Others | 5,976 | 8.3 |
| Vital status | | |
| Alive | 55,529 | 77.6 |
| Dead | 15,977 | 22.3 |
| Grade | | |
| G1 | 28,138 | 39.3 |
| G2 | 19,971 | 27.9 |
| G3 | 12,632 | 17.6 |
| Missing | 10,765 | 15.1 |
| Stage | | |
| Ⅰ | 53,885 | 75.3 |
| Ⅱ | 5,230 | 7.3 |
| Ⅲ | 6,926 | 9.7 |
| Ⅳ | 5,465 | 7.6 |
| Pathology | | |
| Endometrioid | 40,805 | 57.1 |
| Serous | 4,292 | 6.1 |
| Mixed | 2,297 | 3.2 |
| Clear cell | 1,047 | 1.4 |
| Mucinous | 987 | 1.3 |
| Carcinosarcoma | 760 | 1.1 |
| Unknown (adenocarcinoma) | 21,318 | 29.8 |
| Surgical staging | | |
| Localized | 53,887 | 75.3 |
| Regional | 11,595 | 16.2 |
| Distant | 6,024 | 8.4 |
| T class | | |
| T1 | 55,685 | 77.8 |
| T2 | 6,256 | 8.7 |
| T3 | 6,100 | 8.5 |
| T4 | 863 | 1.2 |
| TX | 2,602 | 3.6 |
| N class | | |
| N0 | 64,576 | 90.3 |
| N1 | 4,655 | 6.5 |
| N2 | 103 | 0.1 |
| NX | 2,172 | 3.1 |
| M class | | |
| M0 | 66,491 | 92.9 |
| M1 | 5,015 | 7.1 |
| Cytology* | | |
| Negative | 9,779 | 88.1 |
| Positive | 1,321 | 11.9 |
| Mean tumor size* (mm) | 26.0 (37.0) | |
| Mean number of examined PLN* | 8.3 (9.7) | |
| Mean number of examined PAN* | 1.9 (4.1) | |
| Mean number of positive PLN* | 0.40 (1.7) | |
| Mean number of positive PAN* | 0.30 (1.6) | |

Data are mean (SD) or n (%)
*) contains missing data
PAN: para-aortic lymph nodes
PLN: pelvic lymph nodes
Statistical analysis of variables
In the OS population, the alive and dead groups differed significantly in all continuous and categorical variables. The mean values and proportions of each variable in the two groups are shown in Table 2. Among the continuous variables, patients in the alive group were on average 8.7 years younger at diagnosis than those in the dead group (mean: 59.2 years vs. 67.9 years), and their tumors were 11.4 mm smaller (mean: 23.3 mm vs. 34.7 mm). For grade, stage, TNM class, and cytology, advanced disease was clearly more frequent in the dead group. Among the pathological types, serous adenocarcinoma, clear cell adenocarcinoma, and carcinosarcoma were more common in the dead group.
Table 2
Statistical comparison of two groups in 5-year overall survival (OS) dataset
| | Alive | Dead | p-values |
|---|---|---|---|
| All (n = 71,506) | (n = 55,529) | (n = 15,977) | |
| Year at diagnosis | 2002.2 (7.4) | 2003.4 (8.3) | < 0.01 |
| Age at diagnosis | 59.2 (11.6) | 67.9 (11.6) | < 0.01 |
| Race | | | < 0.01 |
| White | 86.2 | 80.7 | |
| Black | 5.2 | 12.1 | |
| Others | 8.7 | 7.2 | |
| Grade | | | < 0.01 |
| 1 | 45.6 | 17.3 | |
| 2 | 29.1 | 24 | |
| 3 | 12.2 | 36.4 | |
| Missing | 13.0 | 22.3 | |
| Stage | | | < 0.01 |
| Ⅰ | 84.3 | 44.5 | |
| Ⅱ | 6.9 | 8.6 | |
| Ⅲ | 6.9 | 19.2 | |
| Ⅳ | 1.9 | 27.7 | |
| Pathology | | | < 0.01 |
| Endometrioid | 60.9 | 43.9 | |
| Serous | 3.3 | 15.3 | |
| Mixed | 2.9 | 4.2 | |
| Mucinous | 1.5 | 1.1 | |
| Clear cell | 0.9 | 3.4 | |
| Carcinosarcoma | 0.4 | 3.5 | |
| Unknown (adenocarcinoma) | 30.1 | 28.8 | |
| Surgical staging | | | < 0.01 |
| Localized | 84.3 | 44.4 | |
| Regional | 13.2 | 26.6 | |
| Distant | 2.5 | 28.9 | |
| T class | | | < 0.01 |
| T1 | 86.2 | 49 | |
| T2 | 7.8 | 12.4 | |
| T3 | 4.9 | 21.1 | |
| T4 | 0.3 | 4.4 | |
| TX | 0.8 | 13.5 | |
| N class | | | < 0.01 |
| N0 | 95.8 | 71.2 | |
| N1 | 3.5 | 17.1 | |
| N2 | 0 | 0.6 | |
| NX | 0.7 | 11.2 | |
| M class | | | < 0.01 |
| M0 | 98.3 | 74.5 | |
| M1 | 1.7 | 25.5 | |
| Cytology* | | | < 0.01 |
| Negative | 93.7 | 69.1 | |
| Positive | 6.3 | 30.9 | |
| Tumor size (mm) | 23.3 (32.1) | 34.7 (48.4) | < 0.01 |
| Examined PLN | 9.1 (9.9) | 6.3 (8.7) | < 0.01 |
| Positive PLN | 0.18 (1.2) | 1.1 (2.6) | < 0.01 |
| Examined PAN | 2.0 (4.2) | 1.6 (3.7) | 0.037 |
| Positive PAN | 0.17 (1.3) | 0.77 (2.2) | < 0.01 |

Data are mean (SD) or n (%)
PAN: para-aortic lymph nodes
PLN: pelvic lymph nodes
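The size of the between-group difference for a continuous variable can be checked from the summary statistics in Table 2 alone. The sketch below does this for age at diagnosis using Welch's unequal-variance t statistic. This section does not state which test the authors used, so the choice of Welch's test and the helper name `welch_t_from_summary` are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from group summaries."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first group mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second group mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Age at diagnosis from Table 2: dead group (first) vs. alive group (second).
t_age, df_age = welch_t_from_summary(67.9, 11.6, 15977, 59.2, 11.6, 55529)

# With tens of thousands of degrees of freedom the t distribution is
# practically normal, so a normal tail is used for the two-sided p-value.
p_age = 2 * (1 - NormalDist().cdf(abs(t_age)))

print(f"t = {t_age:.1f}, df = {df_age:.0f}, p = {p_age:.3g}")
```

With groups of this size, even small mean differences reach p < 0.01, so the absolute differences (8.7 years, 11.4 mm) are more informative than the p-values themselves.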
Performance of machine learning classifiers
Regarding the prediction of OS, XGBoost showed the best performance with a class accuracy of 0.862 (95%CI: 0.859–0.866) and AUC of 0.831 (95%CI: 0.827–0.836), followed by ANN with a class accuracy of 0.858 (95%CI: 0.853–0.863) and AUC of 0.831 (95%CI: 0.821–0.838). Logistic regression had a class accuracy of 0.841 (95%CI: 0.836–0.846) and AUC of 0.805 (95%CI: 0.796–0.814), while random forest had a class accuracy of 0.836 (95%CI: 0.833–0.839) and AUC of 0.777 (95%CI: 0.771–0.784).
In the prediction of CSS, XGBoost also showed the best performance with a class accuracy of 0.914 (95%CI: 0.911–0.916) and AUC of 0.867 (95%CI: 0.862–0.871), followed by ANN with a class accuracy of 0.907 (95%CI: 0.904–0.908) and AUC of 0.853 (95%CI: 0.847–0.859). Logistic regression had a class accuracy of 0.896 (95%CI: 0.892–0.899) and AUC of 0.837 (95%CI: 0.831–0.844), while random forest had a class accuracy of 0.903 (95%CI: 0.901–0.906) and AUC of 0.833 (95%CI: 0.827–0.836). The metrics for each prediction model are listed in Table 3.
Table 3
The performance of machine learning models
1) 5-year overall survival (OS)

| Prediction model | Accuracy (95% CI) | AUC (95% CI) | Brier score (95% CI) |
|---|---|---|---|
| XGBoost | 0.862 (0.859–0.866) | 0.831 (0.827–0.836) | 0.105 (0.103–0.108) |
| Artificial neural network | 0.858 (0.853–0.863) | 0.831 (0.821–0.838) | 0.107 (0.106–0.109) |
| Logistic regression | 0.841 (0.836–0.846) | 0.805 (0.796–0.814) | 0.118 (0.115–0.120) |
| Random forest | 0.836 (0.833–0.839) | 0.777 (0.771–0.784) | 0.120 (0.118–0.122) |

2) 5-year cancer-specific survival (CSS)

| Prediction model | Accuracy (95% CI) | AUC (95% CI) | Brier score (95% CI) |
|---|---|---|---|
| XGBoost | 0.914 (0.911–0.916) | 0.867 (0.862–0.871) | 0.066 (0.064–0.067) |
| Artificial neural network | 0.907 (0.904–0.908) | 0.853 (0.847–0.859) | 0.069 (0.066–0.071) |
| Logistic regression | 0.896 (0.892–0.899) | 0.837 (0.831–0.844) | 0.079 (0.076–0.082) |
| Random forest | 0.903 (0.901–0.906) | 0.833 (0.827–0.836) | 0.076 (0.074–0.077) |

AUC: area under the curve
95% CI: 95% confidence interval
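For readers who want to reproduce metrics of the kind reported in Table 3, the following is a minimal sketch of class accuracy, AUC, and a percentile confidence interval. It rests on several assumptions: the outcome is coded 1 for death within 5 years, predictions are probabilities of that outcome, class labels are assigned at a 0.5 cutoff, and the interval comes from a bootstrap. This section does not say how the authors derived their intervals, and the toy data are hypothetical.

```python
import random


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of patients whose thresholded prediction matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true, y_prob):
    """AUC as the chance that a random event case outranks a random non-event case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)  # ties count half
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric, y_true, y_prob, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # AUC is undefined without both classes
            continue
        stats.append(metric(ys, [y_prob[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical toy data (1 = died within 5 years), not taken from the study.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

acc_value = accuracy(y_true, y_prob)
auc_value = auc(y_true, y_prob)
auc_low, auc_high = bootstrap_ci(auc, y_true, y_prob)
print(f"accuracy = {acc_value:.3f}, AUC = {auc_value:.3f} ({auc_low:.3f}-{auc_high:.3f})")
```

The pairwise AUC loop is quadratic in the number of patients, so for a dataset of the size analyzed here a rank-based implementation from a statistics library would be the practical choice.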
Each model showed good calibration, with low Brier scores. XGBoost and ANN showed lower scores than logistic regression and random forest. In the prediction of CSS, XGBoost showed the best Brier score of 0.066 (95%CI: 0.064–0.067), followed by ANN with a Brier score of 0.069 (95%CI: 0.066–0.071). All models showed lower Brier scores for the prediction of CSS than for the prediction of OS.
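The Brier score is the mean squared difference between the predicted probability and the observed binary outcome, so lower values are better. As a point of reference, the sketch below computes the score of an uninformative model that assigns every patient the overall 5-year death rate of the OS dataset (15,977 of 71,506). The function name `brier_score` and this reference model are illustrative assumptions, not part of the reported analysis.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Outcome vector implied by the OS dataset counts: 1 = died within 5 years.
deaths, total = 15977, 71506
outcomes = [1] * deaths + [0] * (total - deaths)

# Reference model: every patient receives the overall death rate.
death_rate = deaths / total
reference_brier = brier_score(outcomes, [death_rate] * total)

print(f"death rate = {death_rate:.3f}, reference Brier score = {reference_brier:.3f}")
```

Because this reference value equals rate × (1 − rate), it depends on how common the event is. Brier scores for outcomes with different event rates are therefore not directly comparable without such a reference.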
Graphical assessment for the prediction models
The ROC curves for OS and CSS are shown in Fig. 2. As noted above, the machine learning models predicted OS and CSS with high AUCs. Among the four models, XGBoost and ANN showed similar prediction performance, both higher than that of the other two models. The gap between XGBoost/ANN and the other two models was more apparent in the ROC curves for CSS than in those for OS, suggesting that XGBoost and ANN better capture the relationship between the clinicopathological variables and CSS.
The calibration curves demonstrated good agreement between the predicted and observed probabilities of OS and CSS, as shown in Fig. 3. XGBoost and ANN appeared to be stable, with little overfitting. Figure 3 also shows the decision curve analyses. For the prediction of OS, the net benefit of XGBoost was the highest among the four models, and its gain was particularly large at threshold probabilities of risk between 0.2 and 0.9.
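The net benefit plotted in a decision curve follows the standard definition: true positives per patient minus false positives per patient, with false positives weighted by the odds of the threshold probability. The sketch below applies that definition to hypothetical toy data and compares the model with the "treat all" strategy. The function names and data are assumptions for illustration, and outcome 1 is taken to mean death within 5 years.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk meets the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data (1 = died within 5 years), not taken from the study.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

for threshold in (0.2, 0.5, 0.8):
    model_nb = net_benefit(y_true, y_prob, threshold)
    all_nb = net_benefit_treat_all(y_true, threshold)
    print(f"threshold {threshold:.1f}: model {model_nb:+.3f}, treat all {all_nb:+.3f}")
```

A decision curve repeats this calculation over a grid of thresholds. A model is useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, which is the net benefit of the "treat none" strategy.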