Development and validation of machine learning models for the prediction of overall survival and cancer-specific survival in endometrial cancer

DOI: https://doi.org/10.21203/rs.3.rs-2264880/v1

Abstract

Background

Accurate prediction of prognosis is essential for the management of patients with cancer. We aimed to predict the prognosis of endometrial cancer using machine learning.

Methods

We included patients with endometrial cancer in the Surveillance, Epidemiology, and End Results database. We constructed four machine learning models including logistic regression, random forest, gradient boosting machine (XGBoost), and artificial neural network to predict 5-year overall survival (OS) and cancer-specific survival (CSS). The variables included patient demographics (age, race, and year of diagnosis), pathologic factors (clinical stage, histological grade, and TNM classification), and therapeutic factors (surgical content).

Results

Overall, 71,506 patients for OS and 66,368 patients for CSS were included in the study. For the prediction of OS, XGBoost showed the best performance, with a class accuracy of 0.862 (95%CI: 0.859–0.866) and area under the curve (AUC) of 0.831 (95%CI: 0.827–0.836). Regarding the prediction of CSS, XGBoost also showed the best performance with a class accuracy of 0.914 (95%CI: 0.911–0.916) and AUC of 0.867 (95%CI: 0.862–0.871).

Conclusion

Using machine learning, we were able to predict the prognosis of endometrial cancer. Future studies should analyze the important variables and suitable algorithms with larger clinical data.

Introduction

Endometrial cancer is the most common gynecological malignancy in developed countries and the second most common in developing countries [1]. Most women with endometrial cancer are diagnosed with tumors that have a good prognosis because patients often present with early stage endometrial cancer. Regarding the prognosis of endometrial cancers, the 5-year overall survival was 90.3% for FIGO (International Federation of Gynecology and Obstetrics) stage IA, 80.5% for FIGO stage II, 68.5% for FIGO stage IIIA, and 22.0% for FIGO stage IVA. Staging is the most important prognostic factor, and the histology and depth of invasion are related to prognosis [2]. Traditionally, gynecologists have relied mainly on the final FIGO stage, histological type, and differentiation grade to estimate oncologic outcomes [3]. However, other clinical factors, such as patient age and tumor size, also affect prognosis.

Prediction of prognosis is one of the biggest challenges in cancer therapy. Accurate prediction could lead to risk stratification and triage of patients, which could help guide additional treatment and follow-up strategies. Depending on the risk stratification, physicians could customize additional treatment for high-risk patients and reduce treatment and follow-up for low-risk patients. Additionally, accurate prediction of prognosis could be used as an effective tool for decision-making while providing patients with high-quality explanations. Historically, nomograms and logistic regression analyses have been studied as prediction models for the prognosis of endometrial cancer. A nomogram is a predictive tool that creates a simple graphical representation of a statistical model to generate the numerical probability of a clinical event [4]. Several studies have shown that nomograms have better individual discrimination than currently used staging systems in endometrial cancer [5–7].

Currently, machine learning is considered a promising new predictive technique, and there is growing interest in its use in prediction models in medical fields [8, 9]. Machine learning is an area of artificial intelligence that can identify patterns in expanding, heterogeneous datasets to create models that accurately classify a patient's diagnosis or future state [10]. Machine learning models can detect nonlinear correlations among laboratory, demographic, and clinical parameters that cannot be detected by linear methods [11, 12]. By expressing nonlinear relationships among variables, machine learning models can produce more complex predictions. With the rapid development of computer science and continuing technological progress, predictive models built on larger databases should continue to advance in the medical field. We therefore developed prediction models using machine learning algorithms and compared the prediction performance of standard machine learning models.

Materials And Methods

Study population and dataset

We included patients with endometrial cancer diagnosed between January 1, 1988, and December 31, 2018, using the Surveillance, Epidemiology, and End Results (SEER) database (version 8.9.8, National Cancer Institute, USA). SEER is a database on the incidence and survival of cancer in the United States, covering approximately 28% of the US population [13]. The SEER database is national, with information from 18 states, and includes a high proportion of racial/ethnic minorities and foreign-born individuals because of its targeted sampling strategy. The inclusion criteria were as follows: 1) patients with endometrial cancer, and 2) patients with a known cause of death and survival duration after diagnosis. The exclusion criteria were as follows: 1) uterine sarcoma; 2) unknown cause of death with a survival duration of less than 5 years; and 3) unknown duration of survival, or a survival duration of less than 5 years among surviving patients. Access to the SEER database did not require ethical approval and was covered by an open-access policy.

Variables used for the prediction

The following parameters were used as variables for the prediction of prognosis: 1) age, 2) race, 3) year of diagnosis, 4) pathological type, 5) pathological grade, 6) surgical stage, 7) T classification, 8) N classification, 9) M classification, 10) number of pelvic lymph nodes resected during surgery, 11) number of positive pelvic lymph nodes, 12) number of para-aortic lymph nodes resected during surgery, 13) number of positive para-aortic lymph nodes, 14) result of washing cytology, and 15) tumor size. Preprocessing of each variable and the number of missing values are summarized in the Appendix (Supplementary material 1). For each variable, we divided the patients into two groups according to prognosis and performed statistical comparisons. In the statistical comparison of the two groups, Student's t-test was used to analyze significant differences in quantitative parameters, and Pearson's chi-square test was used for qualitative parameters. Statistical significance was set at P < 0.05. The targets of prediction (i.e., the primary endpoints) were 5-year overall survival (OS) and cancer-specific survival (CSS). OS and CSS were defined as the final outcome of the patients, measured from diagnosis to death or loss to follow-up. The time from diagnosis to death or censoring was defined as the survival length.
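
As a minimal sketch of this two-group comparison, the snippet below applies Student's t-test to a continuous variable and Pearson's chi-square test to a categorical variable. It assumes a pandas DataFrame `df` with one row per patient and a binary outcome column `dead_5y`; the column names (`age`, `race`) are illustrative placeholders, not the actual SEER field names.

```python
# Sketch of the two-group statistical comparison (assumed column names).
import pandas as pd
from scipy import stats

def compare_groups(df: pd.DataFrame, outcome: str = "dead_5y") -> None:
    alive = df[df[outcome] == 0]
    dead = df[df[outcome] == 1]

    # Quantitative variable (e.g., age at diagnosis): Student's t-test.
    t_stat, p_val = stats.ttest_ind(alive["age"], dead["age"], nan_policy="omit")
    print(f"age: t = {t_stat:.2f}, p = {p_val:.3g}")

    # Qualitative variable (e.g., race): Pearson's chi-square test on a contingency table.
    table = pd.crosstab(df["race"], df[outcome])
    chi2, p_val, dof, _ = stats.chi2_contingency(table)
    print(f"race: chi2 = {chi2:.2f}, dof = {dof}, p = {p_val:.3g}")
```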

Continuous data were transformed using z-score normalization, and categorical data were transformed using one-hot encoding. Statistical analyses were performed using Python and R statistical software.
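
The preprocessing step can be expressed with scikit-learn as in the sketch below; the column names are illustrative placeholders rather than the actual SEER field names, and this transformer is only one possible way to implement z-score normalization and one-hot encoding.

```python
# Preprocessing sketch: z-score normalization for continuous variables and
# one-hot encoding for categorical variables (illustrative column names).
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

continuous_cols = ["age", "year_of_diagnosis", "tumor_size",
                   "examined_pln", "positive_pln", "examined_pan", "positive_pan"]
categorical_cols = ["race", "pathology", "grade", "stage",
                    "t_class", "n_class", "m_class", "cytology"]

preprocessor = ColumnTransformer([
    ("zscore", StandardScaler(), continuous_cols),                  # (x - mean) / SD
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# X = preprocessor.fit_transform(df[continuous_cols + categorical_cols])
# y = df["dead_5y"].to_numpy()
```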

Development of machine learning model

We developed four standard machine learning classifiers to predict 5-year OS and CSS from the clinical variables mentioned above. The four models were logistic regression (LR), random forest (RF), gradient boosting machine (XGBoost), and artificial neural network (ANN), which are frequently used in predictive tasks. All cases were randomly assigned to the training data (90%) or validation data (10%) using a random number generator. Using stratified k-fold cross-validation, the proportion of survival groups in the training and test sets was kept equal to that of the original dataset. Hyper-parameter tuning was performed by grid search, and the hyper-parameters used in each model are shown in the appendix (Supplementary material 2). Machine learning was implemented in Python (version 3.7) using the scikit-learn (version 1.0) machine learning package and Keras (version 1.2.2). Our study was performed in accordance with the TRIPOD statement, and the checklist is available in the appendix (Supplementary material 3) [14].
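
A condensed sketch of this workflow is shown below, assuming the preprocessed feature matrix `X` and binary labels `y` from the earlier preprocessing sketch. The hyper-parameter grids are illustrative only (the grids actually used are listed in Supplementary material 2), and the Keras ANN is omitted for brevity.

```python
# Model-development sketch: 90/10 split, stratified k-fold cross-validation,
# and grid search over three of the four classifier families (illustrative grids).
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "random_forest": (RandomForestClassifier(), {"n_estimators": [100, 300]}),
    "xgboost": (XGBClassifier(eval_metric="logloss"),
                {"max_depth": [3, 6], "learning_rate": [0.05, 0.1]}),
}

fitted = {}
for name, (estimator, grid) in models.items():
    search = GridSearchCV(estimator, grid, cv=cv, scoring="roc_auc", n_jobs=-1)
    search.fit(X_train, y_train)
    fitted[name] = search.best_estimator_
```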

Evaluation technique of prediction performance

Model performance was evaluated using classification accuracy, area under the curve (AUC), and the Brier score. The classification accuracy was calculated as follows: accuracy = (number of patients correctly predicted as surviving in the survival group + number of patients correctly predicted as dying in the dead group) / total number of cases. The AUC, which corresponds to the concordance index (C-index), measures the entire two-dimensional area underneath the receiver operating characteristic (ROC) curve plotted as described below. We also measured the Brier score, which assesses the accuracy of probabilistic predictions, with lower scores indicating more accurate prediction. The point estimates for each metric were calculated by testing each model on the validation dataset. We generated 95% confidence intervals and p-values using the empirical bootstrap method with 1,000 iterations.
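
The sketch below computes these three metrics on the held-out validation set and derives 95% confidence intervals by bootstrapping 1,000 resamples; `model`, `X_val`, and `y_val` follow the earlier sketches, and the percentile-based interval is an assumption about the exact bootstrap variant used.

```python
# Evaluation sketch: accuracy, AUC, and Brier score with bootstrap 95% CIs.
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score

def bootstrap_metrics(model, X_val, y_val, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    y_val = np.asarray(y_val)
    prob = model.predict_proba(X_val)[:, 1]      # predicted probability of death
    pred = (prob >= 0.5).astype(int)             # class prediction at the 0.5 threshold
    samples = {"accuracy": [], "auc": [], "brier": []}
    n = len(y_val)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample patients with replacement
        if len(np.unique(y_val[idx])) < 2:       # AUC is undefined if only one class drawn
            continue
        samples["accuracy"].append(accuracy_score(y_val[idx], pred[idx]))
        samples["auc"].append(roc_auc_score(y_val[idx], prob[idx]))
        samples["brier"].append(brier_score_loss(y_val[idx], prob[idx]))
    # Point estimate plus percentile 95% confidence interval for each metric.
    return {k: (np.mean(v), np.percentile(v, 2.5), np.percentile(v, 97.5))
            for k, v in samples.items()}
```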

We also assessed the prediction models graphically using receiver operating characteristic (ROC), calibration, and decision curves. ROC curves were created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. Calibration curves were used to evaluate how well calibrated a classifier was by comparing the predicted probabilities of each class label with the actual incidence [15]. Calibration curves were created by plotting the observed frequency of each class against the average predicted probabilities of the models, and they can indicate whether a prediction model is miscalibrated or overfitted. Decision curves are graphical summaries recently proposed to assess the potential clinical impact of risk models for recommending treatment or intervention, and they can be used to determine whether the range of miscalibration is clinically important [16]. Decision curves were created by plotting the net benefit, obtained by subtracting the harm of false positive predictions (weighted by the threshold odds) from the benefit of true positive predictions, against the threshold probability.
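
The points for these three curves can be computed as in the sketch below, which reuses the fitted XGBoost model and the validation split from the earlier sketches; the net-benefit function follows the standard decision-curve definition (net benefit = TP/n − FP/n × pt/(1 − pt)) rather than any code released with the paper.

```python
# Graphical-assessment sketch: ROC points, calibration points, and a decision curve.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_curve

prob = fitted["xgboost"].predict_proba(X_val)[:, 1]               # predicted risk of death
fpr, tpr, _ = roc_curve(y_val, prob)                              # ROC curve points
obs_freq, mean_pred = calibration_curve(y_val, prob, n_bins=10)   # calibration points

def net_benefit(y_true, prob, threshold):
    """Net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    y_true = np.asarray(y_true)
    pred = prob >= threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * threshold / (1.0 - threshold)

thresholds = np.linspace(0.05, 0.95, 19)
decision_curve = [net_benefit(y_val, prob, t) for t in thresholds]
```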

Results

Patients’ demographics

Overall, 71,506 patients with endometrial cancer were included to assess OS, and 66,368 were included to assess CSS. The patient selection process for the OS dataset is shown in Fig. 1, and each clinicopathological variable in the OS dataset population is summarized in Table 1. The mean age at diagnosis was 61.2 years (SD, 12.2 years). Regarding race, White patients comprised 84.9%, Black patients 6.7%, and other races 8.3% of the study population. The most common histological type was endometrioid carcinoma (57.1%), followed by serous adenocarcinoma (6.1%), although the detailed pathology beyond adenocarcinoma was unknown in 29.8% of cases. Regarding pathological grade, grades 1 and 2 were most frequent (67.2%). Most patients presented with early endometrial cancer, with 75.3% classified as stage I. Regarding the TNM classification, T1/T2 (86.5%) was the most frequent, N0 was 90.3%, and M0 was 92.9%. The mean tumor size was 26.0 mm (SD, 37.0 mm). Among the OS population, there were 15,977 (22.3%) deaths during the 5-year follow-up. The median time to event for patients who died of cancer was 23.0 months, and the median time to event in the overall population was 114 months.

Table 1
Baseline characteristics in the 5-year overall survival (OS) dataset

Variable                         N             %
All                              71,506
Mean year at diagnosis           2002 (7.7)
Mean age at diagnosis (years)    61.2 (12.2)
Race
  White                          60,738        84.9
  Black                          4,972         6.7
  Others                         5,976         8.3
Vital status
  Alive                          55,529        77.6
  Dead                           15,977        22.3
Grade
  G1                             28,138        39.3
  G2                             19,971        27.9
  G3                             12,632        17.6
  Missing                        10,765        15.1
Stage
  I                              53,885        75.3
  II                             5,230         7.3
  III                            6,926         9.7
  IV                             5,465         7.6
Pathology
  Endometrioid                   40,805        57.1
  Serous                         4,292         6.1
  Mixed                          2,297         3.2
  Clear cell                     1,047         1.4
  Mucinous                       987           1.3
  Carcinosarcoma                 760           1.1
  Unknown (adenocarcinoma)       21,318        29.8
Surgical staging
  Localized                      53,887        75.3
  Regional                       11,595        16.2
  Distant                        6,024         8.4
T class
  T1                             55,685        77.8
  T2                             6,256         8.7
  T3                             6,100         8.5
  T4                             863           1.2
  TX                             2,602         3.6
N class
  N0                             64,576        90.3
  N1                             4,655         6.5
  N2                             103           0.1
  NX                             2,172         3.1
M class
  M0                             66,491        92.9
  M1                             5,015         7.1
Cytology*
  Negative                       9,779         88.1
  Positive                       1,321         11.9
Mean tumor size* (mm)            26.0 (37.0)
Mean number of examined PLN*     8.3 (9.7)
Mean number of examined PAN*     1.9 (4.1)
Mean number of positive PLN*     0.40 (1.7)
Mean number of positive PAN*     0.30 (1.6)

Data are mean (SD) or n (%).
*) contains missing data
PLN: pelvic lymph nodes; PAN: para-aortic lymph nodes

 

Statistical analysis of variables

Regarding the comparison between the alive and dead groups in the OS population, significant differences were observed in all continuous and categorical variables. The mean values and proportions of each variable in the two groups for OS are shown in Table 2. Among the continuous variables, the mean age at diagnosis in the alive group was lower than that in the dead group (59.2 years vs. 67.9 years), and the mean tumor size was smaller (23.3 mm vs. 34.7 mm). Regarding grade, stage, TNM classes, and cytology, advanced-stage endometrial cancer was clearly more frequent in the dead group. Regarding the pathological types, serous adenocarcinoma, clear cell adenocarcinoma, and carcinosarcoma were more common in the dead group.

Table 2
Statistical comparison of the two groups in the 5-year overall survival (OS) dataset

Variable                      Alive (n = 55,529)   Dead (n = 15,977)   p-value
Year at diagnosis             2002.2 (7.4)         2003.4 (8.3)        < 0.01
Age at diagnosis (years)      59.2 (11.6)          67.9 (11.6)         < 0.01
Race                                                                   < 0.01
  White                       86.2                 80.7
  Black                       5.2                  12.1
  Others                      8.7                  7.2
Grade                                                                  < 0.01
  1                           45.6                 17.3
  2                           29.1                 24
  3                           12.2                 36.4
  Missing                     13.0                 22.3
Stage                                                                  < 0.01
  I                           84.3                 44.5
  II                          6.9                  8.6
  III                         6.9                  19.2
  IV                          1.9                  27.7
Pathology                                                              < 0.01
  Endometrioid                60.9                 43.9
  Serous                      3.3                  15.3
  Mixed                       2.9                  4.2
  Mucinous                    1.5                  1.1
  Clear cell                  0.9                  3.4
  Carcinosarcoma              0.4                  3.5
  Unknown (adenocarcinoma)    30.1                 28.8
Surgical staging                                                       < 0.01
  Localized                   84.3                 44.4
  Regional                    13.2                 26.6
  Distant                     2.5                  28.9
T class                                                                < 0.01
  T1                          86.2                 49
  T2                          7.8                  12.4
  T3                          4.9                  21.1
  T4                          0.3                  4.4
  TX                          0.8                  13.5
N class                                                                < 0.01
  N0                          95.8                 71.2
  N1                          3.5                  17.1
  N2                          0                    0.6
  NX                          0.7                  11.2
M class                                                                < 0.01
  M0                          98.3                 74.5
  M1                          1.7                  25.5
Cytology*                                                              < 0.01
  Negative                    93.7                 69.1
  Positive                    6.3                  30.9
Tumor size (mm)               23.3 (32.1)          34.7 (48.4)         < 0.01
Examined PLN                  9.1 (9.9)            6.3 (8.7)           < 0.01
Positive PLN                  0.18 (1.2)           1.1 (2.6)           < 0.01
Examined PAN                  2.0 (4.2)            1.6 (3.7)           0.037
Positive PAN                  0.17 (1.3)           0.77 (2.2)          < 0.01

Data are mean (SD) or %.
*) contains missing data
PLN: pelvic lymph nodes; PAN: para-aortic lymph nodes

 

Performance of machine learning classifiers

Regarding the prediction of OS, XGBoost showed the best performance with a class accuracy of 0.862 (95%CI: 0.859–0.866) and AUC of 0.831 (95%CI: 0.827–0.836), followed by ANN with a class accuracy of 0.858 (95%CI: 0.853–0.863) and AUC of 0.831 (95%CI: 0.821–0.838). Logistic regression had a class accuracy of 0.841 (95%CI: 0.836–0.846) and AUC of 0.805 (95%CI: 0.796–0.814), while random forest had a class accuracy of 0.836 (95%CI: 0.833–0.839) and AUC of 0.777 (95%CI: 0.771–0.784).

In the prediction of CSS, XGBoost also showed the best performance with a class accuracy of 0.914 (95%CI: 0.911–0.916) and AUC of 0.867 (95%CI: 0.862–0.871), followed by ANN with a class accuracy of 0.907 (95%CI: 0.904–0.908) and AUC of 0.853 (95%CI: 0.847–0.859). Logistic regression had a class accuracy of 0.896 (95%CI: 0.892–0.899) and AUC of 0.837 (95%CI: 0.831–0.844), while random forest had a class accuracy of 0.903 (95%CI: 0.901–0.906) and AUC of 0.833 (95%CI: 0.827–0.836). The metrics for each prediction model are listed in Table 3.

Table 3
The performance of the machine learning models

1) 5-year overall survival (OS)

Prediction model             Accuracy (95% CI)      AUC (95% CI)           Brier score (95% CI)
XGBoost                      0.862 (0.859–0.866)    0.831 (0.827–0.836)    0.105 (0.103–0.108)
Artificial neural network    0.858 (0.853–0.863)    0.831 (0.821–0.838)    0.107 (0.106–0.109)
Logistic regression          0.841 (0.836–0.846)    0.805 (0.796–0.814)    0.118 (0.115–0.120)
Random forest                0.836 (0.833–0.839)    0.777 (0.771–0.784)    0.120 (0.118–0.122)

2) 5-year cancer-specific survival (CSS)

Prediction model             Accuracy (95% CI)      AUC (95% CI)           Brier score (95% CI)
XGBoost                      0.914 (0.911–0.916)    0.867 (0.862–0.871)    0.066 (0.064–0.067)
Artificial neural network    0.907 (0.904–0.908)    0.853 (0.847–0.859)    0.069 (0.066–0.071)
Logistic regression          0.896 (0.892–0.899)    0.837 (0.831–0.844)    0.079 (0.076–0.082)
Random forest                0.903 (0.901–0.906)    0.833 (0.827–0.836)    0.076 (0.074–0.077)

AUC: area under the curve
95% CI: 95% confidence interval

 

Each model showed good calibration, with low Brier scores. XGBoost and ANN showed lower scores than logistic regression and random forest. In the prediction of CSS, XGBoost showed the best Brier score of 0.066 (95%CI: 0.064–0.067), followed by ANN with a Brier score of 0.069 (95%CI: 0.066–0.071). The models showed lower Brier scores for the prediction of CSS than for the prediction of OS.

Graphical assessment for the prediction models

The ROC curves for OS and CSS are shown in Fig. 2. The machine learning models predicted OS and CSS with a high AUC, as mentioned above. Among the four models, XGBoost and ANN showed similarly higher prediction performance than the other two models. Comparing the ROC curves between OS and CSS prediction, the difference between XGBoost/ANN and the other two models was more apparent in CSS prediction, suggesting that the XGBoost and ANN models better captured the relationship between clinicopathological variables and CSS.

The calibration curves demonstrated good agreement between the predicted and observed probabilities of OS and CSS, as shown in Fig. 3. XGBoost and ANN showed high stability and little evidence of overfitting. Figure 3 also provides the decision curve analyses. Regarding the prediction of OS, the net benefit of XGBoost was the highest among the four models, and the gain from XGBoost was particularly large at threshold probabilities between 0.2 and 0.9.

Discussion

To our knowledge, this is the largest study on the use of machine learning models in the field of prognosis prediction for gynecologic cancers. We demonstrated that gradient boosting machine algorithms (XGBoost) outperformed other algorithms in predicting endometrial cancer prognosis in a large dataset. The artificial neural network also showed a performance similar to that of the gradient boosting machine. Machine learning has the potential to be a strong predictive tool for cancer prognosis when using a large dataset. We also showed that the prediction performance of CSS was better than that of OS in this study. This shows that the use of variables more directly related to cancer could improve the prediction performance of cancer prognosis.

The models in our study could serve as tools to help clinicians with decision-making during treatment and with risk stratification of patients. The variables used in this study are routinely obtained after surgical intervention. Thus, when explaining pathological and staging results to patients, our prediction models could be efficient decision-making tools regarding the need for additional radiation or chemotherapy and follow-up strategies after surgery. With the prognosis expressed as a numerical value, patients and physicians could discuss the next steps after surgery.

Currently, risk stratification is a major problem in the management of endometrial cancer. Histopathological evaluation, including subtyping and grading, is the current cornerstone of endometrial cancer classification, providing clinicians with prognostic information. Nonetheless, patients with histologically similar endometrial cancers may have very different outcomes, notably among patients with high-grade endometrial carcinomas [17]. Traditional classifications are limited in predicting response to therapy [18]. Furthermore, molecular classification is currently being explored as a highly reproducible system with strong prognostic value [19, 20]. Leon-Castillo and colleagues commented that FIGO stage was not a significant predictor of recurrence or survival after correcting for molecular subgroups [19]. A new classification system should therefore be evaluated from multiple aspects of endometrial cancer. The construction of accurate prediction models could lead to risk stratification, which could provide cost-effective follow-up for low-risk patients and new treatment strategies for high-risk patients.

Several studies have been published on prediction using machine learning models in patients with endometrial cancer. Praiss and colleagues constructed machine learning models using three clinical factors, TNM stage, grade, and age, from 46,000 patients with endometrial cancer in the SEER database; their model achieved an AUC of 0.838 for the prediction of 3-year overall survival [21]. In contrast, other studies on prediction models for endometrial cancer used small datasets consisting of patients from a single institution. Gunakan and colleagues used pathological data from 726 patients with endometrial cancer to predict lymph node involvement, and their machine learning model achieved an accuracy of 0.84 [22]. For other types of cancer, several machine learning studies using large datasets have been published. Lee and colleagues used 64,000 patients with prostate cancer in the SEER database and constructed machine learning models for the prediction of prostate cancer prognosis; their model achieved an AUC of 0.82 for the prediction of 10-year cancer-specific mortality [23]. When compared with top-ranked multivariable prognostic models, their machine learning models showed better performance. Previous studies using machine learning have thus shown performance with an AUC of 0.8–0.85. In our study, the prediction of 5-year CSS achieved good performance, with an AUC of 0.86 and an accuracy of 0.91. Although it is unclear what level of performance would be required for use in clinical settings, improvement in prediction performance could lead to clinical use as a predictive tool.

The limitations of this study are as follows. First, external validation was lacking. Because the dataset was large, we performed internal validation; however, to accurately evaluate the robustness of the prediction models and avoid overfitting, a fresh dataset should be used for external validation. External validation is necessary to determine a prediction model's reproducibility and generalizability to new and different patients [24]. Second, the clinical variables were limited to those available in the SEER database. Thus, variables such as gravidity, parity, and BMI, which are considered factors that affect the progression of endometrial cancer, were lacking. Additionally, other pathological information, such as the extent of myometrial invasion, the organs involved by metastasis, and the exact location of positive lymph nodes, is also important. Third, other clinical data, such as preoperative imaging data, molecular profiles, tumor markers, or blood examinations, could serve as strong predictive factors. Leon-Castillo and colleagues reported that the integration of molecular classification with clinicopathological features resulted in improved prognostic accuracy in patients with intermediate-risk endometrial cancer [19]. In the field of radiology, several studies using deep learning and preoperative imaging data have been published to predict prognosis [25, 26]. Finally, we did not construct ensemble models from several machine learning algorithms. Ensemble learning can show better prediction performance than any single algorithm [27, 28]. However, before constructing ensemble models from several algorithms, the more suitable algorithms should be identified by comparing individual algorithms on predictive tasks; at present, the exploration of suitable algorithms among several machine learning models is still needed.

The study of prediction models using machine learning is developing rapidly with advances in computer science and the worldwide spread of medical records and databases. In the future, the compilation of databases and further studies could continuously improve prediction performance. Future studies should identify efficient variables for the prediction of prognosis and a suitable algorithm for each cancer field. In this study, we demonstrated the potential of machine learning to predict prognosis in patients with endometrial cancer.

Declarations

Author contributions:

MA conceived and designed the study. KH supervised the study. MA acquired and analyzed the data. KH interpreted the data and was involved in preparing the manuscript. All authors reviewed the manuscript.

Data availability statements:

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Additional Information:

The authors declare no competing interests.

References

  1. Morice P, Leary A, Creutzberg C, Abu-Rustum N, Darai E. Endometrial cancer. Lancet 2016;387:1094–108.
  2. Sorosky JI. Endometrial Cancer. Obstetrics & Gynecology 2008;111:436–47.
  3. Frederic A, Philippe M, Patrick N, Dirk T, Erik VL, Ignace V. Endometrial cancer. Lancet 2005;366:491–505.
  4. Alexia I, Deborah S, Ganesh VR, Katherine SP. How to build and interpret a nomogram for cancer prognosis. J Clin Oncol 2008;26:1364–70.
  5. Abu-Rustum NR, Zhou Q, Gomez JD, et al. A nomogram for predicting overall survival of women with endometrial cancer following primary therapy: Toward improving individualized cancer care. Gynecologic Oncology 2010;116:399–403.
  6. Zhu L, Sun X, Bai W. Nomograms for Predicting Cancer-Specific and Overall Survival Among Patients With Endometrial Carcinoma: A SEER Based Study. Front Oncol 2020;9:269.
  7. Guilan X, Cuifang Q, Wenfang Y, et al. Competing risk nomogram predicting cancer-specific mortality for endometrial cancer patients treated with hysterectomy. Cancer Med 2021;10:3205–13.
  8. Wei-Hsuan LC, Julie MD, Qingnan Y, et al. Developing and validating a machine-learning algorithm to predict opioid overdose in Medicaid beneficiaries in two US states: a prognostic modelling study. Lancet Digit Health 2022;4:e455-65.
  9. Faraz F, Fabian B, Anant D, et al. Identifying and predicting amyotrophic lateral sclerosis clinical subgroups: a population-based machine-learning study. Lancet Digit Health 2022;4:e359-69.
  10. Shah NH, Milstein A, Bagley SC. Making Machine Learning Models Clinically Useful. JAMA 2019;322:1351–2.
  11. Soren SS, Jeffrey CK, Amirhossein A, et al. Development and validation of an ensemble machine learning framework for detection of all-cause advanced hepatic fibrosis: a retrospective cohort study. Lancet Digit Health 2022;4:e188-99.
  12. Somaya H, Gamal E, Wafaa E, et al. Comparison of machine learning approaches for prediction of advanced liver fibrosis in chronic hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinformatics 2018;15:861–8.
  13. Kemi MD, Alfred R, Julie AS. Practical Guide to Surgical Data Sets: Surveillance, Epidemiology, and End Results (SEER) Database. JAMA Surg 2018;153:588–9.
  14. Gary SC, Johannes BR, Douglas GA, Karel GM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594.
  15. Yingxiang H, Wentao L, Fima M, Rodney AG, Lucila OM. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inform Assoc. 2020;27: 621–33.
  16. Andrew JV, Ben VC, Ewout S. Decision Curves, Calibration, and Subgroups. J Clin Oncol 2017;35:472–3.
  17. Lisa V, Vincent S, Remi N, Tjalling B. Incorporation of molecular characteristics into endometrial cancer management. Histopathology 2020;76:52–63.
  18. Rajmohan M, Robert AS, Britta W. Classification of endometrial carcinoma: more than two types. Lancet Oncol 2014;15:e268-78.
  19. Alicia LC, Stephanie MB, Melanie EP, et al. Molecular Classification of the PORTEC-3 Trial for High-Risk Endometrial Cancer: Impact on Prognosis and Benefit From Adjuvant Therapy. J Clin Oncol 2020;38:3388–97.
  20. Alicia LC, Nanda H, Elke EMP, et al. Prognostic relevance of the molecular classification in high-grade endometrial cancer for patients staged by lymphadenectomy and without adjuvant treatment. Gynecol Oncol 2022;164:577–86.
  21. Aaron MP, Yongmei H, Caryn MSC, et al. Using machine learning to create prognostic systems for endometrial cancer. Gynecol Oncol 2020;159:744–50.
  22. Emre G, Suat A, Asuman NH, Irem AK, Ehad G, Ali A. A novel prediction method for lymph node involvement in endometrial cancer: machine learning. Int J Gynecol Cancer 2019;29:320–4.
  23. Changhee L, Alexander L, Ahmed A, David T, Mihaela VS, Vincent JG. Application of a novel machine learning framework for predicting non-metastatic prostate cancer-specific mortality in men using the Surveillance, Epidemiology, and End Results (SEER) database. Lancet Digit Health 2021;3:e158-65.
  24. Chava LR, Kitty JJ, Friedo WD, Carmine Z, Merel VD. External validation of prognostic models: what, why, how, when and where? Clin Kidney J 2020;14:49–58.
  25. Yan Z, Cuilan G, Ling Z, Xiaoyan L, Xiaomei Y. Deep Learning for Intelligent Recognition and Prediction of Endometrial Cancer. J Healthc Eng 2021:26;1148309.
  26. Pier PM, Arnaldo S, Renato C, et al. MRI radiomics: A machine learning approach for the risk stratification of endometrial cancer patients. Eur J Radiol 2022;149:110226.
  27. Cong P, Yurong S, Jinlong Z, et al. Ensemble Learning for Early-Response Prediction of Antidepressant Treatment in Major Depressive Disorder. J Magn Reson Imaging 2020;52:161–71.
  28. Luca B, Francesco M, Alfonso R, Antonella S. An ensemble learning approach for brain cancer detection exploiting radiomic features. Comput Methods Programs Biomed 2020;185:105134.