Study population and variable selection
Patients with primary early-stage EOC diagnosed between 2010 and 2015 were identified in the SEER database. The records of patients diagnosed with EOC were retrieved from the SEER 18 registry database using SEER*Stat 8.3.5 software. The inclusion criteria were as follows: 1) cases limited to stage T1 disease11; 2) the ICD-O-3 morphology codes ‘8020–8022, 8441–8442, 8460–8463, 9014’; ‘8470–8472, 8480–8481, 9015’; ‘8380–8383, 8570’; and ‘8290, 8310, 8313, 8443–8444’, used to identify women with serous, mucinous, endometrioid, and clear cell ovarian tumors, respectively; 3) primary tumor site limited to C569. The exclusion criteria were as follows: 1) unknown grade, N stage, or site-specific factor 1; 2) multiple primary cancers.
Variables were grouped according to the actual clinical situation and previously reported cutoff values. The following factors assessed at diagnosis were included: age (<55 and ≥55 years), race (White, Black, other/unknown), marital status (married, unmarried), histology (serous, mucinous, endometrioid, clear cell), grade (well differentiated, moderately differentiated, poorly differentiated, undifferentiated), laterality (unilateral, bilateral), tumor size (<5, 5–10, >10 cm), CA125 (normal, elevated), distant metastasis (M0, M1) and regional lymph node status (N0, N1) defined according to the American Joint Committee on Cancer seventh edition (AJCC 7th), primary site surgery (yes, no), radiotherapy (yes, no/unknown), and chemotherapy (yes, no/unknown).
Categorical variables were one-hot encoded, and continuous variables were transformed into z-scores. Missing data were imputed using the non-parametric missForest method12. The ratio of positive to negative samples in the training set was approximately 1:20, and this class imbalance can substantially degrade predictor performance. Therefore, after data cleaning and feature processing, the Synthetic Minority Oversampling Technique (SMOTE) was used to address the imbalance; SMOTE has been applied in machine-learning applications in healthcare and can improve classifier performance13. The data were anonymized, and therefore the requirement for informed consent was waived by the Institutional Review Board of the Second Affiliated Hospital of Wenzhou Medical University.
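As an illustration of the preprocessing described above, the following is a minimal Python sketch. The DataFrame `df`, its column names, and the label `outcome` are hypothetical, and the missForest imputation step is not reproduced here; this is not the study's actual pipeline.

```python
# Minimal preprocessing sketch (hypothetical column names; illustrative only).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

# df is assumed to hold the SEER-derived cohort; 'outcome' is the binary label.
categorical_cols = ["race", "marital_status", "histology", "grade",
                    "laterality", "ca125", "m_stage", "n_stage",
                    "surgery", "radiotherapy", "chemotherapy"]
continuous_cols = ["age", "tumor_size"]

# One-hot encode categorical variables.
X = pd.get_dummies(df[categorical_cols + continuous_cols],
                   columns=categorical_cols)

# Transform continuous variables into z-scores.
X[continuous_cols] = StandardScaler().fit_transform(X[continuous_cols])
y = df["outcome"]

# Oversample the minority class with SMOTE (applied to training data only).
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
```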
Model construction
All patients were randomly divided into a training set (70%), a validation set (15%), and a test set (15%). Pairwise correlations between variables were measured with Pearson’s correlation test, and the results were presented as a heat map. Six ML algorithms were employed: logistic regression (LR), support vector machine (SVM), multilayer perceptron classifier (MLPClassifier), Gaussian naive Bayes (GaussianNB), Extreme Gradient Boosting (XGBoost), and random forest (RF). We performed 10-fold cross-validation on the training set, and hyperparameters were tuned by grid search. The validation set was used to adjust the model parameters, whereas the test set was used to evaluate the performance of the final system. The final models were evaluated with confusion-matrix-derived metrics: accuracy, sensitivity, specificity, F1 score, and area under the receiver operating characteristic (ROC) curve (AUC).
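The following Python sketch outlines this workflow under stated assumptions: one representative model (RF) is shown, `X` and `y` stand for the preprocessed data from the previous section, and the hyperparameter grid is illustrative rather than the grid actually searched in the study.

```python
# Workflow sketch: 70/15/15 split, grid-searched 10-fold CV, held-out evaluation.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

# 70% training, 15% validation, 15% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)

# Hyperparameter tuning with 10-fold cross-validation on the training set
# (illustrative grid, not the grid reported in the study).
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid={"n_estimators": [100, 300],
                                "max_depth": [3, 5, None]},
                    cv=10, scoring="roc_auc")
grid.fit(X_train, y_train)
best_rf = grid.best_estimator_

# Evaluate the tuned model on the held-out test set.
y_pred = best_rf.predict(X_test)
y_prob = best_rf.predict_proba(X_test)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Sensitivity:", recall_score(y_test, y_pred))
print("Specificity:", tn / (tn + fp))
print("F1 score:", f1_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_prob))
```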
Model interpretation
Feature contributions toward model predictions were evaluated with Shapley Additive Explanations (SHAP), a game-theory-based approach for elucidating feature importance in any fitted ML model. With the SHAP method, predictors are ranked by their SHAP values; the most important feature is the one with the largest mean absolute SHAP value. Features with positive SHAP values push the model output higher, and vice versa. SHAP values were computed with the Python shap package to interpret model predictions and visualize the results. The present study referred to presentation videos and guidelines from the SHAP website (http://www.shap.ecs.soton.ac.uk/).
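As a minimal sketch of this interpretation step, assuming the tuned tree-based model `best_rf` and the feature matrix `X_test` from the workflow above, SHAP values could be computed and summarized as follows:

```python
# SHAP-based interpretation sketch for a fitted tree-ensemble model.
import shap

# TreeExplainer is suited to tree ensembles such as random forest and XGBoost.
explainer = shap.TreeExplainer(best_rf)
shap_values = explainer.shap_values(X_test)

# For a binary scikit-learn classifier, shap_values may be returned per class;
# keep the values for the positive class before plotting.
if isinstance(shap_values, list):
    shap_values = shap_values[1]

# Beeswarm summary: features ranked by mean absolute SHAP value; colour encodes
# the feature value, horizontal position its impact on the model output.
shap.summary_plot(shap_values, X_test)
```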
Statistical analysis
The SVM, LR, GaussianNB, and RF models were implemented with the Python scikit-learn package, the XGBoost model with the xgboost package, and the MLPClassifier with TensorFlow. All statistical analyses were performed in R (version 3.6.8, R Foundation for Statistical Computing) and Python (version 3.7, Python Software Foundation).