Machine learning applied to MRI evaluation for the detection of lymph node metastasis in patients with locally advanced cervical cancer treated with neoadjuvant chemotherapy

Concurrent cisplatin-based chemotherapy and radiotherapy (CCRT) plus brachytherapy is the standard treatment for locally advanced cervical cancer (LACC). Platinum-based neoadjuvant chemotherapy (NACT) followed by radical hysterectomy is an alternative for patients with stage IB2–IIB disease. Correct pre-treatment staging is therefore essential to the proper management of this disease. Pelvic magnetic resonance imaging (MRI) is the gold standard examination, but studies on MRI accuracy in the detection of lymph node metastasis (LNM) in LACC patients show conflicting data. Machine learning (ML) is emerging as a promising tool for unraveling complex non-linear relationships between patient attributes that cannot be solved by traditional statistical methods. Here we investigated whether ML might improve the accuracy of MRI in the detection of LNM in LACC patients. We retrospectively analyzed LACC patients who underwent NACT and radical hysterectomy from 2015 to 2020. Demographic, clinical and MRI characteristics before and after NACT were collected, as well as information about post-surgery histopathology. A recursive feature elimination wrapper was used to determine an attribute core set. A ML algorithm, namely Extreme Gradient Boosting (XGBoost), was trained and validated with tenfold cross-validation. The performance of the algorithm was assessed. Our analysis included n. 92 patients. FIGO stage was IB2 in n. 4/92 (4.3%), IB3 in n. 42/92 (45.7%), IIA1 in n. 1/92 (1.1%), IIA2 in n. 16/92 (17.4%) and IIB in n. 29/92 (31.5%). Although LNM was not detected at either pre-treatment or post-treatment MRI in any patient, it occurred in n. 16/92 (17.4%) patients. The attribute core set used to train ML algorithms included grading, histotypes, age, parity, largest diameter of lesion at both pre- and post-treatment MRI, presence/absence of fornix infiltration at pre-treatment MRI and FIGO stage. XGBoost showed a good performance (accuracy 89%, precision 83%, recall 78%, AUROC 0.79).
We developed an accurate model to predict LNM in LACC patients in NACT, based on a ML algorithm requiring few easy-to-collect attributes.


Introduction
Cervical cancer (CC) is the third most common cancer in women worldwide, with 569,000 new cases annually.
The most common histopathologic type of CC is squamous cell carcinoma, representing more than 80% of cervical malignancies. The other histotypes are adenocarcinoma (up to 15%) and adenosquamous carcinoma (less than 5%) [1]. Uncommon histopathologic types are small cell or neuroendocrine, serous papillary and clear cell. Non-squamous presentations are most commonly associated with the worst prognosis [2,3].
While early stage forms are often asymptomatic, symptoms which may occur in locally advanced cervical cancer (LACC) are abnormal vaginal bleeding, pelvic pain, haematuria, dysuria, or haematochezia [4].
In 2018, lymph node (LN) metastasis (LNM) was included in the International Federation of Gynaecology and Obstetrics (FIGO) staging system for the first time [5].
Accordingly, accurate identification of LN status prior to surgery in women with CC is important for treatment planning. Additionally, the presence of LNM is the most relevant risk factor for recurrence and survival [6,7]. The 5-year survival rate of patients with early-stage CC without LNM reaches 90%, while in patients with LNM it is significantly reduced to only 65% [8].
The preoperative radiological evaluation of the LN is fundamental for a correct staging of the patient with CC and consequently for the choice of the most appropriate treatment.
According to the 2018 FIGO staging system, treatment of early-stage forms typically consists of surgery, because chemoradiotherapy, despite being equally effective, exposes patients to more unpredictable long-term side effects and to menopause; patients may undergo surgery only if risk factors requiring adjuvant radiation treatment are not detected [11].
In patients with stage IB2-IIB, platinum-based neoadjuvant chemotherapy (NACT) followed by radical hysterectomy has been suggested as an alternative approach [14,15].
Several studies also report that patients undergoing radical surgery after NACT may achieve better disease control and survival outcomes, with lower toxicity, than patients receiving radiotherapy [16].
In the 2018 FIGO staging system, once LNM is identified by imaging, the tumor will be considered as stage IIIC independently of other findings [5].
LACC patients being referred for NACT followed by radical surgery must have no radiological evidence of LNM.
Specifically, patients showing evidence of LNM may be treated with CCRT rather than surgery as a first option, avoiding surgery followed by adjuvant chemoradiotherapy and the potential serious complications that may arise [17].
Thus, the accuracy of LN status detection in patients with CC needs to be enhanced to ensure appropriate management.
Across several areas in science, machine learning (ML) is emerging as a promising tool for the implementation of complex multi-parametric decision algorithms [18,19]. In that sense, the ML approach is a potential game changer. As a matter of fact, in addition to detecting linear models in analysed data, it can disclose complex non-linear relationships between patient attributes that cannot be handled by traditional statistical methods, merging them to provide a prediction or probability for a given outcome [20].
ML is a step towards precision medicine, which leads to the improvement of patient profiling and treatment personalization. Supervised ML algorithms have been proven effective in predicting treatment responses and disease progression in patients affected with heterogeneous diseases [21,22].
We investigated here whether ML could significantly improve the accuracy of pelvic magnetic resonance imaging (MRI) in the detection of LNM in patients affected by LACC.

Materials and methods
We retrospectively examined LACC patients who underwent NACT and radical hysterectomy from 2015 to 2021.
All patients in our cohort underwent MRI before treatment and, consequently, a pre-treatment radiological stage was set according to FIGO 2018 [5]. All patients had stage IB2, IB3, IIA1, IIA2, or IIB disease (ordinal variable). They also received NACT based on dose-dense weekly paclitaxel plus carboplatin (9 cycles) and underwent a subsequent post-treatment MRI. Treatment response was evaluated by the change in tumor size as determined by Response Evaluation Criteria In Solid Tumors (RECIST v. 1.1, ordinal variable) [23]. In case of complete response (CR) or partial response (PR), the patients underwent radical hysterectomy with pelvic and lumbo-aortic lymphadenectomy. The type of radical hysterectomy was C1. All procedures were performed by open surgery.
Demographic features (age) and clinical characteristics (parity, menopause) were recorded. Additionally, MRI examination data, FIGO stage, RECIST criteria and post-surgical histopathology information (histotypes, grading) were collected. For both pre- and post-treatment MRI we recorded the largest lesion diameter and the presence/absence of infiltration of the fornix and parametrium.
No patient had radiological suspicion of LNM, vesico-vaginal septum infiltration or recto-vaginal septum infiltration.
Overall, the original database included n. 92 patients and n. 13 variables.
This study applied the STARD guidelines [24] and the TRIPOD statement [25].
The algorithms aimed at forecasting the LNM. The pathological finding at surgery was the ground truth.
Student's t test for paired samples or the Wilcoxon matched-pairs signed-rank test was used, as appropriate, to identify differences in continuous variables between observation periods. McNemar's test was used to identify differences in dummy variables between observation periods. The significance level was set at α = 0.05.
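As an illustration of the paired comparison of a dummy variable between two observation periods, the following minimal sketch computes a two-sided exact McNemar p-value from the two discordant-pair counts. The function name `mcnemar_exact` and the counts are hypothetical and are not taken from the study data; whether the exact or the chi-square form of the test was used in the study is not stated.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs positive at the first time point and negative at the second.
    c: pairs negative at the first time point and positive at the second.
    Under the null hypothesis the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of change
    k = min(b, c)
    # Probability of observing k or fewer events in the smaller cell
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: a binary MRI finding present before but absent after
# treatment in 12 patients, and absent before but present after in 3 patients.
p_value = mcnemar_exact(b=12, c=3)
print(f"exact McNemar p = {p_value:.4f}")
```

Only the discordant pairs enter the test, so patients whose finding did not change between the two MRI examinations do not affect the p-value.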
For classification with small training samples and high dimensionality, feature selection plays an essential role in avoiding overfitting and improving classification performance. One commonly used feature selection method for small-sample problems is wrapper feature selection using the recursive feature elimination (RFE) algorithm [26,27]. RFE needs an algorithm to be embedded. Provided with a model with feature coefficients (e.g. regression) or importance factors (e.g. tree algorithms), RFE starts from all features and gradually eliminates the least important feature. Once all features are removed, the algorithm returns the subset that gives the best performance (backward selection). RFE can generate different subsets of features based on various criteria. The subset generated in each step is used to build a model and train the learning algorithm iteratively [26,28]. This is achieved by fitting the given ML algorithm used in the RFE core, ranking features by importance, discarding the least important features, and re-fitting the model (Supplementary Material).
RFE is one of the most commonly used feature selection methods for small-sample problems [29,30] (for further details about RFE, see Supplementary Materials).
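The elimination loop described above can be sketched with scikit-learn's `RFE` class. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic (92 samples, 12 hypothetical candidate predictors), logistic regression is the embedded model, and the target of 8 retained features is fixed by hand rather than chosen by cross-validated performance.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a small, imbalanced clinical dataset (assumption):
# 92 samples, 12 candidate predictors, roughly 17% positive class.
X, y = make_classification(n_samples=92, n_features=12, n_informative=4,
                           n_redundant=2, weights=[0.83], random_state=0)

# The embedded model supplies coefficients that RFE uses to rank features.
estimator = LogisticRegression(max_iter=1000)

# step=1 removes the single least important feature at each iteration,
# then re-fits the model on the remaining features.
selector = RFE(estimator, n_features_to_select=8, step=1)
selector.fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
print("selected feature indices:", selected)
# ranking_ is 1 for retained features; larger values were eliminated earlier.
print("elimination ranking:", selector.ranking_.tolist())
```

To let performance decide how many features to keep, as in the backward selection described in the text, scikit-learn also offers `RFECV`, which wraps the same loop in cross-validation.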
The whole analysis was implemented in a Python 3.9 environment using scikit-learn (ver.0.22.1) and XGBoost (ver. 1.1.0) libraries [31,32]. After z-score normalization, we ran a Bayesian conditional ridge imputation [33] for missing data. The latter method has proved to be the most accurate for imputation of obstetrics and gynecology datasets [34] (see Supplementary Materials for further details).
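A minimal sketch of this preprocessing step is given below. It assumes that the Bayesian conditional ridge imputation corresponds to scikit-learn's `IterativeImputer` with a `BayesianRidge` estimator; the source does not name the class that was used, and the data here are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import StandardScaler

# Synthetic numeric table with about 5% missing entries (assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(92, 4))
X[:, 1] += 0.5 * X[:, 0]            # correlated column, so imputation has signal
mask = rng.random(X.shape) < 0.05   # entries to hide
X_missing = X.copy()
X_missing[mask] = np.nan

# z-score normalization; StandardScaler ignores NaN when computing mean and SD
# and leaves the NaN entries in place.
X_scaled = StandardScaler().fit_transform(X_missing)

# Each incomplete column is regressed on the others with Bayesian ridge
# regression, and the missing entries are replaced by the predictions.
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X_scaled)

print("missing before:", int(np.isnan(X_scaled).sum()))
print("missing after:", int(np.isnan(X_imputed).sum()))
```

Observed values pass through the imputer unchanged; only the hidden entries are filled in.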
Two different classifiers, one linear and one non-linear, were used to train and validate the RFE with tenfold cross-validation to predict LNM: logistic regression (LR) and a tree-based algorithm, extreme gradient boosting (XGBoost). For further details regarding cross-validation, see Supplementary Material.
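To make the tenfold scheme concrete, the sketch below splits 92 sample indices into ten shuffled folds using only the standard library. The helper `kfold_indices` is hypothetical; whether the study shuffled or stratified the folds is not stated here.

```python
import random


def kfold_indices(n_samples: int, n_splits: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(kfold_indices(92, n_splits=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)  # [10, 10, 9, 9, 9, 9, 9, 9, 9, 9]
```

Each patient is held out exactly once, so every prediction used for validation comes from a model that did not see that patient during training. With only 16 positive cases, a stratified split would additionally keep the proportion of positives similar across folds.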
While LR has almost always been the algorithm of choice to find independent predictors in multivariate models, it must be noted that study hypotheses were usually based on the unrealistic assumption that the association between prognostic factors and clinical outcomes is direct and isolated. LR, however, is not suitable for modeling non-independent variables. Therefore, in addition to the usual LR, we employed XGBoost [34]. Tree-based models have recently been proven to accurately predict important women's health outcomes, even in the presence of non-linear patterns in the data [35,36]. Moreover, we chose XGBoost as there is evidence of accurate performance in the case of imbalanced data, as often occurs in clinical datasets [37]. We also ran XGBoost with cost-sensitive training in an attempt to overcome the class imbalance issue.
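One common way to make training cost-sensitive is to weight the minority class by the ratio of negative to positive samples. The sketch below computes that ratio for the class counts reported in this study (16 LNM-positive out of 92). That this value was passed to XGBoost's `scale_pos_weight` parameter is an assumption; the source does not describe how the cost-sensitive training was configured.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive samples, used to up-weight the positive class."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples")
    return negatives / positives


# Class counts reported in the study: 16 LNM-positive and 76 LNM-negative.
labels = [1] * 16 + [0] * 76
weight = scale_pos_weight(labels)
print(weight)  # 4.75
```

With this weight, misclassifying one LNM-positive patient costs the model about as much as misclassifying 4.75 LNM-negative patients.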
A repeated grid-search with cross-validation was used for optimal hyperparameter tuning to maximize the classifiers' performance [38] (See Supplementary Material for hyperparameter fine-tuning).
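A repeated grid search with cross-validation can be sketched as follows. The parameter grid, the synthetic data, the 5-fold by 3-repeat scheme and the use of logistic regression are all assumptions chosen to keep the example small; the actual grids are described in the Supplementary Material, which is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Synthetic stand-in for the selected attribute core set (assumption).
X, y = make_classification(n_samples=92, n_features=8, n_informative=4,
                           weights=[0.8], random_state=0)

# Hypothetical grid over the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Repeating the cross-validation with different shuffles reduces the
# dependence of the chosen hyperparameters on one particular split.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      scoring="roc_auc", cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to a gradient boosting classifier by swapping the estimator and listing its hyperparameters in `param_grid`.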
For each classifier, ROC curves were plotted and the area under the receiver operating characteristic curve (AUROC) was assessed.
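The AUROC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The following minimal sketch computes it directly from that definition; the function name and the toy labels and scores are hypothetical.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 4 positive and 4 negative cases with predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
print(auroc(y_true, y_score))  # 0.75
```

Here 12 of the 16 positive-negative pairs are ordered correctly, which gives 0.75.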
Then, based on the optimal probability cut-off (Youden's Index) [39], classifiers' performance was compared using metrics including accuracy, precision and recall. Generally, a classification model forecasts a binary outcome for a given observation and class. In the prediction process, a model may output the probability that an observation belongs to each possible class [40]. This offers flexibility in the way predictions are interpreted and presented, allowing the choice of a threshold, such as the aforementioned Youden's index.
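Youden's index is J = sensitivity + specificity - 1, and the optimal cut-off is the probability threshold that maximizes J. The sketch below scans the observed scores as candidate thresholds and then reports accuracy, precision and recall at the chosen cut-off. All function names and the toy data are hypothetical.

```python
def confusion(y_true, y_score, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are called positive."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return tp, fp, fn, tn


def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        tp, fp, fn, tn = confusion(y_true, y_score, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:  # on ties, the lowest threshold is kept
            best_t, best_j = t, j
    return best_t, best_j


def metrics(y_true, y_score, threshold):
    """Accuracy, precision and recall at a given probability cut-off."""
    tp, fp, fn, tn = confusion(y_true, y_score, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn),
    }


# Toy example: 5 positive and 4 negative cases.
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cutoff, j = youden_threshold(y_true, y_score)
result = metrics(y_true, y_score, cutoff)
print(cutoff, round(j, 3), result)
```

In this toy example the cut-off is 0.6: four of five positives are detected (recall 0.8) with no false positives (precision 1.0).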
To be reliable, the estimated class probabilities should reflect the true underlying probability of the sample. To check these assumptions, a diagnostic calibration curve for the candidate best classifier was also plotted [41,42].
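A calibration curve groups patients by predicted probability and compares the mean prediction in each group with the observed event frequency. The sketch below builds such a curve with equal-width bins; the function name, the number of bins and the toy data are assumptions, not the study's settings.

```python
def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if members:  # empty bins are skipped
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, observed, len(members)))
    return curve


# Toy example: 10 cases with predicted probabilities and true outcomes.
y_prob = [0.05, 0.1, 0.15, 0.3, 0.35, 0.5, 0.55, 0.85, 0.9, 0.95]
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
curve = calibration_curve(y_true, y_prob, n_bins=5)
for mean_pred, observed, count in curve:
    print(f"predicted {mean_pred:.3f}  observed {observed:.2f}  n={count}")
```

For a well-calibrated classifier the points lie close to the diagonal, that is, the observed frequency in each bin is close to the mean predicted probability. With few patients per bin the observed frequencies are noisy, which is why the agreement is expected to improve with a larger dataset.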

Results
Our analysis included n. 92 patients with diagnosis of LACC.
Demographic and clinical data, MRI parameters and histological examination are shown in Table 1.
Patients had a mean age (± SD) of 48.9 ± 11.5 years at diagnosis and n. 50 (54.3%) patients were premenopausal. All patients underwent NACT based on dose-dense weekly paclitaxel plus carboplatin (9 cycles).
As reported in Fig. 1, the attribute core set used to train the XGBoost algorithm included grading, histotypes, age, parity, largest diameter of lesion at both pre- and post-treatment MRI, presence/absence of fornix infiltration at pre-treatment MRI and FIGO stage (92 rows × 9 columns: n. 8 selected attributes plus n. 1 target class, LNM, as mentioned above).
In Fig. 2, the ROC curves for the LR (box A) and XGBoost (box B) models are reported.
In Fig. 3, calibration diagnostic has been plotted for XGBoost; LNM roughly happened with an observed relative frequency consistent with the forecast value, showing an acceptable calibration curve. We would expect the match between predicted frequencies and observed frequencies to increase with a larger dataset.

Discussion
Since NACT was proposed and applied to the treatment of CC in the 1980s, a large number of studies have focused on it [4,43].
NACT has the potential to treat patients with distant metastases and demonstrates great efficacy in both reducing recurrence and improving survival [44].
In 2019, the Clinical Practice Guidelines in Oncology (NCCN) highlighted that selected patients with FIGO stage IB2-IIB CC may undergo radical hysterectomy or NACT followed by radical hysterectomy [45]; specifically, patients with stage IB1 disease or stage IB2-IIA1 disease with a preserved stromal ring should undergo primary radical surgery, whereas patients with stage IB2-IIA1 disease with an interrupted stromal ring or stage IB3 disease may undergo definitive CCRT or NACT followed by radical surgery. Patients with stage ≥ IIA2 disease should undergo definitive CCRT.
Platinum-based NACT followed by radical hysterectomy has been suggested as an alternative approach to radiotherapy or CCRT in LACC, particularly for those with squamous cell histology, with objective response rates varying from 69.4 to 90.2%, pathological optimal response rates from 21.3 to 48.3%, 5-year disease-free survival (DFS) rates ranging from 55.4 to 71% and 5-year OS rates ranging from 58.9 to 81%, respectively [46][47][48][49].
NACT significantly reduces tumor size and has beneficial effects on stromal invasion depth, parametrial infiltration, lympho-vascular space involvement (LVSI) and LNM, thus lowering the need for adjuvant radiotherapy [50][51][52]. NACT should be indicated mainly for relatively young women, also because of the lower incidence of long-term vaginal toxicity and impaired sexual life [53,54].
The role of LNM in LACC patients has been discussed in several studies.
Lack of LNM represents a favorable prognostic variable for overall survival (OS) in patients treated with this chemosurgical approach.
Gadducci et al. [55] studied predictors of clinical outcome in patients with LACC treated with NACT followed by radical hysterectomy using traditional statistics. The study claimed that an optimal pathological response was the most relevant predictor for DFS and OS. Parametrial involvement and involvement of surgical resection margins were the remaining independent predictors; conversely, LN status and LVSI were associated with DFS and OS.
Concordant data were reported by Uegaki et al.; they demonstrated that pelvic LNM was the only histopathologically independent prognostic factor in patients with LACC treated with NACT followed by radical hysterectomy [56].
Further support comes from Benedetti-Panici et al.; they showed that LNM and parametrial involvement were the only two independent factors for survival in the same patient subset [57].
The lack of suspected LNM at pre-treatment radiological staging is an essential criterion for the choice of NACT. Therefore, the evaluation of pelvic and paraaortic LN is considered one of the main challenges in the management of CC.
For local pre-treatment staging, MRI is the gold standard examination. MRI is useful to define pelvic tumor extent, enabling accurate assessment of both tumor size and stromal invasion depth [58,59].
Conversely, with MRI, metastasis in a normal-sized LN may be missed, and inflammatory LN enlargement cannot be reliably distinguished from cancer infiltration [60].
Thus, the MRI sensitivity in the preoperative assessment of LNM in untreated patients has been reported to be 30-73%, with an overall accuracy of 70-90% [61,62].
Hence, preoperative evaluation of LN status is an extremely important issue to be solved.
We investigated whether ML could improve the accuracy of MRI in the detection of LNM in LACC patients selected to undergo NACT.
LR found that predictors of LNM were the maximal diameter of the tumor (OR 1. Tumor size has been reported as a predictor of LNM by numerous studies. Nanthamongkolkul et al. showed that tumor size > 2 cm was a predictive factor for LNM, as were deep stromal invasion, LVSI and parametrial involvement [63].
Another predictor found by LR analysis was parametrial involvement, which has already been analysed in different studies that indicate it as an important prognostic factor in CC.
Yu et al., in a study analysing 337 patients with stage IA2-IIA2 disease, demonstrated that parametrial extension was associated with LNM [64].
In our study, the XGBoost algorithm found instead that predictors of LNM are grading, histotypes, age, parity, largest diameter of lesion at both pre- and post-treatment MRI, presence/absence of fornix infiltration at pre-treatment MRI and FIGO stage.
FIGO stage has already been evaluated as a predictor of LNM in previous studies.
Wu et al. reported that FIGO stage with serum squamous cell carcinoma antigen (SCC-Ag), histological type of squamous carcinoma and maximal tumor diameter predicted LNM in patients with CC [65].
The importance of grading as a prognostic factor has already been reported in the literature. Minig et al. analyzed 271 patients with stage IA2-IB1 disease. They reported that in tumors < 2 cm without LVSI, the LNM rate was 0%, 5%, and 3.1% in grade 1, 2, and 3 tumors, respectively. No patient with stage IA2-IB1 disease and a grade 1 tumor had any evidence of LNM. In their study, other risk factors, such as tumor grade, histology, depth of stromal invasion, parametrial involvement, and tumor size, were not associated with LNM [66].
Age is another predictor of LNM found by XGBoost. Previously, Kilic et al. reported that age was an independent prognostic factor for death because of disease either in early-stage patients or LACC patients. Risk of death was nearly doubled with younger age (OR 2.693; 95% CI 1.064-6.184) [67].
In the study of Cai et al., a significantly higher risk of LNM was present, at multivariable analysis, in patients with tumor size ≥ 2 cm (OR 4.350; 95% CI 1.197-15.816) and parametrial invasion (OR 8.448; 95% CI 2.487-28.693), whereas other factors such as age, menopausal status, pathological type, deep stromal invasion, LVSI and histological grading were not significantly associated with LNM [68].
Regarding parity, studies have reported that high parity is positively associated with CC. Tekalegn et al. conducted a meta-analysis revealing that women with high parity had 2.65 times higher odds of developing CC compared to their counterparts (OR 2.65, 95% CI 2.08-3.38) [69]. Although there are studies associating high parity with the risk of developing CC, high parity has not yet been associated with LNM in CC patients.
Therefore, our study confirmed the role of grading, histotypes, age, largest diameter of lesion and FIGO stage as classical predictors of LNM and first identified parity and presence/absence of fornix infiltration at pre-treatment MRI as new predictors.
For predictive modeling, XGBoost, a non-linear algorithm, showed a higher performance than LR in terms of TPR (sensitivity) and precision.
The strength of our model is its capability of predicting LNM based on easy-to-gather attributes that are widely available at the time of treatment choice with no additional cost.
Despite good performance, the main limitation of this study is still the sample size. Although our sample size for training and validation is similar to or larger than those published recently [54], it must be noted that ML algorithms perform significantly better when huge cohorts (i.e., thousands of patients) are used for training.

Conclusions
In gynecologic oncology, ML is a step towards precision medicine. By unraveling complex non-linear relationships between patient attributes that cannot be solved by traditional statistical methods, it may lead to improvement of the patient profiling and treatment personalization.
NACT in CC patients should only be performed in well-selected patients and only in patients without radiological suspicion of LNM.
The preoperative evaluation of the LN is fundamental for a correct staging of a patient with CC and therefore for the choice of appropriate treatment.
We developed an accurate model to predict LNM in LACC patients in NACT, based on a ML algorithm requiring few easy-to-collect attributes.
These techniques represent a possible solution that addresses some of the limitations of conventional MRI in LN evaluation in CC patients.
Our results are promising but need to be tested prospectively.