In this hypothesis-generating study, we evaluated 11 models built on clinicopathological variables and radiomic features from MRI and ultrasound for DFS prediction in early-stage cervical cancer. LightGBM outperformed the other machine learning algorithms, and radiomic features were identified as the most effective predictors. ROC analysis showed that model CR5, which included all variables, performed best, with an AUC of 0.86 and an ACC of 0.92. Overall, this study provides new evidence for combining clinicopathological and radiomic features in a LightGBM model, which may help clinicians individualize post-treatment strategies, from active monitoring to chemotherapy and radiotherapy.
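As an illustration of how the two reported metrics differ, the following minimal sketch computes AUC and ACC in plain Python. It is not the code used in this study: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so it does not depend on a threshold, whereas ACC does.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at a fixed score threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predicted) if y == p)
    return correct / len(labels)


# Hypothetical toy data (1 = event, 0 = no event); not taken from the study.
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60, 0.40, 0.90]

toy_auc = roc_auc(toy_labels, toy_scores)   # 15 of 16 pairs ranked correctly
toy_acc = accuracy(toy_labels, toy_scores)  # 8 of 10 cases correct at 0.5
print(f"AUC = {toy_auc:.4f}, ACC = {toy_acc:.2f}")
```

Because ACC depends on the chosen threshold and on class balance while AUC depends only on the ranking of scores, the two values need not move together, which is why both are usually reported side by side.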
MRI is commonly used to assess tumor size, the extent of invasion into adjacent structures, and lymph node metastasis in cervical cancer, and surgery is then considered when the tumor is confined to the cervix or upper vagina. The role of MRI-based radiomics and its possible applications in CC have received increasing attention recently. Cai et al. (Cai et al., 2021) established a radiomics model based on T2WI and DWI and demonstrated good predictive ability for progression-free survival, with a C-index of 0.803 in the training cohort and an AUC of 0.795 in the validation cohort. Fang et al. (Fang et al., 2020) similarly confirmed the value of MRI radiomic features for DFS prediction by analyzing selected features in 248 patients with early-stage cervical cancer.
Technical advances in US have led to its application in CC. Belitsos et al. (Belitsos et al., 2012) demonstrated the accuracy of ultrasound in differentiating precancerous lesions from cervical cancer, which improved diagnostic performance. Jin et al. (Jin et al., 2020) constructed a radiomic model based on US images to predict LNM in early-stage CC, and the model showed satisfactory ability to discriminate LNM status. Furthermore, Fischerova et al. (Fischerova et al., 2008) compared US with MRI for the assessment of tumor size. The accuracy of US was 90.5%, whereas that of MRI was 81.1% (P ≤ 0.049), supporting the utility of US. Notably, combining imaging techniques might play a significant role in tumor prediction (Millischer et al., 2015; Theodore et al., 2017). Several investigators (Millischer et al., 2015; Moro et al., 2020; Theodore et al., 2017) showed that MRI combined with US could improve CC characterization and the detection of parametrial infiltration compared with MRI alone. In addition, Moro et al. (Moro et al., 2020) provided further evidence for the value of combining techniques in cervical cancer diagnosis.
However, as tumor research advances, a growing number of variables have become available from different domains, such as radiology, laboratory testing, pathology and genetics. In this context, Cai et al. (Cai et al., 2021) found that a combined model was superior to radiomic features or clinical features alone for CC survival prediction. This finding was then reproduced by Zheng et al. (Zheng et al., 2022) and Zhang et al. (Zhang et al., 2022), who also demonstrated that the model performed better when radiomic and clinical variables were incorporated together. Our study is concordant with these reports in showing that the predictive model incorporating all variables had the best performance. We also calculated the relative importance of each variable for DFS prediction and found that the top eight factors were all radiomic features, comprising 4 features from US images and 4 from DWI images. The same result was obtained after unsupervised clustering analysis was applied. Interestingly, none of the clinicopathological variables were among these selected subsets. We consider this an important finding, consistent with previous studies (Cai et al., 2021; Zheng et al., 2022) in which radiomic features had more impact on performance than clinical features. The outstanding performance of radiomic features indicates that the information hidden in the images might reveal disease features and reflect underlying pathophysiology. Importantly, in previous studies radiomics has yielded promising results for outcomes such as survival, tumor progression and genetic mutations (Hou et al., 2020; Wang et al., 2020a; Yu et al., 2020; Zhang et al., 2022) when combined with ML models, which are data-driven methods for mining implicit clinical value from image features.
To date, the most commonly used classification model in clinical research is logistic regression. Although it has yielded good results, its range of application is limited. As a generalized linear model, logistic regression handles linear relationships between variables well. Our study is in line with this, showing good performance of the logistic model in DFS prediction. However, the characteristics are related not only linearly but also non-linearly. This may explain why LightGBM performed better than the logistic model in this study, which supports the reliability of our findings to some extent. LightGBM is a recently developed ML model that can handle a larger number of variables while preserving model accuracy. Although this algorithm has been successfully used in research on other cancers (Dong et al., 2020; Vamvakas et al., 2022), it has not yet been applied to cervical cancer. In our study, the LightGBM model achieved an average AUC of 0.86, suggesting that it can overcome the limited efficacy of traditional ML models in improving CC diagnostic accuracy.
Because the supervised machine learning approach using all variables proved to be an accurate prognostic prediction tool for the management of CC, we additionally performed unsupervised clustering analysis for prognosis. Interestingly, unsupervised clustering based on all variables also effectively segregated CC subgroups that were clearly associated with prognosis. Similarly, a recent study (Kawakami et al., 2019) of 334 patients with epithelial ovarian cancer showed that unsupervised clustering analysis could identify subgroups with significantly worse survival.
Although this study highlights the potential of integrating radiomics with LightGBM, some limitations deserve mention. First, the sample size was small because patients without MRI or US images at this hospital were excluded, which might limit the performance of the model. Second, the study was performed without internal or external validation, which weakens the confirmation of model accuracy. A more comprehensive model, built on a larger sample and with multicenter external validation, should be established in future work.
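One common form of the internal validation mentioned above is stratified k-fold cross-validation, in which each fold preserves the proportion of events so that a small cohort with few recurrences is not split unevenly. The following is a minimal standard-library sketch of such a split, offered only as an illustration: the function name `stratified_k_fold`, the choice of k = 5, the random seed, and the toy labels are assumptions and do not describe any procedure used in this study.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int = 5,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs with balanced classes.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every fold receives a near-equal share of each class.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    splits = []
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f in range(k) if f != i for j in folds[f])
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical cohort: 10 events and 40 non-events; not the study data.
toy_labels = [1] * 10 + [0] * 40
splits = stratified_k_fold(toy_labels, k=5, seed=42)

for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    events = sum(toy_labels[i] for i in test_idx)
    print(f"fold {fold_number}: {len(train_idx)} train, "
          f"{len(test_idx)} test, {events} events in test")
```

In such a scheme, a model would be fitted on each training split and scored on the matching test split, and the spread of the per-fold scores gives a rough indication of how stable the reported performance is. External validation on data from other centers would still be needed to assess generalizability.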