BeOTs, BOTs, and eMOTs differ markedly in treatment and prognosis. The lack of techniques for characterizing ovarian tumors early is associated with increased mortality in women. Because the radiographic findings of BOTs resemble those of BeOTs, BOTs are frequently misdiagnosed, and their therapeutic needs and prognoses are probably underestimated (26, 27). In addition, BeOTs are sometimes hard to distinguish from MOTs, for example in ovarian cystadenoma (28). Accurately distinguishing these three kinds of ovarian tumors is therefore essential for improving therapeutic outcomes and prolonging survival. Currently, the CT report is fundamental to the diagnosis and evaluation of ovarian disease and cannot be replaced by routine or random noninvasive and invasive testing, which may be neither comprehensive nor precise enough. However, radiologists who interpret CT images with the naked eye may overlook some imaging features, introduce interobserver bias, or be too subjective. Hence, an accurate preprocedural diagnostic model that identifies the type of ovarian tumor when it is first found as an adnexal mass would support personalized clinical decisions. With that in mind, the goal of this study was to explore and verify a machine-aided diagnostic model based on CECT image features and machine learning algorithms, and to compare the diagnostic performance of different algorithms. Compared with conventional diagnostic methods, this model is objective, convenient, and noninvasive. We explored the feasibility of five three-class machine learning models built on CECT-based radiomics features to differentiate the nature of ovarian tumors. The RF-based model exhibited the best diagnostic performance in both datasets (the training and test cohorts).
Studies have shown that imaging-based radiomic frameworks can provide valuable and reliable information on the nature of ovarian tumors. An ultrasound-based ML model has shown an accuracy of over 85% in differentiating BeOTs from MOTs (29). MR is multiparametric and offers better soft-tissue resolution than other imaging procedures. An algorithm to aid MRI in distinguishing BeOTs from MOTs has been recommended by the European Society of Urogenital Radiology (ESUR) (30). Zhang et al. showed that radiomics features extracted from multiple MR sequences performed favorably in differentiating BeOTs, BOTs, and MOTs, and outperformed radiologists. More importantly, MR radiomics compensated for radiologists' difficulty in distinguishing BeOTs from BOTs (31). More recently, Song et al. found that radiomics features from 7 different DCE-MR images discriminated well, with AUCs of 0.89, 0.94, and 0.89 for BeOT, BOT, and MOT, respectively (32). MR radiomics shows good diagnostic performance not only in vivo but also in in vitro tissue of human ovarian tumors (33). However, CT-based radiomic research has focused principally on the relationship between survival and radiomic features (20, 34–39), and BOTs are seldom investigated. In contrast to previous research, this study built a CECT-based radiomic three-class preoperative diagnostic model for analyzing the nature of primary ovarian tumors. Our results support previous findings that radiomics has diagnostic power for discriminating among BeOTs, BOTs, and MOTs.
In this study, because the CT images of the two cohorts were acquired on different multidetector CT scanners, gray-level normalization of each VOI was performed to limit scanner-related variability. CECT-based radiomic features were extracted with algorithms defined by the IBSI, with wavelet transform and quantization applied to obtain effective features. WMW, LASSO, and SVM were then used to select the most suitable radiomics features. The results of multiple classifiers (LR, SNN, RF, KNN, and DT) were compared to choose the best one in the test cohort. Average AUC and discrimination AUC were used to assess model performance. In the training cohort, all five models achieved excellent performance. In LOOCV, SNN also performed excellently, and the other four models performed very well. In the test cohort, all models except DT achieved good performance. These results suggest that the models not only performed excellently in the training cohort but also generalized well to the test cohort. The decline in diagnostic performance across the three evaluations (training, LOOCV, and test) was mainly attributable to the uneven distribution of pathological types. The RF model obtained the highest average AUC (0.78/0.81) in the test cohort, and its average gap between training and test AUC (0.20) was smaller than that of LR, DT, and SNN. Our results indicate that, except for the BOT group, the one-vs-all ROC curves of the other two groups showed good or very good discrimination, which echoes previous findings that BOTs are more difficult to distinguish from other ovarian neoplasms (26, 27). The one-vs-all ROC curves of RF and KNN had sufficient discriminative ability to separate BOTs, which require aggressive treatment to prevent further progression, again indicating that radiomics captures the link between images and ovarian neoplasms more deeply than visual interpretation does (32).
RF showed higher discrimination performance than KNN, with AUCs of 0.89 and 0.83 for BeOTs and MOTs, respectively.
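The two-stage feature selection mentioned above (a WMW filter followed by LASSO) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the one-vs-rest form of the Mann-Whitney U test, the significance threshold, the LASSO penalty, the function names, and the synthetic data are all assumptions made for illustration, and the SVM-based selection step is omitted.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler


def wmw_filter(X, y, alpha=0.05):
    """Keep features whose one-vs-rest Mann-Whitney U test is significant
    for at least one class (assumed scheme for a three-class problem)."""
    keep = np.zeros(X.shape[1], dtype=bool)
    for cls in np.unique(y):
        in_cls = y == cls
        for j in range(X.shape[1]):
            p = mannwhitneyu(X[in_cls, j], X[~in_cls, j],
                             alternative="two-sided").pvalue
            keep[j] |= p < alpha
    return np.flatnonzero(keep)


def lasso_filter(X, y, penalty=0.05):
    """Keep features with a nonzero LASSO coefficient for at least one
    one-hot class indicator (penalty value is an arbitrary assumption)."""
    Xs = StandardScaler().fit_transform(X)
    onehot = (y[:, None] == np.unique(y)[None, :]).astype(float)
    coef = Lasso(alpha=penalty, max_iter=10000).fit(Xs, onehot).coef_
    return np.flatnonzero(np.any(coef != 0, axis=0))


def select_features(X, y):
    """Apply the WMW filter first, then LASSO on the surviving features."""
    first = wmw_filter(X, y)
    second = lasso_filter(X[:, first], y)
    return first[second]


# Synthetic stand-in for a radiomics feature matrix (not study data):
# 150 cases, 20 features, of which only features 0 and 1 carry class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(150, 20))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * (y == 1)

selected = select_features(X, y)
print("selected feature indices:", selected)
```

In practice the thresholds would be tuned inside cross-validation so that the selection step does not see the test cohort.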
Varying the quantization level of images to improve diagnostic performance has rarely been reported in the differentiation of ovarian tumors. Moreover, accurate texture features were extracted by applying different quantization levels, of which 7-bit and 8-bit are relatively rarely used. Vamvakas et al. applied 8-bit quantized images to acquire textural features that contributed to MR glioma grading (40). Takahiro et al. selected the top 5 features for MR glioma grading and found that 3 of them came from 7-bit or 8-bit quantized images (41). In our study, 58 selected features were highly associated with the diagnosis of ovarian tumors, and 44 of them required image quantization. Of these, 17 came from 4-bit, 15 from 5-bit, and 10 from 8-bit quantized images, and the remaining 2 came from 6-bit and 7-bit quantized images. This suggests two points: first, different quantization levels need to be set to acquire accurate texture features; second, effective features may come from a few rarely used quantization levels.
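To make the notion of an n-bit quantization level concrete, the sketch below maps the intensities of a VOI to 2^n integer gray levels before texture analysis. It assumes uniform, fixed-bin-number quantization between the VOI minimum and maximum; the scheme actually used in this study is not specified here, and the function name `quantize` is hypothetical.

```python
import numpy as np


def quantize(voi, bits):
    """Map VOI intensities to integer gray levels 0 .. 2**bits - 1
    using equally wide bins between the VOI minimum and maximum."""
    voi = np.asarray(voi, dtype=float)
    lo, hi = voi.min(), voi.max()
    levels = 2 ** bits
    if hi == lo:
        # A constant VOI has a single gray level.
        return np.zeros(voi.shape, dtype=int)
    scaled = (voi - lo) / (hi - lo) * levels
    # The maximum intensity lands exactly on `levels`; fold it into the top bin.
    return np.minimum(scaled.astype(int), levels - 1)


# Toy intensities (not study data), quantized at the levels discussed above.
toy = np.array([0, 50, 100, 255])
for bits in (4, 5, 6, 7, 8):
    print(bits, "bit:", quantize(toy, bits))
```

Coarser levels such as 4-bit merge nearby intensities and suppress noise, while 7-bit and 8-bit levels preserve finer texture, which is one reason features from several levels may be complementary.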
Combining machine learning with multiclass classification in radiomics can be viewed as a novel diagnostic approach to identifying ovarian tumors. Splitting a multiclass model into several binary models is a standard step in solving multiclass problems, and the one-vs-all strategy effectively extends binary classification algorithms to multiclass problems. The diagnostic performance of the models was evaluated by ROC curve and AUC in the training cohort, and their generalization performance was then assessed in the test cohort. Finally, discrimination was assessed by one-vs-all AUC and average AUC. Among the five ML models, the RF-based model exhibited the best performance in differentiating BeOTs, BOTs, and MOTs, with a macro/micro average AUC of 0.99/0.98 and an accuracy of 0.89 in the training cohort. In the test cohort, it achieved an AUC of 0.89 for BeOTs, 0.60 for BOTs, and 0.83 for eMOTs (Fig. 4b). Although no previous studies have combined multiclass classification and ML to differentiate ovarian lesions on CT images, Song et al. applied MR radiomics-based 3-class classification to discriminate ovarian neoplasms and reported test-cohort AUCs of 0.82, 0.81, and 0.85 for the BeOT, BOT, and MOT groups, respectively, indicating a more effective method (32). There are perhaps three reasons why their results were better than ours. First, MR has higher soft-tissue resolution, so CECT features carry much less information than MR features. Second, radiomics features from 7 different DCE-MR images were used to construct their 3-class model, which therefore had more information for describing lesions. Third, eMOTs are more difficult to identify than MOTs in most cases.
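The one-vs-all evaluation described above can be sketched as follows: each class in turn is treated as positive against the other two, a binary AUC is computed per class, and the results are summarized as macro and micro averages. This is an illustration on synthetic data with scikit-learn, not the study's code; the helper name `one_vs_all_auc`, the dataset, and the RF hyperparameters are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize


def one_vs_all_auc(y_true, proba, classes):
    """Return per-class one-vs-all AUCs plus macro and micro averages.

    `proba[:, k]` must hold the predicted probability of `classes[k]`.
    """
    onehot = label_binarize(y_true, classes=classes)
    per_class = {c: roc_auc_score(onehot[:, k], proba[:, k])
                 for k, c in enumerate(classes)}
    # Macro: unweighted mean of the per-class AUCs.
    macro = float(np.mean(list(per_class.values())))
    # Micro: pool every (case, class) decision into one binary problem.
    micro = float(roc_auc_score(onehot.ravel(), proba.ravel()))
    return per_class, macro, micro


# Synthetic three-class data standing in for radiomics features (not study data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

per_class, macro, micro = one_vs_all_auc(y_te, rf.predict_proba(X_te),
                                         list(rf.classes_))
print("per-class AUC:", per_class)
print("macro AUC:", macro, "micro AUC:", micro)
```

With imbalanced classes the macro and micro averages can diverge, because the micro average is dominated by the larger classes while the macro average weights a small class, such as BOTs here, equally.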
Although the CT-based model may be less effective than the MR-based model, the negative effect is likely to be small from a clinical perspective, because CT is still the routine method for preoperative staging (42, 43). In addition, the portal venous phase, the only phase considered in this study, is the phase most frequently used in diagnosing ovarian tumors (44). Lastly, CT-derived features are generally considered more reliable than MR-derived features (45). Chen et al. built a three-class CT radiomic model for risk stratification of gastrointestinal stromal tumors and reported average AUCs of 0.84 and 0.83 in the training and internal validation cohorts, respectively (46). This suggests that three-class models can achieve efficient, reliable diagnostic performance not only in ovarian tumors but also in other tumors.
This study has several limitations. First, all patients came from a single centre and only 258 patients were recruited, so the robustness and reliability of the ML model for differentiating BeOTs, BOTs, and MOTs need further investigation with multicenter prospective validation. Second, the 3D tumors were segmented manually, which is time-consuming, subjective, and potentially unstable, and the model's performance is likely to be affected. Thus, an accurate and reliable automatic segmentation method should be applied to delineate the VOI. Third, we built a pure radiomics diagnostic model without considering clinical characteristics or subjective CT findings. Although these two feature sets have often been considered inferior to radiomics models in previous studies, a comprehensive model combining radiomics, clinical characteristics, and subjective CT findings may perform better.