Background: Soft tissue sarcoma is a rare and highly heterogeneous tumor. Its pathological grade is a key factor in patient prognosis and treatment planning, but clinical data on soft tissue sarcoma are class-imbalanced. In this paper, we propose an approach for identifying the best-performing imbalanced-learning model for predicting the pathological grade of soft tissue sarcoma.
Methods: A large number of radiomics features are first extracted from T1-weighted images (T1WI). We then explore combinations of different sampling techniques, feature selection methods, and classification algorithms, yielding nine imbalanced-learning models built on these features. Receiver operating characteristic (ROC) curve analysis is used to evaluate the performance of these nine models in predicting the pathological grade of soft tissue sarcoma.
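To illustrate what a sampling technique does in such a pipeline, the following is a minimal sketch of plain random oversampling, which duplicates minority-class cases until the classes are balanced. It is a simplified stand-in and not the ROSE procedure or the authors' implementation; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [x for x, y in zip(features, labels) if y == cls]
        # Draw with replacement to close the gap to the majority class.
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Toy data (assumed for illustration): 6 low-grade (0) and 2 high-grade (1)
# cases with 2 features each.
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.2, 1.2], [0.1, 0.8], [0.4, 1.0],
     [0.9, 0.2], [1.0, 0.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, resampling of this kind is usually applied only to the training portion of the data, so that duplicated cases do not leak into the evaluation set.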
Results: The combination of the extremely randomized trees classifier with random oversampling examples (ROSE) and recursive feature elimination performs best among the nine models. For predicting high-grade versus low-grade soft tissue sarcoma, this model achieves an area under the receiver operating characteristic curve (AUC) of 0.9457, an accuracy of 94.61%, a sensitivity of 93.55%, a specificity of 95.59%, and a G-mean of 0.9456.
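The reported metrics can be related to each other with a short sketch. It assumes the usual definition of G-mean as the geometric mean of sensitivity and specificity, which the abstract does not state explicitly; the function names and the confusion-matrix counts are hypothetical.

```python
import math


def sensitivity(tp, fn):
    """True positive rate: correctly identified positives / all positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: correctly identified negatives / all negatives."""
    return tn / (tn + fp)


def g_mean(sens, spec):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    return math.sqrt(sens * spec)


# Hypothetical confusion-matrix counts, for illustration only.
toy = g_mean(sensitivity(tp=9, fn=1), specificity(tn=18, fp=2))

# G-mean recomputed from the sensitivity and specificity reported above.
reported = g_mean(0.9355, 0.9559)
print(round(reported, 4))
```

Under this definition, the reported sensitivity and specificity give a G-mean that agrees with the reported 0.9456 to four decimal places. Because G-mean is low whenever either class is poorly recognized, it is less easily inflated by the majority class than accuracy is on imbalanced data.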
Conclusion: Accurate, noninvasive preoperative prediction of the pathological grade of soft tissue sarcoma is essential. The proposed machine learning method helps address the imbalanced-data classification problem and can support the development of personalized treatment plans for patients with soft tissue sarcoma.