We developed and validated a novel diagnostic model for gastric GISTs using XGBoost on a single-center retrospective dataset. The predictors selected by the initial XGBoost model were the ratio of long to short diameter on CT, the CT value of the tumor, the enhancement of the tumor in the arterial and venous phases, and the presence of liquid areas and calcified areas within the tumor on EUS. The model yielded satisfactory results on the test dataset used for validation.
Current guidelines around the world recommend enhanced CT, endoscopy and EUS as the primary diagnostic modalities, while confirming whether a tumor is a GIST still relies on EUS-guided or CT-guided fine-needle aspiration biopsy. Although the guidelines recommend fine-needle aspiration biopsy for patients with suspected GISTs, preoperative biopsy is not widely adopted in some countries and regions because of limited resources or other reasons, and the guidelines also state that preoperative biopsy can be 'omitted' or is 'not necessary' for limited, resectable SMTs[11, 12]. In addition, fine-needle aspiration biopsy may give false-negative results because of its small specimen size. Therefore, although this invasive test has been shown to be safe and does not lead to tumor rupture or dissemination within the GI tract, its clinical application rate remains very low (2/124 in our database). For patients without a preoperative biopsy result, the misdiagnosis rate is rather high (34/122 in our database), which indicates that enhanced CT, endoscopy or EUS alone is not accurate enough to diagnose GIST[3, 4].
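As a worked illustration of the two proportions quoted above, the following minimal Python sketch converts the counts 2/124 and 34/122 into percentages and attaches a 95% Wilson score interval. The interval method and the function name `proportion_with_wilson_ci` are assumptions chosen for illustration; they are not part of the original analysis.

```python
import math


def proportion_with_wilson_ci(events, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    Hypothetical helper for illustration; z=1.96 corresponds to a 95% level.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    p = events / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Counts taken from the text: 2 of 124 patients had a preoperative biopsy,
# and 34 of the remaining 122 patients were misdiagnosed.
biopsy_rate = proportion_with_wilson_ci(2, 124)
misdiagnosis_rate = proportion_with_wilson_ci(34, 122)

for label, (p, lower, upper) in [
    ("Preoperative biopsy rate", biopsy_rate),
    ("Misdiagnosis rate without biopsy", misdiagnosis_rate),
]:
    print(f"{label}: {p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The point estimates are about 1.6% and 27.9%. With a sample of this size the intervals are wide, so both rates should be read as rough estimates.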
Recently, many studies have demonstrated the influence of peripheral blood indicators of systemic inflammation or nutrition on the long-term prognosis of various cancers, including GISTs, after surgical resection[9, 13–16]. It has been suggested that higher-risk tumors, including GISTs, have a stronger impact on patients' nutritional state and inflammatory levels. Compared with other benign or low-malignancy gastrointestinal SMTs, GISTs would therefore be expected to lower nutritional indicators and raise inflammatory indicators to a greater degree. For this reason, we included peripheral blood indicators of systemic inflammation and nutrition in our initial analysis. Not surprisingly, however, we found that these hematological test data, except for ALT, had little effect on the outcome. As there is currently no evidence that changes in ALT level alone affect the diagnosis of GIST, we excluded all hematological test data from subsequent model development.
Recent years have witnessed a surge in the application of artificial intelligence in medicine, where it assists in disease detection, diagnosis and treatment decision-making. With the concept of precision medicine having been promoted for years, the use of machine learning algorithms to support clinical diagnosis and treatment has become an inevitable trend. However, data-driven models do not always match clinical reality. Selecting an appropriate machine learning algorithm is crucial to obtaining meaningful and useful results, yet it is not an easy process. Ensemble-based classifiers generally outperform single classifiers in analyzing how combinations of factors influence an outcome. For complex, nonlinear, multi-feature models such as clinical prediction models, tree-boosting machines perform better and provide both the importance and the ranking of each factor simultaneously[19, 20]. The XGBoost algorithm has been applied in various clinical studies to construct disease prediction models and has shown good validation results[21, 22]. We therefore chose XGBoost for our model development.
The foremost achievement of this research is the development of a clinical diagnostic model for GIST. All patients included in our dataset were initially diagnosed with gastric GIST after preoperative examinations, and the model showed satisfactory validation results in this setting. The model outputs the importance of each predictor and suggests that the presence of liquid areas within the tumor on EUS is the most important predictor, followed by the ratio of long to short diameter on CT, the CT value of the tumor, the enhancement of the tumor in the arterial and venous phases, and the presence of calcified areas within the tumor on EUS. All the data we used to develop the model came from the patients' preoperative clinical examinations and hematological tests, which cause no additional pain or economic burden for the patients.
The main limitation of this study is that it is a single-center, small-sample, retrospective study that included only 124 patients. Large-scale, multi-center studies, ideally nationwide, are required to develop more accurate models. In addition, the analysis using the gradient-boosting machine in this study was entirely data-driven, and clinical results may differ from mathematical calculation. At present, no statistical algorithm can predict the exact clinical outcome of every patient with absolute accuracy. Moreover, this model is suitable only for patients whose initial clinical diagnosis is gastric GIST and who cannot undergo biopsy before surgery for some reason. For patients who are not initially suspected of having GIST or who have an SMT outside the stomach, this model may yield inaccurate or even contrary predictions. The model is only a tool to assist clinical diagnosis: it gives doctors an interpretation of the clinical test results and helps them reach the final diagnosis and decide on interventions.