Establishing 3-year or 5-year survival prediction models for different tumors can give clinicians a reference for predicting patient prognosis during treatment. Delen et al. used three different ML algorithms to build a survival status prediction model for breast cancer13. Gong et al. likewise used several ML algorithms to build a 5-year survival status model for esophageal cancer, providing a reference for prognosis prediction in that disease14. However, no AI-based prognosis prediction model has been reported for the survival status of HPSCC patients. To our knowledge, we are the first to build a model that predicts 3-year survival status for HPSCC patients. The model was built on 22 clinical parameters; we compared 12 different ML algorithms and selected the best-performing one (XGBoost) for the final model. The model performs relatively well and has practical value.
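The selection step described above (comparing several candidate algorithms and keeping the best) can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes each candidate was scored by k-fold cross-validated AUC, and the algorithm names and per-fold scores are hypothetical placeholders.

```python
from statistics import mean, stdev


def rank_models(cv_scores):
    """Rank candidate algorithms by mean cross-validated score, best first."""
    summary = [(name, mean(s), stdev(s)) for name, s in cv_scores.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Hypothetical per-fold AUC values (5 folds); placeholders, not study results.
cv_scores = {
    "xgboost": [0.84, 0.81, 0.86, 0.83, 0.85],
    "random_forest": [0.80, 0.79, 0.83, 0.81, 0.82],
    "logistic_regression": [0.76, 0.78, 0.75, 0.77, 0.79],
}

ranking = rank_models(cv_scores)
best_name = ranking[0][0]  # candidate with the highest mean score

for name, avg, sd in ranking:
    print(f"{name}: mean AUC {avg:.3f} (sd {sd:.3f})")
```

Reporting the spread across folds alongside the mean helps show whether the gap between the top candidates is larger than the fold-to-fold variation.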
The clinical features in this study can be divided into three groups. The first was the basic information group, comprising age, gender, smoking, alcohol consumption, and underlying disease at diagnosis. Feature importance analysis of the prediction model showed that age was one of the most important features, in agreement with the finding of Gong et al. in esophageal cancer14. Indeed, HPSCC patients younger than 60 years had a lower probability of death 3 years after diagnosis than those older than 60 years. The remaining four features in this group had lower importance, especially gender, which was not ranked in the feature importance diagram.
The second group was the diagnosis-related group, comprising TNM and clinical stage, pathological information, and the expression of P53 and Ki67 protein. T stage and clinical stage are well-known factors for survival prediction1, and were also important features in our survival status prediction model. Patients with advanced HPSCC were more likely to have a lower probability of survival than those with early-stage disease1. Although we found no statistically significant difference in 3-year survival status between patients with and without cervical lymph node metastasis, the feature importance analysis shows that this parameter is still an important feature for predicting the 3-year survival status of HPSCC patients (ranked sixth). A likely reason is that, for ML algorithms, feature importance is not fully determined by statistical differences; different ML algorithms may evaluate feature importance in their own ways. Because of the case mix in this study, very few patients had distant metastasis at diagnosis (only 4 of 295 cases). This feature is therefore not shown in the feature importance diagram, but because it is a very important clinical parameter, we retained it as one of the prediction features.
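One way a feature can matter to a model without showing a statistical difference on its own is through interaction with another feature. The sketch below illustrates this with purely synthetic counts for two binary features, `a` and `b` (both hypothetical, not variables or data from this study). It is offered as one possible mechanism, not as the explanation for the lymph node finding.

```python
# Synthetic cohort: (a, b) -> (number of patients, number of deaths).
# These counts are invented for illustration only.
counts = {
    (0, 0): (40, 10),
    (1, 0): (40, 20),
    (0, 1): (40, 30),
    (1, 1): (40, 20),
}


def death_rate(cells):
    """Pooled death rate over the given (a, b) cells."""
    n = sum(counts[c][0] for c in cells)
    d = sum(counts[c][1] for c in cells)
    return d / n


# Marginal comparison on `a` alone: both groups have the same death rate.
marginal = {a: death_rate([(a, 0), (a, 1)]) for a in (0, 1)}

# Within each level of `b`, `a` changes the death rate noticeably.
stratified = {cell: death_rate([cell]) for cell in counts}

print("marginal by a:", marginal)
print("by (a, b):", stratified)
```

A tree-based model that first splits on `b` can then split on `a` and gain predictive value from it, even though a comparison of the two `a` groups alone shows no difference.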
We found that patients with poorly differentiated tumors, or with P53 and Ki67 expression in the cancer tissue, had significantly lower survival probability. The poor survival outcome of patients with poorly differentiated HPSCC has been well studied1,4. P53 is a well-studied gene that plays an important role in a variety of tumors; the P53 positive rate in our study was 42.7%, within the range reported by other studies (34%-81%)15–16. Ki67 is a tumor proliferation marker that has also been found to be upregulated in HPSCC, and Ki67 levels are significantly associated with survival outcome in HPSCC17. We also found that model performance declined when the model was built without P53 or Ki67; thus, both are important features for predicting the survival status of HPSCC patients 3 years after diagnosis.
The third group was the treatment-related group. Treatment of HPSCC mainly includes surgery, chemotherapy, and radiotherapy. In this study, 10 treatment-related parameters were included as training features. We found that patients who received preoperative or postoperative radiotherapy / chemotherapy had better survival probability than those who did not. Studies have confirmed the positive value of both chemotherapy and radiotherapy for the prognosis of HPSCC patients4,18−19. We also found that patients who underwent TLM, total laryngectomy, or total hypopharyngeal resection had better survival outcomes than those who did not, whereas there was no significant survival difference between patients who did and did not undergo partial laryngectomy, partial hypopharyngeal resection, or flap reconstruction. One possible reason is that total resection of the larynx and hypopharynx removes the tumor more thoroughly. The better outcome of patients who underwent TLM may be because most of these patients had early-stage disease.
This study has the following limitations: 1) the sample size is relatively small and the data came from a single medical center, so the generalization ability of the model needs further testing; 2) the model applies only to predicting the 3-year survival status of patients with HPSCC and does not predict 5-year survival status. Because the proportion of patients still alive after 5 years is much lower than after 3 years, the class distribution would be too imbalanced for training. We therefore predict only the 3-year survival status of patients with HPSCC.
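The imbalance argument in limitation 2 can be made concrete with a small calculation. The sketch below is illustrative only: the cohort size of 295 comes from the text, but the survivor counts at 3 and 5 years are hypothetical placeholders, and the helper `imbalance_summary` is introduced here for illustration.

```python
def imbalance_summary(n_alive, n_dead):
    """Summarize class balance for a binary survival label."""
    total = n_alive + n_dead
    minority = min(n_alive, n_dead)
    majority = max(n_alive, n_dead)
    return {
        "minority_fraction": minority / total,
        "majority_to_minority": majority / minority,
    }


TOTAL = 295  # cohort size reported in the text

# Hypothetical survivor counts; placeholders, not figures from the study.
three_year = imbalance_summary(n_alive=130, n_dead=TOTAL - 130)
five_year = imbalance_summary(n_alive=45, n_dead=TOTAL - 45)

print("3-year:", three_year)
print("5-year:", five_year)
```

A majority-to-minority ratio of this kind is also the value commonly suggested for XGBoost's `scale_pos_weight` parameter (negative instances divided by positive instances), which is one way to partly compensate for imbalance when the minority class is still reasonably represented.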
In summary, we used the XGBoost algorithm with 22 clinical parameters to build a model that predicts the 3-year survival status of HPSCC patients, and analyzed the role of each parameter in the model. Model performance was relatively satisfactory, and the model offers clinicians a new option for prognosis prediction in HPSCC patients.