Because clinical characteristics and therapy options vary, the survival outcomes of LSCC differ among patients. Using data from 671 patients with advanced LSCC, we developed the first machine learning model to predict disease-free survival (DFS) in advanced LSCC patients. Both the Cox regression model and the random survival forest (RSF) showed good predictive ability.
Although head and neck squamous cell carcinomas (HNSCCs) are treated similarly, their clinical outcomes differ greatly. LSCC lacks identifiable early signs, which makes its early detection difficult. In most countries, laryngoscopy is not part of routine medical examinations [6]. As a result, many LSCC patients already have advanced-stage disease at the initial diagnosis. Although patients with LSCC generally have a good prognosis after surgery and adjuvant treatment, postsurgical tumor recurrence and metastasis remain major concerns for patients with advanced LSCC [7].
Recently, a number of nomograms for predicting risk have been reported. In 2017, the Multidisciplinary Larynx Cancer Working Group developed a dynamic risk model and clinical nomogram for patients with locally advanced laryngeal cancer, using conditional survival analysis and data from the University of Texas MD Anderson Cancer Center database [8]. In line with our findings, they found in the multivariate analysis that nodal burden was an important factor for 3- or 6-year overall survival (OS). Shi et al. created another risk prediction model using data from 2752 LSCC patients who underwent neck dissection and were recorded in the Surveillance, Epidemiology, and End Results (SEER) database between 1988 and 2008 [9]. Their nomogram was constructed from eight independent prognostic clinical variables. The study showed that the nomograms were superior to models without lymph node ratio (LNR) and to the TNM classification (training cohort: OS, 0.713 vs 0.703 vs 0.667; cancer-specific survival (CSS), 0.725 vs 0.713 vs 0.688; validation cohort: OS, 0.704 vs 0.690 vs 0.658; CSS, 0.709 vs 0.693 vs 0.672). However, the accuracy of the prediction was probably reduced by the fact that only 20 patients were in the undifferentiated subset. Subsequently, Lin et al. established a prognostic model for advanced LSCC patients treated with primary total laryngectomy [10], using a data set collected from the SEER database. They identified six independent prognostic clinical variables. The concordance index (C-index) of their model was 0.651, which was similar to that of our model. Cui et al. constructed a survival prediction nomogram based on a data set of 369 patients with LSCC [11]. The six independent prognostic parameters were age, pack-years, N stage, LNR, anaemia, and albumin. The C-index of the nomogram was 0.73 (0.68–0.78), and the area under the curve (AUC) of the nomogram for predicting OS was 0.766.
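The models above are compared largely by C-index. As a reminder of what that statistic measures, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the code used in any of the cited studies or in the present one; the function name `concordance_index`, the toy follow-up data, and the simplified handling of tied follow-up times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if the event (e.g. recurrence or death) was observed, 0 if censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an event.
        # Simplification (assumption): pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.6, 0.3, 0.1]
c_index = concordance_index(times, events, risk_scores)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is why values such as 0.651 and 0.73 are read as moderate discrimination.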
In the current study, we built the first prognostic model predicting DFS for advanced LSCC patients, constructing both a nomogram and an RSF model. Although the RSF model exhibited better prediction ability than the Cox regression model in the training cohort, the two models showed similar prediction ability in the validation cohort. As a widely used machine learning model, the RSF can assess the importance of factors without dimension reduction or feature selection. It can also capture interactions between different features. However, random forests have been reported to overfit in some noisy classification or regression problems [12]. In our study, the RSF exhibited good sensitivity and specificity in the training cohort but not in the validation cohort. We suspect that there are several possible reasons. First, our data set is not large, and random forest models perform better on large data sets [13]. Second, the RSF model may have overfitted the training data to some degree.
In the multivariable Cox regression model, we identified five independent predictors: T stage, N stage, postoperative chemoradiotherapy, pathological grade, and postoperative recovery time. The RSF model ranked N stage, clinical stage, and postoperative chemoradiotherapy as the three most important variables. Interestingly, T stage was a significant prognostic factor in the Cox model but was not identified as a significant prognostic variable in the RSF model. One possible reason is that the sample size was not large enough.
The nomogram and RSF models also indicated that adjuvant treatment is essential for prolonging the survival time of advanced LSCC patients. For patients with advanced LSCC, total laryngectomy is the standard treatment. According to the NCCN guidelines, substantial evidence shows significantly improved OS, DFS, and locoregional control when a systemic therapy and radiation regimen (concomitant or, less commonly, sequential) is compared with RT alone for locoregionally advanced disease [14]. In a previous study, our research group reported that, among patients with stage IV LSCC, those receiving adjuvant chemoradiotherapy had markedly better survival than patients receiving surgical treatment only [15]. Notably, in the present study, postoperative recovery time was identified as a significant variable in both the nomogram and the RSF model. Postoperative recovery time was strongly associated with clinical stage and the extent of surgery: patients with a higher clinical stage and more extensive surgery may need a longer time to recover.
Our study has several limitations. First, this was a retrospective study that included only LSCC patients undergoing laryngectomy. Because the treatment decision was made before inclusion in the study, there was a potential selection bias. Furthermore, our nomogram has not been applied to predicting survival in LSCC patients treated with other radical treatment modalities, such as radiotherapy and chemotherapy. Second, although the nomogram was generated from a relatively large sample and a split-sample validation of the model was performed, no external validation using data from other centres was performed. Finally, only clinicopathological prognostic factors were used to predict survival. The predictions of the RSF model would therefore be more comprehensive if both the clinicopathological and genomic data of LSCC patients were analyzed together.