Background
Patients with non-small cell lung cancer (NSCLC) often have a poor prognosis. Predicting overall survival (OS) soon after diagnosis can help providers design the best treatment plan for each patient. In this study, we aimed to evaluate prognostic factors in NSCLC patients, construct a nomogram, and develop machine learning models to predict OS. We also conducted a feature importance analysis to understand how relevant patient factors affect OS.
Results
Multiple machine learning models were applied to a retrospective cohort of patients from 2010 to 2015 in the Surveillance, Epidemiology, and End Results (SEER) database. Independent prognostic factors for NSCLC were determined using Cox proportional hazards regression analysis. We modeled OS and vital status as the outcomes, and constructed and validated a nomogram to predict the OS of NSCLC patients. Furthermore, we applied logistic regression, random forest, XGBoost, decision tree, multilayer perceptron, and LightGBM to predict the patients’ vital status. We tested the predictive ability of the models and evaluated their performance using accuracy, sensitivity, specificity, precision, and the area under the receiver operating characteristic curve (AUC). A total of 34,567 patients from the SEER database met our criteria and were included in this study. The nomogram visualized the OS prediction results of the Cox regression model. Among the classifiers, XGBoost had the best prediction performance, with an AUC of 0.733.
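The evaluation metrics named above can be illustrated with a minimal sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made for illustration only, with 1 denoting the positive vital-status class. The AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and precision from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from SEER.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Unlike the four threshold-based metrics, the AUC depends only on how the scores rank the cases, which is why it is often used to compare classifiers whose default thresholds differ.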
Conclusions
The results demonstrated that machine learning-based classifiers are capable of predicting the outcomes of patients with NSCLC. In addition, the Cox regression-based nomogram interpreted the results well and supports potential medical applications.