Recurrence of stage I NSCLC is a challenging clinical issue that shortens survival and reduces the benefit of surgery(16). Guidelines recommend adjuvant therapy after surgery for patients with stage II or III NSCLC(3); however, its use in stage I NSCLC remains controversial. Published clinical trials have not shown a consistent survival benefit, owing to drug toxicity and side effects(17). Thus, how to identify patients at high risk of recurrence remains an unsolved problem in the management of stage I NSCLC.
A growing number of studies have sought risk factors for recurrence of early-stage NSCLC, providing clues for postoperative decision-making(18). Brandt et al. found that pT stage and lymphovascular invasion were correlated with distant recurrence and reduced disease-free survival(19). Yu and colleagues developed and validated a radiomics model for predicting clinical outcome, which aided the choice of treatment(20). In addition, numerous molecular signatures based on expression profiles have been proposed in recent years to predict recurrence-free survival, revealing heterogeneity between individuals. Noro et al. proposed a two-gene prognostic classifier to predict recurrence of lung squamous cell carcinoma after surgical resection(21). Shuta’s study built a relapse-related molecular signature for lung adenocarcinoma to identify patients at high risk of relapse(22). However, because of small sample sizes, different microarray platforms, patient heterogeneity, and the diverse range of gene selection algorithms, few molecular signatures have been broadly adopted in the clinic for early-stage lung cancer. Compared with previous studies, this study strengthened several aspects and addressed these deficiencies. The novel nomogram draws on both molecular and clinical factors. In addition, its reliability and robustness were tested in multiple cohorts drawn from different patient populations and platforms, in which it performed well. As a functional tool derived from molecular biomarkers and clinical variables, it may help optimize patient care by providing better prediction of recurrence, selecting patients for postoperative adjuvant therapy, and stratifying patients in prospective clinical trials. To our knowledge, this is the first study to assess recurrence of stage I NSCLC by combining a molecular signature with clinical variables.
In our preliminary work, we found 14 recurrence-related expression profile datasets in the GEO database and TCGA. After checking the clinical records and sample sizes, we found that TCGA had the largest sample size and complete clinical records, that GSE31210 included only lung adenocarcinoma samples, and that neither GSE41271 nor GSE68465 contained expression values for TPSB2 because of their different microarray platforms. Thus, we assigned TCGA as the training cohort and selected GSE50081 and GSE30219 as the validation cohorts on account of their moderate sample sizes and matching clinical records. The racial diversity and wide geographic distribution of the patients made the cohorts representative and generalizable, which enhanced the reliability of the model. Candidate genes were screened with two routine algorithms to minimize the chance of missing key markers. L1-penalized Cox regression, a broadly adopted method, was used to construct the 13-mRNA signature from the candidate genes by estimating the corresponding coefficients(14, 23). Our 13-mRNA signature exhibited favorable discrimination in the training cohort and the two validation cohorts, with AUCs of 0.79, 0.73, and 0.72, respectively. Because the datasets came from different platforms, the cutoff separating the high-risk and low-risk groups was set for each dataset at its own median risk score. Published studies have presented evidence that clinical variables such as age, histology, and differentiation are associated with recurrence-free survival in early-stage NSCLC(10). Therefore, we constructed a nomogram that incorporates clinical variables with our molecular signature to provide an easy-to-use tool for clinicians; it showed good calibration and discrimination in the training and validation cohorts.
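To make the scoring and cutoff steps concrete, the following is a minimal sketch rather than the code used in this study. It assumes the risk score is the usual linear predictor of a penalized Cox model (the sum of each coefficient multiplied by the expression of its gene) and that each cohort is split at its own median score. The gene names, coefficients, and expression values are hypothetical placeholders, not the 13-mRNA signature.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())

def assign_risk_groups(scores):
    """Split one cohort into high/low risk at that cohort's own median score."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}

# Hypothetical two-gene signature and expression values (illustration only).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
cohort = {
    "S1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "S2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "S3": {"GENE_A": 2.0, "GENE_B": 0.0},
    "S4": {"GENE_A": 0.0, "GENE_B": 4.0},
}

scores = {sample: risk_score(expr, coefficients) for sample, expr in cohort.items()}
groups = assign_risk_groups(scores)
```

Computing the cutoff within each cohort keeps the group definition independent of platform-specific expression scales, at the cost of cutoffs that are not directly comparable across datasets.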
Univariable regression analysis showed that histology was statistically significant in the training cohort. However, after adjustment for the 13-mRNA signature, it was no longer significant, which might be related to the small sample size. Our meta-analysis of the entire cohort revealed that age and histology were two key variables associated with recurrence-free survival (Fig. S3). Backward stepwise selection showed that the model including age, histology, and the signature had the lowest AIC, so these variables were incorporated into the nomogram. Validation analysis confirmed the reliability and generalizability of the nomogram. Stratification by tertiles yielded clearly separated survival curves. Notably, we found no statistically significant difference between the low- and medium-risk groups in GSE50081 or between the medium- and high-risk groups in GSE30219. This may be explained by the small number of samples in the low-risk group in GSE50081 and in the medium-risk group in GSE30219, which made these groups difficult to distinguish from the others.
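The tertile stratification can be illustrated with a short sketch. It assumes, as one reading of the unequal group sizes noted above, that the two cut points are derived from a reference cohort and then applied unchanged to another cohort; the text does not state where the cut points came from, and the scores below are hypothetical.

```python
from statistics import quantiles

def tertile_cutoffs(reference_scores):
    """Return the two cut points that divide the reference scores into thirds."""
    lower, upper = quantiles(reference_scores, n=3)
    return lower, upper

def stratify(score, cutoffs):
    """Assign a score to the low, medium, or high risk group."""
    lower, upper = cutoffs
    if score <= lower:
        return "low"
    if score <= upper:
        return "medium"
    return "high"

# Hypothetical nomogram scores (illustration only).
training_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
validation_scores = [2, 5, 8, 9, 10]

cutoffs = tertile_cutoffs(training_scores)
training_groups = [stratify(s, cutoffs) for s in training_scores]
validation_groups = [stratify(s, cutoffs) for s in validation_scores]
```

In this toy example the reference cohort splits evenly, whereas the second cohort ends up with a single sample in each of the low and medium groups, which is the kind of imbalance that can leave a pairwise comparison underpowered.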
Several limitations of this study should be mentioned. First, the study was based on multiple public datasets from GEO and TCGA, and prospective studies are required to further validate our findings. In addition, several important clinical variables were not recorded in some datasets, which may have limited the accuracy of our model. Furthermore, the different platforms used across datasets hindered an integrated analysis, which reduced the power of the model.