Background Because non-small cell lung cancer (NSCLC) is often diagnosed late and has a poor prognosis, patient mortality is high, which underlines the need to identify a credible prognostic marker for NSCLC patients. The aim of our study was to examine the association of allele frequency deviation (AFD) with patient survival, and to identify and validate a new prognostic signature to predict overall survival (OS) in NSCLC.
Methods First, we developed a new algorithm to calculate AFD from whole-exome sequencing (WES) data, and then compared how well AFD, tumor mutation burden (TMB), and change in variant allele frequency (dVAF) predicted patient survival. Second, we overlapped the differentially expressed genes (DEGs) from our data with the survival-associated genes from The Cancer Genome Atlas (TCGA) database to confirm the genes significantly related to lung cancer survival. We identified 149 genes, 31 of which are new genes not previously reported for lung cancer, and used them to develop a new prognostic model. Lung adenocarcinoma (LUAD) data from the TCGA database were used to validate the gene-signature model. The gene-based prognostic model was established and validated in the training and LUAD validation groups.
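The overlap step described above can be illustrated with a minimal sketch. The gene lists are hypothetical placeholders rather than the study's data, and the function name `overlap_genes` is an assumption introduced for illustration; the actual pipeline may have included filtering steps not described here.

```python
def overlap_genes(degs, survival_genes):
    """Return the gene symbols present in both lists, sorted alphabetically."""
    # Normalise symbols to upper case so that 'gria1' and 'GRIA1' match.
    deg_set = {g.strip().upper() for g in degs}
    survival_set = {g.strip().upper() for g in survival_genes}
    return sorted(deg_set & survival_set)


# Hypothetical inputs for illustration only (GENE_A and GENE_B are made up).
example_degs = ["PGM5", "CLIC6", "GENE_A", "gria1"]
example_survival_genes = ["GRIA1", "PGM5", "GENE_B"]

shared = overlap_genes(example_degs, example_survival_genes)
print(shared)
```

Using sets keeps the intersection independent of input order and removes duplicate symbols before the comparison.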
Results High AFD values were significantly associated with poor survival among NSCLC patients. A novel set of seven genes (UCN2, RIMS2, CAVIN2, GRIA1, PKHD1L1, PGM5, CLIC6) was obtained through multivariate Cox regression analysis and was significantly associated with NSCLC patient survival. Cox regression analysis confirmed that AFD and the 7-gene signature are independent prognostic markers in NSCLC patients. The AUC for 5-year survival for AFD and the AUC for 3-year survival in both the training and validation groups were greater than 0.7.
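A gene signature fitted by multivariate Cox regression is commonly applied as a linear risk score, the sum of each gene's coefficient multiplied by its expression level, with patients split into high- and low-risk groups at the median. The abstract does not state how the score or cutoff was defined, so the sketch below is an assumption about a typical workflow. The coefficients and expression values are placeholders, not the fitted values or data from this study, and the names `risk_score` and `stratify` are hypothetical.

```python
from statistics import median

SIGNATURE_GENES = ["UCN2", "RIMS2", "CAVIN2", "GRIA1", "PKHD1L1", "PGM5", "CLIC6"]


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# PLACEHOLDER coefficients (all 1.0); the real values come from the fitted Cox model.
placeholder_coefficients = dict.fromkeys(SIGNATURE_GENES, 1.0)

# Hypothetical expression profiles for three made-up patients.
patients = {
    "P1": dict.fromkeys(SIGNATURE_GENES, 0.5),
    "P2": dict.fromkeys(SIGNATURE_GENES, 1.0),
    "P3": dict.fromkeys(SIGNATURE_GENES, 2.0),
}

scores = {pid: risk_score(expr, placeholder_coefficients) for pid, expr in patients.items()}
groups = stratify(scores)
print(scores)
print(groups)
```

A median split is only one possible cutoff; an optimised or pre-specified threshold could be substituted in `stratify` without changing the score itself.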
Conclusion AFD and the 7-gene signature were identified as new independent predictive factors for survival among NSCLC patients.