1. Screening and co-expression analysis of differentially expressed genes
RNA sequencing data of LUAD patients were downloaded from the TCGA database, comprising 535 tumor tissue samples and 59 adjacent or normal tissue samples. A total of 501 LUAD patients had complete clinical outcome parameters and RNA-seq data and were included in the subsequent analyses.
Differentially expressed genes belonging to the KRT gene family were screened. A total of 14 KRT genes were significantly dysregulated between LUAD tumor tissue and adjacent non-cancerous tissue. The fold changes and heat maps of differential expression are shown in Fig.1 and Fig.2A. One KRT gene was down-regulated, while the other 13 were up-regulated; the expression of the individual genes is shown in Fig.3. Co-expression analysis showed that co-expression relationships were present among these genes (Fig.2B).
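The two quantities reported above, a fold change between tumor and normal tissue and a pairwise co-expression measure, can be illustrated with a minimal sketch. The text does not state which differential expression tool or correlation measure was used, so the log2 ratio of group means (with a pseudocount of 1) and the Pearson coefficient below are assumptions, and the function names are hypothetical.

```python
import math

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression.

    The pseudocount (an assumption) avoids division by zero for
    genes that are not expressed in one of the groups.
    """
    mean_tumor = sum(tumor) / len(tumor)
    mean_normal = sum(normal) / len(normal)
    return math.log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))

def pearson(x, y):
    """Pearson correlation between the expression of two genes
    measured across the same samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

A positive log2 fold change corresponds to up-regulation in tumor tissue and a negative value to down-regulation; a Pearson coefficient near 1 or -1 indicates strong co-expression of a gene pair.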
2. Prognostic screening of KRT genes
Univariate survival analysis of the clinical parameters against overall survival (OS) in patients with lung adenocarcinoma showed that tumor stage was significantly correlated with OS. The 14 KRT genes were associated with the diagnosis of lung adenocarcinoma (Table 1). Multivariate Cox regression analysis showed that 6 KRT genes (KRT86, KRT81, KRT8, KRT18, KRT19 and KRT6A) were correlated with the prognosis of lung cancer after adjustment for age, gender and stage (Table 2). ROC curve analysis confirmed that these 14 KRT genes can be used as potential diagnostic markers for lung adenocarcinoma (Fig.4).
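As an illustration of what the ROC analysis measures, the area under the ROC curve (AUC) equals the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and inputs are hypothetical, and this is not necessarily the software used in the study.

```python
def auc(tumor_values, normal_values):
    """AUC as the fraction of (tumor, normal) pairs in which the
    tumor sample has the higher value; ties count as 0.5."""
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))
```

An AUC of 0.5 corresponds to no discrimination between the two groups and an AUC of 1.0 to perfect separation. For a down-regulated gene the value falls below 0.5, and the marker is then read in the opposite direction.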
3. Construction of a risk score and prognostic gene signature in LUAD patients
Variables with a p value below 0.05 in Table 2 were entered into a further multivariate Cox regression analysis with gender, age and tumor stage as covariates, and variables were selected by backward stepwise regression based on the maximum likelihood method. The final regression model retained age (p = 0.057, HR = 1.015, 95% confidence interval (CI) 1.000-1.031), tumor stage (stage II: p < 0.001, HR = 2.109, 95% CI 1.466-3.034; stage III: p < 0.001, HR = 3.036, 95% CI 2.063-4.469; stage IV: p < 0.001, HR = 2.947, 95% CI 1.631-5.324), KRT8 (p < 0.001, HR = 1.001, 95% CI 1.001-1.002) and KRT6A (p < 0.001, HR = 1.002, 95% CI 1.001-1.003). These results showed that KRT8 and KRT6A were independent risk factors for the prognosis of lung adenocarcinoma, and both were selected to construct the risk score. The general formula is: risk score = KRT8 (expression) × β1 (regression coefficient) + KRT6A (expression) × β2 (regression coefficient). With the coefficients obtained in this study, the formula is: risk score = KRT8 (expression) × 0.001 + KRT6A (expression) × 0.002.
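The risk score formula can be written as a short sketch. In a Cox model the hazard ratio is the exponential of the regression coefficient, so the coefficients 0.001 and 0.002 agree with the reported hazard ratios (ln 1.001 ≈ 0.001 and ln 1.002 ≈ 0.002). The function names are hypothetical, and the median split into high-risk and low-risk groups is an assumption, since the text does not state the cutoff that was used.

```python
import math
import statistics

# Regression coefficients from the formula in the text.
BETA_KRT8 = 0.001
BETA_KRT6A = 0.002

def risk_score(krt8_expression, krt6a_expression):
    """Linear combination of expression values weighted by the Cox coefficients."""
    return krt8_expression * BETA_KRT8 + krt6a_expression * BETA_KRT6A

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    The median cutoff is an assumption made for illustration only.
    """
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

# Consistency check: coefficient = ln(hazard ratio), rounded to 3 decimals.
assert round(math.log(1.001), 3) == BETA_KRT8
assert round(math.log(1.002), 3) == BETA_KRT6A
```

For example, a patient with a KRT8 expression of 1000 and a KRT6A expression of 500 would receive a score of 1000 × 0.001 + 500 × 0.002 = 2.0. Because the coefficients are per unit of expression, the score depends on the expression scale used in the original analysis.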
A lung cancer prediction model was then established. Multivariate Cox regression including the patients' risk score showed that the risk score was a significant independent risk factor (HR = 2.359, 95% CI 1.728-3.222, p < 0.001) (Fig.5).
Kaplan-Meier (K-M) curve analysis showed that patients with a high risk score had an increased risk of death (log-rank p = 0.004, adjusted HR = 1.378, 95% CI 1.013-1.875; Fig.6A-D). The time-dependent ROC curve showed that the risk score has some predictive value for all-cause death in patients with lung adenocarcinoma, with an AUC that remained roughly stable at about 0.6 and did not change appreciably over time (Fig.6E).
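For readers unfamiliar with the K-M method, the survival curve is the running product, over the observed death times, of (1 − deaths / patients still at risk). The following is a minimal sketch of that product-limit estimate with hypothetical function and argument names; it does not reproduce the log-rank test or the software used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= leaving
    return curve
```

Applying the function separately to the high-risk and low-risk groups would give the two curves that a K-M plot compares.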
4. Stratification and joint effect analysis
The relationship between clinical parameters and the prognostic gene signature can be further examined through nomogram, stratification and joint effect analyses. For example, male patients older than 65 years with stage III disease are very likely to die of the disease, with a 1-year survival rate close to zero. The nomogram constructed from the risk score and clinical LUAD parameters showed that the prognostic marker based on KRT gene expression was more accurate than the other parameters (Fig.7).