Identification of DEGs in LUAD
A flowchart for our study was presented in Fig. 1. The lung adenocarcinoma mRNA sequencing dataset was downloaded from the TCGA database. A total of 3581 DEGs were obtained according to the criterion: |logFC| > 1, FDR < 0.05, including 2386 up-regulated genes and 1195 down-regulated genes. List, Heatmap, and volcano plot of the DEGs were shown in the supplementary document: Additional file 1, Additional file 2, Additional file 3.
Establishment of a four-gene panel as a prognostic indicator
Univariate Cox regression analysis was performed for identifying the DEGs associated with OS using the “survival” package of R language. Of the 3,581 DEGs, 523 genes were identified as being associated with OS for LUAD patients (p<0.01, Additional file 4). Then, lasso regression analysis was implemented to further obtain a stable set of genes (Additional file 5). Seven genes significantly associated with OS were screened out via this analysis (ANLN, C1QTNF6, DKK1, ERO1A, GNG7, LDHA, MELTF). At last, a four-gene panel as a prognostic indicator was obtained via multivariate Cox regression analysis. The forest map of Cox regression analysis was shown in Fig. 2. The four genes screened were dickkopf WNT signaling pathway inhibitor 1(DDK1), G protein subunit gamma 7(GNG7), lactate dehydrogenase A (LDHA), melanotransferrin (MELTF, also known as MTF1(metal regulatory transcription factor 1)). Among them, DKK1, LDHA and MELTF are high-expressed in tumor tissues compared with tissue adjacent to carcinoma, but GNG7 is low-expressed. The heat map of differential expression was shown in Fig. 3. The risk score = (0.38606 * ExpressionDKK1) + (-0.77458 * ExpressionGNG7) + (1.95469 * ExpressionLDHA) + (0.83740 * ExpressionMELTF). Each patient from the TCGA-LUAD database was awarded a risk score based on the Cox regression model composed of the four genes. The results indicated that high-risk group had a worse prognosis compared with the low-risk group. The area under the ROC curve (AUC) of this four-gene panel as a prognostic indicator was 0.740 and was superior to other clinical indicators used for prognostic classification (Fig. 4A).
Validation of the four-gene panel as a prognostic indicator
To further validate the accuracy of the four-gene panel as a prognostic indicator, we computed the risk score of each patient in GSE42127 using the same model. Consistent with previous results, a significantly worse OS was observed in the high-risk group compared with the low-risk group. ROC curve showed that the AUC for OS was 0.752, indicating a better predictive performance compared with other clinical indicators used for prognostic classification (Fig. 4B).
Independent prognostic value of the four-gene panel
344 patients from the TCGA-LUAD database with complete clinical information including age, gender, and TNM stage were included for further analysis. Univariate and Multivariate Cox regression analysis suggested that only the risk score calculated from the four-gene panel was independent prognostic factors for OS (HR = 1.270, p < 0.001). The consistent result was obtained in the patients from GSE42127 (HR = 1.204, p = 0.018). The results were presented in Fig. 5.
Establishment of predictive nomogram
We established a nomogram to predict 1-year, 3-year, and 5-year OS in 460 patients with complete clinical information from the TCGA-LUAD database using five factors including risk score, age, sex, pharmaceutical, and pathologic stage (Fig. 6A). The C-index for the nomogram model was 0.710 (95% CI 0.624–0.796). Calibration curve showed that the nomogram had the superior prediction efficiency (Fig. 6B). These results indicated that the nomogram might be to serve as a prognostic model used for clinical management of LUAD patients.
The genetic alteration, expression and survival analysis of the four genes
We explored the genetic alteration of the four genes by using the mutation data obtained from the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) [15, 16]. In this database, nine percent (21/230) of patients showed genetic alterations in the four genes. Missense mutation, amplification and deep deletion were common genetic alteration (Fig. 7A). We further validated the expression of the four genes using the lung adenocarcinoma dataset (GSE75037 [17]) from the GEO database, and the results were consistent with the analysis of the TCGA-LUAD dataset (Fig. 7B). Kaplan-Meier survival curves indicated that high expression of DKK1, LDHA, and MELTF and low expression of GNG7 were associated with a poor OS for LUAD (Fig. 7C).
Predictive value of the four-gene panel for patients with EGFR, KRAS and TP53 mutation
To explore the predictive value of the four-gene panel for patients with EGFR, KRAS and TP53 mutation, we performed a combined analysis of gene mutation and transcription data. A heatmap of mutations in TP53, KRAS, EGFR and the four genes was shown in Fig. 8. The results showed that mRNA expression differences of GNG7 and MTF1 (MELTF) were only observed in TP53 mutant and wild-type patients (Fig. 9). We further analyzed the predictive value of the four-gene panel for OS in LUAD patients with EGFR, KRAS and TP53 mutations, respectively. The results showed that high-risk group had a worse OS in LUAD patients with TP53 mutations and KRAS mutations (p < 0.05). The AUC of this four-gene panel as a prognostic indicator was 0.718 and 0.793 in LUAD patients with TP53 and KRAS mutation, respectively. However, in patients with EGFR mutations, there was no significant difference in OS between the high-risk and low-risk groups (Fig. 10).
Gene Set Enrichment Analyses
GSEA analysis was used to identify signaling pathways enriched in low and high expression of the four genes, respectively. The results revealed that genes involved in cell cycle, ubiquitin mediated proteolysis, RNA degradation, aminoacyl tRNA biosynthesis, DNA replication, proteasome, small cell lung cancer, and P53 signaling pathway were enriched in GNG7 low expression group. In the high-expressed group of DKK1, KEGG pathways including cell cycle, RNA degradation, spliceosome, proteasome, DNA replication, P53 signaling pathway and so on were enriched. In the high-expressed group of LDHA, KEGG pathways including proteasome, RNA degradation, spliceosome, DNA replication, RNA polymerase, cell cycle, P53 signaling pathway and so on were enriched. In the high-expressed group of MELTF, the enriched KEGG pathways were mainly focused on bladder cancer, proteasome, DNA replication, base excision repair, pyrimidine metabolism. These results suggested that the absence of GNG7 expression and the increase of DKK1, LDHA and MELTF expression may be significantly related to the metabolism of genetic material, especially in the regulation of cell cycle pathway (Fig. 11).
DNA methylation level and mRNA expression of GNG7
We further explored the relationship between DNA methylation level and mRNA expression of GNG7 using MethHC database. The results showed that the DNA methylation levels of GNG7 were significantly higher in 18 kinds of cancerous tissues than adjacent noncancerous tissues (Fig. 12). Furthermore, methylation level of the promoter and CpG Island region was negatively correlated with mRNA expression of GNG7 (Fig. 13).