Identification of DEGs in LUAD
A flowchart for our study was shown in Fig. 1. The lung adenocarcinoma mRNA sequencing dataset was downloaded from the TCGA database. A total of 3581 DEGs were obtained according to the criterion: |logFC| > 1, FDR < 0.05, including 2386 upregulated genes and 1195 downregulated genes. List, Heatmap, and volcanic map of the DEGs were shown in the supplementary document: Additional file 1, Additional file 2, Additional file 3.
Establishment of a four-gene panel as a prognostic indicator
Univariate Cox regression analysis was performed for identifying the DEGs associated with OS using the “survival” package of R language. Of the 3,581 DEGs, 523 genes were identified as being associated with OS for LUAD patients (p < 0.01, Additional file 4). Then, lasso regression analysis was implemented to further obtain a stable set of genes (Additional file 5). Seven of the 294 mRNAs significantly associated with OS were screened out via this analysis (ANLN, C1QTNF6, DKK1, ERO1A, GNG7, LDHA, MELTF). At last, a four-gene panel as a prognostic indicator was obtained via multivariate Cox regression analysis. The forest map of Cox regression analysis was shown in Fig. 2. The four genes screened were dickkopf WNT signaling pathway inhibitor 1(DDK1), G protein subunit gamma 7(GNG7), lactate dehydrogenase A (LDHA), melanotransferrin (MELTF, also known as MTF1(metal regulatory transcription factor 1)). The risk score = (0.38606 * ExpressionDKK1) + (-0.77458 * ExpressionGNG7) + (1.95469 * ExpressionLDHA) + (0.83740 * ExpressionMELTF). Each patient from the TCGA-LUAD database was awarded a risk score based on the Cox regression model composed of the four genes. Then, these patients were divided into a low-risk and high-risk group according to a cut off value obtained by the R package “survminer" and “survival”. The predictive performance of the four-gene panel for OS was estimated using the time-dependent ROC curve by R package “timeROC” and “survival”. The Area under the ROC curve (AUC) of this four-gene panel as a prognostic indicator was 0.740 and was superior to other clinical indicators used for prognostic classification. The OS in the high- and low-risk groups were compared using the Kaplan–Meier survival curve via the R package “survival”. The results indicated that high-risk group had a worse prognosis compared with the low-risk group (Fig. 3A).
Validation of the four-gene panel as a prognostic indicator
To further validate the accuracy of the four-gene panel as a prognostic indicator, we computed the risk score of each patient in GSE42127 using the same model. Consistent with previous results, a significantly worse OS was observed in the high-risk group compared with the low-risk group. ROC curve showed that the AUC for OS was 0.752, indicating a better predictive performance compared with other clinical indicators used for prognostic classification (Fig. 3B).
Independent prognostic value of the four-gene panel
To determine whether the four-gene panel is an independent prognostic indicator for LUAD patients, 344 patients from the TCGA-LUAD database with complete clinical information including age, gender, and TNM stage were included for further analysis. Multivariate Cox regression analysis suggested that only the risk score calculated from the four-gene panel was independent prognostic factors for OS (Fig. 4A, 4B). The consistent result was obtained in the patients from GSE42127 (Fig. 4C, 4D).
Establishment of predictive nomogram
We established a nomogram to predict 1-year, 3-year, and 5-year OS in 460 patients with complete clinical information from the TCGA-LUAD database using five factors including risk score, age, sex, pharmaceutical, and pathologic stage (Fig. 5A). The C-index for the nomogram model was 0.710 (95% CI 0.624–0.796). Calibration curve showed that the nomogram had the superior prediction efficiency (Fig. 5B). These results indicated that the nomogram might be to serve as a prognostic model used for clinical management of LUAD patients.
The expression of GNG7 in LUAD
We further investigated the role of GNG7 in lung adenocarcinoma using the expression and clinical data from the TCGA-LUAD database, including patients’ gender, age, TNM classification, survival status, and TNM stage. The level of GNG7 expression in LUAD tissues was decreased compared with that in noncancerous tissues (Fig. 6A). A paired comparison of LUAD tissues and tissues adjacent to carcinoma from the same patients also showed the same results (Fig. 6B). To further verify GNG7 expression in LUAD tissues, we selected two independent datasets (GSE18842[15], GSE75037[16]) for differential expression analysis, and the results showed that GNG7 expression was higher in LUAD tissues compared with noncancerous tissues (Fig. 6C, 6D). In addition, differences in GNG7 expression were also found according to gender, TNM classification, and TNM stage (Fig. 6E-I).
DNA methylation level and mRNA expression of GNG7
We further explored the relationship between DNA methylation level and mRNA expression of GNG7 using MethHC (A database of DNA Methylation and gene expression in Human Cancer) database. The results showed that the DNA methylation levels of GNG7 were significantly higher in 18 kinds of cancerous tissues than adjacent noncancerous tissues (Fig. 7). Furthermore, methylation level of the promoter and CpG Island region was negatively correlated with mRNA expression of GNG7 (Fig. 8).
GNG7 is an independent prognostic indicator for OS with LUAD patients
Multivariate Cox regression analysis suggested that GNG7 was an independent favorable factor for OS with LUAD patients (hazard ratio [HR] = 0.56, 95% confidence interval [CI]: 0.41–0.77, P < 0.001, Fig. 9A). Kaplan-Meier curves indicated that low expression of GNG7 was associated with a poor OS for LUAD (P = 9.783e-06; Fig. 9B). The results of Kaplan-Meier analysis for DKK1, LDHA, and MELTF were also showed in Additional file 6, 7, 8, respectively.
GSEA identifies GNG7-related signaling pathways
GSEA analysis was used to identify signaling pathways enriched in low and high GNG7 expression. The results revealed that 17 signaling pathways significantly enriched in GNG7 high expression group, and 27 signaling pathways in GNG7 low expression group (Table 1, 2). Genes involved in cell cycle, ubiquitin mediated proteolysis, RNA degradation, aminoacyl tRNA biosynthesis, DNA replication, proteasome, small cell lung cancer, and P53 signaling pathway were enriched in GNG7 low expression patients. The results suggested that the absence of GNG7 expression may be significantly related to the genesis and progression of LUAD, especially in the regulation of cell cycle (Fig. 9C, 9D).
Table 1
Signaling pathways significantly enriched in GNG7 high expression group.
KEGG pathways | ES | NES | NOM p-val | FDR q-val |
KEGG_HEMATOPOIETIC_CELL_LINEAGE | 0.651571 | 1.963524 | 0.003922 | 0.081407 |
KEGG_FC_EPSILON_RI_SIGNALING_PATHWAY | 0.502703 | 1.919793 | 0.005682 | 0.05288 |
KEGG_AUTOIMMUNE_THYROID_DISEASE | 0.691564 | 1.896951 | 0.005803 | 0.053612 |
KEGG_VEGF_SIGNALING_PATHWAY | 0.42983 | 1.728079 | 0.005837 | 0.083791 |
KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS | 0.649137 | 1.84108 | 0.006048 | 0.058552 |
KEGG_ARACHIDONIC_ACID_METABOLISM | 0.546052 | 1.892989 | 0.00616 | 0.048027 |
KEGG_ASTHMA | 0.792022 | 1.929154 | 0.007463 | 0.059026 |
KEGG_CELL_ADHESION_MOLECULES_CAMS | 0.58855 | 1.950352 | 0.007533 | 0.061348 |
KEGG_VASCULAR_SMOOTH_MUSCLE_CONTRACTION | 0.462289 | 1.816274 | 0.009671 | 0.055734 |
KEGG_ALLOGRAFT_REJECTION | 0.798208 | 1.846394 | 0.011194 | 0.06289 |
Table 2
Signaling pathways significantly enriched in GNG7 low expression group.
KEGG pathways | ES | NES | NOM p-val | FDR q-val |
KEGG_CELL_CYCLE | -0.76285 | -2.48945 | 0 | 0 |
KEGG_OOCYTE_MEIOSIS | -0.59433 | -2.29722 | 0 | 4.18E-04 |
KEGG_UBIQUITIN_MEDIATED_PROTEOLYSIS | -0.55212 | -2.20146 | 0 | 0.001748 |
KEGG_RNA_DEGRADATION | -0.63957 | -2.15354 | 0 | 0.002089 |
KEGG_HOMOLOGOUS_RECOMBINATION | -0.76862 | -2.04527 | 0 | 0.007301 |
KEGG_DNA_REPLICATION | -0.84117 | -1.99922 | 0 | 0.007784 |
KEGG_PYRIMIDINE_METABOLISM | -0.58566 | -2.00456 | 0 | 0.00844 |
KEGG_AMINOACYL_TRNA_BIOSYNTHESIS | -0.73648 | -2.00871 | 0 | 0.010746 |
KEGG_PROTEASOME | -0.79858 | -1.93904 | 0 | 0.011414 |
KEGG_BASAL_TRANSCRIPTION_FACTORS | -0.62853 | -1.92244 | 0 | 0.01331 |