Identification of Iron Metabolism DEGs in LUAD
After pre-processing, 486 LUAD patients from TCGA and 131 lung adenocarcinoma patients from GEO were selected. A detailed summary of the clinical features of these patients is shown in Table 1. In order to identify prognostic genes related to iron metabolism of LUAD, differential expression analysis was performed. The DEGs between tumor samples and neighboring tumor samples were selected through the Wilcox Test. A total of 257 iron metabolism-related DEGs were identified (adjusted p values<0.05 and |logFC|>1.5); among them, there were 154 up-regulated DEGs, and 103 significantly down-regulated DEGs. The heat map and volcano map of these differential genes are shown in (Fig1a,b). To further identify the representative prognostic genes of iron metabolism, we performed univariate Cox analysis, leading to the retention of 46 DEGs (P<0.01, Table 2). The interaction network between these genes is shown in (Fig 1e).
Establishment and Verification of an Iron Metabolism-related Gene Signature
For constructing a genetic signature related to iron metabolism, the following steps were performed: first, on the basis of expression profiles of the above 46 candidate genes, LASSO Cox regression analysis (2,000 iterations) was carried out. According to the minimum λ, the optimal model was constructed with the minimum parameters (Fig 1c,d). Eventually, a prognostic model containing 12 genes was established to evaluate the prognosis of each lung adenocarcinoma patient. The specific calculation equation for this risk score was: Risk Score = (0.01949×expression value of HAVCR1 -0.00501 × expression value of SPN + 0.00003× expression value of GAPDH + 0.00087 ×expression value of ANGPTL4 + 0.00004×expression value of PRSS3 + 0.00036 × expression value of KRT8 + 0.00122 × expression value of LDHA + 0.02521× expression value of HMMR + 0.00407 ×expression value of SLC2A1 + 0.00102× expression value of CYP24A1+ 0.00450 × expression value of LOXL2 + 0.00031 × expression value of TIMP1). Patients could be split into high-risk group (n = 238) and low-risk group (n = 239) using the optimal cut-off value of the risk score (Fig 2a). Kaplan-Meier analysis results revealed that the OS of the two different risk groups in the training group was significantly different. It was observed that the OS of the low-risk group was significantly higher than that of the high-risk group (P <0.0001, Fig 2b). Next, the strong prognostic value of 12 gene signatures was analyzed using the time-dependent ROC curve. In addition, with respect to the prediction of risk scores for 1-year, 3-year, and 5-year overall survival, the AUCs were 0.735, 0.711, and 0.601, respectively (Fig 2d,e,f).
External Verification of 12 Gene Signatures in GSE42127
The external data set GSE42127 further proved the predictive capability of the 12-gene prognostic signature. For patients in the GEO cohort, the same calculation method as the TCGA cohort was applied to compute the risk score, following which the LUAD patients were split into high-risk and low-risk groups (Fig 3a). Kaplan-Meier analysis results were similar to those obtained in the TCGA cohort; it was shown that the overall survival of the low-risk group was significantly longer than that of the high-risk group (P = 0.001, Fig 3b). Next, the prognostic ability of the signature was assessed through time-dependent ROC, wherein the 12-gene signature could have a higher performance. When predicting the AUC of the overall survival (OS) of the 12-gene signature, the results at 1, 3 and 5 years were 0.904, 0.745, and 0.712, respectively (Fig 3d,e,f).
Analysis of Independent Prognostic Potency of 12-gene Signature
To determine the prognostic factors of overall lung adenocarcinoma survival, we carried out univariate and multivariate Cox regression analysis. Among them, the univariate analysis showed the following results for the TCGA cohort: Risk Score (HR=3.982, 95%CI =2.867-5.530, P <0.001), Stage (HR=1.648, 95%CI =1.396-1.946, P<0.001), T stage (HR=1.600, 95%CI =1.285-1.994, P<0.001), N stage (HR=1.787, 95%CI =1.455-2.195, P<0.001). The univariate analysis also showed that the GEO cohort with Risk Score (HR=82.970, 95%CI =10.025-686.710, P<0.001), Stage (HR=1.652, 95%CI =1.144-2.387, P=0.007) had a significant correlation with the overall survival of lung adenocarcinoma (Table 3). Interestingly, we observed that the risk scores in the TCGA and GEO cohorts were distinctly related to OS. Similarly, the multivariate regression analysis (after correcting the parameters) indicated the following data for the TCGA cohort: Risk Score (HR=3.313, 95%CI =2.273−4.827, P<0.001), Stage (HR=1.921, 95%CI= 1.154-3.198, P=0.012); and the GEO cohort: Risk Score (HR=84.063, 95%CI= 7.882−896.052, P<0.001), Stage (HR=1.568, 95%CI =1.052-2.337, P=0.027) (Table 3). However, in the multivariate Cox regression analysis, the risk score was an independent predictor of OS.
Constructed and Verified Nomogram and Calibration Plots
All the clinical information parameters in the univariate Cox regression analysis mentioned above exist in the TCGA and GEO cohorts. Among these, gender, stage, T stage, M stage and N stage were the parameters involved in the TCGA nomogram (Fig 2c). The parameters included in the GEO nomogram were gender and stage (Fig 3c). In the TCGA and GEO cohorts, a prognostic nomograph was constructed to predict the OS at 1, 3, and 5 years, respectively. It can be seen that those patients with higher scores have distinctly lower OS than those with lower scores. In addition, the results of the calibration chart have shown that the nomogram is significantly accurate in predicting the OS of patients with lung adenocarcinoma (TCGA: Fig 2g,h,i, GEO: Fig 3g,h,i).
Biological Function and Immune Analysis of TCGA Cohort
Next, we aimed to deepen our understanding of the biological functions of the prognostic model. In 477 LUAD samples from high-risk and low-risk groups of TCGA, ssGSEA was used to explore the tumor microenvironment in different immune clusters, and to compute the stromal score, immune score and estimated score of cancer tissue expression profile. Based on the data, we have reason to conclude that patients with high immunity have higher estimated score, stromal score, and immune score than patients with low immunity. In contrast, the tumor purity of the low-immune patients was higher than that of the high-immune patients. The result is shown in Fig 4a. These results indicated that the TCGA cohort significantly enriched many immune-related biological processes (P<0.05). Among them, eight biological processes related to immunity include: immunoglobulin complex, natural killer cell chemotaxis, circulating immunoglobulin complex, immunoglobulin receptor binding, MHC class II protein complex, MHC protein complex, positive regulation of interferon−gamma biosynthetic process, and T cell receptor complex (P<0.05, Fig 4b).
We also probed the correlation between high and low-risk populations and immune status. ssGSEA analysis was used to analyze the immune cells and related functions of the two groups. Between the low-risk group and the high-risk group of the TCGA cohort,it was observed that the scores of APC co-inhibition, Th2 cells, NK cells and MHC class I were significantly different (P<0.05, Fig 4c,e). In addition, for the high-risk group, the scores of Inflammation-promoting, Parainflammation and Th1 cells were higher, while the scores were lower for HLA, typeⅡIFN response, aDCs, B cells, iDCs, Mast cells, Neutrophils and T helper cells (P<0.05, Fig 4c,e). In the ssGSEA score of the GEO cohort, between the high-risk group and the low-risk group, there were differences in cytolytic activity, MHC class I, Parainflammation, typeⅡIFN response, Th2 cells, CD8+ T cells, iDCs, Inflammation−promotion, Th1 cells, and Treg (P<0.05, Fig 4d,f).