Identification of prognostic ferroptosis-related DEGs
The workflow of our study was shown in Fig. 1. A total of 594 NSCLC patients from the TCGA-LUAD cohort and another 176 NSCLC patients from the GEO (GSE13213) dataset were finally enrolled. To identify ferroptosis-related differentially expressed genes (DEGs), all these samples were included for analysis while 85 samples without follow-up data were excluded for identifying ferroptosis-related genes associated with prognosis. The detailed clinical characteristics of these patients are summarized in Table S1. In total, we identified 16 differentially expressed genes (26.7%) between tumor and adjacent normal tissues and 14 (23.3%) prognostic genes in tumor samples (Fig. 2A). Notably, 5 of them (ALOX5, DPP4, FANCD2, GCLC and SLC7A11) were both differentially expressed and correlated with overall survival (OS) in the univariate Cox regression analysis (Fig. 2B and C). Interaction network analysis showed that SLC7A11, GCLC, HMOX1, GCLM, G6PD, NQO1 and NOX1 were the hub genes (Fig. 2D and E), indicating that these genes may be the main components responsible for regulating ferroptosis in NSCLC.
Construction of a prognostic model in TCGA cohort
Based on the 5 ferroptosis-related genes (ALOX5, DPP4, FANCD2, GCLC and SLC7A11) identified above, we constructed a prognostic model using LASSO Cox regression analysis. The patients were stratified into high-risk (n = 297) and low-risk group (n = 297) groups by the median cut-off value (Fig. 3A). The results of principal components analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) analysis indicated that our prognostic model could effectively stratified patients into different directions (Fig. 3B and C). Moreover, patients in the high-risk group had a higher probability of earlier death than those in low-risk group (Fig. 3D). The Kaplan-Meier curves showed that patients in the high-risk group had a significantly worse OS than their low-risk counterparts (Fig. 3E). Furthermore, univariate (hazard ratio [HR]: 2.97; 95% CI: 1.74–5.06; P < 0.001) and multivariate Cox regression analysis (HR: 2.70; 95% CI: 1.57–4.64; P < 0.001) also revealed that patients with high-risk achieved significantly worse OS than those with low-risk (Table S2). Receiver operating curve (ROC) analysis indicated that the area under the curve (AUC) of this model reached 0.792 at 1 year, 0.644 at 2 years, and 0.641 at 3 years (Fig. 3F). Taken together, these results suggested that the prognostic model based on the five genes had strong prognostic power.
Validation of the 5-gene signature in GSE13213 dataset
Next, we validated the prognostic model in the GSE13213 dataset. Similarly, the 107 patients were stratified into high-risk and low-risk groups by the median risk score value (Fig. 4A). PCA and t-SNE analysis suggested that patients with different risks were well separated into two different groups (Fig. 4B and C). Also, patients in the high-risk group had a higher probability of earlier death (Fig. 4D) and achieved significantly worse OS than those in low-risk group (Fig. 4E). The predictive power of our prognostic model was satisfactory (Fig. 4F). Furthermore, the risk score was identified as an independent predictor of OS by both univariate analysis (HR: 5.18; 95% CI: 1.8–14.92; P < 0.01) and multivariate Cox regression analysis (HR: 5.59; 95% CI: 1.79–17.44; P < 0.001; Table S2). Collectively, these findings suggested that our prognostic model also had strong power in GSE13213 dataset.
Functional enrichment analysis
GO enrichment and KEGG pathway analysis were performed to explore the functional roles of the DEGs in the TGCA LUAD cohort and GSE13213 dataset. GO analysis showed that the DEGs were mostly enriched in several immunity- and ferroptosis-related biological processes and molecular functions (P < 0.05; Fig. 5A and B). KEGG analysis showed that DEGs were mostly enriched in the ferroptosis pathway and immune-related pathways such as human T-cell leukemia virus 1 (HTLV-1) infection pathway (P < 0.05; Fig. 5C and D). These results suggested that there may be a crosslinking between ferroptosis and tumor immune in NSCLC.
To further uncover the correlation between the risk score and immune status, the enrichment scores of diverse immune cell subpopulations and related functions or pathways with single-sample gene set enrichment analysis (ssGSEA) were further quantified. Interestingly, the antigen presentation process, including the score of aDCs, iDCs, APC co-stimulation and HLA were significantly different between the low risk and high risk group in the TCGA cohort (P < 0.05; Fig. 6A and B). APC co-inhibition and HLA class were significantly lower in high-risk patients than that in high-risk patients (P < 0.05; Fig. 6B). Comparisons in the GSE13213 dataset identified the differences of HLA class, Type I -IFN response and Type II -IFN response (P < 0.05; Fig. 6C and D). In particular, the scores of macrophages and mast cells were the most statistically different between the two risk groups in both the TCGA cohort and the GSE13213 dataset, which was consistent with the findings in the GO enrichment analysis.