2.1. Identification of prognostic ferroptosis-associated DEGs in the TCGA cohort
A total of 502 LUSC patient's genes expression and clinical information from the TCGA cohort and 247 LUSC patient's genes expression from the GEO cohort were enrolled. The detailed clinical characteristics of these patients are summarized in Table 1. We have obtained 60 ferroptosis-associated genes through previous research, 52 DEGs of them were differentially expressed between tumor and normal tissues and 5 of them were correlated with OS in the univariate Cox regression analysis in TCGA cohort (Fig. 1a). Take the intersection of the 52 DEGs and 5 prognostic related genes from the previous step, a total of 5 prognostic and ferroptosis-associated DEGs were screened out (Fig. 1b). The heat map shows the differential expression of the 5 genes between tumor and normal tissues, ALOX5 and DPP4 were significantly higher in normal tissues, while FADS2, NOX1, and PHKG2 were significantly higher in LUSC tissues (Fig. 1c). We used the Wilcoxon-rank sum test to analyze the relationship between the 5 ferroptosis-associated genes expression in different tissues respectively, and the results showed that the 5 ferroptosis-associated genes expression was significantly differences in LUSC tissues and normal tissues in the TCGA cohort, the result shows that ALOX5 and DPP4 were significantly higher in normal tissues, while FADS2, NOX1, and PHKG2 were significantly higher in LUSC tissues (all adjusted P < 0.05) (Fig. S1). Subsequently, we used the Wilcoxon signed-rank test to determine the 5 ferroptosis-associated genes expression in 46 LUSC tissues and matched adjacent normal tissues in the TCGA cohort, The expression of ALOX5 and DPP4 were significantly higher in 46 matched adjacent normal tissues than in LUSC tissues, adversely, FADS2, NOX1, and PHKG2 were significantly higher in matched adjacent normal tissues, which is consistent with the previous unpaired analysis results (all adjusted P < 0.05) (Fig. S2). Because these 5 genes are significantly related to OS performed by univariate Cox regression analysis (Fig. 1a), Kaplan–Meier survival analysis was performed to examine the influence of different expression levels of these 5 genes on prognosis of LUSC respectively. The results showed that patients with low PHKG2 expression experienced a shorter OS duration than those with high PHKG2 expression (P = 0.026) and the patients with high DPP4 expression experienced a shorter OS duration than those with low DPP4 expression (P = 0.043) (Fig. S3). Meanwhile, we also explored the relationship between the 5 genes and different clinical characteristics respectively. Among those 5 genes, the expression of ALOX5 was obviously correlated with part of T stage and N stage and gender; the expression of DPP4 was obviously correlated with part of N stage; the expression of NOX1 was obviously correlated with part of T stage (all adjusted P < 0.05). (Fig. S4). The interaction network among these 5 genes shows that there is a certain correlation between them, and the correlation coefficient is 0.4 (Fig. 1d). PPI network analysis was performed using the online database STRING with interaction score of 0.4 as the threshold. There are 4 connects and 4 nodes, further proves the correlation between the 5 genes. (Fig. 1e).
Table 1
Clinical characteristics of the LUSC patients used in this study.
|
TCGA cohort
|
Merge cohort
|
No. of patients
|
493
|
740
|
Age (median, range)
|
68(39–85)
|
66 (37–75)
|
Gender (%)
|
|
|
Female
|
128(26%)
|
180 (24%)
|
Male
|
365(74%)
|
560 (76%)
|
Stage (%)
|
|
|
I
|
241
|
351
|
II
|
158
|
237
|
III
|
83
|
124
|
IV
|
7
|
17
|
unknown
|
4
|
11
|
Survival status
|
|
|
OS days(median)
|
660
|
620
|
2.2. Construction of a prognostic model in the TCGA cohort
We further used the LASSO Cox regression analysis to minimize the risk of overfitting. A risk model by these 5 ferroptosis-associated genes expression profile was selected in the LASSO, thus the predictive accuracy could be improved significantly. In detail, A 5-gene signature was identified based on the optimal value of λ (Fig. S5). The optimal cut-off expression value of each gene indicated that high expression of ALOX5, DPP4, and FADS2 are correlated with a poor prognosis while NOX1 and PHKG2 are the opposite. The risk score was calculated as follows: e [0.116 * expression level of ALOX5 + 0.037 * expression level of DPP4 + (-0.306) * expression level of PHKG2 + 0.122 * expression level of FADS2 + (-0.526) * expression level of NOX1]. The patients were stratified into a high-risk group (n = 246) or a low-risk group (n = 247) according to the median value of the risk score (Fig. 2a) The high- risk group was found to be significantly associated with higher TNM stage, higher age in TCGA cohort (Table 2). In order to verify whether the model can effectively distinguish patients in different groups, we realize visualization through dimensionality reduction processing, PCA and t-SNE analysis indicated the patients in different risk groups were distributed in two directions (Fig. 2b-c) and patients with high-risk had a higher probability of death earlier than those with low-risk (Fig. 2d). Consistently, the Kaplan-Meier curve indicated that patients in the high-risk group had a significantly worse OS than their low-risk counterparts (Fig. 2e, P < 0.001). The predictive performance of the risk score for OS was evaluated by time-dependent ROC curves. The area under the curve (AUC) reached 0.67 at 1 year, 0.74 at 2 years, and 0.80 at 3 years proved the performance of this prognostic model (Fig. 2f).
Table 2
Baseline characteristics of the patients in different risk groups.
Characteristics
|
TCGA cohort
|
|
|
|
High risk
|
Low risk
|
P value
|
Gender (%)
|
|
|
0.169
|
Female
|
65 (26.4)
|
63 (25.5)
|
|
Male
|
181 (73.6)
|
184 (74.5)
|
|
Age (%)
|
|
|
0.042
|
< 60y
|
48 (19.5)
|
42 (17.0)
|
|
≥ 60y
|
196 (79.7)
|
202 (81.8)
|
|
unknown
|
2 (0.8)
|
3 (1.2)
|
|
TNM stage (%)
|
|
|
0.012
|
Ⅰ+Ⅱ
|
199 (80.9)
|
200 (81.0)
|
|
Ⅲ+Ⅳ
|
44 (17.9)
|
46 (18.6)
|
|
unknown
|
3 (1.2)
|
1 (0.4)
|
|
2.3. Validation of the 5-gene signature in the GEO and TCGA cohort
We test the robustness of the model constructed by a cohort merged from GEO dataset and TCGA database, the patients from the merged cohort were also categorized into two risk groups by the median value calculated with the same formula as that from the TCGA cohort (Fig. 3a). Similar to the results obtained from the TCGA cohort, PCA and t-SNE analysis confirmed that patients in two subgroups were distributed in discrete directions (Fig. 3b-c). Likewise, patients in the high-risk group were encounter death earlier (Fig. 3d) and had a reduced survival time compared with those in the low-risk group (Fig. 3e, P < 0.05). Besides, the AUC of the model was 0.66 at 1 year, 0.69 at 2 years, and 0.73 at 3 years (Fig. 3f) proves that this prediction model can effectively predict the prognosis of patients further.
2.4. Independent prognostic value of the 5-gene signature
Univariate and multivariate Cox regression analyses were carried out among the available variables to determine whether the risk score was an independent prognostic predictor for OS. In univariate Cox regression analyses, the risk score was significantly associated with OS in the TCGA and the merged cohort (HR = 2.844, 95% CI = 1.794–4.508, P < 0.001; HR = 1.595, 95% CI = 1.299–1.959, P < 0.001) (Fig. 4a, c). After correction for other confounding factors, the risk score still proved to be an independent predictor for OS in the multivariate Cox regression analysis (HR = 2.846, 95% CI = 1.788–4.529, P < 0.001; HR = 1.466, 95% CI = 1.187–1.810, P < 0.001) (Fig. 4b, d).
2.5. GO enrichment and KEGG pathway analyses
To elucidate the biological functions and pathways that were associated with the risk score, the DEGs were used to perform GO enrichment and KEGG pathway analyses in TCGA cohort. The result shows that various pathways including phagosome, pyrimidine metabolism, viral protein interaction with cytokine and cytokine receptor, and cytokine − cytokine receptor interaction, especially some immune-related pathways such as leukocyte transendothelial migration pathway were significantly enriched in cohort by KEGG pathway analyses (all adjusted P < 0.05, Fig. 5d, e, f). Interestingly, the DEGs were also obviously enriched in many immune-related biological processes, including leukocyte migration, humoral immune response, myeloid leukocyte migration, and leukocyte chemotaxis through GO enrichment (all adjusted P < 0.05, Fig. 5a, b, c). Taking into account the correlation between ferroptosis and immunity, enrichment analysis gives evidence that the model may also be related to immunity. Thus, in order to further confirm the above speculation, GSEA was followed performed to reveal differences between the high and low 5 ferroptosis-associated genes expression cohorts respectively. The results shows that multiple immune related pathways were significantly enriched and the most significantly enriched signaling pathways was leukocyte transendothelial migration pathway in the high ALOX5 and DPP4 and low NOX1 and PHKG2 expression phenotype that was consistent with the results of the previous KEGG pathway analyses (Fig s6, s7).
2.6. Single gene immune analysis
Because of the significantly associated with immune in our model, we wonder whether the 5 genes that involved in this model are also related to immune infiltration. The result shows that the relationship between the 5 ferroptosis-associated genes expression and immune cell infiltration (Fig. s8 a-e). At the same time, we extracted the expression levels of these 5 genes in tumors symbol respectively and divided the samples into high- and low-risk groups based on the median value, and then compared the differences in the content of each immune cell in the two groups (Fig. s9 a-e). We got the most relevant immune cells for the 5 ferroptosis-associated gene from the intersection of the two set and visualized using Venn plots (Fig. s10 a-e). Finally, the most significantly immune cells that related to each of the 5 ferroptosis-associated genes we filter out, which are ALOX5: memory B cells, resting CD4 memory T cells, follicular helper T cells, regulatory T (Tregs) cells, activated dendritic cells, M2 macrophages, activated mast cells, and neutrophils; DPP4: resting CD4 memory T cells, follicular helper T cells, M0 macrophages, M2 macrophages and, neutrophils; FADS2: M2 macrophages, activated dendritic cells, and neutrophils; NOX1: CD8 T cells, resting CD4 memory T cells, follicular helper T cells, M0 macrophages and, M2 macrophages; PHKG2: CD8 T cells, resting CD4 memory T cells, follicular helper T cells, resting natural killer (NK) cells, activated NK cells, M2 macrophages, resting dendritic cells, and eosinophils.
2.7. Immune cell infiltration in different risk groups
Previous studies have revealed that tumor-infiltrating immune cells are related to prognosis [17]. In order to find more evidence to support the correlation between the high and low-risk groups and immune infiltration, we explored the infiltration of specific tumor immune cell subsets by CIBERSORT to get the 22 immune cells' abundance in different groups respectively. The radar charts depict a comparative summary of various immune cells in two risk groups (Fig. 6). We found that M2 macrophages and resting CD4 memory T cells were significantly highly expressed in the high-risk group, while follicular helper T cells, activated NK cells, activated dendritic cells were significantly higher in the low-risk group (Fig. 7). Therefore, we analyzed differences in immune infiltration between the tumor and normal tissues for 22 immune cells using CIBERSORT further. First, we presented the proportion of each immune cell in all samples using a bar plot (Figure s11b) and then used a heat map to compare the levels of immune cell infiltration between normal and LUSC tissues (Figure s11c). And low to moderate correlation was observed in various immunocyte subpopulations (figure s11a). To further explore the correlation between the risk score and immune status, we quantified the enrichment scores of diverse immune cell subpopulations, related functions or pathways with ssGSEA. To our surprise, all of them including immune cell subpopulations such as dendritic cells, B cells, CD8 + T cells, macrophages, mast cells, neutrophils, NK cells, helper T cells, tumor infiltrating lymphocyte, and Tregs, and related functions or pathways such as antigen-presenting cells (APC) co inhibition, APC co stimulation, chemokine receptor, check point, cytolytic activity, human leukocyte antigen, inflammation-promoting, major histocompatibility complex (MHC) class I, parainflammation, T cell co-inhibition, T cell co-stimulation, type I and type II interferons (IFN) response have higher scores in high-risk groups (Fig. 8, adjusted P < 0.001). The result reminds us that in LUSC, the ferroptosis-associated gene signature we construct is significantly correlated with immune infiltration. The results shown above all remind us that there are significant differences in the immune status of the high- and low-risk groups with this model in LUSC.