Aberrantly expressed lymph node metastasis related FRGs and functional annotation
A total of 431 FRGs were retrieved from the FerrDB database, of which 186 drivers, 132 suppressors and 113 makers were collected (Table 1). After removing 43 repeats, 288 FRGs remained. The expression level of each gene transformed with log2 was calculated. The "edgeR" analysis revealed that 64 FRGs were highly expressed and 37 FRGs were lowly expressed in LUAD samples with lymph node metastasis compared with LUAD samples without lymph node metastasis (Fig. 2A). In the "DESeq" analysis, 56 FRGs were found to be highly expressed, and 42 FRGs were lowly expressed (Fig. 2B). After comparing the two analytical methods, 54 FRGs were highly expressed and 36 FRGs were lowly expressed in LUAD samples with lymph node metastases compared with LUAD samples without lymph node metastases (Fig. 2C, Table 2).
The results of Go functional enrichment analysis and KEGG pathway enrichment analysis of the above differentially expressed FRGs were displayed in Figure 3 and Table 3. GO functional enrichment analysis showed that the up-regulated FRGs were mainly enriched in autophagosome assembly, protein lipidation involved in autophagosome assembly and response to nutrient in biological processes (BP) terms (Fig. 3A), enriched in cytosol, melanosome and mitochondrion in cellular components (CC) terms (Fig. 3B), and enriched in macromolecular complex binding, ubiquitin protein ligase binding and NADP binding in molecular functions (MF) terms (Fig. 3C). The down-regulated FRGs were mainly enriched in cellular response to hypoxia, response to hypoxia and protein stabilization in BP terms (Fig. 3E), enriched in endoplasmic reticulum and inclusion body in CC terms (Fig. 3F), and enriched in enzyme binding, oxidoreductase activity and antioxidant activity in MF terms (Fig. 3G). KEGG pathway analysis demonstrated that up-regulated FRGs were mainly enriched in ferroptosis, autophagy, central carbon metabolism in cancer and mTOR signaling pathway (Fig. 3D), and down-regulated FRGs were mainly enriched in mTOR signaling pathway and metabolic pathway (Fig. 3H).
Furthermore, we constructed a protein-protein interaction (PPI) network for 90 FRGs and found that there were 87 nodes and 792 edges in this network (Fig. 3I). Using the Cytoscape plugin MCODE, we identified the top 16 most connected genes (Fig. 3J).
Patient demographics and tumor characteristics
We further compared the demographic and clinicopathological characteristics of LUAD patients with different stages of lymph node metastasis. The results revealed that there were significant differences in T staging, M staging and TNM staging among patients in different N staging subgroups, but not in terms of age, sex, tumor site, prior malignancy, prior treatment and type of treatment (Fig. 4A, Table 4). The survival analysis showed significant differences in OS between different N staging subgroups (Fig. 4B). Further pairwise comparison found that there were significant difference between N0 group and N1 group (P < 0.0001), and N0 group and N2 group (P < 0.0001), but there were no significant difference between N0 group and N3 group (P = 0.42), N1 group and N2 group (P = 0.39), N1 group and N3 group (P = 0.23), and N2 group and N3 group (P = 0.17) (Fig. 4C).
Construction and verification of prognostic signatures based on FRGs
We performed a univariate Cox regression analysis on transcripts of 90 lymph node metastasis-associated FRGs in LUAD. Genes with HR > 1 and HR < 1 were considered risk genes and protective genes, respectively, regardless of the significance level. The analystic results indicated that 11 FRGs related to lymph node metastasis were risk genes, including CISD1, DDIT4, DECR1, GCLC, NEDD4, PPP1R13L, RRM2, SLC2A1, SLC7A5, TXNRD1 and VDAC2, and 4 FRGs related to lymph node metastasis were protective genes, including GLS2, IL33, PEBP1 and PHKG2 (Fig. 4D).
Subsequently, we assessed the correlation between 15 lymph node metastasis-related FRGs using Spearman correlation analysis, and the results showed significant correlation between the following FRGs: CISD1 with DDIT4, DECR1, GCLC, NEDD4, PEBP1, RRM2, SLC2A1, SLC7A5, TXNRD1 and VDAC2; DDIT4 with SLC2A1 and VDAC2; DECR1 with PEBP1, PHKG2, SLC7A5 and VDAC2; GCLC with RRM2, SLC2A1, SLC7A5, TXNRD1 and VDAC2; GLS2 with PEBP1; NEDD4 with PPP1R13L, RRM2, SLC2A1 and VDAC2; PEBP1 with PHKG2 and VDAC2; PHKG2 with PPP1R13L; PPP1R13L with SLC2A1 and VDAC2; RRM2 with SLC2A1, SLC7A5, TXNRD1 and VDAC2; SLC2A1 with SLC7A5, TXNRD1 and VDAC2; SLC7A5 with TXNRD1 and VDAC2; TXNRD1 with VDAC2 (Fig. 4E).
To improve the predictive ability of lymph node metastasis-related FRGs in LUAD, we used the LASSO Cox regression algorithm to construct an optimal prognostic model. This approach allowed us to calculate the risk scores for patients by combining both expression levels and the risk coefficients of genes. As a result, LASSO regression constructed a 9-gene signature consisting of CISD1, DDIT4, DECR1, IL33, PEBP1, PHKG2, PPP1R13L, SLC7A5 and VDAC2.
Univariate and multivariate Cox analyses were performed to find the influencing factors of prognosis in patients with LUAD. Univariate Cox analysis demonstrated that T staging, M staging, TNM staging, prior malignancy and risk score were independent predictors of prognosis in patients with LUAD (Fig. 5A). Multivariate Cox analysis indicated that TNM staging, prior malignancy and risk score were independent predictors of prognosis (Fig. 5B). To calculate the predictive value of each individual's clinical final outcome, we evaluated 1, 3, 5 and 10 years OS based on age, gender, T staging, M staging, TNM staging, site, prior malignancy, prior treatment, treatment type and risk score. A nomogram was constructed for annual survival as well as a calibration curve for the nomogram (Fig. 5C). Notably, no significant differences were observed between the observed probabilities and the predicted outcomes for 1, 3, 5 and 10-year OS (Fig. 5D).
Based on the median risk score, the LUAD samples were divided into low-risk and high-risk groups (Fig. 5E). The time-dependent ROC curve showed that the maximal area under the curve (AUC) was 0.695, which indicated that the performance of the prediction model for 9-FRGs was relatively stable (Fig. 5F). The Kaplan-Meier survival curve demonstrated that the survival rate of the low-risk group was significantly higher than that of the high-risk group (P < 0.0001) (Fig. 5G). Fig. 5H displayed that the correlation between N staging and risk score, the results showed that the risk score of N0 group was lower than that of N1 and N2 groups (P = 0.002, P = 0.001, respectively), N1 group was lower than that of N2 and N3 groups (P = 0.037, P = 0.049, respectively), N2 group was lower than N3 group (P = 0.001), indicating that the higher the N staging, the higher the risk score. Further analysis suggested that there was no significant inclusion relationship between N subgroups and risk score groups (Fig. 5I).
Relationships between expression levels of 9 FRGs, risk score, N staging and survival
To further elucidate the relationship between the 9 FRGs, risk score, N staging and survival, we performed the following analysis and the results were shown in Fig. 6. The risk genes, including CISD1, DDIT4, DECR1, PPP1R13L, SLC7A5 and VDAC2, were inclined to be highly expressed in patients at high-risk group (P = 9.10E-05, P = 9.23E-09, P = 0.001, P = 1.21E-04, P = 3.27E-08, P = 6.66E-05, respectively). While protective genes, including IL33, PEBP1 and PHKG2, were highly expressed in low-risk group (P = 1.92E-10, P = 1.34E-10, P = 2.21E-07, respectively).The risk scores of risk genes (CISD1, DDIT4, DECR1, PPP1R13L, SLC7A5 and VDAC2) in high expression group were higher than that in low expression group (P = 8.95E-05, P = 1.03E-06, P = 9.84E-04, P = 9.25E-05, P = 7.05E-08, P = 0.004, respectively), while the risk scores of protective genes (IL33, PEBP1 and PHKG2) in high expression group were lower than that in low expression group (P = 8.01E-05, P = 0.041, P = 0.018, respectively). The relationships of expression levels of 9 FRGs and N staging indicated that the expression level of DDIT4 in N0 group was lower than that in N1 and N2 groups (P = 0.020, P = 0.013, respectively), the expression level of PEBP1 in N0 group was higher than that in N2 group (P = 0.0078), the expression level of PHKG2 in N3 group was higher than that in N0, N1 and N2 groups (P = 0.022, P = 0.021, P = 0.021, respectively), and in N2 group was higher than that in N0 and N1 groups (P = 0.00046, P = 0.036, respectively), the expression level of SLC7A5 in N0 group was lower than that in N1 group (P = 0.019).
Subsequently, we divided the patients into low expression group and high expression group according to the expression levels of 9 FRGs, and we analysed the differences of risk score and survival of different N groups. In high CISD1 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.020, P = 0.002, respectively), in N1 group was lower than that in N2 and N3 groups (P = 0.024, P = 0.011, respectively), and in N2 group was higher than that in N3 group (P < 0.001). In high DDIT4 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.034, P = 0.005, respectively). In low DDIT4 expression group, the risk score in N0 group was lower than that in N2 group (P = 0.043). In high DECR1 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.016, P = 0.007, respectively), in N1 group was lower than that in N3 group (P = 0.007), and in N2 group was higher than that in N3 group (P = 0.001). In low DECR1 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.034, P = 0.015, respectively). In high IL33 expression group, the risk score in N1 group was lower than that in N2 group (P = 0.022). In low IL33 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.045, P = 0.017, respectively). In high PEBP1 expression group, the risk score in N0 group was lower than that in N1 group (P = 0.034). In low PEBP1 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.030, P = 0.003, respectively), in N1 group was lower than that in N2 and N3 groups (P = 0.049, P = 0.040, respectively), and in N2 group was higher than that in N3 group (P = 0.001). In high PHKG2 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.015, P = 0.019, respectively), in N1 group was lower than that in N3 group (P = 0.049), and in N2 group was higher than that in N3 group (P = 0.014). In low PHKG2 expression group, the risk score in N0 group was lower than that in N2 group (P = 0.024). In high PPP1R13L expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.004, P = 0.004, respectively). In low PPP1R13L expression group, the risk score in N0 group was lower than that in N2 group (P = 0.035). In high SLC7A5 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.021, P = 0.010, respectively), in N1 group was lower than that in N3 group (P = 0.015), and in N2 group was higher than that in N3 group (P = 0.002). In low SLC7A5 expression group, the risk score in N0 group was lower than that in N2 group (P = 0.029). In high VDAC2 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.049, P = 0.002, respectively), and in N1 group was lower than that in N2 group (P = 0.021). In low VDAC2 expression group, the risk score in N0 group was lower than that in N1 and N2 groups (P = 0.008, P = 0.028, respectively).
Further survival analysis found that for the 9 FRGs, whether it was the high expression group or the low expression group, there were significant differences in the survival rates of LUAD patients with different N stages.
Differential analysis and functional analysis between low-risk and high-risk groups
The “edgeR” analysis demonstrated that, compared with low-risk group, 300 genes were highly expressed and 179 genes were lowly expressed in high-risk group (Fig. 7A). And the “DESeq” analysis showed that 19 genes were highly expressed and 32 genes were lowly expressed (Fig. 7B). In edgeR analysis, BP terms mainly enriched in RNA processing, formation of quadruple SL/U4/U5/U6 snRNP and spliceosomal tri-snRNP complex assembly, CC terms mainly enriched in nucleolus, chylomicron and U4/U6 x U5 tri-snRNP complex, MF term mainly enriched in high-density lipoprotein particle receptor binding, lipase inhibitor activity and cholesterol binding (Fig. 7C, Table 5). The most significant KEGG pathways were cholesterol metabolism, ovarian steroidogenesis and cortisol synthesis and secretion (Fig. 7D, Table 5). In DESeq analysis, BP terms mainly enriched in antimicrobial humoral immune response mediated by antimicrobial peptide, protein autoprocessing and cornification, CC terms mainly enriched in extracellular space, extracellular region and perikaryon, MF terms mainly enriched in coumarin 7-hydroxylase activity, serine-type endopeptidase activity and identical protein binding (Fig. 7E, Table 5).
Immune infiltration analysis between high-risk and low-risk groups
Since ferroptosis is closely related to immunity, we explored the differences in immune infiltration between the high-risk and low-risk groups. CIBERSORT is a classical immunoassay that calculates the level of infiltration of 22 immune cells by analyzing gene expression profiles in tissues, so we used it to analyze the composition of immune cells. Through CIBERSORT algorithm analysis, the results showed that 329 LUAD patients were included in this study (P < 0.05), including 171 low-risk patients and 158 high-risk patients. The results were presented as a heatmap and a bar plot (Fig. 8A-B). It can be seen that the composition of TIICs in high-risk and low-risk groups was basically the same, mainly composed of resting memory CD4 T cells, M2 macrophages, M0 macrophages, naive B cells, plasma cells, CD8 T cells, M1 macrophages and resting mast cells. We compared the differences in immune infiltration of 22 TIICs between the two groups by t-test (Fig. 8C). The results suggested that memory B cells, resting mast cells and resting memory CD4 T cells were shown to be more abundant in the low-risk group than in the high-risk group, but M0 macrophages, activated mast cells, activated memory CD4 T cells and follicular helper T cells showed higher infiltration in the high-risk group (Table 6).
Additionally, we found that the 9 FRGs were correlated with 21 TIICs (Fig. 8D). The correlation analysis showed a significant correlation between immune infiltration and risk scores for 5 immune cells. Specifically, M0 macrophages (R = 0.15, P = 0.0083), activated mast cells (R = 0.13, P = 0.018) and follicular helper T cells (R = 0.11, P = 0.042) were positively associated with risk scores, while resting mast cells (R = -0.21, P = 9.8E-05) and resting memory CD4 T cells (R = -0.17, P = 0.0023) were negatively associated with risk scores (Fig. 8E).
Univariate and multivariate Cox analyses were performed to find the predicted TIICs of prognosis in LUAD. Univariate Cox analysis demonstrated that the activated dendritic cells, eosinophils, activated mast cells, monocytes and gamma delta T cells were independent predictors for prognosis of LUAD (Fig. 9A). Multivariate Cox analysis demonstrated that the activated dendritic cells, eosinophils, activated mast cells, monocytes and T cells gamma delta were independent predictors of prognosis (Fig. 9B). According to the median risk score, the samples were divided into low-risk and high-risk groups (Fig. 9C). The Kaplan-Meier survival curves showed that the predicted survival time of the low-risk group was obviously longer than that of the high-risk group (P = 0.016) (Fig. 9D). Time-dependent ROC curves showed that the maximal AUC was 0.711, which indicated that the performance of the 5-TIICs signature was very stable (Fig. 9E).