Figure 1 illustrates the study's flow diagram, displaying the sequential steps of the research process. From the TCGA COAD and READ cohorts, 548 individuals diagnosed with CRC were registered, and the GSE38832 dataset contributed an extra 122 CRC patients. The clinical parameters pertaining to these individuals have been comprehensively presented in Table S3.
Determination of ferroptosis-linked DEGs with prognostic value in TCGA cohort
Most FRGs (51/60, 85%) were expressed differentially in tumor tissues in contrast with the neighboring non-tumorous tissues (FDR < 0.05) (Fig. 2A). In the univariate Cox regression analysis, nine genes were significantly correlated with OS and ten genes were correlated with DFS (Fig. 2B). However, only eight of the nine OS-related FRGs were DEGs, only eight of the ten DFS-related FRGs were DEGs, and five DEGs overlapped between OS- and DFS-related DEGs (Fig. 2A). Therefore, 11 FRGs were preserved (AKR1C1, ALOX12, ATP5MC3, CRYAB, FDFT1, CARS1, HMGCR, PHKG2, ACO1, ZEB1, and HMOX1). The heatmap of these 11 genes is indicated in Fig. 2C. Using the STRING website, protein-protein interaction network analysis indicated an interaction among the 11 FRGs and identified HMOX1, HMGCR, CARS, and FDFT1 as the hub genes (Fig. 2D). Figure 2E presents an overview of the correlation among these genes.
Construction of prognostic models in TCGA cohort
A total of eight FRGs (AKR1C1, ALOX12, ATP5MC3, CARS1, HMGCR, CRYAB, FDFT1 and PHKG2) were significantly associated with OS and differently expressed between non-tumorous and tumor tissues. In the Lasso Cox regression analysis, we used data on these eight candidate genes to construct the OS prediction model based on the optimal value of λ (Fig. S1A-B). The risk score related to the OS is calculated by multiplying the expression value of AKR1C1 by 0.174319035494141, the expression value of ALOX12 by 0.285271626774606, the expression value of ATP5MC3 by -0.052337857381986, the expression value of CARS1 by 0.42422604060183, the expression value of HMGCR by -0.102934945875135, the expression value of CRYAB by 0.172918314244317, the expression value of FDFT1 by -0.303137061624834, and the expression value of PHKG2 by 0.252801587609397. Based on the median values of the OS-related risk score, we divided all cases of CRC into two distinct risk groups, namely high-risk and low-risk groups. Kaplan-Meier (KM) survival curves clearly depicted a notable disparity in mortality rates between the high-risk and low-risk groups (P < 0.001), with patients in the high-risk group exhibiting significantly higher mortality compared to those in the low-risk group (Fig. 3A). The OS model's predictive accuracy was assessed by analyzing time-dependent ROC curves, yielding AUC values of 0.638, 0.685, and 0.696 at 1-year, 3-years, and 5-years, respectively (Fig. 3B). The distribution of OS status among patients was presented in Fig. 3C-D. Patients with a high-risk score demonstrated a greater likelihood of early mortality compared to those with a low-risk score. By analyzing the TCGA dataset, we conducted additional research on the connection between FRG expression and clinical features in CRC patients. Our findings revealed a robust correlation between the novel FRG-centered survival model and the grade of the tumor, along with the stages of T, N, and M (Fig. S2). Furthermore, the utilization of principal component analysis (PCA) and t-SNE examinations exhibited distinct divisions in the arrangement of individuals across various risk categories (Fig. 3E-F).
In the univariate Cox regression assessments, ten out of the sixty FRGs were associated with DFS, with eight of them showing differential expression between tumor and non-tumorous samples (AKR1C1, ALOX12, ATP5MC3, CRYAB, FDFT1, ACO1, ZEB1, and HMOX1). The initial five genes overlapped with the ones in the OS prediction model (AKR1C1, ALOX12, ATP5MC3, CRYAB, and FDFT1). In the Lasso Cox regression assessment, eight genes were identified to establish the DFS prediction model on the basis of the optimal value of λ (Fig. S1C-D). The risk score related to DFS is calculated by multiplying the expression value of AKR1C1 by 0.311406604284949, the expression value of ALOX12 by 0.0232175209855106, the expression value of ATP5MC3 by -0.162721918280564, the expression value of HMOX1 by -0.292725678027233, the expression value of ACO1 by -0.243593906594487, the expression value of CRYAB by 0.152337516487412, the expression value of FDFT1 by -0.179806659564686, and the expression value of ZEB1 by 0.0910633652660188. The median values of the DFS-related risk score were utilized to stratify CRC cases into high-risk and low-risk groups. According to the KM survival curves, the low-risk group exhibited a considerably longer duration of DFS in comparison to the high-risk group (P < 0.01) (Fig. 4A). The predictive performance of the DFS-linked biosignatures was evaluated using ROC curves, which yielded AUC values of 0.629, 0.731, and 0.747 at 1-year, 2-year, and 5-year, respectively (Fig. 4B). Figure 4C-D presented the distribution of DFS status among patients in the TCGA dataset. The application of PCA and t-SNE analyses revealed clear differentiation in the distributions of patients across the various risk groups, represented along two dimensions (Fig. 4E-F). Moreover, the TCGA cohort exhibited a positive correlation between higher risk scores and advanced tumor grade as well as TNM stage (Fig. S3).
Verification of the two prognostic models in independent CRC cohorts
To validate the OS prediction model, we initially focused on the GEO dataset. Using the same formula as described earlier, we categorized patients from the GSE38832 cohort into two distinct categories according to the middlemost figure of their hazard assessments. Consistent with our previous observations, the low-risk group demonstrated a notable increase in the duration of survival (Fig. 5A, p < 0.01) and a higher probability of long-term survival (Fig. 5D). Likewise, the AUC of the 8-gene biosignature was 0.572 at 1-year, 0.626 at 3-year, as well as 0.600 at 5-year (Fig. 5B). Additionally, PCA as well as t-SNE analyses confirmed distinct distributions of patients with different risk scores in separate directions (Fig. 5E-F).
Additionally, we performed verification of the DFS-associated model utilizing the GEO dataset. Following a similar approach to the OS model verification, we calculated the DFS risk score by employing the formula established with the TCGA cohort. In line with our earlier discoveries, the group with low risk exhibited considerably longer survival duration in comparison to the high-risk group (Fig. 6A, p < 0.01) and In line with our earlier discoveries, the group with low risk exhibited considerably longer survival duration in comparison to the high-risk group (Fig. 6D). The AUC values of the 8-gene biosignature were 0.724, 0.713, and 0.752 at 1-year, 3-year, and 5-year, respectively (Fig. 6B). Furthermore, the PCA analysis confirmed distinct distributions of individuals from the two risk groups along separate axes, indicating their divergent patterns. Likewise, the t-SNE analysis yielded satisfactory results in effectively distinguishing the two groups, further supporting their separability.
Independent prognostic value of the two prediction models
To determine the potential of the risk score as an independent prognostic index, we conducted univariate and multivariate Cox regression analyses using the available parameters in the TCGA cohort. In terms of overall survival (OS), the results of the univariate Cox regression analysis confirmed a significant association between the risk score and OS (HR = 2.747, 95% CI = 1.859‒4.059, p < 0.001) (Fig. 7A). Furthermore, the risk score persisted as a significant and independent predictor of OS in the multivariate Cox regression analysis even after adjusting for other confounding factors (HR = 1.961, 95% CI = 1.316‒2.921, p < 0.001) (Fig. 7B). For DFS, the risk score was remarkably linked to DFS in both the univariate (HR = 3.642, 95% CI = 2.293‒5.784, p < 0.001) and MultiCox assessments (HR = 3.061, 95% CI = 1.847‒5.072, p < 0.001), thus establishing its status as an independent predictor of DFS (Fig. 7C-D).
Functional enrichment analysis
We employed the "clusterProfiler" package in R to conduct KEGG and GO enrichment analyses based on the differential expression of FRGs between tumor and non-tumor tissues. The results illustrated that the remarkably enriched GO terms for BP were cofactor metabolic, metabolism of sulphur compounds, and response to oxidative stress; for CC, the terms were mitochondrial matrix, cell projection membrane, and peroxisomal membrane; for MF, the terms were coenzyme binding, ligase activity, and oxidoreductase activity (Fig. 8A). Next, the KEGG analysis results showed that ferroptosis, glutathione metabolism, fluid shear stress, atherosclerosis, and microRNAs in cancer were enriched (Fig. 8B).
To investigate the biological functions and signaling pathways associated with the risk score, we analyzed the differences in gene expression between the low-risk and high-risk groups, which were predicted by our two prognostic models. These differences were then utilized to perform GO and KEGG GSEA. The significantly enriched terms were generally consistent in the two prognostic models (Fig. 9). GO functional assessment demonstrated that genes with upregulated expression in the high-risk group were involved in cellular response to vascular endothelial growth factor stimulus, negative modulation of muscle cell apoptotic process, site of polarized growth, growth factor binding, and activin receptor signaling pathway (Fig. 9A, C). As expected, KEGG analysis demonstrated that genes with upregulated expression in the high-risk group were abundant in several signaling pathways which were essential for carcinogenesis, including WNT, mammalian (mechanistic) target of rapamycin (mTOR), MAPK, and HEDGEHOG signaling pathway (FDR < 0.05 and |NES|>1, Fig. 9B, D). Meanwhile, terms enriched in the low-risk group are indicated in Fig. 9 to better illustrate the underlying biological mechanisms of our prediction models. GO functional analysis indicated involvement of genes with upregulated expression in the low-risk group in the modulation of DNA damage by the p53 class mediator, NADH dehydrogenase complex, oxidoreductase complex, metal cluster binding, and NAD and NADPH binding (Fig. 9E, G). Interestingly, terms associated with immune reaction were also enriched in the low-risk group predicted by our DFS model, such as proteasome, antigen binding, and immunoglobulin receptor binding (Fig. 9G), which indicates the immune status of the cancer cells is a critical factor for tumor progression. Furthermore, genes with upregulated expression in the low-risk group were abundant in several signaling pathways that are related to peroxidation processes, such as the citrate cycle, glutathione metabolism, p53 signaling pathway, and cysteine and methionine metabolism (FDR < 0.05 and NES > 1, Fig. 9F, H).
Previous studies have established a link between immune status and ferroptosis[13]. To further explore this relationship, we employed ssGSEA to assess the enrichment scores of diverse immune cell subpopulations, correlated functions, and pathways. The immune scores varied between the high- and low-risk groups predicted by the DFS model (Fig. S4 A-B). However, this trend was not evident in the OS model (Fig. S4 C-D). Most of the immune cells consisting of aDCs, B cells, CD8 + T cells, dendritic cells (DCs), neutrophils, NK cells, Tfh, Th1 cells, Th2 cells, TILs, and Tregs exhibited a higher score in the low-risk group. Additionally, all immune-related functions displayed a higher score in the low-risk group, except for the type II IFN response.
Construction of nomograms for predicting prognosis of CRC
To estimate the 1-, 3-, and 5-year overall OS rates, we developed a nomogram that integrated the 8-gene ferroptosis signature with clinicopathological parameters, including stage, age, sex, and TNM. Each factor was assigned points in accordance with its contribution to survival risk, as depicted in Fig. 10A. Figure 10B-D showed strong agreement between the observed and projected survival rates, especially for the 3-year survival, as indicated by the calibration plots. As an example, a 70-year-old female patient with stage III (100 points), stage_T3 (70 points), stage_M1 (85 points), stage_N1 (0 points), and a high-risk score (10 points) obtained a total of 367.5 points. The corresponding 1-year and 3-year survival rates were approximately 55% and 25%, respectively. Furthermore, we created a nomogram to evaluate the likelihood of DFS at 1- and 3-year intervals (Fig. 11A). The calibration plots showed a strong concordance between the observed and predicted survival rates, with notable accuracy observed for the 3-year survival period (Fig. 11B-D).
Candidate drugs for CRC patients based on risk models
Using the "limma" package in R, we identified DEGs between the low- and high-risk groups in CRC (FDR < 0.05, |logFC| > 1). Subsequently, we evaluated the possibility of using these DEGs as targets for drugs in CRC by analyzing the substances accessible in the CMap online database. From the 2,428 compounds examined, drugs with unfavorable scores were chosen as potential medications for CRC patients (Table 1, Table 2). Based on the OS prognostic model, statistical analysis identified fourteen potential medications, while the DFS prognostic model yielded two potential medications with a cut-off score below − 80. The chemical structures of these compounds were presented by Fig. S5. Afterwards, we evaluated the levels of activity of the potential medications using the CellMiner online data source. Among the sixteen potential medications, only six possessed an NSC identifier, whereas the remaining ten were devoid of details in the CellMiner online database. Fig.S6 exhibited the Z scores of drug activities in the NCI60 cell lines, emphasizing solely the drugs with Z scores falling within the 1.2 range. These drugs included Fulvestrant, Mifepristone, Zalcitabine, and ABT-737.
Table 1
Candidate drugs identified by the OS model using the CMap database
Name | Score | Description | Target |
Gestrinone | -93.77 | Progesterone receptor antagonist | PGR, AR, ESR1 |
Clofibrate | -91.24 | PPAR receptor agonist | PPARA, LPL |
Trioxsalen | -89.15 | DNA synthesis inhibitor | - |
Arecaidine | -87.67 | Acetylcholine receptor agonist | CHRM1, CHRM2, CHRM3, CHRM4 |
Ethisterone | -86.63 | Progestogen hormone | TYR |
NSC-94258 | -86.58 | Antineoplastic | AKR1B1, CYP19A1, HSD17B1 |
Huperzine-a | -85.65 | Acetylcholinesterase inhibitor | ACHE |
Phensuximide | -85.18 | Succinimide antiepileptic | - |
Benzanthrone | -84.54 | Aromatic hydrocarbon derivative | - |
Zalcitabine | -83.62 | Nucleoside reverse transcriptase inhibitor | - |
Triacsin-c | -83.52 | Adrenergic receptor antagonist | ACSL1 |
ABT-737 | -83.16 | BCL inhibitor | BCL2, BCL2L1, BCL2L2 |
AM-281 | -82.65 | Cannabinoid receptor antagonist | CNR1, CNR2, GPR55 |
Fulvestrant | -81.51 | Oestrogen receptor antagonist | ESR1, EPHX2, ESR2, GPER1 |
Table 2
Candidate drugs identified by the DFS model using the CMap database
Name | Score | Description | Target |
Mifepristone | -93.97 | Glucocorticoid receptor antagonist, Progesterone receptor antagonist | PGR, NR3C1, AR, CYP2B6, CYP2C8, CYP3A5, CYP3A7, NR1I2 |
Dihydro-7-desacetyldeoxygedunin | -81.65 | HSP inhibitor | HSP90AA1 |
In vitro validation and identification of FDFT1 as a promising treatment target in CRC
We systematically quantified the expression levels of the five key genes (AKR1C1, ALOX12, FDFT1, ATP5MC3, and CRYAB) in 15 CRC samples and their neighboring non-tumorous tissues. PCR results revealed that AKR1C1, ALOX12, and CRYAB were upregulated in CRC tissues, while FDFT1 and ATP5MC3 were downregulated in CRC tissues (Fig. 12), which was congruent with the data of the heatmap (Fig. 2C). Data from the cBioPortal suggested that the mutation rate of FDFT1 among CRC patients was 6%, which was the highest among the five key genes, and most of the mutations were deep deletions (Fig. S7). Furthermore, immunohistochemical analysis results from the Human Protein Atlas (HPA) web data resource confirmed that the expression levels of FDFT1 protein were decreased in patients with CRC (Fig. 13A). Based on the above-mentioned results, we performed further in vitro experiments on FDFT1. The results of western blot demonstrated that the expression level of FDFT1 in CRC cell lines (HCT8, HCT116, SW480, SW620, SW1463, RKO and HT29) was significantly lower relative to that in the normal cell line NCM460 and FHC (Fig. 13B), which suggested that FDFT1 might function as a suppressor of CRC progression. To assess further the role of FDFT1 in CRC, SW480 cells were transfected with FDFT1 overexpression plasmid to overexpress FDFT1, and HCT8 cells were transfected with FDFT1-specific (si-FDFT1) to silence FDFT1(Fig. 13C-D). CCK-8 along with Transwell analyses were conducted to validate the prospective suppressor role of FDFT1. Results showed that forced overexpression FDFT1 repressed the proliferation, migration, as well as invasion of CRC cells, whereas FDFT1 knockdown demonstrated the opposite effect (Fig. 13E-H). Next, as Erastin is a widely used ferroptosis inducer, we further explored the potential role of FDFT1 in regulating Erastin sensitivity in CRC. As indicated in Fig. 14A-B, forced expression of FDFT1 enhanced the sensitivity to Erastin, whereas FDFT1 deficiency caused diminished sensitivity to Erastin, with an increase in cancer cell survival rate at diverse drug concentrations compared with control cells. Moreover, intracellular total iron level, Fe2+ levels as well as MDA levels were remarkably increased after upregulating FDFT1, whereas silencing FDFT1 showed the opposite results (Fig. 14C-H). GSEA analysis indicated that FDFT1 expression was linked to iron-sulfur cluster binding (Fig. 14I-J). Western blotting analysis confirmed that the ferroptosis indicator prostaglandin-endoperoxide synthase 2 (PTGS2) was upregulated and the Iron-Sulfur Cluster Assembly Enzyme (ISCU) was downregulated following FDFT1 overexpression, whereas the si-FDFT1 group showed the opposite trend, suggesting that FDFT1 may promote ferroptosis in colorectal cancer by regulating ISCU expression (Fig. 14K-L). Lastly, since immune status was different between the low- and high-risk groups, we analyzed the relationship of the FDFT1 expression with TILs (tumor-infiltrating lymphocytes) using the TIMER (Tumor IMmune Estimation Resource) algorithm web platform (https://cistrome.shinyapps.io/timer/). As indicated in Fig. 14M, FDFT1 expression level was positively associated with invading levels of CD4 + T cells (r = 0.221, P = 2.28e-04), CD8 + cells (r = 0.124, P = 3.92e-02), DCs (r = 0.133, P = 2.79e-02), and neutrophils (r = 0.13, P = 3.08e-02) in CRC.