Classication of Colorectal Carcinoma Based on Ferroptosis-related Gene Signature

Background: Ferroptosis is an iron-dependent form of programmed cell death, involves in the development of many cancers. However, systematic analysis of ferroptosis related-genes in colorectal carcinoma (CRC) remains to be claried. We herein analyzed the public databases to identify the molecular features of CRC by the development of a classication based on the gene expression prole of ferroptosis-related genes. Methods: We collected the gene expression data and clinical information from The Cancer Atlas and Gene Expression Omnibus to explore the correlation of ferroptosis-related gene expression of CRC. Consensus clustering was performed to determine the clusters of colorectal cancer patients, then we analyzed the prognostic value, transcriptome features, immune microenvironment, drug sensitivity, gene mutations differences of the subclasses. Results: Four subclasses of CRC (C1, C2, C3 and C4) were identied. There were signicant differences in the prognosis of patients between the four subtypes. Functional enrichment suggested that these ferroptosis related-genes were associated with biological processes such as metabolic processes and oxidative stress, and the KEGG pathway suggested that it was closely related to ferroptosis and glutathione metabolic pathway. There were signicant differences in immune cell inltrations, immune score, stromal score and tumor purity among four subclasses. The expression of PD-L1 was differentially expressed in four subclasses and PD-L1 was signicantly correlated with the expression of several ferroptosis-related genes in the differentially expressed subtypes. There were signicant differences in stemness indices and drug sensitivity in four subclasses, and gene mutations analysis showed that ferroptosis-related genes such as TP53 had high mutations in different subtypes, and there were signicant differences in mutation frequencies in the subclasses. Conclusion: This study established a new CRC classication based on ferroptosis-related genes expression proles, and different subgroups may have their unique gene expression patterns, indicating that the heterogeneity within CRC and the classication might provide valuable stratication for the design of future treatment.

the prognosis of patients between the four subtypes. Functional enrichment suggested that these ferroptosis related-genes were associated with biological processes such as metabolic processes and oxidative stress, and the KEGG pathway suggested that it was closely related to ferroptosis and glutathione metabolic pathway. There were signi cant differences in immune cell in ltrations, immune score, stromal score and tumor purity among four subclasses. The expression of PD-L1 was differentially expressed in four subclasses and PD-L1 was signi cantly correlated with the expression of several ferroptosis-related genes in the differentially expressed subtypes. There were signi cant differences in stemness indices and drug sensitivity in four subclasses, and gene mutations analysis showed that ferroptosis-related genes such as TP53 had high mutations in different subtypes, and there were signi cant differences in mutation frequencies in the subclasses.
Conclusion: This study established a new CRC classi cation based on ferroptosis-related genes expression pro les, and different subgroups may have their unique gene expression patterns, indicating that the heterogeneity within CRC and the classi cation might provide valuable strati cation for the design of future treatment.

Background
Colorectal cancer (CRC) is the third most commonly diagnosed cancer and accounts for the second leading cause of cancer death globally [1,2]. Although the current new treatments such as biotargeted therapy, immunotherapy, and precise treatment have applied for CRC treatment [3,4], the clinical outcomes are still disappointing. CRC is a heterogeneous disease [5], it is critical to uncover the underlying molecular mechanisms for the development of novel targeted therapies.
Ferroptosis is a recently de ned regulated cell death characterized by the iron-dependent accumulation of lipid peroxidation [6]. A plethora of researches have implicated that ferroptosis play an important role in carcinogenesis and that ferroptosis promises to be a novel option for the cancer treatment, especially for tumors which are resistant to conventional treatments [7,8].
In recent years, the role of ferroptosis in CRC has attracted much attention. For example, the research by Xie et al. demonstrated that ferroptosis related gene TP53 could inhibit dipeptidyl peptidase 4 activity in an independent way of transcription, which can restrain erastin-induced ferroptosis in CRC [9]. Park et al. demonstrated that bromelain effectively causes ferroptosis in Kras mutant cell lines compared to in Kras wild-type cells by modulating ACSL-4 levels [10]. Sui et al. revealed that GPX4 inhibition was a key determinant in RSL3-induced ferroptosis in CRC [11]. Xu et al reported that targeting SLC7A11 might selectively kill colorectal cancer stem cells by inducing ferroptosis and attenuate chemoresistance in CRC [12]. Although previous studies have identi ed several gene markers associated with ferroptosis in CRC. However, the relationship between ferroptosis-related genes (FRGs) and CRC treatment and prognosis is still unclear.
In the present study, we rstly collected the mRNA expression pro les and corresponding clinical data of CRC patients from public databases. Then, three cohorts were merged into a metadata set of 747 patients. We classi ed CRC into four distinct subtypes based on consensus clustering of FRGs pro les.
Furthermore, we evaluated the prognosis value, transcriptome features, immune microenvironment, drug sensitivity as well as gene mutation alterations of the four subclasses, and analyzed the correlation with the conventional clinical features. The identi cation of ferroptosis-related subclasses may can provide an important prognostic and therapeutic value for CRC.

Data collection
Colon carcinoma(n=461) and rectal carcinoma(n=172) samples with mRNA sequencing data and clinical properties were downloaded from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/). In addition, gene expression data together with clinical properties of 114 CRC samples in the GSE152430 data set was downloaded from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/). Thus, in this study three datasets were merged into a metadata set of 747 CRC samples and the combat function in the SVA R package was applied to remove the batch effects [13]. The data from TCGA and GEO are both publicly available. The current research follows the TCGA and GEO data access policies and publication guidelines.

Construction of ferroptosis subclasses in CRC.
The expression of 60 FRGs were obtained from the previous literatures[14 -17] and which are provided in Supplementary Table S1, next, we extracted gene expression pro les of ferroptosis-related genes in CRC samples and the candidate genes were used for clustering. The R package "Rcircos"[18] was applied to characterize the location distribution of FRGs in CRC samples.
Consensus cluster analysis to identify ferroptosis-related subtypes Initially, in order to perform consensus classi cation of CRC for metadate set, we used the R package "ConsensusClusterPlus", which offers stable quantitative and visual evidence for estimating the number of unsupervised clusters in a dataset [19]. In each algorithm, tumors were sampled 100 times, and the kmeans algorithm with the Euclidean distance metric was carried out. The clustering number was assessed according to the area under the cumulative distribution function (CDF) curve [20]. The R package survminer is then used to evaluate the prognostic value of subtypes [21] and a heat map was draw according to the tumor/node/metastasis (TNM) stage of the samples.
In addition, the analysis of variance (ANOVA) test was utilized to screen the differentially expressed ferroptosis-related genes (DEGs) among different subtypes. and the P values were adjusted by Benjamini & Hochberg (BH) correction (P <0.05). Then, correlation analysis and principal component analysis based on DEGs were carried out.

Functional enrichment analysis
The "clusterPro ler" R package was applied to conduct Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses based on the FRGs in CRC [22]. P values were adjusted with the BH method. The thresholds for analyses were determined by a p-value < 0.05 indicating signi cantly enriched functional annotations Immune characterizations analysis The R package "CIBERSORT" was used to evaluate the in ltration of immune cells, among ferroptosis subclasses [23]. The immune score, matrix score, and tumor purity of the sample were calculated by the R package "ESTIMATE" [24]. In recent years, cancer immunotherapy based on immune checkpoint inhibitors (ICIs) has achieved considerable success in the clinic. We extracted the expression pro le of immune checkpoint molecular (PD-L1) and FRGs in metadate set, and evaluated the differential expression of immune checkpoint molecular (PD-L1) in subclasses, then the R package "ggplot2" was used to analyze the correlation between PD-L1 and FRGs.

Calculation of stemness index
We calculate the stemness index using an innovative one-class logistic regression (OCLR) machine learning algorithm as previously described [25]. By using a linear transformation, the stemness index is mapped to the range [0,1] by subtracting the minimum value and dividing it by the maximum value.

Correlation analysis of drug sensitivity among subclasses
To identify which target compounds might be useful in the subtypes, the R package "pRRophetic" was used to predict the drug sensitivity in subclasses [26].
The analysis of genomic alterations difference According to the gistic score by GenePattern(http://cloud.genepattern.org/), and the R package "Maftools" was used to characterize the difference of mutations and copy number variations of different subclasses [27].

Independence of the FRGs subclass from CRC patients' clinical characteristics
In order to evaluate whether the FRGs subclass was independent variable when considering other conventional clinical features (age, gender, TNM stage, tissue or organ of origin, and body mass index (BMI)) in CRC patients. The correlation analysis by using RCIRCOS were performed.

Statistical analysis
The Kaplan-Meier with log-rank test was used to assess survival difference between groups. All statistical analyses were conducted using R software. If not speci ed above, the P value less than 0.05 was considered statistically signi cant.

Ferroptosis subclasses in CRC
The ow chart of this study is shown in Figure 1A. A total of 461 colon adenocarcinoma (COAD) and 172 rectum adenocarcinomas (READ) patients from TCGA database and 114 CRC patients from the GEO database were nally enrolled. We merged TCGA and GEO data and adopted the ComBat method to remove the batch effect. Next, we extracted ferroptosis expression gene pro les and found that there were 59 ferroptosis-related genes expressed in CRC tissues. Distribution of FRGs by circosplot were shown in Figure 1B.
By applying consensus clustering on the gene expression pro le of the above mentioned 59 FRGs on metadata set 747 CRC samples, four resulting clusters were de ned as C1-C4 (Figure2A, B). In addition, a high similarity of gene expression pattern was observed within each subgroup based on consensus matrix k=4(Figure2C). Survival analysis indicated a signi cant prognostic difference between four subgroups in metadate set ( Figure 2D, p=0.034). Next, we developed a heatmap demonstrating the relationship between stage and 59 genes ( Figure 2E). To validate the subclasses' assignments, we performed the correlation analysis of 59 genes, the results showed that there were co-expression correlations among some FRGs ( Figure 3A). Principal component analysis (PCA) was applied to decrease the dimension of features and we found the subtype designations were largely concordant with twodimensional PCA distribution patterns (Figure3B). Hence, it was reasonable to classify FRGs into four subclasses.

Transcriptomes of the CRC subclasses
To further characterize the differentially expressed FRGs (DEGs)in different subtypes, ANOVA differential expression analysis were performed. Gene expression differences were considered signi cant if the adjusted P value was < 0.05('BH' correction), the results indicated that all 59 FRGs were differentially expressed in all of these subtypes (Figure4).
To elucidate the biological functions and pathways of the 59 DEGs in CRC. Gene Oncology enrichment analysis of the signature genes was conducted using the CLUSTERPROFILER package, and signi cantly enriched biological processes are shown in Figure 5. GO enrichment analysis showed that these FRGs were signi cantly associated with biological processes such as metabolic processes and oxidative stress, and KEGG pathway analysis revealed that they were associated with metabolic pathways such as irondependent cell death and glutathione metabolism.

Tumor immune microenvironment characteristics among four subclasses
To uncover the immune heterogeneity among the four subtypes, we resorted to immune-related tools. Through the CIBERSORT method, we evaluated the differences in the immune in ltration of 22 immune cell types between four subclasses in CRC patients. As illustrated in Figure 6, the in ltration of 18 immune cells were different between the four subclasses. Compared with the other three subclasses, tumor in C3 had higher abundance of 7 immune cell populations (B cells, M2 macrophages, resting mast cells, monocytes, NK cells, plasma cells, CD8+T cells). In addition, C3 also exhibited low in ltration of M1 macrophages, activated mast cells, neutrophils, Tfh cells and Treg cells (Figure 6). We furtherly investigated stromal and immune scores based on the ESTIMATE method. The results indicated that tumor in C3 had higher immune and stromal scores but lower tumor purity (Figure7).

Assessment of immunotherapeutic response in subclasses
Immunotherapy have emerged as a promising strategy for the treatment of many cancers. We subsequently investigated the expression of PD-L1 which is a pivotal immunomodulator. As shown in Figure 8A, the expression of PD-L1 were different between the four subclasses, and tumor in C3 had higher expression of PD-L1 expression than other subclasses. In addition, we investigated the correlation between PD-L1 expression and FRGs in CRC patients. The results revealed that the expression of PD-L1 was signi cantly correlated with multiple FRGs such as ACSL4, ALOX5, HOMX1 and IREB2 (Figure8B,   FigureS1). Cancer stem cells are known to play a critical role in the growth, metastasis, and recurrence of tumors. Recent studies have reported that higher stemness indices could limit the antitumor immune responses and be associated with decreased PD-L1 expression [28]. We herein applied an innovative oneclass logistic regression (OCLR) machine-learning algorithm to calculate stemness indices of subclasses. As shown in Figure9, there was a signi cant difference in the stemness index among the subtypes, and we observed that patients in C3 had lower stemness indices value than other subtypes. Thus, these data support our analysis that patients in C3 may have better prognosis, and might be more promising in response to immunotherapy.
Difference sensitivity to antitumor drugs for CRC subclasses By applying the R package pRRophetic, we further explored the association between the CRC subclasses and sensitivity toward antitumor drugs. For gemcitabine, there were signi cant differences between the four subtypes(p<0.0001). For eriotinib, C3 exhibited signi cant association of eriotinib-resistant group. For bortezomib, the results showed signi cant differences between some subtypes (Figure10). In addition, other drugs such as cytarabine, dasatinib and sorafenib were explored, and signi cant differences were observed between some subtypes (Supplementary Figure). These data demonstrated the CRC subclasses might provide precise therapeutic approaches in the further.

Correlation of the CRC subclasses with mutations and CNV differences
The tumoral genomic landscape has been proven to be associated with the application and e cacy of targeted therapies and immunotherapy. We further evaluated the difference of gene mutations and copy number variation (CNV) among the four subtypes. The results revealed that distinct subtypes had different subclasses tend to have different mutational characteristics. For example,49% samples in C1 had TP53 mutation while 62% in C2 and 71% in C4 (Figure11A). What's more, we further obtained gistic scores of different subtypes according to the genepattern website, and the results indicated that there was a signi cant difference in mutation frequency between the four subtypes. These data could protect the samples in different clusters from the option of individualized therapeutic agents.

Correlation of CRC subclasses with clinical parameters
In order to elucidate whether the ferroptosis-related subclasses for CRC is independent indicator when considering other conventional clinical parameters. Based on the integrated clinical information downloaded from COAD and READ in TCGA with the clinical information obtained from GEO, we obtained the clinical information of the metadate dataset. By applying the RCIRCOS method, we compared the distribution of ferroptosis subclasses with the patients' stage, age, gender, tissue or organ of origin, and BMI. As shown in Figure 10, ferroptosis subclasses do not show signi cant crossover with other clinical parameters, and have signi cantly difference with the parameters. Hence, the subclass may serve as a potential molecular subtype which can provide an important prognostic and therapeutic value for CRC.

Discussion
Ferroptosis is a new form of programmed cell death which is different from other types of cell death, the correlation between ferroptosis and cancer is extremely complicated and inducing ferroptosis is considered a novel approach for cancer therapy [29][30][31]. However, the systematic analysis of CRC has yet to be elucidated. In this study, to obtain the classi cation of ferroptosis-related genes pro le of CRC samples, we analyzed gene expression pro le of samples in TCGA and GEO datasets, and the batch effect from different platforms were successfully removed. The consensus clustering based on our large sample size indicated that our transcriptome classi cation was robust. Then we screened 59 FRGs that could be used to cluster patients with CRC into four groups, with signi cant differences in clinicopathology and prognosis. Next, transcriptome features, immune makers and drug sensitivity of the subclasses were explored.
Recently, a few studies have indicated that several genes could regulate drug-induced ferroptosis in CRC, however, their relationship with CRC patient's clinical outcome remains unknown. In this study, we found most (59/60) of were differently expressed in CRC tissues, and each subclass with different prognosis. These results signi cantly demonstrated the potential role of ferroptosis in CRC.
Tumor microenvironment and immune cell in ltration are reported to be correlated with cancer prognosis [32]. And some studies conducted over the past ve years revealed that the ferroptosis are tightly associated with antitumor immunity [33]. We investigated the relationship between subclasses and immune markers including immune score, tumor purity, and in ltrating immune cells. As we expected, the ferroptosis subclasses could successfully distinguished the CRC patients based on the above makers, which implied the credibility of the classi cation in evaluating the immune response. Moreover, the relevant estimation demonstrated that C3 had the higher immune score, stromal score and the lowest tumor purity. These data suggested that C3 subclass was of high heterogeneity and might be refractory. In recent years, there has been great success in applying immune checkpoint inhibitors for the treatment of various cancers, the PD-L1 level is a commonly used maker for predicting the e cacy of immunotherapy. Patients with high levels of PD-L1 in tumor tissue who received PD-1/PD-L1 inhibitors had better survival outcomes compared to patients who did not receive this treatment [34,35]. Thus, we investigated the expression of immune checkpoint molecules PD-L1 and the correlation between PD-L1 and ferroptosis-related genes, the results showed that there were signi cant differences in the expression of PD-L1 among the four subgroups, and there was a signi cant correlation between PD-L1 and several genes, which further indicated that ferroptosis was signi cantly correlated with immunotherapy. Cancer stemness had been reported to be associated with suppressed immune responses across cancers. As expected in our study, we found that there were signi cant differences between subgroups. Based on the above evidence, we hypothesize that the subclass based on ferroptosis-related genes may act as reliable immune-related biomarker for cancer therapy.
As we were known, gene mutation may induce treatment resistance. In our study, genomic analysis revealed a distinct mutation and copy number change landscape among subclasses. Results indicated that distinct subclasses tended to have different mutation proportion of each gene. These data might demonstrate that our identi ed ferroptosis subtypes attached importance to gene alterations. however, the correlations between ferroptosis and somatic mutations need to be further studied.
Next, utilizing the R package pRRophetic, we tested the sensitivity of antitumor drugs, the results showed signi cant differences in drug sensitivity among the four subtypes, which suggests that there are different mechanisms involved in CRC among the four subtypes.
Then, we compared the classi cation with distribution of the conventical clinical parameters. As a result, it revealed that molecular classi cation of FRGs has no signi cant overlap with the classi cation of other clinical parameters in CRC.
The present study was a pioneering work for the classi cation of CRC based on the molecular signature of ferroptosis. However, we have to acknowledge some aws of our present work. First, a larger sample size dataset is critically needed to validate our classi cation. Second, it will be necessary to validate our classi cation in clinical samples. In addition, further basic experiments are important to understand the difference in the mechanisms of the four subtypes in CRC.

Conclusion
In conclusion, in this study we established a new CRC classi cation based on ferroptosis-related genes expression pro les, and different subgroups may have their unique gene expression patterns, and our work contributes to the insights of ferroptosis signatures in CRC and provides valuable information for personalized treatment and prognosis prediction.      Functional enrichment analysis of differentially expressed ferroptosis-related genes in subtypes.

Figure 6
The difference of immune cell in ltration abundances CRC subtypes.