Development and Validation of a Novel Prognostic Signature with 10 Ferroptosis-Related Genes in Acute Myeloid Leukemia

Acute myeloid leukemia (AML) is one of the most common hematopoietic malignancies and exhibits a high rate of relapse and unfavorable outcomes. Ferroptosis, a relatively recently described type of cell death, has been reported to be involved in cancer development. However, the prognostic value of ferroptosis-related genes (FRGs) in AML remains unclear. In this study, we found 54 differentially expressed ferroptosis-related genes (DEFRGs) between AML and normal marrow tissues. Eighteen of these 54 DEFRGs were correlated with overall survival (OS) (P <0.05). Using the least absolute shrinkage and selection operator (LASSO) Cox regression analysis, we selected 10 DEFRGs that were associated with OS to build a prognostic signature. Data from AML patients from the International Cancer Genome Consortium (ICGC) cohort were used for validation.


Conclusions
In summary, our study establishes a novel 10-gene prognostic risk signature based on ferroptosis related genes for AML patients and additionally we provide evidence that FRGs may be novel therapeutic targets for AML.

Background
Acute myeloid leukemia (AML) is characterized by a heterogeneity of molecular abnormalities and the accumulation of immature myeloid progenitors in the bone marrow and peripheral blood and represents the most common type of acute leukemia in adults [1,2]. Despite novel treatment options over the last years, the 5-year survival rate of AML patient remains unsatisfactory [3]. Between 40 and 70% of AML patients relapse and become treatment-refractory, ultimately leading to treatment failure and even death. Therefore, there is an urgent need to develop novel prognostic biomarkers to monitor the prognosis of AML patients.
Ferroptosis is an iron-dependent form of regulated cell death driven by a lethal increase of lipid peroxidation [4,5]. It has been shown to play a key role for the suppression of tumorigenesis by removing the cells de cient in key nutrients in the environment or damaged by infection or ambient stress [6].
Targeting ferroptosis is considered as a promising way for cancer patients, especially for malignancies that are resistant to traditional treatments [7,8]. However, the role of ferroptosis-related genes (FRGs) in the prognosis of AML remains unclear.
In this study, we constructed a ten ferroptosis-related differentially expressed genes (FRDEGs) prognostic signature based on the transcriptomic and clinical data of AML patients from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression Project (GTEx). Then the 10 FRDEGs prognostic signature was validated with the transcriptomic and clinical data of AML patients from International Cancer Genome Consortium (ICGC). Using functional enrichment analysis and correlation analysis, we further explored the potential molecular mechanisms involved in our signature. Finally, we performed a drug sensitivity analysis to explore potential gene targets.

Data collection
The RNA sequencing (RNA-seq) and clinical data of two AML cohorts were downloaded from public database, including 151 tumor samples of AML patients from TCGA (https://portal.gdc.cancer.gov) and 92 tumor samples of AML patients from ICGC (https://dcc.icgc.org/projects/LIRI-JP). Besides, RNA-seq data of 70 normal marrow samples were obtained from GTEx (https://www.genome.gov/). All the expression data from the three databases were normalized using the perl, respectively. The current research follows the TCGA and ICGC data access policies and publication guidelines. A total of 60 FRGs utilized in this study were obtained from the previous literature (Supplementary Table S1) [7].
Construction of a prognostic 10-gene signature The "limma" R package was used to identify the DEGs between tumor samples from TCGA and normal samples from GTEx with a false discovery rate (FDR) < 0.05. Moreover, with the help of the "limma" R package, we identi ed 60 FRGs of prognostic value and calculated their FDR using the Benjamin-Hochberg (BH) method. Protein-Protein Interaction Networks (PPI) and correlation networks of the intersecting 18 genes were generated using the STRING database (STRING: functional protein association networks (string-db.org)). Least absolute shrinkage and selection operator (LASSO) Cox regression was performed using the "glmnet" R package. The independent variable in the regression was the normalized expression matrix of candidate prognostic differentially expressed genes, and the response variables were overall survival (OS) and status of patients in the TCGA cohort. The optimum penalty parameter (λ) for the model was determined by tenfold cross-validation following the minimum criteria (i.e. the value of λ corresponding to the lowest partial likelihood deviance). The risk scores of the patients were calculated according to the normalized expression level of each gene and its corresponding regression coe cients. The formula was established as follows: score= e sum (expression level of each gene × corresponding coe cient) Patients were strati ed into high-or low-risk groups based on the median value of their risk score. Patients in the ICGC were also strati ed into a high-and low-risk group based on values derived from this formula.
Validation of a prognostic 10-gene signature Based on the expression levels of genes in the signature, we carried out Principal Component Analysis (PCA) using the "prcomp" function of the "stats" R package. Besides, t-distributed Stochastic Neighbor Embedding (t-SNE) was performed to explore the clustering of different groups using the "Rtsne" R package. Univariate and multivariate Cox regression analyses were used to identify independent prognostic factors. Receiver Operating Characteristic (ROC) curve analysis was used to predict overall survival with the R package "pROC". All statistical analyses were carried out using the R software, with P ≤ 0.05 being considered statistically signi cant.
Functional enrichment, correlation, and drug-fast analysis The "clusterPro ler" R package was utilized to conduct Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses based on DEGs (|log2FC| ≥ 1, FDR < 0.05) between the high-and low-group from TCGA cohort. P values were adjusted using the BH method. Moreover, we estimated the in ltration scoreof 16 immune cell types and the activity of 13 immune-related pathways using singlesample gene set enrichment analysis (ssGSEA) in the "gsva" R package. The annotated gene set le is provided in Supplementary Table S2. Cancer Stem Cell (CSC) correlation analysis and tumor microenvironment correlation analysis were conducted using the "limma" and "estimate" R packages. We obtained candidate drugs from the TCGA database and calculated drug sensitivity using the "limma" and "impute" R packages.
Validation of quantitative real-time PCR (qRT-PCR) 20 AML patients and 20 healthy donors (the control group) were enrolled in The First A liated Hospital of Wenzhou Medical University. The use of the bone marrow samples was approved by the ethics committee of The First A liated Hospital of Wenzhou Medical University. The written consents were signed by all the patients. The genes of this 10-gene signature were examined by qRT-PCR. In brief, TRIzol reagent was used to extract the total RNA from clinical samples. Next, using the ribo SCRIPTTM reverse transcription kit, cDNA was obtained. Using SYBR Green master mix, real-time PCR was performed in a 7500 qPCR system. Glyceraldehyde-3-phosphate dehydrogenase was used as the control. Finally, the relative level of mRNA was evaluated using the 2−ΔCt method.

Flow chart and clinical data
The ow chart of this study is shown in Figure 1. Data from a total of 151 AML tumor samples from the TCGA cohort and 92 ICGC tumor samples derived from AML patients were used. Detailed clinical characteristics of patients are summarized in Table 1.

Identi cation of prognostic FRDEGs in the TCGA cohort
We found that the majority of FRGs were differentially expressed between TCGA tumor samples and GTEx normal samples (54/60, 90%). Eighteen of these FRDEGs (presented in the Venn diagram in Figure   2B, C) were associated with OS in univariate Cox regression analysis ( Figure 2A). Using PPI network construction, we identi ed that SLC7A11, G6PD, GPX4, HMOX1, and FTH1 represented be the hub genes ( Figure 2D). A correlation analysis is shown in Figure 2E.
Construction of 10-gene signature in the TCGA cohort The above mentioned 18 prognostic FRDEGs were further ltered using LASSO Cox regression analysis, which resulted in 10 genes for further analysis ( Figure 3A, B). A risk score was calculated using mRNA expression levels and relevant coe cients of these 10 genes with the following formula: Patients in the TCGA or ICGC cohort were then divided into a high-or low-risk group according to the median cut-off value. The results of Kaplan-Meier curve indicated that patients in the low-risk group exhibited a signi cantly better OS than those in the high-risk group in both TCGA and ICGC cohort (P<0.05, Figure 3C, D). The predictive performance of the risk score for OS was evaluated by timedependent ROC curves. In the TCGA cohort, the area under the curve (AUC) reached 0.841 in year 1, 0.811 in year 2, and 0.849 in year 3 ( Figure 3E). In the ICGC cohort, the AUC was 0.634 in year 1, 0.680 in year 2, and 0.678 in year 3 ( Figure 3F). PCA and t-SNE analysis indicated that patients in different risk groups were distributed in two clusters, which was found in both TCGA cohort ( Figure 4A, B) and ICGC cohort ( Figure 4C, D). Independent prognostic value of the 10-gene signature Univariate and multivariate Cox regression analyses were carried out among the available variables to determine whether the risk score was an independent prognostic predictor for OS. In TCGA cohort, the risk score was signi cantly associated with OS in both the univariate Cox regression analyses (HR = 3.563, 95% CI = 2.513-5.051, P < 0.001) ( Figure 5A) and multivariate Cox regression analyses (HR = 3.517, 95% CI = 2.420-5.112, P < 0.001) ( Figure 5B). Similar results including both univariate Cox regression analyses (HR = 2.136, 95% CI = 1.370-3.330, P < 0.001) ( Figure 5C) and multivariate Cox regression analyses (HR = 1.969, 95% CI = 1.250-3.100, P = 0.003) ( Figure 5D) were also found in the ICGC cohort.
Functional analysis of the 10-gene signature In order to explore the potential pathways associated with this novel risk score, expression levels of DEGs between the high-and low-risk groups used for functional enrichment analysis using GSEA. In Figure 6A, the top 10 pathways were displayed. This analysis revealed that DEGs were signi cantly involved in several immune-related processes, including antigen processing and presentation, B cell and T cell receptor signaling pathways, interaction natural killer cell-mediated phagocytosis pathway. GO analysis moreover showed that DEGs were involved in the regulation of cell-cell adhesion, positive regulation of cytokine production, and extracellular matrix organization ( Figure 6B). KEGG analysis indicated that DEGs were signi cantly involved in cytokine-cytokine receptor interaction, chemokine signaling pathway, and phagosome ( Figure 6C). Notably, both GO and KEGG analysis indicated that DEGs were related to the cytokine-related pathway. In order to further explore the association between the tumor microenvironment and the risk score, we quanti ed the enrichment score of various tissue-in ltrating immune cell subpopulations and immune pathways with ssGSEA from the TCGA cohort ( Figure 7A, B). The immune score between the high-and low-risk group showed that the high-risk group had a signi cantly larger fraction of T-helper-cells as well as activation of CCR, HLA, HMC_class_l, and Parain ammation.
Correlation analysis of the 10-gene signature Using tumor microenvironment correlation analysis, we found that the risk score was positively correlated with the immune score and stromal score, which was particularly pronounced for the immune score (p<0.0001, Figure 7C, D). Moreover, Cancer Stem Cell correlation was used to further explore the relationship between the risk score and myelocytic CSCs by measuring DNA methylation pattern (DNAss) and mRNA expression (RNAss). We found that the risk score was signi cantly associated with RNAss (R=-0.18, p=0.045, Figure 7E) and DNAss (R=0.25, p=0.0063, Figure 7F).
Lastly, we conducted a drug sensitivity analysis to identify potential drugs that may have clinical advantages. We analyzed the relationship between candidate drugs and the sensitivity of each gene in the 10-gene signature. Herein, we showed the top 16 drugs ranked by p-value (p<0.05, Figure 8).
Validation of the expressions of genes in 10-gene signature in AML qRT-PCR was utilized to measure the mRNA expression levels of 10 genes in 20 AML samples and 20 normal marrow tissue samples. Our results revealed that 4 genes (CD44, DPP4, NCOA4, SAT1) were increased in AML, whilst the expressions of other genes were decreased in AML (Figure 9). Discussion AML patients have bene ted from advances in targeted molecular and immunotherapy [9,10], but the 5year survival rate of AML patients remains unsatisfactory due to high relapse rates. Strati cation of patients into high-and low-risk groups based on reliable molecular signatures may aid in selecting appropriate treatment strategies in line with precision medicine. Many studies have reported a vital role of FRGs in tumorigenesis [11][12][13]. However, the relationship between AML prognosis and FRGs remains unclear. In this study, we established a novel ferroptosis-related prognostic gene signature for AML patients, and this is a rst report. We assessed the relationship of 60 FRGs in AML tumor samples with OS and identi ed 18 FRDEGs which were differentially expressed in AML tissue compared with normal controls. Using LASSO Cox regression, we selected ten of these 18 FRDEGs for construction of a prognostic gene signature. We also compared enrichment score of in ltration of immune cells and immune pathways between high-and low-risk patients, investigated functional mechanisms using GSEA, and assessed potentially suitable drugs. This novel 10-gene signature may contribute to the improvement in the prediction of AML prognosis and patient strati cation for therapeutic strategies.
The FRGs (CD44, CHAC1, CISD1, DPP4, NCOA4, SAT1, SLC7A11, AIFM2, G6PD, and ACSF2) are included in our 10-gene signature. CD44 is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion, and migration [14]. Previously, it has been demonstrated that CD44 expression is closely related with the occurrence of tumors, including AML [15][16][17]. Stevens et al. found that CHAC1 contributes to the inhibition of AML via atovaquone [18]. Genetic inhibition of CISD1 results in iron accumulation and subsequent oxidative injury in mitochondria, thus contributing to erastin-induced ferroptosis in HCC cells [19]. Loss of TP53 prevents nuclear accumulation of DPP4 and thus facilitates plasma-membrane-associated DPP4-dependent lipid peroxidation, resulting in ferroptosis [20]. CARS1 has been included in a novel prognostic signature by Chen et al., which effectively predicts the prognosis of Clear Cell Renal Cell Carcinoma [21]. Activation of SAT1 induces lipid peroxidation and sensitizes cells to undergo ferroptosis upon reactive oxygen species (ROS)-induced stress [22]. Genetic inactivation of SLC7A11 has a synergistic effect with APR-246 for the promotion of cell death. G6PD has previously been proposed as a biomarker for AML [23]. EBF3 acts as a tumor suppressor gene in AML, and AIFM2 is related to it [24]. ACSF2 participates in the regulation of the lipid metabolism via peroxisome proliferatoractivated receptor alpha. Recently, Wang et al. constructed a FRG signature for breast cancer patients, which included ACSF2 [25]. Half of the genes included in our signature (CD44, CHAC1, SLC7A11, AIFM2, G6PD) were previously investigated in AML, whereas the other ve FRGs have not previously studied in this context.
Recently, immune in ltration has been reported to be involved in the progression of AML. For example, Luca et al. found that the bone marrow immune environment of AML patients is profoundly altered [26].
Jian et al. also found a higher level of B cell activation in AML samples than non-tumor samples [27]. NK cell can trigger the anti-leukemia responses [28] and ferroptosis has been shown to exert anti-tumor immune effects by triggering dendritic cell maturation [29]. Therefore, we explored the association between immune cell in ltration and the risk score in this study. Our data revealed that the high-risk group had a larger fractions of DCs, T cells, and NK cells. In addition, the high-risk group showed enrichment in many pathways including immune-related biological processes, such as antigen processing and presentation, B and T cell receptor signaling pathway, and NK cell mediated phagocytosis. Besides, in our tumor microenvironment correlation analysis, the risk score was positively associated with the immune score. Our ndings revealed an association between the 10-gene signature and immune cell in ltration.
In the past few decades, targeted cancer therapies have advanced rapidly. However, treatment of AML remains unsatisfactory [30]. In this study, we performed a drug sensitivity analysis to nd AML drugs that may have clinical bene ts. We found that G6PD and SAT1 were sensitive to 4 drugs (Mitomycin, ARRY-162, Cobimetinib, Irofulven), while CD44, SAT1, AIFM2, SLC7A11, and CHAC1 showed resistance to 12 drugs. These outcomes may provide further insights into treatment options for subgroups of AML patients. CSCs have been found play a crucial role in the occurrence and metastasis in AML [31]. In this study, Cancer Stem Cell correlation analysis was signi cantly associated the risk score, however, more studies are required to prove the value of our 10-gene signature with CSCs in AML.
Our study has many advantages. Firstly, we rst report a novel prognostic risk signature for AML based on ten FRGs. Secondly, we validated this 10-gene signature with clinical data. Thirdly, this signature revealed an association between FRGs and immune cell in ltration in AML. Finally, we found 4 drugs with potential have clinical advantages for AML treatment in the future. However, there are many limitations in our research. A single hallmark (ferroptosis) was used to construct a prognostic model, which may lead to the loss of many key prognostic genes of AML. In addition, the detailed roles of FRGs in AML should be further explored in the future.

Conclusion
In summary, our study establishes a novel prognostic risk signature of ten ferroptosis-related genes for AML patients. In addition, FRGs may represent novel therapeutic targets in AML.

Availability of data and materials
The datasets generated and/or analysed during the current study are available in the TCGA (https://portal.gdc.cancer.gov), ICGC (https://dcc.icgc.org/projects/LIRI-JP) and GTEx (https://www.genome.gov/). The data of 40 clinical samples in this study are available on request from the corresponding author.

Ethics approval
The studies involving human participants were reviewed and approved by the Human Research Ethics Committee in The First A liated Hospital of Wenzhou Medical University. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identi able images or data included in this article.

Consent for publication
Not applicable. Figure 1 Flow chart of the data collection and analysis.     Functional analysis of the 10-gene signature. a GSEA of DEGs between high-and low-risk groups. b GO enrichment analysis of DEGs between high-and low-groups. c KEGG enrichment analysis of DEGs between high-and low-groups.  Validation of qRT-PCR. The mRNA expression levels for each gene in the 10-gene signature was veri ed by qRT-PCR.