Identifying Eight Ferroptosis-Related Signatures for Prediction of Hepatocellular Carcinoma Overall Survival

Hepatocellular carcinoma (HCC) is characterized by widespread epidemiology and extraordinary heterogeneity, with challenging prognosis prediction. Ferroptosis is a regulatory cell death driven by iron-dependent lipid peroxidation. The main aim of this study was to determine the predictive value of ferroptosis-related genes (FRGs) in HCC. Herein, the data of HCC patients were downloaded from The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) public databases. In the ICGC cohort, a multigenic signature was constructed using the LASSO Cox regression model. Next, patients in the TCGA cohort were used to verify the reliability of the model. Results showed that 30.07% of the differentially expressed genes (DEGs) in the ICGC cohort were associated with ferroptosis. Among them, 35 genes were identied as intersected genes associated with overall survival in both cohorts. Moreover, an 8-gene signature for prediction of HCC patients was constructed and the patients were divided it into low-risk and high-risk groups. The results indicated that the overall survival (OS) of patients in the high-risk group was lower than OS of patients in the low-risk group (P < 0.001 in both cohorts). Multivariate Cox regression analysis indicated that the risk score was an independent predictor of OS (HR > 1, P < 0.001). Receiver operating curves (ROC) demonstrated the predictive power of the signature. Furthermore, functional enrichment analysis revealed the existence of signicantly correlated immune-related pathways, and their immune states were different between groups. and be used


Introduction
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death globally and the most common type of primary liver cancer [1,2]. HCC is a heterogeneous tumor, which mainly originates from chronic liver in ammation. Evidence suggests that the leading causes of HCC include chronic viral hepatitis, alcoholism, and fatty liver [3]. For early HCC, liver transplantation, surgical resection, and percutaneous ablation can achieve a 5-year overall survival (OS) as high as 70% [4]. However, the 5-year OS of patients with late-stage HCC is hardly 10% [5]. It is worth noting that HCC is usually diagnosed at an advanced stage, which is unsuitable for treatments. Prediction of HCC prognosis is also di cult due to its complex etiology and heterogeneity. Moreover, HCC is a disease with limited treatment options and frequent relapses. Therefore, this calls for identi cation of a novel predictive model for HCC.
Ferroptosis is a novel non-apoptotic regulatory cell death driven by iron-dependent lipid peroxidation. In recent years, ferroptosis has become a promising cancer treatment method because of the high iron level in cancer cells and the increasing sensitivity to induce ferroptosis [6]. In addition to drugs that induce ferroptosis, many genes have been regarded as markers or regulators of ferroptosis. Preceding studies have shown that ferroptosis modulates HCC. Some genes, such as GPX4 and FSP1, have a negative regulatory effect on ferroptosis [7,8]. Moreover, previous studies have reported that NRF2, S1R, CLTRN, and NFE2L2 could protect hepatocellular carcinoma cells from ferroptosis [9][10][11]. However, the underlying mechanisms of these ferroptosis-related genes for HCC have not yet been elucidated.
In this study, the data of HCC patients was downloaded from The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) databases. The ICGC cohort was rst used to construct a multigene prediction model, followed by validation in the TCGA cohort. Furthermore, functional enrichment analysis was performed on the intersected genes with the overarching goal of further exploring the possible mechanism.

Data Preparation
The data of HCC patients was downloaded from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) and International Cancer Genome Consortium (ICGC) (https://dcc.icgc.org/) public databases. The clinical data of 377 HCC patients and the RNA-seq data of 371 HCC patients were downloaded from the TCGA database. Next, "limma" R package was used to normalize the gene expression pro les. In addition, RNA-seq and clinical data of 231 HCC patients were downloaded from the ICGC database. Notably, this is a LIRI-JP project which derives data from the Japanese population [12]. The raw data was then normalized.
Ferroptosis-related genes (FRGs) were downloaded from the FerrDb website (http://www.zhounan.org/ferrdb/index.html) [13]. After excluding overlapping genes, the downloaded data set included the driver, suppressor, and marker for a total of 259 genes. Twenty four more FRGs were included after referring to Liu et al [14]. Furthermore, a comprehensive literature search was conducted on PubMed database, using "ferroptosis" as the keyword, to identify studies published up to 14 th July 2020. A total of 1,011 relevant articles were subjected to the inclusion and exclusion criteria as described in a previous study. In total, 409 FRGs were selected (Supplementary Table 1).

Identi cation of intersected genes
Differentially expressed genes (DEGs) between neoplastic and non-neoplastic tissues were selected using the "limma" R package. The false discovery rate (FDR) < 0.05 and log-fold change (logFC) > 1 were used to lter the DEGs. Next, the "Survival" R package was used for survival analysis. Univariate Cox analysis was then used to determine the OS of TCGA and ICGC cohorts, followed by selection of FRGs with a p < 0.05 prognosis value. DEGs associated with the prognosis of ferroptosis were rst selected in the TCGA cohort. Next, predictive genes associated with ferroptosis were screened in the ICGC cohort, followed by identi cation of the intersected genes between the two gene sets.

Protein-protein interaction (PPI) and the correlation network of intersected genes
We constructed an intersected-gene PPI network using the STRING database [15]. Notably, when the comprehensive score between proteins is greater than 0.4, it is considered statistically signi cant. Hub genes were then analyzed using the CytohHubba plug-in of Cytoscape (Version 3.8.0). The disconnected proteins in the network were excluded. Finally, a correlation network of intersected genes was constructed based on the background gene of HCC difference FRGs in TCGA, with a cut-off value of 0. 3.
Construction and veri cation of FRGs predictive signature LASSO-penalized Cox regression model was used to reduce the risk of over tting [16] , while the "glmnet" R package was used to select variables and contract the LASSO algorithm. The standardized expression matrix of intersected genes was the independent element of the regression. In the TCGA cohort, OS and survival status were the reaction variables. The penalty parameter λ of the model was de ned by the tenfold cross-validation according to the minimum criterion. The standardized expression level of each gene and its matching regression coe cient were used to calculate the risk score of patients in accordance with the following formula: score= esum (the expression of each gene × the relevant coe cient). Patients were then divided into a high-risk group and a low-risk group based on the median risk score. Principal component analysis (PCA) was then conducted using the "stats" R package for those that met gene expression signatures according to the signature gene expression. Furthermore, the "Rtsne" R package was used to analyze t-SNE by exploring the distribution of the different groups. The 'surveyor' R package was utilized to analyze the survival of each gene and 'surv_cutpoint' functions were used to determine the best-truncated expression values. Finally, time-dependent receiver operating curves (ROC) analysis was generated using the "SurvivvalROC" R package and used to assess the predictive ability of the gene signature.

Functional enrichment analysis
Comparison between the two risk score groups led to identi cation of an 8-gene signature among the DEGs. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were conducted using the "clusterPro ler" R package. Next, single gene set enrichment analysis (SSGSEA) was performed on the immune in ltration fraction and immune response pathway of immune cells using the "gsva" R package [17] . Supplementary table 2 shows the annotated documents of immune genes.

Statistical analysis
All statistical analyses were performed using R software (version 4.04). Student's T-test was used to analyze differential expression of genes, while the Chi-squared test was used to compare differences in proportions. The Mann-Whitney u test was used to compare the scores of immune cells or immune pathways. Moreover, survival curves and data were analyzed using Kaplan-Meier and log-rank tests, respectively. Univariate analysis was rst used to identify single-variable predictors, followed by multivariate analysis to construct a multivariable Cox model. P value less than 0.05 was considered statistically signi cant. Figure 1 shows the ow diagram of this research. In this study, 365 cases of TCGA-LIHC and 231 cases of ICGC (LIRI-Japan) HCC were included.

Results
Identi cation of intersected genes of the ICGC and TCGA cohort Differential analysis in the TCGA cohort showed that 123 out of 408 (30.07%) FRGs were differentially expressed between HCC tissues and their adjacent non-tumor tissues ( Fig.2a-b). In addition, 128 FRGs were associated with OS. Next, differential and prognostic genes associated with ferroptosis were intersected. Results indicated that 62 differential prognostic genes in the TCGA cohort were associated with ferroptosis. Subsequent univariate analysis in the ICGC cohort found that 35 genes were associated with OS (supplementary table 3). Furthermore, we intersected prognostic genes in the ICGC cohort with differentially prognostic genes associated with iron death in the TCGA cohort. Thirty ve genes were reserved as the intersection genes for further analysis (Fig.2c). The network of interactions among these genes showed that MAPK3, NQO1, CDKN2A, TXN, and other genes (supplementary table 4) were hub genes (Fig.2d).

Construction of a prognostic model using the ICGC cohort
The expression pro les of 35 genes were used to establish the prediction model after Lasso-Cox regression analysis, from which, eight genes were identi ed based on the optimum value of lambda ( Supplementary Fig.1a). According to the best truncated value of each gene, the survival analysis revealed that high expression of these genes was associated with unfavourable prognosis ( Supplementary Fig.1d). Notably, the adjusted p value was less than 0.05. The risk score was calculated as follows: e (0.073 * expression level (EL) of G6PD+ 0.104 * EL of MAPT+ 0.061 * EL of NRAS+ 0.007 * EL of RRM2+ 0.165 * EL of SLC2A1+ 0.093 * EL of SLC7A11+ 0.164 * EL of SRXN1+ 0.046 * EL of STMN1). Next, the patients were split into a high-risk group (n=115) and a low-risk group (n=116) based on the median critical value (Fig.3a). PCA and t-SNE analyses revealed that there was always a differential distribution of cases among the different risk groups (Fig.3b-c). Results showed that patients in the high-risk group were more likely to die compared to patients in the low-risk group (Fig.3d).
Moreover, the Kaplan-Meier curve indicated that patients in the low-risk group had a higher OS than patients in the high-risk group (Fig.3e, P< 0.001). The outcomes revealed that the area under the curve (AUC) of one year, two years, and three years were 0.706, 0.730, and 0.719, respectively (Fig.3f).
Furthermore, survival analysis of the eight identi ed genes showed that, with the exception of MAPT, the other seven genes were associated with lower OS in the ICGC cohort (P 0.05, Fig.S2).
Validation of the 8-gene signature in the TCGA cohort Survival analysis in the TCGA cohort showed that high expression of the eight genes was associated with poor OS (P 0.05, Fig.S3). To test the reliability of the 8-gene signature prognostic model (established using data obtained from the ICGC cohort), HCC patients in the TCGA cohort were further divided into high-risk and low-risk groups based on the median risk score of the ICGC cohort (Fig.4a). It was found that the validation results were similar to the results of the test model establishedconstructed using the ICGC cohort. Moreover, PCA and t-SNE analyses indicated that the distribution direction of high-and lowrisk groups was different (Fig.4b-c). The OS of patients in the high-risk group was worse than for patients in the low-risk group (Fig. 4d-e, P< 0.001). The results showed that the AUC of one year, two years, and three years were 0.788, 0.706, and 0.668, respectively (Fig.4e).

Functional enrichment analysis (FEA)
To determine the biological mechanism associated with the gene signatures, differential gene expression was analyzed between groups using GO and KEGG analyses. Results showed that most of the DEGs were enriched in many molecular functions associated with ferroptosis, such as cell growth and redox ability (P< 0.05, Fig.6a, c). There were a lot of immune-related biological processes in the ICGC cohort (P 0.05, Fig.6a). The mainly enriched biological processes or molecular functions in the TCGA cohort were immune functions, including T-helper 1 type immune response, mast cell activation, immune effector process, humoral immune response, and neutrophil activation (P 0.05, Fig.6c). KEGG analysis results indicated that HIF−1 and PPAR signal channels were mainly enriched in both cohorts (P < 0.05, Fig.6b, d).
Moreover, the relationship between risk score and immune status was explored. SsGSEA was used to quantitatively analyze the enrichment parts, related functions, or pathways of different immune cell subsets.
In the ICGC cohort, there were signi cant differences between different risk score groups with regard to the process of antigen presentation, including dendritic cells (DCs) score. There were also signi cant differences in macrophage and Th2_cells scores between different risk score groups (P< 0.05, Fig.7a-b). In addition, type II IFN response scores were lower, and MHC _ I scores was higher in the high-risk group (P< 0.05, Fig.7a-b). With regard to the macrophage score, signi cant differences were found in the TCGA cohort (P< 0.05, Fig7c-d).

Discussion
This study systematically analyzed ferroptosis-related genes (FRGs) and their relationship with the overall survival rate using 403 HCC cancer samples. A new predictive model incorporating eight FRGs was constructed and validated using an exterior cohort. Functional enrichment analysis (FEA) showed that there were multiple immune pathways associated with ferroptosis.
Although previous studies have revealed that some genes may regulate Sorafenib-induced ferroptosis in HCC, the correlation between the genes and the OS of HCC patients is still unclear. A comparison of HCC tissues and their adjacent nontumor tissues showed that 30.7% of the FRGs were differentially expressed. Among them, 62 of the differential prognosis genes were associated with ferroptosis in the TCGA cohort. These results suggest that ferroptosis may modulate the occurrence and growth of HCC, and it is possible to further use these FRGs to construct prognosis-related models. In this study, a prediction model including the following eight genes (G6PD, MAPT, NRAS, RRM2, SLC2A1, SLC7A11, SRXN1, and STMN1) was constructed. A previous study found that excessive expression of G6PD could improve lipid accumulation and advance cell proliferation, thereby contributing to lipogenesis and cell growth of HCC cells [18]. The MAPT gene can encode microtubule-associated protein tau, which is associated with a variety of neurodegenerative diseases. Previous studies reported that overexpression of MAPT was associated with feeble prognosis, and MAPT acts as an indispensable part of the bicalutamide resistance of prostate cancer [19][20][21]. Peter et al. (2019) reported that NRAS might be a predictive signal and contributes to the resistance of sorafenib to HCC [22]. RRM2 exerts an anti-ferroptosis role in HCC by sustaining GSH synthesis [23]. GLUT1 is the alias of SLC2A1. A previous study showed that HCC patients with high GLUT1 expression had poor OS [24]. imilarly, xCT is an alias of SLC7A11. Evidence indicates that xCT was upregulated in HCC and proved to be an independent factor for poor prognosis [25,26]. SSRXN1 was signi cantly upregulated in HCC samples and was shown to promote HCC tumorigenesis. Thus, SRXN1 could be a prognostic biomarker for the management of HCC. Moreover, SRXN1 attenuates oxidative stress via the Nrf2/ARE pathway [27,28]. STMN 1 is an oncogene encoding a highly conserved 18-kDa cytoplasmic phosphoprotein, which is extremely expressed in HCC and may be used as a potential biomarker for identifying patients who may bene t from MET inhibitor therapy [24]. Results showed that expression of the eight genes was signi cantly up-regulated in HCC tissues. Moreover, the Unicox regression analysis showed that the eight-gene signature was positively associated with overall survival. This result is consistent with results reported in previous research, with exception of MAPT, which has not been reported before[18, 22-26, 28]. Therefore, further physiological experiments should be conducted to verify the function of MAPT in HCC tumorigenesis.
In addition, the regulation of ferroptosis may have therapeutic potential for iron overload-related diseases [29]. The relationship between the identi ed predictive genes and ferroptosis reported in this study has been con rmed by previous studies. One study revealed that G6PD binds proteins of the cellular iron metabolism under ferroptosis stress [30]. Tau promotes iron e ux from neurons, and the loss of normal soluble tau may lead to accumulation of neurotoxic iron [31]. In cancer cells, erastin induced NRAS irondependent death of HL-60 cells [32]. RRM2 exerts an anti-ferroptosis role in liver cancer cells by sustaining glutathione synthesis [33], and SLC2A1 facilitates lipid peroxidation-dependent ferroptosis death [34]. Moreover, overexpression of SLC7A11 can promote tumor growth by inhibiting ferroptosis [35]. Sul redoxin-1 is the alias of SRXN1, Sul redoxin-1, an essential Nrf2-regulated gene that plays a critical role in ferroptosis [36]. Finally, STMN1 induced ferroptosis to control glutathione levels [37]. All the abovementioned studies suggest that these eight genes are associated with ferroptosis. However, further studies should be conducted to determine whether the ferroptosis mechanism of these eight genes acts as a function in the advancement of HCC.
Although the research on tumor susceptibility mechanism has always been a hot topic, the potential mechanistic link between tumor immunity and ferroptosis has not yet been elucidated. GO and KEGG analyses between risk groups showed that most of the DEGs were mainly enriched in immune-related physiological functions and approaches. Interestingly, signi cant differences were found in the content of immune process among different risk score groups. It can be speculated that ferroptosis may be closely associated with tumor immunity. Dendritic cells (DCs) are central regulators of the immune system and early tumor immune monitoring because they have the characteristics of sentinel cells [38]. It is worth noting that ferroptosis is often triggered by the Ras-selective lethal small molecule 3 (RSL3). However, the sensitivity of RSL3-induced ferroptosis is different in M1 and M2 macrophages [39]. Th_2 cells and regulatory T cells (Treg) are produced by naïve CD4 + T cells. T-cells can trigger ferroptosis, which in turn restricts the immune response to infection. It has been reported that GPX4-de cient CD4 + T cells develop ferroptosis in a lox-independent manner through accumulation of lipid peroxides [40].
A previous study revealed that decreased levels of iron and ferritin heavy chain (FTH) may affect the expression of MHC-I molecules, thereby leading to activation of natural killer (NK) cells [41]. Results obtained in this study showed low levels of type II IFN response across risk scoring groups. Notably, interferon-γ can cause ferroptosis in cancer cells [42]. The ndings of this study were consistent with results reported in the above-mentioned studies. Therefore, the reduced anticancer immune function of HCC may be one of the reasons for the unfortunate prognosis.
However, this study had some limitations. Firstly, retrospective data retrieved from public databases was used to establish and validate the predictive models. Therefore, there is need for a prospective study to con rm the clinical applicability of the model described in this study. Secondly, given that we may have eliminated many important prognosis genes in HCC, it is inevitable to consider only a single marker to establish an intrinsic defect of the prognostic model. Furthermore, the relationship between risk score and immune activity should be con rmed through further experiments.
In summary, this study has constructed a new prognosis model involving eight FRGs in HCC. In the derivation and validation cohort, the model was proved to be independently associated with the overall survival rate, which provided insights for predicting HCC prognosis. However, the potential mechanisms of tumor immunity and FRGs are unknown and thus further studies should be conducted to elucidate the underlying mechanisms.

Declarations
Ethics approval and consent to participate:In this study, all patients agreed to sign informed consent. The research was approved by the Ethics Committee of our hospital. Consent for publication: All authors have approved the manuscript and have agreed to submit it for publication.
Availability of data and materials:The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.