A Novel Ferroptosis-Related Gene Signature For Clinically Predicting Recurrence After Hepatectomy of Hepatocellular Carcinoma Patients.

Background: The high HCC recurrence rate was the main reason for the poor prognosis after hepatectomy. In this present study, we identied a novel ferroptosis-related gene signature for clinically predicting HCC recurrence after hepatectomy. Methods: Ferroptosis-related genes were obtained from the FerrDb database. We identied the ferroptosis-related differentially expressed genes (FDEGs) between HCC tissues and normal tissues from the GSE14520 dataset. These FDEGs were used to perform a univariate and multivariate regression analysis to construct HCC recurrence models. An independent HCC cohort from TCGA database was used to validate the reliability of the multi-gene HCC recurrence model. The predictive nomogram and DCA was built to estimate the recurrence predicting capacity of the multi-gene signature. GO, KEGG and GSEA were used to further investigate the potential mechanism of these FDEGs. Results: A total of 39 FDEGs were identied. A seven-gene signature (MAPK9, SLC1A4, PCK2, ACSL3, STMN1, CDO1, and CXCL2) was constructed for HCC recurrence prediction. Patients in high-risk groups exhibited a signicantly poor prognosis compared with low-risk patients in both the training set (GSE14520 cohort) and the validation set (TCGA cohort). Multivariate cox regression analysis demonstrated that the 7-gene signature were independent risk factors for RFS in HCC patients. The nomograms incorporating 7-gene signature and clinical prognostic risk factors were able to effectively predict RFS. KEGG analysis showed that FDEGs were mainly enriched in Ferroptosis, Hepatocellular carcinoma pathway, MAPK signaling pathway, and so on. GSEA analysis revealed that the high-risk group was enriched with multiple oncology characteristics and invasive-related pathways. Conclusion: Our study constructed a seven ferroptosis-related gene signature and established a prognostic nomogram for clinically predicting recurrence after hepatectomy and offered novel research directions for personalized treatment in HCC patients.


Introduction
Primary liver cancer is one of the most common malignant tumors, in which hepatocellular carcinoma (HCC) accounts for about 85-90% of cases [1]. HCC causes more than 800,000 deaths per year and imposes a huge economic and health burden worldwide [2,3]. According to data released by the American Cancer Society in 2021, the 5-year survival rate of HCC patients for all stages was only 20% [4]. With the improvement of diagnostic capabilities, the proportion of surgical resection of HCC has increased, but those who have undergone radical resection still have a 70% recurrence rate within 5 years [5]. The high recurrence rate of HCC was mainly responsible for the death of patients. Therefore, establishing an effective model for predicting postoperative recurrence and identifying high-risk patients early, and taking the initiative to take clinical actions is of great value to improve the prognosis. The traditional recurrence prediction model integrated tumor stage, tumor size, microvascular invasion, tumor differentiation, and other relevant clinical characteristics, and supplemented by a single serum alpha-fetoprotein expression [6][7][8]. But their speci city and sensitivity were not enough to distinguish patients with heterogeneity.
Ferroptosis is a newly discovered cell death form that results from severe lipid peroxidation of intracellular iron overload and differs from apoptosis, necrosis, and autophagy in terms of morphology [9,10]. In recent years, an increasing number of studies have revealed that ferroptosis plays a non-negligible role in regulating the growth and proliferation of some types of tumors [11][12][13]. Speci cally, ferroptosis has a pivotal role in killing tumor cells and inhibiting tumor invasion and metastasis [14,15]. A previous study has shown that ferroptosis was an effective method to induce HCC cell death and has the role of block the cytotoxic effect of sorafenib on HCC [16]. At present, some ferroptosis-related genes such as NRF2, NQO1, HCAR1, MCT1, ZFP36, etc. have been validated playing the role of cancer-promoting or suppressor factors in HCC [17][18][19]. However, rarely studies focus on the predictive effects of these ferroptosis-related genes on the recurrence of HCC patients.
In this current study, we constructed an HCC recurrence model using ferroptosis-related differentially expressed genes (FDEGs) obtained from the gene expression omnibus (GEO) [20] and FerrDb database [21]. Then, we validated the reliability of the multi-gene HCC recurrence model in an independent The Cancer Genome Atlas (TCGA) cohort [22]. The predictive nomogram and decision curve analysis (DCA) was built to estimate the recurrence predictive capacity of the multi-gene signature. In addition, we investigated the correlations between the genetic alteration of the seven-gene signature and recurrencefree survival (RFS) in the cBioPortal database [23]. Finally, Gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and gene set enrichment analyses (GSEA) was used to explore the intrinsic regulating mechanisms of these ferroptosis-related genes [24,25]. On the whole, our results showed that the ferroptosis-related seven-gene signature and nomogram might help effectively predict the RFS of patients with HCC.

Methods
Gene datasets and clinical characteristics collection.
We downloaded the mRNA expression data and corresponding clinical characteristics data in the GSE14520 dataset and LIHC dataset. The ferroptosis-related gene list was obtained from the FerrDb database and literature published in PubMed. The GSE14520 dataset from the GEO database, which contains RNA sequencing and clinical data of 242 HCC patients, was used to construct the predictive model for recurrence. The liver hepatocellular carcinoma (LIHC) cohort from the TCGA database, containing gene expression and clinical data of 372 HCC patients, was used to validate the results of the predictive model. All gene expression and clinical data were obtained from the publicly available database, hence it was not required for additional ethical approval.
The R software (version 4.0.2) and built-in limma package were utilized to perform differentially expressed genes analysis using RNA sequence data between the HCC tumor tissues and paired normal tissues, which were downloaded from the GSE14520 dataset. The screening of differentially expressed genes must meet two standards: log2 fold change (FC)> 1.0 or log2 fold change (FC)< -1.0, adjusted Pvalue <0.05. Next, the overlapping gene between the ferroptosis-related genes obtained from the FerrDb database and the differentially expressed genes identi ed from the GSE14520 dataset were screened as FDEGs.
Establishment and validation of the ferroptosis-related gene signature.
We performed the univariate Cox regression and Multivariate Cox regression on FDEGs sequentially to obtain the ferroptosis-related genes and their regression coe cients. These genes expression was an independent risk factor for the RFS of HCC patients and was used to establish a prognostic risk signature. Next, the 242 HCC patients from the GSE14520 dataset were separated into high-risk and lowrisk groups based on the median Risk score, which was calculated based on the following formula: Risk score= ∑ n i = 1 vi × βi (v represent the expression value of the gene and β represent the corresponding regression coe cients.) In addition, the Kaplan-Meier survival analysis and time-dependent receiver operating characteristic (ROC) curves were used to evaluate the predictive performance of this gene signature for RFS. We further investigated the correlations between prognostic gene signature and relative clinical characteristics. Moreover, we explored the independent prognostic role of the gene signature for RFS by the univariate and multivariate Cox regression analyses. In this process, some clinical characteristics were integrated, such as age, gender, tumor stage, tumor size, serum Alphafetoprotein (AFP) level, and so on. Finally, we validated the reliability of the risk score model using an independent LIHC cohort in the TCGA database.
Establishment and validation of a predictive nomogram.
We established a nomogram that integrated all the independent risk factors identi ed from the multivariate Cox regression analysis to predict the RFS of 1-year, 3-year, and 5-year. We calculated the concordance index (C-index) using the "survival" R package to assess the predictive performance of the nomogram. We next plotted calibration curves of RFS probability at different years. In addition, the timedependent ROC curve was plotted via the "timeROC" R package to assess the performance of the nomogram. Furthermore, we performed the DCA analysis via the "ggDCA" R package to select the best model that has the highest clinical net bene t.
Genetic alteration and protein expression analysis of gene signature.
To investigate the effect of gene alterations on the aberrant expression, we queried the genetic alterations and mutation hotspot of gene signature using the liver Hepatocellular Carcinoma dataset (TCGA, Firehose Legacy) in the cBioPortal database. Then, we compared the overall survival (OS) probability, disease-free probability, progression-free survival probability, and disease-speci c survival probability between the alteration groups with no alteration groups. Furthermore, we investigated the differential expression of the seven-gene signature protein between the HCC tissues and adjacent normal liver tissues in the Human Protein Atlas database.
Functional enrichment analysis via GO, KEGG, and GSEA.
We performed the GO and KEGG enrichment analysis on the FDEGs using the DAVID database to explore the potential mechanism that these genes regulating the tumorigenesis and progression of HCC [26]. The results were visualized using the "clusterPro ler," "enrichplot," and "ggplot2" R packages.
RNA sequence (RNA seq) data in the TCGA database was selected to perform the GSEA enrichment analysis using the GSEA software (version 4.1.0). We separated the 372 HCC patients into high-risk groups and low-risk groups taking the median Risk score value as the critical point. In the process of GSEA, the KEGG gene set (c2.cp.kegg.v7.0.symbols.gmt) was selected as the functional gene set, and the number of permutations was set as 1000. Other parameters were set to default values. The adjusted p-value<0.05 and false discovery rate (FDR) q-value<0.25 were considered statistically signi cant.

Statistical analysis.
The R software (version 4.0.2) was utilized to perform the statistical analysis and plot the statistical gures. The association between the risk score with the clinicopathological characters was analyzed using Pearson's chi-square test. Univariate and multivariate cox regression analysis was used to identify the risk factors or independent risk factors of RFS. Kaplan-Meier analysis with the log-rank test was utilized to compare the RFS between the high-risk group and low-risk group. The area under the curve (AUC) of ROC was utilized to estimate the predictive performance of gene signature. P<0.05 was considered as a statistically signi cant difference.

Results
FDEGs identi cation in HCC.
We exhibited the scheme of our study in a ow chart (Fig. 1). A total of 1014 differential expressed genes, including 539 down-regulated and 375 up-regulated genes, were identi ed in 242 HCC tissues compared with 246 adjacent normal liver tissues from the GSE14520 dataset. Next, we obtained 254 ferroptosisrelated genes from the FerrDb database. Then, the overlapping 39 genes between the 1014 differential expressed genes with 254 ferroptosis-related genes were identi ed as FDEGs.
Establishment of the prognostic seven-gene signature.
We calculated the seven-gene-based risk score for each HCC patient in the training set (GSE14520). Next, the 242 patients were separated into high-risk and low-risk groups based on the median Risk score ( Fig.  2A). The Kaplan-Meier survival analysis and time-dependent ROC curves were used to evaluate the predictive performance of this gene signature for RFS. The ROC curve revealed that AUCs of 1-year, 3-year, 5-year for RFS were 0.68, 0.64, and 0.61 (Fig. 2B). Furthermore, The Kaplan-Meier survival analysis revealed that patients in the high-risk group exhibited a poorer RFS than patients in the low-risk group ( Fig. 2C). In addition, the correlation analysis demonstrated that high-risk score correlated to tumor-nodemetastasis (TNM) stage (P=0.020), serum AFP level (P<0.001), alanine aminotransferase (ALT) (0.025), predicted risk metastasis signature (PRMS) (P<0.001), recurrence (P=0.001), and death (P=0.002) (  Validation of the prognostic gene signature in the TCGA database. We download the RNA seq and corresponding clinical characteristic data in the TCGA database to validate the performance of the seven-gene signature for RFS prediction. We calculated the seven-genebased risk score for each HCC patient in the validation set (TCGA HCC cohort). The 372 patients were separated into high-risk and low-risk groups based on the median Risk score (Fig. 4A). The ROC analysis showed that the AUCs of 1-year, 3-year, 5-year for RFS were 0.69, 0.73, and 0.74, respectively (Fig. 4B).
Furthermore, consistent with the result of the GSE14520 dataset, the Kaplan-Meier survival analysis revealed that patients in the high-risk group exhibited a poorer RFS than patients in the low-risk group (Fig. 4C). In addition, the correlation analysis demonstrated that high-risk score correlated to tumor grade (P=0.026), preoperative pharmaceutical (P=0.036), T 3/4(P=0.002), lymph node invasion (P=0.001), metastasis (P=0.048), recurrence (P<0.001), and death (P=0.048) (  Establishment and validation of a predictive nomogram. We next established a nomogram that integrated all the independent risk factors including gender, risk score, and tumor stages identi ed from the multivariate Cox regression analysis to predict the RFS of 1year, 3-year, and 5-year (Fig. 6A). The C-index of the combined nomogram model was 0.872. The calibration curve representing the actual and combined model-predictive RFS of the training set (GSE14520) at 1-year, 3-year, and 5-year exhibited an accurate performance for predicting recurrence of the nomogram (Fig. 6B-D). Furthermore, the ROC analysis showed that the AUCs for 1-year, 3-year, 5-year RFS predictions were 0.824, 0.807, and 0.762, respectively (Fig. 6E). In addition, the DCA curve exhibited the best net bene t of the combined model for 1-year, 3-year, and 5-year RFS prediction compared with the individual predictive factors (Fig. 6F-H). Summarizing, these results demonstrated that the combined model of nomogram exhibited an excellent predictive ability for 1-year, 3-year, and 5-year RFS of HCC patients, which might help the clinical therapy decision.
Genetic alteration of gene signature correlated with poor survival probability.
We queried the genetic alteration of the seven-gene signature in the Liver Hepatocellular Carcinoma cohort (TCGA, Firehose Legacy) of the cBioPortal database. Among 352 HCC patients, 95 patients (27.0%) shown genetic alterations of gene signature (Fig. 7A). In addition, the patients with genetic alteration have poorer overall survival probability (P=2.98e-3, Fig. 7B), disease-free probability (P=0.0274, Fig. 7C), progression-free survival probability (P=0.0474, Fig. 7D), and disease-speci c survival probability (P=0.0118, Fig. 7E) than the patients without genetic alterations. We further investigate the protein expression of gene signature between the HCC tissues and adjacent normal liver tissues in the Human Protein Atlas database. In HCC tissues, the expression of MAPK9, SLC1A4, ACSL3, and STMN1 proteins increased, while the expression of PCK2 and CDO1 proteins decreased (Fig. 7F). However, we didn't nd the CXCL2 protein expression in the Human Protein Atlas database.

GO, KEGG enrichment analysis and protein-protein interaction (PPI) network constructions of FDEGs.
We performed the GO and KEGG enrichment analysis on the 39 FDEGs to explore the potential mechanism that these genes regulating the tumorigenesis and progression of HCC. GO analysis revealed that in the biological process, the FDEGs signi cantly enriched in the pathway of response to nutrient levels, response to the metal ion, response to starvation, response to oxidative stress, etc (Fig. 8A). In the cellular component, the FDEGs are mainly enriched in neuron projection cytoplasm, pigment granules, melanosome, etc (Fig. 8B). In the molecular function, the FDEGs are mainly enriched in protein serine kinase activity, decanoate-CoA ligase activity, etc. (Fig. 8C) KEGG analysis showed that the FDEGs signi cantly enriched in the pathway of Ferroptosis, Hepatocellular carcinoma, MAPK signaling pathway, and other cancer-related pathways (Fig. 8D). We further constructed a PPI network of these FDEGs in the STRING database and visualized it utilizing the Cytoscape software (Fig. 8E).

Discussion
Cancer, a currently urgent public health issue that should be addressed, has brought a tremendous economic and health burden worldwide [27]. HCC, a type of highly aggressive cancer, was a primary cause of cancer-related death in many areas of the world, especially in East Asia and sub-Saharan Africa [28,29]. According to data released by the American Cancer Society in 2021, the 5-year survival probability of HCC patients for all stages was only 20% [4]. With the heightened health-conscious and improved diagnostic capacity, an increasing number of HCC patients were diagnosed during the physical examinations, and those patients can be treated by curative surgery. However, studies revealed that those who have undergone radical resection still have a 70% recurrence rate within 5 years [5,[30][31][32]. The high recurrence rate was the main responsibility for the short overall survival and poor prognosis of HCC patients. Therefore, establishing an effective model for predicting postoperative recurrence and identifying high-risk patients early, and taking the initiative to take clinical actions is of great value to improve the prognosis. The conventional predictive model of recurrence integrated tumor stage, tumor size, microvascular invasion, tumor differentiation, and other relevant clinical characteristics, and some supplemented by a single serum alpha-fetoprotein expression. But their speci city and sensitivity were not enough to distinguish patients with heterogeneity. In recent years, the gene signature based on the mRNA aberrant expression has been reported to address the problems of heterogeneity and exhibited an ampli cated diagnostic sensitivity and speci city. Massive studies focus on establishing prognosticrelated gene signature models to improve diagnostic e ciency and overall survival prognosis [33]. Wang et al. established an RNA-binding proteins-related gene signature to predict the overall survival and found that this gene signature can be an independent risk factor for HCC patients [34]. Yang and colleagues constructed a two-gene signature (HNRNPA2B1 and RBM15) to identify and treat HBV-related HCC patients and has the predictive value for OS [35]. However, rarely are studies devoted to investigating the recurrence-related gene signature of HCC[36].
In recent years, it has been con rmed that ferroptosis plays a signi cant role in inducing HCC cell death and inhibiting cell proliferation and metastasis, and the ferroptosis-related genes were the carriers of this function. Previous research revealed that DAZAP1 was the ferroptosis suppressor gene and signi cantly overexpressed in HCC cells. DAZAP1 also promoted proliferation and signi cantly reduced the cellular sensitivity to sorafenib [37]. Another study reported that metallothionein (MT)-1G increases the sorafenibresistance of HCC cells by inhibiting the process of ferroptosis [38]. Furthermore, a recent study revealed that ACSL4, a positive-activating enzyme of ferroptosis, can increase the sensitivity of HCC patients to sorafenib by activating the ferroptosis [39][40][41]. Some studies focus on the predictive value of ferroptosis-related gene signature for overall survival of HCC patients, but rarely research to establish the gene signature to predict the RFS probability [42][43][44][45].
In this current study, we established a novel ferroptosis-related seven-gene signature (including MAPK9, SLC1A4, PCK2, ACSL3, STMN1, CDO1, and CXCL2) for HCC RFS prediction. The seven-gene signature exhibited an excellent predictive performance in the training set (GSE14520). The Kaplan-Meier survival analysis revealed that patients in the high-risk group exhibited a poorer RFS than patients in the low-risk group. And the correlation analysis demonstrated that high-risk scores correlated to TNM stage, serum AFP level, ALT, predicted risk metastasis PRMS, recurrence, and death. Moreover, the multivariate Cox regression analyses revealed that high-risk score was independent risk factors for RFS. All these results were also veri ed in the validation set (TCGA HCC cohort).
We next established a nomogram that integrated all the independent risk factors to predict the RFS of 1- The ow chart showing the scheme of our study on ferroptosis-related gene signatures for clinically predicting recurrence after hepatectomy of hepatocellular carcinoma patients.