New prognostic models in glioblastoma: based ferroptosis-related genes and immune scores

Background: Glioblastoma multiforme (GBM) has a high degree of malignancy and the clinical outcomes is dismal. Ferroptosis is critical to the development and progression of many diseases, such as cancer, cardiovascular diseases and aging. This study was designed to establish a sensitive prognostic model based on ferroptosis-related genes and immune scores to predict overall survival (OS) in patients with GBM. Methods: The expression of genes and associated clinical parameters was obtained from the publicly available TCGA , CGGA and GEO database. According to the immune scores, patient samples were assigned into two groups. Their biological function analyses were performed through differently expressed genes. By means of LASSO, unadjusted and adjusted Cox regression analyses, this predictive signature was constructed and validated by external databases. Results: A total of 4 ferroptosis-related genes (HMOX1,HSPB1,STEAP3,ZEB1)were ultimately screened as associated hub genes and utilized to construct a prognosis model. Then our constructed riskScore model signicantly passed the validation in the external datasets of OS (all p < 0.05). Receiver operating characteristic (ROC) curve analysis was conducted. Finding the area under the ROC curves (AUCs) were 0.82 at 1 years, 0.75 at 3 years, 0.67 at 5 years. Functional analysis revealed that immune related processes were different between two risk groups.We also explored its association with immune inltration. Conclusion: Our study successfully constructed a prognostic model containing 4 hub ferroptosis-related genes for GBM, helping clinicians predict patients’ OS and making the prognostic assessment more standardized. Future prospective studies are required to validate our ndings. Cox regression analyses were carried out to identify 4 OS-related genes (HMOX1,HSPB1,STEAP3,ZEB1). Then, based on the TCGA data, we established a four ferroptosis-related genes model to predict the prognosis of GBM, and veried its prediction eciency in the GEO cohort and CGGA cohort. In these three cohorts, the survival analysis of the low-risk and high-risk groups showed that the high-risk group had a worse prognosis compared with the low-risk group. Functional analysis shows that immune cells and immune-related functions and pathways have been enriched.


Introduction
Glioblastoma multiforme (GBM) which referred to as World Health Organization (WHO) grade IV glioma, is a rapidly proliferating and an untreatable brain cancer [1]. The average annual incidence rate of GBM was about 3 in 100,000 people [2]. Despite recent therapeutic advances, management of GBM remains di cult. Therefore, to help with the clinical practice of GBM patients, there is a pressing need to develop effective prognostic models for predicting the overall survival (OS). Ferroptosis was a newly found cell death pathway which is different from traditional necrosis, apoptosis or autophagic cell death [3]. Although there is accumulating evidence suggests the association between ferroptosis and tumors, its precise molecular mechanisms and physiological function underlying ferroptosis remain elusive [4][5][6][7]. A great deal of researches con rms that ferroptosis has great therapeutic potential for cancer, particularly those with malignancies that are inherently not sensitive to traditional therapies [8]. Previous studies have found that immune scores had a strong, signi cant association with OS [9 , 10]. The presence of tumor-in ltrated immune cells have been largely reported across GBM [11]. Furthermore, immune scores were calculated according to gene expression data to represent immune signatures.
In the present study, RNA expression pro ling data and corresponding clinical data of GBM patients from TCGA data portal (https://portal.gdc.cancer.gov/) was performed. Then, we established an accurate and effective prognostic prediction model with differentially expressed genes (DEGs) in the TCGA cohort. Subsequently, the risk-score model based on this 4-gene signature was veri ed in CGGA and GEO cohorts. Moreover, we carried out functional enrichment analysis to investigate the potential mechanism mechanisms associated with the identi ed genes.

Data acquisition
The mRNA-Seq data and the corresponding clinical information for GBM were downloaded from the publicly available TCGA , CGGA and GEO database. 169 GBM tumor samples were downloaded from the TCGA website (https://portal.gdc.cancer.gov/repository). 115 tumor samples were downloaded from the CGGA database (http://www.cgga.org.cn/) and 225 samples were obtained from the GEO database(https://www.ncbi.nlm.nih.gov).
Immune scores were calculated following the method as described before [12]. In brief, Use Expression data (ESTIMATE) algorithm to calculate immune scores based on gene expression data. The immune score is used to analyze the level of in ltrating immune cells. After gene expression pro les of normal samples were ompared with these of other normal cells, the overlap that onstituted the immune signature was obtained, which represented the in ltration of immune cells in tumor tissue [12]. Then, GBM patients were subdivided into two groups based on the median of immune scores. The collected clinicopathological data are summarized in Table 1. Then, 60 ferroptosis-related genes were summed from the previous paper (Table S1) [13][14][15].
Establishing and validating the prognostic signature Patients were subdivided into high(n=61)and low (n=62)groups according to immune scores. For all statistical analyses the software package R was used. Gene expression levels were normalized using edger (R package). DEGs was identi ed by the limma R package in high vs low immune score group with FDR < 0.05 and |log2FC| ≥ 1 in the TCGA cohort [16]. Prognostic Values of DEGs were evaluated by univariate Cox analysis, using this approach, 5 genes which had strong association were signi cantly related to the OS of GBM patients were identi ed. To minimize over tting, lasso-penalized Cox regression analysis was applied [17]. Tenfold crossvalidation via minimum criteria were applied to tuning parameter (λ) selection. Based on the resulits of multivariate Cox regression analysis, survival genes were selected out to establish a prognostic model. A risk score formula was de ned according to each gene expression level and a speci c coe cient. Using the model, every GBM patient was assigned a risk score. The prognostic signature as risk score = (beta1 * gene1) + (beta2 * gene2) + (beta3 * gene3) + (betan * genen). Based on their corresponding median risk scores, these patients are further divided into high-risk and low-risk groups. Principal component analysis (PCA) was carried out using the PCA function in R. Besides, t-Distributed Stochastic Neighbor Embedding (t-SNE) was then carried out by the Rtsne R package and results were visualized by the ggplot2 R package [18][19][20]. Kaplan-Meier survival analysis was achieved by 'surv_cutpoint' function of the 'survminer' R package, and the package was used to determine the optimal cutoff value of risk scores [21 , 22]. Additionally, the discrimination ability of the gene signature was evaluated using a time-dependent receiver operating characteristic (ROC) curve by employing an R package "survivalROC" [23].

GO and KEGG functional enrichment analysis
The R package " clusterPro ler " was used to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses of DEGs.The p-values were then adjusted using BH method. The single-sample gene set enrichment analysis (ssGSEA) method in the Gene Set Variance Analysis (GSVA) package of R software was used to calculate the in ltration score of 16 immune cells and the activity of 13 immune-related pathways.

Results
Identi cation of prognostic ferroptosis-related DEGs and building a prognostic model The dataset included a total of 509 GBM samples, 394 of which were used as validation samples. Patients' clinical characteristics are summarized in Table 1. More than half of the genes(39/60) were differentially expressed between two immune score groups. 17 genes that were signi cantly related to the OS of GBM patients were identi ed by bioinformatics analysis. A total of 5 prognostic ferroptosisrelated DEGs(ACACA,HMOX1,HSPB1,STEAP3,ZEB1) were preserved (Fig.1a-c). The relationship between these genes is presented in Fig.1d. Then, the least absolute shrinkage and selection operator (LASSO) method was utilized to screen signi cant DEGs from 5survival-related genes mentioned above and to avoid over tting the model. 10-fold cross validation with minimum criteria was employed to nd an optimal λ, where the nal value of λ gave minimum cross validation error. A 4-gene signature was identi ed based on the optimal value of λ (Fig.2). Using the following formula, the risk score was calculated: 0.3839* expression level of HMOX1+0.5187* expression level of HSPB1 +0.1811* expression level of STEAP3 +0.2649* expression level of ZEB1. A median cutoff value was applied to stratify patients into a high-risk group (n = 61 ) and a low-risk group (n =62) (Fig.3a). The PCA and t-SNE showed that patients in two risk groups were signi cantly distributed in different directions by the TCGA cohort (Fig.3b-c). As shown in Fig.3d, high-risk group patients is closely related to the high risk of mortality. The same, patients predicted to be in the low-risk group had a higher probability of survival. Kaplan-Meier survival analysis showed similar outcomes (Fig.3e,p<0.01). In order to verify the predictive ability of the risk scoring model. Receiver operating characteristic (ROC) curve were carried out to evaluated the predictive performance of the prognostic model. the AUC of the prognostic risk assessment model for the 4 ferroptosis-related genes was 0.82 at 1 years, 0.75 at 3 years, 0.67 at 5 years (Fig. 3f). Calibration plots can predict probabilities based on actual observed risks, so use calibration plots to assess the calibration of the model. The calibration curve of survival probability for the TCGA dataset showed that the calibration curve was close to the ideal curve, indicating that the model has good prognostic effect.Then, the difference between actual and predicted risk was compared, the study showed good concordance between the predicted and actual outcome (Fig. 3g-h).
In order to evaluate the effectiveness of the prognostic model constructed using data from the TCGA cohort, its performance was also assessed using validation cohorts. The risk score of each patient was calculated according to the same formula and patients were classi ed into the high-or low-risk groups according to the same cutoff value (Fig.4a-b). In the CGGA cohorts, we obtained the similar results to the training set. PCA and t-SNE analyses also demonstrated that patients in the two groups were distributed in discrete directions (Fig.4c-d). Similar, The OS time of patients in the high-risk group was signi cantly worse compared to the low-risk group (p < 0.05) (Fig.4e). Besides, in the CGGA cohort ,the AUC of the 4-gene signature was 0.82 at 1 year, 0.75 at 3 years, and 0.67 at 3 years (Fig.4f). On a calibration graph, the validation groups revealed a moderate calibration (Fig.4g-h), Meanwhile, our results are veri ed in the GEO dataset, a similar outcome was obtained.
Independent prognostic value of the 4-gene signature Additionally, To identify the prognostic factors and rule out other confounding factors in GBM patients, we performed the univariate and multivariate cox regression analyses. The results showed that the risk score was an independent prognostic factor, and the highrisk group had a worse OS compared with the low-risk group (

Functional analyses
GO was carried out in 3 functional ontologies: biological process (BP), cellular component (CC), and molecular function (MF). Based on GO analysis, the DEGs were most enriched in biological processes, and speci cally in immune related processes(TableS2). Accumulated evidence has revealed that immune in ltration plays an essential role in GBM [24]. In addition, KEGG pathway analyses indicated that these DEGs were highly enriched in ferroptosis, fatty acid metabolism and cytokine-cytokine receptor interaction signaling pathway (P. adjust < 0.05). To further explore relationship between immune cell in ltration and risk scores in the prognostic risk mode. Implementation of ssGSEA the in ltration levels of immune cell types, functions, pathways in the cancer samples were quanti ed by ssGSEA in R package gsva. Interestingly, the score of Macrophages, Tfh, TIL, Treg and T helper cells were signi cantly different between high-and low-risk groups (Fig.5a). Moreover,APC_co_stimulation,CCR,Checkpoint,Cytolytic_activity,HLA,In ammation−promoting,Parain ammation,T_cell_co−stimulation were the statistically different between the two risk groups (Fig.5b,all adjusted P<0.05). Similar results were obtained in validation group, which was consistent with the ndings in the GO and KEGG analysis Discussion It is becoming increasingly evident that interactions between cancer cells and the TME are closely linked to patient outcomes [25]. In the present study, GBM patients were divided into two different groups(high and low immune score group) based on their immune scores. A few previous studies reported an association between ferroptosis-related gene and GBM. However, it remains unknown whether ferroptosis-related genes are associated with overall survival of GBM patients. we systematically analyzed the expression of 60 ferroptosis-related genes and the associations with OS[6 , 26 , 27]. Our results showed that 39 ferroptosis-associated genes were differentially expressed between different immune score groups in the TCGA cohort. Further LASSO regression analysis and multivariate Cox regression analyses were carried out to identify 4 OS-related genes (HMOX1,HSPB1,STEAP3,ZEB1). Then, based on the TCGA data, we established a four ferroptosis-related genes model to predict the prognosis of GBM, and veri ed its prediction e ciency in the GEO cohort and CGGA cohort. In these three cohorts, the survival analysis of the low-risk and high-risk groups showed that the high-risk group had a worse prognosis compared with the low-risk group. Functional analysis shows that immune cells and immune-related functions and pathways have been enriched.
HMOX1, related to cell proliferation, migration, invasion and cell adhesion, is directly regulated by TGF-beta1 signaling pathway.
Previous studies have shown that HMOX1 is directly involved in the invasion of GBM cells and poor prognosis [28]. STEAP3 is involved in the regulation of immune response and cell cycle. Recently, dysregulation of the STEAP3 has been observed in various cancers [29].
Chen et al found that STEAP3 has a relationship with ferroptosis in the STEAP family and patients with high STEAP3 expression had a worse prognosis [29]. Our result was consistent with the literature. HSPB1 is a ubiquitous multifunctional protein mainly involved in protein folding and upregulated in cells exposed to stress conditions and its eff ects are dependent on its phosphorylation status. HSPB1 can negatively regulate the ferroptotic death of tumour cells. It has been con rmed that HSPB1 is highly expressed in GBM-short survival, especially when compared to GBM-long survival [30]. The relationship between HSPB1 expression and GBM survival has been veri ed by western blot analysis, among the GBM cases with short survival, immunostaining for the protein was more intense, demonstrating the potential usefulness of HSPB1 as a predictor of poorer prognosis.
ZEB1 is a transcriptional repressor, also have the important function in promoting the tumor invasion and metastasis. Recent studies have shown that ZEB1 protein expression was found in both bladder and esophageal cancer tissues, and was signi cantly associated with clinical prognosis [31]. Patients with high ZEB1 expression in breast cancer are prone to chemotherapy tolerance.
Risk score is one of the most commonly used methods for the development of a meaningful signature. Importantly, a nomogram model based on ferroptosis-related genes risk scores could accurately predicted the prognosis of patients with GBM [32]. ROC analysis demonstrated that the ferroptosis-related genes signature performed well in predicting survival for GBM patients in the TCGA, CGGA and GSE16011 data sets. KM analysis con rmed that the model accurately predicted the survival of GBM patients.
During the past few years the role of the ferroptosis in tumor susceptibility has become an intense area of research, the potential roles of ferroptosis in the tumor immune remain elusive. Functional analysis suggested that the biological processes of immune and in ammatory responses, were enriched in the high-risk group, which suggests a surprisingly close association between ferroptosis and tumor immunity. Interestingly, differences of the score of Macrophages, Tfh, TIL, Treg and T helper cells between the two groups were statistically signi cant. One possible speculation is that ferroptotic cells release unique signals, such as lipid mediators, to attract antigen presenting cells (APC) to the cell site where ferroptotic cells die. Previous studies have shown that due to the role of tumorassociated macrophages or Treg cells in immune invasion, they are related to the poor prognosis of HCC patients. [33 , 34].
In summary, we developed a novel prognostic model based on four ferroptosis-related genes and used it to predict the prognosis risk of in GBM patients. This model proved to be signi cantly associated with the OS in both the derivation and validation cohorts, which might provide insight into the prediction of GBM prognosis.
There are several limitations to this study. The rst is: the construction of our prognostic model and its validation with retrospective data from public databases. Second, the link between risk score and immune activity has not been experimentally veri ed.