An Oxidative Stress Response Gene Model for the Prediction of Prognosis and Therapeutic Responses in Hepatocellular Carcinoma

Oxidative stress response genes are critical for the development and progression of hepatocellular carcinoma (HCC). Still, the predictive value for prognosis and treatment response of oxidative stress response genes needs further elucidation. Methods We obtained the transcriptomic data and corresponding clinicopathological information of HCC patients from The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) databases. Oxidative stress response genes (OSRGs) were retrieved from the MSigDB database. LASSO Cox regression analysis was utilized to establish an integrated multi-gene signature in the TCGA cohort, and its prediction performance was validated in the ICGC cohort. The risk score of each patient ware determined by the multi-gene signature. The CIBERSORT algorithm was employed to evaluate the immune cell inltration. Response rate to immune checkpoint inhibition (ICI) therapy was assessed using a TIDE platform. Tumor mutation burden was estimated using VarScan processed somatic mutation data. The drug activity data from the Cancer Genome Project and NCI-60 human cancer cell lines were used to predict sensitivity to chemotherapy. The drugs. Our study indicates that an integrated transcriptomic analysis may provide a reliable molecular model that better predicts diagnosis and forecasts the response of ICI therapy and chemotherapy.


Introduction
Hepatocellular carcinoma (HCC) ranks fth among all malignancies worldwide and is the second common cause of cancer-associated deaths [1]. Despite advances in therapeutic measures, the prognosis of HCC has improved very little over the last two decades. In clinical practice, prognosis prediction and treatment recommendation highly depend on clinical characteristics and pathological features, whose e cacy of prophecy is usually unsatisfactory because of HCC's high heterogeneity. Hence, novel prognostic markers that can better inform treatment strategies and adequately predict the prognosis are urgently needed. The recent technological progress in microarray technology has permitted that integrating multiple prognosis-related genes into a single prognostic model became a powerful approach to improve the accuracy of prediction.
It is well known that cancer cells are generally under stressful biological conditions, among which oxidative stress is one the most representative [2]. Increased oxidative stress is a hallmark of the cancer microenvironment, re ected by the elevated intracellular levels of reactive oxygen species (ROS) [3,4].
ROS are highly reactive oxygen chemicals, constituting peroxides, superoxide, hydroxyl radicals, and hydrogen peroxide. Dysregulated ROS in cancer cells may be attributed to the activation of the oncogene, hypoxia, and extracellular stimuli, such as chemotherapy and radiotherapy [5][6][7]. Excessive generation of ROS can be lethal to cancer cells, as it induces cell apoptotic death via DNA repair disorders, proteins damage, and lipid peroxidation [8]. Cancer cells exploit endogenous adaption, including transcriptomic and proteomic modulation, to survive under pressure from oxidative stress. Earlier studies reported that multiple genes, known as oxidative stress response genes (OSRGs), are involved in the endogenous adaption of cancer cells. For instance, the proteins encoded by MGST1/3, G6PD, GSR, and GPX2/4 play vital roles in glutathione synthesis, which serves as an antioxidant defender in cancer cells [9]. Besides, TXN, together with PRDXs, reduces intracellular hydrogen peroxide and oxidized proteins [10]. Generally, these genes are upregulated in tumor tissues and indicate poor prognoses, making them promising therapeutic targets.
Although OSRGs are essential for the proliferation, progression, and migration of HCC cells [4], the prognosis prediction and therapy recommendation value of OSRGs has not been well elucidated. In the present study, we conducted a transcriptomic analysis using HCC gene expression pro ling data from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) databases.
First, in TCGA cohort, the OSRGs related to the overall survival (OS) were ltered out, then an eight-gene prognostic signature was established, and its robustness was evaluated with the ICGC cohort. We also analyzed the correlation between the tumor-in ltrating immune cells and prognostic risk scores. In addition, immune checkpoints expression, immune checkpoint inhibition (ICI) therapy response rate, and tumor mutation burden (TME) scores were evaluated to identify potentially valid immunotherapy. Finally, chemotherapeutic susceptibility was predicted according to the risk scores and the expression level of the individual prognostic gene.

Databases
We downloaded the transcriptomic and clinical data of HCC from the Liver Hepatocellular Carcinoma (TCGA-LIHC) collected dataset in the TCGA database and the LIRI-JP cohort in the ICGC database. Both TCGA and ICGC databases are publicly available; hence the ethical review is not required for their use.

Construction and validation of the prognostic oxidative stress response genes signature
We identi ed the differentially expressed genes (DEGs) between the normal and HCC tissues in TCGA cohort using the "limma" package in R software Version4.0.2 with the following threshold values: |log 2 FC| ≥ 1 and adj.p < 0.05. The shared candidate genes of DEGs and OSRGs were analyzed using univariate Cox regression analysis, genes with adj.p < 0.001 were preserved. Then LASSO Cox regression analysis was adopted to construct a prognostic gene signature, implemented using R software with "glmnet" package. The risk score of each patient was determined as follows: Risk score=∑(Coef(i) *Expression of gene(i)). The Kaplan-Meier (K-M) curve was plotted dependent using the "survminer" and "survival" packages in R software. We delineated the receiver operating characteristic (ROC) curves with the "timeROC" package in R software to assess the accuracy of the prognostic gene signature.

Nomogram establishment
A forecast nomogram was developed by incorporating risk scores, age, gender, stage, and TNM classi cation with the R package "RMS." The one, two, three-year OS probabilities were estimated using the total points. Calibration curves were used to evaluate the accuracy.

Assessment of immune cell in ltration
We used the CIBERSORT algorithm to quantify the subsets of in ltrating immune cells in the tumor environment of HCC samples. CIBERSORT is a deconvolution method based on bulk gene expression to estimate speci c immune cell types [11].

Prediction of immune checkpoint inhibition (ICI) therapy
The tumor immune dysfunction exclusion (TIDE) analysis platform was utilized to predict the responses to ICI therapy (http://tide.dfci.harvard.edu/). TIDE is a computational method that uses speci c gene markers to predict the response to ICI therapy [12]. A high TIDE score represents a low response rate to ICI.

Calculation of TMB scores
TMB is a measure of the gene mutations frequency in a cancer cell. We obtained the VarScan processed somatic mutation data of HCC from TCGA database. Strawberry-Perl version5.30.1 based on the JAVA8 platform was used to determine the TMB score depending on the genome mutation information of each HCC sample.

Chemotherapy response estimation
To estimate the association between drug therapeutic response and risk scores, the R package "pRRophetic" was adopted to generate each drug's half-maximal inhibitory concentration (IC50 value) for HCC patients. The pRRophetic package was constructed based on the chemical screening data from the Cancer Genome Project (CGP) dataset [13]. To explore the drug response prediction from the individual prognostic gene, we analyzed the pharmacological activity data of 218 FDA-approved drugs (Table S2) from the NCI-60 cancer cell lines, which were downloaded from the CellMiner database Version2021.2 (https://discover.nci.nih.gov/cellminer).

Statistical analysis
DEGs among HCC and normal liver tissues were compared using Wilcoxon signed-rank test. The K-M curve and the log-rank test were used to assess the difference in OS between groups. The correlation analysis was performed using the Pearson correlation coe cient. GraphPad Prism Version7.0.4 and R software Version 4.0.2 were utilized to make diagrams. P-value <0.05 was considered statistically signi cant.

Results
Construction of a prognostic gene signature in the TCGA cohort The TCGA-LIHC dataset contains 374 HCC patients, and the ICGC-LIRI-JP dataset contains 260 HCC patients, the clinical characteristics of these patients are displayed in Table S3. As illustrated in Fig .1a, b, 2107 DEGs were identi ed from the TCGA cohort, among which 55 genes were OSRGs (Fig. 1c). 13 OSRGs were then veri ed signi cantly related to shorter OS (Fig. 1d). After LASSO Cox regression analysis, an eight-gene prognostic signature was constructed (Fig .1e, f). To calculate the risk score of each patients, following formula is used: risk score = 0.069* expression level of G6PD + 0.177*expression level of MT3 + 0.206*expression level of CBX2 + 0.063* expression level of CDKN2B + 0.078*expression level of CCNA2 + 0.164*expression level of MAPT + 0.248*expression level of EZH2 + 0.213*expression level of SLC7A11 (Fig. S1). The above eight genes' expression was upregulated in HCC tissue than in normal liver tissue in the TCGA cohort (Fig. S2). We further validated the expression pattern of the proteins encoded by the above genes using clinical specimens in HPA database. All proteins except SLC7A11 were found to be elevated in HCC tissues. No conclusion could be drawn about SLC7A11 due to lack of data. G6PD, MT3, and MAPT were exclusively located in the cytoplasm, CDKN2B was located in both the cytoplasm and nucleus, while CBX2, CCNA2, and EZH2 were located in the nucleus (Fig. 1g).
Validation of the prognostic gene signature Then we classi ed the patients into the high-risk score and low-risk score subgroups according to the median cut-off value in the TCGA and ICGC cohorts (Fig.2a). Compared with the low-risk group, patients in the high-risk group had a higher mortality rate (Fig. 2c). In line with this, the K-M curve revealed a signi cantly shorter OS in the high-risk group (Fig. 2e). The expression heatmap of the eight genes is displayed in Fig. 2g. Time-dependent ROC curves assessed the predictive accuracy of the prognostic gene signature. The area under the ROC curve (AUC) of one, two, and three-year was 0.79, 0.75, and 0.73, respectively, which con rmed the effectiveness of this prognostic model (Fig. 2i). Then, we veri ed the robustness of the prognostic gene signature using the independent ICGC cohort. The results of patient distribution, K-M curve analysis, and AUC analysis were in line with those of TCGA cohort, further supporting the robustness of the prognostic gene signature (Fig. 2b, d, f, h, j).
Evaluation of the independent prognostic value of risk score and nomogram establishment We brought the clinicopathological characteristics and risk scores into univariate and multivariate Cox regression analyses to identify the independent prognostic predictors. Univariate Cox analysis of TCGA cohort revealed that stage, T, M, and risk score were signi cantly correlated with OS (Fig. 3a). In the ICGC cohort, univariate Cox analysis suggested that female gender, stage, and risk score were strongly associated with OS (Fig. 3c). After multivariate Cox analysis, risk score emerged as the only independent factor for e ciently predicting prognosis in both cohorts (Fig. 3b, d). We then constructed nomograms combining clinical variables and risk scores to calculate the total score of each patient with HCC, which could predict the one, two, and three-year OS (Fig. 3e, f). The calibration curves showed the excellent predictive accuracy of the nomograms (Fig. 3g, h).

Immune in ltration and ICI therapy prediction
Next, we employed CIBERSORT algorithm to assess the in ltration of the immune cells of each HCC sample (Fig. S1). As illustrated in Fig. 4a, the memory B cells, activated memory CD4+T cells, follicular helper T cells, and M0 macrophages were more abundant in the high-risk group compared to the low-risk group. However, naïve B cells, CD8+T cells, resting memory CD4+ T cells, monocytes and eosinophils were reduced in the low-risk group. In addition, there was a positive correlation between the immune checkpoints expression level and risk scores (Fig. 4b). TIDE analysis revealed that the risk scores and TIDE scores were negatively correlated (Fig. 4c). Consistently, the response rate to ICI therapy was expected to be higher in patients suffering from HCC with high risk scores (62% vs. 37%) (Fig. 4d).

Associations of TMB with Risk Score in HCC patients
Next, we investigated the associations of TMB with the risk score. As mentioned before, we strati ed HCC patients into high-or low-risk groups in the TCGA cohort. The comprehensive mutation data for each group was represented using a waterfall plot (Fig. 5a). The ve most frequently mutated genes were TP53(44%), CTNNB1(24%), TTN(23%), MUC16(18%), and APOB(11%) in the high-risk group, while CTNNB1(27%), TTN(23%), ALB(13%), TP53(12%), and MUC16(11%) were the top ve in the low-risk group. In addition, the overall genome mutation occurrence rate was 86.71% and 82.49% in the high-and low-risk group, respectively. As displayed in Fig. 5b, the TMB scores were positively correlated with the risk scores. The K-M curve analysis revealed that the patients in the high-TMB group have shorter OS than the patients in the low-TMB group. (Fig. 5c).

Analysis of the Correlation Between Risk Score and Chemotherapy Sensitivity
First, we estimated the chemotherapeutic response using the drug activity and transcriptomic data in the CGP cell lines. In the high-risk group, HCC patients were sensitive to 45 chemotherapy and targeted therapy drugs and were resistant to other 44 drugs, as indicated by the variation of the IC50 (Fig .6a, b). For each prognostic gene in the gene signature, the corresponding expression level was analyzed in the transcriptomic data of NCI-60 cell lines. The top two FDA-approved drugs with the strongest positive or negative association with each gene are shown in Fig. 7. The elevated expression of G6PD, MT3, CBX2, CDKN2B, CCNA2, MAPT, and EZH2 was associated with increased resistance to mitomycin, teniposide, 6thioguanine, and fulvestrant, etc. By contrast, elevated CBX2 and SLC7A11 expression are hallmarks of increased sensitivity to dasatinib, ixazomib citrate, and arsenic trioxide.

Discussion
In the era of precision oncology, the conventionally used serum alpha-fetoprotein (AFP) test, BCLC staging system, and AJCC staging system do not comply with the prognostic and therapeutic prediction for HCC precise therapy. Some novel diagnostic approaches, including liquid biopsy [14], GALAD model [15], have shown excellent performance in diagnosis and prognosis prediction of HCC. However, major challenges still exist in the bench-to-bedside translation of these technologies.
Here, for the rst time, we developed a novel oxidative stress response gene signature to predict the prognosis and therapeutic effect of HCC patients. We rst identi ed DEGs between HCC tissues and normal liver tissues in the TCGA cohort. Then, the 55 shared genes of OSRGs and DEGs were analyzed using univariate Cox regression, which revealed that 13 of these genes were related to OS. A gene signature of eight prognostic genes was then established using the LASSO Cox regression analysis. The prognostic gene signature achieved excellent performance in predicting the OS rate in the TCGA and ICGC cohorts. In addition, the nomograms constructed with risk scores and clinicopathologic factors showed good predictive ability for OS. The multivariate Cox analysis identi ed the risk score as an independent predictor for prognosis.
The gene signature established in the present study contains eight OSRGs (G6PD, MT3, CBX2, CDKN2B, CCNA2, MAPT, EZH2, SLC7A11). Compared with normal tissue, these genes are highly expressed in HCC tissue and are associated with unfavorable prognoses. G6PD encodes the protein glucose-6-phosphate dehydrogenase, which can resist damage by oxidative stress in cells [16]. Abnormal activation of G6PD contributes to the progression of many cancers [17]. MT3 encodes the protein metallothionein3, which displays a solid ROS-scavenging capacity in oxidative stress-conditioned cells [18]. However, the exact role of MT3 in cancer cells remains controversial [19,20]. As an oncogenic gene [21,22], CBX2 is strongly associated with genome-scale DNA methylation in various types of cancer cells, which is a response to micro-environmental stresses, particularly oxidative stress [23]. Recently, H-A Lee et al. reported that oxidative stress enhances CDKN2B expression and causes cell cycle arrest in HCC cells [24]. Nonetheless, the correlation between CDKN2B expression levels and prognosis is still unclear [25,26]. CCNA2 encodes Cyclin A2, which is a suppressor of intracellular oxidative stress [27] that is upregulated in colorectal and breast cancer tissues and is associated with poor prognosis [28,29]. The MAPT encodes the protein Taus, which plays a signi cant role in the antioxidative response in neurons [30]. Elevated tau has been associated with a poor prognosis in prostate cancer [31]. EZH2 is an oncogene that is upregulated under oxidative stress induced by H 2 O 2 [32,33]. Finally, SLC7A11 encodes a cystine/glutamate transporter that has been reported to suppress the expression of P-glycoprotein in breast cancer cells induced by ROS [34]. In addition, SLC7A11 was overexpressed in numerous cancers and is correlated with poor survival [35].
We observed that the prognostic gene signature was closely associated with in ltrating immune cells in the tumor microenvironment as memory B cells, activated memory CD4+T cells, follicular helper T cells, and M0 Macrophages were abundant in patients with high risk scores. At present, the interaction between immune checkpoints and oxidative stress has barely been studied. Tang et al. found that treatment with a cocktail of immune checkpoint antibodies increased oxidative stress and T cell in ltration in lung adenocarcinoma [36]. Our data identi ed various immune checkpoints that have a positive correlation with risk scores, among which PD1, PD-L1, and CTLA4 are common targets for ICI therapy in clinical application. Pre-clinical studies showed that IDO1, TIGIT, LAG3, and TIM-3 are promising therapeutic targets [37]. Although patients with high expression levels of immune checkpoints tend to experience immune evasion and the consequent poor prognosis, the elevated expression of PD1, PD-L1, TIM-3, and IDO1 may result in a higher response rate to ICI therapy [38][39][40]. The results of TIDE analysis further supported these ndings because patients with HCC having high risk scores were also associated with low TIDE scores, thereby indicating their high response rates to ICI treatment. Previous studies have reported that high TMB indicates a better response to immunotherapy, which could be explained by the fact that mutant proteins encoded by mutated genes in cancer cells make tumors more immunogenic [41,42]. This study found that the total gene mutation rate was higher in the high-risk group, and that patients with high TMB scores had lower OS rates. Interestingly, the TP53 mutation rate in the high-risk group is much higher than in the low-risk group (44% vs. 27%). Some research ndings suggested that the mutant p53 protein usually loses antioxidant function and increases intracellular ROS, which drives the function switch from a cancer suppressor protein to a cancer promoter protein [43]. Hence, this nding supports our OSRGs-based prognostic gene signature.
In general, HCC cells are resistant to intravenous cytotoxic chemotherapy. Instead, transarterial chemoembolization and oral administration of sorafenib/lenvatinib are the globally-accepted standards for treating advanced HCC. Doxorubicin and cisplatin are the frequently-used drugs in transarterial chemoembolization [44]. In the present study, the patients with HCC in the high-risk group exhibited elevated sensitivity to doxorubicin, afatinib, etoposide, and gemcitabine. Inversely, they were more likely to acquire resistance against sorafenib, the most extensively used multi-targeted tyrosine kinases inhibitor for treating late-stage HCC [45]. Using the NCI-60 data, we veri ed that each member of the gene signature could potentially indicate susceptibility to several FDA-approved chemotherapeutic drugs. Sensitivity towards some of these drugs has already been proved [29,[46][47][48][49][50][51]. For most of the prognostic genes, elevated expression levels are associated with increased drug resistance, making them potential targets to overcome this crisis. For example, cisplatin treatment induces ROS production in ovarian cancer, and ROS promotes EZH2 expression, which inactivates AKT/ERK pathways that confers cisplatin resistance [52].

Conclusion
We established a novel prognostic gene signature consisting of eight oxidative stress response genes, which showed good performance in predicting the prognosis of HCC patients. Our gene signature could also be utilized as a molecular tool for predicting the immunotherapy response and chemotherapy sensitivity, which are a requisite for developing precision cancer medicine strategies. To sum up, our study laid the foundation for further search on the oxidative stress response genes and their effect on the prognostic evaluation and therapeutic choice for HCC.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The public datasets analyzed in this study can be downloaded here: https://portal.gdc.cancer.gov/repository and https://dcc.icgc.org/releases/current/Projects/LIRI-JP. All codes used in this study are available from the corresponding author upon reasonable request.