Screening hypoxia-related genes as prognostic biomarkers and modeling the individualized prognostic predictor in hepatocellular carcinoma

Background Hypoxia closely relates to malignant progression and appears to be prognostic for outcome in hepatocellular carcinoma (HCC). Our research aims to mine the Hypoxic related genes (HRGs) on the role of clinical prognosis in HCC. Moreover, we construct and define a model of prognostic predictor (PP) to estimate and improve prognosis of HCC patients. Results 37 differentially expressed HRGs were obtained. It contained 28 upregulated and 9 down-regulated genes. After the univariate Cox regression model analysis, we obtained 27 prognosis-related HRGs. Of these, 25 genes were risk factors for cancer and 2 genes were protective factors. The PP was composed of the 10 key genes (HDLBP, SAP30, PFKP, DPYSL4, SLC2A1, PGK1, ERO1A, LDHA, ENO2, TPI1), and significantly divided patients of HCC into high- and low-risk groups according to overall survival (OS) ( P <0.001). We got the Area Under Curve(AUC) value of risk score calculated by PP was 0.777, which much bigger than other clinical parameters. Besides, PP was verified as an independent prognosis-related parameter (in univariate analyse, HR=1.484, 95% CI=1.342–1.642, P<0.001; in multivariate analyse, HR=1.414, 95% CI=1.258–1.588, P <0.001). Finally, the application of PP in clinic was concluded that the higher the patient's risk score, the higher the corresponding tumor stage and T stage, and the patient's prognosis was poor. Conclusions This study provides hypoxic related molecular targets for the therapeutic intervention. In addition, an individualized prognostic predictor was constructed to predict prognosis for HCC patients .


Background
Accumulating evidence has revealed that above 50% of locally advanced solid tumors may show hypoxic tissue regions. The distribution of these areas in tumors is heterogeneous [1]. Hypoxia closely relates to malignant progression and seems to be a sign of poor prognosis in almost all solid tumors, especially in hepatocellular carcinoma (HCC) [2]. Hypoxia can lead to changes in the proteome and/or genome.
The mechanism may be related to cells' continued access to nutrients, escape from unsatisfactory microenvironments, and promote unrestricted growth to accelerate tumor progression [3]. Sustained effect of hypoxia may also lead to clinically more aggressive phenotypes [4]. All these can be regarded as important factors for poor prognosis of patients.
Liver cancer is the fourth leading cause of cancer death in 2018. It kills about 782,000 people every year [5,6]. Thus it is predicted that liver cancer may become the sixth most common cancer in the future. HCC comprises 75-85% of primary cancer of the liver. Unfortunately, the exact mechanism and pathways leading to HCC development are still unclear explanation [7]. In particular, the absence of indicators related to the prognosis of HCC. The 5-year survival rate for HCC patients was approximately less than 18% [8,9]. Therefore, finding the key molecular biomarkers focused on hypoxia which related to the occurrence and development of HCC patients has certain reference value for reliable estimation of the deterioration of HCC, and may be an effective measures against HCC.
In recent years, progress using high-throughput sequencing technologies has greatly expanded the research of cancer genome. Herein, we collected 349 HCC patients information from The Cancer Genome Atlas (TCGA) [10]. We compared the expression profile of hypoxic related genes (HRGs) between HCC and non-tumor samples by applying the Limma package in R statistical software. These genes were named, classified and localized through bioinformatic. We also tried to explore their relationship with signaling pathways. Furthermore, in order to make better use of the complementary value between genes and clinic characteristics, to clarify the correlation between the HRGs and clinical outcomes in HCC patients, the prognostic predictor (PP) was developed and validated as an independent indicator to assess the overall survival (OS) outcome of patients. These findings provide new research targets and therapeutic strategies for HCC patients, and provide a reliable theoretical basis for judging treatment outcomes and assessing prognosis.

Differentially expressed HRGs
A total of 374 HCC tissue samples and 50 non-tumor specimens were obtained for RNA-seq and clinical data. from TCGA. By screening patients with clinical follow-up for more than one month, our current study involved 349 patients with primary HCC.
A list of 200 HRGs involved in hypoxic regulatory pathways reported in the literature were extracted from the Hallmark hypoxic gene sets in the MSigDB database.

HRGs
To further understand the biologically relevant information of these 37 differential HRGs, we performed functional enrichment and enrichment analysis. The GO term function and KEGG pathway enrichment of these genes were reviewed. respectively Figure 2 and Figure 3 . We found the results that the top three GO terms for biological processes (BP) were: carbohydrate biosynthetic process, glucose metabolic process, and pyruvate metabolic process. The results of top three cellular components (CC) were: endoplasmic reticulum lumen, secretory granule lumen, and cytoplasmic vesicle lumen. The top three molecular function (MF), genes were mostly enriched in terms of protein heterodimerization activity, cell adhesion molecule binding, and receptor ligand activity. The detailed results of all the above gene enrichment showed in Figure 2 A-D. Through the enrichment analysis function in the KEGG pathway,, we found many important pathways associated with these genes such as HIF-1 signaling pathway, Gluconeogenesis, Carbon metabolism, MAPK signaling pathway , PI3K-Akt signaling pathway, and so on (Figure 3 A-D). As shown in the circle plot of Figure 3C, the change from the Z-score indicated mostly related signaling pathways were more inclined to be increased.

Identification of prognostic HRGs
Prognostic HRGs were computed with univariate Cox regression model by SPSS 22.0.
The relationship between the expression profiles of HRGs and OS was evaluated using TCGA data and the Hallmark hypoxic gene sets in the MSigDB database, resulting in 27 prognosis-related HRGs (HR>1 or HR<1, P<0.01) in Table 1 noting that the coefficients of the genes were positive, the expression of these genes were beneficial to increase the OS of HCC patients. Conversely, the negative coefficient of the gene means that these genes may shorten the OS of patients with HCC.

Validation the value of PP in evaluating patients' clinical prognosis
To validate the performance of PP in evaluating patients' clinical prognosis, we divided patients into the high-and low-risk group with the median risk score as the cut-off point according to the risk score formula ( Figure 5A). The K-M plots were plotted to analyze the different survival time between the two groups. The K-M analysis showed that patients in the high-risk group suffered significantly worse survival than those in the low-risk group (P<0.001, Figure 5B). Besides, the highrisk group had more deaths than the low-risk group over time ( Figure 5C). All these results indicated that the prediction ability of the equation was very well. Figures 5D-E showed the distribution of patients according to their prognostic risk. We could see intuitively that the high risk group had a much higher number of deaths than the low risk group.

The accuracy of PP was verified by combining with clinical parameters
To further vadilate the accuracy of PP in predicting OS in HCC patients, our research integrated age, gender, tumer grade, AJCC TNM stage with risk score. We used the time-dependent ROC curve analysis as the standard to determine the prediction effect of PP on the prognosis of HCC patients. We got the AUC value of risk score was 0.777, which much bigger than other clinical parameters ( Figure 6A)

Relationship between PP and clinical parameters
To further clarify the important value of PP in clinical application, we explored the relationship between risk score and clinical indicators by using the "Beeswarm" software package. The relationship between risk score of the high-and low-risk group of HCC patients and the six clinical indicators (age, gender, tumor grade, AJCC TNM stage) was verified by the "Beeswarm" software package, P < 0.05 was considered to be statistically significant. The final results showed that the risk score calculated by the PP model was consistent with the change in tumor stage ( Figure   7A) and T stage ( Figure 7B). The higher the patient's risk score, the higher the corresponding tumor stage and T stage, and the patient's prognosis was poor.

Discussion
HCC is a major contributor of death caused by cancer. previously, many progresses have been made in understanding the epidemiology, risk factors and molecular profiles of HCC[11]. However, incidence and HCC-specific mortality still continue to increase [12][13][14]. Although some progress has been made in molecular targeted therapy for HCC, the results are still unsatisfactory [15]. It is necessary to better understand the molecular mechanism which leads to the development of HCC, and to search for targeted genes that play a key role in the prognosis of HCC for early intervention.
Exploration of hypoxia mechanism opens new perspectives for HCC [16]. In particular, the announcement of the 2019 Nobel Prize in physiology or medicine has increased researchers' enthusiasm for exploring the mechanism of hypoxia in the field of cancer. Increasing evidence indicates that hypoxia closely related to the migration and malignant progression of HCC [8,17]. However, most studies have focused only on hypoxia by studying signaling genes [18,19]. With the development of bioinformatics and the continuous improvement of high-throughput sequencing technology, some large-scale databases, including TCGA and GEO, have emerged.
These databases provide effective means for sorting out gene signatures. In the current study, we deeply mined all the differential genes that are significantly expressed in tumors and normal tissues. By integrating them with a large sample of clinical data, we aimed to select molecular biomarkers for detecting the prognosis of HCC patients.
We first screened 37 differentially expressed HRGs between HCC and non-tumor tissues. To discover the role of genes in HCC progression, GO and KEGG analysis of the differential expression genes were performed. Gene functional enrichment analysis suggested that these genes were mainly involved in carbohydrate biosynthetic process and HIF-1 signaling pathway. The process of carbohydrate biosynthesis was mainly related to the energy metabolism of cancer cells. HIF-1 and its related genes actively promoted HCC growth, HCC cell proliferation and aggressive behavior, which was positively associated with the presence of intrahepatic metastasis and the histological grade of HCC [8,20]. Our analysis confirmed that HIF-1 signaling pathway was the most critical process for hypoxia leading to poor prognosis of HCC, which is consistent with the findings in most literatures [21,22]. As such, these genes could also act as clinical biomarkers for monitoring metastasis, assessing survival, and potential drug targets. HDLBP (high density lipoprotein binding protein), a positively regulated gene, its closely related to the cancer process [23]. HDLBP can promotes HCC cell proliferation and tumorigenesis. Consistent with many other studies, HDLBP overexpressed in other types of cancers [24]. Intriguingly, in breast cancer, HDLBP may be a tumor suppressor to accelerate the degradation and inhibit the translation of the c-fms proto-oncogene mRNA [25]. These observations suggested that the mechanism of HDLBP (either inhibit or promote carcinogenesis) was still unclear.
SAP30 (serum amyloid P30) was a very sensitive indicator of liver disease [26]. It has been demonstrated that SAP up-regulated in serum samples from HCC patients.
LDHA (lactate dehydrogenase A) is a vital enzyme responsible for cancer growth and energy metabolism via the aerobic glycolytic pathway. Inhibition of LDHA could inhibit tumor-initiating cell survival and proliferation, which indicated that LDHA may be a potential therapeutics target [27]. GK1 (phosphoglycerate kinase 1) is an important enzyme in the metabolic glycolysis pathway. Many articles have observed a significant overexpression of PGK1 in HCC tissues and a negative correlation between PGK1 expression and HCC patient survival. Depletion of PGK1 dramatically reduced cancer cell proliferation and tumorigenesis [28,29].These findings demonstrated a novel role of PGK1 in HCC progression. ENO2 (Enolase 2) encodes an enolase isoenzyme which is regarded as a sensitive and specific biomarker for neuroendocrine tumours [30].Additionally, ENO2 was believed to be an indicator in the diagnosis and prognosis rather than a potential target for therapy in renal tumor patients [31]. In our study, ENO2 was defined as a negative regulatory gene. It provided new therapeutic targets and biomarkers for HCC prognosis through inhibiting its expression to improve the prognosis of HCC.
According to the current study, some features of HCC prognosis based on expression profile have been found by large public databases. However, the purpose of these studies was often limited to mining new molecular markers but neglecting routine clinical parameters [32,33]. Thus, we integrated patient clinical information to establish a PP model for assessing prognosis. Firstly, we verified the accuracy of the model through the K-M survival analysis. The risk score was calculated by PP formula, and the median was taken as the cut-off point dividing the high-and lowrisk group, and the survival difference was significant (Fig. 5A). At the same time, we combined age, gender, tumor grade, AJCC TNM stage with risk score and further analyzed through ROC curve, and found that the AUC value of risk score was the largest (AUC = 0.777, Fig. 6A). Secondly, in the process of univariate and multivariate analysis, by integrating the clinicopathological features, we found that the risk score was still an independent indicator for evaluating the prognosis of patients with HCC. (Fig. 6B, Fig. 6C). At last, by comprehensively analyzing the risk score and the patient's tumor TNM stage, we found that the greater the risk score calculated by PP, the higher the patient's tumor grade and the worse the prognosis.
All of the above fully demonstrated the accuracy of our constructed model of PP and its important value in evaluating patients' prognosis.

Conclusions
In conclusion, through a comprehensive bioinformatics analysis with HRGs expression profiles and corresponding clinical features, we identified 10 prognostic HRGs and constructed the model of the PP. This comprehensive study of multiple databases helps us gain insight into the biological properties of HCC and provides potential molecular targets for subsequent basic research. The PP may help us to make a better judgment of patients' prognosis, so that we can better intervene in patients' progress.

Differentially expressed HRGs enrichment analysis
We screened for differences in HRGs expression between HCC and non-tumor samples by applying the Limma package in R statistical software [36]. The significantly differentially expressed HRGs were defined as the adjusted p value is less than 0.05, the gene expression changes at least 2-fold. To explain the main biological properties of HRGs, the tools of Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were used for gene functional enrichment analyses.. GO and KEGG terms with a P-value and q-value both smaller than 0.05 were considered significant categories. To take full advantage of the wellknown GO and KEGG themes, and highlight the most relevant GO terms associated with a given gene list, these results of genes enrichment were visualized by R language clustering package.

HRGs
We extracted the expression profile of HRGs downloaded from TCGA in a FPKM format. Univariate Cox regression analysis was performed to screen HRGs whose expression profile was significantly correlated with OS in HCC patients from 200 hypoxic-related genes (P<0.01). Multivariate Cox regression analysis was performed on these survival-related genes to remove genes that might not be independent indicators of prognosis. We get the optimal solution of the model by narrowing down the relevant genes step by step. Finally, the PP was composed of the 10 key genes and significantly divided patients of HCC into high-and low-risk groups according to OS. Kaplan-meier (K-M) method was used to plot the survival curve. The survival rates of the two groups were tested by log-rank test.
To further verify whether hypoxia related gene model could be used as an independent predictor of prognosis in the TCGA cohort of patients with HCC, we introduced several following indicators closely related to the clinic into the model as covariables for multivariate Cox regression analysis: PP and age were coded as continuous variables, gender (male or female), tumor grade (high or low), AJCC TNM stage (I=1, II=2, III=3 and IV=4), tumor stage (1, 2, 3 and 4), lymph node metastasis (positive or negative) and distant metastasis (positive or negative).

Statistical analysis
All statistical analyses were conducted using SPSS 22

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. The datasets analyzed in this study are available in the TCGA databases (https://www.cancer.gov/aboutnci/organization/ccg/research/structural-genomics/tcga) and MSigDB (http://software.broadinstitute.org/gsea/msigdb/index.jsp).

Authors contribution
HG and JZ designed and conceived the study. YZ, JL, YC and LL searched and collected the majority of the data. HG, ZW and WZ performed the bioinformatics analyses. HG and JZ were major contributors in writing the manuscript. All authors have read and approved the final version of this manuscript.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.  Tables   Table 1 The  Abbreviations: HR, hazard ratio; C,I, confidence interval. Abbreviations: HR, hazard ratio; C,I, confidence interval. Figure 1 Differentially expressed hypoxic related genes (HRGs) between hepatocellular carcinoma (HC

Figure 2
The gene ontology (GO) terms function enrichment of differentially expressed hypoxic relate The relationships between the expression profiles of related genes (HRGs) and OS. In these 3 The accuracy of hypoxia-related prognostic predictor (PP) in hepatocellular carcinoma patien  Abbreviations: HR, hazard ratio; C,I, confidence interval.  The gene ontology (GO) terms function enrichment of differentially expressed hypoxic related gen The relationships between the expression profiles of related genes (HRGs) and OS. In these 37 HR The relationship between risk score and clinical indicators. (A) The boxplot showed the relationshi