Differentially expressed HRGs
A total of 374 HCC tissue samples and 50 non-tumor specimens were obtained for RNA-seq and clinical data. from TCGA. By screening patients with clinical follow-up for more than one month, our current study involved 349 patients with primary HCC. A list of 200 HRGs involved in hypoxic regulatory pathways reported in the literature were extracted from the Hallmark hypoxic gene sets in the MSigDB database. According to the limitation of a FDR <0.05 and |log2(Fold Change)|>1.5, our study screened 37 genes with significant differential expression, of which 28 were up-regulated and 9 were down-regulated. (Figure 1A, B). Then, we visualized the expression patterns of these HRGs between HCC and normal tissues by making boxplot (Figure 1C). The boxplot showed expression patterns that 28 up-regulated genes (ALDOA, ANKZF1, ANXA2, B4GALNT2, BCAN, CA12, COL5A1, DDIT3, DPYSL4, DTNA, EFNA3, GPC3, HOXB9, INHA, KDELR3, KIF5A, LOX, NDRG1, P4HA2, PDGFB, PFKP, PGF, PPFIA4, RRAGD, SLC2A1, SLC2A5, STC1, and STC2) and 9 down-regulated genes (DCN, FBP1, FOS, MT1E, MT2A, PCK1, PLAC8, SERPINE1, and ZFP36).
Functional annotation and enrichment analysis of differentially expressed HRGs
To further understand the biologically relevant information of these 37 differential HRGs, we performed functional enrichment and enrichment analysis. The GO term function and KEGG pathway enrichment of these genes were reviewed. respectively （Figure 2 and Figure 3）. We found the results that the top three GO terms for biological processes (BP) were: carbohydrate biosynthetic process, glucose metabolic process, and pyruvate metabolic process. The results of top three cellular components (CC) were: endoplasmic reticulum lumen, secretory granule lumen, and cytoplasmic vesicle lumen. The top three molecular function (MF), genes were mostly enriched in terms of protein heterodimerization activity, cell adhesion molecule binding, and receptor ligand activity. The detailed results of all the above gene enrichment showed in Figure 2 A-D. Through the enrichment analysis function in the KEGG pathway,, we found many important pathways associated with these genes such as HIF-1 signaling pathway, Gluconeogenesis, Carbon metabolism, MAPK signaling pathway , PI3K-Akt signaling pathway, and so on (Figure 3 A-D). As shown in the circle plot of Figure 3C, the change from the Z-score indicated mostly related signaling pathways were more inclined to be increased.
Identification of prognostic HRGs
Prognostic HRGs were computed with univariate Cox regression model by SPSS 22.0. The relationship between the expression profiles of HRGs and OS was evaluated using TCGA data and the Hallmark hypoxic gene sets in the MSigDB database, resulting in 27 prognosis-related HRGs (HR>1 or HR<1, P<0.01) in Table 1. By mapping the forest of HRGs and OS (Figue 4), we can intuitively understand the role of these genes in cancer prognosis. Of these, 25 genes (HDLBP, SAP30, NDRG1, ADORA2B, PFKP, DPYSL4, PFKFB3, SLC2A1, VEGFA, HMOX1, SDC3, ATP7A, PGK1, ERO1A, XPNPEP1, LDHA, ENO2, SLC6A6, MAP3K1, BNIP3L, TPI1, RRAGD, TES, SERPINE1 and JMJD6) were risk factors for cancer and 2 genes (GRHPR and UGP2) were protective factors.
In order to better evaluate the prognosis and survival time of patients, a optimal equation for the prognostic predictor (PP) was further conducted by multivariate Cox proportional hazards regression model. Finally, 10 genes (HDLBP, SAP30, PFKP, DPYSL4, SLC2A1, PGK1, ERO1A, LDHA, ENO2, TPI1) were obtained and incorporated into the final model. Among these genes, 8 genes (HDLBP, SAP30, PFKP, DPYSL4, SLC2A1, ERO1A, LDHA, TPI1) were risk factors for HCC patients, 2 genes (PGK1 and ENO2) were protective factors (Table 2).
Construction and definition of the PP
In the multivariate Cox proportional hazards regression model, Y (survival time and status) is obtained by multiplying the contribution weight of every gene and X (expression value of each gene) and then adding them. The formula of PP is expressed as follows: PP=(0.5003×expression value of HDLBP)+(0.4192×expression value of SAP30)+(0.2345×expression value of PFKP)+(0.3718×expression value of DPYSL4)+(0.4166×expression value of SLC2A1)+(-0.4244×expression value of PGK1)+(0.3085×expression value of ERO1A)+(0.5351×expression value of LDHA)+(-0.5150×expression value of ENO2)+(0.3658×expression value of TPI1). It is worth noting that the coefficients of the genes were positive, the expression of these genes were beneficial to increase the OS of HCC patients. Conversely, the negative coefficient of the gene means that these genes may shorten the OS of patients with HCC.
Validation the value of PP in evaluating patients' clinical prognosis
To validate the performance of PP in evaluating patients' clinical prognosis, we divided patients into the high-and low-risk group with the median risk score as the cut-off point according to the risk score formula (Figure 5A). The K-M plots were plotted to analyze the different survival time between the two groups. The K-M analysis showed that patients in the high-risk group suffered significantly worse survival than those in the low-risk group (P<0.001, Figure 5B). Besides, the high-risk group had more deaths than the low-risk group over time (Figure 5C). All these results indicated that the prediction ability of the equation was very well. Figures 5D-E showed the distribution of patients according to their prognostic risk. We could see intuitively that the high risk group had a much higher number of deaths than the low risk group.
The accuracy of PP was verified by combining with clinical parameters
To further vadilate the accuracy of PP in predicting OS in HCC patients, our research integrated age, gender, tumer grade, AJCC TNM stage with risk score. We used the time-dependent ROC curve analysis as the standard to determine the prediction effect of PP on the prognosis of HCC patients. We got the AUC value of risk score was 0.777, which much bigger than other clinical parameters (Figure 6A). Thus, compared with other indicators, PP had important reference value in predicting patients’ prognosis.
Furthermore, risk score remained as an independent prognostic indicator for HCC patients in univariate and multivariate analyses, after adjusting for clinicopathological features such as age, gender, tumer grade, AJCC TNM stage. (HR=1.484, 95% CI=1.342–1.642, P<0.001, Figue 6B and HR=1.414, 95% CI=1.258–1.588, P<0.001, Figue 6C).
Relationship between PP and clinical parameters
To further clarify the important value of PP in clinical application, we explored the relationship between risk score and clinical indicators by using the "Beeswarm" software package. The relationship between risk score of the high- and low-risk group of HCC patients and the six clinical indicators (age, gender, tumor grade, AJCC TNM stage) was verified by the "Beeswarm" software package, P < 0.05 was considered to be statistically significant. The final results showed that the risk score calculated by the PP model was consistent with the change in tumor stage (Figure 7A) and T stage (Figure 7B). The higher the patient's risk score, the higher the corresponding tumor stage and T stage, and the patient's prognosis was poor.