3.1 Establishment and validation of hypoxia-related prognostic models
A total of 343 patients with liver cancer and 200 hypoxia-associated genes were used to build the prognostic model. Univariate Cox regression analysis first selected 79 survival-related hypoxia genes, and the random forest algorithm was then used for feature selection. Genes with a relative importance > 0.4 were retained as the final features (Figure 1A-B). Based on these features, we used multivariate Cox regression analysis to construct a prognostic model containing 7 hypoxia-related genes. Using the model coefficients, the risk formula was constructed as follows: risk score = LDHA * 0.000812695 + KDELR3 * 0.000649537 + CDKN1C * 0.002057653 + SLC2A1 * 0.004190531 + NDRG1 * 0.001235623 + VHL * 0.023669962 + SCARB1 * 0.00060351. With this formula, the risk score of each patient in the training set and the external validation set was calculated, and patients were divided into a high-risk group and a low-risk group on the basis of the median risk score. The number of deaths in the high-risk group was significantly higher than in the low-risk group (Figure 2A-B). Moreover, Kaplan-Meier (KM) curve analysis showed a clear difference in survival between the two groups, with the survival rate of low-risk patients being significantly higher than that of high-risk patients (P < 0.05) (Figure 3A-B). In addition, ROC curve analysis showed that the model had high prognostic accuracy in both the training set and the external validation set (Figure 3C-D).
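As an illustration of how the risk formula and the median split can be applied, the following minimal Python sketch computes a risk score from the seven coefficients reported above and assigns each patient to a group. The patient identifiers and expression values are hypothetical, the expression units are not specified here, and treating a score strictly above the median as high risk is an assumption, since the handling of ties is not stated.

```python
from statistics import median

# Coefficients as reported in the risk formula of Section 3.1.
COEFFICIENTS = {
    "LDHA": 0.000812695,
    "KDELR3": 0.000649537,
    "CDKN1C": 0.002057653,
    "SLC2A1": 0.004190531,
    "NDRG1": 0.001235623,
    "VHL": 0.023669962,
    "SCARB1": 0.00060351,
}


def risk_score(expression):
    """Weighted sum of the seven gene expression values for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: scores strictly above the median count as high risk.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression profiles (illustrative values only, not study data).
patients = {
    "P1": {gene: 100.0 for gene in COEFFICIENTS},
    "P2": {gene: 500.0 for gene in COEFFICIENTS},
    "P3": {gene: 1000.0 for gene in COEFFICIENTS},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Because every coefficient is positive, higher expression of any of the seven genes raises the risk score in this formula.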
3.2 Independent assessment of prognostic model
To assess the prognostic independence of the model in both the training set and the validation set, we first retained the clinical traits (age, sex, stage, and grade) that were available in both data sets. In univariate and multivariate Cox regression analyses, the prognostic model remained significant in both data sets, suggesting that it could act as an independent prognostic factor (Figure 4).
3.3 Association between prognostic models and clinical traits
To explore the association between the prognostic model and clinical traits, we first assessed the distribution of risk values across clinical traits and found that risk values for G3-4 were significantly higher than those for G1-2 (P < 0.05). In addition, the risk values for Stage III-IV were significantly higher than those for Stage I-II. These results suggest that a higher risk score is associated with a higher degree of HCC malignancy (Figure 5).
Therefore, the risk score of this prognostic model appears to reflect the progression of HCC. To study the prognostic value of the model within clinicopathological subgroups, a stratified analysis of HCC patients was conducted according to age, sex, grade, and stage. In all strata, overall survival (OS) was significantly shorter in the high-risk group than in the low-risk group (Figure 6). These results suggest that the prognostic model can predict the prognosis of HCC patients regardless of these clinicopathological variables.
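To make the survival comparison concrete, the sketch below implements a plain Kaplan-Meier estimator in Python, assuming the stratified OS comparisons rely on the same KM approach used in Section 3.1. The follow-up times and event indicators are hypothetical, and the function name `kaplan_meier` is illustrative and not taken from any particular library.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for one stratum, e.g. one age group.
high_risk_curve = kaplan_meier([6, 10, 14, 20, 25], [1, 1, 0, 1, 1])
low_risk_curve = kaplan_meier([12, 24, 30, 36, 40], [0, 1, 0, 0, 1])
```

In practice, the two curves within each stratum would be compared with a log-rank test, which is not shown here.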
3.4 Construction and verification of the nomogram in the TCGA and ICGC data sets
To establish a quantitative prognostic method for HCC, a nomogram was constructed from the independent prognostic factors and the prognostic model in the two data sets. Based on multivariate Cox analysis, the point scales in the nomogram were used to assign points to each variable. We drew a horizontal line to determine the points for each variable, calculated the total points for each patient by adding the points for all variables, and normalized the result to a scale of 0 to 100. By drawing a vertical line between the total point axis and each prognostic axis, we could estimate the 1-year, 3-year, and 5-year survival rates of HCC patients, which may help practitioners make clinical decisions about the prognosis of HCC patients. In addition, we evaluated the accuracy and consistency of the nomogram using ROC curves and calibration curves, respectively (Figure 7).
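The point assignment described above can be sketched as follows. This is a minimal illustration of one common nomogram scaling convention, in which the variable with the largest effect range spans 0 to 100 points, and it is not necessarily the exact procedure used for Figure 7. All coefficients, variable ranges, and patient values in the example are hypothetical.

```python
def max_effect(model):
    """Largest |coefficient| * range across variables; defines the 100-point span."""
    return max(abs(beta) * (high - low) for beta, low, high in model.values())


def variable_points(beta, value, low, high, scale):
    """Points for one variable, measured from its lowest-risk end."""
    reference = low if beta >= 0 else high
    return 100.0 * beta * (value - reference) / scale


def total_points(model, patient):
    """Sum of the points over all variables for one patient."""
    scale = max_effect(model)
    return sum(
        variable_points(beta, patient[name], low, high, scale)
        for name, (beta, low, high) in model.items()
    )


# Hypothetical Cox coefficients and value ranges: name -> (beta, low, high).
model = {
    "risk_score": (0.8, 0.0, 5.0),
    "stage": (0.5, 1.0, 4.0),
}

# Hypothetical patient: risk score 2.5, Stage III.
patient = {"risk_score": 2.5, "stage": 3.0}
points = total_points(model, patient)
```

Mapping total points to 1-year, 3-year, and 5-year survival probabilities additionally requires the baseline survival function of the fitted Cox model, which is omitted from this sketch.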
3.5 Functional enrichment analysis of prognostic model
To further explore the potential function of the prognostic model in HCC, gene set enrichment analysis (GSEA) was performed to compare the high-risk and low-risk groups. The results showed that the cell cycle, mTOR signaling, oocyte meiosis, and ubiquitin-mediated proteolysis pathways were significantly enriched in the high-risk group (Figure 8).
3.6 Expression levels of KDELR3 and SCARB1
To better explain the biological function of these genes in the pathogenesis and development of HCC, we selected two genes that had not previously been reported in HCC studies and examined differences in their expression levels by RT-qPCR. KDELR3 expression was highest in G2 cells and lowest in 7721 cells, whereas SCARB1 expression was highest in L02 cells and lowest in SK-HEP-1 cells (P < 0.05) (Figure 9).