Identification of DEGs related to hypoxia in HCC
We identified DEGs (|LogFC| >1, P < 0.05) using the mRNA expression profile between HCC and adjacent noncancerous tissues from TCGA database (Figure 1A) (Table 4). Then we matched the differentially expressed mRNA-sequencing data between hypoxia-treated and untreated HCC cell lines in GEO database (Table 2-3) and obtained 397 DEGs which were related to hypoxia in HCC (Figure 1B-D). By using the Gene Ontology (GO) enrichment and functional analysis, we found that these genes are enriched in DNA replication, cell division, cell cycle and also somatic diversification of immune receptors. (Figure 1E).
Using the hypoxia-related genes for the consistent clustering of HCC molecular subgroups
Consistent clustering of 397 hypoxia-related DEGs were constructed by using the ConsensusClusterPlus R software package. The average clustering consistency and inter-cluster variation coefficient of each cluster number were calculated and the optimal cluster number was determined by using CDF. As shown in Figure 2A, the clustering outcoming was stable when k=4. We further analyzed CDF delta area curve and found that the area under the CDF curve tended to be stable after 4 clusters (Figure 2B). The item-Consensus Plot also showed that the sample classification was relatively stable when the clustering number was selected as 4 (Figure 2C). Finally, we built a consensus matrix graph which 397 DEGs were assigned to 4 clusters in order to evaluate the composition and quantity of clustering more intuitively (Figure 2D). The heatmap of 397 hypoxia-related DEGs in 4 clusters was shown in Figure 2E.
The results from Kaplan–Meier plot showed the significant differences in survival probability and recurrence rate among these 4 subgroups. Compared to the other three clusters, the samples in cluster-2 had the worst prognosis and the highest recurrence rate (Figure 3A-B). We further analyzed the distribution of AFP, gender, degree of vascular infiltration, TNM stage, pathological grade, and age in these 4 subgroups (Figure 3C). Samples in cluster-4 were associated with high AFP expression level, undifferentiated tumor cells and lymphatic metastasis while cluster-3 showed high incidence of distant metastasis; cluster-2 had a higher degree of vascular invasion and more tumor cells with low differentiation. Moreover, most of patients in cluster-2 were male and generally aged between 65 and 70 years. It is worth noting that patients in cluster-2 showed the highest TMB than other three clusters (Figure3D-E), suggesting a benefit of immunotherapy.
Construction and validation of a hypoxia-related prognosis signature with good performance
We performed a univariate Cox regression and found 291 DEGs significantly related to OS of HCC patients (P<0.01) (Table 1). Then a Lasso‐penalized Cox analysis was performed to further shrink the scope of gene screening. The penalty parameter was established through 10‐fold cross‐validation. We selected 11 DEGs, which appeared over 900 times of a total of 1000 repetitions (Figure S1). Finally, by analyzing a multivariate Cox regression, three genes (PDSS1, SLC7A11, CDCA8) conforming to the proportional hazards (PH) assumption were selected to build a prognostic model as follows: the prognostic index (PI) = (0.337 * expression level of PDSS1) + (0.383* expression level of SLC7A11) + (0.356* expression level of CDCA8). The optimal cut-off value of 2.296 for the risk store was produced using X-tile software and patients with survival time from TCGA-LIHC were divided into a high- and low-risk group. The K-M curve showed that the OS of the high-risk group was significantly poorer than that of low-risk group (P<0.001, HR=4.76) (Figure 4A), and the high-risk group had higher expression of prognostic genes compared with the low-risk group (Figure 4C). The area under the time-dependent ROC curves (AUCs) for 0.5-, 1‐, 3‐ and 5‐year overall survival (OS) were 0.76, 0.78, 0.7 and 0.7, respectively, indicating a good predictive performance of this prognostic model (Figure 4D).
We further validated the prediction ability of this prognostic signature using HCC samples from ICGC database (Table 5). Consistent with above results, HCC patients were divided into a high- and low-risk group with an optimal cut-off value of 18.812 and patients in the high-risk group had poorer survival probability than the low-risk group (P<0.001, HR=5.26) (Figure 4B). The risk-score distribution and gene expression were distinctly different (Figure 4E) and the AUCs of the three‐gene prognostic model were 0.68, 0. 75, 0.77 and 0.77 for the 0.5-, 1-, 3- and 4‐year survival times (Figure 4F). Meanwhile, we attempted to compare the hypoxia-related signature with other prognostic models published previously[26, 27]. For the hypoxia-related signature, methylation-driven prognostic model and three-gene prognostic model, the AUCs was 0.78, 0.67 and 0.67 in TCGA cohort and 0.75, 0.64 and 0.64 in ICGC cohort, respectively (Figure S2). Taken together, our prognostic model showed a higher specificity and sensitivity.
Evaluating the independent role of prognostic signature and building a predictive nomogram for OS prediction in the HCC cohort from TCGA
Univariate and multivariate Cox regression analysis were used to evaluate whether the predictive value of the prognostic model was independent of other traditional clinical characteristics. The results showed that the TNM stage (P<0.05, HR=1.828) and the risk score (P<0.05, HR=1.683) were independent prognostic factors for OS (Figure 5A). Then we built a predictive nomogram which may be helpful to accurately predict a certain clinical outcome (Figure 5B)[28]. Each level of independent factors was assigned one score and a total score was calculated by summing up the scores in each individual. The survival probability for the individuals at 1-, 3-, and 5- year was obtained through the function conversion relationship of total scores. The calibration plot for internal validation of the nomogram showed better consistency between the predicted OS outcomes and actual observations (Figure 5C-E). The C-index was 0.54, 0.65 and 0.66 for the TNM stage, the prognostic model and the nomogram (95%CI: 0.58--0.73), further indicating that our nomogram had a higher predicting consistency. The AUCs of the nomogram at 1-, 3- and 5- year OS were 0.672, 0.684 and 0.675, which were better than the models with single independent factors (Figure 5F-H). The DCA was used to evaluate guiding significance of these models for clinical application and the results showed that the combined model was the best for predicting the OS (Figure 5I-K). For the hypoxia-related signature, methylation-driven prognostic model and three-gene prognostic model, the C-index reached 070, 0.64 and 0.64 in TCGA database and 0.74, 0.65 and 0.65 in ICGC database, indicating a more sensitive and valuable predictive performance of hypoxia-related model.
Evaluation of the hypoxia-related genes for predicting the recurrence of HCC patients
TCGA-LIHC cohort with release-free survival (RFS) information and recurrent status of HCC patients was utilized as a training set for an independent evaluation, and the HCC cohort from GSE14520 (Table 6) was used as a validation set. Based on these three hypoxia-related genes, we constructed a recurrence signature by using the regression coefficient (β’) of multivariate Cox ccproportional hazards. The prognostic index (PI) = (0.060 * expression level of PDSS1) + (0.045* expression level of SLC7A11) + (0.041* expression level of CDCA8). In both training and validation set, patients were divided into a high- and low-risk group based on the risk score of 0.953 and 1.247. The distribution of risk score and gene expression was examined (Figure 6A, S3A). From the results of Kaplan–Meier survival analysis, patients in high-risk group had significantly higher recurrence rate than the low-risk group. (Figure 6B, 6E) and we also performed ROC analysis to evaluate the predictive accuracy of our recurrence model (Figure 6C, S3B). Compared with other prognostic models, the AUCs was 0.64, 0.6 and 0.6 for the hypoxia-related signature, methylation-driven prognostic model and three-gene prognostic model (Figure 6D). All these results indicated a more reliable predictive ability of our hypoxia-related recurrence model.
Building a nomogram for predicting recurrent probability of HCC patients and evaluating its predictive performance
We performed a univariate and multivariate Cox regression analysis and screened out three independent factors related to the recurrence of HCC (P<0.05) (including the age, the TNM stage and the risk score of our recurrence signature) (Figure 7A). The nomogram for recurrence prediction was built by integrating these three factors (Figure 7B) The level of each factor was assigned according to the regression coefficient of each influencing factor, and then the scores were added to obtain the total score. Finally, the predicted value of the individual outcome was calculated through the function conversion relationship between the total score and the probability of occurrence of outcome. The calibration plot of the nomogram showed a consistency between the prediction and observation (Figure 7C-E). The C-index was 0.62, 0.56, 0.63 and 0.71 for the age, TNM stage, the prognostic model and the nomogram (95%CI: 0.64--0.78). From the results of ROC analysis in Figure 7F-H, the AUCs of nomogram at 1-, 3-, 5-year was 0.746, 0.741, 0.717, respectively, which was obviously higher than other models with single independent factors. The DCA curves showed that the combined model obtained a higher net benefit (Figure 7I-K). Through comparative analysis with other recurrence models, the C-index was 060, 0.59 and 0.59 for the hypoxia-related signature, methylation-driven prognostic model and three-gene prognostic model. These results indicated that our recurrent nomogram performed a better sensitivity and specificity of HCC recurrence prediction and could provide clinicians with more specific guidelines.
Establishment of a diagnostic model based on hypoxia-related genes in HCC
As the diagnosis is of great importance for proper management of patients, we further analyzed whether hypoxia-related genes also contribute to more accurate diagnosis of HCC. A diagnostic model based on these three hypoxia-related genes was constructed by using a stepwise logistic regression method. The diagnostic score was finally identified as follows: logit (P = HCC) = 1.171 + (-0.571) × PDSS1 expression level + (-1.019) × SLC7A11 expression level + (-2.037) × CDCA8 expression level. In TCGA cohort with 50 normal samples paired 50 HCC samples, our diagnostic model achieved a sensitivity of 94% and a specificity of 92% (Figure 8A). We also utilized ICGC cohort with 190 normal samples paired 219 HCC samples as a validation set, and the diagnostic model obtained a sensitivity of 90% and a specificity of 94% (Figure 8B). As shown in ROC analysis (Figure 8C-D), the AUCs of our model reached 0.986 and 0.962 in TCGA and ICGC cohort, indicating a satisfactory accuracy of prediction. We correlated the predictive outcomes with the corresponding gene expression, the expression of PDSS1, SLC7A11, CDCA8 was upregulated in samples which were predicted the type of tumor (Figure 8E-F).
Liver nodule is a kind of hepatic hyperplasia caused by various factors. It is indistinguishable from the early stage of liver cancer, and the corresponding treatment methods are different. We aimed to establish a diagnostic model by using a stepwise logistic regression method to better distinguish liver cancer from hepatic nodules. The diagnostic score was identified as follows: logit (P = HCC) = -45.308 + 0.628 × PDSS1 expression level + 8.452 × SLC7A11 expression level + 4.047 × CDCA8 expression level. We tested the diagnostic performance of the model in two databases, GSE6764 and GSE89377 cohort. One achieved a sensitivity of 88.57% and a specificity of 82.35%, the other one achieved a sensitivity of 87.5% and a specificity of 77.27% (Figure 9A-B). The AUCs for GSE6764 and GSE89377 were 0.934 and 0.935 (Figure 9C-D). The gene expression corresponding with the predicted results was shown in Figure 9E-F. These data further confirmed that the diagnostic model was a novel predictive tool with high accuracy and potential clinical value.
Validation of the expression and genetic alterations and independent prognostic analysis for genes
We detected genetic alterations of the three genes from cBioportal database[29] and found that PDSS1, SLC7A11 and CDCA8 possessed genetic alterations of 9%, 3% and 5% (Figure 10A). These results helped explain that the abnormal gene expression may be attributable to genetic alterations. To further confirm the expression level of each gene in HCC, we used TCGA database containing 50 tumor and 50 normal samples. We found all the three genes were highly expressed in HCC compared with in normal liver tissues (Figure10B-D). Moreover, by analyzing gene expression in GSE6764 cohort, we found that the expression levels of PDSS1, CDCA8 and SLC7A11 were significantly higher in tumor tissue than those in liver nodules (Figure10E-G). We also attempted to explore the interaction between each two genes. As shown in Figure10H-J, there was a sort of synergy between CDCA8 and PDSS1 as well as SLC7A11 (P<0.05).
Kaplan-Meier Plotter database[30] was used in order to analyze the effect of single gene on HCC prognosis. The results showed that the high‐expression level of PDSS1, CDCA8 or SLC7A11 was separately related to a shorter overall survival time (Figure11A-C). In addition, the progression-free survival (PFS) analysis, which can better reflect tumor progression and predict clinical benefits, also showed an association between higher expression level of a single gene and faster disease progression (Figure11D-F). To achieve a better understanding of the functional characteristics of three genes, we performed Gene set enrichment analysis, which showed that some immune-related pathways, such as JAK–STAT3 signaling, The NF-kappa B signaling, were highly active in the high-risk group (Figure11G-I).
Comparison of the immune microenvironment between high- and low-risk groups
Tumor immune cell infiltration refers that the immune cells move from the blood to the tumor tissue. The immune cells in tumors are closely related to clinical outcomes and they are most likely to serve as drug targets to improve survival rate [31]. Since these three genes have been found to enriched in some immune pathways, we then analyzed the relationship between hypoxia-related genes and immune cell infiltration as well as immune checkpoints in HCC (Figure12A-B). Patients in the high-risk group had higher ratios of M0 macrophages, memory B cells and follicular helper T cells than those in the low-risk group (P <0.05) (Figure12D-F). Moreover, we found that the expression levels of TIM3, B7H3, CTLA4, PD1 and PDL1 in the high-risk group were obviously higher than those in the low-risk group (P <0.05) (Figure12C, G-K). Our findings lead us to conclude that tumor immune microenvironment may be responsible for the prognosis of HCC patients with high expression of hypoxia-related genes.