3.1 The expression of BIRC5 and its relationship with prognosis of HCC patients
To investigate the role of BIRC5 on HCC, we firstly analyzed the expression of BIRC5 between HCC samples (369 cases) and normal samples (50 cases) based on the TCGA database. As a result, the expression of BIRC5 was significantly increased in the HCC samples compared with the normal samples (p-value = 2.779e-28, Fig. 1a). Further, the expression of BIRC5 was significantly improved in the patients with Stage Ⅲ/Ⅳ compared with that of Stage Ⅰ/Ⅱ (p-value = 0.003, Fig. 1b). Then, ruling out the 31 samples with missing clinical information, the other 338 HCC samples were stratified into BIRC5 high expression group (n = 169) and BIRC5 low expression group (n = 169) according to the median expression. K-M analysis demonstrated that the HCC patients with high-expression of BIRC5 had an unfavorable prognosis (p-value < 0.0001, Fig. 1c). The all results suggested that BIRC5 may participate in occurrence and development of HCC, and result in an unfavorable prognosis.
3.2 Selection and functional enrichment analysis of DEGs
A total of 338 HCC samples were included to conduct the differential expression analysis. A threshold of |log2FC| > 1 and the adjusted p-value < 0.05 were used as selection criteria. Volcano plots were drawn to illustrate the distribution of each DEG between the samples with high- and low-expression (Fig. 2a). These rigorous criteria generated a list of 241 genes differentially expressed in high-expression group in comparison to low-expression group, with 148 (61.4%) up-regulated and 93 (38.6%) down-regulated genes (Supplementary Table 1). The expression of the DEGs with top 40 was shown in Fig. 2b.
Then, the DEGs were used to perform the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. The results revealed that the DEGs were mainly involved in several GO-biological process (BP) terms including nuclear division, organelle fission, chromosome segregation, mitotic nuclear division and sister chromatid segregation (Fig. 2c and Supplementary Table 2), mainly enriched in the KEGG pathways including cell cycle, chemical carcinogenesis, retinol metabolism, drug metabolism and DNA replication (Fig. 2d and Supplementary Table 3).
Taken together, these results collectively suggested that there were 241 DEGs identified from the BIRC5 high- and low- expression groups, which were mainly associated with nuclear chromosome activities and cell cycle processes.
3.3 Analysis of the WGCNA
The 338 HCC samples were included in the WGCNA to analyze the relationship of the module and the clinical traits in R. Figure 3a demonstrated the sample dendrogram and trait heatmap. Soft thresholds were estimated by the scale independence and mean connectivity. As shown in Fig. 3b, when R2 reached 0.9 (red line), the soft threshold had stabilized and 8 was close to the dividing line (red line, R2 = 0.9). When the mean connectivity was close to 0, the soft threshold was also 8 at the same time. Thus, the power of β = 8 (scale free R2 = 0.9) was chosen as the soft-thresholding parameter. The modules with eigengenes correlation greater than 0.6 would be merged (Fig. 3c). Then, the different modules (MEs) and clinical traits (including survival status, survival time, stage, gender, race and age) were correlated by the Pearson correlation analysis. We chose the blue module (MEblue) which had the highest correlation with survival status (r = 0.22, p-value = 1e-04), survival time (r=-0.22, p-value = 9e-05), stage (r = 0.27, p-value = 2e-06, Fig. 3d). Additionally, scatterplots showed that gene significance (GS) were highly correlated with module membership (MM) in the blue module (p-value < 0.0001, Fig. 3e and Fig. 3f). Then, according to the conditions of GS. status > 0.2 and MMblue > 0.7 as well as GS. times < -0.2 and MMblue > 0.7, genes were screened getting 180 module genes related to clinical traits, which were overlapped with the 241 DEGs by the Venn diagram, ultimately getting 33 candidate genes (Fig. 3g).
3.4 Establishment and verification of the risk signature
The univariate analysis were performed to demonstrate whether the 33 candidate genes was related to the survival of HCC patients. Further, multivariate analysis was applied to construct a risk signature. As shown in Table 1, Fig. 4a and Fig. 4b, we obtained 8 genes related to the HCC prognosis (p-value < 0.1), among which CENPA, CDCA8, EZH2, KIF20A, and KPNA2 acted as hazardous factors (HR > 1), and CCNB1, KIF18B and MCM4 were protective factors (HR < 1). Additionally, there were high correlation between the BIRC5 gene and the 8 genes (all r > 0.72, all p-value < 0.0001, Fig. 4c). The risk signature was constructed by the 8 genes, followed by calculating the risk score. To preferably understand the impact of these 8 genes for HCC patients, K-M analysis was conducted in the TCGA dataset to compare the OS between the BIRC5 high- or low- expression groups. Importantly, the result showed that the expression of 8 genes were correlated with the OS of HCC patients (p-value < 0.001, Supplementary Fig. 1).
To demonstrate the potential predictive impact of this risk signature on clinical outcomes in HCC patients, the samples were divided into high- and low-risk groups. Figure 4d and Figure 4e revealed the survival state distribution of HCC samples. The results demonstrated that the number of HCC patients who died increased with the risk score increased in TCGA and ICGC databases, respectively. Besides, there was a significant difference in OS (P < 0.0001) between the two groups in TCGA (Figure 4f) and ICGC databases (Figure 4g). Thereafter, to substantiate the efficiency of the risk signature, the ROC curves at 1- and 3-years were conducted, indicating that the risk score had high accuracy in in TCGA and ICGC databases [area under the curve (AUC) > 0.73, Fig. 4h and Fig. 4i] in distinguishing the OS of HCC. The expression of the 8 genes was displayed by heatmap in TCGA (Fig. 4j) and ICGC databases (Fig. 4k).
Based on the comprehensive bioinformatics analyses mentioned above, they strongly supported that the efficacy of the risk signature was accurate and stable for the HCC prognosis prediction.
3.5 Independent prognostic analysis of risk score
The further studies were conducted to explore the prognostic value of the risk signature for HCC patients stratified by several clinicopathological characteristics, such as age, gender, TNM stages and clinical stages. We observed that the risk signature constructed by the 8 genes could be used to separate HCC patients into distinct prognosis groups with different clinicopathological characteristics (p-value < 0.05, Supplementary Fig. 2), suggesting that the risk score can predict both OS and clinicopathological characteristics.
Furthermore, we demonstrated whether the risk score was an independent factor in predicting the prognosis of HCC patients after adding to seven clinicopathological characteristics in the TCGA dataset. Regarding the univariate analysis, we found that the risk score, pathologic_M_stage, pathologic_T_stage and tumor_stage were strongly associated with prognosis (p-value < 0.05, Table 2 and Fig. 5a). For the multivariate analysis with the above factors, we noted that the risk score still strongly related to the OS (p-value = 7.7e-10, Table 3 and Fig. 5b). Additionally, 1- and 3-years survival probability were shown by the nomogram (Fig. 5c), and the calibration curves of them were displayed in Fig. 5d and Fig. 5e.
Accordingly, the risk score constructed by the 8 genes was proved to be a powerful and independent predictor for HCC outcome.
3.6 IPA of the genes in the risk signature
To further explore the underlying functions of the genes in the risk signature, we used IPA to analyze the classic signaling pathway and the disease-related pathways enriched by these 8 genes. As illustrated in Fig. 6a, the 8 prognostic genes were obviously enriched in nuclear chromosome activity and cell cycle related classic signaling pathways, such as Mitotic roles of Polo-like kinase, Cell cycle: G2/M DNA damage checkpoint regulation, DNA damage, Kinetochore metaphase signaling pathway, Cell cycle control of chromosomal replication. Disease-related pathways that were enriched by 8 genes mainly included Cancer, Endocrine system disorders, Immunological disease, Inflammatory response, Nutritional disease, and Reproductive system disease (Fig. 6b). Meanwhile, the result of PPI of the 8 genes demonstrated that these 8 genes were related to each other mainly with EZH2 as the core (Fig. 6c).
3.7 Analysis of the immunotherapy response prediction
To investigate the clinical significance of risk score, we evaluated the IPS and TIDE score between the high- and low-risk groups. Compared with the high-risk group, the EC was significantly decreased (p-value = 3.602e-05, Fig. 7a), SC (p-value = 0.0163, Fig. 7b) and CP (p-value = 4.52e-04, Fig. 7c) were significantly increased in the low-risk group. These results demonstrated that the genes of the low-risk group could cause an immune response. That is, these genes can stimulate specific immune cells (such as the suppressor cells, checkpoints or immunomodulators) to activate, proliferate, and differentiate, and ultimately produce immune effect substance antibodies and the sensitized lymphocytes [15]. In addition, the TIDE score in patients with high risk score were significantly higher than those with low risk score (p-value = 0.0018, Fig. 7d). Since patients with higher TIDE score are more likely to have a higher opportunity of antitumor immune escape, showing a lower response rate of immune checkpoint blockade (ICB) treatment [14]. These results suggested that the HCC patients of TCGA database with low risk score were more sensitive to the therapy of ICB.