Screening of DEGs in HBV–HCC patients.
The HBV–HCC datasets GSE55092, GSE62232, and GSE121248 were normalized, and the DEGs were selected by corrected P < 0.05 and |log FC| >2 with ‘limma’. The GSE55092 dataset involved 1289 DEGs, 496 of which showed upregulated expression and 793 exhibited downregulated expression (Fig. 1a). The GSE62232 dataset involved 1583 DEGs, of which 768 showed upregulated expression and 815 exhibited downregulated expression (Fig. 1b). The GSE121248 dataset involved 552 DEGs, of which 165 showed upregulated expression and 387 exhibited downregulated expression (Fig. 1c). The cluster heatmaps of the top-100 genes were shown in Fig. 1.
The expression matrix of these three datasets was obtained using ‘limma’, and sorted by log FC. The integrated DEGs were screened through ‘RRA’ according to corrected P < 0.05 and |log FC| >1. A total of 135 DEGs were identified, 57 of which showed upregulated expression, and 78 of which exhibited downregulated expression (Supplementary Table S1 online). We summarized the top-20 genes with upregulated and downregulated expression in the heatmap shown as Supplementary Fig. S1
Enrichment analyses using GO and KEGG databases. Functional annotation of integrated DEGs was analyzed using DAVID. Functional annotation was based on three categories: Biological process (BP), Molecular function (MF), and Cell component (CC). Annotation of genes with upregulated and downregulated expression was listed in Supplementary Table S2 and Table S3, respectively.
The top-five MF of genes with upregulated expression were “ATP binding”, “Protein kinase activity”, “Protein kinase binding”, “ATP-dependent microtubule motor activity”, and “Peptidase inhibitor activity” (Fig. 2A). The top-five CC of genes with upregulated expression were “Cytoplasm”, “Nucleus”, “Membrane”, “Midbody”, and “Centrosome” (Fig. 2A). The top-five BP of genes with upregulated expression were “Cell division”, “Mitotic nuclear division”, “Positive regulation of apoptotic process”, “Mitotic spindle organization”, and “Cytokinesis” (Fig. 2A). The genes with upregulated expression were shown in Fig. 2B.
The top-five MF of genes with downregulated expression were “Heme binding”, “Iron ion binding”, “Zinc ion binding”, “Oxidoreductase activity”, and “Monooxygenase activity” (Fig. 2C). The top-five CC of genes with downregulated expression were “Extracellular exosome”, “Extracellular region”, “Extracellular space”, “Endoplasmic reticulum membrane”, and “Organelle membrane” (Fig. 2C). The top-five BP of genes with downregulated expression were “Oxidation–reduction process”, “Cellular response to cadmium ion”, “Negative regulation of growth”, “Cellular response to zinc ion”, and “Xenobiotic metabolic process” (Fig. 2C). The genes with downregulated expression were shown in Fig. 2D.
Pathway analyses using the KEGG database were undertaken using DAVID and visualized with ‘ggplot2’ and Cytoscape 3.7.1. In these DEGs, the pathways that were augmented were “Caffeine metabolism”, “Cell cycle”, “p53 signaling pathway”, “Oocyte meiosis”, “Mineral absorption”, “Linoleic acid metabolism”, “Progesterone-mediated oocyte maturation”, “Retinol metabolism”, “Prion diseases”, “Chemical carcinogenesis”, “Tryptophan metabolism”, “Drug metabolism-cytochrome P450”, “Bile secretion”, “Metabolism of xenobiotics by cytochrome P450”, “Steroid hormone biosynthesis”, ‘Arachidonic acid metabolism”, “Glycolysis/gluconeogenesis”, “Complement and coagulation cascades”, “HTLV-I infection”, “The 5' adenosine monophosphate-activated protein kinase (AMPK) signaling pathway”, “Insulin signaling pathway”, “Proteoglycans in cancer”, and “Metabolic pathways” (Fig. 2E). The genes involved in these pathways were visualized in Fig. 2F.
Investigation of DEGs using PPI networks.
We identified 135 DEGs in the GSE55092, GSE62232, and GSE121248 datasets, and constructed PPI networks through the STRING database. We screened seven functional modules from the PPI network through MCODE within Cytoscape. The main function modules are shown in Supplementary Fig. S2 A. The genes in this module that were enriched according to the KEGG database were “Cell cycle”, “Oocyte meiosis”, “p53 signaling pathway”, “Progesterone-mediated oocyte maturation”, “HTLV-I infection”, and “Viral carcinogenesis” (see Supplementary Fig. S2 B online). TOP2A, DLGAP5, RAD51AP1, ZWINT, BUB1B, CCNB1, FOXM1, CCNB2, AURKA and CDK1 were selected as top-ten hub genes through Cytoscape according to the value of degree. The expression heatmap of hub genes in these datasets was shown as Fig. 3.
Verification of hub genes.
We verified these ten hub genes in TCGA HCC cohort, ICGC-LIRI-JP dataset, and Human Protein Profiles. We identified DEGs through ‘edgeR’ in the ICGC-LIRI-JP and TCGA cohort. Expression of these ten hub genes was increased dramatically in ICGC-LIRI-JP (Fig. 4A) and TCGA (Fig. 4B) project. We also confirmed these ten hub genes protein expression patterns in the Human Protein Profiles, FOXM1 and AURKA were robustly positive in HCC samples compared with normal tissues, TOP2A, DLGA5, CCNB1, CCNB2, and CDK1 were moderately increased in HCC cohort in contrast to normal patients (Fig. 4C). However, RAD51AP1, ZWINT and BUB1B were not found in the website.
Construction and verification of a two-gene prognostic model.
Survival of hub genes was assessed using the ICGC-LIRI-JP cohort. All of these ten hub genes were correlated significantly with patient survival (see Supplementary Fig. S3 online). Expression of these ten hub genes in these 232 HCC patients was documented for multivariate cox regression analyses.
These patients were separated into low or high-risk groups according to the median risk score as the cut-off value.
FOXM1 and CDK1 were identified as independent prognostic factors related to patient survival with P < 0.05; their coefficients were 0.108 and 0.129, and hazard ratios were 1.114 and 1.138 respectively. The Kaplan–Meier curve for Overall survival (OS) in the high- and low-risk groups based on the genes of FOXM1 and CDK1 was dramatically different (P = 6.878 × 10− 7) (Fig. 6A). The prognostic capacity based on the genes of FOXM1 and CDK1 was analyzed by the area Under the curve (AUC) of ROC (Fig. 5A). The AUC for survival at 1, 2 and 3 years was 0.769, 0.786 and 0.815, respectively. A higher AUC indicates a better forecasting model than a lower AUC, so our two-gene model showed high sensitivity and specificity to assess the prognosis.
Furthermore, we verified the two-gene prognostic model by TCGA HCC cohort, The Kaplan–Meier curve for OS was significant different (P = 2.718 × 10− 3) between the high- and low-risk groups (Fig. 5B). The AUC for survival at 1, 2 and 3 years was 0.639, 0.636 and 0.644, respectively (Fig. 5B). Therefore, we can conclude that FOXM1 and CDK1 are prognosis molecules for ICGC-LIRI-JP and TCGA HCC project.
We established a liver PTEN knockout (KO) mouse, extrons 4 and 5 of PTEN were deleted (Fig. 6A), and these mice developed liver tumors at 12 months (Fig. 6B). mRNA (Foxm1 and Cdk1) expression increased significantly in the liver tumor tissue of KO mice in contrast to the normal liver tissue of CTL mice (Fig. 6C). These results indicated that the liver tumor expression patterns of FOXM1 and CDK1 were consistently in human and mouse.
Establishment of a prognostic model using a nomogram.
We collected the complete clinical data of 213 patients in the ICGC-LIRI-JP dataset. Multivariate cox regression was undertaken to evaluate the two-gene prognostic model and clinical factors (histology grade, pathology stage, age, cancer history, malignancy and sex) for the prognosis. The histology grade, pathology stage, sex and two-gene prognostic model were independent prognostic factors with P < 0.05 (Fig. 7).
We constructed a nomogram with the four independent prognostic factors selected from multivariate cox regression analyses to predict OS at 1, 2 and 3 years in the ICGC-LIRI-JP cohort (Fig. 8A). The gray line in the calculation curve indicated the best prediction, and the nomogram model (described with a red line) matched well with the gray line (Fig. 8B). We also analyzed the AUC of the nomogram model, and the AUC for OS at 1, 2 and 3 years was 0.837, 0.804 and 0.829, respectively (Fig. 8C). Therefore, we concluded that the nomogram model performed well.
We evaluated the nomogram model through DCA. The nomogram model based on a combination of histology grade, pathology stage, sex and two-gene prognostic model performed better to predict OS than that using a single factor (Fig. 8D). Hence, the nomogram model could facilitate preoperative consultation of the patient, decision-making by the clinician and the follow-up schedule.