DOI: https://doi.org/10.21203/rs.3.rs-141169/v1
Dysregulation of the balance between proliferation and apoptosis is the basis in human hepatocarcinogenesis. There is a possible association of apoptosis dysregulation with poor prognosis in many malignant tumors, such as hepatocellular carcinoma (HCC). However, the prognostic effect of Apoptosis-related genes (ARGs) on HCC is still unclear.
A total of 161 ARGs expression levels were analyzed based on The Cancer Genome Atlas (TCGA) database(https://cancergenome.nih.gov/) to screen for differentially expressed ARGs. Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed to determine the underlying molecular mechanisms of screened ARGs in HCC. ARGs prognostic values were identified using Cox regression to subsequently establish a prognostic risk model and scoring in patients. Kaplan-Meier (K-M) and receiver operating characteristic (ROC) curves were plotted to determine the prognostic value in the model.
Compared to normal specimens, 43 highly upregulated and 8 downregulated ARGs respectively and their normal counterparts in HCC specimens were screened. KEGG analysis demonstrated pathways correlated with these 51 genes which included MAPK, P53, TNF, PI3K-Akt signaling pathways. With Cox regression, 5 prognostic correlated with ARGs (PPP2R5B, SQSTM1, TOP2A, BMF, and LGALS3) were obtained to develop the prognosis model. According to the median of risk scores, patients were categorized into high-risk and low-risk groups. Patients in low-risk groups had a significantly higher two-year or five-year survival probability (p < 0.0001). The risk model had better potency than other clinical characteristics, with the area under the ROC curve (AUC = 0.741). Prognosis of HCC patients was established from a plotted nomogram.
This present study established a novel prognostic risk model for predicting HCC according to the expression of ARGs. The present advancement can potentially contribute to prediction prognosis and individualized treatment of HCC patients.
Liver cancer has been ranked sixth among the most common tumors worldwide and the fourth leading cause of cancer-related mortality [1]. Among the frequent primary liver cancer, hepatocellular carcinoma (HCC) accounts for approximately 75% cases [1]. Despite the advancements in diagnosing, techniques and treatment of HCC, the survival of the patients worsens following the high rate of recurrence and metastasis [2–4]. TNM staging system is a traditional method for prognostic prediction, lacks performance accuracy in its clinical application due to substantive variations in HCC clinical outcomes [5]. Over the past decades, serum alpha-fetoprotein (AFP) has been the only biomarker available for detecting and predicting prognosis, through with low sensitivity limits its clinical utility [6]. Therefore, identification of a novel prognostic biomarker and an advanced prognostic model for HCC patients is of paramount importance.
Bioinformatics analysis had been extensively adopted over the past few years as an effective tool in exploring functions of numerous differentially expressed genes (DEGs) and determining the complexity of HCC occurrence and development [7, 8]. Hub genes and pathways associated with pathogenesis and prognosis of HCC via a series of bioinformatics analyses were established by Meng et al [9]. However, these studies usually ignore clinical information such as sex, age, grade, and stage of tumors. It may be very innovative to construct a prognostic model combining patient’s gene expression level and clinical information. Hepatocarcinogenesis develops following the imbalance between proliferation and apoptosis [10]. Results of a recent study had revealed that the spindle and kinetochore-related complex subunit 3 (SKAT3) overexpressed in HCC potentially inhibits p53 activation by binding to cyclin-dependent kinase 2 (CDK2), before impeding cell apoptosis and promoting cancer cell proliferation [11]. Besides, several biomolecules may influence HCC prognosis by regulating apoptosis-related genes or apoptosis-related pathways [12–14]. Association of the haplotype of the apoptosis-related gene CDKN1B with the prognosis of HCC patients who underwent surgical resection were reported by Yu et al [15]. CCT/ACT haplotype patients had lower overall survival rates than those with common ACT/CCT haplotype. Thus, ARGs can potentially assess HCC prognosis.
Herein, based on the gene expression and clinical characteristics data obtained from the Cancer Genome Atlas (TCGA) database, we determined the ARG related to the prognosis and develop a prognostic prediction model. The model potentially calculates the risk score to predict and evaluate the prognosis of HCC patients.
The mRNA expression data and clinical information of HCC patients were obtained from TCGA database (https://cancergenome.nih.gov/). The clinical information includes age, gender, TNM stage, T stage, N stage, M stage, and histological grade. A total of 161 apoptosis-related genes (ARGs) were acquired from the gene set “HALLMARK_APOPTOSIS” in the Molecular Signatures Database v7.1 in GSEA.
Gene set with prominent differences between in HCC and normal samples were explored by GSEA. Subsequently, the limma package and the Wilcoxon signed-rank test in R software 3.6.2(∣logFC∣ > 1, FDR < 0.05) were used to show the significantly differentiated expressed ARGs in mRNA expression profiles of HCC cohort. The pheatmap and ggpubr packages in R software were used to develop volcano plots, heatmaps, and box plots.
Gene ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed to explore the potential biological functions of ARG participation before visualized through R software with packages such as ggplot2, DOSE, clusterProfiler, enrichplot, GOplot, digest, etc. Interactions among the selected ARGs were determined through protein-protein interaction (PPI) network from the STRING database (http://www.string-db.org/) and visualized by Cytoscape software.
Univariate and multivariate Cox regression analyses were conducted to identify the prognosis-associated ARGs in HCC. A prognosis-associated prediction formula acquired from multivariate Cox regression was used to construct a prognostic model using package “glmnet” in R. According to the prognostic model, the Kaplan-Meier (K-M) method was performed to analyze the survival rate of the high- and the low-risk groups. Subsequently, we use the area under the ROC curve (AUC), and KEGG enrichment analysis to assess the predictive value of the prognostic model. Finally, the R package (rms) was used to develop a risk model-based nomogram for predicting the prognosis of HCC patients.
The gene expression data were normalized by log2 transformation. Thereafter, the statistical analyses and plots were performed and constructed using R software 3.6.2 (https://www.r-project.org/) and Perl language packages. Respectively, p-value < 0.05 was considered statistically significant.
Following the search in the Molecular Signatures Database v7.1 in GSEA, 161 ARGs was identified from the gene set “HALLMARK_APOPTOSIS”. Thereafter, the 161 ARGs in this gene set was further assessed using GSE analysis to ascertain the presence of biological significance in HCC. Results in (Fig. 1a) showed this ARG set was significantly differentially expressed between HCC and normal specimens.
The mRNA sequence data of 374 cases of HCC and 50 cases of normal tissue were obtained from TCGA. The limma package and the Wilcoxon signed-rank test in R (∣logFC∣ > 1, FDR < 0.05) were used to screen the differentiated expressed ARGs in HCC and non-tumor samples. In this study, 43 and 8 ARGs were significantly upregulated and downregulated, respectively. In addition, their results were plotted by volcano plot, heatmap, and box plot (Fig. 1(b-d)).
To further study the main biological functions and significant pathways of these 51 identified genes, the GO enrichment and KEGG pathway enrichment analysis were performed. The plotted results are shown in (Fig. 2). The 51 identified ARGs were involved in the pathways related to cellular apoptosis (Fig. 2a and 2b). According to KEGG enrichment analysis results, the 51 ARGs possible contribute to platinum drug resistance and transcriptional misregulation in cancer as well as associated with some oncogenic pathways such as MAPK, P53, TNF, PI3K-Akt signaling pathways (Fig. 2c). The STRING online tool was used to establish a PPI network to explore the interactions among the proteins coded by ARGs. Results observed using the Cytoscape software (Fig. 2d).
Univariate Cox regression analyses were performed on both mRNA expressional and corresponding clinical data of 51 selected ARGs to identify their prognosis-associated ARGs in HCC (Fig. 3a). There were 20 upregulated ARGs and 1 downregulated ARG which were statistically significant. These 21 identified genes were subsequently analyzed by multivariate Cox regression (p < 0.05) to determine ARGs association with the prognosis of patients and acquire corresponding regression coefficients. Following the multivariate Cox regression, five prognosis-related ARGs were screened which contained PPP2R5B, SQSTM1, TOP2A, BMF, and LGALS3. Expression levels in normal and HCC specimens were further compared to establish the prognostic value of these 5 genes. The 5 identified ARGs in HCC specimens had significantly higher expression levels than normal specimens as indicated by heatmap and boxplot (Fig. 3b and 3c). Besides, the K-M curve was constructed from the comparisons of the survival rates differences in high- and low-expressed groups in the identified ARGs. Interestingly, the high expression levels in the 5 identified ARGs indicated a low survival rate (Fig. 3d).
On the other hand, mutations of these 5 genes in HCC were analyzed by the cBioPortal database (http://cbioportal.org). Among the data of 353 HCC patients in this database, 22 patients (6.3%) had mutations. Notably, among the 22 patients with the mutation, 0.85% had missense mutations, 0.56% had amplifications, 0.56% had deep deletions in PPP2R5B. However, in the SQSTM1, 0.28% had missense mutations, 0.85% had amplifications and 0.28% had deep deletions whereas in TOP2A, 0.85% had missense mutations, 0.28% had truncating mutations, 0.56% had amplifications and 0.28% had deep deletions. Moreover, in BMF, 0.28% had missense mutations, and 0.28% had deep deletions whereas 0.28% had amplifications in LGALS3 (Fig. 3e).
Simultaneously, we revealed that these 5 genes were significantly upregulated in HCC from the immunohistochemical results of the Human Protein Atlas (HPA) database (Fig. 4).
We combined the expression level of ARGs and the regression coefficients of multiple Cox regression analysis to summarize the risk scoring formula. The risk score = (0.385327 * expression value of PPP2R5B) + (0.2787 * expression value of SQSYM1) + (0.152062 * expression value of TOP2A) + (0.172177 * expression value of BMF) + (0.110211 * expression value of LGALS3). All genes had a positive coefficient indicating that the high expressions of identified genes were negatively associated with prognosis. The risk score in HCC patients was calculated by this formula, and these patients were divided into high- and low-risk groups based on the median risk score as a cutoff (Fig. 5). The plotted heatmap exhibit the 5 ARGs expression levels which showed that patients in the same group had distinct expression levels (Fig. 5a). Patients scores were ranked in ascending order (Fig. 5b) and their survival time displayed in the scatterplot (Fig. 5c) revealed that the low-risk patients had a longer survival time and higher survival rate than those with high-risk.
Survival analysis of high- and low-risk groups were performed to further illustrate the correlation between the risk score and the prognosis of patients. First, we demonstrated the prognostic significance of the risk score using K-M curves and the log-rank method as shown in (Fig. 6). Patients in low-risk groups had a significantly high two-year or five-year survival probability (p < 0.0001; Fig. 6) compared with high-risk groups, revealing a worse prognosis.
Both univariate and multivariate Cox regression analyses were used to compare the predictive value of the prognostic model and other clinical variables on prognosis. The risk score was established as an independent prognosis predictor in patients with HCC (Fig. 7a and 7b). On the contrary, clinical variables were not independent prognosis predictor as shown in the result of multivariate cox regression analysis (p > 0.05; Fig. 7b). Moreover, the ROC curve was plotted to validate the predictive value of the risk score (Fig. 7c). The curves demonstrated that the AUC of the risk score was 0.741 as the highest value than that of the clinical variables. The results suggested that the risk score had a higher predictive value than the clinical variables.
The KEGG analysis was utilized to explore the potential biological pathways which were associated with 5 ARGs. The results showed that the major 5 correlated pathways were apoptosis, cell cycle, mTOR signaling pathway, WNT signaling pathway, and phosphatidylinositol signaling system (Fig. 7d).
To better apply the risk score to predict the prognosis of HCC patients, we combined the risk score with the corresponding clinical variables to construct a nomogram to predict the OS of patients at 1, 2, and 3 years. Given the risk scores and clinical variables, an average point for a patient can be established and extrapolated to determine the 1-, 2- and 3-year OS among the HCC patients (Fig. 8).
Based on the gene expression and corresponding clinical data acquired from the TCGA database, we analyzed the correlations between these clinical variables and the risk score formed by 5 ARGs. The results indicated that the risk score was correlated with the stage; BMF was associated with age, gender, grade, and stage; PPP2R5B and LGALS3 were related to the stage; SQSTM1 was associated with age and gender; TOP2A was connected with grade and stage (Fig. 9).
HCC is the most common type of malignancy with a high recurrence rate and worse prognosis [16, 17]. Studies have shown limitations in achieving early diagnosis and treatment of HCC [18]. Therefore, the identification of a prognosis-related biomarker and the development of a prognostic model is of critical importance for HCC patients. Currently, bioinformatics analysis plays an increasingly significant role in identifying therapeutic targets for diagnosis, treatment, and prognosis of numerous tumors [19]. The process of apoptosis plays a crucial role in liver tumor development and regeneration [12, 20]. Insufficient apoptosis potentially leads to occurrence and development of liver tumors [21].
In this work, we developed a novel prognosis-predictive model for HCC based on the expression of ARGs. Based on the 161 ARGs list from the GSEA and the data from the TCGA database, 43 significantly upregulated ARGs and 8 significantly downregulated ARGs in HCC specimens compared with normal specimens were screened out. These 51 genes mainly participated in the pathways associated with cellular apoptosis according to GO enrichment analysis. KEGG analyzes revealed pathways correlated with these 51 genes which included MAPK, P53, TNF, PI3K-Akt signaling pathways. Besides, five prognosis-related ARGs were screened after the multivariate Cox regression which contained PPP2R5B, SQSTM1, TOP2A, BMF, and LGALS3. The high expressions having these five ARGs were negatively associated with prognosis. Based on this, we designed a risk model that considered an independent prognostic model of HCC with multivariate regression analysis. The predictive value of the model was confirmed by performing the K-M curve and ROC curve that showed the highest prognostic predictive power of the model with other clinical features of HCC patients. Therefore, by combining the risk score and clinical features as the basis of the nomogram, facilitate efficiency in predicting the prognosis of HCC patients.
The potential impact of these five hub ARGs on the progression of HCC were earlier observed. For example, Topoisomerase II alpha (TOP2A), was identified as a potential biomarker for cancer therapy in ovarian cancer, colon cancer, pancreatic adenocarcinoma and HCC [9, 19, 22, 23]. However, TOP2A was up-regulated in HCC [24]. Studies have proved that overexpressed TOP2A in HCC leads to a worse prognosis and its reactive agents has a potential therapeutic effect on HCC patients [25]. SQSTM1, an autophagy-related protein, degrades during autophagy. Moreover, autophagy was demonstrated to suppress spontaneous tumorigenesis in the liver [26]. Besides, BMF, one of the Bcl-2 family members, promotes apoptosis by inactivating pro-survival Bcl-2-like proteins via BH3 domain following its activation by stress signals [27]. Laura and Xinping et al [28, 29] revealed a close relationship between BMF and activated caspase-3as well as apoptosis inhibition using miR-221inhibits by targeting BMF in HCC and ovarian cancer cells. Besides, BMF inhibition promotes survival of multiple myeloma [30]. LGALS3 plays a significant role in the progression and metastasis of colon cancer, acute myeloid leukemia, melanoma and pituitary tumors [31–34]. Recent research had indicated that LGALS3 enhanced the tumorigenesis and metastasis of HCC cells via β-catenin signaling [35]. PPP2R5B (B56β) as the regulatory subunit of PP2A can influence cell growth, survival, and metabolism [36]. PPP2R5B mutation is hypothesized to cause human overgrowth [37], though its role in HCC had remained elusive.
In summary, the present study established a 5-gene risk model and constructed a nomogram for predicting outcomes of HCC patients for classification in clinical practice. Moreover, the study results provided an advanced survival prediction tool for HCC patients and revealed the association between ARGs and HCC that can further be confirmed by corresponding experimental studies.
Availability of data and meterials
All the data and materials are available.
Ethics approval and consent participate
There were no cell, tissue, or animal studies. No ethical requirements are involved.
Consent for publication
Not applicable.
Conflicts of Interest
The authors declare that they have no competing interests.
Authors' contributions
RL and GW designed the study; RL collected data and wrote the manuscript; GW performed the data analysis and drew figures and tables; RL, GW, and DB reviewed and revised the manuscript. Besides, the organization, revision and submission of this manuscript were completed by DB. Final draft were read and approved by all authors.
Funding
This project was funded by The National Natural Science Foundation of China (No. 81871909), “13th five-year Plan” Science and Education strong Health Project leading personnel of Yangzhou (LJRC20181; YZCXTD201801), Provincial-level discipline leader of the NJPH (DTRC201809), Research Funds for Tian Qing Liver Diseases (TQGB20200180).
Gene id | Coefficient | HR | HR.95L | HR.95H | P value |
---|---|---|---|---|---|
PPP2R5B | 0.3853266 | 1.4700944 | 1.0225253 | 2.1135688 | 0.0375054 |
SQSTM1 | 0.2786996 | 1.3214104 | 1.1163784 | 1.5640981 | 0.0011966 |
TOP2A | 0.1520620 | 1.1642324 | 1.0033461 | 1.3509168 | 0.0450713 |
BMF | 0.1721765 | 1.1878875 | 0.9781174 | 1.4426455 | 0.0824262 |
ARGs, apoptosis-related genes; HCC, hepatocellular carcinoma. |