Nomogram for Prediction of Hepatocellular Carcinoma Prognosis

A reliable nomogram for predicting prognosis of Hepatocellular Carcinoma (HCC) is lacking, and the cause of poor outcome still remains unclear. It is urgently to prolong the anticipation of overall survival (OS) time of HCC by exploring new therapeutic targets. Different expression genes were calculated in R packages, hub genes dened as overlapped candidates across ve datasets and estimated in Oncomine database. A prognostic nomogram constructed on multivariate Cox analysis and evaluated by receiver operating characteristic curve and concordance index analysis. The fraction of TME cell types were estimated by xCell, a gene signatures enriched method. Hypoxia scores were calculated by single sample gene set enrichment analysis upon the 15 hypoxia signatures. Statistically signicance and correlation analyses were processed in R program, co-expression interaction plotted in Cytoscape.


Page 4/24
Background Liver cancer is the fourth most common cause of cancer-related mortality and ranks the sixth in incidence globally (Villanueva, 2019). Hepatocellular carcinoma (HCC) is the prevalent form of liver cancer and occurs in patients with chronic liver diseases, mostly as a result of hepatitis B or C virus (HBV or HCV) infection, alcohol abuse or nonalcoholic fatty liver (Llovet et al., 2016;Villanueva, 2019). Although the prevention, diagnosis and treatment of HCC have been signi cantly developed in the past decades, especially the targeted therapy and immunotherapy have improved survival in HCC patients, the objective response rates of these promising treatments are less than 50%, and the substantial risks of serious treatment related side effects exist (Bruix et al., 2019;Galon and Bruni, 2020;Havel et al., 2019;Yang and Heimbach, 2020). Thus, the evaluation of tumorigenesis and tumor progression is of great signi cance for predicting prognosis, exploring reasonable therapeutic strategies, and reducing the HCC mortality.
To date, alpha-fetoprotein (AFP) is the most general factor in auxiliary diagnosis and surveillance, whereas, the sensitivities of AFP with ultrasound for early HCC detection is only 63% (95%CI 48% -75%) (Tzartzeva et al., 2018;Yang and Heimbach, 2020). In clinical practice, tumor staging systems, commonly used to determine the treatment strategies, are limited in predicting prognosis (Yang and Heimbach, 2020). Studies have de ned that the treatment response and survival time of HCC patients depend not only on tumor staging but also on heterogeneous and gene signatures (Craig et al., 2020;Hoshida et al., 2008;Nault et al., 2013;Nault and Villanueva, 2020;Sia et al., 2017). Currently, transcriptomic sequencing in tumor tissues has been widely used to identify patients with homogeneous phenotypes. In this scenario, however, some molecular subtypes showed converging and similar survival pro les, making them redundant subtypes in terms of survival differences (Chaisaingmongkol et al., 2017;Xue et al., 2019). Approximately, 2892 genes have been reported relate to the prognosis of HCC (Uhlen et al., 2017).
Taking into account of the feasibility of clinical work, it is rational to screen a few pivotal genes to construct prognostic models for HCC. Several studies based on speci c gene signatures of HCC have been published, but only 2 of them had been externally validated, of which one was based on microarray, the sensitivity and accuracy was limited as compared with RNA-Seq data, and the other one only focused on HCV-related HCC (Ganne-Carrié et al., 2016;Nault et al., 2013;Nault and Villanueva, 2020;Sia et al., 2017;Villanueva et al., 2011).
Accumulative literature show that tumorigenesis, tumor progression, resistance to therapy, invasion, and metastasis are controlled by tumor microenvironment (TME), exploring the bidirectional interaction of cancer cells and TME is essential for developing e cient prognostic model and consequently provide new targets for the treatment of HCC (Hanahan and Coussens, 2012;Quail and Joyce, 2013;Riera-Domingo et al., 2020). Here, based on multivariate Cox analysis and TME assessment, we aim to construct a reliable prognostic nomogram for HCC.

Materials And Methods
Data source and identi cation of DEGs RNA-Seq can quantify expression across a larger dynamic range spanning over ve orders of magnitude than microarray, but their correlation is very low for genes with low expression levels (Z. Wang et al., 2009). To make sure hub genes can be validated easily, the average expression of DEGs were set to be over 1% of the level of internal control genes (ACTB or GAPDH) in our study.
Raw counts of Mongolian RNA-Seq (accession number GSE144269) were also downloaded from GEO database containing 70 pairs of tumor and adjacent normal specimens (Candia et al., 2020). Mongolian_DEGs were calculated in R DESeq2 package (version 1.26.0), and ltered in the following criteria: FDR<0.01, |log2FC|>1 and base average expression > 0.01GAPDH.
LIHC RNA-Seq data of The Cancer Genome Atlas (TCGA) were downloaded from UCSC Xena dataset (https://xenabrowser.net/datapages/) in the form of HTSeq -FPKM. Filtered with hepatocellular carcinoma, primary solid tumor, and vial marked with A, there were 375 tumor specimens in the TCGA_LIHC dataset. Randomly sampled 192 specimens from the tumor specimens of TCGA_LIHC dataset were as a training set, which were merged with the 89 normal specimens. The DEGs were calculated in R Wilcox test after expression matrix transformed in exponential conversion at the base of 2, with FDR<0.01, |log2FC|>1 and either group of average expression > 0.01ACTB in normal samples.
Top 500 overall survival (OS) related genes were downloaded from Gene Expression Pro ling Interactive Analysis (GEPIA2, http://gepia2.cancer-pku.cn). The OS related genes were identi ed by the empirical cutoff value de ned as the median and quartile of OS time on the GEPIA2 database, named OS_Median and OS_Quartile gene set, respectively (Tang et al., 2019).

Oncomine database analysis
The hub genes were evaluated in the Oncomine database (https://www.oncomine.org/resource/login.html) across various types of cancers with the following threshold: p value of 0.0001, fold changes of 2, and gene rank of top 10%. Relative expression levels were downloaded from Wurmbach liver (GSE6764, containing 75 specimens) normalized in Oncomine database and plotted by R ggplot2 package (version 3.3.2) upon corresponding clinicopathological characters.

Elastic net penalty analysis
The elastic net penalty (ENP) was calculated in R glmnet package (version 4.0-2) by setting α of 0.5. As expression levels of hub genes were continuous variables, "gaussian" was the selected as "family" argument for the plot. Mean squared error was chosen in the cross-validation. Reasonable value of log lambda and coe cients were determined according to the smallest value of mean-square error.

Best subsets regression analysis
Best subsets regression (BSR) was processed in R leaps package (version 3.1), with "nvmax" set of 11 to corresponding to the 11 hub genes, and best set of variables for each model size were calculated. If best sets were not consistent, the prediction errors of each model were computed to de ne the best performance model, according to the three metrics of Adj.R2, CP and BIC.

Constitutions of Cox model
Kaplan-Meier curves plot was performed to de ne the suitable covariates for the following multivariate Cox analysis (http://www.sthda.com/english/wiki/cox-proportional-hazards-model). To evaluate the validity of the Cox model assumptions, residuals methods were used in this study, i.e., Schoenfeld residuals were used to check the proportional hazards assumption, deviance residual was applied to examine in uential observations and Martingale residual was employed to assess nonlinearity. Risk scores were predicted after the constitution of Cox model via R survival (version 3.2-7) and survminer (version 0.4.8) packages.

Estimation of TME cell fractions
The cell type fractions of TME were downloaded from TIMER2.0 (http://timer.cistrome.org/). The dataset was calculated on xCell, a gene signature-based method for the estimation of TME cell types, spanning multiple adaptive and innate immunity cells, hematopoietic progenitors, epithelial cells, and extracellular matrix cells (Aran et al., 2017;T. Li et al., 2020).

Evaluation of hypoxia status
Hypoxia scores was evaluated using single sample gene set enrichment (ssGSEA) analysis based on the 15 gene signatures via R GSVA (version 1.34.0) package (Buffa et al., 2010;Ye et al., 2019). Unsupervised hierarchical clustering was performed to classify tumor samples into hypoxia-high, -low and -intermediate groups based on the relative expression level of these hypoxia signatures (Ye et al., 2019).

Statistical analysis
Kaplan-Meier curves plot and log-rank tests were used to analyze the statistically signi cant difference between subgroups for OS time via R survival (version 3.2-7) and survminer (version 0.4.8) packages. A survival receiver operating characteristic (ROC) curve was generated to validate the accuracy of the prognostic model in predicting OS with the R survivalROC (version 1.0.3) package. Calibration curves were assessed graphically by plotting the observed rates against the nomogram predicted probabilities and a concordance index (C-index) was calculated to determine the discrimination of the nomogram via rms package (version 6.0-1) in R.
The p values and coe cient of correlation analysis were calculated using R Hmisc (version 4.4-1) package and plotted by R corrplot (version 0.84) package with signi cant value of 0.05. The statistical signi cance of VEGF signaling (Pathway card) and HIF1A hypoxia signaling (PID pathway) related genes in two risk groups were plotted in R ggplot2 (version 3.3.2) package with p value of <0.05 (Wilcox test). Relative expression and detailed interaction of hypoxia signaling related genes were plotted by Pathway Commons App using Cytoscape (version 3.6.1).

Identi cation and evaluation of prognosis hub genes in open-access datasets
To identify hub genes for predicting prognosis of HCC, we used GEO microarray dataset based on the platform of GPL570 with 54675 gene annotations, and RNA-Seq dataset with high sensitivity and accuracy in mRNA expression levels. Taiwan HCC cohort of GSE45436, Japan HCC cohort of GSE112790 and Singapore cohort of GSE121248 mainly suffered from HBV infection (Shimada et al., 2019; H. W. Wang et al., 2013;S. M. Wang et al., 2007). France HCC cohort of GSE62232 was secondary to either a viral hepatitis infection (HBV or HCV) or cirrhosis (alcoholic hepatitis) (Schulze et al., 2015). 1733 reliable DEGs were identi ed using R limma package (version 3.42.3) with the lter criteria as described in Methods (Supplementary Table 1). Mongolia underlying HBV or HCV infection and alcohol consumption was the highest incidence of HCC around the world (Candia et al., 2020;Llovet et al., 2016 (Tang et al., 2019). Based on the above three DEGs gene lists and two OS gene lists, we developed and identi ed 11 overlapping candidates (Fig. 1a). Then we evaluated the relative expression of these hub genes in oncomine datasets. Four genes (DNASE1L3, ADH4, LCAT and ANXA10) were down-regulated in cancer versus normal tissues, while the other seven genes (TMEM106C, TUBG1, CCNB1, UBE2S, SPP1, ILF2 and KPNA2) were up-regulated in HCC (Fig. 1b).

CCNB1 and ANXA10 are reliable factors to predict HCC prognosis
We next performed ENP and BSR analyses to explore the contribution of above hub genes to the prognosis of patients with cancer. Based on the ENP analysis, three genes, i.e., CCNB1 (Cyclin B1), ANXA10 (Annexin A10) and ILF2 were extracted from hub genes to predict the outcome in the training set, according to the minimum mean-square error and corresponding Log lambda value of 4 ( Supplementary  Fig. 1a, b). Meanwhile, BSR analysis identi ed that the combination model of CCNB1 and ANXA10 showed the minimum value of coe cient variable (CV) error among the 11 identi ed hub genes, after compared all possible created models ( Supplementary Fig. 1c).
Based on the above analysis, overlapped genes of CCNB1 and ANXA10, but not ILF2, were emerged as predictors of prognosis, and their relative expression in Wurmbach liver (normalized by Oncomine database) were evaluated upon the corresponding detailed clinical stages (Fig. 1c). OS analysis showed that both of CCNB1 and ANXA10 were signi cantly different in the expression-Low and -High subgroups plotted by Kaplan-Meier curves for TCGA training set (p < 0.001). In addition, high level of CCNB1 and low level of ANXA10 were correlated with poor outcome and OS in advanced HCC (Fig. 1d). We also determined AFP but did not nd its signi cant correlation with OS (Supplementary Table 2). These data indicated that the combination model of CCNB1 and ANXA10 can be used for evaluating prognosis of HCC.

Development and constitution of the HCC prognostic nomogram
Clinical characteristics have the potential to improve the accuracy of HCC risk prediction, in combination with the prognostic genes (Fujiwara et al., 2018;. Thus we analyzed the clinical characteristics in TCGA training set. Univariate Cox analysis showed that BMI_cut (body mass index with cutoff value of 23.4), T stage, M stage, N stage, Stage and Tumor status were closely related to prognosis (p < 0.05) (Supplementary Table 2). As the Cox model is a proportional-hazards model, it implies curves for subgroups of patients should be proportional. BMI was known as one of the prognostic factors of various diseases including malignant tumors. BMI_cut presented that two subgroups of HCC patients were proportional along with survival time after Kaplan-Meier curves analysis ( Supplementary Fig. 2a). Meanwhile, both T stage and TNM stage respectively divided HCC patients into two groups proportionally over time ( Supplementary Fig. 2b, f). However, covariates of N stage, M stage and Tumor status were excluded for the following multivariate Cox analysis due to their Kaplan-Meier curves came cross at the initial segments ( Supplementary Fig. 2c-e). Since in clinics, TNM stage is more reliable than T stage to predict outcome of patients with cancer, BMI_cut and TNM stage were reasonable to be potential prognostic clinical characteristics. We next performed Multivariate Cox regression analysis, which indicated that the expression levels of CCNB1 and ANXA10 were independent risk factors for OS time in HCC (p < 0.05), adjusted by the covariates of Stage, BMI_cut (Fig. 2a). In order to check Cox model assumptions, Residuals method was used. From the output of scaled Schoenfeld residuals, the test was not statistically signi cant for each of the covariates, and so was the global test (p > 0.05) (Supplementary Fig. 3a). Comparison of the magnitudes of the largest dfbeta values to the regression coe cients suggested none of the observations was in uential individually ( Supplementary Fig. 3b). In addition, plots of the Martingale residuals against continuous covariates of CCNB1 and ANXA10 only showed slight nonlinearity ( Supplementary Fig. 3c).
Then we evaluated the Cox model in the training set by Kaplan-Meier curves analysis and found it was reliable to predict prognosis in the training set (p < 0.0001) (Fig. 2b) and was accurate in predicting OS by ROC curve with AUC value of 0.71 ( Supplementary Fig. 4f). Moreover, a nomogram using the rms package (version 6.0-1) in R to predict 1-, 3-and 5-year OS in patients with HCC, based on the coe cients of the multivariable Cox regression model, was constructed (Fig. 2c).

Satisfactory performance of the prognostic model
To test the performance of the above constructed nomogram, we compared the prognosis model to the other two prognostic models based on ENP and BSR analysis by Kaplan-Meier curve and ROC curve plots. OS curves showed the statistically signi cant between subgroups in all of the three models with p < 0.0001 ( Supplementary Fig. 4a-c). While AUC of 0.7083 for the Cox model was the maximum value compared to ENP-and BSR-risk models with AUC of 0.6511 and 0.6516 ( Supplementary Fig. 4d-f), which indicated that the nomogram outperformed than the other two risk models.
To further examine whether the prognostic model was reliable to predict the outcome of HCC. We evaluated the model in internal data set (TCGA validation set) and external cohort (GSE144269). As expected, the model sensitively distinguished Risk-High and -Low subgroups in the validation set, TCGA set and GSE144269 dataset, according to Kaplan-Meier curves with Log-rank test (Fig. 3a-c). In the ROC analysis, the AUC values were higher than 0.65, i.e., 0.72, 0.69 and 0.70 in validation set, TCGA set and GSE144269 dataset, respectively (Fig. 3d-f), which suggested that the prognostic model was e cient in predicting the OS rates in HCC.
To validate the accuracy of the above prognostic model, we determined C-statistic discriminatory index via R rms package (version 6.0-1), which showed a value of 0.71. Moreover, the bootstrapped calibration plots delineated the accuracy of the model in predicting 1-, 3-and 5-year OS rates in TCGA set (Fig. 3g-i), respectively.
Association of TME cell types with the nomogram TME encompasses vascular vessels, immune in ltrates, broblasts, stroma, and some other structures that are recruited to tumor tissue (Meurette and Mehlen, 2018). To explore the function of TME on antitumor therapy of HCC, we estimated the fractions of TME cell types using xCell, a gene signature-based enrichment method (Aran et al., 2017). According to Kaplan-Meier curves analysis, the enrichment scores of immune and stroma were signi cantly different in OS time of TCGA set (p < 0.05) (Fig. 4a), which indirectly suggested the key roles of some TME cells in promoting HCC progression.
We next performed correlation analyzes between OS time, risk scores of the nomogram and fractions of TME cell types. The fractions of common lymphoid progenitor (CLP), common lymphoid progenitor (CMP), endothelial cell (EC), hematopoietic stem cell (HSC), M2 macrophage (M2), T helper 2 cells (Th2) and regulatory T cells (Tregs) were signi cantly correlated with the risk scores and survival time ( Supplementary Fig. 5). Among these fractions, the proportions of ve cell types including CLP, EC, HSC, M2 and Th2 cells were changed signi cantly between risk-groups upon the prognostic model (Fig. 4b).
Next, we used univariate Cox regression to assess the proportional-hazards of the TME cells, which de ned CLP and Th2 as crucial prognostic factors that increased in greater hazard subgroup, while EC and HSC decreased hazards of death in HCC patients (Fig. 4c).
Hypoxic TME predicts poor outcome in HCC An essential feature of TME is hypoxia, which promotes tumor progression and drug resistance. Consistent with previous study (Ye et al., 2019), the risk scores of the prognostic model increased with the reduction of oxygen partial pressure in TME (Fig. 5a).
HIF1A is the central gene in anoxic regulation process (Bertout et al., 2008). We searched the HIF1A related pathways, namely PID_HIF1A_PATHWAY, described as hypoxic and oxygen homeostasis regulation of HIF-1-alpha (Schaefer et al., 2008). Firstly, we extracted the expression data of 19 related genes and found 14 up-regulated genes in RiskHigh versus RiskLow group upon the prognostic model (Fig. 5b, Supplementary Table 4). Then detailed interaction of hypoxia signaling related genes were plotted in Cytoscape 3.6.1, which showed that HIF1A was indeed the central factor in the PID pathway of HCC (Fig. 5c). These results highlighted the signi cance of hypoxia in predicting the prognosis of HCC.
Angiogenesis in TME correlates with poor prognosis of HCC It has been well established that VEGF is a downstream target of HIF-1 and that VEGF signaling pathway mediated angiogenesis promotes tumor metastasis and progression (Palazon et al., 2017). We next extracted the transcriptome mRNA expression level of 62 angiogenesis-related genes according to PathCards (https://pathcards.genecards.org/Pathway/1840) from TCGA set. We found that 37 genes were up-regulated while 3 genes were down-regulated in RiskHigh versus RiskLow group (Fig. 6a,  Supplementary Table 4), which suggested that the angiogenesis was active in RiskHigh group.
Previous literature indicated that EC and HSC played important roles in angiogenesis to supply oxygen and nutrition for tumorigenesis, tumor development and progression (Hirata and Sahai, 2017;Junttila and de Sauvage, 2013;Wu and Dai, 2017). We assessed gene signatures of TME cells, and found that most of them were up-regulated in RiskHigh group (Fig. 6c), and positively correlated with hypoxia scores and the risk scores (Fig. 6d). These results indicated that EC fraction was differentially regulated during the tumorigenesis and progression of HCC. It was decreased with the destructed structures of hepatic lobules, while the angiogenesis was increased in tumor progression.
Finally, all risk factors of the nomogram for predicting prognosis of HCC we identi ed above were estimated the validation. The proportion of CLP and Th2 in TME, enrichment scores of hypoxia based on 15-gene signatures, risk scores and relative expression of CCNB1 were positively correlated with each other, while the fraction of EC and HSC negatively correlated with the OS time of HCC patients (Fig. 6b). Together these data demonstrate that the nomogram proposed here is reliable for the prediction of HCC prognosis.

Discussion
HCC is one of most common cause of cancer-related mortality with no satisfactory anticipation by currently treatment strategies (Yang and Heimbach, 2020). It is of great signi cance to reveal the underlying molecular mechanisms of HCC progression and develop effective treatment strategies. In this study, we constructed a reliable prognostic nomogram, which signi cantly distinguished RiskHigh group from HCC patients. In addition, the prognostic model addressed four key issues: 1) the hub genes were de ned from independent multi-platform datasets; 2) the datasets covered all major HCC etiologies, including HBV, HCV, alcohol abuse, or NAFL; 3) the prognostic model combined clinical characterizations with transcriptome gene signatures; and 4) the nomogram outperformed than ENP-and BSR-risk models in internal validation and external con rmed cohorts.
When taken prognostic factor into account in the identi cation of hub genes, CCNB1 and ANXA10 were identi ed closely related with outcome of HCC. CCNB1, a regulatory protein involved in mitosis, is necessary for regulation of the G2/M transition phase in the cell cycle. Previous studies have been proved that high expression of CCNB1 is a candidate biomarker of poor prognosis in HCC (Alisi et al., 2011;Weng et al., 2012). ANXA10, a member of the annexin family, plays a role in the regulation of cellular growth and in signal transduction pathways. Down-regulation of ANXA10 in HCC is associated with vascular invasion, early recurrence, and poor prognosis (Liu et al., 2002;X. Liu et al., 2020). Although CCNB1 and ANXA10 were previously found to be related to the prognosis of HCC, our study is the rst to report the feasibility and accuracy of a prognostic nomogram based on their expressions, BMI and TNM stage. TME orchestrates the tumor growth, invasion, metastasis, and therapy resistance. Hypoxia is a common characteristic of TME and is a major contributor to enhance resistance of anticancer therapies (Sharma et al., 2019). We found that hypoxia scores were positively correlated with Risk scores of the nomogram, and almost all genes were up-regulated in hypoxia signaling related HIF1A pathway in RiskHigh group. Therefore, hypoxia-targeted therapy is an effective approach to signi cantly improve the prognosis of speci c patients with HCC.
According to our study, the fraction of EC was bidirectionally regulated in biological process of HCC. The hepatic lobules were composed of portal veins, hepatic arteries and sinusoids anastomosed with hepatocytes. The data indicated that the fraction of EC descended as the destruction of hepatic vessels and sinusoids during tumorigenesis. On the other hand, most EC gene signatures and the majority of genes related to VEGF signaling pathway were up-regulated in RiskHigh subgroup, which indicated that angiogenesis was promoted to supply nutrients and oxygen for tumor progression.
CD4 + Th cells play critical roles in adaptive immune responses. Th2 cells, a member of T helper cells secreting IL-4, IL-5, IL-10 and IL-13, mediate anti-in ammatory humoral response and immune suppression via inhibiting Th1 cytokine production (Zhu and Paul, 2010). Th2 cells decreased to some degree by some speci c immunotherapy, and reducing the proportion of Th2 was shown to enhance the e cacy of immunotherapy (Lewis, 2002;S. Li et al., 2020;. In our present study, the fraction of Th2 was increased in RiskHigh subgroup classi ed by the nomogram, which suggested that Th2 cells might be a promising therapeutic target of HCC. However, the mechanisms underlying the targets of Th2 and OS time of HCC awaits to be revealed. Nonetheless, there are some limits and shortages in our study. Firstly, there were only 60 cases in our external cohort (Candia et al., 2020), and the nomogram should be evaluated in other big datasets and multi-platform datasets to con rm its applicability and reliability. Moreover, the nomogram needs to be updated in the future clinical practice. Secondly, we have proved the e ciency of the nomogram by comparing ENP-and BSR-risk models, but many other models such as Weighted correlation network analysis and Multi-Omics analysis were not included in this study. Thus, some other credible models to predict prognosis for HCC may exist. Thirdly, some phenotype evidences were found here, such as hypoxia related to poor outcome and Th2 acted as the therapy target in prolonging survival of HCC, but underlying molecular mechanisms remain unclear.

Conclusions
In conclusion, we identi ed a reliable nomogram as a prognostic model for HCC, which indicates novel therapeutic targets for curing HCC. These observations highlight CCNB1, ANXA10, hypoxia, angiogenesis, EC and Th2 cells to be taken into account in future e cient treatment strategies and precision medicine for HCC. Due to the prediction e cacy of this nomogram, several potential promising advancements in treating HCC could be prospected.   to predict 1-, 3-and 5-year OS in patients with HCC created on the basis of the four independent prognostic factors. The relative expression was normalized by zscore to expand applications of the nomogram for predicting survival time to multi-platform datasets.