Transcriptomic analysis and Novel gene pair-based signatures for Hepatitis B-related hepatocellular carcinoma

Background: Hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) misses the opportunity for surgery because it is not detected early. The molecular mechanism of hepatitis B-related liver cancer needs further understanding, and effective diagnostic and prognostic models are in urgent need. Methods: Expression pro�les from the Cancer Genome Atlas (TCGA) Liver Hepatocellular Carcinoma (LIHC), GSE121248, GSE94660 GSE76724 from Gene Expression Omnibus (GEO) database were obtained. Differentially expressed genes (DEGs) between normal and tumor HBV-related HCC samples based on GSE121248 and GSE94660. Gene pairs are generated by comparing the expression levels of every two DEGs. A diagnostic signature of pairs of DEGs was built using cross-validation Lasso and Best Subset Selection regression. Hub genes and signi�cant modules were screened by Cytoscape, and potential drugs were predicted by DGIdb. A prognostic signature was established and xCell and ssGSEA were utilized to reveal the cell composition and cancer hallmarks to get an elucidation for the risk. Results: 457 DEGs were screened. A powerful diagnostic signature of 2 pairs of DEGs was built and validated in TCGA-LIHC and GEO datasets repeatedly with assured performance. 10 Hub genes were found and fostamatinib was predicted to have potential therapeutic effect on HBV-related HCC. A prognostic signature with good e�ciency (Log-rank P value<0.05, AUC=93.1%) were established, with stromal score and several hallmarks related to the risk Conclusion: Taken together, the study provided sight into the molecular mechanism as well as a novel strategy for the early diagnosis and prognosis for HBV-related HCC.


Introduction
Among all cancer-related death cases throughout the world, hepatocellular carcinoma (HCC) is the thirdcommonest cause, only after lung cancer and breast cancer [1][2][3].Although many treatments for HCC have been introduced, including radiation therapy, ethanol ablation and transcatheter arterial chemoembolization, the overall survival rate of 5-year remains unsatisfactory, ranging from 40-75% [4].
Currently, the occurrence of liver cancer is considered to be the result of multiple factors, including genetic susceptibility, environmental factors and viral infection, etc [5].One of the risk factors that has been universally accepted is Chronic hepatitis B virus infection, especially in China where HBsAg carrier ratio among patients aged 1-60 is around 7.18% [6,7].Therefore, identifying the speci c genetic characteristics and variation of patients with chronic HBV induced HCC is a key task for us.However, few studies have proposed a comprehensive molecular mechanism of hepatitis B inducing HCC.
Additionally, early diagnosis and prognosis of HBV-related HCC was important, since in the early stages of HCC, liver transplantation or liver resection can be used to remove tumor tissue to obtain a better prognosis, while these methods are not suitable for advanced-stage of HCC because of metastasis [8][9][10].However, early diagnosis and prognosis of HBV-related liver cancer is still di cult although progressions had been made.Therefore, it is critical to nd more effective biomarkers of HBV-related HCC for early diagnosis and prognosis.
With the help of high-throughput sequencing and methylation microarray technology, it has become more convenient and common to identify HCC-related differentially expressed genes (DEGs), which may help us to understand the speci c mechanisms of HCC progress and provide an effective way to detect potential targets in diagnosing and outcome prediction for HCC in early stages.In this study, we applied bioinformatics approaches to study the microarray data, RNA-Seq data imported from GEO database as well as LIHC expression pro le from TCGA database.Multiple bioinformatics analyses were performed, such as Gene Ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, PPI network construction and survival analysis.Drugs targeting on HBV-related HCC were predicted.
Innovatively, A diagnostic signature composed of gene pairs and a prognostic signature composed of 4 gene pairs were established.And the cellular composition differences and cancer hallmarks were comprehensively analyzed.
Our aim was to identify hub genes and vital pathways in chronic HBV-related HCC and then to establish an both diagnostic and prognostic models for HBV-related HCC.

Materials And Methods
Work ow of the present study was depicted in the Fig. 1.

Data Sources
GSE121248, a microarray dataset containing gene expression information of 70 chronic HBV-related HCC and 37 adjacent normal tissues, was obtained from GEO database, which we only select the paired 37 tissues.GSE94660, RNA-sequencing of 21 HCC patients' tumor and normal paired tissue was also downloaded from GEO database in the form of normalized Reads Per Kilobase per Million mapped reads (RPKM).GSE76724, a microarray dataset containing gene expression information of 115 HCC (46% with HBV infection and 54% with cirrhosis) and 52 adjacent non-tumor tissues, was also obtained from GEO.The Cancer Genome Atlas (TCGA) RNA-seq data of LIHC in the form of Fragments Per Kilobase of exon model per Million mapped fragments (FPKM) was downloaded from UCSC Xena (https://xenabrowser.net/datapages/), which was immediately converted into Transcripts Per Kilobase of exon model per Million mapped reads (TPM).Detailed clinical data of TCGA-LIHC patients were downloaded by TCGAbiolinks package, patients who carried HBV but did not have other HCC risk factors such as HCV, alcohol or non-alcoholic fatty liver disease etc. were included.Survival data was accessed from UCSC Xena.

Identi cation of Differentially expressed genes (DEGs)
Differential gene expression analysis between normal and tumor tissues of GSE121248 and GSE94660 was performed using limma package, with a paired testing applied.After obtaining the degree of difference of all genes, we used the cutoff criteria that genes satisfying |logFoldchange|≥1 and adjusted p-value < 0.05 de ned as the differential expression genes (DEGs), which were then intersected to get the common DEGs of two datasets.
2.3 Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis and single cell sGene Set Enrichment Analysis (ssGSEA).GO (www.geneontology.org) and KEGG (www.genome.jp/)pathway enrichment analyses are two common methods for large-scale or systematic functional enrichment analyses [11,12].The David Database (https://david.ncifcrf.gov/)was applied to conduct GO enrichment and the ClusterPro ler package was used to conduct KEGG enrichment; the adjusted P value < 0.05 was set and only the term whose the number of genes enriched greater than 10 were included in the results.The GO enrichment result was visualized by GOplot [13].ssGSEA were conducted by GSVA package and visualized by heatmap, with gene sets obtained from MSigDb (http://www.gsea-msigdb.org/).Besides, xCell, a novel tool for detailed cellular composition analysis used in our study, were also based on ssGSEA [14].

Least absolute shrinkage and selection operator (LASSO) regression and Best subset selection regression
LASSO regression is a biased estimate for processing multicollinearity data, which constructs a penalty function to obtain more re ned model and compresses regression coe cients to avoid over tting.Here we rst listed all the gene pairs obtained from DEGs to eliminate the batch effect: expression level of every two genes (A and B) were compared, 0 represents A lower than B and 1 represented A higher than B.
Only the pairs whose comparison was consistent in more than 80% of the samples could be included in the variable selection [15].10 folds Cross validation Lasso regression was implemented on the gene pairs by glmnet package in R. The gene pairs with coe cients not equal to zero continued to have a Best subset selection regression by bestglm package to further decrease the variates [16].Starting from the null model (M0), which has only the intercept term but no independent variables, different variates combinations are used for tting, that is, from model M1 containing 1 variate, M2 containing 2 variates, to Mp containing p vartates.Then the best model is selected (according to BIC) from the total of p + 1 models.The variates con gured by this best model are the ltered variates.Finally, gene pairs selected were included in a logistic regression.And we performed classi cation prediction using the predict function in the training set and validation set.

Construction of PPI network and identi cation of hub genes
For the purpose of exploring the protein interactions among DEGs, the PPI network was constructed with the help of STRING, a web-based tool (http://www.string-db.org).DEGs identi ed were imported into STRING and the interaction pairs between proteins were put out when their combined score > 0.7, which according to the website, had a high con dence.Afterwards, we used the Cytoscape software to conduct the visualization of the network.Furthermore, to obtain a clearer and more focused view of the PPI network, MCODE was applied to select the signi cant modules, then degree of genes were sorted by Cytohubba, a plugin in Cytoscape, and the top 10 genes were considered as hub genes.

Drug-Gene interaction prediction
DGIdb (https://www.dgidb.org/search_interactions)was used to predict the drug-gene interaction to provide potential therapies for HBV induced HCC [17].Firstly, hub genes which had a signi cant prognostic value was input into the database.The drugs output should be the ones that had been approved and had a clear pharmacological effect (inhibitor or activator).

Kaplan-Meier analysis and Cox regression
The prognostic values of hub genes in chronic HBV-related HCC patients were evaluated in the K-M plotter (http://kmplot.com/analysis/).A total of 150 hepatitis virus positive patients were analyzed for overall survival.The univariate Cox regression and multivariate Cox regression for the gene pairs were conducted in R by survival and survminer.Gene pairs which had a signi cant prognostic value (P < 0.001) rstly underwent Lasso-Cox regression and then backward stepwise regression was implemented to shrink the variates number.Then the patients' risk was calculated using predict function.A K-M analysis was done in R. At last, the ROC curve was drawn by pROC package.

Statistical analysis
Statistical analysis was completed by R 4.0.3 and other online tools.The identi cation of DEGs was based on the built-in limma package.Brie y, the genes were rst tested by paired t test and then underwent a Bayesian test.Paired t test was to compare the expression level of hub genes in TCGA database.Cross-validation Lasso regression, Best subset selection regression and Logistic regression were used to establish diagnostic model.Log rank test were applied for K-M analysis.Univariate and multivariate Cox regression were used to select prognostic pairs and establish a signature.

Identi cation of DEGs and function annotations
GSE121248, containing 37 chronic HBV-related HCC samples and matched 37 normal samples and GSE94660, which contained 21 HBV-HCC patients' non-neoplastic liver tissues and matched tumor tissues were selected to undergo differentially expressed analysis with threshold set to adjusted P value < 0.05 and |logFC|≥1.5,817 DEGs were identi ed in GSE121248 and 1857 in GSE94660.After intersection, 457 genes were regarded as common DEGs in two sets (Fig. 2a).KEGG and GO pathway enrichment analysis towards DEGs were carried out by clusterPro ler and DAVID v6.8 respectively.KEGG pathway enrichment indicated that DEGs played a role in Chemical carcinogenesis, Cell cycle, Retinol metabolism and Fatty acid degradation, etc. (Fig. 2b).And for the biological process (BP) in GO enrichment, it was shown that DEGs mainly enriched in oxidation − reduction process, cell division (mainly upregulated DEGs) and metabolic process (mainly downregulated DEGs) (Table S1, Fig. 2c).Generally, we demonstrated that the DEGs of HBC-related HCC patients mainly involved in two aspects of pathways, one is cell cycle and the other is metabolism.

A diagnostic signature composed of gene pairs for HBV-related HCC patients
To make the DEGs identi ed available for diagnostic usage, such as pathologies di cult to diagnose, we sorted to establish an effective model based on DEGs for suspicious HBV-HCC patients.In order to avoid the impact of batch effects when building and verifying the model, we adopt a novel approach that is the gene pairs to replace genes.After being intersected between two groups, a total of 25286 gene pairs were identi ed to construct the signature (With the criteria set to 0.2 between 0.8, all of these gene pairs were not consistent within samples).Cross-validation Lasso regression in GSE121248 (training cohort) identi ed 13 gene pairs with classi cation value based on the lambda.minselected (Fig. 3a, 3b, Table S2).Then the 13 gene pairs underwent best subset selection regression which shrink the variate number into 2, RGS5|FAHD2A, CXCL14|SAMD5.Next, logistic regression was applied to get the gene pairs' coe cients (Table 1).The diagnostic model was rst to predict the classi cation in the training cohort itself, which presented an excellent performance of classi cation with sensitivity, speci city both equaled to 100% (Fig. 3c, 3d).Then, the signature was applied to predict the classi cation of the rst validation cohort, GSE94660, which also showed a quite good e ciency (AUC = 90.5%,95%CI = 81.87%-99.08%)(Fig. 3e, 3f).Then, in order to verify the signature again, we applied the model on the HCC patients with HBV infection in TCGA-LIHC cohort which contained 6 tumor and 6 normal samples.The result indicated that our signature indeed had a good diagnostic effect (ROC-AUC = 100%) (Fig. 3g, 3h).
To explore if our signature had a good performance on the HCC patients but not HBV-related HCC, we then applied the model on GSE76724, which showed that the performance of the model was slightly weakened, but still in a higher quality (ROC-AUC = 86.6%,95%CI = 81.07%-92.06%)(Fig. 3i, 3j).In a word, the gene pair-based diagnostic model had a good performance both in training and validation cohort.

Construction of PPI network and identi cation of hub genes
STRING was applied to perform a PPI network of DEGs, and 308 nodes and 1943 edges were depicted in the network map.The network was then visualized by Cytoscape.MCODE was applied to identify crucial gene modules in the network.4 modules with highest score were selected (Fig. 4a).Genes in each clusters were input into string again for functional enrichment, which were shown in Table 2 (Table 2).

Steroid hormone biosynthesis
Synthesis of bile acids and bile salts via 24hydroxycholeste Subsequently, we applied Cytohubba to sort out the 10 genes assessed with highest connectivity degree.

Survival analysis of 10 hub genes individually and conjointly
We applied the K-M plotter to examine the overall survival of the 10 potential hub genes.The tool set the best cutoff value automatically according to expression level of each input gene, two groups of patients were established to calculate their overall survival curves respectively.A total of 150 Hepatitis virus positive HCC patients were analyzed for overall survival.Data showed that the higher expression of 7 genes (CDK1, CDC20, MAD2L1, CCNA2, CCNB1, AURKA, NDC80) was related to disadvantageous survival impact of HCC samples (Log-rank P < 0.05) (Fig. 6a-g).And the 7 genes underwent a gene-weighting conjoint overall survival analysis to look for their comprehensive effect on the overall survival in chronic HBV-related HCC patients, which also presented an adverse prognostic value (Upper quartile survival: 54.1 months in low expression cohort vs 13.8 months high expression cohort, log-rank P value = 0.01) (Fig. 6h).
Then, we used the DGIdb database to predict drugs targeting the 7 hub genes which was believed to have adverse prognostic effect.Based on the criteria described in Methods, only one drug, FOSTAMATINIB, was identi ed to inhibit both CDK1 and AURKA.

A prognostic signature composed of gene pairs for HBV-related HCC patients
Moreover, prognostic signatures for HBC-HCC patients were also in urgent needs.Again, gene pairs were selected to construct the signature because of the repeatability.A total of 40063 gene pairs were screened to be heterogeneous based on 442 of 457 DEGs in HBV-related TCGA-LIHC cohort (n = 95).With univariate Cox regression, 83 pairs exhibited signi cant prognostic value (P < 0.001) (Table S4).After Lasso-Cox and stepwise regression, 4 gene pairs were left to compose the model (Fig. 7a).
To evaluate the prognostic effect of the signature, 95 patients were divided into two groups according to the risk calculated by the signature.K-M analysis revealed a signi cant prognostic e ciency of OS between two groups (Log-rank p value < 0.0001, ROC-AUC = 93.1% 95%CI = 87.8%-98.5% ) (Fig. 7b, 7c).Subsequently, a nomogram was depicted to predict the 1-year and 5-year overall survival based on the signature, with a assured performance shown in the 5-year calibration curve (Fig. 7d, 7e).

Elucidation of the risk predicted by the prognostic signature
To undermine the mechanisms that determine the higher risk of HBV-HCC patients, we rstly applied xCell to unveil the cellular landscape between high and low risk groups.In the heatmap in which all of the cells had been hierarchically clustered, we observed the rst cluster in which the cells were downregulated in the high-risk group, including hematopoietic stem cells (HSCs), pericytes, megakaryocytes as well as a reduction of stromal score (Wilcoxon test, P value < 0.05) (Fig. 8a).
Subsequently, hallmarks of cancer which had been widely recognized were downloaded from MSigDb, including angiogenesis, apoptosis, cell cycle, DNA repair, EMT, hypoxia, in ammatory & immunity and metabolism pathways.ssGSEA demonstrated multiple hallmarks responded to the risk.DNA Repair, G2M Checkpoint, Myc Targets, WNT/β − Catenin Signaling and E2F Targets were found to be upregulated in the high-risk groups, and Heme Metabolism, Bile Acid Metabolism, Fatty Acid Metabolism and Xenobiotic Metabolism were found to be downregulated.(Fig. 8b).
Furthermore, to determine whether the prognostic model could be an independent prognostic factor, we rst included different demographic characteristics, such as age, pathologic stage, T of TMN staging system, which have been considered to have impact on the HCC patients (M and N staging were not included because a large proportion of missing value).Univariate Cox regression implicated that tumor stage, T staging and our signature had a prognostic impact on HBV-HCC patients, however, when it comes to multivariate Cox regression, only the model still had a signi cant prognostic impact rather than tumor stage and T staging (P < 0.001) (Fig. 8c), hinting that the gene pairs-based signature was an independent prognostic factor for HBV-related HCC patients.

Discussion
In the present study, we focused on the HBV-related HCC because it is not only one of the common subtypes of liver cancer, but also a tricky public health problem.With the development of high-throughput sequencing and microarray technology, identifying DEGs and potent pathways in hepatocellular carcinoma is becoming a routine task, which provides an effective way to explore potential targets in diagnosing and prognosing HCC in the early stages [18,19].However, only a small part of studies focused on the establishment of signatures for HBC-related HCC [20] which makes our study meaningful.Here, we excavated DEGs between HCC and adjacent normal tissues based on two sets of gene expression pro ling data.457 genes were identi ed as DEGs which underwent GO analysis, KEGG analysis.PPI network were constructed and 10 hub genes were selected to undergo survival analysis.Innovatively, A diagnostic signature composed of 2 gene pairs and a prognostic signature composed of 4 pairs were built with high e ciency.Moreover, high risk of HBV-related HCC was primarily elucidated based on cellular composition analysis and enrichment.
Firstly, combined KEGG and GO analysis, it was clearly implicated that the identi ed DEGs were signi cantly involved in both cell cycle and metabolism process.Interestingly, up-regulated genes were mainly enriched in pathways ways associated with cell cycle and down-regulated genes mainly involved in metabolism pathway.For the cell cycle, there have been some studies unveiled the relationship of Hepatitis B virus, mitosis and hepatocarcinogenesis. HBV is usually involved in the development of liver cancer through HBx, immune imbalance and viral DNA integration into the host genome [21,22].Higher HBV load may contribute chronic liver in ammation by mediating immune response of host cytotoxic T lymphocyte (CTL) to promote the destruction and regeneration of HBV infected hepatocytes, and then increase the opportunity of mitotic replication errors [23,24].The accumulation of genetic and epigenetic alterations on cell growth advantage ultimately promotes the hepatocarcinogenesis [25].HBx protein is an oncoprotein encoded by HBV, which may act as a transcription activator of oncogene or host gene to participate in cell growth regulation, DNA repair and epigenetic modi cation [26].It may also affect cell cycle, cell transformation and other signal transduction pathways, and then contribute to HCC development [27,28].HBx can inhibit the expression of Wnt/catenin, or activate the expression of HURP through p38 / MAPK pathway and SATB1 protein by interacting with p53, TGF β, Fas and TNF, leading to the accumulation of anti-apoptotic proteins, and eventually inhibit the apoptosis of hepatoma cells [22,[29][30][31][32]. On the other hand, genomic integration of HBV leads to host genome dysfunction, affecting cell proliferation, cell cycle progression, apoptosis, and even chromosome stability [33].Studies have found that recurrent integration of TERT and MLL4 is often observed in HCC [34]; HBx-LINE integration, as a function of long non-coding RNA, has been proved to promote the carcinogenesis [35].
Secondly, for the phenomenon that downregulated genes mainly participated in the metabolism pathways, it has been found that HBV infection is widely involved in the metabolic process of liver cells, thus participating in the hepatocarcinogenesis [22,36].Studies found that HBV infection leads to upregulation of glucose metabolism (such as gluconeogenesis, glucose aerobic oxidation, and pentose phosphate pathway) and lipid metabolism (such as fatty acids, phospholipids and cholesterol biosynthesis) [37,38] in liver.Recent studies on the mechanism of metabolic recombination provide a new perspective for HBV induced hepatocarcinogenesis. Studies have shown that HBx overexpression can participate in the transcriptional activation of steroid regulatory element binding protein-1 and peroxisome proliferator activated receptor (PPAR γ), leading to hepatocyte steatosis [39].On the other hand, HBx also participate in the process of glucose metabolism recombination by regulating mitochondrial autophagy through BNIP3L, which can increase the stemness of hepatoma cells [40].Some of the above studies are consistent with our study, but why most metabolic-related genes are down-regulated deserves further study.Furthermore, for the oxidation-reduction process, it is recognized that the formation of ROS is a common feature of many cellular biological functions and an important target for cytotoxic effects in the progression of HCC [41].Important ROS-related liver cancer growth signal transduction pathways include nuclear translocation of nuclear factor-Kb (NF-kB) [42].PI3K/AKT/mTOR signaling cascade is also involved in oxidative press of HCC.Its activation is regarded as a stimulator for cell growth and proliferation [43].Unlike NF-kB, the increased level of ROS suppresses the phosphorylation of AKT and mTOR, thereby enhances the apoptosis of HCC cells [44].Recent studies have reported various genes and drugs targeting the PI3K/AKT/mTOR cascade.A combination of sorafenib and C2ceramide therapy was thought to stimulate the production of ROS, activating caspases-dependent cell apoptosis via PI3K/AKT/mTOR pathway [45].CYP3A5, a member of CYP450, can induce ROS accumulation, inhibiting AKT phosphorylation at Ser473 [44].CYP450 is considered as another major source of ROS production except from NADPH, especially in liver [41].However, there are few studies associating CYP450 with the oxidation-reduction process of HCC patients.In this study, a total of 57 genes enrich in oxidation-reduction process, 16 of them belong to CYP450 family, indicating a potential target for the HCC therapy.
The diagnostic model, which was established after cross-validation Lasso regression and best subset selection regression, presented to have a good diagnostic performance both in the training and validation cohort.We thought highly of the diagnostic model constructed in the present study rstly because of the application of gene pairs, which was a good way to eliminate overall sequencing differences between datasets[46], and we used a quite serious variates selection pipeline to determine the gene pairs included in the model.Here, RGS5|FAHD2A and CXCL14|SAMD5 are two gene pairs that composed the signature.It could be conceived that clinically, the expression levels of these two pairs of genes can be compared to identify whether patients with hepatitis B have already acquired HCC.This can solve the problem that different high-throughput methods may not be suitable for model coe cients but the pity is that the number of samples in our training set is not very large.Secondly, because all possible variate combinations have been traversed in the Best subset selection, the selected features can be optimal, and additionally, cross-validation Lasso avoided the over tting.This model has shown very high e cacy in repeated validations, and we believe it can be a good method for assisting pathological diagnosis of HBV induced liver cancer.
As to the hub genes, including all hub genes were up-regulated in cancer tissues, and most of them are related to cell cycle and mitosis.CDK1 was the one with highest degree.According to previous studies, CDK1, also called cell division cycle protein 2 (CDC2), is an important regulator of the cell cycle, which is required for the transition from the G2 phase into mitosis [47,48].For HCC, several studies have suggested that CDK1 related to HCC cell proliferation [49] and regulated apoptin-related cell apoptosis [50].Interestingly, FOSTAMATINIB was predicted to have therapeutic effect for HBV-related HCC patients.
Fostamatinib, a FDA-proved tyrosine kinase (Syk) inhibitor, has been proved to be able to treat chronic immune thrombocytopenia and rheumatoid arthritis [51,52].Recent studies even hinted that it has effect on COVID-19 [53].The effect of fostamatinib on HCC was hardly reported and lacked large-scale clinical trials [54].Here, we further revealed the potential mechanism of fostamatinib treatment of HBV-reduced HCC, that is, targeting CDK1 and AURKA and breaking the disease-causing PPI network.
Furthermore, a prognostic model composed 4 gene pairs was built for HBV-HCC patients, which presented an excellent performance (AUC = 93.1%).The advantages of pairs discussed above also made sense in the prognostic model.However, we hope to have a larger cohort to establish and validate models in the future, for there are still few high-quality datasets for hepatitis B-related liver cancer, especially those with clinical follow-up data.
With xCell, we primarily demonstrated the underlying mechanism related to the rising risk of death from HBV-related HCC, which mainly points to the loss of stromal cells in the tissues.It had been reported that the score of stromal and immune cells of tumor tissue have association with its clinical characteristics [55].Tissue-based approaches were con rmed to be able to characterize the tumor in ltrates in clear cell renal cell carcinoma and primary glioblastoma multiforme, according to previous studies [56,57].A recent research showed that high-risk cohorts in HCC have higher immune, and stromal scores than that in low-risk cohorts, which was consistent with our results[58].
Further, result of ssGSEA hinted that the apart from WNT/ß−Catenin Signaling, DNA Repair and G2M Checkpoint which had been fully discussed above, Myc and E2F targets also related with the higher risk of HBV-related HCC.And metabolism pathways exhibited down-regulation in the high risk group.For it was wildly accepted that HBx and c-Myc were respectively oncogenic factors in the HBV virus and host hepatocyte, for critical interaction was con rmed between the two proteins [59].HBx-mediated Myc stabilization greatly promoted the carcinogenic effect of the virus, on account that SCF (Skp2) ubiquitin E3 ligase could intervene Myc ubiquitination that leaded to Myc oncoprotein, upon which HBx has an effect of stabilization [60,61].At the same time, by activating PIK3CA/Akt/mTOR and c-Myb/COX-2 pathways, c-Myc drove apoptosis, for which E2F1 was found acting as a weakening factor[62, 63].In RB1-altered tumors, the E2F pathway was found activated which was already reported in HCC.While E2F pathway could also activate in CCN-HCC without RB1 inactivation event which might be partly contributed by the ability of cyclin E/Cdk2 complexes to phosphorylate Rb [63].Brie y, we demonstrated that some pathways not only involved in the occurrence of HCC but also involved in the subsequent progression of the tumor.

Conclusion
This study conducted a relatively comprehensive bioinformatics analysis of hepatitis B related HCC, which revealed that DEGs participate in various pathways but mostly in cell cycle and metabolism process in HBV-HCC.And analysis of these genes pave roads for future molecular mechanism studies and provide targets for the potential therapy of HBV-related HCC in clinic.Innovatively, a diagnostic model and a prognostic model based on gene pairs were established with a good performance, and the risk was related with stromal cells and different hallmarks.Together, this study sheds lights on the prevention and treatment of HBV-related hepatocellular carcinoma.

Figures
Figures

Figure 3 A
Figure 3

Figure 4 Construction
Figure 4

Figure 7 A
Figure 7

Table 3
Validation of 10 hub genes in TCGA-LIHC 6 pairs of HBC-HCC tissues