Identification and Analysis of the Key Genes associated with the Development from Liver Cirrhosis to Hepatocellular Carcinoma Based on Bioinformatics Analysis

Background:
Hepatocellular carcinoma (HCC) is the most frequent primary liver tumor and one of the most common malignant cancers with poor prognosis. Liver cirrhosis is the major risk factor for HCC. The aim of this study was to identify potential key genes associated with the development from liver cirrhosis to HCC and to explore their potential mechanisms.

Methods:
Four microarray datasets, GSE17548, GSE63898, GSE25097 and GSE89377, were downloaded from the Gene Expression Omnibus database. A protein-protein interaction (PPI) network was constructed using the STRING database, and potential hub genes were screened using the MCODE plug-in in Cytoscape software. The Oncomine database was used to verify the expression of differentially expressed genes (DEGs) in cirrhosis and HCC. To further verify the hub genes, a hierarchical cluster between normal and HCC tissues was constructed using the UCSC Cancer Genomics Browser. The UALCAN database was used to verify the difference in hub gene expression between normal and HCC tissues and among different tumor grades. Finally, the cBioPortal online platform was used to analyze the association between the expression of hub genes and prognosis in HCC.

Results:
A total of 360 DEGs, including 280 downregulated and 80 upregulated genes, were identified. Gene ontology (GO) enrichment analysis showed that these DEGs were mainly enriched in monooxygenase activity, cofactor binding, and oxidoreductase activity (acting on the CH-OH group of donors). The mainly enriched pathways were complement and coagulation cascades, prion diseases, and arachidonic acid metabolism. By extracting key modules from the PPI network, 16 hub genes were screened out. In the hierarchical cluster of hub genes between normal and HCC tissues, the expression level of the 16 hub genes was significantly higher in HCC tissues than in normal tissues. In addition, the expression level of the hub genes was significantly associated with tumor grade. The survival analysis showed that six hub genes, KIF20A, HMMR, RRM2, TPX2, TTK and UBE2C, were closely associated with the poor prognosis of HCC.

Conclusion:
Our study discovered six novel potential genes associated with the development from liver cirrhosis to HCC. These key genes may be used as prognostic biomarkers and molecular therapeutic targets for HCC.

Background
Liver cancer was the sixth most common cancer and the fourth leading cause of cancer-related death worldwide in 2018, with about 841,000 new cases and 782,000 deaths annually [1]. Hepatocellular carcinoma (HCC) is the main type of primary liver cancer, accounting for 75%-85% of all cases [2]. Despite the continuous development of novel treatment strategies, the 5-year survival rate of HCC is still very low, especially in advanced HCC [3,4]. The main risk factors for HCC are chronic infection with hepatitis B virus or hepatitis C virus, heavy alcohol intake, and obesity [5,6]. In the process from chronic inflammation to HCC, liver cirrhosis is recognized as a major step. The incidence of HCC is markedly higher in the cirrhotic state than in the non-cirrhotic state, irrespective of the etiology of liver disease [7]. Therefore, it is important to understand the precise mechanisms involved in the development and progression from liver cirrhosis to HCC.
Studying differentially expressed genes (DEGs) across disease states can reveal the relationship between genes and disease occurrence. The development of high-throughput microarray technology has provided an efficient tool to analyze gene expression profiles, which helps us better understand the general genetic changes and potential mechanisms of tumorigenesis [8,9,10]. As more and more data are deposited in public databases, bioinformatics analysis based on these data has been widely applied to identify novel targets associated with cancer, including HCC. Many studies have focused on differences in gene expression between HCC tissues and adjacent non-tumor tissues, regardless of whether cirrhosis is present. However, only a few studies have made a comprehensive comparison between HCC and liver cirrhosis [11,12].
The aim of this study was to screen DEGs between liver cirrhosis and HCC tissue samples and to identify the key genes and pathways associated with HCC. To avoid the limitations of relying on a single dataset, four microarray datasets were used. Furthermore, the potential value of the key genes in the prognosis of HCC was evaluated. The study should help to clarify the molecular mechanisms underlying HCC carcinogenesis and progression.

Microarray data
Four gene expression profile datasets, GSE17548, GSE25097, GSE63898, and GSE89377, were obtained from the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/). The data contained 552 HCC and 241 liver cirrhosis tissue samples. Sample information for each dataset is listed in Table 1. Perl was used to combine the four gene expression datasets, and probe identifiers were converted into gene symbols. When one gene corresponded to multiple probes, the average value was taken. The R package "sva" (https://bioconductor.org/packages/release/bioc/html/sva.html) was used to normalize the merged microarray data.
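The probe-averaging rule described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the study itself used Perl and R), and the function name, probe identifiers, gene symbols and expression values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average expression across all probes that map to the same gene symbol.

    probe_values: probe ID -> list of expression values (one per sample).
    probe_to_gene: probe ID -> gene symbol; unmapped probes are skipped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe has no gene symbol annotation
        grouped[gene].append(values)
    # zip(*rows) walks the samples column by column across the probes
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


# Hypothetical data: two samples, four probes, one probe without annotation
probe_values = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 6.0],
    "probe_3": [1.0, 1.0],
    "probe_4": [9.0, 9.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

collapsed = collapse_probes(probe_values, probe_to_gene)
```

Here GENE_A receives the per-sample mean of its two probes, while the unannotated probe is dropped.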

Identification of DEGs
DEGs between HCC and liver cirrhosis samples were identified in the merged data from the four gene expression datasets. The log2 fold-change (FC) and adjusted P-values were calculated with the R package "limma" [13]. Genes that fulfilled the criteria of |log2 FC| ≥ 1 and adjusted P < 0.05 were considered statistically significant and termed DEGs.
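The selection criteria above amount to a simple two-condition filter. The following Python sketch applies them to a table of log2 FC and adjusted P-values; it is not the limma workflow itself, and the function name, gene names and values are hypothetical.

```python
def classify_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    results: gene symbol -> (log2 fold-change, adjusted P-value).
    A gene is a DEG when |log2 FC| >= lfc_cutoff and adjusted P < padj_cutoff.
    """
    up, down = [], []
    for gene, (log2fc, padj) in results.items():
        if padj < padj_cutoff and abs(log2fc) >= lfc_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Hypothetical results table (HCC versus cirrhosis)
results = {
    "GENE_UP": (1.8, 0.001),     # passes both cutoffs, upregulated
    "GENE_DOWN": (-2.3, 0.0004), # passes both cutoffs, downregulated
    "GENE_SMALL": (0.4, 0.0001), # significant but fold-change too small
    "GENE_NS": (1.5, 0.2),       # large fold-change but not significant
    "GENE_EDGE": (-1.0, 0.049),  # exactly on the fold-change boundary
}

up_genes, down_genes = classify_degs(results)
```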

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of DEGs
clusterProfiler [14] is an R software package that not only automates the process of biological term classification and gene cluster enrichment analysis, but also provides a visualization module for displaying analysis results. In the present study, the R package "clusterProfiler" was used to identify and visualize the GO terms and KEGG pathways enriched among the DEGs.
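Enrichment analysis of this kind is commonly based on a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of DEGs in a term by chance. The Python sketch below assumes that test; it is an illustration rather than the package's own code, and the function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_term, n_degs, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of annotated genes in the background.
    n_term: background genes annotated to the term or pathway.
    n_degs: number of DEGs drawn from the background.
    n_overlap: DEGs annotated to the term.
    """
    total = comb(n_background, n_degs)
    upper = min(n_term, n_degs)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_degs - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background genes, 200 in the term,
# 360 DEGs, 15 of which fall in the term
p_example = enrichment_p_value(20000, 200, 360, 15)
```

In practice the resulting P-values are corrected for multiple testing across all terms before they are reported.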

PPI network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes database (STRING) (https://www.stringdb.org/) was used to obtain protein-protein interaction (PPI) information. Interactions with a confidence score > 0.7 were considered significant. To explore the relationships between DEGs, the results were visualized using Cytoscape [15] V3.7.1. The most significant modules in the PPI network were identified using MCODE [16]. The selection criteria were as follows: MCODE score > 5, degree cut-off = 2, node score cut-off = 0.2, max depth = 100 and k-core = 2. Subsequently, KEGG and GO analyses of the genes in those modules were performed using the "clusterProfiler" R package.
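The confidence filter used to build the network can be sketched as follows. This Python example keeps only interactions above the 0.7 cutoff and counts each node's degree; it does not reproduce the MCODE clustering step, and the function name and interaction scores are made up for illustration.

```python
from collections import Counter


def build_ppi(edges, min_score=0.7):
    """Keep interactions scoring above min_score and count node degrees.

    edges: iterable of (protein_a, protein_b, confidence score in [0, 1]).
    Returns the set of undirected edges and a Counter of node degrees.
    """
    kept = set()
    for a, b, score in edges:
        if score > min_score and a != b:
            kept.add(tuple(sorted((a, b))))  # treat A-B and B-A as one edge
    degree = Counter()
    for a, b in kept:
        degree[a] += 1
        degree[b] += 1
    return kept, degree


# Hypothetical scores for a few of the gene symbols named in this study
edges = [
    ("CCNB1", "CCNB2", 0.95),
    ("CCNB1", "AURKA", 0.88),
    ("AURKA", "CCNB1", 0.88),  # same interaction listed in reverse
    ("TTK", "AURKA", 0.72),
    ("HMMR", "RRM2", 0.55),    # below the cutoff, dropped
]

network, degree = build_ppi(edges)
```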

Hub gene selection and expression analysis of hub genes in TCGA
Genes in the most significant module, with an MCODE score ≥15, were selected as hub genes for analysis. The biological processes of the hub genes were analyzed using the BiNGO [17] plug-in of Cytoscape. Differences in gene expression between cirrhosis and HCC were analyzed using the Oncomine database (https://www.oncomine.com). To further identify key genes, normal and tumor tissues were compared based on the TCGA database.

DEGs between HCC and liver cirrhosis
A total of 360 DEGs, including 80 upregulated genes and 280 downregulated genes, were obtained (Table S1). A volcano plot of the DEGs is shown in Figure 1, and a heat map is shown in Figure S1.

GO and KEGG analyses of DEGs
GO enrichment analysis showed that the DEGs were mainly enriched in cofactor binding, glycosaminoglycan binding, extracellular matrix structural constituent, oxidoreductase activity, iron ion binding, and peptidase inhibitor activity (Figure 2 and Table 2). KEGG pathway analysis showed that the DEGs were mainly concentrated in complement and coagulation cascades, prion diseases, chemical carcinogenesis, and viral protein interaction with cytokine and cytokine receptor (Figure 3 and Table 3).

PPI network construction and module analysis
To identify the core genes and crucial gene modules involved in HCC at the interaction level, Cytoscape software and the STRING database were used. Interactions with a combined score higher than 0.7 were used to construct the PPI network (Figure S2). A total of 359 of the 360 DEGs were mapped into the PPI network, which contained 359 nodes and 627 edges (Figure 4A). In addition, two significant modules (modules 1 and 2) with a score ≥8 were screened out via MCODE. Functional analyses of the genes in those modules (Figure 4B and 4C) were performed using "clusterProfiler"; the results are shown in Table S2 and Table S3. In total, 16 genes were identified as hub genes, including CDKN3, ASPM, UBE2C, CENPF, TOP2A, TPX2, CCNB2, TTK, AURKA, NUSAP1, CCNB1, CCNA2, RRM2, KIF20A, NDC80 and HMMR. The names, abbreviations and functions of these hub genes are shown in Table 4. The biological processes of the hub genes were analyzed and visualized using BiNGO; the result, shown in Figure 5, indicates that they were mainly enriched in mitotic cell cycle, cell cycle phase, M phase and nuclear division.

The expression levels of hub genes in TCGA
Analysis of the Wurmbach Liver dataset from the Oncomine database showed that the mRNA levels of 15 genes, all except CENPF, were significantly higher in HCC than in liver cirrhosis (Figure 6). The expression levels of the 16 hub genes were also compared between HCC and normal tissues using the UCSC Cancer Genomics Browser, revealing that these hub genes were highly expressed in most HCC samples (Figure 7). These results were further confirmed using the UALCAN database (Figure 8). Further subgroup analysis demonstrated that the expression levels of the 16 hub genes were significantly associated with tumor grade (Figure 9).

Kaplan-Meier survival analysis for hub genes in TCGA
OncoPrint showed that genetic alterations of the 16 hub genes were found in 174 of 371 HCC patients (47%) (Figure 10A). The proportion of alterations for individual genes varied between 4% and 21%, and ASPM had the highest level of amplification in HCC (Figure 10C). Subsequently, prognostic analyses of the 16 hub genes were performed in the HCC datasets of the cBioPortal online platform. The results showed that HCC patients with CCNB1, CCNB2, HMMR, KIF20A, NDC80, RRM2, TPX2, TTK and UBE2C alterations had worse overall survival (Figure 11A), whereas patients with KIF20A, RRM2, TPX2, TTK, UBE2C, HMMR, ASPM and NUSAP1 alterations exhibited worse disease-free survival (Figure 11B).
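For readers unfamiliar with the Kaplan-Meier method named in this section, the estimator behind such survival curves can be sketched in a few lines of Python. This is a generic illustration and not the cBioPortal implementation; the function name, follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (time, estimated survival probability).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # events and censored patients leave the risk set
    return curve


# Hypothetical follow-up (months) for five patients; 0 marks censoring
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
```

Comparing two such curves, for example patients with and without alterations in a hub gene, is typically done with a log-rank test.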

Discussion
The occurrence of HCC is an extremely complicated process. Liver cirrhosis is present in 80%-90% of HCC patients and represents a relevant risk for the development of HCC [18]. However, the molecular mechanisms underlying the progression from liver cirrhosis to HCC remain unclear. In the present study, we performed a series of bioinformatics analyses to screen key genes and pathways closely related to HCC using four GEO datasets. Although two previous studies identified DEGs between HCC and cirrhosis [11,12], each used only one dataset with a small sample size. The present study used a joint analysis of multiple microarray datasets to provide a more reliable and accurate assessment. Our study identified a total of 360 DEGs, including 80 upregulated and 280 downregulated genes. Furthermore, GO analysis showed that the DEGs were enriched in terms including oxidoreductase activity (acting on the CH-OH group of donors), oxidoreductase activity (acting on paired donors, with incorporation or reduction of molecular oxygen), iron ion binding, and peptidase inhibitor activity. KEGG pathway analysis revealed that the DEGs were enriched in multiple pathways, including complement and coagulation cascades, prion diseases, chemical carcinogenesis, and viral protein interaction with cytokine and cytokine receptor. These results suggest that these pathways may be involved in the development of HCC. Sixteen genes were considered hub genes. Furthermore, the expression of these genes was positively correlated with tumor grade. Survival analysis based on the cBioPortal database showed that high expression of KIF20A, HMMR, RRM2, TPX2, TTK and UBE2C was associated with worse overall survival and disease-free survival. The results suggest that these genes may be involved in HCC pathogenesis and provide clues for future investigation of the molecular mechanism.
KIF20A is a member of the kinesin protein family and participates in spindle assembly during mitosis [19,20]. Numerous studies have shown that KIF20A promotes the proliferation, invasion and migration of cancer cells, and that high expression of KIF20A is significantly related to the occurrence, migration and prognosis of various tumors, such as glioma [21], gastric cancer [22], ovarian cancer [23], and pancreatic cancer [24]. The prognostic value of KIF20A for HCC has also been demonstrated [25]. Mechanistically, KIF20A was found to serve as a novel downstream target of glioma-associated oncogene 2 (Gli2), a major transcriptional regulator of hedgehog (Hh) signaling. Gli2 can directly activate the transcription of FoxM1 in response to Hh signaling, which in turn increases KIF20A expression by activating the FoxM1-MMB complex [26]. Therefore, KIF20A could have potential as a therapeutic target for HCC.
RRM2 is an essential enzyme in DNA synthesis and replication. Many studies have shown that high expression of RRM2 significantly promotes the growth, invasion and resistance of cancer cells [27]. There is no direct evidence demonstrating the role of RRM2 in the pathogenesis and progression of HCC. However, a study by Ricardo-Lax et al. revealed that RRM2 is essential for HBV replication [28]. HBV induced RRM2 expression by exploiting the Chk1-E2F1 axis of the DNA damage response pathway. Considering that chronic HBV infection is a key factor for HCC, inhibition of RRM2 may have therapeutic value for HBV-related HCC.
TPX2 levels are elevated in many cancers, and TPX2 has been proposed as a biomarker and effector of cancer progression. TPX2 can disrupt DNA damage responses and promote cancer pathology through the regulation of Ser-139-phosphorylated Histone 2AX signals [29]. In HCC cells, TPX2 expression is associated with proliferation, apoptosis and EMT [30]. Recent research suggests that TPX2 expression is positively correlated with MMP2 and MMP9 in HCC tissues [31]. Down-regulation of TPX2 can result in inactivation of AKT signaling and decreased expression of MMP2 and MMP9, which reduces the migration and invasion ability of HCC cells.
TTK, a dual-specificity protein kinase, redirects several key proteins to kinetochores and controls mitotic spindle checkpoint [32] . It has been found to be aberrantly overexpressed in a wide range of human tumors. In HCC, increased TTK expression contributes to HCC tumorigenesis via promoting cell proliferation and migration. Mechanistic studies reveal that TTK stimulates the malignancy of HCC cells through the activation of Akt/mTOR and MDM2/p53 signaling pathways [33] .
UBE2C encodes a member of the E2 ubiquitin-conjugating enzyme family and is required for the destruction of mitotic cyclins and for cell cycle progression. UBE2C is nearly undetectable in normal tissues, but it is upregulated in some human cancers, such as lung [34], colon [35], and breast cancer [36] and nasopharyngeal carcinoma [37]. A recent study [38] has shown that UBE2C is a transcriptional target of FOXM1, a master regulator of cell cycle progression. The transcriptional activation of UBE2C by FOXM1 leads to an increased level of UBE2C protein, and thereby contributes to the loss of G2/M checkpoint control and to cell proliferation. Cells overexpressing UBE2C display inactivation of the mitotic spindle checkpoint and lose genomic stability [39].
HMMR is a multifunctional oncogenic protein that participates in cell division. High expression of HMMR may be related to cancer progression and prognosis. In breast cancer, it promotes cancer cell migration and invasion [40]. However, few studies have examined its role in HCC. The results of our study showed that HMMR was overexpressed in HCC patients, and that increased expression was significantly associated with overall survival and disease-free survival.
In addition to the genes discussed above, five genes were found to be associated with only overall survival or only disease-free survival. CCNB1 and CCNB2 belong to the cyclin family, whose members form complexes with cyclin-dependent kinases to regulate cell-cycle progression [41]. Abnormalities of CCNB1 and CCNB2 could lead to the development of malignant tumors. NDC80 is a core component of the outer kinetochore and a mitotic regulator. It has been shown that NDC80 overexpression contributes to HCC progression via the inhibition of apoptosis and cell cycle arrest [42]. ASPM is essential for mitotic spindle function during cell replication. Overexpressed ASPM has been found to be associated with tumor progression, early tumor recurrence, and poor prognosis in human HCC [43]. NUSAP1 is a microtubule-binding protein implicated in spindle stability and chromosome segregation. Many studies implicate a crucial role for NUSAP1 in regulating mitotic processes [44,45]. The overexpression of NUSAP1 is often associated with tumor recurrence and metastasis.
Among the other five genes, CCNA2 also belongs to the cyclin family. However, unlike CCNB1 and CCNB2, CCNA2 was upregulated in HCC patients but was not associated with survival. CDKN3 is a cell-cycle regulator, but it displays a tumor-suppressive or oncogenic role depending on the molecular background of different cancer types. In HCC, CDKN3 is commonly overexpressed and associated with poor outcome [46]. Both CENPF and TOP2A play important roles in chromosomal segregation during mitosis [47]. CENPF has been reported to interact synergistically with FOXM1 to promote tumor growth [48]. It has been reported that the overexpression of CENPF and TOP2A is associated with poor prognosis in cancers, including HCC [49,50]. AURKA has been confirmed as an oncogene that promotes tumor development through a variety of biological functions, including cell proliferation, migration, invasion, EMT and cancer stem cell behaviors [51,52].

Conclusion
In the present study, we performed an integrated bioinformatics analysis to identify key genes and pathways involved in the development from liver cirrhosis to HCC. Sixteen genes were considered as hub genes. The survival analysis showed that the overexpression of 6 hub genes (KIF20A, HMMR, RRM2, TPX2, TTK and UBE2C) was closely associated with the poor prognosis of HCC. Our results provide new insight into understanding the molecular mechanism underlying HCC carcinogenesis. Further experimental validation should be carried out to confirm these findings in the future.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
TYL and HMY contributed to the study concept and design, the acquisition, analysis, and interpretation of data, and the drafting of the manuscript. DF contributed to the data collection. All authors read and approved the final manuscript.

Figure 5
The biological process analysis of hub genes was constructed using BiNGO. The color depth of nodes refers to the corrected P-value of ontologies. The size of nodes refers to the number of genes involved in the ontologies. P<0.001 was considered statistically significant.

Figure 6
Comparison of hub gene expression between HCC and cirrhosis tissues. mRNA expression of hub genes in HCC and cirrhosis tissues in the Wurmbach Liver dataset from the Oncomine database. P<0.001 was considered statistically significant.

Figure 7
Hierarchical clustering of hub genes was constructed using UCSC. Up-regulation of genes was marked in red and down-regulation of genes was marked in blue.

Figure 8
The expression of hub genes in HCC and normal tissues was analyzed using the UALCAN online database. All hub genes were differentially expressed. P<0.01 was considered statistically significant.

Figure 9
The expression of hub genes in normal tissues and in HCC tissues of different tumor grades was analyzed using the UALCAN online database.