Identification of potential biomarkers in hepatocellular carcinoma: a network-based approach

Background: Hepatocellular carcinoma (HCC) is one of the leading causes of death worldwide. Identification of potential therapeutic and diagnostic biomarkers can be helpful to screen cancer progress. This study implemented with the aim of discovering potential biomarkers for HCC within a network-based approach integrated with microarray data. Methods: Through downloading a gene expression profile GSE62232 differentially expressed genes (DEGs) were identified. Gene ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis for DEGs were performed utilizing enrichr server. Following reconstruction of protein-protein interaction network of DEGs with STRING, network visualization, analyses, and clustering into structural modules carried out using Cytoscape. Considering degree centrality, 15 hub genes were selected as early biomarker candidates for final validation. In order to validate hub genes, GEPIA server was used to perform overall survival (OS) and disease-free survival (DFS).


Introduction
Hepatocellular carcinoma (HCC), a predominant primary liver cancer, is the sixth leading causes of death by cancer (1). A multi-step mechanism consists of an accumulation of gene alterations that is resulted in various molecular and cellular modifications is the main cause of HCC pathogenesis (2). Recent studies, have indicated various genes such as epidermal growth factor receptor (EGFR) (3), transforming growth factor-beta 1 (TGF-β1) (4), c-myc (5), to name but few involve in tumorigenesis and progress of HCC. However, there is still a lack of a specific biomarker for precise diagnostic of HCC (6). Therefore, finding an accurate biomarker for screening HCC at different stages with high specificity and sensitivity is an urgent need.
Data obtained from microarray technology in combination with bioinformatics approaches provides a prominent strategy at molecular level for an extensive studying of dysregulation in gene expression between cancer and normal samples from patients. A large-scale microarray data has been published in different databases through recent years and utilizing these data to perform integrated analysis is valuable strategy to follow up the cancer progression in the patients. Despite all efforts in finding different gene alterations in HCC, the molecular mechanisms of HCC progression remain unclear. Consequently, analyzing microarray data to obtain differentially expressed genes (DEGs) between tumor and normal samples is a promising method to discover hub genes and key pathways, which results in HCC tumorigenesis.
In this study, expression profile of hepatocellular carcinoma was obtained from the gene expression omnibus (GEO) database for identification of hub genes in HCC. Following the DEGs identification, they were utilized for gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) enrichment analysis, and protein-protein interaction (PPI) network reconstruction studies. Ultimately, survival analyses were performed using GEPIA to evaluate the prognostic value of each hub genes. CDK1, CCNB1, CCNA2, CDC20, AURKA, MAD2L1, TOP2A, KIF11, BUB1B, TYMS, EZH2, and BUB1 were considered as our suggested biomarkers.

Data selection and preliminary analysis
A gene expression profile of hepatocellular carcinoma (GSE62232) was obtained from gene expression omnibus (GEO) which is a repository of transcriptional and expression data (7). Our selected dataset contained samples form patients with age from 21 to 82 years old and different etiological factors. 14 tumor samples were from female patients while 67 belonged to male; moreover, 10 normal samples were used as control. Normalization and expression calculation of downloaded CELL files were implemented with transcriptome analysis console (TAC) through summarization using RMA method additionally eBayes was applied as ANOVA method. To determine differentially expressed genes (DEGs) a threshold value of fold change > 2 or -2 < and p-value <0.05 were chosen.

Gene ontology (GO) term and KEGG pathway enrichment analysis of DEGs
Following DEGs identification, gene ontology (GO) term and kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analysis were performed utilizing Enrichr (8,9) which is available at https://amp.pharm.mssm.edu/Enrichr. GO (10,11) is capable of determine details about three functional features of a genes set, including biological process (BP), molecular function (MF), and cellular component (CC). KEGG (12) is a part of Japanese GenomeNet service and is available at https://www.genome.jp/kegg/. It can reveal the relationships between biological pathways related to disease and drugs. P < 0.05 is considered cutoff criteria to obtain significant result for GO and KEGG enrichment analysis.

PPI-network of DEGs and module analysis
To discover the connection between differentially expressed genes, a protein-protein interaction (PPI) network was reconstructed utilizing STRING v11.0 (13) at https://string-db.org/. Cytoscape 3.6.0 (14) was used for network visualization and analysis. Clustering network into structural modules was performed utilizing MCODE app (15) based on degree cutoff = 2, node score cutoff = 0.2, K-core = 2, and max-depth = 100. GO term and KEGG pathway enrichment analysis of top three modules were carried out with Enrichr software.

Hub genes PPI network and enrichment analysis
Based on highest degree centrality, top 15 nodes from PPI-network of DEGs were considered as hub genes. Selected hub genes were inputted into STRING to analysis their relationships. Furthermore, GO term and KEGG pathway enrichment analysis was also performed for hub genes.

Survival analysis of hub genes
In order to validate the selected hub genes, gene expression profiling interactive analysis (GEPIA) which is available at http://gepia.cancerpku.cn were employed to investigate their potential to choose as biomarkers. Therefore, in patients with liver hepatocellular carcinoma (LIHC) overall survival (OS) and disease-free survival (DFS) related to hub genes were examined.

Differentially expressed genes in hepatocellular carcinoma
After downloading a microarray dataset of HCC (array type: HG-U133_Plus_2) containing 81 carcinoma and 10 normal samples, the sample normalization and comparison analysis was performed between two groups (shown in figure 1), resulted in identifying 1996 DEGs. Among identified DEGs, 995, and 1001 genes have been up-regulated and down-regulated, respectively; moreover, volcano plot and scatter plot were applied for visualization of DEGs between carcinoma and normal groups (shown in figure 2). Each of top 20 up-and down-regulated DEGS are listed in table 1, additionally expression pattern of DEGs were revealed by performing hierarchical clustering analysis (shown in figure 3).

Gene ontology (GO) term and KEGG pathway enrichment analysis of DEGs
The enrichr GO term and KEGG pathway enrichment analysis were performed for identified DEGs. In regard to GO biological process results, DEGs are related to 'epoxygenase P450 pathway', 'alpha-amino acid catabolic process', 'arachidonic acid metabolic process', 'steroid metabolic process', 'monocarboxylic acid metabolic process', 'organic cyclic compound catabolic process', 'exogenous drug catabolic process', 'cellular amino acid biosynthetic process', 'monocarboxylic acid biosynthetic process', and 'drug catabolic process'.

PPI-network reconstruction of DEGs and network clustering
PPI network of DEGs were reconstructed utilizing STRING server. Cytoscape version 3.6.0 was used to visualize, analysis and clustering network into structural modules. Total nodes were 1236 and total edges were 11624 in the constructed PPI network. 594 nodes were belonging to upregulated DEGs and 562 nodes were related to down-regulated DEGs. Top 3 structural modules are shown in Table 4. GO term (Table 5) and KEGG pathway enrichment analysis (Table 6) were performed for each module. PPI network of DEGs and top 3 modules are represented in Figure 5.

Survival analysis of hub genes
Each of hub genes was submitted to GEPIA in order to obtain survival plots.

Discussion
Hepatocellular carcinoma (HCC) is one of the leading causes of death by cancer (16). Due to the important role of biomarkers in diagnosis of HCC, various biomarkers have been introduced in the recent years (17). Despite all efforts in developing diagnosis methods and treatment strategy for HCC patients, there is still shortcomings in this area (18). Consequently, biomarkers with high sensitivity and specificity need in precise detection of HCC.
Integration of network-based approach with microarray has been resulted in the emergence of a robust strategy to find potential biomarkers in various cancers specially HCC (19 CDK1 (Cyclin Dependent Kinase 1) gene belongs to serine/threonine protein kinase family (20).
CCNB1 encodes G2/mitotic-specific cyclin-B1 and based on our result its up-regulation is related with low survival in HCC patients. The potential role of CDK1 and CCNB1 inhibition in increasing the efficacy of HCC treatment have also been investigated (21,22). CCNA2 encodes Cyclin A2 and the regulation of CCNA2 using RNA interference has been shown recently (23). A protein produced from CDC20 (Cell Division Cycle 20 interacts with anaphase-promoting complex/cyclosome (APC/C) in the cell cycle. Up-regulation of CDC20 has been reported in various cancers, including oral squamous cell carcinoma (24), and gastric cancer (25).
Additionally, the effects of CDC20 up-regulation in the progression of HCC have been examined (26). Overexpression of BUB1B (also termed BUBR1) which encodes mitotic checkpoint serine/threonine-protein kinase BUB1 beta are associated with poor prognosis in HCC (27). BUB1 encodes mitotic checkpoint serine/threonine-protein kinase and recently its role to maintain breast cancer stem cell has been studied (28). MAD2L1 encodes mitotic spindle assembly checkpoint protein MAD2A, its suppression using RNA interferences has been shown to control the proliferation and metastasis in HCC (29). CDK1, CCNB1, CCNA2, CDC20, MAD2L1, BUB1B, and BUB1 are enriching cell cycle. Interestingly, cell cycle is among top KEGG pathways which is enriched with both module 1 and DEGs.
AURKA encodes Aurora Kinase A protein and due to our findings, AURKA enriches progesterone-mediated oocyte maturation and oocyte meiosis in KEGG pathways. A study suggests that variation in AURKA gene is important to predict early-stage of HCC, and it is a reliable biomarker for HCC (30). DNA topoisomerase 2-alpha is a protein produced from TOP2A; it has been suggested that TOP2A overexpression in HCC patients is a potential candidate for therapeutic purposes (31) and based on GO molecular function it is related to protein kinase binding. KIF11 encodes Kinesin-like protein KIF11 and due to results of GO cellular component it is related to spindle and mitotic spindle. Recently, a study demonstrated that kinesin family members, including KIF11 are potential markers to predict poor prognosis and cell proliferation in HCC (32). TYMS encode thymidylate synthetase protein and through a study it was shown that TYMS related polymorphisms are valuable factors in predicting clinical outcomes of HCC (33).
EZH2 encode Histone-lysine N-methyltransferase protein and considering GO cellular component enrichment analysis, it is related to nuclear chromosome part. It has shown that EZH2 through modulation of miR-22/galectin-9 axis can cause progression of HCC (34).

Conclusion
In conclusion, our findings alongside recent studies validate the importance of CDK1, CCNB1, CCNA2, CDC20, AURKA, MAD2L1, TOP2A, KIF11, BUB1B, TYMS, EZH2, BUB1 to be considered as potential biomarkers in hepatocellular carcinoma. Based on our findings, we demonstrate that network-based approach integrated with microarray data is a promising method to obtain potential therapeutic and diagnostic biomarkers in various cancers, in our case, hepatocellular carcinoma. However, further experimental analysis is needed to confirm our findings.