Screening and Identication of Signicant Genes in Hepatocellular Carcinoma via Bioinformatical Analysis and Validation in Clinical Tissue

Background: To screen out signicant genes associated with occurrence and development of hepatocellular carcinoma (HCC) via bioinformatical analysis and validation using clinical specimens. Methods: Gene expression chips were obtained from GEO database, differentially expressed genes (DEGs) between HCC and para-cancerous tissues were identied by GEO2R and Venn diagrams. What’s more, Gene Ontology (GO) function analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of DEGs were carried out by DAVID. The protein-protein interaction (PPI) network and module analysis of DEGs were performed by STRING and Cytoscape to get hub genes. Subsequently, the inuence of hub genes on overall survival and the expression levels were determined with Ualcan and GEPIA, and found the pathway via re-analysis of DAVID. Besides, immunohistochemistry staining were performed to verify the key genes, and follow-up results including prognoses and the clinicopathological features were statistically analyzed. Results: 49 up-regulated DEGs and 122 down-regulated DEGs were selected. There were to tally 33 Hub genes were screened out, while 28 of them were related to prognosis and high expression in HCC. Furthermore, CDK1 and RRM2 were signicantly enriched in p53 signaling pathway. Meanwhile, CDK1, RRM2 were highly expressed in HCC tissues by immunohistochemistry staining. Additionally, CDK1 and RRM2 were negatively correlated with overall survival. Tumor size and AFP were signicant prognostic factors, while CDK1 and RRM2 were independent prognostic factors. Conclusion: This study conrmed CDK1 and RRM2 as the key genes in HCC which could be potential biological target for diagnosis and treatment.

Background HCC (hepatocellular carcinoma) is the most common malignant tumor of digestive system featured with high mortality and morbidity (1). At present, a lot progress has been achieved with the usual treatment strategies, such as surgical resection, transplantation, chemotherapy, radio frequency ablation and arterial embolization.
Unfortunately, these strategies would often bring unfavorable side effects resulting poort prognosis of HCC.
Targeted medicine therapy has an essential effect on prognosis that it provides a supplement and innovative option to traditional treatments (2). However, resistance of targeted drugs has been gradually increasing with emergence of target mutations and escapes. In addition, the mechanism of HCC occurrence and development has not been fully elucidated based on current researches, thus discovery of new targets would be extremely di cult and important(3) as it plays an important role in early diagnosis and personalized precision treatment of HCC (4).
Gene pro le, as an e cient and reliable technology of gene detection, has been widely used that large amount of gene detection data has been accumulated, stored and shared in many public databases for further analyses and mining leading to the emergence and development of bioinformatics (5). Especially in recent years, a wealth of data have been yielded from accumulating researches on the genes for liver cancer. However, due to heterogeneity of samples, the existing data is usually biased and mixed and couldn't comprehensively and objectively re ect the actual gene expression during independent gene sequencing research. Besides, evidences for direct and reliable gene veri cation are insu cient in most of current studies. Therefor, joint analysis of multiple samples and veri cation using clinical samples is considered as an effective way to nd new target genes (6,7).
In this study, we analyzed four original microarray datasets and included 336 HCC cases and 72 para-cancerous cases data. Key genes were selected and veri ed in clinical samples by immunohistochemistry. 2.3 Data processing and screening of DEGs GEO2R(http://www.ncbi.nlm.nih.gov/geo/geo2r) is an important online tool in NCBI-GEO for comparing two or more datasets to identify DEGs. We compared cancer tissue and para-cancerous tissue DEGs in gene pro le datasets GSE101685, GSE112791, GSE62232 and GSE45267 respectively by GEO2R. De ned those with logFC (fold change)>2 and adj. P-value < 0.05 as Up-regulated DEGs and those with logFC (fold change)<-2 and adj. P-value < 0.05 as Down-regulated DEGs. Then Venn online software was used to screen the co-expressed DEGs in the four pro les.

Gene ontology and KEGG pathway enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID,http://david.ncifcrf.gov) (version 6.8) (8) is an integrated biological knowledgebase and analytic tool aimed at systematically extracting biological meaning from large gene/protein lists. Gene ontology(9) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. KEGG(10) is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. In this study, we selected and analyzed the biological processes (BP), molecular function (MF), cell component (CC) and KEGG pathway enrichment of DEGs in David website. P-Value<0.05 was considered statistically signi cant.
2.5 PPI network and module analysis STRING(http://string-db.org) (version 11.0)(11) online database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Cytoscape (12) is network biology analysis and visualization tools. MCODE is an APP of Cytoscape for nding densely connected regions from PPI networks. In this study, PPI network of DEGs was drawn by STRING and Cytoscape, the most signi cant hub module in the PPI networks was identi ed by MCODE.
2.6 Survival analysis and RNA sequencing expression of Hub genes: UALCAN(13) is an interactive web-portal and user-friendly tool to perform in-depth analyses of TCGA gene expression data. GEPIA (Gene Expression Pro ling Interactive Analysis)(http://gepia.cancer-pku.cn) (14) web server is a valuable resource for gene expression analysis based on tumor and normal samples from the TCGA and the GTEx databases. We used UALCAN to obtain the survival analysis results of hub genes. And the RNA sequencing expressions of Hub genes in cancer tissues and paracarcinoma tissue were analyzed with GEPIA. P-Value<0.05 was considered statistically signi cant.
2.7 Re-analysis core genes via KEGG pathway enrichment: Hub genes associated with poor prognosis and high expressions in cancer tissues were selected and named as core genes, which were identi ed by KEGG pathway enrichment analysis with DAVID.

Immunohistochemical analysis
Immunohistochemical(IHC) staining was used to detect the expression and localization of CDK1 and RRM2 in 71 HCC tissues and matched para-cancer tissues. Then, to follow up the 5-year overall survival rate, and analysis of the correlation between age, gender, tumor size, AFP, AST, ALT, etc. and key genes.
Para n section was depara ned in xylene for 30 min three times at 37℃, hydrated in a series of 100, 95, 90, 85and 80% ethanol solutions, oxidize in 3% hydrogen peroxide solution for 30 min and washed in phosphatebuffered saline (PBS) for 2 min six times. The antigen was recovered in boiling sodium citrate buffer (10mmol/L, pH=6.0) for 5 min. Then the sections were cooled down to room temperature, BSA blocks unwanted antigenic sites for 30 min at 37℃, incubated with anti-CDK1 and anti-RRM2 monoclonal antibody (dilution 1:100) overnight at 4˚C. After washing with PBS, the slides were incubated with SABC for 30 min at 37˚C, DAB staining solution was used to dye the slides for 5 min and hematoxylin for counterstaining the nuclei for 1 min. The sections were then dehydrated in ethanol, coverslips were placed on the slides. Evaluation of immunohistochemical staining: staining included the intensity of staining (scored as: 0, no staining; 1, weak staining; 2, moderate staining; and 3, strong staining) and the percentage of positive tumor cells (scored as: 0, <5%; 1, 5-25%; 2, 26-50%; 3, 51-75%; and 4, 76-100%). Staining intensity and frequency were transformed into a Composite Expression Score (CES) (The formula: CES = Intensity×Frequency). The range of CES was from 0 to 12, according to the score de nition that: 0-6, low expression; 7-12 high expression. These scores were independently determined by three senior pathologists.

Statistical analysis:
Data and gures were processed via GraphPad Prism7.0 software and IBM SPSS25. Comparisons of CDK1, RRM2 gene expression were performed with independent samples t-test, or using one-way of ANOVA and Bonferroni's multiple comparison tests. Correlation between the CDK1, RRM2 expression and clinicopathological features was detected with chi-square test. Kaplan-Meier and Cox regression model analyze are used to analysis the associated overall survival and clinicopathological factors. p-value< 0.05 was considered statistically signi cant.

Identi cation of DEGs in HCC:
Total of 171 DEGs were screened out from GSE101685, GSE112791, GSE62232 and GSE45267 gene chips by online GEO2R tool and Venn software, which including 49 up-regulated genes and 122 down-regulated ( Figure 1A).

DEGs gene ontology and KEGG pathway enrichment
We list the top ve GO enrichment in P-Value order for the 171 DEGs identi ed by DAVID software. Table 2 demonstrates the results of GO enrichment analysis (P-Value<0.05, FDR<0.05, Table 2

PPI network and modular analysis:
We performed PPI diagram of 171 DEGs by online STRING database and Cytoscape. 132 out of 171 DEGs were identi ed from PPI, including 44 up-regulated genes and 88 down-regulated genes. And there were 132 nodes and 775 edges( Figure 1B). Then we use MCOD to screen out the hub modular from PPI, there were 33 nodes and 505 edges(MCODE scores=31.562, Degree Cutoff=2, Node Score Cutoff=0.2, k-Core=2 and Max. Depth=100.), and there are 33 hub genes( Figure 1B).

Survival analysis and RNA sequencing expression of hub genes:
29 genes were found to be able to cause poor overall survival(OS) in HCC patients(P<0.05) among the 33 hub genes uploaded to UALCAN website. Subsequently, 28 out of 29 genes were proved by GEPIA to have signi cant higher expression levels in HCC tissues than that in adjacent tissues (P<0.05)(|Log2FC| Cutoff=1; p-value Cutoff=0.01; Jitter Size=0.4)( Table 3).  We found that CDK1 and RRM2 were expressed in the nucleus and cytoplasm, and much more highly expressed in HCC tissues than in para-cancer tissues(p<0.05) (Figure 2A). Expectedly, patients have poor overall survival (OS) with high expression of CDK1 or RRM2( Figure 2B)(p<0.05).

Clinicopathological features of HCC patients:
Our analysis showed that high expression of CDK1 was strongly correlated with tumor size and AFP (p=0.025, 0.020), while RRM2 with age, tumor size and AFP (p=0.008, 0.005, 0.002). But either of which were not correlated with gender, HbsAg, cirrhosis, drinking, AST or ALT (Table 4). Kaplan-Meier and Cox regression model analyze showed that CDK1 expression, RRM2 expression, tumor size and AFP were signi cant prognostic factors(P<0.05) ( Table 5). Furthermore, we found that CDK1 expression and RRM2 expression were independent prognostic factors in patients with HCC(P<0.05)( Table 5).

Discussion
Numerous studies have been conducted to reveal the causes and underlying mechanisms of HCC formation and progression. Unfortunately, the prognosis still remains poor due to strong invasion and metastasis. And the commonly adapted treatments could not bring satisfactory overall e cacy. Early diagnosis and molecular targeted therapy based on mechanism researches are expected to improve the therapeutic effect of HCC treatment. Gene chip analysis and bioinformatics are the important and widely used ways to screen molecular target on a comprehensive scale and in an e cient and accurate manner. . This study integrated four gene expression pro les from GEO, identi ed 171 common DEGs. CCNB1, CDK1, AD2L1, BUB1B, CCNA2, AURKA, and RRM2 key genes were identi ed with a series of bioinformatics analyses to be associated with cell cycle, progesterone-mediated oocyte maturation, Oocyte meiosis and p53 signaling pathway. Moreover, we found higher expressions of CDK1 and RRM2 in HCC tissues by examining 71 paired clinical tissue samples using imunohistochemistry assay. Subsequently, CDK1 expression and RRM2 expression were identi ed as independent prognostic factors in HCC by Cox regression model analyze(P<0.05).
Interestingly, all the seven key genes screened out were constantly up-regulated in HCC and para-cancer tissues, which indicates that partial of up-regulated genes in HCC cells are responsible for occurrence and development of HCC. Accumulated researches have recognized the main factors for HCC including gene mutation, environment changes, abnormal regulation, and etc. And a lot of research indicated that HCC is caused by overexpression of proto-oncogenes (15,16). Thus it is hypothesized that HCC may promote abnormal expressions of certain oncogenes to cause production of cancerous cells. So, the proto-oncogenes should be the focus for studying on cancer prevention and treatment, especially the newly revealed 7 key genes.
Cyclin-dependent kinase 1(CDK1) was enriched in the above four pathways and governed the phosphorylation of factor 4E-binding protein 1 (4E-BP1), which is important in tumorigenesis (17). In addition, studies have shown that CDK1/EMT process promotes tumor progression via CDK1/PDK1 /β-Catenin pathways (18) and regulates tumorigenesis in tumor stem cells of liver cancer. CDK1/CCNB1 also regulating the apoptosis, invasion and cell cycle of HCC cells by blocking the p53 signaling pathway via Modulating CDK1/CCNB1 Axis (19). Therefore, CDK1 is closely associated with occurrence and development of HCC. In our study, the results of immunohistochemical staining showed that CDK1 expression in HCC was signi cantly increased. Moreover, results of both Kaplan-Meier and Cox regression model analyze demonstrated that CDK1 was an independent unfavorable predictor of OS in HCC patients. These results made us convinced of the reliability of CDK1 as a biological target of HCC, and proved that the bioinformatics analysis method in this study could accurately screen out cancer-related key genes.
Ribonucleotide reductase M2 (RRM2), a critical rate-limiting enzyme for DNA synthesis and repair, plays an important role in cell division, proliferation and differentiation by promoting cells proliferation and inhibiting cellular apoptosis. Host RRM2 and NS5B can inhibit RNA replication of hepatitis B virus, thereby inhibiting the conversion of hepatitis B to liver cancer (20). RRM2 may be a key gene for liver cancer transformation from cirrhosis via P53 pathway(21), Gao(22) suggested in his study that RRM2 was a candidate target for HCC therapy as suppression of RRM2 markedly suppressed proliferation of HCC cells and RRM2 expression was higher in HCC than in non-HCC tissue. Wang Y(23) reported that SCYL1-BP1 could affect the cell cycle through increasing steady state levels of RRM2 proteins. And in our study, we con rmed the RRM2 was highly expressed in HCC tissues through immunohistochemical staining and could be an independent prognostic factors in HCC patients as CDK1.
CDK1 and RRM2 were identi ed to be core genes in colorectal cancer by Ding X(24) in their recent study using integrated bioinformatics analysis, which is constant with the results obtained by Ding YG(25) in anaplastic thyroid cancer, and by Zhou Y(26) in Glioblastoma. In addition, CDK1 and RRM2 were proved by Zhou Z (27) and Wu M(28) as key genes in HCC and to be potential new biological targets by bioinformatics in HCC. However, laboratory veri cation and further mechanism research of CDK1 and RRM2 were not performed in all the mentioned studies based on bioinformatics,. What's more, both CDK1 and RRM2 were enriched in the P53 signaling pathway that signi cantly contributing to reproduction, suppression of tumor expression and cell cycle regulation (29,30). The research on targeted therapy of P53 signaling pathway has a considerable future(31,32).
Jin J(33) proved that LINC00346 could regulate apoptosis, invasion and cell cycle of HCC cells by promoting CDK1/CCNB1 expressions via indirectly blocking p53 signaling pathway. So far, the effect of gene therapy targeting RRM2 has been proven in many different types of tumors (26,34). However, there are only few extensive researches on its effect in HCC and RRM2 as a feasible therapeutic option has not been fully explored and recognized.
Notably, both CDK1 and RRM2 enriched in P53 signaling pathway. In our study, CDK1 and RRM2 were highly expressed in HCC as expected, and they were also identi ed as independent prognostic factors. Therefore, HCC diagnosis and treatment could possibly be improved by targeting both CDK1 and RRM2 although the underlying molecular mechanism linking CDK1 and RRM2 is still unclear and needs further research.
Our results are in line with previous ndings and suggest that CDK1and RRM2 can be key genes associated with HCC development. CDK1and RRM2 combination is expected to become a reliable new biological target for diagnosis and gene therapy. More importantly, bioinformatics is further proved to be an advanced and useful method for screening key genes in cancers.
There were few limitations in the present study that should be noted. First, all the candidate genes could not be veri ed. Second, sample size was relatively small. Finally, molecular mechanisms should have been explored with cell experiments. So that further researches will be necessary to clarify the above issues.

Conclusions
In summary, our present study con rmed that CDK1 and RRM2 as the key genes in the development and progression of HCC. And, CDK1 and RRM2 might be utilized as prognostic indicators of HCC, providing potential new biological target for diagnosis and treatment in HCC. Availability of data and materials:

Abbreviations
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Con ict of Interest:
The authors declare that they have no competing interests.     Figure 1 The