A Bioinformatics-Based Screening and Analysis of Key Genes in Hepatocellular Carcinoma

Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related death worldwide. So far, HCC has mainly been diagnosed by imaging and biopsy. Sequencing technology is developing rapidly and has already been widely applied in medicine, for example in cancer diagnosis. In this article, the GSE121248, GSE76427 and GSE60502 datasets were chosen to screen and analyze, through bioinformatics methods, key genes that may affect the development of liver cancer. The results showed that upregulated genes were mainly enriched in cell division, nucleus and protein binding, whereas downregulated genes were mainly enriched in the oxidation-reduction process, extracellular region, heme binding and metabolic pathways. Hub gene analysis further identified twelve critical hub genes: RFC4, RACGAP1, CCNB2, CDC20, UBE2C, PTTG1, AURKA, PRC1, NCAPG, CDKN3, TOP2A and KIF20A. Bioinformatic approaches allow the genes associated with hepatocellular carcinoma to be analyzed efficiently, providing valuable information for translational studies.


Introduction
Hepatocellular carcinoma (HCC) is the second leading cause of cancer-related deaths in the world [1]. HCC is characterized by high mortality, frequent metastasis and high invasiveness [2]. The occurrence of HCC is generally believed to be associated with viral infection (hepatitis B virus, hepatitis C virus), long-term alcohol consumption, aflatoxin exposure, water contamination, nitrite compounds, primary biliary cirrhosis and non-alcoholic fatty liver disease, among other factors [3]. Early-stage HCC is mainly treated by surgical resection, liver transplantation or percutaneous radiofrequency ablation, and the five-year survival rate of these patients may exceed 50%. However, the pathogenesis of HCC involves multiple signaling pathways, and the molecular mechanisms of its malignant progression remain unclear; almost 80% of patients with advanced-stage liver cancer have received mainly radiotherapy and chemotherapy. Therefore, identifying effective diagnostic markers for early-stage tumors and targets for targeted therapy is of great clinical significance [4]. The development of high-throughput gene chip and sequencing technologies has accelerated the study of liver cancer gene expression profiles, facilitating the discovery of genes and gene expression changes in liver cancer tissues and cells under specific conditions.

Because positive results from the analysis of a single microarray dataset have low reliability, in this study we analyzed three microarray datasets from the GEO database, each of which includes HCC samples and non-tumor liver tissue samples. Differentially expressed genes (DEGs) between HCC and normal samples were identified with GEO2R. Bioinformatics tools such as DAVID and STRING were then used to determine the biological functions, signaling pathways and interactions of the DEGs. Finally, 12 key genes were selected as candidate biomarkers for HCC. This study provides a theoretical basis for the clinical screening of molecular markers and drug targets related to the development and progression of hepatocellular carcinoma (Fig. 1 …

… genomes. The X-axis refers to the function of the gene, the left Y-axis represents the p-value (-log10), and the right Y-axis represents the number of enriched genes.

… NCAPG changes showed poorer disease-free survival, p < 0.05 (Fig. 6A and B).

In this study, three datasets that included HCC tissue samples and non-HCC tissue samples were selected from the GEO database. GEO2R analysis identified 215 DEGs, including 34 upregulated and 181 downregulated genes. GO enrichment and KEGG pathway analyses were performed on these genes.

TOP2A is a component of the nuclear matrix. Overexpression of TOP2A has been found to be closely related to breast cancer, prostate cancer and liver cancer [19,20]. In normal … cancers. In a variety of tumors such as colon cancer, breast cancer and liver cancer, silencing RFC4 arrests cells in S phase and prevents their subsequent mitosis and proliferation, thereby reducing tumor cell proliferation [33-35]. CDC20 acts as a cell cycle regulator and also plays an important role in human tumors.
High expression of CDC20 has been observed in numerous tumor types and is associated with poor prognosis, particularly in lung cancer, bladder cancer and liver cancer [36,37]. NCAPG is a cell cycle-associated gene that affects primary hepatoma cells by influencing mitosis [38].
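The screening described above (GEO2R on three GEO datasets, followed by GO/KEGG enrichment) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's pipeline: the text does not report the GEO2R cut-offs or how the three DEG lists were combined, so the sketch assumes |logFC| ≥ 1 and adjusted P < 0.05 and keeps only genes called in every dataset with the same direction. The enrichment function is the plain one-tailed hypergeometric (over-representation) test; DAVID itself reports a modified Fisher exact test, the EASE score. All function names and toy values are hypothetical.

```python
from math import comb

# Assumed cut-offs; the study does not report the thresholds it used in GEO2R.
LOGFC_CUTOFF = 1.0
ADJ_P_CUTOFF = 0.05


def call_degs(table, logfc_cutoff=LOGFC_CUTOFF, adj_p_cutoff=ADJ_P_CUTOFF):
    """Return {gene: +1 (up) or -1 (down)} for genes passing both cut-offs.

    `table` maps gene symbol -> (logFC, adjusted P value), i.e. two columns
    of a GEO2R/limma-style results table.
    """
    calls = {}
    for gene, (logfc, adj_p) in table.items():
        if adj_p < adj_p_cutoff and abs(logfc) >= logfc_cutoff:
            calls[gene] = 1 if logfc > 0 else -1
    return calls


def consensus_degs(tables):
    """Keep genes called as DEGs in every dataset, with the same direction."""
    per_dataset = [call_degs(t) for t in tables]
    shared = set.intersection(*(set(calls) for calls in per_dataset))
    first = per_dataset[0]
    return {g: first[g] for g in shared
            if all(calls[g] == first[g] for calls in per_dataset)}


def enrichment_p(k, n, K, N):
    """One-tailed hypergeometric P(X >= k).

    k of the n DEGs carry a GO/KEGG annotation that K of the N background
    genes carry; a small value means the term is over-represented.
    """
    hits = range(k, min(n, K) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


if __name__ == "__main__":
    # Toy values for illustration only, not data from GSE121248/GSE76427/GSE60502.
    gse_a = {"G1": (2.1, 0.001), "G2": (-1.5, 0.01), "G3": (0.4, 0.20)}
    gse_b = {"G1": (1.8, 0.003), "G2": (-2.0, 0.02), "G3": (1.2, 0.04)}
    gse_c = {"G1": (1.3, 0.010), "G2": (-1.1, 0.04), "G3": (-1.4, 0.01)}
    degs = consensus_degs([gse_a, gse_b, gse_c])
    up = sorted(g for g, d in degs.items() if d > 0)
    down = sorted(g for g, d in degs.items() if d < 0)
    print(up, down)                            # ['G1'] ['G2']
    print(f"{enrichment_p(2, 2, 3, 10):.4f}")  # 0.0667
```

Requiring a consistent direction across datasets is stricter than a plain intersection of gene symbols: a gene that is up in one cohort and down in another (G3 in the toy data would be, if it passed the cut-offs everywhere) is discarded rather than counted as a DEG.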

In the PPI network, AURKA and CDKN3 are directly linked to genes such as TOP2A and PTTG1, suggesting that they play important roles in HCC. Using GEPIA and Oncomine, we evaluated the association of AURKA and CDKN3 with overall and disease-free survival; survival was reduced in patients with genetic alterations in AURKA and CDKN3.
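"Directly linked" in a PPI network means sharing an edge, and the simplest way to turn such a network into a hub-gene ranking is degree centrality, the number of distinct interaction partners per gene. The text does not state which ranking method was used for the 12 hub genes, so the sketch below assumes degree ranking on a STRING-style edge list with confidence scores in [0, 1], filtered at 0.4 (STRING's "medium confidence" level). The function name and the placeholder genes A to E are hypothetical, not the study's network.

```python
from collections import defaultdict


def hub_genes(interactions, min_score=0.4, top_n=10):
    """Rank genes in a PPI network by degree (number of distinct partners).

    interactions: iterable of (gene_a, gene_b, score), score in [0, 1].
    Edges scoring below min_score are dropped; self-loops are ignored, and
    duplicated or reversed pairs are counted once.
    Returns [(gene, degree), ...] sorted by degree (descending), then name.
    """
    partners = defaultdict(set)
    for a, b, score in interactions:
        if a == b or score < min_score:
            continue
        partners[a].add(b)
        partners[b].add(a)
    ranked = sorted(partners.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(gene, len(nbrs)) for gene, nbrs in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical placeholder genes and scores, not the study's STRING network.
    ppi = [("A", "B", 0.9), ("A", "C", 0.8), ("A", "D", 0.7), ("B", "C", 0.95),
           ("D", "E", 0.5), ("B", "A", 0.9), ("C", "E", 0.2)]
    print(hub_genes(ppi, top_n=3))  # [('A', 3), ('B', 2), ('C', 2)]
```

Degree is only one of several centrality measures; tools such as the cytoHubba plugin for Cytoscape also offer MCC, betweenness and closeness, which can rank the same network differently.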

Although the association between CDKN3 and overall survival was not statistically significant in this study, some clinical studies have shown that overexpression of CDKN3 is significantly associated with reduced survival time [58]. Overexpression of CDKN3 has also been reported to have a significant effect on the overall survival of HCC patients [59]. We speculate that this discrepancy may arise because the survival analysis in the GEPIA tool was based on the relationship between gene mutation and prognosis, whereas gene overexpression can be caused by either mutation or amplification. In addition, the data included in each database differ, which may introduce deviations. Therefore, overexpression of CDKN3 in HCC may result from gene amplification rather than mutation, and further research is needed to confirm this hypothesis.
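Whether a survival difference reaches significance depends on how patients are split into two groups, which is relevant to the CDKN3 result discussed above. Survival comparisons of the kind reported by GEPIA (whose survival plots show a log-rank P value) rest on the two-group log-rank (Mantel-Cox) test. The sketch below implements that test from scratch, assuming per-patient follow-up times, event flags (1 = death or recurrence observed, 0 = censored) and a binary grouping such as altered versus unaltered or high versus low expression. The function name and toy data are hypothetical.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.

    times_*: follow-up times; events_*: 1 if the event (death or recurrence)
    was observed, 0 if the patient was censored at that time.
    Returns (chi2, p) for a chi-square statistic with 1 degree of freedom.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({time for time, event, _ in data if event}):
        n = sum(1 for time, _, _ in data if time >= t)  # at risk, both groups
        n_a = sum(1 for time, _, g in data if time >= t and g == 0)
        d = sum(1 for time, event, _ in data if time == t and event)
        d_a = sum(1 for time, event, g in data if time == t and event and g == 0)
        # Observed minus expected events in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this event time.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Toy follow-up data (months), not patient data from GEPIA or TCGA.
    chi2, p = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
    print(round(chi2, 3))  # 2.882
```

With only four toy patients the clear separation between groups still gives p > 0.05, a reminder that small or unbalanced groups can make a real difference non-significant.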

In summary, this study was designed to identify DEGs that may be involved in the pathogenesis or progression of HCC. A total of 215 DEGs and 12 hub genes were identified, and these may serve as candidate diagnostic biomarkers for HCC. However, further research is needed to elucidate the biological functions of these genes in HCC.