Transcription Factors Linked to the Molecular Signatures in the Development of HCC on a Cirrhotic Background

Mechanisms underlying the regulation of gene expression in cancer have been studied for decades to find novel prognostic factors and new targets for molecular targeted therapies. Because most cases of liver cancer are associated with liver cirrhosis, we aimed to analyze the gene expression signatures and gene regulatory mechanisms in hepatocellular carcinoma (HCC) on a cirrhotic background using high-throughput data analysis. In the present study, three valid array-based datasets containing HCC and liver cirrhosis samples were obtained to identify common differentially expressed genes (DEGs). Moreover, a comprehensive analysis of RNA-Seq data using Kaplan-Meier curves was conducted to find molecular signatures associated with reduced overall survival. Furthermore, we proposed a gene regulatory network (GRN) to explore how transcription factors may regulate these molecular signatures during the progression of HCC from cirrhosis. In addition, we analyzed protein-protein interactions, gene ontology (GO), and pathway enrichment to elucidate the cellular and molecular functions of the GRN elements in HCC. In this way, we identified 231 molecular signatures in HCC derived from cirrhosis. We also identified TCF4, RUNX1, HINFP, KDM2B, MAF, JUN, NR5A2, NFYA, and AR as key differentially expressed transcription factors (DETFs) in the progression of HCC from cirrhosis. In conclusion, the identified molecular signatures and their transcription factors are candidate prognostic markers and possible molecular targets in the progression of HCC.


Introduction
With a survival rate of less than 20%, liver cancer is among the deadliest cancers [1].
Hepatocellular carcinoma (HCC) is a type of liver cancer that originates from hepatocytes, the main cells of the liver. It is the third most common cause of cancer-related death worldwide [2,3].
Liver cancer is a malignant growth of liver cells that tends to occur in livers damaged by birth defects, alcohol abuse, chronic infection with viruses such as hepatitis B and hepatitis C, metabolic liver disease, hemochromatosis, and cirrhosis [4]. Among these risk factors, cirrhosis is considered the major cause of HCC in the United States, Europe, and Asia, and it tops the list [5]. Notably, more than half of all people who suffer from liver cancer have cirrhosis.
Cirrhosis develops very slowly, and it can take many years for the liver to become severely damaged. Because both diseases are prevalent in the general population, the likelihood of developing both concurrently increases [6].
Nonalcoholic fatty liver disease refers to a group of diseases related to fat deposition in the liver. When fat deposition coincides with inflammation and moderate fibrosis, nonalcoholic steatohepatitis (NASH) occurs (Fig. 1A). As inflammation, fibrosis, and regenerative nodules increase, liver cirrhosis develops [7,8].
Cirrhosis is the most important factor in the incidence of HCC, but the molecular processes, especially the regulatory mechanisms of genes, involved in the development of HCC from cirrhosis are poorly understood. Despite remarkable progress in the understanding and management of liver cirrhosis over the past decades, it remains one of the most important causes of human mortality. The development of HCC from cirrhosis is mostly affected by genetic factors [9], which can therefore be considered related to liver cell malignancy. Thus, it is important to understand the mechanisms that drive the transition from cirrhosis to HCC.
The incidence and mortality of hepatocellular carcinoma (HCC) remain a major public health issue, as it is one of the most common malignant tumors worldwide. Hepatocarcinogenesis is a very complex biological process associated with many environmental and hereditary risk factors, including abnormal activation of cellular and molecular signaling pathways such as the Wnt/β-catenin, AKT, MAPK, and ERK pathways [10].
The mechanism of gene regulation in cancer has been an important issue for decades, and oncologists have achieved new and practical breakthroughs in the fight against cancer. For a basic understanding of this process, it is necessary to identify the mechanisms behind gene expression patterns in cancer using high-throughput techniques [11]. To elucidate the gene expression mechanisms underlying the development of HCC, we focused on the transition between cirrhosis and HCC. Studies have indicated that this transition is mostly affected by genetic/epigenetic alterations [9]. Thus, we first identified genes associated with HCC derived from cirrhosis and then identified those associated with HCC patients' survival. Our study therefore aimed to reveal potential prognostic markers and targets for molecular targeted therapy in HCC.
Gene regulatory networks (GRNs) give molecular biologists a useful way to elucidate cellular processes in living cells. Interactions between genes and their products play an important role in many molecular processes, and their alteration can induce abnormalities [12]. Among all regulatory factors in the cell, transcription factors (TFs) are considered one of the most important elements in altering gene expression. Therefore, in the current study, we identified the key differentially expressed transcription factors (DETFs) in the development of HCC on a cirrhotic background and sought to find the regulatory mechanisms by which these DETFs act on the molecular signatures in HCC. The workflow of this study is presented in Fig. 1B.

Analysis of DEGs in HCC on a cirrhotic background
Cirrhotic and HCC tissue sample datasets for mRNA expression, with patients' pathological information, were obtained from the GEO database. We searched for datasets containing both HCC and cirrhosis samples. Datasets with a small number of samples, cell lines, patients who received neoadjuvant therapy, or patients without the necessary clinical data were excluded. Accordingly, we selected three reliable array-based datasets: GSE25097 [13], GSE46444 [14], and GSE63898 [15]. To identify DEGs involved in HCC on a cirrhotic background, we compared the gene expression of HCC samples with that of cirrhotic tissues. Each dataset was analyzed separately using the limma package in R [16]. Genes with |FC| > 1.5 and p-value < 0.05 were selected as DEGs in each dataset. Finally, we selected all DEGs common to the three datasets for further analysis. In total, 840 cirrhotic and HCC patient samples were used in this study. Details are given in Table 1.
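The per-dataset thresholding and the cross-dataset intersection can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that |FC| > 1.5 refers to a linear fold change, so on limma's log2 scale the cutoff becomes log2(1.5), about 0.585. It also assumes that a "common" DEG must change in the same direction in all three datasets, which the text does not state explicitly. The input layout and gene names are hypothetical.

```python
import math

# Thresholds stated in the text: |FC| > 1.5 and p < 0.05.
# Assumption: limma reports log2 fold changes, so the linear cutoff is converted.
FC_CUTOFF = 1.5
P_CUTOFF = 0.05
LOG2_CUTOFF = math.log2(FC_CUTOFF)  # about 0.585


def select_degs(table):
    """Return {gene: 'up' | 'down'} for genes passing both cutoffs.

    `table` maps gene -> (log2_fold_change, p_value) for one dataset
    (HCC vs. cirrhosis); this layout is hypothetical.
    """
    degs = {}
    for gene, (log2fc, p_value) in table.items():
        if abs(log2fc) > LOG2_CUTOFF and p_value < P_CUTOFF:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


def common_degs(tables):
    """Genes called DEGs in every dataset, in the same direction (assumed rule)."""
    calls = [select_degs(t) for t in tables]
    shared = set.intersection(*(set(c) for c in calls))
    return {
        gene: calls[0][gene]
        for gene in sorted(shared)
        if all(c[gene] == calls[0][gene] for c in calls)
    }


# Toy values for three hypothetical datasets (not real results).
ds1 = {"GENE_A": (1.2, 0.001), "GENE_B": (-0.9, 0.01), "GENE_C": (0.3, 0.001)}
ds2 = {"GENE_A": (0.8, 0.02), "GENE_B": (-1.1, 0.03), "GENE_C": (1.0, 0.2)}
ds3 = {"GENE_A": (0.7, 0.04), "GENE_B": (-0.6, 0.001), "GENE_C": (0.9, 0.01)}
print(common_degs([ds1, ds2, ds3]))  # {'GENE_A': 'up', 'GENE_B': 'down'}
```

GENE_C is dropped because it fails the cutoffs in two of the three datasets. This is how the intersection step narrows the candidates to genes that replicate across cohorts.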

Identification of molecular signatures for HCC patients
Using Kaplan-Meier curve analysis, we evaluated each common DEG as a potential prognostic marker for HCC [17]. For this purpose, we used RNA-Seq data from 364 patients with HCC via the KMplotter database [18]. In a comprehensive analysis, we selected all common up-regulated DEGs with HR > 1 (95% confidence intervals [CIs]) and p-value < 0.05, as well as all common down-regulated DEGs with HR < 1 (95% CIs) and p-value < 0.05. In this way, we selected all common DEGs whose expression changes are associated with a lower chance of survival in patients with liver cancer. We then combined all data to obtain a set of molecular signatures in HCC on a cirrhotic background.
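The selection rule pairs the direction of dysregulation with the direction of the hazard ratio. An up-regulated gene counts as adverse if high expression raises the hazard (HR > 1). A down-regulated gene counts as adverse if high expression lowers the hazard (HR < 1), so that its loss in HCC is the harmful change. The sketch below assumes that HR compares the high-expression group with the low-expression group, and it reads "with 95% CIs" as requiring the interval to exclude 1. Both are assumptions. The record type and toy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SurvivalResult:
    """Hypothetical per-gene survival summary (e.g. exported from a KM analysis)."""
    gene: str
    direction: str  # "up" or "down" in HCC vs. cirrhosis
    hr: float       # hazard ratio, high- vs. low-expression group (assumed)
    ci_low: float   # lower bound of the 95% CI
    ci_high: float  # upper bound of the 95% CI
    p_value: float


def is_adverse_signature(r, alpha=0.05):
    """True if the gene's dysregulation in HCC points toward poorer survival.

    Up-regulated genes need HR > 1; down-regulated genes need HR < 1.
    Assumption: the 95% CI must also exclude 1.
    """
    if r.p_value >= alpha:
        return False
    if r.direction == "up":
        return r.hr > 1 and r.ci_low > 1
    if r.direction == "down":
        return r.hr < 1 and r.ci_high < 1
    return False


results = [
    SurvivalResult("GENE_A", "up", 1.8, 1.2, 2.7, 0.004),
    SurvivalResult("GENE_B", "down", 0.6, 0.4, 0.9, 0.01),
    SurvivalResult("GENE_C", "down", 1.5, 1.1, 2.1, 0.02),  # HR in the wrong direction
    SurvivalResult("GENE_D", "up", 1.3, 0.9, 1.9, 0.15),    # not significant
]
signatures = [r.gene for r in results if is_adverse_signature(r)]
print(signatures)  # ['GENE_A', 'GENE_B']
```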

Identification of DETFs and their target genes
Transcription factors with |FC| > 1.5 and p-value < 0.05 in the comparison of HCC derived from liver cirrhosis with cirrhotic tissue were selected as DETFs. To identify the target genes of these DETFs, we used the ChEA and TRANSFAC databases, which are based on experimental evidence for TF target genes [19]. A p-value of less than 0.05 was considered significant.
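The text does not say which statistical test produced the TF-target p-values. A common choice for this kind of question is a one-sided hypergeometric (Fisher-type) over-representation test. It asks whether a DETF's annotated targets are overrepresented among the signature genes more than chance would predict. The sketch below assumes that test. The universe size, target count, and overlap are hypothetical numbers chosen for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n_list, k_targets, n_universe):
    """P(X >= k) when drawing n_list genes from n_universe genes,
    of which k_targets are annotated targets of one TF."""
    total = comb(n_universe, n_list)
    upper = min(n_list, k_targets)
    tail = sum(
        comb(k_targets, i) * comb(n_universe - k_targets, n_list - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical example: 231 signature genes, a 20,000-gene universe,
# a TF with 500 annotated targets, 20 of which are signature genes.
p = hypergeom_upper_tail(20, 231, 500, 20000)
print(p < 0.05)  # True: about 5.8 overlaps expected by chance, 20 observed
```

Exact integer arithmetic via `math.comb` avoids overflow for large gene universes. The final division converts the ratio to a float.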

Gene regulatory network analysis
Understanding the architectural properties of gene regulatory networks (GRNs) has been one of the critical goals of systems biology and bioinformatics. GRNs were constructed and visualized in Cytoscape 3.4.0 based on DETFs and their target DEGs (common DEGs associated with patient survival) in HCC patients. GRNs were analyzed using the Centiscape [20], CytoCtrlAnalyser [21], and Minimum Connected Dominating Set (MCDS) [22] plugins. The resulting network allows us to predict the impact of the molecular signatures and their regulation by DETFs in HCC on a cirrhotic background. To find the key regulators of the GRN, we used node degree, betweenness, node classification, control centrality, and MCDS algorithms. Node degree is the number of other nodes directly connected to a node. Betweenness measures how often a node lies on the shortest paths between other pairs of nodes, and thus reflects its influence over the flow of interactions through the whole network. Node classification was used to assess the controllability of a node in the network: a node is classified as "indispensable", "neutral", or "dispensable" if its absence increases, does not change, or decreases, respectively, the minimum number of driver nodes needed to control the network. MCDS classifies a node as "dominator", "connector", or "none"; a dominator is a node that provides control over the network, and a connector is a node that links the dominators in the GRN.
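Degree and betweenness, as defined above, can be computed with Brandes' algorithm. The sketch below is a minimal pure-Python version for an unweighted graph and returns unnormalized betweenness. It treats TF-target edges as undirected, which is an assumption; Centiscape's own implementation may differ in normalization or edge direction. The node names are hypothetical.

```python
from collections import deque


def degree_and_betweenness(edges):
    """Degree and unnormalized betweenness for an undirected, unweighted
    graph given as (u, v) pairs, using Brandes' algorithm."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        stack, pred = [], {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies, farthest nodes first
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted from both endpoints
    return degree, {n: b / 2 for n, b in bc.items()}


# Hypothetical toy GRN: two TFs and three target genes.
edges = [("TF1", "G1"), ("TF1", "G2"), ("TF1", "TF2"), ("TF2", "G3")]
deg, btw = degree_and_betweenness(edges)
print(deg["TF1"], btw["TF1"], btw["TF2"])  # 3 5.0 3.0
```

TF1 lies on the shortest path of 5 of the 10 node pairs, so it has the highest betweenness even though TF2 also bridges part of the network.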

Protein-Protein interaction network
We investigated protein-protein interactions (PPIs) among the components of each GRN using the STRING database [23]. We used experimentally determined interactions and displayed those with a score of more than 0.4 in Cytoscape.
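Filtering by score and then ranking proteins by degree is one simple way to pick out hub nodes. On STRING's 0-1 scale, 0.4 corresponds to its "medium confidence" level. The sketch below assumes the edges and their experimental scores have already been exported into a list. The protein names and scores are placeholders, not STRING data, and the paper's actual hub criterion may differ.

```python
from collections import Counter

SCORE_CUTOFF = 0.4  # threshold stated in the text


def hub_nodes(interactions, cutoff=SCORE_CUTOFF, top=3):
    """Keep edges whose score exceeds `cutoff`, then rank proteins
    by degree in the filtered network (ties broken by name)."""
    degree = Counter()
    for a, b, score in interactions:
        if score > cutoff:
            degree[a] += 1
            degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical toy network: (protein_a, protein_b, experimental score in [0, 1]).
toy = [
    ("P1", "P2", 0.9), ("P1", "P3", 0.7), ("P1", "P4", 0.5),
    ("P2", "P3", 0.45), ("P4", "P5", 0.2), ("P5", "P6", 0.3),
]
print(hub_nodes(toy))  # [('P1', 3), ('P2', 2), ('P3', 2)]
```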

GO and pathway enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) [24], an essential tool for systematically extracting biological information from large gene lists, was used to perform gene ontology (GO) and pathway enrichment analyses [25,26]. Pathway enrichment analysis was performed using the KEGG database, and we developed network-construction models to better understand these relationships [27]. A p-value of less than 0.01 was considered significant.
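DAVID describes its EASE score as a conservative modification of the one-sided Fisher exact test that removes one gene from the list hits before computing the p-value. DAVID also reports fold enrichment as the ratio of the term's frequency in the gene list to its frequency in the background. The sketch below mirrors these ideas and is not DAVID's code. The term counts are hypothetical.

```python
from math import comb


def upper_tail(k, n, big_k, big_n):
    """P(X >= k) for a hypergeometric draw of n genes from big_n,
    of which big_k carry the annotation."""
    hits = sum(
        comb(big_k, i) * comb(big_n - big_k, n - i)
        for i in range(k, min(n, big_k) + 1)
    )
    return hits / comb(big_n, n)


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """EASE-style p-value: remove one gene from the list hits, then test."""
    return upper_tail(max(list_hits - 1, 0), list_total, pop_hits, pop_total)


def fold_enrichment(list_hits, list_total, pop_hits, pop_total):
    """(term frequency in list) / (term frequency in background)."""
    return (list_hits / list_total) / (pop_hits / pop_total)


# Hypothetical term: 8 of 231 signature genes vs. 60 of 20,000 background genes.
fisher_p = upper_tail(8, 231, 60, 20000)
ease_p = ease_score(8, 231, 60, 20000)
print(ease_p > fisher_p, ease_p < 0.01)  # True True
print(round(fold_enrichment(8, 231, 60, 20000), 2))  # 11.54
```

The adjustment always yields a larger p-value than the plain test. Terms supported by only one or two genes are penalized most, which guards against over-interpreting very small overlaps.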

Common DEGs involved in HCC on a cirrhotic background
To obtain a set of common DEGs involved in HCC on a cirrhotic background, we performed expression data analysis on three available datasets comprising 256 cirrhotic tissues and 584 HCC samples. According to our criteria, DEGs were divided into up- and down-regulated genes. We then retained only the DEGs common to all three datasets. In this way, we found 427 common genes that were differentially expressed in HCC derived from liver cirrhosis: 85 genes had increased expression and 342 genes had decreased expression.

Molecular signatures as potential prognostic markers in HCC
Using survival analysis of 364 HCC tumor samples, we identified all common DEGs whose expression levels are associated with a lower probability of patient survival. This yielded a list of 231 molecular signatures in HCC that should be considered potential prognostic markers for this type of liver cancer. Of these molecular signatures, 47 genes are up-regulated and 184 genes are down-regulated in HCC compared with cirrhotic tissues. The FC and HR values of the identified molecular signatures are illustrated in Fig. 2.
Fig. 3C. All criteria and the expression patterns of these DETFs are given in Table 2.
Fig. 4. Based on experimentally determined interactions with a score of more than 0.4, CDC20, JUN, and CTNNB1 were identified as the three hub nodes of the constructed network.

GO term enrichment analysis
To investigate the function of the identified molecular signatures, we analyzed them with DAVID (Database for Annotation, Visualization and Integrated Discovery), a web-based tool. All acquired genes were significantly enriched in biological process, molecular function, and cellular component terms. Enrichment analysis of cellular components showed cytoplasmic part, extracellular region, and vesicle as significant results (Fig. 5A). The genes were also enriched in cellular response to chemical stimulus, response to oxygen-containing compound, and single-organism cellular process as significant biological processes (Fig. 5B). As shown in Fig. 5C, the genes were also enriched in tetrapyrrole, collagen, heme, and calcium ion binding as significant molecular function terms.

Pathway enrichment analysis
KEGG pathway analysis was used to identify pathways enriched among the identified molecular signatures. We found 5 significant pathways involved in the molecular signatures of HCC on a cirrhotic background. As indicated in Fig. 5D, the most significantly enriched pathways are linoleic acid metabolism, chemical carcinogenesis, cell cycle, drug metabolism - cytochrome P450, and metabolism of xenobiotics by cytochrome P450.

Discussion
Patients with cirrhosis are at risk of a variety of complications such as ascites, esophageal or gastric varices, hepatic encephalopathy, and hepatocellular carcinoma [28]. Cirrhosis is the most important risk factor for the development of HCC, and about 80% of HCC cases arise from liver cirrhosis.
One effective approach to managing HCC on a cirrhotic background is to understand the molecular mechanisms underlying the transition from cirrhosis to HCC. Studies show that genetic factors affect the development of HCC from cirrhosis. Therefore, understanding how HCC develops from cirrhosis is important for deciphering the factors involved in cell malignancy. Notably, identifying these factors is crucial for cancer screening and molecular targeted therapy. HCC development is closely linked to cirrhosis, an inflammatory liver disease in which normal liver tissue is replaced by scar tissue and regenerative nodules after prolonged damage from various causes such as hepatitis B or C viruses [29].
Like many other cancers, HCC develops slowly through the progressive accumulation of genetic and epigenetic alterations; therefore, the molecular networks and pathways involved need further investigation to devise novel strategies for the treatment of HCC.
At first glance, it may seem that enough studies have been done on HCC; however, there is an urgent need for further research on this subject, especially along the path we took in this study. The algorithms and methods used in this study are valuable and efficient because of their low cost and the speed with which they yield results. Because we relied on algorithms based on experimental evidence, we could identify the important key genes along this path with greater confidence and a lower probability of error [30].
As studies have shown, one advantage of compiling large amounts of high-throughput data is that the results of different studies can be compared directly [31]. In this study, we sought to find a set of cancer-associated genes in HCC on a cirrhotic background using comprehensive data analysis from a systems biology viewpoint. By analyzing three datasets, we found a total of 427 common DEGs in HCC derived from cirrhotic tissues. We then analyzed the association of each common DEG with patients' overall survival and found a list of 231 molecular signatures that could be considered prognostic markers for HCC patients. In other words, common DEGs were filtered to identify prognostic markers for HCC on a cirrhotic background whose expression changes reduce the overall survival of HCC patients (Fig. 2).
Moreover, using experimental evidence, we constructed a GRN consisting of the 231 molecular signature genes and 29 DETFs that affect their expression levels in HCC. After analyzing the constructed GRN, we identified 9 DETFs as the key regulators of the molecular signatures. Among these DETFs, TCF4, RUNX1, HINFP, KDM2B, MAF, and JUN are down-regulated, and NR5A2, NFYA, and AR are up-regulated in HCC (Table 2). This approach highlights these 9 transcription factors as among the most important elements controlling the expression levels of survival-associated genes in HCC on a cirrhotic background, and they could be considered targets for molecular targeted therapies in this type of cancer. The impact of some of these transcription factors on cancer progression and tumorigenesis has been reported previously. The transcription factor TCF4 induces the epithelial-to-mesenchymal transition and enhances cancer cell invasion [32]. RUNX1 is a member of the RUNX family, which plays an important role in cancer development and tumorigenesis [33,34]. KDM2B, a histone lysine demethylase, alters gene expression through epigenetic changes; it increases cancer cell proliferation and enhances cellular migration by affecting migration-associated genes [35]. NFYA has been reported to be up-regulated in HCC, breast, lung, and other cancers [36]. The value of NR5A2 as a key regulator of colorectal cancer metastasis has also been studied [37].
In addition, NR5A2 has been shown to be up-regulated in glioma, where its expression level is considered a poor prognostic factor; it also plays a role in cell proliferation, migration, and invasion in malignant glioblastoma cells [38]. We previously showed that the nuclear receptor AR is an important factor in breast cancer development and progression [22]. In HCC, AR has been shown to be a useful molecule for molecular targeted therapy of hard-to-treat cancers [39].
In addition, we constructed protein interaction networks based on experimental evidence; these convey important information about cellular pathways and can inform the development of effective cancer therapies (Fig. 4). Proteins are vital components that act as molecular machines, sensors, transporters, and structural elements, and interactions between proteins are key to their function [40]. In this study, PPI network analysis showed that CDC20 acts as a hub node. CDC20 (cell division cycle 20) is remarkably suppressed by the introduction of p53 and is up-regulated in a wide variety of human cancer tissues; it is therefore considered a potential cancer therapeutic target that is negatively regulated by p53. We also found JUN and CTNNB1 (catenin beta 1), both of which play pivotal roles in a variety of cancers, as hub genes in the PPI network. The products of the Jun family genes are essential components of the activator protein-1 (AP-1) transcription factor complexes, which are critically important in the control of cell growth, differentiation, and neoplastic transformation. Notably, JUN plays a crucial role in signal transduction pathways, is involved in cell division, motility, adhesion, and survival in both normal and cancer cells [41], and affects the expression of catenin beta 1 in gastroenteropancreatic endocrine tumors [42].
Regarding the pathway enrichment analysis, we found linoleic acid metabolism, chemical carcinogenesis, and cell cycle to be the most significant pathways for the molecular signatures of HCC patients. The linoleic acid pathway regulates many physiological processes, and its metabolism is therefore important in cancer cells. Previous studies have observed defects in this pathway in HCC patients, so our results are consistent with these reports [43,44]. In addition, the chemical carcinogenesis and cell cycle pathways are associated with cancer development. According to the GO analysis, cytoplasmic part, extracellular region, and vesicle were the cellular components associated with the molecular signatures of HCC on a cirrhotic background. Furthermore, GO analysis at the molecular function level highlighted several binding functions, such as tetrapyrrole, collagen, and heme binding (Fig. 5).
In conclusion, validating these findings in clinical and pathological research on HCC patients would be valuable. In the present study, we performed a comprehensive transcriptome data analysis to find a set of molecular signatures associated with the transition from cirrhotic tissue to HCC. We then constructed a GRN to highlight the possible regulatory mechanisms of these DEGs by DETFs. In addition, we used systems biology approaches to characterize the PPI network, pathways, and GO terms associated with the acquired molecular signatures. The results of this study suggest key elements that could serve as potential prognostic markers and/or therapeutic targets in cirrhotic and HCC patients to prevent malignancy.

Declarations
Authorship contribution statement Jamshid Motalebzadeh: Conceptualization, Investigation, Formal analysis, Writing original draft, Writing review and editing, Supervision.

None
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
All data will be made available on request.