The pattern of gene copy number variations (CNVs) in hepatocellular carcinoma: an in silico analysis

Recent studies have shown that genetic loss or gain in the genome can predispose cells toward malignancy. Hepatocellular carcinoma (HCC) is the most common type of liver cancer and occurs predominantly in patients with underlying chronic liver disease and cirrhosis. The prognosis of HCC is strongly connected with diagnostic delay, and to date no ideal screening modality has been developed for HCC. Recent findings have demonstrated that copy number variations (CNVs) can lead to activation of oncogenes and inactivation of tumor suppressor genes in cancers. In this study, the CNV profile of 361 HCC samples was evaluated to reveal the chromosomal regions potentially involved in the disease. The data showed that chr1q and chr8p were the two hotspot regions for gene amplifications and deletions, respectively, in the studied samples. YY1AP1 (Yin Yang-1 Associated Protein 1) on chr1q22 was the most amplified gene in HCC samples and showed a positive correlation with tumor grade. Deletion of CHMP7 (Charged Multivesicular Body Protein 7) on chr8p21.3 was another frequently observed CNV among HCC patients. Both genes interact with a variety of well-known oncogenes and tumor suppressor genes, including YY1 (Yin Yang 1), CCND1 (Cyclin D1), HDAC1 (Histone deacetylase 1), VHL (von Hippel-Lindau tumor suppressor), MAD2L2 (Mitotic Arrest Deficient 2 Like 2), CEBPA (CCAAT/enhancer-binding protein alpha), CHMP4A, CHMP5, CHMP2A, CHMP3 and ENSG00000249884 (RNF103-CHMP3 gene), all of which have been implicated in carcinogenesis. Although this study was based on in silico evaluations, our findings can open a new window for HCC researchers to focus on these candidate genes in experimental assays.


Background
Hepatocellular carcinoma (HCC) is the most common type of liver cancer and occurs predominantly in patients with underlying chronic liver disease and cirrhosis. The well-established risk factors for HCC are hepatitis B virus infection, alcoholism and metabolic disorders [1,2]. The prognosis of HCC is strongly connected with diagnostic delay, and HCC is usually diagnosed only after the development of end-stage clinical symptoms [3]. To date, no ideal screening modality has been developed for HCC. Although liver lesions are usually visible on computed tomography (CT), many HCC tumors are asymptomatic and are not diagnosed in time. Recent findings have demonstrated that copy number variation (CNV) can lead to activation of oncogenes and inactivation of tumor suppressor genes in cancers and can consequently predispose cells toward malignancy. These genetic variations comprise a variety of genetic gains and losses that shift the genome of the cell away from the normal diploid state [4,5]. Such cytogenetic changes influence the stability of the genome and push cells toward tumorigenesis [6]. Chromosomal aberrations contribute to age of tumor onset, metastatic state, drug resistance and tumor failure phenotypes [7]. Recent advances in functional genomics techniques such as comparative genomic hybridization (CGH) and microarrays have opened a new window for characterizing the cytogenetic signatures of malignant cells. These advances appear promising for both diagnostic and therapeutic applications [8]. Here, by means of an in silico analysis, we evaluated the pattern of CNVs in HCC samples, which may help to define the direction of experimental research on HCC.

Results
As illustrated in Fig. 1, chr1q and chr8p were identified as the hotspot loci of copy number variation in the studied HCC samples. These findings are described in detail below.
These proteins belonged to 15 signaling pathways, which are illustrated in Fig. 2B. Around 12.5% of the proteins were associated with cholesterol biosynthesis. Amplification of 13 of these genes was associated with tumor grade (Table 1). We also found that two miRNAs, hsa-miR-375 and hsa-miR-222-3p, target YY1AP1 and may therefore regulate the concentration of YY1AP1 protein in the cell (Table 3).
Cytoband chr8p was strongly associated with gene deletion in HCC samples
Of 24776 examined genes, 172 (0.69%) were lost in the studied samples (n=361) at a linear copy number cut-off of ≤ -0.5, the deletion threshold defined in Methods; most of these genes were located on chr8p21-23. The highest deletion scores belonged to 17 genes, namely TNFRSF10B, RHOBTB2, PEBP4, TNFRSF10C, CHMP7, TNFRSF10A, ENTPD4, EGR3, BIN3, PDLIM2, R3HCC1, LOXL2, STC1, PIWIL2, SLC25A37, TNFRSF10D and CSMD1, which were considered for further analysis. Among these, 16 genes mapped to chr8p21.3 (94%) and one gene to chr8p23.2 (6%) (Fig. 4A). These genes encode transcription factors and cytoskeletal proteins that act in the apoptosis, EGF, FGF and P53 signaling pathways (Fig. 4B). Deletion of none of these genes was associated with tumor metastasis or grade. Although all of these genes were downregulated in the studied cancerous tissues, the correlation with CNV was not significant except for CHMP7, whose expression showed a moderate correlation (r=0.5) with the corresponding CNV in 70% of the studied samples. Data from the STRING interaction network showed that 10 proteins interact directly with CHMP7 (Charged Multivesicular Body Protein 7), among which CHMP4A, CHMP5, CHMP2A, CHMP3 and ENSG00000249884 (RNF103-CHMP3 gene) are the top five interactors (Fig. 3B). Enrichment analysis through the Pathway Commons web server showed that some of these genes belong to vital cellular pathways, including spindle organization, sister chromatid segregation, centrosome duplication, cytokinesis, nucleus organization, nuclear envelope reassembly and vacuolar transport (Table 2).
We also found that two miRNAs, hsa-miR-375 and hsa-miR-222-3p, target CHMP7 and are therefore capable of regulating the concentration of CHMP7 protein in the cell (Table 3).

Discussion
Liver cancer is one of the human cancers with a poor prognosis, which largely originates from its genetic nature. Identifying the key genetic components and new therapeutic targets is therefore a major step, especially for patients with hepatocellular carcinoma (HCC) who are ineligible for surgical resection or liver transplant. However, the heterogeneous nature of HCC has complicated this approach [9].
Exponential advances in genomic sequencing over the past 10 years, together with the emergence of bioinformatic tools, have enabled scientists to interpret high-throughput data, accelerated the discovery of hundreds of potential new targets for human diseases, especially cancer, and finally offered a route to new prognostic, diagnostic and therapeutic approaches [10]. However, there is an urgent need to study such putative targets experimentally in order to reveal the druggable candidates [9]. Copy number variations are one kind of DNA mutation that seems to have a high impact on cancer pathogenesis. These genetic variations are also valuable tools for finding the hotspot regions in cancers [11].
In this study, we carried out a genome-wide chromosomal CNV analysis in 361 HCC patients for whom both CNV and RNA-seq data were available on widely cited bioportals. Around 24776 genes were screened. Because chromosomal CNVs may affect the level of gene expression in HCC, the RNA sequencing profiles of the target genes were considered in parallel. The data from the 361 HCC samples showed that chr1q and chr8p were the most important regions for gene amplification and deletion, respectively. Takafumi Nishimura et al. showed that chromosomal gain at chr1q is one of the most common features of HCC [12,13]. Chen and coworkers identified chr1q as a hotspot region of gene amplification in HCC; in that study, chr1q12-q22, chr1q23.3-q25.3 and chr1q23.1-q43 were reported as minimal amplified regions on chr1q [14]. The chr8p region has also been highlighted for HCC in two independent studies by Roesseler [15] and Qin [16]. Another study demonstrated that HCC may develop from cirrhotic cells carrying chr8p loss [17]. Of the 34 amplified genes identified here, 13 were associated with tumor grade. A growing body of evidence shows that amplifications of chr1q and chr8q are strongly connected with tumor grade and size [4]. We also found that all of the selected genes were involved in pathways that have repeatedly been reported as critical routes in HCC cells [18]. These include the Wnt signaling pathway, angiogenesis signaling pathway, CCKR signaling map pathway, Ras signaling pathway, cholesterol biosynthesis pathway, EGF receptor signaling pathway, FGF signaling pathway, flavin signaling pathway, glycolysis pathway, inflammation mediated by chemokine and cytokine signaling pathway, integrin signaling pathway, interleukin signaling pathway, PDGF signaling pathway, pyruvate metabolism pathway, and synaptic vesicle trafficking.
We also found that gain of YY1AP1 and loss of CHMP7 were observed in most patients (around 70%) and resulted in upregulation and downregulation of the corresponding transcripts, respectively. YY1AP1 was also found to interact with NeuroG3, SS18L2, ZMYM4, ZNF496 and ZNF576; apart from the first two, altered expression of these proteins in HCC has been reported in the literature [19,20,21]. Of the two microRNAs that target YY1AP1, hsa-miR-375 is a tumor suppressor [22], while hsa-miR-222-3p shows oncogenic properties [23]. YY1AP1 is a component of the INO80 chromatin remodeling complex, which is responsible for transcriptional regulation, DNA repair and replication [24]. In a recent study, Zhao X et al. found that YY1AP1 may serve as a key molecular target for the EpCAM(+) AFP(+) HCC subtype, which is associated with poor prognosis and a stem cell-like phenotype [9]. They also showed that YY1AP1 is connected with the stem cell features of this subtype and that silencing YY1AP1 eliminates the oncogenic features of the cells by altering the chromatin landscape and triggering massive apoptosis in vitro and in vivo [9].
In the case of CHMP7, the top five interacting proteins are CHMP4A, CHMP5, CHMP2A, CHMP3 and ENSG00000249884, all of which are involved in endosomal transport. Apart from CHMP4A, the involvement of these proteins in HCC has been shown previously [19,25]. We also found that four microRNAs target CHMP7: hsa-miR-26b-5p, hsa-miR-505-5p, hsa-miR-484 and hsa-miR-15b-5p. hsa-miR-26b-5p is involved in hepatitis B virus mediated HCC [26]; hsa-miR-484 shows both oncogenic and tumor-suppressor properties depending on its interacting partners [27,28]; and miR-15b-5p is a potential tumor suppressor [29,30]. This suggests that CHMP7 could play an important role in mediating HCC. Although no publication was found on the CHMP7 gene in HCC, which highlights it as a candidate for further analysis in HCC samples, our enrichment analysis showed that CHMP7 promotes nuclear envelope sealing and mitotic spindle disassembly during late anaphase. It also plays a role in the endosomal sorting pathway. Altogether, considering the observed data, we suggest that the candidate regions chr1q and chr8p in HCC could be subjects of further research, with a major emphasis on the roles of the two genes YY1AP1 and CHMP7.

Methods
The TCGA CNA raw data, comprising 24777 genes in 361 samples of Liver Hepatocellular Carcinoma (TCGA, Provisional), were extracted from cBioPortal (http://www.cbioportal.org) and analyzed in R v3.5 using the cgdsr extension package (cran.r-project.org/web/packages/cgdsr/). Linear copy number values of ≥ 0.5 and ≤ -0.5 were used as the thresholds for gene amplification and deletion, respectively. Genes passing these filters were selected as target/candidate genes for further analysis. The frequency of CNV for the target genes was also calculated in the HCC samples. Interactions of the target proteins with other proteins and their involvement in various pathways were obtained from the STRING database (https://string-db.org/). The protein classes and pathway involvement of the target proteins were estimated using the PANTHER classification system (http://www.pantherdb.org). The association of CNV variants with pathological tumor grade and stage was also examined in R.
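The thresholding and frequency steps can be illustrated with a minimal sketch. It is written in Python rather than the R/cgdsr workflow used in the study, and the function names, gene labels and copy number values are hypothetical; only the two cut-offs (≥ 0.5 and ≤ -0.5) come from the text.

```python
from typing import Dict, List

AMP_CUTOFF = 0.5    # linear copy number value >= 0.5 is called an amplification
DEL_CUTOFF = -0.5   # linear copy number value <= -0.5 is called a deletion


def classify_cnv(value: float) -> str:
    """Label one linear copy number value using the two cut-offs."""
    if value >= AMP_CUTOFF:
        return "amplified"
    if value <= DEL_CUTOFF:
        return "deleted"
    return "neutral"


def cnv_frequency(values: List[float]) -> Dict[str, float]:
    """Fraction of samples falling in each class for a single gene."""
    if not values:
        raise ValueError("at least one sample value is required")
    counts = {"amplified": 0, "deleted": 0, "neutral": 0}
    for v in values:
        counts[classify_cnv(v)] += 1
    return {label: n / len(values) for label, n in counts.items()}


# Made-up values for two hypothetical genes across five samples
toy = {
    "GENE_A": [0.8, 0.6, 0.1, 0.9, 0.5],
    "GENE_B": [-0.7, -0.5, 0.0, -0.9, 0.2],
}
for gene, vals in toy.items():
    print(gene, cnv_frequency(vals))
```

In this sketch a value exactly equal to a cut-off counts as an event, matching the inclusive thresholds stated above.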
Subsequently, we obtained the raw RNA-seq data for Liver Hepatocellular Carcinoma (TCGA, Provisional) and extracted the relevant information for the candidate genes in order to examine whether any correlation exists between the CNV of the target genes and their corresponding expression values. Because the linear correlation coefficient (r) is close to +1 for a strong positive correlation, the results were filtered at r > 0.7. The interaction of the target protein was traced using the UCSC genome browser (https://genome.ucsc.edu). Two databases, one for oncogenes (http://ongene.bioinfo-minzhao.org/) and one for tumor suppressor genes (https://bioinfo.uth.edu/TSGene/), were used to determine whether any of the selected genes are listed as oncogenes or tumor suppressor genes. Enrichment analysis was performed for the selected amplified/deleted genes using the Pathway Commons web server (https://apps.pathwaycommons.org/). miRWalk analysis was used to predict the possible presence of miRNA seed sequences in the selected genes.
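The CNV-expression correlation filter can be sketched as follows. This is an illustrative Python version under stated assumptions, not the code used in the study: the correlation is taken to be Pearson's r, and the helper names, gene labels and values are hypothetical. Only the r > 0.7 cut-off comes from the text.

```python
import math
from typing import Dict, List, Sequence

R_CUTOFF = 0.7  # filter for strong positive correlation, as stated in the text


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


def strongly_correlated(cnv: Dict[str, List[float]],
                        expr: Dict[str, List[float]]) -> List[str]:
    """Genes whose copy number and expression values correlate with r > R_CUTOFF."""
    return [g for g in cnv if g in expr and pearson_r(cnv[g], expr[g]) > R_CUTOFF]


# Made-up per-sample values for two hypothetical genes
toy_cnv = {"GENE_A": [0.1, 0.4, 0.6, 0.9], "GENE_B": [0.2, 0.5, 0.7, 0.8]}
toy_expr = {"GENE_A": [1.0, 2.0, 3.0, 4.5], "GENE_B": [3.0, 1.0, 4.0, 2.0]}
print(strongly_correlated(toy_cnv, toy_expr))
```

Each gene is scored across the same ordered set of samples, so the copy number and expression lists must be aligned sample by sample before the comparison.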