Integrated Analysis of Differentially Expressed Genes and Epigenetic Biomarkers in HCV-cirrhosis

Purpose: To identify the key genes and epigenetic biomarkers in HCV-cirrhosis through bioinformatic analysis of 4 GEO datasets. Methods: GEO datasets were downloaded from NCBI, and two of them (GSE6764 and GSE14323) were screened for differentially expressed genes (DEGs) with the limma package in R. The DEGs were then subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis with the clusterProfiler package in R. A protein-protein interaction (PPI) network was constructed in Cytoscape to identify the hub genes of HCV-cirrhosis. The DNA methylation dataset GSE60753 was analyzed with the ChAMP package in R to identify differentially methylated genes (DMGs). DEGs and DMGs were cross-analyzed to identify genes that were both differentially expressed and differentially methylated, and to characterize their methylation status. Results: 357 DEGs and 8830 DMGs were identified in HCV-cirrhosis. Functional analysis of the DEGs yielded pathways that may be involved in the pathogenesis of HCV-cirrhosis, including focal adhesion, influenza A, ECM-receptor interaction, and protein digestion and absorption. Cross-analysis of DEGs and DMGs identified 212 genes that changed in both mRNA level and methylation status; most of them were methylated in gene bodies rather than in CpG islands. PPI construction in Cytoscape revealed 25 hub genes among the DEGs, 5 of which were further evaluated as candidate markers of HCV-cirrhosis by ROC curve analysis and validation in another GEO dataset.


Background
Cirrhosis is a chronic liver disease characterized by esophageal varices, ascites, and high portal pressure, which are caused by the formation of fibrous septa and nodules and the collapse of liver structures [1]. Cirrhosis is usually induced by viral infection, alcohol abuse, immune dysfunction, or biliary disease, all of which lead to liver injury [2]. Among these etiologies, infection with hepatitis C virus (HCV) plays a non-negligible role in cirrhosis, especially now that the hepatitis B virus epidemic is becoming well controlled in China [3]. However, it is still not easy to prevent hepatitis C from progressing to cirrhosis, especially when it is caused by type 1 HCV [4]. Therefore, it is necessary to identify the key genes of HCV-cirrhosis and to lay the groundwork for appropriate treatment.
DNA methylation is a basic biological process that plays an important role in cellular processes and organ function [5]. The mainstream view is that DNA hypermethylation leads to downregulation or inhibition of gene expression [6], although some findings contradict it [7].
Studies have found that liver fibrosis is usually accompanied by methylation changes, either globally or in particular genes [8][9][10]. However, the mechanisms by which DNA methylation regulates genes in cirrhosis remain unclear.
Bioinformatics is an emerging and rapidly developing discipline that relies on advances in sequencing technology and information technology to analyze big data [11], and it has produced many interesting research results. Bioinformatic analysis of high-throughput sequencing data may provide new clues to the mechanism of HCV-cirrhosis. Therefore, an analysis that combines gene expression and DNA methylation in HCV-cirrhosis is useful.
In the present study, 2 gene expression profiles from the Gene Expression Omnibus (GEO) database, GSE6764 and GSE14323, were used to identify the differentially expressed genes. Meanwhile, a methylation profile, GSE60753, was used to identify the differentially methylated CpG sites and genes. Moreover, a protein-protein interaction network constructed from these DEGs revealed the potential associations between them. Finally, the key genes were validated in GSE36411. These results may provide new insight into the molecular mechanism of HCV-cirrhosis and into the relationship between DNA methylation and gene expression in this disease.

Materials And Methods
Access of GSE datasets
The keywords "HCV" and "cirrhosis" or "liver fibrosis" were used to search the GEO database, with the following inclusion criteria: the dataset has at least two groups, including controls and patients with HCV-cirrhosis, and the sample size is > 20. Two expression profiling datasets (GSE6764 and GSE14323) were included (Table 1).

Identification of differentially expressed genes
The R packages "limma" and "preprocessCore" were used to perform background correction and quantile normalization, respectively. Probe names were then converted into gene symbols according to the GPL570 and GPL96 annotation files, respectively. The "limma" package in R (v4.0.3) was further used to identify differentially expressed genes (DEGs) between patients with HCV-cirrhosis and controls. False discovery rate (FDR) < 0.05, adjusted p value < 0.05, and |log2FC| > 1 were used as the cutoff values for DEG screening.
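The screening cutoffs can be illustrated with a minimal Python sketch. It is not the limma workflow used in this study: it assumes that the adjusted p value is a Benjamini-Hochberg FDR (the text does not state the adjustment method), and the function names and gene labels are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fc_cutoff=1.0, alpha=0.05):
    """Keep genes with adjusted p < alpha and |log2FC| > fc_cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [
        (gene, fc, q)
        for gene, fc, q in zip(genes, log2_fc, adjusted)
        if q < alpha and abs(fc) > fc_cutoff
    ]


# Hypothetical toy input, not data from GSE6764 or GSE14323.
toy_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
toy_log2_fc = [2.3, -1.6, 0.4, 1.2]
toy_p = [0.001, 0.01, 0.02, 0.8]
toy_degs = select_degs(toy_genes, toy_log2_fc, toy_p)
```

In this toy input, GENE_C fails the fold-change cutoff and GENE_D fails the adjusted p value cutoff, so only GENE_A and GENE_B are retained.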

Functional enrichment analysis of DEGs
GO and KEGG pathway enrichment analyses of the DEGs obtained in the previous step were performed with the R package clusterProfiler. Pathways or processes with p < 0.05 and gene number ≥ 5 were regarded as the key pathways and processes.
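Over-representation analysis of this kind is commonly based on a hypergeometric test. The sketch below shows that calculation in Python under stated assumptions: the background size, term size, and overlap are hypothetical numbers, and the helper names are illustrative rather than part of clusterProfiler.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_degs, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: annotated genes in the background
    n_in_term:    background genes annotated to the term
    n_degs:       DEGs submitted for enrichment
    n_overlap:    DEGs annotated to the term
    """
    upper = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def is_key_term(p_value, n_overlap, alpha=0.05, min_genes=5):
    """Apply the cutoffs stated in the text: p < 0.05 and gene number >= 5."""
    return p_value < alpha and n_overlap >= min_genes


# Hypothetical example: 12 of 357 DEGs fall in a 200-gene term,
# against an assumed background of 20000 annotated genes.
example_p = enrichment_p_value(20000, 200, 357, 12)
example_is_key = is_key_term(example_p, 12)
```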

Methylation analysis of cirrhosis
The GEO dataset GSE60753, which contains DNA methylation profiles of liver, was downloaded, and methylation status was analyzed with the R package ChAMP.

Cross-analysis of DEGs and DMGs
To assess the effect of DNA methylation on gene expression in HCV-cirrhosis, all DEGs with methylation changes were identified, and the methylated regions within these genes were analyzed.
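Conceptually, the cross-analysis is an intersection of the DEG list with the genes annotated to differentially methylated probes, followed by a tally of where those probes lie. The following Python sketch assumes each probe record carries a gene symbol, a gene feature, and a CpG island relation; the labels are modelled on common Illumina array annotation categories, and the records themselves are hypothetical.

```python
from collections import Counter


def cross_analyse(degs, dmp_records):
    """Intersect DEGs with differentially methylated probes.

    degs:        set of gene symbols
    dmp_records: iterable of (gene, feature, cgi_relation) tuples
    Returns the DEGs carrying at least one such probe and a count of
    probes per (feature, cgi_relation) pair within those genes.
    """
    overlap_genes = set()
    region_counts = Counter()
    for gene, feature, cgi_relation in dmp_records:
        if gene in degs:
            overlap_genes.add(gene)
            region_counts[(feature, cgi_relation)] += 1
    return overlap_genes, region_counts


# Hypothetical toy input, not results from GSE60753.
toy_degs = {"GENE_A", "GENE_B", "GENE_C"}
toy_dmps = [
    ("GENE_A", "Body", "opensea"),
    ("GENE_A", "Body", "shore"),
    ("GENE_B", "TSS1500", "shore"),
    ("GENE_X", "Body", "island"),
]
toy_overlap, toy_regions = cross_analyse(toy_degs, toy_dmps)
```

Here GENE_C is a DEG without any methylation change and GENE_X is methylated but not differentially expressed, so neither appears in the overlap.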

Construction of protein-protein interaction network
All DEGs were submitted to STRING (http://string-db.org); 303 DEGs were filtered into the PPI network, while 50 DEGs did not fall into it. After modification in Cytoscape v3.6.0, a total of 303 nodes and 1932 edges were displayed. The maximal clique centrality (MCC) method in the cytoHubba plugin was then used to identify hub genes among these DEGs; a score > 1 × 10^11 was considered significant, and genes above this threshold were considered hub genes in HCV-cirrhosis.
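For readers unfamiliar with MCC, the score of a node is usually defined as the sum of (|C| - 1)! over all maximal cliques C that contain the node, which is why scores in dense networks can reach very large values. The Python sketch below is an independent illustration of that definition on a toy graph, not the cytoHubba implementation; it assumes an undirected edge list without self-loops, and the node names are hypothetical.

```python
from math import factorial


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for node in list(candidates - adjacency[pivot]):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def mcc_scores(edges):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    scores = dict.fromkeys(adjacency, 0)
    for clique in maximal_cliques(adjacency):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


def hub_genes(scores, cutoff):
    """Return nodes whose MCC score exceeds the cutoff."""
    return sorted(node for node, score in scores.items() if score > cutoff)


# Toy network: a triangle A-B-C plus a pendant edge C-D.
toy_scores = mcc_scores([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
```

In the toy network the triangle contributes 2! = 2 to A, B, and C, and the edge C-D contributes 1! = 1 to C and D, so C receives the highest score.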

ROC curve analysis of methylated hub genes
The overall methylation status of the 25 hub genes was analyzed using the differentially methylated probes (DMPs) identified earlier; only 14 hub genes showed a change in DNA methylation, in every case hypermethylation. The expression of these genes in the GSE14323 dataset was then subjected to receiver operating characteristic (ROC) curve analysis. p < 0.05 was considered statistically significant, and the genes meeting this criterion were considered key genes in HCV-cirrhosis.
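The area under the ROC curve (AUC) for a single gene can be computed directly from expression values as the fraction of patient-control pairs in which the patient has the higher value, with ties counted as one half. The sketch below assumes that higher expression indicates disease and uses hypothetical expression values; it does not compute the associated p value.

```python
def roc_auc(case_values, control_values):
    """AUC as the probability that a case scores above a control.

    Equivalent to the Mann-Whitney U statistic divided by the number
    of case-control pairs; ties contribute 0.5.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical expression values, not data from GSE14323.
toy_cases = [5.1, 6.3, 7.0]
toy_controls = [4.0, 5.5, 3.2]
toy_auc = roc_auc(toy_cases, toy_controls)
```

An AUC of 1.0 corresponds to complete separation of the two groups and 0.5 to no discrimination.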

Validation of key genes
To strengthen the findings, another GEO dataset, GSE36411, which contains HCV-cirrhosis patients and normal controls, was used to validate hub gene expression. The expression of the 5 genes in the control and HCV-cirrhosis groups was compared by Student's t-test in GraphPad Prism 8.0, and p < 0.05 was considered statistically significant.
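As a reminder of what the two-sample Student's t-test computes, the sketch below derives the pooled-variance t statistic and its degrees of freedom with the Python standard library. It is an illustration only: the input values are hypothetical, equal variances are assumed, and the p value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an equal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance of the two groups.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, df


# Hypothetical expression values for one gene, not data from GSE36411.
toy_control = [1.0, 2.0, 3.0]
toy_patient = [4.0, 5.0, 6.0]
toy_t, toy_df = student_t(toy_control, toy_patient)
```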

Methylation status of hub genes
The methylation changes among the 25 hub genes were then analyzed. Eleven genes (OASL, OAS3, IFI35, IFI6, IFIH1, IFI44, CXCL10, HERC6, RTP4, HERC5, and SP110) displayed no significant methylation change, while the other 14 hub genes displayed a significant change in at least one CpG site (Supplementary Table 2). Interestingly, no methylation change occurred in the CpG islands of these hub genes.

ROC curve analysis
The performance of the 14 differentially methylated hub genes as diagnostic biomarkers was examined by ROC curves. The results indicated that ISG15, TRIM22, IFI44L, IFI27 and IFI16 are potential markers, with AUC > 0.99 and p < 0.0001 (Fig. 6).
The expression of ISG15, TRIM22, IFI44L, IFI27 and IFI16 was then analyzed in GSE36411 to confirm the expression of the methylated hub genes. As shown in Fig. 7, ISG15, TRIM22, IFI27 and IFI16 were significantly increased in HCV-cirrhosis patients.

Discussion
Because the progression of cirrhosis is not fully understood, treatment is still limited. In the present study, a cross-analysis of HCV-cirrhosis was carried out. DEGs were filtered, hub genes were predicted with STRING and Cytoscape, and GO and KEGG analyses of the DEGs were performed. The methylation of the hub genes was also analyzed. Together, these analyses should provide a more comprehensive understanding of HCV-cirrhosis.
GO enrichment analysis of the DEGs demonstrated that, for molecular function, they were mainly enriched in extracellular matrix structural constituent, glycosaminoglycan binding, and receptor ligand activity; for biological process, in extracellular matrix organization, extracellular structure organization, response to virus, defense response to virus, and regulation of cell-cell adhesion; and for cellular component, in collagen-containing extracellular matrix, endoplasmic reticulum lumen, and external side of plasma membrane. Extracellular matrix deposition is clearly the basic pathological change in cirrhosis. Meanwhile, response to virus and defense response to virus were also enriched among the DEGs. Previous research has shown that the response to HCV results in chronic inflammation, which becomes a vital activator of myofibroblast transdifferentiation [14]. HCV infection not only destroys hepatocytes but also induces the expression of the TGF-β family [15]. Antiviral therapy is therefore the principal treatment for HCV-cirrhosis [16]. Studies have also emphasized that cirrhosis can be reversible when the etiology is removed [17]. These findings indicate the importance of TGF-β in cirrhosis.
DNA methylation is a key event in cellular and molecular biological processes. Although past studies have shown a reduced methylation level in CCl4-induced liver fibrosis [10], there are conflicting findings that DNA methylation increased in particular genes that were upregulated in a fibrosis model [22]. Confusingly, a longer CCl4 exposure did not significantly change DNA methylation status [23]. On the other hand, promoter methylation of several genes, including P14, P15, P73, and MGMT, was found to be increased in liver disease, including cirrhosis [24]. In short, the diversity of DNA methylation status in cirrhosis is undeniable. In fact, gene expression may not be negatively correlated with DNA methylation level, because methylation may increase the binding affinity of the corresponding transcription factors [25]. Methylation at CpG islands (CGIs) of transcription start sites (TSSs) is known to repress gene expression, but its effect at non-CGI TSSs is still unknown [26]. Our results indicated that not all DEGs were hyper- or hypomethylated. Of the 25 hub genes, 11 showed no methylation change, 6 were hypermethylated at only 1 CpG site, and the others at more than 1 site. Thus, it is difficult to judge the effect of DNA methylation on the expression of genes that are methylated at different sites. More detailed studies are needed to clarify the specific mechanism.
We screened out 25 hub genes, including members of the OAS, MX, IFI, and HERC families, among others. OAS1 and MX1 polymorphisms were found to be associated with the severity of liver disease in patients co-infected with HIV and HCV [27][28]. Moreover, MX1 was upregulated in activated hepatic stellate cells (HSCs) [29]. Thus, we might infer that MX1 promotes HSC activation and is related to the prognosis of HCV-cirrhosis, although further study is needed to clarify this. In addition, OAS1 and IFI44 expression was found to be increased in patients with systemic sclerosis-related interstitial lung disease, but no study has focused on their function in cirrhosis. Of the 5 identified biomarkers, IFI27 and ISG15 were shown to be significantly higher in patients without a sustained virological response (SVR) than in SVR patients [30], while TRIM22, IFI44L, and IFI16 still lack investigation. Thus, we infer that OAS1 and MX1 may be epigenetic biomarkers of HCV-cirrhosis. However, more detailed work is needed to clarify their function in HCV-cirrhosis.

Conclusion
We identified the DEGs in cirrhosis and the related pathways, including focal adhesion, influenza A, and protein digestion and absorption, which may be involved in HCV-cirrhosis. Furthermore, we preliminarily discussed the effect of DNA methylation on gene expression in cirrhosis, which still needs further study. We also propose that OAS1 and MX1 may be epigenetic biomarkers of HCV-cirrhosis.

Figure 1
Heat map of DEGs. All differentially expressed genes in GSE6764 (A) and GSE14323 (B) are shown.

Figure 5
Top 25 hub genes network of HCV-cirrhosis. A redder color represents a higher MCC score in Cytoscape.

Figure 7
Validation of ISG15, TRIM22, IFI44L, IFI27 and IFI16 in GSE36411. The differences in expression of the 5 hub genes between the normal group and the HCV-fibrosis group were statistically significant, except for IFI44L. **p < 0.01, ***p < 0.001.
