Identi cation of Differentially Expressed Genes Triggered by Aberrant Methylation in Idiopathic Pulmonary Fibrosis Using Integrated Bioinformatic Analysis

Shuaijun Chen Huazhong University of Science and Technology Main Campus: Huazhong University of Science and Technology https://orcid.org/0000-0002-8302-2728 Jun Zhang Huazhong University of Science and Technology Main Campus: Huazhong University of Science and Technology Wanli Ma Huazhong University of Science and Technology Main Campus: Huazhong University of Science and Technology Hong Ye (  yehmwl@hust.edu.cn ) Huazhong University of Science and Technology Main Campus: Huazhong University of Science and Technology


Abstract
Background Idiopathic pulmonary brosis (IPF) is a relentlessly progressive and fatal brotic lung disease all over the world, and speci c pathogenesis is still not well understood. DNA methylation is an essential epigenetic mechanism, which likely contributes to the progress of IPF. The purpose of this study is to identify aberrantly methylated differentially expressed genes (DEGs) in IPF and to explore the underlying mechanisms of IPF by using integrated bioinformatics analysis.
Methods Gene expression pro les and gene methylation pro les were downloaded and analyzed to identify the aberrantly methylated-differentially expressed genes. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Search Tool for the Retrieval of Interacting Genes Database (STRING), and Gene set enrichment analysis (GSEA) were used to evaluate the function of DEGs. RT-PCR was used to verify the mRNA levels of DEGs in mice with pulmonary brosis.
Results By analyzing the differentially expressed genes of the three IPF expression pro les, and taking the intersection, we got 143 co-upregulated genes and 104 co-downregulated genes; GO and KEGG pathway analysis of the DEGs suggested these genes involved in the extracellular matrix organization, multicellular organismal homeostasis. Combining the sequencing data of two IPF methylation chips, we have identi ed genes that may be regulated by methylation in IPF. Finally, we obtained the mRNA expression of DEGs using a mouse model of pulmonary brosis.
Conclusions Through integrated analysis and experimental veri cation, we found a series of biomarkers that were regulated by methylation should be potential therapeutic targets for IPF.

Data processing and identi cation of DEGs and DMGS in IPF
We downloaded the expression matrix of IPF patients from GEO. For the chip sequencing data, we performed gene symbol conversion analysis on the expression matrix according to the soft le. For the expression matrix of high-throughput sequencing, we uni ed the gene name into the form of a gene symbol. For the chip sequencing data, we used the "Limma" package for differential gene screening [5], and for the expression matrix of high-throughput sequencing, we used the "edgR" package for differential analysis [6]. For DEGs, |log2FC|>1 and P<, 0.05 were set as the cutoff criteria. Similarly, we also downloaded the methylation chip sequencing data of IPF patients and used the "ChAMP" package for annotation and difference analysis of methylation sites [7], the differential β value △β where gene with a positive △β value signi es hypermethylation and gene with a negative △β value signi es hypomethylation can be applied in subsequent analysis, △β >0.2 and P< 0.05 were set as the cutoff criteria. All calculations were done using R4.0.

GO term and KEGG pathway enrichment analysis
Gene Ontology (GO) enrichment analysis was employed for functional analysis. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was applied to analyze the biological functions and pathways of the DEGs and DMGS [8]. KOBAS is a widely used gene set enrichment (GSE) analysis tool [9], and we assessed GO and KEGG analysis on KOBAS 3.0(http://kobas.cbi.pku.edu.cn/kobas3/?t=1). P<0.05 and counts ≥2 were used as the signi cance threshold.

PPI network construction and hub gene identi cation
We utilized the Search Tool for the Retrieval of Interacting Genes (STRING) database to establish a protein-protein interaction (PPI) network and reveal the relationship among the DEGs [10]. The interaction score was set at 0.4 in the STRING database. Cytoscape was conducted to enhance the legibility of the PPI network on the basis of interaction data obtained from the STRING database. We analyzed the hub genes of the PPI network through the MCODE tool in Cytoscape, a plug-in for exploring the core modules in the gene network [11].
Gene set enrichment analysis (GSEA) Gene set enrichment analysis (GSEA) is a bioinformatics method that inspects the statistical signi cance of a priori de ned sets of genes and validates the differences between two biological states [12]. We divided samples into two phenotype subgroups, normal and IPF. Genes from the expression pro le were ranked in a list according to the degree of divergence between the two subgroups through GSEA software 4.0. Then, Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets were analyzed to identify functional terms and pathways enriched in each phenotype subgroup. Gene set permutations were executed 1000 times for each analysis. The criteria of signi cantly enriched pathways were normalized pvalue < 0.05 and the absolute value of normalized enrichment score (NES) >1.5.

Bleomycin-Induced Murine Pulmonary Fibrosis Model
Animal experiments were performed in accordance with the Guide for the Care and Use of Laboratory Animals and approved by the Institutional Animal Care and Use Committee (IACUC) of the Tongji Medical College, Huazhong University of Science and Technology. The C57BL/6 J mice (18-20 g, 6-8 weeks of age) were provided by the Experimental Animal Center of Huazhong University of Science and Technology in this study and placed in a standard freely accessible water and rodent laboratory food environment. All mice were randomly divided into two groups comprising bleomycin group and saline group (n = 6 per group; totally 12 mice were used). Bleomycin dissolved in saline solution (concentration, anesthetized through intraperitoneal injection of pentobarbital sodium (50 mg/kg) and euthanized by decapitation. Then lung tissues were taken for RNA extraction and tissue staining.

Masson's trichrome staining
The mouse lung tissues with the largest cross-section were embedded in para n and cut into 5-µM-thick tissue sections. And then, the tissue sections were stained with Masson's Trichrome Stain kit (Servicebio, G1006-100ML ). Finally, Olympus DP50-CU microscope was employed to observe the blue collagen deposition in the tissue sections and take pictures for morphological analysis and evaluation of collagen deposition.
RNA extraction and quantitative real-time RT-PCR.
After the mouse was dissected, the lung tissue was taken out and frozen in liquid nitrogen immediately and then placed in a refrigerator at -80℃ until it was used. The tissues were lysed with TRIzol reagent (Vazyme) and then extracted according to the instructions provided by the manufacturer. Next, cDNA was synthesized using HiScript II Q RT SuperMix (Vazyme, R223-01), and quantitative real-time PCR was performed using ChamQ SYBR qPCR Master Mix (Vazyme, Q331-02/03). The primers for and GAPDH are as follows:

Statistical analysis
Results are presented as mean ± SEM. All the data were analyzed by one-way analysis of variance (ANOVA) and Tukey's test for multiple comparisons. All p-value of less than 0.05 was considered statistically signi cant. Data analysis was processed by GraphPad Prism 8 software (La Jolla, CA, USA). Background IPF is a chronic progressive interstitial lung disease, which is characterized by the formation of scar tissue and the destruction of lung tissue structure, which ultimately leads to gas exchange dysfunction and restricted ventilation [13].To date, the pathogenesis of IPF is still unclear, but it may be affected by the complex interaction between environmental and genetic factors. The prognosis of idiopathic pulmonary brosis is poor, with a median survival time of 3-5 years after diagnosis and no curative drug therapy. Therefore, exploring speci c biomarkers and therapeutic targets in IPF plays a vital role in diagnosing and treating diseases [14].
With the development of sequencing technology, more and more research reveals the occurrence and progression of diseases through multi-omics. A large amount of sequencing data are mostly stored in the GEO database, which can be easily analyzed by other researchers. Ji-Hoon Cho. et al. used array sequencing to analyze 23 IPF patients and 5 normal lung tissues, revealing that several diseaseassociated modules involving genes from the TGF-beta, Wnt, focal adhesion, and smooth muscle actin pathways may be involved in advancing brosis [15]. Daryle J DePianto. et al. analyzed 40 IPF and 8 control lung tissues and performed chip sequencing analysis. The results suggest microscopic pathological heterogeneity in IPF lung tissue corresponds to speci c gene expression patterns related to bronchodilation and lymphoid aggregates [1]. Marissa J Schafer et al. used high-throughput sequencing to examine lung tissue from 19 normal and 20 IPF patients and found that senescence markers signi cantly increased in pulmonary brosis development [2].
The epigenetic modi cation includes DNA methylation, hydroxymethylation, histone modi cation, noncoding RNA (ncRNA), etc. Among them, DNA methylation plays an essential role in the repression of transposons and genes. In some cases, it is also involved in the activation of genes [16]. DNA methylation is closely related to human growth and development, social behavior, metabolism, tumors, etc. [17][18][19][20].
Changes in the level of DNA methylation are common in the occurrence and development of diseases. It can identify speci c biomarkers and provide crucial clues to the diagnosis, prognosis, and treatment of diseases. More and more evidence shows that abnormal DNA methylation levels are closely related to the occurrence and development of IPF and provide guidance for the diagnosis and treatment of IPF[16, 21,22]. For example, Atsushi Hata et al. used methylation microarray sequencing to explore the methylation pro les of normal and brotic lung tissue. They found that the low-methylation subgroup signi cantly correlated with IPF [4]. However, so far there have not been any studies to analyze methylation patterns in pulmonary brosis on the gene expression pro le.
In our study, We analyzed the sequencing data sets of the IPF mRNA expression spectrum in GEO data and the 450K methylation sequencing data sets. Through the integrated analysis of them, we found that the changes in genomic methylation modi cation signi cantly promoted the progress of pulmonary brosis.

Results
Differentially expressed genes between IPF patients and normal people We performed a differential expression of the gene sequencing data of three different IPF patient cohorts.
According to our screening criteria(log 2 |FC|>1 and p value<0.05), for the GSE53945 data set, we got 964 up-regulated genes and 1123 down-regulated genes. In the data set GSE92592, we have 1698 upregulated genes and 1309 down-regulated genes. In the data set GSE124685, we have 617 up-regulated genes and 395 down-regulated genes. The volcano map shows the up-regulated and downregulated genes( Figure 1A). Each heatmap shows the overall expression level of the differentially expressed genes between IPF patients and the normal population ( Figure 1B, Supplementary Table S1). To obtain more common and representative differentially expressed genes in IPF patients, we took the intersection of these three different data sets and nally got 143 co-upregulated genes and 104 co-downregulated genes ( Figure 1C).

GO function enrichment analysis of differentially expressed genes
To further explore the functions of these differentially expressed genes, we performed GO function enrichment analysis on the up-regulated genes and the down-regulated genes, respectively. As the results showed, the up-regulated genes were enriched in response to vitamin, response to nutrient, regeneration, osteoblast differentiation, ossi cation, hemidesmosome assembly, extracellular matrix structural constituent, extracellular structure organization, extracellular matrix disassembly, collagen trimer, collagen metabolic process, collagen bril organization, collagen-containing extracellular matrix, chondrocyte development, cartilage development, bone development, basement membrane ( Figure 2A).
We could nd that these up-regulated differential genes were enriched in multiple collagen-related GO terms.  Table S3).

Protein interaction network of differentially expressed genes
To further investigate the molecular mechanism of the development of IPF, we attempted to describe the interaction network between these different genes based on the STRING tool. The result indicated that these genes form an intertwined network with each other( Figure 4A). On the other hand, we used the tool MCODE to screen out two clusters of core genes from these differential genes,cluster1 included COL17A1, COL10A1, LCOL14A1, COL15A1, MMP13, SPP1, MMP1, and cluster2 included P2RY6, NTS, ADRB1, VIPR1, RXFP1, EDNRE, BDKRB2, RXFP1, Some of these genes are known to play a key role in the progression of pulmonary brosis( Figure 4B). Other genes may be involved in the formation of pulmonary brosis.

GSEA analysis of the mRNA expression pro le
We carried out GSEA analysis on the expression pro le of IPF patients and normal tissues to explore the signaling pathways changed during the formation and development of IPF. The results showed that Asthma, Type I diabetes mellitus, Lupus erythematosus, intestinal immune network for IgA production,p53 signaling pathway were activated in the tissues of IPF while aldosterone regulated sodium reabsorption was activated in the tissues of normal( Figure 5).

The role of gene methylation modi cation in pulmonary brosis
To further explore the molecular mechanism in the process of lung brosis, we analyzed the results of two methylation chips from IPF and normal samples. As shown in the heatmap, there are different gene methylation patterns between IPF tissue and normal lung tissue. The differentially methylated genes could well distinguish IPF tissue from normal lung tissue and were expected to become a new methylation marker for IPF( Figure 6A, Supplementary Table S4). Taking the intersection between the differentially expressed gene set and the differentially methylated gene set, we nally got 8 genes with low methylation and high expression at the same time, including CXCL14, DAPL1, DOK5, FNDC4, MMP7, MMP10, MMP11, SPP1 ( Figure 6B).

GO function enrichment analysis of methylated genes
We carried out GO enrichment analysis of hypermethylated and hypomethylated genes, respectively. The results indicated that the hypermethylated genes were enriched in transforming growth factor-beta receptor signaling pathway, transcription regulator complex, SMAD protein signal transduction, response to endoplasmic reticulum stress, negative regulation of the apoptotic process, cell migration, and so on ( Figure 7A). The hypomethylated genes were enriched in regulation of small GTPase mediated signal transduction, positive regulation of MAPK cascade, positive regulation of adenylate cyclase activity, negative regulation of cell population proliferation, in ammatory response, extracellular matrix organization, extracellular exosome, DNA-binding transcription factor activity, collagen catabolic process ( Figure 7B, Supplementary Table S5).
KEGG function enrichment analysis of methylated genes KEGG enrichment analysis was performed to investigate the signaling pathways that these differentially methylated genes might be involved in. The results indicated that the hypermethylated genes were enriched in the TGF-beta signaling pathway, Ras signaling pathway, Rap1 signaling pathway, PI3K-Akt signaling pathway, HIF-1 signaling pathway, Focal adhesion, Chemokine signaling pathway, and so on ( Figure 8A). The hypomethylated genes were enriched in Toll-like receptor signaling pathway, p53 signaling pathway, Hippo signaling pathway, Cytokine-cytokine receptor interaction, Cell adhesion molecules (CAMs), cAMP signaling pathway, C-type lectin receptor signaling pathway ( Figure 8B, Supplementary Table S6).
Validation in a mouse model To better verify whether these hypomethylated genes were highly expressed in pulmonary brosis, mice models of pulmonary brosis were constructed by intraperitoneal injection of bleomycin. Masson's trichrome staining results indicated that the bleomycin treatment group showed apparent collagen deposition and pluralization characteristics ( Figure 9A). RT-PCR was performed to detect the mRNA levels of these genes in pulmonary brosis tissues.
The results indicated that the expression of CXCL14, DAPL1, DOK5, FNDC4, MMP7, MMP10, MMP11, SPP1 were signi cantly increased in the IPF group compared with the control ( Figure 9B)

Conclusions
In summary, through the integrated analysis of the mRNA sequencing data sets and methylation chip sequencing data sets of pulmonary brosis, we found that several hub genes play an essential role in the brosis process while the other 8 key genes were increased due to the decrease of methylation level, which promoted the progression of pulmonary brosis. These genes have potential as new biomarkers or therapeutic targets for IPF, but the exact mechanisms by which they affect pulmonary brosis progression need to be further explored.

Discussion
Mortality in IPF is high, with a reported median survival of 2-3 years from diagnosis, based on historical data [23], and more recent evidence shows no improvement in survival [24][25][26]. It is well recognized that IPF is a heterogeneous disease with a variable disease course, and it's very di cult for predicting disease outcomes [27]. Sequencing technology has provided convenience for us to explore the internal mechanism of pulmonary brosis development. We can nd the biological markers of pulmonary brosis according to the change of gene expression level or epigenetic modi cation and provide guidance for our subsequent research [28].
Fundamental pathological changes in pulmonary brosis include the accumulation of extracellular matrices, such as collagen and bronectin, in the lung interstitium leading to respiratory failure [29]. Therefore, the expression of many collagen proteins was increased in pulmonary brosis tissues.
Moreover, when hub gene screening was carried out, several collagen genes occupied very core positions.
The primary function of MMPs(metalloproteinases) is to degrade extracellular matrix proteins; the role of some matrix metalloproteinases (MMPs) is pro brotic, where others have anti-brotic functions [30]. Ivan O Rosas et al. analyzed the concentrations of 49 proteins in the plasma of 74 patients with IPF and the plasma of 53 control individuals. They found that MMP1 and MMP7 were signi cantly increased and were potential peripheral blood biomarkers in idiopathic pulmonary brosis [31]. Takwi Nkyimbeng et al. reported that in IPF, MMPs-1, 2, 7, 9, and 13 were signi cantly up-regulated in model mice and patients with pulmonary brosis [32]. Akihiko Sokai et al. reported that serum MMP-7 and -10 levels markedly correlated with both the percentage of predicted forced vital capacity and the percentage of predicted diffusing capacity of the lung for carbon monoxide and serum MMP-10 predicted clinical deterioration within 6 months and overall survival of IPF patients [33]. In our study, we found that MMP1, MMP7, MMP10, MMP11, and MMP13 were signi cantly up-regulated in three different data sets, among which MMP7, MMP10, and MMP11 were hypomethylated, suggesting that the change in methylation level may be a driving factor for the change in MMPs expression during pulmonary brosis. We also demonstrated elevated MMP7, MMP10, and MMP11 expression in bleomycin-induced pulmonary brosis in mice. SPP1(Secreted Phosphoprotein 1) is involved in the attachment of osteoclasts to the mineralized bone matrix, and the encoded protein is secreted and binds hydroxyapatite with high a nity [34]. Christina Morse et al. found that in the lung tissue of lung bers, macrophages with high SPP1 expression can aggravate pulmonary brosis [35]. SPP1 has been found to be a useful biomarker for several immunemediated diseases, including multiple sclerosis, SLE, rheumatoid arthritis, atherosclerosis, cardiovascular disease, in ammatory bowel disease, and asthma [36][37][38][39][40]. Our study found that SPP1 was increased in multiple IPF data sets, and mRNA levels of SPP1 were also increased in the mice with pulmonary brosis model, suggesting that SPP1 may be involved in the development of IPF, but at present, SPP1 may be linked to the development of IPF. DAPL1(Death Associated Protein Like 1) was reported to autonomously suppresses retinal pigment epithelium proliferation in vivo and in vitro [41]. Felix Grassmann et al. found that DAPL1 was associated with Age-related macular degeneration in a joint analysis of 3,229 cases and 2,835 controls from ve studies [42]. At present, the biological function of DAPL1 is still unclear. Our research found that DAPL1 may be a biomarker of IPF, but the speci c mechanism still needs to be further explored. DOK5(Docking Protein 5) is a member of the DOK family of membrane proteins, which are adapter proteins involved in signal transduction. Hidekata Yasuoka et al. reported that DOK5 is up-regulated in systemic sclerosis and associated with IGFBP-5-induced brosis [43]. Lei Shi et al. found that Dok5 was the substrate of TrkB and TrkC receptors and involved in neurotrophin-induced MAPK activation [43]. At the same time, MAPK was considered to associate with the development of IPF [44].
FNDC4(Fibronectin Type III Domain Containing 4) is a member of the bronectin type III domain family of proteins [45]. It was reported that FNDC4 signaled to macrophages to downregulate in ammation and FNDC4 supplementation reduced disease severity in a mouse model for colitis, indicating therapeutic potential [46]. Our study found that the expression of FNDC4 was elevated in IPF and might be involved in the progress of IPF.
In conclusion, we screened out genes that were differentially expressed in three different IPF data sets by the methods of bioinformatics. Furthermore, GO and KEGG analysis suggested these genes involved in the extracellular matrix organization, and we found the hub genes related to the development of IPF. GSEA analysis indicated that in IPF disease, Asthma, Type I diabetes mellitus, Lupus erythematosus, the intestinal immune network for IgA production, p53 signaling pathway were highly activated; Besides, IPFrelated genes regulated by methylation were identi ed by combining IPF methylation sequencing chips, and these genes were demonstrated in bleomycin-induced murine pulmonary brosis model. Abbreviations IPF: Idiopathic pulmonary brosis; DEGs: differentially expressed genes; GEO: Gene Expression Omnibus; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; PPI: protein-protein interaction; GSEA: Gene set enrichment analysis.

Declarations
Ethics approval and consent to participate The experimental animals were handled in accordance with a protocol approved by the Institutional Animal Care and Use Committee of the Tongji Medical College, Huazhong University of Science and Technology(No.2019S1143), and written informed consent was obtained.