Identication of Key Genes and miRNA-mRNA Regulatory Pathways in Bronchopulmonary Dysplasia in Preterm Infants by Bioinformatics Methods

Background: Bronchopulmonary dysplasia (BPD) remains a severe respiratory complication of preterm infants in neonatal intensive care units (NICUs). However, its pathogenesis has been unclear. Bioinformatics analysis, which can help us explore genetic alternations and recognize latent diagnostic biomarkers, has recently promoted the comprehension of the molecular mechanisms underlying disease occurrence and development. Methods: In this study, we identied key genes and miRNA-mRNA regulatory networks in BPD in preterm infants to elucidate the pathogenesis of BPD. We downloaded and analyzed miRNA and gene expression microarray datasets from the Gene Expression Omnibus database (GEO). Differentially expressed miRNA (DEMs) and differentially expressed genes (DEGs) were obtained through NetworkAnalyst. We performed pathway enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery program (DAVID), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG). Then we used the STRING to establish protein–protein interactions and the Cytoscape tool to establish miRNA– mRNA regulatory networks. Results: We identied 19 signicant DEMs and 140 and 33 signicantly upregulated and downregulated DEGs, respectively. Functional enrichment analysis indicated that signicant DEGs were associated with the antigen processing and presentation, and B-cell receptor signaling pathways in BPD. Key DEGs, such as CD19, CD79B, MS4A1, and FCGR2B were selected as hub genes in PPI networks. Conclusions: In this study, we screened out 19 DEMs that might play important roles in the regulatory networks of BPD. Higher expression of miRNAs such as miR-15b-5p, hsa-miR-32-5p, miR-3613-3p, and miR-33a-5p and lower expression of miRNAs such as miR-3960, miR-425-5p, and miR-3202 might be correlated with the process of BPD.


Introduction
Nowadays, with the development of the department of neonatology and advancements in medical technology, the number of premature infants surviving the neonatal period has been increasing. However, this has been accompanied by an increasing number of cases of bronchopulmonary dysplasia (BPD) in preterm infants [1], a chronic lung disease with signi cant morbidity and mortality mainly due to long time oxygen therapy during the late canalicular or saccular phases of lung development. Accordingly, this might cause long term consequences in the neonates. Researches have shown that infants with BPD are easier to get respiratory infections and are more susceptible to frequent hospitalizations compared with healthy preterm and term infants in the rst 2 y of their life [2]. The pathogenesis of BPD has not been well elucidated yet, so identifying the mechanism that leads to the occurrence and development of BPD is an important issue.
The diagnostic criteria of BPD changes over time.The National Institute of Child Health and Human Development (NICHD) guidelines in 2001 imposed that the diagnose of BPD requires accumulated oxygen inhalation for at least 28 days after birth. While new NICHD guidelines in 2018 imposed that the diagnose of BPD should according to the oxygen concentration for at least 3 consecutive days at 36 weeks post-menstrual age (PMA). There is no question that BPD is a complex disease that develops progressively, has multiple causes, has a spectrum of severity and the diagnosis is relatively nonspeci c which can differ between regions.
In recent years, the genetic background has been involved in the pathogenesis of BPD [3]. Nowadays, we can disclose the molecular mechanism of BPD using high-throughput sequencing and high-resolution microarray analysis. Microarray analysis is a high-throughput, high-e ciency, and high-automation method that has been widely used in scienti c research to provide the expression level of messenger RNA (mRNA) and noncoding RNAs (ncRNAs) in samples [4,5]. Noted, ncRNAs mainly includes microRNA (miRNA), circular RNA (circRNA), and long noncoding RNA (lncRNA). Accordingly, miRNAs is a group of highly stable single-stranded RNA molecules that have been reported to play important roles in posttranscriptional gene regulation [6]. They have been shown to regulate the mRNA and protein expression in various physiological and pathological processes, such as cellular differentiation, proliferation, apoptosis, angiogenesis, and cancer development [7].
Many studies have revealed the role of miRNA during the pathogenesis of BPD [8].These miRNAs have been shown to regulate their targeted downstream genes through changes (over-or under-) in their expression. Lal et al. found that under-expression of miR-876-3p was connected with the development of BPD [9], Whereas, Zhang et al. found that over-expression of miRNA-206 contributed to the development of BPD through the up-regulation of bronectin 1 [10]. However, despite ongoing research in this eld, the molecular mechanism of BPD remains poorly understood. As far as we know, there have been few studies using microarray datasets to obtain key genes and construct miRNA-mRNA regulatory pathways in BPD. Our study aimed to identify the key genes and differentially expressed miRNA (DEMs) and their underlying regulatory mechanisms in BPD using bioinformatic methods.

Microarray Data
GEO (http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics data repository containing arrayand sequence-based data [11]. The GSE108754 gene expression pro les and the GSE108755 miRNA expression pro les were downloaded from GEO. A total of 20 premature infants with BPD and 20 non-BPD age-matched controls were enrolled from NICU in Shanghai Children Hospital from January 2015 to December 2016. BPD is diagnosed according to the National Institute of Child Health and Human Development (NICHD) guidelines in 2001. There was no signi cant difference in general clinical data between the two groups. Peripheral blood samples were collected. 5 infants with BPD and 6 infants without BPD were randomly selected for screening DEMs and DEGs, the others were used to verify. The array data were based on the GPL17107 Exiqon miRCURY LNA™ microRNA Array -hsa, mmu & rno (Bethesda, MD, USA) and GPL13497 Agilent-026652 Whole Human Genome Microarray 4 × 44K v2 (Probe Name version) (Palo Alto, CA, USA). Moreover, we compared the results of GSE125873 and GSE32472 with our analysis.

Identi cation of differentially expressed miRNAs and differentially expressed genes
We compared samples from infants with BPD with those from normal preterm infants in GSE108755 and GSE108754 to identify DEMs and DEGs. The comparison was conducted using a Limma R package based on NetworkAnalyst 3.0 (https://www.networkanalyst.ca/NetworkAnalyst/home.xhtml) [12]. The "P value < 0.05" and "|log2 fold change (FC)| ≥ 1" indexes were set as ltering criteria to screen for signi cant DEMs. False-positive results could be corrected using the adjusted P value from the Benjamini-Hochberg method, and the "adjusted P value < 0.05" and "|log2 fold change (FC)|≥1" indexes were used as primary ltering criteria to screen for DEGs.

Identi cation of miRNA targets
The MiRWalk 2.0 database ( http://mirwalk.umm.uni-heidelberg.de/ ) integrates some miRNA databases and provides a huge amount of predicted and experimentally validated information about the binding targets of miRNAs [13]. We obtained the results of signi cant DEMs and their target mRNAs from the MiRwalk 2.0 retrieval system. Then, we used an online webtool, Venn diagrams (http://bioinformatics.psb.ugent.be/webtools/Venn/) to screen the genes from the intersection between the DEGs obtained from the GEO dataset and the targeted mRNAs predicted by MiRwalk 2.0. These intersected genes were selected as signi cant DEGs.

Functional Enrichment Analysis
Gene ontology (GO) is a tool used for gene annotation that functions through the utilization of a de ned, structured, and controlled vocabulary, including 3 categories, namely biological processes (BP), cellular components (CC), and molecular functions (MF) [14]. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database used to assign sets of DEGs to speci c pathways [15]. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/) is an online database that offers gene enrichment analysis and functional annotation clustering from various genomic resources [16]. GO and KEGG pathway analysis on signi cant DEGs was conducted using the DAVID database. The species selected was Homo sapiens, and the "adjusted P value < 0.05" was considered to be of statistical signi cance.

Analysis of protein-protein interaction (PPI) networks
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) (https://stringdb.org/cgi/input.pl) is an online database, which can predict interactions of proteins by neighborhood, gene fusion, co-occurrence, co-expression experiments, databases, and text-mining [17]. Signi cant DEGs were mapped into PPIs and a combined score of > 0.4 was set as a threshold value in this study. More speci cally, nodes with higher degrees of interaction were considered as hub nodes.
2.6 Analysis of miRNA-mRNA regulatory networks TargetScan (http://www.targetscan.org/) is a database that can be used to predict miRNA targets by matching conserved 8mer and 7mer sites with the seed region of an input miRNA [18]. MiRDB (http://www.mirdb.org/ ) is an online database that predicts miRNA target genes based on high throughput sequencing data [19]. In addition, miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/) is an experimentally-veri ed miRNA target gene database [20]. Target genes that were veri ed by at least 1 of the 3 databases (TargetScan, miRDB, or miRTarBase) were selected as nal sreened-out target genes for signi cant DEMs, and were used to build a miRNA-mRNA pairing le. Finally, the miRNA-mRNA regulatory network was constructed using Cytoscape, an open source bioinformatics software program used to visualize molecular interaction networks [21].

Identi cation of differentially expressed miRNAs / differentially expressed genes
We compared the miRNA and mRNA expression pro les of the whole blood samples obtained from the 5 patients with BPD and 6 healthy controls. After statistical validation, the results of NetworkAnalyst analysis showed that there were 19 signi cant DEMs identi ed. Among them, 10 DEMs were upregulated, whereas 9 DEMs were downregulated (Table 1). In total, 207 DEGs were identi ed, including 168 upregulated and 39 downregulated (Table S1). The distribution of whole DEGs is shown in volcano plots (Fig. 1B), while the top 50 DEGs are shown in a heatmap (Fig. 1A).

Identi cation of miRNA targets
Using the MiRWalk 2.0 validated-target miRNA-gene retrieval system, we obtained a collection of candidate genes of the 19 signi cant DEMs. The intersection number between these candidate genes and the DEGs screened out by NetworkAnalyst was 173, with 140 genes being upregulated and 33 genes downregulated (Table S2). Therefore, the 140 upregulated and 33 downregulated genes were collected as candidates for the nal pro les of signi cant DEGs, including ADM, WNT3, WNT16, ZNF532 and ZNF608.

GO and KEGG Enrichment
The GO was used to identify key biological functions corresponding to 1 of 3 term categories, namely, biological process, cellular component, and molecular function. Results showed that the most signi cant GO terms in each term category for 173 signi cant DEGs were "antigen processing and presentation of peptide or polysaccharide antigen via MHC class II", "MHC class II protein complex", and "MHC class II receptor activity" ( Table 2, Fig. 2A). Furthermore, the identi ed signi cantly enriched KEGG pathways of signi cant DEGs were demonstrated to be about infection, in ammation, and immune response ( Table 2, Fig. 2B). Among them, virus infection, antigen processing and presentation, and B cell receptor were identi ed as those that might play an important role in the pathogenesis of BPD. Similarly, the results of GSE32472 were analyzed by Pietrzyk et al. suggested that the signaling pathway of the hematopoietic cell lineage, allograft rejection, asthma, intestinal immune network for IgA production, and cell adhesion molecules (CAMs) were also involved in BPD [22]. Moreover, the hematopoietic cell lineage signaling pathway could be obtained in all 3 measurements in their study.
Regarding the results of GSE125873 which were analyzed by Ryan et al., they also found the signaling pathway of CAMs, phagosome, systemic lupus erythematosus, and leishmaniasis. In addition, they also obtained the processes of "antigen processing and presentation of exogenous peptide antigen via MHC class II", "antigen processing and presentation of peptide or polysaccharide antigen via MHC class II" and "MHC class II receptor activity" in the terms of the GO-BP category [23].

PPI Network and Hub Gene
The resulting PPI networks were constructed by 173 signi cant DEGs (Fig. 3), constituted by 166 nodes and 334 edges in total. In a PPI network, the genes which have more edges are known to play more important roles (like a seed). The "Degree" was used to count edge numbers of every gene in a PPI network. The top 10 genes, which were identi ed as hub genes, are shown in Table 3. Interestingly, all of 10 hub genes, such as CD19, CD79B, CD74 and CD72 were observed to be upregulated in the blood of infants with BPD compared with normal preterm infants. These genes were also shown to be related to the downstream signaling events of the B cell receptors and of the hematopoietic cell lineage.

miRNA-mRNA Regulatory Network
Target genes that were veri ed by at least 1 of the 3 databases (TargetScan, miRDB, or miRTarBase) were selected as nal screened-out target genes for DEMs. As illustrated in Fig. 4, 1 miRNA can target 1 or more mRNAs to regulate their functions. Moreover, 1 or more miRNAs might interact with the same mRNA. For example, miR-4286, miR-425-5p and miR-3490-5p were shown to interact with the AF4/FMR2 family member 3 (AFF3). Noted, AFF3 encodes a tissue-restricted nuclear transcriptional activator that is preferentially expressed and located in the nucleus of B cells and might play a role in lymphoid development and oncogenesis [24]. Overall, there were 25 mRNA nodes discovered, including BCL11A, CDK14, MOB3B, and TCF4, which were noted to interplay with at least 2 miRNAs.

Discussion
Bronchopulmonary dysplasia is one of the most common complications arising in preterm infants, especially in those born underweight and those of small gestation weeks. It has been reported that up to 70% of babies born before 26 wk of gestation will develop BPD [25]. The progression of BPD is known to be driven by multiple mechanisms, with the participation of a few important protein and signaling pathways, such as thec vascular endothelial growth factor (VEGF), interleukin (IL), and phosphatidyl inositol-3-enzyme-serine/threonine kinase (PI3K-AKT) signaling pathway [26]. Therefore, it is important to clarify the pathophysiology of BPD and discover means of early diagnosis and treatment-related biomarkers. Bioinformatics analysis and e cient microarray might be conducive to our understanding of the molecular mechanisms of disease occurrence and development, thus helping the exploration of genetic alternations and identi cation of underlying diagnostic biomarkers.
In this study, we screened out 19 signi cant DEMs, of which 10 were shown to be upregulated, whereas 9 downregulated. Rcesults of functional enrichment analysis indicated that these signi cant DEGs were associated with the virus infection, antigen processing and presentation, B-cell receptor, phagosome, hematopoietic cell lineage, and CAMs signaling pathways in BPD. Among these signaling pathways, the CAMs pathway was also obtained in the analysis of GSE32472 [22] and GSE125873 [23]. In these 2 series they both identi ed the signaling pathway of T-cell receptor which was not obtained in our study, maybe due to the use of different grouping methods and sampling time. Key DEGs, such as CD19, CD22, CD72, CD74, MS4A1, and FCGR2B were identi ed as hub genes in PPI networks. Moreover, through the construction of the PPI network, we could recognize key genes, with which miRNAs might interplay with. Despite ltering the genes with the potential targets of the 19 signi cant DEMs, we could still identify 140 upregulated and 33 downregulated genes. Hence, considering the total number of DEMs, the enormous and complex miRNA-mRNA regulatory network could be unimaginable. The hub genes of a network are known to always be important, resembling "seeds", that could combine different signal pathways.
Furthermore, some of these DEGs were validated and found to be correlated with BPD. One of the DEGs, called adrenomedullin (ADM), which was found to be downregulated in our study, was shown to be regulated by hsa-miR-423-5p, hsa-miR-3940-5p, hsa-miR-767-5p, and hsa-miR-4301. Moreover, ADM has been shown to have potent angiogenic, anti-in ammatory, and antioxidant properties. Zhang et al. reported that ADM de ciency in human pulmonary microvascular endothelial cells (HPMEC) resulted in signi cantly increased the generation of hyperoxia-induced reactive oxygen species and cytotoxicity compared with ADM su cient HPMEC, nally causing BPD [27]; however this nding remains to be validated. Likewise, WNT 3/16 were demonstrated to be upregulated in our study through many miRNAs, such as the underexpressed hsa-miR-767-5p, hsa-miR-5681b, hsa-miR-423-5p, hsa-miR-3940-5p, and hsa-miR-3960.The WNT family have also been found to be associated with the development of BPD.
Hyperoxia is known to increase the expression of WNT2b, WNT 5a, WNT 9a, and WNT 16, and decrease the expression of WNT 4, WNT 10a, and WNT 11 [28]. The WNT family of proteins includes a large number of members that control a variety of developmental processes, including cell fate, proliferation, polarity, and migration [29]. Li et al. found that patients with BPD were characterized by an increased activity of Wnt/β-catenin [30]. Similar to that, we also found an increased expression of WNT 16 in BPD, but the mechanisms by which WNT3 might cause BPD remain to be explored. Among the identi ed DEGs, we also found the upregulated TLR10, which was shown to be regulated by miRNA, such as the underexpressed hsa-miR-767-5p, hsa-miR-5681b, hsa-miR-423-5p,hsa-miR-3940-5p, and overexpressed hsa-miR-33a-5p, and hsa-miR-337-5p, to be enriched in the immune response process term of the GO-BP category. Toll Like Receptors (TLRs) are known to play an important role in regulating in ammation, maintaining mucosal homeostasis and preventing bacterial invasion [31]. Rising evidence has implied that the TLR signaling pathway is the pivotal component of the pulmonary homeostatic program that abrogates lung in ammation and injury [32]. Many studies were aimed at Toll-interleukin 1 receptor domain-containing adaptor protein (TIRAP). Researchers have also found that TLR5 and TLR4 were associated with the occurrence of BPD via the MyD88-dependent pathway [33,34], and TLR10 was reported to active the TRL4 signaling pathway. So, TLR10 might also be related to the occurrence of BPD; another nding that requires con rmation.
In this study, we screened out 19 DEMs, suggested to modulate the expression of DEGs and contribute in the regulation of many pathways. Besides, we also found that single miRNA could interplay with many mRNA species, as well as that a single mRNA could also interplay with many miRNA species. Although most of them have not been reported in the mechanisms so far studied in patients with BPD, we could still obtain some information from existing studies. One such case was the hsa-miR-15b-5p, one of the identi ed DEMs in our study. Zhang et al. found that miR-15b-5p was upregulated in BPD mice [35]. Fu et al. have also reported that it has a protective action against oxidative stress in HG-stimulated podocytes [36], while Ezzie et al. found that it was increased in patients with chronic obstructive pulmonary disease (COPD) and could potentiate the progression of brosis in lung tissues [37].As such, overexpression of the hsa-miR-15b-5p in the BPD blood samples in our results, might indicate a similar underlying association. Another DEM, hsa-miR-301a-3p, which was overexpressed in our study, was demonstrated to modulate DEGs, such as TLR10, CD72, and BMP3. This effect has been previously observed in animal experiments. The study by Dong et al. showed the overexpression of miR-301a in a murine model of hyperoxia-induced bronchopulmonary dysplasia [38]. Therefore, hsa-miR-301a-3p might also play a role in the mechanism of BPD development in infants, which remains to be validated.
Although we investigated the miRNA-mRNA regulatory pathway in BPD using bioinformatics methods, our study had some limitations that should be clari ed. First, the samples were limited and might have led to high false-positive rates and one-sided results. Therefore, it is required to improve the detection power by integrating more datasets in future studies. Second, the source of microarray data was only from blood samples. Body uids that could be noninvasively obtained in the clinic, such as sputum and urine might also contain miRNAs. Third, to con rm the mechanisms of hub genes related to BPD, it will be helpful to add some in vitro or in vivo experiments to validate our results. Forth, due to the absence of clinical data, we were unable to assess the relationship between DEMs and the severity of BPD. More clinical and demographic characteristics of infants with BPD are required for further analysis. Finally, experimental evidence, obtained from wet research, such as western blot, real-time PCR and immunohistochemistry assays are required to better delineate the role of hub genes and the potential mechanisms of BPD.
In this study, we found multiple miRNA-mRNA regulatory pathways and potential biomarkers of BPD, in line with our current knowledge of the pathophysiology of this disease. We believe that this hypothesisgenerating study offers a new insight into the molecular mechanisms of BPD through the and identi cation of several latent biomarkers that could be used toward its diagnosis and treatment.

Conclusions
we identi ed 19 signi cant DEMs that might play key roles in the regulatory networks of BPD. The higher expression of miRNAs, such as miR-15b-5p, hsa-miR-32-5p, miR-3613-3p, and miR-33a-5p, and the lower expression of miRNAs, such as miR-3960, miR-425-5p, and miR-3202 might be related to the occurrence and development of BPD. However, we still need further experimental studies to test our results.

Availability of data and materials
The datasets generated and/or analysed during the current study are available in the GEO repository, https://www.ncbi.nlm.nih.gov/geo/.