The prediction of the crucial regulatory factors and functional analysis in pulmonary arterial hypertension by re-annotation

Background Pulmonary arterial hypertension (PAH) is characterized by a progressive increase in pulmonary vascular resistance, which increases the right ventricular load and eventually leads to right ventricular failure and death. Recent theoretical developments have revealed that quiet a few lncRNAs played an important role in the development of pulmonary vascular diseases, such as H19, tcons_00034812. However, the mechanism of mRNA regulated by lncRNA in the PAH remained unclear. Results In our study, we re-annotated Affymetrix Murine Genome array to Human reference genome(hg38) by using the data from GENCODE database, in the light of lncRNA and mRNA. We screened differentially expressed lncRNA/mRNA between PAH samples and normal samples. Then we constructed PAH related lncRNA/mRNA co-expression network and analyzed topological properties of the network. We excavated the modules by the WGCNA R package. The genes in the modules excavated are used for enrichment analysis of functions and pathways. Finally, 8 functional modules were identied as pathogenic modules, providing new ideas for understanding the potential mechanisms of PAH. Conclusions We generated a new strategy to do research on pulmonary arterial hypertension. The pathogenic modules played an important role and could offer new therapeutic targets for pulmonary arterial hypertension mechanism research.


Background
Pulmonary arterial hypertension (PAH) is a multifactorial and progressive disease that is usually associated with in ammatory factors, cell metabolites, genetic abnormalities, and hemodynamic changes. Although the cause of the disease is various and the classi cation of diagnosis is complex, its pathological features are very similar, mainly presenting as progressive increase of pulmonary artery vascular resistance, which leads to increased right ventricular load and eventually death due to right ventricular failure. The main pathological basis for its sustainable development is hypoxic pulmonary vasoconstriction, vascular remodeling and formation of in situ thrombosis [1][2][3][4]. Both coding RNAs and non-coding RNAs play an increasingly crucial role in PAH [5,6]. Coding RNAs were involved in cell metabolism by encoding proteins, while non-coding RNAs could exert functions by regulating coding RNAs or by binding directly to proteins. Long non-coding RNA (lncRNA) is a class of non-coding RNA with a length of more than 200 nucleotides, which can regulate translation, splicing and gene expression in complex biological regulation process [7,8].
LncRNAs has captured wide attention because of the irreplaceable role in a multitude of diseases, including cardiovascular diseases, tumors and autoimmune diseases [9][10][11][12]. Much of the research in the last two decades has shown that some lncRNAs played an important role in the development of pulmonary vascular disease. LncRNA caused the dysfunction of microvascular and eventually led to the disease by indirectly regulating the downstream genes' expression [13]. For instance, Su H et al. demonstrated that lncRNA H19 was highly expressed in the serum and lungs of PAH rats, was involved in regulating the proliferation of human pulmonary artery smooth muscle cells(PASMCs), and that the myocardial hypertrophy in mice with pulmonary arterial hypertension could be improved after MALAT1 silencing [14]. Liu et al. showed that tcons_00034812 can promote the expression of transcription factor STOX1 and regulate the function of PASMCs through MAPK signaling pathway [7]. However, there were only a few lncRNAs that had been veri ed by the low through-put experiments related to the PAH. High throughput experiments (gene microarray, RNA-seq) could detect many lncRNAs by one experiment.
However, as a result of the high cost of RNA-seq, there is little or no publicly available PAH-related RNA sequence data. Compared with RNA-seq, gene microarray analysis is not only cheaper but also more sensitive [15,16]. Zhang et al. found the expression of 1056 lncRNAs in mouse CH samples by reannotation [17], others also get lncRNAs by re-annotated probes of gene microarray [18,19]. The accuracy and consistency of this method have been veri ed by researchers [20]. To compare between species, we have set up some matching rules (See Materials and Methods for details) to ensure that all the acquired genes are homologous. Therefore, we can detect the expression of lncRNAs by recombination of probes in gene microarrays, which is considered a viable way.
In this paper, we constructed a network by using the data from Gencode database and miRanda algorithm, in the light of lncRNA and mRNA. We extracted the PAH related lncRNA/mRNA network by processing the gene microarray data to obtain the differentially expressed genes, investigate different topological properties of the network and miner the modules. The genes in the modules excavated are used for enrichment analysis of functions and pathways. Finally, functional modules were identi ed as pathogenic modules, providing new ideas for understanding the potential mechanisms of PAH. Result 1. Construction of LncRNA/mRNA expression pro le in human PAH based on probe re-annotation As a result of the data shortage of the lncRNA microarry annotation data in the GEO database, some studies had shown that lncRNA annotation data could be exploited by the re-annotation methods, so we developed a measure for re-annotation and performed it to annotate the lncRNA and mRNA annotation sets (See Materials and Methods ). Firstly, we downloaded one open gene expression pro le GSE482 of PAH with 3 platforms (AV2 platform contain 8 PAH, 1 control; BV2 platform contain 8 PAH, 3 control; CV2 platform contain 8 PAH, 2 control), which probe sequences were supported by Affymetrix Murine Genome U74A Version 2 Array, Affymetrix Murine Genome U74B Version 2 Array, Affymetrix Murine Genome U74C Version 2 Array, respectively. Sequence alignment algorithm was applied between probe annotation sequences and human long non-coding transcript sequences, protein-coding transcript sequences from Gencode database, respectively, using the Blastn tools. Then through the three-level quality ltering, we identi ed two (lncRNA-probe, mRNA-probe pairs) annotation sets, respectively (Additional le 1: Table S1, Additional le 2: Table S2 and Additional le 3: Table S3).

Screening for differentially expressed LncRNA/mRNA
According to the re-annotation results, we calculated the 3 platforms by T test (See Materials and Methods ) to identify the differential expressed LncRNA/mRNA between PAH samples and normal samples. We considered p-value < 0. 05 was statistically signi cant. Finally, the AV2 platform screened BV2 platform screened 123 differentially high expressed LncRNA/mRNA, 188 differentially low expressed LncRNA/mRNA; The CV2 platform screened 62 differentially high expressed LncRNA/mRNA, 81 differentially low expressed LncRNA/mRNA. Of the three platforms, 356 up-regulated LncRNA/mRNA and 384 down-regulated LncRNA/mRNA were screened out (Additional le 4: Table S4). The expression pro les of differentially expressed LncRNA/mRNA were constructed on 3 platforms.

Construction of lncRNA/mRNA network related to PAH
To reveal how lncRNA may mediate transcription in PAH by affecting mRNA, a LncRNA/mRNA network based on the 3 platforms was constructed and visualized using Cytoscape software. The Pearson correlation coe cient between one pair of differentially expressed lncRNA and mRNA was >0. 3, p-value < 0. 05, which was connected, then 3 platforms for PAH related lncRNA/mRNA interaction networks were constructed ( Fig. 1). There were 296 nodes and 4690 edges in AV2 platform, 311 nodes and 29630 edges in BV2 platform, 143 nodes and 781 edges in CV2 platform. The genes with the highest degree of node in each platform are listed in table 1.
4. Functional modules identi cation of lncRNA/mRNA networks associated with PAH was performed using WGCNA Using the WGCNA package in R to conduct module mining in the lncRNA/mRNA network related to PAH in three platforms (Additional le 5: Table S5, Additional le 6: Table S6 and Additional le 7: Table S7). Among them, 8 modules were identi ed in the AV2 platform ( Fig. 2A), 2 modules were identi ed in the BV2 platform (Fig. 2B), 3 modules were identi ed in the CV2 platform (Fig. 2C). GSE482 chip samples can be divided into the following ve characters: mouse with hypoxia for 10 hours, 2 days, 4 days, 6 days respectively and normoxia. Then we test the relevance between the characters and the modules divided from the three platforms.
The relationship between modules and traits showed that in the AV2 platform (Fig. 3A), the properties of 5. Functional and pathway enrichment analysis of genes in modules Both the green module of AV2 platform and the grey module of CV2 platform related to 10 h showed that the "nervous system/ neuron development" changed at this time. In addition, the green module of AV2 platform also showed that some genes were related to "protein localization" (Fig. 5A C).
The turquoise module of BV2 platform display that "nucleic acid metabolic process" and "cellular macromolecule metabolic process" occurs in 2 days (Fig. 5B ). Further enrichment analysis of the signaling pathways by KEGG revealed that most of the genes were enriched in "thyroid hormone signaling pathway" and "oxytocin signaling pathway".
In 6 days, the turquoise module of AV2 platform showed that abnormal metabolic processes, including primary nitrogen compound organic substance and macromolecule metabolic process (Fig. 5A. The complete picture is shown in Additional le 8: Fig. S1). The grey module of BV2 platform indicated that differentially expressed genes were enriched in Separation of Sister Chromatids and Oocyte meiosis signaling pathways (Fig. 5B). Moreover, the blue and turquoise modules of CV2 platform revealed changes in nuclear specks, also known as splicing speckles (Fig. 5C). All the above results indicate that cells have entered abnormal proliferation and division at this stage.
Eventually, the 8 modules most closely related to these traits will be regarded as pathogenic modules.

Discussion
As far as we know, there is no speci c clinical treatment for pulmonary arterial hypertension. One way to overcome this problem is to nd the relevant genes by bioinformatics method. We therefore analyzed lncRNA and mRNA of PAH samples and investigated whether some mutations in lncRNA/mRNA were signi cant. Finally, we found that different progress stages of PAH are closely related to 8 disease modules. The initial detection was hypoxia for 10 hours. At this time, hypoxia caused changes of neuron/nervous system, which resulted in the increase of extracellular secretion of neurotransmitters such as ACh and ATP. The neurotransmitters further stimulated breathing. Previous studies suggest that changes in respiratory network neurons in rats undergoing chronic intermittent hypoxia in an experimental model of neurogenic hypertension lead to sympathetic overactivity and affect arterial pressure levels [21]. Hypoxia inhibits K+ channel in carotid body type 1 cells, a special group of tissues that sense acute changes in local oxygen tension, bring about depolarization and increasing intracellular calcium concentration, [Ca++]i leads to exocytosis of neurotransmitters, such as ACh and ATP, which stimulating the carotid sinus nerve and respiration. This mechanism is likely to be related to idiopathic pulmonary arterial hypertension through E K Weir et al [22]. Although there is no clear evidence that PAH is associated with neuronal development, it is undeniable that hypoxia leads to changes in the nervous system. Accordingly, changes in the nervous system may be the starting factor of PAH.
After 2 days of hypoxia, the thyroid hormone signaling pathway began to change. 4 days later, the ILK (Integrin Linked Kinase) and TBL1X (Transducin Beta Like 1 X-Linked) have changed. ILK is a Protein Coding gene that mediates cell growth and apoptosis. It is noteworthy that its expression and activity were increased in human aplastic thyroid cancer [23,24]. Proteins encoded by TBL1X was found as a subunit in corepressor SMRT (silencing mediator for retinoid and thyroid receptors) core complex [25,26].
Consequently, the pathogenesis of PAH is closely related to thyroid diseases. The results have been con rmed by other scholars' research. In a study that evaluated 114 patients with hyperthyroidism the prevalence of PAH was found to be 43% [27]. However, in another study, the authors found the prevalence of autoimmune thyroid disease in the 63 patients with PAH to be 49% [28]. So some people hold the view that PAH causes thyroid diseases [29], while others consider that thyroid disorders lead to PAH [28].
Although the pathogenesis of this relationship can not be fully explained at present, this study may provide some guidance for clinical testing.
In addition, on the 4 day of hypoxia, the RAPGEF4 (Rap Guanine Nucleotide Exchange Factor 4) RGS6 (Regulator Of G Protein Signaling 6) and NKX2-3 have changed. Proteins encoded by RAPGEF4 and RGS6 are confers the GTPase-activating activity [30,31]. GTPase catalyzes GTP to GDP and then to ATP.
Nevertheless, the increase of ATP is obviously pernicious factors to PAH [32][33][34][35]. Furthermore, a few complexes gradually formed at this stage, like "Ubiquitin E3 ligase (SIAH1, SIP, SKP1A, TBL1X)". Studies have shown that inhibition of ubiquitin proteasome function prevents monocrotaline-induced pulmonary arterial remodeling [36]. It is thus clear that these factors are aggravating the formation of PAH.
On the 6th day of hypoxia, PAH entered an abnormal period of proliferation and division. One of the most prominent morphological characteristics of PAH was the proliferation of PASMCs. All in all, we seem to have found the origin of PAH, which is related to some changes of neuron/nervous system. We will focus our future research on this part of differential lncRNA/mRNA network in order to speed up the discovery of the pathogenesis of PAH. Although our research still has limitations, we generated a new strategy to do research on PAH. The pathogenic modules played an important role in the PAH and could offer new therapeutic targets for PAH mechanism research.

Conclusions
In summary, these analyses indicate that different progress stages of PAH are closely related to 8 disease modules, and that the origin of PAH seems to be related to some changes in neuron/nervous system. The key contribution of this work is that we generated a new strategy to do research on PAH. The pathogenic modules played an important role and could offer new therapeutic targets for PAH mechanism research.

Data download and preprocessing
The expression pro le of mouse pulmonary arterial hypertension (GSE482) chip was downloaded from GEO database. GSE482 chip data includes GPL81, GPL82 and GPL83 platform data. GPL81, GPL82 and GPL83 platform data correspond to three chips MG_U74Av2, MG_U74Bv2 and MG_U74Cv2 respectively. GPL81 (replaced by AV2) contains 9 samples (8 disease samples, 1 normal sample), GPL82 (replaced by BV2) includes 11 samples (8 disease samples, 3 normal samples), GPL83 (replaced by CV2) includes 10 samples (8 disease samples, 2 normal samples). Samples from the three platforms had the same trait information, that is, mouse with hypoxia for 10 hours, 2 days, 4 days, 6 days respectively and normoxia. The FASTA format data of mouse MG_U74Av2, MG_U74Bv2 and MG_U74Cv2 platform probe sequences were downloaded from Affymetrix website. The FASTA format data of long non-coding RNA and proteincoding gene sequences of human hg38 are downloaded from Gencode database.
In order to convert the microarray expression data of mice into the expression data of human proteincoding genes and lncRNA, according to the principle of sequence similarity, the microarray data of mice on three platforms were re-annotated by BLASTN software, i. e. aligned with the human PCG and lncRNA sequences in Gencode. The comparison rules are as follows: (1) The probe sequence is completely matched in the transcript sequence of PCG or LncRNA.
(2) If a probe sequence is aligned to multiple genes (or lncRNA), the probe is ltered out.
(3) Each lncRNA (or PCG) has at least three probe sequences.
Finally, combined with the data of mouse microarray expression pro les in GSE482, the expression pro les of human mRNA and lncRNA were constructed.
Screening differentially expressed lncRNA and mRNA Differentially expressed LncRNA/mRNA was screened in the expression pro le of LncRNA/mRNA. If foldchange was greater than 1.2 and p-value<0.05, it would be regarded as differentially expressed LncRNA/mRNA; if foldchange was less than 5/6 and p-value<0.05, it would be regarded as differentially low expressed LncRNA/mRNA. Then the differentially expressed LncRNA/mRNA pro les of the three platforms were constructed.
Construction of lncRNA-mRNA network related to pulmonary arterial hypertension The Pearson correlation coe cients for each two differentially expressed lncRNA/mRNA were calculated in three platforms. If the Pearson correlation coe cients for each two differentially expressed lncRNA/mRNA values were greater than 0. 3 and p-value < 0. 05, then the two differentially expressed lncRNA/RNA were connected. Cytoscape was used to construct the lncRNA-mRNA network associated with pulmonary hypertension. Furthermore, three platforms of lncRNA/mRNA interaction network related to pulmonary hypertension were constructed. The node degree of lncRNA/mRNA interaction network on the three platforms is also analyzed.

Mining Modules in lncRNA-mRNA Networks
The WGCNA R package was used to mine the modules in the lncRNA-mRNA network associated with pulmonary hypertension, and the relationship between the modules and the individual characteristics was calculated by combining the sample character information (hypoxia for 10 hours, 2 days, 4 days, 6 days respectively and normoxia.). The co-expression of lncRNA/mRNA on three platforms was analyzed.
Functional enrichment analysis of lncRNA/mRNA in modules g:Pro ler(https://biit.cs.ut.ee/gpro ler/) is used to conduct enrichment analysis of functions and pathways of genes in the modules excavated from the three platforms. The function modules were identi ed as pathogenic modules.

Consent for publication
Not applicable.

Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its Additional les.