A network-based approach is used to collectively study differentially expressed human cmiRNAs during high altitude ascent. As mentioned in the methodology, the literature associated with high altitude cmiRNAs was extracted from the PubMed database. This literature dataset was ranked based on the training model that was developed using discriminative word clouds of high-altitude adaptation. The 299 cmiRNAs are curated from the four [14, 16, 20, 67] out of six PMIDs and. 959 human volunteer samples were analysed, which includes native Tibetans -362, newly arrived migrants to high altitude -126, native low landers -233 and 238 long-time migrants to high altitude.
These 299 differentially expressed high altitude specific human CmiRNAs experimentally validated target interactions were identified using “MultimiR 2.0” R package [29]. The miRNA-target interactions were further classified as miRNA-gene and miRNA-TF interaction by labelling miRNA targets as TFs or genes using all TFs and TcoFs dataset. The final datasets contain 3455 miRNA targets with 2991 genes and 464 TFs respectively. In terms of interactions, there are 8519 miRNA- genes and 1655 miRNA –TF interactions. To construct miRNA-TF-gene co-regulatory network 364 TF-miRNA and 566 TF- gene interactions were added from their validated databases as mentioned in methodology. The final directed tripartite miRNA –TF gene co-regulatory network contains 3754 nodes (299 CmiRNAs, 509 TFs and 2936 genes) and 13407 interactions and visualized using Cytoscape 3.2.1[43]. Degree distribution topological analysis was performed on the miRNA-TF gene coregulatory network showed that it follows the power law and network has scale free topology. The scale-free topology is very common in biological networks.
Functional Annotation and Gene Set Enrichment Analysis
The targets of miRNA-TF-gene coregulatory network was functionally annotated with Gene Ontology (GO) using DAVID server. The functional enrichment analysis of 2991 target genes in the network produced 3300 GO terms. These terms were again subjected to Gene Set Enrichment Analysis (GSEA), yielded 463 enriched sets/clusters. Further quartile distribution based on enrichment score of 463 enriched sets has yielded many outliers (Figure 2). The outliers can be divided into two categories based on their grouping exclusive terms and highly exclusive terms. The exclusive terms can be seen as the extended part of data in the fourth quartile (Figure 2). But highly exclusive terms were so far from mainstream quartiles that it can’t be included. Hence, these highly exclusive terms can be represented as most enriched gene sets in the network. The 5 most enriched sets that were obtained are phosphoprotein/protein kinase, cell-cell adhesion junction proteins, membrane proteins, glycoproteins, tyrosine protein kinases. These topmost terms are phosphoproteins/protein kinase with an enrichment score of 61.33. This GO term has been apparently observed in many GO analysis of other hypoxia datasets because protein kinases are major signal transducing molecules that can transduce hypoxic stress signals in the cytoplasm to the nucleus, where activation of many hypoxia responsive factors such as HIF1, NFKB takes place. Other clusters that appeared were cell adherence proteins and membrane proteins with second and third best enrichment score of 38.7 and 28.7 respectively.
FFL analysis.
Biological systems are often built of recurring circuit modules that carry out key functions. Biomolecules such as miRNA, transcription factors and genes follow these recurring patterns in co-regulatory networks that regulate the molecular and cellular responses of living cells. These recurring patterns are called motifs, FFL is the most common type of motifs that are found overrepresented in most biological networks especially in regulatory networks. These FFL motifs consist of three entities miRNA, TF and gene. The TF and its target gene are directly regulated by the miRNA, and simultaneously miRNA or TF regulate each other. Recently these FFLs have helped the researcher in identifying important regulatory mechanisms during many disease pathologies. Since miRNAs signalling involves the functioning of the enormously complex system of molecular interaction at the system level. Hence, identifying FFL motifs in complex networks could further reveal the regulatory modules that can be used to define particular ecological traits at high altitudes. FFLs were identified in the directed miRNA-TF gene co-regulatory network by using the ‘motif discovery’ plugin of Cytoscape. The network contains 470 FFLs, having 494 nodes (82 miRNA, 85 TF, 327 genes) and 2942 edges between them.
Motif Evaluation-
Edgebetweeness Biological Network Motif (BNM)
The FFL network is evaluated based on the two functions. First, Edgebetweeness Biological Network Motif (BNM), which works on the principle of identifying edgebetweeness centrality in the motif network. Edgebetweeness centrality is defined as the topological coefficient for the number of shortest paths going through an edge. Higher the edgebetweeness centrality between two nodes in a network motif, the more significant chances of it to be biologically overrepresented. The EDGEBETWEENNESS -BNM analysis yielded MYC-CDC25A as the most significant bottleneck interaction in the network (Figure 3). The other more significant interactions in the cluster are hsa-miR-17-5p- HIF1A, SP1- APP, VEGFA- VHL. The second cluster of exclusive outliers interactions involves hsa-miR-26a-1-3p - CDC25A, hsa-miR-335-5p – EGLN, MYC – CRKL, hsa-miR-193b-3p - ADCY9, HIF1A - hsa-miR-210-3p, hsa-miR-210-3p - HIF1A, hsa-miR-26a-1-3p - TP53, CDC25A - CCND1, hsa-miR-26b-5p - CCND1, hsa-miR-335-5p - BRCA1, hsa-miR-335-5p - SP1, hsa-miR-16-5p – CRKL, hsa-miR-16-5p - SP1 interaction.
EDGEGO-BNM
FFL motif network is evaluated second time with EDGEGO-BNM evaluation. For this, first pathways enrichment of genes and TFs in the FFL network have been performed that shows 1067 pathways and 27 pathways respectively. The quartile distribution of 1067 pathways terms from pathways enrichment of genes yielded with no outliers. Whereas, quartile distribution from pathways enrichment of TFs generated four outliers i.e. ‘nuclear receptor transcription pathway’ is the most significant term obtained in pathway enrichment of TFs followed by ‘circadian gene expression’, ‘clock genes’ and ‘PPARA activates gene expression’. An outlier is a data point that differs significantly from other observations and in general, the enrichment p.values have been calculated by comparing the observed outcomes of an annotation term with the expected outcomes; individual terms that are beyond some limit or deviated from the original dataset said to be enriched. So, ‘nuclear receptor transcription pathway’ was adjudged as the most enriched pathway term and its gene sets (PPARA, RARG, PPARG, NR3C2, NR4A2, ESR1, RARA, AR and NR3C1) with their interacting miRNA, TF and gene partners were used to construct EDGEGO-BNM network. Nuclear receptors and their induced transcription factors are a consortium of proteins that are responsible for sensing hormones like steroid, thyroid and certain other molecules to regulate the specific gene expression in the nucleus, thereby controlling the development, homeostasis, and metabolism of the organism. Nuclear receptors are found essential for maintaining body homeostasis and metabolism at high altitude [70, 71] . The EDGEGO-BNM network has 23-miRNAs, 29-TFs, and 23- genes with 302 interactions (Figure 4a). Subsequently, edgebetweeness centrality of all its 302 interactions have been identified and on quartile distribution 7 outliers are obtained (Figure 4b). These are exclusive interactions in the network with hsa-miR-5582-5p - SP1, ESR1 – PTPRG, ESR1 - SP1, hsa-miR-335-5p - SP1, ERG - ESR1, ESR1 - TSC22D3, ESR1 - CCND1, hsa-miR-130a-3p - ESR1, hsa-miR-335-5p - NR4A2 are significantly enriched and bottleneck edges of the network.
The combined detailed evaluation FFL motif network through EDGEBETWEENESS-BNM, EDGEGO- BNM revealed overrepresented 9 FFLs that could be part of regulatory module controlling high altitude adaptation mechanism. The targets of these FFLs are being dysregulated by 9 cmiRNAs named; hsa-miR-335-5p, hsa-miR-26a-1-3p, hsa-miR-210-3p, hsa-miR-193b-3p, hsa-miR-17-5p, hsa-miR-16-5p, hsa-miR-5582-5P, hsa-miR-130a-3p. First of these FFL is hsa-miR-335-5p (miRNA) -SP1 (TF)-APP (target gene) having two exclusive interaction hsa-miR-335-5p - SP1 and SP1- APP EDGEBETWEENNESS -BNM network. Similarly, hsa-miR-210-3p (miRNA) - VEGFA (TF)-VHL (target gene) contains VEGFA-VHL and hsa-miR-210-3p- VHL as exclusive interactions from EDGEBETWEENNESS -BNM network. Third FFL hsa-miR-17-5p (miRNA) -HIF1A (TF)-EPAS1 (target gene) contains hsa-miR-17-5p- HIF1A and hsa-miR-17-5p - EPAS1 VHL as exclusive interactions from EDGEBETWEENNESS -BNM network. Hsa-miR-335-5p (miRNA)-EGR1 (TF)-SP1 (target gene). Fourth FFL hsa-miR-16-5p(miRNA) - SP1(TF)-VEGFA(target gene) contains all three interactions hsa-miR-16-5p- SP1, SP1-VEGFA and hsa-miR-16-5p-VEGFA as exclusive interactions in EDGEBETWEENNESS -BNM network. Fifth FFL hsa-miR-5582-5p (miRNA)-SP1 (TF) - NR3C1 (target gene) contains hsa-miR-5582-5p-SP1 and SP1- NR3C1 as exclusive interactions in EDGEBETWEENNESS -BNM network. Sixth FFL hsa-miR-26a-1-3p (miRNA) – MYC (TF)-CDC25A (target gene) contains MYC-CDC25A and hsa-miR-26a-1-3p-MYC as exclusive interactions in EDGEBETWEENNESS -BNM network. Seventh hsa-miR-130a-3p (miRNA) - ESR1 (TF)-PPARG (target gene). Eighth and ninth FFLs are hsa-miR-210-3p (miRNA) - HIF1A (TF)-BNIP3 (target gene) and hsa-miR-210-3p(miRNA)-HIF1A(TF)-LDHA (target gene) respectively as they contain one common exclusive interaction hsa-miR-210-3p-HIF1A. Hence, all interactions associated with these miRNAs were subjected from high altitude specific miRNA-TF gene coregulatory network. The TF and genes targets of this module were functionally annotated to identify key molecular mechanism regulated by the four miRNAs.
Prevalence of molecular responses during acclimatization at high altitude as hallmark.
The molecular responses obtained in miRNA-TF- gene network module which is regulated by the differentially expressed cmiRNAs; hsa-miR-335-5p, hsa-miR-26a-1-3p, hsa-miR-210-3p, hsa-miR-193b-3p, hsa-miR-17-5p, hsa-miR-16-5p, hsa-miR-5582-5P, hsa-miR-130a-3p have been tested to be hallmark responses using knowledgebase ranking of biomedical literature. For this, the training dataset for classification of these molecular responses as a hallmark was built. The training dataset comprises 41 PMIDs that are mined using keyword “acclimatization at high altitude”, “adaptation at high altitude” from PubMed database. The model was trained using Medline ranker advance to identify scoring function based on discriminative word weightage cloud. After performing leave one out cross-validation the model achieved 95.3% of accuracy. The model was used to rank the 13 molecular responses regulated by four cmiRNAs as hallmarks for high altitude adaptation. The ranking pattern of these molecular responses was compared with positive (training dataset) and negative (malaria dataset) control. Quartile distribution of the normalized p.values of ranked molecular responses related literature shows that ‘cytokine signaling’, ‘interleukin signaling’, ‘cell cycle regulation’, ‘cell development and senescence’, ‘body weight homeostasis’ and ‘circadian rhythm’ are normally distributed and their distribution is very close to the positive control. Also, recent reports have proved that these responses appear immediately after high altitude ascent. ‘Erythropoiesis’, ‘nuclear receptor factor regulation’, ‘protection from oxidative stress’ are slightly distant from positive control but slight variation could be possible due to larger sample size of the literature than the training set. Hypoxia responsive genes and adherence and junction genes have unequal data distribution in quartiles. Hence most of the data trend is towards quartiles that are closer to the positive control including outliers that are closest with the positive control trend (Figure 5). Further comparative analysis Area Under Curve (AUC) showed a high correlation between 13 molecular responses and high-altitude adaptation. Hence, above results prove that these molecular responses are hallmark responses for high altitude adaptation.
Phylogenetic analysis
Phylogenetical analysis was performed between 102 common precursor-miRNAs sequences of human and some high-altitude living mammals including Sus scrofa, Bos mutus, Bos taurus and non-mammals Gallus gallus. Further hierarchical clustering of the percentage sequence similarity gave 16 clusters (Figure 6). The sequences having the highest percentage of similarity are more closely related, while the ones with <90% similarity shows the dissimilarity among them. The cluster number 12 contains 6 human precursor miRNAs hsa-miR-497, hsa-miR-335, hsa-miR-212, hsa-miR-190b, hsa-miR-148b, hsa-miR-125b having high percentage similarity with all three mammals. Similarly, cluster number 11 contains 17 precursor miRNAs including hsa-miR-130a that have more than 93% of sequence similarity with mammals and <90% with Gallus gallus. Cluster 13 contains 6 miRNAs including hsa-miR-16 having >90% sequence similarity with all three mammals and Gallus gallus. Whereas, cluster 4 contains 5 miRNAs including hsa-miR-210 and shares >93% sequence similarity with Bos tauras, Bos mutus, Gallus gallus respectively. Lastly, cluster 1 contains 13 miRNAs that have sequence similarity <90% with both mammalian and Gallus gallus. Overall phylogenetical analysis have shown that human cmiRNAs hsa-miR-335-5p, hsa-miR-210-3p, hsa-miR-16-5p and hsa-miR-130a-3p identified in our network analysis have shared > 95% of sequence similarity with the bta-miR-335, bta-mir-210, bta-mir-16-1 and bta-mir-130a of Bos tauras species. These miRNAs are also found differentially expressed in bovine milk at high altitude. Hence, phylogenetic analysis supports our hypothesis that these cmiRNAs are part of regulating modules controlling early adaptive mechanisms in humans during high altitude ascent.
The 102 common miRNAs from Sus scrofa , Bos mutus, Bos taurus and non-mammals Gallus gallus were studied for phylogenetic relationship with human differentially expressed high altitude miRNAs hsa-miR-335-5p, hsa-miR-210-3p, hsa-miR-16-5p, hsa-miR-130a-3p that shares (>95%) of sequence similarity with the bta-miR-335, bta-mir-210, bta-mir-16-1, bta-mir-130a of Bos tauras species (Table 1).
miRNA
|
Percentage Sequence Similarity with (Sus scrofa)
|
Percentage Sequence Similarity with (Bos-tauras)
|
Percentage Sequence Similarity with (Bos-mutus)
|
Percentage Sequence Similarity with (Gallus-gallus)
|
hsa-miR-335-5p
|
100
|
100
|
100
|
<90
|
hsa-miR-210-3p
|
95
|
95
|
<90
|
<90
|
hsa-miR-16-5p
|
100
|
98.5
|
98.5
|
90.8
|
hsa-miR-130a-3p
|
98.4
|
98.4
|
98.4
|
<90
|
Table 1: Percentage sequence similarity of important human miRNAs that are differentially expressed high altitude with their complementary exogenous miRNAs from Sus scrofa, Bos tauras, Bos mutus and Gallus gallus.
Exogenous miRNAs cross-kingdom stability and AGO2 binding.
miRNAs cross-kingdom serum stability was checked by comparing them with sequence and secondary structure properties of gold standard miR-168a showed all proposed miRNAs have ≥0.85 MFEI score, <9 mismatches and have only 1 bulge in pre-sequence (Table 2). But only bta-mir-16b, bta-mir-210 and bta-mir-335 have ≤ 3 Interior loops, whereas bta-mir-130a precursor sequence contains 4 Interior loops (Table 2). This makes bta-mir-16b, bta-mir-210 and bta-mir-335 as more potent exogenous miRNAs to have cross-kingdom stability. The exosomes and AGO2 binding are the most common mechanism studied for transfer and uptake of exogenous miRNAs from peripheral blood to cells. Docking studies of potential exogenous miRNAs are performed with human AGO2 protein showed bta-mir-210 with highest ACE (-999) score (Table 3). This ACE score is almost double of the positive control osa-MIR168a having -429.12 ACE score. Other mRNAs also showed negative ACE responsible for their spontaneous binding reaction. Bta-miR-335 has the least ACE score of -33.86. Strong hydrophobic amino acid residues and amino acids with aromatic rings, which are relatively hydrophobic in nature, generally take part in hydrophobic interactions and contribute towards the stability in binding moieties while the protein interacts with macromolecules. Detailed examination amino acid residues of AGO protein in 3.0 Å vicinity of these miRNA-AGO complex of exogenous miRNAs (bta-mir-130a, bta-mir-16b, bta-mir-210 and bta-mir-335) showed some strongly hydrophobic amino acids, like LEU 269, LEU339, GLY 968 in bta-mir-193b; ILE 365 and VAL 434 in bta-miR-26-1; ALA 369, MET 364 in bta-miR-130a; MET 549, PRO 523, ALA 859, ILE 756 in bta-miR-210, were observed within a distance of 3.0 Å. Similarly, amino acids, like TYR 311 and TYR 804 in bta-miR-193b and bta-miR-335 with aromatic rings were also found in the binding pocket of miRNAs within a distance of 3.0 Å.
Exogenous source Bos-Tauras miRNAs
|
MFEI= (ΔG/sequence length)/ (G+C)%
|
Number of mismatches in mature miRNAs
|
Bulges in structure
|
bta-mir-130a
|
0.86
|
<9 mismatches
|
1 Bulge and 4 interior loop
|
bta-mir-16b
|
0.968
|
<9 mismatches
|
1 Bulge and 3 interior loop
|
bta-mir-210
|
1.12
|
<9 mismatches
|
1 Bulge and 3 Interior loop
|
bta-mir-335
|
1.03
|
<9 mismatches
|
1 Bulge and 3 Interior loop
|
Table 2: Exogenous miRNAs properties that help them in cross-kingdom serum stability in mammalian host i.e. MFEI, AU percentage in the sequence, Number of mismatches in mature miRNAs and Bulges in structure.
Exogenous source miRNAs
|
Geometrical shape complementary score
|
Penalty
|
Area
|
Atomic Contact Energy (ACE)
|
Ligand Transformation
|
positive control (miR-168)
|
19456
|
-3.19
|
1991.9
|
-429.12
|
-1.12745, 0.42746, -1.12862, -47.30375, 23.41908, 58.60162
|
bta-mir-16-1
|
18652
|
-4.29
|
3608.1
|
-407.47
|
-3.01060, 0.17830, -0.65718, 36.03665, 29.72168, 62.95200
|
bta-mir-130a
|
17200
|
-3.59
|
3040.5
|
-308.59
|
2.85672, 0.06691, 1.90344, 47.16856, -22.10817, 25.06612
|
bta-miR-335
|
18160
|
-4.03
|
3775.8
|
-33.86
|
-0.03258, -0.66220, -1.54875, 33.26122, -2.35926, 62.68807
|
bta-mir-210
|
20858
|
-4.31
|
5162
|
-999.66
|
-0.84306, -0.43879, -1.92970, 39.20004, 53.93851, 57.25181
|
negative control
|
16670
|
-4.31
|
2770.6
|
-168.54
|
-2.71, 0.54, 2.63, 49.89, 52.85, 18.24
|
Table 3: Docking scores between 4 exogenous miRNAs and human AGO2 protein resulted from PachDock.