Identification of microRNAs in the new lotus reference genome
To obtain a more comprehensive miRNA profile, we reanalyzed sRNA-seq datasets from six lotus tissues including leaf, petiole, petal, anther, unpollinated carpel and pollinated carpel, based on an updated miRbase database and an improved chromosome-level genome assembly of lotus. A total of 22.2 million filtered reads were mapped to the known miRNAs in miRBase (Table 1). The ratio of filtered high-quality reads mapped to the miRBase is 0.33%, i.e. a total of 50 866 reads aligned to the reference genome (nelumbo.biocloud.net) (Table 1) [29]. After merging with previous lotus miRNAs [27] and removing the redundant (overlapping) hairpin loci, a total of 1103 potential mature miRNA and 104 miRNA-star sequences were identified, and these miRNAs are produced by 1416 pre-miRNAs (hairpin loci) (Fig. 1a)(Additional file 2: Table S1 and S2). The number of detected mature miRNAs is less than pre-miRNAs because many pre-miRNAs from distinct duplicate MIRNA genes can produce identical (short) mature miRNA sequence, which was also reported in other species (http://mirbase.org). Comparing the origin of the pre-miRNAs with transposable elements (TE) region in genome, 623 (43.99%) pre-miRNAs were found to be TE-related, suggesting that a substantial number of the miRNAs originate from TEs [30, 31]. In addition, only 444 (40.25%) of those mature miRNAs were identified as miRNA in the previous analysis [27]. Furthermore, 235 (19.46%) of miRNAs were known sequences in miRBase database and 528 (43.74%) are novel miRNAs identified in this study. Among these currently identified novel miRNAs, 348 (65.9% of novel) are potentially produced by TE-related MIRNA-likes genes. By length, the 24bp miRNAs are the most abundant while 388 (58.43%) of which are TE-related, supporting that the emerging of novel miRNAs from TEs [32, 33](Additional file 1: Fig.S1). Furthermore, we observed that the frequency of each nucleobase (A, U, C and G) in the miRNAs was close to 25% (Additional file 1: Fig.S2). However, we also determined the frequency of the base of the mature miRNAs, the result showed that the 20bp, 21bp, and 22bp miRNAs preferentially start with ‘U’ at the first base (46.96%, 55.37%, and 61.22%, respectively) (Additional file 1: Fig.S3), while 24bp miRNAs preferred ‘A’ (58.5%). Comparing with miRNA’s first nucleotide bias analysis in other species, we found the bias tendency in 21bp, 22bp and 24bp miRNAs is similar to Camellia japonica [34], pomegranate[35].
Expression dynamics of miRNAs and their target genes across different tissues
Through differential regulation in different tissues or developmental stages, miRNAs play pivotal roles in diverse biological processes including development [4, 5]. To gain insight into the miRNA expression pattern across different lotus tissues, we first performed hierarchical clustering on the expression data from our identified mature miRNAs (Fig.1a). Interestingly, we found that the majority of miRNAs are preferentially expressed in specific tissues. Only 110 miRNAs are commonly expressed in all tissues; carpel has the most specific miRNAs, followed by anther (Fig.1b). A total of 1,003 differentially expressed miRNAs were identified. We identified differentially expressed miRNA in other tissues relative to pollinated carpel, and the up-regulated miRNAs outnumber the down-regulated miRNA in the pollinated carpel, indicating that there could be intensive activation of miRNAs in carpel after pollination (Fig.1c).
The Pearson correlation correlation coefiicients among gene expression profiles generated by the RNA-seq analysis of biological replicates suggested the high reproducibility between replicates (ave r > 0.859, all p-value < 0.0001)(Additional file 1: Fig.S4). To explore the expression pattern of miRNA target genes among different tissues, pairwise comparisons of these six samples were conducted to identify differentially expressed genes (DEGs). A total of 28 701 DEGs were identified by using the edgeR package. The comparison between anther and petiole shows the most DEGs, whereas the comparison between pollinated carpel and unpollinated carpel reveals the least DEGs (Fig.2a). To explore whether differentially expressed miRNAs might escalate the expression difference of their target genes between tissue samples, we calculated the proportion of DEGs in the target genes of those differentially expressed miRNAs (DEMTGs) and compared it to DEGs in the genome background. The comparison between anther and petiole also exhibits the highest percentage 49.26% (740) of DEMTGs, while the comparison in pollinated carpel and unpollinated carpel has the lowest percentage of 5.07% (18) (Fig.2a). The proportion of DEGs in DEMTGs is generally higher than that of DEGs in all genes for most between-tissue comparisons, especially in the comparison between carpel and leaf, between carpel and petiole ( χ2 test, all pvalue<0.01), except for the comparison between petiole and leaf. This indicates that the differentially expressed miRNAs among tissues might influence the expression of their targeted gene to some extent.
To further explore how intensively the expression pattern of target genes was influenced by the miRNA, the expression correlation analyses between target genes and miRNAs across different tissue samples were carried out (Additional file 2: Table S3). In this study, the correlation coefficient (r) between miRNA and target gene is divided into six levels: strong negative correlation (-1 to -0.75), intermediate negative correlation (-0.75 to -0.25), weak negative correlation (-0.25 to 0), weak positive correlation (0 to 0.25), intermediate positive correlation (0.25 to 0.75) and strong positive correlation (0.75 to 1). The result showed a substantial bias toward negative correlations such that the negative correlations are about double comparing with positive correlations (Fig.2b). The intermediate negative correlations and weak negative correlations are prevalent, and the strong negative correlations are the least, suggesting that miRNAs still mainly repress their target genes (Fig.2b). We further investigated the expression level of targeted genes in different samples, which revealed that the expression of targeted genes is varied between samples possibly due to the expression difference of miRNAs between samples (Fig.2c). To validate the potential regulation of miRNA targets, we randomly selected 15 miRNA targeted genes to perform real-time qPCR experiments. We carried out correlation analyses between miRNAs expression and RT-PCR result of target genes and compared with corresponding correlation obtained from RNA-seq expression. Among 15 pairs of correlation between miRNA and target genes, 12 pairs (80%) showed the negative correlation based on both results from RT-PCR and RNA-seq, further revealing the complex regulatory relationships between miRNAs and target genes. (Fig.3, Additional file 1: Fig.S5).
Differentially expressed miRNA and their target isoforms
Taking advantage of transcript isoform analyses from our previous study [28], we further analyzed the miRNA-target isoforms instead of genes. A total of 10345 unique target isoforms were predicted (Additional file 2: Table S4). Most target isoforms (8850, 85.54%) contain only one miRNA target site; a small portion of isoforms (847, 8.18%) contain two miRNA target sites; the rest contain more than two miRNAs target sites (Additional file 1: Fig.S6a). Notably, the isoforms ‘Nn8g40904.1’ and ‘Nn8g40902.1’ can be bound by many miRNAs, with 38 and 31 homologous miRNAs from the family miR169, respectively. We also calculated the number of regulatory miRNAs per target gene, and expectedly the distributions of the number of regulatory miRNAs for miRNA-targeted genes and miRNA-targeted isoforms are similar (Additional file 1: Fig.S6b). Not all miRNA-targeted genes have all their corresponding isoforms being targeted by miRNAs--there are only 1637 target genes having all of their isoforms targeted by the specific miRNAs, such as ‘Nn3g21300’ (AFB3) (Additional file 1: Fig.S7), whereas there are 2449 target genes with only a portion of their isoforms being targeted, such as ‘Nn3g21564’ (Additional file 1: Fig.S7). We further compared the expression level of miRNA-targeted isoforms and non-miRNA-targeted isoforms from the same genes. Interestingly, we found that miRNA targeted isoforms tend to have significantly higher expression level in all investigated tissue samples, suggesting that the isoforms containing miRNA binding sites are under miRNA-mediated expression tuning and buffering likely because of their high expression level representing the functional importance (Additional file 1: Fig.S8). The most miRNA target sites in gene bodies are on coding regions (CDSs) (74.76%), whereas the 5’-UTRs (9.59%) and 3’-UTRs (15.65%) regions have fewer target sites by miRNAs. Given that a substantial number of TE-related miRNAs were found in this study, it is essential to know if they also have a regulatory role in gene expressions. We found that 43.57% of TE-related miRNAs have a target gene while 50.28% of non-TE-related miRNAs have a target gene, suggesting that the TE-related miRNAs also play an important role in regulating genes (Additional file 2: Table.S2, S4).
To understand the biological functions of miRNAs, especially those tissue-specific miRNAs, functional annotation based on gene ontology (GO) was used. We found that only 1979 out of 4086 miRNA target genes were annotated by GO categories (Additional file 2: Table S5; Additional file 1: Fig.S9). Among the most significantly enriched GO terms of target genes are “endonuclease activity,” “regulation of transcription, DNA-templated” and “Cul4-RING ubiquitin ligase complex,” indicating that the genes targeted by miRNA can regulate numerous key processes and many belonging to transcription factors [36, 37]. The specific miRNA may regulate specific genes being crucial in the different developmental stages, and therefore GO functional enrichment analysis was conducted for six samples (Additional file 1: Fig.S10). In anther, the most enriched GO terms are related to plant reproductive processes such as “microtubule organizing center,” “auxin-activated signaling pathway” and “endonuclease activity.” In petiole, the miRNA target genes are enriched in “chloroplast stromal thylakoid” and “leaf development.” Both in the pollinated and unpollinated carpel, the most enriched GO terms are the same, i.e. “sepal development,” “regulation of anthocyanin biosynthetic process” and “miRNA binding.” These results collectively revealed that the functions of the miRNA target genes are closely related to the tissue-specification.
Functional differentiation of isoforms in the co-expression networks
It is often assumed that the tightly connected genes in the co-expression network are likely participating in the same biological process, and therefore it provides a means to identify functional divergence between isoforms. Here, we performed WCGNA at the transcript isoform level. We found that some isoforms are exhibiting dramatic expression differences among different tissues. To explore the potential function of miRNA-targeted isoforms in different tissue, we first performed a hierarchical clustering analysis of total isoforms, and we found that a substantial portion of isoforms showed strong tissue-specificity (Additional file 1: Fig.S11). After filtering out the lowly expressed (FPKM <0.1) and universally expressed (C.V. of FPKMs across six tissue samples <2) isoforms, 56,583 isoforms were retained to construct a co-expression network by using WGCNA. A total of 10 modules were defined as clusters of major tree branches (Fig.4a), with the module size ranging from 766 to 13,309, and isoforms within the same cluster have high correlation coefficients among each other (Additional file 2: Table S6, Fig.4b). We further investigated correlations between the tissues and the 10 co-expression modules. Most modules are significantly (p <0.05) correlated with single tissue, except that the black module is significantly correlated with both pollinated carpel and unpollinated carpel. Basically, isoforms in each module are over-represented in the corresponding tissue, and the 150 candidate hub isoforms for each module were assigned (Additional file 1: Fig.S12). The correlation analysis between the modules revealed that black, cyan, green and pink module, which are significantly correlated with the three floral organs, also have high correlation among each other, proving the accuracy of the module clustering and the homology of differentiated floral organs (Additional file 1: Fig.S13). Because the leaf and petiole are both vegetative tissues, six modules are significantly correlated with leaf or petiole, respectively. To explore the influence of miRNAs on the co-expression network of isoforms, we calculated the content of miRNA-targeted isoforms and the number of hub isoforms in every module(Additional file 1: Fig.S14). Moreover, our further χ2 test analysis at module level revealed that only the proportion of isoforms in brown modules being targeted by miRNAs (184/2260, 8.14%) is significantly lower than the corresponding proportion of isoforms in hubs (51/150, 34%) (χ2 test, p <0.01) (Additional file 1: Fig.S14). This suggested that miRNAs preferentially target hub isoforms in the brown module, which is highly correlated with leaves.
The isoforms from the same gene are often translated into protein variants with different structures and, hence, performing different functions [22]. To understand the scale of functional differentiation among isoforms from the same gene, we identified isoforms that were assigned to different modules in the co-expression network. Interestingly, among 11,302 genes with multiple isoforms being assigned to modules, 3029 genes have their isoforms being assigned into different modules (GIDDM). Moreover, 464 of these GIDDMs were targeted by miRNAs. This supports that substantial genes with multiple isoforms show functional divergence between isoforms. For example, “Nn5g29774”, annotated as ‘responding to salt stress’, produce a total of 41 isoforms, and 18 of them were clustered into five modules, including 12 in cyan, three in red, one in pink, one in black and one in brown (Additional file 1: Fig.S15). Among these 18 isoforms belonging to different modules, and five of them were regulated by two miRNAs, one by nnu-miR200 and one by miR-1655-3p.
If the isoforms of the same gene are functionally divergent, we assume that these different isoforms might likely convert into different genes (duplicates) to play their independent functions during the long-term evolution. To validate this assumption, we searched the closest homologous isoform in rice and Arabidopsis, respectively, for each lotus isoform. After filtering out genes with only one isoform, the gene can be divided into three categories: different isoforms from the same lotus gene with their closest homologs being different genes in rice or Arabidopsis (I); different isoforms from the same lotus gene with their closest homologs being the same isoform from the same gene in rice or Arabidopsis (II); different isoforms from the same lotus gene with their closest homologs being different isoforms from the same gene in rice or Arabidopsis (III). The results show that the number of genes in category II is the most (Additional file 2: Table S7, Additional file 1: Fig.S16). However, interestingly, when only considered the GIDDM, the proportions of isoforms in category I, were largely increased by 10.3% and 9.8%, respectively in rice and Arabidopsis. The result further substantiates that different isoforms, belonging to different co-expression modules, from the same gene tend to evolve into more divergent sequence structures. Meanwhile, this somehow shows that these isoforms were more likely to convert into different duplicate copies during long-term evolution.
MiRNA-targeted isoforms in plant hormone signaling pathways
To further elucidate the functions of miRNAs and their target isoforms, we focused on the phytohormone pathways enrichment since they are essential in almost all biological processes in the plant. First, KEGG annotation found that a total of 397 miRNA target genes were assigned to 106 pathways. ‘Plant hormone signaling transduction’ was the third most enriched pathways and represented by 20 genes. These 20 genes are in auxin-, cytokinin-, gibberellin-, abscisic acid-, ethylene, brassinosteroid- and jasmonic acid-associated signaling pathways, targeted by 24 miRNAs (Additional file 1: Fig.S17). Among the 20 signaling genes, 16 of them were assigned to different modules in the co-expression network at isoform level and four were not assigned to any module, suggesting that miRNA target genes in the hormone pathways are mostly functionally relevant in different lotus tissues (Additional file 1: Fig.S18).
In auxin signaling pathways, the auxin receptor TRANSPORT INHIBITOR RESPONSE1 (TIR1), the auxin-responsive gene auxin/indole-3-acetic acid (AUX/IAA), the auxin response factor (ARF) the small auxin up RNA (SAUR) are targeted by “nnu-miR393b-1s”, “nnu-miR393b-2b”, “NmiRNA#40_469”, “nnu-miR102”, ”nnu-miR156c-1*”. Combining the expression of miRNA and their target isoforms, the result revealed that the altered expression levels of targeted isoforms were not always negatively correlated with their corresponding miRNAs (Fig.5). For example, the expression pattern of most isoforms from TIR1 was almost negatively regulated by miRNAs (Fig.5). However, the two isoforms of TIR1, “Nn3g21300.4” and “Nn4g26020.5”, were highly expressed in leaf and petiole, same with the expressed pattern of the corresponding miRNAs “nnu-miR393b-1s” and “nnu-miR393b-2b”. In another example, we found that the low expression of “nnu-miR102” in petiole might be associated with high expression of most targeted isoforms “Nn1g04271” (ARF) in petiole except for “Nn1g04271.8” (Fig.5). Meanwhile, there are four isoforms, transcribed by Nn1g04271, clustered into red and magenta modules highly correlated with petiole, suggesting the important regulatory relationship of nnu-miR102-ARF in auxin signaling of the petiole. In addition to the auxin signaling genes, similar regulatory relationship, and expression patterns in abscisic acid signaling were also observed (Additional file 1: Fig.S19). For example, the phosphatase 2C (PP2C) is targeted by five miRNAs, which are lowly expressed in the unpollinated carpel. Isoforms from one of two miRNA-targeted genes, “Nn6g35319”, are highly expressed in the unpollinated carpel, whose five isoforms were clustered into the black module. Nevertheless, the miRNA “nnu-miR8” and its targeted genes, homologous to serine/threonine-protein kinase CTR1, have high expression in anther, which, however, do not have a negative expression pattern between the miRNAs and target genes. Although we found many miRNAs and their target isoforms in plant hormone signaling pathways, their regulatory relationships are much more complex.