Identification of microRNAs on the new lotus reference genome
To obtain a more comprehensive miRNA profile, we reanalyzed sRNA-seq datasets from six lotus tissues including leaf, petiole, petal, anther, unpollinated carpel and pollinated carpel, based on an updated miRbase database and an improved chromosome-level genome assembly of lotus. A total of 22.2 million filtered reads were mapped to the known miRNAs in miRBase (Table 1). The ratio of filtered clean reads mapped to the miRBase is 0.33%, i.e. a total of 50 866 reads are aligned to the reference genome (Table 1). After merging with previous lotus miRNAs[27] and removing the redundant (overlapping) MIRNAs (miRNA genes), a total of 1103 potential mature miRNA and 104 miRNA-star sequences were identified, and these miRNAs are produced by 1416 pre-miRNAs (Fig. 1a)(Additional file 2: Table S1 and S2). Comparing to transposable elements (TE), 623 (43.99% of total) pre-miRNAs were found completely or partially TE-related, suggesting that a substantial amount of the miRNAs are originated from TEs [29, 30] Only 444 (36.78%) of those mature miRNAs were identified as miRNA in previously analysis [27]. Furthermore, 235 (19.46%) of miRNAs were known sequences in miRBase database and 528 (43.74%) are novel miRNAs identified in this study. Among these currently identified novel miRNAs, 348 (65.9% of novel) are potentially produced by TE-related MIRNA-likes. By length, the 24bp miRNAs are the most abundant while 388 (58.43%) of which are TE-related, supporting that the emerging of novel miRNAs from TEs [31, 32](Additional file 1: Fig.S1). Furthermore, the summary of miRNA nucleotide bias at each position showed there is no nucleotide bias in lotus miRNA nucleotide content (Additional file 1: Fig.S2). However, we also determined the frequency of the base of the mature miRNAs, the result showed that the 20bp, 21bp and 22bp miRNAs preferentially start with ‘U’ at the first base (46.96%, 55.37% and 61.22%, respectively) (Additional file 1: Fig.S3), while 24bp miRNAs preferred ‘A’ (58.5%). Comparing with miRNA first nucleotide bias analysis in other species, we found the bias tendency in 21bp, 22bp and 24bp miRNAs is similar to Camellia japonica[33], pomegranate[34].
Expression dynamics of miRNAs and their target genes across different tissues
Through differential regulation in different tissues or developmental stages, miRNAs play pivotal roles in diverse biological processes including development[4, 5]. To gain insight into the miRNA expression pattern across different lotus tissues, we first performed hierarchical clustering on the expression data from identified mature miRNAs (Fig.1a). Interestingly, we found that the majority of miRNAs are preferentially expressed in specific tissues. Only 110 miRNAs are commonly expressed in all tissues; carpel has the most specific miRNAs, followed by anther (Fig.1b). A total of 1003 differentially expressed miRNAs were identified in the differential expression analysis. When comparing the miRNA expression level of pollinated carpel with all the other samples, the up-regulated miRNAs outnumbers the down-regulated miRNA, indicating that there is intensive activation of miRNAs in carpel after pollination (Fig.1c).
To explore the miRNA target genes expression pattern among different tissues, pairwise comparisons of these six samples was conducted to identify differentially expressed genes (DEGs). A total of 28 701 DEGs were identified using edgeR package. The comparison between anther and petiole shows the most DEGs, whereas the comparison between pollinated carpel and unpollinated carpel reveals the least DEGs. To explore whether differentially expressed miRNAs might escalate the expression difference of their target genes between tissue samples, we calculated the proportion of DEGs in the target genes of those differentially expressed miRNAs (DEMTGs), and compared it to DEGs in the genome background. The comparison between anther and petiole also exhibits the highest percentage 49.26% (740) of DEMTGs, while the comparison in pollinated carpel and unpollinated carpel has the lowest percentage 5.07% (18) (Fig.2a). The proportion of DEGs in DEMTGs is generally higher than that of DEGs in all genes in most between-tissue comparisons, especially in the comparison between carpel and leaf, carpel and petiole (Χ2 test, pvalue<0.01), except for the comparison between petiole and leaf. This, indicates that the differentially expressed miRNAs among tissues have influence on targeted gene expression in some extent.
To explore how intensive the expression pattern of target genes was influenced by the miRNA, the correlation analyses between expression level of the target gene and miRNA across different tissue samples were carried out (Additional file 2: Table S3). In this study, the correlation coefficient (Cor) is divided into six levels, strong negative correlation (-1~-0.75), intermediate negative correlation (-0.75~-0.25), weak negative correlation (-0.25~0), weak positive correlation (0~0.25), intermediate positive correlation (0.25~0.75) and strong positive correlation (0.75~1). The result showed a substantial biased toward negative correlations as to the negative correlations were about twice as frequent as positive correlations (Fig.2b). The intermediate negative correlations and weak negative correlations are prevalent, and the strong negative correlations are the least, suggesting that miRNAs still mainly repress their target genes (Fig.2b). Further by investigating the targeted genes in different samples, in line with the tissue-specific miRNAs, the targeted genes varied among different samples (Fig.2c). These results suggest that isoforms transcribed from the same gene can be differentially regulated owing to presence and absence of miRNA binding site(s). To validate the potential regulation of miRNA targets, we selected 15 miRNA targeted genes to performed real-time qPCR experiments. We carried out correlation analyses between miRNAs expression and RT-PCR result of target genes and compared with corresponding correlation obtained from RNA-seq expression. Among 15 compared values, 12 (80%) of the Correlation from RT-PCR and RNA-seq was similar, revealing the accuracy of the result of RNA-seq (Fig.3, Additional file 1: Fig.S4).
Differentially expressed miRNA and their target isoforms
Taking advantage of transcript isoform analyses from our previous study[28], we analyzed the miRNA-targeted isoforms instead of genes. A total of 10345 unique target isoforms were found (Additional file 2: Table S4). Most target isoforms (10586, 86.42%) are only regulated by one miRNA, while 904 (7.38%) isoforms are regulated by two miRNAs and the rest are affected by more than two miRNAs (Additional file 1: Fig.S5a). Intriguingly the isoforms ‘Nn8g40904.1’ and ‘Nn8g40902.1’ can be bound by the many miRNAs, with 38 and 31 homologous miRNAs from the family miR169, respectively. We also calculated the number of regulatory miRNAs per target gene, and expectedly the distributions of the number of regulatory miRNAs for miRNA-targeted genes and miRNA-targeted isoforms are similar (Additional file 1: Fig.S5b). Not all miRNA-targeted genes have all their corresponding isoforms being targeted by miRNAs-- there are only 1637 target genes having all of their isoforms targeted by the specific miRNAs, such as ‘Nn3g21300’ (AFB3) (Additional file 1: Fig.S6), whereas there are 2449 target with only a portion of their isoforms being targeted, such as ‘Nn3g21564’ gene (Additional file 1: Fig.S6). We further compared the expression level of miRNA targeted isoforms and non-miRNA targeted isoforms from the same genes. Interestingly, we found that miRNA targeted isoforms tend to have significantly higher expression level in all investigated tissue samples (Additional file 1: Fig.S7). The most miRNA target regions in gene body are on coding regions (CDSs) (74.76%), whereas the 5’-UTRs (9.59%) and 3’-UTRs (15.65%) regions have fewer target sites by miRNAs. Given that a substantial number of TE-related miRNAs were found in this study, it is essential to know if they also have a regulatory role in genes expressions. We found that 43.57% of TE-related miRNAs have target gene, and 50.28% of non-TE-related miRNAs have target gene, suggesting that the TE-related miRNAs also play an important role in regulating genes.
To understand the biological functions of miRNAs, especially tissue-specific miRNAs, gene ontology (GO) functional annotation was used. We found that only 1979 out of 4086 miRNA target genes were annotated which their function based on different GO categories (biological processes, molecular functions and cellular components) are provided in supplementary files (Additional file 2: Table S5; Additional file 1: Fig.S8). The most significantly enriched GO terms were “endonuclease activity”, “regulation of transcription, DNA-templated” and “Cul4-RING ubiquitin ligase complex”, indicating the genes targeted by miRNA regulated numerous key processes and many belonging to transcription factors [35, 36]. The specific miRNA may regulate specific gene in different developmental stage, therefore, GO functional enrichment analysis was conducted for six samples (Additional file 1: Fig.S9). In anther, the most enriched GO terms are related to plant reproductive processes such as “microtubule organizing center”, “auxin-activated signaling pathway” and “endonuclease activity”. In petiole, the miRNA target genes are enriched in “chloroplast stromal thylakoid” and “leaf development”. Both in pollinated and unpollinated carpel, the most enriched GO terms are the same, i.e. “sepal development”, “regulation of anthocyanin biosynthetic process” and “miRNA binding”. These results revealed that the functions of the miRNA target genes are closely related to the tissue-specification.
Functional differentiation of isoforms in the co-expression networks
It is assumed that the closely located gene in the co-expression network, are participating in the same biological process. To investigate lotus isoforms divergence in functions, especially those are miRNA regulated targets, we performed WCGNA at transcript isoform level. We found that some isoforms are exhibiting dramatic expression difference among different tissues. To explore the potential function of miRNA-targeted isoforms in different tissue, we first performed a hierarchical clustering analysis of total isoforms, and found that a substantial portion of isoforms showed strong tissue specificity (Additional file 1: Fig.S10). After filtering out the low expressed (FPKM <0.1) and universally expressed (C.V. of FPKMs across six tissue samples <2) isoforms, 56 583 isoforms were retained to construct a co-expression network by using the WGCNA. A total 9 modules were defined as clusters of major tree branches (Fig.4a), with the module size ranging from 766 to 13 309, and isoforms within the same cluster have high correlation coefficients among each other (Additional file 2: Table S6, Fig.4b). We further investigated correlations between the tissues and the 10 co-expression modules. Most modules are significantly (p <0.05) correlated with single tissue, except that the black module is significantly correlated with both pollinated carpel and unpollinated carpel. Basically, isoforms in each module are over-represented in the corresponding tissue, and the 150 candidate hub isoforms for each module were identified (Additional file 1: Fig.S11). The correlation analysis between the modules revealed that black, cyan, green and pink module, which are significantly correlated with the three floral organs, also have high correlation among each other, proving the accuracy of the module clustering and homology of differentiated floral organs (Additional file 1: Fig.S12). Because the leaf and petiole are both vegetative tissues, six modules are significantly correlated with leaf or petiole, respectively. To explore the influence of miRNAs on isoforms module allocation, we calculated the content of miRNAs targeted isoforms and number of hub isoform in every module. The proportion of isoforms in all modules being targeted by miRNAs (1785/29391, 6.07%) is significantly lower than the corresponding proportion of isoforms in hubs (121/1500, 8.07%) (Χ2 test, p <0.01) (Additional file 1: Fig.S13). This suggests that miRNAs preferentially target hub isoforms and play a central role in a gene network.
The isoforms from the same gene are often translated into a protein with different structures and, hence, different functions [22]. To understand the scale of functional differentiation among isoforms from the same gene, we identified isoforms which were assigned to different modules in the co-expression network. Interestingly, among 11,302 genes with multiple isoforms being assigned to modules, 3029 genes have their isoforms divided into different modules (GIDDM). Moreover, 464 of GIDDM were targeted by miRNAs. This supports that a substantial of genes with multiple isoforms show functional divergence between isoforms. For example, “Nn5g29774”, annotated as ‘responding to salt stress’, produce a total of 41 isoforms, and 18 of them were clustered into five modules, including 12 in cyan, three in red, one in pink, one in black and one in brown (Additional file 1: Fig.S14). Among these 18 isoforms belonging to different modules, and five of them were regulated by two miRNAs, one by nnu-miR200 and one by miR-1655-3p.
If isoforms of the same gene have functionally diverged, we assume that these different isoforms might convert into different genes (duplicates) to play their independent functions during long-term evolution. To validate this assumption, we searched the closest homologous isoform in rice and Arabidopsis, respectively, for each lotus isoform. After filtering out genes with only one isoform, gene can be divided into three categories: different isoforms from the same lotus gene with their closest homologs being different genes in rice or Arabidopsis (I); different isoforms from the same lotus gene with their closest homologs being the same isoform from the same gene in rice or Arabidopsis (II); different isoforms from the same lotus gene with their closest homologs being different isoforms from the same gene in rice or Arabidopsis (III). These results showed that category II is prevalent in this isoform-homology analysis (Additional file 2: Table S7, Additional file 1: Fig.S15). However, interestingly, when we only included the GIDDM, and the proportion of isoforms in category III and especially category I, which represent a stronger degree of sequence structural differentiation between different isoforms from the same lotus gene, were largely increased. This further substantiates that different isoforms from the same gene with functional divergence in the gene network also tend to have higher structural differentiation between each other and more likely to evolve into different duplicate copies.
MiRNA targeted isoforms in plant hormone signaling pathways
To further elucidate the functions of miRNAs and their target isoforms, we investigate the hormones pathways enrichment since they are essential in almost all biological processes in plant. First, KEGG annotation was carried out for miRNA target genes in lotus. A total of 397 miRNA target genes were assigned to 106 pathways. ‘Plant hormone signaling transduction’ was the third most enriched pathways and represented by 20 genes. These 20 genes are in auxin-, cytokinine-, gibberellin-, abscisic acid-, ethylene, brassinosteroid- and jasmonic acid-associated signaling pathways targeted by 24 miRNAs (Additional file 1: Fig.S16). Among the 20 signaling genes, 16 of them were assigned to different modules in the co-expression network at isoform level and four were not assigned to any module, suggesting that miRNA target genes in the hormone pathways are mostly functionally relevant in different lotus tissues (Additional file 1: Fig.S17).
In auxin signaling pathways, the auxin receptor TRANSPORT INHIBITOR RESPONSE1 (TIR1), the auxin-responsive gene auxin/indole-3-acetic acid (AUX/IAA), the auxin response factor (ARF) the small auxin up RNA (SAUR) are targeted by “nnu-miR393b-1s”, “nnu-miR393b-2b”, “NmiRNA#40_469”, “nnu-miR102”, ”nnu-miR156c-1*”. Combining the expression of miRNA and their target isoforms, the result revealed that the altered expression levels of targeted isoforms were not always negatively correlated with their corresponding miRNAs (Fig.5). For example, the expression pattern of most isoforms from TIR1 were almost negatively regulated by miRNAs (Figure 4). However, the two isoforms of TIR1, “Nn3g21300.4” and “Nn4g26020.5”, were highly expressed in leaf and petiole, same with the expressed pattern of the corresponding miRNAs “nnu-miR393b-1s” and “nnu-miR393b-2b”. In another example, we found that the low expression of “nnu-miR102” in petiole might be associated with high expression of most targeted isoforms “Nn1g04271” (ARF) in petiole except for “Nn1g04271.8” (Fig.5). Meanwhile, there are four isoforms, transcribed by Nn1g04271, clustered into red and magenta modules highly correlated with petiole, suggesting the important regulatory relationship of nnu-miR102-ARF in auxin signaling of petiole. In addition to the auxin signaling genes, similar regulatory relationship and expression pattern in abscisic acid signaling were also observed (Additional file 1: Fig.S18). For example, the phosphatase 2C (PP2C) is targeted by five miRNAs, which are lowly expressed in unpollinated carpel. Isoforms from one of two miRNA-targeted genes, “Nn6g35319”, are highly expressed in unpollinated carpel, whose five isoforms were clustered into black module. Nevertheless, the miRNA “nnu-miR8” and its targeted genes, homologous to serine/threonine-protein kinase CTR1, have high expression in anther, which, however, do not have a negative expression pattern between the miRNAs and target genes. Although we found many miRNAs and their target isoforms in plant hormone signaling pathways, their regulatory relationship is much complex, suggesting miRNA and gene splicing might play important roles in lotus hormone signaling.