LncRNAs and CircRNAs Are Widely Distributed in Rrice
About 1.3 billion read pairs were generated from 18 Oryza sativa L. ssp. indica samples, including three tissues (young leaf, panicle and root) from the three rice varieties (MH63, ZS97 and SY63), with two replicates per tissue and variety (Table S1). Based on the pipeline in Figure S1a, we identified 11,513, 13,153 and 13,549 lncRNAs in MH63, ZS97 and SY63, respectively (Fig. 1a, Table S2). LncRNAs are divided into five types, i.e., intergenic, intronic, antisense, bidirectional and sense lncRNAs, of which long intergenic non-coding RNAs (lincRNAs) and long non-coding natural antisense transcripts (lncNATs) accounted for the majority with 5,723, 5,923 and 6,585 lincRNAs, and 1,934, 2,070 and 2,216 lncNATs in MH63, ZS97 and SY63, respectively (Fig. 1a). LncRNAs that intersected with protein-coding (PC) genes in the sense strand (sense lncRNAs) possibly resulted from an incomplete annotation or a coding-free transcript of coding gene due to alternative splicing events. The Pearson correlation of lncRNA expression between two replicates per tissue was > 0.9, indicating the high repeatability of the data (Fig. S1b).
A total of 5,122, 4,273 and 4,707 different high-confidence circRNAs were identified in MH63, ZS97 and SY63 after quality control and filtering. All samples showed similar values to their biological replicates (Table S3). Genome-wide annotation of these high-confidence circRNAs included 4,013, 3,571 and 3,847 exonic, 298, 112 and 112 intronic and 811, 590 and 754 intergenic circRNAs in MH63, ZS97 and SY63, respectively (Fig. 1a). Only ~ 25% of the circRNAs were shared by more than one tool in each variety (Table S4), which is similar to previously reported (Chen et al. 2018), highlighting the expected difference between the available circRNAs identification tools.
We found that the expression levels of circRNAs were not associated with their parental genes, as only 4% circRNAs were significantly positively correlated with the expression of their parental genes (P-value < 0.05, PCC > 0) (Fig. 1b). Furthermore, more than 30% of the parental genes produce more than one circRNA in each variety (Table S5), confirming that there is alternative splicing of circRNA in rice. As expected, the middle length of lncRNAs in rice was longer than circRNAs, while was shorter than the average length of mRNAs (916 bp, 1,339 bp and 1,793 bp in circRNA, lncRNA and mRNA, respectively) (Fig. 1c). In addition, the expression of lncRNAs was lower than the PC genes, which was also lower than the parental genes of circRNAs (Fig. 1d). Some circRNAs were very long (more than 10 kb), which might be caused by the presence of repetitive DNA in the plant genome (Mehrotra and Goyal, 2014). In total, ~ 4.8% (196) of circRNAs longer than 10 kb and located in intergenic regions was filtered from subsequent analyses. Sequence analysis revealed that single exonic lncRNAs or circRNAs were the most common (Fig. S1c), accounting for 41% of lncRNAs, which was significantly higher than in mRNAs (23%) and circRNAs (20%). LncRNAs/circRNAs were not equally distributed in each chromosome, and the proportion of lncRNAs/circRNAs per length or number of genes in the chromosomes changed between chromosomes and varieties (Table S6, S7). There were six lncRNA clusters distributed in chromosomes 4, 9, 10, 11 and 12 (Fig. 1e). Among them, one lncRNA cluster located in 20–20.5 Mb of chromosome 12 was conserved among the three rice varieties.
To assess the conservation of lncRNAs/circRNAs in Oryza sativa and other plant genomes, these sequences were aligned to the genomes of some representative plant species using BLASTN. More than 82% of the lncRNAs sequences in MH63 were conserved in the genomes of different rice varieties (Fig. 1f, Table S8), including Oryza sativa subsp. xian (ZS97), Oryza sativa subsp. geng (Oryza sativa japonica), African cultivated rice (Oryza glaberrima) and wild rice (Oryza barthii, Oryza rufipogon). Similarly, the conservation of circRNAs in rice was ~ 73%. In contrast, analysis of the genomes of different monocots including Brachypodium distachyon, Setaria italica, Zea may and Sorghum bicolor showed similar low conservation in both lncRNAs (< 11%) and circRNAs (< 15%), while in dicots such as Arabidopsis thaliana and Populus trichocarpa the conservation was very low (1–2%). This reflects that lncRNAs/circRNAs were not conserved among different plant species, but had certain intra-species conservation (Fig. 1f), indicating that lncRNAs/circRNAs could be considered as “young” genes as they evolved relatively recently.
Further analyses revealed that 47% and 58% of the lncRNAs sequences from MH63 could align to those of ZS97 and SY63, respectively. The conservation analysis of the back-splicing junctions of circRNAs in MH63 showed that ~ 39% circRNAs were conserved in ZS97, and almost 51% could align to SY63 (Table S8). In addition, 3,445 (30%) lncRNAs and 1,361 (32%) circRNAs of MH63 were conserved in ZS97 and SY63, indicating that a large number of lncRNAs/circRNAs had species-specific expression.
DNA Methylations of LncRNAs and CircRNAs in Rice Genome
To further explore the regulation of lncRNAs/circRNAs in rice, we investigated the possible influence of DNA methylation, an important epigenetic modification in plants, which has been observed for CG, CHG and CHH contexts with H being any nucleotide but G. We analyzed the methylation densities of lncRNAs loci and the parental genes of circRNAs (P-circRNAs), and compared them with those of PC genes in MH63 genome. We found that lncRNAs had higher methylation level (CG = 38.0%, CHG = 15.4%, CHH = 2.2%) than PC genes (27.3%, 8.4%, 1.4%, wilcox.test, P < 2.2e-16 for CG, CHG and CHH, respectively), while the P-circRNAs had the higher CG and lower CHG than PC genes (36.0%, 4.7%, 1.1%, wilcox.test, P < 2.2e-16 for CG and CHG, Fig. 2a). DNA methylation is more likely to be associated with promoter regulatory regions (Zhong et al. 2013). Therefore, the average methylation signals within 1 kb upstream the transcriptional start site (TSS) was examined. The lncRNA loci displayed a relatively higher CG and lower CHH methylation density than the PC genes, while P-circRNAs were the opposite. Although the CG methylation density was also reduced near the TSS of lncRNAs, it remained ~ 2-fold higher than that of the PC genes, and the CG methylation of P-circRNAs was approximately 2-fold lower than PC genes (Fig. 2a and Fig. S2a).
Using PC genes as control, we normalized P-circRNAs into 1 kb and compared the position distribution of circRNAs relative to their parental genes and DNA methylation density. We found that circRNAs were mainly concentrated in the middle of gene-body rather than two sides, and circRNAs were highly correlated with CG and CHG enrichment compared with PC genes (Fig. 2b and Fig. S2b). The lncRNAs were divided into three groups according to their expression (low, FPKM < 0.5; middle, 0.5 < FPKM < 2; high, FPKM > 2), and the average density of DNA methylation over the lncRNA loci were plotted. We observed that the DNA methylation level of lncRNAs with FPKM > 0.5 were similar, while lncRNAs with FPKM < 0.5 had different methylation level near TTS, indicating that lncRNA with FPKM < 0.5 might be incomplete (Fig. 2c), which can provide insights for screening high-confidence lncRNAs. Similarly, we analyzed the DNA methylation levels of P-circRNAs at different quantile expression levels and found that P-circRNAs with high expression levels showed lower methylation near TSS (Fig. 2d).
We further describe the impact of methylation changes on lncRNA loci and P-circRNAs based on the methylation data from (Zhao et al. 2020). In total, 889 and 1042 DMRs were located in the body of lncRNAs and P-circRNAs for different tissues, which involved 660 (8%) and 641 (22%) loci, respectively. However, about half of these DMR-containing genes were differentially expressed. Similar with previous report (Zhao et al., 2018), we observed that DMRs related with CHG and CHH was predominantly hypo-methylated, resulting in up-regulated expression of lncRNAs and P-circRNAs (Fig. 2e and Fig. S2c).
LncRNAs and CircRNAs Are Associated with Different TEs
DNA methylation in plant predominantly occurs on transposons (Law and Jacobsen 2010). With the aim of testing a possible influence of transposons in lncRNAs and circRNAs, we identified that ~ 53% lncRNAs (4,387) were associated with TEs (Fig. 3a), and ~ 15% circRNAs (598) were originated from TE-related genes in MH63, which accounted for ~ 37% of the total genes (Song et al. 2018). In addition, TE-related lncRNAs/circRNAs were found to be abundant in miniature inverted-repeat transposable elements (MITEs) and RC/Helitron transposons (Fig. 3b). Through comparing the expression of TE-associated lncRNAs/circRNAs with non-TE-associated lncRNAs/circRNAs, we observed that the expression of non-TE-associated lncRNAs and P-circRNAs was higher than those associated with TEs (Wilcoxon’s test, P < 2.2e-16, Fig. 3c), while there was no difference in circRNA expression (Wilcoxon’s test, P > 0.05). We further analyzed the methylation profiles of TE-associated lncRNAs/circRNAs and found that all the methylation densities were significantly higher than those of non-TE lncRNAs/circRNAs (Fig. 3d-f), except that no difference was observed for CG methylation density between TE and non-TE-associated circRNAs. In short, the transposon elements, containing high DNA methylation level, affected the expression of lncRNAs and P-circRNAs in xian/indica rice. In addition, the non-TE genes were more preferred to produce circRNAs than TE-related genes.
RNAPII-mediated Domains Affect Expression of LncRNAs in Rice
To characterize the epigenomic feature and RNA polymerase II (RNAPII) occupancy in lncRNA and circRNA, we examined the expression of lncRNA loci and P-circRNAs harboring different histone marks. It was shown that the lncRNA loci marked by RNAPII were significantly highly expressed, however, lncRNAs marked by active histone (H3K4me3 and H3K27ac) or repressed histone (H3K27me3) had similar expression compared with those without histone markers (Fig. 4a and Fig. S3a). This phenomenon was different from protein-coding genes, which were activated by active histones including H3K27ac and H3K4me3 (Zhao et al. 2020). Therefore, P-circRNAs combined with active histone markers like H3K27ac and H3K4me3 explained the high expression of these genes (Fig. 4b, Fig. S3b and Fig. 1d). Although heterochromatin marker H3K9me2 accumulated mainly in the intergenic regions and exhibited significant higher levels of DNA methylation (Zhao et al. 2020), lncRNAs marked by H3K9me2 only accounted for 10% and had no significantly different expression than those not marked (Fig. 4a and Fig. S3c). These results suggested that histone modification may have distinct mechanism in lncRNAs compared with PC genes.
Consistent with previous reports, most lncRNAs and circRNAs are transcribed by RNAPII (Sun et al. 2016; Wang and Chekanova 2017), but detailed functional mechanisms based on DNA 3D structure are lacking in plants. We analyzed the interaction characteristics of lncRNAs and circRNAs using RNAPII-mediated CHIA-PET database in MH63 (Zhao et al. 2019). It was observed that 29% lncRNAs (1251) and 39% circRNAs (322) were transcribed by RNAPII in seedling of MH63 (Fig. 4c), more than half of which were located in the RNAPII-mediated chromatin interaction domains (726 lncRNAs and 228 circRNAs, respectively). In order to determine if a quantitative relationship of intensity and expression was existed between RNAPII and lncRNA, we partitioned the RNAPII intensity in two groups (weak and strong), and plotted the expression of lncRNA loci contained in each group. We observed that lncRNAs harboring strong intensity of RNAPII were expressed higher than those with weak intensity of RNAPII with no linear relationship (Fig. 4d and Fig. S4d). Notably, the expression of lncRNA loci and P-circRNAs in RNAPII-mediated interactions domains were much higher than those without interactions, and it was the lowest for those without RNAPII. For example, ciri_circ147, exp_circ113 and exp_circ114 contained RNAPII peaks with and without interaction but only had RPM value of 2.90e− 08, while exp_circ117 lacked RNAPII but had a higher RPM value of 2.61e− 07, further confirming that high expression of parental genes of circRNAs did not correlate with high expression of circRNAs (Fig. 4c and Fig. 4e).
LncRNAs and CircRNAs in Rice Show High Tissue Specificity
Both lncRNAs and circRNAs showed high tissue-specificity in all varieties, with panicle being the most abundant, then root and the last leaf in all the three rice varieties (Fig. S4a). In MH63, we identified 10,076 (88% account for total lncRNAs), 7,517 (65%) and 6,149 (52%) lncRNAs, and 3,367 (66%), 1,933 (38%) and 1,160 (23%) circRNAs expressed in panicle, root and young leaf, respectively. Interestingly, tissue-specific expression of circRNAs was much higher than lncRNAs. We observed that ~ 43% lncRNAs were commonly expressed in three tissues and the rest were tissue-specific expressed, while the commonly expressed ratio of circRNAs was only 7.5%. In addition, we performed differential expression analysis for lncRNAs, and obtained 8,419 (73%) differentially expressed lncRNAs (DELs) in MH63. GO analysis of the nearest genes of lncRNAs revealed that DELs were enriched in diverse biological processes, cellular components and molecular functions depending on the tissue of origin, confirming the dynamic expression of lncRNAs in different tissues. Nearest genes of Up-regulated lncRNAs in panicle were enriched in ‘flower development and reproduction’, and in leaf were enriched in ‘transport and the formation of thylakoid and plastid’, and in root were enriched in ‘transcription factor activity’. These enriched functions and the tendency of lncRNAs to cis-regulate the expression of nearby genes suggests that the DELs play an important role in the growth and development of different plant tissues. GO analysis for parental genes of circRNAs revealed that they can participate in ‘post-embryonic development’, ‘nitrogen compound metabolic process’, ‘binding of diverse molecules’ and ‘formation of different cell parts’ (Fig. S5a). Considering that circRNAs are highly dynamic and usually only expressed in one tissue, we also performed GO enrichment on parental genes of circRNA in each tissue, revealing that circRNAs participate in ‘’reproductive and post-embryonic development’ in panicle, and contribute to ‘the formation of cytosol and cytoplasm’ in root (Fig. S4b). However, parental genes of circRNAs from leaf were not significantly enriched in any function when compared to those of panicle and root.
LncRNAs and CircRNAs Acting as CeRNAs Can Regulate Important Biological Traits in Rice
To evaluate the ncRNA-associated ceRNA interaction landscape in MH63, a complete circRNA/lncRNA-miRNA-mRNA interaction network was constructed with multi-tissues. In total, 797 lncRNAs and 215 circRNAs were interacted with 137 miRNAs and targeted a total of 1,044 mRNAs, adding up to a total of 4,517 different ceRNA pairs (Fig. 5a). Target genes of these miRNAs were enriched in flower development, reproduction, nucleus, nitrogen compound metabolic process and so on (Fig. S5b). Most of these miRNAs played significant roles in rice growth and development, important agronomic traits, and resistance to disease and drought. For example, Osa-miR156, which over-expressed in rice can increase salt stress tolerance and delay flowering (Wang et al., 2019). Another miRNA, Osa-miR444, is a key factor in relaying the antiviral signaling from virus infection and response to stress (Eren et al. 2015; Wang et al. 2016). To understand the regulatory mechanisms of ceRNAs, we further explored the sub-networks related to the above two important miRNAs. We found that osa-miR156 can target and regulate OsSPL gene family members (Fig. 5b), including OsSPL14, a gene related to rice tillering, panicle number and thousand-grain weight (Miura et al., 2010). However, the mechanism of how osa-miR156 regulates OsSPL14 has not been reported. Our research suggests that lncRNA TCONS_00049880 located in intergenic region, may participate in the competitive binding of osa-miR156 and regulate the expression of SPL gene family. Similarly, several miRNAs from the osa-miR444 family in our network can target and regulate some transcription factors such as KIP1, PFT1, AP2/ERBP, MADS-box and calmodulin-binding protein (Eren et al. 2015). TCONS_00027428, a lncRNA located on the natural antisense strand of OsMADS27, can compete with OsMADS27 to combine osa-miR444b.2, a miRNA that is linked to disease resistance and response to stress (Eren et al. 2015; Wang et al. 2016).
Finally, to explore the difference of ceRNA networks among different tissues, we further constructed circRNAs/DELs-miRNA-DEMs networks in MH63 for panicle, leaf and root. In total, 61, 40 and 52 miRNAs were predicted as target of 231, 68 and 101 DELs and 64, 72 and 99 circRNAs, and could interact with 180, 162 and 225 DEMs in panicle, leaf and root, respectively (Fig. S6-S8). Difference in the ceRNA networks between tissues indicated that different lncRNAs and circRNAs participated in diverse functions in the growth and development of each tissue. We performed GO analysis of the target-genes and found that they were enriched in ‘flower development and reproduction’ in panicle, indicating a specific function of the DELs and circRNAs with miRNA recognition elements in panicle in regulating gene expression (Fig. S5b).