Identification of novel phasiRNA biogenesis pathways in Oryza sativa
The sRNA HTS data sets from different rice samples were used as input, with cDNA sequences as the alignment reference, to search for PHAS loci capable of producing 21-nt or 24-nt phasiRNAs. As a result, fourteen 21-nt and nineteen 24-nt candidate PHAS loci passed the filtering procedures (Additional file 1: Table S1, Additional file 2: Table S2).
Recent reports showed that the processing of 21-nt phasiRNAs depends mainly on OsDCL4, whereas OsDCL3 is necessary for the biogenesis of 24-nt phasiRNAs in rice[1]. Therefore, the candidate 21-nt PHAS loci were further verified by comparing their phasiRNA production between wild type and the osdcl4-1 mutant. Likewise, the production of 24-nt phasiRNAs from the corresponding precursors was compared between wild type and the osdcl3-1 mutant to identify 24-nt PHAS loci. The expression of the corresponding miRNA/sRNA triggers in wild type and the mutants was also analyzed.
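The wild-type versus mutant comparison described above can be illustrated with a minimal sketch. It assumes that phasiRNA production is summarized as reads per million (RPM) per locus and that depletion is called from a log2 fold change. The pseudocount, the threshold and the function names are illustrative assumptions and are not taken from the analysis reported here.

```python
import math

def rpm(count, library_size):
    """Normalize a raw read count to reads per million mapped reads."""
    return count * 1_000_000 / library_size

def log2_fold_change(wt_rpm, mutant_rpm, pseudocount=1.0):
    """log2(mutant / wild type); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((mutant_rpm + pseudocount) / (wt_rpm + pseudocount))

def is_depleted_in_mutant(wt_rpm, mutant_rpm, threshold=-1.0):
    """Flag a locus whose phasiRNA abundance drops in the mutant.

    The threshold of -1 (a two-fold reduction) is a hypothetical cutoff.
    """
    return log2_fold_change(wt_rpm, mutant_rpm) <= threshold
```

A locus flagged by `is_depleted_in_mutant` in the osdcl4-1 (or osdcl3-1) library would be retained as DCL-dependent under these assumptions; loci with unchanged abundance would be discarded.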
As a result, five novel 21-nt PHAS loci, along with their corresponding miRNA/sRNA triggers, were identified (Fig. 1, Table 1, Additional file 3: Figure S1, Additional file 4: Figure S2). The transcripts of LOC_Os01g57968.1 and LOC_Os05g43650.1 were capable of generating phasiRNAs in seedling, whereas the transcripts of LOC_Os02g18750.1, LOC_Os04g25740.2 and LOC_Os06g30680.1 produced phasiRNAs in panicle. The LOC_Os06g30680.1-derived phasiRNAs were also found in panicle under drought stress (Additional file 3: Figure S1). Two known 21-nt PHAS loci (LOC_Os12g42380.1 and LOC_Os12g42390.1), which have been identified as two parts of a long non-coding RNA[11], were also recovered by our screening procedure (Additional file 5: Table S3).
For the 24-nt phasiRNA biogenesis pathways, five novel PHAS loci, along with their sRNA triggers, were identified (Fig. 2, Table 1, Additional file 6: Figure S3, Additional file 7: Figure S4). The LOC_Os01g37325.1-derived phasiRNAs were detected in panicle under both control and drought stress conditions (Fig. 2, Additional file 6: Figure S3). The 24-nt phasiRNAs generated from the transcripts of LOC_Os02g20200.1, LOC_Os02g55550.1, LOC_Os04g45834.2 and LOC_Os09g14490.1 were detected in seedling.
Among all these newly found PHAS loci, only the biogenesis of LOC_Os04g25740.2-derived phasiRNAs was triggered by a known miRNA, miR2118f. To our knowledge, the phasiRNAs generated from the other newly found PHAS loci were all triggered by novel sRNAs (Table 1).
Table 1
Novel PHAS loci in Oryza sativa
PHAS loci | PhasiRNA production region | Small RNA trigger ID | Small RNA trigger | Small RNA trigger binding sites | PARE cutsites |
21-nt PHAS loci |
LOC_Os01g57968.1 | 361–1765 | OSsRNA-1 | GCUUUUUUGAACUUUUUCAUU | 424–444 | 435 |
LOC_Os02g18750.1 | 188–920 | OSsRNA-2 | UUUUUUGGCAUUCUGUAACUUG | 176–197 | 188 |
LOC_Os04g25740.1 | 1908–2159 | osa-miR2118f | UUCCUGAUGCCUCCCAUUCCUA | 1875–1896 | 1887 |
LOC_Os05g43650.1 | 1494–1620 | OSsRNA-3 | GAUUCAUUAACUUCAAUAUGAA | 1528–1549 | 1540 |
LOC_Os06g30680.1 | 62–208 | OSsRNA-4 | UUCCUGGAGCCGCUCAUUCCAU | 50–71 | 62 |
24-nt PHAS loci |
LOC_Os01g37325.1 | 1565–1760 | OSsRNA-14 | AAAAGUAGAUGGAUGCGGAGAC | 1676–1697 | 1688 |
LOC_Os02g20200.1 | 4856–5052 | OSsRNA-15 | UAGAUGCUGUCCUGAAAAGGUG | 4873–4894 | 4885 |
LOC_Os02g20200.1 | 4856–5052 | OSsRNA-16 | AGCCAUGCUAGUCUAAGAGGG | 5007–5027 | 5018 |
LOC_Os02g55550.1 | 905–1101 | OSsRNA-17 | UAGAUGCUGUCCUGAAAAGGUG | 922–943 | 934 |
LOC_Os04g45834.2 | 1051–1307 | OSsRNA-18/ OSsRNA-19 | UUAAUAUUUAUAAUUAGUGUCU/ UUAAUAUUUAUAAUUAAUGUCC | 1103–1124 | 1115 |
LOC_Os09g14490.1 | 4585–4757 | OSsRNA-20 | UAGAUGCUGUCCUGAAAAGGUG | 4578–4599 | 4590 |
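The relationship between the trigger binding sites and the PARE cut sites in Table 1 can be checked with a short sketch. It assumes that coordinates are 1-based and inclusive, that the 5′ end of the trigger pairs with the 3′ end of the binding site, and that the reported cut site is the target position paired with trigger nucleotide 10 (the canonical slicing position between nucleotides 10 and 11 of the guide). The variable and function names are illustrative.

```python
# (trigger ID, binding start, binding end, PARE cut site), copied from Table 1
TABLE1 = [
    ("OSsRNA-1", 424, 444, 435),
    ("OSsRNA-2", 176, 197, 188),
    ("osa-miR2118f", 1875, 1896, 1887),
    ("OSsRNA-3", 1528, 1549, 1540),
    ("OSsRNA-4", 50, 71, 62),
    ("OSsRNA-14", 1676, 1697, 1688),
    ("OSsRNA-15", 4873, 4894, 4885),
    ("OSsRNA-16", 5007, 5027, 5018),
    ("OSsRNA-17", 922, 943, 934),
    ("OSsRNA-18/OSsRNA-19", 1103, 1124, 1115),
    ("OSsRNA-20", 4578, 4599, 4590),
]

def expected_cut_site(binding_end):
    """Target position paired with trigger nucleotide 10.

    Assumes trigger nucleotide 1 pairs with `binding_end`, so nucleotide 10
    pairs with the position nine bases upstream of it on the target.
    """
    return binding_end - 9

# Rows whose reported cut site deviates from the expected position
inconsistent = [row for row in TABLE1 if expected_cut_site(row[2]) != row[3]]
```

Under these assumptions `inconsistent` is empty, that is, every cut site listed in Table 1 lies at the expected slicing position of its trigger, for both 21-nt and 22-nt triggers.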
Analysis of the regulatory function of novel phasiRNAs generated from 21-nt PHAS loci
21-nt phasiRNAs have been shown to function in the trans-regulation of target genes by cleaving mRNAs in plants; such phasiRNAs are termed trans-acting siRNAs (ta-siRNAs).
To identify novel ta-siRNAs generated from the newly found 21-nt PHAS loci, all possible 21-nt phasiRNAs were systematically predicted in silico based on the modified model of ta-siRNA biogenesis [12]. All detectable phasiRNAs were then used for target prediction with the miRU algorithm, and the predicted targets were verified using degradome HTS data (see details in “materials and methods”). As a result, we discovered ten novel ta-siRNAs generated from three newly found 21-nt PHAS loci (LOC_Os02g18750.1, LOC_Os05g43650.1 and LOC_Os06g30680.1), which together mediated forty sRNA-target interactions (Table 2, Fig. 3, Additional file 8: Figure S5). Some targets of these ta-siRNAs play important roles in plant cellular signaling cascades (LOC_Os02g39380.1) [13]. Some targets are involved in plant growth and development (LOC_Os01g34620.8, LOC_Os02g52900.2, LOC_Os04g39600.1, LOC_Os08g40440.1, LOC_Os06g23274.1, LOC_Os06g47850.1, LOC_Os11g41860.1, LOC_Os11g41860.2 and LOC_Os05g46580.1)[14–19], and others are related to plant defense and stress response (LOC_Os09g12230.1, LOC_Os04g38450.1 and LOC_Os04g49160.1) [20–22].
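As an illustration of what in silico prediction of phased positions involves, the following minimal sketch enumerates consecutive 21-nt registers downstream of a cleavage site. It implements only the simple, canonical phasing scheme with 1-based inclusive coordinates. It is not the modified ta-siRNA biogenesis model of [12], whose rules are not described in this section, and the function name is illustrative.

```python
def phase_windows(cut_site, phase_len, region_end):
    """List (register index, start, end) for consecutive phased windows.

    The first window starts at the cleavage site; each following window
    starts `phase_len` nucleotides further downstream. Windows that would
    extend beyond `region_end` are not reported.
    """
    windows = []
    start = cut_site
    index = 1
    while start + phase_len - 1 <= region_end:
        windows.append((index, start, start + phase_len - 1))
        start += phase_len
        index += 1
    return windows

# Example with the LOC_Os06g30680.1 values from Table 1:
# cut site 62, phasiRNA production region 62-208
registers = phase_windows(62, 21, 208)
```

For this locus the production region of 147 nt corresponds to exactly seven 21-nt registers, the first covering positions 62–82 and the last covering positions 188–208.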
Although the transcript of LOC_Os12g42380.1 has been identified as part of an lncRNA phasiRNA precursor[11], one novel LOC_Os12g42380.1-derived ta-siRNA was found based on our revised ta-siRNA biogenesis model[12]. LOC_Os12g42380.1 (414)21 5'D7(+) targeted a NAD-dependent epimerase/dehydratase gene (LOC_Os07g47700.1) (Table 2, Additional file 8: Figure S5), suggesting that it might be involved in plant growth, development and responses to environmental stress[23, 24].
PhasiRNAs generated from the transcripts of LOC_Os02g18750.1, LOC_Os06g30680.1 and LOC_Os12g42380.1 were detected in panicle, and the LOC_Os05g43650.1-derived phasiRNAs were detected in seedling, which suggests that these phasiRNAs are required at different stages of development. Considering the annotation of the ta-siRNA target genes (Table 2) together with functional evidence from the related literature, we propose that the OSsRNA-2-LOC_Os02g18750.1-phasiRNA, OSsRNA-3-LOC_Os05g43650.1-phasiRNA, OSsRNA-4-LOC_Os06g30680.1-phasiRNA and OSsRNA-5-LOC_Os12g42380.1-phasiRNA pathways might play crucial regulatory roles in rice growth and development.
A regulatory network of the phasiRNA pathways mentioned above was constructed based on the target information (Fig. 4).
Table 2
Targets of novel ta-siRNAs in Oryza sativa
PhasiRNA ID | PhasiRNA sequence | Targets | Target annotation | miRU start–end | PARE cutsites |
LOC_Os02g18750.1(189)21 3'D26 (+) | UGUGCCACGUCAACACCACCA | LOC_Os03g40260.1 | Regulator of chromosome condensation domain containing protein | 1676–1696 | 1687 |
LOC_Os02g18750.1(192)21 3'D25 (+) | GCGCCACUGCCGUCGACGUGU | LOC_Os02g39380.1 | OsCML17 - Calmodulin-related calcium sensor protein | 343–363 | 354 |
LOC_Os02g18750.1(204)21 3'D13 (+) | UCGACUUCGCCGCCUCGGCGC | LOC_Os02g39090.1 | expressed protein | 802–823 | 814 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os01g15520.1 | expressed protein | 1248–1268 | 1259 |
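LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os01g34620.8 | OsGrx_S15.1 - glutaredoxin subgroup II | 500–520 | 511 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os03g50070.1 | DUF1295 domain containing protein | 1195–1215 | 1206 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os04g38450.1 | gamma-glutamyltranspeptidase 1 precursor | 2137–2157 | 2148 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os04g49160.1 | zinc finger, C3HC4 type domain containing protein | 1093–1113 | 1104 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os05g03574.1 | expressed protein | 648–668 | 659 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os06g23274.1 | zinc finger, C3HC4 type, domain containing protein | 4632–4652 | 4643 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os06g47850.1 | zinc finger family protein | 97–117 | 108 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os08g19114.1 | expressed protein | 2050–2070 | 2061 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os08g40440.1 | dihydroflavonol-4-reductase | 1315–1335 | 1326 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os09g12230.1 | ubiquitin-conjugating enzyme | 1021–1041 | 1032 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os09g27500.1 | cytochrome P450 | 1714–1734 | 1725 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os11g41860.1 | OsFBX429 - F-box domain containing protein | 1030–1050 | 1041 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os11g41860.2 | OsFBX429 - F-box domain containing protein | 973–993 | 984 |
LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os12g12950.1 | expressed protein | 1071–1091 | 1082 |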
LOC_Os05g43650.1(1540)21 3'D2(-) | UUUUCCACAUUCAUAUUGAUG | LOC_Os02g45650.1 | peptidase | 1760–1780 | 1771 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os02g05810.1 | expressed protein | 1330–1350 | 1341 |
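LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os02g05810.2 | expressed protein | 1324–1344 | 1335 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os02g52900.2 | glutaredoxin 2 | 2034–2054 | 2045 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os02g53000.2 | lysM domain-containing GPI-anchored protein precursor | 1340–1360 | 1351 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os04g44590.1 | expressed protein | 651–671 | 662 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os04g44590.5 | expressed protein | 445–465 | 456 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os05g41190.1 | expressed protein | 1026–1046 | 1037 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os05g41190.2 | expressed protein | 1082–1102 | 1093 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os05g51140.1 | expressed protein | 929–949 | 940 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os05g51140.2 | expressed protein | 1586–1606 | 1597 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os09g33930.1 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1457–1477 | 1468 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os09g33930.2 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1454–1474 | 1465 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os09g33930.3 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1740–1760 | 1751 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os09g33930.4 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1453–1473 | 1464 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os09g33930.5 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1375–1395 | 1386 |
LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os12g37510.1 | UDP-glucoronosyl and UDP-glucosyl transferase domain containing | 1584–1604 | 1595 |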
LOC_Os05g43650.1(1543)21 3'D2(-) | GCAUUUUCCACAUUCAUAUUG | LOC_Os02g48390.1 | phosphoribosyl transferase | 1758–1778 | 1769 |
LOC_Os05g43650.1(1543)21 3'D3(-) | UUCACAAUGUAAGUCAUUUUA | LOC_Os04g39600.1 | fasciclin domain containing protein | 1020–1040 | 1031 |
LOC_Os05g43650.1(1543)21 3'D3(-) | UUCACAAUGUAAGUCAUUUUA | LOC_Os07g01130.1 | pentatricopeptide containing protein | 4240–4260 | 4251 |
LOC_Os05g43650.1(1543)21 3'D1(+) | AUGAAUCUAGACAUAUAUAUC | LOC_Os12g40920.1 | bZIP transcription factor domain containing protein | 1312–1332 | 1323 |
LOC_Os06g30680.1(62)21 3' D2(+) | CAUGGACAACUUCCUGCACAG | LOC_Os05g46580.1 | polyprenyl synthetase | 1365–1385 | 1376 |
LOC_Os12g42380.1(414)21 5'D7(+) | UUUCUUCCAAGAGAGAGUAAG | LOC_Os07g47700.1 | NAD dependent epimerase/dehydratase family domain containing protein | 1753–1773 | 1764 |
Analysis of the RNA-directed DNA methylation (RdDM)-regulated promoters targeted by novel 24-nt phasiRNAs
RdDM is an important regulatory mechanism that establishes repressive epigenetic modifications and can trigger transcriptional gene silencing. To analyze RdDM mediated by the novel 24-nt phasiRNAs in rice, all known rice promoter sequences were scanned for target sites of the phasiRNAs generated from the five newly found 24-nt PHAS loci. The DNA methylation status of the target promoter was further analyzed using bisulfite-seq data sets from rice panicle (GSM4232038) and root (GSM4232039). As a result, the promoter of the LOC_Os02g40860.1 gene was found to be targeted by five LOC_Os01g37325.1-derived phasiRNAs (Table 3).
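The matching criteria used for the promoter scan are not stated in this section. As a purely illustrative sketch, the scan can be pictured as a search for promoter sites complementary to a phasiRNA, with a configurable number of tolerated mismatches. The function names, the mismatch parameter and the restriction to one promoter strand are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A/C/G/U alphabet)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_sites(srna, promoter_dna, max_mismatches=0):
    """Return (start, end, mismatches) for sites complementary to `srna`.

    Coordinates are 1-based and inclusive on the given promoter strand.
    Only ungapped alignments are considered (an assumption).
    """
    # DNA sequence that would pair perfectly with the sRNA
    site = reverse_complement_rna(srna).replace("U", "T")
    promoter = promoter_dna.upper()
    hits = []
    for i in range(len(promoter) - len(site) + 1):
        window = promoter[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + len(site), mismatches))
    return hits
```

For a 24-nt phasiRNA, each reported hit spans 24 promoter positions, which matches the width of the binding sites listed in Table 3 (for example 111–134).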
CG and CHG methylation are maintained by DNA methyltransferases and histone modifications, whereas CHH methylation is associated with 24-nt siRNA-guided RdDM[25].
The CHH methylation level of the promoter was significantly higher in panicle than in other tissues (Fig. 5), which is consistent with the finding that LOC_Os01g37325.1-derived phasiRNAs were expressed only in panicle tissue (Additional file 2: Table S2). These results therefore suggest methylation-mediated transcriptional silencing of the LOC_Os02g40860.1 promoter.
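For readers unfamiliar with the three methylation contexts, the following minimal sketch classifies each cytosine of a DNA sequence as CG, CHG or CHH, where H denotes A, C or T. It considers only the strand supplied and is not the bisulfite-seq pipeline used for Fig. 5; the function names are illustrative.

```python
from collections import Counter

def cytosine_context(seq, i):
    """Classify the cytosine at 0-based index `i` as 'CG', 'CHG' or 'CHH'.

    Returns None when the context cannot be determined (cytosine too close
    to the 3' end, or an ambiguous base in the following two positions).
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    following = seq[i + 1:i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or any(base not in "ACGT" for base in following):
        return None
    return "CHG" if following[1] == "G" else "CHH"

def context_counts(seq):
    """Count cytosines per context on the given strand."""
    counts = Counter()
    for i, base in enumerate(seq.upper()):
        if base == "C":
            context = cytosine_context(seq, i)
            if context is not None:
                counts[context] += 1
    return counts
```

In a bisulfite-seq analysis, per-context methylation levels would then be computed from the methylated and unmethylated read counts at the cytosines of each class; only the CHH class is expected to reflect 24-nt siRNA-guided RdDM.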
LOC_Os02g40860.1 encodes casein kinase I1 (OsCKI1), a member of the CKI protein family. CKIs are highly conserved in eukaryotes and are believed to be involved in a variety of important biological processes, since they show broad substrate specificity in vitro[26]. The expression level of LOC_Os02g40860.1 in rice root and panicle tissues was analyzed using the RNA-seq libraries (panicle: GSM4230036 and GSM4230037; root: GSM4230038 and GSM4230039) contributed by Zhao et al. [27]. As shown in Fig. 5C, the expression level of LOC_Os02g40860.1 was significantly lower in panicle than in root. We therefore speculate that the biogenesis of LOC_Os01g37325.1-derived phasiRNAs is specifically required in panicle to regulate the transcript level of LOC_Os02g40860.1. In other words, the OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathway might be involved in panicle development in rice.
Table 3
The target promoter of LOC_Os01g37325.1-derived phasiRNAs
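24-nt phasiRNAs_ID | PhasiRNAs_sequences | Binding_sites_on_promoter | Promoter_location | Target_genes | Target annotation |
LOC_Os01g37325.1(1684) 24 5’D12(+) | AUCAUGACUUGGGUAUUACGUUUC | 111–134 | chr2_24766608–24766807 | LOC_Os02g40860.1 | Casein kinase I1 (CKI1) |
LOC_Os01g37325.1(1684) 24 5’D10(+) | AGUCCUGGUUUGAUAAGAUUGUAA | 63–86 | chr2_24766608–24766807 | LOC_Os02g40860.1 | Casein kinase I1 (CKI1) |
LOC_Os01g37325.1(1684) 24 5’D9(+) | AGUAGAUUUAGGAAACCGAUACCG | 39–62 | chr2_24766608–24766807 | LOC_Os02g40860.1 | Casein kinase I1 (CKI1) |
LOC_Os01g37325.1(1665) 24 5’D13(+) | ACUAGUUAUAGGGGAUAACUUAUA | 154–177 | chr2_24766608–24766807 | LOC_Os02g40860.1 | Casein kinase I1 (CKI1) |
LOC_Os01g37325.1(1665) 24 5’D11(+) | GACUUGGGUAUUACGUUUCCCUGU | 106–129 | chr2_24766608–24766807 | LOC_Os02g40860.1 | Casein kinase I1 (CKI1) |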