sRNAs longer than 24 nt found in plant NGS libraries are too often completely disregarded or filtered out during research, presumed to be unimportant degradation products in bioinformatic pipelines. However, sRNAs of non-canonical length may play important roles in gene regulation. In small RNA libraries obtained from somatic embryonic tissues of P. abies, we detected a longer fraction of sRNAs of 31–34 nt, not previously described in plants (Yakovlev and Fossdal, 2017). Our current analysis we report that within the RNAs range the most prevalent size was 24 nt sRNAs (comprising 21%), followed by 21 nt sRNAs (16%), while the 32–33 nt sRNA comprise 11% of the total reads inP. abies somatic embryos. The 32–33 nt sRNA are statistically overrepresented in embryos compared to buds and they were differentially expressed in embryos between different epitype inducing conditions. We here denote these 32–33 nt sRNA as long small RNA (lsRNA). The lsRNAs were present in seeds from all 10 gymnosperms and the angiosperm Arabidopsis thaliana examined, suggesting that these are conserved between all seed plants (gymnosperms and angiosperms) during evolution.
Different gymnosperm orders had lsRNAs of slightly different prevalent lengths.Pinales seed samples had peaks at 31 and 34 nt, inGnetales one at 34 nt, G. biloba at 33 nt and, in C. revoluta, lsRNAs appear shorter with peaks at 29 and 31 nt. It is curious that the lsRNAs prevalent lengths match with the phylogenetic position in gymnosperm orders of the gnepine hypothesis (Wang and Ran, 2014). The most basalCycadales have the shortest lsRNAs, then length and variety increase towards the extantPinales andGnetales sister groups. That this finding fits with the gnepine hypothesis might be a coincidence but may also reflect sRNA related processing machinery adaptations affecting lsRNA production that has occurred during and after the evolutionary divergence of these orders. Among the 9 gymnosperm species analyzed, the most highly expressed lsRNAs differed between species, which resonated with the conclusions of Chen et al. (2018), who noted that lineage-specific sRNAs were likely to perform specific functions in certain plants or groups (Chen et al., 2018).
In the seed of the angiosperm A. thaliana, we found analogous sized sRNA with a peak at 31 nt (putative lsRNA). This finding was a surprise, since theA. thaliana small RNAs was apparently already well described and with just a few reports noting the existence of 30–40 nt sRNAs, only in response to treatment with pathogenic bacteria (Katiyar-Agarwal et al., 2007). These authors denoted them as long siRNAs (lsiRNAs). One of these lsiRNAs, AtlsiRNA-1, represents a natural antisense transcript (natsRNA) specifically induced in response to the bacterium Pseudomonas syringae that carries the effector avrRpt2 (Katiyar-Agarwal et al., 2007). The fact that longer than 30nt sRNAs are typically removed during sRNA library preparations or filtered out during analysis, may be part of the reason that this 32 nt peak has not received attention inA. thaliana seed studies. Despite this omission, we have described for the first time a series of lsRNA tags that are present in gymnosperms and angiosperms that point out the potential importance of lsRNA in these taxa. Taking into account that gymnosperms and angiosperms diverged from a common ancestor ca. 500 million years ago (Vergara-Silva et al., 2000) and the living orders of gymnosperms separated from each other 130 million years ago (Chaw et al., 2000), it is striking that more than 200 specific lsRNA were found in all the 10 species analyzed and more than 250 were shared among the 9 gymnosperms. This finding might reflect the fact that highly conserved sRNAs tend to play key roles in regulation of plant growth and development (Zeng et al., 2019).
The most well studied examples of longer sRNAs are PIWI-interacting RNAs (piRNAs)(Ishizu et al., 2012; Sato and Siomi, 2013) and tRNA-derived sRNAs (Röther and Meister, 2011). piRNAs are usually between 26 and 31 nt long, are mainly derived from transposons, retroelements, and repetitive loci, and only present in germline cells. piRNAs are characterized by the presence of a 5´ uridine and their association with PIWI proteins, the germline specific Argonaut subfamily (Sato and Siomi, 2013). Based mainly on their length, we initially interpreted the 32 nt peak lsRNA as possible plant analogs of the animal piRNAs (Yakovlev and Fossdal, 2017). However, we were unable to identify bona fide PIWI proteins homologs (Nonomura et al., 2007; Zheng et al., 2019) among the defined AGO gene models in the gymnosperm P. abies. Therefore, we should reject our initial hypothesis and now conclude that lsRNAs are not plant analogs to animal piRNAs.
Despite the lack of Piwi proteins in plants we were still able to identify a large set of proteins similar to those involved in piRNA biogenesis in animals (Ishizu et al., 2012). We observed that for 6 of 16 piRNA biogenesis-like the transcripts, levels differed inP. abies embryos (also rich in lsRNAs) compared to needles (with very low lsRNA levels). These gene products could potentially be involved in a yet to be undescribed sRNA biogenesis pathway analogous to, or completely different, to that reported for animal piRNAs.
The exact genomic origin of lsRNAs is enigmatic. Here we predict that lsRNAs derive from varied genomic regions. Direct mapping onto the Norway spruce genome hints at several possible sources that include coding parts of transcripts, introns, longer non-coding RNAs and from specific tRNAs.
Some lsRNA matches are grouped into clusters on the genome and often overlap with predicted siRNAs and miRNAs loci. Therefore, some lsRNAs may be companion products of miRNA and TAS loci or alternatively be specific products from these loci.
At least one third of the lsRNAs have strong similarity to and could originate from tRNAs. Considering that the typical tRNA is 70–90 bp in length, the 28–37 nt size was much smaller than canonical tRNA sequences. It has been reported that the difference between tRF/tRNA-halves and degraded tRNA species shows the position bias, with either 5´ or 3´ of tRNA part observed (Sharma et al., 2016).
tRNA-derived sRNAs include: ~18–28 nt small tRNA-derived fragments (tRFs) and ~ 30–35 nt tRNA-derived halves (tRHs), generated either from pre-tRNAs or mature tRNAs (Martinez, 2018; Zhu et al., 2018). tRDs and tRHs were initially described in animals, including humans (Röther and Meister, 2011). Such sRNA fragments are pronounced in yeast during oxidative stress and similar tRNA fragments were also observed in human cells and plants during oxidative stress using Northern blot analysis (Thompson et al., 2008). The tRHs size fits into the observed length of our lsRNAs. The tRHs can be generated from pre-tRNAs and tRNAs, by cleavage of the anticodon loop, which yields 5′tRHs and 3′tRHs products (Zhu et al., 2018). Zhang et al. (2009) purposely isolated and cloned 30–90 nt RNAs from the phloem sap of pumpkin and discovered several tRH such as 5′tRH-GlyGGA, 5′tRH-MetAUG, 3′tRH-AspGAC, 3′tRH-LysAAG, 3′tRH-GluGAG, 3′tRH-SerAGC, 3′tRH-SerACA, 3′tRH-ProCCT, and 3′tRH-ArgCGA. Similar tRNA-derived small RNAs (tsRNAs) were also identified in rice embryogenic callus, where the 5′tRF from AlaAGC-tRNA was the most abundant, comprising 82% of the total tsRNA pool (Chen et al., 2011). We found similar quantities in this study for both gymnosperms and the angiosperm. InA. thaliana we found the highest ratio of AspGUC, with the second highest being GlyCCC, GlyGCC, GlyUCC and HisGUG and GluUUC tRH-like lsRNAs. InP. abies embryos, the most abundant lsRNAs matched to AspGUC, followed by GlyCCC, GlyGCC, GlyUCC and HisGUG. Asp(GUC) and Gly(GCC) are known as substrates of DNMT2 (Goll et al., 2006). Drosophila Dnmt2 and human DNMT2 are potential RNA methyltransferases (Schaefer et al., 2010; Li et al., 2021).A. thaliana homologs of DNMT2 and NSUN2 are also responsible for tRNA methylation, in particular, Asp-, Gly-, Glu-, and Ser-tRNAs (Burgess et al., 2015). This might suggest that covalent modification in the RNA molecules is involved in the processing of these putative tRNA-derived sRNA or conversely linked to the regulation of RNA methylation. Hence, the tRNA-derived lsRNA population in seeds was highly position-biased, implying that, potentially, most of these tRNA-derived lsRNAs were specifically generated. Due to the size of the 28–37 nt population, these sRNA may best be classified as tRNA halves.
The tRDs and tRHs may have functional roles (Cao, 2023; Wang et al., 2023). Both tRFs purified fromA. thaliana leaves and specific synthetic tRFs deriving from the 5ʹ extremity ofA. thaliana tRNAAla and tRNAAsn efficiently inhibit protein synthesis in vitro. tRFs likely act as general modulation factors of the translation process in plants (Lalande et al., 2020). Recent studies have shown that methylation of RNA by DNMT2 and NSUN2 is involved in RNA transportation from shoot to root inA. thaliana (Yang et al., 2019). Moreover, a rice homolog of Nsun2 appears to be responsible for high-temperature adaptation through translation regulation (Tang et al., 2020). tRHs may be stress- inducible both in animals and plants (Torrent et al., 2018; Hajieghrari and Farrokhi, 2022). The tRNA-derived small RNAs (tsRNAs)-mediated target silencing was further found to be dependent on AGO2 and the fact that 5′ tsRNAs were co-immunoprecipitated with AGO2 suggests that tsRNAs inhibit specific targets via RNA interference (RNAi) (Kumar et al., 2014; Shigematsu and Kirino, 2015; Jehn and Rosenkranz, 2019). Thus, tRNA-derived lsRNA might play a role in adaptive responses to environmental cues and specific stress conditions.
We found that lsRNAs were present naturally in somatic and zygotic embryos of plant seeds, but less abundant in vegetative tissue; however, no functional analysis was made in this study. To obtain insights into their putative role, we studied lsRNAs level changes during embryogenesis, a time when it is known thatP. abies embryogenic tissues are highly sensitive to changing environmental conditions (Kvaalen and Johnsen, 2008). We searched for a possible role for lsRNAs in epigenetic mechanisms by analyzing differentially expressed lsRNAs (DES) patterns in mature somatic embryos cultivated under contrasting epitype-inducing (EpI) temperature conditions. InP. abies epitype embryos, we found 1130 DESs between cooler and warmer EpI conditions grouped into 18 clusters with similar expression profiles. The EpI conditions resulted in a very clear change in lsRNA levels. Such change in expression highlights their dynamic nature, implying direct or indirect involvement in transcriptional and/or translational regulation and, possibly, in epigenetic mechanisms during embryogenesis.
As there are no tools designed specifically for prediction of targets for slRNAs, we tried to use psRNATarget, developed for miRNAs. It does not mean that lsRNAs function the same way as miRNAs, but help to define regions complementary to slRNA sequences, which could be targeted by lsRNAs. The target prediction of lsRNAs showed that only a small number of knownP. abies andA. thaliana gene models could be regulated directly by lsRNAs. Target prediction among repeat and transposable elements was modest. In total, there were 60 targets among the transposable elements. However, the TE databases contain mainly angiosperm species so our search has probably underestimated the TE and repetitive element hits by lsRNAs in gymnosperms. Despite the high degree of conservation of lsRNAs between plants, this finding points to an important role in biological regulation, although their exact role is yet to be determined. In addition, the very high predicted redundancy of lsRNAs means that one single target could potentially be regulated by many lsRNAs. Functional redundancy is thought to introduce a robustness that buffers the deleterious effect of mutations hitting one of the redundant copies (Ambrós et al., 2018).
The possible functional importance of lsRNAs is supported by their presence across different phyla that may have been evolutionary separated for up to 500 million years, namely between angiosperms and gymnosperms. In addition, lsRNAs are differentially expressed in matureP. abies somatic embryos growing under different epitype-inducing temperature conditions. Of course, not all small RNAs are necessarily functionally important and some lsRNAs may be transitory degradation intermediates or simply by-products of aberrant transcription. This being said, we do not rule out the possibility that lsRNAs may have a functional role during propagation, particularly in meristematic cells, but functional studies are needed to confirm or refute such a proposed role.
In conclusion, this is the first detailed in-silico survey of lsRNA, a class of embryo-related 28–37 nt sRNAs, suggesting that at least some of these are conserved between gymnosperm and angiosperm plants. lsRNAs have a prevalent length of 31–32 nt and we recommend that the sRNA fraction of greater than 24 nt in size should be considered potentially functional and not be filtered out or ignored in future RNA analyses. These long small RNAs add to the complexity of the small RNA world and, since they are differentially expressed during epitype inducing conditions during embryogenesis, some may have a direct or indirect role in response to environmental change and in epigenetics.