Two small RNA populations (27nt and 31nt) are identified in Entamoeba
To identify the complete spectrum of sRNA species in Entamoeba, including those that may have diverse structures or modifications or that may be less abundant, we extensively explored the endogenous sRNA populations in E. histolytica by sequencing total RNA fractions (15-45nt) from wild-type E. histolytica trophozoites. We fractionated total RNA into two size fractions (15-30nt and 30-45nt), and RNAs recovered from both were cloned by a 5′-P independent cloning method (using Tobacco acid pyrophosphatase (TAP) to convert 5′-polyP into 5′-monoP) (Suppl. Table I). Although we previously reported similar sRNA libraries, they were small-scale, used Sanger sequencing or pyrosequencing approaches, or covered only the 15-30nt range [21, 23]. The goal of this study was to use the current Illumina deep sequencing platform to provide a full account of Entamoeba sRNAs, including potential sRNA species that have modifications.
The sRNA size distribution of the two libraries (15-30nt and 30-45nt) cloned by the TAP method is shown in Fig. 1A. We observed only one sRNA population (a sharp 27nt peak) in the 15-30nt library, which matched previous results [23]. However, in the 30-45nt library, we identified two sRNA populations (peaks at 27nt and 31nt). The 27nt peak is likely a carry-over of the abundant 27nt population, but the peak at 31nt was unexpected and new to us. We characterized and mapped the sRNA sequences from both libraries using a custom data processing pipeline (Suppl. Figure 1). The unique reads were mapped to tRNA and rDNA sequences using Bowtie [27]; the remaining reads were aligned to the amebic LINEs (Long Interspersed Nuclear Elements), genome and transcriptome. This analysis revealed that most reads in the 27nt peak can be mapped to the genome, but that sRNA reads in the 31nt peak do not map to the genome (Table I, Fig. 1A). To understand why the 31nt sRNAs could not be mapped, we plotted the nucleotide frequency at each position for the unmapped reads and identified a prominent oligo-A tail at the 3′-end of the 31nt reads, which prevents this population from mapping to the genome (Fig. 1B). To map the 31nt sRNA reads, we clipped the sequence reads after position 27 using a custom Python script and then re-mapped them to the genome. The clipped sequences now mapped to the genome, indicating that 3–4 non-templated As were added to existing 27nt sRNAs (Table I). Of note, reads mapped to LINEs make up almost 28% of these 31nt sRNAs compared with < 10% in the 27nt peak (non-modified populations), indicating that retrotransposons may be influenced by non-templated oligo-A sRNAs.
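The clipping step described above can be illustrated with a minimal sketch. This is not the custom script used in the study; the function names and the example sequence are hypothetical, and the sketch only assumes that reads are plain strings and that the tail is whatever follows position 27.

```python
def clip_read(read: str, keep: int = 27) -> str:
    """Keep the first `keep` nucleotides; shorter reads are returned unchanged."""
    return read[:keep]


def trailing_a_count(read: str) -> int:
    """Count the consecutive A residues at the 3' end of a read."""
    return len(read) - len(read.rstrip("A"))


# Hypothetical 27 nt core sequence followed by a 4 nt oligo-A tail (31 nt total).
core = "GTTCAGGATCCATTGACCTAGGCATTC"
tailed = core + "AAAA"

clipped = clip_read(tailed)      # the sequence that would be re-mapped
tail_len = trailing_a_count(tailed)
print(len(tailed), len(clipped), tail_len)
```

Clipping at a fixed position rather than stripping all terminal As avoids removing genome-encoded As at the end of the 27nt core, at the cost of assuming that the core length is exactly 27.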
Table I. Genomic categories that are mapped by sRNA reads by size-fractionated total RNA libraries (TAP cloning method).
We followed the same mapping strategy used previously [21, 22]. Raw reads from Illumina sequencing were first sorted by barcodes, then linker sequences were removed and unique reads were generated by the uniq tool. Bowtie alignments (-v1) were used for mapping to tRNAs, rRNAs, repetitive elements, and the genome including ORFs.
Libraries were made using the 5′-P independent (TAP) cloning method. As shown, most reads in the 15-30nt library can be mapped to the genome (84%); however, fewer reads in the 30-45nt library can be mapped to the genome (50%), and these are mostly 27nt sRNAs. This led to the discovery of the non-templated oligo-A sRNA population. The column “3′ end trimmed” shows the remapping of the non-mapped reads in the 30-45nt library after trimming off the oligo-A at the 3′-end. Most reads after trimming (72%) can be mapped to the genome. Listed are mapped reads with the calculated percentages shown in parentheses.
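The linker removal and read collapsing steps in this legend can be sketched as follows. This is an illustration only: the linker sequence below is a placeholder rather than the adapter used in the study, and the helper names are hypothetical. Reads without a recognizable linker are simply discarded here, which is one possible policy among several.

```python
from collections import Counter

# Placeholder 3' linker; the actual adapter sequence is not given in the text.
LINKER = "CTGTAGGCACC"


def strip_linker(read, linker=LINKER):
    """Return the insert upstream of the linker, or None if no linker is found."""
    idx = read.find(linker)
    return read[:idx] if idx != -1 else None


def collapse(reads):
    """Collapse linker-stripped reads into unique sequences with their counts."""
    inserts = (strip_linker(r) for r in reads)
    return Counter(s for s in inserts if s)


raw = ["ACGT" + LINKER, "ACGT" + LINKER + "TT", "GGGA" + LINKER, "TTTT"]
unique_reads = collapse(raw)  # keys are unique reads, values are total read counts
print(unique_reads)
```

Keeping the counts alongside the unique sequences allows both unique-read and total-read statistics to be derived from the same table.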
To determine whether the sRNA species overlap between the 27nt and 31nt populations, we aligned the trimmed 31nt sRNAs directly against the 27nt sRNA reads and found that most (85%) of the trimmed 31nt reads can be mapped to the 27nt reads, indicating a high overlap between the two sRNA populations. Consistent with this, both sRNA populations target the same set of genes, and the numbers of unique sRNAs mapped to these genes in the two datasets are correlated (Suppl. Figure 2A). However, the abundance of individual sRNAs cloned in each population is not well correlated (Suppl. Figure 2B), indicating that the abundance of individual sRNAs within the two populations may be regulated differently within the cell.
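The overlap measurement can be approximated with a simple set comparison. The sketch below uses exact sequence identity as a simplification, whereas the analysis in the text used alignment; the function name and example reads are hypothetical.

```python
def overlap_fraction(query_reads, reference_reads):
    """Fraction of unique query reads that also occur in the reference set."""
    query = set(query_reads)
    reference = set(reference_reads)
    if not query:
        return 0.0
    return len(query & reference) / len(query)


# Toy example: three of four trimmed reads are present in the reference library.
trimmed_31 = ["AAA", "CCC", "GGG", "TTT"]
reads_27 = ["AAA", "CCC", "GGG"]
print(overlap_fraction(trimmed_31, reads_27))
```

An exact-match comparison will underestimate the overlap relative to an aligner that tolerates mismatches or partial matches.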
We then selected a few probes for sRNAs that were cloned in both the 27nt and 31nt populations for confirmation by Northern blot analysis. Northern blots of total RNA with these probes showed the two expected sizes of sRNAs (Fig. 1C). In addition, we found that both sRNA populations are sensitive to capping enzyme but resistant to Terminator enzyme (Fig. 1D), indicating that both populations have a 5′-polyP structure. Taken together, the sequencing data and Northern blot analyses confirm that E. histolytica contains 27nt as well as 31nt sRNAs, and that the latter are formed by the addition of 3 or 4 non-templated As to existing 27nt sRNAs.
Both small RNA populations (27nt and 31nt) are unchanged during development of Entamoeba invadens
Entamoeba invadens is a reptilian parasite that is used as a model to study amebic development in vitro [28, 29]. Previously, we sequenced the 27nt sRNA population in E. invadens parasites and mapped these sRNAs to ~ 700 genes with low expression levels [22]. We also adapted the trigger gene silencing approach to this parasite and induced specific gene silencing for the targeted genes [30], indicating that the 27nt sRNA populations and the silencing mechanism are conserved in these two amoebic species. We and others have shown that many genes are developmentally regulated during encystation and excystation [31, 32]; however, genes with antisense sRNAs appear not to be developmentally regulated, as sequencing and mapping of the 27nt population at four developmental time-points showed identical gene composition [22]. We first sought to check whether the 31nt population is also present in E. invadens. Total RNA samples from trophozoites, 72hr encysted parasites, and parasites after 8hr excystation were radioactively labeled and run on a denaturing 15% polyacrylamide gel. Two sRNA bands can be easily detected at 27nt and 31nt sizes (Suppl. Figure 3A), indicating that E. invadens has both sRNA populations. To sequence the 31nt sRNA population, we size-fractionated 30-45nt RNA and made sRNA libraries using the TAP method for all three samples. Similar to the observation in E. histolytica, the size distributions of these libraries all showed 27nt and 31nt peaks, and the 31nt peak reads cannot be directly mapped to the genome (Fig. 1E). Nucleotide compositions of the 31nt population clearly show an oligo-A tail (Suppl. Figure 3B). Genome mapping of these three libraries as well as mapping of their tail-clipped sequences are shown in Suppl. Table II. Thus, we conclude that E. invadens also contains an sRNA population with a non-templated A-tail.
Using a similar approach as outlined previously [22], we analyzed the genes that are mapped by sRNAs from the 31nt populations in the trophozoite, 72hr encystation, and 8hr excystation libraries. The overlap is significant, as shown in Suppl. Figure 3C, indicating that the developmental process does not affect these genes, which matches previous results with sRNAs from the 27nt population. In summary, the endogenous genes with antisense sRNAs are “locked” for silencing during development, which is reflected in both the 27nt and 31nt populations.
The relative abundance of two sRNA populations indicates gene silencing efficiency in trigger gene silencing transfectants
Small RNA modification events have been documented in several model systems [10, 11], including oligo-uridylation of siRNAs and miRNAs in plants and algae [33], adenylation of miRNAs in humans [34], and recently adenylation of siRNAs in yeast [35]. In these systems, the modification leads either to sRNA degradation (uridylation) or to miRNA protection (single adenylation). We sought to explore the possibility that sRNA adenylation is a turnover pathway for sRNA in amoeba, similar to adenylation of siRNAs in yeast. We used cell lines of trigger gene silencing transfectants that we previously developed in the lab [36–38] and probed for sRNAs using gene-specific probes. Northern blots in Fig. 2A show that two bands corresponding to the 27nt and 31nt sRNA populations can be detected for each targeted gene. We also observed that the relative abundance of the two sRNA populations is indicative of whether or not the targeted gene is silenced: for the 19T-EhROM1 cell line, in which the EhROM1 gene is silenced (Fig. 2B), there are much higher levels of the 27nt population than the 31nt population. In contrast, for the 19T-EhAgo2-2 cell line, in which the Ago2-2 gene is not silenced (Fig. 2B), there is more of the 31nt population than the 27nt population. In addition, we tested a 19T-induced gene silencing line for a non-RNAi pathway gene (19T-Eh136160 (calreticulin precursor, putative)) and observed a similar phenomenon, in which the targeted gene is not silenced but a prominent 31nt sRNA population is detected (Fig. 2B). The control probes (EHI_164300 and EHI_125400) used in the Northern blots demonstrated that endogenous silenced genes have more 27nt population signal. These results may suggest that parasite cells have a homeostasis mechanism to balance the turnover of the 27nt sRNA population.
In cases where a targeted gene is silenced, 27nt sRNAs can accumulate and be used as silencing effectors; however, when a targeted gene is not silenced, most 27nt sRNAs are converted into a modified form with oligo-As, possibly leading to further degradation. As indicated by the Argonaute PAZ crystal structure, the PAZ domain binds 3′-adenylated RNAs poorly [39]. We have shown that EhAgo2-2 binds abundant 27nt sRNA populations and that mutating its PAZ domain abolishes sRNA binding [26]. Once modified with oligo-As, 31nt sRNAs are presumably in the process of dissociating from EhAgo2-2. Loss of the protection conferred by Ago binding may lead to degradation.
The three EhAgo proteins all bind to 27nt sRNAs
We recently reported that three E. histolytica Ago proteins have distinct subcellular localizations and demonstrated that the PAZ domain of each EhAgo controls sRNA binding [26]. To further characterize the sRNA populations that bind to each EhAgo, we used Myc-tagged EhAgo overexpression lines and anti-Myc IP pulldown to isolate RNAs associated with each Ago (Fig. 3A). For EhAgo2-2, a distinct 27nt sRNA population was noted, as has been previously published [23]. For EhAgo2-1, the sRNAs were much less abundant and seen as a faint smear at 20-30nt range. For EhAgo2-3, two sRNA populations at 27nt and 21nt were observed.
The specificity of Myc IP for each Ago was verified by additional controls. First, IP using control beads (anti-HA) showed no signal at the sRNA range when compared with anti-Myc IP for each EhAgo (Suppl. Figure 4A). Second, we used Western blot analysis to demonstrate that each EhAgo has specific Myc signal at the expected sizes which is absent in the control IP (Fig. 3B). The same membrane blot was stripped and probed using anti-EhAgo2-2 antibody; this demonstrated that the EhAgo2-2 was only identified in the EhAgo2-2 IP but not in the IP of EhAgo2-1 or EhAgo2-3, indicating that each IP is specific without cross-contamination with EhAgo2-2 (Fig. 3B). Of note, EhAgo2-2 is the only protein that is abundant enough to be detected in wildtype cell lysates by Western blot analysis, the other two EhAgos are at low level of expression which can only be detected from overexpressing cell lines. Hence, we could not easily test the other two Ago proteins for cross-contamination [26]. Given that EhAgo2-2 is the most abundant Ago protein in Entamoeba and has the most abundant population of associated sRNAs, the ability to exclude its potential co-IP in EhAgo2-1 and EhAgo2-3 pull-down was important. Previously, we showed that the three EhAgos have distinct subcellular locations and that mutations in the PAZ domains can abolish sRNA binding specifically for each Ago [26]. Finally, we demonstrated that the sRNA population bound to each Ago is not affected by various high salt concentrations used in the IP wash (Suppl. Figure 4B), which indicates that each Ago firmly binds the associated sRNA population. Based on these data, we concluded that the sRNA profile identified in Fig. 3A with each EhAgo IP is specific to the given EhAgo protein being studied.
The sRNAs bound to the three EhAgo proteins have 5′-polyP structure
We have previously shown that sRNAs bound to EhAgo2-2 have a 5′-polyP structure [23], a feature similar to the 22G sRNA found in C. elegans and Ascaris [24, 25]. To determine if sRNAs bound to EhAgo2-1 and EhAgo2-3 also have a similar 5′-polyP structure, we performed an RNA capping assay [23, 24]. Using capping treatment, we show that the 27nt sRNAs associated with both EhAgo2-1 and EhAgo2-3 shifted in size by one nucleotide; however, the smaller size RNAs (18-24nt) within the same sample were unchanged with the capping assay (Fig. 3C). Overall, the data indicate that 27nt sRNAs that associate with EhAgo2-1 and EhAgo2-3 have a 5′-polyP structure whereas the lower size sRNAs do not. In order to define the 5′-structure for lower size sRNAs, EhAgo2-3 IP sRNA samples were labeled at the 5′-end using either T4 polynucleotide kinase (PNK) or calf intestinal phosphatase (CIP) + T4 PNK (Suppl. Figure 4C). The signal for PNK labeling can be seen for lower size band but not for upper 27nt band; however, as expected, both upper and lower bands can be seen by CIP + PNK labeling, indicating that the lower size sRNAs likely have 5′-OH structure; thus, these sRNAs may arise from an RNA degradation process.
Our capping assay and 5′-end labeling analysis indicated that the three EhAgos are all loaded with 5′-polyP sRNAs. The sRNA loading to redundant Agos has been seen in other systems. In C. elegans, 5′-polyP 22G sRNAs are loaded into worm-specific WAGOs (18 members) with semi-redundancy [40]. In human cells, all four Ago proteins are loaded with miRNAs and siRNAs in a non-distinguishable manner [41].
Characterization of sRNA populations bound to three EhAgo proteins
For a better understanding of the sRNAs associated with all three EhAgos, we performed high throughput sequencing of sRNA libraries generated from anti-Myc IP RNA samples. We combined three independent biological samples of anti-Myc IP RNAs for each EhAgo line. Using these samples, we made sequencing libraries using two separate enzymatic treatments (TAP and RNA 5´-pyrophosphohydrolase (RppH)) to convert 5′-polyP into 5′-monoP. A total of six sRNA libraries (three with each enzyme treatment) were constructed using a 5′-P independent cloning method (this approach is able to clone both 5′-polyP and 5′-monoP RNA species). All samples are listed with the sequencing depth of total reads and unique reads (Suppl. Table I). The RppH IP libraries were sequenced more deeply than TAP IP libraries, with depth of approximately 2 million reads among three EhAgos, which generated approximately two times more unique reads than TAP IP libraries. We therefore relied on the RppH sequencing libraries for the majority of the analysis presented below; the analysis of TAP sequencing libraries is also provided as Suppl. Table III; it is substantially similar. We also included one important EhAgo2-2 mutant, EhAgo2-2△NLS−DR, which lacks a C-terminal putative NLS and the DR-rich motif region. The mutant protein is localized to the cytoplasm, in contrast to the nuclear localization of wild-type protein, and had similar sRNA binding of 27nt sRNA comparable to the wild-type EhAgo2-2 [26].
The overall mapping of Ago-bound sRNAs to the Entamoeba genome is shown in Table II. The percentages of reads that map to tRNA and rRNA are at similar levels among the three EhAgo libraries (0.5-1% for tRNA; 13–19% for rRNA), and these reads are predominantly in the sense orientation. tRNA and rRNA reads are present in almost all sRNA sequencing libraries published to date and are often considered partial degradation products of these highly abundant RNA species in the cell, with a few exceptions [42, 43]. Of note, the mutant EhAgo2-2△NLS−DR IP library had fewer rRNA reads (3.7%) than the wild-type EhAgo2-2 IP library (13.3%). This may be due to the localization change of the mutant protein (cytoplasmic rather than nuclear).
Table II. Genomic categories that are mapped by sRNA reads by EhAgo IP libraries.
The genomic mapping strategy is the same as in Table I. Listed are mapped reads in each category with the calculated percentages shown in parentheses. LINE-mapped reads are much fewer in EhAgo2-2 than in the other two EhAgos and EhAgo2-2△NLS−DR.
The E. histolytica genome is highly populated with retrotransposons and repeat elements including LINEs, SINEs (Short Interspersed Nuclear Elements), and EREs (Entamoeba Repeat Elements) [44]. There are thousands of copies of EhLINEs in the genome, but they are considered “inactive” as genome sequencing has found no single copy of a LINE which has a complete open reading frame (ORF) [45]. Our sequencing datasets for the three EhAgo sRNA libraries have few reads that mapped to SINEs and EREs, however there are substantial sRNA reads that mapped to LINEs. Among the three EhAgo proteins, EhAgo2-2 had significantly lower amounts of LINE-derived sRNAs (2.5% with EhAgo2-2; 9.9% with EhAgo2-1 and 6.8% with EhAgo2-3) (Table II). For the mutant EhAgo2-2△NLS−DR, we observed a higher percentage of LINEs reads compared to the wild type EhAgo2-2 (10.7% vs. 2.5%).
The largest category (40%) of reads that mapped to the genome belongs to ORFs, indicating that gene coding regions are a second major source of endogenous sRNAs in Entamoeba. We categorized the genes to which the sRNAs map using both a cutoff (≥ 20 sRNAs map to a gene) and the antisense/sense ratio (Antisense (ratio > 2), Mixed (ratio 0.5-2) and Sense (ratio < 0.5)). As seen in Table III, the number of ORFs in the Antisense group is the largest among the three categories for all three EhAgo proteins, and these overlap by a set of 226 ORFs. Both TAP and RppH IP libraries rendered very similar results in terms of sRNA-mapped genes, indicating that the two different sRNA treatments work equally well and that the sequencing depth used in this study is sufficient to identify the core ORFs targeted by sRNAs. As seen previously, our data for all three EhAgo-bound sRNA libraries further demonstrated that genes with antisense sRNAs have very low expression levels and that the distribution of the antisense reads is biased to the 5′-end of genes (Suppl. Figure 5A and 5B). Lastly, we used the sequences from the EhAgo IP libraries to determine whether E. histolytica antisense sRNAs have a “phased” feature. We checked the first 540 bp region of each ORF for mapped sRNA reads under a 27 bp window starting from the ATG. The resulting frequency for each position (1–27) was plotted (Suppl. Fig. 6). There is no apparent phased register for antisense sRNAs in Entamoeba, indicating that these sRNAs are not from Dicer processing.
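The phasing check described above can be sketched as a tally of read start positions modulo the window size. The function below is a hypothetical illustration, not the analysis code used in the study; it assumes that each read is represented by the 0-based offset of its start relative to the A of the ATG, and how that anchor is defined for antisense reads is left as an assumption.

```python
def phase_register_counts(start_offsets, window=27, region=540):
    """Tally read starts by register (1..window) within the first `region` bp."""
    counts = {register: 0 for register in range(1, window + 1)}
    for offset in start_offsets:
        if 0 <= offset < region:  # ignore reads outside the analyzed region
            counts[offset % window + 1] += 1
    return counts


# Offsets 0, 27 and 54 share register 1; offset 1 falls in register 2;
# 600 and -3 lie outside the first 540 bp and are skipped.
counts = phase_register_counts([0, 27, 54, 1, 600, -3])
print(counts[1], counts[2])
```

A phased population would concentrate counts in one register, whereas a roughly flat profile across registers 1–27 argues against phasing.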
Table III. Antisense sRNAs mapped genes overlap among three EhAgo IP libraries (TAP and RppH) and size-fractionated total RNA libraries (TAP).
We used a cutoff of ≥ 20 unique sRNAs mapping to a gene. We further divided targeted genes based on the AS/S ratio into Antisense (ratio > 2), Mixed (ratio 0.5-2) and Sense (ratio < 0.5) groups. Listed are the numbers of genes for each group. As shown, both TAP and RppH treatments rendered very similar gene sets for the three EhAgos. The sequencing depth of the RppH IP libraries is 4-fold deeper than that of the TAP IP libraries (Suppl. Table I), indicating that (A) the sequencing depth in this study is sufficient to identify the core ORFs targeted by sRNAs and (B) both enzymatic treatments are effective. In addition, genes identified by the two size-fractionated total RNA libraries that were cloned using the 5′-P independent cloning method (TAP) are listed. In all libraries, the Antisense group contains the largest number of genes, and these genes overlap significantly among the EhAgo IP libraries as well as the total RNA libraries.
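The grouping rule in this legend can be expressed as a short classifier. The sketch below makes three assumptions that are not stated in the text: the ≥ 20 cutoff is applied to the combined antisense and sense counts, genes with no sense reads are placed in the Antisense group, and the Sense group is taken as ratio < 0.5, the complement of the Mixed range. The function name is hypothetical.

```python
def classify_gene(antisense, sense, min_reads=20):
    """Assign a gene to Antisense, Mixed or Sense based on its AS/S ratio."""
    if antisense + sense < min_reads:
        return None  # below the unique-sRNA cutoff; gene is not classified
    if sense == 0:
        return "Antisense"  # ratio is undefined; treated as fully antisense
    ratio = antisense / sense
    if ratio > 2:
        return "Antisense"
    if ratio < 0.5:
        return "Sense"
    return "Mixed"


print(classify_gene(30, 5), classify_gene(10, 10), classify_gene(3, 40))
```

With these boundaries, ratios of exactly 0.5 or 2 fall in the Mixed group.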
On a genome-wide scale, we used the Cuffdiff algorithm [46] to check if there are intragenic regions to which sRNAs from the three EhAgo IP libraries map differentially. This is an approach similar to our genome-wide RNA-Seq study for identifying loci with differential mapping of mRNAs [22, 32]. Pair-wise comparisons among the three EhAgo libraries identified only a small number of loci with differential mapping of Ago-associated sRNAs. For example, 64 significant differences out of 2,225 loci with mapped sRNAs were identified in the comparison between EhAgo2-1 and EhAgo2-2, and 51 out of 1,734 loci were identified in the comparison between EhAgo2-3 and EhAgo2-2. These results again indicate that sRNAs bound to each of the three EhAgos have very similar targets throughout the genome. Based on our previous work, this core set of silenced genes remains silent even under stress conditions [22]. We speculate that E. histolytica may utilize these EhAgo in a redundant manner for silencing of endogenous genes.
The sRNAs that bind EhAgo proteins are 27nt in size and have a 5′-G bias
For the three Ago-associated sRNA libraries, we determined the size distribution of sRNA cloned from both the TAP and RppH methods and found that they are similar (Fig. 4A and Suppl Fig. 7A), indicating both enzymatic treatments made no difference in converting these 5′-polyP sRNAs for library cloning. The 27nt sRNA peak can be seen in all four EhAgo libraries, with a sharp 27nt peak for EhAgo2-2 and EhAgo2-2△NLS−DR. However, smaller size sRNAs are seen in EhAgo2-1 and EhAgo2-3 libraries by both TAP and RppH methods. This matches with the sRNA profile seen on the sRNA gel (Fig. 3A), where the lower size RNAs have 5′-OH structure and are likely from degradation. In addition, we also checked the size distribution of the total reads (non-unique) for each library (Suppl. Figure 8). There is a prominent peak at 27nt and a very small peak at 21nt in EhAgo2-3 IP library, indicating the smaller 21nt RNA band is not cloned efficiently (as expected, likely because of their 5′-OH structure).
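The difference between the unique-read and total-read size distributions can be illustrated with a small sketch. It assumes that each library is available as a mapping from unique sequence to read count; the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def length_distribution(read_counts, unique=True):
    """Length histogram over unique reads, or weighted by read count if unique=False."""
    dist = Counter()
    for seq, n in read_counts.items():
        dist[len(seq)] += 1 if unique else n
    return dist


# Toy library: two distinct 27 nt reads (one abundant) and one 21 nt read.
library = {"A" * 27: 10, "C" * 27: 1, "G" * 21: 2}
print(length_distribution(library))
print(length_distribution(library, unique=False))
```

Comparing the two histograms shows whether a size class is dominated by a few abundant species or by many distinct sequences.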
We previously observed that the sRNAs bound to EhAgo2-2 have a G bias in the 5'-nucleotide position [21, 22]. To check the nucleotide composition of sRNAs in EhAgo2-1 and EhAgo2-3 IP libraries, we plotted nucleotide frequency at each position for each unique sRNA read and found the 5′-G bias again (Fig. 4B and Suppl. Figure 7B).
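A per-position nucleotide frequency table of the kind plotted for these libraries can be computed as sketched below. This is an illustrative helper with a hypothetical name, assuming reads are given in the DNA alphabet as produced by sequencing; reads shorter than a given position simply do not contribute to that column.

```python
from collections import Counter


def position_frequencies(reads, n_positions=27):
    """Return, for each position, the fraction of reads carrying A, C, G or T."""
    freqs = []
    for i in range(n_positions):
        column = Counter(r[i] for r in reads if len(r) > i)
        total = sum(column.values())
        freqs.append({nt: (column[nt] / total if total else 0.0) for nt in "ACGT"})
    return freqs


# Toy reads: three of four start with G, mimicking a 5'-G bias.
toy_reads = ["GAT", "GCT", "GTA", "ATT"]
freqs = position_frequencies(toy_reads, n_positions=3)
print(freqs[0]["G"])
```

Applying the same function to the 3′-aligned ends of unmapped reads would reveal an oligo-A tail as a run of positions dominated by A.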
As EhAgo2-1 and EhAgo2-3 have significant numbers of sRNAs in the size range of 18-27nt, we tested if 5′-G bias feature was true for the smaller size sequences in these libraries. We extracted subsets of 23-24nt and 27nt reads from three EhAgo libraries and compared their nucleotide composition (Suppl. Figure 9). Both subsets show the 5′-G bias feature indicating that 5′-end of sRNA is likely intact between two sampled 23-24nt and 27nt subsets. We did further mapping analysis of two subsets (23-24nt reads against 27nt reads) using Bowtie and found that the majority of reads in 23-24nt subset align perfectly to reads in 27nt subset (Suppl. Table IV), indicating these smaller reads are likely derived from 27nt sRNA; thus, the 3' end of the sRNA is more prone to the degradation process than its 5'-end.
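The inference that shorter reads are 3′-truncated forms of 27nt sRNAs can be checked directly by asking whether each short read is a 5′ prefix of a longer read. The sketch below is a simplified stand-in for the Bowtie comparison, using exact prefix matching; the function name and sequences are hypothetical.

```python
def fraction_prefixes(short_reads, long_reads):
    """Fraction of unique short reads that are exact 5' prefixes of a long read."""
    short = set(short_reads)
    if not short:
        return 0.0
    lengths = {len(s) for s in short}
    # Collect every 5' prefix of the long reads at the lengths seen in the short set.
    prefixes = {r[:k] for r in long_reads for k in lengths if len(r) >= k}
    return len(short & prefixes) / len(short)


# "GATTAC" and "GATTACA" are 5' prefixes of the long read; "ATTACA" is internal.
frac = fraction_prefixes(["GATTAC", "GATTACA", "ATTACA"], ["GATTACAGG"])
print(frac)
```

A high prefix fraction together with a retained 5′-G bias is what would be expected if trimming occurs from the 3′ end.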
In order to determine if the actual sRNA species overlap among the three EhAgo libraries, we performed Bowtie alignment analysis of the EhAgo2-1 and EhAgo2-3 libraries, using EhAgo2-2 dataset as reference. We found that over 70% reads are aligned with reads in the EhAgo2-2 library (Suppl. Table V), indicating the identities of sRNA pool significantly overlap for three EhAgos. We also compared EhAgo2-2△NLS−DR with EhAgo2-2, and the overlap for these two libraries is 75%, indicating that the mutation does not affect sRNA binding to this protein. We had previously noted that the EhAgo2-2△NLS−DR efficiently binds sRNA similar to the wild-type protein [26], and the sequencing now confirms that the EhAgo2-2△NLS−DR mutant does not have an alteration in its associated 27nt sRNA population.
As an added note, the sequencing libraries for the TAP EhAgo IP samples were made several years apart from those for the RppH EhAgo IP samples. Analysis of both sets of libraries rendered very similar results in terms of sRNA mapping features, size distribution profile, 5′-G bias, and antisense sRNA-mapped genes. Therefore, we think that the highly similar sRNA populations observed in this study among the three EhAgos, despite their different localizations, are not due to potential co-IP cross-contamination with EhAgo2-2 but rather are a true reflection of each Ago, as confirmed by the IP controls and Western blot analysis.
Taken together, we concluded that all three EhAgos bind to sRNA populations with significant overlap, mainly targeting retrotransposons and a core set of ~ 226 genes that are silenced in this organism. These sRNAs have a 5′-polyP structure and are not phased; thus, they are likely from the secondary sRNA pathway that involves RdRP activity. As mentioned earlier, C. elegans has worm-specific WAGOs which are all semi-redundantly loaded with secondary 5′-polyP 22G sRNAs. The C. elegans 22G RNAs are important for germline maintenance as they map to ~ 50% of the annotated coding gene in the genome which are from both silent and expressed loci [40, 47].
31nt sRNAs are weakly associated with EhAgo2-2
Our sequencing data for total RNAs show that E. histolytica has a second sRNA size peak due to the non-templated addition of 3–4 adenines at the 3′-end of 27nt sRNAs. To check whether this 31nt population is associated with any specific EhAgo, we analyzed the non-mapped sRNA reads for each of the Ago IP libraries. The size distribution of the non-mapped reads showed that the 31nt peak is present only in EhAgo2-2 and not in the other two EhAgo proteins or EhAgo2-2△NLS−DR (Fig. 4C). Additionally, for EhAgo2-2, the nucleotide distribution analysis of non-mapped reads shows a 5′-G bias for the first nucleotide and a string of 3 or 4 As at the 3′-end (Fig. 4D). These results may indicate that the EhAgo2-2 protein complex is the site of sRNA adenylation and that its C-terminal DR-motif domain or associated proteins may be necessary for the adenylation modification to occur.
The EhAgo2-2 has an unusual DR-rich motif which controls the localization of this protein to the nucleus; two deletion mutants including EhAgo2-2△NLS−DR did not alter sRNA binding but caused protein localization change to the cytoplasm [26]. Our work has suggested that EhAgo2-2 can actively transport sRNAs from cytoplasm to the nucleus, where it can target nascent RNA transcripts and build repressive chromatin marks [26, 48]. One question that remains unanswered is the cellular site for the sRNA adenylation event in amoeba. In order to address this issue, we performed cell fractionation for nuclear and cytoplasmic lysates based on previously published methods [49]. Two Myc-tagged overexpression lines of EhAgo2-2 and EhAgo2-2△NLS−DR were fractioned side-by-side, anti-Myc IP pulldown experiments were performed, RNAs were isolated for both fractions and their sRNA profile characterized (Suppl. Figure 10). As expected, 27nt sRNAs were significantly depleted in the nuclear fraction for EhAgo2-2△NLS−DR due to the protein localization to the cytoplasm. In contrast, 27nt sRNAs were found at almost equal levels in both nuclear and cytoplasmic fractions for EhAgo2-2. We then sequenced sRNA libraries for EhAgo2-2 nuclear and cytoplasmic IP samples. Sequence mapping revealed that most reads of 27nt sRNAs from both libraries can be mapped to the genome, and that they have similar percentage for every mapped genomic category except that the cytoplasmic IP library has a larger percentage of non-mapped reads (19%) than the nuclear IP library (11.8%) (Suppl. Table VI). The size distribution of the non-mapped reads was analyzed (Fig. 4E) and demonstrated that the 31nt peak is present in EhAgo2-2 cytoplasmic IP but not in the EhAgo2-2 nuclear IP. The A-tail was identified at the 3´-end for cytoplasmic IP by nucleotide distribution analysis (Fig. 4F).
Our sequencing data for both total and Ago-bound RNAs show that E. histolytica has a second sRNA population that is modified with 3–4 non-templated adenines at the 3′-end. Northern blot analysis confirmed both populations for endogenous silenced genes as well as for 19-Trigger induced gene silencing. The non-templated adenylated sRNAs are found in part to be associated with EhAgo2-2, and the modification appears to occur in the cytoplasm. Our finding of sRNAs with the adenylation modification is the first report of an sRNA modification in pathogenic protists and adds to the complexity of the sRNA repertoire in non-model organisms.
Summarizing the findings in this study as well as our previous work [20, 23, 26, 36, 48] (Fig. 5), we see that E. histolytica has abundant 5′-polyP 27nt sRNAs that are loaded onto the three EhAgo proteins in a non-distinguishable manner. The endogenous sRNAs are mainly derived from LINEs and a core set of ~ 226 gene loci, with sRNA features consistent with these being RdRP products (they are 5′-G biased, mostly antisense, and not in phase). We identified a second sRNA population at 31nt that results from modification of 27nt sRNAs at the 3′-end with 3 or 4 As; these modified sRNAs are found in partial association with EhAgo2-2 while the intact RISC is in the cytoplasm. The genetic roles corresponding to these non-templated sRNAs await further study.