All 47 microsporidian genome assemblies available as of March 2019 were retrieved from GenBank (Additional file 1) and used for the identification of TEs. DNA transposons are the dominant class of TEs in microsporidian genomes, but their abundance varies greatly between species, and is not phylogenetically conserved (Fig. 1A; Additional file 2). The tree presented in Fig. 1A, built from a dataset of 37 highly conserved proteins (Additional file 3) established to reduce long-branch-attraction artifacts in microsporidian phylogenies [1,2], is congruent with previously published microsporidian phylogenetic inferences based on other data [22,23].
To evaluate whether TE accumulation in microsporidian lineages represents ancient or recent events, we used the Kimura 2-parameter distance (K2P) to generate age distributions of TEs within each genome (Additional file 4). The evolutionary dynamics of TEs shows very distinct patterns, even within species. For example, the three A. algerae genomes show variable amounts of TE content (1.64% in strain PRA109, 1.34% in PRA339 and 15.75% in Undeen; Additional file 1; Fig. 1A), and the accumulation of TEs in A. algerae Undeen apparently results from a recent spread of DNA transposons, mainly from the Merlin superfamily (Additional Files 5 and 6). A sharply bimodal distribution of TE abundance per age class suggests two successive expansions in A. algerae Undeen that are not observed in the remaining two strains (Fig. 1B). Similarly, the Nosema genomes from N. bombycis (33.41 % of TEs) and N. ceranae BRL01 (21.64% of TEs) show an excess of young elements, which also seems to result from the recent spread of Merlin (Fig. 1B; Additional Files 5 and 6).
We reasoned that these recent events of TE expansion in some evolutionary lineages could result from the loss of genes encoding components from cellular defense mechanisms, such as members of the RNAi pathway. We test this idea using the example of Dicer and Argonaute, two genes essential for the RNAi pathway. It was suggested that Dicer and Argonaute genes are selected as a unit in microsporidia, and that they were selectively maintained or lost during lineage divergence [24]. Therefore, we searched for orthologs of proteins related to cellular defense against TEs in the 47 proteomes (Additional file 6). No RIP/MIP-related proteins, which silence TEs by inducing mutation and/or cytosine methylation within repetitive sequences, were found in any lineage. Most strains carrying Dicer and Argonaute orthologs also carry the protein sets necessary for somatic quelling and meiotic silencing, but this is still no evidence of the activity of these mechanisms in microsporidia. The number of Dicer and Argonaute gene copies varies between 0-5 and 0-7, respectively, and in one lineage (A. algerae PRA 109) we found 2 copies of Dicer, but no Argonaute (Additional file 6). Thus, Dicer and Argonaute seem not necessarily selected as a unit. Interestingly, although there is an overall congruence between the Dicer + Argonaute tree (Fig. 1C) and the one built with 37 conserved proteins (Fig. 1A), there are two inconsistencies, i.e., the A. algerae, and the E. aedis clades, though with low statistical support.
A strong negative correlation was found between the percentage of total coding sequences (CDS; including both TE and host derived sequences) and genome size (rho = -0.92, p-value = 2.2e-16; Fig. 2A), indicating that non-coding regions are responsible for variations in genome size. On the other hand, there is a positive correlation between the fraction of genomes occupied by TEs and genome size (rho = 0.66, p-value = 4.375e-07; Fig. 2A). Regression analysis between TE fraction and genome size is also significant (R-squared = 0.33, p-value = 1.501e-05; Additional file 7). We further applied phylogenetic independent contrasts to the regression analysis [25] in order to correct for the non-independency of traits among taxa, which showed that the positive correlation between genome size and the amount of TEs in microsporidia does not result exclusively from phylogenetic inertia (R-squared = 0.16, p-value = 0.003; Additional file 7). However, some lineages with comparatively large genomes such as A. algerae PRA 109, PRA339 and Edhazardia aedis have TE fractions similar to small genomes such as Encephalitozoon cuniculi GBM1 and Enterocytozoon bieneusi. Both E. cuniculi GBM1 and E. bieneusi genomes suggest recent expansions of miniature inverted-repeat transposable elements according to the K2P age distributions of TEs (MITEs; Additional files 3 and 5). MITEs are deletion derivatives of DNA transposons from the Tc1/Mariner superfamily [26]. Tc1/Mariner elements are absent in Encephalitozoon cuniculi and Enterocytozoon bieneusi but are found in closely related microsporidia (Nosema spp. and Hepatospora eriocheir; Additional files 3 and 5).
We identified Dicer and Argonaute orthologs in 27 proteomes (Additional file 6). Strains with putative Dicer and Argonaute genes have, on average, a higher proportion of TEs (8.2%), and larger genome sizes (11.05 Mb) compared to strains in which these proteins are absent (4.06% and 3.81 Mb, respectively; Wilcoxon rank sum; p-value <0.004, and p-value < 2.2e-16, respectively; Fig. 2B). Although the association between the presence of both Dicer and Argonaute with the accumulation of TEs seems to support the idea that a RNAi machinery is being selectively maintained in microsporidian genomes attacked by selfish elements, we think that random events may have had an equivalent or perhaps more important role. First, there is only partial phylogenetic congruence between Dicer + Argonaute proteins (Fig. 1C) and the history of microsporidia inferred from 37 highly conserved proteins (Fig. 1A), suggesting rate heterogeneity in the evolution of the microsporidian RNAi machinery. Second, genes involved in the RNAi pathway may have been lost by deletion or gained by duplications and/or HGT. For example, although it cannot be ruled out that genes might be lacking in the assemblies due to genome incompleteness, and that microsporidia may have mechanisms to eliminate TEs other than RNAi, Argonaute was apparently lost in A. algerae PRA109 but not in A. algerae PRA339, in spite of both strains having a similar TE age structure (Fig. 1B). We have also found evidence suggestive of Argonaute HGT involving N. bombycis; its Argonaute protein sequence is identical to Papilio xuthus, the Asian swallowtail butterfly (Additional file 8). The ortholog protein in N. apis has only 62.3% identity with the P. xuthus Argonaute. Previous comparative studies focusing on the enlargement of the N. bombycis genome suggested a major implication of HGT in the acquisition of new genes and TEs from its lepidopteran host Bombyx mori [10]. However, in the particular case of Argonaute, HGT may have occurred in the opposite direction. The genomes of both the butterfly Papilio xuthus and the moth Helicoverpa armigera encode orthologes of Argonaute that cluster with microsporidian proteins (Additional file 8).
Horizontal transfer of different genes has been reported in microsporidia [23,27–30]. Among TEs, DNA transposons are the most often horizontally transferred in eukaryotes [16], and in A. algerae, several Merlin and PiggyBac transposon insertions were shown to result from HGT [11]. We hypothesize that the invasion and accumulation of TEs in microsporidian genomes is facilitated by their intracellular mode of life and by the low power of purifying selection. We further hypothesize that the latter is caused by low Ne in association with vertical transmission. Microsporidia use vertical transmission as a survival strategy when opportunities to transmit horizontally are reduced [31], thus being of particular importance during times of low density or the colonization of new host population, both of which cause population bottlenecks. Therefore, this mode of transmission may reduce Ne and thus the strength of natural selection [8]. Species with larger mobilomes, such as N. bombycis and Hamiltosporidium spp. show mixed mode transmission (a combination of horizontal and vertical transmission) [32,33]. Microsporidia with mixed-mode transmission have, on average, a higher proportion of TEs than species with horizontal transmission. The difference in TE percentage between species capable of vertical transmission (11.48%) and those showing only horizontal transmission (3.37%) is statistically significant (Wilcoxon rank sum test, p-value <0.0006; Fig. 2B). The difference in genome size between these two groups is also significant (Wilcoxon rank sum test, p-value=1.77e-05; Fig. 2B), with vertically transmitted species having larger genome sizes, on average (16.87 Mb, against 3.67 Mb, from species exclusively horizontally transmitted).
On closer inspection, one can see that some microsporidia that use vertical transmission do exhibit a low proportion of TEs (such as A. algerae PRA109, PRA339 and E. aedis; Fig. 1; Additional files 2 and 4). While these examples are few and do not reverse the overall statistical signal, they seemingly contradict our hypothesis. It is unclear if these low abundances are genuine or artefacts resulting from current methodological approaches. TEs are a challenge for genome assembly and annotation, and some sequencing pipelines may discard TEs during the assembly process and/or later during the filtering of contaminants [34]. Different sequencing platforms and pipelines were used to sequence and assemble the genomes used in our study, which might have interfered in subsequent TE annotation. Most TE annotation methods rely on homology-based approaches that limit the discovery of novel families of TEs, or require previous knowledge of structural features of TEs, being prone to false negatives [35]. Moreover, highly degenerated TEs without recognizable structural features can be missed from annotation [36,37].
Taken together with our previous study [8] the here presented results show that vertical transmission is correlated with genome expansions and the spread of TEs in microsporidia. Thus, we suggest that nonadaptive forces play a strong role in shaping the evolution of genome size in populations of microsporidian parasites [13].