I. TE expansion is a shared feature among phylogenetically related taxa.
The tree topology based on 902 orthologous genes [44] in Fig. 1A (left) reveals that the increase in TE superfamilies (Fig. 1A, center and right) is shared, to different degrees, by various genera of South American annual killifish, several Old World annual killifish species and even in a non-annual fish belonging to the sister order Beloniformes. In fact, this selected tree topology is concordant with a previous phylogenetic tree based on morphological data where Austrolebias charrua, Nematolebias whitei and Cynopoecilus melanotaenia are the most phylogenetically related taxa [8]. In this sense, specific superfamilies of retrotransposons and DNA transposons contribute the most to the increased genome size of two related species, A. charrua and N. whitei, and to a lesser extent, C. melanotaenia. This could be an indication that TE expansion follows characteristic genomic patterns since the common ancestor among these highly related taxa. Despite these similarities, A. charrua has an unusually large diploid genome, having twice the size of the other analyzed species (3.39 Gbp). Moreover, 76.5% of its genome is composed of repetitive elements which far exceeds the repetitive content of other reported fishes. Therefore, this species represents an exceptionally large genome for a vertebrate, being so far, the largest among diploid members of the Actinopterygii, and in particular, among annual killifish that have been successfully sequenced and assembled.
Our analyses indicate that A. charrua also has the largest diversity of TEs superfamilies among related species and that there is evidence of a remarkable expansion of all the orders of TEs, contrasting with the slightly decreased proportion of the non-repetitive content. Strikingly, our study also indicates that part of the TE expansion burst that increased the genome size of A. charrua is not specific to a single lineage or family of TEs, but that it is, rather, a global phenomenon that affects elements mainly from the LINE and DNA orders. This specific expansion is not observed in closely related species. The repetitive elements that stand out as the main genome size contributors are the L2 and hAT-Charlie superfamilies. Further, just three LINE superfamilies account for a total of 1 Gbp (30.25%), approximately a third of the genome size of A. charrua, supporting their contribution to the large genome expansion in this taxon. In particular, the superfamily Rex-Babar presents the largest proportion among fishes in terms of genome size contribution with 7.15%.
Oggenfuss and Croll (2023) proposed that Individual TE families can experience multiple distinct burst events and generate many nearly identical copies [47]. Interestingly, among the three most abundant LINE superfamilies, the RTE-BovB retrotransposon in A. charrua, presents a positive slope near Kimura 0, suggesting that the transposition events for this LINE are likely still ongoing. Of the other major orders of TEs that contribute to genome size increase, the TE superfamily TcMar-Tc1 presents a Kimura slope in A. charrua that evidences an increased TE content by means of multiple small bursts of transposition events at different times. Burst events can cause a local accumulation of TEs in the chromosomes generating extensive clusters of them [68]. According to these authors, events of this type are more frequent in germ cells due to the temporary relaxation of the epigenetic control of TEs during early development. In fact, diapauses in annual fishes, such as those analyzed here, could represent an extended temporal window allowing TEs to avoid the selective constraints that otherwise restrict their activity and would allow them to propagate in the host genome.
II. TE insertion and genome structure dynamics in the Austrolebias genus group
The genomic distribution of TEs has been associated with nuclear architecture [45]. Possible events of genomic instability at the cytogenetic level as a by-product of TE insertion bursts have been detected in the Austrolebias genus group [30, 31]. These earlier studies showed extensive variation in chromosome numbers, and in the number of chromosome arms in each species, involving both types of Robertsonian (centric fusions) and non-Robertsonian (pericentric inversions) chromosome rearrangements occurring at intra- and interspecific levels. At present, it is well accepted that the insertion of TEs can induce chromosomal rearrangements as a by-product of transposition events, promoting genomic structural variation [9, 10]. The high abundance of LINE-1 and Alu repeats (approximately 17% and 11% of the human genome, respectively) favors ectopic recombination or recombination between non-homologous loci. Such recombination often results in significant chromosomal rearrangements such as gene deletions, duplications, translocations or chromosomal inversions [12]. Some authors, [6, 9, 34, 40] proposed that the majority of the TEs are located in heterochromatic regions. LINEs are enriched in AT-rich and gene-poor regions in the genome and in heterochromatin [45]. Since recombination is often suppressed in these regions it could explain why these chromosome segments are prone to the accumulation of TEs, where unequal crossing-over or genetic drift are common events. In the Austrolebias genus group, the C-banding technique detected constitutive heterochromatic blocks distributed at centromeric, telomeric, and interstitial regions following characteristic and variable patterns among species. These findings were interpreted as a result of pericentric inversions and centric fusions occurring alternatively during extensive karyotypic reorganization in cladogenetic events within this group [27, 30, 31].
Additionally, Wells and Feschotte (2020) revisited the classic example of several families of LINEs showing a target ribosomal RNA gene array [62]. Such high copy-number genes offer an excellent niche for TEs because insertion in one or a few of the genes is unlikely to have immediate deleterious consequences, and TEs can be progressively purged out by recombination within the array. Studies in species of the Austrolebias genus group using the silver stain NOR technique to identify rDNA active regions at the cytogenetic level, revealed a high number (3–6) of them at different chromosomal positions and patterns, which was also interpreted as a by-product of chromosomal rearrangements [27, 30, 31].
In agreement with the high level of interspecific karyotypic variation, chromosome polymorphisms presenting four different cytotypes were found in individuals of A. charrua, belonging to ponds from the southern Laguna Merin basin [24]. The chromosome polymorphism in this taxon involves one to three chromosome pairs of bi-armed elements, where pericentric inversions involved the shift from acrocentric to metacentric chromosome type, as well as the addition/deletion of heterochromatic blocks, and NOR bearing regions. All of the extensive variation and genomic turnover found in these taxa, suggest the existence of intrinsic factors, including TE activity.
As a conclusion from the present work, it is possible to re-interpret previous assumptions. For instance, the high variability detected by means of structural chromosomal changes as well as in the heterochromatin and rDNA regions distribution within and among species in the Austrolebias genus group, could be associated, to multiple events of TEs insertion and/or accumulation of TE tandem repeats in the aforementioned chromosomal regions. Interestingly, Ahmed and Liang (2012), considered that the ability of TEs to contribute to genome expansion is due, not only to transposition events, but also by generating tandem repeats producing great genome instability [1]. Some TE copies may be positioned one after another in tandem organization, and some chromosomal regions have several complete or incomplete copies of a specific TE [48].
In contrast to the aforementioned karyological revolution found in Austrolebias genus group, our previous cytogenetic studies in individuals of C. melanotaenia from the same ponds from Laguna Merin basin showed low karyotypic variation in diploid number, as well as in the large blocks of heterochromatin at subtelomeric regions, and rDNA active regions lacked evidence of major genomic instability [31]. This is coherent with our findings regarding the high content of TEs in A. charrua in comparison with the C. melanotaenia genome, which also is reflected in a higher rate of duplicate genes in this taxon. Altogether, these cytogenetic and genomic features suggest a different genomic impact linked to the insertion and the increase of elements belonging to some TE superfamilies in the Austrolebias genus group compared to C. melanotaenia.
III. Differences in gene content, structure, patterns of TE insertion, and selective genome constraints among taxa
The fact that C. melanotaenia has a highly fragmented assembly and, yet, we were able to recover a number of genes (20,236) similar to that of the other species, suggests that a fragmented genome assembly is not the only factor influencing the high number of genes predicted in A. charrua. Interestingly, we found the highest proportion of duplicate genes in this species, opening the possibility of an actual expansion in some gene families and an overall redundancy in gene content. Additionally, according to our gene annotation, we found, on average, shorter exons in A. charrua, N. whitei, and C. melanotaenia. Altogether, these observations seem paradoxical considering that the genomes of these species have a lower proportion of non-repetitive content (i.e. 23,5% in A. charrua). However, when restricting the set of orthologs analyzed to single-copy orthologs, we did not observe the same pattern, suggesting that such a feature may be associated with the duplicated gene set.
The analysis of the insertional patterns of TEs in genic regions indicates that approximately one third of annotated genes in A charrua overlaps with TE superfamilies, with L2 and TcMar-Tc1 being the most prevalent in congruence with presenting the highest number of TE fragments detected. The general pattern shows that the largest number of genes have a low number of fragments, while very few genes have high numbers of them. At the same time, more genes contain a higher number of fragments of DNA transposons than retrotransposons. Wells and Feschotte (2020) propose that the future fate of a TE depends on where it initially inserts in the genome [63]. In this sense, TEs can present little insertional bias, be inserted in genomic regions that minimize their deleterious effects, and/ or target sites that likely facilitate their subsequent propagation. Remarkably, in the present analysis, among other peculiar cases, we found evidence of a past activation event of the Rex-Babar superfamily. For instance, one case which is a unique feature of A. charrua, is the very distinguishable peak of activity at values ranging between 30 and 40 Kimura units. Strikingly, 11.79% of the fragments in the peak are located within a single gene that codes for a Leukocyte elastase inhibitor (LEI) also known as Serpin 1B (Supplementary Table 3). This is noteworthy considering that the expression levels of an orthologue of this gene, Serpin B2, have been found to correlate with the expression levels of TEs in its vicinity [51].
Distinct niches for different superfamilies of TEs and selection pressure against large insertions in flanking regions and exons have been described in other taxa [11]. When comparing the fragment distribution in genic subregions in A. charrua, we found L2 to be the most prevalent in almost all them whereas a higher proportion of the TcMar-Tc1 superfamily members contribute to overall exon length. Therefore, all these patterns suggest the possible occurrence of the amplification and shuffling of genic sequences by TEs. Exon shuffling via transposition of LINE or a DNA transposon type (among other mechanisms), could explain the origin of a new gene set, as well as a high level of isoforms per gene and duplicates (an average of 2.23) as was detected in A. charrua (data not shown). According to Bariah et al. (2023), among other authors, the insertion of a TE into a gene might result in the origin of new isoforms induced by exonization, truncation, alternative splicing, or the domestication of TE-derived coding sequences into host genes, potentially altering gene function [4]. Several studies have demonstrated the impact and range of this shuffling and cycling of genomic content on the evolution of plant and animal genomes [10].
Under our present analysis, A. charrua is the species with the highest number of tandem repeat regions, presenting shorter repetitions than the other analyzed species. Among 7,286 genes containing tandem repeats longer than 500 bp, 93 of them were cataloged as “highly tandem”. Among them, after functional enrichment analysis, the “exosome multienzyme ribonuclease complex” was identified, which constitutes the most versatile RNA-degradation machine in eukaryotes. When ranking by contribution to gene length, two categories were detected. The category “intracellular anatomical structures” could be associated with the repetitive nature of structural genes such as myosin and actin, and in a sense more specific to annual fishes, to their possible role in maintaining cell integrity during the diapauses as developmentally-programed dormancy state. Secondly, the “transcription cis-regulatory region binding” category was also identified. Cis-regulatory networks coordinate the transcription of multiple genes that function in concert to coordinate entire pathways and complex biological processes. There is now great evidence that TEs have been a rich source of material for the modulation of eukaryotic gene expression [10, 32]. Future transcriptomic and proteomic analyses could clarify the multiple functions of the tandem repeats and the TE patterns of insertion detected in present analysis. As proposed by Nishihara (2019) [45], multidisciplinary studies, including computational genomics, chromosomal analyses, cell biology technology and developmental studies, should be a powerful approach for revealing the multifaceted contributions of a number of TEs to the evolution of genome in a wide variety of groups.
The aforementioned analyses could indicate that, in A. charrua, selective constraints have yielded a compact genome structure, preventing TEs insertion within the majority of genes, and showing special and tolerated insertional patterns in another minor fraction of ones. Remarkably, when assessing the degree of the intensity (or relaxation) of selection in a set of single-copy ortholog genes, all analyses were conclusive supporting a stronger purifying selection in A. charrua than in the other analyzed taxa, which showed a weaker intensified selection. These results are in contrast to those reported by Cui et al. (2019) in two genera of annual African killifish (Nothobranchius and Callopanchax) in which they found relatively more genes under relaxed selection than under intensified selection in relation to other non-annual fish species analyzed. Our analyses do not support that relaxed selection is a common fact related to the annual life cycle. Moreover, the present analyses are not concordant with the hypothesis proposed by Cui et al (2019) for which genome expansion in species with an annual life cycle is likely due to a genome-wide weakening of selective constraints. These authors postulate that annual killifishes display increased mutational rates both in nuclear and mitochondrial genomes due to relaxation of selection consistent with their size expansion [18]. This hypothesis is not aligned with the present data that shows stronger purifying selection in a set of single-copy ortholog genes, and analysis of the dynamics of superfamily expansion aforementioned in A. charrua, as well as with previous results in the mitochondrial genome of this species [33].
As the Austrolebias genome is highly unstable as described above, all present data could indicate that the large genome size in A. charrua (and other Austrolebias species of the genus group) could be under high compartmentalization, presenting a high proportion of compact genes as results of an intensified selection. Our data could be suggesting that a large fraction of TEs are predominantly clustered in gene-poor niches (perhaps heterochromatic and/or highly dynamic NOR regions, as mentioned above), indicating strong purifying selection against insertions near coding sequences or as a consequence of insertion site preferences in a few cases (i.e. Serpin 1B gene). It is known that the insertion of TEs directly into coding sequences may simply disrupt the functioning of the gene [5].
The hypothesis of compartmentalization of niches with high TE density and niches mostly composed of genes has been previously proposed by many authors in different taxa [10, 47]. Natural selection, genetic drift, and population structure are also powerful factors shaping the distribution and accumulation of TEs, facilitating future propagation while mitigating deleterious effects on host cell function [10].
IV. Burst of TE activity and hypothetical historical scenarios for genome expansion in Austrolebias genus group
Our results suggest that the very recent (11–12 Mya) genome amplification suffered by killifish of the Austrolebias genus group is likely an ongoing process that began with the birth of this clade and may be the result of its recent life history and the requirements of inhabiting a highly variable and challenging environment in a temperate Neotropical region of the Pampas. Given that the last common ancestor of the Austrolebias genus group and C. melanotaenia existed ~ 25 Myr ago [15, 44], the massive genome-wide changes have occurred between that time and the emergence of the ancestral Austrolebias species as all members of the genus group display a similar, expanded, genome size [26].
Belyayev (2014) proposed a hypothetical scenario for genome expansion mediated by bursts of TE amplification preceding speciation in small marginal populations. This explanation provides a plausible basis for genome and chromosome evolution in this annual killifish genus inhabiting temporally heterogeneous environments in South America. Under the influence of unusual and unpredictable ecological (abiotic) conditions, TEs become active; their mobilization produces genetic variation, epigenetic alterations, and high rates of karyotypic reorganization, including changes in the species-specific chromosomal pattern. Since the discovery of TEs, Barbara McClintock (1984), and subsequently other authors [5], have suggested their potential to modulate host gene expression networks in response to specific environmental stress.
On the other hand, another possible source of endogenous genomic stress could be the results of the hybridization between species leading to several genome reorganizations to be driven by TEs [21, 43]. Earlier, McClintock (1984) proposed the genomic shock hypothesis in which hybridization between two species constitutes a source of stress that could disrupt the control mechanisms of TEs causing their activation [42]. Romero-Soriano et al. (2017) described the mobilization of at least 28 TEs in hybrids between Drosophila buzzatii and Drosophila koepferae [50]. In this sense, the most recent findings concerning the ancient hybrid zone between A. charrua and its sister taxon A. reicherti have corroborated the occurrence of natural hybridization among populations of the genus. This could suggest the possible involvement of these events promoting bursts of TEs in the genomes of hypothetical ancestral populations of the group. Stressful conditions that release constraints and promote the activity burst of TEs are likely, considering that these species undergo extreme departures from normal developmental programs (diapauses), as well as presenting population dynamics characterized by isolation and admixture at times of flooding, yielding to interpopulation hybridization events and the consequent genomic shock [25, 29].
Finally, it is paradoxical that, although to a lesser degree, C. melanotaenia and N. whitei have expanded their genome sizes as well, pointing to an even more ancestral event which may also be related with this changing and unpredictable environment. This points to an open question regarding the identity of the factors that have induced the appearance of a 3 Gbp haploid genome in A. charrua, a feature that is absent in N. whitei and C. melanotaenia. One possibility may be the original genomic background of each species which, for instance, in the case of A. charrua and C. melanotaenia is different, given the distinct geographic origin of the ancestral populations of these taxa [15].