Transposons are DNA segments that can move repeatedly from one location to another within a genome or between genomes [1]. Although the mutagenic activity of transposons can have both positive and negative effects on cellular fitness [2, 3], the presence of a few transposon copies in a genome is thought to drive the evolution of the population [4]. In microorganisms, where horizontal gene transfer occurs readily, the DNA strand exchange activities of transposons can lead to gene capture, gene reordering, and plasmid fusion, accelerating the adaptation of microbial populations to many of the antimicrobials used by humans [5–8].
To date, several distinct classes of prokaryotic DNA transposons have been discovered [9, 10]. Based on the route and consequences of transposition, transposon movement can be roughly classified into three modes: ‘cut-out paste-in,’ ‘copy-out paste-in,’ and ‘copy-in’ [9]. For example, Tn3 nicks one strand at each of its two termini and joins the 3′-OH ends to the target site, so that the donor and target molecules form a cointegrate after replication (‘copy-in’) [11]. Some insertion sequences (ISs) generate a circular copy intermediate via replication, without leaving an empty donor site, and then insert the circular copy into the target site (‘copy-out paste-in’) [12].
The integrative and conjugative elements (ICEs), a class of mobile DNA elements, move from one location in one genome to a few selected locations in other genomes using a site-specific recombinase and conjugation machinery [13]. ICEs move in three steps: excision of the double-stranded ICE DNA as a circular molecule, mobilization (rolling circle replication), and integration; this mode is classified as ‘cut-out paste-in’ [9]. ICE-like integrative mobilizable elements that lack the conjugation-associated genes also exist, such as the IME, CIME, MGI, and MTn (hereafter collectively IMEs) [14–17]. Excision of these known ICEs/IMEs generates empty donor sites [18–20].
A new subset of mobile DNA elements, the strand-biased circularizing integrative elements (SEs), has recently been identified as transposable elements during mating experiments between Vibrio alfacsensis and E. coli [21, 22]. SEs have four conserved coding sequences, intA, CDS2, intB, and CDS4, flanked by 13 to 19 bp imperfect inverted repeats (Fig. 1a). intA and intB each encode a tyrosine recombinase that possesses a catalytic RHRY motif [23], whereas the products of CDS2 and CDS4 are hypothetical proteins [22]. SEs possess several unique features: (i) once an SE has integrated into a target site, an empty donor site is barely generated when its circular copy is produced; (ii) the 6 bases next to the motif C end, but not those next to the motif C′ end, are always incorporated into the circular SE, so the circular copy is a copy of one specific strand (the top strand in the Fig. 1a map), unlike the copy-out of ISs [24]; (iii) the 6 bp spacer between motifs C and C′ in the joint region (attS) of the circular SE is always placed at the newly formed attR, whereas the central 6 bp of attB is placed at the newly formed attL [22]; (iv) SEs presumably use the tyrosine recombinases to change location (transposition), because no gene encoding an HUH endonuclease (a homolog of rolling circle replication initiators or relaxases) [25] is embedded in the SEs discovered so far; and (v) SEs integrate into a few selected locations in the genome [21, 22]. SEs thus share some, but not all, features with both ISs and ICEs.
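The terminal structure described above (13 to 19 bp imperfect inverted repeats at the two ends of an element) can be illustrated with a minimal Python sketch. This is not the method used in the cited studies. The function names, the mismatch tolerance, and the toy sequence are all hypothetical and chosen only for illustration. The sketch compares the first bases of a candidate element with the reverse complement of its last bases and reports the longest terminal repeat within the stated length range that stays under the assumed mismatch limit.

```python
# Illustrative sketch only: names, thresholds, and the toy sequence are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def terminal_ir_mismatches(element: str, length: int) -> int:
    """Count mismatches between the first `length` bases of `element`
    and the reverse complement of its last `length` bases."""
    left = element[:length].upper()
    right_rc = reverse_complement(element[-length:])
    return sum(a != b for a, b in zip(left, right_rc))


def best_imperfect_ir(element: str, min_len: int = 13, max_len: int = 19,
                      max_mismatch: int = 3):
    """Return (length, mismatches) for the longest terminal inverted repeat
    whose mismatch count is within `max_mismatch` (an assumed tolerance),
    or None if no length in the range qualifies."""
    for length in range(max_len, min_len - 1, -1):
        mismatches = terminal_ir_mismatches(element, length)
        if mismatches <= max_mismatch:
            return length, mismatches
    return None


if __name__ == "__main__":
    # Toy element: a made-up 15 bp terminus, a filler interior, and the
    # reverse complement of the terminus at the other end.
    left_end = "GATTACAGGCTAGCA"
    toy_element = left_end + "C" * 10 + reverse_complement(left_end)
    print(best_imperfect_ir(toy_element, max_mismatch=0))
```

Scanning from the longest to the shortest length means the first hit is the longest repeat within tolerance. A real survey would also need to locate the element boundaries, which this sketch takes as given.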
Independently of the work on SEs, Bardaji et al. [26] reported SE-like genomic islands, termed GInts (Fig. 1a), in the genomes of plant-associated Pseudomonas species and several other taxa. The GInts carry four conserved coding sequences (ginA, ginB, ginC, and ginD), three of which encode a tyrosine recombinase or a related protein. This genetic structure is reminiscent of the RIT elements discovered in the Betaproteobacteria and later in more diverse taxa [27–29]. Bardaji et al. demonstrated that all four coding sequences are essential for both the circularization and integration activities of GInts [26]. However, transposition of a full-length GInt has not yet been demonstrated. In three of the seven strains analyzed, circular copy generation of the GInt was not accompanied by the generation of empty donor sites, similar to SEs, whereas it remains unknown whether GInt-associated recombination is strand-biased. blastn- and blastp-based searches detect barely any similarity between the SE and GInt genes (Fig. 1b). The GInts do not carry a relaxase gene and are also putatively non-mobilizable.
Transposons in prokaryotes, including ICEs/IMEs, are expected to have a wide host range because the DNA strand exchange process itself requires only one or a few proteins: a DDE transposase alone [30, 31] or Int plus Xis [18, 32–34]. The Tn3 family transposons have been discovered in both Gram-negative (for instance, the Tn3 clade) and Gram-positive (the Tn4430 clade) bacteria [11, 35]. Members of several IS families and ICEs have been discovered in both archaea and bacteria [36–38]. Tn7 and its related target-site-selective transposons, which encode multiple proteins with roles in transposition, have been discovered mainly in Proteobacteria [39–41]. However, when the criteria used to define the family were relaxed to the presence of the DDE transposase component (TnsB) and at least one other component, the Tn7 family expanded remarkably to include members carrying a CRISPR-Cas system for target site selection [42] as well as members of new clades [43]. The host taxa of Tn7-related elements now include Actinobacteria, Deinococcus-Thermus, and Cyanobacteria [43]. In contrast, only three publications address the movement of SEs and GInts [21, 22, 26]. SEs/GInts have thus been presumed either to be rather rare in prokaryotes or to remain mostly inactive under normal physiological conditions.
Identification of undiscovered SEs and their hosts would therefore expand our understanding of prokaryotic genome organization, particularly of genomic regions with unknown roles. The identification of novel SEs will also improve our understanding of how mobile element families emerged over the long-term history of life. This study thus aimed to discover new SEs through database searches and to determine their host range by quantifying the discovery rate of SEs per taxon.
We first evaluated the essentiality of the conserved SE genes in the circularization step using gene knockout experiments. Then, based on a database survey focusing on the two proteins unique to SEs, we show that SEs are actively transmitting and diversifying in extant bacteria belonging to the Gammaproteobacteria, particularly in the genera Vibrio, Shewanella, Leclercia, Alteromonas, and Pseudomonas.
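The per-taxon discovery rate named in the study aims can be made concrete with a short sketch. The definition used here is an assumption made for illustration, since the text above does not give a formula: the fraction of surveyed genomes in a taxon that carry at least one SE hit. The function name, input layout, and genome identifiers are hypothetical.

```python
# Illustrative sketch only: the rate definition and all identifiers are assumptions.
from collections import Counter


def discovery_rate_per_taxon(genome_taxa: dict, se_positive_genomes: set) -> dict:
    """Return, for each taxon, the fraction of surveyed genomes with >= 1 SE hit.

    genome_taxa: maps a genome identifier to its taxon (e.g. a genus name).
    se_positive_genomes: identifiers of genomes in which an SE was detected.
    """
    totals = Counter(genome_taxa.values())
    hits = Counter(
        genome_taxa[genome]
        for genome in se_positive_genomes
        if genome in genome_taxa  # ignore hits from genomes outside the survey
    )
    return {taxon: hits[taxon] / totals[taxon] for taxon in totals}


if __name__ == "__main__":
    # Made-up survey of four genomes, two of which carry an SE.
    surveyed = {"g1": "Vibrio", "g2": "Vibrio", "g3": "Shewanella", "g4": "Escherichia"}
    print(discovery_rate_per_taxon(surveyed, {"g1", "g3"}))
```

Normalizing by the number of surveyed genomes per taxon keeps heavily sequenced genera from appearing enriched merely because more of their genomes are deposited in the database.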