Characterization of Mariner transposons in Rhus gall aphids (Hemiptera: Aphididae: Eriosomatinae)


Background: Transposable elements (TEs), also known as jumping genes, are widespread in insect genomes and play a considerable role in genomic evolution. Mariner-like elements (MLEs), which belong to the class II transposable elements, were searched for in the genomes of seven species of Rhus gall aphids belonging to six genera. MLEs were characterized for the first time in Rhus gall aphids and classified into their respective subfamilies.
Results: In total, 121 MLEs were detected in the genomes of the seven investigated species of Rhus gall aphids, which showed a wide distribution of MLEs in both closely and distantly related species. The MLE sequences ranged from 1 kb to 1.4 kb in length, and structural analysis showed that only five copies were potentially active, with an intact open reading frame (ORF), while the remainder were classified as inactive because they lacked a single intact ORF or terminal inverted repeats (TIRs). Based on the MLEs in Rhus gall aphids and the well-characterized MLEs of other organisms from GenBank, the phylogenetic analysis showed that all 121 MLE sequences belonged to four subfamilies, i.e., thirty to the Mauritiana subfamily, twenty-six to the Drosophila subfamily, thirty-three to the Vertumana subfamily and thirty-two to the Irritans subfamily, among which the Drosophila and Vertumana subfamilies were reported in aphids for the first time. Moreover, the phylogenetic relationships suggested possible horizontal transfer events of MLEs between aphids and other insects.
Conclusion: Our report revealed the diversity and distribution of MLEs in Rhus gall aphid genomes sequenced by the shotgun genome skimming method. This study further expands our understanding of transposable elements in aphid genomes, which might be useful as genetic markers and tools and may play an important role in the genomic evolution and adaptation of aphids.


Background
Transposable elements (TEs), also known as genomic parasites, are DNA sequences (usually less than 15 kb) that can move and change their location within the genome [1,2]. When these elements exploit the host cellular machinery for their own replication, they may have a large negative impact on host fitness [1,2]. Transposable elements have a considerable influence on the evolution of the host genome owing to their propagation and replication within it [3]. However, only a very small proportion of TE sequences are currently active, and most are degraded, inactive remains of once active copies [4]. During transposition, they may disrupt coding or regulatory sequences, and the highly similar copies dispersed throughout the genome can serve as breakpoints for non-homologous recombination, resulting in chromosomal rearrangements such as inversions, deletions, translocations and duplications. Moreover, TEs can also be co-opted for new host functions and give rise to new host genes [3][4][5] through a phenomenon known as molecular domestication [6]. The influence of TEs on genome organization and evolution is therefore not surprising, and considerable information is available about their impact on host genome evolution. The complexity and paraphyletic origin of TEs pose substantial challenges to the scientific community, including the detection, classification, assembly, annotation and mapping of genomic variants [7].
Despite recent advances in the understanding of TE evolution, there are still considerable gaps in our knowledge of the evolutionary interplay between hosts and genomic parasites [7,8]. Transposable elements comprise a considerable proportion of eukaryotic and prokaryotic genomes [8], e.g., approximately 3-20% of the genome in many filamentous fungi [9], and 10%, 12%, 37%, 45% and 80% of the genome in fish, Caenorhabditis elegans, mouse, human and some plants, respectively [2,3]. The abundance and widespread distribution of transposable elements require a unified classification to divide these sequences into different lineages, though this is still a subject of debate [10][11][12]. Classifying TEs poses many difficulties, one of which is the analysis of TE protein sequences, because some TEs do not possess any coding sequence while others contain many coding regions with different evolutionary histories due to recombination events [11,12]. Wicker et al. (2007) proposed a unified system to rapidly classify transposable elements.
TEs are classified into two major classes, Class I or retrotransposons (RTs) and Class II or DNA transposons, based on their life cycle and molecular structure [10]. The former transpose via an RNA intermediate, while DNA transposons transpose by a typical cut-and-paste mechanism [13]. Based on their sequence composition and some conserved features, TEs can be further divided into subclasses, orders, superfamilies and families [10][11][12][13].
Class I transposons are divided into two groups: LTR RTs, which are flanked by long terminal repeats (LTRs), and non-LTR RTs, which lack terminal repeats [13]. Class II elements, or DNA transposons, are further classified into two subclasses: subclass 1 elements transpose by excision and integration, in which both DNA strands are cleaved during excision, while subclass 2 elements duplicate before insertion. Subclass 1 consists of two orders, of which the Terminal Inverted Repeats (TIR) order is the more widely known. The TIR order contains nine superfamilies: Tc1/Mariner, hAT, Mutator, Transib, Merlin, P, CACTA, PIF/Harbinger and PiggyBac. Subclass 2 consists of two orders: Maverick and Helitron [6,10].
Among class II TEs in eukaryotes, Tc1/Mariner is one of the most abundant superfamilies, whose members share many common characteristics [10]. Autonomous copies contain a single ORF, which encodes a transposase of 282 to 350 amino acid residues, and insert at TA target sites [14]. The transposase has a conserved catalytic triad, the DDE/D motif, and a DNA-binding domain containing two helix-turn-helix (HTH) motifs [15]. The major characteristics distinguishing the different Tc1/Mariner families are their sequence length and DDE/D signature motif. The length of Tc1/Mariner elements ranges from one to five kb, owing to the length of the terminal inverted repeats (TIRs), which varies from 13 to 34 bp in Mariner and from 20 to 600 bp in Tc1. The DDE/D signature motif corresponds to DD34E for Tc1 and DD34D for Mariner [15].
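As a minimal illustration of the DD34D versus DD34E distinction, the following Python sketch checks which residue lies 34 positions downstream of the second catalytic aspartate. The function name `triad_signature`, the toy sequences and the assumption that the position of the second aspartate is already known (for example, from an alignment) are hypothetical and are not part of the analysis reported here.

```python
def triad_signature(protein: str, second_d: int, spacing: int = 34) -> str:
    """Label the catalytic triad from the residue `spacing` residues after the second D.

    `second_d` is the 0-based index of the second catalytic aspartate, assumed
    to be known beforehand (hypothetical input, e.g. read off an alignment).
    """
    protein = protein.upper()
    if protein[second_d] != "D":
        raise ValueError("second_d does not point at an aspartate")
    third = second_d + spacing + 1  # `spacing` residues lie between the two positions
    if third >= len(protein):
        return "truncated"
    residue = protein[third]
    if residue in ("D", "E"):
        return f"DD{spacing}{residue}"  # DD34D for Mariner, DD34E for Tc1
    return "mutated"


# Toy sequences (not real transposases): 10 residues, D, 34 spacer residues, third residue.
mariner_like = "M" * 10 + "D" + "A" * 34 + "D" + "K" * 5
tc1_like = "M" * 10 + "D" + "A" * 34 + "E" + "K" * 5
print(triad_signature(mariner_like, 10), triad_signature(tc1_like, 10))
```

In practice the triad positions would come from a conserved-domain search or a multiple alignment rather than from a fixed index.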
Abundant transposable elements have been found in different insect genomes, where the proportion of TEs can also explain variation in insect genome size [16,17]. So far, insect genome analyses have revealed that Belgica antarctica has the smallest genome (99 Mb), with less than 1% TEs, while Locusta migratoria has the largest (6.5 Gb), of which 60% consists of TEs [18]. The Tc1/Mariner superfamily of Class II transposons is well characterized in many insect genomes [14,15,17] but has not been studied enough in aphid species. Mariner-like elements (MLEs) of the Tc1/Mariner superfamily have a simple structure, comprising a single gene flanked by untranslated sequences and terminal inverted repeats (TIRs) at both the 5' and 3' ends [19]. Mariner transposons have been characterized in only a few aphid species in previous studies, and very little is known about transposon abundance, diversification and influence on genomic evolution in aphids. Partial sequences of the Irritans and Mellifera subfamilies in Aphis glycines were identified in vitro by PCR amplification [16,18], and deleted copies of the Mauritiana subfamily were detected in seven fruit tree aphid species [19], whereas only three complete sequences were reported in the pea aphid Acyrthosiphon pisum and deposited in Repbase as MarinerN-1_AP, 1B and 2 [20]. However, many lineages of Mariner were recently detected in the genomes of three aphid species, Acyrthosiphon pisum, Diuraphis noxia and Myzus persicae [18], whose genomes are available at NCBI (http://www.ncbi.nlm.nih.gov/genbank) and in the aphid database (http://tools.genouest.org/is/aphidbase), respectively, and both complete and truncated copies of TEs from different families of the Tc1/Mariner superfamily were detected in them [18,20].
In this study, we examined the Mariner family of the Tc1/Mariner superfamily of Class II transposons in the genomes of seven species of Rhus gall aphids from six genera.
Rhus gall aphids (Aphididae: Eriosomatinae: Fordini) include six genera, of which five are from East Asia and one is from eastern North America, and they comprise a unique group [21,22]. Rhus gall aphids feed on their primary host plants, Rhus species (Anacardiaceae), forming galls rich in tannins, which are used as an important raw material in medicine and industry [21,22,24,49]. Several reports have addressed the phylogenetic relationships of Rhus gall aphids inferred from different molecular sequence markers [21,22,36]. However, these studies have been largely constrained by limited sampling or have failed to find high support for relationships among the genera and species. Recently, Ren et al. [2019] investigated the evolutionary relationships within Rhus gall aphids by sampling 15 accessions representing all six genera and using 20 gene regions (five nuclear genes as well as 13 protein-coding genes and two rRNA genes of the complete mitochondrial genome), obtaining a backbone phylogeny that strongly supported the monophyly of the six genera and resolved the relationships among the genera and species of Rhus gall aphids [22].
For the seven species in this study, the relationships were as follows: the North American genus Melaphis originated in East Asia; Meitanaphis is sister to Kaburagia, and the two group with Floraphis; Nurudea ibofushi is nested within Schlechtendalia and has been suggested to be merged into that genus. As transposable elements may serve as genetic markers and tools and have an impact on insect genomes, adaptation and biology [23,24], we were interested in detecting and characterizing mariner-like transposons from at least one Rhus gall aphid species from each of the six genera, i.e., Schlechtendalia, Nurudea, Melaphis, Meitanaphis, Kaburagia and Floraphis, known to feed on Rhus species. To our knowledge, this study represents the first report on mariner transposable elements and their implications in Rhus gall aphids.

Results
Homologous sequences belonging to the Mariner family were searched for using tBLASTn with a set of 50 known MLEs as queries (see Additional file 1). In total, we found 121 MLE sequences in the seven Rhus gall aphid species, i.e., thirty-three in Schlechtendalia chinensis, twenty-six in Schlechtendalia peitan, ten in Kaburagia rhusicola, ten in Floraphis choui, ten in Meitanaphis flavogallis, sixteen in Melaphis rhois and sixteen in Nurudea ibofushi. All the detected transposons were classified into four subfamilies of the Mariner family based on phylogenetic analysis with already classified MLEs from previous studies (see Fig. 1a and 1b). The numbers and classifications of the MLEs detected in the seven Rhus gall aphid species are shown in Table 1.

General features
Of the 121 detected MLEs, fifteen were truncated at one or both ends, mostly because they were located at the ends of contigs (see Tables in Additional file 2). Only five complete copies of MLEs with an intact ORF and two truncated copies with a complete intact ORF for the transposase protein were detected in the study. Sequences with an intact ORF and no stop codon or frameshift mutation were considered potentially active [18]. Among the 33 MLEs detected in Schlechtendalia chinensis, only two (Scmar7 and Scmar10) had an intact ORF for the transposase, but they were truncated, with the TIR missing at one end. Three MLEs from Kaburagia rhusicola (Krmar2, Krmar4 and Krmar5), one from Floraphis choui (Fcmar4) and one from Meitanaphis flavogallis (Mfmar4) were potentially active, with an intact ORF and TIRs at both the 5' and 3' ends. All the other sequences had one or more premature stop codons. No active copy with a single intact ORF was detected in Schlechtendalia peitan, Nurudea ibofushi or Melaphis rhois. Five of the MLEs with an intact transposase ORF belonged to the subfamily Drosophila, i.e., Krmar2, Scmar7, Scmar10, Mfmar4 and Fcmar4, while two belonged to Mauritiana, i.e., Krmar4 and Krmar5 (see Tables in Additional file 2). All the MLEs belonging to the Vertumana and Irritans subfamilies were inactive, with no intact ORF for the transposase. All the MLEs detected in the Rhus gall aphids have been submitted to GenBank with accession numbers (see Tables 1-4 in Additional file 2).
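The criterion used above (an intact ORF with no premature stop codon or frameshift) can be sketched in Python as follows. This is only an illustrative check under simplifying assumptions, not the ORF finder used in the study: the function names are hypothetical, the input is assumed to be the candidate coding sequence in frame from the start codon, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in frame 1; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def is_intact_orf(dna: str) -> bool:
    """True if the sequence starts with ATG, is a whole number of codons,
    ends with a stop codon and contains no premature stop."""
    dna = dna.upper()
    if len(dna) % 3 != 0 or not dna.startswith("ATG"):
        return False
    protein = translate(dna)
    return protein.endswith("*") and "*" not in protein[:-1]


# Toy sequences, not real MLE coding regions.
print(is_intact_orf("ATGGATTGGTAA"))     # start codon, two codons, terminal stop
print(is_intact_orf("ATGTAAGATTGGTAA"))  # premature stop in the second codon
```

A length that is not a multiple of three is treated here as a crude proxy for a frameshift; a real analysis would compare against a reference transposase.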

Structure analysis
Terminal inverted repeats (TIRs), which are necessary for transposition, were analyzed in all the complete copies of MLEs, and the consensus TIRs of each subfamily are also given (see Tables 1-4 in Additional file 2). The TA target site duplication (TSD) was also found at both ends of the complete copies, except for Krmar8 and Krmar10, in which TA was found at the 3' end only. The complete copies detected were of variable length, ranging from 1.2 kb to 1.35 kb, with TIRs from 13 bp to 32 bp (see Table 1 and Additional file 2).
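To make the notion of a terminal inverted repeat concrete, the sketch below measures how far the 5' end of an element matches the reverse complement of its 3' end. It is a simplified illustration, not the manual procedure used in the study: the function names and the toy element are hypothetical, and only perfect matches are counted, whereas real TIRs are often imperfect.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def perfect_tir_length(element: str, max_len: int = 40) -> int:
    """Length of the perfect inverted repeat shared by the two ends of `element`.

    The 5' end is compared base by base with the reverse complement of the
    3' end; counting stops at the first mismatch or after `max_len` bases.
    """
    element = element.upper()
    rc = reverse_complement(element)
    limit = min(max_len, len(element) // 2)
    length = 0
    while length < limit and element[length] == rc[length]:
        length += 1
    return length


# Toy element (not a real MLE): a 5 bp repeat flanking an 8 bp spacer.
left = "ACGGT"
toy = left + "GGGGGGGG" + reverse_complement(left)
print(perfect_tir_length(toy))
```

Allowing a small number of mismatches would be a natural extension for the 13 to 32 bp TIRs reported here.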
The transposases of the complete MLEs were analyzed for the conserved domains and motifs of mariner transposons. The catalytic domain DD34D was highly conserved in most of the complete copies, while the WVPHEL and YSPDL motifs required for transposition were slightly modified in some MLEs. The helix-turn-helix DNA-binding motifs were also conserved and were found in all the complete copies. A nuclear localization sequence (NLS) was present in some complete copies but absent or modified in others. Some of the detected MLEs became inactive owing to a single point mutation (single nucleotide substitution) that generated a premature stop codon (see Fig. 4). The conserved catalytic domain DD34D, the helix-turn-helix (HTH) DNA-binding motifs, the WVPHEL motif, the YSPDL motif and the nuclear localization signal (NLS) of three complete MLEs detected in the study, belonging to three different subfamilies, are shown in Figs. 2, 3 and 4, respectively.

Phylogenetic analysis
The subfamily classification of the MLEs detected in Rhus gall aphids was based on DNA sequence similarity. Mariner sequences of other organisms, mainly from the class Insecta, belonging to the major subfamilies already reported, i.e., Mauritiana, Mellifera, Irritans, Cecropia, Capitata, Vertumana, Drosophila, Marmoratus, Lineata and Elegans, were downloaded from GenBank. The phylogenetic relationships of all 121 MLEs in Rhus gall aphids, together with the MLE sequences of other organisms (mainly insects) from GenBank, were analyzed by constructing an ML phylogenetic tree with 1000 bootstrap replicates. All the detected MLEs of Rhus gall aphids clustered into four subfamilies, i.e., Mauritiana, Irritans, Vertumana and Drosophila (see Fig. 1a and 1b). The MLEs detected in this study were classified into subfamilies according to their grouping and relatedness with already known MLEs from different subfamilies downloaded from GenBank (see Fig. 1a and 1b).

Discussion
The seven Rhus gall aphid species sampled in this study feed on their primary host plants, Rhus species, forming galls rich in tannins, so they are of considerable economic importance and are widely applied in various fields, e.g., medicine, food, dye, and the chemical and military industries [21,22]. The MLE sequences in Rhus gall aphids were searched for using BLAST in aphid whole genomes sequenced by the shotgun genome skimming method. Our study focused only on the existence of mariner-like elements in Rhus gall aphid species, irrespective of their total copy number and percentage contribution to genome size. As a result, Mariner transposons from four different subfamilies, i.e., Mauritiana, Vertumana, Irritans and Drosophila, were detected in the seven species of Rhus gall aphids (see Table 1), which is in agreement with previous studies showing that MLEs are widely distributed in Hexapoda [5,18,29,30].
A total of 121 MLEs were detected in the seven aphid species, among which the subfamilies Vertumana and Mauritiana were the most widespread, occurring in all seven Rhus gall aphids. Previous studies also reported MLEs from the Mauritiana subfamily in seven tree aphids [20], whereas no MLEs from Mauritiana were found by genome mining of Aphis glycines [4], Acyrthosiphon pisum and Diuraphis noxia [18]. MLEs from the Vertumana subfamily were reported for the first time in aphids in this study. MLEs from the Drosophila subfamily were also detected in all the Rhus gall aphid species in this study, whereas they were not reported in other aphid species in previous studies. Thus, we also report this subfamily for the first time in Rhus gall aphids; it was first discovered in Drosophila species [32]. MLEs from the subfamily Irritans were found in three species of Rhus gall aphids, i.e., Schlechtendalia chinensis, Schlechtendalia peitan and Nurudea ibofushi, but were absent in the other four studied species (see Table 1). In contrast, MLEs from the subfamily Irritans were found in all aphid genomes in previous studies [4,18]. The absence of MLEs from subfamilies such as Mellifera, Capitata and others might indicate variable distributions of MLEs in aphid genomes, or might be related to the fact that our sequenced genomes did not cover 100% of the genes and repeat regions of the aphid species in the study.
Fewer complete copies of each MLE, i.e., 1 to 3 (see Tables 1-4 in Additional file 2), were detected in this study compared with previous studies [5]. Most of the MLEs detected previously in aphids were obtained in vitro by PCR cloning, which resulted in the detection of a relatively large number of deleted copies of MLEs [20]. Our study mainly focused on the detection of complete copies of Mariner-like elements in Rhus gall aphids, and very few truncated copies were detected and reported, in contrast to previous studies, which reported mostly internally deleted and truncated MLEs, mostly shorter than 1000 bp, in aphids [4,18]. No miniature inverted-repeat transposable elements (MITEs), which were previously reported in aphids [4,18], were detected in this study.
The relatively low number of different MLEs in aphid genomes, from only four subfamilies in our study, compared with other insects agreed with previous studies [20] and might reflect a special genetic characteristic of aphids, including the Rhus gall aphids. It might also be because (i) the sequenced genomes in our study did not completely cover the repeat regions owing to the Illumina sequencing platform [31]; and (ii) more than 60% of the assembled contigs were < 1000 bp long (see Table 2), which did not produce good hits in the tBLASTn searches of the genomes. Although Tc1/mariner is the most abundant superfamily in insect genomes, it is poorly represented in aphid genomes [4,5], which was also supported by our study.
Structural analysis of the translated polypeptides of the detected MLEs in all seven aphid species showed that the third aspartate residue of the conserved catalytic domain DD34D was mutated in many of the inactive copies, while it was highly conserved in the active copies, which supported previous studies [33]. The DNA-binding helix-turn-helix (HTH) motif and the two main conserved motifs of MLEs required for transposase activity, WVPHEL and YSPDL, were conserved in most of the MLEs, whereas the conserved regions were slightly modified in some MLEs, as shown in Figs. 2, 3 and 4, in agreement with previous findings [20]. The nuclear localization sequence (NLS) motif, which is required for import of the transposase into the nucleus, was found in some active MLEs (Fig. 2), was slightly modified in some (Figs. 3 and 4) and was absent in many sequences owing to frequent mutation or inactivation events [20]. However, previous studies also showed that some MLEs do not have their own NLS and depend on other proteins for nuclear import [34].
The current study showed the diversity of MLEs in aphid genomes, but most of the detected MLEs corresponded to inactive lineages, in agreement with previous findings [4,18]. The presence of very few potentially active copies supported the phenomenon of vertical inactivation of Mariner transposons [5,20]. Single nucleotide substitutions leading to premature stop codons (see Fig. 4) and nucleotide loss due to deletions, reported in previous studies [2], appeared to play an important role in the vertical inactivation of transposons. For example, the Irritans subfamily had no active MLE copy, i.e., all the copies were inactive with no intact ORF, although they are widespread in our studied species and in previously studied species [3,18].
Like all other genes, MLEs are transmitted vertically from parents to offspring over the course of evolution, so the relationships between MLE sequences should reflect the evolutionary relationships of their hosts [35,36]. Phylogenetic relationships of aphids based on the mitochondrial COI gene were consistent with classical phylogenetic analyses based on molecular and morphological characteristics in previous studies [35][36][37][38][39], which showed COI to be an effective tool for characterizing aphid species phylogeny [40]. Other studies, however, reported significant inconsistency between the phylogeny of transposable elements and that based on COI [20], and also with other single non-transposable genes from the same genome [41][42][43][44]. Our study also indicated that the phylogeny of MLEs was not concordant with that of their hosts, and previous studies likewise showed that MLEs had evolved independently of host speciation events [45].
Horizontal transfer events of MLEs within aphid species and with other insects were evident in our sequence analysis. The presence of almost identical MLEs in phylogenetically distant species and of clearly divergent MLEs within the same species in this study reflected the occurrence of horizontal transfer events during aphid evolution, which supported previous findings of horizontal transfer of transposons in other insects [43,47,48]. For instance, the MLE Batmar11 from Bactrocera tryoni (accession no. KX931004) had more than 90% DNA sequence similarity with Scmar11, Scmar12, Spmar2, Krmar5, Krmar6 and Mfmar7, which supported the occurrence of one or more horizontal transfer events between Rhus gall aphids and the fruit fly (Bactrocera tryoni), because previous studies indicated that TEs from distantly related species with query coverage above 90% and similarity above 90% indicate a horizontal transfer event [51]. Moreover, one MLE of the Irritans subfamily (Scmar15) detected in Schlechtendalia chinensis was 100% identical throughout its length to an MLE (Spmar5) detected in Schlechtendalia peitan, which is clear evidence of horizontal transfer of transposons (HTT); although these are sister species belonging to the same genus, Schlechtendalia, similar HTT events have been reported between Drosophila species of the same genus [52]. However, horizontal transfer is not discussed in detail in this study, and we will thoroughly examine and explain this phenomenon by sampling more species and more MLEs from different subfamilies in further research.

Conclusion
Our study reported the diversity and structural composition of mariner-like transposons in Rhus gall aphids. All 121 mariner-like elements (MLEs) belonged to four subfamilies, Mauritiana, Drosophila, Irritans and Vertumana, among which the subfamilies Drosophila and Vertumana were reported for the first time in Rhus gall aphid species. We demonstrated only the presence of full-length MLEs, including both active and inactive lineages, in these aphid species. Further research is needed to demonstrate the activity of the potentially active MLEs, their transposition in Rhus gall aphids and their role in genome evolution and adaptation.

Methods
All the aphid genomes used in this study were sequenced by the shotgun genome skimming method as part of an ongoing project in our lab.

Sample collections
All the mature Rhus galls were collected from the host plants at different locations in China, except for one species, which was collected in North America [22]. Each gall contained thousands of aphids because of the parthenogenetic generations produced during gall formation. Some individuals from each gall were placed in 75% alcohol for taxonomic identification by microscopy following a taxonomic protocol [24]. The remaining individuals were preserved in absolute alcohol for DNA extraction. Voucher specimens were deposited at the School of Life Sciences, Shanxi University, China. Sampling information and species taxonomy are shown in Table 2.

DNA extraction and Sequencing
Three individuals of each aphid sample stored in absolute alcohol were transferred into distilled water for thirty-six hours in a 1.5 ml Eppendorf tube; the water was then removed and the aphids were ground with a small pestle.
Genomic DNA of all samples was extracted using a DNeasy extraction kit (QIAGEN, Valencia, CA), and the qualified DNAs were sent to the Genomic Sequencing and Analysis Facility (GSAF), University of Texas, Austin, for library construction and next generation sequencing (NGS). A TruSeq Nano DNA library preparation kit (Illumina, FC-121-4003) was used to prepare the DNA libraries, and an Illumina NextSeq sequencer was used to generate 2x150 bp paired-end reads with an insert size of 400 bp. Trimmomatic v.0.35 was used to filter the raw data with default settings [25]. De novo assembly of the trimmed data was performed with SPAdes v. 3.7.1 [50], and each genome was assembled into contigs of different lengths. Genome size, GC content and detailed information on the contigs of the seven Rhus gall aphid species are shown in Table 2.

Data mining
A panel of complete copies of both active and non-active mariner transposable elements was downloaded from GenBank (http://www.ncbi.nlm.nih.gov/GenBank, see Additional file 3). Most of the downloaded sequences belonged to the class Insecta, mainly Drosophila, and included the MLEs already reported in other aphid species. Geneious Prime 11.0.3 with default parameters was used to mine the transposable elements, using the downloaded sequences as queries in local BLASTn searches against the genomic contigs of each species. Detailed information on the sequenced genomes of the seven Rhus gall aphid species is given in Table 2. The sequences with the best hits (similarity more than 60% and query coverage more than 60%) were extracted and manually analyzed for MLE signatures and terminal inverted repeats (TIRs). Each of the complete sequences extracted was used again as a query to retrieve more similar sequences. Truncated sequences with similarity less than 60% and query coverage less than 60% were manually analyzed but were not included or reported in this study owing to the absence of TIRs and any MLE signatures. No MITEs (miniature inverted-repeat transposable elements) were retrieved during this study, and most of the truncated and deleted copies retrieved with a length of less than 1 kb were exact copies of the complete MLEs.
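The hit-selection thresholds described above (similarity above 60% and query coverage above 60%) could be applied programmatically along the following lines. This is a hypothetical sketch: the searches in this study were run and inspected within Geneious, and the `Hit` record, its field names and the example values are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Minimal stand-in for one BLAST hit (hypothetical fields)."""
    contig: str
    identity: float  # percent identity of the alignment
    q_start: int     # 1-based start of the aligned region on the query
    q_end: int       # 1-based end of the aligned region on the query
    q_len: int       # full length of the query MLE


def query_coverage(hit: Hit) -> float:
    """Percentage of the query covered by the aligned region."""
    return 100.0 * (abs(hit.q_end - hit.q_start) + 1) / hit.q_len


def keep_hit(hit: Hit, min_identity: float = 60.0, min_coverage: float = 60.0) -> bool:
    """Keep hits exceeding both thresholds, as in the selection rule above."""
    return hit.identity > min_identity and query_coverage(hit) > min_coverage


# Invented example hits against a 1300 bp query.
hits = [
    Hit("contig_a", 85.0, 1, 1200, 1300),  # long, similar alignment
    Hit("contig_b", 95.0, 1, 500, 1300),   # similar but covers too little of the query
    Hit("contig_c", 55.0, 1, 1300, 1300),  # full length but too divergent
]
kept = [h.contig for h in hits if keep_hit(h)]
print(kept)
```

Raising both defaults to 90 would mirror the stricter coverage and similarity criterion cited in the Discussion for inferring horizontal transfer.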

Sequence analysis and identification
All the Mariner sequences extracted from each local database of genomic contigs were manually analyzed for terminal inverted repeats (TIRs) and target site duplications (TSDs). Potentially active and non-active copies were determined by translating the sequences for the transposase using the ORF finder implemented in Geneious Prime 11.0.3 with default settings. The DDE/D catalytic domain and the HTH DNA-binding conserved domains of potentially active and non-active copies of MLEs were analyzed by NCBI conserved domain search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) [26] with default parameters, while the nuclear localization sequence (NLS) motif of active transposase copies was searched for with cNLS mapper (http://nls-mapper.iab.keio.ac.jp/cgi) [27]. Multiple alignment was performed using the MAFFT version implemented in Geneious 11.0.3 with default parameters to analyze the conserved DDE/D signature in the transposases of potentially active copies.

ORF and conserved domains of the MLEs
The potentially active copies, with ORFs ranging from 310 to 345 amino acids, were analyzed by aligning them with transposases of the Mariner family from other organisms downloaded from GenBank. The complete structural composition of the transposable elements, i.e., the DNA-binding domain (HTH), the nuclear localization motif (NLS) and the catalytic domain DD34D of active and inactive copies from each species, was predicted, and sequences having an intact ORF with no stop codon or frameshift mutation were considered active [18]. The conserved catalytic domain DD34D was used to justify the classification of the detected TEs into the Mariner family of the Tc1/Mariner superfamily. MLEs with no intact ORF and one or more stop codons were also translated and analyzed for conserved domains and motifs, i.e., the DD34D catalytic domain, HTH motif, nuclear localization motif, and WVPHEL and YSPDL motifs. MLEs lacking an intact ORF and conserved motifs and domains owing to mutations such as deletions, insertions or substitutions were also classified into the same group based on sequence similarity ≥ 80% in the complete sequence or TIRs, as proposed by Wicker et al. 2007.
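The ≥ 80% similarity rule used for grouping degraded copies can be illustrated with a small Python sketch that computes percent identity between two pre-aligned sequences. The function names are hypothetical, the toy sequences are invented, and the way gap columns are counted is an assumption of this sketch rather than the procedure used in the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two sequences from the same alignment.

    Columns where both sequences have a gap are skipped; a gap opposite a
    base counts as a mismatch (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_group(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """Apply the >= 80% similarity cut-off to a pair of aligned sequences."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Invented 10 bp examples: one mismatch (90%) versus three mismatches (70%).
print(same_group("ACGTACGTAC", "ACGTACGTAA"))
print(same_group("ACGTACGTAC", "TTTTACGTAC"))
```

The full criterion of Wicker et al. (2007) also places conditions on the length of the aligned region, which this sketch does not model.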

Phylogenetic analysis
The phylogenetic analysis was performed to reveal the relationships between MLEs in Rhus gall aphids and other insects. All the sequences detected in Rhus gall aphids, together with forty-four MLEs of other organisms, mainly from the class Insecta, downloaded from GenBank, were used to construct the phylogenetic tree (see Additional file 3). As not all the detected MLEs in our study had an intact ORF, the whole nucleotide sequences of the MLEs were aligned using the MAFFT multiple alignment implemented in Geneious 11.0.3 with default parameters and used to construct the phylogram of the 121 MLEs detected in Rhus gall aphids and the other 44 MLEs from GenBank. The maximum likelihood (ML) phylogenetic tree was constructed under the GTRGAMMA model with 1000 bootstrap replicates using the software RAxML [28]. An MLE from Drosophila virilis (accession no. DVU26938) belonging to the Tc1 family was used as the outgroup.