Gene Rearrangements in the Mitochondrial Genomes of Whiteflies (Hemiptera: Aleyrodinae): Plesiomorphies, Synapomorphies and Autapomorphies


 Mitochondrial genome rearrangements have been used for defining historical relationships, but there have been incidences of convergences at different taxonomic levels. Here, we sequenced complete mitogenome of Aleurodicus rugioperculatus (Aleyrodidae: Aleurodicinae) to examine gene rearrangements and phylogenetic relationships within the family Aleyrodidae. We identified five gene blocks (I-V) in the whitefly ancestor that are shared plesiomorphies retained in different whitefly lineages. Gene block I is conserved in all whiteflies except three species (Tetraleurodes acaciae and two Bemisia species). Conversely, we detected 83 derived gene boundaries within the family. Mapping these gene boundaries onto a phylogenetic tree revealed that 16 were symplesiomorphies for two subfamilies; 9 were synapomorphies at different taxonomic levels, and 28 autapomorphies for individual species. Bayesian Inference (BI) and Maximum Likelihood (ML) phylogenetic analyses yielded similar topologies supporting the monophyly of Aleyrodinae and Aleurodicinae. Exclusion of PCG third codon positions from phylogenetic analyses improved both node support and consistency with prior analyses. To understand the significance of gene order convergence on phylogeny of the whiteflies, more species-level data is required.

White ies belong to the family Aleyrodidae, (Suborder Sternorrhyncha, Order Hemiptera) [9]. The family Aleyrodidae is further classi ed into two subfamilies, Aleurodicinae and Aleyrodinae. White ies are inconspicuous phytophagous insects, usually found on the lower surfaces of leaves. White ies are an economically important group that infest a wide variety of plant families [10]. To date, 1556 species and 161 white y genera have been described from throughout the world [11]. India's white y diversity is nearly one quarter of global diversity with 406 species from 60 genera [12].
Aleurodicus rugioperculatus is commonly known as the rugose spiraling white y, and was originally described from Belize in 2004 as a pest on coconut [13]. It was subsequently reported as a pest on gumbo limbo (Burera simaruba) from Miami Dade County, in Florida (United States of America) [14]. This species was recently (2017) introduced to the states of Tamil Nadu and Kerala in India and has been collected from coconuts, mango and guava [15]. Subsequently it was also reported from other Indian states, Andhra Pradesh, Assam, Goa, West Bengal, Maharashtra, Telangana, Meghalaya and Gujarat, as pests of coconut, banana, sapota, maize, oil palm, mango, cashew and many ornamental plants [16][17][18].
Given its range of potential hosts and impact on Indian agriculture this pest needs accurate identi cation tools, including in-depth molecular data, is required to apply successful control strategies.
Here, we sequenced and annotated the complete mitogenome of the rugose spiraling white y, Ad. rugioperculatus (Aleurodicinae). The objectives of the present study were to: i) characterize the complete mitogenome of Ad. rugioperculatus; ii) compare mitogenomes evolution across white ies to understand patterns of gene rearrangements and elucidate ancestral, shared and derived gene boundaries.

Ethics statement
There is no speci c permission required to collect the white y specimens.

Sample collection, and DNA Isolation
White ies specimens were collected in September 2020 from the College of Agriculture Campus, Navile, Shivamogga district, Karnataka state of India (13°58'15.654"N, 75°34'47.952"E). Specimens were identi ed morphologically using available taxonomic keys by KS [13]. Specimens were preserved in absolute alcohol and stored at −80˚C in the Centre for DNA Taxonomy, Molecular Systematics Division, Zoological Survey of India, Kolkata. All the specimens were washed individually prior to extraction ve to six times with PBS solution to eliminate the contamination. Genomic DNA was extracted from a single specimen using the DNeasy DNA Extraction kit (Qiagen, Valencia, CA). The dsDNA high-sensitivity kit was used for checking the DNA quantity by Qubit Fluorometer (Thermo Fisher Scienti c, MA, U00SA).
Mitochondrial genome sequencing, assembly, and annotation Whole genome sequencing was carried out on the Illumina platform (Illumina Hiseq 2500) using 2x150 paired end chemistry. Libraries were constructed using the TruSeq DNA Library Preparation kit. Around 50 GB data were generated and low-quality reads were removed by NGS-Toolkit. The mitogenome was assembled from whole genome reads data using GetOrganelle (ver. 1.7.4). The location of protein coding genes (PCGs), transfer RNAs (tRNAs), and ribosomal RNAs (rRNAs) were estimated using MITOS webserver (http://mitos.bioinf.uni-leipzig.de/index.py) [29] (Table 1). Subsequently BLASTn, BLASTp, and ORF Finder (https://www.ncbi.nlm.nih.gov/or nder/) were used to con rm gene boundaries for the PCGs and rRNAs. tRNA secondary structures were predicted in MITOS [29], and ARWEN 1.2 [30]. Control region secondary structures were predicted in mFold [31], implemented in the UNAFold web server using. The data for the newly sequenced species were deposited in GenBank (accession numbers in Table 2).

Genome visualization, and comparative analysis
CGview was used to visualize the complete mitogenome of Ad. rugioperculatus. Nucleotide composition, Relative Synonymous Codon Usage (RSCU), and AT-GC skew statistics were calculated in MEGAX [32]. Nucleotide skew was calculated as AT skew = (A−T) / (A+T) and GC skew = (G−C) / (G+C) [33]. ENC values were calculated in DnaSP6.0 [34]. Pairwise comparisons of gene order, and the evolutionary sequence of rearrangements within white ies were evaluated by CREx (Common Interval Rearrangement Explorer) [35] while the reconstruction of the ancestral gene orders at each tree node was performed using TreeREx [36].
The large ribosomal RNA (rrnL) gene is located at the conserved position between trnL1 and trnV and is 1,286 bp in length (88.26 % A+T content). The small ribosomal RNA (rrnS) gene was located at the conserved position between trnV and trnQ and is 779 bp in length (87.93% A+T content). rRNA secondary structures were compared to that of Drosophila melanogaster [45,46]. rrnL contains six domains with 47 helices (domain III is absent in Ad. rugioperculatus as is found in other arthropod mitogenomes) (Fig 2), whereas rrnS contains three domains with 31 helices (Fig 3).
Ad. rugioperculatus contains 22 tRNAs with a total length of 1,420 bp (range 57 to 70). Fourteen tRNA genes were coded on the majority strand and eight on the minority strand. The A+T content of tRNAs was 88.24% with negative AT skew (0.031) and positive GC skew (0.006). Most tRNAs showed the typical cloverleaf secondary structure, except trnS1 and trnS2 where the DHU stem and loop was absent (Fig.  S1).
A total of 11 intergenic spacer regions, totaling 99 bp, and varying from 1 to 37 bp in length were detected. The longest intergenic spacer (37 bp) was between nad6 and trnP. A total 15 overlapping regions, covering 179 bp (ranging from 1 to 87 bp in length) were detected ( Table 1).
The control region (CR) is located between trnQ and trnI, and is 805 bp in length. It consists of i) a 161 bp A+T rich region along with poly T-stretch; ii) a stem/loop structure anking TATA motif and GAAT motifs; iii) a 221 bp A+T rich region; iv) two tandem repeats (R1, 120 bp); and v) a terminal 28 bp A-rich region. The A+T content was 93.54 % with positive AT skew (0.19) and GC skew (0.15) (Fig. 4 A).
Comparative studies A comparative study was carried out using the 13 mitogenomes of white ies (Table 2) including the newly generated species to explore the codon usage bias, gene rearrangement and phylogeny.

Codon usage bias
Codon usage bias in the 13 white y mitogenomes was compared to the effective number of codons (ENC) (Fig. 4 B). The ENC value ranged from 29.35 to 56.02, with an average of 42.96. ENC values in the subfamily Aleyrodinae were signi cantly higher (t-test, P < 0.05) than those of Aleurodicinae, ranging from 34.31 to 56.02 and from 29.35 to 33.93, respectively. That more codons were used by members of the Aleyrodinae in comparison to Aleurodicinae, suggests weak selection constraints on mitogenomes of Aleyrodinae.
The thirty-seven ancestral boundaries were mapped to the phylogenetic tree which revealed that 16 boundaries are found in all members of the family Aleyrodidae. Eleven additional ancestral gene boundaries are retained in the subfamily Aleurodicinae. The ancestral boundaries 4 (trnk-trnD) and 5 (trnD-atp8) are plesiomorphies found in all Aleyrodidae except Bemisia species and Te. acaciae. Gene boundary 7 (atp6-cox3) is a plesiomorphy for the subfamily Aleurodicinae but is lost in subfamily Aleyrodinae due to the transposition of cox3 gene. Gene boundary 27 (trnL-rrnL) is a plesiomorphy for the subfamily Aleyrodinae, whereas 21 (trnT-trnP) and 22 (trnP-nad6) are plesiomorphies for the Aleyrodidae except for Ad. dispersus and N. andropogonis. Gene boundary 24 (cyt-trnS2) is conserved Aleyrodidae except for Ad. dispersus and Tr. vaporariorum.

TreeREx
We carried out TreeREx analysis using the 13 PCGs+2RNAs ML-4 phylogenetic tree (Table S5). To eliminate biases due to partial mitogenomes, the 7 missing tRNAs were excluded from all species in this analysis. TreeRex analysis revealed the four inversions at the node of A0 towards Ad. dispersus; one inversion (rrnS) and one inverse transposition (nad3 cox3) at the node of A10 towards N. andropogonis.
Investigating ancestral gene blocks and mapping new gene boundaries through character coding and TreeREx revealed similar results. Gene order in subfamily Aleurodicinae was more conserved relative to the ancestral insect than was subfamily Aleyrodinae. Gene blocks II and V are the most variable gene regions whereas gene block I is the most conserved one in the Aleyrodidae. These mapped gene boundaries require further investigation across more species within each genus to determine whether they represent species-level autapomorphies or genus-level synapomorphies.

Phylogeny
In the present study, phylogenetic analyses were carried out using 4 datasets of derived from the 13 PCGs of white ies. Eight phylogenetic trees were constructed by Bayesian Inference and Maximum likelihood which have broadly similar topologies with few discrepancies (Figs. 6B, S2-S9). Both inference methods support the monophyly of both subfamilies Aleurodicinae and Aleyrodinae, consistent with previous studies [27]. Ad. rugioperculatus was sister to Ad. dugesii and they share an identical gene order. In contrast, Ad. dispersus and Ad. dugesii are in one clade but differ in their gene order. Tr. vaporariorum was sister to Aleurochiton + Pealius in all the analyses except ML-1, 2 and BI-2, where it was sister to Bemisia. This result is well supported in earlier phylogenies of white ies [27]. The genera Bemisia, and Aleurocanthus were recovered as monophyletic in all the analyses. Ach. aceris was sister to P. machili in all the analyses which was also support by earlier phylogenies. Te. acaciae was sister to C. turpinae in all analyses. Finally, to resolve the phylogenetic relationships of Trialeurodes more species from this genus will need to be sequenced.

Conclusion
White ies belong to the order Hemiptera forming the family Aleyrodidae with two subfamilies Aleyrodinae and Aleurodicinae. Until now, 12 either complete or partial mitogenome data of family Aleyrodidae have been available. Here, we sequenced and annotated the second complete mitogenome from the subfamily Aleurodicinae and conducted a comparative analysis. The mitogenome is 15060 bp with 86.5% A + T content and 13.5% G + C content. Gene order in this species is similar to that of another Aleurodicinae species, Ad. dugesii. Gene rearrangements have been detected in the mitogenomes of white ies, located in several gene blocks between (cox2-trnK-trnD-atp6-atp8), (cox3-trnG-nad3-trnA-trnR), (trnL1-rrnL-trnV-rrnS), (trnI-trnQ-trnM-nad2-trnW-trnC-trnY-cox1). Five ancestral gene blocks (I-V) and 83 derived gene boundaries (37 ancestral, 19 derived, 27 unique) were identi ed in white ies. Ancestral gene block I was conserved of the whole family Aleyrodidae except for Tetraleurodes acaciae and Bemisia species. The ancestral gene block IV was conserved across the subfamily Aleurodicinae. Out of the 37 ancestral boundaries (plesiomorphies), 15 were shared across the two subfamilies. Nineteen derived gene boundaries were mapped on to the white y phylogeny but only 9 were synapomorphic. We have identi ed 27 species-level derived boundaries (autapomorphies) which require further investigation. Eight phylogenies were constructed using Bayesian Inference (BI) and Maximum Likelihood (ML) that resulted in similar topologies. It supported the monophyly of both subfamilies Aleyrodinae and Aleurodicinae with good support values. Exclusion of third codon positions, and GBlock masking improved nodal support.
The PCGs based phylogenetic study demonstrated that mitogenomes were informative in addressing the phylogenetic questions in white ies. More mitogenome sequencing data is needed to provide comprehensive taxonomic sampling and better understand the evolution of white ies.