1) Characterization of complete mitogenome of Aleurodicus rugioperculatus
The complete mitogenome of Aleurodicus rugioperculatus (GenBank accession no. MW649000) is a 15,060 bp circular DNA molecule (Fig 1, Table 1). It contains 37 genes, including 13 PCGs, two rRNAs, 22 tRNAs and a non-coding control region (CR). Out of the 37 genes, 23 were located on the majority strand and 14 on minority strand. A+T content was 86.5% (38.6% A + 47.9% T) and G+C content 13.5% (7.9%G + 5.6% C). A+T content was lowest in the control region (75.23%) followed by PCGs (85.21 %), rRNAs (88.1%) and tRNAs (88.24%) (Table S1).
The mitogenome of Ad. rugioperculatus includes 13 PCGs (total length of 10,840 bp) with negative AT skew (0.141) and positive GC skew (0.226). All PCGs used ATN start codons (ATA for nad1, nad2, nad4 and nad6; ATG for cox1, cox2, cox3, atp6, nad4L and cytb; and ATT for nad3, nad5 and atp8) (Table 1). Most PCGs have TAA stop codons except cox1, cox2 and nad1 which have incomplete termination codons (T). Incomplete termination codons are observed in many other insect mitogenomes and are completed by post-transcriptional polyadenylation .
The average A+T content across PCGs was 85.21%, highest in nad6 (92.72%) and lowest in cox1 (77.80%) (Table S1). Base composition at each codon position (Ist, IInd, IIrd) of the concatenated PCGs was 77.73%, 72.6%, and 95.2% respectively. RSCU analysis of the 3,613 codons in Ad. rugioperculatus revealed that TTT(Phenylalanine), AAT(Asparagine), TAT(Tyrosine), ATT(Isoleucine), TTA(Leucine), and AAA (Lysine) were the most frequently used codons. RSCU also indicated that almost all frequently used codons ended with either A or T.
The large ribosomal RNA (rrnL) gene is located at the conserved position between trnL1 and trnV and is 1,286 bp in length (88.26 % A+T content). The small ribosomal RNA (rrnS) gene was located at the conserved position between trnV and trnQ and is 779 bp in length (87.93% A+T content). rRNA secondary structures were compared to that of Drosophila melanogaster [45,46]. rrnL contains six domains with 47 helices (domain III is absent in Ad. rugioperculatus as is found in other arthropod mitogenomes) (Fig 2), whereas rrnS contains three domains with 31 helices (Fig 3).
Ad. rugioperculatus contains 22 tRNAs with a total length of 1,420 bp (range 57 to 70). Fourteen tRNA genes were coded on the majority strand and eight on the minority strand. The A+T content of tRNAs was 88.24% with negative AT skew (0.031) and positive GC skew (0.006). Most tRNAs showed the typical cloverleaf secondary structure, except trnS1 and trnS2 where the DHU stem and loop was absent (Fig. S1).
A total of 11 intergenic spacer regions, totaling 99 bp, and varying from 1 to 37 bp in length were detected. The longest intergenic spacer (37 bp) was between nad6 and trnP. A total 15 overlapping regions, covering 179 bp (ranging from 1 to 87 bp in length) were detected (Table 1).
The control region (CR) is located between trnQ and trnI, and is 805 bp in length. It consists of i) a 161 bp A+T rich region along with poly T-stretch; ii) a stem/loop structure flanking TATA motif and GAAT motifs; iii) a 221 bp A+T rich region; iv) two tandem repeats (R1, 120 bp); and v) a terminal 28 bp A-rich region. The A+T content was 93.54 % with positive AT skew (0.19) and GC skew (0.15) (Fig. 4 A).
A comparative study was carried out using the 13 mitogenomes of whiteflies (Table 2) including the newly generated species to explore the codon usage bias, gene rearrangement and phylogeny.
Codon usage bias
Codon usage bias in the 13 whitefly mitogenomes was compared to the effective number of codons (ENC) (Fig. 4 B). The ENC value ranged from 29.35 to 56.02, with an average of 42.96. ENC values in the subfamily Aleyrodinae were significantly higher (t-test, P < 0.05) than those of Aleurodicinae, ranging from 34.31 to 56.02 and from 29.35 to 33.93, respectively. That more codons were used by members of the Aleyrodinae in comparison to Aleurodicinae, suggests weak selection constraints on mitogenomes of Aleyrodinae.
The neutrality analysis revealed that GC3 content is higher in subfamily Aleyrodinae than in Aleurodicinae ranging from 8.6-40.3 and 4.2-9.9, respectively (Table S3). A significant positive correlation between GC12 and GC3 was observed in both subfamilies (Y= 0.355x+23.96, R2 = 0.791) (Fig. 4C) suggesting directional mutation pressure on all codon positions. The slope of linear regression was less than 1 in whiteflies, implying that mutational pressure played a minor role in their mitogenome evolution. Other factors like natural selection, and translational bias accounted for 65% of observed codon usage bias.
Mitogenome rearrangements can be explained by transposition, inversion, inverse transposition of individual genes or gene blocks. Gene rearrangement in Ad. rugioperculatus was analysed by comparing common gene intervals to those of the ancestral insect’s inferred gene order in CREx. CREx analysis identified the transpositions of two tRNAs (trnQ and trnY) away from the ancestral gene blocks I-Q-M and W-C-Y (Table S4).
Comparison of gene order across the 13 available whiteflies was conducted. Gene order in 6 whitefly species is partial due one or more missing tRNAs. The genes trnA, trnN, trnR were not detected in N. andropogonis; trnI in Ac. camelliae and Ach. aceris; trnS1 in Ac. spiniferus and Te. acaciae; trnS2, trnQ in Ad. dispersus. We mapped gene boundaries with respect to the inferred ancestral insect gene order to identify ancestral vs derived, and unique vs shared gene boundaries (Fig. 5). A total of 83 gene boundaries were detected, of which 37 were ancestral and 46 derived boundaries (Table S5). The ancestral 37 gene boundaries were grouped into 5 gene blocks (I-V) that were conserved in different whitefly lineages. Gene block I (cox1-trnL2-cox2-trnK-trnD-atp6-atp8) was conserved in all taxa except Te. acaciae, B. afer and B. tabaci. trnD was translocated from gene block I in these species resulting in the derived gene boundaries trnR-trnD (55) and trnD-trnI (56) in Te. acaciae; and trnV-trnD (61) and trnD-trnQ (62) in Bemisia species. Gene block II (atp6-cox3-trnG-nad3-trnA-trnR-trnN-trnS1-trnE) is conserved in only two species of the subfamily Aleurodicinae (Ad. rugioperculatus and Ad. dugesii). Gene block II is the most variable block across the remaining taxa. A partial set (cox3-trnG-nad3-trnA-trnR) of gene block II is found in the subfamily Aleyrodinae except Te. acaciae and N. andropogonis. Gene block III (trnE-trnF-nad5-trnH-nad4-nad4L-trnT-trnP-nad6-cytb-trnS2-nad1-trnL1) is conserved in 8 species (Ad. dugesii, Ad. rugioperculatus, Ac. camelliae, Ac. spiniferus, C. turpiniae, B. afer, B. tabaci, Te. acaciae). The genes nad1-trnL1 are translocated out of gene block III in P. machili and Ach. aceris; as are trnS2 in Tr. vaporariorum; trnP in N. andropogonis; and nad6-cytb-nad1-trnL1 in Ad. dispersus.
Gene block IV (trnL1-rrnL-trnV-rrnS) is conserved in 6 species (Ad. dispersus, Ad. dugesii, Ad. rugioperculatus, Tr. vaporariorum, Ach. aceris, P. machili). trnV is translocated out of gene block IV in Ac. spiniferus, Ac. camelliae, Te. acaciae, C. turpiniae; rrnS in B. afer, B. tabaci. Gene block V (rrnS-trnI-trnQ-trnM-nad2-trnW-trnC-trnY-cox1) is not conserved in any whitefly species due to multiple transposition and inversions.
The thirty-seven ancestral boundaries were mapped to the phylogenetic tree which revealed that 16 boundaries are found in all members of the family Aleyrodidae. Eleven additional ancestral gene boundaries are retained in the subfamily Aleurodicinae. The ancestral boundaries 4 (trnk-trnD) and 5 (trnD-atp8) are plesiomorphies found in all Aleyrodidae except Bemisia species and Te. acaciae. Gene boundary 7 (atp6-cox3) is a plesiomorphy for the subfamily Aleurodicinae but is lost in subfamily Aleyrodinae due to the transposition of cox3 gene. Gene boundary 27 (trnL-rrnL) is a plesiomorphy for the subfamily Aleyrodinae, whereas 21 (trnT-trnP) and 22 (trnP-nad6) are plesiomorphies for the Aleyrodidae except for Ad. dispersus and N. andropogonis. Gene boundary 24 (cyt-trnS2) is conserved Aleyrodidae except for Ad. dispersus and Tr. vaporariorum.
Out of 47 derived gene boundaries, 19 were synapomorphies and 28 autapomorphies. Synapomorphic gene boundaries were mapped to the phylogenetic tree demonstrating that only nine were shared at the nodes. Gene boundaries 46 (trnW-trnY) and 47 (trnC-cox1) are shared across the family Aleyrodidae with a few exceptions, N. andropogonis (46,47 absent) and Tr. vaporariorum (46 absent). Moreover, gene boundaries 63-65 (trnS2-trnR, cox3-trnN, trnQ-nad1) are synapomorphies for the clade Ach. aceris+P. machili. Gene boundaries 61 (trnV-trnD), 62 (trnD-trnQ) are synapomorphies for the Genus Bemisia. Gene boundaries 38 (atp6-trnE), 39 (rrnL-rrnS) were synapomorphies for the clade (Ac. spiniferus + Ac. camelliae) + (Te. acaciae + C. turpiniae).
Gene boundary 49 (trnS1-trnM) is an autapomorphy for Ac. camelliae; 52-58 (trnV-trnN, trnN-nad3, cox3-trnR, trnR-trnD, trnD- trnI, trnI-trnA, trnA-trnM) for Te. acaciae; 59 (trnS1-trnI) for C. turpiniae; 66 (rrnS-trnM) for Ach. aceris; 67-72 (cox3-nad3, cytb-nad1, rrnS-trnS2, trnS2-trnQ, trnW-trnG, trnG-trnY) for Tr. vaporariorum; 73-78 (trnT-nad6, rrnL-trnP, trnP-nad3, cox3-trnC, trnY-rrnS, trnW-cox1) for N. andropogonis; 79-83 (trnR-rrnS, nad1-nad6, cytb-trnS1, trnN-trnE, trnP-trnI) for Ad. dispersus.
We carried out TreeREx analysis using the 13 PCGs+2RNAs ML-4 phylogenetic tree (Table S5). To eliminate biases due to partial mitogenomes, the 7 missing tRNAs were excluded from all species in this analysis. TreeRex analysis revealed the four inversions at the node of A0 towards Ad. dispersus; one inversion (rrnS) and one inverse transposition (nad3 cox3) at the node of A10 towards N. andropogonis. One inverse transposition (nad3 cox3) occurred at node A9 towards A8 which contains (Ach. aceris + P. machili) + (Bemisia species) + (C. turpinae + Te. acaciae) + (Ac. camelliae + A. spiniferus) in which nad3 and cox3 were inverse transpositioned. One transposition (nad1 rrnL rrnS) occurred at node A8 towards to A7 which contains Bemisia + (C. turpinae + Te. acaciae) + (Ac. camelliae + Ac. spiniferus) (Fig. 6A).
Investigating ancestral gene blocks and mapping new gene boundaries through character coding and TreeREx revealed similar results. Gene order in subfamily Aleurodicinae was more conserved relative to the ancestral insect than was subfamily Aleyrodinae. Gene blocks II and V are the most variable gene regions whereas gene block I is the most conserved one in the Aleyrodidae. These mapped gene boundaries require further investigation across more species within each genus to determine whether they represent species-level autapomorphies or genus-level synapomorphies.
In the present study, phylogenetic analyses were carried out using 4 datasets of derived from the 13 PCGs of whiteflies. Eight phylogenetic trees were constructed by Bayesian Inference and Maximum likelihood which have broadly similar topologies with few discrepancies (Figs. 6B, S2-S9). Both inference methods support the monophyly of both subfamilies Aleurodicinae and Aleyrodinae, consistent with previous studies . Ad. rugioperculatus was sister to Ad. dugesii and they share an identical gene order. In contrast, Ad. dispersus and Ad. dugesii are in one clade but differ in their gene order. Tr. vaporariorum was sister to Aleurochiton + Pealius in all the analyses except ML-1, 2 and BI-2, where it was sister to Bemisia. This result is well supported in earlier phylogenies of whiteflies . The genera Bemisia, and Aleurocanthus were recovered as monophyletic in all the analyses. Ach. aceris was sister to P. machili in all the analyses which was also support by earlier phylogenies. Te. acaciae was sister to C. turpinae in all analyses. Finally, to resolve the phylogenetic relationships of Trialeurodes more species from this genus will need to be sequenced.