Nucleotide composition and genome organization
The complete mtgenomes of An. peditaeniatus (MT822295) and An. nitidus (MW401801) are both closed circular, double-stranded molecules, with full lengths of 15,416 and 15,418 bp, respectively (Fig. 1). Each comprises 37 genes (13 PCGs, 22 tRNA genes and two rRNA genes) and one control region (CR). Twenty-two genes (nine PCGs and 13 tRNAs) are located on the majority coding strand (J-strand), while the other 15 genes (four PCGs, nine tRNAs and two rRNAs) are on the minority strand (N-strand). Compared with the typical Diptera mtgenome (Drosophila yakuba), both An. peditaeniatus and An. nitidus have the “trnR-trnA” rearrangement. The AT contents of the two mtgenomes are as high as 78.32% and 78.26%, respectively, far higher than their GC contents (21.68% and 21.74%), showing an obvious AT bias (Additional file 1: Table S1). The AT-skew of An. peditaeniatus (0.0322) is higher than the average AT-skew of all investigated mosquito mtgenomes (0.0283), whereas that of An. nitidus (0.0266) is lower than the average. The GC-skews of An. peditaeniatus (-0.1587) and An. nitidus (-0.1536) are both higher (less negative) than the average GC-skew of the mosquitoes investigated (-0.16048).
The three-dimensional scatter plot of the AT content, AT-skew and GC-skew of 76 mtgenomes in the genus Anopheles is shown in Fig. 2. The AT-skew ranges from 0.005 in An. gilesi to 0.043 in An. christyi, and all mtgenomes display negative GC-skews, ranging from -0.207 in An. parvus to -0.136 in An. punctulatus. Most species of the subgenera Nyssorhynchus and Cellia have similar AT content, AT-skew and GC-skew and cluster closely in the three-dimensional scatter plot, whereas species of the subgenera Lophopodomyia, Stethomyia, Kerteszia and Anopheles are widely dispersed in the plot.
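The composition statistics reported above can be illustrated with a minimal sketch. It assumes the conventional definitions AT-skew = (A - T)/(A + T) and GC-skew = (G - C)/(G + C), which this section does not restate; the function name `composition_stats` and the toy sequence are hypothetical and are not taken from the sequenced mtgenomes.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """Return AT content, AT-skew and GC-skew for one strand of a DNA sequence.

    Assumed (conventional) definitions:
      AT-skew = (A - T) / (A + T),  GC-skew = (G - C) / (G + C).
    Ambiguous bases (e.g. N) are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Hypothetical toy sequence: A=4, T=3, G=1, C=2.
toy_stats = composition_stats("AAAATTTGCC")
```

With these definitions, a positive AT-skew means more A than T on the analysed strand, and a negative GC-skew means more C than G, matching the signs reported for the two mtgenomes.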
Protein‑coding genes
The total nucleotide lengths of the 13 PCGs of An. peditaeniatus and An. nitidus are 11,223 and 11,168 bp, respectively. In An. peditaeniatus, all PCGs use ATN as the start codon except COX1 and ND5, which use TCG and GTG, respectively; in An. nitidus, all PCGs initiate with ATN except COX1, which uses TCG (Table 2).
The RSCU values of the mtgenomes of 76 Anopheles species are presented in Additional file 2: Table S2. The mtgenomes differ in their usage frequencies of synonymous codons. Across the 76 species, UUA is the most frequently used codon, followed by CGA, GGA and GCU. Leu is the most frequently used amino acid in all 76 mtgenomes, with an average of 16.37%, followed by Phe (9.69%), Ile (9.31%) and Ser (8.48%), whereas Cys is the least used (0.99%). The usage percentages of amino acids show no obvious difference among subgenera (Fig. 3).
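As an illustration of how an RSCU value is obtained, the following is a minimal sketch and not the pipeline used in this study. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5) and groups codons by amino acid, so Leu and Ser are each treated as a single family; some tools instead split them by codon family, which changes the values for those two amino acids. The function name `rscu` and the toy coding sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for an in-frame coding sequence.

    RSCU(codon) = observed count * (number of synonymous codons)
                  / (total count of all codons for that amino acid).
    Stop codons and codons with ambiguous bases are skipped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Hypothetical toy CDS: three TTA (UUA) codons and one CTT codon, all Leu.
toy_rscu = rscu("TTATTATTACTT")
```

Codons are handled in DNA notation here, so the UUA codon discussed above corresponds to TTA. An RSCU above 1 indicates a codon used more often than expected under equal usage of its synonyms.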
The ratios of non-synonymous (Ka) to synonymous (Ks) substitution rates (Ka/Ks) of the 13 PCGs are shown in Fig. 4. The Ka/Ks ratios are all less than 1. ND6 has the highest Ka/Ks ratio (0.203), followed by six genes (ATP8, ND2, ND5, ND4L, ND4 and ND3) with Ka/Ks ratios of 0.098-0.152. Complex IV (COX1, COX2 and COX3), Complex III (CYTB), ND1 and ATP6 have low Ka/Ks ratios, ranging from 0.022 (COX1) to 0.051 (ND1). These results imply that all 13 PCGs have experienced purifying selection, especially those of Complex IV and Complex III, ND1 and ATP6.
Transfer RNAs, ribosomal RNAs and CR
The total lengths of the 22 tRNAs of An. peditaeniatus and An. nitidus are 1475 bp and 1476 bp, respectively, and the lengths of individual tRNAs vary from 64 to 72 bp. All tRNAs can fold into the typical clover-leaf structure, containing four stems and loops, except for trnS2, which has lost the dihydrouridine (DHU) arm. There are 22 mismatched base pairs (G-U) in the An. peditaeniatus tRNAs and 21 mismatched base pairs (G-U) in those of An. nitidus (Additional file 3: Figure S1). In the two newly sequenced mtgenomes, rrnL is located between trnL2 and trnV, and rrnS between trnV and the CR. The total length of the rRNAs is 2125 bp, with an AT content of 81.36%, in An. peditaeniatus, and 2122 bp, with an AT content of 81.39%, in An. nitidus.
The control regions (CRs) of the two mtgenomes are both located between rrnS and trnI. In An. peditaeniatus and An. nitidus, respectively, their lengths are 575 and 580 bp and their AT contents are 94.43% and 93.62% (the highest among all mtgenome regions). Six repeat unit types are identified in the CRs of the mtgenomes of 74 Anopheles species (Additional file 4: Figure S2). All species have the first type, a 15-27 bp poly-T stretch, which is located in front of the other repeat unit types and just after a 140-212 bp conserved sequence. The poly-T stretch is directly adjacent to the conserved motif 5′-CCCCTA-3′ in the conserved sequence in 68 species, whereas the motif is replaced by 5′-ATTGTA-3′ in An. cracens and An. dirus, and by 5′-TTCCCC-3′ in An. kompi, An. nimbus, An. gilesi and An. pseudotibiamaculatus. The second type is a 12-55 bp sequence with 2-6 repeats, which lies just after the poly-T stretch and exists in 54 species. The third type, the [TA(A)]n stretch, contains 22-91 repeats and exists in 36 species. The fourth type is a 12-38 bp sequence with 2-5 repeats, which lies near trnI and exists in 40 species. The remaining two repeat unit types are found in only a few species: one is a 15-36 bp sequence that follows the second type and exists in five species, and the other is a 108-171 bp sequence, the longest of all six types, which exists in only four species.
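The two simplest repeat types described above, the poly-T stretch and the [TA(A)]n stretch, can be located with a regular-expression scan. The following is a minimal sketch only: the section does not state how the repeats were detected, and the lower bounds used here (15 consecutive T and 22 TA/TAA units) are assumptions taken from the reported ranges. The names `POLY_T`, `TA_REPEAT` and `find_runs` and the toy sequences are hypothetical.

```python
import re

# Assumed thresholds, taken from the lower ends of the ranges reported in the text.
POLY_T = re.compile(r"T{15,}")            # run of at least 15 consecutive T
TA_REPEAT = re.compile(r"(?:TAA?){22,}")  # at least 22 tandem TA or TAA units


def find_runs(pattern: re.Pattern, seq: str) -> list:
    """Return (0-based start, length in bp) for each non-overlapping match."""
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequences, not real control-region data.
toy_poly_t = find_runs(POLY_T, "ACGA" + "T" * 18 + "GCA")
toy_ta = find_runs(TA_REPEAT, "GC" + "TA" * 25 + "GC")
```

The strand orientation of a real CR sequence has to match the orientation in which the repeats are described; on the opposite strand the poly-T stretch would appear as a poly-A run.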
Phylogenetic relationships
Bayesian inference (BI) and maximum-likelihood (ML) analyses produced phylogenetic trees with the same topology at the subgenus level (Figs. 5, 6). The six subgenera investigated, Lophopodomyia, Stethomyia, Kerteszia, Nyssorhynchus, Anopheles and Cellia, all appear to be monophyletic in both analyses, with posterior probability (pp) = 1 for every subgenus in BI (Fig. 5) and bootstrap values (bv) ranging from 99% to 100% in ML analysis (Fig. 6). The subgenus Lophopodomyia is located at the base of these six subgenera, and the branch comprising the remaining five subgenera has support of pp = 1 and bv = 71%. The two subgenera Stethomyia and Kerteszia form a monophyletic group with pp = 1 and bv = 89%, which is the earliest derived lineage apart from Lophopodomyia. The branch containing Nyssorhynchus, Anopheles and Cellia has support of pp = 1 and bv = 68%. The subgenus Nyssorhynchus appears to be the sister group of the monophyletic group Anopheles + Cellia, which has pp = 1 and bv = 99%.
In the subgenus Cellia, the four series investigated, Myzomyia, Neocellia, Pyretophorus and Neomyzomyia, each appear monophyletic, with pp = 1 and bv = 100% in every case. The series Neomyzomyia would be the earliest derived and sister to the remaining three series, and the series Pyretophorus would be sister to the series Myzomyia and Neocellia. In the subgenus Anopheles, the two sections Angusticorn and Laticorn both appear polyphyletic, whereas within the section Laticorn the series Arribalzagia (pp = 1 and bv = 96%) and Myzorhynchus (pp = 1 and bv = 100%) both appear monophyletic. In the subgenus Nyssorhynchus, the three sections investigated, Myzorhynchella, Argyritarsis and Albimanus, all appear polyphyletic, and within the section Argyritarsis the two series Argyritarsis and Albitarsis both appear polyphyletic as well.
On the other hand, the internal relationships of Kerteszia differ between the BI and ML trees: An. homunculus branched off earlier than An. bellator in the BI tree (Fig. 5), whereas An. bellator branched off earlier than An. homunculus in the ML tree (Fig. 6).