Genome organization
A total of 3.7 Gb of data (about 20-fold coverage) was obtained from the Illumina HiSeq 2500 platform, which produced 15,178,382×2 raw reads. After cleaning, 9,630,532×2 clean reads were retained for assembly of the mt genome. The longest contig, 16,659 bp in size, represents the complete mt genome of F. suturalis (GenBank accession: MI456908). We identified and annotated all 37 genes typical of metazoan mt genomes (Fig. 2; Table 1): 13 protein-coding genes, 22 tRNA genes and two rRNA genes, together with three non-coding (AT-rich) regions. The mt gene arrangement and distribution of genes are distinct from those of F. quadripustulatus [11] and I. bisignatus [15]. The overall nucleotide composition was A = 27.8%, T = 44.8%, C = 11.1% and G = 16.3%. All mt genes were encoded on the heavy strand, as in other bird louse species [11, 13]. Three pairs of overlapping genes were observed in the mt genome of F. suturalis: nad4L/nad1, tRNA-His/tRNA-Asp and tRNA-Asp/tRNA-Arg. The overlaps ranged from 4 bp to 8 bp (Table 1). In addition, 22 intergenic regions were observed in this mt genome, ranging from 1 bp to 180 bp in size. The longest spacer was found between the tRNA-SerUCN (S2) and tRNA-Gly (G) genes (Table 1).
Table 1
The organization of the mt genome of F. suturalis.
Gene/Region | Positions | Size (bp) | Number of aa (a) | Ini/Ter codons (b) | Anticodon | In (c) |
cox1 | 34–1557 | 1524 | 507 | ATA/TAA | | + 33 |
tRNA-Met (M) | 1574–1637 | 64 | | | CAT | + 16 |
tRNA-Gln (Q) | 1639–1705 | 67 | | | TTG | + 1 |
tRNA-Glu (E) | 1706–1770 | 65 | | | TTC | 0 |
atp6 | 1774–2445 | 672 | 223 | ATA/TAA | | + 3 |
tRNA-Asn (N) | 2451–2517 | 67 | | | GTT | + 5 |
rrnS | 2518–3243 | 726 | | | | 0 |
rrnL | 3244–4318 | 1075 | | | | 0 |
tRNA-Ala (A) | 4319–4382 | 64 | | | TGC | 0 |
nad6 | 4385–4858 | 474 | 157 | ATG/TAA | | + 2 |
tRNA-Val (V) | 4861–4922 | 62 | | | TAC | + 2 |
cox3 | 4977–5726 | 750 | 249 | ATA/TAA | | + 54 |
tRNA-Lys (K) | 5746–5808 | 63 | | | TTT | + 19 |
nad4 | 5843–7156 | 1314 | 437 | ATT/TAG | | + 34 |
AT-loop region | 7157–7985 | 829 | | | | |
tRNA-LeuUUR (L2) | 7986–8047 | 62 | | | TAA | 0 |
tRNA-Pro (P) | 8064–8124 | 61 | | | TGG | + 16 |
nad2 | 8130–9101 | 972 | 323 | ATG/TAA | | + 5 |
tRNA-Thr (T) | 9171–9235 | 65 | | | TGT | + 69 |
tRNA-Tyr (Y) | 9249–9313 | 65 | | | GTA | + 13 |
cox2 | 9314–9991 | 678 | 225 | ATA/TAA | | 0 |
AT-loop region | 9992–10713 | 722 | | | | |
nad5 | 10714–12389 | 1676 | 558 | ATG/TA | | 0 |
tRNA-Phe (F) | 12390–12456 | 67 | | | GAA | 0 |
tRNA-Cys (C) | 12477–12543 | 67 | | | GCA | + 20 |
atp8 | 12565–12765 | 201 | 66 | ATG/TAA | | + 21 |
tRNA-SerUCN (S2) | 12772–12840 | 69 | | | TGA | + 6 |
tRNA-Gly (G) | 13021–13091 | 71 | | | TCC | + 180 |
AT-loop region | 13092–13516 | 425 | | | | |
nad3 | 13517–13903 | 387 | 128 | ATT/TAG | | 0 |
tRNA-LeuCUN (L1) | 13905–13966 | 62 | | | TAG | + 1 |
nad4L | 13992–14264 | 273 | 90 | ATT/TAA | | + 25 |
nad1 | 14257–15163 | 907 | 302 | ATG/T | | -8 |
tRNA-SerAGN (S1) | 15164–15231 | 68 | | | TCT | 0 |
cytb | 15232–16323 | 1092 | 363 | TTG/TAG | | 0 |
tRNA-Trp (W) | 16330–16396 | 67 | | | TCA | + 6 |
tRNA-His (H) | 16398–16460 | 63 | | | GTG | + 1 |
tRNA-Asp (D) | 16457–16524 | 68 | | | GTC | -4 |
tRNA-Arg (R) | 16516–16585 | 70 | | | ACG | -8 |
tRNA-Ile (I) | 16593–16659 | 67 | | | GAT | + 6 |
(a) The inferred length of the amino acid (aa) sequence of the 13 protein-coding genes; (b) Ini/Ter codons: initiation and termination codons;
(c) In: intergenic nucleotides.
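The intergenic values in Table 1 can be recomputed from the gene coordinates. The following is a minimal sketch, assuming 1-based inclusive positions as listed in the table; the function names are illustrative and not part of the original analysis. A positive value is a spacer, zero means the genes abut, and a negative value is an overlap.

```python
# Coordinates copied from Table 1: (gene, start, end), 1-based and inclusive.
COX1 = ("cox1", 34, 1557)
TRNA_MET = ("tRNA-Met", 1574, 1637)
TRNA_GLN = ("tRNA-Gln", 1639, 1705)
TRNA_GLU = ("tRNA-Glu", 1706, 1770)
NAD4L = ("nad4L", 13992, 14264)
NAD1 = ("nad1", 14257, 15163)


def gene_size(gene):
    """Length in bp of a gene given inclusive start/end positions."""
    _, start, end = gene
    return end - start + 1


def intergenic(upstream, downstream):
    """Nucleotides between two consecutive genes.

    Positive: spacer; zero: adjacent; negative: overlap.
    """
    return downstream[1] - upstream[2] - 1


if __name__ == "__main__":
    print(gene_size(COX1))                 # size of cox1
    print(intergenic(COX1, TRNA_MET))      # spacer before tRNA-Met
    print(intergenic(TRNA_GLN, TRNA_GLU))  # adjacent genes
    print(intergenic(NAD4L, NAD1))         # overlap between nad4L and nad1
```

For the rows used here the computed values agree with the Size and In columns of Table 1.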
The total A + T and G + C contents of the complete mt genome were 73.0% and 27.0%, respectively, consistent with previous studies [11, 15] (Table 2). A negative AT skew (-23.3) and a positive GC skew (18.9) were calculated for this mt genome (Table 2), which are common features of ectoparasite mt genomes [11, 15]. All bird lice from the Philopteridae reported to date, including the species in the present study, show strand asymmetry (GC skew between 6.3% and 38.1%) (Table 2).
Table 2
Nucleotide composition of the mt genomes of Philopteridae species, including that of Falcolipeurus suturalis.
Species | A (%) | T (%) | G (%) | C (%) | A + T (%) | AT skew | GC skew |
Bothriometopus macrocnemis | 32.1 | 38.7 | 15.5 | 13.8 | 70.8 | -9.2 | 6.1 |
Campanulotes bidentatus compar | 26.5 | 43.7 | 20.67 | 9.77 | 70.1 | -24.5 | 38.1 |
Campanulotes compar | 26 | 44.5 | 20.4 | 9.1 | 70.5 | -26.3 | 38.1 |
Coloceras sp. SLC-2011 | 27.5 | 42.9 | 19.9 | 9.6 | 70.4 | -21.8 | 35.1 |
Ibidoecus bisignatus | 35.5 | 40.6 | 13.2 | 10.8 | 76 | -6.7 | 10.2 |
Columbicola columbae | 39.1 | 29.2 | 16.3 | 15.4 | 68.2 | 14.6 | 2.8 |
Columbina picui | 33.5 | 31.6 | 18.3 | 16.6 | 65.1 | 2.9 | 5 |
Columbina cruziana | 32.9 | 31.4 | 19 | 16.7 | 64.3 | 2.4 | 6.3 |
Falcolipeurus quadripustulatus | 26.3 | 45.5 | 16.9 | 11.3 | 71.8 | -26.8 | 20.1 |
Falcolipeurus suturalis | 28 | 45 | 16.4 | 11.2 | 73 | -23.3 | 18.9 |
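The skew values in Table 2 can be approximated from the nucleotide frequencies. The sketch below assumes the conventional definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), expressed as percentages; the text does not state the formula used, so this is an assumption, and the function names are illustrative.

```python
def at_skew(a, t):
    """AT skew in percent: (A - T) / (A + T) * 100."""
    return 100.0 * (a - t) / (a + t)


def gc_skew(g, c):
    """GC skew in percent: (G - C) / (G + C) * 100."""
    return 100.0 * (g - c) / (g + c)


# Frequencies (%) for Falcolipeurus suturalis as listed in Table 2.
FS = {"A": 28.0, "T": 45.0, "G": 16.4, "C": 11.2}

if __name__ == "__main__":
    print(round(at_skew(FS["A"], FS["T"]), 1))
    print(round(gc_skew(FS["G"], FS["C"]), 1))
```

Because the tabulated frequencies are themselves rounded, values recomputed this way can differ from the published skews in the last decimal place.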
Annotation
As the mt genomes of parasitic lice can contain non-standard initiation codons [1, 5, 13], the identification of initiation codons can be challenging. In this mt genome, all protein-coding genes used ATA, ATG, ATT or TTG as their initiation codon: four genes (cox1, atp6, cox3 and cox2) start with ATA, five genes (nad6, nad2, nad5, atp8 and nad1) start with ATG, three genes (nad3, nad4L and nad4) start with ATT and one gene (cytb) uses TTG (Table 1). All protein-coding genes used TAA, TAG, TA or T as their termination codon (Table 1): eight genes (cox1, atp6, nad6, cox3, nad2, cox2, atp8 and nad4L) stop with TAA, three genes (nad4, nad3 and cytb) stop with TAG, nad5 stops with TA and nad1 uses T (Table 1). Incomplete termination codons (TA or T) were thus identified in the nad5 and nad1 genes, which is consistent with studies of some other bird lice, including B. macrocnemis (nad1) and F. quadripustulatus (nad5, nad6 and nad1). In the F. suturalis mt genome, the rrnL gene was located between the rrnS and tRNA-Ala genes, and the rrnS gene was located between the tRNA-Asn and rrnL genes (Fig. 2; Table 1). The lengths of the rrnS and rrnL genes were 726 bp and 1075 bp, respectively. The 22 tRNA genes varied in length from 61 to 71 bp (Table 1). All 22 tRNA genes can fold into a cloverleaf structure (Fig. 3), consistent with previous studies [32, 33]. Non-coding region 1 (NC1; 829 bp), located between nad4 and tRNA-L2, has the highest A + T content (75.4%). Non-coding region 2 (NC2; 722 bp; A + T = 74.4%) is located between cox2 and nad5, and non-coding region 3 (NC3; 425 bp; A + T = 75.1%) is located between tRNA-G and nad3 (Table 1).
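The gene sizes in Table 1 indicate which termination codons are incomplete. The sketch below assumes that each listed size includes the termination codon, whether complete or partial: a size divisible by three implies a complete stop codon, while a remainder of one or two implies a truncated T or TA stop. The helper name is hypothetical.

```python
# Sizes (bp) of the 13 protein-coding genes, copied from Table 1.
PCG_SIZES = {
    "cox1": 1524, "atp6": 672, "nad6": 474, "cox3": 750, "nad4": 1314,
    "nad2": 972, "cox2": 678, "nad5": 1676, "atp8": 201, "nad3": 387,
    "nad4L": 273, "nad1": 907, "cytb": 1092,
}


def codon_summary(size_bp):
    """Return (number of amino acids, stop codon status) for a gene size.

    Assumes the size includes the stop codon. A remainder of 1 or 2 after
    dividing by 3 is read as a partial stop codon (T or TA).
    """
    full_codons, remainder = divmod(size_bp, 3)
    if remainder == 0:
        # The last full codon is the stop codon, so it is not counted as aa.
        return full_codons - 1, "complete"
    return full_codons, {1: "T", 2: "TA"}[remainder]


if __name__ == "__main__":
    for gene, size in PCG_SIZES.items():
        n_aa, stop = codon_summary(size)
        print(f"{gene}\t{size}\t{n_aa}\t{stop}")
```

Under this assumption, only nad5 and nad1 yield a partial stop codon, and the amino acid counts match the "Number of aa" column of Table 1.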
Comparative analyses between F. suturalis and F. quadripustulatus
The entire mt genome of F. suturalis is 537 bp longer than that of F. quadripustulatus [11]. A comparison of the nucleotide and amino acid sequences of each protein-coding gene of the two Falcolipeurus species is given in Table 3. The nucleotide sequence difference across the entire mt genome was 31.4%. Nucleotide sequence variation in individual protein-coding genes between F. suturalis and F. quadripustulatus ranged from 13.2% to 27.5%. The greatest variation was observed in the atp8 gene (27.5%), whereas the least (13.2%) was found in the cytb gene (Table 3). For the rrnS and rrnL genes, the sequence differences between F. suturalis and F. quadripustulatus were 28.4% and 14.6%, respectively (Table 3). Amino acid sequences inferred from individual mt protein-coding genes of F. suturalis were compared with those of F. quadripustulatus. The amino acid sequence differences ranged from 4.5% to 41.2%, with cox1 being the most conserved protein and atp8 the least conserved (Table 3). This level of amino acid difference is high, but comparable levels have been detected in other lice. For example, the difference in amino acid sequences of the 13 protein-coding genes was 5.5–50% between C. picui and C. cruziana [16], and 0–37.3% between C. bidentatus compar and C. compar [11, 14]. Taken together, the molecular evidence presented here supports the conclusion that F. suturalis and F. quadripustulatus represent distinct louse species.
Table 3
Nucleotide (nt) and/or predicted amino acid (aa) sequence differences in mitochondrial genes between Falcolipeurus quadripustulatus (FQ) and Falcolipeurus suturalis (FS) upon pairwise comparison.
Gene/region | Nt length (FS) | Nt length (FQ) | Nt difference (%) (FS/FQ) | Number of aa (FS) | Number of aa (FQ) | aa difference (%) (FS/FQ) |
cox1 | 1524 | 1554 | 15.3 | 507 | 517 | 4.5 |
atp6 | 672 | 675 | 18.5 | 223 | 224 | 16.1 |
rrnS | 726 | 610 | 28.4 | | | |
rrnL | 1075 | 1084 | 14.6 | | | |
nad6 | 474 | 478 | 22 | 157 | 159 | 25.8 |
cox3 | 750 | 789 | 21.2 | 249 | 265 | 16.2 |
nad4 | 1314 | 1305 | 24 | 437 | 434 | 27.2 |
nad2 | 972 | 972 | 27 | 323 | 323 | 32.5 |
cox2 | 678 | 675 | 14.4 | 225 | 224 | 7.5 |
nad5 | 1676 | 1711 | 19.6 | 558 | 570 | 21 |
atp8 | 201 | 204 | 27.5 | 66 | 67 | 41.2 |
nad3 | 387 | 354 | 27.4 | 128 | 117 | 34.4 |
nad4L | 273 | 288 | 20.8 | 90 | 95 | 21.1 |
nad1 | 907 | 848 | 20 | 302 | 282 | 12.6 |
cytb | 1092 | 1092 | 13.2 | 363 | 363 | 9 |
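Pairwise differences such as those in Table 3 are typically expressed as the percentage of mismatched positions between two aligned sequences. The text does not describe how the values were computed, so the following is only a sketch under stated assumptions: the two sequences are already aligned to the same length, and columns with a gap in either sequence are excluded. The function name and the toy sequences are hypothetical.

```python
def percent_difference(aligned_a, aligned_b, gap="-"):
    """Percentage of mismatched columns between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other gap treatments give different values).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real louse sequences.
    print(percent_difference("ATGCATGC", "ATGAATGT"))
    print(percent_difference("ATG-CA", "ATGTCC"))
```

The same function works for nucleotide and amino acid alignments, since it only compares characters column by column.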
Gene rearrangement
The mt gene arrangement of the two Falcolipeurus species differs substantially from those of other bird louse species within the family Philopteridae and from the inferred ancestral gene arrangement of insect mt genomes (Fig. 4). Only two gene blocks are shared between B. macrocnemis and the ancestral insect pattern, G-nad3 and atp8-atp6 [13], and one gene block is shared between Campanulotes species and the ancestral insect pattern, atp8-atp6 [11, 14, 34]. However, no derived mt gene arrangements are shared between the two Falcolipeurus species. In addition, three gene blocks, V-cox3, Y-cox2 and L1-nad4L, are shared by Falcolipeurus and Ibidoecus [11]. Such a lack of conserved gene arrangement in the mt genomes of bird lice precludes the accurate reconstruction and identification of rearrangement events and models [13].
Gene arrangement in the mt genome is usually highly conserved within a genus of ectoparasites [11, 14, 35]. Gene rearrangement events between F. suturalis and F. quadripustulatus were also analyzed (Fig. 5); at least one translocation could be inferred. The nad3 gene is located between the cox2 and tRNA-Thr genes in F. quadripustulatus, but between tRNA-Gly and tRNA-LeuCUN in F. suturalis (Fig. 5). The gene arrangements in the mt genomes of the two Falcolipeurus species indicate that the rate of change in the arrangement of mt genes may vary substantially among closely related groups of lice [36].
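The gene blocks and the position of nad3 described above can be checked against the gene order in Table 1. The sketch below encodes the F. suturalis order (tRNA genes by one-letter code, non-coding regions omitted) and treats the genome as circular; the helper names are illustrative and not part of the original analysis.

```python
# Gene order of F. suturalis from Table 1; non-coding regions are omitted.
FS_ORDER = [
    "cox1", "M", "Q", "E", "atp6", "N", "rrnS", "rrnL", "A", "nad6",
    "V", "cox3", "K", "nad4", "L2", "P", "nad2", "T", "Y", "cox2",
    "nad5", "F", "C", "atp8", "S2", "G", "nad3", "L1", "nad4L", "nad1",
    "S1", "cytb", "W", "H", "D", "R", "I",
]


def has_block(order, block):
    """True if `block` occurs as a contiguous run in the circular `order`."""
    n, k = len(order), len(block)
    doubled = order + order[: k - 1]  # allow blocks that wrap around
    return any(doubled[i:i + k] == block for i in range(n))


def neighbours(order, gene):
    """Return the genes immediately before and after `gene` (circular)."""
    i = order.index(gene)
    return order[i - 1], order[(i + 1) % len(order)]


if __name__ == "__main__":
    for block in (["V", "cox3"], ["Y", "cox2"], ["L1", "nad4L"]):
        print("-".join(block), has_block(FS_ORDER, block))
    print(neighbours(FS_ORDER, "nad3"))
```

This check ignores strand and intervening non-coding regions, so it only confirms gene adjacency, not the full rearrangement history.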
One tRNA gene (tRNA-Gly) was lacking and three genes (tRNA-Thr, tRNA-Tyr and cox2) were duplicated in the F. quadripustulatus mt genome [11]. In contrast, all 37 genes were identified in the F. suturalis mt genome. Gene duplication has also been reported in the mt genomes of several insect species, such as Brontostoma colossus [37], Phalantus geniculatus [38] and Reduvius tenebrosus [39]. In addition, tRNA loss has been found in the mt genomes of several families of the class Insecta [11, 40], and can be explained by the tandem duplication-random loss (TDRL) model.
Phylogenetic relationships
Phylogenetic analysis showed clear genetic distinctiveness between F. suturalis and F. quadripustulatus (Bayesian posterior probability = 1.0). The branch leading to the two Falcolipeurus species is much longer than that leading to the two Columbina species (see the branch lengths in Fig. 6). The genus Falcolipeurus is more closely related to the genus Ibidoecus than to other genera (Bayesian posterior probability = 1.0) (Fig. 6), which is consistent with a previous study [11].
Mt genome sequences are effective molecular markers for studying phylogenetic and systematic relationships at various taxonomic ranks across the phylum Arthropoda, including ectoparasites [41–46]. DNA sequencing provides the opportunity to further evaluate the phylogenetic relationships of the Philopteridae. For example, Cruickshank et al. analyzed nuclear elongation factor-1 alpha (EF1α) sequences of 127 species from the four suborders and showed the Philopteridae to be paraphyletic [41]. Yoshizawa and Johnson analyzed mt 12S and 16S rDNA sequences of 18 species and also showed the Philopteridae to be paraphyletic [42]. However, Johnson et al. analyzed 1107 single-copy orthologous genes of 46 species and showed the Philopteridae to be monophyletic [43], and de Moya et al. analyzed 2,370 orthologous genes and likewise showed the Philopteridae to be monophyletic [44]. To date, the phylogenetic position of the Philopteridae in deep-level relationships within the order Phthiraptera has not been confidently determined. Although mt genomic data have proven to be useful genetic markers for exploring the phylogenetic relationships among several major lineages of parasitic lice [7, 9, 11], many lineages of the family Philopteridae are underrepresented or not represented among the available mt genome sequences. Therefore, complete mt genomes of bird louse species from lineages of this family that have not yet been sequenced should be included in future analyses to resolve the phylogenetic position of the family Philopteridae within the order Phthiraptera.