Identification and Evolution of Pseudoscorpions
In all, 94 pseudoscorpion specimens were collected from an area of Montpellier (France). 60 specimens were morphologically identified as Atemnus politus (Simon, 1878), 24 specimens belong to the species Geogarypus minor (Koch, 1873) and 10 specimens belong to the species Chthonius ischnocheles (Hermann, 1804). The specimen identification is consistent with the analysis of the mitochondrial cytochrome oxidase I gene marker (COI). The COI sequences of the specimens belonging to Geogarypus minor are identical to each other and to the sequence of Geogarypus nigrimanus (JN018180 specimen voucher MNHN-JAB62). The species Geogarypus nigrimanus has been recently synonymized with Geogarypus minor [37]. Regarding the produced COI sequences of the specimens belonging to Chthonius ischnocheles species, the specimens IV3-1, Q3-1, IV1-1 and Q4 are between 99.8% -100% identical and match the sequence of Chthonius ischnocheles available at the NCBI database (JN018172 specimen voucher MNHN-JAB62). The existence of cryptic species of Chthonius ischnocheles has been documented [38] and the specimens IV-J5 and IV-S1 form a paraphyletic group with the other Chthonius ischnocheles (Figure 1).
The COI sequences of the specimens identified as Atemnus politus present more intraspecific variability: calculated pairwise similarities range between 97.7-99.8%. The blastn analysis [15] does not allow the identification of strongly similar sequences in databases, indeed no COI sequences from species related to the Atemnus genus have been deposited in the NCBI database. The most similar COI sequence in the database belongs to Atemnidae sp. JA-2011 voucher (JN018203), showing only 79.74% similarity. The COI-based phylogeny indicates that all the specimens of Atemnus politus form a clade which is a sister group of the clade composed of Paratemnoides sumatranus and Oratemnus curtus species, both representatives of the Atemnidae family (Figure 1). The COI analysis shows all specimens of Chthonius ischnocheles form a sister group of other species belonging to Chthonius species (C. tetrachelatus, C. dacnodes) (Figure 1). To date, the only species of pseudoscorpion described as a Wolbachia host, Cordylochernes scorpioides, belongs to the Chernetinae and is a sister group to the Atemnidae family (including Atemnus politus) (Figure 1).
Prevalence of Wolbachia infection
Wolbachia was detected by PCR amplification of wsp and gatB markers (see Methods) in the three studied pseudoscorpion species and in accordance with the generally used nomenclature of Wolbachia strains, we have named these new strains according to their hosts: wApol for bacteria infecting Atemnus politus, wGmin for bacteria infecting Geogarypus minor and wCisc for bacteria infecting Chthonius ischnocheles.
The A. politus specimens have the highest prevalence of Wolbachia infection with 43.3% positive samples (26 positive of 60 specimens examined). 16.7% of G. minor individuals appear to be infected (4 positive of 24 specimens examined), while only 10% of the C. ischnocheles (1 positive of 10 specimens examined) show PCR amplification of the Wolbachia markers (Table S1).
The wApol draft genome and annotation
Two draft genomes were produced for wApol Wolbachia from Atemnus politus (specimens K5 and K3) using target capture enrichment [35, 36]. For Illumina sequencing, an enrichment of 45-fold was observed (0.4% of the reads mapped to the draft genome without enrichment against 18% with) and 50-fold enrichment was observed with PacBio sequencing (0.8% of the reads mapped to the draft genome without enrichment as opposed to 40.9% with). Unfortunately, the amounts of DNA available for the two specimens were low, and thus the samples were processed simultaneously using individual barcoded adaptors (see Methods) and for PacBio sequencing, relatively few total reads were produced (Table 1). The hydrid de novo assembly allowed production of two draft genomes. For specimen K5, a 373 contig draft genome of 1,445,964 bp was obtained, with an average G+C content of 35.6% (largest contig, 25,286 bp, N50 = 5,741 bp, average sequencing depth of 420X). For specimen K3, a 200 contig draft genome of 1,404,177 bp was obtained, with an average G+C content of 35.49% (largest contig, 40,755bp, N50 =10,346 bp, average sequencing depth= 205X) (Table 1).
Among 221 single-copy orthologue genes conserved among proteobacteria (BUSCOs database), 155 are present in the wApol K5 draft genome and 151 in wApol K3, suggesting 70.2% and 69% complete BUSCOs, respectively. This percentage can be used to assess the level of completeness of genomes (see Methods). By comparison, the Wolbachia from Drosophila melanogaster, wMel, has a higher level of BUSCOs in the current analysis with 180 genes (81.4%) and Wolbachia from Pratylenchus penetrans, wPpe, has a low level with 161 genes (72.9%) (Table S2). The low level of BUSCOs observed for wApol suggest that either the wApol draft is incomplete or this genome is highly degraded as has been described for Wolbachia from Onchocerca ochengi (74.7% complete BUSCOs) [39].
The Average Nucleotide Identity (ANI) calculation (see Methods) indicates that the two wApol drafts have 99% identity and they are most similar to wCle with 87% identity (Figure 2). According to the analysis of available complete genomes, strains which are representative of the same supergroup share more than 94% identity; for example, 99% for wOo and wOv (from supergroup C), 94% for wPip and wTpre (from supergroup B) or 96% for wMel and wCau (from supergroup A).
1746 coding sequences (CDS) were annotated using RAST from wApol K5 and 2,215 CDS from wApol K3. The two draft genomes contain a low number of transposable elements (55 mobile elements, 16 group II intron-associated genes, 52 ISs for K5; 35 mobile elements, 9 group II intron-associated genes, 36 ISs for K3) compared to other Wolbachia strains infecting insects (with the exception of wTpre), but a high number compared to Wolbachia infecting filarial nematodes (Table S3). Numerous phage-like genes were observed in the wApol K5 draft genome, with 54 CDS (coding sequences) detected on 37 different contigs with 9 annotated as prophage sequences (Table S4). Fewer phage-like genes were detected in the wApol K3 draft genome, with 20 CDS detected on 12 different contigs and only one annoted as a prophage sequence (Table S4). The number of phage-like genes is highly variable among the studied Wolbachia genomes and the number observed in wApol is close to wMel (with 44 CDS) or wCle (with 20 CDS). However, the variability in transposable element and phage numbers observed between the two wApol draft genomes suggests that the genome sequences are incomplete.
Although the wApol drafts are likely not complete, functional annotation of genes by KAAS highlights that the biotin pathway is present, while it is absent in most other studied Wolbachia (Table S5). Indeed, only wCle (supergroup F), wNfla, wNleu (supergroup A), wLug, wStri and wVulC (supergroup B) have a complete biotin pathway. Only bioC is not detected in the wApol K5 draft, but the operon is detected on two contigs: the genes bioA and bioD on contig #258 and the genes bioH, bioF and bioB on contig #172 (Figure 3A). In all studied Wolbachia which contain the operon, the bioC gene is present between the bioD and bioH genes, so the absence could be due to the fragmentation of the wApol K5 draft sequence (Figure 3A). For the wApol K3 draft genome, only one contig (#148) containing bioH, bioF and bioB is detected. The organization of the biotin operon appears to be conserved between all Wolbachia genomes which contain the complete pathway (Figure 3A). The phylogenies of the biotin genes (containing bioA, bioB, bioH, bioD and bioF) shows that the biotin operon of wApol K5 is more closely related to the operon of wCle (Figure 3B).
Multi-locus phylogenies and phylogenomics
The analysis of Wolbachia in Geogarypus minor, wGmin, was more complex. Two genes were sequenced for wGmin, coxA and ftsZ, and only nested PCR provided clean sequences to enable phylogenetic comparisons. The Wolbachia phylogeny, based on the two genes, identified two different sub-groups of Wolbachia in our samples (Figure 4A). The Wolbachia infecting Chthonius ischnocheles, wCisc, and Geogarypus minor, wGmin, group together and appear closely related to supergoup H Wolbachia infecting Zootermopsis termite species (wZneg and wZang). The Wolbachia from Atemnus politus, wApol, and the Wolbachia from Cordylochernes scorpioides, wCsco, form a clade which is divergent from known supergroups. This clade is a sister group to supergroup C Wolbachia, which contains the symbiont present in filarial nematodes (Onchocerca spp. and Dirofilaria immitis). The phylogeny based on six genes (16S rDNA, ftsZ, dnaA, gatB, fbpA and coxA) confirms the topology of wApol and wCisc (wGmin and wCsco not included) (Figure 4B). Our analysis shows the same topology for wApol sequences from the two draft genomes (K5 and K3) as well as sequences of wApol K5 amplified by PCR, confirming that the drafts did not introduce any bias. In addition, to verify the accuracy of the Wolbachia presence in Geogarypus minor, the coxA gene was sequenced from another specimen (III-J5) derived from an independent DNA extraction, and PCR amplification and the sequences were identical to another specimen III-T3 (accession number MT273088).
The Wolbachia from Cordylochernes scorpioides had been previously studied using a set of genes not routinely sequenced for Wolbachia evolutionary studies (groEL, fabK, nuoG, NADH dehydrogenase I subunit F, aspS, gltA, coxA, ftsZ, wsp, orpB, nuoD, isocitrate dehydrogenase gene, TPR domain-containing protein gene) [40]. We performed a multi-locus phylogeny based on these 13 genes, including wCsco and sequences isolated from available complete or draft genomes of Wolbachia, as well as those from the wApol produced draft genomes. The observed phylogeny shows that wCsco and wApol cluster together to form a separate clade, a closely related sister group to supergroup C, as observed with the phylogeny based on only two markers (Figure 4A). The two pseudoscorpions contain closely related Wolbachia strains, with 89.3% identity among these 13 genetic markers (Figure S1).
Two complete phylogenomic analyses were performed: one including 17 Wolbachia genomes representing 6 different supergroups and one including 38 Wolbachia genomes representing 7 different supergroups and including additional draft genome sequences. 320 single-copy orthologues were identified among the 17 Wolbachia genomes, while 166 were identified among the 38 Wolbachia genomes. The Maximum Likelihood phylogeny based on the reduced matrix (17 genomes, 320 orthologues, 89,847 amino-acid sites) (Figure 5A) and the one based on the larger matrix (38 Wolbachia, 167 orthologues, 40,488 amino-acid sites) (Figure 5B) indicates the wApol group does not cluster with any other Wolbachia group, suggesting it evolved from an independent separate speciation event. Based upon the phylogenomics analyses, wApol is closely related to supergroup C (symbionts within the filarial nematodes Onchocerca spp. and Dirofilaria immitis) and supergroup F (including both filarial and arthropods symbionts).