Source of Material, DNA extraction and Characterization
Specimens were caught in the field using classical soil microfauna recovery methods using a Berlese–Tullgren funnel [64, 65]. The samples were collected at Montpellier in 2017 (43°37'52.0"N 3°52'04.6"E).
DNA samples were extracted using the Monarch® Genomic DNA Purification Kit following the recommended protocol for extraction from tissues (New England Biolabs, USA) with overnight incubation at 56°C with proteinase K. The quality of the extraction was verified by a PCR targeting the host COI gene (Additional file 17, Table S7). A total of 20 sequences were deposited in the GenBank Data Library: MN923050-MN923069 (Additional file 18, Table S8).
Detection and molecular characterization of Wolbachia symbionts
The determination of Wolbachia infection in populations was determined by a series of individual specific PCRs (Additional file 17, Table S7). The presence of Wolbachia was initially screened using PCR amplification of the Wolbachia surface protein (Wsp) gene. This pre-screening was further performed by PCR amplification of the cell division protein FtsZ gene on 10% of randomly selected specimens. If the results of the two previous markers were not consistent, an amplification of a third marker, the gatB gene, was performed.
The molecular characterization of Wolbachia was determined by PCR amplification of six genes (16S rDNA gene, dnaA, coxA, fbpA, gatB and ftsZ) as described in Lefoulon et al. [14] (Additional file 17, Table S7). Nested PCR amplification was necessary to obtain enough PCR products to be directly sequenced. A total of 14 sequences were deposited in the GenBank Data Library: MN931248 to MN931249 and MN931689 to MN931700 (Additional file 19, Table S7).
Next-Generation sequencing of Atemnus politus.
Atemnus politus specimens were selected for genomic analysis. Four different libraries preparations were processed: one Illumina and one PacBio library with the Wolbachia DNA enrichment protocol, one Illumina and one Pacbio library without enrichment. The enrichment method has been described by Lefoulon et al. [35] and it is based on the use of biotinylated probes to capture Wolbachia DNA (probes designed by Roche NimbleGen).
For PacBio sequencing, the Large Enriched Fragment Targeted Sequencing (LEFT-SEQ) method as previously described, was utilized [35] without the DNA fragmentation step. The PacBio library without enrichment was produced using the SMRTbell® Express Template Prep Kit 2.0 for Low DNA input, using the barcoded Overhang Adapter (Kit- 8B, PacBio). The enriched PacBio library was sequenced using 2 SMRT cells with the PacBio RS II system and the library without enrichment used 1 SMRT cell, all on the PacBio Sequel I system.
For Illumina sequencing, an adaptation of the described protocol was performed, eliminating the last steps from PacBio library protocol (end repair, ligation of PacBio adaptor and purification). DNA was fragmented using NEBNext® Ultra™ II FS DNA (NEB, USA) at 20˚C for 30 minutes, resulting in DNA fragments with an average size of 350 bases pairs. We used 100ng of DNA per sample for each capture and used SeqCap barcoded adaptors (Nimblegen, Roche) to process simultaneous multiple samples. The Illumina library without enrichment was produced using the NEBNext® Ultra™ II FS DNA Library Prep Kit following the manufacturer’s recommendations (NEB, USA). The enriched Illumina library was sequenced on three independent Ilumina MiSeq runs: one mate-pair 150bp read and two mate-pair 300bp reads. The unenriched Illumina library was sequenced with one, single-end, 150bp NextSeq run. All sequencing was performed at New England Biolabs.
De novo assembly pipeline
Illumina reads were filtered by quality and "cleaned" using the wrapper Trim Galore! (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), and then merged with PEAR (Zhang, Kobert et al. 2014). PacBio circular consensus sequences (CCS) were generated using SMRT® pipe RS_ReadsOfInsert Protocol (PacBio) with a minimum 3 full passes and minimum predicted accuracy superior at 90. The adapters were removed by trimming off the first and last 65 bp of the reads and reads smaller than 20 bp or reads containing residual adaptor sequences (potential chimeric reads) were detected and removed using seqkt (github.com/lh3/seqtk) (analyses were performed with an in-house shell script).
A first hybrid de novo assembly was done using Unicycler [66]. The potential contigs belonging to Wolbachia were detected by nucleotide similarity using BLASTn [67] and selected. The Illumina reads were mapped against this contig selection using Bowtie2 (with the very sensitive settings) [68] and the PacBio reads using ngmlr (with the PacBio preset settings) [69].
A second hybrid de novo assembly was performed with this new selection of reads. Selection of Wolbachia contigs was performed using BLASTn a second time. Assembly statistics were calculated using QUAST [70].
In addition, potential contigs belonging to mitochondria of the host were isolated from the first de novo assembly using BLASTn against mitochondrial genomes from pseudoscorpion available in the database. Selection of mapped reads were prepared as described above. An assembly of the host mitochondrial genome was performed using Unicycler. The produced mitochondrial sequences was annotated using MITOS (Bernt, Donath et al. 2013).
Comparative genomic analyses and annotation of wApol
The comparative genomic analyses described below included analyses of 10 available complete genomes of Wolbachia and 4 draft genomes: wMel, Wolbachia from Drosophila Melanogaster (NC_002978), wCau, Wolbachia from Carposina sasakii (CP041215) and wNfla, Wolbachia from Nomada flava (LYUW00000000) for supergroup A; wPip, Wolbachia from Culex quinquefasciatus (NC_010981), wTpre, Wolbachia from Trichogramma pretiosum (NZ_CM003641), wLug, Wolbachia from Nilaparvata lugens and wStri (MUIY01000000), Wolbachia from Laodelphax striatella (LRUH01000000) for supergroup B; wVulC, Wolbachia from Armadillidium vulgare (ALWU00000000) closely related to the supergroup B; wPpe, Wolbachia from Pratylenchus penetrans for supergroup L (NZ_MJMG01000000); wCle, Wolbachia from Cimex lectularius for supergroup F (NZ_AP013028); wFol, Wolbachia from Folsomia candida for supergroup E (NZ_CP015510); wBm, Wolbachia from Brugia malayi (NC_006833) for supergroup D; wOv Wolbachia from Onchocerca volvulus (NZ_HG810405) and wOo, Wolbachia from Onchocerca ochengi (NC_018267) from supergroup C.
The Average Nucleotide Identity (ANI) between the wApol draft genome and available complete genomes of Wolbachia was performed using the ANI Calculator [71]. In addition, an in-silico genome-to-genome comparison was done to calculate a digital DNA-DNA hybridization (DDH) using GGDC [39]. The calculation of dDDH allows analysis of species delineation as alternative to the wet-lab DDH used for current taxonomic techniques. GGDC use a Genome Blast Distance Phylogeny approach (GBDP) to calculated the probability that an intergenomic distance yielded a DDH larger than 70% representing a novel species-delimitation threshold [39]. We used formula 2 to calculate the dDDH because it is more robust using incomplete draft genomes [72]. The completeness of the draft genome was studied using BUSCO v3 [73]. BUSCO estimates the completeness of genomes analyzing gene content and comparing to selection of near-universal single-copy orthologue genes (here, 221 genes in common among proteobacteria (proteobacteria_odb9)).
The processed drafts were analyzed using RAST pipeline [74] and annotated using Prokka [75]. Transposable elements were identified: insertion sequences (ISs) using ISSAGA [76] and group II introns using RAST pipeline. KEGG Orthology (KO) assignment were generated using KASS (KEGG Automatic Annotation Server [54]. KASS assigned orthologue genes by BLAST comparison against KEGG genes database using BBH (bi-directional best hit) method. The same assignment analysis was performed for the wApol draft genome, the set of 14 complete or draft genomes and 1 supplementary draft genomes: wNleu, Wolbachia from Nomada eucophthalma (LYUV00000000). The assigned KOs were ordered in 165 different KEGG pathways (Additional file 20, Table S10). 8 pathways were selected which showed differences between wApol annotation with other Wolbachia reference and for them, a list of assigned genes involved were identified to study potential losses/acquisition of genes within Wolbachia diversity (Additional file 21,Table S11). For some genes of interest, such as the biotin operon, the amino-acid sequences were selected with the KASS assignments and protein orthologues in other Wolbachia or bacteria were identified using nblast and then aligned.
The contigs of wApol draft genome containing WO bacteriophage region were mapped across the complete genome of phage WOVitA1 (HQ906662). Homologs genes from other Wolbachia were then selected using nblast on available data on NCBI. Regarding some genes of interest, homologues of cifA, cifB and wmk were identified in wApol using nblast and then mapped with selection of homologues genes from other Wolbachia.
Multi-locus phylogeny and Phylogenomics
Two multi-locus phylogenies were performed. The first phylogeny was based on six markers (16S rDNA, dnaA, ftsZ, coxA, fbpA and gatB), classically used for Wolbachia phylogeny [14, 77]. The produced sequences were analyzed with available sequences extracted from 49 Wolbachia complete or draft genomes and the addition of sequences from Wolbachia from Zootermopsis angusticollis and Zootermopsis nevadensis (Additional file 19, Table S9). A second phylogeny was based on thirteen markers (groEL, fabK, nuoG, NADH dehydrogenase I subunit F, aspS, gltA, coxA, ftsZ, wsp, orpB, nuoD, isocitrate dehydrogenase gene, TPR domain-containing protein gene) which includes less Wolbachia strains (because these markers were rarely used for phylogenetic analyses) but included the only Wolbachia known to infect pseudoscorpion, Wolbachia from the pseudoscorpion Cordylochernes scorpioides [33, 38]. This analysis included 14 complete genomes in addition to the wApol draft genome and wCsco Wolbachia from Cordylochernes scorpioides (Additional file 19, Table S9)
For phylogenomic analyses, single-copy orthologue genes between a selection of Wolbachia genomes were identified using Orthofinder [78]. Two types of phylogenomics studies were performed: one included only 15 complete genomes and one included 31 complete or draft genomes (Additional file 19, Table S9). Variations in completeness of draft genomes can be variable and have a negative effect on the robustness of the analyses, and thus the two separate analyses were performed.
The orthologue sequence alignments were generated by MAFFT [79]. For the multi-locus phylogenies, a supermatrix of these six alignments was generated using Seaview [80], and for phylogenomics, the supermatrix was produced by Orthofinder (implemented as functionality). For the later, the poorly aligned positions of the produced orthologue genes alignments were eliminated using Gblocks [81]. The phylogenetic analyses were performed with Maximum Likelihood inference using IQ-TREE [82]. The most appropriate model of evolution was evaluated by Modelfinder [83] (implemented as functionality of IQ-TREE). The robustness of each node was evaluated by a bootstrap test (1,000 replicates). The phylogenetic trees were edited by FigTree (https://github.com/rambaut/figtree/) and Inkscape (https://inkscape.org/).