Complete de novo assembly of Wolbachia endosymbiont of Drosophila willistoni using long-read genome sequencing

Wolbachia is an obligate intracellular α-proteobacterium which commonly infects arthropods and filarial nematodes. Different strains of Wolbachia are capable of a wide range of regulatory manipulations in many hosts and modulate host cellular differentiation to influence host reproduction. The genetic basis for the majority of these phenotypes is unknown. The wWil strain from the neotropical fruit fly, Drosophila willistoni, exhibits a remarkably high affinity for host germline-derived cells relative to the soma. This trait could be leveraged for understanding how Wolbachia influences the host germline and for controlling host populations in the field. To further the use of this strain in biological and biomedical research, we sequenced the genome of the wWil strain isolated from host cell culture cells. Here, we present the first high quality nanopore assembly of wWil, the Wolbachia endosymbiont of D. willistoni. Our assembly resulted in a circular genome of 1.27 Mb with a BUSCO completeness score of 99.7%. Consistent with other insect-associated Wolbachia strains, comparative genomic analysis revealed that wWil has a highly mosaic genome relative to the closely related wMel strain from Drosophila melanogaster.


Introduction
Wolbachia is a Gram-negative α-proteobacterium found as an endosymbiont in many arthropods and nematodes, with a diverse range of effects on host phenotypes 1,2. Wolbachia are maternally transmitted from host oocytes to the developing embryo 1. Wolbachia strains manipulate host reproduction to promote their transmission to the next generation of hosts 2. Consequently, Wolbachia strains have strong affinities for host germline tissues 3. Intriguingly, the Wolbachia strain from the neotropical fruit fly Drosophila willistoni, wWil, selectively infects the host germline 4,5. This unique tropism could be informative for understanding how Wolbachia localizes to and regulates the host germline, with implications for vectorizing Wolbachia infections for biological control mechanisms.
The affinity of wWil for host germline cells is unique in comparison to closely related Wolbachia strains. Phylogenetic comparisons based on amplification of the wsp and ftsZ genes by PCR indicate that wWil is closely related to the wAu strain found in Drosophila simulans 4. However, unlike wAu, which infects both germline and somatic tissues in D. simulans, wWil is exclusively found in the primordial germline cells of D. willistoni embryos 4. Additionally, wWil exhibits 100% maternal transmission in laboratory lines, which is attributed to wWil's tropism towards pole cells and selective infection of only the germline 4. Understanding the mechanism underlying the germline-specific tropism of wWil could inform how other strains of Wolbachia localize to host tissue types.
This distinctive example of host cell specificity is crucial for understanding Wolbachia's ability to colonize new hosts, with significant implications for biological pest control strategies. In D. simulans infected with non-native Wolbachia strains, the host genetic background has been shown to regulate the tissue tropism of the infection 5. In native infections, D. melanogaster hosts wMel Wolbachia infections in a broad range of cell types, including both somatic and germline tissues, whereas in D. willistoni, wWil demonstrates a restrictive infection pattern, targeting germline-derived cells 4,5.
Despite the availability of numerous Wolbachia genomes, a complete wWil genome is particularly important due to its unique germline-specific tropism. Here we present the first high-quality de novo assembly of wWil, obtained from nanopore sequencing of wWil-infected D. melanogaster cell cultures. In providing this genome, we seek to identify the genetic differences between the wWil and wMel genomes and to determine whether those differences can provide insights into the mechanisms underlying wWil's germline-specific distribution.

Results and discussion

wWil genome assembly
We collected Wolbachia wWil from wWil-infected Drosophila willistoni embryos 6 and introduced wWil to immortalized Drosophila melanogaster JW18 cell culture cells with the shell vial technique 7. We allowed the infection to stabilize by maintaining the culture for several weeks at 23°C, then collected the wWil-infected cells from confluent cultures 7. For each sample, 1.2 mL of cells (at ~2 × 10^6 cells/mL) were pelleted by centrifugation at 16,000 × g for 10 minutes at 4°C. Following supernatant removal, DNA was extracted using the Wizard HMW DNA Extraction kit (Promega #A2920, Lot: 0000575812). Libraries were prepared with the Native Barcoding Kit V14 for Nanopore MinION R10 (Oxford Nanopore Technologies Cat #SQK-NBD114-24, Lot: NDP1424.10.0010) and sequenced on the Nanopore MinION Mk1B with a MinION R10 Version flow cell (FLO-MIN-114, Lot: 11003064). We used Oxford Nanopore's MinKNOW v23.07.8 software with live basecalling by Guppy v7.0.8 (Fast model, read splitting ON) and a minimum read length of 200 bp, and stopped sequencing after 36 hours. This resulted in 3.65 M reads with an estimated N50 of 1.11 kb and 2.6 Gb called with a minQ of 8.
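The read N50 reported above is a length-weighted summary of the read set. As a minimal sketch of how such a value can be computed from a list of read lengths (the function name and the toy lengths are illustrative assumptions, not the MinKNOW implementation, whose estimate may be derived differently):

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy read lengths in bp (hypothetical, not data from this study).
toy_lengths = [200, 450, 900, 1100, 1500, 3000]
toy_n50 = n50(toy_lengths)  # 1500: the 3000 bp and 1500 bp reads hold 4500 of 7150 bases
```

Because the N50 weights each read by its length, a small number of long reads can raise it well above the median read length.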
Prior to genome assembly, we preprocessed the raw nanopore reads to remove host-derived sequences.

Genome polishing and quality assessment
We generated Illumina short-read whole genome sequence (WGS) data from JW18 cell culture cells stably infected with wWil to polish the Nanopore assembly. Reads were aligned to the wWil assembly and the D. melanogaster reference 8 (dmel6) simultaneously using bwa mem with default settings. Optical duplicates were marked with sambamba 15. The reads aligning to dmel6 were discarded. The remaining reads were converted back to fastq format using samtools fastq, then re-aligned to the wWil genome using minimap2 v2.26 with the settings -ax sr --cs --eqx. Reads whose de tag (gap-compressed per-base sequence divergence) exceeded 0.04 were filtered out to remove mismapping and excess noise prior to polishing. Pilon (v1.24) was run on these filtered alignments using default settings, producing the final polished assembly.
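The divergence filter described above can be illustrated on plain-text SAM records. The following is a minimal sketch, assuming the alignments carry minimap2's `de:f:` optional tag; the function names and toy records are hypothetical, and dropping records that lack a `de` tag (for example unmapped reads) is an assumption of this sketch rather than a documented step of the original pipeline.

```python
MAX_DE = 0.04  # threshold stated in the text


def get_de(sam_line):
    """Return the value of the de:f: tag of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("de:f:"):
            return float(tag[len("de:f:"):])
    return None


def filter_sam(lines, max_de=MAX_DE):
    """Yield header lines and records whose de value does not exceed max_de."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        de = get_de(line)
        # Assumption: records without a de tag are dropped.
        if de is not None and de <= max_de:
            yield line


# Toy records (hypothetical read names and sequences).
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\twWil\t100\t60\t4=\t*\t0\t0\tACGT\tIIII\tNM:i:0\tde:f:0.0200",
    "r2\t0\twWil\t200\t60\t4=\t*\t0\t0\tACGT\tIIII\tNM:i:1\tde:f:0.0800",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
kept = list(filter_sam(toy_sam))  # header and r1 only
```

In practice the same logic would typically be applied to BAM files through a dedicated library; the text-based version here only shows the filtering rule.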
We assessed the quality of the polished assembly with BUSCO 16 and annotated the genome with a standard workflow. BUSCO scores were calculated with BUSCO v5.7.0 using the rickettsiales_odb10 database. Polishing improved the BUSCO score from 98.6% to 99.7%. Default parameters were used for all software unless otherwise specified. We annotated the wWil genome with Prokka 17 v1.1.1 (kingdom: bacteria) to identify coding sequences (CDS), tRNAs, rRNAs, and tmRNA. GC content and GC skew were calculated with Proksee 18 v1.1.2. We then aligned the wWil genome against the wMel reference genome (CP046925.1) with BLASTn using an expect value cutoff of 0.0001. We plotted these annotations with Proksee 18 v1.1.2 to visualize the annotated genome (Fig. 1).
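For readers unfamiliar with the GC content and GC skew tracks, the sketch below shows the standard definitions, GC content = (G + C) / (A + C + G + T) and GC skew = (G - C) / (G + C), computed over sliding windows. It is an illustration under assumed window and step sizes, not Proksee's implementation, and for simplicity the windows do not wrap around the circular chromosome.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    acgt = g + c + seq.count("A") + seq.count("T")
    return (g + c) / acgt if acgt else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_stats(seq, window, step):
    """Yield (start, gc_content, gc_skew) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


# Toy sequence (hypothetical); window and step sizes are arbitrary choices.
toy_seq = "GGGCAATTCCCGAATT"
toy_track = list(windowed_stats(toy_seq, window=8, step=4))
```

In bacterial genomes, the sign of the GC skew commonly switches near the origin and terminus of replication, which is why the track is often plotted alongside annotations on circular maps.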

Genome annotations and assessments
To place our wWil genome within the Wolbachia species phylogeny, we gathered a set of 27 circular, chromosome-level genome assemblies from many Wolbachia supergroups with broad host range 19, and used Ehrlichia chaffeensis as an outgroup. Genes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline 20, and groups of orthologous genes (orthogroups) were identified across species with OrthoFinder2 21. This produced a phylogeny based on single-copy orthologs, rooted on E. chaffeensis. Additionally, we utilized BUSCO 16 analysis to characterize gene presence-absence variation across orthogroups. Our wWil assembly had a high BUSCO completeness score of 98.6% before polishing, which was comparable to the other circular, chromosome-level Wolbachia genomes. We found that the wWil genome resides in Wolbachia supergroup A, alongside wMel and many other fly-infecting species (Fig. 2A). Despite being closely related, alignment of the wWil genome to the wMel CP046925.1 22 reference genome with Mauve 23 (snapshot 2015-02-25.1) revealed many breaks in synteny between the genomes (Fig. 3). In general, our analysis showed a supergroup-specific pattern of gene presence-absence variation (Fig. 2B).
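To make the notions of single-copy orthologs and presence-absence variation concrete, the sketch below derives both from a table of gene counts per orthogroup and genome. The data layout, function names, orthogroup identifiers, and counts are all hypothetical and chosen for illustration; they do not reproduce the OrthoFinder2 or BUSCO outputs used in the analysis.

```python
def presence_absence(counts, genomes):
    """Map each orthogroup to {genome: True/False} for gene presence."""
    return {
        og: {g: per_genome.get(g, 0) > 0 for g in genomes}
        for og, per_genome in counts.items()
    }


def single_copy_orthogroups(counts, genomes):
    """Return orthogroups with exactly one gene in every genome."""
    return [
        og
        for og, per_genome in counts.items()
        if all(per_genome.get(g, 0) == 1 for g in genomes)
    ]


# Hypothetical gene counts: {orthogroup: {genome: number of genes}}.
toy_genomes = ["wWil", "wMel", "outgroup"]
toy_counts = {
    "OG0001": {"wWil": 1, "wMel": 1, "outgroup": 1},  # single copy everywhere
    "OG0002": {"wWil": 2, "wMel": 1, "outgroup": 1},  # duplicated in one genome
    "OG0003": {"wWil": 1, "wMel": 1},                 # absent from the outgroup
}

toy_matrix = presence_absence(toy_counts, toy_genomes)
toy_single_copy = single_copy_orthogroups(toy_counts, toy_genomes)
```

Only orthogroups of the first kind would be usable for a single-copy-ortholog phylogeny, while the presence-absence matrix retains all orthogroups for comparisons across supergroups.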
We also performed a brief assessment of putative secreted and membrane-bound proteins that could play a role in the Wolbachia-host interaction. Proteins with a signal peptide were identified by SignalP 24, and proteins with a transmembrane domain were identified by TMHMM 25. Those with a signal peptide and a transmembrane domain were classified as membrane-bound proteins, while those with a signal peptide but without a transmembrane domain were classified as secreted proteins. We then characterized presence-absence variation of putative secreted and membrane proteins within groups of orthologous genes across species. Finally, we identified variable sites in all proteins by calculating the Shannon entropy metric 26,27, and compared the number of high-entropy sites in membrane and secreted proteins versus all proteins in general.
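The classification rule and the entropy-based definition of variable sites described above can be sketched as follows. Shannon entropy for an alignment column is H = -Σ p_i log2(p_i), where p_i is the frequency of residue i in that column. The entropy threshold for calling a site variable, the exclusion of gap characters, the "other" label, and all names and toy sequences are assumptions of this sketch; the text does not state the cutoff or gap handling that were actually used.

```python
import math
from collections import Counter


def classify(has_signal_peptide, has_tm_domain):
    """Apply the rule from the text to boolean predictor calls."""
    if has_signal_peptide and has_tm_domain:
        return "membrane-bound"
    if has_signal_peptide:
        return "secreted"
    return "other"  # assumed label for proteins without a signal peptide


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def count_variable_sites(alignment, threshold=1.0):
    """Count columns whose entropy exceeds threshold (assumed cutoff)."""
    return sum(
        1 for column in zip(*alignment) if column_entropy(column) > threshold
    )


# Toy protein alignment of equal-length sequences (hypothetical).
toy_alignment = ["ACDE", "ACDF", "ACGH", "ATGK"]
toy_variable = count_variable_sites(toy_alignment)  # only the last column exceeds 1 bit
```

A fully conserved column has entropy 0, and a column with four equally frequent residues has entropy 2 bits, so higher thresholds select only the most variable sites.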
Just as for all genes, there was a supergroup-specific pattern in presence-absence variation for both membrane-bound and secreted proteins across Wolbachia species (Fig. 4). Additionally, membrane and secreted protein groups had many variable sites compared to all proteins in general. The median number of variable sites in an orthogroup across all Wolbachia genes was one, while the medians for secreted and membrane proteins were 14 and 13.5 variable sites, respectively (Fig. 5). This analysis revealed proteins with many sites that vary across diverse Wolbachia species with a wide host range, and thus provides candidates for further interrogating Wolbachia-host interactions at the molecular level.

Figure 5 Variability of membrane proteins and secreted proteins compared to all proteins. Shown is a histogram of the distribution of orthogroups across the number of high-entropy (variable) sites in their protein sequence alignment. Orthogroup counts are plotted separately for all proteins (gray), secreted proteins (pink), and membrane proteins (blue), with the median number of variable sites represented by dashed lines of the respective colors. There were 1,003 orthogroups that did not contain any variable sites, which are not included in the plot.

Figure 1 Map

Figure 3 The

Figure 4 Presence