Characterization of the family-level Borreliaceae pan-genome and development of an episomal typing protocol

Abstract Background The Borreliaceae family includes many obligate parasitic bacterial species which are etiologically associated with a myriad of zoonotic borrelioses including Lyme disease and vector-borne relapsing fevers. Infections by the Borreliaceae are difficult to detect by both direct and indirect methods, often leading to delayed and missed diagnoses. Efforts to improve diagnoses center around the development of molecular diagnostics (MDx), but due to deep tissue sequestration of the causative spirochaetes and the lack of persistent bacteremias, even MDx assays suffer from a lack of sensitivity. Additionally, the highly extensive genomic heterogeneity among isolates, even within the same species, contributes to the lack of assay sensitivity as single target assays cannot provide universal coverage. This within-species heterogeneity is partly due to differences in replicon repertoires and genomic structures that have likely arisen to support the complex Borreliaceae lifecycle in which these parasites have to survive in multiple hosts each with unique immune responses. Results We constructed a Borreliaceae family-level pangenome and characterized the phylogenetic relationships among the constituent taxa which supports the recent taxonomy of splitting the family into at least two genera. Gene content profiles were created for the majority of the Borreliaceae replicons, providing for the first time their unambiguous molecular typing. Conclusion Our characterization of the Borreliaceae pan-genome supports the splitting of the former Borrelia genus into two genera and provides for the phylogenetic placement of several non-species designated isolates. Mining this family-level pangenome will enable precision diagnostics corresponding to gene content-driven clinical outcomes while also providing targets for interventions.


Background
The Borreliaceae family includes many obligate parasitic bacterial species which are etiologically associated with a myriad of zoonotic borrelioses including Lyme disease and vector-borne relapsing fevers.Infections by the Borreliaceae are di cult to detect by both direct and indirect methods, often leading to delayed and missed diagnoses.Efforts to improve diagnoses center around the development of molecular diagnostics (MDx), but due to deep tissue sequestration of the causative spirochaetes and the lack of persistent bacteremias, even MDx assays suffer from a lack of sensitivity.Additionally, the highly extensive genomic heterogeneity among isolates, even within the same species, contributes to the lack of assay sensitivity as single target assays cannot provide universal coverage.This within-species heterogeneity is partly due to differences in replicon repertoires and genomic structures that have likely arisen to support the complex Borreliaceae lifecycle in which these parasites have to survive in multiple hosts each with unique immune responses.

Results
We constructed a Borreliaceae family-level pangenome and characterized the phylogenetic relationships among the constituent taxa which supports the recent taxonomy of splitting the family into at least two genera.Gene content pro les were created for the majority of the Borreliaceae replicons, providing for the rst time their unambiguous molecular typing.

Conclusion
Our characterization of the Borreliaceae pan-genome supports the splitting of the former Borrelia genus into two genera and provides for the phylogenetic placement of several non-species designated isolates.Mining this family-level pangenome will enable precision diagnostics corresponding to gene contentdriven clinical outcomes while also providing targets for interventions.
Spirochetoses are characterized as chronic, invasive, debilitating diseases caused by parasitic bacteria within the taxonomic order spirochaetales.Among these diderm helical microbes, the most genomically complex in terms of replicon numbers and heterogeneity are those in the Borreliaceae family.Due to climate change and other anthropomorphic environmental disruptions, the Borreliaceae have greatly expanded beyond their traditional geographic epidemiological zones and have developed diverse symbiotic interactions with multiple alternative hosts (28).The true taxonomy of the Borrelia genus has been a subject of interest due to the genomic and pathogenic disparities among its member species.A recent taxonomic revision led to the proposed division of the Borrelia genus into two distinct genera, the Borreliella and Borrelia (29).In this revised classi cation, the relapsing fever agents retain the original Borrelia taxonomy due to their priority of discovery and the Lyme borreliosis agents became known as Borreliella spirochetes.This work did not include a novel subset of isolates known as the reptileassociated Borrelia, which to date have not been well characterized.
It is vital to note that this reclassi cation has not been without controversy (30)(31)(32).In the years since the original proposal, some authors have argued that data other than differences in average nucleotide identity (ANI) do not support the split (30).However, these latter authors were themselves criticized for not fully examining the multi-omic data that provided extensive quantitative data in support of the division (33,34), thus the controversy continues.
Early genomic studies on Borreliaceae spirochetes revealed that they possess a complex genome of a single, linear chromosome terminating in covalently bound hairpin ends, along with multiple single-copy episomes that exist as circular or hairpinned-linear molecules (35,36).While the chromosome contains conserved genes linked to essential functions and survival, the individual episomal elements vary widely in size, structure (linear and circular), and gene content, and facilitate Borreliella host adaptation and/or virulence.Historically, the smaller replicons were found to not be as conserved and were not always retained in long-time cultures.The smaller episomes that were retained or found biologically important were principally categorized by molecular size and structure with only minimal use of gene content.Despite these limitations, previous comparative genomic studies led to the establishment of a B. burgdorferi sensu lato pangenome (37).This study utilized 23 Borreliella genomes, most of which were B. burgdorferi, and were of varying assembly quality and plasmid content.This rst attempt to create a genus-level pangenome showed that additional isolates would be necessary to understand the degree of variation among Borreliella.However, until recently, there have been very few pan-genomic publications on these clinically important taxa (38,39).Moreover, there has not been a pangenome constructed for the Borrelia genus nor for the entire Borreliaceae family.It is essential to better understand these pathogens through the construction of a family-level pangenome using the latest error-correcting longread DNA sequencing methodologies to ensure capture and single-contig assembly of all genomic elements.

Study design
All available complete or near-complete high-quality Borreliaceae genomes, as of the commencement of this study (n = 69) with sequencing coverage greater than 30X and an average N50 of 900 Kb or above were downloaded from the NCBI prokaryotic genome database and subjected to additional QC analyses for completeness.These genomes were then combined with 39 newly sequenced Borreliaceae genomes chosen to ll in gaps within the taxonomic coverage of the family.All pan-genome analyses were conducted from this nal curated database of 108 strains (Table 1).

Paci c Biosciences Sequel I whole-genome sequencing and validation
A modi ed gel-plug DNA isolation and pulse eld gel electrophoresis (PFGE) analysis was performed on selected Borreliaceae isolates (Borreliella burgdorferi strain B31 and Borrelia hermsii strain HS1) to determine the number of replicons and to serve as a control for the extraction and sequencing of the multiple replicons that make up the Borreliaceae genomes (40).Using the same DNAs, we performed whole genome sequencing on a Paci c Biosciences (PacBio) Sequel I using single molecule errorcorrecting circular consensus sequencing.Quality assurance measures of the WGS's included assembling the linear chromosomes into a single contig ≥ 0.9 Mb and identifying each of the other assembled contigs (9 Kb to 200 Kb) as corresponding to one of the PFGE replicons.
We rst performed comparative analyses of our lab-sequenced genomes of the Borreliella burgdorferi B31 and Borrelia hermsii HS1 strains with their respective NCBI reference genomes to determine if all replicons of both Borreliaceae species representing both genera were present and fully sequenced (41,42).We con rmed that our whole genome sequencing was comparable to previous work by aligning a reference and our sequence of B. hermsii strain HS1 (Fig. 1a).Using a progressive Mauve alignment of the reference strain and our sequenced B. hermsii strain HS1d, multiple locally colinear blocks (LCB) were noted across the entire length of the genome.The largest LCB showed high homology between our sequence and the reference for the linear chromosome and the large linear plasmid.The linear chromosomes contained 100 single nucleotide polymorphisms (SNPs) and the large linear plasmids had 5 SNPs.The smaller contigs from both genomes were either partially aligned or had no clear matches, likely in part due to progressive Mauve being unable to track duplications and the shared gene content between some plasmids (43).The estimated total number of SNPs between the two HS1 genomes was ~ 1,000.These differences likely stem from small duplications near the telomeres of the linear plasmids since progressive Mauve is ill-suited to handle them.Additionally, novel plasmids within the newly sequenced B. hermsii strain HS1d may have arisen from recombination events to which Borreliaceae plasmids are prone (38,44), or alternatively due to plasmid loss during the culturing of the original HS1 isolate.A second alignment using D-Genies corroborated these ndings (Fig. 1b).
Finally, whole genome alignments of the reference B. hermsii strain HS1 and our sequenced B. hermsii strain HS1d con rmed their similarity (Fig. S1) but also demonstrated that they are not identical, particularly with respect to the episomal elements.Some differences in the episomal elements may be accounted for by genes encoding proteins undergoing antigenic variation or by the variable cassettes which drive genetic conversion.

Coverage of covalently bound hairpin ends of Borreliaceae episomes
In our alignments, several reference B. hermsii strain HS1 plasmids exhibited homology to the middle of our sequenced HS1d contig pairs (Fig. 1a).This could be an artifact of how these contigs were parsed and assembled.We identi ed short anking inverted repeats at the ends of many HS1d contigs that were not included in an LCB with the HS1 reference (Fig. 1 and Fig. 2).These inverted repeats stem from the hairpin present within the ends of all linear Borreliaceae replicons or because of artifacts of longread sequencing (35,38,45).
To con rm which regions of the B. hermsii strain HS1 and HS1d plasmids were inverted, we paired the contigs and aligned them using the D-Genies program (Fig. 1b).This analysis identi ed regions with inverted repeats present at the very ends of some HS1d contigs that were termini of linear replicons.
Further inspection of the B. hermsii strain HS1d contigs using minimap2 and subsequent plotting with pafr con rmed this artifact on both ends of suspected linear plasmids (Fig. 2) (46, 47).
The presence of these inverted repeats results from a speci c artifact that arises due to a combination of the actual hairpin topology of linear Borreliaceae replicons and the stem-loop adapters used for PacBio long-read sequencing (Fig. 3) (35,38,45).The heterogeneous ends of these contigs were closely inspected for the presence of the telomere resolvase (ResT) Box 3 TATA motif found in Borrelia species (Fig. 3) (48, 49).These motifs were identi ed solely on linear B. hermsii strain HS1d contigs as well as within our sequenced Borreliaceae cohort.To ensure that the inverted repeat sequence in each contig for all the Borreliaceae assembly would not skew downstream comparative genomic analyses, each contig that contained a ResT Box 3 motif was trimmed down.

Average Nucleotide Identity Analyses Support a Digeneric Borreliaceae
The ANI among all Borreliaceae genome assemblies were computed using the pyani average nucleotide identity blast method (BLASTN+).These analyses revealed that the species members of the original single Borrelia genus were clearly separated into two groups corresponding to the proposed (50) Borrelia and Borreliella genera (Fig. 4).Thus, our ndings concur with the division of the former Borrelia genus into two genera and henceforth we will refer to the Lyme borreliosis-causing spirochetes as Borreliella and the relapsing fever-causing spirochetes as Borrelia.Additionally, we included in the analysis two nonspecies designated Borrelia spirochetes, B_sp_HM_thM16W and B_sp_FGy1, which are thought to be part of the reptile-associated group (51)(52)(53)(54)(55) that were demonstrated to be sui generis, but clustered more closely with the Borreliella genus than the Borrelia.
Further inspection of the 65 Borreliella strains included in the analyses, covering nine named species, demonstrated that the vast majority have been placed within the correct taxa with essentially all pairwise strain analyses within a species showing > 96% ANI (Fig. 5).All the B. burgdorferi and B. afzelii were correctly assigned.One of the two strains typed as B. bavariensis clustered within the B. garinii.The other typed with one of the two strains typed as B. valaisiana, forming a small sui generis group which may correspond to a hybrid species between B. garinii and B. afzelii as the hierarchical clustering placed them between the proposed parent species.These ndings are in line with previous reports that had suggested that B. bavariensis was a subspecies of B. garinii.B. maritima was equidistant between B. afzelii and B. burgdorferi.In addition, we characterized the phylogenetic relationship of several novel isolates included in this study.These included the lagomorph isolate, Borreliella andersonii, which at the time of its discovery was misclassi ed as a B. burgdorferi strain (56) but was later reclassi ed as B. andersonii (57).In our ANI analyses, the B. andersonii strain MOK_3a clustered closely to B. burgdorferi and B. maritima.Of the remaining three Borreliella species (B.andersonii, B. bissetii, and B. mayonii), all were most closely related to B. burgdorferi but each were su ciently distant to warrant their species designations.
Compared to the Borreliella, the Borrelia spirochetes displayed greater genetic heterogeneity.The B. miyamotoi strains clustered strongly together, displaying an average pairwise ANI of 97% (Fig. 6) but were distinct from the cluster of species containing B. hermsii, B. parkeri, and B. turicatae.The reptileassociated Borrelia species were both remarkably distant from these other major relapsing fever spirochetes and highly divergent among themselves with only B_sp_FGy1 and B. turcica strains sharing any substantial similarity at 94% (Fig. 7).Through ANIb (blast), we determined that these two genomes bracket the Borrelia genera (Fig. 4).The isolate B_sp_HM_thM16w was also divergent from the group and was closest to B. recurrentis (Fig. 4, 6).
Following our initial ANI analysis, the evolutionary relatedness of these spirochetes was checked further by constructing a family phylogeny (Fig. 7).This revealed both a division amongst the Borreliaceae spirochetes and the same clustering for Borreliella and Borrelia spirochetes.Interestingly, we observed that some more recently identi ed Borreliella and Borrelia isolates cluster within their respective genera.This is most clearly illustrated with B. andersonii which, based on the single-copy core phylogeny, was most closely related to B. burgdorferi and B. bissettii.Other novel genomes, like those seen within the reptile-associated Borrelia spirochetes, stood apart from the majority of the Borrelia genus as had been previously seen in the ANI.Only the novel B. sp strain thHMw intercalated within B. miyamotoi and B. hermsii strains.The rest of the reptile-associated spirochetes formed a separate clade within Borrelia spirochetes.

Borreliaceae family-level pangenome
Following our initial ANI analyses, we wanted to understand how similar the genera were based on their overall gene similarities.The percentage of shared genes within the family was approximately 50% (Fig. 9).To begin to understand which genes are shared between the different genera, the Borreliaceae family-level pangenome was multiply calculated at 5% intervals of increasing similarity (Fig. 9).At 75% BlastP, clear separation between the core and cloud components of the Borreliaceae pangenome is seen (Fig. 9).At the 75% BlastP threshold, there are 322 core genes, 8 soft-core genes, 1,352 genes in the shell, and 3,424 genes in the cloud.There are 1,253 unique genes for all 109 isolates included in this study.Interestingly, the Borreliaceae pangenome had a different composition when analyzed with the EggNOG algorithm.Through this second approach, EggNOG estimates that Borreliaceae contains 997 unique genes for 109 isolates.Furthermore, its pangenome distribution consists of 577 core, 30 softcore, 204 shell, and 213 cloud genes.Although each of these pangenome analysis tools were given the same dataset, they yielded different results.These discrepancies are due to underlying differences in the alignment methods used by the algorithms.EggNOG relies upon alignment to a database of proteins which have been pre-clustered, rather than relying on a speci c blast threshold.Additionally, EggNOG largely ignores genes of unknown function rather than labeling it as "unknown," which arti cially decreases the number of unique genes.
Previously, a Borreliella genospecies complex pangenome was constructed through the use of 22 genomes comprised of B. burgdorferi, B. afzelii, and B. garinii (37).In this work, they found that the Borreliella pangenome was open.To ascertain if this was true within our own curated collection of Borreliaceae genomes, we next tested for changes in the pangenome using both data acquired from Roary at 75% BlastP and EggNOG (Fig. 10).It is evident that the number of gene clusters present within each pangenome compartment is subject to change at the incorporation of a novel pangenome.This suggests that the family-level pangenome is open with a growing distributed component reminiscent of what was found within the early Borreliella comparative genomic study.
Next, we visualized the Borreliaceae pangenome using a heat map for all included isolates (Fig. 11).The family-level core genome consists of 488 genes possessed by all isolates in both genera.Interestingly, the family-level distributed pangenome is fragmented into multiple components.This division reveals two additional groups of genes that on further inspection were identi ed as the core genes of either the Borreliella or Borrelia genera.
The Borreliaceae episomes can be grouped through gene content Borreliaceae spirochetes thrive throughout their complex lifecycles during which they undergo multiple transmissions between phylogenetically distinct hosts including arthropod vectors and multiple classes of vertebrates.Their ability to survive in various hosts is dependent upon a large set of distributed genes that are mostly encoded on episomal replicons.These episomes have traditionally been typed primarily based on molecular size, structure (linear or circular), and alleles of proteins in the PFam32 family (58).To ensure proper episomal identi cation, this system needs to be expanded to include information on the entire gene repertoire of all available episomes.To this end, we ran a pyani ANIb on approximately 1,800 Borreliaceae replicons from our curated genome collection.As pyani ANIb may not factor all gene content within the replicons, we also employed a Hadamard matrix (which interprets identity and coverage simultaneously) for this purpose.In the Hadamard matrix heatmap, multiple subgroups for nearly all of the Borreliaceae family replicons were identi ed (Fig. 12).
The most similar episomes clustered closely together.The three most conserved groups are the Borreliella chromosome, cp26, and lp54.Other Borreliella episomal groups with relatively higher percentages of similarity were lp25 and lp36.Interestingly, there was very broad and diffuse clustering for the clinically signi cant family of cp32 plasmids which are known to contain genes encoding hostspeci c proteins, indicating both their relatedness and the very high degrees of heterogeneity necessary to ensure survival during a lifecycle that includes multiple hosts.Unlike cp32, the lp28 episomes do not likely represent a single family of replicons as they did not all cluster together and instead separated out into smaller clusters.
For the Borrelia genus, a high degree of similarity was seen amongst the chromosomes as well as for the large linear plasmids.Additionally, species-speci c groupings amongst the episomes were also observed.This is best illustrated in B. miyamotoi strains lp6 and lp12.It is important to note that while the reptile-associated Borrelia spirochetes were included in these analyses, they were the only isolates to have extraordinarily dissimilar replicons.This is likely attributable to the incompleteness of the B_sp_A_FGy1 genome as indicated by its large number of constituent contigs.

Discussion
The debilitating diseases caused by Borreliaceae spirochetes have remained on the periphery of societal memory for centuries.Despite our current knowledge of the etiological agents of these spirochetoses, there is still a paucity of information on how each disease is caused and why their presentations are so variable among individuals.In the work presented in this study, multiple pressing points within the eld of Borreliaceae comparative genomics were addressed to help provide a framework going forward to be able to associate microbial genotypes with disease phenotypes in the host.
The rst topic we addressed was the composition of the Borreliaceae family.It was recently proposed that the previous single genus, Borrelia, be divided into two genera based on differences in overall genomic content (29,34).The well-studied and more widely known Lyme borreliosis spirochetes were given the genus name of Borreliella, while the older and less clinically studied species complex comprising the relapsing fever spirochetes retained the Borrelia designation.Since the establishment of this proposed dichotomy, there has been animated discussion among scientists both for and against the split (30)(31)(32)(33)(34).
Pairwise ANI analyses of all 108 sequenced isolates (N = 11556 genome comparisons) in this study revealed a sharp division between the two proposed genera indicating that the recent taxonomic reclassi cation is correct.Even isolates from within each of the genera often did not share sequence similarities above 85%.While it may be true that the ANI cut-off can vary based on species as seen with Strenotrophomonas maltophila, Escherichia spp, or Lactobacillus spp., our results do not support including the Lyme borreliosis spirochetes and relapsing fever spirochetes within a single genus (59,60).
Additional analyses revealed that the recently sequenced novel spirochaetal species B_sp_FGy1 and B_sp_HM_thM16w clustered most closely with B. anserina and B. recurrensis, respectively.These two genomes clustered within the known reptile-associated Borreliaceae.As these isolates are novel, it is unknown if they are pathogenic to humans.
We constructed a family-level pan-genome to characterize the similarities and differences between the two genera concerning gene content.In doing so, we found that the family-level Borreliaceae core genome was very small, consisting of only 488 genes, but that each of the genus-level core genomes were each more than double that size and that the family-level pangenome is still open.This indicates that there are large numbers of distributed genes still to be discovered.The number of core genes possessed by both genera were similar.
It was previously known that there is some similarity between Borreliella and Borrelia replicons with the Borrelia replicons containing regions of similarity to the Borreliella cp26 and lp54 (27,44,61,62).To better place the various replicons within a gene-functional framework, we established a system for typing the Borreliaceae plasmids.Previously, these episomes have been typed based on size, structure, and the presence of speci c alleles of particular protein families found on many of the plasmids (26, 58).
While this technique helped somewhat in the identi cation of the many Borreliaceae episomes, it did little to identify them by the biological functions they encode.In our work, all of the available episomes for each of the isolates in our data set were used in an ANI analysis via pyani.Doing so for over 1,800 replicons resulted in a complex matrix that did little to clarify how these episomes related to each other (Fig. 12).However, in building a Hadamard matrix from these analyses it was possible to visualize each of the known plasmids with respect to their overall hierarchical clustering, even for the most heterogenous replicon, cp32.Thus, we have established a universal Borreliaceae plasmid typing system based on gene content that captures all of the replicons produced via our PacBio-based long-read sequencing protocols.
Finally, we identi ed an artifact of sequencing linear bacterial replicons with hairpin termini with the PacBio SMRTbell technology.In our sequencing, it was noted that some of the many linear episomes had regions of heterogeneity.After closer inspection, it became apparent that these regions were inverted repeats that could emerge as artifacts in long-read sequencing platforms (38, 63).These regions were identi ed as the telomeric ends of all linear Borreliaceae replicons, a feature not unlike those seen in higher eukaryotes (41,45,64,65).To prevent false in ation in the genes within our data set, we searched for and con rmed the location of the ResT Box 3 site (49,65).From this, we have begun to build the rst system to handle these Borreliaceae long-read sequencing artifacts that could be applied to any replicon with similar features.These data also suggest that raw PacBio circular consensus sequences may contain hidden information about hairpin and cruciform DNA structures in other genomes.This work on the Borreliaceae family-wide pangenome is pivotal and the rst of its kind.More sequencing is required to ensure that the pangenome is truly re ective of all Borreliaceae spirochetes.Furthermore, more work is required for the typing of all Borreliaceae episomes.This family-level pangenome analysis provides new tools and information in a eld that requires more insight on how to handle such evasive and persistent pathogens.More importantly, this work can later culminate into effective therapeutics.

Conclusions
Through the application of multiple comparative genomic methods including phylogenetics, ANI, gene content, and core genome analyses, we demonstrate that the Borreliaceae are composed of at least two genera.We also developed a gene-based plasmid typing protocol to replace the current confusing method that relies solely on size and topography.This approach to episomal typing revealed an extremely high degree of allelic heterogeneity in the plasmid(s) that contain the genes that encode the major host-interacting proteins which are re ective of the extremely broad host-range of these parasitic bacteria.Additional analyses with a larger number of genomes will be required to determine if the reptileassociated strains and other underrepresented groups in the current analysis de ne additional genera.

Borreliaceae pangenome project design
This study included all available reference Borreliaceae genomes obtained from NCBI RefSeq on October 17th, 2021.The downloaded genomes were checked for completeness, sequencing platform, sequencing coverage, and if the average number of contigs greatly superseded the number of included replicons.For example, selected genomes were disquali ed if more than 2 or 3 contigs were needed for a single replicon.Additionally, they were run through a QC pipeline to ensure usability in later studies.The remainder of the genomes were used henceforth.Novel isolates collected by collaborators at Virginia Commonwealth University were then sequenced at the Center for Genomic Sciences core facility at Drexel University College of Medicine.These newly sequenced members of the Borreliaceae are included in BioProject PRJNA861274.

Borreliaceae culturing
All Borreliella and Borrelia isolates were cultivated in 50 mL of BSK-II media supplemented with 6 or 12% rabbit serum (Sigma, MA), respectively.The culture was then monitored using wet-mounts and dark-eld microscopy.Once spirochetes grew to the concentration of 5 x 10^7 cells/mL at late log phase, they were harvested by centrifugation, resuspended in 1X sterile PBS, and then recovered by centrifugation.
Pulse-Field Gel plasmid veri cation DNA isolation of selected Borreliaceae isolates was done through a modi ed BioNano plug prep (Bionano Genomics, Bionano Prep cell Culture DNA Isolation Protocol) (Bionano Genomics, CA).Cultured Borreliaceae were pelleted at 5,000 xg for 10 mins and sent overnight to the Center for Genomic Sciences at Drexel University College of Medicine.Received pellets were immediately thawed, resuspended with a total volume of 66 µL of cell buffer, and then 40 µL of 2% agarose was added.The contents were mixed 10 times via pipette and then 100 µL of each sample was aliquoted into a plug mold.The lled plug mold was incubated at 4°C for 15 mins before starting protein digestion.This process was done by placing the plug into a solution of 167 µL proteinase K (Qiagen, DE) and 2.5 mL Lysis buffer, then incubated for 2 hours at 50°C in a thermomixer at intermittent mixing.After incubation, the digestion solution was replaced and incubated overnight on the thermomixer as described above.
Following the overnight incubation, plugs were allowed to cool at room temp for 5 mins and then we added 50 µL of RNAse A before placing the plugs in the thermomixer at 37°C with intermittent mixing.Plugs were then washed 3 times with 10 mL 1X Wash buffer before being washed once more with 10 mL and set on an orbital shaker for 15 mins at 180 RPM.This latter step was repeated 3 more times.DNA recovery was initiated by adding 10 mL of 1X TE buffer to wash plugs and shaking at 180 RPM for 15 mins a total of 5 times.Cleaned plugs were then inserted into a 1% low melt agarose gel and run on a pulse-eld gel at 5V for 21 hrs.Pulse eld gel was stained with ethidium bromide and visualized under UV.

Borreliaceae DNA extraction
Once received, frozen Borreliaceae pellets were thawed on ice and centrifuged at 13,000 rpm for 1 min.Following pelleting, the remaining solvent was discarded before the pellets were resuspended in 200 µL of 1X PBS, pipette mixed, and transferred to a 2 mL bead beating tube (Matrix E) (MP Biomedicals, CA).Following the transfer, 20 µL of Proteinase K (Qiagen, DE) was added, and cells were homogenized using a SPEX 1600 MiniG (Fisher Scienti c, MA) for 1 min at 1500 Hz.Subsequent DNA extraction was performed using the Qiagen™ DNeasy Blood & Tissue Kit according to the manufacturer's instructions (Qiagen, DE).Genome assembly, annotation, and pangenome construction Borreliaceae isolates that were sequenced on the PacBio Sequel I platform were processed using pbcromwell (v 1.0.4)(Paci c Biosciences, CA).All the barcoded data were demultiplexed using pb_demux_subreads and assembled using pb_assembly_microbial.The base modi cation motifs for each assembly were computed using pb_basemods.The species of each genome was veri ed using taxator and GTDB-Tk (v1.7.0) with refpack (v r202).All reference and sequenced Borreliaceae assemblies were annotated with Prokka (v 1.11) and then homologous genes were clustered with Roary (v 3.5.1)(68, 69).The BlastP threshold was calculated consecutively at 5% intervals to determine the threshold.The nal BlastP threshold, 75%, was selected based on the separation of the core pangenome from the cloud and accessory pangenomes.All CDS identi ed by Prokka were also annotated using EggNOG (70).
The Borreliaceae percent gene similarity was calculated using RStudio in a pairwise fashion for each isolate in this study.The nal list of shared percent identity was then pivoted wider to generate a matrix.Any pairs that did not have a gene percent similarity value were assigned a value of 0. The nal matrix was then clustered with hclust and visualized with pheatmap (v1.0.12).

Borreliaceae pangenome mapping
To nd pangenome distribution, all Roary at BlastP 75% or EggNOG gene clusters were imported into R (70).Then each gene was associated with a corresponding strain and/or replicon before being used to construct a presence-absence matrix.This matrix was then visualized in pheatmap to see the gene presence and absence across all species used in this study.To determine if the Borreliaceae pangenome was either open or closed, the composition was checked by shu ing strains, incorporating them one at a time into the analysis, and then calculating the core/distributed component.The combinations of strains incorporated were shifted prior to being plotted in each iteration.Homologs were de ned as having Roary BlastP of ³75%.Y-axis represents gene clusters within the Borreliaceae pangenome.X-axis depicts all Borreliaceae isolates.The red annotations indicate the gene groups.The Borreliaceae pangenome has a relatively small core with each of the two genera-de ned core genomes being more than twice as large.Only the most prevalent of the distributed genes (not present in either of the three core genomes) are included in the heatmap. TableS2.docx

Extracted
Borreliaceae DNA was quanti ed following ThermoFisher Scienti c™ 1X dsDNA HS kit per the manufacturer's instructions (Biotium Inc, CA) on Qubit.DNA from each Borrelial/Borrelia specimen was prepped with the SMRTbell Template Prep Kit 2.0 (Paci c Biosciences, CA) to make PacBio SMRTbell libraries with barcodes sourced from the Barcoded Overhang Adaptor Kit 8A and 8B (Paci c Biosciences, CA).The sequencing primers were then annealed and bound to Polymerase 3.0 using the Sequel Binding Kit 3.0 (Paci c Biosciences, CA).The nal bound complex was then puri ed and later sequenced on PacBio Sequel I using the SMRT Cell M1 v3 tray (Paci c Biosciences, CA).The spike-in controls for each PacBio Sequel I run were from the Internal Control Kit 3.0 (Paci c Biosciences, CA).

Figures
Figures

Figure 1 In
Figure 1

Figure 2 Self
Figure 2

Figure 4 Average
Figure 4

Figure 5 Average
Figure 5

Figure 6 Average
Figure 6

Figure 7 Average
Figure 7