The Diversity and Evolutionary Relationships of Ticks and Tick-borne Bacteria in China

Ticks (order Ixodida) are ectoparasites of vertebrates that transmit diverse pathogens to humans and domestic animals. However, information on the genomic diversity of ticks in China is currently limited to a small number of common species, leaving substantial knowledge gap in the evolution of both ticks and their associated bacterial. We collected more than 20,000 contemporary and historical (up to 60 years of preservation) tick samples representing a wide range of tick biodiversity across diverse geographic regions in China, including 18 common species, nine rare species, and two undetermined species. Metagenomic sequencing was performed on individual ticks to obtain the complete or near-complete mitochondrial (mt) genome sequences from 46 tick species, among which 30 species were revealed for the rst time. These new mt genomes data greatly expanded the diversity of many tick groups and revealed ve cryptic species. Utilizing the same metagenomic sequence data we identied divergent and abundant bacteria in Haemaphysalis, Ixodes, Dermacentor and Carios ticks, including nine species of pathogenetic bacteria and potentially new species within the genus Borrelia. We also used these data to explore the evolutionary relationship between ticks and their associated bacteria, revealing a pattern of long-term co-divergence relationship between ticks and Rickettsia and Coxiella bacteria.


Introduction
Ticks (Acari: Ixodidae) are blood-feeding ectoparasites of vertebrates that are notorious for their role in the transmission of a variety of infectious diseases. Modern ticks are classi ed into three families: Argasidae (soft ticks), Ixodidae (hard ticks) and Nuttalliellidae. While there is only one extant species in the Nuttalliellidae, considered the closest extant relative to the ancestral tick lineage, more than 700 and 200 recognized species have been identi ed within the Ixodidae and Argasidae, respectively. [1][2][3].
China covers a large geographic area and possesses a variety of ecosystems. To date, at least 125 tick species from nine genera have been reported across 34 provinces in China, representing 13.9% of tick species identi ed globally [4]. The most frequently reported species in China are Haemaphysalis. longicornis, Dermacentor. silvarum, Ixodes. persulcatus, Ha. conicinna, Rhipicephalus. microplus, and Rh. sanguineus [5,6]. De. sinicus, Ix. sinensis, Ha. tibetensis and Ha. qinghaiensis have been only reported in China, although there is a lack of genomic data for these species.
Mitochondrial (mt) genomes are widely used in molecular systematics because their relatively high rate of evolutionary change provides greater phylogenetic resolution at the genus or family levels [7]. For ticks, complete mt genomes representing 66 species from 18 genera have been sequenced to date [8,9]. As in most arthropods, tick mt genomes are circular, 14-16 kb in length and contain 37 genes, including 13 protein coding genes, 22 tRNA genes and two rRNA genes. Phylogenetic analyses based on protein coding and rRNA gene sequences shows great consistency in genus level classi cation [8]. A single genome re-arrangement event has been identi ed in the Ixodidae, including the genera Rhipicephalus, Dermacentor, Amblyomma and Haemaphysalis, that possess a translocation of tRNA genes (trnL 1 , trnL 2 , trnC) and an inversion of the trnC gene [10,11].
Ticks has long been regarded as disease vectors, with an increasing number of human disease associations described in recent years [12][13][14][15]. Since 1982, more than 30 emerging tick-borne disease agents have been identi ed from 28 tick species, causing a variety of human infections including rickettsial disease, Q fever and Borrelioses. Together, these account for nearly 50,000 cases per year since 2013 as reported by National Noti able Disease Surveillance System in the USA [16,17]. Among these, rickettsial disease and Q fever are caused by two groups of obligate intracellular endosymbiont bacteria in ticks: the spotted fever group (SFG) of the genus Rickettsia [18,19] and Coxiella burnetii [20], harbored by hard and soft ticks, respectively. In contrast, Borrelioses, represented by Lyme disease and relapsing fever, are mainly caused by bacteria in the genus Borrelia (phylum Spirochaetes). Lyme borreliosis (LB) species (B. burgdorferi) are generally carried by Ixodes ticks, whereas the relapsing fever Borrelia group are generally carried by Ornithodoros. sonrai, Or. erraticus and Or. moubata [21,22]. In addition, ticks harbor a number of other bacteria in the genera Rickettsia, Coxiella, and Borrelia. While their public health signi cance remains unclear, these microbes are highly prevalent in tick species and can be transmitted transovarially [23][24][25][26].
Despite growing interest in tick-borne pathogens, our knowledge of the genetic diversity of ticks in China and the bacteria they carry are limited to a small number of common species. Herein, we collected and sequenced 46 tick species representing the biodiversity of ticks across China. For the rst time we revealed the genetic diversity of both ticks and their bacterial symbionts, enabling a more systematic study of their co-evolutionary history.

Morphological identi cation and characterization of ticks
Between 1959 and 2019 more than 20,000 ticks were collected from a wide range of hosts (e.g. cattle, goats, camels, hedgehogs) across China ( Figure 1, Table 1, Additional le 1: Table S1). Species identi cation of these ticks were carried out based on morphological characters, such as the number of punctuations, size and shape of mouthparts, and genital aperture [27][28][29]. This revealed a total of at least 46 species comprising two families (Ixodidae and Argasidae) and eight genera: Haemaphysalis (19 species), Ixodes (10 species), Dermacentor (6 species), Rhipicephalus (four species), Hyalomma (two species), Amblyomma (three species), Argas (one species) and Carios (one species). Among these were the six most common tick species in China [6]: Rh. microplus, Rh. sanguineus, Ix. persulcatus, Ha. longicornis, De. silvarum and Hy. asiaticum. In addition, other common species were collected and analyzed, including Ix. sinensis (the vector of B. burgdorferi [30] Other samples were collected by drag-agging methods or were historical samples preserved in ethanol for more than 60 years. For example, the oldest sample from our data set (Am. javanense) was collected in the 1960s from a wild Chinese pangolin (M. pentadactyla). In the case of two samples (i.e. C20, A29) we could only identify them to the genus level, in Amblyomma and Ixodes, respectively. Since we were unable to identify them at the species level using morphological characteristics, they were tentatively assigned as potential new species. The only differences were observed in the length and composition of non-coding regions, some of which contain more than one tandem repeat region (Additional les 2, 3, 4: Figure S1-S2,  Figure S1). Furthermore, we found inconsistencies in the control region within some individual samples. For example, cloning of sequencing of PCR products spanning the control region between trnQ and trnF reveals various copy numbers of short repeat sequences within the same (single-tick, De. marginatus E1) sample (Additional le 3: Figure  S2C).

Molecular Identi cation And Genetic Diversity Of Ticks
Both maximum likelihood (ML) and Bayesian phylogenetic trees were estimated based on sequences of 13 protein coding genes and two rRNA genes derived from 136 tick mt genomes, including 74 generated in this study and 62 reference mt genomes from GenBank. The ML and Bayesian methods resulted in highly similar tree topologies that placed the diversity of Chinese ticks within a global context with high resolution ( Figure 2, Additional le 5, 6, 7, 8, 9, 10, 11, 12: Figure S3-S10). Importantly, the newly added genomes greatly expanded the diversity of many groups, particularly the genera Haemaphysalis, Ixodes and Dermacentor ( Figure 2). In addition to the new species identi ed in this study, 28 tick species previously only known through morphological characteristics or incomplete mt genomes (e.g. Ix. kuntzi, Ix. acutitarsus, Ha. mageshimaensis and Ha. colasbelcouri) were also included ( Figure 2, Additional le 5, 6: Figure S3-4). Furthermore, at least ve potential cryptic species were identi ed -Rh. sanguineus, De. steini, De. marginatus, and Ix. ovatus -adding to the previously reported cryptic species identi ed in Rh. microplus [6,32]. Each contained at least two divergent (70.93% ~ 94.21% identity) phylogenetic clusters while sharing the same morphological characteristics ( Figure 2 Figure  3A, Additional le 14: Table S3).
Most of the species in the order Rickettsiales belonged to the genus Rickettsia, within which 14 bacterial species were identi ed from all tick genera included in this study (with the exception of Hyalomma; Figure  3B), including a number of human pathogens. For example, R. raoultii, which causes human tick-borne lymphadenitis [33,34] Figure S12-S14).
In addition to Rickettsia, we identi ed two potential new species within the order Rickettsiales. One, Wolbachia endosymbiont of Ixodes vespertilionis A54, clustered with W. pipientis strain FL2016 (95.37%) and Wolbahcia endosymbiont of Drosophila melanogaster (95.16%) within the genus Wolbachia. The identi cation of Wolbachia in ticks has been reported in recent years [39,40]. The other -Rickettsiales endosymbiont of Dermacentor -clustered with an unclassi ed Rickettsiales bacterium Ac37b identi ed from Am. cajennense in Brazil (86.62% identity). Together they may represent a new genus or even family within the Rickettsiales (Additional le 15, 16, 18: Table S4, Figure S12-S13).
Bacteria of the genus Coxiella had the highest prevalence among the tick species examined (74%, 40/54). The newly discovered Coxiella species in the present study are highly diverse and greatly expand the genetic diversity within this group ( Figure 3C). Indeed, new genetic lineages were de ned based on our phylogenetic analysis, including Coxiella endosymbiont of Dermacentor marginatus, Coxiella endosymbiont of Haemaphysalis concinna and Coxiella endosymbiont of Ixodes ovatus, most of which were divergent from exiting members of Coxiella and generally associated with speci c tick genera ( Figure 3C, Additional le 19: Figure S15). In contrast, we discovered a C. burnetii, the causative agent of Q fever [41], from a De. sinicus tick sampled from Xinjiang province, which had high abundance (29.92 RPM) and was closely related with the "Dugway 5J108-111" strain sampled from the United States and transmitted by De. andersoni [42] ( Figure 3C, Additional le 19: Figure S15). In addition to C. burnetii, we identi ed a single species of Borrelia from a soft tick Ca. vesperitilionis in Henan province. Based on the phylogenetic analyses, the newly identi ed bacteria, named Borrelia henanensis X1, fell within a clade "RF" that contains pathogens causing tick-borne relapsing fever ( Figure 3D, Additional le 20: Figure S16) [43,44].
Ecological and evolutionary patterns in ticks and their associated bacterial symbionts We used Mantel tests to examine whether the tick host and/or geographic factors shape the genetic diversity of the bacteria they carry. For both Rickettsia and Coxiella, our results revealed positive and signi cant (P < 0.0005) correlations between tick and bacteria genetic distance matrices. However, no such signi cant correlation was found between bacterial genetic distance and geographic distance. Similar results were obtained using (i) partial Mantel analyses, in which we tested the effect between two factors while controlling for the third, and (ii) multiple linear regression analyses in which we tested the effect between three matrices ( Table 2, Additional le 21: Table S5). These results suggested that bacterial genetic diversity was primary shaped by host genetic distance, with geographic distribution having little or no impact. The strong impact of host on bacterial genetic diversity were also re ected in the phylogenetic analysis in which we observed a signi cant clustering of bacterial genetic diversity at the host general level (Rickettsia: association index (AI) = 2.760, P < 0.001; Coxiella: AI=0.969, P < 0.001). | Indicate that the rst factor excludes the effect of the second.
We next examined whether the phylogeny of the ticks and their bacterial symbionts exhibited a pattern of bacterial-host co-divergence over evolutionary time. We rst tested hypothesis of co-divergence using an event-based framework, based on which we reconciled the phylogenies of ticks and their associated bacteria (i.e. Rickettsia and Coxiella, respectively) by accounting for four processes: co-divergence, duplication, host switching and loss [45]. This revealed signi cantly fewer non-co-divergence events (i.e. duplication, host switching and loss) than expected by chance alone. We similarly examined the codivergence hypothesis using a distance method, in which we evaluated the overall phylogenetic congruence by comparing the tick and bacterial symbionts patristic distance [46]. This con rmed the signi cant overall similarity (Para tGlobal, P = 0.0021 and 0.0003, respectively, for Rickettsia and Coxiella, at 9999 permutations) between the tick and bacterial symbionts phylogenies ( Figure 4). Collectively, these results suggest that the symbiotic bacteria from genera Rickettsia and Coxiella have co-diverged with their tick hosts for many millions of years.

Discussion
We collected more than 20,000 ticks and determined the mitochondrial genomes of at least 46 species representing the diversity of both common and rare ticks in China. Our sampling mostly covered human residential areas as well as some biodiversity hotspots, namely Shennongjia forest (Hubei province), the Tibet plateau (Tibet and Qinghai provinces), Hulun Beir prairie (Inner Mongolia). While the majority of ticks were sampled from domestic animals and were commonplace [6], those identi ed from wildlife or directly from the environment yielded more unique diversity. Hence, there may be many more species of ticks in China that have not yet been identi ed by current disease or human-centered sampling schemes. Indeed, a number of tick species, such as Ix. kuntzi and Ha. colasbelcouri identi ed from Taiwan and Laos [47,48], have not been sequenced and characterized previously. Furthermore, there is a general lack of genomic surveillance of ticks in reptiles and amphibians, such that substantial evolutionary gaps remain in studying the long-term tick-host relationship.
The results of the morphological and molecular species identi cation performed here were largely similar, suggesting that mitochondrial sequencing can reliably identify tick species. However, there were several inconsistencies, mainly re ected in the presence of ve cryptic species complexes, within which morphologically identical ticks can be separated by high levels of intra-speci c genetic diversity (e.g. 73.40%, 70.93%) or that had paraphyletic phylogenetic structure (e.g. De. marginatus and Rh. sanguineus) (Figure 2). Although we treat all these as cryptic species, we could not completely exclude that there may be some additional morphological characteristics that can distinguish these ticks and were unrecognized during our species identi cation process. In addition to cryptic species, inconsistencies between morphological and mitochondrial taxonomy were also re ected in the observation that several morphologically distinct ticks (De. nuttali, De. silvarum, De. sinicus) had very similar mt sequences (i.e. > 98.46% genetic identity). This suggests a relatively recent speciation event, although this needs to be further con rmed with nuclear genes.
Our study also revealed the high prevalence and large diversity of bacterial endosymbionts within the ticks examined, many of which fell with groups/species that contain human pathogens, including members of Spotted Fever Group (nine species, genus Rickettsia), C. burnetii, as well as B. henanensis. Given that our study only covers 219 individuals from 46 species of ticks, the prevalence rate for pathogenetic bacteria within these ticks is relatively high. This broadly agrees with previous studies that revealed a 41.2% -68.5% prevalence for pathogenetic bacteria [49,50]. Furthermore, the abundance of some of the bacterial pathogens was high (1.42 -162.57 RPM), further facilitating inter-host transmission. Interestingly, we did not detect other pathogenic genera within order Rickettsiales that are frequently carried by ticks, such as Anaplasma and Ehrlichia, although this may simply re ect a limited sample size.
Our results greatly enrich the diversity of both ticks and their associated bacteria, revealing that both endosymbiotic bacteria groups, namely those of the Rickettsia and Coxiella genera, had close association with ticks. This is also re ected in both the strong host structure on the bacteria phylogeny and the high resemblance between bacteria and host phylogenies indicative of long-term a co-divergence. Previous studies have suggested a lack of co-divergence between Rickettsia [51] and Coxiella-like endosymbionts [26]. However, these were mainly based on bacteria detected from PCR assays that are highly sensitive and may easily include both endosymbionts as well as bacteria transmitted through co-feeding. In addition, some previous studies of co-divergence are based on the rRNA gene which is too conserved to distinguish closely related species [52]. In contrast, the unbiased metagenomic sequencing performed here only detects those bacteria at relatively high abundance, such that these data are more re ective of the presence of endosymbiotic bacteria. Clearly, more data is required to fully resolve the co-evolutionary history between ticks and their endosymbiont bacteria.

Conclusion
In sum, our analysis of more than 20,000 ticks collected over broad geographic range across China provides insights into diversity and evolutionary history of ticks and their associated bacteria symbionts/pathogens. Our data reveal that the genetic diversity of ticks in China goes beyond a few common species and includes rare and under-explored species, for which more diversity in wildlife hosts that remains to be discovered. Importantly, despite their low occurrence, these uncommon tick species can harbor diverse pathogens, some of which could pose a potential threat to human health.

Sample collection
Between 1959 and 2019 more than 20,000 ticks belonging to eight genera (Rhipicephalus, Hyalomma, Dermacentor, Amblyomma, Haemaphysalis, Ixodes, Argas and Carios) were collected from different geographic regions in China (Figure 1, Additional le 1: Table S1). Most ticks were directly picked from infested wild and domestic animals, although some were captured using a drag-agging method. Upon capture, each tick was sexed and morphologically identi ed to species by trained eld biologists and further con rmed by sequencing mitochondrial genome sequences. Among these samples, 42% (Ix. simplex, Am. testudinarium and Ha. qinghaiensis) were preserved in alcohol at room temperature for 1-60 years, while others were captured alive and stored at -80℃ until DNA extraction.

Dna Extraction And Sequencing
For most of the samples we used a direct total DNA extraction and sequencing approach. For each sample, the ticks were rst homogenized in PBS solution using the Mixer mill MM400 (Restsch). The homogenates were then subject to DNA extraction using the QIAamp DNA Mini Kit (Qiagen) and Mollusc DNA Kit D3373 (OMEGA) following the manufacturer's recommendations. Library preparations were performed by BGI · Tech and/or Novogene and then 150bp pair-end sequencing on an Illumina HiSeq 4000 platform.
For two of the libraries (A3 and A4), the mitochondria were rst puri ed from cell homogenates before DNA extraction. Library preparation and sequencing were performed as described above. To purify the mitochondria, cell homogenates were rst subjected to low speed centrifugation (10 min at 100g) for insoluble debris removal. The supernatant was further centrifuged at 10000g for 5 min to obtain the mitochondria. The total sequenced data generated varied from 0.2 to 30.1 Gbp. For some of the libraries (A41, C7, D23 and C23) where mitochondrial genome received partial coverage, we increased the sequencing depth to 4.5 to 12.61 Gbp to obtain better coverage.

Mitochondrial Genome Assembly And Phylogenetic Analyses
Clean sequence reads were assembled de novo using MEGAHIT version 1.2.6 [53] with default parameters. To obtain mt genome sequences, the assembled contigs were compared against reference mt genomes in GenBank using blastn [54]. We set the E-value to 1E-10 to maintain high sensitivity and a low false-positive rate. Contigs with unassembled overlaps were merged to form longer mt contigs using the SeqMan program implemented in the Lasergene software package version 7.1 [55]. We used Bowtie2 [56] for con rmation and/or extension of the mt genomes. The protein-coding genes and rRNA genes of the assembled mt genomes were annotated using the Spin program implemented in the Staden package [57]. The annotated mt genomes of these ticks were deposited in GenBank under accession numbers XXX-XXX.
To infer phylogenetic trees, we used 74 mt genomes generated in this study plus an additional of 62 reference tick mt genomes obtained from GenBank. Individual alignments were performed on each of the 13 mt protein coding genes (ATP6, ATP8, COX1, COX2, COX3, CYTB, NAD1, NAD2, NAD3, NAD4, NAD4L, NAD5, NAD6) and two rRNA genes (12S rRNA and 16S rRNA). Protein coding genes were aligned based on codons using the ClustalW implemented in MEGA version 5.2 [58]. Ambiguously aligned regions were removed. Two rRNA genes were aligned using the MAFFT version 7.4 [59] employing the L-INS-i algorithm with all ambiguous aligned regions removed using TrimAL version 1.2rev59 [60]. Individual gene alignments were then concatenated to form super-alignments for subsequent phylogenetic analysis.
These comprised: an (i) "all gene" data set containing 13 protein coding genes and two rRNA genes, and (ii) "a protein coding gene" data set that only contained the 13 protein coding genes. All data sets were analyzed using both maximum likelihood (ML) and Bayesian algorithms implemented in IQ-TREE version

Discovery And Characterization Of Bacterial Pathogens In Ticks
For each library, we searched for the existence of marker genes of Rickettsiales (groEL, gltA), Borrelia (groEL, bamA, aB) and Coxiella (groEL, rpoB). The assembled contigs were compared against the database of reference marker genes downloaded from GenBank using blastx, with an E-value cut-off set at 1E-10. For libraries with incomplete marker gene coverage, partial gene sequences were obtained by rst mapping reads to a reference marker gene sequence of bacterial pathogens using Bowtie2 [56] and generating consensus sequences from the mapped reads. Based on these genes, we then compared the bacterial pathogens identi ed in this study with those previously described by estimating phylogenetic relationships using the ML and Bayesian methods described above. To examine the extent of bacteria-tick co-divergence, we performed event-based co-phylogenetic reconstructions using the Jane program, version 4.0 [45]. The 'cost' scheme for analyses in Jane was set as follows: co-divergence = 0, duplication = 1, host switch = 1, loss = 1, failure to diverge = 1. The number of generations and the population size were both set to 100. The signi cance of co-divergence was derived by comparing the estimated costs to null distributions calculated from 100 randomizations of host tip mapping. In addition, we performed a distance-based analysis to test the hypothesis of bacterialhost co-divergence using ParaFit as implemented in the COPYCAT software package version 2.0 [46], comparing distance matrices derived from the bacteria and host phylogenies. Signi cance testing was based of 9999 randomizations of the association matrices. Additionally, to visualize the association between bacteria and their tick hosts, a tanglegram was generated by matching each bacterial species to their associated ticks using TreeMap version 3.0beta [67].
The in uence of geographic and host genetic distance on pathogen genetic diversity Bacterial and host genetic distance matrices were derived from pairwise genetic comparisons using MEGA version 5.2 [58]. Geographic distances (Euclidean distance) were calculated using spatial coordinates of the samples derived from information on their geographic location. We used Mantel correlation analysis [68] to test the extent of the correlation between these matrices. Both a simple Mantel's test and Partial Mantel's test were performed, and the correlation was evaluated using 10000 permutations. To access which of the two factors -geographic or host genetic distances -best explained total variation in the bacteria genetic distance matrices we performed a multiple linear regression analysis [69] on these distance matrices. The statistical signi cance of each regression was evaluated by performing 1000 permutations. All statistical analyses were performed using the Ecodist package implemented in R 3.4.4 [70], and all statistical results were considered signi cant at a P-value of 0.05.
Declarations Journal of Statistical Software 2007, 22(7). Figure 1 Sampling locations of 46 tick species collected in China. The map of Hubei province (shaded green) was magni ed for clarity. We use different colors and shapes to represent 46 tick species from eight genera: Haemaphysalis (red), Ixodes (dark blue), Dermacentor (yellow), Rhipicephalus (green), Amblyomma (sky blue), Hyalomma (magenta), and Argas (purple), Carios (pink).  Co-phylogenetic comparisons of Rickettsia and Coxiella bacteria phylogenies and their corresponding tick hosts. The table shows the results of the co-phylogeny analysis using Para t and Jane4. The tanglegram shows the match between the phylogenies of the bacteria and tick hosts. The relationship between the two phylogenies is displayed to maximize topological congruence. Dotted line colours correspond to different tick groups as shown by gure legend at the right bottom.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.