Equus roundworms (Parascaris univalens) are undergoing rapid divergence due to host and environment factors

Background The evolution of parasites is often directly affected by the host's environment. Studies on the evolution of the same parasites in different hosts are extremely attractive and highly relevant to our understanding of divergence and speciation. Methods Here we performed whole genome sequencing of Parascaris univalens from different Equus hosts (horses, zebras and donkeys). Phylogenetic and selection analysis was performed to study the divergence and adaptability of P. univalens.


Introduction
Parascaris univalens are large parasitic nematodes that predominantly infect foals and weanlings. The Equus species, such as horse (Equus caballus), zebra (Equus zebra) and donkey (Equus asinus), are the reservoir hosts for P. univalens. Despite their close relationship, the extent of habitats, food composition, digestion levels, and history of human intervention in these hosts are vastly different, and the impact on large parasites living in the host remains unknown. Comparatively, donkeys have much lower energy and protein requirements than those of other equids [1], and even the metabolic response during exercise will differ depending on the level of food given to the horse [2], which may also be a challenge for parasites. Extensive genetic diversity provides the genetic basis for the adaptive evolution of nematodes, but a successful evolution will also come at a cost to the host [3]. Roundworms' underlying genetic diversity contributes to their evolutionary adaptation. Understanding these processes, as well as the key environmental factors that drive the adaptation, is crucial to comprehend the speci c evolutionary trends of parasites [4]. The genetic architecture and selection are key indicators of evolution aiming to understand how natural selection can lead to lineage divergence and speciation [5]. The genetic differences between populations can reveal a lot about the basic evolutionary process due to changes in the environment [6]. Studying the adaptation of parasites along with different hosts are of great signi cance for understanding the host-parasite interactions.
The control of parasitic nematodes feeding on animal relies almost exclusively on anthelmintic, which is proved to be effective in the short-term management, but the long-term effectiveness has been questioned due to the widespread emergence of drug resistance [7]. There is widespread concern about the risk associated with relying on anthelmintics with hundreds of millions of doses being donated and used every year [8]. In general, levels and spectrum of anthelmintic resistance are less severe in parasites of horses, however, the same issues persist, and they seem to be worsening [9][10][11]. Due to the inconsistency of drug history in different regions or farms, roundworms are facing varying degrees of selection pressure. Timely monitoring of drug use on the evolution of roundworms is of great signi cance. In recent years, genome scanning has become an effective means to reveal the genetic determinants of adaptive evolution in different habitats in some organisms. Since less introgression occurs in the selected loci, they exhibit lower polymorphism than other regions of the genome, which enables the formation of highly divergent regions that serve as the genetic foundation for divergence [12,13]. Here we analyzed the genome characteristics of P. univalens populations dwelling on three main hosts: horse (E. caballus), zebra (E. zebra) and donkey (E. asinus) in northeastern China. We presented the rst report on the recent divergence of P. univalens populations, and speculated that it might be linked to host digestion and metabolism preference. In addition, the role of selection pressure in the population was also evaluated at the genetic level. This work is of signi cance for changing the current "one size ts all" roundworm taxonomy, and providing a reference for revealing potential differentiation trends.

Results
Genome resequencing and genetic variations A total of 42 individuals from three P. univalens of Equus (PEc, n=19), zebra (PEz, n=18), donkey (PEa, n=5) from Inner Mongolia and Heilongjiang of China were whole-genome sequenced (Fig. 1, Table S1). We identi ed 4,398,519 SNPs with a genome-wide distribution of 1 SNP per 57 bp on average (Fig. S1). Genome resequencing was accomplished with an average depth of ~20X (Fig.   S2), average mapping rate was 98.17% and genomic coverages were larger than 90% for all individuals (Table S2). Currently, there may be two types of Equus roundworms due to different chromosome numbers (P. equorum n=4, or P. univalens n=2). We rst performed species identi cation. We randomly selected 1-4 samples at each sampling site for karyotype identi cation and found that all samples had two chromosomes (see methods; Fig. S3). In addition, the BWA alignment con rmed that the all samples were closer to P. univalens, with an average mapping rate against P. equorum of 78%, signi cantly lower than that against the P. univalens genome (98.03%, P<0.01) (Fig. S4a). These results indicate that the collected samples were all P. univalens. In addition, we calculated the observed heterozygosity and the expected heterozygosity, indicating that the inbreeding coe cient between individuals were very low (Average F < 0.1), and the possible genetic relationships were relatively distant, which were representative (Table S3).
Genetic differentiation was found in the P. univalens Principal components analysis (PCA) supported the clear separation among P. univalens populations (Fig. 2a), with PC1 and PC2 separating the PEc and PEz&PEa populations (P<0.05). All PCs showed that PEz and PEa were in the same cluster. In addition, the phylogenetic relationship among the three populations inferred by ML tree highlighted a similar division as that of PCA (Fig. 2b). The tree showed two distinct clusters, the PEc population seems to be a separate clade, while the other two formed a distinct clade. Although the sampling sites of PEc and PEz partially overlapped, the divergence between them was still clear. Population admixture analysis further con rmed the two distinct clusters presented by PCA and ML tree, where PEz and PEa had more ancestral components in common (Fig. 2c). We scanned the paired identity-by-descent (IBD) regions at the genome-wide level of all individuals and found that PEz and PEa shared more IBD regions (98.5% of PEa shared with PEz), while PEc and the other two populations had almost no shared large IBD fragments (Fig. S5, Fig. S6). In addition, lower inter-population Fst values were detected in PEz and PEa populations also indicated a closer relationship between them (Fig. S6).

Inference of demographic history and divergence time in P. univalens populations
To examine the genome-wide divergence time among P. univalens populations on the genome-wide, we constructed a molecular clock phylogenetic tree based on the calibrated mutation rate using all SNP sites. The tree topology showed that the divergence time of PEc and PEz&PEa was about 900-1500 years ago when the posterior probability was >95% (Fig. 3). The PSMC results showed similar effective population sizes (Ne) of three populations (Fig. 4a). Further Ne estimation inferred by the MSMC2 also supported the PSMC result with almost a similar trend 1000 years before, without differentiations, indicating the existence of possibly common ancestor ( Fig. 4a, 4b). However, the Ne of the three populations began to diverge in recent ~800 years (Fig. 4b). Further, we found that the PEc population was genetically separated from the PEz and PEa (Fig. 4b) at ~ 300 years ago with the RCCR less than 0.5. However, there was no obvious sign of separation between PEz and PEa (RCCR >0.5). Separations among these three populations were further supported and validated by results inferred by SMC++, which was independent of phased genotypes (Fig. 4c). We carefully compared the relationship between the separation and the topology of the phylogenetic tree, and found similar results. Both results showed that PEc vs PEz, PEc vs PEa had obvious divergence, but PEz vs PEa are not fully differentiated. Although the divergence times estimated by the above methods are not completely consistent, they all indicate that this differentiation occurred recently. Taken together, we considered that P. univalens were mainly divided into two clades, one was from horse-derived (PEc) and the other one was from zebra & donkey-derived (PEz&PEa). The conservative divergence time was estimated within the recent 300 years. Finally, we used fastsimcoal2 to evaluate the population size after divergence based on the observed joint site frequency spectrum (SFS; Fig. S7), and found that the PEc population size was signi cantly larger than PEz&PEa after this divergence event (Fig. 4d). The best coalescent simulation model inferred by fastsimcoal2 also indicates an early bidirectional gene ow between PEc and PEz&PEa.
The most possible demographic model in P. univalens populations In order to better understand this recent divergence of P. univalens, we used the δaδi to further explore the demographic history of the divergence. Following the method of Portik [14], we employed a four-round optimization technique to ensure that all nal optimizations resulted in a similar log-likelihood score (Table S4). In order to better validate the possible divergence patterns between populations, we rst constructed ten 3D models for three independent populations (Fig. S8). Models with the lowest scoring loglikelihoods was "Adjacent ancient migration, shorter isolation" (Fig. 5a, Table S5). This ancient migration model involving early gene ow with symmetric migration was supported as the best t for the three populations. Next, in our eight 2D simulations (Fig. S9), the ancient migration or secondary contact plus instantaneous size change model involving divergence with ancient continuous symmetrical migration, isolation with instantaneous size change provided the best t regarding the PEc and PEz&PEa lineages ( Fig. 5b, Table S5). By comparing the best 2D and 3D models, it was found that the best 2D models have larger log values and lower residuals, which coincided with the results of our phylogenetic tree. Furthermore, the best model revealed a possible divergence of roundworm populations, that is, there was bidirectional gene ow in the early ancient periods, but in the middle and current stages, the gene ow between the populations almost ceased.

Different Adaptive Strategies From The Prospective Of Metabolism
In-depth genome scanning and functional annotation helped us to understand the population divergence. Genome-wide nucleotide diversity (π) was computed for each population with all samples. Meanwhile, we identi ed genomic regions as candidate divergent regions (CDRs) among PEc, PEz and PEa populations (Table S6, Fig. S10). We used iHS to detect genes under recent natural selection in the PEc and PEz&PEa populations. A total of 1,046 SNPs in PEc and 1,093 SNPs in PEz&PEa were identi ed within the top 1% iHS scores (Fig. S11). These SNPs were annotated to 290 and 254 functional genes, respectively. The GO functional enrichment showed that they were mainly enriched in GO terms such as metabolism and regulation of gene expression (Table S7). The results of KEGG enrichment also showed that the two clades have signi cant selection signals in metabolic-related signaling pathways (Fig. S12). In addition, we also used the XP-EHH method to screen genes that may have been positively selected in different environmental pressures by comparing the PEc and PEz&PEa populations. The two-side P-value test was used to scan genome regions with selection sweep signals of the two clades. Interestingly, the differences in carbohydrate metabolism and lipid metabolism were extremely signi cant in the two clades ( Fig. S13, Fig. S14, Table S8). PEz&PEa clade has shown signi cant positive selection in almost all key enzymes in glycolysis and tricarboxylic acid cycle. These loci have aroused our attention and we re-examined the selection dynamics of their surrounding regions (Fig. 6). This includes the kinases (E1:hexokinase and E2:6-phosphofructokinase-1) involved in the two most important irreversible reactions in the rst stage of the conversion of glucose to pyruvate under anaerobic conditions. Meanwhile, the dehydrogenase (isocitrate dehydrogenase) in the irreversible reaction of isocitrate oxidative decarboxylation to α-ketoglutarate has also been signi cantly positively selected (P<0.05). The collective selection of enzymes in the glycolysis process and tricarboxylic acid cycle showed that PEz&PEa has a greater demand than PEc in this process. In addition, members of the lipid synthase family which is involved in the uptake of fatty acids are signi cantly positively selected in PEc.
Parasitic helminths contain appreciable quantities of lipids. However, most of the intestinal helminths do not utilize signi cant amounts of lipids even during starvation and under aerobic conditions [15], largely due to their anaerobic mode of life. The signi cant selection signals in these enzymes, such as fatty acid CoA synthetase family and long-chain fatty acid CoA ligase 5, suggests that they might be involved in some other processes as well, not just lipid uptake or metabolism.

Anthelmintics, The Potential Drivers Of Population Divergence
Currently, the anthelmintics mainly have two modes of action, one is more rapid action on membrane ion channels, and the other is a relatively slow biochemical reaction. These common anthelmintics include benzimidazoles (BZs), macrolides (MLs), nicotinic acetylcholine receptor agonists [16] and aminoacetonitrile derivatives (AADs). We screened out the main genes that may be related to the resistance of all the above-mentioned anthelmintics that have been main reported so far (Table 1). We also scanned iHS scores within 50 kb of all these gene regions, and calculated the nucleotide diversity (π) and tajima'D of the three populations with 10 kb sliding windows. The results showed that multiple resistance-related genes were under strong positive selection among different populations ( Table 1). The genetic diversity of some drug-related gene regions was signi cantly low and conservative. We identi ed three classic resistance locus (167/Phe, 198/Glu and 200/Phe) of the BZs resistance gene β-tubulin [17], and found that all individuals in the three populations did not show resistant mutations through sequence alignment (Fig. S15a). However, these gene regions were found under strong recently positive selection (Fig. S15c). In addition, the selection signals in other resistance-related genes in the population were also discovered ( Fig. S16 -S18). For example, multidrug resistance protein pgp-3 and multidrug resistance protein 1 (mrp-1) were found to be under strong positive selections in the PEc population. Glutamate-gated chloride channel alpha (glc-1), which was related to ivermectin resistance showed strong positive selection in PEz and PEa populations. A uoxetine (Prozac) resistance gene nose resistant to uoxetine protein 6 (nrf-6) [18] showed strong positive selection in three populations. Positive selections of these gene regions were different in three populations, which may be related to the drug history in different environments. Nevertheless, it was obvious that in some populations, certain genetic selections were very conservative and signi cant. It can be seen from the results that the selection of these resistance-related genes will be retained in independent populations and would continue to be passed down to the next generation. It should be noted that these selections were populationspeci c and have the potential to promote population differentiation.

Discussion
The ecological environment in which parasites inhabit is different from that of ordinary animals, and the survival of parasites is more dependent on the intestinal environment. The evolution of a single species is a long and delicate process. Numerous environmental changes and its own factors affect this process. Our study used a combination of explicit genetic analysis and demographic models to determine the possibly diversi ed mechanisms that occur in different intestinal environments. We applied δaδi 2D/3D models to simulate the most possible demographic history. The ancient migration or secondary contact model, as well as the immediate size change model, were found to be effective in explaining the demographic differences and recent divergence of P. univalens populations. In addition, the demographic history also showed that the P. univalens were in the process of divergence. Recent selection analysis provided evidence for understanding the possible causes of the divergence, which supported the signi cant impact of different host intestinal habitats on evolution. We summarized the main ndings on the diversi cation of P. univalens, and provided a perspective for future monitoring of roundworm ecology in a timely manner to deal with possible unfavorable mutations.
For parasites, the host's in uence on its evolution cannot be ignored. The change of the host's diet was an opportunity and a challenge for the parasites. From the perspective of glycolysis, we found that most of the key enzymes involved in glycolysis in PEz and PEa were subjected to a higher degree of recent positive selection when compared to domestic horses. As the most important way for roundworms to obtain ATP, glycolysis and tricarboxylic acid cycle seem to have "degraded" in domestic horse roundworm populations. This may be the result of its adaptation to host diet. Experiments have shown that the content of fatty acids such as palmitic acid, palmitoleic acid, stearic acid and oleic acid of the parasite were almost the same as that of the speci c host and the changes in the ratio of fatty acids tend to be synchronized [19]. Immunological evasion could be the major purpose. This notion was supported by our ndings in horse roundworms, which showed a strong selection of genes involved in lipid synthesis. Obviously, in addition to nutrition and maintaining physiological integrity, a more important purpose was possible to keep consistent with the host's various fatty acid patterns. The evolution of parasites in regulating lipid composition may be a factor in uencing host suitability [20]. The difference was that we believed the consistency in this case was more likely related to the host's intestinal and surrounding lipid deposition rather than the total lipid ratio.
The issue of drug resistance has been widely mentioned over the past two decades. Parasites have strong adaptability, and roundworms lay more than 200,000 eggs per day [21], which is destined to have a su cient mutational basis to resist any environmental changes. The problem of drug resistance has brought signi cant economic losses to industries such as animal husbandry, and the control of parasites has also become an important expenditure [22]. In our results, we found that some resistancerelated genes such as β-tubulin, glc-1, pgp-3, mrp-6, cup-4, nrf-6 and CYP family, were under strong positive selection in different populations. These genes are respectively related to multiple anthelmintics, which may be related to their history of deworming. Over time, the gene frequency of these genes will increase signi cantly in the population and attention should be paid to the impact of this selection on species evolution. Currently, in addition to anthelmintics, vaccines also offer an attractive alternate control strategy against these parasites [23]. Despite the wealth of methods, the response of the parasites was amazing. When we focus on the issue of drug resistance, we should worry more about the impact of this strong selection on species evolution. Compared with non-human interference natural selection, the effect of anthelmintics is undoubtedly more direct and stronger, and the consequences of this choice are di cult to predict. This suggests that we should be cautious in dealing with the issue of drug resistance and adopt more scienti c strategies. It should be noted that P. univalens are highly adaptable and evolve quickly. As the gene frequency of certain drug-related genes in speci c populations further increases, it was likely that the P. univalens will differentiate into new subpopulations with speci c phenotypes. Long-term use of anthelmintics will become a potential driver for the differentiation of P. univalens.
This work provides a reference for monitoring the evolution of parasites and revealing the driving factors of evolution under the action of natural selection and drugs. It should be noted that due to the limitation of sample size, some accidental or other environmental factors cannot be fully considered for natural selection in P. univalens. We only considered two types of extremely signi cant factors (host's diet and anthelmintics), and other selection effects still need to be veri ed in a wider range of P. univalens in the future. It is foreseeable that the impact of human intervention in the host-parasite interaction will become greater and greater, and how to nd a balance point will still be a major challenge in the future.

Sample collection
Seventeen roundworms were collected from six naturally infected horses (E. caballus) treated with anthelmintics in a farm located in the Ordos, Inner Mongolia, China. While, two horse roundworm individuals were collected after anthelmintic treatment from a farm in Harbin, Heilongjiang, China. Five roundworms from three donkey (E. asinus) were collected after anthelmintic treatment from a farm in Chifeng, Inner Mongolia, China. Eighteen roundworms from ten zebra (E. zebra) were obtained after anthelmintics treated from Harbin Northern Forest Zoo, Heilongjiang, China. All specimens were washed extensively in sterile physiological saline (37 ℃), snapfrozen and transported with dry ice and then stored at -80 ℃ until further use.

Karyotyping
We performed karyotyping on the collected samples. Worms were carefully dissected and the gonads located and excised. The gonads were then processed for karyotyping as previously described [24]. Using a modi ed freeze-crack method to permeabilize and x embryos. Brie y, the embryos were immersed in KCl (0.075 M) hypotonic 5 min, then rinse with methanol/acetic (3:1) acid solution. Next, drip 45% acetic acid on the siliconized coverslip. After pressurizing for about 60 seconds, put the slider in liquid nitrogen and freeze for 1-2 minutes. Staining was carried out with 4′,6-diamidino-2-phenylindole (DAPI) for 5 min, and the slides were then examined under a uorescence microscope.

Nucleic acid isolation, library construction and sequencing
Total genomic DNA was isolated using a sodium dodecyl sulphate/proteinase K digestion [25] followed by phenol-chloroform extraction and ethanol precipitation. Genomic DNA was sheared into 200-800 bp for paired-end libraries preparation according to the manufacturer's instructions of the DIPSEQ platform (BGI-Shenzhen, Shenzhen, China). Libraries were then subjected to the DIPSEQ-T1 sequencer for short-read whole genome sequencing (WGS) sequencing (Table S1).

Read mapping and SNP calling
High-quality reads were aligned to the P. Diversity analysis SNP density was counted with a 10 kb sliding window using VCFtools (v0.1.13) software [29]. Genome-wide nucleotide diversity (π) and Tajima's D were computed by sliding windows of 1 kb using all individuals in each population using VCFtools (v0.1.13). The Weir-Cockerham xation index (Fst) was estimated among three populations with genotype data using the VCFtools (v0.1.13).
Phylogenetic relationship, genetic structure and admixture Principal component analysis (PCA) was carried out using EIGENSOFT [30] software base on the SNP dataset, and the population clustering analysis was conducted in PLINK (v1.90b6.10) [31]. We used genome-wide SNPs to construct the maximum likelihood (ML) phylogenetic tree with 1000 bootstraps using iqtree (v1.6.12) [32]. The genome sequence of Baylisascaris schroederi was selected as an outgroup. Population structure was analyzed using the ADMIXTURE (v1.3.0) program with a block-relaxation algorithm. To explore the convergence of individuals, we prede ned the number of genetic clusters of K from 2 to 4.
We investigated the relationships within P. univalens populations in a coalescent framework with SNAPP implemented in BEAST v2.6.3 [33]. We performed two independent runs with a chain length of 10,000,000 generations, sampling every 1,000 generations. We examined convergence using TRACER v1.7.1 and created a maximum clade credibility tree after a burn-in of 20% via TREEANNOTATOR [34]. According to epidemiological investigation, we assumed that the average generation time of P. univalen sroundworms was 0.17 years, and converted the SNAPP analyses into units of real-time using a mutation rate (μ) of 9×10-9 per generation per site.
Estimates of the effective population size and divergence time The pairwise sequentially Markovian coalescent (PSMC) method [35] was used to evaluate the dynamic change of effective population size (Ne) of each population. We used 0.17 years per generation (g) and mutation rate (μ) of 9×10-9 per generation per site to rescale the time to year [36]. More recent (within 1000 years) changes of effective population size of each population and separation time between different populations were further estimated by using the multiple sequentially Markovian coalescent (MSMC2) [37], which can much compensate results from PSMC. However, the inference accuracy of MSMC2 largely depends on the phasing accuracy of genotypes. Switch error rates will introduce bias in the calculation. To further con rm results from MSMC2, we also  [39], then the calculation was performed with the following parameters: -i 20 -t 6 -p '10*1+15*2'. The mutation rate (μ) of P. univalens for SMC++ and MSMC2 were used the same values as for PSMC.

Demographic inference using fastsimcoal2 and δaδi
We used the fastsimcoal2 [40] approach to deduce the recent demographic history of P. univalenspopulations. We chose only SNPs located in intergenic regions to avoid the in uence of SNPs under selection [41]. We used δaδi [42] to investigate alternative demographic scenarios for the species complex. In the absence of historical evidence, we hypothesized that there may or may not be any form of gene ow between roundworm populations. In order to get the best model, we rst simulated a total of ten δaδi 3D models, including one simple model, three simultaneous splitting models, one ancient migration model, one simultaneous splitting variations model, one admixed ("Hybrid") origin model and three divergences with gene ow variations models. When tting demographic models, we perform multiple runs (100 rounds) and ensure that nal optimizations are converging on a similar loglikelihood score. To estimate demographic parameters, the derivative-based BFGS algorithm was used to optimize the composite loglikelihood. All models and scripts are available at https://github.com/dportik/dadi_pipeline.

Recent nature selection analysis
Extended Haplotype Homozygosity (EHH) and Integrated Haplotype Score (iHS) methods were used for detecting SNPs under recently positive selection of three roundworm populations [43]. We use SNPs with an iHS score of top 0.5% and the distance between adjacent SNPs < 50 kb as candidate SNPs [44]. We searched for genes in the 5-kb anking region from both sides of candidate SNPs, and calculate the accumulated iHS scores by adding all iHS scores of candidate genes. Next, to uncover genetic variants under strong positive selection in each host population, we used cross population extended haplotype homozygosity (XP-EHH) method on each pair of combination (PEc vs PEz, PEc vs PEa and PEz vs PEa) to nd population-speci c SNPs under strong positive selection. XP-EHH we used in this study was from the R package rehh (v3.1.2; https://cran.r-project.org/web/packages/rehh/vignettes/rehh.html).
The regions with P values < 0.01 were considered signi cant signals in the population of interest.
Conclusions P. univalens is the main parasitic pathogen that infects equine, and it is also the chief culprit in horses' weight loss and weakened immunity. The genetic variation and host differences complicate the development of broad-spectrum diagnostics, therapeutic. Here we report the recent divergence of P. univalens and reveal that the rapid evolution of glycolysis-related genes drove this divergence. It is also a key factor leading to the parasitic preference of roundworm populations. In addition, we found that resistance-related genes have similar tendency, which was the potential impact of overuse of anthelmintics. We have established a rapid evolution gene set of