Comparative Genomic Insights into the Longhorned Tick, Haemaphysalis longicornis


Background: The longhorned tick, Haemaphysalis longicornis Neumann, is widely distributed across temperate regions. It parasitizes terrestrial vertebrates, including birds and a large number of mammals, and is a concern for human and animal health, notably for its potential to transmit infectious agents.
Methods: A genome survey was performed using GenomeScope v1.0.0 with a maximum k-mer coverage cutoff of 1,000. The non-redundant assembly was polished with Illumina short reads using two rounds of NextPolish v1.1.0. Genome completeness was assessed using the BUSCO v3.0.2 pipeline against the arthropod gene set (n = 1,066). Ab initio predictions were generated using BRAKER v2.1.5. Transcriptomic reads were mapped to the genome with HISAT2 v2.2.0 and assembled with StringTie v2.1.2. Gene functions were assigned against the UniProtKB database using Diamond v0.9.24. Orthogroups of 16 Chelicerata species were inferred using OrthoFinder v2.3.8, and gene family evolution was estimated using CAFE v4.2.1. Gene families related to digestion and detoxification, i.e. cytochrome P450 (CYP), carboxyl/cholinesterase (CCE), glutathione-S-transferase (GST) and ATP-binding cassette (ABC) transporter, were annotated by searching the genome assembly.
Results: The final genome assembly has a size of 3.12 Gb and a scaffold N50 of 1.09 Mb, and captures 92.4% of the BUSCO gene set (n = 1,066). The genome architecture of the longhorned tick resembles that of another tick, Ixodes scapularis (Say), particularly in its large size, highly repetitive DNA (~65%) and number of protein-coding genes (21,550). We also identified 5,601 non-coding RNAs with a high proportion of tRNAs (4,271). Gene family evolution analysis revealed 350 rapidly evolving gene families. Functional enrichment analyses of Gene Ontology (GO) terms and KEGG pathways indicated that the 255 families with significant expansions are mainly involved in cuticle synthesis, digestion and detoxification.
Conclusions: The new genome assembly, annotation and comparative genomic analyses provide a valuable resource for insights into the parasitic lifestyle of the longhorned tick.

Despite the medical importance of ticks, there are major gaps in the knowledge of tick biology, and methods to control ticks and the diseases they transmit are currently limited [12]. Genomic information will greatly contribute to studies of vectors and their interactions with pathogens and hosts. Large genome sizes combined with small body sizes create great difficulties in genome sequencing, assembly and annotation, and in downstream applications. The existing draft genome assembly of H. longicornis reached a size of 7.36 Gb with a very high level of redundancy, indicated by the 79% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set that was complete and duplicated; in addition, only repetitive elements were reported, without any information on protein-coding genes or comparative analyses [13].
These limitations severely hinder our understanding of the biology of the longhorned tick.
We present a new version of the longhorned tick genome assembly from which redundant sequences have been removed, annotate the repeats, non-coding RNAs (ncRNAs) and protein-coding genes, and compare gene family evolution across the main Chelicerata lineages, with particular attention to several families related to dietary detoxification.

Annotation of gene families associated with detoxification
We annotated gene families related to digestion and detoxification, i.e. cytochrome P450 (CYP), carboxyl/cholinesterase (CCE), glutathione-S-transferase (GST) and ATP-binding cassette (ABC) transporter, by searching both the genome assembly and the automatically predicted gene models. Acari proteins were downloaded from the NCBI RefSeq database as the reference. We conducted TBLASTN- and BLASTP-like searches using MMseqs2 Release 8 [48] with four rounds of iterative searches. Candidate target genes were aligned to the reference protein sequences to manually check intron/exon boundaries. The resulting proteins were examined using a HMMER3 search against the Pfam database to confirm the specific domains, and using BLASTP against the non-redundant GenBank database to verify the classification.
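The domain-confirmation step can be illustrated with a minimal sketch. It assumes the Pfam search was run with hmmscan and written in the tabular `--tblout` format; the marker accessions chosen for each family, the E-value cutoff and the gene identifiers in the example are illustrative assumptions, not values taken from this study.

```python
# Pfam accessions assumed here as family markers (not stated in the text).
FAMILY_DOMAINS = {
    "CYP": {"PF00067"},             # Cytochrome P450
    "CCE": {"PF00135"},             # Carboxylesterase family
    "GST": {"PF02798", "PF00043"},  # GST N- and C-terminal domains
    "ABC": {"PF00005"},             # ABC transporter ATP-binding domain
}


def confirm_domains(tblout_text, family, max_evalue=1e-5):
    """Return the query proteins with a hit to a marker domain of `family`.

    `max_evalue` is an assumed cutoff on the full-sequence E-value.
    """
    wanted = FAMILY_DOMAINS[family]
    confirmed = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # hmmscan --tblout columns: target name, target accession,
        # query name, query accession, full-sequence E-value, ...
        accession = fields[1].split(".")[0]  # drop the Pfam version suffix
        query = fields[2]
        evalue = float(fields[4])
        if accession in wanted and evalue <= max_evalue:
            confirmed.add(query)
    return confirmed


# Hypothetical hits for three made-up gene models.
example = """\
# target name  accession  query name  accession  E-value  score  bias
p450           PF00067.25 HL_g001     -          1.2e-80  270.1  0.0
ABC_tran       PF00005.30 HL_g002     -          3.4e-30  105.2  0.1
p450           PF00067.25 HL_g003     -          0.5      10.0   0.0
"""
print(sorted(confirm_domains(example, "CYP")))
```

In this toy input only the first gene model passes as a CYP candidate; the third has a P450 hit but fails the assumed E-value cutoff.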

Genome assembly and annotation
We estimated a genome size of ~3.09 Gb, a heterozygosity rate of 3.54-3.65% and a repetitive content of 65%. The left and right peaks of the k-mer profile implied that the genome has high levels of heterozygosity and repetition (Figure S1). The final assembly had 6,490 scaffolds and 7,059 contigs, a total length of 3.12 Gb and scaffold/contig N50 lengths of 1.09/1.05 Mb. The assembled size was very close to the estimated one.
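For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows how it is computed from a list of sequence lengths, and how closely the assembled size (3.12 Gb) matches the k-mer estimate (~3.09 Gb). The toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Toy example: total is 300, and the two longest sequences reach 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))

# Relative difference between assembled and estimated genome size, in percent.
assembled_gb, estimated_gb = 3.12, 3.09
size_diff_percent = (assembled_gb - estimated_gb) / estimated_gb * 100
print(round(size_diff_percent, 2))
```

With the reported values the assembled size exceeds the estimate by roughly 1%.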
Genome completeness assessment against the arthropod dataset (n = 1,066) [...] (Figure 1a, Table S3). In H. longicornis, 19,211 (91.91%) genes were clustered into 9,639 gene families; 850 families and 4,008 genes were species-specific. The phylogenetic tree clustered H. longicornis with I. scapularis, both representing Ixodidae, which originated in the early Cretaceous (Figure 1a). This suggests that the emergence of parasitic ixodids may be related to the prevalence of reptiles, birds and mammals in the Cretaceous.
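Per-species counts of this kind can be derived from an orthogroup gene-count table. The sketch below assumes a tab-separated layout like that of OrthoFinder's Orthogroups.GeneCount.tsv (an `Orthogroup` column, one column per species, and a `Total` column); the species labels and counts are hypothetical, and genes left outside any orthogroup would need to be counted separately.

```python
import csv
import io


def summarize_species(genecount_tsv, species):
    """Count clustered genes and orthogroups for one species in a gene-count table."""
    reader = csv.DictReader(io.StringIO(genecount_tsv), delimiter="\t")
    others = [c for c in reader.fieldnames
              if c not in ("Orthogroup", "Total", species)]
    genes = families = specific_families = specific_genes = 0
    for row in reader:
        n = int(row[species])
        if n == 0:
            continue  # this orthogroup has no member from the species
        genes += n
        families += 1
        if all(int(row[o]) == 0 for o in others):
            # no other species contributes a gene: species-specific orthogroup
            specific_families += 1
            specific_genes += n
    return {
        "clustered_genes": genes,
        "families": families,
        "specific_families": specific_families,
        "specific_genes": specific_genes,
    }


# Hypothetical table with three species columns.
rows = [
    ["Orthogroup", "Hlon", "Isca", "Tetur", "Total"],
    ["OG0000000", "5", "3", "2", "10"],
    ["OG0000001", "4", "0", "0", "4"],
    ["OG0000002", "0", "2", "1", "3"],
    ["OG0000003", "1", "1", "0", "2"],
]
toy_table = "\n".join("\t".join(r) for r in rows) + "\n"
print(summarize_species(toy_table, "Hlon"))
```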
CAFE identified 350 rapidly evolving gene families, of which 255 experienced significant expansions and 95 significant contractions (Figure 1a). The largest expanded families are shown in Figure 1b and Table S4. Many of them are related to dietary digestion and detoxification or to cuticle synthesis, such as ABC transporter, cytochrome P450, carboxylesterase, tick histamine binding protein, insect cuticle protein, putative secreted salivary gland peptide, juvenile hormone acid O-methyltransferase, secretin family etc. GO (Figure 1c) and KEGG (Figure 1d) enrichment analyses further confirmed this, highlighting various biological processes and pathways, for example ABC transporters, insect hormone biosynthesis, fatty acid elongation and biosynthesis, fat digestion and absorption, and ovarian steroidogenesis (Figure 1d). The KEGG pathways of toxoplasmosis and amoebiasis may relate to the parasitic life of the longhorned tick. These findings, consistent with adaptation to a parasitic life, are very similar to those reported for another ixodid tick, I. scapularis [49].
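Enrichment analyses of this kind are commonly based on a one-sided hypergeometric test. The text does not state which test or software was used for the GO and KEGG analyses, so the following is only a sketch of the general idea with toy numbers, not a description of the method applied here.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the term.

    k: genes in the test set annotated with the term
    n: size of the test set (e.g. genes in expanded families)
    K: genes in the background annotated with the term
    N: size of the background
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 2 of 4 selected genes carry a term found in 3 of 10 background genes.
p_value = hypergeom_enrichment_p(k=2, n=4, K=3, N=10)
print(p_value)
```

In practice the resulting p-values would also be corrected for multiple testing across all terms or pathways examined.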

Gene families related to detoxification
Digestion and detoxification functions are important for the parasitic processes unique to ticks, particularly blood-meal feeding, haemoglobin digestion, haem detoxification and prolonged off-host survival. We compared four dietary detoxification-related gene families in three tick and mite species. The ABC, P450 and CCE families showed large expansions in the longhorned tick genome (Table 2). Expansions of ABCs occurred in the ABCA and ABCC subfamilies, which include five and three large (≥ 5 orthologs) clusters on the phylogenetic tree, respectively (Fig. 2a). ABCA transporters function in lipid transport and resistance in insects [50][51][52]. ABCC transporters, also known as multidrug resistance proteins (MRPs), are known to be involved in ion transport, signal transduction and toxin secretion [53]. Major P450 expansions in the two tick species were found in clan 3 and clan 4 (Table 2); H. longicornis had three large clan 3 clusters and three large clan 4 clusters on the tree (Fig. 2b). P450 clan 3 and clan 4 may be linked to xenobiotic metabolism, insecticide resistance, and odorant or pheromone metabolism [54]. GST expansions in the two ticks occurred mainly in the Mu class (Figure S2), which may be active in drug metabolism, particularly in the detoxification of reactive oxygen species (cyclised o-quinones) produced via oxidative metabolism of catecholamines [55,56]. CCEs in H. longicornis showed extreme expansions (Table 2) in the neuro/developmental class, although the classification of some members remains unclear in Chelicerata [57]. These large expansions in the longhorned tick genome are likely to enhance its capacity for xenobiotic metabolism and insecticide resistance, and are thus considered to contribute to parasitic adaptation.

Competing interests
The authors declare that they have no competing interests.

Supplementary Files
This is a list of supplementary files associated with this preprint. supplementarytables.xlsm