DOI: https://doi.org/10.21203/rs.3.rs-49223/v1
Background: The longhorned tick, Haemaphysalis longicornis Neumann, is widely distributed across temperate regions. It can parasitize terrestrial vertebrates, including birds and a large number of mammals. They are a concern in human and animal health notably for their potential to transmit infectious agents.
Methods: Genome survey was investigated using GenomeScope v1.0.0 with a maximum k-mer coverage cutoff of 1,000. Non-redundant assembly was polished with Illumina short reads using two rounds of NextPolish v1.1.0. Genome completeness was assessed using BUSCO v3.0.2 pipeline analyses against arthropod gene set (n = 1, 066). Ab initio predictions were generated using BRAKER v2.1.5. Transcriptomic reads were mapped to the genome with HISAT2 v2.2.0 and assembled with StringTie v2.1.2. Gene functions were assigned against UniProtKB database using Diamond v0.9.24. Orthogroups of 16 Chelicerata species were inferred using OrthoFinder v2.3.8 and gene family evolution was estimated using CAFÉ v4.2.1. Gene families related to digestion and detoxification, i.e. cytochrome P450 (CYP), carboxyl/cholinesterase (CCE), glutathione-S-transferase (GST), ATP-binding cassette (ABC) transporter were annotated by searching in the genome assembly.
Results: The final genome assembly has a size of 3.12 Gb, a scaffold N50 of 1.09 Mb, and captured 92.4% of the BUSCO gene set (n=1,066). Genome architecture pattern of the longhorned tick resembles another tick, Ixodes scapularis (Say), particularly in large size, highly repetitive DNA (~65%) and protein-coding genes (21,550). We also identified 5,601 non-coding RNAs with a high ratio of tRNAs (4,271). Gene family evolution revealed 350 rapidly evolving gene families. Combining function enrichment analyses of gene ontology (GO) and KEGG pathway, 255 families experiencing significant expansions mainly involves in cuticle synthesis, digestion and detoxification.
Conclusions: The new genome assembly, annotation and comparative genomic analyses provide a valuable resource for insights into parasitic life mode of the longhorned tick.
Haemaphysalis longicornis (Acari: Ixodidae) is obligate hematophagous ectoparasites of vertebrates, which is widely distributed in China, Korea, Japan, Australia, New Zealand and Pacific Islands [1–5]. Hosts of this tick include a wide variety of mammals such as cattle, sheep, dogs, and birds. As one of the three-host ticks, H. longicornis take a single blood meal at larvae, nymphs and adult stage, respectively. In addition to causing anemia, weight loss, slow growth, and even the release of toxin that can cause death of hosts, H. longicornis also can transmit various zoonotic pathogens during the blood feeding. It has been reported that longhorned tick acts as vector of Babesia ovate, Anaplasma bovis, Theileria orientalis, Coxiella burnetii, Ehrlichia chaffeensis, Rickettsia japonica, as well as causal agent of the emerging infectious disease– severe fever with thrombocytopenia syndrome (SFTS) [6–11].
Despite the medical importance, there are major gaps in the knowledge of tick biology and methods to control ticks and the diseases they transmit are currently limited [12]. Genomic information will greatly contributes to the studies of vectors and its interactions with pathogens and hosts. Large genome sizes but small body sizes bring great difficulties in genome sequencing, assembly and annotation, and downstream applications. The present draft genome assembly of H. longicornis reached a size of 7.36 Gb with a very high ratio of redundancy, indicated by the 79% complete and duplicated BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set; in addition, only repetitive elements were simply reported without any information of protein-coding genes and other comparative analyses [13]. These limits severely hinder our understanding of the biology of the longhorned tick.
We present a new version of the longhorned tick genome assembly by removing redundant sequences, annotate the repeats, non-coding RNAs (ncRNAs) and protein-coding genes, and compare gene family evolution across the main Chelicerata lineages, particularly several families related to dietary detoxification.
The available assembly (GCA_008122185.1) and raw PacBio (SRR9226158) and Illumina (SRR9226159) sequencing data were used for subsequent analyses. Pooled adults were collected for transcriptome sequencing to assist genome annotation. RNA extraction, library preparation and sequencing were carried out at Berry Genomics (Beijing, China). Genome survey was investigated using GenomeScope v1.0.0 [14] with a maximum k-mer coverage cutoff of 1,000. The k-mer frequencies were estimated with 21-mers using khist.sh (one of the BBTools v38.49 modules, [17]). Redundant heterozygous regions were removed with an identity cutoff of 60 using two rounds of Purge_dups v1.0.0 [16]. Non-redundant assembly was polished with Illumina short reads using two rounds of NextPolish v1.1.0 [17]. Genome completeness was assessed using BUSCO v3.0.2 pipeline [18] analyses against arthropod gene set (n = 1,066).
We generate a custom repeat library by combining Dfam_3.0 [19], RepBase-20181026 databases [20], and ade novo species specific library, which was constructed using RepeatModeler v2.0.1 [21]. RepeatMasker v4.0.9 [22] was used to mask repeats in the genome assembly upon the custom library. Non-coding RNAs were identified using Infernal v1.1.2 [23] and tRNAscan-SE v2.0 [24].
We employed MAKER v2.31.10 pipeline [25] to conduct gene prediction by integrating ab initio, transcriptome- and protein homology-based evidence. Ab initio predictions were generated using BRAKER v2.1.5 [26], which employed Augustus v3.3.2 [27] and GeneMark-ES/ET/EP v4.48_3.60_lic [28] based on transcriptomic data and OrthoDB [29]. Transcriptomic reads were mapped to the genome with HISAT2 v2.2.0 [30] and assembled with StringTie v2.1.2 [31]. Protein sequences of Drosophila melanogaster, I. scapularis, Tetranychus urticae, Stegodyphus mimosarum, Strigamia maritima, Daphnia pulex and Varroa destructor were downloaded from the NCBI or Ensembl supporting the protein homology. We assigned gene functions using Diamond v0.9.24 [32] against UniProtKB database with the parameter ‘--more-sensitive -e 1e-5’. We also annotated protein domains, Gene Ontology (GO) and pathways (KEGG, Reactome) using eggNOG-mapper v2.0 [33] against eggNOG v5.0 database [34] and InterProScan 5.41-78.0 [35] against Pfam [36], PANTHER [37], Gene3D [38], Superfamily [39], and CDD [40] databases.
We inferred orthogroups of 16 Chelicerata species using OrthoFinder v2.3.8 [41]: Centruroides sculpturatus, Dermatophagoides pteronyssinus, Galendromus occidentalis, H. longicornis, I. scapularis, S. mimosarum, Tachypleus tridentatus, Tetranychus urticae, V. destructor. Phylogenetic tree was reconstructed using IQ-TREE v1.6.10 [42] with the partitioned strategy (‘-m MFP --mset LG --msub nuclear --rclusterf 10’) based on single-copy orthologs inferred from OrthoFinder. Protein sequences were aligned using MAFFT v7.394 [43] with the L-INS-I strategy, trimmed using trimAl v1.4.1 [44] with the heuristic method ‘automated1’, and concatenated using FASconCAT-G v1.04 [45]. Resulting tree was dated using r8s (Sanderson 2003) with calibration information extracted from the PBDB database (https://www.paleobiodb.org/navigator/). Gene family evolution was estimated using CAFÉ v4.2.1 [46]. GO and KEGG function enrichment for those significantly expanded families were analyzed using R package clusterProfiler v3.10.1 [47].
We annotated gene families related to digestion and detoxification, i.e. cytochrome P450 (CYP), carboxyl/cholinesterase (CCE), glutathione-S-transferase (GST), ATP-binding cassette (ABC) transporter by searching in the genome assembly and automatically predicted gene models. Acari proteins were downloaded from NCBI RefSeq database as the reference. We conducted TBLASTN- and BLASTP-like searches using MMseq2 Release 8 [48] with four rounds of iterative searches. Candidate target genes were aligned to reference protein sequences to manually check intron/exon boundaries. The resulting proteins were examined using HMMER3 search in the Pfam database to confirm specific domains and using BLASTP in the non-redundant GenBank database to verify the classification.
We estimated a genome size of ~ 3.09 Gb, a heterozygosity rate of 3.54‒3.65% and a repetitive content of 65%. The left and the right peaks implied that the genome may have high levels of heterozygosity and repetition (FigureS1).The final assembly had 6,490 scaffolds/7,059 contigs, a total length of 3.12 Gb and scaffold/contig N50 length of 1.09/1.05 Mb. Assembled size was very close to the estimated one. Genome completeness assessment against arthropod dataset (n = 1,066) (Table 2) revealed our assembled version covered 92.4% complete, 8.9% complete and duplicated, 1.1% fragmented, and 6.5% missing BUSCO genes.
A total of 64.81% (2.02 Gb) repetitive elements were identified in the longhorned tick genome. The top five abundant repeat categories were Unclassified (21.10%), LTR (18.37%), LINE (18.31%), DNA elements (2.99%), and SINE (2.23%) (Table S1). We identified 5,601 ncRNAs: 191 rRNAs, 65 miRNAs, 820small nuclear RNAs (snRNAs), 2 long non-coding RNAs (lncRNAs), 8 small RNAs (sRNAs), 4,271 tRNAs (22 isotypes), and 244 other ncRNAs (Table S2). SnRNAs were classified into 771 spliceosomal RNAs (U1, U2, U4, U5, U6, U11), three minor spliceosomal RNAs (U12, U4atac, U6atac), and six H/ACA box and 40 C/D box snoRNAs. We predicted 21,550 protein-coding genes with the mean length of genes, exons and introns as 22,231.43, 262.16 and 3,451.43 bp, respectively, and 91.8% BUSCO completeness (Table 1). InterproScan and eggNOG functional annotations assigned protein domains of 17,963 (83.35%) genes, 14,464 GO terms, 10,903 KEGG ko terms, 4,148 Enzyme Codes, 6,788 KEGG and 4,083 Reactome pathways, and 15,557 COG categories, respectively.
Elements |
Current version |
---|---|
Genome assembly |
|
Assembly size (Mb) |
3,117.76 |
Number of scaffolds |
6,490 |
Number of contigs |
7,059 |
Longest scaffold (Mb) |
8.69 |
Longest contig (Mb) |
8.69 |
N50 scaffolds length (Mb) |
1.09 |
N50 contig length (Mb) |
1.05 |
GC (%) |
47.50 |
Gaps (%) |
0.003 |
BUSCO completeness (%) |
92.4 |
Gene annotation |
|
Protein-coding genes |
21,550 |
Mean protein length (aa) |
437.15 |
Mean gene length (bp) |
22,231.43 |
Exons per gene |
7.02 |
Exon (%) |
0.013 |
Mean exon length |
262.16 |
Intron (%) |
14.09 |
Mean intron length |
3,451.43 |
BUSCO completeness (%) |
91.8 |
A total of 162,773 (88.24%) genes were assigned into 18,611 gene families; among them, 3,482 families were shared by all nine species and 470 are single-copy ones; 143 families and 1,075 orthologs are common to four Parasitiformes species (Figure 1a, Table S3). In H. longicornis, 19,211 (91.91%) genes were clustered into 9,639 gene families; 850 families and 4,008 genes were species-specific. Phylogenetic tree clustered H. longicornis and I. scapularis, both representing Ixodidae originated from early Cretaceous (Figure 1a). It indicated that the emergence of parasitic ixodids may be related to the pervasive reptiles, birds, mammals in Cretaceous.
CAFÉ identified 350 rapidly evolving gene families, 255 and 95 of them experienced significant expansions and contractions, respectively (Figure1a). The largest expanded families were shown in Figure 1b and Table S4. Many of them are related to dietary digestion and detoxification, cuticle synthesis, such as ABC transporter, Cytochrome P450, carboxylesterase, tick histamine binding protein, insect cuticle protein, putative secreted salivary gland peptide, juvenile hormone acid O-methyltransferase, secretin family etc. GO (Figure 1c) and KEGG (Figure 1d) enrichment further confirmed it, involving various biological progress or pathways, for example, ABC transporters, insect hormone biosynthesis, fatty acid elongation and biosynthesis, fat digestion and absorption, and ovarian steroidogenesis (Figure 1d). KEGG pathways of toxoplasmosis and amoebiasis may relate to parasitic life of the longhorned tick. These findings adapted to the parasitic life are very similar to another ixodid tick I. scapularis [49].
Digestion and detoxification function are important for parasitic progress unique to ticks, particularly feeding of blood meal, haemoglobin digestion, haem detoxification and prolonged off-host survival. We compared four dietary detoxification-related gene families in three tick and mite species. ABC, P450 and CCE families showed large expansions in the longhorned tick genome (Table 2). Expansions of ABCs occurred in the ABCA and ABCC subfamilies, which includes five and three large (≥ 5 orthologs) clusters on the phylogenetic tree (Fig. 2a). ABCA transporters function in lipid transport and resistance in insects [50–52]. ABCC transporters, also known as multidrug resistance proteins (MRPs), are known to be involved in ion transport, signal transduction, and toxin secretion [53]. Major P450 expansions of two tick species were discovered in clan 3 and clan 4 (Table 2), which had three clan 3 and three clan 4 large clusters on the tree for H. longicornis (Fig. 2b). P450 clan 3 and clan 4 may be linked to xenobiotic metabolism, insecticide resistance, odorant or pheromone metabolism [54]. GST expansions of two ticks mainly occurred in Mu class (Figure S2), which may activate in drug metabolism, particularly in the detoxification of reactive oxygen species (cyclised o-quinones) produced via oxidative metabolism of catecholamines [55, 56]. CCE in H. longicornis showed extreme expansions (Table 2) in neuro/developmental class although classifications of some members were unclear in Chelicerata [57]. These large expansions in the longhorned tick genome greatly enhance abilities in xenobiotic metabolism and insecticide resistance, and thus are considered to contribute to the parasitic adaptation.
Families |
Clan/Group |
H.longicornis |
I. scapularis |
Tetranychusurticae |
---|---|---|---|---|
P450 |
Clan 2 |
50 |
65 |
45 |
Clan 3 |
99 |
100 |
10 |
|
Clan 4 |
45 |
33 |
25 |
|
Mitochondrial |
10 |
7 |
5 |
|
Total |
204 |
205 |
85 |
|
GST |
Delta |
18 |
17 |
19 |
Kappa |
2 |
1 |
1 |
|
Mu |
17 |
23 |
10 |
|
Sigma |
1 |
1 |
0 |
|
Zeta |
1 |
7 |
1 |
|
Omega |
5 |
5 |
2 |
|
Microsomal |
5 |
1 |
0 |
|
Total |
49 |
55 |
33 |
|
CCE |
Total |
155 |
93 |
71 |
ABC |
A |
82 |
25 |
8 |
B |
6 |
5 |
2 |
|
C |
57 |
38 |
37 |
|
D |
4 |
2 |
2 |
|
E |
1 |
1 |
1 |
|
F |
3 |
3 |
4 |
|
G |
19 |
13 |
33 |
|
Total |
174 |
87 |
87 |
The genome research of H. longicornis will facilitate a number of applications including understanding its unique biology and enable the identification of unique tick genes and physiological processes such as blood-meal digestion, haemoglobin digestion, haem detoxification and prolonged off-host survival that could be exploited for tick control. Meanwhile, genome data could offers insights into the interactions of ticks and pathogen and contribute to the designing novel strategies for blocking pathogen transmission and promote the development of tick vaccines.
Ethical Approval and Consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and materials
Raw transcriptomic data and genome assembly have been deposited at CNGBdb under the accessions CNX0199354 and CNA0014631, respectively, corresponding to the project CNP0001175 and sample CNS0254300. Genome annotations are available at the Figshare under the link doi:10.6084/m9.figshare.12662036.
Competing interests
The authors declare that they have no competing interests.
Funding
This research was supported by National Natural Sciences Foundation of China (No. 81871686); and Development Plan Project of Shandong province science and technology (No. 2017GSF221017).
Author Contributions
L.-R.Z. and Z.-Z designed the study. L.-R.Z. and Q.-Z collected the samples. L.-R.Z. and Z.-Z performed the analyses and wrote the paper. All authors edited and approved the final manuscript.
Acknowledgments
We gratefully acknowledge all researchers who submitted the related genome data of ticks and other 14 Chelicerata species.