Transcriptome Assembly of Asian Buffalo Leech Hirudinaria Manillensis and An Accurate Identication of Genes Involved in Anticoagulant Biosynthesis

Background: The Asian buffalo leech Hirudinaria manillensis is a valuable animal widely used in traditional Chinese medicine to cure blood-clotting. However, although the content of the anticoagulant genes in H. manillensis were identied, their actual expression prole remains undetermined. Herein, in this study we conducted transcriptome analysis on this species to address this subject. Results: A high qualied accurate gene expression prole of H. manillensis transcripts was obtained. By identifying genes upregulated during feeding, anticoagulant-related genes and signaling pathways were identied. The assembled Unigenes were compared in various mainstream databases, and a total of 16,682 Unigenes received annotation information. In addition, the Unigenes were evaluated in terms of length distribution, GC content, and expression level. The data showed good sequencing quality and high reliability. A total of 155 anticoagulant-related genes were found in this study, including those involved in different types of degradation of the extracellular matrix and inhibition of platelet aggregation. Conclusions (cid:0) Substantial transcriptome information was obtained by transcriptome sequencing of H. manillensis. This information should help to provide a further molecular theoretical basis for functional gene analysis, genomics, genetic diversity analysis, and molecular marker development of H. manillensis and for the anticoagulation-related genes of this species and its medicinal value as an anticoagulant. manillensis and two that were downregulated. The results show that destabilase plays a role in the anticoagulant and antibacterial processes upon the sucking of blood by H. manillensis.

can supplement the information in genetic databases on this species and strongly promote related research on other leeches. The identi ed anticoagulant-related genes can also be investigated by others. Related research on bloodsucking animals provides a reference for promoting the development and utilization of medicines by humans.

Results
Transcriptome assembly A total of 41.80 Gb of clean data was obtained from the transcriptome sequencing analysis of six transcriptome samples of H. manillensis. The clean data of each sample reached 6.88 Gb, the expected coverage exceeded 200×, and the Q20 base percentage was over 94.90%. The Q30 base percentage was over 88.83%. Clean reads were subjected to ltering out of the joints with the joints, as well as the removal of lowquality reads (unknown N > 1% and quality values Q < 15 reads), so there was no need to further use Trimmomatic or remove other impurities. Sequencing statistics of all six transcriptome samples are shown in Table 1. Note: Sample ID: sample name ST refers to before blood sucking, while AF refers to after blood sucking.
Effective rate: The amount of valid data, the ratio of clean reads to raw reads. Q20: The number of reads achieving a sequencing error quality value of less than 1%. Q30: The number of reads achieving a sequencing error quality value of less than 0.1%.

Functional annotation of genes
To obtain complete functional annotations, all assembled single gene sequences were used as search queries in various databases, and the assembled Unigene uses GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes), KOG (Clusters of Orthologous Groups), P-fam (Protein family), Swiss-Prot (a manually annotated and reviewed protein sequence database), and Nr (NCBI Non-redundant Protein Sequences) to understand the gene types and expression. The notes on each database are shown in Table 2; according to this   ; and Translation, Ribosome structure and biogenesis, with   corresponding numbers of Unigenes of 2184, 2127, 1183, 871, 856, 608, 502, 499, 471, and 464 respectively (Table S2). Among them, 856 Unigenes with unknown function were found. We speculate that these sequences may be new genes unique to H. manillensis.

Classi cation of GO functions
A total of 9,397 Unigenes were annotated in the GO database and divided into three categories: cellular components, molecular functions, and biological processes ( Figure 2 genes were upregulated in AF, while the expression levels of 1028 genes were downregulated. After the feeding of H. manillensis, the number of upregulated genes was higher than that of the downregulated genes. This indicates that the bodily functions of this species do not decrease to reduce consumption after a single feeding, and more genes are instead activated to participate in metabolic mechanisms. The identi ed DEGs should provide an abundance of molecular information for further exploration of the bloodsucking habits of H.manillensis.

GO function analysis of DEGs
The selected DEGs were annotated with GO functions to reveal the associations between DEGs and certain biological functions. The results showed that a total of 1210 DEGs were annotated in the GO data (table S3).
The GO annotation information results of the DEGs are shown in Figure 5. According to these results, DEGs were classi ed into 48 functional categories in the three major categories of biological processes, cellular components, and molecular functions. Within the category of biological processes, involving 16 GO function categories, the differentially expressed genes were particularly associated with metabolic processes, cellular processes, and single-organism processes, while the remaining DEGs were mainly involved in response to stimulus, among others. Within the category of cellular component, involving 17 GO function categories, the categories of membrane, membrane part, cell, and cell part had the most differentially expressed genes.
Within the 15 functional categories of molecular function, the differentially expressed genes were mainly involved in binding, catalytic activity, and transporter activity. The remaining DEGs were mainly involved in enzyme regulatory activity and electron carrier activity, among others. From this, we can conclude that metabolic activity, protein synthesis activity, and stress response of H. manillensis change signi cantly between before and after blood sucking.
Then, the study enriched the GO of the differential gene in this process, the results of which are shown in Figure 6. In the results of the GO term enrichment of H. manillensis, the main identi ed categories were protein degradation, plasma membrane, serine endopeptidase, response to wounding and cell redox homeostasis. These uniquely enriched GO terms are related to damage repair and in ammation of the body.
These results show that there are diverse anti-in ammatory and analgesic effects on the host in H. manillensis. It is speculated that these may be related to the size of the wound and the physiological characteristics of the bloodsucking.

Functional analysis of KEGG pathways associated with DEGs
Using the KEGG database, pathway enrichment analysis on the DEGs was performed. The results revealed a total of 889 DEGs that participate in 173 metabolic pathways (table S3). There was a total of 20 KEGG pathways with signi cant enrichment, including Metabolism of xenobiotics by cytochrome, Apoptosis, Glutathione metabolism, Vitamin B6 metabolism, VEGF signaling pathway, Phenylalanine and so on. The statistical results are shown in Table 3.
For the KEGG pathway results, this study found that, among the KEGG pathways enriched with the differentially expressed genes of H. manillensis, the particularly well-represented pathways included Neuroactive ligand-receptor interaction, Dorsoventral axis formation, cysteine and methionine metabolism.
Most of these genes are not directly or only indirectly related to anticoagulation, but are related to developmental regulation. Therefore, it is speculated that the activated metabolism of H. manillensis is the main genetic change that occurs after feeding, and there is relatively limited regulation of anticoagulationrelated genes. The P-fam protein family were annotated with antithrombin collected by querying the interproscan website [13,14]. All genes annotated to the same P-fam family were then screened as anticoagulant-related genes in H. manillensis. The results showed that there were 155 anticoagulant-related gene sequences in the H. manillensis genome. These genes were categorized using anticoagulant and anticoagulant-related gene types (analgesic and anti-in ammatory, inhibition of platelet, extracellular matrix degradation, and direct anticoagulation). The results are shown in Table 4.  The antistasin family of proteins is a class of cysteine-rich serine protease inhibitors that can markedly inhibit the coagulation process, albeit through different mechanisms of action [15][16][17]. Antistasin can also respond to host in ammatory responses, exerting an anti-in ammatory effect. Among the genes related to the antistasin family identi ed in this study, 10 related genes were upregulated and 5 were downregulated upon blood sucking in H. manillensis. This indicates that the antistasin protein family plays an important role in the anticoagulant process after the blood sucking of H. manillensis.
ATP diphosphate hydrolase (apyrase)-related genes ATP diphosphate hydrolase (apyrase) is an antiplatelet factor. The mechanism by which apyrase in the saliva of bloodsucking arthropods works is as follows: ADP present in the host after tissue damage promotes platelet activation, but apyrase can hydrolyze ATP and ADP to AMP and inorganic phosphorus, which hinders platelet aggregation [18][19][20][21][22]. This process makes it di cult for a bitten host to form a thrombus, which facilitates the feeding of bloodsucking animals.
In this study, three apyrase-related genes were shown to be downregulated after blood sucking by H. manillensis and one was downregulated. This indicates that apyrase participates in the anticoagulation process after H. manillensis bloodsucking.

Destabilase-related genes
Destabilase acts as a thrombolytic agent in leech; it cleaves peptide bonds and promotes brinolysis, thereby inhibiting coagulation activity [23][24][25][26]. As a multifunctional enzyme, destabilase can also exert antibacterial function. In addition, the salivary glands of the leech can also secrete other kinds of enzyme, which together with destabilase play a bacteriostatic role.
This study identi ed three genes related to destabilase that were upregulated after the blood sucking of H. manillensis and two that were downregulated. The results show that destabilase plays a role in the anticoagulant and antibacterial processes upon the sucking of blood by H. manillensis.

Disintegrin-related genes
Disintegrin contains a sperm-glycoside-aspartic acid sequence or a lysine-aspartate sequence and is rich in cysteine. It binds to the glycoprotein brinogen receptor on the platelet membrane, competitively antagonizing the binding of brin to platelets, and prevents platelet activation and changes in glycoprotein receptor conformation, blocking the binding of receptors to multiple ligands. In this way, it inhibits the nal common pathway of platelet aggregation and blocks the formation of a thrombus.
This study identi ed 15 genes related to disintegrin that were upregulated after blood sucking by H. manillensis and 9 that were downregulated. This indicates that disintegrin plays an important role in the anticoagulation process after the blood sucking of this species.

Gene analysis of bloodsucking characteristics of H. manillensis
Finally, 155 genes related to anticoagulation were screened, corresponding to 13 active substances, including antistasin, apyrase, destabilase, and disintegrin. According to their functions and effects, these genes are classi ed into those involved in platelet inhibition and those directly encoding anticoagulants. Among them, the expression level of the antistasin family, which has both anticoagulant and anti-in ammatory effects, was generally increased. In terms of the expression trends of genes related to the inhibitors of platelet aggregation apyrase, destabilase, and disintegrin, these were upregulated. At the same time, the expression levels of related genes involved in the anti-in ammatory response of host wounds also showed an upward trend. This shows that, in the process of blood sucking, H. manillensis produces a variety of active anticoagulant substances, and directly or indirectly, these together inhibit the host's blood coagulation process, accompanied by an anti-in ammatory response. Taken together, these ndings indicate that the anticoagulant process is a complex biological pathway in which multiple substances cooperate and interact.
Upon consuming blood, bloodsucking animals can also take in infectious bacteria from the surface of the wound and immune proteins from the blood of the host. Therefore, they need to effectively protect themselves against bacterial invasion and immune protein attack. Destabilase has a strong inhibitory effect on bacteria. Hyaluronidase also acts as an active bacteriostatic protein, which dissolves the envelope of bacteria and forms antibodies, effectively inhibiting the activity of various bacteria and fungi. Some of the functional genes associated with the immune response, such as those annotated as hyaluronidase, were shown to be upregulated after the blood sucking of H. manillensis .
After bloodsucking animals ingest host blood, the hemoglobin in the blood tends to hydrolyze to produce a large amount of hemoglobin. Heme is a potent oxidant that produces hydroxyl radicals with excellent oxidizing ability. Studies have shown that heme can cause DNA damage in mitochondria and alter the expression of apoptotic proteins in these organelles. Hemoglobin also causes a certain degree of damage to the genome. In H. manillensis, hemoglobin is likely to cause great damage during the process of digesting blood. Therefore, this species needs to cope with the oxidative stress generated during blood digestion. Citelli et al. [15] asserted that a variety of proteins with oxidoreductase activity, such as superoxide dismutase, glutathione S-transferase, and thioredoxin, are involved in this antioxidant mechanism. In the process of blood sucking by animals such as ticks, the stress response and non-speci c immunity play important roles.
After the blood sucking of H. manillensis, the expression of related genes involved in the antioxidant process was found to be upregulated and the antioxidant capacity was signi cantly enhanced.
Upon blood sucking, H. manillensis needs to produce a large amount of active anticoagulant substances to ensure the uidity of the host blood. At the same time, it also produces antibacterial and immune-related substances to protect against foreign substances, and proteins having oxidoreductase activity are produced by antioxidation to cope with oxidative stress generated by hemoglobin during blood digestion.

Functional analysis of KEGG pathway in anticoagulation-related DEGs
We found 155 anticoagulant-related genes based on the transcriptome data of H. manillensis and explored their related pathways. The Notch signaling pathway, nicotinate, and nicotinamide metabolism associated with anticoagulant genes are involved in cell growth. It is of great signi cance to help to detect the occurrence and development of tumors and to treat them in a targeted manner by excavating these pathways.
(1) Notch signaling pathway In the KEGG database annotation, there are 48 Unigene annotations in the Notch signaling pathway, with three DEGs participating in the pathway. The Notch signaling pathway is important for cell-to-cell communication, which regulates cell differentiation, apoptosis, proliferation, and morphogenesis. It is also critical for the growth and development of organisms [27][28][29][30][31][32]. The Notch signaling pathway is mainly composed of Notch-related receptors, Notch-associated ligands, Notch downstream signal transduction molecules, and nuclear response factors. TACE (ADAM17) is a key enzyme in the activation of the Notch signaling pathway and has a signi cant limiting effect on its activation [33][34][35]39]. Simultaneous activation of ADAM17 induces GPIbα digestion [36], and platelet membrane glycoprotein (GP) Ibα digestion reduces the expression of platelet surface functional receptors, resulting in weakened platelet adhesion. This affects the formation of blood clots and inhibits the aggregation function of platelets, being a recognized negative regulatory mechanism of platelets. After the feeding of H. manillensis , the expression of TACE-related gene (ADAM 17-like protease) is upregulated, which inhibits platelet aggregation, blocks thrombus formation, affects the activation of the Notch signaling pathway, and participates in the anticoagulant process.
Notch-mediated signaling plays a key role in the development of the cardiovascular system and cardiovascular disease, and is closely related to the regulation of immune system function, pancreatic cancer, and medulloblastoma [37][38][39]. Notch signaling may play different roles in different environments, times, or cell lines, and its over-or under-expression may have a negative impact on the organism. Therefore, an indepth understanding of Notch signaling and its interaction with various pathways can provide a foundation for more research on the treatment of cardiovascular diseases and tumors, and for carrying out targeted and e cient clinical treatment.
(2) Nicotinate and nicotinamide metabolism ATP diphosphate hydrolase (Apyrase) blocks platelet aggregation. The apyrase-related gene detected in this study is involved in the nicotinate and nicotinamide metabolism pathway, and for three enzymes there was upregulated gene expression after vaccination with H. manillensis, while for one enzyme there was downregulated gene expression.
In the KEGG database annotation, there are 27 Unigene annotations on the Notch signaling pathway, with four DEGs participating in the pathway. Nicotinamide phosphoribosyltransferase (Nampt) is the rate-limiting enzyme of this pathway [40][41][42]; it also has physiological functions such as promoting angiogenesis, antiapoptosis, participating in the body's in ammatory response, and promoting the proliferation, differentiation, and maturation of various cells. After the feeding of H. manillensis, apyrase-related genes are involved in the anticoagulant process, inhibiting platelet aggregation, and downregulating the expression of Nampt-related genes, affecting the niacin and nicotinamide metabolic pathways.
The concentration of Nampt in the blood also signi cantly increases in many tumor patients, so an inhibitor of the activity of this enzyme is an anti-tumor drug with great potential for clinical application [41][42][43].
However, in this context, there are still many problems to be resolved. The role of anticoagulation genes related to H. manillensis before and after blood sucking is discussed to provide a basis for the better application of tumor treatment.

SNP analysis
Single-nucleotide polymorphism(SNP is mainly a DNA sequence polymorphism caused by DNA replication, a genetic marker formed by a single-nucleotide variation in the genome. These variations have become commonly used as molecular markers, and in disease diagnosis, paternity testing, plant breeding, and genetic mapping [44,45]. By identifying potential SNP sites, it is possible to analyze whether these sites affect the expression level of a gene of interest or the particular protein that is produced. SNPs can be divided into two types: transitions and transversions. Depending on the number of alleles at the SNP site, the SNP can be divided into a homozygous SNP site (only one allele) or a heterozygous SNP site (two or more alleles). There is a difference in the proportion of heterozygous SNPs in different species. SNP molecular markers were developed from H. manillensis transcriptome data. In the obtained 17,051 Unigenes of H. manillensis transcripts, an average of 154,399 SNP sites were found, of which 109,917 were transitions, accounting for 71.19%, and 44,482 were transversions, accounting for 28.81% (Table 5).

Material
All leeches collected in this study were in vivo. At least 20 individuals of H. manillensis were collected from Hechi City, Guangxi Province. They were kept in mineral water and brought back to the laboratory. Then, each individual was subjected to morphological identi cation, and after species determination they were selected for the experiment and reared in a dark and ventilated place.
RNA extraction, detection, library construction, and sequencing The RNA required for transcriptome construction was extracted using Qiagen kit (QIAGEN RNeasy mini kit).
First, the live leeches were directly frozen in liquid nitrogen, then pulverized in a mortar, and a small amount of RNAiso lysate was added and centrifuged immediately. After the precipitate had been obtained, a su cient amount of lysate was added and digested for at least 2 h. Next, total RNA was extracted from the lysed sample in accordance with the standard procedure on the kit. The quality of the nal extracted RNA was accurately quanti ed using 1% agarose gel electrophoresis, qubit, and an Agilent 2100 bioanalyzer. Since the transcriptome was designed for two expression states per species, which were set to pre-feeding and 3 h after feeding, for each transcriptome sequencing sample, the total RNA of an individual was extracted, which was stored in liquid nitrogen and used for subsequent sequencing.
The transcriptome sequencing was carried out by Beijing Baimaike Biotech Co., Ltd. For transcriptome sequencing, two RNA libraries were constructed for H. manillensis using the Hiseq 4000 platform. Library construction used oligomagnetic beads to physically enrich mRNA; the mRNA was then eluted and the random six-mer primers were reverse-transcribed to form a cDNA library. The cDNA library was then subjected to ultrasonication, fragment recovery (300 bp), end repair, poly-A addition, ligation, library recovery, and PCR preampli cation. After the library quality test had been passed, bridge PCR was carried out for chip xation, and PE 150 bp (PE 150) was sequenced. The generated data were then used for subsequent analysis.

Screening and analysis of anticoagulant genes in H. manillensis
A number of records related to anticoagulant-related proteins and polypeptides in leech are available for download. By downloading all of these previously reported leeches with anticoagulant function and protein and polypeptide sequences associated with anticoagulation, from NCBI's protein. DESeq [55] was used for differential expression analysis between sample groups to obtain the differentially expressed gene set between the two biological conditions

SNP detection
In this study, GATK [56] software was used to identify the single base mismatch between the sequencing sample and the reference genome based on the comparison result of TopHat2 of reads of each sample and the reference genome sequence, so as to identify potential SNP loci.Then it can be analyzed whether these SNP sites affect the expression level of genes or the type of protein products. Meanwhile, SnpEff [57] were by annotated variation (SNP) and predict variation effects.    Volcano map of the differential gene distribution of H. manillensis Note: Red represents up, green represents down. The abscissa shows the log2 value of fold change, while the ordinate shows the log10 value of FDR.  Map of KEGG pathways with enrichment of differentially expressed genes in H. manillensis Note: The vertical axis represents the path name, while the horizontal axis represents the rich factor. The size of the dot indicates the number of differentially expressed genes in this pathway, and the color of the point corresponds to a different Qvalue range Nicotinate and nicotinamide metabolism Note: Relative to the control group, the red box-labeled enzyme is associated with the upregulated gene, and the green-box-labeled enzyme is associated with the downregulated gene. The blue box-labeled enzyme is related to both upregulated and downregulated genes, and the numbers in the box represent the numbers of enzymes (EC numbers). Figure 10 10 SNP mutation-type density distribution map Note The horizontal axis presents the number of SNPs distributed per 1000 bp in the gene, and the vertical axis presents the number of genes.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.