De Novo Transcriptome Assembly for Venom Gland in Two Species of Spiders (Sinopoda pengi and Trichonephila clavata)

Natural molecules from spider venom are considered potential drug leads for conditions including cancer and pain, and templates for new biological insecticides for agricultural use. During coevolution in the long-term predator-prey game, spiders have evolved a huge molecular diversity of toxins. As of March 1, 2021, a total of 49,243 spider species had been described, but venom has been studied in only a few hundred of these species because of the difficulty of collecting it. Two technologies have partially addressed this limitation in the recent past: the screening of cDNA libraries constructed from venom gland mRNAs and the heterologous expression of the encoded peptides for functional characterization. In this study, transcriptomic analysis was performed to describe the predicted toxins of Sinopoda pengi (hereafter S. pengi) and Trichonephila clavata (hereafter T. clavata). Trinity assembly resulted in 163,418 transcripts and 114,127 unigenes for S. pengi, and 125,099 transcripts and 87,084 unigenes for T. clavata. A total of 22 and 24 unigenes predicted to encode inhibitor cysteine knot (ICK) toxins were identified in S. pengi and T. clavata, respectively. In summary, molecular templates with potential application value in the medical and biological fields were obtained by classifying and characterizing presumed venom components, which lays a foundation for the further study of venom.


Introduction
More than 100,000 species distributed across all major phyla of the animal kingdom have developed sophisticated machinery to produce poison or venom to defend themselves from predators, to fight against competitors, or to immobilize and digest prey [1][2]. Spiders are the seventh most diversified order of animals and the most diversified group of venomous predators in terms of the number of species, occupying most ecological niches [3]. Spider venoms are complex cocktails produced by the holocrine glands in the chelicerae. During coevolution in the long-term predator-prey game, venomous animals have evolved a huge molecular diversity of toxins [2]. These toxin molecules usually act on key physiological protein elements of the target organism, such as cell membrane receptors and ion channels, and can effectively distinguish between different membrane receptor and ion channel subtypes [4]. Toxins have become an important source of molecular tools for dissecting physiological processes [5] and are used as lead molecules for the development of drugs targeting a range of conditions, such as chronic pain, cancer, stroke, and autoimmune disease [6].
Sinopoda pengi (Sparassidae; Fig. 1A) is a wandering spider with a widespread distribution in Yunnan, China, that lives on walls near houses or in rock cracks along roadsides. These spiders are ambush predators that do not spin webs but instead rely on their strong chelicerae and powerful venom to paralyze or kill their prey. They hunt at night, waiting quietly on walls and killing prey as it passes by. They prey on nocturnal insects such as small beetles and crickets. In contrast, Trichonephila clavata (Araneidae; Fig. 1B) usually builds large, strong webs around hedges or bushes to facilitate prey capture, in addition to deploying its venom arsenal. These spiders are active during the day, and their webs often contain the remains of butterflies, bees and other insects. The two species are relatively widespread spiders with different feeding habits, and their venom composition has not yet been studied with transcriptomics.
Spider venom received little attention for quite some time, owing to technological limitations and its low impact on human health, until its enormous medical potential was discovered. As of March 1, 2021, a total of 49,243 spider species had been described, but venom has been studied in only a few hundred of these species [7]. Considering that individual venoms may contain hundreds of unique compounds and that the venoms of even comparatively well-studied lineages remain largely unexplored [2], animal venoms constitute an enormous unexplored natural library of bioactive compounds [8]. One of the reasons for the gap in venom knowledge is the difficulty of collecting a large number of specimens from natural environments to "milk" their venom in quantities large enough to allow the isolation of the less represented components [9]. The emergence and development of sequencing techniques has provided a solution to this problem, and the toxin information available in the ArachnoServer and UniProt databases has increased rapidly in the last decade. Through transcriptome sequencing, the expression of venom-associated components can be determined from just a few samples. Based on this approach, spider venom has been predicted to contain neurotoxins as well as other components, such as enzymes with serine proteinase and metalloprotease activity [10][11][12]. Here, only a few spiders were required to perform a detailed venom composition analysis via transcriptomic techniques.

Biological sample and RNA extraction
Specimens were collected in Dali, China (25°40′41″N, 100°8′59″E; and 25°40′52″N, 100°10′20″E) and were identified as S. pengi and T. clavata by Professor Zi-Zhong Yang of Dali University. The spiders were kept at the Yunnan Provincial Key Laboratory of Entomological Biopharmaceutical R&D at room temperature under a natural light-dark cycle, fed mealworms on a weekly basis and provided with petri dishes as drinking vessels.
Venom glands were removed from the cephalothorax of each spider (Fig. 2), and total RNA was extracted separately from the venom glands of 5 S. pengi and 5 T. clavata using TRIzol (Invitrogen Life Technologies, USA), following the manufacturer's instructions. The concentration and purity of the RNA were checked with a Nanodrop 2000 (Thermo Fisher Scientific, USA). For RNA sequencing, RNA integrity was assessed by standard denaturing agarose gel electrophoresis.

cDNA library preparation and Illumina sequencing
Total RNA was used for the construction of RNA-seq libraries, and we performed paired-end sequencing on an Illumina NovaSeq 6000 system (Origin-gene Biopharm, China) following the vendor's recommended protocol.

Assembly and annotation
The clean data were de novo assembled with Trinity v.2.6.6 using the standard protocol [13]. After the transcripts were assembled by Trinity, the functional annotation of these transcripts was performed. Before annotation, the open reading frame (ORF) prediction method of Trinity was used to predict the amino acid sequences of all assembled transcripts.
Annotation of the de novo assembled transcripts was performed using BLASTX.

Bioinformatic analysis
The unigenes were searched with BLASTX against a toxin-related subdatabase of UniProt (https://www.uniprot.org/program/Toxins); best hits were defined using an E-value cutoff of 1e-5 and selecting the highest score. These unigenes were translated using ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder).
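The best-hit rule described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: it assumes the BLASTX results were exported in the standard 12-column tabular layout (`-outfmt 6`), and the function name `best_hits`, the query names and the scores in `EXAMPLE` are all hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text

# Hypothetical hits in the 12-column BLAST tabular layout (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE = (
    "unigene_A\ttoxin_1\t60.0\t50\t20\t0\t1\t150\t1\t50\t1e-20\t95.0\n"
    "unigene_A\ttoxin_2\t70.0\t50\t15\t0\t1\t150\t1\t50\t1e-25\t110.0\n"
    "unigene_B\ttoxin_3\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.5\t25.0\n"
)


def best_hits(tabular_text, e_cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} with the top-scoring hit per query.

    Hits whose E-value exceeds the cutoff are ignored, so a query with no
    significant hit is absent from the result.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > e_cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    for query, hit in best_hits(EXAMPLE).items():
        print(query, hit)
```

In this made-up input, `unigene_B` has no hit below the cutoff and therefore does not appear among the candidates.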
To remove potentially nontoxic orthologs, all candidate toxin unigenes were used as queries for local BLASTX searches (E-value = 1e-5) against two databases: (1) the reviewed animal proteins in UniProtKB and (2) the toxin protein database described above. A unigene was filtered out of the candidate toxin dataset if the BLAST score of its best hit in the animal protein database was higher than that of its best hit in the animal toxin database [14].
Sequences that may encode toxins were aligned with Clustal Omega (version 1.2.4), and those without a conserved region were discarded. Signal peptides were predicted using the SignalP-5.0 program (http://www.cbs.dtu.dk/services/SignalP/); propeptides were determined using ProP software (http://www.cbs.dtu.dk/services/ProP/).
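As a minimal sketch of this filtering rule, the comparison can be written as follows. It assumes the best-hit score per unigene has already been collected for each database; the function name `filter_candidate_toxins`, the dictionaries and the scores are hypothetical. Ties, and unigenes with no hit in the animal protein database, are kept here, which is one reading of "higher than" in the text.

```python
def filter_candidate_toxins(toxin_scores, animal_scores):
    """Return candidate unigenes that survive the nontoxic-ortholog filter.

    toxin_scores:  {unigene: best-hit score against the toxin database}
    animal_scores: {unigene: best-hit score against the animal protein database}
    A candidate is dropped only when its animal-database score is strictly
    higher than its toxin-database score.
    """
    kept = []
    for unigene, toxin_score in toxin_scores.items():
        animal_score = animal_scores.get(unigene)
        if animal_score is not None and animal_score > toxin_score:
            continue  # better explained by a non-toxin animal protein
        kept.append(unigene)
    return kept


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    toxin = {"unigene_A": 110.0, "unigene_B": 80.0, "unigene_C": 60.0}
    animal = {"unigene_A": 90.0, "unigene_B": 150.0}
    print(filter_candidate_toxins(toxin, animal))
```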

RNA extraction and RNA-Seq
Total RNA was obtained from fifteen adult female S. pengi and T. clavata, respectively. mRNA was isolated, enriched, fragmented and reverse transcribed into cDNA. The cDNA libraries were then sequenced using an Illumina NovaSeq 6000 system (LC Bio, China). Sequencing of the paired-end (PE) libraries produced 47,271,254 and 39,592,392 reads, respectively (Table S1).

De novo transcriptome assembly and functional annotation
No reference genome is available for S. pengi or T. clavata, so Trinity was used for de novo assembly. The assembly yielded 163,418 transcripts and 114,127 unigenes for S. pengi, and 125,099 transcripts and 87,084 unigenes for T. clavata. After statistical analysis, transcripts were annotated against the NR, GO, KEGG, and Pfam databases (Supplementary Figs S1-3, Tables S2-3).
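The relationship between transcript and unigene counts can be illustrated with a short Python sketch. It assumes the usual Trinity naming scheme, in which an isoform ID such as TRINITY_DN1000_c115_g5_i1 reduces to a gene-level ID once the trailing `_i<N>` is removed, and it assumes that one unigene corresponds to one such gene-level ID. The function name and the example headers are hypothetical, and this is not necessarily how the counts reported here were produced.

```python
import re

# Trailing isoform suffix in Trinity IDs, e.g. "_i1" in TRINITY_DN1_c0_g1_i1
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def count_transcripts_and_unigenes(fasta_lines):
    """Count FASTA records (transcripts) and distinct gene-level IDs (unigenes)."""
    transcripts = 0
    genes = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        parts = line[1:].split()
        if not parts:
            continue  # header with no ID
        transcripts += 1
        genes.add(ISOFORM_SUFFIX.sub("", parts[0]))
    return transcripts, len(genes)


if __name__ == "__main__":
    # Hypothetical records: two isoforms of one gene plus a second gene.
    example = [
        ">TRINITY_DN1_c0_g1_i1 len=300",
        "ATGC",
        ">TRINITY_DN1_c0_g1_i2 len=280",
        "ATGC",
        ">TRINITY_DN2_c0_g1_i1 len=500",
        "ATGC",
    ]
    print(count_transcripts_and_unigenes(example))
```

To apply the same function to an assembly, iterate over the lines of the FASTA file opened in text mode.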

Possible toxins identified at the transcriptional level
Detailed toxin information is provided in the supplementary materials (Supplementary Tables S4-5).
Each transcript annotated as a toxin is discussed based on the existing literature, followed by a categorical description of the toxin.

ICK-like spider venom peptides
Cysteine-rich peptides are the best investigated venom components and are believed to exist in most spider venoms. They modulate a broad range of channels and receptors on the membranes of excitable cells (e.g., nerves and muscles) [15]. Ion channels on the membranes of excitable cells are responsible for proper signal transduction. Calcium channels are involved in neurotransmitter release from presynaptic cells, voltage-gated sodium channels enable action potential transmission along excitable cells, and voltage-gated potassium channels are crucial for restoring the resting state in depolarized cells [15]. Toxins interact with these targets to disrupt normal channel function, which can affect breathing or heart function, leading to symptoms such as convulsions, paralysis, and eventually death. Abundant ion channel toxins were identified by annotating the transcripts of S. pengi and T. clavata. The families of the predicted ICK toxins are described in detail below.

Group I (C-C-CC-C-C)
Group I contained three unigenes in T. clavata, two of which (DN1625_c0_g1, DN16263_c0_g1) were similar to Mu-sparatoxin-Hv2 (UniProt A0A088BP94) and one of which (DN5056_c0_g1) was similar to Kappa-sparatoxin-Hv1a (UniProt P58425) (Fig. 3A). These reference toxins belong to the neurotoxin 10 (Hwtx-1) family. Mu-sparatoxin-Hv2, from Heteropoda venatoria, is an insecticidal toxin that potently and irreversibly blocks NaV channels in cockroach dorsal unpaired median (DUM) neurons (IC50 = 833.7 nM) [16]. NaV channels are crucial for the generation and transmission of action potentials in the central nervous system, peripheral nervous system, heart, smooth muscle and skeletal muscle, and they are the targets of many toxins. In mammals there are nine α subtypes, known as NaV1.1-NaV1.9, each with a different tissue distribution and function [17]. The reference toxin Kappa-sparatoxin-Hv1a is a potassium and calcium channel blocker. KV channels play an important role in human physiology, including the regulation of neurotransmitter release, heart rate, insulin secretion, nerve cell secretion and skeletal muscle contraction. Numerous channelopathies arising from mutations in these channels or from autoimmune attack on the channels have been characterized [18][19]. Potassium channels are the target of a variety of animal toxins, and the potassium channel inhibitors found in spider venom thus far mainly act on KV1, KV2 and KV4 channels, although a few act on other channels, such as KV3, KV7 and KV11 channels [20]. In S. pengi, Group I contained 9 unigenes (Fig. 3B); two showed homology with Kappa-sparatoxin-Hv1a and seven aligned to Mu-sparatoxin-Hv2.
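The group labels used here, such as C-C-CC-C-C, summarize the spacing of cysteines in the mature peptide. A minimal Python sketch of how such a label could be derived from a sequence is given below. The notation rule is an assumption inferred from the labels in the figure captions: adjacent cysteines are written "CC", cysteines separated by exactly one residue are written "CXC", and any longer spacing is written "-". The function name and the test peptide are hypothetical, not sequences from this study.

```python
def cysteine_framework(mature_peptide):
    """Return a cysteine spacing label such as 'C-C-CC-C-C' for a peptide.

    Assumed notation: 'CC' for adjacent cysteines, 'CXC' for cysteines
    separated by exactly one residue, '-' for any longer spacing.
    Returns an empty string if the peptide contains no cysteine.
    """
    positions = [i for i, aa in enumerate(mature_peptide.upper()) if aa == "C"]
    if not positions:
        return ""
    parts = ["C"]
    for prev, curr in zip(positions, positions[1:]):
        gap = curr - prev - 1  # residues between two consecutive cysteines
        if gap == 0:
            parts.append("C")
        elif gap == 1:
            parts.append("XC")
        else:
            parts.append("-C")
    return "".join(parts)


if __name__ == "__main__":
    # Made-up peptide with six cysteines, for illustration only.
    print(cysteine_framework("ACKGAAACAAACCAAAACAAAAC"))
```

Under this assumed notation, the made-up peptide above yields the Group I pattern C-C-CC-C-C.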

Enzymes
Enzymes are very common components of the venom of venomous animals such as snakes, scorpions and centipedes. Proteases are enzymes that hydrolyze the amide bonds of the peptide units of polypeptides and proteins. The overall purpose of such an enzyme arsenal co-injected with toxins into a prey's tissue seems clear: by destroying the barriers imposed by the extracellular matrix and cell membranes, the enzymes allow the toxins to reach their targets quickly [23]. Additionally, the proteolytic activity of some of these enzymes facilitates subsequent preoral digestion. As proteins play important roles in the maintenance of homeostasis, proteases are vital regulators of physiological processes. Many pathological conditions of humans and animals have been linked to the malfunctioning of this category of enzymes; as a result, they are seen as attractive targets for drug discovery [24].
Unigenes related to metalloproteinases were found in S. pengi and T. clavata. The role of metalloproteinases in venoms is not fully understood, but they may act as spreading factors that promote the diffusion of other venom components, take part in the proteolytic processing of other venom toxins, or aid extraoral digestion of prey [25].
In S. pengi, we found unigenes that may encode acetylcholinesterase. Acetylcholine-mediated neurotransmission is fundamental for nervous system function. Acetylcholinesterase (AChE) hydrolyzes and inactivates acetylcholine, thereby regulating the concentration of the transmitter at the synapse [26]. In venom, the toxic role of this enzyme is unclear; it could reduce muscular control by rapidly hydrolyzing acetylcholine, or it could work synergistically with alkaline phosphatase to paralyze prey through hypotension.

Conclusion
Through RNA-seq, we performed a detailed analysis of all predicted toxins of S. pengi and T. clavata. The results can serve as a reference for other spider toxicological studies and provide valuable molecular templates for research and therapeutic applications.

Alignments of mature sequences of cysteine-rich peptide toxins from group (C-C-CC-C-C) in T. clavata. All alignments were performed with Clustal Omega. Conserved cysteines are marked in blue.

Figure 5
Alignments of mature sequences of cysteine-rich peptide toxins from groups in T. clavata.

Figure 7
Alignments of mature sequences of cysteine-rich peptide toxins from group IV (C-C-CC-C-C-CXC-C-C) in T. clavata.

Figure 8
Alignments of mature sequences of cysteine-rich peptide toxins from group V (C-C-CC-C-CC-C-C-C) in S. pengi.

Figure 9
Alignments of mature sequences of cysteine-rich peptide toxins from group (C-C-C-CC-CXC-CXC-C-C-C) in T. clavata.