The morphology, habits and phylogenetic status of H. venatoria
The spider H. venatoria, a member of the family Sparassidae commonly known as the giant crab spider, takes more than 250 days to complete its life cycle, which greatly limits the collection of natural venom. Adult specimens have a flattened body about 2.5 cm long and eight long, slightly hairy legs spanning 6 to 12 cm. A yellow to cream clypeus just in front of the eyes is one of the main features of H. venatoria. The female has a larger abdomen and an overall brown body, and usually carries an egg sac up to about 2.5 cm wide with her pedipalps under her body. The male has a slender body, longer legs and a distinctive pattern on his carapace. This species is found in many tropical and subtropical regions of the world and cannot survive outdoors at sub-freezing temperatures. In the present study, the spiders were collected from basements, barns and greenhouses of our scientific research farm in summer.
Huntsman spiders do not spin webs. They hunt and feed on living insects at night, relying on exceptional agility and speed. They can rest and run on smooth vertical surfaces, and can contort and squeeze their large bodies into surprisingly small cracks and crevices, which gives them a strong advantage both in predation and in evading predators. Almost as soon as they catch their prey, the spiders paralyze it by injecting venom from glands (Fig. 1) that extend from the chelicerae into the cephalothorax. The spider is considered a beneficial resident of households because it hunts pests efficiently and does no harm to people.
The phylogenetic relationships (Fig. 2) among the spider families whose venom CRPs have been well described show that H. venatoria (Sparassidae) is located between Theraphosidae and Lycosidae [16, 29–35]. Compared with the Mygalomorphae species (Haplopelma huwenum (now named Haplopelma schmidti), Haplopelma hainanum, Grammostola rosea, Chilobrachys jingzhao) and the Araneomorphae species (Lycosa singoriensis, Dolomedes mizhoanus, Araneus ventricosus), H. venatoria, which belongs to the more primitive Araneomorphae, has a body size intermediate between those of the two groups (Fig. 2). Like most spiders, H. venatoria is sensitive to light; this large, indoor, terrestrial spider hides in a simple shelter during the day and goes out hunting at night. Unlike web-building spiders (A. ventricosus, A. orientalis, etc.), which weave webs to capture prey and wrap it in silk before eating, H. venatoria subdues prey with its long legs, strong chelicerae and complex venom, which resembles the predation of tarantulas.
Based on comparisons of the morphology and predation habits of H. venatoria with those of other spiders, as well as its phylogenetic position, we believe that investigating the gene-encoded products in its venom, the most convergent of spider traits, can contribute to understanding the evolution of Araneae toxins in an ecological context, and is likely to facilitate the recognition and exploration of this widespread spider resource in cosmotropical regions.
Family/cluster identification
In the present study, the CRP precursors were grouped into 24 families based on their signal peptide sequences and cysteine frameworks. Disulfide bonds stabilize the three-dimensional (3D) structures of toxins, and their arrangement is commonly used to classify toxins.
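As an illustration of classification by cysteine framework, the following minimal Python sketch derives a framework string from a mature peptide sequence and groups the sequences that share it. The function names and the toy sequences are hypothetical and are not taken from the dataset; the grouping in this study also considered the signal peptide sequences, which the sketch omits.

```python
from collections import defaultdict


def cys_framework(mature: str) -> str:
    """Return the cysteine framework of a peptide, e.g. 'XCX6CX5CCX4CX3CX4'.

    Runs of non-Cys residues are written as 'X' (one residue) or 'Xn'
    (n residues); adjacent cysteines are written as 'CC'.
    """
    # Non-Cys runs before, between and after the cysteines.
    gaps = mature.upper().split("C")

    def label(n: int) -> str:
        if n == 0:
            return ""
        return "X" if n == 1 else f"X{n}"

    return "C".join(label(len(gap)) for gap in gaps)


def group_by_framework(peptides: dict) -> dict:
    """Map each framework string to the names of the peptides sharing it."""
    groups = defaultdict(list)
    for name, seq in peptides.items():
        groups[cys_framework(seq)].append(name)
    return dict(groups)


# Hypothetical toy sequences (not real toxins), built to show the notation.
toy = {
    "pep1": "A" + "C" + "G" * 6 + "C" + "S" * 5 + "CC" + "T" * 4 + "C" + "K" * 3 + "C" + "L" * 4,
    "pep2": "D" + "C" + "A" * 6 + "C" + "G" * 5 + "CC" + "S" * 4 + "C" + "T" * 3 + "C" + "K" * 4,
    "pep3": "C" + "A" * 3 + "C" + "G" * 2 + "C",
}
families = group_by_framework(toy)
```

In this toy input, pep1 and pep2 differ in every non-Cys residue but share one framework, so they fall into the same group, whereas pep3 forms its own group.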
Family A-H. The full primary sequences of the CRPs in Families A–H comprise a signal sequence (19–25 residues) and a propeptide (11–19 residues) preceding the mature toxin sequence. The N- and C-termini of the mature peptides are highly variable regions. The 11 members of Family A are homologs of Kappa-SPRTX-Hv1c, including its five different precursors (κ-SPRTX-Hv1c_1–5). The signal peptide pattern of CRPs in this family is ‘MKh12Sh5’, where ‘h’ indicates a hydrophobic residue, the Arabic numerals denote the number of residues, and capital letters indicate the corresponding amino acids. The propeptides of Family A are 19 residues long, with a highly conserved DEQR endoproteolytic site preceding the mature peptide, named the Processing Quadruplet Motif (PQM). The mature peptide pattern of Family A is ‘XCX6CX5CCX4CX3CX4–6’, where ‘X’ is any amino acid. At the C-terminus of the mature peptides, ‘GK’ serves as the amidation site. The characteristics of the signal peptides, propeptides and mature peptides of Families B–H are compared with those of Family A in Table 1. The mature peptides of Families A–H show the ‘classical’ Inhibitory Cystine Knot (ICK) motif, which contains three disulfide bonds with I–IV, II–V and III–VI connectivity. The first two disulfide bonds (I–IV and II–V) form an embedded ring that is threaded by the third disulfide bond (III–VI). The backbone regions between successive Cys residues are referred to as loops, numbered starting with loop 1 between Cys I and Cys II [38, 39]. In the mature peptides, there is less amino acid sequence divergence in loops 1 and 3 than in the much more variable loops 2 and 4 and the N- and C-termini. The precursors of Families A–G are more similar to sequences from the same species than to those from other species. Only the sequences of CRPs in Family H show high homology with U23-ctenitoxin-Pn1a and U4-agatoxin-Ao1a from Phoneutria nigriventer and Agelena orientalis, respectively (Additional file 1: Fig. S1).
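The Family A mature peptide pattern and the ICK connectivity described above can be sketched in Python as follows. This is only an illustration: the function names and the toy sequence are hypothetical, and the sketch assumes that ‘X’ stands for any residue other than Cys.

```python
import re

# Family A mature peptide pattern 'XCX6CX5CCX4CX3CX4–6'.
# Assumption: 'X' is treated as any residue except Cys.
FAMILY_A_MATURE = re.compile(
    r"[^C]C[^C]{6}C[^C]{5}CC[^C]{4}C[^C]{3}C[^C]{4,6}"
)


def matches_family_a(mature: str) -> bool:
    """True if the whole sequence fits the Family A cysteine pattern."""
    return FAMILY_A_MATURE.fullmatch(mature.upper()) is not None


def ick_disulfide_pairs(mature: str):
    """Return 1-based Cys positions paired as I–IV, II–V and III–VI."""
    cys = [i + 1 for i, aa in enumerate(mature.upper()) if aa == "C"]
    if len(cys) != 6:
        raise ValueError("the classical ICK motif has exactly six cysteines")
    return [(cys[0], cys[3]), (cys[1], cys[4]), (cys[2], cys[5])]


# Hypothetical toy sequence (not a real toxin) that fits the pattern.
toy = "A" + "C" + "G" * 6 + "C" + "S" * 5 + "CC" + "T" * 4 + "C" + "K" * 3 + "C" + "L" * 5
fits = matches_family_a(toy)
pairs = ick_disulfide_pairs(toy)
```

The pairing function encodes only the connectivity stated in the text; it does not predict whether a given sequence actually folds into a knot.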
Family I. The precursors of Family I have a high content of acidic amino acids in the putative mature peptides, which have a novel Cys scaffold ‘CX5CX3CX5CXCXC’. Since no significantly homologous sequence has been found in public protein databases, post-transcriptional processes such as alternative splicing and post-translational modifications remain uncertain. Accordingly, the putative short propeptides are indicated by a dotted border in Additional file 2: Fig. S2. This is the first time this motif has been reported from spider venom.
Family J. Family J includes eleven homologous sequences characterized by two consecutive Cys residues in the middle of the signal peptide, which is directly followed by the mature region with a Cys scaffold ‘CX13CX2CX12CX3CX8C’. The scaffold in the mature peptide has been identified as the conserved domain pfam01147 (representative proteins gi: 221468699, 5921747 shown in Additional file 3: Fig. S3), which includes all known crustacean hyperglycemic hormones (CHHs) found in the sinus gland of isopods and decapods and the molt-inhibiting hormone (MIH) of the lobster Homarus americanus. The three disulfide bridges are CI–CV, CII–CIV and CIII–CVI. In addition, the amino acid sequences of several translated cDNAs (gi: 304306070, 304307035, 304306844, 304306583) from the Loxosceles intermedia venom gland library are also similar to those of Family J, as shown in Additional file 3: Fig. S3. The latrodectins, which have been identified in widow spider venom glands, also share six conserved cysteines that adopt the same disulfide bond pairing in the mature peptide [44, 45].
Family K, L, M and N. The four families have eight cysteines with a typical motif ‘CIX6CIIXnCIIICIVX4CVXCVIXnCVIIXCVIII’, where X is any residue except cysteine. However, the numbers and properties of the residues in the CII–CIII and CVI–CVII loops and the C-terminus differ among the families. The sequences of the signal peptides and precursors, as well as the endoproteolytic sites, are also diverse. In Family N, there is a long loop between CVI and CVII and a very short propeptide preceding the mature region. Notably, no propeptide is predicted in the precursor of U32-sparatoxin-Hv1a. The amino acid sequences of Families K, L, M and N are each aligned with their most similar known homologs in Additional file 4: Fig. S4.
Family O. Family O includes 20 unique, highly homologous sequences. Notably, Family O is a novel venom peptide type with a high expression level in H. venatoria. There are twelve residues between the signal peptide and the first Cys, but no typical PQM. The mature region is characterized by a novel eight-Cys scaffold ‘CIX21CIIX4CIIIX9CIVX10CVX11CVICVIIX4CVIII’. The transcript of a secretory protein with an identical Cys scaffold has been identified in the black widow spider (gi: 318087504); however, that protein is hypothesized to be involved in wrapping silk fibers. Moreover, several hypothetical non-secretory proteins from Amblyomma maculatum also adopt the same eight-Cys scaffold (Additional file 5: Fig. S5).
Family P. The six precursors are highly homologous with Omega-agatoxin-1A (gi: 2507406) from Agelenopsis aperta and contain a ten-Cys scaffold ‘CIX6/8CIIXCIIIX6CIVXCVX7/11CVIXCVIIX7CVIIIX5CIXX19/20CX’, so they belong to the omega-agatoxin superfamily. A particularly interesting feature of this superfamily is that the prepropeptide contains two glutamate-rich sequences interposed between the signal sequence, the major toxin peptide and the minor toxin peptide. The two subunits form a heterodimer linked by a disulfide bond (Additional file 6: Fig. S6).
Family Q and R. Family Q is homologous with the precursors of U19-ctenitoxin-Pn1a (gi: 50401390), Hainantoxin-XIV-7 (gi: 310946827) and HWTX-XIVa2 (gi: 166007861), and with a toxin-like peptide (gi: 380692240) from Grammostola rosea. Family R is homologous with U3-aranetoxin-Ce1a (gi: 27805756). The precursors in both families contain a signal peptide and a mature peptide with a ten-Cys scaffold of the form ‘CIXnCIIX4CIIICIVXnCVX9CVIXnCVIIXCVIIIX5CIXXnCX’. However, the residues in the loops between the cysteines are very different. The CIV–CV, CVI–CVII and CIX–CX loops are longer in Family R than in Family Q (Additional file 7: Fig. S7).
Family S, T and U. All three families are composed of a signal peptide and a mature region with a ten-Cys scaffold ‘CIX7CIIX8CIIICIVX4CVX5CVICVIIX3CVIIIX3CIXX17CX’, which is highly similar to the cysteine scaffolds of U7-agatoxin-Ao1a (gi: 74845728) and U20-lycotoxin-Ls1a/c/d (gi: 313471673/313471696/313471677) from Agelena orientalis and Lycosa singoriensis, respectively. The amino acids in the loop between CVIII and CIX are conserved even among peptides from different spider families (Sparassidae, Agelenidae and Lycosidae). The sequences between CIV and CV are also conserved in the three families from the same spider, H. venatoria. However, the other regions are less homologous; in particular, the amino acid sequences of the N- and C-termini are highly diverse in Families S, T and U (Additional file 8: Fig. S8).
Family V. The predicted peptide sequences in Family V have a disulfide bonding pattern and structure similar to those of U9-agatoxin-Ao1a (gi: 74845712). Their common Cys pattern is ‘CIX6CIIX3CIIIXCIVCVX5CVIXCVIIX4CVIIIXCIXX8CXX6CXIX12CXII’. However, the sequences differ considerably in the signal peptides, propeptides and loops between the cysteines, and a PQM is apparent in U9-agatoxin-Ao1a but not in the precursors of Family V from H. venatoria (Additional file 9: Fig. S9).
Family W. There are three secretory proteins with a long 12-Cys scaffold in Family W, whose common Cys pattern is ‘CIX7CIIX23CIIIX9CIVX7CVX22CVIX15CVIIX11CVIIIX11CIXX8CXX8CXIX22CXII’. The precursors had no homologs when searched against the GenBank, EMBL and DDBJ databases. However, two sequences from the spider EST database, which have not been identified as toxins, were matched using TBLASTN. The amino acid sequences of gi: 304306221 and gi: 189216028, which are from the venom gland cDNA libraries of Loxosceles intermedia and Acanthoscurria gomesiana, respectively, are homologous to U28-sparatoxin-Hv1a with 43% (E-value 5e-07) and 48% (E-value 1e-06) positives (Additional file 10: Fig. S10).
Family X. The two precursors in Family X have only two cysteines (‘CIX6CIIX16’) in the mature region and have no significant sequence homologs in the GenBank, EMBL and DDBJ databases. The propeptides predicted using SpiderP are shown in Additional file 11: Fig. S11. Notably, there are two different probable cleavage modes.
Phylogenetic study of the CRPs in H. venatoria
The precursor sequences of CRPs from the H. venatoria venom gland were aligned using Clustal X 2.0. The resulting alignment was imported into MEGA5 to construct a phylogenetic tree with the neighbor-joining method. The vast majority of the 6-cys ICK motif precursors (Families A, B, C, E, F and G) formed what was defined as the relatively original clade. Only Families D and H, with shorter propeptides (12 aa), were placed outside this “older” clade. Remarkably, Family H, whose signal peptides differ from and are longer than those of the other families with the 6-cys ICK motif, was located far from the original clade (Families A, B, C, E, F and G). Intriguingly, the four 8-cys ICK-like motif families, K, L, M and N, were placed in different clades, which supports treating them as four separate families even though the Cys scaffolds of their mature peptides look similar. Families Q and R, which both adopt the ten-Cys scaffold ‘CIXnCIIX4CIIICIVXnCVX9CVIXnCVIIXCVIIIX5CIXXnCX’ in the mature peptide domain, were also placed in two distant phylogenetic clades (Fig. 3). The three loops (CIV–CV, CVI–CVII and CIX–CX) in Family R are longer than those in Family Q.
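For readers unfamiliar with the neighbor-joining method named above, the following is a minimal textbook sketch in Python. It is not the MEGA5 implementation, whose internals are not described here; the function name, the node labels and the five-taxon distance matrix are hypothetical and unrelated to the CRP data.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor joining on a symmetric distance table.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (joins, final_edge), where each join is
    (child_a, child_b, new_node, branch_a, branch_b) and final_edge is
    (node_a, node_b, length) for the last two remaining nodes.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dab = d[frozenset((a, b))]
        len_a = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        u = f"node{len(joins) + 1}"  # hypothetical label for the new internal node
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab
                ) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, u, len_a, len_b))
    final_edge = (nodes[0], nodes[1], d[frozenset(nodes)])
    return joins, final_edge


# Hypothetical five-taxon distances, for illustration only.
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy_dist = {frozenset(k): v for k, v in pairs.items()}
joins, final_edge = neighbor_joining(["a", "b", "c", "d", "e"], toy_dist)
```

In practice the distances would be computed from the aligned precursor sequences; ties in Q are broken here simply by the order in which pairs are enumerated.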
Evolutionary trend analyses of CRP propeptides and cysteine number in mature peptides
The high-quality cDNA libraries and full-length EST sequences from eight spiders, four Mygalomorphae and four Araneomorphae, were used to analyze propeptide length and cysteine number in the mature domain. The length of the propeptide varies among the species, and propeptides are longer in Mygalomorphae than in Araneomorphae. Propeptides longer than 25 aa account for 72.7%, 90%, 53.6% and 53.8% in H. huwenum, H. hainanum, G. rosea and C. jingzhao, respectively. By contrast, only 5%, 13.2% and 2.5% of propeptides are longer than 25 aa in L. singoriensis, D. mizhoanus and A. ventricosus, respectively, and none is longer than 25 aa in H. venatoria. Conversely, the percentages of precursors with a propeptide shorter than 10 residues are 4.6%, 5.7%, 17.8% and 29.2% in H. huwenum, H. hainanum, G. rosea and C. jingzhao, respectively, and 47.7%, 43.4% and 92.5% in three species of Araneomorphae (H. venatoria, D. mizhoanus and A. ventricosus). Although the percentage of precursors with the shortest propeptides is only 10.4% in L. singoriensis, precursors with a 10–25 aa propeptide account for 84.7% (Fig. 4). As for the cysteine number in the mature domain, 69.3%, 83.6%, 75% and 76.9% of peptides have a 6-cys motif in the four species of Mygalomorphae, respectively. However, no 6-cys CRPs have been described in L. singoriensis and D. mizhoanus so far, and 55% and 10% of peptides have a 6-cys motif in H. venatoria and A. ventricosus, respectively. By contrast, peptides with a ≥ 8-cys motif are more common in Araneomorphae, accounting for 45%, 100%, 100% and 90% in H. venatoria, L. singoriensis, D. mizhoanus and A. ventricosus, respectively (Fig. 5).
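The propeptide length classes used above (shorter than 10 aa, 10–25 aa, longer than 25 aa) can be tallied with a short Python sketch such as the following. The function name and the toy lengths are hypothetical and do not reproduce the data in Fig. 4.

```python
def propeptide_length_distribution(lengths):
    """Percentage of propeptides that are <10, 10–25 and >25 residues long."""
    if not lengths:
        raise ValueError("at least one propeptide length is required")
    total = len(lengths)
    counts = {"<10": 0, "10-25": 0, ">25": 0}
    for n in lengths:
        if n < 10:
            counts["<10"] += 1
        elif n <= 25:
            counts["10-25"] += 1
        else:
            counts[">25"] += 1
    return {label: 100 * count / total for label, count in counts.items()}


# Hypothetical propeptide lengths (in residues) for one species.
toy_lengths = [5, 8, 12, 19, 30]
distribution = propeptide_length_distribution(toy_lengths)
```

Applying the same function to the propeptide lengths of each species would give per-species percentages that can be compared directly across the two suborders.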