Compliance with the Nagoya Protocol on Access and Benefit Sharing
This research project was conducted in accordance with the obligations under the Nagoya Protocol on Access and Benefit Sharing. We obtained Prior Informed Consent (PIC) no. 00308021010800/4 from the Egyptian Convention on Biological Diversity (CBD) National Focal Point to conduct this research using the scorpion A. amoreuxi, collected from Egypt.
Collection of Scorpions and Venom Preparation
Adult A. amoreuxi scorpions were purchased from El Marwa Office for Export (Egypt) and housed individually (to avoid cannibalism) in clear plastic containers in the laboratory of Professor Abdel-Rahman at Suez Canal University. The scorpion specimens were identified according to the key of El-Hennawy.35 Scorpions were fed small insects (mealworms) and received water ad libitum. Crude venom was extracted by electrical stimulation (12–16 V; 10 s), and the milked venom was collected and centrifuged for 20 min at 13,000 rpm and 4°C. Clear supernatants were pooled, freeze-dried and stored at −20°C until use.36
Illumina sequencing
RNA was extracted from telsons dissected from five adult scorpions and stored in RNAlater® at −80°C. Telsons were shipped on dry ice and stored on receipt at −80°C until processing. Scorpions were anesthetized before telson dissection by placing them in a freezer at −20°C for ∼4–5 minutes. Venom was milked 4 days before dissection, as recommended in previous studies.36–38 Telsons were combined, washed in ice-cold PBS to remove RNAlater® and homogenized in 1 mL TRIzol reagent (Thermo Scientific, UK) with 2 × 3 mm tungsten carbide beads in a TissueLyser II (Qiagen, CA). Total RNA was extracted in TRIzol (Thermo Scientific, UK) and purified on RNeasy micro columns with on-column DNase digestion (Qiagen, CA), according to the manufacturer’s instructions. Total RNA was quantified by fluorimetry (Qubit, Thermo Scientific, UK) and quality (RIN 9.1) was assessed on a Tapestation 4200 (Agilent, CA). Unique dual-indexed stranded TruSeq mRNAseq libraries were prepared from 500 ng total RNA, according to the manufacturer’s instructions (Illumina, CA). Libraries were quantified by qPCR with SYBR green (Kapa Library Quantification Complete Universal, Roche, CH) on the QuantStudio 6 Flex (ThermoFisher Scientific, UK), and an average library size of 291 bp was determined on the Tapestation 4200 (Agilent, CA) prior to sequencing and base calling on an Illumina NextSeq500 with v2.5 chemistry and 75 bp paired-end reads, with 14.3 Gb sequence output.
Transcriptome Assembly and Annotation
Raw read quality was assessed before and after pre-processing with FastQC (version 0.11.8)39 and MultiQC (version 1.7).40 Raw reads were pre-processed using Trim Galore! (version 0.6.4)41 to trim low-quality bases (< Q30) and adapters and to exclude short trimmed reads (< 20 bp), before de novo transcriptome assembly with Trinity (version 2.8.5),42,43 with a unique accession carrying the Trinity_DN prefix identifying each feature in the output assembly. Assembly quality was assessed with BUSCO (version 4.0.6) with the arthropod lineage dataset, arthropoda_odb10,44 rnaQUAST (version 2.2.0)45 and GeneMarkS-T (version 5.1) gene prediction.46 Transcript abundances were determined with embedded Trinity scripts using the Salmon method.47 For evaluation of Trinity transcripts against MS-identified peptides, the raw reads used for the venom gland transcriptome assembly were mapped against the assembly using Bowtie 2 (version 2.3.5).48 The alignment file was post-processed with SAMtools (version 1.9), and coverage and allelic composition of transcripts of interest were inspected with SAMtools mpileup (version 1.9),49,50 with corresponding sequence logos produced with R (version 3.6.1),51 RStudio (version 1.1.456),52 ggplot2 (version 3.3.6)53 and ggseqlogo (version 0.1).54
The assembly was annotated using Trinotate software (version 3.2.1).55 Residual background rRNA transcripts were filtered in post-assembly QC using Trinotate RNAMMER classification. AMPFinder with Diamond (version 1.1.0) was used to identify putative A. amoreuxi antimicrobial peptide (AMP) orthologues of known AMPs, with the A. amoreuxi contig nucleotide sequences as input queries against translated RNA sequences of known AMPs in the manually curated dbAMP version 2.0 database.25 Additionally, AMPFinder (version 1.1.0) random forest (RF) classifiers were used to discover putative novel AMPs in the A. amoreuxi transcriptome.25 Jhong et al.25 constructed the dbAMP 2.0 RF classifiers using experimentally verified AMPs from different source organisms (bacteria, human, amphibian, fish, plants, insects and mammals) as distinct training and test datasets to develop RF models with the best AMP-predictive performance in each lineage, as well as accurate predictive performance of each across all lineages (> 92% for independent test data). Prediction scores in insects, representing the taxa most closely related to scorpions, are reported for A. amoreuxi.
Isolation of venom peptides
Reverse phase high performance liquid chromatography (RP-HPLC) was performed using an Agilent 1260 LC System. Forty milligrams of the crude venom were dissolved in 4 mL (50:50 water:acetonitrile v/v) and cleared by centrifugation at 7,000 × g, and the supernatant was injected into a C4 column (ACE 5 C4-300, 5 µm, 250 × 10 mm, 300 Å). The fractions were then further purified using a C18 column (ACE 5 C18-300, 5 µm, 250 × 10 mm, 300 Å). Peptides were eluted using a linear gradient starting at 60% of solution B (0.1% TFA in acetonitrile) in solution A (0.1% TFA in water) for 60 min with a flow rate of 3 mL/min. Individual fractions were manually collected based on the peak absorbances at 220 nm, 254 nm and 280 nm. The collected samples were then dried using a rotary evaporator to remove the acetonitrile, followed by lyophilization using a LaboGene CoolSafe freeze dryer.
Mass spectrometric analysis
MS data were acquired on a 12T SolariX 2XR Fourier transform ion cyclotron resonance (FT-ICR) instrument equipped with electrospray ionization (ESI) (Bruker Daltonics). RP-HPLC fractionated samples were infused at 2 µL/min and spectra were acquired between 200 and 4000 m/z. For top-down fragmentation, individual ions (single proteoforms) were isolated using the quadrupole and tandem MS was performed using either collision induced dissociation (CID) or electron capture dissociation (ECD). For ECD, typical cathode conditions were a bias of 1.5 V and a lens voltage of 15 V, and the pulse length was varied between 5 and 15 ms. CID was typically conducted using a voltage of 15–25 V. The resulting top-down fragmentation spectra were processed using Data Analysis v4.2 (Bruker Daltonics) and the sophisticated numerical annotation procedure (SNAP) was used to produce a monoisotopic mass list. These mass lists were manually searched for sequence tags. Translated transcripts were searched for these sequence tags to identify the encoding transcript of each peptide and the sequence of the precursor peptide using the sequence logo method described above.
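The tag-matching step can be illustrated with a minimal sketch, assuming the assembled transcripts are available as nucleotide strings and are translated in all six frames with the standard genetic code. The function names and the example transcript are hypothetical and do not reproduce the authors' workflow. Because leucine and isoleucine are isobaric, the sketch treats them as equivalent when matching; this is an assumption about how MS-derived tags would be compared.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Yield (strand, offset, protein) for all six reading frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            yield strand, offset, translate(s[offset:])


def find_tag(transcripts, tag):
    """Return (name, strand, offset) for every frame containing the tag.

    I and L are collapsed to L on both sides because they are isobaric
    (assumption: tags cannot distinguish the two residues).
    """
    tag = tag.upper().replace("I", "L")
    hits = []
    for name, seq in transcripts.items():
        for strand, offset, protein in six_frames(seq):
            if tag in protein.replace("I", "L"):
                hits.append((name, strand, offset))
    return hits


# Hypothetical transcript encoding M-K-C-L-I-stop in frame +0.
example = {"example_transcript": "ATGAAATGTCTGATTTAA"}
hits = find_tag(example, "KCLL")
```

Each hit identifies the candidate transcript and reading frame, which can then be inspected further against the read pileup.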
Alignment and phylogenetic tree
Protein homologues of each MS-derived peptide sequence were identified in the NCBI protein database and aligned using NCBI BLAST. Phylogenetic trees were produced in MEGA11 using the bootstrap method with 100 bootstrap replications.56
Template-based 3D modelling
Homology modelling of the identified peptides with more than 30 amino acid residues was performed using SWISS-MODEL.57 Templates were selected based on the global model quality estimate (GMQE), a quality estimate that combines properties from the target-template alignment and the template structure. For large peptides (> 50 aa), we used a GMQE cut-off value of 0.6, coverage = 0.9 and identity = 40%, while the corresponding values for short peptides (< 50 aa) were GMQE = 0.8, coverage = 0.9 and identity = 80%. Visualisation and drawing of selected models were carried out with PyMOL. The presence of disulfide (DS) bridges was confirmed through MS analysis, and the configuration of these DS bridges was predicted through homology with previously identified scorpion toxins.
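The template selection rule can be written as a small filter. This is a sketch under stated assumptions: the quoted values are treated as minimum thresholds, peptides of exactly 50 residues (not covered by the text) are grouped with the short peptides, and the function name is hypothetical.

```python
def template_passes(length_aa, gmqe, coverage, identity_pct):
    """Return True if a template meets the length-dependent cut-offs.

    Assumptions: cut-offs are minima; exactly 50 aa is treated as 'short'.
    """
    if length_aa <= 30:
        return False  # only peptides with more than 30 residues were modelled
    if length_aa > 50:
        min_gmqe, min_cov, min_id = 0.6, 0.9, 40.0
    else:
        min_gmqe, min_cov, min_id = 0.8, 0.9, 80.0
    return gmqe >= min_gmqe and coverage >= min_cov and identity_pct >= min_id


# Hypothetical template metrics for a 64-residue and a 35-residue peptide.
long_ok = template_passes(64, gmqe=0.65, coverage=0.95, identity_pct=45.0)
short_ok = template_passes(35, gmqe=0.65, coverage=0.95, identity_pct=45.0)
```

The same template metrics pass for the longer peptide but fail for the shorter one, reflecting the stricter cut-offs applied to short sequences.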
Far-UV circular dichroism (CD): CD spectra of purified peptides (50 µM in 20 mM sodium phosphate buffer [pH 7.4] and 20 mM sodium fluoride at 25°C) were acquired between wavelengths of 260 nm and 190 nm on a MOS-500 CD Spectrometer (BioLogic). Each CD spectrum represents the average of 3 scans, baseline-corrected by subtraction of the buffer spectrum. The mean residue differential extinction coefficient Δεres of each spectrum was calculated from the observed ellipticity θ and plotted against wavelength. The CD analysis was carried out only on peptides available in sufficient amounts. The BeStSel server (https://bestsel.elte.hu/index.php) was used to determine the secondary structure elements from the CD spectra.58
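The conversion from observed ellipticity to Δεres follows from the relation θ (mdeg) = 32,980 × ΔA, so that Δεres = θ / (32,980 × c × l × n), with c the molar concentration, l the path length in cm and n the number of residues. The sketch below averages the scans, subtracts the buffer spectrum and applies this conversion. The path length and residue count are not stated in the text and are placeholders here; some conventions use the number of peptide bonds (n − 1) instead of n.

```python
def mean_residue_delta_epsilon(scans_mdeg, buffer_mdeg, conc_molar,
                               path_cm, n_residues):
    """Average scans, subtract buffer, convert mdeg to mean residue delta-epsilon.

    Returns values in M^-1 cm^-1 per residue, one per wavelength point.
    """
    n_scans = len(scans_mdeg)
    averaged = [sum(points) / n_scans for points in zip(*scans_mdeg)]
    corrected = [a - b for a, b in zip(averaged, buffer_mdeg)]
    factor = 32980.0 * conc_molar * path_cm * n_residues
    return [theta / factor for theta in corrected]


# Hypothetical two-point spectrum, three identical scans.
scans = [[33.98, 1.0], [33.98, 1.0], [33.98, 1.0]]
buffer = [1.0, 1.0]
delta_eps = mean_residue_delta_epsilon(
    scans, buffer,
    conc_molar=50e-6,   # 50 µM, as stated in the text
    path_cm=0.1,        # assumed path length
    n_residues=20,      # assumed peptide length
)
```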
Solid phase peptide synthesis (SPPS):
Reagents
The majority of the Fmoc amino acid derivatives were purchased from Novabiochem, Merck Biosciences, UK. Fmoc-Leu-OH, Fmoc-Gln(Trt)-OH, N,N’-diisopropylcarbodiimide (DIC), trifluoroacetic acid (TFA) and triisopropylsilane (TIS) were acquired from Fluorochem, UK. Fmoc-L-Arg(Pbf)-OH, Fmoc-Tyr(tBu)-OH, ethyl cyano(hydroxyimino)acetate (Oxyma), Fmoc-Rink Amide ProTide (LL) resin was obtained from the CEM corporation, USA. N,N-dimethylformamide (DMF), dichloromethane (DCM), methanol (MeOH) and diethyl ether were purchased from VWR, Avantor, USA. Piperidine was purchased from Merck Life Sciences, UK. Sodium diethyldithiocarbamate trihydrate, 2,2′-(ethylenedioxy)diethanethiol (DODT) and pyridine were acquired from Sigma Aldrich.
Synthesis: Peptide synthesis was performed on a Liberty Blue™ Automated Microwave Peptide Synthesizer (CEM Corporation, North Carolina, USA) using standard solid-phase peptide synthesis (SPPS) Fmoc/tBu chemistry with piperidine (20% v/v) as the Fmoc-deprotecting agent. The AM29A5-syn peptide was assembled on a 0.1 mmol scale using Fmoc-Rink Amide ProTide (LL) resin with a substitution value of 0.18 mmol/g, using 0.2 M Fmoc-protected amino acids activated by DIC (1 M) in the presence of 1 M Oxyma prepared in DMF. All arginine residues were double coupled, and the rest of the amino acids were single coupled. On completion of the synthesis, the resin was washed with dry DCM to remove DMF and subsequently dried under N2. The dried peptide-resin was cleaved with a cocktail solution of TFA/TIS/DODT/H2O (92.5:2.5:2.5:2.5) for 4 hours, followed by filtration to remove resin beads. The flow-through was then concentrated under N2 gas. The peptides were precipitated using cold diethyl ether, placed in a −20°C freezer overnight, washed with ether (3×) and dried under vacuum to give a crude solid. The crude peptides were purified by reversed phase HPLC on an Agilent Technologies 1260 Infinity using a C18 column (ACE 5 C18-HL, 5 µm, 250 × 10 mm, 100 Å) with an acetonitrile (+ 0.1% TFA)/water (+ 0.1% TFA) gradient. Fractions containing the pure peptide were subsequently lyophilized on a LaboGene CoolSafe freeze dryer to give the pure solid. Peptide identity was confirmed by MS analysis.
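As a worked illustration of the synthesis set-up, the resin mass follows from the scale and the substitution value, and the coupling rule can be expressed per residue. The helper names and the three-residue sequence are hypothetical; the AM29A5-syn sequence is not reproduced here.

```python
def resin_mass_g(scale_mmol, loading_mmol_per_g):
    """Mass of resin needed to reach the synthesis scale."""
    return scale_mmol / loading_mmol_per_g


def coupling_plan(sequence):
    """List (residue, number of couplings) in synthesis order.

    SPPS proceeds from the C-terminus to the N-terminus; arginine (R)
    is double coupled and all other residues are single coupled.
    """
    return [(res, 2 if res == "R" else 1) for res in reversed(sequence)]


mass = resin_mass_g(0.1, 0.18)   # 0.1 mmol scale, 0.18 mmol/g resin
plan = coupling_plan("GRK")      # hypothetical sequence for illustration
```

With the stated values, roughly 0.56 g of resin corresponds to the 0.1 mmol scale.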
Folding
Four buffers (Table 4) were used to allow correct oxidative folding of the linear synthetic peptide (AM29A5-syn). Aliquots of each reaction were collected at different time intervals and immediately quenched using 0.4% TFA in MilliQ water. The quenched aliquots were then injected onto a C18 column (ACE 5 C18-300, 5 µm, 250 × 10 mm, 300 Å) and the retention time of the folded product was compared with that of the native peptide. MS analysis was also used to confirm the formation of DS bridges.
Surface plasmon resonance:
Recombinant proteins
The receptor binding domain (RBD) of the spike protein of SARS-CoV-2 (aa 319–541) and human ACE2 (aa 19–615) were recombinantly produced in HEK293 cells and provided by Peak Proteins Ltd (Macclesfield, UK). QC sheets are provided (Supplementary document 8 & Supplementary document 9).
Surface plasmon resonance assays
Venom peptide binding to RBD and competition with hACE2 was assayed in surface plasmon resonance experiments carried out using Biacore X100 (Cytiva, Uppsala, Sweden).
RBD binding measurements: RBD in PBS-P+ pH 7.4 was captured and immobilized (approximately 6300 RU) on the surface of flow cell 2 of an NTA sensor chip using the standard nickel activation procedure (Cytiva). This was followed by activation with a mixture of 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC) and N-hydroxysuccinimide (NHS) for 7 min and inactivation with ethanolamine for 7 min, to covalently link the 6×His-tagged RBD to the nickel-activated surface. Flow cell 1 (activated and blocked) served as a reference (blank) cell. Binding of fluid phase venom-derived peptides was determined over a range of concentrations determined by peptide quantity. Peptides were dissolved in PBS-P+ buffer pH 7.4 (0.2 M phosphate buffer, 27 mM KCl, 1.37 M NaCl, 0.5% surfactant P20; Cytiva), supplemented with 71 µL of 350 mM EDTA. The injection volume was 60 µL and the flow rate was 5 µL/min. The surface was regenerated with 350 mM EDTA and 0.5% w/v SDS. Sensorgrams of RBD binding measurements were obtained by subtracting the sensorgram of buffer alone from that generated by injection of peptides. Equilibrium dissociation constants (KD) as well as association (ka) and dissociation (kd) rate constants were calculated for the peptides that showed dose-dependent binding using the Biacore X100 Evaluation Software version 2.0.2 (Cytiva). Curves were fitted to a 1:1 binding model, with the fit judged by the χ² value and the distribution of residuals.
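The quantities involved in a 1:1 (Langmuir) binding model can be sketched as follows. This is an illustration of the model equations only, not the fitting routine of the Biacore evaluation software; the rate constants and Rmax used in the example are hypothetical. The contact time follows directly from the stated injection volume and flow rate.

```python
import math


def contact_time_s(injection_volume_ul, flow_rate_ul_per_min):
    """Duration of the injection in seconds."""
    return injection_volume_ul / flow_rate_ul_per_min * 60.0


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant KD (M) = kd / ka."""
    return k_off / k_on


def equilibrium_response(conc_molar, kd_molar, r_max):
    """Steady-state response Req = Rmax * C / (C + KD)."""
    return r_max * conc_molar / (conc_molar + kd_molar)


def association_response(t_s, conc_molar, k_on, k_off, r_max):
    """Response during injection: R(t) = Req * (1 - exp(-(ka*C + kd) * t))."""
    k_obs = k_on * conc_molar + k_off
    r_eq = equilibrium_response(conc_molar, k_off / k_on, r_max)
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


t_inject = contact_time_s(60, 5)                   # 60 µL at 5 µL/min
kd_example = dissociation_constant(1e4, 1e-3)      # hypothetical ka, kd
r_end = association_response(t_inject, 1e-7, 1e4, 1e-3, r_max=100.0)
```

With the stated injection volume and flow rate, each injection corresponds to a contact time of 720 s.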
Inhibition studies
Binding of fluid phase hACE2 at a concentration of 150 nM was determined in the presence of venom peptides at the same concentrations listed above and compared with the binding activity of hACE2 alone. For peptides that showed direct binding activity to RBD, final inhibition sensorgrams were obtained by subtracting the sensorgram of peptides injected alone from that generated by injection of peptides in the presence of hACE2. Fluid phase compounds were dissolved in PBS-P+ pH 7.4 buffer supplemented with 500 µM EDTA. The flow rate was 5 µL/min and the injection volume was 60 µL. After each binding measurement, the surface was regenerated with 250 mM EDTA and 0.5% w/v SDS. Single measurements were carried out for each condition. The half-maximal inhibitory concentration (IC50) was defined as the concentration of peptide that reduced hACE2 binding to RBD to 50% of the binding of hACE2 alone. IC50 values were calculated by linear regression using Microsoft Excel.
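The IC50 estimate by linear regression can be sketched in a few lines. The sketch assumes that percent hACE2 binding is regressed on untransformed peptide concentration (the text does not state whether concentrations were log-transformed), and all response values are hypothetical.

```python
def percent_binding(response_with_peptide, response_hace2_alone):
    """hACE2 binding in the presence of peptide, as % of hACE2 alone."""
    return 100.0 * response_with_peptide / response_hace2_alone


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ic50_from_regression(concentrations, binding_pct):
    """Concentration at which the fitted line crosses 50% binding."""
    slope, intercept = linear_fit(concentrations, binding_pct)
    return (50.0 - intercept) / slope


# Hypothetical data: concentrations in µM, responses in RU.
conc = [0.0, 10.0, 20.0]
binding = [percent_binding(r, 200.0) for r in (200.0, 150.0, 100.0)]
ic50 = ic50_from_regression(conc, binding)
```

Because single measurements were taken per condition, an estimate of this kind carries no replicate-based error.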
Antiviral assay:
Cells, virus and reagents
Vero E6 cells (ATCC-CRL-1586) for SARS-CoV-2 propagation, antiviral assays and cytotoxicity assays were obtained from LGC Standards (Middlesex, UK). Cells were grown and maintained in Dulbecco's Modified Eagle's Medium (DMEM) (Sigma, UK) supplemented with 10% foetal calf serum (FCS), 1 mM sodium pyruvate (Gibco) and 2 mM L-glutamine (Sigma). Cells were grown in an environment enriched with CO2 (5%) at 37°C and passaged every 2–3 days or at approximately 80% confluence; the number of passages did not exceed 6.
SARS-CoV-2, isolate hCoV-19/England/204820464/2020 (Lineage B.1.1.7), was kindly made available by Public Health England through the BEI Resources repository of the National Institute of Allergy and Infectious Diseases, managed by ATCC. The virus was propagated in Vero cells for 72 h at 37°C and 5% CO2 in DMEM supplemented with 2% FCS before recovery and storage (−70°C) of cell-free virus. Viruses from the second and third cell passages were used in all experiments. Infectivity was estimated by measurement of 50% tissue culture infective dose (TCID50) values of the cell-free viral supernatants.
The anti-RBD human neutralising antibody CV30,59 used as a reference compound in antiviral assays, was purchased from Absolute Antibody (Cleveland, UK).
Cytotoxicity assays
Peptide cytotoxicity was assessed using the Vybrant® MTT Cell Proliferation Assay Kit (Thermo Fisher Scientific) by comparison of cell viability in the presence vs absence of peptide. Vero cells were seeded at 3 × 10⁴ per well for 24 h. Cells were then washed with PBS and challenged for 24 h with 2-fold dilutions of the AM29A5-syn peptide in DMEM, over the range of concentrations tested in antiviral assays. After 24 h, the supernatant was removed and cells were treated with 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) in 100 µL of DMEM (free of phenol red) for 4 h at 37°C in 5% CO2. Finally, 100% DMSO was added for formazan dissolution at 37°C before absorbance was measured at 540 nm. Viability was estimated by comparison with cells exposed to medium alone, and assays were performed in triplicate.
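The viability calculation can be illustrated as the ratio of mean absorbances of treated and medium-only wells. This is a minimal sketch with hypothetical absorbance readings; the optional blank subtraction is an assumption and is not described in the text.

```python
from statistics import mean


def percent_viability(treated_a540, control_a540, blank_a540=0.0):
    """Viability (%) of treated wells relative to medium-only wells.

    blank_a540 is an optional background value (assumption; defaults to 0).
    """
    treated = mean(treated_a540) - blank_a540
    control = mean(control_a540) - blank_a540
    return 100.0 * treated / control


# Hypothetical triplicate readings at 540 nm.
viability = percent_viability([0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
```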
Virus neutralisation assays
Vero cells were seeded at 3 × 10⁴ per well in DMEM supplemented with 10% FCS and incubated for 24 h at 37°C and 5% CO2. On the day of the experiment, Vero cells were challenged with 10-fold serial dilutions of cell-free SARS-CoV-2 (starting concentration 10⁸ PFU/mL) that had been pre-incubated for 1 h with 2-fold serial dilutions of AM29A5-syn peptide (0.8 µM – 25 µM) in DMEM with 2% FCS. The neutralising antibody CV30 at 2 µg/mL and medium alone were used as positive and negative controls, respectively. After the 1 h pre-incubation period, 2.5% Avicel (Sigma) in 100 µL of water was added before the main incubation for 72 h at 37°C and 5% CO2 to allow for viral replication. After this time, infected cells were fixed with 10% formalin (Sigma) for 3 h before staining with crystal violet in 20% ethanol (Sigma) for 30 min. Finally, wells were washed twice with water and left to dry before plaque counting under light microscopy for quantification of plaque-forming units (PFU). Infectivity was estimated by TCID50 measurements and comparison with cells exposed to medium alone. Each condition was performed in quadruplicate.
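The dilution scheme and plaque arithmetic can be sketched as follows. The 2-fold series reproduces the stated peptide range (25 µM down to approximately 0.8 µM in six steps). The titre and plaque-reduction helpers are standard calculations shown for illustration; the inoculum volume and plaque counts are hypothetical because they are not given in the text.

```python
def two_fold_series(top_concentration, n_points):
    """Concentrations of a 2-fold serial dilution, highest first."""
    return [top_concentration / (2 ** i) for i in range(n_points)]


def pfu_per_ml(plaque_count, dilution, inoculum_ml):
    """Titre from a plaque count at a given dilution (e.g. 1e-5)."""
    return plaque_count / (dilution * inoculum_ml)


def percent_plaque_reduction(plaques_treated, plaques_control):
    """Reduction in plaque number relative to the medium-only control."""
    return 100.0 * (1.0 - plaques_treated / plaques_control)


peptide_um = two_fold_series(25.0, 6)          # 25 µM ... 0.78125 µM
titre = pfu_per_ml(50, 1e-5, inoculum_ml=0.1)  # hypothetical count and volume
reduction = percent_plaque_reduction(10, 40)   # hypothetical counts
```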