Retrieving of SARS-COV-2 spike protein and flagellin sequences. In the first stage of the research, the amino acid sequence of Spike protein of SARS-COV-2 and flagellin (fliC) protein from Salmonella enterica serovar Typhimurium (S. Typhimurium) were retrieved from the UniProt database (uniport.org) (P0DTC2, Q66PQ5, respectively).
Prediction of B-cell epitopes. The identification of B-cell epitopes as potential antigens recognized by the surface receptor of B-cell lymphocytes leads to generate a specific immune response. Thus, B-cell epitopes play a key role in vaccine design. The BCPREDS server (56) allows users to choose the method for predicting B-cell epitopes among several developed prediction methods. Herein, the length of 14-mer B-cell epitopes and Specificity of 75 % were selected.
Prediction of CTL epitopes. The freely-accessible NetCTL 1.2 server (57) was used for CTL epitopes prediction. This method integrates prediction of peptide MHC class I binding, proteasomal C terminal cleavage and TAP (Transporter Associated with Antigen Processing) transport efficiency. The server allows for predictions of CTL epitopes restricted to 12 MHC class I supertype. In this study, we have chosen the A1 supertype and threshold value set at the default score (0.75) for the prediction of CTL epitopes.
HTL epitopes prediction. The amino acid sequence of S protein subjected to NetMHCII 2.3 Server (58) for screening HTL epitopes with 15-mer length. This server using an artificial neuron network predicts binding of MHC II epitopes for human (HLA-DR, HLA-DQ, HLA-DP) and mouse MHC class II alleles (H-2). Herein, we selected mouse H2 class II alleles (H-2-IAb, H-2-IEd, and H-2-IAd).The prediction values inferred from the IC50 values, and as a percentile-rank to a set of random natural peptides. Strong and weak binding peptides are illustrated in the output. An IC50 values <50 nM, <500 nM and <5000 nM indicates high-affinity, intermediate affinity and low affinity, respectively.
Multi-epitope vaccine candidate construction. The selected epitope candidates including high-scoring CTLs, high-affinity HTLs epitopes and B-cell epitopes were joined together using AAY and GPGPG linkers (47) to produce the multi-epitope vaccine (MEV) construct. Flagellin (FliC) as an adjuvant was chosen and was included at the N-terminal region of the MEV construct by an EAAAK linker (47).
Interferon-gamma (IFN-γ) inducing epitope prediction. One of the signatures of the innate and adaptive immune systems is interferon-gamma (IFN-γ) which has an antiviral function, immune regulation as well as anti-tumor activity. IFN-γ cytokine, as the major arm of the Th1 response, has a key role in the control of intracellular pathogens such as viruses. IFNepitope server (http://crdd.osdd.net/raghava/ifnepitope/scan.php) (59) developed to predict and design IFN-gamma inducing MHC class II binder peptides from target proteins to design an effective subunit vaccine. The Support Vector Machine (SVM) hybrid algorithms along with Motif was used for the prediction of IFN-γ epitopes. This server is based on a dataset including IFN-γ inducing and non-inducing MHC class-II binder, which has the potential to activate T-helper cells.
Prediction of antigenicity of the designed vaccine construct. To analyze antigenicity of multi-epitope vaccine construct (contain the flagellin adjuvant sequence), two servers including Vaxijen 2.0 (Vaxijen 2.0 is freely available online at the URL: http://www.jenner.ac.uk/VaxiJen) (60) and ANTIGENpro (ANTIGENpro is integrated into the SCRATCH suite of predictors available at http://scratch.proteomics.ics.uci.edu) (61) were used. The base of the VaxiJen 2.0 server is on auto- and cross-covariance (ACC) transformation of target protein sequences into uniform vectors of principal amino acid properties. The prediction approach used in VaxiJen v2.0 is alignment-free and is based on various physicochemical properties of the protein. The threshold value of Vaxijen 2.0 was set at 0.5. ANTIGENpro server for checking protein antigenicity applies protein antigenicity microarray data and illustrate an antigenicity index. It was estimated the accuracy of the ANTIGENpro server to be 76%, based on cross-validation experiments and using the combined dataset.
Allergenicity prediction of designed vaccine construct. AllerTOP V2.0 (http://www.ddg-pharmfac.net/AllerTOP) (62) and Algpred (Available at: http://webs.iiitd.edu.in/raghava/algpred) (63) servers were used to predict the allergenicity nature of the multi-epitope vaccine construct. In AllerTOP V2.0 server various methods are employed for classification of allergens such as amino acid E-descriptors, auto- and cross-covariance transformation, and the k nearest neighbors (kNN) machine learning. In this method, an accuracy of 85.3% has been presented at 5-fold cross-validation. Algpred allows the prediction of allergens based on the similarity of known epitope with any region of the protein. In AlgPred a systematic attempt has been made to integrate various approaches to predict allergenic proteins with high accuracy. In this study, a hybrid approach of the server, using a combined method (SVMc + IgE epitope + ARPs BLAST + MAST), was used to predict allergen.
Different physicochemical properties and solubility analysis of vaccine construct. Different physicochemical parameters of MEV such as amino acid composition, theoretical pI, Grand Average Hydropathy, Stability Profiling, Instability Index, Molecular Weight, Half-Life, and Aliphatic Index were determined using ProtParam (http://web.expasy.org/protparam/) (51) online server. SOLpro was used for solubility analysis of MEV. SOLpro is integrated in the SCRATCH suite of predictors available at http://scratch.proteomics.ics.uci.edu (52). SOLpro predicts if the protein is soluble or not (SOLUBLE/INSOLUBLE) and gives the corresponding probability (≥ 0.5).
Secondary structure prediction of vaccine construct. The PSIPRED server was used for our vaccine protein secondary structure prediction. PSIPRED, a web-based freely-accessible online server (http://bioinf.cs.ucl.ac.uk/psipred/)(64), is based on primary amino acid sequences input in a precise manner. In this method, a very stringent cross-validation approach use to evaluate the method's efficiency. PSIPRED 3.2 server combines two feed-forward neural networks that do an analysis of output obtained from PSI-BLAST (Position-Specific Iterated - BLAST).
Tertiary structure prediction. Three different servers including I-TASSER, SWISS-MODEL and Phyre2 were used to achieve the best 3D structural models. In the following, Phyre2 software was selected. Phyre2 is available at http://www.sbg.bio.ic.ac.uk/phyre2 (65). This server is one of the most widely used protein structure prediction servers which uses advanced remote homology detection methods to build 3D models.
3D structure validation. Given the importance of model validation and to find the possible errors in primary 3D structure models, we utilized 3 tools for model validation. ProSA-web at https://prosa.services.came.sbg.ac.at/prosa.php, ERRAT server at http://nihserver.mbi.ucla.edu/ ERRATv2/ (66) and PROCHECK’s Ramachandran plot analysis at https://servicesn.mbi.ucla.edu/PROCHECK/ (67) were used. The overall quality of a specific input structure calculates using ProSA-web (68) and is presented as a quality score. ProSA-web provides an easy-to-use interface to the program ProSA which is frequently used in protein structure validation. The ProSA-web score is shown in a plot, which including the z-score of experimentally determined structures deposited in PDB. ERRAT program is used for verifying protein structures determined by crystallography. This program is based on the statistics of non-bonded atom-atom interactions in the reported structure. PROCHECK’s Ramachandran plot program checks the stereochemical quality of a protein structure by analyzing residue-by-residue geometry and overall structure geometry.
Discontinuous B-cell epitope prediction in final MEV construct. ElliPro (http://tools.iedb.org/ellipro/) server (69) was employed for the prediction of conformational B-cell epitopes from the validated 3D structure. ElliPro is a new web-tool for the prediction of antibody epitopes based on the geometrical properties of protein structure. ElliPro considers a score to each output epitope defined as a PI (Protrusion Index) value averaged over epitope residues. In the method, the protein's 3D shape is approximated by several ellipsoids, thus that the ellipsoid with PI = 0.9 would include within 90% of the protein residues with 10% of the protein residues being outside of the ellipsoid.
Molecular docking of designed vaccine with TLR-5 receptor. The interaction patterns of the final MEV construct as a ligand with the TLR5 receptor (PDB ID: 3J0A) were analyzed by using ClusPro 2.0 server (https://cluspro.org) (70). ClusPro is a web-based server for the direct docking of two interacting proteins.Three computational steps including (1) rigid-body docking, (2) RMSD based clustering of the 1000 lowest energy structures, and (3) the removal of steric clashes by energy minimization performs using ClusPro software.
Molecular dynamics (MD) simulations. CABS-flex webserver was used for fast simulations of TLR5-designed vaccine complex flexibility. CABS-flex 2.0 is freely available at http://biocomp.chem.uw.edu.pl/CABSflex2 (53). The only data required as input is a protein structure in the PDB format (or a protein PDB code). Here, the selected docked TLR5-vaccine complex was used as the input for the MD simulation. In this server, the protein flexibility, contact map and root-mean-square fluctuations (RMSFs) of atoms in a protein complex are presented. CABS-flex server show RMSF simulation of all amino acid residues provide in a particular protein, in nanosecond time.
In silico cloning and codon optimization of MEV construct. The Java Codon Adaptation Tool (JCat) (http://www.prodoric.de/JCat) (71) was used for codon optimization and reverse translation of the MEV construct sequence to express the MEV in a suitable expression vector. Here, an E. coli (strain K12) host was selected to express the final MEV sequence and other options such as rho-independent transcription termination, prokaryote ribosome binding site, and restriction enzyme cleavage sites were included. To ensure the high-level protein expression, two JCat output indexes include the codon adaptation index (CAI) and percentage GC content can be used. CAI supply data on codon usage biases; the CAI score is optimal at 1.0, but a score above 0.8 (> 0.8) is also good. Usually, the optimal GC content range between 30–70% is favorable effects on translational and transcriptional efficiencies. Also, to clone the final MEV sequence in E. coli pET-28a (+) vector, two restriction enzyme sites contain XhoI and NcoI were included at N and C-terminals of the target sequence, respectively. To ensure MEV expression, the optimized MEV sequence (with restriction sites) was inserted into the pET-28a (+) vector using the SnapGene tool.