Sequence Retrieval
The amino acid sequence of all the four proteins i.e. palmitoylated EEV membrane glycoprotein (AOE47604.1) and putative IMV envelope protein/P32 (AOE47650.1) of LSDV, EEV glycoprotein (AVI09752.1) of SPPV, and EEV host range protein/B5R protein (ABS72326.1) from GTV were retrieved from NCBI in FASTA format. (https://www.ncbi.nlm.nih.gov/)
MHC‑I Binding Epitope Prediction
A length of 9-mer peptides of the target proteins with bovine MHC-I alleles (6 alleles- BoLA-D18.4, BoLA-JSP.1, BoLA-HD6, BoLA-T2a, BoLA-T2b, BoLA-T2C) were identified by two servers (NetMHC 4.0 and IEDB). The “NetMHC” 4.0 (http://www.cbs.dtu.dk/services/NetMHC/) is an artificial neural network-based server to align the subjected sequences which allow insertions and deletions during the alignment. The threshold level set for strong and weak binders were rank 0.5% and 2%, respectively. The predictions were done for all 6 alleles mentioned above.
Prediction of CTL Epitopes and TCR‑Peptide/Peptide‑MHC Interfaces
The “CTLPred” (http://www.imtech.res.in/raghava/ctlpred/index.html) server benefits from precise machine learning techniques like support vector machine (SVM) and ANN as single algorithms or in combination. The prediction was done using the consensus method with default sensitivity and specificity.
MHC‑II Binding Epitope prediction
The prediction of MHC-II binding epitopes was performed by using four online servers: “NetMHCIIpan” 3.1, “RANKPEP”, “TepiTool”, and “IEDB”. Three bovine MHC alleles [Alleles - BoLA-DRB3 (DRB3_0101, DRB3_0217, DRB3_0303)] were selected for the analysis of peptide binding ability to MHC class II epitopes.
The “NetMHCIIpan 3.1” (http://www.cbs.dtu.dk/services/NetMHCIIpan/) was used with BoLA-DRB3 (DRB3_0303, DRB3_0217, DRB3_0101) bovine alleles and it is a useful method for quantitative prediction of peptide binding to bovine MHC class II molecules. This method is based on the contact between target protein directly with the MHC class II binding cleft which is very crucial to identify CD4+T-cell antigens.
“RANKPEP” (http://imed.med.ucm.es/Tools/rankpep.html) predicts MHC II binding epitopes which are based on their Position Specific Scoring Matrices (PSSMs) ranking for all possible peptides. However, the prediction was done using only one allele-HLA-DR52 (DRB3*02). All other settings are defined as default. All the six alleles of bovine species BoLA-T2a, BoLA-T2b, BoLA-T2C, BoLA-D18.4, BoLA-HD6 and BoLA-JSP.1, were evaluated using the “TepiTool” (http://tools.iedb.org/tepitool/) server. This server applies variety of algorithms to predict MHC II epitopes within the protein sequence. Only14-mer peptide predictions were selected.
“IEDB”( http://tools.iedb.org/mhcii/) server was used find MHC-II binding epitopes with “IEDB recommended 2.22” approach with a length of 15-mer epitopes. Two MHC alleles- H2-IAb and H2-IAd were chosen from mouse origin for prediction.
Linear B‑Cell Epitope Prediction
“BCPREDS”, “LBtope” and “ABCpred” were used to predict B-cell epitopes. 20-mer epitopes of each protein were evaluated by “LBtope” (http://crdd.osdd.net/raghava/lbtope/) using the LBtope_Fixed_non_redundant method (non-redundant dataset).
Three approaches- AAP, fixed-length BCPred, or flexible-length method (FBCPred), are offered by the “BCPREDS” (http://ailab.ist.psu.edu/bcpred/predict.html) server and fixed-length “BCPred” was selected for this study.
“ABCpred”(https://webs.iiitd.edu.in/raghava/abcpred/) is the first server tool that is based on the recurrent neural network which uses using fixed-length patterns. Prediction parameters such as a threshold level and peptide length were set to be 0.51 and 16-mers.
Selection of the Epitope Regions
The results from all the above tools were combined to identify the highly overlapping regions with preference for the selection of only those epitopes that can induce multiple immune responses [Th1 (MHC-I), Th2 (MHC-II), CTL, and B-cell responses]. All selected epitopes were then analyzed by various online tools to choose the most appropriate epitopic regions for our vaccine construct.
Engineering the Vaccine and Evaluation of Physicochemical Properties of the Designed Construct
The favorable epitopes from all the four proteins were selected in consonance with the results of all the above-mentioned software and combined them end to end with a combination that yielded maximum antigenicity and low allergenicity, and a TLR agonist adjuvant and two universal T-helper peptides were finally linked to build the final construct.
“Solpro” (http://scratch.proteomics.ics.uci.edu) server was used to determine the solubility of the vaccine peptide in Escherichia coli (E. coli) and yeast cells. The molecular weight (MW), amino acid composition, aliphatic index, theoretical pI, instability index, and grand average of hydropathicity (GRAVY) were determined by using the ProtParam tool (http://web.expasy.org/protparam/).
Allergenicity Prediction
Two important servers were being used to define allergenicity; The “AllergenFP” v.1.0 (http://ddg-pharmfac.net/AllergenFP/) works with an accuracy of 88% which recognizes allergens and non-allergens by depending on physicochemical properties of the peptide/protein.
The “Algpred” (http://www.imtech.res.in/raghava/algpred/) was another software used which employs numerous approaches, viz., IgEpitope, blast, mast, and SVM, to predict allergenic proteins with higher accuracy.
Antigenicity Prediction
The two reliable servers to estimate the antigenicity of a protein/peptide are “VaxiJen” (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) v2.0 and the “ANTIGENpro”. The ANTIGENpro (http://scratch.proteomics.ics.uci.edu) web-server works by multiple representations of the given protein sequence and giving antigenicity score. VaxiJen v2.0 server was used with threshold=0.4; its accuracy varies between 70 and 89% and is based on the chemical properties of query proteins amino acid sequences to define its antigenicity.
Prediction of Tertiary Structure and Homology Modeling
“Robetta” (https://robetta.bakerlab.org/submit.php) is a reliable protein structure prediction server developed by the Baker lab, University of Washington. This server provides four main options for structure prediction; 1) Deep learning-based method (TrRosetta), 2) Rosetta Comparative Modeling (RosettaCM), 3) Rosetta Ab Initio (RosettaAB), and 4) A fully automated pipeline, that first predicts protein domains as independent folding units and models each domain unit with (2) or (3), and then finally assembles each domain into form full chain models. This server predicts the five most possible models for any given protein sequence and the best can be selected based on the error estimate graph.
Refinement of 3D Modeled Protein Structure
Two programs, “GalaxyRefine” (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE) and “3Drefine” (http://sysbio.rnet.missouri.edu/3Drefine) were applied to refine the best 3D model obtained from “RosettaAB” server.
The “3Drefine” employs an iterative optimization-based hydrogen bonding network along with minimization of the atomic-level energy on an optimized model using composite physics and the knowledge-based force fields. The “GalaxyRefine” server can refine a protein model through mild and aggressive relaxation methods.
Validation of Refined 3D Models
The validation of all refined 3D models was performed by the “ProSA-web” (https://prosa.services.came.sbg.ac.at/prosa.php) (Protein Structure Analysis), “QMEAN“(https://swissmodel.expasy.org/qmean/) and “ERRAT” (http://services.mbi.ucla.edu/ERRAT).
“ProSA-web” server calculates interaction energy of every protein residue to define if it can achieve certain energy criteria. “ERRAT” uses a method called empirical atom-based approach to substantiate the predicted protein structures by evaluation between the statistics of non-bonded atom-atom interactions in the query protein sequence and the database of high-resolution crystallographic structures. “QMEAN” gives local estimates of predicted model quality which is derived from the QMEAN scoring function as a per-residue plot and as well as global score from the collection of high-resolution PDB structures (Z-score).
Conformational B‑Cell Epitope Prediction
The “DiscoTope 2.0” (http://www.cbs.dtu.dk/services/DiscoTope) server was used to identify B-cell epitopes from the predicted 3D structure of the protein. “DiscoTope” applies two approaches to find B-cell epitopes, 1) Contact numbers deduced from surface accessibility of the protein and 2) Novel epitope propensity amino acid score. A selected threshold was −3.7 with the specificity and sensitivity of 0.75 and 0.47, respectively.
Molecular Docking of the Designed Vaccine Construct with TLRs
The 3D structures of TLR molecules were developed by homology modeling using “Robetta” server and horse TLR9 (PDB ID–3wpb), mouse TLR3 (PDB ID –3cig), and human TLR8 (PDB I –6ty5) as templates for bovine, ovine and caprine TLRs respectively.
For protein-protein docking of vaccine and the TLR molecules such as bovine TLR9, ovine TLR3, and caprine TLR8 as the receptors, we used “ClusPro protein-protein docking” server (https://cluspro.org/home.php). It is a molecular docking algorithm that works by rotating the ligand with 70,000 rotations and for each rotation, it translates the ligand in x, y, z with respect to the receptor on a grid and chooses the 1000 rotation/translation combinations which are having the lowest score.
Molecular Dynamics Simulation
The stability assessment of the vaccine-TLR complexes was performed in a simulated biological environment in water box under alternating boundary conditions at 10ns using “VMD” and “NAMD” software. First, the vaccine-bovine TLR complex was edited to generate individual pdb and psf files of corresponding ligand and receptor using CHARMM force field, and later the files were merged in VMD tool(Soteras Gutiérrez et al. 2016; William Humphrey, Andrew Dalke, and Klaus Schulten 1996). The complex was then embedded in a 12Å sized water box. NAMD software was used to simulate the docked complex(Phillips et al. 2020). Initially, 50000 energy minimization steps were performed then the system was subsequently heated from 0 K to 310 K. Further, NPT analysis was performed followed by 10 ns molecular dynamics (MD) simulation analysis with a time step of 2 femtoseconds (fs) using Langevin dynamics algorithm. The same method was followed for the vaccine-caprine TLR complex. Here we used Particle Mesh Ewald (PME) to create the periodic boundary conditions. The trajectory analysis of the simulation and the RMSD analysis was executed using the VMD tool.
Codon Optimization
"Reverse Translate - Bioinformatics" (http://www.bioinformatics.org/sms2/rev_trans.html) server was used for reverse translation of the final vaccine construct. "Optimizer" (http://genomes.urv.es/OPTIMIZER/) was used for Codon optimization and to assess the properties of the optimized DNA sequence in the host like Codon Adaptation Index (CAI), GC content of the sequence, and Codon.
"Reverse Translate" receives a protein sequence as an input and makes use of a standard codon usage table to develop a DNA coding sequence which is most likely non-degenerate coding sequence. The default standard codon usage table was developed using all the E. coli coding sequences in GenBank and was retrieved from Codon Usage Database.