Selection of antigens and sequence retrieval
To select the appropriate antigenic protein for this study, a literature search have been done and a few new antigens, which their potential in designing multi-epitope vaccines against M. tuberculosis have not been studied, were chosen. The protein sequence of the selected antigens including Rv2346 (P9WNI7), Rv2347 (P9WNI5), Rv3614 (P9WJD5), Rv3615 (P9WJD7) and Rv2031 (P9WMK1) have been retrieved from the Universal Protein Resource (Uniprot) database (http://www.uniprot.org/)(Bairoch 2005). Sequence of Flagellin of Salmonella enterica subsp. enterica serovar Dublin as adjuvant have also been retrieved with ID number of Q06971.
Prediction of MHC-I binding epitopes
MHC-I binding epitopes of every antigenic protein have been predicted using NetMHC 4 and Immune Epitope Database and Analysis Resource (IEDB) servers. Artificial neural network (ANN) algorithms have been employed to align sequences in the NetMHC 4 (http://www.cbs.dtu.dk/services/NetMHC/) server. The sequence alignment method of NetMHC 4 tolerates insertion and deletion in the alignment leading to higher performance compare to other strategies (Andreatta 2016). The prediction have been done with all default settings. The IEDB was also utilized to predict MHC-I binding epitopes (http://tools.iedb.org/mhci/). Various methods including ANN, stabilized matrix method (SMM), and combinatorial peptide libraries (CombLib), or NetMHCpan are employed to predict MHC-I binding epitopes in IEDB (Vita 2019).
Prediction of MHC-II binding epitopes
MHC-II binding peptides are important for the activation of CD4+ T-cells. The prediction of MHC-II binding epitopes have been done using IEDB (http://tools.iedb.org/mhcii/) and NetMHCIIpan 4 (https://services.healthtech.dtu.dk/service.php?NetMHCIIpan-4.0) servers. The IEDB utilizes various methods including NN-align, SMM-align, CombLib, and Sturniolo. A consensus approach predicts epitopes based on the availability of predictors for the molecule or NetMHCIIpan. The performance of the predictions have been demonstrated in several studies (Vita 2019). The NetMHCIIpan determines nine residues regions, which directly interact with MHC binding cleft to predict peptides with MHC-II binding potential quantitatively (Andreatta 2015).
Prediction of Linear B-cell epitopes
BepiPred 2 (https://services.healthtech.dtu.dk/service.php?BepiPred-2.0) was utilized to predict linear B-cell epitopes. A random forest algorithm is employed by BepiPred server to predict linear B-cell epitopes (Jespersen 2017).
Selection of the epitope segments
As above mentioned various epitope prediction servers have been employed to predict epitopes for every antigenic protein. To choose the appropriate epitopes for vaccine design, all results have been pooled and high rank regions with overlap between various methods have been selected. Furthermore, allergenicity, immunogenicity and IFN-γ inducing potential of the epitopes have been considered. To evaluate allergenicity of the epitopes, AlgPred server (https://webs.iiitd.edu.in/raghava/algpred/submission.html) have been employed. High accuracy of allergenic peptide/protein prediction in the AlgPred server is due to integration of various methods including blast, mast, IgEpitope and SVM to evaluate the allergenicity (Saha 2006). Antigenicity of the epitopes have been evaluated using the VaxiJen v2.0 server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html). The accuracy of the VaxiJen results varies ranging from 70% to 89% depend on the target organism. The principal chemical properties of the peptide/protein sequences instead of a sequence alignment approach is employed to assess antigenicity by VaxiJen server (Saha 2006).Potential of IFN-γ induction could improve development of immune response by the vaccine. IFN-γ inducing potential of the epitopes have been analyzed with IFNepitope server (http://crdd.osdd.net/raghava/ifnepitope/). Various approaches, such as machine learning technique, motifs-based search, and hybrid approach are utilized in INFepitope server. The maximum accuracy of 81% have been reported for hybrid model (Dhanda 2013). Finally, for every antigenic protein one non-allergenic MHC-I binding and one non-allergenic MHC-II binding epitope have been selected. All of the selected epitopes have also demonstrated antigenicity.
Construction of the vaccine and evaluation of its properties
The selected epitopes were joined together with appropriate linkers to construct the vaccine sequence. Furthermore, flagellin as toll-like receptor (TLR) agonist and TpD as universal T-helper epitope were added to the vaccine construct to improve its efficacy. The Solpro server (http://scratch.proteomics.ics.uci.edu/) was employed to evaluate solubility of the vaccine protein in Escherichia coli. Solpro which is based on a SVM approach, have demonstrated 74% overall accuracy in an experiment using multiple runs of 10-fold cross validation (Magnan 2009). Different physicochemical properties of the vaccine construct, including amino acid composition, molecular weight (MW), instability index, grand average of hydropathicity (GRAVY) and theoretical pI were estimated using the ProtParam tool (http://web.expasy.org/protparam/) (Gasteiger 2005).
Evaluation of antigenicity and allergenicity of the vaccine construct
Allergenicity and antigenicity of the vaccine construct were also evaluated. AllergenFP v.1.0 (http://ddg-pharmfac.net/AllergenFP/) were employed to estimate allergenicity of the vaccine construct. AllergenFP specify allergens from non- allergens with accuracy of about 88 %. It employs physicochemical properties of the molecules to develop a descriptor-based fingerprint approach (Dimitrov 2014). To evaluate antigenicity of the vaccine construct, ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) and VaxiJen v2.0 servers were employed. ANTIGENpro evaluate antigenicity of proteins by an alignment-free approach in which a final SVM classifier combined with various machine learning algorithms (Magnan 2010).
Homology modeling
To further investigate the interaction of the vaccine construct with immune system proteins such as TLRs and prediction of conformational B-cell epitopes, 3-dimential (3D) structure of the vaccine construct was modeled. 3D modeling have been done using I-Tasser software at http://zhanglab.ccmb.med.umich.edu/I-TASSER/. Four steps is followed to develop a 3D model of a protein in the I-Tasser server. First, templet proteins in protein data bank (PDB) with similar sequence to the query protein are identified with multiple alignment approaches. Second, protein structure is assembled using a modified replica-exchange Monte Carlo simulation method. Some regions including loops and tails could be modeled using ab initio approach. Third, structure decoys are clustered to select the model and fragment-guided molecular dynamics simulation (FG-MD) or ModRefiner is utilized to refine the model. Forth, the modeling is completed by a structure-based functional annotation using COACH approach. A confidence score (Cscore) is designated to the 3D models developed by I-Tasser. The higher value of Cscore demonstrates high confidence for a model. The models were retrieved from I-Tasser server in PDB format and Discovery studio 2020 was used to visualize the 3D structures and production of figures.
Refinement of the 3D modeled structure
Top two 3D models developed by I-Tasser server were refined by GalaxyRefine and 3Drefine servers. GalaxyRefine (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE) refines whole protein through mild and aggressive relaxation methods. In GalaxyRefine server, repacking of the protein side-chains is followed by short molecular dynamic simulations to relax the structure (Shin 2014). In the 3Drefine server, (http://sysbio.rnet.missouri.edu/3Drefine/) a two-step protocol refines the protein structure. At first, hydrogen-bonding network is iteratively optimized and then energy minimization at atomic level is performed using combination of physics and knowledge-based force fields (Bhattacharya 2016).
Validation of the 3D refined structures
To evaluate developed 3D structures in previous sections and select the best 3D model of the vaccine construct, ProSA-web Z-score, ERRAT value, and PROCHECK Ramachandran plot were analyzed. The ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) computes Z-score and plots calculated Z-score with the Z-scores of experimentally developed 3D structures deposited in PDB. To calculate Z-score, interaction energy of each residue with the rest of protein is calculated and is compared to a certain energy criteria (Wiederstein 2007). The Ramachandran plot have been retrieved from PROCHECH software accessible in the structural and verification analysis server SAVE (https://saves.mbi.ucla.edu/). In the Ramachandran plot residues are divided to allowed and disallowed regions based on the phi-psi torsion angles (Hollingsworth 2010). ERRAT value can also be calculated at SAVE server. Non-bonded atom–atom interactions in a database of reliable high-resolution crystallography structures is compared to the one in the query model to calculate ERRAT value (Colovos 1993).
Prediction of discontinuous B-cell epitopes
3D structure of the vaccine construct were implemented to ElliPro in IEDB database (http:// tools.immuneepitope.org/tools/ElliPro) to predict discontinuous B-cell epitopes. ElliPro identify B-cell epitopes by combination of a residue-clustering algorithm and Thornton’s method. ElliPro have demonstrated the best performance in comparison with six other algorithms predicting epitopes (Ponomarenko 2008).
Molecular docking of the vaccine with TLRs
Interaction of a vaccine with toll like receptors (TLR) can improve efficacy of the vaccine (Vijay 2018). Protein-protein docking have been employed to investigate interactions of the vaccine construct with TLR3, TLR4 and TLR8. Crystallographic 3D structures of TLR3 (1ZIW), TLR4 (4G8A) and TLR8 (3W3G) have been retrieved from the protein data bank (PDB; http://www.rcsb.org/pdb/) and were considered as receptor in the docking procedure. Protein-protein docking have been done using ClusPro (https://cluspro.bu.edu/login.php) (Kozakov 2013, Kozakov 2017, Vajda 2017, Desta 2020), PatchDock (https://bioinfo3d.cs.tau.ac.il/PatchDock/php.php) (Duhovny 2002, Schneidman-Duhovny 2005) and HawkDock (http://cadd.zju.edu.cn/hawkdock/) (Hou 2002, Zacharias 2003, Feng 2017, Weng 2019) servers. Outputs of docking with PatchDock were refined with FireDock as it is recommended in the PatchDock server. To select the best docking results among the outputs of various servers, free binding energy of the docked structures have been calculated using Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) at HawkDock server and the complex with the lowest free binding energy have been considered as the best docking result.
Codon optimization and in silico Cloning
The amino acid sequence of the vaccine construct have been reversely translated by sequence manipulation suite (https://www. bioinformatics.org/sms2/rev_trans.html) to develop a suitable gene sequence for cloning and expression. The properties of the gene sequence including Codon Adaptation Index (CAI), GC content, and Codon Frequency and Distribution (CFD) have been estimated by GenScript server (https://www.genscript.com/tools/rare-codon-analysis) (Yazdani 2020).