A Candidate Multi-Epitope Vaccine Against Pathogenic Chandipura Vesiculovirus Identified using Immunoinformatics.


 Chandipura vesiculovirus (CHPV) is a rapidly emerging pathogen responsible for causing acute encephalitis. Due to its widespread occurrence in Asian and African countries, this has become a global threat, and there is an urgent need to design an effective and non-allergenic vaccine against this pathogen. The conventional method of vaccine design involves large proteins or whole organism which leads to unnecessary antigenic load with increased chances of allergenic reactions. In addition, the process is also very time consuming and labour intensive. These limitations can be overcome by peptide-based vaccines comprising of short immunogenic peptide fragments that can elicit highly targeted immune responses, avoiding the chances of allergenic reactions, in a relatively shorter time span. The multi-epitope vaccine constructed using CTL, HTL and IFN-γ epitopes was able to elicit specific immune responses when exposed to the pathogen, in-silico. Not only that, Molecular Docking and Molecular Dynamics Simulation studies confirmed a stable interaction of the vaccine with the immune receptors. Several physicochemical analyses of the designed vaccine candidate confirmed it to be highly immunogenic and non-allergic. The computer-aided analysis performed in this study suggests that the designed multi-epitope vaccine can elicit specific immune responses and can be a potential candidate against CHPV.


Introduction
Chandipura vesiculovirus (CHPV), associated with an encephalitic illness in humans, is a member of the Rhabdoviridae family, rst discovered by Bhatt and Rodrigues in 1966. It was reported from two febrile cases during an outbreak of Dengue and Chikungunya in Nagpur, Maharashtra, India [1]. Though maximum cases are reported in India [2][3][4][5], studies have also shown its presence in Asian [5,6] and African countries [5,7,8]. It also infects many mammalian species around the globe [9]. Due to its worldwide occurrence, it has gained global attention as an emerging neurotropic pathogen, imposing a mortality rate ranging from 55-77% [2,10]. The complete genome sequencing makes it evident that the uniqueness of the virus has become a new threat to humanity and what seems to be a tropical outbreak can very soon be a global one [11].
CHPV was isolated from sand-ies, which are the potential vectors for transmitting the disease [12,13]. The viral genome is ~ 11 kb [14] consisting of single stranded RNA coding for ve polypeptides namely, Nucleocapsid protein N, Phosphoprotein P, Matrix protein M, Glycoprotein G and Large protein L [5,11,15,16]. L and P proteins together act as a viral RNA dependent RNA polymerase (RdRp) and matrix protein links the encapsidated genome RNA with the phospholipid bi-layer [17]. Spike like glycoprotein present in the membrane act as the major antigenic determinant [18].
Thus, glycoprotein becomes a potent choice for vaccine designing [19]. A recombinant vaccine against CHPV has been developed in mice using the entire G gene [19], but it lacks human trials [10]. Hence, a licensed vaccine is still not available [10]. Considering the global threat imposed by the virus, there is an urgent need to develop a highly effective vaccine against CHPV.
The conventional method of vaccine design involves large proteins or whole organism. The major drawback of this approach is that it leads to unnecessary antigenic load and also increases the chances of allergenic responses [20]. This limitation can be overcome by peptide-based vaccines comprising short immunogenic peptide fragments that can elicit highly targeted immune responses, avoiding the chances of allergenic reactions [20]. Recent advancements in computational biology are very effective in designing various approaches for vaccine synthesis [21][22][23][24]. One of these advancements is the emerging role of immunoinformatics tools in effective vaccine designing [25][26][27][28]. Several antigenic epitopes predicted using various immunoinformatics tools have potential translational implications [25,26]. These epitopes are used for the construction of a multi-epitope vaccine which can activate both humoral and adaptive immune responses [29,30]. The immunoinformatic approach is time saving as well as cost effective and has the potential to ensure a successful vaccine design [31]. The techniques used in this approach have the potential to identify candidate proteins that might be overlooked by conventional experimentation [32]. Hence, in this study the immunoinformatic approach is used to develop a multi-epitope, prophylactic vaccine against CHPV consisting cytotoxic T lymphocyte (CTL), helper T lymphocyte (HTL) and interferon-γ inducing (IFN-γ) epitopes (Fig. 1).

Methodology Selection and retrieval of target protein
The VIPR database (https://www.viprbrc.org/brc/home.spg?decorator=vipr) was used to retrieve the glycoprotein sequences of the viruses belonging to the genus Vesiculovirus of the Rhabdoviridae family in order to investigate their evolutionary relationship. These amino acid sequences were aligned implementing the MUSCLE program in MEGA X [33]. The alignment le obtained was further used for the phylogenetic analysis. The phylogenetic tree was constructed using Maximum likelihood algorithm with bootstrap resampling of 1000 replicates in MEGA X [34]. The evolutionary history was inferred by using the Maximum Likelihood method and JTT (Jones-Taylor-Thornton) matrix-based model [35]. The initial tree for the heuristic search was obtained automatically by applying Neighbour-Join and BioNJ algorithms to a matrix of pairwise distances estimated using JTT model, and then selecting the topology with superior log likelihood value.
Analysis of the physicochemical properties of the target proteins.
T cell epitope prediction.

HTL epitopes prediction
Net MHC II pan 3.2 server (www.cbs.dtu.dk/services/NetMHCIIpan/) was used to predict the 15-mer long T cell epitopes, which have been identi ed by HLA Class II alleles [44]. The peptides had been classi ed as strong, intermediate and nonbinders based on the idea of percentile rank as given by Net MHC II pan 3.2 server. The threshold value for strong, weak and non-binding epitopes were set at 2, 10 and more than 10%, respectively. NetMHC II pan 3.2 server predicts peptide binding on human HLADR, HLA-DQ and HLA-DP alleles using Arti cial Neural Network (ANN) system.
Prediction of IFN-γ epitopes-IFN-γ plays a key role in provoking anti-tumour, antiviral and immune regulation activities. Thus, it is important to identify epitopes with the capacity to induce IFN-γ in order to model an e cient multi-epitope vaccine. IFNepitope server (http://crdd.osdd.net/raghava/ifnepitope/) was used to predict out the IFN-γ epitopes from the target protein [46]. The server functionality is focused on a dataset which includes IFN-γ inducing and non-inducing MHC class II binders. Predictions were made by the use of a combination of various approaches, such as machine learning strategy, motive-based analysis and accuracy hybrid approach having maximum accuracy of 81.39% [46].

Multi epitope vaccine construct, Structural Modelling, Re nement and Validation
An immunological adjuvant, Cholera Toxin subunit B (CTB) was added to the N-terminal of the vaccine construct by EAAAK linker as it has the ability to induce regulatory responses. It shows a nity to monosialotetrahexosylganglioside (GM1) which is distributed in all immune cells [48]. The screened CTL, HTL and IFN-γ epitopes were linked together by glycine-proline rich GPGPG linkers. 3D structure of the linear construct was generated using trRosetta [49] which produced a nal model after re nement. The model was then evaluated using ERRAT score [40], Verify3D score [50], RAMPAGE for Ramachandran Plot analysis [38] and ProSA generated Z-score [39] for determining the overall quality of the vaccine construct.
Antigenicity, allergenicity and physicochemical properties of the vaccine construct The antigenicity of the nal vaccine construct was checked by VaxiJen v2.0 [51] and ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) [52]. VaxiJen v2.0, an antigen-free alignment method, focuses on auto-cross covariance (ACC) transformation of protein sequences into standardized vectors with key amino acid properties. This server uses viral databases to derive predictive models where each dataset constituted of 100 known and 100 nonantigenic antigens. Internal leave-one-out cross-validation and external validation using data sets is used to check the generated models. These models behave well in both validations with predictive accuracy varying from 70-89% [51]. ANTIGENpro is an alignment-free, sequence-based and pathogen independent server which utilizes protein antigenicity microarray data to predict the antigenicity. It correctly classi es 82% of the known protective antigens when trained using only the protein microarray datasets. The accuracy on the combined dataset is estimated at 76% by cross-validation experiments [52]. AllerTOP v2.0 (https://www.ddg-pharmfac.net/AllerTOP/) [53] and AllergenFP (http://ddgpharmfac.net/AllergenFP/) [54] were used to determine the allergenicity of the vaccine construct. AllerTOP server utilizes auto-cross-covariance (ACC) sorting of protein sequences into standardized vectors of equal-length. This was extended to analyses of peptides of various lengths of quantitative structure-activity relations (QSAR). AllerTOP uses K-nearest neighbour algorithm (kNN, k = 1) to classify the proteins based on a training set consisting 2427 recognized allergens from different species and 2427 non-allergens [53]. AllergenFP server uses ve E-descriptors which de nes the amino acids in the protein sequences into data sets, and converts the strings into standardized vectors by auto-cross covariance (ACC) transformation. The E-descriptors of the amino acids were extracted from the key component analysis of a data matrix comprising of 237 physicochemical properties. AllergenFP categorizes a protein as an allergen or non-allergen based on the maximum Tanimoto coe cient, ratio of the intersecting ratio set to the union set as a similarity indicator [54]. Isoelectric point, molecular weight, and other parameters like half-life, instability index, aliphatic index and GRAVY score of the vaccine construct was calculated from ExPASy server [63]. TMHMM server v2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and SignalP4.1 (http://www.cbs.dtu.dk/services/SignalP-4.1/) were used to search any potential transmembrane helices [55] or any peptide signals in the nal vaccine construct [56].
Docking of TLR 1, TLR 2, and TLR 4 with the vaccine Molecular docking is one of the most important tools to understand the protein-protein interaction. For the generation of an appropriate immune response, interaction of the antigenic molecule or vaccine with the target immune cell receptor is necessary. Hence, the binding pattern of TLR1, TLR2 and TLR4 with the multi-epitope vaccine was analysed as they localise at the cell surface and gets activated [59,60]. Both TLR1 and TLR2 structures were obtained from the protein data bank (ID 2Z7X). Similarly, PDB ID 3FXI was used to obtain the TLR4/MD2 heterotetramer structure from protein data bank.
Based on the lowest HADDOCK score, docked clusters were formed, and the best cluster was identi ed. The best representative structure from the clusters was subjected to molecular re nement using HADDOCK Re nement Interface. Active and passive residues for the interaction were predicted by CPORT [61]. PyMOL (Schrodinger) was used for visualization of the result. PDBsum (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) was used in order to map the interacting residues between the vaccine and the TLRs [62].

Energy minimization and Molecular dynamics simulation
Vaccine construct-GROMACS (GROningen MAchine for Chemical Simulations) a command line Linux-based program was used for the Molecular Dynamics Simulation (MDS) and energy minimization [63]. The vaccine structure was subjected to MDS in order to mimic the experimental conditions. In this study, the physical conditions like the temperature and the pressure was mimicked using the canonical ensemble, NVT and the isobaric and isothermal ensemble, NPT. The temperature used for the simulation was 300K and the pressure used was of 1 bar in order to mimic the experimental conditions. The topology le was generated using OPLS-AA (Optimized Potential for Liquid Simulation-All Atom) force eld constraint for energy minimization and equilibration. The structure was placed at a distance of 1 nm from the cube's edge which was lled with water molecules in order to generate its periodic image 2 nm apart. An equilibrated three-point water model, spc216 was used as the solvent to simulate the vaccine with periodic boundary conditions. The net charge of the vaccine construct was evaluated, and the system was neutralized by the addition of charged ions. The energy minimization process was performed, and the energy minimized structure was obtained. The NVT and NPT equilibration was conducted for 1000 pico-seconds (ps) consisting of 500,000 steps in order to stabilize the temperature and pressure of the system. A 30 ns Molecular Dynamics (MD) simulation run was performed for the energy minimized structure in order to nd the Root Mean Square Deviation (RMSD) of backbone and Root Mean Square Fluctuation (RMSF) of side chain. The radius of gyration was plotted to check the compactness of the protein structure.
Xmgrace, a linux based software, was used to visualise the graphs generated from the simulation [64].
Vaccine-TLR4 complex-The energy minimization process of the vaccine-TLR4 complex was also performed similar to that of the vaccine construct and accordingly, the energy minimized structure was obtained. The NVT and NPT equilibration was performed for 1000 pico-seconds comprising of 500,000 steps for stabilizing the temperature and pressure of the system. A molecular dynamics simulation run of 30 ns of the energy minimized structure was performed to nd the Root Mean Square Deviation (RMSD) of backbone and Root Mean Square Fluctuation (RMSF) of side chain.

Reverse translation and codon optimization and in silico cloning
Reverse translation and codon optimization was performed by the Java Codon Adaptation Tool (JCat) (http://www.jcat.de/) to attain the reverse translated cDNA sequence [65]. This cDNA is optimized for the vaccine to express in E. coli K-12 strain. The results include GC content and the codon adaptation index (CAI). CAI score is ideal when it is 1, though scores more than 0.8 is also considerable [66]. GC content of the vaccine should range from 30-70%, which indicates good translation and transcription e ciencies in E. coli [65]. Eventually, the SnapGene tool was used to insert the optimized multi-epitope vaccine sequence into the pET-28a (+) vector.

Immune simulation
To further characterize the chimeric peptide's immunogenicity and immune response pro le, the C-ImmSim server (https://www.iac.cnr.it/~ lippo/projects/c-immsim-online.html) was used to perform in silico immune simulations [67]. C-ImmSim is an agent-based model for immune response prediction that uses position-speci c scoring matrices (PSSM) derived from machine learning techniques for predicting immune interactions [67]. For most of the vaccines currently in use, the minimum recommended time between the dose 1 and dose 2 is 4 weeks [68]. The entire simulation ran for 1400 time steps which are about 15 months (a time step is about 8 hours). 12 peptide injections were given four weeks apart at time step 10, 94, 178, 262, 346, 430, 514, 598, 682, 766, 850, 934. Then a live virus was injected at time step 1100, which is about 12 months after the simulation starts.

Selection of virulent protein
The VIPR database was utilized for the retrieval of the amino acid sequence of the target glycoprotein of CHPV (Accession number -YP_007641380.1). A phylogenetic tree was constructed using all the glycoproteins of different viruses of the genus Vesiculovirus belonging to Rhabdoviridae family in order to check the evolutionary relationship of CHPV glycoprotein with the glycoprotein of other vesiculovirus (Fig. 2). The CHPV glycoproteins clustered together in a single clade indicating that all the CHPV glycoproteins are closely related to one another with minimum variations (Fig. 2). In addition, a phylogeny tree was also constructed using all the glycoproteins of different strains of CHPV ( Supplementary Fig. S1). The multiple sequence alignment by Clustal Omega generated a percentage similarity table which showed that all the glycoproteins from different strains of CHPV have high percentage of similarity amongst each other, the highest being 100% and lowest being 88.55%. The results suggest that a vaccine designed against one strain of CHPV would be effective against all the other strains of the virus. The accession numbers and the percentage similarity table of all the strains of CHPV glycoprotein are shown in Supplementary Table 1. The target glycoprotein was also checked for human proteome hit recognition in order to assess the risk of autoimmune reactions. The database reference proteins (refseq_protein) using BLASTp (protein-protein BLAST) were used for the identi cation of hits in the human proteome, which con rmed the glycoprotein showed no signi cant similarity with any of the proteins in human.
Homology modelling, re nement and validation of the glycoprotein Since there is no complete structure of Chandipura glycoprotein involving entire 530 amino acids and the PDB structure available for the glycoprotein of this virus involves only 419 amino acids [69], some of our predicted epitopes couldn't be located in the same. Therefore, a 3D model was constructed using the entire glycoprotein of CHPV by Raptor X homology modelling tool (Fig. 3), and the 3D model was used to locate the epitopes selected in this study (Fig. 4). GalaxyRe ne server predicted 5 re ned models of the glycoprotein from which Model 5 was selected (Supplementary Table 2). In the chosen model, the favoured regions in the Ramachandran plot were 98.7% ( Supplementary Fig. S2), ERRAT score was 89.03 ( Supplementary Fig. S2) and Z-Score was − 6.07 ( Supplementary Fig. S2). The molecular weight of the glycoprotein was found to be 59.03 kDa by ExPASy ProtParam server.

Prediction of CTL and HTL epitopes
For the prediction of CTL epitopes, the NetCTL 1.2 [42] and IEDB consensus methods [43] were used (Table 1)  (Supplementary Table 3), whereas the HTL epitopes were predicted using NetMHC II pan 3.2 server ( Table 2) (Supplementary Table 4) [44]. In addition, the identi ed epitopes that were expected to have high binding a nities with HLA class I and class II alleles were subjected to speci c immune lters to screen out the strongest possible epitopes. The parameters for selecting the best possible epitopes were: (i) epitopes should be immunogenic (ii) should be promiscuous (iii) should be antigenic and (iv) should have high population coverage. This study led to the identi cation of promiscuous and overlapping epitopes (Supplementary Table 5) based on all the above mentioned criteria. The overlapping epitopes were antigenic, as predicted using VaxiJen v2.0 server.  Multi epitope vaccine construct, structural properties and re nement.
The nal multi-epitope vaccine was designed by combining 6 CTL, 7 HTL (Tables 1 & 2) and 3 IFN-γ (Supplementary Table 6) epitopes via GPGPG linkers that prevented the chance of any junctional epitope formation [70]. To the nal vaccine construct, the CTB adjuvant was linked at the N-terminal of the vaccine with EAAAK linker (Fig. 5A). The criteria for the incorporation of the epitopes in the multi-epitope vaccine construct were: (i) they should have overlapping HTL and CTL epitopes (Supplementary Table 5), (ii) should be immunogenic, (iii) should have high a nity to HLA alleles, (iv) they should be promiscuous and (v) should have high population coverage. The nal vaccine construct thus designed, consisted of 388 amino acids. The 3D model of the nal vaccine construct was made using trRosetta web server [49] ( Fig. 5B). Since all of the models were already re ned by trRosetta, further re nement of the models was not required. All the models predicted by trRosetta were subjected to various structural analyses in order to select the best suited model. The nal model chosen revealed a Z score of -7.69 that was within the range of scores of comparable sized proteins ( Fig. 5D) [39]. Ramachandran plot analysis showed 93.5%, 3.9% and 2.6% residues in favoured, allowed and outlier regions, respectively (Fig. 5E), which veri ed the overall quality of the nal multi-epitope vaccine construct [38]. Verify3D predicted that 81.44% of the amino acids scored > = 0.2, which further supported the high quality structure of the vaccine model ( Fig. 5C) [50]. ERRAT of the already re ned vaccine construct projected a score of 59.5819 which further veri es the overall quality of the vaccine construct ( Supplementary Fig. S3) [40].

Physicochemical properties of the nal vaccine construct
The linear construct of the multi-epitope vaccine was predicted to be antigenic by VaxiJen v2.0 and AntigenPro with a score of 0.6334 and 0.857163, respectively. It was found to be non-allergenic as predicted by AllerTOP and AllergenFP.
Evaluation of the physicochemical properties predicted by ExPASy (Supplementary material SM1) showed that the multiepitope vaccine construct has a molecular weight of around 40 kDa. The calculated instability index was found to be 13.11(i.e., < 40) which implies that the vaccine is stable. The theoretical pI and aliphatic index was found to be 8.73 and 79.95, respectively. The vaccine is projected to be thermostable at different temperatures due to high aliphatic index value. The estimated half-life of the vaccine is 30 hours in mammalian reticulocytes, > 20 hours in yeast and > 10 hours in Escherichia coli. The grand average hydropathicity (GRAVY) value was recorded as 0.083, indicating the polar nature of the vaccine construct and there was no signal peptide detected which hence will prevent protein localization ( Supplementary Fig. S4). Since no transmembrane helix has been identi ed in the designed vaccine construct, no expression di culties are anticipated in the vaccine development ( Supplementary Fig. S5).

Prediction of B cell epitopes
The ElliPro server was used to predict linear/continuous as well as conformational/discontinuous B cell epitopes with the default parameters as shown in Table 3 and Table 4, respectively. The predicted B cell epitopes within the vaccine construct were visualized using PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.) ( Supplementary Fig. S6). Furthermore, the IFN-γ inducing epitopes were predicted using IFNepitope server from the target glycoprotein (Supplementary Table 6).   and HLA-DRB1*15:01 (HLA Class II allele), respectively in order to evaluate their binding patterns (Fig. 6).
Docking of TLR1, TLR2, and TLR4 with the vaccine The HADDOCK (High Ambiguity Driven protein-protein Docking) 2.4 server performed the molecular docking of the energy minimized vaccine construct with TLR1, TLR2 and TLR4.
Docking with TLR4 -HADDOCK clustered 63 structures in 6 cluster (s), which represents 31.5% of the water re ned HADDOCK generated models. The top cluster with the lowest HADDOCK score is the most reliable cluster of all. Therefore, a representative model from this top cluster has been subjected to re nement. The HADDOCK re ning server grouped the resulting 20 structures into one cluster, accounting 100% of the water re ned HADDOCK generated models. The statistics of the re ned cluster is shown in Table 5A, and the details of the structural analysis are provided in supplementary gure S9.
The docked structure along with amino acid interaction is shown in Fig. 7. The detailed overview and the interaction with amino acid sequences are given in supplementary gure S10 and supplementary material SM2, respectively. The secondary structure of the docked complex was predicted ( Supplementary Fig. S8), and further Ramachandran plot and Z-score analysis were carried out for structural validation of the docked complex ( Supplementary Fig. S7). Docking with TLR2-HADDOCK clustered 147 structures in 20 cluster(s), which represents 73% of the water re ned HADDOCK generated models. The top cluster with the lowest HADDOCK score is the most reliable cluster of all. Therefore, a representative model from this top cluster has been subjected to re nement. The HADDOCK re ning server grouped the resulting 20 structures into one cluster, accounting 100% of the water re ned HADDOCK generated models. The statistics of the re ned cluster is shown in Table 5B, and the details of the structural analysis are provided in the supplementary gure S13. The docked structure along with amino acid interaction is shown in Fig. 8. A detailed overview and the interaction with amino acid sequences are given in supplementary gure S14 and supplementary material SM3, respectively. The secondary structure of the docked complex was predicted ( Supplementary Fig. S12), and further, Ramachandran plot and Z-score analysis were carried out for structural validation of the docked complex ( Supplementary Fig. S11) HADDOCK generated models. The top cluster with the lowest HADDOCK score is the most reliable cluster of all. Therefore, a representative model from this top cluster has been subjected to re nement. The HADDOCK re nement server grouped the resulting 20 structures into one cluster, accounting 100% of the water re ned HADDOCK generated models. The re ned cluster statistics are shown in Table 5C, and the details of the structural analysis are provided in supplementary gure S17. The docked structure along with amino acid interaction is shown in Fig. 9. A detailed overview and the interaction with amino acid sequences are given in supplementary gure S18 and supplementary material SM4, respectively. The secondary structure of the docked complex was predicted ( Supplementary Fig. S16), and further Ramachandran plot and Z-score analysis were carried out for structural validation of the docked complex ( Supplementary Fig. S15). Energy minimization and Molecular dynamics simulation of vaccine and vaccine-TLR4 complex Molecular Dynamics Simulation is used for studying the thermodynamic properties and time dependent phenomena [71,72]. In this study Molecular Dynamics Simulation was conducted in order to study the stability of the vaccine and vaccine-TLR docked complex at varying thermo-baric conditions. The energy components, density, pressure, temperature and potential energy were studied using varying parameters.
The 3D structure of the multi-epitope vaccine was subjected to molecular dynamics simulation using GROMACS (GROningen MAchine for Chemical Simulations). The OPLS-AA force eld was applied, and the vaccine construct mass was found to be 40012.04 a.m.u. "gmx solvate" command in GROMACS, was used to add 59685 solvent molecules to the system. Since the overall charge on the protein was calculated to be + 3, three chloride ions were added to the system for neutralization. Energy minimization was conducted for 1979 steps where the force reached < 1000 kJ / mol. The potential energy of the vaccine was evaluated during the energy minimization step. The plot for the same demonstrates nice and steady convergence of potential energy during this step ( Supplementary Fig. S19). The energy minimized structure with a consistent low energy level of -3.2e + 06 kJ/mol supported it to be a reasonable starting structure for the further steps of MDS. The structure was then subjected to an equilibration phase for 1000 ps, in order to study its various thermodynamic properties and the effect of temperature and pressure on the system. The rst equilibration phase was conducted under the NVT ensemble in order to check if the system remained stable at the desired temperature of 300K.
The temperature of the system quickly reached 300K and was maintained for the equilibration period with minimal uctuations, indicating the stability of the system at 300K (Supplementary Fig. S19). The system was then subjected to a second equilibration phase under the NPT ensemble in order to check the stability of the system at a pressure of 1 bar.
The pressure of the system was maintained at 1 bar and the negligible uctuations observed in the plot ( Supplementary  Fig. S19) indicated that the structure also remains stable at a pressure of 1 bar. The average density of the system computed was 1007.4 kg/m 3 with a total drift of 0.3 kg/m 3 .
Similarly, energy minimization was performed for the vaccine-TLR4 complex. The potential energy of the system was found to be -1.1e + 07 kJ/mol (Supplementary Fig. S20). The consistent low energy level of the energy minimized structure supported it to be a reasonable stable structure for the next steps of the simulation. The structure was also subjected to an equilibration phase for 1000 ps in order to study the effect of temperature and pressure on the system. The equilibration was conducted using the same method as described above. The result for the NVT equilibration of the vaccine-TLR4 complex indicated that the temperature of the system very quickly reached 300K and was maintained for most of the equilibration phase, indicating its stability (Supplementary Fig. S20). Similarly, the NPT equilibration of the vaccine-TLR4 complex suggested that the system remains stable at an atmospheric pressure of 1 bar which is evident from the very little uctuations as observed in the plot (Supplementary Fig. S20). The computed density for the vaccine-TLR4 complex was found to be 1011.6 kg/m 3 with a total drift of 0.6 kg/m 3 .
In the next step, a trajectory analysis was performed for 30 ns in order to monitor the overall stability of the vaccine and vaccine-TLR4 complex. The RMSD plot of the vaccine obtained after the simulation period of 30 ns showed successive uctuation up to 10 ns, and then it reached a steady state for the rest of the simulation (Fig. 10A). The RMSD plot of the vaccine-TLR4 complex indicated that the complex endured uctuations until 12 ns and then reached a steady state which remained stable for the rest of the simulation (Fig. 10C). This trend arises from the efforts of TLR4 and the vaccine molecules to get the best position relative to each other so that they can obtain the most appropriate interaction during the MDS run. The RMSD value for the vaacine-TLR4 complex was much lower when compared to the RMSD value of the vaccine alone. This is a clear indication of the fact that the vaccine and TLR4 has formed strong interactions among themselves which led to the greater stability of the complex.
RMSF was another parameter which was evaluated in order to check the regions with greater exibility as well as stability. The RMSF plot of the vaccine showed that various parts of the vaccine molecule had different uctuations (Fig. 10B). In particular, Pro372, Leu383, Phe385, His387 and Thr388 of the vaccine showed high degree of uctuation during the MDS run. The RMSF plot for the vaccine-TLR4 complex was almost stable with minimal uctuation (Fig. 10D). The highly uctuating residues (Pro372, Leu383, Phe385, His387 and Thr388) obtained from the RMSF data of the vaccine showed very little uctuation when compared to the RMSF data of the vaccine-TLR4 complex. This indicates that these residues tried to modify their interaction with the TLR4 and this modi cation led to the reinforcement of appropriate interactions between the vaccine and TLR4 protein, which led to the stability of the system. This result is further supported from our data obtained from studying the docking interactions between the vaccine-TLR4 complex, where the same residues of the vaccine (Pro372, Leu383, Phe385, His387 and Thr388) have formed hydrogen bonds with TLR4 protein.
The changes in the radius of gyration of vaccine and vaccine-TLR4 was monitored in order to evaluate the compactness of the protein structure during the MDS run of 30 ns. The plot for radius of gyration of both vaccine and vaccine-TLR4 complex indicated the compactness of the structure and it can be concluded that the intra and intermolecular interactions between the vaccine and TLR4 led to the compaction of both TLR4 and the designed vaccine ( Supplementary Fig. S19 & S20).

Reverse translation and codon optimization
GC content in the reverse translated cDNA was found to be 54.89%, which is admirable as it falls between expected ranges of 30-70% [65]. The CAI (Codon Adaptation Index) value was calculated to be 1, meaning the multi-epitope vaccine construct ventures high level expression in E. coli K-12 strain [66]. The cDNA of the vaccine was then inserted into pET-28a (+) vector for restriction cloning (Fig. 11).
Immune simulation C-ImmSim immune server was used for generating an in silico immune response. The generated response was compatible with the actual immune responses as shown in Fig. 12. The graphs showed signi cantly higher secondary and tertiary responses as compared to the primary response. Elevated levels of immunoglobulin activity (IgG1 + IgG2, IgM and IgG + IgM antibodies) were observed in both secondary and tertiary responses with a subsequent decline in antigenic concentration. Additionally, multiple B cell isotypes were found with long lasting behaviour, suggesting possible class switching and memory development (Fig. 12A & B) (Supplementary Fig. S21).A relatively high response was observed in populations of TH (helper) and TC (cytotoxic) cells with pre activation of TCs during vaccination (Fig. 12D & C) ( Supplementary Fig. S21). Higher macrophage activity was demonstrated during exposure, while dendritic and NK cell activity was found to be consistent (Supplementary Fig. S21). High levels of IFN-γ and IL-2 were also evident which says that a good immune response has been generated (Fig. 12E). In order to check the e cacy of the vaccine, a live replicating virus was injected at around day 366. The results from the antigen graph (Fig. 12A) show that when a live replicating virus is injected after vaccination, the antigenic surge is virtually absent. This is a clear indication of the fact that an effective immune response has been generated mainly because of the high concentration of speci c antibodies. In order to check the effectiveness of the vaccination, a control simulation was also performed which consisted of an injection of the live virus at time zero. The results indicate that though a naive immune response is elicited, it is not su cient to eradicate the virus in absence of vaccination ( Supplementary Fig. S22).

Discussion
CHPV, which causes encephalitis, has been regarded as an emerging tropical pathogen with fatality rate of 55-77%, predominantly affecting children of age group 2-16 years [73]. There is no speci c treatment available for the disease till date but a symptomatic treatment is done using mannitol to reduce the brain edema [16]. Hence, development of an effective vaccine has become the need of the hour. A recombinant vaccine has been developed using the complete G gene of the CHPV isolate [19]. A β-propio lactone (BPL) inactive tissue culture based vaccine has also been developed [74]. Unfortunately, for both types of vaccines developed so far, clinical trials have been performed only in mice but not in humans [10]. So far no vaccine is available against CHPV. Thus, aim of the present study was to design a multi-epitope, prophylactic vaccine targeting the CHPV glycoprotein which is primarily responsible for the virus envelopment, budding and antigenicity. Epitope-based vaccines represent a new strategy, developing immune response only against the selected epitopes, thus, avoiding the side effects of other unfavourable epitopes unlike the case of using complete antigen in recombinant vaccine technology [20,75,76].
The advances in the eld of immunoinformatics, along with the knowledge of host immune response is being successfully used for the development of epitope based vaccines against various pathogens [77,78]. Designing of epitope driven vaccine is a modern idea, which is successfully applied in several immunological studies for the development of vaccines [79]. Similarly, the present study is centred on designing of a multi-epitope vaccine because these vaccines have advantages over traditional and single-epitope vaccines due to the following unique features: i) TCRs from various T cell subsets can recognize the multiple MHC Class I and Class II epitopes ; ii) CTL, HTL and B cell epitopes may overlap, thus, it has the capacity to induce humoral and cellular immune responses simultaneously; iii) linking an adjuvant to the vaccine enhances the immunogenicity and provides long lasting immune response; iv) the likelihood of pathological immune responses or adverse effects is lowered because it is less likely to contain unwanted components [29,30,70,[80][81][82][83]]. Further, it has been demonstrated that multi-epitope combined vaccine induce stronger CTL responses compared to those induced by a single-epitope vaccine by enhancing cellular immunity and releasing immune tolerance [29]. The cellular and humoral responses generated by the multi-epitope vaccines are highly speci c with increased cytokines production [84][85][86]. A multi-epitope vaccine developed with such precautions can thus become a powerful tool in the battle against viral infections [87].
One of the troubles with the conventional approach of vaccine discovery is that many of the proteins expressed during infection are not always expressed in vitro, i.e., good candidate antigens might be overlooked [32]. In silico methods whereas, screens for all the probable candidate antigens, as predicted by various tools and algorithms which might otherwise remain undetected [32]. It is extremely important to pick the correct antigenic epitopes of the target proteins to be used in the building of multi-epitope vaccine through in silico methods [88]. The CTL, HTL and IFN-γ epitopes selected for the study were screened against various immunological lters (Table 1 & Table 2). The applied lters were: the epitopes should be promiscuous (bind with multiple MHC class I and MHC class II alleles), must be present on the surface of the target protein, must be immunogenic as well as antigenic. The promiscuous epitopes are those with sensitivity towards several HLA alleles. These epitopes are of paramount importance in vaccine formulation, as they are capable of developing an e cient immune response in the host, as they have a nity to several forms of HLA allele. Thus, the ltered out HLA class I and class II T cell epitopes were further evaluated for the study of promiscuity. In the present study, the epitopes expected to have a strong binding a nity to multiple HLAs were screened out and identi ed as promiscuous epitopes. The overlapping CTL and HTL epitopes have the potential to trigger cytotoxic T cells and helper T cells, which in turn generate immune responses. Allergenicity is one of the key issues faced during the production of vaccines. Hence, evaluation of allergenicity is necessary at an early stage of designing the vaccine. While developing the nal vaccine model, the screened out epitopes were rst subjected to an allergenicity assessment followed by the whole vaccine construct. The vaccine construct designed in this study was observed to be non-allergenic.
Other physicochemical features like molecular weight, instability index, theoretical pI, aliphatic index, GRAVY score and half-life of the vaccine were also checked. The molecular weight of the vaccine was found to be 40 kDa and the instability index calculated was 13.11 which indicated that the designed vaccine is quite stable. Generally, a protein whose instability index is smaller than 40 is predicted to be stable and values above 40 predicts that the protein may be unstable [41]. Moreover, the vaccine has exhibited a fair percentage of solubility in over-expressed conditions. The GRAVY index of the vaccine was 0.083 indicating the vaccine's polar nature and its effective interaction with water [89]. The high aliphatic index of 79.95 signi ed that the vaccine is a thermostable protein. The aliphatic index is commonly de ned as the relative volume of a protein which is occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine) [90]. The half-life of the vaccine is 30 hours in mammalian reticulocytes, > 20 hours in yeast and > 10 hours in Escherichia coli. The half-life of a protein is de ned as the time it takes for the concentration of the protein to be reduced by 50% after its synthesis in the cell. Similarly, Foroutan and his colleagues used the same array of in silico analysis in order to assess the allergenicity and physicochemical properties of their designed vaccine candidate against Toxoplasma gondii [91]. They have also performed laboratory validation of their vaccine candidate, which revealed that the multi-epitope vaccine was able to trigger strong humoral and cellular responses in mice [91]. The physicochemical properties predicted in our study were comparable to those predicted by Foroutan et al., in their recently published work [91]. In fact, the instability index and aliphatic index of our vaccine candidate was found to be better when compared to the values reported by [91]. The Ramachandran Plot, ERRAT score, Verify3D score and Z score analysis validated the overall quality of the vaccine construct (Fig. 5). Thus, after rigorous in silico analysis the nal vaccine construct was designed. A similar approach was used by Bazhan and his co-workers, where they have designed a T-cell multi epitope vaccine against Ebola virus. The Tcell epitopes were predicted using IEDB -Immune Epitope Database and the vaccine candidate constructed using the suitable epitopes were found to be immunogenic when expressed in mice [92].
As the CHPV glycoprotein is an envelope protein, the Toll-like Receptor-4 (TLR4) and Toll-like Receptor-2 (TLR2) expressed in the plasma membrane of the cells should primarily recognize the structural components of the virus [93][94][95]. CHPV regulates TLR4, which leads to the secretion of pro-in ammatory cytokines and Nitric oxide (NO) in monocytes-macrophage cell line of mouse [96]. In humans, TLR4 is expressed in immune cells such as monocytes, macrophages, granulocytes and immature dendritic cells [97]. Cholera toxin B (CTB) has been proven to act as a potential viral adjuvant [98][99][100]. Activation of TLR4 by CTB is presumably due to the direct interaction between TLR4 and CTB [101]. This conclusion was supported by the evidence that the capacity of CTB to induce in ammatory response is lost in TLR4-de cient macrophages. CTB is able to induce NF-κB activation in TLR4 receptor cells by direct binding, which has been demonstrated using ELISA-based assays [101]. Further, TLR2 has been associated with the recognition of viral envelope glycoprotein [93]. The core TLR2 signalling pathway utilizes myeloid differentiation factor 88 (MyD88) as the primary adaptor, and results in NF-κB and mitogen-activated protein kinase (MAPK) activation as well as secretion of a core panel of cytokines [93]. It has also been reported that TLR2 acts as heterodimer with TLR1 for the generation of innate immune response and has been shown to recognise viral proteins [59,[93][94][95]. TLR1/TLR2 dimer generates intracellular signalling via IRAK4 mediated activation of IRAK1/2, which results in the activation of NF-κB, p38 and JNK proteins in the cytoplasm, involved in triggering innate immune response [102].
Figure13. Proposed mechanism of working (A) TLR signal transduction pathway: TLR1/TLR2 heterodimer or TLR 2/2 homodimer utilizes MyD88 and MAL as primary adapters to activate NF-κB that triggers in ammatory cytokine secretion. TLR4 uses four primary adapters namely MyD88, MAL, TRIF and TRAM for NF-κB secretion which in turn induce in ammatory cytokine secretion activating IFN pathway. (B) The CTB activates and interacts with TLR4, expressed on macrophages, B cells and monocytes which up regulate the cytokine secretion. The other immune cells, such as NK-cells, T cells or other human monocytes, will also indirectly be stimulated by CTB. Furthermore, the CTL and HTL epitopes interact with HLA class I and HLA class II and thus form epitope-HLA complexes which in turn interact with CTLs and HTLs, activate them and induce their proliferation. The IFN-γ will induce IFN genes. The proposed vaccine is thus capable of stimulating both adaptive and innate immunity.
Hence, the interaction pattern of multi-epitope vaccine against TLR4/TLR2/TLR1 was analyzed using Molecular Docking Analysis (Figs. 7, 8 & 9). The docking analysis of TLR4 and the vaccine construct showed that there is 1 salt bridge and 12 hydrogen bonds formed during this interaction. The docked complex showed that the salt bridge was formed between Glu53 of TLR4 and Arg303 of the vaccine. Out of the 12 hydrogen bonds, 10 bonds were formed between TLR4 and the vaccine, and remaining 2 bonds were formed between the MD-2 co-receptor protein and the vaccine. Thus, docking studies indicate that both TLR4 and MD-2 are responsible for a stable interaction of the vaccine. Molecular Dynamics Simulation (MDS) for both the vaccine and vaccine-TLR4 complex was performed in order to assess the stability of the vaccine and the complex at various thermo-baric conditions. MDS results indicated that both the structures remained stable at a temperature of 300K and a pressure of 1 bar. A trajectory analysis of 30 ns revealed that both the structures remained stable during the simulation run of 30 ns (Fig. 10). However, the RMSD value obtained for the vaccine-TLR4 complex was much lower when compared to the RMSD value for the vaccine, indicating stable interaction between the vaccine and TLR4 protein. The RMSF plot for the multi-epitope vaccine showed various regions of high exibility for the vaccine, whereas the RMSF plot for the vaccine-TLR4 complex was almost stable with very little uctuation. The highly uctuating residues (Pro372, Leu383, Phe385, His387 and Thr388) obtained from the RMSF data of the vaccine showed very little uctuation when compared to the RMSF data of the vaccine-TLR4 complex. This indicated that these residues tried to modify their interaction with the TLR4 and this modi cation led to the reinforcement of appropriate interactions between the vaccine and TLR4 protein, which led to the stability of the system. The plots for radius of gyration also showed little variation which indicated the compactness of the protein structures due to inter and intra molecular interactions between the vaccine and TLR4 protein. The e cient cloning and expression of such a multi-epitope vaccine in a suitable expression vector is again a very important step. Hence in the present study, an in silico cloning was performed to assure effective expression and translation of the designed multi-epitope vaccine construct into an expression vector: pET-28a (+) (Fig. 11). Several research groups have recently applied similar strategy to design a multiepitope vaccine against Klebsiella pneumoniae [88], Kaposi Sarcoma [20], Pseudomonas aeruaginosa [103], Epstein Barr virus [104], Malaria [105], Hendra virus [106] and Nipah virus [107]. Similar approaches have also been used for developing vaccine against cancer antigens [22,[108][109][110] The proposed mechanism of action was also predicted for the nal vaccine model (Fig. 13). Since, the vaccine comprised of CTL, HTL, and IFN-γ epitopes, it could trigger the stimulation of the respective immune cells in the host, which could further activate other immune cells by complex signaling.

Page 22/41
CHPV is a fast growing pathogen that causes acute encephalitis. Applying computational methods in vaccine design will greatly improve the vaccine discovery process and can achieve that goal in less time and at lower cost. This study utilized the immunoinformatics approach to construct a novel and effective multi-epitope vaccine, which can deliver both humoral and cell-mediated immune response. After the application of various immune lters and a range of immunoinformatics techniques, the nal vaccine was constructed with 6 CTL, 7 HTL and 3 IFN-γ epitopes. The vaccine was predicted to be immunogenic and non-allergenic. Molecular docking, molecular dynamics simulation, and in silico immune simulation were used to check the nal vaccine's stabilization and immune responses. The results obtained can help in the design of an effective vaccine against CHPV infection.  A systematic representation of all the steps followed in the designed study.  Modeled structure of CHPV glycoprotein with sequence length of 530 amino acids as predicted by RaptorX.

Figure 4
Glycoprotein 3D models showing the surface position of the epitopes included in the multi-epitope vaccine construct.
CTL epitopes marked by red colour, HTL epitopes are marked by blue colour and IFN-γ epitopes are marked by green colour.   shown in black colour.

Figure 11
In silico restriction cloning. The red part represents the codon optimised multi-epitope vaccine insert into the pET-28a (+) expression vector (shown in black).  Proposed mechanism of working (A) TLR signal transduction pathway: TLR1/TLR2 heterodimer or TLR 2/2 homodimer utilizes MyD88 and MAL as primary adapters to activate NF-κB that triggers in ammatory cytokine secretion. TLR4 uses four primary adapters namely MyD88, MAL, TRIF and TRAM for NF-κB secretion which in turn induce in ammatory cytokine secretion activating IFN pathway. (B) The CTB activates and interacts with TLR4, expressed on macrophages, B cells and monocytes which up regulate the cytokine secretion. The other immune cells, such as NK-cells, T cells or other human monocytes, will also indirectly be stimulated by CTB. Furthermore, the CTL and HTL epitopes interact with HLA class I and HLA class II and thus form epitope-HLA complexes which in turn interact with CTLs and HTLs, activate them and induce their proliferation. The IFN-γ will induce IFN genes. The proposed vaccine is thus capable of stimulating both adaptive and innate immunity.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. DebetalCHPVAdditionalFile1.docx