3.1. Selection of antigenic proteins, their retrieval and homology with human proteome
There exist two kinds of proteins in SARS-Cov-2 proteome, the non structural and the structural proteins. The non structural proteins principally participate in the process of viral replication, whereas the structural proteins function in the progression of viral assembly and infection within the host 51. Therefore, both classes of proteins should bear equal consideration while either developing a vaccine or a drug. In this study too, we considered a combination of both non structural and structural proteins for fabricating a vaccine. These proteins included non structural 2 (NSP2), non structural 8 (NSP8), helicase (H), Nucleocapsid protein (N) and receptor binding domain (RBD) of spike (S) protein. All of them were retrieved from NCBI database and evaluated for homology against the human proteome using a BLASTp search. The viral helicase displayed 22.4% of sequence identity to the human ZGRF1 isoform X10 protein; with a query coverage of just 34%. Since, the probability of cross reactivity is considered as very low, if the similarity between human and pathogen proteins is less than 40% 52 we were able to select helicase for our further studies. Similarly, the nucleoprotein, exhibited 55.56% identity to the human immunoglobin heavy chain junction region which although appears high, but considering the exhibited query coverage value of just 4%,and a high e-value of 5.4 in the BLASTp results, was deemed insignificant. On this basis nucleocapsid protein was also taken forward into the investigation. All the other proteins did not show any similarity against the human proteome. Further, these proteins were evaluated for their antigenic potential using the ANTIGENpro tool of the Scratch protein properties predictor server. All the proteins were found to be highly antigenic and therefore, were subsequently examined for epitope generation and vaccine designing (Supplementary Table 1).
3.2. Identification of B cell epitopes
The ABCPred server predicted the B cell epitopes against each selected protein. Based on their ranking and the highest score, the best epitope of 10 mer length for each protein was chosen for further investigations (Supplementary Table 2).
3.3. Identification of TH cell epitopes
The IEDB database was utilized to predict Helper T lymphocytes- MHCII recognised epitopes for each protein. The percentile rank andIC50 values were the main criteria utilized for selection of the predicted epitopes (specifically a cut off in percentile rank of less than or equal to 0.9 and anIC50value of below or equal 50). Lower the percentile rank and the IC50 value, better is the binding efficiency of the epitope 11. In order to cover maximum population possible, the complete human HLA reference set was stipulated in the parameter window. The best two epitopes for each protein were selected. The preferred TH cell epitopes with their IC50 value and the percentile rank score are presented in supplementary file in table 3.
3.4. Identification of TC cell epitopes
The Net-CTL server was applied to identify the TC cell epitopes. Barring the supertype selection, all other parameters such as threshold value (0.75), C-terminal cleavage (0.15), and TAP transport efficiency (0.05) were set to default values. The HLA supertype selection determines the specific and efficient perception of epitopes for T cell’s, aimed at several viral diseases, including SARS, MERS, HIV and MMR11. Based on the fact that A2, A3 and B7 HLA supertypes together cover more than 88% of the global population, epitopes were predicted by selecting these supertypes in the parameter window. The study resulted in identification of 15 epitopes, 3 each from every protein (Supplementary Table 4).
3.5. Prediction of antigenicity of the epitopes, assessment of toxicity and construction of the vaccine construct and its efficacy determination
All the epitopes were evaluated for their antigenicity using the VaxiJen server (Supplementary Table 2, 3 and 4). Those epitopes which exhibited VaxiJen scores below the threshold value of 0.4, were left out from further studies. In addition, the plausible toxicity of all the B and T cell epitopes was examined using the ToxinPred server; and all of them were predicted to be safe and non toxic (Supplementary Table 5, 6 and 7). The non toxic, B, TH and TC cell epitopes which exhibited VaxiJen scores higher than the threshold were used to create the vaccine construct.
The selected epitopes (antigenic peptides) were coupled together, utilizing the linkers to yield the vaccine candidate. These linkers, not only assist in maintenance of construct immunogenicity, but also in making the resulting protein complex more flexible and stable.11 The B cell epitopes were linked using linkers ‘KK’ while the TH cell and the TC cell epitopes were connected through ‘GPGPG’ and ‘AAY’ linkers, respectively. Finally, the ‘EAAAK’ linker joined the adjuvant - human beta defensin-3 (which is known to augment a sturdy B and T cell response29,28) to the assembled epitopes for generating the vaccine construct (Supplementary Figure 1). On evaluating the generated vaccine construct for its antigenicity, score values of 0.6033 and 0.836364 were obtained by VaxiJen and ANTIGENpro respectively, confirming the strong immunogenic potential of the proposed multi subunit vaccine.
3.6. Physiochemical properties and safety assessment of the designed Vaccine construct
Various physicochemical properties were determined for the vaccine construct using the ProtParam and PepCalc A tools. Both the tools yielded acceptable physicochemical properties for the vaccine construct. Briefly, the instability index was found to be 20.26 (should be below 40); half-life of the protein was predicted to be approximately- 30 hours in mammalian reticulocytes, >20 hours in yeast and >10 hours in E. coli. The calculated GRAVY (Grand average of hydropathicity) score was -0.273; the value of aliphatic index of the protein was 77.4, the instability index (II) was determined to be 20.26, and the construct was predicted to display good water solubility. Additionally, solubility upon recombinant expression was examined using the SOLpro tool, where the protein was anticipated to be soluble with a probability score of 0.793223. Table 1 depicts all these properties, evaluated by the mentioned tools.
The AlgPred server predicted the possible allergenicity for the vaccine construct. It applies six methods (IgE mapping, MEME/Mast motif, amino acid as well as dipeptide composition based SVM modules, BLAST inspection on ARPs and a hybrid approach that encompasses all these parameters) for determination of protein allergenicity. Among these methods, the SVM modules that take amino acid and dipeptide composition into consideration predicted the protein as allergenic; whereas the hybrid approach, MEME/Mast motif, BLAST search on ARPs andIgE surveying predictors classified the vaccine construct as non-allergenic. In addition, evaluation of the construct using the AllerTop and AllergenFP servers also identified the protein to be non-allergenic. Considering these results, the proposed vaccine candidate can be indexed as a probable non allergen.
3.7. Prediction of the trans-membrane helix
The trans-membrane helix forming regions, or the membrane spanning peptide portion of a protein is immunologically non-accessible. The vaccine construct has no noteworthy alpha-helix trans-membrane (TM) topology as concluded by the TMHMM probability plot (Supplementary Figure 2A). Using the TMPred web-server, scores greater than 500 are considered as crucial for TM helix prediction. The proposed vaccine construct exhibited three fragments, 89-110 (score 633) in inside-to-outside (i-o) orientation, 130-149 (score 562) inoutside-to-inside (o-i) orientation and 283-301 (score 1064) inside-to-outside (i-o) orientation when computed through the strongly preferred model (Supplementary Figure 2 B). Whereas the alternative model predicted two TM helices, viz. 92 -110 (score 1136) in outside-to-inside (o-i) orientation and 283-301 (score 1064) in inside-to-outside (i-o) orientation (Supplementary Figure 2B). The TOPCONS server which utilizes six algorithms to detect TM helix was also applied. Neither the TM helix nor a signal sequence was predicted in the vaccine construct by this server (Supplementary Figure 2C).
3.8. MHC cluster analysis
The potential MHC I and MHC II allele interactions with the epitopes in the vaccine candidate were examined by the MHCcluster server. The relationship of the cluster of alleles with the epitopes is represented in the form of a specificity heat map and tree (Supplementary Figure 3). The red and the yellow zone symbolize strong and weaker interactions respectively.
3.9. Tertiary structure prediction and quality assurance of the designed candidate vaccine
The 3Dpro module of Scratch protein predictor, web-server estimated the likely 3Dconfigurationof the proposed vaccine candidate. On analysis by structure visualization tools such as PyMol and UCSF Chimera, it was observed that the designed multi-epitope vaccine majorly comprised of alpha helices, with very little contribution from a beta sheet structure. Structural refinement was performed by Modeller 9.24 and the refined structure (Figure 1A) was validated using ProSA, PROCHECK and ERRAT2.ProSA estimates a global quality score for the structure provided as input; the structure is considered free of errors if the ProSA-z score falls within a limit that is representative of native proteins having similar number of residues. The ProSA score for the fabricated vaccine construct’s structure was observed to be -5.38 and as is visible from the figure1B and C, it falls well within the range, and thus is confirmed for its quality. The PROCHECK generated Ramachandran plot validated the quality of the 3D structure, as most - 96% of the residues were positioned in the most favoured regions and only 0.4% of them were localized in the disallowed regions (Figure 1D). Analysis by ERRAT2 predicted the overall quality factor as 93.233 (signified as proportion of the protein for which the computed rate of structural inaccuracy lies within the 95% elimination limit) (Figure 1E).
3.10. Molecular Docking of the Vaccine construct with TLR-3
For activation of an immune response a suitable interaction amidst the immune receptor and the antigen molecules is essential. Therefore, to assure that the designed vaccine candidate is capable of associating with TLR-3 we performed a docking study using HADDOCK web-server. HADDOCK clustered 40 structures of the docked complex in a total of 6 clusters, the cluster with the best statistics (RMSD values; Van der Waals, Electrostatic and Desolvation energy values) was found to have a Z score of -1.9; lowest of all the clusters. The more negative the Z score, the better is the interplay between the receptor and the ligand. The details of the topmost docked cluster are given in the table 2 and the binding of the vaccine candidate to TLR-3 is represented in figure 2.
3.11. MD Simulation
MD simulation for the docked complex of TLR-3 and the designed vaccine was evaluated through trajectory study. For the progression of 20 ns MD run, the stable trajectory was perceived and the representative structures were acquired. Different steps for energy minimization, pressure and density equilibrium, and temperature persistence were analysed (Supplementary Figure 4). The deviation of the backbone atoms for simulated structures, comparative to the starting structures, was utilized as a reference and was assessed through RMSD (Figure 3A). Steep RMSD variation during the entire simulation can be an insinuation of a malleable and free instinctive protein or the variation of the force field. Depending upon the effects of the RMSD assessment, Figure 3A signifies that the RMSD fluctuation stabilizes, around 10 ns into the MD simulation; and thus, the total time for simulation was adequate. In the time frame 10-20 ns, RMSD for TLR-3-vaccine candidate complex has an approximate value of about 0.7-0.8 nm.
The fundamental magnitude of this principle is attained by evaluating the disparities arising from modifications of each of the protein residues that majorly feature the most flexible chain frames. Hence, we validated the residual fluctuations by calculating the mean fluctuations for the stable trajectory of the simulation. The RMSF evaluation of all protein residues was attained in order to check the residues that may have inclined to an enhancement in the RMSD results (Figure 3B). Convincing fluctuations existed in the terminal residues, few β-sheet regions along with the loops linking the alternative β-sheets of TLR-3 (residues 330-350, and 490-550) to about 0.32 nm, and few α-helical sections of vaccine candidate (residues 130-210) to about 0.73 nm. Also, we noticed that residues 299-355 of TLR-3 showed comparatively reduced discrepancy in RMSF values; these residues form the binding segment for the corresponding vaccine candidate residues (175-272) after MD simulation. Interestingly, the binding region for TLR-3 and the designed vaccine before and after MD simulation was found to be similar, suggesting a strong binding interaction of the vaccine designed towards TLR-3, and provides stability to the docked complex (Figure 4A).
The radius of gyration investigation was achieved to determine the modification in compactness of TLR-3 and designed vaccine throughout the MD run. The Rg plot shows decrease in radius of gyration values and hence increase in compactness of the protein complex, persists after MD simulation (Figure 3C). The compactness of the TLR-3 protein complex is due to the strong binding interaction of the designed vaccine. Similar observations were determined through SASA analysis representing the solvent defined protein surface and its orientation through folding, making the alterations in the exposed and buried regions of the surface area of proteins. SASA values for the simulation was about 485 nm/S2/N after 15 ns (Figure 3D). Also, it remains similar till 20 ns suggesting that TLR-3-vaccine solvation profile shows a convincing SASA value suggesting a stable structure and strong binding interaction with the vaccine candidate.
Also, the hydrogen bond landscape was evaluated, which exposed the dynamic equilibration of the complex trajectory with a high number of hydrogen bonds, as shown in Figure 5. The consistent high numbers of hydrogen bonds were perceived which contributed significantly to the proximal binding of the designed vaccine with the TLR-3 receptor. Further, these results are strengthened by the vital contribution by the complex binding energies throughout the simulation run. These calculations with consistent high binding energies and large hydrogen bonds involvement demonstrate the stable binding of candidate vaccine with TLR-3.
Further, cluster analysis having a RMSD based cut-off value of 0.25 nm confirmed the development of 17 distinctive clusters for TLR-3 vaccine complex system. The most dominant cluster attained after 10 ns of MD simulation is depicted in supplementary figure 5. Also, a secondary structure investigation of the stable trajectory was implemented by the DSSP tool of GROMACS. TLR-3-vaccine complex was formed mainly of continuous coils and conserved β-sheet regions as secondary structure elements infused with various small segments of bend, α-helix, turn, and β-bridge (Figure 4B). Both cluster analysis and secondary structure interpretation reveals the conformational modifications before and after simulations for TLR-3 vaccine complex structure.
3.12. Immune simulation
The C-ImmSim server was utilized for immune simulation study, which gives a picture of the possible immune response on vaccine administration. The peaks for IgG1 + IgG2 and IgM showa secondary and tertiary immune reaction along with the presence of IgG + IgM, all of which seems to peak between 10-15 days (Figure 6A). The active B cell population was stimulated, which peaks in around 8 days and remains stagnant for many days (Figure6B). Similarly, the cell mediated immune response which is led by TC& TH cells was also found to be high (Figure 6 C and 6 D). The IFN- γ level was also found to be inflated during vaccine administration (Figure 6 E). These results indicate a high probability, of generating a competent immune response which can potentially control the SARS-Cov-2 pandemic.
3.13. In silico cloning
The vaccine construct was reverse translated into the corresponding nucleotide sequence and codon optimization was executed using the GenScript tool, which also predicts crucial parameters to forecast protein expression in the host. The CAI value and GC content of the sequence was 1 and 62.37% respectively, along with 0% CFD of the gene in E. coli (Supplementary Figure 6). These outcomes imply that the optimized nucleotide sequence is suitable for cloning and expression in E. coli 53. The in silico cloning was performed using the WebDSV 2.0 server, utilizing the of pET28a (+) expression vector’s HindIII and BamHI restriction sites. The results of in silico cloning are presented in the figure 7.