Sequence retrieval and identification of T-cell epitope
The protein sequences of the Envelope protein (YP_009724392), Membrane glycoprotein (YP_009724393), and Surface glycoprotein (YP_009724390) of SARS-corona virus–2 (SARS CoV–2) were retrieved from the NCBI server. The sequences were assessed with Vaxijen, which predicted all three proteins to be antigenic, with scores of 0.6025 for the Envelope protein, 0.5102 for the Membrane glycoprotein, and 0.4646 for the Surface glycoprotein. T-cell epitopes for these three proteins were identified with the NetCTL v1.2 server, with epitope prediction restricted to 12 MHC class I supertypes. The top 10 epitopes for the Envelope protein and the top 12 for the Membrane glycoprotein and Surface glycoprotein were selected on the basis of the highest combined score and listed for further analysis (Table 1).
Both MHC class I- and MHC class II-restricted alleles were predicted with the IEDB analysis resource on the basis of the IC50 value. All the predicted epitopes in Table 1 were evaluated for MHC interaction. The MHC class I alleles that interacted with epitopes of the E-protein, M-protein, S1, and S2 proteins are summarized in Tables S1, S3, S5, and S7, respectively. The number of MHC class I alleles that interacted with the predicted epitopes of these four proteins is summarized in Table 2. The MHC class II alleles that interacted with epitopes of the E-protein, M-protein, S1, and S2 proteins are summarized in Tables S2, S4, S6, and S8, respectively. The MHC class II epitopes (15-mers) were selected using the 9-mer epitope as a core. The number of MHC class II alleles that interacted with the predicted epitopes of these four proteins is summarized in Table 3. After the MHC class I and MHC class II analyses, we selected the top interacting peptides and assigned each a name (Table 4): two peptides for the Envelope protein (CVEnvA2, KPSFYVYSRVKNLNS, and CVEnvB2, NIVNVSLVKPSFYVY), two peptides for the Membrane glycoprotein (CVMemA2, VGLMWLSYFIASFRL, and CVMemB2, VIGAVILRGHLRIAG), three peptides for the S1 protein (CVS1A2, FNATRFASVYAWNRK; CVS1B2, ADSFVIRGDEVRQIA; and CVS1C2, ISNCVADYSVLYNSA), and two peptides for the S2 protein (CVS2A2, IWLGFIAGLIAIVMV, and CVS2B2, FLHVTYVPAQEKNFT).
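How a 15-mer class II candidate can be built around a 9-mer core is sketched below in Python. This is only an illustration under stated assumptions, not the IEDB procedure: the helper name `windows_around_core` is hypothetical, the 23-residue fragment is simply the two overlapping Envelope peptides from the text merged into one string, and the Envelope 9-mer VSLVKPSFY reported in this section is used as an example core.

```python
def windows_around_core(sequence, core, width=15):
    """Return every `width`-mer of `sequence` that fully contains `core`.

    Hypothetical helper for illustration; only the first occurrence of
    `core` in `sequence` is considered.
    """
    start = sequence.find(core)
    if start < 0:
        return []
    end = start + len(core)
    # Earliest window start that still reaches the end of the core.
    first = max(0, end - width)
    # Latest window start that still begins at or before the core.
    last = min(start, len(sequence) - width)
    return [sequence[i:i + width] for i in range(first, last + 1)]


# CVEnvB2 followed by the non-overlapping tail of CVEnvA2 (both from the text).
fragment = "NIVNVSLVKPSFYVY" + "SRVKNLNS"
core = "VSLVKPSFY"

windows = windows_around_core(fragment, core)
for w in windows:
    print(w)
```

Within this fragment the first window returned is NIVNVSLVKPSFYVY, which is the peptide named CVEnvB2 above. Which of the candidate windows is finally kept would depend on the predicted allele binding, which this sketch does not model.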
The MHC class I interactions were cross-checked with the EPISOPT software, and the results are shown in Table S9. They indicated that the peptide VSLVKPSFY is not a suitable MHC class I epitope. The interactions with MHC class II were validated with PREDIVAC, which bases its predictions on the specificity-determining residue (SDR) concept. We assessed the epitopes for interaction with HLA-DRB1 alleles including 01:01, 03:01, 04:01, 07:01, 08:01, 10:01, 11:01, 12:01, 13:02, 14:01, and 15:01, which are expected to cover more than 95% of the worldwide population (Table S10) 25. The peptides were checked for antigenicity with Vaxijen, and all were predicted to be potential antigens except CVS1B2 and CVS1C2 (Table S10).
Population Coverage and Conservancy Analysis.
Both MHC class I- and MHC class II-based population coverage of the selected epitopes was predicted with the IEDB analysis resource for the world population as well as for different regions of the world. The world population coverage of CVEnvA2 and CVEnvB2 was 96.77% and 71.88%, respectively (Table S11). The world population coverage of CVMemA2 and CVMemB2 was 99.82% and 82.11%, respectively (Table S12). The world population coverage of CVS1A2, CVS1B2, and CVS1C2 was 94.07%, 79.01%, and 70.77%, respectively (Table S13). The world population coverage of CVS2A2 and CVS2B2 was 87.5% and 57.36%, respectively (Table S14). All these peptides were 96.12–100% conserved among the SARS CoV–2 isolates but very poorly conserved among SARS and MERS isolates (Table 4). These analyses suggest that CVEnvA2, CVMemA2, CVS1A2, and CVS2A2 are the top peptides for a vaccine aimed at the whole world population.
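The idea behind a population coverage figure can be illustrated with a deliberately simplified Python sketch. It is not the IEDB implementation: it assumes Hardy-Weinberg proportions at each HLA locus and independence between loci, the function names are hypothetical, and the allele frequencies are made-up numbers chosen only to show the arithmetic.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one covered allele at a
    single diploid locus, assuming Hardy-Weinberg proportions."""
    f = min(sum(allele_freqs), 1.0)  # total frequency of covered alleles
    return 1.0 - (1.0 - f) ** 2      # 1 - P(neither chromosome is covered)


def combined_coverage(per_locus):
    """Combine per-locus coverages, assuming the loci are independent."""
    uncovered = 1.0
    for cov in per_locus:
        uncovered *= 1.0 - cov
    return 1.0 - uncovered


# Hypothetical frequencies of the alleles predicted to bind one peptide.
locus_a = locus_coverage([0.2, 0.3])  # two covered alleles at one locus
locus_b = locus_coverage([0.1])       # one covered allele at another locus
total = combined_coverage([locus_a, locus_b])
print(f"{locus_a:.4f} {locus_b:.4f} {total:.4f}")
```

Because coverage depends on the local allele frequencies, the same peptide can score very differently between regions, which is why regional values are reported alongside the world figure.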
Homology Modelling and Model Validation
The three-dimensional structures of the Envelope protein, Membrane glycoprotein, and Surface glycoprotein were modeled with MODELLER using the best multiple-template-based modelling approaches (Fig 2). The Envelope protein was modeled using 5X29_B, the Membrane glycoprotein using 4N31_B and 5XPD_B, and the Surface glycoprotein using 6ACC_C. The models were validated with the PROCHECK server, and the resulting Ramachandran plots are shown in Figure S3. For the Envelope protein, 90%, 10%, and 0.0% of residues were in the most favoured, allowed, and disallowed regions, respectively. For the Membrane glycoprotein, 82.6%, 15.9%, and 1.4% of residues were in the most favoured, allowed, and disallowed regions, respectively. For the Surface glycoprotein, 82.8%, 14.8%, and 1.7% of residues were in the most favoured, allowed, and disallowed regions, respectively. Disorder in the targeted protein sequences was predicted with the DISOPRED server (Fig S4). Both analyses showed that the potential peptides are located in stable parts of the proteins. Moreover, the proposed epitopes were located on the protein surface, which supports their surface accessibility (Fig 2).
Allergenicity assessment and Trans-membrane helix prediction
The allergenicity assessment with the AllerTOP server showed that CVEnvB2, CVS1A2, CVS1B2, and CVS1C2 were probable allergens (Table 4). The transmembrane regions predicted by the TMHMM server are depicted in Fig S2 and summarized in Table 4. The potential peptides CVMemA2 and CVS2A2 were found to be located in transmembrane regions of their proteins. We therefore proceeded with the next-ranked peptides, CVMemB2 and CVS2B2.
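The selection logic described so far (rank the candidates for each protein by world population coverage, then skip any that fall in a predicted transmembrane region) can be sketched in Python as follows. The coverage values and transmembrane calls are those reported in this section; the function name `select_peptides` is hypothetical, and allergenicity is deliberately not applied as a filter because the text retains CVS1A2 at this stage.

```python
# World population coverage (%) as reported in the text (Tables S11 to S14).
WORLD_COVERAGE = {
    "Envelope": {"CVEnvA2": 96.77, "CVEnvB2": 71.88},
    "Membrane": {"CVMemA2": 99.82, "CVMemB2": 82.11},
    "S1": {"CVS1A2": 94.07, "CVS1B2": 79.01, "CVS1C2": 70.77},
    "S2": {"CVS2A2": 87.5, "CVS2B2": 57.36},
}

# Peptides reported to lie in a transmembrane region (TMHMM, Table 4).
TRANSMEMBRANE = {"CVMemA2", "CVS2A2"}


def select_peptides(coverage, excluded):
    """For each protein, pick the highest-coverage peptide not in `excluded`.

    Hypothetical helper; a protein whose candidates are all excluded maps
    to None.
    """
    chosen = {}
    for protein, peptides in coverage.items():
        ranked = sorted(peptides, key=peptides.get, reverse=True)
        allowed = [name for name in ranked if name not in excluded]
        chosen[protein] = allowed[0] if allowed else None
    return chosen


top_by_coverage = select_peptides(WORLD_COVERAGE, excluded=set())
after_tm_filter = select_peptides(WORLD_COVERAGE, excluded=TRANSMEMBRANE)
print(top_by_coverage)
print(after_tm_filter)
```

Without the filter the sketch returns the four "A2" peptides; with the transmembrane filter applied, CVMemB2 and CVS2B2 replace CVMemA2 and CVS2A2.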
Molecular Docking Analysis.
We selected the HLA alleles of MHC class I and MHC class II that the IEDB analyses predicted to interact with the respective epitopes (Tables 2 and 3). The 3D structures of the HLA alleles were retrieved from the RCSB PDB server and docked with the CABS-dock server. The docking interfaces were visualized with the PyMOL Molecular Graphics System. Several polar and non-polar interactions were identified in the docking simulations. The polar contacts were extracted with PyMOL and are shown in the figures. The docking scheme and the interacting receptor amino acid residues are depicted in Figure 4 for MHC class I and in Figure 5 for MHC class II. The amino acid residues of peptides CVEnvA1, CVMemB1, CVS1A1, and CVS2B1 that interacted with those of the MHC class I alleles are illustrated in Figures S5, S6, S7, and S8, respectively, while the amino acid residues of peptides CVEnvA2, CVMemB2, CVS1A2, and CVS2B2 that interacted with those of the MHC class II alleles are illustrated in Figures S9, S10, S11, and S12, respectively. Polar contacts are shown in red font in the respective figures. The cluster density, average RMSD, maximum RMSD, and elements involved are also given in Figures 4 and 5.
B-cell Epitope Prediction.
We used sequence-based approaches for B-cell epitope prediction of the potential peptides CVEnvA2, CVMemB2, CVS1A2, and CVS2B2. The Kolaskar and Tongaonkar antigenicity scale was used to assess the antigenic property of the epitopes and gave maximum propensity scores of 1.152, 1.180, 1.095, and 1.183, respectively. Another important criterion for a potential B-cell epitope is surface accessibility, which was evaluated with the Emini surface accessibility scale and gave maximum propensity scores of 2.048, 2.471, 2.910, and 2.507 for CVEnvA2, CVMemB2, CVS1A2, and CVS2B2, respectively. The Parker hydrophilicity prediction was used to find hydrophilic regions of CVEnvA2, CVMemB2, CVS1A2, and CVS2B2, with maximum propensity scores of 2.5, 0.371, 2.557, and 3.857, respectively. The Karplus and Schulz flexibility prediction was used to find flexible regions of the proposed epitopes CVEnvA2, CVMemB2, CVS1A2, and CVS2B2, with maximum propensity scores of 1.021, 0.983, 0.998, and 1.043, respectively. These analyses strengthen the prediction that the proposed epitopes might elicit a B-cell response (Fig 6).
Multi-epitope vaccine-construction, structural properties and B-cell epitope prediction
Taken together, the analyses indicated that CVEnvA2, CVEnvB2, CVMemB2, CVS1A2, and CVS2B2 were the top peptides that could be used as vaccines targeting SARS CoV–2. However, CVEnvB2 and CVS1A2 were found to be probable allergens, which prompted us to modify our predicted peptides. Because peptides of up to 100 amino acids can be synthesized commercially, we combined the top four peptides, which are components of the Envelope, Membrane, S1, and S2 proteins, via AAY linkers into a construct suitable for the world population (denoted CVMW). A cysteine residue was added at the N-terminus of the multi-epitope peptide so that it can be conjugated with a carrier protein. Because CVEnvA2 has poorer population coverage for South Africa than CVEnvB2 (3.15% vs. 40.9%) (Table S11), we constructed a second multi-epitope vaccine suitable for South Africa using CVEnvB2 (denoted CVMS). In total, two multi-epitope constructs were designed, CVMW (Fig 7A (i)) and CVMS (Fig 7B (i)); both are 70 amino acids long and were predicted to be antigenic, with Vaxijen scores of 0.6525 and 0.6927, respectively (Fig 7 and Table S15). Both vaccines were predicted to be non-allergenic (Table S15). The secondary structural properties and theoretical physicochemical properties of the vaccines are shown in Table S15. The 3D models of CVMW (Fig 7A (ii)) and CVMS (Fig 7B (ii)) were constructed using I-TASSER and then refined with the GalaxyRefine server. The finalized models were submitted to ProSA-web to analyze model quality (Fig 7A (iii) and B (iii)). The results revealed a z score of –4.75 for the CVMW model and –2.41 for the CVMS model. The overall quality of the finalized models of the multi-epitope vaccine constructs was checked by Ramachandran plot analysis. The results revealed 52.5%, 39.7%, 6.3%, and 1.6% of CVMW residues in the favoured, allowed, generously allowed, and disallowed regions, respectively, whereas 82.5%, 12.7%, 1.6%, and 3.2% of CVMS residues were in the favoured, allowed, generously allowed, and disallowed regions, respectively (Fig. 7 A (iv) and B (iv)).
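The stated length of 70 residues follows from the design: one N-terminal cysteine, four 15-mers, and three AAY linkers (1 + 60 + 9). The Python sketch below assembles both constructs to make that arithmetic concrete. The peptide sequences are those given in the text; the helper name `build_construct` is hypothetical, and the peptide order (Envelope, Membrane, S1, S2) is an assumption, since the exact order is only shown in Fig 7.

```python
LINKER = "AAY"

# 15-mer sequences as listed in the text (Table 4).
PEPTIDES = {
    "CVEnvA2": "KPSFYVYSRVKNLNS",
    "CVEnvB2": "NIVNVSLVKPSFYVY",
    "CVMemB2": "VIGAVILRGHLRIAG",
    "CVS1A2": "FNATRFASVYAWNRK",
    "CVS2B2": "FLHVTYVPAQEKNFT",
}


def build_construct(names, linker=LINKER, n_terminal="C"):
    """Join the named peptides with `linker` and prepend `n_terminal`.

    Hypothetical helper; the order of `names` is an assumption.
    """
    return n_terminal + linker.join(PEPTIDES[name] for name in names)


# World construct uses CVEnvA2; South Africa construct uses CVEnvB2.
cvmw = build_construct(["CVEnvA2", "CVMemB2", "CVS1A2", "CVS2B2"])
cvms = build_construct(["CVEnvB2", "CVMemB2", "CVS1A2", "CVS2B2"])

print(len(cvmw), cvmw)
print(len(cvms), cvms)
```

Both strings come out at 70 residues, well under the 100-residue limit for commercial synthesis mentioned above.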
The linear (continuous) and conformational (discontinuous) B-cell epitopes in the multi-epitope vaccine constructs were predicted with the ABCpred and ElliPro servers, respectively, using default parameters. The servers predicted 4 linear and 3 conformational B-cell epitopes for CVMW, and 5 linear and 3 conformational B-cell epitopes for CVMS (Tables S16 and S17).