Selection of protein sequences
The amino acid sequence of envelope (E) protein of SARS-CoV-2 was collected from the NCBI virus with an accession number of QHD43418 released on 2020-01-13. The FASTA sequence was used to construct a multi-epitope vaccine against SARS-CoV-2.
The sequence of “cholera toxin B (CTB) subunit [Vibrio mimicus]” was collected from the NCBI database and was used as an adjuvant.
B-cell epitopes analysis
ABCpred database has studied the full-length E protein sequence for the analysis of linear B-cell epitopes. All results passed three more filters (antigenicity, allergenicity, and toxicity) and two epitopes (NVSLVKPSFYVYSRVK – YVYSRVKNLNSSRVPD) were chosen as protective epitopes (Tab 1).
T-cell epitopes analysis
The first group of T-cell epitope prediction was performed using vaxign server (Tab 1). The next two groups including binding epitopes to MHC class I and class II molecules were predicted individually by the IEDB database. The three filters aforementioned were applied to identify the protective antigens. Based on the number of alleles, the predominant human leukocyte antigen (HLA) alleles were HLA-A*01:01, HLA-A*24:02, HLA-A*11:01, HLA-A*03:01, and HLA-B*07:02 between MHC class I alleles and HLA-DRA-DBR1*01:01 and HLA-DPB1*01:02 in MHC class II alleles.
Antigenicity of potential epitopes
Antigenicity of both B-cell and T-cell epitopes was foreseen by VaxiJen 2.0, with a threshold of 0.4. Except two epitopes, the antigenicity score of other predicted epitopes were upper the threshold and were considered as “antigen” epitopes. The two more screenings (including allergenicity and toxicity) were not carried out on “non-antigen” epitopes (Tab 1).
Antigenicity of potential epitopes
Allergenicity of both B-cell and T-cell epitopes was estimated by AllerTOP v. 2.0 (Tab 1).
Toxicity of potential epitopes
Toxicity of both B-cell and T-cell epitopes was anticipated by ToxinPred, with a peptide fragment length of 10 (Tab 1).
Construction of chimeric peptide
The tick marked epitopes in Table 1 that remained after screening were chosen for the design of chimeric peptide as a multi-epitope vaccine. Finally, we selected 12 epitopes (including 2 B-cell epitopes, 7 binding epitopes to MHC class I proteins, and 5 binding epitopes to MHC class II proteins) and they were all “antigen”, “non-allergen”, and “non-toxin”. In this part, to make a contiguous sequence in final construction, the overlapping of B-cell and T-cell epitopes were merged. Predicted linear B-cell epitopes and T-cell epitopes were connected utilizing KK linkers as flexible connectors. The “cholera toxin B subunit [Vibrio mimicus]” with GenBank accession AJP16764.1 was selected as an adjuvant and attached to the amino terminals of the multi- epitope vaccine through PAPAP linkers as rigid linkers to improve antigen-specific immune responses 22 (Fig. 1).
Antigenicity, allergenicity and toxicity estimation of the nominee multi-epitope vaccine
The final peptide chimera (Fig. 1) antigenicity (along with the adjuvant sequence) was estimated by the VaxiJen 2.0 server to be 0.5841 with a threshold of 0.4. The primary multi-epitope vaccine sequence (without adjuvant) reported a score of 0.6906. ANTIGENpro was another platform that was utilized to estimate the antigenicity of final peptide. Based on this server the whole protein (Fig. 1) is antigen with a probability of 0.809160.
AllerTOP v. 2.0 server estimated the allergenicity of the final peptide. The result for both states was "non-allergen". By utilizing ToxinPred server, the final peptide was predicted as “non- toxon”.
Amino acid composition and physicochemical properties and solubility prediction
Based on the Protparam database, the final peptide chimera comprised 338 amino acids (Fig. 1) with a molecular weight of 37.7 kDa. The isoelectric point (PI) value was expected to be 9.62. The half-life was estimated to be 30 hours, with adjuvant and 100 hours, without adjuvant in mammalian reticulocytes in vitro and more than 20 hours in yeast and over 10 hours in Escherichia coli (E.coli) in vivo. An instability index II was predicted to be 41.34, classifying the protein as unstable (II >40 indicates instability), but the amount of this index is really near the border. The estimated Grand Hydropathic Average (GRAVY) was -0.120. The negative attribute indicates that the protein is hydrophilic and can react with water molecules 22. Furthermore, based on the PepCalc server, the solubility was predicted to be “good” in water. Based on SOLpro from ANTIGENpro, Peptide chimera was expected to be SOLUBLE with a possibility of 0.684306.
By using the Iupred2a server, the disorder regions of the final peptide, which make it unstable, were identified. The disorders are regarded in the adjuvant areas (Fig. 2A).
Secondary structure prediction
The secondary structure of the final protein (Fig. 1) was analyzed by the Prabi server, the final chimeric peptide was estimated to include 36.09% alpha-helix, 22.49% extended strand, and 41.42% random coil. There is no compactness in alpha-helix locations, which demonstrates there will be less difficulty in future synthesis steps (Fig. 2B). The other server we used to predict the secondary structure of the peptide chimera was PSIPRED 4.0 server. The results and the details of residues and their configurations are given in Figure 2C.
Tertiary structure homology modeling
The final protein tertiary structure was built by I-TASSER server. I-TASSER provided the top 10 proteins from PDB library that had the closest structural similarity to the predicted vaccine model. The average TM-score of these ten proteins was calculated to be 0.41. We selected the best predicted model according to C-score; a confidence score to estimating the quality of predicted models by I-TASSER. It had the highest C-score of -3.96, and the RMSD was estimated to be 16.4±3.0Å.
Tertiary structure refinement
The GalaxyWEB server was used to refine the predicted model. After refinement, an evident improvement was observed in the percentage of residues in Ramachandran favored regions relative to the initial predicted model.
Tertiary structure validation
The plot B in Figure 3 reveals local quality model by displaying the energy as a function of the amino acid sequence location. This plot is made by ProSA-web server. In general, the negative values refer to the problematic or erroneous areas of the input model, which is observed in some parts in adjuvant regions. About our model, these negative values were observed in middle parts including epitope but the lowest value was relevant to the C-terminus connected adjuvant, not the multi-epitope section.
Based on the ProSA-web, the Z-score of peptide chimera was predicted to be -4.3 (Fig. 3C). This value is in the range of native conformations23.The diagram of the predicted local similarity to target and the Z-score value of the homology model is shown in Figure 3B and C.
Based on the Procheck server, The Ramachandran plot analysis of the modeled protein reported that 72.0% of residues in the protein are in the favored regions. 22.2% of the residues were predicted to be in additional allowed regions and 2.3% were in generously allowed regions, with 9 residues (3.5%) in the disallowed regions (Fig. 3D).
Molecular docking of subunit vaccine with MHC molecules
Crystal structure of HLA-A*01:01, HLA-A*24:02, HLA-A*11:01, HLA-A*03:01, HLA- B*07:02, HLA-DRA-DBR1*01:01 and HLA-DPB1*01:02 were retrieved from the PDB RCSB database (PDB ID: 4NQX, 5HGH, 6ID4, 6O9B, 5EO0, 1AQD, 3LQZ, respectively). The PDB files were edited and cleaned from heteroatoms. PEP-FOLD 2.0 from the RPBS Web Portal server was used to predict the tertiary structure of 6 epitopes of vaccine construction individually. A molecular docking study was carried out on epitopes and the whole vaccine construction with MHC alleles by the ClusPro 2.0 online server. PyMOL software was used to perform a detailed analysis of the interface of protein-protein interaction (PPI) (Fig. 4). The weighted score of the lowest energy docked complexes are reported in Table 2. The best way to rank the model is the cluster size (number of members) 24,25.The most populated clusters were found in SFVSEETGT and HLA-A*24:02, TLAILTALR and HLA-A*24:02, VTLAILTAL and HLA-A*24:02, and TLAILTALR and HLA-A*01:01 with 997, 997, 990, and 769 cluster members, respectively.
The docking study on the whole vaccine construction with the MHC molecules depicted one of the most probable position and orientation of the peptide within the MHC binding grooves (Fig. 4H). The active residues of Fig. 4I belongs to TLAILTALR epitope which interacts with chain E of HLA-DRB1*01:01.
Immune response simulation
We predicted the IL4, IL10 and INFγ inducing peptides from the 6 epitopes in the final vaccine construction via IL4pred server, IL10pred server and INFepitope server, respectively. The result predicted the SFVSEETGT epitope as an IL4 inducer and the NVSLVKPSFYVYSRVK as an IL4, IL10, and INFγ inducer and the SFYVYSRVKNLNSSRVPD as an IL4 and IL10 inducer peptide.
The primary and secondary immune responses were stimulated by the C-IMMSIM server. This server simulated the immune response to vaccine candidate (without adjuvants) for three times of injection in the time steps of 1, 84, and 100. Each time step is equal to 8 hours. To make a relative comparison, we made a shuffled sequence of the vaccine candidate as a control protein, and we analyzed the results of the immune response simulation to injection of it. This shuffled sequence was employed to evaluate the significance of the vaccine sequence results, because in immune response simulation by this server, the sequence composition (the final epitopes connected via KK linkers) is an important consideration. Finally, we found out that the results of the vaccine injection varied from those of the controls. The results of the immune response simulations are given in Figures 5 and 6.