3.1 Retrieved Sequence Information:
Eight spike S protein sequences were retrieved from NCBI with their accession numbers, area and date of collection as shown in Table 1. All sequences are from China.
3.2 Multiple Sequence alignment and Epitopes Conservancy Assessment:
Multiple sequence alignment of the retrieved sequences was performed using ClustalW through BioEdit software showed high conservancy between the aligned sequences. The conserved regions were identified by identity and similarity of amino acid sequences (Fig.1).
3.3 Phylogeny Analysis
Evolutionary analyses were conducted in MEGA7.0.26 (7170509) software using maximum likelihood parameter. The analysis involved 8 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 1273 positions in the final dataset [18, 54] see Fig.2.
3.4 Structural Analysis:
The physiochemical properties of Spike S protein calculated by protparam server revealed that it contained 1273 amino acids (aa) with molecular weight of 141178.47 kDa, which reflects a good antigenic nature. Theoretical isoelectric point (PI) was 6.24 which indicate its negative in nature. An isoelectric point below 7 states a negatively charged protein, however the total number of negatively charged residues (Asp + Glu) was 110 aa and positively charged residues (Arg + Lys) was 103 aa. Protparam computed instability-index (II), which was a 33.01, this categories spike S protein as stable protein. Aliphatic-index was 84.67, which devotes a thought of proportional volume hold by aliphatic side chain and GRAVY value for protein sequence is 0.012. (Grand average of hydropathicity (GRAVY: -0.079). The half-life of protein described as the total time taken for its disappearing after it has been synthesized in cell, which was computed as 30 hour (h) for mammalian-reticulocytes, > 20 h for yeast, > 10 h for Escherichia coli. The N-terminal of the sequence considered is M (Met). Total number of atoms was 19710. The total number of Carbon (C), Oxygen (O), Nitrogen (N), Hydrogen (H) and Sulfur (S) were entitled by Formula: C6336H9770N1656O1894S54.
The component of secondary structure predicted by GOR IV server (https://npsa-prabi.ibcp.fr/cgi-bin/secpred_sopma.pl) revealed alpha helix (28.59%), Beta turn (3.38%), and random coil (44.78%) as in (Fig.3). The ambiguous states of the Spike S protein were predicted via UbPred server (Fig.4) it showed that there were six amino acids sites at the position 182, 776, 811, 947, 1255 and 1266 respectively with low confidence ambiguity site (grey color). Moreover, the average of hydrophobicity predicted by SOSUI server was -0.079183. This server predicted two Trans-membrane regions as shown in Table 2. DiANNA1.1 tool calculated 20 disulphides bond (S–S) positions and assign them a score and it makes prediction based on trained neural system (see Additional file 1: Table S1).
Pfam server predicted 19 conserved domains (E-value cut-off to 1.0) in spike S protein (2 significant and 17 insignificant). The significant domains were Coronavirus S2 glycoprotein (corona-S2) and Spike receptor binding domain (Spike_rec_bind). The insignificant domains include Spike glycoprotein N-terminal domain, Baculovirus polyhedron envelope protein, Protein of unknown function (DUF2959), MukF middle domain, Calcium-dependent calmodulin binding, Protein of unknown function (DUF1664), Tetramerisation domain of TRPM, Domain of unknown function (DUF4795), Retroviral envelope protein, SlyX protein and Biogenesis of lysosome-related organelles complex-1 subunit 2.
The significant conserved domains were sequenced by Conserved Domain (CDD) BLAST search. The results revealed that corona-S2 (pfam01601) is the only member of the superfamily cl20218 [55]. The top related sequences were Human coronavirus HKU1 (isolate N1), Bovine coronavirus, Porcine haemagglutinating, Human coronavirus HKU1, Bat SARS coronavirus HKU3-3, Murine coronavirus, SARS coronavirus ExoN1, SARS coronavirus ExoN1, Murine hepatitis virus strain A59.
Spike_rec_bind (pfam09408) is the only member of the superfamily cl09656 [56]. The top related sequences were Human coronavirus HKU1 (isolate N1), Bovine coronavirus, Equine coronavirus NC99, Porcine haemagglutinating encephalomyelitis virus, Human coronavirus HKU1, Human coronavirus HKU1 (isolate N2), Bat SARS coronavirus HKU3-3, Murine coronavirus RA59/R13, SARS coronavirus ExoN1 and Murine hepatitis virus strain A59.
The closest homologue obtained from BLASTP results was the severe acute respiratory syndrome-related coronavirus (75.96%) with E value 0.00 followed by Bat coronavirus BM48-31/BGR/2008 (71.96%) see Table 3 and Fig. 5 and 6.
3.5 Proposed B cell epitopes:
In B cell prediction methods, thirty two conserved epitopes were predicted using Bepipred Linear Epitope Prediction method. Among them only five epitopes were pass Emini surface accessibility prediction tool and kolaskar and Tongaonkar antigenicity methods. These epitopes were (110LDSK113, 634RVYST638, 1054QSAPH1058, 1086KAHFP 1090, and 1137VYDPLQPELDSF1148). Among these epitopes only one epitope 1054QSAPH1058 was found non-toxin and non-allergen when investigated by Allertop and ToxinPred servers (Table 4).
Unfortunately, the promising B cell epitope when subjected to Protparam server to determine its physiochemical properties, it was found unstable. The molecular weight is 538.56 kDa and the GRAVY value for protein sequence is -1.460.
However, Discotope 2.0 server was used to calculate surface availability in term of residue contact number and novel tendency amino acid score was utilized to predict the discontinuous epitopes. 3D structure of S protein (PDB ID: 6VSB) [57] was used for discontinuous epitopes prediction, 90% specificity, − 3.700 threshold and 22.000 Angstroms propensity score radius. Total 45 discontinuous epitopes were identified at different exposed surface areas (Table 5). Position of each predicted epitope on surface of 3D structure of S protein shown in Fig.7 were visualized using Chimera tool [35].
3.6 Proposed epitopes for MHCI and MHCII:
MHCI prediction tools outward 109 conserved epitopes of SARS-CoV-2 spike S protein. Of these 7 epitopes were identified as top MHC I epitopes based on the high antigenicity score and great linkage with MHCI alleles class A, B and C. These epitopes were (898FAMQMAYRF906, 258WTAGAAAYY266 and 2FVFLVLLPL10, 202 KIYSKHTPI210, 718FTISVTTEI726, 712IAIPTNFTI720 and 1060VVFLHVTYV1068) (Table 6).
In MHCII prediction methods, many core sequences were predicted to interact with huge numbers of alleles as well as high antigenicity score. The core 898FAMQMAYRF906, that predicted in MHCI methods was interacted with 101 MHCII alleles. 888FGAGAALQI896 and 342FNATRFASV350 epitopes were also interacted with 83 and 65 alleles in MHCII respectively (Additional file 2: Table S2).
3.7 Antigenicity, Allergenicity, Toxicity of MHCI and MHCII Epitopes:
The expected MHCI and MHCII epitopes were subjected to VaxiJen v2.0 server, AllerTop v2.0 and ToxiPred to predict the antigenicity, allergenicity and toxicity of predicted epitopes respectively. The predicted MHCI and II epitopes were antigenic, but 1060VVFLHVTYV1068 and 898FAMQMAYRF906, epitopes displayed the higher scores ((1.5122 and 1.0278 respectively). The epitopes were also free of causing allergenicity and toxicity (see Table. 6 and Additional file 2: Table S2).
3.8. Cross Reactivity with Human Epitopes:
The only one epitope “1209YIKWPWYIW1217” shared between MHC I and MHC II has been detected to have putative conserved domain identical to human peptide among all selected epitopes. Therefore, it was removed from the epitopes pool to avert triggering an autoimmune response.
3.9 Predicted Physicochemical Properties
The proposed epitopes for both MHCI and II were further subjected to Protparam server to determine their physiochemical properties. All predicted epitopes were stable except 718FTISVTTEI726 (see Table 7 and 8 and Fig.8 and 9).
3.10 Population Coverage:
The proposed epitopes for MHCI revealed 95.74 coverage against whole population while the proposed epitopes for MHCII showed only78.09 population coverage against whole population (Table 9).
3.11 Molecular Docking:
The epitope 1060VVFLHVTYV1068 interacted strongly with TLR8 (global energy -84.58) followed by 2FVFLVLLPL10 (global energy -64.23) see Table 10 and Fig.10. Moreover, MHCI peptides were also docked with HLA-B7 (PDB ID: 3VCL). The epitopes were interacted strongly with HLA-B7, but the best one was 2FVFLVLLPL10 (global energy 78.81) followed by 1060VVFLHVTYV1068 (global energy -63.20) see Table 11 and Fig.11.