Investigation of immunogenicity
Using several bioinformatic tools, five 9-mer CD8 T-cell epitopes and five 15-mer CD4 T-cell epitopes were identified from each of three proteins, namely surface glycoprotein, membrane glycoprotein and envelope protein, of the SARS-CoV-2/human/USA/CA CZB 1055/2020 isolate.
In order to combat the SARS-CoV-2 pandemic, the immunogenicity scores of the five best 9-mer CD8 T-cell epitopes of the three proteins (surface glycoprotein, membrane glycoprotein and envelope protein) were investigated and are presented in Table 1a. In surface glycoprotein, the peptide PDPSKPSKR had the highest immunogenicity score (9.20); in membrane glycoprotein, two peptides, TDHSSSSDN and NTDHSSSSD, shared the highest score (8.72); and in envelope protein, the peptide YSRVKNLNS had the highest score (6.37).
Table 1a
CD8 T-cell epitopes of SARS CoV2/human/USA/CA CZB 1055/2020 isolate showing number of aliphatic amino acids and immunogenicity (Ig) score (Hopp and Woods 1981)
| Name of proteins | CD8 T-cell epitopes | Aliphatic amino acids | Hydropathy index (Ig analysis) | Immunogenicity score (Ig analysis) |
| --- | --- | --- | --- | --- |
| Surface glycoprotein | PDPSKPSKR | 2 | 6.97 | 9.20 |
| Surface glycoprotein | DPSKPSKRS | 2 | 6.88 | 9.01 |
| Surface glycoprotein | DYNYKLPDD | 2 | 6.53 | 8.23 |
| Surface glycoprotein | WNSNNLDSK | 2 | 6.34 | 8.12 |
| Surface glycoprotein | QTQTNSPRR | 0 | 7.09 | 8.08 |
| Membrane glycoprotein | TDHSSSSDN | 0 | 6.46 | 8.72 |
| Membrane glycoprotein | NTDHSSSSD | 0 | 6.46 | 8.72 |
| Membrane glycoprotein | DHSSSSDNI | 1 | 5.88 | 7.59 |
| Membrane glycoprotein | GNYKLNTDH | 3 | 6.30 | 7.56 |
| Membrane glycoprotein | NYKLNTDHS | 2 | 6.34 | 7.48 |
| Envelope protein | YSRVKNLNS | 3 | 5.64 | 6.37 |
| Envelope protein | KPSFYVYSR | 2 | 5.30 | 5.72 |
| Envelope protein | VYSRVKNLN | 4 | 5.09 | 5.24 |
| Envelope protein | YSFVSEETG | 2 | 4.94 | 4.99 |
| Envelope protein | FYVYSRVKN | 3 | 4.96 | 4.86 |
The binding efficiency of an epitope is greatly enhanced by the presence of aliphatic amino acids in its sequence. Thus, the presence of aliphatic amino acids, namely Gly, Ala, Lys, Ile, Met, Val and Leu, in the desired epitopes can lead to proper interactions with lymphocytes. Here, the CD8 T-cell epitopes were analysed for their total number of aliphatic amino acids (Table 1a). Four epitopes of surface glycoprotein, viz. PDPSKPSKR, DPSKPSKRS, DYNYKLPDD and WNSNNLDSK, possessed two aliphatic amino acids each, while epitope QTQTNSPRR possessed none. In membrane glycoprotein, epitopes TDHSSSSDN and NTDHSSSSD had no aliphatic amino acids, epitope DHSSSSDNI had one, epitope NYKLNTDHS had two and epitope GNYKLNTDH had three. Further, in envelope protein, the epitopes KPSFYVYSR and YSFVSEETG possessed two aliphatic amino acids each, epitopes YSRVKNLNS and FYVYSRVKN had three each, and epitope VYSRVKNLN possessed four.
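The counting described above can be illustrated with a minimal sketch. It assumes the aliphatic set exactly as listed in the text (Gly, Ala, Lys, Ile, Met, Val and Leu, in one-letter code); the names `ALIPHATIC` and `count_aliphatic` are hypothetical and are not taken from the tools used in this study.

```python
# Residues treated as aliphatic in the text above:
# Gly (G), Ala (A), Lys (K), Ile (I), Met (M), Val (V) and Leu (L).
ALIPHATIC = frozenset("GAKIMVL")


def count_aliphatic(epitope: str) -> int:
    """Return the number of residues in the epitope that are in ALIPHATIC."""
    return sum(1 for residue in epitope.upper() if residue in ALIPHATIC)


# Illustrative calls on epitopes from Table 1a.
surface_count = count_aliphatic("PDPSKPSKR")   # 2 (two Lys residues)
envelope_count = count_aliphatic("VYSRVKNLN")  # 4 (Val, Val, Lys, Leu)
```

Under this residue set, the sketch gives the same counts as those listed in Table 1a for the epitopes shown.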
The immunogenicity scores of the five best 15-mer CD4 T-cell epitopes of the three proteins, namely surface glycoprotein, membrane glycoprotein and envelope protein, were evaluated and are reported in Table 1b. Epitopes NSNNLDSKVGGNYNY of surface glycoprotein, GNYKLNTDHSSSSDN of membrane glycoprotein and KPSFYVYSRVKNLNS of envelope protein possessed the highest immunogenicity scores of 7.81, 8.14 and 5.70, respectively.
Table 1b
CD4 T-cell epitopes of SARS CoV2/human/USA/CA CZB 1055/2020 isolate showing number of aliphatic amino acids and immunogenicity (Ig) score (Hopp and Woods 1981)
| Name of proteins | CD4 T-cell epitopes | Aliphatic amino acids | Hydropathy index (Ig analysis) | Immunogenicity score (Ig analysis) |
| --- | --- | --- | --- | --- |
| Surface glycoprotein | NSNNLDSKVGGNYNY | 5 | 5.96 | 7.81 |
| Surface glycoprotein | WNSNNLDSKVGGNYN | 5 | 5.93 | 7.71 |
| Surface glycoprotein | ASYQTQTNSPRRARS | 2 | 6.31 | 6.94 |
| Surface glycoprotein | AWNSNNLDSKVGGNY | 6 | 5.58 | 6.92 |
| Surface glycoprotein | HRSYLTPGDSSSGWT | 3 | 5.61 | 6.84 |
| Membrane glycoprotein | GNYKLNTDHSSSSDN | 3 | 6.26 | 8.14 |
| Membrane glycoprotein | NYKLNTDHSSSSDNI | 3 | 5.93 | 7.29 |
| Membrane glycoprotein | IGNYKLNTDHSSSSD | 4 | 5.73 | 7.05 |
| Membrane glycoprotein | RIGNYKLNTDHSSSS | 4 | 5.79 | 6.95 |
| Membrane glycoprotein | RYRIGNYKLNTDHSS | 4 | 6.07 | 6.92 |
| Envelope protein | KPSFYVYSRVKNLNS | 5 | 5.23 | 5.70 |
| Envelope protein | VKPSFYVYSRVKNLN | 6 | 4.89 | 5.04 |
| Envelope protein | SLVKPSFYVYSRVKN | 6 | 4.71 | 4.80 |
| Envelope protein | LVKPSFYVYSRVKNL | 7 | 4.41 | 4.23 |
| Envelope protein | YSFVSEETGTLIVNS | 5 | 4.27 | 4.14 |
The number of aliphatic amino acids in the epitopes of surface glycoprotein varied from 2 (ASYQTQTNSPRRARS) to 6 (AWNSNNLDSKVGGNY). In membrane glycoprotein, three epitopes (IGNYKLNTDHSSSSD, RIGNYKLNTDHSSSS and RYRIGNYKLNTDHSS) possessed 4 aliphatic amino acids and two epitopes (GNYKLNTDHSSSSDN and NYKLNTDHSSSSDNI) had 3. Further, in envelope protein, the epitopes KPSFYVYSRVKNLNS and YSFVSEETGTLIVNS had 5 aliphatic amino acids, epitopes VKPSFYVYSRVKNLN and SLVKPSFYVYSRVKN possessed 6, and epitope LVKPSFYVYSRVKNL had 7, as shown in Table 1b. Epitopes with high immunogenicity scores are usually more immunogenic, and thus these peptides could be potential candidates for vaccine formulation.
Investigation of antigenicity
Five CD8 T-cell epitopes from each of the three proteins were identified and evaluated for antigenicity (Fig 1a). The antigenicity scores of the 15 epitopes ranged from 8.29 (WNSNNLDSK) to 9.77 (FYVYSRVKN). Similarly, the antigenicity scores of the CD4 T-cell epitopes ranged from 14.14 (WNSNNLDSKVGGNYN) to 16.66 (LVKPSFYVYSRVKNL), as shown in Fig 1b. An epitope with a high antigenicity score is normally highly antigenic, which is considered a desirable feature of a peptide to be used as a vaccine candidate.
Investigation of hydrophilicity
The hydrophilicity score is derived from the hydrophilicity values of the constituent amino acids and is helpful in the prediction of protein structure. The hydrophilicity score was determined for each CD8 T-cell epitope of the three proteins with the potential to raise immunogenicity in the host [14]. The hydrophilicity score, epitope position and total net charge of each CD8 T-cell epitope are given in Table 2a. Of the five epitopes of surface glycoprotein, the epitope DPSKPSKRS (position 807-815, net charge 2.00) had the highest hydrophilicity score of 12.90. In membrane glycoprotein, two epitopes, namely TDHSSSSDN (position 208-216) and NTDHSSSSD (position 207-215), shared the highest hydrophilicity score of 6.50, each with a net charge of -1.75. Likewise, in envelope protein, the epitope YSRVKNLNS (position 59-67, net charge 2.00) possessed the highest hydrophilicity score of 1.40.
Similarly, five CD4 T-cell epitopes for each of the three proteins were identified [14] and their hydrophilicity scores are reported in Table 2b. In surface glycoprotein, epitope ASYQTQTNSPRRARS (position 671-685, net charge 3.00) possessed the highest hydrophilicity score of 6.40. Of the five epitopes of membrane glycoprotein, epitope GNYKLNTDHSSSSDN (position 202-216, net charge -0.75) had the highest hydrophilicity score of 5.80. In envelope protein, the hydrophilicity scores of all the identified epitopes were less than 0, so these epitopes cannot serve as successful vaccine candidates. High hydrophilicity scores, together with the epitope positions, indicate hydrophilic sections of the protein molecules, and such regions are usually exposed on the external surface of the protein. These sections can readily bind MHC molecules, and the resulting complexes may be displayed on the surface of APCs. T lymphocytes can then identify those cells and damage them.
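As an illustration of how such scores can be obtained, the following minimal sketch sums the per-residue hydrophilicity values of Hopp and Woods (1981) over an epitope and computes a simple net charge. Two points are assumptions rather than the documented method of this study: that the tabulated score is a plain sum over residues, and that the net charge counts Lys and Arg as +1, Asp and Glu as -1 and His as +0.25, with the termini ignored. The names `HOPP_WOODS`, `SIDE_CHAIN_CHARGE`, `hydrophilicity_score` and `net_charge` are hypothetical.

```python
# Per-residue hydrophilicity values of Hopp and Woods (1981).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}

# Assumed side-chain charges; terminal groups are ignored (an assumption).
SIDE_CHAIN_CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0, "H": 0.25}


def hydrophilicity_score(epitope: str) -> float:
    """Sum the Hopp-Woods values over all residues, rounded to 2 decimals."""
    return round(sum(HOPP_WOODS[residue] for residue in epitope.upper()), 2)


def net_charge(epitope: str) -> float:
    """Sum the assumed side-chain charges; uncharged residues count as 0."""
    total = sum(SIDE_CHAIN_CHARGE.get(residue, 0.0) for residue in epitope.upper())
    return round(total, 2)


# Illustrative calls on epitopes from the hydrophilicity analysis.
score_cd8 = hydrophilicity_score("DPSKPSKRS")  # 12.9
charge_cd8 = net_charge("TDHSSSSDN")           # -1.75
```

Under these assumptions the sketch gives 12.90 for DPSKPSKRS and -1.75 for TDHSSSSDN, in agreement with the values reported for these epitopes; other tools may normalise or window the scale differently.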
Table 2a
CD8 T-cell epitopes of SARS CoV2/human/USA/CA CZB 1055/2020 isolate showing hydrophilicity score and total net charge (Hopp and Woods 1981)
| Name of proteins | CD8 T-cell epitopes | Epitope position | Hydrophilicity score | Net charge |
| --- | --- | --- | --- | --- |
| Surface glycoprotein | PDPSKPSKR | 806-814 | 12.60 | 2.00 |
| Surface glycoprotein | DPSKPSKRS | 807-815 | 12.90 | 2.00 |
| Surface glycoprotein | DYNYKLPDD | 419-427 | 5.80 | -2.00 |
| Surface glycoprotein | WNSNNLDSK | 435-443 | 2.00 | 0.00 |
| Surface glycoprotein | QTQTNSPRR | 674-682 | 6.10 | 2.00 |
| Membrane glycoprotein | TDHSSSSDN | 208-216 | 6.50 | -1.75 |
| Membrane glycoprotein | NTDHSSSSD | 207-215 | 6.50 | -1.75 |
| Membrane glycoprotein | DHSSSSDNI | 209-217 | 5.10 | -1.75 |
| Membrane glycoprotein | GNYKLNTDH | 202-210 | 1.40 | 0.25 |
| Membrane glycoprotein | NYKLNTDHS | 203-211 | 1.70 | 0.25 |
| Envelope protein | YSRVKNLNS | 59-67 | 1.40 | 2.00 |
| Envelope protein | KPSFYVYSR | 53-61 | -2.00 | 2.00 |
| Envelope protein | VYSRVKNLN | 58-66 | -0.40 | 2.00 |
| Envelope protein | YSFVSEETG | 2-10 | -0.10 | -2.00 |
| Envelope protein | FYVYSRVKN | 56-64 | -3.60 | 2.00 |
Table 2b
CD4 T-cell epitopes of SARS CoV2/human/USA/CA CZB 1055/2020 isolate showing hydrophilicity score and total net charge (Hopp and Woods 1981)
| Name of proteins | CD4 T-cell epitopes | Epitope position | Hydrophilicity score | Net charge |
| --- | --- | --- | --- | --- |
| Surface glycoprotein | NSNNLDSKVGGNYNY | 436-450 | -0.30 | 0.00 |
| Surface glycoprotein | WNSNNLDSKVGGNYN | 435-449 | -1.40 | 0.00 |
| Surface glycoprotein | ASYQTQTNSPRRARS | 671-685 | 6.40 | 3.00 |
| Surface glycoprotein | AWNSNNLDSKVGGNY | 434-448 | -2.10 | 0.00 |
| Surface glycoprotein | HRSYLTPGDSSSGWT | 244-258 | -1.60 | 0.25 |
| Membrane glycoprotein | GNYKLNTDHSSSSDN | 202-216 | 5.80 | -0.75 |
| Membrane glycoprotein | NYKLNTDHSSSSDNI | 203-217 | 4.00 | -0.75 |
| Membrane glycoprotein | IGNYKLNTDHSSSSD | 201-215 | 3.80 | -0.75 |
| Membrane glycoprotein | RIGNYKLNTDHSSSS | 200-214 | 3.80 | 1.25 |
| Membrane glycoprotein | RYRIGNYKLNTDHSS | 198-212 | 3.90 | 2.25 |
| Envelope protein | KPSFYVYSRVKNLNS | 53-67 | -1.60 | 3.00 |
| Envelope protein | VKPSFYVYSRVKNLN | 52-66 | -3.40 | 3.00 |
| Envelope protein | SLVKPSFYVYSRVKN | 50-64 | -3.30 | 3.00 |
| Envelope protein | LVKPSFYVYSRVKNL | 51-65 | -5.40 | 3.00 |
| Envelope protein | YSFVSEETGTLIVNS | 2-16 | -5.10 | -2.00 |
Boman Index (potential protein interaction)
The Boman index of each identified CD8 T-cell epitope in the three proteins of SARS-CoV-2 was estimated and is shown in Fig 2a. A Boman index of at least 2.48 kcal/mol is considered necessary for efficient binding to the MHC molecule. From this analysis, we could report efficient CD8 T-cell epitopes that might be presented by APCs and thereby elicit an immune response in the host. Notably, the Boman indices of most epitopes were >2.48 kcal/mol, and these epitopes could thus be used as vaccine candidates. However, a few epitopes, namely KPSFYVYSR, FYVYSRVKN and YSFVSEETG, had a Boman index <2.48 kcal/mol and would therefore not be effective as vaccine candidates.
Further, the Boman indices of the 15 CD4 T-cell epitopes across the three proteins of SARS-CoV-2 were determined and are depicted in Fig 2b. Here, the epitopes AWNSNNLDSKVGGNY, KPSFYVYSRVKNLNS, VKPSFYVYSRVKNLN, SLVKPSFYVYSRVKN, LVKPSFYVYSRVKNL and YSFVSEETGTLIVNS possessed a Boman index <2.48 kcal/mol, and we therefore did not consider them effective vaccine candidates.
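The threshold test described above can be sketched as follows. The Boman index is taken here as the negated mean of per-residue solubility values (kcal/mol). The per-residue scale in `BOMAN_SCALE` is an assumption, reproduced from commonly used implementations of the index, and should be verified against the tool actually used for Fig 2a and Fig 2b. The names `BOMAN_SCALE`, `boman_index` and `passes_threshold` are hypothetical.

```python
# Assumed per-residue solubility scale (kcal/mol) used for the Boman index.
# These values are an assumption and should be checked before reuse.
BOMAN_SCALE = {
    "L": 4.92, "I": 4.92, "V": 4.04, "F": 2.98, "M": 2.35, "W": 2.33,
    "A": 1.81, "C": 1.28, "G": 0.94, "Y": -0.14, "T": -2.57, "S": -3.40,
    "H": -4.66, "Q": -5.54, "K": -5.55, "N": -6.64, "E": -6.81, "D": -8.72,
    "R": -14.92, "P": 0.0,
}

# Cut-off stated in the text (kcal/mol).
BOMAN_THRESHOLD = 2.48


def boman_index(epitope: str) -> float:
    """Negated mean of the per-residue scale values over the epitope."""
    total = sum(BOMAN_SCALE[residue] for residue in epitope.upper())
    return -total / len(epitope)


def passes_threshold(epitope: str, threshold: float = BOMAN_THRESHOLD) -> bool:
    """True if the epitope's Boman index exceeds the cut-off."""
    return boman_index(epitope) > threshold


# Split a few CD8 T-cell epitopes by the cut-off.
candidates = ["YSRVKNLNS", "KPSFYVYSR", "FYVYSRVKN", "YSFVSEETG"]
retained = [e for e in candidates if passes_threshold(e)]
rejected = [e for e in candidates if not passes_threshold(e)]
```

With the assumed scale, KPSFYVYSR, FYVYSRVKN and YSFVSEETG fall below the cut-off and YSRVKNLNS lies above it, which matches the classification reported above. The exact index values depend on the scale and should not be read as those plotted in the figures.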
Investigation of amino acid contents
The amino acid content of the CD8 T-cell epitopes of SARS-CoV-2 was analysed and is presented in Fig 3a. The distribution of amino acids differed across epitopes. Interestingly, a few amino acids, viz. Pro, Ser, Asp and Asn, each comprised up to 33% of the total composition of surface glycoprotein epitopes, while Ser accounted for up to 44% of the total composition of membrane glycoprotein epitopes.
Likewise, the amino acid composition of the CD4 T-cell epitopes of the SARS-CoV-2 proteins was analysed and is shown in Fig 3b. The epitopes exhibited different patterns of amino acid composition. In most CD4 T-cell epitopes, individual amino acids each accounted for about 6% of the total composition, while a few amino acids ranged from 13 to 33%.
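The percentage composition discussed above can be computed directly from an epitope sequence. The sketch below is a minimal illustration and not the tool used to produce Fig 3a and Fig 3b; the function name `composition_percent` is hypothetical, and rounding to one decimal place is an assumption.

```python
from collections import Counter


def composition_percent(epitope: str) -> dict[str, float]:
    """Return each residue's share of the epitope as a percentage (1 decimal)."""
    sequence = epitope.upper()
    counts = Counter(sequence)
    return {
        residue: round(100 * count / len(sequence), 1)
        for residue, count in counts.items()
    }


# Ser in a membrane glycoprotein CD8 T-cell epitope: 4 of 9 residues.
ser_share = composition_percent("TDHSSSSDN")["S"]  # 44.4

# Asp in a surface glycoprotein CD8 T-cell epitope: 3 of 9 residues.
asp_share = composition_percent("DYNYKLPDD")["D"]  # 33.3
```

For a 15-mer, a residue occurring once contributes 1/15 of the sequence (about 6.7%), and a residue occurring twice contributes about 13.3%, which corresponds to the 6% and 13% levels mentioned for the CD4 T-cell epitopes.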
Moreover, the identified CD8 and CD4 T-cell epitopes, if used as vaccine candidates against SARS-CoV-2, might be amenable to cleavage by several chemicals and enzymes inside the host cell; the results for their cleavage sites are given in the supplementary files (S1, S2).