Potential immune epitope map for structural proteins of SARS-CoV-2


Researchers around the world are developing more than 145 vaccines (DNA, mRNA, whole-virus, viral-vector, protein-based and repurposed vaccines) against SARS-CoV-2, and 21 vaccines are in human trials. However, limited information is available about which SARS-CoV-2 proteins are recognized by human B- and T-cell immune responses. Using a comprehensive computational prediction algorithm and stringent selection criteria, we have predicted and identified potent B- and T-cell epitopes in the structural proteins of SARS-CoV and SARS-CoV-2. The amino acid residues spanning the predicted linear B-cell epitope in the RBD of the S protein (370-NSASFSTFKCYGVSPTKLNDLCFTNV-395) have recently been shown to interact with CR3022, a previously described neutralizing antibody known to neutralize SARS-CoV-2 through binding to the RBD of the S protein. Intriguingly, most of the amino acid residues spanning the predicted B-cell epitopes (aa 331-NITNLCPFGEVFNATRFASVYAWNRK-356, aa 403-RGDEVRQIAPGQTGKIADYNYKLPD-427 and aa 437-NSNNLDSKVGGNYNYLYRLFRKSNL-461) of the S protein have been experimentally verified to interact with the cross-neutralizing mAbs (S309 and CB6) in a manner independent of the ACE2 receptor-S protein interaction. In addition, we found that the computationally predicted epitope of the S protein (370-395) is likely to function as both a linear B-cell and an MHC class II epitope. Similarly, the 403-427 and 437-461 peptides of the S protein were predicted as both linear B-cell and MHC class I epitopes, while the 177-196 and 1253-1273 peptides of the S protein were predicted as both linear and conformational B-cell epitopes. The MHC class I epitope 316-GMSRIGMEV-324, predicted as a high-affinity epitope (HLA-A*02:03, HLA-A*02:01, HLA-A*02:06) common to the N protein of both SARS-CoV-2 and SARS-CoV (N317-325), was previously shown to induce interferon-gamma (IFN-γ) in PBMCs of SARS-recovered patients.
Interestingly, two MHC class I epitopes, 1041-GVVFLHVTY-1049 (HLA-A*11:01, HLA-A*68:01, HLA-A*03:01) and 1202-FIAGLIAIV-1210 (HLA-A*02:06, HLA-A*68:02), derived from the SARS-CoV S protein and with an epitope conservancy of 85 to 100% with the S protein of SARS-CoV-2, were experimentally verified using PBMCs derived from SARS-CoV patients. We observed that HLA-A*02:01, HLA-A*02:03, HLA-A*02:06, HLA-A*11:01, HLA-A*30:01, HLA-A*68:01, HLA-A*68:02, HLA-B*15:01 and HLA-B*35:01 are predicted to bind the maximum number of MHC class I epitopes of the S protein of SARS-CoV-2 (the criterion being an allele predicted to bind more than 30 epitopes). Similarly, HLA-A*02:06, HLA-A*30:01, HLA-A*30:02, HLA-A*31:01, HLA-A*32:01, HLA-A*68:01, HLA-A*68:02, HLA-B*15:01 and HLA-B*35:01 are predicted to bind the maximum number of MHC class I epitopes of the N protein of SARS-CoV-2. We found that HLA-DRB1*04:01, HLA-DRB1*04:05, HLA-DRB1*13:02, HLA-DRB1*15:01, HLA-DRB3*01:01, HLA-DRB3*02:02, HLA-DRB4*01:01, HLA-DRB5*01:01, HLA-DQA1*04:01, DQB1*04:02, HLA-DPA1*02:01, DPB1*01:01, HLA-DPA1*01:03, DPB1*02:01, HLA-DPA1*01:03, DPB1*04:01, HLA-DPA1*03:01, DPB1*04:02, HLA-DPA1*02:01, DPB1*05:01, HLA-DPA1*02:01, and DPB1*14:01 are predicted to bind the maximum number of MHC class II epitopes of the S protein of SARS-CoV-2. Alleles such as HLA-DRB1*04:01, HLA-DRB1*07:01, HLA-DRB1*08:02, HLA-DRB1*09:01, HLA-DRB1*11:01, HLA-DRB1*13:02, HLA-DRB3*02:02, HLA-DRB5*01:01, HLA-DQA1*01:02, DQB1*06:02, DPB1*05:01 and HLA-DPA1*02:01 are predicted to interact with the maximum number of MHC class II epitopes of the N protein of SARS-CoV-2. Using the IEDB tool, we found that these HLA alleles give a population coverage of around 99% worldwide.
The computationally predicted mega-pool of B- and T-cell epitopes identified in the four main structural proteins of SARS-CoV-2 provides a platform for future experimental validation, and the results of the present work support the use of the RBD or the full-length S and N proteins in designing a recombinant protein-based vaccine and a serological diagnostic assay for SARS-CoV-2.


Introduction
PSIPRED 45 and helical and strand content of the constructs are provided in Table S2 and proteins in the MHC class II pathway display preferential cleavage of dibasic (RR, KK, KR or RK) sites 46. Proteases, which provide peptide ligands for the MHC class II antigenic presentation pathway, display preferential cleavage of hydrophobic motifs (AAY). The cleavable linkers are required to be accessible to the proteases associated with the MHC I and II antigen processing pathways. The cleavable linker residues (AAY and KK) used in the multi-epitope subunit vaccines were surface accessible according to computational prediction algorithms, indicating a high probability of T-cell epitope presentation by MHC molecules, as visualized with Discovery Studio (Fig. 10a-f). The results of C-ImmSim server

In a preprint by Farrera-Soler et al. (2020), the three linear epitopes (655-672, 787-822 and

KEELDKYFKNHTSPDVDL-1166, Table 1d) was predicted as a potent B-cell epitope using the ABCpred server, thereby corroborating the recent detection of the above epitope in the sera of SARS-CoV-2-positive patients 52. Poh and colleagues (2020)

Table 1c) was found highly reactive using the convalescent sera of SARS patients, suggesting a potential application for the serologic diagnosis of SARS 60.
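The way cleavable linkers sit between epitopes in a multi-epitope construct can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_construct`, the epitope order, and the choice of the AAY linker for these three peptides are hypothetical, and the resulting sequence is not one of the constructs designed in this work. The peptides themselves are MHC class I epitopes quoted in the abstract.

```python
def build_construct(epitopes, linker):
    """Join epitopes with `linker`; return the sequence and 1-based linker spans."""
    parts = []
    linker_spans = []  # inclusive (start, end) residue positions of each linker
    length = 0         # number of residues emitted so far
    for index, epitope in enumerate(epitopes):
        if index > 0:
            # A linker is placed only between neighbouring epitopes.
            linker_spans.append((length + 1, length + len(linker)))
            parts.append(linker)
            length += len(linker)
        parts.append(epitope)
        length += len(epitope)
    return "".join(parts), linker_spans


# Hypothetical example: epitope order and linker choice are assumptions.
epitopes = ["GMSRIGMEV", "GVVFLHVTY", "FIAGLIAIV"]
construct, spans = build_construct(epitopes, "AAY")
print(construct)
print(spans)
```

The returned spans mark the residues whose surface accessibility one would inspect in a structural model of such a construct.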

In this work we have identified and listed the top CD8+ and CD4+ T-cell epitopes based on antigenicity, conservancy, non-allergenicity and a positive score for IFN-gamma. Of these, 41 MHC class I epitopes were predicted for the S protein, 19 each for the N and M proteins, and 10 for the E protein of SARS-CoV-2 (Table 3c), while 26 MHC class II epitopes were predicted for the S protein, 8 for the N protein, and 6 and 2 for the M and E proteins, respectively (Table 4c)

revealed that peptides corresponding to residues 1-30, 86-100, 306-320, and 351-365 (Table S3 and S4) of the N protein of SARS-CoV (isolate BJ01) have been reported earlier as immunodominant T-cell epitopes. It was previously observed that peptides corresponding to residues aa 336-350 were capable of stimulating IFN-γ production in T-cell cultures derived from peripheral blood mononuclear cells, and work on the peptide spanning the region aa 81-PDDQIGYYRRATRRV-95 confirmed that YYRRATRRV is a very potent functional CD8+ T-cell epitope of the N protein 61. Intriguingly, this functional CD8+ T-cell epitope (aa 86-GYYRRATRR-94), which is found to bind two alleles (HLA-A*31:01, HLA-A*33:0), is conserved between the N proteins of SARS-CoV-2 and SARS-CoV (~85%, Table S3). Additionally, the epitopes corresponding to aa 350-NVILLNKHIDAYKTF-364, aa 305-IAQFAPSASAFFGMS-319 and aa 352-ILLNKHIDA-360 were found to be common to the N protein of both SARS-CoV-2 and SARS-CoV (~80% conservancy, Table S3 and Table S4). Some of the B- and T-cell epitopes predicted here using computational algorithms agree with the experimentally verified immunogenic epitopes of SARS-CoV and SARS-CoV-2. The potential for cross-protection also exists, as the selected proteins and predicted epitopes used in generating the chimeric multi-epitopes exhibited considerable conservation across the structural proteins of SARS-CoV and SARS-CoV-2.
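Residue numbering of a short epitope inside a longer parent peptide can be checked with a few lines of code. The sketch below assumes only the 15-mer aa 81-PDDQIGYYRRATRRV-95 and the two 9-mers quoted above; the helper name `locate` is hypothetical, and the code makes no claim about which 9-mer is the functional epitope.

```python
def locate(parent, parent_start, peptide):
    """Return the inclusive (start, end) residue numbers of `peptide` in `parent`.

    `parent_start` is the residue number of the first amino acid of `parent`.
    """
    offset = parent.find(peptide)
    if offset < 0:
        raise ValueError(f"{peptide} does not occur in {parent}")
    start = parent_start + offset
    return start, start + len(peptide) - 1


parent = "PDDQIGYYRRATRRV"  # N protein peptide, residues 81-95
print(locate(parent, 81, "GYYRRATRR"))
print(locate(parent, 81, "YYRRATRRV"))
```

Both 9-mers lie within the 15-mer and overlap by eight residues, shifted by one position relative to each other.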

Designing synthetic peptides for use as vaccines to induce both humoral and cell-mediated

We have used NCBI GenBank SARS-CoV-2 (isolate WIV02, accession number MN996527.1 1) and SARS-CoV (isolate BJ01, accession number AY278488.2 7) to download

servers were used to predict conformational B-cell epitopes. EPSVR uses a Support Vector Regression (SVR) method to predict antigenic B-cell epitopes 10. The DiscoTope server predicts discontinuous B-cell epitopes from protein 3D structures. The method utilizes calculation of surface accessibility and a novel epitope propensity amino acid score 11. The CBTOPE server predicts conformational B-cell epitopes with an accuracy of more than 85% using the antigen primary sequence in the absence of any homology with known structures 12. ElliPro 13 predicts linear and discontinuous antibody epitopes based on the protein structure and a homology-based model of the amino acid sequence. The helical behaviour of the predicted monomeric peptides was computed using the Agadir server 14,15.

TepiTool is an interactive and easy to use tool to predict potential peptides binding to MHC

To determine the immunogenicity and immune response profile of the multi-epitope protein constructs containing B- and T-cell epitopes, in silico immune simulation was carried out using the C-ImmSim server. The C-ImmSim server uses a position-specific scoring matrix and machine learning techniques to predict epitopes and immune interactions. The server simulates three components of the immune system found in mammals 26: (i) the bone marrow, where hematopoietic stem cells are simulated and produce new lymphoid and myeloid cells; (ii) the thymus, where naive T-cells are selected to avoid autoimmunity; and (iii) a tertiary lymphatic organ, such as a lymph node. All simulation parameters were set at default, with time steps set at 1, 84, and 168 (each time step is 8 hours and time step 1 is injection at time = 0). Therefore, three injections were given four weeks apart without lipopolysaccharide (LPS). The Simpson index, D (a measure of diversity), was interpreted from the plot.
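The arithmetic behind the injection schedule, and one common form of the Simpson index, can be sketched as below. The function names are hypothetical, and the formula D = Σ p_i² is an assumption: the text does not state which variant of the index the server plots. With 8-hour steps and step 1 at time zero, steps 84 and 168 fall at about 27.7 and 55.7 days, that is, roughly four weeks apart.

```python
def injection_days(time_steps, hours_per_step=8):
    """Convert simulation time steps to days, with step 1 defined as time zero."""
    return [(step - 1) * hours_per_step / 24 for step in time_steps]


def simpson_index(counts):
    """Simpson index D = sum(p_i ** 2) over clone (or species) proportions.

    Assumed definition: D near 1 means one clone dominates,
    D near 0 means high diversity.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((count / total) ** 2 for count in counts)


days = injection_days([1, 84, 168])
print([round(day, 1) for day in days])

# Hypothetical clone counts, used only to show the calculation.
print(simpson_index([10, 10]))
print(simpson_index([5]))
```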

One of the limitations of the present work is the lack of information on B- and T-cell epitopes of the non-structural proteins of SARS-CoV-2. In summary, the bioinformatics analysis of the structural proteins has led to the prediction and identification of a pool of B- and T-cell epitopes, which is likely to help researchers select an appropriate epitope or protein region (especially of the S and N proteins) in an effort to develop novel drugs, vaccines, and serological assays for the detection, treatment, and management of SARS-CoV-2.
We declare no competing interests. This part of the work currently has no funding support. However, NDN has submitted a project on 'development of multi-plex RT-PCR and serological assay for SARS-CoV-2' to DST, SERB, Govt. of India for financial support.

CD4+ T-cell epitopes were identified using two independent prediction algorithms; the servers that predicted the corresponding epitope are numbered (1) for NetMHCpan4 and (2)
Table 5: SARS-CoV-2 proteins that were predicted as common and overlapping B- and T-cell epitopes.
Table 6: List of potent B- and T-cell epitopes that are included in the design of multi-epitope constructs of SARS-CoV-2