Scouting the Receptor Binding Domain of COVID-19: A Comprehensive Immunoinformatics Inquisition

December 2019 witnessed the emergence of a worldwide outbreak caused by a novel strain of coronavirus termed COVID-19, which shares an overall sequence similarity of 96.2% with BatCoV RaTG13 (a coronavirus isolated from bat) and 94% sequence identity with Severe Acute Respiratory Syndrome virus (SARS-CoV), which caused the outbreak of 2002-2003. No therapeutic or preventive strategy, such as a vaccine, has been developed so far to overcome the infection. The receptor binding domain (RBD) of COVID-19 was explored for potential vaccine epitopes. The structures of the RBD of COVID-19, BatCoV RaTG13 and bACE2 were built through homology modeling, followed by molecular docking and structural validation. A comprehensive immunoinformatics approach mapped conserved peptide sequences on the COVID-19 RBD for their B-cell, helper T-cell and cytotoxic T-cell epitope profiles. The recognized epitopes were further studied and validated for their docking interactions with MHC-I and MHC-II alleles. The study identified conserved B- and T-cell epitopes in the RBD. The B-cell epitopes lying within the receptor binding motif, LFRKSN and SYGFQPT, were found to be highly antigenic. Among the T-cell epitopes, CVADYSVLY and FTNVYADSF were antigenic and exhibited affinity for the maximum number of MHC-I alleles. The T-cell epitopes YRLFRKSNL and VYAWNRKRI displayed affinity for the maximum number of MHC-II alleles. Docking analysis of the epitopes with MHC proteins revealed strong interactions of the T-cell epitopes with MHC-I and MHC-II alleles. The epitope overlapping between B- and T-cells was YRLFRKSNL. Deploying these epitopes in a potential vaccine against COVID-19 may help curb its infectious spread.

Numerous studies have reported bats as the primary reservoirs for SARS and MERS viruses [26][27][28][14,29]; however, a rodent origin has also been reported [13,30]. A recently reported bat-derived strain, BatCoV RaTG13, shares more than 96% homology with COVID-19 and more than 93% homology with its S protein, making it the closest relative of COVID-19 reported to date [11]. Conserved peptide sequences of the coronaviral RBD, identified by comparison with the closest known zoonotic coronaviral strain, can provide better potential vaccine candidates for human testing.
After the emergence of the COVID-19 pandemic, the SARS CoV surface protein has been repeatedly utilized for identification of potential vaccine epitopes for COVID-19 [24,23,25]. However, there has also been simultaneous speculation regarding the potential existence of cross-reactive epitopes between SARS CoV2 and SARS CoV [17,31,32]. Earlier, computational probing of protein structures for respiratory infections by docking methods has added useful information regarding stereochemical properties, virus-binding-mediated conformational transformation of host receptors, and binding preferences [33][34][35][36][37]. Contextually, viral receptor interactions were considered valuable in the instances of picornaviruses, influenza, HIV and coronaviruses [38][39][40][20]. For the current COVID-19 infection, the basic reproduction number for viral transmissibility (R0) is, as per various estimates, around 1.1-5.5 [41,42]. Since these estimates point towards a very high infectivity rate, it is pertinent to target the viral binding regions with vaccines to prevent infection.
Vaccinable peptide sequences for epitope-based vaccines against Alphaviruses, Hepatitis B & C, HIV, HPV and Influenza viruses have shown propitious results in the recognition of potent immunogens [43][44][45][46]. Several studies on SARS and MERS CoV strains provided useful information regarding the potential epitopes retained by these strains [47][48][49][50][51][52][53][54], while the data pertinent to COVID-19 are insufficient. The global COVID-19 pandemic has sparked rigorous R&D activity for COVID-19 vaccine development, and in a matter of just more than 4 months, various potential vaccine candidates are in the preclinical and clinical development phases [55]. However, COVID-19 is infecting people around the world with varied clinical symptomatology, ranging from completely asymptomatic to rapidly progressing lethal respiratory insufficiency, which demands the use of more, and more rapid, novel technology platforms with more vaccinable options against COVID-19 [55,56].
Therefore, this study reports key findings regarding the variable and conserved residues of the COVID-19 RBD in comparison with the BatCoV RaTG13 strain, which can be considered as immunogenic epitopes for a potential multi-epitope vaccine candidate for COVID-19 in the backdrop of its binding orientation.

Materials And Methods:
To identify antigenic epitopes within the RBD of the spike glycoprotein of COVID-19, in silico analysis was performed (Figure I). The antigenicity of the RBD was determined through VaxiJen v2.0 [57].

Sequence Retrieval and Multiple Sequence Alignment:
Protein sequences of the spike glycoprotein of COVID-19 (reported till 31st March, 2020) were retrieved from the National Center for Biotechnology Information (NCBI). Sequences from China, Australia, USA, Taiwan, India, Pakistan, Nepal, Italy, Sweden, Brazil, Vietnam, Spain, Colombia, Peru and Japan were selected for analysis. The spike glycoprotein sequences from bat coronavirus (BatCoV RaTG13) (GISAID accession no. EPI_ISL_402131) and SARS coronavirus ZJ02 (Accession No. ABB29898) were used as references for comparison. Sequence analysis was performed to ascertain the changes in the receptor binding domain of the spike glycoprotein. Multiple sequence alignment was performed using Clustal X. The consensus sequence of COVID-19 was used as input for epitope prediction.
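The consensus step described above can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length (for example, exported from Clustal X) and takes a simple column-wise majority vote; the function names and the toy alignment are hypothetical and are not the study's actual data or the exact procedure used.

```python
from collections import Counter


def consensus(aligned):
    """Return the column-wise majority residue of equal-length aligned sequences."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        # Gap characters do not vote; a gap is kept only if the whole column is gaps.
        residues = [c for c in column if c != "-"]
        out.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    return "".join(out)


def conserved_fraction(aligned):
    """Fraction of alignment columns in which every sequence has the same character."""
    columns = list(zip(*aligned))
    return sum(len(set(col)) == 1 for col in columns) / len(columns)


# Toy alignment (hypothetical, not real spike sequences)
toy = ["LFRKSN", "LFRKSN", "LFRQSN"]
print(consensus(toy))
print(round(conserved_fraction(toy), 3))
```

A majority vote is only one way to define a consensus; ties here fall to the residue seen first in the column, which is an arbitrary choice of this sketch.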

B-cell Epitope Prediction:
For identification of B-cell epitopes, the Immune Epitope Database (IEDB) and BepiPred-2.0 were used [58,59]. The Kolaskar & Tongaonkar method predicted antigenicity on the basis of amino acid abundances in naturally occurring epitopes as well as their physicochemical properties. The default threshold was set to 1.00 for antigenicity determination [60]. The Emini surface accessibility method predicts the surface accessibility of epitopes, as surface-accessible peptides are recognized by the immune system [61]. The Chou & Fasman beta-turn method was used to predict the antigenic regions exhibiting beta turns, as beta turns are usually hydrophilic in nature and highly accessible [62]. The Karplus & Schulz flexibility method predicts antigens containing flexible amino acids, as flexibility is correlated with antigenicity [63]. The BepiPred prediction method, based on a hidden Markov model, predicts linear epitopes in proteins [64]. The B-cell epitopes were also predicted using ElliPro, which identifies linear and discontinuous epitopes in protein structures [65].

T-cell Epitope Prediction:
In vaccine development, cytotoxic T-lymphocyte (CTL) epitopes play an important role. Hence, T-cell epitopes were identified that have the ability to bind major histocompatibility complex class I (MHC-I) and class II (MHC-II) molecules. CTL epitopes were identified through the NetCTL 1.2 server [66]. The Immune Epitope Database (IEDB) and the NetMHCI 4.0 server were used to predict the binding of epitopes to MHC-I. NetMHCI 4.0 predicts binding affinity through an artificial neural network (ANN) trained on 81 distinct HLA-A, -B, -C and -E human MHC alleles [67]. T-helper cell epitopes were predicted through IEDB and the NetMHCII 2.3 server [68]. Epitopes were predicted that have high affinity towards HLA-DR, -DQ, and -DP. For all the T-cell epitopes, the threshold for predicting strong binding affinity with MHC-I and MHC-II was set to 500 nM.
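The 500 nM cutoff and the later ranking of peptides by the number of alleles they bind can be sketched as a small filter. This is an illustration only: the input layout (peptide, allele, predicted IC50 in nM), the function names, the placeholder peptide labels and all numeric values are assumptions, and whether the cutoff is inclusive is not stated in the text (a value of exactly 500 nM is treated as binding here).

```python
from collections import defaultdict

STRONG_BINDER_NM = 500.0  # cutoff stated in the text; inclusiveness is assumed


def alleles_bound(predictions, threshold=STRONG_BINDER_NM):
    """Map each peptide to the sorted list of alleles with IC50 at or below the cutoff."""
    bound = defaultdict(set)
    for peptide, allele, ic50_nm in predictions:
        if ic50_nm <= threshold:
            bound[peptide].add(allele)
    return {peptide: sorted(alleles) for peptide, alleles in bound.items()}


def rank_by_coverage(predictions, threshold=STRONG_BINDER_NM):
    """Order peptides by how many alleles they bind (most first, then alphabetically)."""
    coverage = alleles_bound(predictions, threshold)
    return sorted(coverage, key=lambda p: (-len(coverage[p]), p))


# Hypothetical prediction rows, not output from NetMHC or IEDB
rows = [
    ("PEP_A", "HLA-A*01:01", 120.0),
    ("PEP_A", "HLA-B*15:01", 480.0),
    ("PEP_A", "HLA-A*02:01", 2500.0),
    ("PEP_B", "HLA-A*01:01", 60.0),
    ("PEP_C", "HLA-A*01:01", 900.0),
]
print(alleles_bound(rows))
print(rank_by_coverage(rows))
```

Peptides with no allele under the cutoff (PEP_C in the toy rows) simply drop out of the ranking.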

B- and T-cell Epitopes Feature Profiling:
B- and T-cell epitopes were further scrutinized for their enzymatic digestion, toxicity, hydrophobicity, and physicochemical properties. Digestion of peptides by enzymes is an important parameter in vaccine development, as peptides that are digested by many enzymes are usually unstable. Hence, the digestion of peptides by different enzymes was predicted through the protein digest server. AntiangioPred was used to predict the mutation and other physicochemical properties of peptides. ClanTOX predicted the toxicity of peptides [69]. Antigenicity of the peptides was predicted through the Immunomedicine group server. For a peptide to be considered antigenic, the threshold is 1.0 [60].
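To make the digestion criterion concrete, the sketch below counts predicted cleavage sites in a peptide using two simplified textbook rules: trypsin cutting after K or R, and high-specificity chymotrypsin cutting after F, Y or W, in both cases not when the next residue is P. These rules, the enzyme table and the function names are assumptions chosen for illustration; they are not the rule set of the protein digest server used in the study, and the printed site lists follow from these rules only, not from the study's results.

```python
# Simplified cleavage rules (assumed): residues after which each enzyme cuts.
ENZYME_RULES = {
    "trypsin": "KR",
    "chymotrypsin_high_specificity": "FYW",
}


def cleavage_sites(peptide, cut_after, blocked_by="P"):
    """Return 1-based positions of residues after which the peptide is cut.

    The C-terminal residue is never reported, since nothing follows it.
    """
    sites = []
    for i in range(len(peptide) - 1):
        if peptide[i] in cut_after and peptide[i + 1] not in blocked_by:
            sites.append(i + 1)
    return sites


def digestion_profile(peptide):
    """Map each enzyme name to its predicted cleavage sites in the peptide."""
    return {name: cleavage_sites(peptide, residues)
            for name, residues in ENZYME_RULES.items()}


# Two epitopes named in the text, scored with the simplified rules above
for epitope in ("YRLFRKSNL", "CVADYSVLY"):
    print(epitope, digestion_profile(epitope))
```

Counting the enzymes that return a non-empty site list gives a crude "digested by how many enzymes" score in the spirit of the criterion described above.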

Human proteome analysis for nonhuman homologues:
To avert autoimmunity, vaccine candidates were screened against the human proteome to separate human homologues from nonhuman homologues.
Selected epitopes were compared with the human proteome via BLASTp analysis, and those sharing <30% identity were retained as nonhuman homologues.
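The <30% identity screen can be sketched as a filter over BLASTp results. The sketch assumes the search was exported in BLAST+ tabular format (`-outfmt 6`), whose first three columns are query ID, subject ID and percent identity; it further assumes that query IDs are the epitope labels and that an epitope with no reported hit counts as non-homologous. The labels and hit lines are hypothetical.

```python
def max_identity(blast_tab_lines):
    """Return the highest percent identity reported for each query ID."""
    best = {}
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, pident = fields[0], float(fields[2])
        best[query] = max(best.get(query, 0.0), pident)
    return best


def non_homologous(epitopes, blast_tab_lines, cutoff=30.0):
    """Keep epitopes whose best human hit is below the identity cutoff."""
    best = max_identity(blast_tab_lines)
    return [e for e in epitopes if best.get(e, 0.0) < cutoff]


# Hypothetical tabular hits (12 standard columns), not real search output
hits = [
    "epi1\thumanP1\t22.2\t9\t7\t0\t1\t9\t40\t48\t5.1\t18.0",
    "epi2\thumanP2\t66.7\t9\t3\t0\t1\t9\t10\t18\t0.8\t24.0",
    "epi2\thumanP3\t25.0\t8\t6\t0\t1\t8\t90\t97\t9.0\t16.5",
]
print(non_homologous(["epi1", "epi2", "epi3"], hits))
```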

Docking of T-cell epitopes with MHC-I and MHC-II Alleles:
The peptides showing affinity for the maximum number of MHC-I and MHC-II alleles were selected for interaction analysis. The structures of the peptides were modelled through the PEPFOLD server [70], followed by energy minimization. In the case of MHC-I, the allele common to the selected peptides was chosen for docking. Hence, the crystal structure of human HLA-A*0101 was downloaded from the Protein Data Bank (PDB ID: 6AT9; resolution = 2.9 Å). The same criterion was followed for MHC-II alleles, for which the crystal structure of HLA-DRB5*01:01 (PDB ID: 1FV1; resolution = 1.9 Å) was also downloaded from the Protein Data Bank.
Both the structures after ligand removal underwent protonation followed by energy minimization by

Interaction analysis of BatCoV RaTG13 with bACE2:
After identification of the vaccine epitopes, we further explored whether these epitopes harbor any important residues that are involved in the binding of COVID-19 to hACE2 and of BatCoV RaTG13 to bACE2. The interactions of COVID-19 with hACE2 have recently been reported by Lan et al., 2020 [71].
The interaction analysis of BatCoV RaTG13 with bACE2 was performed in the current study. To our knowledge, the structures of the RBD of BatCoV RaTG13 and of bACE2 have not been determined yet.
Hence, the structures of both proteins were modeled through homology modeling using Modeller V9.23. BatCoV RaTG13 was modeled using SARS-CoV as template (PDB ID: 2GHV), while bACE2 was modeled using human angiotensin converting enzyme (PDB ID: 1R42) as template. The generated models were subjected to model evaluation and structural validation via Ramachandran plot, PROSA, ERRAT, Qmean, and MolProbity. The Ramachandran plot reports the presence of amino acid residues in allowed, favored and outlier regions on the basis of the torsional angles (Ψ and Φ) of amino acids [72].
PROSA reveals the quality of a model by estimating any errors in it. It also calculates the score of the model on the basis of experimentally determined (X-ray and NMR) protein structures [73]. Qmean (Qualitative Model Energy Analysis) appraises the geometry of a protein structure by measuring the torsion angles over three consecutive amino acid residues [74]. MolProbity evaluates a protein structure by assessing its geometry [75]. ERRAT gauges the quality of a model by analyzing the statistics of non-bonded interactions between different types of atoms and comparing these values with those of highly refined structures [76]. The best model was then subjected to energy minimization using the AMBER99 force field implemented in Molecular Operating Environment (MOE) version 2015.10. Docking of BatCoV RaTG13:bACE2 was performed using the HADDOCK web server [77]. Analysis of protein-protein interactions was performed through pdbSum [78] and PyMOL v2.3.
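The torsional angles that underlie the Ramachandran plot are dihedral angles between four consecutive backbone atoms: Φ is defined by C(i-1), N(i), CA(i), C(i) and Ψ by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of the standard dihedral calculation from four 3D points; the function names and test coordinates are illustrative assumptions and are not taken from any of the validation tools named above.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (cis = 0, trans = 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    b2_unit = (b2[0] / b2_len, b2[1] / b2_len, b2[2] / b2_len)
    x = _dot(n1, n2)
    y = _dot(b2_unit, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates only (not atoms from a real structure)
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this function to the backbone atoms of each residue of a model yields the (Φ, Ψ) pairs that are then placed in favored, allowed or outlier regions.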

Multiple Sequence Alignment:
Receptor binding domain of COVID-19 is 192 amino acids long (within position 330-522 amino acids) lying in S1 region of spike glycoprotein. When comparing the receptor binding motif with the BatCoV

B-Cell epitopes within Receptor Binding Domain:
Continuous B-cell epitopes were predicted using B-cell epitope prediction methods on IEDB server. The Kolaskar & Tongaonkar method predicted 11 antigenic epitopes in the receptor binding domain (Table I) which can prompt B-cell responses. Surface accessibility analysis revealed 4 epitopes with surface accessibility (Table II).
Flexibility of epitopes is a measure of antigenicity [53]. The flexible epitopes in RBD were at positions  Fig III.

Cytotoxic T-cell Epitope Prediction:
The default settings of the NetCTL server were used to predict T-cell epitopes. On the basis of the highest combinatorial scores, five epitopes were selected for subsequent analysis (Table IV).

Helper T-cell Epitope Prediction:
A total of 9 peptides were predicted to exhibit strong affinity for MHC-II alleles (Table V). Among these, the peptides YRLFRKSNL and VYAWNRKRI showed affinity for the maximum number of alleles.
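The overlap between B-cell and T-cell epitopes reported in this study can be checked with a simple containment test: a T-cell epitope overlaps a B-cell epitope if the shorter peptide occurs inside the longer one. The sketch below uses the epitope sequences named in the abstract; the function name is hypothetical, and plain substring matching is an assumption that may be stricter than the overlap criterion actually applied.

```python
def overlapping_pairs(b_cell_epitopes, t_cell_epitopes):
    """Return (B, T) pairs in which one epitope sequence contains the other."""
    pairs = []
    for b in b_cell_epitopes:
        for t in t_cell_epitopes:
            if b in t or t in b:
                pairs.append((b, t))
    return pairs


# Epitope sequences as named in the abstract
b_cell = ["LFRKSN", "SYGFQPT"]
t_cell = ["CVADYSVLY", "FTNVYADSF", "YRLFRKSNL", "VYAWNRKRI"]
print(overlapping_pairs(b_cell, t_cell))
```

With these inputs the only pair found is LFRKSN inside YRLFRKSNL, in line with the overlapping epitope named in the abstract.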

B- and T-cell Epitopes Feature Profiling:
To identify the best epitopes for vaccine construction, different features of the T-cell epitopes were determined (Table VI). The identified epitopes did not show any homology with human proteins, did not exhibit mutations, and were predicted to be non-toxic. The peptides digested by fewer enzymes were considered good potential vaccine candidates (Table VI). Antigenicity analysis of the peptides indicated that all CTL-specific peptides can be antigenic except ERDISTEIY. Among the helper T-cell epitopes, FELLHAPAT, TGCVIAWNS, and VLYNSASFS were highly antigenic, while the other peptides were less antigenic. Among the B-cell epitopes, all 3 peptides were antigenic.

Interaction Analysis of HTL Epitopes with MHC-II Specific Alleles:
Two  (Fig IV-d).

Interaction analysis of BatCoV RaTG13 with bACE2:
The interaction of BatCoV RaTG13 with bACE2 was analyzed using HADDOCK. 155 different complexes of BatCoV RaTG13:bACE2 were generated, which clustered into 12 groups. Table S1 shows the Z-scores of all the seven clusters, the size of each cluster, the RMSD from the overall lowest-energy structure, and the electrostatic, van der Waals, and desolvation energy values. The cluster with the best HADDOCK score (-178.9±3.6) was used for further analysis. Detailed interaction analysis showed that 26 residues of bACE2 and 9 residues of BatCoV RaTG13 were present at the interface. These residues

Discussion:
Research on various coronavirus species has been continually reported over the last decade and a half since the emergence of SARS, for annotating signatures and virulence factors [79].
Viral entry receptors are crucial in the viral life cycle, sustenance and egress [80]. Their particular tissue tropism further augments their importance for therapeutic targeting and for restricting viral entry into the cell, which can abolish infectivity altogether.
In the case of SARS and MERS-CoV, the spike protein, and especially the S1 region, has been the prime focus in developing immune strategies against these strains [52]. A similar strategy can be employed by investigating the S protein to identify immune epitopes against SARS CoV2. Vaccines against SARS CoV2 can serve as one of the most promising modes of containing the COVID-19 pandemic. To date, no reliable treatment options are available for COVID-19, so a vaccine against it is much needed. As the global burden of infectivity of the COVID-19 pandemic is increasing every day, computational-biology-aided vaccine design for SARS CoV2, with removal of unnecessary antigenic load and screening for allergic response, can provide the characteristic immune response required for preventing COVID-19 infection. Similar to SARS and MERS-CoV, the S1 region of COVID-19 harbors the RBD, which is involved in the entry of the virus into the host cell. Hence, the identification of antigenic peptides within the RBD can be a good strategy to forestall infection. In the current study, after identifying the important receptor binding residues, the RBD was probed for antigenic peptides. The present study focused on deriving immunogenic epitopes capable of triggering both humoral and cell-mediated immune responses, on the basis of the high degree of sequence similarity between the RBD of BatCoV RaTG13 and that of SARS CoV2. Previously, immunogenic epitopes for SARS CoV2 have been reported by immunoinformatics methods in comparison with SARS CoV [24,23,25]. The present approach may yield rather specific epitopes against COVID-19. Contextually, the cross-reactivity of SARS CoV antibodies against SARS CoV2 epitopes is also under debate and is providing useful information on the potential host immune response to SARS CoV2 [32,17,31]. alleles could be very antigenic [54,53]. Among the CTL epitopes, two (peptide3: CVADYSVLY, peptide4: FTNVYADSF) were found to be highly antigenic and also showed strong binding with the HLA-A allele.
HLA are polymorphic proteins with variable expression in different populations. Therefore, HLA selectivity of T-cell epitopes, yielding a vaccine that suits all populations without inciting autoimmunity, is crucial for an effective vaccine candidate [81]. epitopes, it can be observed that these epitopes were 100% conserved in the data reported to date, as predicted by conservation analysis. These peptides also did not exhibit homology with any human protein and hence may not incite autoimmunity. These peptides did not display any toxicity.
The digesting enzyme data showed these peptides to be indigestible by many enzymes, and hence they are safer to use (Table V). Designing a vaccine against a viral infection such as COVID-19 is tricky. On the one hand, it has to be ensured that the vaccinable epitopes hold enough antigenic potential to mount a befitting yet specific immune response, so as to rapidly clear the infection if the need arises. On the other hand, the host immune response should not be large enough to cause chronic inflammation, which in the case of COVID-19 can significantly worsen lung infection, considering that the lung is highly sensitive to inflammatory changes posed by a surge of cytokine response [84]. Mutations in the viral genome are enabling coronaviruses to breach the species barrier repeatedly. As coronaviruses harbor an error-prone RNA-dependent RNA polymerase, recombination events with mutational diversity may arise, creating therapeutic challenges and conferring a survival advantage on the virus [79]. This was the case observed in the SARS-2004-2004 epidemic [85,86]. It is of grave concern that COVID-19 has the potential to cause pandemics, considering the R0 estimates. As bats are considered primary hosts for coronavirus species, it will be interesting to scrutinize how bats evade viral entry, as previous studies have identified that bats have evolved mechanisms for limiting interferon activation through the STING pathway [87].

Conclusion:
The current study proposed potential multi epitopes for vaccine development against COVID-19.

[Figure legend, partially recovered] ... interacting residues with hACE2, while blue dots represent interacting residues of BatCoV RaTG13 with bACE2.

Supplementary Files
This is a list of supplementary files associated with this preprint: TableS1.docx