Antigenic protein sequence selection
The complete genome sequence of SARS-CoV-2 was taken from the NCBI database and downloaded in FASTA format for later use. The results of Tai et al indicate that amino acid residues 331-524 in the spike protein constitute the receptor binding domain of the virus, the neutralization of which can effectively prevent viral entry into the host cell, thereby serving as a useful candidate for a vaccine. The selected protein sequence was then run through Vaxijen 2.0 with a threshold of 0.4. The sever predicted a score of 0.5356 which indicates that the overall RBD protein has a strong potential for being an antigen (table 1).
Prediction of B cell epitopes
A preliminary list of potential B cell epitopes was generated through the IEDB B cell epitope prediction tool. Various prediction methods based on physiochemical properties and antigenicity were utilized. The parker hydrophilicity of the RBD protein was found to be 1.316(average), -3.743(minimum), and 5.371(maximum). The Chou and Fasman beta turn was found to be 1.040(average), 0.694(minimum) and 1.397(maximum). The Karplus and Schilz flexibility was found to be 0.989(average), 0.896(minimum), 1.112(maximum). Viable epitopes should be exposed on the surface for better contact and so the surface accessibility was evaluated. The Emini Surface Accessibility of the protein was found to be 1.000(average), 0.074(minimum), 3.839(maximum). The Kolaskar and Tongaonkar antigenicity was found to be 1.042(average), 0.907(minimum), and 1.214(maximum). IEDB’s Hidden Markov model based Bepipred and Random Forest algoritm based Bepipred 2.0 prediction was also used(figure 1). The epitopes that consistently mathched through several prediction methods were chosen. Another server called BcePred that evaluated similar physiochemical methods also generated some additional epitopes while agreeing with most of the epitopes predicted by the IDEB tools. A total of 14 B cell epitopes were generated through both servers which met the criteria for having a Vaxijen antigenicity score of higher than 0.4 and length of greater than 4 amino acids (table S1).
Prediction of T cell epitopes
An initial list of Cytotoxic(CTL) and Helper T cell (HTL) epitopes was created with the help of the IEDB Tepi tool. Generally, T cell epitopes are predicted based on their binding affinity to different alleles of Human Leukocyte Antigen (HLA) molecules, a crucial step for antigen presentation to T cells. For selection of CTL epitopes, the 27 most frequently expressed HLA alleles of supertypes A and B were selected, and the threshold was set to select the epitopes of score of top 1% and the length of 8-11 peptides. Their antigenic potential was also evaluated and yielded a total of 23 potential CTL epitopes (table S2). Interestingly a few epitopes such as PYRVVVLSF and KLNDLCFTNV were completely or nearly identical to their SARS counterparts which have been previously confirmed through MHC binding assays.
The IEDB Tepi tool was also used for the selection HTL epitopes, and the set of 26 most frequently expressed HLA alleles for MHC class II molecules was used. The threshold was set to generate the epitopes with a score in the top 10% and length of 8-11 peptides. The dataset was similarly filtered to only keep epitopes with high antigenicity and produced a total of 7 Helper T cell epitopes (table S3).
Analysis and selection of appropriate T cell and B cell epitopes
Evaluating the safety of any antigen or epitope remains of great importance since many epitopes can resemble potent allergens or toxins and therefore can result in adverse and unwanted immune responses. Accordingly, all the epitopes were tested for their resemblance to allergens and toxins using Allergen FP1.0 and ToxinPred respectively. Almost all of the epitopes were found to be nontoxic, but the majority of B cell and T cell epitopes were filtered out due to their high potential for acting as an allergen. Ultimately 3 CTL, 5 HTL, and 5 B cell epitopes were selected to constitute the multi-epitope vaccine based on their high antigenicity and high number of binding HLA alleles (Table 2). Although most of the selected HTL epitopes bound to several HLA alleles, only 2 CTL epitope demonstrated significant binding to at least 3 HLA alleles.
Epitope conservancy and population analysis
The final list of epitopes was evaluated for their conservancy across different SARS-CoV-2 isolates. The China National Center for Bioinformation was used because of its vast database of almost 4000 SARS-Cov-2 isolates from 154 locations worldwide. Multiple sequence alignment was performed on the RBD region across all the isolates as well as SARS-CoV, the results of which are shown in figure 2. Across all the sequences of SARS-CoV-2, there were no mutations in the amino acid sequence despite 3 different nucleotide mutations. In contrast, significant variation was observed between SARS-CoV-2 and SARS-CoV protein sequences in the RBD region which may partially explain the failure of some SARS-CoV antibodies to cross neutralize SARS-CoV-2 infections.
Due to the fact that HLA alleles are expressed in very different frequencies in different genetic populations, it is important to ensure that the selected epitopes are able to bind to the HLA alleles across different segments of populations. Therefore, the IEDB population analysis tool was used to determine the population coverage of the T cell epitopes in populations across the world. The results of the population analysis are provided in figure 3. The HTL epitopes covered over 99% of the global population and had similar levels of coverage across all continents. However, the 3 CTL epitopes covered only 74% of the global population, in part because of the low number of corresponding alleles for each epitope. Nonetheless, the selected T cell epitopes for the vaccine component altogether cover 99.92% of the global population have over 99% population cover in all continents.
Construction of multi-epitope vaccine
The selected epitopes were fused into a single vaccine component along with an adjuvant and a universal immunogenic sequence to create the final vaccine construct. The epitopes were linked using AAY linkers to increase protein stability. The adjuvant and the universal immunogenic sequence were connected to the epitopes through rigid EAAAK liners. Rigid linkers provide the distinct advantage of separating the different proteins of interest and preventing any cross-interactions. Certain regions such as amino acid residues 174-193 contained several epitopes and so overlapping regions were combined into a single peptide chain encompassing all those epitopes (table 3). The N and C terminals of bacterial flagellin were added to the peptide chain as an adjuvant. These alpha helical regions of flagellin which are largely conserved across bacterial species act as a potent TLR 5 agonist and induce a mixed Th1 and 2 response. To further increase the immunogenicity of the vaccine, the pan-DR epitope (PADRE) with the sequence AKFVAAWTLKAAA was attached to the vaccine. This universal Helper T cell epitope has been shown to improve efficiency of several peptide vaccines by boosting both the helper T cell and cytotoxic T cell response. The final peptide sequence of the multi-epitope vaccine candidate along with the addition of flagellin and PADRE is shown in table 4.
Structural Analysis
The primary structure of the vaccine construct was analyzed for its physical and chemical properties. It was found to have the formula C1720H2758N506O555S3 with a total of 5542 atoms and a molecular weight of 39.5 kDa. The estimated half-life of the vaccine was predicted to be 20 hours (in mammalian cells, in vitro), and 10 hours (in E coli, in vivo). Importantly, the instability index of the protein was determined to be 37.05 which classifies it as stable. The Aliphatic index of 87.86 reflected this observation. Furthermore, the Grand Average of hydropathicity was -0.288 which classified the protein as hydrophilic. The secondary structure was predicted using PSIPRED whose results are indicated in figure 4. The results indicated that 77.53% of the protein consisted of alpha helices, 21.37% consisted of random coils, and 1.09% was made of extended strands.
The tertiary structure was constructed in SparksX and is shown in figure 5. Due to high levels of unfavorable residues in the initial structure, the model was refined in GalaxyRefine which generated 5 possible models. The top model was chosen and structurally evaluated in ProSA and PROCHECK. ProSA compares the structure to other experimentally determined protein structures and assigns it a z-score. The z-score for the vaccine was -5.06 which is well in range of z-scores for proteins of that size. The residue energy chart was also generated and the energies of most of the residues were negative, suggesting a stable structure. The structure was further evaluated in RAMPAGE and a Ramachandran plot was generated. 96.7% of residues were in the favored range, 2.5% in the allowed range, and 0.8% in the outlier range. The results of the tertiary structural analysis are illustrated in figure 5.
Protein-peptide molecular docking
The T cell epitope interactions with their respective HLA molecules was modeled in PepSite and the results are shown in figure 5. Effective T cells response necessitate strong bonding of the epitope with HLA molecules. For some T cell epitopes which were identical in SARS, the binding to HLA alleles has already been experimentally performed through MHC assays and so was not modeled here. For instance, PYRVVVLSF has already been proven to interact with HLA-A*01:01, HLA-A*23:01, HLA-A*24:02, and HLA-A*26:01. Nonetheless, all of the T cell epitopes chosen for modeling bound strongly to their corresponding HLA molecules at an average of 6 residues. For instance, the epitope ASVYAWNRK bonded to HLA-A*30:01 at ala-1, val-3, tyr-4, ala-5, asn-7, and arg-8. Due to the intrinsic lack of antigenicity of peptide vaccines, the interaction between the adjuvant and pattern recognition receptors such as Toll like receptors is critical for the generation of a potent immune response. Therefore, the molecular docking between the vaccine and Toll-like receptor 5 was modeled in Patch Dock (figure 6). As expected, the alpha helix rich flagellin interacted with TLR 5 at 14 key residues and demonstrated its function as an TLR 5 agonist.
Codon optimization in silico cloning
Finally, a 6xHis tag was added for purification purpose. For expression in bacteria, the genomic sequence of the vaccine construct has to be codon optimized to yield optimal expression since the same amino acids are coded using different nucleotides in different organisms. Therefore, the peptide sequence of the vaccine was codon optimized in JAVA Codon Adaption Tool and the generated sequence is shown in table 5.
The codon optimized sequence was then inserted into a vector for expression in E coli K12. The pET-28 a (+) expression vector was chosen from Addgene for its high expression rates [40]. The gene coding for the vaccine construct was then inserted through restriction digestion which was modeled in Benchling (figure 7).