Designing a new multi epitope-based vaccine against COVID-19 disease: an immunoinformatic study based on reverse vaccinology approach

Afshin Samimi Nemati University of Mazandaran Faculty of Basic Science Majid Tafrihi (  m.tafrihi@umz.ac.ir ) University of Mazandaran Faculty of Basic Science https://orcid.org/0000-0001-7200-4252 Fatemeh Sheikhi University of Mazandaran Faculty of Basic Science Abolfazl Rostamian Tabari University of Mazandaran Faculty of Basic Science Amirhossein Haditabar University of Mazandaran Faculty of Basic Science

During this short period of the COVID-19 pandemic, many efforts have been made to introduce therapeutic methods and new drugs to treat Covid-19 infection [14]. To this end, anti-viral drugs, interferon-α nebulization, and broad-spectrum antibiotics have shown promising results in some therapeutic cases [15][16][17]. Despite these, there is no approved drug available in the medical centers yet [18].
Reverse vaccinology (RV), an approach that utilizes the expressed genomic sequences to nd new potential vaccines [19], was rstly applied for Neisseria meningitidis group B and now is accepted as a successful method for producing new vaccines [20]. This method signi cantly reduces the time required for identifying candidate vaccines through using various servers and databases to identify candidate proteins and e cient epitope(s) [21]. For this purpose, to acquire an appropriate peptide vaccine, the biochemical and physicochemical properties of the constructed peptide considered. According to the current advances in this eld, several servers were established [22].
In the present study, we designed a new multi-epitope peptide vaccine against SARS-CoV-19 infection. For this purpose, various epitopes have been selected and engineered based on their biochemical and physicochemical properties. Finally, in silico studies were carried out on the bioactivity of the constructed peptide by docking the vaccine with Toll-like receptors (TLR3 and TLR4) and major histocompatibility complex (MHC I and MHC II).

Materials And Methods
We took all steps of this study according to Fig. 1.

Data collection
To nd proteins associated with SARS-CoV-19 infection, we used the NCBI database available at https://www.ncbi.nlm.nih.gov/protein/. Then, the amino acid sequence of the Cholera toxin subunit B (CTXB) (GenBank: BBG62270.1) has obtained in the FASTA format from the protein database of the NCBI. Accordingly, the three-dimensional structure of TLR 3, TLR 4, MHC I, and MHC II proteins were obtained from the protein databank (available at https://www.rcsb.org/), in PDB format, with PDB entry of 3ULS, 4R7N, 4UQ3A, and 4GBX, respectively.

B-cell and T-cell epitope prediction
In the following, IEDB server (available at https://www.iedb.org/) and BepiPred 2.0 accessible at (http://www.cbs.dtu.dk/) have used to predict the linear B-cell epitopes from the chosen proteins, [24,25]. For the prediction of the T-cell related epitopes that contain the MHC I and MHC II, the IEDB server was utilized. Antigenicity and allergenicity analysis for the selected epitopes using Vaxijen 2.0 and AllerTOP v. 2.0 servers was repeated. The ToxinPred server (available at https://webs.iiitd.edu.in/raghava/toxinpred/algo.php) used to predict the toxic region in the selected epitopes.
Peptide designing and adjuvating For this purpose, by engineering the amino acid sequences of the selected epitopes, a peptide sequence has been constructed.
Some of the selected epitope sequences contain both the B-cell and T-cell epitopes that are named B-cellderived T-cell epitopes. This strategy for the epitope selection could lead to reducing the length of the nal amino acid sequence. Lysine-lysine (KK) cross-linkers connected selected epitopes. Usually, protein adjuvants were used to increase the probability of exposing the peptide vaccines. Therefore, the CTXB amino acid sequence was added to the epitope sequences using Proline-rich (PAPAP) rigid linker.

Structural analysis
The secondary structure of the designed vaccine has been predicted using the GOR4 secondary structure prediction method of the Prabi server (available at https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl? page=/NPSA/npsa_gor4.html). In the next step, the 3D structure of the protein has been predicted by using the I-TASSER server (https://zhanglab.ccmb.med.umich.edu) [28,29]. The prediction of the 3D structure of the proposed vaccine is required to validate the a nity of binding to the immune system elements in docking studies.

Re nement and molecular docking of the candidate peptide
The re nement process of the predicted model has performed using the GalaxyWEB server (available at galaxy.seoklab.org/) to reduce possible structural faults [30,31]. Moreover, the geometric features of the proposed vaccine validated based on the Ramachandran plot by using the PROCHECK server (available at https://servicesn.mbi.ucla.edu/PROCHECK/) [32]. The molecular docking process was conducted on using the Cluspro 2.0 protein-protein docking server (https://cluspro.org/help.php) to ensure the interaction between receptor molecules TLR3, TLR4, MHC I and MHC II and the re ned model of candidate protein as a ligand. The antibody-mode of the Cluspro 2.0 server was used for the docking process [33][34][35][36].
Back translation, codon optimization, and in-silico cloning of the candidate protein The Snap-Gene 3.2.1 software performed the back-translation of the designed peptide into the nucleotide sequence. Also, to optimize the rate of protein expression in Escherichia coli, the JCat online server (http://www.jcat.de/) was utilized [37]. To have the correct translation, the open reading frame (ORF) has checked by using Snap-Gene 3.2.1 software. Finally, in silico restriction cloning was performed using Snap-Gene 3.2.1 software. To this aim, the poly-histidine-tag sequence was fused to the proposed vaccine sequence and then inserted into the multiple cloning site of the PET21 expression vector.

Protein antigen selection
Firstly, protein sequences have aligned with conserved and selected regions. Then, chosen proteins compared with each other in terms of antigenicity, allergenicity, and toxicity. Finally, ve chosen proteins had high levels of antigenicity and were non-allergen (see Table 1).

Linear B-cell and T-cell epitope prediction
To predict the linear B-cell and T-cell epitopes IEDB, and BepiPred2.0 servers have used, and similar epitopes were selected ( Table 2). Brie y, MHC I and MHC II-restricted epitopes were predicted and ranked according to the IEDB scores ( Table 3). Some of the B-cell epitopes have derived from T-cell epitopes to reduce the nal length of the sequence. Finally, epitopes with the highest antigenicity score and lack of allergenicity and toxicity were selected to construct the peptide vaccine.

Peptide designing and adjuvating
For both B-cell and T-cell, thirteen epitopes were selected and then linked together using lysine-lysine (KK) linkers. To increase the immunogenicity of the designed vaccine, an adjuvant sequence was added to the peptide sequence. Various peptide adjuvants were used in previous studies [38]. The amino acid sequence of cholera toxin B subunit (CTXB) that is the non-toxic portion of cholera toxin was added into the initial part of the peptide and connected to the epitopes by PAPAP rigid linker (Fig. 2).

Physicochemical properties of constructed peptide
Physicochemical properties and amino acid composition of the constructed peptide evaluated using Pepcalc and Protparam servers. The results showed that the proposed vaccine was stable, water-soluble, with a molecular weight of 39.4 kDa. The calculated pI value was 9.6, the net charge was 17.3, and the estimated half-life of the protein in mammalian, yeast, and E. coli cells were 30, 20, and 10 hours, respectively (Fig. 3A). Also, the analysis of protein sequence stability was performed through predicting the protein disorder regions (by using Iupred 2A server), and the results con rmed the stability of the designed peptide (Figs 3B and 3C). The Iupred 2A server presents three types of analysis; IUPred2 long disorder, IUPred2 short disorder, and IUPred2 structured domains. In this study, the IUPred2 long disorder mode has selected for investigation. Protein disorders of the proposed vaccine were predicted by Iupred 2.0 designed graph. Due to the designed graph that showed in Fig. 3C, the sequence of the proposed vaccine has not a great chance to be as an anchor for binding to the registered structures in the IUPred2A server. In the presented graph, residues with a score above and below 0.5 are considered as protein disorders and protein orders, respectively. Secondary and tertiary structure of the constructed peptide The prediction of the secondary structure of the proposed vaccine performed using the GOR 4 method of the Prabi server (available at https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl? page=/NPSA/npsa_gor4.html). As Fig. 4 shows, the secondary structure of the proposed vaccine contains alpha-helix (28.57%), extended strand (19.76%), and random coil (51.67 %). Using the I-TASSER server predicted the tertiary structure of the peptide. This server suggests various models for the input sequence and the quality of prediction models re ected on the form of c-scores (-5 to 2). The higher values of the C-score relate to the higher con dence levels for the predicted model (Fig. 5).
Model re nement and molecular docking of the vaccine candidate peptide The re nement process has performed for the predicted model using the GalaxyWeb server. In this process, the server re ned the secondary structure elements like; loop regions and side chains based on several factors containing similarity score (GDT-HA), clash score, RMSD, and MolProbity. The global distance test (GDT_TS) is a measure of similarity between two protein structures with known amino acid correspondences with a different tertiary structure. The GDT-HA is a high accuracy version of GDT_TS which selects smaller cut-off distances that were half of the size of GDT_TS and thus is more rigorous. The root-mean-square deviation (RMSD) is the average distance between backbone atoms in the protein structure. The MolProbity score re ects the crystallographic resolution. A structure with a numerically lower MolProbity score than its actual crystallographic resolution is, quality-wise, better than the average structure at that resolution. This Table presented the GalaxyWeb server as a result of the re nement process for the 3D predicted model of the proposed vaccine. Table 4 shows the ve suggested re ned models. The rst presented model with GDT-HA and RMSD of 0.8951 and 0.542 selected for further considerations. However, the clash score for the chosen model was 33.6, reported score for the initial model was about 24.1. The Rama favored scores was the other score that indicates the percentage of the residues in the most favored regions of the Ramachandran plot. This score changed from 64.2 to 84.4, simultaneously for the re ned model in comparison to the initial model. Moreover, the geometric quality of the re ned model has evaluated using the Ramachandran plot by the PROCHECK server. The quality of the predicted model was investigated before re nement and after the re nement process. Fig. 6 A shows the main Ramachandran plot for the 3D model before the re ning process. Also, the Ramachandran plot for the re ned model has shown in Fig. 6 B. The Ramachandran plot results of the initial structure of the designed vaccine included 59.1% in most favored regions while these proportions in the re ned model were 78.7% that con rmed the re ning process.  (Table 5). Also, this server predicted a 3D structure for the docked molecules models (Fig. 7).
Back translation, codon optimization, and in silico cloning the candidate protein While the bioinformatic and biochemical analysis was done on the constructed peptide vaccine, the amino acid sequence should be back-translated into nucleotide sequence and then inserted into an expression vector for expression in the bacterial system or other expression systems. For this goal, at rst, the nal amino acid sequence of peptide vaccine converted to nucleotide sequence using Snap-Gene 3.2.1 o ine software. The nucleotide sequence was then optimized for codon usage in E. coli using the jcat server. For the next step, the restriction enzyme recognition site and polyhistidine tag added to the optimized nucleotide sequence. Also, the ORF frames for considering correct protein expression performed using the Snapp-Gene 3.2.1 software. Finally, the cloning of the nucleotide sequence into the PET 21 expression vector has simulated by using Snapp-Gene 3.2.1 o ine software (Fig. 8).

Discussion
The SARS-CoV-2 virus and its associated disease COVID-19 have become a crucial world health concern resulted in a signi cant mortality. On the other hand, there are no approved vaccines against COVID-19, yet, and these conditions resist until a speci c medicine or a vaccine gets approved. Therefore, researchers started to discover drugs and vaccines by using bioinformatics methods. Some of the studies reported new vaccines and medical drugs for the COVID-19 infectious disease that some of them were in the clinical trial phases [39].
There are several methods to produce new vaccines like; live-attenuated form, subunit vaccines, multiepitope vaccines, and DNA vaccines [40,41]. The reverse vaccinology approach is a relatively new method to nd a successful and safe vaccine in a relatively short time [42]. Main immunoinformatics approaches have improved this method by creating several databases and algorithms for epitope prediction, which increases the speed and accuracy of the vaccine development [40]. Nosrati and his colleagues have identi ed the rst epitope-based vaccine against Crimean-Congo hemorrhagic fever virus based on the reverse vaccinology approach. Their designed peptide vaccine showed good stability, watersolubility, non-allergenicity, and highly antigenic activity [43]. Ahmad et al. (2020) suggested a novel vaccine for the COVID-19 virus using molecular docking and molecular dynamics of interaction with TLR3 and TLR 4 molecules [44].
In this study, we used immunoinformatics to design an appropriate vaccine for the newfound coronavirus. We chose both immunogen B and T-cell related epitopes from different proteins of the SARS-CoV-2 virus using the epitope prediction methods of the IEDB server. In the same way, Bhattacharya et al.
(2020) developed an epitope-based vaccine against the novel coronavirus (SARS-COV-2) using the immunoinformatics approach. They identi ed potential B and T cell epitopes of the spike glycoprotein of the coronavirus. The nal vaccine construction contains thirteen MHC I and three MHC II related epitopes that were connected by the EAAAK linker [45].
As is shown in Fig. 2, our designed vaccine contains CTXB, as a natural adjuvant, and various B and T-cell epitopes that were connected by linkers. The biochemical and structural properties of the proposed vaccine include amino acid composition, secondary and tertiary structure, stability, and half-life of the constructed peptide was considered using several servers. Also, the prediction of protein stability and protein disorders has performed using the Iupred 2.0 server [46]. Protein disorders are functional domains or protein sequence segments that have not had a highly populated stable secondary and tertiary structure under physiological conditions. These sequences do not have a stable three-dimensional structure, so they may lead to protein instability [47,48]. Our analyses con rmed the stability of the designed chimeric protein by predicting the half-life for the sequence.
Generally, viral surface proteins like glycoproteins play a crucial role in virus fusion to the host cells. On the other hand, these proteins are usually more mutable [49][50][51][52]. Therefore, in this study, we selected conserved proteins with the highest antigenicity levels regardless of their functions. The prediction of Tcell and B-cell epitopes is a crucial step to achieve an immunodominant vaccine. Consequently, in this study, IEDB and Bipred2.0 servers were used to predict linear B-cell epitopes, and thus similarly resulted in selected epitopes. Also, to predict appropriate T-cell epitopes (with high antigenicity and nonallergenicity), the IEDB server used and the most frequently identi ed epitopes that interact with most of MHC I and MHC II alleles have selected (Table 3) [25,53].
The results of the molecular docking process con rmed the immune reaction between the antigen and selected antibodies. All receptors can get an immune interaction with the proposed vaccine (Table 5).
Sanami and colleagues have suggested a multi-epitope vaccine against the human coronavirus 2. In that study, the spike protein of the virus has targeted, and HTL, CTL, and B-cell epitopes were selected using the ProPred-1 and ABCpred servers. They have used ClusPro 2.0 server for the docking studies of the designed vaccine and HLA-I and HLA-II molecules which resulted in the lowest energy weighted score of -1165.4 and -1279.1 Kcal/mol, respectively [54]. According to Table 4 and Fig. 8, the results of our molecular docking process con rmed the immune reaction between the antigen and selected antibodies. As Table 4 shows, related receptors interacted with the proposed vaccine. In this regard, using the Clus pro 2.0 server, TLR 3 molecule showed the best interaction comparing to other receptors with the lowest energy of -602.1 kcal.mol-1 in antibody-mode of Cluspro2.0 server [33]. In this server, the antibody-mode was different from the balanced-mode. Brenke et al. (2012) removed the usual assumption of symmetry using asymmetric pairwise potential. Thus, each atom on the antibody molecule is different from the corresponding atom on the antigen protein [33].

Conclusion
The process of development of a vaccine is a time-consuming and costly process that involves bioinformatics and experimental approaches. Therefore, an epitope-based vaccine that is designed by immunoinformatics methods can be considered as a practical approach for designing and producing a novel vaccine against this virus.
The results of this in silico study con rmed that the designed vaccine that had high antigenicity and stability could be a proper vaccine candidate against the COVID-19 disease. Consequently, to analyze the immune-dominant features, the proposed vaccine should be considered for further in vitro and in vivo studies to investigate the potential immunogenicity.

Declarations
Authors' contributions A.S.N., designed research, performed research, analyzed data, and drafted, M.T., designed research, analyzed data, wrote and revised the draft, F. Sh., performed research, and analyzed data, A. R. T., performed research, and analyzed data, and A. H., performed research, and analyzed data.

Compliance with Ethical statement
This article does not contain any studies involving animals or human participants performed by any of the authors. Table 3 T-cell epitopes binding to MHC I and MHC II. Antigenicity and allergenicity of the most repeated epitopes were analyzed using the vaxijen 2.0 and the AllerTop servers, respectively. The non-allergen epitopes with the most antigenicity score were selected   Schematic representation of the designed peptide vaccine. The designed sequence consisted of four components including CTXB as an adjuvant, PAPAP, and KK as linkers, B-cell, and T-cell related epitopes.

Figure 3
Page 17/21 A) Physicochemical properties of the constructed peptide by the PepCalc server. Analyses showed that the net charge of the peptide is 17.3, and the pI value is about 9.62. B) Analysis of the amino acid composition and estimation of the half-life of the protein using the Protparam database. C) The stability of protein was analyzed using Iupred 2a by studying the protein disorders of the peptide sequence. In this graph, the blue line indicates the ANCHOR2, and the red line indicates the IUPred2. According to the red line on the presented graph, the results con rmed the stability of the protein Figure 4 Secondary structure of the peptide vaccine predicted by the Prabi server and the GOR IV method. The length of the peptide is 326 amino acids. The secondary structure includes alpha-helix (28.57%), extended strand (19.76%), and random coil (51.67%) Figure 5 The predicted tertiary structure of the constructed vaccine by the I-TASSER server Figure 6 Ramachandran plot for constructed peptide after re nement. A) Ramachandran plot for the initial model that showed 59.1% of residues located in most favored regions and 31.2% were located in additional allowed regions. Also, 301 residues from a total of 329 residues were non-glycine and non-proline. B) The Ramachandran plot for the re ned model showed 78.7% of residues located in the most favored regions. In this regard, the designated additional allowed, generously allowed, and disallowed regions residues were 15.9%, 2.3%, and 3.0%, respectively Docking results for the predicted 3D structure. A) 3D structure after docking the peptide vaccine and MHC I, B) MHC II, C), TLR3, and D) TLR4