Prophylactic domain-based vaccine against SARS-CoV-2, causative agent of COVID-19 pandemic

Coronavirus disease 2019 (COVID-19) is undoubtedly the most challenging pandemic in the current century with more than 253,381 deaths worldwide since its emergence in late 2019 (updated 2020). COVID-19 is caused by a novel emerged coronavirus named as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Today, the world needs crucially to develop a prophylactic vaccine scheme for such emerged and emerging infectious pathogens. In this study, we have targeted spike (S) glycoprotein, as an important surface antigen of SARS-CoV-2, to identify its immunodominant B- and T-cell epitopes. We have conducted a multi-method B-cell epitope (BCE) prediction approach using different predictor algorithms to discover most potential BCEs. Besides, we sought among a pool of MHC class I and II-associated peptide binders provided by the IEDB server through the strict cut-off values. To design a broad-coverage vaccine, we carried out a population coverage analysis for a set of candidate T-cell epitopes and based on the HLA allele frequency in the top most-affected countries by COVID-19 (update 02 April 2020). The nal determined B- and T-cell epitopes were mapped on the S glycoprotein sequence, and three potential hub regions covering the largest number of overlapping epitopes were identied for the vaccine designing (I 531 –N 711 ; T 717 –C 877 ; and V 883 –E 973 ). Here, we have designed two domain-based constructs to be produced and delivered through the recombinant protein- and gene-based approaches, including (i) an adjuvanted domain-based protein vaccine construct (DPVC), and (ii) a self-amplifying mRNA vaccine (SAMV) construct. The safety, stability, and immunogenicity of the DPVC were validated using the integrated sequential (i.e. allergenicity, autoimmunity, and physicochemical features) and structural (i.e. molecular docking between the vaccine and human Toll-like receptors (TLRs) 4 and 5) analysis. The stability of the docked complexes was evaluated using the molecular dynamics (MD) simulations. These rigorous in silico validations supported the potential of the DPVC and SAMV to promote both innate and specic immune responses in the animal studies. the rst time, we designed two domain-based vaccine constructs against SARS-CoV-2 based on the two different vaccine production and delivery platforms including, (i) a recombinant protein vaccine, and (ii) a self-amplifying mRNA vaccine. We believe that the results of this study can be a step ahead in the vaccine development campaign against SARS-CoV-2. The methods used for identication of the hub residue fragments of S glycoprotein were conducted based on the rational data ltering and also the precise multi-method analyses of various immunological datasets. The sequential and structural analysis of the DPVC showed that the vaccine is stable, safe, and immunogenic. In this context, these constructs are our urgent ongoing project to monitor the vaccine's potential to trigger properly both innate and specic B- and T-cell immune responses in animal models. Altogether, we have considered comprehensive key factors in the prediction of epitopes and the designing of both the DPVC and SAMV to ensure the proposed vaccines can induce both innate and pathogen-specic immune responses. As a result, we proposed the designed vaccines are as promising vaccines against SARS-CoV-2 after being further examined through accelerated animal studies and clinical trials. The whole-genome reference sequence of SARS-CoV-2 was retrieved from the National Center for Biotechnology Information (NCBI) genome database (accession no. NC_045512). The reference protein sequence of spike protein (accession no. YP_009724390.1) in FASTA format was used for BLAST against non-redundant protein sequences (nr) database through the blastp (protein-protein BLAST) algorithm. The FASTA sequence of 100 spike protein of different countries and different dates of isolation with signicant alignments (identity ≥ 75.80% and E-value 0.0) were taken and multiple-sequence-alignment was carried out using the MUSCLE program of MEGA v10.0 software 16,62 . The aligned sequences were then analyzed to nd the best substitution model of amino acid evolution using MEGA 10 software. The phylogenetic tree of the protein S dataset was inferred by using the Maximum Likelihood (ML) method and JTT matrix-based model 63 and via bootstraps replications of 1000 64 . The putative spike protein isolated from Zaria Bat coronavirus (GenBank: ADY17911.1) was served as an outgroup. 16.94% sites). Initial tree(s) for the statistical were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then, the selecting of the topology with the superior log-likelihood value. The clades correspond to the different isolates of SARS-CoVs are collapsed in front of the triangle for better presentation. B) The Shannon entropy plot of different isolates of the protein S. The entropy (Hx) values ranged between 0.0 and 1.0, where the values more than 1.0 are related to the diverse residues.


Introduction
Despite notable progress in medical sciences during the 20 th century, still, infectious diseases have signi cant consequences on the public health systems worldwide. Of these, emerging infectious diseases (EIDs) and re-emerging infectious diseases (RIDs) are always considered as striking threats to humans all around the world 1 . The majority of such infectious diseases are zoonotic and mostly originated from animals, including severe acute respiratory syndrome coronavirus (SARS-Cov), in uenza A virus subtype H1N1, middle east respiratory syndrome coronavirus (MERS-CoV), Ebola, and Zika virus.
Today, the world is confronting a novel coronavirus so o cially named SARS-CoV-2, and World Health Organization (WHO) has named its relevant disease as "Coronavirus disease 2019 (COVID-19)". The rst known SARS-CoV-2 was discovered in late December 2019 in Wuhan, Hubei Province, China. Since then, it has become a global pandemic, in large part due to its rapid rate of human-to-human transmission, lack of vaccine, and delay in global functional protocols 2 . The infection of SARS-CoV-2 can lead to some Different features of the SARS-CoV-2 genome are categorized and presented in Table S1 (Supplementary Information). To further assay the phylogenetic relationship between the SARS-CoV-2 genome and all other strains of CoVs, as shown in Fig. 2A, we built an evolutionary tree with the highest log likelihood (-11665.31). According to the phylogenetic analysis, among all known CoVs, the bat coronavirus RaTG13 (Accession no. QHR63300.2) showed the closest relation to the recent emergent human coronavirus (HCoV).
Identi cation of spike glycoprotein conserved domain(s) and region(s) The conserved and variable regions of the spike glycoprotein among the hundred CoV strains are shown based on the Shannon entropy plot (Fig. 2B). The most variable residues have entropy (Hx) values more than 1.0. According to the NCBI-CDD's output, there are two domain hits in the glycoprotein S sequence, including (i) a large polypeptide (CoV S2 protein, residues from 662 to 1270), and (ii) spike receptorbinding domain (residues from 331 to 583) that mediates the a nity binding of the virus to angiotensinconverting enzyme 2 (ACE2). The conserved regions have a higher probability to be as a part of functional domains of the protein, however, epitope escape mutations may be also a potential consequence to the emergence of such zoonotic EREI viruses.

Signal sequence and subcellular localization
The consensus prediction method of the TOPCONS tool showed the rst 21 amino acids act probably as a signal sequence. Moreover, the result of TOPCONS showed the residues 1217-1237 and 1238-1273 were signi cantly predicted as transmembrane (TM) and cytoplasmic domains, respectively ( Fig. S1A; Supplementary Information). The CCTOP web-server predicts a signal peptide (1 to 14), one membranespanning region (residues 1216-1241), and one intracellular domain (residues 1242-1273) at the Cterminus (Fig. S1B). The TMHMM server predicts one transmembrane helix region from residues 1214 to 1236, and the residues from 1273 to 1273 were predicted as intracellular (Fig. S1C).
The secondary and tertiary structure of glycoprotein S, MD simulation, and validation The secondary structure of the spike protein consists of 21.52% (274) alpha helix, 22.07% (281) extended strand, and 56.4% (718) random coil (Fig. S2). Further, the tertiary structure of spike protein was analyzed. During the MD simulations, the root-mean-square deviation (RMSD) values of the 3D model reached constant values. The backbone dihedral angles (psi/phi), local and overall model quality plots of the energy minimized 3D model are shown in Fig. S3. The analysis of the 3D model by VERIFY3D revealed that 80.28% of the residues had an average 3D-1D score above 0.2. Furthermore, the ERRAT score of the model was 78.67% whereas the accepted range of the ERRAT score is above 50%, and a higher score speci es a better quality. These quality assessment methods express the high quality of the 3D model, and it can be reliable for the identi cation of potential conformational B-cell epitopes.
Pro ling B-cell immunodominant regions of spike glycoprotein of SARS-CoV-2 A preventive vaccine with heightened B-cell defense against SARS-CoV-2 infections should ideally have a high potential to induce neutralizing antibodies. In this regard, the prediction of B-cell epitopes through a multi-method analysis can provide a piece of authentic information about the B-cell immunodominant regions of such less-known infectious agent 17 . Here, we have created a consensus plot of spike glycoprotein amino acid sequence, which can understandably pro le the B-cell immunodominant regions and also potential B-cell epitope of spike glycoprotein of the virus. The B-cell immunodominant regions of the S glycoprotein and its superimposed version with sequence entropy-variability plot is indicated in Fig.  3A and B.
The 3D structure of the homology modeled glycoprotein S (Fig. 4A) was re ned using the GalaxyRe ne online web-server. The re ned model was energy minimized and its poor molecular contacts were excluded through an MD simulation (Fig. 4B). In total, 28 immunodominant B-cell peptides were predicted. All the predicted peptides are located on the accessible surface of the S glycoprotein (Fig. 4C). Therefore, those peptides, which have the highest prediction score, were selected for the vaccine design (Table 1). Besides, the reference sequences of the S glycoproteins of SARS-CoV (accession ID: NP_828851.1) and SARS-CoV-2 (accession ID: YP_009724390.1) were used for pairwise sequence alignment, and the nal predicted B-cell epitopes were marked for comparison with the experimentallydetermined SARS-CoV-derived B-cell epitopes 18 (Fig. S4; Table S2).

Prediction of SARS-CoV-2 T-cell epitopes i. Cytotoxic T-cell epitope
The IEDB server predicted a list of 2529 unique peptides of S glycoprotein binding to the 27 alleles of HLA class A and B as a raw data (Table S3). Of these, the consensus rank (CR) score of the peptide binders which had percentile rank ≤ 1.0, and ANN-and SMM-based IC 50 ≤ 50.0 were calculated. In this approach, peptide binders were sorted and then selected based on (i) their rank in terms of percentile rank, ANN-IC 50, and SMM-IC 50 measures (e.g., consensus rank or CR), and (ii) number of the HLA alleles that are covered by these binders. As a result of the CR score-based screening, we plotted the most dominant peptide binders for both HLA-A and B alleles in Fig. 5A. The CR scores allow screening a subset of binder hits covering a large range of the human population. The most potent peptide binders of the SARS-CoV-2 spike glycoprotein sequence corresponding to the HLA-A and B alleles are shown in Fig. 5A.
The raw output table of IEDB's was contained 132195 peptides of different length binding to HLA-DRB alleles (Table S4). The predicted HLA-II peptide binders were ltered using a strict threshold (adjusted rank ≤ 1.0) to choose all the top-scoring peptides for each speci c HLA-II allele. Of these, 24 most immunodominant peptides were chosen for more analysis (Fig. 5B). The potentially effective CD4 + T-cell epitopes were selected based on the population coverage of each peptide and also the number of covered HLA-II alleles.

Population coverage of T-cell epitopes
According to the announcement of the world health organization on March 12 th , 2020, the COVID-19 outbreak was characterized as a pandemic, indicating that vaccinologists may confront with the broadspectrum immunophenotypes that can complicate the vaccine design and development 20 . Therefore, in this study, we provided a list of most potent peptide binders associated with most frequent MHC alleles to design a broad coverage vaccine construct. The population coverage of the most potent T-cell epitopes in the countries that are impacted the most by SARS-CoV-2 is reported in Table S5.
i. Selection the most dominant CD8 + T-cell epitopes Among the pool of CD8 + T-cell peptide binders we sought to found the most potent regions of S glycoprotein as the CTL epitope. Generally, we found 16 epitope sequences with the highest binding a nity to a maximum number of the most frequent HLA-I alleles. The most dominant predicted CD8 + Tcell epitopes were selected based on their CR score, MHC allele coverage, and percentage of population coverage ( Table 2). As presented in Table 2, the average population coverage for the eleven of the best CD8 + T-cell epitopes and their corresponding HLA-alleles were observed between 36.48% for the "SGWTAGAAAYYV" and 79.05% for the "GYLQPRTFLLKY" peptides. For details of the results of population coverage analysis of each of 16 predicted CD8 + T-cell epitopes in the most-affected countries by COVID-19, readers are directed to see Table S6. ii. Selection of nal CD4 + T-cell epitopes The selected HLA class II binders contain the most frequently occurring amino acids that have the highest capacity to attach different MHC class II alleles (Table 3). Thereupon, they might have good potential to elicit effective cellular immunity in most human populations. The detailed results of population coverage analysis for all 16 predicted CD4 + T-cell epitopes in the most-affected countries by COVID-19 are presented in Table S7.  i. An adjuvanted DPVC, which needs to be produced, expressed, and puri ed in vitro, and injected subcutaneously.
ii. A self-adjuvanted SAMV construct, which needs to be synthesized, produced as in vitro transcription process, delivered employing a designated non-viral delivery system such as liposomal nanoformulation, administrated intramuscularly, and expressed in situ.
i. The recombinant DPVC In this platform, we designed an adjuvanted vaccine construct with a full-length of 984 amino acid residues. The different components of the vaccine are schematically represented in Fig. 6A. The result of PSIPRED web-server showed among 984 amino acids, 257 (26.12%), 204 (20.73%), and 523 (53.15%) amino acids are involved in α-helix, extended strand, and random coil, respectively. The map of the predicted secondary structure is shown in Fig. S5. The 3D structure of the MD-re ned vaccine model is represented in Fig. 6B.
The C-and TM-scores, and RMSD of the initially modeled vaccine by the I-TASSER were calculated as -2.63, 0.41±0.14, and 13.6±3.1Å, respectively. The C-score is usually ranged from -5 to 2, where the Cscore of higher values implies a model with higher con dence 21 . The TM-score and RMSD, as the standard metrics, are measured based on the C-score following the correlation observed between these qualities 22 . The TM-score threshold is independent of the size of proteins and values more than 0.5 are relevant to the correct model topology.
The energy level of the homology 3D modeled vaccine was minimized through the MD simulations for 50 ns to improve structural stability. The RMSD trajectory graph of the MD optimized vaccine model is shown in Fig. 6C. The RMSD of the structure reached to 3.2Å after 5ns and remained approximately stable until the end of simulations. This observation indicated the model expansion during the simulation and that the simulation duration was long enough to obtain an equilibrium structure for the constructed vaccine. Consequently, the extracted equilibrium structure at 310K was used for the subsequent evaluation of the vaccine-receptor binding a nity and interactions.
The backbone torsion angles (psi/phi) of the vaccine model and its overall quality before (i.e., initially modeled vaccine) and after MD simulation were analyzed based on the validation plots obtained from the PROCHECK (Fig. S7). The energy minimized vaccine model showed that 710 of all residues (82.8%) were in the most favored regions of the Ramachandran plot. Whereas in the initial DPVC model only 399 of residues (46.4%) were in these regions (Fig. S7). The comparison assessments showed that the MDminimized vaccine model can be reliable to predict the binding a nity between the vaccine and TLRs 4 and 5.
Vaccine safety, antigenicity, stability, and solubility Based on the result of both AlgPred and AllerTOP web-servers, the DPVC have no allergenic nature. The NCBI protein-protein BLAST against Homo sapiens showed the DPVC has no sequence similarity with the human proteome. This implies that the candidate vaccine should not trigger the autoimmune responses in the human body but activate the desired speci c immunogenic reactions. The VaxiJen antigenicity score for the DPVC was 0.5097 indicating it as a probable antigen.
The molecular weight of the vaccine obtained from the ProtParam tool was about 105 kDa. The theoretical isoelectric point (pI) was calculated to be 5. 95 showing the vaccine is slightly neural. The total numbers of positively and negatively charged residues were computed to be 81 and 91, respectively. The extinction-coe cient was 83660 M -1 cm -1 at 280 nm measured in water, which means all Cys residues are reduced. The half-life of the vaccine construct in mammalian reticulocytes was estimated 30 hours (in vitro), more than 20 hours in yeast (in vivo), and more than 10 hours in Escherichia coli (in vivo) obtained by ProtParam tool. The computed instability index (II) classi ed the vaccine construct as a stable protein with a score of 28.47. The aliphatic index and GRAVY were calculated to be 80.50, and -0.296, respectively. These measures indicate that the vaccine construct is highly thermostable and also hydrophilic. The safe, immunogenic, and stable nature of the designed vaccine makes it a good candidate for more structural analysis.

Vaccine adjuvanticity and molecular docking simulations
The protein-protein molecular docking between the MD-optimized DPVC and the immune receptors (TLR4 and TLR5) was performed using the ClusPro v2.0 tool (Fig. 7). The best docked-complexes with the lowest energy scores were -1350.3 Kcal/mol, and -1369.5 Kcal/mol, for vaccine-TLR4, and vaccine-TLR5 complexes, respectively. The binding energies of the docked complexes were measured in the form of coe cient wattage using the formula E=0.40E rep +-0.40E att +600E elec +1.00E DARS in the Balanced model 23 .
The complexes with the highest binding a nities were subjected to the MD simulations by the GROMCAS software to survey their conformational stability (Fig. 7). The simulations were carried out in a 10 Å cubic box containing water molecules at 310K. The protein solvation was done using the spc216 template. The charges on the proteins were neutralized based on the Varlet cut-off scheme. Then, the system was subjected to the energy minimization using 1500 steps of steepest descent. The geometrical quality of the Cα backbone conformation was investigated using the root mean square deviation (RMSD) that is produced during MD simulation. According to the RMSD plots ( Fig. 7), both docked complexes are stable mostly during the simulation. Based on the RMSD plot of the vaccine-TLR4 complex ( and TNF-α 24 . The H-bonds and hydrophobic interactions between the immune receptors (i.e., TLR4 and TLR5) and the DPVC are represented as a two-dimensional graph in Fig. 8.
Having capitalized on the in vivo cleavable linker (PPGVS) between the PADRE sequence and intramolecular adjuvants, it is expected to have a high level of either TLR-dependent innate immunity by the in vivo cleaved intramolecular adjuvants (FlgC and RS09) and S glycoprotein domains, and also the adaptive immune responses by PADRE sequence and SARS-CoV-2 S glycoprotein domains.
ii. The self-amplifying mRNA (replicon) vaccine construct In this approach, we designed a SAMV construct using the genes encoding the non-structural proteins

Discussion
Today, the sudden emergence with the quick spread of the novel zoonotic infectious agent, SARS-CoV2 ( Fig. 1), has led to a serious pandemic. Currently, several vaccine research teams in several countries are working to design, develop, and formulate an e cient prophylactic vaccine/adjuvant 2,27-29 . However, the conventional vaccine platforms against such a high transmissible and less-known infectious agent is extremely a time-consuming and risky task. Accordingly, among different vaccine platforms, selfamplifying mRNA vaccines as the next generation of mRNA vaccines provide a cost-effective and timee cient strategy for the development of vaccines compared to the traditional methods. Conducting a rapid vaccine engineering approach during such a viral pandemic may need three important preliminary research steps, including (i) viral genome sequencing, (ii) bioinformatics and data analysis, and (iii) designing a gene-based vaccine construct. Under these circumstances, computational modeling and simulation methods can assist the vaccinologists to extrapolate close to real biological evidence for designing a promising recombinant vaccine with high accuracy, least cost, and minimal time 17,30,31 . The in silico vaccinology, as a synergistic strategy is mainly based on (i) discovering of candidate vaccine antigens through the computer-aided data analysis approaches (e.g., reverse vaccinology) 32,33 , and (ii) identi cation of immunodominant epitopes by applying an immunoinformatics pipeline [34][35][36][37] .
In this context, along with releasing multiple whole-genome sequences of SARS-CoV-2 together with our previous experience in designing and developing an epitope-based recombinant vaccine against Echinococcus granulosus through a comprehensive eld trial(s) (National Patent number: 100538; IPC: C12R 32/1;A61P 00/33;C12N 00/15), we designed two domain-based vaccine constructs based on the two different vaccine production and delivery platforms (i.e. recombinant protein vaccine, and selfreplicating mRNA vaccine) as candidate prophylactic treatment against COVID-19. In this line, we used the reference sequence of SARS-CoV-2 spike glycoprotein (accession ID: YP_009724390.1) to rationally design the vaccines. First, to nd out the virus origin and its conserved/variable regions, we carried out a multiple sequence alignment and also phylogenetic analysis based on all the sequenced spike glycoprotein of SARS-related CoVs. According to our phylogenetic analysis, SARS-CoV-2 has a close genetic similarity to the bat-derived CoVs (Fig. 2). A previous analysis using the haplotype network analysis announced that SARS-CoV-2 has emerged (or maybe emerging) due to the high frequently recurrent genetic recombination especially in the receptor-binding domain (RBD) of spike glycoprotein 38 .
Theoretically, this natural occurrence has been likely affected in the virus transmissibility and pathogenicity through multiple amino acid alterations than SARS-CoV 39 . Based on the sequence variability analysis presented in the Shannon entropy plot (Fig. 2B), the RBD was found to be highly variable among different SARS-related CoVs. Tai  Despite these homology-based methodologies for epitope mapping, we believe that an emerged virus may develop sparse peculiar epitopes. Especially, in the variable residues of the spike antigen, emerging probable neo-epitopes may render different physicochemical features to form a stable complex with paratope site of antibodies and also binding groove of speci c HLA molecules. At this stage, the prediction of SARS-CoV-2 epitopes by monitoring its homolog viruses (i.e. SARS-related CoVs) seems to be a reliable method for conserved epitopes. By the same token, we computed the S glycoprotein sequence based on a multi-method B-cell epitope prediction approach through various machine learning and physicochemical algorithms to nd out the hub regions (not exact epitope sequence) with high potential for B-cell immune responses (Fig. 3A). Then, through the rigid cut-off value (≥ 0.6) we identi ed a list of n=11 most immunodominant B-cell epitopes ( Table 1). As shown in the 3D structure of the spike protein, these immunodominant regions are in the surface accessible areas of the protein (Fig. 4).
The currently developed methods for the T-cell epitope prediction are as a shortcut in epitope discovery; however, antigen processing and presentation in antigen-presenting cells (APCs) are followed through several complicated pathways. The T-cell epitope prediction servers specialized to provide widely dispersed dominant peptide binders with different lengths in a queried protein. Moreover, It is known that many of the cleaved peptides that are translocated into the endoplasmic reticulum (ER) have lengths of more than 8-10 amino acids, and some residues will be removed during processing by ER aminopeptidases 44,45 . The structural studies veri ed that there are many different mechanisms whereby a long peptide binder originated from either structural and nonstructural antigens can proceed into the APCs, attached, and presented by MHC class I and II molecules [46][47][48][49] . Currently, there is a lack of knowledge about the binding con guration/mechanism of SARS-CoV-2 epitopes and that how they make stable MHC-peptide complexes. In this regard, we used of online predictor tool IEDB to map potential high-rank T-cell peptide binders based on the reference set of HLA alleles covering > 97% (HLA-I) and > 99% (HLA-II) of the global population. For selection candidate CD8 + T-cell epitopes we de ned a consensus ranking score (CR) to nd out peptide binders with lowest CR score with highest HLA allele coverage (Fig. 5A). For prediction the most potential CD4 + T-cell binders, we selected peptide fragments with the lowest adjusted percentile rank (Fig. 5B). The nal T-cell epitopes were chosen based on the population coverage result of each predicted peptide fragment (Tables 2 and 3).
Having considered the scaled map indicated in Fig. S6, three hub domains of the spike glycoprotein covering the largest number of the best overlapping B-and T-cell epitopes were selected for the designing of the DVC (Fig. 6). Despite the high consistency between our predicted epitopes and the recently reported epitopes 27,28 , we decided to target immunodominant domains of spike glycoprotein for vaccine designing, in large part due to the uncertainty about the exact sequence of B-and T-cell epitopes in different studies. This strategy allowed us to have the optimal B-and T-cell epitopes through the natural humoral and cellular adaptive immune tra cking and APC-based proteolytic processing systems in the human body. We have joined the RS09 and S. typhimurium FlgC fragments at the N-terminal of the vaccine construct using an in vivo cleavable linker (Fig. 6A). The RS09 and FlgC are agonists for TLR4 and TLR5, respectively. RS09 is an LPS peptide mimicking entity that can bind to TLR4 and stimulate it, resulting in the subsequent activation of NF-κB signaling pathways and secretion of chemokines 50 . FlgC is the structural unit of the bacterial agellum, which can interact with TLR5-expressing cells (e.g., monocytes, neutrophils, DCs, lymphocytes, and macrophages) as an agonist of TLR5 51,52 . Some studies reported the synergistic effects of the TLR4 and 5 signaling pathways; therefore, the use of FlgC might modulate initial innate and then the subsequent adaptive immune responses 51,53 . We have validated the interaction of vaccine construct with the TLR4 and TLR5 using molecular docking and then molecular dynamics simulations (Figs. 7 and 8). Of note, as a strength, the self-amplifying mRNA vaccines (Fig. 9) have a high self-adjuvanted nature and both the endosomal and cytosolic RNA sensors (e.g., TLRs 3, 7, 8 and retinoic acid-inducible gene-I (RIG-I) receptors, respectively). Once delivered using an appropriate targeted vaccine delivery system, it can be recognized and internalized by the APCs where the viralderived agents can trigger the innate immune signaling cascades (Fig. 10) 54 .
The Pan-DR epitope (i.e. PADRE sequence), a 13-mer synthetic T helper epitope, was also used to elicit more e cient adaptive immune responses. It is demonstrated that the linear PADRE epitope in conjugation with the carbohydrate B-cell epitope can stimulate speci c IgG antibody 55 . The PADRE sequence was added between the RS09 and spike glycoprotein's domains using the intracellular cleavable linker to facilitate its independent processing and presentation by APCs.
To produce the designed recombinant protein vaccine in a lab setting, a suitable expression host such as microalgae can be used to express the recombinant vaccine with the optimal post-translational modi cations 56,57 . In the case of SAMV construct, although both the non-viral delivery systems (e.g., lipid nanoparticles 58 , polymeric nanoparticles 59 , and cell-penetrating peptides 60 ), and in vivo transfection systems (e.g., injection, electroporation, and gene gun) can improve the stability and cellular uptake e cacy, however, the naked SAM vaccine can be taken up as well by signi cantly antigen-presenting cells without any additional required formulation 61 .

Conclusion
Having capitalized on bioinformatics tools in the current study, for the rst time, we designed two domain-based vaccine constructs against SARS-CoV-2 based on the two different vaccine production and delivery platforms including, (i) a recombinant protein vaccine, and (ii) a self-amplifying mRNA vaccine.
We believe that the results of this study can be a step ahead in the vaccine development campaign against SARS-CoV-2. The methods used for identi cation of the hub residue fragments of S glycoprotein were conducted based on the rational data ltering and also the precise multi-method analyses of various immunological datasets. The sequential and structural analysis of the DPVC showed that the vaccine is stable, safe, and immunogenic. In this context, these constructs are our urgent ongoing project to monitor Secondary and tertiary structure prediction of S glycoprotein The secondary structure of S protein was predicted employing the PSIPRED web-server 70 . The 3D structure of S protein was homology modeled using the SWISS-MODEL online tool 71  reported crystal structures in Protein Data Bank (6LVN, 6LXT, 6VSB, 6VXX, and 6VYB).

Structure re nement, molecular dynamics simulation, and validation
To re ne the 3D model for the hydrogen bonds and overall structural relaxation, it was subjected to the GalaxyRe ne server processing 72 . To optimize the model's free energy, the re ned model was subjected to a molecular dynamics (MD) simulation recruiting GROMACS 5.0.7 software together with the GROMOS 96 force eld 73 . The MD simulation procedure was carried out at 310 K by placing the model into a cubic box that had a suitable size and two Na+ ions to neutralize the environment. Subsequently, the RMSD graph was drawn for the analysis of the dynamic behavior of the constructed model 74 . The local and overall quality of the improved 3D model was checked using online web-servers, including PROCHECK 75 , verify3D 76 , ERRAT 77 .
In silico B-cell epitope mapping: a multi-method approach The potential B-cell epitopes (BCEs) were predicted by using the sequence-and structure-based tools. To predict linear and conformational BCEs with high accuracy, we implemented a multi-method approach based on the different currently available online BCE prediction web-servers 17 . We exploited the physicochemical and machine learning methods such as all the predictor tools of the Immune Epitope was utilized to predict and map the potential discontinuous B-cell epitopes. The FASTA sequence of the protein was imported into the Excel program and any single amino acid was separated in a single cell as a set of consecutive cells using a user-de ned function named "AddSpace" (the Excel VBA code is shown in Table S8). The scores of each of the twenty-one prediction algorithms were normalized to have values between 0 and 1. Then, an average of all normalized scores for each residue was represented as a plot, in which the immunodominant regions of the S protein sequence were highlighted based on a strict threshold value of ≥ 0.6. For the residue-based comparison analysis of the nal predicted B-cell epitopes, the pairwise sequence alignment was implemented employing Clustal Omega web-server 87 between the reference sequences of the spike proteins of SARS-CoV (accession ID: NP_828851.1) and SARS-CoV-2 (accession ID: YP_009724390.1). All the experimentally-determined spike glycoprotein SARS-CoV-derived B-cell epitopes (Table S9)  T-cell epitope prediction SARS coronavirus-associated T-cell epitopes are almost all correlated to the HLA complex antigen recognition. However, the HLA alleles are highly polymorphic among populations and there is no entire screening system to clarify the possible association between the occurrence of SARS-CoV-2 and the susceptibility/resistance of various HLA alleles. Therefore, in such diseases, it is logical to use the reference sets of HLA alleles with the maximal population coverage. The T-cell epitope prediction was performed using the reference isolate of SARS-CoV-2, i.e., spike protein sequence (NCBI: YP_009724390.1). Due to utilizing a vast number of the human leukocyte antigen (HLA) alleles during the calculation of peptide-MHC binding, the predicted output table might be quite substantial. Therefore, the prediction of peptide binders for class I and II MHC molecules was carried out based on the strict cut-offs to give the more accurate and reliable peptide binders. To have a nal set of the epitope for vaccine designing, those candidate epitopes that displayed overlap for multiple alleles were selected.
i. CD8 + T-cell epitope prediction The cytotoxic T-lymphocyte (CTL) epitopes were predicted by utilizing the IEDB recommended v2.22  50 , and stabilized matrix method (SMM) IC 50 ). Then, the sorted binders were ltered based on an MHC binding a nity (IC 50 ) value of ≤ 50 nM, and the percentile rank of ≤ 1.0, as strict thresholds. In the end, we selected the best candidate peptide binders via de ning a ranking score, so-called "consensus rank" (CR). This latter score was calculated by the following equation: [CR = average rank of a mapped peptide binder/n], where, "n" refers to the total number of alleles that covered by a peptide binder.
Therefore, it provides a small list of candidate peptide binders not only they have the highest prediction rank but also can bind to a wide range of alleles.
ii. CD4 + T-cell epitope prediction To predict the most potential CD4 + helper T-cell epitopes, we used the IEDB recommended algorithm v2.22 (consensus approach) 91 based on the full HLA reference set that can cover > 99% of the global population 92 . The epitope length was speci ed on a variable-length option 12-18 that can cover 82.89% of epitope frequency. To generate a consensus list of CD4 + T cell epitopes, we selected the best peptides based on the adjusted percentile rank ≤ 1.0 (as a strict cut-off) and the number of MHC-II alleles covered by the candidate predicted peptide binders.
Population coverage for selection consensus T-cell epitopes HLA molecules are extremely polymorphic, thus using multiple peptides with various HLA binding speci cities will give more coverage of the population targeted by domain-based vaccines. Accordingly, in this study, we computed population coverage of the nal T cell epitopes using the allele frequency net database 93  . This peptide appears as the optimal cleavage site of matrix metalloproteinase-9 (MMP-9), which is a member of the metalloendopeptidase distributed in the human skin 99,100 . The PADRE sequence was linked to the main domain of the vaccine construct using the Cathepsin S cleavable linker (PMGLP). In the human skin, the protease activity of cathepsin S has the main role in the antigen presentation pathways mediated by MHC class II molecules [101][102][103] . The details of the designed DPVC are captioned in Fig. 6.
The second vaccine construct was designed as a self-amplifying mRNA (SAM) replicon vaccine. In this construct, we used the identi ed immunodominant regions of the glycoprotein S as a vaccine sequence.
Further, to have a SAM construct we used the genes encoding non-structural proteins (nsp) of Semliki Forest virus (NCBI reference sequence: NC_003215.1) as a genomic (+) single-strand RNA alphavirus 104 .
The nsp1-4 region can improve properly the mRNA capping, stability, translational e ciency, and can form properly the RNA-dependent RNA polymerase (PDRP) complex 13 . The SAMV construct was anked between the newly designed 5' and 3' untranslated regions (UTRs) named as NASAR 105 . NCA-7d, as the 5' untranslated region (UTR), and S27a+R3U, as the 3' UTR. We propose a newly developed CleanCap TM method (by TriLink BioTechnologies, US) with base analogs Adenosine and Uridine for the mRNA capping process (cap residue: m 7 G(5')ppp(5')(2'OMeA)pU). This 5'-capping, as a co-transcriptional capping technology, is specialized for the high e cient production of the SAMVs with naturally creating Cap 1 structure. The SAMV construct is schematically represented in Fig. 9.
Prediction of vaccine antigenicity, safety, and stability The antigenicity analysis was varied out using VaxiJen v2.0 server 106 . The potential allergenicity of the vaccine construct was evaluated in the AlgPred (using the hybrid method) 107  Structural simulation of the vaccine binding a nity The tertiary and secondary structure of the vaccine construct was predicted using the I-TASSER and the Garnier Osguthorpe and Robson (GOR) version IV online servers 110,111 . The highest quality 3D model was re ned through the GalaxyRe ne server 72 and then was executed for the energy minimization by the GROMACS 5.0.7 software package 73 . The structural quality of the optimized 3D model was validated using PROCHECK 75 web-server. The molecular docking was performed via ClusPro v2.0 online server 23 to assess the binding a nity between the DVC and extracellular regions of the human TLR4 (PDB ID: 4G8A), and TLR5 (PDB ID: 3J0A) molecules. The output of docking simulations was visualized and analyzed using the Chimera v1.14 112 and DIMPLOT schematic diagram of LigPlot + v2.2 25 , respectively.

Declarations Acknowledgments
The authors are very thankful to all the nurses, physicians, and every one of the workers in hospitals who have been being exposed to the SARS-CoV-2 infectious agent worldwide. Further, the authors are grateful to the Research Center for Pharmaceutical Nanotechnology (RCPN) at the Tabriz University of Medical Sciences (TUOMS) for the financial (Grant #65207) and technical support. This work has synchronically been applied to be patented.

Author Contributions Statement
The study protocol and research concept were designed by Y     was long enough to achieve convergence (or stability) for the protein.
C) The position of the ten dominant predicted B-cell epitopes. Owing to the lack of crystallized template protein, some residues in the beginning and end of the 3D model (1-27, and 1021-1273, respectively) are missed. The 3D structures were visualized by UCSF Chimera v1.14 software19.    Schematic representation of the different parts of the designed self-amplifying mRNA vaccine (SAMV). A) The designed SAMV consists of the genes encoding non-structural proteins (nsp1-4) of the Semliki Forest virus (NCBI reference sequence: NC_003215.1). The identi ed immunodominant regions of the glycoprotein S were used as vaccine sequences of interest. The nsp1-4 regions can form the RNAdependent RNA polymerase (RdRp) complex. The SAMV construct was anked between the 5' and 3' untranslated regions (NCA-7d, and S27a+R3U, respectively). A tail of 40-120 adenosine residues (Poly(A) tail) is inserted in the 3' end of the construct to improve the SAMV stability and functionality. B) The 5' end of the SAMV construct contains a cap 1 structure with base analogs AU for the mRNA capping process.

Figure 10
A schematic illustration of the intracellular processing of LNPs formulated the SAMV and the subsequent innate and pathogen-speci c immune responses. The in vitro transcribed SAMV vaccine is formulated as a targeted VDS, which is internalized by the antigen-presenting cells through receptor-mediated endocytosis (1). The targeted VDS is escaped from the endosomal compartment, and the initial endosomal RNA sensing by TLRs (mainly TLRs 3, 7, and 8) is activated (2). Upon SAMV endosomal escape, two main pathways of innate and adaptive immune responses can be activated (3). In the innate immune responses, steps 4' to 7' can occur. Both the SAMV construct and the initial endosomal RNA sensing system activate the secondary RNA sensing system which is induced by cytosolic pathogen recognition receptors and then results in the production of type I interferons (INFα/β) (4', 5'). INFs are secreted (6'). The regulatory impacts of INFs are imposed on T-cell activity pathways (7'). In the Adaptive immune responses, steps 4-9 can occur. The in vivo translation of SAMV construct, the formation of RDRP complex, and the beginning a self-replication machinery for enhancement the protein yield occur (4). The newly produced recombinant proteins have three possible destinies (5). First, protein is released to the extracellular space and its TLR agonists (i.e. RS09, and FlgC) can activate both TLRs 4 and 5, respectively (6). Second, protein is degraded by proteasomes to the small peptide fragments (7). The peptide fragments are processed by the endoplasmic reticulum (8). The MHC class I-epitope complexes are presented on the cell surface (9). Third, the peptides enter the proteolytic endosomes (A) to form the MHC class II-epitope complexes (B) and to be presented on the cell surface (C