Immunoinformatics Analysis and In-silico Design of a Multi-Epitope Vaccine Against Lassa Virus

Lassa virus, an arenavirus, is the most prevalent human pathogen causing viral haemorrhagic fever. It is endemic in Nigeria and other West African countries. Despite the high burden of the disease, treatment options are limited and no vaccine has been approved for its prevention. In this study, an immunoinformatics approach was used to predict B- and T-cell epitopes from the Lassa virus proteome (GPC, NP, L and Z). The designed chimeric vaccine was modeled, refined, validated and docked with the RIG-I receptor. The docked vaccine–RIG-I complex was subjected to molecular dynamics simulation, and the results suggest that the complex is stable. The final vaccine construct was further validated through in silico cloning using E. coli as the host. A CAI value of 0.99 suggests that the vaccine construct would be expressed efficiently in the host. Immune simulation predicted significantly high levels of IgG1, T-helper cells, T-cytotoxic cells, IFN-γ and IL-2. This theoretical study suggests that the vaccine could support infection control by establishing effective immunological memory against Lassa virus infection. However, both in vitro and in vivo experiments are needed to validate the immunogenicity and safety of the chimeric vaccine.


Introduction
Lassa virus, an arenavirus named after the village of Lassa in north-eastern Nigeria, is estimated to infect over 300 000 individuals per year across West Africa and to be responsible for over 3000 deaths 1 . Lassa fever is endemic to West African countries such as Nigeria, Sierra Leone, Liberia, Benin, Ghana, Guinea and Mali 2 . From January to April 2020, over 963 confirmed cases and 188 deaths were reported by the Nigeria Centre for Disease Control, with the majority of confirmed cases in Edo and Ondo States 3 . Lassa virus was first discovered in 1969 4 and is maintained by persistent infection of its reservoir host, the rodent Mastomys natalensis 5 . Human infection is thought to occur through inhalation of aerosolized contaminated rodent excreta and consumption of contaminated food 6 . The virus is transmitted from person to person through contact with infected bodily fluids [7][8][9][10] .
Arenaviruses are enveloped, negative-strand RNA viruses 11 whose genome consists of two segments. The small segment encodes the glycoprotein precursor (GPC), which is expressed on the viral envelope in a trimeric state 12 , and, in the opposite orientation, the nucleoprotein (NP), which encapsidates the viral genome. The large segment encodes the viral matrix protein (Z) and the viral RNA-dependent RNA polymerase (L) 5 . Post-translational processing of GPC produces a stable signal peptide (SSP), the N-terminal GP1 and the transmembrane GP2 13 . GP1 binds to cellular receptors and is important for the cell-to-cell spread of infection, while GP2 mediates viral fusion and structurally resembles class I viral fusion proteins 11 . Lassa virus enters the host cell through receptor-mediated endocytosis and is subsequently transported to endosomal compartments, where fusion occurs at low pH 14,15 .
Dystroglycan (DG), the first Lassa virus receptor to be discovered, is a ubiquitously expressed, conserved cellular receptor for extracellular matrix proteins 16,17 and is found in most mammalian tissues 11 . DG is produced as a single polypeptide chain that undergoes autoprocessing to yield the peripheral α-DG, which serves as a receptor for Lassa virus, lymphocytic choriomeningitis virus and clade C New World arenaviruses [18][19][20] . The transmembrane β-DG plays important roles in linking the extracellular matrix to the cytoskeleton and acts as a cell adhesion receptor in both muscle and non-muscle tissues 21 . Activation of Lassa virus-specific T cells is believed to play a vital role during infection 22,23 , as a strong T-cell response to Lassa virus is critical for protection against Lassa fever 5,24,25 . Despite the high disease burden in the endemic area, there are no prophylactic measures for Lassa virus. Although progress has been made in identifying promising preclinical candidates, most of them have failed in clinical trials. The lack of an approved vaccine and the limited treatment options make the development of novel therapeutic strategies against Lassa virus urgent.
Experimental methods for vaccine design are challenging and time-consuming; thus, immunoinformatics has emerged as a new tool for identifying target antigens for vaccine development [26][27][28] . An immunoinformatics approach can narrow down a vast number of potential molecules to be tested, thereby increasing the chance of finding better candidates. Epitope-based vaccines have been reported to be effective in eliciting protective immunity against influenza A 29 , hepatitis B 30 and C 31 viruses, Leishmania 32 , Shigella 33 and Mayaro virus 34 .
Multi-epitope vaccines have been proposed as an ideal approach for the prevention and treatment of tumors and viral infections [35][36][37][38][39] . Multi-epitope vaccines are composed of multiple MHC-restricted epitopes, including cytotoxic T lymphocyte (CTL), helper T lymphocyte (Th) and B-cell epitopes, that can be recognized by TCRs of multiple clones from various T-cell subsets 38 . This allows them to induce strong cellular and humoral immune responses simultaneously. Combined with an adjuvant, multi-epitope vaccines can enhance immunogenicity, provide long-lasting immune responses and reduce unwanted components that can trigger pathological immune responses or adverse effects [38][39][40][41][42][43] . This study uses an immunoinformatics-driven screening approach and proteomic data for Lassa virus to design a multi-epitope vaccine that may elicit protective humoral and cellular immune responses against Lassa virus.

Methodology
Lassa virus proteome retrieval
The complete amino acid sequences of the Lassa virus glycoprotein precursor (E9K9S4), nucleoprotein (A0A097F4S7), viral matrix protein (A0A5J6BM51) and viral RNA polymerase (A0A2P1JNX2) were retrieved from the UniProt database (https://www.uniprot.org/) in standard FASTA format 34 . The overall workflow is presented in Fig. 1.

Screening of virulence factor
The Vaxign-ML pipeline (http://www.violinet.org/vaxign2) was applied to compute the protective antigenicity (protegenicity) score of each candidate protein and to predict its subcellular localization, transmembrane helices and adhesin probability 44 . Vaxign-ML predicts the protegenicity score with an optimized supervised machine-learning model trained on manually annotated data consisting of bacterial and viral protective antigens. The method has been validated through nested five-fold cross-validation and leave-one-pathogen-out validation, to ensure unbiased performance assessment and to test its ability to predict vaccine candidates against pathogens 44,45 .

Cytotoxic T lymphocyte (CTL) epitope prediction
CTL epitopes in the Lassa virus proteins were predicted using the online web tool NetCTL v2.0 (http://www.cbs.dtu.dk/services/NetCTL/). The prediction combines MHC-I binding, proteasomal C-terminal cleavage and the transport efficiency of the transporter associated with antigen processing (TAP). MHC-I binding and proteasomal C-terminal cleavage are predicted with artificial neural networks, while TAP transport efficiency is calculated with a weight matrix. The threshold for CTL epitope prediction was set at the default value of 0.75 46,47 . To further assess whether the predicted epitopes, once presented on MHC molecules, are likely to be recognized by T cells and to activate their effector functions, the predicted CTL epitopes were submitted to the IEDB Class I Immunogenicity server (http://tools.iedb.org/immunogenicity/) with default settings to select the most immunogenic epitopes. This tool has been validated only for 9-mer peptides 48 .

Helper T lymphocytes (HTL) epitope prediction
HTL epitopes for seven human alleles (HLA-DRB1*03:01, HLA-DRB1*07:01, HLA-DRB1*15:01, HLA-DRB3*01:01, HLA-DRB3*02:02, HLA-DRB4*01:01 and HLA-DRB5*01:01) were predicted from the Lassa virus proteins using the MHC-II prediction module of IEDB (http://tools.iedb.org/mhcii/). The affinity of each peptide for each allele is expressed as a predicted IC50 value: peptides with high binding affinity have IC50 < 50 nM, whereas IC50 values < 500 nM and < 5000 nM indicate intermediate and low binding affinity, respectively. The percentile rank is inversely related to binding affinity, i.e., the lower the percentile rank, the higher the binding affinity 49 . To assess whether the predicted HTL epitopes can activate a Th1-type immune response, with interferon-gamma (IFN-γ) production by MHC class II-activated CD4+ T helper cells, they were submitted to the IFNepitope server (http://crdd.osdd.net/raghava/ifnepitope/index.php) using the Predict option, with the motif and SVM hybrid approach and the IFN-gamma versus other cytokine model 50 . IFNs mostly play a protective role against infectious diseases and minimize host damage [51][52][53] .
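The IC50 bins and the percentile-rank convention described above can be made explicit in a few lines. The following is a minimal Python sketch, assuming the IC50 values (in nM) and percentile ranks have already been exported from the IEDB MHC-II tool; the `Prediction` record, the helper names, the peptides and their scores are all hypothetical, and the "non-binder" label for IC50 ≥ 5000 nM is our assumption rather than a category defined in the text.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    peptide: str            # hypothetical 15-mer peptide
    allele: str
    ic50_nm: float          # predicted IC50 in nM
    percentile_rank: float  # lower rank = stronger predicted binder


def affinity_class(ic50_nm: float) -> str:
    """Bin an IC50 value using the thresholds quoted in the text."""
    if ic50_nm < 50:
        return "high"
    if ic50_nm < 500:
        return "intermediate"
    if ic50_nm < 5000:
        return "low"
    return "non-binder"  # assumption: label for IC50 >= 5000 nM


def rank_by_percentile(preds: List[Prediction]) -> List[Prediction]:
    """Sort so the strongest predicted binders (lowest percentile rank) come first."""
    return sorted(preds, key=lambda p: p.percentile_rank)


# Placeholder values only; these are not results from this study.
preds = [
    Prediction("PEPTIDEAAAAAAAA", "HLA-DRB1*07:01", 420.0, 3.1),
    Prediction("PEPTIDEGGGGGGGG", "HLA-DRB1*03:01", 35.0, 0.4),
    Prediction("PEPTIDEKKKKKKKK", "HLA-DRB5*01:01", 7200.0, 18.5),
]
for p in rank_by_percentile(preds):
    print(p.peptide, p.allele, affinity_class(p.ic50_nm), p.percentile_rank)
```

Sorting on percentile rank rather than raw IC50 follows the convention stated above, since percentile ranks are comparable across alleles while raw IC50 values are not normalized per allele.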

Allergenicity prediction
The allergenicity of the predicted CTL epitopes, HTL epitopes and the constructed vaccine sequence was predicted using AlgPred (http://www.imtech.res.in/raghava/algpred/), a server for the prediction of allergenic proteins and the mapping of IgE epitopes on allergenic proteins. This approach achieves an accuracy of about 85% at a threshold of 0.4, and the server combines six different approaches for allergenicity prediction 54 .

Antigenicity prediction
The antigenicity of the predicted CTL epitopes, HTL epitopes and constructed vaccine sequence was predicted with high accuracy using the VaxiJen server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) at a threshold of 0.4, with virus selected as the target organism. The server predicts the antigenicity of a given amino acid sequence solely from its physicochemical properties 47 .

Toxicity prediction
The non-toxic nature of the selected CTL and HTL epitopes was assessed using ToxinPred (http://crdd.osdd.net/raghava/toxinpred/), an online tool that predicts the toxicity of peptides based on their physicochemical properties 55 .

B cell epitope prediction
B-cell epitopes are antigenic determinants that are recognized and bound by receptors on the surface of B lymphocytes, and they play a central role in host antibody production. The ABCpred server (https://webs.iiitd.edu.in/raghava/abcpred/index.html) was used to predict linear B-cell epitopes in the Lassa virus proteome using an artificial neural network. It was the first server developed based on a recurrent neural network (a machine-learning technique) and achieves 65.9% accuracy using fixed-length patterns 56 . Maximum accuracy was obtained using a recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units for a window length of 16.
In addition, the ElliPro online server (http://tools.iedb.org/ellipro/) was used to predict conformational B-cell epitopes (default threshold 0.5, default maximum distance 6 Å). ElliPro is based on Thornton's method together with a residue clustering algorithm, and it assigns a protrusion index (PI) score to each predicted epitope.
ElliPro approximates the 3D shape of the protein with several ellipsoids, and the PI value of each residue is derived from the position of its centre of mass relative to these ellipsoids; residues with larger PI values are more protruding and have better solvent accessibility 57 .

Construction of multi-epitope vaccine sequence
A vaccine sequence was constructed from the CTL, HTL and linear B-cell epitopes predicted by the immunoinformatics approaches described above. The CTL, HTL and B-cell epitopes were joined using AAY, GPGPG and KK 58 linkers, respectively, and the adjuvant was attached via an EAAAK linker 34,59 ; the sequence was assembled in Notepad++ version 7.8 (Fig. 2).

Physicochemical properties and domain identification
Physicochemical properties of the final vaccine construct, including amino acid composition, theoretical pI, molecular weight, instability index, half-life (in vitro and in vivo), aliphatic index and grand average of hydropathicity (GRAVY), were evaluated using the ProtParam online tool (http://web.expasy.org/protparam/) 60,61 . Additionally, the solubility of the vaccine construct upon overexpression in E. coli was predicted with the SOLpro tool (http://scratch.proteomics.ics.uci.edu/) in the SCRATCH suite 61,62 .
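The linker scheme described for the vaccine construct can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the exact construct: the peptides are placeholders, the function names are hypothetical, and the way the CTL, HTL and B-cell blocks are joined to one another is not specified in the text, so the sketch simply reuses the linker of the following block at each junction.

```python
from typing import List


def join_with_linker(epitopes: List[str], linker: str) -> str:
    """Concatenate epitopes with the given linker between consecutive ones."""
    return linker.join(epitopes)


def assemble_construct(adjuvant: str, ctl: List[str], htl: List[str], bcell: List[str],
                       adj_linker: str = "EAAAK", ctl_linker: str = "AAY",
                       htl_linker: str = "GPGPG", b_linker: str = "KK") -> str:
    """Hypothetical layout: adjuvant-EAAAK-[CTL-AAY-...]-GPGPG-[HTL-GPGPG-...]-KK-[B-KK-...].

    The junctions between the three epitope blocks are an assumption: the
    linker of the following block is reused at each junction.
    """
    parts = [adjuvant, adj_linker, join_with_linker(ctl, ctl_linker)]
    if htl:
        parts += [htl_linker, join_with_linker(htl, htl_linker)]
    if bcell:
        parts += [b_linker, join_with_linker(bcell, b_linker)]
    return "".join(parts)


# Placeholder peptides only; these are NOT the epitopes selected in this study.
construct = assemble_construct(
    adjuvant="MKTAYIAKQR",
    ctl=["YLNKIVTEV", "SLYKGVYEL"],
    htl=["FLKQLPAFEAGQQ", "AVKGDYISRDRLK"],
    bcell=["SGKTEPNRLGSNGDTP"],
)
print(len(construct), construct)
```

Keeping the linkers as parameters makes it easy to compare alternative junction designs before the sequence is passed to tools such as ProtParam or VaxiJen.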
Secondary structure prediction of the vaccine construct
PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/), a freely available web tool for highly accurate prediction of protein secondary structure from an amino acid sequence, was used to predict the secondary structure of the vaccine construct. PSIPRED 4.0 uses position-specific iterated BLAST (PSI-BLAST) to identify and select sequences showing significant homology to the vaccine sequence, and achieved an average Q3 score of 81.6% when its performance was assessed with a very stringent cross-validation procedure 63,64 .

Tertiary structure prediction of the vaccine construct
The free online server RaptorX (http://raptorx.uchicago.edu/) was used to predict the 3D structure of the vaccine sequence. The server provides automated tools for protein structure prediction and analysis; RaptorX also predicts secondary structure, contacts, solvent accessibility, disordered regions and binding sites for the input sequence 65 . The predicted 3D model was visualized with PyMOL.

Tertiary structure refinement of the vaccine construct
The 3D model obtained for the chimeric vaccine peptide was refined using the GalaxyRefine server (http://galaxy.seoklab.org/). The server uses the refinement method tested in CASP10, which rebuilds and repacks the protein's side chains and then relaxes the 3D structure through molecular dynamics simulations. GalaxyRefine is reported to be one of the best available servers for improving both the global and local quality of structures generated by current protein structure prediction tools 66,67 .

Tertiary Structure of Construct Vaccine Validation
Model validation is a critical step in the model-building process because it detects potential errors in predicted 3D models 64 ; here it was carried out with online servers in three stages. First, ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) calculated an overall quality score for the input structure and displayed it in the context of all known protein structures; problematic parts of the structure are also highlighted in a 3D molecule viewer. If the calculated score falls outside the range characteristic of native proteins, the structure likely contains errors 34,68 . Second, the SAVES server v6.0 (https://saves.mbi.ucla.edu/) generated an overall quality score for the modeled protein by analyzing non-bonded atom–atom interactions and comparing them with reliable high-resolution crystallographic structures 34,69 .
Third, PROCHECK analysis was carried out on the SAVES server (https://saves.mbi.ucla.edu/), generating a Ramachandran plot to visualize the energetically allowed and disallowed backbone dihedral angles phi (ϕ) and psi (ψ) of each amino acid. The allowed regions are calculated from the van der Waals radii of the side chains. The server validates the protein structure with a Ramachandran plot and produces separate plots for glycine and proline residues 70,71 .
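Because the Ramachandran plot is built from the backbone dihedral angles ϕ and ψ, a short sketch of how these angles are obtained from atomic coordinates may help. This assumes Cartesian coordinates (for example, read from the refined PDB model) are already available as 3-vectors; the helper names are hypothetical and the code is not the PROCHECK implementation. ϕ of residue i is the dihedral C(i−1)–N(i)–Cα(i)–C(i), and ψ is N(i)–Cα(i)–C(i)–N(i+1).

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) defined by four atom positions."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from its own and neighbouring backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Using `arctan2` rather than `arccos` keeps the sign of the angle, so the result falls in the −180° to 180° range plotted on a Ramachandran diagram.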

Molecular Docking of the Refined Vaccine Construct and RIG-I Receptor
Initiation of an appropriate immune response depends on the interaction between an antigenic molecule and a specific immune receptor. Molecular docking is a computational approach that models the interaction between a ligand and a receptor to predict a stable adduct, with a calculated score measuring the degree of binding 71,72 . Retinoic acid-inducible gene I (RIG-I)-like receptors (RLRs) can mediate strong proinflammatory responses against Lassa virus infection 73,74 . Hence, the 3D structure of the RIG-I receptor was retrieved from the Protein Data Bank (PDB ID: 2QFB) and used as the receptor 75 . The binding pockets of the RIG-I receptor were predicted using the CASTp server (http://sts.bioe.uic.edu/castp/calculation.html). In CASTp, voids are defined as buried, unfilled empty spaces inside proteins that, after removal of all heteroatoms, are inaccessible from the outside to water molecules (modelled as a spherical probe of 1.4 Å), whereas pockets are concave cavities on the protein surface with constrictions at their openings. CASTp identifies and measures surface-accessible binding pockets and also provides information on inner inaccessible cavities of protein molecules 76 .
Molecular docking of the refined vaccine peptide with the RIG-I receptor was performed using the PatchDock server (http://bioinfo3d.cs.tau.ac.il/PatchDock/). Docking in PatchDock involves four basic stages: molecular shape representation, surface patch matching, filtering and scoring. PatchDock divides the surfaces of both input molecules into small patches according to their surface shape; once all the patches have been identified, they are superimposed using shape-matching algorithms 77 . Furthermore, the FireDock (Fast Interaction Refinement in Molecular Docking) web tool (http://bioinfo3d.cs.tau.ac.il/FireDock/) was used to refine and re-score the rigid-body docking solutions; it returns the ten best solutions, ranked by binding score, for final refinement 78 . PDBsum (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) and PDBePISA (https://www.ebi.ac.uk/pdbe/pisa/) were used to generate graphical representations of the interactions in the docked complex 79,80 .

Molecular Dynamics Simulation
Molecular dynamics (MD) simulation is essential for examining the stability of protein–protein complexes in silico. MD simulations of the final vaccine construct and the vaccine–RIG-I complex were performed using the GROMACS software (version 20.4) 81,82 . The CHARMM36 83 force field was used, with water as the solvent. Cubic and dodecahedral boxes were built for the vaccine construct and the vaccine–RIG-I complex, respectively, with a minimum distance of 1 Å between the protein surface and the box edges, and were solvated with the TIP3P water model 84 . After the systems were neutralized with Na+ and Cl− ions using the genion tool, energy minimization was carried out for all systems. The systems were equilibrated for 100 ps in the isothermal–isochoric (NVT) ensemble followed by 200 ps in the isothermal–isobaric (NPT) ensemble, at a constant temperature of 300 K and a pressure of 1 atm. Finally, MD simulations were run for 50 ns with a 2 fs time step for both the vaccine construct and the vaccine–RIG-I complex. Root mean square deviation (RMSD) and root mean square fluctuation (RMSF) analyses were performed to examine the deviation and fluctuation of the protein backbone, and the radius of gyration was calculated to assess the folding compactness of the vaccine construct.
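The three trajectory metrics named above reduce to simple formulas: RMSD = sqrt(mean over atoms of |x_i − x_i,ref|²), RMSF_i = sqrt(time average of |x_i(t) − ⟨x_i⟩|²), and the radius of gyration is the mass-weighted root-mean-square distance of atoms from the centre of mass. The sketch below assumes NumPy coordinate arrays that have already been least-squares fitted to a reference structure (GROMACS analysis tools typically perform such a fit first); it is illustrative only, does not reproduce GROMACS's implementation, and the function names are hypothetical.

```python
import numpy as np


def rmsd(frame: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two pre-fitted (N, 3) coordinate arrays."""
    diff = frame - ref
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMSF for a pre-fitted (T, N, 3) trajectory, about each atom's mean position."""
    mean_pos = traj.mean(axis=0)
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))


def radius_of_gyration(coords: np.ndarray, masses=None) -> float:
    """Mass-weighted radius of gyration of an (N, 3) structure (unit masses if none given)."""
    masses = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    com = (coords * masses[:, None]).sum(axis=0) / masses.sum()
    sq_dist = ((coords - com) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))


# Toy 2-frame, 2-atom trajectory in arbitrary units, for illustration only.
traj = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                 [[0.0, 0.2, 0.0], [1.0, 0.0, 0.2]]])
print([rmsd(f, traj[0]) for f in traj])
print(rmsf(traj))
print(radius_of_gyration(traj[0]))
```

RMSD is measured per frame against a reference, whereas RMSF is measured per atom across frames, which is why the former tracks overall drift and the latter highlights flexible loops.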

In silico Cloning Optimization of Vaccine Construct Candidate
Reverse translation and codon optimization were performed using the Java Codon Adaptation Tool (JCat) server (http://www.jcat.de/) to express the multi-epitope vaccine construct in a selected expression vector 85 . Codon optimization was performed for expression of the final vaccine construct in the E. coli (strain K12) host, because the codon usage of E. coli differs from that of Lassa virus, the native source of the vaccine sequence. Three additional options were selected to avoid rho-independent transcription terminators, prokaryotic ribosome binding sites and restriction enzyme cleavage sites. The JCat output includes the codon adaptation index (CAI) and the percentage GC content, which can be used to assess likely protein expression levels.
CAI provides information on codon usage bias; the ideal CAI score is 1.0, but a score > 0.8 is considered good 86 .
The optimal GC content of a sequence lies between 30% and 70%; GC content values outside this range may adversely affect transcriptional and translational efficiency 87 . To clone the optimized gene sequence of the final vaccine construct into the E. coli pET-30a (+) vector, NdeI and XhoI restriction sites were introduced at the N- and C-terminal ends of the sequence, respectively. Finally, the optimized sequence with restriction sites was inserted into the pET-30a (+) vector using the SnapGene tool to design the recombinant expression construct 71 .
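The two JCat metrics and the restriction-site step can be sketched as follows. CAI is computed here as the geometric mean of per-codon relative adaptiveness values w, the standard Sharp–Li definition; the small w table is hypothetical (a real analysis would derive it from the codon usage of highly expressed host genes), and all helper names are illustrative. The cloning helper only flanks the ORF with the NdeI (CATATG) and XhoI (CTCGAG) recognition sequences and rejects ORFs that contain either site internally; reading frame and vector-specific details are ignored.

```python
import math

NDEI_SITE = "CATATG"  # NdeI recognition sequence (CA^TATG)
XHOI_SITE = "CTCGAG"  # XhoI recognition sequence (C^TCGAG)

# Hypothetical relative-adaptiveness table (w = codon frequency divided by the
# frequency of the most used synonymous codon). Real values come from host data.
HYPOTHETICAL_W = {"GCG": 1.0, "GCC": 0.5, "AAA": 1.0, "AAG": 0.25}


def gc_content(dna: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def cai(dna: str, w: dict) -> float:
    """CAI as the geometric mean of w over in-frame codons present in the table."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def passes_expression_checks(dna: str, w: dict) -> bool:
    """Criteria quoted in the text: CAI > 0.8 and GC content within 30-70%."""
    return cai(dna, w) > 0.8 and 30.0 <= gc_content(dna) <= 70.0


def add_cloning_sites(orf: str) -> str:
    """Flank an ORF with NdeI (5') and XhoI (3') sites, rejecting internal sites."""
    orf = orf.upper()
    if NDEI_SITE in orf or XHOI_SITE in orf:
        raise ValueError("ORF contains an internal NdeI or XhoI site")
    return NDEI_SITE + orf + XHOI_SITE
```

Working in log space and exponentiating the mean is equivalent to the geometric mean but avoids underflow on long sequences such as a full vaccine ORF.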

Immune Simulation
To further characterize the immunogenicity and immune response profile of the vaccine construct, in silico immune simulations were conducted using the C-ImmSim server (http://150.146.2.1/C-IMMSIM/index.php). C-ImmSim is an agent-based model that uses a position-specific scoring matrix (PSSM) for immune epitope prediction and machine-learning techniques for the prediction of immune interactions. It "simultaneously simulates three compartments that represent three separate anatomical regions found in mammals: (i) the bone marrow, where hematopoietic stem cells are stimulated to produce new lymphoid and myeloid cells; (ii) the thymus, where naive T cells are selected to avoid autoimmunity; and (iii) a tertiary lymphatic organ, such as a lymph node" 88 . Three injections were given at intervals of four weeks 89 . All simulation parameters were set to default, with injections at time steps 1, 84 and 168 (each time step is 8 hours, and time step 1 is the injection at time = 0). Furthermore, 12 injections of the designed peptide were given four weeks apart to simulate the repeated exposure to the antigen expected in a typical endemic area and to probe clonal selection.
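The relation between time steps and the four-week interval is simple arithmetic: with 8 hours per step, four weeks is 4 × 7 × 24 / 8 = 84 steps, which is why the injections fall at steps 1, 84 and 168. The Python sketch below reproduces that schedule; the helper names are hypothetical, and extending it to 12 injections assumes the same spacing, since the exact steps used for the repeated-exposure run are not listed.

```python
HOURS_PER_STEP = 8  # one C-ImmSim time step = 8 hours, as stated in the text


def weeks_to_steps(weeks: int) -> int:
    """Number of 8-hour time steps in a given number of weeks."""
    return weeks * 7 * 24 // HOURS_PER_STEP


def steps_to_days(steps: int) -> float:
    """Convert a number of time steps to days."""
    return steps * HOURS_PER_STEP / 24


def injection_schedule(n_injections: int, interval_weeks: int = 4, first_step: int = 1):
    """Hypothetical helper: first dose at step 1, later doses every interval_weeks."""
    gap = weeks_to_steps(interval_weeks)
    return [first_step] + [gap * k for k in range(1, n_injections)]


print(injection_schedule(3))        # [1, 84, 168]
print(len(injection_schedule(12)))  # 12
```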

Retrieval of Protein Sequences
Amino acid sequences of four Lassa virus proteins (glycoprotein precursor, nucleoprotein, viral matrix protein and viral RNA polymerase) were retrieved from the UniProt database and used to predict B- and T-cell epitopes for designing the multi-epitope subunit vaccine. Human β-defensin 1 (UniProt P60022) was also retrieved and used as an adjuvant because of its ability to induce an antiviral immune response 90 .

Screening of Virulence Factor
Vaxign-ML analysis predicted four Lassa virus proteins (GPC, NP, Z and L) to be protective antigens with high protegenicity scores. The GPC and Z proteins were predicted to be adhesins (Table 1). Adhesins play a vital role in attaching the virus to the host cell and enabling viral entry 91 . None of the predicted proteins were similar to human, mouse or pig proteins.

Cytotoxic T Lymphocyte (CTL) Epitope Prediction
A total of 96 unique CTL epitopes (9-mers) were predicted from the four Lassa virus proteins using the NetCTL v2.0 server. Of these, only 42 epitopes were predicted to be antigenic, immunogenic and non-toxic, and 24 of them were also predicted to be non-allergenic (Supplementary Table 1). From these, the 12 non-allergenic epitopes with the highest immunogenicity scores were selected as CTL epitopes for vaccine construction (Table 2). Among the predicted HTL epitopes, only 23 were found to be IFN-γ-positive. From these, 13 non-overlapping epitopes were selected based on their percentile rank scores as HTL epitopes for vaccine construction (Table 3). The non-toxic nature of the 12 selected CTL and 13 selected HTL epitopes was confirmed using the ToxinPred server. The ToxinPred results for the 12 CTL epitopes are shown in Table 1, and those for the 13 HTL epitopes in Table 2. These epitopes are therefore predicted to be safe for use.
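The stepwise screening described above amounts to a filter followed by a ranking. The Python sketch below is illustrative only: the peptides and scores are placeholders, the record type and helper are hypothetical, and treating an IEDB immunogenicity score > 0 as "immunogenic" and a VaxiJen score ≥ 0.4 as "antigenic" (the threshold quoted in the Methods) are assumptions about how the cut-offs were applied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CtlCandidate:
    peptide: str
    vaxijen: float         # antigenicity score (VaxiJen)
    immunogenicity: float  # IEDB class I immunogenicity score
    toxic: bool            # ToxinPred call
    allergen: bool         # AlgPred call


def select_ctl(cands: List[CtlCandidate], n: int = 12,
               antigen_cutoff: float = 0.4) -> List[str]:
    """Keep antigenic, immunogenic, non-toxic, non-allergenic 9-mers; return the top n."""
    passing = [
        c for c in cands
        if len(c.peptide) == 9
        and c.vaxijen >= antigen_cutoff   # assumption: ">=" at the 0.4 cut-off
        and c.immunogenicity > 0          # assumption: positive score = immunogenic
        and not c.toxic
        and not c.allergen
    ]
    passing.sort(key=lambda c: c.immunogenicity, reverse=True)
    return [c.peptide for c in passing[:n]]


# Placeholder candidates only; not the epitopes or scores from this study.
cands = [
    CtlCandidate("YLNKIVTEV", 0.9, 0.30, False, False),
    CtlCandidate("SLYKGVYEL", 0.5, 0.10, False, False),
    CtlCandidate("KLMDTRNAV", 0.3, 0.50, False, False),   # fails antigenicity
    CtlCandidate("FQFEGKPAL", 0.8, 0.40, True, False),    # predicted toxic
    CtlCandidate("RVNGTFLGY", 0.7, 0.20, False, True),    # predicted allergen
    CtlCandidate("GPSLYKGVY", 0.6, -0.10, False, False),  # not immunogenic
]
print(select_ctl(cands))
```

Applying the hard filters before ranking means the immunogenicity score only orders candidates that already satisfy every safety and antigenicity criterion.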

Construction of multi-epitope subunit vaccine
The 12 selected CTL epitopes were connected by AAY linkers, the adjuvant protein (human β-defensin 1) was joined to the first CTL epitope using an EAAAK linker, and the 13 selected HTL epitopes were connected by GPGPG linkers to form the final vaccine construct. The final vaccine construct contained 494 amino acid residues; its structure is shown in Fig. 2.

B Cell Epitopes Prediction for Lassa Virus Protein
Among the linear B-cell epitopes predicted by the ABCpred server, six 16-mer epitopes that were predicted to be antigenic, non-allergenic and non-toxic were selected (Table 3). The ElliPro suite was used to predict conformational B-cell epitopes. A total of five discontinuous B-cell epitopes of varying length were predicted, of which three epitopes of 48, 47 and 55 residues, with scores of 0.721, 0.688 and 0.693 respectively, were evaluated (Fig. 3). Moreover, in the first chain of the discontinuous epitope, the start and end residues were methionine and glycine, with a residue score of 0.

Antigenicity of the Vaccine Construct
The VaxiJen server was used to predict the antigenicity of the final multi-epitope vaccine. The predicted score was 0.65, above the 0.4 threshold used for viral antigens, suggesting that the vaccine candidate possesses strong antigenic properties that may help provoke an immune response. The scores obtained for each CTL and HTL epitope are presented in Supplementary Tables 1 and 2.

Prediction of Physicochemical Parameters and Solubility
The physicochemical properties of the final vaccine construct obtained from the ProtParam server indicated a molecular weight of 65.2 kDa and a theoretical isoelectric point (pI) of 9.51, revealing that the vaccine construct is basic in nature. The estimated in vivo half-life in E. coli was greater than 10 hours, and the instability index was 38.9; because values below 40 indicate a stable protein, the construct is expected to remain stable after expression. The aliphatic index was 74.3, suggesting thermostability over a range of temperatures. The grand average of hydropathicity (GRAVY) was negative (−0.185), indicating that the protein is hydrophilic and thus likely to interact well with other proteins. The solubility predicted by the SOLpro server was 0.60 (Table 4).
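Two of the ProtParam indices reported above have closed-form definitions that are easy to check by hand: GRAVY is the mean Kyte–Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below implements those two formulas plus the conventional instability-index cut-off of 40; it is not ProtParam's code, and the helper names and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def is_stable(instability_index: float) -> bool:
    """ProtParam convention: an instability index below 40 predicts a stable protein."""
    return instability_index < 40.0


print(round(gravy("MKTAYIAKQR"), 3))       # hypothetical example sequence
print(round(aliphatic_index("MKTAYIAKQR"), 1))
print(is_stable(38.9))  # True, consistent with the value reported above
```

A negative GRAVY, as reported for the construct, simply means hydrophilic residues outweigh hydrophobic ones on the Kyte–Doolittle scale.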

Secondary Structure Prediction
The PSIPRED server was used to predict the secondary structure of the vaccine construct. The results revealed that the multi-epitope vaccine contains alpha helix (32.1%), beta strand (15.6%) and coil (52.3%) (Fig. 4). The multi-epitope vaccine also contains small nonpolar (43.7%), hydrophobic (19.8%), polar (21.9%) and aromatic plus cysteine (14.6%) residues. The RaptorX server predicted five tertiary structure models of the designed chimeric protein using a multi-template-based homology modelling approach. The five predicted models had estimated RMSD values ranging from 10.50 to 12.53 Å. The model with the best (lowest) estimated RMSD was selected for further refinement.

Tertiary Structure Model Refinement
Refinement of the 3D structure of the vaccine construct using the GalaxyRefine server generated five models. Based on the quality scores of all refined models, model 1 was selected as the best, with GDT-HA (0.897), RMSD (0.543), MolProbity (3.141), clash score (59.2), poor rotamers score (4.0) and Ramachandran favoured (93.2%). This model was chosen as the final vaccine model for further analysis (Fig. 5).

Tertiary Structure Model Validation
The quality of the 3D model of the vaccine construct, and potential errors in it, were assessed using ProSA-web and SAVES.
ProSA-web analysis of the refined model gave a Z-score of −8.27 (Fig. 6A), while SAVES reported an overall quality factor of 66.7%. Ramachandran plot analysis showed that 86.8% of residues lie in the most favoured regions and 12.4% in additional allowed regions, with 0.6% in generously allowed regions and only 0.2% in disallowed regions (Fig. 6B).

Molecular Docking of Refined Model Vaccine with Immune Receptor (RIG-I)
The CASTp server was used to identify protein binding and hydrophobic interaction sites on the receptor surface. The molecular surface area of the pocket was 3858 Å², with a molecular surface volume of 4679 Å³; the mouth molecular area was about 7014 Å², and the molecular circumference sum was 109 Å. To assess the interaction between the refined model and the RIG-I (PDB ID: 2QFB) immune receptor, molecular docking was performed using the PatchDock server. One hundred models were generated and scored based on their surface geometry and electrostatic complementarity. The top 10 complexes were submitted to the FireDock server for refinement and rescoring of the docking solutions of RIG-I and the multi-epitope vaccine candidate. The best docking solution was selected for further analysis based on the binding score. The selected complex of the multi-epitope vaccine candidate and RIG-I had a binding energy of −0.01 kcal/mol; the attractive van der Waals energy was −28.69 kcal/mol, the repulsive van der Waals energy was 21.42 kcal/mol, and the atomic contact energy was 8.17 kcal/mol. PDBePISA was used to analyse the interface of the docked complex, and ten hydrogen-bonded interaction sites were identified. Structural analysis of RIG-I and the multi-epitope vaccine revealed that Arg886 formed hydrogen bonds with Met172, Gln300 and Thr301 at distances of 2.58 Å, 3.80 Å and 3.80 Å respectively, while Tyr819 and Gly295 formed a hydrogen bond at 3.42 Å. Similarly, Thr899–Ser326 and Ile897–Ser326 formed hydrogen bonds at 3.14 Å and 2.09 Å, and Asp836–Arg288 and Glu840–Lys245 at 2.37 Å and 3.43 Å. Gln867–Arg171 formed a hydrogen bond at 3.19 Å and Ile897–Ala327 at 3.70 Å (see Fig. 7).

Molecular Dynamics Simulation of the RIG-I–Multi-Epitope Vaccine Complex
The RIG-I–final vaccine complex was subjected to molecular dynamics simulation to assess its stability and residue fluctuation. The RMSD and RMSF of this complex are presented in Fig. 8, while the RMSD and RMSF plots for the final vaccine construct are given in Supplementary Fig. 1. The average RMSD of the system over the 50 ns simulation was 0.88 Å. The side-chain RMSF indicated stability except at a few positions in the loop regions of the vaccine construct, which showed higher fluctuation. The radius of gyration was also within the acceptable range.

Codon Optimization and In silico Cloning of the Final Vaccine Construct
The Java Codon Adaptation Tool was used to optimize the codon usage of the vaccine construct in E. coli (strain K12) for maximal protein expression. The optimized codon sequence was 1,806 nucleotides long. The codon adaptation index (CAI) of the optimized nucleotide sequence was 0.99, and the average GC content of the adapted sequence was 50.7%, within the optimal range of 30% to 70%, indicating that the vaccine candidate should be expressed well in the E. coli host. Lastly, the recombinant plasmid sequence was designed using the SnapGene software by inserting the adapted codon sequence into the pET-30a (+) vector (Fig. 9).

In Silico Immune Simulation
The simulated immune response was consistent with actual immune responses (Fig. 10), with the secondary and tertiary responses being higher than the primary response. A high antigen count was observed during the primary response. In both the secondary and tertiary responses, the typically high levels of immunoglobulin activity (i.e., IgG1 + IgG2, IgM and IgG + IgM antibodies) were evident, together with a reduction in antigen levels (Fig. 10A). This indicates the development of immune memory and, consequently, increased antigen clearance upon subsequent exposure (Fig. 10E). Additionally, several long-lasting B-cell isotypes were observed, suggesting potential isotype switching and memory formation (Fig. 10B). A similarly elevated response was noted for the T-cytotoxic cell populations, with corresponding memory development (Fig. 10F). Continuous proliferation of dendritic cells was observed throughout the exposure (Fig. 10C), and high levels of IFN-γ and IL-2 were also apparent, with a low Simpson index (D) (Fig. 10D).

Discussion
Following Lassa virus infection in humans, about 80% of cases present as a mild, nonspecific illness, while 20% result in severe hemorrhagic fever with multi-organ failure 92 . The incubation period of Lassa fever is usually 1-3 weeks, and the disease is accompanied by fever, fatigue, hemorrhaging, gastrointestinal symptoms (vomiting, diarrhea, and stomachache), respiratory symptoms (cough, chest pain, and dyspnea), and neurologic symptoms (disorientation, seizures, and unconsciousness) 7 . Presently, treatment is limited to supportive care and the use of ribavirin, which reduces mortality if administered early 93 . Ribavirin use is restricted to high-risk patients because of its adverse effects and mixed efficacy 93 . Recently, attention has shifted towards the development of subunit vaccines because of their improved safety profiles and logistical feasibility 103 . With the immunoinformatics approach, multi-epitope vaccines offer a novel way to elicit a specific immune response while avoiding responses against other, unfavourable epitopes 104 . Other potential advantages of multi-epitope vaccines include better safety, the opportunity to rationally engineer the epitopes for increased potency, and the ability to focus immune responses on conserved epitopes 105 .
This work therefore focused on the in silico design and development of a potential multi-epitope antigen peptide against Lassa virus using four viral proteins retrieved from the UniProt database. It has been reported that immunity to Lassa virus is dependent on T cells 22,23 . Roles for antibody-dependent cell-mediated cytotoxicity 22,106 and the RIG-I pathway in immunity to Lassa virus have also been reported 5,107,108 . Using several servers, CTL and HTL epitopes were selected for the multi-epitope candidate based on their antigenicity, allergenicity, immunogenicity and toxicity. Additionally, only HTL epitopes predicted to induce cytokines such as interferon-gamma (IFN-γ) were chosen for the multi-epitope construct, as IFN-γ secretion has been shown to be an important mediator of protection against Lassa virus. A multi-epitope peptide was formed by joining the selected epitopes with appropriate linkers 109 . The AAY and GPGPG linkers 34,71 were placed between the predicted epitopes to minimize junctional immunogenicity, thereby allowing the rational design of a potent multi-epitope vaccine 109 . The EAAAK linker 110 was also introduced between the adjuvant sequence and the fused epitopes to promote a high level of expression and improved bioactivity of the fusion protein 71 . Adjuvants are used as supplements in vaccine formulations to elicit specific immune responses to antigens and to enhance the stability and longevity of the vaccine against infection 111 .
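The linker arrangement described above can be sketched as simple string assembly. The placement of AAY between CTL epitopes, and of GPGPG between HTL epitopes and between the two blocks, follows a common convention in multi-epitope designs and is an assumption here. All sequences below are hypothetical placeholders, not the adjuvant or epitopes selected in this study.

```python
EAAAK = "EAAAK"  # rigid linker between the adjuvant and the epitope block
AAY = "AAY"      # linker between CTL epitopes (assumed placement)
GPGPG = "GPGPG"  # linker between HTL epitopes and before the HTL block (assumed placement)


def build_construct(adjuvant, ctl_epitopes, htl_epitopes):
    """Join adjuvant, CTL block and HTL block into one chimeric sequence."""
    parts = [adjuvant, EAAAK, AAY.join(ctl_epitopes)]
    if htl_epitopes:
        parts += [GPGPG, GPGPG.join(htl_epitopes)]
    return "".join(parts)


# Hypothetical placeholder sequences for illustration only.
adjuvant = "MAKLEGHS"
ctl = ["KLMSTYVQL", "FPRSLLAEV"]
htl = ["LKNVYSFDEILKRTA"]
construct = build_construct(adjuvant, ctl, htl)
print(construct, len(construct))
```

Keeping the linkers as named constants makes it easy to test alternative arrangements before re-running the downstream antigenicity and structure predictions.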
The molecular weight (~65 kDa) of our vaccine candidate is within the typical range for a multi-epitope vaccine 34 . Solubility of the overexpressed recombinant protein in the E. coli host is a key requirement for many biochemical and functional analyses 64 . The vaccine construct was predicted to be soluble upon expression, suggesting that it would be readily accessible in the host. The theoretical pI value suggests that the vaccine is basic in nature, and the instability index indicates that it will remain stable after expression 112 , further supporting its potential use. The hydrophilicity and thermostability indicated by the GRAVY score and aliphatic index, respectively, make it well suited for use in endemic areas, most of which are in West Africa. Naturally unfolded protein regions and alpha-helical coiled-coil peptides have been reported to be important forms of structural antigens 113 , making knowledge of the secondary and tertiary structures of the target protein crucial to vaccine design 109 . Secondary structure analyses showed that the protein consisted predominantly of coils (52.3%). The Ramachandran plot of the refined 3D structure of the vaccine candidate shows that 86.8% of the residues are located within the favoured and allowed regions, with very few residues in the outlier region, demonstrating that the overall quality of the model is satisfactory 114 . Different structure validation tools were applied to identify errors in the modelled vaccine construct. The Z-score of −8.27 and the ERRAT quality factor of 86.5% from the SAVES server indicate that the overall structure of the refined vaccine is acceptable 34 .
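For reference, GRAVY is the mean Kyte-Doolittle hydropathy value over all residues, where negative values indicate a hydrophilic protein. The aliphatic index of Ikai (1980) is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X as mole percentages; higher values are associated with greater thermostability. The sketch below computes both for an arbitrary sequence. It mirrors the definitions used by tools such as ExPASy ProtParam but is not that tool's code.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


example = "EAAAKAAYGPGPG"  # linker residues only, used as an illustrative input
print(f"GRAVY = {gravy(example):.3f}, aliphatic index = {aliphatic_index(example):.1f}")
```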
Since RIG-I has a proven capability to recognize Lassa virus 74 , the vaccine construct was docked against this receptor, and the docking analysis suggested that the candidate vaccine has significant affinity for RIG-I, which acts as a sensor that recognizes pathogen molecular patterns and initiates an immune response 73 . Consequently, the complex of the vaccine construct with RIG-I may be capable of generating an effective innate and adaptive immune response against Lassa virus.
The molecular docking analysis and dynamics simulation established potential immune interaction and stability between RIG-I and the multi-epitope vaccine construct. Energy minimization, conducted to minimize the potential energy of the whole system, confirmed the conformational stabilization of the docked RIG-I-multi-epitope vaccine complex. Energy minimization corrects unfavourable structural geometry by iteratively adjusting the positions of individual atoms, making the structure more stable with suitable stereochemistry.
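As a toy illustration of what energy minimization does, the following sketch applies steepest descent to three atoms joined by harmonic bonds. Atoms are moved a small step against the energy gradient until the largest force component falls below a tolerance. Production minimizers use full force fields and adaptive step sizes, and the force constant, equilibrium bond length and step size here are arbitrary assumptions.

```python
import numpy as np


def bond_energy_and_gradient(x, bonds, k=100.0, r0=1.0):
    """Harmonic bond energy E = sum 0.5*k*(r - r0)^2 and its gradient w.r.t. positions."""
    energy = 0.0
    grad = np.zeros_like(x)
    for i, j in bonds:
        d = x[i] - x[j]
        r = np.linalg.norm(d)
        energy += 0.5 * k * (r - r0) ** 2
        g = k * (r - r0) * d / r  # dE/dx_i; dE/dx_j is its negative
        grad[i] += g
        grad[j] -= g
    return energy, grad


def steepest_descent(x, bonds, step=1e-3, force_tol=1e-6, max_steps=10_000):
    """Move atoms against the gradient until the largest force component is below force_tol."""
    x = x.copy()  # leave the caller's coordinates untouched
    for _ in range(max_steps):
        _, grad = bond_energy_and_gradient(x, bonds)
        if np.abs(grad).max() < force_tol:
            break
        x -= step * grad
    energy, _ = bond_energy_and_gradient(x, bonds)
    return x, energy


# Hypothetical strained geometry: first bond stretched, second compressed.
x0 = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 0.3, 0.0]])
bonds = [(0, 1), (1, 2)]
e0, _ = bond_energy_and_gradient(x0, bonds)
x_min, e_min = steepest_descent(x0, bonds)
print(f"energy before: {e0:.3f}, after: {e_min:.2e}")
```

The minimized structure keeps the same connectivity but relaxes both bonds toward their equilibrium length, which is the sense in which minimization removes unfavourable geometry.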
The derived RMSD and RMSF values indicate the stability of the complex 34 . The candidate vaccine construct still requires validation by screening for immunoreactivity through serological analysis 115 . This includes expression of the recombinant protein in Escherichia coli expression systems, which are well suited to the production of recombinant proteins 116,117 . Codon optimization, carried out to achieve high-level expression of our recombinant vaccine protein in E. coli (strain K12), showed that the codon adaptation index (0.99) and the GC content (54.9%) were favourable for high-level expression of the protein in bacteria. The immune simulation results were consistent with typical immune responses 112 . An increase in immune responses was observed following repeated exposure to the antigen. Memory B cells and T cells developed, with memory B cells persisting for several months and helper T cells being stimulated. IFN-γ and IL-2 levels increased after the first injection and were maintained at peak levels after repeated exposures to the antigen. This indicates high levels of TH cells and thus efficient Ig production, consistent with a humoral response 118 . The Simpson index D used to analyse clonal specificity suggests a possibly diverse immune response 119 . This is reasonable considering that the chimeric peptide is composed of several B- and T-cell epitopes.

Conclusion
In this study, an immunoinformatics approach was used to design a potential vaccine peptide comprising multiple T-cell (HTL and CTL) epitopes. Given that the proteins containing these epitopes are expressed by Lassa virus, the vaccine peptide could potentially provide prophylactic benefits. The predicted interaction between the chimeric vaccine protein and the immune receptor was strong and stable. The immune simulation reproduced features of effective real-life immune responses. This study may therefore aid infection control by creating an effective immunological memory against Lassa virus infections. The next step is to express this peptide in a bacterial system and perform the immunological assays needed to validate the results obtained.

Declarations
Author Contributions: AAO, SAM and ZMB conceived and designed the analysis; AAO, ITB, ODO and OJO performed immunoinformatic analyses; AAO prepared illustrations and wrote the manuscript; SSA, JON, BD and JRN contributed to the critical revision of the manuscript; JRN supervised the whole work; and all authors approved the final manuscript.
Compliance with Ethical Standards
Conflict of interest: All authors have declared that they have no conflict of interest.
Research Involving Human Participants and/or Animals: This article does not contain any studies with human participants or animals performed by any of the authors.

Data Availability
All data generated or analysed during the study are included in the submitted manuscript. The sequences of the proteins analysed can be retrieved from the UniProt database using their UniProt IDs.

Figure 1
Presentation of the overall workflow. The methodological strategy was divided into six parts: (i) Lassa virus proteome retrieval, (ii) prediction of epitopes, (iii) multi-epitope vaccine construction, (iv) vaccine modelling, (v) molecular docking and dynamics simulation, and (vi) in silico expression and immune simulation.

Figure 2
Structure of the final multi-epitope vaccine construct. CTL and HTL epitopes along with linkers are depicted. Graphical representation of secondary structure prediction of the multi-epitope vaccine. The protein is predicted to have an alpha helix (34.4%), a beta-strand (15.8%), and a coil (49.8%).

Figure 5
3D structure of the final subunit vaccine model.

Figure 9
In silico restriction cloning of the final vaccine sequence into the pET30a (+) expression vector. The red part represents the gene coding for the vaccine and the black circle represents the vector backbone.