This study aims to start from the very beginning of proteome retrieval to using in silico approaches for drug discovery. The overall study design is outlined in a flow diagram (Figure 1).
Epitope based vaccine design
Screening the complete proteome of B. pseudomallei, the reference strain we used was K96253, followed by the retrieval of the amino acid sequences in FASTA format using the UniProt Knowledgebase (UniProtKB) database. Filtering out the non-structural proteins (NSP) from this pool, we analyzed the sequences of the structural proteins to study several key properties of prospective vaccine candidates, which include antigenicity, solvent accessibility, surface accessibility, flexibility, and MHC class-I binding affinity [11,12].
To analyze the degree of digression of the protein sequences over time, phylogenetic trees were constructed by MEGA software with a bootstrapping value of about 1000 . In order to predict effective antigens and subunit vaccines out of the protein sequences, VaxiJen v2.0 server  was used while maintaining its default parameters. Only one antigenic surface protein was screened out based on antigenicity score for further evaluation.
Cellular localization of proteins
PSORTb (http://www.psort.org/psortb/)  was used to predict the cellular location and function of proteins. Among these proteins, we selected one transmembrane protein, Aerotaxix Receptor, flagellar by nature as a potential vaccine target. The flagella and motility, along with the bacterial resistance to the bactericidal action of human immunity, is equally important to showcase significant roles in mobility, chemotaxis, and dissemination of B. pseudomallei from primary sites of infection, namely lungs or skin, to a distant part of the body through systemic circulation [16,17,18].
T cell epitope prediction
Systematic projections of CTL (cytotoxic T lymphocytes) epitopes are integral to the effective design of vaccines. NetCTL-1.2, a web-based device for screening human CTL epitopes in a target sequence, calculated the total score of CTL epitopes by adding up the values of TAP (transporter associated with antigen processing protein complex) transport efficiency, proteasomal cleavage, and MHC-I molecules binding affinity. We have set the parameter at 0.5 as per our study requirements with a sensitivity and specificity of 0.89 and 0.94, respectively. The epitope embodying the highest score was carefully chosen for further in silico investigation. To assess the binding affinity of peptide epitope to MHC-I molecules, IC50 values that serve as a definitive measure of drug potency were evaluated [19, 20]. Prior to computation, peptide length was defaulted to 9 amino acids. To validate further, we carefully picked the alleles with a binding affinity (IC50) of less than 200 nm. Another IEDB (Immune Epitope Database) analytics software tool was used as well to measure processing score, TAP score, proteasomal cleavage score, and MHC-I binding affinity. Epitope conservancy, population coverage, and coverage area were also calculated . The candidacy of the peptide epitopes were further scrutinized by IEDB MHC class I binding prediction tool  for all the available class I alleles .
For assessing the allergenicity of the selected epitope, AllerHunter,  an online tool, was implemented. To verify the results of this tool, allergenicity was also evaluated by an additional online server called AllerTop 2.0 .
3D structure prediction
The highest conserve ETAAADALY epitope was submitted to PEP-FOLD  (peptide structure prediction server) to launch a molecular docking simulation study which gave us 5 rational 3D structures. The most reliable model for studying the interactions with HLAs was chosen. UCSF Chimera, a high throughput program for interactive visualization has also been used  for molecular modeling and visualization of the structure.
AutoDockVina verified the interaction between HLA molecules and our target (peptide epitope) through a docking simulation study . We retrieved 4uq2 (PDB id of a crystal structure of HLA-A*26:01) from the RCSB (Research Collaboratory for Structural Bioinformatics) database , followed by elimination of the Azo Benzene Containing Peptide by Discovery Studio, which forms a complex with the HLA-A*26:01 binding site . The removed Azo Benzene Containing Peptide was also allowed to dock with the HLA molecule which is regarded as control docked complex in this construct. The binding energy of our predicted epitope (ETAAADALY) to the MHC-1 allele was compared to the one of the docked structure set as control.
B cell epitope prediction
Linear B cell epitopes were predicted based on the immunogenicity of the protein sequence via B cell epitope prediction tools of IEDB . Emini surface accessibility prediction tool  (default threshold level 1.0) predicted the likelihoods of exposure of surface amino acids within the proteins constituting B cell epitope, while hydrophilicity was probed by Parker hydrophilicity prediction tool  (default set-up). The IEDB analysis resource also provided the Karplus and Schulz flexibility scale  for determining the flexibility of the potential B cell epitope . Chou & Fasman Beta Turn Prediction Tool checked the frequency of beta-turns in the prospective antigenic region among the protein . Beta turns are the most common protein motifs that help the polypeptide chain change its direction. Additionally, the Bepipred tool was used to predict linear B cell epitopes . ABCpred server was employed for further corroboration . To make a precise distinction between intracellular and surface proteins, transmembrane topology prediction was carried out using TMHMM (Transmembrane protein topology prediction method based on hidden markov model) v0.2 server . Eventually, the epitopes that have been found to completely overlap between Bepipred and ABCpred prediction tools are selected as the optimal B cell epitope(s).
Target site analysis
To evaluate the physicochemical profile of the designated protein, ProtParam  and, the self-optimized prediction method with alignment (SOPMA) of the Expasy server [40, 41] were utilized. GlobPlot 2.3 predicted disease-causing regions among the protein sequence . The three-dimensional (3D) structure of the selected protein was visualized via Modeller 9.14 through HHpred [43,44]. The protein sequence underwent rigorous modeling followed by frequent energy minimization refinement program of Swiss-PDB Viewer. Both Swiss-PDB Viewer  and PyMOL  were used as visualization tools. PROCHECK  checked the stereochemistry of the protein structure which is executed by Ramachandran Plot analysis  found in the ‘protein structure and model assessment tools’ section of SWISS-MODEL workspace . Upon submission of the refined PDB format in the server, 2.5 Å resolution value was picked for a better protein structure. The protein structure and 3D profile were assessed using ERRAT  and model quality was estimated by QMEAN . The catalytic binding sites of the protein were scanned as per the structural correlation between the template and the model construct with CASTp (computed atlas of surface topography of proteins)  server. Discovery Studio 4.0 client  has been employed in the measurement of the hydrophobicity.
Pharmacokinetic/Pharmacodynamic modelling for validation
Pharmacokinetic/pharmacodynamic (PK/PD) modeling and simulation can be viewed as an apt, economical approach to work on issues like efficacy and safety of novel high throughput drugs. Ranging from the preclinical phase to phase Ⅳ of drug development, all phases might benefit from this simulation tactic. The proposed peptide was confirmed by this simulation too for the testing of the dosage. The classic and most commonly used pharmacodynamic model is the nonlinear kinetics scheme that follows typically, Michaelis–Menten equation  as shown below:
Here C represents the steady-state concentration, Cmax
the theoretical maximum for C, A the amount absorbed, Amax
the theoretical maximum for A, and D the dose.
Molecular dynamics simulation
The magnitude of affinity between the peptide and the HLA molecule was ascertained by Molecular Dynamics (MD) simulation. MD simulation was performed using Gromacs 2018.1 program . OPLS-AA/L all-atom force field (2001 amino acid dihedrals) was used for preparing topology of the structure, and water molecules were added using the SPC/E water model. The structure was placed at the center of a cubic box, where the distance between the structure and the cubic box was one nanometre. To make the structure well solvated with an explicit solvent, the SPC216 model was used, an all-inclusive equilibrated three-point solvent model. Using that model, 15033 water molecules were added inside the box. The long-range electrostatic interactions were estimated using Particle-Mesh Ewald (PME) method with an interpolation order of four and maximum Fast Fourier Transforms (FFT) grid spacing of 0.16 nanometre. The coulomb radius was set to one nanometre and the short-range interactions also known as van der waals was set to a threshold value of one nanometre . In this construct, periodic boundary conditions were applied, and the bonds were constrained using the LINCS algorithm to their equilibrium position. Three Sodium (Na+) ions were added into the box to neutralize the net charge of the structure. Subsequently, the structure was minimized for 50000 steps using the steepest descent minimization algorithm and subjected to 100 picosecond equilibrations at constant temperature and pressure of 300K and 1 bar, respectively. After the structure was well-equilibrated at the desired temperature and pressure, the production MD was performed in isothermal-isobaric conditions for one nanosecond, and the coordinates were saved every ten picoseconds. The following equation was solved to calculate the stability of the complexes after one nanosecond.
Where N is the number of atoms, Mi is the mass of atom i, Xi is the coordinate vector for target atom i, Yi is the coordinate vector for reference atom i, and M is the total mass. If the RMSD is not mass-weighted, all Mi = 1 and M = N.