Identification of target proteins and sequence retrieval
5ʹ -Nucleotidase isoform 2 (NT5E) also called as ecto-5ʹ-Nucleotidase (CD73) is a surface enzyme present on B and T lymphocytes in humans.27 CD73 encoded by NT5E gene converts adenosine monophosphate (AMP) into adenosine. Several studies have reported over expression of NT5E in GBC leading to accumulation of adenosine.27, 70 The free adenosine produced by NT5E suppresses cellular immune responses through expansion of various immune suppressing cells like regulatory T cells and tumour associated macrophages while down regulating the natural killer T (NKT) cells and helper T (Th1) cells allowing malignant cells to evade detection.71-73 Over expression of NT5E is negatively correlated with survival period in GBC.27 In the tumour microenvironment, over expressed adenosine promotes tumour growth, invasion, angiogenesis, and metastasis while suppressing anticancer immune responses.29, 70 As a result, it has been identified as a promising therapeutic target for controlling GBCprogression.74
Aminopeptidase N (ANPEP) also referred to as CD13 is a membrane bound zinc metallo-enzyme expressed in macrophages and other human cells.75 Over expression of ANPEP has been linked with tumour size, differentiation and metastasis in pancreatic, gastric, breast and gallbladder cancers.28,29 Sanz et al. identified ANPEP activity in tumour tissue and plasma as an independent predictor and prognostic factor of 5-year survival in patients with colorectal cancer.75 ANPEP is a ubiquitous enzyme with moonlighting functional roles and expressed in normal cells limiting its applicability as a potential therapeutic target. However, evidence suggests that the ANPEP expressed by cancer cells is different in function and activity as compared to normal tissues.76 The activities of ANPEP have been linked to the progression of a number of malignancies, suggesting that it may have considerable potential in anticancer therapy.
Membrane metallo-endopeptidase (MME) or Neprilysin is also membrane bound zinc metalloproteinase implicated in promoting cancer progression.29 MME over expression has been reported in breast, lung, hepatocellular and gallbladder cancers and has been associated with poor prognosis.77 Researchers have also identified MME as a potential target for GBC treatment.
Sequences of selected proteins including NT5E (Uniprot ID- P21589), ANPEP (Uniprot ID- P15144) and MME (Uniprot ID- P08473) in FASTA format were retrieved from UniProtKB database. The stability of the proteins was verified by using ProtParam server. The instability index of NT5E, ANPEP and MME was found to be 32.59, 36.17 and 37.62 respectively indicating that selected proteins were stable. The physicochemical properties of the selected proteins were examined and are provided in Supplementary Table 1. The FASTA sequences were subsequently used for B and T-cell epitope prediction for vaccine construction.
Prediction of CTL and HTL epitopes
The CTL and HTL epitopes were predicted and selected on the basis of IC50 (<500nM), antigenicity (>0.5), immunogenicity, non-allergenicity and non-toxicity from NT5E, ANPEP and MME proteins vaccine development. The selected CTL epitopes covering different Human Leucocyte Antigen (HLA) super types are depicted in Table 1. The predicted HTL epitopes were also evaluated for IFN- γ inducing properties. The selected HTL epitopes and corresponding HLA-DR super types with IFN- γ score and antigenicity are shown in Table 2.
CTL and HTL Population Coverage Analysis
In order to develop a feasible vaccine candidate relevant to the global population, it is important to take into account the population coverage of the selected T-cell epitopes. The world population coverage of selected CTL and HTL epitopes was 93.78% and 81.81% respectively. The coverage of the individual CTL and HTL epitopes and their respective HLA genotypic frequency is summarized in Table 3. The HTL epitopes had higher total HLA hits (48) as compared to CTL epitopes (22).
The CTL and HTL population coverage was also assessed for different geographical regions such as South Asia (82.25%, 73.38%), South-East Asia (83.13%, 56.98), Europe (97.17%, 85.83%), South America (75.46, 58.59%), North America (94.35, 87.89%), East Africa (73.78%, 81.82%) and South Africa (78.07%, 32.10%). The coverage of CTL and HTL epitopes in different sub-continents and countries is shown in Figure 1.
Prediction of B-cell epitopes
The 16-mer B-cell epitopes were selected on the basis of binding score (>0.9). The epitopes with high binding score were further evaluated for antigenicity (>0.5), allergenic and toxic properties. The antigenic, non-allergic and non-toxic B-cell epitopes were selected from three proteins as shown in Table 4.
Designing of multi-epitope vaccine construct
From the predicted epitopes, 7 CTL, 4 HTL and 6 B-cell epitopes were selected for designing of final vaccine construct. The GPGPG linkers were added to link B-cell and HTL epitopes where as AAY linkers were used to connect CTL epitopes. hBD3 (ID-Q5U7J2) a 45 amino acid long adjuvant was added to improve the immunogenicity of the vaccine construct. The adjuvant was linked using EAAAK linkers. The schematic diagram and amino acid sequence of designed vaccine construct is shown in Figure 2A.
The overall antigenicity score was found to be 0.71 indicating that the vaccine construct is highly antigenic and classified as probable non-allergen.
Evaluation of construct physicochemical properties
The final construct consisted of 337 amino acids with molecular weight of 36.21 KDa (C1644H2513N453O468S7) and an isoelectric point of 9.28. The predicted aliphatic index was 70.47 suggesting thermo-stability and instability index of 23.39 confirmed that the vaccine construct was stable. The GRAVY score was found to be -0.445 suggesting its hydrophilic nature. The estimated half-life was predicted as 30 hours in mammalian reticulocytes, >20 hours in Yeast and >10 hours in E. coli.
Prediction of Secondary and tertiary structure
The secondary structure consisted of 37% alpha-helix, 14% beta-strand and 49% coils. The 2D structure is shown in Figure 2B. RaptorX generated five 3D models of the final vaccine sequence with RMSD values ranging from 4.563 to 7.726. Each model was evaluated for overall quality using ERRAT and PROCHECK tools. Model 2 with RMSD-5.2374 was selected on the basis of the overall quality factor in ERRAT (93.7%) and PROCHECK (Ramachandran plot). The Ramachandran plot of Model 2 showed 87.2% of amino acid residues in most favoured, 12.5% in allowed and 0.4% in disallowed regions. The initial 3D model and associated Ramachandran plot are shown in Figure 3A, 3C respectively
Tertiary structure refinement and validation
Galaxy Refine generated five models of which Model-3 with RMSD of 0.359, Mol Probity of 1.965, GDT-HA of 0.9733, Clash score of 12.7and poor rotamers of 0.4 was selected for further analysis.
Ramachandran plot of the refined 3D model showed 92.1% residues in most favoured, 6.8% in additional allowed regions, 0.8% in generously allowed regions and 0.4% in disallowed regions. Based on the results of Ramachandran plot and ERRAT score, that quality of Model-3 was found to be acceptable. The refined 3D model and its Ramachandran plot are shown in Figure 3B, 3D respectively. The 3D model was further examined and validated through ProSA Web server with predicted Z-score of -5.2 was in the range of experimentally validated protein structures obtained from X-ray and NMR spectroscopy analysis (Figure 3E).The solubility score was found to be 0.523 which shows that construct is soluble upon expression (Supplementary Figure 2).
Prediction of discontinuous B-cell epitopes
A total of six discontinuous B-cell epitopes were identified by ElliPro server. Almost 5 B-cell epitopes contained 167 amino acid residues present in the main region of the vaccine construct. A score value in the range of 0.67 – 0.76 was chosen for selection of discontinuous B-cell epitopes (Figure 4). 22 residues were predicted from 60-61, 63, 65-81 and 84-85. 24 residues were predicted from 300-303, 305-314 and 328-337. 60 residues were predicted from 97, 100-110, 121, 124, 125, 128-141, 143-145, 148-161, 182, 184, 185 and 187-197. 13 residues were predicted from 220-231 and 234. 48 residues were predicted from 1-7, 9-11, 13-23, 27, 44-47, 49, 50, 53 and 256-274 (Supplementary Table 2).
Molecular docking with immune receptors
The docking of the construct was performed with immune receptors including TLR2, TLR3 and TLR4. 3D docked scores for TLR-2, TLR-3 and TLR-4 were -344.38, -345.38and -324.47 respectively. The docking energy scores indicating the vaccine-receptor complex binding affinity and ligand RMSD are shown in Supplementary Table 3 and the vaccine - TLR2, TLR3 and TLR4 docked complexes are shown in Figures 5A, 5B and 5C respectively. The docked complexes were analysed in LigPlot+ for visualization of intermolecular hydrogen bonds representing vaccine-receptor interactions. For ligand-TLR2 complex, Lys8, Arg207, Ile30, Asn200, Tyr255, Asp259 and Glu333 residues were involved in intermolecular hydrogen bonding (Figure 5D). Similarly Asp268, Arg262, His336, Asp259, Asn85, Glu87, Asp322, Arg12, Arg14 residues in ligand-TLR3 (Figure 5E) and Ala299, Arg207, Ser199, Gln29, Lys32, Lys8 residues in ligand-TLR4 (Figure 5F) were identified in LigPlot.
Molecular dynamics simulation of receptor-vaccine complex
The Maestro's Schrodinger simulation event analysis module was used to analyse the trajectories. By superimposing the trajectories over the reference frame, RMSD and RMSF were calculated using trajectories from MD simulation data. This provides an estimate of conformational stability and volatility over the course of the simulation. The amount of hydrogen bonds established between the receptor and the docked ligand during the simulation also suggests the ligand's binding stability with the receptor. A binding arrangement with a higher number of hydrogen bonds is said to be more stable. For each TLR2, TLR3, and TLR4 receptor complex with ligand, the RMSD was computed. TLR2-complex RMSD demonstrated that after 60ns, the RMSD of TLR2 protein and ligand converged. Throughout the simulation, the protein-ligand complex remained stable. For TLR2 and TLR2 bound ligand, the standard deviations in RMSD were 0.932 and 1.4321, respectively, with average RMSD values of 2.75 and 5.59 (Figure 6A). For both TLR3 protein and ligand, the RMSD plot of the TLR3-complex revealed convergence after 5ns and remained stable throughout the simulation (Figure 6B). The RMSD of the ligand was raised around 20-60ns, but the binding site remained unchanged. TLR3 and TLR3 bound Ligand had standard deviations of 0.231 and 0.954, respectively, with average RMSD values of 3.07 and 5.75 (Figure 6B). The TLR4-complex RMSD data demonstrates convergence after 5ns for both the TLR4 protein and the ligand. Ligand RMSD increased up to 20ns before stabilising toward the end of the simulation. Throughout the simulation, the TLR4-ligand complex maintained its conformational stability (Figure 6C). This shows that the Protein-ligand complex was stable throughout the simulation. The standard deviation in RMSD was 0.314Å and 0.885Å, while Average RMSD values were 3.07Å and 6.03Å, for TLR4 and TLR4 bound Ligand, respectively.
Throughout the simulation, the compactness of the proteins was examined, and the plot revealed that all three proteins were folded correctly. In comparison to TLR3 and TLR4, TLR2 demonstrated changes in compactness during the simulation. The contact between the two molecules is represented by intermolecular hydrogen bonds (H-Bonds), which are implicated in the intensity of binding through the number of H-Bonds. The more H-bonds there are, the more binding or interaction there is between two molecules. Throughout the simulation, the average number of Intermolecular H-bonds between complex TLR2-Ligand (Figure 6D), TLR3-Ligand (Figure 6E), and TLR4-Ligand (Figure 6F) were 15.36, 16.45, and 11.98 respectively. For complex TLR2-Ligand, TLR3-Ligand, and TLR4-Ligand, the range of hydrogen bonds during 100ns simulation over 1000 time frames was 7-24, 6-29, and 6-23, respectively. The number of hydrogen bond interactions at 10ns time intervals with RMSD values is shown in Table 5. This MD simulation analysis indicates that all three immune receptors efficiently bound with the ligand and showed stable binding throughout the simulation (Supplementary Videos). The number of hydrogen bonds in 100ns simulation over 1000 time frames and corresponding receptor and ligand RMSD are provided in Supplementary material 2.
The average value of RMSF for TLR2, TLR3, and TLR4 were 1.23 Å, 2.71 Å, and 2.45Å, respectively. RMSF showed fewer fluctuations in TLR3 and TLR4 as compared to TLR2 Figure 6G. Moreover the ligand’s RMSF plot indicates less fluctuation in RMSF values of TLR2 as compared to TLR3 and TLR4 bound ligands. Figure 6H shows the radius of gyration of all three complexes.
Immune simulations of final vaccine construct
The output of C-ImmSim server provides the stimulation of different immunological cells including B-cell, helper-T cell (TH), cytotoxic-T cell (TC), natural killer cell (NK), macrophages (MA) and dendritic cell (DC) population. Moreover, the prediction of immunoglobulins, cytokines and interleukin production during immune simulation is also provided.
The total B-cell population, memory B-cell and active B-cell population increased following each booster vaccine dose and remained stable with minimal decay over the period of 350 days Figures 7A, 7B. There was a considerable rise in antibody response with every exposure to the vaccine construct with corresponding decrease in the antigen levels. The humoral immune response was characterized by IgG and IgM antibodies and the IgM response was higher as compared to IgG after each vaccination (Figure 7C).The active TH cells and memory TH cells spiked after second and third vaccine doses which remained elevated up to 350 days (Figures 7D, 7E). Similarly the TC cell population per state showed steady increase in active TC population (Figure 7F, 7G). The IFN- γ response was significantly higher after first and second dose and the concentrations of IL-10 and TGF-b also spiked following each vaccination (Figure 7H).Throughout the simulation, there was also a concurrent rise in the activity of macrophages, dendritic and natural killer cells (Figure 7I).