Subtractive Proteomics Analysis Revealed Lipid A-4’phosphatase (lpxF) as a Potential Candidate for Epitope-based Vaccine Design Against Helicobacter pylori Infection

Amid the rising prevalence of resistant H. pylori infections, the WHO in 2017 designated clarithromycin-resistant H. pylori a high-priority pathogen for research and for the development of new antibacterial agents. In this study, the Helicobacter pylori 26695 strain was investigated with extensive computational biology applications to identify novel therapeutic drug targets or vaccine candidates. Determining protein function is a crucial part of the functional annotation of an organism's proteome. The pathogen-specific pathways were found to include only twelve proteins, narrowing the search for drug or vaccine targets. Lipid A 4'-phosphatase (LpxF) was found to be a novel vaccine target with the highest antigenicity and broad-spectrum conservancy across other H. pylori strains. Further, an immunoinformatic approach was used to predict an effective epitope-based vaccine against H. pylori. The LpxF protein was predicted to have linear and conformational B-cell epitopes and cytotoxic T-lymphocyte epitopes. Virtual screening of all 35 predicted peptides against the human TLR2 receptor identified the top 5 peptides. Subsequent redocking with exhaustive parameters yielded two peptides, PEP_19 (AGFVYYR) and PEP_31 (FAYLFTSRY), each with a docking energy of -6.9 kcal/mol and strong hydrophobic and hydrogen-bond interactions in the peptide-TLR2 complexes. This panel of two potent epitopes is proposed for immunizing populations against multiple H. pylori infections. We recommend further validation of these epitope-based peptides by in vitro and in vivo studies to establish their effectiveness.


Introduction
Despite decades of effort, overcoming Helicobacter pylori infections has been an arduous task for the scientific community. More than half of the world's population is infected with H. pylori, a leading cause of duodenal and gastric ulcers as well as gastric cancer [1]. Targeting the bacterium is challenging because of its ability to adapt to adverse environmental conditions, such as pH, temperature, adhesion and phenotypic forms [2]. Previous unsuccessful treatment attempts and growing antibiotic resistance impede the pathogen's eradication and lead to the development of multidrug-resistant (MDR) strains. On the World Health Organization (WHO) list of high-priority pathogens, clarithromycin-resistant H. pylori takes precedence for research, discovery, and development of new antimicrobial agents (3). Several factors and mechanisms contribute to MDR development in H. pylori, such as point mutations, efflux pumps and biofilm formation. The emergence of MDR has become one of the most significant challenges in the management of H. pylori infection. Alarming levels of H. pylori antibiotic resistance have been reported, with a prevalence of 80% and 40% in developing and industrialized countries, respectively (4).
H. pylori is associated with gastritis, a precondition for developing gastric and duodenal ulcers. It is a risk factor for two gastric cancers: adenocarcinoma of the distal stomach and gastric mucosa-associated lymphoid tissue (MALT) lymphoma. The outer membrane of Gram-negative bacteria like H. pylori is largely composed of macromolecules such as lipopolysaccharide (LPS), one of the essential virulence factors. LPS comprises lipid A, which is embedded in the outer membrane and is known as endotoxin because it induces fatal hypersensitivity reactions of the human immune system at low concentrations, followed by a core oligosaccharide and the O-antigen (5). The bacterium can chronically colonize the human stomach by modifying its surface structure, i.e., lipid A, through removal of the phosphate groups from the 1- and 4'-positions of the lipid A backbone. Earlier studies demonstrated that the ability of H. pylori to colonize a mammalian host is due to the dephosphorylation of the lipid A domain of LPS by lpxE and lpxF (6).
Generally, biological processes like purine and pyrimidine metabolism and oxidative phosphorylation are cell functions that serve as therapeutic targets. Other processes, related to flagellar assembly, ABC transporters, chemotaxis, protein folding, regulation, metal homeostasis, genetic information processing, and resistance to acid and oxidative stresses, are also crucial in the pathogenesis of H. pylori (7). Four independent global gene expression studies identified the genes essential for H. pylori survival under both in vitro and in vivo conditions (8)(9)(10)(11). In contrast, advances in whole-genome sequencing and computational biology have offered various alternative approaches to identify drug targets worthy of experimental follow-up (12).
Applying genomics to both pathogen and host genome sequences has made the focused identification of drug targets considerably easier (13)(14). Finding novel and unique drug targets remains a specific step in drug discovery for overcoming infections caused by drug-resistant pathogens. In this scenario, advances in bioinformatics have paved the way towards subtractive genomics/proteomics approaches for identifying novel drug targets. Before novel therapeutic agents with a novel mode of action can be developed (either by drug repurposing or new compound discovery), putative targets must be identified in the form of pathogenic bacterial properties or mechanisms involved in pathogenicity. Potential molecular targets in a pathogen can be identified by approaches like whole-genome mutagenesis (15) or by elucidating protein-protein, protein-DNA, and protein-RNA interactions.
This strategy aims to predict pathogen genes that are non-homologous to the human host. Further, these non-homologous genes must be essential for the survival, replication, and sustainability of the pathogen and critical in the bacterium's unique metabolic pathways. The essential proteins are prioritized by using various computational biology tools and databases. Screening the shortlisted proteins for their subcellular localization and for their absence among pre-existing targets establishes the novelty of the identified targets. This study also emphasizes the discovery of B-cell and T-cell epitopes, which are projected as effective peptide vaccine candidates using the immunoinformatics approach. The potential epitope candidates for the development of peptide vaccines were investigated further by molecular docking studies to understand the interactions in peptide-TLR2 complexes.

Results
The primary focus of this study is to explore novel therapeutic targets or vaccine candidates against H. pylori by employing a structural genomics approach. We analyzed an unreviewed hypothetical proteome set using several bioinformatics databases and computational biology tools. The overall findings of the subtractive proteomics study are briefly summarized in Figure 1.

Candidate protein for vaccine design
Analysis of the H. pylori hypothetical proteins
A preliminary prediction for the functional annotation was carried out using the GO FEAT platform. From the total set of the H. pylori 26695 reference proteome (1,115 proteins), a set of 944 unreviewed hypothetical proteins was used for the analysis. Proteins with a known domain and/or family and GO terms (542 proteins) were selected for further analysis (Supplementary File 1).
These functionally annotated proteins, referred to here as hypothetical proteins (HPs), may play an important role in the cell. The functional annotation of HPs helps in understanding the structures, functions and pathways that contribute to the pathogenesis of the bacterium and is thus crucial in identifying novel therapeutic targets. The pathogen proteins homologous to human proteins, and the proteins in metabolic pathways shared by pathogen and host, were determined using various web-based bioinformatics resources.

Selection of non-homologous human proteins
All the hypothetical protein sequences annotated with a functional domain/family were then screened to retain only the non-homologous sequences. BLASTp was performed against H. sapiens at NCBI with an e-value threshold of 10⁻³. This yielded a set of 412 non-homologous proteins from H. pylori (Supplementary File 1).
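The homology filter described above can be sketched in a few lines of Python. This is a minimal illustration, assuming BLASTp was run with tabular output (`-outfmt 6`, where the e-value is the eleventh column); the function name, protein IDs and hit rows are hypothetical toy data and not results from this study.

```python
import csv
import io


def non_homologous_ids(all_ids, blast_tsv, evalue_cutoff=1e-3):
    """Keep query IDs that have no human hit at or below the e-value cutoff."""
    ids_with_hits = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])  # columns 1 and 11 of outfmt 6
        if evalue <= evalue_cutoff:
            ids_with_hits.add(query_id)
    return [pid for pid in all_ids if pid not in ids_with_hits]


# Toy input (hypothetical IDs and hits): protA has a strong human hit,
# protB only a weak one, protC no hit at all.
toy_ids = ["protA", "protB", "protC"]
toy_blast = (
    "protA\thuman1\t45.0\t200\t110\t2\t1\t200\t5\t204\t1e-30\t120\n"
    "protB\thuman2\t28.0\t60\t43\t1\t10\t69\t3\t62\t0.8\t25\n"
)
kept = non_homologous_ids(toy_ids, toy_blast)
print(kept)
```

The same function could be reused for the later essentiality and anti-target steps by changing the cutoff and, for essentiality, keeping the IDs with hits instead of those without.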

Essential protein analysis
A total of 77 essential genes were identified by performing BLASTp against the DEG database (e-value<0.0001). These proteins, considered crucial for H. pylori survival, are unique (Supplementary File 1) and are believed to be useful in identifying species-specific drug targets/vaccine candidates (44).

Human Gut microbiota analysis
The inadvertent blockage of gut flora proteins that are homologous to pathogen proteins may lead to adverse effects (31). To avoid this, essential H. pylori proteins homologous to gut microbial proteins were to be excluded from further analysis. This step was accomplished using BLASTp with the search set restricted to human gut metagenome 16S ribosomal RNA and an expect threshold of 0.05. No significant matches were found, suggesting that the entire set of essential proteins is unique to the pathogen.

Analysis of metabolic pathways
We retrieved a total of 342 human metabolic pathways and 95 H. pylori-specific metabolic pathways from the KEGG database (Supplementary File 2). For screening novel therapeutic targets, only the proteins exclusively involved in pathogen-specific pathways were considered. In our study, the set of 77 essential proteins was submitted to the KAAS server for assignment of KEGG orthology (KO) and specific metabolic pathways. A total of fifty-five proteins were assigned a KO, and forty-two proteins were involved in pathways common to H. sapiens. These forty-two proteins were omitted from further screening to circumvent cross-reactivity with other human pathogens. Finally, twelve unique proteins were found to be involved in pathogen-specific pathways (Table 1).
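The pathway subtraction above amounts to a set difference on KEGG map numbers. The sketch below is illustrative only: it assumes pathway identifiers of the form organism prefix plus map number (for example `hpy00540`), and the three pathogen and two host identifiers are a toy subset chosen for the example, not the 95 and 342 pathways retrieved in this study.

```python
def pathogen_specific_pathways(pathogen_ids, host_ids,
                               pathogen_prefix="hpy", host_prefix="hsa"):
    """Return pathogen pathway IDs whose map number is absent from the host list."""
    host_maps = {
        pid[len(host_prefix):] for pid in host_ids if pid.startswith(host_prefix)
    }
    return sorted(
        pid for pid in pathogen_ids
        if pid.startswith(pathogen_prefix)
        and pid[len(pathogen_prefix):] not in host_maps
    )


# Toy subset: map 00010 (glycolysis) is shared, while 00540 (lipopolysaccharide
# biosynthesis) and 02020 (two-component system) have no human counterpart.
toy_hpy = ["hpy00010", "hpy00540", "hpy02020"]
toy_hsa = ["hsa00010", "hsa00190"]
unique = pathogen_specific_pathways(toy_hpy, toy_hsa)
print(unique)
```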

Analysis of subcellular location
The prediction of protein localization serves as a vital parameter in identifying therapeutic targets because many proteins can span multiple locations (45). Among the 12 proteins, two were identified as inner membrane proteins, seven as cytoplasmic and the remaining three as periplasmic proteins (Table 2).
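The successive reductions reported so far can be tabulated to show how strongly each filter narrows the candidate set. The counts below are taken from the text; the step labels and the helper function are only an illustrative summary.

```python
# Counts reported in the Results, in the order the filters were applied.
funnel = [
    ("reference proteome", 1115),
    ("unreviewed hypothetical proteins", 944),
    ("annotated with domain/family and GO terms", 542),
    ("non-homologous to human", 412),
    ("essential (DEG)", 77),
    ("pathogen-specific pathways", 12),
    ("inner membrane", 2),
]


def retention(steps):
    """Pair each step after the first with the fraction kept from the previous step."""
    return [
        (name, count, count / prev_count)
        for (_, prev_count), (name, count) in zip(steps, steps[1:])
    ]


for name, count, fraction in retention(funnel):
    print(f"{name}: {count} ({fraction:.1%} of previous step)")
```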

Analyzing druggability of hypothetical protein
The novelty of the membrane proteins as a drug target was analyzed using the DrugBank database.

'Anti-target' analysis of the novel drug target
Because of inadvertent side effects, various drug candidates have either been withdrawn or restricted to use only in extreme situations. Checking for cross-reactivity and carcinogenesis is crucial in selecting an effective drug molecule (46-48). Toxicity caused by the unintended binding of drugs to host 'anti-targets' instead of pathogenic targets must be avoided. In this regard, the analysis revealed no similarity with any of the human 'anti-target' proteins (Supplementary File 4), and thus lpxF is considered a host 'non-anti-target' protein.

Antigenicity and allergenicity prediction
Reverse vaccinology is considered one of the most powerful approaches for designing a candidate vaccine (49)(50). Small antigenic protein sequences were considered for developing a safe recombinant vaccine with the potency to fight against infectious diseases (51

Virulence factors of pathogenic H. pylori
The virulence mechanism of the non-homologous, essential protein can be explored by submitting it to the virulence factor database. For the query protein lpxF, the virulence factors retrieved are listed in Table 3.

Peptide vaccine discovery
B-cell epitope prediction
Three linear B-cell epitopes, each 20 amino acids long, were predicted by BCPred with a score of ≥0.90. The IEDB B-cell epitope prediction server identified further epitopes using five different methods with default parameters; these are listed in Table 4.
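A fixed-length epitope scan of this kind can be pictured as a sliding window over the protein sequence followed by a score cutoff. The sketch below shows only that bookkeeping: the sequence and scores are hypothetical placeholders, and no attempt is made to reproduce the BCPred classifier itself.

```python
def windows(sequence, k=20):
    """Yield (start, end, peptide) for every overlapping k-mer, 1-based inclusive."""
    for i in range(len(sequence) - k + 1):
        yield i + 1, i + k, sequence[i:i + k]


def select_epitopes(scored_windows, cutoff=0.90):
    """Keep the windows whose predictor score meets the cutoff."""
    return [(start, end, pep) for start, end, pep, score in scored_windows
            if score >= cutoff]


# Hypothetical 25-residue sequence and made-up scores, for illustration only.
toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEF"
toy_scores = [0.95, 0.40, 0.91, 0.10, 0.89, 0.90]
scored = [(s, e, pep, score)
          for (s, e, pep), score in zip(windows(toy_sequence), toy_scores)]
selected = select_epitopes(scored)
print(selected)
```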
The surface accessibility of H. pylori lpxF was predicted using a threshold value of >1. Amino acids that score above this value are likely to be present on the protein surface. Here, the maximum surface probability score was 11.053 for FTSRYKPKRWML165-176. Figure 2A depicts the predicted surface accessibility of H. pylori lpxF, while Table S1 (Supplementary Material) lists the maximum and minimum accessibility scores. The Karplus and Schulz surface flexibility analysis of H. pylori lpxF revealed both ordered and disordered regions, indicated by low and high B-factor values, respectively. The maximum predicted surface flexibility score is 1.107 for FKGSSRY184-190. The predicted surface flexibility of H. pylori lpxF is graphically represented in Figure 2B, and the minimum and maximum scores are shown in Table S6 (Supplementary Material).
The Parker approach was used to predict the hydrophilicity of the predicted epitopes from H. pylori (53), as illustrated in Figure 2C. Across all the predicted peptides for lpxF, the maximum and minimum scores were 5.329 and -7.071, at the amino acid residue positions STAHKDG79-85 and FLSLLLW8-14, respectively, and these peptides are predicted to act as active B-cell epitopes.
Prediction of cytotoxic and HTC epitopes
NetCTL 1.2 was used to predict the cytotoxic epitopes for the lpxF protein. A total of eight CTL epitopes were predicted based on the defined criteria and specific MHC binding score (Table 5). These potential epitopes for MHC class I molecules against the HLA-A*24:02 allele were predicted using the SMM method. Further, MHC-I binding and proteasome-dependent C-terminal cleavage of lpxF were also predicted based on a weight matrix and an artificial neural network. Finally, the prominent epitopes were selected based on the MHC binding affinity, the TAP score and the C-terminal cleavage score.

Structural modeling peptide and molecular docking studies
A total of thirty-five epitopes were shortlisted by combining B-cell and T-cell epitopes from the lpxF protein (Table S7, Supplementary Material). The top five epitopes were re-docked with the exhaustiveness parameter set to 100 for a more thorough conformational search. This led to the selection of two peptides with the best conformational binding towards the TLR2 target protein and identified the interaction pattern in the peptide-TLR2 complexes. The epitope AGFVYYR (PEP_19) bound TLR2 with a docking energy of -6.9 kcal/mol. The residues Phe3, Tyr5, Tyr6 and Arg7 of the PEP_19 epitope interacted with Leu119, Asn137, Phe144 and Asn143 of TLR2. Similarly, the docking energy for binding of the epitope FAYLFTSRY (PEP_31) to TLR2 was also calculated as -6.9 kcal/mol. The residues Phe1, Tyr3, Phe5, Thr6 and Ser7 of the PEP_31 epitope interacted with Asn265,

Discussion
Helicobacter pylori is responsible for chronic gastrointestinal infections in more than half of the world's population (54), with a high prevalence in East Asian countries (55)(56). The systemic effect of H. pylori infection on the entire body has drawn considerable attention in recent years (57)(58). Subtractive genomics-based approaches are used to identify druggable targets or vaccine candidates for H. pylori. We used the H. pylori 26695 strain to identify proteins with specific features, namely non-homologous, essential, and antigenic proteins. Candidate vaccine proteins were chosen from the known membrane proteins that play a critical role in H. pylori virulence and survival, as well as in pathogen-specific metabolic pathways.
In the present study, we used the entire set of uncharacterized hypothetical proteins to analyze and predict the drug candidate protein. The first step was to find the functional domains/families of the hypothetical proteins. A protein's function depends on its domains, which are the structural, functional, and evolutionary units of proteins. Understanding the function of a protein domain is essential for exploring its role at the cellular level. Secondly, striking homology exists between bacterial and human proteins, since these proteins are involved in common cellular systems (59)(60). Hence, cross-reactivity with homologous host proteins must be avoided during therapeutic development and administration so that only pathogen-specific target proteins are bound. Thirdly, essential genes were identified within the non-homologous set, since essential genes are required for the cellular processes of the bacterium to continue functioning (61).
Further, the gut microbiota refers to the large population of bacteria that colonize the human intestinal tract (62). Pathogens that cause human inflammatory diseases are closely associated with the gut microbiota; these pathogens co-evolve and self-multiply in a symbiotic relationship with the gut microbiota (63). The lpxF protein, which is predicted to be localized in the cell's inner membrane, is hypothesized to be a potential vaccine candidate. While cytoplasmic proteins may be potential drug targets, we focused on inner membrane proteins in this study because membrane proteins account for more than 60% of therapeutic targets (64). Membrane proteins were considered as targets for several reasons, including: (1) protein functions can be predicted using machine learning and computer-based methods prior to in vivo or in vitro laboratory trials; and (2) the unique structure of membrane proteins facilitates the prediction and generation of their secondary structure (65). Mutational change and the exchange of genes among pathogens occur due to the overuse of broad-spectrum therapeutics. The antibiotic resistance crisis has emerged mainly because of the misuse of antibiotics and the lack of new therapeutics (64-65).
The lpxF protein removes the 4'-phosphate group from tetra- and hexa-acylated lipid A species and has no 1-phosphatase or 3-deoxy-D-manno-octulosonic acid (Kdo2) hydrolase activity. The absence of the 4'-phosphate group renders the bacteria resistant to host-derived cationic antimicrobial peptides (CAMP), allowing them to camouflage themselves from the host's innate immune response, and plays a critical role in the long-term colonization of the host's stomach. This protein is involved in the LPS lipid A biosynthesis pathway, which is part of bacterial outer membrane biogenesis (66). To identify an epitope-based peptide vaccine that can enhance immunity against H. pylori infections, we predicted and validated both B-cell and T-cell epitopes. Upon virtual screening of the peptide library of 35 epitopes based on MHC binding affinity, C-terminal cleavage score, TAP score and molecular docking, we hypothesize that epitopes AGFVYYR (PEP_19) and FAYLFTSRY (PEP_31) from lpxF protein as potential epitope-based  71,72]. TLR2 has unique properties worth investigating in vaccine development, including the ability to covalently link TLR2 ligands to antigens and the enhanced direct and cross-presentation of antigens coupled to TLR2-targeting lipid moieties. The ability of TLR2 stimulation to induce healthy Th responses and regulatory mechanisms, together with its mucosal imprinting properties, can aid in the resolution of actual vaccine challenges [73].
We hypothesize that the extracellular domain of TLR2 may be involved in this interaction, as the LRRs of TLR2 carry the specificity that allows both binding of the epitope-based peptides (PEP_19 and PEP_31) and signaling. The allergenicity and antigenicity profiles of these epitopes have also been confirmed, indicating that they are potent candidates for vaccine production. Furthermore, B-cell epitopes have been identified as the preferred means of eliciting a B-cell immune response. Overall, this study will aid progress in peptide-based vaccine development against H. pylori infections. Our research has identified novel peptides that could help combat the MDR bacteria crisis by providing new H. pylori therapeutics.

Conclusions
In the current study, a computational subtractive genomics method identified lpxF of H. pylori as a potential vaccine target candidate. This protein was found to be involved in the pathogen-specific 'lipopolysaccharide biosynthesis' metabolic pathway. The inner membrane protein lpxF shows no homology with the human proteome, which avoids a potential autoimmune response. Moreover, the absence of cross-reactivity with other pathogenic antigens and suitable antigenic and adhesion properties are essential for the pathogenesis of the microbe and for protection against infectious diseases. The lpxF protein also showed no similarity to human anti-targets. Therefore, this study prioritizes lpxF as a possible vaccine target candidate. Further, the epitopes predicted for lpxF were screened for potential binding to the human TLR2 receptor using virtual screening and molecular docking approaches. The two peptides, PEP_19 (AGFVYYR) and PEP_31 (FAYLFTSRY), showed strong hydrophobic and hydrogen-bond interactions with TLR2. We recommend further validation of these epitope-based peptides by in vitro and in vivo studies to establish their effectiveness.

Methods
Subtractive proteomics approach for identifying candidate protein for vaccine design
Retrieval of the proteomic data set
The whole proteome (unreviewed) of Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori) was retrieved from the UniProt database (proteome ID: UP00000429). The reference proteome contains a total of 1,115 proteins, of which 944 are unreviewed.

Functional annotation of hypothetical proteins
The entire set of 944 unreviewed hypothetical proteins (HPs) was submitted to the GO FEAT 1.0 server as a preliminary prediction for the functional annotation with an e-value of 1e-03 (16). Multiple databases, including UniProt (17), InterPro (18), Pfam (19), NCBI (https://www.ncbi.nlm.nih.gov/) and EMBL (https://www.ebi.ac.uk/), were used for rapid functional annotation of the HP set through the GO FEAT server. The HPs with an assigned functional domain and/or protein family were considered for further analysis.
Identifying non-homologous bacterial proteins in humans.
All the HP sequences analyzed by the GO FEAT tool were then screened for non-homologous proteins. The objective of this step is to identify protein sequences that are not homologous to human proteins, which was achieved by performing BLASTp against the human proteome set with e-value<0.0005 (20).
Identification of essential non-homologous proteins
Essential genes are those indispensable for the survival of the organism because they underpin various cellular processes. Hence, identifying the essential genes is a critical step in the screening of potential drug targets. BLASTp against the Database of Essential Genes (DEG) was used to predict the essential non-homologous proteins in the protein set with a threshold e-value<0.0001 (21).
Identification of orthologs in the human gut microbiome.
Gastrointestinal digestion in humans depends mainly on the activity of microbes that reside in the gut, as the gut microbiota plays a critical role in maintaining gastrointestinal health. The gastrointestinal tract of healthy humans houses about 10¹⁰-10¹⁴ microbes, which exist in a symbiotic association with the host (22). Adverse pharmacokinetic side effects in the host may arise when therapeutics directed at pathogen proteins also interact with and bind homologous proteins of the human gut microbiota. Hence, the essential proteins identified above were screened for non-homology with the human gut microbiota using BLASTp at a significance of e-value<0.0001 (23)(24).
Metabolic pathway analysis.
The KEGG pathway database was used to compare the metabolic pathways of H. pylori 26695 (hpy) and H. sapiens (hsa) (25) in order to identify pathogen-specific unique pathways. The non-homologous essential proteins were further screened by performing BLASTp analysis using the KAAS server (26), and only the proteins involved in pathogen-specific pathways were retained for further analysis.
Prediction of protein subcellular localization
The five major localizations of Gram-negative bacterial proteins are the cytoplasm, inner membrane, periplasm, outer membrane and extracellular region (27). The prediction of subcellular localization is essential for categorizing proteins as drug targets or vaccine candidates. Generally, proteins in the cytoplasm can be regarded as drug targets and those on the membrane as vaccine targets (28). In this study, the subcellular localization of the shortlisted proteins was identified using the PSORTb v3.2 server (27), the CELLO v2.5 server (27) and PSLPred (29). A localization was accepted when it was predicted by all three servers or by at least two of them. Only membrane proteins were considered as potential drug targets and vaccine candidates and were used for druggability analysis.
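The two-out-of-three agreement rule described above can be expressed as a simple majority vote. The following is a minimal sketch: the function name is hypothetical, the location labels are assumed to be normalized to a common vocabulary beforehand, and the example predictions are invented for illustration.

```python
from collections import Counter


def consensus_location(predictions, min_votes=2):
    """Return the location predicted by at least `min_votes` tools, else None."""
    if not predictions:
        return None
    location, votes = Counter(predictions.values()).most_common(1)[0]
    return location if votes >= min_votes else None


# Invented predictions for two hypothetical proteins.
agreeing = {"PSORTb": "inner membrane", "CELLO": "inner membrane", "PSLPred": "cytoplasm"}
conflicting = {"PSORTb": "periplasm", "CELLO": "inner membrane", "PSLPred": "cytoplasm"}

print(consensus_location(agreeing))
print(consensus_location(conflicting))
```

With three tools and a two-vote minimum, at most one location can reach the threshold, so no tie-breaking rule is needed.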

Druggability analysis
The membrane proteins were searched against the DrugBank 5.0 database (30) using BLASTp with an e-value of 0.001 and otherwise default parameters to classify novel drug targets or vaccine candidates.
'Anti-target' analysis
Adverse pharmacological effects can develop in the host when therapeutic compounds bind non-specifically to host proteins that share homology with the pathogenic targets instead of acting on the pathogenic proteins themselves. Such host proteins are termed anti-targets; a total of 210 human anti-target proteins were identified via a literature search, as shown in Supplementary Table 4 (31). The novel membrane proteins were submitted to BLASTp at NCBI against these 'anti-targets' with e-value<0.005 and identity of <25%.

Antigenicity and allergenicity of druggable proteins
The magnitude of protein antigenicity is of great importance for designing a subunit vaccine. VaxiJen v2.0 server (

Comparison of predicted sequences with other strains
The spectrum of a therapeutic across the entire homologous bacterial community can be determined by estimating the conservancy patterns of the predicted sequences among commonly used strains of the same species. BLASTp was used for the analysis with all default parameters, except that the organism option was set to "Helicobacter pylori".

Virulence factor analysis
A comprehensive data set of virulence factors for sixteen major bacterial pathogens is provided by the Virulence Factor Database (VFDB) (33). These virulence factors enable colonization by bacterial pathogens, leading to the destruction of host cells.
Immunoinformatic approach for epitope-based peptide design and validation
To predict 20-mer linear B-cell epitopes, the BCPred server uses a kernel technique. BCPred predicts B-cell
NetCTL accepts sequences in FASTA format for various analyses, including MHC class I binding affinity prediction, TAP transport efficiency, and C-terminal cleavage prediction. MHC-I binding and proteasome-dependent C-terminal cleavage were predicted using an artificial neural network and a weight matrix (39).
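The way the three NetCTL predictions are merged into one ranking can be illustrated as a weighted sum. This is a sketch under stated assumptions: the weights (0.15 for cleavage, 0.05 for TAP) and the 0.75 threshold are assumed to mirror the commonly used server defaults and are not given in this paper, and the input scores are invented.

```python
def combined_score(mhc_score, cleavage_score, tap_score,
                   w_cleavage=0.15, w_tap=0.05):
    """Weighted sum of MHC binding, C-terminal cleavage and TAP scores.

    The default weights are assumptions, not values stated in this study.
    """
    return mhc_score + w_cleavage * cleavage_score + w_tap * tap_score


def is_predicted_epitope(score, threshold=0.75):
    """Apply the (assumed) combined-score cutoff."""
    return score >= threshold


# Invented scores for a hypothetical 9-mer.
toy = combined_score(mhc_score=0.60, cleavage_score=0.90, tap_score=2.0)
print(round(toy, 3), is_predicted_epitope(toy))
```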

Construction of peptide library and analysis of peptide-target interactions
Based on each peptide's sequence information, the 3D structures were generated using Discovery Studio Visualizer v.20. Energy minimization was done using Maestro software V12.7. Gasteiger charges [40] were added to the prepared peptide structures using AutoDock Tools, and the structures were saved as '.PDBQT' files. Subsequently, a molecular docking study was performed to screen the library of modeled epitopes against the human TLR2 receptor (PDB ID: 6NIG) to predict the best-fit peptides and their binding interactions. The target protein structure was retrieved from the Protein Data Bank (PDB) and edited to remove hetero atoms and add Kollman charges [41]. A grid box of size 74 × 110 × 110 (x, y, z) centered at (4.468, 7.467, 23.688) was set around the residues forming the binding site of the target protein, and the Broyden-Fletcher-Goldfarb-Shanno algorithm implemented in AutoDock Vina was employed to study appropriate binding modes of each peptide in their most stable conformation [42]. For the peptide molecules, all the torsions were allowed to rotate during docking [43].
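For readers who wish to reproduce a comparable docking run, the grid parameters above can be written to an AutoDock Vina configuration file. The sketch below is an assumption-laden illustration: the receptor and ligand file names are hypothetical, the exhaustiveness of 100 is the value reported for the re-docking step, and the box size is passed through exactly as reported, although the text does not state whether it is given in grid points or in Ångström (Vina interprets `size_x`, `size_y` and `size_z` in Ångström).

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=100):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


# File names are hypothetical placeholders.
config_text = vina_config(
    receptor="tlr2_6NIG.pdbqt",
    ligand="PEP_19.pdbqt",
    center=(4.468, 7.467, 23.688),
    size=(74, 110, 110),
)
print(config_text)
```

The resulting text could be saved to a file and passed to Vina with its `--config` option.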
Declarations

Table 3 Virulence factor of the hypothetical protein lpxF.

Table 5 List of the total T-cell epitope vaccine candidate peptides predicted by the NetCTL tool (columns: S. No, Peptide Sequence).

Figure 1 Schematic representation of the overall workflow employed in the subtractive proteomics approach to identify the vaccine candidate protein.