Essential genes in the target pathogen
Because the essential genome of pathogenic strains such as ST131 is poorly understood, the model strain BW25113 was used as the basis for our study. The predicted amino acid sequences of 353 out of the 358 essential genes identified by Goodall et al.8 were retrieved (Figure 1). The remaining five genes (ttcC, yddL, yedN, ygeF, ygeN), all labelled ‘pseudogenes’ or ‘putative protein’ and absent from the Keio collection9, were excluded from further analysis. Their assignment as essential by Goodall et al.8 may thus be an artefact of the methodology employed in the original study.
The sequences of the 353 proteins were compared to those found in E. coli O25b:H4-ST131 using a pre-defined cut-off (E-value ≤ 10E-10 or ≥ 70% sequence identity, and ≥ 75% alignment length, see Materials and Methods). All of the 353 essential BW25113 proteins were associated with at least one hit in ST131, apart from YqeL (Supplementary Materials 1). However, 15 proteins were excluded as they scored below the cut-off threshold. Inspection of these sequences revealed that many of them were prophage-related or uncharacterized proteins. The presence of phages in a bacterial genome is expected to vary with the specific strain history, and this may explain the observed difference between the laboratory strain BW25113 and the epidemic clonal lineage ST131. This left 337 essential and conserved E. coli proteins in the pipeline for further analyses.
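The cut-off can be written as a small predicate. The following is a minimal sketch, not the authors' implementation: it assumes the criterion reads as (E-value ≤ 10^-10 OR identity ≥ 70%) AND aligned length ≥ 75%, and that each BLAST hit is reduced to three numbers. The names `passes_cutoff` and `is_conserved` are hypothetical.

```python
# Assumed reading of the cut-off described in the text:
# (E-value <= 1e-10 OR identity >= 70%) AND aligned length >= 75%.
EVALUE_MAX = 1e-10      # assumption: "10E-10" is read as 10^-10
IDENTITY_MIN = 70.0     # percent sequence identity
COVERAGE_MIN = 75.0     # percent of the query covered by the alignment


def passes_cutoff(evalue: float, pct_identity: float, pct_aligned: float) -> bool:
    """Return True if a single hit counts as a homologue under the cut-off."""
    similar = evalue <= EVALUE_MAX or pct_identity >= IDENTITY_MIN
    return similar and pct_aligned >= COVERAGE_MIN


def is_conserved(hits) -> bool:
    """A query is conserved if at least one of its hits passes the cut-off.

    `hits` is an iterable of (evalue, pct_identity, pct_aligned) tuples.
    """
    return any(passes_cutoff(*hit) for hit in hits)
```

Under this reading, a hit with an E-value of 5.1, 69.2% identity and 61.9% alignment fails on every criterion, whereas a hit with a very low E-value still fails if it covers less than 75% of the query.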
Homology to proteins in mammalian hosts
The second step of the analysis aimed at removing E. coli targets homologous to the human proteome. A high degree of similarity between the pathogen’s target and one or more proteins in the host proteome may result in off-target binding of a drug, leading to toxicity and unwanted side effects. The 337 selected essential proteins were therefore compared to the human proteome, leading to 186 proteins fulfilling the same stringent cut-offs as above (Figure 1, Supplementary Materials 2).
Homology to proteins in beneficial taxa of the gut microbiota
The next step in the selection pipeline aimed to exclude proteins with high similarity to those found in representatives of the beneficial gut microbiota. Given the complexity and variability of the gut microbiome, we decided to focus on seven taxa containing species that have previously been shown to have beneficial and protective effects on the host5,6,10–16: Faecalibacterium, Prevotella, Ruminococcus, Bacteroides, Lactobacillus, Lachnospiraceae and Bifidobacterium (Supplementary Materials 3-9). The 186 proteins were blasted against the abovementioned taxa using the same cut-off values as before (Figure 1). As expected, this step was the most selective, leaving just 31 proteins for further analysis (Table 1). It removed all targets of commercially available antibiotics, including ParC (target of fluoroquinolones), FtsI and MrdA (targets of β-lactams), parts of the 30S and 50S ribosome (targets of macrolides, aminoglycosides and tetracyclines) and RNA polymerase (target of rifamycins).
Among the 31 identified proteins, only PheM and TrpL, the leader peptides of the Phe-tRNA synthetase and Trp biosynthetic operons respectively, were found to be missing completely in all taxa. No hits for YobI (a protein of unknown function) were found in any of the taxa apart from Faecalibacterium, where a single hit was found (E-value 5.1, 61.9% alignment and 69.2% identity). SafA (part of the low pH stress response) was found to be missing in Lachnospiraceae, Bifidobacterium and Faecalibacterium. Furthermore, WzyE (probable ECA polymerase), MreD (rod shape determining protein), LolA and LolB (both part of the lipoprotein transport pathway) lacked hits in Bifidobacterium, FtsL (a cell division protein) and MukF (involved in chromosome partition) lacked hits in Bacteroides, and YciS (lipopolysaccharide assembly protein A) lacked hits in Faecalibacterium. All other proteins were associated with hits below the cut-off in all taxa.
Due to the high stringency applied in the above step, potentially valuable targets may have been missed in the selection process. Thus, a second analysis of the microbiota BLAST results was undertaken to find proteins associated with only a few hits above the cut-off. Six proteins (BamD, YgfZ, HolA, YrfF, LptD and ZipA) were each found to be associated with one hit only and had all been excluded based on the E-value cut-off rather than sequence identity. These six proteins were therefore included in further analyses, leading to 37 proteins as potential E. coli-selective targets (Table 1).
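The rescue step described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually used: hits are (E-value, % identity, % aligned) tuples, the cut-off is read as (E-value ≤ 10^-10 OR identity ≥ 70%) AND aligned length ≥ 75%, and a protein is rescued when it has exactly one hit above the cut-off and that hit qualified through its E-value alone. The function names and the `max_hits` parameter are hypothetical.

```python
EVALUE_MAX = 1e-10    # assumed reading of the E-value cut-off
IDENTITY_MIN = 70.0   # percent sequence identity
COVERAGE_MIN = 75.0   # percent aligned length


def passes_cutoff(evalue, pct_identity, pct_aligned):
    """True if a hit counts as a homologue under the assumed cut-off."""
    similar = evalue <= EVALUE_MAX or pct_identity >= IDENTITY_MIN
    return similar and pct_aligned >= COVERAGE_MIN


def rescue_candidates(hits_by_protein, max_hits=1):
    """Return excluded proteins that merit a second look.

    A protein is rescued if it has between 1 and `max_hits` hits above the
    cut-off, and every such hit has identity below IDENTITY_MIN (that is,
    it was excluded on E-value rather than on sequence identity).
    """
    rescued = []
    for protein, hits in hits_by_protein.items():
        over = [h for h in hits if passes_cutoff(*h)]
        evalue_only = all(h[1] < IDENTITY_MIN for h in over)
        if 0 < len(over) <= max_hits and evalue_only:
            rescued.append(protein)
    return sorted(rescued)
```

Proteins with no hit above the cut-off are not returned, because they were never excluded in the first place.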
Target conservation in K. pneumoniae
We evaluated presence and essentiality of the selected targets in K. pneumoniae KPNIH1, another global priority pathogen closely related to E. coli. The essentiality of the 37 proteins was checked against the library generated by Ramage et al.17, together with conservation of the amino acid sequence as above (Figure 1, Supplementary Materials 10). Eighteen were found to be essential in both organisms, all displaying high sequence conservation, apart from ZipA and HipB. However, the majority of the targets not reported to be essential in K. pneumoniae fulfilled the selection criteria, apart from HigA, IraM, SafA, YobI and TrpL (Table 1).
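The cross-check against K. pneumoniae amounts to sorting each target by two independent properties: reported essentiality and sequence conservation. The sketch below illustrates that bookkeeping only; the function name and the input sets are hypothetical and would in practice come from the essentiality library and the BLAST comparison.

```python
def classify_targets(targets, essential_in_kp, conserved_in_kp):
    """Group targets by essentiality and conservation in a second organism.

    `essential_in_kp` and `conserved_in_kp` are sets of protein names.
    Returns a dict with four lists, preserving the order of `targets`.
    """
    groups = {
        "essential_conserved": [],
        "essential_divergent": [],
        "nonessential_conserved": [],
        "nonessential_divergent": [],
    }
    for name in targets:
        ess = "essential" if name in essential_in_kp else "nonessential"
        con = "conserved" if name in conserved_in_kp else "divergent"
        groups[f"{ess}_{con}"].append(name)
    return groups
```

For example, a target such as ZipA (essential in both organisms but below the conservation cut-off) would land in `essential_divergent`, while HigA (not reported as essential and below the cut-off) would land in `nonessential_divergent`.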
Biological function of selected targets
Of the 37 identified targets (Table 1), several were found to share or have similar biological functions (Figure 2). One of the largest groups comprises the proteins involved in outer membrane (OM) biogenesis and maintenance (Figure 2). Here, BamD is directly associated with the OM and is part of the β-barrel assembly machinery (BAM). LptA, LptD, LptE and LptF are all part of the lipopolysaccharide (LPS) transport (Lpt) machinery. LolA and LolB are found in the periplasmic space and on the periplasmic side of the OM respectively, and belong to the lipoprotein transport machinery responsible for delivering OM lipoproteins to all three of the OM assembly machineries (LOL, BAM and LPT)18. SecE is part of the SecYEG protein translocation machinery responsible for transporting proteins into the periplasm19, and PssA is involved in phospholipid biosynthesis20. Furthermore, the inner membrane protein YciS (also known as LapA, lipopolysaccharide assembly protein A) is part of a machinery responsible for the envelope stress response and regulation of LPS production21. Finally, although not associated with OM maintenance, TonB is part of the machinery responsible for actively importing iron across the OM into the cell22 (Figure 2).
Another two functional groups comprise the proteins responsible for DNA replication (HolA, HolD, PriB, DnaT and YgfZ) and cell division (MukB, MukE, MukF, FtsB, FtsL, FtsQ and ZipA) (Figure 2). DNA replication is a tightly controlled mechanism and DNA Polymerase III holoenzyme is the major replication complex in E. coli, where both HolA (δ subunit) and HolD (ψ subunit) make up parts of the clamp loading complex23. DNA damage can cause this machinery to stall and disassemble on the chromosome, leading to replication failure. To restart replication the cell must make use of the replication restart primosome, where both the PriB helicase and DnaT primase are found24. YgfZ has been shown to be part of the system regulating chromosomal replication25. MukBEF are unique to the γ-proteobacteria and are involved in cell division, making up the only E. coli condensin for chromosome replication, segregation and organisation26. Further downstream in this process the transmembrane complex FtsBL is found27, together with FtsQ28 and ZipA. In a related process, MreD is involved in determining cell shape29.
Among the proteins in the stress response category, CydX (Figure 2) is part of the CydAB cytochrome bd oxidase complex involved in aerobic respiration and maintaining the charge across the membrane used for synthesizing ATP30. IraM is a regulator of σS, the stationary phase sigma factor responsible for controlling expression of a plethora of genes involved in stress response31. Although the exact function of YobI has not yet been established, it has been shown to accumulate upon heat shock32.
Among the biosynthetic genes, WzyE has been implicated in assembly of the enterobacterial common antigen33, TrpL is involved in controlling tryptophan biosynthesis34 and HemD is a uroporphyrinogen III synthase35 (Figure 2).
HipB and HigA together make up the category of antitoxins of the Type II toxin-antitoxin system, and work to counteract the effect of their cognate toxins36. Each the sole member of its functional group, PheM is a target of transcriptional regulation (Figure 2) and is responsible for attenuation of the phenylalanyl-tRNA synthetase37, while SafA is a two-component system connector38.
Finally, no information regarding biological function could be found for the three proteins YcaR, YrfF and YdhL (Figure 2).
Target localisation
An essential requirement for developing an efficient antimicrobial drug is target access. This is especially important in Gram-negative bacteria, where the double membrane structure acts as a permeability barrier, efficiently blocking many compounds from accessing intracellular targets. Subcellular localisation (SCL) was therefore considered when evaluating each protein’s druggability. Swiss-Prot, the manually annotated section of UniProtKB, was used to find information on SCL for each of the 37 selected proteins (Figure 2, Table 1). The target proteins were found to be located in the Inner Membrane (IM), Outer Membrane, Cytoplasm, Nucleoid or Periplasm (Figure 2). Notably, PssA was annotated as located in both the IM and the cytoplasm. For eleven proteins (PriB, HemD, DnaT, HolA, HolD, YdhL, HigA, HipB, PheM, TrpL and YobI), no SCL had been experimentally determined. Here, the four OM-associated proteins (LptD, LptE, LolB and BamD) are promising potential targets, especially LptD, which contains extracellular domains.
Existence of known inhibitors
Next, the literature was searched for previously reported inhibitors of the selected targets. As expected, none of the targets presented in Table 1 are inhibited by commercially available antibiotics. Through analysis of the scientific literature we were able to identify inhibitors of a few of the listed targets but, to our knowledge, none has gone beyond laboratory studies: the ZipA/FtsZ interaction has been reported to be inhibited by certain antimicrobial compounds39,40; the insect peptide Thanatin blocks LptA41; compound IMB-881 blocks the interaction between LptA and LptC42; JB-95 inhibits β-barrel proteins including LptD43; MAC13243 inhibits LolA44; BamD is inhibited by an inhibitory peptide45 while the compound IMB-H4 has been shown to block the BamA-BamD interaction46; and MukB is inhibited by the small molecules Michellamine B and NSC26059447. Finally, multiple inhibitory compounds targeting TonB have been identified48–50. Thanatin has been shown to possess antimicrobial activity against several Gram-negative bacteria beyond E. coli, including K. pneumoniae, Salmonella typhimurium and Enterobacter cloacae41. IMB-H4 was also able to inhibit growth in K. pneumoniae, P. aeruginosa and A. baumannii46. NSC176319 was found to be active against S. aureus and permeabilised P. aeruginosa and A. baumannii47. JB-95 was reported to have antimicrobial activity against A. baumannii, P. aeruginosa and Staphylococcus aureus43, MAC13243 has been shown to also be active against P. aeruginosa44 and TonB inhibition has been shown to affect A. baumannii49. With the information provided in this study, some of these inhibitors may represent starting scaffolds for development into pathogen-specific antibacterials. In addition, they might represent useful tools in validating future target-based assays.
Target structure
Structure-guided drug design is a powerful in silico approach that can rapidly screen millions of compounds for their ability to dock into a desired target, and identified hits can subsequently be tested in vitro. Thus, 3D structures at a high enough resolution represent an advantage for the targets identified in this study.
Information retrieved from the Protein Data Bank (PDB)51 showed that 3D structures at a resolution of <3 Å existed for 18 proteins, >3 Å for 6 proteins and no structure could be found for 3 proteins, while YrfF was associated with a structure but no resolution information was reported in the database, and no structure had been reported for the remaining nine protein targets (Table 1).