Identification of a novel interaction between SMC1 DNA damage repair protein and Escherichia coli O157: H7 EspF using co-immunoprecipitation combined with mass spectrometry

Background: The enterohemorrhagic Escherichia coli (EHEC) O157: H7 effector EspF is multifunctional and triggers several damage processes in host cells. However, the interactions of EspF and of its N- and C-termini with host proteins during EHEC O157: H7 infection remain unclear. Results: In this study, we used co-immunoprecipitation combined with mass spectrometry to screen for EspF-interacting proteins. A total of 311 host proteins were detected. The N-terminus of EspF was found to interact with 192 proteins, whereas 205 proteins interacted with the C-terminus of EspF. These proteins are mainly involved in RNA splicing, endoplasmic reticulum stress, and a variety of metabolic signaling pathways. We verify here for the first time that SMC1 interacts with EspF, more strongly with its C-terminus, and provide evidence that EspF increases p-SMC1 levels. p-SMC1 is known to influence S-phase cell cycle arrest and is usually expressed during periods of DNA damage. Surprisingly, mass spectrometry revealed that EspF can also phosphorylate H2AX, suggesting that EspF may directly mediate DNA damage through SMC1 phosphorylation. Conclusion: Taken together, this is the first study describing the interaction between EspF and SMC1. Our work lays a foundation for further research on how EspF may directly mediate host cell DNA damage, apoptosis, and even colorectal carcinogenesis.

Severe cases may be life-threatening. EHEC O157: H7 uses the type III secretion system (T3SS) to adhere to the brush border of epithelial cells and then injects effector proteins into host cells. EspF is one of the most important virulence factors of A / E pathogens [3]. EspF is present in EHEC, enteropathogenic Escherichia coli (EPEC), and Citrobacter rodentium, which are potentially harmful to humans. It targets the mitochondria and nucleoli [4,5], destroys the tight junctions of intestinal epithelial cells [6], leads to the disappearance of intestinal epithelial microvilli, and induces host cell apoptosis [7]. Because of its many biological effects, it is known as the "Swiss Army Knife" of bacterial pathogens [8].
The N-terminus of EHEC O157: H7 EspF (1−73 aa) contains a secretion signal (1−20 aa), a host cell mitochondrial targeting signal (MTS, 1−24 aa) [9], and a nucleolar targeting domain (NTD, 21−74 aa) [5]. Roxas found that EspF localizes to mitochondria, destroys the mitochondrial membrane potential, and activates the apoptotic proteases 3 and 9. The apoptotic proteases can cleave the epidermal growth factor receptor (EGFR) of host cells, leading to the degradation of EGFR and a dramatic increase in host cell death in the late stages of infection [10]. The C-terminus (73−248 aa) is composed of four highly homologous proline-rich sequences (PRRs), each containing a binding motif for the SH3 (Src homology 3) domain of the eukaryotic protein SNX9 (sorting nexin 9) and an N-WASP (neuronal Wiskott−Aldrich syndrome protein) binding domain, a possible actin-binding motif (ABM) [8,11]. In addition, the results of Amin show that EspF has an anti-phagocytic effect. The EspF of EHEC O26: H11 and EPEC O127: H6 can prevent bacteria from being engulfed by macrophages through the PI3K pathway, whereas this ability is significantly reduced for the EspF of O157: H7 [12].
Although there has been some research on the interaction between EspF and host proteins, the molecular mechanisms of EspF interaction with host proteins and the impact of these interactions on cell damage and apoptosis are still unclear. The identification of the interaction between EspF and the host is important to elucidate the pathogenic mechanism.
Co-immunoprecipitation combined with mass spectrometry (CoIP-MS) is one of the most widely used high-throughput techniques for discovering protein-protein interactions. In our research, we analyzed the molecular functions (MFs), biological processes (BPs), and cellular components (CCs) of the related proteins by Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Clusters of Orthologous Groups (COG) functional annotations. The STRING online tool was used to analyze interactions between target proteins, and Cytoscape software was used to draw the protein-protein interaction (PPI) network. The interactions were validated by immunoprecipitation, and the regions of interaction were identified. Co-localization of EspF and the target proteins was examined by confocal microscopy. Our research produced a protein network map between EspF and the host proteins, laying a foundation for further research on how EspF may directly mediate DNA damage in host cells and even contribute to colorectal carcinogenesis.

Results
Isolation of differential bands in EspF / EspF-N / EspF-C groups from cell lysates
After the pEGFP-EspF, pEGFP-EspF / N, and pEGFP-EspF / C plasmids were transfected into 293T cells for 48 hours, the cell lysates were applied to Flag columns and IgG columns for co-immunoprecipitation (Fig. 1). pEGFP-EspF was about 58 kDa, pEGFP-EspF / N was 30 kDa, and pEGFP-EspF / C was 55 kDa. Compared with the IgG group, there were two differential bands between 35 and 40 kDa in the pEGFP-EspF group. At 40 kDa, there was a differential band in the pEGFP-EspF / N group. At 130 kDa, there was a differential band in the pEGFP-EspF / C group. In addition to these bands, the bands at the same positions in the lanes of the respective IgG groups were also cut out. The bands were digested and analyzed by mass spectrometry to detect the interacting proteins, and the Flag group-specific proteins (minus the IgG group proteins) were considered the putative interacting proteins.
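The subtraction of IgG-group identifications from Flag-group identifications can be sketched as a set difference. This is only an illustration: the function name and the protein lists below are hypothetical placeholders, not data or software from this study.

```python
def flag_specific(flag_hits, igg_hits):
    """Return proteins identified in the Flag pull-down but not in the IgG control."""
    return set(flag_hits) - set(igg_hits)


# Hypothetical identifications from one excised band and the IgG band
# cut at the same position (placeholders, not results from this study).
flag_band = {"SMC1", "RPS6", "TUBB", "ALB", "KRT1"}
igg_band = {"ALB", "KRT1"}

candidates = flag_specific(flag_band, igg_band)
print(sorted(candidates))
```

Proteins present in both lanes are discarded as non-specific binders, so only the Flag-specific identifications are carried forward as putative interactors.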

Prediction and analysis of proteins interacting with EspF
A total of 708 proteins were identified in this work: 311, 192, and 205 proteins were detected in the pEGFP-EspF group, pEGFP-EspF / N group, and pEGFP-EspF / C group, respectively. We also performed functional annotation (including GO, pathway, and STRING analyses) to characterize the identified proteins. Through these analyses, we attempted to discover essential proteins.
All possible target proteins that interacted with EspF were loaded into the DAVID database for KEGG pathway annotation and GO enrichment. The threshold was set to p ≤ 0.05, and pathways or gene functions with higher counts were analyzed. The top 20 pathways were plotted with GraphPad Prism 6 (Table 1). GO annotation analysis of the differential band at about 38 kDa in the pEGFP-EspF group revealed that the interacting proteins were involved in 25 biological processes (BPs). Of these, the main ones were intracellular processes (12.3%), metabolic processes (10.9%), biological regulation (8.8%), and immune stimulation (6.6%). Cellular component (CC) enrichment mainly involved cellular anatomical entities (37.7%), intracellular (36.6%), and protein-containing complex (20.9%). Molecular functions (MFs) mostly involved binding (53.6%) and catalytic activity (22.3%) (Fig. 2A). Pathway analysis showed that the interacting proteins were involved in 183 pathways, notably Metabolic pathways (32.6%), Carbon metabolism (19.6%), and Biosynthesis of amino acids (13.0%) (Fig. 2B).
Table 1 The top 20 proteins of the 38 kDa differential band between the EspF group and IgG group.
We used STRING to analyze the target proteins with which EspF interacted and found that RPS6, RPL14, and EIF2S1 had the highest connectivity (Fig. 2C). RPS6 plays an essential role in controlling cell growth and proliferation by selectively translating specific kinds of mRNAs [13]. RPL14 is a large ribosomal subunit component that plays a role in mRNA catabolism and translation [14]. EIF2S1 works in the early stages of protein synthesis by forming a ternary complex with GTP and initiator tRNA [15].
This analysis suggested that, in addition to gene transcription regulation and protein synthesis, EspF may also play a crucial role in cell proliferation and catabolism.
The GO annotation analysis of the 36 kDa differential band showed that the interacting proteins were involved in 26 BPs, such as intracellular processes (12.5%), metabolic processes (10.2%), and biological regulation (9.0%). The CC and MF results were consistent with those of the 38 kDa differential band (Fig. 3A). Pathway analysis showed that the interacting proteins were involved in 150 pathways, including Metabolic pathways (23.0%), Protein processing in the endoplasmic reticulum (13.1%), and Parkinson disease (11.5%) (Fig. 3B). STRING analysis showed that RPL7A, RPS20, and EIF2S1 had a high degree of connectivity, and we found that there was also a high degree of connectivity between MDH2 and GOT2 (Fig. 3C). Among them, MDH2 plays a role in cell metabolism and amino acid acetylation [16]. GOT2 is necessary for metabolite exchange between mitochondria and cytoplasm and plays a crucial role in amino acid metabolism [17].
Prediction and analysis of proteins interacting with the EspF N- or C-terminus
The GO annotation analysis of the differential band at about 40 kDa in the pEGFP-EspF / N group showed that the interacting proteins were involved in 25 BPs, such as intracellular processes (12.3%), metabolic processes (10.8%), and biological regulation (9.2%). The CC and MF analyses were similar to those for EspF (pEGFP-EspF). Pathway analysis revealed 228 interacting proteins that were mainly involved in Metabolic pathways (27.1%), Protein processing in the endoplasmic reticulum (9.41%), and Pathways in cancer (8.2%) (Fig. 4A and B). STRING analysis of target proteins interacting with the EspF N-terminus also found that RPL8, RPS9, and EIF3I had the highest degree of connectivity, indicating that EspF may use its N-terminus to recognize binding sites on ribosomes (Fig. 4C).
GO annotation analysis of the 130 kDa differential band in the pEGFP-EspF / C group showed that BP enrichment mostly involved translation (23.1%), oxidation-reduction process (15 […] COG analysis showed that the interacting proteins were mostly involved in translation, ribosomal structure, biogenesis, posttranslational modification, protein turnover, and chaperones (Fig. 5A). KEGG analysis showed that most of the cellular processes involved protein transport and signal generation affecting cell growth and apoptosis. Of the interacting proteins, 30.3% were localized in the cytoplasm and 27.27% in the nucleus (Fig. 5B, C). The Venn diagram showed 93 proteins that were involved in all four of the above analyses (Fig. 5E). Among these proteins, TUBB and ANXA2 showed strong interaction with EspF-C. TUBB, a main component of tubulin, has GTPase activity and plays a key role in microtubule cytoskeleton organization [18]. Studies have shown that EspF interacts with SNX9 in the cytoplasm to induce the formation of membrane tubules and changes in the host cell membrane [19]. In addition to binding SNX9, EspF can also activate N-WASP to induce actin polymerization, TJ disruption, and anti-phagocytosis [20]. Moreover, EspF also directly binds to […]

SMC1 was identified as a novel EspF-interacting protein, and its interaction was more robust with the EspF-C terminus
To our knowledge, this is the first study of the mechanism of interaction between EspF and DNA damage repair proteins. Western blotting and confocal analysis further confirmed the interaction between EspF and SMC1. Co-immunoprecipitation results confirmed that EspF interacted with SMC1, and the interaction was stronger with the EspF-C terminus (Fig. 6A). Immunofluorescence analysis also showed that EspF and SMC1 were co-localized in the cytoplasm (Fig. 6B), and that EspF relocated more SMC1 from the nucleolus into the cytoplasm, suggesting that SMC1 may not play its usual role in the nucleolus.
To investigate the EspF-SMC1 interplay, we measured SMC1 and its phosphorylation levels after Caco2 cells were transfected with EspF. Immunofluorescence showed that EspF could significantly increase the expression of p-SMC1, and p-SMC1 was distinctly localized in the cytoplasm and colocalized with EspF (Fig. 6C). Compared with the control, the level of SMC1 remained unchanged after transfection with EspF, but p-SMC1 significantly increased (Fig. 6D-E). We then verified the results by infecting HT-29 cells with the bacterial strains. The expression of p-SMC1 in cells infected with EDL 933 was higher than in cells infected with the ΔespF strain, and infection with the complemented strain ΔespF / pespF restored p-SMC1 expression. The above results verified that EspF could increase the expression level of p-SMC1.
In general, when cellular DNA is damaged, cyclin-mediated cell cycle arrest will occur. During this period, DNA damage repair proteins are recruited to double-strand breaks (DSBs) [26]. Therefore, we speculated that EspF might lead to DNA damage, which stimulates the S-phase checkpoint by increasing p-SMC1 expression, thus mediating damage repair.
EspF may mediate DNA damage by modifying the histones
Mass spectrometry results showed that EspF could modify multiple sites of various proteins (Table 2).
Among the modification results, SFXN1, HAX1, EIF3I, ATG16L1, and DNA damage binding proteins had high scores. SFXN1 is a mitochondrial serine transporter that mediates serine import into mitochondria and plays an essential role in the one-carbon metabolic pathway [27]. EspF can cause oxidative phosphorylation and methylation of SFXN1, which may affect the transport of metabolic components into and out of the mitochondria.
Table 2 EspF-mediated phosphorylation, acetylation, and methylation of the host proteins. Columns: Protein symbol, Score, Pep before, Pep_seq, Pep after, Pep mod.
MS results showed that host proteins interacting with EspF were mostly involved in metabolic pathways and localized in the cytoplasm and nucleolus. We found that EspF interacted strongly with ribosomal RPL, RPS, and EIF family proteins, and that EspF can phosphorylate them. Previous proteomics studies have shown that the levels of many ribosomal proteins in intestinal cells decrease after EPEC infection [36]. In cells expressing EspF, pre-rRNA synthesis is blocked. At the same time, EspF-dependent EPEC infection reduces the expression level of ribosomal protein RPL9 and alters the localization of RPS5 and U8 small nucleolar RNA (snoRNA) [37]. Our results provide further support for the hypothesis that EspF may exercise its biological function by regulating ribosomal protein synthesis and relocation. Future research will attempt to decipher the mechanism by which EspF leads to the inhibition of ribosome synthesis.
EspF targets mitochondria and regulates the expression of DNA mismatch repair proteins in host cells through post-transcriptional manipulation, leading to depletion of Apc and MMR proteins in host cells [38], and increases the instability of microsatellite DNA sites, which is a precursor to DNA damage and can even lead to colon cancer [39]. In this work, we also found that EspF can lead to H2AX phosphorylation and modify DDB2. Phosphorylated H2AX is an indicator of DNA damage. This series of evidence suggests that when EspF enters the host cell, it may cause DNA damage at an early stage by regulating the expression of damage repair proteins. SMC1 is a chromosomal structural protein. When DNA is damaged, the upstream kinase ATM phosphorylates SMC1 at DSBs to form the BRCA1-NBS1-MRE11 complex, and SMC1 activates the S-phase checkpoint in DNA damage repair [40].
Abnormal gene expression or mutation can lead to a deficient DNA damage repair pathway, which is closely related to tumorigenesis [41]. Thus, we focused on the interaction between EspF and SMC1. Immunofluorescence and co-immunoprecipitation results confirmed the interaction, and the interaction with the EspF-C terminus was stronger than with the EspF-N terminus. Furthermore, EspF can increase the expression of p-SMC1 in Caco2 cells and localize it to the cytoplasm. Thus, EspF may mediate DNA damage repair, cell proliferation, and even carcinogenesis by phosphorylating SMC1 (Fig. 7). The DNA damage effect still needs to be studied by transfecting siSMC1 or adding ATMi, an inhibitor of ATM. There is already evidence that infection of intestinal epithelial cells by A/E E. coli causes DNA damage in host cells and accelerates the development of cancer through the toll-like receptor signaling pathway, the NF-κB pathway, and other cellular inflammation pathways [42]. However, there is no research on whether EspF can directly cause DNA damage, which is worthy of further study.
Meanwhile, we separately studied the interacting proteins and pathogenic mechanisms of EspF-N/C terminus. We found that the proteins interacting with the N-terminus are mostly involved in metabolic pathways and protein processing in the endoplasmic reticulum. Our previous research proved that the N-terminus is essential for cell apoptosis, inflammatory response, and animal toxicity [43].

Conclusion
In this study, we identified 311 host proteins that interact with EHEC O157: H7 EspF and used bioinformatics enrichment to analyze their molecular functions, BPs, and cellular pathways. These findings provide new candidates for EspF interactions and suggest that EspF can phosphorylate H2AX and regulate the DNA damage repair process by interacting with SMC1. We also provide a PPI network of the interactions of EspF with the host, which offers a basis for further study of the protein interactions that mediate the interplay between EHEC O157: H7 and host cells.
The primers used in this study were synthesized by Sangon Biotech (Shanghai, China). Gene sequencing was performed by Guangzhou IGE Biotechnology.
Construction of pEGFP-EspF, pEGFP-EspF / N, and pEGFP-EspF / C-terminus encoding plasmids
We amplified the espF gene and its N-terminus (1−219 bp) and C-terminus (220−747 bp) from the genome of EDL 933 with LA high-fidelity enzyme. Each constructed plasmid was labeled with 3xFlag. […] CGCGGATCCCCACCTCCCC amplified the espF / C. The PCR products were digested with EcoRI / BamHI restriction enzymes and ligated into the pEGFP-N1 plasmid to generate the pEGFP-EspF, pEGFP-EspF / N, and pEGFP-EspF / C plasmids. Sequencing verified the integrity of the constructed plasmids.
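As a quick consistency check on the fragment boundaries given above, the nucleotide ranges can be converted to codon numbers. The helper below is a hypothetical sketch. It assumes 1-based, inclusive coordinates that are in frame with the first codon of espF.

```python
def nt_to_codon_range(start_nt, end_nt):
    """Convert a 1-based, inclusive, in-frame nucleotide range to codon numbers."""
    if (start_nt - 1) % 3 != 0 or end_nt % 3 != 0:
        raise ValueError("range is not in frame with codon 1")
    first_codon = (start_nt - 1) // 3 + 1
    last_codon = end_nt // 3
    return first_codon, last_codon


n_term = nt_to_codon_range(1, 219)    # codons 1 to 73
c_term = nt_to_codon_range(220, 747)  # codons 74 to 249
print(n_term, c_term)
```

Under these assumptions, the 1−219 bp fragment corresponds to codons 1−73, matching the N-terminus (1−73 aa). The 220−747 bp fragment corresponds to codons 74−249. If codon 249 is the stop codon (an assumption, not stated in the text), this agrees with a 248-residue protein.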
293T cells were cultured according to standard methods, and the cells were plated on 10 cm² culture dishes (NEST, Hong Kong, China). The cells were transfected with the pEGFP-EspF, pEGFP-EspF / N, and pEGFP-EspF / C plasmids using Lipofectamine 3000 (Thermo, USA) according to the manufacturer's instructions.

CoIP and Western Blot
We followed the instructions of the co-immunoprecipitation Pierce CoIP Kit 26149. We took 80 µl of the control agarose resin slurry and added 1 mg of the protein sample to remove the non-specifically

Mass Spectrometry
The decolorizing solution was added to the gel slice, which was then washed with mass spectrometry-grade water. The gel was ground, acetonitrile was added, the supernatant was removed, 50 ml of the enzymatic hydrolysis solution was added, and the sample was incubated at 37 °C for 6 h. Then, 0.1% formic acid in water was added, and the sample was shaken for 5 min to allow the gel to swell fully. Acetonitrile was added, and the sample was shaken for another 5 min. The sample was centrifuged, and the supernatant was recovered for analysis on a mass spectrometer.
Bacterial infections
EHEC O157: H7 EDL 933 was cultured in LB medium, and ΔespF and ΔespF / pespF were cultured in LB broth containing kanamycin, at 37 °C for 12 hours. HT-29 cells were seeded in 10 cm² culture dishes. When the cells reached 95% confluence, the cell monolayer was washed with PBS, and the HT-29 cells were infected with bacteria at an MOI of 100:1 and incubated in a humidified incubator at 37 °C and 5% CO2 for 6 hours. Then, the medium was aspirated, the cells were gently washed with PBS, and the proteins were collected for immunoblotting.
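The inoculum needed for a given MOI follows from simple arithmetic, sketched below. The host cell count and culture density are hypothetical values chosen for illustration and are not reported in this study.

```python
def bacteria_for_moi(host_cells, moi):
    """Number of bacteria needed to infect host_cells at the given MOI."""
    return host_cells * moi


def inoculum_volume_ml(host_cells, moi, cfu_per_ml):
    """Volume of bacterial culture (ml) that contains the required bacteria."""
    return bacteria_for_moi(host_cells, moi) / cfu_per_ml


# Hypothetical example: 1e6 HT-29 cells per dish and a culture at 1e9 CFU/ml.
cells_per_dish = 1_000_000
culture_density = 1_000_000_000

needed = bacteria_for_moi(cells_per_dish, 100)
volume = inoculum_volume_ml(cells_per_dish, 100, culture_density)
print(needed, volume)
```

In practice, the host cell number would be counted from a parallel dish and the culture density estimated by plating or optical density, so both inputs carry their own uncertainty.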

Immunofluorescence
Vero cells were plated on a confocal dish (35 mm, NEST, Hong Kong, China) and grown overnight. When the cells reached 50% confluence, they were transiently transfected with the pEGFP, pEGFP-EspF, pEGFP-EspF / N, and pEGFP-EspF / C vectors. After 48 hours, the cells were gently washed with PBS three times, then fixed in 4% paraformaldehyde for 15 min at room temperature, and blocked with 0.1% Triton X-100 for 30 min. The primary antibody was added, and the sample was incubated at 4 °C overnight. The sample was washed with PBS three times, incubated with the secondary antibody for 30 min, and then DAPI was added for nuclear staining. Cellular colocalization was observed under an FV1000 confocal microscope (Olympus, Tokyo, Japan).

Bioinformatics Analysis
The DAVID online tool was used to analyze protein MFs, BPs, CCs, cell localization, and KEGG signaling pathways. Taking the DAVID database as a background reference, we used GO and KEGG enrichment analysis to analyze the interacting proteins. The statistical analysis was based on Fisher's exact test.
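For readers unfamiliar with enrichment statistics, the one-sided Fisher's exact test for a single annotation term can be sketched as a hypergeometric upper tail. This is a minimal illustration and not DAVID's implementation, whose scoring may differ in detail. The function name and all counts below are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail).

    k: proteins in the list annotated to the term
    n: size of the protein list
    K: background proteins annotated to the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Probability of observing k or more annotated proteins by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 15 of 311 listed proteins carry a term that is
# annotated to 300 of 20000 background proteins.
p = enrichment_p(15, 311, 300, 20000)
print(p, p <= 0.05)
```

A term would pass the threshold used in this work when its p-value is at most 0.05. Correction for multiple testing across terms is a separate step that is not shown here.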
We used STRING to analyze PPIs. Venn diagrams were used to examine each protein pathway annotation.
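The two summaries used above, node connectivity in a PPI network and the overlap shown in a Venn diagram, can be sketched with the standard library. The edge list and annotation sets below are hypothetical placeholders rather than STRING output from this study, and the function names are illustrative.

```python
from collections import Counter


def node_degrees(edges):
    """Count interaction partners per protein; each pair is assumed listed once."""
    degree = Counter()
    for a, b in edges:
        if a != b:  # ignore self-loops
            degree[a] += 1
            degree[b] += 1
    return degree


def shared_across(*annotation_sets):
    """Proteins present in every annotation set (the central Venn region)."""
    return set.intersection(*map(set, annotation_sets))


# Hypothetical network edges and annotation sets for illustration.
edges = [
    ("RPS6", "RPL14"),
    ("RPS6", "EIF2S1"),
    ("RPL14", "EIF2S1"),
    ("RPS6", "MDH2"),
    ("MDH2", "GOT2"),
]
degrees = node_degrees(edges)
hub = degrees.most_common(1)[0]

common = shared_across(
    {"TUBB", "ANXA2", "SMC1"},
    {"TUBB", "SMC1"},
    {"SMC1", "TUBB", "RPS6"},
)
print(hub, sorted(common))
```

Degree is the simplest connectivity measure. STRING also reports a confidence score per edge, which could be used to weight or filter edges before counting.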

Availability of data and materials
The datasets used and/or analyzed during this study are available from the corresponding author on reasonable request.
Silver staining of proteins. From left to right: the marker, the pEGFP-EspF control group and experimental group, the pEGFP-EspF / N control group and experimental group, and the pEGFP-EspF / C plasmid group. Plasmids were transfected into 293T cells for 48 h. The cells were lysed and the proteins were electrophoresed on an SDS-PAGE gel. The pEGFP-EspF group had two differential bands at 35-40 kDa, the pEGFP-EspF / N group had a differential band at 40 kDa, and the pEGFP-EspF / C group had a differential band at 130 kDa.
The distribution of the top 20 interacting proteins in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. (C) STRING analysis of statistically significant proteins detected among the EspF-interacting proteins. The networks also illustrate the functional relationships (the edges) between the nodes, the thickness of which is directly proportional to the association's significance score.
EspF interacts with SMC1 and can increase its phosphorylation level. (A) Co-immunoprecipitation to identify the interactions between SMC1 and EspF / EspF-N / EspF-C. The pEGFP-EspF, pEGFP-EspF-N, and pEGFP-EspF-C were tagged with 3xFLAG and were transfected into 293T cells. Cell proteins were extracted and incubated with Flag / IgG agarose beads. Then, the cell proteins (input) and the bead-bound proteins were prepared for 10% SDS-PAGE. (B) Immunofluorescence to identify the interactions between EspF and SMC1. pEGFP-N1 and pEGFP-EspF were transfected into Vero cells. Then, the cells were incubated with anti-SMC1 antibody, and the colocalization was observed under a confocal microscope. The arrow points to co-localization. (C) Immunofluorescence to identify the p-SMC1 […]
What is the biological effect of EspF mediated by SMC1? EspF can increase the expression of p-SMC1 in Caco2 cells and relocalize it to the cytoplasm. Therefore, EspF may mediate DNA damage, cell proliferation, cell cycle arrest, cell apoptosis, and even cancer by interacting with SMC1.
