Proteome Mining of Sortase A Dependent Proteins (SDPs) in Lactic Acid Bacteria and Docking Analysis of SDPs interaction with Sortase A

Lactic acid bacteria (LAB), which are important probiotics, play a fundamental role in maintaining the health of the gastrointestinal (GI) tract, preserving microbiome balance, and preventing GI tract disorders. One of the effective mechanisms in bacterial-host interaction involves the enzyme sortase A and sortase-dependent proteins (SDPs). Sortase contributes to the stabilization and retention of probiotics in the gut by displaying various SDPs on the bacterial surface, where they are involved in attachment of the bacteria to the host intestine. In this study, of 165 LAB reference proteomes, 25 strains were SDP-free. Among the 140 strains with SDPs, 707 proteins were found with the potential to function as SDPs. For this purpose, the ProtScreen software, which recognizes a specific motif and domain in a proteome, was designed and is available at http://nigebprotscreen.com/. A database of the 707 SDPs in Lactobacillus, Enterococcus, Lactococcus, Carnobacterium, and Leuconostoc strains was also designed and is available in the project section of the online ProtScreen software. Our results showed that the most abundant amino acid at the X position of the LPXTG motif among the 165 LAB strains is glutamine (Q). Docking of SDPs with sortase A using the HADDOCK and CABS-dock tools showed that the highest binding energy is related to glutamine, and a positive relationship between amino acid frequency and binding energy was observed. Our data therefore indicate why glutamine has been selected in nature, during evolution, as the preferred amino acid for the X site of the LPXTG motif. The results of the present research and similar studies could be useful for a better understanding of sortase A and SDPs, and in studies on the design and engineering of therapeutic proteins that attach via the sortase mechanism, vaccine design using probiotics containing sortase A, and targeted delivery of peptides.
These results highlight the importance of functional analysis of SrtA and SDPs in LAB strains, including probiotics. Given the importance of adhesion, both in pathology and in the benefits of LAB strains, the results of previous laboratory studies on the adhesion of different strains to cell lines, and the SDPs found in the present study, SDPs are predicted to play key roles in enhancing the adhesion of probiotics to the intestinal wall. Further investigation of these proteins could help to identify the mechanisms involved in the effectiveness of probiotics in therapeutic approaches. As mentioned, the structures of many SDPs have not yet been explored, and it is hoped that structural analysis of such SDPs will be helpful in future studies.


Background
Beyond the fundamental role of food in providing the nutrients necessary for the growth and development of organisms, foods containing probiotics have also been found to be important for good health and for fighting disease. Lactic acid bacteria (LABs), which are members of the Lactobacillus and Bifidobacterium genera, are often used as probiotics (3,4). LABs are a group of gram-positive bacteria identified by their specific morphological, metabolic, and physiological characteristics. These bacteria, which produce lactic acid as a by-product of glucose catabolism, play important roles in maintaining human health (5,6). The mechanisms involved in the activity of LAB strains against bacterial pathogens include the production of hydrogen peroxide, lactic acid, and bacteriocin-like molecules, stimulation of the immune system, and changes in the gut microbiota (7). In addition, LABs prevent pathogens from adhering to the gut by competing for adhesion sites on intestinal epithelial cells, thereby reducing pathogen colonization and the risk of infection (8,9).
For a probiotic strain to exert positive effects, it must first be able to adhere to intestinal cells and form a biofilm (10,11). The genomes of all gram-positive bacteria, a small number of gram-negative bacteria, and some archaea encode cysteine transpeptidase enzymes called sortases (12)(13)(14). These enzymes covalently attach proteins to the bacterial cell wall, thus playing an important role in regulating the surface structure of microorganisms (15).
Sortases are divided into classes A, B, C, D, E, and F (16). Class A sortase, which is present in many gram-positive bacteria, is a housekeeping protein responsible for covalently attaching proteins bearing the LPXTG motif to the bacterial cell wall (17,18).
Class B sortase plays a role in iron homeostasis by attaching iron uptake proteins to the cell wall via the NP(Q/K)TN motif (19). Class C sortase plays a role in the polymerization of pilus constituents via the (I/L)(P/A)XTG motif (20,21). Class D sortase attaches proteins involved in spore formation through the LPNTA motif, while class E and F sortases have generally been identified in Acinetobacter (22). Sortases are responsible for the adhesion and stabilization of a group of surface proteins called SDPs on the gram-positive bacterial cell wall by recognizing, cleaving, and covalently linking a specific conserved motif (e.g., the LPXTG motif for class A sortase) (22). Successful attachment of SDPs to the cell wall requires, besides an LPXTG motif, appropriate hydrophobicity in the C-terminal region and a positively charged tail at the carboxy-terminal domain (CTD) (23).
Sortase enzymes are considered virulence factors in pathogenic bacteria because they play several vital pathological roles, such as adhesion, nutrient acquisition, and escape from the immune system (24)(25)(26). However, these enzymes are also present in gram-positive probiotic bacteria such as LABs, which have nutritional value and health-promoting effects, and they have been shown to play an important role in bacteria-host interaction (23). Sortases have also become an attractive target for studying bacteria-host interaction because of their role in attaching proteins to the cell wall (19,27). The sortase mechanism is important for the display of proteins on the cell surface and for host-bacteria interactions, and it is prominent in the functions of probiotic and commensal bacteria at intestinal mucosal membranes. It is therefore possible to predict the mechanism of action of sortase in probiotic LAB strains. Considering the essential role of sortase in attaching its substrates to the cell wall in gram-positive bacteria, and the vital importance of proper adhesion to epithelial cells for the functions of probiotic bacteria, SDPs are predicted to play a key role in strengthening the adhesion of probiotic bacteria, especially strains such as Lactobacillus. Investigating the frequency and type of sortase substrates in LABs is thus important for identifying the functions of this group of bacteria and for benefiting from their beneficial effects. Accordingly, the present study identifies SDPs in 165 LAB strains using bioinformatics tools and investigates their physical, chemical, and structural properties. In addition, molecular docking was used to investigate how different amino acids at the X site affect the interaction between SDPs and sortase A in lactic acid bacteria.
Elucidating the three-dimensional structure of protein complexes provides a thorough understanding of the mechanisms of molecular recognition and interaction (28). In this study, an information-driven docking approach was applied to the interaction between sortase A and a pentapeptide representing the sortase cleavage motif in SDPs, an interaction that represents an essential step in bacterial attachment to the intestinal epithelium and retention in the gut.

Designing ProtScreen software
Software was designed to recognize a specific motif and domain in a proteome; it is available online at http://nigebprotscreen.com/. The protein sequences in FASTA format and the desired motif are uploaded to find the motif in the proteome, and the output is the motif name and the UniProt ID of each protein containing that motif.
A database was also designed that includes SDPs in Lactobacillus, Enterococcus, Lactococcus, Carnobacterium, and Leuconostoc strains; it is available in the project section of the above-mentioned site. Entering the name of an organism returns information such as UniProt ID, protein name, motif, and motif position. The motif discovery application and SDP database were written in Java in the IntelliJ environment and run using Tomcat 7.0.64 and JDK 1.7.
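The kind of motif search described above can be illustrated with a minimal sketch. This is not the ProtScreen implementation (which is written in Java and whose internals are not described here); the function names `parse_fasta` and `find_motif` and the regular expression `LP[A-Z]TG` are assumptions chosen for illustration only.

```python
import re
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_motif(text: str, pattern: str = r"LP[A-Z]TG") -> Dict[str, List[Tuple[int, str]]]:
    """Map each FASTA header to a list of (1-based position, matched motif).

    The default pattern is an illustrative rendering of LPXTG in which
    X is any single letter; it is an assumption, not ProtScreen's rule.
    """
    # A lookahead is used so that overlapping occurrences are also reported.
    regex = re.compile(f"(?=({pattern}))")
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for header, seq in parse_fasta(text):
        found = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
        if found:
            hits[header] = found
    return hits
```

Given a proteome in FASTA format, `find_motif` returns only the entries that contain the motif, which mirrors the reported output (motif and identifier of each matching protein).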

Identifying SDPs with LPXTG motif
First, ProtScreen software was used to identify the SDPs. The reference proteomes and genomes of 165 LAB strains, including 119 Lactobacilli, 29 Enterococci, 8 Lactococci, 5 Carnobacteria, and 4 Leuconostocs, were investigated. After sequence analysis, the proteins with the LPXTG motif were identified.

Investigating the presence of signal peptide
Because SDPs require a secretory signal peptide at the N-terminus, we investigated the presence of a signal peptide (SP) in the LPXTG-containing proteins obtained from the previous step using the online software SignalP-5.0 at http://www.cbs.dtu.dk/services/SignalP/.
SignalP-5.0 offers a deep neural network-based approach that improves SP prediction across all domains of life and distinguishes three forms of prokaryotic SPs. Accordingly, a protein may be assigned a Sec signal peptide (Sec/SPI), a lipoprotein signal peptide (Sec/SPII), a Tat signal peptide (Tat/SPI), or no signal peptide at all (Other) (29).
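The filtering step that follows a signal peptide prediction can be sketched as below. The sketch assumes a tab-separated summary in which the first column is the protein identifier, the second column is the predicted class, comment lines start with `#`, and proteins without a signal peptide are labeled `OTHER`; this layout is an assumption and should be checked against the actual prediction output. The function name `ids_with_signal_peptide` is hypothetical.

```python
from typing import Iterable, Set


def ids_with_signal_peptide(lines: Iterable[str]) -> Set[str]:
    """Return identifiers whose predicted class is anything except OTHER.

    Assumed input: tab-separated lines, ID in column 1 and predicted
    class in column 2; lines starting with '#' are treated as comments.
    """
    keep: Set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows rather than guessing
        protein_id, prediction = fields[0].strip(), fields[1].strip()
        if prediction.upper() != "OTHER":
            keep.add(protein_id)
    return keep
```

The returned set can then be intersected with the LPXTG-containing proteins from the previous step to retain only candidates that carry both features.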

Hydrophobicity of C-terminal Region
To investigate the hydrophobicity at the C-terminal region of proteins containing SP, the ExPASy portal, https://www.expasy.org/, was used.
An amino acid scale is defined by a numerical value assigned to each type of amino acid. Hydrophobicity or hydrophilicity scales and secondary structure conformational parameter scales are the most commonly used; however, many other scales are based on different chemical and physical properties of amino acids. ProtScale on the ExPASy server contains 57 predefined scales taken from the literature (30).
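A sliding-window profile over an amino acid scale can be sketched as follows. The Kyte-Doolittle hydropathy scale is used here as one example of a hydrophobicity scale; the text does not state which of the predefined scales or which window size was used in this study, so both choices, as well as the function names, are assumptions for illustration.

```python
from typing import List

# Kyte-Doolittle hydropathy values (one example of an amino acid scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Mean scale value in each window of `window` consecutive residues."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def c_terminal_region(seq: str, motif_end: int) -> str:
    """Residues that follow the motif.

    `motif_end` is the 1-based position of the last motif residue
    (the G of LPXTG), so the slice starts right after it.
    """
    return seq[motif_end:]
```

Positive window averages in the region returned by `c_terminal_region` indicate a hydrophobic stretch downstream of the motif; the threshold for "adequate" hydrophobicity is not specified in the text and is left to the reader.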

Investigation of amino acid abundance at the X site of LPXTG motif in SDPs
In this study, the abundance of each amino acid at the X position of the LPXTG motif in SDPs was investigated across the ProtScreen database at http://nigebprotscreen.com/. This database contains the 707 SDPs found in 165 strains of lactic acid bacteria, including Lactobacillus, Enterococcus, Lactococcus, Carnobacterium, and Leuconostoc.
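The abundance calculation can be sketched as a simple tally of the third residue of each motif occurrence. The function name `x_site_frequencies` is hypothetical, and the input is assumed to be a list of five-residue motif strings such as those stored in the database's motif field.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def x_site_frequencies(motifs: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (residue, count, fraction) for the X position of LPXTG motifs.

    Strings that are not five residues long or do not match LP?TG are
    ignored. Results are sorted from most to least frequent.
    """
    counts: Counter = Counter()
    for motif in motifs:
        motif = motif.strip().upper()
        if len(motif) == 5 and motif.startswith("LP") and motif.endswith("TG"):
            counts[motif[2]] += 1
    total = sum(counts.values())
    return [(aa, n, n / total) for aa, n in counts.most_common()]
```

Applied to all motif occurrences in the SDP set, the first entry of the returned list is the most abundant residue at the X site.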

Investigation of the structure of proteins and modeling an appropriate structure
To obtain 3D structures for docking, all 707 SDPs as well as the sortase A enzymes of all of the above strains were examined in the UniProt database at https://www.uniprot.org/ to select structures suitable for docking. The SWISS-MODEL structure prediction server at https://swissmodel.expasy.org/ was used to predict the structure of SDPs. The models were evaluated and validated using the ProSA-web server (https://prosa.services.came.sbg.ac.at/prosa.php) and Ramachandran plot analysis (http://molprobity.biochem.duke.edu).

Sortase A structure selection
Examination of the sortases in SDP-containing strains showed that only four sortases have a known structure in the UniProt database: F9UKZ1 (F9UKZ1_LACPL) in Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1), Q5FJP7 (Q5FJP7_LACAC) in Lactobacillus acidophilus (strain ATCC 700396 / NCK56 / N2 / NCFM), and Q836L7 (Q836L7_ENTFA) and Q82ZJ9 (Q82ZJ9_ENTFA), both in Enterococcus faecalis (strain ATCC 700802 / V583). Because the aim of this study was to investigate probiotic strains, the L. acidophilus strain was chosen for SDP selection, structure modeling, and docking.

Identification of the appropriate structure of the SDPs
Among the 707 SDPs in the ProtScreen database, only two had a known structure in the UniProt database: ASA1_ENTFA (with the LPQTG motif) in Enterococcus faecalis (strain ATCC 700802 / V583) and Q3Y373_ENTFD (with the LPETG motif) in Enterococcus faecium (strain ATCC BAA-472 / TX0016 / DO). Because of the pathogenicity of these strains, these structures were not used for docking with sortase A. Because the structure of sortase A was known in Lactobacillus acidophilus (strain ATCC 700396 / NCK56 / N2 / NCFM) and Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1), the SDPs of these strains were considered for docking. There were 5 SDPs with unknown structures in the L. acidophilus strain and 6 in the L. plantarum strain. Of these 11 SDPs, a structure could be determined from the SWISS-MODEL results for only one, an SDP of Lactobacillus acidophilus whose UniProt identifier is Q5FLL3_LACAC. This SDP contains the LPTTG motif.

Molecular docking using HADDOCK and CABS-dock
Docking of SDPs with sortase A in the Lactobacillus acidophilus strain (strain ATCC 700396/NCK56/N2/) was studied using HADDOCK2.2 and CABS-dock at https://haddock.science.uu.nl/services/HADDOCK2.2/ and http://biocomp.chem.uw.edu.pl/CABSdock. Eighteen amino acids were evaluated at the X site of the LPXTG motif in the SDP identified in Lactobacillus acidophilus (UniProt ID Q5FLL3_LACAC), docked with sortase A (UniProt ID Q5FJP7_LACAC, which has the TLXTC motif in its active site). Only 18 of the 20 amino acids were examined because cysteine and tryptophan did not occur at the X position of the LPXTG motif in any of the 707 SDPs; they were therefore not considered further.
HADDOCK is a versatile flexible docking approach for modeling biomolecular complexes. HADDOCK differs from ab initio docking approaches in that it can incorporate knowledge derived from biochemical, biophysical, or bioinformatics methods to enhance sampling, scoring, or both. The knowledge it can incorporate is varied: interface restraints from NMR or MS, mutagenesis experiments, or bioinformatics predictions; various NMR orientation restraints; and, more recently, cryo-electron maps. HADDOCK currently allows the simulation of large assemblies consisting of up to 6 different molecules, offering a truly integrative simulation platform along with its rich data support (31).
The CABS-dock web server provides an interface for modeling protein-peptide interactions using a highly efficient protocol for flexible docking of peptides to proteins. Although other docking algorithms require a predefined binding site, such information is not required by CABS-dock. Given the structure of a protein receptor and a peptide sequence (and starting from random conformations and positions of the peptide), CABS-dock performs a simulation search for the binding site, allowing full flexibility of the peptide and small fluctuations of the receptor backbone (32).
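The set of pentapeptides evaluated at the X site can be enumerated with a short sketch. It lists the LPXTG variants for the 20 standard amino acids minus cysteine and tryptophan, which were excluded as described above; the function name `lpxtg_variants` is hypothetical, and how the peptides are submitted to each docking server is not covered here.

```python
from typing import Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
EXCLUDED = frozenset({"C", "W"})       # not observed at the X site in the SDP set


def lpxtg_variants(excluded: Iterable[str] = EXCLUDED) -> List[str]:
    """Return LPXTG pentapeptides with X drawn from the allowed residues."""
    skip = set(excluded)
    return [f"LP{aa}TG" for aa in AMINO_ACIDS if aa not in skip]


if __name__ == "__main__":
    for peptide in lpxtg_variants():
        print(peptide)
```

Each of the resulting 18 sequences corresponds to one docking run against the same sortase A structure, so that differences in score can be attributed to the residue at the X position.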

SDP Identification
In total, 2709 proteins with the LPXTG motif were identified using ProtScreen software. Of these, 811 proteins had a signal peptide. Hydrophobicity analysis of the C-terminal region of the proteins with a signal peptide showed that 707 had adequate hydrophobicity in this region. These 707 proteins were therefore identified as potential SDPs in the studied LABs. Of the 707 proteins, 280 were uncharacterized and 427 were characterized. Table 1 compares the number of characterized SDPs in the studied strains. An additional file gives more detail about the identified SDPs [see Additional file 1].
Considering the importance of lactobacilli in this group, SDPs in this genus underwent further analysis. The strains with the highest numbers of characterized and uncharacterized SDPs are listed in Table 2.
Of the 119 lactobacilli studied, 42 strains were probiotic and 77 were non-probiotic. See Table 3 and Table 4. The results of HADDOCK are reported in Table 5.
For docking with CABS-dock, the sortase A protein sequence (UniProt ID: Q5FJP7_LACAC) and the LPXTG peptides were used. The protein and peptide sequences are listed in Table 6. The results of the docking analysis are shown in Table 7. Figure 3 shows the interaction between sortase A and the LPQTG peptide.
Glutamine (Q) was the most abundant amino acid at the X site of the LPXTG motif in the 707 SDPs available in the ProtScreen database, and the docking results showed that the highest binding energy was related to glutamine. A positive relationship was also observed between frequency and binding energy. This is probably associated with selection for Q, with its lower binding energy, because this binding is the precursor to a further reaction, transpeptidation, in which the LPXTG motif of the SDP is cleaved between threonine and glycine and the protein is linked to the bacterial cell wall peptidoglycan via nucleophilic amines. Thus, a lower binding energy in the interaction between sortase A and the SDP probably has a positive effect on cleavage and continuation of the reaction.

Discussion
Probiotic products can contain one or more selected strains of bacteria. Human probiotic microorganisms often belong to the Lactobacillus, Bifidobacterium, Lactococcus, Streptococcus, and Enterococcus genera (33), among which lactic acid strains are of great importance. According to the WHO, FAO, and EFSA, the strains selected as probiotics should be safe (34). Probiotics benefit the host through various functional mechanisms. The functional aspects of probiotics are determined by their survival in the gastrointestinal tract and their effects on the immune system (35). Lactic acid bacteria (LAB) have a long history of use in the food industry and are increasingly used in therapies because of their health benefits and significant biotechnological potential. The systems developed for engineering them are being combined with novel approaches, such as CRISPR-Cas, to allow the use of LAB for targeted delivery (36).
Sortase is an extracellular transpeptidase enzyme in gram-positive bacteria that recognizes a specific conserved motif in a group of secretory proteins called SDPs and covalently attaches these proteins to the gram-positive bacterial cell wall (22).
Various classes of sortase have been identified, including class A sortase, a housekeeping protein; class B sortase, involved in iron homeostasis; class C sortase, responsible for pilin polymerization; class D sortase, involved in spore formation; and class E and F sortases, which have been identified in Acinetobacter. Notably, the sortase enzyme is present in some gram-positive bacteria, and the members of the lactic acid family are no exception (16,23).
SDPs belong to a class of secretory proteins that have a signal peptide at the N-terminus and a cell wall sorting signal (CWSS) containing the LPXTG motif, by which the sortase enzyme recognizes them. SDPs are then attached to the cell wall after the sortase enzyme cleaves the T-G bond (37).
Considering the fundamental role of sortase in adhering its substrate to the cell wall in gram-positive bacteria as well as the high importance of a proper adhesion to epithelial cells in improving the functions of probiotic bacteria (38,39), the present study aimed to identify SDPs in LABs.
Given the importance of using probiotics for different purposes, improving strains by increasing their efficiency, including by engineering the motifs involved in probiotic binding, can be very helpful. Because binding is essential for strains to exert their beneficial functional mechanisms, in this study we investigated the SDPs and the LPXTG motif and sought to identify the amino acid at the X position that plays the most important role in bacterial-host interactions. Because the most abundant amino acid at this site is glutamine, it was replaced with different amino acids to investigate the effect on the interaction between bacteria and host.
In this research, comprehensive software (ProtScreen), which includes a Projects section, was designed to identify a specific motif or domain in a proteome. The SDP data bank, which includes details of the LABs such as UniProt codes, recognized motifs, and other SDP characteristics, is stored in this software.
The results showed that, in the 165 LAB strains studied, a total of 707 SDPs could be recognized by sortase A. These proteins can be examined in future studies of lactic acid bacteria, including studies of their functions, their binding to the epithelium, the interactions between bacteria and the host, and applications such as drug delivery and oral vaccines. The online ProtScreen software can also be used in other studies aimed at finding a specific motif in a proteome. The SDP data bank can be useful in future studies on sortase and its substrates.
Most studies on SDPs and the LPXTG motif have been performed in pathogenic bacteria, where the motif acts as a virulence factor, and there has been no comprehensive study on lactic acid bacteria. For example, previous studies have established the LPXTG motif as a contributing factor in the pathogenesis of Listeria monocytogenes (40,41), Staphylococcus aureus (42,43), Streptococcus pneumoniae (44), and Clostridium difficile (45). However, few studies have been performed on the function of this motif in the adhesion of probiotic bacteria to the gut. According to Ossowski et al., proteins with LPXTG in the probiotic L. rhamnosus GG strain act as adhesion factors to mucus (46,47). LPXTG proteins have also been shown to stimulate the host's immune system. The mannose-specific adhesin (Msa) of L. plantarum strain WCFS1 is an example of these proteins (48).
A separate study carried out a genomic and functional analysis of the genes involved in the adhesion of 163 probiotic bacterial strains, including lactobacilli, to the HT29 and HT29-MTX cell lines. Of these, 156 strains were obtained from native foods and 7 were standard strains.
In that research 14 genes involved in the adhesion process were investigated including sortase A. Also, there were 4 LAB strains. After

Conclusion
Apart from the leucine, proline, threonine, and glycine residues of the LPXTG motif, any amino acid can occupy the X position of a protein that is attached to the cell wall by SrtA (50). Our results identified glutamine as the most frequent amino acid naturally present at this position in the LPXTG motif of SDPs. Glutamine is a polar, uncharged amino acid that plays a key role in binding to surface carbohydrates in membranes (51,52). Successful attachment of SDPs to the cell wall requires appropriate hydrophobicity in the C-terminal region, the LPXTG motif, and the N-terminal signal peptide. The signal peptide directs the secretion of SDPs through the Sec pathway, and the charged C-terminal tail anchors the proteins to the cell wall (12,23,53).
Using bioinformatics tools in the present study, the types of possible amino acids at the X site in the LPXTG sortase cleavage motif in SDPs and their role in interaction with sortase A were investigated.
The results showed that, of the 707 SDPs studied in different strains of lactic acid bacteria, only two had a characterized structure in UniProt, both in Enterococcus species (E. faecalis and E. faecium).
Examination of the sortases in the studied strains showed only four sortases with a known structure: one each in L. plantarum and L. acidophilus and two in E. faecalis.
Because the aim was to investigate probiotic lactic acid bacteria, the L. acidophilus strain (strain ATCC 700396 / NCK56 / N2), which has a sortase A with a known structure, was selected, and modeling was used to determine the structure of an SDP in order to show the interaction between sortase A and the SDP.
Today, much attention is being directed to lactic acid bacteria for the treatment of various diseases, especially gastrointestinal diseases and also as a useful dietary supplement. Practical use of probiotics for therapeutic applications requires accurate knowledge of their mechanism of action in the gastrointestinal tract.
In the present study, the interaction of sortase A and SDPs, as one of the mechanisms involved in the binding of probiotics to the intestinal epithelium, was investigated with HADDOCK and CABS-dock, and the results were presented above.
The most abundant amino acid at the X site of the LPXTG motif in the 707 SDPs of lactic acid bacteria, including Lactobacillus, Enterococcus, Lactococcus, Carnobacterium, and Leuconostoc, which are available in the ProtScreen database, was glutamine.
Docking analysis showed that the lower binding energy was related to glutamine.
These results, and the use of glutamine at the X site of the LPXTG motif, can be useful in the design and engineering of therapeutic proteins that attach via the sortase mechanism, vaccine design using probiotics containing sortase A, and targeted delivery of peptides.
These results highlight the importance of functional analysis of SrtA and SDPs in LAB strains, including probiotics. Given the importance of adhesion, both in pathology and in the benefits of LAB strains, the results of previous laboratory studies on the adhesion of different strains to cell lines, and the SDPs found in the present study, SDPs are predicted to play key roles in enhancing the adhesion of probiotics to the intestinal wall.
Further investigation of these proteins could help to identify the mechanisms involved in the effectiveness of probiotics in therapeutic approaches. As mentioned, the structures of many SDPs have not yet been explored, and it is hoped that structural analysis of such SDPs will be helpful in future studies.
a. Z-score indicates how many standard deviations from the average this cluster is located in terms of score (the more negative the better).
c. The top cluster is the most reliable according to HADDOCK.