BION-2: Predicting positions of non-specifically bound ions on protein surface by a Gaussian-based treatment of electrostatic environment

Background: Ions play significant roles in biological processes - they may specifically bind to a protein site or bind non-specifically on its surface. Though, the role of specifically bound ions range from actively providing structural compactness via coordination of charge-charge interactions to numerous enzymatic activities, non-specifically surface-bound ions are also crucial to maintaining a protein’s stability, responding to pH and ion concentration changes and contributing to other biological processes. However, experimental determination of positions of non-specifically bound ions is not trivial since they may have low residential time and experience significant thermal fluctuation of their positions. Results: Here we report a new release of a computational method, the BION-2 method, that predicts positions of non-specifically surface-bound ions. The BION-2 utilizes the Gaussian-based treatment of ions within the framework of the modified Poisson-Boltzmann equation, that does not require a sharp boundary between the protein and water phase. Thus, the predictions are done by the balance of the energy of interaction between the protein charges and the corresponding ions, and the de-solvation penalty of the ions as they approach the protein. Conclusions: The BION-2 is tested against experimentally determined ion’s positions, with both X-ray and NMR determined positions, and it is demonstrated that it outperforms the old BION and molecular dynamics tools. The BION-2 is available as a web server as well.


Background
Ions are an important component of biological systems as they interact with macromolecules and directly participate in a wide range of reactions [1][2][3]. In molecular biology, the ions can be broadly grouped into two categories: mobile ions in the water phase and ions bound to the corresponding macromolecule. The mobile ions in the solvent freely move in response to electrostatic environment and their major role is to provide screening of electrostatic interactions [4,5]. On the other hand, the ions bound to macromolecules are involved in specific interactions with macromolecular moiety and play roles in catalysis, electron/proton transfer reactions and structural stability [6][7][8]. In-between these two well distinguishable categories are ions that are weakly bound to macromolecular surface, without being involved in specific chemical interactions and have low residential time, the non-specifically surface-bound ions [9].
This work focusses on such type of ions and reports a new development of an algorithm, the BION-2 algorithm, that predicts positions of non-specifically surface-bound ions.
The role of non-specifically surface-bound ions in molecular biology is well documented. Thus, Ca 2+ and Mg 2+ ions non-specifically bind to backbone phosphate oxygen atoms of nucleic acid [10][11][12], and the binding reduces the electrostatic repulsion between adjacent phosphate groups, hence stabilizes pairing and base stacking [13,14]. The non-specifically surface-bound ions were found to be a key regulator for protein-protein binding and pH-dependence of the binding affinity [15], to affect ion-induced filament formation [16] and to alter macromolecular dynamics [17]. Surface-bound ions are essential for reducing the effective net charge of macromolecules and their effect is manifested via Zeta potential [18,19]. It was demonstrated that accounting for surface-bound ions is crucial for modeling experimentally measured Zeta potential for various proteins [20]. The list of examples can be extended; however, it is evident non-specifically surface-bound ions are essential for many biological processes.
Having in mind the importance of non-specifically surface-bound ions in biology, significant efforts were invested to determine or predict their positions. From an experimental point of view, the main obstacle is that such ions have low residential time and experience large thermal fluctuations. Furthermore, X-ray based experimental techniques require crystals to be grown, and some of the ions could be simply artifacts of crystal packing and high-salt concentration typically required for growing crystals. On the other side of the spectra are computational models to predict positions of non-specifically surface-bound ions. To the best of our knowledge, the BION [21,22] is the only publicly available resource for predicting such type of ions (excluding recent work [12] which however does not provide web service), while many other predictors deal with specifically bound ions [23,24].
In this work we report a new development of BION [21,22], the BION-2, which is a method and a web server to predict non-specifically surface-bound ions. The development takes advantage of a Gaussian-based smooth dielectric function in DelPhi [25][26][27][28][29]. This allows the energy function that evaluates the possibility that a given site holds an ion to be made of two important components: (a) electrostatic energy of interaction between the candidate ion and the charges of the macromolecules and (b) de-solvation penalty the ion should pay by approaching the macromolecular surface.

Results
The results section is organized as follows. First, we provide two examples of predicted ion's positions along with experimentally determined surface-bound ions. Second, we report the results on benchmarking BION-2 to predict surface-bound ions against experimentally determined ions' position using X-ray and NMR datasets. Lastly, we compare BION-2 predictions with VMD predictions.
The visual example section outlines two cases: (a) a case of a protein with only one experimentally determined ion; and (b) a case of a protein with three experimentally determined ions. The first example deals with protein (listed as 1C10 in PDB) which has a Clion bound (Fig. 1a). The predicted Clpositions with rank 1 and 10 are also shown in Fig. 1a. One can see that predicted Clwith rank 1 closely matches the experimentally determined Cl -1 position. In contrast, the predicted ion position with rank 10 is far away from the experimental one. The second example is a protein (listed as 1IZ7 in PDB) that has three experimentally determined Ca 2+ ions. The rank 1 predicted position closely matches one of the experimental ion positions, but it is far away from others. This example illustrates that in case of multiple ion positions, one should choose the best match when performing benchmarking tests. The optimal performance is expected to result in the smallest difference between the experimentally determined ion position and the predicted one with rank 1, as well as the smallest D min . To test the sensitivity of predictions with respect to the grid resolution, the value of internal reference dielectric constant and the ion concentration, were systematically varied. The best performance was achieved at an internal reference dielectric constant of 2 and a salt concentration of 0.5M for X-ray dataset and 0.4M for NMR dataset. A tradeoff between the resolution and the speed of calculations was reached at grid resolution of 2 grids/Å. The rest of the results are reported for this set of parameters which were made default for BION-2 algorithm.
The X-ray dataset provides cases for four types of ions and benchmarking results are shown in Fig. 3. It can be seen that the distribution of D min is much more impressive than the distribution of the rank 1 distance. This indicates that the scoring function, eq. (6), is not successfully capturing the binding in all cases. Perhaps such cases, where D min is not scored as rank 1, represent binding that does not involve only electrostatics, but other energy components are involved as well. Despite that, one can see that about 10% of Ca 2+ , Cland Zn 2+ ions are predicted extremely accurately. If one provides a tolerance of 20Å, then about 80% of Cl -, Mg 2+ and Zn 2+ ions are predicted accurately as well. Since the X-ray dataset is identical to the dataset used in the previous BION version [22], we compared the performance of the new BION-2 versus the old version of BION, which used traditional PBE and standard treatment of molecular surface. Results are shown in Fig. S1 and S2. One can see that the new BION-2 outperforms the old BION version in both ranking and providing better D min . This is especially notable for the ranking of Ca 2+ and Zn 2+ ions, where BION-2 is much more accurate than the old BION version. The next benchmarking was done on the NMR dataset (Fig. 3), which contains only two types of ions, Ca 2+ and Zn 2+ . We observe the same trend as in case of X-ray dataset, namely that the distribution of D min is more impressive than of rank 1. However, it is noticeable that BION-2 predictions are not as much scattered as it was in the case of the X-ray dataset. The largest distance is less than 40A, while in the case of the X-ray dataset, some rank 1 positions were more than 100Å away from the experimental ones. The BION-2 better performance on the NMR dataset compared with X-ray dataset could be attributed to the quality of experimental data.
Indeed, some of the ions presented in the X-ray dataset may have originated from crystallographic artifacts and may not actually be present in proteins in solution.
BION-2 vs VMD: While understanding that the "ionize" module of VMD is designed to place ions in solution, at a minimum distance larger than 6Å away from the protein surface and that the goal is to neutralize the net charge of the protein, it is still tempting to compare VMD versus BION-2 predictions (the VMD requirement of placing the ions at more than 6A away from protein surface is tolerable since many of BION-2 predictions are within the same range -Figs.   BION-2 webserver: The method is implemented into a webserver that is freely available to the community. The users must provide a structural file in PQR format, select ion type and number of ions to be predicted. The BION-2 returns the x,y,z position of the predicted ions along with visualization and other relevant information.

Conclusion
A new development of the BION algorithm, the BION-2, was reported and shown to outperform the old one in placing non-specifically surface-bound ions. While placement of ions in the solution is a standard procedure prior to an MD simulation, and there are many tools for doing that, we demonstrated that they are not efficient in predicting surface-bound ions. Thus, if one is concerned with predicting surface-bound ions, BION-2 should be the primary choice. The method is available as web server as well at http://compbio.clemson.edu/BION-2/.

Database of protein structures
To benchmark BION-2 predictions, we used two set of data: (a) previously compiled set of X-ray structures with surface-bound ions [9,22]  the Protein Data Bank (PDB) (https://www.rcsb.org/), extracting structures containing ions of the following types, Ca 2+ , Cl -, Mg 2+ , Zn 2+ , Na + , and K + . Then the structures were pruned, as described in the next section, to remove cases of buried ions (which are not considered to be non-specifically bound). If the PDB file had more than one model, the first model was used for the predictions.
Pruning the dataset to identify the surface bound ion protein structures.
Since the BION algorithm is designed to predict only surface exposed ions (non-specifically bound ions), cases of buried ions were removed. For this purpose, the solvent accessible surface area (SASA) of free ion was calculated for each type ion using a probe radius of 1.4 Å. Then the same was done for the corresponding ions in the PDB structure and ions with a relative SASA less than 10% were excluded from the library. Finally, a library of 64 Ca 2+ -containing, and 24 Zn 2+ containing NMR structures was created. The purging greatly reduced the number of cases with Cland Mg 2+ ions, therefore structures with CIand Mg 2+ ions were excluded from the NMR dataset. The remaining PDB files were protonated with VMD software with Charmm27 force field parameters [30].

Ions treatment in the framework of Gaussian-based smooth dielectric function.
In the Gaussian-based smooth dielectric model, the continuum solvent/water medium is smoothly merged with the macromolecular region. It ensures that a smooth transition of the dielectric properties occurs from the macromolecular interior to the water phase. The idea is to represent each atom as an atom centered Gaussian density function (Eq 1) as opposed to a hard sphere [26,28,29,31]. (2) where N m stands for the total number of atoms.
At the end, the smooth dielectric function throughout the space is defined as: where, and are internal and external reference dielectric constants in the macromolecule and water, respectively.
Since there is no sharp boundary between the solvent and solution in the Gaussian-based smooth dielectric function model, the treatment of mobile ions in the water phase requires modification of the Poisson-Boltzmann equation (PBE), so as not to allow ions to penetrate into the solute interior. Recently we introduced a modified PBE that penalizes ions to be present in space regions occupied by protein atoms via de-solvation penalty within the Boltzmann factor [27]: Where ( ), ( ), and ( ) are the dielectric constant, electrostatic potential, and charge density of solute at given locations, respectively -is the ionic charge, is the ion concentration in bulk solvent, ∆ is the solvation penalty term for ions, R is the ideal gas constant, and T is the temperature. The de-solvation penalty, ΔG solv , is calculated via the following formula: , where is the Avogadro constant, z is the valence of the ion, e is the elemental charge, 0 is the permittivity of vacuum, 0 is the effective radius of the ion, is the dielectric constant at a given location, and is the dielectric constant of bulk water. For computational efficiency, the effective ion radius is approximated using 2.0Å for both cations and anions.

Electrostatic potential map calculations
Electrostatic potential 3D distribution (electrostatic potential map) was obtained with DelPhi applying the following parameters: scale = 1 grid/A° ; percent of protein filling of the cube = One can note that this is the argument of the Boltzmann factor in the corresponding modified PBE (eq. 4). Thus the first term in eq. (6) is the energy of interaction between protein charges and the ion calculated as the product of the electrostatic potential (F(s)) calculated at each grid point and the charge of the ion (q ion ). The second term is the de-solvation penalty, eq. (5), where e r is the averaged dielectric value at the corresponding grid point, averaged over the six neighboring midpoints. This formalism, i.e. eq. (6), is along the lines of Gaussian-based approach in DelPhi that does not consider sharp border between solute and solvent, rather in this case it assigns a smooth de-solvation penalty function to prevent ions from going inside the solute.
The eq. (6) is used to assign value G(s) for each grid point outside the vdW surface of the solute (note that the grid points near the solute will have highest G(s), since the electrostatic potential quickly decays away from the protein charges). Then a heap-sort technique is used to rank each grid point based on the corresponding G(s), resulting in a priority queue. The most prior, and therefore the most highly ranked site, is the one with the lowest value of ( ) (note that negative value makes the energy favorable). To reduce memory usage, only sites with a negative value of ( ) are stored and the rest is discarded.
From the priority queue, sites are "popped" in the order they are stored to check for plausible vdW clashes. Thus, each "popped" site's prospect of steric clash with the protein's vdW surface is measured by comparing its distance from the nearby protein atoms ( ( , )) and the sum of their vdW radii ( and ), i.e. a site is discarded if ( , ) < + . If successful, the site is then checked for its proximity to all the other predicted sites by ensuring that the distance between the two is greater than 6Å (≳ 2 ). If a site successfully passes these two tests, it is output as a prospective site and assigned a new rank. As mentioned, the lower the rank, the better suited a site is to hold an ion in question around that protein.
The number of prospective sites output by the refined program is limited by a maximum, which a user can provide. For each output site, their coordinates and ranks are printed. To help with further analyses, the outputs also report the site's dielectric, de-solvation energy therein and a list of the neighboring protein atoms.

Using VMD to place ions
We use the VMD-ionize method to compare the VMD and BION ions positions (for given type of ions). VMD-Ionize is a program for placing ions near a biological molecule in preparation for molecular dynamics simulations to make the net charge of the system is zero. In this case, the placement is performed by calculating the coulombic potential due to the molecule in the nearby volume and placing ions at points of minimal energy. After each ion is placed, the potential is updated, so that subsequent ions will be placed in response to this. It should be mentioned that VMD places ions at distances larger that 6A ways from protein surface, so it is not intended to predict positions of surface-bound ions. However, we use VMD placed ions to compare with BION-2 placed ions to get an idea how important it is to calculate electrostatic potential via PBE, rather than by a Coulombic law in a homogeneous media, and how important is to take desolvation penalty into account.

Consent to publish
Not applicable

Availability of data and materials
List of PDB files is provided in supplementary material

Competing Interests
The authors declare that they have no competing interests

Funding
This work was supported by a grant from NIH, grant number R01GM093937. E.A. was supported by grants R01GM125639 and P20GM121342. The funding bodies have not played any roles in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Authors' contributions
MS carried computational work, AC designed the code, SK developed the webserver, EA supervised the work. The authors read and approved the final manuscript.