Existing computational methods to estimate pKa values in proteins rely on theoretical approximations
and lengthy computations. In this work, we use a data set of 6 million theoretically determined pKa
shifts to train deep learning models that rival physics-based predictors. These
neural networks learned to assign appropriate electrostatic charges to chemical groups and to recognize the
importance of solvent exposure and close interactions, including hydrogen bonds. Although trained only
on theoretical data, our pKAI+ model displays the best accuracy on a test set of ∼750 experimental
values. Inference is more than 1000 times faster than with physics-based methods.
By combining speed, accuracy and a reasonable understanding of the underlying physics, our models
provide a practical solution for fast estimation of macroscopic pKa from ensembles of microscopic
values as well as for many downstream applications such as molecular docking and constant-pH
molecular dynamics simulations.