Protein engineering in the big data era: harnessing near-redundant structural data
Motivation: Predicting the effect of mutations on protein-protein interactions is important for relating structure to function, as well as for in silico affinity maturation. The effect of mutations on protein-protein binding energy (ΔΔG) can be predicted by a variety of atomic simulation methods involving full or limited flexibility, and explicit or implicit solvent. Methods that consider only limited flexibility are naturally more economical, and many of them are quite accurate; however, their results depend on the atomic coordinate set used. In this work we perform a sequence- and structure-based search of the Protein Data Bank to find additional coordinate sets and repeat the calculation on each.
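As a minimal sketch of what repeating the calculation on each coordinate set could look like downstream, the Python snippet below combines per-structure ΔΔG predictions for a single mutation. The averaging step, the function name `combine_ddg`, the structure labels, and the numeric values are all illustrative assumptions; the abstract does not state how the per-structure results are combined.

```python
from statistics import mean, stdev


def combine_ddg(per_structure_ddg):
    """Combine ddG predictions (kcal/mol) made on several coordinate sets.

    per_structure_ddg maps a structure label to the ddG predicted on that
    structure. Returns (mean, sample standard deviation). Taking the mean is
    an assumption made for illustration, not the published procedure.
    """
    values = list(per_structure_ddg.values())
    if not values:
        raise ValueError("need at least one per-structure prediction")
    # A single structure has no spread to report.
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Hypothetical predictions for one mutation on three near-redundant structures.
predictions = {"structure_A": 1.2, "structure_B": 0.8, "structure_C": 1.0}
combined, spread = combine_ddg(predictions)
print(f"combined ddG = {combined:.2f} kcal/mol, spread = {spread:.2f} kcal/mol")
```

Reporting the spread alongside the combined value is one way to expose how strongly a prediction depends on the coordinate set used.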
Results: We increase precision and Positive Predictive Value, and decrease Root Mean Square Error, compared to using single structures. Given the ongoing growth of near-redundant structures in the Protein Data Bank, our method will only increase in applicability and accuracy.
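For readers unfamiliar with the metrics named above, the following Python sketch computes Root Mean Square Error and Positive Predictive Value from paired predicted and experimental ΔΔG values. The cutoff of -0.5 kcal/mol, the sign convention (more negative meaning a "positive" call), the function names, and the example numbers are assumptions chosen for illustration; they are not taken from the manuscript.

```python
from math import sqrt


def rmse(predicted, experimental):
    """Root mean square error between paired ddG values (kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return sqrt(sum(squared) / len(squared))


def positive_predictive_value(predicted, experimental, cutoff=-0.5):
    """Fraction of predicted positives that are also experimental positives.

    A value counts as positive when it is at or below `cutoff`; both the
    cutoff and the sign convention are assumptions of this sketch.
    """
    pairs = list(zip(predicted, experimental))
    true_pos = sum(1 for p, e in pairs if p <= cutoff and e <= cutoff)
    false_pos = sum(1 for p, e in pairs if p <= cutoff and e > cutoff)
    called = true_pos + false_pos
    # No positive calls means PPV is undefined.
    return true_pos / called if called else float("nan")


# Hypothetical data for four mutations.
predicted = [-1.0, -0.6, 0.3, 1.2]
experimental = [-0.8, 0.1, 0.5, 1.0]
print(f"RMSE = {rmse(predicted, experimental):.3f} kcal/mol")
print(f"PPV  = {positive_predictive_value(predicted, experimental):.2f}")
```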
Availability: Public web server at biodesign.scilifelab.se
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Posted 18 Feb, 2021