Evolutionary and Functional Lessons from Human-Specific Amino-Acid Substitution Matrices
Characterizing human genetic variation in coding regions is fundamental to understanding protein function, structure, and evolution. Amino-acid (AA) substitution matrices capture the stochastic nature of such proteomic variation and are widely used to study protein families and evolutionary processes. The conventional substitution matrices, namely BLOSUM and PAM, were constructed to reflect polymorphism across species. In this study, we analyzed the frequencies of more than 4.8 million single nucleotide variants within the healthy human population to represent proteomic variability within the human species at codon and AA resolution. Our model reveals AA substitutions that are observed more frequently in one direction than in the reverse direction. We further demonstrate that nucleotide substitution rates only partially determine AA substitution rates. Finally, we investigate AA substitutions in post-translational modification and ion-binding sites, exposing purifying selection over a range of residue-based functions. These novel matrices provide a robust baseline for the analysis of protein variation in health and disease.
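To make the idea of a directional substitution matrix concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each variant is reduced to a hypothetical tuple of (reference AA, alternate AA, allele frequency), sums the frequencies per ordered pair, and compares the two directions of a pair with a log2 ratio. The function names, the toy records, and the frequency-weighting scheme are all illustrative assumptions.

```python
from collections import defaultdict
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_directional_totals(variants):
    """Sum allele frequencies for each ordered (ref, alt) amino-acid pair.

    `variants` is an iterable of (ref_aa, alt_aa, allele_frequency) tuples;
    this record layout is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for ref, alt, freq in variants:
        # Keep only missense changes between the 20 standard amino acids.
        if ref in AMINO_ACIDS and alt in AMINO_ACIDS and ref != alt:
            totals[(ref, alt)] += freq
    return totals


def row_normalize(totals):
    """Convert totals into per-reference-AA proportions (each row sums to 1)."""
    row_sums = defaultdict(float)
    for (ref, _alt), weight in totals.items():
        row_sums[ref] += weight
    return {pair: weight / row_sums[pair[0]] for pair, weight in totals.items()}


def log2_asymmetry(totals, a, b):
    """Return log2(total(a->b) / total(b->a)), or None if either is missing."""
    forward = totals.get((a, b), 0.0)
    backward = totals.get((b, a), 0.0)
    if forward <= 0.0 or backward <= 0.0:
        return None
    return math.log2(forward / backward)


# Toy records, invented purely for illustration (not data from the study).
toy_variants = [
    ("R", "H", 0.004),
    ("R", "H", 0.002),
    ("H", "R", 0.003),
    ("R", "Q", 0.006),
    ("A", "V", 0.010),
]

totals = build_directional_totals(toy_variants)
proportions = row_normalize(totals)
r_to_h_bias = log2_asymmetry(totals, "R", "H")
```

A positive value of `r_to_h_bias` means the toy data contain more frequency-weighted R-to-H changes than H-to-R changes. A real analysis would also need to account for codon-level mutability and background AA composition, which this sketch ignores.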
Posted 21 Sep, 2020