Background
Convergent and parallel evolution provide unique insights into the mechanisms of natural selection. Some of the most striking convergent and parallel (collectively, recurrent) amino acid substitutions in proteins are adaptive, but many others are selectively neutral. Genome-wide assessment of recurrent substitutions has only been performed for orthologs. These studies have revealed that the pervasiveness of recurrent substitutions is in large part explained by purifying selection: at any position in a protein, only a subset of amino acids is allowed, which increases the chance of the same substitution happening in different lineages.
Results
We developed a framework that detects patterns of recurrent differentiation in paralogs across 90 divergent eukaryotic genomes. A skew in recurrent substitutions serves as a proxy for a recurrent trend in function. We find remarkable examples of recurrent sequence evolution after independent duplication, in some cases involving more than ten different lineages in which the duplicates differentiated in a similar way. We reveal the implicated functional patterns for the gene families Hint1/Hint2, Sco1/Sco2 and vma11/vma3.
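To make the idea of recurrent differentiation concrete, the following is a minimal sketch and not the framework used in this study. It assumes that each independent duplication contributes one pair of paralog sequences taken from a shared alignment, and it counts, per alignment column, how many lineages show the same pair of differing residues between the two copies. The function name `recurrent_pairs`, the `min_lineages` threshold and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def recurrent_pairs(duplicates, min_lineages=2):
    """Find alignment columns where the same residue difference between
    two paralogs recurs in several independently duplicated lineages.

    duplicates: dict mapping a lineage name to a (seq_a, seq_b) tuple of
                aligned sequences, all of the same length (assumption).
    Returns a dict: column index -> ((residue_1, residue_2), lineage count).
    """
    lengths = {len(seq) for pair in duplicates.values() for seq in pair}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_columns = lengths.pop()

    result = {}
    for pos in range(n_columns):
        counts = Counter()
        for seq_a, seq_b in duplicates.values():
            a, b = seq_a[pos], seq_b[pos]
            # Skip gaps and columns where the two copies did not diverge.
            if a != b and "-" not in (a, b):
                # Unordered pair: copy labels are arbitrary across lineages.
                counts[frozenset((a, b))] += 1
        if counts:
            pair, n_lineages = counts.most_common(1)[0]
            if n_lineages >= min_lineages:
                result[pos] = (tuple(sorted(pair)), n_lineages)
    return result


# Toy data (hypothetical): three lineages with independent duplications.
toy = {
    "lineage1": ("MKHA", "MKCA"),
    "lineage2": ("MRHA", "MRCA"),
    "lineage3": ("MKHA", "MKHG"),
}
hits = recurrent_pairs(toy)
print(hits)
```

In this toy input only column 2 is reported, because the H/C difference between copies appears in two lineages, whereas the A/G difference in column 3 occurs in a single lineage. A real analysis would additionally need a phylogeny to confirm that the duplications are independent and to polarize each substitution, which this sketch deliberately leaves out.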
Conclusions
The presented methodology provides a means to study the biochemical underpinnings of functional differentiation between paralogs. For instance, two abundantly repeated substitutions are identified between independently derived Sco1 and Sco2 paralogs. Substitutions identified in this way allow direct experimental testing of the role these residues play in the repeated functional differentiation. The present study uncovers a diverse set of families with recurrent sequence evolution and reveals trends in the functional and evolutionary trajectories of this hitherto understudied phenomenon.