Primate Speciation Links to Massive and Directional Shrinkage of the Dinucleotide Short Tandem Repeat Compartment.


The evolutionary trend of short tandem repeats (STRs) at the crossroads of speciation remains largely elusive and is attributed, for the most part, to random evolution. Here we investigated the dinucleotide STR compartment in primate speciation. We selected six species that shared sequential chronological ancestors (mouse, macaque, gorilla, chimpanzee, bonobo, and human) and collected three sets of data on the abundance of all classes of dinucleotide STRs (≥6-repeats) for three regions of every chromosome, each region spanning 10 Mb of DNA. In all three datasets, we found consistent directional shrinkage of the dinucleotide STR compartment in all the primate species selected vs. mouse, as follows: mouse>macaque>great apes. The >20-repeat STRs were the most significantly affected by this shrinkage. We propose that massive and directional shrinkage of the dinucleotide STR compartment had a decisive link with primate speciation. This is a prime instance of a massive directional STR trend across multiple speciation events.


Introduction
With an exceptionally high degree of polymorphism and plasticity, short tandem repeats (STRs) are a spectacular source of variation required for speciation and evolution 1-3. The impact of STRs on speciation is supported by their various functional implications, such as gene expression, alternative splicing, and translation 3-10.
While a limited number of studies indicate that purifying selection and drift can shape the structure of STRs at the inter- and intra-species levels 11-16, the global trend of STR evolution at the crossroads of primate speciation remains largely unknown.
The most common STRs in the human genome are dinucleotide repeats 17. Here we analyzed the evolutionary trend of this category of STRs in six selected species encompassing rodent, Old World monkey, and great apes.

Materials And Methods
Extraction of STRs from genomic sequences.
The abundance of dinucleotide STRs was studied in six selected species that shared sequential chronological ancestors: mouse, macaque, gorilla, chimpanzee, bonobo, and human. We designed a software package in the C# environment (Suppl. 1) (https://github.com/arabfard/Di-Finder). All possible dinucleotide motifs (AC, AG, AT, CA, CG, CT, GA, GC, GT, TA, TC, and TG) with ≥6 repeats were included (Suppl. 2). The program detected perfect (pure) STRs only. Using the REST API service 18 from Ensembl 101 (https://asia.ensembl.org) 19,20, data for three regions of every chromosome, each region spanning 10 megabases (Mb) of genomic DNA, were accessed in the six species (Fig. 1). In each chromosome, the STR abundances were calculated and compared on a chromosome-to-chromosome basis (Table 1, Suppl. 2). After collecting the entire data, we also divided the STRs by length into two classes, 6-20 repeats and >20 repeats, and studied their abundance in the selected species. Finally, the data of the selected regions of the chromosomes in the six species were aggregated and analyzed.
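The counting step described above can be illustrated with a minimal Python sketch. It is not taken from the authors' C# package (Di-Finder). It assumes that each of the 12 motifs is searched independently for perfect runs of at least six repeats, so one physical run such as ACAC... may also be counted under its shifted motif (CA). Whether Di-Finder treats such runs the same way is not stated in the text. The function name `count_dinucleotide_strs` and the class labels `short` (6-20 repeats) and `long` (>20 repeats) are illustrative.

```python
import re

# The 12 dinucleotide motifs listed in the Methods.
MOTIFS = ["AC", "AG", "AT", "CA", "CG", "CT",
          "GA", "GC", "GT", "TA", "TC", "TG"]

MIN_REPEATS = 6      # shortest run counted as an STR
LONG_THRESHOLD = 20  # runs with more repeats than this fall in the ">20" class


def count_dinucleotide_strs(sequence):
    """Count perfect dinucleotide STRs per motif and length class.

    Assumption (not confirmed by the paper): each motif is scanned on its
    own, and matches for one motif do not overlap each other.
    """
    seq = sequence.upper()
    counts = {motif: {"short": 0, "long": 0} for motif in MOTIFS}
    for motif in MOTIFS:
        # Greedy match of MIN_REPEATS or more uninterrupted copies of the motif.
        pattern = re.compile("(?:%s){%d,}" % (motif, MIN_REPEATS))
        for match in pattern.finditer(seq):
            repeats = len(match.group(0)) // 2
            length_class = "long" if repeats > LONG_THRESHOLD else "short"
            counts[motif][length_class] += 1
    return counts


# Toy sequence: one 6-repeat AC run, one 25-repeat AC run, one 5-repeat TG run.
toy = "GG" + "AC" * 6 + "TTT" + "AC" * 25 + "GG" + "TG" * 5
result = count_dinucleotide_strs(toy)
print(result["AC"], result["CA"], result["TG"])
```

In the toy sequence, the 25-repeat AC run also contains 24 consecutive CA copies, which is why the shifted motif registers a hit under the independent-scan assumption.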

Statistical analysis
The dinucleotide STR abundance trend in the six selected species was compared across datasets 1, 2, and 3 by correlation coefficient and repeated measurements analysis (Table 2). The Pearson correlation was conducted on the average of STR abundances, consisting of AC, AG, AT, CA, CG, CT, GA, GC, GT, TA, TC, and TG.
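For illustration, the Pearson correlation between two datasets can be computed as in the following Python sketch. The statistical software actually used is not named in the text, so this is only one way to carry out the calculation. The function name `pearson` and the example values are hypothetical placeholders, not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Placeholder per-species averages (one value per species, same species order
# in both lists). These numbers are invented for the example.
dataset_a = [90.0, 40.0, 22.0, 21.0, 20.0, 19.0]
dataset_b = [85.0, 43.0, 25.0, 20.0, 21.0, 18.0]
print(round(pearson(dataset_a, dataset_b), 3))
```

A coefficient close to 1 between two datasets would indicate that the species ranking by average STR abundance is reproduced across the sampled regions.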

Results
Directional shrinkage of all classes of dinucleotide STRs in primates vs. mouse.
In three independent analyses, we studied how dinucleotide STRs were distributed with respect to their abundance across mouse and primates. The observed trend was strikingly decremental in the five primates vs. mouse (Fig. 2). That trend was consistently replicated in datasets 1, 2, and 3 (p=0.80) (Table 2). The AC class of STRs showed the largest abundance difference in mouse vs. all primates, followed by TG dinucleotides.
The overall STR counts observed in macaque were intermediate between mouse and the great apes (Table 2, Fig. 3), which matched the phylogenetic distance of these species.
The >20-repeat dinucleotide STRs were more dramatically affected by shrinkage.
The >20-repeat compartment was the most dramatically affected by shrinkage in primates vs. mouse (Fig. 4). In mouse, this compartment was on average 10-fold larger than in the five selected primate species, whereas the 6-20 repeat compartment was 2-fold larger.
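The fold differences quoted above can be read as the mouse count divided by the mean count across the primate species. The Python sketch below uses that reading, which is an assumption, since the text does not say whether the average was taken over counts or over per-species ratios. The function name `fold_difference` and the numbers are placeholders chosen only to reproduce a 10-fold and a 2-fold ratio. They are not measured values.

```python
def fold_difference(reference_count, comparison_counts):
    """Ratio of a reference count to the mean of the comparison counts."""
    if not comparison_counts:
        raise ValueError("comparison_counts must not be empty")
    mean_comparison = sum(comparison_counts) / len(comparison_counts)
    if mean_comparison == 0:
        raise ValueError("mean of comparison counts is zero")
    return reference_count / mean_comparison


# Placeholder counts: mouse vs. five primates, for each length class.
long_fold = fold_difference(100, [10, 10, 10, 10, 10])    # ">20" class
short_fold = fold_difference(400, [200, 200, 200, 200, 200])  # "6-20" class
print(long_fold, short_fold)
```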

Discussion
It is largely unknown whether at the crossroads of speciation, STRs evolved as a result of purifying selection, genetic drift, and/or in a directional manner. Our approach to deciphering the above was simple, and yet robust. We selected species that shared sequential chronological ancestors, and investigated all possible dinucleotide STRs of all possible lengths ≥6-repeats. Our analysis revealed a directional trend in the abundance of dinucleotide STRs during primate speciation.
Dinucleotide STRs are the most abundant class of STRs in the vertebrate genomes, and their global pattern of abundance may shed light on a vastly unknown aspect of evolutionary biology. The replicated trends observed in our three datasets seem to be independent of the genome size of the selected species.
Mouse has the highest abundance of STRs in comparison to the five selected primate species, and yet its genome is smaller than all those species. This finding is in line with the previous reports of lack of relationship between genome size and abundance of STRs 21,22.
An alternative hypothesis to the directional shrinkage of the dinucleotide STR compartment in primates is that this compartment has actually expanded in mouse. However, the observation that the overall STR counts in macaque were intermediate between mouse and the great ape species studied (rodent>Old World monkey>great apes) (Fig. 5) supports the shrinkage hypothesis for at least part of the trend observed in our study. Indeed, while both hypotheses may stand in evolutionary terms, directional shrinkage is a likely explanation based on the species selected. It is possible that there is a mathematical threshold required for the abundance of STRs in various species. This is in line with the hypothesis that STRs function as scaffolds for biological computers 23. In mouse, the overall number of dinucleotide STRs never declined to the numbers observed in the five primates.
Certain STRs located in the core promoters have been subject to contraction in the process of human and non-human primate evolution 24. A number of those STRs are identical in formula in primates vs. nonprimates, and the genes linked to those STRs are involved in characteristics that have diverged primates from other mammals, such as craniofacial development, neurogenesis, and spine morphogenesis. It is likely that those STRs functioned as evolutionary switch codes for primate speciation. In line with the above, structural variants are enriched near genes that diverged in expression across great apes 25,26. It is speculated that STR variants are more likely than single-nucleotide variants to have epistatic interactions, which can have significant consequences in complex traits, in humans as well as model organisms 27,28. Future studies such as large-scale genome-editing of STRs 29 in embryonic stem cells and investigation of their differentiation into various cell lineages may be candidate approaches to investigate how the massive and dramatically diverged trend of dinucleotide STRs links to primate speciation and evolution.

Limitations
Although this study covered over 600 Mb of genomic DNA in each of the six selected species, it can be extended to the entire genome and additional species.

Conclusion
In conclusion, we propose that massive and directional shrinkage of the dinucleotide STR compartment links to, and probably has a determining impact on, primate speciation. This is a prime instance of a directional STR trend across multiple speciation events.

Tables
Table 1: Chromosome-by-chromosome study of dinucleotide STR abundance (count) exemplified in Dataset 2*.