Evolution of Old World monkeys and great apes links to massive and directional shrinkage of the dinucleotide short tandem repeat compartment


Background: The evolutionary trend of short tandem repeats (STRs) at the crossroads of speciation remains largely elusive and has mostly been attributed to random evolution. To explore this trend, we selected nine species that share sequential chronological ancestors (rat, mouse, olive baboon, gelada, macaque, gorilla, chimpanzee, bonobo, and human) and collected three datasets on the abundance of all classes of dinucleotide STRs (≥6 repeats) from three regions of every chromosome, each region spanning 10 Mb of DNA. Results: In all three datasets, we found directional shrinkage of the dinucleotide STR compartment as follows: rodents > Old World monkeys > great apes (P < 0.001). The decremented gradient observed for the dinucleotide STRs was not detected for a number of other classes of STRs, such as mono- and trinucleotide STRs. Conclusion: We report the first instance of a massive and directional gradient of STRs, which may link with the evolution of Old World monkeys and great apes.

While a limited number of studies indicate that purifying selection and drift can shape the structure of STRs at the inter- and intra-species levels [11][12][13][14][15][16], the global trend of STR evolution at the crossroads of primate speciation remains largely unknown.
The most common STRs in the human genome are dinucleotide repeats 17. Here we analyzed the evolutionary trend of this category of STRs in nine selected species encompassing rodents, Old World monkeys, and great apes.

Materials And Methods
Extraction of STRs from genomic sequences.
We wrote a program to detect perfect (pure) STRs. Using the REST API service 18 of Ensembl 101 (https://asia.ensembl.org) 19,20, we accessed three arbitrary regions of every chromosome, each spanning 10 megabases (Mb) of genomic DNA, in the nine species (Fig. 1). For each chromosome, STR abundances were calculated and compared on a chromosome-to-chromosome basis (Suppl. 2). After collecting the entire dataset, we also divided the STRs by length into two classes, 6-20 repeats and >20 repeats, and studied their abundance in the selected species. Finally, the data from the selected chromosomal regions in the nine species were aggregated and analyzed.
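The detection step described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function names `find_dinucleotide_strs` and `length_classes` are hypothetical, the sequence is assumed to be already downloaded as a plain string, and each tract is assumed to be counted once, labeled by the phase in which it is first encountered when scanning left to right (so an AC/CA tract is reported as either AC or CA, not both).

```python
import re

# A perfect dinucleotide STR: two different bases (XY) repeated >= 6 times.
# (?!\1) rejects homopolymer "motifs" such as AA; (?:\1\2){5,} requires at
# least five further copies after the first one, i.e. six repeats in total.
DINUC_STR = re.compile(r"([ACGT])(?!\1)([ACGT])(?:\1\2){5,}")

def find_dinucleotide_strs(sequence):
    """Return a list of (motif, repeat_count) for perfect tracts of >= 6 repeats."""
    seq = sequence.upper()  # tolerate soft-masked (lowercase) input
    tracts = []
    for match in DINUC_STR.finditer(seq):
        motif = match.group(1) + match.group(2)
        repeats = len(match.group(0)) // 2
        tracts.append((motif, repeats))
    return tracts

def length_classes(tracts):
    """Split tracts into the two length classes used in the text."""
    classes = {"6-20": 0, ">20": 0}
    for _motif, repeats in tracts:
        classes["6-20" if repeats <= 20 else ">20"] += 1
    return classes

if __name__ == "__main__":
    toy = "TT" + "AC" * 6 + "GGG" + "GT" * 25 + "A" + "CA" * 5  # made-up sequence
    found = find_dinucleotide_strs(toy)
    print(found)
    print(length_classes(found))
```

Scanning with a single back-referencing pattern avoids counting the same tract twice under two motif names; a per-motif count would need a separate pattern for each of the 12 non-homopolymer dinucleotides.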
Additionally, to compare the trend of dinucleotide STRs with other classes of STRs, we used the STR-Finder tool to screen the selected regions and species for mononucleotide STRs (T, G, A, and C) of ≥6 repeats and trinucleotide STRs (GCC/GGC) of ≥6 repeats.
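For illustration only, the same kind of screen for the mono- and trinucleotide classes can be written with two regular expressions. This sketch is not the STR-Finder tool and makes no claim about how that tool works internally; the function name `count_mono_and_tri` is hypothetical, and only perfect tracts on the given strand are counted.

```python
import re

# Mononucleotide tract: one base repeated >= 6 times.
MONO_STR = re.compile(r"([ACGT])\1{5,}")
# Trinucleotide tract: GCC or GGC repeated >= 6 times.
TRI_STR = re.compile(r"(?:GCC){6,}|(?:GGC){6,}")

def count_mono_and_tri(sequence):
    """Return ({base: tract_count}, trinucleotide_tract_count)."""
    seq = sequence.upper()
    mono_counts = {}
    for match in MONO_STR.finditer(seq):
        base = match.group(1)
        mono_counts[base] = mono_counts.get(base, 0) + 1
    tri_count = sum(1 for _ in TRI_STR.finditer(seq))
    return mono_counts, tri_count

if __name__ == "__main__":
    toy = "A" * 6 + "C" + "T" * 5 + "G" + "GCC" * 6 + "T" + "GGC" * 7  # made-up
    print(count_mono_and_tri(toy))
```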

Statistical analysis
The dinucleotide STR abundance trend in the nine selected species was compared across datasets 1, 2, and 3 by correlation coefficient and repeated-measures analysis (Table 1).

Within-class and between-class comparisons were analyzed using one-way and two-way ANOVA tests. These analyses were confirmed by nonparametric tests.
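As a reminder of what a one-way ANOVA computes, the F statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below works this out in plain Python; the function name `one_way_anova_f` is hypothetical, the numbers are made up for arithmetic clarity and are not STR abundances from this study, and a p-value would additionally require the F distribution (available in standard statistics packages).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

if __name__ == "__main__":
    # Three made-up groups with means 2, 3, and 4.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

For the toy groups, the between-group sum of squares is 6 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 3.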

Results
Overall directional shrinkage of the dinucleotide STR compartment in Old World monkeys and great apes vs. rodents.
In three independent analyses, we studied the distribution of dinucleotide STRs with respect to their abundance across rodents and primates. The observed trend was strikingly decremental, as follows: rodents > Old World monkeys > great apes, P < 0.001 (Table 1, Fig. 2). That trend was replicated in datasets 1, 2, and 3 (P = 0.80).
Differential gradient of dinucleotide STRs based on their length.
The directional gradient, rodents > Old World monkeys > great apes, lay predominantly in the dinucleotide STRs of 6-20 repeats (Fig. 3). While the >20-repeat compartment was the most dramatically affected by shrinkage in primates vs. rodents, this compartment was the largest in human in comparison with the remaining six primate species studied.
Differential gradient of STR classes in rodents vs. Old World monkeys vs. great apes.
To examine whether the observed trend in dinucleotide STRs can be generalized to other STR classes, we analyzed the abundance trend of mononucleotide STRs (G, A, T, and C) of ≥6 repeats (Fig. 4) and trinucleotide STRs (GGC/GCC) of ≥6 repeats (Fig. 5) in the selected species. While the trend in dinucleotide STRs was a decremented gradient (rodents > Old World monkeys > great apes), a similar trend was not detected in the mononucleotide (Fig. 4) or trinucleotide (Fig. 5) STR compartments. In fact, the dramatic excess of the dinucleotide compartment observed in rodents was not observed for the mono- and trinucleotide STRs.

Discussion
It is largely unknown whether at the crossroads of speciation, STRs evolved as a result of purifying selection, genetic drift, and/or in a directional manner. In an attempt to resolve part of the picture, we selected multiple species that shared sequential chronological ancestors, and investigated all possible dinucleotide STRs of all possible lengths (≥ 6-repeats). Our analysis revealed an overall directional gradient in the abundance of dinucleotide STRs during primate speciation, evidenced by the following trend: rodents > Old World monkeys > great apes (Fig. 6).
Dinucleotide STRs are the most abundant class of STRs in vertebrate genomes, and their global pattern of abundance may shed light on a vastly unknown aspect of evolutionary biology. The replicated trends observed in our three datasets seem to be independent of the genome size of the selected species. Mouse and rat have the highest abundance of dinucleotide STRs in comparison with the seven selected primate species, and yet their genomes are smaller than those of the primates. This finding is in line with previous reports of a lack of relationship between genome size and abundance of STRs 21,22.
An alternative hypothesis to the directional shrinkage of the dinucleotide STR compartment in primates vs. rodents is that this compartment has expanded excessively in rodents. Indeed, rodent genomes appear to be significantly rich in STRs in comparison with several other mammals 11. However, our present findings indicate that this property cannot be generalized to all classes of STRs, as at least the mono- and trinucleotide STRs did not show the dramatic excess observed in the dinucleotide compartment in rodents. Furthermore, the decremented gradient of the dinucleotide STR compartment in the order rodents > Old World monkeys > great apes supports the shrinkage hypothesis for the dinucleotide compartment.
It is possible that there is a mathematical threshold required for the abundance of STRs in various orders of species (Fig. 6). This is in line with the hypothesis that STRs function as scaffolds for biological computers 23 .
Certain STRs located in protein-coding gene core promoters have been subject to contraction in the process of human and non-human primate evolution 24. A number of those STRs are identical in formula in primates vs. non-primates, and the genes linked to those STRs are involved in characteristics that distinguish primates from other mammals, such as craniofacial development, neurogenesis, and spine morphogenesis. It is likely that those STRs functioned as evolutionary switch codes for primate speciation. In line with the above, structural variants are enriched near genes that diverged in expression across great apes 25,26. It is speculated that STR variants are more likely than single-nucleotide variants to have epistatic interactions, which can have significant consequences in complex traits, in human as well as in model organisms 27,28. Future studies, such as large-scale genome editing of STRs 29 in embryonic stem cells and investigation of their differentiation into various cell lineages, may be candidate approaches to investigate how the massive and dramatically diverged trend of dinucleotide STRs links to primate speciation and evolution.

Conclusion
In conclusion, we propose that massive and directional shrinkage of the dinucleotide STR compartment links to, and probably had a determining impact on, primate speciation. This is a prime instance of a non-random STR gradient across multiple speciation events.

Authors' contributions
MA collected data and performed the bioinformatic analyses. MS performed the biostatistics analyses. IA contributed to data collection. MO conceived, designed, and supervised the project, and wrote the manuscript.

Figure 1
Schematic representation of data collection of the mono-, di-, and trinucleotide STR compartments. All chromosomes were screened across the nine selected species in three datasets (1, 2, and 3). Only one chromosome is depicted as an example.

Figure 2
Massive directional shrinkage of the dinucleotide STR compartment in primates, replicated in datasets 1, 2, and 3.

Figure 3
Abundance of various dinucleotide STR lengths across the nine selected species. The directional decremented gradient lay predominantly in the 6-20 repeat compartment. While the >20-repeat compartment was the most dramatically affected by shrinkage in primates in comparison with rodents, human had the highest abundance of >20-repeat STRs in comparison with all other primates studied.

Figure 4
Mononucleotide STR trend across rodents and primates. In contrast to the dinucleotide compartment, the trend in the mononucleotide STRs was not decremented in primates vs. mouse.

Figure 5
Trinucleotide STR trend across rodents and primates. In contrast to the dinucleotide compartment, the trend in the trinucleotide STRs was not decremented in primates vs. mouse.