Simple sequence repeats in Dodonaea viscosa
To evaluate assembly quality and develop new molecular markers, the 646,428 generated sequences were examined for potential SSRs. A total of 42,638 sequences contained 49,836 microsatellites, and 4,927 sequences contained more than one SSR. Among the 49,836 SSRs, which comprised various repeat types (Table 2), mononucleotide repeats were the most abundant, accounting for 40.77% (20,320), followed by di-nucleotide (31.39%; 15,644) and tri-nucleotide repeats (25.98%; 12,949). Tetra-, penta- and hexa-nucleotide repeats accounted for 1.14% (568), 0.21% (102) and 0.51% (253) of all SSRs, respectively. Most SSRs had 5 to 13 repeat units; the most frequent classes were ten repeat units (27.70%; 13,804), six repeat units (18.88%; 9,407) and five repeat units (14.77%; 7,363) (Table 2). Among the di-nucleotide SSRs, AG/CT was the most abundant motif (49.53%; 7,749), followed by AT/AT (30%; 4,693) and AC/GT (20.37%; 3,186). GC/CG di-nucleotide repeats were extremely scarce, accounting for only 0.1% (16) (Fig. 2A). Among the tri-nucleotide repeats, AAG/CTT was the most frequent (47.14%; 6,104), followed by ATC/ATG (15.04%; 1,948) and AAT/ATT (9.84%; 1,274) (Fig. 2B). ACAT/ATGT and AAAT/ATTT accounted for 32.04% (182) and 27.82% (158) of all tetra-nucleotide repeats, respectively (Fig. 2C). Penta- and hexa-nucleotide repeat motifs were the least abundant classes (102 and 253 SSRs, respectively).
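As an illustration of how perfect SSRs of each repeat class can be located in a sequence, the following minimal sketch scans for motifs of 1 to 6 bases. The minimum repeat thresholds and the function names are assumptions chosen for illustration; the text does not state which tool or thresholds were used for the search.

```python
import re

# Minimum number of repeat units per motif length. These thresholds are an
# assumption (commonly used defaults), not values stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
                pos = m.end()
            else:
                # e.g. 'AA' inside a mononucleotide run: step forward and retry
                pos = m.start() + 1
    return sorted(hits)


# Toy sequence (hypothetical, for demonstration only).
toy = "GGC" + "AG" * 8 + "TTC" + "A" * 12 + "CT" + "AAG" * 5 + "C"
for start, motif, n in find_ssrs(toy):
    print(start, motif, n)
```

On the toy sequence this reports one di-, one mono- and one tri-nucleotide repeat. Complementary motifs (for example AG and CT) would still need to be pooled to obtain classes such as AG/CT.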
Microsatellite validation, polymorphism assessment, and genetic diversity and structure
Eighteen primers were used to screen for polymorphism among 92 individuals (Table 3). Of the 18 loci, 12 were polymorphic, with the number of alleles varying from 2 to 11 (average 3.31), and six yielded monomorphic products. After excluding the six monomorphic markers and two markers with low PIC values, ten markers were used to assess genetic diversity (Table 4). These 10 SSR markers produced clear amplification products, generating a total of 295 alleles across 92 individuals from eight D. viscosa populations. Among the 10 loci, the number of observed alleles (Na) per locus ranged from 2.250 to 5.250, with a mean of 3.688; Na was lowest at Dodo 001 and Dodo 026 (2.250) and highest at Dodo 032. The number of effective alleles (Ne) per locus ranged from 1.360 to 3.564, with an average of 2.420. The Shannon index ranged from 0.402 to 1.375, with an average of 0.929. The average expected and observed heterozygosities were 0.519 and 0.508, respectively. Ho ranged from 0.119 (Dodo 001) to 0.982 (Dodo 004), and He ranged from 0.227 (Dodo 001) to 0.691 (Dodo 032) (Table 4).
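For readers unfamiliar with these statistics, the sketch below shows the textbook definitions of Na, Ne, the Shannon index (I), He and Ho for a single locus in a single population. The function names are hypothetical, and it is an assumption that the software used by the authors applied these uncorrected formulas (some programs apply a sample-size correction to He).

```python
import math


def diversity_stats(freqs):
    """Per-locus statistics from allele frequencies within one population."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)  # expected homozygosity
    return {
        "Na": sum(1 for p in freqs if p > 0),               # observed alleles
        "Ne": 1.0 / homozygosity,                            # effective alleles
        "I": -sum(p * math.log(p) for p in freqs if p > 0),  # Shannon index
        "He": 1.0 - homozygosity,                            # expected heterozygosity
    }


def observed_heterozygosity(genotypes):
    """Fraction of heterozygous individuals; genotypes are (allele, allele) pairs."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


# Hypothetical allele frequencies and genotypes, for demonstration only.
stats = diversity_stats([0.5, 0.25, 0.25])
ho = observed_heterozygosity([(1, 2), (1, 1), (2, 3), (3, 3)])
print(stats, ho)
```

Because Ne weights alleles by frequency, it is always less than or equal to Na, which is consistent with the mean Ne (2.420) being lower than the mean Na (3.688) reported above.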
Of the ten polymorphic loci, Dodo 001 showed the highest genetic differentiation (FST = 0.528) and Dodo 004 the lowest (FST = 0.109); the overall FST was 0.221. Gene flow was highest at Dodo 026 (Nm = 2.200) and lowest at Dodo 001 (Nm = 0.223), and the average across the 10 loci was 1.174 (Table 4). PIC values ranged from 0.383 to 0.805, with an average of 0.605. Dodo 020, Dodo 032, Dodo 010 and Dodo 004 were the most informative loci, with PIC values of 0.805, 0.799, 0.741 and 0.718, respectively. Significant deviations from Hardy-Weinberg equilibrium (HWE) were observed at all loci (P < 0.001) except Dodo 002, Dodo 026 and Dodo 032.
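The relationship between FST and Nm, and the definition of PIC, can be sketched as follows. This assumes the widely used island-model approximation Nm = (1 - FST) / (4 FST) and the standard PIC formula for codominant markers; the text does not state which formulas were applied, although the approximation does reproduce the Nm reported for Dodo 001. The function names are hypothetical.

```python
def gene_flow_from_fst(fst):
    """Indirect gene-flow estimate Nm = (1 - FST) / (4 * FST) (assumed formula)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


def pic(freqs):
    """Polymorphism information content from allele frequencies at one locus."""
    n = len(freqs)
    sum_sq = sum(p * p for p in freqs)
    # Correction term: 2 * p_i^2 * p_j^2 summed over all allele pairs i < j.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - sum_sq - cross


# FST reported for Dodo 001 in the text.
print(round(gene_flow_from_fst(0.528), 3))  # 0.223

# Hypothetical two-allele locus with equal frequencies.
print(pic([0.5, 0.5]))  # 0.375
```

Under this approximation, Nm below 1 (as at Dodo 001) corresponds to FST above 0.2.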
Variation parameters for each population are presented in Table 1. The number of alleles per locus (Na) ranged from 2.700 (Vuria 1) to 4.600 (Yale), averaging 3.69 alleles per population. The mean number of effective alleles per locus (Ne) ranged from 1.973 in Vuria 1 to 2.806 in Yale. The mean observed (Ho) and expected (He) heterozygosities across all populations were 0.519 and 0.508, respectively. Ho ranged from 0.440 (Ngangao 2) to 0.578 (Mbololo), while He ranged from 0.443 (Vuria 1) to 0.582 (Vuria 2). Significant inbreeding coefficients (FIS) were detected in all populations, varying from 0.747 to 1.105.
The AMOVA (Table 5) of the eight populations showed that 78% of the total variance occurred within populations, 20% among populations and only 2% among individuals. STRUCTURE analysis identified three groups among the eight populations (Fig. 3A; ΔK peaked at K = 3). The two Mt. Kenya populations (Mt. Kenya 1 and Mt. Kenya 2) and the Mbololo population were genetically similar and formed one cluster; the Ngangao 1, Vuria 1 and Yale populations formed a second cluster; and some individuals from Vuria 2 and Ngangao 2 formed the third. The Ngangao 2 population showed a high level of admixture, and Vuria 2 and Yale also had some individuals assigned to different clusters. UPGMA (Unweighted Pair Group Method with Arithmetic Mean) cluster analysis of population genetic distances, performed in MEGA, gave results congruent with the STRUCTURE analysis (Fig. 3C). PCoA supported the STRUCTURE and UPGMA analyses, indicating that the populations could be separated into three groups (data not shown).
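The ΔK criterion used to choose the number of clusters can be sketched as below. This is a simplified version that works on the mean log-likelihood per K across replicate runs, divided by the standard deviation at that K; the log-likelihood values are invented solely to demonstrate the arithmetic and are not the study's STRUCTURE output. The function name is hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Delta K for each interior K.

    lnp_runs maps K -> list of log-likelihood values from replicate runs.
    Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using mean L per K.
    """
    avg = {k: mean(v) for k, v in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in avg and k + 1 in avg:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_diff / stdev(lnp_runs[k])
    return result


# Invented log-likelihoods (three replicate runs per K), for illustration only.
toy_runs = {
    1: [-3000, -3002, -2998],
    2: [-2800, -2803, -2797],
    3: [-2500, -2502, -2498],
    4: [-2480, -2490, -2470],
    5: [-2475, -2495, -2455],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this toy example the likelihood gain levels off after K = 3, so ΔK is largest there. ΔK cannot be evaluated for the smallest or largest K tested, because it needs both neighbours.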
Bottleneck tests showed that three populations (Ngangao 1, Vuria 1 and Mt. Kenya 2) displayed shifted modes (Table 6). The Vuria 1 population showed a significant bottleneck under the IAM (P < 0.05). However, Wilcoxon signed-rank tests under the TPM and SMM revealed no significant population bottlenecks at the P < 0.05 level.
Species distribution modeling
Maxent models showed high predictive performance under all scenarios (current = 0.905, LGM = 0.889, RCP 4.5 = 0.923, RCP 8.5 = 0.921) (Table 7). The most important variables limiting the distribution of D. viscosa during the LGM were elevation (relative contribution: 42.7%), temperature seasonality (Bio4; 17.8%), mean temperature of the wettest quarter (Bio8; 12.0%) and precipitation of the driest month (Bio14; 11.2%). For the current and future models, elevation, Bio4, Bio8 and Bio18 (precipitation of the warmest quarter) were the most important limiting factors (Table 8). The past and future range simulations were fairly consistent with the current potential distribution and covered areas in Kenya and Tanzania, with central and western Kenya and northern and southwestern Tanzania having the most favorable conditions (Fig. 4). Similarly high habitat suitability was observed along the Indian Ocean coast of Kenya and Tanzania. Under LGM conditions, the models indicated range expansion and a significant increase in highly suitable areas for D. viscosa relative to the future projections. Finally, of the future climate projections, the pessimistic scenario (RCP 8.5) was the more favorable for D. viscosa, indicating potential range expansion by 2070 relative to the intermediate scenario (RCP 4.5). Highly suitable ranges in Kenya and Tanzania were significantly more reduced under RCP 4.5 than under RCP 8.5. Specifically, areas in southern Tanzania and eastern Uganda were predicted to be suitable habitat under RCP 8.5.
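The text does not name the performance statistic reported for each scenario. Assuming it is the area under the ROC curve (AUC), which is commonly reported for Maxent models, a minimal sketch of how such a score is obtained from model suitability values is given below. The function name and the scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Probability that a presence site scores higher than a background site.

    Ties count as half a win (the rank-based definition of AUC).
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores, for demonstration only.
presence = [0.9, 0.8, 0.7]
background = [0.1, 0.2, 0.75]
print(auc(presence, background))
```

A value of 0.5 corresponds to a model that ranks presence and background sites no better than chance, and 1.0 to perfect separation.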