Development and Utilization of Microsatellite Markers to Assess Genetic Variation Coupled with Modelling Range Shifts of Dodonaea Viscosa in Isolated Taita Hills and Mount Kenya Forests

For the protection and maintenance of fragmented and highly disturbed habitats, understanding genetic variation is essential. The Taita Hills of Kenya is the northernmost part of the Eastern Arc Mountains and has been identied as one of the top 10 biodiversity hotspots globally. The current forests in the Taita Hills have been highly fragmented over the past century. In order to appraise the inuence of anthropological disturbance and fragmentation on the genetic variation of Dodonaea viscosa (Sapindaceae), we studied its preliminary genetic variability and population structure using newly developed microsatellite (SSR) markers, combined with ecological niche modeling analyses. We utilized the Illumina paired-end technology to sequence the D. viscosa’s genome and developed its microsatellite markers. In total, 646,428 sequences were analyzed and 49,836 SSRs were identied from 42,638 sequences. A total of 18 primer pairs were designed to test polymorphism among 92 individuals across eight populations. The average observed heterozygosity and expected heterozygosity ranged from 0.119 to 0.982 and from 0.227 to 0.691, respectively. Analysis of molecular variance (AMOVA) revealed 78% variance within individuals and only 20% among the eight populations. According to SDM results, D. viscosa’s suitable habitats have been gradually reducing since the last glacial maximum (LGM), and the situation will worsen under the extreme pessimist scenario of RCP 8.5. Moreover, genetic diversity was signicantly greater in larger fragments. Therefore, urgent conservation management of smaller fragmented patches is necessary to protect this disturbed region and maintain the genetic resources.


Introduction
Forest fragmentation and land destruction lead to substantial alterations in habitat qualities and biochemical conditions in the remaining forest portions [1], causing adverse effects on reproductive systems, biological interactions, seed dispersal, and individual tness in altered landscapes. The disjointed habitats interrupt gene migration among populations and lead to the emergence of small remnant and isolated populations [2][3][4] which have high extinction risks as a result of genetic drift, natural catastrophes, environmental and demographic stochasticity. High rates of allelic drift, reduced genetic connectivity, and high inbreeding may result from devastating effects of small and isolated populations in the discontinuous landscape [5][6][7]. As a result of genetic drift increase in genetic differentiation occurs, resulting in erosion of genetic variation among the populations, gradually lowering the species tness and populations adaptability when coping with varying environmental conditions. The gradual effects of recent anthropogenic activities and habitat fragmentation drastically affects plant species' genetic composition among the isolated forest patches [7][8][9].
The Taita Hills (Kenya) is among the top 10 global biodiversity hotspots that has suffered a greater forest loss during the past several years, thus categorized among the most threatened locations [10], characterized by the fragmented and disturbed ecosystem. It is dominated by major valuable plant species such as Aningeria spp, Strombosia spp, and Dodonaea viscosa, boasting rich diversity and endemic plants and animals [11]. The major driver for the current distribution of plants in the area is commonly thought to be climatic oscillations and geographical shifts associated with the quaternary [12]. The Taita Hills region is characterized by favorable climate and fertile soils encouraging massive clearance of natural forests by local inhabitants for farming [13]. As a result, the forest remnants have undergone fragmentation and currently surrounded by cultivated farmlands for over a century [14,13,15]. The existing Taita Hills landscape comprises of three distinct segregates [16], Sagala Hill separated from the rest by a valley, Mbololo Hill disconnected by a valley from the last isolate Dabida Hill which contains (Ngangao, Vuria and Yale Hills). The Taita Hills forests share vital plant species despite the relatively enormous barriers between the different mountain crests, with slopes and peaks of Mount Kenya, Nyambeni, Mau, Aberdares, Cherangani Hills, Mukinduri and Tinderet Peak [17]. Historical or recent isolation may have resulted in high variation in the Taita Hills shaping its ora when compared with other mountains [13,15]. In recent times, the seed dispersal e ciency of different plants in these destroyed and isolated forest fragments has declined [18,15], and these species may be facing di culties in keeping steady populations in entire natural forest remnants. This signi es that some forest patches are at high risk of extinction [19,20].
The East African Rift-Valley system plays a vital role in disrupting gene ow, although only limited studies have examined the genetic variation of plant species that are widely distributed [21]. In two Acacia species, the genetic diversity was found to be high. However, the genetic differentiation among populations was lower in A. senegal than A. mellifera, suggesting that in A. senegal [22], gene ow barriers have less impact on shaping gene ow. In Prunus africana, the East African Rift System played a major role as a gene ow barrier [21], similar to the study on A. mellifera in tropical Africa [23,24]. In Kenya, D. viscosa is distributed from coastal parts like Taita hills (Voi) to northern areas: Rift valley, Nairobi, Nanyuki, and Kajiado [25]. This species has dense or erect, multi-stemmed shrub with variable leaf shapes and the fruit is a capsule with 3-4 wings and bears the spreading ability. Moreover, it is found in different habitats such as desert gullies, arid shrub lands and temperate woodlands [26]. Therefore, morphological differences and ecological valence in this crucial species make it a perfect model to examine its adaptability to the surrounding conditions and divergence [27,26].
Our current knowledge of the genetic effects of forest fragmentation and land destruction in Kenya's isolated forests is limited. The Taita Hills being a top 10 global biodiversity hotspot area and center of species endemism offers a unique opportunity to study how climatic oscillations in uence patterns of differentiation and diversity. Furthermore, Dodonaea viscosa species yielded large amounts of the most biomass nitrogen in this region, therefore can be use in reforestation, reclamation of marshy and degraded lands, and as a soil stabilizer. Thus we utilized D. viscosa to ll this knowledge gap. Besides, due to the absence of reliable molecular markers in D. viscosa, only its genomic resources are available in public database, we rstly developed the SSR markers for this species using Nextgeneration sequencing (NGS) technology. Secondly, we used some of these newly developed polymorphic loci to test the genetic variation of D. viscosa in the remaining forest patches. Lastly, we inferred the potential global climate change in uence on the distribution of D. viscosa in the Last Glacial Maximum (LGM), present and the future. Our results will help understand adaptive variation, population structure, gene ow, genetic diversity, and potential refugia in the LGM and provide information for designing effective conservation measures in various fragmented mountain massifs found in east Africa.

Plant sampling
A total of 92 individuals were sampled from eight populations in two different distribution zones in Kenya, including Taita hills in Voi and Mt. Kenya (Table 1; Fig. 1). Global positioning system (GPS) was used to record the geographical locations of each population. Leaf samples were collected randomly in the eld, dried in silica gel, and later transported to the Chinese Academy of Sciences (Wuhan, China). A copy of voucher specimens were deposited in the National museums of Kenya and Herbarium of Wuhan Botanical Garden.
DNA isolation, library preparation, sequencing, and primer design DNA was extracted from silica gel dried leaf material of D. viscosa using the cetyltrimethylammonium bromide (CTAB) method [28]. The Qubit DNA Assay kit the Qubit 2.0 Fluorometer (Life Technologies, San Diego, CA) was used to verify the purity and concentration of the DNA. The quality and concentration of the DNA were checked using Qubit DNA Assay kit in Qubit 2.0 Fluorometer (Life Technologies, San Diego, CA). Three samples with high-quality DNA were selected for Illumina paired-end library construction. About 4 ug DNA per sample were sheared using a Covaris S220 (Massachusetts, USA) to generate > 750 bp fragments, following the Illumina TruSeq DNA Library preparation Kit (Illumina, California, USA) and using a multiplex identi er index. Library sequencing with 150 bp paired-end reads was conducted using Illumina HiSeq 2000 Platform at Novogene Bioinformatics Technology Co. Ltd (Beijing, China). The generated raw reads were subjected to a stringent ltering process. The following read sequences were discarded prior to assembly; reads with base pairs exceeding 10% with low-quality score Q < 20, ambiguous paired reads with N content exceeding 10 % and other contaminants. De novo assembly was conducted using the de Bruijin graph-based Velvet 2.0 [29] software. MIcroSAtellite identi cation tool (MISA; http://pgrc.ipkgatersleben.de/misa/) was used to scrutinize the generated sequences for the availability of Simple Sequence repeats. Minimum of 10, 5, 4, iterations were considered for Mono, Di-and Trinucleotides, respectively, whereas a minimum of 3 iterations was considered for both Penta and Hexa-nucleotides. We used Primer3 [30] software at default settings to design primers.

Primer validation
Using genomic DNA isolated from six individuals, a total of 25 SSR primers were chosen at random and tested for ampli cation and speci city. Polymerase Chain Reaction (PCR) was performed in a 20 µL reaction mixture containing; 11.3 µL ddH 2 O, 0.2 µL Taq Polymerase enzyme, 0.5 µL dNTPs, 1 µL Forward primer and 1 µL Reverse primer, 2.5 µL Taq Buffer, and 2 µL of Genomic DNA. PCR ampli cation conditions were as follows: an initial denaturation step of 4 minutes at 94°C, followed by 30 cycles of denaturation for 40 sec at 94°C, annealing for 30 s at 52°C -54°C, and extension of 40 s at 72°C; then a nal extension step of 10 min at 72°C. 2 % Agarose gel was used to quantify the PCR products. We selected 18 primers that produced clearly de ned bands, labeled the 5' end of the forward sequence with the 6-FAM uorescent dye, and then performed genotyping using those markers. The STR sequences were analyzed using GeneMarker software (Soft Genetics).

Data analysis
The deviations from Hardy-Weinberg equilibrium (HWE) with Bonferroni corrections and the polymorphic information content (PIC) for each locus and were checked in Cervus 3.

Species distribution modeling
MAXENT v.3.3.3 was used to perform species distribution modeling of D. viscosa to predict its potential distribution in East Africa [40]. We obtained 19 bioclimatic variables in a resolution of 2.5 arcsec (ca. 5-km2) for the current and past (Last Glacial Maximum, LGM) scenarios from WorldClim1.4 (http://www.worldclim.org/, [41]). An additional elevation raster was also obtained from the Worldclim database and included as an environmental variable. Since collinearity may cause uncertainties and lower the statistical power of models [42, 43], a spatial correlation test was implemented on the bioclimatic variables. The test was performed to remove one variable in a pair of correlated variables at a threshold of 0.8 [44], and was implemented in ENMTools package in R, using the function raster.cor.matrix [45]. Eventually, seven variables were selected as representative of climate factors (Table 7).
To assess the probable future distribution in D. viscosa ranges, we utilized two representative concentration pathway The models of the LGM, present, and future distribution were created using 67 locations obtained from the Global Biodiversity Information Facility (GBIF) and 8 actual occurrences from our eld study that were linked with an accurate GPS mapping. Maxent models were run with default parameters except the 75% random test percentage, 5000 background iterations, and 10 subsample replicates. We selected the logistic output format for the three periods, current, LGM, and future, and using this format, a map of continuous probability values ranging from 0 to 1 was generated. ArcMap v10.5 (ESRI, Redlands, CA) was used to visualize the output logistic predictions.

Simple sequence repeats in the Dodonaea viscosa
To evaluate the assembly quality and develop new molecular markers, the generated 646,428 sequences were examined for potential SSRs. A total of 42,638 sequences were found to contain 49,836 microsatellites. 4,927 sequences contained multiple SSRs. Among the 49,836 SSRs consisting various repeat types (Table 2), mononucleotide repeats were the most abundant accounting for 40.77% (20,320) followed by Di-, 31.39 % (15,644), and tri-, 25.98 % (12,949), while the rest, tetra, penta-and hexa-nucleotide repeat units accounted for 1.14 % (568), 0.21 % (102) and 0.51 % (253) of all the SSRs respectively. Most SSRs repeat units ranged from 5 to 13, with ve repeat units 14.77 % (7,363) six repeat units 18.88 % (9,407) and ten repeat units 27.70 % (13,804) as the most repeat types (Table 2). In the di-nucleotide repeat SSRs, AG/CT was the most abundant 49.53 % (7,749), followed by AT/AT 30% (4,693) and AC/GT 20.37% (3,186). Di-nucleotide GC/CG rich repeats were extremely scarce accounting for only 0.1% (16) (Fig. 2A). AAG/CTT 47.14% (6,104) was the most frequent followed by ATC/ATG 15.04% (1948) and AAT/ATT 9.84% (1,274) of the Tri-nucleotide repeats (Fig. 2B). ACAT/ATGT and AAAT/ATTT accounted for 32.04% (182) and 27.82 (158) of all the tetra repeats (Fig. 2C). Both the penta and Hexa-nucleotide repeat motifs had low abundance of 0.21% (102) and 0.51% (253) of all SSRs, respectively. Microsatellites Validation, polymorphism assessment and genetic diversity and structure Eighteen primers were used to screen for polymorphism among 92 individuals (Table 3). Of the 18 loci, 12 displayed polymorphism with the allele number varying from 2 to 11 (average of 3.31) and six yielded monomorphic products. Taking no account of the six monomorphic markers and two markers with low PIC values, ten markers were used to assess genetic diversity ( Table 4) The AMOVA (Table 5) of eight populations showed that 78% of total variance was found within the populations while 20% was found among the populations and only 2% among the individuals. STRUCTURE analysis identi ed three groups among the eight populations. (Fig. 3A, Δk = 3). The two populations from Mt. Kenya (Mt. Kenya 1 and Mt. Kenya 2) and population (Mbololo) showed a close genetic similarity forming one cluster, Ngangao 1, Vuria 1 and Yale populations clustered together, while some individuals in Vuria 2 and Ngangao 2 formed the third cluster. Ngangao 2 population showed a high level of admixture, Vuria 2 and Yale also had some of individuals in different clusters. Population genetic distance cluster analysis based on Unweighted Pair Group Method with Arithmetic Mean (UPGMA) by MEGA software analysis revealed a congruent result with STRUCTURE analysis (Fig. 3C). PCoA supported STRUCTURE and UPGMA cluster analyses indicating that these populations could be separated into three groups (data not shown).

Species distribution modeling
Maxent models revealed high levels of predictive performance under all scenarios (i.e. Current = 0.905, LGM = 0.889, RCP 4.5 = 0.923, RCP 8.5 = 0.921) ( Table 7). The most important variables limiting the distribution of D. viscosa during the LGM were: elevation (relative contribution: 42.7%), temperature seasonality (Bio4 -relative contribution: 17.8%), mean temperature of the wettest quarter (Bio8 -relative contribution: 12.0%), and precipitation of driest month (Bio14 -relative contribution: 11.2%). Besides, for the current and future models, elevation, Bio4, Bio8, and Bio18 (precipitation of warmest quarter), were the most important factors limiting the distribution of D. viscosa ( Table 8). The past and future range simulations were fairly consistent with the current potential distribution and covered areas in Kenya and Tanzania, with central and western Kenya, northern and southwestern Tanzania having the most favorable conditions (Fig. 4).Similar high habitat suitability was also observed in Kenya and Tanzania along the Indian Ocean Coast. The LGM conditions observed high population expansion and a signi cant increase in high suitability areas for D. viscosa compared to the future. Finally, future climatic projections for D. viscosa revealed that the pessimistic scenario RCP85 was more optimistic and highlighted the potential range expansion by 2070 compared to the intermediate pessimistic scenario RCP4.5. High suitable ranges in Kenya and Tanzania were signi cantly reduced during RCP 4.5 than RCP8.5. Precisely, areas in Southern Tanzania and Eastern Uganda revealed suitable habitats during the pessimistic scenario RCP8.5.

Discussion
In this study, frequencies of perfect SSRs composing of 1-6 bp long motifs were calculated. A total of 49,836 microsatellites with an average density of 280.24 SSRs/Mb sequence data were identi ed. The SSR density detected in this study is nearly similar with Jujube (321.56 SSRs/Mb) [49]. Plants exhibit variation of SSRs Repeat units within their genomes, for example, in Paleosuchus trigonatus, di-nucleotide repeat motif predominated [50], whereas tetra-nucleotide repeats were the most frequent in Coffea canephora [51], and tri-nucleotide repeat motifs being most frequent in Hyalessa fuscata [52]. The mono-nucleotide repeat motifs predominated in our study, which is similar to Alibertia edulis [53]. Among the di-nucleotide repeats, AG/CT was the most abundant, followed by AT/AT and AC/GT while GC/CG repeats were extremely rare in all the Di-nucleotide repeats. AAG/CTT was the most frequent, followed by ATC/ATG and AAT/ATT of the Tri-nucleotide repeats. In addition, microsatellites in D. viscosa exhibited AG/CT preference. Our study species' genetic diversity was much higher than the previously recorded in Blighia sapida (Sapindaceae) using SSR markers [54]. Of the 10 SSR loci, four of them had a negative inbreeding coe cient. Furthermore, Dodo 002, Dodo 026 and Dodo 032 deviated signi cantly from Hardy-Weinberg equilibrium, suggesting excess heterozygosity. High variation was found within populations (78%) compared to the variation among populations (20%) as revealed by AMOVA, showing that individuals within populations are genetically different, hence in accordance with the life history of this species. Our results were congruent with the Aesculus (Sapindaceae), which showed high variation within populations (86.06-89.90%) among populations 7.69-13% [55].
In the present study, the value of differentiation index F ST (0.221) showed a moderate genetic differentiation, while the 20% variation between populations revealed by AMOVA, could have resulted due to high gene ow between D. viscosa populations (Nm = 1.174). This result indicates that this species' seeds could have been transported from one point to another by either wind or animals, hence reducing the genetic differentiation among the populations. Principle Coordinates Analysis (PCoA) grouped together Mt. Kenya 1 & 2, Ngangao 2 and Mbololo individuals, and the second and third indiscriminately grouping the remaining samples into distinct genetic clusters. These results suggest that individuals from Mt. Kenya and Mbololo share many genetic units that differentiate them from other populations.
The UPGMA analysis on genetic distance placed the eight populations into two main groups, leaving one population ungrouped. We expected the individuals from the same sites to group together, however, Ngangao 2 and Mbololo in Taita hills clustered with Mt. Kenya populations while Ngangao1, Vuria1, Vuria2 and Yale were grouped into a distinct cluster. Mount Kenya 1 did not cluster with any other population, showing a more distant relationship to the other populations. This suggests that Mount Kenya 1 could be the Centre of diversity for this species in the region due to increased genetic differentiation, however, more data is needed to con rm this assumption. STRUCTURE analysis placed all the eight populations into three groups (k = 3), with individuals from Mt. Kenya, Mbololo and Ngangao 2 population clustering together, while the remaining populations were found in cluster 2 and 3. Vuria 2 population had nearly equal membership in the two major clusters 1 & 2 and were assigned to multiple parental clusters, while Ngangao 2 population showed mixed proportions with a larger proportion of individuals clustering with Mbololo and Mt. Kenya populations (Fig. 3B).
Rich genetic diversity was seen in the Taita hills and Mt. Kenya populations, especially in larger fragmented localities as projected by conservation genetics theory [56]. Our nding of high genetic diversity in D. viscosa populations is comparable with studies of other plant species with small populations and restricted distribution [57,58] or the one from fragmented and excessively exploited medicinal shrub plumbago [59]. Mbololo being the largest fragment (220 ha) has the best preserved vegetation and richest ora, whereas Ngangao, which is the second largest (206 ha), has only 120 ha occupied by native forest while the remaining parts are introduced plantations and rocky surfaces [13,60]. Vuria, a fragment located on a steep ridge, suffered massive land clearance and natural forest res some years back, while Yale fragment has 16 ha of natural forest. This nding is congruent with [61,7] studies which revealed that decline in population sizes after forest fragmentation might reduce the genetic diversity gradually after several generations. Similarly, based on fragment size and isolation study done by [61], larger forest remnants had signi cantly richer genetic diversity than smaller, more disturbed and isolated hills for the overall data set. This is clearly evident, as shown by Mt Kenya and Mbololo populations revealing high diversity. Moreover, our nding supported the previous studies that Mbololo forest has high diversity richness as a result of low level disturbance compared with Ngangao forest [16,15] and, this might be the reason as to why the Ngangao 2 forest and Vuria 2 were assigned multiple clusters. With the severe deforestation and fragmentation processes, the signs of bottlenecks were detected in more disturbed and fragmented forest patches, suggesting that D. viscosa experienced a recent bottleneck.
The SDM analyses inferred that D. viscosa had expanded its ranges during the LGM. During this period, Africa observed relatively cold and dry climatic conditions that may have persisted until the end of LGM [62]. As a result of D. viscosa's ability to tolerate different habitats and spread quickly [26], the species likely extended its ranges and population extents throughout this period. Conversely, the species ranges are observed to reduce in future scenarios. The anticipated future global warming will likely alter the persistence of different biodiversity [63].

Conclusions
Based on Illumina paired-end sequences, we validated eighteen microsatellite markers for this species, of which twelve were highly polymorphic with an average PIC value of 0.605 and an average allele number of 2.420 comparable to that of white spruce and black spruce [64]. Despite forest fragmentation, high genetic diversity is still maintained in populations with large patches compared to those with reduced forest areas. These results suggest that protection of smaller and isolated fragments should be put in place to avoid further destruction of existing habitats and more emphasis on conservation awareness among local residents is of great urgency. Lastly, our new ndings from the SDMs of D. viscosa could throw light on species with similar geographical distributions and other taxa in this region which are threatened or endangered.          Figure 1 Geographical map indicating sampling locations of Dodonaea viscosa. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.  Habitat suitability is classi ed as follows, high-in red, medium-in yellow, low-in light green, and unsuitable-in dark green. Circular pink dots denote occurrence localities used in developing the models in the present study obtained from GBIF Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.