Genetic diversity and population structure of Vernonia [Vernonia galamensis (Cass.) Less] populations from Ethiopia revealed by SSR markers

Vernonia (Vernonia galamensis) is a potential novel industrial crop owing to the high demand for its naturally epoxidized oil, which can be used in the manufacture of polyvinyl chloride, adhesives, petrochemicals, and cosmetic and pharmaceutical products. This study was initiated to assess the genetic diversity of V. galamensis accessions systematically and intensively using SSR molecular markers, in order to narrow the existing research gaps and to inform germplasm conservation and further research. A total of 150 V. galamensis accessions were analyzed using 20 SSR markers. The markers detected a total of 79 alleles, with an average of 3.9 alleles per locus. The mean number of effective alleles was 3.06, and the mean observed heterozygosity (Ho) was 0.15 across the 20 markers evaluated. The markers also showed a high level of polymorphism, ranging from 0.50 to 0.96 with an average of 0.76. The analysis of molecular variance showed that only 11% of the variation was among populations, 22% was among individuals within populations and 67% was within individuals. The largest number of migrants per generation occurred between Derashie and Wollo (Nm = 7.37), whereas the lowest value was between East Harerghe and West Harerghe (Nm = 1.42). A factor analysis, including dendrogram clustering and principal coordinates analysis, classified the 150 accessions into four groups. However, Bayesian model-based clustering (STRUCTURE) grouped them into three (K = 3) major gene pools. These analyses showed that accessions collected from the same region of origin did not often group entirely together within a given major group. The markers detected a large number of alleles and a higher expected than observed heterozygosity. The markers were applied to ten populations, among which East Showa and West Harerghe showed higher genetic diversity and can be considered hotspots for in situ conservation of V. galamensis. In addition, SSR-derived values such as heterozygosity, Shannon's index, polymorphic information content and population clusters provide important baseline information for future V. galamensis cultivation, breeding and genetic resource conservation endeavors in Ethiopia.
The study of genetic diversity in crop plants is a valuable tool for addressing the conservation of wild populations such as V. galamensis, the levels of gene flow among them, and their improvement through breeding [14,15]. The assessment of genetic diversity within and between plant populations is routinely performed on the basis of morphological, biochemical, and molecular markers [16,17]. Further, there is a need to characterize diverse genetic resources using different statistical tools and to utilize them in breeding programmes [18].
SSR markers are among the most commonly used molecular markers for evaluating genetic diversity within species, investigating phylogenetic relationships, identifying cultivars and testing their paternity, studying population structure and gene flow, and developing genetic maps [19,20]. SSRs are highly versatile genetic markers because of their codominant inheritance (which distinguishes homozygotes from heterozygotes), high abundance, high polymorphism arising from the high mutation rate affecting the number of repeat units, enormous extent of allelic diversity, good genome coverage, ease of assessing size variation by PCR with pairs of flanking primers, and high reproducibility [15]. SSR markers nevertheless have limitations: genomic sequence information is needed to design specific primers, and marker development is not very cost-effective because it requires considerable discovery and optimization for each species before use [21]. To date, no study has used these SSR markers to assess the genetic diversity of V. galamensis.
In Ethiopia, geo-ecological conditions are favorable for the cultivation of V. galamensis, which could be used as a source of raw material for agro-processing industries [2,9,10,22]. However, the plant is neglected and regarded only as a wild weed colonizing disturbed areas and bare agricultural lands [3]. As a result, the crop is not cultivated at any of the collection sites or elsewhere in the country. Moreover, because research and conservation priority has been given to other major crop plants, the potential industrial value of V. galamensis is underestimated and underexploited. The plant is also under threat of continued genetic erosion. This study was therefore initiated to evaluate and characterize the genetic diversity of V. galamensis accessions systematically and intensively through molecular analysis using SSR markers, in order to narrow the existing research gaps and to inform germplasm conservation and further research.

Methods
... available in the Flora of Ethiopia and Eritrea. Most of the study materials were collected from the field, and the others were obtained from the Ethiopian Biodiversity Institute and the Wondo Genet Agricultural Research Center.
In each collection area, seed samples were collected from plants and kept in separate bags. The distance between any two collecting sites was kept at about 5-10 km.
Field observations in the collection areas showed that V. galamensis grows naturally on hills and in depressions, along roadsides, in valleys, on farmland, in forests, and in the compounds of mosques and churches [23]. Collections were made by taking either seed samples from individual flower heads or seeds from plants whose flowers had all matured; the accessions were then threshed, cleaned and documented. V. galamensis was not cultivated at any of the collecting sites.

Leaf sample collection and DNA extraction
Fresh young leaves of 150 V. galamensis accessions, representing 10 populations, were collected from individual plants grown at the experimental sites. Each collected leaf sample was placed in a sealed bag and dried with silica gel (at a 1:10 ratio of leaf sample to silica gel), then kept at room temperature until DNA extraction, according to Gilbert et al. [24]. The dried leaf samples were transported to Huazhong Agricultural University, China, for genetic analysis. Total genomic DNA was extracted according to the modified CTAB protocol of Doyle and Doyle [25]. Extracted DNA was visualized on a 1% (w/v) agarose gel and quantified spectrophotometrically using a NanoDrop® 2000 (Thermo Scientific, USA). Finally, it was stored at -20 °C until further use.

Primer Screening and Optimization
About 63 simple sequence repeat (SSR) markers were developed by Narina et al. [26] and are available in the database (gene bank). Among these, 30 SSR markers were initially selected, and 20 were finally used in this study on the basis of their high polymorphism and suitability for multiplexing (Table 2). Optimization was carried out by sequentially investigating each reaction variable: different cycling conditions were tested first, followed by variation of (1) the amount of DNA template, (2) the primer concentration, and (3) the concentration of Taq PCR master mix.

Polymerase Chain Reaction (PCR)
The amplification reaction was performed in 96-well plates on a T100™ Thermal Cycler in a total reaction volume of 10 μl, containing 100 ng/ml of template DNA, 5 μl of 2× Taq PCR master mix (Vazyme P213-01, China), 1 μl of forward and reverse primers, and 3.0 μl of double-distilled water. The PCR program consisted of an initial denaturation at 94 °C for 5 minutes, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 56 or 58 °C (depending on the primers) for 30 s and extension at 72 °C for 1 minute, and a final extension at 72 °C for 5 minutes. The amplified products were stored at 4 °C until electrophoresis and were then separated on a 3% agarose gel.
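As a rough check on the reaction setup and cycling program described above, the following minimal sketch adds up the component volumes and the programmed hold times. It assumes that the water volume is 3.0 μl, that the template makes up whatever remains of the 10 μl reaction, and that 56 °C is used for annealing; the names `Step`, `program_seconds` and `template_volume_ul` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in a thermal cycling program."""
    name: str
    temp_c: float
    seconds: int


def program_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


def template_volume_ul(total_ul, other_volumes_ul):
    """Volume left for the DNA template (ASSUMPTION: it fills the remainder)."""
    return total_ul - sum(other_volumes_ul)


INITIAL = Step("initial denaturation", 94, 5 * 60)
CYCLE = [
    Step("denaturation", 94, 30),
    Step("annealing", 56, 30),   # 56 or 58 degrees C depending on the primer pair
    Step("extension", 72, 60),
]
FINAL = Step("final extension", 72, 5 * 60)
N_CYCLES = 35

hold_time = program_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
# Master mix 5 ul, primers 1 ul, water 3 ul (water volume is an assumption).
template = template_volume_ul(10.0, [5.0, 1.0, 3.0])
print(f"hold time: {hold_time / 60:.0f} min, template volume: {template:.1f} ul")
```

Under these assumptions the stated components leave 1 μl for the template, and the hold times alone add up to 80 minutes per run.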

Band Scoring and Analysis
The amplified products were scored visually on the basis of their migration relative to the size standard (100 bp DNA ladder) from gels photographed under UV illumination (Gel Doc™ with Image Lab™ software, Bio-Rad, in the lab of Drug Discovery and Technology). Genetic diversity parameters for each locus, namely the number of different alleles (Na), the effective number of alleles (Ne), Shannon's diversity index (I), observed heterozygosity (Ho), expected heterozygosity (He), F-statistics (Fis, Fit and Fst), polymorphic information content (PIC) and the random segregation and distribution (Hardy-Weinberg equilibrium) of genotypes within populations, as well as Nei's genetic identity (Ji), genetic distance (Ds) and gene flow (Nm) among V. galamensis populations, were computed using GenAlEx version 6.503 [27].
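The per-locus parameters listed above follow from allele frequencies and genotype counts. The sketch below illustrates the standard textbook definitions (Ne = 1/Σp², He = 1 − Σp², I = −Σp ln p, and the Botstein form of PIC); it is not the GenAlEx implementation. The function name `locus_diversity` and the example genotypes are hypothetical, and missing data are assumed to be absent.

```python
from collections import Counter
from itertools import combinations
from math import log


def locus_diversity(genotypes):
    """Summarize one SSR locus from diploid genotypes.

    genotypes: list of (allele_1, allele_2) tuples, one per individual.
    ASSUMPTION of this sketch: no missing data.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]

    sum_p2 = sum(p * p for p in freqs)
    he = 1.0 - sum_p2                                   # expected heterozygosity
    ho = sum(1 for a, b in genotypes if a != b) / n     # observed heterozygosity
    shannon = -sum(p * log(p) for p in freqs)           # Shannon's index
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    fixation = 1.0 - ho / he if he > 0 else float("nan")
    return {
        "Na": len(freqs),
        "Ne": 1.0 / sum_p2,
        "I": shannon,
        "Ho": ho,
        "He": he,
        "PIC": pic,
        "F": fixation,
    }


# Hypothetical scores (allele sizes in bp) for ten individuals at one locus.
EXAMPLE = [(150, 150)] * 4 + [(150, 160)] * 2 + [(160, 160)] * 4
stats = locus_diversity(EXAMPLE)
for name, value in stats.items():
    print(name, round(value, 3))
```

In this made-up example the two alleles are equally frequent but only two of ten individuals are heterozygous, so He exceeds Ho and the fixation index is positive, which is the same direction of difference reported for the markers in this study.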
Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor-Joining (NJ) trees based on the simple matching dissimilarity coefficient were computed using DARwin version 6.0.19 [28]. The resulting trees were displayed using FigTree version 1.4.4 [29]. The principal coordinates analysis (PCoA) was performed using GenAlEx version 6.503 [27]. The SSR data were also subjected to a Bayesian model-based cluster analysis using STRUCTURE version 2.3.4 [30]. To determine the most likely number of populations (K), a burn-in period of 50,000 was used in each run, and data were collected over 500,000 Markov Chain Monte Carlo (MCMC) replications for K = 1 to K = 10, with 20 iterations for each K. The optimum K value was determined according to Evanno et al. [31] using the web-based STRUCTURE HARVESTER ver. 0.6.92 (http://tyloro.biology.ucla.edu\structure Harvester\) [32]. The results generated by this software were visualized as a graphical bar plot using Clumpak beta version (http://www.clumpak.tau.ac.il/) [33].
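The ΔK criterion of Evanno et al. used to choose K can be sketched as follows. This is a minimal illustration, assuming that ΔK is taken as the absolute second difference of the mean ln P(D) across replicate runs divided by the standard deviation of ln P(D) at that K; it is not the STRUCTURE HARVESTER code, and the function name `evanno_delta_k` and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from replicate ln P(D) values.

    lnp_by_k: dict mapping K to a list of ln P(D) values, one per run.
    Delta K is undefined for the smallest and largest K tested, and is
    skipped here when the replicate runs have zero variance.
    """
    ks = sorted(lnp_by_k)
    m = {k: mean(lnp_by_k[k]) for k in ks}
    s = {k: stdev(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks:
        if k - 1 in m and k + 1 in m and s[k] > 0:
            second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            delta[k] = second_diff / s[k]
    return delta


# Hypothetical ln P(D) values for three replicate runs at K = 1..5.
EXAMPLE = {
    1: [-5000, -5002, -4998],
    2: [-4600, -4604, -4596],
    3: [-4400, -4402, -4398],
    4: [-4390, -4380, -4400],
    5: [-4385, -4365, -4405],
}
delta_k = evanno_delta_k(EXAMPLE)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K:", best_k)
```

In these made-up data the likelihood rises steeply up to K = 3 and then plateaus with growing variance between runs, so ΔK peaks at K = 3.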

Molecular based genetic diversity of Vernonia galamensis using SSR markers
Twenty SSR markers, all of which were polymorphic, were used for the characterization and genetic diversity analysis of the 150 V. galamensis accessions (Table 3). A total of 79 alleles were identified, ranging from 2 to 6 per locus with an average of 3.9. The maximum number of effective alleles (Ne) was 4.79 (Vg-03) and the minimum was 1.99 (Vg-16). The highest major allele frequency (MAF; 0.85) was recorded at locus Vg-01 and the lowest (0.45) at locus Vg-03. The observed heterozygosity (Ho) values were quite low, ranging from 0.05 (Vg-21) to 0.36 (Vg-03) with an average of 0.16 across the 20 markers evaluated. The mean expected heterozygosity (He) was 0.50, ranging from 0.23 (Vg-11) to 0.65 (Vg-03) (Table 3).
Shannon-Weaver's information index (I) ranged from 0.86 to 1.67, with an average of 1.20 (Table 3). The diversity parameters showed a high level of polymorphism among the 20 SSR markers, reflecting the genetic variation within the V. galamensis collection.

Analysis of molecular variance (AMOVA) and genetic distances
The analysis of molecular variance (AMOVA) showed that 67% of the total variation was attributable to variation within individuals, whereas 22% was due to variation among individuals within the same population. In contrast, a smaller portion (11%) was due to variation among populations (Table 4).
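If the three AMOVA percentages are read as the among-population (11%), among-individual (22%) and within-individual (67%) variance components, hierarchical F-statistics and an island-model estimate of gene flow follow by simple arithmetic. The sketch below is illustrative only: the derived values are not results reported by this study, the variance-ratio definitions and the relation Nm = (1/Fst − 1)/4 are assumed to apply, and the function names are hypothetical.

```python
def f_statistics(among_pops, among_indiv, within_indiv):
    """Hierarchical F-statistics from AMOVA variance components (or percentages)."""
    total = among_pops + among_indiv + within_indiv
    fst = among_pops / total
    fis = among_indiv / (among_indiv + within_indiv)
    fit = (among_pops + among_indiv) / total
    return fst, fis, fit


def nm_from_fst(fst):
    """Migrants per generation under Wright's island model: Nm = (1/Fst - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


# Percentages as stated in the text (ASSUMED to be the three variance components).
fst, fis, fit = f_statistics(11.0, 22.0, 67.0)
nm = nm_from_fst(fst)
print(f"Fst={fst:.3f} Fis={fis:.3f} Fit={fit:.3f} Nm={nm:.2f}")
```

Under these assumptions the three statistics satisfy (1 − Fit) = (1 − Fis)(1 − Fst), and an among-population share of 11% corresponds to roughly two migrants per generation on average.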

Genetic distance among Vernonia galamensis populations
The maximum pairwise Nei's [34] genetic distance (GD) was observed between the Borena and East Harerghe populations (0.57), followed by the Sidama and West Harerghe populations (0.54), whereas the minimum genetic distance was observed between the Borena and Konso populations (0.24). Further, the highest pairwise Nei's genetic identity (I) occurred between the Konso and Derashie populations (0.80), while the lowest Nei's [34] genetic identity was observed between the Borena and Konso populations (0.24).
The overall magnitude of pairwise population matrix of Nei genetic distance was relatively lower than that of Nei's genetic identity (Table 5).
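Nei's genetic identity and distance between two populations can be computed from their allele frequencies, with the distance defined as the negative natural logarithm of the identity. The following minimal sketch uses Nei's standard (1972) formulation without sample-size correction; the function names and the allele frequencies are hypothetical and serve only as an illustration.

```python
from math import log, sqrt


def nei_identity(pop_x, pop_y):
    """Nei's genetic identity I = Jxy / sqrt(Jx * Jy).

    pop_x, pop_y: lists with one dict per locus mapping allele -> frequency.
    Both lists must cover the same loci in the same order.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    return jxy / sqrt(jx * jy)


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(I)."""
    return -log(nei_identity(pop_x, pop_y))


# Hypothetical allele frequencies at two loci (allele sizes in bp as keys).
POP_X = [{150: 1.0}, {200: 0.5, 210: 0.5}]
POP_Y = [{150: 0.5, 160: 0.5}, {200: 0.5, 210: 0.5}]

identity = nei_identity(POP_X, POP_Y)
distance = nei_distance(POP_X, POP_Y)
print(f"I = {identity:.3f}, D = {distance:.3f}")
```

Because D = −ln(I), identity and distance move in opposite directions: population pairs that share more alleles at similar frequencies have an identity closer to 1 and a distance closer to 0.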

Cluster and principal co-ordinate analysis (PCoA)
Clustering analysis based on allelic frequency, performed by neighbor-joining in DARwin 6.0.19, grouped the 150 accessions into four major clusters from the main node. Each of the four clusters comprised individual plants from different zones (geographic regions). The first and third clusters were further divided into sub-clusters, in which the samples grouped according to their geographic origin (Figure 2).
The first cluster comprised 41 accessions, mainly from Borena (11) and West Harerghe (12); the second cluster contained 25 accessions, mainly from Wollo (8); the third cluster was the largest group, composed of 59 accessions; and the fourth cluster (C4) comprised accessions mainly from West Arsi (7) (Figure 2). Generally, the cluster analysis revealed that accessions from different populations (collection sites) clustered together, and the clusters did not follow a clear pattern of geographic origin.
The principal coordinates analysis (PCoA) showed that the majority of samples were placed at the center of a two-dimensional coordinate plane and roughly formed four groups (Figure 3). The first three axes of the PCoA together accounted for 33.02% of the total variation.

Population structure analysis
The population structure of the 150 V. galamensis accessions was analyzed using a model-based Bayesian approach, with K ranging from 1 to 10 and 20 iterations for each K. Following Evanno et al. [31] and Gilbert et al. [35], the STRUCTURE outputs were processed with STRUCTURE HARVESTER, and K = 3, which had the highest ΔK value, was selected as the most likely number of clusters describing the genetic structure of the 150 V. galamensis accessions (Figure 4). Based on this value, the population structure (Clumpak result) revealed that accessions collected from the same region of origin did not often group entirely together within a given major group. There was wide admixture in the structure of V. galamensis populations, in agreement with the neighbor-joining trees.

Determination of SSR-markers based genetic diversity with genetic parameters
Vernonia galamensis is a potential novel industrial crop that contains naturally occurring epoxidized oil. However, its potential value is neglected, underestimated and underexploited. In addition, it is exposed to genetic erosion. Therefore, the assessment of genetic diversity with SSR markers, in plants generally and in V. galamensis particularly, is important for in situ and ex situ conservation, for efficient management, and for the selection and improvement of the available genetic resources [16]. The SSR analysis showed considerable genetic diversity; the average number of alleles (3.9) detected in this study was higher than that reported by Keneni et al.

Genetic Differentiation and Gene Flow
The AMOVA demonstrated that V. galamensis had low variation among populations (11%). On the other hand, 67% of the total variation was attributable to variation within individuals and 22% was due to variation among individuals within the same population. The result is similar to that previously reported in chickpea.
Genetic distance is a measure of the allelic substitutions per locus that have occurred during the separate evolution of two populations. In this study, the largest genetic distance was observed between the Borena and East Harerghe populations (0.57), while the minimum genetic distance was observed between the Borena and Konso populations (0.24). The overall magnitude of the pairwise population matrix of Nei's genetic distance was relatively lower than that of Nei's genetic identity. The genetic identity of two populations could be due to interspecific hybridization that has occurred throughout their evolution, which favors allele sharing [36].

Clustering and principal co-ordinates among Vernonia galamensis accessions
In the present study, a phylogenetic tree was constructed based on the 150 accessions of V. galamensis collected from different geographic and agro-ecological regions.

Authors' contributions
AM, KD and KT designed the study. AM and XH coordinated and carried out the laboratory work. AM performed the statistical analysis and wrote the manuscript. All the authors read and approved the final manuscript.

Consent for publication
Not applicable
