Genetic Diversity, Population Structure and Relationship of Ethiopian Barley (Hordeum vulgare L.) Landraces as Revealed by Simple Sequence Repeat (SSR) Markers

Characterization of genetic resources maintained at genebanks has important implications for future utilization and collection activities. A total of 49 simple sequence repeat (SSR) or microsatellite markers were used to study genetic diversity and relationships among 376 barley landraces collected from different barley-producing parts of Ethiopia and eight cultivars. Overall, 478 alleles were obtained, with an average of 9.755 alleles per locus, and 97.07% of the loci were observed to be polymorphic. Nei's genetic diversity index (h) was 0.654, and the Shannon diversity index (I) was 0.647, indicating that the genetic diversity in the barley genotypes studied was moderately high. At the population level, the percentage of polymorphic loci (PPL) averaged 98.37%, h averaged 0.388, and I averaged 0.568. The highest level of genetic diversity was observed in the AR population (PPL = 100%, h = 0.439, I = 0.624); the lowest was observed in the JM population (PPL = 75.51%, h = 0.291, I = 0.430). AMOVA revealed significant genetic differentiation within and between populations (P < 0.001), with 84.21% of the variation occurring within populations and 15.79% occurring among populations. Genetic variation analysis showed a coefficient of gene differentiation of 0.053 and a gene flow value of 4.467 among populations. The 384 barley genotypes were divided into seven genetic clusters according to STRUCTURE, neighbour-joining tree and principal coordinate analysis, correlating significantly with geographic distribution. These results will assist with the formulation of conservation strategies, such as genetic rescue and on-farm in situ and ex situ conservation.

et al. 2002). SSRs have been used to study genetic diversity in various crop species, including maize (Eleuch et al. 2008), soybean (Wen et al. 2009), sorghum (Li et al. 2010), cowpea (Badiane et al. 2012) and foxtail millet (Wang et al. 2012).
SSRs have also been used to construct linkage maps, assess phylogenetic and population genetic relationships and identify molecular markers for marker-assisted selection in wheat (Song et al. 2005), barley crosses (Hearnden et al. 2007), groundnut (Varshney et al. 2009), peanut (Hong et al. 2010), Bermuda grass (Guo et al. 2011) and walnut (Kefayati et al. 2019). genetic studies mixtures of landraces and cultivars. In present study, carried out and structure by using simple sequence repeat (microsatellite) markers.

polymorphic information content (PIC), major allele frequency (MAF) and inbreeding coefficient (F_IS) were calculated with the R package diveRsity using the divBasic function (Keenan et al. 2013). Analysis of molecular variance (AMOVA) was carried out with the R package poppr (Kamvar et al. 2014) to partition the genetic variance into two levels: among populations and within populations. Pairwise F_ST values between populations and gene flow based on SSR were calculated with GENEPOP version 4.2.1 (Rousset, 2008). A cluster analysis among populations, based on UPGMA, was also developed using the PowerMarker v3.25 software (Liu and Muse, 2005). An unrooted tree was constructed based on pairwise standard genetic distances (Liu and Muse, 2005), using the least squares algorithm with 10,000 bootstrap replicates, and these processes were generated and analyzed using Molecular Evolutionary Genetics Analysis (
The STRUCTURE analysis was run three times for each K value (K = 1 to 15) using a burn-in period of 50,000 followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations, assuming an admixture model and uncorrelated allele frequencies. The most probable value of K for each test was detected by ΔK (Evanno et al., 2005), using the web-based program Structure Harvester (Earl and vonHoldt, 2012). CLUMPP v.1.1.2 (Jakobsson and Rosenberg, 2007) was used to align cluster assignments from independent runs using the in-files generated by Structure Harvester. Bar plots were generated from the averaged results of the runs for the most probable K value, using DISTRUCT v.1.1 (Rosenberg, 2004). As suggested in Ketema et al. (2020), a genotype was considered to belong to a group if its membership coefficient was > 0.70. Genotypes with a membership coefficient less than 0.70 at each assigned K were regarded as admixed.
To cross-check the results of the model-based population structure analysis from STRUCTURE with a model-free method, DAPC was used. DAPC is a multivariate method designed to identify and describe clusters of genetically related individuals, and the analysis was performed using adegenet version 2.0.1 (Jombart et al. 2010) in the R environment. In the absence of a known grouping pattern, DAPC uses sequential K-means and model selection to build genetic clusters based on information from genetic data. The Bayesian information criterion (BIC) was used to identify an optimal number of genetic clusters (K) to describe the data. The optimal number of principal components to retain was determined from the α-score.
The number of clusters obtained from STRUCTURE and DAPC was compared with that from PCoA, which makes no assumption about the underlying population genetic model and was performed using GenAlEx 6.51b2 (Peakall and Smouse, 2012). PCoA is a distance-based approach to dissect and display dissimilarities between individuals.
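The ΔK criterion used to choose K can be illustrated with a short calculation. The sketch below follows the form commonly reported by Structure Harvester, in which the second-order difference of the mean log-likelihood across replicate runs is divided by the standard deviation of the log-likelihood at that K. The function name `delta_k` and the log-likelihood values are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} from replicate Ln P(D) values.

    lnp_by_k maps each K to the list of Ln P(D) values from its
    replicate runs. Delta K is only defined for K values that have
    both neighbours (K - 1 and K + 1) and a non-zero standard deviation.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # first and last K have no second-order difference
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical Ln P(D) values: three replicate runs for K = 1..5.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-700.0, -702.0, -698.0],
    4: [-690.0, -695.0, -685.0],
    5: [-688.0, -700.0, -676.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)  # K with the highest delta K
```

With three replicates per K, as in this study, the standard deviation in the denominator rests on very few values, so the ΔK peak is best read alongside the likelihood curve itself.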

Results
Characterization and understanding of the genetic diversity and population structure of any species is essential for designing a conservation strategy or genetic improvement program. In this study, a total of 49 SSR primers (S1 Table) that showed reproducible polymorphic amplification during the screening step were used to analyze 384 individual genotypes of H. vulgare from Ethiopia covering 15 populations, resulting in a total of 478 alleles. The number of bands amplified ranged from 14 to 43, with an average of 31.533 bands per population. The SSR fragment size ranged from 114 bp for HVM44 to 263 bp for Bmac0040. The band pattern across these 15 populations is given in S1 Figure.

SSR locus diversity and Polymorphic Information Index
The parameters of the variability of the investigated loci are shown in Table 1. Overall, 478 alleles were observed at the 49 SSR loci, with an average of 9.755 alleles per SSR marker. The major allele frequency varied from 0.531 (GMS116A) to 0.784 (EBmac0518), with an average value of 0.689. The number of alleles per locus ranged from 5 (Bmac0316) to 18 (WMC1E8 and Bmac0040). The number of effective alleles (Ne) ranged between 1.047 (GMS116A) and 3.712 (HVLTPPB), with an overall mean of 2.068. The polymorphic information content (PIC) ranged from 0.243 (WMC1E8) to 0.885 (Bmac0040). Significant and positive correlations were found between PIC and He (r = 0.91, P < 0.001), PIC and number of alleles (r = 0.89, P < 0.001), and He and number of alleles (r = 0.84, P < 0.001).
The inbreeding coefficient among populations (F_IT) ranged from -1.000 (Bmag0173, Bmac0273e, Bmac0093 and Bmag0905) to 0.625 (HVLTPPB), with a mean of 0.074. The population differentiation (evaluated by F_ST) was estimated at 0.049. The contribution of the 49 SSRs to population segregation (determined by F_ST statistics) varied from 0.000 (Bmag0173, Bmac0273e, Bmac0093 and Bmag0905) to 0.187 (WMC1E8). The overall F-statistics differed significantly (p < 0.05) from zero. This differentiation had a significant contribution from all loci. The values for Ho ranged from 0.112 (HVM20) to 0.420 (HVLTPPB), with an overall mean of 0.317, while the values of He ranged from 0.260 (Bmac0129b) to 0.640 (GMS116A), with a general mean value of 0.479. The average number of migrants per generation (Nm) in the whole population ranged from 0.000 (Bmac0273e, Bmag0173, Bmac0093 and Bmag0905) to 18.981 (Bmac0067), and across all the loci was found to be 6.613. Only 12.25% of the loci, across all barley populations, did not deviate significantly (p > 0.05) from Hardy-Weinberg equilibrium (HWE).
Other descriptive statistics were found to vary among loci and among the genotypes studied. Among the analyzed SSR markers, a total of 92 alleles were found to be unique (i.e., occurred only in two-rowed landraces, six-rowed landraces or cultivars), as shown in Table 2. The two-rowed barley landraces possessed 43 (46.74%) and the six-rowed barley type possessed 32 (34.78%) unique alleles, whereas cultivars possessed 17 (18.48%) unique (private) alleles.

Genetic diversity indices for barley genotypes
Genetic diversity indices for barley landraces based on origin (Administrative Zones), improvement status and kernel row number are summarized in Table 3. All the loci were polymorphic. The observed and expected heterozygote frequencies were not statistically different (p > 0.05); hence, the inbreeding coefficient (F) estimates observed were not substantially different from zero. The mean number of alleles varied from 1.255 to 10.670. The highest count of alleles (10.670) was found in the AR population. The highest count of private alleles (12) was observed in the WL population, while the GG, HD, and MT populations contained the fewest private alleles (2). The number of effective alleles ranged from 1.502 to 1.783. The Shannon Index (I), which is an expression of population diversity in a particular habitat, was high in the AR (0.624) and low in the JM population (0.430). Furthermore, the lowest observed heterozygosity was in the WL population (0.368), while the highest was recorded in the AR population (0.699). The expected heterozygosity in the populations ranged from 0.212 (for WG) to 0.503 (for the GN population).
Similarly, all the loci were polymorphic for breeding status and kernel row types, and the number of alleles varied from 7.380 to 9.345 (improvement status) and from 3.405 to 6.751 (kernel row types). The number of private (unique) alleles was higher for landraces and for six-rowed barley types (Table 3).

Population Structure and Clustering Analysis
Two complementary methods (STRUCTURE and DAPC) were used to determine the number of clusters among the subset of the barley collections, and both showed the presence of seven major clusters along with sub-clusters. The population structure of the 384 barley genotypes was inferred using STRUCTURE 2.3.4, and the peak of ΔK was observed at K = 7 (Fig. 2A), suggesting the presence of seven main populations (clusters C-I, C-II, C-III, C-IV, C-V, C-VI and C-VII) in the barley genotypes studied across major barley growing regions of Ethiopia. The distribution of the tested barley genotypes based on the DAPC plot (Fig. 2B), the UPGMA clustering dendrogram (Fig. 3) and the model-based structure analysis (Fig. 4B) shows that the accessions are divided into seven major clusters. Using a membership coefficient of 0.70 as the threshold for assigning each accession to one of the seven populations, a total of 336 genotypes (87.5%) were grouped into one of the seven populations. The first cluster contained 46 (11.98%) of the total genotypes, the third cluster 63 (16.41%) and the seventh cluster 56 (14.58%). The admixed group comprised 48 genotypes (12.5%) (Table 4). The proportions of membership of each predefined population in each of the clusters obtained at the best K (K = 7) from STRUCTURE and discriminant analysis of principal components (DAPC) are presented in Table 4. Unweighted pair group method with arithmetic mean (UPGMA) analysis, run in PowerMarker v3.25 and visualized with MEGA X and FigTree, also clearly divided the 384 genotypes into seven groups (Fig. 3), which was consistent with the model-based population structure from STRUCTURE. Membership clustering using DAPC also grouped the genotypes into seven clusters (Fig. 2B and Table 4).
The classification of accessions into populations based on the model-based structure from STRUCTURE 2.3.4 is shown in Fig. 4. To confirm the true value of K, a model-free method, DAPC, was used. The optimum number of clusters was obtained with K = 7 using the Bayesian information criterion (BIC), which again divided the genotypes into seven sub-populations.
The pairwise genetic differentiation coefficient (F_ST) between pairs of populations was highly significant (P < 0.001), ranging from 0.012 (TC and TN, and TC and WL) to 0.612 (TN and HD, and GJ and GG), with a mean value of 0.210 (Table 5). Among the 15 populations, both the TC and TN populations showed a relatively high differentiation from the other populations (50% of pairwise F_ST > 0.4). Conversely, the estimated values for gene flow (Nm) between populations varied from 0.158 (between GG and GJ, and between TN and HD) to 20.583 (between TC and TN), with a mean value of 3.440 (Table 5).
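The 0.70 membership rule used to separate assigned from admixed genotypes can be written as a small helper. This is a minimal sketch, not the authors' script: the function name `assign_clusters` and the membership values are hypothetical, and a coefficient of exactly 0.70 is treated as admixed here because the text only specifies "> 0.70" for assignment.

```python
def assign_clusters(q_matrix, threshold=0.70):
    """Assign each genotype to its best cluster, or None if admixed.

    q_matrix is a list of rows, one per genotype, where each row holds
    the membership coefficients for the K clusters (summing to about 1).
    Clusters are numbered from 1.
    """
    assignments = []
    for q in q_matrix:
        best = max(range(len(q)), key=lambda i: q[i])
        # Assumption: membership must strictly exceed the threshold.
        assignments.append(best + 1 if q[best] > threshold else None)
    return assignments


# Hypothetical membership coefficients for four genotypes at K = 3.
q_example = [
    [0.92, 0.05, 0.03],  # clearly cluster 1
    [0.40, 0.35, 0.25],  # admixed
    [0.10, 0.15, 0.75],  # cluster 3
    [0.70, 0.20, 0.10],  # exactly at the threshold, treated as admixed
]
labels = assign_clusters(q_example)
admixed_share = labels.count(None) / len(labels)
```

Applied to the counts reported above, 336 assigned genotypes out of 384 corresponds to 87.5%, and the remaining 48 to 12.5%.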

AMOVA and genetic partitioning
The matrix of pairwise genetic distances between populations (Table 6 and Fig. 5) showed a low genetic distance (0.017) between the AR and BL populations. A similar trend was observed for GJ and GN (0.020) and for TC and TN (0.018). On the other hand, the highest genetic distance (0.549) was observed between the population from Sidama and those from the two Tigray zones.
Analysis of molecular variance (AMOVA) based on geographical origin (Administrative Zones), breeding status (i.e., improved and local) and kernel row number (two-rowed, six-rowed and irregular) showed that the genetic variation within populations (84.0%, 90.0% and 98.0%, respectively) was larger than that among populations (16.0%, 10.0% and 2.0%, respectively) (Table 7). All of these variance components were highly significant, indicating that genetic differentiation within populations was significant.
Fixation index (F_ST) values in the range of 0.0 to 0.05 are generally considered to indicate little differentiation, values of 0.05 to 0.15 moderate differentiation, 0.15 to 0.25 large differentiation, and values above 0.25 very large differentiation. In this study, the observed F_ST values based on zone of origin, breeding status and kernel row number were 0.053, 0.097 and 0.221, respectively. The first two values correspond to moderate differentiation, while the value for kernel row number falls in the large-differentiation range of this scale. We also found that gene flow between regions was high, as indicated by Nm = 4.467 (Table 7).
Genetic differentiation among the 7 clusters identified by DAPC was analysed using the AMOVA method. The results indicated that 19.301% and 80.699% of the variation could be attributed to differentiation among clusters and within inferred clusters, respectively (S3 Table). F_ST and gene flow (Nm) among the studied populations were 0.041 (p < 0.001) and 5.848, respectively. Furthermore, pairwise cluster F_ST was computed, and the estimated values for the 7 clusters ranged from 0.057 (C2 with C7) to 0.204 (C3 with C4) (S4 Table). Pairwise cluster estimates of gene flow (Nm) for the 7 clusters ranged from 0.978 (C3 with C4) to 4.595 (C1 with C5) migrants per generation.
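The relationship between the reported F_ST and Nm values can be checked with a few lines of arithmetic. The sketch below assumes the standard island-model approximation Nm = (1 - F_ST) / (4 F_ST); the text does not state which formula was used, but this one reproduces the reported pairs (0.053 with 4.467, and 0.041 with 5.848). The function names and the handling of interval boundaries (each range treated as closed on the left) are illustrative choices.

```python
def gene_flow(fst):
    """Estimate Nm from F_ST under the island-model approximation."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must be in (0, 1] to estimate Nm")
    return (1.0 - fst) / (4.0 * fst)


def differentiation_class(fst):
    """Classify F_ST on the scale described in the text.

    Assumption: boundary values belong to the higher class,
    e.g. exactly 0.05 counts as moderate.
    """
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "large"
    return "very large"


# F_ST values reported for zone of origin, breeding status, kernel row number.
reported = {"zones": 0.053, "breeding status": 0.097, "kernel row number": 0.221}
summary = {
    name: (differentiation_class(fst), round(gene_flow(fst), 3))
    for name, fst in reported.items()
}
```

Because Nm is derived directly from F_ST under this approximation, it is not an independent measurement of migration, which is worth keeping in mind when both are cited as evidence of gene flow.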

Discussion
SSR marker polymorphism and its use in barley diversity assessment
SSR (microsatellite) markers are useful tools for identifying genetic relationships among varieties that are difficult to distinguish morphologically. They have also been used successfully to determine genetic diversity in many plants, including barley (Varshney et al., 2004; Abebe et al., 2010; Shiferaw et al., 2012; Bellucci et al., 2013; Ren et al., 2014; El-Esawi, 2016; Chen et al., 2020).
In barley, the first molecular genetic maps comprised RFLP markers (Graner et al., 1991; Kleinhofs et al., 1993); over time, however, PCR-based molecular markers became the dominant marker type (Varshney et al., 2004). Among the different types of molecular markers available for barley, SSRs have proven to be the markers of choice for marker-assisted selection (MAS) in breeding and for genetic diversity studies, based on the identification of molecular patterns such as expected and observed heterozygosity and polymorphic information content (Varshney et al., 2005; Varshney et al., 2007). This is largely because they require small amounts of sample DNA, are easy to detect by PCR, are amenable to high-throughput analysis, and are co-dominantly inherited, multi-allelic, highly informative and abundant in genomes (Powell et al., 1996; Gupta and Varshney, 2000). The value of microsatellite markers for both genetic diversity studies and barley breeding was demonstrated as early as 1994 (Saghai Maroof et al., 1994; Becker and Heun, 1995; Struss and Plieske, 1998). Later, comprehensive microsatellite genetic maps integrating SSR loci were prepared by Ramsay et al. (2000) and Li et al. (2003).
The SSR markers selected in this study yielded reproducible polymorphic bands in 384 H. vulgare L. genotypes, showing that they provide a powerful and reliable molecular tool for analyzing genetic diversity and relationships among H. vulgare genotypes. Overall, 97.07% of the bands generated by the SSR assay in this study were polymorphic, which was lower than the 99.50% detected by SSR markers among barley accessions from Ethiopia, as reported by Abebe et al. (2010). This difference could be explained by the type of SSR primers and the sample type and size used in the studies.
Molecular markers with higher polymorphic information content (PIC) values are considered powerful markers for identifying and discriminating cultivars. A locus with an estimated PIC value greater than 0.50 is considered to be highly diverse (Botstein et al., 1980). In this study, the PIC values of the SSR markers used in the H. vulgare genotyping ranged between 0.243 and 0.885, with an average of 0.727. Among the 49 primers used in this study, a total of 32 (65.3%) had a PIC value above the average of 0.727, which indicates that they are highly informative SSR markers that could be employed in genetic diversity studies of H. vulgare L. genotypes.
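For readers who want to see how PIC relates to allele frequencies, the following sketch computes expected heterozygosity and PIC for a single locus using the formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The allele frequencies are hypothetical and the function names are illustrative; the study itself reports using the R package diveRsity for these statistics.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum of p_i squared."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content after Botstein et al. (1980)."""
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over all unordered pairs of distinct alleles.
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical allele frequencies at one SSR locus (must sum to 1).
locus = [0.5, 0.3, 0.2]
he_value = expected_heterozygosity(locus)
pic_value = pic(locus)
highly_informative = pic_value > 0.50  # threshold cited in the text
```

PIC is always somewhat lower than expected heterozygosity for the same locus, and the two converge as the number of alleles grows, which is consistent with the strong correlation between PIC and He reported in the Results.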

Levels of Genetic diversity among barley populations
Molecular methods have become an essential part of most studies on the extent of genetic diversity, and such studies may use RFLPs, RAPDs, AFLPs or SSRs. It is important, however, to understand that different markers have different properties and will reflect different aspects of genetic diversity (Karp and Edwards, 1995). In this study, the levels of genetic diversity of barley genotypes from various parts of Ethiopia were studied using SSR markers.
The average number of alleles per locus for the 49 SSR loci (9.76) was higher than values reported in other studies on barley (Abebe et al., 2010; Ould Med Mahmouda and Hamzaa, 2009), which could be associated with the higher number of landraces and primers used in this study. However, this is also a reflection of the diversity in Ethiopia, arising both from the fact that this is one of the major centres of diversity of barley, if not the major one, and from the diverse end-uses for barley in this region. Moreover, Molina-Cano et al. (2005) and Orabi et al. (2007), based on chloroplast DNA, suggested the Horn of Africa (Ethiopia and Eritrea) as a possible center of origin and domestication for cultivated barley.
Among the analyzed SSR markers, a total of 92 alleles were found to be unique to two-rowed landraces, six-rowed landraces or cultivars. The two-rowed barley landraces possessed 46.74% and the six-rowed barley type possessed 34.78% of the unique alleles, whereas cultivars possessed 18.48% of the private alleles. Matus and Hayes (2002) suggested that the occurrence of so many unique alleles could be an indication of the relatively high rate of mutation (and diversity) at SSR loci and of their potential as a reservoir of novel alleles required for plant breeding. The genetic diversity values for the SSR markers were high for all landraces as well as for landraces in each Zone, improvement status (particularly for those considered as landraces) and kernel row number.
The high diversity observed in Ethiopian barley landraces could be attributed to various factors, including the evolution of cultivated barley (Molina-Cano et al., 2005; Orabi et al., 2007), subsistence farming practices that rely on landraces (Demissie and Bjornstad, 1997; Lakew et al., 1997; Teshome et al., 1997; Hadado et al., 2010; Abebe et al., 2010), and the geographic and agro-climatic variability of the area affecting the adaptability of landraces (Megersa et al., 2019; Taye et al., 2019; Yang et al., 2020). The barley growing region of Ethiopia is characterized by a very diverse topography, with gorges and hills creating niches or variable micro-environments for barley production (Demissie and Bjornstad, 1997; Lakew et al., 1997; Hadado et al., 2009). Local farmers cultivate diverse landraces for different socio-cultural uses (Shewayrga and Sopade, 2011; Khan et al., 2012); these landraces also serve as insurance against the risk of crop failure and help meet the demands of niche environments. Consequently, farmers of the area have been maintaining invaluable diversity for generations. It has been documented that farmers make conscious decisions and management efforts based on agro-ecological conditions and end-use to maintain landrace diversity (Teshome et al., 1997; Seboka and van Hintum, 2006). Preferences for different landraces for various end-uses, such as two-rowed types (e.g. Kolo, roasted barley used as a snack), roasted grain (milky dough stage), local fermented or non-fermented beverages (e.g. Qaribo, Shameta and Tela), occasional dishes (e.g. Chuko) and daily dishes (e.g. Injera), affect the selection and maintenance of landraces, which in turn affect genetic diversity (Shewayrga and Sopade, 2011). A phenotypic study of a larger sample of 585 accessions (from which the 384 accessions were subsampled) indicated high variability for both quantitative and qualitative traits (Dido et al., 2020a; Dido et al., 2020a).

Partitioning Genetic diversity and population Structure
The barley genotypes used in this study are representative of collections from different barley producing Zones in Ethiopia. The analysis of population structure assigned all 384 barley genotypes to seven clusters. All of the methods used, namely STRUCTURE, UPGMA clustering, and discriminant analysis of principal components (DAPC) or PCoA, consistently recovered the same seven groups. The consistency of grouping using these methods has also been observed in earlier studies on different crop species (Tascioglu et al., 2016; Ya et al., 2017; Ketema et al., 2020). The differentiation of the population into different subpopulations by STRUCTURE is based on frequencies of relatedness of the genotypes to each of the hypothesized subpopulations (Nielsen et al., 2014; Chao et al., 2010).
In this study, we expected to have more than 10 groupings, corresponding to the geographic origin of the landraces. This expectation was based on the conviction that barley landraces, which are grown in diverse agro-ecologies of Ethiopia ranging from lowland to high mountainous areas by diverse ethnic groups practicing unique socio-cultural activities, are genetically more diverse than newly introduced modern cultivars (Lakew et al., 1997; Hadado et al., 2009; Abebe et al., 2010). However, contrary to our expectation, the results did not yield such distinct, strict clusters based on geographic origin; rather, admixture (Table 5) was observed among all seven subpopulations. Substantial admixture in the population was indicated by the first principal component explaining only 23.10% of the total genotypic variation. We observed that cluster III had the highest proportion of SD (Sidama) genotypes (33.33%) and WL (Wellega) genotypes (25.00%), comprising the highest proportion of landraces (16.80), followed by cluster VII (14.92%), which had the highest proportion of landraces (33.33%) from MT ( ) zone. This admixture could be due to gene flow facilitated by seed exchange among barley farmers in various agro-ecologies through shared markets and long-distance trade (Desmae et al., 2016).

Conclusion And Implication For Conservation And Use Of Barley Landrace For Improvement
Genetic information from this detailed study has provided first-hand data of the genetic diversity and structure of Ethiopian H. vulgare populations in its cultivation ranges distributed across various agro-ecologies which are crucial for developing strategy for conservation and use to improve the productivity of the crop. Among 15 populations from 15 localities across the majority of barley producing Zones, 478 alleles were obtained in total with an average of 9.755 alleles per locus. Natural populations maintained moderate to low genetic diversity levels, high gene flow and low genetic differentiation among populations. AMOVA also demonstrated major variation existed within populations, which is attributed to high gene flow facilitated by seed exchange.
Based on the results of the STRUCTURE and PCoA cluster analyses, the 15 natural populations were categorized into seven groups, which could be considered as seven management units for the purpose of conservation. The largest number of populations should be saved by on-farm conservation and ex situ conservation measures, taking precedence over those with genetic diversity and differentiation.
In this study, the markers used allowed investigation of population structure and genetic diversity and informed a proposed germplasm collection and conservation strategy for H. vulgare L. These markers provided important information about genetic structure, which contributes significantly to future improvement and breeding plans for the crop. The genetic diversity, population structure and genetic relationships between the populations revealed through SSR analysis will be helpful for crop breeding to improve productivity. To conclude, these results are an important resource for studying genetic diversity, supporting conservation and use, and initiating marker-based molecular breeding for future improvement.

Figure 1 Map of Ethiopia showing the collection sites of barley landraces based on Ethiopian National Regional States. The map was constructed using the QGIS-OSGeo4W Version 3.8.0 software.

Phylogenetic tree among 15 barley collection Zones based on the genetic-distance matrix using the neighbor-joining method in PowerMarker version 3.25 and visualized using the software MEGA X and FigTree.