To develop sesame varieties with desirable traits, knowledge of the genetic diversity and relationships among germplasm accessions is vitally important. Molecular markers reflect the actual level of genetic variation existing among genotypes at the DNA level; hence, they provide a more accurate estimate of variation than either phenotypic or pedigree information [49].
This study assessed the suitability of DArT platforms for the genomic dissection of sesame. A total of 6115 silicoDArT markers were developed, of which 5002 provided robust information on the sesame genome in the absence of sequence information. In addition, DArTseq provided 6474 informative SNP markers.
The average PIC values of the silicoDArT markers were similar to those of the SNP markers. The abundance of silicoDArT and SNP markers may achieve better genome coverage by sampling a greater number of points across the whole genome, as marker density is highly correlated with gene density [57, 58]. Therefore, both silicoDArT and SNP markers may be well suited to genetic diversity studies, association/linkage mapping, and/or sequence-based physical mapping in sesame. Additionally, the co-dominant inheritance pattern of SNP markers may increase the utility of DArT platforms for genetic identity and parentage analysis [59]. Compared with other existing marker technologies such as microsatellite markers, DArT markers are well suited to high-throughput work and have advantages in cost-effectiveness and time [60]. The effectiveness of silicoDArT and SNP markers varies with the type of application. For genetic diversity and linkage mapping, a large number of silicoDArT markers are suitable. However, for genetic identity and product quality testing, both marker types can perform equally well. Because they make it possible to track alleles from parental genotypes, the co-dominant SNP markers are more suitable than silicoDArT markers for plant identity and parentage analysis.
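The text does not state which formula was used for the PIC values compared above. As an assumption, the commonly used definition of the polymorphism information content for a marker with allele frequencies $p_i$ is sketched here, together with its biallelic special case (allele frequencies $p$ and $q = 1 - p$), which applies to SNPs:

```latex
\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^{2} - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2\,p_i^{2}\,p_j^{2},
\qquad
\mathrm{PIC}_{\text{biallelic}} = 1 - \left(p^{2} + q^{2}\right) - 2\,p^{2}q^{2}.
```

Under this definition a biallelic marker reaches its maximum PIC of 0.375 at $p = q = 0.5$, so PIC values for SNP and presence/absence markers are bounded well below those attainable with multi-allelic markers such as microsatellites.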
Then, 2997 SNP markers were retained after filtering at a call rate of 75% and a minor allele frequency (MAF) > 0.01 and were used for the analysis. The proportion of rare SNPs (i.e., MAF < 0.05) among those examined was ∼61.29%, similar to that reported for sesame genomes [38]. In our study, the high proportion of rare SNPs has two possible explanations. First, because the SNPs were identified via DArTseq, a GBS-based technology that provides broad genome coverage, they should be less prone to bias than low-coverage sequencing data [61]. Second, under recent programs to conserve genetic resources, a significant number of minor sesame varieties have been collected and preserved by Ethiopian Biodiversity and research centers. SNPs with a MAF < 0.05 were removed in several previous studies [62, 63]. However, rare SNPs might also control the expression of a particular phenotype [64]. Given that the number of individuals carrying a specific genotype will be very small, the effect of rare alleles on genome mapping could extend beyond the effect of small population size alone. In such cases, increasing the number of individuals carrying rare alleles could improve the ability to test these alleles.
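The filtering step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the genotype coding (0, 1, 2 copies of the alternate allele, `None` for a missing call), the function names, and the toy data are all assumptions made for the example.

```python
from typing import Dict, Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF computed from the non-missing diploid calls of one SNP."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(snps: Dict[str, Sequence[Genotype]],
                min_call_rate: float = 0.75,
                min_maf: float = 0.01) -> Dict[str, Sequence[Genotype]]:
    """Keep SNPs with call rate >= min_call_rate and MAF > min_maf."""
    return {
        name: calls
        for name, calls in snps.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) > min_maf
    }


def rare_fraction(snps: Dict[str, Sequence[Genotype]],
                  rare_threshold: float = 0.05) -> float:
    """Proportion of SNPs whose MAF is below rare_threshold."""
    if not snps:
        return 0.0
    rare = sum(minor_allele_frequency(c) < rare_threshold for c in snps.values())
    return rare / len(snps)


# Hypothetical toy panel of 20 accessions and four SNPs.
toy_snps = {
    "snp_a": [0] * 19 + [1],                  # rare allele, fully called
    "snp_b": [0] * 10 + [2] * 10,             # common allele, fully called
    "snp_c": [0] * 20,                        # monomorphic, removed by MAF filter
    "snp_d": [None] * 10 + [0] * 5 + [2] * 5, # call rate 0.5, removed
}

kept = filter_snps(toy_snps)
print(sorted(kept), rare_fraction(kept))
```

In this toy panel the rare SNP survives the lenient MAF > 0.01 cutoff, which is the situation the paragraph describes: a low MAF threshold keeps rare variants in the data set at the cost of a high rare-SNP proportion.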
The average value of genetic diversity (0.14) was lower in the present study than in earlier reports for sesame collections analyzed with SNP markers [29, 32, 38, 39] and SSR markers [65, 66]. However, when 1022 SNP markers filtered at a call rate of 97% and a MAF > 0.05 were used, as in [38], the average value of genetic diversity (0.19) was higher than in earlier reports for sesame collections analyzed with different marker types [32, 38]. The broad range of values across studies might stem from differences in the genetic resources used (such as landraces, advanced breeding lines, cultivars, etc.), data filtering methods, sampling approaches, and the number of markers [65]. The type of marker is also an important factor in estimating gene diversity. In general, the genetic diversity estimated with SNPs may be lower than that estimated with SSR markers; however, an accurate estimate of genetic diversity depends on the number of loci rather than the number of alleles [38]. Therefore, sufficiently large numbers of next-generation-sequencing-based SNPs analyzed across the genome can provide accurate estimates of genome-wide diversity in several crop species.
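The dependence of the average diversity value on the MAF cutoff can be illustrated with a short sketch. It assumes that genetic diversity is the per-locus expected heterozygosity, 1 − Σp², averaged over loci, without a small-sample correction; the study's exact estimator is not stated here, and the function names and toy loci are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def alt_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Alternate-allele frequency from the non-missing diploid calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    return sum(observed) / (2 * len(observed))


def gene_diversity(calls: Sequence[Genotype]) -> float:
    """Expected heterozygosity of a biallelic locus: 1 - (p^2 + q^2)."""
    p = alt_allele_frequency(calls)
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def mean_gene_diversity(loci: List[Sequence[Genotype]], min_maf: float) -> float:
    """Average gene diversity over loci whose MAF exceeds min_maf."""
    kept = []
    for calls in loci:
        p = alt_allele_frequency(calls)
        if min(p, 1.0 - p) > min_maf:
            kept.append(gene_diversity(calls))
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical toy loci: one common, one intermediate, one rare variant.
toy_loci = [
    [0, 0, 2, 2],        # p = 0.5
    [0, 0, 0, 1],        # p = 0.125
    [0] * 19 + [1],      # p = 0.025 (rare)
]

lenient = mean_gene_diversity(toy_loci, min_maf=0.01)  # rare locus included
strict = mean_gene_diversity(toy_loci, min_maf=0.05)   # rare locus excluded
print(lenient, strict)
```

Because rare variants contribute very little expected heterozygosity, excluding them raises the panel average, which is consistent with the higher value obtained in the text under the stricter filter.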
When genotypes were grouped by geographical origin, the African collection excluding the Ethiopian regions (0.21) was more diverse than the Asian collection and the collections from the different regions of Ethiopia. However, when the comparison was made at the continent level, with the Ethiopian regions included in Africa, Asia (0.17) was more diverse than Africa (0.14), even though the Asian sample was small. This finding was expected because the geographical origin of a crop generally shows higher genetic diversity, as reported previously for cotton (Paterson A., 2009) and Oryza ssp. [61]. Laurentin and Karlovsky [28] also found higher genetic diversity in sesame accessions collected from Asia.
Given the sample size and the result for Africa, we further partitioned the African collection into four geographical groups by direction (North, South, East, and West Africa) and compared them with the Asian collection. The North African collection (0.23) was more diverse than the other three African groups and the Asian collection, whereas the East African (Ethiopian) collection was less diverse than the others. This indicates that, even though Ethiopian sesame is well known in the international market for its distinctive taste and aroma, further breeding is needed to broaden its genetic diversity through hybridization and the introduction of highly diverse collections from North Africa and different Asian countries.
The distribution of heterozygosity across sesame genotypes and SNP markers revealed low values; the average heterozygosity within the sesame panel was 0.1, which suggests that the accessions used were close to being inbred lines. Hence, the selected accessions are suitable for investigating multiple phenotypic traits in multi-plot field tests over several years and for carrying out GWAS.
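A per-accession heterozygosity of the kind summarized above can be computed as the fraction of non-missing SNP calls that are heterozygous. The sketch below is illustrative only; the genotype coding, accession labels, and data are assumptions, not values from the study.

```python
from typing import Dict, Optional, Sequence

# Assumed coding: 0 and 2 = homozygous, 1 = heterozygous, None = missing call.
Genotype = Optional[int]


def observed_heterozygosity(calls: Sequence[Genotype]) -> float:
    """Fraction of an accession's non-missing SNP calls that are heterozygous."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    return sum(g == 1 for g in observed) / len(observed)


def panel_mean_heterozygosity(panel: Dict[str, Sequence[Genotype]]) -> float:
    """Mean of the per-accession heterozygosity values."""
    values = [observed_heterozygosity(calls) for calls in panel.values()]
    return sum(values) / len(values)


# Hypothetical accessions, each genotyped at five SNPs.
toy_panel = {
    "acc1": [0, 2, 0, 0, 2],     # fully homozygous, as expected for an inbred line
    "acc2": [0, 1, 0, 2, 2],     # one heterozygous call out of five
    "acc3": [1, None, 0, 1, 0],  # two heterozygous calls out of four non-missing
}

per_accession = {name: observed_heterozygosity(c) for name, c in toy_panel.items()}
print(per_accession, panel_mean_heterozygosity(toy_panel))
```

Values near zero, as for the first toy accession, are what one expects from a predominantly self-pollinated crop maintained as near-inbred lines.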
The genetic distance matrix among the sesame populations from eight geographical origins was also used to construct the clustering tree (Fig. 6). The similarity coefficients ranged from 0.015 to 0.394, with an average of 0.165. The sesame populations could be clustered into four groups. The dendrogram showed that most sesame accessions from the same origin did not cluster according to their country of origin, except for the accessions introduced from Egypt. Similar results were reported previously in different sesame germplasm [39, 68–70] and in other crops, including wheat [71], finger millet [72], and sorghum [73]. This lack of correspondence between clustering and geographical origin may be associated with gene flow among geographical areas due to migrations of people who traded with other regions for a century or who carried seeds for cultivation.
Similarly, Laurentin and Karlovsky [28] found no association between genetic diversity and accession origin, and they proposed that ecological and geographical factors have not played a significant role in the evolution of sesame. The present AMOVA also supported the possibility of high rates of gene flow between regions: variation among the geographical groups accounted for 8.3% of the total molecular variation, and variation among continents accounted for 11.49% (Table 3).
Most of the genotypes used in this study have been used as parental lines or have a similar genetic background, so a mixture of pedigrees was observed in all clusters. In our results, the genotypes in Clusters 2 and 3, which were collected from different regions of Ethiopia, showed a tendency to cluster together. This result is consistent with the hypothesis that sesame seeds were dispersed to nearby countries by human activities. These distributed sesame genetic resources were later used in further breeding activities to develop modern cultivars that were commercialized.
Cluster 1 contained accessions originating from two continents (Africa and Asia), indicating a close genetic relationship between the accessions from East, South, North, and West Africa and those from Asia. This close relationship might be due to the introduction of similar sesame genetic stock into many countries and to material exchange among widely separated locations [74]. Moreover, the exchange of plant materials between Asia and East Africa dates back a long time and is still occurring [75], with a gradual increase in the annual export of raw sesame seeds, mainly for industrial applications. The likelihood of crossing between materials from different locations grown within the same area is high, given that cross-pollination in sesame has been reported to occur at a frequency between 5% and 60% [66]. Such crossing could explain the similarity of accessions from the eastern part of Africa and Asia. Similar patterns have also been observed by other researchers [28, 69, 74].
Cluster 4 shows that genotypes from the same origin can group together: the genotypes from Egypt, one of the African countries, were clustered together (Fig. 6).
Population Structure of the Association-Mapping Panel
The complex breeding history of many important crops and the limited gene flow in most wild plant populations have created complex structures within their germplasm [76]. Detailed knowledge of the population structure in an association panel is thus important to avoid spurious associations [77]. Population structure in sesame has been assessed in different populations. For example, Ali et al. (2007) [68] evaluated 96 sesame accessions collected from different parts of the world, which clustered into just two major groups that discriminated varieties according to their geographical origin. In [37], 705 sesame accessions were divided into two clusters using a neighbor-joining tree. More recently, in [38], a K value of 2 was determined by both LnP(D) and ∆K and, using a 70% probability-of-membership threshold, the 366 sesame accessions were divided into three subgroups (Pop 1, Pop 2, and Mixed). In [39], a Mediterranean sesame core collection of 95 agro-morphologically superior accessions from geographically diverse regions on four continents (Asia, Europe, America, and Africa) was divided into three groups using STRUCTURE with K = 3.
Similarly, in our study, a K value of 4 was determined by both LnP(D) and ∆K. Using a 50% probability-of-membership threshold, the panel was divided into four subgroups (Pop 1, Pop 2, Pop 3, and Pop 4), and the remaining 21 accessions were classified as admixed, with varying levels of membership shared among the four genetic groups. The occurrence of some admixed/hybrid and introgressive hybrid genotypes indicates frequent hybridization and introgression events. Although the extent and significance of natural hybridization/introgression are unclear [79], new gene combinations between domestic cultivars and their wild or weedy relatives are important for the evolution of domesticated plant species [80].
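The threshold-based assignment described above can be sketched as follows, assuming each accession has a vector of K = 4 membership coefficients such as those in a STRUCTURE Q matrix. The function name, the accession labels, the coefficients, and the choice of an inclusive (≥) threshold are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def assign_subgroup(memberships: Sequence[float], threshold: float = 0.5) -> str:
    """Assign an accession to the cluster with the largest membership
    coefficient, or label it 'Admixed' if that coefficient is below threshold."""
    best = max(range(len(memberships)), key=lambda k: memberships[k])
    if memberships[best] >= threshold:
        return f"Pop {best + 1}"
    return "Admixed"


# Hypothetical membership coefficients for K = 4 (each row sums to 1).
toy_q: Dict[str, Sequence[float]] = {
    "acc1": [0.90, 0.05, 0.03, 0.02],
    "acc2": [0.10, 0.20, 0.10, 0.60],
    "acc3": [0.30, 0.30, 0.20, 0.20],  # no cluster reaches 0.5
    "acc4": [0.05, 0.85, 0.05, 0.05],
}

assignments = {name: assign_subgroup(q) for name, q in toy_q.items()}
counts = Counter(assignments.values())
print(assignments)
print(counts)
```

Raising the threshold (for example to 0.7, as in one of the earlier sesame studies cited above) moves more accessions into the admixed class, so subgroup sizes from different studies are only comparable when the same threshold is used.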
The genetic diversity within each population was described by the expected heterozygosity (the average distance between individuals in the same cluster), which varied from 0.06 (POP2) to 0.31 (POP4); the expected heterozygosity was 0.22 for POP1 and 0.18 for POP3. The genetic divergence among populations, measured by Nei’s net nucleotide distance (D), was highest between POP3 and POP4 (D = 0.22) and lowest between POP1 and POP2 (D = 0.09). The mean fixation index of the subpopulations ranged from 0.39 (POP4) to 0.77 (POP2) (Table 5).
The population genetic structure reflects interactions among species with regard to their long-term evolutionary history, mutation and recombination, genetic drift, reproductive system, gene flow, and natural selection [81, 82]. Thus, an understanding of the extent and structure of the genetic diversity of a crop can be a prerequisite for the conservation and efficient use of the germplasm available for breeding [83]. The various approaches (STRUCTURE, PCA, and the clustering tree) used to analyze the structure and relationships of the sesame germplasm appeared to provide complementary information. The neighbor-joining tree divided the sesame germplasm into four main clusters, in complete concordance with the STRUCTURE and PCA results. These results suggest that crossing genotypes from different clusters may yield cultivars with promising agronomic traits.
According to the AMOVA results, 8.3% of the marker variation was explained by differences among populations from the different geographical regions of the sesame panel, and 11.49% by differentiation between the Asian and African populations. This result suggests the absence of a complicated population structure in our association-mapping panel. By comparison, 22.17% of the marker variation was explained by differences among the populations from the different directions of Africa and Asia, which suggests the presence of some between-population structure when the panel is partitioned in this way.
In this study, most collections (225) were from Ethiopia; others were from West, South, and North Africa, and seven were from four Asian countries. Ethiopian sesame has useful characteristics and is often branded as ‘Humera’, ‘Gondar’ and ‘Welega’ types, which are well known in the world market for their white color, sweet taste, and aroma. The Humera and Gondar sesame seeds are suitable for bakery and confectionery purposes, and the high oil content of the Welega sesame seed gives it a major advantage for edible oil production [84]. Collections introduced from the different directions of Africa and from Asia were used to compare the degree of genetic relationship and differentiation with the genetic resources of the Ethiopian collection; such introductions, by broadening genetic diversity, can also be used to combine alleles for valuable agricultural traits [86]. The SNPs obtained from this collection could benefit future breeding and association mapping work in sesame. Our diversity analysis of this collection revealed genetic relationships among the accessions that may be valuable for parental selection in sesame improvement research. Therefore, the identification of genetically distant accessions (such as Najjoo-68 (gabaa kamijaa) and 17712) for hybridization in sesame breeding programs has the potential to lead to the development of elite varieties. Moreover, based on economically important traits and the genetic distances obtained from the SNPs, additional accessions can be selected for different breeding programs.