SNPs have become the markers of choice in genetics and evolutionary biology studies, as well as in marker-assisted selection for plant breeding. A high density of markers genotyped on large numbers of individuals is vital for precise quantitative trait locus (QTL) mapping and association analysis (Beissinger et al., 2013). Genotyping-by-sequencing (GBS) is a next-generation sequencing (NGS)-based genotyping platform (Elshire et al., 2011) that uses reduced genome representation to enable high-throughput, genome-wide SNP genotyping at an affordable cost.
In this study, 309 pearl millet inbred lines were genotyped using GBS. Using the pearl millet genome (Varshney et al., 2017) as a reference and filtering the resulting calls yielded 54,770 high-quality genome-wide SNPs. The heterozygosity of the SNPs ranged from 0 to 20%, with an average of 15%. Because 4–6 plants were pooled for genomic DNA extraction, this level of heterozygosity may be attributed to heterogeneity among plants within an accession. High outcrossing rates, sequencing errors, and mapping errors have also been reported to cause high heterozygosity in pearl millet (Hu et al., 2015). The homozygosity of the genotypes ranged from 70 to 93%, with an average of 85%, as expected for inbred lines.
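The heterozygosity and homozygosity figures above are proportions of genotype calls, computed per SNP and per inbred line respectively. The sketch below shows one way such proportions can be obtained from a genotype matrix. It assumes a hypothetical 0/1/2 encoding (0 and 2 homozygous, 1 heterozygous) and excludes missing calls from the denominator; the study's exact encoding and denominators are not stated here, and the toy matrix is made up.

```python
from typing import List, Optional

# Hypothetical encoding (an assumption, not the study's format):
# 0 and 2 are the two homozygous genotypes, 1 is heterozygous, None is missing.
Genotype = Optional[int]


def snp_heterozygosity(calls: List[Genotype]) -> float:
    """Fraction of non-missing calls at one SNP that are heterozygous."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return float("nan")
    return sum(1 for g in observed if g == 1) / len(observed)


def line_homozygosity(calls: List[Genotype]) -> float:
    """Fraction of non-missing calls in one inbred line that are homozygous."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return float("nan")
    return sum(1 for g in observed if g in (0, 2)) / len(observed)


# Toy matrix: rows are inbred lines, columns are SNPs (made-up values).
matrix = [
    [0, 2, 1, 0, None],
    [0, 2, 2, 1, 0],
    [2, 0, 0, 0, 2],
]

per_snp = [snp_heterozygosity([row[j] for row in matrix]) for j in range(len(matrix[0]))]
per_line = [line_homozygosity(row) for row in matrix]
print(per_snp)   # heterozygosity for each SNP column
print(per_line)  # homozygosity for each inbred line
```

With pooled DNA from several plants, a "heterozygous" call at a SNP can also arise from two different homozygous plants in the pool, which is why these proportions are summaries of calls rather than direct measures of individual-plant heterozygosity.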
Genome-wide marker density analysis across the chromosome arms identified an average of 35 SNPs per Mb of genome (1 SNP per 29 kb). This density is lower than the 48 SNPs per Mb previously reported (Serba et al., 2019). Consistent with that study, SNPs were denser in the telomeric regions than in the pericentromeric regions of the pearl millet chromosomes, probably because of low recombination rates, low gene density, and/or fewer restriction sites for the enzymes around the centromeres. Because the centromere positions have not been determined in pearl millet, the low SNP density in one arm of chromosome 5 may be associated with the location of its centromere. It could also reflect fewer restriction enzyme sites in this region, or fewer polymorphisms arising from the interplay of the forces that shape nucleotide polymorphism across the genome, such as mutation, selection, recombination, and genetic drift (Begun and Aquadro, 1992; Cutter and Payseur, 2013; Gosset and Bierne, 2013; Cruickshank and Hahn, 2014). Low gene density, which is associated with low nucleotide diversity (Flowers et al., 2011), and weaker natural and artificial selection on alleles in this part of the genome are additional plausible explanations for the lower marker numbers.
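The density figures above rest on simple arithmetic: an average of 35 SNPs per Mb corresponds to a mean spacing of 1,000 kb / 35 ≈ 28.6 kb, i.e. roughly 1 SNP per 29 kb. A minimal sketch of that conversion, and of counting SNPs in non-overlapping 1 Mb windows to locate low-density stretches, is below. It assumes 1-based SNP positions (as in VCF) and a known chromosome length; the function names, window size, and example positions are illustrative, not taken from the study.

```python
from typing import List


def snps_per_window(positions: List[int], chrom_length_bp: int,
                    window_bp: int = 1_000_000) -> List[int]:
    """Count SNPs in consecutive non-overlapping windows of one chromosome.

    Positions are assumed to be 1-based. Windows with no SNPs are kept as 0,
    so sparse regions (e.g. a putative centromere) remain visible.
    """
    n_windows = (chrom_length_bp + window_bp - 1) // window_bp
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= chrom_length_bp:
            raise ValueError(f"position {pos} outside chromosome")
        counts[(pos - 1) // window_bp] += 1
    return counts


def mean_spacing_kb(snps_per_mb: float) -> float:
    """Convert an average density (SNPs per Mb) into mean spacing in kb."""
    return 1000.0 / snps_per_mb


print(round(mean_spacing_kb(35), 1))  # about 28.6 kb, i.e. ~1 SNP per 29 kb
# Toy chromosome of 4 Mb; the third window contains no SNPs.
print(snps_per_window([120_000, 950_000, 1_200_000, 3_400_000, 4_000_000], 4_000_000))
```

Keeping empty windows in the output, rather than only windows that contain SNPs, makes gaps such as the sparse arm of chromosome 5 show up directly as runs of zeros.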
Germplasm resources and the genetic diversity of a crop species are of paramount importance for improving the crop for desirable traits and for conserving genetic resources. Nucleotide polymorphism is a measure of genetic diversity and key to understanding how past selective forces have shaped the gene pool. Average nucleotide diversity across the whole panel was 0.28, higher than reported for a global collection (Hu et al., 2015) but lower than the mean gene diversity (0.54) estimated with simple sequence repeat (SSR) markers in the pearl millet inbred germplasm association panel (PMiGAP) (Sehgal et al., 2015). Because adaptive evolution is implicated in reducing functional diversity (Hoelzel et al., 2019), genetic distances among inbred lines derived from landraces grown in similar environments are expected to be low. Nevertheless, ecology and evolution jointly determine population stability and maintain diversity within and among populations (Koch et al., 2014).
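One reason the SNP-based value (0.28) and the SSR-based value (0.54) are not directly comparable is that per-locus gene diversity, 1 − Σ pᵢ², cannot exceed 0.5 at a biallelic SNP (it equals 2pq, maximal at p = q = 0.5), whereas multiallelic SSR loci can exceed 0.5. The sketch below illustrates this. It assumes diversity is computed as uncorrected expected heterozygosity from allele frequencies; the study's exact estimator (for example, whether a sample-size correction was applied) is not stated here, and the function names are hypothetical.

```python
from typing import Sequence


def gene_diversity(allele_freqs: Sequence[float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) at one locus (no sample-size correction)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def mean_snp_diversity(alt_freqs: Sequence[float]) -> float:
    """Average diversity over biallelic SNPs, given alternate-allele frequencies."""
    return sum(gene_diversity([1.0 - q, q]) for q in alt_freqs) / len(alt_freqs)


print(gene_diversity([0.5, 0.5]))                # 0.5, the ceiling for a biallelic SNP
print(gene_diversity([0.25, 0.25, 0.25, 0.25]))  # 0.75, reachable only with more than two alleles
```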
Population structure is central to evolutionary genetics: it describes how a metapopulation is partitioned into subpopulations that may have evolved independently. Knowledge of the genetic diversity and population structure of a crop has important implications for genome-wide association studies, because unaccounted-for structure can produce spurious marker–trait associations. In the present study, population structure analysis using 30,893 SNPs detected five subgroups in the panel of 309 inbred lines, and the grouping largely matches the pedigree relationships or parental sources of the inbred lines. This number of subpopulations was supported by plotting the cross-validation error against the number of assumed subpopulations (K). Other genetic diversity studies reported six subpopulations in different pearl millet panels (Sehgal et al., 2015; Serba et al., 2019). A population genomics study comparing a collection of Senegalese landraces with a global collection found greater diversity in the Senegalese landraces (Hu et al., 2015), supporting a West African origin of pearl millet (Burgarella et al., 2018). Natural selection, migration, and genetic drift may have changed allele frequencies over time and acted as forces of genetic diversification and population structure formation (Cortázar-Chinarro et al., 2017). The assignment of inbred lines from the same geographic region to different subpopulations suggests that selection for different traits is maintaining genetic diversity.
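Choosing the number of subpopulations from cross-validation error usually means running the structure analysis over a range of K values and keeping the K with the lowest error. The sketch below parses such errors and picks that K. It assumes log lines formatted like `CV error (K=5): 0.4890`, the format printed by ADMIXTURE's `--cv` option; the software used in this study is not named here, so adjust the pattern to your tool. The error values in the example are made up and are not results from this study.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed log-line format, e.g. "CV error (K=5): 0.4890".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(lines: Iterable[str]) -> Dict[int, float]:
    """Map each K to its cross-validation error, ignoring unrelated lines."""
    errors: Dict[int, float] = {}
    for line in lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors: Dict[int, float]) -> Tuple[int, float]:
    """Return the K with the lowest cross-validation error (first K wins ties)."""
    k = min(errors, key=lambda key: errors[key])
    return k, errors[k]


# Made-up values for illustration only.
log = [
    "CV error (K=3): 0.5120",
    "CV error (K=4): 0.4975",
    "CV error (K=5): 0.4890",
    "CV error (K=6): 0.4931",
]
print(best_k(parse_cv_errors(log)))  # (5, 0.489)
```

When the error curve is nearly flat around its minimum, the strict minimum is only one reasonable choice; a smaller K within the plateau may be equally well supported, so plotting the full curve is worth doing before settling on a value.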
Wide genotypic variation exists in pearl millet for various agronomic traits and stress tolerance as a result of its cultivation in diverse agro-climatic conditions and soil types (Shivhare and Lata, 2017). However, only a limited portion of this germplasm has been exploited in breeding to improve agronomic traits, stress tolerance, and productivity (Passot et al., 2016), and few studies have assessed the evolutionary dynamics and genetic diversity patterns of pearl millet. Genetic characterization of early- and late-flowering landraces from Senegal also revealed large diversity in Senegalese pearl millet germplasm that may be useful for defining heterotic groups and assembling association panels for trait mapping (Diack et al., 2017). This study surveys genetic variation in pearl millet inbred lines from different geographic regions of Africa and Asia representing various agroecological niches. However, because the inbred lines were developed from landraces, improved varieties, and crosses between different genotypes, the subgroups could not be related to geographic origin.