Core collection definition and field evaluation
A total of 541 Senegalese pearl millet landraces were collected between 1992 and 2014 (9, 11, 22) from farmers with their approval and according to institutional, national, or international guidelines. SSRs markers were used to genotype the germplasm consisting of 429 early-flowering morphotypes (Souna) and 112 late-flowering morphotypes (Sanio). During 2014 and 2015, we evaluated the phenotype of 392 of these accessions in ISRA research stations. A total of 306 Souna accessions were evaluated at Bambey (N14°32'12" W16°36'41") and at Nioro (N13°45'00" W15°45'00"), while 86 Sanio accessions were evaluated at Senthiou Maleme (N13°49'01" W13°55'03") and at Kolda (N12°53'02" W14°57'05"). In each site, the experiment was a randomized complete block design with 3 replications. Each accession was grown in a single row of 8 hills. The distance between rows and between plants within a row was 90 cm. For the different trials, the following phenotypes were measured : downy mildew incidence (DM), 50% flowering time (FLO), nodal tiller number (NTN), plant height (PH), number of productive tillers (NPT), flag leaf length (FLL), flag leaf width (FLW), spike length (SL), spike thickness (ST), 1000 seed weight (SW), grain yield (GY), panicle yield (PY) (25). To establish a core collection from this panel, an advanced maximization sampling, called heuristic, was performed based on phenotypic and genotypic data from these 392 accessions using the PowerCore v 1.0 software (26). From this heuristic approach, 91 accessions were retained, consisting of 60 early-floreing Souna and 31 late-flowering Sanio, and field-evaluated at Nioro during the 2016 and 2017 rainy seasons. The experimental design for each trial was a 7 × 13 alpha lattice, with three repetitions. Each of the tested accession was grown in a single row of eight hills in each repetition and the measurements were taken on three hills. The genetic variability of the panel was assessed using the significance of differences between the Nei genetic index of core collection and a Student t-test at α = 0.05 (27). To assess whether the core collection captured the diversity of the whole dataset, we calculated the percentage of mean and of variance difference (%MD, %VD), the coincidence rate (%CR), and variable rate (%VR) according to (26). There parameters allow knowing the difference of accessions in average and in distribution, respectively. We build the core-collection based on statistical values for these parameters. Notably, the core collection is considered to be the representative of the entries collection when no more than 20% of the traits have different means (significant at α = 0.05) between the core collection and the entries collection, and the coincidence rate CR% retained by the core collection is no less than 80%. Analysis of variance was performed on the different phenotypic parameters using the following model:
Y = µ + G + Y + GY + R + B + 𝜺
where Y is the phenotype; µ, the mean; G the genetic effect; Y, the year effect; GY, the interaction between genotype and year; R the replication effect; B, the incomplete block and 𝜺, the residual effect. The heritability of agro-morphological characters was calculated using the mixed linear model with random-effects for individuals, using the Plant Breeding Tools v 1.3 software, according to the following formula, where is the genotypic variance, the genotype by (y) year variance and , the residual variance for (r) replicates and (y) year. :
For each trait, the adjusted mean from the 2016 and 2017 trials of each individual was calculated with fixed-effects for individuals, using the Plant Breeding Tools v 1.3 software and considered as the value of the individual for the trait.
A principal component analysis was performed with the individually adjusted means for each phenotypic trait, centered and reduced, using the adegenet v2.1.1 package (28), R v3.5.1 (29). A discriminant analysis (DA) between early- and late-flowering millets was then performed. Characters that were significant with a P-value < 0.001 and which presented a high correlation (r) ≥ 0.7, with the axis of differentiation, were identified as discriminating characters between early- and late-flowering millets. This analysis was performed using the (XLStat 2014 software; http://www.xlstat.com) and the distribution of discriminating agro-morphological characters was plotted using R software v. 3.5.1 (29).
DNA extraction, libraries construction, and sequencing
Genotyping-by-sequencing (GBS) was performed on genomic DNA extracted as previously described (30) from a single plant sampled at the 5-leaf stage of each 91 accessions grown in 2016 at Nioro. The DNA was of good quality with ratios 260/280 and 2060/230 between 1.8 and 2. Extracted DNA was stored in a solution of Tris-HCl and sent to be sequenced at the Next Generation Sequencing Platform of the CHU Research Center, University of Laval, Quebec. The libraries were generated in 2 multiplexes of 45 and 46 samples. PstI-MspI double-digestion was applied, and adapters were linked to each sample followed by mixing and amplification. The sequencing of these libraries was done using an Illumina HiSeq2500.
SNPs calling, filtering, and data analysis
The evaluation of the quality of reads was done using FastQC v 0.72 and MultiQC v 1.6 and the cleaning of the sequences was done with FastQ Trimmer v 1.0.0. Only the sequences having an average quality (Q) ≥ 30 (sanger format) were retained and the first 7 bases (5' side) of each read were removed. The sequences were aligned to pearl millet reference genome (GenBank Accession number GCA_002174835.2), using BWA v 1.2.3, before realigning the sequences for insertions and deletions using RealignerTargetCreator v 0.0.4 and IndelAligner v 0.0.6. The BAM format files from the above procedures were joined using the MergeBAM v 1.2.0 tool and the SNP calling was made using UnifiedGenotyper v0.0.6. A total of 545,834 variants were called including 502,382 SNPs. The filtering was first done according to the quality of the mapping and the depth, by applying hard filtering using Variant filtration v0.0.5 tool (MQ ≥ to 40 and MQ divided by the depth of unfiltered samples greater than 0.1). A second filter was performed according the frequency of minor allele (MAF) greater than 0.05, and allowed proportion of maximum missing data of 0.05 for markers and 0.1 for individuals, using Plink v1.9. The multi-allelic markers were then removed using Tassel v5.2.48. The output file ultimately contained 21,663 SNPs markers and 78 individuals. Subsequent analyses were performed using this dataset. Subsequent analyses were performed using this dataset. All the bioinformatics analysis was carried out on the Galaxy v18.0.5 platform (31), implemented in the Bio-Linux 8 operating system (32).
Genetic structure
The genetic structure was evaluated using the sNMF algorithm, through the LEA v2.2.0 package implemented in the R software. For this analysis, we used several populations ranging from K = 1 to K = 10, and 10 repetitions for each K value. Discriminant Analysis of Principal Components (DAPC) was also used through the adegenet v2.1.1 package (28). The choice of the number of axes (PCs) retained for the DAPC was made using a cross-validation method, performed on the data set subdivided into two training sets of 90% and 10% (33). A test comprising 30 repetitions was performed to preselect a limited number of PCs on which to refine the search. A second test comprising 1000 repetitions was carried out on the limited number of preselected PCs to retain the number of PCs that allow the highest proportion of correct prediction with the lowest error. This analysis was carried out using the R software adegenet package (28).
GWAS
Association analyses were conducted with a mixed linear model (MLM) correcting for population structure and kinship was conducted using the Tassel v5.2.48 software. Q-Q and Manhattan plots illustrating the results of GWAS were produced using the qqman package v 0.1.4 from R software (34). The significance threshold (α) of the association of SNP markers with the different traits was calculated using the Bonferroni correction (35). SNPs markers significantly associated with agro-morphological traits were localized in the pearl millet genome intervals. This location was performed with the valR package v 0.5.0 (36) of R software. Pearl millet genome annotation was used to identify these genes.