Genome-wide association mapping for agronomic and quality traits in foxtail millet

Background: Foxtail millet [ Setaria italica (L.) P. Beauv.] is a particularly important cereal and fodder crop in arid and semi-arid regions. The genomic variation and alleles underpinning agronomic and quality traits are important for foxtail millet improvement. To better understand the diversity of foxtail millet and facilitate the genetic dissection of its agronomic and quality traits, we used high-quality single nucleotide polymorphisms (SNPs) to perform a genome-wide association study (GWAS). Results: Using genotyping-by-sequencing, 107 foxtail millet accessions were sequenced, and further analysis revealed 72,181 high-quality SNPs, of which 53 were significantly associated with 15 agronomic and quality traits. These SNPs were distributed across the nine chromosomes of foxtail millet; 44 were located in intergenic regions, whereas one and eight SNPs were located in exon and intron regions, respectively. The GWAS revealed that 28 SNPs were associated with a single trait. Conclusions: For some of the significant SNPs, favourable genotypes showed pyramiding effects for several traits. The 53 loci identified in this study will therefore be useful for breeding programs aimed at foxtail millet improvement. spike diameter) Ten random each were and the distances from the ground to the tips of the plants were measured and averaged per replication. Stem diameters were measured with a Vernier calliper and averaged per replication. The lengths and widths of flag leaves were measured and averaged as the leaf length per replication. Ten labelled leaves in each plot were selected to assess the chlorophyll SPD value by using a portable chlorophyll meter (Konica Minolta , Tokyo, Japan); five measurements per leaf were averaged for each replication. Ten random plants from each plot were hand-harvested to determine spike length, diameter, and weight; all spikes were harvested by hand to assess crop yield. Measurements were averaged per replication for each accession.

3 recognised for its potential as a health-promoting functional food that helps reduce the risk of disease [3], reportedly by reducing blood glucose levels and controlling cholesterol in both healthy subjects and patients with diabetes [4]. Despite its health benefits, foxtail millet's cultivation has decreased significantly compared to other crops due to its lower yield and decreased economic potential [5]. However, increasing awareness of the importance of dietary diversification has led to increased market demand and new opportunities for this crop [6]. Thus, new cultivars with improved nutritional value and higher yield, especially salt-resistant cultivars, are in demand.
Foxtail millet germplasm has a high level of genetic diversity owing to the wide distribution of this plant and its adaption to various agro-climatic conditions [7].
Agronomic traits, such as grain morphology, are the targets of breeding programmes designed to improve both yield and quality as they largely affect crop productivity and responses to environmental stressors [8]. Understanding the genetic basis of phenotypic variation within the germplasm, especially for agronomic traits that are qualitatively important (such as salinity tolerance), has been a major focus in foxtail millet genetic studies. Notably, most agronomic traits are quantitative, making it difficult to unravel the genetic basis for phenotypes of interest.
Linkage mapping, which identifies quantitative trait loci (QTL) that are closely related to complex traits, has been successfully applied and widely utilised in plants. A complementary approach is linkage disequilibrium (LD) mapping or association mapping.
Linkage disequilibrium mapping is based on two strategies: i) re-sequencing of selected candidate genes and ii) genome-wide association, which exploits marker polymorphisms across all chromosomes [9]. Genome-wide association studies (GWAS) are widely utilised to identify valuable natural variations in trait-associated loci, as well as allelic variations in candidate genes underlying quantitative and complex traits, including those related to 4 growth, development, stress tolerance, and nutritional quality [10]. Owing to its high resolution and cost-effectiveness, which are beneficial for gene discovery and molecular marker identification, GWAS have successfully identified numerous loci for complex traits in various plants, including wheat [11], maize [12], sugarcane [13], and alfalfa [14].
However, analyses of foxtail millet by GWAS are limited.
To better understand the diversity of foxtail millet, facilitate the genetic dissection of its agronomic and quality traits, and accelerate its marker-assisted breeding, high-quality single nucleotide polymorphisms (SNPs) were first detected in the present study using genotyping-by-sequencing of 107 foxtail millet accessions, and then used for GWAS.

Results
Double-digest restriction-associated DNA sequencing and variation detection The double-digest restriction-associated DNA (ddRAD) sequencing carried out to genotype the 107 millet accessions generated 187 Gb of data with reads of 150 bp in length, on average. After quality control, 169 Gb of high-quality sequences were obtained and then mapped to the reference genome sequence of foxtail millet (GCF_000263155.2_v2.0, https://www.ncbi.nlm.nih.gov/assembly/GCF_000263155.2/). The mapping rates of the 107 accessions varied between 78.75% and 100% (Additional file 1: To correct the GWAS model for population structure, we performed three analyses to determine the relationships between the 107 accessions. These were classified into four groups (G1, G2, G3, and G4) according to the results of the principal components analysis (PCA), neighbour-joining (NJ)-tree analysis, and ancestry structure analysis (Fig. 2).
To determine the mapping resolution of the GWAS, the LD of the 107 accessions was analysed (Additional file 1: Fig. S1), and the LD decay rate was estimated as 100 kb (r 2 = 0.2).

Genome-wide association analysis
Based on the 72,181 high-quality SNPs obtained, we performed an association analysis for 16 traits, most of which presenting unimodal distributions ( Fig. 3 and Additional file 1: Table S2). In total, 76 SNPs were significantly associated with these traits after Bonferroni correction ( Fig. 4), and 53 significant SNPs were retained after filtering (Additional file 1: Table S3). These SNPs were distributed across eight chromosomes (no significant SNP was detected in NC_028454.1). Of these SNPs, 44 were located in intergenic regions, whereas one and eight SNPs were located in exon and intron regions, respectively. Up to 28 SNPs were associated with a single trait (for leaf width and time_of_peak_value, respectively).
The 417 genes in the LD decay regions around each significant SNP were considered associated to the 15 traits. These genes included 23 genes encoding long non-coding RNAs and 10 genes encoding transfer RNAs. The number of genes associated with the different traits ranged from 2 to 64.
A favourable genotype is defined as one in which a significant SNP leads to an increase in 6 phenotypic value (Additional file 1: Table S5). Eleven traits were associated with more than one significant SNP. To assess the potential pyramiding effects between favourable genotypes for traits, the mean phenotypic values of the accessions that contained multiple favourable genotypes were analysed. As shown in Fig. 5, the correlation between the number of favourable genotypes and phenotypic value was >0.4 for three traits (average, ear_length, and minium_of_viscosity_index). These findings suggested a certain degree of pyramiding effect between the favourable genotypes and these three traits.

Frequencies of favourable genotypes in accession groups
The frequencies of favourable genotypes at different significant SNPs were calculated for G1, G2, G3, and G4. The favourable genotypes of SNP locus NC_028452.1::13,868,887 (the 13,868,887 th base of chromosome NC_028452.1) were all distributed in G2 ( Fig. 6 and Additional file 1: Table S6). Favourable genotypes involving 17, 10, and 25 SNP loci were obtained from two, three, and four different accession groups, respectively. These results suggested that accessions in the different groups might have had different evolutionary or domestication directions.

Discussion
Genome-wide association analysis is an effective method for identifying trait-associated loci in natural populations. To date, only a few SNP-based association analyses in foxtail millet have been reported, with one SNP-based association analysis performed using lowdepth (0.8×) re-sequencing [15]. In the current study, we sequenced 107 millet accessions through ddRAD sequencing with an average coverage of 3.58×, and identified 53 SNP loci associated with 15 traits in the subsequent GWAS.
The accessions used in this study were collected primarily from China, with two accessions from the United States. Population structure analysis revealed that these accessions could 7 be classified into four groups, although the obtained division did not correlate with the geographic origin of the accessions. Similar findings have been reported for other plant species, such as Brassica rapa L. [16] and Matricaria chamomilla L. [17]. This discrepancy might be explained by the transport routes and exchange of plant materials between regions.
Moderate LD decay is important for association analysis, ranging over several hundred kilobases between plants such as rice, soybean [18], and cotton [19]. The LD decay in the present study was estimated as 100 kb when r 2 decreased to 0.2, which is in agreement with previous findings [20] (Zhang et al., 2014). This LD decay will be useful for identifying unknown genes that are linked to significant SNPs.
We employed a mixed linear model in the present association analysis where the significance threshold was estimated as P = 10 -4.86 after the Bonferroni correction

Quality traits
At maturity, all panicles were harvested from each replication, dried naturally, and dehulled. Quality traits, including grain length, width, and length/width ratio, were determined according to China National Standards (GB/T 17891-1999). The millet was ground in a stainless-steel grinder for 3 min and the resulting powder was used for chemical analysis. Amylose content was determined according to the Chinese National standard method GB7648-87, with minor modifications: 10 mg of millet powder was transferred into a 14-mL capped tube, dispersed in 0.1 mL 95% ethanol, and treated with 0.9 mL 1 M NaOH for 16 h at room temperature. Each sample was thoroughly mixed and 10 μL of the formed supernatant was pipetted into 96-well plates; after adding 190 μL of freshly prepared I 2 -KI solution (3% iodine solution diluted 100 times in 0.01 M HCl before use) to each well, the plates were incubated for 10 min. Potato amylose (Sigma-Aldrich, St. Louis, MO, USA) treated in the same way was used to construct a standard curve.
Amylose content was measured according to the absorbance of each sample at 620 nm (OD620) and it was normalised based on sample weight. Viscosity profiles were measured using a Rapid Visco Analyser (Super 3; Newport Scientific, Warriewood, Australia) following the procedure of the American Association of Cereal Chemists [23]. Data were recorded using the RVA-3D model Thermocline Windows Control 1.2 software (New Port Scientific, Sydney, Australia), and included peak viscosity, hot viscosity, cool viscosity, peak time, gelatinisation temperature, breakdown viscosity, setback viscosity, and consistency viscosity expressed in centipoises (cp).

DNA isolation and sequencing
Genomic DNA was extracted from leaves using the cetyltrimethylammonium bromide method [24]. Double-digest restriction-associated DNA libraries were prepared using 500 ng of DNA per sample, following the protocol described by Peterson et al. [25], with some modifications. Briefly, genomic DNA was digested with the restriction enzymes HindIII and BfaI at 37 °C for 5 h, followed by a ligation step whereby each sample was assigned one of

Sequencing quality check and filtering
Adapter sequences were removed using AdapterRemoval 2 [26]. Reads with Phred scores <20 (average on sliding window), or with incorrect restriction sites, and reads of <50 bp were removed.

Annotation of genetic variants
Variants were annotated using ANNOVAR 2016-02-01 [29] with gene-based annotation to assess whether SNPs or indels caused protein-coding changes and to identify which amino acids were affected.

Genome-wide association study
In total, 107 accessions were used in GWAS for different traits. Association analysis was conducted with the genome-wide efficient mixed-model association (GEMMA) software package [31].For mixed-linear-model analysis, the following equation was used: where y represents phenotypes; a and β are fixed effects representing marker effects and non-marker effects, respectively; and μ represents unknown random effects. X, S, and K are the incidence matrices for a, β, and μ, respectively.
For each significant SNP in the GWAS results, accessions with phenotypic value were classified into three groups according to their genotypes (reference homozygosis, heterozygosis, and altered homozygosis). Pairwise comparisons of phenotypic values between genotype groups were performed using t-tests, and significant SNPs were filtered according to the t-test results (at least one p-value ≤0.05).   Distribution of favourable genotypes in G1, G2, G3, and G4 groups.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.