Phenotypic characterization of grain yield components. To explore components of grain yield in wheat, we measured four phenotypes over two years at two sites: grain length (Gle), grain width (Gwi), 1000-grain weight (Gwe) and grain yield (Gyi). As shown in Table 1, the means (± standard deviation) observed for these traits were 3.2 mm (± 0.08) for grain length, 1.6 mm (± 0.04) for grain width, 25.7 g (± 0.80) for 1000-grain weight and 2.6 t/ha (± 0.11) for grain yield. The broad-sense heritability estimates were 90.6% for grain length, 97.9% for grain width, 61.6% for 1000-grain weight and 56.0% for grain yield. An analysis of variance revealed significant differences among genotypes (G) for all traits, and the genotype-by-environment interaction (GxE) was significant for two traits (Gwe and Gyi). A correlation analysis showed strong, significant positive correlations between grain yield and 1000-grain weight (r = 0.96; p < 0.01) and between grain length and grain width (r = 0.88; p < 0.01). Moderate, significant positive correlations were also identified between grain yield and grain length (r = 0.51; p < 0.01) and between grain yield and grain width (r = 0.54; p < 0.01). Interestingly, a bimodal distribution was observed for grain length and width (Fig. 1). Together, these results suggest that a major gene controls two highly heritable characters related to grain size within this collection.
Table 1
Descriptive statistics, broad sense heritability (h2) and F-value of variance analysis for four agronomic traits in a collection of 159 wheat lines.
| Traits | Unit | Range min | Range max | Mean ± SD | h2 (%) | F-value, Genotype (G) | F-value, Environment (E) | F-value, GxE |
|---|---|---|---|---|---|---|---|---|
| Gle | mm | 1.22 | 8.55 | 3.2 ± 0.08 | 90.6 | 10.7*** | 36.9 | 1.1 |
| Gwi | mm | 0.45 | 3.45 | 1.6 ± 0.04 | 97.9 | 48.6*** | 11.5 | 1.3 |
| Gwe | g | 6.25 | 117.38 | 25.7 ± 0.80 | 61.6 | 30.9*** | 15.7** | 2.6* |
| Gyi | t/ha | 0.42 | 7.83 | 2.6 ± 0.11 | 56.0 | 66.3*** | 174.9*** | 2.2* |
SD: standard deviation, CV: coefficient of variation, h2: broad sense heritability, Gle: grain length, Gwi: grain width, Gwe: 1000-grain weight and Gyi: grain yield.
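For orientation, a widely used entry-mean estimator of broad-sense heritability in multi-environment trials is sketched below. It is an assumption here, since this section does not specify which estimator produced the h2 values in Table 1. The symbols are illustrative: \sigma^2_G is the genotypic variance, \sigma^2_{G \times E} the genotype-by-environment variance, \sigma^2_{\varepsilon} the residual variance, e the number of environments and r the number of replicates per environment.

```latex
H^2 = \frac{\sigma^2_G}
           {\sigma^2_G + \dfrac{\sigma^2_{G \times E}}{e} + \dfrac{\sigma^2_{\varepsilon}}{e\,r}}
```

Under this form, a trait with a sizeable GxE component relative to \sigma^2_G would be expected to show a lower heritability than a trait with little GxE.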
Genome-wide SNP marker discovery and validation. To genetically characterize our wheat collection and study the genetic determinants of grain size, we genotyped the collection using a double-digestion (PstI/MspI) GBS approach. Overall, 77,124 and 73,784 SNPs were discovered for the set of 71 Canadian wheat accessions and the set of 159 exotic wheat accessions, respectively.
To assess the reproducibility and accuracy of genotypes called via the GBS approach, we genotyped 12 different plants of Chinese Spring (CS), i.e. biological replicates, which were added to the set of 288 wheat samples for SNP calling and bioinformatics analysis. Sequence reads from the full set of 300 wheat samples were analyzed following the standard steps of SNP calling and bioinformatics analysis described below. This yielded a total of 129,940 loci that were used to assess the accuracy and reproducibility of SNP calls. For each individual CS plant, the GBS calls were compared between replicates and with the Chinese Spring reference genome (at the corresponding positions).
On the non-imputed data, we detected a very high level of concordance (99.9%) between the genotypes of each CS individual and the reference alleles across the 1,196,184 called genotypes (129,940 SNPs x 12 samples, minus missing data; Supplementary Fig. S1). Genotype calls were also highly reproducible among the 12 biological replicates of CS: pairwise genetic distances ranged from 1.56E-04 to 5.08E-04, with an average of 2.86E-04. To confirm the identity of each CS plant, we used the individual w56_Guelph (a Canadian wheat variety) as a contrast; the genetic distance between this individual and each of the CS plants was greater than 0.1.
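The genotype count quoted above follows from simple arithmetic, which can be checked with a short sketch. The variable names are illustrative only; the figures are those reported in the text.

```python
# Check the call-rate arithmetic for the 12 Chinese Spring (CS) replicates.
# All numbers come from the text; variable names are illustrative only.
n_loci = 129_940          # SNP loci used for the accuracy assessment
n_replicates = 12         # CS biological replicates
n_called = 1_196_184      # genotypes actually called (non-imputed data)

n_possible = n_loci * n_replicates      # every locus in every replicate
n_missing = n_possible - n_called       # genotypes left uncalled
call_rate = n_called / n_possible       # fraction of genotypes called

print(f"possible genotypes: {n_possible:,}")
print(f"missing genotypes:  {n_missing:,}")
print(f"call rate:          {call_rate:.1%}")
```

The resulting call rate rounds to 76.7%, which agrees with the proportion of initially called genotypes reported for the imputed data set.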
After imputation of the missing genotype calls, we observed a mean concordance of 93.8% between the CS individuals and the CS reference genome. Overall, 76.7% of genotypes were called directly and 23.3% were imputed, and the accuracy of the imputed genotypes was 73.4%. More details on the SNP data set are provided in Supplementary Table S1.
As a further examination of data quality, we compared the genotypes called using both GBS and a SNP array on a subset of 71 Canadian wheat accessions that had previously been genotyped using the 90K SNP array. A total of 77,124 GBS-derived and 51,649 array-derived SNPs were discovered in these 71 accessions (Supplementary Table S2). Of these, only 135 SNP loci were common to both platforms. Among the 9,585 potential datapoints (135 loci x 71 lines), only 8,647 genotypes could be compared because the remaining 938 genotypes were missing in the array-derived data. As shown in Fig. 2, a high level of concordance (95.1%) was seen between the genotypes called by the two approaches. To better understand the origin of the discordant genotypes (4.9%), we inspected the set of 429 discordant SNP calls and observed two classes: 1) homozygous calls for opposite alleles by the two technologies (3.5%); and 2) calls genotyped as heterozygous by GBS but scored as homozygous using the 90K SNP array (1.4%). More details are provided in Supplementary Table S3. From these comparisons, we conclude that GBS is a highly reproducible and accurate approach for genotyping in wheat and can yield a greater number of informative markers than the 90K array.
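The platform comparison described above can be sketched as follows. The genotype encoding (two-letter strings, with None for a missing call) and the function name are assumptions made for illustration, and the toy calls are hypothetical; only the counts in the last few lines come from the text, taking the 71 accessions of the subset as the number of lines.

```python
# Concordance between two genotyping platforms at shared loci.
# Genotype codes are an assumption for illustration: strings such as
# "AA", "AG", "GG", with None marking a missing call.

def concordance(calls_a, calls_b):
    """Return (n_compared, n_identical, rate), skipping missing calls."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    n_identical = sum(a == b for a, b in pairs)
    rate = n_identical / len(pairs) if pairs else float("nan")
    return len(pairs), n_identical, rate

# Toy example (hypothetical calls, not data from the study).
gbs = ["AA", "GG", "AG", "AA", None]
array = ["AA", "GG", "GG", None, "AA"]
print(concordance(gbs, array))   # 3 comparable calls, 2 of them identical

# Bookkeeping with the counts reported in the text.
n_loci, n_lines = 135, 71
n_potential = n_loci * n_lines    # potential data points shared by both platforms
n_compared = n_potential - 938    # after removing genotypes missing in the array data
print(n_potential, n_compared)
```

Only genotypes present on both platforms enter the denominator, which is why the comparison rests on 8,647 rather than 9,585 datapoints.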
Genome coverage and population structure. For the full set of accessions, a total of 129,940 SNPs were distributed over the entire hexaploid wheat genome. Most SNPs were located in the B (61,844) and A (50,106) sub-genomes, whereas the D sub-genome carried only 17,990 SNPs (Table 2). Although the number of SNPs varied 2- to 3-fold from one chromosome to another within a sub-genome, a similar proportion of SNPs was observed for the same chromosome across sub-genomes. Overall, around half of the markers were contributed by the B sub-genome (47.59%), 38.56% by the A sub-genome and only 13.84% by the D sub-genome.
Table 2
Distribution of SNP markers across the A, B and D genomes
| Chromosomes | Wheat genome A (*) | Wheat genome B (*) | Wheat genome D (*) | Total |
|---|---|---|---|---|
| 1 | 6099 (0.36) | 8115 (0.48) | 2607 (0.15) | 16821 (0.13) |
| 2 | 8111 (0.35) | 11167 (0.48) | 3820 (0.17) | 23098 (0.18) |
| 3 | 6683 (0.33) | 10555 (0.53) | 2759 (0.14) | 19997 (0.15) |
| 4 | 6741 (0.58) | 4007 (0.34) | 913 (0.08) | 11661 (0.09) |
| 5 | 6048 (0.38) | 8015 (0.51) | 1719 (0.11) | 15782 (0.12) |
| 6 | 5995 (0.33) | 10040 (0.55) | 2191 (0.12) | 18226 (0.14) |
| 7 | 10429 (0.43) | 9945 (0.41) | 3981 (0.16) | 24355 (0.19) |
| Total | 50106 | 61844 | 17990 | 129940 |
* Proportion of markers on a homoeologous group of chromosomes that were contributed by a single sub-genome.
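The sub-genome shares quoted in the text can be recomputed from the counts in Table 2. The sketch below is illustrative only; the dictionary layout and variable names are assumptions, while the counts are copied from the table.

```python
# SNP counts per homoeologous group and sub-genome, copied from Table 2.
counts = {
    1: {"A": 6099, "B": 8115, "D": 2607},
    2: {"A": 8111, "B": 11167, "D": 3820},
    3: {"A": 6683, "B": 10555, "D": 2759},
    4: {"A": 6741, "B": 4007, "D": 913},
    5: {"A": 6048, "B": 8015, "D": 1719},
    6: {"A": 5995, "B": 10040, "D": 2191},
    7: {"A": 10429, "B": 9945, "D": 3981},
}

# Column totals and each sub-genome's share of all SNPs.
subgenome_totals = {sg: sum(row[sg] for row in counts.values()) for sg in "ABD"}
grand_total = sum(subgenome_totals.values())
for sg, n in subgenome_totals.items():
    print(sg, n, f"{n / grand_total:.2%}")

# Share of each sub-genome within a homoeologous group
# (the values shown in parentheses in the A, B and D columns).
for group, row in counts.items():
    group_total = sum(row.values())
    shares = {sg: round(n / group_total, 2) for sg, n in row.items()}
    print(group, group_total, shares)
```

Group 4 stands out in this breakdown: it is the only homoeologous group in which the A sub-genome contributes more than half of the markers.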
The analysis of population structure for the 159 accessions of the association panel showed that K = 6 best captured population structure within this set of accessions and these clusters largely reflected the country of origin (Fig. 3). The number of wheat accessions in each of the six subpopulations ranged from 6 to 44. The largest number of accessions was found in northwestern Baja California (Mexico) represented here by Mexico 1 (44) and the smallest was observed in East and Central Africa (6).
GWAS analysis for marker-trait associations for grain size. To identify genomic loci contributing to grain size in wheat, we performed a GWAS analysis (159 accessions, 73,784 SNPs, grain length and width) using a compressed mixed linear model (CMLM) approach.
As seen in Fig. 3, both Q–Q plots suggest that the confounding effects of population structure and relatedness were well controlled. For both traits, the strongest marker-trait associations were detected at the end of chromosome 2D, while a weaker association shared by both traits was detected at the beginning of chromosome 1D. For grain width only, a marker-trait association was detected on chromosome 4A. In total, seven SNPs were found to be associated with one or both traits, with one, five and one significant SNPs located on chromosomes 1D, 2D and 4A, respectively. All SNPs except two (chr2D:442798939 and chr4A:713365388) were significant for both grain length and grain width. The SNP at 4A:713365388 was significant only for grain width, while the SNP at 2D:442798939 was significant only for grain length.
The most significant association was observed on chromosome 2D and contributed to both grain length and grain width (Table 3; Fig. 3). For this QTL, a total of four SNPs were associated with both traits, and the SNP most significantly associated with both traits was located at position 2D:452812899. A fifth SNP, located at 2D:442798939, was significantly associated with grain length only; for grain width it narrowly missed the significance threshold (p-value = 5.09E-05).
Table 3
Details of loci associated with grain size traits identified via a genome-wide association study in a collection of 159 hexaploid wheat lines.
| Loci | Chr | Grain traits | P-value | MAF | R2 | Allelic effect | Alleles (Maj/Min) |
|---|---|---|---|---|---|---|---|
| chr1D:166874041 | 1D | Length | 3.34E-06 | 0.29 | 0.06 | 0.25 | T/C |
| chr1D:166874041 | 1D | Width | 3.29E-05 | 0.29 | 0.06 | 0.11 | T/C |
| chr2D:403935865 | 2D | Length | 1.31E-06 | 0.29 | 0.12 | 0.27 | T/C |
| chr2D:403935865 | 2D | Width | 1.29E-05 | 0.29 | 0.07 | 0.12 | T/C |
| chr2D:442798939 | 2D | Length | 3.25E-06 | 0.28 | 0.11 | -0.26 | A/G |
| chr2D:444560418 | 2D | Length | 2.08E-06 | 0.28 | 0.11 | -0.27 | A/G |
| chr2D:444560418 | 2D | Width | 3.44E-05 | 0.28 | 0.06 | -0.12 | A/G |
| chr2D:452644656 | 2D | Length | 2.08E-06 | 0.28 | 0.11 | -0.27 | A/G |
| chr2D:452644656 | 2D | Width | 3.44E-05 | 0.28 | 0.06 | -0.12 | A/G |
| chr2D:452812899 | 2D | Length | 6.42E-07 | 0.30 | 0.13 | -0.27 | A/G |
| chr2D:452812899 | 2D | Width | 7.03E-06 | 0.30 | 0.07 | -0.12 | A/G |
| chr4A:713365388 | 4A | Width | 1.35E-05 | 0.14 | 0.07 | 0.13 | A/G |
Chr: Chromosome, MAF: Minor Allele Frequency, R2: R square of model with SNP, calculated by R2 of model with SNP minus R2 of model without SNP (ref. 48). Maj: Major allele; Min: Minor allele.
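The R2 definition in the footnote of Table 3 (R2 of the model with the SNP minus R2 of the model without it) can be illustrated with a minimal sketch. This is a simplification under stated assumptions: it uses ordinary least squares with fixed-effect covariates rather than the mixed model used for the GWAS, the function names are hypothetical, and the data are simulated rather than taken from the study.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an ordinary least-squares fit of y on X (X includes the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def marginal_r2(covariates, snp, y):
    """R2 of the model with the SNP minus R2 of the model without it."""
    n = len(y)
    base = np.column_stack([np.ones(n), covariates])   # intercept + covariates
    full = np.column_stack([base, snp])                # same model plus the SNP
    return r_squared(full, y) - r_squared(base, y)

# Simulated illustration (hypothetical data, not from the study).
rng = np.random.default_rng(0)
n = 159
covariates = rng.normal(size=(n, 2))                   # stand-in for structure covariates
snp = rng.choice([0.0, 2.0], size=n, p=[0.7, 0.3])     # homozygous calls coded 0 or 2
y = 3.0 + 0.2 * covariates[:, 0] - 0.15 * snp + rng.normal(scale=0.3, size=n)

print(round(marginal_r2(covariates, snp, y), 3))
```

Because the two models are nested, the difference cannot be negative in a least-squares fit, so it can be read as the share of phenotypic variance attributable to the SNP after accounting for the covariates.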
A high degree of LD was detected among those of the seven associated SNPs that are located on chromosome 2D. These formed one discontinuous linkage block, as the LD between the markers belonging to this block was high (mean r2 = 0.90). For this reason, we considered these SNPs to define a single quantitative trait locus (QTL) on chromosome 2D (Supplementary Fig. S2). This QTL included five SNP markers (chr2D:403935865, chr2D:442798939, chr2D:444560418, chr2D:452644656 and chr2D:452812899), and the peak SNP (chr2D:452812899) explained 13% and 7% of the phenotypic variation for grain length and width, respectively. The minor allele frequency (MAF) at the peak SNP was 0.30, and its allelic effect was -0.27 mm for grain length and -0.12 mm for grain width (Table 3).
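As a reminder of what the r2 values measure, the sketch below computes LD between two biallelic SNPs as the squared Pearson correlation of allele dosages. This is one common way to compute r2 and is an assumption here, not necessarily the procedure used in the study; the function name and the toy genotypes are hypothetical.

```python
# r2 between two biallelic SNPs, computed as the squared Pearson
# correlation of allele dosages (0, 1 or 2 copies of the minor allele).
# None marks a missing call; such accessions are skipped pairwise.

def ld_r2(dosage_a, dosage_b):
    pairs = [(a, b) for a, b in zip(dosage_a, dosage_b)
             if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")      # a monomorphic marker has no defined r2
    return cov * cov / (var_a * var_b)

# Toy genotypes for eight inbred accessions (hypothetical values).
snp1 = [0, 0, 2, 2, 0, 2, 0, 0]
snp2 = [0, 0, 2, 2, 0, 2, 0, 2]   # differs from snp1 in the last accession

print(ld_r2(snp1, snp1))   # identical markers give an r2 of 1
print(ld_r2(snp1, snp2))   # one mismatching accession lowers r2
```

Values close to 1 mean that knowing the allele at one SNP almost fully determines the allele at the other, which is why tightly linked SNPs are treated as tagging a single QTL.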
On chromosome 1D, the SNP marker chr1D:166874041 defined a QTL for both grain length and width. This marker explained 6% of the phenotypic variation for each of grain length and width, with a MAF of 0.29 and allelic effects of 0.25 and 0.11 mm for grain length and width, respectively. Furthermore, a high degree of interchromosomal LD (r2 = 0.94) was observed between the peak SNPs on chromosomes 1D and 2D that displayed association with grain traits. In addition, almost all accessions carrying the major allele on chromosome 1D also carry the major allele on chromosome 2D. Thus, the combined impact of these two loci could explain the observed bimodal distribution.
On chromosome 4A, the SNP marker chr4A:713365388 defined a QTL for grain width only; it explained 7% of the variation, had a MAF of 0.14 and exerted an allelic effect of 0.13 mm. In contrast to the 1D and 2D loci, we observed only very weak LD between this peak SNP and the peak SNPs on chromosomes 1D and 2D.
In summary, a total of three QTLs significantly associated with grain length and/or width were identified on chromosomes 1D, 2D and 4A.
Candidate gene detection for grain size. To identify candidate genes contributing to grain size within the studied wheat collection, we investigated the genes residing in the same linkage block as the peak SNP for each QTL. On chromosome 2D, the QTL with the largest number of associated SNPs (chr2D:403935865 to chr2D:452811303) included a total of 315 high-confidence genes. On chromosomes 1D and 4A, the linkage blocks around the SNP markers chr1D:166874041 and chr4A:713365388, each of which defines a QTL, did not include any high-confidence genes. Upon examination of the annotations for these genes, the most promising candidate appears to be the TraesCS2D01G331100 gene in the QTL on chromosome 2D, an ortholog of the rice CYP724B1 gene, commonly known as the D11 gene. The D11 gene was previously reported to be involved in the regulation of internode elongation and seed development through its role in the synthesis of brassinosteroids, key regulators of plant growth that promote the expansion and elongation of cells. More details are provided in Supplementary Table S4.
Haplotypes at the wheat orthologue of the rice D11 gene and their phenotypic effects. To provide a useful breeding tool for the main QTL identified in this research, we defined SNP haplotypes around our candidate gene. Using HaplotypeMiner, we identified two SNPs (chr2D:423365752 and chr2D:425474599, Supplementary Fig. S3) that best captured the SNP landscape in the vicinity of the candidate gene. These markers reside in the same haplotype block as the trait-associated SNP markers, but were not individually found to be significantly associated with grain width and length. These SNP markers define three haplotypes (AT, CT or CC) around the candidate gene, with 100, 18 and 41 individuals carrying these haplotypes, respectively. To investigate the phenotypes associated with these haplotypes, we analyzed the trait values for each haplotype. Interestingly, we observed that for all traits, the mean values of accessions with haplotype AT were significantly larger (p < 0.001) than those obtained for the other haplotypes. As shown in Fig. 4, accessions carrying haplotype AT showed mean values of 3.76 mm for grain length, 2.03 mm for grain width, 41.64 g for grain weight and 2.6 t/ha for grain yield, compared with 2.43 mm, 1.12 mm, 26.57 g and 1.73 t/ha (for grain length, width, weight and yield, respectively) for accessions carrying haplotype CC, and 1.65 mm, 0.78 mm, 26.89 g and 1.69 t/ha (for grain length, width, weight and yield, respectively) for accessions carrying haplotype CT. Furthermore, the relationship between the three haplotypes and the six groups found in the population structure analysis showed that haplotype AT predominates in the Mexico 1 and North Africa populations (Supplementary Fig. S4, Supplementary Table S5). To conclude, we suggest that SNP markers corresponding to haplotype AT will provide a useful tool in marker-assisted breeding programs to improve wheat productivity.
The relationship between yield and the haplotypes around the D11 gene could therefore allow the selection of high-yielding wheat lines in a breeding program.
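The haplotype comparison described above amounts to grouping accessions by their alleles at the two tag SNPs and summarizing a trait within each group. A minimal sketch follows; the function name, the data layout and the toy values are hypothetical and are not taken from the study, and no significance test is included.

```python
from collections import defaultdict
from statistics import mean

def haplotype_means(genotypes, trait):
    """Group accessions by their two-SNP haplotype and average a trait.

    genotypes: one (allele_at_snp1, allele_at_snp2) tuple per accession
    trait:     one trait value per accession, in the same order
    Returns {haplotype: (n_accessions, mean_trait)}.
    """
    groups = defaultdict(list)
    for (allele1, allele2), value in zip(genotypes, trait):
        groups[allele1 + allele2].append(value)
    return {hap: (len(vals), mean(vals)) for hap, vals in groups.items()}

# Toy data for five accessions (hypothetical values).
genotypes = [("A", "T"), ("A", "T"), ("C", "C"), ("C", "T"), ("C", "C")]
grain_length_mm = [3.8, 3.7, 2.4, 1.7, 2.5]

for hap, (n, avg) in sorted(haplotype_means(genotypes, grain_length_mm).items()):
    print(hap, n, round(avg, 2))

# The haplotype counts reported in the text cover the whole panel.
n_at, n_ct, n_cc = 100, 18, 41
print(n_at + n_ct + n_cc)   # 159 accessions
```

In a breeding context, the same grouping could be applied to candidate lines genotyped at the two tag SNPs in order to flag carriers of haplotype AT.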