Genome-Wide association study identifies new elements on the genetic basis of quality-related traits in wheat across multiple environments


 Backgrounds: Grain protein concentration (GPC), grain starch concentration (GSC), and wet gluten concentration (WGC) are complex traits that determine nutrient concentration, end-use quality, and yield in wheat. To identify the elite and stable loci or genomic regions conferring high GPC, GSC, and WGC, a genome-wide association study (GWAS) based on a mixed linear model (MLM) was performed using 55K single nucleotide polymorphism (SNP) array in a panel of 236 wheat accessions, including 160 commercial varieties and 76 landraces, derived from Sichuan Province, China. The panel was evaluated for GPC, GSC, and WGC at four different fields. Results: Phenotypic analysis showed variation in GPC, GSC, and WGC among the different genotypes and environments. GWAS identified 12 quantitative trait loci (QTL) (-log10(P) > 2.5) associated with these three quality traits in at least two environments and located on chromosomes 1B, 1D, 2A, 2B, 2D, 3B, 3D, 5D, and 7D; the phenotypic variation explained (PVE) by these QTL ranged from 4.2% to 10.7%. Among these, three, seven, and two QTL are associated with GPC, GSC, and WGC, respectively; five QTL (QGsc.sicau-1BL, QGsc.sicau-1DS, QGsc.sicau-2DL.1, QGsc.sicau-2DL.2, QWgc.sicau-5DL) were defined potentially novel Compared with the previously reported QTLs/genes by linkage or association mapping, 5 QTLs (QGsc.sicau-1BL, QGsc.sicau-1DS, QGsc.sicau-2DL.1, QGsc.sicau-2DL.2, QWgc.sicau-5DL) were potentially novel. Furthermore, 21 presumptive candidate genes, which are involved in the metabolism or transportation of all kinds of carbohydrates, photosynthesis, programmed cell death, the balance of abscisic acid and ethylene, within these potentially novel genomic regions were predicted. Conclusions: This study provided new genetic resources and valuable genetic information of nutritional quality to broaden the genetic background and laid the molecular foundation for marker-assisted selection in wheat quality breeding.

4D had the lowest marker density (1.6 markers per Mb). The analysis of PIC showed that chromosome 5B had the highest PIC value (0.330), and 4A had the lowest (0.247) ( Table 4).

Population structure and linkage disequilibrium (LD) decay
The population structure (Q-matrix) analysis using 44,326 SNP markers based on the Bayesian model showed the optimal ΔK as 2. The 236 accessions were classi ed into two subpopulations: subpopulation 1 (SP1) with 74 landraces and 1 cultivar (Xifu14) and subpopulation 2 (SP2) with 159 cultivars and 2 landraces (Kaixianluohanmai and Yupi) based on the three traits.
Furthermore, we constructed the NJ phylogenetic tree based on shared allele distance. Obvious genetic difference was observed when the tree interval was 0.25. The accessions were divided into two clusters, cluster 1 and cluster 2. A total of 74 landraces and 2 cultivars (Xifu14 and Yumai1) belonged to cluster 1, and 158 cultivars, and 2 landraces (Kaixianluohanmai and Yupi) belonged to cluster 2 (Fig. 2).
LD across all chromosomes was estimated using r 2 between the signi cant pairs of intra-chromosomal SNP markers with physical distance. A total of 1,426,789 signi cant (pDiseq < 0.001 and r 2 > 0.1) pairwise SNP markers were identi ed. The average LD decay distance for the whole genome was around 1.922 Mb based on the best tting curve (Fig. 3).
Genome-wide association study (GWAS) GWAS was used to analyze the 44,326 SNP markers and the association with the quality traits in all test environments on the MLM model using Q and K as covariates. The loci with high con dences of association in the four environments were displayed as Manhattan plots with P values across the 21 wheat chromosomes (Fig 4). A total of 15 SNP markers were signi cantly associated with GPC, GSC, and WGC. Based on the LD decay distance (1.922 Mb), 12 QTL were identi ed located on chromosomes 1B, 1D, 2A, 2B, 2D, 3B, 3D, 5D, and 7D. The phenotypic variation explained (PVE) by the markers ranged from 4.2% to 10.75%. Of them, three QTL were signi cantly associated with GPC, seven with GSC, and two with WGC. Totally, ve QTL were de ned potentially novel through the physical mapping of the reported QTL/genes based on reference RefSeq v1.0 (Table 5).

Candidate gene prediction
A total of 243 genes (high con dence) were selected from the region of the ve potentially novel QTL. Of them, 21 candidate genes were predicted to be involved in carbohydrate metabolism or transportation, photosynthesis, programmed cell death, and abscisic acid-ethylene balance that affected quality traits (Supplementary Table 3).

Discussion
Impact of environment on GPC, GSC, and WGC We analyzed GPC, GSC, and WGC of 236 wheat accessions including 76 landraces and 160 cultivars under four different eld trials in Sichuan Province. We found the quality traits were heavily in uenced by environments. The relatively smaller correlation coe cients and lower broad-sense variability (h 2 ) among the different elds revealed the signi cant in uence of environment on the quality traits (Supplementary Table 2; Table 2). Furthermore, ANOVA intuitively proved the signi cant differences in GPC, GSC, and WGC among the environments. These ndings are consistent with many previous reports in wheat at different environment [4,13,55,56]. Meanwhile, H 2 of GSC was more than that of GPC and WGC. H 2 is an important genetic parameter used for phenotypic prediction that indicates all genetic contributions to phenotypic variance. High H 2 combined with relatively higher correlation coe cients among the four environments indicated a major impact of genotypes on GSC, and the possibility to improve traits via selective breeding. Earlier, Denčić [57] showed low h 2 of GPC and WGC in wheat. All these ndings together indicate di culty to improve GPC and WGC by traditional breeding; such traits should be selected at a later generation. So, traditional breeding combining with the maker-assisted selection is a good choice to improve the breeding e ciency and shorten the breeding period.
Impact of stripe rust on GPC, GSC, and WGC Correlation coe cient analysis in this study showed that IT was negatively correlated with GPC and WGC but positively correlated with GSC (Table 1). The impact of environmental factors (including stripe rust) on GPC and WGC was more than that on GSC. And the relative higher h2 (0.841) indicated the GSC was stable and not easily in uenced by the environment (Table 2). We considered that the decrease in GPC and WGC was more than that in GSC. Meanwhile, the present study and several previous reports [11,14,15] demonstrated a positive correlation between GPC and WGC but a negative correlation between GSC and GPC and between GSC and WGC. So, the analysis of the three quality traits and IT under certain conditions revealed a relative positive correlation between IT and GSC. If we compare the quality traits under control and treatment settings, there must be a negative correlation relationship between IT and the three quality traits.

Differences in quality traits between landraces and cultivars
In the present study, obvious differences were found between landraces and cultivars. Bayesian classi cation based on Qmatrix and neighbor-joining (NJ) phylogeny based on shared allele distance clearly grouped the accessions into landraces and cultivars. The phenotypic diversity analysis showed better performance of landraces with respect to GPC and WGC and that of cultivars with respect to WGC (Table 3). In addition, the genetic differences between landraces and cultivars were identi ed using SNP markers indicate that the use of landraces in breeding can broaden the genetic background and improve the phenotypic diversity.

QTL associated with the quality traits
In this study, 236 accessions were genotyped with 55K SNP array, and a total of 44,326 effective SNP markers were analyzed in this study. GWAS for the quality traits on these 236 accessions identi ed 12 QTL associated with GPC, GSC, and WGC. These Three QTL, QGpc.sicau-1BS, QGpc.sicau-2AS, and QGpc.sicau-2BS, were associated with GPC. QGpc.sicau-1BS was at around 51.99 Mb of the short arm on chromosome 1B with PVE that ranged from 5.7% to 6.2%, it was covered by QGPC.ndsu.1B [58]. QGpc.sicau-2AS was located at the distal region of the short arm of chromosome 2A. This QTL was similar to MQTL2A2 linked to wPt-4197 and wPt-5245 in wheat [59]. QTKW.sicau-2AS.1 [60] associated with thousand-kernel weight was located on the same block as QGpc.sicau-2AS. QGpc.sicau-2BS, mapped on the short arm of chromosome 2B at about 25.42 Mb, was also associated with GPC. This QTL was located in the same region as QGpc.crc-2B anked by the SSR markers Xgwm210 and Xwmc25 [30]. QSlC.sicau-2BS [60] associated with spikelet compactness was located in the same region as QGpc.sicau-2BS. Thousand-kernel weight and spikelet compactness are important factors that determine grain yield in wheat. Moreover, several reports have demonstrated a close association between GPC and grain yield [4]. Thus, in the present study, QGpc.sicau-2BS identi ed was found associated with both GPC and thousand-kernel weight/spikelet compactness.
Furthermore, seven QTL associated with GSC were identi ed in this study. QGsc.sicau-3BS was associated with marker AX-110012661, located on the short arm of chromosome 3B, which was covered by QTsc-3B.12 linked to SSR marker Xwmc612 and Xbarc068 [36]. QGsc.sicau-3DS was mapped on the short arm of chromosome 3D at around 17.18 Mb. It was very close to the reported Qftsc3D, which was anked by wPt-2313 and wPt-6965 [38]. QGsc.sicau-5DS, close to the reported QTL linked to the SSR marker Xbarc130 [13], around 3.61 Mb explained a phenotypic variation of 10.75%. The other four QTL located at a greater distance from previously identi ed GSC genes or QTL regions may be novel loci.
Another two QTL associated with WGC were identi ed. QWgc.sicau-5DL anked by markers AX-111139947 and AX-108791420 was located between 555.91 Mb and 556.72 Mb on the long arm of chromosome 5D. This QTL was different from previously reported genes/QTL associated with WGC. Therefore, we consider QWgc.sicau-5DL as a potentially novel QTL. PVE of another QTL, QWgc.sicau-7DL, located on the long arm of the chromosome 7D around 619.11 Mb, ranged from 4.19% to 5.05%; it was close to QGlu7D linked to the SSR markers Xwmc634 and Xwmc273.2 [36].
In this study, mapping was done on the Chinese Spring reference RefSeq v1.0 (IWGSC). Positioning of the 12 QTL revealed that four of them were covered by already reported QTL, three were close to reported QTL, and ve were identi ed as potentially novel QTL. So far, few QTL have been associated with GSC and therefore, novel ones associated with GSC have been identi ed in this study.
Candidate genes for potentially novel QTL Collinearity analysis of the candidate genes of the potentially novel QTL predicted 21 putative candidate genes, which may be associated with the quality traits. Of them, 15 candidate genes were identi ed in the three QTL (QGsc.sicau-1BL, QGsc.sicau-1DS, QGsc.sicau-2DL.2) associated with GSC. Six candidate genes, in QWgc.sicau-5DL, were predicted to be associated with WGC.
Four candidate genes were speculated to exist in QGsc.sicau-1BL. One putative gene, TraesCS1B02G338600, aligned with Arabidopsis JAL3 (Jacalin-related lectin 3), a carbohydrate binding protein that may play a role in determining GSC. concentrationTraesCS1B02G339200 is orthologous to Arabidopsis TRAF1A (TNF receptor-associated factor homolog 1a), which associates with autophagosomes and takes part in the regulation of autophagy dynamics [61]. Young and Gallie [62] have reported programmed cell death (PCD) in the starchy endosperm cells of wheat and maize during the nal stages of seed development. Based on this report, we presume that TraesCS1B02G339200 may in uenceconcentration GSC via autophagy (type PCD) in the starchy endosperm cells of wheat. TraesCS1B02G339500 and TraesCS1B02G339700 were found homologous to PSBW (Photosystem II reaction center W protein), which is involved in photosynthesis and photosystem II stabilization [63] and in uence GSC in wheat.
Another eight candidate genes were predicted for QGsc.sicau-1DS associated with starch concentration. TraesCS1D02G022300 has sequences homologous to rice glycosyltransferase BC10. Glycosyltransferase catalyzes the transfer of sugars during starch biosynthesis [64.65]. TraesCS1D02G023300 aligned with the Arabidopsis gene PP2A1 (Phloem protein 2-like A1), which plays a role in carbohydrate binding similar to JAL3 [66]. TraesCS1D02G024100 is homologous to the gene EXGA (Probable glucan 1,3-beta-glucosidase A) of Emericella nidulans. Gene ontology annotation indicated its role in polysaccharide catabolism. TraesCS1D02G024200, orthologous to the beta-galactosidase 1 Os01g0533400 in rice, might be involved in carbohydrate metabolism and in turn in uence GSC. TraesCS1D02G026200 has sequences homologous to the rice gene WNK2, a probable cytoplasmic serine/threonine kinase involved in protein phosphorylation [67]. Grimaud [68] reported the signi cance of protein phosphorylation in starch synthesis. Meanwhile, TraesCS1D02G026500 and TraesCS1D02G032500 aligned with Arabidopsis cysteine protease XCP1 related to PCD [69]. Similar to TraesCS1B02G339200, TraesCS1D02G026500 and TraesCS1D02G032500 may also in uence GSC of wheat via PCD. TraesCS1D02G031700 is orthologous to the Arabidopsis gene PGRL1A (PGR5-like protein 1A), which is involved in photosynthesis and photosynthetic electron transport in photosystem I [ 70 71].
We identi ed three candidate genes in QGsc.sicau-2DL.2. The orthologous gene of TraesCS2D02G277300 is Arabidopsis EMB1674 involved in abscisic acid (ABA)-activated signaling pathway. The balance between ABA and ethylene is crucial in PCD regulation in the starchy endosperm [72]. TraesCS2D02G278100 showed homology with CHIT5B (Class V chitinase) of Medicago truncatula. Functional analysis revealed its role in the polysaccharide catabolism similar to EXGA. TraesCS2D02G278200 aligned with the Arabidopsis gene TIF3H1 (Translation initiation factor 3 subunit H1), which responds to ABA, glucose, and sucrose levels [72]. ABA can regulate PCD and also assist in the conversion of glucose and sucrose into starch.
Six candidate genes were identi ed in QWgc.sicau-5DL. TraesCS5D02G546400 is orthologous to the Arabidopsis gene ERECTA (LRR receptor-like serine/threonine kinase) that regulates plant organ morphogenesis [73]. We speculate that grain morphogenesis in wheat is related to starch concentration in the grain. TraesCS5D02G549900 is homologous to the gene STP7 (Sugar transport protein 7) in Arabidopsis. It's involved in cell wall sugar recycling [74]. TraesCS5D02G550500 aligned with the Arabidopsis gene ASIL2, which is the trihelix transcription factor that represses seed maturation program during early embryogenesis [75]. Seed maturation program involves starch and protein accumulation. Therefore, repression of the seed maturation program affects starch accumulation. The homologous gene of TraesCS5D02G551800 is Arabidopsis RCD1 (Radical-induced cell death 1), which is also involved in PCD [76]. TraesCS5D02G551900 and TraesCS5D02G552000 are orthologous to rice CIN4 (Beta-fructofuranosidase, insoluble isoenzyme 4) and 1-FEHw3 (Fructan 1-exohydrolase w3), respectively. These genes similar to Os01g0533400 take part in carbohydrate metabolism and might in uence GSC of wheat.

Conclusions
We have shown that genome-wide association mapping effectively detected both stable and environment-speci c QTL for grain protein concentration, grain starch concentration, and wet gluten concentration. Multi-trait chromosome regions have been detected and particularly the region on chromosome 2DL associated with grain protein concentration may be useful in MAS following proper validation. In the context of nutritional quality, 5 QTL regions were potentially novel control grain starch concentration or wet gluten concentration implying the possibility of using vegetation indices for indirect assessment for certain nutritional quality traits. Although some objective limitations for position comparison existed, we tried to improve the analysis threshold and compared with multiple linked markers to obtain maximum information. Furthermore, we will verify the potentially novel QTL using allelism test and ne mapping.

Plant materials
A total of 236 Sichuan wheat accessions including 76 landraces and 160 cultivars were used in the study (Additional le Table   1). All these accessions were homozygous lines were provided by Triticeae Research Institute, Sichuan Agricultural University Measurement and analysis of quality traits GPC and GSC in the whole-grain was determined by Near Infrared Re ectance Spectroscopy (NIR, Foss, Sweden) (AACC method . The gluten extraction was carried out adopting the procedure as described in AACC (2005). All these traits were expressed on a grain dry weight basis. The three quality traits were adjusted to 14% moisture concentration for further analysis [77]. Stripe rust infection type (IT) was evaluated at the adult stage following the method by Ye et al. [78].
Best linear unbiased prediction (BLUP) values for each accession across different locations were calculated by tting the linear model in R package 'lme4' to eliminate the environmental impact on the quality traits. The genotypic and environmental variances (VG and VE) were also computed using 'lme4' package [79]. Broad-sense heritability (H 2 ) for each of the quality traits was calculated across all test environments using the following formula: H2 = VG/(VG+VE) [80]. The phenotypic diversity was con rmed using Shannon-Weaver diversity index (H') [81] calculated based on the BLUP values for each trait. SPSS 20.0 software (IBM Corp., Armonk, NY, USA) was used to compute Pearson's correlation coe cient among four environments or three traits and to perform t-test to determine the signi cant differences in GPC, GSC and WSC between landraces and cultivars.
Differences and interactions among wheat accessions, years and plant locations effects on variance were tested by two-way ANOVA.
Genotyping and molecular analysis Genomic DNA was extracted using plant genomic DNA kit (Bio t Co., China) from a mix of leaves collected from ve one-weekold seedlings of each accession. All 236 DNA samples were genotyped using 55K single nucleotide polymorphism (SNP) array (Affymetrix Axiom Wheat55K) at China Golden Marker Biotechnology Company Ltd. (Beijing, China). The SNP markers with missing values ≤ 10% and minor allele frequency (MAF) ≥ 5% were selected for further analysis. The polymorphic information concentration (PIC) 144,326 SNP marker was analyzed to evaluate the genetic diversity using software POWERMARKER v3.25 [82].
Population structure, neighbor-joining (NJ) phylogeny, and linkage disequilibrium (LD) decay The population structure (Q-matrix) of the wheat accessions were analyzed based on Bayesian model using STRUCTURE software (v2.3.4) [83]. Five independent runs for K (from 1 to 10) were performed using the admixture model with 10,000 burnin and 10,000 Markov chain Monte Carlo (MCMC) iterations. The results from STRUCUTRE were summarized to obtain optimum population structure (optimum K) using delta K (ΔK) method in the web-based software STRUCTURE HARVESTER [84]. To understand the genetic relationship among all accessions, a phylogenetic tree was constructed using neighbor-joining (NJ) method based on shared allele distance [85] using TASSEL software v5.2.38 [86]. The NJ phylogenetic tree was collapsed and formatted using iTOL v4 [87]. The pairwise measure of linkage disequilibrium (LD) was estimated as squared allele frequency correlation (r2) between pairs of intra-chromosomal markers with known chromosomal position using TASSEL v5.2.38 [64]. Signi cant pairwise markers were chosen with the threshold pDiseq < 0.001 and r2 > 0.1, and the LD decay plot and half decay distance were generated with r2 using the ggplot2 package in the R program [86].

Genome-wide association study (GWAS)
To identify the loci associated with GPC, GSC, and WGC, GWAS was performed on the 236 wheat accessions with the 44,326 effective markers using TASSEL v5.2.38 [86] based on the mixed linear model (MLM) with Q and K as covariates. To identify signi cantly associated loci, we set a threshold P value (−log10(P) ≥ 2.5) and the criterion to be detected in at least two test environments. Manhattan plots with P values were generated using ggplot2 package in R program to visualize the loci associated with the quality traits [88]. All the associated loci (−log10(P) ≥ 2.5) in the half decay distance region on the same chromosome were assigned to the same QTL block.

Candidate genes analysis
We further analyzed the putative candidate genes to identify the novel QTL. The genes included in the potentially novel QTL were selected based on the LD decay distance using the Chinese Spring reference genome (IWGSC RefSeq v1.0, RefSeq BLUP, best linear unbiased prediction; GPC, grain protein content; GSC, grain starch content; WGC, wet gluten content; STDEV, standard deviation; CV, coe cient of variation; H', the Shannon-Weaver diversity index; * and ** indicate the differences between landraces and cultivars are signi cant at the level p < 0.05 and p < 0.01, respectively  Table 5 The details of QTLs associated with quality traits