Uncovering genomic regions controlling plant architectural traits in hexaploid wheat using GWAS


 Background: Wheat is a staple food crop worldwide. Plant height is a key factor in plant architecture as it plays a crucial role in lodging and thus affects yield and quality. Genome-wide studies are mostly applied in crop plants, due to its advanced genotyping technologies, identification of novel loci, and improved statistical approaches. Results: In this study, the population was genotyped by using Illumina iSelect 90K single nucleotide polymorphism (SNP) assay and finally 22,905 high-quality SNPs were used to perform a genome-wide association study (GWAS) for plant architectural traits employing four multi-locus GWAS (ML-GWAS) and three single-locus GWAS (SL-GWAS) models. As a result, 174 and 97 significant SNPs controlling plant architectural traits were detected by four ML-GWAS and three SL-GWAS methods, respectively. Among these SNP makers, 43 SNPs were commonly detected, including seven across multiple environments and thirty-six across multiple methods. Interestingly, five most stable SNPs (Kukri_c34553_89, RAC875_c8121_1490, wsnp_Ex_rep_c66315_64480362, Ku_c5191_340, and tplb0049a09_1302) consistently detected across multiple environments and methods, possibly played a role in modulating plant height and flag leaf length. When comparing ML-GWAS methods, pLARmEB was the most powerful and accountable for the detection of 49 significant SNPs that mostly contributed to plant height (36 SNPs). However, in SL-GWAS the FarmCPU model detected most of the significant SNPs. Moreover, a total of 152 candidate genes were found that are likely to be involved in plant growth and development which may provide insightful information related to plant architectural traits.Conclusion: Altogether, our results reveal 174 and 97 significant SNPs controlling plant architectural traits using four ML-GWAS and three SL-GWAS methods, respectively. The detection of the stable loci across multiple environments and methods, possibly play a role in modulating plant architectural traits in hexaploid wheat, and finally will contribute to the discovery of valuable SNP loci for marker-assisted selection (MAS) in wheat molecular breeding.

to the discovery of valuable SNP loci for marker-assisted selection (MAS) in wheat molecular breeding.

Background
Wheat (Triticum aestivum L) is the most important food crop cultivated worldwide on about 200 million hectares and provides 20% of the total needs of the world population [1][2][3]. Wheat is among the three major cereal crops ranked its position next to maize and rice annually produced [4]. For the first time in human history, wheat cultivation has produced a sufficient amount of food, which in turn, supported the development of cities and contributed to the rise of human civilization in Middle Eastern empires of Babylon and Egypt [5]. High grain yield is one of the critical breeding goals in developing cultivars.
Wheat has been grown in China for more than 4,000 years and is currently cultivated in 10 agricultural zones based on its different kinds, cultivation season, effects of temperature and photoperiod [6][7][8]. Crop yield is a complex trait significantly associated with thousand grain weight, grains number per spike and fertile spikes/m 2 . Moreover, kernel shape, plant height, effective tillers per plant, flag leaf length and flag leaf width traits can also affect yield through effects on photosynthetic ability, grain filling duration and dry matter translocation [9][10][11][12]. Larger grains are the potential indicator of grain yield, apart from this increased grain size is positively associated with seeding growth and early maturity [13]. Using semi-dwarf cultivars contained dwarfing genes (Rht1 and Rht2) were the main reason for the success of the Green Revolution [11].
Prior studies have identified chromosomal regions associated with yield-related traits, either by QTL mapping using linkage maps [14][15][16][17] or by genome-wide association study [18][19][20][21]. However, the standard QTL mapping using bi-parental crosses only detect information on two alleles at a given locus with low resolution, and unable to detect QTLs in complex and genetically diverse traits [22]. In addition, the evaluation of the mapping population is a time consuming and costly process. As a complement to conventional QTL mapping, GWAS analysis provides an opportunity to analyze the diverse genetic background of complex traits with more accuracy [23]. GWAS has three obvious advantages compared to QTL mapping: (1) it shortens the time of constructing populations; (2) it can be used to evaluate a wider range of germplasms for phenotyping many traits with one cycle of genotyping; (3) it offers much higher resolution for fine mapping [20,24]. Currently research have been accomplished through GWAS in wheat [25], maize [26], rice [27], soybean [28] and cotton [29]. Comparatively hexploid wheat has larger genome (≈ 15.8 GB) than rice (≈ 400 Mb) and maize (≈ 2.32 Gb) [25]. Over the past ten years, the absence of fully sequenced reference genome has limited the gene discovery of wheat and recent advances in functional genomics have provided breeders with a new impetus to achieve their goals [30].
However, substantially more work is required because of the experimental bottleneck emerging from the absence of inconsistency among studies, and the utilization of lowdensity marker platforms in gene mapping studies. To achieve better understanding with high resolution, larger number of molecular markers are preferred for mapping population.
Currently, the best alternate choice for wheat breeder is the utilization of single nucleotide polymorphism (SNP) to completely detect candidate genes controlling complex traits [20]. Comparatively, the high-density wheat Illumina iSelect array comprising about 90,000 SNPs has been extensively adapted than SSR marker due to their efficiency, high density, stable inheritance, low cost, representativeness and capability of automatic detection [25,31].
To address these issues the present study was designed to use genome-wide association study (GWAS) approach employing high-density wheat iSelect 90K SNP array. Our main objectives were to 1) investigate marker-trait associations (MTAs) for plant architecturerelated traits. 2) detect candidate genes responsible for corresponding morphological traits. Indeed, this study will provide useful insights by integrating three single-locus and four newly developed multi-locus GWAS methods, and further establish a regulatory network in wheat breeding programs.

Statistical analysis of phenotypic traits
In the present study, we evaluated wheat germplasm collections for plant architectural traits, including plant height, flag leaf length, flag leaf width and the number of tillers per plant. The phenotypic characteristics for plant height across four environments and flag leaf length, flag leaf width and number of tillers per plant across two environments as shown in Additional file 1 and Fig. 1. All the traits exhibited the normal distribution pattern each year, indicating the quantitative nature of these traits (Fig. 1). Descriptive statistics revealed large phenotypic variations for all the traits as given in Additional file S1. PH ranged from 48.40 to 124.82 cm with coefficient of variations (CVs) ranged from 11.09-16.11%. The FLL varied from 16.06 to 31.57 cm, FLW varied from 1.22 to 2.75 cm and for tillers, the number of tillers per plant ranged from 8.10 to 14.67. Analysis of variance indicated highly significant differences (P < 0.001) for all the studied traits (Additional file 2).
Broad sense heritability was also estimated for PH, FLL, FLW and number of tillers with values ranged from 0.569 (FLW) to 0.794 (PH), suggesting the stability of these traits. The correlation analysis revealed significant correlation between different environments for each of the four traits, indicating the consistency of these traits across various environments (Fig. 2). Furthermore, PH was significantly and positively correlated with FLL in almost all the environments. However, they correlated significantly but negatively with FLW in most of the environments indicating competition of these traits to assimilate at the plant growth stage. Finally, the relatively weak correlation of tillers with other traits suggested the independency of this trait.

Population Structure Analysis
Population structure is important due to the large number of diverse genotypes used in the study may produce false associations between the phenotypic values and unlinked markers. Therefore, a comprehensive analysis of population structure is prerequisite for evaluating successful association mapping. The number of subpopulations were calculated by the rate of change in the log probability of data between successive K-values. ΔK was calculated for increasing the number of K value determined by STRUCTURE analysis according to the procedure of Evanno [32]. At k = 2, a break in the slope was observed followed by flattening of the curve (Fig. 3a). Hence, the most likely number of subpopulations was two (k = 2) (Fig. 3b). Moreover, this result was confirmed by PCA based on standardized covariance of genetic distances of SNP markers (Fig. 3c).

Gwas For Plant Architectural Traits
Four multi-locus GWAS (ML-GWAS) models: FASTmrMLM, FASTmrEMMA, mrMLM, and pLARmEB, and three single-locus GWAS (SL-GWAS) models: FarmCPU, MLM, and MLMM implemented by Genomic Association and Prediction Integrated Tool (GAPIT) in R were used to detect the association signals. In ML-GWAS the SNPs with the critical logarithm of odds (LOD) score of 3 or greater than 3 were regarded as significant SNPs for associated traits. In SL-GWAS the screening criteria for detecting significant SNPs were P = 1/m (m is the number of total SNPs). To obtain more reliable results, the SNPs that were simultaneously detected in at least two years or by at least two methods were considered as most stable SNPs. A total of 113 and 62 significant SNPs were identified by ML-GWAS and SL-GWAS methods, respectively (Fig. 4a) To validate the findings, we further compared the results across multiple environments and 6 and 1 SNPs were coidentified in at least two of the environments for PH and FLL, respectively (Fig. 4c, Table 2). These environmentstable SNPs were located on different chromosomes. For PH there was one SNP on chromosome 2B, one on 3A, one located on 3B, one on 6A and two SNPs located on chromosome 6B. The LOD scores ranged from 3.05 to 7.92. One stable SNP across two environments located on chromosome 5A associated with FLL with LOD value ranging from 3.88 to 6.55 (Table 2). Comparing the results across different methods, we found 36 common SNPs were codetected simultaneously by at least two approaches (Fig. 4c, Table 1). Among these, four significant SNPs (RAC875_c8121_1490, Ku_27771_508, Tdurum_contig42962_2138, BS00022127_51) were detected by all four methods (Table 1). We further checked the co-detected common SNPs simultaneously in multiple environments and different methods and detect five most stable SNPs (Kukri_c34553_89, RAC875_c8121_1490, wsnp_Ex_rep_c66315_64480362, Ku_c5191_340, and tplb0049a09_1302). Among these, one SNP was associated with FLL and the rest of four were identified for PH across multiple environments and different years ( Fig. 4c, Table 3). Comparatively, the four ML-GWAS models i.e. FASTmrMLM, FASTmrEMMA, mrMLM, and pLARmEB, pLARmEB was the most powerful and useful approach, that detected 49 significant association signals, mostly associated with PH (36 SNPs), however FASTmrEMMA was the least effective approach accountable for 32 significant SNPs (Fig. 3b). The Manhattan and QQ plots of the above four ML-GWAS methods for plant height, flag leaf length, flag leaf width and number of tillers are presented in Additional file 7 (a-e).

Traits Having Common Associations
SNPs associated with more than one trait are very useful for marker-assisted selection. A total of five SNPs were detected associated with more than one trait across multiple environments (Additional file 4). Among these, one SNP on chromosome 5A

Candidate Genes Identification
To further understand the genetic basis of plant architectural traits, we detected several candidate genes that were surrounding the peak SNPs. Interestingly, several major candidate genes that were directly associated with the consensus SNPs had exact same annotations ( Fig. 5 and Additional file 6). For instance, several putative candidate genes for PH and FLL, annotated as Laccase which is used for lignin polymerization to help in a variety of functions in plant development [33]. Similarly, the putative genes responsible for PH and FLL annotated as Cysteine proteinase inhibitor, has a function in plant growth and defense [34]. Seven genes annotated as Glutathione S-transferase responsible for resistance against biotic stress [35]. Additionally, six putative candidate genes surrounding significant SNPs associated with number of tillers have annotations as F-box family protein, involved in plant vegetative and reproductive growth [36]. Further examples are given in Fig. 5 and Additional file 6. These results will provide useful information for future work and suggest that the likely gene families are important for plant architectural traits.

Plant architecture related traits
In this study, we employed four multi-locus models: FASTmrMLM, FASTmrEMMA, mrMLM, and pLARmEB and three single-locus models i.e. FarmCPU, MLM, and MLMM to identify significant association with plant height, flag leaf length, flag leaf width, and number of tillers across multiple environments in hexaploid wheat. Plant height is a key factor in crop breading as it plays a crucial role in reshaping plant architecture. In wheat, it greatly affects lodging and thus grain yield and quality [37]. Consequently, the identification of major dwarfing genes in green revolution to reduce plant height and improve yield was a major component in wheat breading [38]. In this study, we identified significant SNPs across multiple years by different ML-GWAS approaches (

Significance Of Gwas Using High-density Genotyping
Sequencing larger data with new technology will provide the base to use high-density genotyping approach for quicker and cost-effective operations. Compared to tradition QTL mapping, GWAS has three major advantages: high resolution power for QTL identification, ability to detect more alleles, and less estimation time [56]. GWAS is mostly applicable in crop plants, due to its modern genotyping technologies, identification of novel alleles, and improved statistical methods [56]. Crop genotyping has been a common approach since 1990s, but recently several improvements have been occurred in different types of polymorphisms and genotyping platforms [57]. Previously used methods depend on patterns of DNA digestion restriction and hybridization, randomly amplified PCR fragments, the advent of next generation sequencing (NGS) technologies enable genotyping in depth investigation with higher resolution [58,59]. The most widely used NGS is single nucleotide polymorphisms (SNPs) which allow better detection power for markers associated with agronomic traits [60].  [68,69]. In earlier studies, mostly SL-GWAS methods were adopted to dissect complex traits, but only few SNPs for each trait have been identified due to its procedural limitations. GLM has an obvious shortcoming of high false positive rate due to the absence of kinship among materials as covariate [70]. In MLM, due to the setting of very high threshold, many small-effect loci are missed [71]. To make up for the limitations of these methods, some multi-locus methodologies have recently been implemented, such as FASTmrMLM [72], FASTmr EMMA [73], mrMLM [71], and pLARmEB [74] are more effective approaches, which were used in this study. Using these models can improve the accuracy of SNPs with high detection power and less stringent criteria, which can effectively overcome the above issues. The most obvious advantage of these multi-locus models is that no Bonferroni multiple test correction is needed ( [71,73]. To study quantitative traits with the complex genetic background as the number of molecular markers is comparatively larger than sample sizes, it is recommended to simultaneously use multiple GWAS methods. In the past ten years, several GWAS approaches have been used to identify significant SNPs, especially evaluating agronomic traits in common wheat (T. aestivum L.) traits using a single locus and two multi-locus approaches [75]. Conclusively, V Jaiswal, V Gahlaut, PK Meher, RR Mir, JP Jaiswal, AR Rao, HS Balyan and PK Gupta [75] verified that ML-GWAS has more detection power than SL-GWAS by revealing ten Marker FASTmrEMMA, mrMLM, pLARmEB, and ISIS EM-BLASSO methods, it was confirmed that ISIS EM-BLASSO was the most effective approach for QTL identification [76]. The combination of two SL-GWAS and ML-GWAS methods contributes efficiently to detection of significant loci associated with pre-harvest sprouting tolerance in wheat [77]. SU Khan, J Yangmiao, S Liu, K Zhang, MHU Khan, Y Zhai, A Olalekan, C Fan and Y Zhou [68] combined SL and ML-GWAS approaches and revealed that ML-GWAS methods are more effective with high robustness and power of QTN detection than SL-GWAS in the genetic dissection of yield related traits of rapeseed genotypes. Li and his co-workers integrated the results of three single locus GWAS and three multi locus methods for fiber quality traits in upland cotton [78]. A total of 342 significant QTNs were detected of which 72 were consistently detected by at least two approaches or in at least two environments. According to C Li, Y Fu, R Sun, Y Wang and Q Wang [78], the power of QTN detection in association analysis can be improved by combining single locus and multi-locus GWASs.
In this study, we detected a total of 113 and 62 significant SNPs by ML-GWAS and SL-GWAS approaches, respectively. Furthermore, 19 SNPs co-detected by using ML-GWAS and SL-GWAS methods together (Additional file 5). A comparison of the four ML-GWAS methods revealed that pLARmEB was more powerful and robust [68], than the other three models in

Statistical Analysis
Descriptive analysis, ANOVA, correlation analysis and heritability estimates were conducted in the R statistical package [83]. The broad sense heritability for the traits was estimated by the formula H 2 = VG/(VG + VE) where VG and VE represent estimates of genetic and environmental variance, respectively [84].

Population Structure And Kinship Analysis
Population structure using a Bayesian cluster analysis was estimated by STRUCTURE HARVESTER software [85]. A putative number of subpopulations ranging from k = 1 to 7 was assessed using 100,000 burn-in iterations followed by 500,000 recorded Markov-Chain iterations. To estimate the sampling variance (robustness) of inferred population structure, 10 independent runs were carried out for each k. K was estimated using an ad- utilizes Bonferroni corrections for loci detection to reduce the FPRs [87]. Though, this procedure is so stringent that results in missing significant SNPs [71]. Therefore, multilocus GWAS approaches are the best alternatives. The stringent Bonferroni multiple test correction in the SL-GWAS analysis is substituted by a flexible selection criterion in multilocus GWAS analysis, that reduces the possibility of missing out significant loci [71,73].
The four ML-GWAS methods were performed with default parameters, and the screening criteria for significance were set with LOD scores 3 or > 3 [71,73,74,88,89]. However, for SL-GWAS models, the threshold for P-value was calculated based on the number of the markers (P = 1/n, n = total SNP used) according to the method of [90].

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.