Genome-wide association study and Genomic Prediction of spot blotch disease in wheat (Triticum aestivum L.) using genotyping by sequencing

Background Spot blotch caused by Bipolaris sorokiniana is a major constraint in wheat production in tropics and subtropics. There is limited information available on GWAS and study on genomic prediction is completely lacking. To reveal the genetic markers associated with disease resistance, we performed a genome-wide association study (GWAS) for spot blotch disease in 141 spring wheat lines. Results Based on the testing under natural infection in three years at hot spots location in Pusa, India and Jamalpur, Bangladesh, the genotypes showed signicant genetic variation for disease severity. Using Genotyping-by-Sequencing (GBS) based 18637 polymorphic SNP markers and phenotyping from diverse environments, we identied 23 genomic regions across the genome ( p < 0.001) on 14 chromosomes associated with disease scores. Consistent with the previous reports, a most stable genomic region on chromosome 2B, 5B and 7D were detected across the environments. The new genomic region on chromosome 3D was also identied. We performed functional annotation with wheat genome assembly annotation (IWGSC Ref Seq v1.0) and identied NBS-LRR and 35 other plant defense-related protein families across multiple chromosome regions. Using a ve-fold cross-validation scheme, we observed moderate prediction accuracy for 3 of 4 environments indicated that our model was able to successfully capture the quantitative variation underlying the SB variation in our population. Conclusions The GWAS based on the phenotypic data from PUSA India and BARI Bangladesh resulted in a total of 23 genomic regions on 14 chromosomes. The new genomic region appeared on chromosome 3D associated with Zinc nger protein that play important role in plant disease resistance. The genomic prediction model for spot blotch disease resistance in wheat was tested and obtained moderate prediction accuracy.

chromosome 3D was also identi ed. We performed functional annotation with wheat genome assembly annotation (IWGSC Ref Seq v1.0) and identi ed NBS-LRR and 35 other plant defense-related protein families across multiple chromosome regions. Using a ve-fold cross-validation scheme, we observed moderate prediction accuracy for 3 of 4 environments indicated that our model was able to successfully capture the quantitative variation underlying the SB variation in our population.
Conclusions The GWAS based on the phenotypic data from PUSA India and BARI Bangladesh resulted in a total of 23 genomic regions on 14 chromosomes. The new genomic region appeared on chromosome 3D associated with Zinc nger protein that play important role in plant disease resistance. The genomic prediction model for spot blotch disease resistance in wheat was tested and obtained moderate prediction accuracy. Background Wheat (Triticum aestivum L.) is the major staple for more than 35% of the world's population [1]. The pace of wheat improvement must accelerate to meet the projected global food demand by 2050. Green revolution played a key role in India, Pakistan, Nepal and Bangladesh to ensuring food security in this densely populated region of the world [2]. However, wheat production faces multiple threats via rapidly evolving pathogen variants, pests and increased climate variability, which considerably jeopardize crop productivity growth [3,4,5,6]. Breeding wheat for climatic resilience and disease resistance combined with good agronomy can potentially improve wheat productivity to meet the future food demands [7]. Spot blotch caused by Bipolaris sorokiniana is a major constraint in wheat production in tropics and subtropics [8,9]. The pathogen has a worldwide dispersal, but it is predominantly aggressive under conditions of warm, high relative humidity and temperature associated with imbalanced soil fertility. Yield losses are variable but are important in elds with low inputs and under late-sown conditions [2]. Bipolaris sorokiniana act as a causal agent for numerous diseases like head blight, seedling blight, foliar blight/ spot blotch, common root rot and black point of wheat, barley, other small cereal grains and grasses [10]. However, spot blotch of wheat is considered as one of the most important diseases caused by this pathogen in the mega environments characterized by high temperature (coolest month greater than 17˚C) and high humidity [11].
When desired level of resistance to several diseases is required, it is often di cult to achieve through conventional breeding approaches. Disease resistance can be inherited both qualitatively and quantitatively, as is the case in many wheat diseases [12,13,14,15]. The genetic basis of spot blotch resistance has earlier been recognized to eight major quantitative trait loci (QTLs), namely QSb.bhu-2A, QSb.bhu-2B, QSb.bhu-2D, QSb.bhu-3B, QSb.bhu-5B, QSb.bhu-6D, QSb.bhu-7B and QSb.bhu-7D [12,13]. Sharma et al (2007b) [16] reported three microsatellite markers (Xgwm67, Xgwm570 and Xgwm469) linked with spot blotch resistance. Lr34 and Lr46, the two broadly used genes conferring leaf rust resistance have also been reported to contribute to spot blotch resistance [17]. Lr34 gene is reported to be the major locus on chromosome 7D and explains up to 55% phenotypic variation for spot blotch disease resistance and this locus was given the gene designation Sb1 [17]. During past few years, several QTLs and genetic markers for spot blotch resistance have been identi ed in multiple studies in winter Wheat [18,19,20,21].
Due to changes in pathogen populations, resistance genes can lose their effectiveness over time. Given these challenges, identi cation and mapping of novel resistance genes would aid breeding for disease resistance in wheat. One approach to identify spot blotch resistance QTLs is through association mapping. This approach leverages historic recombination and generally have better mapping resolution compared to biparental mapping [22]. A promising strategy to identify QTLs for traits of interest is genome wide association study (GWAS) that takes advantage of decreasing sequencing cost and high throughput genotyping assays [23]. A key approach in GWAS is to have enough genome coverage so that functional alleles will be in linkage disequilibrium (LD) with at least one marker [24]. The association studies for disease resistance including spot blotch have been conducted [25,26,27,28,29,30].
Limited research was done where same set is exposed to diverse environments in large geographic area for wheat spot blotch. Therefore, the primary objective of this study was to establish marker-trait associations for spot blotch using genotyping by sequencing (GBS) SNP markers in spring wheat in the South Asian wheat growing regions. The signi cant SNPs can give insights into the biological function and its relationship with resistance more relevant to the South Asian region. This study aims not only to identify novel QTLs but to validate known genomic loci conferring spot blotch resistance. So far, there is no study on genomic predictions for spot blotch disease resistance in wheat, therefore, we made rst attempt to test genomic prediction models across environments.

Results
Spot blotch disease was recorded visually on a scale of zero (no symptoms visible) to 100 (completely susceptible) across three years (2016-17, 2017-18 and 2018-19). The populations displayed signi cant phenotypic variation for spot blotch resistance with nearly continuous distribution of lines in all environments (Fig 1). The mean disease severity ranged from 8.90 to 31.23 in four environments (2016-17 to 2018-19) including Pusa, India and Bangladesh Agricultural Research Institute (BARI), Bangladesh (Table 1). Highest mean disease severity was recorded in BARI 2016-17 (Env4) while lowest was at PUSA 2017-18 (Env2). The analysis of variance revealed highest heritability in Env 4 (0.80) while lowest was in Env 3 (0.50). Based on the combined analysis of all environments, we observed moderate heritability (0.55). There was signi cant Genotype × Environment interactions (P <0.0001; Table 2).

Genetic linkage mapping
The genetic linkage map was prepared using the most signi cant SNPs found on 14 hexaploid wheat chromosomes. A total of 70 SNPs were used and clustered in 23 linkage groups (Fig 2). A linkage group was considered to be different if the gap between them is more than 10cM on the same chromosome. We observed a maximum of three linkage groups on chromosome 2B and 5B. Similarly, two linkage groups were found on each of 2A, 5A, 7A, 7B, and 7D. Maximum number of signi cant SNPs (18 SNP markers) were observed on chromosome 2B (Table 3).

Principal component analysis
Population structure was determined using genotyping information and principal component analysis (PCA) based approach where genotypes clustered in 12 subgroups. The clustering was done using Ward method in JMP v.13 (Fig 3). The Group 1 (G-I) consisted of eight lines including the resistant check HD2733. Maximum number of lines included in G-VII (24 lines) while minimum in G-X and G-XII (6 lines in each). Most of the lines in a group shared alleles descended from common parents. The lines without common parents or less than three sibs per family were classi ed as 'others'. The largest group (G-VII) consists lines with mixed pedigrees including SAUAL, WBLL#1, Kachu #1, BAV92//IRENA/KAUZ, FRANCOLIN#1, MUCUY and PBW343. The parental lines with TRCH/SRTU//KACHU cross in their genetic backgrounds dominated G-VIII. The parental line AKURI was a most common parent in G-IX and similar in case of G-XII where sister lines dominated the group. The intra-chromosomal LD was calculated as the pairwise marker correlations (R 2 ) between the GBS markers and plotted against the physical distance for signi cant marker-trait associations (Fig S1).

Marker -Trait associations
The GWAS of spot blotch resistance was performed based on the data collected at adult plant stages. After false discovery rate, a q-value (corrected p value) <0.05 was used as a threshold to identify signi cantly associated markers. The GWAS results of spot blotch resistance from the trials conducted at Borlaug Institute for South Asia (BISA) in Pusa, India and BARI in Jamalpur, Bangladesh are given in supplementary tables (Table S1). A total of 70 most signi cant SNPs markers associated with spot   blotch disease resistance belong to 23 linkage groups were detected on chromosomes 1A, 1B, 1D, 2A, 2B,  3A, 3B, 3D, 5A, 5B, 6A, 7A, 7B and 7D (Fig 2). The phenotypic variation explained by the most signi cant chromosome region ranged from 6.75% (Env2) to 13.65% (Env1) in three years trials. The phenotypic variation explained in individual environments ranged from 7.37-13.65%, 6.75-9.39%, 7.51-12.72% and 7.60-11.68% in Env1, Env2, Env3 and Env4 respectively (Table S1). The largest phenotypic variation explained by the SNP marker located on chromosome 5B in Env1 followed by 2B in Env3 explaining up to 13.65% and 12.72% of phenotypic variation respectively. Most of the SNPs marker regions appeared in more than one environment. We detected seven signi cant chromosome regions on chromosome 2B while six were on chromosome 5B. A total of 15 SNP markers (S1B_646895451, S2A_31851904,   S2B_504717, S2B_525073, S2B_594959, S2B_6253562, S2B_8311062, S2B_90662917, S3B_763230831,  S3B_763236179, S3B_763267753, S3B_764192662, S4A_710830493, S6B_719904092 and S7D_181974079) were signi cant in at least two environments (Table S1).

Gene functional annotations
The signi cant SNPs identi ed from the GWAS analysis were further studied for the known candidate genes relevant to disease resistance using the recently annotated wheat reference sequence (RefSeq V1.0). We identi ed NBS-LRR and 35 other plant defense-related protein families across multiple chromosome regions. The signi cant SNPs S2B_13751999 identi ed in Env1 on chromosome 2B was located between the GeneIDs, TraesCS2B01G030100 and TraesCS2B01G030200. The latter gene play an important role in the resistance of various plant diseases including the downy mildew [31] by producing RPP13 protein while the former gene is involved in synthesis of lectin-receptor kinase which has an important plant immunity function [32]. Similarly, the SNP S2B_699219601 identi ed in Env4 belongs to the GenID, TraesCS2B01G505200 also involve in downy mildew disease resistance response and other diseases ( Table 3). The SNP S2B_8311062 identi ed in Env1 and Env3 was located close the geneID, TraesCS2B01G018200, which is involved in NBS-LRR disease resistance protein synthesis. Similarly, the SNP S2B_28592818 identi ed in Env4 located close the geneID, TraesCS2B01G058900 also involved in synthesis NBS-LRR disease resistance protein. The annotation of S5B_683352145 revealed that the gene on chromosome 5B identi ed in Env4 also involved in synthesis of NBS-LRR disease resistance protein family-1. The SNP S2B_6253562 falls within the GeneID, TraesCS2B01G012400 that encodes Avr9/Cf-9 rapidly elicited protein. The Avr9/Cf-9 protein is involved in early signaling events in the Avr9/Cf-9dependent plant defense response. The most important SNP S5A_595393566 detected in Env1on chromosome 5A belongs to the gene TraesCS5A01G402800 which mediates spot blotch (Bipolaris sorokiniana) resistance in wheat ( Table 3). The other SNPs S7B_1020705 and S3B_763230831 associated with Leucine-rich repeat receptor-like protein kinase and transmembrane protein contribute to Fusarium resistance in cereals. The SNP (S5B_233586644; geneID TraesCS5B01G128000) explaining highest phenotypic variance (13.65%) was detected in Env1, produces signal recognition particle subunit SRP68 which play a crucial role in targeting secretory proteins to the rough endoplasmic reticulum membrane. The second highest phenotypic variance explained by the SNP (S2B_504717) located on chromosome 2B involved in DON resistance through cytochrome 450. The SNP on 1A, 1B, 1D, 2A, 2B, 3A, 3B, 4B, 5A, 5B, 6A and 7B found to be involved directly in disease resistance mechanism (Table 3).
To further con rm the results, we looked at the common proteins through gene annotation associated with different SNPs, identi ed independently in different environments. For example, Zinc nger family protein were associated with the SNP on identi ed on 2B (S2B_15129248) and 3B (S3B_764747435) in Env1, 3D (S3D_610628298) in Env2 and 3A in Env3 (S3A_67348475). Similarly, the kinase protein family were associated with the SNPs on chromosome 2B (S2B_13761590), 2A (S2A_708482943) and 5B (S5B_683514734) in Env4. The results based on the gene annotation and synthesized proteins are presented in supplementary Table S2 and

Evaluation of Prediction Accuracy
For genomic prediction modeling, the 141 lines were randomly divided into training and testing sets of size 4/5 and 1/5, respectively for each of the four environments. In the initial model training step, both genotypes and the observed phenotypes (i.e., phenotypic BLUPs) were used to estimate the genetic marker effects from the training population. The estimated marker effects were subsequently used to predict the phenotypes of the testing set population. This process was repeated 100 times to sample a random set of training and testing set population during each iteration. The average correlation between the observed and the predicted phenotypes was calculated. The prediction accuracy distributions, means, and standard errors were reported by each environment. The cross-validation prediction accuracy of the Env4 was 0.33 while for the Env1 and Env2 had a prediction accuracy of 0.29 and 0.24 respectively ( Fig  5). In contrast to other environments, the prediction accuracy of the Env3 was negative (-0.16).

Discussion
The eld trials were conducted at BISA research farm, Pusa, in India for three consecutive years from 2016-17 to 2018-19 and BARI farm, Jamalpur in Bangladesh during 2016-17 crop season. Both the locations fall under the non-traditional, warmer wheat-growing regions belonging to Mega-environment 5 characterized by hot, humid conditions as per CIMMYT's system for classifying wheat-growing environments in developing countries [11]. The average temperature during the wheat plant reproductive phase at Jamalpur and Pusa is higher than 19 0 C with a high relative humidity [33] (Table S3).
The spot blotch disease incidence was captured as percentage of infected leaf area at three different growth stages to minimize the chances of disease escape due to environmental factors. However, the scoring date showing highest disease pressure (usually the second one) was used in the analysis. Since the susceptible parent displayed highest disease severity at growth stage 77 (GS77) on Zadoks scale [34], to make better judgment about the level of resistance, disease severity was recorded at this stage (usually second scoring) was used to differentiate each line.
The nearly continuous distribution of lines in all the environments show quantitative nature of resistance. The same has been supported by earlier ndings where more than two genes [35,12,13,36,37] and multiple alleles with minor effect [30,21] to control spot blotch resistance is reported. It was found that the log transformation improved the data normality which was also re ected by the improved consistency in the GWAS results across locations. We observed signi cant genetic variation for disease susceptibility in the population. The genetic variances and moderate to high heritabilities for spot blotch were comparable with earlier ndings in wheat [37,38,36]. Despite signi cant genotype × environments interactions, we observed moderate to high heritabilities within environments ( Table 2). The environmental interactions might ascribe to difference in the pathogen isolates prevalent in NEPZ of India and Bangladesh in case of locations and weather conditions mainly within location. For example, the maximum mean disease severity of the susceptible lines were up to 43% in Env3 while it was 70% in Env1 ( Table 1).
The linkage analysis was based on 18637 ltered SNP markers covering all chromosomes. The redundant SNPs with 0 cM distance and with same gene annotation were removed from the linkage mapping as no additional information is expected. After GWAS analysis, 14 chromosomes harboring signi cant QTL regions forming 23 linkage groups were used for further analysis and graphical representations. The SNP lies more than 10 cm apart based on linkage mapping, were placed in a separate linkage group. (Fig. 2).
We used genotyping information for the PCA where most of the groups were based on the proportion of genome shared by the parental pool except few exceptions. For example, the subgroup (G-VIII) consisted common parent TRCH/SRTU//KACH while the largest group (G-VII) consists lines with mixed pedigrees dominated by SAUAL, WBLL#1, Kachu #1, BAV92//IRENA/KAUZ, FRANCOLIN#1, MUCUY and PBW343.
So far, based on the consistency in independent QTL mapping studies using different source of resistance, it seems that there is not much genetic variability in spot blotch pathogen across the continent. However, several studies described clear grouping among spot blotch isolates based on the fungal hyphae color, aggressiveness and DNA ngerprinting [48, 49, 50, 51]. Four chromosomal regions on 1B, 2B, 4A and 6B are consistent between Pusa India and Jamalpur Bangladesh. This may be due to prevalence of most aggressive isolate of spot blotch pathogen (isolate No. ICMP 13584, Auckland, New To study the importance of signi cant SNPs in disease resistance, we annotated all SNPs using wheat genome assembly annotation (IWGSC Ref Seq v1.0) and traced the protein synthesized by the annotated gene. The literature was mined to look for the putative functions of those proteins. We found that several genes functional annotation strongly associated with disease resistance and observed across the year and environments (Table 3). For example, seven SNPs (S2B_13814702, S2B_533178164, S2B_14809954, S2B_14963432, S2B_15129248, S2B_504717, S2B_78065) on chromosome 2B associated with eight geneIDs, TraesCS2B01G030500, TraesCS2B01G373900, TraesCS2B01G031700, TraesCS2B01G031900, TraesCS2B01G032000, TraesCS2B01G032100, TraesCS2B01G001100 and TraesCS2B01G000400 involved in synthesis of Cytochromosome P450 family protein. The role of Cytochrome P450 family protein in plant defense, secondary metabolite biosynthesis in the classical xenobiotic detoxi cation pathway is well established by Thapa et al. 2018 [53]. It is involved in resistance to DON which is a trichothecene mycotoxin produced by Fusarium species and increase yield. The Cytochrome P450 family protein may not involve directly in yield increase but to enhanced Fusarium head blight disease resistance.
The SNP (S2B_28592818) detected in Env4 on same chromosome (2B) but at different region synthesizes NBS-LRR disease resistance protein family contribute for disease resistance [54,55]. Similarly, the SNP S2B_8311062 and S5B_683352145 also associated with the gene synthesize NBS-LRR disease resistance protein family and contribute for fungal disease resistance. The role of NBS-LRR disease resistance protein is disease resistance mechanism is well established [54, 55]. One of the signi cant SNP located on chromosome 2B, S2B_15129248 is linked to two geneIDs, namely, TraesCS2B01G032100 (synthesize Cytochrome P450 family proteins) and TraesCS2B01G032200 (involved in GRF zinc nger family protein). Both the proteins play an important role in plant disease resistance [53,56].
It is interesting to note that the most important SNP S5A_595393566 detected in Env1on chromosome 5A belongs to the gene TraesCS5A01G402800 which mediates spot blotch resistance in wheat. This gene is involved in the synthesis of Myb family transcription factor-like protein, found to mediate host resistance to Bipolaris sorokiniana in wheat [57]. The same region has been reported in other independent studies as well [18,20,30,21]. Similarly, the SNP S3A_67065083 associated with geneID TraesCS3A01G103500 involved in synthesis of 1R-MYB Transcription factor which plays an important role in disease resistance against stripe rust fungus and ear head disease in wheat [45].
Maximum number of known proteins involved in fungal defense were based on 14 SNPs on chromosome 2B showing the importance of this chromosome in disease resistance. The earlier independent ndings also describe the importance of chromosome 2B in spot blotch disease resistance [12,19]. Interestingly QTL found in the present study on 1B in Env1, the proteins involved are Pentatricopeptide repeatcontaining protein (TraesCS1B01G424000) mRNAs renders more susceptible to pathogenic bacteria and fungi in Arabidopsis thaliana [109] and Homeobox protein (TraesCS1B01G424100) associated with reaction to stripe rust and powdery mildew in common wheat [72]. The SNPs (S1B_646895451 and S1B_647195634) detected in two environments located on chromosome 1B are 18.97 cM apart on genetic linkage map while those belong to the same geneID TraesCS1B01G424000. The gene annotation results indicate role in plant disease resistance [69,65,72]. This information obtained from gene annotation could potentially be used in ne mapping and map-based cloning to further characterize the mechanisms of spot blotch disease resistance. The markers with lowest P-values may be converted in to diagnostic markers to validate the SNPs and used in identi cation of lines with desired alleles in early generations.

Genomic Prediction
We used the rrBLUP GS model, which includes all marker information to predict a line's genomic estimated breeding values (GEBV) [73,74]. rrBLUP requires much lower computational time compared to the other GS models and it is shown to be robust in different GS scenarios [75,76,77]. Improving disease resistance in crops via single or a few major genes may be a temporary solution because the effectiveness is limited to only selected races of the pathogen and therefore, have no broadenvironment application [78,79]. The costs connected with the introgression of major genes or QTLs into elite backgrounds is challenging and may unintentionally affect the breeding operations and fast-track the evolution of the pathogen. Here, genomic prediction is well-placed to improve the effectiveness of quantitative disease resistance efforts in wheat breeding [80,81]. The SB in wheat is shown to be controlled by many small to large effect genes [12,13,82,83]. Therefore, instead of focusing on a few large-effect QTL, the prediction of spot blotch disease infection based on both small-and large-effect QTLs promise a holistic genetic-based approach for broad-spectrum resistance to evade the development and spread of spot blotch contagion.
Our results provide additional evidence in support of the quantitative nature of the disease resistance in wheat. Moderate to high prediction accuracies for 3 of 4 environments indicated that our model was able to successfully capture the quantitative variation underlying the SB variation in our population (Fig. 5). The contrasting prediction accuracies for PUSA19 environments in our study underscore the need for additional research to investigate the stability of genomic predictions across environments [75,84,85]. The role of environmental variation in the form of genotype-by-environment and marker-by-environment interactions in genomic selection has been highlighted by others [86,87]. In view of both the positive and negative ndings in this genomic prediction work, this study would provide an important precursor for future wheat breeding research in South Asia which is proven by other researchers also [79,81].

Conclusions
This study aimed to identify genetic regions underlying spot blotch resistance in the elite spring wheat genotypes. The variable conditions at four eld environments in India and Bangladesh allowed us to capture the considerable phenotypic variation for spot blotch disease in our trials. The GWAS based on the phenotypic data at each site resulted in a total of 23 genomic regions on 14 chromosomes. We were able to validate earlier ndings and identi ed new genomic regions on chromosome 3D contributing up to 6.94% of phenotypic variation. The literature mining of the functional gene annotations identi ed 36 SNPs encoding single protein or protein family, directly or indirectly involved in disease resistance. The SNPs on chromosome 5A associated with the known gene encodes 'Myb family transcription factor-like protein' found to have direct involvement in spot blotch resistance. Using a ve-fold cross-validation scheme, we observed moderate prediction accuracies for 3 of 4 environments indicated that our model was able to successfully capture the quantitative variation underlying the SB variation in our population.
The results are of importance for the breeders developing spot blotch resistant varieties targeting South Asian region. Given the aggressive pathogen spread and food security concerns, the breeding programs in South Asia could bene t by deploying a genomic selection based breeding scheme for broad-spectrum spot blotch disease resistance in wheat.

Plant material and eld layout
The population was a genetically diverse collection comprising 141 advanced breeding lines of spring wheat. It represents 25 years of research at CIMMYT and was carefully assembled to avoid the confounding effects of phenology. The lines were evaluated in two replicates at two eld locations: BISA, Pusa, India (25°57 '22.8 . Both these locations are known as hot spot for spot blotch disease [3]. The plots of 3.8-meter length were sown in six rows with 0.22 meter spacing at each environment. The trials were timely sown with full irrigation applied through gravity ood-irrigation. The spreader rows of susceptible variety Sonalika were also planted in alleys for disease build up. In addition, four auxiliary gravity ood-irrigations were also given at regular intervals. All agronomic practices like fertigation and weeding were performed as recommended for each location.

Screening for spot blotch disease resistance
The material was evaluated under natural infection conditions in the eld. To limit the number of escapes, spot blotch response was evaluated during the mid to advanced-phases of disease development i.e., between heading (GS50 on Zadoks scale) and grain lling stage (GS80) [34]. The disease severity (SEV) were recorded visually on 0 to 100 scale where 0 is complete resistance and 100 is complete susceptibility.

Genotyping
Seeds of all lines were obtained from the CIMMYT genetic resources program and genomic DNA was extracted from ve bulked leaves using a CTAB procedure [88,] modi ed as described in Dreisigacker et al. 2013 [89] in CIMMYT, Mexico. The DNA samples were sent to Kansas State University, USA for GBS.
The GBS was performed following the protocol of Poland et al 2012 [90]. All lines were sequenced with Illumina Hi Seq2000 or HISeq2500. GBS-SNP markers were called with TASSEL v5.2 pipeline [91] and aligned to the reference Chinese Spring Wheat Assembly v1.0. The following SNP ltering criteria was applied on raw SNP calls: less than 30% missing markers, minimum 5% minor allele frequency (MAF) and less than 20% heterozygosity. The ltering step yielded 18637 markers and the remaining missing values were imputed using Beagle v4.1 [92].

Statistical analysis
The experimental design in each environment was an randomized complete block design with two replications per location. META-R v6.03 developed by CIMMYT [93], Mexico was used to perform multienvironment mixed model analysis. The environments were used as random effects and genotypes as xed effects. The resulting analysis produced the adjusted trait phenotypic values in the form of best linear unbiased predictions (BLUPs) within and across environments. In addition, the components of phenotypic variance were also extracted to calculate broad-sense line-mean heritability.  Functional annotation of the genes harboring signi cant SNPs were retrieved and examined for their association with spot blotch resistance from the genome annotations provided by IWGSC. Subsequently, genes annotated proteins functions were literature mined.

Genomic Prediction
The Log transformed BLUPs of spot blotch scores calculated across years were used for genomic prediction modeling. A ve-fold cross-validation scheme [96] was implemented. The advanced breeding lines were randomly divided into ve subsets (i.e., folds), and four of them were used as the training set. each with approximately the same number of individuals. The random cross-validation runs were repeated for 100 iterations. At each step, the predictive accuracy of the markers was assessed by Pearson's correlation between the predicted values and the BLUP phenotypes. Overall average of the fth fold was reported as accuracy of the prediction. All calculations were performed in R software [95] and by using the packages lme4 and rrBLUP [97,74).       Genomic prediction for spot blotch in four different environments using rrBLUP

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.