Association mapping of Iranian wheat genotypes to detect QTLs conferring resistance to stem rust at seedling stage

Background: Stem rust caused by Puccinia graminis f. sp. tritici ( Pgt ) is an important disease of wheat in the world. Pgt pathogen is constantly evolving and creating more virulent races that break down stem rust ( Sr ) resistance genes. As a result, many of Sr genes have become ineffective against new Pgt races. Exploring new sources of resistance to detect new Sr genes/QTLs is very important in order to introducing them into wheat breeding programs and developing resistant wheat cultivars. The objective of the present study was to evaluate 297 Iranian wheat genotypes for resistance to stem rust at seedling stage and to detect Sr resistance genes/QTLs through association mapping (AM). Results: A set of 297 Iran bread wheat cultivars and landraces were evaluated for infection type and latent period in four race of Pgt . Genotypic data of 282 genotypes were available, so AM was performed based on 282 genotypes. The results of population structure analysis showed that 277 genotypes clearly were distinguished in the three subpopulations and the other five genotypes were classified in the mixed group. The mean linkage disequilibrium decreased with increasing genetic distance. The markers did not have a uniform distribution on the genomes, so the share of each of the A, B and D genomes in commercial cultivars and landraces was approximately 37, 46 and 17%, respectively. Collectively, 69 QTLs for infection type and 62 QTLs for latent period of studied Pgt races were identified in the original dataset (P≤0.00 1). In the imputed SNPs dataset, the number of QTLs for infection type increased to 504 QTL and for latent period increased to 454 QTLs (P≤0.001). Conclusion: Based on the results of this study, it can be concluded that the Iranian wheat genotypes are valuable source resistance to stem rust. By incorporating these genotypes into wheat breeding programs and optimizing effective resistance genes, an important step can be taken to prevent the threat of and the disease to ensure food security. This study provides additional useful information for selection of resistant genotypes against the disease by improving marker-assisted selection efficiency.


Background
In cereal classification, wheat with more than 221 million hectares of cultivation area and more than 751 million tons of grain production annually, is recognized as the third most important cereal worldwide [1]. In Iran, approximately 7.65 million hectares out of about 11 million hectares of field crops in 2016-2017 cropping year has been devoted to cereal cultivation and of the total grain harvest, 70.22% to wheat, 20.84% to barley, 0.76% to rice paddy and 1.88% to grain corn was belonged [2]. These statistics show the great importance of grains, especially wheat, in the country. It is predicted that a 50% increase in cereal yield will be needed to meet the nutritional needs of the human population by 2050 [3]. Biotic and abiotic stresses challenge grain production and yield increase in wheat. Diseases and pests are among the most important biotic stresses that reduce approximately 20% of the world wheat yield annually [4]. Rusts are one of the most dangerous cereal diseases that have coexisted and evolved during the use of cereals [5]. Black or stem rust of wheat caused by Puccinia graminis Pers. f. sp. tritici (Pgt) is considered a very serious threat in major wheat growing areas around the world. This disease, which usually develops under humid conditions and high temperatures (15 to 35 ºC), is the most damaging rust among wheat rusts and the extent of damage may result in the complete destruction of large-scale wheat fields [6,7].
Application of fungicides and use of resistant wheat cultivars are two important strategies to combat Pgt. However, use of fungicides may not always be feasible due to the high cost, risk of fungal resistance to the fungicides, and the possibility of environmental contamination. Use of resistant wheat cultivars is an effective and economical way of controlling the disease. In this regard, it is important to have sufficient knowledge of population genetics of the causal agent of the disease and identification of resistance genes effective in wheat genotypes to be selected.
Based on the inheritance in plants, resistance to disease are monogenic and polygenic. In the monogenic state, a gene is responsible for the expression of resistance, and the type of resistance, dominance and codominance, and discrete variation can be obtained in differentiating generations and can be identified using the Mendelian rules and the monosomic technique of the genes. In polygenic state, resistance is controlled by several genes, so monosomic techniques and Mendel's rules cannot be used to identify resistance genes. Linkage mapping to detect quantitative traits loci (QTLs) is one of the methods used for the genetic study of traits controlled by multiple genes. The major limitations of this method are limited number of crossing overs, resulting in low resolution of the genetic map (10-20 cM) and high cost of population replication to reach sufficient crossing overs. In addition, the created populations are only useful for limited traits and studies [8,9]. An alternative method that has the advantage of using natural populations is association mapping.
The purpose of association mapping is to identify continuous markers linked to one or more quantitative traits. Association mapping is based on the assumption that the observed phenotypic variation is related to genetic variation. In this method, a large and diverse set of individuals from a population is randomly collected and mapping is done based on linkage disequilibrium. Association mapping has been used for both whole-genome scanning and candidate gene analysis [10][11][12][13][14]. Association mapping is much more accurate than linkage mapping due to its many recombination's [15]. In addition, it is more compatible with genetically diverse germplasm and allows mapping of several traits at the same time.
Therefore, for each trait desired, there is no need to create two-parental populations, which in turn incurs additional costs for genotypic and phenotypic evaluation.
The objectives of this study were to carry out a genome wide search in Iranian wheat genotypes for resistance loci to Pgt races at the seedling stage, and the identification of genomic regions suitable for marker-assisted selection and further genetic dissection.

Phenotypic data analysis
Combined analysis of variance was performed for data of infection types and latent period as Bartlett test [16] confirmed homogeneity of variance of experimental errors for data.
According to the results, significant variation (P≤0.01) was observed among the wheat genotypes and Pgt races in terms of infection types and latent period. The effect of genotype×race was also significant (P≤0.01) for both characteristics of infection types and latent period which indicates that for the different Pgt races, the response of the genotypes varied in terms of infection types and latent period (Table 1). Simple variance analysis of infection types and latent period for each race of Pgt was performed to detect variation in reaction of the genotypes to four Pgt individual races (Table 2). Based on these results, significant variation (P≤0.01) was observed among genotypes for infection types and latent period of every race tested.
Genetic parameters including variance components, coefficients of genetic and phenotypic variations, heritability, and genetic efficiency for infection types and latent period in wheat genotypes against four Pgt individual races were calculated (   The frequency of genotypes was calculated based on the 0-4 infection type scale [17,18] for each race and the results are shown in Table 4. Genotypes with seedling response between 0 and 2 (0-0;-;-;1-1-1+-2 --2C-2 and 2 + ) as resistance reaction and genotypes with infection type between 3 and 4 (3-3 + and 4) were considered as susceptible reaction [19]. The results showed that in TTTTF, 16  Cluster analysis of four Pgt races based on infection types of the wheat genotypes performed using the Ward's method ( Figure 1). The dendrogram obtained from the analysis classified the races into three groups with the cut line in the 5 coefficient. The TTTTF and TKTTF races had the least frequency of resistance reaction ( Table 4). The cause of slight difference in the frequency of the TKTTF resistance reaction to the TTTTF race can be attributed to the presence of the Sr11 gene, while the resistance of this gene is broken by the TTTTF race. The other two races (PTRTF and TTKTK) showed independent virulence patterns and each was housed in a separate group.

Population structure and kinship analysis
In order to avoid of mendacious positive results, the population of the wheat genotypes was studied in terms of structure and relationship, extracting its K and ∆K statistics and plotting the 2D graph ( Figure 2). The graph of ∆K at K=3 represents the highest value in perfectly specified fracture, dividing, the present population into three subpopulations ( Figure 3). The value of membership threshold was set to 0.5.
Principal component analysis was performed on the matrix derived from the original SNPs and imputed dataset to further evaluation of population structure and investigation of genetic relationships among wheat genotypes. Based on genotypic data, the 2D plot of original ( Figure 4A) and imputed SNPs ( Figure 4B) showed almost distinct grouping into three groups. Genotyping variance was explained by the first two main principal components that were 12.70 and 5.95, respectively for the original SNPs and 17.20 and 6.10, respectively for imputed dataset.
Out of total of 282 genotypes, 79 genotypes (28.02%) were in the first group, 123 genotypes

Cultivars
Landrace group I Landrace group II

Marker-trait association
Association mapping was used to determine the association between the phenotypic data of infection type and latent period from the evaluation of seedling stage of wheat genotypes to Pgt and genotypic data. Table 7 Genome  TTTTF  PTRTF  TTKTK  TKTTF  IT  LP  IT  LP  IT  LP  IT  LP  Original data  Marker-trait association  29  15  15  18  10  19  15  10  Genome A  2  2  2  5  3  6  4  3  Genome B  17  8  9  6  3  2  8  5  Genome D  8  2  1  3  3  8  2  1  Unassembled Chromosomes  2  3  2  4  1  3  1  1  Imputed data  Marker-trait association  208  128  109  106  83  160  104  60   Genome A   55  37  27  44  25  53  37  10  Genome B  95  69  67  35  39  75  54  33  Genome D  56  19  13  23  18  29  12  16  Unassembled Chromosomes  2  3  2  4  1  3  1 1 IT: Infection type, LP: Latent period Table 8 shows the results of the Bonferroni correction at the 5% probability level of the association analysis of original SNPs. For Bonferroni correction of the original dataset, 0.05 was divided by the number of markers used (8959 markers) and finally were identified markers that were significant at the probability level of 5.58098E-06. Based on the results of Bonferroni correction, two QTLs for the TTTTF latent period, one QTL for the infection type and four QTLs for latent period of PTRTF, one QTL for the infection type, and one QTL for latent period of TTKTK, and one QTL for the infection type of TKTTF were identified. The identified QTLs associated with the latent period of the studied races, except for one QTL related to latent period of PTTTF that located on chromosome 7A with a distance of 133.92 cM, the rest were associated with markers whose genomic location was unknown. The QTL identified for the PTRTF infection type was located on chromosome 7A at 132.92 cM. QTL of TTKTK infection type was on chromosome 6B at 43.284 cM and QTL of TKTTF infection type was on chromosome 1D at 64.821 cM.
Bonferroni correction of the imputed SNPs was performed as with the original SNPs, but with this difference that, the number of markers used in this data was 43921 markers, so 0.05 was divided into these markers number (43921) and the level of significance was considered 1.1384E-06 and the results are shown in Table 9. In this regard, the TTTTF infection type was controlled by six QTLs, which one QTL on chromosome 1, two QTLs on chromosome 3 and one QTL on chromosome 5 of genome B was located; the other two QTLs were located on QTL was detectable, but no QTL was recorded for the infection type. The identified QTLs of the latent period were located on A (chromosomes six, three, five and seven, respectively) and B genomes (chromosomes one and three, respectively) and a QTL was also associated with markers with unknown genomic loci. In the TKTTF, only three QTLs associated with the infection type were identified, two QTLs were located on chromosome 1B at 11.376 cM and one QTL was located on chromosome 1D at 64.821 cM.  TTTTF  PTRTF  TTKTK  TKTTF  IT  LP  IT  LP  IT  LP  IT  LP  rs46629 1D 64   TTTTF  PTRTF  TTKTK  TKTTF  IT  LP  IT  LP  IT  LP  IT Un   Sequence  TGCAGACGGCGGCGGCGCGAAC  AAGCATGGTGGCGAAGCCGACA  CGGAGGAGGATCTCGGCGAG  TGCAGCAAGCTAAACCAAGTCT  TGAAACTGGAGCAGCGAACAAT  TCTAAAGTCGCTTGAGGCTC  TGCAGCCAGAGCTCAAGAGGCC  CCCTCGGCGGCATCTCAAAGGA  GAAGGAAGGAGGAAGGCGAA  TGCAGCCGCCCGTCCACCCCCT  TCGTCTTCGTCCAGCACCAACC  CGTCGCCAGCGTCGCCGTCA  TGCAGCGTGTAGATTACCATGC  TATTCACACAGAAGTTATCGGG  TGTATATGTCATCAACTTTT  TGCAGGCAGCTGACTGTACTGG  AGGGCCACACATTTATCGAAAC  CCTGCTGCCTGCCCTGGTTC  TGCAGGCAGCTGACTGTACTGG  AGGGCCACACATTTATCGAAAC  CCTGCTGCCTGCCCTGGTTC  TGCAGGCATTCCATCAAGCAAC  CAGCAAGTCAGCACCAGAATCA  CCCCAATCAGACATGAGTGA  TGCAGGGGGTAGCCCTGAGCGC  CGCCAAATCAGATCAAACGATG  CCCTTCACTACATAAGCCAG Marker rs4500 rs12408 rs21674 rs25281 rs33960 rs46339 rs46340 rs46629 rs51316

Discussion
The characteristics of the races used to evaluate seedling stage and the effective and ineffective genes for each race are presented in Table 11. These races are known as the dominant black rust races in the country collected during 2015 and 2016 and with inoculated on 20 lines and differential cultivar of North American [20] the race of those have been determined. Isolate 95-2 was pathogenic to the Sr31 resistance gene, which for a very long time caused resistance to Pgt. This isolate belonged to the Ug99 race group and was named TTKTK.
The results of combined analysis of variance showed that in both traits of infection type and latent period, the highest total sum of squares was explained by genotype effect (56.55% and 49.79%, respectively); therefore, it is inferred that genotypes have had a significant effect on data variation. Then, the highest justified variance of infection type (34.40%) and latent period (32.98%) were due to genotype-race interaction. The race effect for infection type and latent period explained 3.75% and 7.85% of total variation, respectively. The small effect of race indicates a relatively low diversity among the races under study.
The values of the genetic and phenotypic variation coefficients for all races infection types and latent periods were very close to each other; this indicates the high impact of genes on creating diversity among genotypes. Each of the phenotypic, genetic, and environmental diversity coefficients provides useful information about the genetic or environmental diversity observed. As can be seen in the results (Table 3), the coefficient of phenotypic variation was higher than the genetic variation coefficient in all studied infection types and latent periods.
This is due to the influence of environmental factors. Since the phenotypic variance is derived from the sum of environmental and genetic variance, so if the genetic variance is assumed to be constant, what causes one trait to differ in phenotypic and genetic variation would be environmental variance; In other words, if the trait has high phenotypic variation and low genetic diversity, this indicates the effect of the environment. The greater the variation ratio of genetic to the environment, the greater the efficiency of selection, and the identification and selection of favorable genotypes from unfavorable will be more accurately performed.
Genetic variation coefficient reflects the variation among genotypes in terms of specificity under study and alone is not able to determine the extent of inheritance of this variation. This index, along with heritability, provides a good estimate of genetic progress in phenotypic selection [21]. Simultaneous application of two very important parameters of heritability and genetic efficiency plays an important role in the development of cultivars and genotypes. The high rate of genetic efficiency indicates additive gene action and its low level indicates nonadditive gene action. When high heritability is estimated for a trait, genetic efficiency will not necessarily be high. If high heritability and high genetic efficiency coincide, it will reflect the additive effects of genes; but if high heritability is associated with low genetic function, it would indicate epistatic effects or dominance [22]. If trait heritability is less than 0.2 (20%), it indicates low heritability, if it is between 0.2 (20%) to 0.5 (50%) it has moderate heritability and if it is more than 0.5 (50%) it has high heritability [23]. High heritability with favorable genetic efficiency were observed in the PTRTF infection type and latent period (Table 3)  to races which disagrees with previous studies [24][25][26][27].
Population structure has been used in genetic studies to explain relationship between individuals inside and between populations and shows a perspective on evolutionary relationship of individuals in a population. In addition existence of structure in population that studied for association mapping, it is a deterrent to achieving reliable result and caused on positive results in population that is not considered by the effects of population structure factors and relationship [28]. Out of 282 genotypes in both groups the original and imputed datasets, 277 genotypes (98.22%) belonged to the 3 identified subpopulations and the remaining 5 genotypes (1.78%), including 1 cultivar and 4 landraces, were excluded from the mixed genotypes. The number of these genotypes that were in mixed group, was small that indicating very low mixing in the population under study. Allelic incorporation in landraces of more than one single gene cohort could have caused a slight mixing, that due to the distribution of wheat in more than on ancestral population [29]. Another reason for mixing may be the presence of gene flow from the introduction of new genotypes into farms in the past. In this regard, germplasm exchange between different Mediterranean regions due to the development of ancient empires is considered as another possible reason for mixing [30].
Geographical original of genotypes, selection and genetic drift cause subpopulation within a large population [31,32]. The reason some cultivars fall into associated groups of landrace is that some of cultivars have been selected and introduced based on indices from landrace and thus have a high genetic similarity to landrace [33].
The extent of linkage disequilibrium (LD) determines the number of markers needed to identify marker-trait association as well as mapping resolution [31]. On the other hand, in genome wide association mapping, locating QTLs is based on the extent of the LD and is therefore of particular importance [34]. Generally with increasing genetic distance, LD respectively. Nearly 87% of markers pairwise and 90% of significant markers pairwise in commercial cultivars and in landraces nearly 87% of markers pairwise and 88% of significant markers pairwise in original SNPs had genetic distance less than 10 cM. In the imputed SNPs, all the markers pairwise and the significant markers pairwise related to the commercial cultivars and landraces were located less than 10 cM. This issue indicants a high impact of LD on population, but as can be seen in result, with increasing distance between markers pairwise, LD in the population have decreased but not disappeared; this is due to other factors such as population structure, genetic drift, migration, selection and mutation. As can be seen in the results, the amount of LD in the cultivars was higher than landraces. The reason for this increase in LD in cultivars is the selection, whether natural or synthetic, which causes a LD among the genes selected and the genes associated with it. In addition, selection of the ascending (or descending) trait controlled by two or more non-linkage genes, despite the high genetic (physical) distance among the genes, increases the LD [35]. The relatively high level of LD in many chromosomal regions in the population indicates that association mapping can be a useful and effective method for identifying and confirming QTLs in these regions [36].
Since association mapping was introduced in plants [10], Due to significant advances in DNA sequencing technology, interest in the identification of new genes has gained popularity. The application of this method in plant populations to identify loci responsible for genetic variation including resistance to disease is increasing [37][38][39]. After Bonferroni correction aimed at identifying markers that are highly correlated with the target trait, the number of QTLs identified reduced in the original dataset, so the total of the QTLs for the studied races infection type were 3 and for the latent period 7 QTLs remained.
In the imputed dataset, the number of QTLs of the infection type was reduced to 10 QTLs and the number of QTLs of the latent period decreased to 71 QTLs. To answer the question of whether QTLs identified with specific genomic regions in this study have been previously introduced or not? The results of this study were compared with previous findings. Some of the QTLs identified in this study corresponded to the genomic regions of the Sr genes and the QTLs reported in previous studies [40,41]. B chromosomes were fined chromosomes carrying rust resistance genes in wheat, which has been reported in various studies [19,27,40,42], in the present study, only one marker (rs23510) was identified based on the results of On the other hand, in this chromosomal position, the Sr14 gene is resistant to some Pgt races in wheat [43], but in this study a significant effect on the infection type of each race was observed. Specific study of TTKTK infection type belonging to the Ug99 group and reported in the study of Jin et al. [44] did not show any effect. At 106.99 cM is located on chromosome 1B marker rs5923. This marker is related to the genomic region that caused seedling resistance to the TTTTF and was reported in the study by Letta et al. [19]. Not only the Pgt resistance gene is located in this region, but also the Lr46/Yr29/Pm39 gene block and the unknown resistance genes are in the adult plant stage [45,46]. A number of Pgt resistance genes, including Sr9, Sr16, and Sr28 [18] and SrWeb, are located on the long arm of chromosome 2B. The SrWeb gene causes resistance to Ug99, while none of the four Sr9 alleles has the ability [44]. No resistance to TTKTK was found on chromosome 2B; the with genomic A chromosomes. The Sr34 and Sr38 Pgt genes are located on chromosome 2A [19] and are somewhat close to the rs17477 marker in genomic position, but because the resistance of these two genes is broken by the studied races and as a result, they were inactivated against rust disease, with no QTL-associated region associated with the infection type. Therefore, this marker can be used in marker assisted selection to identify Sr34 and Sr38. In a study, the coexistence of two markers cfa2201 and wPt-5839 with two Sr34 and Sr38 Pgt resistance genes was reported [19]. These two genes are ineffective against the Ug99 race group, one of which (TTKTK) has been investigated in the present study (Jin et al., 2007;. The Sr7 and SrND643 resistance genes (probably the Sr7 allele) are located on the long arm of chromosome 4A [18,47]. This gene caused resistance to the PTRTF race under study, but no marker-trait association was found in this study. Other studies have also mapped genomic regions associated with resistance to black rust disease in genome A such as marker wpt-734078 [45] and wpt-6869 [48] on chromosome 1AS.
The D genome had the lowest number of marker-trait association. In the original dataset between the infection type and the latent period of all the races under study, only one QTL was identified for the TKTTF infection type associated with the rs46629 marker on the 1D chromosome. In the imputed dataset, this marker retained its association with the TKTTF infection type, and the only marker had the marker-trait association on all TKTTF D genome chromosomes. The TTKTK had no marker-trait association with the D genome. For the TTTTF infection type, 2 QTLs were on the 2D and for the latent period 1 QTL was on 1D, 3 QTLs on 4D and 2 QTLs on 7D. In PTRTF, 1 QTL was observed on 1D, 3 QTL on 4D and 3 QTL on 7D. Edae et al. [49] in their study reported QTHJC resistance-related IWB17135 marker and TPMKC resistance-related IWA2415 marker on the short arm of 2D chromosome.
Tsilo et al. [50] mapped a QTL on the 2D chromosome at 78.50 cM, which was 1.1 cm from the Sr6 Pgt resistance gene.
In the original dataset, there were 6 QTLs on unknown chromosomes, 2 QTLs for TTTTF latent period, 3 QTLs for PTRTF latent period and 1 QTL for TTKTK latent period. In the imputed dataset, 2 QTLs associated to PTRTF latent period and 1 QTL associated to TTKTK latent period that overlapped with the original dataset were identified.
The markers that had the highest marker-trait association were blasted at the Ensembl database to identify overlapping genes, their molecular function and biological processes, as well as to identify orthologous genes. The results indicate that the genes adjacent to the markers play an important role in biosynthetic pathways such as ion transport, oxidation reduction, protein processing, phosphorylation, and so on. In general, environmental stresses cause significant changes in the expression levels of many of these genes in plants, so such changes result in the accumulation or reduction of important metabolites, changes in enzyme activity and protein synthesis rates, as well as the production of new proteins [51] that deal with environmental stress in various ways. ROS, phospholipid-derived signals, and cyclic nucleotide-dependent signals, as well as some plant hormones, are involved in stress signaling [52,53]. When the plant is attacked by the pathogen, in response to the pathogen attack, ROS is produced in the plants and acts as an important sign of stress and reduces the amount of stress damage possible by activating defense mechanisms [52][53][54]. Membrane lipids by regulating membrane fluidity and other physicochemical properties cause secondary signaling molecules in response to stress. Biosynthesis of lipids and their degrading enzymes play several roles such as direct or indirect regulation or effect on stress signaling and tolerance [52]. In plants, a variety of biotic and abiotic stresses such as pathogens, drought, salinity, high temperature, intense light, hormones, and nodulation factors cause changes in cytosolic calcium levels and lead to stress transmission [55,56]. In many transcriptional signaling pathways, the major form of signal transduction is reversible protein phosphorylation. Protein kinases such as mitogen-activated protein kinase (MAPK) are essential in stress developmental, hormonal, biotic and abiotic signaling. Protein phosphatases are responsible for the dephosphorylating of phospho-proteins. Protein phosphatases are subdivided into serine/threonine phosphatase, tyrosine phosphatase, and dual phosphatases through their specific substrates, which play an important role in serine/threonine phosphatase transcriptional stress transcription [52]. Eventually, protein kinases are likely to target transcription factors and bind to the stress-responsive genes promoter and consequently activate transcription [52].
After identifying the nearest wheat gene or genes to the markers, orthologous genes with these genes were examined. Numerous orthologous genes were observed but only genes with high percentage of identity were selected. The orthologous genes identified were generally related to two species of Triticum dicoccoides, Aegilops tauschii, ancestors of wheat and during the evolutionary stages of wheat with hybridization, they have entered it [57][58][59][60][61][62]. These orthologous genes had a high percent of identity with the genes identified in wheat that are involved in stress resistance mechanisms.

Conclusion
Association mapping of resistance of Iranian bread wheat cultivars and landraces to four Pgt races in seedling stage in terms of latent period and infection type. The studied germplasm was grouped into three sub-populations. Based on the results of original and imputed dataset, the highest number of marker pairwise in cultivars and landraces was observed on B genomes.
In the original dataset, the highest linkage disequilibrium was observed in D genome of cultivars and landraces but in imputed dataset, was observed in genome A of cultivars and Inoculated plants were examined from the second day to record the latent period (LP).
Fourteen days after inoculation, seedling infection type (IT) was recorded using the 0-4 scale introduced by Stakman et al. [17] and modified by McIntosh et al. [18]. To ensure the accuracy of the data collected, two replicates were considered for plant materials inoculated with each isolate. ITs less than or equal to 2+ were considered low infection types whereas ITs greater than or equal to 3 were considered high infection types. In order to use the modified Stakman ITs in the genome-wide association studies (GWAS), the 0-4 scale was converted to a 1-13 linear. The average linear scale score across the two replications was used in the AM analyses. Simple and combined analysis of variance (ANOVA) on the linearized 1 to 13 ratings was performed using SAS software 9.3 [67] and genetic parameters were calculated using Excel.
The races grouping were performed according to the estimated infestation type of 0-4 scale (1-13 ratings) on Ward's method using SPSS software. Cluster count (cut line) was also determined based on the discriminant function.

Genotyping by sequencing and imputation method
Genotypic evaluation of wheat genotypes were performed in collaboration with the US Ministry of Agriculture and the University of Kansas. The genomic library was constructed according to the method of Poland et al. [68]. To do this, DNA was extracted, normalized, and digested with PstI and MspI enzymes. Then, barcode adapters were attached to each sample, so that each sample has a different barcode. To remove extras except barcoded genomic DNA, purification was performed by QIAquick (Qiagen) PCR purification kits.
Finally, the size of the amplified fragments between 250-300 bp was determined on the E-gel system and sent for sequencing by the Ion Proton system. Sequencing data were treated for 64 bp and the same reads were grouped into tags. Identical sequence tags were aligned to identify the SNPs within the tags and the SNPs were summoned using the UNEAK (Universal Network Enabled Analysis Kit) GBS pipeline [69] as part of the TASSEL 4.0 bioinformatics package [70]. To reduce false positive error results, SNPs with heterozygosity greater than 10%, minor allele frequency less than 5%, and missing data greater than 20% were eliminated. After specifying the haplotype phase for all individuals, the data was subjected to imputation using BEAGLE v3.3.2 [71] based on available allele frequencies obtained. During imputation, four different reference genomes were assessed that among them W7984 reference genome was shown to have the greatest imputation accuracy [72]. The different chromosomes LD decay was obtained using the ggplot2 package in RStudio [73] based on LOESS regression.

Population structure and kinship matrix
In order to determine the genetic structure of the population of wheat genotypes and to identify sub-populations, Bayesian methods using STRUCTURE v2.3.4 [74] was used. The optimal number of K was determined by an admixture model and with a burn-in and simulation phase consisting of 30,000 steps for values of K from 1 to 10. The ΔK statistic was used to determine the most desirable subpopulation number and its plot was plotted for consecutive K values. The observed and expected values allele frequencies were used to estimate the linkage disequilibrium between markers in TASSEL v.5 [70]. Then, population structure matrix Q (n×p), where n is the number of genotypes assayed and p is the number of subpopulations defined, was used in association studies. Cluster analysis and principal component analysis were also performed to determine the genetic relationships between the genotypes.

Genome-wide association study
Both general linear model (GLM) and mixed linear model (MLM) were used to examine the accurate association between marker and trait in TASSEL [70]. The GAPIT package [75] was also used to perform association mapping in both GLM and MLM methods in RStudio [73].
The results of TASSEL and GAPIT were investigated using t-test. Based on the results, it was found that the general linear model derived from TASSEL provides more accurate information about the marker-trait association.

Gene annotation
Sequences surrounding of significant SNP markers were obtained from the wheat 90 K SNP database [76] and used for assessing gene annotation using Gramene (http://www.gramene.org/) by aligning them to the IWGSC RefSeq v1.0 annotation (https://wheat-urgi.versailles. inra.fr/Seq-Repository/Annotations). The function of putative genes was explored by investigating the pathways which the encoded enzymes were involved in. After aligning SNPs sequences to the reference genome, overlapping genes with the highest identity percentage and blast score were selected for further processing. The ontology of each adjacent genes with T. aestivum, including molecular function and biological process and also orthologous genes in related species, were extracted from the ensemble-gramene database (http://ensembl.gramene.org).

Supplementary information
Additional file 1 Table S1. The information of Iranian wheat genotypes including cultivars