Genetic diversity with pheno-morphological markers
The results of the analysis of variance based on BLUE and BLUP, together with descriptive statistics, are presented in Table 1 separately for the two years of the experiment (E1 and E2). The genotypic effect was significant for all traits in both years, except for grain yield in the first year. In both crop years, the estimated genotypic variance was highest for grain yield, and the coefficient of variation for grain yield was also higher than for the other traits. In the first year, heritability was highest for plant height (0.93) and lowest for grain yield (0.33). In the second year, heritability was lowest for grain yield (0.45) and highest for thousand-grain weight (0.91) (Table 1).
The GT bi-plots are shown separately for the two experiment years in Fig. 2. Based on the first year's data, the first two components explained 52.92% of the total genetic diversity among genotypes. In the polygon drawn on this bi-plot, genotypes No. 3, 4, 38, 77, 24, 43, and 86 were located at the vertices. Genotype No. 43 had the highest thousand-kernel weight, genotypes No. 3 and 4 had the highest plant height, and genotype No. 86 had the highest grain yield (Fig. 2A). In the bi-plot based on the second year's data, the first two components accounted for 33.22% and 26.53% of the total variation in the studied traits among genotypes, respectively (Fig. 2B). Genotypes No. 18, 14, 91, 44, 72, 41, 45, 47, and 65 were placed at the vertices of the polygon. As shown in Fig. 2B, genotype No. 72 was superior to the other genotypes in thousand-kernel weight, whereas genotypes No. 47 and 65 were the best in terms of plant height and number of days to physiological maturity. Genotype No. 41 was identified as the latest genotype. In terms of grain yield, genotypes No. 9, 11, and 86 had better potential than the other genotypes.
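The polygon of a GT bi-plot is the convex hull of the genotype scores on the first two components, so the vertex genotypes can be recovered with any convex hull routine. The following is a minimal sketch using Andrew's monotone chain algorithm. The function name `polygon_vertices`, the genotype labels, and the coordinates are illustrative assumptions and are not values taken from Fig. 2.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def polygon_vertices(scores):
    """Return the labels of genotypes lying on the convex hull.

    `scores` maps a genotype label to its (PC1, PC2) score pair.
    Labels are returned in counter-clockwise order.
    """
    pts = sorted((xy, label) for label, xy in scores.items())
    if len(pts) <= 2:
        return [label for _, label in pts]

    def half_hull(points):
        hull = []
        for xy, label in points:
            # Drop the last point while it makes a clockwise or straight turn.
            while len(hull) >= 2 and cross(hull[-2][0], hull[-1][0], xy) <= 0:
                hull.pop()
            hull.append((xy, label))
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each half is the first point of the other half.
    return [label for _, label in lower[:-1] + upper[:-1]]


# Hypothetical scores: four extreme genotypes and two near the origin.
example_scores = {
    "G1": (-2.0, 0.0),
    "G2": (0.0, 2.0),
    "G3": (2.0, 0.0),
    "G4": (0.0, -2.0),
    "G5": (0.5, 0.3),
    "G6": (-0.4, -0.2),
}
vertex_genotypes = polygon_vertices(example_scores)
```

In this toy input only G1 to G4 end up on the polygon, which mirrors how a bi-plot singles out the genotypes with the most extreme trait profiles.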
Molecular diversity by SilicoDArT markers
SilicoDArT and SNP markers with very high genome coverage were used to study genomic diversity in a panel of 92 durum wheat genotypes. These markers are a valuable and powerful tool for future GWAS analyses. The germplasm will also be a valuable resource for mining alleles and selecting genotypes that carry resistance genes to various biotic and abiotic stresses through genome-wide linkage mapping. Candidate SilicoDArT and SNP markers can be used to establish marker-trait associations (MTAs) for specific genes or major QTLs controlling valuable traits in durum wheat. The informativeness of these markers was assessed by their polymorphism information content (PIC). PIC provides information about the variation of a gene or DNA segment in a population and can indicate evolutionary pressure on an allele and mutations that may have occurred at a locus over time. When a marker is scored as 0 and 1 in a 50:50 ratio, PIC reaches its maximum value of 0.5. In this research, 7882 SilicoDArT and 8948 SNP high-quality polymorphic markers were retained after filtering from 62269 and 54543 markers, respectively, and used to analyze genetic diversity and population structure in the set of 92 durum wheat genotypes. The average PIC value of the SilicoDArT markers was 0.38, which indicates the high efficiency of these markers for measuring the genomic diversity of durum wheat (Table 2). The distribution of PIC values among these markers was asymmetric and skewed towards values above the average, with about 60% of the markers showing a PIC above the average (Fig. S1). The average PIC of the SNP markers was 0.25. The lowest PIC was observed on chromosome 2A (0.235) and the highest on chromosomes 6A and 5B (0.261).
The distributions of PIC values for SilicoDArT and SNP markers are shown in Fig. S1 and Fig. S2, respectively. The distribution of PIC values among markers was non-uniform: more than 3000 SilicoDArT markers had PIC values close to 0.5, and about 75% of SilicoDArT markers had PIC values greater than 0.3. In addition to PIC, other quality parameters, such as the call rate and reproducibility of each marker within the panel, were also estimated. The average reproducibility of both SilicoDArT and SNP markers was above 0.98. The average call rates of SilicoDArT and SNP markers were 0.92 and 0.98, respectively (Table 2). Across the 14 linkage groups, SNP markers were more frequent than SilicoDArT markers: on average, each chromosome carried 563 SilicoDArT markers and 639 SNP markers. The lowest numbers of SNP markers were observed on chromosomes 4A and 6A, and the highest number on chromosome 2B. Chromosome 7B showed the highest marker density, with an average of 1.14 SilicoDArT markers per Mbp.
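To make the PIC and call-rate figures concrete, the sketch below computes both for a single marker scored 0/1. It assumes that PIC is calculated as one minus the sum of squared score frequencies, which is consistent with the stated maximum of 0.5 at a 50:50 ratio; the text does not give the formula explicitly. The function names and the toy score vectors are hypothetical, and missing calls are represented by `None`.

```python
def call_rate(scores):
    """Fraction of genotypes with a non-missing score (None = missing call)."""
    return sum(s is not None for s in scores) / len(scores)


def pic(scores):
    """PIC of a 0/1-scored marker, assumed here to be 1 - (p^2 + q^2).

    p is the frequency of score 1 among called genotypes and q = 1 - p.
    """
    called = [s for s in scores if s is not None]
    if not called:
        raise ValueError("marker has no called genotypes")
    p = sum(called) / len(called)
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


# Toy markers (illustrative only).
balanced = [1, 0, 1, 0]          # 50:50 ratio, PIC at its maximum
skewed = [1, 1, 1, 0]            # 75:25 ratio
with_missing = [1, 0, None, 1]   # one missing call out of four

pic_balanced = pic(balanced)
pic_skewed = pic(skewed)
rate_missing = call_rate(with_missing)
```

Under this formula the balanced marker reaches 0.5 and the 75:25 marker drops to 0.375, so an average of 0.38 corresponds to markers that are, on average, fairly evenly split between the two scores.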
Kinship coefficients between pairs of genotypes
Based on the kinship coefficient matrix, the investigated genotypes were divided into three groups according to the genotypic data. About 60% of the pairwise kinship coefficients were greater than 0 and no more than 0.1, about 39% were greater than 0.1 and no more than 0.5, and only about 1% were greater than 0.5. These results indicate that there is very little kinship among the studied genotypes (Fig. 3).
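The shares reported above can be obtained by binning the off-diagonal entries of the kinship matrix. The following sketch shows one way to do this, assuming a symmetric matrix stored as a list of lists; the function name `kinship_class_shares`, the class labels, and the 4 x 4 toy matrix are illustrative assumptions.

```python
def kinship_class_shares(kinship):
    """Share of genotype pairs falling into each kinship class.

    Only the upper triangle (i < j) is used, so every pair is counted once
    and the diagonal (self-kinship) is ignored.
    """
    n = len(kinship)
    counts = {"<=0": 0, "(0, 0.1]": 0, "(0.1, 0.5]": 0, ">0.5": 0}
    for i in range(n):
        for j in range(i + 1, n):
            k = kinship[i][j]
            if k <= 0:
                counts["<=0"] += 1
            elif k <= 0.1:
                counts["(0, 0.1]"] += 1
            elif k <= 0.5:
                counts["(0.1, 0.5]"] += 1
            else:
                counts[">0.5"] += 1
    total_pairs = n * (n - 1) // 2
    return {label: c / total_pairs for label, c in counts.items()}


# Toy symmetric kinship matrix for four genotypes (hypothetical values).
toy_kinship = [
    [1.00, 0.05, 0.08, 0.20],
    [0.05, 1.00, 0.00, 0.60],
    [0.08, 0.00, 1.00, 0.09],
    [0.20, 0.60, 0.09, 1.00],
]
shares = kinship_class_shares(toy_kinship)
```

For a panel of 92 genotypes the upper triangle holds 92 x 91 / 2 = 4186 pairs, so each percentage point corresponds to roughly 42 pairs.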
Bayesian clustering
The K and Delta K statistics were extracted and plotted, and the resulting two-dimensional diagrams clearly showed the curve peaking at K = 4 (Fig. S3). Accordingly, the durum wheat genotypes were grouped into four separate subpopulations based on the Bayesian model (Fig. 4). The four subpopulations showed relatively high genetic diversity, ranging from 0.02 in POP4 to 0.34 in POP3. Net nucleotide distance, a parameter that measures genetic diversity among populations, was highest (0.270) between POP2 and POP4 and lowest (0.07) between POP1 and POP2 (Table 3).
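The choice of K from the Delta K curve can be illustrated with a short sketch. It assumes that Delta K is the statistic of the Evanno method, computed here in a commonly used simplified form: the absolute second difference of the mean log-likelihood across replicate runs, divided by the standard deviation of the log-likelihood at that K. The function name `delta_k` and all log-likelihood values are made up for illustration and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(ln_prob_by_k):
    """Delta K for every K that has both neighbours K - 1 and K + 1.

    `ln_prob_by_k` maps K to the list of ln P(D) values from replicate runs.
    Assumes at least two runs per K with non-zero spread, otherwise the
    standard deviation is undefined or zero.
    """
    means = {k: mean(values) for k, values in ln_prob_by_k.items()}
    result = {}
    for k in sorted(ln_prob_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(ln_prob_by_k[k])
    return result


# Hypothetical ln P(D) values from three replicate runs per K.
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-850.0, -852.0, -848.0],
    4: [-700.0, -701.0, -699.0],
    5: [-690.0, -695.0, -685.0],
    6: [-685.0, -690.0, -680.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
```

In this toy input the likelihood rises sharply up to K = 4 and then flattens, so Delta K peaks at K = 4. The first and last K values have no Delta K because the second difference needs a neighbour on each side.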
Analysis of linkage disequilibrium (LD)
Linkage disequilibrium (LD) was evaluated using 605212 pairs of DArTseq markers with known map positions along the 14 durum wheat chromosomes. Pairwise LD values were estimated as the squared allele frequency correlation (r²) between markers. To identify differences in intra-chromosomal LD, the average r² values between marker pairs were divided into five groups according to the genetic distance between the markers: very tightly linked markers (less than 5 cM), closely linked markers (5-10 cM), moderately linked markers (10-12 cM), weakly linked markers (20-50 cM), and independent markers (more than 50 cM). Intra-chromosomal LD decreased with increasing genetic distance. LD was also investigated at the genome level: the average decay of LD was obtained by plotting intra-chromosomal r² values against the genetic distance between markers (Fig. 5). The amount of LD within chromosomes, based on both SilicoDArT and SNP markers, is shown for both durum wheat genomes (A and B) in Table 4. LD was extensive for both marker types, with 605212 significant marker pairs observed in the entire population. The B genome had a higher number of significant marker pairs (82052) than the A genome (68885). The average r² was 0.14 for the whole population and 0.12 and 0.11 for the A and B genomes, respectively (Table 4). A graphical analysis of the distribution of LD among markers is shown in Fig. 5. In general, 15 kb of the genome was covered by LD. The highest LD was observed on chromosome 4B, and some of the markers on this chromosome are shown in Fig. 6.
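The two steps described above, computing r² for a marker pair and averaging r² within distance classes, can be sketched as follows. This is only an illustration under stated assumptions: markers are coded 0/1, r² is taken as the squared Pearson correlation of the two score vectors, and the class boundaries passed as `edges` (5, 10, 20, and 50 cM) are illustrative values rather than the exact classes used in the study. All function names and data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def r_squared(a, b):
    """Squared Pearson correlation between two equally long 0/1 score vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic marker: r2 is undefined")
    return cov * cov / (var_a * var_b)


def mean_r2_by_distance(pairs, edges):
    """Mean r2 per distance class.

    `pairs` holds (distance_cM, r2) tuples; `edges` are ascending class
    boundaries in cM. Class 0 is below the first edge, class len(edges)
    is at or above the last edge.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for dist, r2 in pairs:
        idx = bisect_right(edges, dist)
        sums[idx][0] += r2
        sums[idx][1] += 1
    return {idx: total / n for idx, (total, n) in sorted(sums.items())}


# Two toy markers scored on six genotypes (hypothetical).
marker_1 = [1, 1, 0, 0, 1, 0]
marker_2 = [1, 1, 0, 0, 0, 1]
r2_pair = r_squared(marker_1, marker_2)

# Toy (distance, r2) pairs and illustrative class boundaries.
toy_pairs = [(2.0, 0.8), (4.0, 0.6), (7.0, 0.3), (60.0, 0.02)]
class_means = mean_r2_by_distance(toy_pairs, edges=(5, 10, 20, 50))
```

Plotting the class means (or the raw r² values) against distance gives the kind of decay curve that is used to judge how far LD extends along a chromosome.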
Genome-Wide Association Study
The association analysis was performed using markers with a frequency of more than 10%, and P-values were assessed with 1000 permutations. The lowest P-value was used as the basis for selecting associated markers. The distribution of markers was examined based on the coefficient of determination of each marker in the regression model; the coefficient of determination (r²) is the proportion of phenotypic variance accounted for by the QTL at each locus. Association mapping of days to heading, days to physiological maturity, plant height, thousand-kernel weight, and grain yield of durum wheat was performed with FarmCPU using the genotypic and phenotypic data. To determine the significant marker-trait associations (MTAs), Q-Q plot analysis was also performed, and markers were finally selected according to their -log10(p) values. The Manhattan plots for all measured traits in the two cropping years are shown in Fig. 7 and Fig. 8, respectively. The full list of significant marker associations for all traits in both years of the experiment is presented in Tables 5 and 6. The genome-wide association study by the FarmCPU method in 2017 showed that 19 markers were significantly associated with the studied traits (Table 5). The results of association mapping for the first year of the experiment are summarized as follows:
- Two significant associations were identified for the number of days to heading, with markers 1703829 and 1087984 located on chromosomes 6B and 7A, respectively.
- Marker 3940462 on chromosome 4A was significantly associated with days to maturity.
- Ten significant associations were identified between plant height and DArT markers. These markers were located on chromosomes 3A, 3B, 4B, 5A, 5B, 6A, 6B, and 7B.
- Four markers significantly associated with thousand-grain weight were identified (3570140, 992437, 3534094, and 3533907) on chromosomes 7A, 3A, 5B, and 3B, respectively.
- Grain yield had three significant associations, with markers 1108172, 4989009, and 1233550 located on chromosomes 3A, 2B, and 1B, respectively.
The results of genome-wide association analysis by the FarmCPU method in 2018 indicated that ten markers had a significant association with the studied traits (Table 6). The summary of the results was as follows:
- Association mapping for the number of days to heading revealed two significant markers (3935863 and 1211191) located on chromosomes 1A and 5B, respectively.
- Three markers (3064874, 1034732, and 1025860) at loci 794315843, 673038094, and 666991274 on chromosomes 3B, 4A, and 4A, respectively, were significantly associated with the number of days to physiological maturity.
- Plant height was significantly associated with marker 1057654 at locus 691163974 on chromosome 7A. By contrast, ten significant associations were identified for plant height in the previous year.
- Thousand-grain weight was significantly associated with markers 3025786 and 981221, located at loci 609888794 and 684637596 on chromosomes 6A and 3B, respectively.
- Two significant associations were identified between grain yield and DArT markers. These markers are located on chromosomes 6A and 7A, respectively.
Overall, the same marker (1057654) was identified for both grain yield and plant height in the second year of the experiment. This marker is located at locus 691163974 on chromosome 7A. By comparing the significant marker-trait associations with the durum wheat consensus map, the position of each marker can be identified and checked against physical maps. If the marker distance is more than 10 cM in either direction, the position can be regarded as a new QTL. Based on this criterion, 29 QTL positions were identified and confirmed in this study. Of these loci, 9 were not previously recorded in the durum wheat consensus map (Table 7).
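The 10 cM rule for declaring a new QTL can be written as a simple distance check. The sketch below assumes that the distance is measured to previously mapped QTL on the same chromosome of the consensus map, which the text implies but does not state. The function name `is_new_qtl`, the chromosome labels, and all positions are hypothetical examples, not entries from Table 7.

```python
def is_new_qtl(chrom, pos_cm, known_qtl, window_cm=10.0):
    """True if no known QTL on the same chromosome lies within `window_cm`.

    `known_qtl` is a list of (chromosome, position_cM) tuples. A locus on a
    chromosome without any known QTL is treated as new.
    """
    same_chrom = [k_pos for k_chrom, k_pos in known_qtl if k_chrom == chrom]
    return all(abs(pos_cm - k_pos) > window_cm for k_pos in same_chrom)


# Hypothetical consensus-map QTL and candidate loci.
known = [("7A", 120.0), ("3B", 45.0)]
candidates = [("7A", 125.0), ("7A", 140.5), ("4A", 30.0)]

status = {c: is_new_qtl(c[0], c[1], known) for c in candidates}
```

Here the first candidate lies 5 cM from a known QTL and is treated as already recorded, whereas the other two are flagged as new because no known QTL falls within 10 cM on their chromosome.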