Genetic diversity with pheno-morphological markers
One of the primary goals of any breeding program is to identify and select the desired genotypes in terms of various traits(Mohammadi et al. 2014). To achieve this goal, the traits used should be clearly defined in terms of quantity and quality, so that in combination with important economic traits, they can be used to maximize the Grain yield of the selected genotypes. In this regard, several statistical methods, including correlation analysis, multiple regression analysis, and path coefficient analysis, have been proposed and widely used by experts and breeders. In the meantime, one of the most important statistical methods introduced to achieve this goal is genotype-trait graphical analysis (GT), which is obtained through analysis into principal components(Yan 2001). In this method, the bi-plot used is obtained through the first two components extracted through the analysis of eigenvalues on the desired attribute data in several environments (year, place, or a combination of year and place). Although this method was introduced to analyze multi-environment experiments, it can be used for any type of data that has a two-way structure(Yan 2001). In general, according to Yan and Rajcan (2002), the desired genotypes identified through this analysis can be used as parents in cross-breeding programs or directly for the production of commercial cultivars. In the present research, the GT charts related to 94 studied durum wheat genotypes based on the studied traits, separated by the two years of the experiment, are shown in Fig. 2. Based on the first year's data, 52.92% of the total genetic diversity among genotypes was justified by the first two components. By justifying the polygon diagram drawn in this bi-plot, genotypes No. 3, 4, 38, 77, 24, 43, and 86 are located at the vertices of the polygon. Therefore, it can be said that genotype No. 43 had the highest value in terms of thousand kernel weight, and genotypes No. 3 and 4 had the highest value in terms of plant height. Also, the highest amount of grain yield belonged to genotype No. 86 (Fig. 2A). The bi-plot drawn based on the obtained data from the second year of the experiment showed that the first two components accounted for 33.22 and 26.53 % of the total variation related to studied traits among genotypes, respectively (Fig. 2B). Based on the second year's data, genotypes No. 18, 14, 91, 44, 72, 41, 45, 47, and 65 were placed in the vertices of polygons. As shown in Fig.2B can be seen, genotype 72 was superior to other genotypes in terms of thousand kernel weight. On the other hand, the best genotypes in terms of plant height and the number of days to physiological maturity were genotypes No. 47 and 65. Genotype No. 41 was also identified as the latest genotype. In terms of grain yield, genotypes No. 9, 11, and 86 had better potential than other genotypes. Considering that the bi-plots showed all the available changes did not justify, these predictions may not accurately reflect the observed numbers. However, the predictions are very close to reality and it is possible to select genotypes with superior values compared to previously. Yan and Kang (2002) in soybean, Yan and Frégeau-Reid (2008) in oats, Dehghani et al. (2008) in winter rapeseed, and Rahmati et al. (2020) in cantaloupe applied this method to evaluate, compare and select superior cultivars and reported that this method in addition to showing the relationship between different traits graphically, can make the visual comparison between genotypes very possible.
Molecular diversity by SilicoDArT markers
The SilicoDArT markers with very high genome coverage were used to study genome diversity in a panel consisting of 92 durum wheat genotypes, which is a valuable and powerful tool for GWAS analysis in the future. Also, these germplasms will be very valuable resources for extracting alleles and selecting genotypes that give resistance genes to various biotic and abiotic stresses through linkage mapping in the whole genome. Candidate SilicoDArT markers can be used to establish marker-trait association (MTA) concerning specific genes or large QTLs controlling valuable traits in durum wheat. The information (data) of DArTseq markers was estimated by polymorphism information content (PIC). Polymorphism information content (PIC) is used to provide information about the variation of a gene or segment of DNA in a population to indicate evolutionary pressure on an allele and mutations at a gene locus that may have occurred over some time. When the scoring of a marker is (0) and (1) with ratios of 50% to 50%, the maximum PIC value is calculated as 0.5. In this study, a large number (7882) of high-quality SilicoDArT polymorphic markers were filtered out of 62269 markers and used for the analysis of genetic diversity and population structure in a set of 94 durum wheat genotypes. The average PIC values for all SilicoDArT markers were equal to 0.38, which indicates the high efficiency of these markers for measuring the genomic diversity of durum wheat (Table 2). In general, the distribution of PIC values among the markers was asymmetric and tended towards values higher than the average, so about 60% of the markers showed PIC above the average (Fig.3). Our ability to compare PIC values in this study was limited because there are few reports on genetic diversity using DArT, and we could not track documented reports using DArTseq markers in durum wheat genetic diversity studies. The average PIC values in the recent study were higher than the PIC value reported by Ren et al. (2013) using SNP markers in the global collection of 150 durum wheat genotypes (it was equal to 0.18888). Moragues et al.( 2007) investigated the genetic diversity of 63 indigenous populations of durum wheat from Mediterranean countries using AFLP and SSR markers, the average PIC values for AFLP and SSR markers were 0.24 and 0.70, respectively. The distribution of PIC values for SilicoDArT markers is shown in Fig.3. The distribution of PIC values among markers was almost non-uniform and more than 3000 SilicoDArT markers had PIC values close to 0.5. About 75% of SilicoDArT markers had PIC values greater than 0.3. In addition to the PIC value, some other quality parameters, such as the call rate and the reproducibility of each marker within the panel of examined durum wheat genotypes, were also estimated. Sequence reading rate (Call Rate) is the percentage of valid scores among all possible scores for a marker, where reproducibility is measured as the percentage of scoring repeatability for repeated samples. The average call rate and reproducibility of SilicoDArT markers in this study were more than 0.98 and 0.92, respectively (Table 2). Although the composition of restriction enzymes as well as the amount of diversity in the presented panel is different from other plant species, the amount of technical measurement of repeating pairs for the score of markers and success in reading the sequence of markers in the investigated sample is high and comparable to the previous applications of SilicoDArT in other plant species(Grzebelus et al. 2014; Ndjiondjop et al. 2017; O’Connor et al. 2019). The genomic coverage of SilicoDArT markers in recent research was uniform for most chromosomes and the highest (853) and lowest (300) number of markers were observed in Chr7B and Chr1A, respectively. Chromosome size covered by SilicoDArT markers ranged from 829200 kbp in the linkage group (Chr3B) to 589293.786 kbp in the linkage group (Chr1A) which is similar to previous studies in durum wheat using high-performance GBS (Baloch et al. 2017).
Table 2. SilicoDArT Markers Distribution on Durum Wheat Genome, Polymorphism Information Content (PIC), Call Rate and Average of Reproducibility
Linkage Groups
|
No. Markers
|
Chromosome Size(kbp)
|
Call Rate (mean)
|
One Ratio(mean)
|
PIC (mean)
|
Average of Reproducibility
|
Chr1A
|
300
|
589293.786
|
0.926811
|
0.53253
|
0.392783
|
0.987783
|
Chr2A
|
574
|
779665.301
|
0.928376
|
0.531615
|
0.378022
|
0.987819
|
Chr3A
|
454
|
750610.4
|
0.926852
|
0.528354
|
0.371944
|
0.987748
|
Chr4A
|
489
|
742808.8
|
0.929105
|
0.507459
|
0.382376
|
0.986216
|
Chr5A
|
436
|
708977
|
0.928146
|
0.520776
|
0.375594
|
0.98842
|
Chr6A
|
456
|
616320.1
|
0.928312
|
0.510363
|
0.392759
|
0.987314
|
Chr7A
|
789
|
735262.1
|
0.929175
|
0.510735
|
0.384203
|
0.986959
|
Ch1B
|
681
|
688589.5
|
0.92758
|
0.533268
|
0.385003
|
0.987277
|
Chr2B
|
759
|
800769.3
|
0.927394
|
0.531767
|
0.393895
|
0.987843
|
Chr3B
|
582
|
829200.1
|
0.922085
|
0.564573
|
0.369268
|
0.9861
|
Chr4B
|
333
|
671754.7
|
0.924849
|
0.538444
|
0.37495
|
0.987985
|
Chr5B
|
586
|
708929.4
|
0.923233
|
0.566275
|
0.382289
|
0.986366
|
Chr6B
|
590
|
720840
|
0.926843
|
0.534025
|
0.382743
|
0.986644
|
Chr7B
|
853
|
749946.8
|
0.92991
|
0.515027
|
0.392338
|
0.987082
|
Mean
|
563
|
720926.2
|
0.927048
|
0.530372
|
0.382726
|
0.987254
|
Group A
|
499.72
|
703276.78
|
0.928111
|
0.520261
|
0.382525
|
0.987465
|
Group B
|
626.3
|
738575.7
|
0.925985
|
0.540483
|
0.382927
|
0.987042
|
Kinship coefficients between pairs of genotypes
To the acceptable evaluation of genetic diversity in durum wheat genotypes, relative kinship coefficients between each pair of genotypes were estimated based on DArTseq markers. The value of the matrix of kinship coefficients can vary from 3 to -3, as much as its average tends to be zero or less, it indicates a very high diversity in the population. Based on the kinship coefficients matrix, the investigated genotypes were divided into 3 groups for gene data. Also, the results showed that about 60% of the kinship coefficients between pairs of genotypes were between zero (except zero) and 0.1 and about 39% of the values were between 0.1 (except 0.1) and 0.5. Only about one percent of the coefficients had values greater than 0.5. Therefore, the results of this analysis indicated that there is very little kinship in the studied genotypes(Fig.4). Genotypes that had a high kinship coefficient are very likely to have a common ancestry(Salgotra et al. 2015).
Bayesian clustering
The existence of groups and subgroups in large populations can be due to reasons such as different geographical origins of genotypes, natural or human selection, genetic drift, etc. (Flint-Garcia et al. 2003). In this analysis, the possibility of subpopulations in the investigated germplasm was evaluated using STRUCTURE software. According to the results, K and Delta K statistics were extracted and two-dimensional diagrams were drawn, which clearly showed the curve at the maximum of K=4 (Fig. 5). Therefore, according to the obtained results, durum wheat genotypes were grouped into four separate subpopulations based on the Bayesian model (Fig. 6). The four subpopulations showed relatively high genetic diversity, which ranged from 0.02 in a subpopulation (POP4) to 0.34 in a subpopulation (POP3). Net Nucleotide Distance is a parameter to measure genetic diversity among populations that revealed the highest (0.270) and lowest (0.07) genetic distance between POP2 and POP4 subpopulations and POP1 and POP2 subpopulations (Table 3).
Table 3. Genetic divergence among (Net Nucleotide Distance) and within (expected heterozygosity) population, proportion of membership and mean value of Fst observed from the study of population structure of 92 durum wheat genotypes using SilicoDArT markers.
Population
|
Net Nucleotide Distance
|
Expected Heterozygosity
|
% 0f membership
|
Mean fixation index (Fst)
|
|
POP2
|
POP3
|
POP4
|
|
|
|
POP1
|
0.070
|
0.084
|
0.247
|
0.20
|
46
|
0.41
|
POP2
|
|
0.085
|
0.270
|
0.20
|
26
|
0.46
|
POP3
|
|
|
0.249
|
0.34
|
19.5
|
0.21
|
POP4
|
|
|
|
0.02
|
8.5
|
0.95
|
Analysis of linkage disequilibrium(LD)
Genome-wide Association study is the method that determines the QTLs location based on linkage disequilibrium(Gupta and Varshney 2005). Therefore, for association mapping in addition to the composition of the population structure, the extent of linkage disequilibrium in the genome has fundamental importance(Al-Maskri et al. 2012). The linkage disequilibrium index was evaluated using 605212 pairs of DArTseq markers with specific map locations along 14 durum wheat chromosomes. Pairwise linkage disequilibrium values were estimated using the square of allelic frequency correlations (r2) between markers. To identify differences in intra-chromosomal linkage disequilibrium, the average values of r2 between pairs of markers are divided into five groups based on the genetic distance between them: markers with very high linkage (Distance less than 5 cM), Contiguous markers (with a distance of 5-10 cM), markers with medium continuity (with a distance of 10-12 cM), weakly connected markers (with a distance of 20-50 cM) and independent markers (with a distance of more than 50 cM). The evaluation of intra-chromosomal linkage disequilibrium showed that the linkage disequilibrium decreases with increasing genetic distance. Also, the results revealed that linkage disequilibrium varies from chromosome to chromosome. The greatest imbalance was related to markers with a distance of less than 5 cM. Pairs of markers that are less than 5 cM away and have a lot of continuity showed the highest imbalance among all chromosomes. Previous studies have demonstrated that linkage disequilibrium varies along the chromosome and from chromosome to chromosome(Chao et al. 2010; Hao et al. 2011). The results also showed that increasing the distance between the markers does not completely remove the imbalance. This is due to the presence of other effective factors such as population structure, genetic drift, migration, selection, and mutation on linkage disequilibrium(Reich and Lander 2001). Analysis of linkage disequilibrium was also investigated at the genome level. To show the extent of linkage disequilibrium at the level of the genome, the average reduction in disequilibrium was obtained by plotting intra-chromosomal r2 values against the genetic distance between markers (Fig.7). The amount of LD within the chromosomes based on both SilicoDArT and SNP markers for both durum wheat genomes (A and B) is shown in Table 4. The amount of LD in both types of markers was very wide, in such a way that 605,212 pairs of significant markers were observed in the entire population. LD analysis between A and B genomes revealed that there is a high number of significant marker pairs (82052) in the B genome compared to the A genome (68885). The average r2 values for the whole population were equal to 0.14 and for genomes, A and B were equal to 0.12 and 0.11, respectively (Table 4). A graphic analysis of the distribution of LD markers is shown in Fig.7. In general, 15 kb of the genome was covered in LD. The largest amount of linkage disequilibrium was observed in chromosome 4B, some of the markers are shown on this chromosome (Fig.8).
Table 4 Characteristics of linkage disequilibrium in durum wheat genome based on DArTseq markers
Genome
|
Total pairs
|
Significant pairs (P≤0.001)
|
% of Significant pairs
|
Mean r2
|
Pairs in complete LD
|
Critical r2 value
|
A-genome
|
269891
|
68885
|
24.7
|
0.13
|
1337
|
0.12
|
B-genome
|
335322
|
82052
|
24.4
|
0.12
|
1146
|
0.11
|
Whole-genome
|
605212
|
148937
|
24.6
|
0.13
|
2438
|
0.14
|
Genome-Wide Association Study
The correlation analysis was performed using markers with a frequency of more than ten percent and the P-value statistic was considered with 1000 permutations. Also, the lowest P-value was used as the basis for selecting the correlated markers. The distribution of markers was studied based on the coefficient of determination of the marker in the regression model. The coefficient of determination (r2) is the proportion of the phenotypic variance accounted for by the QTL for each locus. To association mapping of traits including day-to-flowering, day-to-physiological maturity, plant height, thousand seed weight, and grain yield of durum wheat, both molecular and phenotypic data were used. Association mapping analysis for each trait was done based on CMLM (Compressed Mixed Linear Model). To determine the significant markers associated with the studied traits, QQPlot analysis was also performed and finally, the markers with -log 10(p) were selected. The Manhattan plot of two cropping years based on each of the measured traits is shown in Fig.9 and Fig.10, respectively. In Tables 5 and 6, the full report of significant associations of markers related to all traits for both two cropping years is presented. The results of a genome-wide association study by the CMLM method in 2016 showed that 19 markers had a significant association with the studied traits (Table 5). The results of association mapping for each of the traits in the first cropping year are summarized as follows:
- Two significant relationships were identified for days to flowering by markers 1703829 and 1087984, which have located on chromosomes 6B and 7A, respectively.
- The correlation of 3940462 markers on chromosome 4A with days to maturity was significant.
- Ten significant relationships were identified between plant height and DArT markers, these markers had chromosomal locations 3A, 3B, 4B, 5A, 5B, 6A, 6B, and 7B.
- For the trait of thousand kernel weight were identified four significant associated markers were including 3570140, 992437, 3534094, and 3533907 on chromosomes 7A, 3A, 5B, and 3B, respectively.
- About grain yield as an important trait identified three significant relationships with 1108172, 4989009, and 1233550 markers that have located on chromosomal position 3A, 2B, and 1B, respectively.
The results of genome-wide association analysis by the CMLM method in 2018 indicated that 10 markers had a significant association with the studied traits (Table 6). The summary of the results was as follows:
- Association mapping for day-to-flowering revealed two significant markers (3935863 and 1211191) which have chromosomal locus 1A and 5B, respectively.
- The number of three markers 3064874, 1034732, and 1025860 with gene loci 794315843, 673038094, and 666991274 on chromosomes chr3B, chr4A, and chr4A respectively, had a significant relationship with the number of days to maturity.
- The relationship of plant height with marker 1057654 with gene locus 691163974 on chromosome chr7A was significant. This is even though 10 significant relationships were identified for plant height in the previous year.
- The weight of a thousand kernel had a significant relationship with 3025786 and 981221 markers. These markers with gene loci 609888794 and 684637596 are located on chr6A and chr3B chromosomes, respectively.
- Two significant associations were identified between grain yield and DArT markers. The locus of these markers is located on chromosomes 6A and 7A, respectively.
In general, according to the results, the same marker (1057654) was identified for grain yield and plant height in the second cropping year. This marker with genetic locus 691163974 has located on chromosome 7A. By comparing the significant marker-trait relationships as well as the durum wheat consensus map, the position of the marker can be identified and checked by comparing them with physical maps. If the marker distance is more than +10 or -10 cM that position can be identified as a new QTL. Based on this, 29 quantitative positions were identified and confirmed in this study. Among the identified loci, 9 loci were not previously recorded in Durum's Consensus map (Table 7).
Table 5 Associated markers with studied traits in 94 durum wheat genotypes using linkage analysis in 2016 crop year.
Traits
|
Marker
|
Linkage Group
|
Locus
|
P-value
|
Maf
|
Days to Flowering
|
1703829
|
Chr6B
|
4863112
|
0.000075
|
309783
|
1087984
|
Chr7A
|
42448091
|
0.000083
|
0.141304
|
Days to Maturity
|
3940462
|
Chr4A
|
606757033
|
0.000087
|
0.943478
|
Plant Height
|
1228105
|
Chr5A
|
4994308
|
0.000000
|
0.288043
|
3946051
|
Chr3A
|
697061451
|
0.000000
|
0.13587
|
4004958
|
Chr5B
|
648289673
|
0.000001
|
0.130435
|
4405812
|
Chr5A
|
109937709
|
0.000002
|
0.369565
|
1068144
|
Chr6A
|
613084911
|
0.000013
|
0.70652
|
1093423
|
Chr6B
|
33427529
|
0.000026
|
0.375
|
2275869
|
Chr3B
|
19232431
|
0.000034
|
0.483696
|
5410721
|
Chr7B
|
625701808
|
0.000041
|
0.065217
|
1671913
|
Chr3A
|
497297304
|
0.000043
|
0.233696
|
1090315
|
Chr4B
|
629706516
|
0.000067
|
0.413043
|
Thousand grain weight
|
3570140
|
Chr7A
|
71591095
|
0.000031
|
0.211957
|
992437
|
Chr3A
|
541246518
|
0.000041
|
0.391304
|
3534094
|
Chr5B
|
27137735
|
0.000044
|
0.255435
|
3533907
|
Chr3B
|
739839965
|
0.000056
|
0.461957
|
Grain Yield
|
1108172
|
Chr3A
|
652905402
|
0.000071
|
0.059783
|
4989009
|
Chr2B
|
106775933
|
0.000021
|
0.179348
|
1233550
|
Chr1B
|
558575664
|
0.000033
|
0.190217
|
Table 6 Correlated markers with studied traits in 94 durum wheat genotypes using linkage analysis in 2017 crop year.
Traits
|
Marker
|
Linkage Group
|
Locus
|
P-value
|
Maf
|
Days to Flowering
|
3935863
|
Chr1A
|
30133162
|
0.000032
|
0.086957
|
1211191
|
Chr5B
|
700372980
|
0.000037
|
0.065217
|
Days to Maturity
|
3064874
|
Chr3B
|
794315843
|
0.000006
|
0.097826
|
1034732
|
Chr4A
|
673038094
|
0.0000028
|
0.25
|
1025860
|
Chr4A
|
666991274
|
0.000078
|
0.304348
|
Plant Height
|
1057654
|
Chr7A
|
691163974
|
0.000051
|
0.097826
|
Thousand Grain weight
|
3025786
|
Chr6A
|
609888794
|
0.000044
|
0.070652
|
981221
|
Chr3B
|
684637594
|
0.000090
|
0.0157609
|
Grain Yield
|
1057654
|
Chr7A
|
691163974
|
0.000016
|
0.097826
|
7918245
|
Chr6A
|
611855760
|
0.000040
|
0.141304
|
Table 7 Comparison of gene locations identified by Consensus map of durum wheat
Traits
|
Marker
|
Linkage Group
|
cM
|
Days to Flowering
|
1703829
|
Chr6B
|
3.34
|
1087984
|
Chr7A
|
3.34
|
3935863
|
Chr1A
|
35.82
|
1211191
|
Chr5B
|
137.12
|
Days to Maturity
|
3940462
|
Chr4A
|
54.26
|
3064874
|
Chr3B
|
133.21
|
1034732
|
Chr4A
|
96.08
|
1025860
|
Chr4A
|
96.16
|
Plant Height
|
1228105
|
Chr5A
|
5.94
|
3946051
|
Chr3A
|
-
|
4004958
|
Chr5B
|
-
|
4405812
|
Chr5A
|
-
|
1068144
|
Chr6A
|
98.22
|
1093423
|
Chr6B
|
13.14
|
2275869
|
Chr3B
|
17.44
|
5410721
|
Chr7B
|
81.31
|
1671913
|
Chr3A
|
-
|
1090315
|
Chr4B
|
-
|
1057654
|
Chr7A
|
129.67
|
Thousand grain weight
|
3570140
|
Chr7A
|
149.67
|
992437
|
Chr3A
|
68.59
|
3534094
|
Chr5B
|
161.82
|
3533907
|
Chr3B
|
-
|
3025786
|
Chr6A
|
98.83
|
981221
|
Chr3B
|
81.17
|
Grain Yield
|
1108172
|
Chr3A
|
65.11
|
4989009
|
Chr2B
|
-
|
1233550
|
Chr1B
|
-
|
1057654
|
Chr7A
|
129.67
|
7918245
|
Chr6A
|
-
|