SNP profile
The selected 6,200 SNPs were distributed across 15 chromosomes with an average marker density of one SNP per 51.17 kb. Chromosome Lu13 contained the highest number of SNPs (552 SNPs, 8.90%) and Lu4 the lowest (299 SNPs, 4.82%). Marker density was likewise highest on chromosome Lu13 (one SNP per 36.95 kb) and lowest on chromosome Lu4 (one SNP per 66.34 kb) (Table 1). Transition SNPs (3,532 SNPs) were more frequent than transversions (2,668 SNPs), with a ratio of 1.32. The frequency of C/T transitions was highest (28.61%) and that of C/G transversions was lowest (9.56%). A/G and C/T transitions occurred at similar frequencies (A/G 28.35% and C/T 28.61%), whereas the frequencies of the four transversions were: A/C 11.61%, A/T 10.40%, C/G 9.56% and G/T 11.45% (Table 2). The inbreeding coefficient within individuals (Fis), inbreeding coefficient within subpopulations (Fst) and observed heterozygosity (Ho) of all the markers were 1, 1 and 0, respectively, because all genotypes were homozygous. The Shannon’s information index (I) of the markers ranged from 0.03 to 0.70 with a mean value of 0.34.
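As an illustration of the Shannon's information index reported above, the sketch below computes I for a single marker from its allele frequencies. It assumes the common form I = -Σ p ln p with natural logarithms; the allele frequencies are hypothetical and are not taken from the data set. Under this assumption the largest value a biallelic SNP can reach is ln 2, which is close to the reported upper bound.

```python
import math

def shannon_index(freqs):
    """Shannon's information index, I = -sum(p * ln p), over allele frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

# Hypothetical biallelic SNPs given as (major, minor) allele frequencies.
balanced = shannon_index([0.5, 0.5])    # equal frequencies: I equals ln 2
skewed = shannon_index([0.995, 0.005])  # rare minor allele: I is close to zero

print(round(balanced, 3), round(skewed, 3))
```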
Table 1 Distribution of SNPs
Chromosome | No. of SNPs | % SNPs | Start position a | End position a | Length (Mb) | Density (kb per SNP)
Lu1 | 550 | 8.87 | 48002 | 28940544 | 28.89 | 52.53
Lu2 | 402 | 6.48 | 343539 | 25278102 | 24.93 | 62.03
Lu3 | 485 | 7.82 | 56610 | 26551417 | 26.49 | 54.63
Lu4 | 299 | 4.82 | 20788 | 19857012 | 19.83 | 66.34
Lu5 | 424 | 6.84 | 58098 | 17649206 | 17.59 | 41.49
Lu6 | 366 | 5.90 | 29887 | 17856972 | 17.82 | 48.71
Lu7 | 389 | 6.27 | 3623 | 18287460 | 18.28 | 47.00
Lu8 | 358 | 5.77 | 81017 | 23662693 | 23.58 | 65.87
Lu9 | 412 | 6.65 | 124789 | 21763401 | 21.63 | 52.52
Lu10 | 308 | 4.97 | 199991 | 17833309 | 17.63 | 57.25
Lu11 | 454 | 7.32 | 76724 | 19841794 | 19.76 | 43.54
Lu12 | 425 | 6.85 | 52319 | 20832003 | 20.77 | 48.89
Lu13 | 552 | 8.90 | 14015 | 20413108 | 20.39 | 36.95
Lu14 | 423 | 6.82 | 24838 | 19367496 | 19.34 | 45.73
Lu15 | 353 | 5.69 | 38217 | 15613904 | 15.57 | 44.12
Mean | 413.33 | | | | | 51.17
a Position given in bp
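The density column of Table 1 appears to be the physical span covered by the markers divided by the number of SNPs. The short sketch below checks that reading for three chromosomes. The formula is an inference from the tabulated values rather than a stated method, and the function name is illustrative.

```python
# (chromosome, SNP count, first SNP position in bp, last SNP position in bp), from Table 1
rows = [
    ("Lu1", 550, 48002, 28940544),
    ("Lu4", 299, 20788, 19857012),
    ("Lu13", 552, 14015, 20413108),
]

def density_kb(n_snps, start_bp, end_bp):
    """Average spacing in kb per SNP, assuming density = covered span / SNP count."""
    return (end_bp - start_bp) / n_snps / 1000.0

densities = {chrom: round(density_kb(n, start, end), 2) for chrom, n, start, end in rows}
print(densities)
```

A smaller value means more closely spaced markers, so under this reading Lu13 is the most densely covered of the three chromosomes and Lu4 the most sparsely covered.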
The expected heterozygosity (He) ranged from 0.08 to 0.53 with a mean value of 0.30. The polymorphic information content (PIC) ranged from 0.07 to 0.47 with a mean value of 0.24 (Table S2). Population-wise marker diversity parameters are presented in supplementary Table S3.
Table 2 Transition and transversion SNPs across the genome
SNP type | Model | No. of sites | Frequencies (%) | Total (percentage)
Transitions | A/G | 1758 | 28.35 | 3532 (56.97%)
Transitions | C/T | 1774 | 28.61 |
Transversions | A/T | 720 | 11.61 | 2668 (43.03%)
Transversions | A/C | 645 | 10.40 |
Transversions | G/T | 593 | 9.56 |
Transversions | G/C | 710 | 11.45 |
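The transition/transversion split in Table 2 can be reproduced with a small classifier: a substitution between two purines (A, G) or between two pyrimidines (C, T) is a transition, and anything else is a transversion. The sketch below applies this rule to the site counts as labelled in Table 2. The function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def snp_class(allele1, allele2):
    """Classify a biallelic SNP as a transition or a transversion."""
    pair = {allele1.upper(), allele2.upper()}
    if len(pair) != 2 or not (pair <= (PURINES | PYRIMIDINES)):
        raise ValueError("expected two different bases from A, C, G, T")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

# Site counts per substitution model, as labelled in Table 2.
counts = {"A/G": 1758, "C/T": 1774, "A/T": 720, "A/C": 645, "G/T": 593, "G/C": 710}

totals = {"transition": 0, "transversion": 0}
for model, n in counts.items():
    first, second = model.split("/")
    totals[snp_class(first, second)] += n

ratio = totals["transition"] / totals["transversion"]
print(totals, round(ratio, 2))
```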
Population structure
The whole collection was divided into seven sub-populations based on structure analysis using the Delta K approach (Figure 2A). The NDSU releases and other American genotypes were grouped under sub-population 5 (P5), whereas European (Hungary), Turkish and Asian (India and Pakistan) genotypes were grouped under sub-population 1 (P1), sub-population 7 (P7) and sub-population 6 (P6), respectively. Sub-populations 2 (P2), 3 (P3) and 4 (P4) were composed of a mixture of genotypes of different origins (Figure 2B). All sub-populations consisted of oil-type genotypes except P2, which consisted mostly of fiber-type genotypes. Among the oil types, spring-type seed flax belonged to P5, winter types to P1 and P7, short, large-seeded Indian seed flax to P6, Mediterranean or Argentine seed flax to P3 and Ethiopian forage-type seed flax to P4. Based on the individual Q matrix, the proportion of pure (non-hybrid) and admixed (containing markers assigned to more than one population) genotypes in each sub-population was calculated.
Table 3 Number of pure and admixed individuals per sub-population
Sub-population | Total no. of genotypes | No. of genotypes (0.7 cutoff) | % of total (0.7 cutoff) | No. of genotypes (0.9 cutoff) | % of total (0.9 cutoff)
P1 | 42 | 20 | 47.62 | 12 | 28.6
P2 | 55 | 35 | 63.64 | 21 | 38.2
P3 | 72 | 44 | 61.11 | 14 | 19.4
P4 | 22 | 4 | 18.18 | 0 | 0.0
P5 | 106 | 86 | 81.13 | 40 | 37.7
P6 | 27 | 22 | 81.48 | 21 | 77.8
P7 | 26 | 16 | 61.54 | 14 | 53.8
Total | 350 | 227 | 64.86 | 122 | 34.9
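The pure/admixed split summarised in Table 3 can be sketched as a threshold on each genotype's largest membership coefficient in the Q matrix. The example below assumes a genotype is assigned to the cluster with the highest coefficient and called pure when that coefficient is at or above the cutoff. Whether the boundary is inclusive is an assumption, and the two Q rows are hypothetical.

```python
def classify(q_row, cutoff):
    """Return (1-based cluster index, 'pure' or 'admixed') for one genotype's Q row."""
    best = max(range(len(q_row)), key=lambda k: q_row[k])
    status = "pure" if q_row[best] >= cutoff else "admixed"
    return best + 1, status

# Hypothetical membership coefficients over seven clusters (each row sums to 1).
genotype_a = [0.02, 0.01, 0.03, 0.01, 0.91, 0.01, 0.01]
genotype_b = [0.10, 0.05, 0.75, 0.04, 0.03, 0.02, 0.01]

for name, q_row in (("genotype_a", genotype_a), ("genotype_b", genotype_b)):
    print(name, classify(q_row, 0.7), classify(q_row, 0.9))
```

The stricter 0.9 cutoff can only reclassify genotypes from pure to admixed, which is consistent with the smaller counts in the 0.9 columns of Table 3.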
The proportion of pure accessions in each sub-population ranged from 18% to 81% at the 0.7 cutoff. P5 and P6 contained the highest percentage (81%) of pure accessions, whereas P4 contained the lowest (18%) (Table 3). We also performed principal coordinate analysis (PCoA) to show the genetic similarity among populations. The first two axes explained 18.49% of the total observed variation (Table S4). The PCoA revealed that the NDSU releases and other American genotypes (P5), Turkish (P7) and Asian (P6) genotypes were well clustered and separated from the rest of the genotypes (Figure 5). In addition, we constructed a phylogenetic tree using the neighbor-joining (NJ) method (Figure 4). The NJ tree was in line with the results of the structure analysis and PCoA.
Population Diversity
In all sub-populations, the percentage of polymorphic loci was greater than 60%. It was highest in P3 (97.53%) and lowest in P6 (62%). The diversity (H) of the seven sub-populations ranged from 0.12 (P6) to 0.28 (P3) with an average of 0.22. The Shannon’s information index (I) ranged from 0.21 (P6) to 0.44 (P3) with an average of 0.34. Like the percentage of polymorphic loci and diversity, it was highest in P3 and lowest in P6.
Table 4 Sub-population wise diversity parameters
Sub-populations | Polymorphic loci (proportion) | Na a | Ne b | I c | H d | Uh e | Tajima's D
P1 | 0.87 | 1.87 | 1.39 | 0.37 | 0.24 | 0.24 | 0.75
P2 | 0.87 | 1.87 | 1.33 | 0.33 | 0.21 | 0.21 | 0.40
P3 | 0.98 | 1.98 | 1.46 | 0.44 | 0.28 | 0.29 | 1.50
P4 | 0.78 | 1.78 | 1.41 | 0.37 | 0.24 | 0.25 | 0.79
P5 | 0.88 | 1.88 | 1.33 | 0.32 | 0.20 | 0.20 | 0.69
P6 | 0.62 | 1.62 | 1.18 | 0.21 | 0.12 | 0.13 | -0.85
P7 | 0.80 | 1.80 | 1.35 | 0.34 | 0.22 | 0.23 | 0.35
Mean | 0.83 | 1.83 | 1.35 | 0.34 | 0.22 | 0.22 | 0.52
a No. of different alleles, b No. of effective alleles, c Shannon's information index, d Diversity, e Unbiased diversity
Tajima's D ranged from -0.85 (P6) to 1.50 (P3) with an average of 0.52 (Table 4). The mean pairwise relatedness (r) among individuals within each sub-population was significant (p < 0.01). P3, P5 and P1 showed low r values (< 0.1); r increased for P2 (0.10), P4 (0.11) and P7 (0.12) and was highest for P6 (0.34) (Table 5, Figure 5). Both I and H were significantly and negatively correlated with relatedness (r = -0.91 and -0.89, respectively; p < 0.01).
Table 5 Mean pairwise relatedness (r) values within sub-population
Sub-populations | P1 | P2 | P3 | P4 | P5 | P6 | P7
Mean | 0.095 | 0.101 | 0.043 | 0.114 | 0.088 | 0.338 | 0.127
Upper mark | 0.006 | 0.004 | 0.003 | 0.011 | 0.002 | 0.008 | 0.008
Lower mark | -0.005 | -0.004 | -0.003 | -0.008 | -0.003 | -0.007 | -0.007
P value | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001
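The negative relationship between diversity and relatedness can be checked approximately from the rounded sub-population values in Tables 4 and 5. The sketch below computes a plain Pearson correlation across the seven sub-populations. Because it uses rounded table entries, and the exact correlation procedure is not described, the coefficients should come out strongly negative but need not match the reported values to the second decimal.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Values for P1..P7: I and H from Table 4, mean relatedness from Table 5.
shannon_i = [0.37, 0.33, 0.44, 0.37, 0.32, 0.21, 0.34]
diversity_h = [0.24, 0.21, 0.28, 0.24, 0.20, 0.12, 0.22]
relatedness = [0.095, 0.101, 0.043, 0.114, 0.088, 0.338, 0.127]

r_i = pearson(shannon_i, relatedness)
r_h = pearson(diversity_h, relatedness)
print(round(r_i, 2), round(r_h, 2))
```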
Population Genetic Differentiation
The AMOVA revealed that variance among sub-populations accounted for 28% of the total variation, whereas the remaining 72% was due to variance among individuals within sub-populations (Table 6), with Fst and Nm values of 0.28 and 0.64, respectively. All pairwise Fst comparisons between sub-populations were significant (p < 0.01).
Table 6 Summary of AMOVA
Sources | df | SS | MS | Est. Var. | % of variation | Fixation indices | Nm
Among sub-populations | 6 | 164598.46 | 27433.08 | 274.90 | 28 | Fst: 0.28 | 0.64
Among individuals | 343 | 483200.88 | 1408.75 | 704.37 | 72 | Fis: 1.00 |
Within individuals | 350 | 0.00 | 0.00 | 0.00 | 0 | Fit: 1.00 |
Total | 699 | 647799.34 | | 979.27 | 100 | |
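The Fst and Nm entries in Table 6 are consistent with simple functions of the estimated variance components. The sketch below assumes Fst is the among-sub-population share of the total estimated variance and that Nm follows the island-model approximation Nm = (1/Fst - 1)/4. Neither formula is stated in the text, so both are assumptions that happen to reproduce the tabulated values.

```python
# Estimated variance components from Table 6.
var_among_subpops = 274.90
var_among_individuals = 704.37
var_within_individuals = 0.00  # all genotypes were homozygous

total_var = var_among_subpops + var_among_individuals + var_within_individuals

# Assumed definitions: Fst as a variance share, Nm from the island-model approximation.
fst = var_among_subpops / total_var
nm = (1.0 / fst - 1.0) / 4.0

print(round(total_var, 2), round(fst, 2), round(nm, 2))
```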
Most of the combinations showed a great degree of divergence (Fst > 0.25) (26), except a few combinations such as P1 and P3 (0.13), P3 and P4 (0.13), P3 and P7 (0.13), P2 and P5 (0.16), and P4 and P7 (0.17). Pairwise Fst > 0.50 was observed between P2 and P6, P5 and P6, and P7 and P6 (Table 7). At the locus level, Fst ranged from 0.01 to 0.95 with a mean of 0.29 (Table S5).
Table 7 Genetic differentiation among sub-populations (sub-population pairwise Fst)
Sub-population | P1 | P2 | P3 | P4 | P5 | P6 | P7
P1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
P2 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
P3 | 0.13 | 0.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
P4 | 0.21 | 0.31 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00
P5 | 0.27 | 0.16 | 0.21 | 0.32 | 0.00 | 0.00 | 0.00
P6 | 0.48 | 0.54 | 0.40 | 0.46 | 0.54 | 0.00 | 0.00
P7 | 0.20 | 0.32 | 0.14 | 0.17 | 0.32 | 0.51 | 0.00
Values below the diagonal are pairwise Fst comparisons; values above the diagonal are the corresponding P values.
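To make the thresholds used in the text concrete, the sketch below stores the below-diagonal Fst values of Table 7 and lists the sub-population pairs that exceed a chosen cutoff. The data layout and variable names are illustrative.

```python
labels = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]

# Below-diagonal pairwise Fst values from Table 7, one list per row.
fst_lower = [
    [],
    [0.25],
    [0.13, 0.21],
    [0.21, 0.31, 0.13],
    [0.27, 0.16, 0.21, 0.32],
    [0.48, 0.54, 0.40, 0.46, 0.54],
    [0.20, 0.32, 0.14, 0.17, 0.32, 0.51],
]

# Map each unordered pair (column label, row label) to its Fst value.
pairs = {
    (labels[col], labels[row]): value
    for row, values in enumerate(fst_lower)
    for col, value in enumerate(values)
}

highly_divergent = sorted(pair for pair, value in pairs.items() if value > 0.50)
print(len(pairs), highly_divergent)
```

Every pair above 0.50 involves P6, in line with the pairs named in the text.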
A Mantel test was performed to assess the correlation between geographic and genetic distance among individuals within each sub-population (Table 8).
Table 8 Mantel test output showing genetic and geographic distance correlation
Sub-population | SSx a | SSy b | SPxy | Rxy c | P value
P1 | 45800457.07 | 5538022460 | 93859398.73 | 0.19 | 0.05
P2 | 134861866.3 | 41553712025 | 246361305.5 | 0.10 | 0.13
P3 | 120655500 | 8473759.059 | 3098570.347 | 0.10 | 0.09
P4 | 26416786.96 | 1683893955 | 52933834.97 | 0.25 | 0.01
P5 | 318721174.9 | 85673146302 | 1827968297 | 0.35 | 0.01
P6 | 92879654.84 | 3062447732 | 179000333.5 | 0.34 | 0.07
P7 | 19812140.31 | 479364655.5 | 35378695.93 | 0.36 | 0.05
Whole collection | 8016027762.51 | 1165284415054.30 | 28882623823.41 | 0.30 | 0.001
a Genetic distance, b Geographic distance, c Correlation coefficient values
Individuals of P4 and P5 showed a significant positive correlation between geographic and genetic distance (r = 0.251 and 0.349, respectively; p < 0.05), whereas the correlation was not significant in the other sub-populations (Figure S1). In the entire collection, the Mantel test revealed a significant positive correlation (r = 0.30, p < 0.01).
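The correlation coefficients in Table 8 follow from the tabulated sums of squares and the sum of cross products as Rxy = SPxy / sqrt(SSx × SSy). The sketch below recomputes two of them. It does not reproduce the P values, which would require the permutation step of the Mantel test and the underlying distance matrices.

```python
import math

def mantel_rxy(ss_x, ss_y, sp_xy):
    """Correlation between two distance matrices from their sums of squares and cross products."""
    return sp_xy / math.sqrt(ss_x * ss_y)

# SSx, SSy and SPxy taken from Table 8.
r_whole = mantel_rxy(8016027762.51, 1165284415054.30, 28882623823.41)
r_p5 = mantel_rxy(318721174.9, 85673146302, 1827968297)

print(round(r_whole, 2), round(r_p5, 2))
```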
Linkage Disequilibrium pattern
The linkage disequilibrium (LD) pattern was investigated across the entire collection, within each sub-population and chromosome-wise. LD (r²) values decreased with increasing distance. In all cases, mean LD was high (r² > 0.80) in the shortest distance bin (0-1 kb) and declined with increasing bin distance (Table S6). In the entire collection, the mean linked LD, the mean unlinked LD and the proportion of loci pairs in linked LD were 0.41, 0.02 and 2.46%, respectively. The mean linked LD was highest in P6 (r² = 0.50) and lowest in P4 (r² = 0.39). In P6, the highest proportion (28.22%) of total loci pairs was linked, whereas the proportion was very low (1.08%) in P3 (Table 9). We also calculated the LD decay rate. In the whole collection, LD decayed to half of its maximum within < 21 kb. Each chromosome showed a different rate of LD decay.
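For readers unfamiliar with the r² statistic, the sketch below computes it for one pair of biallelic loci. It assumes fully homozygous lines, so each line contributes one haplotype, with alleles coded 0/1 and no missing data. The genotype vectors are hypothetical, and the software actually used for the LD analysis is not specified here.

```python
def r_squared(locus_a, locus_b):
    """Pairwise LD, r^2 = D^2 / (pA(1-pA) pB(1-pB)), from 0/1 coded haplotypes."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b  # coefficient of disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom

# Hypothetical alleles for eight homozygous lines at three loci.
snp1 = [1, 1, 0, 0, 1, 1, 0, 0]
snp2 = [1, 1, 0, 0, 1, 1, 0, 0]  # identical to snp1: complete LD
snp3 = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of snp1 in this sample

print(r_squared(snp1, snp2), r_squared(snp1, snp3))
```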
Table 9 Linkage disequilibrium in the studied collection
Sub-population | Mean linked LD | Mean unlinked LD | Mean LD | Loci pairs in linked LD (%) | Loci pairs in unlinked LD (%)
Whole collection | 0.41 | 0.02 | 0.03 | 2.46 | 97.54
P1 | 0.40 | 0.03 | 0.04 | 3.77 | 96.23
P2 | 0.44 | 0.03 | 0.05 | 5.80 | 94.20
P3 | 0.48 | 0.02 | 0.02 | 1.08 | 98.92
P4 | 0.38 | 0.04 | 0.08 | 11.56 | 88.44
P5 | 0.45 | 0.02 | 0.04 | 4.92 | 95.08
P6 | 0.50 | 0.04 | 0.17 | 28.22 | 71.78
P7 | 0.39 | 0.04 | 0.06 | 7.04 | 92.96
LD persisted the longest on chromosomes Lu1 (35.42 kb) and Lu3 (34.40 kb). The decay distance was shortest on chromosomes Lu13 (13.71 kb) and Lu8 (14.68 kb) (Figure S2, Table S7). LD decayed to its half-maximum within < 30 kb for P1 and P3, 38.34 kb for P7, 52.68 kb for P2, < 85 kb for P4 and P5, and 1,444 kb for P6 (Figure 6, Table S7).
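One simple way to read a half-maximum decay distance off binned LD values is to interpolate linearly between the two distance bins that bracket half of the maximum mean r². The sketch below does this for made-up bin values. It is only an illustration of the concept: the numbers are not from Table S6 or S7, and the decay distances reported here may have been obtained differently, for example from a fitted decay curve.

```python
def half_decay_distance(bins):
    """Distance at which mean r^2 first falls to half its maximum.

    bins: list of (distance_kb, mean_r2) tuples sorted by increasing distance.
    Returns None if r^2 never drops to the half-maximum within the given bins.
    """
    half = max(r2 for _, r2 in bins) / 2.0
    for (d0, r0), (d1, r1) in zip(bins, bins[1:]):
        if r0 > half >= r1:
            # Linear interpolation between the two bracketing bins.
            return d0 + (r0 - half) * (d1 - d0) / (r0 - r1)
    return None

# Hypothetical (distance in kb, mean r^2) values, for illustration only.
example_bins = [(0.5, 0.80), (5.0, 0.60), (15.0, 0.45), (25.0, 0.35), (50.0, 0.20)]

decay_kb = half_decay_distance(example_bins)
print(decay_kb)
```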