Different structures were detected between the Brazilian and US genetic bases
Principal component analysis revealed that most Brazilian cultivars (red circle) grouped with a subgroup of US cultivars (green circle). Most of them belonged to maturity groups VI, VII, VIII and IX (Figure 1A). Based on the Evanno criterion (Figure 1B), the structure analysis with four groups (K = 4) showed a high ΔK value (312.35), but the uppermost level of structure corresponded to two groups (K = 2; ΔK = 1885.43).
Considering K = 2 (Figure 1C), the Brazilian cultivars as a group had an assignment to the Q1 group (green) of 86.7%, much higher than that observed for the US cultivars (43.9%). Considering K = 4 (Figure 1D), the Brazilian cultivars had an assignment to the Q2 group (red) of only 4.7%, whereas the US cultivars had an assignment to the Q2 group of 27.4%. The Q1 group (green) also had a lower assignment in Brazilian cultivars than in US accessions (11.1% and 30.1%, respectively). As expected, these results confirm that the set of Brazilian cultivars has a narrower genetic base than the US cultivars.
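The Evanno criterion mentioned above ranks each K by ΔK, the absolute second-order change in the mean log-likelihood L(K) divided by the standard deviation of L(K) across replicate runs. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name and the replicate log-likelihoods are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def evanno_delta_k(log_likelihoods):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    log_likelihoods maps each K to a list of L(K) values,
    one value per replicate run.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    delta_k = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the smallest and largest K
        # Absolute second-order difference of the mean log-likelihoods.
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_diff / stdev(log_likelihoods[k])
    return delta_k


# Hypothetical replicate log-likelihoods (NOT the values from this study).
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4000.0, -4001.0, -3999.0],
    3: [-3900.0, -3905.0, -3895.0],
    4: [-3850.0, -3852.0, -3848.0],
    5: [-3840.0, -3850.0, -3830.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, "-> uppermost level of structure at K =", best_k)
```

A large peak at one K with a smaller secondary peak at another, as reported here for K = 2 and K = 4, is usually read as a hierarchy: a major split plus finer substructure.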
Large genetic divergences in United States and Brazil soybean germplasm were observed according to their maturity groups
When we compared the cultivars between maturity groups, we observed a clear differentiation between early and late groups (Table 1). The highest genetic distance (0.42) was observed between cultivars with MG 00-0 and MG VIII-IX.
Table 1
Summary of the genomic regions with high FST values between Brazilian and US germplasms.
| Chr. (a) | Start (Mbp) (b) | End (Mbp) (c) | SNP (d) | FST High (e) | FST Reg. (f) | Tajima’s D ALL (g) | Tajima’s D BR (g) | Tajima’s D US (g) | π (10E−05) BR (h) | π (10E−05) US (h) | π US/BR (i) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 48.70 | 48.80 | 5 | 0.45 | 0.47 | 4.07 | 2.45 | 2.47 | 1.76 | 1.60 | 0.91 |
| 4 | 50.20 | 50.30 | 7 | 0.44 | 0.19 | 4.12 | 2.12 | 3.93 | 1.89 | 2.86 | 1.51 |
| 6 | 0.60 | 0.70 | 6 | 0.40 | 0.32 | 4.20 | 1.42 | 3.53 | 1.58 | 2.38 | 1.50 |
| 6 | 46.90 | 47.00 | 8 | 0.41 | 0.29 | 4.24 | 1.70 | 3.55 | 2.18 | 2.92 | 1.34 |
| 6 | 47.30 | 47.40 | 4 | 0.40 | 0.42 | 4.19 | -0.03 | 3.83 | 0.58 | 1.80 | 3.10 |
| 6 | 47.40 | 47.50 | 9 | 0.41 | 0.39 | 5.58 | 0.37 | 5.08 | 1.73 | 4.08 | 2.36 |
| 6 | 47.50 | 47.60 | 4 | 0.49 | 0.35 | 3.35 | 1.16 | 2.84 | 0.81 | 1.47 | 1.82 |
| 6 | 47.70 | 47.80 | 16 | 0.46 | 0.22 | 5.42 | 2.30 | 5.23 | 3.85 | 6.81 | 1.77 |
| 6 | 47.80 | 47.90 | 15 | 0.40 | 0.29 | 5.87 | 2.98 | 4.97 | 5.05 | 6.20 | 1.23 |
| 6 | 48.10 | 48.20 | 20 | 0.44 | 0.17 | 5.82 | 2.64 | 5.61 | 6.10 | 8.63 | 1.42 |
| 6 | 48.40 | 48.50 | 4 | 0.47 | 0.15 | 1.94 | 1.15 | 1.72 | 0.80 | 1.08 | 1.35 |
| 7 | 6.30 | 6.40 | 6 | 0.44 | 0.16 | 1.32 | 2.34 | 0.78 | 1.63 | 0.90 | 0.55 |
| 9 | 41.50 | 41.60 | 7 | 0.40 | 0.17 | 4.34 | 1.82 | 4.55 | 1.52 | 3.13 | 2.06 |
| 10 | 44.20 | 44.30 | 6 | 0.52 | 0.23 | 2.95 | 2.61 | 2.00 | 2.13 | 1.63 | 0.77 |
| 10 | 44.40 | 44.50 | 7 | 0.44 | 0.16 | 3.84 | 3.05 | 2.90 | 2.66 | 2.58 | 0.97 |
| 12 | 6.10 | 6.20 | 9 | 0.46 | 0.10 | 4.99 | 3.92 | 5.22 | 3.83 | 3.83 | 1.00 |
| 16 | 3.00 | 3.10 | 12 | 0.42 | 0.09 | 1.74 | 2.25 | 1.24 | 3.27 | 2.26 | 0.69 |
| 16 | 29.40 | 29.50 | 10 | 0.45 | 0.12 | 4.63 | 3.96 | 4.24 | 3.86 | 4.01 | 1.04 |
| 16 | 30.70 | 30.80 | 6 | 0.41 | 0.30 | 2.21 | 2.96 | 0.97 | 2.30 | 1.03 | 0.45 |
| 16 | 31.10 | 31.20 | 6 | 0.51 | 0.27 | 3.38 | 0.55 | 3.18 | 0.98 | 2.20 | 2.24 |
| 18 | 48.60 | 48.70 | 5 | 0.42 | 0.32 | 2.76 | 4.00 | 1.20 | 2.45 | 1.12 | 0.46 |
| 18 | 57.20 | 57.30 | 9 | 0.46 | 0.17 | 2.76 | 3.42 | 2.03 | 3.21 | 2.13 | 0.66 |
| 19 | 0.90 | 1.00 | 7 | 0.40 | 0.11 | 2.65 | 3.40 | 2.12 | 2.15 | 1.97 | 0.92 |
| 19 | 3.00 | 3.10 | 5 | 0.42 | 0.39 | 2.21 | 4.08 | 0.34 | 2.45 | 0.76 | 0.31 |
| 19 | 3.10 | 3.20 | 4 | 0.40 | 0.42 | 2.84 | 3.23 | 1.25 | 1.78 | 0.94 | 0.53 |
| 19 | 3.40 | 3.50 | 4 | 0.40 | 0.42 | 2.84 | 3.23 | 1.25 | 2.24 | 1.31 | 0.58 |

a: Soybean chromosome; b: start position of the genomic region with high FST; c: end position of the genomic region with high FST values; d: total of SNPs observed in this interval; e: the highest FST value observed in a SNP of this interval; f: the genomic region average FST; g: Tajima’s D coefficient for all (ALL), Brazilian (BR), and United States (US) germplasms; h: nucleotide diversity values for all (ALL), Brazilian (BR), and United States (US) germplasms; i: nucleotide diversity ratio between the populations.
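The last column of the table (footnote i) is the ratio of US to Brazilian nucleotide diversity, so values above 1 mark regions that are less diverse in the Brazilian germplasm and values below 1 mark regions that are less diverse in the US germplasm. As a small illustrative check, the sketch below recomputes that ratio for four rows copied from the table; the variable and function names are assumptions introduced here.

```python
# (chromosome, start Mbp, end Mbp, pi_BR, pi_US, reported US/BR ratio),
# copied from four rows of the table above.
rows = [
    (1, 48.70, 48.80, 1.76, 1.60, 0.91),
    (4, 50.20, 50.30, 1.89, 2.86, 1.51),
    (6, 47.30, 47.40, 0.58, 1.80, 3.10),
    (16, 30.70, 30.80, 2.30, 1.03, 0.45),
]


def diversity_ratio(pi_br, pi_us):
    """Nucleotide diversity ratio US/BR for one genomic region."""
    return pi_us / pi_br


recomputed = []
for chrom, start, end, pi_br, pi_us, reported in rows:
    ratio = diversity_ratio(pi_br, pi_us)
    recomputed.append(ratio)
    less_diverse = "BR" if ratio > 1 else "US"
    print(f"Chr{chrom:02d} {start:.2f}-{end:.2f} Mbp: US/BR = {ratio:.2f} "
          f"(reported {reported:.2f}); lower diversity in {less_diverse}")
```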
To examine the influence of maturity groups on population structure, we next analyzed the average assignment coefficients (K = 4) of Brazilian and US cultivars for each maturity group (Supplementary Figure S1). Brazilian cultivars from maturity group V had mean Q1, Q2, Q3, and Q4 values of 30.4%, 1.9%, 32.1%, and 32.0%, respectively; US cultivars from the same maturity group (V) had mean Q1, Q2, Q3, and Q4 values of 9.2%, 8.2%, 65.1%, and 17.6%, respectively. This result indicates that, although they belong to the same maturity group, the Brazilian cultivars have considerably different allelic frequencies from the US cultivars, especially for Q3 and Q4. US cultivars belonging to earlier maturity groups (00, 0, I, and II) had a significantly higher mean assignment coefficient to the Q2 group (red) than the later maturity groups (V=8.2%, VI=8.1%, VIII=5.0%, and IX=13.6%). In the Brazilian cultivars, the average assignment coefficients for Q2 were much lower (V=1.9%, VI=4.2%, VII=5.6%, VIII=4.9% and IX=4.9%). These results indicate that Q2 contains an important allelic pool that distinguishes early from late materials.
In general, the Brazilian germplasm showed few differences between maturity groups (Table 1 and Figure 2A). This was also observed in a population structure analysis carried out exclusively with Brazilian cultivars (Figure 2C). In contrast, the US germplasm showed high variation in genetic distance among maturity groups (Table 1), with a clear clustering of the cultivars (Figure 2B) that became more obvious in the population structure analysis carried out exclusively with US cultivars (Figure 2D). The results show that early materials tend to be genetically distant from the late cultivars in the US. The maturity groups from the southern breeding program of the US (V, VI, VII, VIII, and IX) tend to be less genetically divergent than the northern groups (00, 0, I, II, III, and IV). This agrees with previous studies indicating distinct Northern and Southern genetic pools in the US [5]. There is low divergence among US soybean cultivars from maturity groups higher than V (Figure 2B). In contrast, cultivars from groups 00 and 0 were more genetically distant from materials of MG III and IV when compared to early materials. Maturity groups I-II formed an intermediate group between 00-0 and III-IV. The population structure analysis showed a high influence of Q2 in cultivars with MG 00-II. For cultivars with MG III and IV, we observed an increase of Q1. Finally, there is a high influence of Q3 in cultivars with maturity groups higher than V, which agrees with the genetic distance data.
Meaningful genetic change of the Brazilian soybean germplasm occurred in modern materials
The results demonstrate that both genetic bases showed only small increases in genetic distance among modern materials (released after 2000) compared with cultivars from the 1950s to the 1970s (Table 2). According to the mean IBS genetic distance, the Brazilian genetic base was more diverse across the decades than the US germplasm, especially when cultivars released before the 1970s were compared with those released after 2000.
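For readers unfamiliar with the identity-by-state (IBS) distance used above, the following minimal sketch shows one common way to compute it for a pair of accessions: count the alleles shared at each SNP, divide by the number of alleles compared, and subtract from one. The genotype coding (0, 1, or 2 copies of the alternative allele, `None` for missing calls) and the example genotypes are assumptions for illustration; the exact software settings used in the study are not given in this section.

```python
def ibs_distance(genotypes_a, genotypes_b):
    """IBS distance between two samples (0 = identical, 1 = no shared alleles).

    Genotypes are coded as the number of alternative alleles (0, 1 or 2);
    None marks a missing call, and such SNPs are skipped.
    """
    shared_alleles = 0
    compared_alleles = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue
        shared_alleles += 2 - abs(a - b)  # 2, 1 or 0 alleles shared by state
        compared_alleles += 2
    if compared_alleles == 0:
        raise ValueError("no SNP was genotyped in both samples")
    return 1.0 - shared_alleles / compared_alleles


# Hypothetical genotypes at five SNPs (not data from this study).
old_cultivar = [0, 2, 2, 1, None]
modern_cultivar = [0, 0, 2, 1, 2]
print(ibs_distance(old_cultivar, modern_cultivar))
```

Averaging this value over all pairs of cultivars within or between release decades would give a group-level mean of the kind summarized in Table 2.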
Average assignment coefficients (Q1, Q2, Q3, and Q4) from the structure results were calculated for both germplasm pools. All accessions were sorted according to their origin and release decade (Figure 3). We observed large genomic changes across the decades in the Brazilian germplasm. Modern materials (2000-2010) had Q1, Q2, Q3, and Q4 values of 36.8%, 2.3%, 31.7%, and 26.0%, respectively, while old accessions (1950-1960s) had mean Q1, Q2, Q3, and Q4 values of 1.6%, 6.6%, 7.0%, and 84.7%, respectively. Q4 has decreased sharply since the 1990s, whereas Q1 and Q3 increased sharply over the same period. For the US genetic base, we observed an increase of Q3 and a decrease of Q2 over time. Old cultivars had Q1, Q2, Q3, and Q4 values of 36.0%, 33.7%, 12.3%, and 18.1%, respectively, while modern cultivars had Q1, Q2, Q3, and Q4 values of 24.3%, 17.5%, 40.3%, and 17.8%, respectively.
The changes during the 1990s became more evident when the PCA and structure results of the Brazilian genetic base were analyzed by release decade (Figure 4A and C). We observed an increase in the influence of Q2 in modern materials (2000-2010) compared with old materials (1950-1970). In contrast, the US genetic base showed little variation over time according to the average genetic distance (Table 2), the PCA, and the exclusive population structure analysis (Figure 4B and D). These results suggest a large influx of new alleles into the Brazilian germplasm after the 1990s.
Maturity genes under selection between Brazilian and United States cultivars
Seventy-two SNPs with FST ≥ 0.4 between Brazilian and United States cultivars were identified (Supplementary Table S1). These SNPs are located on chromosomes 1, 4, 6, 7, 9, 10, 12, 16, 18, and 19 (Supplementary Figure S2). Twenty-six 100-Kbp genomic regions with a high degree of differentiation between the Brazilian and US genetic bases were also found (Table 3). The results for Tajima’s D showed that these regions had balancing events that maintained the diversity of their bases. Two regions on chromosome 6 (47.3–47.4 Mbp and 47.3–47.4 Mbp) and another on chromosome 16 (31.10–31.20 Mbp) had little variation in Brazilian accessions (Supplementary Table S2). In contrast, the allele variation for most of the SNPs present in these genomic regions was higher in the US germplasm than in the Brazilian germplasm. The opposite scenario was observed for three other regions located on chromosomes 7 (6.30–6.40 Mbp), 16 (30.70–30.80 Mbp), and 19 (3.00–3.10 Mbp) (Supplementary Table S2). The allele variance in these three intervals was higher in the Brazilian genetic base than in the US germplasm.
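To make the FST ≥ 0.4 threshold concrete, the sketch below computes a simple Wright-style FST for one biallelic SNP from the alternative-allele frequencies of two groups and then applies the threshold. This is an illustrative simplification with equal group weights; the estimator actually used in the study is not stated in this section and may differ (for example, a Weir and Cockerham estimator). The SNP names and frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Wright-style FST for one biallelic SNP, two equally weighted groups.

    p1, p2: alternative-allele frequencies in group 1 and group 2.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic across both groups
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


FST_THRESHOLD = 0.40  # cut-off used in the text

# Hypothetical alternative-allele frequencies (group 1, group 2).
frequencies = {
    "snp_a": (0.90, 0.10),
    "snp_b": (0.55, 0.45),
    "snp_c": (0.02, 0.50),
}
high_fst = {
    name: wright_fst(p1, p2)
    for name, (p1, p2) in frequencies.items()
    if wright_fst(p1, p2) >= FST_THRESHOLD
}
print(high_fst)
```

Under this simplified estimator, only SNPs whose allele frequencies differ strongly between the two groups pass the 0.40 cut-off.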
Some SNPs had a large impact on the differentiation of the Brazilian and US genetic bases. These were located close to three important soybean maturity loci: E1 (Chr06: 20,207,077 to 20,207,940 bp), E2 (Chr10: 45,294,735 to 45,316,121 bp) and FT2a (Chr16: 31,109,999 to 31,114,963 bp) [13–15] (Figure 5). For the SNPs ss715607350 (Chr10: 44,224,500), ss715607351 (Chr10: 44,231,253), and ss715624321 (Chr16: 30,708,368), we found that the alternative allele was barely present in the US germplasm, whereas the Brazilian genetic base had an equal distribution of reference and alternative alleles. For the SNPs ss715624371 (Chr16: 31,134,540) and ss715624379 (Chr16: 31,181,902), the frequency of the alternative allele also remained low in the US germplasm. However, in contrast to the previous three SNPs, the alternative alleles of these two SNPs were present in more than 78% of the Brazilian accessions. Finally, the alternative alleles of the SNPs ss715593836 (Chr06: 20,019,602) and ss715593843 (Chr06: 20,353,073) were extremely rare in the Brazilian germplasm, with only 2% of the accessions carrying them. In contrast, the US germplasm had an equal distribution of reference and alternative alleles in its accessions. However, all accessions with the alternative alleles belonged to MGs lower than VI, with fewer than five cultivars in MG V.
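The SNP positions listed above can be related to the 100-Kbp genomic regions by integer division of the base-pair coordinate. The sketch below assumes that the windows are non-overlapping and aligned to multiples of 100,000 bp, which matches the interval boundaries reported in the text but is not stated explicitly; the helper names are hypothetical.

```python
from collections import defaultdict

WINDOW_BP = 100_000  # window size used for the genomic regions

# SNP positions (bp) as given in the text for chromosomes 10 and 16.
snps = {
    "ss715607350": ("Chr10", 44_224_500),
    "ss715607351": ("Chr10", 44_231_253),
    "ss715624321": ("Chr16", 30_708_368),
    "ss715624371": ("Chr16", 31_134_540),
    "ss715624379": ("Chr16", 31_181_902),
}


def window_start(position_bp, window_bp=WINDOW_BP):
    """Start coordinate (bp) of the aligned window containing position_bp."""
    return (position_bp // window_bp) * window_bp


# Group SNP identifiers by (chromosome, window start in bp).
windows = defaultdict(list)
for snp_id, (chrom, position) in snps.items():
    windows[(chrom, window_start(position))].append(snp_id)

for (chrom, start), members in sorted(windows.items()):
    end = start + WINDOW_BP
    print(f"{chrom} {start / 1e6:.2f}-{end / 1e6:.2f} Mbp: {', '.join(members)}")
```

Under this assumption the five SNPs fall into the intervals 44.20–44.30 Mbp on chromosome 10 and 30.70–30.80 and 31.10–31.20 Mbp on chromosome 16.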
Ten SNPs associated with gene-modifying mutations present in the Brazilian and US germplasm were identified; these were distributed on chromosomes 4, 6, 10, 12, 16, and 19 (Supplementary Table S3). These SNPs differed in allele frequency and could distinguish the two genetic bases. Six modifications had a clear influence on the maturity of the accessions, whereas two of these had a large influence in some decades of breeding (Supplementary Figure S3). The SNP ss715593833 had a haplotype similar to that of the two SNPs described close to the E1 locus (ss715593836 and ss715593843) due to the LD among them. At the end of this chromosome, we also observed three other relevant SNPs in LD: ss715594746, ss715594787, and ss715594990. In the US germplasm, we observed a decrease in the alternative allele in accessions with MG values below IV. We detected other relevant modifications on chromosome 12 for SNPs ss715613204 and ss715613207. Both SNPs had a minor allele frequency higher than 0.35 in the Brazilian germplasm, with an increase in the alternative allele in materials with MGs higher than VII. In contrast, the alternative alleles of both SNPs were barely present in the US germplasm except in accessions with MG higher than VII.
There were 312 genomic regions that differentiated the northern (MG 00–IV) and southern (MG V–IX) cultivar groups (Supplementary Table S4). Some important regions were less diverse in northern accessions, whereas nucleotide diversity was retained in southern cultivars. The genomic region close to the Dt1 gene is one example. We compared the SNPs observed in the genomic region close to the Dt1 gene (Chr19: 45.20–45.30 Mbp) with the growth habit phenotype data available for 284 lines at the USDA website (www.ars-grin.gov). The phenotypic data suggested that these SNPs were associated with the growth habit trait. Moreover, our diversity analysis indicated a putative selective sweep for the Dt1 gene in the northern germplasm, in which the dominant Dt1 allele is fixed; the southern lines tend to be more diverse than the northern US cultivars (Supplementary Table S5). In contrast, other genomic regions have lower nucleotide diversity in southern accessions than in northern accessions. An important disease resistance gene cluster was observed on chromosome 13, bearing four loci: Rsv1, Rpv1, Rpg1, and Rps3 [16–19]. In this interval, we observed two genomic regions (29.70–29.80 Mbp and 31.90–32.00 Mbp) under putative selective sweeps in the southern germplasm (Supplementary Table S6).
Besides these regions, 1,401 SNPs with FST values higher than 0.40 between northern and southern US cultivars were also identified (Supplementary Table S7). In addition, there were 23 SNPs with FST values higher than 0.70 spread across chromosomes 1, 3, 6, and 19. Seven of them were located close to another important soybean locus, E1, which is involved in soybean maturity control (Supplementary Table S8). These SNPs clearly differentiate northern and southern US cultivars, with the reference allele fixed in northern materials and the alternative alleles in southern accessions. Gene modifications in the US germplasm were also detected in our study. One hundred twenty-six SNPs identified in the FST analysis modified 125 genes (Supplementary Table S9).
Finally, we detected 1,557 SNPs with FST values higher than 0.40 between super-early cultivars (MG 00–0) and early cultivars (MG III–IV) (Supplementary Table S10). Seventeen SNPs with FST values higher than 0.70 were spread across chromosomes 4, 7, 8, and 10. The SNPs identified on chromosome 10 were close to the E2 locus. We also detected 168 SNPs associated with modifications in 164 genes (Supplementary Table S11).
Genetic diversity over time was higher in Brazilian modern cultivars than founder lines
We observed two SNPs with large differences in allele frequency in the Brazilian germplasm (Supplementary Figure S4). On chromosome 4, the SNP ss715588874 (50,545,890 bp) showed a decrease of the allele “A” in materials released after 2000, with only nine of the 45 Brazilian materials carrying this allele. A similar situation was observed on chromosome 19 for ss715633722 (3,180,152 bp), with half of the modern accessions carrying allele C. In the US genetic base, both SNPs had a similar distribution across decades, with the reference alleles predominating.
We also observed important results associated with the Brazilian genetic base. There were 126 genomic regions spread across almost all soybean chromosomes. The only exception was chromosome 20 (Supplementary Table S12). Our analysis of cultivars released before and after 1996 identified 30 putative regions under breeding sweep events. Thirteen regions had a decrease in diversity in modern cultivars according to the Tajima’s D and π results. Two genomic regions were observed close to important disease resistance loci: one on chromosome 13 (30.30–30.40 Mbp) close to an important resistance gene cluster (with Rsv1, Rpv1, Rpg1, and Rps3) [16–19] and another on chromosome 14 (1.70–1.80 Mbp) with a southern stem canker resistance locus [20,21]. In contrast, thirty-one genomic regions had an increase in diversity in modern materials, which suggests putative introgression events in these accessions. Two genomic regions were observed on chromosomes 2 (40.90 – 40.10 Mbp) and 9 (40.30–40.40 Mbp). These were previously reported to be associated with ureide content and iron nutrient content, respectively [22,23].
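Tajima's D, used above together with π to flag regions under putative sweeps, compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a sample-size constant. The sketch below implements the standard formula from Tajima (1989) as an illustration; it is not the authors' pipeline, and the input values in the example are hypothetical. Note that `pi` here is the mean number of pairwise differences summed over the region, not a per-site value.

```python
from math import sqrt


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences, seg_sites segregating sites (S) and
    pi mean pairwise differences (summed over the region)."""
    if n < 2 or seg_sites < 1:
        raise ValueError("need at least 2 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / sqrt(variance)


# Hypothetical region: 10 sequences, 10 segregating sites.
print(tajimas_d(10, 10, 6.0))  # pi above S/a1 gives a positive D
print(tajimas_d(10, 10, 2.0))  # pi below S/a1 gives a negative D
```

Positive values indicate an excess of intermediate-frequency alleles, as expected under balancing selection or population structure, whereas negative values indicate an excess of rare alleles, as expected after a selective sweep or population expansion.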
Besides these regions, there were also 409 SNPs with FST values higher than 0.40, distributed across all soybean chromosomes. There were 73 SNPs with FST values higher than 0.70 (Supplementary Table S13). Some of these SNPs were also reported to be associated with important soybean traits such as plant height, seed mass, water use efficiency, nutrient content, and ureide content [22–26].
We also identified gene modifications with a high impact on the Brazilian genetic base when we compared cultivars according to their release decade. Of the 409 SNPs identified in the FST analysis, 40 caused modifications in 39 soybean genes (Supplementary Table S14). Three SNPs with FST values higher than 0.70 were associated with non-synonymous modifications: ss715588896 (Glyma.04G239600 – a snoaL-like polyketide cyclase), ss715607653 (Glyma.10G051900 – a gene with a methyltransferase domain), and ss715632020 (Glyma.18G256700 – a PQQ enzyme repeat).