Development of high-quality SNP marker panels
Based on the analysis of 15 representative soybean accessions, three high-quality SNP marker panels, 40K, 20K and 10K, were developed for GBTS. The 40K, 20K and 10K marker panels consisted of 41541, 20748 and 9670 SNP markers, respectively, well distributed along the entire genome (Table S1). The average number of SNPs per chromosome was 2077, 1037 and 483 for the 40K, 20K and 10K marker panels, respectively. The number of SNPs per chromosome ranged from 1623 (chromosome 20) to 3189 (chromosome 18) for the 40K SNP marker panel, from 752 (chromosome 11) to 1576 (chromosome 18) for the 20K SNP marker panel, and from 348 (chromosome 11) to 623 (chromosome 18) for the 10K SNP marker panel (Table 1).
The SNP density per 1 Mb was higher in telomeric regions than in the centromeres of the Wm82.a2.v1 assembly (Fig. 1) because recombination events in the pericentromeric regions were infrequent, and the source panel, SoySNP50K, was purposely selected to reduce the number of SNPs in this region (Song et al. 2013).
The missing rate decreased as the amount of sequencing data for each panel increased (Fig. 2). A total of 1.2 Gbp, 0.6 Gbp and 0.4 Gbp of sequencing data was required to reduce the missing rate to less than 0.02 for the 40K, 20K and 10K panels, respectively (Fig. 2). The 20K and 10K SNP marker panels therefore required 50.00% and 33.33% of the sequencing data needed for the 40K panel, respectively, corresponding to reductions of 50.00% and 66.67%.
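The relative sequencing requirement of each panel can be checked directly from the three data volumes quoted above. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the original analysis.

```python
# Sequencing data (Gbp) needed to bring the missing rate below 0.02, as quoted in the text.
required_gbp = {"40K": 1.2, "20K": 0.6, "10K": 0.4}


def relative_to_40k(panel: str) -> tuple[float, float]:
    """Return (percent of the 40K data volume, percent reduction versus 40K)."""
    share = required_gbp[panel] / required_gbp["40K"] * 100
    return round(share, 2), round(100 - share, 2)


for panel in ("20K", "10K"):
    share, reduction = relative_to_40k(panel)
    print(f"{panel}: {share:.2f}% of the 40K volume, {reduction:.2f}% less data")
```

The two numbers returned for each panel sum to 100, which makes it easy to distinguish "percent of the 40K volume" from "percent reduction".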
Consistency of SNPs from different sequencing technologies among 15 soybean accessions
To examine the consistency of SNP alleles from GBTS, the 15 representative soybean accessions were sequenced twice with a 40K SNP GBTS panel, and SNP alleles from these replicates were compared (Table 2). The concordance of the SNP genotypes from repeat target sequencing averaged 99.87% and ranged from 99.28% in ZYD04569 with 41054 SNPs to 99.98% in Hobbit with 41293 SNPs. The concordance of SNP alleles of the 15 soybean accessions between 10× resequencing data and GBTS data averaged 98.86% and ranged from 95.64% for ZYD02878 with 39499 SNPs to 99.70% for Hobbit with 413149 SNPs (Table 3). The results suggested that the 40K SNP marker panel was highly reliable and consistent.
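The text does not state how concordance was calculated. A minimal sketch, assuming it is the percentage of identical genotype calls among the SNPs called in both data sets, is given below. The function name, the representation of missing calls as None, and the toy genotypes are all hypothetical.

```python
from typing import Optional


def concordance(calls_a: dict[str, Optional[str]],
                calls_b: dict[str, Optional[str]]) -> tuple[float, int]:
    """Percent of identical genotype calls over SNPs called in both data sets.

    Missing calls (None or absent keys) are excluded from the comparison.
    Returns (percent concordance, number of SNPs compared).
    """
    shared = [snp for snp in calls_a
              if calls_a[snp] is not None and calls_b.get(snp) is not None]
    if not shared:
        raise ValueError("no SNPs called in both data sets")
    matches = sum(calls_a[snp] == calls_b[snp] for snp in shared)
    return 100.0 * matches / len(shared), len(shared)


# Toy data only: marker names and genotypes are invented for illustration.
rep1 = {"snp1": "AA", "snp2": "GG", "snp3": "CT", "snp4": None, "snp5": "TT"}
rep2 = {"snp1": "AA", "snp2": "GG", "snp3": "CC", "snp4": "AA", "snp5": "TT"}
pct, n = concordance(rep1, rep2)
print(f"{pct:.2f}% concordant over {n} shared SNPs")
```

Reporting the number of compared SNPs alongside the percentage mirrors the way the per-accession SNP counts are given with each concordance value.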
Phylogenetic relationships revealed by different marker panels
Phylogenetic trees were constructed for the 15 representative soybean accessions using 40K, 20K and 10K SNP marker panels to further evaluate the quality of the SNP marker panels. As shown in Fig. 3, identical phylogenetic relationships among the soybean accessions were observed based on the 40K, 20K and 10K SNP markers. The wild soybean accessions were in one cluster, and the cultivated soybean accessions were in another cluster. Although all three GBTS panels could be used for germplasm evaluation, the 10K panel might be a better choice in terms of minimizing costs.
High-density genetic linkage map construction using the 10K SNP marker panel
Because recombination events are infrequent in bi-parental populations (Song et al. 2017), two RIL populations derived from HJ117 × Qihuang34 (HQ1734) and HJ117 × Xudou16 (HX1716) were sequenced using the 10K SNP marker panel. The HQ1734 linkage map comprised 2055 SNP markers and spanned 4490.83 cM, with an average interval of 2.19 cM between adjacent SNP markers. Among the 20 chromosomes, chromosome 18 had the largest number of SNP markers (164), with a genetic length of 238.58 cM, and chromosome 16 had the fewest SNP markers (57), with a genetic length of 108.70 cM (Table 4). The linkage map of HX1716 consisted of 2111 SNP markers (Table 5). Its total length and average genetic distance between two adjacent SNP markers were 4798.81 cM and 2.27 cM, respectively. The largest number of SNP markers was 134 on chromosome 18, and the smallest was 71 on chromosomes 04 and 12. A relatively high degree of collinearity between the SNP positions in the reference genome and the genetic maps was observed in both RIL populations, as the plots of genetic distance against physical position followed a distinct diagonal in each population (Fig. 4).
Table 4
Genetic map length, physical distance (bp), number of bins and intervals of adjacent markers along each chromosome in the HQ1734 population
Chr | Genetic map length (cM) | Physical distance (bp) | Number of bins (#) | Average interval between adjacent markers (cM) |
Chr01 | 251.87 | 55751910 | 110 | 2.29 |
Chr02 | 295.83 | 51461993 | 153 | 1.93 |
Chr03 | 164.87 | 41705654 | 58 | 2.84 |
Chr04 | 263.58 | 48192113 | 84 | 3.14 |
Chr05 | 213.3 | 38703433 | 100 | 2.13 |
Chr06 | 245.37 | 49064024 | 98 | 2.5 |
Chr07 | 214.96 | 43921246 | 74 | 2.9 |
Chr08 | 298.82 | 46188257 | 114 | 2.62 |
Chr09 | 212.65 | 46139114 | 73 | 2.91 |
Chr10 | 238.3 | 50286433 | 127 | 1.88 |
Chr11 | 236.66 | 38005422 | 85 | 2.78 |
Chr12 | 208.38 | 39921557 | 92 | 2.27 |
Chr13 | 283.81 | 42444488 | 111 | 2.56 |
Chr14 | 209.52 | 49297872 | 112 | 1.87 |
Chr15 | 214.93 | 50829911 | 100 | 2.15 |
Chr16 | 108.7 | 29135922 | 57 | 1.91 |
Chr17 | 209.97 | 41654429 | 111 | 1.89 |
Chr18 | 238.58 | 62258912 | 164 | 1.45 |
Chr19 | 172.31 | 45673217 | 106 | 1.63 |
Chr20 | 208.42 | 46186882 | 126 | 1.65 |
Total | 4490.83 | 916822789 | 2055 | 2.19 |
Table 5
Genetic map length, physical distance (bp), number of bins and intervals of adjacent markers along each chromosome in the HX1716 population
Chr | Genetic map length (cM) | Physical distance (bp) | Number of bins (#) | Average interval between adjacent markers (cM) |
Chr01 | 219.35 | 55751910 | 93 | 2.36 |
Chr02 | 260.35 | 49724662 | 127 | 2.05 |
Chr03 | 215.72 | 47702654 | 110 | 1.96 |
Chr04 | 142.68 | 40491114 | 71 | 2.01 |
Chr05 | 244.86 | 38596037 | 78 | 3.14 |
Chr06 | 288.68 | 50593128 | 113 | 2.55 |
Chr07 | 254.79 | 44551009 | 91 | 2.8 |
Chr08 | 297.83 | 45270892 | 97 | 3.07 |
Chr09 | 236.34 | 44902894 | 127 | 1.86 |
Chr10 | 254.59 | 49675543 | 131 | 1.94 |
Chr11 | 254.35 | 37954341 | 93 | 2.73 |
Chr12 | 234.41 | 40032883 | 71 | 3.3 |
Chr13 | 299.8 | 42444488 | 129 | 2.32 |
Chr14 | 231.33 | 48934431 | 117 | 1.98 |
Chr15 | 223.2 | 50805033 | 107 | 2.09 |
Chr16 | 195.74 | 36802736 | 102 | 1.92 |
Chr17 | 261.02 | 41654429 | 114 | 2.29 |
Chr18 | 237.07 | 62119973 | 134 | 1.77 |
Chr19 | 224.39 | 49145388 | 110 | 2.04 |
Chr20 | 222.31 | 46505454 | 96 | 2.32 |
Total | 4798.81 | 923658999 | 2111 | 2.27 |
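The average intervals in Tables 4 and 5 can be reproduced by dividing the genetic map length by the number of markers (bins) on the chromosome, rather than by the number of gaps between them. The sketch below checks a few rows; the dictionary layout and function name are illustrative only.

```python
# (genetic map length in cM, number of markers/bins) copied from Tables 4 and 5.
maps = {
    "HQ1734 total": (4490.83, 2055),
    "HQ1734 Chr18": (238.58, 164),
    "HQ1734 Chr16": (108.70, 57),
    "HX1716 total": (4798.81, 2111),
    "HX1716 Chr12": (234.41, 71),
}


def average_interval(length_cm: float, n_markers: int) -> float:
    """Average spacing as tabulated: map length divided by marker count."""
    return round(length_cm / n_markers, 2)


for name, (length_cm, n_markers) in maps.items():
    print(f"{name}: {average_interval(length_cm, n_markers):.2f} cM")
```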
Genetic analysis and QTL identification for the 100-seed weight trait in soybean
The 100-seed weight of three representative plants from each line of HQ1734 and HX1716 was determined and used for QTL identification. As shown in Table 6, the 100-seed weight ranged from 16.60 g to 31.75 g in HQ1734 and from 11.45 g to 28.05 g in HX1716. The distributions of the 100-seed weight were close to normal, with skewness and kurtosis of 0.16 and 0.05 in HQ1734 and −0.19 and −0.09 in HX1716, respectively. The broad-sense heritability (h2b) was 0.96 and 0.92 for the HQ1734 and HX1716 populations, respectively, demonstrating that the variation in 100-seed weight caused by experimental error was small.
Table 6
Phenotypic variation in 100-seed weight in the HQ1734 and HX1716 populations
Population | Mean | SD a | Max b | Min b | CV% c | Skew d | Kurt d | h2b e |
HQ1734 | 22.81 | 2.68 | 31.75 | 16.60 | 11.75 | 0.16 | 0.05 | 0.96 |
HX1716 | 20.00 | 3.20 | 28.05 | 11.45 | 16.00 | -0.19 | -0.09 | 0.92 |
a SD, standard deviation
b Max and Min, the maximum and minimum values of 100-seed weight, respectively
c CV%, coefficient of variation
d Skew and Kurt, the skewness and kurtosis of 100-seed weight, respectively
e h2b, broad-sense heritability
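The CV% values in Table 6 follow from the tabulated SD and mean. The estimator used for skewness and kurtosis is not stated, so the sketch below assumes the simple moment-based forms, with kurtosis reported as excess kurtosis (0 for a normal distribution). The function names and the toy weights are hypothetical.

```python
from statistics import mean


def cv_percent(sd: float, avg: float) -> float:
    """Coefficient of variation in percent: 100 * SD / mean."""
    return round(100 * sd / avg, 2)


def skew_and_excess_kurtosis(values: list[float]) -> tuple[float, float]:
    """Moment-based skewness and excess kurtosis (assumed estimator)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# CV% recomputed from the SD and mean in Table 6.
print(cv_percent(2.68, 22.81), cv_percent(3.20, 20.00))

# Toy 100-seed weights (g), invented for illustration; symmetric, so skewness is 0.
toy_weights = [18.0, 20.0, 22.0, 24.0, 26.0]
skew, kurt = skew_and_excess_kurtosis(toy_weights)
print(skew, kurt)
```

Values of both statistics close to 0, as reported for the two populations, are consistent with an approximately normal trait distribution.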
QTL analysis in the two RIL populations identified a common stable locus on chromosome 06, Locus_OSW_06 (Table 7). In HQ1734, qOSW-34 was mapped to the 6546570–6616833 bp interval between the markers Gm06_6611603_A_G and Gm06_6541348_G_T with an LOD value of 4.85, and explained 9.83% of the phenotypic variation. In HX1716, qOSW-16 was detected in the 6192925–6375580 bp interval between the markers Gm06_6187779_T_C and Gm06_6370390_G_A with an LOD value of 4.09, and explained 7.05% of the phenotypic variation.
Table 7
Putative QTLs detected for 100-seed weight in the HQ1734 and HX1716 populations
Locus | QTL a | Chr | Position (bp) | Marker or interval b | LOD c | PVE(%) d | Add e |
Locus_OSW_06 | qOSW-34 | 06 | 6546570-6616833 | Gm06_6611603_A_G-Gm06_6541348_G_T | 4.85 | 9.83 | -0.95 |
Locus_OSW_06 | qOSW-16 | 06 | 6192925-6375580 | Gm06_6187779_T_C-Gm06_6370390_G_A | 4.09 | 7.05 | -0.81 |
a qOSW-34 and qOSW-16, the QTLs detected for 100-seed weight in the HQ1734 and HX1716 populations, respectively
b Marker or interval, markers or support intervals on the linkage map in which the LOD is the largest
c LOD, logarithm of odds
d PVE (%), percentage of phenotypic variance explained by the QTL
e Add, additive effects; negative values represent increasing effects of the QTLs derived from HJ117
Cost of GBTS compared with GBS and DNA chips
High cost, especially for genotyping, is the major constraint for breeders in soybean breeding programs. The genotyping costs of GBTS, GBS and DNA chips were therefore compared (Table 8). According to the Mol Breeding Company, the costs for DNA extraction, library construction, probe hybridization and labor were $0.44, $2.94, $2.21 and $1.47 per sample, respectively, for each GBTS panel; however, the costs for sequencing and bioinformatics analysis varied depending on the panel. Relative to the 20K and 10K panels, the 40K panel cost 50.00% and 200.00% more for sequencing, 150.00% and 400.00% more for bioinformatics analysis, and 42.86% and 150.00% more for equipment depreciation, respectively. The total genotyping costs of the 40K, 20K and 10K GBTS panels, GBS with a sequence depth of 2× and DNA chips containing 50K SNP markers were $13.68, $11.32, $9.26, $14.41 and $32.79 per sample, respectively. GBS and DNA chips therefore cost 5.38% and 139.78% more than the 40K panel, 27.27% and 189.61% more than the 20K panel, and 55.56% and 253.97% more than the 10K panel, respectively. GBTS, and the 10K marker panel in particular, has a cost advantage over GBS and DNA chips.
Table 8
The genotyping cost (US$ per sample) for different sequencing technologies
Procedure | GBTS a (40K) | GBTS a (20K) | GBTS a (10K) | GBS b | DNA chips c |
DNA extraction | 0.44 | 0.44 | 0.44 | 0.44 | 0.44 |
Library construction | 2.94 | 2.94 | 2.94 | 5.88 | 29.41 |
Probe hybridization | 2.21 | 2.21 | 2.21 | 0.00 |
Sequencing | 4.41 | 2.94 | 1.47 | 4.41 |
Bioinformatics analysis | 0.74 | 0.29 | 0.15 | 0.74 |
Labor | 1.47 | 1.47 | 1.47 | 1.47 | 1.47 |
Depreciation cost | 1.47 | 1.03 | 0.59 | 1.47 | 1.47 |
Total | 13.68 | 11.32 | 9.26 | 14.41 | 32.79 |
a GBTS, genotyping by target sequencing
b GBS, genotyping by sequencing, where the sequence depth is 2×
c DNA chips, the DNA chips contained 50K SNP markers with a genotyping cost of $32.79
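The relative cost differences can be recomputed from the per-sample totals in Table 8. The sketch below expresses each gap in two ways: as the extra cost of the alternative relative to the GBTS panel, which gives values close to the percentages quoted in the text, and as the saving relative to the alternative. Because the table values are rounded US$ amounts, results can differ from the quoted figures in the first or second decimal place. The function names are illustrative.

```python
# Per-sample totals (US$) copied from Table 8.
totals = {"GBTS 40K": 13.68, "GBTS 20K": 11.32, "GBTS 10K": 9.26,
          "GBS": 14.41, "DNA chips": 32.79}


def percent_more(other: float, panel: float) -> float:
    """Extra cost of the alternative, relative to the GBTS panel."""
    return round(100 * (other - panel) / panel, 2)


def percent_saved(other: float, panel: float) -> float:
    """Saving achieved by the GBTS panel, relative to the alternative."""
    return round(100 * (other - panel) / other, 2)


for panel in ("GBTS 40K", "GBTS 20K", "GBTS 10K"):
    for other in ("GBS", "DNA chips"):
        more = percent_more(totals[other], totals[panel])
        saved = percent_saved(totals[other], totals[panel])
        print(f"{other} vs {panel}: {more:.2f}% more, {saved:.2f}% saved")
```

A saving relative to the alternative can never exceed 100%, whereas the extra cost relative to the cheaper panel can, which is why the two measures should be labeled explicitly.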