Pedigree analysis and SNP density map construction
Tracing a pedigree back through several generations helps clarify the relationship between parental lines. To determine the genealogical relationship between the two parents, HN 84 and KF 17, a pedigree tree was constructed for each. The pedigree charts showed that HN 84 and KF 17 inherited the lineages of 39 and 18 elite accessions, respectively, collected from domestic and foreign resources (Yang et al. 2012, Luan et al. 2018). Notably, the two parents shared three soybean varieties in their pedigrees: the Japanese cultivar Tokachi nagaha and the Chinese cultivars Fengshou No.6 and No.10 (Fig. 1a). This suggests that the two parents have similar genetic backgrounds, which reduces background noise and allows more accurate and efficient genetic analysis.
SNPs are now well recognized as the highest-density molecular markers and are ideal for genetic studies. To further characterize the loci that vary between the two parents, a total of 128,820 SNPs between the two parents were identified across the whole genome. An SNP density map showing the number of SNPs within 1 Mb windows across the twenty chromosomes was then constructed (Fig. 1b). On chromosomes Gm01, Gm02, Gm04, Gm05, Gm08, Gm11, and Gm12, the proportion of polymorphic SNPs relative to the total number of SNPs on each chromosome was below 2.50%, and the polymorphic SNPs tended to cluster together. On Gm18 and Gm19, the proportions were 9.54% and 10.93%, respectively. On Gm15, the proportion was 12.77%, markedly higher than on the other chromosomes, which is useful for linkage analysis between genetic markers and trait loci.
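The windowed counting behind such a density map can be illustrated with a minimal sketch. It assumes SNPs are available as (chromosome, position) pairs with 1-based positions; the function name `snp_density` and the example positions are hypothetical and are not taken from this study's data or pipeline.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1 Mb window, matching the window size of the density map


def snp_density(snps, window_bp=WINDOW_BP):
    """Count SNPs per (chromosome, window index).

    `snps` is an iterable of (chromosome, position_bp) pairs with 1-based
    positions. Window k covers positions k*window_bp + 1 .. (k+1)*window_bp.
    """
    counts = Counter()
    for chrom, pos in snps:
        counts[(chrom, (pos - 1) // window_bp)] += 1
    return counts


# Hypothetical positions, for illustration only (not data from this study).
example_snps = [
    ("Gm15", 120_500),
    ("Gm15", 980_001),
    ("Gm15", 1_000_000),
    ("Gm15", 1_000_001),
    ("Gm19", 45_300_777),
]
density = snp_density(example_snps)
```

Each key of `density` identifies one 1 Mb window on one chromosome, so the counts can be plotted directly along each chromosome.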
High-density SNP genetic map construction
Out of 158,959 SNPs between the two parents, 13,112 high-quality SNPs selected on the basis of MLOD values were successfully integrated into the 20 linkage groups (Gms) of soybean. After linkage analysis, a final genetic map spanning 2906.29 cM with an average distance of 2.13 cM/marker was constructed (Fig. 2a). The number of SNP markers in the integrated map ranged from 132 (Gm12, 0.52 cM/marker) to 1,778 (Gm19, 0.08 cM/marker). Gm01, the shortest chromosome, contained 146 markers and had a genetic map distance of 54.54 cM (i.e., an average inter-marker distance of 0.37 cM/marker). Gm06, the longest chromosome, contained 712 markers and had a genetic map distance of 279.15 cM (i.e., an average inter-marker distance of 0.39 cM/marker). The largest and smallest genetic gaps were on Gm03 and Gm19, at 124.24 cM and 0.14 cM, respectively. The highest-density group was Gm19, which contained 1,778 markers at an average density of 12.38 markers/cM. The lowest-density group was Gm08, which contained 391 markers at an average density of 1.77 markers/cM (Supplemental Table S2). We then analyzed the collinearity between the genetic map and the physical map. A flat region was observed around the centromere (20 to 40 Mb) on Gm15 (Supplementary Figure S1). We also examined pairwise recombination fractions and LOD scores for 1,384 bins using a heatmap. Gm15 comprised two linkage groups, 110.70 and 70.87 cM in length, respectively (Fig. 2b). These results indicated that the genetic map was suitable for subsequent QTL mapping.
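The per-chromosome spacing values can be reproduced with simple arithmetic. The sketch below assumes that the average inter-marker distance is the map length divided by the number of markers (dividing by the number of intervals, n - 1, is another common convention); the function names are illustrative only.

```python
def average_spacing(length_cm, n_markers):
    """Average inter-marker distance in cM/marker (map length / marker count)."""
    return length_cm / n_markers


def marker_density(length_cm, n_markers):
    """Marker density in markers/cM, the reciprocal of the average spacing."""
    return n_markers / length_cm


# Map length (cM) and marker count as reported in the text for two chromosomes.
reported = {"Gm01": (54.54, 146), "Gm06": (279.15, 712)}

spacing = {
    chrom: round(average_spacing(length, n), 2)
    for chrom, (length, n) in reported.items()
}
# spacing == {"Gm01": 0.37, "Gm06": 0.39}, matching the values given in the text
```

Under this assumption, the computed spacings agree with the 0.37 and 0.39 cM/marker reported for Gm01 and Gm06.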
Identification of a locus controlling seed oil content
To clarify the difference in seed oil content between the two parents, we evaluated the seed oil content of HN 84 and KF 17 grown in Harbin in two years (2020 and 2021). In 2020, the seed oil contents of HN 84 and KF 17 were 20.13% ± 0.62% and 22.46% ± 1.26%, respectively, so that the seed oil content of KF 17 was 1.11-fold higher than that of HN 84. In 2021, the seed oil contents of HN 84 and KF 17 were 20.04% ± 0.62% and 21.86% ± 1.23%, respectively, so that the seed oil content of KF 17 was 1.08-fold higher than that of HN 84. These results showed that the higher seed oil content of KF 17 was stable across the two years (Fig. 3b).
To identify the genetic factors influencing seed oil content in HN 84 and KF 17, we constructed a RIL population of 200 individuals derived from a cross between HN 84 and KF 17 and planted it in Harbin in two years (2020 and 2021) (Supplemental Table S3, Supplemental Table S4). Combined analysis of genotype and seed oil content phenotype identified three QTLs in 2020, located on Gm05, Gm07, and Gm15, with PVE values of 8.26%, 7.03%, and 10.44%, respectively (Fig. 3e, Supplemental Table S5). In 2021, three QTLs were identified on Gm10, Gm15, and Gm18, with PVE values of 5.60%, 10.52%, and 6.08%, respectively (Fig. 3f, Supplemental Table S6). A QTL on Gm15 was therefore detected in both years, with intervals of 8.03 to 9.10 cM and 6.70 to 7.50 cM explaining approximately 10.52% and 10.44% of the PVE, respectively. Because QTLs located less than 5 cM apart were regarded as the same QTL, these two were treated as a single locus, which we named Seed oil content 15 (qOil_15) (Fig. 3e, f, Fig. 4a, Supplemental Table S5, Supplemental Table S6).
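The 5 cM rule applied to the two Gm15 intervals can be sketched as follows. This is an illustration only: it assumes that the distance between two QTLs is measured as the gap between their interval boundaries (the distance between peak positions would be an alternative definition), and the function names are hypothetical.

```python
def interval_gap(a, b):
    """Gap in cM between two (start, end) intervals; 0.0 if they overlap."""
    (_, first_end), (second_start, _) = sorted([a, b])
    return max(0.0, second_start - first_end)


def same_qtl(a, b, max_gap_cm=5.0):
    """Treat two QTL intervals as one locus if they lie less than max_gap_cm apart."""
    return interval_gap(a, b) < max_gap_cm


# The two Gm15 intervals reported in the text, in cM.
gm15_intervals = [(8.03, 9.10), (6.70, 7.50)]
gap = interval_gap(*gm15_intervals)
merged = same_qtl(*gm15_intervals)
```

Under this assumption the two intervals are separated by about 0.53 cM, well below the 5 cM threshold, so they collapse into a single locus.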
Analysis of seed oil content candidate genes
Based on the soybean reference genome (Glycine max Wm82.a2.v1), the QTL interval between Gm15_50362321 and Gm15_50609859, spanning 247.54 kb on Gm15, harbored 20 genes, i.e., open reading frames (ORFs) 1–20 (Fig. 4a). To characterize the sequence variation within this interval between each parent and the reference genome, WGR was performed on the parents. After filtering and comparison with the reference genome, two insertions and 908 SNPs were identified in HN 84 and KF 17. Although 16 SNPs in the coding regions of several other genes caused non-synonymous mutations, the critical variants, the two insertions, were both located in the coding region of Glyma.15g268100 and resulted in an elongation of three amino acids in KF 17 (Fig. 4b; Supplemental Table S7). Glyma.15g268100 is identical to the previously reported gene GmRNF1a (Yang et al. 2022), which encodes a RING-type E3 ubiquitin ligase containing a zinc ribbon domain and a U-box domain. In addition, expression analysis in soybean lines differing in seed oil content showed that Glyma.15g268100 was expressed at higher levels in the seed than the other nineteen candidate genes (Fig. 4c; Supplemental Table S8). According to Gene Expression Nebulas (https://ngdc.cncb.ac.cn/gen/browse/genes), the co-expression heat map showed that Glyma.15g268100 had a higher expression level during seed development, especially at S7-S9, than in other tissues (Fig. 4d). These results implied that Glyma.15g268100 might be a candidate gene affecting seed oil content.
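The physical size of the QTL interval follows directly from the two flanking markers. The sketch below assumes that marker names encode the chromosome and the base-pair position as `chromosome_position`, as the names above suggest; the helper functions are hypothetical.

```python
def parse_marker(marker_id):
    """Split a marker name such as 'Gm15_50362321' into (chromosome, position_bp)."""
    chrom, pos = marker_id.rsplit("_", 1)
    return chrom, int(pos)


def interval_length_kb(left_marker, right_marker):
    """Physical distance in kb between two markers on the same chromosome."""
    left_chrom, left_pos = parse_marker(left_marker)
    right_chrom, right_pos = parse_marker(right_marker)
    if left_chrom != right_chrom:
        raise ValueError("markers are on different chromosomes")
    return abs(right_pos - left_pos) / 1000


length_kb = interval_length_kb("Gm15_50362321", "Gm15_50609859")
# 50,609,859 - 50,362,321 = 247,538 bp, i.e. about 247.54 kb
```

The result agrees with the 247.54 kb interval stated in the text.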