Association mapping of seed vigor in spring soybean (Glycine max (L.) Merr.) in northeast China

High-vigor soybean seeds can produce rapid and uniform emergence under a wide range of field conditions. We aimed to explore the quantitative trait loci (QTL) of seed vigor traits and to identify superior alleles and their carrier materials in soybean from northeast China. A total of 257 improved lines and 104 soybean landraces were evaluated for seed vigor in 2018 and 2019 using normal seedling fresh weight, normal seedling length, main root length, hypocotyl length, and germination percentage. Different degrees of linkage disequilibrium (LD) were detected not only among syntenic markers but also among nonsyntenic ones, suggesting historical recombination among linkage groups. A total of 19 and 10 simple sequence repeat (SSR) markers associated with seed vigor were detected in the improved lines and soybean landraces in 2018 and 2019, respectively. Five SSR markers were co-associated with two or three vigor traits. In addition, 41 superior alleles and their carrier materials were mined. Based on the elite alleles detected, the best cross combinations for improving seed vigor traits were proposed. The results demonstrate that association mapping provides a valuable foundation for molecular breeding for soybean seed vigor.


Introduction
Soybean (Glycine max (L.) Merr.) is an important oil crop that provides 69 % of plant protein and 30 % of plant oil worldwide 1,2. Good production levels are essential, and high-vigor varieties can increase soybean yield by 20-30 % 3,4. Seed vigor is reflected in germination potential, field emergence rate, and storage tolerance under a range of conditions 5. Seeds with high vigor show increased seed reserve capacity during germination, which better supports seedling development 6. Furthermore, high-vigor seeds have a strong competitive advantage over weeds, whereas low-vigor seeds have a low germination rate, weak resistance, and low yield 7,8. Therefore, clarifying the genetic structure of soybean and exploring elite alleles for the seed germination stage will promote genetic improvement of soybean seed vigor.
Seed vigor is a complex trait manifested mainly in seed germination rate, germination percentage (GP), seedling length (SL), root length (RL), seedling fresh weight (FW), and seed longevity 5. The genetic mechanisms of seed vigor have been examined in crops such as rice 9,10, maize 11,12, and Arabidopsis 13. These studies have shown that seed vigor is controlled by multiple genes, and many quantitative trait loci (QTL) for seed vigor have been identified 12,14. The ability of seeds to resist aging is often used to measure seed vigor. Very little is known about the genetic control of seed vigor in soybean. Zhang et al 15 used two recombinant inbred line (RIL) populations to evaluate three seed germination traits after artificial aging treatment to uncover the genetic basis of seed vigor in soybean. Thirty-four QTL were identified on 11 chromosomes, and 21 of them were clustered in five QTL-rich regions on chromosome 3 (Chr3), Chr5, Chr17, and Chr18. Dargahi et al 16 constructed an F2:3 population from a cross between a poor-longevity soybean and a landrace cultivar with higher longevity and detected three QTL and 13 simple sequence repeat (SSR) loci associated with the relative germination rate and storability of seeds. Singh et al 17 identified QTL for seed longevity in soybean using aged seeds of 153 F2:3 lines collected from replicated trials; four SSR markers were associated with seed longevity, explaining 6.3 % (Satt285) to 7.5 % (Satt434) of the phenotypic variation. All of these QTL were detected by family-based linkage mapping, the main method for studying the genetic basis of quantitative traits in plants. However, this method has limitations: allelic variation cannot be detected at loci that do not segregate or recombine between the two parents 18.
Association mapping identifies QTL based on linkage disequilibrium (LD) by examining marker-trait associations in natural populations. It is an effective way to mine elite alleles because it requires less research time, offers high mapping resolution, and facilitates allele discovery 19,20. In soybean, association mapping has detected QTL for important quantitative traits such as yield 21,22, plant type 23-25, oil content 26, and resistance 27,28.
However, no study has yet used association mapping to examine the genetic mechanisms of seed vigor in soybean. In this study, 361 soybean accessions genotyped with 175 SSR markers were used for association mapping of seed vigor. FW, SL, RL, hypocotyl length (HL), and GP were measured in 2018 and 2019 as indicators of seed vigor. The objectives of this study were (1) to investigate phenotypic and genetic diversity; (2) to determine the extent of LD in the improved-line and landrace populations; (3) to detect SSR loci controlling seed vigor and to mine elite alleles and their carrier materials; and (4) to propose cross combinations for improving seed vigor traits.

Plant materials and field planting
Of the 361 soybean accessions, 257 were improved lines and 104 were landraces, all from four provinces in northeast China (Tables S1 and S2). The field experiments were carried out at the Hulan Experimental Station of Heilongjiang University (Harbin), Heilongjiang Province, China. Seeds were sown on May 10 in both 2018 and 2019. Each accession was planted in two rows 0.5 m apart, with 10 plants per row, in a randomized design. The harvested seeds of the 361 accessions were dried to a moisture content of approximately 13 % and then used to evaluate seed vigor.

Investigation Of Traits
Fifty seeds of each genotype were tested. Seeds were pretreated and germinated under the conditions described by Wang et al 29. Seeds were cultivated in 13×19×12 cm germination boxes in a germination room for 11 days using the between-paper (BP) method, and the test was repeated twice. The following five vigor-related traits were measured on the five fastest-growing seedlings of each genotype: GP, normal SL, main RL, HL, and normal seedling FW. The mean of these five seedlings was used for analysis. GP was calculated as the percentage of seeds that had germinated after 3 days, GP (%) = (number of seeds germinated after 3 days / total number of seeds) × 100, following the germination standard described by Wang et al 30.
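The GP formula and the five-seedling averaging can be sketched in Python as below. This is a minimal illustration, not the authors' software: the function names, the dictionary layout of seedling records, and the use of seedling length (SL) to rank the "fastest-growing" seedlings are all assumptions.

```python
from statistics import mean


def germination_percentage(germinated_day3: int, total_seeds: int) -> float:
    """GP (%) = seeds germinated after 3 days / total seeds x 100."""
    if total_seeds <= 0:
        raise ValueError("total_seeds must be positive")
    if not 0 <= germinated_day3 <= total_seeds:
        raise ValueError("germinated_day3 must lie between 0 and total_seeds")
    return 100.0 * germinated_day3 / total_seeds


def trait_means_of_fastest(seedlings, n=5, rank_by="SL"):
    """Average every trait over the n seedlings with the largest `rank_by` value.

    Assumption: "fastest-growing" is approximated by the longest seedlings.
    Each seedling is a dict such as {"SL": 18.2, "RL": 9.1, "HL": 5.0, "FW": 0.9}.
    """
    if len(seedlings) < n:
        raise ValueError(f"need at least {n} seedlings")
    fastest = sorted(seedlings, key=lambda s: s[rank_by], reverse=True)[:n]
    return {trait: mean(s[trait] for s in fastest) for trait in fastest[0]}
```

For example, 45 germinated seeds out of 50 give `germination_percentage(45, 50) == 90.0`. Ranking once and averaging every trait over the same five seedlings keeps the traits measured on a common set of plants.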

DNA Extraction And SSR Marker
Genomic DNA was extracted from the primary leaves of soybean seedlings using the cetyl trimethylammonium bromide (CTAB) method 31. A total of 175 SSR markers were synthesized by Shanghai Generay Biotech Co. Ltd., Shanghai, China, following the sequences published on the SoyBase website (https://www.soybase.org/). Each polymerase chain reaction (PCR) was carried out in a 10 µL mixture consisting of 1.5 µL of DNA, 1.5 µL of forward and reverse primers, 5.7 µL of ddH2O, 0.2 µL of dNTPs, 1.0 µL of PCR buffer, and 0.1 µL of Taq DNA polymerase. The PCR program consisted of pre-denaturation at 94°C for 5 min; 35 cycles of denaturation at 94°C for 30 s, annealing at 42-61°C for 30 s, and extension at 72°C for 30 s; and a final extension at 72°C for 5 min. The PCR products were separated on 8 % polyacrylamide gels and visualized by silver staining 32-34. Population structure was analyzed using STRUCTURE software with a burn-in period of 10,000 and a Markov chain Monte Carlo (MCMC) run length of 100,000. Six independent runs were performed for each number of subpopulations (K), with K ranging from 2 to 10. The optimal K was determined from Delta K using Evanno's method 35. Relative kinship was estimated using SPAGeDi software 36.
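Evanno's Delta K can be computed from the Ln P(D) values of the replicate STRUCTURE runs. The sketch below assumes the runs are already parsed into a dictionary keyed by K (the parsing step and the name `evanno_delta_k` are hypothetical). It uses the mean Ln P(D) across runs for the second-order difference, a common simplification of Evanno et al.'s formula, and Delta K is undefined at the smallest and largest K tested.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: Delta K} from {K: [Ln P(D) of each replicate run]}.

    Uses Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)),
    where sd is the sample standard deviation across runs at K.
    Requires consecutive K values and at least two runs per K.
    """
    ks = sorted(lnp_by_k)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive")
    m = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:  # the end points lack a neighbour on one side
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
        sd = stdev(lnp_by_k[k])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

With six runs for each K from 2 to 10, this yields Delta K for K = 3 to 9, and the candidate optimum is `max(delta, key=delta.get)`.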
Linkage Disequilibrium And Association Analysis
LD was calculated using TASSEL v2.1 software 37. Pairwise standardized disequilibrium coefficients (D') were used to measure the level of LD between pairs of SSR markers 38,39. An LD decay plot of D' against genetic distance (cM) was drawn for intrachromosomal marker pairs, and the LD decay distance was defined as the distance at which D' dropped to half its maximum value 22. The general linear model (GLM) was used to test associations between phenotypes and markers, and markers associated with seed vigor traits at P < 0.01 were retained. The phenotypic effect of each allele was estimated relative to the "null allele" 40,41.
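For illustration, D' between two multiallelic SSR loci can be computed as below. This is a sketch, not TASSEL's implementation: it assumes the accessions are homozygous inbreds, so each accession contributes one haplotype (its allele at locus A paired with its allele at locus B), and it summarizes the allele-pair D' values with Hedrick's frequency-weighted average. TASSEL's own handling of multiallelic markers and its significance test are not reproduced.

```python
from collections import Counter


def multiallelic_d_prime(locus_a, locus_b):
    """Frequency-weighted D' (Hedrick 1987) between two SSR loci.

    locus_a[k] and locus_b[k] are the alleles (e.g. fragment sizes) of
    accession k; accessions with a missing allele (None) are skipped.
    """
    if len(locus_a) != len(locus_b):
        raise ValueError("loci must cover the same accessions")
    pairs = [(a, b) for a, b in zip(locus_a, locus_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no accessions with data at both loci")
    n = len(pairs)
    p = Counter(a for a, _ in pairs)  # allele counts at locus A
    q = Counter(b for _, b in pairs)  # allele counts at locus B
    h = Counter(pairs)                # haplotype (allele-pair) counts
    total = 0.0
    for i, ci in p.items():
        pi = ci / n
        for j, cj in q.items():
            qj = cj / n
            d = h[(i, j)] / n - pi * qj
            # D_max depends on the sign of D (Lewontin's normalization)
            if d > 0:
                d_max = min(pi * (1 - qj), (1 - pi) * qj)
            else:
                d_max = min(pi * qj, (1 - pi) * (1 - qj))
            if d_max > 0:
                total += pi * qj * abs(d) / d_max
    return total
```

Two loci whose alleles always co-occur give D' = 1, and loci whose alleles are combined independently give D' = 0.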

Phenotypic evaluation and correlation
In both groups, the absolute values of kurtosis and skewness were less than 1, indicating that the traits were approximately normally distributed. Pearson correlation analysis revealed significant positive and negative correlations among the five traits in both the improved lines and the soybean landraces (Table 2). HL was not significantly correlated with FW or SL but was significantly negatively correlated with RL (Table 2).
All 175 SSR markers were polymorphic, and a total of 844 alleles were detected. The number of alleles per marker ranged from 2 to 12, with a mean of 4.82. Genetic diversity ranged from 0.01 to 0.85 (mean 0.50), and polymorphic information content (PIC) ranged from 0.01 to 0.85 (mean 0.45) (Table S3). The number of alleles, genetic diversity, and PIC were all higher in the soybean landraces than in the improved lines (Table S4).
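The diversity statistics above can be computed from allele frequencies with the standard formulas: gene diversity H = 1 - Σ p_i² and, using Botstein's definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The paper does not state which software or PIC definition was used, so the sketch below assumes these standard definitions and counts one allele per homozygous accession.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Frequencies of the non-missing alleles observed at one SSR marker."""
    observed = [a for a in alleles if a is not None]
    if not observed:
        raise ValueError("no alleles observed")
    counts = Counter(observed)
    return [c / len(observed) for c in counts.values()]


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content (Botstein formula)."""
    homo = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homo - cross
```

A marker with four equally frequent alleles, for instance, has H = 0.75 and PIC of about 0.70; PIC is always somewhat lower than H because of the cross-product term.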
STRUCTURE analysis showed that the 175 markers were relatively independent. The log-likelihood [Ln P(D)] had no obvious inflection point (Fig. 1A); therefore, the optimal K was determined using Delta K, and three subpopulations were detected.

Linkage Disequilibrium
LD of allelic variation between SSR loci is the basis of association analysis. Because admixture of different groups can inflate LD, LD was analyzed separately for the improved lines and the soybean landraces. For the improved lines, a total of 14,992 marker pairs were evaluated, including both inter- and intrachromosomal combinations (Fig. 3A). Pairs in significant LD (P < 0.01) accounted for 19.66 % of all pairs (Fig. 3A), and their mean D' was 0.38 (Table 3). For the soybean landraces, a total of 14,834 marker pairs were evaluated, including both inter- and intrachromosomal combinations (Fig. 3B). Pairs in significant LD (P < 0.01) accounted for 6.38 % of all pairs (Fig. 3B), and their mean D' was 0.44 (Table 3).
The improved lines had more locus pairs in significant LD than the soybean landraces, whereas the landraces had higher D' values. This suggests that the soybean landraces have undergone more historical recombination while retaining higher LD.

Association Mapping
Using the GLM for marker-trait association at a threshold of P < 0.01, 19 and 10 SSR loci were detected in the two populations in 2018 and 2019, respectively (Table S5). Some loci were co-detected in both years.
In the improved lines, seven SSR markers were associated with FW, two with SL, two with RL, one with HL, and two with GP; in the soybean landraces, three were associated with FW, two with SL, two with RL, one with HL, and two with GP (Table 3).
Association mapping identified seven markers for FW in the improved lines, explaining 5.95 % to 12.08 % of the phenotypic variation (PVE); Sat_256_Chr7 explained the most, 12.08 % in 2018 and 11.32 % in 2019 (Table 4). Two markers were associated with SL, with PVE ranging from 6.35 % to 13.35 %; Satt441_Chr7 explained the most, 13.35 % in both 2018 and 2019 (Table 4). Two markers were associated with RL, with PVE ranging from 7.09 % to 12.11 %; Satt329_Chr8 explained the most, 12.11 % in 2018 and 11.32 % in 2019 (Table 4). One marker, located on chromosome 9, was associated with HL and explained 8.58 % of the phenotypic variation in 2018 and 10.22 % in 2019 (Table 4). Two markers were associated with GP, with PVE ranging from 6.98 % to 10.79 %; Satt303_Chr18 explained the most, 10.79 % in 2018 and 6.98 % in 2019 (Table 4). Three markers, Aw277661, Satt441, and Satt606, were co-associated with two or three vigor traits (Table 4).
In the soybean landraces, two markers, Sat_378 and Satt509, were co-associated with two or three vigor traits (Table 4).
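A minimal version of the single-marker test can be written as a one-way ANOVA of a trait on allele class, with the PVE taken as the marker R². This is a simplified sketch under stated assumptions: it omits the population-structure (Q) covariates that a GLM in TASSEL can include, treats each accession as carrying one allele per marker, and returns only the F statistic and its degrees of freedom. The P-value would then come from the F distribution, for example `scipy.stats.f.sf(F, df1, df2)`.

```python
from collections import defaultdict
from statistics import mean


def single_marker_glm(alleles, phenotypes):
    """One-way ANOVA of a trait on SSR allele class.

    Returns (F, df_between, df_within, R2), where R2 = SS_between / SS_total
    is used as the proportion of phenotypic variance explained (PVE).
    Accessions with a missing allele (None) are dropped.
    """
    if len(alleles) != len(phenotypes):
        raise ValueError("alleles and phenotypes must have equal length")
    groups = defaultdict(list)
    for allele, y in zip(alleles, phenotypes):
        if allele is not None:
            groups[allele].append(y)
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two allele classes and more accessions than classes")
    grand = mean(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = ss_total - ss_between
    df_between, df_within = k - 1, n - k
    if ss_within <= 0:
        f_stat = float("inf")  # allele classes separate the trait perfectly
    else:
        f_stat = (ss_between / df_between) / (ss_within / df_within)
    r2 = ss_between / ss_total if ss_total > 0 else 0.0
    return f_stat, df_between, df_within, r2
```

Multiplying the returned R² by 100 gives a PVE on the same percentage scale as the values reported above.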

Mining Of Elite Alleles And Carrier Materials
The alleles and carrier materials at the loci associated with the five vigor traits were identified (Table S6). In the improved lines, Sat_256-236bp had the greatest positive effect on FW (+0.32 g), and the carrier material was Hefeng 48 (Table S6). Satt441-281bp had the greatest positive effect on SL (+1.21 cm), and the carrier material was Heilong 66 (Table S6). Satt329-262bp had the greatest positive effect on RL (+3.58 cm), and the carrier material was Heinong 44 (Table S6). Satt588-128bp had the greatest positive effect on HL (+0.99 cm), and the carrier material was Jiunong 66 (Table S6). Satt413-196bp had the greatest positive effect on GP (+12.94 %), and the carrier material was Heinong 44 (Table S6). Several alleles with positive effects, namely Aw277661-266bp, Satt588-185bp, and Satt441-294bp, occurred at high frequencies (78.21 %, 74.30 %, and 73.05 %, respectively), possibly because of strong artificial selection.
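The null-allele comparison used to estimate allele effects can be sketched as follows: the effect of allele i is the mean phenotype of the accessions carrying i minus the mean phenotype of the accessions carrying the null allele. The data layout, the encoding of the null allele as `None`, and the rule of reporting the carrier with the highest phenotypic value as the representative carrier material are assumptions made for this illustration; accession names in the example are placeholders.

```python
from collections import defaultdict
from statistics import mean

NULL_ALLELE = None  # hypothetical encoding of a null (non-amplifying) allele


def allele_effects(genotype, phenotype):
    """Return ({allele: effect}, {allele: [accessions]}) for one marker.

    genotype maps accession -> allele (NULL_ALLELE for the null allele);
    phenotype maps accession -> trait value. Effect of allele i is
    mean(trait of carriers of i) - mean(trait of null-allele carriers).
    """
    carriers = defaultdict(list)
    for accession, allele in genotype.items():
        if accession in phenotype:
            carriers[allele].append(accession)
    if NULL_ALLELE not in carriers:
        raise ValueError("no null-allele carriers: effects cannot be estimated")
    null_mean = mean(phenotype[a] for a in carriers[NULL_ALLELE])
    effects = {
        allele: mean(phenotype[a] for a in accs) - null_mean
        for allele, accs in carriers.items()
        if allele is not NULL_ALLELE
    }
    return effects, carriers


def elite_allele(genotype, phenotype):
    """Allele with the largest positive effect and its top-scoring carrier."""
    effects, carriers = allele_effects(genotype, phenotype)
    positive = {a: e for a, e in effects.items() if e > 0}
    if not positive:
        return None
    best = max(positive, key=positive.get)
    carrier = max(carriers[best], key=phenotype.get)
    return best, positive[best], carrier
```

Running `elite_allele` once per marker and trait would produce a table of elite alleles, effects, and carriers in the same shape as the entries described above.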
In the soybean landraces, Sat_378-168bp had the greatest positive effect on FW (+0.18 g), and the carrier material was Keshandajinhuang (Table S6). Sat_378-156bp had the greatest positive effect on SL (+0.62 cm); its carrier material is listed in Table S6. Sat_337-281bp had the greatest positive effect on RL (+3.48 cm), and the carrier material was Longyoutai (Table S6). Satt304-169bp had the greatest positive effect on HL (+0.41 cm), and the carrier material was Keshandajinhuang (Table S6). Sat_378-156bp had the greatest positive effect on GP (+7.96 %), and the carrier material was Fangzhengbailudou (Table S6).
Based on the elite alleles detected, the two best cross combinations for improving seed vigor traits were identified (Table 5). Suinong 26, Hefeng 48, Heilong 44, Dalihuang, and Keshandajinhuang repeatedly appeared as carriers of elite alleles with positive effects on the five seed vigor traits (Table S6). This suggests that these materials carry favorable chromosomal segments and are promising parents for practical breeding when several vigor traits are to be improved simultaneously by pyramiding as many elite alleles as possible into one variety.
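One simple way to rank candidate crosses is to count how many associated loci would be covered by an elite allele from at least one parent. The sketch below is a hypothetical heuristic, not necessarily the criterion behind Table 5. Parent names are placeholders, and effects on different traits are deliberately not summed because they are in different units (g, cm, %).

```python
from itertools import combinations


def rank_crosses(elite_by_parent, top=2):
    """Rank parent pairs by the number of distinct loci carrying an elite allele.

    elite_by_parent maps parent name -> set of (marker, allele) tuples for
    the elite alleles that parent carries. Returns the `top` best
    (n_loci, parent1, parent2) tuples; ties are broken alphabetically.
    """
    scored = []
    for p1, p2 in combinations(sorted(elite_by_parent), 2):
        combined = elite_by_parent[p1] | elite_by_parent[p2]
        loci = {marker for marker, _ in combined}  # count each locus once
        scored.append((len(loci), p1, p2))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top]
```

Counting loci rather than alleles avoids crediting a pair twice when both parents carry different elite alleles at the same marker, since an inbred progeny can fix only one of them.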

Discussion
The vigor traits of soybean seeds during germination are the basis of seedling establishment. This study explored elite alleles at loci associated with seed vigor traits in soybean, evaluating these traits in spring soybean from northeast China in 2018 and 2019. Our data showed that the vigor traits of both groups were normally distributed and showed abundant genetic variation.
Analysis of the genetic diversity of germplasm resources is an important basis for exploring favorable genes 42-44.
In this study, all 175 SSR markers were polymorphic, with a mean of 4.82 alleles per marker, indicating that this panel is genetically diverse. In addition, the number of alleles, gene diversity, and PIC were higher for the soybean landraces than for the improved lines. This may reflect a loss of genetic diversity in the improved lines caused by crossbreeding and long-term selection.
The relative kinship estimates based on the 175 SSR markers showed that more than 74.67 % and 74.48 % of the pairwise kinship estimates among the improved lines and the soybean landraces, respectively, fell within the range of 0 to 0.05, whereas 17.43 % and 18.93 % ranged from 0.05 to 0.15. The number of pairs continued to decrease in the higher kinship categories, indicating that most pairs of accessions had no or only weak relatedness (Fig. 5). Association analysis requires materials that lack strong population structure and close relatives; the materials in this study met these basic requirements.
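The binned kinship summary can be computed from a relative kinship matrix as sketched below. The bin edges beyond 0.15, the input as a square matrix (for example, exported from SPAGeDi), and the convention of setting negative kinship estimates to zero are assumptions; the paper reports only the 0-0.05 and 0.05-0.15 classes explicitly.

```python
def upper_triangle(matrix):
    """Off-diagonal pairwise values from a square symmetric kinship matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def kinship_distribution(values, edges=(0.0, 0.05, 0.15, 0.25, 0.5, 1.0)):
    """Percentage of pairwise kinship estimates in each [lo, hi) bin.

    Negative estimates are set to 0 (assumption); the last bin is closed
    on the right, and values above the last edge are counted in no bin.
    """
    if not values:
        raise ValueError("no kinship values")
    clipped = [max(0.0, v) for v in values]
    counts = [0] * (len(edges) - 1)
    last = len(edges) - 2
    for v in clipped:
        for b in range(len(edges) - 1):
            lo, hi = edges[b], edges[b + 1]
            if lo <= v < hi or (b == last and v == hi):
                counts[b] += 1
                break
    return [100.0 * c / len(clipped) for c in counts]
```

Only the upper triangle is used so that each pair of accessions is counted once and self-kinship on the diagonal is excluded.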
Association analysis of natural populations is more prone to false positives than traditional linkage analysis because admixture of subpopulations can inflate LD in a population. Understanding population structure is therefore important for avoiding spurious associations in association mapping 34,45. The genetic structure of soybean has been reported in many studies 26,46. In this study, Bayesian clustering grouped the improved lines and landrace accessions into three main subpopulations. Notably, germplasm from Heilongjiang Province was distributed across all three subpopulations, indicating some gene exchange with germplasm from the other three provinces.
Because LD between QTL and marker sites is the premise and foundation of association analysis, the LD status of the population under study must be established before association analysis. In this study, D' was higher in the landraces than in the improved lines, suggesting that there had been historical recombination among linkage groups in the former population. Artificial selection further leads to different LD levels among soybean gene pools. Some studies suggest that LD in soybean extends over 50-420 kb 2,47,48, whereas others found LD decay distances of more than 10-12 cM 26,49. These inconsistent estimates indicate that LD decay differs between populations such as improved lines and landraces, highlighting the need to determine the range of LD variation in each. In this study, the improved lines had more locus pairs in LD than the landraces, and the LD decay distance was 1.46 cM in the landraces and 2.53 cM in the improved lines. Therefore, different genetic populations will require different marker densities for association mapping, and to some extent, slower LD decay may limit the resolution of association analysis.
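The half-maximum criterion for LD decay can be illustrated with a simple binned estimator: average D' within distance bins, then linearly interpolate the distance at which the binned mean first falls to half its maximum. This is a sketch assuming 1 cM bins and linear interpolation; the original analysis may have used a different binning or a fitted decay curve.

```python
from statistics import mean


def ld_decay_distance(pairs, bin_width=1.0):
    """Distance (cM) at which binned mean D' first falls to half its maximum.

    pairs: iterable of (distance_cM, d_prime) for intrachromosomal marker
    pairs. Returns None if the binned mean never drops to half the maximum.
    """
    bins = {}
    for dist, d_prime in pairs:
        bins.setdefault(int(dist // bin_width), []).append(d_prime)
    if not bins:
        raise ValueError("no marker pairs supplied")
    keys = sorted(bins)
    xs = [(k + 0.5) * bin_width for k in keys]  # bin centres
    ys = [mean(bins[k]) for k in keys]          # mean D' per bin
    peak = max(ys)
    if peak <= 0:
        return None
    half = peak / 2
    for i in range(ys.index(peak) + 1, len(ys)):
        if ys[i] <= half:
            x0, y0, x1, y1 = xs[i - 1], ys[i - 1], xs[i], ys[i]
            # linear interpolation between the two bins that straddle half
            return x0 + (y0 - half) * (x1 - x0) / (y0 - y1)
    return None
```

Smaller bins give finer resolution but noisier bin means, so the bin width is a trade-off that depends on how many marker pairs fall at each distance.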
A total of ten and seven SSR markers associated with the vigor traits were identified in the improved lines and landraces, respectively, in 2018 and 2019. We also compared the SSR markers associated with seed vigor with findings from previous studies. Our results differed from those reported by Zhang et al 15. In the improved soybean lines, two SSR markers, Aw277661 and Satt441, were co-associated with FW and SL through the alleles Aw277661-246bp, Aw277661-266bp, Aw277661-268bp, Satt441-281bp, Satt441-287bp, and Satt441-294bp.
One SSR marker, Satt606, was co-associated with FW, SL, and GP, and the phenotypic effects of its alleles Satt606-279bp, Satt606-301bp, and Satt606-329bp were consistent in direction. In the soybean landraces, two SSR markers, Sat_378 and Satt509, were co-associated with FW and GP through the alleles Sat_378-156bp, Sat_378-168bp, Sat_378-173bp, Sat_378-178bp, Sat_378-301bp, Satt509-181bp, Satt509-189bp, and Satt509-197bp, and the directions of their phenotypic effects were highly consistent. In addition, five loci were associated with two or more traits simultaneously; these multi-effect loci may underlie the phenotypic correlations among the traits and can be applied to the synergistic improvement of multiple traits 41.
Through sexual hybridization, more elite alleles can be aggregated to improve the seed vigor traits of a variety. For FW, 16 positive alleles were detected, with phenotypic effects ranging from 0.04 g to 0.32 g, carried by six varieties; Hefeng 48 and Dalihuang each carried three of these elite alleles, and Suinong 26 carried seven. For SL, seven positive alleles were detected, with effects ranging from 0.02 cm to 1.21 cm, carried by six varieties; Heinong 46 carried two. For RL, six positive alleles were detected, with effects ranging from 0.35 cm to 3.58 cm, carried by four varieties; Heinong 44 and Helongyoutai each carried two. For HL, four positive alleles were detected, with effects ranging from 0.16 cm to 0.99 cm, carried by four varieties. For GP, eight positive alleles were detected, with effects ranging from 1.78 % to 12.94 %, carried by eight varieties.

Conclusions
In conclusion, seed vigor traits are of great importance for soybean breeding. However, they are quantitative traits controlled by multiple genes. Association mapping has become a major method for exploring the genetic architecture of complex quantitative traits. Improving the detection efficiency of genetic loci and revealing the mechanisms of genetic correlation and gene-environment interaction remain major challenges in analyzing plant quantitative traits. This study found more marker pairs in LD in the improved lines than in the landraces, whereas the landraces showed a higher degree of LD. We identified 17 SSR markers associated with the five vigor traits. Association mapping information can complement QTL information from family-based linkage mapping. Information on allelic variation can be used to select parents, design crosses for pyramiding novel alleles, and perform marker-assisted selection (MAS) for germination-stage traits in soybean breeding.

Declarations
Data availability
All data generated or analyzed during this study are included in this published article (and its supplementary information files). The improved lines were approved by the National Crop Variety Approval Committee. Soybean landraces are commonly used by breeders as base material for variety selection and breeding. All materials meet the requirements for cultivation in Heilongjiang Province. The seeds were provided by the relevant breeding units and can be planted for use.

Compliance with ethical standards
The authors declare no conflict of interest among the authors or in the research work. Publication of this manuscript has been approved by all authors and by the responsible authorities where the work was carried out.

Funding information
Ministry of Agriculture Protection of crop germplasm resources, Grant/Award Number: 2018NWB036-12; Master Innovation Research Project of Heilongjiang University, Grant/Award Number: YJSCX2020-216HLJU

Conflict of Interest
There is no conflict of interest among the authors or in the research work.

Author Contributions
Caijin Wang wrote the manuscript and performed the association and linkage disequilibrium analyses. Caijin Wang and Huiyan Zhao evaluated the five vigor-related traits. Yang Wang was involved in all aspects of the study and reviewed and edited the manuscript.