Fine mapping of a QTL locus (QNFSP07-1) and analysis of candidate genes for four-seeded pods in soybean

Soybean [Glycine max (L.) Merr.] is an important grain and oil crop in the world, and it is the main source of high-quality protein. The number of four-seeded pods is a quantitative trait in soybean and is closely related to yield in terms of breeding. Therefore, it is of great significance to study the inheritance of four-seed pods and to excavate related genes for improving soybean yield. In this study, individuals with high ratio of four-seed pods which from chromosome segment substitution lines (CSSLs) that can be stably inherited were selected as the parent, and Suinong 14 (SN14) was used as recurrent parent to construct secondary mapping population via marker-assisted selection. From 2006 to 2017, QTL analysis was performed using secondary mapping populations, and the initial QTL mapping interval was 0.67 Mb and was located on Gm07. Based on the initial QTL mapping results, individuals that were heterozygous at the interval (36,116,118–37,399,738 bp) were screened in 2018, and the heterozygous individuals were subjected to inbreeding to obtain 13 F3 populations, with a target interval of 321 kb. Gene annotation was performed on the fine mapping interval, and 27 genes were obtained. Among 27 genes, Glyma.07G200900 and Glyma.07G201200 were identified as candidate genes. qRT-PCR was used to measure the expression of the 2 candidate genes at different developmental stages of soybean, and the expression levels of the 2 candidate genes in terms of cell division (axillary buds, COTs, EMs) were higher than those in terms of cell expansion (MM, LM), and these genes play a positive regulatory role in the formation of four-seeded pods. Haplotype analysis of 2 candidate genes which shows that Glyma.07G201200 has two excellent haplotypes, and the significance level between the two excellent haplotypes at p < 0.05. Those results provide the information for gene map–based cloning and molecular marker–assisted breeding of the number of four-seeded pod in soybean.


Introduction
Soybean [Glycine max (L.) Merr.] originated in China and has enriched oil and protein contents (Hwang et al. 2014). Soybean breeders have been working hard to improve the yield and quality of soybean, which is among the economically important crop varieties. As the number of four-seeded pods is closely related to yield, mapping QTLs related to four-seeded pods of soybean via marker-assisted selection will provide new ideas for cultivating new soybean varieties with multi pods and high yields (Yang et al. 2013;Kumar et al. 2014).
More than 401 QTLs for the number of pods per plant have been identified in soybean (Sun et al. 2006;Chen et al. 2007;Zhang et al. 2010); Among them, 86 of these QTLs were located by RIL population or F 2 population, and IM and CIM methods were used to detect QTLs for four-seeded pods in LGs B2, D2, H, I, and M (Wang et al. 2007;Li et al. 2020). Three QTLs associated with the number four-seeded pods have been detected, located on Gm02, Gm11, and Gm17 (Li et al. 2008). Zhou et al. (2009) used 165 recombinant inbred lines (RILs) to analyze the number of four-seeded pods and showed that 3 QTLs on linkage group (LG) I were associated with the number of four-seeded pods. Gao et al. (2012) reported 8 QTLs on LGs D1a, A1, and H were associated with the number of four-seeded pods in a RIL population. Yang et al. (2013) used a RIL population comprising 147 F 2:14 -F 2:19 individuals and detected QTLs for the number of four-seeded pods under 11 environmental conditions; these QTLs were located on LGs A1, B1, and F. Two others set of RILs (RIL 3613 and RIL 6013) have been used for QTL mapping, and 21 and 26 pod-related QTLs were detected via complete interval mapping (CIM) and could explain the phenotypic variation percentages of 1.25% and 7.91%, respectively (Ning et al. 2018).
The number of four-seeded pods is a complex quantitative character, is highly heritable, and is susceptible to environmental conditions (Pei et al. 2010). Studies have shown that the number of four-seeded pods has a significant positive correlation with the number of pods per plant, and the number of pods per plant is highly correlated with yield (Peng et al. 1994). Moreover, the number of pods per plant and the number of seeds per plant are highly positively correlated, showing that an increase in the number of pods increases the number of seeds. In addition, the number of fourseeded pods and the number of seeds per pods are positively correlated (Li et al. 2018). The number of four-seeded pods is one of the important breeding goals for breeding new soybean varieties with high and stable yields (Richards, 2000). A relationship between lanceolate leaves and the number of pods in soybean has also been reported. Compared with that of plants with oval leaves, the ratio of four-seeded pods to the number of pods per plant of those with the lanceolate leaves was higher, suggesting that the alleles controlling lanceolate leaves play a major pleiotropic role in increasing the ratio of four-seeded pods in soybean (Takahashi, 1934;Domingo, 1945;Johnson and Bernard, 1962;Weiss, 1970). Ln has been widely explored, its expression has been found in lanceolate leaves, it is considered to be closely related to the number of four-seeded pods, and it has a significant pleiotropic effect (Jeong et al. 2012). Ln controls lanceolate leaves of soybean and is stably inherited in offspring, and the ratio of plants with four-seeded pods with lanceolate leaves (ln/ln) is higher than the ratio of plants with four-seeded pods with the oval leaves (Ln/Ln) (Dinkins et al. 2002). In Arabidopsis, overexpression of CYP78A9 can induce large and seedless fruits and has been reported to be associated with the number of seeds (Ito and Meyerowitz, 2000). Three P450/CYP78A genes (AtCYP78A5, AtCYP78A6, and AtCYP78A9) have been reported to be associated with seed size and pod number (Li et al. 2016). KLU/CYP78A5 has been shown to promote both leaf and floral organ growth and seed development by increasing cell proliferation in ovules and increasing KLU activity in ovules; this activity is sufficient to increase seed size and number (Eriksson et al. 2010). GmCYP78A10, which belongs to the P450/CYP78A family, was cloned, and it was shown that the proportion of four-seeded pods in soybean plants carrying the GmCYP78A10a allele was significantly higher than that carrying the GmCYP78A10b allele (Wang et al. 2015).
In this study, we selected a line with a high proportion of four-seeded pods from chromosome segment substitution lines (CSSLs), which were crossed with Suinong 14 (SN14) to construct a fine mapping population. The phenotype of this population was investigated. This information was combined with that determined via SSR markers to determine the genotype. The main objectives of this study were to define a more precise QTL interval and perform gene annotation on this interval to mine candidate genes and analyze them using qRT-PCR analysis and haplotype analysis.

Materials and methods
Plant materials, planting, and experimental design In this study, CSSLs were constructed from SN14 (the main cultivar used in Heilongjiang, China) and ZYD00006 (a wild soybean resource) and subsequently backcrossed with SN14. In 2006, SN14 was crossed with ZYD00006 to obtain F 1 plants, and in 2007, the F 1 plants were backcrossed with SN14 to obtain BC 1 F 1 plants. The experimental material CSSL860-13 was selected as the parent of the fine mapping population and has a stable and high ratio of the number of four-seeded pods to the number of pods per plant from 2012 to 2014 (BC 4 F 4 -BC 4 F 6 ). T-tests showed significant differences between the phenotypic data of CSSL860-13 and SN14. In 2015, CSSL860-13 (BC 4 F 6 ) was backcrossed with SN14 to obtain a CSSL860-13/ SN14 backcross population. In 2016, all F 1 seeds harvested the previous year were planted in an experimental field. In 2017, 1168 lines of the CSSL860-13/SN14 (F 2 ) population were obtained in the experimental field, constituting a preliminary mapping population. In 2018, individual F 2 plants were screened for heterozygous intervals, after which all were planted in the experimental field; 492 lines (F 3 ) were ultimately obtained for the fine mapping population.
From 2006 to 2018, the mapping population was planted in Xiangyang Farm, Xiangfang District, Harbin, Heilongjiang Province (Harbin, latitude 45° 72′ N, longitude 126° 68′ E). A random block design with three replicates was used from 2006 to 2014, the row length was 5 m, a row spacing of 60 cm, and a plant spacing of 5 cm, 80 seeds were sown per row, and fields were under general management.

Data collection and segregation analysis
The phenotypic data of the number of four-seeded pods (NFSP) were recorded with Microsoft® Excel 2016 (Frye, 2016) and analyzed with SPSS 17.0 (Grafius.1978), including the frequency distribution, standard deviation, skewness, and extremum. The phenotypic data for the number of pods traits of each line were recorded from 2006 to 2018 as described by Qiu and Chang (2006).
The phenotypic data of NFSP were analyzed using the Segregation Analysis (SEA) software (Cao et al. 2013), which was based on the Microsoft Visual Studio 2010 platform and C + + programming language to identify the main genes and multi-gene mixed genetic models of quantitative trait. All genetic models are analyzed, and the Akaike Information Criterion (AIC) value is calculated by using the likelihood function. According to the AIC minimum value to determine the optimal genetic model, least-squares method was used to estimate the gene effect value and genetic variances, and precisely calculating the probabilities in the Smirnov test (nW 2 ) and Kolmogorov test (Dn).
Smirnov test (nW 2 ): Kolmogorov test (Dn): Identification of QTLs for the number of four-seeded pods Parent resequencing performed according to standard protocols provided by Illumina, the amount of analysis data is 34.63 Gbp Clean Data, and the Q30 is 86.73%. The average ratio of sample to reference genome (Williams82) was 98.64%, the average coverage depth was 10X, and the genome coverage was 97.80%. The results of the parent resequencing combined with the SSR sequence provided by SoyBase (https:// www. soyba se. org/) were used to synthesize 1050 pairs of SSR marker primers. Primers located on Gm03, Gm06, Gm07, Gm08, Gm13, and Gm14 were selected for QTL mapping, and their sequence information is listed in Table S1.
With respect to individual plants that are heterozygous at the segment of the initial QTL interval are screened, self-crossing results in a continuous overlapping arrangement within the separation intervals, generating a fine mapping population. All SSR markers between SSR-07-1224 and SSR-07-1257 on Gm07 were selected for primer encryption analysis, and polyacrylamide gel electrophoresis was used to analyze the genotypes. The relevant information is listed in Table S2.
Single-marker analysis via Windows QTL Cartographer (version 2.5) (Wang et al. 2006) was used to analyze the relationships between phenotypes and genotypes by a simple linear regression model, as follows: By judging whether b1 and 0 are significantly different, the relationship between the molecular (1) (3) y = b0 + b1x + e markers and QTLs can be determined. F statistical analysis is used to compare the following hypotheses: H0, b1 = 0; H1, b1 is not zero. The smaller the Pr(F) value is, the smaller the support for H0, and the more relevant the tag is to the QTL.
The QTL nomenclature of soybean pod-seed number traits refers to the naming standard of QTLs of McCouch (1997), specifically: Q + trait name + linkage group number + "-" + QTL sequence number.

Candidate gene mining
The "Williams 82.a2.v1" genome was used as a reference genome for predicting the candidate genes, and SoyBase (https:// www. soyba se. org/) provided the physical positions of the major QTL intervals. To determine the gene sequence related to the number of four-seeded pods, BLAST (https:// phyto zome. jgi. doe. gov/ pz/ portal. html#) was used to perform a homology analysis between the extracted gene sequence and the sequence of the candidate gene interval, and candidate genes related to the number of four-seeded pods were obtained.
The expression data of 27 candidate genes in 26 tissues of soybean were downloaded from Phytozome (https:// phyto zome-next. jgi. doe. gov/), and the transcriptome data of four experimental materials from the CSSLs during three periods (Gl: 4 days after flower open, Hr: 7 days after flower open, Co: 10 days after flower open) were also analyzed. A high ratio of four-seeded pods to the number of pods per plant (HNFSP) consisting of HNFSP-1 (24.62%) and HNFSP-2 (12.84%) and a low ratio of four-seeded pods to the number of pods per plant (LNFSP) consisting of LNFSP-1(4.85%) and LNFSP-2(0.40%) was used to draw the heatmap with the software TBtools (Chen et al. 2020).

qRT-PCR analysis of candidate genes
With SN14 (the ratio of four-seeded pods to the number of pods per plant: 18.39%) used as a control, experimental materials with a high ratio of four-seeded pods to the number of pods per plant (line 953: 31.00%) and materials with a low ratio of fourseeded pods to the number of pods per plant (line 32: 0.00%) were screened for qRT-PCR. Florescence, the axillary bud stage, COT (cotyledon) stage, EM (early maturity) stage, MM (mid-seed maturity) stage, and LM (late seed maturity) stage were selected for sample collection according to soybean development stage criteria published on SoyBase (https:// www. soyba se. org/). The axillary bud and COT and EM stages are cell division stages, while the MM and LM stages are cell expansion stages (SoyBase ontology).
Total RNA was extracted from different tissues of soybean by TRIzol reagent (Ambion). Oligo 7.57 software was used to design gene primers, and GmACTIN4 (GenBank login number: AF049106) was used as an internal control. Information concerning the primers is listed in Table S3.

Haplotype analysis
Two hundred fifty-three soybean germplasm resources from across China were used for haplotype analysis; the planting methods and field management practices were the same as those for the mapping population. The results of the germplasm resource resequencing (10 × sequencing) and the genomic sequence information were used for this study, and the upstream (3000 bp), the genomic sequences, and the downstream (2000 bp) of the four candidate genes were extracted from the Phytozome 13 database (https:// phyto zome-next. jgi. doe. gov/). The DNAsp 5.0 software was used for analyzing the haplotype distribution of the genomic sequence of the candidate genes in the germplasm resource population (Librado and Rozas, 2009); the haploview (https:// haplo view. softw are. infor mer. com/4. 2/) was used for screening the excellent haplotypes (the number of cultivars belonging to the haplotype exceeding 5% of the total). SPSS 17.0 software (Armonk, New York, USA) was used to determine the effects of each haplotype on the phenotype. PLACE (plant cis-acting regulatory DNA elements (https:// www. dna. affrc. go. jp/ PLACE/)) was used to query the function of plant promoter elements.

Phenotypic analysis and segregation analysis
In 2017, the phenotypic data of 1168 individuals in the backcross population were obtained. The ratio of four-seeded pods to total pods ranged from 0 to 40.54%, with an average of 8.15%, and the standard deviation was 6.00%. The average ratio of four-seeded pods to total pods of SN14 (the recurrent parent) was 18.39% (the average value of 10 standard lines), with a standard deviation of 0.07%. Statistical analysis showed that the ratio of four-seeded pods to the number of pods per plant ranged from 35.00 ~ 41.00%; there were 512 individuals whose ratio of four-seeded pods was within 8 ~ 25% and 124 individuals without four-seeded pods (Table 1, Fig. 1A).
There were 113 individuals in the R1 fine mapping population (line 337; line 1160), and the ratio of the number of four-seeded pods to the number of pods per plant ranged from 0.00 to 37.84%, with a standard deviation of 8.38% (Fig. 1B). There were 137 individuals in the R2 population (line 18, line 108, line 554, line 614), and the ratio of the number of four-seeded pods to the number of pods per plant ranged from 0.00 to 43.59%, with a standard deviation of 7.77% (Fig. 1C). There were eighty-four individuals in the R3 population (line 271; line 345; line 712) and the ratio of the number of four-seeded pods to the number of pods per plant ranged from 0.00 to 57.89%, with a standard deviation of 8.34%. These results are consistent with a normal distribution and are suitable for phenotypic and genotype association analyses (Fig. 1D, Table 1).
The results show that the optimal model of the F 2 population was 2MG-AD, indicating that there are two major genes in the population. The variance of the main gene is 8.58, and the heritability of the main gene is 0.75; the Smirnov test (nW 2 ) result was 0.00, and the probability of the Kolmogorov test (Dn) was 1. The optimal model of R1 was 2MG-A, indicating that there are two major genes in this population. The variance of the main gene was 57.44, and the heritability of the main gene is 0.81; the Smirnov test (nW 2 ) result was 0.73, and the probability of the Kolmogorov test (Dn) was 1. The optimal model of R2 was 2MG-AD, indicating that there are two major genes in the population. The variance of the main gene is 37.31, and the heritability of the main gene is 0.61; the Smirnov test (nW 2 ) result was 0.73, and the Kolmogorov test (Dn) was 1. The optimal model of R3 was 2MG-EEAD, indicating that there are two major genes in the population. The variance of the main gene is 47.51, and the heritability of the main gene is 0.01; the Smirnov test (nW 2 ) result was 0.71, and the Kolmogorov test (Dn) was 1 ( Table 2).

QTL analysis of the F 2 population
The resequencing results of the population parents (CSSL860-13 and SN14) showed that there were heterozygous segments on Gm03, Gm06, Gm08, and Gm14. The heterozygous fragments are 0.26 Mb, 0.21 Mb, 4.28 Mb, 5.86 Mb, and 4.50 Mb, respectively. ZYD00006 homozygous fragments were present in Gm07 and Gm13, and the homozygous fragments were 1.28 Mb and 1.95 Mb, respectively (Figure. S1).
Sixty individuals (F 2 ) were screened for genotypic analysis; among them, 30 individuals had a high ratio of four-seeded pods to the number of pods per plant (ranging from 20.00 to 40.54%), and another 30 individuals did not have four-seeded pods. Sixty-five pairs of SSR markers were screened from within the ZYD00006 homozygous fragment and the heterozygous fragment, and the intervals were approximately 0.13-0.41 Mb. SN14 and ZYD00006 were used for identifying polymorphic SSR markers, and 29 pairs of SSR markers with polymorphisms between the parents were identified. Sixteen pairs of SSR markers with polymorphisms in five heterozygous fragments, of Gm03 (3,957,733-4,224,480 bp), Gm06 (48,413,650-48,625,035 bp), Gm08 (3,398,336-7,680,873 bp), Gm08 (11,853,745-17,718,597 bp), and Gm14 (8,151,693-12,658,057 bp) were used to determine the genotype of 60 individuals (F 2 ) ( Figure. S2). The results showed that there was no significant correlation between the genotype of individuals with a high ratio and the genotype of the individuals with a low ratio. Single-marker analysis showed that the P values of all the 16 SSR markers were greater than 5%, which were not significant, and the possibility of the five heterozygous segments associated with the NFSP could be excluded (Table S4).
. Six SSR markers with polymorphisms in the homozygous fragments of Gm13 (23,349,544,199 bp) were used to determine the genotype of 60 individuals (F 2 ), and a genotypic map was created (Figure. S3). The results showed that there was no significant correlation between the genotype of the individuals with a high ratio and the genotype of the individuals with a low ratio. Single-marker analysis showed that the P values of all six SSR markers were greater than 5%, which was not significant, and the possibility that the Gm13 homozygous fragment (23,349,075-24,544,199 bp) was associated with the NFSP was excluded (Table S4).
Nine SSR markers with polymorphisms in the homozygous fragment of Gm07 (36,116,118-37,399,738 bp) were used to determine the genotype of 60 individuals (F 2 ), and a genotypic map was created ( Figure. S3). The results showed that there was a significant correlation between the genotype of the individuals with a high ratio and the genotype of the individuals with a low ratio. Single-marker analysis showed that four SSR markers, SSR-07-1224, SSR-07-1248, SSR-07-1254, and SSR-07-1257, have high significance. The interval  (Table S4).
Candidate gene mining in QNFSP07-1 for soybean NFSP The marker sequences of QNFSP07-1 were obtained using SoyBase (http:// www. soyba se. org/), and candidate genes were detected and annotated based on a BLAST search of publicly available resources, and 27 genes were identified (Table S8). Fourteen genes had amino acid mutations according to the SNP sequences of the parental SN14 and ZYD00006 sequences via resequencing, and 13 genes have indels between parents in the Intron region and 5'UTR region, no amino acid mutations among these indels (Table S9) Glyma.07G200400 and Glyma.07G201000 were not expressed. Glyma.07G200200 encodes an HSP20-like protein of the chaperone superfamily, whose members are involved in the drought stress response (Gugger et al. 2017). Glyma.07G201500 contains the DUF679 protein domain and decreases downstream gene expression during flower senescence (Gao et al. 2018). Glyma.07G199700 contains the ERF protein 9 domain, and the amino acids valine and glutamate, which are retained in the ERF protein domain, play an important role in drought and low-temperature stress responses (Sakuma et al. 2002). Glyma.07G199800 contains a MAC/perforin domain, which directly interacts with the cell membrane of migrating neurons (Gorbushin, 2016). Glyma.07G200800 contains a zf-Di19 superfamily domain, which is related to the plant drought stress response (Lu et al. 2007). Glyma.07G201300 was annotated as being associated with the glucose-6-phosphate/phosphoric acid transporter; the homologous gene in Arabidopsis is AT5G46110, which is downregulated gene in response to VVZFPL overexpression, and overexpression of VVZFPL affects plant photosynthesis and slows plant growth (Kobayashi et al. 2012). Glyma.07G200900 was annotated as an OB-folding protein according to KEGG pathway analysis (pathway_id k07466); this protein is related to litchi flowering defects (Lu et al. 2014). Glyma.07G201200 was annotated as homeobox 1, and the homologous gene in Arabidopsis is AT1G28420, which is a membranetethered transcription factor (MTTF) belonging to the bZIP family (Chen et al. 2008). Previous studies have shown that members of the bZIP gene family are TFs (transcription factors); bZIPs are ubiquitous within the genome of green plants, and they affect organ and tissue differentiation and regulate cell elongation (Smitha et al. 2017;Corrêa et al. 2008). Glyma.07G201400 was annotated as a cysteine-tRNA synthase, and the homologous gene in Arabidopsis is AT2G31170, which is a regulatory gene encoding aminoacyl-tRNA synthase. Studies have shown that interruption of chloroplast translation affects gametophytes in the seed during embryogenesis, and interruption of mitochondrial translation leads to ovule abortion and cessation of male-female gametogenesis (Berg et al. 2005). Glyma.07G201600 was annotated as encoding the HAESA-like protein 1, and the homologous gene in Arabidopsis is AT1G28440. The HAESA coreceptor SERK1 is a positive regulator of the flower abscission pathway, which allows high-affinity sensing of peptide hormones by binding to the Arg-His-Asn motif in IDA. This sequence pattern is conserved among various plant peptides, indicating that plant peptide hormone receptors share common ligand binding patterns and activation mechanisms. Leucine-rich repeat receptor kinases (LRR-RKs), HAE-SAs, and the peptide hormone IDA control flower abscission (Santiago et al. 2016). Combinations of the results of previous studies have shown that the development of ovules, embryos, pollen, stamens, and flowers has a great influence on the number of pods and seeds (Kurdyukov et al. 2014;Jiang et al. 2015). Glyma.07G200900 and Glyma.07G201200 have higher expression in pod (Fig. 3A), and Glyma.07G200900 and Glyma.07G201200 have a positive regulatory effect in the Hr stage (Fig. 3B). Glyma.07G200900 and Glyma.07G201200 were selected as candidate genes in this study, and additional studies were carried out.
Candidate gene expression analysis The expression of two candidate genes, Glyma.07G200900 and Glyma.07G201200, at Fig. 3 Heatmap profiles of the candidate genes in tissues and different stages. A Heatmap profiles of the candidate genes in tissues. B Heatmap profiles of the candidate genes in different stages different growth and development stages of soybean (florescence, axillary bud, COT, EM, MM, and LM stages) was measured via qRT-PCR. The selected materials included line 953 (high ratio of four-seeded pods to the number of pods per plant), line 32 (low ratio of four-seeded pods to the number of pods per plant), and SN14 (control). The results showed that the expression of the two candidate genes in the cell division stage (axillary bud, COT, and EM stages) was higher than that in the cell expansion stage (MM and LM stages). The highest expression occurred in the EM stage, followed by the COT stage, axillary bud stage, florescence stage, and MM stage. The lowest expression occurred in the LM stage, indicating that the expression decreased with the development of organs (Fig. 4). The candidate genes may affect cell division rather than cell expansion and affect ovule number by regulating cell division, which is related to the formation of four-seeded pods in soybean. The expression of the two candidate genes in the three experimental materials was the highest in the EM stage and lowest in the LM stage. The expression was associated with a high ratio of the number four-seeded pods to the number of pods per plant (higher than that of SN14) and a low ratio of the number four-seeded pods to the number of pods per plant, indicating that these two candidate genes have a positive regulatory effect on the NFSP of soybean. Therefore, Glyma.07G200900 and Glyma.07G201200 can be used as candidate genes for the number of fourseeded pods of soybean, laying a foundation for future gene cloning and functional verification.

Haplotype analysis of candidate genes
Two hundred fifty-three germplasm resources (Table S10) were used for haplotype analysis, which differed significantly in their NFSP and met the requirements for haplotype analysis of the candidate genes. Whole-genome resequencing of the ninety-two germplasm resources was carried out in 2019, and the phenotype statistics are shown in Table S11.
We selected the full-length sequences of four candidate genes and their upstream 3000 bp promoter region sequence and 2000 bp downstream in the Phytozome database as analysis objects, and DNAsp 5.0 was used to detect haplotype polymorphisms. The results showed that Glyma.07G201200 has 2 excellent haplotypes, while Glyma.07G200900 has only one excellent haplotype in the germplasm resource. ANOVA was used to analyze the differences between the phenotypes of the excellent haplotypes, and significant differences in NFSP were showed only one excellent haplotype of Glyma.07G200900 (Table S12).
Glyma.07G201200 has two excellent haplotypes: HA-1and HA-2. There are nine SNP differences between HA-2, nine SNP differences in the promoter region sequence (V186, V581, V2356, V2625, V2656, V2713, V2720, V2787, V2940), four SNP differences in the CDS region (V8713, V10660, V12131, V12448), and one difference in the genomic intron region (V13150). The change in SNP V8713 resulted in an amino acid change (D > E) and the change in SNP V12131 resulted in an amino acid change (K > Q), potentially leading to changes in the structure and function of the Fig. 4 Relative expression of Glyma.07G200900 and Glyma.07G201200, in different stages. A qRT-PCR analysis of Glyma.07G200900 expression. B qRT-PCR analy-sis of Glyma.07G201200 expression. The bars represent the means ± SE (standard errors) of three replications. The different letters represent significant differences (p ≤ 0.05) gene (Fig. 5A). There are multiple SNP differences between HA-1 and HA-2, of which SNPV2713 leads to EECCRCAH1 element according to PLACE. With respect to NFSP based on phenotypic values and haplotypes, ANOVA showed that there were significant differences between HA-1 and HA-2, and HA-2 was the haplotype with a high number of four-seeded pods. HA-1 was the haplotype with a low number of four-seeded pods, with P values of 0.046 (Fig. 5B.)

Discussion
Molecular marker-assisted selection has resulted in the detection of many QTLs of different quantitative traits, including physiological, morphological, and quality-related traits, as well as yield-related traits such as the soybean seed filling rate, salt tolerance, seed yield, and protein content (Varshney et al. 2005;Reinprecht et al. 2006;Guzman et al. 2007;Jun et al. 2008;Palomeque et al. 2009). Soybean NFSP is the key factor in determining yield, and it has also been highly conserved in the process of crop domestication. Molecular markers were used to locate the QTLs and genes related to the NFSP in soybean, determine the chromosomes and positions where they are located, clarify their genetic effects, analyze the gene functions, and determine the relationship between QTLs and the genes to determine their role in the formation of four-seeded pods to increase the NFSP for soybean genetic improvement.
Eighty-six QTLs related to the NFSP of soybean have been published, and the mapping results are distributed across all 20 chromosomes of soybean (Zhang et al. 2012). During research involving marker-assisted breeding, the main effect of QTLs is very important for the study of soybean agronomic characteristics, and most researchers have focused on determining how to obtain main-effect QTLs. In this study, an NFSP-related QTL was preliminarily mapped onto Gm07 between BARC-SOYSSR_07_1224 and BARCSOYSSR_07_1257, ranging from 36,731,771 to 3,740,044 bp. SSR primer encryption was applied to delineate the target segment; the QTL segment was shortened to the SSR marker between BARCSOYSSR_07_1224 and BARCSOYSSR_07_1235, ranging from 36,731,771 to 37,052,836 bp, and the segment was 321 kb. A previous study mapped a QTL related to the number of four-seeded pods, which was located on Gm07, from 37,079,608 to 40,919,337 bp; our region overlapped with this initial QTL (Kuroda et al. 2013), and it was concluded that the QTL identified in this study had certain authenticity. To date, the Ln gene controlling four-seeded pods has been found and studied extensively. In 2013, using map-based, Ln was mapped to 57.3 kb of soybean Gm20 (Fang et al. 2013). The QTL obtained with Ln is important for further study of the number of four-seeded pods of soybean in this study and provides excellent population material support for gene mining and functional research.
The use of a large number of plant populations segregating for the initially mapped QTL in high-density linkage analysis is an effective method for QTL fine mapping. There are two main advantages of using an RHL population for fine mapping. First, when developing a population for fine mapping, only one line from the F 2 population needs to be selected based on its genotype, and backcrossing and selection based on DNA markers or phenotypes are not needed. The genotype composition of the RHL can then be determined by checking only the genotypic data of the linkage map, and there is no need to analyze the genotype of any marker. In 1997, an RHL population was first used for QTL fine mapping. At present, this technique has been widely used for fine mapping of related traits in soybean (Yamanaka et al. 2015;He et al. 2009;Suzuki et al. 2010;Yan et al. 2017).
The maximum number of seeds per pod is determined by the number of ovules, and each stage of ovule development also affects seed formation. Many genes are reported to be related to ovule development, including BEL1 (Western and Haughn, 1999); AP2 (Modrusan et al. 1994); BELL (Ray et al. 1994); HLL; and ANT (Schneitz et al.1998). Many signal pathways involved in seed number determination have also been identified, including BR signaling (Huang et al. 2013), demonstrating that BR signal affects ovule and seed development by regulating BZR1. The number of ovules in ovaries affects the number of seeds per pod, and cell division affects the seed setting rate (Tischber et al. 2003). SHORT INTEGUMENTS1 and 2 (SIN1 and SIN2) are key genes for cell division (Ray et al. 1996), and TSO1 plays an important role in cell elongation (Song et al. 2000). Glyma.07G200900 is involved in the KEGG pathway (pathway_id k07466), and protein phosphatase 2A participates in cell division, which is the molecular mechanism of DNA: replication, repair, recombination, and chromosome dynamics (Davis et al. 2008). Glyma.07G201200 was annotated as homeobox 1, and it belongs to the bZIP family; the bZIP protein was reported to be involved in ABAsignaling pathways, suggesting that it can induce cell division (Toshio and Toshiyuki, 2002). In the present study, by analyzing the transcriptome data of four experimental materials from CSSLs and the expression levels in 26 soybean tissues, we identified two key genes that may be affected NFSP. The expression of two genes in the cell division stage was higher than that in the cell expansion stage. These results suggest that the gene may be involved in the cell division stage. The haplotypes of the two genes were analyzed by 253 germplasm resources, and the result shows that Glyma.07G201200 has two excellent haplotypes, there are differences in the promoter region sequence and CDS region, and resulted the amino acid change. ANOVA showed that the number of four-seeded pods (the ratio of the number of four-seeded pods) among 253 germplasm resources with different excellent haplotypes has significant differences, laying the foundation for our study of the molecular mechanisms of NFSP.
Wild soybean is abundant and widely distributed, with a large number of pods (Jin et al. 2017), and it is not isolated from cultivated soybeans. The excellent traits or genes of wild soybean can be introduced into cultivated soybean during the process of cross breeding. In this study, residual heterozygous individuals were selected from the CSSL population generated by crossing SN14 and ZYD00006. The theoretical background was the SN14 genotype with a short heterozygous interval and few heterozygous loci, which provided a material basis for rapidly mapping the QTL for the soybean four-seeded-pod trait (QNFSP07-1). The primary mapping material was developed in 2015, and QNFSP07-1 was detected in the primary mapping material. Plants heterozygous for the target segment were screened, and a secondary isolated population was developed to gradually complete the QTL fine mapping of the four-seeded-pod trait in soybean.

Conclusions
We used CSSL860-13/SN14 (F 3 ) population for QTL mapping, and obtained one QTL, which the QTL interval is 321 kb. According to the gene annotation of the QTL interval, 27 genes were obtained. qRT-PCR was used to measure the expression of the candidate genes at different developmental stages of soybean, and the result showed that Glyma.07G200900 and Glyma.07G201200 were identified as candidate genes that affect NFSP. Haplotype analysis of two candidate genes, Glyma.07G201200, has two excellent haplotypes, and the significance level between the two excellent haplotypes at p < 0.05. Those results provide the information for gene map-based cloning and