Comparative QTL analysis and candidate genes identication of seed size, shape and weight in soybean (Glycine max L.)

Dissecting the genetic mechanism underlying seed size, shape and weight is essential to these traits for enhancing soybean cultivars. High-density genetic maps of two recombinant inbred line populations, LM6 and ZM6, evaluated in multiple environments to identify candidate genes behind seed-related traits major and stable QTLs. A total of 239 and 43 M-QTL were mapped by composite interval mapping and mixed-model based composite interval mapping approaches, respectively, from which 22 common QTLs including four major and novel QTLs. CIM and MCIM approaches identied 180 and 18 novel M-QTLs, respectively. Moreover, 18 QTLs showed signicant AE effects, and 40 pairwise of the identied QTLs exhibited digenic epistatic effects. Seed atness index QTLs (34 QTLs) were identied and reported for the rst time. Seven QTL clusters underlying the inheritance of seed size, shape and weight on genomic regions of chromosomes 3, 4, 5, 7, 9, 17 and 19 were identied. Gene annotations, gene ontology (GO) enrichment and RNA-seq analyses identied 47 candidate genes for seed-related traits within the genomic regions of those 7 QTL clusters. These genes are highly expressed in seed-related tissues and nodules, that might be deemed as potential candidate genes regulating the above traits in soybean. This study provides detailed information for the genetic bases of the studied traits and candidate genes that could be eciently implemented by soybean breeders for ne mapping and gene cloning as well as for MAS targeted at improving these traits individually or concurrently. present study used RAD-seq to generate over 2200 bin-markers for either of the two-related recombinant inbred line (RIL) populations for QTL mapping and candidate gene(s) identication for seed size, shape and weight. The two-related RIL populations (ZM6 and LM6) were derived from a common male parent Meng 8206 (M8206) crossed with either Zhengyang (Z) and Linhefenqingdou (L) and RILs and their parents were evaluated across multiple environments. The study was aimed to: (i). map main-effect QTLs (M-QTLs), additive by additive QTLs and QE for seed size, shape and weight traits, (ii). analyze epistatic QTL pairs and their interactions with the environment for further utilization of these QTLs in soybean genetic improvement, and (iii). mine potential candidate genes for the major and stable QTLs. These ndings would be useful for the application of marker-assisted breeding (MAB) in soybean and provide comprehensive knowledge on the genetic bases for these traits as well as mined candidate genes would serve as a foundation for functional validation and verication of some genes for seed size, shape and weight in soybean. on implemented by gene B.K., S.S., S.L., Y.C., M.A. and A.H. analyzed the data. M.A.E. drafted the manuscript. T.Z. and S.F.A. revised the paper. All authors have read and agreed to the published version of the manuscript.

generate large scale markers. Among them include genotyping-by-sequencing (GBS) (Poland et al. 2012), restriction-site associated DNA sequencing (RADseq) (Miller et al. 2007;Peterson et al. 2012) and speci c length ampli ed fragment sequencing (SLAF-seq) (Sun et al. 2013). These sequencing technologies have facilitated the production of hundreds to millions of single-nucleotide polymorphisms (SNPs) throughout the whole genome that promote the development of high-density linkage maps. The RAD-seq produces markers that have been proved to be a promising tool for SNP detection and genetic map construction (Chutimanitsakun et al. 2011;Xie et al. 2018). A number of genetic maps produced by RAD-seq have been generated and used for QTL mapping in several crops such as barley (Chutimanitsakun et al. 2011), soybean (Hina et al. 2020), cowpea (Pan et al. 2017), jute (Kundu et al. 2015), sorghum (Kajiya-Kanegae et al. 2020), alfalfa (Zhang et al. 2019a) among others.
In addition to the above, knowledge of molecular mechanisms underlying soybean seed size, shape and weight is still limited. So far, only two seed sizes/weight-related genes have been isolated from the soybean. The gene Glyma20g25000 (ln) has a signi cant impact on seed size and the number of seeds per pod (Jeong et al. 2012), and the PP2C-1 allele underlying Glyma17g33690 from the wild soybean accession 'ZYD7' was reported and demonstrated to increase seed size/weight (Lu et al. 2017). Therefore, it is essential to identify major and stable QTLs and candidate genes related to seed size, shape and weight to improve our understanding of genetic mechanisms controlling these important traits in soybean (Kato et al. 2014;Zhang et al. 2018). Due to the shortage in the available molecular markers and the lack of high-density linkage maps which resulted in low resolution and large con dence intervals of identi ed QTLs, the present study used RAD-seq to generate over 2200 bin-markers for either of the two-related recombinant inbred line (RIL) populations for QTL mapping and candidate gene(s) identi cation for seed size, shape and weight. The two-related RIL populations (ZM6 and LM6) were derived from a common male parent Meng 8206 (M8206) crossed with either Zhengyang (Z) and Linhefenqingdou (L) and RILs and their parents were evaluated across multiple environments. The study was aimed to: (i). map main-effect QTLs (M-QTLs), additive by additive QTLs and QE for seed size, shape and weight traits, (ii). analyze epistatic QTL pairs and their interactions with the environment for further utilization of these QTLs in soybean genetic improvement, and (iii). mine potential candidate genes for the major and stable QTLs. These ndings would be useful for the application of marker-assisted breeding (MAB) in soybean and provide comprehensive knowledge on the genetic bases for these traits as well as mined candidate genes would serve as a foundation for functional validation and veri cation of some genes for seed size, shape and weight in soybean.

Plant materials and experiments
Two recombinant inbred lines (RIL) populations, i.e., ZM6 and LM6 Zhang et al. 2019b), consisting of 126 and 104 lines, respectively, were used in the present study. The two populations were developed by single seed descent (SSD) with the genotypes Zhengyang (Z) and Linhefenqingdou (L) were used as female parents and the M8206 (M6) genotype was used as the male parent. The two female parents, Z and L, have an average 100-seed weight of 17.1 and 35 g, respectively, whereas the male parent has an average 100-seed weight of 13.7 g.
The two RIL populations along with their parents were evaluated for seed size and shape across multiple environments. Experiments were conducted in the Jiangpu Experimental Station (33 • 030 N and 63 • 1180 E), Nanjing, Jiangsu Province, in 2012Province, in , 2013Province, in , 2014 and 2017 growing seasons (designated as 12JP, 13JP, 14JP, and 17JP, respectively), the Fengyang Experimental Station, Chuzhou, Anhui Province (32 • 870 N and 117 • 560 E), in 2012 growing season (designated as 12FY) and the Yancheng Experimental Station, Yangcheng, Jiangsu Province, (33•410 N and 120•200 E) in 2014 (designated as 14YC). Plants were sown in June and harvest were done in October of the same year. Experiments were designed in a randomized complete blocks design (RCBD) with three replications. The experimental plot was one row of 2 m long at 5 cm plant to plant distance and 50 cm row to row distance. Planting and post-planting operations were carried out following the recommended agronomical practices.
Phenotypic data were measured and recorded according to standard procedures (Cheng et al. 2006;Tomooka et al. 2002). In brief, seeds harvested from 10 guarded plants in the middle of each row were used for estimating SL, SW, ST and HSW. The SL was measured as the longest dimension over the seed equivalent to the hilum. SW was measured as the longest dimension across the seed vertical to the hilum. ST was measured as the longest dimension from top to bottom of the seed. The SL, SW, and ST were estimated in millimeters (mm) utilizing the Vernier caliper instrument, according to (Kaushik et al. 2007) (Fig. 1). The seed shape was identi ed by calculating three different ratios, i.e., SL/SW (SLW), SL/ST (SLT), and SW/ST (SWT), as well as atness index (FI).
The ratios between the SL, SW and ST were estimated from the individual values of the length, width, and thickness of the seeds according to (Omokhafe and Alika 2004). Flatness index is an indicator for high or lower seed vigor which impacts the seedling uniformity index as it reduces if the seed vigor reduces. Thus, showing that the plants with FI near to one has good seed vigor concluding higher leaf area index at early vegetative stages and hence higher yield per plant as they have a good number of pods due to the higher accumulation of photoassimilates. These plants, therefore, do not need to increase their average internode length and have a higher number of total nodes and a higher pod number. While atness index (FI) was calculated following the formula elaborated by (Cailleux 1945) and (Cerdà and Garcıa-Fayos 2002) to describe seed shape: where is the seed length, is seed width and T is seed thickness.
It extended from a value of 1 for the round seeds to more than 2 for skinny seeds. The HSW was expressed as an average of ve measurements of 100 randomly selected seeds.
The descriptive statistics of the seed size, seed shape, and HSW traits were calculated using the SPSS software, version 24 (http://www.spss.com). The analysis of variance (ANOVA) for each environment and the combined overall environments (CE) were performed using the PROC GLM procedure in SAS software based on the random model (SAS Institute Inc. v. 9.02, 2010, Cary, NC, USA). The broad-sense heritability (h 2 ) in individual environments was estimated as: where σ 2 g , σ 2 e and σ 2 ge are the variance components estimated from the analysis of variance for the genotypic, error and genotype × experiment variances, respectively, with r as the number of replicates and n as the number of environments. All the parameters were assessed from the expected mean squares in ANOVA. Pearson correlation coe cient (r) between seed size, seed shape, and HSW traits was calculated from the mean data utilizing the SAS PROC CORR with data obtained for CE (average across environments) for each population.

Construction of Genetic Maps and QTL Analysis
High-density genetic maps of the ZM6 and LM6 populations consist of 2601 and 2267 bin markers by using RAD-seq technique, respectively Zhang et al. 2019b) (Suppl. Table 1). The total length of the ZM6 and LM6 maps were 2630.22 and 2453.79 cM, with an average distance between the markers 1.01 and 1.08 cM, respectively (Suppl.

Main-and Epistatic-Effect QTLs Mapping
The WinQTLCart 2.5 software ) was employed to identify the M-QTLs using the average values of seed size, seed shape, and 100-seed weight from the individual environments and overall environments with the composite interval mapping model (CIM) (Zeng 1994). The software running features were 10 cM window size, 1 cM running speed, the logarithm of odds (LOD) (Morton 1955) threshold was computed using 1000 permutations due to an experiment-wide error proportion of P < 0.05 (Churchill and Doerge 1994), and the con dence interval was determined utilizing a 1-LOD support interval, which was controlled by nding the local on the two sides of a QTL top that compatible with a reduction of 1 LOD score. The QTL detected within the overlapping intervals in different environments were considered the same (Palomeque et al. 2009;Palomeque et al. 2010;Zhaoming et al. 2017). Moreover, to identify the genetic effects of the QTLs, i.e., additive QTLs, additive × additive (AA), additive × environment (AE) and AA × environment (AAE), the mixed-model based composite interval mapping (MCIM) procedure was employed in the QTLNetwork V2.1 software (Yang et al. 2008). Critical F-value was calculated by a permutation test with 1000 permutations for MCIM. The effects of QTLs were assessed using the Markov Chain Monte Carlo (MCMC) approach. Epistatic effects, candidate interval selection, and putative QTL detection were estimated with an experiment-wide error proportion of P < 0.05 (Wang et al. 1994;Xing et al. 2012;Yang et al. 2007).

Mining of Candidate Genes for QTL Clusters
QTLs identi ed in two or more environments with an R 2 > 10% were considered as major and stable QTLs (Zhaoming et al. 2017). Regions on chromosomes with several M-QTLs related to different traits studied in this research were termed as a QTL Cluster. The Phytozome (http:/phytozome.jgi.doe.gov) and SoyBase (http:/www.soybase.org) online platform repositories, were employed to retrieve all model genes within the physical interval position of the main "QTL Clusters". Possible candidate genes were predicted based on gene annotations (http:/www.soybase.org and https:/phytozome.jgi.doe.gov) as well as the reported putative function of genes implicated in these traits. Gene ontology (GO) information was obtained from SoyBase via the online resources, i.e., The Center for Biotechnology Information (NCBI: https:/www.ncbi.nlm.nih.gov), GeneMania (http:/genemania.org/) and the Kyoto Encyclopedia of Genes and Genomes (KEGG, www.kegg.jp). These online tools were employed to further screen the predicted candidate genes. Gene ontology (GO) enrichment analysis was conducted for all the genes within each QTL cluster region using AgriGO V2.0 (http:/systemsbiology.cau.edu.cn) (Tian et al. 2017). Gene classi cation was then carried out using Web Gene Ontology (WeGO) Annotation Plotting tool, Version 2.0 (Ye et al. 2006). The publicly available RNA-Seq database on the SoyBase website was used to analyze the expression of the predicted candidate genes in various soybean tissues and the development stages. A heatmap to visualize the fold-change patterns of these candidate genes was developed using the TBtools_JRE 1.068 software (Chen et al. 2020a).

Results
Phenotypic evaluation of seed size, seed shape, and 100-seed weight traits All measured (SL, ST, SW, and HSW), and calculated (SLW, SLT, SWT and FI) phenotypic traits exhibited signi cant differences among the three parental lines across all environments as indicated by the analysis of variance (Tables S2 and S3). Analysis of variance (ANOVA) revealed that all studied traits were signi cantly (P < 0. 001 or P < 0.05) in uenced by the environment, genotypes and the genotype × environment interaction (Tables S4 and S5), indicating the differential response of the genotypes to the changes in environmental cues. Furthermore, the two populations showed continuous phenotypic variations in all studied traits, implying a polygenic inheritance of these traits (Fig. 2). Besides, the estimation of skewness, kurtosis and coe cient of variation (CV) for all studied traits across all environments showed that most of the recorded skewness and kurtosis values were < 1 with CV values of > 3 %, emphasizing that these traits in both populations are controlled by polygenes and are t for QTL mapping (Tables S2 and S3). The differences in mean phenotypic values among the three parental lines for seed size, seed shape, and HSW traits were constantly high across all studied environments, and their multi-environment means for both populations (Fig. 3). The female parent of the LM6 population, Linhefenqingdou, exhibited an average increase of 27.80,28.19,31.10 and 41.37 % in SL,ST,SW and HSW,respectively,compared to the male parent, M8206. Meanwhile, in the ZM6 population the female parent Zhengyang surpassed the male parent M8206 by an average of 11. 00,9.66,7.65 and 17.53 % in SL,ST,SW and HSW across all environments,respectively (Fig. 3,Tables S2 and S3). In both populations, several lines overstep their parents in both directions in all studied traits across all environments, suggesting the occurrence of transgressive segregations within the two populations (Fig. 2, 3). The broad-sense heritability (h 2 ) under individual environments ranged from 69.58-99.04%, 69.08-97.9% and 78.49-98.68% for seed size, HSW and seed shape (Tables S2 and S3). Meanwhile, h 2 under combined environments (CE) ranged from 90.82-94.48%, 87.51-94.46% and 97.58-98.39% for seed size, shape and HSW, respectively. The correlation coe cient (r 2 ) among SL, ST and SW exhibited signi cant positive correlations with each other and with two of the seed shape traits (SLT and SLW) in both populations with r 2 values ranged from 0.79-0.91. Meanwhile, SL, ST and SW exhibited signi cant negative correlations with the other two seed shape traits (SWT and FI) (Suppl. Table 6). Except for the correlation between SLW and SWT, all the seed shape traits showed signi cant positive correlations with each other in both populations with r 2 values ranged from 0.33-0.95. Furthermore, all seed size traits, i.e., SL, SW, and ST showed signi cant positive correlations with HSW with r 2 values ranged from 0.29 to 0.70 in both populations.
Mapping of seed shape main-effect QTLs in the two populations.
Comparative analysis of main-effect QTLs for seed size, shape and weight in the two-related populations Regions on chromosomes with several identi ed M-QTLs for different studied seed phenotypic traits were designated as a QTL cluster. Accordingly, twentyfour QTL clusters located on 17 chromosomes with exception of Chr02, Chr12 and Chr18, were identi ed (Suppl. Table 10, Suppl. Figure 2). Among the identi ed 24 clusters, seven clusters harbored QTLs related to seed size, seed shape, and HSW, ve clusters harbored QTLs related only to seed size and seed shape traits, nine clusters comprised QTLs related to seed size and HSW traits, and 3 clusters harbor QTLs for only seed shape traits (Suppl. Table 10). The majority of these clusters contained major QTLs. Furthermore, QTLs within 15 clusters revealed positive additive effects with the bene cial alleles inherited from the big seed size and heavy seed weight parents (either Zhengyang or Linhefenqingdou). Eight clusters out of 24 contain QTLs that have been detected in both populations (Suppl. Table 10). The most prominent M-QTL (qFI-1-3 LM6 ) with a LOD score of 3.71-10.44 and R 2 (10.45-31.50 %) was located in Cluster-01. Each cluster comprised a different number of QTLs, with the highest number of QTLs, i.e., seven associated with seed size, shape, and HSW traits were in Cluster-03 at the physical position of 1,509,548-6,780,840 bp allocated as two QTLs related to seed size (qSL-3-1 LM6 and qSL-3-2 LM6 ), four QTLs for seed shape (qSLW-3-2 LM6 , qFI-3-1 LM6 , qSLT-3-1 LM6 , and qFI-3-2 LM6 ) and one QTL HSW (qHSW-3-1 LM6 ). In addition, except for qHSW-3-1 LM6 , all QTLs in this cluster were major QTLs with R 2 > 10 %. Furthermore, each of clusters-13, 16.2, and 17.1 contained ve or six QTLs only related to seed size and HSW traits and displayed R 2 of 8.85-13.43 %, 5.96-11.26 %, and 6.8-18.30 %, respectively, and these clusters comprised M-QTLs from only one of the two RIL populations (Suppl. Figure 2, Suppl. Table 10). Another rich region of QTLs was cluster-20 on Chr20 with a physical length of 1.2Mb. This region harbored 5 M-QTLs related to seed size and shape, i.e., qFI-20-1 ZM6 , qSLT-20-1 ZM6 , qSLW-20-1 ZM6 , qST-20-1 ZM6 , and qSW-20-1 ZM6 , out of these ve QTLs, three are major QTLs with R² of 11.2-19.2% (Suppl. Table 10). Cluster-09 contained 5 QTLs related to seed size, shape, and HSW and displayed R² of 16.3 % and 12.5 % for qHSW-9-1 ZM6 and qSLW-9-3 LM6 , respectively, across the two populations. Cluster-14.1 consisted of four major M-QTLs within the physical region between 5834015-9844637bp in both populations with R 2 values ranged from10.4-18.4%, one from which (qSW-14-2 ZM6 ) are related to seed size traits, and three (qSLW-14-1 LM6 , qFI-14-1 LM6 , and qSLT-14-1 LM6 ) for seed shape traits all were (Suppl. Table 10). Six clusters harbored 4 M-QTLs each, were identi ed, from which four clusters harbored QTLs associated with HSW as well as some seed size and seed shape traits, i.e., cluster-07 and cluster-19.1, cluster-08 and cluster-14.2 cluster-10.2 mapped on Chr10 and contained M-QTLs for seed size and seed shape traits, and cluster-16.1 that contained only M-QTLs related to seed shape traits (Suppl. Table 10). The remaining 9 clusters have three QTLs each, out of them cluster-01 and cluster-17.2 that contain major QTLs related only to seed shape traits. Conversely, cluster-04.1 and cluster-19.2 contain minor M-QTLs associated with SW, SL and HSW. Another two clusters contained M-QTLs for both seed size and seed shape traits (Suppl . Table 10). Moreover, the other three clusters of M-QTLs, i.e., cluster-10.1, cluster-11 and cluster-15 contained both major and minor QTLs for seed size traits and HSW. The Cluster-04.2 had two QTLs for two traits, qHSW-4-3 LM6, ZM6, and qSL-4-1 ZM6 with R 2 of 13.1-17.7 %. Among the identi ed 24 clusters, seven clusters harbored QTLs related to seed size, seed shape, and HSW, ve clusters harbored QTLs related to only seed size and seed shape traits, nine clusters had QTLs related to seed size and HSW traits, and three clusters harbored QTLs for only seed shape traits (Suppl. Table 10). The majority of these clusters had major QTLs. Furthermore, QTLs within 15 clusters revealed positive additive effects with the bene cial alleles inherited from the big seed size and heavy seed weight parents (either Zhengyang or Linhefenqingdou). Eight clusters out of twenty-four contained QTLs that have been detected in both populations (Suppl. Table 10).
Analyses of additive effect QTL and additive QTL ×environment interactions for seed size, shape and weight. The mixed-model based composite interval mapping (MCIM) method was implemented in QTL Network V2.1 software to map for additive effect (A) QTLs and their interactions with the environment (AE) was performed for both RIL populations across multi-environments. In total, thirty-ve AA on 17 chromosomes related to seven seed size and seed shape traits were identi ed. These comprised 9, 3, 7, 3, 4, 1, and 8 A QTLs associated with SL, SW, ST, SLW, SLT, SWT, and FI, respectively, in the LM6 and ZM6 populations across all environments (Table 1). Furthermore, the contributed allele of 11 QTLs of them was inherited from the M8206 parent that decreased seed size and seed shape values through signi cant additive effects. Meanwhile, the contributed allele of the remaining 24 QTLs descended from either Zhengyang or Linhefenqingdou parent of the ZM6 or LM6 population, respectively, that increased seed size and shape values through signi cant additive effects (Table 1). On the other hand, thirteen out of 35 QTLs revealed signi cant AE effects in at least one environment. However, ve QTLs, i.e., qSW-13-5 LM6 , qSW-19-2 LM6 , qFI-8-6 LM6 , qSL-10-2 ZM6 and qST-13-5 ZM6 , showed signi cant or highly signi cant AE among all studied environments (Table 1). Furthermore, the in uence of AE effects on seed size and seed shape values was environmentally dependent (Table 1). Eight AA QTLs associated with HSW were identi ed on 6 chromosomes, i.e., Chr03, Chr08, Chr09, Chr13, Chr14 and Chr16 in LM6 and ZM6 populations across six environments ( Table 2). Six of those 8 QTLs displayed a positive additive effect with the bene cial allele inherited from the female parent (Linhefenqingdou or Zhengyang in the LM6 or ZM6 population) which could increase HSW. Meanwhile, the remaining two QTLs, i.e., qHSW-13-3 ZM6 and qHSW-14-2 ZM6 , revealed negative additive effects with the alleles are inherited from the common male parent (Meng8206) which could reduce HSW (Table 2). Additionally, two QTLs, i.e., qHSW-14-3 LM6 and qHSW-8-3 ZM6 , displayed signi cant AE effects in two individual environments. Whereas, the qHSW-13-3 ZM6 showed a signi cant AE only in the 13JP environment. In addition, the qHSW-14-4 ZM6 revealed signi cant or highly signi cant AE effects across three different environments, i.e., 12FY, 12JP, and 17JP (Table 2).

Comparison Of Two Mapping Approaches
A total of 92, 99, and 48 M-QTLs associated with seed size, seed shape, and HSW, respectively, were mapped by the CIM approach (Tables S7-S9). Meanwhile, forty-three QTLs were identi ed for seed size, shape and HSW by using MCIM approach (Tables 1 and 2). Among these, twenty-two QTLs were identi ed by both approaches within the same physical chromosomal position, indicating the dependability and stability of these QTLs. Moreover, a comparison of the physical chromosomal regions of the QTLs detected by both approaches revealed that four QTLs, i.e., qSL-7-1 LM6 , qSW-19-2 LM6 , qFI-3-1 LM6 , and qHSW-3-2 LM6 , were identi ed for the rst time in the two populations (LM6 and ZM6) with an R 2 > 10 %. Therefore, we considered these QTLs as novel and most stable QTLs that could be validated and utilized for map-based cloning, candidate genes identi cation and QTL stacking into elite cultivars targeted at improving seed size, shape and HSW in soybean.
Analyses of epistatic-effect QTLs and their interaction with the environment Analysis of the seed size and shape traits data under all four environments identi ed 38 pairwise epistatic effects (AA) QTLs, from which 2, 13, 6, 2, 3, 5, and 7 pairs were related to SL, SW, ST, SLW, SLT, SWT, and FI traits, respectively, with R 2 values ranged 0.51-11.35 % ( Table 3). All QTL pairs displayed a high signi cant additive × additive (AA) effect. Further analyses revealed that 20 AA QTLs showed signi cant or highly signi cant pairwise additive-additiveenvironment (AAE) interaction effects in at least one environment with R 2 values ranged 0.13-5.31 % (Table 3). Furthermore, ten pairs showed signi cant AAE in two environments, i.e., 12FY (AAE1), and 12JP (AAE2), while three pairs displayed signi cant AAE in 12JP (AAE2) and 14JP (AAE3) environments (Table 3). This indicates the effect of the environment on gene expression on phenotype development through epistatic effects. Out of 38 QTLs, sixteen pairwise interactions exhibited negative epistatic effects (AA) that decreased the values of seed size and shape traits, whereas 22 pairwise interactions exhibited positive epistatic effects (AA) that increased the values of seed size and shape traits ( Table 3). The pairwise interaction between qFI-1-1 ZM6 and qFI-7-3 ZM6 revealed the strongest positive epistatic effect (0.65), whereas the pairwise qSLT-6-1 LM6 and qSLW-9-1 LM6 , revealed the weakest positive epistatic effect (0.02).
Conversely, qSWT-3-1 LM6 and qSWT-13-1 LM6 resulted in the strongest negative epistatic effect (-0.71), whereas the pairwise qSLW-2-6 ZM6 and qSLW-18-3 ZM6 pairwise resulted in the weakest negative epistatic effect (-0.02) ( Table 3). Two digenic pairwise epistatic QTLs for HSW with highly signi cant additive × additive (AA) effects were identi ed on 4 chromosomes ( Table 4). The rst pairwise is composed of 2 QTLs, qHSW-11-1 LM6 , and qHSW-20-1 LM6 with an R² of 3.46 %, whereas the second pairwise comprises the two QTLs qHSW-9-1 ZM6 and qHSW-16-3 ZM6 with an R² of 1.38 %. In addition, the two pairwise interactions exhibited positive epistatic effects that could increase the HSW in both populations. Meanwhile, the two pairs did not show any signi cant AAE interaction effects across all six environments (Table 4).  Table 11). Gene ontology (GO) enrichment analyses via AgriGO V2.0 (http:/systemsbiology.cau.edu.cn) (Tian et al. 2017) were used to classify the model genes in each cluster. The classi cation was based on molecular function, biological process, and cellular components visualized on the Web-based GO (WeGO) V2.0 https://wego.genomics.cn (Ye et al. 2006) In all seven QTL clusters, high percentages of genes were related to catalytic activity, cell part, cell, cellular process, binding, and metabolic process terms, in addition to the response to stimulus in cluster-03 (Fig. 5). These indicate essential roles of these terms in the seed size, shape and seed weight development in soybean. Probable candidate genes underlying these QTL clusters responsible for seed size, shape, and HSW in soybean were further predicted based on gene annotations, GO enrichment analysis and the previously known putative biological function of the gene. Based on these, a total of 19, 12, 26, 18, 22, 30, and 16 . Table 12). These genes may work directly or indirectly in regulating seed development in soybean, which in turn regulating seed size, shape, and HSW. These genes are involved in response to brassinosteroid stimulus, regulation of cell proliferation and differentiation, regulation of transcription, secondary metabolism and signaling, storage of proteins and lipids, hormone-mediated signaling pathway, regulation of cell cycle process, transport, ubiquitin-dependent protein catabolic process, embryonic pattern speci cation, and response to auxin stimulus (Table 5). However, the RNS-seq data of expression of genes in soybean genome developed by (Severin et al. 2010)  respectively, in the young leaf, ower, pod, seed, root and nodule (Fig. 6, Suppl. Table 13). From the heatmaps, forty-seven genes out of the identi ed 143 candidate genes are highly expressed during seed developmental stages as well as in seed-related tissues (Fig. 6, Suppl. Table 13), hence these could be considered as potential candidate genes, however, they need screening and validation for utilization for seed size, shape, and weight improvement in soybean.

Discussion
Dissecting the genetic factors underlying seed size, shape and weight and their relationship to the ambient environment is essential for improving soybean yield and quality-related traits. In addition, understanding the additive and additive × environment effects of QTLs and their contribution to the phenotypic variations would facilitate the application marker-assisted selection (MAS) because it will prominently lead the breeders in the QTL selection and expectation of the outcomes of MAS (Jannink et al. 2009). A major objective of utilizing linkage mapping in plant breeding is to deepen our understanding of the inheritance and genetic architecture of quantitative traits and detect markers that can be employed as indirect selection tools in plant breeding (Abou-Elwafa 2016; Bernardo 2008). In this regard, QTL mapping has been regularly used for detecting the QTL/gene underlying the quantitative traits such as seed size, shape, and weight in crop plants. As known, parental diversity and marker density greatly in uence the accuracy and precision of QTL mapping. Besides, the population size used in most of the previously published reports for genetic mapping studies usually varied from 50-250 individuals, but larger populations are needed for high-resolution mapping. Moreover, a high-density genetic map facilitates the detection of narrow linked markers associated with QTLs and provide a good base for investigating quantitative traits (Galal et al. 2014;Mohan et al. 1997;Tewodros and Zelalem 2016). Besides, the statistical difference between phenotypic data obtained from various environments could enhance the accuracy to detect QTL position (Zhao and Xu 2012). Previous studies identi ed important seed size and shape QTLs, which were also classi ed to be associated with HSW, however, most of the studies utilized low-density genetic maps based on RFLP, SSR markers, biochemical and morphological markers which have large con dence interval with low resolution of QTLs not suitable for detecting candidate gene (Abou-Elwafa 2016; Bernardo 2008; Han et al. 2012). Therefore, it is crucial to employ high-density genetic maps to detect more new recombination in a population, which in turn will increase the accuracy of QTL mapping and MAS Hina et al. 2020).
In the present study, high-density genetic maps constructed from the two-related RIL populations LM6 and ZM6 consist of 2267 and 2601 bin markers, respectively, were used. The markers in the LM6 and ZM6 linkage maps were distributed to all 20 linkage groups and covered the length of 2453.79 and 2630.22 cM, with 1.08 cM and 1.01 cM average distance between adjacent markers, respectively . To minimize the environmental errors, the two RIL populations were evaluated in four environments (including different geographical areas and years). The high-density linkage maps of LM6 and ZM6 RIL populations across multi-environment as well as the combined environment were employed to map major main-effect, additive-effect and epistatic-effect QTLs together with interactions with environments and the candidate genes underly seed size, shape and seed weight traits. The parents of the two mapping populations showed high phenotypic variations across all environments in all studied traits, i.e., SL, SW, ST, SLW, SLT, SWT, FI and HSW. Consequently, the transgressive segregation and continuous variations observed in the two populations in all studied phenotypic traits facilitate the identi cation of a high number of both major and minor effect QTLs including some novel QTLs associated with all studied traits (Teng et al. 2009;Xu et al. 2011;Zhang et al. 2018). All measured and calculated traits in both populations were signi cantly (P < 0.01) in uenced by genotype (G), environment (E) and their interactions (G×E), suggesting that the seed size, shape, and weight traits are not only governed by both genetic and environment but also there is an effect of the G×E interaction (Hu et al. 2013;Liang et al. 2016;Sun et al. 2012). This explains the observed high h 2 (99.04 %), and accordingly deduced that these traits are amenable to manipulation by selection without the assistance of molecular markers, indicating that these traits may produce the same phenotypic values when evaluated in the same geographical area. Except for SL, SW and ST that exhibited a highly signi cant correlation between each other and with HSW, our data showed that seed size, shape, and weight traits are not correlated which is favorable when breeding for a round-type with smaller or bigger seed size (Cober et al. 1997;Salas et al. 2006).
Comparative QTL results using the CIM QTL mapping approach with SoyBase database identi ed 69, 82 and 29 novel QTLs for seed size, shape, and HSW, respectively, indicating the distinct genetic architecture of the LM6 and ZM6 populations. These novel QTLs together explain more than 88.00 % of phenotypic variance for seed size, shape, and weight, signifying their potential value for improving soybean cultivars. Besides, the identi cation of novel QTLs in the present study suggests that more germplasms are needed to be used for unraveling the complex genetic basis for seed size and shape traits in soybean.
Among these novel QTLs, the qFI-1-3 LM6 showed the highest R 2 and LOD values and therefore may be the major QTL underlies atness index (FI).
Among these novel QTLs, the qFI-1-3 LM6 showed the highest R 2 and LOD values and therefore might be the major QTL underlies FI. Besides, Chr01 and Chr03 harbored 4 and 3 FI QTLs, suggesting crucial roles of Chr01 and Chr03 in controlling the inheritance of seed FI in soybean. Moreover, the positive alleles for seed size, shape and HSW traits were inherited from both parents of the two RIL populations. Therefore, it is likely that not only the higher seed size and heavy weight parent (Linhefenqingdou or Zhengyang) contributed favorable alleles but also the lighter seed weight parent (M8206) might play a role Hina et al. 2020).
Mapping of QTLs associated with seed size, shape and weight related traits using the MCIM approach was performed to; i) dissect the additive effect QTLs and Q × E interactions which is essential for selecting the most compatible varieties adapted to particular environments, and ii) further validate the QTLs identi ed by the CIM approach. The MCIM method approach identi ed 18 QTLs for seed sizes, shapes, and weight traits that are colocalized in the same physical interval of the CIM-mapped QTLs as previous studies. The major SL QTL qSL-7-1 LM6 is colocalized with Satt150 QTL (Salas et al. 2006). Furthermore, the SW QTLs qSW-13-5 LM6 , qST-18-4 LM6 , qSWT-7-5 LM6 , and qSLT-5-3 ZM6 were mapped in the same position as reported in previous studies (Fang et al. 2017;Salas et al. 2006). Additionally, the qHSW-3-2 LM6 , qHSW-8-3 ZM6 and qHSW-13-3 ZM6 QTLs are colocalized to the previously identi ed SW QTL Seed weight 32 − 3 (Satt675), Seed weight 35 − 1 and Seed weight 19 − 2 (Satt114) QTLs, respectively (Funatsuki et al. 2005;Han et al. 2012;Li et al. 2008). Therefore, these QTLs could also be considered as stable QTLs for further ne mapping and map-based cloning to uncover the genetic control and mechanisms of seed size, shape, and weight traits in soybean, and molecular markers tightly linked to these QTLs could be used for MAS.
Dissecting the epistatic and QTL × environment effects are crucial for understanding the genetic mechanisms that greatly contributed to the phenotypic variations of complex traits (Kaushik et al. 2007). The genetic construction of seed size, shape, and weight also contains epistatic interactions between QTLs (Kato et al. 2014;Liang et al. 2016). Therefore, disregarding intergenic interactions will lead to the over-estimation of individual QTL effects, and the underestimation of genetic variance (Nyquist and Baker 1991). Consequently, this might result in a large drop in the genetic response to MAS especially in late generations (Zhang et al. 2004). The identi ed 40 pairwise digenic epistatic QTLs for seed size, shape and weight related traits in the present study could be considered as modifying genes that do not exhibit only additive effects but could affect the expression of seed size, shape and weight related genes through epistatic interactions. Similar results for the epistatic interaction of seed size, shape and weight QTLs have been also previously reported by (Xin et al. 2016;Zhang et al. 2018). The appearance of epistatic interactions for a speci c trait makes selection di cult. Noteworthily, all main-effect QTLs detected in our study had no epistatic effect, which raises the heritability of the trait guiding to easier selection.
Genomic regions were identi ed as QTL clusters based on the presence of a large number of QTLs related to all or some of seed size, shape and HSW. The identi cation of 24 QTL clusters located on 17 different chromosomes. Accordingly, twenty-four QTL clusters were identi ed on 17 chromosomes each contained three or more QTLs related to seed size, shape, and HSW traits. These QTL clusters have not been reported previously and added to the developing knowledge of the genetic control of these traits. Moreover, the colocalization of QTLs for seed size, shape, and HSW and the way that they have exceptionally corresponded support the highly signi cant correlation with each other (Cai and Morishima 2002) (Suppl . Table 10). Besides, the occurrence of the QTL clustering could signify a linkage of QTLs/genes or outcome from the multiple effects of one QTL in the same genomic region Liu et al. 2017;Wang et al. 2006). Furthermore, the QTL clusters displayed that the QTLs linkage/ gathering could make the enhancement of seed size and shape more easily than single QTLs (Hina et al. 2020). Signi cant positive correlations between soybean seed protein and oil contents and seed size and seed shape have been demonstrated, therefore, both traits are directly associated with seed size and shape in soybean (Hacisalihoglu and Settles 2017;Qi et al. 2011;Wu et al. 2018 Yang et al. 2011). Additionally, the position of the rst ower and the number of days to owering have large effects on seed number per plant in soybean (Khan et al. 2008;Tasma et al. 2001;Yamanaka et al. 2001), which in turn affects seed size and HSW indicating the existence of common genetic factors for these traits. QTLs associated with the position of the rst ower identi ed previously (Han et al. 2012;Tasma et al. 2001), are located to the genomic region of clusters 16.1, 19.2, and 20 (Hyten et al. 2004). The extensive analysis of QTLs clusters in our study suggests that breeding programs aiming to improve seed size, shape, and weight with enhanced quality should focus on QTL clustering and select QTLs within these regions. Besides, the existence of QTL clusters provides evidence that some traits related-genes are more densely concentrated in speci c genomic regions of crop genomes than others (Fang et al. 2017).
Identi cation of candidate genes underlying QTL regions is of great interest for breeding programs (Abou-Elwafa 2018; Abou-Elwafa and Shehzad 2018). So far, only two seed sizes/weight-related genes have been cloned from the soybean, i.e., the Glyma20g25000 (ln) gene that has a signi cant impact on seed size and the number of seeds per pod (Jeong et al. 2012), and the PP2C-1 allele underlying Glyma17g33690 has been reported to increase seed size/weight (Lu et al. 2017). In our study, a bioinformatics pipeline implementing genomic sequences of identi ed QTL clusters was employed to identify candidate genes. The pipeline consists of three complementary steps, i.e., 1) retrieving candidate genes from the SoyBase database (www.soybase.org) visualize the molecular function of candidate genes by GO enrichment analyses and gene classi cation, and 3) the implication of candidate genes in seed size, shape and weight based on their expression pro les. Accordingly, one-hundred forty-three genes were considered as potential candidates. The GO enrichment and gene classi cation analyses showed that most of the identi ed candidate genes behind QTL clusters are related to the terms of catalytic activity, cell part, cell, cellular process, binding and metabolic process terms in addition to the response to stimulus in cluster-03, and these terms are reported as being vital elements in seed development (Li and Li 2014;Mao et al. 2010). For example, Glyma07g14460 gene underlying QTL cluster-7 belongs to the oxygenase (CYP51G1) protein class, which has been con rmed to regulate seed size in soybean (Zhao et al. 2016). These predicted 143 genes have functions that are related/involved in seed development, which in turn in uences the size, shape, and weight of seeds, such as brassinosteroid mediated signaling pathway, regulation of cell differentiation and proliferation, fatty acid beta-oxidation, peroxisome organization, double fertilization forming a zygote and endosperm, lipid transport and storage, regulation of hormone levels transport and metabolic processes, ubiquitin-dependent protein catabolic process, and sugar mediated signaling pathway (Li and Li 2014;Mao et al. 2010). Furthermore, ten candidate genes were identi ed as a regulator of ubiquitin-dependent protein catabolic process, RING-type E3 ubiquitin ligases and lipid catabolic process (Table 5). Several components of the ubiquitin pathway such as the ubiquitin activating enzyme (E1), ubiquitin conjugating enzyme (E2) and ubiquitin protein ligase (E3) have been reported to play important roles in regulation seed and organ size (Li and Li 2014). Similarly, 16 candidate genes have functions in pollen tube development, embryo sac egg cell differentiation, post-embryonic development, regulation of seed maturation, positive regulation of gene expression, regulation of cell cycle process, ovule development, anther development, seed dormancy process and seed maturation (Table 5) and hence they are likely to participate in regulating seed size, shape and weight in plants, including soybean (Meng et al. 2016). Additionally, ten candidate genes are involved in response to auxin stimulus, response to ethylene stimulus, abscisic acid biosynthetic process that are known to be implicated in promoting seed size and weight in Arabidopsis (Table 5) (Xie et al. 2014). Furthermore, six genes are known to play functions in glucose catabolic process, phosphatidylcholine biosynthetic process, carbohydrate metabolic process, maltose metabolic process and starch biosynthetic process which some of them are known to be involved in partitioning and translocation of photo-assimilates and grain lling in rice (Table 5) (Chen et al. 2020b;Zhang et al. 2020).

Conclusions
The present study employed high-density maps of two-related RIL populations, LM6 and ZM6 evaluated in multiple environments to identify M-QTLs as well candidate genes controlling seed size, shape, and weight in soybean. Besides, this is the rst detailed and comprehensive investigation of QTLs for atness index as a seed shape trait in soybean. A total of 180 and 18 M-QTLs were reported for the rst time in this study using the CIM and MCIM QTL mapping approaches, respectively. Besides, sixty-nine QTLs were considered as stable as detected in more than one speci c environment or one individual environment together with CE. All the positive alleles of 282 identi ed QTLs were inherited from the female parents. Our data revealed 7 major and stable QTL clusters underlying the inheritance of seed size, shape and weight located to genomic regions on chromosomes 3, 4, 5, 7, 9, 17 and 19 in soybean. The implemented bioinformatics pipeline delimits the number of the identi ed candidate genes to 47 genes within the physical interval of the previously mentioned 7 genomic regions involved directly or indirectly in seed size, shape and weight. These genes are highly expressed in seed-related tissues and nodules, indicating that they may be involved in regulating the above traits in soybean. Furthermore, some of the potential 47 candidate genes have been included in our on-going projects for functional validation to con rm their effect on seed size, shape, and weight. Our study provides detailed information for genetic bases of the studied traits and candidate genes that could be e ciently implemented by soybean breeders for ne mapping and gene cloning as well as for MAS targeted at improving seed size, shape and weight.

Con icts of Interest
The authors declare no con icts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Availability of data and material
All data are included within the manuscript and its supplementary material.
Code availability Not applicable.
Ethics approval and consent to participate Not applicable. Boxplot for seed size, seed shape and 100-seed weight traits. The black line in the middle of the box shows the median, the white box indicates the range from the lower quartile to the upper quartile, the dashed black line and yellow dots represent the dispersion and frequency distribution of the phenotypic data in each of the six environments, i.e., 12FY, 12JP, 13JP, 14JP, 14YC, and 17JP. While a and b represent LM6 and ZM6 populations  Position of most prominent QTL detected by CIM approach associated with seed size and seed shape traits in the LM6 and ZM6 RIL populations grown in multiple environments indicated as with E1, FY2012; E2, JP2012; E3, JP2013; E4, JP2014; E5, YC2014; E6, JP2017 respectively, in addition to the combined environment (CE). (a) LOD curve for qSL-10-1LM6, (b) LOD curve for qSW-17-2LM6, (c) LOD curve for qST-13-2ZM6, (d) LOD curve for qSLW-13-4ZM6, (e) LOD curve for qSLT-14-1LM6, (f) LOD curve for qSWT-8-1LM6, (g) LOD curve for qFI-1-3LM6, and (h) LOD curve for qHSW-14-2ZM6. The LOD threshold (2.5) is indicated by a pink line. The double-headed arrow denotes the location of prominent QTL. The X and Y-axis represent chromosome and LOD score, respectively.