Dissecting four correlated growth period traits using a genome-wide association study approach in soybean

Soybean [Glycine max (L.) Merr.] is a short-day, photoperiod-sensitive crop. The vegetative growth period (V), reproductive growth period (R), whole growth period (V + R) and the ratio of growth period structure (R/V) are photoperiod traits critical to adaptability and yield in soybean. To thoroughly dissect the genetic basis of these growth period traits and their correlations, a panel of 277 soybean accessions were genotyped with a single nucleotide polymorphisms (SNPs) panel and phenotyped at three field environments over three years. A genome-wide association study found 28 SNPs on fifteen chromosomes with 33 significant marker-trait associations, eleven, eight, four and ten for V, R, V + R and R/V, respectively. Two SNPs, Map-6077 and -6502, showed pleiotropy in three and two traits, respectively. Map-6077, associated with V, R, and V + R, was a synonymous SNP of the GmMFT, a member of PEBP family that has been reported to inhibit seed germination and regulate oil content. These results provide candidate SNP markers and genes for further functional experiments to advance molecular marker-assisted breeding.


Introduction
As a typical short-day crop, soybean (Glycine max (L.) Merr.) is sensitive to changes in photoperiod, which varies with latitude and planting season (Liu et al. 2017). This sensitivity limits the introduction, cultivation and large-scale planting of elite soybean cultivars. Growth period traits are influenced by the response of the plant to the relative length of day and night. This is important not only in the context of extending the geographical range of cultivation of the crop but also to increase yield.
Although the whole growth period (V ? R) includes vegetative growth (V) and reproductive growth (R), they are relatively independent developmental stages (Fehr and Caviness 1977) and are used as evaluation traits (Yang et al. 1994;Zhang et al. 2015). R is related to yield, quality and resistance to stresses from environment, and several genetic loci have been associated with R besides V and V ? R (Cheng et al. 2011). Time to flower (R1), maturity (R8), and reproductive period (RP) have also been defined as three separate traits (Kong et al. 2018). Genetic loci differentiating DTF, days to flowering, (V) and DTM, days to maturity, (V ? R) as well as DTFM, days from flowering to maturity, (R) have been found indicating that there may be major and minor-effect loci underlying V and V ? R (Zhang et al. 2015). The ratio of R and V (R/V) is used as an index of growth period structure and the relationships between two stages (Jian 1990;Wang et al. 2015a). These traits are all photoperiod and important to the adaptability and yield of soybean (Garner and Allard 1927;Han et al. 2006). Pleiotropy is due to a QTP (SNP or other polymorphism affecting a quantitative trait) affecting two or more traits (Chen and Lubberstedt 2010). For example, a few genes have been found to associate with multiple agronomic traits in crops, for example, Ghd7 regulated heading date and yield potential in rice (Xue et al. 2008).
Here, we conducted a genome-wide study of vegetative growth period (V), reproductive growth period (R), whole growth period (V ? R), and the ratio of growth period structure (R/V) separately using 277 soybean accessions and 1161 SNPs, which found 32 significant marker-trait associations (MTAs). This study provides a better understanding of the underlying genetic basis of the growth period traits and provides functional SNPs as tools for molecular soybean breeding programs aimed at improving the adaptation and yield of soybean to each latitude.

Measurement of phenotypes
The 277 accessions were planted in a randomized complete block design with three replications at three field environments (BJSp, Beijing Spring; BJSu, Beijing Summer; SY, Sanya) over two years -2012 or three years 2010-12 (SY). For BJSu, BJSp, and SY, the average photoperiod (day length) decreased gradually due to planting in the field without any treatment. The field environments were defined as combinations of each location and planting season. BJSp and BJSu are from the Changping and Shunyi district of Beijing (40. 22°N, 116.2°E; 40.13°N, 116.65°E) with two different planting seasons: Spring (planted in mid-May) and Summer (planted in mid-June). SY is from Sanya, Hainan Province (18.25°N, 109.5°E), nearly 22°south of Beijing, planted in early November. At the Beijing site, each accession was planted in a 1.5 m long row with an interplant distance of ten centimeters (cm) and an inter-row distance of 55 cm. At Sanya, the row length was uniformly one meter, the interplant distance was six cm, and the inter-row distance was 45 cm. The following traits were scored: date of emergence of the cotyledons as VE, the date when 50% of the plants within the row first flowered as R1 and the date when 50% of the plants within the row had reached maturity (95% of pods maturity for one plant) as R8 (Fehr et al. 1971;Qiu et al. 2006). Four traits were analyzed: V = R1-VE, R = R8-R1, V ? R = R8-VE, R/V = (R8-R1) / (R1-VE). V and R represent the vegetative and reproductive stages as described previously (Fehr et al. 1971). Measurements were averaged over the three replicates, and subsequently over the three years in each environment, using a best linear unbiased prediction (BLUP) (Henderson 1975).

Phenotype data analysis and Heritability estimation
Plants that did not reach maturity were not included and treated as missing data. The field data was used to get V, R, V ? R and R/V phenotype data, and extreme values that due to artificial error or plant died from virus or other reasons were filtered. BLUP values for each independent environment used in GWAS, phenotypic description analysis, and trait correlations analysis were estimated through the R environment package lme4 package (Bates et al. 2014) to evaluate the BLUPs with the linear mixed model, lmer(y * (1|Accession) ? (1|Year) ? (1|Accession:Year)). The broad-sense heritability for each trait was evaluated by lme4 package using the formula H 2 = V g /(V g ? V GL /L ? V GY /Y ? V e /(L*Y)), where V g , V GL , V GY , V e represent variance of the genotype, genotype and location, genotype and year, and residual variance, respectively. Descriptive analysis of the phenotypic data was performed using R software (https://www.r-project.org).

SNP genotyping and filtering
A total of 1236 SNPs were used for genotyping 277 cultivated soybean accession. These SNPs included 1150 SNPs from a reported SNP genotyping array which selected from random distribution, homologs and reported QTL based on 55 whole-genome resequenced genome and a transcript map (Li et al. 2015). An additional panel of 86 SNPs obtained by Sanger sequencing that around known QTL of growth period traits or distributed among chromosomes. The methods for SNP genotyping of 1150 SNPs and 86 SNPs were based on the Illumina BeadArray platform (Illumina Inc., San Diego, CA, USA) and Sequenom MassARRAY platform (Gabriel et al. 2009). After obtaining genotype data, we filtered 75 SNPs with missing rate ([ 10% of samples), or apparent Fig. 1 The geographical distribution and population structure analysis of 277 soybean accessions. NR, North region; CR, Central region; SR, South region. NR, CR, SR, and mixture correspond to the color of red, yellow3, blue, and gray. a Geographical distribution of 277 soybean accessions. The x-axis represents longitude, and the y-axis represents latitude. One point represents one accession. b Genome-wide principal components analysis (PCA) of 277 soybean accessions. Principal Component 1 (PC1) is plotted against 2 (PC2) and 3 (PC2). c Evolutionary relationships of 277 soybean accessions. The evolutionary history was inferred using the Neighbor-Joining method. The branch length of the tree indicates the evolutionary distances which computed using the p-distance method. d Three subpopulations inferred from structure result. Each color represents one cluster, and the length of the colored segment shows the accession's estimated proportion of membership in that cluster as calculated by STRUCTURE software. (Color figure online) heterozygosity ([ 20%), or low minor allele frequency (\ 5%). The final data set consisted of 1161 SNPs (Table S2).
Population structure analysis TASSEL 5.0 software was used to calculate the Minor Allele Frequency, Proportion Missing, and Proportion Heterozygous for each SNP locus (Bradbury et al. 2007). A subset of 800 SNPs distributed across twenty chromosomes were selected to determine population structure through the software package STRUCTURE 2.1 (Falush et al. 2007) using Bayesian Markov Chain Monte Carlo approach. The admixture and independent allele frequency model were adopted, with cluster number (K) from one to ten. Twenty runs performed with a 100,000 burn-in length and 100,000 iterations for each value of K. The most likely number of subclusters was obtained through the online Structure Harvester (Earl and vonHoldt 2011). The program executed the Evanno method (Evanno et al. 2005) to detect the number of K clusters (with the peak delta K value) that best fit the data.
Genome-wide association study and candidate gene analysis GWAS was conducted using a developed model selection algorithm, FarmCPU (the fixed and random model circulating probability unification) in R package (Liu et al. 2016). The confounding problem between covariates and test SNP is considered in the algorithm by using both a Fixed Effect Model (FEM) and a Random Effect Model (REM). The first three calculated principal components were used as covariates. A sum of 1,161 SNPs, of which MAF [ 5% and missing rate \ 0.1, was used to perform GWAS. We used FarmCPU.P.Threshold function to obtain the recommended p-value threshold. The relationship with the genotypes was broken by the phenotypes permutation. The permutation time was 500. A vector of minimum p-value is produced, of which the 95% quantile value is recommended. For the significantly associated MTAs, the allelic effects were determined by representing phenotype data carrying different alleles as box-plots and were confirmed through the Student's t-test. All the annotated high-confidence genes located in the 300 kb genomic regions centered on the each significant MTA were selected for candidate gene identification. The gene annotation information and the expression profiles were obtained from SoyBase database (https://www.soybase.org/). These genes, of which annotation information were related to growth period traits and/or expressed in young leave, were identified as candidate genes.

Phenotypic evaluation
A total of 277 soybean accessions from worldwide ( Fig. 1a) were phenotyped at three environments, including Sanya (SY), Beijing Spring (BJSp) and Beijing Summer (BJSu). BLUP values among two or three years were used in the following analysis as the variance of year was not significant. The descriptive statistics of V, R, V ? R and R/V, revealed wide variation both within and between the three environments, providing a diverse phenotypic dataset with which to dissect the complex genetic architecture of the target traits. Among these three environments, SY had the smallest variation range from 1.29 (V) to 1.57 (R/V) while BJSp had the largest range from 1.99 (R) to 4.65 (V). For V, the mean value in BJSp was 0.25-and 1.08-fold higher than BJSu and SY, and the differences decreased for R and V ? R as the fold differences were 0.19 and 0.38, 0.19 and 0.53, respectively. In contrast to R/V, the mean value in BJSp was the same as BJSu but 0.21-fold lower than SY. A different trend was observed for V, R, and V ? R where they decreased from BJSp to BJSu, then to SY, while increasing for R/V (Table 1). The ANOVA indicate that the effects of genotype and the genotype and environment interaction were both significant for V, R, V ? R, and R/V at 0.001 level.
The correlation coefficients of the four traits were calculated for the three environments (BJSp, BJSu, and SY corresponding to Fig. 2a-c). Similar correlation patterns were observed for BJSp and BJSu, significant positive correlations between R and V ? R, and R and R/V, significant negative correlations between V and R, V and R/V, and V ? R and R/V, and little between V and V ? R. However, SY differed in that V and R had a significant negative correlation, r = -0.47, and a significant positive correlation between R and V ? R with r = 0.92, while other traits had no significant correlations. The high correlation coefficient between R and V ? R in Sanya suggested that the length of V ? R depended on R. The correlation coefficients between BJSp, BJSu, and SY for four traits were shown in Fig. S1. BJSp and BJSu had high positive correlation coefficients (r = 0.63-0.94) between them for the four traits. But SY showed lower positive coefficients with BJSp and BJSu for V and R (r = 0.38-0.45) and no relationship for V ? R and R/V. In all, the Sanya environment exhibited a large difference from BJSp and BJSu while BJSp and BJSu had similar trends, Each significance level is associated to a symbol: p-values ( Population structure A total of 1161 SNPs remaining after filtering (Table S2) were used in the ensuing analyses. Population structure analysis was conducted with STRUCTURE (Evanno et al. 2005;Kaeuffer et al. 2007) using 800 evenly distributed SNPs on the twenty chromosomes. The results indicated three clusters, NR (Northern Region), CR (Central Region), and SR (Southern Region), concordant with a shared allelebased neighbor-joining tree and PCA (principal components analysis) (Fig. 1b-d). Subpopulation SR was the largest group with 105 accessions, containing 76 accessions from south China; subpopulation NR consisted of 90 accessions, dominated by the accessions of 63 from north and northeast and 16 from outside China; and subpopulation CR had 34 accessions, 25 of which were from North and Huanghuai (Table S1). The derivative clusters of these soybean accessions referred to Q value from population structure analysis consistent with the geographical origin ( Fig. 1; Table S1).
Marker-trait association analysis for V, R, V ? R and R/V We used the fixed and random model through FarmCPU to conduct the GWAS and added the first three PCs in this model to avoid false positives from non-genetic effects associated with population structure (NR, CR, and SR). The GWAS revealed 29 SNP markers representing 33 significant MTAs above the threshold p \ 6.70e-05 for V (10), R (8), V ? R (4) and R/V (10) distributed across fourteen soybean chromosomes except for Gm01, 02, 04, 11, 12, and 19 (Fig. 3, Table 2). Two SNPs were associated with more than one trait and/or identified in two environments, for example Map-6502 associated with V (BJSp, p = 3.80E-07; BJSu, p = 9.05E-07) and R/V (BJSp, p = 5.46E-05). Most, 27 of the 28 SNPs were located in coding regions of annotated genes. The genetic effects of 27 SNPs were also determined: nonsynonymous polymorphisms (17), synonymous polymorphisms (9) and a stop-gain codon ( Table 2). Twelve of the 28 SNPs were close (300 kb, kilobase) or within (Map-6077) of reported flowering-related genes, 12 were in known QTLs related to growth period traits, and six matched both conditions (Table 2).

Reproductive growth period (R)
For R, we found eight significant MTAs on chromosomes Gm03, 05 (2), 07, 08 (2), 10, and 18. Map-6214 was located at 19 kb downstream of Glyma.07g089000, which was homologous to the A. thaliana VRN5 gene (At3g24440) (Suo et al. 2016) involved in both vernalization (Sung et al. 2006) and photoperiod pathways (Greb et al. 2007), and its expression is down-regulated significantly in the V4 stage in the photoperiod pathway in soybean (Zeng et al. 2018). The peak SNP, Map-6285, was associated with R by the most significant association (p = 4.76E-11) in BJSu. Glyma.08g141000, located at 69 kb upstream of the peak SNP Map-6285, is a homolog of the A. thaliana gene AT4G14540, the Nuclear Factor Y subunits NF-YB3, a candidate flowering gene in long-day photoperiod under the influence of E3 in soybean (Haider 2014 For V ? R, we found four significant MTAs on chromosomes Gm08, 15, 18, and 20. Map-6541 was covered by three QTLs related to growth period traits (two of V ? R). Glyma.15g162300 (GmZTL3) was located at 262 kb upstream of Map-6541 and has been proposed to be a photoreceptor and play a role in the   Jung et al., 2012) control of flowering time in soybean (Jung et al. 2012;Zhou et al. 2016).
Glyma.08g148200 was located at 212 kb upstream of Map-6293 (R/V). It was a homolog to the A. thaliana gene AT1G08970, Nuclear Factor Y subunits NF-YC9, and related to seed development and maturation in soybean. Glyma.15g176300, a homolog of ELF4-L3 (EARLY FLOWERING 4-like3), functioned as a signaling intermediate in circadian clock function, and photoperiod perception in A. thaliana (Marcolino-Gomes et al. 2017), was located at 31 kb downstream of Map-7187. The second significant SNP Map-6146 was found in BJSu (p = 2.79E-09) and BJSp (p = 5.09E-06). Although it was covered by 19 growth period QTLs including seven of V ? R, seven of V, and three of Photoperiod insensitivity (Table 2), no candidate genes with growth period traits were found nearby.
Pleiotropy on V, R, V ? R, and R/V in soybean Two SNPs (Map-6502 and -6077) showed pleiotropy on two or more traits of V, R, V ? R, and R/V. Map-6502 was associated with two traits, V (BJSp, p = 3.80E-07; BJSu, p = 9.05E-07) and R/V (BJSp, p = 5.46E-05). Further statistical analysis exhibited the significant difference between allele A and G in the defined cluster of SR for V and difference in the HR and SR for R/V but failed to calculate significance because the small number of statistics of 7 (allele A in HR) and 6 (allele G in SR) in Beijing Spring and 8 (allele A in HR) and 6 (allele G in SR) in Beijing Summer (Fig. S2). Map-6077 was related with R (SY, p = 5.17E-05), while it also appeared significant association with V (BJSu, p = 2.69E-03) and V ? R (SY, p = 9.82E-04) where the p value below the threshold, and further confirmed by significant difference analysis (Fig. S3). The days significantly decreased for V and V ? R while increased for R due to the favorable allele A of Map-6077 in BJSp and BJSu (Fig. S2). Map-6077 was a synonymous polymorphism within Glyma.05g244100, named as GmMFT in soybean, a member of the PEBP family that proposed to be a suppressor of seed germination (Li et al. 2014), also associated with seed oil content (Li et al. 2018;Zhang et al. 2018).

Discussion
Soybean is a short-day crop and the timing of transition from vegetative to reproductive period is regulated by environmental cues. We explored the genetic basis for these developmental phases using an association approach of cultivated germplasm grown at different environments. It had been reported causal genes playing different roles on DPF (before flowering) and DFM (after flowering) in different planting seasons (Yang et al. 1994). Thus, we picked two locations (Beijing and Sanya) with a latitudinal difference of 21.97 degrees and two planting seasons (Spring and Summer) to form three diverse field environments (BJSp, BJSu, and SY). The major soybean loci (E1-8, and J) were reported to be related to both time of flowering and maturity (Watanabe et al. 2012) and were photoperiod-sensitive loci (Cober and Morrison 2010). Photoperiod effects on genes have been shown to be dependent on growth period stages, affecting E1 and E4 during V and R, respectively, and E3 during V ? R (Wang et al. 2008). The four traits, V, R, V ? R, and R/V had relationship with each other. In BJSp and BJSu, a higher correlation was observed for V and V ? R (r = 0.56, 0.55) as compared to R and V ? R (r = 0.37, 0.39), indicating that V had greater impact on V ? R than R. However, in SY, the correlation coefficient of R and V ? R was large, r = 0.92, while V and V ? R was negligible, r = 0.08, indicating that R had greater impact on V ? R than V. The negative correlation between V and R (r = -0.47 * -0.07) indicating that a shorter V might imply a longer R, which may be beneficial to the accumulation of more dry matter in seeds while V, R, and V ? R were all positively related with yield (Orf et al. 1999). SNP Map-6077 for V and R showed opposite effects, indicating an underlying genetic explanation for the negative correlation between V and R. We speculated that the distinct photoperiodic environments contributed to the obvious difference for V, R, R/V and V ? R in Beijing (BJSp and BJSu, long-day (LD) with average day length of 13.5 h in summer) and Sanya (SY, short-day (SD) with average day length of 11.3 h). Take flowering time (V) for example, soybean is a typical SD plant that flowering early under SD but not LD condition. As a complex quantitative trait, flowering time is regulated by a sophisticated regulatory network in response to the change of photoperiodic environments (Lin et al. 2021). At least 12 flowering time loci (E1 to E11 and J) involved in this network have been identified with epistasis relationships by genetic analysis (Kim et al. 2020;Watanabe et al. 2012;Yang et al. 2017).
To better understand the adaptation of soybean to more extreme latitudinal differences, we used GWAS to validate and find genes or loci during the growth period. We selected 277 accessions including 135 from Chinese mini core collections (Qiu et al. 2009(Qiu et al. , 2013 and planted them in three locations with latitudes from 18.25 to 40.22°N. Three SNPs, Map-6077, Map-1899, and Map-6383, from our study were close to previously reported loci. Map-6077 related with V, R, and V ? R in our study was 1.23 kb upstream of Chr05_41854422 associated with V ? R (Fang et al. 2017) and 140.77 kb downstream of ss715599242 (Gm05_38636402, Glyma1.1) (Fang et al. 2017;Mao et al. 2017). Map-1899 and Map-6383, associated with V and R individually, were 27.87, and 1.58 kb downstream of Chr10_44710944 and Chr10_45025153 from Fang et al. (2017). The reminding 26 SNP loci were not found in other reported GWAS studies. Furthermore, we found that six of 33 significant MTAs were identified both in two environments, while most of which (27/33) were identified in different photoperiodic environments, implying the genotype by environment effects. These photoperiodic environment related MTAs will play an important role in the promotion of soybean accessions spreading to different latitudes.
The pleiotropy and trait correlations in this study are different growth stages, different from the general pleiotropy, such as the correlation between seed oil content and protein content. The last two traits (V ? R and R/V) are derived from the first two and their relationships with other traits are determined by the variance and covariance of V and R and the calculation formula. Deciphering the correlations among V, R, V ? R, and R/V could facilitate soybean improvement. A positive correlation could simultaneously increase the trait values, and negative correlation raised the trait value for one while decreasing other traits.
Map-6077 exhibited extreme significant difference between phenotypic variations carrying A and G alleles in all three environments (BJSp, BJSu, and SY) for V, R, and V ? R (Fig. S2). Improved cultivars had a higher proportion of favorable allele A which had higher R, shorter V and V ? R, as compared to landraces with 0.84 to 0.51, implying that the soybean accessions with early flowering (V) and maturity (R) and a comparatively longer reproductive period (R) were selected during improvement (Fig. S2). Map-6077 was a synonymous SNV in GmMFT, expressed at the stage of seed 10DAF and increased at seed 42DAF stage. It has been reported to be a negative regulator of seed germination (Li et al. 2014) and a potential causal candidate gene controlling oil content (Li et al. 2018) in soybean. There was another SNP in GmMFT associated with R8 full maturity as shown by GWAS (Fang et al. 2017). We proposed that GmMFT is a pleiotropic gene and Map-6077 is a pleiotropic SNP associated with V, R, and V ? R. The analysis above implied that Map-6077 and -6502 as the pleiotropic SNP could be used as linkage makers for targeted traits. With the development of molecular marker technology, SNPs could be converted to CAPS/dCAPS or KASP makers easily. For most researchers and breeders, it is feasible to used developed markers to screen large numbers of samples followed the routine procedure. The elite materials carrying with favorable allele will be selected and used for new modern cultivars cultivation.

Conclusions
We identified 28 SNPs related to V, R, V ? R or R/V. Fourteen candidate genes were found, including the E2 gene. Although the number of SNPs (1161) used in this study was not large, the low LD decay rate in soybean made it feasible for us to discover traitassociated SNPs through GWAS. Because of the decrease in costs associated with obtaining highdensity genome-wide marker sets and improvements of statistical methods for GWAS, future challenges lay in innovation of accurate, low-cost, high-throughput, and reproducible phenotyping, which together may launch a new wave of marker-assisted breeding for crops. It is straightforward for crop breeders to provide the pleiotropic SNP allele with desirable effects on two or more correlated traits. In this study, two SNPs (Map-6077 and -6502) have pleiotropic effects the measured traits, but further functional experiments are needed. It was exciting to find that GmMFT, reported to inhibit the seed germination and related with seed oil content, was associated with V, R, and V ? R in this study. Our critical mission in the future is to provide candidate genes and favorable alleles to produce better soybean cultivars more rapidly and efficiently.
National Crop Germplasm Resources of China (No. 2016-004 and2017-004) and the Agricultural Science and Technology Innovation Program (ASTIP) of Chinese Academy of Agricultural Sciences (CAAS-ZDRW202109).
Authors' contributions Y-F.L., Y-H.L., S.J., and L.Q. conceived the study and jointly wrote the paper; Y-F.L. performed the GWAS, genotype data processing, population structure analysis and relative analyses. H.H. and Z.L. carried out the phenotypic evaluation; Y.M. analyzed the phenotypic data; All authors read and approved the final manuscript.
Availability of data and materials The datasets analyzed during the current study are available in this article in Additional Tables S1 and S2.
Code availability Not applicable.