Whole-Genome Sequencing on 220 Alfalfa (Medicago sativa L.) Accessions Identied Loci Associated with Fall Dormancy

Fall dormancy (FD) is one of the most important traits of alfalfa (Medicago sativa) for cultivar selection to overcome winter damage. Regrowth plant height following autumn clipping is an indirect way to evaluate FD. Although transcriptomics, proteomics analysis, and QTL mapping have revealed some important genes correlated with FD, the genetic architecture of this trait is still unclear. There are no applicable genes or markers for selection, which hinders progress in the genetic research and molecular breeding for the trait. We conducted whole-genome sequencing (WGS) on 220 alfalfa accessions at 10x depth. Among the 875,023 SNPs, seven of them were associated with FD using genome-wide association study (GWAS). One SNP located on chromosome 6 is in linkage disequilibrium with dehydration-responsive element-binding protein 1C (DREB1C). Furthermore, seven DREB genes are clustered in this region, one of which has previously been shown to enhance freezing tolerance in the model plant Medicago truncatula. The candidate genes uncovered by our research will benet the transgenic and CRISPR-Cas9 research of FD in alfalfa. This gene will also be useful for marker development and assisted selection of FD for alfalfa.


Introduction
Fall dormancy (FD) is the adaptive growth characteristic of alfalfa in autumn and can be evaluated by the regrowth plant height at 25-30 days after nal cutting (Teuber et al. 1998).
The taller alfalfa grows in the fall, the less dormant it is. The FD level of alfalfa varieties can be classi ed from 1 to 11, where 1 represents most dormant and 11 represents least dormant (Teuber et al. 1998). Dormancy is induced by falling temperature and shortening photoperiod. Dormancy can improve plant survival in winter and increase freezing tolerance. FD alfalfa cultivars have better winter survival and hardiness than non-dormant alfalfa cultivars (Barnes et al. 1979;Smith 1961;STOUT 1985;STOUT and HALL 1989). FD level can be considered one of the indices for alfalfa cultivar selection in speci c regions. There is a trade-off between FD and yield. High FD could increase overwintering ability, but decrease the potential yield (Avci et al. 2018;STOUT and HALL 1989). Breaking down the linkage between FD and yield from genetic level could bene t for both yield and winter hardiness. Studying the genetic architecture of FD could be used to improve those high yield alfalfa cultivars with bad cold tolerance Munjal et al. 2018).
The genetic mechanism of FD in alfalfa has been studied using genomics, transcriptomics, and proteomics. QTL mapping between different FD level alfalfa germplasms has been conducted in several studies. Seventy-one QTL related to FD traits have been identi ed, and 15 of them can be detected in more than one environment . Another study identi ed 45 signi cantly associated QTL for FD (Adhikari et al. 2018). Both of these groups found that FD and winter hardiness have independent Page 4/23 inheritance and could be improved independently. A gene expression-related experiment showed that a cold acclimation-responsive gene, RootCAR1, was positively associated with alfalfa winter survival (Cunningham et al. 2001). More FD-related genes have been found using transcriptomics technology.
Forty-four genes were identi ed by comparison of the transcriptomes from non-FD versus FD cultivars' leaves. The transcription of the genes IAA-amino acid hydrolase ILR1-like 1, abscisic acid receptor PYL8, and monogalactosyldiacylglycerol synthase-3 were suggested to be involved in regulating FD in alfalfa (Du et al. 2017). Eight signi cantly differentially expressed transcription factors related to CBF and ABRE-BFs were identi ed . Furthermore, FD-related proteins were analyzed using proteomics and metabolomics. A total of 90 proteins were different between FD and non-FD alfalfa. One of these was thiazole biosynthetic enzyme (MsThi), which is important for alfalfa growth (Du et al. 2018). Ra nose family oligosaccharide (RFO) metabolism was shown to be involved in short photoperiod-induced freezing tolerance in dormant alfalfa cultivars (Bertrand et al. 2017).
Genome-wide association studies (GWAS) are widely used to locate precise SNPs associated with phenotypes using historical recombination information. With the development of the sequencing method and genetic analysis method of GWAS, the candidate SNP markers for important quantitative traits have been identi ed across different populations (Biazzi et al. 2017;Wang et al. 2016;Yu et al. 2016). Based on 198 accessions, Longxi-Yu reported a set of GWAS results. It includes 19 SNPs associated with drought resistance (Zhang et al. 2015). 36 SNPs associated with salt tolerance (Yu et al. 2016), 42 associated with plant growth and forage production (Liu and Yu 2017), and 131 markers associated with 26 forage quality traits (Lin et al. 2020). Using the 336 genotypes, Wangzan conducted GWAS for several key alfalfa traits, including ber-related traits and digestibility (Wang et al. 2016), crude protein and mineral concentrations (Jia et al. 2017), and nine biomass-related traits . Another study reported several SNPs associated with forage quality using half-sib progeny developed from three cultivars (Biazzi et al. 2017).
Furthermore, several genes were revealed to be related to some important traits of alfalfa. For example, several stress-responsive genes associated with yield under water de cit were identi ed, including leucinerich repeat receptor-like kinase, B3 DNA-binding domain protein, translation initiation factor IF2, and phospholipase-like protein (Liu and Yu 2017). Two markers linked to NIR-NBS-LRR genes were signi cantly associated with Verticillium wilt resistance . A cell wall biosynthesis gene associated with several cell wall-related traits that affect alfalfa nutritive value has been identi ed (Sakiroglu and Brummer 2017). MsACR11 has an effect on plant height, which could signi cantly increase the height of transgenic arabidopsis .
Although many GWAS have been performed in alfalfa, few of these have focused on FD. Furthermore, with the lack of reference genome in alfalfa, most previous studies have used the Medicago truncatula genome as a reference (Sakiroglu and Brummer 2017;Yu et al. 2017). With the technological development of the genome assembly, the genome of the cultivar "Zhongmu No.1'' has been released. A owering locus T homolog gene, MsFTa2, may be associated with fall dormancy and salt resistance and was identi ed using the GWAS method and the new reference genome for alfalfa (Shen et al. 2020).
In this study, we evaluated FD (we de ne FD as regrowth plant height at 30 days following autumn clipping) in 220 accessions and conducted WGS with the mean depth of 10x sequencing for every accession. A total of 875,023 SNPs were detected in the alfalfa reference genome and used for conducting GWAS. The objectives of this experiment were (1) evaluate the linkage disequilibrium (LD) and population structure of alfalfa, (2) identify SNPs associated with FD, and (3) infer the candidate genes controlling FD.

Plant materials and phenotyping
The plant materials used in this study consisted of 220 accessions collected from all over the world.  (Table S1).
In October 2017, the seeds of 220 accessions were planted in the greenhouse of the Chinese Academy of Agricultural Sciences (CAAS) in Langfang, Hebei Province, China (39.59°N, 116.59° E). The area has a continental monsoon climate with an average temperature of 11.9 °C/year. The coldest month is January with an average temperature of -4.7 °C and the hottest month is July with an average temperature of 26.2°C . The annual rainfall is 554.9 mm. The soil type is loam soil with the PH value of 7.37 and soil organic matter content is 1.69%. Individuals were transplanted to the eld of CAAS in April 2018. A randomized complete block design with three replications was used for the experimental design. Every accession included 5 individuals placed 30 cm apart in a single row for one replication. The accession spacing was 65 cm between rows and columns. To keep all individuals uniform, they were clipped to a height of 5 cm after transplant. To keep same evaluation standard with eld management of alfalfa, all individuals were clipped at the early owering stage (when 10% of plants begin owering). Plants were clipped three times in 2018 and four times in 2019. Fall dormancy was collected using plant regrowth one month after nal clipping (31 Oct 2018 and 29 Oct 2019). The tallest stem height of one individual was considered FD. The correlation information between FD of the mean value of different individuals (3 replication x 5 individual=15 individuals) and FD of sequencing individual was analyzed using ggplot2 (Wickham 2011).
The variance of fall dormancy was analyzed using a generalized linear model (GLM) as follows: height = accession + ind (accession) + year + accession * year + replicates + error All factors were random, and the GLM was performed using PROC GLM (SAS Institute, 2010). The broadsense heritability (H 2 ) was computed as follows: where σ 2 g is the variance of accession; σ 2 gy is the interaction variance of accession and year; σ 2 is the residual variance. The items y and i in the equation refer to the number of years and individuals, respectively.

Sequencing and SNP calling
Phenotyping was conducted on 15 individuals per accession and one individual with the typical phenotype was selected for sequencing. Young leaves were selected at the early regrowth stage (9 July 2019  (Bolger et al. 2014). Paired-end sequencing reads were mapped to the alfalfa reference genome (haploid genome with 8 chromosomes (Long et al. 2021)) with BWA-MEM using default parameters (Li 2013). SAMtools were used to translate SAM le to BAM le and sort BAM les using default parameters (Li et al. 2009).
Picard Tools was used to mark duplicate reads (http://broadinstitute.github.io/picard/), and Genome Analysis ToolKit was used to correct indels which can be mistaken for SNPs (Van der Auwera et al. 2013).
SAMtools mpileup and VarScan were used to detect SNPs (Koboldt et al. 2012). Furthermore, SNP data were ltered using VCFtools (Danecek et al. 2011) with a missing rate of less than 10%, a minor allele frequency of more than 0.05, and mean read depth greater than 20. The SNP distribution on the chromosome was generated using the R code from a GWAS study in beef cattle (Zhou et al. 2019).

Population structure and linkage disequilibrium
Population structure was calculated using the TASSEL 5 software with principal component analyses (PCA) (Bradbury et al. 2007). The 3D PCA results combined with geography information were plotted using R package scatterplot3d (Ligges and Mächler 2002). The population structure subgroup was estimated using admixture and the gure was generated using R package pophelper (Francis et al, 2017). LD information was calculated using the software PopLDdecay with default parameters (Zhang et al.

2019)
. The VCF le containing information on all 875,023 SNP markers was imported to PopLDdecay. The LD results among all accessions were used to estimate the LD of alfalfa. The distance versus mean R 2 within 30 kb of LD was plotted using R package ggplot2.
The Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK) method was used to carry out the GWAS (Huang et al. 2019) for two kinds of FD phenotype (sequencing individual and mean of 15 individuals). GWAS was conducted using the BLINK C version software. The Bonferroni multiple test correction was used to determine the signi cant SNP threshold (P=0.01/875,023= 1.14x10-8). The BLINK method uses iterations to select a set of markers associated with a trait. These associated markers are tted as a covariate for testing the remaining markers. This method better controls false positives than the kinship approach. The real data and simulated data results showed that BLINK has higher statistical power than other methods, such as MLM, FarmCPU, et al. (Huang et al. 2019). The Manhattan and Q-Q plots of GWAS results were created using the R package qqman (Turner et al. 2014).

Phenotypic variance
To identify the genetic difference in FD for different individual among the same accession, the variance of probability level (p=0.12), indicating that the difference in FD between individuals is not big enough and the mean value can be used to represent the FD of one accession (Table 1). The broad-sense heritability was 69.2% for FD.

Genotype calling and allele frequencies
A total of 32.2 million SNPs were detected using the BWA-SAMTools-VarScan pipeline. After ltering the SNPs with a missing value >10%, minimum mean read depth >20, and minor allele frequency (MAF) >0.05, there were 875,023 SNPs that passed the threshold and were con rmed for further analysis. Among these 875,023 SNPs, the average SNP density in the genome was one SNP per 26.95 bp (among the 814,517 SNP with SNP interval less than 300 bp) (Figure 2a, b). There were 60,506 SNP with an interval larger than 300bp, and the average SNP interval for these was 11128.66 bp (Figure 2b). The distribution of the MAF value showed that the value equal to 0.06 has the highest density ( Figure 2c, detail data not shown). The SNP number range is similar for MAF values from 0.2 to 0.5 (Figure 2c). These results demonstrate that the markers can be assigned to unbiased chromosomes and distinct positions.

Population structure
The accessions used in our research originated from different countries, especially the USDA-GRIN collection, representing wide coverage of genetic diversity on alfalfa. To analyze population structure, the genome-wide SNP data generated by WGS was used for principal component analysis (PCA) and population structure analysis. The PCA scatter-plots showed a weak population structure for the 220 accessions consisting of different geographical origins with no clear subpopulation ( Figure 3). In the scatterplot of PC1-3, most accessions are mixed. But some accessions coming from China can be separated from those with other origins (red color). Similarly, some accessions from South America can be separated from the others (cyan color). The accessions from Europe, North America, and other Asia countries have wide and mixed distribution. The geographic distribution of accessions interacted with PC1 showed similar results and the accessions coming from China and South America were different from others ( Figure S2). The rst three PCs explained 2.6%, 1.8%, and 1.2% genetic variance separately; the total genetic variance explained by the rst three PCs is 5.6%. Population structure analyses were conducted by admixture. The best subgroup was estimated by cross-validation (CV) error with a K value range from 2 to 10. The CV results showed K=3 has the smallest error ( Figure S3 and Figure S4)

Linkage disequilibrium
To estimate the con dence interval of GWAS results, LD was analyzed using a total of 875,023 markers.
The r 2 of LD across all chromosomes was used to estimate the LD of alfalfa. A rapid drop in r 2 was observed with the increase in physical distance. Over all the accessions, the value of r 2 decreased quickly within 2 kb physical distance, then decreased slowly afterward (Figure 2d). The value of r 2 decreased to 0.1 after 18 kb across all chromosomes. The results showed that extensive recombination took place during alfalfa evolution and distribution.

GWAS and candidate genes for FD
To identify SNP markers associated with FD and compare the effect of different kind of phenotype, genotypic and phenotypic data were analyzed using the Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) method in the BLINKC version software. SNP effect was calculate using the BLINK R version Prediction function. Both kind of phenotype (mean of 15 individuals and sequencing individual) were used to conduct GWAS analysis.
The quantile-quantile (Q-Q) plot results of marker-trait associations for FD were illustrated using observed versus expected p-values ( Figure 5). Any deviation from the expected red line implies SNP association with FD. In the present study, a total of four SNPs passed the 1% threshold after a Bonferroni correction (p<1.14x10-8) and were associated with FD of mean of 15 individuals and three SNPs associated with FD of sequencing individual ( Figure 5, Table 2). Among them, two markers, chr2__23380918 and chr2__80663885, were located on chromosome 2. In addition, 2 markers (chr6__740339 and SNP chr6__69737395) were located on chromosome 6. Another three markers associated with FD of sequencing individual were located on chromosome 5 (chr5__36702273), 7 (chr7__49431088) and 2 (chr2__58174960). The FD difference between different genotypes can be observed in the seven SNP boxplots. There is no homozygous genotype CC for SNP chr6_740339 and there is no homozygous genotype TT for SNP chr5__36702273 and no AA for SNP chr2__58174960 (Figure 6). These results may suggest that CC, TT or AA genotype is the deleterious variance for alfalfa.
To identify potential candidate genes linked to SNP associated with FD, candidate genes were screened based on the up-and downstream 18 kb LD interval information. Of the seven signi cantly associated SNP, four of them are linked to known genes in the alfalfa genome (Table 2). There is one candidate gene (Msa0867660) with the annotated function of signal anchor protein connected to chr6__740339. Only one gene (Msa0892030), dehydration responsive element binding protein 1C (DREB1C), was connected with the SNP chr6__69737395 in the 36 kb genomic interval based on LD information ( Figure S6). We further checked the genes near SNP chr6__69737395 with a larger scope (500 kb interval up-and downstream).
Another six DREB genes were found upstream of this SNP (chr6:69234712-69716598). One of the six genes (Msa0891950) has been validated to be associated with freezing tolerance in Medicago truncatula ( Figure S7) (Chen et al. 2010). The transgenic Medicago truncatula has lower plant height than normal plants and has better tolerance for freezing. We further checked all DREB genes on chromosome 6, and found that there were nine in total (Table S2). Another two DREB genes were Msa0893350 and Msa0893360 (chr6:76949157-76957350). There is only one cluster region with seven DREB genes on chromosome 6, and this region corresponds to our signi cant GWAS signal. Another two SNP have more than one candidate gene. The SNP chr5__36702273 is connected with one hypothetical protein gene (Msa0729370) and transcription factor FAMA isoform X2 gene (Msa0729380). The SNP chr7__49431088 is connected with two transcription factor bHLH155 genes (Msa1022700 and Msa1022710).

Discussion
The FD variance of alfalfa Because of the outcrossing and self-incompatibility feature of alfalfa, the genotype between different individuals is different. Most studies collect phenotypes using cloned plants Liu and Yu 2017;Sakiroglu and Brummer 2017;Wang et al. 2016). For one study, seed plants of the same accession were used for phenotype collection (Yu et al. 2016). Compared with cloned plants, the advantages of using seed plants are that they are easier to establish and have better accession representation.
Furthermore, some experiments and assessment of phenotypes cause damage to the plants, such as winter survival, disease resistance, and root-related phenotypes. It is easier to rebuild the population using seed than by cloning plants. We checked the FD variance of individuals based on our experiment. To check the in uence of individuals' genetic differences for FD among the same accessions, we nested individuals in accession. Our results showed that the variance of individuals is not signi cant, while the variance of accessions is signi cant (Table 1). That means we could use the mean value to serve as a representative phenotype of one accession. Even though there is genetic variance among different individuals, their mean value is the whole picture of one accession. Furthermore, the correlation between sequencing individual plants and the mean value of different plants is high enough. The correlation R 2 value is 0.46 and 0.55 in 2018 and 2019 (Figure 1). While the correlation R 2 between 2018 and 2019 is 0.41 and 0.27 for mean FD and sequencing individual FD ( Figure S1). It showed that environmental effect has bigger in uence for a single individual. The mean value of multiple individuals is more stable among the different environments. In order to test the genetic variance of these two kinds of FD phenotype (sequencing individual and mean of 15 individuals). We tested the GWAS signal for both kinds of phenotypes and found associated SNP ( Figure 5, Table S3). However, there is no overlap SNP for them.
This phenomenon has been observed in previous studies (Heller and Yekutieli 2014;Lin 2019;Rietveld et al. 2014). We think FD maybe controlled by several different genes. Our two kinds of phenotypes corresponding to two different kinds of genes. Combine both associated SNP could uncover a better understanding of FD (Rietveld et al. 2014). Combining ANOVA and correlation results of FD and GWAS results, we think the mean of FD could be used for genetic analysis of alfalfa.
Our experiment also showed that FD evaluation in one location for three years is very hard. Because the accessions were collected from all over the world. The fall dormancy level is very different. For the accessions with low dormancy ability, their winter hardiness is weak and winter survival is a problem. It is hard to control the balance between sample diversity and local adaption. If we want to collect more trails data, these accessions not adapted local environment will die and our phenotype will be biased. That's why we only collected two years' data. Our purpose is to gain more genetic variance of alfalfa. However, more test trails with different climatic environments may be a possible way to solve this problem.
The genetic diversity of alfalfa The improvement of alfalfa is mainly based on outcrossing, and the genetic exchange of alfalfa always exists in alfalfa-growing regions (Li and Brummer 2012). To this end, nine distinct germplasms with global backgrounds were introduced in North America to cultivate new varieties (Barnes 1977). It is more di cult to distinguish alfalfa accessions based on geographic origin than for self-pollinating plants, such as soybean (Zhou et al. 2015) and rice (Huang et al. 2012). Also, diploid alfalfa (M. sativa subsp. caerulea and M. sativa subsp. falcata) can be clustered based on subspecies and geography (Sakiroglu and Brummer 2017;Şakiroğlu et al. 2010); the genetic diversity of tetraploid alfalfa (cultivated alfalfa) is more complicated. Based on our PCA results and population structure results, the genetic differentiation among different origins is not very high, except the ones from China and South America.
The introduction of alfalfa to South America took place around the sixteenth century, while it was introduced to China 2000 years ago (Annicchiarico et al. 2015). Differentiation has occurred along with cultivation in the local region. This phenomenon has also been identi ed in previous studies (Qiang et al. 2015;Shen et al. 2020;. The accessions coming from Europe and other Asia regions (regions except for China) are mixed and have a wide range. These accessions have the highest genetic diversity among all accessions with different geography. This result may be the reason that alfalfa originated from the Caucasus region: northeastern Turkey, Turkmenistan, and northwestern Iran (Michaud et al. 1988). This theory can also be strengthened by the knowledge that accession genetic diversity from North America and Africa is relatively low compared to European accessions. The cultivated alfalfa was exported to North America and Africa only around 200~300 years ago (Prosperi et al. 2014). This relatively short cultivation period also partially explains the low genetic diversity.

The linkage disequilibrium of alfalfa
Our results also showed that the LD distance is very short (R 2 reduced to 0.1 after 18 kb). This is different from previous studies (Sakiroglu and Brummer 2017) and (Li et al. 2014). One major reason may be because we used a large number of SNPs (around 58 times more than the previous report). With increased marker density, it is easier to estimate historical recombination events. Another study showed that LD can be decreased rapidly to less than 1 kb in alfalfa using candidate gene-related SNP (Herrmann et al. 2010). The rapid LD decrease is likely due to the outcrossing nature of alfalfa and the diverse germplasm we collected.
The candidate genes of FD A total of seven SNPs were determined to be associated with FD based on the 1% Bonferroni test (Table 2 and Figure 5). Four out of seven SNPs can be mapped to candidate genes based on LD information, and one SNP is connected with a DREB1C gene. DREB1C has been shown to be associated with freezing tolerance. When the gene CBF2/DREB1C was disrupted, mutant Arabidopsis had a higher capacity to tolerate freezing than WT plants and were more tolerant to dehydration and salt stress (Novillo et al. 2004). Further validation has been done in model plant Medicago truncatula. Transgenic Medicago truncatula exhibited suppressed shoot growth and enhanced freezing tolerance (Chen et al. 2010). The transcriptomic analyses of different dormancy alfalfa cultivars showed that DREB1C was one of the most signi cantly differentially expressed genes and was upregulated in a fall dormant alfalfa cultivar . Another transcriptome sequencing analysis revealed that nine DREB unigenes responded to cold and/or freezing stresses (Shu et al. 2017). Even though FD and winter survival may be controlled by different QTL (Adhikari et al. 2018;Brummer et al. 2000;Li et al. 2015), FD serves as an indicator to evaluate freezing tolerance in the broad sense is possible. Regrowth plant height following autumn clipping can be considered a signal of fall dormancy and freezing tolerance. If an alfalfa cultivar has a higher regrowth plant height, it will also have less freezing tolerance (Sheaffer et al. 1992). We further con rmed that DREB1C contributes to FD using the GWAS method. And our gene expression data showed that the DREB1C gene can be induced by cold treatment. But the difference between high and low fall dormancy level alfalfa is not signi cant ( Figure S8). Maybe our sample size (10 samples) used for expression analysis is not enough. However, combining several previous results from transgenic studies, transcriptomics, and GWAS showed that the DREB1C gene is very important for alfalfa freezing tolerance.
Validation of the DREB1C gene effects in alfalfa using gene-modi cation tools such as CRISPR/CAS9  or a transgenic method (Chao et al. 2014) will give us a better understanding of the function of DREB1C.
In addition to the DREB1C gene, there should be other genes responsible for FD, because FD is a quantitative trait with continuous variance ( Figure S9). We identi ed six genes associate with FD and one of which has a clear relationship with FD. Other genes identi ed by sequencing individual FD such as transcription factor FAMA isoform X2 and transcription factor bHLH155 may be also important for FD.
These results tell us maybe there are different genes responsible for FD.
Moreover, there are some candidate genes responsible for FD that have been identi ed using omics technologies, such as ABF4 from RNA-seq results  and MsThi from the iTRAQ-based method (Du et al. 2018). Cross-validation from multiple omics technologies could help us develop a more accurate understanding of FD. Our results showed that GWAS results could agree with RNA-seq data.
Combining GWAS methods with other omics technologies on a bigger scale could be a possible way to uncover more details regarding FD for alfalfa (Scossa et al. 2021).
In conclusion, we conducted a GWAS analysis of FD in alfalfa using whole-genome sequencing on seeding plants. Weak population structure was found among different accessions belonging to different geographical origins. Seven SNPs were shown to be associated with FD. One candidate gene, DREB1C, was responsible for FD. This research will help us achieve a better understanding of the relationship between genotype and the FD phenotype. The results could serve as basic information for QTL mapping or candidate gene cloning to understand the mechanism of FD in alfalfa. Furthermore, analyzing the genetic basis of FD will help to develop cultivars with different FD.

Competing interests
The authors declare that they have no con icts of interest.

Availability of data and materials
All RAD raw sequence data were upload to the National Genomics Data Center (NGDC,   Properties of single nucleotide polymorphisms (SNPs). In total, 220 alfalfa accessions were genotyped by WGS; 875,023 SNPs passed lters and quality control. Marker distributions are displayed as a heatmap on 8 chromosomes by minor allele frequency (MAF) (a). Marker density is displayed by histogram according to the interval of adjacent SNPs (b). MAF distribution is shown in a histogram (c). LD decay is shown according to the red trend line (d).

Figure 3
Population structure from the principal component analysis. A total of 875,023 SNPs and 220 alfalfa accessions were used to perform the principal component analysis. Population structure is shown as a 3D plot (D) of the rst three principal components (PC) with colored points that de ne the seven groups.
There are 13, 35, 73, 32, 4, 38, and 25 accessions in the groups corresponding to Africa, China, Europe, North America, other (not belonging to any other group), other Asia (except China), and South America, respectively.

Figure 4
Population structure results from K=2 to K=4. K values were labeled on the right side of the gure. The individual order of subgroups was ordered by K=2. The geographical origins were shown at the bottom of the gure. Different origins were separated by a white dish line.

Figure 5
Manhattan and Quantile-Quantile (Q-Q) plot of FD. The genome-wide association study was performed by BLINK C version software, with a signi cant p-value threshold set at P=0.01/875,023= 1.14x10-8(red line).

Page 23/23
We identi ed four signi cantly associated SNPs for FD of mean of 15 individuals, which are shown in the Manhattan plot (a). Q-Q plots are displayed on the right panel (b). There are three associated SNP for FD of sequencing individual (c and d).

Figure 6
Distribution of FD among the genotypes of the associated SNPs. The SNP associated with FD of mean of 15 individuals were shown in upper half gure (a, b, c and d). The SNP associated with FD of sequencing individual were shown in lower half gure (e, f and g). There is only one individual with TT genotype for SNP chr2_23380918 and no individual with genotype of CC on SNP chr6_740339. There is no homozygous genotype AA for SNP chr5__36702273 and no GG for SNP chr2__58174960.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.