Plant materials and soil sampling
In spring 2016, 14 alfalfa-grass mixed stands with a minimum of 25 years of grazing history were selected across Brown, Dark Brown, Black and Grey Wooded soil zones of Saskatchewan, Canada (Table 7; Fig. 11). These 14 grazing sites were privately owned by individual ranchers and permissions for collection of alfalfa plants were obtained prior to the study. The collection of plant materials complied with relevant institutional guidelines. The Brown soil zone is the most arid region characterized by annual precipitation in the range of 275–325 mm. It has the highest evapotranspiration among the four soil zones. The Dark Brown soil zone is characterized by annual precipitation in the range of 325–375 mm. It has a moderate evapotranspiration. The Black soil zone is characterized by annual precipitation in the range of 400–475 mm. The Grey Wooded soil zone is characterized by annual precipitation ranging from 300–500 mm with lower evapotranspiration than in the Black soil zone40. At each site, 30 mature alfalfa plants were dug and stored in a plastic container with 10 cm water. The distance between any two individual plants was 30 m to minimize sampling similar clones. Meanwhile, 30 soil samples at a depth of 30 cm were collected near the sampled plants at each site and then bulked by site for analyses of nitrogen (NO3-N), phosphorus, potassium, sulfur (SO4-S), and soil pH and electrical conductivity (E.C.) at ALS Laboratory Group (Saskatoon, SK).
Table 7
Soil zone and rural municipality of 14 long-term grazing sites in Saskatchewan Canada used for alfalfa plant selection.
No. | Population Name | Soil Zone | Rural Municipality (RM) |
1 | Crooked River | Grey Wooded | RM 426 |
2 | Shellbrook | RM 493 |
3 | Erwood | RM 394 |
4 | MacDowall | Black | RM 463 |
5 | Duck Lake | RM 403 |
6 | Rockhaven | RM 439 |
7 | Arcola | RM 64 |
8 | Dalmeny | Dark Brown | RM 344 |
9 | Pike Lake | RM 345 |
10 | Fillmore | RM 96 |
11 | Ceylon | Brown | RM 39 |
12 | Gull Lake | RM 139 |
13 | Val Marie | RM 17 |
14 | Moose Jaw | RM 45 |
Reference cultivars for genetic diversity study
Ten random alfalfa plants from each of the 14 alfalfa populations were selected for the genetic diversity study. To identify population structure of the 14 alfalfa populations to the Canadian commercial alfalfa cultivars released in Western Canada from 1926 to 1980, 11 commercial alfalfa cultivars (Table 1) were also included in the study. The seeds of the 11 alfalfa cultivars were germinated on top of a double layer of filter paper (VWR 413) in 9 cm plastic Petri-dishes. Then, 10 random germinated seeds per cultivar were transplanted into a 1/8-gallon pot (11 × 0.9 cm, Sun Gro Horticulture) filled with propagation soil mix containing 70–80% sphagnum peat moss, vermiculite and dolomite (SS#3, Sun Gro Horticulture Ltd.) at the College of Agriculture and Bioresources greenhouse at the University of Saskatchewan. The greenhouse has an 18-h photoperiod supplemented with high pressure sodium halogen lamps to a total of 490–550 µMs− 1m− 2 PAR. The air temperature was 26 ˚C and 20 ˚C, for day/night, respectively. Relative humidity was maintained at 67%.
Field Experimental Design
Four hundred twenty alfalfa plants collected from the 14 long-term grazing sites were propagated by stem cuttings. Then, the cloned plants were transplanted to a 1/4-gallon pot (13 × 12 cm, Sun Gro Horticulture) with propagation soil mix containing 70–80% sphagnum peat moss, vermiculite and dolomite (SS#3, Sun Gro Horticulture Ltd.) at the College of Agriculture and Bio-resources greenhouse in the University of Saskatchewan. A field spaced nursery was established on June 6, 2017, at the Agriculture and Agri-Food Canada (AAFC) Saskatoon Research Farm, Saskatoon, SK. The experimental design was a nested randomized complete block design (RCBD) with two blocks. There were 840 plots in the nursery (14 alfalfa populations × 30 genotypes population− 1 × 2 blocks). Spacing between rows and plants within a row was 1 m. Weed control was done using a triple rototiller (KS-190, Tram sales Ltd, AB, Canada) in early spring in addition to hand weeding around individual plants. The soil was a Sutherland clay loam (Dark Brown Chernozem, Typic Haploloroll)93. Weather data for Saskatoon from 2018 to 2020 was obtained from Environment Canada (https://climate.weather.gc.ca). The average monthly air temperature and precipitation at the study site are shown in Fig. 12. Growing season (May to September) air temperature during the three study years was near the long-term mean except for a higher temperature in May 2018, and lower temperatures in September 2018, May and August 2019. Rainfall was lower than normal during June and July in 2018. Almost no rainfall was recorded in May 2019. There was a higher than normal rainfall in June 2020.
Agro-morphological Trait Measurement
Six agro-morphological traits were measured on individual genotypes for three consecutive summers from 2018 to 2020, in addition to nutritive value analysis in 2018 and 2019. Measurement of the agro-morphological traits is described in Table 8. The flowering date was expressed as growing degree days (GDDs) using a base temperature of 5 \(℃\)94.
Table 8
Measurement of seven agro-morphological traits of alfalfa in a spaced nursery.
Traits | Measurement | Year |
Spring vigor (SV) | 1 = poor, 5 = good, visually scored on spring growth, plant size and leafiness | 2018–2020 |
Days to flower (GDD) | Recorded as the date first flower emerged on each genotype | 2018–2020 |
Plant height (PH) | Measured from the soil surface to the tip of the stem by stretching the stems upwards for each genotype | 2018–2020 |
Forage dry matter yield (DMY) | Each genotype harvested was dried for 48h at 60 \(℃\) in a forced air oven and weighed in grams | 2018–2020 |
Regrowth yield (RY) | Each genotype harvested was dried for 48h at 60 \(℃\) in a forced air oven and weighed in grams | 2018–2020 |
Fall plant height (FPH) | Measured from the soil surface to the tip of the stem by stretching the stems upwards for each genotype | 2018–2020 |
Forage Nutritive Value
For forage nutritive value determinations, plants were dried at 60 ˚C for 48 h in a forced-air oven. The dried plant samples were ground in a Wiley mill (Thomas-Wiley, Philadelphia, PA, USA) and then passed through a 1-mm mesh screen (Cyclone Mill, UDY Mfg., Fort Collins, Colorado, USA). Ground samples were stored in filter bags (Nasco Whirl-Pak, USA) prior to crude protein (CP), neutral detergent fiber (NDF) and acid detergent fiber (ADF) determinations. The values of three nutritive value traits were determined using a FOSS XDS rapid content analyzer (Foss, Denmark).
Phenotypic Data Analysis
For each phenotypic trait, a linear mixed model was used to perform analysis of variance (ANOVA) using JMP (JMP 15.2.1. SAS Institute Inc., Cary, NC). Soil zone, year, population (soil zone) and genotype (population, soil zone), soil zone × year, were treated as fixed effects in the model. The block was treated as a random effect. For each of six agro-morphological and three nutritive values traits, if the ANOVA indicated a significant difference at p ≤ 0.05 level, means were separated using the studentized Tukey multi-treatment method. Degrees of freedom were calculated using Satterthwaite’s method.
Genotype-environment associations (GEA) were determined using the redundancy analysis (RDA) procedure using the R package “vegan 2.5-7 version” with the rda function 95,96. The long-term (1986–2015) lowest and highest air temperatures and total precipitation (from May to September) for GEA analysis were obtained from Canada Weather Stats (available at https://www.weatherstats.ca/) and Environment Canada (available at https://climate.weather.gc.ca/). The soil pH, nitrogen (NO3-N), phosphorus (P), potassium (K), sulfur (SO4-S), and electrical conductivity (E.C.) from each long-term grazing site were also added into the GEA 40. The electrical conductivity was removed as it was highly correlated (׀r׀ > .7) with soil pH, because the RDA method for GEA is a regression-based method subject to problems when using highly correlated predictors as described by Dormann, et al. 97. Two alfalfa populations from Dalmeny and Pike Lake were excluded from the GEA due to no soil test information.
Individual genetic data were considered as response matrix Y, and a set of environmental predictors were used as explanatory matrix X. Genotypes were coded using the values 0, 1 or 2 that correspond to homozygote for the most frequent alleles, and heterozygote and homozygote for the less frequent alleles. RDA aims at constructing a matrix Y׳ of fitted genetic values estimated from the regression of each locus by the environmental predictors and at performing principal component analysis on the matrix Y׳ 98. The number of axes used (K) is determined by looking at the amount of information retained on the different axes of the RDA. A Mahalanobis distance D is then computed for each locus to identify loci showing extreme D values compared to the rest of the SNPs. A Mahalanobis distance is a multidimensional generalization of the idea of measuring how many standard deviations is a point from an average point. Computation of the Mahalanobis distance uses the covRob function of r package “ROBUST” 99. Mahalanobis distances are distributed as a chi-squared distribution with K degrees of freedom after correcting with the genomic inflation factor 100. Inflation factors are constant values that are used to rescale chi-square statistics in order to limit inflation due to diverse confounding factors 101. Inflation factors are constant values that are used to rescale chi-square statistics to limit inflation due to diverse confounding factors 101. Multicollinearity between environmental predictors was assessed using the variance inflation (VIF) and since all predictors showed VIF < 5 none were excluded 102. We then adjusted the resulting p-values for the false discovery rate (FDR) by computing q-values with the “qvalue” r package 103. A locus is considered as an outlier if its q-value is < 0.01.
Dna Extraction And Library Establishment
Fresh young leaves of 10 random alfalfa plants from the 14 alfalfa populations (11 plants for Shellbrook and Moose Jaw) and the 11 commercial alfalfa cultivars were collected into 1.5 ml centrifuge tubes and flash frozen in liquid nitrogen. Then the tubes were immediately stored in a 80 \(\text{℃}\) freezer before the leaf samples were dried in a freeze-drier (Freezone 6, Labconco, USA) at -52 ˚C and < 0.09 Pa for 72 hours. One hundred milligram of dried leaf samples were ground individually with a sterilized plastic stick. The genomic DNA was extracted using DNeasy Plant Mini Kit (Qiagen, Toronto, ON, Canada). The Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen, Carisbad, CA, USA) was used to quantify the concentration of the DNA samples. The DNA concentration was adjusted to 20 ng/ul, and 100 ng DNA was digested by ApeKI (New England Biolabs, Ipswitch, MA). Then, the individual DNAs were ligated with a unique barcode adapter and a common adapter (Eurofins Genomics, Louisville, KY, USA) for 96 genotypes per plate using the protocol from Elshire, et al. 25. The ligated DNA fragments were purified by Qiagen PCR cleanup kit (Qiagen, Toronto, ON, Canada). Following the purification, PCR primers (Eurofins Genomics, Louisville, KY, USA) and KAPA HiFi HotStart Library Amp Kit (Roche, Laval, QC, Canada) were added through PCR amplification. The amplicon fragments were dual-side size selected using 0.57× and 0.78× (Average fragment size 400bp, approximate size distribution 220–900 bp) using Sera Mag Select (Cytiva, Marlborough, MA, USA). An aliquot of each sample was run on an Agilent 2100 Bioanalyzer (Agilent Technologies 2013) for evaluation of fragment sizes. The three libraries of 96 barcoded samples were sequenced with paired-ends of 125 bp in three lanes using an Illumina HiSeq 2500 at the National Research Council of Canada, Saskatoon, Canada.
Snp Calling
The quality of pooled, raw reads was assessed using FastQC software 104. The raw reads were then demultiplexed using Sabre 105. After demultiplexing, the raw reads of individual samples were cleaned by removing adapters and low-quality sequences using Trimmomatic v.0.36 based on the default setting of paired-end mode, phred 33 and threads 6 106. The M. sativa genome sequence 107 was used as a reference genome for alignment of the reads using Burrows-Wheeler Aligner v0.7.17 software 108. SNP calling was performed using Freebayes v1.2.0 109 with the following settings: minimum coverage 1, minimum alternate fraction 0.1, minimum alternate count 5, genotype variant threshold 26, minimum base quality 13, pooled continuous, best 1 alleles, mapping quality 30, ploidy 2, other settings, default. The minor allele frequency, extent of missing SNP data and distribution of SNPs on the reference genome were calculated. The SNPs with Phred-scaled quality (QUAL) score ≥ 1260 were kept for imputation using BEAGLE software 110. After imputation, 19,853 high quality SNPs were obtained and used in further analysis.
Genetic Diversity Analysis
The Bayesian method implemented in the program STRUCTURE is a common method applied to detect population genetic structure 111–114. This program uses a Bayesian clustering approach to define clusters containing groups of individuals whose genotypes maximize Hardy-Weinberg and linkage equilibrium 111. A discriminant analysis of principal components (DAPC) 115 using the “adegenet” package for R 116,117 was conducted to identify clusters of genetically related individuals. The Bayesian Information Criterion (BIC) was used for selecting the optimal number of clusters (K) based on the elbow method 115. The find.clusters function was used to detect the number of clusters among the populations. It uses K-means clustering which decomposes the total variance of a variable into between-group and within-group components. The optimum number of sub-populations has the lowest associated BIC. A cross validation function Xval.dapc was used to confirm the correct number of principal components (PC) to be retained.
The genetic diversity analysis was based on 19,853 SNP markers obtained from the SNP calling procedure. Three types of diversity analysis were performed at individual genotype level. First, genetic structure of 251 alfalfa genotypes was assessed using a model-based Bayesian method implemented in the program STRUCTURE version 2.3.4 111. Sub-population numbers (K) ranging from 1 to 10 were evaluated, repeating each analysis 20 times. Modelling decisions included correlated allele frequencies and use of an admixture model. A burn-in of 30,000 iterations and 30,000 iterations of the Markov Chain Monte Carlo (MCMC) were used. The most likely number of sub-populations was inferred with the delta K method 118 implemented in the STRUCTURE HARVESTER program 119. The CLUMPP program 120 was used to generate a consensus ancestry estimate from the 20 independent runs for each K and ancestry bar plots were generated with the “pophelper” package for R 117,121.
Identification Of Unique Loci Associated With Long-term Grazing
The 19,853 SNPs, with minor allele frequency (MAF) below 5% removed, were used for genotype-environment association for 122 genotypes representing 12 alfalfa populations sampled from the long-term grazing sites. The SnpEff software 122 was used to annotate all SNPs based on the M. sativa reference genome database. The candidate genes were localized by retrieving the positions of significant SNP markers in the annotation database of M. sativa genome function 107.
Linkage Disequilibrium
Pairwise Pearson correlation (r2) between pairs of SNPs were performed to assess the linkage disequilibrium (LD) across all chromosomes. A total of 12,402 SNPs were used in the LD analysis. The LD decay over physical distance was determined as the mean distance under the LD threshold of r2 = 0.2. The r package “sommer” (version 4.2.0) was used to conduct LD analysis 123.