The effect of cycles of genomic selection on the wheat (T. aestivum) genome

We documented changes in the wheat genome attributed to genomic selection including loss of diversity, and changes in population structure and linkage disequilibrium patterns. We conclude that training and prediction populations need to co-evolve instead of the use of a static training population. Genomic selection (GS) is widely used in plant breeding to shorten breeding cycles. Our objective was to assess the impact of rapid cycling GS on the wheat genome. We used 3927 markers to genotype a training population (YTP) and individuals from five cycles (YC1–YC5) of GS for grain yield. We assessed changes of allele frequency, genetic distance, population structure, and linkage disequilibrium (LD). We found 27.3% of all markers had a significant allele frequency change by YC5, 18% experienced a significant change attributed to selection, and 9.3% had a significant change due to either drift or selection. A total of 725 of 3927 markers were fixed by YC5 with selection fixing 7.3% of the 725 markers. The genetic distance between cycles increased over time. The Fst value of 0.224 between YTP and YC5 indicates their relationship was low. The number LD blocks decreased over time and the correlation between LD matrices also decreased over time. Overall, we found a reduction in genetic diversity, increased genetic differentiation of cycles from the training population, and restructuring of the LD patterns over cycles. The accuracy of GS depends on the genomic similarity of the training population and the prediction populations. Our results show that the similarity can decline rapidly over cycles of GS and compromise the predictive ability of the YTP-based model. Our results support implementing a GS scheme where the training and prediction populations co-evolve instead of the use of a static training population.


Introduction
The goal in wheat (Triticum aestivum L.) breeding is to accumulate favorable alleles by recombining the available genetic diversity in a gene pool into trait-enhanced inbred lines. These genotypes result from a breeding cycle involving crossing parents, inbreeding the progeny, and selecting the best progeny to release as a variety or to become parents of the next cycle. Many important wheat traits such as grain yield are inherited quantitatively. With relatively inexpensive genome-wide genotyping capacities, breeders are now employing genomic selection (GS) to improve the efficiency of breeding for quantitative traits and to improve genetic gain per unit of time. Genomic selection utilizes molecular markers and phenotyping information collected in a training population (TP) to model genomic estimated breeding values (GEBVs) of individuals related to the TP. Breeders can then select superior individuals based solely on their GEBVs (Meuwissen et al. 2001) to become parents of the next breeding cycle (Hoffstetter et al. 2016a), to advance lines through stages of trialing (Borrenpohl et al. 2020;Endelman et al. 2014), to predict line performance in untested environments (Huang et al. , 2018, and to collaborate in multilocation testing and sharing of wheat lines across breeding programs with sparse testing (Atanda et al. 2021;Crespo-Herrera et al. 2021;Jarquin et al. 2020;Sneller et al. 2021).
An advantage of GS over phenotypic selection (PS) is that GS can greatly reduce the duration of a breeding cycle and increase selection intensity. Because of the shorter cycle duration and higher selection intensity, GS can potentially change allele frequencies faster than PS (Jannink 2010; Communicated by Mark E. Sorrells. Makanjuola et al. 2020). Selection and drift are the main factors determining allele frequencies in closed breeding populations and both can lead to allele fixation and reduced genetic diversity. The dynamic of genetic diversity in wheat has mostly been studied in a historical and domestication context (Cabrera et al. 2014;Christiansen et al. 2002;Fu and Somers 2010;Reif et al. 2005;Roussel et al. 2005). Christiansen et al. (2002) used microsatellites to study the change in genetic diversity of spring wheat varieties released between 1900 and 1990s in the Nordic countries, and Roussel et al. (2005) studied microsatellites allelic diversity in 480 European wheat varieties developed between 1840 and 2000. These studies showed shifts in diversity, primarily attributable to the breeder's temporal needs and strategies for managing diversity. Fu and Somers (2010) studied microsatellite allele differences between populations with and without improvement for a given trait in wheat. Genotypes without improvement showed higher microsatellite diversity than wheat genotypes with improvement. The rate of increase or decrease in diversity was trait-dependent as selection for one trait could increase the allelic diversity, whereas selection for another trait could reduce allele diversity. Reif et al. (2005) compared microsatellite diversity in T. tauschii, landrace cultivar, and modern wheat cultivars. Triticum tauschii was more diverse than the landraces indicating a reduction in diversity during domestication. Landraces were more diverse than modern cultivars indicating a reduction of diversity through breeding. In addition, the comparisons within the modern cultivars showed less diversity when lines were closer in release date than lines farther apart (Reif et al. 2005).
Few studies have researched genetic diversity in current breeding populations undergoing genomic selection. Jacobson et al. (2015) researched genetic variance in GS and PS selection in bi-parental maize populations with different selection intensities. As selection intensity increased, genetic diversity decreased with GS but remained unchanged with PS. Jacobson et al. (2015) considered this reduction in diversity minimal and that it had little impact on trait value. Long-term studies to monitor changes in allele frequency under selection are limited to maize (Moose et al. 2004). The authors studied 49 markers in a population that had undergone forward and reverse selection for high and low oil content. Of the 49 markers studied, 59% showed significant differences in allele frequency, and 20% showed evidence of selection (Moose et al. 2004).
Animal breeders have report realized genetic gain due to GS while also measuring the inbreeding rate per generation of breeding. Makanjuola et al. (2020) measured the rate of inbreeding in cattle before and after the implementation of GS. They showed that ten years of GS in cattle increased inbreeding by up to 2.06% per generation. This inbreeding rate was sufficient to prompt cattle breeders to develop strategies for better management of genetic resources. Scott et al. (2021) showed that the rate of inbreeding changed from 0.037 to 0.18% per year in Jersey bulls when using GS and from 0.037% (prior to GS) to 0.760% (after GS) per year in Holstein bulls. Aiming at better management of genetic diversity in animal breeding, Meuwissen et al. (2020) introduced the genomic optimal contribution selection method (GOC) that controls for inbreeding rate by limiting co-ancestry in crossing schemes. They found that using the same marker set to construct the identity-by-descent matrix and to estimate the GEBVs provided the highest genetic gain per unit of inbreeding. Meuwissen et al. (2020) suggested that GOC is a good method to obtain genetic gain while maintaining neutral alleles that may be needed for animal welfare and diversity for genetic gain (Meuwissen et al. 2020).
Although the impact of GS on diversity was recognized more than 10 years ago (Jannink 2010), little has been reported on the actual impact of GS on genetic diversity and population structure in crops. Rutkoski et al. (2015) measured the rate of inbreeding and genetic variance in populations used to improve quantitative resistance to stem rust of wheat. This study initiated with the same genetic pool (generation 0) and conducted one cycle of PS and two cycles of GS. The PS cycle increased inbreeding by 0.05 units per cycle, whereas GS increased inbreeding by 0.08 in cycle 1 and by 0.21 units in cycle 2, demonstrating that GS can reduce diversity faster than PS (Rutkoski et al. 2015). Allele frequencies within closed, finite populations can change due to mutation, selection, or drift. However, regardless of what causes the change, fixation of alleles reduces genetic diversity. Reduction in diversity can compromise the response to selection and reduce short and long-term genetic gains. In addition to reducing diversity, changes in allele frequencies can lead to genetic differentiation between the TP and the prediction population (PP). Changes in allele frequencies can lead to population differentiation and recombination could reduce the LD between markers and QTL. If later breeding cycles have a different LD pattern than that of the TP, then the accuracy of GS will be reduced (Wientjes et al. 2013).
Research to overcome the challenge of reduced diversity has been conducted in plant breeding (Gaynor et al. 2017;Lado et al. 2017).  presented the optimal cross selection method (OCS) that is analogous to the GOC in animal breeding (Meuwissen et al. 2020). Both OCS and GOC limit population-wide inbreeding by penalizing crosses among closely related individuals.  also report that OCS could help maintain the prediction accuracy between the TP and the PP.
The effect of rapid cycling with GS on the wheat genome is unknown. It is essential to assess the effect of rapid cycles of GS on genetic diversity, population structure, patterns of LD, and allele frequencies. Monitoring these population characteristics enables (a) strategic planning of crosses, (b) maintenance of trait genetic variation and response to selection, (c) deciding when to utilize new genetic variation, (d) retraining GS models for accurate predictions, and (e) understanding alteration in the genome due to rapid cycling with GS. This research uses single nucleotide polymorphisms (SNPs) to genotype individuals in a TP and five cycles of GS to accomplish the following objectives: (1) to quantify the magnitude of change in allele frequency, (2) to examine whether drift or selection drove the observed changes in allele frequency, and (3) to assess the impact of changes in allele frequency on genetic diversity, LD patterns, and population structure.

Training population
The training population (YTP) for this study was created for implementing GS for yield, fusarium head blight (FHB, caused by Fusarium graminearum) index of infection, and quality traits. The YTP was previously described by Hoffstetter et al. (2016a). The YTP consists of 470 F 4 -derived soft red winter wheat breeding lines representing the diversity in 2010 of the breeding programs at The Ohio State University. The YTP resulted from 47 bi-parental crosses using 23 parents derived from seven wheat breeding programs in the United States.
Phenotyping the Training Population The phenotyping of the YTP has been described by Hoffstetter et al. (2016a). Briefly, the YTP was phenotyped for FHB index in inoculated and misted trials with three replications in Wooster, Ohio during the 2011 and 2012 growing seasons. Index of FHB was rated as percentage of all spikelets showing visual symptoms of infection. The YTP was assessed for grain yield in Custar, Fremont, and Wooster, Ohio, in 2010 and 2011. The grain yield experiment consisted of 13 augmented blocks with 37 YTP lines and three standard checks per block. Analysis of yield data indicated that the Custar and Fremont sites had little genotype × environment interaction between them, while the Wooster site was distinct from those sites (Hoffstetter et al. 2016b). Consequently, the Best Linear Unbiased Estimates (BLUES) for grain yield were obtained (1) over the Custar and Fremont sites, referred to as Northwest sites (NW), (2) over the Wooster sites, referred to as Wooster sites (WOO), and (2) overall sites (ALL).

Genomic selection cycles
Five breeding cycles of genomic selection for grain yield, abbreviated as YC1, YC2, YC3, YC4, and YC5 were conducted. For the first selection cycle (YC1), 71 crosses were made using 14 YTP lines as parents ( Table 1). The F 1 seed produced by these crosses was sown in the greenhouse and self-pollinated to produce the F 2 seed. The F 2 seed was planted for a final YC1 population of 922 F 2 individuals (Table 1). Each F 2 plant was genotyped, and their GEBVs were estimated for yield in NW, WOO, and ALL (all sites combined) were estimated with the model trained with YTP data. While we originally planned to include quality traits and FHB resistance in the selection process, the final selection was based primarily on yield due to its importance in cultivar release. For each cycle, the F 2 plants were ranked in descending order based on their GEBVs for yield and grouped into crossing priority groups 1, 2 and 3: the top 5% of the F 2 s were placed in group-1, the next 5% in group-2, and the third 5% in group-3. Group-1 contained the most superior plants, and we crossed among them whenever flowering allowed. If a group-1 plant was ready for crossing when no other group-1 plant was ready, then the group-1 plant was crossed to a cross-ready group-2 or group-3 plant. We made a total of 78 crosses among the YC1 F 2 plants. The F 1 seed from the 78 crosses was planted to initiate YC2. Cycles YC2, YC3, YC4, and Y5 were conducted similarly. The number of F 2 plants, the times an F 2 was used as a parent, and the number of crosses per cycle is summarized in Table 1. The total time to conduct one GS cycle was 12 months. This GBS protocol consists of (a) reduction of the wheat genome complexity by cutting it with restriction enzymes (PstI and MspI), (b) ligation of the restriction pieces to adaptors, and (c) amplification of fragments by polymerase chain reaction (PCR). SNPs were discovered and called with the TASSEL-GBS software, following the pipeline described in Glaubitz et al. (2014). The steps of the GBS pipeline include (1) identifying tags and taxa in the fastQ files, (2) aligning to the wheat reference genome to extract each tag's genomic position (mapping distances in base pairs), and (3) posterior variant detection (SNP calling).

SNP filtering details
Genotyping-by-sequencing produced 270,717 single nucleotide polymorphisms (SNP) across all the chromosomes of the wheat genome. We calculated the type, count, and frequency of alleles (major, minor, indels) and the proportion of missing and heterozygous data for each SNP in the YPT. Then, for each genotype in the YTP, we calculated the proportion of missing and heterozygous data. These summaries served as the reference for loci filtering in all cycles.

Rounds 1 and 2 of filtering
For the first round of filtering, guided by the loci and genotype summaries, we remove (1) loci with > two alleles, (2) unmapped loci, (3) YTP lines with ≥ 50% missing data, (4) loci with missing data > 0.00, and (5) loci with < = 0.10 minor allele frequency (MAF). For the second filtering round, we merged YTP and YC cycles with only loci remaining from round 1 of filtering. Merged data became the master file to remove loci with > two alleles (appearing in YC cycles) and loci with missing data > 0.005. This step was necessary to remove loci that contained more than two alleles, which is assumed to be an artifact of the SNP calling process instead of mutations. After the second filtering round, we unmerged the data set by cycles. The third filtering round removed genotypes with > 5% missing data within each YC cycle.
Finally, the fourth filtering round purged markers (using only the YTP), with heterozygosity ≥ 0.25, and then thinned by a minimal distance of 50 bp in adjacent loci. The YPT is a RIL population expected to be 100% homozygous at all loci. However, after the third filtering round, we found heterozygosity in the YTP ranging from 0.02 to 1.00, requiring a fourth filtering round for highly heterozygous loci. After four filtering steps, we retained a set of 3972 biallelic and mapped SNPs for all subsequent analyses in 6911 wheat genotypes (Tables 1, 2).

Genetic drift simulation
We simulated the probability of obtaining a particular allele frequency in the YC5 due to drift using the genetic.drift function in the learnPopGen version 1.0.4 (Revell 2019) package in R version 4.2.1 (R Core Team 2022). Simulations were run for five mating cycles, assuming an effective population size of 20, and setting initial frequencies ranging from 0.50 to 0.99 in increments of 0.01 as these values represented all possible major allele frequencies in the training population: these alleles were termed TPMA. The simulation was run 1000 times for each possible frequency to provide a variance ( 2 ) of the possible TPMA frequencies in the YC5 that could be produced by drift. This provided an estimate of 2 for each frequency between 0.50 and 0.99. We used the "NORMDIST" function in excel to determine the probability of an observed TMPA frequency in YC5 given its frequency in the TP and the associated estimate of √ 2 obtained from the drift simulation. Markers with a probability of < 0.05 or > 0.95 were declared to be changed due to selection. Regression Analysis of polymorphism information content (PIC) and Allele Frequency.
We estimated the PIC for all each marker within each cycle as follows, where p i is the frequency of the ith allele of the jth marker in that cycle and n is the number of alleles at the jth marker (Botstein et al. 1980). We determined the PIC value for each marker locus and the frequency of the TPMA for all markers, in all cycles. We then regressed the allele frequencies and PIC values of each marker in each cycle onto cycle number where the YTP was considered cycle 0. The regression analysis was performed using Proc Reg in SAS v9.4

Genetic distance
We estimated the genetic distance (GD) among individuals, within and between populations. The GD was estimated as the complement for a simple matching coefficient (SMC) (GD = 1-SMC). The GD between the i th and jth individuals was calculated as: where n 1,1 and n 0,0 are the times the ith, and jth individuals share the same allele. And n 1,0 and n 0,1 are times individuals do not share the same allele. The distance between individuals was estimated within and between cycles. We used the dist.binary() (Dray and Dufour 2007) function in R version 4.2.1 (R Core Team 2022) to calculate the GD.

Linkage disequilibrium
A matrix of LD between all pairs of SNPs was calculated for each cycle separately as: D is estimated between two loci (A, B), each with two alleles (A1 and A2, B1 and B2) producing A1B1, A1B2, A2B1, A2B2 haplotypes with frequencies x11, x12, x21, and x22, respectively. Allele frequencies are equal to p A = x11 + x12 and p A = x11 + x21 . We estimated the r 2 with LD() (Warnes et al. 2022) function in the genetics package version 1.3.8.1.3 developed for R version 4.2.1 (R Core Team 2022). We conducted a Mantel test (Mantel 1967) to compare the LD matrices between every pair of LD matrices. We use the mantel() function from the package vegan developed in R (Oksanen et al. 2022; R Core Team 2022), with 9999 permutations to estimate a p-value. The Mantel test correlates variables between two matrices and the rows and columns are permuted to then estimate a p-value for the test. The LD blocks by chromosome were estimated using the function BigLD() from the package gpart version 3.15 (Kim et al. 2019) developed in R version 4.2.1 (R Core Team 2022). We defined an LD block as a group of SNPs with LD > 0.2. The LD blocks were only created for the YPT and the YC5 population.

Population structure
The degree of differentiation between populations was estimated using the fixation index (F ST ), proposed by Nei (1987) and calculated as: where a and b are the two populations, and p ajk represent the frequency of the jth allele of the kth SNP in population. These F ST values were calculated, between all pairs of cycles, with the function pairwise.neifs from the hierfstat package version 0.5-11 (Goudet 2005) coded in R version 4.2.1 (R Core Team 2022). To further study and visualize the structure among all populations, we conducted a discriminant analysis of the principal components (DAPC) (Jombart et al. 2010). The DAPC uses principal components scores to group observations with a discriminant analysis. We found the posterior probability of assigning an individual to one of six groups using the interactive function dapc (Jombart and Ahmed 2011) in R version 4.2.1 (R Core Team 2022). We first ran a principal component analysis of all genotyped individuals from all cycles using all 3927 markers. For DAPC, we established six prior groups (YTP, YC1-YC5) and retained 17 principal components that explained 51% of the variance: each additional PCs explained a negligible amount of the variance so they were not included. The DAPC assigned posterior probabilities of belonging to one of the six cycles for each genotype. For example, a line in the YTP population is assigned six posterior membership probabilities (to YTP, YC1, YC2, YC3, YC4, and YC5).

Genome-wide association analysis (GWAS)
We conducted a GWAS using all 3927 markers considering for yield in NW, WOO, and ALL combined environments as three different traits. We used four analyses in the in the R package GAPIT (Lipka et al. 2012): (1) general linear model, (2) the mixed linear model incorporating the first three principal components and the kinship relationship, (3) the multiple-loci mixed linear model, and (4) the FarmCPU model. We focused our results on the 1047 markers that were declared to have a significant allele frequency change over the five cycles. We declared a SNP-trait association significant if the p-value was ≤ 0.05 for at least two of the four AA models. For example, marker SNP239020 located on chromosome 7B, deemed to be under selection, was found to be associated with yield in 3 out of the 4 models for all traits (p-value ≤ 0.05, supplementary Table 1).

Genomic estimated breeding values (GEBV)
GEBVs were calculated for all individuals from all cycles for yield in NW, WOO, and ALL using phenotypic data from the YTP. The GEBVs were estimated using the ridge regression best linear unbiased predictor (rrBLUP) using the function mixed.solved (Endelman 2011) in the rrBLUP R package version 4.2.1 (R Core Team 2020).

Marker data
After four rounds of filtering, 3927 SNPs common across all cycles were used for analysis. The number of SNPs per chromosome ranged from 22 (4D) to 460 (2B) ( Table 2). The SNP count in genome B was the highest (1878), followed by genome A (1596) and genome D (463). We did not impute missing values for the population genetics analyses (F ST , allele frequencies, GD, LD): missing marker data were imputed with the mean value for estimating GEBVs.

Genomic estimated breeding values
This research covered five cycles of rapid cycling GS for grain yield. Individuals with the greatest GEBVs were used as parents to generate the next cycle. Grain yield in WOO was the primary selection criteria because it had the highest heritability (Hoffstetter et al. 2016a) and the greatest variation for GEBV. The first round of crosses occurred in 2011, with 3% of the YTP lines selected as parents based on GEBV and phenotypes for grain yield in NW and WO and FHB resistance (Table 1). Crosses were made, and F 2 plants were selected to be used as parents based on the GEBV for grain yield in NW and WOO in subsequent cycles. The selection intensity in the subsequent cycles ranged from 3 to 8% (Table 1). The average GEBV for YLD increased over cycles (Table 3, Fig. 1) while variance among the GEBVs decreased (Fig. 2). The mean GEBV increased at a per cycle rate of 19 kg/ha for NW, 35 kg/ha for ALL sites, and 73 kg/ ha for the WOO site. The largest change in mean GEBV was observed from YTP to YC1 for all yield traits. The GEBV variance, adjusted to that expected in the F 4 generation (e.g., 1.75 times greater than in the F 2 ), decreased over cycles for yield in NW and ALL (Fig. 2). Variance for GEBVs for yield in WOO decreased from YTP to YC2, followed by an increase to YC5. Much of the reduction in variance occurred due to fewer individuals with low GEBVs (Fig. 1). F 2 Individuals with yield GEBV greater than the maximum YTP GEBV were noticed in YC1 for yield over all environments but not until later cycles for yield in NW and WOO (Fig. 1).

Polymorphism information content (PIC) and genetic distance (GD)
We assessed the change in genetic diversity over cycle of GS by assessing PIC and GD. The PIC value for each 3927 SNP loci was estimated within the YTP and each cycle. The PIC is a characteristic of a marker and indicates how informative the marker is and its diversity. The PIC values ranged from 0.18 -0.50 in the YTP and 0.00-0.50 in cycles (Fig. 3A). The average PIC value decreased from 0.358 in the YTP to 0.227 in YC5. The minimum PIC value for a SNP in the YTP was 0.18, while PIC values of 0.0 (i.e., fixation) were observed in all subsequent GS cycles. We found a PIC value of zero for 76 and 616 loci in YC1 and YC5, respectively. We calculated the GD between individuals within and between cycles. The average GD increased slightly from the YTP to YC1 and declined slightly afterward (Fig. 3B). When considering all possible comparisons with the YTP, the average GD between cycles was highest for YTP versus YC1 (0.562) and lowest for YTP versus YC4 (0.554) ( Table 4). The average GD did not steadily change as cycles progressed; however, the minimum GD showed substantial change. For example, the minimum distance found in the YC1 was 0.157, whereas the YC5 comparison had a minimum difference of 0.318. In this study, the GD estimates genetic diversity for a given individual versus every other individual. These analyses showed that there was a decrease in diversity over the cycles of GS.

Population structure
We assessed the change in population structure and degree of differentiation among cycles. The F ST analysis showed differentiation of all cycles from the YTP with greater genetic differentiation with each subsequent cycle (Table 5, Fig. 4A). For instance, the YC5 and YTP are five cycles apart and showed the highest F ST value (F ST = 0.224). There was an increase of 0.0456 F ST units per cycle difference, and the average F ST between consecutive cycles was 0.039. The first and second principal components obtained from an analysis using all 3927 markers explained 11.6% and 4.2%, respectively, of the variation among all YTP and YC1-YC5 individuals. We conducted a discriminant analysis of the principal components (DAPC) to place lines into the prior (YTP, YC1-YC5) grouping. For the DAPC, we retained 17 principal components, which cumulatively explained 51% of the variance and retained five discriminant functions (N-1, where N is the number of prior groups). The DAPC results showed the cycles becoming more distant from the YTP as the cycle number increased (Fig. 5). More admixture is observed between cycles that are closer in number, than if cycles are far from each other. Admixture with the YTP is reduced as the number of GS cycles is increased. For example, cycles YC1, YC2, and YC3 clustered closer to the YTP than cycles YC4 and YC5 (Fig. 5). The DAPC assigns a posterior probability for an individual being in a prior group (YTP, YC1-YC5). We divided the posterior probabilities by quartiles (0.00-0.25, 0.26-0.50, 0.51-0.75, 0.76-1.00) and reported the frequency of individuals falling within each range (Table 6). For instance, we found that 80% of YTP individuals have between 0.00 and 0.25 posterior probability of belonging to the YC1 population. The general conclusion is that the differentiation of the cycles from the TP increased over time.

Linkage disequilibrium
We assessed possible changes in LD over cycles of GS as effective GS requires that the LD pattern in the TP persists over cycle of GS. We calculated a matrix of LD values among all SNP pairs within each cycle and performed a Mantel test to assess the correlation between the LD matrices. All pairs of matrices were significantly correlated at p < 0.0001. The correlation of LD matrices decreased responding to their separation in time (Table 7, Fig. 4B). For each unit increase in the difference between cycle numbers, the correlation value fell by − 0.057. We use the YTP and YC5 LD matrices to extract LD blocks by population and chromosome. We defined a block as a group of SNPs with LD > 0.2. The YTP had 604 LD blocks, ranging from 2 to 51 loci per block. The LD blocks were dynamic, that is, some chromosomes would develop new LD blocks, and others would lose LD blocks (Table 8).
There was a 23% reduction in the number of LD blocks by YC5 compared to the YTP. The total number of blocks ranged from three (3A) and 67 (3B) in the YTP; and 13 (5D), 42 (3B) in the YC5 (Table 8). Collectively these results suggest that LD structure changed throughout selection and advancement cycles.

Regression analysis of allele frequency
We calculated the change in allele frequencies over the five cycles and estimated if significant changes were likely due to selection or drift. We calculated the frequency of the major allele in the YTP (TPMA) in all cycles for all SNPs (supplementary Table 1). We then calculated the correlation of the TPMA frequency between all cycles (Table 9). This correlation declines as cycles become more separated by cycle numbers with a slope of − 0.106 per difference between cycle numbers (Fig. 4C). The rate of change in allele frequency was estimated for each marker by regressing the frequency of the TPMA in each cycle onto the cycle number where the YTP was called cycle 0. The same regression was done Fig. 4 The difference of A fixation index (F ST ), B Mantel statistic of the linkage disequilibrium matrices and C the correlation of allele frequencies when comparing values between the between the training population and the five cycles of genomic selection in winter wheat. The X axis is the difference between cycle number. For example, the difference in cycle number when comparing the training population (assigned as cycle 0) to cycle 1 is 1, as is the difference between cycles 4 and 5, while the difference between comparison of the training population and cycle 5 is 5 using the PIC value of each marker from each cycle (supplementary Table 1). We declared an allele frequency change significant if the r 2 for the TPMA and PIC regressions both exceeded 0.70, and if the estimated slope of both regressions had a probability of < 0.05 of being zero. A significant change was attributed to selection if the frequency in YC5 had a < 0.05 probability of being attributed to drift based on the distribution of possible allele frequencies generated by simulating drift.
The TPMA and PIC regressions determined that 1074 (27.3%) of all 3927 SNPs had a significant allele frequency change by YC5 (Fig. 6). These 1074 loci showed an average TPMA frequency in the YTP of 0.72, followed by YC1 of 0.65, YC2 of 0.59, YC3 of 0.58, YC4 of 0.57, and YC5 of 0.57.
We estimated whether a significant allele change was likely due to selection and not due to drift. We simulated drift over five cycles for each TPMA that occurred in the TP (0.50-0.99). We performed 1000 simulations per possible level of TPMA. The 1000 runs produced an error variance for each TPMA in the TP and allowed us to test if the observed TPMA in YC5 was significantly different from that observed in the TP if drift alone had occurred. In addition, the allele frequency regression used a linear model and a declaration of significance required an r 2 > 0.70 which is only produced by a fairly consistent rate of change over the cycles, something that would be unlikely if the change was due to drift. Thus to be declared changed due to selection a marker had to have a change that was < 5% likely due to drift and it had to have a consistent change over cycles.
We deemed that 707 of 3927 (18%) loci had a significant allele frequency change due to selection. An additional 367 of 3927 (9.3%) loci had a significant change, but the cause was ambiguous as these changes could be due to drift or to selection that did not meet our criteria. We considered alleles fixed if the TPMA frequency equaled 0 or 1 in YC5. A total of 725 of 3927 (18.5%) loci had become fixed by YC5. Of these 53 became fixed due to selection and with the rest fixed due to drift or low selection pressure. Many loci (2853, 72.7%) across the wheat genome showed no significant change in the TPMA frequency, but 512 out of these loci became fixed by YC5. The results show that a large portion of the genome has been altered through five cycles of GS.

Genome-wide association analysis
We determined that 1074 markers had a significant allele frequency change over five cycles. We conducted genomewide association analysis (GWAS) to assess if these markers had a significant effect on yield in NW, WOO, and ALL, considering each to be a unique trait. All 3972 SNPs and the yield data collected in the YTP were used for GWAS. We focused on the results for the 1074 whose allele frequency was deemed to show significant change. Overall, we found 101 of these 1074 markers were significantly (p < 0.05) associated with at least one yield trait, and 40 loci were significant for at least two traits. Two markers on chromosome 2B, three on 5A, and one on chromosome 6A were significant for yield in NW, WOO, and ALL environments. These markers were still not fixed in the YC5 population. Overall, most of the 1074 markers were not statistically associated with the yield traits.

Discussion
This study documents the effect of five GS cycles for grain yield on the wheat genome. Our analysis includes estimating the proportion of the markers with a significant change in allele frequency, discerning between genetic drift and selection as the cause of those changes, and the change in the index of fixation (F ST ), diversity, and LD between the YTP and YC1-YC5 cycles. Additionally, we report the distribution of GEBVs for all cycles and connect changes in allele frequencies to the value of alleles using GWAS.
Finite populations undergo selection, genetic drift, and recombination that can alter allele frequencies and the LD Selection inevitably causes changes in the genome, leading to loss of diversity and limiting long-term genetic gain though these changes may occur faster with GS than with traditional phenotyping breeding.
In the present study, five cycles of GS were completed in five years. The selection resulted in significant changes in alleles frequency compared to the TP, loss of diversity, fixation of alleles, increasing differentiation of the cycles from the TP, a general reduction in LD blocks, and increased differentiation of LD patterns over the cycles. We attributed these results to the selection intensity of 3% to 13% per cycle and a limited population size (n = 445-1821) ( Table 1). The documented changes could have a significant effect on GS prediction accuracy. We used conservative criteria to determine if there was a significant linear change of allele frequency over cycles and whether the change was driven by selection or genetic drift. There was a significant change in allele frequency by cycle-5 for 27.3% of the loci relative to the YTP (Fig. 6). Of the 1074 loci deemed to have a significant allele frequency change by YC5, 66% of the changes were attributed to the selection process. It is not possible to unequivocally attribute these changes to selection, though there was a 0.05 or less probability that they were due to drift. We found alleles fixed at 18.3% of all loci by YC5, indicating a loss of diversity. The percentage of all loci that became fixed and attributed to selection (1.3% of 725) was much less than the percentage that became fixed due to other causes (98.7% of 725). The fact that much of the fixation was not attributed to selection suggests that the genetic bottleneck caused by intense selection at each cycle was a primary cause of fixation. In addition, there was a decrease in the average PIC values and GD among individuals within a cycle as the cycle number increased (Table 4, Fig. 3A, B), indicating reduced diversity beyond the noted fixation.
Research to overcome the challenge of reduced diversity from selection has been conducted in plant breeding (Gaynor et al. 2017;Lado et al. 2017). To conserve diversity in plant breeding,  presented the optimal cross-selection method (OCS), and  developed AlphaMate to implement OCS. The OCS method is analogous to the genomic optimal contribution (GOC) selection method proposed for managing diversity in animal breeding   (Meuwissen et al. 2020). Both OCS and GOC minimize population-wide inbreeding by penalizing crosses among closely related individuals.  also report that OCS could help maintain the prediction accuracy between the training and prediction populations. It would be interesting to simulate OCS in our populations to see how it would affect diversity and impact gains in GEBVs compared to our results. We cannot compare our results to genome changes that may occur using five cycles of PS as that assessment would take 35 years to complete (seven years per cycle of PS in out program). Rutkoski et al. (2015) compared GS with PS in wheat and showed that inbreeding increased faster with GS (0.08 per cycle) than with PS (0.05 per cycle), indicating that GS changed allele frequencies faster than PS. Results of an increase in inbreeding from GS are also reported in animal breeding by Makanjuola et al. (2020). They investigated the changes in inbreeding, co-ancestry, and effective population size in cattle. They studied three periods that represent significant changes in breeding strategies: BLUPs (1990-1999), inbreeding control (2000-2009), and GS implementation (2009. Makanjuola et al. (2020) report a rapid increase of inbreeding during GS implementation, and a steady rise during the BLUP and inbreeding control periods. The percentage increase of inbreeding reported for the three periods was 1.50% (BLUP), 0.50% (inbreeding control) and 1.95% (GS).
The effectiveness of GS depends on close relationship between the prediction and training populations. Significant genome changes during rapid cycling would reduce the effectiveness of GS. We found that the genetic distance between cycles and the TP increased the further apart they were in time. The GD (Table 4) and Fst values (Table 5, Fig. 3) between cycles also increased as cycles become further apart in time. This was supported by the DAPC analysis (Table 6, Fig. 5) that showed individuals from five cycles of GS became more differentiated from the YTP as the cycle number increased. The greatest differentiation was between the YTP and YC5 where the Fst was 0.224 whereas the Fst between the YTP and YC1 was 0.059. Frankham et al. (2002) state that F ST values greater than 0.15 indicate significant differentiation in plants, while a value less than 0.05 indicate little differentiation. These results indicate the cycles had evolved to the point where YC5 was not closely related to the YTP, and thus the YTP may not be predictive of genetic values in the later cycles.
Prediction accuracy depends on maintaining the LD pattern between markers and QTL that existed in the TP over cycles of GS. Decay of this LD will reduce the accuracy of GS.The reported change in LD patterns and number of LD blocks over five cycles of GS could impact the accuracy of GS predictions in later cycles (Tables 7,8). We observed that the correlation of the LD matrices between cycles decreased at a rate of -0.056 as cycles became separated over time. This shows that the LD pattern between markers was changing. One can then assume that the LD between markers and QTL is also likely to be changing and that will affect the accuracy of GS. Our observations pertain to LD among all monitored loci. It should be noted though that prediction accuracy in the TP results from markers and QTL being close and the LD between closely linked loci will decay slowly than the general decay we assessed. While we did observe an increase in GEBVs over the cycles (Fig. 1), the predictions in all cycles are based on the YTP data and assume that the marker-QTL associations of the YTP remain constant through the cycles. The change of LD over cycles indicates that a GS model based on the LD in the YTP may quickly lose predictive ability during rapid cycling. Fig. 6 Classification of 3927 wheat markers as to whether they (1) had a significant change in allele frequency over five cycles of genomic selection, (2) whether the change for markers with a significant change could be attributed to selection or other ambiguous causes, and (3) whether the markers become fixed or not This study shows that GS can produce rapid changes in the genome and much of the change does not seem to result from selection per se. The changes seem very likely to compromise the predictive ability of a GS model based on a static TP in a relatively few number of cycles as well as reduce diversity. The accuracy of GS depends on a close genetic relationship of the TP and the PP. This relationship decayed quickly in this study. We cannot estimate the reduction of GS accuracy from the genome changes that we have documented, but our results serve as a reference to understand the impact of rapid cycling on the wheat genome and their implications for GS. Our results also show that a static TP may have a very limited useful lifespan over cycles of GS and is not desirable.
Breeding schemes employing GS need to accommodate genome changes as well as short and long term breeding goals. Reducing diversity within a closed population is a goal of breeding as the selection process should fix favorable alleles and eliminate unfavorable ones. This is desirable in the short term but not the long term. Short term gains are of paramount importance in a competitive market. Our results indicate that the breeding scheme first outline by Gaynor et al. (2017) and supported by  is well suited to accommodate genome changes as well as short and long term goals. They proposed coupling rapid cycling GS used in population improvement with applying GS in the product development phase of breeding. Here GS is executed in a dynamic breeding pipeline to identify new parents at the early stages of field testing using phenotypes, GEBVs, and OCS. This reduces the number years per cycle compared to selecting parent only after the later stages of testing (Borrenpohl et al. 2020;Gaynor et al. 2017;. At the same time GS is used in rapid cycling employing a GS model that is continually updated with data of the ongoing breeding pipeline. This can be viewed as "evolving-GS" where data from every new cohort of lines that enter the product development phase is also used to update the TP and the parental pool. The genomes of the population will be changing though TP and PP in this scheme can be viewed as evolving together in genetic space such that their genomes remain similar and GS accuracy can be maintained within the shifting window defined by the current TP and PP. The use of some common parents across cohorts will increase the relationship between the TP and PP and would allow data from past cohorts to be relevant to the current PP. This co-evolution of TP and PP is in stark contrast to the static TP such as our YTP To some degree the genome changes we observed are inevitable consequences of the selection and drift that are inherent to all breeding and must be accommodated in a GS-based breeding scheme. A carefully constructed GS scheme can be implemented in an ongoing breeding pipeline where the genome information of the co-evolving TP and PPs maintains GS accuracy over many years of breeding. This would meet both the short and long term needs of a program. These schemes can be easily implemented by genotyping all lines that enter the product development phase and using data from the product development phase to update the TP every season to capture information on the changing genome and overcome some of the issues we report in this study.