Genomic estimated breeding values (GEBV)
Genomic estimated breeding values were determined for each of the 10 cycles (Supplementary Fig. 1). Over the ten cycles, GEBVs either increased or decreased before reaching a plateau. For DF and SY, the GEBVs increased rapidly before plateauing. The opposite trend was observed for WM tolerance, with GEBVs declining before leveling off. Parental population size affected the GEBVs at the end of the breeding program. For DF, the GEBVs averaged across the strategies were 19.90, 17.47, 13.98, and 9.87 for 15, 30, 60, and 100 parents, respectively. For WM tolerance, parental population sizes of 15, 30, 60, and 100 resulted in average GEBVs of -35.26, -61.52, -62.34, and -62.52, respectively. Lastly, for SY, the average GEBVs were 2420.18, 2745.80, 2185.04, and 2183.38 for parental population sizes of 15, 30, 60, and 100, respectively.
True breeding values were obtained from the QU-GENE output files and plotted over 10 cycles (Supplementary Fig. 2). Over the ten cycles, TBVs either increased or decreased before reaching a plateau. For DF and SY, the TBVs increased and eventually plateaued. The opposite was true for WM tolerance, where TBVs declined before reaching a plateau. The TBVs differed notably depending on the number of parents used at the beginning of the cycle. For each of the traits, as the number of parents increased, the average TBV across the strategies decreased. For DF, the average TBVs across strategies at the end of the 10th cycle were 20.21, 17.69, 13.95, and 9.96 DF for 15, 30, 60, and 100 parents, respectively. For WM tolerance (0-100 severity score) after 10 cycles, the average TBVs were -46.17, -62.32, -62.51, and -62.55 for 15, 30, 60, and 100 parents, respectively. For SY, the average TBVs were 2830.95, 2759.40, 2190.16, and 2189.79 kg/ha for 15, 30, 60, and 100 parents, respectively. For most breeding scenarios, bulk breeding, single seed descent, pedigree, and modified pedigree methods led to similar TBVs. Mass selection resulted in a lower TBV for DF and SY, while it led to a higher TBV for WM tolerance in comparison to the other four strategies.
GS Model Accuracies
In Silico Realized GS Accuracy
GS accuracies were estimated from the correlation between the TBV and the GEBV. Under each of the five strategies, accuracies generally declined before plateauing. They ranged from -0.35 for SY to 0.32 for WM (Fig. 1). The mean accuracies for each strategy were -0.03, -0.02, 0.02, 0.05, and -0.01 for mass selection, bulk breeding, single seed descent, the pedigree method, and the modified pedigree method, respectively.
For DF, the highest accuracy (0.31) was observed under the pedigree method with 30 parents, while the lowest accuracy (-0.35) was in bulk breeding with 100 parents. For WM, single seed descent with 15 parents resulted in the greatest accuracy (0.32). The lowest WM accuracy was seen under mass selection with 60 parents (-0.29). Interestingly, in cycle 2, there was an increase in accuracy, after which the WM accuracy declined rapidly and became negative by cycle 4. For SY, both the highest (0.24) and lowest (-0.34) accuracies were observed in mass selection. For certain cycles, a correlation could not be obtained. In these cycles, the variance was zero and the correlation was undefined.
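The realized accuracy described above, including the undefined case when the variance is zero, can be illustrated with a minimal sketch. It assumes the correlation is a Pearson correlation computed across the individuals of one cycle; the function name `realized_accuracy` and the input lists are hypothetical and are not taken from the QU-GENE workflow.

```python
import math


def realized_accuracy(tbv, gebv):
    """Pearson correlation between true and genomic estimated breeding values.

    Returns NaN when either vector has zero variance, since the correlation
    is undefined in that case (assumption: this mirrors the cycles in which
    no correlation could be obtained).
    """
    n = len(tbv)
    if n != len(gebv) or n < 2:
        raise ValueError("tbv and gebv must have the same length (at least 2)")

    mean_t = sum(tbv) / n
    mean_g = sum(gebv) / n

    # Sums of squared deviations (proportional to the variances).
    ss_t = sum((t - mean_t) ** 2 for t in tbv)
    ss_g = sum((g - mean_g) ** 2 for g in gebv)
    if ss_t == 0 or ss_g == 0:
        return float("nan")

    # Sum of cross-products (proportional to the covariance).
    cross = sum((t - mean_t) * (g - mean_g) for t, g in zip(tbv, gebv))
    return cross / math.sqrt(ss_t * ss_g)
```

A population in which every individual carries the same breeding value for a trait returns NaN rather than a number, which corresponds to the missing points in a plot of accuracy per cycle.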
Expected Formula-based GS Accuracy
GS accuracies determined using Eq. 3 ranged from 0.07 for SY to 0.63 for DF (Fig. 2). In general, prediction accuracy decreased over the 10 cycles. The decline was smaller with parental population sizes of 15 and 30. Prediction accuracies were higher with larger parental population sizes. The strategies had similar accuracies and followed similar trends when the parental population size was small. However, with large parental population sizes, mass selection had a much greater prediction accuracy than the other strategies, and its accuracy remained relatively high. For DF under mass selection with 100 parents, the accuracy decreased from 0.63 to 0.47 over 10 cycles. Meanwhile, for WM tolerance under mass selection with 100 parents, the accuracy decreased from 0.46 to 0.39 over 10 cycles. Lastly, for SY under mass selection with 100 parents, the accuracy decreased from 0.43 to 0.29 over 10 cycles. In most breeding scenarios, bulk breeding resulted in the lowest prediction accuracies. For DF with 15 parents, the accuracy in bulk breeding decreased from 0.18 to 0.10 over 10 cycles. For WM tolerance with 15 parents, accuracy declined from 0.11 to 0.09 over 10 cycles when bulk breeding was used. For the selection of SY with 15 initial parents, accuracy with bulk breeding decreased from 0.09 to 0.07 over 10 cycles. Heritability had an impact on GS accuracy, where accuracy was highest under DF, followed by WM tolerance and then SY. However, selection strategies had similar accuracies when the parental population size was small, regardless of heritability.
Genetic gain with Model Update
The results from the model update indicated that there was a sharp increase followed by a rapid decline in genetic gain. The model update seemed to improve genetic gain only in the one or two cycles immediately after the update, after which genetic gain returned to the rates observed prior to the update. Conventional breeding was included alongside genomic selection as a comparison for the model update. Figure 3 shows that updating the GS model resulted in an increase in genetic gain after cycle 3 for mass selection, the pedigree method, and the modified pedigree method when selecting for DF and SY. However, it led to a decrease in genetic gain immediately after cycle 3, followed by an increase after cycle 4, and a decrease after cycle 5 for all strategies when selecting for WM tolerance. When compared to conventional breeding, genomic selection led to higher levels of genetic gain for certain strategies in the cycle following the GS model update. For DF, genetic gain for mass selection was 23.8% higher under genomic selection than under conventional breeding. Meanwhile, the pedigree method and the modified pedigree method were 30.2% and 34.0% higher in genomic selection, respectively. For WM tolerance, mass selection led to 17.0% greater genetic gain using genomic selection than conventional breeding, while the modified pedigree method under genomic selection resulted in 9.94% higher genetic gain. Finally, for SY, mass selection, the pedigree method, and the modified pedigree method resulted in 22.7%, 11.3%, and 18.2% higher genetic gain, respectively, using genomic selection compared to conventional breeding. For all other breeding scenarios, there was little to no difference between genomic selection and conventional breeding in the cycle after the GS model update.
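The percentage comparisons between genomic selection and conventional breeding can be read as relative differences in genetic gain. The sketch below shows that arithmetic under two assumptions that are not stated in the text: the percentages are expressed relative to the conventional-breeding gain, and gains are entered as positive magnitudes in the favourable direction (so a reduction in WM severity is entered as a positive number). The function name and the example values are hypothetical.

```python
def percent_gain_advantage(gain_gs, gain_conventional):
    """Relative advantage (%) of genomic selection over conventional breeding.

    Both gains are assumed to be magnitudes in the favourable direction
    for the trait, measured over the same cycle.
    """
    if gain_conventional == 0:
        raise ValueError("conventional gain is zero; relative difference undefined")
    return 100.0 * (gain_gs - gain_conventional) / gain_conventional


# Hypothetical gains chosen only to illustrate the calculation.
example = percent_gain_advantage(1.238, 1.0)
print(f"{example:.1f}% higher under genomic selection")
```

A value near zero corresponds to the scenarios in which there was little to no difference between the two approaches.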
In Silico Realized GS Accuracy with Model Update
In addition, the in silico realized accuracies fluctuated from one cycle to the next regardless of the model update. GS accuracies, represented by the correlations between the TBV and the GEBV, were obtained and plotted over 6 cycles (Fig. 4). Updating the model generally did not improve accuracies. Mass selection had the greatest variability in accuracy, having the highest accuracy in some cycles and the lowest in others. For DF, following the model update at cycle 4, there was a small improvement in accuracy under single seed descent and the modified pedigree method, where accuracies increased by 0.08 and 0.04, respectively. The other three strategies saw a decrease in accuracy.
From cycle 3 to cycle 4, WM tolerance GS accuracies declined by 0.09, 0.13, and 0.12 for mass selection, bulk breeding, and the pedigree method, respectively. GS accuracies increased by 0.14, 0.03, and 0.02 between cycle 3 and 4 for mass selection, bulk breeding, and single seed descent, respectively. Although mass selection resulted in an increase in accuracy after cycle 3, it rapidly dropped and became negative. Decreases in GS accuracy after the third cycle were observed for the pedigree method and the modified pedigree method. However, the strategy with the highest accuracy in the final cycle was the pedigree method, with a value of 0.06.
Expected Formula-based GS with Model Update
GS accuracies determined using Eq. 3 ranged from 0.08 for SY to 0.58 for DF (Fig. 4). For all breeding scenarios, a general trend was observed where an increase in accuracy occurred after the GS model update at cycle 3, followed by a decline from cycle 4 to 5. The peak accuracy predicted from the DF simulation was 0.58, occurring at cycle 4 with mass selection. For WM tolerance, the peak accuracy was 0.51, occurring at cycle 4 with the pedigree method. The peak accuracy for SY was 0.38 at cycle 4 using the pedigree method.
True breeding values (TBV) with Model Update
After the model update, the true breeding values were determined and plotted over 6 cycles (Fig. 5). In general, updating the model resulted in an increase in TBVs. At cycle 3 (where the update occurred), there was an increase in the TBV for all breeding scenarios. For DF, the TBV increased by 9.53, 7.05, 6.32, 4.34, and 6.38 from cycle 3 to cycle 4 for mass selection, bulk breeding, single seed descent, the pedigree method, and the modified pedigree method, respectively. For WM tolerance, TBVs rose by 15.0, 38.3, 9.44, 0.73, and 10.7 from cycle 3 to cycle 4 for mass selection, bulk breeding, single seed descent, the pedigree method, and the modified pedigree method, respectively. Lastly, for SY from cycle 3 to cycle 4, mass selection, bulk breeding, single seed descent, the pedigree method, and the modified pedigree method had increases in TBVs of 761, 299, 134, 129, and 134, respectively. TBVs appeared to plateau after cycle 4 for DF and SY. However, for WM tolerance, TBVs rapidly declined after cycle 4.
Genomic estimated breeding values (GEBV) with Model Update
Genomic estimated breeding values were plotted over 6 cycles (Fig. 5). For DF and SY, GEBVs increased following the model update. For WM tolerance, GEBVs increased and subsequently declined. For DF, there was a pronounced increase from cycle 3 to cycle 4 for mass selection, the pedigree method, and the modified pedigree method, with increases of 19.0, 18.8, and 20.6, respectively. Smaller increases were observed for the other two strategies. GEBVs increased by 7.66 and 6.02 between cycle 3 and 4 for bulk breeding and single seed descent, respectively. For WM tolerance, a large increase was observed for bulk breeding, while single seed descent led to the smallest increase in GEBV. From cycle 3 to cycle 4 for WM tolerance, GEBVs increased by 4.03, 39.1, 9.43, 16.7, and 13.1 for mass selection, bulk breeding, single seed descent, the pedigree method, and the modified pedigree method, respectively. Lastly, for SY, all five strategies resulted in an increase in GEBV following the model update, with the greatest increase observed in mass selection and the smallest increase in single seed descent. For SY, GEBVs increased by 1867, 367, 130, 886, and 1251 for mass selection, bulk breeding, single seed descent, the pedigree method, and the modified pedigree method, respectively.
Principal component analysis was conducted to show the overall result of the simulation with the model update. Figure 6 shows the PCA plot for DF, where 80.53% of the variance is explained by the first two principal components. Notably, the eigenvectors for TBV and GEBV were close together and pointed in the same direction. The eigenvectors for genetic gain and Hamming distance pointed in similar directions. Towards the right side of the PCA plot, there were two clusters for mass selection that formed on the extreme of the GEBV and TBV eigenvectors. On the opposite side, a cluster containing all five strategies was found in the extremes of both the Hamming distance vector and the genetic gain vector. In the direction of the eigenvector for fixation of favourable alleles, there was a cluster consisting of bulk breeding. No clusters formed in the extreme of the effective population size eigenvector. Near the center of the plot was a cluster consisting of the pedigree method and single seed descent.
The first two principal components in the WM PCA (Fig. 6) explained 79.13% of the variance. As with DF, the eigenvectors for GEBV and TBV were close to each other. Meanwhile, the eigenvectors for Hamming distance and genetic gain were located close together. In the extreme of the Hamming distance eigenvector, there was a cluster consisting of mass selection, while for the genetic gain eigenvector, there was a cluster for the modified pedigree method. In the direction of the eigenvector for the fixation of favourable alleles, there was a cluster for bulk breeding.
For SY (Fig. 6), the first two principal components described 79.47% of the variance. The GEBV and TBV eigenvectors are very close together and point in similar directions. On the opposite end are the Hamming distance and genetic gain eigenvectors, which are located close together. Towards the extreme of the Hamming distance eigenvector is a cluster made of mass selection and bulk breeding. Between the eigenvectors for fixed favourable alleles and effective population size, there was a cluster corresponding to bulk breeding.
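The percentages of variance explained and the eigenvector directions reported for the three PCA plots come from a decomposition of the simulation summary variables. The following is a minimal sketch of that kind of calculation, assuming the six variables are standardized before the decomposition (the text does not say whether they were) and using NumPy's singular value decomposition. The data are synthetic and generated only for illustration, and the column names are hypothetical labels for the variables named in the text.

```python
import numpy as np


def pca_explained_variance(X):
    """Return (explained variance ratios, loadings, scores) for a data matrix.

    Rows are observations (e.g. strategy-by-cycle combinations) and columns
    are variables. Columns are centred and scaled to unit variance, which
    assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Rows of Vt are the principal axes (eigenvectors of the correlation matrix).
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt.T
    return explained, Vt, scores


# Synthetic data only: TBV and GEBV share a common signal, the rest are noise.
columns = ["TBV", "GEBV", "genetic_gain", "hamming_distance",
           "fixed_favourable_alleles", "effective_population_size"]
rng = np.random.default_rng(0)
signal = rng.normal(size=(30, 1))
data = np.hstack([
    signal + 0.1 * rng.normal(size=(30, 1)),
    signal + 0.1 * rng.normal(size=(30, 1)),
    rng.normal(size=(30, 4)),
])

explained, loadings, scores = pca_explained_variance(data)
print(f"PC1 + PC2 explain {100 * explained[:2].sum():.2f}% of the variance")
for name, pc1, pc2 in zip(columns, loadings[0], loadings[1]):
    print(f"{name}: PC1 loading {pc1:+.2f}, PC2 loading {pc2:+.2f}")
```

Variables whose loadings on the first two components are similar, as TBV and GEBV are in this synthetic example, appear as eigenvectors that lie close together and point in the same direction on a biplot.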