Genomic prediction of agronomic and malting quality traits in six-rowed winter barley

doi:10.21203/rs.3.rs-1641581/v1


Research Article


This work is licensed under a CC BY 4.0 License

Journal publication: published 19 May 2023 in Euphytica. This is the latest preprint version.

While two-rowed barley is usually preferred for malting and beer-making, six-rowed malting barley varieties appeared in Europe around 30 years ago, and several breeders now run dedicated improvement programs on this germplasm.

In this study, we evaluated the feasibility of genomic prediction for yield and malting-related traits using 679 breeding lines from two French barley breeders, as well as a set of recently registered varieties. These lines were evaluated in five locations over two harvest years in an unbalanced design. Although the germplasm from the two breeders does show some trend towards differentiation, the whole panel did not show a clear-cut genetic structure overall. The predictive ability of GBLUP was evaluated through random cross-validation within and across breeder sets, and through cross-prediction between breeder sets. Results show moderate to high predictive ability (PA), particularly for malt friability and β-glucan content, for which a PA of 0.8 was obtained with training populations as small as 96 registered varieties and across breeding sets. The long range of useful linkage disequilibrium in this germplasm means that as few as 2,000 to 5,000 markers suffice to obtain high PA. Other prediction methods such as Bayesian LASSO, Bayes Cpi or EGBLUP did not improve predictive ability. These results are very encouraging for implementing genomic selection (GS) of malting quality traits in applied breeding programs.

Hordeum vulgare L.

genomic selection

malt quality

genetic improvement

Genomic models gave high predictive ability for agronomic and malt related traits in 6-rowed winter barley.

Barley (Hordeum vulgare L.) is one of the founder crops of Old World agriculture. It was likely the first cereal domesticated in the Middle East, around 8,000 BC, from the wild species Hordeum vulgare ssp. spontaneum, as suggested by archaeological remains of barley grains found at various sites in the Fertile Crescent (Zohary and Hopf, 1993). Badr et al. (2000) demonstrated the monophyletic nature of barley domestication. Their results supported the hypothesis that the Israel–Jordan area is the region where barley was brought into cultivation, and that the Himalayas can be considered a region of diversification of domesticated barley.

The wild progenitor (H. vulgare ssp. spontaneum) has a two-rowed phenotype, with strictly rudimentary lateral rows. Neolithic cultivators of barley likely selected a phenotype with a six-rowed spike in order to increase grain number and thereby grain yield. The gene responsible for the six-rowed spike in barley, vrs1 (six-rowed spike 1), was isolated by positional cloning (Komatsuda et al., 2007). The wild-type Vrs1 allele (for two-rowed barley) encodes a transcription factor that includes a homeodomain with a closely linked leucine zipper motif. Loss of function of Vrs1 results in complete conversion of the rudimentary lateral spikelets of two-rowed barley into fully developed fertile spikelets in the six-rowed phenotype. Phylogenetic analysis demonstrated that the six-rowed phenotype originated repeatedly, at different times and in different regions, through independent mutations of Vrs1.

Six-rowed barley is therefore usually preferred for feed production because it is higher yielding, although it was also traditionally used in US beers, while two-rowed barley is more often used for malting and beer-making. Indeed, two-rowed barley has more favourable characteristics for beer-making and its first step, malting. Malt is dried germinated barley grain. Malting quality depends on grain size (more particularly its homogeneity), friability and diastatic power, i.e. the ability to digest starch into fermentable sugar, which is later converted into ethanol in the brewing process. Malting barley usually has a lower protein content, since protein in the extract can make beer cloudy. Two-rowed barley, which has a larger and more homogeneous grain size, is traditionally preferred by the malting and brewing industry in Europe, e.g. for English ale-style beers or traditional German beers. France, which ranks first in Europe for malting barley production and first worldwide for malt export (1.2 Mt annually, 80% of French production), also uses six-rowed malting barley, barley being the second most grown small-grain cereal in France after bread wheat. That is why French breeders have developed specific breeding programmes of six-rowed barley for malting quality. Most breeding schemes rely on doubled haploid production as a faster breeding method.

Malting quality is evaluated by a micro-malting test (e.g. Haslemore et al., 1982), which requires a large quantity of grain and takes time (usually more than 4 days). Such tests are usually applied to a limited number of lines that have already been screened for agronomic traits such as yield or disease tolerance, just before official registration trials. Selection pressure on malting quality is therefore quite low, which leads to slow genetic gain.

Genomic selection (GS) was first proposed by Meuwissen et al. (2001), who applied ridge and Bayesian regression models to animal populations to predict breeding values. Marker effects are first estimated from the genotypic and phenotypic data of a training population. Shrinkage methods such as ridge regression or Bayesian approaches must be used in the usual case where the number of markers is higher than the number of observations (phenotypes). Marker effects are then used to calculate breeding values in the target population, for which only genotypic data are available, and selection is based on these genomic estimated breeding values (GEBV). This method has been used successfully in dairy cattle breeding (Goddard and Hayes, 2007). In dairy cattle (and particularly for bulls), the advantages of GS over classical breeding are obvious: genotyping is much cheaper than progeny testing, and GS can be applied at birth, while progeny testing requires more than 7–8 years. GS of dairy bulls therefore allowed early selection in a larger population, nearly doubling the genetic gain per unit of time while reducing the cost of proving bulls by 92% (Schaeffer, 2006). Although the benefit is less obvious than in dairy cattle, plant breeding could benefit from GS, provided that 1) genotyping is cheaper than phenotyping and/or can be applied at an earlier stage on a larger set of candidates, and 2) the prediction accuracy of GS is similar to that of phenotypic selection (Bernardo & Yu, 2007; Crossa et al., 2010; Heffner et al., 2009; Jannink et al., 2010; De los Campos et al., 2013).
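The two-step logic described above (estimate marker effects in a training population, then sum them over the genotypes of untested candidates) can be sketched with ridge regression in Python. This is a minimal illustration under stated assumptions, not the rrBLUP or BWGS code used in this study: the function names, the simulated doubled-haploid data and the fixed shrinkage parameter `lam` are hypothetical (in practice the shrinkage is derived from REML variance components).

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Ridge-regression (rrBLUP-type) estimates of marker effects.

    X   : (n, m) training genotypes coded -1/0/1 (lines x markers)
    y   : (n,) training phenotypes
    lam : shrinkage parameter, sigma2_e / sigma2_marker in the BLUP view
    """
    mu = y.mean()
    x_mean = X.mean(axis=0)
    Z = X - x_mean                      # centre each marker column
    n = Z.shape[0]
    # Dual form: beta = Z'(ZZ' + lam I)^-1 (y - mu), an n x n solve that gives the
    # same result as (Z'Z + lam I)^-1 Z'(y - mu) but is cheaper when m >> n.
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n), y - mu)
    return mu, x_mean, Z.T @ alpha

def predict_gebv(X_new, mu, x_mean, beta):
    """GEBV of candidates that are genotyped but not phenotyped."""
    return mu + (X_new - x_mean) @ beta

# Simulated doubled-haploid lines (homozygous, coded -1 or 1); toy data only
rng = np.random.default_rng(1)
n_train, n_target, m = 200, 100, 300
X = rng.choice([-1, 1], size=(n_train + n_target, m)).astype(float)
true_beta = np.zeros(m)
qtl = rng.choice(m, size=20, replace=False)
true_beta[qtl] = rng.normal(size=20)
g = X @ true_beta                       # true genetic values
y = g + rng.normal(scale=0.5 * g.std(), size=g.size)

mu, x_mean, beta = ridge_marker_effects(X[:n_train], y[:n_train], lam=50.0)
gebv = predict_gebv(X[n_train:], mu, x_mean, beta)
accuracy = np.corrcoef(gebv, g[n_train:])[0, 1]
print(f"corr(GEBV, true genetic value) in target lines = {accuracy:.2f}")
```

The dual (n × n) solve reflects the situation noted above where markers outnumber phenotyped lines; it returns the same marker effects as the textbook (m × m) ridge solution.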

Condition 1 is met by cereal quality traits, such as breadmaking quality in wheat or malting quality in barley, which are relatively expensive to measure. In this manuscript, we analysed agronomic and malting-related traits in a set of registered varieties and breeding lines of six-rowed winter barley, focusing on the genomic predictive ability for malting quality traits, as a feasibility study of GS implementation in malting barley breeding (condition 2 above).

Plant material

Two French breeders, here named Breeder1 and Breeder2, provided sets of proprietary doubled haploid (DH) breeding lines of six-rowed winter barley that had been preliminarily selected for adaptive traits such as plant height, flowering, lodging or disease resistance. For reasons of commercial competition, each set of proprietary lines was evaluated separately by its breeder in 2–3 locations during two growing seasons, 2017/2018 and 2018/2019. To provide connectivity across the whole dataset, a set of registered varieties, here named “founders” (since they are often used as parents of the breeders’ DH lines), was evaluated in common by the two breeders. Breeder1 provided 259 proprietary DH breeding lines and Breeder2 provided 315.

One hundred and five “founder lines”, i.e. registered varieties freely available under the UPOV agreement, were used by each breeder to enable correction for main location effects.

Genotyping data

The barley 50K iSelect SNP array (Bayer et al., 2017) was used to assess genome-wide polymorphism in the 679 breeding lines and cultivars. From the initial 44,040 SNPs, quality control and filtering for missing data (< 20% per SNP), heterozygous calls (< 5%) and minor allele frequency (> 1%) led to a subset of 24,945 SNPs, of which 24,101 were mapped on the barley physical map V2 and used in further statistical analyses. The average rate of heterozygosity was very low (1.76%), whether computed per marker or per barley line. Moreover, 80% of barley lines had less than 1% heterozygous markers, as expected for DH lines. Some lines are more heterozygous, likely due to cross-pollination during seed multiplication. Some markers (4.2%) had more than 5% heterozygous calls, possibly due to genotype calling errors, and were discarded from further analyses. Notably, only 850 out of ≈25,000 markers were discarded by all three quality criteria.
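The three SNP filters amount to simple column-wise statistics. The sketch below assumes genotypes coded −1/0/1 (0 = heterozygous call) with NaN for missing calls; the function name `snp_qc_mask` and the toy matrix are hypothetical, and the code illustrates the thresholds reported above rather than the pipeline actually used.

```python
import numpy as np

def snp_qc_mask(X, max_missing=0.20, max_het=0.05, min_maf=0.01):
    """Boolean mask of SNPs passing missing-data, heterozygosity and MAF filters.

    X: (lines, SNPs) float array coded -1/0/1 (0 = heterozygous call),
       np.nan for missing calls. Default thresholds follow the text.
    """
    n_lines = X.shape[0]
    n_obs = (~np.isnan(X)).sum(axis=0)
    missing_rate = 1.0 - n_obs / n_lines
    # Heterozygous calls among observed calls (nan == 0 evaluates to False)
    het_rate = (X == 0).sum(axis=0) / np.maximum(n_obs, 1)
    # Frequency of the allele coded +1, ignoring missing calls
    p = np.nanmean((X + 1.0) / 2.0, axis=0)
    maf = np.minimum(p, 1.0 - p)
    return (missing_rate < max_missing) & (het_rate < max_het) & (maf > min_maf)

nan = np.nan
X = np.column_stack([
    [-1, 1, -1, 1, -1, 1, -1, 1, -1, 1],        # clean SNP
    [-1, nan, nan, nan, 1, 1, -1, 1, -1, 1],    # 30% missing
    [1] * 10,                                   # monomorphic (MAF = 0)
    [1, 0, -1, 1, -1, 1, -1, 1, -1, 1],         # 10% heterozygous
    [nan, 1, 1, -1, -1, 1, 1, -1, 1, -1],       # 10% missing, otherwise clean
]).astype(float)
mask = snp_qc_mask(X)
print(mask)  # expected: True, False, False, False, True
```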

Missing data were imputed using an EM algorithm (Poland et al., 2012) implemented in the A.mat function of the rrBLUP package (Endelman, 2011).

A genomic relationship matrix K was computed from the 24K markers according to the VanRaden (2008) equation, using the A.mat function of the rrBLUP package:

\(K = \frac{WW^{T}}{2\sum_{k} p_{k}(1-p_{k})}\)

where W is the centered N × M marker matrix of the N lines, with W_ik = X_ik + 1 − 2p_k, X_ik being the genotype of the i-th individual at the k-th marker coded as {−1, 0, 1}, and p_k the allele frequency at the k-th marker.
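The relationship matrix above can be computed directly from the centered marker matrix. The NumPy sketch below assumes complete (already imputed) genotypes coded −1/0/1 and estimates p_k from the panel itself; the function name is hypothetical, and the sketch illustrates the formula rather than re-implementing rrBLUP's A.mat and its options.

```python
import numpy as np

def vanraden_k(X):
    """K = W W' / (2 * sum_k p_k (1 - p_k)), VanRaden (2008).

    X: (N lines, M markers) genotypes coded -1/0/1, no missing values.
    """
    p = (X + 1).mean(axis=0) / 2.0      # frequency of the allele coded +1
    W = X + 1 - 2 * p                   # W_ik = X_ik + 1 - 2 p_k
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

# Two fully homozygous lines with opposite alleles at both markers
K_toy = vanraden_k(np.array([[1, 1], [-1, -1]]))
print(K_toy)  # expected K = [[2, -2], [-2, 2]]
```

For fully homozygous lines such as DH lines, diagonal elements close to 2 (that is, 1 + F with F ≈ 1) are expected, as in this two-line toy example.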

A principal coordinate analysis (PCoA, with the cmdscale command in R) was applied to the Rogers’ distance matrix (Rogers, 1972), computed with the dist command in R, to illustrate the additive relationships among the breeding lines and registered varieties studied.
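As an illustration of this step, the sketch below computes Rogers’ distance for biallelic markers and then performs classical multidimensional scaling, the technique implemented by R's cmdscale. It is a hedged NumPy sketch on a hypothetical three-line example, assuming −1/0/1 coding; it is not the R code used for Figure 2.

```python
import numpy as np

def rogers_distance(X):
    """Rogers (1972) distance between lines for biallelic markers coded -1/0/1.

    With two alleles, sqrt(0.5 * sum_over_alleles(diff^2)) reduces to |q_i - q_j|,
    where q = (X + 1) / 2 is the within-line frequency of the +1 allele;
    the distance is the mean of this quantity over markers.
    """
    q = (X + 1) / 2.0
    n = q.shape[0]
    D = np.empty((n, n))
    for i in range(n):                      # row by row to limit memory use
        D[i] = np.abs(q - q[i]).mean(axis=1)
    return D

def pcoa(D, k=2):
    """Classical multidimensional scaling (principal coordinate analysis)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

X = np.array([[1, 1, 1, 1],
              [-1, -1, -1, -1],
              [1, 1, -1, -1]])
D = rogers_distance(X)   # expected [[0, 1, 0.5], [1, 0, 0.5], [0.5, 0.5, 0]]
coords = pcoa(D, k=2)    # the third line lies midway between the first two on axis 1
print(coords[:, 0])
```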

Phenotypic data

Breeder1 evaluated its 259 DH lines and 104 of the 105 founder lines in three locations in France (Thoiry, Auffay and Warmeriville) over two growing seasons, i.e. harvest years 2018 (1,872 plots) and 2019 (1,327 plots). At each location, an unbalanced trial was carried out with most breeding lines unreplicated and a few control lines replicated 15–20 times (cvs Etincel, Pixel and Visuel) and up to 50–60 times (cv KWS-Tonic) in 2018. In 2019, a lighter design was used, with only three blocks per site and a single control (cv Pixel) replicated 42–50 times. Because these controls were spread randomly across the trial rather than placed in fixed numbers within each block, spatial models were used to correct for field heterogeneity. All trial plots were managed according to local farmer practice for malting barley, including fungicide treatment.

Breeder2 evaluated its 315 DH lines and 91 of the 105 founder lines in two locations in France, Cupperly and Premesque, over the same two growing seasons (harvest years 2018 and 2019). At each location, an unbalanced trial was carried out with most breeding lines unreplicated and a few control lines replicated 10 times (cv Amistar) or 20–50 times (cvs Casino and Etincel) in 2018. As for Breeder1 trials, spatial models were used to correct for field heterogeneity.

The common set of variables available for all plots in every location included yield (dt/ha), protein content (%), thousand-grain weight (TGW, g), test weight (kg/hl), calibration (% kernels > 2.5 mm) and heading date (days from 1 January), hereafter called agronomic traits.

In addition, traits related to malting ability (hereafter called malting traits), namely friability, extract, viscosity and β-glucan content, were evaluated by micro-malting tests on grain from a single plot, in fewer locations (Cupperly and Warmeriville in 2018, Premesque and Warmeriville in 2019). Since the replicated control plots were not evaluated for these traits, spatial correction was not possible.

Malt friability was assessed according to EBC (European Brewery Convention) method 4.15 (Friability, glassy corns and unmodified grains of malt by friabilimeter – International method): whole malt corns are fragmented by the mechanical action of the friabilimeter drum, and small fragments of physically modified material pass through the drum mesh whereas larger, unmodified fragments are retained.

Malt extract was determined according to EBC method 4.5.1 (Extract of malt – Congress mash): finely ground malt is mashed and filtered following a standard procedure, and extract is determined from the gravity of the wort. It measures the potential of malt to produce wort solubles under a standard mashing program. This procedure is also used to determine wort viscosity and soluble β-glucan content.

Wort viscosity is an important parameter for estimating malt quality: the lower the viscosity, the better the modification of grains during germination. After the Congress mash, wort viscosity at 20°C is determined with a calibrated viscometer according to EBC method 8.4 (Viscosity of laboratory wort from malt).

Wort viscosity is linked to the soluble β-glucan content of the Congress mash wort. A well-modified malt contains a limited quantity of these soluble polysaccharides, which are determined according to EBC method 4.16.2 (High molecular weight β-glucan content of malt and malt wort: fluorimetric method). The fluorochrome Calcofluor forms complexes with high-molecular-weight β-glucan (MW above 10,000) in solution (malt wort). The measuring apparatus is calibrated against standards made of purified barley β-glucan.

Statistical analysis of phenotypic data

On each trial (i.e. year × location combination), the randomly replicated controls were used to correct agronomic traits for field spatial heterogeneity using the SpATS R package (Rodríguez-Álvarez et al., 2018). The spatially adjusted plot values were then analysed with a linear mixed model (LMM) using the lme4 library (Bates et al., 2015) in R (R Core Team, 2020), with genotype and its interactions considered as random. For quality traits, no spatial correction was possible owing to the lack of replicated controls, and raw data were used instead in the LMM of the incomplete block design. Indeed, with highly unbalanced designs such as ours, the conditional modes of the random effects are known to be better corrected for fixed effects (environments) than the adjusted means from fixed-effects models. The model was:

\(y_{ijk} = \mu + y_j + (y{:}s)_{jk} + g_i + (gy)_{ij} + (gs)_{ik} + \varepsilon_{ijk}\)   (1)

where y_ijk is the phenotypic value of the i-th genotype in the j-th year at the k-th site, µ is the overall mean, g_i is the random effect of the i-th genotype, y_j is the fixed effect of the j-th year, (y:s)_jk is the fixed effect of the k-th location nested within the j-th year, (gy)_ij is the random interaction between the i-th genotype and the j-th year, (gs)_ik is the random interaction between the i-th genotype and the k-th location, and ε_ijk is the residual error, which also contains the three-way interaction since most genotypes were not replicated.

Model (1), with g_i and its interactions with year and site as random effects, was used to estimate the variance components σ²_g, σ²_gy, σ²_gs and σ²_e (function VarCorr in the R package lme4), together with their confidence intervals (function confint in R).

Since the experimental design was highly unbalanced, classical heritability formulae based on variance component ratios are poorly suited, and we instead used formula (20) of Piepho et al. (2007):

\(h^2 = 1 - \frac{\bar{v}_{\mathrm{BLUP}}}{2\sigma^2_g}\),

where v̄_BLUP is the mean variance of a difference between two genotype BLUPs (Cullis et al., 2006).
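The formula can be evaluated from the prediction-error variance (PEV) matrix of the genotype BLUPs. In practice this matrix would come from the fitted mixed model; the hypothetical helper below simply takes it as input. As a check, a second hypothetical helper builds the PEV matrix of a balanced one-way design from Henderson's mixed-model equations, where the formula should reduce to the classical entry-mean heritability. This is a sketch under these assumptions, not the code used for Table 1.

```python
import numpy as np

def cullis_h2(pev, sigma2_g):
    """Generalized heritability 1 - vbar / (2 sigma2_g) (Cullis et al. 2006).

    pev: (n, n) prediction-error variance-covariance matrix of genotype BLUPs.
    vbar is the mean over pairs i < j of pev_ii + pev_jj - 2 pev_ij.
    """
    n = pev.shape[0]
    # sum over i < j of (pev_ii + pev_jj - 2 pev_ij) equals n * trace - total sum
    vbar = 2.0 * (n * np.trace(pev) - pev.sum()) / (n * (n - 1))
    return 1.0 - vbar / (2.0 * sigma2_g)

def pev_one_way(n_geno, n_rep, sigma2_g, sigma2_e):
    """PEV of BLUPs in y = 1 mu + Z u + e (balanced, mu fixed), taken from the
    inverse of Henderson's mixed-model equations."""
    Z = np.kron(np.eye(n_geno), np.ones((n_rep, 1)))    # plots x genotypes
    X = np.ones((n_geno * n_rep, 1))
    lam = sigma2_e / sigma2_g
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(n_geno)]])
    return sigma2_e * np.linalg.inv(lhs)[1:, 1:]        # genotype block

sigma2_g, sigma2_e, n_rep = 1.0, 2.0, 2
h2 = cullis_h2(pev_one_way(20, n_rep, sigma2_g, sigma2_e), sigma2_g)
print(round(h2, 3), sigma2_g / (sigma2_g + sigma2_e / n_rep))  # 0.5 0.5
```

In unbalanced designs such as the present one, the two quantities no longer coincide, which is why the PEV-based formula is preferred.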

Heritability of each trait was estimated either from the full dataset or separately for each breeder’s set and for the founder set.

The conditional modes (i.e. corrected for environmental main effects) of each genotype were then extracted from the LMM (function ranef in the lme4 R package) and used to describe trait distributions and pairwise correlations.

Since the genotype variance component was generally higher than its interaction variances, these genotypic conditional modes were further used to test the predictive ability of genomic selection models.

Genomic prediction models

The BWGS R software (Charmet et al., 2020) was used in this study to estimate the predictive ability of four genomic selection models, namely GBLUP, Bayes Cpi (Habier et al., 2011), LASSO (Park & Casella, 2008) and EGBLUP (Jiang and Reif, 2015). GBLUP assumes an infinitesimal model, with every marker having a small effect drawn from a single Gaussian distribution, while Bayes Cpi assumes that a proportion of markers have zero effect and the others have non-zero effects drawn from a scaled t-distribution. LASSO assumes a sparser distribution of QTL effects, shrinking most marker effects strongly towards zero. EGBLUP extends GBLUP with a “squared” relationship matrix to model pairwise (additive × additive) epistatic interactions, as described by Jiang and Reif (2015).
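GBLUP and the epistatic extension differ only in the relationship kernel used to predict untested lines. The sketch below, on simulated data, uses K for GBLUP and its element-wise (Hadamard) square K∘K as the “squared” matrix described above. It is a hedged illustration, not the BWGS implementation: the function names are hypothetical, the variance ratio `lam` is fixed instead of being estimated, and the way BWGS combines additive and epistatic kernels may differ (a sum such as K + K∘K with separate variances is another common choice).

```python
import numpy as np

def vanraden_k(X):
    """VanRaden (2008) relationship matrix from -1/0/1 genotypes."""
    p = (X + 1).mean(axis=0) / 2.0
    W = X + 1 - 2 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def kernel_blup(K, y, train, test, lam=1.0):
    """BLUP of test lines from training phenotypes and a relationship kernel K.

    g_test = K[test, train] (K[train, train] + lam I)^-1 (y_train - mean),
    with the variance ratio lam = sigma2_e / sigma2_g fixed, not estimated.
    """
    y_tr = y[train]
    mu = y_tr.mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    return mu + K[np.ix_(test, train)] @ np.linalg.solve(A, y_tr - mu)

# Simulated doubled-haploid panel (toy data, not the barley data)
rng = np.random.default_rng(7)
X = rng.choice([-1, 1], size=(300, 200)).astype(float)
g = X @ rng.normal(scale=0.1, size=200)             # many small additive effects
y = g + rng.normal(scale=0.6 * g.std(), size=300)
train, test = np.arange(200), np.arange(200, 300)

K = vanraden_k(X)
K_epi = K * K                                        # Hadamard square: additive x additive kernel
pa_gblup = np.corrcoef(kernel_blup(K, y, train, test), y[test])[0, 1]
pa_egblup = np.corrcoef(kernel_blup(K_epi, y, train, test), y[test])[0, 1]
print(f"GBLUP PA = {pa_gblup:.2f}, epistatic-kernel PA = {pa_egblup:.2f}")
```

Because the toy trait is purely additive, the epistatic kernel is not expected to help here; the example only shows where the two models differ.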

Model validation

To compare the models and estimate their predictive ability, different strategies were used:

1. Cross validation with 10 folds randomly sampled from a global population made of:

a. Breeder1 + Breeder2 + founder lines (N = 679)

b. Breeder1 lines + founder lines (N = 364)

c. Breeder2 lines + founder lines (N = 420)

d. Founder lines only (N = 105)

Each 10-fold cross-validation was replicated 50 times.

Strategies b and c estimate what each breeder can expect from its own material plus publicly available material, while strategy a measures the advantage of merging data sets from different breeders to train prediction models. Strategy d was used to assess the robustness of genomic prediction when training size decreases dramatically, and what could be achieved using publicly available material only.
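The replicated random k-fold scheme can be sketched as follows. This is a minimal illustration on simulated data with a GBLUP-type kernel predictor and a fixed variance ratio; the function names are hypothetical, and PA is computed here on the pooled out-of-fold predictions of each replicate, which is one common convention (BWGS may instead average per-fold correlations).

```python
import numpy as np

def vanraden_k(X):
    p = (X + 1).mean(axis=0) / 2.0
    W = X + 1 - 2 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(K, y, train, test, lam=1.0):
    """Kernel BLUP with a fixed variance ratio lam (an assumption, not REML)."""
    mu = y[train].mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    return mu + K[np.ix_(test, train)] @ np.linalg.solve(A, y[train] - mu)

def cv_predictive_ability(K, y, n_folds=10, n_reps=50, seed=0):
    """Replicated random k-fold CV; returns mean and SD of PA over replicates.

    Within a replicate, every fold is predicted from the other folds and PA is
    the Pearson correlation between pooled out-of-fold predictions and y.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    pas = []
    for _ in range(n_reps):
        folds = np.array_split(rng.permutation(n), n_folds)
        pred = np.empty(n)
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            pred[test] = gblup_predict(K, y, train, test)
        pas.append(np.corrcoef(pred, y)[0, 1])
    return float(np.mean(pas)), float(np.std(pas))

# Toy data standing in for genotypic conditional modes (not the study data)
rng = np.random.default_rng(11)
X = rng.choice([-1, 1], size=(200, 200)).astype(float)
g = X @ rng.normal(scale=0.1, size=200)
y = g + rng.normal(scale=0.6 * g.std(), size=200)
mean_pa, sd_pa = cv_predictive_ability(vanraden_k(X), y, n_reps=5)
print(f"PA = {mean_pa:.2f} +/- {sd_pa:.2f}")
```

Strategies a to d then differ only in which lines are passed as `X` and `y`.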

To assess whether the lower predictive ability obtained in strategies b–d compared with strategy a can be attributed to training size alone, we randomly sampled training sets of size N ∈ {50, 100, 200, 300, 400, 500} from the full data, with 50 replicates each. For illustration, we show results for yield and friability, the two traits with the most contrasting predictive abilities.
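The training-size experiment can be sketched as below. The text does not state which lines form the validation set; this sketch assumes that all lines not drawn for training are used for validation. Function names and simulated data are hypothetical.

```python
import numpy as np

def vanraden_k(X):
    p = (X + 1).mean(axis=0) / 2.0
    W = X + 1 - 2 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(K, y, train, test, lam=1.0):
    """Kernel BLUP with a fixed variance ratio lam (an assumption, not REML)."""
    mu = y[train].mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    return mu + K[np.ix_(test, train)] @ np.linalg.solve(A, y[train] - mu)

def pa_by_training_size(K, y, sizes, n_reps=50, seed=0):
    """Mean and SD of PA when N random lines train the model and the
    remaining lines are used for validation (assumed validation set)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    results = {}
    for size in sizes:
        pas = []
        for _ in range(n_reps):
            perm = rng.permutation(n)
            train, test = perm[:size], perm[size:]
            pas.append(np.corrcoef(gblup_predict(K, y, train, test), y[test])[0, 1])
        results[size] = (float(np.mean(pas)), float(np.std(pas)))
    return results

rng = np.random.default_rng(5)
X = rng.choice([-1, 1], size=(400, 200)).astype(float)
g = X @ rng.normal(scale=0.1, size=200)
y = g + rng.normal(scale=0.6 * g.std(), size=400)
res = pa_by_training_size(vanraden_k(X), y, [50, 100, 200, 300], n_reps=10)
for size, (mean_pa, sd_pa) in res.items():
    print(size, round(mean_pa, 2), round(sd_pa, 2))
```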

2. Across-population validation, using either:

a. Breeder1 + founder lines as training set and Breeder2 lines as validation set

b. Breeder2 + founder lines as training set and Breeder1 lines as validation set

c. Breeder1 + Breeder2 lines as training set and founder lines as validation set

Predictive ability was calculated as the Pearson correlation between predicted values and adjusted means from the LMM. To obtain confidence intervals of predictive ability in across-population prediction, we used a bootstrap method as described in Rutkoski et al. (2012).
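A generic percentile bootstrap for the predictive ability of a fixed validation set is sketched below: validation lines are resampled with replacement and the Pearson correlation is recomputed each time. This is an assumption-laden illustration on simulated values (hypothetical function name and data) and may differ in detail from the procedure of Rutkoski et al. (2012).

```python
import numpy as np

def bootstrap_pa_ci(pred, obs, n_boot=1000, alpha=0.05, seed=0):
    """Pearson PA and its percentile bootstrap confidence interval.

    Validation lines are resampled with replacement; nan correlations from
    degenerate resamples (constant values) are ignored in the quantiles.
    """
    rng = np.random.default_rng(seed)
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    n = len(pred)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = np.corrcoef(pred[idx], obs[idx])[0, 1]
    lo, hi = np.nanquantile(stats, [alpha / 2, 1 - alpha / 2])
    return np.corrcoef(pred, obs)[0, 1], lo, hi

# Simulated validation set (toy values, not the study data)
rng = np.random.default_rng(2)
obs = rng.normal(size=150)
pred = obs + rng.normal(scale=0.8, size=150)
r, lo, hi = bootstrap_pa_ci(pred, obs)
print(f"PA = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```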

Trait variation and summary statistics

Spatial adjustment of agronomic traits proved efficient in reducing residual error, as illustrated in Suppl. Fig. 1.

Table 1 shows the variance components of the random effects in model (1), estimated from the whole dataset, together with heritability on a variety-mean basis.

(table 1 around here)

The genotypic variance component σ²_g appears to be larger than the interaction components σ²_gs and σ²_gy, leading to heritabilities ranging from 0.4 (yield) to 0.8 (test weight and friability). The heritabilities of malting traits are all larger than 0.6, despite a smaller experimental design than that used for agronomic traits.

Correlation and Principal component analysis

Figure 1 shows the single-trait distributions and pairwise correlations of the conditional modes of genotypic effects for the ten traits studied. Among agronomic traits, the highest correlation (0.77) is between average grain weight (TGW) and calibration, as expected. Protein content is negatively correlated with yield (−0.32), but not as tightly as reported in bread wheat, e.g. −0.82 in French registration trials 1991–1999 (Oury et al., 2003). Moreover, high protein content is not sought in malting barley, since excess protein causes problems in the filtration process, as illustrated by the negative correlation between protein content and extract rate (−0.35). Thus, breeders aim to stabilize protein content, which is necessary to feed the yeast properly, rather than to increase it continuously.

(Fig 1 around here)

Among malting-related traits, the highest correlation (0.90) is found between viscosity and β-glucan content, followed by that between friability and extract (0.64). Both correlations were expected for causal reasons. Viscosity and β-glucan are negatively correlated with extract, which is favourable for breeding, since extract is to be increased while viscosity is to be reduced. Malting traits are weakly correlated with agronomic traits, the largest (in absolute value) being the negative correlation between extract and protein content (−0.42). This suggests that agronomic traits and malting traits can be improved independently.

These correlations can also be seen on the first two axes of the principal component analysis shown in Suppl. Figure 2. It clearly shows the two groups of tightly correlated malting traits, in opposite positions along axis 1, while the agronomic traits, particularly TGW and calibration, are mostly supported by axis 2 and are thus independent of malting traits; protein content is poorly represented in this plane, being less correlated with all other variables. Heading date, also poorly represented on axes 1–2, is therefore only weakly correlated with agronomic or quality traits.

Molecular Data

The 24,101 filtered markers were fairly evenly distributed among chromosomes, ranging from 2,505 on chromosome 4H to 4,604 on chromosome 3H. The scatter plot of the 679 breeding lines and cultivars on the first two axes of the principal coordinate analysis of the Rogers’ distance matrix is shown in Figure 2.

(Fig 2 around here)

The clouds of the two breeders’ lines show both overlapping regions and breeder-specific ones. Cultivars are spread more widely over the whole graph, with a higher density in the central zone where the two breeders’ lines overlap. This may be explained by both breeders using cultivars as parents of crosses, which explains the overlap, while each breeder also has its own source of parents, which explains an incipient divergence between the two sets of breeding lines. However, the overlap seems large enough to anticipate successful cross-prediction between the two breeders, i.e. using one breeder’s set for training and the other’s for validation.

Genomic prediction

Cross-validation and forward-validation predictive abilities obtained with the widely used GBLUP method are presented in Table 2.

(table 2 around here)

Random cross-validation using the whole set of lines (N=679) shows moderate predictive ability for yield and protein content (0.45–0.50), and good to very good predictive ability for all quality and malting-related traits. In particular, predictive abilities for traits measured by the micro-malting test (last four rows) are all larger than 0.65, reaching 0.80 for friability. This is very encouraging regarding the possibility of efficiently screening more candidates (at lower cost) and/or at an earlier stage of the breeding scheme, thereby enabling faster genetic gain for malting quality traits.

Columns 2 and 3 show predictive abilities in random cross-validation using lines from a single breeder + founder lines, i.e. what a single breeder can hope to achieve on its own, without sharing data with another breeder. The differences from column 1 are 1) a smaller population size (N=359 and N=410, respectively), which is expected to yield lower predictive abilities, and 2) phenotypes from a single breeder come from the same set of environments, which is expected to give higher repeatability (broad-sense heritability) of the measured traits. These two effects seem to balance each other, since predictive abilities are nearly as good as those in column 1 (whole dataset), and even higher for some traits of Breeder1, despite the smaller size of its material (N=359 vs N=410 for Breeder2).

Column 4 shows predictive abilities obtained by cross-validation within a very small training set (N=95) made of the founder lines evaluated in common by both breeders. Although they are more variable than those obtained with the largest training set (standard deviation 2–4 times larger), they are unexpectedly high, particularly for malting traits.

To assess whether the predictive ability of these subsets is driven by training size alone, we randomly sampled training sets of limited size from the whole dataset. Figure 3 shows the predictive abilities obtained for yield (Figure 3a) and friability (Figure 3b) using random sampling versus predefined subsets (single breeder and/or founder lines).

(Fig 3 around here)

As expected from theory, when sampling is random, predictive ability decreases and its variability increases as the training set becomes smaller. Using training sets from a single breeder and/or registered varieties leads to contrasting results. For both traits, PAs from Breeder2 + founder lines are close to those of random samples of similar size, while Breeder1 + founder lines give slightly higher PAs, and founder lines alone (N=95) give higher PAs than expected from their size. The advantage of founder lines over random sampling is particularly pronounced for yield, where the PA obtained with the 95 founder lines is higher than that obtained by cross-validation using the whole population.

These differences between predictive abilities of training sets of similar size can hardly be attributed to the average coancestry between lines of the training set (and of the validation set, since it is randomly sampled in cross-validation). Indeed, the average kinship coefficients (after normalization of the K matrix from the A.mat function) within each subset are very similar: 0.195, 0.200 and 0.201 for Breeder2, Breeder1 and founders, respectively. There is no clear-cut structure among the lines, and none that fits the a priori grouping by breeder origin (Suppl. Figure 3).

Another explanation could be that founder lines were evaluated by both breeders, hence in 5 locations each year, instead of only 2 or 3 locations for each breeder’s own lines. This should be visible in the broad-sense heritabilities estimated from a single subset of lines, shown in Suppl. Table 1. Heritabilities estimated on founder lines only are always higher than those estimated on the whole dataset, which may partly explain the higher predictive ability shown in Figure 3a. Moreover, heritabilities estimated on Breeder2’s material are always lower than those estimated on Breeder1’s lines, which is consistent with the relative positions of their predictive abilities in Figure 3.

Figure 4 shows the effect of the number of randomly selected markers on predictive ability for the same two traits. As expected, predictive ability increases and its standard error decreases as the number of markers increases, up to a plateau reached with as few as 2,000 markers, which are enough to obtain predictive abilities close to, and as reliable as, those obtained with the full marker set. The most likely explanation is that linkage disequilibrium (LD) extends far enough that each of the 2,000 markers is in LD with its neighbours, so that together they capture the effect of any QTL lying between them. To test this hypothesis, we estimated the decay of LD with physical distance between markers.

Suppl. Figure 4 illustrates this decay on chromosome 1H.

(Fig 4 around here)
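The marker-number experiment shown in Figure 4 can be mimicked on a toy genome with long-range LD, built here as hypothetical blocks of nearly identical markers. The sketch draws random marker subsets, rebuilds K and predicts a random validation set with a fixed-ratio kernel BLUP; all names, block sizes and effect sizes are assumptions chosen only to make the plateau visible, not parameters of the barley data.

```python
import numpy as np

def vanraden_k(X):
    p = (X + 1).mean(axis=0) / 2.0
    W = X + 1 - 2 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(K, y, train, test, lam=1.0):
    """Kernel BLUP with a fixed variance ratio lam (an assumption, not REML)."""
    mu = y[train].mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    return mu + K[np.ix_(test, train)] @ np.linalg.solve(A, y[train] - mu)

def pa_by_marker_number(X, y, n_markers, n_reps=10, n_train=240, seed=0):
    """Mean and SD of PA when K is built from random subsets of markers."""
    rng = np.random.default_rng(seed)
    n, m_total = X.shape
    results = {}
    for m in n_markers:
        pas = []
        for _ in range(n_reps):
            cols = rng.choice(m_total, size=m, replace=False)
            perm = rng.permutation(n)
            train, test = perm[:n_train], perm[n_train:]
            pred = gblup_predict(vanraden_k(X[:, cols]), y, train, test)
            pas.append(np.corrcoef(pred, y[test])[0, 1])
        results[m] = (float(np.mean(pas)), float(np.std(pas)))
    return results

# Toy genome with long-range LD: 40 blocks of 50 nearly identical markers
rng = np.random.default_rng(3)
n_lines, n_blocks, block_size = 300, 40, 50
X = np.repeat(rng.choice([-1, 1], size=(n_lines, n_blocks)), block_size, axis=1).astype(float)
X[rng.random(X.shape) < 0.05] *= -1      # a few discordant markers within blocks
qtl = rng.choice(X.shape[1], size=20, replace=False)
g = X[:, qtl] @ rng.normal(size=20)
y = g + rng.normal(scale=0.5 * g.std(), size=n_lines)
res = pa_by_marker_number(X, y, [10, 50, 200, 1000, 2000])
for m, (mean_pa, sd_pa) in res.items():
    print(m, round(mean_pa, 2), round(sd_pa, 2))
```

Once every LD block is tagged by at least one marker, adding markers brings little extra information, which is the mechanism proposed in the text.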

Although LD seems to decay quite rapidly on average at the scale of a whole chromosome (green curve), it remains greater than 0.3 up to ≈2 Mb. Given the size of the barley genome (4,250 Mb in our data), ≈2,100 regularly spaced markers (4,250/2) would achieve complete coverage of the genome at an LD threshold of 0.3. This fits our empirical finding that prediction accuracy is nearly optimal with M = 2,000 markers.
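The LD decay curve and the coverage argument can be sketched as follows: pairwise r² is averaged within physical-distance bins, and the number of evenly spaced markers is the genome length divided by the LD extent. The function names and the tiny three-marker example are hypothetical; r² is taken as the squared genotype correlation, which for fully homozygous DH lines equals the usual haplotype-based r².

```python
import numpy as np

def ld_decay(X, pos_mb, bin_edges_mb):
    """Mean r^2 of marker pairs within each physical-distance bin.

    X: (lines, markers) genotypes of one chromosome, polymorphic markers only.
    pos_mb: marker positions in Mb. Returns one mean r^2 per bin (nan if empty).
    """
    r2 = np.corrcoef(X, rowvar=False) ** 2
    i, j = np.triu_indices(len(pos_mb), k=1)
    dist = np.abs(pos_mb[i] - pos_mb[j])
    pair_r2 = r2[i, j]
    bin_id = np.digitize(dist, bin_edges_mb) - 1
    return np.array([pair_r2[bin_id == b].mean() if np.any(bin_id == b) else np.nan
                     for b in range(len(bin_edges_mb) - 1)])

def markers_for_coverage(genome_mb, ld_extent_mb):
    """Evenly spaced markers needed so that adjacent markers are at most
    ld_extent_mb apart (the back-of-the-envelope argument in the text)."""
    return int(np.ceil(genome_mb / ld_extent_mb))

# Markers 1 and 2 are identical and 0.5 Mb apart; marker 3 is unlinked
X = np.array([[1, 1, 1], [1, 1, -1], [-1, -1, 1], [-1, -1, -1]], dtype=float)
decay = ld_decay(X, np.array([0.0, 0.5, 10.0]), np.array([0.0, 1.0, 20.0]))
print(decay)                             # r^2 close to 1 in the first bin, 0 in the second
print(markers_for_coverage(4250, 2.0))   # 2125, i.e. about 2,100 markers
```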

Table 3 shows the predictive abilities obtained with across-population validation, i.e. using pre-defined subsets for training and validation. Since there are no replicates, standard deviation of PA is not available in this case.

(Table 3 around here)

The size of the training set roughly decreases from left to right. As expected, PA decreases as the training set becomes smaller, and more rapidly than with random cross-validation (Table 2), particularly for yield and protein content. However, PAs remain within the range of practical usefulness for malting quality traits.

Values in column 1 are close to those in column 1 of Table 2, with training sets of similar size (N=612 in random 10-fold CV, N=569 in the BRE1+BRE2 subset). This is illustrated in Figure 5, which compares the predictive abilities of the 10 traits in the first columns of Table 2 (random cross-validation) and Table 3 (across-population validation), i.e. with the largest possible training set. Across-population validation gives slightly lower predictive abilities for agronomic traits, except test weight, but very similar ones for malting traits, and even a higher one for extract rate.

(Fig 5 around here)

To explore why malting-related traits are more precisely and more robustly predicted by molecular markers than agronomic traits, we tried genomic prediction models that depart from the infinitesimal model underlying GBLUP. LASSO and Bayes Cpi both estimate additive effects, but allow some markers to have null or very small effects while a few have larger effects. Results are presented in Table 4.

(Table 4 around here)

As expected given the limits of the design, yield and protein content show moderate values for both heritability estimates. This is also the case for heading date, a trait most often considered highly heritable, likely because of the relatively narrow range of variation in our material, made only of six-rowed winter barley adapted to western Europe.

Note that the first column is the broad-sense heritability of plot means, also called repeatability, which depends on the experimental design; its square root is assumed to be the theoretical upper limit of the predictive ability of any model.

Traits with high heritability accordingly show high predictive abilities. Globally, there are very few differences in predictive ability among the 4 models, although LASSO shows lower PA, particularly for the least heritable traits, namely yield and protein content. Bayes Cpi gives PA very similar to that of GBLUP, sometimes slightly but not significantly higher, the difference often being in the third digit, i.e. within 2 standard deviations. In comparison, EGBLUP, which aims to model first-order epistatic interactions, gives higher PA than GBLUP, which only accounts for additive marker effects, for most traits, sometimes with a significant improvement (second digit), particularly for yield.

Despite the relatively small and highly unbalanced dataset (two years and five locations, but each breeder's own material tested in only two or three of them each year), the experimental design allowed us to achieve good plot-mean heritability for most traits, particularly those associated with malting quality. This in turn enabled us to use the adjusted genotype means from the LMM as the targets to be predicted from genome-wide markers. Although marker-assisted selection in barley was proposed more than 20 years ago (e.g. Han et al., 1997), it has only concerned a few traits controlled by a small number of QTL with large effects, such as diastatic power or β-glucan content (Li et al., 2009; Fang et al., 2019). High-density SNP marker systems for barley are about ten years old and have paved the way for the efficient use of modern quantitative genetic approaches such as genomic selection. Given this relatively recent development and the secondary importance of barley as a field crop, reports on GS in barley are even more recent. Because malting quality is costly and resource-demanding to assess, it has been one of the first objectives of such studies. One of the first reports was Schmidt et al. (2016), who explored the applicability of GS for malting quality in two practical breeding programs, one for spring and one for winter barley. They studied more traits than we did, including enzymatic activities (α-amylase and β-glucanase), but our four malting traits were also included. Using an Illumina 9K SNP chip, they retained 4,359 markers in winter barley, which allowed them to achieve predictive abilities ranging from 0.625 (extract) to 0.798 (β-glucan content), i.e. values very close to ours, despite a very small training population (N = 102). Notably, GS predictive ability in Schmidt et al. (2016) was lower in spring barley than in winter barley, by 0.16 on average, despite larger training populations. They attributed this to the more homogeneous population structure of their winter barley panel, a feature we also observed in the present study.
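
The heritability estimator is not restated in this part of the manuscript. For unbalanced trials like these, one widely used option is the generalised heritability of Cullis et al. (2006), listed in the references: H² = 1 − v̄Δ/(2σ²g), where v̄Δ is the mean variance of a difference between two genotype BLUPs. The sketch below is an assumption, not the authors' code: it takes the prediction-error (co)variance matrix of the BLUPs and the genetic variance as already extracted from a fitted mixed model, and the function name is hypothetical.

```python
"""Generalised (Cullis) heritability from BLUP prediction-error variances: a sketch."""
from itertools import combinations
from typing import List


def cullis_h2(pev: List[List[float]], sigma2_g: float) -> float:
    """H2 = 1 - vbar / (2 * sigma2_g).

    pev: prediction-error (co)variance matrix of the genotype BLUPs (n x n, n >= 2).
    sigma2_g: estimated genetic variance.
    vbar is the mean, over all genotype pairs, of var(BLUP_i - BLUP_j).
    """
    n = len(pev)
    diffs = [pev[i][i] + pev[j][j] - 2 * pev[i][j]
             for i, j in combinations(range(n), 2)]
    vbar = sum(diffs) / len(diffs)
    return 1 - vbar / (2 * sigma2_g)


if __name__ == "__main__":
    # Hypothetical values: three genotypes, uncorrelated BLUP errors of variance 0.2.
    pev = [[0.2, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.2]]
    print(round(cullis_h2(pev, sigma2_g=1.0), 3))  # 0.8
```

Unlike the classical ratio of variance components, this estimator accounts for how precisely each genotype is estimated, which matters when lines are tested in different numbers of environments.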

Nielsen et al. (2016) reported the predictive ability of the GBLUP model in a weakly structured population of 309 spring barley lines genotyped with 3,540 SNP markers. With random leave-one-out cross-validation, they obtained PA ranging from 0.40 (protein content) to 0.68 (seed weight), and even 0.83 for ergosterol, a trait we did not measure. As in our study, the “leave-set-out” method gave lower predictive abilities, ranging from 0.31 (protein) to 0.52 (seed weight), and 0.72 for ergosterol.
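
Leave-one-out cross-validation with a GBLUP-type predictor can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used here or by Nielsen et al. (2016): the genomic relationship matrix G is given, the shrinkage ratio λ = σ²e/σ²g is fixed rather than estimated, and the intercept is the plain training-set mean rather than a GLS estimate. All function names are hypothetical. Predictive ability is the Pearson correlation between held-out predictions and observed (adjusted) phenotypes.

```python
"""Leave-one-out predictive ability of a GBLUP-type predictor: an illustrative sketch."""
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gblup_predict(g: Matrix, y: Sequence[float], train: List[int],
                  test: List[int], lam: float) -> List[float]:
    """y_hat(test) = mu + G[test, train] (G[train, train] + lam I)^-1 (y_train - mu)."""
    mu = sum(y[i] for i in train) / len(train)  # simplification: plain mean, not GLS
    lhs = [[g[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(lhs, [y[i] - mu for i in train])
    return [mu + sum(g[t][j] * alpha[k] for k, j in enumerate(train)) for t in test]


def loo_predictive_ability(g: Matrix, y: Sequence[float], lam: float) -> float:
    """Correlation between leave-one-out predictions and observed values."""
    n = len(y)
    preds = [gblup_predict(g, y, [j for j in range(n) if j != i], [i], lam)[0]
             for i in range(n)]
    return pearson(preds, y)


if __name__ == "__main__":
    # Hypothetical toy data: two families of two related lines each.
    g = [[1.0, 0.8, 0.0, 0.0],
         [0.8, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.8],
         [0.0, 0.0, 0.8, 1.0]]
    y = [10.0, 11.0, 4.0, 5.0]
    print(round(loo_predictive_ability(g, y, lam=1.0), 3))  # 0.809
```

If G is replaced by the identity matrix (no relatedness), every held-out prediction collapses to the mean of the other lines and leave-one-out PA becomes −1, a reminder that PA rests entirely on relatedness between training and predicted lines; a "leave-set-out" scheme removes close relatives from the training set and therefore tends to give lower values.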

It is generally acknowledged that increasing training population size increases predictive ability, as expected from theory (e.g. Daetwyler et al., 2008; Goddard, 2009) and simulation studies (e.g. Iwata & Jannink, 2011). Our results fit the theory when the training set is randomly sampled from the entire barley material. Similar results were reported by Nielsen et al. (2016). However, this relationship is far from a fixed rule. For example, Edwards et al. (2019) recently showed that, for a fixed training population size, it is better to increase the number of crosses (i.e. families) than the number of lines per cross. This may explain why the smallest subset, the 95 registered varieties, gave a higher predictive ability than expected from random sampling. Indeed, varieties registered in the French catalogue are likely to derive from many more different crosses than breeding lines from a single breeder.
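
The theoretical expectation referred to above can be illustrated with the deterministic formula of Daetwyler et al. (2008), r = √(N·h² / (N·h² + Me)), where N is the training population size and Me the number of independent chromosome segments. In the sketch below, the values of h² and Me are hypothetical; Me was not estimated in this study, and the function name is an assumption.

```python
"""Expected accuracy vs training size (Daetwyler et al., 2008): an illustrative sketch."""
import math


def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """r = sqrt(N h2 / (N h2 + Me)), Me = number of independent chromosome segments."""
    return math.sqrt(n_train * h2 / (n_train * h2 + m_e))


if __name__ == "__main__":
    # Hypothetical h2 and Me, chosen only to show the shape of the curve.
    for n in (96, 200, 400, 679):
        print(n, round(expected_accuracy(n, h2=0.7, m_e=300), 3))
```

The formula depends only on N, h² and Me, so it cannot capture how the training set is composed (many crosses versus many lines per cross). This is exactly where a diverse subset such as the registered varieties can depart from the random-sampling curve. A limited effective population size (small Me) also makes the curve rise quickly, so moderate training sizes can already reach high accuracy.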

Using lines from two breeders to obtain a larger training population did not always translate into higher predictive ability, particularly for malting-related traits, at least when tested by random cross-validation. A similar result was previously reported in barley by Lorenz & Smith (2015). Using barley lines from two university breeding programs (MN and ND), they showed that adding genetically distant individuals from another breeding program to the training population does not improve, and can even reduce, genomic prediction accuracy. However, the breeding material of these two programs was clearly distinct: clear-cut clustering is obvious in their Figure 1 (heatmap), which is not the case in our Suppl. Figure 3. Although their genomic relationship scale differed from ours (it was not normalized), the mean relationship between programs was significantly lower than within each program, which again is not the case in our material, which appears more homogeneous. However, when tested on an independent set of lines (founders, Table 3), predictive abilities obtained using breeder 1 + breeder 2 lines were higher than those obtained using a single breeder's material. From a practical point of view, this means that, when breeding populations do not show too much genetic divergence, there is something to be gained by merging material from different breeders to achieve a larger training set.
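
The comparison of mean relationships within and between programs can be sketched as follows, assuming a kinship matrix and one group label per line. The labels "B1" and "B2", the toy matrix and the function name are hypothetical; testing whether a within/between gap is significant would require an additional procedure not shown here.

```python
"""Mean genomic relationship within and between breeders' sets: an illustrative sketch."""
from itertools import combinations
from typing import Dict, Sequence, Tuple


def mean_relationships(k: Sequence[Sequence[float]],
                       groups: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Average off-diagonal kinship for each unordered pair of groups.

    Keys such as ("B1", "B1") are within-group means; ("B1", "B2") is the between-group mean.
    """
    sums: Dict[Tuple[str, str], float] = {}
    counts: Dict[Tuple[str, str], int] = {}
    for i, j in combinations(range(len(groups)), 2):
        a, b = sorted((groups[i], groups[j]))
        key = (a, b)
        sums[key] = sums.get(key, 0.0) + k[i][j]
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # Hypothetical 4 x 4 kinship matrix: two lines from each of two breeders.
    k = [[1.0, 0.6, 0.1, 0.1],
         [0.6, 1.0, 0.1, 0.1],
         [0.1, 0.1, 1.0, 0.4],
         [0.1, 0.1, 0.4, 1.0]]
    print(mean_relationships(k, ["B1", "B1", "B2", "B2"]))
```

A between-group mean well below both within-group means signals distinct germplasm pools, as in Lorenz & Smith (2015); similar values indicate a homogeneous panel in which merging breeders' data is more likely to help.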

Our results also showed, as in Nielsen et al. (2016), that as few as 2,000 markers are enough to achieve maximum predictive ability. This is likely due to the long range of linkage disequilibrium in this six-rowed winter barley material, which reflects a limited effective population size, as also illustrated by the relatively high pairwise kinship both within and between breeders' sets of lines.
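
Suppl. Figure 4 plots pairwise LD against physical distance. The LD statistic is not named there; assuming the usual r², a minimal sketch for homozygous lines coded 0/2 is given below (the function name is hypothetical). When r² stays high over long physical distances, a thinned marker panel still tags most QTL, which is consistent with the plateau in predictive ability at a few thousand markers.

```python
"""Pairwise LD (r^2) between two markers scored on inbred lines: an illustrative sketch."""
from typing import Sequence


def ld_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between allele dosages at two markers.

    For fully homozygous lines coded 0/2 this equals the classical haplotype-based r^2.
    Raises ValueError if either marker is monomorphic.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("monomorphic marker: r^2 undefined")
    return sab * sab / (saa * sbb)


if __name__ == "__main__":
    # Hypothetical markers on eight lines: m2 differs from m1 in one line only.
    m1 = [0, 0, 0, 2, 2, 2, 2, 0]
    m2 = [0, 0, 0, 2, 2, 2, 0, 0]
    m3 = [0, 2, 0, 2, 0, 2, 0, 2]
    print(round(ld_r2(m1, m2), 3), round(ld_r2(m1, m3), 3))
```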

As often reported (e.g. Heslot et al., 2012), we did not find large differences in predictive ability between statistical models, and this applies to all traits studied. The “old” GBLUP method, based on genomic estimates of kinship and equivalent to ridge regression BLUP (RR-BLUP), often appears to be one of the best methods. Although it relies on the unrealistic assumption of an infinite number of QTL with very small effects drawn from the same distribution, it does not differ significantly from other methods that allow marker effects to be zero or to follow different distributions. Similar results were reported by Wang et al. (2015). Using simulated data, they showed that Bayes Cpi had higher predictive ability only in the scenario with 20 QTL. For all other genetic architectures, in both simulated data and real data for truly polygenic traits, RR-BLUP slightly outperformed the other methods.
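
The equivalence between GBLUP and RR-BLUP can be made explicit. The derivation below is a standard sketch, assuming the VanRaden (2008) scaling of the genomic relationship matrix, marker dosages x_ik ∈ {0, 1, 2} centred as w_ik = x_ik − 2p_k (p_k being the allele frequency at marker k), and an intercept μ treated as known.

```latex
y = \mathbf{1}\mu + W u + e, \qquad u \sim N(0, I\sigma^2_u), \qquad e \sim N(0, I\sigma^2_e)

g = W u \sim N\!\left(0,\; G\,\sigma^2_g\right), \qquad
G = \frac{W W^{\top}}{2\sum_k p_k (1 - p_k)}, \qquad
\sigma^2_g = 2\sum_k p_k (1 - p_k)\,\sigma^2_u

\text{RR-BLUP:}\quad W\hat{u} = W W^{\top}\left(W W^{\top} + \lambda_u I\right)^{-1}(y - \mathbf{1}\mu),
\qquad \lambda_u = \sigma^2_e / \sigma^2_u

\text{GBLUP:}\quad \hat{g} = G\left(G + \lambda I\right)^{-1}(y - \mathbf{1}\mu),
\qquad \lambda = \frac{\sigma^2_e}{\sigma^2_g} = \frac{\lambda_u}{2\sum_k p_k (1 - p_k)}

\Rightarrow\quad \hat{g} = W\hat{u}
```

The two models therefore give identical genomic values and differ only in parameterisation: GBLUP solves n × n equations over lines, whereas RR-BLUP estimates m marker effects.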

As in our previous report on bread wheat (Charmet et al., 2020), the method intended to capture non-additive marker effects (EGBLUP) shows slightly higher predictive ability than GBLUP.

In our study, we restricted ourselves to single-trait genomic prediction. Although a recent study (Bhatta et al., 2020) reported significantly higher predictive abilities for multi-trait genomic prediction than for single-trait methods, we do not expect this to be the case in our material. Indeed, the genetic correlations reported by Bhatta et al. (2020) are much higher than those we found. The only tightly correlated trait pairs are TGW/calibration, friability/extract and viscosity/β-glucan. These traits already show high predictive abilities (0.6–0.8) with single-trait models and are therefore less likely to be much improved by multi-trait models, whereas single-trait models gave rather low predictive abilities in Bhatta et al. (2020).

It is worth noting that the correlations we found in our six-rowed winter barley panel are all favorable to breeding goals. Friability and extract, which are to be increased, are negatively correlated with viscosity and β-glucan, which must be reduced. TGW and calibration are positively correlated with yield, and protein content is not negatively correlated with yield, as is often reported in wheat (e.g. Oury et al., 2003). Moreover, malting barley breeders wish to stabilize protein content in a medium range rather than increase it. Finally, all traits are independent of heading date, which makes it possible to select for high malting quality in both early- and late-flowering material to better fit local climates.

In conclusion, this study, based on representative material from applied six-rowed winter barley breeding programs, showed highly encouraging results for using genomic prediction to accelerate breeding progress for malting traits. Predictive abilities are very high, sometimes approaching the square root of heritability, and would allow genomic prediction to efficiently replace phenotyping in early generations, thereby increasing selection intensity and reducing cycle length. Genetic resources from a single breeder plus publicly available material (registered varieties) are sufficient to achieve useful predictive abilities, but merging material and data from competing companies may further improve the predictive ability of GS models.
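
The comparison with the square root of heritability follows from the common approximation r(ĝ, y) ≈ r(ĝ, g) · h, which makes √h² the expected ceiling for predictive ability and lets PA be converted into an approximate accuracy. The sketch below uses hypothetical values, not results from this study, and the function name is an assumption.

```python
"""Relating predictive ability to accuracy via heritability: an illustrative sketch."""
import math


def accuracy_from_pa(pa: float, h2: float) -> float:
    """Approximate r(g_hat, g) = r(g_hat, y) / h, with h = sqrt(h2).

    Assumes prediction errors are uncorrelated with the non-genetic part of y,
    so that predictive ability cannot exceed sqrt(h2) in expectation.
    """
    return pa / math.sqrt(h2)


if __name__ == "__main__":
    # Hypothetical trait: h2 = 0.81, so the ceiling for predictive ability is 0.9.
    pa, h2 = 0.8, 0.81
    print(round(math.sqrt(h2), 3), round(accuracy_from_pa(pa, h2), 3))  # 0.9 0.889
```

A PA close to √h² therefore means that the implied accuracy of the genomic predictions is close to 1 for that trait.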

Funding

The study was carried out in a project named Genomalt, coordinated by Amélie Genty and funded by the Fonds de soutien à l'obtention végétale (FSOV) under grant FSOV 2016 T.

Acknowledgements

The authors wish to thank G Cresté and PM Leroux from SECOBRA Recherches (Maule, France), R Dupont and M Tison from RAGT (Rodez, France), and S Schwebel, C Colin and her team from IFBM (Vandœuvre-lès-Nancy, France) for providing the material, carrying out field trials and subsequent analyses, including malting tests, ordering genotyping and processing raw data.

Competing interest

The authors declare no competing interests.

Author Contribution Statement

Amélie Genty coordinated the whole project and the field trials of breeder X; Pierre Pin supervised genotyping and data analysis for breeder X; Bruno Claustres and Nathalie Leroy coordinated the field trials of breeder Y; Christopher Burt coordinated genotyping for breeder Y; Marc Schmitt coordinated the malt analyses; Gilles Charmet analysed the whole dataset and wrote the first draft of the manuscript. All authors read and revised the manuscript and approved the final version.

Data availability

An R directory containing the phenotypic and genotypic data is available on request.

Badr A, Müller K, Schäfer-Pregl R, El Rabey H, Effgen S, Ibrahim HH, Pozzi C, Rohde W, Salamini F (2000) On the origin and domestication history of barley (Hordeum vulgare). Mol Biol Evol 17:499–510. https://doi.org/10.1093/oxfordjournals.molbev.a026330
Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67(1):1–48. https://doi.org/10.18637/jss.v067.i01
Bayer MM, Rapazote-Flores P, Ganal M, Hedley PE, Macaulay M, Plieske J, Ramsay L, Russell J, Shaw PD, Thomas W, Waugh R (2017) Development and evaluation of a barley 50k iSelect SNP array. Front Plant Sci 8:1792. https://doi.org/10.3389/fpls.2017.01792
Bernardo R, Yu JM (2007) Prospects for genomewide selection for quantitative traits in maize. Crop Sci 47:1082–1090
Bhatta M, Gutierrez L, Cammarota L, Cardozo F, Germán S, Gómez-Guerrero B et al (2020) Multi-trait genomic prediction model increased the predictive ability for agronomic and malting quality traits in barley (Hordeum vulgare L.). G3 (Bethesda) 10:1113–1124. https://doi.org/10.1534/g3.119.400968
Charmet G, Tran LG, Auzanneau J, Rincent R, Bouchet S (2020) BWGS: A R package for genomic selection and its application to a wheat breeding programme. PLoS ONE 15(4):e0222733. https://doi.org/10.1371/journal.pone.0222733
Crossa J, de los Campos G, Pérez P, Gianola D, Burgueño J, Araus J, Makumbi D, Singh RP, Dreisigacker S, Yan J, Arief V, Banziger M, Braun HJ (2010) Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:713–724
Cullis BR, Smith AB, Coombes NE (2006) On the design of early generation variety trials with correlated data. J Agric Biol Environ Stat 11:381–393. https://doi.org/10.1198/108571106X154443
Daetwyler HD, Villanueva B, Woolliams JA (2008) Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE 3(10):e3395
de los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL (2013) Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193(2):327–345. https://doi.org/10.1534/genetics.112.143313
Edwards SM, Buntjer J, Jackson R et al (2019) The effects of training population design on genomic prediction accuracy in wheat. Theor Appl Genet 132:1943–1952. https://doi.org/10.1007/s00122-019-03327-y
Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255. https://doi.org/10.3835/plantgenome2011.08.0024
Fang Y, Zhang X, Xue D (2019) Genetic analysis and molecular breeding applications of malting quality QTLs in barley. Front Genet 10:352. https://doi.org/10.3389/fgene.2019.00352
Goddard ME, Hayes BJ (2007) Genomic selection. J Anim Breed Genet 124:323–330
Goddard M (2009) Genomic selection: prediction of accuracy and maximization of long term response. Genetica 136(2):245–257
Habier D, Fernando RL, Kizilkaya K, Garrick DJ (2011) Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12:186
Han F, Romagosa I, Ullrich SE, Jones BL, Hayes PM, Wesenberg M (1997) Molecular marker-assisted selection for malting quality traits in barley. Mol Breed 3:427–437. https://doi.org/10.1023/A:1009608312385
Haslemore RM, Slack CR, Brodrick KN (1982) Assessment of malting quality of lines from a barley breeding programme. N Z J Agric Res 25(4):497–502. https://doi.org/10.1080/00288233.1982.10425212
Heffner EL, Sorrells ME, Jannink JL (2009) Genomic selection for crop improvement. Crop Sci 49:1–12
Heslot N, Yang H, Sorrells M, Jannink JL (2012) Genomic selection in plant breeding: A comparison of models. Crop Sci 52:146–160
Iwata H, Jannink JL (2011) Accuracy of genomic selection prediction in barley breeding programs: a simulation study based on the real single nucleotide polymorphism data of barley breeding lines. Crop Sci 51:1915–1927
Jannink JL, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 9:166–177
Jiang Y, Reif JC (2015) Modeling epistasis in genomic selection. Genetics 201(2):759–768. https://doi.org/10.1534/genetics.115.177907
Komatsuda T, Pourkheirandish M, He C, Azhaguvel P, Kanamori H, Perovic D, Stein N, Graner A, Wicker T, Tagiri A, Lundqvist U, Fujimura T, Matsuoka M, Matsumoto T, Yano M (2007) Six-rowed barley originated from a mutation in a homeodomain-leucine zipper I-class homeobox gene. Proc Natl Acad Sci USA 104(4):1424–1429. https://doi.org/10.1073/pnas.0608580104
Li CD, Cakir M, Lance R (2009) Genetic improvement of malting quality through conventional breeding and marker-assisted selection. In: Zhang G, Li C (eds) Genetics and Improvement of Barley Malt Quality. Advanced Topics in Science and Technology in China. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01279-2_9
Lorenz A, Smith KP (2015) Adding genetically distant individuals to training populations reduces genomic prediction accuracy in barley. Crop Sci 55:2657–2667. https://doi.org/10.2135/cropsci2014.12.0827
Meuwissen THE, Hayes B, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Nielsen NH, Jahoor A, Jensen JD, Orabi J, Cericola F, Edriss V et al (2016) Genomic prediction of seed quality traits using advanced barley breeding lines. PLoS ONE 11(10):e0164494. https://doi.org/10.1371/journal.pone.0164494
Oury FX, Berard P, Brancourt-Hulmel M, Depatureaux C, Doussinault G, Galic N, Heumez E, Lecomte C, Pluchard P, Rolland B, Rousset M, Trottet M (2003) Yield and grain protein concentration in bread wheat: a review and a study of multi-annual data from a French breeding program. J Genet Breed 57:59–68
Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103:681–686
Piepho HP, Möhring J (2007) Computing heritability and selection response from unbalanced plant breeding trials. Genetics 177:1881–1888. https://doi.org/10.1534/genetics.107.074229
Poland J, Endelman J, Dawson J, Rutkoski J, Wu S, Manes Y, Dreisigacker S, Crossa J, Sánchez-Villeda H, Sorrells M, Jannink JL (2012) Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5:103–113. https://doi.org/10.3835/plantgenome2012.06.0006
R Core Team (2020) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
Rodriguez-Álvarez MX, Boer MP, van Eeuwijk FA, Eilers PH (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat Stat 23:52–71. https://doi.org/10.1016/j.spasta.2017.10.003
Rogers JS (1972) Measures of genetic similarity and genetic distances. Studies in Genetics. Univ Tex Publ 7213:145–153
Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells M (2012) Evaluation of genomic prediction methods for Fusarium head blight resistance in wheat. Plant Genome 5:51–61. https://doi.org/10.3835/plantgenome2012.02.0001
Schaeffer LR (2006) Strategy for applying genome-wide selection in dairy cattle. J Anim Breed Genet 123:218–223. https://doi.org/10.1111/j.1439-0388.2006.00595.x
Schmidt M, Kollers S, Maasberg-Prelle A et al (2016) Prediction of malting quality traits in barley based on genome-wide marker data to assess the potential of genomic selection. Theor Appl Genet 129:203–213. https://doi.org/10.1007/s00122-015-2639-1
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
Wang X, Yang Z, Xu CW (2015) A comparison of genomic selection methods for breeding value prediction. Sci Bull 60:925–935. https://doi.org/10.1007/s11434-015-0791-2
Zohary D, Hopf M (1993) Domestication of plants in the Old World. The origin and spread of cultivated plants in West Asia, Europe and the Nile Valley. Clarendon Press, Oxford, England

Tables 1 to 4 are available in the Supplementary Files section

SupplFig1.tif
Suppl Figure 1: Illustration of the spatial correction of genotypic BLUEs using the SpATS R package on the Cupperly 2028 trial.
SupplFig2.tif
Suppl Figure 2: Plot of the first two axes of a standardized principal component analysis of the 10 variables.
SupplFig3.tif
Suppl Figure 3: Pairwise kinship coefficients from the A matrix between varieties and breeding lines, ordered by group of origin. Red are low values, green are high values.
SupplFig4.tif
Suppl. Figure 4: Plot of pairwise LD against physical distance between markers on chromosome 1H. The green curve is a smoothed fit.
Tables.docx
Suppl.Table1.docx


Reviewers agreed at journal
19 Aug, 2022
Reviewers invited by journal
01 Aug, 2022
Editor invited by journal
28 May, 2022
Editor assigned by journal
16 May, 2022
First submitted to journal
10 May, 2022
