Additive and dominance genetic values in inbred populations
Assume initially a single biallelic gene (A/a) determining a quantitative trait, where A is the allele that increases the trait expression, and a population derived by n generations of selfing from a Hardy-Weinberg equilibrium population (generation 0). Defining \({M}_{F}^{1}\) and \({M}_{F}^{2}\) as the means of the inbred population after an allelic substitution for the alleles A and a, respectively, the average effects of the alleles in the inbred population are \({\alpha }_{A}^{\left(n\right)}={M}_{F}^{1}-{M}_{F}=q\alpha +2Fpqd\) and \({\alpha }_{a}^{\left(n\right)}={M}_{F}^{2}-{M}_{F}=-p\alpha +2Fpqd\), where \({M}_{F}=m+\left(p-q\right)a+2pqd-2Fpqd=M-2Fpqd\) is the inbred population mean, m is the midpoint between the two homozygotes, a and d are the deviations of AA and Aa from m (aa deviates by −a), p and q are the frequencies of A and a, \(\alpha\) is the average effect of an allelic substitution, \(F\) is the inbreeding coefficient, and M is the non-inbred population mean. Thus, the additive values in the inbred population are \({A}_{AA}^{\left(n\right)}=2q\alpha +4Fpqd={A}_{AA}^{\left(0\right)}+4Fpqd\), \({A}_{Aa}^{\left(n\right)}=\left(q-p\right)\alpha +4Fpqd={A}_{Aa}^{\left(0\right)}+4Fpqd\), and \({A}_{aa}^{\left(n\right)}=-2p\alpha +4Fpqd={A}_{aa}^{\left(0\right)}+4Fpqd\), where \({A}^{\left(0\right)}\) is the additive value in the non-inbred population. Note that \(E\left({A}^{\left(n\right)}\right)=4Fpqd\). Expressing the genotypic values in the inbred population as a function of \({M}_{F}\), we have:
$${G}_{AA}={M}_{F}+{A}_{AA}^{\left(0\right)}+\left({-2q}^{2}d+2Fpqd\right)={M}_{F}+{A}_{AA}^{\left(0\right)}+\left({D}_{AA}^{\left(0\right)}+2Fpqd\right)={M}_{F}+{A}_{AA}^{\left(0\right)}+{D}_{AA}^{\left(n\right)}$$
$${G}_{Aa}={M}_{F}+{A}_{Aa}^{\left(0\right)}+\left(2pqd+2Fpqd\right)={M}_{F}+{A}_{Aa}^{\left(0\right)}+\left({D}_{Aa}^{\left(0\right)}+2Fpqd\right)={M}_{F}+{A}_{Aa}^{\left(0\right)}+{D}_{Aa}^{\left(n\right)}$$
$${G}_{aa}={M}_{F}+{A}_{aa}^{\left(0\right)}+\left({-2p}^{2}d+2Fpqd\right)={M}_{F}+{A}_{aa}^{\left(0\right)}+\left({D}_{aa}^{\left(0\right)}+2Fpqd\right)={M}_{F}+{A}_{aa}^{\left(0\right)}+{D}_{aa}^{\left(n\right)}$$
Note that, in the inbred population, \(E\left({A}^{\left(0\right)}\right)=E\left({D}^{\left(n\right)}\right)=0\) but \(E\left({D}^{\left(0\right)}\right)=-2Fpqd\). Note also that the additive value in the non-inbred population is the additive value in the inbred population expressed as a deviation from its mean \(\left({A}^{\left(0\right)}={A}^{\left(n\right)}-4Fpqd\right)\), and the dominance value in the inbred population is the dominance value in the non-inbred population expressed as a deviation from its mean \(\left({D}^{\left(n\right)}={D}^{\left(0\right)}+2Fpqd\right)\). This implies that, in the inbred population, \(E\left(G\right)={M}_{F}\).
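The single-gene identities above can be checked numerically. The sketch below is an illustration, not part of the original analysis: it assumes genotype frequencies \(p^2+Fpq\), \(2pq(1-F)\), and \(q^2+Fpq\), takes \(\alpha =a+\left(q-p\right)d\) (the non-inbred average effect of an allelic substitution), and uses arbitrary parameter values; the function name is hypothetical. Every returned quantity should be zero up to rounding.

```python
def single_gene_check(p, m, a, d, F):
    """Residuals of the single-gene inbred decomposition (all should be ~0)."""
    q = 1.0 - p
    # Genotype frequencies after inbreeding
    freq = {"AA": p * p + F * p * q, "Aa": 2 * p * q * (1 - F), "aa": q * q + F * p * q}
    G = {"AA": m + a, "Aa": m + d, "aa": m - a}      # genotypic values
    alpha = a + (q - p) * d                          # assumed: non-inbred substitution effect
    M = m + (p - q) * a + 2 * p * q * d              # non-inbred mean
    M_F = M - 2 * F * p * q * d                      # inbred mean
    A0 = {"AA": 2 * q * alpha, "Aa": (q - p) * alpha, "aa": -2 * p * alpha}
    D0 = {"AA": -2 * q * q * d, "Aa": 2 * p * q * d, "aa": -2 * p * p * d}
    An = {g: A0[g] + 4 * F * p * q * d for g in A0}
    Dn = {g: D0[g] + 2 * F * p * q * d for g in D0}

    def expect(values):
        return sum(freq[g] * values[g] for g in freq)

    return {
        "E(G) - M_F": expect(G) - M_F,
        "E(A0)": expect(A0),
        "E(An) - 4Fpqd": expect(An) - 4 * F * p * q * d,
        "E(D0) + 2Fpqd": expect(D0) + 2 * F * p * q * d,
        "E(Dn)": expect(Dn),
        "max |G - (M_F + A0 + Dn)|": max(abs(G[g] - (M_F + A0[g] + Dn[g])) for g in G),
    }


print(single_gene_check(p=0.3, m=100.0, a=20.0, d=12.0, F=0.75))
```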
Genetic variances in inbred populations in LD
Assume now two linked biallelic genes (A/a and B/b) determining a quantitative trait and a non-inbred population in LD (generation 0). Assume dominance but initially no epistasis. After n generations of selfing, the genotypic variance for the two genes in the inbred population is (see the genotype probabilities in the Additional File Appendix) \({\sigma }_{G}^{2\left(n\right)}={\sigma }_{A}^{2\left(n\right)}+{\sigma }_{D}^{2\left(n\right)}+2{\sigma }_{A,D}^{\left(n\right)}\), where:
\({\sigma }_{A}^{2\left(n\right)}=\left(1+F\right)\left(2{p}_{a}{q}_{a}{\alpha }_{a}^{2}+2{p}_{b}{q}_{b}{\alpha }_{b}^{2}\right)+2\left[2+{c}_{1}\left(1-2{r}_{ab}\right)\right]{\varDelta }_{ab}^{(-1)}{\alpha }_{a}{\alpha }_{b}=\left(1+F\right){\sigma }_{A}^{2\left(0\right)}+ 2\left[{c}_{1}\left(1-2{r}_{ab}\right)-2F\right]{\varDelta }_{ab}^{(-1)}{\alpha }_{a}{\alpha }_{b}\) is the additive variance,
\({\sigma }_{D}^{2\left(n\right)}=\left(1-{F}^{2}\right)\left(4{p}_{a}^{2}{q}_{a}^{2}{d}_{a}^{2}+4{p}_{b}^{2}{q}_{b}^{2}{d}_{b}^{2}\right)+F\left[4{p}_{a}{q}_{a}{\left({p}_{a}-{q}_{a}\right)}^{2}{d}_{a}^{2}+4{p}_{b}{q}_{b}{\left({p}_{b}-{q}_{b}\right)}^{2}{d}_{b}^{2}\right]+ 8\left\{\left(1-F\right)\left({c}^{n}-1+F\right){p}_{a}{q}_{a}{p}_{b}{q}_{b}+\left({p}_{a}-{q}_{a}\right)\left({p}_{b}-{q}_{b}\right)\left[\left(1-F\right){c}^{n}-\left(1-2F\right)+{c}_{1}\left(1-2{r}_{ab}\right)/2\right]{\varDelta }_{ab}^{(-1)}/2+ \left(1-F\right){c}^{n}{{\varDelta }_{ab}^{(-1)}}^{2}\right\}{d}_{a}{d}_{b}=\left(1-{F}^{2}\right){\sigma }_{D}^{2\left(0\right)}+F{D}_{2}+8\left\{\left(1-F\right)\left({c}^{n}-1+F\right){p}_{a}{q}_{a}{p}_{b}{q}_{b}+ \left({p}_{a}-{q}_{a}\right)\left({p}_{b}-{q}_{b}\right)\left[\left(1-F\right){c}^{n}-\left(1-2F\right)+{c}_{1}\left(1-2{r}_{ab}\right)/2\right]{\varDelta }_{ab}^{(-1)}/2+\left[\left(1-F\right){c}^{n}-\left(1-{F}^{2}\right)\right]{{\varDelta }_{ab}^{(-1)}}^{2}\right\}{d}_{a}{d}_{b}\) is the dominance variance, and
\({\sigma }_{A,D}^{\left(n\right)}=2F\left[{2p}_{a}{q}_{a}\left({p}_{a}-{q}_{a}\right){\alpha }_{a}{d}_{a}+{2p}_{b}{q}_{b}\left({p}_{b}-{q}_{b}\right){\alpha }_{b}{d}_{b}\right]+\left[2F+{c}_{1}\left(1-2{r}_{ab}\right)\right]{\varDelta }_{ab}^{\left(-1\right)}\left[\left({p}_{b}-{ q}_{b}\right){\alpha }_{a}{d}_{b}+\left({p}_{a}-{q}_{a}\right){\alpha }_{b}{d}_{a}\right]=2F{D}_{1}+\left[2F+{c}_{1}\left(1-2{r}_{ab}\right)\right]{\varDelta }_{ab}^{\left(-1\right)}\left[\left({p}_{b}-{ q}_{b}\right){\alpha }_{a}{d}_{b}+\left({p}_{a}-{q}_{a}\right){\alpha }_{b}{d}_{a}\right]\) is the covariance between additive and dominance values,
where \({\varDelta }_{ab}^{(-1)}={P}_{AB}^{(-1)}{P}_{ab}^{(-1)}-{P}_{Ab}^{(-1)}{P}_{aB}^{(-1)}\) is the measure of LD in the gametic pool of generation −1 [23], \({P}^{(-1)}\) is a haplotype probability, \({r}_{ab}\) is the recombination frequency, \({c}_{1}=2\left\{1-{\left[\left(1-2{r}_{ab}\right)/2\right]}^{n}\right\}/\left(1+2{r}_{ab}\right)\), \(c=1-2{r}_{ab}\left(1-{r}_{ab}\right)\), \({\sigma }_{A}^{2\left(0\right)}=2{p}_{a}{q}_{a}{\alpha }_{a}^{2}+2{p}_{b}{q}_{b}{\alpha }_{b}^{2}+4{\varDelta }_{ab}^{(-1)}{\alpha }_{a}{\alpha }_{b}\) and \({\sigma }_{D}^{2\left(0\right)}=4{p}_{a}^{2}{q}_{a}^{2}{d}_{a}^{2}+4{p}_{b}^{2}{q}_{b}^{2}{d}_{b}^{2}+8{\left[{\varDelta }_{ab}^{(-1)}\right]}^{2}{d}_{a}{d}_{b}\) are the additive and dominance variances in the non-inbred population in LD [24], and \({D}_{1}\) (the covariance between the average effects and the homozygous dominance deviations) and \({D}_{2}\) (the variance of the homozygous dominance deviations) are the components of the covariance of relatives from self-fertilization, assuming linkage equilibrium [8]. The other terms are the covariances between the average effects of an allelic substitution, between dominance deviations, and between the average effect of an allelic substitution and dominance deviation, for genes in LD. Because we assumed biallelic genes, \(\check{H}={\sigma }_{D}^{2}\). Thus, \(\left(1-{F}^{2}\right){\sigma }_{D}^{2\left(0\right)}=\left(1-F\right){\sigma }_{D}^{2\left(0\right)}+F\left(1-F\right)\check{H}\). Note that the genotypic variance derived here generalizes Cockerham's genotypic variance [8] to populations in LD. If \({p}_{a}={q}_{a}\) and \({p}_{b}={q}_{b}\) (allele frequencies of 0.5 for both genes), \({\sigma }_{A,D}^{\left(n\right)}=0\).
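The variance components above can be evaluated directly from the allele frequencies, effects, LD, and recombination frequency. The following sketch transcribes the second form of each expression, assuming that generation 0 is non-inbred so that \(F=1-{\left(1/2\right)}^{n}\) after n generations of selfing; the function and argument names are hypothetical. At n = 0 it should return the non-inbred variances, and with allele frequencies of 0.5 the additive-dominance covariance should vanish.

```python
def selfing_variances(pa, pb, alpha_a, alpha_b, da, db, delta, r, n):
    """Additive (A), dominance (D), and additive-dominance (AD) (co)variances after n
    generations of selfing of a non-inbred population in LD (delta = LD in generation -1)."""
    qa, qb = 1 - pa, 1 - pb
    F = 1 - 0.5 ** n                                   # selfing starting from F = 0
    c = 1 - 2 * r * (1 - r)
    c1 = 2 * (1 - ((1 - 2 * r) / 2) ** n) / (1 + 2 * r)
    k = c1 * (1 - 2 * r)
    cn = c ** n
    # Non-inbred variances in LD and Cockerham's components
    sA0 = 2 * pa * qa * alpha_a**2 + 2 * pb * qb * alpha_b**2 + 4 * delta * alpha_a * alpha_b
    sD0 = 4 * (pa * qa * da)**2 + 4 * (pb * qb * db)**2 + 8 * delta**2 * da * db
    D1 = 2 * pa * qa * (pa - qa) * alpha_a * da + 2 * pb * qb * (pb - qb) * alpha_b * db
    D2 = 4 * pa * qa * (pa - qa)**2 * da**2 + 4 * pb * qb * (pb - qb)**2 * db**2
    sA = (1 + F) * sA0 + 2 * (k - 2 * F) * delta * alpha_a * alpha_b
    ld_terms = ((1 - F) * (cn - 1 + F) * pa * qa * pb * qb
                + (pa - qa) * (pb - qb) * ((1 - F) * cn - (1 - 2 * F) + k / 2) * delta / 2
                + ((1 - F) * cn - (1 - F**2)) * delta**2)
    sD = (1 - F**2) * sD0 + F * D2 + 8 * ld_terms * da * db
    sAD = 2 * F * D1 + (2 * F + k) * delta * ((pb - qb) * alpha_a * db + (pa - qa) * alpha_b * da)
    return {"F": F, "A": sA, "D": sD, "AD": sAD, "G": sA + sD + 2 * sAD}


for n in (0, 1, 2, 5, 10):
    v = selfing_variances(pa=0.3, pb=0.6, alpha_a=24.8, alpha_b=13.8, da=12.0, db=6.0,
                          delta=0.05, r=0.1, n=n)
    print(n, {key: round(value, 3) for key, value in v.items()})
```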
Assuming LD but no inbreeding, the genotypic variance after n generations of random crossing is \({\sigma }_{G}^{2\left(n\right)}={\sigma }_{A}^{2\left(n\right)}+{\sigma }_{D}^{2\left(n\right)}\), because \({\sigma }_{A,D}^{\left(n\right)}=0\), where:
$${\sigma }_{A}^{2\left(n\right)}=2{p}_{a}{q}_{a}{\alpha }_{a}^{2}+2{p}_{b}{q}_{b}{\alpha }_{b}^{2}+4{\left(1-{r}_{ab}\right)}^{n}{\varDelta }_{ab}^{(-1)}{\alpha }_{a}{\alpha }_{b}$$
$${\sigma }_{D}^{2\left(n\right)}=4{p}_{a}^{2}{q}_{a}^{2}{d}_{a}^{2}+4{p}_{b}^{2}{q}_{b}^{2}{d}_{b}^{2}+8{\left[{{\left(1-{r}_{ab}\right)}^{n}\varDelta }_{ab}^{(-1)}\right]}^{2}{d}_{a}{d}_{b}$$
Thus, the genotypic variance can increase or decrease after n generations of random crossing in a non-inbred population, depending on the sign of the LD measure, because the LD terms decay by the factor \({\left(1-{r}_{ab}\right)}^{n}\). The LD value is positive for genes in coupling phase and negative for genes in repulsion phase.
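The effect of random crossing can be illustrated by iterating the two expressions above. In this sketch (hypothetical function name and parameter values), the increasing alleles A and B are assumed to have positive average effects and positive dominance, so that the sign of the LD measure sets the direction of change of the additive variance:

```python
def random_crossing_variance(pa, pb, alpha_a, alpha_b, da, db, delta, r, n):
    """sigma_G^2 = sigma_A^2 + sigma_D^2 after n generations of random crossing (F = 0)."""
    qa, qb = 1 - pa, 1 - pb
    ld = (1 - r) ** n * delta                       # LD decays by (1 - r_ab) per generation
    sA = 2 * pa * qa * alpha_a**2 + 2 * pb * qb * alpha_b**2 + 4 * ld * alpha_a * alpha_b
    sD = 4 * (pa * qa * da)**2 + 4 * (pb * qb * db)**2 + 8 * ld**2 * da * db
    return sA + sD


for label, delta in (("coupling", 0.1), ("repulsion", -0.1)):
    trajectory = [round(random_crossing_variance(0.5, 0.5, 1.0, 1.0, 0.6, 0.6, delta, 0.1, n), 4)
                  for n in range(11)]
    print(label, trajectory)
```

With these values, the genotypic variance decreases toward its linkage-equilibrium value under coupling-phase LD and increases toward it under repulsion-phase LD.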
Epistasis in non-inbred and inbred populations in LD
The quantitative genetics theory for modelling epistasis in a population in LD is a generalization of the theory proposed by Kempthorne [16], who assumed a non-inbred population in linkage equilibrium and any number of alleles. We assumed biallelic genes. It should be emphasized that Kempthorne's theory allows a generalization from two to three or more interacting genes. However, fitting three or more interacting genes in a population in LD is a challenge, because the genotype probabilities for three or more genes in LD are too complex to derive. Furthermore, only complementary and duplicate epistasis can be easily defined for three or more epistatic genes.
Assume now that the two previously defined genes are epistatic. The genotypic value is [16]:
$${G}_{ijkl}=M+{\alpha }_{i}^{1}+{\alpha }_{j}^{1}+{\alpha }_{k}^{2}+{\alpha }_{l}^{2}+{\delta }_{ij}^{1}+{\delta }_{kl}^{2}+{\left({\alpha }^{1}{\alpha }^{2}\right)}_{ik}+{\left({\alpha }^{1}{\alpha }^{2}\right)}_{jk}+{\left({\alpha }^{1}{\alpha }^{2}\right)}_{il}+{\left({\alpha }^{1}{\alpha }^{2}\right)}_{jl}+{ \left({\alpha }^{1}{\delta }^{2}\right)}_{ikl}+{\left({\alpha }^{1}{\delta }^{2}\right)}_{jkl}+{\left({{\delta }^{1}\alpha }^{2}\right)}_{ijk}+{\left({{\delta }^{1}\alpha }^{2}\right)}_{ijl}+{\left({{\delta }^{1}\delta }^{2}\right)}_{ijkl}=M+A+D+AA+AD+ DA+DD$$
where AA, AD, DA, and DD are the additive x additive, additive x dominance, dominance x additive, and dominance x dominance epistatic genetic values.
The parametric values of the 36 parameters for the nine genotypic values are obtained by solving the equations \(\beta ={\left({X}^{\prime }VX\right)}^{-1}{X}^{\prime }Vy\) under the restrictions defined by Kempthorne [16], where \(X\) is the incidence matrix, \(V=\mathrm{diag}\left\{{f}_{ij}^{\left(n\right)}\right\}\) is the diagonal matrix of the genotype probabilities, and \(y\) is the vector of the genotypic values \(\left({G}_{ij}\right)\) (i, j = 0, 1, and 2).
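The weighted least-squares step can be illustrated on a reduced, full-rank design. The sketch below is not the 36-parameter model with Kempthorne's restrictions: it fits only an intercept and the numbers of copies of A and B to non-epistatic genotypic values, with \(V\) built from the genotype probabilities of a non-inbred population in LD (random union of gametes with haplotype frequencies \(p_ap_b+\Delta\), \(p_aq_b-\Delta\), \(q_ap_b-\Delta\), and \(q_aq_b+\Delta\)). Function names and parameter values are hypothetical.

```python
import itertools

import numpy as np


def genotype_probs(pa, pb, delta):
    """f[i, j]: probability of i copies of A and j copies of B (random union of gametes)."""
    qa, qb = 1 - pa, 1 - pb
    hap = {(1, 1): pa * pb + delta, (1, 0): pa * qb - delta,
           (0, 1): qa * pb - delta, (0, 0): qa * qb + delta}
    f = np.zeros((3, 3))
    for h1, h2 in itertools.product(hap, repeat=2):
        f[h1[0] + h2[0], h1[1] + h2[1]] += hap[h1] * hap[h2]
    return f


def genotypic_values(m, a_a, d_a, a_b, d_b):
    """Non-epistatic G[i, j]: midpoint plus the value of each gene (index = copies)."""
    va = np.array([-a_a, d_a, a_a])   # aa, Aa, AA
    vb = np.array([-a_b, d_b, a_b])   # bb, Bb, BB
    return m + va[:, None] + vb[None, :]


def wls(f, G):
    """beta = (X'VX)^{-1} X'Vy for the design: intercept, copies of A, copies of B."""
    i, j = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
    X = np.column_stack([np.ones(9), i.ravel(), j.ravel()])
    V = np.diag(f.ravel())
    y = G.ravel()
    return np.linalg.solve(X.T @ V @ X, X.T @ V @ y)


f = genotype_probs(0.3, 0.6, 0.05)
G = genotypic_values(100.0, 20.0, 12.0, 15.0, 6.0)
print(wls(f, G))   # intercept and the two slopes
```

Under these assumptions, the two slopes equal the average effects of an allelic substitution, \(a+\left(q-p\right)d\) for each gene, even in LD, and the weighted variance of the fitted additive values reproduces \(2{p}_{a}{q}_{a}{\alpha }_{a}^{2}+2{p}_{b}{q}_{b}{\alpha }_{b}^{2}+4{\varDelta }_{ab}^{(-1)}{\alpha }_{a}{\alpha }_{b}\).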
Kempthorne [16] provided explicit functions for all effects because he assumed linkage equilibrium. Assuming LD makes it very difficult to derive such functions, but the following results hold:
1) the expectation of the breeding value is zero regardless of the degree of inbreeding in the population.
2) the expectation of the dominance value is \(E{\left(D\right)}^{\left(n\right)}={p}_{a}{q}_{a}F\left({\delta }_{AA}-{2\delta }_{Aa}{+\delta }_{aa}\right)+{p}_{b}{q}_{b}F\left({\delta }_{BB}-{2\delta }_{Bb}{+\delta }_{bb}\right)\); then, defining the dominance value in an inbred population as the dominance value expressed as a deviation from its mean \(\left({D}^{\left(n\right)}=D-E{\left(D\right)}^{\left(n\right)}\right)\), \(E\left({D}^{\left(n\right)}\right)=0\).
3) the expectation of the additive x additive value is zero only if there is no LD.
4) the expectation of the additive x dominance value is zero only if F = 0 or p = q for all genes.
5) the expectation of the dominance x additive value is zero only if F = 0 or p = q for all genes.
6) the expectation of the dominance x dominance value is zero only if F = 0 and there is no LD.
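For a single gene, result 2 follows directly from the genotype frequencies after inbreeding. A worked sketch for gene A/a, where \({D}_{A}\) (notation introduced here) is the contribution of A/a to the dominance value, assuming the genotype frequencies \({p}_{a}^{2}+F{p}_{a}{q}_{a}\), \(2{p}_{a}{q}_{a}\left(1-F\right)\), and \({q}_{a}^{2}+F{p}_{a}{q}_{a}\) and Kempthorne's restriction \({p}_{a}^{2}{\delta }_{AA}+2{p}_{a}{q}_{a}{\delta }_{Aa}+{q}_{a}^{2}{\delta }_{aa}=0\):
$$E{\left({D}_{A}\right)}^{\left(n\right)}=\left({p}_{a}^{2}+F{p}_{a}{q}_{a}\right){\delta }_{AA}+2{p}_{a}{q}_{a}\left(1-F\right){\delta }_{Aa}+\left({q}_{a}^{2}+F{p}_{a}{q}_{a}\right){\delta }_{aa}=\left({p}_{a}^{2}{\delta }_{AA}+2{p}_{a}{q}_{a}{\delta }_{Aa}+{q}_{a}^{2}{\delta }_{aa}\right)+{p}_{a}{q}_{a}F\left({\delta }_{AA}-2{\delta }_{Aa}+{\delta }_{aa}\right)={p}_{a}{q}_{a}F\left({\delta }_{AA}-2{\delta }_{Aa}+{\delta }_{aa}\right)$$
With the biallelic deviations \({\delta }_{AA}=-2{q}_{a}^{2}{d}_{a}\), \({\delta }_{Aa}=2{p}_{a}{q}_{a}{d}_{a}\), and \({\delta }_{aa}=-2{p}_{a}^{2}{d}_{a}\), the contrast is \({\delta }_{AA}-2{\delta }_{Aa}+{\delta }_{aa}=-2{d}_{a}{\left({p}_{a}+{q}_{a}\right)}^{2}=-2{d}_{a}\), so that \(E{\left({D}_{A}\right)}^{\left(n\right)}=-2F{p}_{a}{q}_{a}{d}_{a}\), in agreement with \(E\left({D}^{\left(0\right)}\right)=-2Fpqd\) for a single gene.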
Thus, defining the additive x additive, additive x dominance, dominance x additive, and dominance x dominance epistatic values as deviations from their means, \({AA}^{\left(n\right)}=AA-E{\left(AA\right)}^{\left(n\right)}\), \({AD}^{\left(n\right)}=AD-E{\left(AD\right)}^{\left(n\right)}\), \({DA}^{\left(n\right)}=DA-E{\left(DA\right)}^{\left(n\right)}\), and \({DD}^{\left(n\right)}=DD-E{\left(DD\right)}^{\left(n\right)}\), the genotypic value in an inbred population can be expressed as
$$G=M+E{\left(D\right)}^{\left(n\right)}+E{\left(AA\right)}^{\left(n\right)}+E{\left(AD\right)}^{\left(n\right)}+E{\left(DA\right)}^{\left(n\right)}+E{\left(DD\right)}^{\left(n\right)}+A+{D}^{\left(n\right)}+{AA}^{\left(n\right)}+{ AD}^{\left(n\right)}+{DA}^{\left(n\right)}+{DD}^{\left(n\right)}={M}_{F}+A+{D}^{\left(n\right)}+{AA}^{\left(n\right)}+{AD}^{\left(n\right)}+{DA}^{\left(n\right)}+{DD}^{\left(n\right)}$$
This implies that \(E\left(G\right)={M}_{F}\). If F = 0 then
$$G=M+E\left(AA\right)+E\left(DD\right)+A+D+\left[AA-E\left(AA\right)\right]+AD+DA+\left[DD-E\left(DD\right)\right]={M}^{*}+A+D+{AA}^{*}+AD+DA+{DD}^{*}$$
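where \({M}^{*}=M+E\left(AA\right)+E\left(DD\right)\), \({AA}^{*}=AA-E\left(AA\right)\), and \({DD}^{*}=DD-E\left(DD\right)\), with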
\(E\left(AA\right)=2{\varDelta }_{ab}^{(-1)}\left({\alpha }_{A}{\alpha }_{B}-{\alpha }_{A}{\alpha }_{b}-{\alpha }_{a}{\alpha }_{B}+{\alpha }_{a}{\alpha }_{b}\right)\) and \(E\left(DD\right)={\left[{\varDelta }_{ab}^{(-1)}\right]}^{2}\left({\delta }_{AA}{\delta }_{BB}-2{\delta }_{AA}{\delta }_{Bb}+{\delta }_{AA}{\delta }_{bb}-2{\delta }_{Aa}{\delta }_{BB}+4{\delta }_{Aa}{\delta }_{Bb}-2{\delta }_{Aa}{\delta }_{bb}+{\delta }_{aa}{\delta }_{BB}-2{\delta }_{aa}{\delta }_{Bb}+{\delta }_{aa}{\delta }_{bb}\right)\).
This implies that \(E\left(G\right)={M}^{*}\). If F = 0 and there is no LD,
$$G=M+A+D+AA+AD+DA+DD$$
where the linear components are those defined by Kempthorne [16]. This implies that \(E\left(G\right)=M\).
In non-inbred populations in LD, the additive and dominance values are the only pair of genetic values that are not correlated. The genotypic variance in these populations is, in simplified form,
$${\sigma }_{G}^{2\left(0\right)}={\sigma }_{A}^{2\left(0\right)}+{\sigma }_{D}^{2\left(0\right)}+{\sigma }_{AA}^{2\left(0\right)}+2{\sigma }_{A,AA}^{\left(0\right)}+2{\sigma }_{D,AA}^{\left(0\right)}+\dots$$
where
$${\sigma }_{AA}^{2\left(0\right)}={f}_{22}^{\left(0\right)}{\left[\left({4\alpha }_{A}{\alpha }_{B}\right)\right]}^{2}+\dots +{f}_{00}^{\left(0\right)}{\left[\left({4\alpha }_{a}{\alpha }_{b}\right)\right]}^{2}-{\left[E{\left(AA\right)}^{\left(0\right)}\right]}^{2}$$
$${\sigma }_{A,AA}^{\left(0\right)}=2{\varDelta }_{ab}^{(-1)}\left[{\alpha }^{A}\left({\alpha }_{A}{\alpha }_{B}{-\alpha }_{A}{\alpha }_{b}+{\alpha }_{a}{\alpha }_{B}-{\alpha }_{a}{\alpha }_{b}\right)+{\alpha }^{B}\left({\alpha }_{A}{\alpha }_{B}{-\alpha }_{a}{\alpha }_{B}+{\alpha }_{A}{\alpha }_{b}-{\alpha }_{a}{\alpha }_{b}\right)\right]$$
$${\sigma }_{D,AA}^{\left(0\right)}=-4{\varDelta }_{ab}^{(-1)}\left[{{p}_{a}{q}_{a}d}_{a}\left({\alpha }_{A}{\alpha }_{B}{-\alpha }_{A}{\alpha }_{b}-{\alpha }_{a}{\alpha }_{B}+{\alpha }_{a}{\alpha }_{b}\right)+{{p}_{b}{q}_{b}d}_{b}\left({\alpha }_{A}{\alpha }_{B}{-\alpha }_{a}{\alpha }_{B}-{\alpha }_{A}{\alpha }_{b}+{\alpha }_{a}{\alpha }_{b}\right)\right]$$
where, to avoid confusion, \({\alpha }^{A}\) and \({\alpha }^{B}\) are the average effects of an allelic substitution.
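The expressions for \(E\left(AA\right)\), \({\sigma }_{A,AA}^{\left(0\right)}\), and \({\sigma }_{D,AA}^{\left(0\right)}\) can be checked by enumerating all ordered pairs of gametes in a non-inbred population in LD. The sketch below assumes that \({\alpha }_{A}{\alpha }_{B}\), \({\alpha }_{A}{\alpha }_{b}\), etc. in these expressions denote the additive x additive effects \({\left({\alpha }^{1}{\alpha }^{2}\right)}_{AB}\), \({\left({\alpha }^{1}{\alpha }^{2}\right)}_{Ab}\), etc., and it builds hypothetical effects from a single parameter t so that their frequency-weighted row and column sums are zero (our reading of the restrictions). The allele average effects, dominance deviations, function names, and parameter values are illustrative.

```python
import itertools


def dominance_deviation(x, y, p, d):
    """Biallelic dominance deviation for alleles x and y (uppercase = increasing allele)."""
    q = 1 - p
    copies = int(x.isupper()) + int(y.isupper())
    return {2: -2 * q * q * d, 1: 2 * p * q * d, 0: -2 * p * p * d}[copies]


def aa_moments(pa, pb, delta, alpha_a, alpha_b, da, db, t):
    """Enumerated and closed-form E(AA), cov(A, AA), and cov(D, AA) with F = 0."""
    qa, qb = 1 - pa, 1 - pb
    # Allele average effects; alpha_a and alpha_b play the role of alpha^A and alpha^B
    eff = {"A": qa * alpha_a, "a": -pa * alpha_a, "B": qb * alpha_b, "b": -pb * alpha_b}
    # Hypothetical additive x additive effects with zero frequency-weighted margins
    w = {("A", "B"): t * qa * qb, ("A", "b"): -t * qa * pb,
         ("a", "B"): -t * pa * qb, ("a", "b"): t * pa * pb}
    # Gamete (haplotype) frequencies in LD
    hap = {("A", "B"): pa * pb + delta, ("A", "b"): pa * qb - delta,
           ("a", "B"): qa * pb - delta, ("a", "b"): qa * qb + delta}
    moments = dict.fromkeys(("A", "D", "AA", "A*AA", "D*AA"), 0.0)
    # Random union of gametes (i, k) and (j, l)
    for (i, k), (j, l) in itertools.product(hap, repeat=2):
        prob = hap[(i, k)] * hap[(j, l)]
        A = eff[i] + eff[j] + eff[k] + eff[l]
        D = dominance_deviation(i, j, pa, da) + dominance_deviation(k, l, pb, db)
        AA = w[(i, k)] + w[(j, l)] + w[(j, k)] + w[(i, l)]
        for key, value in (("A", A), ("D", D), ("AA", AA), ("A*AA", A * AA), ("D*AA", D * AA)):
            moments[key] += prob * value
    enumerated = {"E(AA)": moments["AA"],
                  "cov(A,AA)": moments["A*AA"] - moments["A"] * moments["AA"],
                  "cov(D,AA)": moments["D*AA"] - moments["D"] * moments["AA"]}
    wAB, wAb, waB, wab = w[("A", "B")], w[("A", "b")], w[("a", "B")], w[("a", "b")]
    closed_form = {
        "E(AA)": 2 * delta * (wAB - wAb - waB + wab),
        "cov(A,AA)": 2 * delta * (alpha_a * (wAB - wAb + waB - wab)
                                  + alpha_b * (wAB - waB + wAb - wab)),
        "cov(D,AA)": -4 * delta * (pa * qa * da * (wAB - wAb - waB + wab)
                                   + pb * qb * db * (wAB - waB - wAb + wab)),
    }
    return enumerated, closed_form


enumerated, closed_form = aa_moments(pa=0.3, pb=0.6, delta=0.05, alpha_a=1.5,
                                     alpha_b=-0.8, da=0.7, db=0.4, t=2.0)
for key in closed_form:
    print(key, round(enumerated[key], 10), round(closed_form[key], 10))
```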
The assumption of LD makes it very difficult to derive the components of the genotypic variance (additive, dominance, and epistatic variances and the covariances between these effects), even assuming non-inbred populations, biallelic genes, and only digenic epistasis. With respect to the types of digenic epistasis, the following can be defined [25, 26]:
- Complementary (\({G}_{22}={G}_{21}={G}_{12}={G}_{11}\) and \({G}_{20}={G}_{10}={G}_{02}={G}_{01}={G}_{00}\); a 9:7 ratio in an F2).
- Duplicate (\({G}_{22}={G}_{21}={G}_{20}={G}_{12}={G}_{11}={G}_{10}={G}_{02}={G}_{01}\); a 15:1 ratio in an F2).
- Dominant (\({G}_{22}={G}_{21}={G}_{20}={G}_{12}={G}_{11}={G}_{10}\) and \({G}_{02}={G}_{01}\); a 12:3:1 ratio in an F2).
- Recessive (\({G}_{22}={G}_{21}={G}_{12}={G}_{11}\), \({G}_{02}={G}_{01}\), and \({G}_{20}={G}_{10}={G}_{00}\); a 9:3:4 ratio in an F2).
- Dominant and recessive (\({G}_{22}={G}_{21}={G}_{12}={G}_{11}={G}_{20}={G}_{10}={G}_{00}\) and \({G}_{02}={G}_{01}\); a 13:3 ratio in an F2).
- Duplicate genes with cumulative effects (\({G}_{22}={G}_{21}={G}_{12}={G}_{11}\) and \({G}_{20}={G}_{10}={G}_{02}={G}_{01}\); a 9:6:1 ratio in an F2).
- Non-epistatic genic interaction (\({G}_{22}={G}_{21}={G}_{12}={G}_{11}\), \({G}_{20}={G}_{10}\), and \({G}_{02}={G}_{01}\); a 9:3:3:1 ratio in an F2).
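The F2 ratios in the list follow from the F2 genotype probabilities (1/4, 1/2, and 1/4 for 2, 1, and 0 copies at each gene). In the sketch below, \({G}_{ij}\) is read as the genotype with i copies of A and j copies of B (so \({G}_{22}\) is AABB), and any genotype not listed for a type is assumed to form its own class; the dictionary and function names are hypothetical.

```python
from fractions import Fraction

# F2 probability of 2, 1, or 0 copies of the first allele at one gene
F2 = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}

# Groups of genotypes with equal genotypic values, as listed in the text
EPISTASIS = {
    "complementary": [["22", "21", "12", "11"], ["20", "10", "02", "01", "00"]],
    "duplicate": [["22", "21", "20", "12", "11", "10", "02", "01"]],
    "dominant": [["22", "21", "20", "12", "11", "10"], ["02", "01"]],
    "recessive": [["22", "21", "12", "11"], ["02", "01"], ["20", "10", "00"]],
    "dominant and recessive": [["22", "21", "12", "11", "20", "10", "00"], ["02", "01"]],
    "duplicate with cumulative effects": [["22", "21", "12", "11"], ["20", "10", "02", "01"]],
    "non-epistatic": [["22", "21", "12", "11"], ["20", "10"], ["02", "01"]],
}


def f2_ratio(groups):
    """Class frequencies in sixteenths; unlisted genotypes form their own classes."""
    genotypes = [f"{i}{j}" for i in (2, 1, 0) for j in (2, 1, 0)]
    listed = {g for group in groups for g in group}
    classes = groups + [[g] for g in genotypes if g not in listed]

    def prob(g):
        return F2[int(g[0])] * F2[int(g[1])]

    return [int(16 * sum(prob(g) for g in cls)) for cls in classes]


for name, groups in EPISTASIS.items():
    print(name, ":".join(map(str, f2_ratio(groups))))
```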
Simulated data sets
Because the magnitude of the components of the genotypic variance generally cannot be inferred from the previous functions, all means and genetic variances and covariances were computed from simulated data sets provided by the software REALbreeding (available upon request). This software uses the quantitative genetics theory described in the previous sections and in Viana [24]. REALbreeding has been used to provide simulated data in investigations in the areas of genomic selection [27], GWAS [28], QTL mapping [29], linkage disequilibrium [30], population structure [31], and heterotic grouping/genetic diversity [32].
The software simulates individual genotypes for genes and molecular markers, and individual phenotypes, in three steps based on user inputs. The first step (genome simulation) is the specification of the number of chromosomes, molecular markers, and genes, as well as the marker type and density. The second step (population simulation) is the specification of the population(s) and of the sample size or the progeny number and size. A population is characterized by the average frequency for the genes (biallelic) and markers (first allele). The final step (trait simulation) is the specification of the individual phenotypes. In this step, the user specifies the minimum and maximum genotypic values for homozygotes (to compute the a deviations), the minimum and maximum phenotypic values (to avoid outliers), the direction and degree of dominance (to compute the dominance deviations, d), and the broad sense heritability. The current version allows the inclusion of digenic epistasis, gene x environment interaction, and multiple traits (up to 10), including pleiotropy. Depending on the population, the population mean (M) and the additive (A), dominance (D), and epistatic (AA, AD, DA, and DD) genetic values, or the general and specific combining ability effects (GCA and SCA), or the genotypic values (G) and epistatic values (I), are calculated from the parametric gene effects and frequencies and the parametric LD values. The phenotypic values (\(P\)) are computed assuming error effects \(\left(E\right)\) sampled from a normal distribution (\(P=M+A+D+AA+AD+DA+DD+E=G+E\) or \(P=M+GCA1+GCA2+SCA+I+E=G+E\)). The population in LD is generated by crossing two populations in linkage equilibrium, followed by one generation of random crossing. This generation of random crossing aims to generate a population in Hardy-Weinberg equilibrium. Thus, generation 0 (the founder population) is a population in Hardy-Weinberg equilibrium, in LD for linked genes and molecular markers, and its individuals are not related.
The parametric LD in this population is \({\varDelta }_{ab}^{(-1)}=\left[\left(1-2{r}_{ab}\right)/4\right]\left({p}_{a1}-{p}_{a2}\right)\left({p}_{b1}-{p}_{b2}\right)\), where the subscripts 1 and 2 refer to the allele frequencies in parental populations 1 and 2.
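For a given pair of genes, the parametric LD and the derived r² and D' can be computed as follows. This is a sketch, not REALbreeding code: it assumes that the allele frequencies in the cross are the averages of the parental frequencies and that both genes are polymorphic, and it uses the standard definitions \({r}^{2}={\varDelta }^{2}/\left({p}_{a}{q}_{a}{p}_{b}{q}_{b}\right)\) and \(D'=\varDelta /{\varDelta }_{max}\); the recombination frequency is taken as given rather than derived from a map distance, and the function name is hypothetical.

```python
def parametric_ld(pa1, pa2, pb1, pb2, r):
    """Delta, r^2, and signed D' in generation 0 for genes A/a and B/b.

    pa1, pa2: frequency of A in parental populations 1 and 2; pb1, pb2: same for B;
    r: recombination frequency between the two genes.
    """
    delta = (1 - 2 * r) / 4 * (pa1 - pa2) * (pb1 - pb2)
    pa, pb = (pa1 + pa2) / 2, (pb1 + pb2) / 2     # allele frequencies after the cross
    qa, qb = 1 - pa, 1 - pb
    r2 = delta ** 2 / (pa * qa * pb * qb)          # requires both genes to be polymorphic
    # Largest |Delta| attainable given the allele frequencies and the sign of Delta
    dmax = min(pa * qb, qa * pb) if delta > 0 else min(pa * pb, qa * qb)
    d_prime = delta / dmax if delta != 0 else 0.0
    return delta, r2, d_prime


print(parametric_ld(0.9, 0.1, 0.8, 0.2, r=0.1))
```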
The quantitative genetics theory for epistasis does not solve the challenge of studying genetic variability and covariance between relatives in populations using simulated data sets, even in simplified scenarios such as linkage equilibrium and no inbreeding. Because the genotypic values for any two interacting genes are not known, infinitely many sets of genotypic values satisfy the specification of each type of digenic epistasis. For example, fixing the gene frequencies (the population) and the parameters m, a, d, and d/a (degree of dominance) for each gene (the trait), the solutions \({G}_{22}={G}_{21}={G}_{12}={G}_{11}\) = 5.25 and \({G}_{20}={G}_{10}={G}_{02}={G}_{01}={G}_{00}\) = 5.71, or \({G}_{22}={G}_{21}={G}_{12}={G}_{11}\) = 6.75 and \({G}_{20}={G}_{10}={G}_{02}={G}_{01}={G}_{00}\) = 2.71, both define complementary epistasis, but the genotypic values differ.
The solution implemented in the software allows the user to control the magnitude of the epistatic variance (V(I)) relative to the magnitudes of the additive and dominance variances (V(A) and V(D)). As input, the software requires the ratio V(I)/(V(A) + V(D)) for each pair of interacting genes (a single value; for example, 1.0). Then, for each pair of epistatic genes, the software samples a random value for the epistatic value \({I}_{22}\) (the epistatic value for the genotype AABB), assuming \({I}_{22}\sim N\left(0,V\left(I\right)\right)\). The other epistatic effects and genotypic values are then computed.
We simulated grain yield assuming 400 genes in 10 chromosomes (40 genes/chromosome) of 200 cM or 50 cM. The average density was approximately one gene every 5 cM and every 1 cM, respectively. We generated five populations: two with a high LD level and one with a low LD level, all three with an average allele frequency of 0.5, and two with an intermediate LD level and an average frequency of the favorable alleles of 0.3 (not improved) and 0.7 (improved). We defined positive dominance (average degree of dominance of 0.6), maximum and minimum genotypic values for homozygotes of 160 and 30 g.plt\(^{-1}\), and maximum and minimum phenotypic values of 180 and 10 g.plt\(^{-1}\). The broad sense heritability was 20%. For each population, we assumed an additive-dominance model and an additive-dominance model with digenic epistasis, with 100% or 30% of the genes interacting. Concerning the ratio V(I)/(V(A) + V(D)), analyses assuming ratios of 1, 10, and 100 showed that increasing the ratio from 1 to 10 and 100 increased the epistatic variances but also increased the additive and dominance variances. Because the main conclusions for the greater ratios were essentially the same as those for ratio 1, we present only the results for ratio 1. With epistasis, we assumed a single type or an admixture of the seven types. The degree of inbreeding ranged from 0.0 to 1.0 over 10 generations of selfing. We also assumed 10 generations of random crossing. The population size was 5,000 per generation.
The characterization of the LD in the populations was based on the parametric Δ, r², and D' values for the 40 genes in chromosome 1, which were provided by REALbreeding (the LD pattern should be similar in the other chromosomes). The heatmaps were generated using the R package pheatmap. Assuming no epistasis, the software provides the parametric additive and dominance genetic values and the parametric genetic variances and covariances. Assuming epistasis, the software provides the parametric additive, dominance, and epistatic genetic values; thus, under epistasis, the genetic variances and covariances were computed from the parametric genetic values, using a sample size of 5,000 individuals per generation. Two important implications of our results are that selection based on breeding value prediction remains the best approach for population improvement and that cross- and self-pollinated populations retain a non-negligible amount of genetic variation for quantitative traits, which preserves their potential to adapt to environmental changes, assuming LD and epistasis.