Plant materials
This experiment was performed using two RIL populations. The first was derived from crossing B73 and BY804, which consisted of 188 RILs. B73 is a famous elite line developed from the Iowa Stiff Stalk Synthetic population. BY804 is a special inbred line with a high kernel oil content, which was derived from a Beijing high-oil population. Another RIL population was composed of 215 lines, which were derived from a lodging-resistant maize hybrid with the elite inbred lines Zheng58 and HD568 as parents. Zheng58 is a famous inbred line that is the parent of the widely grown maize hybrid Zhengdan958. HD568 was purposefully selected with the criterion of high stalk strength. Finally, these two RIL populations are abbreviated as HO (high-oil) and LR (lodging-resistance) for simplicity.
Field trial and phenotyping
A randomized incomplete block design with two replicates was implemented for the trial in each year. For the HO population, an experiment was primarily performed in Beijing in the summer of 2012, in which all lines were sown in a single row and the planting density was 49,500 plants per hectare. In addition, both RIL populations were evaluated in Hainan Province in the winter of 2012 and Beijing in the summer of 2013. Furthermore, each line was assigned to a single-row plot, and the planting density was 60,000 plants per hectare. The RPR in seven stages, including the tenth-leaf stage (V10), days to silking (DTS), and 10 to 50 days after silking (AS10, AS20, AS30, AS40 and AS50), was evaluated in the middle of the flat side of the third stalk internode aboveground with an electronic rind penetrometer (AWOS-SL04, Aiwoshi Science & Technology Co., Ltd. Company, Hebei, China). In this experiment, two to five randomly selected plants were used to investigate RPR measurements, and the RPR of each line was determined by the mean of ten measures.
Phenotypic data analysis
Analysis of variance, broad-sense heritability and best linear unbiased estimation
Analysis of variance (ANOVA) was performed using the aov function in the R package stats version 3.6.0 (R Core Team, 2019). The linear model is as follows:
yijk = μ + gi + ej + geij + rjk +εijk,
where yijk is the phenotypic value of the ith genotype in the jth environment with the kth replicate, μ is the overall mean, gi is the effect of the ith genotype, ej is the effect of the jth environment, geij is the interaction effect between the ith genotype and the jth environment, rjk is the effect of the kth replicate within the jth environment, and εijk is the model residuals. Broad-sense heritability was estimated according to the formula:
H2 = /( + /e + /re),
where , , and are the variance components of genotype, genotype by environment interaction and random error, respectively, and r and e are the numbers of replicates and environments, respectively. In addition, the R package lme4 version 1.1-21 was used to perform the best unbiased linear estimation (BLUE) for the genetic effects with the following mixed linear model (MLM) [73, 74]:
yijk = μ + gi + ej + geij + rjk +εijk,
where yijk is the phenotypic value of individual i in the jth environment with the kth replicate, μ is the overall mean, gi is the fixed effect of the ith genotype, ej is the fixed effect of the jth environment, geij is the random interaction effect between the ith genotype and the jth environment, rjk is the random effect of the kth replicate within the jth environment, and εijk are the model residuals. The phenotypic values of RPR determined in the winter of 2012 and the summer of 2013 were jointly used to perform ANOVA and estimate the BLUE values. The BLUE values were used to perform QTL mapping to detect the relationship between H2 and the QTL number.
Construction of the hierarchical clustering of RPR in various stages
For the construction of the hierarchical clustering of RPR in two RIL populations, the BLUE values of each stage were first standardized to a zero mean and unit variance according to the following formula:
Yij = (Xij – mean(X.j)) / sd(X.j),
where Yij is the transformed value, Xij is the BLUE values of the ith genotype in the jth stage, mean() is defined as the mean value, and sd() is the standard deviation. Furthermore, the transformed values of RPR in each stage were used to calculate the Euclidean distances between all pairs of stages, which was performed by the euclidean method with the dist function in the R package stats version 3.6.0. The formula is as follows:
DAB = (2)1/2,
where DAB is the value of the Euclidean distance between stages A and B; YiA and YiB are the transformed values of the ith genotype in stages A and B, respectively; and n is the individual numbers of two RIL populations. The hclust function was used to construct the hierarchical clustering tree based on the distance values of all pairs of seven stages.
Phenotypic and genetic correlation
The phenotypic correlation (rp) is a measurement of the association between the phenotypic values of individuals for a pair of targeted traits. The genetic correlation (rg) is a parameter of genetics that can be used to evaluate the degree of association in genetic variation between two traits. The correlation coefficients are estimated by the following formulae [75–77]:
rp = covp(A,B)/;
rg = covg(A,B)/,
where covp() and covg() are the phenotypic and genetic covariances between traits, respectively, and Vp and Vg are the phenotypic and genetic variances of target traits, respectively. Phenotypic and genetic correlation analyses were performed using the cor function in the R package stats version 3.6.0 (R Core Team, 2019) and the asreml function in the R package ASReml version 3.0 [78], respectively. Phenotypic data of RPR evaluated in each environment were utilized to estimate the phenotypic correlation coefficient. However, the RPR values determined in the winter of 2012 and the summer of 2013 were used to perform an analysis of genetic correlation.
Genotypic data analysis
Genotyping and quality control
For the HO population, all inbred lines were used for genotyping with the MaizeSNP3K array, which is a subset of the Illumina MaizeSNP50 BeadChip [79]. Markers with missing rates greater than 0.20 and minor allele frequencies (MAFs) less than 0.05 were removed. However, the maize 55K SNP array [80] was used to genotype all the RILs in the LR population. The eligible SNP markers were retained according to the screening criteria, including MAFs greater than 0.05, missing values less than 0.10 and heterozygous allele frequencies less than 0.05. Moreover, chi-square tests were performed for all the SNPs in each population with the aim of filtering out markers with segregation distortion (P < 0.05) in the two RIL populations.
Construction of the bin map and QTL mapping
Bin markers were detected and aligned with the sliding-window approach, which was applied to identify variant calling errors and evaluate the ratio of SNP alleles derived from the parents. The detailed method of bin map construction was according to Liu et al. (2019) [81]. The genetic maps of both RIL populations were constructed by the Kosambi mapping function in the mstmap function in the R package ASMap version 1.0-4 [82]. QTL related to RPR were detected by composite interval mapping using the cim function in the R package R/qtl version 1.44-9 [83]. The threshold of likelihood of odds (LOD) ratio was determined by 1,000 permutation tests with a P value less than 0.05. After the permutation tests, a threshold LOD value of 3.5 was used to determine QTL for RPR. The confidence interval of each QTL was defined as a 1.5-LOD drop from the peak LOD value for each marker. If the physical position of the confidence intervals corresponding to each QTL had an overlapped genomic region, this region was considered a pleiotropic QTL. Furthermore, the most likely candidate genes within the confidence interval were consulted and selected from the maize genetics and genomics database (MaizeGDB, https://www.maizegdb.org/). Because fewer qualified markers were retained in the HO population, these SNPs were directly used to construct a linkage map and perform further analyses without detecting bin markers. However, the bin markers could be checked and used to perform subsequent analyses in the LR population.
Analysis of GO enrichment and KEGG pathway
Gene ontology (GO) enrichment analysis was performed using singular enrichment analysis (SEA) by AgriGO version 2.0 (http://systemsbiology.cau.edu.cn/agriGOv2/ index.php) with the Fisher statistical test method and Yekutieli multitest adjustment method at a significance level of P < 0.05 [84]. Additionally, the slim of GO function was summarized by GOSlimViewer (https://agbase.arizona.edu/cgi-bin/tools/goslimviewer_select.pl) of AgBase [85]. For Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, KOBAS version 3.0 (http://kobas.cbi.pku.edu.cn/index.php) was used to perform an analysis of the functional gene set enrichment with Fisher statistical method and the Benjamini and Yekutieli FDR (false discovery rate) correction method (P < 0.05). Significant GO items and KEGG entries were extracted to draw plots.
Genomic selection
A 5-fold cross-validation scheme with 100 replicates was used to assess the performance of each model and calculate the prediction accuracy (rMP), which was the correlation coefficient between genomic-estimated breeding values (GEBVs) and phenotypic values. Three models were used to perform cross-validation, which were developed from the genomic best linear unbiased prediction (GBLUP) model. The univariate model (UV) is essentially a general form of the GBLUP model, and the mixed model is as follows [86, 87]:
y = 1nμ + u + ε,
where y is a vector (n×1) of phenotypic values, 1n is a vector (n×1) of ones, μ is the overall mean, u is the random effects that obeys a normal distribution N(0,G), is the genetic variance, G is the genomic relationship matrix among all genotypes calculated following VanRaden (2008) [86], ε is a vector (n×1) of random terms with a normal distribution N(0,I), and I is an identity matrix. In addition, n is the number of individuals. For the GBLUP model including fixed effects (FIXED), the formula can be described as [81]:
y = Xβ + u + ε,
where β is the vector (n×1) of fixed effects, X is the (n×p) design matrix, and the other parameters are identical to the description mentioned above. The markers corresponding to the peak LOD value within each QTL were selected to construct the X design matrix, and p denotes the number of target markers. However, the G matrix is calculated by the design matrix containing m-p markers, where m is the number of all markers in each RIL population. Finally, the multivariate model was developed from the univariate model, and the model was as follows [66]:
y = 1nμ + u + v + ε,
where y is a vector (n×1) of the target variate, v is the random effects for auxiliary variates with a normal distribution N(0,Gv), is the variance component of v, and the other parameters are identical to the description mentioned above. The Gv is the multivariate relationship matrix, which was calculated as follows: Gv = nMvMv’/trace(MvMv’), where Mv = [y1,y2,…,yi,...,yt-1], yi is a scaled vector (n×1) of the phenotypic values of the ith environment or stage that were standardized to zero mean and unit variance, t is the number of all variates, and trace denotes the sum of all diagonal elements. The phenotypic data of the t-1 environments or stages are recognized as auxiliary variates in the model. If auxiliary variates were derived from multiple environments, the multivariate model would be abbreviated as ME. For auxiliary variates derived from multiple stages, the model was represented as MS for short. These models were fitted using the R package BGLR version 1.0.8 [88], and the iteration of the Gibbs sampler was set to 10,000, with the first 5000 samples discarded as burn in.