Natural variation in KRN within the association panel
KRN was measured within our association panel, which included 639 maize inbred lines, in XX (Xinxiang in Henan Province, 35.19°N, 113.53°E), BJ (Beijing, 39.48°N, 116.28°E) and GZL (Gongzhuling in Jilin Province, 43.50°N, 124.82°E) in 2011 (Table S1). The results showed that KRN was normally distributed in each environment, and the KRNs among environments were highly positively correlated, with correlations ranging from 0.73 between XX and BJ to 0.79 between XX and GZL (Figure 1a). KRN exhibited high broad-sense heritability (H2 = 0.90, Table 1), which was similar to the results of previous studies [14, 16]. Comparing KRN among the different environments, we found that it showed the smallest average (13.69), minimum (8.60) and maximum (20.60) values in XX, where all accessions were planted in summer (June). With increasing latitude, where the accessions were planted in spring (May), the average KRN increased (14.65 in BJ and 14.59 in GZL). The largest range (max - min) in KRN appeared in GZL (12.60), which had the longest day length (Table 1). Based on previous results [53], our association panel could be divided into five subgroups: Reid, tangsipingtou (TSPT), lvdahonggu (LRC), Lancaster and P. The KRN statistical analysis results of various subgroups are shown in Table S2. There were no significant differences in KRN among the five subgroups (Figure 1b). These results indicated that KRN was a quantitative trait and that the phenotypic variation among the tested inbred lines in the association panel was beneficial for dissecting the genetic architecture of KRN.
Table 1. Phenotypic variance in KRN for 639 maize inbred lines in three environments.
Env
|
Mean
|
Min
|
Max
|
SD
|
CV (%)
|
H2
|
|
|
XX
|
13.69
|
8.60
|
20.60
|
2.02
|
14.76
|
0.90
|
|
BJ
|
14.65
|
9.20
|
21.00
|
1.69
|
11.56
|
|
GZL
|
14.59
|
8.60
|
21.20
|
2.00
|
13.69
|
|
BLUP
|
14.31
|
9.17
|
20.01
|
1.61
|
11.27
|
|
|
Env, environment; XX, Xinxiang; BJ, Beijing; GZL, Gongzhuling; Max, maximum; Min, minimum; SD, standard deviation; CV, coefficient of variation; H2, broad-sense heritability.
QTNs for KRN identified by different methods
Single-locus analysis of KRN (MLM)
Based on the MaizeSNP50 BeadChip, we obtained 42,667 high-quality SNPs distributed on 10 maize chromosomes. Under the P < 0.0001 and P < 0.001 thresholds, 3/56, 3/46, 1/24, and 3/51 KRN-associated QTNs were found in XX (Figure 2a), in BJ (Figure 2b), in GZL (Figure 2c) and with BLUP (Figure 2d), respectively. To account for overcorrection in this model, the P < 0.001 threshold was selected to identify KRN-associated QTNs. Finally, 177 QTNs were found to be associated with KRN, and the proportion of phenotypic variance explained (PVE) by these individual QTNs ranged from 1.84 to 4.01% (Table S3).
Multiple-locus analysis of KRN
Using different multiple-locus models, we identified different numbers of significant QTNs for KRN in XX, BJ, and GZL and together with BLUP across all locations. These QTNs were unevenly distributed on 10 chromosomes, with the most QTNs on Chr. 1 and the fewest on Chr. 8 (Figure 2e). Specifically, 15 (FASTmrEMMA)-177 (mrMLM) QTNs in XX, 11 (FASTmrEMMA)-30 (ISIS EM-BLASSO) QTNs in BJ, 12 (FASTmrEMMA)-55 (mrMLM) QTNs in GZL and 11 (FASTmrEMMA)-106 (mrMLM) QTNs for BLUP were identified by the six different methods (Table S4). Comparative analysis of the GWAS results among different statistical approaches showed that FASTmrEMMA detected the fewest QTNs in all the environments, while mrMLM detected the most QTNs in all the environments, except for BJ (Table S4). QTN overlap analysis among the seven methods indicated that the common QTNs codetected by at least two methods accounted for more than 40% of the QTNs in different environments (Figure S1a and Table S5, 42% in XX, 62% in BJ, 58% in GZL and 47% with BLUP). For example, 65 common QTNs representing 30 loci were codetected by two methods in XX, and 39 common QTNs representing 13 loci, 28 common QTNs representing 7 loci, 25 common QTNs representing 5 loci, and 6 common QTNs representing 1 locus were codetected by three, four, five and six methods, respectively (Figure S1a and Table S5). No QTNs were identified by all 7 methods in different locations. Overall, ISIS EM-BLASSO, which detected the third largest number of QTNs, identified the most codetected QTNs, followed by FASTmrMLM (Figure S1a and Table S5). Comparative analysis of the GWAS results among the different environments showed that the majority of the QTNs identified by the MLM method and ISIS EM-BLASSO were repeatedly detected in different locations (Figure S1b, Table S6).
Overall, comparing our GWAS results with those of previous studies, we found that some important genes controlling inflorescence architecture in maize were located within 200 kb of the significant QTNs, including CT2 (Zm00001d027886), FEA3 (Zm00001d040130), BAD1 (Zm00001d005737), RA1 (Zm00001d020430), and VT2 (Zm00001d008700) (Table S13).
Annotation and expression of candidate genes for KRN
To obtain reliable significant QTNs and predict the candidate genes for KRN, only the QTNs simultaneously identified by at least three methods (either single-locus or multilocus) and in at least two environments were used for the next analysis. Finally, seven QTNs controlling KRN were obtained (Table 2). The seven QTNs were located on chromosomes 1, 2, 3, 5, 9, and 10, and the PVE by these QTNs ranged from 1.06% to 5.21%. Based on the linkage disequilibrium (LD) in the association panel (Figure S2), 49 genes around the QTNs (200 kb upstream and downstream) were obtained, and their expression varied widely in different maize tissues (Figure 3a and Table S7). For example, Zm00001d016760, which encodes the abscisic acid stress ripening 6 protein, is highly expressed in the roots, and Zm00001d031426, which encodes serine/threonine-protein kinase, and Zm00001d043298, which encodes a P-loop containing nucleoside triphosphate hydrolase superfamily protein, are highly expressed in tassels and anthers. Among the 49 genes, 22 were differentially expressed in different spike development mutants (Table S8); i.e., the ra1, ra2 and ra3 mutants had abnormal highly branched tassels and ears, with the ears displaying a very large KRN [45]; the kn1 mutant had smaller ears and fewer spikelets [58]. This result suggested that these 22 genes might be involved in ear development in maize.
Table 2. Significant KRN-associated QTNs codetected in at least two environments and by at least three models.
SNP
|
Chr
|
Pos
|
Single-locus GWAS (MLM)
|
|
Multilocus GWAS
|
LOD
|
PVE (%)
|
|
LOD
|
PVE (%)
|
Methods1
|
PZE-101124566
|
1
|
156580056
|
3.44
|
3.00
|
|
4.60-11.63
|
1.91-3.02
|
2, 3, 4, 5, 6, 7
|
PZE-101144585
|
1
|
187526525
|
3.13
|
2.00
|
|
4.39-5.95
|
1.84-3.51
|
3, 4, 5, 7
|
PZE-102176259
|
2
|
219023013
|
3.32
|
3.00
|
|
3.41-4.17
|
1.06-2.04
|
2, 3, 4, 7
|
PUT-163a-110967306-138
|
3
|
191981941
|
3.28
|
2.56
|
|
8.17-11.77
|
1.62-3.37
|
2, 5, 6
|
PZE-105114980
|
5
|
171187130
|
/
|
/
|
|
4.35-8.20
|
1.15-2.29
|
2, 3, 5,6,7
|
PZE-109047930
|
9
|
79941271
|
4.61
|
4.00
|
|
5.73-10.40
|
2.43-5.21
|
2, 3, 5, 6, 7
|
PZE-110106563
|
10
|
146944098
|
3.61
|
3.00
|
|
3.65-5.25
|
1.18-2.38
|
2, 3, 4, 5, 6, 7
|
Methods1: Numbers 1 to 7 represent different GWAS methods: 1: MLM; 2: mrMLM; 3:
FASTmrMLM; 4: FASTmrEMMA; 5: pLARmEB; 6: pKWmEB; 7: ISIS EM-BLASSO.
Interestingly, we found that Zm00001d026540 (encoding auxin response factor 29, ARF29), which was located within 200 kb downstream of PZE-110106563 on Chr. 10 and was detected by the MLM method and all six multilocus GWAS methods (Table 2), had higher expression in SAMs and ears than in other tissues (Table S7). Candidate gene association mapping was also performed. The SNPs within ARF29 and the 10-kb promoter and 10-kb region downstream of ARF29 were obtained from maize HapMap3 [60]. The KRN of 282 inbred lines was measured in six environments (see Methods), and the BLUP values were calculated. The MLM mapping result showed that five SNPs (two SNPs in the gene and three SNPs in the region upstream of the gene) around ARF29 were significantly related to KRN (Figure 3b and Table 3). ARF29 can bind the Bif1 (which is related to SAM development and final KRN) promoter by recognizing the TTTCGG motif [40, 41]. The S10_147122969 SNP, located within the gene body, was significantly associated with KRN. Two alleles for this SNP (A/T) were present in this panel, with the A allele conferring a higher KRN. Cytokinins also play an important role in the development of immature spikes and the formation of final KRN [42]. For example, UB3 regulates KRN by the cytokinin pathway and CLAVATA-WUSCHEL pathway [42]. In this study, CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) was detected as being located within 200 kb upstream of PUT-163a-110967306-138 on Chr. 3 by four GWAS methods (MLM, mrMLM, pLARmEB, and pKWmEB, Table 2), and candidate gene association mapping of CKO4 was also conducted. The SNPs and KRN were also obtained from HapMap3 and 282 inbred lines. The MLM results showed that two SNPs located upstream of CKO4 were significantly associated with KRN (Figure 3c and Table 3). The S3_191837578 SNP had two alleles (T/G), and the T allele was associated with a higher KRN but had a lower frequency. Therefore, this allele may not be widely useful in maize breeding.
Table 3. Candidate gene association analysis.
Gene ID
|
SNP1
|
Chr
|
Pos
|
LOD
|
PVE
|
Allele
|
Frequency
|
ARF29
|
S10_147122969
|
10
|
147122969
|
4.57
|
8.97%
|
A/T
|
127/99
|
S10_147121954
|
10
|
147121954
|
4.44
|
8.98%
|
G/A
|
94/90
|
S10_147126021
|
10
|
147126021
|
3.88
|
7.58%
|
T/A
|
161/27
|
S10_147123193
|
10
|
147123193
|
3.33
|
5.30%
|
A/C
|
119/110
|
S10_147141311
|
10
|
147141311
|
3.17
|
4.92%
|
C/G
|
211/21
|
CKO4
|
S3_191837578
|
3
|
191837578
|
4.64
|
7.85%
|
G/T
|
177/45
|
S3_191841761
|
3
|
191841761
|
4.67
|
6.99%
|
T/G
|
236/16
|
1: The significant SNPs calculated by the MLM method in regional association mapping based on the 282 inbred lines.
Whole-genomic prediction of KRN
We first analyzed the LD blocks of all markers using the threshold value r2 > 0.2 and obtained 27,688 tagSNPs in our association panel. Then, we randomly selected different numbers of tagSNPs, from 5 to 27,000, in the whole genome to calculate the prediction accuracies for KRN of the inbred lines, which was calculated as a correlation between predicted and true values from the simulations. The results showed that the prediction accuracies increased as the number of tagSNPs increased (Figure 4a and Table S9). More specifically, the prediction accuracies sharply increased when the number of tagSNPs increased from 5 to 500 and then slowly increased when the number of tagSNPs increased from 400 to 2000. Once the number exceeded 2000, the prediction accuracies maintained a consistently high level. Although a large number of tagSNPs were used to predict KRN, the prediction accuracies were still less than 0.5. The effects of training population size on the prediction accuracy were also assessed based on a marker number of 14000 (approximately 50% of the total tagSNPs). In the association panel, the prediction accuracies improved with increasing training population size. When the training population size increased from 50% to 90%, a slight increase in prediction accuracy was observed (Figure 4b and Table S10).
To better understand the genetic architecture of KRN and improve the ability to predict it, we ranked the 27,688 tagSNPs according to their significance in relation to KRN, as obtained by the MLM method, to obtain the top tagSNPs. We found that these top tagSNPs had a higher prediction accuracy (ranging from 0.58 for the top 100 tagSNPs to 0.66 for the top 700 tagSNPs) than randomly selected tagSNPs (ranging from 0.22 for 100 random tagSNPs to 0.33 for 700 random tagSNPs) (Figure 4c and Table S11).
The tagSNPs representing the significant QTNs detected by different models based on BLUP were collected and used to calculate prediction accuracies for KRN in our association panel. The results showed that these tagSNPs identified by different methods had different prediction accuracies ranging from 0.43 (FASTmrEMMA) to 0.60 (ISIS EM-BLASSO) (Figure 4d and Table S12). We also found that the tagSNPs associated with KRN identified by the same method showed different prediction accuracies in diverse environments (Figure S3 and Table S12). To explore whether using the codetected QTNs in different GWAS methods could increase prediction accuracies for KRN, we selected the common QTNs identified by at least two, three, four, five or six methods to obtain the predictions. The results showed that only the common QTNs identified by at least two methods (common ≥ 2) could maintain predictability at a high level; other common QTNs had no advantage in predicting KRN, which may be due to the smaller QTN numbers (Figure S3 and Table S12).
Additionally, to improve the prediction ability, we put the KRN-related tagSNPs detected by seven methods together in a single environment (204 in XX, 87 in BJ, 118 in GZL and 167 for BLUP), namely, M-total tagSNPs, to conduct KRN prediction. As a result, we found that the prediction accuracies were improved sharply and reached 0.74 in XX, 0.66 in BJ, 0.75 in GZL and 0.75 for BLUP (Figure 4d and Table S12). These predictabilities were much higher than those of the single method in each environment (Table S12). Then, we collected the tagSNPs associated with KRN from all methods and all environments, namely, E-M-total tagSNPs, and obtained 439 tagSNPs in total. However, there was only a slight increase in prediction accuracy (ranging from 0.68 in BJ to 0.79 for BLUP for the 439 tagSNPs) when we used the much higher number of E-M-total tagSNPs compared to the fewer M-total tagSNPs (Figure 4d and Table S12).