To evaluate the quality characterization in AM122 and AM180, four nutrition-related traits were qualified, including protein, oil, starch and fibre, by NIRS. The observed phenotypes of the four grain quality-related traits showed diversity across variants in each experimental environment, and the trends of the two panels tended to be consistent (Fig. 1). All absolute values for skewness and kurtosis for the four traits were less than 1, which meant that they fit a normal distribution (Table S1). The coefficient of variation (CV) showed extensive continuous variation for three of the phenotypes (the CV ranged from 6.86% to 15.44%), except for the starch content (1.50%-2.53%). It is worth noting that the CV of each trait in AM180 was higher than that in AM122, which may be because of the larger population and richer genetic background in AM180 (Table S1). Moreover, the broad-sense heritability (h2) for each trait was greater than 80%, which ranged from 80.74%% to 91.82%, indicating that genetic factors play a major role in determining these traits and could stabilize inheritance. Correlation analysis showed that there was a significant positive correlation between the locations for the same trait, and the two panels showed the same trend (Fig. S1).
Genome-wide association analysis
After deleting low-quality SNPs, 42,200 and 42,541 high-quality SNPs were saved for further research in AM122 and AM180, respectively. According to the Bonferroni adjustment, the significant SNPs were identified using the threshold -log10P=4. Combined with LD analysis, we retained independent associations with the highest significance level at significant sites when R2 was >0.6 between related sites in the LD interval (Fig. 2, Table S2). In total, nine, six, fourteen and sixteen SNPs were significantly correlated with protein content, oil content, starch content and fibre content in AM122, respectively. Eleven, twenty-four, eleven and fifteen SNPs were significantly correlated with the four quality-related traits in AM180 (Fig. 2, Table S2). Six colocalized sites were detected two or more times. For each unique significant SNP, the phenotypic variance explained (PVE) ranged from 0.54%-18.97% with an average PVE of 8.42%. In addition, nine of these loci explained more than 15% of the phenotypic variance, which is usually considered a major effect for the related trait. For seven of the nine major associated SNPs (six associated with starch), the frequency of the favoured allele was larger than that of the less favoured allele, which means that high quality (especially for starch content) was unconsciously selected for during the breeding process (Fig. S4). Furthermore, these colocated or major SNPs can be explored for molecular markers for quality improvement.
Prediction and annotation of the candidate genes
According to the linkage disequilibrium (LD) decay distance (200 kb) calculated by PopLDdecay (Fig. S2), we considered the genes approximately 200 kb from the associated SNPs as candidate genes based on the B73 reference genome in MaizeGDB (B73_RefGen_v3, https://www.maizegdb.org/). Combined with dynamic transcriptomic data for grain development from 0 days after pollination (DAP) to 38 DAP in the maize inbred line B73 (Chen et al., 2014), 287 candidate genes were expressed with RPKM≥1 in the kernel development process, including 75, 74, 68 and 77 genes that may affect the contents of fibre, oil, protein and starch, respectively (Table S3). Furthermore, cluster analysis showed that there were 23, 18, 17, and 9 highly expressed candidate genes for these four quality-related traits (Table S4). Notably, the GRMZM5G818977 gene was identified for both oil content and starch content. Among the 67 candidate genes, 31 were differentially expressed in kernels at later developmental stages of KA225 and KB035, in which there were significant differences in the four quality-related traits (Fig. 3, Table S5). These genes might be responsible for differences in quality-related traits.
Furthermore, GO analysis showed that these genes were enriched in cellular process, metabolic process, single-organism process, multicellular organismal process, biological regulation, developmental process, cellular component organization or biogenesis, response to stimulus, regulation of biological process and other biological processes (Fig. 4a). By KEGG analysis, the 31 candidate genes were annotated into ubiquitin-mediated proteolysis, sesquiterpenoid and triterpenoid biosynthesis and nicotinate and nicotinamide metabolism (Fig. 4b).
Regulation network of the candidate genes
There was an inverse correlation across each of the kernel components; that is, if one substance was high, then the other components were relatively reduced. To understand the regulatory relationship for the four quality-related traits in maize, 31 candidate genes were used to construct an interaction network in String (https://cn.string-db.org/). Of these, 28 genes were obtained from the regulatory networks, interacted with various functional proteins by forming independent or interrelated modules, and participated in the regulation of a variety of biological networks (Fig. 5).
It is worth noting that there were interactions between modules related to the same or different traits. For oil content, three genes, GRMZM2G033787 (M12) annotated as phosphatase, GRMZM2G057910 (M13) annotated as NAD-dependent dihydrogenase and GRMZM2G049588 (M14) annotated as glutamine-dependent NAD+ synthetase, interacted with each other through GRMZM2G402319. Moreover, GRMZM2G033787 (M12) interacted with one fibre candidate gene GRMZM2G043338 (M7) via adh1F, which encodes cystathionine beta synthase domain protein 1 (CBS); the domain protein maintains the balance of redox reactions in cells (Wang, 2015). In addition, M12 also shared several common interacting proteins (including NRH1 and NRH2) with the other fibre candidate gene GRMZM2G179432 (M1), which encodes a Haloacid dehalogenase-like hydrolase (HAD) superfamily protein. Another network in the oil content-related modules included GRMZM2G064993 (M15) and GRMZM2G027862 (M17), which interacted with four cellulose synthase genes (CesA-1, CesA2, CesA3 and CesA-9) participating in sugar metabolism. For starch content, there was an interaction network with fibre content that included the starch-related protein GRMZM2G085849 (M23), which directly interacted with the fibre-related protein GRMZM2G053764 (M2) and indirectly interacted with the fibre-related protein GRMZM2G127632 (M4) via GRMZM2G424582. For fibre content, module M9 (GRMZM2G102161, zmm8) formed an independent network with several members of the MADS family, including MADS1, MADS76, MADS5 and ZAG1, which played an important role in inflorescence and kernel development (Lid et al., 2004; Riechmann et al., 1997). These results may provide a deep understanding of the coordination among the four quality-related traits.