Unravelling the genetic architecture of grain quality traits under optimum and low-nitrogen stress conditions using tropical maize (Zea mays L.) germplasm

Soils in sub-Saharan Africa are nitrogen deficient due to low fertilizer use and inadequate soil fertility management practices. This has resulted in a significant yield gap for the major staple crop maize, which is undermining nutritional security and livelihood sustainability across the region. Dissecting the genetic basis of grain protein, starch and oil content under nitrogen-starved soils can increase our understanding of the governing genetic systems and improve the efficacy of future breeding schemes. An association mapping panel of 410 inbred lines and four bi-parental populations were evaluated in field trials in Kenya and South Africa under optimum and low nitrogen conditions and genotyped with 259,798 SNP markers. Genetic correlations demonstrated that these populations may be utilized to select higher performing lines under low nitrogen stress. Furthermore, genotypic, environmental and GxE variations in nitrogen-starved soils were found to be significant for oil content. Broad sense heritabilities ranged from moderate (0.18) to high (0.86). Under low nitrogen stress, GWAS identified 42 SNPs linked to grain quality traits. These significant SNPs were associated with 51 putative candidate genes. Linkage mapping identified multiple QTLs for the grain quality traits. Under low nitrogen conditions, average prediction accuracies across the studied genotypes were higher for oil content (0.78) and lower for grain yield (0.08). Our findings indicate that grain quality traits are polygenic and that using genomic selection in maize breeding can improve genetic gain. Furthermore, the identified genomic regions and SNP markers can be utilized for selection to improve maize grain quality traits.


Introduction
Maize (Zea mays L.) yields in sub-Saharan Africa (SSA) are amongst the lowest in the world (FAO 2021). Average yields in this region range from 1-3 t ha−1, well below the global average of around 5 t ha−1 (Prasanna et al. 2020, 2021). Over a quarter of households in SSA are deemed persistently food insecure, with that figure climbing to 40% during the dry season (Fraval et al. 2019). Furthermore, demand for maize in this region is expected to triple by the year 2050 as a result of rapid population growth (Ekpa et al. 2018). While drought stress and increased climate variability are linked to low yields, low fertilizer use is also a key driver of the maize yield gap in this region (Tittonell and Giller 2013). Although average N fertiliser use by smallholder farmers in SSA has increased over the past decade, it remains very low at 17.9 kg N ha−1 (Jayne and Sanchez 2021). This shortfall is particularly pronounced on female-managed plots, which tend to receive fewer nitrogen inputs than male-managed plots within a farm (Cairns et al. 2021; Farnworth et al. 2017).
For vulnerable populations, increased dietary diversity and consumption of nutrient-rich food is essential for improved nutrition (Poole et al. 2021). However, in more than 20 countries in SSA, maize accounts for more than 30% of calories consumed (Goredema-Matongera et al. 2021). In Lesotho, Malawi, Zambia, and Zimbabwe, the average per capita consumption of maize is more than 100 kg per person per year (i.e., about 300 g per day), roughly half of daily calorie intake (Cairns et al. 2021; Prasanna et al. 2021). High-quality protein sources include eggs, meat, and dairy products, but vulnerable populations in SSA have limited access to these foods and rely heavily on maize as a primary source of protein (Nuss and Tanumihardjo 2011). Average protein supply from maize is 10 g per capita per day in SSA, with up to 35 g per capita per day in southern Africa (FAO 2021). In Burkina Faso, Eswatini, Ethiopia, Lesotho, Malawi, Mozambique, Nigeria, Tanzania, Togo, Zambia and Zimbabwe, maize provides more protein than animal sources do (FAO 2021).
Generally, maize grain has low oil (4.4%) and protein (9.1%) contents but a relatively high starch content (73.4%) on a dry matter basis (Dei 2017). In temperate maize, breeding has led to a significant increase in grain yield. However, grain protein content is estimated to have decreased by 0.3% per decade while grain starch content has increased by 0.3% per decade (Duvick 2005). Grain oil content has also declined over time in temperate maize (Scott et al. 2006). Grain quality is closely linked to soil quality (Wood et al. 2018). Maize protein content increases as the level of N applied increases (Zhang et al. 2020). To date, studies looking at the effect of reduced N level on grain quality have used significantly higher levels of N than is relevant to smallholder farmers in SSA. Previous studies of maize grain quality under low nitrogen stress in SSA environments, such as Abu et al. (2021), Ngaboyisonga and Njoroge (2014), and Oikeh et al. (1998), used N rates ranging from 30-120 kg ha−1. Such application rates are higher than the average N application rates in smallholder agriculture in SSA, which are between 12 and 16 kg ha−1 (Heffer and Prud'homme 2015; Sheahan et al. 2014).
Incorporating grain quality traits into breeding can involve significant costs for grain nutrient analyses and morphological characterization, and these costs, together with the complexity of the traits, stymie progress in improving maize grain quality. The high cost of measuring grain quality traits stems from the relatively expensive wet chemistry procedures required to create near-infrared spectroscopy (NIRS) calibration curves. There is a significant opportunity to assess and employ the potential of genomic selection in the improvement of grain quality traits in maize. This reinforces the need for more rapid progress in improving maize grain quality components (particularly protein, starch, and oil content) through accelerated and more efficient breeding enabled by molecular markers.
The integration of genomic tools such as genome-wide association studies (GWAS), linkage mapping and genomic selection (GS) with traditional breeding approaches has increased the efficiency of selection for grain quality and low N tolerance. These techniques have been used to identify causal variants for low N tolerance. Linkage mapping is a common method for locating quantitative trait loci (QTL) based on a segregating population derived from a cross of two parental lines with significantly divergent phenotypes (Xiao et al. 2017). Linkage mapping can be combined with GWAS to provide a more comprehensive strategy for identifying markers linked with a trait of interest: trait-linked markers are discovered by GWAS and then validated through linkage mapping. The most robust markers can then be employed for marker-assisted selection (MAS). However, the accuracy with which genetic maps are constructed depends on the mapping population, and doubled haploid (DH) lines, near-isogenic lines (NILs), recombinant inbred lines (RILs), and backcross lines, although extremely effective, remain labour-intensive and time-consuming to generate (Rao et al. 2021). GS has been highlighted as a powerful option for selecting polygenic traits that are difficult to select through MAS. GS accomplishes this by employing dense genome-wide markers for predictions, and can therefore support association analyses to determine the genetic basis of key traits (Bentley et al. 2014). Galli et al. (2020) reported that GWAS could identify both additive and dominant genetic effects that influence low N tolerance in maize.
To understand how low N stress affects key grain quality traits such as grain protein, starch, and oil content, this study was performed using a tropical maize population under low N and optimum conditions across multi-location field trials in Kenya and South Africa. The study had the objectives of (i) assessing the genetic architecture of low soil-N tolerant maize test crosses using their responses to grain quality and yield traits under two management conditions (optimum and low soil N); (ii) identifying the significant quantitative trait nucleotides (QTNs) and putative candidate genes and QTLs for quality traits in tropical maize germplasm tested in multiple locations; and (iii) assessing the potential of utilizing GS in the improvement of grain quality traits.

Germplasm, experiment design and management
An association mapping panel of 410 tropical maize lines developed under CIMMYT's Improved Maize for African Soils (IMAS) project (Ertiro et al. 2020a) in collaboration with the Kenya Agricultural and Livestock Research Organisation (KALRO) and the Agricultural Research Council (ARC, South Africa) was used. The study also evaluated two DH populations from CIMMYT's heterotic group B (221 lines of CML550/CML504 and 115 lines of CML550/CML511) and two DH populations from CIMMYT's heterotic group A (175 lines of CML505/LaPostaSeqC7-F64-2-6-2-2 and 131 lines of CML536/LaPostaSeqC7-F64-2-6-2-2) (Ertiro et al. 2020b). Test cross hybrids were generated by crossing all inbred lines with a broadly adapted CIMMYT maize inbred line tester from the opposite heterotic group.
The field trials were designed and conducted by Das et al. (2019) and Ertiro et al. (2020a), who previously published on different objectives using data from these same trials. Three optimal and six low N-stressed sites were used to screen the testcross progenies (Table 1). Experiments conducted at the same site over several years were classified as separate environments. The study employed an alpha-lattice design with two replications. The sites for low N trials were depleted of soil N by growing sorghum for several years without applying any external N fertilizer. At planting, triple superphosphate (46% P2O5) was applied to the low N trials at the rate of 50 kg P2O5 per hectare. On optimum trials, diammonium phosphate (DAP) fertilizer was used at the rate of 54 kg N per hectare. Optimum trials were top-dressed with urea fertilizer at the rate of 138 kg N per hectare three weeks after planting. All trials under both optimum and low N conditions were irrigated as required to avoid any moisture stress. Trials under both conditions were kept weed-free and other standard agronomic practices were followed.

Measurements of grain yield and quality traits
Data were recorded for grain yield and quality traits (i.e., protein, oil, and starch contents). Shelled grain yield was measured in kilograms (kg), converted to tons per hectare and reported at 12.5% moisture. Protein, starch, and oil content were measured using a FOSS Infratec™ 1241 from 500-gram samples of grain taken from each plot and are reported as a percentage of whole grain. The FOSS Infratec is a non-destructive whole-grain analyzer that uses near-infrared reflectance (NIR) to estimate quality parameters. Five 100-gram subsamples were assayed and the mean reading for each parameter was reported per plot.
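The conversion from shelled grain weight per plot to yield at 12.5% moisture can be sketched as follows. This is a minimal illustration of the standard moisture adjustment, not necessarily the exact procedure used in these trials: the plot area and field moisture arguments are assumptions, since neither is reported in this section, and the function name is hypothetical.

```python
def grain_yield_t_ha(grain_kg, moisture_pct, plot_area_m2, target_moisture_pct=12.5):
    """Convert shelled grain weight per plot to t/ha at a target moisture.

    grain_kg and moisture_pct are the weight and moisture measured at harvest;
    plot_area_m2 is an assumed input (not given in the text).
    """
    # Scale the fresh weight to the weight it would have at the target moisture.
    adjusted_kg = grain_kg * (100.0 - moisture_pct) / (100.0 - target_moisture_pct)
    # kg per m2 -> t per ha: multiply by 10,000 m2/ha and divide by 1,000 kg/t.
    return adjusted_kg * 10.0 / plot_area_m2
```

For example, 5 kg of grain harvested at 15% moisture from a 10 m2 plot corresponds to roughly 4.86 t ha−1 at 12.5% moisture.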

Phenotypic data analysis
The restricted maximum likelihood (REML) approach was used to conduct analyses of variance (ANOVA) using the META-R program for each and across environments under optimum and low N conditions (Alvarado et al. 2020). A linear mixed model was used to calculate all variance components. The study treated replication as a fixed effect and all other treatment effects as random. On an entry-mean basis, the broad-sense heritability was estimated as the ratio of genotypic to phenotypic variance from the derived variance components. Furthermore, to determine the genotypic effects of the investigated lines for each and across environments, best linear unbiased estimates (BLUEs) and best linear unbiased predictions (BLUPs) were obtained. BLUPs were used for GWAS and linkage mapping, whereas BLUEs were used for GS analyses. The classification of the genotypic correlation coefficients followed the guidelines provided by Profillidis and Botzoris (2019). To determine the impact of low N stress on the aforementioned traits, we used a t-test to compare the mean values of the two management conditions. We also determined the percentage change (i.e., decrease or increase) in grain quality trait performance.
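As an illustration of the entry-mean heritability described above, the sketch below uses the commonly applied form for multi-environment trials, H2 = Vg / (Vg + Vge/e + Ve/(e x r)). The exact expression used in the analysis is not spelled out in the text, so this formula, the function name and the example values are assumptions.

```python
def entry_mean_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Broad-sense heritability on an entry-mean basis.

    var_g  : genotypic variance
    var_ge : genotype-by-environment interaction variance
    var_e  : residual variance
    n_env  : number of environments
    n_rep  : number of replications per environment
    """
    # Phenotypic variance of an entry mean across environments and replications.
    var_p = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    return var_g / var_p
```

With hypothetical variance components Vg = 1.0, Vge = 0.5 and Ve = 2.0 from five environments with two replications each, the estimate is about 0.77.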

Genotyping-by-sequencing (GBS)
DNA was extracted according to the CIMMYT high-throughput mini-prep cetyl trimethyl ammonium bromide (CTAB) method (Semagn 2014). Following the protocol presented in Elshire et al. (2011), maize DNA samples were genotyped using a restriction enzyme (ApeKI) and 96-plex multiplexing at the Institute of Biotechnology at Cornell University, USA. The Institute of Genomic Diversity (IGD) at Cornell provided raw GBS data for a maximum of 955,120 SNP loci spread throughout the 10 maize chromosomes (Ertiro et al. 2020a). Raw data were filtered for linkage mapping according to the criteria used in Ertiro et al. (2020a) of >10 percent minor allele frequency (MAF) and no missing data. Furthermore, the genotype data were filtered for GWAS using the Trait Analysis by Association, Evolution, and Linkage (TASSEL v.5.2.7.2) software, retaining SNPs with a minimum count of 90% of the sample size and a MAF of >5%, as presented in Ertiro et al. (2020a). Principal component analysis (PCA) was carried out in TASSEL (v.5.2.7.3), as were the calculations of genetic distances and kinship.
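The GWAS filtering criteria (minimum count of 90% and MAF above 5%) can be illustrated with a short sketch. This is not TASSEL's implementation: it assumes genotypes are coded as the number of copies of one allele (0, 1 or 2) with None for missing calls, and the function names are hypothetical.

```python
def minor_allele_frequency(calls):
    """MAF of one SNP from 0/1/2 allele counts; None marks a missing call."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    # Frequency of the counted allele among all observed chromosomes.
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def keep_snp(calls, min_maf=0.05, min_call_rate=0.90):
    """Return True if the SNP passes both the call-rate and the MAF filter."""
    call_rate = sum(c is not None for c in calls) / len(calls)
    return call_rate >= min_call_rate and minor_allele_frequency(calls) > min_maf
```

A monomorphic marker, or one genotyped in fewer than 90% of the lines, is dropped under these thresholds.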

Genome-wide association study analysis
The R package 'FarmCPU' (Fixed and random model Circulating Probability Unification; Liu et al. 2016) was used for GWAS analysis. FarmCPU used the first three PCs derived by TASSEL as input for GWAS. The kinship was computed using FarmCPU's default kinship algorithm as presented in Ertiro et al. (2020a). The Manhattan and quantile-quantile (QQ) plots, GWAS findings, and a table of marker effects of user-provided variables were all produced by FarmCPU using the "GAPIT" function. The putative genes were checked on (i) the MaizeGDB website (http://www.maizegdb.org/) and (ii) the Ensembl Plants website (http://plants.ensembl.org/biomart/martview/ce35c2dc12e78418361fb4cffa43bdbe) using the BLAST tool against the representative genome B73 version 2.

QTL mapping and genomic prediction
The four DH populations were genotyped with GBS, and the data were further filtered to a manageable size using TASSEL software with >0.10 MAF, <5% heterozygosity, and a minimum count of 90% of the total population size (Bradbury et al., 2007; Sitonik et al., 2019). In all the populations, marker loci that were homozygous in both parents, polymorphic between the parents and uniformly distributed were retained. Linkage maps were constructed using QTL IciMapping version 4.1 in all four DH populations. BIN, an inbuilt tool in QTL IciMapping, was used to remove highly correlated SNPs. This retained 2,699, 1,962, 1,985 and 2,086 high-quality SNPs in CML550/CML504, CML550/CML511, CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2, respectively. These SNPs were used to construct linkage maps using the MAP function. IciMapping used the grouping, ordering, and rippling steps to construct each linkage map. The Kosambi mapping function, which assumes that a recombination event influences the occurrence of adjacent recombination events, was used to calculate genetic distances. BLUP values across environments for the DH populations were used in QTL detection analysis (Meng et al., 2015). Lines in each mapping population were grouped by their genotype at each SNP, and a significant difference between the group means (P < 0.0001) indicated that the marker was linked to a QTL controlling the target trait (Collard et al., 2005). The QTL position was declared at the LOD peak, and a one-LOD support interval on both sides of the peak was taken as the confidence interval (Hackett, 2002). The phenotypic variation explained (PVE) was estimated for each QTL and for all QTLs together for each trait. The origin of the favourable allele for each trait was identified based on the sign of the additive effect of each QTL.
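For readers unfamiliar with the Kosambi mapping function, it converts a recombination fraction r between two markers into a map distance d = 25 ln((1 + 2r) / (1 - 2r)) in centimorgans. The sketch below shows the function and its inverse; the function names are illustrative and are not part of QTL IciMapping.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)
```

At small r the distance in centimorgans is close to 100r (a recombination fraction of 0.10 maps to about 10.1 cM), while the correction for interference grows as r approaches 0.5.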
BLUEs across environments for each trait in each population were used in the GS analysis. Ridge-regression BLUP (RR-BLUP; Zhao et al. 2012) with fivefold cross-validation was used for the analysis of each trait. A sample of 4,000 SNPs with no missing values, evenly distributed throughout the genome and with MAF > 0.05, was chosen from the GBS data for the IMAS panel and all four DH populations.
Each DH population and the IMAS panel were sampled to form training and validation sets. The prediction accuracy was calculated as the correlation between the observed phenotypes and genomic estimated breeding values (GEBVs) divided by the square root of heritability (Dekkers 2007). In each population, 100 iterations were done for the sampling of the training and validation sets.
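The cross-validation scheme described above can be sketched with a simple ridge-regression predictor. This is an illustrative approximation rather than the software used in the study: the shrinkage parameter lam is fixed here as an assumption, whereas RR-BLUP implementations normally estimate it from variance components by REML, and the correlation is pooled over the five folds, which the text does not specify. All function and variable names are hypothetical.

```python
import numpy as np


def ridge_marker_effects(Z, y, lam):
    """Fit shrunken marker effects; returns (mean, marker centres, effects)."""
    mu = y.mean()
    centre = Z.mean(axis=0)
    Zc = Z - centre
    # Dual form of ridge regression: effects = Zc' (Zc Zc' + lam I)^-1 (y - mu).
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(Z.shape[0]), y - mu)
    return mu, centre, Zc.T @ alpha


def cross_validated_accuracy(Z, y, h2, lam=1.0, k=5, seed=0):
    """k-fold prediction accuracy: cor(observed, GEBV) / sqrt(h2)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    gebv = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        mu, centre, effects = ridge_marker_effects(Z[train_idx], y[train_idx], lam)
        # Predict the held-out lines from their marker scores only.
        gebv[test_idx] = mu + (Z[test_idx] - centre) @ effects
    r = np.corrcoef(y, gebv)[0, 1]
    return r / np.sqrt(h2)
```

Repeating the call with different seed values mimics the 100 resampling iterations; averaging the returned values would then give a mean prediction accuracy per trait and population.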

Effect of low N stress on grain yield and quality traits
There was significant variation in protein, starch and oil content, and grain yield within all four biparental DH populations and the IMAS panel under optimum and low N stress conditions (Figure 1 and Table 2). In the IMAS panel and DH pop CML505/LaPostaSeqC7-F64-2-6-2-2, yield under low N stress was reduced by 59% and 48%, respectively. In DH pop CML550/CML511, the mean yield under low N stress was 5.45 t ha−1; however, this was a reduction of 47% relative to optimal conditions. Low N stress significantly (p<0.01) reduced protein and oil content (except in DH pop CML505/LaPostaSeqC7-F64-2-6-2-2) but had no significant effect on starch content. Although the level of N stress, and therefore the reduction in grain yield, was the lowest in DH pop CML550/CML511, both protein and oil content had the largest reduction in this population under low N stress.
The genotypic, environmental, and genotype-environment interaction effects (G, E and G x E, respectively) were significant at p≤0.05 for yield and quality traits (Table 2). For protein, starch, and oil content, the magnitude of genotypic variance was greater under optimal conditions than under low N stress conditions. Under low N conditions, the effects of genotype, environment, and G x E interactions on oil content were significant across all genotypes tested. Interestingly, under the same conditions, the genotypic effects on protein content were only significant in DH pops CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2. The G x E effects on protein content and starch content in DH pop CML550/CML511 were significant. The zero estimates of G x E interactions for starch and protein content (observed in DH pop CML550/CML511 under low N stress) indicate that genotypic performance for these traits was stable across the tested environments. H2 values of each trait under both optimal and low N stress are presented in Table 2. In general, H2 of all traits was lower under low N stress than under optimal conditions, with the exception of starch content, for which H2 increased under low N stress across all populations.
Grain yield was negatively correlated with protein content across populations, regardless of N stress level (Table 3). Similarly, starch content showed a negative correlation with protein and oil content across genotypes and management options. A weak positive correlation was observed between protein and oil content across populations and N levels. The only exception was in DH pop CML550/CML504 under optimum conditions, where oil content was significantly (p<0.01) and negatively correlated with protein content (r = -0.92**). Protein content had a negative correlation with grain yield (r = -0.41**) and starch content (r = -0.54**) in the IMAS panel under optimum conditions. Similarly, a weak positive correlation was observed between protein content and oil content under low N stress in the IMAS panel and DH pops CML550/CML504 and CML505/LaPostaSeqC7-F64-2-6-2-2. Starch and oil contents were significantly (p<0.05) and negatively correlated under optimum (r = -0.65) and low N stress (r = -0.18) in the IMAS panel. The genotypic correlation coefficients for DH pops CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2 showed that oil content under optimum conditions had no correlation with protein content (r = 0.00). As shown in Table 3, further significant (p < 0.05 and p < 0.01) trait correlations were established among the phenotypic parameters measured across the management conditions for each set of genotypes. The observed correlations between grain quality and yield can inform selection decisions and trade-offs in genotypic selection.

GWAS analyses
Of the 337,113 GBS SNPs used to genotype 410 genotypes, 77.1% (259,798) remained after filtering using the >5% MAF and 10% missing per marker criteria (Supplementary Figure S1). The kinship relations among the IMAS panel were determined using the filtered 259,798 SNP markers and depicted as a genetic cluster, indicating that the panel of genotypes is split into four potential genetically differentiated subgroups. The heatmap of the panel's kinships was used to assess the magnitude of the existing relationships among the genotypes: this established that the genotypes were not closely related and that there is no strong population structure (Supplementary Figure S2). Further partitioning of the population structure of the IMAS panel using STRUCTURE 2.3.4 is presented in earlier studies by Kibe et al. (2020a) and Gowda et al. (2015). PCA was carried out using 259,798 high-quality SNPs (Supplementary Figure S3). The first principal component (PC1) accounted for roughly 4.5% of the overall variation, whereas the second principal component (PC2) explained 2.5% (Supplementary Figure S3). Calculation of genome-wide LD using 259,798 SNPs showed a significant decline in LD as genetic distance increased, with different rates of decay for each of the ten chromosomes (Supplementary Figure S4). Figures 2 and 3 depict the GWAS findings for protein, starch, and oil content, and yield across the two N managements as Manhattan plots and Q-Q plots comparing the expected and observed -log10 p-values. Sixty-one SNPs were significantly (P = 2 × 10−5, p = 0.1 false discovery rate (FDR)) associated with protein, starch, and oil content, and yield under optimal conditions and were spread across the 10 chromosomes (Table 4). Under low N conditions, 42 SNPs were linked to the aforementioned traits. Under optimal conditions, three SNPs linked with protein content were significant on chromosomes 1 (S1_17679954 and S1_214242607) and 10 (S10_114836465). Under low N stress, however, two different SNPs, S3_198394847 and S4_120988951, were associated with protein content. Starch content (low N) was associated with five SNPs, the most significant of which was S5_10542862. Eight SNPs on chromosomes 2 (S2_174345463 and S2_174345465), 3 (S3_180044790), 5 (S5_10542862), 6 (S6_5158703 and S6_60978968), 7 (S7_14465153), and 8 (S8_3430590) were linked with starch content under optimum conditions. Under optimal conditions, twelve SNPs were significantly associated with oil content, with one-third of these loci located on chromosome 6. Twelve SNPs, with loci on all chromosomes except 4 and 7, were significantly linked with oil content under low N. For starch and oil content under low N conditions, only the significant SNP on chromosome 6 (S6_60978968) was co-detected. The proportions of detected SNPs for the other traits are presented in Table 4. The Q-Q plots for grain yield and oil content under optimum conditions and oil content under low N stress revealed that some observed P-values were more significant than expected, as the marker points deviated from the dotted red line towards the y-axis.
To elucidate the molecular and physiological mechanisms controlling grain quality traits under optimum and low N conditions, candidate genes were identified (Harper et al. 2016). Across all chromosomes, a total of 51 candidate genes were discovered (Table 4). The lowest number of candidates (2) was related to protein content under low N, and the highest number (12) to oil content under both optimum and low N. Of these candidates, 80.39% (41 genes) were functionally annotated, whereas 19.61% (10 genes) were classified as unknown proteins. The study revealed four candidate genes with protein serine/threonine kinase activity that play a role in soil N response. Under optimum conditions, GRMZM2G159307 and GRMZM2G104325, associated with grain yield and starch content, respectively, were annotated as ATP-binding proteins. GRMZM2G10816 (yield), GRMZM2G070523 and GRMZM2G080516 (oil content) were associated with DNA biosynthesis under low N stress conditions. Under both optimal and low N conditions, GRMZM2G033694 was annotated in the histone-lysine N-methyltransferase family. Genes coding for shoot apex development were found to be associated with grain yield, protein, starch, and oil content under low N stress.

QTLs associated with grain yield and quality traits
The four populations used in this study for linkage mapping were also used in our earlier study (Ertiro et al., 2020a), which includes detailed information about the genetic maps. Table 5 shows the detected QTLs and their positions and genetic effects. In DH pop CML550/CML504, two QTL each were detected for grain yield and starch content, three QTL for protein content and five QTL for oil content under low N stress. The PVE by these QTL ranged from 4.67 to 22.19%, and the total PVE ranged from 12.5% for grain yield to 47.9% for oil content. In DH pop CML550/CML511, one QTL each was detected for grain yield, starch content and oil content under low N stress. In DH pop CML505/LaPostaSeqC7-F64-2-6-2-2, five QTL were detected for grain yield, with one QTL on chromosome 3 having a major effect (12.17% of PVE). For protein content, nine QTL were detected, all individually having minor effects except a QTL on chromosome 3 with 11.78% of variance explained. For starch content, three QTL each were detected under optimum and low N conditions, with two major-effect QTL on chromosome 8. For oil content, three QTL were detected under optimum and six under low N conditions, with one common QTL on chromosome 2 across management conditions. Four major-effect QTL were identified for oil content on chromosomes 2, 4 and 5, each of which explained >10% of the phenotypic variation (Table 5). In DH pop CML536/LaPostaSeqC7-F64-2-6-2-2, one QTL each was detected for protein and oil content and three QTL were detected for starch content, with one major-effect QTL on chromosome 4 contributing 20.3% of the phenotypic variation for oil content.
* Chr = chromosome; LOD = logarithm of odds; add = additive effect; PVE = phenotypic variance explained; fav allele = parental line contributing the favorable allele for the trait. QTL names are composed of the trait code followed by the chromosome number on which the QTL was mapped and the physical position of the QTL.
The RR-BLUP model (Endelman 2011) was used to estimate the performance of maize genotypes for grain quality traits for each population (Figure 4 and Supplementary Table S2). Under low nitrogen conditions, average prediction accuracies across the studied genotypes were higher for oil content (0.78) and lower for grain yield (0.08). In the IMAS panel we observed prediction accuracies of 0.41, 0.38, 0.39 and 0.44 under optimum and 0.35, 0.35, 0.41 and 0.56 under low N conditions, respectively. Interestingly, DH pop CML550/CML504 outperformed the other DH populations in terms of genomic prediction accuracy. Under low N, CML550/CML504 had the best prediction accuracy for protein (0.66), oil (0.73), and starch (0.7) content. The prediction accuracy for protein content was highest in DH pop CML550/CML504 both under optimum (r = 0.66) and under low N stress (r = 0.69). For starch content under low N, the prediction correlation was highest for CML550/CML504 (r = 0.70), followed by CML536/LaPostaSeqC7-F64-2-6-2-2 (r = 0.56), the IMAS panel (r = 0.41), CML550/CML511 (r = 0.26) and CML505/LaPostaSeqC7-F64-2-6-2-2 (r = 0.23). CML536/LaPostaSeqC7-F64-2-6-2-2 had the highest prediction correlation for oil content under low N (r = 0.78), followed by CML550/CML504 (r = 0.73) and CML505/LaPostaSeqC7-F64-2-6-2-2 (r = 0.71). CML550/CML511 had the lowest prediction accuracies for grain yield (r = 0.08), protein (r = 0.17), starch (r = 0.16), and oil content (r = 0.11).

Discussion
Significant levels of malnutrition (Christian and Dake 2021) and food insecurity (Giller 2020) continue to be experienced by maize-dependent smallholder farming populations in SSA that cultivate nitrogen-depleted soils. Unravelling the genetic architecture of grain yield and quality traits through GWAS and GS is critical for the development of superior genotypes conferring high expression of grain quality traits under both optimum and low N stress. This study aimed to understand the underlying genetics of low N stress on grain quality traits by combining GWAS with the IMAS panel, QTL detection, and GS in four bi-parental populations. Grain quality traits, notably protein, starch, and oil content, are critical for reducing the incidence of undernutrition in SSA. Understanding the performance of grain quality traits and associated genetic markers under low N stress can aid in the development of maize lines with high protein, starch, and oil content.
Phenotypic analyses showed that protein and oil content were significantly decreased under low N stress compared to optimal conditions in most of the tested populations, whereas starch content was not significantly affected. This is consistent with the findings of Liu et al. (2008), who found that protein and oil content are considerably reduced under lower N conditions. However, the same study reported that starch content increased at lower N levels. According to Jahangirlou et al. (2021) and Simić et al. (2020), high N conditions are associated with higher protein content and yield. On the other hand, low N substantially decreases protein concentration and all zein fractions apart from β-zeins, according to research conducted in Zimbabwe by Shawa et al. (2021). The impact of soil nutrient management on oil content was significant in the study of Ray et al. (2019). That study reported that using N as a component in NPK blends increased the quantity of saturated fatty acids while decreasing the percentage of unsaturated fatty acids in maize grain oil. Kaplan et al. (2017) suggested that N fertilizer application in combination with adequate irrigation has a favourable effect on oil content. However, given the limited economic means of smallholder farmers in SSA, N fertilization is not currently an accessible solution to combat endemic undernutrition. Therefore, maize lines showing high protein, starch and oil content under low N stress should be considered for incorporation into maize breeding programs targeting the SSA region.
In low N stress environments, genetic variability is crucial for the effective selection of enhanced grain quality traits in maize. Ertiro et al. (2020a) asserted that, due to the intrinsic unpredictability of various traits of interest, phenotypic data for trials conducted under low N conditions typically show poor heritability. However, under both optimum and low N environments, our study estimated wide genetic variances and moderate to high broad-sense heritabilities. Estimates of heritability ranging from moderate to high imply that the traits have the potential to be enhanced by recurrent selection (Gowda et al. 2021). The influence of G, E, and G x E interactions on oil content was significant under low N conditions across all genotypes examined. The genotypic effects on protein content were significant in some of the populations tested (CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2) under low N conditions. The G x E effects on protein and starch content in CML550/CML511 were significant. The significant genotypic variation detected for the assessed traits in this study indicates the possibility of selecting for improved protein, starch, and oil content under low N stress. Among the three grain quality traits investigated, starch content had the lowest H2 estimate, whereas oil content had the highest H2 estimate under low N conditions. The high broad-sense heritability of oil content indicates that most of the observed variation for this trait is genetic, implying that significant genetic gain is attainable.
Grain yield had a negative genotypic correlation with protein content in all populations, regardless of N level. This supports the findings of Arisede et al. (2020) that increased grain yield was associated with decreased grain protein content in both susceptible and tolerant maize hybrids when residual soil N was low, although tolerant hybrids showed a substantially smaller loss in grain protein content. The trade-off between yield and quality is a well-known challenge in breeding. The relationship observed between grain yield and quality traits under both optimal and low N conditions implies that selecting for grain yield alone will not increase protein, starch, or oil content. In addition, protein content had a significant negative correlation with starch content in all populations, which is consistent with previous results (Liu et al. 2008; Zheng et al. 2021). Thus, to increase grain quality, particularly under low N stress, there is a need to select for both grain yield and grain quality. However, phenotyping both would be costly, and the negative relationship forces breeders to balance selection between these traits; hence the need to investigate the potential of molecular breeding for their improvement.
Association studies targeting protein, oil, and starch content in maize have been conducted using a range of genotypes and marker sets (Alves et al. 2019; Cook et al. 2012; Zheng et al. 2021). The use of GWAS in maize genetics has been highly effective in discovering causal genes for grain quality traits (Zheng et al. 2021). In particular, GWAS is an effective technique for mapping loci linked with complex plant traits in genetically heterogeneous populations (Deng et al. 2021). The detection power of GWAS depends on the LD between the markers and QTL. In outcrossing plant species such as maize, LD decays rapidly over short distances (Dinesh et al. 2016). In this study, LD declined rapidly with physical distance (Kibe et al. 2020a), showing that the IMAS panel has significant genetic diversity and was therefore suitable for GWAS.
Candidate genes and SNPs discovered by GWAS for maize grain nutrient content can provide critical information for maize breeding efforts focused on developing high-quality varieties (Zheng et al. 2021). In this study, GWAS identified 42 SNPs linked to the grain quality traits studied under low N conditions. However, no SNP was shared among the grain quality traits under low N. Under low N stress, two SNPs, on chromosomes 3 (S3_198394847) and 4 (S4_120988951), were linked to protein content. Moreover, 12 SNPs located on all chromosomes except 4 and 7 were significantly associated with oil content under low N stress. The genomic regions identified in this work through GWAS will be increasingly relevant in future breeding approaches for accurate selection of high grain quality and for increasing the tolerance of maize lines to low N stress.
Comparison of the SNPs identified in this study under low N and optimum conditions revealed no overlapping SNPs for grain yield and protein content, possibly because we were not able to detect the common variants responsible for these traits under different management conditions. Nonetheless, for starch content, all the SNPs detected under low N were also detected under optimum conditions. For oil content, SNP S6_60978968 on chromosome 6 was consistently detected under both conditions, and common SNPs were also found in bins 2.01, 5.03, 6.05 and 9.01 on chromosomes 2, 5, 6 and 9, respectively (Table 4). Further comparison of the detected SNPs with previous studies revealed some overlap with earlier reported QTL (Wang et al. 2016; Zheng et al. 2021). For instance, SNP S1_214242607, associated with grain protein content under optimum conditions, was located close to a marker detected through GWAS (Zheng et al. 2021) and co-located with a QTL detected in two populations (Wang et al. 2016). SNP S1_191845162, detected for oil content, was located within the QTL (bnlg2086-umc1122 interval) reported by Zhang et al. (2008) and Wang et al. (2010). Marker 1_190758142, detected through GWAS for oil content by Zheng et al. (2021), was also located within the same QTL region, pointing to the importance of this region for improving oil content in maize. Another SNP detected for oil content, S2_148879075, was located within the QTL region (bnlg108-phi092 interval) reported by Zhang et al. (2008). SNP S1_17679954 for protein content was located within the QTL detected for oil content on chromosome 1 (umc1685-umc1044 interval) in an F3 population (Wang et al. 2010). Nevertheless, some SNPs did not coincide with earlier reports in terms of their physical location. This is possibly due to several reasons: these SNPs might be specific to the populations in this study, the variation for quality traits in these populations may differ, and the different methods used to estimate quality traits across studies may also contribute to variation. The new SNPs detected in this study need further validation; nevertheless, these results can serve as a reference for future studies.
We identified 51 candidate genes potentially underlying the molecular and physiological processes governing grain quality traits under optimum and low N environments. Under low N stress, genes involved in shoot apex growth were associated with grain yield, protein, starch, and oil content. Peng et al. (2010) reported that shoot growth, rather than root size, is a good indicator of N sufficiency in maize. This study also identified four candidate genes with protein serine/threonine kinase activity that may play a role in the response to soil N. Protein kinases are well-known regulators of the response of plants to abiotic stresses (Diédhiou et al. 2008; Kulik et al. 2011; Mao et al. 2010). GRMZM2G159307 and GRMZM2G104325, associated with grain yield and starch content, respectively, encode ATP-binding proteins. ATP-binding proteins are essential for cellular motility, membrane transport and the control of different metabolic activities (Chauhan et al. 2009). ATP binding has also been reported in several studies to influence the maintenance of homeostasis in plants under both abiotic and biotic stresses (Dahuja et al. 2021; Franz et al. 2011; Jarzyniak and Jasiński 2014). GRMZM2G033694 was assigned to the histone-lysine N-methyltransferase family under both optimum and low N conditions. It is important to note, however, that these candidate genes require further functional validation before their utility in breeding for high grain quality under low N conditions can be confirmed.
Linkage mapping in four populations found multiple QTLs for the studied grain quality traits. Zheng et al. (2021) noted that numerous QTLs for grain nutritional quality in maize have been identified over the last two decades through traditional QTL mapping. Despite the discovery of QTLs and genes that confer superior maize grain quality in some studies, further sources of genetic variation are likely to exist among currently unexplored populations. QTL analyses in four DH populations revealed 8, 13, 12 and 15 potential QTLs associated with grain yield, protein, starch, and oil content, respectively. One QTL on chromosome 3 for grain yield (qGY3_187) overlapped with a major-effect QTL for protein content (qPC3_187) located between 180 and 189 Mb; this region might be of interest for improving both protein content and grain yield, provided their negative relationship is taken into account. Zhang et al. (2015) also identified a consistent QTL (umc1644-phi102228 interval) in the same genomic region for protein content. Another QTL, qPC1_115 in CML550/CML504, which explained 11% of the phenotypic variance, was consistent with a previously reported QTL (phi001-umc1988 interval; Zhang et al. 2015), and qPC10_142 on chromosome 10 was consistent with a QTL (SYN37373-PZE110095199 interval) reported by Wang et al. (2016) in a recombinant inbred line (RIL) population. Across the four bi-parental populations, the major-effect QTLs (>10% of phenotypic variance explained) comprised one for grain yield (qGY3_196), three each for protein content (qPC1_115, qPC3_187, qPC5_67) and starch content (qSC1_180, qSC4_32, qSC8_124), and six for oil content (qOC2_186, qOC3_60, qOC4_70, qOC5_183, qOC6_133 and qOC7_08). A major QTL (qSC1_180) identified in the DH population CML550/CML511, explaining about 11.5% of the total phenotypic variance and located between 175 and 188 Mb, was consistent with a QTL (SYN367-PZE101031077 interval) observed in a RIL population by Wang et al. (2016). Similarly, another QTL for starch content (qSC8_124), located between 123 and 124 Mb, coincided with a previously reported QTL on chromosome 8 (PZE108069534-SYN19928 interval; Wang et al. 2016). The major QTL for oil content, qOC2_186, also overlapped with a previously detected QTL. Together, these overlaps indicate several regions for quality traits that are consistent across genetic backgrounds, which supports their stability and makes them amenable to MAS-based improvement and use in applied breeding.
The promise of GS in tropical maize for various traits of interest has been evaluated in a range of studies (Azmach et al. 2018; Beyene et al. 2019; Crossa et al. 2014; Gowda et al. 2021). The relative merits of GS over phenotypic selection influence its widespread application in breeding programs (Kibe et al. 2020a). The moderate to high accuracies observed in this study for the bi-parental populations and the IMAS panel offer promise in breeding for quality traits in tropical maize. Under N-starved soils, average prediction accuracies across the studied genotypes (Figure 4) were higher for oil content (0.78) and lower for grain yield (0.08), which can be ascribed to differences in their genetic architecture. CML550/CML504 exhibited the best prediction accuracies for protein (0.69), oil (0.73), and starch (0.70) content under low N stress. GS accuracy is directly influenced by the degree of genotypic diversity and the heritability of the trait in each population (Kibe et al. 2020a). This is confirmed by this study, especially for oil content, which had the highest genomic prediction accuracy and H² estimates. The prediction accuracy of the IMAS panel was in agreement with various studies on moderately complex traits such as resistance to grey leaf spot (Kibe et al. 2020a), common rust (Kibe et al. 2020b), Striga (Gowda et al. 2021), maize lethal necrosis and maize chlorotic mottle virus (Sitonik et al. 2019). In the IMAS panel, the observed moderate prediction accuracy can be attributed to its genetic structure and the high LD between adjacent markers, and may also reflect the moderate heritability of the traits. Overall, this study indicates that using a common training population to predict grain quality trait performance under low N stress in multiple related but distinct populations can be beneficial.
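Prediction accuracy in genomic selection studies is commonly reported as the Pearson correlation between genomic predictions and observed phenotypes in a validation set, averaged over cross-validation folds. The sketch below assumes that definition; the function names and the toy values are hypothetical and do not reproduce the procedure or the data of this study.

```python
from math import sqrt
from statistics import mean


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant values")
    return sxy / sqrt(sxx * syy)


def mean_prediction_accuracy(folds):
    """Average accuracy over folds.

    folds: iterable of (observed, predicted) pairs, one per validation fold.
    """
    return mean(pearson_r(obs, pred) for obs, pred in folds)


# Hypothetical validation folds (observed phenotype, genomic prediction)
toy_folds = [
    ([4.1, 4.6, 5.0, 5.3], [4.3, 4.5, 4.9, 5.4]),
    ([3.9, 4.4, 4.8, 5.6], [4.2, 4.1, 5.0, 5.2]),
]
print(round(mean_prediction_accuracy(toy_folds), 2))
```

Averaging per-fold correlations, rather than pooling all predictions, keeps each fold's estimate independent of differences in fold means.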

Conclusions
To investigate the genetic basis of protein, starch, and oil content performance under low N stress, we employed a single panel consisting of 410 tropical maize lines for GWAS and genomic prediction. QTL mapping was also used to investigate the underlying genetic architecture in four bi-parental populations to better understand the grain quality traits. The genotypic correlations of the grain quality traits investigated indicated that these populations can be used to select better-performing lines under low N stress. GWAS identified 42 SNPs associated with grain quality traits. In addition, several QTLs for the examined grain quality traits were identified by linkage mapping across populations. The genomic regions identified can be used in selection efforts to enhance grain quality trait performance in low-nitrogen soils. Furthermore, the findings showed that including GS in maize breeding can successfully support phenotypic selection to improve grain quality trait performance under low N stress. Future work should, therefore, focus on validating the identified QTLs to enhance the efficacy of maize breeding in SSA.

Declarations
Figures

Table 1
Description of the field trials used in the study.

Table 2
Genetic parameters for IMAS panel and four DH Populations evaluated under optimum and low N stress conditions in multiple environments.

Table 3
Genotypic correlations between grain yield and grain quality traits evaluated under optimum and low N stress management. Values in shaded cells represent genetic correlations between traits under low N stress; values in unshaded cells represent genetic correlations between traits under optimum conditions.

Table 4
Chromosomal positions and SNPs significantly associated with grain yield, protein, starch, and oil content detected by SNP-based GWAS in the IMAS association mapping panel under optimum and low N management conditions.
* MAF - minor allele frequency; MAE - minor allele effect; a The exact physical position of the SNP can be inferred from the marker's name, for example, S1_82702920: chromosome 1; 82,702,920 bp (RefGen_v2 of B73)
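The marker naming convention in this footnote can be decoded mechanically. The following minimal sketch assumes that every marker name follows the pattern S<chromosome>_<position in bp> shown in the example; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class SnpPosition(NamedTuple):
    chromosome: int
    position_bp: int


# Assumed pattern: "S", chromosome number, underscore, position in base pairs
_SNP_NAME = re.compile(r"^S(\d+)_(\d+)$")


def parse_snp_name(name: str) -> SnpPosition:
    """Split a marker name such as 'S1_82702920' into chromosome and position."""
    match = _SNP_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"Unrecognized SNP marker name: {name!r}")
    return SnpPosition(int(match.group(1)), int(match.group(2)))


snp = parse_snp_name("S1_82702920")
print(f"chromosome {snp.chromosome}; {snp.position_bp:,} bp")
# chromosome 1; 82,702,920 bp
```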

Table 5
Number of QTL detected for grain yield, grain protein, starch and oil content under optimum (opt) and low-nitrogen (LN) stress across environments in four populations.
* Chr = chromosome; LOD = logarithm of odds; add = additive effect; PVE = phenotypic variance explained; fav allele = parental line contributing the favorable allele for the trait. QTL names are composed of the trait code followed by the chromosome number on which the QTL was mapped and the physical position of the QTL
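The QTL naming convention described in this footnote can be decoded in the same spirit. The sketch below assumes names of the form q<trait code><chromosome>_<position>, with the trait codes GY, PC, SC and OC as they appear in the QTL names quoted in the text; reading the position as megabases is an assumption based on the intervals quoted in the text, and the function and class names are hypothetical.

```python
import re
from typing import NamedTuple

# Trait codes as they appear in the QTL names quoted in the text
TRAIT_CODES = {
    "GY": "grain yield",
    "PC": "protein content",
    "SC": "starch content",
    "OC": "oil content",
}


class QtlName(NamedTuple):
    trait: str
    chromosome: int
    position: int  # assumed to be in Mb


# Assumed pattern: "q", trait code, chromosome number, underscore, position
_QTL_NAME = re.compile(r"^q([A-Z]+)(\d+)_(\d+)$")


def parse_qtl_name(name: str) -> QtlName:
    """Split a QTL name such as 'qPC3_187' into trait, chromosome and position."""
    match = _QTL_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"Unrecognized QTL name: {name!r}")
    code, chromosome, position = match.groups()
    if code not in TRAIT_CODES:
        raise ValueError(f"Unknown trait code {code!r} in {name!r}")
    return QtlName(TRAIT_CODES[code], int(chromosome), int(position))


print(parse_qtl_name("qPC10_142"))
```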