Genomics of Maize Stover Yield and Saccharification Efficiency Using a Multi-Parent Advanced Generation Intercross (MAGIC) Population

Background: Cellulosic ethanol derived from fast-growing C4 grasses could become an alternative to finite fossil fuels. With the potential to generate a major source of lignocellulosic biomass, maize has gained importance as an outstanding model plant for studying the complex cell wall network, and as a model to optimize crop breeding strategies in bioenergy grasses. A genome-wide association study was conducted using a subset of 408 Recombinant Inbred Lines (RILs) from a Multi-Parent Advanced Generation Intercross (MAGIC) population in order to identify single nucleotide polymorphisms (SNPs) associated with yield and saccharification efficiency of maize stover. Results: We identified 4 SNPs significantly associated with stover yield that corresponded to 4 QTL, and 16 SNPs significantly associated with saccharification efficiency that could be clustered into 5 QTL. Markers linked to these QTL could be used in marker-assisted selection programs to improve ethanol production. In addition, we have pointed out genes that contain the significant SNPs or are physically close to them. Conclusions: Genes involved in nitrogen assimilation, organ growth, and stress tolerance could be good candidates for the stover yield QTL. For saccharification efficiency, on the other hand, we highlight genes implicated in biomass degradation, transcriptional control of monolignol biosynthesis, and lignin polymerization as probable candidate genes in the QTL involved.


Background
In a scenario of global growth, depletion of natural resources and climate change, the economic and environmental consequences of reliance on finite fossil fuels have become a global concern. This situation has driven exhaustive scientific research to find sustainable energy alternatives. Cellulosic ethanol derived from fast-growing C4 crops has become one of the preferred choices due to their high biomass yields, broad geographic adaptation, carbon sequestration and nutrient utilization (1,2). With the potential to generate a major source of lignocellulosic biomass, maize has been postulated as a model for understanding the complex cell wall architecture, and to optimize crop breeding strategies in bioenergy grasses. Maize stover, the residue left after harvesting the grain, is the largest readily available lignocellulosic feedstock (1,3-6). The maize stover used to produce lignocellulosic ethanol (a second-generation biofuel) is composed of 33.1% hemicellulose, 39.4% cellulose, and 14.9% lignin (7,8). The conversion of lignocellulosic biomass to ethanol is a three-step process comprising: (i) a pre-treatment stage, followed by (ii) the hydrolytic degradation of carbohydrates to their constituent sugar monomers (saccharification), and (iii) the final fermentation of the free sugars to ethanol (9).
The key factor in this process is the recalcitrance of stover to deconstruction, conferred by the composition and organization of the cell wall. Maize cell walls are mainly composed of cellulose microfibrils embedded in a matrix of hemicelluloses, lignin and, to a lesser extent, pectins, proteins and phenolic compounds (mainly hydroxycinnamates) (10). This strong assemblage provides not only structural support and rigidity to the cell, but also resistance to biotic and abiotic stresses (11). The framework of hemicellulose and lignin, closely interconnected with cellulose, hinders the action of hydrolytic enzymes and reduces the degradability of carbohydrates. The degree of lignification and the cross-linking of polysaccharides by diferulates, as well as cellulose crystallinity, contribute to the recalcitrance of lignocellulosic feedstock. This recalcitrance requires more expensive pre-treatments and higher enzyme inputs, which translates into a greater economic cost. Therefore, reducing cell wall recalcitrance is expected to improve saccharification efficiency (12,13).
It should be noted that the ability to produce ethanol depends on the genotype and on the applied pre-treatment. Therefore, to detect differences in ethanol production among genotypes, it is essential to choose the appropriate pre-treatment for the tissue under study. Among the pre-treatments that could be used, alkaline pre-treatment has been suggested as the most appropriate for maize stover and other herbaceous plants (14,15). The cell walls of gramineous monocots are known to contain alkali-labile ferulate ester cross-links within the hemicellulose, which is in turn cross-linked with lignin (16,17), as well as high phenolic hydroxyl contents in their lignins, resulting in increased alkali solubility (18). As a consequence, mild alkali pre-treatment of grasses can be employed both for fractionating biomass and for generating pre-treated biomass that is highly amenable to enzymatic hydrolysis (15,19). The optimisation and improvement of stover biofuel production should focus on stover yield (expressed as tonnes of dry plant material per unit of land area) as well as on stover quality under a specific pre-treatment.
Mapping QTLs and identifying genes underlying stover quality and quantity are important steps towards optimizing selection programs to improve biofuel production. Maize genetic variation for saccharification efficiency has been detected (20,21) and several linkage mapping studies have been conducted to find QTLs for saccharification efficiency (22-24). Furthermore, Truntzler et al. (25) performed a meta-QTL analysis that included several QTL mapping studies for digestibility and cell wall components and found 27 saccharification-related QTLs. Lorenzana et al. (24) evaluated crosses of 223 maize recombinant inbred lines from B73 × Mo17 (IBM population) for cell wall composition and glucose release after acid pre-treatment and enzymatic hydrolysis and identified 10 QTLs for sugar release, 5 of them co-localizing with QTLs for lignin content. Also in the IBM population, Penning et al. (26) found 4 QTLs for saccharification efficiency, measured as glucose or xylose release after steam explosion, but none of them overlapped with QTLs for lignin. The differences between the results of the two studies may depend on the pre-treatment chemistry and/or the genotype response to pre-treatment and hydrolysis. Lorenzana et al. (24) measured the sugar release after dilute acid/high temperature pre-treatment. This method uses strong acids to hydrolyse the hemicellulosic fraction of the biomass, resulting in a more effective enzymatic hydrolysis (27-29), whereas in Penning et al. (26) samples were subjected to steam explosion at 180 °C. However, the genetic variation explored for saccharification efficiency has been low because the studies mentioned above used only bi-parental populations, and the resolution of the detected QTLs was therefore low.
One of the most robust techniques for high-resolution mapping of QTLs is genome-wide association mapping using diversity panels. This technique has been extensively used in maize to identify significant associations with yield and agronomic traits (30), biotic and abiotic resistance (31,32), cell wall components (33-35) and lignin abundance and sugar yield (26). However, association studies using diversity panels can still have limited power to detect QTLs due to the small effect and/or low frequency (rare alleles) of some genetic variants, so many undetected rare alleles could be lost for breeding purposes even when they have major effects (36-38). Therefore, results from QTL mapping in Multi-Parent Advanced Generation Inter-Cross (MAGIC) populations could be complementary to results from bi-parental populations and association mapping panels, because several alleles can be studied simultaneously but none of them is at low frequency (39-41). In addition, even though QTL resolution in MAGIC populations is not as high as in diversity panels, MAGIC populations present a known underlying structure that protects against false positive associations better than unstructured populations do.
We developed a MAGIC population using eight temperate maize inbred lines of diverse genetic origin, where the eight founders have a common characteristic: the lack of Stiff Stalk Materials in their pedigrees (39,41). Six founders were directly obtained from different open-pollinated varieties from Spain, Italy, and France, while two inbred lines derived from North American materials. New inbreds developed from this MAGIC population could have practical interest for breeders as they are expected to express high heterosis when crossed to inbreds from the Stiff Stalk heterotic group. In the present study we identified genomic regions and genes associated with saccharification efficiency and stover yield using this MAGIC population. Results provide a better understanding of the genetic factors that can modulate these traits and the molecular tools to be used in breeding programs for increasing stover production and saccharification efficiency.

Means and Analysis of Variance
The analyses of variance showed that differences among check inbreds were significant for stover yield but not for saccharification efficiency. However, RIL means differed significantly for both traits (data not shown). Means and ranks for the traits under study are detailed in Table 1. Data for EP43 and PB130, both founders of the MAGIC population, were not available, either because the seeds did not germinate or because there were not enough plants in the plot.

Association Analysis
We carried out association analysis to identify genomic regions that modify stover yield and saccharification efficiency. A marker was considered significantly associated with a trait at p-values below 2.42 × 10⁻⁵ (−log10(p-value) = 4.6). We considered a ±700 kbp region around each SNP as its confidence interval, and two SNPs were assigned to the same QTL when their confidence intervals overlapped. We identified 4 SNPs associated with 4 stover yield QTL (qStoverYield_1_1, qStoverYield_3_1, qStoverYield_3_2 and qStoverYield_5_1), and 16 SNPs that were associated with 5 saccharification efficiency QTL (qSACC_1_1, qSACC_1_2, qSACC_2_1, qSACC_6_1 and qSACC_10_1). SNPs and QTL are detailed in Table 2. Minor alleles generally decreased stover yield, whereas minor and major alleles contributed almost equally to increased saccharification efficiency. The percentage of phenotypic variance explained by each significant SNP ranged from 5 to 9%. The significant SNPs found in the current study were distributed in bins 1.05, 3.05 and 5.00 for stover yield and in bins 1.05, 2.06, 6.07 and 10.07 for saccharification efficiency (Table 2).
Table 2 SNPs and QTL significantly associated with saccharification efficiency (SACC) and stover yield, including each SNP's chromosome, bin and position within the chromosome, allelic variants and additive effect of the SNP, proportion of total variance explained by the SNP, and p-value for the association between the SNP and the phenotype.
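The significance threshold and the QTL grouping rule used in this section can be illustrated with a minimal Python sketch. The experiment-wise threshold (0.3) and the number of independent tests (12,397) are the values given in the Methods; the function names, the SNP positions in the example, and the choice to merge chains of overlapping intervals into a single QTL are assumptions made for illustration only and do not reproduce the authors' scripts.

```python
import math

# Values reported in the text.
EXPERIMENT_WISE_ALPHA = 0.3   # experiment-wise threshold
INDEPENDENT_TESTS = 12397     # independent comparisons
HALF_WINDOW_BP = 700_000      # +/-700 kbp confidence interval around a SNP


def comparison_wise_threshold(alpha=EXPERIMENT_WISE_ALPHA,
                              n_tests=INDEPENDENT_TESTS):
    """Bonferroni-style threshold: experiment-wise alpha / number of tests."""
    return alpha / n_tests


def group_snps_into_qtl(snps, half_window=HALF_WINDOW_BP):
    """Group (chromosome, position) pairs whose intervals overlap.

    Assumption: overlap is applied transitively, so a chain of
    overlapping intervals on one chromosome forms a single QTL.
    """
    qtl = []
    for chrom, pos in sorted(snps):
        start, end = max(0, pos - half_window), pos + half_window
        last = qtl[-1] if qtl else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Interval overlaps the current QTL: extend it.
            last["end"] = max(last["end"], end)
            last["snps"].append(pos)
        else:
            # No overlap: open a new QTL.
            qtl.append({"chrom": chrom, "start": start,
                        "end": end, "snps": [pos]})
    return qtl


if __name__ == "__main__":
    threshold = comparison_wise_threshold()
    print(f"threshold = {threshold:.3g}, "
          f"-log10 = {-math.log10(threshold):.2f}")

    # Hypothetical SNP positions (chromosome, bp), not taken from Table 2.
    example = [(1, 10_000_000), (1, 10_900_000),
               (1, 50_000_000), (3, 10_500_000)]
    for q in group_snps_into_qtl(example):
        print(q)
```

In this sketch the two hypothetical SNPs on chromosome 1 that lie 900 kbp apart fall into one QTL because their ±700 kbp intervals overlap, while the remaining two SNPs each define a separate QTL.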

Candidate Gene Selection
The genes containing or physically close to SNPs significantly associated with the traits were identified and characterized according to the maize B73 reference genome assembly, version 4 (Supplementary material; Table 3).
Table 3 List of candidate genes for the QTL for saccharification efficiency and stover yield found in the MAGIC population. Candidate genes were selected among all genes found in the ±700,000 bp QTL intervals.

Discussion
Biofuel feedstock can be optimised through plant breeding aimed at increasing stover yield and quality. Stover quality is associated with the composition of the cell wall and the potential for saccharification (42). Only one of the QTLs significantly associated with saccharification efficiency in this study lies in the same bin as QTLs previously described for glucose yield (24). With respect to other studies, we describe new regions related to saccharification efficiency in bins 1.05, 6.07 and 10.07. Regarding these co-localizations, it should be taken into account that we are comparing QTLs at the bin scale, and that QTLs detected in bi-parental populations sometimes correspond to different plant material or pre-treatment methods.
Genetic markers and genes associated with these traits can allow the establishment of breeding programs based on genomic selection or marker-assisted selection for increasing stover and saccharification yields, avoiding laborious and expensive field evaluations and laboratory assays.
In the following paragraphs, we explain why genes involved in plant development, growth, and assimilation of nutrients are proposed as probable candidate genes for the QTL involved in stover yield.
Nitrogen supply is one of the major factors limiting growth and productivity in crops, affecting both grain and stover yields. Therefore, we propose the glutamate synthase 1 gene (Zm00001d029732) as a candidate gene for the QTL qStoverYield_1_1, because the enzyme glutamate synthase is essential for ammonia assimilation in plants and has been proposed as a key target enzyme to improve nitrogen assimilation efficiency. Chichkova et al. (43) found a direct relation between this gene and variability for biomass, as they observed increases in shoot weight and in shoot total nitrogen and carbon contents in transgenic tobacco plants overexpressing NADH-glutamate synthase.
In addition, stover yield is determined by plant development and growth, processes that are greatly limited by biotic and abiotic stresses (1,44). Several authors have reported that biomass yield can be increased by enhancing mechanisms of stress tolerance (45-47). As reactive oxygen species (ROS) are generated under stress conditions, and oxidative stress has a negative effect on biomass and plant fitness (48), plant mechanisms that protect against ROS damage could contribute to enhanced stress tolerance. Consequently, glutathione S-transferase (GST) genes that lie within the confidence intervals of QTLs for stover yield, such as Zm00001d041772, located within qStoverYield_3_1, could be highlighted as candidate genes because GSTs help to minimise ROS (49). Likewise, for qStoverYield_3_2 we propose as a candidate the L-ascorbate peroxidase 2 gene (Zm00001d041939), involved in the ascorbate-glutathione cycle and the detoxification of hydrogen peroxide (50).
Another gene can be proposed as candidate gene for QTL qStoverYield_3_1, the Zar9 and AtMYB 86 may be also involved in regulating secondary cell wall biosynthesis (71).
In the phenylpropanoid pathway that leads to the synthesis of lignin monolignols, coumaroyl-CoA is converted into caffeoyl-CoA through the formation of quinate or shikimate esters by a hydroxycinnamoyl transferase (HCT) (72). Here, we found a gene encoding a hydroxycinnamoyl transferase (Zm00001d030542) that lies within the confidence interval of the saccharification efficiency QTL on chromosome 1 (qSACC_1_1). The downregulation of this enzyme has been shown to change lignin composition by enriching H units and decreasing the S:G ratio (73-75). Increases in H units produce a greater frequency of resistant inter-unit bonds, and this strengthening of the cell wall leads to lower amenability to degradation (76-78). Therefore, gene Zm00001d030542 appears to be a promising candidate gene for improving saccharification efficiency.
Finally, cell wall composition and organisation is a remarkably polygenic character and is influenced by hormonal and developmental factors. Interestingly, we found a gene encoding a gibberellin 2-oxidase (Zm00001d038996), which catalyzes the irreversible deactivation of bioactive gibberellin, within the confidence interval of the saccharification efficiency QTL qSACC_6_1. Gibberellic acid (GA) has been shown to regulate lignin biosynthesis and morphogenesis: at higher amounts of bioactive GA, levels of lignification in plant tissues are increased, suggesting that lignification and biomass recalcitrance could be optimized by targeting gibberellin biosynthesis (79).

Conclusion
In order to develop materials with higher biofuel yield per hectare, we highlight in the current study genomic regions directly linked to stover yield and saccharification efficiency. Markers located in those regions can be used in marker-assisted selection programs. The candidate genes identified in this study support the view that total lignin and lignin composition play an important role in cell wall recalcitrance.
Meanwhile, genes involved in nitrogen assimilation, organ growth, and stress tolerance are potential candidates to improve stover production. This study opens a possible optimisation path for improving the viability of second-generation biofuels.

Material and Methods

Plant Material
The was recorded, and a stover sample was collected to estimate the percentage of stover dry matter and to carry out the saccharification efficiency analyses. The stover sample was composed of tissue from two to ten plants; the fresh stover was weighed (sample fresh weight), chopped, pre-dried at 35 °C in a forced-air chamber, dried at 60 °C in an oven and weighed again (sample dry weight). Dry stover samples from each plot were ground in a Wiley mill with a 0.75 mm screen for saccharification assays.
Stover yield in Mg ha⁻¹ was determined by the following equation:
Saccharification assays were performed as described in Gomez et al. (81). Ground material was weighed into 96-well plates, with each well containing 4 mg of sample and four replicates per sample, and processed using a high-throughput automated system (Tecan). Samples were pre-treated with 0.5 M NaOH at 90 °C for 30 min, washed four times with 500 µl sodium acetate buffer and finally subjected to enzymatic digestion (Celluclast 2, 7 FPU/g) at 50 °C for 9 hours. The amount of released sugars was assessed against a glucose standard curve using the 3-methyl-2-benzothiazolinone hydrazone method.

Statistical Analysis
Inbred lines were previously genotyped using a genotyping-by-sequencing (GBS) strategy for 955,690 SNPs (82). Genotypic and phenotypic datasets were combined. SNPs with more than 50% missing data or a minor allele frequency below 5% were omitted. Heterozygous genotypes were considered missing data. After filtering, 215,131 SNPs distributed across the maize genome were retained.
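The filtering criteria above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with None for missing calls, a SNP is assumed to be dropped when it fails either criterion, and the function names and the toy data are hypothetical.

```python
def mask_heterozygotes(calls):
    """Treat heterozygous calls (coded 1) as missing data (None)."""
    return [None if c == 1 else c for c in calls]


def filter_snps(genotypes, max_missing=0.5, min_maf=0.05):
    """Keep SNPs with <= 50% missing data and minor allele frequency >= 5%.

    genotypes: dict mapping a SNP id to a list of calls, one per inbred
    line, coded 0 or 2 (homozygous) or None (missing).
    """
    kept = {}
    for snp_id, calls in genotypes.items():
        observed = [c for c in calls if c is not None]
        if not calls or not observed:
            continue  # nothing to evaluate for this SNP
        missing_rate = 1 - len(observed) / len(calls)
        if missing_rate > max_missing:
            continue
        # Frequency of the counted allele among observed calls.
        allele_freq = sum(observed) / (2 * len(observed))
        maf = min(allele_freq, 1 - allele_freq)
        if maf < min_maf:
            continue
        kept[snp_id] = calls
    return kept


if __name__ == "__main__":
    # Toy data with hypothetical SNP names.
    raw = {
        "snp_ok": [0, 0, 2, 2, 0, 2, 1, 0, 2, 0],
        "snp_rare": [0] * 39 + [2],
        "snp_missing": [0, 2, None, None, None, None],
    }
    masked = {snp: mask_heterozygotes(calls) for snp, calls in raw.items()}
    print(sorted(filter_snps(masked)))
```

Masking heterozygotes before filtering means that they count towards the missing-data rate, which is one possible reading of the procedure described in the text.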
Each trial was analyzed separately and combined according to the mixed model procedure (PROC MIXED) of the SAS program (version 9.4) (83) and the best linear unbiased estimator for each inbred line was calculated based on the combined data for the 2-year analysis. Lines were considered as fixed effects, while years and blocks within years were treated as random effects. The comparison of means was carried out using the Fisher's protected least significant difference (LSD).
A genome-wide association analysis was completed with Tassel 5 (84) based on a mixed linear model using a genotype-phenotype matrix and a kinship matrix obtained by the centered identity by state (IBS) method (85). Among the mixed linear model options, we used the optimum compression level and P3D to estimate the variance components.
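For readers unfamiliar with kinship estimation, the sketch below shows one widely used centered formulation, in which genotypes are centered by twice the allele frequency and the cross-product matrix is scaled by 2Σp(1−p). Whether this matches Tassel's implementation of the centered IBS method in every detail is an assumption, as are the 0/1/2 genotype coding, the treatment of missing calls (set to the SNP mean, so they contribute zero after centering) and the function name.

```python
def centered_kinship(genotypes):
    """Centered kinship matrix from 0/1/2 genotype calls.

    genotypes: list of lines, each a list of calls (0, 1, 2 or None)
    over the same SNPs. Returns a list-of-lists square matrix.
    """
    n_lines = len(genotypes)
    n_snps = len(genotypes[0])

    # Allele frequency per SNP, estimated from observed calls only.
    freqs = []
    for k in range(n_snps):
        observed = [g[k] for g in genotypes if g[k] is not None]
        freqs.append(sum(observed) / (2 * len(observed)) if observed else 0.0)

    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; kinship is undefined")

    # Center each call by 2p; missing calls become 0 (assumption).
    centered = [
        [(g[k] - 2 * freqs[k]) if g[k] is not None else 0.0
         for k in range(n_snps)]
        for g in genotypes
    ]

    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / scale
         for j in range(n_lines)]
        for i in range(n_lines)
    ]


if __name__ == "__main__":
    # Four hypothetical inbred lines scored at two SNPs.
    toy = [[0, 0], [2, 2], [0, 2], [2, 0]]
    for row in centered_kinship(toy):
        print(row)
```

In a mixed linear model, a matrix of this kind models the covariance among lines due to relatedness, which is the role the kinship matrix plays in the analysis described above.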

SNPs, QTL and Candidate Gene Selection
A Bonferroni approach was used to calculate the comparison-wise threshold for declaring an association between a trait and a SNP significant; the experiment-wise threshold (0.3) was divided by the number of independent tests (12,397 independent comparisons) (86). We used the Haploview program to generate independent blocks using the four-gamete rule option (87,88). We considered a ±700 kbp confidence interval around each significant SNP, following previous association studies using the same mapping population (40). When the confidence intervals of two SNPs overlapped, the SNPs were assigned to a single QTL. The two described genes that delimit the ±700 kbp region around the SNP in the reference genome assembly version 2 were positioned in version 4 of the reference genome, and all genes contained in the region delimited by those genes were then identified and characterized based on the maize B73 reference genome assembly (version 4) available on the MaizeGDB browser (89) (Supplementary Table 1).

Not applicable in this study.

Consent for publication
Not applicable in this study.

Availability of data and materials
The data sets used and/or analysed during the current study will be available upon reasonable request to the corresponding author. Plant materials are distributed to the scientific community by the Maize Genetics and Breeding group of MBG-CSIC upon request (http://www.mbg.csic.es/en/plantgenetics-and-breeding-department/maize-genetics-and-breeding/; RA Malvar, rmalvar@mbg.csic.es).

Competing interests
The authors declare that they have no competing interests.