THP9 enhances seed protein content and nitrogen-use efficiency in maize

Teosinte, the wild ancestor of maize (Zea mays subsp. mays), has three times the seed protein content of most modern inbreds and hybrids, but the mechanisms that are responsible for this trait are unknown1,2. Here we use trio binning to create a contiguous haplotype DNA sequence of a teosinte (Zea mays subsp. parviglumis) and, through map-based cloning, identify a major high-protein quantitative trait locus, TEOSINTE HIGH PROTEIN 9 (THP9), on chromosome 9. THP9 encodes an asparagine synthetase 4 enzyme that is highly expressed in teosinte, but not in the B73 inbred, in which a deletion in the tenth intron of THP9-B73 causes incorrect splicing of THP9-B73 transcripts. Transgenic expression of THP9-teosinte in B73 significantly increased the seed protein content. Introgression of THP9-teosinte into modern maize inbreds and hybrids greatly enhanced the accumulation of free amino acids, especially asparagine, throughout the plant, and increased seed protein content without affecting yield. THP9-teosinte seems to increase nitrogen-use efficiency, which is important for promoting a high yield under low-nitrogen conditions. Genetic analyses of teosinte, the wild ancestor of maize, identify a locus (THP9) that is associated with high seed protein content and increased nitrogen-use efficiency, suggesting that THP9 could have applications in crop breeding.

The seeds of plants contain stored metabolites (for example, carbohydrates, proteins, lipids and nucleic acids) that are crucial for rapid cell division and growth during the transition from dormancy to photosynthetic autotrophy when environmental conditions are suitable for germination 3. These metabolites also make seeds a valuable source of food for a variety of animals, as well as humans 4. Over millennia, plant breeders have genetically altered plant species to create seeds with greater proportions of these metabolites, to improve their nutritional value and utility as food and feed 5. Perhaps one of the most notable examples of this process was the conversion of the wild ancestor of maize, teosinte, to modern maize 6.
Native Americans selected mutations that modified a variety of traits of teosinte, including the size and structure of its floral inflorescences and seeds, and its yield 6,7 . Because of its importance in their diet, the maize that was domesticated by Native Americans had a high protein content, enhanced flavour and utility for making food. However, as corn became a commodity and was used as feed for livestock, starch content (yield) became a primary concern, and less attention was paid to protein content and flavour 2,8,9 . In addition, the use of nitrogen fertilizer reduced the importance of seed nitrogen content, and, as a consequence, modern maize hybrids contain only 5-10% protein; by contrast, teosinte has a protein content of 20-30% (ref. 2 ).
Although nitrogen fertilizer markedly improves the yield of maize, its excessive use often leads to run-off, which causes the eutrophication of rivers and other bodies of water 10 . Consequently, future maize breeding must design plants with a higher nitrogen-use efficiency (NUE) 11 . In addition, seed protein content and quality will be more important in the future, as vegetable protein is likely to have a larger role in human diets 12 .
To identify genes that are responsible for the differences in protein content between maize and teosinte, we analysed the progeny of their cross and characterized the quantitative trait loci (QTLs) that affect this trait. We sequenced a teosinte haplotype genome (Zea mays subsp. parviglumis, Ames 21814), and localized gene loci that were associated with a high content of protein in the seeds. Using the teosinte haplotype and nearly isogenic line (NIL) populations created from it, we were able to clone a teosinte high-protein locus, THP9, which contains an asparagine synthetase 4 gene (ASN4) that has a central role in the accumulation of amino acids throughout the plant. The THP9-teosinte allele, THP9-T, is highly expressed in teosinte, whereas the corresponding allele in the maize B73 inbred, THP9-B73 (THP9-B) has a 48-bp deletion in the tenth intron that affects intron splicing. Several versions of the THP9-B transcripts contain a premature stop codon, which results in undetectable levels of the ASN4 enzyme. Introgression of THP9-T into maize inbreds and hybrids increased the amino acid and protein content in the roots, stems and leaves, as well as the seeds. These plants exhibited a higher NUE under low-nitrogen conditions than did those with the THP9-B allele, and show promise for improving maize germplasm in general.

High protein content of teosinte
During the domestication and artificial selection of maize, many visible (such as plant and glume architectures) and invisible (seed composition) traits were substantially modified (Fig. 1a). To investigate the variation in seed protein content between teosinte and modern maize inbreds, we collected 20 lines of Zea mays subsp. parviglumis, 10 lines of Zea mays subsp. mexicana and 518 modern maize inbreds. In maize seeds, most nitrogen occurs in storage proteins, so the total nitrogen content is essentially equal to the value measured. In the roots, stems and leaves, the total nitrogen reflects the sum of the nitrogen in free amino acids and proteins, but most of it is found in proteins. We determined the nitrogen content of the roots, stems, leaves and seeds of B73 by two procedures: acid hydrolysis and the Dumas method, the latter using a Dumas rapid nitrogen analyser (see Methods). There was no significant difference in the nitrogen content measured by the two methods (Extended Data Fig. 1a). Therefore, we could use the high-throughput rapid nitrogen analyser to assay seed protein content and nitrogen content in plant tissues. The seed protein content of all teosinte lines was around 30%, whereas that of maize inbreds (405 finally harvested for measurement in 2019) ranged from 6.5% to 16.5%, with an average of 11.5% (Fig. 1b). These differences suggest that the loci that control seed protein content are genetically variable in teosinte and modern inbreds.
We selected one line of Zea mays subsp. parviglumis (accession Ames 21814) as a representative high-protein genotype for analysis. We measured the total nitrogen content in the roots, stems and leaves of Ames 21814 and B73, and found that the nitrogen content was higher in all tissues of Ames 21814 than in B73 (Extended Data Fig. 1b). The composition of free amino acids differed to some extent (Extended Data Fig. 1c); in particular, the levels of asparagine were notably higher in all tissues of Ames 21814 than in B73 ( Fig. 1c and Extended Data Fig. 1c). This is consistent with a previous observation in rice seeds, in which increased levels of asparagine were found to be associated with a high protein content 13 .
Maize-seed proteins are classified according to their solubility as prolamins (called zeins), albumins, globulins and glutelins 14. Zeins are the main endosperm storage proteins, and account for more than 60% of the total. On the basis of their structure, zeins are divided into four families: α (19- and 22-kDa; designated α19 and α22), β (15-kDa), γ (50-, 27- and 16-kDa) and δ (18- and 10-kDa). The α19 family is further divided into z1A, z1B and z1D subgroups, and all members of the α22 family are z1C (refs. 15,16). SDS-PAGE revealed that the accumulation of both zein and non-zein proteins was apparently higher in teosinte than in B73 seeds, but the most increased fractions seemed to be α19 and α22 (Extended Data Fig. 1d). This encouraged us to investigate the copy number of α-zein genes in teosinte.

Assembly of the Ames 21814 haplotype
To create a high-quality teosinte haplotype assembly for comparing α-zein loci in Ames 21814 and other inbreds, and for the mapping of high-protein QTLs, we sequenced (Supplementary Table 1) the DNA of a single F1 plant from the B73 × Ames 21814 cross using trio binning 17. The genome sequence of B73 is known, and we used a graph-based trio-binning strategy 18 to decipher the haplotype information (Extended Data Fig. 2a,b). [...] were combined and scaffolded separately into 10 pseudo-chromosomes using 375.56-Gb Hi-C reads (Extended Data Fig. 2c,d). [...] (Fig. 2f,g). Whether these structural variations correlate with phenotypic variation between teosinte and modern maize will need to be determined experimentally.
Using the highly contiguous Ames 21814 haplotype, we were able to annotate all of the α-zein genes. The total copy numbers of α19 (z1A1, z1A2, z1B and z1D) and α22 (z1C1 and z1C2) genes in the Ames 21814 haplotype were 22 and 12, respectively, compared with 25 and 15 in B73, and 25 and 19 in W22, respectively (Extended Data Fig. 3a,b). This suggests that the high-protein trait in teosinte is not conferred by a larger number of α-zein genes, and that the high-protein QTLs in teosinte generally, rather than specifically, increase protein content.

Cloning of the high-protein locus
To identify the QTLs associated with the high-protein trait in Ames 21814, we created a series of continuous backcrossing populations using Ames 21814 as the high-protein donor parent and B73 as the recurrent backcrossing parent (Extended Data Fig. 4a). Owing to unidirectional incompatibility between teosinte and modern maize, we used Ames 21814 to pollinate B73 for the F1 progeny (Extended Data Fig. 2a). We measured protein content with the rapid nitrogen analyser and found that the F1 seeds (B73 × Ames 21814; 11.6 ± 0.8% (s.d.)) had a protein level similar to that of B73 (10.8 ± 1.0%), whereas the Ames 21814 seeds had a protein content of 28.6 ± 1.0% (Fig. 2a). This is consistent with the accumulation pattern of zein proteins, in which α-zeins (α19 and α22) are a major indicator of the total protein content (Extended Data Fig. 4b).
However, the total nitrogen content in the roots and leaves of F1 plants was higher than that in B73 (Extended Data Fig. 4c-e), and the F2 seeds had nearly double the protein content (19.9 ± 1.2%) of F1 and B73 seeds (Fig. 2a). When F2 seeds were analysed individually by SDS-PAGE, there was no apparent variation in the accumulation of zein proteins, and α-zeins in particular were notably more abundant as compared with B73 seeds (Extended Data Fig. 4f and Supplementary Fig. 1). The lack of Mendelian-like segregation in F2 seeds indicates that the high-protein trait is determined by the maternal rather than the filial genotype.
Because the F1 (B73 × Ames 21814) plants exhibited many rudimentary teosinte phenotypes in vegetative and reproductive growth (Extended Data Fig. 2a), we used B73 as the ear parent to make the first backcrossing generation (BC1; B73 × F1). Afterwards, we used B73 as the pollen source for backcrossing (Extended Data Fig. 4a). In the BC2 (BC1 × B73) population, we observed a segregation of zein protein content among different ears in a quantitative rather than a qualitative pattern (Extended Data Fig. 4g and Supplementary Fig. 2), which indicates that the high-protein trait is regulated by multiple genetic loci. Like the F2 seeds (Extended Data Fig. 4f), when individual seeds from a high-protein BC2 ear were analysed, each seed uniformly accumulated more α-zeins than did B73 (Extended Data Fig. 4h). Subsequent backcrossing generated eight ears with the highest protein content (about 15%); these were retained and seeds from each ear were planted for analysis. Similarly, quantitative measurement of the BC3 and BC4 generations with the rapid nitrogen analyser confirmed that the protein content varied among different ears (ranging from 10 to 15%), but was uniform in individual seeds of the same ear (Extended Data Fig. 4i-l).
To identify the genetic loci that influence protein content, we planted the BC3 seeds, and saved a piece of leaf from each plant for DNA extraction. Zein and non-zein proteins from 500 BC3 ears were analysed by SDS-PAGE (Extended Data Fig. 5a and Supplementary Fig. 3). On the basis of their phenotypes, we pooled leaf DNA samples of the low- and high-protein individuals (n = 75 for each) for bulked segregant analysis (BSA) DNA sequencing. The results highlighted several QTLs on chromosomes 1, 3, 4, 5, 7 and 9, with a significant peak in the region between 130 Mb and 160 Mb (based on Teo_v1) on chromosome 9 (Fig. 2b and Extended Data Fig. 5b) that contained 315 introgressed teosinte genes. Accordingly, this locus was designated TEOSINTE HIGH PROTEIN 9 (THP9).
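The text does not say which statistic was used to call peaks in the bulked segregant analysis. As an illustration only, the sketch below computes the ΔSNP-index, one widely used BSA statistic: the fraction of reads carrying the donor (teosinte) allele in the high-protein bulk minus the same fraction in the low-protein bulk. The class name, function names and read counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one SNP for one bulk (hypothetical example data)."""
    teosinte_reads: int  # reads carrying the donor (teosinte) allele
    total_reads: int     # all reads covering the site


def snp_index(site: SiteCounts) -> float:
    """Fraction of reads at the site that carry the teosinte allele."""
    if site.total_reads <= 0:
        raise ValueError("site has no coverage")
    return site.teosinte_reads / site.total_reads


def delta_snp_index(high_bulk: SiteCounts, low_bulk: SiteCounts) -> float:
    """SNP-index of the high-protein bulk minus that of the low-protein bulk."""
    return snp_index(high_bulk) - snp_index(low_bulk)


# Hypothetical counts at a single SNP; not data from the paper.
high = SiteCounts(teosinte_reads=60, total_reads=80)
low = SiteCounts(teosinte_reads=20, total_reads=80)
delta = delta_snp_index(high, low)
print(f"delta SNP-index = {delta:.2f}")
```

A site linked to a high-protein QTL would be expected to show a positive value, because the teosinte allele is enriched in the high-protein bulk; unlinked sites should scatter around zero.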
Using the same approach, we created the BC6 (n = 1,314) and BC8 (n = 1,344) generations. BSA of BC6 and BC8 confirmed the existence of THP9. However, continuous backcrossing did not appear to result in more frequent recombination at this locus, as the two latter BSAs still contained 271 and 190 teosinte genes in this region (based on a 0.025 threshold) (Extended Data Fig. 5c-f). We performed high-coverage (higher than 20×) resequencing of this region in five high-protein and five low-protein individuals from the BC6 population, and found that the introgressed teosinte locus in the five high-protein lines recombined in the form of large DNA fragments between 22.7 and 144.4 Mb (based on B73_v4); the smallest common region (135.5-143 Mb) should be the candidate interval (Extended Data Fig. 5g). Nearly isogenic lines, NILTeo and NILB73, with high and low levels of protein, respectively, were created on the basis of this interval.
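The "smallest common region" logic amounts to intersecting the introgressed segments carried by the high-protein individuals. The sketch below is a minimal illustration under that assumption. The five per-line segment coordinates are hypothetical, chosen only to lie within the reported 22.7-144.4 Mb span and to reproduce the reported 135.5-143 Mb interval; the individual segments are not given in the text.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_Mb, end_Mb) on chromosome 9


def smallest_common_region(segments: List[Interval]) -> Interval:
    """Return the interval shared by all introgressed segments.

    The shared region starts at the largest start coordinate and ends at
    the smallest end coordinate.
    """
    if not segments:
        raise ValueError("need at least one segment")
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    if start >= end:
        raise ValueError("segments share no common region")
    return (start, end)


# Hypothetical teosinte segments in five high-protein lines (Mb, B73_v4-like
# coordinates); illustrative values only, not data from the paper.
hypothetical_segments = [
    (22.7, 143.0),
    (120.0, 144.4),
    (135.5, 144.4),
    (98.3, 143.8),
    (60.1, 144.0),
]
common = smallest_common_region(hypothetical_segments)
print(f"candidate interval: {common[0]}-{common[1]} Mb")
```

In practice the low-protein individuals also help: any teosinte segment they carry can be excluded from the candidate interval, which this sketch does not model.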
To fine-map THP9, we created a BC 9 generation (n = 2,000) that narrowed THP9 to a 150-kb region containing three genes (Zm00001d047732, Zm00001d047736 and Zm00001d047737), on the basis of the B73 reference genome (B73_v4) (Fig. 2c). Zm00001d047732, which encodes a protein phosphatase, lacks notable structural variation between Ames 21814 and B73, except for several single-nucleotide polymorphisms (SNPs) in the gene coding sequence (Extended Data Fig. 6a). The fold changes of Zm00001d047732 expression in the roots and leaves of NILTeo and NILB73 were all slight (Extended Data Fig. 6b). Zm00001d047737, which encodes an uncharacterized protein, was not expressed in the roots and leaves of NILTeo and NILB73 (according to our RNA-seq data), nor was it expressed in other tissues (according to public RNA-seq data) 23 . Zm00001d047736, which corresponds to Teo09G002926 in Ames 21814, encodes an asparagine synthetase 4 (ZmASN4). An analysis of the Ames 21814 and B73 sequences revealed that Teo09G002926 is an intact ASN4 gene (hereafter referred to as THP9-Teosinte, THP9-T), whereas Zm00001d047736 has a 48-bp deletion in the tenth intron of ASN4 (hereafter referred to as THP9-B73, THP9-B; Fig. 2c).
On the basis of published data 19 , we determined that the intron deletion in ASN4 creates altered splicing of THP9-B transcripts, which results in the formation of three different isoforms of the mRNA. The ZmASN4-T001 isoform is similar to the ASN4-Teo transcript, whereas ZmASN4-T002 and ZmASN4-T003 are defective, as both contain a premature stop codon (Extended Data Fig. 6c). RNA-seq revealed that THP9-T transcripts (ASN4-Teo) accumulate abundantly in the roots and leaves of Ames 21814, whereas the ZmASN4-T003 isoform was barely detectable in these tissues of B73 (Extended Data Fig. 6d). Further RNA-seq analysis of NILTeo and NILB73 confirmed that THP9-T is highly expressed, whereas THP9-B is barely expressed, in root and leaf tissues ( Fig. 2d and Extended Data Fig. 6e).
We amplified the ASN promoter sequences (around 1.9 kb) from Ames 21814 and B73 and tested their activities by dual-luciferase assay. The results showed that there was no significant difference in activity between the two ASN4 promoters (P > 0.05), which suggests that the differential expression of THP9-T and THP9-B is unlikely to be caused by promoter variation (Extended Data Fig. 6f). Consistent with the transcript levels, ASN4 protein accumulates abundantly in NILTeo, but is absent in NILB73 ( Fig. 2e and Supplementary Fig. 4). The results suggest that the 48-bp deletion in the tenth intron of the ASN4 gene in B73 considerably affects the RNA splicing and stability of ASN4 transcripts, making them and the ASN4 protein difficult to detect. THP9-B can therefore be considered a null allele.
We developed a molecular marker for THP9-B and used it to genotype 200 individuals in the BC 7 population (Extended Data Fig. 7a). When we measured the free asparagine content in the roots of THP9-T and the heterozygote of THP9-H (T/B), we found significantly higher levels of asparagine than in THP9-B (Extended Data Fig. 7b). The protein content of THP9-T and THP9-H seeds was also significantly higher than in THP9-B seeds (Extended Data Fig. 7c), confirming that the THP9-T allele is associated with a higher protein content. The protein content of NILTeo seeds grown in Harbin (northeast China), Shanghai (east China) and Sanya (south China) was 12 ± 0.7%, 13.1 ± 0.4% and 15.4 ± 1.0%, respectively, whereas that of NILB73 was 9.2 ± 0.5%, 9.7 ± 0.4% and 11.2 ± 1.0%, respectively. Thus, the THP9-T allele increased the protein content by 30.4%, 35.2% and 37.8%, respectively, at these three geographical locations (Fig. 2f). We also compared other phenotypes of NILTeo and NILB73 in Sanya. NILTeo showed a 7.6% increase in plant height (Extended Data Fig. 7d,e) and a 15.1% increase in plant fresh weight (root and aboveground mass), as compared with NILB73 (Extended Data Fig. 7f).
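The relative gains quoted for the three sites follow from the usual percent-increase formula, (NILTeo - NILB73) / NILB73 × 100. The sketch below applies it to the rounded site means given in the text. Computed this way the gains come out at about 30.4%, 35.1% and 37.5%; the small differences from the reported 35.2% and 37.8% presumably arise because the published values were derived from unrounded data, which is an assumption on our part.

```python
def percent_increase(treated: float, control: float) -> float:
    """Relative increase of `treated` over `control`, in percent."""
    if control <= 0:
        raise ValueError("control mean must be positive")
    return (treated - control) / control * 100.0


# Rounded mean seed protein content (%) from the text: (NILTeo, NILB73).
site_means = {
    "Harbin": (12.0, 9.2),
    "Shanghai": (13.1, 9.7),
    "Sanya": (15.4, 11.2),
}

increases = {
    site: percent_increase(teo, b73) for site, (teo, b73) in site_means.items()
}

for site, gain in increases.items():
    print(f"{site}: {gain:.1f}% higher seed protein in NILTeo")
```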
Because seed storage proteins function as a sink for nitrogen storage, we wondered whether the total nitrogen content (most of it existing as free amino acids and proteins) was increased in source tissues. We used the rapid nitrogen analyser to measure the levels of nitrogen in the stems and corresponding ears of 1,334 BC8 plants, and found that they were highly correlated (Extended Data Fig. 7g). This correlation was also observed in NILs; NILTeo had an increased total nitrogen content in the roots, stems and leaves (Extended Data Fig. 7h), as well as an increased total free amino acid content in the roots and leaves, as compared to NILB73 (Extended Data Fig. 7i). In addition, the levels of free asparagine in NILTeo roots and leaves were significantly higher than those in NILB73 (Fig. 2g), indicating that the increased accumulation of asparagine through THP9-T facilitates increased synthesis of proteins in the roots, stems, leaves and seeds.
Fig. 2 (legend fragment): In a,c, different letters indicate significant differences (P < 0.01, one-way ANOVA and further Tukey's test; see Source Data). In d,f,g, a two-tailed Student's t-test was used to determine P values (see Source Data). CEN, centromere.

Validation and natural variation of THP9
To investigate whether THP9-T can influence the low-protein phenotype of B73, we expressed this allele in transgenic plants using the ubiquitin promoter. The THP9-overexpressing plants had greatly enhanced levels of ASN4 transcript and ASN4 protein in the leaves and roots, as compared with the non-transgenic B73 control (Fig. 3a-c and Supplementary Fig. 5). Two representative THP9-overexpressing lines (THP9-OE1 and THP9-OE2) grown in Sanya were analysed. The seed protein contents of THP9-OE1 and THP9-OE2 were 15.2 ± 1% and 15.8 ± 1.1%, increases of 25.7% and 30.9%, respectively, over the B73 control (12.1 ± 0.9%) (Fig. 3d).
These results are consistent with the hypothesis that the mutation in THP9 is responsible for the low-protein phenotype of B73. As well as measuring 405 maize inbreds grown in Sanya in 2019 (see Fig. 1b and its corresponding Source Data), we measured the seed protein content of 437 inbreds that were grown in Sanya in 2020 (see Fig. 3e and its corresponding Source Data). The protein content of the 2020 crop ranged from 7.8% to 16.9%, with an average of 12.3%. Inbreds for which data were available on the seed protein content from both 2019 and 2020 were used for a genome-wide association study (GWAS) analysis, which identified a region with physical coordinates near the THP9 locus (Fig. 3e). PCR and sequencing of 420 inbreds revealed that the ASN4 gene in this population had three haplotypes (HAP1-HAP3) based on an indel polymorphism in the tenth intron: HAP1 inbreds (25.7%) belong to the Ames 21814 genotype (THP9-T), with an intact ASN4 coding sequence; HAP2 inbreds (46.4%) contain a 22-bp deletion in the tenth intron (designated THP9-T'); and HAP3 inbreds (27.9%) have the B73 genotype (THP9-B), with a 48-bp deletion in the intron (Fig. 3f). Using B73 as a standard (10-12% in our studies), we defined the high-protein inbreds as those with a protein content higher than 13%, and the low-protein inbreds as those with a protein content lower than 10%. HAP1 had the highest percentage of high-protein inbreds (44 out of 108; 40.7%), followed by HAP2 (37 out of 195; 19.0%) and HAP3 (6 out of 117; 5.1%), whereas HAP3 had the highest percentage of low-protein inbreds (30 out of 117; 25.6%), followed by HAP2 (30 out of 195; 15.4%) and HAP1 (0 out of 108; 0%). HAP1 inbreds also had the highest average protein content (12.7 ± 1.4%), whereas HAP2 had a medium level of protein (11.5 ± 1.7%) and HAP3 the lowest (11.0 ± 1.4%, Fig. 3g). 
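The haplotype percentages above can be rechecked directly from the reported counts. The sketch below does so and also encodes the two thresholds used to define high-protein (above 13%) and low-protein (below 10%) inbreds. The dictionary layout and the function names `pct` and `classify` are illustrative choices, not part of the study.

```python
# Counts per haplotype as reported in the text (420 genotyped inbreds).
counts = {
    "HAP1": {"total": 108, "high": 44, "low": 0},
    "HAP2": {"total": 195, "high": 37, "low": 30},
    "HAP3": {"total": 117, "high": 6, "low": 30},
}


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def classify(protein_pct: float) -> str:
    """Apply the thresholds from the text: >13% is high, <10% is low."""
    if protein_pct > 13.0:
        return "high"
    if protein_pct < 10.0:
        return "low"
    return "intermediate"


n_inbreds = sum(c["total"] for c in counts.values())

summary = {
    hap: {
        "share": pct(c["total"], n_inbreds),  # share of all inbreds
        "high": pct(c["high"], c["total"]),   # high-protein fraction
        "low": pct(c["low"], c["total"]),     # low-protein fraction
    }
    for hap, c in counts.items()
}

for hap, row in summary.items():
    print(hap, row)
```

The recomputed values agree with the percentages in the text, and the ordering HAP1 > HAP2 > HAP3 for the high-protein fraction is reversed for the low-protein fraction.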
Quantitative PCR with reverse transcription (qRT-PCR) showed that the transcript levels of ASN4 in HAP1 were generally higher than those in the HAP2 and HAP3 haplotypes, consistent with the average seed protein contents in the three haplotypes (Fig. 3h). This suggests that the length of deletion in the tenth intron of ASN4 is related to the final transcript levels. Together, these results support the hypothesis that THP9 is a major QTL that influences the variation of seed protein content among inbred lines.

THP9-T increases the NUE
Because THP9-T increases the free amino acid content in plants, which in turn promotes plant development and the accumulation of protein in seeds, we investigated whether THP9-T could increase the NUE. To this end, we set up an experimental site on our farm in Shanghai to test the effects of applying different levels of nitrogen fertilizer on plant growth. Several aboveground concrete containers with different concentrations of soil nitrogen were constructed. A plastic film covered the containers to prevent rainwater from affecting the soil nitrogen concentration (Extended Data Fig. 8a-d). More than 50 NILTeo and 50 NILB73 plants were grown side by side in containers either with a normal application of nitrogen (40 g per plant; normal-nitrogen condition) or without the application of nitrogen (low-nitrogen condition). We measured the levels of nitrogen in the soil and found that the pool of plants that were treated with normal levels of nitrogen contained 76.7% more nitrogen than did the low-nitrogen pool of plants (Extended Data Fig. 8e). NILTeo plants appeared to grow better than NILB73 plants in normal- and low-nitrogen conditions (Extended Data Fig. 8f-h). qRT-PCR showed that the expression of THP9-T, but not THP9-B, was strongly induced when nitrogen was applied, suggesting that THP9-T is sensitive to the level of soil nitrogen (Extended Data Fig. 8i). Without the application of nitrogen, both NILTeo and NILB73 plants were slender and had a smaller amount of root mass than was observed with the normal-nitrogen treatment, but NILTeo plants that were treated with less nitrogen were comparable in size to NILB73 plants that were treated with normal levels of nitrogen (Extended Data Fig. 8d,f). The root fresh weight and the aboveground biomass of NILTeo and NILB73 plants were greatly reduced by the low-nitrogen condition, but there was no significant difference between NILTeo in low-nitrogen and NILB73 in normal-nitrogen conditions (Extended Data Fig. 8j,k).
The total nitrogen content (mostly free amino acids and proteins) in the roots, stems and leaves, and the protein content in the seeds of NILTeo and NILB73, were affected by low levels of nitrogen, but these values (protein and nitrogen content) in NILTeo under the low-nitrogen condition were comparable to those of NILB73 in the normal-nitrogen condition (Extended Data Fig. 8l-o). We also examined the NUE in THP9-OE2 transgenic plants. Similar to NILTeo, THP9-OE2 showed an improved NUE in low-nitrogen conditions (Extended Data Fig. 9a-j). Subsequently, in 2021 we set up a larger field trial in Sanya, in which we applied different amounts of nitrogen: 100% (32 g per plant), 50% (16 g per plant), 25% (8 g per plant) and 0%. In each trial, 300 seeds of NILTeo and NILB73 were planted together (Fig. 4a). NILTeo seemed to have a growth advantage over NILB73 in terms of plant height (Fig. 4b) and aboveground biomass (Fig. 4c) under the different nitrogen conditions. The total nitrogen content in the roots, stems and leaves of NILTeo was significantly higher than that in NILB73 in all trials (Fig. 4d-f). As the nitrogen application was reduced from 100% to 50%, 25% and 0%, the protein content in NILTeo seeds decreased from 14.2% to 13.5%, 12.0% and 10.7%, whereas in NILB73 seeds it decreased from 11.4% to 11.2%, 10.7% and 8.9% (Fig. 4g). The results indicate that seed protein content is sensitive to the level of soil nitrogen, and in each treatment, NILTeo seeds always maintained a higher level of protein than NILB73 seeds. The protein content in NILTeo seeds that were harvested at 25% nitrogen reached 12.0%, which was higher than that of NILB73 seeds (11.4%) that were treated with normal levels of nitrogen. These results are consistent with the hypothesis that THP9-T confers a higher NUE than THP9-B in NILB73 in normal- and low-nitrogen conditions.
B73 and Mo17 (HAP2 type) are inbreds that are frequently used to study the vigour of hybrids. To examine whether THP9-T could increase the protein content of hybrids and influence other agronomical traits, we generated two sets of F1 seeds (NILTeo × Mo17 and NILB73 × Mo17) and planted them at the Harbin site. The protein content of F2 seeds from the NILTeo × Mo17 cross (9.2 ± 0.6%) was 7.8% higher than that of seeds from the NILB73 × Mo17 cross (8.6 ± 0.4%), whereas the 100-kernel weight was nearly identical for the two hybrids (35.4 g versus 35.6 g; Fig. 5a-c).
We measured the seed protein content of NILTeo × Mo17 and Zhengdan 958-T hybrids that were harvested in Shanghai in 2022, which also showed a significant increase compared with the corresponding control (Extended Data Fig. 10). The results suggest that THP9-T has the potential to improve the protein content of maize seeds and plants through plant breeding.
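The Sanya nitrogen series reported in this section (Fig. 4g) can be tabulated to make the comparison between the two NILs explicit. The sketch below uses only the seed protein values quoted in the text. It computes the NILTeo advantage at each nitrogen level and finds the lowest nitrogen level at which NILTeo still matches or exceeds NILB73 grown with the full application. Variable names are illustrative.

```python
# Nitrogen application as a percentage of the full rate (32 g per plant).
nitrogen_levels = [100, 50, 25, 0]

# Seed protein content (%) at each nitrogen level, as quoted in the text.
seed_protein = {
    "NILTeo": [14.2, 13.5, 12.0, 10.7],
    "NILB73": [11.4, 11.2, 10.7, 8.9],
}

# Absolute advantage of NILTeo over NILB73 (percentage points) per level.
advantage = [
    round(teo - b73, 1)
    for teo, b73 in zip(seed_protein["NILTeo"], seed_protein["NILB73"])
]

# Lowest nitrogen level at which NILTeo seed protein is at least that of
# NILB73 under the full (100%) application.
b73_full = seed_protein["NILB73"][0]
lowest_matching_level = min(
    level
    for level, teo in zip(nitrogen_levels, seed_protein["NILTeo"])
    if teo >= b73_full
)

for level, gap in zip(nitrogen_levels, advantage):
    print(f"{level:>3}% N: NILTeo advantage = {gap} percentage points")
print(f"NILTeo matches full-N NILB73 down to {lowest_matching_level}% N")
```

This restates the observation in the text that NILTeo at 25% nitrogen (12.0%) still exceeds NILB73 at the full rate (11.4%); it is a descriptive comparison of reported means, not a statistical test.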

Discussion
Variable seed protein content in maize
Genetic variability in terms of seed protein content is well documented in maize. More than 100 years ago, the University of Illinois initiated a breeding program to examine the consequences of artificial selection on seed composition. High-protein and low-protein phenotypes were among the traits selected. Midway through the decades-long process, plant breeders reversed the selection, and converted the high-protein germplasm to a low-protein phenotype, and vice versa with the low-protein selection. The outcome was four genetic strains of maize: Illinois high protein (IHP), reverse high protein (RHP), Illinois low protein (ILP) and reverse low protein (RLP), which had protein contents of about 30%, 7%, 4% and 15%, respectively 24. The results of this experiment suggest the existence of both positive and negative genetic factors influencing protein content in natural maize populations, which are likely to be controlled by multiple genetic loci 25.
Because modern maize was domesticated from teosinte, we reasoned that characterizing the genes responsible for the high-protein trait in teosinte might reveal a more diverse set of QTLs than those found in recent inbred maize populations. The results might also help us to understand the reasons for the decrease in seed protein content during the domestication of maize 2 . In addition, teosinte contains high levels of free amino acids-especially asparagine-in the roots, stems and leaves ( Fig. 1c and Extended Data Fig. 1c); this suggests that it has high nitrogen assimilation, which could contribute to the seed protein content and NUE. The challenge was to create a complete teosinte genome sequence, and this led us to characterize the nucleotide sequence of a high-quality teosinte haplotype, Zea mays subsp. parviglumis, Ames 21814 ( Fig. 1d and Extended Data Fig. 2).

THP9 encodes an asparagine synthetase
We assembled a high-quality teosinte haplotype genomic sequence that helped us to identify the genes responsible for QTLs associated with its high seed protein phenotype 1 . Significant QTLs were found on chromosomes 1, 3, 4, 5, 7 and 9 (Fig. 2b), but we focused on THP9, because it had the greatest effect and was the highest peak revealed by BSA DNA sequencing. THP9 encodes an asparagine synthetase 4 (ASN4) gene. The intact THP9-T allele is highly expressed in teosinte roots and leaves, but the THP9-B allele is not functional in these tissues in B73, owing to mis-splicing of its transcripts (Fig. 2d); this probably leads to differences in nitrogen assimilation.
ASN is an important enzyme in the metabolism of nitrogen 26 and it has a key role in the nitrogen response network 27. Previous research in Arabidopsis 28-30, rice 13,31,32, wheat 33,34 and barley 35 showed that changes in the expression of ASN genes alter plant growth and nitrogen content, and that the level of ASN expression is affected by the environment. In Arabidopsis thaliana, studies of AtASN1, AtASN2 and AtGS confirmed the effect of asparagine on the nitrogen content in seeds, floral organs, leaves and plants 28-30. In rice (Oryza sativa), studies of OsASN1 confirmed the importance of asparagine for plant nitrogen and for grain protein content 13. The increase in ASN activity leads to enhanced assimilation of nitrogen, which results in more asparagine being transported to the seed for protein synthesis 26,36.
In B73, there are four ASN genes: ZmASN1-ZmASN4 (Zm00001d045675, Zm00001d044608, Zm00001d028750 and Zm00001d047736) (ref. 37 ). ZmASN1 appears to be expressed in all maize tissues, including the root, stem, leaf, endosperm and embryo. ZmASN2 is mainly expressed in the endosperm and embryo, according to public RNA-seq data 23 . ZmASN3 (on chromosome 1) and ZmASN4 (on chromosome 9) could have resulted from an ancestral gene duplication 38 . These two genes are functional in Ames 21814 and could have an additive effect in asparagine synthesis.
ZmASN3 was annotated as an intact gene in the B73 genome, but it is expressed at low levels in leaves, cobs and silks 23 . The four maize ASN genes could have a redundant function for asparagine synthesis, but the absence of ZmASN4 and probably any of the other three ZmASN genes leads to asparagine insufficiency in the plant. When THP9-T was introgressed into B73, the asparagine content in roots and the nitrogen content of the entire plant were significantly enhanced ( Fig. 2g and Extended Data Fig. 7h), providing evidence of the importance of THP9-T for NUE and seed protein content. We anticipate that overexpression of other ZmASN genes might also increase the seed protein content and NUE in maize.
Our data suggest that THP9-B is a null allele, as the ZmASN4 protein is missing in B73 (Fig. 2e). Public data showed that THP9-B gives rise to three mRNA isoforms. Of these, the functional one was undetectable in our assays, and the other two are defective because the 48-bp deletion in the tenth intron leads to mis-splicing, which creates a premature stop codon in THP9-B transcripts. On the basis of prediction using Pfam searches (https://www.ebi.ac.uk/interpro/), the THP9-T allele produces a protein of 588 amino acids, whereas the barely detectable isoform 3 of THP9-B can only be translated into a truncated protein with 480 amino acids. Amino acids are essential substrates for protein synthesis, and their levels in the plant are influenced by soil nitrogen availability and the NUE of the plant 39 . During amino acid synthesis, asparagine has a primary role in nitrogen recycling, and it acts as a nitrogen donor for multiple aminotransferases 40 . Owing to its high nitrogen-to-carbon ratio and inert nature, free asparagine is a major carrier for nitrogen storage and long-distance transport in the plant 26 . ASN, which is responsible for transferring amide groups from glutamine to aspartate and forming asparagine, determines the assimilation, remobilization and allocation of nitrogen in the plant 28,41 .
With limited soil nitrogen, the amino acid supply for protein synthesis can be increased by a greater NUE 36 . NUE is determined by multiple processes, namely, nitrogen uptake, transport, assimilation and remobilization, of which nitrogen assimilation has been actively studied 42,43 . Looking to the future, there is economic and environmental pressure to maintain high-yielding maize while reducing the level of nitrogen applied to the soil. Therefore, it is crucial to identify genetic factors that increase the NUE. Several studies have shown an association between QTLs and enzymes related to nitrogen metabolism 44-47 ; however, the genes responsible for QTLs that control nitrogen assimilation have not been cloned.

Potential use of THP9-T for breeding
THP9-T had a stable phenotypic effect at different geographical locations and when treated with different levels of nitrogen, which is essential for its practical application. However, we found that the protein content of NILTeo seeds was only half that of teosinte. We can offer two possible explanations: (1) the high protein content of teosinte is regulated by multiple QTLs, and the remaining uncharacterized QTLs on other chromosomes make substantial contributions to the high-protein phenotype of teosinte 1 (Fig. 2b); (2) the seed protein content is affected by the distribution of nitrogen from source to sink. Teosinte seeds are small and their yield per plant is low. Therefore, the concentration of amino acids allocated to a single seed could be greater for protein synthesis. The seeds of modern inbred lines are larger than those of teosinte, and there are more seeds per ear and per plant; in consequence, a lower concentration of amino acids allocated from the source to the seed could limit protein synthesis. The protein content of hybrids with the introgressed THP9-T allele was lower than that in NILTeo (Figs. 2f and 5c,h), supporting this hypothesis.
As most of the increased protein in teosinte and NILTeo is α-zein, which is essentially devoid of the essential amino acid lysine, the increased protein content has limited nutritional value for monogastric animals and humans. However, THP9-T could be introgressed into quality protein maize (QPM)-which contains less zein and more non-zein proteins, owing to the o2 mutation-to create high-protein, high-lysine hybrids. The improved QPM will be valuable for future food security, in particular in countries where maize is consumed as a major source of protein.
The HAP1-HAP3 genotypes imply that some, if not all, of the ancestral high-protein QTLs were retained in domesticated populations of maize. Nevertheless, seed protein content declined during modern maize breeding 9 . Because THP9-T is superior to THP9-B and THP9-T' for protein synthesis and has no apparent negative effect on yield, why THP9-T was not retained in elite maize germplasm is a key question. One possible explanation is that THP9 was not under selection pressure, owing to the ample application of nitrogen fertilizer. This could have become a vicious cycle, with the low NUE of THP9-B requiring more nitrogen fertilizer to improve the yield and protein content. NUE has key environmental and economic implications for global food security, and research to understand NUE is necessary if we are to maintain high yields and high protein quality with a low input of nitrogen 48 . Several genes and QTLs that affect the NUE in rice, including NRT1.1B, OsTCP19, GRF4 and NGR5, have been cloned 49 . Superior alleles of these genes or QTLs offer the potential to achieve high, stable rice yields with low levels of nitrogen application. Root nitrogen sensing has been found to be affected by several external factors, and strategies are being developed to increase the nitrogen acquisition efficiency under varying nitrogen conditions for crop production 42,50 . Our research shows the possible value of hybrids that contain the THP9-T allele, although larger field trials in multiple geographical locations will be needed to fully establish the potential of THP9-T for improving the seed protein content and NUE in maize breeding.
These hybrids perform well in a nitrogen-poor environment, and maintain a normal yield when reduced levels of nitrogen are applied. Additional research on NUE, based on the high-quality teosinte genome sequence, could lead to other QTLs that improve modern hybrids. The marked structural variation between the genomes of Zea mays subsp. parviglumis, Ames 21814 and B73 will also be beneficial for investigating the genes that may be responsible for the phenotypic modifications that occurred during the domestication of teosinte.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-05441-2.

Plant materials
We obtained 30 teosinte lines (20 lines of Zea mays subsp. parviglumis and 10 lines of Zea mays subsp. mexicana) from the laboratory of J. Messing at Rutgers University, USA. They were originally obtained from the North Central Regional Plant Introduction Station (NCRPIS), USA. The 518 inbred lines that were used for the GWAS analysis were obtained from the laboratory of J. Lai at China Agricultural University. A teosinte line (Zea mays subsp. parviglumis, accession number: Ames 21814) was used for genome sequencing and the creation of mapping populations and NILs. The maize genetic variants were grown in the experimental fields in Shanghai (30.5° N, 121.1° E), Harbin (44.0° N, 125.4° E) and Sanya (18.2° N, 109.3° E).
Ames 21814 pollen was used to fertilize B73 ears, and the resulting F 1 generation was used for pollen to backcross with B73, yielding BC 1 . We chose a single BC 1 ear for planting. In the following generations, we used B73 pollen for backcrossing. The zein and non-zein protein accumulation patterns of 108 BC 2 ears were characterized, and the protein accumulation pattern showed quantitative segregation. In a single ear, all seeds contained a uniformly high or low content of α-zeins. The BC 2 seeds from ears with a high protein content were selected for planting. We created continuous backcrossing populations, yielding BC 3 (n = 500), BC 4 (n = 500), BC 5 (n = 1,000), BC 6 (n = 1,314), BC 7 (n = 1,200), BC 8 (n = 1,344) and BC 9 (n = 2,000). In each generation, we measured the protein content ear by ear.
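As a rough guide to what repeated backcrossing achieves, the expected fraction of donor (Ames 21814) genome halves with each backcross to B73, starting from one half in the F 1 . The sketch below encodes only this textbook expectation for unlinked loci in the absence of selection; it is not a measurement from this study, the selected high-protein region is of course retained by design, and the function name is illustrative.

```python
def expected_donor_fraction(n_backcrosses: int) -> float:
    """Expected donor-genome fraction in generation BC_n.

    Assumes unlinked loci and no selection: the F1 carries 1/2 donor
    genome, and each backcross to the recurrent parent halves it.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 0.5 ** (n_backcrosses + 1)


# Generations used in the text for selection, NIL development and fine-mapping.
for n in (1, 2, 6, 9):
    print(f"BC{n}: expected donor fraction = {expected_donor_fraction(n):.4%}")
```

Under these assumptions, most of the donor genome outside the selected region is expected to be replaced by B73 sequence by BC 6 , which is consistent with using BC 6 ears as the starting point for the NILs.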
To obtain homozygous NILTeo and NILB73, 20 BC 6 independent ears with a high protein content (about 15%) were planted as 20 groups. Thirty plants of each group were self-pollinated, yielding 600 BC 6 F 2 ears, which formed 30 × 20 subgroups. The protein content of all subgroup ears was measured and 50 BC 6 F 2 subgroup ears with a high protein content were planted. Twenty plants of each subgroup were self-pollinated, yielding 1,000 BC 6 F 3 ears that were measured for protein content. If individual ears in a subgroup had a uniformly high protein content-namely, no segregation-they should be homozygous for the high-protein locus and were identified as NILTeo. By contrast, if all the ears in a subgroup uniformly had a protein content similar to B73 (about 10%), the ears were designated NILB73. NILTeo and NILB73 were propagated by self-pollination.
Five BC 6 F 4 NILTeo and five NILB73 individuals were selected for 20× resequencing. The linkage analysis was performed by genotyping 200 BC 7 F 2 plants and measuring the protein content of the corresponding BC 7 F 3 ears.
The overexpression vector of THP9-T fused with Flag (ubiPro:THP9-T) was constructed and then transformed into B73 via Agrobacterium-mediated transformation by Wimi Biotechnology (http://www.wimibio.com/). The primer sequences used in this study are shown in Supplementary Table 7.

Resolving the Ames 21814 haplotype with trio binning
Because of advances in long-read DNA-sequencing technologies, many high-quality maize inbred haplotypes have been assembled successfully 19 . Inbreeding simplified the assembly consensus process, as most of the regions are homozygous. However, teosinte lines make it hard to untangle the two haplotypes owing to the high heterozygosity caused by open pollination.
High-quality genomic DNA was extracted from fresh leaves of the F 1 crossed with B73 and Ames 21814, followed by library construction according to the standard protocol of PacBio (Pacific Biosciences). DNA sequencing on the PacBio Sequel II HiFi platform, which produces high-fidelity reads with CCS (v.4.2.0; https://github.com/Pacific-Biosciences/ccs), was done by Shanghai OE Biotech. In addition, we generated 50× Illumina PE 150 reads for the parental B73 and Ames 21814 genomic DNA, respectively. We used yak (https://github.com/lh3/yak) to generate the 31-mer database with the parental short reads. We applied the hifiasm (v.0.16.1) trio mode 18 , a de novo assembler that could faithfully preserve the contiguity of all haplotypes, to assemble the haplotypes of Ames 21814. Contaminants, such as organelle DNA or rDNA fragments, were removed by BLASTN. We mapped Hi-C reads to the assembly with the Juicer pipeline (v.1.5.7) (ref. 51 ) and scaffolded by 3D-DNA (version: 180419) with "-r 0 -m haploid". False duplications and phase error were manually curated on the basis of yak trioeval within Juicebox (v.1.11.08) (ref. 52 ). Finally, we used yak to evaluate the base accuracy of the genome assembly.

Measurement of protein content with the rapid nitrogen analyser
To measure the total nitrogen content in seeds and other tissues (root, stem and leaf), the samples were first dried to a constant weight at 65 °C and then powdered using a grinder (Tissuelyser-48, Shanghai Jingxin Industrial Development; 60 Hz, 60 s). A total of 50-70 mg of powder was wrapped in tin foil as the test sample. The total nitrogen content was determined using the Dumas rapid nitrogen analyser ('rapid N exceed') from Elementar. Before each round of measurement, it was necessary to weigh about four standard asparagine samples for internal controls. After debugging the machine, the weight of each sample was entered in the weight column of the rapid N exceed software (v.1.1.25), and the following options were selected as program settings: O 2 dosing time, 60 s; O 2 dosing flow, 120 ml min −1 ; O 2 cut-off threshold, 15%; Autozero delay, 30 s; and peak anticip., 90 s. At the same time, the packaged samples were placed into the sample tank according to the corresponding serial number. Fifty-five samples were measured in one round. The data were exported in Excel format for analysis.

Measurement of free amino acids
The roots, stems and leaves of different plant genetic materials were analysed to determine the content of free amino acids at the flowering stage. Plant materials were dried at 65 °C to a constant weight and ground. Thirty milligrams of powder was treated in 1 ml distilled water at 4 °C for 8 h and then homogenized. The powder was hydrolysed with 6 M hydrochloric acid at 110 °C for 24 h; after filtration, 100 μl of the liquid was mixed with 100 μl 5 M NaOH and 800 μl distilled water. After centrifugation at 5,500g for 5 min, 50 μl of supernatant was mixed with amino mixed standards (MSLAB50AA) and 50 μl 4% sulfosalicylic acid solution, and the mixture was centrifuged at 17,370g at 4 °C for 4 min. The supernatant was mixed with 50 μl borate buffer (0.1 M, pH 8.8) and then derivatized with 20 μl 6-aminoquinoline-N-hydroxyl succinimide carbamate at 55 °C for 15 min. After cooling and centrifuging at 4 °C, 50 μl supernatant was analysed by ultra-performance liquid chromatography (UPLC; Ultimate 3000)-tandem mass spectrometry (MS/MS; API 3200 Q TRAP). Chromatographic separations were performed on an MSLab HP-C18 column (150 × 4.6 mm, 5 μm). The mobile phase consisted of water (A) and acetonitrile (B). The solvent was delivered to the column at a flow rate of 0.8 ml min −1 . The conditions for MS/MS detection were as follows: positive-ion mode; ion spray voltage, 5,500 V; nebulizer gas pressure, 55 psi; curtain gas pressure, 20 psi; collision gas pressure, medium; turbo gas temperature, 500 °C; entrance potential, 10 V; collision cell exit potential, 2 V. Nitrogen gas was used as the collision gas in a multiple-reaction-monitoring mode. The data were obtained using Analyst software v.1.5.1 (Applied Biosystems). Amino acid detection and data analysis were performed by Beijing Mass Spectrometry Medical Research.

Extraction and SDS-PAGE analysis of zein and non-zein proteins
The endosperm was first dried to a constant weight at 65 °C and then ground to a fine powder in a tissue grinder. A total of 100 mg of flour was extracted with 1 ml of zein extraction buffer (3.75 mM sodium borate, 2% 2-mercaptoethanol (v/v), 0.3% SDS and 70% ethanol). After incubation for 2 h or overnight, the mixture was centrifuged at 17,370g for 10 min. One hundred microlitres of supernatant was transferred to a new tube and mixed with 10 μl of 10% SDS. The solution was vacuum-dried in a Concentrator Plus (Eppendorf) and the precipitate was redissolved in 100 μl ddH 2 O. For the extraction of non-zein proteins, a total of 100 mg of flour was extracted with 1 ml of zein extraction buffer three times. After each centrifugation, the supernatant was discarded. Finally, the precipitate was vacuum-dried and then redissolved in 1 ml of non-zein extraction buffer (12.5 mM sodium tetraborate, 2% 2-mercaptoethanol (v/v) and 5% SDS) for 2 h at room temperature. After centrifugation at 17,370g for 10 min, the supernatant containing the non-zein proteins was transferred to a new tube. Then, 3 μl of zein and non-zein proteins was analysed by 15% SDS-PAGE 57 .

GWAS analysis
We planted 512 inbred lines at Sanya in 2019 and 2020, and 405 and 437, respectively, were harvested for the measurement of seed protein content using the Dumas rapid nitrogen analyser. For each inbred, three ears were used for biological repetition. For each ear, six seeds were dissected for measurement. A linear mixed model was used for the best linear unbiased prediction (BLUP) of seed protein content using the lme4 package 58 in R (R Core Team, 2019). Lines with seed protein content data from both years (2019 and 2020) were kept for analyses. The linear mixed model was: $Y_{ijk} = \mu + \mathrm{Genotype}_i + \mathrm{Year}_j + (\mathrm{Genotype} \times \mathrm{Year})_{ij} + \varepsilon_{ijk}$, in which $Y_{ijk}$ is the seed protein content, $\mu$ is the overall mean, $\mathrm{Genotype}_i$ is the genotype effect ($i = 1, 2, 3, \ldots, n$) and $\varepsilon_{ijk}$ is the error. Year was treated as a fixed effect and all other terms as random effects. GWAS analysis was conducted with 1.63 million high-quality SNPs from the maize haplotype map 59 by GEMMA (v.0.98.1) software 60 . Three principal components were fitted, and the centred identical-by-state kinship matrix was used as a random effect in the GWAS model.

Gene location
(1) BSA sequencing. Plants of the BC 4 , BC 6 and BC 8 populations were labelled, and a piece of leaf was sampled for DNA extraction. After measurement of the protein content of the next-generation seeds by SDS-PAGE (BC 4 ) and using the Dumas rapid nitrogen analyser (BC 6 and BC 8 ), the corresponding high-protein and low-protein plants were identified. DNA samples of 75, 150 and 50 for each phenotype in the BC 4 , BC 6 and BC 8 populations, respectively, were pooled and the library construction and sequencing of BC 4 and BC 6 were performed by Personal Biotechnology (Shanghai). The sequencing of BC 8 and data analysis were performed by Shanghai OE Biotech.
(2) Data quality evaluation. The raw reads generated by high-throughput sequencing were preprocessed by fastp (v.0.19.5) software 61 . The quality filter included four steps: (1) removing the linker sequence; (2) removing reads with N (non-AGCT) bases greater than or equal to 5; (3) performing a sliding window with four bases as the window size, and removing reads with an average base quality value of less than 20; and (4) after filtering, removing reads with a length of less than 75 bp or an average base quality value of less than 15. Then, BWA-MEM (v.0.7.12) (ref. 62 ) was used to align the clean reads to the reference genome. After the alignment, results were formatted and sorted by SAMTools (v.1.9), and the duplication reads were removed by Picard (v.4.1.0.0).
(3) Variant information detection. On the basis of the alignment of the sample sequencing data with the reference genome, SNP and indel detection were performed using the HaplotypeCaller module of the GATK (v.4.1.0.0) software 63 .
(4) G-value analysis. The G-value analysis was implemented in QTLseqr 64 . After manually filtering the SNPs, G′, a smoothed value of the standard G statistic, was calculated in an 8-Mb window size. The red line indicates the threshold of the G′ value, corresponding to a q value of 0.01.
(5) Analysis of gene introgression. The coverage depth of high bulk (high protein) and low bulk (low protein) on each window was calculated with a 25-kb window, and then normalized (divided by the respective average sequencing depth). The normalized low bulk depth was subtracted from the normalized high bulk depth to obtain the delta depth.
(6) Resequencing mapping analysis. The introgressed genes on chromosome 9 from Ames 21814 were analysed. The physical coordinates of the extracted regions were based on the B73 reference genome (B73_V4). On the basis of the differences in genomic sequence between teosinte Ames 21814 and B73, the SNPs between teosinte and B73 were used as mapping markers. The number of SNPs was counted every 10 kb as a window (SNP density) in each resequencing sample. Then the ggplot2 package in R (v.3.5.1) was used for plotting.
(7) Fine-mapping: More than 2,000 BC 9 individuals were planted, numbered, sampled and self-pollinated. The seed protein content of BC 9 F 2 was determined by a rapid nitrogen analyser. According to the teosinte Ames 21814 haplotype sequence and the B73 reference genome sequence, we designed molecular marker primers (Supplementary Table 7). On the basis of the genotypes of molecular markers and corresponding seed protein contents, THP9 was narrowed down to an interval between two markers, 140.2 and 140.3, on chromosome 9, based on the B73 reference genome (B73_V4). All figures were plotted with ggplot2 (v.3.3.5).
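The delta-depth calculation described in step (5) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that per-window depths have already been computed for each bulk, and that the 'average sequencing depth' of a bulk can be approximated by the mean of its window depths. The function names and example depths are hypothetical.

```python
from statistics import mean


def normalized_depth(window_depths):
    """Divide each window depth by the bulk's average depth.

    Assumption: the bulk average is taken as the mean over the supplied
    windows (for example, 25-kb windows along a chromosome).
    """
    average = mean(window_depths)
    if average == 0:
        raise ValueError("average depth is zero; cannot normalize")
    return [depth / average for depth in window_depths]


def delta_depth(high_bulk_depths, low_bulk_depths):
    """Normalized high-bulk depth minus normalized low-bulk depth, per window."""
    if len(high_bulk_depths) != len(low_bulk_depths):
        raise ValueError("both bulks must cover the same windows")
    high = normalized_depth(high_bulk_depths)
    low = normalized_depth(low_bulk_depths)
    return [h - l for h, l in zip(high, low)]


# Hypothetical depths for three windows; the third window is relatively
# over-represented in the high-protein bulk.
print(delta_depth([10, 10, 20], [5, 5, 5]))
```

Normalizing each bulk before subtraction keeps the comparison independent of differences in total sequencing yield between the two pools.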

Analysis of the structural variation of THP9
We analysed gene structural variation using GSDS 2.0 (Gene Structure Display Server 2.0), on the basis of the Ames 21814 haplotype and B73 genome sequences. ASN transcripts in the root and leaf of B73 × Ames 21814 were analysed using kallisto (v.0.44.0) for determining the allele-specific expression.

Dual-luciferase reporter assay
We performed a dual-luciferase reporter assay to detect the promoter activities. The ASN4 promoter sequences (around 1.9 kb) were amplified from B73 and Ames 21814, and cloned upstream of the LUC gene in the reporter vector pGreenII 0800. The constructs were transformed into the B73 leaf protoplasts. After incubating for 16 h, the transformed protoplasts were used for total protein extraction and then analysed on a luminometer (Promega 20/20) using a commercial LUC analysis kit according to the manufacturer's instructions (Promega, E1960). Three biological replicates were performed for each experiment. The primers are listed in Supplementary Table 7.

Genetic confirmation of THP9 in B73
The full-length coding sequence of THP9-T was amplified from Ames 21814 root cDNA and fused with a Flag tag at the N terminus. This DNA fragment was inserted downstream of the ubiquitin promoter. The construct was transformed into B73 by Agrobacterium-mediated transformation. This was done by Wimi Biotechnology. The primer sequences used in this study are shown in Supplementary Table 7.

RNA extraction, reverse transcription and qRT-PCR
The leaf and root tissues of B73, Ames 21814 and the NILs were frozen in liquid nitrogen and stored at −80 °C. These materials were ground into fine powder, and a total of 100 mg was extracted with TRIzol reagent (Invitrogen, 15596018). RNA was purified with an RNeasy Mini Kit (Qiagen, 74106) after DNaseI digestion (Qiagen, 79254) and used for reverse transcription with a SuperScript III First Strand Synthesis Kit (Invitrogen, 18080051). cDNA was diluted to 80 ng μl −1 for qRT-PCR with SYBR Green (TAKARA) on a CFX Connect Real-Time System (Bio Rad). The maize Actin gene was used as an internal control and the relative gene-expression level was calculated by the comparative CT method (ΔΔCt method). The expression level in the control was set to 1. All data were generated from three replicate biological samples, and means and s.d. were calculated. The primer sequences are shown in Supplementary Table 7.
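For readers unfamiliar with the comparative CT calculation, a minimal sketch is given below. It assumes the standard form of the method, in which amplification efficiency is taken to be 100% so that each cycle corresponds to a twofold change; the function name and the Ct values in the example are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Relative expression by the comparative CT (delta-delta-Ct) method.

    The reference gene (Actin in the text) normalizes each condition,
    and the control condition is defined as 1.
    Assumption: 100% amplification efficiency (base 2).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target gene crosses threshold three cycles
# earlier in the sample than in the control, at equal reference-gene Ct.
print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

Applying the function to the control against itself returns 1, which matches the convention stated above that the expression level in the control was set to 1.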

Zein copy number analysis
To accurately locate zein gene clusters, BLASTN was used to align the assembled Ames 21814 haplotype with the known α-zein clusters and flanking genes of the B73 and W22 inbreds 65 for copy number analysis. To further clarify the copy number, stringent parameters of BLASTN were chosen as follows: -evalue 1e-10.

Antibody preparation and immunoblot analysis
A partial ASN4 protein fragment from the 460th to the 588th amino acid was used to make antibodies (ABclonal, Wuhan). To analyse the protein accumulation of ASN4 in the roots of NILTeo and B73, total proteins were extracted using the non-zein buffer. Twenty micrograms of total protein was separated by 10% SDS-PAGE and then transferred electrophoretically to a PVDF membrane. The protein was detected with ASN4 antiserum at a dilution of 1:1,000 at 4 °C overnight, followed by secondary anti-rabbit-HRP at a concentration of 1:5,000 (Abmart, M21002L). The control protein, ACTIN, was detected with a mouse monoclonal actin antibody (Abmart, M20009L) at a dilution of 1:1,000, and a secondary antibody, anti-mouse IgG-HRP (Abmart, M21001L) at a dilution of 1:5,000. The membranes were treated with chemiluminescence substrate reagent (Invitrogen, WP20005), and then immunoreactive bands were detected using the Tanon-5200 system. To examine ASN4 in THP9-OE1 and THP9-OE2 plants, total protein was extracted from the leaf. Immunoblotting used anti-Flag (Sigma, A8592) as the primary antibody at a dilution of 1:1,000, and anti-mouse IgG-HRP (Abmart, M21001L) as the secondary antibody at a dilution of 1:5,000. Imaging was done with a Tanon-5200 system (Tanon).

NUE testing
In 2021, we planted NILB73 and NILTeo at the Songjiang experimental field in Shanghai, using soil in cement tanks with or without normal nitrogen application. For normal nitrogen, 20 g of nitrogen fertilizer was applied to each plant at the seedling stage (V4) and 20 g at the jointing stage (V12). The nitrogen content of the fertilizer is 17%. Gene expression, plant height, aboveground biomass, root biomass, root, stem and leaf nitrogen content and seed protein content were investigated. In addition, to perform a test of the NUE of transgenic plants, we planted wild-type and THP9-OE2 plants at the Songjiang experimental field in Shanghai in 2022, under the same nitrogen fertilizer application conditions as in 2021.
Larger field trials were performed in Sanya in 2021. Four different nitrogen applications were tested: normal application (16 g per plant applied at each seedling stage (V4) and jointing stage (V12)); 50% (8 g per plant applied at each seedling stage (V4) and jointing stage (V12)); 25% (4 g per plant applied at each seedling stage (V4) and jointing stage (V12)); and 0% (no nitrogen applied). The nitrogen content of the fertilizer is 17%. Each treatment contained 300 plants grown at 0.6 m × 0.25 m for each plant. Plant height, aboveground biomass, total nitrogen content of the root, stem and leaf, seed protein content and amino acid content were measured.
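The nitrogen doses above can be converted to elemental nitrogen per plant with simple arithmetic. The sketch below assumes that the stated gram amounts refer to fertilizer mass (of which 17% is nitrogen) and that '0.6 m × 0.25 m' is the spacing occupied by each plant; both readings are interpretations of the text, and the names used are illustrative.

```python
# Assumption: gram amounts in the text are fertilizer mass, 17% of which is nitrogen.
FERTILIZER_N_FRACTION = 0.17
APPLICATIONS = 2  # seedling (V4) and jointing (V12) stages

# Fertilizer per plant per application (g) for the four Sanya treatments.
TREATMENTS_G = {"normal": 16.0, "50%": 8.0, "25%": 4.0, "0%": 0.0}


def elemental_n_per_plant(fertilizer_g_per_application):
    """Total elemental nitrogen (g) supplied to one plant over both applications."""
    return fertilizer_g_per_application * APPLICATIONS * FERTILIZER_N_FRACTION


def plants_per_hectare(row_spacing_m=0.6, plant_spacing_m=0.25):
    """Planting density implied by the per-plant spacing (1 ha = 10,000 m^2)."""
    return 10_000.0 / (row_spacing_m * plant_spacing_m)


for name, grams in TREATMENTS_G.items():
    print(f"{name}: {elemental_n_per_plant(grams):.2f} g N per plant")
print(f"density: {plants_per_hectare():.0f} plants per hectare")
```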

Introgression of THP9-T in hybrids
NILTeo and NILB73 were crossed with Mo17 to create hybrids that were grown in Harbin. Using molecular marker selection, THP9-T was introgressed into Zheng 58 and Chang 7-2 by backcrossing for four generations. The marker was developed on the basis of an indel polymorphism between THP9-T and THP9-B. After backcrossing, the resulting plant materials were self-pollinated for two generations, creating Zheng 58-T and Chang 7-2-T. The cross of Zheng 58-T and Chang 7-2-T produced a modified hybrid, Zhengdan 958-T, that carried the THP9-T allele. Zhengdan 958-T and Zhengdan 958-B, with the THP9-B allele, were grown in Sanya in 2021. Plant height, aboveground biomass, total nitrogen content of the root, stem, and leaf and seed protein content were measured.

Statistical analysis
GraphPad Prism v.8.0.2 and Microsoft Excel 2016 were used for the statistical analyses (one-way ANOVA, Tukey's test and two-sided Student's t-test).

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
All sequencing data for creation of the Ames 21814 haplotype have been deposited at the National Genomics Data Center (NGDC; https://ngdc.cncb.ac.cn/) under the BioProject number PRJCA011706, in which the PacBio HiFi data are under the accession number SAMC874385, the Hi-C data are under the accession number SAMC873392, the PacBio isoform sequencing data are under the accession number SAMC873393 and the Illumina WGS data are under the accession numbers SAMC874386 and SAMC874387. The final assembled genome sequence data reported in this paper have been deposited under accession number GWHBKHM00000000 that is publicly accessible at https://ngdc.cncb.ac.cn/gwh. The RNA-sequencing data of Ames 21814, B73 × Ames 21814 and B73 (roots and leaves at flowering stage) are under the accession numbers SAMC874334-SAMC874357. Source data are provided with this paper.
Ames 21814 and B73. Data are presented as mean ± s.d. (n = 6 biologically independent samples). d, SDS-PAGE of zein and non-zein proteins in B73 and 10 teosinte lines. The apparent size in kDa of each protein band is indicated on the left. M, protein markers. γ27, 27-kDa γ-zein; α22, 22-kDa α-zein; α19, 19-kDa α-zein; γ16, 16-kDa γ-zein; γ15, 15-kDa γ-zein; δ10, 10-kDa δ-zein. The SDS-PAGE analysis was repeated independently three times with similar results. In a and b, a two-tailed Student's t-test was used to determine P values, see Source Data.
Fig. 2 | Ames 21814 haplotype assembled by trio binning. a, Phenotypes of teosinte Zea mays subsp. parviglumis (accession number: Ames 21814), B73 × Ames 21814 and B73. Scale bar, 35 cm. b, Teosinte haplotype genome assembly flow chart. To perform a de novo assembly of the teosinte haplotype, we sequenced and assembled its haplotype by integrating three technologies: HiFi long reads with the PacBio Sequel platform, paired-end sequencing with the Illumina HiSeq platform, and high-throughput chromatin conformation capture (Hi-C).
We completed assembly of the teosinte haplotype using the trio-binning strategy because of the high heterozygosity of Ames 21814. c, Whole-genome Hi-C interaction heat map in 2.5-Mb windows. Each blue number indicates the corresponding chromosome.

Extended Data
Each cluster represents a chromosome in the haplotype. In a set of chromosomes, the bottom cluster represents the hap1 (teosinte Ames 21814) chromosome and the top cluster represents the hap2 (B73) chromosome. d, Hi-C contact map of each chromosome, Ames 21814 (Teo). e, Dot plot of B73 genome assembly (hap2, this study) and B73_v5 genome assembly. Alignments shorter than 20 kb were filtered out. f, Dot plot of Ames 21814 haplotype (hap1) and B73 haplotype (hap2). Alignments shorter than 20 kb were filtered out. g, Haplotype-specific inversions supported by the Hi-C contact map. Fourteen inversions larger than 1 Mb were selected for Hi-C zoom-in inspection by excluding those caused by tandem repeat arrays (CentC or knob). Thirteen inversions were verified, whereas the Chr1: 235 Mb inversion reflects a mis-scaffolded contig.
Fig. 8 | NILB73 and NILTeo under normal and low-nitrogen conditions. a-c, Construction of four aboveground concrete containers with plastic film covering the containers. d, NILB73 and NILTeo grown in container without nitrogen fertilizer application. e, The nitrogen content of soil in containers with and without nitrogen application. Data are mean ± s.d. (n = 16 biologically independent samples). f, Plant phenotypes of NILB73 and NILTeo with and without nitrogen application. Scale bar, 20 cm. g, Root phenotypes of NILB73 and NILTeo with and without nitrogen application. Scale bar, 5 cm. h, Plant height of NILTeo and NILB73. Data are mean ± s.d. (n = 8 biologically independent samples). i, qRT-PCR analysis of THP9 expression in NILB73 and NILTeo roots with and without nitrogen application. Data are mean ± s.d. (n = 3 biologically independent samples). j, Root fresh weight of NILB73 and NILTeo with and without nitrogen application. Data are mean ± s.d. (n = 8 biologically independent samples). k, Aboveground biomass of NILB73 and NILTeo with and without nitrogen application. Data are mean ± s.d. (n = 14 biologically independent samples). l, Total nitrogen content in NILB73 and NILTeo roots with and without nitrogen application. Data are mean ± s.d. (n = 6 biologically independent samples). m, Total nitrogen content in NILB73 and NILTeo stems with and without nitrogen application. Data are mean ± s.d. (n = 10 biologically independent samples). n, Total nitrogen content in NILB73 and NILTeo leaves with and without nitrogen application. Data are mean ± s.d. (n = 6 biologically independent samples). o, Protein content in NILB73 and NILTeo seeds with and without nitrogen application. Data are mean ± s.d. (n = 20 biologically independent samples). In e and h-o, letters indicate significant differences (P < 0.01, one-way ANOVA followed by Tukey's test).
biologically independent samples). g, Total nitrogen content in WT and THP9-OE2 roots with and without nitrogen application. Data are mean ± s.d. (n = 12 biologically independent samples). h, Total nitrogen content in WT and THP9-OE2 stems with and without nitrogen application. Data are mean ± s.d. (n = 12 biologically independent samples). i, Total nitrogen content in WT and THP9-OE2 leaves with and without nitrogen application. Data are mean ± s.d. (n = 12 biologically independent samples). j, Protein content in WT and THP9-OE2 seeds with and without nitrogen application. Data are mean ± s.d. (n = 20 biologically independent samples). In c-g, letters indicate significant differences (P < 0.01, one-way ANOVA followed by Tukey's test).