R3-MYB repressor Mybr97 is a candidate gene associated with the Anthocyanin3 locus and enhanced anthocyanin accumulation in maize

Anthocyanin3 inhibits the anthocyanin and monolignol pathways in maize. Transposon-tagging, RNA-sequencing, and GST-pulldown assays indicate that Anthocyanin3 may be the R3-MYB repressor gene Mybr97. Anthocyanins are colorful molecules receiving recent attention due to their numerous health benefits and applications as natural colorants and nutraceuticals. Purple corn is being investigated as a more economical source of anthocyanins. Anthocyanin3 (A3) is a known recessive intensifier of anthocyanin pigmentation in maize. In this study, anthocyanin content was elevated 100-fold in recessive a3 plants. Two approaches were used to discover candidates involved with the a3 intense purple plant phenotype. First, a large-scale transposon-tagging population was created with a Dissociation (Ds) insertion in the nearby Anthocyanin1 gene. A de novo a3-m1::Ds mutant was generated, and the transposon insertion was found to be located in the promoter of Mybr97, which has homology to R3-MYB repressor CAPRICE in Arabidopsis. Second, bulked segregant RNA-sequencing identified expression differences between pools of green A3 plants and purple a3 plants. All characterized anthocyanin biosynthetic genes were upregulated in a3 plants along with several genes of the monolignol pathway. Mybr97 was highly downregulated in a3 plants, suggesting its role as a negative regulator of the anthocyanin pathway. Photosynthesis-related gene expression was reduced in a3 plants through an unknown mechanism. Numerous transcription factors and biosynthetic genes were also upregulated and need further investigation. Mybr97 may inhibit anthocyanin synthesis by associating with basic helix-loop-helix transcription factors like Booster1. Overall, Mybr97 is the most likely candidate gene for the A3 locus. A3 has a profound effect on the maize plant and has many favorable implications for crop protection, human health, and natural colorant production.


Introduction
Anthocyanins are the colorful molecules present in nearly every plant species. These compounds are derived from phenylalanine and are a major branch point in the flavonoid pathway. Anthocyanins provide many valuable functions for the plant and are involved with a multitude of biotic and abiotic stress responses, including UV protection, herbivory defense, and reactive oxygen species (ROS) scavenging (Hatier and Gould 2008;Lev-Yadun and Gould 2008). Anthocyanins have many important human applications as well. Purple corn (Zea mays L.) contains anthocyanins and other antioxidant compounds that make it beneficial as a potential nutraceutical (Lao et al. 2017). Anthocyanins are well-known health-promoting compounds and have been shown to reduce disease biomarkers associated with cancer, cardiovascular disease, diabetes, inflammation, and obesity (He and Giusti 2010). Increasing dietary consumption of purple corn has a potential positive effect on human health. Given this, investigations into the regulation of anthocyanin biosynthesis in maize are justified.
Anthocyanin synthesis in all plant species is regulated by a core set of transcription factors collectively referred to as the MBW complex for the MYB, bHLH, and WD-repeat proteins that must physically interact for anthocyanin-related genes to be transcribed (Lloyd et al. 2017). In maize, the MBW complex is made of multiallelic gene families that coordinate the temporal and spatial expression of anthocyanin synthesis. In aleurone-pigmented tissue, MYB Colored aleurone1 (C1), bHLH Colored1 (R1), and WD-repeat Pale aleurone color1 (Pac1) predominantly activate pigment synthesis. In mature photosynthetic tissue, anthocyanin synthesis is regulated by MYB Purple plant1 (Pl1), bHLH Booster1 (B1), and an unknown WD-repeat protein (Coe et al. 1988; Ludwig and Wessler 1990; Carey et al. 2004; Cone 2007).
Communicated by Mingliang Xu.
In addition to core regulators, two classic genes, Anthocyanin3 (A3) and Intensifier1 (In1), are known to enhance the anthocyanin pathway when recessive. Intensifier1 has homology to a bHLH transcription factor and inhibits pigment synthesis in the aleurone. It is hypothesized that In1 physically interacts with R1 protein to inhibit pigment synthesis in the aleurone (Burr et al. 1996). The other negative regulator, A3, has not been well characterized, although the locus has been known for decades (Lindstrom 1934). The a3 phenotype is especially pronounced when a weak allele of B1 such as B1-b is present along with Pl1. Plants with this genetic makeup and a dominant A3 allele are normally green, but begin developing a pale purple color as they reach maturity and senesce. In contrast, a3 plants are intensely purple in leaf sheaths beginning after the ninth vegetative leaf stage (V9) and into the reproductive phase (Fig. 1) (Styles and Coe 1986).
To date, the gene responsible for the a3 phenotype has not been characterized. It is known that the A3 locus lies on Chromosome 3 in close proximity to Anthocyanin1 (A1) (Beckett et al. 1973), but there have been some discrepancies as to where in this segment of the genome the a3 locus lies. One study placed A3 about 4.4 cM upstream of A1, and this placement is also reflected in the Maize Genetics and Genomics Database (Styles and Coe 1986; Andorf et al. 2010). Another study linked intense leaf sheath pigmentation to A3 using RFLP markers umc63 and csu25a, which are both downstream of A1 (Lauter et al. 2004). The most recent calculation found that A3 is 7.7 cM downstream of A1 (Robinett et al. 1995). In this work, two approaches were used to discover candidate genes associated with the a3 phenotype and to characterize the effect of this negative regulator on the anthocyanin pathway in maize.

Plant materials
Genetic stocks were chosen for this study that differed in their anthocyanin pathway genotypes. Genotypes are curated and available using the Locus Lookup Tool in the Maize Genetics and Genomics Database (Andorf et al. 2010). The reference allele donor for a3, referred to as a3-ref, was in genetic stock 320 N and was provided by the Maize Genetics Cooperation Stock Center. This stock contains the weak allele of B1, designated B1-b; a dominant Pl1 allele; and a null r1 allele. B73 is the maize reference genome and contains a dominant A3 allele with recessive b1 and pl1 loci. The W22 inbred used in this study is the "color-converted" version with aleurone pigmentation that is commonly used for genetic studies. This inbred contains a pigmented aleurone due to dominant R1 and C1 alleles, but is in a recurrent W22 background. W22 also has a dominant B1-b allele, a recessive pl1 allele, and a dominant A3 allele. The active transposon stock used was in a W22 background and contains the Ac-im with transposase activity and the non-autonomous Ds element in a1 (a1-m3::Ds) (Conrad and Brutnell 2005). This genetic stock referred to as W22 Ac-im;a1-m3::Ds was kindly donated by Kevin Ahern. To create the transposon-tagging population, reciprocal crosses of W22 Ac-im;a1-m3::Ds and 320 N were made in the summer of 2018 at the University of Illinois Vegetable Crops Farm in Urbana, IL, USA. Progeny of the cross was grown in the same location the following year and monitored for the a3 phenotype. Predicted de novo mutants were crossed to 320 N and selfed. To create the bulked segregant RNA-seq population, 320 N was crossed to B73. In 2019, all F2 plants with the a3 phenotype and plants that appeared to have weak plant pigmentation due to B1-b and Pl1 (Fig. 1) were selfed. Select F2:3 families were grown in 2020 for bulked segregant RNA-seq analysis.

Fig. 1 (partial caption) ... Pl1 alleles conferring intensely purple leaf sheaths, husks, and tassels. C Mature 320 N × W22 Ac-im;a1-m3::Ds dominant A3 ear expressing the B1-b weak pigmentation phenotype in the husk. D 320 N × W22 dominant A3 expressing the B1-b weak pigmentation phenotype at the silking stage. E Leaf sheath of a 320 N × W22 Ac-im;a3-m1::Ds plant at silking demonstrating the a3 phenotype. F 320 N genetic stock demonstrating the a3 intense purple phenotype at the silking stage. Photo Credit A-C UI Public Affairs: L. Brian Stauffer

Nucleic acid extractions
DNA extractions for all plant tissues described below followed a CTAB procedure (Doyle and Doyle 1990) with modifications described in Supplementary Materials and Methods. Tissue for RNA extraction was flash-frozen in liquid nitrogen during collection, freeze-dried, ground into a fine powder, and stored at -20 ˚C before extraction. RNA extraction was performed using the RNeasy Plant Mini Kit (QIAGEN) according to the manufacturer's instructions. DNA was depleted during RNA extraction on the column using DNase (QIAGEN). RNA quality was assessed using a NanoDrop 2000 (Thermo Fisher Scientific, LLC) and a bleach gel (Aranda et al. 2012) before submission for sequencing.

Inverse PCR of transposon-flanking sites
Inverse PCR of transposon-flanking sites followed the general protocol previously described (Ahern et al. 2009) with modifications described in Supplementary Materials and Methods. Briefly, in this study, genomic DNA was circularized by cutting with PstI and ligating sticky ends. Primers LC18 and LC24 (Table S1) were used in the first round of PCR using the Platinum SuperFi DNA Polymerase (Invitrogen) kit. The second round of nested PCR (for stringent Ds selectivity) used primers LC45 and JGp2 (Table S1). Transposon-flanking fragments were cloned using the PCR Cloning Kit (New England Biolabs) according to the manufacturer's instructions. Purified plasmids were sequenced at the DNA Services laboratory of the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign using the LC45 primer (Table S1).

Molecular marker analysis
The most recent mapping work determined that the a3 locus was 7.7 cM downstream of A1 (Robinett et al. 1995). For this reason, SSR markers downstream from A1 were chosen from the Maize Genetics and Genomics Database to assess genotypes for a3-ref (Andorf et al. 2010). Marker umc2008, which is 3.65 Mb from A1 in B73, was ultimately chosen based on efficient gel separation. Sequences for umc2008 primers are provided in Table S1. The marker was amplified using Taq 5× Master Mix (New England Biolabs). The reaction included 1.2 M betaine and 600 nM primers. The PCR conditions for umc2008 were as follows: denaturation at 94 ˚C for 1 min, followed by 35 cycles of 94 ˚C for 30 s, 53 ˚C for 30 s, and 72 ˚C for 30 s, ending with a 3 min extension at 72 ˚C. Reactions were separated on a 4% SFR (VWR) agarose gel in TBE buffer for 45 min at 5.4 V cm⁻¹ and scored visually based on parent band size.

RNA-seq library preparation
Sampling for bulked segregant RNA-seq analysis followed a procedure similar to one previously described (Chayut et al. 2015). In select F2:3 families from the B73 × 320 N population, three individual plants were designated as biological replicates in either the recessive (purple) or dominant (green) A3 pool and genotyped with umc2008 to validate the phenotypes observed. Some families were segregating for the a3 phenotype and had representation from both recessive and dominant A3 pools. Plants with the pale purple B1-b phenotype were chosen for sampling because the lack of dominant B1 or Pl1 alleles would result in a normal green plant regardless of the A3 allele present. Mature husk tissue (greater than 20 DAP) was flash-frozen and pooled with the respective biological replicates. In total, there were 6 samples: 3 biological replicates of 31 dominant A3 families and 3 biological replicates of 31 recessive a3 families. RNA-seq was performed as described in Supplementary Materials and Methods at the DNA Services laboratory of the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign. Raw data from RNA-seq have been deposited into the Sequence Read Archive (SRA) database under the project identity PRJNA764765.

Bulked segregant RNA-seq analysis
Raw adapter-trimmed transcriptome data were trimmed and quality-filtered using Fastp (Chen et al. 2018). First, ten nucleotides from the beginning and three nucleotides from the end of each read were trimmed. Then, reads with a quality score below 30 or a length of less than 50 nt were removed. Quality-filtered reads were then aligned to the B73 RefGen_v4 genome using STAR 2.7.6 (Dobin and Gingeras 2015; Jiao et al. 2017). Gene counts were calculated with featureCounts (Liao et al. 2014). The summary of the read counts from the RNA-seq filter pipeline is given in Table S2. Differential gene expression was calculated in EdgeR (Robinson et al. 2010) using gene counts normalized to library size. A curated list of differentially expressed genes is given in Table S3. The full normalized dataset with gene annotations and differential gene expression calculations is available in Table S4. Genes with an FDR-adjusted p-value less than 0.01 and an absolute log2-fold change greater than 1.2 were considered significant. GO term enrichment was calculated using AgriGO v2 (Tian et al. 2017). To call SNPs from the RNA-seq libraries, STAR 2.7.6 two-pass mode (Dobin and Gingeras 2015) was enabled, and the resulting BAM files were passed through SAMtools 1.11 (https://github.com/samtools/samtools/releases/1.11) and BCFtools 1.9 (https://github.com/samtools/bcftools/releases/tag/1.9) (Danecek et al. 2021). Genotypes were called using the multiallelic calling model using all compiled data from each biological replicate (Danecek et al. 2014). Only biallelic SNPs with a final depth greater than or equal to 150 and an average quality greater than or equal to 30 were kept in the analysis. A custom script was used to accumulate the depth of each allele (either reference or alternate) at each position. Two measures were used to determine differences in heterozygosity between recessive and dominant A3 pools. 
The first measurement, ΔSNP, took the difference of scored genotypes (0 = B73 allele, 1 = 320 N allele, and 0.5 = heterozygote) for dominant vs. recessive pools. Hypothetically, the ΔSNP value in the window containing the A3 locus would be close to 1, since alleles should be fixed in their respective pools. The second measurement, Fst, was calculated from the allelic depth of each SNP; a value of 0.25 is considered a sufficient threshold for differences (Hartl and Clark 1997). Both values were smoothed in a window of 50 SNPs with a step of 10 SNPs to alleviate noise from the expression dataset.
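The two pool-comparison measures and the smoothing step can be illustrated with a minimal Python sketch. This is not the custom script used in the study. The sign convention for ΔSNP (recessive minus dominant) and the Fst estimator (Fst = (Ht − Hs) / Ht, computed from alternate-allele frequencies derived from allelic depth) are assumptions chosen for illustration, and all function names are hypothetical.

```python
from typing import List, Sequence


def delta_snp(recessive: Sequence[float], dominant: Sequence[float]) -> List[float]:
    """Per-SNP difference of genotype scores (0 = B73, 0.5 = heterozygote, 1 = 320 N).

    Assumed convention: recessive pool minus dominant pool, so a locus fixed
    for opposite alleles in the two pools scores 1.
    """
    return [r - d for r, d in zip(recessive, dominant)]


def fst(ref_a: int, alt_a: int, ref_b: int, alt_b: int) -> float:
    """Fst = (Ht - Hs) / Ht from allelic depths in two pools (assumed estimator).

    Assumes each pool has non-zero total depth at the SNP.
    """
    p_a = alt_a / (ref_a + alt_a)  # alternate-allele frequency, pool A
    p_b = alt_b / (ref_b + alt_b)  # alternate-allele frequency, pool B
    h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2  # mean within-pool heterozygosity
    p_bar = (p_a + p_b) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # total heterozygosity across pools
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def sliding_mean(values: Sequence[float], window: int = 50, step: int = 10) -> List[float]:
    """Mean of each full window of `window` SNPs, advancing `step` SNPs at a time."""
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, step)
    ]


if __name__ == "__main__":
    # Pools fixed for opposite alleles give Fst = 1; identical pools give Fst = 0.
    print(fst(100, 0, 0, 100), fst(50, 50, 50, 50))
    # A run of 60 fully differentiated SNPs yields two smoothed windows.
    print(sliding_mean(delta_snp([1.0] * 60, [0.0] * 60)))
```

Under these assumptions, an unlinked SNP with equal allele frequencies in both pools gives Fst = 0, and a SNP fixed for opposite alleles gives Fst = 1 and ΔSNP = 1.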

Quantitative PCR
Various time points of B73 development were sampled to determine patterns of Mybr97 expression. Sampling procedures are explained in Supplementary Materials and Methods. Validation of RNA-seq results was done using the same RNA as was sequenced. Additionally, husk tissues collected from three 320 N and three B73 plants on the same day were compared for gene expression to the RNA-seq population. RNA was converted to cDNA using poly-T primers and the Protoscript M-MuLV First Strand cDNA Synthesis Kit (New England Biolabs). PowerUp SYBR Green Master Mix (Thermo Fisher Scientific, LLC) was used to perform qPCR. Reaction mixes and cycling conditions for qPCR and the melt curve analysis followed the manufacturer's instructions. Expression values from the time point study were normalized to the housekeeping gene Elongation factor 1α, while bulked segregant RNA-seq validation used Elongation factor 1α and β-Tubulin as the controls, as proposed in a previous study (Lin et al. 2014). Genes and primers used for qPCR are given in Table S1. Data analysis was done using QuantStudio 7.0 software (Thermo Fisher Scientific, LLC). Overall, there was high agreement between the qPCR and RNA-seq data. The ΔΔCT (relative change in cycle threshold) and log2-fold change data correlated very highly (ρ = 0.97), indicating that the RNA-seq data are reliable and comparisons are generally accurate.
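As an illustration of how a ΔΔCT value relates to relative expression, the following Python sketch normalizes a target gene to the mean CT of two reference genes and compares a sample to a control. The study used QuantStudio software for this analysis, so the code is only a sketch of the general ΔΔCT approach, assuming 100% amplification efficiency; the function names and CT values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def delta_ct(ct_target: float, ct_refs: Sequence[float]) -> float:
    """Target CT minus the mean CT of the reference genes."""
    return ct_target - mean(ct_refs)


def relative_expression(
    ct_target_sample: float,
    ct_refs_sample: Sequence[float],
    ct_target_control: float,
    ct_refs_control: Sequence[float],
) -> Tuple[float, float]:
    """Return (ddCT, fold change), with fold change = 2 ** (-ddCT).

    Assumes every primer pair amplifies with 100% efficiency.
    """
    ddct = delta_ct(ct_target_sample, ct_refs_sample) - delta_ct(
        ct_target_control, ct_refs_control
    )
    return ddct, 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical CT values: target gene and two reference genes
    # (standing in for Elongation factor 1α and β-Tubulin).
    ddct, fold = relative_expression(30.0, [20.0, 22.0], 24.0, [20.0, 22.0])
    print(ddct, fold)  # a ddCT of 6 corresponds to a 64-fold lower expression
```

Because −ΔΔCT is already on a log2 scale, it can be compared directly with log2-fold change values from RNA-seq.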

Cloning of maize anthocyanin transcription factors
Coding sequences of full B1-b and the B1-Myb-interacting region (B1-MIR) were amplified using cDNA from 320 N husk tissue, and full-length Mybr97 cDNA was amplified using cDNA from B73 husk tissue with the appropriate primer sets (Table S1). The 320 N a3-ref allele of Mybr97 was cloned from genomic DNA using primers MP-A3-2F and MP-A3-2R (Table S1) and introduced into the pMiniT 2.0 vector from the PCR Cloning Kit (New England Biolabs) according to the manufacturer's instructions. All PCR was conducted using the Platinum SuperFi II PCR Master Mix (Thermo Fisher Scientific, LLC) following the manufacturer's recommendations, except for amplification of Mybr97 genomic DNA, where an extension temperature of 65 ˚C was used to account for AT-rich regions (Dhatterwal et al. 2017). Vector pGSTag, which was kindly donated by D. Ron (Addgene plasmid #21877) (Ron and Dressler 1992), was used as the GST fusion vector. Full Mybr97 cDNA was inserted into vector pET-30a(+) (MilliporeSigma), which contains a 6x-His tag. All vectors were cloned into E. coli strain DH5-α for sequence confirmation and the Rosetta-gami2 pLysS (DE3) strain (MilliporeSigma) for protein expression. Sanger sequencing was performed at the DNA Services laboratory of the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign using pGEX primers for pGSTag, T7 primers for pET-30a(+), and the Cloning Analysis primers for pMiniT 2.0 (Table S1).

GST pull-down assay
Full details for culturing maize transcription factors and performing the GST pull-down assay are available in Supplementary Materials and Methods. Briefly, cultures containing Mybr97 and GST alone were induced with 0.5 mM IPTG for 3 h at 37 ˚C and cultures containing B1 and B1-MIR were induced with 0.1 mM IPTG overnight at 18 ˚C. Purified polyhistidine-tagged Mybr97 (approx. 200 µg) was incubated on glutathione resin with purified GST-tagged protein for 1 h at 4 ˚C. Protein was viewed on a 12% Mini-Protean TGX (Bio-Rad Laboratories) gel stained with Coomassie blue (Simpson 2007).

Total anthocyanin and total phenolic assays
Bulked segregant RNA-seq analysis tissue was used for anthocyanin extractions along with three biological replicates of B73 and 320 N husks sampled on the same day. Triplicates of 25 mg were extracted with 1 mL 2% (v/v) formic acid for 1 h at 50 ˚C for each biological replicate. Samples were centrifuged for 10 min, and the supernatant was diluted in 25 mM potassium chloride (pH 1.0) and 0.4 M sodium acetate (pH 4.5) as per the AOAC standard pH differential method (Lee et al. 2005). Anthocyanins were quantified as cyanidin 3-glucoside equivalents using the molar extinction coefficient of 26,900 L mol⁻¹ cm⁻¹ (Lee et al. 2005). Total phenolics were extracted in triplicate from 25 mg of tissue from each biological replicate in 1 mL 95% ethanol for 1 h at 50 ˚C and then diluted five- or tenfold in water. Total phenolics were quantified using a method previously described (Singleton and Rossi 1965), but scaled down tenfold to be adapted for a 96-well plate. In this procedure, 20 µL of diluted phenolic sample was mixed with 100 µL 0.2 N Folin-Ciocalteu's reagent and incubated for 5 to 6 min before the addition of 80 µL 7.5% (w/v) sodium carbonate. The plates were incubated for at least one hour in the dark, and the absorbance was read at 765 nm. Total phenolic samples were quantified in gallic acid equivalents (GAEs) using a standard curve of 31.25 to 500 µg mL⁻¹ on each plate. Significant differences among varieties were calculated in GraphPad Prism 9.1.2 using a t-test.
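The pH differential calculation can be written out as a short Python sketch. It follows the published form of the AOAC method (Lee et al. 2005): the absorbance difference A = (A520 − A700) at pH 1.0 minus (A520 − A700) at pH 4.5 is converted to cyanidin 3-glucoside equivalents using a molecular weight of 449.2 g/mol and the extinction coefficient of 26,900. The absorbance readings, dilution factor, and function names below are hypothetical, and the 1 cm path length is an assumption that would need adjusting for plate-based readings.

```python
C3G_MOLECULAR_WEIGHT = 449.2   # g/mol, cyanidin 3-glucoside
C3G_EXTINCTION = 26900.0       # L mol^-1 cm^-1


def anthocyanin_mg_per_l(
    a520_ph1: float,
    a700_ph1: float,
    a520_ph45: float,
    a700_ph45: float,
    dilution_factor: float,
    path_length_cm: float = 1.0,  # assumed cuvette path length
) -> float:
    """Monomeric anthocyanin in mg/L cyanidin 3-glucoside equivalents."""
    absorbance = (a520_ph1 - a700_ph1) - (a520_ph45 - a700_ph45)
    return (
        absorbance * C3G_MOLECULAR_WEIGHT * dilution_factor * 1000.0
        / (C3G_EXTINCTION * path_length_cm)
    )


def mg_per_g_tissue(conc_mg_per_l: float, extract_volume_ml: float, tissue_mg: float) -> float:
    """Convert an extract concentration to mg anthocyanin per g of tissue."""
    extract_volume_l = extract_volume_ml / 1000.0
    tissue_g = tissue_mg / 1000.0
    return conc_mg_per_l * extract_volume_l / tissue_g


if __name__ == "__main__":
    # Hypothetical readings for one replicate: 25 mg tissue extracted in 1 mL.
    conc = anthocyanin_mg_per_l(0.8, 0.0, 0.2, 0.0, dilution_factor=10)
    print(round(conc, 2), round(mg_per_g_tissue(conc, 1.0, 25.0), 2))
```

The same per-gram conversion applies to total phenolics once a gallic acid standard curve has converted absorbance at 765 nm to a concentration.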

Anthocyanin and total phenolic content were upregulated in plants with the a3 phenotype
An additional population segregating for a3 alleles was made by crossing B73, the maize reference genome, to genetic stock 320 N. B73 is a normal type green plant (Fig. 1) with recessive b1 and pl1 alleles and a dominant A3 allele. B73 produced undetectable levels of anthocyanins in mature husks, but contained around 7.2 mg total phenolics expressed in gallic acid equivalents (GAE) per g husk tissue (Fig. 2). In the segregating B73 × 320 N F2:3 population, three plants per family were designated as biological replicates and pooled with their respective phenotypes. This resulted in 31 green or purple individuals pooled per biological replicate. Anthocyanin accumulation was markedly reduced with a dominant A3 allele. 320 N and pools of recessive a3 husks expressed between 50- and 100-fold more anthocyanins than A3 plants (Fig. 2). A single extraction yielded 4% pigment on a dry weight basis for 320 N (Fig. 2). Total phenolic content increased approximately three- to sixfold with a recessive a3. The dominant A3 pool accumulated more total phenolics than B73 (8.30 ± 0.17 vs. 7.17 ± 0.10 GAE ± standard error), presumably due to the activation of the anthocyanin pathway by B1-b.

A transposon insertion in the Mybr97 promoter confers the a3 phenotype
Despite the uncertainty in the exact location of the a3 locus, transposon tagging could be employed utilizing the Activator-immobile (Ac-im)/Dissociation (Ds) system. In this system, Ac-im contains an active transposase, but cannot excise itself. Ds is a non-autonomous element that can be mobilized in the presence of Ac-im. A genetic stock in a W22 inbred background was developed that has Ac-im and a mobile Ds element in anthocyanin biosynthetic gene A1 and is designated W22 Ac-im;a1-m3::Ds (Conrad and Brutnell 2005). The proximity of the A3 locus to A1 provides a chance that the Ds element may jump from A1 to the A3 locus. The W22 background means that the genetic stock has a B1-b locus and a recessive pl1, so the plant has the normal green phenotype. The active transposon stock was crossed to genetic stock 320 N, the a3-reference (a3-ref) donor consisting of dominant B1-b and Pl1 loci. 320 N exhibits the recessive a3 phenotype and becomes intensely purple in the leaf sheaths, husks, and tassels after V9. Crossing W22 Ac-im;a1-m3::Ds to 320 N a3 typically results in a normal green plant that becomes pale purple around senescence due to the B1-b/Pl1 locus combination and a functional A3 from W22. Approximately 25,000 plants from this cross were screened for a de novo a3 phenotype. One plant out of the population contained intensely purple leaf sheaths, husks, and tassels consistent with the a3 phenotype. DNA was extracted from three different leaf tissues from this plant to be used for inverse PCR of the transposon insertion site. In all three replicates, the sequence flanking the Ds element pinpointed the insertion to a G-box element in the promoter region of Mybr97. To test whether the insertion was germinal, the W22 × 320 N Ac-im;a3-m1::Ds plant was testcrossed to 320 N, resulting in 40 purple plants with the a3 phenotype. This indicates that the de novo insertion is sufficient to confer the a3 phenotype.

Phylogenetic analysis of Mybr97 reveals homology to anthocyanin R3-MYB repressors
Mybr97 is an R3-MYB-like gene that has homology to known MYB repressors. The closest homology (60.71% , which is involved with trichome regulation (Wester et al. 2009). Trichome and anthocyanin regulation are tightly linked in Arabidopsis as they share overlapping proteins from the MBW complex (Walker et al. 1999). The Mybr97 protein is annotated in NCBI (Accession ONM40397.1) as AtCAPRICE (AtCPC), a known negative regulator of anthocyanin synthesis. R3-MYB proteins act as anthocyanin repressors by competitively binding to bHLH members of the MBW complex (Zhu et al. 2009). These proteins have been found to contain a motif required for binding to R1/B1 bHLH proteins. The conserved motif [DE]Lx2[RK]x3Lx6Lx3R (Zimmermann et al. 2004) was present in Mybr97, except for a substitution of isoleucine for the last leucine (Figure S1). The functional similarity of isoleucine and leucine indicates Mybr97 should be capable of binding to R1 or B1 and is a likely candidate gene responsible for the a3 phenotype.
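The bHLH-binding motif [DE]Lx2[RK]x3Lx6Lx3R translates directly into a regular expression, which makes the leucine-to-isoleucine relaxation easy to express. The Python sketch below is an illustration only: the test peptides are synthetic placeholders built to fit the pattern, not the Mybr97 sequence, and the names are hypothetical.

```python
import re
from typing import Optional

# Strict motif from Zimmermann et al. 2004: [DE]Lx2[RK]x3Lx6Lx3R
STRICT_MOTIF = re.compile(r"[DE]L.{2}[RK].{3}L.{6}L.{3}R")
# Relaxed variant allowing isoleucine at the last leucine position
RELAXED_MOTIF = re.compile(r"[DE]L.{2}[RK].{3}L.{6}[LI].{3}R")


def find_motif(protein: str, relaxed: bool = False) -> Optional[int]:
    """Return the 0-based start of the first motif match, or None if absent."""
    pattern = RELAXED_MOTIF if relaxed else STRICT_MOTIF
    match = pattern.search(protein.upper())
    return match.start() if match else None


if __name__ == "__main__":
    # Synthetic placeholder peptides (NOT real Mybr97 sequence).
    canonical = "MMELAARAAALAAAAAALAAARGG"
    ile_variant = "MMELAARAAALAAAAAAIAAARGG"
    print(find_motif(canonical))                  # strict pattern matches
    print(find_motif(ile_variant))                # strict pattern fails
    print(find_motif(ile_variant, relaxed=True))  # relaxed pattern matches
```

A sequence that fails the strict pattern but passes the relaxed one corresponds to the situation described for Mybr97, where only the final leucine is replaced by isoleucine.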

Mybr97 can bind to a maize bHLH member of the MBW complex in vitro
As mentioned above, the most probable mechanism for inhibition of the anthocyanin pathway is through the sequestration of the MBW complex, based on the structure of Mybr97. To determine the interaction of Mybr97 protein with bHLH members from maize, a GST pull-down assay was performed. Anthocyanin bHLH allele B1-b and the MYB-interacting region (MIR) of this gene (designated B1-MIR) were introduced into a GST fusion vector. Results indicate that Mybr97 can bind directly to B1-MIR, but not the GST vector alone (Fig. 3). No interaction was detected with full-length B1. The interaction with B1-MIR suggests Mybr97 is a competitive inhibitor of the MBW complex and gives further evidence the a3 locus is associated with Mybr97.
GST Glutathione S-transferase, MIR MYB-interacting region

Bulked segregant RNA-seq narrows down candidate genes for the a3 phenotype
RNA-seq was performed on the dominant and recessive a3 pools resulting in an average of 64.9 million paired end reads per sample after filtering and aligning (Table S2). Differential gene expression analysis found 1794 differentially expressed genes with a log2-fold change of 1.2 or greater and an FDR-adjusted p-value of 0.01 or less. Transcriptomic data were also used to call SNPs resulting in 78,815 SNPs with coverage greater than 150 and phred quality greater than 30. The average distance between SNPs was 11.6 kb. Genotypes were coded as 0 if they contained the B73 reference allele, 1 if they had the 320 N a3-ref allele, or 0.5 if they were heterozygous. The difference between the coded genotypes in recessive and dominant pools is the ΔSNP value. The expectation is that alleles not involved with the a3 phenotype will be in a 1:1 ratio. The smoothed ΔSNP value remained close to 0 across most of the genome except on Chromosome 3 where the segment between 219.8 and 225.0 Mb was greater than 0.9 (Fig. 4). The fixation index (Fst) is another calculation to determine whether an interval departs from the expected heterozygosity. Heterozygosity within and heterozygosity between are assumed to be 0.5, so the Fst should theoretically be 0 for a SNP not involved with A3. Using this index, a region on Chromosome 3 from 219.3 to 225.5 Mb had a smoothed Fst greater than 0.25 (Fig. 4). A majority of the significant intervals in this analysis place the A3 locus downstream of A1 in accordance with previous studies (Robinett et al. 1995; Lauter et al. 2004). Within the conservatively larger significance interval for Fst, there were 16 transcription factors and a total of 183 expressed genes (Table S3). One of the genes within the interval was A1 itself, indicating that this interval is overly conservative. Within the ΔSNP interval, a total of 32 genes were differentially regulated with 16 of those being downregulated in dominant vs. recessive a3 samples (Table S3). Transcription factors downstream of A1 that were significantly downregulated in recessive a3 pools include bHLH25 and Mybr97. Mybr97 was the most downregulated gene in the interval (log2-fold change −6.52 vs. −1.67) and the fifth most downregulated gene in the entire dataset (Table S4). Altogether, the transcriptomic results, together with the other experiments, point to Mybr97 as the likely gene responsible for the a3 phenotype.

Fig. 4 Bulked segregant RNA-seq analysis results. A The ΔSNP calculation was the difference between scored genotypes where 0 equals the reference B73 allele, 1 equals the 320 N allele, and a score of 0.5 was heterozygous. Theoretically the value will approach 1 as a locus becomes fixed in the population. B The Fst index is a measure of the heterozygosity within pools compared to total heterozygosity across pools. A common threshold for this value is 0.25 for populations becoming fixed for an allele (Hartl & Clark 1997). Each measure was smoothed within a window of 50 SNPs and step size of 10 SNPs
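The significance criteria applied to the differential expression results can be expressed as a simple filter. The Python sketch below assumes a table of per-gene results with a log2-fold change and an FDR-adjusted p-value, and uses the strict inequalities stated in the methods (FDR below 0.01, absolute log2-fold change above 1.2). The gene names and values are hypothetical placeholders, not data from Table S4.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def significant_genes(
    results: List[Record],
    fdr_cutoff: float = 0.01,
    lfc_cutoff: float = 1.2,
) -> List[str]:
    """Return gene names passing both the FDR and absolute log2-fold change cutoffs."""
    return [
        str(r["gene"])
        for r in results
        if float(r["fdr"]) < fdr_cutoff and abs(float(r["log2fc"])) > lfc_cutoff
    ]


if __name__ == "__main__":
    # Hypothetical results table (placeholder genes and values).
    table: List[Record] = [
        {"gene": "gene_a", "log2fc": -6.5, "fdr": 1e-8},  # strongly downregulated
        {"gene": "gene_b", "log2fc": 0.8, "fdr": 1e-5},   # effect too small
        {"gene": "gene_c", "log2fc": 2.4, "fdr": 0.03},   # not significant
        {"gene": "gene_d", "log2fc": 1.9, "fdr": 0.002},  # upregulated
    ]
    print(significant_genes(table))
```

Requiring both criteria keeps genes whose change is large as well as statistically supported, which is why a gene with a small FDR but a modest fold change is excluded.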

Transcript of Mybr97 is abundant in leaf sheaths and husks
Quantitative PCR (qPCR) of Mybr97 across various developmental time points in B73 found that the gene is expressed in every tissue analyzed (Fig. 5). At the fifth vegetative leaf stage (V5), B73 and 320 N plants are both green. When the plant matures to V9, 320 N begins to flush with color until it is entirely purple except for the leaf blades (Fig. 1), while B73 remains green. The pattern of Mybr97 expression across development is consistent with the a3 phenotype. Mybr97 is more abundant in mature photosynthetic tissues (Fig. 5), especially in husks at the silking stage (R1) followed by leaf sheath tissue at V9 and then V5. Transcript was detected in roots at V1 and in pericarp 10 to 20 days after pollination (DAP), but the amount was minimal compared to mature leaf tissues. Interestingly, the expression of Mybr97 exceeded that of the housekeeping gene Elongation factor 1α for maturing leaf sheath and husk tissue.

Transcriptomic analysis reveals the entire anthocyanin biosynthetic pathway was affected by the A3 locus
All canonical biosynthetic genes associated with anthocyanin production in maize were significantly upregulated in recessive a3 plants in the RNA-seq dataset and several were validated with qPCR (Table 1 and Fig. 5). The full anthocyanin pathway with canonical and non-canonical genes is shown in Fig. 6. Maize is capable of synthesizing peonidin, albeit in a typically low amount (Paulsmeyer et al. 2017), but the anthocyanin O-methyltransferase (AOMT) that creates peonidin has not been characterized in maize. Only three methyltransferases were upregulated in recessive a3 plants. One gene, designated O-methyltransferase4 (Omt4), resides on chromosome 4 within a QTL for peonidin content from another study (Chatham & Juvik 2021). Three genes associated with chalcone synthase, the first committed step of anthocyanin production, were upregulated in recessive a3 plants. It is known that Colorless2 (C2) is the major chalcone synthase gene in most tissues outside pollen (Franken et al. 1991), but in husks it appears White pollen1 (Whp1) and the uncharacterized gene Chls2 may also be involved.
The anthocyanin pathway shares precursors with the phenylpropanoid pathway, which is the larger superfamily of molecules that includes monolignols and other phenolic acids (Deng and Lu 2017). Recessive a3 plants produced three- to fivefold more phenolics in mature husks than the dominant A3 green pools (Fig. 2). The first step in the anthocyanin pathway utilizes p-coumaroyl-CoA, which is synthesized from phenylalanine by a series of enzymes consisting of phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumaroyl CoA-Ligase (4CL), in order (Boerjan et al. 2003). In husk tissue, Pal1, Pal2, and Pal6 were all upregulated in recessive a3 plants along with the C4H and 4CL biosynthetic steps. Maysin biosynthetic genes Salmon silks2 (Sm2) and Cgt1 were significantly upregulated in a3 recessive husks, but not Sm1. Pericarp1 (P1) canonically activates the maysin pathway in silks (Casas et al. 2016), but was not expressed in husk tissue in this population (Table S4). Maysin was not assayed in husks, but the activation of genes involved with its synthesis was unexpected.

Numerous regulatory genes are impacted by A3
Many transcription factors with predicted roles in anthocyanin regulation were differentially regulated in recessive a3 husks. A complete list of transcription factors significantly differentially regulated is included in Table S3. The central bHLH member expressed in husks, B1, was significantly upregulated in recessive a3 plants (Fig. 5). Unexpectedly, transcript for Mybr97 could be detected in two of the recessive a3 pools, albeit at a very low level (Table S4). Anthocyanin-related transcription factors typically only associated with aleurone pigmentation (C1, In1, Pac1, and R1) were also expressed in husks (Table S4). Of those, In1 was significantly upregulated in a3 recessive husks (Fig. 5). Many transcription factors upregulated in recessive a3 plants were novel genes not known to be associated with anthocyanin synthesis. Nactf25 (Zm00001d023294) and Nactf44 (Zm00001d028999) are Nac transcription factors with homology to a peach (Prunus persica) transcription factor that confers red pigmentation to peach flesh (Zhou et al. 2015). Wrky33 (Zm00001d024323) is a transcriptional regulator with homology to Arabidopsis TRANSPARENT TESTA GLABRA2, which affects late steps in the anthocyanin biosynthetic pathway (Gonzalez et al. 2016).

Relative expression values of differentially expressed genes determined by qPCR. All values are relative to the dominant A3 pool from the bulked segregant RNA-seq population and standardized to Elongation factor 1α and β-Tubulin. Error bars represent standard errors

Photosynthesis-related gene expression is reduced in recessive a3 plants
Numerous photosynthesis-related genes were differentially regulated in recessive a3 plants (Table S3). Many of these genes were master regulators of photosynthesis. One such regulator was Elongated hypocotyl5 (Hy5), which was downregulated in recessive a3 plants. Another transcription factor known to promote chloroplast development specifically in mesophyll tissue, Golden2-like1 (Glk1), was also downregulated in recessive a3 plants (Wang et al. 2013). Sigma factors are nuclear-encoded RNA polymerase subunits that activate synthesis of plastid- and chloroplast-related genes (Beardslee et al. 2002). In recessive a3 plants, genes Sig1a, Sig2b, and Sig8 were all downregulated, indicating that transcription of photosynthesis-related genes was reduced. In addition to regulatory genes, multiple subunits from Photosystems I and II, carbonic anhydrase, PEP-carboxykinase, PEP-carboxylase, pyruvate orthophosphate dikinase, NADP-malic enzyme, and rubisco activase were downregulated. Reduced A3 expression seems to be associated with decreased expression of many photosynthesis-related genes.

A3 is involved with a range of biological processes related to stress response
Gene ontology (GO) term enrichment for the 1794 differentially expressed genes found that A3 is associated with a range of biological and cellular processes (Table 2). Anthocyanin, flavonoid, and terpenoid pathways were differentially regulated. In addition, photosynthesis-related genes, including genes from both Photosystem I and II, and photosynthetic regulatory genes were differentially regulated. GO term enrichment found associations of A3 with water, high light, osmotic, oxidative, salt, and UV stress. Not only is A3 involved with abiotic stress, but genes involved with responses to bacteria, fungi, herbivores, and wounding were differentially regulated. GO terms for systemic acquired resistance and the related salicylic acid pathway were also enriched in this gene expression dataset.
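The enrichment statistic used for the GO analysis is not stated in this section. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for over-representation analysis. Apart from the 1794 differentially expressed genes reported above, every number in the usage example is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_de, n_de_term):
    """P(X >= n_de_term) under a hypergeometric null.

    n_background : annotated genes in the background set
    n_term       : background genes carrying the GO term
    n_de         : differentially expressed genes (the sample drawn)
    n_de_term    : differentially expressed genes carrying the GO term
    """
    upper = min(n_term, n_de)
    total = comb(n_background, n_de)
    # Sum the upper tail; math.comb returns 0 for impossible draws.
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_de - i)
        for i in range(n_de_term, upper + 1)
    )
    return tail / total


# Hypothetical term: 200 of 40000 background genes annotated,
# 30 of the 1794 differentially expressed genes carry it.
p_value = hypergeom_enrichment_p(40000, 200, 1794, 30)
```

In practice the resulting p-values would be corrected for multiple testing across all GO terms examined.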

Discussion
Mybr97 is the most likely candidate for the gene underlying the A3 locus
Transposon tagging revealed that interrupting the promoter of Mybr97 was sufficient to confer the a3 intense purple phenotype. The structure of Mybr97 was similar to other R3-MYB repressor genes that act on the anthocyanin pathway in other species. Mybr97 contains a motif similar to the R1/B1 gene family binding motif [DE]Lx2[RK]x3Lx6Lx3R (Zimmermann et al. 2004) and was demonstrated to bind to the MIR of B1 in vitro. The pattern of Mybr97 expression follows the pattern of pigment accumulation in a3 recessive plants. Finally, bulked segregant RNA-seq analysis showed that Mybr97 was the most strongly downregulated gene within the a3 locus. Altogether, Mybr97 is the most likely gene involved with the a3 intense purple phenotype in maize.
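The shorthand motif [DE]Lx2[RK]x3Lx6Lx3R, in which xN denotes N arbitrary residues, can be translated into a regular expression and used to scan protein sequences. The sketch below is illustrative: the helper names and the toy sequence are hypothetical, and because Mybr97 is described as carrying a motif similar to (not necessarily identical to) this consensus, an exact-match scan may need to be relaxed for real data.

```python
import re

BHLH_BINDING_MOTIF = "[DE]Lx2[RK]x3Lx6Lx3R"


def motif_to_regex(motif):
    """Expand 'xN' (N arbitrary residues) into the regex '.{N}'."""
    return re.sub(r"x(\d+)", lambda m: ".{" + m.group(1) + "}", motif)


def find_motif(protein, motif=BHLH_BINDING_MOTIF):
    """Return (start, matched_text) for each non-overlapping exact match."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


# Toy sequence built to contain one exact match; not a real protein.
toy_protein = "MK" + "DLAARAAALAAAAAALAAAR" + "GG"
hits = find_motif(toy_protein)
```

Each hit reports the zero-based start position of the match, so the toy sequence yields a single hit beginning at position 2.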

Fine-tuning expression of anthocyanin-related transcription factors increases anthocyanin production
A3 is a strong negative regulator of the anthocyanin pathway in maize. Recessive plants display enhanced pigmentation in leaf sheath, husks, and tassels. Anthocyanin content in the husks is up to 100-fold greater than in plants with a functional A3. It is difficult to compare anthocyanin production across the literature since extraction protocols vary widely. In a nearly exhaustive anthocyanin extraction of purple husks from a breeding program, anthocyanin content reached up to 19% of total weight (Li et al. 2008). The population developed in this current study contains the weak B1-b allele, so pigment production was not optimized. Constitutively expressing B1 or combining naturally strong B1 alleles with a3 may be the most effective route to enhance pigment in husks, leaf sheaths, and tassels. Negative regulators Myb11, Myb31, Myb42, Sro1, and In1 were expressed in husks, which indicates that reducing the expression of these genes might also help increase pigment yield in husks. Myb11, Myb31, and Myb42 regulate lignin content while Sro1 is a competitive inhibitor of Pl1 (Agarwal et al. 2016; Qin et al. 2021; Vélez-Bermúdez et al. 2015). In1 affects aleurone pigmentation and is not known to be associated with husks, so this is novel regulation for this gene (Burr et al. 1996). Additionally, R1, C1, and Pac1 transcripts were detected in husks (Table S3). These genes are also not canonically associated with husk pigmentation, so their function there is unknown. In a previous study, Pac1 transcript was also found in husks, but mutant pac1 plants did not show reduced anthocyanin content in vegetative tissues (Selinger and Chandler 1999; Carey et al. 2004). This finding indicates that a secondary WD-repeat protein is most likely responsible for gene activation in husks. Pac1 homolog Mp1 was expressed in husks (Table S4) and may complement Pac1 in husks.
Increasing anthocyanin content in maize has implications for human health, as these pigments are well-characterized antioxidants and have a range of health-promoting benefits (Lao et al. 2017).

Bulked segregant RNA-seq analysis facilitates the identification of candidate genes
Two methods for pinpointing A3 gene candidates were utilized in this study. Transposon tagging has long been used to discover genes since this system produces stable knockouts for visible phenotypes. In this study, Ac-im, a transposon that encodes transposase but cannot excise itself, was used to mobilize Ds (Conrad and Brutnell 2005). This is important so that the Ds element is the only mobile element causing phenotypic changes. The transposon-tagging population was developed with the expectation that 4.5% of the plants would have heritable excision events (Conrad and Brutnell 2005). In a pilot experiment utilizing this system, it was estimated that 5000 individuals would need to be generated to insert into an average-sized gene 4 cM away (Ahern et al. 2009). Because Ds preferentially inserts into closely linked (4 cM or less) sites (Ahern et al. 2009), it was necessary to produce a large population for a site expected to be 7.7 cM downstream of a1-m3::Ds (Robinett et al. 1995). The readily visible a3 phenotype assisted in finding new insertion events. The second method for narrowing down candidate genes for A3 involved calling SNPs from transcriptomic data. Bulked segregant RNA-seq analysis is an important tool for discovering genes and is beneficial as a form of low-representation sequencing for complex genomes like maize (Liu et al. 2012). Large coverage for SNPs can be found in gene-coding regions. The two thresholds for determining significance in bulked segregant RNA-seq left conservatively wide confidence intervals. Mapping of the A3 locus could have been affected by the inclusion of some A3 dominant individuals in the recessive pools, since the 320N allele of the Mybr97 transcript was found in recessive pools. Despite this, this study demonstrated the effectiveness of bulked segregant RNA-seq analysis in providing a narrow list of candidate genes, which traditional linkage mapping methods are unable to do with the same generation time and population size.
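The core idea of bulked segregant analysis on RNA-seq SNPs can be sketched as a comparison of allele frequencies between the two pools: SNPs linked to the causal locus should show a large frequency difference, while unlinked SNPs should not. The sketch below is a simplification and not the statistical procedure used in this study; the function names, thresholds, and read counts are all hypothetical, and allele-specific expression can bias RNA-based frequencies.

```python
def alt_allele_freq(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate allele at one SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return alt_reads / total


def candidate_linked_snps(snps, min_delta=0.5, min_depth=20):
    """Flag SNPs whose allele frequency differs strongly between pools.

    snps: iterable of (snp_id, (ref, alt) in dominant pool,
                               (ref, alt) in recessive pool)
    min_delta and min_depth are arbitrary illustrative cutoffs.
    """
    flagged = []
    for snp_id, dominant, recessive in snps:
        if sum(dominant) < min_depth or sum(recessive) < min_depth:
            continue  # skip poorly covered SNPs
        delta = alt_allele_freq(*recessive) - alt_allele_freq(*dominant)
        if abs(delta) >= min_delta:
            flagged.append((snp_id, round(delta, 3)))
    return flagged


# Hypothetical read counts for three SNPs.
example_snps = [
    ("snp1", (40, 20), (0, 50)),   # near fixation in the recessive pool
    ("snp2", (30, 30), (25, 25)),  # no difference between pools
    ("snp3", (5, 2), (0, 6)),      # too few reads to evaluate
]
flagged = candidate_linked_snps(example_snps)
```

Plotting the per-SNP frequency difference along each chromosome would then highlight the interval most tightly linked to the phenotype.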

Phylogenetic approaches are important for discovering new anthocyanin-related genes
The anthocyanin pathway is important in numerous plant species and has its roots as far back as the bryophytes (Markham 1988). It is therefore reasonable to hypothesize that regulatory control of the anthocyanin pathway may also have been phylogenetically conserved among angiosperms. Indeed, anthocyanin R3-MYB repressor genes appear to be functionally conserved among angiosperms. R3-MYB repressors have been characterized in Arabidopsis, chrysanthemum (Chrysanthemum morifolium), eggplant (Solanum melongena), gentian (Gentiana trifolia), grape hyacinth (Muscari spp.), Iochroma loxense, lychee (Litchi chinensis), monkeyflower (Mimulus lewisii), petunia (Petunia × hybrida), orchid (Phalaenopsis spp.), poplar (Populus spp.), and tomato (Solanum lycopersicum) (Ma and Constabel 2019; Fu et al. 2019; Xiang et al. 2019; Zhang et al. 2020; Zhao et al. 2022). It is reasonable to conclude that Mybr97, which has homology to these transcriptional regulators, is also involved with repressing anthocyanin synthesis in maize. Utilizing the wealth of knowledge from other plant species could provide clues to additional regulators of the anthocyanin pathway in maize. In this study, NAC and WRKY transcriptional regulators with homology to eudicot anthocyanin regulators were upregulated (Table S2). All the canonical biosynthetic genes involved with anthocyanin production were upregulated, as was expected (Table 1 and Fig. 4). The AOMT in maize has not been characterized to date, but Omt4 appears to be a good candidate based upon its upregulation in purple tissues and homology to other AOMTs. Work is currently underway to isolate this protein to test its specificity.

Marker-assisted selection was effective for the a3 locus
Molecular marker umc2008 is an appropriate marker for A3. The marker itself is only 30.2 kb away from Mybr97 according to the B73 RefGen_v4 reference genome and has high variability in simple repeat number among common varieties tested (data not shown). However, in the W22 v2 genome, the distance is as far as 1.85 Mb from Mybr97 (Springer et al. 2018). Such large-scale structural variation may account for the accidental inclusion of dominant A3 plants in the recessive pools of samples. All a3 recessive plants were checked with umc2008 before sampling. Interestingly, some intensely purple plants contained the B73 marker allele and were excluded. Intense anthocyanin accumulation with a dominant A3 allele indicates that factors other than A3 might be involved with intensifying anthocyanins in this population. More evidence for alternative intensification factors was visible in the pericarps of the segregating population. B73 has yellow kernels and 320N has white kernels. The F1 segregates for yellow and white kernels as expected. However, after selfing, red and purple kernels were common. The source of the cryptic genetic variation shown in this population is not currently understood. It may be due to the interaction of transcription factor alleles from the two parents in the population or to recombination swapping promoter elements.

Increased phenylpropanoid pathway gene expression implies increased plant protection
Not only is the involvement of A3 with the anthocyanin pathway important, but the upregulation of genes in the monolignol pathway also has implications for structural stability; insect, disease, and wounding defense; and forage quality in maize. Three PAL genes, along with C4H and 4CL, were upregulated in recessive a3 plants (Table 1 and Fig. 4).
In maize, there appear to be multiple copies of PAL with various tissue-specific regulation (Guillaumie et al. 2007; Morohashi et al. 2012; Yuan et al. 2019). Pal1, Pal2, and Pal6 were important in husks, which is in agreement with a previous study (Yuan et al. 2019). Increased expression of PAL has also been associated with increased insect and disease defense through the increased production of salicylic acid (Yuan et al. 2019). Production of anthocyanins is very costly for the plant, so it is unsurprising that many modern maize varieties have a dominant A3 and do not produce appreciable amounts of anthocyanins.
Knocking out anthocyanin biosynthetic genes along with A3 might divert vital precursors away from anthocyanin synthesis into, for example, lignin, maysin, or salicylic acid synthesis, where they might be more beneficial for crop protection.

Photosynthesis-related gene expression was affected in recessive a3 plants
The negative association of the a3 phenotype with photosynthesis is concerning in terms of plant productivity. However, if one's goal is to create an intensely purple plant, then A3 is an ideal candidate. The mechanism for reduced photosynthetic capacity in recessive a3 plants is currently unknown.
Anthocyanins have a slight overlap in absorbance spectrum with chlorophyll in the ultraviolet to blue spectrum. However, the maximum absorbance wavelength of anthocyanins is distinct from those of chlorophyll a and b (Zscheile 1934; Chatham et al. 2020). It is possible that the high concentration of anthocyanins in recessive a3 plants may mask some photosynthetically active radiation. If Mybr97 is the gene underlying the a3 locus, reduced photosynthetic gene expression could be the result of associations with other transcription factors that affect photosynthesis and shade sensing. Previous studies have implicated Mybr97 in shade avoidance syndrome. Mybr97 was upregulated in the dark, indicating it may have photosynthetic regulatory functions (Shi et al. 2019; Wang et al. 2016). This is similar to AtCPC and other CPC-like genes in Arabidopsis, which not only affect anthocyanin accumulation but also have a range of effects on trichome development, stomata, and flowering time (Zhu et al. 2009).

Mybr97 is activated independently of the anthocyanin pathway
The activation of Mybr97 in the absence of the MBW complex in B73 and other normal green varieties implies that Mybr97 is involved with a broader activation response mechanism independent of anthocyanins. Mybr97 seems to be involved with many stress response pathways in maize. A previous study also implicated Mybr97 in cold, heat, salt, and UV stress (Makarevitch et al. 2015), just as GO term analysis did here (Table 2). Promoter elements may suggest probable mechanisms for the control of Mybr97. Two G-box elements are in the promoter of Mybr97. These elements are known binding sites for maize anthocyanin bHLH proteins and photosynthetic regulator Hy5 (Ang et al. 1998; Kong et al. 2012). The presence of these G-box elements may indicate the gene is regulated by anthocyanin bHLH proteins like R1 and B1. In fact, in grape hyacinth, the R3-MYB repressor gene is activated by MabBhlh1, which is the bHLH member of the MBW complex (Zhang et al. 2020). Using the PROMO tool, predicted binding sites for light-associated transcription factors Gt1, Phytochrome interacting factor1, and Thioredoxin m1 were found (Xu et al. 2001; Farre 2003; Gao et al. 2015). Hormonal control of A3 might be another plausible mechanism of activation. Abscisic acid-responsive cis-element TACGTG and auxin-responsive elements TGGTTT and TGTCTC are also present in the promoter of A3 (Guilfoyle and Hagen 2007; Song et al. 2018). Future work needs to investigate the role of these binding sites and hormone levels in the regulation of Mybr97.
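A simple way to locate the cis-elements named above in a promoter sequence is an exact-match scan of both strands. The sketch below is illustrative only: the function names and the toy sequence are hypothetical, the G-box core is assumed to be the canonical CACGTG (its sequence is not given in the text), and dedicated tools such as PROMO use position weight matrices, not exact matching.

```python
# Element sequences: the ABRE and AuxRE cores are those named in the text;
# the G-box core CACGTG is an assumption (canonical form).
CIS_ELEMENTS = {
    "G-box": "CACGTG",
    "ABRE": "TACGTG",
    "AuxRE_TGTCTC": "TGTCTC",
    "AuxRE_TGGTTT": "TGGTTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_promoter(promoter, elements=CIS_ELEMENTS):
    """Return sorted (name, strand, start) for every exact match."""
    promoter = promoter.upper()
    hits = []
    for name, site in elements.items():
        for strand, query in (("+", site), ("-", reverse_complement(site))):
            if strand == "-" and query == site:
                continue  # palindromic site: avoid reporting it twice
            start = promoter.find(query)
            while start != -1:
                hits.append((name, strand, start))
                start = promoter.find(query, start + 1)
    return sorted(hits, key=lambda h: h[2])


# Toy promoter fragment; not the real Mybr97 promoter.
toy_promoter = "AACACGTGTTTACGTGAATGTCTCGG"
promoter_hits = scan_promoter(toy_promoter)
```

Positions are zero-based and refer to the forward strand, so a hit on the minus strand reports where its reverse complement begins.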

A major role of Mybr97 may be to competitively inhibit the MBW complex
The association of Mybr97 with B1-MIR suggests Mybr97 is a competitive inhibitor of the MBW complex in maize. No DNA-binding experiments were performed, but R3-MYB proteins are not predicted to have DNA-binding capabilities because they lack the R2 domain typically found in most MYB proteins (Dubos et al. 2010). It is not currently understood why full-length B1 protein could not bind to Mybr97 (Fig. 3). Anthocyanin bHLH proteins are capable of homodimerizing, which may interfere with the association of Mybr97 with B1 (Kong et al. 2012). In addition, the GST tag may have interfered with binding sites in vitro. Future experiments will be aimed at discovering other targets for Mybr97 protein using alternate affinity tags or a yeast two-hybrid approach. A previous study showed that AtCPC is able to bind to the MIR of R1 allele Leaf color1 (Lc1), as demonstrated in yeast two-hybrid and GST pull-down assays (Tominaga-Wada et al. 2012). Future experiments should test the MIR of this protein and see whether Mybr97 is able to inhibit anthocyanins in either the pericarp or aleurone layers of the grain. Furthermore, protein interaction assays should be performed with photosynthetic bHLH members to see whether Mybr97 physically interacts with regulators of photosynthesis.

Conclusions
A strong candidate for the gene conferring the a3 intense purple phenotype was determined to be Mybr97, an R3-MYB repressor gene, via a transposon-tagging population, bulked segregant RNA-seq analysis, and a GST pull-down assay. Recessive a3 plants display enhanced pigmentation in leaf sheath, husks, and tassels. Anthocyanin content in the husks is up to 100-fold greater in recessive a3 plants. Transcriptomic analysis found that the entire anthocyanin biosynthetic pathway was upregulated in recessive a3 plants, along with some monolignol and maysin biosynthetic genes. Novel transcriptional regulators were discovered that may have associations with the anthocyanin pathway. Numerous stress-response-related genes were upregulated in recessive plants, which indicates A3 has implications for plant health. There was a negative association with photosynthesis-related genes, which needs to be investigated further. Future work will focus on confirming the role of Mybr97 in the a3 phenotype and finding other targets for Mybr97 besides B1. Overall, the A3 locus has a profound effect on anthocyanin content in maize and has implications for crop protection and human health.