Loss-of-function mutation of soybean R2R3 MYB transcription factor reduces flavone content and dilutes tawny pubescence color

Pubescence color of soybean is controlled by two genes, T and Td. In the presence of a dominant T allele, dominant and recessive alleles of the Td locus generate tawny and light tawny (or near-gray) pubescence, respectively. Flavones, responsible for pubescence color, are catalyzed by two copies of flavone synthase II genes (FNS II-1 and FNS II-2). This study was conducted to map and clone the Td gene. Genetic and linkage analysis using an F2 population and F3 families derived from a cross between a Clark near-isogenic line with light tawny pubescence (genotype: TT tdtd) and a Harosoy near-isogenic line with tawny pubescence (TT TdTd) revealed a single gene for pubescence color around the end of chromosome 3. Genome sequence alignment of plant introductions revealed an association between premature stop codons in Glyma.03G258700 (R2R3 MYB transcription factor) and recessive td allele. Cultivars and lines having near-gray or light tawny pubescence and a gray pubescence cultivar with td allele had premature stop codons in the gene. These results suggest that Glyma.03G258700 corresponds to the Td gene. It was predominantly expressed in pubescence. Compared to a tawny pubescence line, a near-isogenic line with td allele produced extremely small amounts of transcripts of Glyma.03G258700, FNS II-1, and FNS II-2 in pubescence. The promoter of FNS II-1 and FNS II-2 shared cis-acting regulatory elements for binding of MYB proteins. These results suggest that the wild-type of Glyma.03G258700 protein binds to the promoter of FNS II genes and upregulates their expression, resulting in increased flavone content and deeper pubescence color. In contrast, mutated Glyma.03G258700 protein fails to upregulate the expression of FNS II genes, resulting in decreased flavone content and dilute pubescence color.


Conclusions
This study revealed that soybean Glyma.03G258700 encoding the R2R3 MYB transcription factor corresponds to the Td gene. The wild type of MYB protein binds to the promoter of FNS II genes and upregulates their expression, resulting in higher flavone content and deeper pubescence color. Loss-of-function mutation of the gene fails to promote expression of FNS II genes, resulting in lower flavone content and dilute pubescence color.

Background
Pubescence color of soybean (Glycine max (L.) Merr.) is controlled by two genes (T and Td) [1]. The T gene has a major effect on pubescence color, dominant T and recessive t allele produce tawny and gray pubescence, respectively. Further, the T gene controls coloration of the seed coat and hypocotyl [1][2][3]. The T gene encodes a flavonoid 3ʹ-hydroxylase and hydroxylates flavonoids at the 3ʹ-position of the B-ring to generate dihydroxylated flavonoids [4,5]. The entire coding region of F3'H was cloned from a pair of near-isogenic lines (NILs) for the T gene, To7B (TT) and To7G (tt). They differed by one base deletion in To7G, resulting in a short polypeptide lacking the consensus sequence GGEK and hemebinding domain [5].
Bernard [6] reported another gene, Td, affecting pubescence color. In the presence of the dominant T allele, dominant and recessive alleles of the Td locus generate tawny and light tawny (or near-gray) pubescence, respectively. Recessive t allele generates gray pubescence color irrespective of the allele of the Td locus (tt TdTd or tt tdtd) [6]. In contrast to the T gene, Td gene affects only pubescence color [1,2].
Existence of different flavonoids is responsible for variation of tissue coloration in soybean. Iwashina et al. [7] characterized flavonoids in soybean pubescence. A large amount of luteolin aglycone (flavone with 3ʹ4ʹ-dihydroxylation) was extracted together with small amounts of apigenin aglycone (flavone with 4ʹ-hydroxylation) and luteolin derivatives from tawny pubescence. Luteolin aglycone comprised the largest portion (95-96%). From gray pubescence, a large amount of apigenin aglycone was extracted together with small amounts of luteolin aglycone, apigenin 7-O-glucoside, and apigenin derivatives.
Apigenin aglycone was most abundant (90%). Thus, the hydroxylation pattern of the B-ring is associated with the dominance of the T gene. From light tawny pubescence of Clark-td, a Clark NIL with td allele, 3 compounds identical to those in tawny pubescence were extracted. However, compared to tawny pubescence, the amount of luteolin aglycone was reduced to half. In addition, two high performance liquid chromatography (HPLC) peaks corresponding to isoflavonoids were occasionally found in light tawny pubescence [7].
These results suggest that the Td gene might be involved in the biosynthesis of flavones.
Flavone glycosides exist in vacuoles in pubescence whereas flavone aglycones exist outside the cell surface, and they belong to so-called surface flavonoids [7]. Biological roles of surface flavonoids remain to be investigated [8]. Most pigments in pubescence could not be extracted from pubescence, suggesting that flavone derivatives are highly polymerized [7]. In soybean, flavones predominantly exist in pubescence and are quite rare in other tissues [7].
Plants have evolved two independent enzyme systems to catalyze flavone synthesis using the same substrates, flavone synthase (FNS) I and II [9]. Both enzymes never occur side by side in the same organism. FNS I, a soluble 2-oxoglutarate-and Fe 2+ -dependent dioxygenase occurs in Apiaceae, whereas FNS II, a membrane-bound cytochrome P450 monooxygenase is more widespread among plant species. Soybean has two functional copies of FNS II genes, FNSII-1 (Glyma.12G067000) and FNSII-2 (Glyma.12G067100) with 93% amino acid identity existing with an approximately 8 kb distance in chromosome 12 [10,11].
Genome-wide association studies (GWAS) using a total of 1,402 soybean genotypes revealed a significant signal associated with pubescence color at 47,244,893 bp on Chromosome 3, presumably corresponding to the Td-td locus [12]. In addition, GWAS of a total of 12,360 accessions suggest a signal associated with the Td-td locus around the end of chromosome 3 [13]. However, the identity and nature of the Td gene has not yet been clarified. This study was conducted to map and clone the Td gene responsible for flavone content and deepening/diluting of pubescence color in soybean.

Inheritance of pubescence color
From a total of 120 F 2 seeds planted, 104 plants grew normally. Some of the F 2 plants were difficult to classify into tawny or light tawny pubescence class as previously reported [6] so F 3 progeny tests were performed. A total of 98 F 2 plants were randomly selected, and their F 3 seeds were planted in the field. 98 F 3 families were segregated into 24 families fixed for tawny pubescence, 50 segregating families, and 24 families fixed for light tawny pubescence. The segregation fitted to a single gene control model of 1:2:1 ( 2 = 0.04, P = 0.98), in accordance with the previous report [6].

Mapping of pubescence color gene
Bulked segregant analysis suggested that the Td gene was located in chromosome 3.  Table 2). The primers were constructed flanking nine repeats of the AT motif. The Td gene was mapped 1.1 cM away from the marker toward the chromosome end, generating a linkage group spanning 86.1 cM (Fig. 1).

Candidate gene identification
There are 71 genes (Glyma.03G257900 to Glyma.03G254900) from Gm03:45226162-45226406 to the chromosome end in Williams 82. Based on genome sequence comparison with Williams 82 (Glycine max Wm82.a2.v1), plant introductions with near-gray pubescence (PI 157421, PI 84631, and PI 549046) had mutations in the coding region of Glyma.03G258700, which is annotated as a MYB transcription factor (Fig. 2). In PI 157421, base substitution from TGG to TGA generated a stop codon in the coding region. In PI 549046 and PI 84631, a single-base deletion occurred at different positions 3-bp apart each other in the second exon, resulting in premature translation termination. These results suggest that Glyma.03G258700 might correspond to the Td gene.
However, according to the reference genome of Williams 82, cDNA of Glyma.03G258700 is truncated and lacks a start codon (Fig. 2). Genome sequence alignment of the plant introductions and cultivar Bay indicated that no next generation sequencer (NGS) reads were allocated to a 58-nucleotide region (Gm03:45302852 to Gm03:45302909) corresponding to the middle of the first exon in all genotypes (Additional File 1: Fig. S1).
We designed a forward primer in the upstream region to amplify the entire coding region of Glyma.03G258700 (Table 2 and Additional File 1: Fig. S1).
Gene cloning RT-PCR using primers for Glyma.03G258700 generated fragments of approximately 1 kb length in Harosoy-T, Clark-td, and Cloud. The coding region of Harosoy-T was 840 bp long and encoded 279 amino acids (Fig. 3A). BLAST analysis confirmed that Glyma.03G258700 belongs to an MYB transcription factor of the R2R3 type (Fig. 3A). A single base was substituted from G to A at the nucleotide position 633 in Clark-td compared with Harosoy-T. The single nucleotide polymorphism (SNP) generated a premature stop codon (TGA) resulting in a short polypeptide consisting of 210 amino acids (Fig. 3B). Cloud had one base deletion at the nucleotide position 677. This deletion changed a subsequent reading frame and generated a truncated polypeptide consisting of 232 amino acids (Fig. 3B).
Sequencing of the 2nd exon revealed that Korean, Seneca, Kingwa, Grant, and Sooty had the same SNP as Clark-td.

Derived cleaved amplified polymorphic sequences (dCAPS) analysis
The scheme of dCAPS analysis for SNP and indel are shown in Fig. 6A and 6B, respectively.

Gene expression
The transcript level of Glyma.03G258700 in pubescence was extremely high (151.6 times of the immature seed), whereas that in the other tissues were comparable with immature seed (leaf: 0.57 times; stem: 0.46 times; root: 2.26 times; root nodule: 5.02 times; flower: 1.38 times) in Harosoy-T (Fig. 7A). In pubescence, the transcript level of Glyma.03G258700 was approximately 12% in Clark-td compared with that in Harosoy-T ( Fig. 7B). Similarly, the transcript levels of two flavone synthase genes, GmFNSII-1 and GmFNSII-2, in Clark-td were approximately 23% and 10%, respectively, of that in Harosoy-

Promoter of FNSII genes
The nucleotide sequence of the promoter region had generally a low identity (18%) among the two flavone synthase II genes, GmFNSII-1 and GmFNSII-2, in Williams 82. However, nucleotides around the end of the region were relatively similar (Fig. 8). There are six kinds of cis-acting regulatory elements for binding of MYB transcription factors (MYBCORE, MYB2CONSENSUSAT, MYBCOREATCYCB1, MYBPZM, MYB1AT, and MYBPLANT) slightly upstream of the coding region, majority of which are shared by the two genes (Fig. 8).

Discussion
Pubescence color of soybean is controlled by two genes T and Td [1,6]. The T gene encodes a flavonoid 3ʹ-hydroxylase and hydroxylates the 3ʹ-position of the B-ring to generate dihydroxylated flavonoids [4,5]. The dominant T allele generates luteolin derivatives and tawny color in pubescence, whereas the recessive t allele produces apigenin derivatives and gray color [7]. Thus, T gene alters pubescence color by changing the structure of flavones. Under the dominant T allele, dominant Td allele generates higher amount of luteolin derivatives and tawny color in pubescence [7]. In contrast, recessive td allele produces less amount of luteolin derivatives and light tawny (or neargray) pubescence color. These results suggest that the Td gene may control amounts of flavones.
Genetic analysis and linkage mapping using the F 2 population and F 3 families derived from a cross between NILs with tawny and light tawny pubescence suggested that a gene responsible for pubescence color was located at the end of chromosome 3, which is consistent with the results of GWAS analysis [12,20]. Flavones are catalyzed by two functional copies of FNS II genes, FNSII-1 and FNSII-2, in chromosome 12 [10,11]. Accordingly, the FNS II genes may not correspond to the Td gene. There remains a possibility that the Td gene encodes a transcription factor that controls the expression of FNS II genes. Genome sequence alignment of plant introductions revealed that near-gray pubescence is associated with premature stop codons in the coding region of Glyma.03G258700 encoding an R2R3 MYB transcription factor. However, cDNA of Glyma.03G258700 is truncated and lacks a start codon in the reference genome of Williams 82. Genome sequence alignment indicated that no NGS reads were allocated to the 58-nucleotide region in the middle of the first exon. We presume that the reference sequence of Williams 82 has a mistake at this region. We designed a forward primer upstream of the region and successfully amplified the entire cDNA of Harosoy-T, Clark-td, and Cloud. The 58-nucleotide sequence of these cDNAs was completely different from the reference sequence of Williams 82 (Additional File 1: Fig. S1). Genome sequence of Williams 82 needs to be confirmed at this region. An additional data file shows that majority of the MYB proteins are involved in biosynthesis of anthocyanins, proanthocyanins, or flavonols (Additional File 2: Table S1) [27]. The Td gene specifically controls color of pubescence, but not changes in the color of flower, seed, or hypocotyl [2,6]. The allele at the Td locus did not affect the amounts of anthocyanins, flavonol glycosides, or dihydroflavonol in flower petals [35]. In addition, the present study revealed that Glyma.03G258700 expresses predominantly in pubescence where it almost exclusively deposits flavones. These results suggest a hypothesis that Glyma.03G258700 has evolved to upregulate biosynthesis of flavones. Further investigations including expression assays of other flavonoid biosynthesis genes may be necessary to ascertain the hypothesis.
Glyma.03G258700 protein may bind to the promoter of FNS II genes and upregulate their transcription. Essentially, expression levels of Glyma.03G258700, GmFNSII-1, and

GmFNSII-2 in pubescence of Clark-td wereextremely low compared with those in Harosoy-
T. In addition, the promoter region of FNSII-1 and FNSII-2 shared cis-regulatory elements for MYB transcription factors, including the MYBCORE element. Persimmon DkMYB4 protein having a quite similar MYB-binding domain was proved to bind to the MYBCORE element by electrophoretic mobility shift assays [26]. These results strongly suggest that the Glyma.03G258700 protein may also bind to a similar element in the promoter of FNSII genes.
In summary, the wild type Glyma.03G258700 protein may bind to the promoter of FNS II

Plant materials
Plant materials used in this study are listed in Table 1. Pedigree and pubescence color information was obtained from the USDA GRIN database (https://www.ars-grin.gov/npgs/). Linkage mapping A total of 94 F 2 plants were randomly selected and used for linkage mapping because the PCR plates and electrophoresis apparatus are designed for multiples of 96 samples (2 parents and 94 F 2 plants). Total DNA was extracted from trifoliate leaves of the parents and each of the F 2 plants by the CTAB method [15]. Pubescence color was scored in each of the F 2 plants and F 3 families. Based on pubescence color in F 2 and F 3 generations, 24 F 2 plants fixed for tawny pubescence and 24 F 2 plants fixed for light tawny pubescence, were identified. Two bulked DNA samples (one for tawny pubescence and another for light tawny pubescence) were obtained by mixing 10 L from each of the selected plants and subjected to bulked segregant SSR analysis. Polymorphic SSR markers in Harosoy x Clark (Nebraska) population [16] were used for analyzing variation between the two bulked DNA samples. PCR conditions were similar to those in a previous report [17]. Variation between the two bulked samples were used to determine which chromosome harbored the pubescence color gene. SSR markers on this chromosome [18] were then tested for variation among the 94 F 2 plants. The linkage map was constructed using MAPMAKER/EXP version 3.0 [19] with the Kosambi function and the threshold logarithm of odds (LOD) score of 3.0. To fill the gap in the linkage group, SSRs were screened and an SSR marker designated as Gm03:45226162-45226406 was constructed using the Simple Sequence Repeat Identification Tool (http://archive.gramene.org/db/markers/ssrtool).  Table 1). The genome sequences together with our NGS data of US cultivar Bay with gray pubescence were uploaded to the Integrative Genomics Viewer (http://software.broadinstitute.org/software/igv/) [21] and were compared with those of Williams 82 having tawny pubescence.

Molecular cloning
Total RNA was isolated from trifoliate leaves (100 mg) of Harosoy-T,Clark-td, and Cloud usingthe Spin Column Plant Total RNA Purification Kit (Sangon Biotech, Shanghai, China) according to the manufacturer's instructions. cDNA was synthesized by reverse transcription polymerase chain reaction of 5 g of total RNA using the PrimeScript II 1st Strand cDNA Synthesis Kit (Takara, Dalian, China) and an oligo (dT) primer following the manufacturer's instructions. cDNA was amplified by end-to-end PCR using primers ( Table   2)  The PCR products were sent to Sangon Biotech for direct sequencing.

Sequencing analysis
Nucleotide sequences of both strands were determined with the BigDye terminator cycle method using an ABI 3730XL (Thermo Fisher Scientific). Nucleotide sequences and the putative amino acid translations were analyzed with the BLAST program [22]. Nucleotide and amino acid sequences were aligned using ClustalW (http://clustalw.ddbj.nig.ac.jp/topj.html) with default settings. An additional data file shows the amino acid alignment used to construct a phylogenetic tree of MYB genes related to flavonoid biosynthesis (Additional File 1: Table S1) using the neighbor-joining method with MEGA version 10.0.5 (http://www.megasoftware.net/) [23]. Bootstrap test of 1000 replications was performed.
Nucleotide sequences of the putative promoter region (up to 1500 bp upstream from the start codon) of two FNSII genes, GmFNSII-1 (Glyma.12G067000) and GmFNSII-2 (Glyma.12G067100) of Williams 82 were downloaded from Phytozome. Cis-acting regulatory elements of these genes were investigated using New PLACE, a database of plant cis-acting regulatory DNA elements (https://sogo.dna.affrc.go.jp/cgi-bin/sogo.cgi? lang=en&pj=640&action=page&page=newplace), using default settings [24]. Quantitative real-time PCR Tissue samples were collected in 3 replicates. Pubescence was sampled with razor blades from pods and stems of Harosoy-T andClark-td at the R3 stage [25]. Flower petals were collected on the day of opening from Harosoy-T. Leaves, stems, roots, root nodules, and immature seeds were sampled from Harosoy-T at the R3 stage. Total RNA was extracted from 50 mg of pubescence, and 100 mg of the other tissue samples. cDNA was synthesized by reverse transcription polymerase chain reaction of 1 g of total RNA using the PrimeScript RT reagent Kit with gDNA Eraser (Perfect Real Time) (Takara) and an oligo (dT) primer following the manufacturer's instructions. The real-time PCR mixture contained 1 × TB Green Premix Ex Taq (Tli RNaseH Plus) (Takara), 10 pmol of each primer, 1 × ROX reference dye, 50 ng of cDNA, and water to a final volume of 20 l. The PCR was performed using the StepOnePlus Real-Time PCR instrument (Thermo Fisher Scientific).
The initial 30 sec denaturation at 95 °C was followed by 40 cycles of 5 sec denaturation at 95 °C and 30 sec annealing at 60 °C. The expression level of the target gene was normalized using soybean actin 1 gene (GenBank accession number: J01298). The PCR primers are listed in Table 2.

Declarations
Ethics approval and consent to participate Not applicable Consent for publication Availability of data and material Sequence data from this article has been deposited to DDBJ, with the accession numbers LC485152, LC485153 and LC485154. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.        Tab.S1_accession.xlsx Table.docx