A Rice Promoter Protein Binding Microarray for Cis-Acting Elements for Rice Transcription Factors

Transcription factors (TFs) regulate the expression of genes at the transcriptional level by binding a specific DNA sequence. Thus, predicting the DNA-binding motifs of TFs is one of the most important areas for the functional analysis of TFs in the postgenomic era. Although many methods have been developed for this challenge, there are still many TFs with unknown DNA-binding motifs. In this paper, we designed a rice (Oryza sativa)-specific protein binding microarray (RPBM) whose probes are 40 bp long with 20 bp of overlap; 49 probes span the 1 kb promoter region before the translation start site of each gene. To confirm the efficiency of RPBM technology, we selected two TFs, OsWOX13 and OsSMF1. We identified the ATTGATTG DNA-binding sequence and 635 putative target genes of OsWOX13. OsSMF1 bound to GCTGACTCA and GGATGCC sequences and bound especially strongly to CCACGTCA. A total of 932 putative target genes were identified for OsSMF1. RPBM is applicable to the analysis of DNA-binding motifs for TFs where binding is evaluated in extended natural promoter regions. The analysis is also applicable to TFs that have single or multiple binding motifs. The technology might even be extended to TFs that are heterodimers or form higher-order complexes.


Introduction
Transcription factors (TFs) play a pivotal role in the regulation of gene expression by binding to their cognate motifs in promoter regions. For many years, this binding activity has been investigated by biochemical assays such as electrophoretic mobility shift assays (EMSAs), nitrocellulose filter binding assays, footprinting assays, and yeast one-hybrid system assays (Hellman and Fried 2007; Helwa and Hoheisel 2010). However, such approaches are generally laborious and slow, and many TFs still remain uncharacterized.
High-throughput methods such as chromatin immunoprecipitation (ChIP)-chip, ChIP followed by sequencing (ChIP-seq), and protein binding microarrays (PBMs) have been developed with the availability of whole-genome sequences and advances in microarray technology (Barski et al. 2007; Ren et al. 2000; van Steensel et al. 2001; Wang et al. 2008). PBMs have some advantages compared to ChIP. For example, a PBM does not depend on the availability of highly specific antibodies and does not require cross-linking reagents, eliminating the risk of cross-linking artifacts. PBMs were introduced to conveniently determine protein-DNA interactions in vitro. PBMs were improved by adopting de Bruijn sequences and in situ synthesis of DNA oligonucleotides on slides (Berger et al. 2006). The de Bruijn sequences represent not only all contiguous 10-mers but also all 10-mers with a gap size of 1 nucleotide. The whole-genome yeast intergenic microarray for PBM was prepared by spotting double-stranded DNA (Zhu et al. 2009). Recently, a custom PBM was developed to characterize the DNA-binding activity of transcription activator-like effectors (TALEs), which are secreted by Xanthomonas bacteria via their Type III secretion system and function as virulence factors (Anderson et al. 2020; Rogers et al. 2015). TALE-DNA interactions were comprehensively assayed in this PBM, in which ~5,000-20,000 unique DNA sequences per effector protein were spotted.
Identification of genomic regulatory elements has led to the construction of the databases TRANSFAC (Wingender et al. 1996), GRASSIUS (Yilmaz et al. 2009), PlnTFDB (Perez-Rodriguez et al. 2010), UniPROBE (Hume et al. 2015), and PlantTFDB (Jin et al. 2017). In particular, PlantTFDB was constructed based on a collection of 156 plant species with sequenced genomes. Recent advances in ChIP-seq have provided powerful ways to profile DNA-binding proteins and histone modifications genome-wide, leading to databases such as ChEA, CistromeMap, and ChIPBase (Lachmann et al. 2010; Qin et al. 2012; Yang et al. 2013).
Previously, we designed a PBM, denoted Q9-PBM, in such a way that target probes are quadruples of all possible 9-mer combinations (Kim et al. 2009). A total of 131,072 features were selected from the 262,144 possible 9-mer sequences after consideration of the reverse complementary sequences, because double-stranded DNA can be read from either strand. The quadruple sequences can provide highly consistent and concrete results for consensus binding motifs. Q9-PBM employs DsRed fluorescent protein, which eliminates multiple wash and hybridization steps. Q9-PBM confirmed the well-known DNA-binding sequences of Cbf1 and CBF1/DREB1B, and it was also applied to elucidate the unidentified cis-acting elements of the OsNAC6, MYB44, and OsSMF1 rice TFs (Kim et al. 2009). These PBMs can identify binding motifs but could be limited by the length of the designed oligomers (9 or 10 nucleotides). This limitation also raises the possibility of searching for the binding sites of TFs directly in gene-specific promoters.
To overcome the limitations from the number of nucleotides and investigate the binding activity in the promoter region, we designed a rice (Oryza sativa)-specific PBM (RPBM) in such a way that the 1 kb gene-specific promoter region was covered by overlapping 40 nt long probes. The single-stranded oligomers on the microarray were subjected to polymerase chain reaction (PCR) to form double strands, and then the binding sites of the TFs OsWOX13 and OsSMF1 were tested. OsWOX13 is known to preferentially bind to an ATTGATTG DNA-binding motif, while OsSMF1 has multiple DNA-binding motifs such as GCN4 [TGA(G/C)TCA], ACGT [CCACGT(C/G)], and ATGA [GGATGAC] (Kim et al. 2017; Minh-Thu et al. 2018). Using this RPBM, we confirmed the DNA-binding motifs and identified putative target genes of OsWOX13 and OsSMF1.
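The halving from 262,144 to 131,072 features follows from treating each 9-mer and its reverse complement as the same double-stranded site. The following is a minimal sketch of that count, not the original Q9-PBM design code; the function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def count_strand_collapsed_kmers(k: int) -> tuple[int, int]:
    """Count all k-mers and the k-mers left after merging reverse complements."""
    total = 0
    canonical = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        total += 1
        # Keep one representative per (k-mer, reverse complement) pair.
        canonical.add(min(kmer, reverse_complement(kmer)))
    return total, len(canonical)

total, unique = count_strand_collapsed_kmers(9)
print(total, unique)  # 262144 131072
```

Because 9 is odd, no 9-mer equals its own reverse complement (the middle base would have to pair with itself), so the number of distinct features is exactly half of 4^9.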

Design of a RPBM
Probes for the RPBM were designed from promoters of genes deposited in the IRGSP RAP2 database (http://rapdb.lab.nig.ac.jp). A probe is 40 bp long, covering a gene-specific region, with 20 bp for an annealing site for PCR. Adjacent gene-specific regions overlapped by 20 bp, and 49 probes spanned the 1 kb promoter region before the translation start site of each gene (Fig. 1). Considering the ambiguity of annotation, the first probe of genes without a 5'-UTR or with a 5'-UTR longer than 200 bp was designed from the 5' upstream region including the methionine start codon. In this way, 954,520 probes were designed from 19,480 of the 31,439 genes. Each target probe was followed by a sequence complementary to a primer sequence (5'-CGGAGTCACCTAGTGCAG-3') and was connected by a 5 nt thymidine linker on the microarray.
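The tiling arithmetic can be checked with a short sketch: 40 bp windows advanced in 20 bp steps give 49 probes per 1 kb promoter, and 49 probes for each of 19,480 genes gives 954,520. The function name and the placeholder promoter sequence are assumptions for illustration, not the original design script.

```python
def tile_promoter(promoter: str, probe_len: int = 40, step: int = 20) -> list[str]:
    """Cut a promoter into overlapping probes of probe_len, advancing by step."""
    last_start = len(promoter) - probe_len
    return [promoter[i:i + probe_len] for i in range(0, last_start + 1, step)]

# Placeholder 1 kb sequence; a real promoter would come from RAP-DB.
promoter = "ACGT" * 250
probes = tile_promoter(promoter)

print(len(probes))           # 49
print(len(probes) * 19_480)  # 954520
```

With a 20 bp step, every internal base of the promoter is covered by two probes, so a binding site that is split at one probe boundary remains intact in the neighboring probe.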

Analysis of signal intensities
The full-length OsWOX13 and OsSMF1 cDNAs were fused at the N-terminus to the DsRed fluorescent protein gene. Purified recombinant OsWOX13 and OsSMF1 proteins fused with DsRed:6xHis were hybridized to the RPBM as described in the Methods section. Then, the consensus binding motifs were determined based on signal strength (Kim et al. 2009; Kim et al. 2017).
A rank-ordered signal distribution showed a steep slope on the left followed by a heavy right tail for RPBM. As the probes in the steep slope region differed in only one base, we assumed that the signal distribution was due to specific interactions between the proteins and features on the microarray. Two independent linear models, y = ax + b, were applied in the steep and heavy right tail regions using the R statistical language. In OsWOX13, the slope and y-axis intercept of the steep sloping region were −14.7 and 66,570.6, respectively, while those of the heavy tail region were −0.0043 and 3,144, respectively (Additional file 1: Figure S1a). The number of strong binding probes from the steep slope was 34,778 (Additional file 2: Table S1).
OsSMF1 gave a similar rank-ordered signal distribution, showing a steep slope on the left followed by a heavy right tail. The slope and y-axis intercept of the steep slope region were −25.1 and 64,928.8, respectively, while those of the heavy tail region were −0.0207 and 2,283, respectively (Additional file 1: Figure S1b). For OsSMF1, the extrapolated intensity of the heavy right tail was 3,137. The number of target probes for which the intensity was higher than this value was 38,654 (Additional file 3: Table S2). These results suggest that the binding between transcription factors and their cognate binding sites in RPBM is as stable as that found in Q9-PBM. In addition, the probe design from the promoter regions overcomes potential complexities due to concatemers of target sites.
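The two-line description of the rank-ordered signal can be illustrated with a small sketch in Python rather than R. The text does not state how the boundary between the steep and tail regions was chosen, so the `breakpoint` argument is an assumption, as is the use of the tail intercept as the cutoff; the intensities below are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def fit_two_regions(intensities, breakpoint):
    """Fit separate lines to the first `breakpoint` ranks and to the rest.

    The breakpoint is supplied by the caller (an assumption of this sketch).
    """
    ranked = sorted(intensities, reverse=True)
    ranks = list(range(1, len(ranked) + 1))
    steep = fit_line(ranks[:breakpoint], ranked[:breakpoint])
    tail = fit_line(ranks[breakpoint:], ranked[breakpoint:])
    return steep, tail

# Synthetic signal: a steep region of 2,000 probes and a flat tail of 8,000.
signal = [60000 - 15 * r for r in range(1, 2001)]
signal += [3000 - 0.005 * r for r in range(2001, 10001)]

(steep_a, steep_b), (tail_a, tail_b) = fit_two_regions(signal, breakpoint=2000)

# One possible cutoff: probes brighter than the tail line extrapolated to rank 0.
strong = [v for v in signal if v > tail_b]
print(round(steep_a, 3), round(tail_a, 4), len(strong))
```

In this toy data the tail intercept separates the two regions cleanly; on real arrays the choice of breakpoint and cutoff would need to be inspected on the rank plot.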

Identifying putative target genes of OsWOX13 by RPBM
To find the DNA-binding motif of OsWOX13, each 40 bp probe was split into 9-mers, and each 9-mer was given the intensity of its parent probe as a pseudointensity. The window was shifted one base at a time, so that each probe gave 32 9-mers (Additional file 1: Figure S2a). The strongly binding feature probes (34,778) gave 198,384 distinct 9-mers out of a total frequency of 1,177,280 (Fig. 2a). We found that oligomers containing runs of 4-5 consecutive G or C bases (3,148) exhibited nonspecific binding and discarded them from the subsequent analysis. The average intensity and frequency of 9-mers were 21,193.0 and 5.9, respectively. These 9-mers were sorted according to their intensities, and GATTGATTG had the highest intensity of 37,706 with a frequency of 280 (Additional file 4: Table S3). To find a consensus sequence, cluster analysis was performed in such a way that any 9-mer sharing a 5 nt long sequence with the highest-intensity template belonged to the same group. In total, 1,028 9-mers formed a cluster with GATTGATTG as the template. The top 20 9-mers ranked by intensity contained one or more ATTG sequences (Table 1). The occurrences of nucleotides at each position are shown in a position weight matrix (PWM) obtained by clustering these 9-mers (Fig. 3a). WebLogo (weblogo.berkeley.edu) gave ATTGATTG as the consensus (Fig. 3b). In addition, mutation analysis was conducted by changing each base in ATTGATTG (Fig. 3c). A base-mutated sequence gave a maximum decrease at the 4th nt, G, and a minimum at the 1st nt, A (10,756.4 and 10,139.6, respectively). The Wilcoxon-Mann-Whitney test using the ranks of probes with and without the motif clearly showed that the ATTGATTG motif (8-mer) is the binding motif of the OsWOX13 TF. Similarly, oligomer frequency and point mutations at distinct positions were also analyzed with 5-, 6-, 7-, 8-, and 10-mers (Additional file 1: Figure S3). These analyses showed that ATTGATTG is the binding motif of the TF.
Table 1 List of 9-mers highly ranked by intensity and containing the ATTGATTG sequence
a) Rank order by intensity.
b) 9-mers were obtained by a base shift on a 40 nt long feature probe; each probe gave 32 distinct 9-mers.
c) Intensities were averaged over all the feature probes containing the corresponding 9-mer sequence.
d) Total frequency of the 9-mer among the 34,778 strongly binding feature probes.
e) Total intensity, calculated as column c × column d.
f) Distinct positions of the 9-mer in the 40 nt probes. The highest value (near 32) suggests that the 9-mer was obtained from all the positions by a base shift in the probes.
An extended motif was constructed using ATTGATTG as a template by adding a base in either the 5' or 3' direction (Fig. 3c). For example, GATTGATTG (−1) was chosen from analysis of the 8-mer, which was extended in the 5' direction with the base G to make GATTGATTG, and repeated analysis showed that T is the farthest in the 5' direction (−2). Similarly, G and T were added at the 3' positions of +1 and +2, respectively, which gave TGATTGATTGGT. These data were confirmed by counting the actual frequency of the nucleotides flanking ATTGATTG. Of the 29,379 rice genes retrieved from RAP-DB (http://rapdb.dna.affrc.go.jp/), a total of 3,243 contained the ATTGATTG motif in their 1 kb promoter regions. Preferred nucleotides were searched for in the flanking sequences around ATTGATTG (Additional file 1: Figure S4). A and T were preferable at −3 and −2, and G and A were preferable at the −1 position. In contrast, A/G was preferable at the +1 position, and T was preferable at the +2 and +3 positions.
Among 34,778 probes, 646 probes contained the ATTGATTG motif (Fig. 2a, Additional file 5: Table S4). From these probes, we identified 635 putative target genes of OsWOX13. Gene ontology (GO)-based functional enrichment analysis of these candidate genes was performed with the web-based tool AgriGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php). The results revealed that among the 635 genes, 501 were annotated, and 10 GO terms showed significant differences compared to the Oryza sativa database used as a background reference (Table 2). Terms under macromolecule metabolic process (GO:0043170) were the most significantly enriched, including protein (GO:0019538), carbohydrate (GO:0005975), lipid (GO:0006629), and nucleobase (GO:0006139) metabolic processes (Table 2). Categories such as death (GO:0016265) and response to stress (GO:0006950) were also highly enriched. These results were in line with the observation in a previous paper that, compared to control plants, rice plants overexpressing OsWOX13 showed early flowering and drought tolerance (Minh-Thu et al. 2018).
The 635 putative target genes of OsWOX13 (a) with the ATTGATTG motif and the 932 putative target genes of OsSMF1 (b) with the GCCACGTCA motif were chosen and subjected to gene ontology analysis using AgriGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php).
To verify putative targets of OsWOX13, we selected Hd1-3 (Os08g0536300), for which a probe (Os08g0536300_14, AATATAACGAAACATGCAATCAATCAAAATGTTGGGAAGG) contains the ATTG motif (Fig. 3d and Table S1). We assayed its binding specificity to recombinant OsWOX13 by EMSA using carboxyfluorescein (FAM)-labeled double-stranded oligonucleotide probes. The binding of OsWOX13 to the 40 bp probe with the ATTG motif was detected as shifted bands (Fig. 3d). These results confirmed the ATTG motif that was previously identified using an analysis based on Q9-PBM (Minh-Thu et al. 2018).
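The 9-mer scoring and the template-based grouping described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' script: the function names are hypothetical, repeated occurrences of a 9-mer within one probe are each counted (an assumption), and the shared 5 nt stretch may lie anywhere in the candidate.

```python
from collections import defaultdict

def kmers(seq, k=9):
    """All k-mers obtained by shifting a window one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def score_kmers(probes, k=9):
    """Map each k-mer to (mean pseudointensity, frequency).

    probes: iterable of (sequence, intensity); every k-mer inherits the
    intensity of its parent probe.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, intensity in probes:
        for kmer in kmers(seq, k):
            totals[kmer] += intensity
            counts[kmer] += 1
    return {km: (totals[km] / counts[km], counts[km]) for km in counts}

def cluster_around(template, candidates, shared=5):
    """Keep candidates sharing any `shared`-nt stretch with the template."""
    seeds = set(kmers(template, shared))
    return [c for c in candidates if any(s in c for s in seeds)]

probe = "AATATAACGAAACATGCAATCAATCAAAATGTTGGGAAGG"
print(len(kmers(probe)))  # 32

members = cluster_around("GATTGATTG", ["TTGATTGGT", "CCCCGGGGA", "AGATTGAAA"])
print(members)  # ['TTGATTGGT', 'AGATTGAAA']
```

A 40 nt probe yields 40 − 9 + 1 = 32 windows, which matches the 32 9-mers per probe reported in the text.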
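Counting the nucleotides that flank each motif occurrence can be sketched as below. The function name is hypothetical, only the forward strand is scanned (an assumption; the text does not say how strands were handled), and the single test sequence is constructed to contain the extended motif TGATTGATTGGT.

```python
from collections import Counter

def flank_counts(promoters, motif="ATTGATTG", flank=3):
    """Count bases at offsets -flank..-1 and +1..+flank around each motif hit."""
    offsets = list(range(-flank, 0)) + list(range(1, flank + 1))
    counts = {off: Counter() for off in offsets}
    for seq in promoters:
        start = seq.find(motif)
        while start != -1:
            end = start + len(motif)  # index of the +1 position
            for off in range(1, flank + 1):
                if start - off >= 0:
                    counts[-off][seq[start - off]] += 1
                if end + off - 1 < len(seq):
                    counts[off][seq[end + off - 1]] += 1
            # Continue one base later so overlapping hits are also found.
            start = seq.find(motif, start + 1)
    return counts

counts = flank_counts(["CCTGATTGATTGGTCC"])
print(counts[-1].most_common(1), counts[1].most_common(1))
```

Run over all promoter sequences, the most common base at each offset gives the preferred flanking nucleotides.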
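Because the probes are double-stranded, a motif can be present on either strand. A minimal sketch (the function name is illustrative) that scans the Os08g0536300_14 probe quoted above shows that ATTGATTG occurs on the reverse strand, where it reads CAATCAAT on the strand as written.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif_both_strands(probe, motif):
    """Return (strand, 0-based start on the given strand) for every hit."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = probe.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = probe.find(pattern, start + 1)
    return hits

probe = "AATATAACGAAACATGCAATCAATCAAAATGTTGGGAAGG"
print(find_motif_both_strands(probe, "ATTGATTG"))  # [('-', 16)]
```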

Identifying the DNA-binding motif of OsSMF1 by RPBM
OsSMF1 reportedly binds multiple cis-elements (Kim et al., 2017). To test this, RPBM was applied to find the binding motif of OsSMF1, and 32 9-mers were extracted from a 40 bp long probe in the same manner as that for OsWOX13. The 15,394 probes gave 178,857 distinct oligomers, and the total frequency was 492,608 (Fig. 2b).
The average intensity and frequency of 9-mers were 21,725.2 ± 11,270.6 and 2.75, respectively. In contrast to OsWOX13, several groups were identified by initial cluster analysis, suggesting that OsSMF1 binds several motifs. Thus, the distinct 9-mers with frequencies more than four times the average frequency (over 11) were sorted according to the value of the intensity multiplied by the frequency, and the 9-mers were thereby narrowed down to 648 in total (Table 3, Additional file 6: Table S5). This list gave 4 clusters, GCCACGTCA, ACGTAAGCG, GCTGACTCA, and AGGATGCCA, with 335, 24, 31 and 24 9-mers, respectively (Additional file 7: Table S6, Fig. 4a).
In addition, these results show that the cluster of GCCACGTCA is predominant and that the other clusters were minor but distinct. In a previous paper, Q9-PBM and EMSAs were used to show that OsSMF1 binds the GCN4 (TGA(G/C)TCA), ACGT (CCACGT(C/G)), and ATGA (GGATGAC) motifs with three different affinities (Kim et al. 2017). GCCACGTCA and ACGTAAGCG are part of the ACGT motif, GCTGACTCA is included in the GCN4 motif, and AGGATGCCA is very similar to the ATGA motif. As the GCCACGTCA and ACGTAAGCG clusters have ACGT motifs, they were aligned together and gave a position matrix in which CCACGTCA was the main element (Fig. 4b). The feature probes containing CCACGTCA (932) are listed in Additional file 8: Table S7. The Wilcoxon-Mann-Whitney test was performed on the target probes containing CCACGTCA and those without the sequence, and it gave a p-value of 0, suggesting that CCACGTCA contributed significantly to binding. To test the preferences for any nucleotide flanking CCACGTCA sequences, an extended motif was constructed using CCACGTCA as a template by adding a base in either the 5' or 3' direction as with OsWOX13 (Fig. 4c). Mutation analysis was performed as with OsWOX13 by changing each base in CCACGTCA (Fig. 4c). Intensities strongly decreased with changes to A at the 3rd position (by 10,637.3) and to A at the 7th position (by 8,356.0). Extending the motif from CCACGTCAG by adding a base in either the 5' or 3' direction gave TGCCACGTCAGC. Thus, this study showed that CCACGTCA is a DNA-binding motif for OsSMF1, while the flanking sequences of this motif have no significant effect. Similarly, the intensities of the feature probes in terms of the frequency and mutations at each position were also analyzed with 5-, 6-, 7-, 8-, 10-, and 11-mers (Additional file 1: Figure S5).
To verify putative targets of OsSMF1, we selected two no apical meristem (NAM) proteins, Os01g0393100 (ONAC026) and Os05g0415400 (ONAC024), from "regulation of nitrogen compound metabolic process" (GO:0051171). ONAC026 and ONAC024 were identified as target genes of OsSMF1 in a previous paper (Kim et al. 2017). Probes from the ONAC026 and ONAC024 promoters contain the ACGT and GCN4 motifs, respectively (Fig. 4a). We assayed their binding specificities to recombinant OsSMF1 by EMSA using FAM-labeled double-stranded oligonucleotides corresponding to each probe. The binding of OsSMF1 to the 40 bp probes was detected as shifted bands (Fig. 4d). This result indicated that OsSMF1 directly binds to the promoters of ONAC026 and ONAC024. These results indicate that OsSMF1 has multiple distinct motifs, with OsSMF1 binding to the ACGT (CCACGT(C/G)), GCN4 (TGA(G/C)TCA), and ATGA (GGATGAC) motifs.
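The frequency filter and the intensity × frequency ranking can be sketched as follows. The function name and the toy table are assumptions for illustration; with the reported mean frequency of 2.75, a four-fold cutoff corresponds to the "over 11" threshold in the text.

```python
from itertools import islice, product

def select_frequent_kmers(table, fold=4.0):
    """Keep k-mers whose frequency exceeds fold x the mean frequency.

    table: {kmer: (mean_intensity, frequency)}.
    Returns rows (kmer, intensity, frequency, intensity * frequency),
    sorted by the product in descending order.
    """
    mean_freq = sum(freq for _, freq in table.values()) / len(table)
    cutoff = fold * mean_freq
    kept = [(km, inten, freq, inten * freq)
            for km, (inten, freq) in table.items() if freq > cutoff]
    return sorted(kept, key=lambda row: row[3], reverse=True)

# Toy table: two frequent 9-mers plus 18 background 9-mers seen once each.
table = {"GCCACGTCA": (30000.0, 40), "GCTGACTCA": (25000.0, 20)}
for letters in islice(product("AT", repeat=9), 18):
    table["".join(letters)] = (9000.0, 1)

ranked = select_frequent_kmers(table)
print([row[0] for row in ranked])  # ['GCCACGTCA', 'GCTGACTCA']
```

Ranking by the product rather than by intensity alone favors 9-mers that are both bright and recurrent, which helps when several motifs of different strength are present.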
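The comparison of probes with and without a motif can be illustrated with a pure-Python rank-sum sketch. It uses the normal approximation without tie correction, which is an assumption of this example; the analysis in the text may have used a different implementation. The input values are toy numbers.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value from a normal approximation)."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values and give them the average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

with_motif = [5.0, 6.0, 7.0, 8.0]
without_motif = [1.0, 2.0, 3.0, 4.0]
u, p = mann_whitney_u(with_motif, without_motif)
print(u, round(p, 4))
```

With hundreds of motif-containing probes and tens of thousands of others, the z-score can become large enough that the p-value underflows to exactly 0 in double precision, which may be why a p-value of 0 is reported.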

Discussion
In this paper, we reported an RPBM in which the 1 kb promoter region is covered by overlapping 40 bp long probes. The initial signal distribution of RPBM was very similar to that of Q9-PBM, where quadruple 9-mer oligonucleotides were designed as the target probes. These results suggest that the binding between transcription factors and their cognate binding sites in RPBM is as stable as that found in Q9-PBM. The probe design from the promoter regions overcomes potential complexities due to concatemers of target sites, and the binding can be interpreted in the context of the promoter regions. The analysis of signal intensities of 5- to 10-mers, especially 9-mers, highlighted putative binding sequences, and comparison of those signals with the signals of oligomers carrying a point mutation at each site clearly showed strong binding sequences. Further, we confirmed that the feature probes on RPBM can be directly used in the subsequent EMSA analysis without further modification.
We first applied 9-mer-based analysis and identified the ATTGATTG DNA-binding sequence and 635 putative target genes of OsWOX13, which has one dominant binding site. The Plant Transcription Factor Database (Jin et al. 2017) showed that Os01g0818400 (OsWOX8) has a representative motif, CAATCAA, which corresponds to 7 nt of the reverse complement of ATTGATTG. Many homeobox-containing TFs contain ATTGATTG or parts of it in their motifs, and this is also found in similar homeobox TFs, as shown for Os09g0528200 and Os03g0170600 in PlantTFDB. We also surveyed the UniPROBE database (Hume et al. 2015) and compared its entries with putative cis-elements of homeodomain-containing TFs such as UP00615B_1 and UP00158A_1 from humans and mice, respectively. These factors also provided various GA- or AT-rich motifs. In particular, the UP00158A_1 binding site contains AATTAATTA and ATTA repeats and showed a single-base (A to G) difference from the ATTG repeats in the ATTGATTG motif in our analysis (Minh-Thu et al. 2018).
The mode by which OsSMF1 modulates downstream TFs that are bound to GCCACGTCA and ACGTAAGCG, which include the ACGT motif, might be complex. GCTGACTCA is included in the GCN4 motif, GGATGCC is very similar to the ATGA motif, and the cluster near CCACGTCA is predominant, confirming previous results (Kim et al. 2017).
Although the cis-elements of OsSMF1 are not registered in PlantTFDB, they are consistent with the representative cis-elements of many basic leucine zipper (bZIP) TFs in the database. These TFs contain an ACGT motif in their representative binding motif. A few examples are Os01g0859500 with GATGACGTCA, Os02g0203000 with TGATGACGTGGC, Os02g0766700 with TGCCACGTGNCC, and Os03g0796900 with TGACGTGG, which is reverse complementary to CCACGTCA (Additional file 9: Table S8). These results suggest that OsSMF1 evolved to have specific functionality involving common DNA-binding activity due to the bZIP domain.
The technology might even be extended to TFs that are heterodimers or form higher-order complexes, as a 40 nt probe could have additional putative cis-elements. In addition, an extended analysis of the databases could be evaluated with other interacting TFs that might be functionally associated in processes such as metabolism and development. For example, the TFs that might be associated with OsWOX13 were sought in PlantTFDB through the elements in the 40 bp flanking ATTGATTG in the promoter regions (data not shown). Thus, the CAATCA site for Os09g0528200 (homeobox-leucine zipper protein), the AAAAAG site for Os02g0707200 (Dof-like protein 34) and the CAAGNAA site for Os03g0119966 (NAC-domain protein) are frequently found elements in rice.

Conclusions
These results showed that RPBM is applicable in the analysis of DNA-binding motifs with a TF where binding is evaluated in extended natural promoter regions. The analysis can also be applicable to TFs that have single or multiple binding motifs. The technology might even be expanded for application to TFs that are heterodimers or form higher-order complexes. In addition, the extended analysis of the databases could be evaluated with other interacting TFs that might be functionally associated in processes such as metabolism and development.

Protein Expression and Purification
All proteins used in this study were expressed as N-

Synthesis of Complementary Strands on Microarray
Complementary DNA strands were synthesized as described in a previous report. The reaction solution contained 40 µM dNTPs (TaKaRa), 1.6 µM CyDye5-dUTP (GE Healthcare), 1 µM 5'-CTGCACTAGGTGACTCCG-3' primer (Bioneer), 1X ThermoSequenase buffer, and 0.5 U/µl ThermoSequenase (USB). A custom-designed PBM (Agilent) was combined with the reaction solution in a hybridization chamber (Agilent) according to the manufacturer's protocol. The assembled hybridization chamber was incubated at 85 °C for 10 min and then 60 °C for 90 min.
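As a consistency check between the array design and this protocol, the extension primer used here should be the reverse complement of the 18 nt sequence appended to each probe in the RPBM design. The short sketch below confirms this for the two sequences quoted in the text; the variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequence appended to each probe, as quoted in the RPBM design section.
PROBE_ANNEALING_SITE = "CGGAGTCACCTAGTGCAG"
# Primer used for second-strand synthesis, as quoted in this protocol.
EXTENSION_PRIMER = "CTGCACTAGGTGACTCCG"

print(reverse_complement(PROBE_ANNEALING_SITE) == EXTENSION_PRIMER)  # True
```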
The microarray was washed in PBS-0.01% (v/v) Triton X-100 at 37 °C for 1 min, PBS-0.01% (v/v) Triton X-100 at 37 °C for 10 min and PBS at room temperature for 3 min and dried by centrifugation at 500 g for 2 min. The double-stranded microarray was scanned to verify successful synthesis.

Protein Binding Microarray and Data Analysis
Double-stranded microarrays were washed with PBS containing 0.01% (v/v) Triton X-100 and blocked with PBS containing 2% (wt/v) BSA (Sigma) for 1 h. Then, the microarray was first washed with PBS containing 0.1% (v/v) Tween-20 and then with PBS containing 0.01% (v/v) Triton X-100 for 1 min. The protein binding mixture was prepared containing 200 nM TF in PBS containing 2% (wt/v) BSA, 51.3 ng/µl salmon testes DNA (Sigma), and 50 µM zinc acetate. The prepared protein mixture was incubated to stabilize and bind the microarray at 25 °C for 1 h. The microarray was first washed for 2 min with PBS containing 50 µM zinc acetate and 0.5% (v/v) Tween-20 for 10 min, then with PBS containing 50 µM zinc acetate and 0.01% Triton X-100 for 2 min, and finally with PBS containing 50 µM zinc acetate. Fluorescence images were obtained with a microarray scanner (Axon).

Selection of Promoters Containing ATTGATTG motifs
The 1 kb long promoter regions of 29,379 rice genes were retrieved from RAP-DB (http://rapdb.dna.affrc.go.jp/). The genes containing ATTGATTG were selected by using an in-house Perl script. A total of 1631 genes contained the motif in their promoters. To identify cis-elements and TFs that might be associated with OsWOX13, the TFs and their associated cis-elements of Oryza sativa were downloaded from the Plant Transcription Factor Database (Jin et al. 2017). The representative cis-elements were extracted by using the nucleotides with occupancies higher than 0.5 at each position in the letter-probability matrix. The motifs with at least 6 distinctive nucleotides and nonconsecutive Ns were chosen for further analysis. With these criteria, 264 TFs and cis-elements were identified.
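The selection steps above can be sketched in Python as a stand-in for the in-house Perl script, which is not shown. The function names, the A/C/G/T column order of the matrix, the forward-strand-only promoter search, and the reading of "at least 6 distinctive nucleotides" as six non-N positions after trimming flanking Ns are all assumptions of this example.

```python
def genes_with_motif(promoters, motif="ATTGATTG"):
    """Return sorted gene IDs whose promoter contains the motif (forward strand)."""
    return sorted(gene for gene, seq in promoters.items() if motif in seq)

def consensus_from_matrix(matrix, threshold=0.5):
    """Build a consensus from rows of [pA, pC, pG, pT].

    A base is called only if its probability exceeds the threshold;
    otherwise the position is written as N.
    """
    bases = "ACGT"
    letters = []
    for row in matrix:
        best = max(range(4), key=lambda i: row[i])
        letters.append(bases[best] if row[best] > threshold else "N")
    return "".join(letters)

def passes_filter(consensus, min_defined=6):
    """Require enough defined bases and no run of two or more Ns."""
    core = consensus.strip("N")
    defined = sum(1 for base in core if base != "N")
    return defined >= min_defined and "NN" not in core

# Hypothetical letter-probability matrix, not taken from PlantTFDB.
matrix = [
    [0.05, 0.85, 0.05, 0.05],
    [0.90, 0.04, 0.03, 0.03],
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.10],
    [0.30, 0.30, 0.20, 0.20],
    [0.75, 0.05, 0.10, 0.10],
    [0.90, 0.02, 0.04, 0.04],
]
consensus = consensus_from_matrix(matrix)
print(consensus, passes_filter(consensus))  # CAAGNAA True
```

The 0.5 threshold means that at most one base can be called per position, so low-information positions fall back to N rather than to an arbitrary majority base.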

Electrophoretic Mobility Shift Assay (EMSA)
First, 5' FAM-end labeled and unlabeled oligonucleotides were annealed with each complementary sequence. Five micrograms of OsWOX13 and OsSMF1 protein was incubated with 40 fmol of FAM-labeled double-stranded oligonucleotides, 1 µg of poly dI-dC, 1X binding buffer, 2.5% (v/v) glycerol and 0.05% (wt/v) NP-40 in a 20 µl reaction volume for 1 h at room temperature according to the manufacturer's instructions (Pierce). The reaction mixture was then analyzed by electrophoresis in a nondenaturing 6% acrylamide gel with 0.5X TBE buffer. The DNA-protein complexes in the gel were detected as fluorescence signals using Fusion SL (Vilber Lourmat).

Competing interests
The authors declare that they have no competing interests.

Figure 1 Schematic of the rice promoter protein binding microarray. A probe is 40 bp long, of which 20 bp overlaps. For each gene, 49 probes spanned the 1 kb promoter region before the translation start site. A total of 954,520 probes were designed from 19,480 genes among 31,439 genes.

mutations at distinct positions in ATTGATTG. Binding motif of OsWOX13 from the Wilcoxon-Mann-Whitney test, p-value 0. The wild type (WT) has the highest value (37,706), and the intensities of the 9-mer sequences with a point mutation were obtained from the list in Table S1. d) EMSA-based competition analysis of OsWOX13 using the probe Os08g0536300_14, which contains the ATTGATTG motif. The 40 bp sequences used as probes and their competitors are depicted. EMSAs were carried out using the OsWOX13:DsRed protein and a probe 5'-labeled with FAM. Competition for the labeled sequences was tested by adding different concentrations of unlabeled probes.

Figure 4 DNA-binding motif analysis of OsSMF1. a) DNA-binding motifs of OsSMF1 by clustering of the significant binding sequences. It gave at least 4 clusters; each cluster was analyzed, and its position weight matrix was calculated.

Figures
The sequences were visualized with the WebLogo program as shown in Figure 2. b) A cluster containing the GCCACGT motif and the position weight matrix. b) Comparison of the intensities of oligomers with point mutations at distinct positions in GCCACGTCA. Binding motif of OsSMF1 from the Wilcoxon-Mann-Whitney test, p-value 0. c) Mutation analysis using 9-mers. d) EMSA-based competition analysis of OsSMF1. The 40 bp feature probes Os01g0393100_8 and Os05g0415400_39, representing the GCCACGT and TGAGTCA clusters, respectively, were used as probes, and competitors are depicted. EMSAs were carried out using the OsSMF1:DsRed protein and a probe 5'-labeled with FAM. Competition for the labeled sequences was tested by adding different concentrations of unlabeled probes.
