Hybrid allele-specic ChIP-Seq analysis links variation in transcription factor binding to traits in maize


 Variation in transcriptional regulation is a major cause of phenotypic diversity. Genome-wide association studies (GWAS) have shown that most functional variants reside in non-coding regions, where they potentially affect transcription factor (TF) binding and chromatin accessibility to alter gene expression. Pinpointing such regulatory variations, however, remains challenging. Here, we developed a hybrid allele-specific chromatin binding sequencing (HASCh-seq) approach and identified variations in target binding of the brassinosteroid (BR) responsive transcription factor ZmBZR1 in maize. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) in B73xMo17 F1s identified thousands of target genes of ZmBZR1. Allele-specific ZmBZR1 binding (ASB) was observed for about 14.3% of target genes. It correlated with over 550 loci containing sequence variation in BZR1-binding motifs and over 340 loci with haplotype-specific DNA methylation, linking genetic and epigenetic variations to ZmBZR1 occupancy. Comparison with GWAS data linked hundreds of ASB loci to important yield, growth, and disease-related traits. Our study provides a robust method for analyzing genome-wide variations of transcription factor occupancy and identified genetic and epigenetic variations of the BR response transcription network in maize.


Introductory paragraph
Variation in transcriptional regulation is a major cause of phenotypic diversity 1  Comparison with GWAS data linked hundreds of ASB loci to important yield, growth and disease-related traits. Our study provides a robust method for analyzing genome-wide variations of transcription factor occupancy and identified genetic and epigenetic variations of the BR response transcription network in maize.
BR is a major growth-promoting hormone that controls many important agronomic traits, such as plant height and architecture, shoot branching, flowering time, sex determination, seed size, and disease resistance [3][4][5] . Studies in Arabidopsis have shown that BR binds to the receptor kinase BRI1 to activate the BZR1 family of TFs which directly regulate thousands of target genes involved in various cellular and developmental responses 6 .
Genetic evidence indicates that components of the BR signaling pathway, including BZR1, are conserved in monocotyledonous crops such as rice and maize 5,7,8 . Given the severe and broad developmental phenotypes of BR deficient and signaling mutants, we expect that evolutionary variations in the BR transcriptional targets would be a major contributor to phenotypic diversity 5,9 . To characterize the BR-regulated transcriptional network in maize, we performed RNA-seq on shoot tissue of wild type as well as BR biosynthesis deficient brd1 (BR6ox2) seedlings treated with and without brassinolide (BL), the most active BR. We identified a total of 2743 BR-responsive genes (Supplementary Table 1). To identify the genes directly regulated by the BR signaling pathway in vivo, we performed ChIP-seq analysis of the maize BES1/BZR1 homolog (Zm00001d021927), using transgenic plants (B73 backcrossed) expressing a ZmBES1/BZR1-YFP fusion protein driven by the ZmBES1/BZR1 promoter described previously 10 . The nuclear localization of ZmBES1/BZR1-YFP was increased by BR and reduced by RNAi silencing of BRI1 10 , and by the BR biosynthesis inhibitor propiconazole (PPZ 11,12 ) ( Supplementary Fig. 1). Such BR-dependent nuclear localization is similar to that observed for BZR1 in Arabidopsis, rice, and tomato [13][14][15][16] , suggesting that ZmBZR1 plays a conserved role in BR-responsive gene regulation.
Our ChIP-seq experiment identified 17463 high confidence ZmBZR1 binding peaks (Supplementary Table 2), most of which were near the transcription start site (Fig. 1a).
The list of potential target genes included maize homologs of known AtBZR1 targets, such as ZmBR6ox2 and ZmIAA19 (Fig. 1b, Supplementary Fig. 1) 6 , some of which were validated by ChIP-qPCR ( Supplementary Fig. 1). Our analysis of cis-element enrichment identified the BR response element (BRRE, CGTG(C/T)G) and the G-box (CACGTG), which were previously identified as BZR1 binding sites in Arabidopsis 6,17 (Fig. 1c).
Interestingly, co-enrichment of the binding sites for TCP factors (e.g. GG C /ACCA) was also observed, similar to the observation in Arabidopsis 6 (Fig. 1d). These results indicate that the BR response cis-elements and their combination with other factors are conserved in Arabidopsis and maize.
The ZmBZR1 ChIP-seq peaks located near 6371 putative target genes (Supplementary  Supplementary Fig. 1). Among orthologous genes between the two species, we found the highest enrichment over random for BR-responsive BZR1 targets compared to e.g. non-responsive targets or BRresponsive non-target genes (Fig. 1g). The conserved BR-responsive target genes tended to show the same direction of response (Fig. 1h). The results indicate both conservation and evolution among the transcriptional targets of BR signaling.
Although it is poorly understood how non-coding variants affect phenotypes, nearly half of quantitative trait variation in maize is explained by accessible, non-coding regions that may contain regulatory elements, e.g. TF binding sites 2,19 . While molecular approaches such as ChIP-seq can identify TF binding sites, quantitative analyses of the effects of genetic and epigenetic variations on TF binding often suffer from sample-tosample variations and trans-factor variations. To address these challenges, we performed ChIP-seq in F1 hybrids and named the approach HASCh-seq. To understand the functions of BZR1 binding in regulating gene expression and plant traits, we studied the influence of genetic variation between two inbred lines, B73 and Mo17, on BZR1 binding ( Fig. 2a). Single nucleotide polymorphisms (SNPs) that affect, or link to, BZR1 binding will show a shift of the allele frequency from the expected 1:1 ratio in an F1 (Fig. 2b). We observed a strong reproducibility between the biological replicates (3x B73xMo17 and 3x Mo17xB73; Pearson's correlation coefficient > 0.88, Supplementary Fig. 1 2c). The lead SNP of each linkage group (i.e. showing the most significant bias) was considered as putative allele-specific BZR1 binding sites (ASBs). About 19.2% of ASBs were within 1-kb upstream of the transcription start sites of genes. About 56.8% of ASBs were found within 5 kb upstream and 1 kb downstream of 1355 genes, which are considered putative BZR1 ASB target genes (Supplementary Table 9, Fig. 2d). An analysis of the higher affinity allele sequence of ASB regions revealed BZR1-binding motifs (BRRE and G-box) as the most enriched (Fig. 2e). Of the BZR1-binding motifs that overlapped with ASB SNPs (i.e. were altered compared to the canonical motif), a decreased BZR1 binding was observed for 87.5% and 94.3% of variations in BRRE and of G-box motifs, respectively (Fig. 2f). A 2.1-fold higher frequency of variations affecting BZR1 binding was observed for the core "CGTG" sequence than the two flanking bases (Fig. 2g), consistent with results of a previous mutagenesis study showing the importance of this core sequence for BZR1 binding 17 . These results indicate that variations in the binding site sequences affect BZR1 occupancy in vivo.
Variation in canonical BZR1 binding motifs accounted for 554 (16.8%) of all ASBs ( Fig. 2h). It is possible that SNPs outside the canonical cis-elements affect BZR1 occupancy, by directly affecting the DNA binding of BZR1 or its heterodimer partner TFs 18,20,21 . Alternatively, haplotype specific DNA methylation or histone modification may affect TF binding [22][23][24] and contribute to allele-specific BZR1 occupancy. Therefore, we compared our BZR1 ASB data with genome-wide DNA methylation data of B73 and To determine whether variations of BZR1 binding contribute to differential gene expression, we compared our ASB data with the data of allele-specific transcriptome analysis in the B73xMo17 hybrid. About 24.6% of all maize genes showed allele-specific differential expression in B73xMo17 hybrids ( Fig. 3a) 25 , while 38.1% (433 of the 1137) ASB-associated genes showed allele-specific variation in gene expression (1.55-fold enrichment, p=1.1*10 -24 assuming a hypergeometric distribution, Fig. 3b). This suggests that variation of BZR1 binding may contribute to variation of gene expression in the hybrid.
Indeed, RT-qPCR analysis of BR responsive expression of ASB-associated BZR1 target genes, in B73 and Mo17 plants, confirmed that stronger BZR1 binding correlated with higher BR responsiveness of both BR-induced, VP14 26 , and BR repressed, DWF4 27 , target genes in the inbred lines treated with or without PPZ and brassinolide ( Fig. 3c-3h) 11 .
In addition to cis-regulatory elements located in the close vicinity to the transcriptin start site (TSS) or transcription termination site (TTS), transcriptional enhancers can be located tens of kilobases from their target genes 28 . We thus hypothesized that the ASBs located in intergenic regions could be located within distant enhancers and analyzed their abundance around putative intergenic enhancer regions 29 . Of the ASBs distant from the genes, 11% coincided with the 1495 intergenic enhancer regions identified in B73, which is approximately 37-fold higher than random probability (Fig. 4a).
To assess the relationship between variations of BZR1 binding and trait variations, we quantified the enrichment of GWAS hits across 41 traits 2 within 2 kb of ASB regions.
We found a 2.2-5.5 fold enrichment for 16 of the traits, compared to background SNPs with similar genomic distribution and minor allele frequency (bgSNPs, see methods).
While the largest fraction (40%, 1.5 fold enrichment within the dataset) were traits related to growth and yield, other known phenotypic variations between B73 and Mo17 such as tassel branching 30 and disease resistance 31 were also enriched (Fig 4b). To further investigate the role of ASBs in complex organismal trait variation we used variance component annotation (VCAP). We partitioned the heritable phenotypic variance into annotation-specific components classifying ASB and bgSNP regions 19 . We examined the Nested Association Mapping (NAM) population using VCAP of ASBs and found that they explained a remarkable portion of the heritable variance (> 10%) for some of the traits with moderate to high heritability (h2 > 0.4l). For 16 of the 25 traits analyzed, we found that ASBs explained disproportionately larger genetic variances compared to the bgSNPs, including 100 kernel weight, tassel branch number, and northern leaf blight resistance, but not e.g. the ratio of ear height to total plant height (Fig. 4c).
In summary, we present HASCh-seq as a robust method for identifying genetic variants that affect TF binding. By analyzing the TF binding to two different alleles in the same cells, HASCh-seq avoids technical variations that compromises quantitation and trans-factor differences that complicate data interpretation. Our HASCh-seq analysis of ZmBZR1 demonstrates a high level of variations in the BR transcriptional regulatory network between two diverse maize inbreds. Differences of ZmBZR1 occupancy were correlated with variations in its binding motif sequences and DNA methylation status. Our data also provides genetic evidence for functions of thousands of cis-elements in the BR transcription network in maize. The approach complements classical GWAS approaches, as there were significant associations between ASBs and BR-regulated traits 32 . This demonstrates that combining GWAS with HASCh-seq can be a powerful approach to pinpoint candidate targets for genome editing to improve traits.

Acknowledgement
We are grateful to A. Luo and A. W. Sylvester for providing the transformed BES1/BZR1-YFP maize lines. We are grateful to Marty Sachs for helpful communications concerning maize mutants and the Maize Genetics Cooperation Stock Center for providing the brd1 mutant seeds. We thank M. Evans, Y. Lu, D. Sosso and V. Walbot for maize field support.
We thank S. E. Castel for the merge-sam script. We thank G.W. Huntress and M. Lopez for IT support and the Stanford genomic center for sequencing support. We are grateful to H. Cartwright for confocal microscopy. We also thank W. B. Frommer, S. Rhee, R.
Weizbauer, and C. H. Park for their feedback and critical discussion of the paper. This work was supported by a grant from NIH (R01GM066258) to Z. Y. W.

Data availability
The data discussed in this publication has been deposited in the NCBI Gene Expression Omnibus and will be made accessible upon article acceptance.

Plant material and growth conditions
Construction of the ZmBES1/BZR1-YFP transgenic line was previously described 1 and obtained in HiII background from A.W. Silvester. The ZmBZR1-YFP line was further crossed six times with B73, using B73 as mother to avoid contamination. Wild-type and   Table S11.

ChIP-seq data analysis
Quality filtered ChIP-seq reads were aligned to the B73 AGPv4 genome using bwa-mem were determined using R (v. 3.3.2) and considered high confidence peaks.

HASCh-seq data analysis
Mitigating potential biases are critical steps in order to accurately analyze allele specific binding. For HASCh-seq, we first tried to address mapping bias using an enhanced reference approach previously described 6  an average insert size of ~200 bp achieved after sonication. Therefore, SNPs in close proximity with causative polymorphisms will show biased allele frequency due to linkage disequilibrium (LD). To address this, we identified SNPs with significant bias within a 150 bp rolling window of each other and defined the lead SNP of ASBs by the lowest p value, integrating both read depth and allele frequency bias.

Control background SNP (bgSNP) sampling
Functional GWAS variants have been shown to be significantly enriched in gene proximal regions 9 . Therefore, control background SNPs (bgSNPs) were proportionally sampled (excluding ASBs) per chromosome and genomic location (i.e. 5 -1kb upstream, 1kb upstream -TSS, 5'UTR, exon, intron, 3'UTR, TTS -1 kb downstream, intergenic) to match the genomic distribution of the ASBs dataset. Additionally, we checked that ASBs and bgSNPs showed a similar minor allele-frequency ( Supplementary Fig. 3). In total, 168950 bgSNPs were sampled, yielding approximately 50 times as many background SNPs per genome location, compared to the number of ASBs within each location. Sequencing data that support the finding of this study have been deposited in the NCBI SRA database and will be made available upon acceptance of the manuscript.

Fluorescence imaging
Heterozygote BZR1-YFP plants were grown in the dark at RT for 10 days in vermiculite with and without 10 µM PPZ. Prior to imaging, 1 cm root tip segments were removed from the mock and PPZ-treated seedlings. The root tips of PPZ-treated plants were treated for 15 min with 10 µM PPZ with or without 1 µM 24epi-BL.

RNA extraction, RNA sequencing and differential expression analysis
BR deficient brd1 mutant siblings were grown in soil under greenhouse conditions for 26 days as described above. The oldest 2 leaves were removed and 2 cm of meristem enriched tissue (Supplementary Fig. 1) were placed in 1 µM BL for 4 h at room temperature (RT) in water. Total RNA was isolated using acidic phenol extraction as described previously 10  To identify ASBs which may be explained by motif variation (Fig. 2h), we extracted the To identify ASBs which overlapped with significant variation in either CG, CHG, or CHH methylation between B73 and Mo17, we first, per ASB, assigned averaged methylation levels of Mo17 and B73 methylation levels (separately for the CG, CHG, or CHH methylation datasets) within a given window of +/-40 bp around the ASB position.
Differentially methylated alleles were defined as described previously 13 . Accordingly, we identified ASBs as overlapping with differentially methylated regions if in the B73 or Mo17 methylation datasets, the methylation level of one allele would be >= 70% while the level of the corresponding allele was <= 10%.

GWAS enrichment, Kinship matrices and variance components analysis
We tested association of ASBs with the curated 4041 significant GWAS hits for 41 different phenotypes of the NAM population 9 . Per trait we performed an enrichment between the ASBs compared to the control bgSNPs (with the same average genomic distribution and minor allele frequency ( Supplementary Fig. 4)). GWAS hits were counted if they were located within 2 kbps of ABSs or bgSNPs and the subsequent enrichment analysis was based on a hyper-geometric test.
We estimated the variance components explained by different ASB SNP subsets and the remaining SNPs using the maize Nested Association Mapping (NAM) population 16  We then fed these kinship matrices along with the NAM phenotypic data to estimate the variance components explained by the ASB subsets, using a Residual Maximum Likelihood (REML) method implemented in LDAK 19 .