Whole maize B73 leaf tissue was collected at development stage v04. The same batch of plants, at the same stage (v04) and divided into four biological replicates (four plants per replicate) was used for gene expression profiling (RNA-Seq; four replicates), three-dimensional chromatin profiling (Hi-C; two replicates) and accessible chromatin (ATAC-Seq; three replicates).
The ultra-deep sequencing of two Hi-C biological replicates led to a total number of 3,435,596,872 and 3,424,795,714 raw paired reads, for replicates 1 and 2, respectively. Filtering of the raw data and mapping to the B73 AGPv4 reference genome sequence led to the detection of 392,396,275 and 449,828,472 Hi-C contact pairs, after combining inter-chromosomal and intra-chromosomal contacts for replicates 1 and 2, respectively (Table 1).
Table 1
interaction metrics for the chromosomal distribution of Hi-C contact pairs. Intra-chromosomal contacts are further divided into short range (< 20Kbps) and long-range (> 20Kbps) contacts.
| Inter-chromosomal Contacts | Intra-chromosomal Contacts | Intra-chromosomal Short-Range (< 20Kbps) Contacts | Inter-chromosomal Long-Range (> 20Kbps) Contacts |
Replicate 1 | 92,042,539 | 300,353,736 | 215,014,034 | 85,331,655 |
Replicate 2 | 95,674,415 | 354,154,057 | 252,256,720 | 101,886,819 |
Hi-C interaction matrices showed strong interactions between neighboring loci on euchromatic arms, accompanied by cis and trans interactions between centromeric and telomeric regions and cis interactions between chromosomal arms (Fig. 1a). The chromosomes could be further partitioned into A/B compartments by eigenvector analysis of the Hi-C interaction matrices (Fig. 1b). Further analysis showed evidence of inter-nested TAD-like regions, with prevalent cis interactions within those regions. Increased interactions at their borders suggested the existence of chromatin loops (Fig. 1c).
A total of 17,176 and 25,917 chromatin loops were detected for Hi-C replicates 1 (See Additional File 1) and 2 (See Additional File 2), respectively. Distances between anchors within a loop varied from 30Kbps to > 1Mbp (Fig. 1d). Repeat element density analysis between anchors and the regions located between two anchors (“loop interiors”) showed that repeats were more prevalent in loop interiors (Fig. 1e). There were 7,917 loops present in both replicates, with both anchors overlapping by at least 1 bp, and only 1,268 loops in replicate 1 and 2,657 loops in replicate 2 where none of the anchor overlap, indicating that a significant fraction of the loops shared one anchor only. Comparison with a prior set of 24,215 high-resolution chromatin interactions detected with the ChIA-PET method from B73 shoot (see Supplementary Data 16 in [10]) indicated that both anchors from up to 60% of all high-resolution interactions overlapped with anchors derived from Hi-C replicates 1 or 2 (Fig. 2a). Up to 28% of the remaining high-resolution interactions shared one anchor with loop interior regions, while a small number of interior regions overlap fully with both high-resolution anchors. Interestingly, only 23,536 out of the 48,430 anchors forming high-resolution chromatin interactions were deemed as distinct, based on their exact physical coordinates.
Sequence coverage peaks from ATAC-Seq libraries are generally seen as proxy for binding sites and gene regulatory elements in genomic DNA [14]. Sequencing of three ATAC-Seq biological replicates led to the detection of 20,955 to 39,584 open chromatin peaks (see Additional file 3). For each replicate, peaks were classified as located between two nucleosomes (“nucleosome-free” or “NF”) or overlapping one or more nucleosome (“multi-nucleosome” or “MN”), based on the distance between ATAC-Seq paired sequencing reads. While NF peaks tended to be discrete peaks (~ 100bps) located immediately upstream or downstream of genes (proximal peaks), or in intergenic regions (distal peaks), MN peaks tended to be broad peaks primarily covered entire gene regions. A list of 32,009 “consensus” NF and MN peaks was generated, where a peak had to be present in at least two individual replicates to be conserved, out of which only the 19,532 NF peaks were kept for further analysis (See Additional File 4).
Co-location analysis of ATAC-Seq NF peaks with chromatin loops was assessed, based on the following criteria. As many chromatin loop anchors were shared between multiple loops, in a significant number of cases, a peak could align to an anchor for one loop and a loop interior for another loop. Therefore, peak overlaps to loop regions were determined first based on their potential overlap to at least one loop anchor. If no overlap was detected, peaks were assessed based on their potential overlaps to loop interiors, then to genomic regions located outside of chromatin loops. Using this approach, up to 13,026 peaks, out of 19,532, overlapped primarily with anchors, while up to 4,649 peaks overlapped primarily with loop interiors (Fig. 2b). On the other hand, only 33% of all anchors overlapped with open chromatin peaks, suggesting technical constraints that limited the total number of peaks detected here, but also the possibility for distinct functions, or lack thereof, for some anchors not overlapping with peaks.
A similar outcome was observed for expressed gene. The total number of genes in B73 was estimated to be 38,847, out of which 18,700 were defined as “expressed” (see Methods) in B73 leaf whole tissue (see Additional file 5). Out of these, up to 13,918 (74.4%) primarily overlapped with chromatin loop anchors (Fig. 2b). When adding expressed genes overlapping with loop interiors, up to 91.1% of all expressed genes in leaf overlapped with chromatin loops. Among the remaining 20,147 un-expressed genes, 8,162 overlapped primarily with loop anchors (from replicate 1) while another 7,672 overlapped with loop interiors, suggesting that silenced genes also could be regulated through loop formation. No major difference in expression levels was observed between genes overlapping with loop anchors, genes overlapping with loop interiors and genes located outside of loops (Fig. 2c; replicate 1 only).
The 23,536 distinct anchors from high-resolution chromatin interactions were mapped to determine whether peaks and expressed genes overlapping with Hi-C loop interiors (Fig. 2d) also could overlap with high-resolution loop anchors. Up to 92% of distinct high-resolution anchors were contained within Hi-C chromatin loops, including up to 4,890 overlapping with Hi-C loop interiors (Fig. 2d). Among those, 49% overlapped with at least one expressed gene or an open chromatin peak (Fig. 2e). Conversely, 42.7% of expressed genes and 29.9% of open chromatin peaks present in replicate 2 loop interiors (Fig. 2b) also overlap with high-resolution loop anchors.
To further explore potential relationships between gene expression and loop formation, gene content for anchors overlapping with at least one NF peak was assessed. In that analysis, anchors were annotated based on the activity of overlapping genes located next to NF peaks. The data show that 62% of all NF peaks overlapping with chromatin loop anchors either overlapped or were located < 2Kbps away from expressed genes, suggesting those peaks may correspond to proximal regulatory regions, including promoters. Conversely, analysis of gene activity in loop anchors (including genes present in multiple anchors) showed that, for 71% of anchors harboring genes, at least one of those genes was expressed (See Additional File 6). Expression was associated with the absence of inactive genes within the anchor for 88% of those anchors, and this trend was generally accentuated by the presence of proximal open chromatin peaks. 74% of the remaining anchors carried distal peaks, possibly representing long-range regulatory elements.
Anchor annotation using replicate 1 as an example (See Additional File 7) showed that, of all 18,296 anchors with at least one expressed gene, 15,058, or 82%, had one expressed gene only. A total of 3,684 anchors contained at least one peak and no expressed genes (including 2,285 containing one peak only). Out of those, 2,897 were part of a loop where the opposite anchor contained at least one expressed gene and at least one peak, suggesting functional interactions between proximal and distal elements facilitated by loop formation. In addition, there was a total of 5,488 loops containing at least one expressed gene in both anchors, including 1,058 harboring at least one peak in each anchor, indicating potential interactions between proximal elements. Interestingly, out of the 11,066 anchors from replicate 1 with no known peak or expressed gene, only 2,368 were combined together into one chromatin loop, suggesting that only 6.8% (1,184/17,176) of all long-range chromatin loops from replicate 1 are potentially devoid of any gene expression activity or open chromatin regions.
Finally, 50,929 eQTLs associated to over 18,000 maize genes, derived from genotyping-by-sequencing, high density arrays and RNA-Seq data (see Supplemental Table 6 in [13]) were aligned to anchors. For replicate 2, 17,020 eQTLs had the lead SNPs and expression traits located on the same anchor while 2,632 had them located on separate anchors from the same loop (for replicate 1, those numbers were 10,938 and 1,829, respectively). Co-location occurred on 8,745 distinct replicate 2 anchors and 8,907 distinct replicate 1 anchors. Out of the 43,398 unique SNPs derived from the eQTL datasets, 25,162 and 27,252 overlapped with replicate 1 and replicate 2 anchors, respectively. Interestingly, 29,248 eQTL SNPs also overlapped with the high-resolution chromatin interactions described above [10].