Whole maize B73 leaf tissue was collected at development stage v04. The same batch of plants, at the same stage (v04) and divided into four biological replicates (four plants per replicate) was used for gene expression profiling (RNA-Seq; four replicates), three-dimensional chromatin profiling (Hi-C; two replicates) and accessible chromatin (ATAC-Seq; three replicates).
The ultra-deep sequencing of two Hi-C biological replicates led to a total number of 3,435,596,872 and 3,424,795,714 raw paired reads, for replicates 1 and 2, respectively. Filtering of the raw data and mapping to the B73 AGPv4 reference genome sequence led to the detection of 392,396,275 and 449,828,472 Hi-C contact pairs, after combining inter-chromosomal and intra-chromosomal contacts for replicates 1 and 2, respectively (Table 1).
|
Inter-chromosomal Contacts
|
Intra-chromosomal Contacts
|
Intra-chromosomal Short-Range (<20Kbps) Contacts
|
Inter-chromosomal Long-Range (>20Kbps) Contacts
|
Replicate 1
|
92,042,539
|
300,353,736
|
215,014,034
|
85,331,655
|
Replicate 2
|
95,674,415
|
354,154,057
|
252,256,720
|
101,886,819
|
Table 1: interaction metrics for the chromosomal distribution of Hi-C contact pairs. Intra-chromosomal contacts are further divided into short range (<20Kbps) and long-range (>20Kbps) contacts.
Hi-C interaction matrices showed strong interactions between neighboring loci on euchromatic arms, accompanied by cis and trans interactions between centromeric and telomeric regions and cis interactions between chromosomal arms (Fig. 1a). Further analysis showed evidence of hierarchical TAD-like domains [14], covering most of the maize genome in a nested fashion (Table 2) (Fig. 1b). A total of 17,978 and 18,739 domains were detected for Hi-C replicates 1 (See Additional File 1) and 2 (See Additional File 2), respectively. Here, large “level 0” domains contain series of nested “sub-domains” (level 1 to 6) representing approximately 60% of all detected domains (Table 2). Previous studies have shown that the rice genome contained extensive hierarchical chromatin interactions [8]. Others have shown that TAD-like domains in larger plant genomes such as maize are essentially compartment domains [9], where transcriptionally active domains are separated by large inactive heterochromatic domains. The results shown here suggest a model where “level 0” TAD-like compartment domains may contain multiple nested layers of chromatin interactions, marked by the presence of smaller nested “sub-domains”.
TAD level
|
Replicate 1
100% read count
|
Replicate 1
50% read count
|
Replicate 2
100% read count
|
Replicate 2
50% read count
|
0
|
7,131
|
7,989
|
7,257
|
8,028
|
1
|
6,621
|
4,982
|
6,997
|
6,231
|
2
|
3,206
|
2,084
|
3,384
|
2,717
|
3
|
835
|
428
|
911
|
603
|
4
|
157
|
57
|
165
|
83
|
5
|
24
|
10
|
25
|
10
|
6
|
4
|
0
|
0
|
0
|
Table 2: Hierarchical TAD-like domain counts for replicates 1 and 2. Hierarchical domains (level 0 to 6) were detected in replicates 1 and 2 from Hi-C contact maps generated using 100% and 50% total sequencing read counts. Domains (level 0) were further divided into nested “sub-domains” (levels 1 to 6).
To determine whether ultra-deep sequencing of each Hi-C biological replicate impacted hierarchical TAD-like domain detection and resolution, random Hi-C sequencing read datasets representing 50% of all total sequencing reads were generated to create new Hi-C interaction matrices. The new matrices then were used to compute new domains for each Hi-C replicate, which were subsequently compared to the ones computed with full read counts (Table 2). While the total number of domains did not significantly improve with higher initial read counts, it led nonetheless to the dissection of some domains into additional sub-domains.
In addition to TAD-like domains, the chromosomes could be partitioned into larger chromosomal A/B compartments by eigenvector analysis of the Hi-C interaction matrices. In maize, those compartments would divide each chromosome into two global A compartments at the chromosomal ends, flanking a global B compartment, corresponding roughly to its pericentromeric heterochromatic regions [9]. Eigenvector analysis of chromosome 1 at 500Kb resolution, using Hi-C interaction matrices generated from the “50% read count” and “100% read count” datasets shown above, led to the detection of global compartments similar in structure to what had been already published [9] and between each other, therefore indicating that, contrary to domain detection, increasing read counts did not significantly improve compartment detection.
Compartments, domains and sub-domains are prominent organizational features in the maize genome. Increased interactions at their borders suggested the existence of chromatin loops (Fig. 1c). A total of 17,176 and 25,917 chromatin loops were initially detected for Hi-C replicates 1 (See Additional File 3) and 2 (See Additional File 4), respectively. To confirm their validity, ICCUPS analyses were run at various resolutions from interaction matrices generated with distinct read counts (Table 3). Expectedly, loop counts varied with both read counts and resolution of detection. Based on those results, it was determined that the initial datasets for both replicates provided an appropriate number of high-resolution loops for subsequent analysis.
Loop detection resolution
|
Replicate 2
100% read count
|
Replicate 2
50% read count
|
5Kbps
|
16,568
|
9,118
|
10Kbps
|
25,917
|
16,950
|
25Kbps
|
18,397
|
13,697
|
50Kbps
|
35,832
|
24,358
|
Table 3: chromatin loop counts at various resolutions and read counts. Chromatin loops were detected with the HICCUPS software tool, from Hi-C interaction matrices generated using 100% and 50% total sequencing read counts.
Distances between anchors within a loop varied from 30Kbps to >1Mbp (Fig. 1d). Interestingly, repeat element density analysis, between anchors and regions located between two anchors, labeled as “loop interiors”, showed that repeats were more prevalent in loop interiors (Fig. 1e). There were 7,917 loops present in both replicates, with both anchors overlapping by at least 1bp, and only 1,268 loops in replicate 1 and 2,657 loops in replicate 2 where none of the anchor overlap, indicating that a significant fraction of the loops from both replicates shared one anchor only.
Additional comparisons were made with prior sets of high-resolution chromatin interactions detected with the HiChIP [11] and ChIA-PET methodologies in B73 (10). The HiChIP data were generated using antibodies targeting histone modifications associated with transcriptional activations (H3K4me3) and repression (H3K27me3) while the ChIA-PET data were generated using antibodies targeting histone modifications associated with transcriptional activation only (H3K4me3 and H3K27ac) and filtered into a final chromatin interaction dataset (see Supplementary Data 16 in [10]). Those high-resolution datasets were expected to capture local interactions (such as interactions between regulatory elements), typically not detectable via Hi-C sequencing. Co-location analysis of replicate 2 loop anchor regions with high-resolution interaction loop anchors (Table 4) showed that, while ~9 to 18% of high-resolution interaction datasets fully overlapped with Hi-C loops (“2 co-located anchors”), a majority (~68 to 80%) of all remaining high-resolution loops shared one anchor with replicate 2 Hi-C loops.
|
2 co-located anchors
|
1 co-located anchor
|
0 co-located anchor
|
Total loop counts
|
ChIA-PET
(chromatin interactions)
|
4,511
|
15,890
|
3,817
|
24,218
|
HiChIP
(H3K27me3)
|
3,612
|
24,909
|
11,297
|
39,818
|
HiChIP
(H3K4me3)
|
10,021
|
45,011
|
11,980
|
67,012
|
Table 4: co-location of loop anchors with high-resolution conformation capture datasets. High-resolution ChIA-Pet and HiChiP interaction coordinates were compared to replicate 2 Hi-C loop anchor coordinates. Co-location was determined with anchors exhibiting at least 50% overlap between the two data types. Antibodies targeting histone modifications in HiChIP are shown. High-resolution loops were counted once only, after prioritization, in the following order, for exhibiting 2, 1 or 0 co-located anchors.
Further analysis comparing replicates 1 and 2 Hi-C datasets with high-resolution ChIA-PET maize data indicated that both anchors from up to 60% of all high-resolution interactions overlapped with anchors derived from Hi-C replicates 1 or 2 (Fig. 2a). Up to 28% of the remaining high-resolution interactions shared one anchor with loop interior regions, while a small number of interior regions overlap fully with both high-resolution anchors. Interestingly, only 23,536 out of the 48,430 anchors forming high-resolution chromatin interactions were deemed as distinct, based on their exact physical coordinates.
A similar analysis was performed with a prior low-resolution Hi-C dataset [9], where 96% of all low-resolution Hi-C loops (5,393 out of 5,616) shared either one or two anchors with the replicate 2 Hi-C loop dataset. Taken together, these results suggested that deep sequencing of Hi-C libraries captured specific regions of the maize genome that were involved in transcriptional regulation. In addition, the repeated detection of loops sharing one anchor between multiple independent datasets suggested a regulatory environment where a conserved set of genomics regions was involved in complex interaction networks regulating multiple genes through the formation of distinct loops.
Another co-location analysis was performed where Hi-C loop anchor coordinates were compared for each replicate to their respective hierarchical TAD-like domain boundary coordinates. In here, loop spans (excluding loops detected in AGPv4 Chr0), rather than individual loop anchors, were analyzed and counted, depending on whether loops were fully embedded within a hierarchical TAD-like domain, or whether one or both loop anchors were located within 2Kbps flanking domain boundaries. Results are shown in Table 5 (replicate 1) and Table 6 (replicate 2).
|
Embedded loops
|
1 co-located anchor
|
2 co-located anchors
|
Level 0
|
4,272
|
1,214
|
85
|
Level 1
|
2,484
|
1,024
|
103
|
Level 2
|
852
|
366
|
67
|
Level 3
|
185
|
76
|
7
|
Level 4
|
18
|
13
|
0
|
Level 5
|
2
|
3
|
0
|
Level 6
|
2
|
1
|
0
|
Table 5: replicate 1 loop overlap with hierarchical TAD-like domains. Out of a total of 16,863 loops (excluding Chr0 loops), 10,774 overlap with replicate 1 domains, including 7,815 fully embedded within domains (“Embedded loops”). Hierarchical TAD-like domains (levels 0 to 6) are shown on the left column. “Embedded loops”: Hi-C loops are fully included within a domain; “1 co-located anchor”: one Hi-C loop anchor overlaps with one of the two domain boundaries only, while the other anchor is located inside the domain; “2 co-located anchors”: both Hi-C loop anchors overlap with boundaries from the same domain. Loops overlapping with multiple Level 0 domains or not overlapping with any domains are not shown.
|
Embedded loops
|
1 co-located anchor
|
2 co-located anchors
|
Level 0
|
5,360
|
2,484
|
308
|
Level 1
|
3,117
|
2,104
|
371
|
Level 2
|
1,109
|
769
|
142
|
Level 3
|
213
|
164
|
30
|
Level 4
|
35
|
19
|
4
|
Level 5
|
3
|
4
|
0
|
Table 6: replicate 2 loop overlap with hierarchical TAD-like domains. Out of a total of 25,628 loops (excluding Chr0 loops), 16,236 overlap with replicate 2 domains. Definitions are similar to the ones described on Table 5.
Approximately 37% of all loops detected from Hi-C replicates 1 and 2 either co-located with multiple Level 0 domains (i.e., each anchor was located in a distinct domain) or did not overlap with any domain. At least 60% of the remaining loops as shown on Tables 5 and 6 were fully embedded within a single domain.
Sequence coverage peaks from ATAC-Seq libraries are generally seen as a proxy for transcription factor binding sites and gene regulatory elements in genomic DNA [15]. Sequencing of three ATAC-Seq biological replicates led to the detection of 20,955 to 39,584 open chromatin peaks (see Additional file 5). For each replicate, peaks were classified as located between two nucleosomes (“nucleosome-free” or “NF”) or overlapping one or more nucleosome (“multi-nucleosome” or “MN”), based on the distance between ATAC-Seq paired sequencing reads. While NF peaks tended to be discrete peaks (~100bps) located immediately upstream or downstream of genes (proximal peaks), or in intergenic regions (distal peaks), MN peaks tended to be broad peaks primarily centered over entire gene regions. A list of 32,009 “consensus” NF and MN peaks was generated, where a peak had to be present in at least two individual replicates to be conserved, out of which only the 19,532 NF peaks were kept for further analysis (See Additional File 6).
Co-location analysis of ATAC-Seq NF peaks with chromatin loops initially was performed using whole replicate 1 and 2 Hi-C loop datasets and assessed, based on the following criteria. As many chromatin loop anchors were shared between multiple loops, in a significant number of cases, a peak could align to an anchor for one loop and a loop interior for another loop. Therefore, peak overlaps to loop regions were determined first based on their potential overlap to at least one loop anchor. If no overlap was detected, peaks were assessed based on their potential overlaps to loop interiors, then to genomic regions located outside of chromatin loops. Using this approach, up to 13,026 peaks, out of 19,532, overlapped primarily with anchors, while up to 4,649 peaks overlapped primarily with loop interiors (Fig. 2b). On the other hand, only 33% of all anchors overlapped with open chromatin peaks, suggesting technical constraints that limited the total number of peaks detected here, but also the possibility for distinct functions, or lack thereof, for some anchors not overlapping with peaks.
A similar outcome was observed for expressed genes. The total number of genes in B73 was estimated to be 38,847, out of which 18,700 were defined as “expressed” (see Methods) in B73 leaf whole tissue (see Additional file 7). Of these, up to 13,918 (74.4%) primarily overlapped with chromatin loop anchors (Fig. 2b). When adding expressed genes overlapping with loop interiors, up to 91.1% of all expressed genes in leaf overlapped with chromatin loops. Among the remaining 20,147 un-expressed genes, 8,162 overlapped primarily with loop anchors (from replicate 1) while another 7,672 overlapped with loop interiors, suggesting that silenced genes also could be regulated through loop formation. No major differences in expression levels were observed between genes overlapping with loop anchors, genes overlapping with loop interiors and genes located outside of loops (Fig. 2c; replicate 1 only).
The 23,536 distinct anchors from high-resolution chromatin interactions were mapped to determine whether peaks and expressed genes overlapping with Hi-C loop interiors (Fig. 2d) also could overlap with high-resolution loop anchors. Up to 92% of distinct high-resolution anchors were contained within Hi-C chromatin loops, including up to 4,890 overlapping with Hi-C loop interiors (Fig. 2d). Among those, 49% overlapped with at least one expressed gene or an open chromatin peak (Fig. 2e). Conversely, 42.7% of expressed genes and 29.9% of open chromatin peaks present in replicate 2 loop interiors (Fig. 2b) also overlap with high-resolution loop anchors.
A total of 50,929 eQTLs associated to over 18,000 maize genes, derived from genotyping-by-sequencing, high density arrays and RNA-Seq data (see Supplemental Table 6 in [13]) were aligned to anchors. For replicate 2, 17,020 eQTLs had the lead SNPs and expression traits located on the same anchor while 2,632 had them located on separate anchors from the same loop (for replicate 1, those numbers were 10,938 and 1,829, respectively). Co-location occurred on 8,745 distinct replicate 2 anchors and 8,907 distinct replicate 1 anchors. Out of the 43,398 unique SNPs derived from the eQTL dataset, 25,162 and 27,252 overlapped with replicate 1 and replicate 2 anchors, respectively. Interestingly, 29,248 eQTL SNPs also overlapped with the high-resolution chromatin interactions described above [10].
To assess whether ultra-deep sequencing of Hi-C libraries captured chromatin loops carrying distinct functions, loops mapping to hierarchical TAD-like domains (as described in Tables 5 and 6) were further analyzed by determining the overlaps of their anchor regions (at least 1bp overlap) with the same ATAC-Seq NF peaks, expressed genes and eQTL expression trait datasets used above (Fig. 3). Anchors from loops mapping to distinct domains or overlapping with one or two boundaries (at least 2Kbps overlap) within the same domain intersected more frequently with ATAC-Seq NF peaks (Fig. 3a), expressed genes (Fig. 3b) and eQTL expression traits (Fig. 3c), than anchors from fully embedded loops. The different frequencies (shown here in percentage points) were consistent between Hi-C replicates 1 and 2 (Fig. 3) and could reflect domain types with different organizational features or associated with distinctive biological functions [9]. No apparent differences were observed when plotting expression levels for genes overlapping with replicate 2 loop anchors in regard to the domain types they mapped to (Fig. 3d), suggesting that expression patterns and regulation of genes mapping to chromatin loops within domains did not necessarily differ from the ones located in loops overlapping domains or mapping to domain boundaries. Further studies, focusing for example on specific epigenetic marks or patterns in genes of interest, might be required to better understand potential variations in the mechanisms establishing their expression and regulation.
To further explore potential relationships between gene expression and loop formation, gene content of anchors overlapping with at least one NF peak was assessed. In this analysis, anchors were annotated based on the activity of overlapping genes located next to NF peaks. The data show that 62% of all NF peaks overlapping with chromatin loop anchors either overlapped or were located <2Kbps away from expressed genes, suggesting those peaks may correspond to proximal regulatory regions, including promoters. Conversely, analysis of gene activity in loop anchors (including genes present in multiple anchors) showed that, for 71% of anchors harboring genes, at least one of those genes was expressed (See Additional File 8). Expression was associated with the absence of inactive genes within the anchor for 88% of those anchors, and this trend was generally accentuated by the presence of proximal open chromatin peaks. 74% of the remaining anchors carried distal peaks, possibly representing long-range regulatory elements.
Anchor annotation using replicate 1 as an example (See Additional File 9) showed that, of all 18,296 anchors with at least one expressed gene, 15,058, or 82%, had only one expressed gene. A total of 3,684 anchors contained at least one peak and no expressed genes (including 2,285 containing one peak only). Out of those, 2,897 were part of a loop where the opposite anchor contained at least one expressed gene and at least one peak, suggesting functional interactions between proximal and distal elements facilitated by loop formation. In addition, there were a total of 5,488 loops containing at least one expressed gene in both anchors, including 1,058 harboring at least one peak in each anchor, indicating potential interactions between proximal elements. Interestingly, out of the 11,066 anchors from replicate 1 with no known peak or expressed gene, only 2,368 were combined together into one chromatin loop, suggesting that only 6.8% (1,184/17,176) of all long-range chromatin loops from replicate 1 are potentially devoid of detected gene expression activity or open chromatin regions.