A first-generation genome-wide map of correlated DNA methylation demonstrates highly coordinated and tissue-independent clustering across regulatory regions

Genome-wide DNA methylation studies have typically focused on quantitative assessments of CpG methylation at individual loci. Although methylation states at nearby CpG sites are known to be highly correlated, suggestive of an underlying coordinated regulatory network, the extent and consistency of inter-CpG methylation correlation across the genome, including variation between individuals, disease states, and tissues, remains unknown. Here, we leverage image conversion of correlation matrices to identify correlated methylation units (CMUs) across the genome, describe their variation across tissues, and annotate their regulatory potential using 35 public Illumina BeadChip datasets spanning more than 12,000 individuals and 26 different tissues. We identified a median of 18,125 CMUs genome-wide, occurring on all chromosomes and spanning a median of ~1 kb. Notably, 50% of CMUs had evidence of long-range correlation with other proximal CMUs. Although the size and number of CMUs varied across datasets, we observed strong intra-tissue consistency among CMUs, with those in testis encompassing those seen in most other tissues. Approximately 20% of CMUs were highly conserved across normal tissues (i.e. tissue independent), with 73 loci demonstrating strong correlation with non-adjacent CMUs on the same chromosome. These loci were enriched for CTCF and transcription factor binding sites, always found within putative TADs, and associated with the B compartment of chromosome folding. Finally, we observed significantly different, but highly consistent, patterns of CMU correlation between diseased and non-diseased states. Our first-generation, genome-wide, DNA methylation map suggests a highly coordinated CMU regulatory network that is sensitive to disruptions in its architecture.

Methylation of DNA CpG dinucleotides is an important regulator of gene transcription (Attwood,Yung

Data Normalization and Filtering
Our analysis pipeline was designed to work from pre-processed beta values with, at a minimum, standard quality control of samples and probes (e.g. removal based on detection p values and data failures), background correction, type 1 and type 2 control of color bias, and confounder adjustments for batch, plate, chip, etc. as outlined in (Wilhelm-Benartzi, Koestler et al. 2013). For the datasets used here, we proceeded from the primary data processing and normalization as described in the accompanying manuscripts: TCGA; Blood (Lehne, Drong et al. 2015; Konigsberg, Barnes et al. 2021); and SAM (Schulze, Swaminathan et al. 2019). To mitigate the undue influence of outliers in the datasets, we converted probe-wise sample distributions to normal distributions using qqnorm in R. The analysis pipeline is also designed to utilize post-hoc filtered CpG sites from raw data. In this proof-of-principle study, we limited our analyses to autosomal regions to obviate the added complexity of methylation patterns on the sex chromosomes.
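
As an illustration of the probe-wise conversion to normal scores, the following is a minimal Python sketch of a rank-based transform in the spirit of qqnorm. It is not the pipeline's own code: the function name `normal_scores` and the toy beta values are hypothetical, and the plotting positions (i - a)/(n + 1 - 2a), with a = 3/8 for n <= 10 and 1/2 otherwise, are assumed to match the R default.

```python
from statistics import NormalDist


def normal_scores(values):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(values)
    a = 3.0 / 8.0 if n <= 10 else 0.5              # assumed plotting-position offset
    # Stable sort: tied values keep their input order (an assumption of this sketch).
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    std_normal = NormalDist()
    for rank, idx in enumerate(order, start=1):
        p = (rank - a) / (n + 1 - 2 * a)
        scores[idx] = std_normal.inv_cdf(p)
    return scores


# One probe measured in five samples; the fourth sample is an outlier.
betas = [0.10, 0.12, 0.11, 0.95, 0.13]
z = normal_scores(betas)
```

The outlying beta value keeps its rank (it remains the largest score) but no longer dominates the scale, which is the property that matters for the downstream Pearson correlations.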

Image Clustering Method (ICM): Identifying Correlated Methylation Units (CMUs)
We divided quality-controlled methylation data into non-overlapping windows, arbitrarily set to a genomic distance of 250 kb. Pearson correlations of the methylation profiles of all possible pairs of probes within each window were then calculated (Fig. 1), resulting in a correlation matrix for each window. The size of a window's correlation matrix (symmetric and square) depends on the number of CpG probes encompassed, which varied from window to window. Each 250 kb-window correlation matrix was treated as an image, with each pixel representing the strength of correlation bounded by the extremes of negative (-1) and positive (+1) correlation. The image was then smoothed using box filtering with unequal weights (Fig. 1, Supp. Fig. 2). To reduce the noise in the system, values below a minimal cutoff were set to 0. When working with a given (individual) cohort, we found a correlation cutoff of 0.6 to provide a sufficient balance between noise and signal; however, this cutoff can be adjusted depending on sample size and other factors, such as known heterogeneity of cell types in the tissue of interest or the goal of the study; e.g. higher cutoffs can be used to identify more tightly correlated methylation units that could be interpreted as sub-CMUs (Supp. Fig. 3). For example, we used a cutoff of 0.4 for studying CMUs across tissues in order to provide larger regions that could be evaluated across tissues.

After setting a cutoff threshold, we computed image gradients to identify edges in the image. Rectangular edges then led to the identification of contiguous (genome-proximal) and non-contiguous CMUs (interspersed with non-correlated sites). Two or more contiguous CMUs with strong positive or negative correlation with each other, but interspersed by CpG sites without strong methylation correlation (unsynced units), are referred to as non-contiguous CMUs (Fig. 1). In this classification, our underlying assumption is that CpG sites within a CMU that are not assayed by the array (Illumina 450k or EPIC array) are also in sync with the interrogated CpG sites. In preliminary studies, we found this to generally hold true. Deviations from this assumption should only affect the size classification of the underlying CMUs (i.e. create smaller units), but should not substantively impact the functional interpretations of CMUs presented in this paper.
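
The smooth, threshold and segment steps can be sketched in Python as follows. This is a simplified illustration, not the ICM implementation: it starts from an already computed window correlation matrix, the 3x3 filter weights (`center_weight`) are assumed, the cutoff is applied to absolute correlation (an assumption), and a scan along the first off-diagonal stands in for the gradient-based edge detection. All function names are hypothetical.

```python
def box_smooth(img, center_weight=2.0):
    """3x3 box filter with a heavier centre pixel; border pixels use the neighbours that exist."""
    p = len(img)
    out = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            total = weight = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < p and 0 <= jj < p:
                        w = center_weight if di == 0 and dj == 0 else 1.0
                        total += w * img[ii][jj]
                        weight += w
            out[i][j] = total / weight
    return out


def apply_cutoff(img, cutoff=0.6):
    """Set pixels whose absolute value is below the cutoff to 0."""
    return [[v if abs(v) >= cutoff else 0.0 for v in row] for row in img]


def contiguous_blocks(img, min_probes=4):
    """Return (first, last) probe indices of runs in which neighbouring probes stay linked."""
    p = len(img)
    blocks, start = [], 0
    for k in range(p):
        is_last = k == p - 1
        if is_last or img[k][k + 1] == 0.0:        # edge: link to the next probe is gone
            if k - start + 1 >= min_probes:
                blocks.append((start, k))
            start = k + 1
    return blocks


# Toy 8-probe window: probes 0-4 are mutually correlated (r = 0.9), the rest are noise (r = 0.1).
p = 8
corr = [[1.0 if i == j else (0.9 if i < 5 and j < 5 else 0.1) for j in range(p)]
        for i in range(p)]
cmus = contiguous_blocks(apply_cutoff(box_smooth(corr)))
print(cmus)  # [(0, 4)]
```

Lowering `cutoff` to 0.4 in this sketch mimics the cross-tissue setting described above, in which larger units are preferred.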

Tissue Independent contiguous and non-contiguous CMUs
Each TCGA normal group was independently processed using ICM to obtain group-specific contiguous and non-contiguous CMUs. Genomic regions constituting contiguous CMUs in more than 80% of unique tissues (when more than one dataset was available for a tissue, we chose the one with the largest sample size) were identified as conserved contiguous CMU genomic regions (Supp. Fig. 4). For the purpose of studying conserved non-contiguous CMUs across tissues, we limited our search to non-contiguous CMUs that consisted of 3 or more contiguous CMUs and defined the genomic region under consideration as occurring between the first bp position of the first CMU (in genomic linear order) and the end bp position of the last CMU. As above, genomic regions constituting non-contiguous CMUs in more than 80% of the tissues were identified as conserved non-contiguous CMU genomic regions.

To compare genomic regions of two ICM-defined sets of CMUs, say S1 and S2 (arising from different cohorts, tissues, parameters, etc.), we define an asymmetric similarity score that can also be viewed as a Tversky index (Tversky 1977) with parameters α=1 and β=0.
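
With α=1 and β=0, the Tversky index |X ∩ Y| / (|X ∩ Y| + α|X - Y| + β|Y - X|) reduces to |X ∩ Y| / |X|, i.e. the fraction of the first set that is also present in the second. The sketch below illustrates this in Python under the assumption that set size is measured in base pairs covered on a single chromosome; the unit of measurement, the half-open interval convention, and the function names are assumptions made for illustration.

```python
def merge(intervals):
    """Merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Number of base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bp(a, b):
    """Number of base pairs covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:                      # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return total


def asy(s1, s2):
    """Tversky index with alpha=1, beta=0: share of s1 that is also covered by s2."""
    return overlap_bp(s1, s2) / covered_bp(s1)


s1 = [(100, 200), (300, 400)]
s2 = [(150, 350), (380, 1000)]
print(asy(s1, s2))   # 120 of the 200 bp in s1 are shared
print(asy(s2, s1))   # 120 of the 820 bp in s2 are shared
```

The two calls return different values for the same pair of sets, which is the asymmetry the score is designed to capture.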

Annotation of Regulatory Features
We utilized the Ensembl regulatory feature database, derived from the Regulatory Build process as described in (Zerbino, Wilder et al. 2015). Overlap between a regulatory feature region and a CMU region was annotated using bedtools. Ensembl was used to determine regulatory features: Promoter, […] biased in their placement (Bibikova, Barnes et al. 2011), which could bias feature enrichment analysis; therefore, we generated random sets of CMUs (background control CMUs) to mirror the observed/actual CMUs. To determine whether specific regulatory features were enriched in a defined CMU, we used two tests: a CpG probe-based test and a region-based test (Supp. Fig. S13). For region-based tests we used a comparison of 'background' random regions chosen without regard to their inter-CpG correlation. These were identified by first obtaining the number of CMUs (n) found by the ICM algorithm, then removing windows with insufficient CpG sites (<4 [lenient] or <10 [stringent] sites) in the ICM analysis. We then chose n windows randomly out of the remaining (450K-contained) windows. Within each of these random windows, we then picked a region matching the genomic size of the CMU under test to act as a background control for each observed CMU. These random background CMUs were annotated as described above. The number of times a feature was annotated by the test CMUs was then made relative to annotations of the random background CMUs. As an example, if "CTCF region" was annotated five times, it receives a score of five (5).
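
The selection of size-matched background regions can be sketched in Python as below. This is an illustration under stated assumptions rather than the procedure actually used: windows are drawn with replacement (the number of CMUs can exceed the number of windows), the region is placed uniformly at random inside its window, and the function name and input layout are hypothetical.

```python
import random


def background_regions(cmus, windows, min_sites=4, seed=0):
    """Draw one size-matched random region per observed CMU.

    cmus:    list of (start, end) for the observed CMUs
    windows: list of (start, end, n_cpg_sites) for the analysis windows
    """
    rng = random.Random(seed)
    # Drop windows with too few CpG sites (4 = lenient, 10 = stringent).
    eligible = [w for w in windows if w[2] >= min_sites]
    chosen = rng.choices(eligible, k=len(cmus))    # with replacement (assumption)
    regions = []
    for (c_start, c_end), (w_start, w_end, _) in zip(cmus, chosen):
        size = min(c_end - c_start, w_end - w_start)
        start = rng.randint(w_start, w_end - size)
        regions.append((start, start + size))
    return regions


windows = [(0, 250000, 37), (250000, 500000, 2), (500000, 750000, 12)]
cmus = [(1000, 1860), (600000, 603423)]
bg = background_regions(cmus, windows)
```

Each background region can then be annotated in the same way as an observed CMU, so that feature counts in the two sets are directly comparable.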
[…] The chromatin features, we wanted to also contextualize the overlap between Illumina CpG sites and chromatin features; therefore, for a given set of CMUs and a chromatin feature we calculate the following parameters:

Tissue specific CMU overlap comparison for A/B compartment
We computed the above values for each pair of tissue-specific non-contiguous CMUs and tissue-specific A/B compartments and combined these to provide a tissue pair score:

The same computation was repeated for annotations of the B compartment. If CMUs are found in A and B compartments with equal frequency, then this score will be similar across both compartments.

Differential CMUs
To determine the potential for differential CMUs between any two comparison groups, we first separated each cohort into cases and controls. This approach requires that the samples are collected from a homogeneous tissue and/or that tissue/cell proportions are known or can be robustly estimated. Normalized methylation data (Y) was regressed on confounding factors (X) (e.g. age, gender, cell/tissue proportion for non-homogeneous data, etc.); Y=aX+R. The residuals (R) of this regression were then used to compute pairwise correlations, also known as partial correlations. ICM was then used for each group separately to identify CMUs as described above. For each CMU defined within a given group, a corresponding paired correlation matrix was compiled in the other group using the same CpGs. In this way each group-defined CMU is represented in cases and controls. The correlation matrices within each pair were then compared to identify CMUs that are significantly more correlated in the CMU-defining group than in the other group.

A CMU was identified as showing significant differential correlation if it passed multiple-testing-corrected p-value thresholds for two tests. The first is a correlation matrix comparison test derived using the cortest function in R (Revelle 2020) with inputs: R1 as the CMU correlation matrix for the CMU-defining group, R2 as the corresponding matrix for the other group, n1 as the number of samples used to compute R1, n2 as the number of samples used to compute R2, and Fisher=TRUE. This test is based on a chi-square comparison proposed by Steiger to study correlation matrices (Steiger 1980). Chi-square statistics are computed on Fisher-transformed correlation values, accounting for the sample sizes of both groups and using degrees of freedom that depend on the number of CpG probes present in the CMU. The null hypothesis was that the off-diagonal entries of the difference between the two correlation matrices (R1-R2) would be zero. The p-values are corrected using Bonferroni correction for each dataset separately. The second test compares off-diagonal entries using the Wilcoxon rank-sum test (implemented in R as the wilcox.test function with input paired=FALSE). The alternative hypothesis is that the distribution of the off-diagonal entries of the CMU-defining group is shifted to the right of its counterpart by mu or more (correlation level difference); for the sample sizes used here, we chose mu=0.1 as an arbitrary but sensitive cut-off.
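
The first of the two tests can be illustrated with a short Python sketch of a Steiger-style statistic on Fisher-transformed correlations. It is not a port of the cortest function: the effective-sample-size scaling (n1*n2/(n1+n2) - 3) is an assumption made for illustration, the function name and toy matrices are hypothetical, and the p-value (upper tail of a chi-square distribution with the returned degrees of freedom) is left to a statistics library.

```python
import math


def steiger_style_statistic(r1, r2, n1, n2):
    """Chi-square-type statistic comparing two p x p correlation matrices."""
    p = len(r1)
    n_eff = (n1 * n2) / (n1 + n2)                  # assumed pooling of the two sample sizes
    squared_diff = 0.0
    for i in range(p):
        for j in range(i):                         # off-diagonal entries, lower triangle
            d = math.atanh(r1[i][j]) - math.atanh(r2[i][j])   # Fisher z difference
            squared_diff += d * d
    chi2 = (n_eff - 3) * squared_diff
    df = p * (p - 1) // 2                          # one degree of freedom per CpG pair
    return chi2, df


# Toy 3-CpG CMU: tightly correlated in the defining group, weakly in the other group.
R1 = [[1.0, 0.80, 0.70], [0.80, 1.0, 0.75], [0.70, 0.75, 1.0]]
R2 = [[1.0, 0.20, 0.10], [0.20, 1.0, 0.15], [0.10, 0.15, 1.0]]
chi2, df = steiger_style_statistic(R1, R2, n1=100, n2=100)
```

Because the degrees of freedom grow with the number of CpG pairs, larger CMUs need a proportionally larger summed squared difference to reach the same significance level.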

Image Clustering captures Correlated Methylation Units (CMUs) on the 450K/EPIC array
In order to perform a genome-wide scan for Correlated Methylation Units (CMUs), we first divided the genome into non-overlapping windows spanning 250 kb. Strongly correlated CpG sites have been estimated to extend 10 kb on average (Liu, Li et al. 2014); therefore, we chose 250 kb to allow us to capture multiple potential CMUs in a computationally tractable manner. For the Illumina 450K array there are ~7,000 windows across the autosomal genome (Supp. Fig. S1a), with a median of 37 CpG sites per window and a maximum of 1,300. The highest CpG probe density was observed at the HLA region of chromosome 6, with higher CpG densities also seen around chromosome telomeres (Supp. Fig. S1b). To determine CMUs, we calculated the Pearson correlation between all pairs of probes within each 250 kb window, resulting in a correlation matrix for each window (Fig. 1a); a total of ~22 million CpG pairs across all windows were assessed.

Instead of binning CpG sites based on correlation between immediately adjacent sites and imposing a hard correlation score threshold, we used an Image Clustering Method (ICM) (Methods) to identify CMUs based on the presence of an overall strong correlation amongst CpG methylation profiles within a CMU, even if the correlation was not high between every CpG pair within the CMU. This follows from ICM treating the correlation matrix in each 250 kb window as an image, with each pixel representing a correlation score between -1 and +1 (Fig. 1b). The image was then smoothed using pseudo box filtering (Supp. Fig. S2). The values of image pixels below a threshold (a) were set to 0. Image gradients were then computed to identify edges where the correlation between methylation profiles of adjacent CpG sites fell to zero, and to capture contiguous CMUs (at least 4 adjacent CpGs with strongly correlated methylation profiles) and non-contiguous CMUs, where there was significant correlation between methylation profiles of non-adjacent CMUs (Fig. 1b). Imposing a particularly "low" (<0.4) value of the correlation threshold a reduces the noise in the system; higher correlation thresholds result in smaller (or sub-) CMUs that are part of the larger CMUs observed at lower thresholds (Supp. Fig. S3). In our model, we chose 0.6 as the 'default' correlation threshold to balance between these two extremes; consistent with this, we observed that non-contiguous CMUs defined using a higher threshold often fell completely within a contiguous CMU defined at a lower threshold (Supp. Fig. S3).

We used ICM to identify CMUs across 35 in-house and publicly available cohorts (Table 1, Supp. Table 1). Starting with eleven (11) groups composed of normal (non-tumor) samples from TCGA as our baseline discovery set, ICM defined a median of 18,125 contiguous, dataset-specific CMUs (range: 4,993-21,279), with a median length of 860 base pairs (bp) (IQR: 381 bp to 3,423 bp) (Fig. 2A). These CMUs were found to be broadly distributed across the genome in each tissue (Fig. 2B, Supp. Fig. S4), with the HLA region on chromosome 6 generally having the most CMUs (Supp. Fig. S5). On average, ~30% of all CpGs were contained within one of the CMUs identified (Supp. Fig. S6), and each CMU consisted of a median of 6 CpG sites (range 5-300; Fig. 2C). The overall distributions for the number of CMUs, the number of CpGs within a CMU, and CMU length in cancer tissues were comparable with those of their respective normal tissues (Supp. Fig. S7).

We noted that about half of the CMUs within a given dataset had evidence of correlation with nearby CMUs. We formally denoted these as non-contiguous CMUs (Supp. Table 3), where two adjacent CMUs are interspersed with CpGs that belong to neither CMU. In total, there was a median of 3,689 non-contiguous, dataset-specific CMUs (range: 586-4,789), with each non-contiguous CMU having an average of 3 CMUs per 250 kb window (Supp. Table 4). As expected from the distribution of CMUs, non-contiguous CMUs were also broadly distributed across the genome in each tissue (Fig. 2B, Supp. Fig. S9).

CMUs are consistent within a given tissue and have variable cross-tissue overlap
To understand the consistency and interrelationship of CMUs across tissues, we introduced an asymmetric similarity measure between any given ordered pair of CMU sets S1 and S2: Asy(S1, S2), which varies between 0 and 1 (Methods, Supp. Fig. 10). We applied this to every possible ordered pair of available tissue datasets (Table 1), from which we derived an asymmetric square matrix that quantifies the representation of CMUs both across and between tissues (Fig. 3, Supp. Table 5). Varying the second tissue for a fixed first tissue S1, Asy(S1, *), quantifies the representation of the first (S1) tissue's CMUs in those of the other tissue, whereas Asy(*, S1) quantifies how representative the second tissue's (S1) CMUs are of CMUs found in the other tissues. We see a variation in these scores that ranges from 0.16 to 0.8 (median Asy(S1, *)) and from 0.08 to 0.76 (median Asy(*, S1)), which indicates the presence of different degrees of unshared (tissue-specific) CMUs and shared CMUs among tissues.

CMUs identified in kidney, pleura, and thyroid had the highest presence in other tissues (median Asy(Kidney, *) between 0.7 and 0.8, median Asy(Thyroid, *)=0.75, median Asy(Pleura, *)=0.75). CMUs of kidney, blood and thyroid were least representative of other tissues (median Asy(*, Blood)=0.09, Asy(*, Kidney)=0.08 and Asy(*, Thyroid)=0.08) (Fig. 3). Testis stood out among all tissues as being the most representative of CMUs found across our datasets (median Asy(*, Testis)=0.76). This is consistent with the germline haploid nature of most testicular tissue. Further, hierarchical clustering, based on CMU asymmetric similarity measures, clustered pairs of the same tissue, even from different studies (e.g. studies of kidney tissue: KICH, KIRC and KIRP; studies of colorectal tissue: COAD and READ; studies of peripheral blood: 450k chip and EPIC chip), as well as most biologically paired tissues (normal and cancer) within a study (Fig. 3), giving additional weight to our methodology.

CMU conservation across tissues is commonly observed
The high asymmetric scores seen between tissues suggested that the information content and utility of CMUs may be partially independent of the tissue; therefore, to provide a context for interpretation across tissues, we looked for CMUs that were present across most tissues. For this evaluation, we excluded the highly complex HLA region, given the extensive polymorphism and uncertain methylation readouts from array-based platforms in this region (Rakyan, Hildmann et al.). Further, we focused on higher-confidence CMUs by restricting our analysis to CMUs with >10 adjacent correlated CpG sites (as opposed to the more liberal >4 adjacent CpG sites used in our initial analyses) (Supp. Fig. 8) and looked for the largest region of overlap across each tissue-defined CMU (Fig. 4A). We defined tissue-independent CMUs (TI-CMUs) as CMUs observed in more than 80% of unique normal TCGA tissues (Fig. 4B). We found 473 TI-CMUs across the genome (Fig. 4C, Supp. Table 6), representing ~20% of dataset-specific CMUs. Consistent with the asymmetric similarity score, thyroid, kidney, and blood showed the most CMU overlap with TI CMUs (up to 50%; Supp. Fig. 11), and TI CMUs identified using only normal tissues had a comparable overlap with CMUs of both their paired cancerous tissue and other cancer tissues (Fig. 4D).

Next, we sought to investigate tissue-independent long-range correlation of methylation, with an expectation that CMUs within a non-contiguous CMU may not have the same boundaries across tissues. Therefore, we identified tissue-independent (TI) non-contiguous CMUs by simple intersection of genomic regions (defined by the start of the first CMU under consideration and the end of the last CMU under consideration) across tissues (Fig. 5A). This approach necessarily allows for variation in CMU boundaries within tissue-specific non-contiguous CMUs. Despite this, at a given genomic locus, several non-contiguous TI CMUs had a similar distribution of CMUs across many tissues (Fig. 5B, Supp. Fig. 12). In total we obtained 74 TI non-contiguous CMUs (Supp. Table 7), with a region on chromosome 5q15 (93570534-93596339, hg38) showing three consecutive CMUs wherein the middle one was consistently negatively correlated with the other 2 adjacent CMUs within every tissue studied (Fig. 5C).
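
The "more than 80% of tissues" criterion can be illustrated with a coverage sweep over CMU boundaries. The Python sketch below is a simplified stand-in for the overlap procedure: it assumes half-open intervals on a single chromosome that do not overlap within a tissue, and the function name, tissue labels, and coordinates are hypothetical.

```python
def conserved_regions(tissue_cmus, min_fraction=0.8):
    """Regions covered by CMUs in more than min_fraction of the tissues.

    tissue_cmus: {tissue: [(start, end), ...]} with half-open intervals
    """
    deltas = {}
    for intervals in tissue_cmus.values():
        for start, end in intervals:
            deltas[start] = deltas.get(start, 0) + 1   # one more tissue covers from here
            deltas[end] = deltas.get(end, 0) - 1       # one fewer tissue covers from here
    needed = min_fraction * len(tissue_cmus)
    regions, depth, open_at = [], 0, None
    for pos in sorted(deltas):
        depth += deltas[pos]
        if depth > needed and open_at is None:
            open_at = pos                              # coverage rises above the cutoff
        elif depth <= needed and open_at is not None:
            regions.append((open_at, pos))             # coverage falls back below it
            open_at = None
    return regions


tissue_cmus = {
    "tissue_1": [(100, 500)],
    "tissue_2": [(120, 480)],
    "tissue_3": [(90, 510)],
    "tissue_4": [(150, 450)],
    "tissue_5": [(200, 600)],
    "tissue_6": [(1000, 1100)],
}
print(conserved_regions(tissue_cmus))  # [(200, 450)]
```

With six tissues, "more than 80%" requires at least five, so only the stretch shared by the first five tissues is reported, and its boundaries are set by the most restrictive tissue on each side.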

TI CMUs are enriched for regulatory annotations and appear to link regulatory units
In annotating CMUs across the genome, we identified a non-contiguous CMU across the HOX gene family on chromosome 17 that appeared to link methylation overlying promoter regions and enhancers (Fig. 6); subsequently, we also noticed that many non-contiguous TI CMUs overlapped clusters of related genes (e.g. PCDHA, PCDHB, SNORD, KRTAP, OR (olfactory receptor family)) (Supp. […] Ensembl regulatory features except for enhancer regions (Fig. 7A). Specifically, enrichments of CpG island and H3K4me3 were most prominent (probe-based test; see Methods), and this was also found using a secondary method of assessing enrichment (Supp. Fig. 13). Notably, only a small number of CpG probes are found over annotated enhancer regions on the 450K/EPIC arrays, and almost none of the resulting CMUs overlapped this feature. We note, however, that Ensembl 'enhancer' annotations overlap with promoter flanking regions and are collapsed into this latter category when the two overlap; thus, enhancer regions may not be completely devoid of CMUs.

By contrast, TI non-contiguous CMUs were enriched for transcription factor (TF) and CTCF binding sites, with the caveat that TF binding site annotations are the default for regions not included in any other primary annotations (i.e., promoter, promoter flanking, CTCF, and enhancer), and so are likely to include distant binding sites. CTCF binding sites are often linked to chromatin looping, suggesting a potential link between 3-dimensional spatial DNA folding and TI non-contiguous CMUs. We also observed that, in contrast to TI contiguous CMUs, TI non-contiguous CMUs tended to overlap multiple regulatory regions (Fig. 7B). In part this is due to the larger genomic regions of TI non-contiguous CMUs and the non-linear probe placement of the array, but alongside examples of positive and negative correlation between distant CMUs, this might also suggest long-range coordination between regulatory motifs.

TI non-contiguous CMUs occur within TADs and appear more frequently in the B compartment
The long-range interactions of contiguous CMUs were reminiscent of regulatory Topologically Associating Domains (TADs) and loop domains described from Hi-C and related 4D genome folding assays. We thus sought to determine the relationship between TADs and our TI non-contiguous CMUs. When we looked at tissue-specific TADs (from Wang, Song et al. 2018; Methods), we found that even though the resolution of conserved TADs (~1 Mb) is substantively larger than our correlation windows, almost all of our TI non-contiguous CMUs (>90%) were contained within TADs (i.e. did not cross an annotated TAD boundary) (Fig. 7C). For chromatin loops, on the other hand, among the TI non-contiguous CMUs overlapping this annotated feature, more than 50% overlapped two chromatin loops (Fig. 7C), and this was true for almost all of the tissues assessed. This suggests a physical proximity of the two ends of the two overlapping chromatin loops (Supp. Fig. 14). Further, as expected for any annotation, there is a linear relationship between the fraction of TI non-contiguous CMU overlap and the fraction of Illumina CpG overlap (Fig. 7C). However, TADs have the smallest slope, implying that even with less CpG overlap there is higher CMU overlap. We also found that for a given fraction of TI non-contiguous CMUs overlapping the A compartment, the fraction of CpG probe overlap with the B compartment was much lower (Fig. 7C). To investigate further, we compared tissue-specific CMUs overlapping A/B compartments and observed a similar pattern (Supp. Fig. 15), implying that CMUs occur more frequently in the B than the A compartment.

Differential correlation patterns between disease states highlight putative candidate loci
Finally, we explored the impact of disease state on CMUs using buccal cell samples from children with severe acute malnutrition (SAM) (Table 2). We previously demonstrated that these childhood buccal samples are composed predominantly of buccal epithelial cells (median probability of predicted buccal samples: 0.84) (Schulze, Swaminathan et al. 2019); consequently, we anticipated that differences in CMU correlation patterns between childhood SAM cases (with edematous malnutrition) and controls (with non-edematous malnutrition) would be largely the result of phenotypic differences, rather than cellular composition. For this analysis, we also included covariates of age, gender and country.

Three CMUs showed evidence of differential correlation, characterized in this first-pass analysis as weaker correlation between CpGs (Figure 8A & 8B, Supp. Table 8). This observation is consistent with widespread differential hypomethylation between edematous SAM (ESAM) and non-edematous SAM (NESAM) noted at specific CpG clusters in these samples (Schulze, Swaminathan et al. 2019); however, none of the differentially correlated regions showed statistically significant differential methylation. The strongest differential CMU signal was found on chromosome 5 overlying the PCDHB gene family (Fig. 8B)

[…] Table 1. Horizontally placed tissue labels on the bottom (x-axis) correspond to the first dataset in the ordered pair (S1) and vertical tissue labels on the right (y-axis) correspond to the second dataset in the ordered pair (S2). Both sides of labels are colored with median column and row scores respectively, and the dendrograms are based on hierarchical clustering (distance=1-cor and complete-linkage).

[…] CMUs along chromosome 7 for normal TCGA tissue samples, with rectangular red boxes illustrating regions for which ICM identifies CMUs across the majority of tissues (>80%, Methods).
4b. Identification of tissue-independent CMUs. The unbroken horizontal black bar at the base represents CpGs across a given genomic region. Alternating grey and black hatched bars represent genomic regions of CMUs of the tissues under consideration. Bottom-most black bars outlined in grey represent the genomic regions identified as TI CMUs.
4c. Circos plot of TI CMU regions in the outermost panel (black). TI CMU annotation for regulatory regions is displayed in the corresponding regulatory circular inner panels.
4d. TI CMU representation in tissue CMUs using the Tversky index (y-axis; Methods). Each dot corresponds to a dataset, split into 3 categories (x-axis): unpaired cancer tissues, normal tissues, and paired cancer-normal tissues (see Table 1).

8a. Volcano plot of SAM differential CMUs identified for edematous (cases) and non-edematous (controls) malnutrition samples (see Methods). Each dot represents a CMU, with blue dots marking CMUs that are significantly differentially correlated at a Bonferroni threshold for the number of tests. A negative effect size implies higher correlation in cases and a positive effect size the opposite (Methods).
8b. A differential CMU on chromosome 5 showing correlation patterns between non-edematous (above diagonal) and edematous (below diagonal) malnutrition subjects (left panel); non-edematous adults (above diagonal) and edematous adults (below diagonal) (middle panel); and normal (above diagonal) vs cancer (below diagonal) samples for the BRCA study.
8c. Example of a differential CMU where small CMUs in cases show correlation but lack correlation among themselves. This was only observed in samples from Jamaica.