The Competition between DNA Methylation and Demethylation is Associated with Transcription Regulation and Tumorigenesis

The mammalian DNA methylome is formed by two antagonizing processes, methylation by DNA methyltransferases (DNMT) and demethylation by ten-eleven translocation (TET) dioxygenases. Although the dynamics of either methylation or demethylation have been intensively studied in the past decade, their competition effect remains elusive. Here, we quantify the competition between DNA methylation and demethylation by the percentage of unmethylated CpGs within a partially methylated read from bisulfite sequencing. After verifying methylation competition by its strong association with the co-localization of DNMT and TET enzymes, we observe that methylation competition is strongly correlated with gene expression. In particular, during tumorigenesis, the elevation of methylation competition is associated with the repression of 40~60% of tumor suppressor genes, which cannot be explained by promoter hypermethylation alone. Furthermore, methylation competition can be used to stratify large undermethylated regions with negligible differences in average methylation into two subgroups with distinct chromatin accessibility and gene regulation patterns. Together, methylation competition represents a novel methylation metric important for transcription regulation and tumorigenesis and is largely distinct from conventional metrics, such as average methylation and methylation variation.


Introduction
DNA methylation at CpG dinucleotide(5mC) is introduced and maintained by DNA methyltransferases (DNMT family) 1,2 . Meanwhile, through hydroxymethylation, 5mC is removed by ten-eleven translocation dioxygenases (TET family) 3,4 . Besides their opposing effects, the two enzyme families present complementary DNA binding patterns. While TET1 protein prevents de novo methyltransferases from binding to regulatory elements 5,6 , DNMT3A also blocks TET1 binding, especially in promoter regions 6 . Interestingly, these two 'competing' enzyme families are observed to be jointly associated with tumor malignancy. For example, the DNMT3A and TET2 double-knockout mice show worse survival than single-knockout counterparts 7 ; also, mutations in DNMT3A and TET2 significantly co-occur in human T-cell lymphoma 8 .
These findings suggest that the competition between methylation and demethylation processes is related to tumorigenesis. However, to what extent this competition contribute to cancer gene regulation remains largely unknown.
For years, DNA methylation levels have been quantified in an 'average' manner. The increased average methylation level of CpG island (CGI), i.e., CGI hypermethylation, is a well-established mechanism for gene silencing 9 . Numerous differentially methylated regions have been identified based on between-sample comparison of average methylation levels 10,11 . Besides the average methylation, DNA methylation has been quantified by its variation as 'methylation heterogeneity' 12 or 'epigenetic polymorphism' 13 . Methylation heterogeneity scores are defined based on the frequencies of methylation patterns (epialleles) at multiple CpGs inferred from bisulfite sequencing reads [12][13][14][15] . Previous studies reveal that methylation variation is associated with global transcription variation 12,16,17 . However, neither average methylation nor methylation variation can delineate the degree of competition between active methylation and demethylation.
Recently, a mathematical model has been applied to deconvolute methylation and demethylation rates from average methylation levels of individual CpGs in stem cells 18 . However, the spatial coupling of methylation competition at adjacent CpGs has not been considered, and such coupling may be critical for transcription factor (TF) binding and cancer gene regulation 19 .
Bisulfite sequencing has enabled the measurement of DNA methylation of adjacent CpGs within the same read 20 , and thus it captures methylation competition if there are unmethylated CpGs in a partially methylated read. Through integrative analysis of 75 methylomes and 44 transcriptomes, we demonstrate that methylation competition unveils a new type of methylation abnormality, which is largely distinct from both the change of average methylation and methylation variation. We find that methylation competition is associated with a previously undetected repertoire of epigenetically regulated tumor suppressor genes (TSGs), and it can be used to stratify large undermethylated regions into two subgroups with distinct characteristics in chromatin accessibility and gene regulation.

Delineating the competition between active DNA methylation and demethylation
We quantify the competition between active DNA methylation and demethylation within the same cell by dissecting reads from bisulfite sequencing. The active competition events are captured by the unmethylated CpG(s) within a partially methylated read (slashed circles in Fig 1a) because each read comes from one cell. while 'C' fragments are segments of unmethylated CpGs within partially methylated reads. We define the methylation competition ratio of a genomic region as the sum of 'C' fragments' weights divided by the sum of all fragments' weights in that region (Eq.1 in Methods). Each fragment's weight is set as its number of CpGs (see Methods).
We compare the methylation competition ratio with two measures of the average methylation (i.e., the traditional mean methylation and the cellular heterogeneityadjusted clonal methylation (CHALM) 21 , see Methods) and three measures of the methylation variation (i.e., Shannon's entropy 15 , Epipolymorphism 13 , and the proportion of discordant reads (PDR) 12 , see Methods) across gene promoters using whole-genome bisulfite sequencing (WGBS) data from mouse embryonic stem cells (mESCs). We observe that these six methylation measures are correlated to different degrees with DNA binding intensities of DNMT3A1 and TET1 enzymes in matched sample measured by ChIP-seq (chromatin immunoprecipitation followed by high-throughput sequencing). We observe that the average methylation measures are correlated positively and negatively with the binding intensities of the methyltransferase DNMT3A1 and the demethylase TET1, respectively, consistent with the known enzymatic activities of the two enzymes ( Supplementary Fig 1b and   1c). The methylation variation measures show similar correlation patterns ( Supplementary Fig 1d, 1e, and 1f). In contrast, the methylation competition ratio is positively correlated with both DNMT3A1 and TET1 binding intensities ( Supplementary Fig 1a). To further explore the two enzymes' joint effects, we define the DNMT3A1-TET1 'joint regulation score' ( ) of a promoter as the product of DNMT3A1 and TET1 binding intensities within that promoter (see Eq.2 in Methods).
This joint regulation score depicts the extent to which the promoter is co-occupied by both enzymes, and it takes a low value if either enzyme has a low binding intensity.
As expected, the joint regulation score has a strong positive correlation with the methylation competition ratio (Fig 1b) but not as much with the average methylation measures (Fig 1c and Supplementary Fig 1c) or the methylation variation measures (Fig 1d and Supplementary Fig 1e and 1f). This result is consistent with the fact that average methylation and methylation variation are neutralized by the additive effects of two opposing enzymes and that only the methylation competition ratio characterizes the competition between DNMT3A1 and TET1. In the following text, we use the traditional mean methylation as the average methylation measure and Shannon's entropy as the methylation variation measure.
Based on DNMT3A1 and TET1 co-localization patterns, we categorize promoters into four groups: DNMT3A+TET1+, DNMT3A+TET1-, DNMT3A-TET1+, and DNMT3A-TET1-, where + and -indicate strong and weak bindings, respectively. As shown in Fig 1e, the methylation competition ratio is significantly higher in 'DNMT3A+TET1+' promoters than in the other three groups of promoters, which are bound by only one or none of the two enzymes. For example, Prok2 (Fig 1f) has a DNMT3A+TET1+ promoter and a high methylation competition ratio, while Ogt ( Fig   1g) has a DNMT3A-TET1-promoter and a low methylation competition ratio.
Notably, only the methylation competition ratio can distinguish the different colocalization patterns of Prok2 and Ogt, while the average methylation and the methylation variation cannot. Moreover, only the methylation competition ratio is predictive of the expression levels of Prok2 and Ogt (Prok2 has a high ratio and low expression, while Ogt has a low ratio and high expression), while the average methylation and the methylation variation are not (both genes have similar average methylation and methylation variation levels). These findings are supported by external evidence in mouse samples with Dnmt or Tet knockout. As expected, Dnmt3a knockout leads to genome-wide hypomethylation, whereas Tet knockout leads to global hypermethylation (Fig 1h). We observe that either knockout results in more regions with decreased methylation competition ratios than regions with increased ratios, an observation consistent with our definition of methylation competition ratio (Fig 1i). Overall, these results confirm that the methylation competition ratio, a new quantitative measure of methylation, can delineate the antagonism between methylation and demethylation processes.

Methylation competition is negatively correlated with gene expression
Our previous analysis of genes Prok2 and Ogt (Fig 1f and 1g) suggests that the methylation competition ratio may be a better predictor of gene expression than the average methylation and the methylation variation. To further examine the relationship between these quantitative measures of methylation and gene expression, we first calculate the methylation competition ratios, the average methylation, and the methylation variation of every CpG site and transcription regulatory elements/regions of three types (promoters, gene-body regions, and enhancers, see Methods) using WGBS data of primary cells and normal tissues from Epigenomic Roadmap Consortium 22 . Then we quantify gene expression levels using the RNA sequencing (RNA-seq) data from matched samples (Supplementary Table   1) and separate all genes into four equal-sized groups based on expression quantiles (0-25%, 25-50%, 50-75%, and 75-100%). As indicated in Compared with the average methylation (measured by both the traditional mean and CHALM) and the methylation variation, the methylation competition ratio is better correlated with gene expression, and this phenomenon is consistent across all three types of regulatory elements in CD3+ T-cells (Fig 2b and Supplementary Fig 2b). In contrast, the correlations between gene expression and the average methylation or the methylation variation are not consistent across the three types of regulatory elements; for example, CHALM values in gene body regions and enhancers have much lower correlations with gene expression than CHALM values in promoters do ( Supplementary Fig 2b). We further confirm this phenomenon using WGBS and RNA-seq data of other primary cells, fetal tissues, and adult tissues ( Supplementary   Fig 3). All together, we show that the methylation competition ratio is a better predictor of gene expression than both the average methylation and the methylation variation.

Effects of sequencing depth and CpG density on quantification of methylation competition
Sequencing depth is a key factor that affects quantitative analysis of high-throughput sequencing data. Using a down-sampling analysis, we observe an expected phenomenon that the methylation competition ratio has a stronger negative correlation with gene expression when sequencing depth increases, and the correlation plateaus at ~60x depth ( Supplementary Fig 5). Notably, across sequencing depths from ~4x to ~86x, the methylation competition ratio consistently has better correlations with gene expression than the average methylation and the methylation variation do.
Besides sequencing depth, CpG density also affects the correlations between methylation measures and gene expression. To investigate this issue, we stratify genes into three groups based on the CpG densities in their promoters, i.e., high-CpG promoter (HCP) genes, intermediate-CpG promoter (ICP) genes, and low-CpG promoter (LCP) genes 23 . For all three methylation measures (the methylation competition ratio, the average methylation, and the methylation variation), the correlations between their values in promoters and gene expression decrease from HCP genes to ICP genes and further down to LCP genes. Among the three measures, the methylation competition ratio consistently has the strongest negative correlation with gene expression for all three groups of genes ( Supplementary Fig 6).

Methylation competition is associated with the repression of tumor suppressor genes
To investigate whether methylation competition is involved in tumorigenesis, we apply gene set enrichment analysis (GSEA) to profiles of methylation competition ratios in promoters in 8 TCGA normal samples. As shown in Supplementary Fig 7a   and 7b, the curated tumor suppressor genes (TSGs) in the COSMIC Cancer Gene Census (CGC) database 24 are associated with low methylation competition ratios. In contrast, cell lineage genes such as Homeobox genes are associated with high methylation competition ratios. In addition, housekeeping functions, e.g., 'pentose phosphate pathway', do not exhibit associations with methylation competition.
Further analysis of 56 normal samples' methylomes shows a conserved pattern of low methylation competition at the promoter of TP53, a well-known TSG ( Supplementary Fig 7c), and high methylation competition at the promoter of NKX1-1, a Homeobox transcription factor related to organ development and regeneration 25 ( Supplementary Fig 7d).
Promoter hypermethylation is a well-established mechanism for TSG silencing in tumor 26 . We compare the methylation competition ratios and the average methylation between the normal and tumor uterus samples from the same patient in TCGA.
Through an overlap analysis of genes that exhibit changes in methylation competition and/or average methylation, we identify three gene groups: 863 methylation-competition elevated and average-methylation stable genes (P1), 1,130 methylation-competition stable and average-methylation increased analysis, we find all three groups of genes enriched in various cancer-related signaling pathways (Fig 3d). We also find 27 TSGs in the three gene groups, and more than 44% (12/27) of them cannot be explained by promoter hypermethylation alone (P1 TSGs in Fig 3e). For example, ZBTB16 (also known as PLZF) is a P1 TSG that inhibits prostate cancer tumor growth through the interplay with PTEN and FOXO3a 27 , and its genetic alterations have been found in metastatic prostate cancer samples 28 . Although ZBTB16 has an unclear function for uterine cancer, our result suggested that the elevation of methylation competition in its promoter may indicate a novel mechanism for its tumor suppressing role.
We also analyze the methylomes of normal and tumor breast samples from the same patient in TCGA ( Supplementary Fig 8). Again, 66% (10/15) of repressed TSGs can be explained by the methylation competition elevation but not by hypermethylation, and ZBTB16 is again an example ( Supplementary Fig 8e).
To compare the methylation competition ratio again with the methylation variation (i.e., the methylation entropy), we perform a similar overlap analysis of genes that exhibit changes in methylation entropy and/or average methylation. We find that the 243 entropy elevated and average-methylation stable genes (Q1, Supplementary Fig   9a and 9b) are not significantly repressed in the tumor uterus sample than in the matched normal sample, and that they do not include any TSGs ( Supplementary Fig   9c, 9d, and 9e). This result shows that, unlike methylation competition, methylation entropy is not associated with the repression of TSGs.
Collectively, our analysis suggests that the methylation competition reveals a repertoire of epigenetically regulated tumor suppressor genes that cannot be detected by the average methylation or the methylation variation.  Fig 10d). To examine the role of methylation competition in their function, we classify the methylation canyons into highcompetition canyons (blue in Fig 4) and low-competition canyons (red in Fig 4), based on a cutoff derived from the genome background (see Supplementary Fig 10 and Methods). Although the two groups do not differ in their average methylation, their chromatin accessibilities are dramatically distinct (Fig 4a). Accordingly, we refer to the low-competition canyons, which are enriched with active markers (H3K4me3 and DNase I hypersensitive sites), as active canyons (aCanyons); we refer to the high-competition canyons, which are bound by H3K27me3, as Polycomb canyons (pCanyons).

Methylation competition in large undermethylated regions
Although the methylation variation measures (entropy, Epipolymorphism, and PDR) also differ between aCanyons and pCanyons ( Supplementary Fig 11), they do not provide the same information as the methylation competition ratio does: only the methylation competition ratio is higher inside pCanyon than in the flanking regions.
This phenomenon is consistent with the finding of a previous study that TET1 binding is elevated in canyons in Dnmt3a knockout sample 6 .  Fig 13).
To gain further insights into the functions of canyons, we define canyon targets as the genes whose promoters or gene-body regions overlap with canyons. Again, consistent with the definitions of aCanyons and pCanyons, aCanyon targets are highly expressed, while pCanyon targets are almost silenced (Fig 4b). Gene ontology analysis reveals that aCanyon targets are enriched with TSGs and cancer pathways, while pCanyon targets are enriched with cell fate commitment and Homeobox genes. As a negative control, randomly selected genes are not enriched with these functional terms (Fig 4c).
Furthermore, we correlate the expression of canyon targets with the average methylation in a locus-specific manner 32 . Specifically, we first divide the promoter and downstream region (-2 to +10kb) of each gene into 120 equal-length bins. Then we compute the correlation between gene expression and the average methylation in each bin across aCanyon targets, pCanyon targets, and randomly selected genes.
As expected, for all three gene sets, the average methylation in promoters is negatively correlated with gene expression (Fig 4d). Previous studies report positive correlations between the average methylation in gene-body regions and gene expression 10,33 . However, we observe that this phenomenon only occurs for pCanyon targets and randomly selected genes, but not for aCaynon targets (Fig 4d).
The surprisingly negative correlations we observe (between the average methylation in gene-body regions and gene expression) for aCanyon targets suggest that these genes may have a distinct methylation regulation mechanism.
Using in vitro SELEX assays, Yin et al. profile human TFs for binding preferences towards 5mC 34 . They classify the TFs into 'methyl-plus' and 'methyl-minus' groups, so that methyl-plus TFs' binding is enhanced by 5mC, while methyl-minus TFs' binding is inhibited by 5mC. To further understand aCanyons and pCanyons, we utilize this SELEX data and perform differential analysis of TF motif enrichment. We find that the TF motifs' 5mC preferences are negatively correlated with their fold enrichment between aCanyon and pCanyon (Fig 4e). In other words, methyl-plus TF motifs are more enriched in pCanyons, while methyl-minus TF motifs are more enriched in aCanyons. Therefore, when methylation increases, the aCanyons would lose TF binding, and the aCanyon targets would have decreased expression. On the other hand, when methylation increases, the pCanyons would have more TF binding, and the pCanyon targets would have increased expression.
Together, aCanyons and pCanyons, the two categories of methylation canyons defined by methylation competition, present distinct patterns of chromatin accessibility and gene regulation.

Discussion
Cytosine methylation is a reversible biochemical modification 35 . The global pattern of mammalian methylome is formed by two antagonizing processes: methylation and demethylation. For the first time, the methylation competition ratio reported in this paper utilizes methylation competing events inside the same DNA molecule to measure this antagonism. Through reanalyzing bisulfite sequencing data, we reveal that the methylation competition ratio is strongly correlated with gene expression. are all negatively associated with the average methylation in promoters in human stem cells (Supplementary Fig 14b). However, they are all positive indicators of methylation competition (Supplementary Fig 14a). Bisulfite sequencing data in Ezh2 conditional knockout mice 42 also confirm that the removal of PRC2 would cause more regions to have higher average methylation but lower methylation competition ( Supplementary Fig 14c), suggesting that methylation competition can be promoted by PRC2 binding. Previous studies suggest that TET1 is associated with the repression of Polycomb targets 43,44 , but they have not detected any direct interaction between PRC2 and TET1. Our analysis indicates that methylation competition may serve as the missing link in the PRC2-TET1 association.
Nucleosome positioning is essential for gene regulation by altering chromatin accessibility 45 . Nucleosome fuzziness measures the randomness of a nucleosome's position. Through reanalyzing human brain MNase-seq (micrococcal nuclease digestion with deep sequencing) data 46 , we reveal that the methylation competition ratio is not associated with nucleosome fuzziness (Supplementary Fig 15), suggesting that methylation competition regulates gene expression independent of nucleosome positioning.
Methylation canyons are poorly methylated and have negligible differences in the average methylation, so they serve as an ideal context to investigate the methylation competition. A previous study of mouse hematopoietic stem cells (HSCs) finds that some canyons are active while the others are silent, and that such a difference is explainable by the binding of H3K4me3 or H3K27me3 29 . Thanks to our definition of methylation competition, the distinct activities of different canyons can be explained by methylation data alone, suggesting that methylation competition indicates chromatin accessibility. Besides, we find that the active canyon (aCanyon) target genes present a unique regulating model: their negative correlation between gene expression and gene-body methylation (Fig 4d) has not been reported before.
Although we have confirmed this observation by analyzing in vitro SELEX data of the 5mC preferences of TFs, more efforts are needed to scrutinize the mechanisms.
Recently, the competition between DNMTs and TETs is also confirmed in human ESCs 47 . With its different characteristics from the average methylation and methylation variation, the methylation competition ratio can disclose more methylation abnormality and cancer drivers. It highlights local methylation dynamics in the epigenetic landscape and will serve as a new layer of methylation biology.

Data source
All the public or controlled data used by this study are summarized in Supplementary   Table 1

Quality control and reads alignments
FastQC v0.11.7 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used for general quality checks of sequencing reads in FASTQ files. Then Trim Galore v0.6.4 (https://github.com/FelixKrueger/TrimGalore) was used to trim the sequencing adaptor and remove low-quality bases. WGBS reads were aligned to human (hg19) or mouse (mm9) genome using BSMAP v2.90 48 with default parameters. RRBS reads were also aligned by BSMAP, while an extra option '-D C-CGG' was added to activate the RRBS mode 49 . The overlapping bases of two read mates were only counted once to avoid duplicate counting. RNA-seq reads were first mapped to human (hg19) genome by STAR v2.6.0c 50

Genome annotation and genomic features
The NCBI RefSeq gene annotation files (hg19 and mm9) were fetched using the   Table Browser, which was derived based on the published formula 55 .
Enhancers and their target genes in CD3+ T-cell were defined based on Hi-C data.
First, the pairs of anchors which have chromatin interaction were annotated to the genome. Then for each pair, if only one of them located in promoter or gene-body regions, it was assigned as an enhancer-target pair.

The quantification of competition between active DNA methylation and demethylation
The methylation competition events are represented by the unmethylated CpGs in partially methylated reads (slashed circles in Fig 1a). So, bisulfite sequencing reads were dissected into three categories of fragments, i.e., methylated fragments (consecutive solid circles in Fig 1a), unmethylated fragments (consecutive blank circles in Fig 1a), and methylation competition fragments (consecutive slashed circles in Fig 1a). Thus, the methylation competition ratio of a particular genomic region was measured by the following equation.
M, U, and C represent the numbers of methylated, unmethylated, and methylationcompetition fragments, respectively. % , ' , and ! are the weights for each fragment. They can be set as the CpG counts of each fragment (weighted) or 1 (unweighted). Comparing the weighted and unweighted ratios, we found that the weighted ratio (Fig 2b; left panel) has slightly better correlations with gene expression than the unweighted ratio does (Supplementary Fig 4). Hence, the weighted methylation competition ratio was used in this study. Note that the weighted methylation competition ratio is equivalent to the proportion of CpGs in (Eq. 1) methylation-competition fragments among all the CpGs. Hence, the methylation competition ratio is also applicable to single CpG site. Theoretically, methylation competition ratios vary between 0 and 1 (not equal to 1). But in real data, as more than half of reads are either fully methylated or fully unmethylated ( Supplementary   Fig 16a), methylation competition ratios of most regions are lower than 0.5 ( Supplementary Fig 16b and 16c and 16d). To reduce false positive, we only retained CpGs covered by at least four reads in the analysis. We integrated the calculation of the methylation competition ratio and the average methylation ratio in a single Python script available at https://github.com/JiejunShi/methylation_competition.

Visualization of the methylation competition ratio and the average methylation
Methylation competition ratios and average methylation measures at base-pair resolution were generated in the 'wiggle' format using our 'methylation competition' tool (https://github.com/JiejunShi/methylation_competition). The 'wigToBigWig' program from UCSC binary utility directory (http://hgdownload.soe.ucsc.edu/admin/exe/) was used to transform 'wiggle' files to 'bigWig' files. For viewing specific gene loci, the 'bigWig' files were uploaded to 'custom tracks' in UCSC Genome Browser (http://genome.ucsc.edu/), with genome assembly of hg19 or mm9. The 'computeMatrix' module of deepTools v3.2.1 56 was used to extract the scores of interested regions into matrix format. Then the averaged profiles and heatmaps were visualized using 'plot' and 'image' functions in R.

The co-localization analysis of DNMT3A1 and TET1
The ChIP-seq reads density profiles of DNMT3A1 and TET1 in mouse ESCs were downloaded from the GEO database (GSE100951 and GSE100955).

Defining undermethylated regions (UMRs)
UMRs were detected using the published method 29 based on the Hidden Markov model. To reduce false positive, we only retained CpGs covered by at least four reads for UMR detection. The CpGs with an average methylation ratio lower than 10% were defined as undermethylated CpGs. Then UMRs were identified as regions that include at least four consecutive undermethylated CpGs. The adjacent UMRs were merged into a single UMR if the average methylation ratio of the merged UMR was still lower than 10%.

Determining the threshold for categorizing methylation canyons
Methylation canyons are the large (≥3.5kb) and conserved UMRs 29,31,32 . So, the UMRs longer than 3.5kb were identified as canyons. And the canyon target genes were defined as genes whose promoter or gene-body region is overlapped with canyons. To set the threshold for canyon categorizing, we calculated methylation competition ratios of 10,000 non-overlapped random genomic regions to determine the distribution of genome backgrounds. Then 'fitdist' function in R package 'fitdistrplus' was used to fit it to the classical distributions, including 'normal', 'lognormal', 'beta', 'gamma', 'uniform', 'exponential', and 'logistic' distributions ( Supplementary Fig 9a). The parameters for each distribution were estimated by 'maximum likelihood estimation'. By using the Cramér-von Mises criterion, we found the fitness to 'gamma' has the minimum distance with the distribution of genome background methylation competition ratios, which means 'gamma' distribution is the best fit (Supplementary Fig 9b). And this was also confirmed by the Q-Q plot ( Supplementary Fig 9c). Given the estimated parameters 'shape' and 'rate' from 'gamma' fitness, the threshold was determined as the 90 percentiles of the background distribution, which is aiming to select the regions with a significantly higher methylation competition ratio. Then canyons were categorized based on this threshold. Canyons with methylation competition ratio higher than this threshold were assigned as 'pCanyons' (Polycomb canyons), while the rest were designated as 'aCanyons' (active canyons) ( Supplementary Fig 9d).

Locus-specific correlation between average methylation and gene expression
The gene promoter and downstream regions (2kb upstream to 10kb downstream of TSS) were equally divided into 120 bins. The average methylation ratio was calculated for each bin, which is in 100 bp length. Aligning the TSS of different genes, all the average methylation ratios were organized into a matrix, in which rows are genes and columns are bins. Then the Spearman's rank correlation coefficients were computed between the gene expression vector and each column of this matrix.
The resulting vectors of correlation coefficients were visualized as bar plots.

Enrichment of motifs with different 5mC preference
The 5mC preferences of TF motifs were quantified by the SELEX method introduced by the previous study 34 . The motifs with preference values (termed as 'mCG enrichment' in 34 ) higher than 0 were assigned as 'methyl-plus' motifs, whose binding can be enhanced by 5mC. While motifs with preference value lower than 0 were defined as 'methyl-minus' motifs, whose binding is not favored by 5mC. Besides these two categories, there are also motifs with multiple effects or little effect on 5mC. Removing these ambiguous motifs and motifs with identical sequences, we got 105 non-redundant motifs that are clearly 'methyl-minus' or 'methyl-plus'. Then the 'findMotifsGenome' module in Homer software was used to call the motif positions in canyon sequences. The enrichment score of a particular motif was calculated as motif counts per kilobase (CPK) of the canyon. Then the fold enrichment of this motif was defined as the fold change between its CPK at aCanyons and CPK at pCanyons. Spearman's rank correlation between fold enrichment and 5mC preference was visualized in a scatter plot. And the p-value was calculated by the correlation test.

Transcription factor binding difference between methylation canyon groups
Positions of TF binding peaks determined in H1 human stem cells were downloaded from the Roadmap website. Then the percentages of TF-occupied aCanyons and pCanyons were calculated based on whether the canyon overlaps with TF binding peaks. The binding difference was measured by the odds ratio between these two percentages, i.e., TF occupied aCanyons (%) / TF occupied pCanyons (%). TFs with odds ratios higher than 1 are more enriched in aCanyons, while the others are more enriched in pCanyons.

Chromatin interaction at methylation canyons
Chromatin interactions were defined by pairs of Hi-C anchors. Then the overlap between anchor pairs and canyons was detected using bedtools v2.25.0 62 . As a control, genomic regions have the same length as canyons, but random genomic positions were generated using bedtools. For each anchor pairs, if both of them are located in the same canyon or random genomic region, it is called 'self-interacting'. If only one of them is located in a canyon or random region, it is called 'distantinteracting'.

Code availability
The open-source software for methylation competition is freely available at https://github.com/JiejunShi/methylation_competition.  Methylation competition ratios are on the left, average methylation ratios are on the middle, and methylation variation scores(entropy) are on the right. 0~25%, lowest expressed; 25~50%, lower expressed; 50~75%, higher expressed; 75~100%, highest expressed. The gene number of each group is 3,600. (b) The promoter/gene-body/enhancer methylation competition ratios (1st column) are strongly negatively correlated with gene expression level in CD3+ T-cells, and this correlation is stronger than that of the average methylation (2nd column) and the methylation variation (3rd column). Promoter regions are from 1kb upstream to 500bp downstream of TSS. Gene-body regions are from 500bp downstream of TSS to TTS. Enhancer regions are defined   Table 3, together with their target genes. (c) Functional enrichment analysis of aCanyon target genes, pCanyon target genes, and 1,000 randomly selected genes. Enriched gene counts in each group are indicated on the left side. P-values were measured by two-tailed Fisher's Exact Test and adjusted by the Benjamini-Hochberg method. (d) Spearman correlation between gene expression and average methylation of 100bp-bin in gene regions. Totally 120 bins from -2kb to 10kb were measured. (e) Relationship between fold enrichment and 5mC preference of TF motifs. Each dot represents a motif. Y-axis indicates the fold change (log2) between enrichment at aCanyon and enrichment at pCanyon of the same motif (see Methods). The X-axis shows the 5mC preference of motifs measured by the SELEX technique. 'methyl-plus' TFs prefer to bind methylated sequences, while binding of 'methyl-minus' TFs are not favored by 5mC. The list of 'methyl-plus' and 'methyl-minus' TFs are in Supplementary Table 4. Spearman's rank correlation was used. P-values were calculated by the two-tailed correlation test