The Smoothing Method for DNA Methylome Analysis Identifies Highly Accurate Epigenomic Signatures in Epigenome-Wide Association Studies

Abstract

Background: The genome-wide assessment of the DNA methylome has revolutionized our comprehension of epigenome alterations linked to complex human traits and diseases. The ability of epigenome-wide association studies (EWAS) to translate into biologically meaningful results relies on detecting epigenomic signatures with a high level of statistical certainty. However, the classical analyses of EWAS are prone to statistical inflation and bias, leading to spurious associations, particularly in case series with small sample sizes, such as those analyzing patients with rare inherited disorders.

Methods: Based on the co-methylation pattern of CpG dinucleotides within the CpG islands, we propose the smoothing method at the genome-wide level through a sliding window approach to calculate and visualize data from EWAS to decipher the most informative epigenetic alterations of EWAS with a high degree of accuracy.

Results: The smoothing method is a simple method that identifies epigenomic signatures with a high degree of certainty while controlling the risk of spurious findings outside the significant loci at a genome-wide level. We have systematically compared the smoothing method with a classical supervised approach in several EWAS settings, including two monogenic epigenetic diseases (epi-cblC and primary constitutional MLH1epimutation) and epigenetic predictors of aging. In the latter example, we showed that the smoothing method remained efficient even after applying an 80% reduction of the original sample size.

Conclusions:

The smoothing method for DNA methylation analyses is based on the biological correlate of the epigenome structure and identifies highly accurate epigenomic signatures in DNA methylation analyses. Its application to several settings of epigenome-wide analyses confirmed its usefulness for deciphering the most informative epigenomic signatures with a high degree of certainty while controlling the risk of spurious findings outside the significant loci at a genome-wide level.

Our results suggest revisiting EWAS by applying the smoothing method to already available datasets to re-analyze and potentially identify highly accurate epigenomic signatures that could translate into biologically meaningful results.

1. Introduction

The advent of genome-wide methods for assessing the epigenome landscape, notably the DNA methylome, has revolutionized our understanding of changes and alterations of epigenetic architecture in complex human traits and human diseases’, respectively [1]. Epigenome-wide association studies (EWAS) use microarrays to analyze the methylation of CpG dinucleotides (also called CpG probes) at the genome-wide level. The ability of EWAS to translate into biologically meaningful results relies on detecting epigenomic signatures with a high level of statistical certainty. However, EWAS may be prone to bias, potentially leading to spurious associations, notably in case series with small sample sizes [1, 2].

In a landmark study, Eckhardt et al. used bisulfite DNA sequencing to generate high-resolution methylation profiles of human chromosomes 6, 20, and 22, and showed that over 90% of CpG sites within 50 bases had the same methylation status [3–5]. Based on this concept of co-methylation pattern of CpG dinucleotides, notably within CpG islands, we proposed the smoothing method at the genome-wide level through a sliding window approach to calculate and visualize data from EWAS in order to decipher the most informative epigenetic alterations with a high degree of accuracy. We have illustrated the smoothing method by performing several EWAS from various settings, including two monogenic epigenetic diseases (epi-cblC and primary constitutional MLH1 epimutation) and an EWAS on epigenetic predictors of aging. To assess the magnitude of signal enhancement at the top significant loci and evaluate the benefit provided by the smoothing method, we used several metrics, including the signal-to-noise ratio (SNR), variance reduction, and variance ratio (VR), that demonstrated the method’s efficiency. Importantly, by applying the smoothing method, only the genomic regions carrying an aberrant epimutation related to monogenic disorders (e.g., MMACHC promoter CpG island and MLH1 CpG island) reached a top significant association without any other spurious association at the genome-wide level. Using the smoothing method in an EWAS of aging epigenetic predictors, we showed that it remained efficient even after applying an 80% reduction of the original sample size.

2. Materials And Methods

2.1. Example #1: epimutation of the MMACHC promoter CpG island (CpG:33)

2.1.1. DNA methylome analysis and quality controls

We carried out bisulfite conversion of 600 ng of DNA extracted from whole blood using the EZ DNA Methylation kit (Zymo Research, Proteigene, Saint-Marcel, France). The genome-wide profiling of DNA methylome was determined using the Infinium MethylationEPIC BeadChip array (Illumina, Paris, France), according to the manufacturer’s instructions. The Infinium MethylationEPIC BeadChip provides coverage of 850,000 CpG probes in enhancer regions, gene bodies, promoters, and CpG islands. The arrays were scanned on an Illumina iScan® system, and raw methylation data were extracted using Illumina’s Genome Studio methylation module. For each CpG probe, the methylation level was described as a β value, ranging between 0 (fully unmethylated CpG probe) and 1 (fully methylated CpG probe). Background correction and normalization were implemented using the SWAN method (R Package Minfi) [6]. Probe annotation information, including sequence and chromosome location for the Infinium MethylationEPIC BeadChip array, was retrieved from the Infinium MethylationEPIC v1.0 B5 Manifest File.

2.1.2. Description of cases and controls and EWAS design

We studied 16 subjects with proven MMACHC epimutation. All exhibited aberrant methylation on the CpG:33 on the MMACHC gene promoter (Chr1: 45,965,587–45,966,049, GRCh37). Of the 16 patients with MMACHC epimutation, five were previously reported [7]. For the control group, we used in silico data generated using the Infinium Human Methylation 450k BeadChip array from the MARTHA cohort, which included 350 unrelated European-ancestry patients [8]. In the EWAS, we compared the mean β value of each CpG probe between cases and controls using a t-test.

2.2. Example #2: epimutation of the MLH1 promoter CpG island (CpG:93)

We used DNA methylome data generated using the Infinium Human Methylation 450k BeadChip array using blood samples from 12 patients carrying MLH1 epimutation (CpG: 93, EPM2AIP1-MLH1), 61 patients with Lynch syndrome patients, and 41 healthy controls (GSE107351) [9]. In the EWAS, we compared the 12 MLH1 epimutation carriers (cases) with the 102 remaining patients without MLH1 epimutation from the series (controls). In the EWAS, we compared the mean β value of each CpG probe between cases and controls using a t-test.

2.3. Example #3: epigenetic predictors of aging

We used DNA methylome data generated using the Infinium Human Methylation 450k BeadChip array using blood samples from 656 subjects (426 Caucasians and 230 Hispanics), aged 19 to 101 (GSE40279) [10]. We performed an EWAS using linear regression to look for epigenetic predictors associated with age as a continuous dependent variable after adjusting for gender and ethnicity. We selected the CpG probes with a rho Spearman’s correlation coefficient ≥ 0.5 in association with age from the top significant loci. We elaborated the “Polyepigenetic CpG aging score” by calculating the sum of the beta values corresponding to the top CpG probes (n = 15). We assessed the correlation between the “Polyepigenetic CpG aging score” and age using Spearman’s rank correlation coefficient. To assess the distribution of the “Polyepigenetic CpG aging score” in an independent dataset, we used DNA methylome data generated using the Infinium Human Methylation 450k from 20 newborns and 20 nonagenarians (ArrayExpress, GEOD-30870) [11]. We compared the median value of the “Polyepigenetic CpG aging score” between newborns and nonagenarians using the Wilcoxon-Mann-Whitney test.

2.4. Quality controls of methylome array data

We visually inspected the genome-wide distribution of the CpG probes according to their β value. We assessed population stratification according to the whole methylome landscape by performing numeric principal component analysis on normalized beta values of each CpG probe across the methylation array. The top ten principal components (eigenvectors) were calculated with their respective eigenvalue.

2.5. Description of the smoothing method for calculating and visualizing data from EWAS results using nominal –Log10( P -values)

The smoothing method relies on the DNA co-methylation pattern, which is the correlation between adjacent CpG sites. The first description of the co-methylation pattern of CpG sites was reported by Eckhardt et al. using bisulfite DNA sequencing to generate high-resolution methylation profiles of human chromosomes 6, 20, and 22, providing a resource of about 1.9 million CpG methylation values derived from 12 different tissues [3]. In this landmark study, over 90% of CpG sites within 50 bases had the same methylation status [3]. Taking into account this assumption, we proposed the conversion of the nominal –Log10(P-value) obtained for each CpG probe at the genome-wide level to smoothed –Log10(P-values) using a pre-specified window width (w) value ranging between 1 and 3 (Smoothed[P_k,w]) (Figure S1). We used a symmetric smoothing method so that the Smoothed(P_k,w) of each CpG probe corresponds to the mean of the –Log10(P-values) of the central CpG probe (k) and those located on either side of the central CpG probe within the limit of the pre-defined window width, through a sliding window approach, according to the following formula:

$${S}{m}{o}{o}{t}{h}{e}{d}({P}{k},{w})={\sum }_{{n}={k}-{w}}^{{n}={k}+{w}}\left(-{L}{o}{g}10\left[{P}-{v}{a}{l}{u}{e}\right]\right) ÷(2{w}+1)$$

, where k corresponds to the central CpG probe and w to the number of CpG probes on either side of the central CpG probe (window width). The Infinium Human Methylation 450k BeadChip array encompasses 482,421 CpG probes and 26,658 CpG islands with an average number of 5.63 CpG probes per CpG island [5]. Considering this resolution, we estimated the optimal smoothing width at 2 since this value corresponds to a total number of 5 CpG sites (2w + 1) to calculate the Smoothed(P_k,w) value.

To assess the magnitude of signal enhancement at a given top significant locus, we used several metrics, including the signal-to-noise ratio (SNR), variance reduction, and variance ratio (VR). For a given top significant locus, the signal-to-noise ratio (SNR) was defined as the ratio of the power of a signal (meaningful information at the top locus) and the power of background noise (unwanted signal in the remaining genomic regions at the genome-wide level) according to the following formula [12]:

$$\mathbf{S}\mathbf{i}\mathbf{g}\mathbf{n}\mathbf{a}\mathbf{l}-\mathbf{t}\mathbf{o}-\mathbf{n}\mathbf{o}\mathbf{i}\mathbf{s}\mathbf{e} \mathbf{r}\mathbf{a}\mathbf{t}\mathbf{i}\mathbf{o} \left(\mathbf{S}\mathbf{N}\mathbf{R}\right)=\frac{{\mu } \left(\mathbf{s}\mathbf{i}\mathbf{g}\mathbf{n}\mathbf{a}\mathbf{l}\right)}{{\sigma } \left(\mathbf{n}\mathbf{o}\mathbf{i}\mathbf{s}\mathbf{e}\right)}$$

, where ‘µ (signal)’ corresponds to the mean of Smoothed(P_k,w) of the CpG probes located in the analyzed top significant locus and ‘σ (noise)’ to the standard deviation of Smoothed(P_k,w) of all the CpG probes located outside the analyzed top significant locus at the genome-wide level, using the same window width value (w).

We calculated the variance ratio according to the following formula:

$$\mathbf{V}\mathbf{a}\mathbf{r}\mathbf{i}\mathbf{a}\mathbf{n}\mathbf{c}\mathbf{e} \mathbf{r}\mathbf{a}\mathbf{t}\mathbf{i}\mathbf{o} \left(\mathbf{V}\mathbf{R}\right)=\frac{{V}{A}{R} \left(\mathbf{S}\mathbf{m}\mathbf{o}\mathbf{o}\mathbf{t}\mathbf{h}\mathbf{e}\mathbf{d}\right[{P}{k},{w}\left]\right) \left(\mathbf{s}\mathbf{i}\mathbf{g}\mathbf{n}\mathbf{a}\mathbf{l}\right)}{{V}{A}{R} \left(\mathbf{S}\mathbf{m}\mathbf{o}\mathbf{o}\mathbf{t}\mathbf{h}\mathbf{e}\mathbf{d}\right[{P}{k},{w}\left]\right) \left(\mathbf{n}\mathbf{o}\mathbf{i}\mathbf{s}\mathbf{e}\right)}$$

, where ‘VAR (Smoothed[P_k,w]) (signal)’ corresponds to the variance calculated from the Smoothed[P_k,w] of the CpG probes located in the studied genomic region and ‘VAR (Smoothed[P_k,w]) (noise)’ to the variance calculated from the Smoothed[P_k,w] of all the CpG probes located outside the studied genomic region at the genome-wide level, using the same window width value (w).

We reported the results of epigenome-wide association studies using epi-Manhattan plots for nominal and smoothing transformed –Log10(P-values). In the epi-Manhattan illustrating the smoothing transformation of –Log10(P-values), we used (2×σ²) as a significance threshold value, where σ² corresponds to ‘VAR (Smoothed[Pk,w]) (noise)’, as previously described. The smoothing method and all statistical analyses were performed using the SNP & Variation Suite (v8.8.1; Golden Helix, Inc., Bozeman, MT, USA) and MedCalc, version 19.5.3 (MedCalc Software, Ostend, Belgium).

3. Results

3.1. Example #1: epimutation of the MMACHC promoter CpG island (CpG:33)

DNA methylome analysis and quality controls

All the DNA methylome profiles of the 16 subjects with an MMACHC promoter epimutation (CpG island CpG:33) were of optimal quality and were used for statistical analyses. A total of 449,517 CpG probes were analyzed. As shown in the epi-Manhattan plot, the EWAS using untransformed –Log10(P-values) did not retrieve any top significant locus at the genome-wide level (Fig. 1A).

After applying a smoothing transformation at the genome-wide level using a window width of 1 (three CpG probes in each window), we observed a top significance at the MMACHC promoter CpG island (CpG:33) without any other DNA methylome signature at the genome-wide level. The SNR_{MMACHC/outside_MMACHC} was increased by ~ 50% (SNR = 16.2) and the VR_{MMACHC/outside MMACHC} increased by + 95% (VR = 12.7). Using this window width, the median Smoothed(P_k,1) (k: from cg03123370 to cg15896098) of the MMACHC promoter CpG island (CpG:33) was 161 (IQR, 129–177) vs. 9 (IQR, 5–15) for the genomic regions outside the CpG:33 of MMACHC (Tables 1–2 and Fig. 1B). The increase in VR_{MMACHC/outside_MMACHC} was related to a -61% decrease in noise variance (outside CpG:33, MMACHC) that was concomitant with a decrease but to a much lesser extent (-23%) of the signal variance (CpG:33, MMACHC) (Table 1 and Fig. 2).

Table 1

Untransformed –Log10(P-values) and their smoothed transformation using a window width of 1, 2, and 3 in the CpG island of the *MMACHC* promoter (CpG:33) and outside this genomic region
	Minimum	Maximum	Mean	Median	IQR, 25th − 75th	Variance	SD	Variance reduction (%)	Signal-to-noise Ratio*	Variance Ratio †
Genomic region: Outside CpG:33 (MMACHC) (449,507 CpG probes)
Untransformed –Log10 P-value (window width = 0; 1 CpG probe)	0	311	11	6	2–14	225	15	—	—	—
Smoothed –Log10 P-value (window width = 1; 3 CpG probes)	0	255	11	9	5–15	89	9	-60%	—	—
Smoothed –Log10 P-value (window width = 2; 5 CpG probes)	0	212	11	10	6–14	45	7	-80%	—	—
Smoothed –Log10 P-value (window width = 3; 7 CpG probes)	0	209	11	10	7–14	37	6	-84%	—	—
Genomic region: CpG:33 (MMACHC) (10 CpG probes)
Untransformed –Log10 P-value (window width = 0; 1 CpG probe)	104	217	163	165	133–193	1467	38	—	10.9	6.5
Smoothed –Log10 P-value (window width = 1; 3 CpG probes)	89	197	153	161	129–177	1128	34	-23%	16.2	12.7
Smoothed –Log10 P-value (window width = 2; 5 CpG probes)	100	181	139	140	114–159	810	28	-45%	20.7	18.0
Smoothed –Log10 P-value (window width = 3; 7 CpG probes)	103	167	135	131	121–150	417	20	-72%	22.1	11.2
Note. IQR: interquartile range; SD: standard deviation.
* Signal-to-noise ratio is defined as the ratio of the power of a signal (meaningful information) and the power of background noise (unwanted signal) according to the following formula: SNR = Mean of –Log10(P-values) (CpG:33, MMACHC gene) / SD of –Log10(P-values) (outside CpG:33, MMACHC gene).
† Variance-ratio = Variance of –Log10(P-values) (CpG:33, MMACHC gene) / Variance of –Log10(P-values) (outside CpG:33, MMACHC gene).

Table 2

Results of the epigenome-wide association study comparing 11 patients with *MMACHC* epimutation and controls regarding the CpG probes located in the *MMACHC* promoter CpG island (CpG:33)
CpG probe	Chr.	Position*	t-test P-value	Mean β-value, epi-cblC	Mean β-value, Controls	–Log10 P-value (Untransformed) (WW = 0)	Smoothed(P_k,1) (WW = 1) ‡	Smoothed(P_k,2) (WW = 2) ‡	Smoothed(P_k,3) (WW = 3) ‡
cg03123370†	1	45965343	1.43×10^–133	0.51	0.11	133	89	100	119
cg04700938†	1	45965449	4.98×10^–130	0.44	0.06	129	122	108	122
cg13848568	1	45965625	3.54×10^–105	0.36	0.04	104	127	131	120
cg00081251	1	45965679	5.99×10^–148	0.51	0.07	147	151	149	137
cg09323228	1	45965727	2.02×10^–203	0.57	0.05	203	181	156	160
cg27393325	1	45965846	7.18×10^–194	0.56	0.02	193	176	169	167
cg22536808	1	45965870	1.03×10^–133	0.40	0.00	133	170	181	152
cg14836864	1	45965990	7.49×10^–184	0.49	0.03	183	178	160	141
cg03108114	1	45966048	2.77×10^–217	0.59	0.02	217	197	131	125
cg15896098	1	45966115	2.11×10^–192	0.56	0.03	192	136	105	103
Note. Chr.: chromosome; WW: window width.
* Genomic positions according to GRCh37 (hg19).
† The two CpG probes, cg03123370 and cg04700938, were in the vicinity of the CpG island:33 and were included in the analysis of the MMACHC locus.
‡ The median and IQR –Log10(P-values) much lower in the CpG probes outside the CpG:33, MMACHC gene were equal to 6 (IQR, 2–14), 9 (5–15), 10 (6–14), and 10 (7–14) for the untransformed –Log10(P-values) and smoothed –Log10(P-values) using a window width of 1, 2, or 3, respectively.

The smoothing transformation using a window width of 2 (five CpG probes in each window) achieved a + 90% increase of the SNR_{MMACHC/outside_MMACHC} (20.7) and a + 176% increase in the VR_{MMACHC/outside_MMACHC} (18.8). Using this window width, the median Smoothed(P_k,2) of the MMACHC promoter CpG island (CpG:33) was 140 (IQR, 114–159) vs. 10 (IQR, 6–14) for the genomic regions outside the CpG:33 of MMACHC (Tables 1–2 and Fig. 1C). Using the window width of 2, the decrease in noise variance was − 80%, while that of the signal variance (CpG:33, MMACHC) was − 45% (Table 1 and Fig. 2).

The smoothing transformation using a window width of 3 (seven CpG probes in each window) provided a 103% increase of the SNR_{MMACHC/outside_MMACHC} (22.1) and a + 73% increase in the VR_{MMACHC/outside_MMACHC} (11.2) (Table 1 and Fig. 1D). Using this window width, the median Smoothed(P_k,3) of the MMACHC promoter CpG island (CpG:33) was 131 (IQR, 121–150) vs. 10 (IQR, 7–14) for the genomic regions outside the CpG:33 of MMACHC (Tables 1–2 and Fig. 1D). Using the window width of 3, the decrease in noise variance was − 84%, while that of the signal variance (CpG:33, MMACHC) was − 72% (Table 1 and Fig. 2).

3.2. Example #2: epimutation of the MLH1 promoter CpG island (CpG:93)

The EWAS using untransformed –Log10(P-values) retrieved a significant association in the EPM2AIP1–MLH1 CpG island (CpG:93, gene promoter) (Fig. 3A) with a hemimethylated profile among patients harboring the CpG:93 epimutation when compared to controls who had a fully unmethylated CpG:93 (Fig. 3B). The smoothing transformation using a window width of 3 maintained the top significance at CpG:93, reduced the noise variance in the remaining genomic regions, and did not retrieve any other DNA methylome signature at the genome-wide level (Fig. 3A).

3.3. Example #3: epigenetic predictors of aging

Using linear regression EWAS, we looked for epigenetic predictors associated with age as a continuous dependent variable after adjusting for gender and ethnicity. The EWAS using untransformed –Log10(P-values) found no clear association at the genome-wide level (Fig. 4A). After applying the smoothing transformation and a window width of 3, we retrieved a top significance at four loci: ZYG11A (CpG:112, gene promoter), FHL2 (CpG:81, gene promoter), ELOVL2/ELOVL2-AS1 (CpG:141 bidirectional promoter of the head-to-head sense-antisense gene pair), and KLF14 (CpG:156, gene promoter) (Fig. 4B). Fifteen CpG probes from the four top significant loci were highly and positively correlated with age (Supplemental Table 1). The ‘Polyepigenetic CpG aging score’ based on the 15 top CpG was highly and positively correlated with age (Spearman’s rho = 0.89; 95% CI, 0.88–0.90; P = 3.00×10^–219) (Fig. 4C). Notably, on the four top loci retrieved in the present analysis, two were highlighted in the original published study (ELOVL2, KLF14), and two remaining loci have been very well documented in later studies as being involved in age-related epigenomic signatures [13–19]. We performed a sensitivity analysis on independent sub-samples by randomly selecting five subgroups, each of them representing 20% of the total population (Supplemental Table 2). Using a Smoothed(P_k,3) threshold of 10, all the five sensitivity analyses confirmed the FHL2, ELOVL2/ELOVL2-AS1, and KLF14 loci (Fig. 5 and Supplemental Fig. 2). Using the same Smoothed(P_k,3) threshold, the ZYGA11 locus was confirmed in three EWAS (Fig. 5 and Supplemental Fig. 2). In the replication EWAS, the Polyepigenetic CpG aging score differed significantly between newborns and nonagenarians with a median score of 2.00 (IQR, 1.89–2.08) and 5.60 (IQR, 5.52–5.72), respectively (P = 3.40×10^–8) (Fig. 6).

4. Discussion

Several analytical methods have been developed to identify epigenomic signatures in EWAS [20]. Among them, the supervised approaches based on differentially methylated region (DMR) calling and the intelligent classifiers represent the most widely used [21–32]. The DMR calling methods are based on CpG annotations and use conventional statistical approaches (e.g., Student’s t-test [33]) or more elaborated methods (e.g., beta-binomial regression fit [34] or Shannon entropy [35]). All these methods and algorithms are well detailed in the review by Fan et al. [26], and as suggested by the authors, the supervised methods are in line with the classical paradigms of study power and multiple testing correction [26]. The smoothing method utilizes the concepts of signal processing theory to mirror the biological reality of the DNA methylome architecture. Formerly, several smoothing methods have been developed as mathematical models for signal de-noising, notably in engineering, spectrometry, and image processing [36–40].

Jaffe et al. reported a ‘bump hunting’ approach for DNA methylome analysis that relies on the co-methylation pattern of CpG probes [22]. They modeled the methylation level of CpG probes as a smooth function of genomic position and they implemented a multistep approach to identify differentially methylated regions based on a logit-transformation of methylation levels of CpG probes depending on the outcome of interest, an estimation of smoothed beta-values, and statistical testing to assess the statistical uncertainty to each estimated differentially methylated region [22]. In this way, the smoothing method we propose in the setting of EWAS analyses overcomes both statistical power and adjustment to multiple testing constraints since it handles nominal P-values to reconstruct an updated epi-Manhattan plot after signal processing. We showed that by applying the smoothing method, we optimized the signal-to-noise ratio, which allowed highlighting potentially relevant epigenomic signatures. The optimization of the signal-to-noise ratio resulted from a downward shift (shift to the left) of the distribution of the –Log10(P-values) at the genome-wide level, allowing background noise to be reduced while maintaining the strength of the signal (Supplemental Fig. 3). Interestingly, the signal-to-noise ratio approach has been tested to uncover de novo locus-disease discoveries in genome-wide association studies. It was suggested as a potential alternative to multiple testing correction approaches that could result in an over-conservative correction and a reduced power to detect true associations [41, 42].

CONCLUSIONS

We demonstrated that the smoothing method effectively uncovers epigenomic signatures in monogenic epigenetic disorders with small sample size datasets but also in large sample size EWAS on complex phenotypes. Importantly, in the case of large datasets, the smoothing method provided similar results even after applying an 80% reduction of the original sample size. Our results suggest revisiting EWAS by applying the smoothing method to already available datasets to re-analyze and potentially identify highly accurate epigenomic signatures that could translate into biologically meaningful results.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and material

Example #1: epimutation of the MMACHC promoter CpG island (CpG:33): Anonymized patient data are available for use in collaborative studies to researchers upon reasonable request. Data will be provided following the review and approval of a research proposal (including a statistical analysis plan) and the completion of a data-sharing agreement. Responses to the request for the raw data will be judged by the IRB of the INSERM UMR_S 1256.
Example #2: epimutation of the MLH1 promoter CpG island (CpG:93): GSE107351.
Example #3: epigenetic predictors of aging: GSE40279 and GEOD-30870.

Competing interests

The authors declare no conflict of interest.

Funding

INSERM UMR_S 1256, Nutrition, Genetics, and Environmental Risk Exposure.

Authors’ contributions

Conceptualization: AO, DAT, JLG, Formal Analysis: AO, Methodology: AO, DAT, JLG, Supervision: AO, Validation: AO, DAT, JLG, Writing – original draft: AO, DAT, JLG, Writing – review & editing: AO, DAT, JLG.

Acknowledgements

The authors thank all the persons who contributed to the original study on patients with epi-cblC [7]: Javad Nadaf, David Coelho, Thomas Josse, Justine Flayac, Aurélie Robert, Isabelle Koscinski, Isabelle Gastin, Pierre Filhine-Tresarrieu, Mihaela Pupavac, Alison Brebner, David Watkins, Tomi Pastinen, Alexandre Montpetit, Fadi Hariri, Benjamin A Raby, Wendy K. Chung, D. Sean Froese, Matthias R. Baumgartner, Jean-François Benoist, Can Ficicioglu, Virginie Marchand, Yuri Motorin, Chrystèle Bonnemains, François Feillet, Jacek Majewski, and David S. Rosenblatt.
The MARTHA project was supported by a grant from the “Programme Hospitalier de Recherche Clinique”, and the methylation array typing was funded by the Canadian Institutes of Health Research (grant MOP 86466) and the Heart and Stroke Foundation of Canada (grant T6484).

References

Michels KB, Binder AM, Dedeurwaerder S, Epstein CB, Greally JM, Gut I, Houseman EA, Izzi B, Kelsey KT, Meissner A et al: Recommendations for the design and analysis of epigenome-wide association studies. Nat Methods 2013, 10(10):949-955.
van Iterson M, van Zwet EW, Consortium B, Heijmans BT: Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biol 2017, 18(1):19.
Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA et al: DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet 2006, 38(12):1378-1385.
Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y, Pritchard JK: DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol 2011, 12(1):R10.
Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL et al: High density DNA methylation array with single CpG site resolution. Genomics 2011, 98(4):288-295.
Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, Irizarry RA: Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics (Oxford, England) 2014, 30(10):1363-1369.
Gueant JL, Chery C, Oussalah A, Nadaf J, Coelho D, Josse T, Flayac J, Robert A, Koscinski I, Gastin I et al: Publisher Correction: A PRDX1 mutant allele causes a MMACHC secondary epimutation in cblC patients. Nat Commun 2018, 9(1):554.
Dick KJ, Nelson CP, Tsaprouni L, Sandling JK, Aissi D, Wahl S, Meduri E, Morange PE, Gagnon F, Grallert H et al: DNA methylation and body-mass index: a genome-wide analysis. Lancet 2014, 383(9933):1990-1998.
Damaso E, Castillejo A, Arias MDM, Canet-Hermida J, Navarro M, Del Valle J, Campos O, Fernandez A, Marin F, Turchetti D et al: Primary constitutional MLH1 epimutations: a focal epigenetic event. Br J Cancer 2018, 119(8):978-987.
Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, Klotzle B, Bibikova M, Fan JB, Gao Y et al: Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell 2013, 49(2):359-367.
Heyn H, Li N, Ferreira HJ, Moran S, Pisano DG, Gomez A, Diez J, Sanchez-Mut JV, Setien F, Carmona FJ et al: Distinct DNA methylomes of newborns and centenarians. Proc Natl Acad Sci U S A 2012, 109(26):10522-10527.
Bushberg JT, Boone JM: The essential physics of medical imaging: Lippincott Williams & Wilkins, 2011.
Naue J, Hoefsloot HCJ, Mook ORF, Rijlaarsdam-Hoekstra L, van der Zwalm MCH, Henneman P, Kloosterman AD, Verschure PJ: Chronological age prediction based on DNA methylation: Massive parallel sequencing and random forest regression. Forensic Sci Int Genet 2017, 31:19-28.
Florath I, Butterbach K, Muller H, Bewerunge-Hudler M, Brenner H: Cross-sectional and longitudinal changes in DNA methylation with age: an epigenome-wide analysis revealing over 60 novel age-associated CpG sites. Hum Mol Genet 2014, 23(5):1186-1201.
Tan Q, Heijmans BT, Hjelmborg JV, Soerensen M, Christensen K, Christiansen L: Epigenetic drift in the aging genome: a ten-year follow-up in an elderly twin cohort. Int J Epidemiol 2016, 45(4):1146-1158.
Jung SE, Lim SM, Hong SR, Lee EH, Shin KJ, Lee HY: DNA methylation of the ELOVL2, FHL2, KLF14, C1orf132/MIR29B2C, and TRIM59 genes for age prediction from blood, saliva, and buccal swab samples. Forensic Sci Int Genet 2019, 38:1-8.
Bacalini MG, Deelen J, Pirazzini C, De Cecco M, Giuliani C, Lanzarini C, Ravaioli F, Marasco E, van Heemst D, Suchiman HED et al: Systemic Age-Associated DNA Hypermethylation of ELOVL2 Gene: In Vivo and In Vitro Evidences of a Cell Replication Process. J Gerontol A Biol Sci Med Sci 2017, 72(8):1015-1023.
Garagnani P, Bacalini MG, Pirazzini C, Gori D, Giuliani C, Mari D, Di Blasio AM, Gentilini D, Vitale G, Collino S et al: Methylation of ELOVL2 gene as a new epigenetic marker of age. Aging Cell 2012, 11(6):1132-1134.
Kananen L, Marttila S, Nevalainen T, Jylhava J, Mononen N, Kahonen M, Raitakari OT, Lehtimaki T, Hurme M: Aging-associated DNA methylation changes in middle-aged individuals: the Young Finns study. BMC Genomics 2016, 17:103.
Mallik S, Odom GJ, Gao Z, Gomez L, Chen X, Wang L: An evaluation of supervised methods for identifying differentially methylated regions in Illumina methylation arrays. Brief Bioinform 2019, 20(6):2224-2235.
Pedersen BS, Schwartz DA, Yang IV, Kechris KJ: Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. Bioinformatics 2012, 28(22):2986-2988.
Jaffe AE, Murakami P, Lee H, Leek JT, Fallin MD, Feinberg AP, Irizarry RA: Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int J Epidemiol 2012, 41(1):200-209.
Zou J, Lippert C, Heckerman D, Aryee M, Listgarten J: Epigenome-wide association studies without the need for cell-type composition. Nat Methods 2014, 11(3):309-311.
Li H, Hong G, Xu H, Guo Z: Application of the rank-based method to DNA methylation for cancer diagnosis. Gene 2015, 555(2):203-207.
Lehne B, Drong AW, Loh M, Zhang W, Scott WR, Tan ST, Afzal U, Scott J, Jarvelin MR, Elliott P et al: A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol 2015, 16:37.
Fan S, Chi W: Methods for genome-wide DNA methylation analysis in human cancer. Brief Funct Genomics 2016, 15(6):432-442.
Chen J, Behnam E, Huang J, Moffatt MF, Schaid DJ, Liang L, Lin X: Fast and robust adjustment of cell mixtures in epigenome-wide association studies with SmartSVA. BMC Genomics 2017, 18(1):413.
Heiss JA, Breitling LP, Lehne B, Kooner JS, Chambers JC, Brenner H: Training a model for estimating leukocyte composition using whole-blood DNA methylation and cell counts as reference. Epigenomics 2017, 9(1):13-20.
Rahmani E, Schweiger R, Shenhav L, Wingert T, Hofer I, Gabel E, Eskin E, Halperin E: BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol 2018, 19(1):141.
Zheng SC, Breeze CE, Beck S, Teschendorff AE: Identification of differentially methylated cell types in epigenome-wide association studies. Nat Methods 2018, 15(12):1059-1066.
Saffari A, Silver MJ, Zavattari P, Moi L, Columbano A, Meaburn EL, Dudbridge F: Estimation of a significance threshold for epigenome-wide association studies. Genet Epidemiol 2018, 42(1):20-33.
Srivastava A, Karpievitch YV, Eichten SR, Borevitz JO, Lister R: HOME: a histogram based machine learning approach for effective identification of differentially methylated regions. BMC Bioinformatics 2019, 20(1):253.
Bibikova M, Chudin E, Wu B, Zhou L, Garcia EW, Liu Y, Shin S, Plaia TW, Auerbach JM, Arking DE et al: Human embryonic stem cells have a unique epigenetic signature. Genome Res 2006, 16(9):1075-1083.
Hebestreit K, Dugas M, Klein HU: Detection of significantly differentially methylated regions in targeted bisulfite sequencing data. Bioinformatics 2013, 29(13):1647-1653.
Xie H, Wang M, de Andrade A, Bonaldo Mde F, Galat V, Arndt K, Rajaram V, Goldman S, Tomita T, Soares MB: Genome-wide quantitative assessment of variation in DNA methylation patterns. Nucleic Acids Res 2011, 39(10):4099-4108.
Angenent S, Pichon E, Tannenbaum A: Mathematical methods in medical image processing. Bulletin of the American mathematical society 2006, 43(3):365-396.
Kuan DT, Sawchuk AA, Strand TC, Chavel P: Adaptive noise smoothing filter for images with signal-dependent noise. IEEE transactions on pattern analysis and machine intelligence 1985(2):165-177.
O'Haver T, Begley T: Signal-to-noise ratio in higher order derivative spectrometry. Analytical Chemistry 1981, 53(12):1876-1878.
Czanner G, Sarma SV, Ba D, Eden UT, Wu W, Eskandar E, Lim HH, Temereanca S, Suzuki WA, Brown EN: Measuring the signal-to-noise ratio of a neuron. Proc Natl Acad Sci U S A 2015, 112(23):7141-7146.
Qi J, Chen C, Zhang X, Hu X, Ji S, Kwok RTK, Lam JWY, Ding D, Tang BZ: Light-driven transformable optical agent with adaptive functions for boosting cancer surgery outcomes. Nat Commun 2018, 9(1):1848.
Jiang X, Barmada MM, Becich MJ: Evaluating de novo locus-disease discoveries in GWAS using the signal-to-noise ratio. AMIA Annu Symp Proc 2011, 2011:617-624.
Martin LJ, Gao G, Kang G, Fang Y, Woo JG: Improving the signal-to-noise ratio in genome-wide association studies. Genet Epidemiol 2009, 33 Suppl 1:S29-32.