Dna Methylation Analysis
To identify DNA methylation changes associated with MS, we performed EWAS on bisulfite treated genomic DNA isolated from whole blood (see Fig. 1 and methods section). A total of 583 people with MS (pwMS) and 643 controls from three independently collected groups were analyzed in this study. Further details of each group is in Supplementary table 1 and the methods section.
The discovery group was comprised of whole blood samples collected at baseline from The Australian Multi-Centre Study of Environment and Immune function (Ausimmune study)22. From this group, we used 208 MS cases and 402 matched non-MS controls. Most of this group entered the Ausimmune study after their first clinical diagnosis of CNS demyelination (FCD), were primarily treatment naïve at the time of sampling, and had mainly relapse onset phenotypes. We used whole blood samples from the baseline collection in cases who, at 10-years follow up, had converted to MS based on the 2017 McDonald criteria23. Controls were matched for age, sex and region22.
The first replication group was from Sweden and consisted of 140 MS cases and 139 controls. This dataset consisted of previously published data from prevalent MS cases who were primarily on treatment at the time of blood sampling, mostly relapse onset phenotype, and had been profiled using the Illumina 450K arrays16.
The second replication group consisted of 235 MS cases sourced from four Australian MS specialist clinics: Melbourne, Newcastle, Hobart and Adelaide. The cases in this group were exclusively of relapse onset, but longer disease duration (median 16 years) and were on a variety of treatments at sample collection24. This case group did not have controls so were matched with data from 102 non-MS controls from a publicly available dataset.
All DNA samples from the discovery groups and case samples from the second replication group were tested in-house for genome-wide methylation using the Illumina Infinium Human Methylation EPIC BeadChip (EPIC array) (Illumina Inc, SanDiego, CA, USA). The publicly available data set matched to the second replication group was also profiled using the EPIC array.
Whole blood EWAS identified major differential methylation at HLA genes
To identify DNAm changes in whole blood associated with MS, we first identified differentially methylated positions (DMPs) i.e., DNAm changes at single CpG sites. We used logistic regression with phenotype (case or control) as the outcome variable and incorporated major covariates in the model, such as age, sex, region, and cell-type proportion (see methods section). The mean difference in CpG methylation between the case and control group was used to assess the effect size (referred to as Δβ).
In the discovery group, we identified 3,218 DMPs, which met the genome-wide significance cut-off of P < 9.8x10− 8. Of these, 159 DMPs also met a delta beta cut off of Δβ ≥ 0.02 (an effect size of 2% or greater). To confirm these results, we investigated if these DMPs were also present in our two replication groups using an empirical p value of < 0.05 and a Δβ > 0.02. We found 18 DMPs that overlapped in both replication groups with the same directionality as the discovery group. The majority of these DMPs (13/18) localised to HLA-DRB1, HLA-DRB5 or HLA-DRB6 genes (Table 1). Of the remaining 5 DMPs, 4 localized to genes related to interferon pathways and one DMP localized to an unannotated location (supplementary Table 2). Because there were 32 individuals in the discovery group who were treated with beta interferon prior to sample collection, we determined if the DMP signal at these four genes was originating from interferon beta treated individuals. To do this, we removed the 32 samples from the analysis and repeated it. After removal, their effect size reduced and two of the four DMPs no longer reached genome-wide statistical significance (located on CMPK2 and PARP9, see Supplementary Table 2).
Table 1
The 13 DMPs at the HLA-DRB locus in all three groups. All DMPs with Δβ > 2% and genome-wide significant in the discovery group as well as FDR ≤ 0.05 and Δβ > 2% in both replication cohorts.
CpG | Chr | Position | Gene | Feature | Discovery Cohort Δβ | Discovery Cohort p-value | Replication Cohort 1 Δβ | Replication Cohort 1 p-value | Replication Cohort 2 Δβ | Replication Cohort 2 p-value |
---|
cg01341801 | 6 | 32489203 | HLA-DRB5 | Body | 0.192 | 3.43E-11 | 0.212 | 6.51E-07 | 0.276 | 1.94E-05 |
cg25140213 | 6 | 32522683 | HLA-DRB6 | Body | 0.187 | 3.14E-10 | 0.189 | 1.63E-06 | 0.238 | 4.55E-05 |
cg17369694 | 6 | 32485396 | HLA-DRB5 | 3'UTR | 0.155 | 2.65E-11 | 0.174 | 1.90E-06 | 0.217 | 1.46E-04 |
cg09139047 | 6 | 32552042 | HLA-DRB1 | Body | -0.135 | 1.36E-08 | -0.162 | 2.70E-06 | -0.175 | 2.70E-03 |
cg14645244 | 6 | 32552205 | HLA-DRB1 | Body | -0.122 | 7.18E-10 | -0.131 | 6.30E-07 | -0.128 | 2.46E-04 |
cg10632894 | 6 | 32552453 | HLA-DRB1 | Body | -0.12 | 6.55E-08 | -0.127 | 2.01E-06 | -0.207 | 5.35E-05 |
cg08845336 | 6 | 32551891 | HLA-DRB1 | Body | -0.119 | 4.01E-09 | -0.14 | 2.16E-06 | -0.137 | 4.94E-05 |
cg10104420 | 6 | 32526414 | HLA-DRB6 | Body | -0.107 | 1.96E-09 | -0.135 | 5.60E-07 | -0.071 | 3.11E-02 |
cg19575208 | 6 | 32551888 | HLA-DRB1 | Body | -0.105 | 3.55E-09 | -0.121 | 3.26E-06 | -0.128 | 9.10E-05 |
cg17416722 | 6 | 32554385 | HLA-DRB1 | Body | 0.096 | 4.67E-11 | 0.105 | 1.88E-06 | 0.14 | 2.72E-04 |
cg00440797 | 6 | 32493873 | HLA-DRB5 | Body | 0.079 | 5.40E-10 | 0.07 | 5.95E-05 | 0.124 | 1.24E-04 |
cg00579921 | 6 | 32489991 | HLA-DRB5 | Body | 0.064 | 7.06E-08 | 0.051 | 6.31E-04 | 0.101 | 6.14E-03 |
cg26036029 | 6 | 32552443 | HLA-DRB1 | Body | -0.061 | 4.72E-08 | -0.042 | 5.30E-05 | -0.076 | 8.69E-05 |
cg22930808 | 3 | 12228188 | PARP9 | 5'UTR | -0.043 | 6.66E-08 | -0.079 | 8.45E-08 | -0.088 | 5.90E-04 |
cg01028142 | 2 | 7004578 | CMPK2 | Body | -0.036 | 6.28E-12 | -0.039 | 4.77E-06 | -0.062 | 1.57E-05 |
cg01533966 | 1 | 90363165 | LRRC8D | 5'UTR | 0.03 | 1.41E-12 | 0.019 | 3.94E-02 | 0.03 | 2.47E-02 |
cg01311537 | 10 | 50386709 | TMEM273 | Body | -0.021 | 4.98E-08 | -0.017 | 7.78E-03 | -0.035 | 4.08E-03 |
cg10778971 | 14 | 94577101 | IFI27 | 5'UTR | -0.02 | 4.43E-12 | -0.014 | 1.15E-03 | -0.014 | 9.12E-03 |
Δβ = delta beta |
To increase statistical power, we combined all three datasets (see methods section). The combined dataset analysis identified 11,969 individual DMPs that met the genome-wide significance threshold of (P < 9.8x10− 8). Of these, 190 also had a Δβ > 0.02 and 81 were located at the MHC Class II locus (Supplementary Fig. 1). Of the total 11,969 individual DMPs, 7803 (65%) were hypermethylated, 9234 (77%) mapped to the immediate vicinity of genes, and 2116 (18%) were located at the transcription start site of genes. These 11,969 DMPs represented a total of 5,560 genes.
From the DMP list of the pooled dataset, we identified differentially methylated regions (DMRs) using the DMRcate package by defining a DMR as three or more DMPs separated by less than 1000bp. To ensure maximum biological relevance we filtered the list to contain only DMRs where at least one DMP had a Δβ greater than 0.1 (10%). After this filtering, nine independent DMRs remained, all of which were located in HLA genes (Table 2). They contained between five and 40 DMPs with effect sizes ranging from 1–18% (Δβ 0.01–0.18). There were four DMRs in HLA-DRB1 (three hypermethylated and one hypomethylated), two in HLA-DRB6 (both hypermethylated), and one each in HLA-DRB5, HLA-DQA (both hypermethylated) and HLA-DQB (hypomethylated).
Table 2
DMRs in the combined cohort.
Chromosome | Start | End | Width | Number of CpGs | Min Smoothed FDR | Maximum Difference | Mean Difference | Overlapping Genes | DMR Index |
---|
chr6 | 32548321 | 32550067 | 1747 | 10 | | 0.161 | 0.051 | HLA-DRB1 | 1 |
chr6 | 32551749 | 32552670 | 922 | 19 | | -0.167 | -0.077 | HLA-DRB1 | 2 |
chr6 | 32553920 | 32556093 | 2174 | 5 | | 0.109 | 0.059 | HLA-DRB1 | 3 |
chr6 | 32557422 | 32558459 | 1038 | 5 | | 0.126 | 0.029 | HLA-DRB1 | 4 |
chr6 | 32489203 | 32490444 | 1242 | 12 | | 0.182 | 0.056 | HLA-DRB5 | 5 |
chr6 | 32520615 | 32523136 | 2522 | 11 | | 0.152 | 0.064 | HLA-DRB6 | 6 |
chr6 | 32525805 | 32526702 | 898 | 12 | | 0.117 | 0.034 | HLA-DRB6 | 7 |
chr6 | 32604564 | 32610971 | 6408 | 24 | | 0.113 | 0.043 | HLA-DQA1 | 8 |
chr6 | 32632000 | 32636189 | 4190 | 40 | | -0.111 | -0.012 | HLA-DQB1 | 9 |
DNA methylation mediates genetic risk of HLA-DRB1
Given the strong genetic associations at HLA-DRB1 with MS we investigated if the four DMRs in this gene are influencing the relationship between genotype and phenotype (i.e., mediating the genetic effect through epigenetic alteration). Genotyping array data for cases and controls was available for both the discovery group and the first replication dataset to test this hypothesis. For each of the four DMRs we selected an index CpG (i.e., yielding the largest delta-beta) to serve as a proxy for the respective DMR. We first checked the methylation quantitative effect loci (mQTL) effect of HLA-DRB1 and found that regardless of status (case or control) the results reveal a striking relationship between methylation and risk haplotype at this locus (Supplementary Fig. 2).
We then performed mediation analysis with the causal inference test (CIT) method 25 to test for causality between HLA-DRB1 risk haplotype (casual factor), DMR methylation (mediator) and CDMS as the phenotype. CIT analysis allows the establishment of a chain of causality between genotype, methylation and MS, ensuring that the genotype effect on MS is mediated through methylation. We used rs3135388 as the tagging SNP for the main HLA-DRB1 haplotype. Results of the CIT showed that the methylation differences identified between pwMS and controls are driven entirely by genotype in both the discovery and the first replication group (P < 0.05) (Fig. 2). We did not have the same genotype information from all cohorts for the HLA-DRB5 haplotype; however, in the discovery group, the CIT HLA-DRB5 reached statistical significance using the tagging SNP rs73405151 (data not shown).
Methylation differences act independently of genetic risk
Given the strong correlation between methylation and genotype at the MHC region, we wanted to assess epigenetic risk acting independently from known genetic risk of MS. To do this we computed EWAS models controlling for the effect of both HLA-DRB1 risk haplotype and non-MHC SNPs identified in the 2019 IMSGC GWAS2, using a polygenic risk score (PRS). This was only performed in the discovery group given a) there was missing genetic data from other groups and b) the discovery group represents early-stage MS, which is an important attribute for risk assessment. From this analysis, we identified 1627 DMPs reaching statistical significance (P < 9.8x10− 8), 123 of which had a Δβ above 2% (See supplementary File 1). Interestingly, none of the DMPs localized to the HLA-DRB1 gene, further indicating that the epigenetic effects we had identified at this locus are linked to genetic variation. In order to assess if identified DMPs were modulated by genotype or most likely the result of lifestyle and/or environmental exposure, we tested each CpG for cis-acting methQTLs. Only 104 out of 1627 (6.4%) produced statistic evidence of modulation by a nearby genetic locus (see Fig. 3d). Over representation analysis (ORA) on the list of 1092 unique genes identified after genotype correction revealed emerging pathways in MS such as NOTCH signaling pathways and axon guidance pathways (see Fig. 3c).
Using these DMPs we then constructed a genetic risk corrected methylation risk score (grcMRS) for each subject. This grcMRS represents epigenetic risk of MS independent of known genetic risk in this group. We then compared the grcMRS to HLA-DRB1 risk haplotype and the PRS to assess how well each index distinguishes MS from healthy controls. First, we show that PRS and HLA-DRB1 risk haplotype are not correlated, indicating independence of these genetic risk factors. We subsequently constructed a total genetic risk score, incorporating both the PRS and HLA-DRB1 risk haplotype. As expected, the grcMRS correlated very weakly with total genetic risk (see Fig. 3a).
To assess how those scores can act as classifiers for MS, we calculated AUC-ROC curves (see Fig. 3b). We show that the total genetic risk score (HLA-DRB1 + PRS) provides moderate discriminatory accuracy (AUC = 0.72, 95% CI: 0.66–0.76). By comparison the grcMRS performed better at distinguishing MS from healthy controls (AUC = 0.85, 95% CI: 0.82–0.89). Combining both genotype and methylation scores only marginally improved the accuracy (AUC = 0.87, 95% CI: 0.84–0.91) as too did the addition of other covariates such as cell proportions, sex and age (AUC = 0.89, 95% CI: 0.86–0.92).
Cell Specific Differential Methylation Is Mainly Attributed To B Cells And Monocytes
To identify if specific immune cell types are responsible for the differential methylation identified in the whole blood samples, we employed a two-step approach using the combined dataset. First, we calculated cell proportion estimates in the combined dataset using the R package EpiDISH (24). This revealed a statistically significant decrease in the proportion of Natural Killer (NK) cells (meancase = 0.021, meancontrol = 0.031, P = 1.34e− 08) and CD8+ T cell proportions (meancase = 0.095, meancontrol = 0.106, P = 5.92e− 05) and a statistically significant increase in CD4+ T cell proportions (meancase= 0.081, meancontrol = 0.071, P = 1.58e− 03) in MS cases compared to controls. There were no differences in the proportion of monocytes, B cells and neutrophils between cases and controls (supplemental Fig. 3).
Second, we used these cell type proportions, in conjunction with CpG methylation (M) values, to calculate cell-type specific DMPs (csDMPs). Here we employed an adaptation of the functions in CellDMC 26. Although it accounts for all cell types simultaneously, the base CellDMC regression model can become overburdened when incorporating many cell-types (i.e., many terms), thus reducing power and preventing the identification of cell specific effects. To overcome this, we constructed a "per-cell type" linear model (see methods section).
Genome-wide analysis of cell specific methylation patterns revealed an over-representation of csDMPs in both B cells and monocytes. From highest number of csDMPs to lowest were B cells > monocytes > CD4+ T cells > CD8+ T cells > NK cells then neutrophils (Fig. 4a). Interestingly, csDMPs were predominantly hypermethylated in B cells (over 60%) while monocytes had a more evenly distributed set of csDMPs (Fig. 4a). Consistent with whole blood analysis, there were marked differential methylation effects at HLA, but notably, this effect exhibited obvious variation across cell subtypes (Fig. 3b). More detailed analysis of the cell-specific modelling revealed that the strong differential signal observed at the HLA region in whole blood originates primarily from monocytes and B cells but was also present to a lesser extent in T lymphocytes (Fig. 4). It was notably absent or nearly absent in both NK cells and neutrophils (Supplementary file 2).
Comparison of the four cell types with highest number of csDMPs, that met both the significance and effect size cut offs, (monocytes, B cells, CD4+ T cells and CD8+ T cells) revealed that both B cells and monocytes had largely a different set of csDMPs. There were 1165 CpGs unique to B cells, 771 CpGs unique to monocytes and 347 common csDMPs between the two. When we investigated all the cell types we found that CD4 + T cells, CD8 + T cells, monocytes and B cells shared 28 common csDMPs, with the same directionality between them located on 7 different genes: HLA-DQB1, CMPK2, IGF2R, HLA-DRB1, HLA-DRB6, HLA-DRB5 and HLA-DQA1 (Supplementary Fig. 4 and supplementary table 3).
Our replication groups had an older mean age and a longer disease duration than the discovery group. To examine if the predominant methylation signal coming from B cells and monocytes occurs early in disease pathology or is a result of long-standing disease, we performed the same analysis described above in only the discovery group. We found the majority of DMPs appeared to be from B cells and monocytes, with absent or nearly absent signals from both NK cells and neutrophils (supplementary Table 3). Similarly, there was differential methylation predicted to a lesser extent in T lymphocytes.
DNA methylation at HLA-DRB1 correlates with gene expression in B cells and monocytes
We did not have accompanying RNA for the whole blood groups used in this study to examine gene expression. Therefore, to determine if methylation was related to gene expression, we examined the relationship between methylation and gene expression using RNA extracted from isolated monocytes, CD19+ B cells and CD4+ T cells from our previously published study16, one unpublished study, and unpublished dataset (Maltby et al, unpublished) (see methods for group details). All three were cross-sectional, unrelated MS case-control studies.
We used Pearson’s correlation to estimate the relationship between methylation and expression levels of HLA-DRB1 in all three cell types (Fig. 5) and found a strong negative correlation between gene expression and methylation at DMR-2 located in exon 2 of HLA-DRB1. This data indicates hypermethylation at this locus is associated with decreased gene expression in these immune cell subtypes (Fig. 5). As expected, based on the DNA methylation levels, there was a strong correlation between HLA-DRB1 expression and methylation levels, but this correlation was not as strong in the CD4+ T cells.
Cell types are listed in rows, CpGs are in columns, except for column 1, which represents gene-of-interest expression in each cell type. The correlation coefficient (R) is represented through the circle size and colour. Larger circle area indicates high absolute R value (maximum 1). Red indicates negative values; blue indicates positive values. The black rectangle represents the hypomethylated DMR 2 at HLA-DRB1 identified in whole blood and Cell specific analysis
Over-representation analysis shows enrichment of DMPs at immunological pathways
In the combined whole blood group, we identified 190 DMPs with Δβ > 2%. These DMPs mapped to 92 unique genes (Supplementary file 3). Over-representation analysis (ORA) using this gene list identified 28 pathways. The ORA showed an enrichment in immunological pathways, including cytokine and interferon-related pathways as well as antigen presentation through class I and II MHC (supplementary Fig. 6).
Our cell specific analysis identified 1740 csDMPs in B cells and 1212 csDMPs in monocytes (Fig. 3a). We created a gene list for B cells and monocytes which contained genes associated with at least one csDMP in either cell type. We then used these resulting gene lists in combination with the whole blood gene list to perform over-representation analysis. All the pathways identified in B cells were also present in both whole blood and monocytes. However, the gene list from monocytes generated unique pathways. For example, cytokine signaling and IFN alpha/beta pathways identified in whole blood were only found in monocytes (Fig. 6).