Genome-wide analysis of DNA methylation in hypertension-discordant monozygotic twins

Background: DNA methylation has great potential for clarifying the aetiology of hypertension. The aim of this study was to explore the correlation between hypertension and DNA methylation using twins discordant for hypertension in China. Methods: We recruited 43 pairs of monozygotic twins discordant for hypertension (age 31.9-72.3 years; 67.4% male) from the Chinese National Twin Registry. Genome-wide DNA methylation was measured in whole-blood-derived DNA using the Illumina HumanMethylationEPIC BeadChip. Standardized questionnaires were used to collect data on age, gender, socioeconomic level, and lifestyle factors (smoking, alcohol drinking, vegetable intake, and physical activity). Blood pressure, height, weight, and other anthropometric indicators were obtained by physical examination. An empirical Bayes paired moderated t-test was used to compare methylation within twin pairs. Results: Four suspected significant methylation sites were identified: cg00950476, cg08041400, cg26733338, and cg08580087. All four sites are located in known loci: LINC01252, BDP1, SYT1, and ODZ4, respectively. The main functions of these genes include transcriptional regulation, learning and cognition, and neurodevelopment. Replication of the significant sites was attempted in two different populations: the first contained 38 hypertension-concordant and 38 non-hypertension-concordant monozygotic twin pairs matched for age, sex, region, and birth order, and the second included 21 MZ twin pairs discordant for hypertension. None of the sites, however, was significant in replication. Methylation variation at these sites may influence blood pressure independently of genetic and early-life environmental factors. Conclusions: This study found four suspected methylation sites correlated with hypertension. However, all four sites failed the replication analysis.
More hypertension-discordant monozygotic twin pairs are needed to replicate these findings in


Study Importance Questions
-What is already known about this subject?
Numerous studies have focused on the correlation between hypertension and DNA methylation, and tens of hypertension-correlated methylation sites at specific gene loci have been found.
Few DNA methylation studies worldwide have been based on twin populations, and evidence from China was lacking.
-What does our study add?
The four suspected methylation sites correlated with hypertension (cg00950476, cg08041400, cg26733338, and cg08580087) failed the replication analysis in this study, which might be related to the imperfect replication design; they nevertheless warrant further replication.
Genetic factors may influence DNA methylation. Disease-discordant monozygotic twin pairs are the ideal subjects when exploring potential contributions of DNA methylation to the aetiology of hypertension with minimum confounding from genetic heterogeneity.
To our knowledge, this study is the first to explore the correlation between hypertension and DNA methylation using twins discordant for hypertension in China, and it has the largest sample size among genome-wide methylation studies of hypertension in twins.
Monozygotic (MZ) twin pairs, which share a highly matched genetic structure and early environmental factors, provide an ideal model for exploring potential contributions of DNA methylation to the aetiology of hypertension with minimum confounding from genetic heterogeneity 8.
Analysis of discordant MZ twins has been successfully used to study epigenetic mechanisms in hypertension 9. However, sample sizes were mostly limited to dozens of pairs, and evidence from Asian twins was lacking.
In this study, we examined genome-wide DNA methylation using the Illumina HumanMethylationEPIC BeadChip in whole blood from 43 pairs of MZ twins discordant for hypertension. We then attempted to replicate the significant cytosine-phosphate-guanine sites (CpG sites, where DNA methylation occurs) in two different replication populations (described in detail in the Methods section).

Results
According to the criteria in this study, a total of 43 MZ twin pairs discordant for hypertension were analysed in the discovery population. The age of the twins ranged from 32 to 72 years, with a mean of 53 years. There were 29 (67.4%) pairs of male twins. The first replication group included 76 cases and 76 controls, and the second replication group included 21 MZ twin pairs discordant for hypertension. No significant difference in socioeconomic level was found among the discovery population and replication populations 1 and 2. Antihypertensive drugs had been taken within the past month by 18 (29.8%) twins in the discovery population, 41 (27.0%) twins in replication population 1, and 5 (11.9%) twins in replication population 2. Table 1 shows the significant differences among the discovery population and replication populations 1 and 2. Notably, the difference in blood pressure between cases and controls in replication population 1 was larger than that in the discovery population. The average blood pressure difference (SBP difference plus DBP difference) between cases and controls was 64 mm Hg (range 11-151 mm Hg). Only four case-control pairs (5.3%) had a blood pressure difference of less than 20 mm Hg.

DNA methylation correlation analysis
Multivariate model 1 adjusted for smoking, alcohol consumption, and SVA surrogate variables; model 2 additionally adjusted for BMI, with the surrogate variables recalculated. Figure 1a displays the volcano plot of paired model 1, which depicts the genome-wide DNA methylation difference at each CpG site between hypertensive twins and their non-hypertensive co-twins. Only one CpG site showed P < 10^-6; however, after correction for multiple testing, no CpG site remained significant at the threshold of FDR < 0.05. To enhance the power to discover DNA methylation sites correlated with hypertension, we then restricted the empirical Bayes paired moderated t-test to twin pairs whose blood pressure difference (SBP difference plus DBP difference) was ≥10 mm Hg, ≥20 mm Hg, ≥30 mm Hg, or ≥40 mm Hg, respectively.
Three CpG sites with FDR < 0.05 were found in twin paired model 1 among pairs with blood pressure difference ≥20 mm Hg. The results of the paired model for each blood pressure difference group are shown in Table 2. A CpG site with FDR = 6.74E-02 was also found in twin paired model 2 among pairs with blood pressure difference ≥20 mm Hg; we regarded it as a suspected significant site to be replicated as well. Details of the significant sites in model 1 and model 2 are shown in Table 3, and the volcano plot is displayed in Figure 2.
All four significant CpG sites were hypomethylated in hypertensive twins. The significant CpG site in model 2 (cg08580087) is covered by the 450K BeadChip, while the other three sites (cg00950476, cg08041400, cg26733338) are not.
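The restriction of twin pairs by blood pressure difference can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the dictionary layout, the function names `bp_difference` and `filter_pairs`, and the use of absolute within-pair differences are assumptions, and the readings are made-up values.

```python
def bp_difference(case, control):
    """SBP difference plus DBP difference within one twin pair (mm Hg).

    Assumption: absolute within-pair differences are summed.
    """
    return abs(case["sbp"] - control["sbp"]) + abs(case["dbp"] - control["dbp"])


def filter_pairs(pairs, threshold):
    """Keep twin pairs whose combined BP difference is >= threshold."""
    return [p for p in pairs if bp_difference(p["case"], p["control"]) >= threshold]


# Made-up example pairs (hypertensive twin = "case", co-twin = "control").
pairs = [
    {"id": "A", "case": {"sbp": 150, "dbp": 95}, "control": {"sbp": 128, "dbp": 82}},
    {"id": "B", "case": {"sbp": 142, "dbp": 88}, "control": {"sbp": 136, "dbp": 85}},
    {"id": "C", "case": {"sbp": 165, "dbp": 100}, "control": {"sbp": 120, "dbp": 78}},
]

for threshold in (10, 20, 30, 40):
    kept = [p["id"] for p in filter_pairs(pairs, threshold)]
    print(f">= {threshold} mm Hg: {kept}")
```

Each threshold yields a nested subset of pairs, so a stricter cutoff trades sample size for a larger within-pair contrast.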

Sensitivity analysis
Considering the influence of correlated diseases or medication, we conducted the following sensitivity analyses in model 1 and model 2: (1) excluding twin pairs discordant for diabetes mellitus or kidney disease; (2) additionally adjusting for self-reported medication use in the past month. All four significant sites maintained low P values but failed to reach FDR < 0.05 (Supplement Table 1).

Replication analysis
Because the number of CpG sites to replicate was small, a paired t-test was used first in the replication analysis, with a significance level of P < 0.05 (two-sided). An empirical Bayes paired moderated t-test was then used to test which sites reached the Bonferroni-corrected level (P < 0.05 divided by the number of significant sites found in the discovery stage, i.e. P < 1.25E-2). Because four sites are too few for generating surrogate variables with SVA, paired model 1 adjusted only for smoking and alcohol consumption; model 2 additionally adjusted for BMI. The replication results for the four CpG sites are shown in Table 4.
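The Bonferroni-corrected level quoted above is simple arithmetic, shown here as a short Python sketch. The function names and the P values in the example are hypothetical and are not taken from Table 4.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def flag_replicated(p_values, alpha=0.05):
    """Map each site to True if its P value is below the corrected level."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {site: p < threshold for site, p in p_values.items()}


# Four sites carried forward from discovery: 0.05 / 4 = 0.0125.
print(bonferroni_threshold(0.05, 4))

# Hypothetical replication P values, for illustration only.
example = {"site_1": 0.30, "site_2": 0.02, "site_3": 0.008, "site_4": 0.75}
print(flag_replicated(example))
```

In this made-up example only `site_3` falls below 0.0125; `site_2` would pass an uncorrected 0.05 level but not the corrected one.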
In addition to the suspected significant CpG sites mentioned above, some neighbouring sites were also detected by MALDI-TOF-MS. The empirical Bayes paired moderated t-test was also performed on these sites, with a significance level of FDR < 0.05. The results are shown in Supplement Table 2. The site cg08580087 was analysed by paired t-test and empirical Bayes paired moderated t-test in replication population 2, with a significance level of P < 0.05 for both tests. As before, only smoking and drinking were adjusted for in paired model 1, and BMI was additionally adjusted for in model 2. The results were P = 9.99E-01 in the paired t-test, P = 4.66E-02 in paired model 1, and P = 1.15E-01 in paired model 2. This CpG site could still not reach the significance level. However, we did not treat these as the main replication results, given the insufficient power resulting from the small sample size.
From the above results, we conclude that the four suspected significant CpG sites (cg00950476, cg08041400, cg26733338, cg08580087) failed in the replication phase, suggesting that these four sites found in the discovery stage were suspected false-positive sites.
In addition, we compared the P values of methylation sites reported to be correlated with hypertension in previous studies with those in our models. One CpG site (cg17061862) correlated with SBP, which had a P value of 6.9E-05 in previous research 10, reached a P value of 8.2E-05 in paired model 1 with blood pressure difference ≥20 mm Hg and 5.1E-05 in paired model 2 with blood pressure difference ≥10 mm Hg, which replicated this site to a certain extent. The results in our models for sites found in previous studies are displayed in Supplement Table 3.

Enrichment analysis
The enrichment analysis of the four sites was performed by comparing the GO pathways of the four genes above with those of known genes covered by the EPIC BeadChip. Table 5 shows the biological pathways with Fisher's exact test P < 10^-3. Because the FDRs of these pathways were all higher than 0.05, it could not be confirmed whether these genes were enriched in these pathways.
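For readers unfamiliar with the test, a one-sided Fisher's exact test for over-representation is equivalent to the upper tail of a hypergeometric distribution. The Python sketch below shows that calculation under stated assumptions: the function name `enrichment_p` and all counts are hypothetical, and it does not reproduce the web tool's own procedure.

```python
from math import comb


def enrichment_p(background, in_term, selected, overlap):
    """Upper-tail hypergeometric P value.

    background: number of genes in the background set
    in_term:    background genes annotated to the GO term
    selected:   genes in the gene set of interest
    overlap:    selected genes annotated to the GO term
    Returns P(X >= overlap) when `selected` genes are drawn at random.
    """
    total = comb(background, selected)
    upper = min(in_term, selected)
    tail = sum(
        comb(in_term, k) * comb(background - in_term, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 in the term, 4 selected, 2 overlapping.
print(enrichment_p(10, 3, 4, 2))
```

With only four genes in the set of interest, even a complete overlap with a term gives limited evidence once many GO terms are tested, which is consistent with the FDRs above 0.05 reported here.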

Discussion
In a two-stage design of discovery and replication analyses comprising 102 Chinese MZ twin pairs, four suspected significant CpG sites (cg00950476, cg08041400, cg26733338, and cg08580087) were found in twin pairs with a blood pressure difference ≥20 mm Hg.
Although the four suspected CpG sites correlated with hypertension failed the replication analysis in this study, which might be related to the imperfect replication design, they warrant further validation.
None of the CpG sites reached FDR < 0.05 in the sensitivity analyses.
The disease status and medication treatment, however, were based on self-report rather than medical records or auxiliary diagnosis. Excluding twin pairs discordant for diabetes or kidney disease inevitably led to a reduction in sample size. Therefore, we did not consider the results of the sensitivity analyses as the main results.
BDP1 (B double prime 1) is a subunit of the RNA polymerase III transcription initiation factor IIIB, which helps RNA polymerase III bind to the promoter to start transcription. Mutation of BDP1 is related to hereditary deafness 11. SYT1 (synaptotagmin-1) is an integral membrane protein of synaptic vesicles. It acts as a Ca2+ receptor in vesicle transport and exocytosis: binding of Ca2+ to synaptotagmin-1 contributes to the release of neurotransmitters at synapses. Current studies suggest that synaptotagmin-1 is related to learning, memory, and cognitive function; the SYT1 gene has been associated with attention deficit hyperactivity disorder (ADHD), and SYT1 mutation leads to a recurrent neurodevelopmental disorder 12.
ODZ4, also known as TENM4 (teneurin transmembrane protein 4), plays an important role in establishing neuronal connectivity during development, and defects in ODZ4 are related to essential tremor 13,14. So far, we have not found any other studies linking these four significant CpG sites, or the genes in which they are located, to human hypertension. It is worth noting, however, that SYT1, where cg26733338 is located, has been studied in renovascular hypertensive rats (RHR) with hypertensive stroke induced by artificial cold wave exposure. That study found that SYT1 protein was upregulated in the cerebral tissue of RHR 12. A previous meta-analysis of GWAS found that a SNP (rs751984) located in SYT7 (synaptotagmin-7) was associated with mean arterial pressure (MAP) 15. The protein encoded by SYT7 regulates calcium-dependent membrane transport in synaptic transmission. SYT7 is correlated with SYT1, suggesting that SYT1 might be correlated with hypertension.
All four suspected significant CpG sites in this study failed replication. We consider that this might be due to the following issues.
(1) Since there is no established methodology for calculating the power of a genome-wide methylation study in twins, the sample size remained a problem. Reviews of twin studies in epigenetics suggested that "a relatively small number (15-25) of phenotypically discordant twin pairs had sufficient (>80%) power to detect epigenetic changes of 1.2-fold, where an effect size of 1.2-fold change was significantly greater than the null experimental variance threshold for the assay (1.15-fold change)" 16,17. Previous genome-wide methylation studies of hypertension in MZ twins usually included about ten pairs. Our study is the largest genome-wide methylation study based on twins discordant for hypertension so far; however, the sample size might still be insufficient. Moreover, about 2% of sites were missing in the replication analysis due to unsuccessful detection.
(2) Most of the significant sites correlated with hypertension were previously found in the general population. Although potential confounding by age or gender was controlled by matching or covariate adjustment, the influence of genetic background was not considered in those general-population studies. In this study, MZ twins discordant for hypertension were used to match genetic factors. Therefore, it is reasonable to presume that the CpG sites associated with hypertension found previously in the general population might be affected by genetic factors.
(3) The replication design was imperfect. The first replication population included MZ twin pairs, which, however, were treated as individuals since they were hypertension-concordant or non-hypertension-concordant pairs. Furthermore, the first replication population showed a larger difference in blood pressure between hypertensive twins and non-hypertensive co-twins than the discovery population. These differences might originate from different treatments and different proportions of patients with controlled hypertension in the discovery and replication populations. Although medication was considered in the sensitivity analyses, this might be a major limitation of this study, hampering replication of the findings. Considering that the significant sites were discovered in MZ pairs with blood pressure difference ≥20 mm Hg, replication may need to be further investigated in case-control pairs with blood pressure difference ≥20 mm Hg.
(4) BMI was significantly higher in hypertensive twins than in non-hypertensive co-twins in both the discovery and replication populations. Although model 2 adjusted for BMI, the difference might have affected the results of the paired analysis.
(5) Multiple testing correction is the preferable way to avoid inflated results in high-throughput studies 18,19. In this study, a large number of sites had original P values less than 10^-5; however, most of them were no longer significant after multiple testing correction.
(6) The methodology for methylation data processing needs further improvement. So far, there is no consistent method for data standardization 20. In this study, we used dasen in the wateRmelon package for data normalization and SVA to adjust for potential confounding factors (mainly plate and batch effects). These methods are commonly used in methylation studies, but different researchers might use different methods and arrive at different findings.
(7) Pyrosequencing is still the gold standard for methylation detection. In this study, MALDI-TOF-MS was used to detect methylation sites in candidate regions, which might not be perfectly consistent with pyrosequencing. In addition, DNA extraction and methylation detection for the discovery and replication populations were conducted in two different labs, and the DNA extraction kits were from different manufacturers. Although all testing was conducted under strict quality control, variance from different operators, reagents, and experimental batches could not be excluded.
(8) The disease status and medication treatment were based on self-report. For instance, if a participant not diagnosed with hypertension was taking thiazide diuretics for a kidney disease without knowing that they are antihypertensive drugs, and therefore answered no to question 2, this participant would end up in the non-hypertension group. Hence we attempted to exclude twin pairs discordant for kidney disease in the sensitivity analysis.
The strengths of our study include the twin design, a new-generation methylation array, and careful adjustment for established and potential risk factors for hypertension. However, this study has some limitations. The sample size might be insufficient even though it may be the largest among studies of hypertension-discordant twin pairs. The first replication population included 76 MZ twin pairs, but they were analysed as individuals. We additionally included 21 hypertension-discordant MZ twin pairs in the second replication population, although only the one CpG site covered by the 450K BeadChip could be tested there. DNA methylation was measured in peripheral blood rather than in hypertension-related tissues. Considering the difficulty of obtaining human tissues such as kidney or myocardium, blood is a commonly used surrogate for DNA methylation in tissues 21. Another limitation is that these CpG sites were identified through cross-sectional analyses, so we were not able to characterize the direction of the association. Despite these limitations, we discovered four CpG sites associated with BP among 43 twin pairs, although they did not reach the significance level in the replication stage, partly due to the imperfect design of the replication populations.

Conclusion
This study used twins to analyse the correlation between hypertension and DNA methylation and found four suspected significant sites in the discovery population: cg00950476, cg08041400, cg26733338, and cg08580087, located in LINC01252, BDP1, SYT1, and ODZ4, respectively. However, none of them survived replication. Future work will need a larger sample size and different tissues for further replication. Longitudinal data will be needed for causal inference between these CpG sites and hypertension.

Measurements
Standardized questionnaires were used to collect data on age, gender, socioeconomic level, and lifestyle factors (smoking, alcohol drinking, vegetable intake, and physical activity). Blood pressure, height, weight, and other anthropometric indicators were obtained by physical examination. Two questions in the questionnaire focused on hypertension: (1) Have you ever been diagnosed with hypertension at a county-level or higher hospital? (2) Have you taken antihypertensive drugs within the month before the investigation? Subjects who met any of the following criteria were defined as hypertension patients: (1) yes to question 1; or (2) yes to question 2; or (3) systolic blood pressure (SBP) ≥140 mm Hg and/or diastolic blood pressure (DBP) ≥90 mm Hg. Blood pressure was measured twice by trained observers using the OMRON HEM-7200 sphygmomanometer on the right arm of seated participants after a 5-min rest, and the average SBP and DBP were calculated. If the two measurements differed by more than 10 mm Hg, a third measurement was taken and the average of the two closest measurements was used. Height was measured with a height meter, waist and hip circumference were measured with a tape, and weight and percent body fat (PBF) were measured with TANITA professional body composition scales.
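The case definition and the averaging rule for repeated readings can be expressed compactly. The Python sketch below is an illustration under stated assumptions: the function names `average_bp` and `is_hypertensive` are hypothetical, and the rule is applied to SBP or DBP readings separately, which the text does not specify.

```python
from itertools import combinations


def average_bp(readings, max_gap=10):
    """Average two readings; if they differ by more than max_gap mm Hg,
    use the two closest of three readings instead."""
    first, second = readings[0], readings[1]
    if abs(first - second) <= max_gap:
        return (first + second) / 2
    if len(readings) < 3:
        raise ValueError("a third measurement is required")
    a, b = min(combinations(readings[:3], 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


def is_hypertensive(diagnosed, on_medication, sbp, dbp):
    """Any one criterion suffices: prior diagnosis, antihypertensive drugs
    in the past month, or SBP >= 140 and/or DBP >= 90 mm Hg."""
    return diagnosed or on_medication or sbp >= 140 or dbp >= 90


print(average_bp([120, 124]))        # two readings within 10 mm Hg
print(average_bp([120, 140, 138]))   # third reading needed
print(is_hypertensive(False, False, 138, 92))
```

Because the three criteria are combined with "or", a participant with normal measured blood pressure is still classified as hypertensive when reporting a diagnosis or recent medication.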

Sample collection and processing
Blood samples were collected, transported and stored according to a standardized CNTR DNA methylation for the second replication group was detected using 450K Beadchip at Huazhong University of Science and Technology and has been described previously 23 .
These methylation data were available before designing this study.

Zygosity analysis
For the discovery population, zygosity was confirmed by genotyping 59 single nucleotide polymorphisms (SNPs) on the EPIC BeadChip. The SNP correlation coefficient for MZ twin pairs was extremely close to 1, while it was generally less than 0.8 for dizygotic (DZ) twin pairs. In the replication population, zygosity was determined by genotyping 21 short tandem repeats (STRs). The accuracy of STR comparison in zygosity determination can exceed 99%, which has been validated in Chinese twins 23,24.
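The SNP-correlation check can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: genotypes are coded 0/1/2, the example uses 8 made-up SNPs rather than the 59 on the array, and the cutoff of 0.9 in `call_zygosity` is a hypothetical value chosen between the "close to 1" and "less than 0.8" ranges described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def call_zygosity(r, cutoff=0.9):
    """Hypothetical rule: label a pair MZ when r exceeds the cutoff."""
    return "MZ" if r > cutoff else "DZ"


twin_1 = [0, 1, 2, 1, 0, 2, 1, 1]      # made-up genotypes
co_twin_mz = [0, 1, 2, 1, 0, 2, 1, 1]  # identical genotypes
co_twin_dz = [0, 2, 1, 1, 1, 2, 0, 1]  # partly shared genotypes

for label, co_twin in (("pair 1", co_twin_mz), ("pair 2", co_twin_dz)):
    r = pearson_r(twin_1, co_twin)
    print(label, round(r, 3), call_zygosity(r))
```

On array data the inputs would be measured signals rather than clean integer codes, so the correlation for true MZ pairs would be close to, but not exactly, 1.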

Quality control of the DNA methylation data
For the DNA methylation data of the discovery population, R minfi package was used to carry out the data quality control 25

Statistical analysis
In MZ twin pairs discordant for hypertension, the empirical Bayes paired moderated t-test was used to compare methylation levels between hypertensive twins and their non-hypertensive co-twins and to identify hypertension-related differentially methylated CpG sites (DMCs). This test has two advantages for MZ twin studies 27: it helps to minimize probe-related error between samples, and it is suitable for studies with small sample sizes. The eBayes function in the R limma package was used 28.
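To show what "moderated" means, the Python sketch below contrasts an ordinary paired t statistic with a moderated one for a single CpG site. It is a simplified illustration, not the limma implementation: limma estimates the prior degrees of freedom and prior variance from all probes and fits covariates in a linear model, whereas here `prior_df` and `prior_var` are hypothetical values supplied by hand and no covariates are included.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(differences):
    """Ordinary paired t statistic on within-pair differences."""
    n = len(differences)
    return mean(differences) / sqrt(variance(differences) / n)


def moderated_paired_t(differences, prior_df, prior_var):
    """Paired t statistic with the site variance shrunk towards prior_var.

    Returns the statistic and its degrees of freedom (prior_df + n - 1).
    """
    n = len(differences)
    df = n - 1
    post_var = (prior_df * prior_var + df * variance(differences)) / (prior_df + df)
    return mean(differences) / sqrt(post_var / n), prior_df + df


# Made-up within-pair methylation differences for one site.
diffs = [1.0, 2.0, 3.0]
print(paired_t(diffs))
print(moderated_paired_t(diffs, prior_df=2, prior_var=4.0))
```

Shrinking the site-specific variance towards a common prior stabilizes the statistic when few pairs are available, which is the small-sample advantage mentioned above.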
Because MZ co-twins share the same age, sex, and region, we adjusted for smoking, drinking, BMI, and surrogate variables from surrogate variable analysis (SVA) 29-31 in the multivariate models. A false discovery rate (FDR) < 0.05 was used as the threshold after multiple testing correction.
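The text does not name the FDR procedure. Assuming the commonly used Benjamini-Hochberg method, the adjustment can be sketched in Python as follows; the function name `bh_adjust` and the P values are made up for illustration.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up raw P values
fdr = bh_adjust(raw)
print(fdr)
print([q < 0.05 for q in fdr])
```

With hundreds of thousands of CpG sites on an array, a raw P value must be very small to keep its adjusted value below 0.05, which is why sites with small raw P values can lose significance after correction.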
To examine the robustness of our findings, we also conducted sensitivity analyses by excluding twin pairs discordant for self-reported diabetes mellitus or kidney disorders.
We applied Bonferroni correction for multiple testing (P < 0.05 divided by the number of CpGs significant in the discovery stage) for CpG selection in the replication stage. If a significant CpG site found in the discovery stage was covered by the 450K BeadChip, it could be tested in both the first and second replication groups.
Top differentially methylated CpG sites were used for enrichment analysis. By linking these sites to gene symbols using the annotation file provided by Illumina, we obtained the gene set for all significant sites, as well as a corresponding gene set for the remaining CpG sites. The gene ontology (GO) functions and processes of the genes containing significant CpG sites were investigated using the web-based gene ontology enrichment analysis and visualization tool (GOrilla, http://cbl-gorilla.cs.technion.ac.il/) 32
