Overview. The primary epigenome-wide association study (EWAS) of handedness was performed in two cohorts with DNA methylation data in whole blood (Illumina, 450k): NTR adults (N=2,682 individuals including twins, mean age at methylation 36.5, standard deviation (SD) 12.7), and ALSPAC adults (N=1,232, mean age at methylation 48.98, SD 5.55). EWAS analyses were performed in each dataset separately, and summary statistics were combined in the meta-analysis (N=3,914) testing 409,563 CpGs. Further, we tested whether EWAS signal was enriched nearby loci detected in the previous GWAS on handedness13. We carried out within-pair twin analysis in MZ twins discordant for handedness (Nadults= 133 twin pairs, Nchildren= 86 twin pairs). Secondary analyses were performed in different tissues: in cord blood and peripheral blood in ALSPAC children (N=1,021 with DNA methylation data at birth, at 7, 17 years old, Illumina 450k chip, and/or at 24 years old, Illumina EPIC array) and in buccal cells in NTR children (N=946 twins, mean age 9.5, SD 1.85, Illumina EPIC array). We performed EWAS analyses in each dataset and examined correlations between the effect sizes of top CpGs (defined as CpGs with the lowest p-value) in each analysis. Finally, we created and tested polygenic and DNA methylation scores for left-handedness. The study methodology and design are presented in Fig. 1 and Fig. 3.
Subjects. Primary analysis. The adult NTR Biobank cohort75 included twins, parents of twins, siblings of twins and spouses of twins, and had DNA methylation data from blood samples, as previously described76. Complete data on handedness and DNA methylation from blood samples were available for 2,682 individuals (mean age 36.5, SD = 12.7), of whom 2,486 were twins/triplets and 196 were parents, siblings or spouses of twins. All subjects provided written informed consent. The study was approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre, Amsterdam, an Institutional Review Board certified by the U.S. Office of Human Research Protections (IRB number IRB00002991 under Federal-wide Assurance FWA00017598; IRB/institute codes, NTR 03-180).
The ALSPAC cohort77–79 is a population-based birth cohort. All pregnant women living in the geographical area of Avon (UK) with expected delivery date between 1 April 1991 and 31 December 1992 were invited to participate. Approximately 85% of the eligible population was enrolled, totaling 14,541 pregnant women who gave informed and written consent. The study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/). The ALSPAC adult group consisted of parents including 1,232 mothers and fathers with mean age 48.98 years, SD 5.55, when blood samples were acquired. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.
Secondary analysis. The ALSPAC child cohort comprised of 1021 individuals recruited from birth who had information on handedness and DNA methylation profiles. DNA methylation from peripheral blood cells measured at different ages within the Accessible Resource for Integrated Epigenomics Studies (ARIES) project80: at birth (N=703), at mean age 7.44 (N=757), at mean age 17.11 (N=759), and at mean age 24.3 (N=442). Study data were collected and managed using REDCap electronic data capture tools hosted at the University of Bristol81. REDCap (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies. The NTR child cohort included in the EWAS of buccal cell DNA was part of a project on childhood aggression “Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies” (ACTION)82,83. The ACTION-NTR cohort84 included 1,235 children for whom epigenome-wide data were successfully assessed, mainly from MZ twin pairs. ACTION included twins who at least once scored high or low on a test score for aggression from the population-based NTR. Complete data on handedness and DNA methylation from buccal samples were available for 946 twin individuals (mean age 9.5, SD 1.8, range = 5-12). The study was approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre, Amsterdam, an Institutional Review Board certified by the U.S. Office of Human Research Protections (IRB number IRB00002991 under Federal-wide Assurance FWA00017598; IRB/institute codes, NTR 03-180). For children, written informed consent was given by their parents.
Handedness. NTR. An aggregated variable was made for handedness using all available information on hand preference from Adult Netherlands Twin Register (ANTR) surveys, Youth Netherlands Twin Register (YNTR) surveys and experimental projects to obtain the most accurate value for each person. ANTR surveys were held at different time points and included the question: “Are you right-handed or left-handed?” or “Are you predominantly left-handed or right-handed?”. The answer categories were left-handed, right-handed, and both. If there was only one report or only consistent reports, this value was used. In case of inconsistent reports, if there were at least three reports and only one deviating value, the deviating one was excluded and the consistently reported value was used. In all other inconsistent cases, handedness was coded as missing. The YNTR surveys include parent reports as well as self-reports (from age 14 onward), and included questions on hand preference at different ages. In the remaining surveys, handedness was assessed using a single item with 3 categories: left-handed, right-handed, and both hands. Exception was the assessment at age 5 when handedness was assessed based on 7 items (drinking from a cup; eating; throwing a ball; drawing on paper; picking up a coin; combing hair; and thumb sucking while asleep) which were then classified in the same 3 categories as other measurements. The reports at age 2, 5, 14, 16, and 18 (all with categories right-handed, left-handed, and both hands) were checked for consistency across time to obtain handedness based on YNTR surveys. Non-survey projects assessed handedness with 3 categories (left-handed, right-handed, and both hands), except for one project where 7 items were used which were rated on a 5-point scale (writing, throwing, using scissors, toothbrushing, using fork, using spoon, lighting a match). These 7 items were recoded into a single 3 category item using an algorithm similar to the one used for the age 5 questionnaire. Finally, ANTR surveys, YNTR surveys, and projects were combined. The ANTR measurement was used, if not inconsistent with project handedness. If this was not available, YNTR handedness was used, if not inconsistent with project handedness. The final variable was coded left-handed, right-handed, and both hands (that included ambidextrous and mix-handed individuals).
ALSPAC. Child handedness was assessed at 42 months by questionnaire in which the mother was asked which hand the child used to draw, throw a ball, color, hold a toothbrush, cut with a knife, and hit things (6 questions). Responses were scored -1, 0 or 1 for left, either or right, respectively. Those with score sums from -6 to -4 were labelled left-handed and those with sums from 4 to 6 were labelled right-handed. Mothers and fathers were similarly asked which hand they used to write, draw, throw, hold a racket or bat, brush their teeth, cut with a knife, hammer a nail, strike a match, rub out a mark, deal from a pack of cards or thread a needle (11 questions). Responses were scored -1, 0 or 1 for left, either or right, respectively. Those with score sums from -11 to -7 were labeled left-handed and those with sums from 7 to 11 were labeled right-handed. Individuals with scores outside these ranges were labeled ambidextrous or mixed-handed and excluded from this study. Handedness was coded as 1 for left-handed or 0 for right-handed in both cohorts.
DNA methylation. NTR adults. The NTR blood DNA methylation data was generated as part of the Biobank-based Integrative Omics Study (BIOS) consortium76,85. Blood collection procedures were described previously75. DNA methylation was assessed with the Infinium HumanMethylation450 BeadChip Kit (Illumina, San Diego, CA, USA), wet laboratory procedure, preprocessing analyses, and quality control were performed at the Human Genotyping facility (HugeF) of ErasmusMC, the Netherlands (http://www.glimdna.org/) and have been described previously76,85. Only the autosomal methylation sites were analyzed, i.e., a total of 411,169 methylation sites. The percentages of neutrophils, monocytes and eosinophils were used to adjust DNA methylation data for inter-individual variation in white blood cell proportions76. Missing probe values (probes with missing values in more than 5% of the sample had been removed) were imputed with the function imputePCA from the package missMDA as implemented in the pipeline for DNA methylation array analysis developed by the Biobank-based Integrative Omics Study (BIOS) consortium86
ALSPAC adults and children. The ALSPAC blood collection were generated at different ages. DNA methylation was measured with the Infinium HumanMethylation450 BeadChip Kit and Infinium MethylationEPIC BeadChip (Illumina, San Diego, CA, USA) as part of the ARIES80. Wet laboratory procedures, preprocessing analyses, and quality control were performed at the University of Bristol, as previously described80. Only autosomal probes were analysed in our study: 838,019 probes (Illumina EPIC human methylation arrays) at 24 years of age, and 471,465 probes (Illumina human methylation 450k arrays) at other ages. Blood cell-type proportions were estimated from DNA methylation profiles using deconvolution algorithms87, and included in statistical models to adjust for cell type heterogeneity. Batch effects and additional unknown confounding were estimated using surrogate variable analysis (SVA) in meffil88. DNA methylation outliers were identified as those three times the inter-quartile range from the nearest of the first and third quartiles. Outliers were replaced with missing values.
NTR children. DNA samples were collected from buccal swabs, as previously described89. DNA methylation was measured using the Infinium MethylationEPIC BeadChip (Illumina, San Diego, CA, USA)90, wet laboratory procedure, preprocessing analyses, and quality control were performed by the Human Genotyping facility (HugeF) of ErasmusMC, the Netherlands (http://www.glimdna.org/), as previously described91. Only autosomal methylation sites were analyzed, leaving 787,711 out of 865,859 sites for analysis. Cellular proportions of buccal cells were estimated from DNA methylation profiles using the deconvolution algorithm HepiDISH92. Cell proportions of epithelium cells and natural killer cells were included in statistical models to adjust for cellular heterogeneity. DNA methylation outliers were identified as those three times the inter-quartile range from the nearest of the first and third quartiles. Outliers were replaced with missing values.
Genotyping. NTR. Genotyping in NTR was done on multiple platforms including Perlegen-Affymetrix, Affymetrix 6.0, Affymetrix Axiom, Illumina Human Quad Bead 660, Illumina Omni 1M and Illumina GSA. Quality control was carried out and haplotypes were estimated using PLINK93. For each genotype platform, samples were removed if DNA sex did not match the expected phenotype, if the PLINK heterozygosity F statistic was < -0.10 or > 0.10, or if the genotyping call rate was < 0.90. SNPs were removed if the MAF < 1×10-6, if the Hardy-Weinberg equilibrium p-value was < 1×10-6, and/or if the call rate was < 0.95. Subsequently, for each platform, the genotype data was aligned with the 1000 Genomes reference panel using the HRC and 1000 Genomes checking tool, which tests and filters for SNPs with allele frequency differences larger than 0.20 as compared to the CEU population, palindromic SNPs and DNA strand issues. The data of the six platforms was then merged into a single dataset, keeping all quality-controlled SNPs of each platform. For each individual, one platform was chosen. Based on the ~10.8k SNPs that all platforms have in common, DNA Identity By Descent state was estimated for all individual pairs using the Plink and King programs. CEU population outliers, based on per platform 1000 Genomes PC projection with the Smartpca software94, were removed from the data. Data were phased per platform using Eagle, and then imputed to 1000 Genomes and Topmed using Minimac following the Michigan imputation server protocols95. For the polygenic scoring the imputed data were converted to best guess data, and were filtered to include only ACGT SNPs, SNPs with MAF > 0.01, HWE p > 10-5 and a genotype call rate > 0.98, and to exclude SNPs with more than 2 alleles. All Mendelian errors were set to missing. 20 PCs were calculated with Smartpca using LD-pruned 1000 Genomes–imputed SNPs that were also genotyped on at least one platform, had MAF > 0.05 and were not present in the long-range LD regions.
ALSPAC. Genetic data were collected from the blood samples obtained in clinic visits. Genotyping was conducted with the Illumina HumanHap550 quad chip for children and the Illumina human660W-quad array for mothers. Quality control measures were carried out and haplotypes estimated using ShapeIT. A phased version of the 1000 genomes reference panel from the Impute2 reference data repository was used, and imputation of the target data was performed with this, using all reference haplotypes. Following imputation, variants were retained only given info scores > 0.8 and minor allele frequency > 0.01. Retained variants were then converted to best-guess genotype calls. To avoid potential confounding due to relatedness, closely related individuals were removed using GCTA with a GRM cutoff of 0.05.
Statistical analysis
Intergroup differences. We tested if there were differences in characteristics that were included in EWAS models (such as age at biosample collection, sex, BMI, smoking status at blood collection for adults, and gestational age, maternal smoking during pregnancy, birth weight for children, cell proportions/percentages in buccal swabs and in blood samples) between left-handed and right-handed individuals by generalized estimating equations (GEE) to accommodate the relatedness among the twins in NTR, and by standard logistic regression in ALSPAC. The R package ‘gee’ was used with the following specifications: binomial (for ordinal data) link function, 100 iterations, and the ‘exchangeable’ option to account for the correlation structure within families and within persons. Right- and lefthanded MZ discordant twins were compared with paired t-test for the traits that were not identical in twins (birth weight, BMI, smoking, cell percentages).
Epigenome-Wide Association Analyses. Primary analyses. The association between DNA methylation levels and left-handedness was tested for each site under a linear model (ALSPAC) or generalized estimating equation (GEE) model accounting for relatedness of twins (NTR). DNA methylation β-value was the dependent variable, and the following predictors were included in the basic model: handedness (coded as 0=right-handed and 1=left-handed), sex, age at blood sampling, percentage of blood cells (monocytes, eosinophils, neutrophils) for blood samples, sample plate and array row in NTR. In ALSPAC, the predictors were handedness (coded the same way), sex, age at blood sampling, percentage of blood cells, and surrogate variables (n=20) were included as predictors. An adjusted model was fitted to account for BMI and smoking status at blood draw in both NTR and ALSPAC adult cohorts, because BMI and smoking are known to have large effects on DNA methylation in adults49,56 and were associated with handedness in some studies64,65. The primary results reported in the paper are based on the fully adjusted model. The models are described in Appendix 2.
Secondary analyses. The same basic models were fitted to the data from ALSPAC and NTR children. For DNA methylation in buccal cells, cell proportions (epithelial cells, natural killer cells) for buccal samples were included instead of percentage of blood cells. As several characteristics showed association with handedness in previous studies64,65 and effect on DNA methylation57,58, they were included in adjusted model in children (gestational age and birthweight, see Appendix 2).
In the within-pair analysis of discordant MZ twins, paired t-tests were employed to test for methylation differences between the left-handed and the right-handed twins. Paired t-tests were performed in R on residual methylation levels, which were obtained by adjusting the DNA methylation β-values for sample plate, array row, cell proportions in buccal samples (epithelial cells, natural killer cells) in children and sample plate, array row, and percentages of blood cells (monocytes, eosinophils, neutrophils) in adults. Additional covariates, birth weight in children and BMI and smoking status in adults, were added in adjusted model. Age, sex, maternal smoking, and gestational age were not included because these variables are identical in MZ twins.
To account for multiple testing, we considered Bonferroni correction and a False Discovery Rate (FDR) of 5%. The Bonferroni corrected p-value threshold was calculated by dividing 0.05 by the number of genome-wide CpGs tested, and false discovery rate (FDR) q-values were computed with the R package ‘qvalue’ with default settings. The Bayesian inflation factor (λ) was calculated with the R package Bacon96 (see Table S5).
Meta-analysis. A meta-analysis was performed in METAL97 based on estimates (regression coefficients) and standard errors from the EWAS of handedness performed with GEE in NTR and linear regression in ALSPAC. NTR and ALSPAC adult cohorts were combined. In total, 409,563 CpG sites present in both cohorts were tested with statistical significance evaluated after Bonferroni correction and at an FDR q-value <0.05.
Comparison of top CpGs from different analyses. We selected methylation sites that overlapped in 13 analyses with adjusted model (meta-analysis, meta-analysis without discordant MZ twins, EWAS NTR adults, EWAS NTR adults without discordant MZ twins, EWAS ALSPAC adults, EWASs ALSPAC at birth, 7, 17, 24 years, EWAS NTR children, EWAS NTR children without discordant twins, and within-pair analyses of discordant MZ twin adults and children) that resulted in 379,924 methylation sites. We calculated Pearson correlations for effect estimates of the top 100 CpGs ranked by p-value from one analysis with the effect estimates of the same CpGs in other analyses. Statistical significance of correlations was assessed after Bonferroni correction for the number of correlations tested: α = 0.05/(13 × 13 – 13) = 0.0003.
Differentially methylated regions. We used the R dmrff library for R45 to identify regions where multiple correlated methylation sites showed evidence for association with handedness. Dmrff identifies DMRs by combining EWAS summary statistics from nearby CpG sites with methylation datasets to compute correlations between CpGs (https://github.com/perishky/dmrff). Dmrff was applied in each cohort separately. For the meta-analysis, we calculated CpG site correlations separately for NTR adults and ALSPAC adults (parents) cohorts, and performed DMR meta-analysis. We applied an adjusted p-value that was a p-value multiplied by the total number of tests performed with the number of tests equal to the number of regions for which DMR statistics are calculated. We report significant regions (padj < .05) with at least two methylation sites within a 500bp window. We plotted the DMRs with the coMET R Bioconductor package98 to graphically display additional information on physical location of CpGs, correlation between sites, statistical significance, and functional annotation (annotation tracks included genes Ensembl, CpG islands (UCSC), regulation Ensembl).
GWAS follow-up. GWAS follow-up analyses were performed to examine whether CpGs within a 1 Mb window of loci detected by the GWAS for left-handedness13, on average, showed a stronger association with left-handedness than other genome-wide methylation sites (Infinium HumanMethylation450 BeadChip). We obtained a SNP list based on the GWAS meta-analysis without NTR, ALSPAC, and 23andMe by Cuellar-Partida et al13 (196,419 individuals, NSNPs = 13,550,404), from which we selected all SNPs with a p-value <1.0 ×10-08, <1.0 ×10-06, and <1.0×10-05, and determined the distance of each Illumina 450k methylation site to each SNP. To test whether methylation sites near GWAS loci were more strongly associated with left-handedness, meta-analysis EWAS test statistics were regressed on a variable indicating if the CpG is located within a 1 Mb window from SNPs associated with handedness (1=yes, 0=no):
|Zscore| = Intercept + 𝛽category x * Category x,
where |Zscore| represents the absolute Zscore for a CpG from the EWAS meta-analysis of handedness; 𝛽category x represents the estimate for category x, i.e. the change in the EWAS test statistic associated with a one unit change in category x (e.g. being within 1 Mb of SNPs associated with left-handedness). For each enrichment test, bootstrap standard errors were computed with 2000 bootstraps with the R-package “simpleboot”. Statistical significance was assessed at α = 0.05. As control analysis the same follow-up was performed using GWAS summary statistics on a trait that is unrelated to handedness – type 2 diabetes in UK Biobank cohort (N=244,890)54. GWAS summary statistics were downloaded from GWASAtlas (https://atlas.ctglab.nl/traitDB/3686; 41204_E11_logistic.EUR.sumstats.MACfilt.txt; access on February 1 2021).
EWAS follow-up. To examine previously reported associations for epigenome-wide significant DMRs associated with left-handedness in our study, we looked up CpGs from the regions in the EWAS Atlas (https://bigd.big.ac.cn/ewas/tools; accessed on August 1 2020)99 and EWAS catalogue (http://www.ewascatalog.org; access on November 1 2020)100.
Polygenic and methylation scores. Polygenic scores (PGS) for handedness were calculated based on the GWAS meta-analysis without 23andMe by Cuellar-Partida et al13. To avoid overlap between the discovery and target samples, summary statistics without NTR and ALSPAC were requested (196,419 individuals, NSNPs = 13,550,404). The linkage disequilibrium (LD) weighted betas were calculated using a LD pruning window of 250 KB, with the fraction of causal SNPs set at 0.50by LDpred101. We randomly selected 2500 2nd degree unrelated individuals from each cohort as a reference population to calculate the LD patterns. The resulting betas were used to calculate the PGSs in each dataset using the PLINK 1.9 software. All PGSs were standardized (mean of 0 and standard deviation of 1).
Methylation scores (MS) were calculated in NTR based on EWAS summary statistics obtained from ALSPAC, and vice versa, as previously done to create methylation scores for BMI, height, and prenatal smoking102,103. We calculated same-tissue same-age DNA-methylation scores based on methylation data from NTR adults (blood) and ALSPAC parents (blood), and cross-tissue DNA-methylation scores based on data from NTR and ALSPAC children, with DNA methylation measured in buccal cells, and blood, respectively (see Fig.3). For each individual, a weighted score sum was calculated for left-handedness by multiplying the methylation value at a given CpG by the effect size of the CpG (β), and then summing these values over all CpGs: DNA methylation score (i) = β1*CpG1i + β2*CpG2i …. + βn*CpGni, where CpGn is the methylation level at CpG site n in participant i, and βn is the regression coefficient at CpGn taken from summary statistics of the EWAS analysis. All methylation scores were standardized (mean of 0 and standard deviation of 1).
We used weights from summary statistics of EWASs in four cohorts: NTR adults, ALSPAC parents, NTR children, ALSPAC children at 7 years old. Subsets of CpGs to be included in methylation scores were selected based on p-value <1*10-1, <1*10-3, and <1*10-5.
Testing predictive value. We analysed the predictive value of the left-handedness polygenic scores and methylation scores in NTR and ALSPAC adult and child cohorts from our EWAS study. To quantify the variance explained by the PGS and MS, we used the approach proposed by Lee et al. where coefficients of determination (R2) for binary responses are calculated on the liability scale104. R2 is equal to the explained variance divided by the total variance; that is the sum of explained variance and residual (homoscedastic) variance. We first regressed left-handedness on PGS and GWAS covariates (genotype platform, the first ten principal components based on genotype data, and sex) (model 1), and then on GWAS covariates only (model 2) using logistic regression. We calculated variance explained by all predictors in each model. We calculated the predictive value of the PGS by subtracting the difference between the variance explained by the first and the second model. For BMI, it has been shown that DNA methylation predicts the trait over and above a polygenic score based on SNPs102. To examine the predictive value of MS and PGS in a combined model, we regressed left-handedness on PGS, MS, genotyping and EWAS covariates (model 3). Next, we regressed left-handedness on the same predictors without MS (model 4) and without PGS (model 5). The difference between explained variance in the third and fourth models gave us an estimate of variance explained by MS. The difference between explained variance of the third and fifth models resulted in an estimate explained by PGS. The equations of all models are provided in Appendix 2. Statistical significance was assessed following Bonferroni correction for the number of scores tested (PGS and 3 MS). This resulted in α=0.05/4=0.0125, nominal significance at 0.05.
Data availability
The HumanMethylation450 BeadChip data from the NTR are available as part of the Biobank-based Integrative Omics Studies (BIOS) Consortium in the European Genome-phenome Archive (EGA), under the accession code EGAD00010000887. The Infinium MethylationEPIC from NTR are available from the Netherlands Twin Register on reasonable request. DNA methylation data from ALSPAC are available at ALSPAC and can be provided on request. The study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data).
The code used to perform the meta-analyses are available from the corresponding author on request. The pipeline for the DNA methylation array analysis developed by the Biobank-based Integrative Omics Study (BIOS) consortium are available here: https://molepi.github.io/DNAmArray_workflow/86. EWAS summary statistics for the top 100 CpGs are given in Supplemental tables (S14-S27). The full EWAS summary statistics from the meta-analysis with adjusted model is provided in Supplemental tables 31.