Study sample and ethics statement
A large fraction of the Icelandic population has participated in a nationwide research program at deCODE genetics. Participants donated blood or buccal samples after signing a broad informed consent allowing the use of their samples and data in all projects at deCODE genetics approved by the National Bioethics Committee (NBC). The use of data in this study was approved by the NBC (VSN-19-158; VSNb2019090003/03.01) following review by the Icelandic Data Protection Authority. All personal identifiers of the participants' data were encrypted in accordance with the regulations of the Icelandic Data Protection Authority. The Icelandic CTS cases were obtained in collaboration with Icelandic physicians at Landspitali – National University Hospital in Reykjavik, the Registry of Primary Health Care Contacts, and the Registry of Contacts with Medical Specialists in Private Practice. The CTS cases were identified using International Classification of Diseases 10 (ICD-10) code G56.0, ICD-9 code 354.0, and NOMESCO Classification of Surgical Procedures (NCSP) code ACC51 (decompression and freeing of nervus medianus) through the scrutiny of hospital records from 1985 to 2020. The cases with lesions of the ulnar and radial nerves were identified using ICD-10 codes G56.2 and G56.3, respectively.
The UK Biobank resource includes extensive phenotype and genotype data from ~500,000 participants in the age range of 40 to 69 from across the UK who have provided informed consent43. The North West Research Ethics Committee reviewed and approved UK Biobank’s scientific protocol and operational procedures (REC Reference Number: 06/MRE08/65). This study was conducted using the UK Biobank resource under application number 24898. CTS cases were identified by searching for ICD-10 code G56.0 and Classification of Interventions and Procedures (OPCS) codes A651 (carpal tunnel release) and A692 (revision of carpal tunnel release) in General Practice clinical event records (Field ID 42040) and UK hospital diagnoses (Field ID 41270 and 41271).
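The code-based case definition described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the record layout (`participant_id`, `coding_system`, `code`), the system labels, and the function names are hypothetical, and the normalization of dotted versus undotted codes is an assumption. Only the codes themselves (ICD-10 G56.0, OPCS A651 and A692) come from the text.

```python
from typing import Iterable, NamedTuple, Set


class Record(NamedTuple):
    """One clinical event (hypothetical layout, for illustration only)."""
    participant_id: str
    coding_system: str  # assumed labels: "ICD10" or "OPCS"
    code: str


# Case-defining codes taken from the text, stored without the dot.
CTS_CODES = {("ICD10", "G560"), ("OPCS", "A651"), ("OPCS", "A692")}


def normalize(code: str) -> str:
    """Strip whitespace and dots so 'G56.0' and 'G560' compare equal (assumption)."""
    return code.strip().replace(".", "").upper()


def identify_cts_cases(records: Iterable[Record]) -> Set[str]:
    """Return the IDs of participants with at least one case-defining code."""
    return {
        r.participant_id
        for r in records
        if (r.coding_system.upper(), normalize(r.code)) in CTS_CODES
    }
```

A participant counts as a case as soon as any one record matches, which mirrors the "searching for" wording above; any additional exclusion rules the study may have applied are not represented here.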
The Copenhagen Hospital Biobank (CHB) is a research biobank that contains leftover samples from diagnostic procedures on hospitalized patients and outpatients in the Danish Capital Region hospitals. Under the “Genetics of pain and degenerative diseases” protocol, approved by the Danish Data Protection Agency (P-2019-51) and the National Committee on Health Research Ethics (NVK-18038012), we identified CTS cases by ICD-10 code G56.0 and NCSP code ACC51. The Danish Blood Donor Study (DBDS) Genomic Cohort is a nationwide study of ~110,000 blood donors44. The Danish Data Protection Agency (P-2019-99) and the National Committee on Health Research Ethics (NVK-1700407) approved the studies under which genetic data on DBDS participants were obtained. The use of DBDS data requested for this study was approved by the DBDS steering committee.
The FinnGen database consists of samples collected from the Finnish biobanks and phenotype data collected at the national health registers. The Coordinating Ethics Committee of the Helsinki and Uusimaa Hospital District evaluated and approved the FinnGen research project. The project complies with existing legislation (in particular the Biobank Law and the Personal Data Act). The official data controller of the study is the University of Helsinki. Subjects were identified using ICD-10 code G56.0 and ICD-9 code 354.0. NCSP-FI codes (the Finnish NCSP adaptation) were not available. The summary statistics for CTS were imported on May 11th, 2021 from a source available to consortium partners (version 5; http://r5.finngen.fi).
Genetic ancestry quality control was performed for the Icelandic45, British46, and Danish47 participants. All participants were genotypically verified as being of European descent.
In total, we had 48,843 carpal tunnel cases (8,122 from Iceland, 19,849 from the UK, 9,664 from Denmark, and 11,208 from Finland) and 1,190,837 controls (318,161 from Iceland, 411,179 from the UK, 266,450 from Denmark, and 195,047 from Finland) in the meta-analysis.
Genotyping and imputation
The preparation of the Icelandic samples, genotyping, whole-genome sequencing (WGS), and imputation were performed at deCODE genetics45,48. The genomes of 49,962 Icelanders were whole-genome sequenced using GAIIx, HiSeq, HiSeqX, and NovaSeq Illumina technology to a mean depth of at least 17.8×. Single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) were identified and their genotypes called using joint calling with Graphtyper49. Additionally, over 166,000 Icelanders (including all sequenced Icelanders) have been genotyped using various Illumina SNP chips and their genotypes phased using long-range phasing50, which allows for improving genotype calls using haplotype sharing information. Subsequently, genealogic information was used to impute sequence variants into the chip-typed Icelanders, as well as their first- and second-degree relatives51, to increase the sample size and power for association analysis.
The UK Biobank samples were genotyped with a custom-made Affymetrix chip, UK BiLEVE Axiom in the first 50,000 individuals52, and the Affymetrix UK Biobank Axiom array53 in the remaining participants. Samples were filtered on 98% variant yield and any duplicates removed. Over 32 million high-quality sequence variants and indels to a mean depth of at least 20× were identified using Graphtyper49. Quality-controlled chip genotype data were phased using Shapeit454. Variants where at least 50% of the samples had a GQ score > 0 were used to prepare a haplotype reference panel using in-house tools and the long-range phased chip data. The haplotype reference panel variants were then imputed into the chip genotyped samples using in-house tools and methods described above for the Icelandic data45,50.
Samples from 276,114 Danes from the CHB and DBDS were genotyped using Illumina Global Screening Array chips and long-range phased together with ~238,000 genotyped samples from North-western Europe using Eagle255. Samples and variants with less than 98% yield were excluded. A haplotype reference panel was prepared in the same manner as for the Icelandic and UK data45,50 by phasing whole-genome sequence genotypes of 15,576 individuals from Scandinavia, the Netherlands, and Ireland using the phased chip data. Graphtyper was used to call the genotypes, which were subsequently imputed into the phased chip data. Whole-genome sequencing, chip-typing, quality control, long-range phasing, and imputation, from which the data for this analysis were generated, were performed at deCODE genetics.
A custom-made FinnGen ThermoFisher Axiom array (>650,000 SNPs) was used to genotype ~177,000 FinnGen samples at the Thermo Fisher genotyping service facility in San Diego. Genotype calls were made with the AxiomGT1 algorithm. Individuals with ambiguous gender, high genotype missingness (>5%), excess heterozygosity (±4 SD), and non-Finnish ancestry were excluded. Variants with high missingness (>2%), low Hardy-Weinberg equilibrium P-value (<1 × 10⁻⁶), and low minor allele count (<3) were excluded. High-coverage (25-30×) WGS data were used to develop the Finnish population-specific SISu v3 imputation reference panel with Beagle 4.1. More than 16 million variants have been imputed (https://finngen.gitbook.io/documentation/methods/genotype-imputation).
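The FinnGen variant-level exclusion criteria listed above can be expressed as a simple filter. The following Python sketch is illustrative only and is not FinnGen's pipeline: the `VariantQC` container and its field names are hypothetical, and the per-variant statistics are assumed to have been computed upstream. The Hardy-Weinberg criterion is interpreted here as a threshold on the test P-value, which is an assumption. The three thresholds are those stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VariantQC:
    """Per-variant QC statistics (hypothetical container, computed elsewhere)."""
    variant_id: str
    missing_rate: float       # fraction of samples with a missing genotype
    hwe_p: float              # Hardy-Weinberg equilibrium test P-value (assumed)
    minor_allele_count: int


# Thresholds as stated in the text.
MAX_MISSINGNESS = 0.02   # variants with missingness > 2% are excluded
MIN_HWE_P = 1e-6         # variants with HWE P < 1e-6 are excluded
MIN_MAC = 3              # variants with minor allele count < 3 are excluded


def passes_qc(v: VariantQC) -> bool:
    """True if the variant survives all three exclusion criteria."""
    return (
        v.missing_rate <= MAX_MISSINGNESS
        and v.hwe_p >= MIN_HWE_P
        and v.minor_allele_count >= MIN_MAC
    )


def filter_variants(variants: Iterable[VariantQC]) -> List[VariantQC]:
    """Keep only the variants that pass QC, preserving input order."""
    return [v for v in variants if passes_qc(v)]
```

The comparisons are written so that a variant sitting exactly on a threshold is retained, matching the strict inequalities (>2%, <1 × 10⁻⁶, <3) used to describe exclusion.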
Association analysis
We applied logistic regression using the Icelandic, UK, and Danish data and combined the results with imported association results from Finland (https://r5.finngen.fi/) to test for association between sequence variants and CTS. For the additive model, the expected allele counts were used as a covariate. We used LD score regression to account for distribution inflation due to cryptic relatedness and population stratification14 and used the intercepts as correction factors (CF) in the Icelandic (CF = 1.19), UK (CF = 1.03), and Danish (CF = 1.03) datasets.
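The text states that the LD score regression intercepts were used as correction factors but does not spell out the mechanics. A common convention, assumed in the sketch below, is to divide each 1-df chi-square statistic by the correction factor, which is equivalent to inflating the standard error by its square root. The function names are hypothetical and the input values in the usage note are made up for illustration.

```python
import math
from typing import Tuple


def chi2_to_p(chi2: float) -> float:
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def correct_association(beta: float, se: float, cf: float) -> Tuple[float, float, float]:
    """Apply a correction factor (e.g. an LD score regression intercept).

    Assumption: chi-square is divided by cf, i.e. the SE is scaled by sqrt(cf).
    Returns (adjusted SE, adjusted chi-square, adjusted P-value).
    """
    if cf <= 0 or se <= 0:
        raise ValueError("cf and se must be positive")
    se_adj = se * math.sqrt(cf)
    chi2_adj = (beta / se_adj) ** 2
    return se_adj, chi2_adj, chi2_to_p(chi2_adj)
```

For example, `correct_association(0.10, 0.02, 1.19)` applies the Icelandic correction factor to a hypothetical variant with a log odds ratio of 0.10 and a standard error of 0.02; the effect estimate itself is unchanged, and only its uncertainty and P-value are adjusted.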
In the Icelandic association analysis, we adjusted for sex, county of origin, age at data analysis or age at death (first and second order terms included), blood sample availability for the individual, and an indicator function for the overlap of the lifetime of the individual with the time span of phenotype collection. In the UK association analysis, we adjusted for sex, age, and the first 20 principal components46. In the Danish association analysis, we adjusted for sex, whether the individual had been chip-typed and/or sequenced, and the first 20 principal components. The imported Finnish association analysis was adjusted for sex, age, the genotyping batch, and the first 10 principal components.
We combined CTS GWASs from Iceland, the UK, and Denmark with summary statistics from Finland using a fixed-effects inverse variance method56 based on effect estimates and standard errors in which each dataset was assumed to have a common OR but allowed to have different population frequencies for alleles and genotypes. The total number of variants included in the meta-analysis that had imputation information above 0.8 and MAF > 0.01% was 37,071,338 (20,697,529 in Iceland, 32,381,502 in the UK, 27,776,695 in Denmark, and 12,830,475 in Finland). We estimated the genome-wide significance threshold using a weighted Bonferroni adjustment that controls for the family-wise error rate12. Sequence variants were mapped to NCBI Build38 and matched on position and alleles to harmonize the four datasets. Variants were weighted based on predicted functional impact12 (Supplementary Table 2).
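For a single variant, a fixed-effects inverse-variance meta-analysis weights each dataset's effect estimate by the reciprocal of its squared standard error. The Python sketch below shows the standard calculation under the common-effect assumption described above. It is an illustration, not the software used in the study; the function name is hypothetical, and effects are assumed to be log odds ratios already aligned to the same effect allele.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Combine per-dataset (beta, se) pairs with inverse-variance weights.

    beta is the log odds ratio for the same effect allele in every dataset.
    Returns (pooled beta, pooled SE, two-sided P-value from a normal test).
    """
    if not estimates:
        raise ValueError("at least one (beta, se) pair is required")
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value
```

The pooled odds ratio is `math.exp(pooled_beta)`. Because only effect estimates and standard errors enter the calculation, allele frequencies are free to differ between datasets, as noted in the text.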
Conditional association analyses were performed on the GWASs from Iceland, the UK, and Denmark using true imputed genotypes of participants. Approximate conditional analyses (COJO), implemented in the GCTA-software53, were applied on the lead variants in the Finnish summary statistics. The analyses were restricted to variants within 1 Mb from the index variants. LD between variants was estimated using a set of 5,000 WGS Icelanders. After adjusting for all variants in high LD (r² > 0.8) and vice versa, the P-values were combined for all four datasets to identify the most likely causal variant at each locus and any secondary signals. Based on the number of variants tested, we chose a conservative P-value threshold of <5 × 10⁻⁸ for secondary signals.
Polygenic risk score and heritability
The polygenic risk score (PRS) was generated using a set of 611,000 high-quality variants across the genome, chosen to avoid uncertainty due to imputation quality57. LD estimated from almost 15,000 phased Icelandic samples was used to derive adjusted effect estimates by applying LDpred19. The effect estimates were used as weights. We generated PRSs for the Icelandic, UK, and Danish datasets and used leave-one-out meta-analyses, in which the summary data from the target dataset were omitted, to avoid bias in the PRS estimates. In addition, we meta-analyzed the results from the dataset-specific PRS analyses using a random-effects model, fitted via maximum likelihood estimation20. The unit of the effect for each PRS is in SD. Since the effects on CTS were comparable (overlapping confidence intervals), we performed no further scaling.
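Once adjusted effect estimates are available, scoring an individual reduces to a weighted sum of allele dosages, and expressing effects per SD requires standardizing the scores within a dataset. The Python sketch below illustrates only these two steps. It does not implement LDpred: the weights are assumed to be given, for instance derived from a meta-analysis that omits the target dataset. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import List, Sequence


def raw_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to mean 0 and SD 1 so that effects are per SD of PRS."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mean) / sd for s in scores]


def score_cohort(dosage_matrix: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Standardized PRS for every individual (one row of dosages each)."""
    return standardize([raw_prs(row, weights) for row in dosage_matrix])
```

Standardizing within each dataset puts the PRS on a common SD scale before its association with CTS is estimated, which is what makes the dataset-specific effects directly comparable.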
We estimated the SNP heritability for the combined CTS GWASs from Iceland, the UK, and Denmark using LD score regression14 and precomputed LD scores based on about 1.1 million variants from European ancestry samples (downloaded from: https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2).
Genetic correlation
Genetic correlation analyses between the CTS meta-analysis and 1,319 published GWAS traits (P ≤ 3.8 × 10⁻⁵) from the UK Biobank28 with effective sample size over 5,000 were performed using LD score regression14,58, for which a minimum effective sample size of 5,000 per trait is suggested to obtain unbiased estimates of genetic correlation and heritability. Since participants in the published GWAS studies are of Caucasian ancestry, we used pre-computed LD scores from a 1000 Genomes panel with r² from HapMap3, excluding the HLA region. The HLA region was excluded for its genetic complexity and association with a wide number of traits. The default parameters of the LD score regression were used to compute the genetic correlation and heritability estimates.
Mendelian randomization
A two-sample MR analysis was performed to estimate the causal relationship between CTS and exposure traits, which we used as instrumental variables. We used the inverse-variance-weighted (IVW) method to estimate the causal relationship between variables, a t-test to compute the P-value, and the Egger method to test for pleiotropy in IVW estimates, all implemented in the MendelianRandomization package59 in R.
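The core of the IVW method is a weighted regression through the origin of the variant-outcome effects on the variant-exposure effects. The Python sketch below shows the basic fixed-effect form of that estimator. It is an illustration of the formula only and not a reimplementation of the MendelianRandomization R package: the function name is hypothetical, uncertainty in the exposure effects is ignored (a standard simplification), and the t-test and Egger steps mentioned above are not included.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted causal estimate.

    Each position holds one genetic variant's effect on the exposure, its
    effect on the outcome, and the SE of the outcome effect.
    Returns (causal estimate, standard error).
    """
    n = len(beta_exposure)
    if n == 0 or n != len(beta_outcome) or n != len(se_outcome):
        raise ValueError("inputs must be non-empty and of equal length")
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("all exposure effects are zero")
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

When every variant implies the same ratio of outcome effect to exposure effect, the estimate equals that ratio; heterogeneity in the ratios across variants is one sign of the pleiotropy that the Egger method is used to test for.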
Functional data
To highlight potential causal genes associating with CTS, we annotated the CTS associations or variants in high LD (r² ≥ 0.8 and within ±1 Mb) that are predicted to affect coding or splicing of a protein (Variant Effect Predictor using the RefSeq gene set), mRNA expression (top local expression quantitative trait loci [cis-eQTL]) in multiple tissues from deCODE, GTEx (https://www.gtexportal.org/home/), and other public datasets, and/or plasma protein levels (top protein quantitative trait loci [pQTL]).
RNA sequencing was performed on whole blood from 13,175 Icelanders and on subcutaneous adipose tissue from 750 Icelanders, described in detail elsewhere60. Gene expression was computed based on personalized transcript abundances using kallisto61. Association between sequence variants and gene expression (cis-eQTL) was estimated using a generalized linear regression, assuming additive genetic effect and quantile normalized gene expression estimates, adjusting for measurements of sequencing artefacts, demographic variables, blood composition, and hidden covariates62.
We used the SomaLogic® SOMAscan proteomics assay to test the association of the sequence variants with protein levels in plasma15. The assay scanned 4,907 aptamers that measure 4,719 proteins in samples from 35,559 Icelanders with genetic information available at deCODE genetics. Plasma protein levels were standardized and adjusted for year of birth, sex, and year of sample collection (2000-2019).
We performed gene-based enrichment analysis using the GENE2FUNC tool in FUMA. The genes were tested for over-representation in different gene sets, including Gene Ontology cellular components (MSigDB c5) and GWAS catalogue-reported genes.