Study Design
The Prospective Epidemiological Risk Factor (PERF) study comprises 5855 postmenopausal Danish women, enrolled at baseline (PERF I) between 1999 and 2001. The aim of the study was to identify potential risk factors associated with age-related diseases. All participants had previously been enrolled in randomized placebo-controlled trials or had been screened for previous studies in Copenhagen or Aalborg at the Center for Clinical and Basic Research (Neergaard et al. 2017).
The study was carried out in accordance with Good Clinical Practice (ICH-GCP) and the Declaration of Helsinki. The study protocol was approved by the Danish Research Ethics Committee (KA99070gm). Informed consent was obtained from all participants or legal guardians.
Participants with missing genotype data were excluded, and a population-level genotype filter was applied to remove participants with cryptic relatedness. Participants without registry data or with missing VICM measurements were also excluded.
Data Collection
As part of the PERF study, participants were interviewed by a doctor or nurse at baseline, allowing collection of data pertaining to demographics, lifestyle, and medical history. For those who consented, fasting serum (n = 5668) and DNA samples (n = 5553) were collected.
VICM was measured blinded in serum by enzyme-linked immunosorbent assay (ELISA) in a CAP-certified laboratory as previously described 14. Lymphocyte and neutrophil counts were determined using an automated blood cell analyser (Sysmex).
Disease history for each consenting participant (n = 5602) was collected from the Danish National Patient Registry (NPR) by linking each participant’s civil registration number (CPR number) to the NPR. Data were collected for the period 1974–2014 and were censored on 31-12-2014, corresponding to the end of the PERF study.
Disease phenotypes used in this study were classified using the NPR, biomarker measurements, and questionnaire data (Supplementary Table 1). Phenotypes were defined over two windows: pre-baseline plus one year, and all-time occurrence.
Genotyping
Of the 5553 samples available, 5516 were successfully genotyped. Genotyping was performed using a custom-made Illumina Global Screening Array version 2 (693143 probes) in collaboration with deCODE Genetics Iceland.
Probe and Individual Filtering
Standard probe-level filtering was performed, requiring a minor allele frequency of at least 1%, a Hardy-Weinberg equilibrium p-value of at least 1e-6, and a probe call rate of at least 97%. No multi-allelic SNP filtering was conducted.
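As an illustration only, the three probe-level thresholds can be expressed as a simple predicate. This is a minimal Python sketch, not the pipeline used in the study; the `ProbeStats` record and the function names are hypothetical, and it assumes the per-probe statistics have already been computed elsewhere.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_MAF = 0.01        # minor allele frequency >= 1%
MIN_HWE_P = 1e-6      # Hardy-Weinberg equilibrium p-value >= 1e-6
MIN_CALL_RATE = 0.97  # probe call rate >= 97%


@dataclass
class ProbeStats:
    """Hypothetical per-probe summary record (not from the study)."""
    probe_id: str
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium p-value
    call_rate: float  # fraction of samples with a called genotype


def passes_qc(probe: ProbeStats) -> bool:
    """Return True if the probe satisfies all three thresholds."""
    return (
        probe.maf >= MIN_MAF
        and probe.hwe_p >= MIN_HWE_P
        and probe.call_rate >= MIN_CALL_RATE
    )


def filter_probes(probes):
    """Keep only the probes that pass quality control."""
    return [p for p in probes if passes_qc(p)]
```

A probe is retained only when it meets all three criteria; failing any single one removes it.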
A total of 534710 probes were screened for association with serological VICM levels. To address cryptic relatedness, identity-by-descent and the inbreeding coefficient were calculated using the Plink --genome and --ibc functions, respectively. Study participants were removed on a one-side-of-a-pair basis, using a pi_hat cut-off value of 0.1875. For the Fhat2 coefficient, participants with values of less than −0.1 or greater than 0.1 were removed. Of the 5516 available genotyped study participants, 136 were removed.
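The individual-level filter can be sketched as follows. This Python example is an assumption-laden illustration rather than the procedure actually run: the input structures are hypothetical, the comparison with the pi_hat cut-off is assumed to be strict, and the choice of which member of a related pair to drop (here, the second) is arbitrary because the text does not specify it.

```python
PI_HAT_CUTOFF = 0.1875  # relatedness cut-off stated in the text
FHAT2_LIMIT = 0.1       # Fhat2 values outside [-0.1, 0.1] are removed


def samples_to_remove(pairs, fhat2):
    """Return the set of sample IDs flagged for removal.

    pairs: iterable of (id1, id2, pi_hat) tuples (hypothetical layout).
    fhat2: dict mapping sample ID to its Fhat2 inbreeding coefficient.
    """
    removed = set()

    # One-side-of-a-pair removal: if neither member of a related pair has
    # been dropped yet, drop one of them (assumption: the second member).
    for id1, id2, pi_hat in pairs:
        if pi_hat > PI_HAT_CUTOFF and id1 not in removed and id2 not in removed:
            removed.add(id2)

    # Inbreeding filter: drop samples with an extreme Fhat2 coefficient.
    for sample_id, f in fhat2.items():
        if f < -FHAT2_LIMIT or f > FHAT2_LIMIT:
            removed.add(sample_id)

    return removed
```

Skipping pairs in which one member has already been dropped avoids removing more participants than necessary when a single individual is related to several others.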
Principal Component Analysis
EIGENSTRAT Smartpca 7.2.0 15 was used to conduct principal component analysis (PCA) of the genotypes in the available population (n = 5106) for all 534710 filtered variants. Ten principal components were extracted, with 5 iterations used. The three leading components capture 0.3% of the explained variance.
Linear Regression
Using plink v1.90p 16, linear additive regression was conducted on the study population (n = 4369), adjusting for age at baseline, BMI, and the leading three principal components. The Plink switches --allow-no-sex, to allow samples with missing gender information, and --keep-allele-order, to assign allele A1 to the ALT allele, were used.
The genome-wide significance threshold was set at 5e-8 and the suggestive significance threshold at 1e-5. P-values were visualized in a Manhattan plot, using the R package qqman 17.
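For orientation, a command of roughly the following shape would run such a regression in Plink 1.9. This Python sketch only assembles the argument list; it is not the command used in the study. The file names are hypothetical placeholders, and it assumes that the covariate file contains only age, BMI, and the three principal components, so that no covariate selection is needed.

```python
def build_plink_linear_command(bfile, pheno, covar, out):
    """Assemble a Plink 1.9 argument list for additive linear regression.

    All four arguments are hypothetical file prefixes or paths.
    """
    return [
        "plink",
        "--bfile", bfile,      # binary genotype fileset prefix
        "--pheno", pheno,      # quantitative phenotype file (VICM levels)
        "--covar", covar,      # covariate file (age, BMI, PC1-PC3 assumed)
        "--linear",            # additive linear regression by default
        "--allow-no-sex",      # keep samples with missing sex information
        "--keep-allele-order", # do not swap A1/A2 by allele frequency
        "--out", out,          # output file prefix
    ]


# Example with placeholder names; pass the list to subprocess.run to execute.
command = build_plink_linear_command("perf_qc", "vicm.txt", "covars.txt", "vicm_gwas")
```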
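The two thresholds amount to a three-way classification of each association p-value. The Python sketch below is illustrative; the function name and labels are hypothetical, and it assumes strict inequalities, which the text does not state explicitly.

```python
GENOME_WIDE_THRESHOLD = 5e-8  # genome-wide significance
SUGGESTIVE_THRESHOLD = 1e-5   # suggestive significance


def classify_p_value(p):
    """Label a GWAS p-value by the tier it reaches (strict '<' assumed)."""
    if p < GENOME_WIDE_THRESHOLD:
        return "genome-wide"
    if p < SUGGESTIVE_THRESHOLD:
        return "suggestive"
    return "not significant"
```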
Pathway Analysis/Enrichment
Following GWAS analysis, pathway enrichment analysis was performed using VEGAS2 18 and PARIS 2.4 19 with default parameters. VEGAS2 analysis was performed using the Biosystems gene/pathway annotation file provided by the software website. The LOKI knowledge base used by PARIS was compiled in February 2020. Significant associations (p < 0.05) with REACTOME pathways that were common to both frameworks were reported.
Statistical Analysis
Logistic regression was performed between serum levels of VICM and disease phenotypes, adjusting for baseline age, BMI, and smoking, three known factors affecting VICM levels. P-values were corrected for multiple hypothesis testing using FDR, with a threshold of 0.05. Cause-specific Cox proportional hazards regression analyses were performed between VICM and major causes of death, with cause of death taken from the patient death registry. Age was used as the time scale, and the models were adjusted for BMI and smoking. All analyses were done in R version 3.6.0 20, with Cox models built using rms 21 and plotted using ggplot2 22.
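The text does not name the FDR procedure. Assuming the common Benjamini-Hochberg method, the correction can be sketched in plain Python as below; this is an illustration of that assumed method, not the code used in the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the study's 'FDR' correction is Benjamini-Hochberg.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over ranks >= its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Example: flag phenotypes whose adjusted p-value falls below 0.05.
raw = [0.01, 0.04, 0.03, 0.5]
significant = [q < 0.05 for q in benjamini_hochberg(raw)]
```

In this example only the first test remains significant after adjustment, although three of the four raw p-values are below 0.05.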