Participants and procedures
Data were derived from the English Longitudinal Study of Ageing (ELSA), a multidisciplinary prospective cohort study of a nationally representative sample of men and women aged 50 years and older in England.22 The study began in 2002, with biennial reassessments since then. Data from combined waves 2 and 4 (2004–2008) were used as baseline because genetic data were first collected during this period. Outcome data on sleep duration and depression symptoms were derived from combined waves 6 and 8 (2012–2016), given that both may fluctuate within individuals over time. Data were collected in participants’ homes through nurse visits and computer-assisted personal interviews (CAPI). Of the 7146 participants in the baseline sample, 625 (8.8%) with severe depression symptoms and 1076 (15.1%) with short-sleep or long-sleep at baseline were excluded from the respective analyses, leaving two analytic samples of 6521 and 6070 (Fig. 1). Participants provided written informed consent, and ethical approval was granted by the National Research Ethics Service (London Multicentre Research Ethics Committee).
Study variables
Sleep duration. Sleep duration was measured with an open-ended question asking participants how long they slept on an average weeknight. Following the literature,7,23 sleep duration was also categorised as “≤5 hrs” (short-sleep), “>5 to <9 hrs” (optimal-sleep), and “≥9 hrs” (long-sleep).
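A minimal sketch of the categorisation rule, assuming reported durations are in hours and may be fractional. The function name `categorise_sleep` is hypothetical and is not taken from the study's analysis code.

```python
def categorise_sleep(hours: float) -> str:
    """Map self-reported weeknight sleep (in hours) to the three study categories.

    Cut-offs follow the text: <=5 h short-sleep, >5 to <9 h optimal-sleep,
    >=9 h long-sleep.
    """
    if hours < 0:
        raise ValueError("sleep duration cannot be negative")
    if hours <= 5:
        return "short-sleep"
    if hours < 9:
        return "optimal-sleep"
    return "long-sleep"


# Hypothetical reports, including values on both boundaries.
reports = [4.5, 5, 6, 7.5, 8.9, 9, 10]
print({h: categorise_sleep(h) for h in reports})
```

Note that the boundaries are asymmetric: exactly 5 hours falls into short-sleep, while exactly 9 hours falls into long-sleep.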
Depressive Symptoms. The eight-item Centre for Epidemiologic Studies Depression Scale24 (CES-D) was used to assess self-reported depression symptoms over the past week. Its validity and reliability are comparable to those of the original 20-item scale.25 One item (“whether their sleep was restless during the past week”) was removed because it overlapped with the sleep measures. The remaining seven items asked whether, during the past week, participants felt “…depressed much of the time”; “…everything was an effort”; “…happy much of the time”; “…sad much of the time”; “…lonely much of the time”; “…enjoyed life much of the time”; and “…could not get going much of the time”. Items were scored on a binary response scale (1=‘yes’; 0=‘no’), with positively worded items reverse-scored so that higher scores indicated greater depression symptoms. Scores were summed to generate a total continuous score ranging from 0 (‘no depression symptoms’) to 7 (‘severe depression symptoms’). Cronbach’s alpha in this sample was 0.80. To indicate severe depression symptoms, scores were dichotomised at ≥4, a well-recognised indicator of clinically significant depression.25
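The scoring rules above can be sketched as follows. The item keys (e.g. `could_not_get_going`) and all function names are hypothetical labels for the seven retained items. The Cronbach's alpha function applies the standard formula k/(k−1)·(1 − sum of item variances / variance of total scores) to items that have already been reverse-scored. This is an illustration under those assumptions, not the study's analysis code.

```python
from statistics import pvariance

# Hypothetical keys for the seven retained CES-D items (1 = 'yes', 0 = 'no').
ITEMS = ["depressed", "effort", "happy", "sad", "lonely", "enjoyed", "could_not_get_going"]
POSITIVE_ITEMS = {"happy", "enjoyed"}  # positively worded, so reverse-scored
SEVERE_CUTOFF = 4  # total score >= 4 flags severe depression symptoms


def keyed_items(responses: dict[str, int]) -> list[int]:
    """Return the seven item scores after reverse-scoring the positive items."""
    keyed = []
    for item in ITEMS:
        value = responses[item]
        if value not in (0, 1):
            raise ValueError(f"{item} must be coded 0 or 1, got {value!r}")
        keyed.append(1 - value if item in POSITIVE_ITEMS else value)
    return keyed


def cesd7_score(responses: dict[str, int]) -> int:
    """Total score from 0 (no symptoms endorsed) to 7 (all symptoms endorsed)."""
    return sum(keyed_items(responses))


def is_severe(score: int) -> bool:
    return score >= SEVERE_CUTOFF


def cronbach_alpha(rows: list[list[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of keyed scores.

    Population variances are used throughout; the n/(n-1) factor cancels in the ratio.
    """
    k = len(rows[0])
    item_var_sum = sum(pvariance([row[j] for row in rows]) for j in range(k))
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical respondent: 'happy' and 'enjoyed' answered 'no', so both count as symptoms.
respondent = {"depressed": 1, "effort": 1, "happy": 0, "sad": 1,
              "lonely": 0, "enjoyed": 0, "could_not_get_going": 0}
score = cesd7_score(respondent)
print(score, is_severe(score))
```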
Covariates. Covariates included age (in years; all participants were aged ≥50); age squared (age²) to account for nonlinearity; sex (male/female); and genetic ancestry, measured by principal components (described below), to account for ancestry differences in genetic structure that could bias results.
Genetic data
The genome-wide genotyping was performed at University College London (UCL) Genomics in 2013-2014, with funding from the Economic and Social Research Council (ESRC), using Illumina HumanOmni2.5 BeadChips (HumanOmni2.5-4v1, HumanOmni2.5-8v1.3), which measure ~2.5 million markers capturing genomic variation down to a minor allele frequency of 2.5%.
Quality Control. Single-nucleotide polymorphisms (SNPs) were excluded if they were non-autosomal, if their minor allele frequency was <1%, if more than 2% of genotype data were missing, or if the Hardy-Weinberg Equilibrium p-value was <10⁻⁴. Samples were removed based on call rate (<0.99), sex discrepancy, heterozygosity, and relatedness. To improve genome coverage, we imputed untyped genotypes from the quality-controlled data against the Haplotype Reference Consortium26 reference panel using the University of Michigan Imputation Server.27 Post-imputation, we kept variants that were genotyped or imputed with INFO>0.80, in low linkage disequilibrium (R²<0.1), and with a Hardy-Weinberg Equilibrium p-value >10⁻⁵. After quality control, 7179780 variants were retained for further analyses. To account for potentially biasing ancestry differences in genetic structure, a principal components (PCs) analysis was conducted and the top 10 PCs were retained;28 these were subsequently used to adjust for possible population stratification in the association analyses.28,29
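A sketch of the SNP-level exclusion rules, assuming per-SNP genotype counts are already available. Genotype QC software commonly uses an exact Hardy-Weinberg test; this sketch swaps in the simpler 1-df chi-square approximation so that it stays dependency-free. Sample-level QC, imputation, and the post-imputation filters are not covered, and all names are hypothetical.

```python
import math
from typing import NamedTuple

AUTOSOMES = {str(c) for c in range(1, 23)}


class SnpCounts(NamedTuple):
    chrom: str      # chromosome label, e.g. "1".."22" or "X"
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples with no genotype call


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Hardy-Weinberg p-value from a 1-df chi-square test (approximates the exact test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:  # monomorphic SNP: nothing to test
        return 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(s: SnpCounts, maf_min: float = 0.01,
                  missing_max: float = 0.02, hwe_p_min: float = 1e-4) -> bool:
    """Apply the SNP-level exclusion rules: autosomal, MAF >= 1%, missingness <= 2%, HWE p >= 1e-4."""
    if s.chrom not in AUTOSOMES:
        return False
    n_called = s.n_aa + s.n_ab + s.n_bb
    if n_called == 0 or s.n_missing / (n_called + s.n_missing) > missing_max:
        return False
    freq_b = (2 * s.n_bb + s.n_ab) / (2 * n_called)
    if min(freq_b, 1 - freq_b) < maf_min:
        return False
    return hwe_chisq_p(s.n_aa, s.n_ab, s.n_bb) >= hwe_p_min


# Hypothetical SNPs: one passing, then non-autosomal, HWE-violating, and rare.
examples = [SnpCounts("1", 250, 500, 250, 10), SnpCounts("X", 250, 500, 250, 0),
            SnpCounts("2", 500, 0, 500, 0), SnpCounts("3", 995, 5, 0, 0)]
print([passes_snp_qc(s) for s in examples])
```

The checks run in order of cost, so a rare SNP is rejected on allele frequency before any Hardy-Weinberg statistic is computed.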
Polygenic risk scores (PGS). PGSs for sleep duration, short-sleep, and long-sleep were calculated using summary statistics from genome-wide association studies (GWAS) in the UK Biobank.10,30 PGSs for depression were calculated using summary statistics from a GWAS of major depressive disorder (MDD) conducted by the Psychiatric GWAS Consortium (PGC), encompassing n=1331010 participants.19 All PGSs were calculated at six p-value thresholds (PT; i.e., 0.001, 0.01, 0.05, 0.1, 0.3, and 1) using PRSice (Supplementary [S] Table 1).31 Using the sample size (n), the total number of independent markers in the genotyping panel (m), and the lower and upper p-values used to select markers into each polygenic score, we estimated the predictive accuracy (R²) and predictive power of each PT using the avengeme package in R.32,33 The optimal PT was 0.001 for the PGSs for sleep duration (m=39476, R²=0.003, P=2.12×10⁻⁵), short-sleep (m=52197, R²=0.004, P=6.52×10⁻⁸), and depression (m=63824, R²=0.001, P=0.003), whereas the optimal PT for the long-sleep PGS was 0.01 (m=127099, R²=0.003, P=5.79×10⁻⁶). The estimated predictive accuracy for all PGSs is reported in Table S1. To aid interpretability, all PGSs were standardised by subtracting the mean and dividing by the standard deviation, so that a unit increase corresponds to a one-standard-deviation increase in the PGS. The correlations between PGSs and phenotypic data ranged from −0.057 to +0.048 (Table S2).
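At its core, a thresholded PGS is a weighted sum of allele dosages over the SNPs that pass a GWAS p-value threshold, followed by the standardisation described above. The sketch below assumes that SNPs with p at or below PT are included and uses toy, hypothetical SNP ids and values. Tools such as PRSice additionally clump SNPs by linkage disequilibrium before thresholding, which is omitted here.

```python
from statistics import mean, pstdev

THRESHOLDS = (0.001, 0.01, 0.05, 0.1, 0.3, 1.0)  # p-value thresholds (PT) from the text


def polygenic_score(dosages: dict[str, float],
                    gwas: dict[str, tuple[float, float]],
                    p_threshold: float) -> float:
    """Sum of effect-allele dosage x GWAS effect size over SNPs with p <= threshold.

    dosages: SNP id -> effect-allele dosage (0 to 2) for one participant.
    gwas:    SNP id -> (beta, p-value) from the discovery summary statistics.
    SNPs absent from the participant's genotype data are skipped.
    """
    return sum(beta * dosages[snp]
               for snp, (beta, p) in gwas.items()
               if p <= p_threshold and snp in dosages)


def standardise(scores: list[float]) -> list[float]:
    """z-score: subtract the mean, then divide by the standard deviation."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


# Toy summary statistics and genotypes (hypothetical SNP ids and values).
gwas = {"rs_a": (0.20, 0.0005), "rs_b": (-0.10, 0.02), "rs_c": (0.05, 0.40)}
cohort = [{"rs_a": 2, "rs_b": 1, "rs_c": 0},
          {"rs_a": 0, "rs_b": 2, "rs_c": 2},
          {"rs_a": 1, "rs_b": 0, "rs_c": 1}]

for pt in THRESHOLDS:
    raw = [polygenic_score(person, gwas, pt) for person in cohort]
    print(pt, [round(z, 2) for z in standardise(raw)])
```

After standardisation, each score has mean 0 and standard deviation 1 in the analytic sample, which is what allows a regression coefficient to be read per standard deviation of genetic liability.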
Statistical Analyses
Imputation of missing values. Missingness in the main and sensitivity analyses ranged from 0.0% to 17.0% (Table S3). Given the possibility of bias in complete-case analysis,34,35 missing values were imputed using missForest, an iterative imputation method based on random forests, in R v.4.0.3 (via RStudio). In ELSA, socioeconomic variables are the main drivers of attrition,22 so missingness can largely be explained by observed variables, and the assumption that missingness did not depend on unobserved values, that is, that data were missing at random (MAR), was likely to be met. The missForest method has previously been shown to outperform prominent imputation methods, such as multivariate imputation by chained equations and k-nearest neighbours, in the presence of nonlinearity and interactions.36 Imputation yielded minimal error for continuous (normalised root mean squared error [NRMSE]=0.09%) and categorical (proportion of falsely classified entries [PFC]=0.14%) variables. A comparison of imputed and observed data indicated homogeneity between samples (Table S4).
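The two error measures reported above, NRMSE for continuous and PFC for categorical variables, can be computed as below. missForest estimates them from out-of-bag predictions; this sketch instead shows the metrics themselves on values that were observed, deliberately masked, and then imputed, a common way of checking an imputation. The NRMSE form is assumed here to be sqrt(mean((true − imputed)²) / var(true)), taken over the imputed entries, and the example values are hypothetical.

```python
import math
from statistics import mean, pvariance


def nrmse(true_vals: list[float], imputed_vals: list[float]) -> float:
    """Normalised root mean squared error over masked-then-imputed entries.

    Computes sqrt(mean((true - imputed)^2) / var(true)); 0 means perfect recovery.
    """
    if len(true_vals) != len(imputed_vals) or not true_vals:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(t - i) ** 2 for t, i in zip(true_vals, imputed_vals)]
    return math.sqrt(mean(squared_errors) / pvariance(true_vals))


def pfc(true_cats: list[str], imputed_cats: list[str]) -> float:
    """Proportion of falsely classified entries among imputed categorical values."""
    if len(true_cats) != len(imputed_cats) or not true_cats:
        raise ValueError("need two non-empty lists of equal length")
    wrong = sum(t != i for t, i in zip(true_cats, imputed_cats))
    return wrong / len(true_cats)


# Hypothetical check: sleep hours and sex values that were observed, masked, then imputed.
print(nrmse([6.0, 7.5, 8.0, 5.5], [6.2, 7.1, 8.0, 5.9]))
print(pfc(["male", "female", "female"], ["male", "female", "male"]))
```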
Association analyses. Logistic regressions, reported as odds ratios (OR) with 95% confidence intervals (95%CI), were used to test whether PGSs for sleep duration, short-sleep, and long-sleep were associated with the onset of severe depression symptoms over an average 8-year follow-up. Multiple linear and multinomial logistic regressions were used to investigate associations between the PGS for depression and, respectively, overall sleep duration and the onset of short-sleep and long-sleep during follow-up. In these models, standardised regression coefficients (β) denote the change in overall sleep duration per unit increase in the PGS, and relative risk ratios (RRR) denote the relative risk of short-sleep or long-sleep compared with optimal-sleep (the reference category); both are reported with standard errors (SE) and 95%CI. Sleep duration was modelled continuously with quadratic (squared) terms to account for nonlinearity. When both linear and quadratic effects were significant, the linear term was treated as the lower-order term and subsumed under the quadratic effect. Two models were fitted to examine the role of covariates: Model 1 was unadjusted; Model 2 controlled for baseline age, age², sex, and 10 PCs. All association analyses were conducted in Stata 17.1 (StataCorp, USA).
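The ORs and RRRs described above are exponentiated model coefficients. As a small sketch in plain Python (not the Stata routines used in the study), the conversion from a log-scale coefficient and its SE to a point estimate with a Wald 95%CI looks like this; the coefficient and SE in the usage line are made up for illustration and are not study results.

```python
import math
from statistics import NormalDist

# Two-sided 95% interval: the 97.5th percentile of the standard normal (about 1.96).
Z_975 = NormalDist().inv_cdf(0.975)


def exponentiate_with_ci(coef: float, se: float,
                         z: float = Z_975) -> tuple[float, float, float]:
    """Return exp(coef) with its Wald confidence interval, built on the log scale.

    For a logistic model this is an odds ratio (OR); for a multinomial logit
    it is a relative risk ratio (RRR) versus the base (reference) category.
    """
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Illustrative (made-up) log-odds coefficient and SE for a 1-SD increase in a PGS.
or_, lo, hi = exponentiate_with_ci(0.10, 0.05)
print(f"OR = {or_:.2f} (95%CI {lo:.2f}-{hi:.2f})")
```

Because the interval is computed on the log scale and then exponentiated, it is asymmetric around the OR. Stata's `logistic` command and `mlogit` with the `rrr` option report these exponentiated quantities directly, so the sketch only shows the arithmetic.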
Sensitivity analyses. Three sets of sensitivity analyses were performed to assess the robustness of the main results. First, to test whether associations depended on the categorisation of depression, analyses were repeated using continuous depression scores. Second, phenotypic associations using self-reported sleep duration, short-sleep, long-sleep, and depression symptoms were tested to assess consistency with the genetic findings. Finally, to check consistency with the results from imputed data, analyses were repeated using complete cases. No corrections for multiple comparisons were made, as exploratory studies do not strictly require multiplicity adjustment.37