Health and Prevention Enhancement (H-PEACE) and Gene-Environment of Interaction and Phenotype (GENIE) cohorts
In the study of discovery stage, the H-PEACE and GENIE cohorts were used. They consisted of participants who visited the Seoul National University Hospital Healthcare System Gangnam Center for health check-ups from 2003 to 2017 14. Among them, 7 999 and 2 349 participants from the H-PEACE and GENIE cohorts, respectively, were genotyped on 833 000 variants using the Affymetrix Axiom Korean Chip (v1.0) 15. The variant calling was performed using the K-Medoid clustering method 16.
Beck Depression Inventory
The degree of depression of the participants in the two cohorts was measured using original BDI. It is one of the most widely used self-scored measures of depression, comprising 21 items to evaluate how the participants had been feeling for the last two weeks 17. Each item was rated on a four-point ordinal scale from 0 to 3, where a higher score represents a higher level of depression-related feelings. The total sum of the scores of 21 items was used as a criterion for depression.
In the cohorts, the BDI scores of a few participants had errors, because they had selected more than one point for an item or their responses for some items were missing. We regarded them as missing values and selected subjects who had missing values lower than 10.5, half of the total number of items. The average missing rates for each item in the H-PEACE and GENIE cohorts were 7.23% and 19.09%, respectively. We imputed missing values of the remaining participants by referring to (if any) their own scores measured during their check-ups at other times or scores of other participants who were close to them based on scores of other non-missing items using the missForest (v.1.4) R package 18. In accordance with Beck et al. (1988) 19, we regarded participants whose sum of the imputed scores was lower than 10 as controls and the others as cases. In the case of participants who had multiple health check-ups, the result was extracted when they had the highest BDI score (if they had the same scores, the latest result was extracted).
Genome-wide Association Study
After sample selection based on the BDI score, the genotype data were cleaned using the quality control steps suggested by Anderson et al (2010) 20. Participants whose genotype missing rate was higher than 0.03, estimated heterozygosity rate differed from the sample mean of all participants by more than three standard deviations, and the estimated pairwise identity-by-descent was higher than 0.185 with a higher genotype missing rate than that of their counterparts were excluded. Variants whose genotype missing rate was significantly different between the case and control by Fisher’s exact test (P < 1 × 10− 5) and higher than 0.03, minor allele frequency (MAF) was lower than 0.05, and P-value of the Hardy–Weinberg equilibrium (HWE) test was lower than 1 × 10− 3 were excluded (Supplementary Fig. 1).
Subsequently, the genotype data were imputed using the Michigan Imputation Server (v.1.1) 21. Haplotype Reference Consortium r1.1 2016 and other/mixed population were chosen as reference panels, and phasing and imputation were performed using Eagle v2.4 and Minimac4, respectively. Among the imputed genotypes, variants whose MAF was higher than 0.05, genotype missing rate was lower than 0.03, HWE test was not significant at P-value 1 × 10− 3, and imputation quality measure “INFO” was higher than 0.8 were extracted (Supplementary Fig. 1).
Further, the GWAS was performed in each cohort after adjusting for the effects of sex, age, body mass index (BMI), and the top four principal component (PC) scores using plink (v1.90b3.44) 22. For 67 and 18 (approximately 1%) participants in the H-PEACE and GENIE cohorts, respectively, BMIs were not observed and were imputed using missForest by referring to those of other participants who had similar sex and age 18. The PC scores were calculated using pruned genotyped variants extracted by using --indep-pairwise 50 5 0.2 option of plink. Meta-analysis was also performed with weights of inverse of the corresponding standard errors of each GWAS result using METAL (v2011-03-25) 23.
For genomic data analysis, in addition to plink (v1.90b3.44) and METAL (v2011-03-25), ONETOOL (v1.0) 24 and R (v3.6.3) were utilized.
Moreover, LocusZoom (http://locuszoom.org/) was used to generate a regional plot of ± 0.5 Mb flanking region of a significant variant 25. Genome build and linkage disequilibrium (LD) population were selected as hg19 and 1000 Genomes Nov 2014 ASN, respectively. The other options were not manipulated.
Replication Cohorts
For validation, Kangbuk Samsung Cohort Study (KSCS) cohort was used. It is a prospective cohort study to evaluate the natural history, prognosis, and genetic and environmental determinants of a wide range of health traits and diseases among Korean adults. 2 011 participants were genotyped using the Illumina HumanCore BeadChips 12v. Depressive symptoms and severity were assessed using CES-D 26. For identifying individuals at risk for clinical depressive symptoms, it uses 16 or above as cut-off score 27. We divided the participants into two groups based on the cut-off score: depressed and non-depressed groups. The analysis steps used were the same as in H-PEACE and GENIE, except that 0.05 was used as the genotype missing rate cutoff in the participant QC step, and 0.02, 0.01, and 1 × 10− 6 were used as the genotype missing rate, MAF, and HWE cutoffs, respectively, in the variant QC step. Furthermore, an older version of Minimac, Minimac3, was used in the imputation step.
Moreover, we referred an independent study of which meta-analysis of GWASs on depression in nine cohorts of East Asian ancestry 9. The summary statistics, excluding the three cohorts, WHI, 23andME, and UKB, were downloaded from https://www.med.unc.edu/pgc/download-results/.
Expression Quantitative Trait Loci Analysis
LDexpress 28, a web-based tool implemented in LDlink 29 (https://analysistools.cancer.gov/LDlink/), was used to search for genes whose expression levels were associated with a significant variant from the GWAS results. In the East Asian population, genes in several tissues were identified, the expression levels of which were significantly associated (P < 1 × 10− 5) with variants\(\)which were highly correlated (\({R}^{2}\) > 0.8) with a significant variant in ± 1 Mb window. The correlation between variants was estimated in the East Asian population using LDpair implemented in LDlink.
Transcriptome-wide Association Study
Gene expression levels in the whole blood tissues of the participants were predicted using PrediXcan 30. The imputed genomic data of each cohort were used as input for the analysis. The gene-wise association between gene expression (a response variable) and self-rated depression (an explanatory variable) was estimated using a linear regression model with the same covariates used in the GWAS. Moreover, the false discovery rate (FDR) was estimated for each chromosome.
Mr Analysis
In this study, MR analysis was performed to identify the causal relationship between a gene and the risk of depressive symptoms. Generalized summary-data-based MR (GSMR) 31 implemented in Genome-wide complex trait analysis (GCTA) (v1.94.0b) 32 was used for the analysis with its default options. It required two summary statistics from regression models between variants and exposure or outcome and reference samples for LD estimation.
We estimated the summary statistic of the association between the expression levels of genes and variants in the imputed H-PEACE dataset using a linear regression model with the same covariates used in the GWAS. The GWAS result of the imputed GENIE dataset was used for the other summary statistic. East Asian samples in 1000 Genomes Phase 3 v5a were used as reference samples.
Association Study Between Variants Or A Gene And Each Depressive Symptom
GWASs of each item were performed considering the phenotype as 0 if the score was 0 or 1 otherwise and adjusting the effects of sex, age, BMI, and top four PC scores in each cohort. Meta-analyses for each item were performed using METAL. Associations between gene expression and the phenotype of each item were estimated with the same model used in the GWAS, except for replacing the number of minor alleles of a variant with the expression level of a gene in the two cohorts. Further, their Z statistics were summed by weighting their corresponding inverses of standard errors, and the P-values were calculated using the weighted Z statistics. Statistical analyses were performed using R and Rex (v3.6.1) 33.
Estimation Of Phenotypic And Genotype Correlations Between Depressive Symptoms
Based on the scores (0–3 scale), pairwise phenotypic correlations were estimated between the 21 BDI items in the H-PEACE and GENIE cohorts, respectively. Moreover, the items in combining the two cohorts were grouped into four clusters using hcluster function implemented in the amap R package (v0.8-18) with designating method and link options as Pearson and average, respectively. The phenotypes in each cluster were divided into two groups based on a cutoff in proportion to the number of items in the cluster. For example, the cutoff was set to 6.67, which was 10 times 14 divided by 21, in cluster 4 which included 14 items. GWASs were performed in each cluster of the two cohorts, respectively, with adjusting the effects of the same covariates used as above. Then, the results were combined by METAL.
In each cohort, genetic correlation between items were estimated by performing a bivariate GREML analysis implemented in GCTA with adjusting the effects of sex, age, BMI, and top four PC scores, respectively. Prevalence of each item was estimated as ratio of non-zero score of the item, and specified in the GREML analysis. Genetic relationship matrix was calculated by pruned variants with having larger than 0.1 MAF.
Ethics Statement
This study was approved by the institutional review boards of Seoul National University Hospital (H-1708-120-880) and Kangbuk Samsung Hospital (2019-08-006). All participants were provided written informed consent and the study was performed according to relevant guidelines and regulations.