Increased depression genetic liability associates with inammatory markers in multiple electronic health record systems

Although depression is a common disorder, its underlying biological basis remains poorly understood. The results of clinical lab tests available for research in electronic health records can be used to identify biomarkers that may reveal biological processes involved in the development of depression or may be markers of physiological changes due to depression. Here, we leveraged clinical laboratory tests and integrated biobank data to evaluate the relationship between genetic risk for depression and 315 routinely collected quantitative lab measures. Analyses across four health care systems (N = 382,452) robustly implicate increased white blood cell count as both a risk factor and consequence of depression diagnosis and revealed neutrophils, lymphocytes, and monocytes as the primary cell types responsible for this association. Our results highlight the importance of the immune system in the etiology of depression and motivate future development of clinical biomarkers and targeted treatment options for depression and its systemic effects.


Introduction
Depression is a common psychiatric disorder estimated to affect 264 million individuals worldwide 1 .
Diagnostic criteria for depression include clinical evaluation of self-reported psychiatric symptoms, such as depressed mood, irritability, anhedonia, or suicidal thoughts. In addition to psychiatric effects, depression is associated with increased risk for cardiovascular disease [2][3][4] , autoimmune disease 5 , and diabetes [6][7][8][9] . The increased risk of peripheral diseases suggests the biology of depression is not limited to the brain; however, the causes and biologic effects of depression in the brain and the periphery remain poorly understood.
In a health care setting, laboratory ("lab") tests, are used to aid clinicians in diagnostic and treatment decision making, and tests that can accurately and reproducibly indicate a medical state are generally referred to as biomarkers 10 . To date there are no biomarkers for depression, however, consistent with the high number of common comorbidities, clinical depression is associated with a wide range of clinical labs, including increased pro-in ammatory cytokines [11][12][13][14] , altered lipids [15][16][17] , altered growth factors [18][19][20] , and decreased brain-derived neurotrophic factor [21][22][23][24] . For many of these physiological quantitative labs, the underlying biologic mechanisms are well understood. Thus further understanding the biological link between clinical depression and these labs can help identify the biological processes contributing to depression and could lead to the development of more informative biomarker panels to be used in risk assessment and treatment response.
Previous studies report a bidirectional relationship between depression and autoimmune disease 25 .
Additionally, several immune biomarkers such as monocytes [26][27][28][29][30][31] , neutrophil-lymphocyte ratio 14,32,33 , and C-reactive protein 14,31,34 are increased in patients with depression compared to controls. However, most immune biomarker studies of depression have been limited in sample size and scope. Additionally, these studies are often unable to control for potential confounders or determine the direction of association between depression and biomarkers.
Electronic health records (EHRs) store longitudinal information about the health and clinical care of individual patients, including lab results, medication records, diagnoses, and clinician notes. Biobanks that link EHRs to DNA provide an opportunity to analyze this clinical information along with genetic risk factors. Genetic risk for depression can be estimated using polygenic scores (PGS) which aggregate the small effects of thousands of loci across the genome into one score for each individual 35 . Clinical lab data from EHRs has been historically di cult to use for research due to methodological issues, such as data entry errors, implausible recorded values, and changes in reference ranges and units over time [36][37][38] .
We recently developed and validated a scalable pipeline for quality control and harmonization of EHRderived lab data (QualityLab), and analysis of cleaned labs that can enable large-scale genetic evaluation of laboratory values as biomarkers of disease (LabWAS) 39 . In this work, we demonstrate that combining depression PGS with lab results stored in EHRs can robustly identify physiological processes that are affected by increased genetic liability to depression.
While independent biobanks can be used to discover associations, combining multiple health record systems through consortia can validate those discoveries in broader populations. The PsycheMERGE Network consists of investigators from institutions across the United States with the common goal of using EHRs and biobanks to advance the identi cation, biology, and treatment of psychiatric disorders 40 .
Here, we investigate the effect of polygenic risk for depression on clinically measured lab values leveraging data from four healthcare systems participating in the PsycheMERGE Network
Next, we explored the robustness and generalizability of our ndings across ancestry by repeating the depression PGS-LabWAS in individuals of African ancestry in VUMC (N = 12,384). As expected given the reduction in sample size and the poor transferability of PGS to individuals of African ancestry, no lab values were signi cantly associated with the depression PGS in the African ancestry group. However, the association between depression PGC and WBC trended in the same direction as observed in the European population (p-value = 0.058, beta = 0.02), (Supplementary Table 2).

Conditional Analyses of WBC
Depression diagnosis is associated with increased risk for autoimmune disease 5 and is often comorbid with other phenotypes linked to elevated WBC, such as cardiovascular disease 2-4 , smoking 41,42 , and diabetes [6][7][8][9] . Therefore, we performed a series of conditional analyses to rule out the possibility that a common phenotype associated with both depression PGS and WBC was confounding the observed association. To identify phenotypes associated with both depression PGS and median WBC in VUMC, we performed separate phenome-wide association scans (PheWAS) 43 of depression PGS and WBC measurements. In a PheWAS, ICD codes are hierarchically grouped together into 'phecodes' based on phenotypic similarity. Depression PGS and median WBC were signi cantly associated with 66 and 469 phecodes, respectively. Of these signi cantly associated phecodes, 32 were common to both depression PGS and median WBC. Because many of the associated phecodes represented clinically similar conditions, they were binned into seven phe-groups: cardiovascular, psychiatric, obesity, respiratory, hepatic, pain, and autoimmune. (Fig. 2a, Supplementary Table 3).
To determine if comorbidity within a phe-group could account for the relationship between depression PGS and WBC, we controlled for each possible confounding phe-group separately and all phe-groups together. The association between depression PGS and WBC remained signi cant after controlling for each group separately and controlling for all phenotype groups together (p-value = 4.19 × 10 − 3 , beta = 0.02) with effect estimates similar to the original association (Fig. 2b, Supplementary Table 4).

Replication in the PsycheMERGE Network
Given the robustness of the association with WBC and the history of associations between depression status and pro-in ammatory markers, we focused on WBC for replication and further investigation.
Replication analysis included tting the same linear regression model to test the association between depression PGS and median WBC after adjusting for sex, age at median WBC, and genetic ancestry.  Table 5). In both MVP (N = 289,880) and MGBB (N = 20,828), the association between depression PGS and WBC remained signi cant with effect estimates similar to those observed at VUMC, even after controlling for depression diagnosis and after controlling for depression and anxiety diagnoses (Fig. 3). In MSSM, the effect size point estimates were similar to those observed in the three other sites, though they did not reach statistical signi cance, likely due to the smaller sample size (n = 823). The meta-analyzed effect estimate from the four sites was robustly and highly signi cant (p-value = 2.52 × 10 − 137 , beta = 0.04), even after controlling for depression diagnosis (p-value = 3.93 × 10 − 102 , beta = 0.04), and after controlling for depression and anxiety diagnoses (p-value = 2.48 × 10 − 100 , beta = 0.04) (Fig. 3, Supplementary Table 6).

Mediation Analysis
Mediation analyses can help explain and quantify the mechanisms by which an exposure affects an outcome through a mediating variable 45 . Two potential pathways between depression PGS, WBC, and depression diagnosis were assessed using mediation analyses. In the rst analysis, median WBC was modeled as a mediator of the relationship between depression PGS (exposure) and depression diagnosis (outcome). Meta-analysis across all sites revealed that WBC mediated 2.0% of the association between depression PGS and depression diagnosis (p-value = 5.05 × 10 − 43 ) ( Table 1, Supplementary Table 7). In the second analysis, depression diagnosis was modeled as a mediator of the association between the depression PGS (exposure) and median WBC (outcome). Again, meta-analysis across all sites indicates that depression diagnosis mediated 1.3% of the association between depression PGS and WBC (p-value = 2.78 × 10 − 10 ) ( Table 1, Supplementary Table 8).
Depression PGS and WBC-differential Mediation Analysis WBC counts are calculated from the sum of ve different cell subtypes: neutrophils, lymphocytes, monocytes, basophils, and eosinophils. These cell subtypes can be measured along with the total WBC using a complete blood count differential (CBC-diff) lab. To determine whether speci c WBC components accounted for the relationships between depression PGS and total WBC, and between depression PGS and depression diagnosis, we performed a series of multiple mediator analyses. In a multiple mediator analysis, a single main mediator and additional alternative mediators are speci ed. A structural equation modeling approach is used to assess the effect of the main mediator between the exposure and outcome after controlling for the correlation structure between the alternative mediators and the outcome 46 . Each cell subtype was analyzed as the main mediator with the remaining cell subtypes as the alternative mediators. Only differential lab values recorded on the same date (N = 24,383) were used to ensure accurate modeling of the CBC-differential.

Discussion
Depression is consistently associated with increased pro-in ammatory biomarkers, however, the mechanisms underlying these associations remains unclear. In this study, analysis of EHR-linked biobanks within the PsycheMERGE Network were utilized to examine the effect of depression polygenic scores on a variety of clinical lab traits, revealing a robustly replicated association with increased white blood cell count. Notably, several other lab traits associated with depression polygenic scores, including lipids, blood glucose, and blood urea nitrogen. In this study, we chose to further investigate the relationship with white blood cell count given the robustness of the association to clinical confounders and the existing literature.
In a lab-wide screen, increased polygenic depression risk was associated with increased in ammatory markers including white blood cell count, which was signi cantly associated with depression PGS even after controlling for depression, anxiety, and multiple comorbid phenotypes, thus highlighting depression polygenic score as an important risk factor for the pro-in ammatory state observed in depression. These results indicate that genetic risk for depression, independent of depression diagnosis or state, are linked to a pro-in ammatory biomarker. The effect of the depression polygenic score on WBC was modest across all biobanks, suggesting that individuals with high depression genetic liability may have an activated, but not abnormal immune system. Nonetheless, sustained activation of the immune system could have important implications for the risk of developing depression.
There are two main models that connect depression to a pro-in ammatory state, the neuroin ammation model and the stress response model. The neuroin ammation model hypothesizes that an activated immune system contributes to risk of developing depression 47,48 . Alternatively, the stress response model proposes that the stress of depression symptoms leads to a pro-in ammatory state 49,50 . Importantly, these two models are not mutually exclusive and some have suggested that they form a feedback loop 51,52 . In support of this hypothesis, the mediation results from this study do not distinguish either the neuroin ammation model or the stress response model as the exclusive pathway between depression and WBC. Rather, the results raise the hypothesis of a "feedback loop" whereby increased depression genetic liability leads to increased WBC, further increasing the likelihood of a depressive episode and subsequent diagnosis. Simultaneously, increased depression genetic liability increases the risk of a depression diagnosis through other mechanisms which in turn, increases WBC, completing the proposed loop.
In the clinic, WBC measurements can be broken down into measurements of each WBC subtype: neutrophils, lymphocytes, monocytes, basophils, and eosinophils. Abnormal levels of different WBC subtypes can index different immune processes. Understanding which cell types underlie the relationship between depression polygenic score and WBC can help narrow a speci c immune process that may be altered by depression. In analyses at VUMC, neutrophil counts, lymphocyte counts, and monocyte counts explained the entirety of the association between depression genetics and WBC, while neutrophil counts explained 1.9% of the association between depression polygenic score and depression diagnosis. However, the analysis of WBC subtypes from EHRs comes with limitations: individuals with clinical orders for WBC differential panels may represent a different subsample than those with only the total WBC measurement. Additionally, WBC subtypes contribute unequally to the total WBC measurement. The subtypes that explain the highest percentage of the relationship between depression genetics and WBC mirror the percent contribution to the whole WBC measurement, indicating the analysis might be in uenced by cell type proportions even though the differential analyses used cell type count, not proportion.
Overall, using polygenic risk of depression, we prioritized elevated WBC counts as a potential biomarker of depression risk worthy of further investigation. However, our results should be interpreted in light of several analytic limitations. First, the WBC measurements used in the study were clinically derived, with measurements re ecting a range of health states. Because the relationship between depression PGS and WBC remained constant even after controlling for several comorbid phenotypes, it is less likely that the association is due to ascertainment bias for a speci c phenotype for which WBC is commonly measured. Additionally, EHRs often contain multiple WBC measurements for the same individual. In this study, only the median values per individual are utilized, leaving unanswered questions about the effect of depression PGS on WBC over time and in response to antidepressant treatment. Finally, even though the relationship between depression PGS and WBC was statistically robust, the effect sizes are small, making WBC an unlikely candidate for use as a diagnostic biomarker of depression. Instead, the associations described in this study highlight the importance of WBC biology in depression and demonstrate the use of EHR-based genomics as a tool for discovery of physiological markers in psychiatric traits.

PsycheMERGE Study Samples
Vanderbilt At VUMC, lab results were extracted from the EHRs of 70,704 individuals and cleaned as previously described 39 . Brie y, labs were required to have at least 70% of observations in a single set of units and ltered for at least 1,000 observations over at least 100 individuals. Observations outside 4 standard deviations of the sample mean were excluded to remove extreme values including those that are biologically implausible. The median observation for each individual in each lab was selected and adjusted for cubic splines of age at measurement. The age-adjusted value was normalized using a rankbased inverse normal transformation (INT) to ensure a normal distribution for downstream analyses, generating age-adjusted INT lab values. For genetic analysis, labs exhibiting no measurable heritability through the GREML analysis in the GCTA software 59 were excluded, leaving 315 labs for use.

LabWAS of Depression PGS in VUMC
Associations between the depression PGS and labs were estimated with a lab-wide association scan (LabWAS) approach 39 in the VUMC sample. LabWAS uses the median, age-adjusted, rank-based inverse normal transformed (INT) lab values from the QualityLab pipeline in a linear regression to determine the association with an input variable after adjusting for covariates. The LabWAS of depression PGS was controlled for sex and top 10 genetic principal components. Primary analyses were restricted to individuals of European descent. We repeated the depression PGS-LabWAS in individuals of African ancestry in BioVU only (n = 12,384) ( Supplementary Fig. 3, Supplementary Table 2). Effect estimates represent per standard deviation increases in depression PGS. In this analysis, the scalability of labWAS across hundreds of labs took precedence over the interpretability of the effect estimate which cannot be converted to the lab's original units.
In conditional analyses, the LabWAS of depression PGS was covaried for median BMI across the EHR, and for depression, anxiety, and adjustment reaction diagnoses, de ned by phecodes 296.2, 300.1 and 304, respectively. For all analyses involving phecodes, cases were required to have at least two instances of component ICD codes and controls were required to have zero component ICD codes and zero phecode exclusion codes de ned by the phecode map in the PheWAS R package v0.99 60 . Individuals with only one component ICD code were excluded.

Sensitivity Analyses with WBC in VUMC
A series of steps were taken to ensure the association between depression PGS and WBC was not due to a common comorbid confounder phenotype present in individuals with both increased depression PGS and WBC. To nd phenotypes associated with both depression PGS and WBC, separate phenome-wide association scans (PheWAS) were conducted of depression PGS and of the median, age-adjusted, INT normalized WBC measurement. Phecode cases were required to have at least two instances of component ICD codes and controls were required to have zero component ICD codes and zero phecode exclusion codes de ned by the phecode map in the PheWAS R package v0.99 60 . Individuals with only one component ICD code were excluded. Phecodes were required to have at least 100 cases to be included in the scans. Next, phenotypes that were signi cantly associated with both depression PGS and WBC at Bonferroni signi cance (p WBC < 3.64 × 10 − 5 , p depression PGS < 3.72 × 10 − 5 ) were selected. The common phenotypes were binned into seven categories based on similarity of phecodes: cardiovascular, psychiatric, obesity, respiratory, hepatic, pain, and autoimmune. Group-based case-control variables were constructed, in which an individual was considered a case if they were a case for any of the group's component codes. Controls were required to be a control for all of the component codes. To assess the effect of the comorbid phenotypes on the association between depression PGS and WBC, a series of linear regression analyses were conducted controlling for each of the groups separately. Finally, a linear regression was conducted between depression PGS and WBC controlling for all common phenotype groups. All analyses were controlled for sex and top 10 genetic principal components. Depression PGS and WBC in the PsycheMERGE Network Targeted replication analyses between depression PGS and WBC were conducted in three external biobanks. Depression PGS were constructed as described above. In each healthcare system, the median WBC measurement per individual was extracted, age-adjusted for cubic splines of age at measurement, and normalized using a rank-based inverse normal transformation (INT). The depression PGS and WBC were then tted in a linear regression model controlling for sex and top 10 genetic principal components.
The associations controlling for depression and anxiety diagnoses were also replicated using the same phenotype de nition as described in the discovery LabWAS at VUMC. We then meta-analyzed the effect estimates from each analysis across all four sites using a xed-effect inverse variance weighted model in the meta 61 R package.

Controlling for the impact of WBC genetics
We rst tested the correlation between WBC PGS and depression diagnosis. The WBC PGS were constructed using PRS-CS-auto with the 1000 Genomes Project Phase 3 European subset reference panel and weights from the UK Biobank WBC summary statistics 62 . WBC PGS were z-score scaled so that the effect estimate is per standard deviation increase in PGS. We rst con rmed that the WBC PGS was signi cantly associated with measured WBC (p < 2.23 × 10 − 308 ; beta = 0.14). Then we tested the association between WBC PGS and depression diagnosis (de ned as phecode 296.2) across all biobanks, controlling for sex, median age across the medical record, and top 10 genetic principal components using a linear regression model.
In a separate series of analyses, the in uence of WBC genetic factors within the depression PGS was examined. First, the effect of genetic regulation of WBC was adjusted from the depression summary statistics by conditioning the depression summary statistics on the WBC summary statistics using multitrait-based conditional & joint analysis 63 (mtCOJO) in GCTA version1.91.4. Next, conditioned depression PGS were constructed using the conditioned depression summary statistics. Finally, we tested the association between the conditioned depression PGS and the median age-adjusted INT normalized WBC measurements, controlling for sex and top 10 principal components of ancestry. The effect estimates of each analysis from all four sites were meta-analyzed using a xed-effect inverse variance weighted model in the meta 61 R package. Results are presented in the Supplementary Results, Supplementary   Fig. 4, and Supplementary Table 9. Depression PGS and WBC Mediation Analysis Two mediation models between depression PGS, WBC, and depression diagnosis were investigated. First, WBC was modeled as the mediator between depression PGS (exposure) and depression diagnosis (outcome). Second, depression diagnosis was modeled as the mediator between depression PGS (exposure) and WBC (outcome). For each analysis, proportion mediated estimates were calculated by dividing individuals in the control and treatment groups based on percentile of depression PGS. The control group consisted of individuals in the 50th percentile of depression PGS. Three different treatment groups were tested: 85th percentile, 90th percentile and 95th percentile. In the main text, results from the 90th percentile treatment group are reported. All results are reported in Supplementary Tables 8 and 9.. All analyses were controlled for sex and top 10 genetic principal components. The mediation analyses were conducted using 10,000 bootstrapping simulations until a stable p-value was achieved using the mediation 46 R package. The proportion mediated estimates from all four sites were meta-analyzed using a xed-effect inverse variance weighted model in the meta 61 R package.

Depression PGS and WBC-differential Mediation Analysis
To determine which WBC cell types contributed to the association between depression PGS and WBC measurement or depression diagnosis, a series of multiple mediator analyses were conducted using the mediation 46 R package. All measurements were required to be recorded on the same date to ensure they were from the same WBC-differential. For individuals with multiple WBC-differentials recorded in their EHR, median WBC values and the corresponding subtype absolute values were selected. All measurements were adjusted for cubic splines of age at observation and normalized 64,65 .
First, each WBC subtype (basophils, eosinophils, lymphocytes, monocytes, and neutrophils) was analyzed as the main mediator between depression PGS (exposure) and WBC measurement (outcome) with the remaining subtypes as the alternative mediators. Next, each WBC subtype was analyzed as the main mediator between depression PGS (exposure) and depression diagnosis (outcome) with the remaining subtypes as the alternative mediators.