DeepGWAS: Enhance GWAS Signals for Neuropsychiatric Disorders via Deep Neural Network

Genetic dissection of neuropsychiatric disorders can potentially reveal novel therapeutic targets. While genome-wide association studies (GWAS) have tremendously advanced our understanding, we approach a sample size bottleneck (i.e., the number of cases needed to identify >90% of all loci is impractical). Therefore, computationally enhancing GWAS on existing samples may be particularly valuable. Here, we describe DeepGWAS, a deep neural network-based method to enhance GWAS by integrating GWAS results with linkage disequilibrium and brain-related functional annotations. DeepGWAS enhanced schizophrenia (SCZ) loci by ~3X when applied to the largest European GWAS, and 21.3% enhanced loci were validated by the latest multi-ancestry GWAS. Importantly, DeepGWAS models can be transferred to other neuropsychiatric disorders. Transferring SCZ-trained models to Alzheimer’s disease and major depressive disorder, we observed 1.3-17.6X detected loci compared to standard GWAS, among which 27-40% were validated by other GWAS studies. We anticipate DeepGWAS to be a powerful tool in GWAS studies.


Introduction
Neuropsychiatric disorders carry a high public health burden, including tremendous morbidity, mortality, lessened quality of life, and financial costs 1,2. For example, schizophrenia (SCZ) is a highly heritable and debilitating psychiatric disorder affecting about 0.28% of the global population and is associated with high morbidity, mortality, as well as personal and public health costs 3. During the past 15 years, GWAS have greatly advanced our understanding of the genetic basis underlying these disorders 4,5. For example, SCZ findings grew from 1 locus reaching genome-wide significance in a GWAS with 3,322 cases in 2009 6 to 287 loci in the most recent meta-analysis 5 of ~75K cases. This tremendous advancement is largely attributable to increased sample size, which is of undisputed value in GWAS for many complex diseases 7. However, increasing sample size by another order of magnitude becomes increasingly challenging, particularly for SCZ and other neuropsychiatric disorders. Therefore, enhancing GWAS on existing samples via computational approaches would be particularly valuable for the genetic dissection of neuropsychiatric disorders.
Standard GWAS, which associates genotypes with phenotypes, usually assumes that all variants are a priori equally likely to be associated 7. This assumption was initially appropriate, as priors were either unavailable or debated. However, we have now accumulated rich genomic and epigenomic evidence, and continuing with this assumption may represent a tremendous missed opportunity to integrate standard GWAS results with functional annotations and thereby up-weight variants that are more likely to play functional roles. For example, GWAS variants have been reported to be enriched in regulatory regions 8,9 and to explain a larger than expected amount of disease and trait heritability 10,11. Therefore, leveraging functional annotations could enhance statistical power to identify causal variants.
Researchers have employed similar integration ideas for related purposes, including phenotypic prediction, gene-gene interaction detection and post-GWAS prioritization of genetic variants and their target genes [12][13][14][15][16] but not for GWAS per se.
Here, we apply machine learning to integrate summary statistics from standard GWAS with functional annotation information for enhancing GWAS findings. Specifically, we develop DeepGWAS, a 14-layer deep neural network to enhance GWAS signals without increasing sample size. The input predictors include GWAS summary statistics, linkage disequilibrium (LD) information, and brain-related functional annotations. We first trained our DeepGWAS model with SCZ GWAS summary statistics, finding that our DeepGWAS model outperformed other state-of-the-art machine learning and traditional statistical methods, including XGBoost 17 and logistic regression. Encouraged by these results, we further transferred our DeepGWAS model trained on SCZ to enhance GWAS for two other neuropsychiatric diseases.

Results
Overview of the DeepGWAS model
DeepGWAS infers the probability of a variant being associated with the phenotype of interest by modeling a vector of 33 input features in a 14-layer fully connected deep neural network (Fig. 1 and Online Methods). In DeepGWAS models, genetic variants are observations, and for each observation, the input features include GWAS summary statistics, basic population genetics statistics, and brain-related functional annotations (Online Methods). Training a DeepGWAS model requires label information (i.e., the binary label indicating whether a variant is associated with the phenotype of interest). In reality, gold-standard true labels typically do not exist. Therefore, we recommend training a DeepGWAS model using results from two GWA studies, where the smaller and less powerful GWA study provides the input features while the more powerful one provides label information. The trained DeepGWAS models can be applied to enhance the more powerful GWAS (which then only provides input features), enhance another GWAS for the same phenotype, or enhance GWAS for another brain-related disease (for examples, see "DeepGWAS enhancement reveals hundreds of novel SCZ loci", "Enhancing AD GWAS" and "Enhancing MDD GWAS" sections below).

Systematic evaluation of DeepGWAS using SCZ GWAS data
We first compared the ability of DeepGWAS to enhance GWAS signals with two alternative methods, logistic regression and XGBoost 17. The former is a classic statistical method, and the latter is a widely used machine learning method. Results show that DeepGWAS achieves the best performance at both the variant and locus level (Fig. 2). Using GWAS summary statistics from 64 SCZ GWAS cohorts 5, we were able to design careful experiments (Table S1 and Table S2) for systematic evaluation. Because neural networks are prone to a trivial solution when trained on highly imbalanced data such as GWAS summary statistics, DeepGWAS adopts an under-sampling strategy for non-significant variants when selecting a subset of variants for training (detailed in "Under-sampling insignificant variants for model training" in Supplementary Materials). For logistic regression and XGBoost, we consider models trained on the full sample of variants as well as models trained on the same subset of variants used as input to the DeepGWAS model (indicated by "_subset"). Each of the five models takes the same 33 features as input. With the default prediction probability threshold value of 0.5, DeepGWAS achieved first place (at the variant level) and second place (at the locus level) for capturing true positives, and had the overall best and second-best F1 score, which balances precision and sensitivity, at the variant level and locus level, respectively (Fig. 2).
For example, at the locus level, the F1 score of XGBoost (0.07) was less than half that of DeepGWAS (0.16). Although logistic regression applied to all variants had the highest F1 score, DeepGWAS approximately doubled the power (TPR: 0.28 vs 0.55). Thus DeepGWAS provided the best balance between power and overall performance (Fig. 2). At the variant level, DeepGWAS (red curve) outperforms all other models and is the clear winner in terms of power (TPR) with a range of 40-60%, the only range where methods have reasonable power and acceptable false positive control (Fig. 2).
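The under-sampling step mentioned above can be illustrated with a minimal sketch. The authors' actual scheme is described in their Supplementary Materials and may differ; the version below is plain random under-sampling, and the `label_p` key, the `ratio` parameter, and the function name are all hypothetical choices made for illustration.

```python
import random


def undersample(variants, p_threshold=5e-8, ratio=1.0, seed=0):
    """Keep every significant variant plus a random subset of the others.

    variants: list of dicts with a "label_p" key (hypothetical name) holding
              the p-value from the label-providing GWAS.
    ratio:    assumed number of non-significant variants kept per
              significant variant (not a value taken from the paper).
    """
    rng = random.Random(seed)
    positives = [v for v in variants if v["label_p"] < p_threshold]
    negatives = [v for v in variants if v["label_p"] >= p_threshold]
    # Never request more negatives than are available.
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    subset = positives + rng.sample(negatives, n_keep)
    rng.shuffle(subset)  # avoid an ordered block of positives
    return subset
```

With 10 significant variants among 1,000 and `ratio=2.0`, this returns 30 variants, which keeps the classifier from trivially predicting "not associated" for everything.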

DeepGWAS enhancement reveals hundreds of novel SCZ loci
After systematically comparing DeepGWAS with alternative methods using 64 SCZ GWAS cohorts, we trained a DeepGWAS model using data from two recent European SCZ GWA studies, applied it to the latest European SCZ GWAS, and investigated the enhancement results. Specifically, we trained a DeepGWAS model using GWAS summary statistics from Ripke et al. 2014, the 2nd largest European ancestry SCZ GWAS meta-analysis 4, as input features, and using genome-wide significance (p-value < 5e-8) from

DeepGWAS's enhancement via transfer learning suggests novel loci for AD and MDD
We have shown above that the DeepGWAS model trained using SCZ data demonstrated satisfactory performance in enhancing SCZ GWAS. Importantly, we found that DeepGWAS could transfer the knowledge learned from SCZ data to enhance GWAS for additional neuropsychiatric disorders including Alzheimer's disease (AD) and major depressive disorder (MDD) ("Transfer learning using DeepGWAS", Online Methods). Specifically, we fixed the model parameters for DeepGWAS learned from the two recent European SCZ GWAS datasets described above, and applied the pre-trained DeepGWAS model to AD and MDD GWAS results.
Fig. 4 and Fig. S1a-c, 30%-40% enhanced loci can be validated by other AD GWA studies, while few loci that were significant in the original input GWAS were missed by DeepGWAS. Taking the enhanced results based on Jansen et al. 2019 19 as an example, we observe that DeepGWAS identified the APP locus, which was not identified as a significant locus in the original Jansen et al. study but was detected as a GWAS locus by several larger, recently published AD studies 21,23,24 (Fig. 5). APP is a well-established AD gene, and previous studies have reported that mutations in APP can lead to β-amyloid protein accumulation and early-onset AD 25,26.
The APP gene serves as a positive control validated by earlier rare variant studies and more recent larger GWAS. In addition to APP, DeepGWAS identified other genes not, or not yet, validated by independent AD GWAS. Multiple of these genes have also been reported to be relevant to AD. For example, the locus marked by EIF4G3, the closest gene to a DeepGWAS index variant (i.e., the variant with the highest DeepGWAS predicted association probability at the locus) when applied to Jansen et al. 2019 19

Discussion
We proposed here a novel deep neural network to enhance GWAS signals without increasing sample size for neuropsychiatric disorders. Systematic evaluation using SCZ GWAS data and real GWAS enhancement showed that DeepGWAS achieved the best performance compared with two alternative methods, XGBoost and logistic regression.
Although DeepGWAS and fine-mapping methods are similar in terms of prioritizing variants, they differ in at least two aspects. First, DeepGWAS allows more complex relationships between loci and disease phenotype, including non-linear relationships, by employing a deep learning model. Second, DeepGWAS naturally accommodates both qualitative and quantitative annotations, while most fine-mapping methods only allow qualitative annotations. Among the 33 features of interest, the initial GWAS p-value is the most important feature, followed by super FIRE in adult annotations and LD score for known GWAS variants and eQTLs (Fig. S3). Specifically, approximately 69%-87% of enhanced variants have initial p-values < 1e-5 in the input GWAS. In addition, DeepGWAS enhanced variants are more likely to reside in super FIRE regions, exhibit higher LD scores for known GWAS variants, and are more likely to be eQTLs (details in "Feature importance" in Supplementary Materials and Fig. S4).
By default, we used a DeepGWAS prediction probability of 0.5 as the threshold, for screening purposes where our goal is to maximize power while tolerating false positives. Investigators may desire more stringent thresholds to shortlist variants or to reduce false positives. We investigated other threshold values from 0.5 to 0.95 with an increment of 0.05. Based on our calibration on SCZ, AD and MDD studies, we found that thresholds in the range 0.5-0.75 work well, and the value can be adapted within this range to the user's preferences (Fig. S5a-e). With higher or more stringent thresholds, DeepGWAS would detect fewer variants. For example, the number of DeepGWAS enhanced loci decreased from 413 to 39 for the testing SCZ GWAS dataset from Pardiñas et al. 2018 (Fig. 3 and Fig. S6) when the threshold was increased from the default 0.5 to 0.9. Accordingly, the number of enhanced loci that can be validated by the independent Trubetskoy et al. 2022 study decreased from 88 to 17. Similar trends were observed for MDD (Fig. 6 and Fig. S7). Users therefore should choose thresholds that suit their purposes.
In this work, we trained our DeepGWAS model using two GWA studies on SCZ. Future efforts are warranted to train a "meta" DeepGWAS model with GWAS data from multiple genetically correlated diseases, as different neuropsychiatric disorders like SCZ, MDD, and bipolar disorder are known to share some common genetic determinants, as do certain neurological diseases (e.g., APOE in Alzheimer's and Parkinson's disease 32). The immediate advantage of combining GWAS across diseases is to increase sample size for training, which in principle often improves neural network performance. Importantly, DeepGWAS had the ability to transfer knowledge from one disease (SCZ) to other diseases (AD and MDD). The DeepGWAS model can potentially transfer knowledge from one neuropsychiatric disorder to other neuropsychiatric disorders such as bipolar disorder, autism, and Parkinson's disease. It is also worthwhile to assess whether the DeepGWAS model can transfer knowledge to non-neuropsychiatric diseases or traits, because increasing sample size is generally expensive for GWAS of almost any trait. For disorders or traits not directly brain-related, annotation matching by tissue and cell type would be a non-trivial task that warrants separate future studies. Nevertheless, we believe our DeepGWAS model is a generalizable and valuable approach to enhance GWAS with additional knowledge that may be relevant to the diseases or traits under study. Careful training with SCZ data and applications to SCZ, AD and MDD GWAS presented in this work have demonstrated DeepGWAS's enhanced power, as well as its potential to remove false positives in the original study, by integrating GWAS results with relevant annotations in a deep learning framework.
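The threshold scan described above (0.5 to 0.95 in steps of 0.05) can be sketched as follows. This is only an illustration: the function name is hypothetical, the input is assumed to be a plain list of predicted probabilities, and treating a probability equal to the threshold as a call (`>=`) is an assumption, since the text does not specify the boundary rule.

```python
def threshold_sweep(probabilities, start=0.5, stop=0.95, step=0.05):
    """Count how many predictions pass each candidate threshold.

    Returns a list of (threshold, number_called) pairs. The ">=" boundary
    rule is an assumption made for this sketch.
    """
    n_steps = int(round((stop - start) / step)) + 1
    results = []
    for i in range(n_steps):
        # Round to two decimals to avoid floating-point drift in the grid.
        threshold = round(start + i * step, 2)
        n_called = sum(1 for p in probabilities if p >= threshold)
        results.append((threshold, n_called))
    return results
```

Because every threshold is applied to the same predictions, the counts can only stay equal or decrease as the threshold rises, which matches the trend reported for the SCZ and MDD analyses.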

DeepGWAS model
DeepGWAS is a fully connected deep neural network model that aims to enhance GWAS results (Fig. 1) by discovering additional candidate loci relevant for complex diseases or traits. The DeepGWAS model takes 33-dimensional vectors as predictors (input), including GWAS summary statistics such as p-values and odds ratios, as well as population genetics metrics such as MAF and two different LD scores (specifically, the regular LD score [summing across all variants] and the LD score with significant variants in the input GWAS). The population genetics metrics can be calculated from a reference panel if no individual-level data are available. MAF and LD score calculation requires a reference panel of matching ancestry; a mismatched panel may affect the performance of the DeepGWAS model. For the 28 annotation-related features, including brain open chromatin regions and eQTLs, users can use the released annotations or supplement them with additional annotations assembled into the same 28 categories before applying the DeepGWAS model. The output of DeepGWAS is each variant's predicted probability of being associated with the trait/disease of interest. We denote the DeepGWAS model as F, the input SNP feature matrix as X, the input binary label as Y, and the predicted probability as F(X) (Fig. 1). Binary cross entropy loss is adopted in the training process. The goal of training is to learn F that minimizes the binary cross entropy (details in "DeepGWAS model" in Supplementary Materials).
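To make the notation concrete, here is a minimal sketch of a fully connected network F mapping a 33-feature vector to a probability, together with the binary cross entropy used as the training objective. It is not the authors' implementation: the hidden width of 64, the ReLU activations, the He-style initialization, and reading "14-layer" as 14 weight layers are all assumptions, and no training loop is shown.

```python
import math
import random


def init_layers(sizes, seed=0):
    """Create (weights, biases) for each layer; widths are assumed values."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style initialization (assumption)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(layers, x):
    """F(X) for one variant: ReLU hidden layers, sigmoid output."""
    h = x
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if k < len(layers) - 1:
            h = [max(0.0, value) for value in z]
        else:
            h = [sigmoid(z[0])]
    return h[0]


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy between labels Y and predictions F(X)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

For example, `init_layers([33] + [64] * 13 + [1])` builds 14 weight layers, and `forward` applied to one 33-element feature vector returns a value in [0, 1] that plays the role of F(X).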
We aimed to release the pre-trained DeepGWAS model using the latest European SCZ GWAS summary statistics. To achieve the aim, we trained DeepGWAS model using Ripke

GWAS summary statistics
DeepGWAS performs analysis on GWAS summary statistics. In this work, we used the following summary statistics for SCZ, AD, and MDD GWAS.

Schizophrenia (SCZ) GWAS data
We assembled SCZ GWAS summary statistics from a total of 64 European cohorts, all contributing to the latest SCZ GWAS meta-analysis 5 . The sample sizes in each cohort range from 389 to 12,310, with 204 to 5,370 SCZ cases, and the number of pre-imputation variants released varies from 225,788 to 813,688. Detailed information of 64 cohorts is listed in Table S1. The released DeepGWAS model was trained using the two largest European SCZ GWAS summary statistics 4,18 (Table S3).

Alzheimer's disease (AD) GWAS data
Six most recent Alzheimer's disease (AD) GWAS datasets were used in our study [19][20][21][22][23][24] (Table S3).

Major depressive disorder (MDD) GWAS data
Two most recent major depressive disorder (MDD) GWAS datasets were used in our study 29,30 (Table S3).

European individuals from 1000 Genomes Project. Similarly, LD scores with known variants were calculated the same way but only summing over LD pairs involving significant variants in the input GWAS, again within 1Mb of each target variant.
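The two LD scores can be sketched as below, assuming pairwise r² values have already been computed from a reference panel. The data layout (a dictionary keyed by index pairs on one chromosome), the function name, and the exclusion of each variant's r² with itself are assumptions made for illustration; the text only states that r² is summed within 1Mb of each target variant, with the second score restricted to pairs involving significant variants.

```python
def ld_scores(positions, r2, significant, window=1_000_000):
    """Compute regular LD scores and LD scores with significant variants.

    positions:   base-pair positions of variants on one chromosome
    r2:          dict mapping index pairs (i, j), i < j, to pairwise r^2
    significant: set of indices significant in the input GWAS
    """
    n = len(positions)
    regular = [0.0] * n
    with_significant = [0.0] * n
    for (i, j), value in r2.items():
        if abs(positions[i] - positions[j]) > window:
            continue  # only pairs within the window contribute
        regular[i] += value
        regular[j] += value
        # The second score only counts partners that are significant.
        if j in significant:
            with_significant[i] += value
        if i in significant:
            with_significant[j] += value
    return regular, with_significant
```

A variant far from any significant signal then receives an LD score with significant variants of zero, while its regular LD score still reflects local LD.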

Functional annotations
Functional annotations were collected from rich resources. We briefly introduce each annotation feature below, and detailed information is summarized in Table S4.
Open Additional epigenomic annotations - We also collected 30 additional epigenomic annotations from 12.
Since multiple similar open chromatin and histone features were collected from 12, we adopted a data-driven strategy to merge similar annotations: we used the Jaccard similarity index (bedtools v2.29.0) to group them into 11 meta-annotations (details in "Data-Driven Clustering for epigenetics annotations" in Supplementary Materials), as shown in Fig. S8, and merged the annotations within each meta-annotation using bedtools (v2.29.0).
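For intuition, the Jaccard similarity between two annotations is the number of base pairs covered by both divided by the number covered by either. The sketch below computes it for half-open intervals on a single chromosome; it illustrates the index only and is not a substitute for the bedtools workflow used in the study, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Sort and merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_length(intervals):
    return sum(end - start for start, end in intervals)


def jaccard(a, b):
    """Base-pair Jaccard index: |A and B| / |A or B|."""
    a, b = merge_intervals(a), merge_intervals(b)
    intersection = 0
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            intersection += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = covered_length(a) + covered_length(b) - intersection
    return intersection / union if union else 0.0
```

Annotations whose pairwise index is high cover largely the same regions, which is the rationale for grouping them before merging.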
With all the functional annotations above, we have in total 30 functional annotations used as predictors in the DeepGWAS model.

Evaluation using SCZ GWAS results
With GWAS summary statistics from 64 SCZ studies, we first randomly split them into three sets: set A, set B and set C (Fig. S9). Each variant in set A, B and C was annotated for all features listed in the "Functional annotations" section above. When splitting, we made efforts to balance the three sets considering several aspects, including the number of cases, total sample size (i.e., the number of cases and controls), and the number of significant variants. To mimic increasingly larger GWAS, we assigned 10, 22 and 32 studies to set A, B and C respectively. After splitting, we meta-analyzed GWAS within each set using METAL 50, and obtained three sets of GWAS summary statistics (Fig. S9). With these three sets of GWAS summary statistics, we first trained models to "enhance set A to set B". In other words, set A contributed input features (X in Fig. 1) while set B contributed outcome labels (Y in Fig. 1). Specifically, the binary indicator of whether the meta-analysis p-value < 5e-8 in set B was used as Y to train models. We then applied this pre-trained model to set B, to obtain enhanced set B results. Finally, significance in set C served as ground truth to evaluate the enhanced set B results.
We repeated the splitting procedure to randomly generate a new, independent testing dataset following the same evaluation procedure, and applied the three pre-trained models (DeepGWAS, XGBoost, and logistic regression) to this independent testing dataset. We finally calculated the mean of the F1 score, CPR, TPR, ROC metrics, and precision-recall curve metrics. Detailed information, including the sample sizes and numbers of loci used in evaluation, is included in Table S2.
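The "enhance set A to set B" setup can be sketched as pairing features from the smaller meta-analysis with binary labels from the larger one. The dictionary layout keyed by variant identifier and the function name are hypothetical; only the p-value < 5e-8 labeling rule comes from the text.

```python
GENOME_WIDE_SIGNIFICANCE = 5e-8


def build_training_pairs(features_a, p_values_b,
                         threshold=GENOME_WIDE_SIGNIFICANCE):
    """Pair set A features (X) with set B significance labels (Y).

    features_a: dict mapping variant id -> feature vector from set A
    p_values_b: dict mapping variant id -> meta-analysis p-value in set B
    Only variants present in both sets are used.
    """
    shared = sorted(set(features_a) & set(p_values_b))
    X = [features_a[v] for v in shared]
    Y = [1 if p_values_b[v] < threshold else 0 for v in shared]
    return shared, X, Y
```

The same function can be reused at evaluation time, with enhanced set B predictions compared against labels derived from set C.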

Comparison with alternative methods
To evaluate the performance of the DeepGWAS model, we compared DeepGWAS with alternative methods including logistic regression and XGBoost.

Logistic regression model
We trained a logistic regression model implemented in R v3.6.0. The logistic regression model was formulated as below, where $\beta_0$ denotes the intercept, $\beta$ denotes the weights of the predictors, and $X$ denotes the predictors:

$$\log \frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)} = \beta_0 + \beta^{\top} X$$

The output of the logistic regression model was the predicted probability that a given variant is significantly associated with a disease.

XGBoost model
XGBoost, or eXtreme Gradient Boosting, is a commonly used decision-tree-based ensemble machine learning algorithm using a gradient boosting framework. Using the same training and testing datasets as applied to the DeepGWAS model, we trained and tested a supervised XGBoost model in R v3.6.0. We specified the learning task to be a tree-based logistic regression and the evaluation metric to be root mean square log error (RMSLE). We set the maximum number of boosting iterations to 50. Due to the extremely unbalanced ratio between significant and insignificant variants, we used the scale_pos_weight argument to control for the unbalanced data.
To assess the performance of the three models, we first defined the enhanced variants and loci as the predicted significant variants and loci that are not considered to be significantly associated with the disease in the input study. Then, we considered two metrics: true positive rate (TPR) and F1 score. In addition, receiver operating characteristic (ROC) curves and precision-recall curves (PRC) were also used to compare the three models.
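The definitions above can be written out as a short sketch. It assumes parallel boolean lists over the same variants (or loci); the function names are hypothetical, and exactly which variants enter the confusion counts in the study's evaluation is not specified here, so this shows only the standard formulas TPR = TP / (TP + FN) and F1 = 2TP / (2TP + FP + FN).

```python
def enhanced_calls(predicted, input_significant):
    """Flag calls predicted significant but not significant in the input."""
    return [p and not s for p, s in zip(predicted, input_significant)]


def tpr_and_f1(predicted, truth):
    """Return (TPR, F1) for boolean predictions against boolean truth."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return tpr, f1
```

Because F1 penalizes false positives while TPR does not, a method can lead on one metric and trail on the other, as seen in the locus-level comparison between DeepGWAS and logistic regression.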

Transfer learning using DeepGWAS
Although DeepGWAS training is supervised with labels from a large-sample-size study of the same disease (Fig. 1 and Fig. S9), DeepGWAS can transfer the knowledge learned from one disease (SCZ) to other diseases (such as AD and MDD). Specifically, we first trained our DeepGWAS model with the two largest European SCZ GWAS 4,18, fixed all the parameters in the neural network, and applied the SCZ-trained DeepGWAS model to enhance AD and MDD GWAS. Then we summarized the enhanced AD results by first binning them according to the probability of significant association with the disease and then assessing the proportion of loci within each bin that can be validated by independent AD GWAS (Fig. S1d). The rationale behind this loci validation approach is that true positives are more likely to be enhanced by additional independent studies.
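The binning summary described above can be sketched as follows. The bin edges are assumptions chosen for illustration (the text does not give them), as is the input layout of (probability, validated) pairs per locus; the function name is hypothetical.

```python
def validation_by_bin(loci, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Proportion of loci validated by independent GWAS per probability bin.

    loci:  list of (predicted_probability, validated) pairs, one per locus
    edges: assumed bin boundaries; bins are [lo, hi), except the last bin,
           which also includes its upper edge.
    Returns a list of (lo, hi, n_loci, proportion_validated) tuples, with
    proportion None for empty bins.
    """
    summary = []
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        is_last = k == len(edges) - 2
        in_bin = [validated for prob, validated in loci
                  if lo <= prob < hi or (is_last and prob == hi)]
        proportion = sum(in_bin) / len(in_bin) if in_bin else None
        summary.append((lo, hi, len(in_bin), proportion))
    return summary
```

If the predicted probabilities are informative, the validated proportion would be expected to rise across bins, which is the pattern this kind of summary is meant to check.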

Declarations
Data availability
All GWAS data are available through the original publications with PMIDs listed in Table S3.

Fig. 1. Overview of the DeepGWAS model. The blue circle X denotes the 33 input features, which serve as predictors in the DeepGWAS model; the blue circle Y denotes the true binary input label indicating whether a variant is associated with the disease. During training, Y is obtained from a larger-sample-size study, serving as the working truth; n denotes the number of genetic variants; the black and gray solid circles denote the neural network nodes of the deep learning architecture within the DeepGWAS model; the yellow circle denotes the DeepGWAS output p̂: estimated probabilities for each of the n variants being associated with the disease.
