Analyses of GWAS and sub-threshold loci lead to the discovery of dendrite development and morphology dysfunction underlying schizophrenia genetic risk

Schizophrenia (SCZ) is highly polygenic, and thousands of genes contribute to its risk. The 145 GWAS loci identied to date do not fully reveal SCZ genetic risk pathways. In this study, we explore a cost-effective strategy to increase power of inference of novel pathways, by expanding the analysis to include sub-threshold GWAS (subGWAS) loci. We identify 180 subGWAS loci (e.g., 5 x10 -8 < P ≤ 10 -6 ) based on SCZ summary statistics of 40,675 cases and 64,643 controls from CLOZUK and PGC datasets, and show that subGWAS loci contain substantial true genetic association signals. We merge GWAS (sigGWAS) and subGWAS loci and identify in total 304 high-condence risk genes (HRGs) by jointly modeling the expanded set of loci. We identify dendrite development and morphogenesis (DDM, GO:0016358 and GO:0048813) as a novel category of biological processes implicated in SCZ genetic risk. SigGWAS loci fail to detect DDM, which is predominantly enriched in subGWAS loci. Further, DDM genes are signicantly enriched for heritability of SCZ, as well as bipolar disorder and major depression. Genes in this functional process show cell type specicity in neurons in both fetal and adult brains, and their involvement in SCZ risk is further supported by eQTL analysis of SCZ risk alleles. We derived induced pluripotent stem cell (iPSC) lines from sporadic SCZ patients and normal controls and observe increased neurite lengths and soma sizes in patient-derived iPSC lines along multiple time points during neuronal development, further validating the genetic ndings. We also nd that the implicated genes are enriched in FDA-approved drug targets, suggesting a therapeutic potential for targeting the implicated biological processes for prevention and treatment. Our results showcase that expanding the analysis to include subGWAS loci is a valuable strategy for enhancing power of uncovering disease mechanisms, especially those of weak effect size, for SCZ and other complex diseases. Glausier Lewis, seemingly in contradiction to the increased dendritic density observed in hiPSC-derived neurons here. We speculate that SCZ genetic risk leads to increased dendritic density during the prenatal life and early childhood, while neuronal and specically dendritic dysfunction subsequently triggers aberrant pruning of synapses during adolescence, nally resulting in the reduced density observed in SCZ postmortem brains. Future studies are necessary to accurately understand the underlying mechanisms.


Introduction
Schizophrenia (SCZ), a life-spanning psychiatric disorder a icting approximately 1% of the population, is a leading cause of disability and premature death (Vos et al., 2017;Piotrowski et al., 2017), incurring an immense individual and societal burden upwards of 155 billion dollars in United States annually (Charlson et al., 2018;Cloutier et al., 2016). Incomplete understanding of disease pathophysiology continues to hinder the development of new therapeutics beyond the advent of antipsychotic medications 50 years ago . Understanding the cause of SCZ is a critical step to relieve the burden. SCZ is highly heritable, with an estimated heritability of ~80% (Sullivan et al., 2003;Hilker et al., 2018). Identifying risk genes is key to facilitating drug development (Lencz and Malhotra, 2015). During the past decade, more than a hundred loci associated with SCZ have been identi ed from Genome-wide association studies (GWAS) (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014;Pardiñas et al., 2018). However, the discovered loci collectively explain only a small proportion of the heritability, leaving the vast majority of genetic loci missing, which is a common challenge referred to as the "missing heritability" conundrum (Manolio et al., 2009) in most complex diseases. Risk genes have been implicated based on GWAS-detected loci (Wang et al., 2019), however, discoveries are constrained by the limited number of loci, and detection of more loci to infer new pathways and mechanisms is necessary given that thousands of genes are likely involved in SCZ risk (Nguyen et al., 2017). Increasing sample size is an important approach to unveil the hidden part of heritability, but current GWAS sample sizes are already very large, so the return is limited. Alternative approaches to uncovering risk loci, parallel to increasing GWAS samples, are essential. Several studies have shown that GWAS variants slightly below the genome-wide signi cance threshold (P ≤ 5 x 10 -8 ) account for most of the "missing heritability" in GWAS (Shi et al., 2016;Yang et al., 2010); we refer to variants with 'sub-threshold' signi cance as subGWAS variants hereinafter. The widely accepted genome-wide signi cance threshold for GWAS is based on a conservative Bonferroni correction, and adoption of such a stringent threshold possibly neglects numerous subGWAS variants that may reach genome-wide signi cance by inclusion of additional data (Panagiotou et al., 2012). Moreover, a study implicates subGWAS variants as harboring plausible functional, disease-relevant signals (Wang et al., 2016). These ndings suggest that exploration into subGWAS loci has real potential to make further discoveries, given a current wealth of data.
Despite the success in identifying risk loci through GWAS, pinpointing corresponding risk genes remains a big challenge to understanding underlying biological mechanisms of SCZ susceptibility (Henriksen et al., 2017). Furthermore, since SCZ is a highly heterogeneous and polygenic disorder, identifying pathways or functional networks through which disease risk alleles exert their impact is crucial for making progress in the prevention and treatment of this condition. Several gene sets and pathways relevant to SCZ pathology have been proposed (Purcell et al., 2014;Hertzberg et al., 2015;Pardiñas et al., 2018). For instance, the involvement of synaptic processes have been repeatedly reported (McGlashan and Hoffman, 2000;Glessner et al., 2010;Lips et al., 2012;Osimo et al., 2019;Berdenis van Berlekom et al., 2020), and this can explain a progressive cortical grey matter reduction observed in imaging studies (Turetsky et al., 1995;Fornito et al., 2009;Heckers and Konradi, 2010). Despite these ndings, our knowledge of the disease pathophysiology remains limited. Therefore, the potential value and need for additional mechanistic insights compels exploration into subGWAS risk signals for bona de risk loci.
In this study we demonstrate and utilized the value of subGWAS variants (5 x 10 -8 < P ≤ 1 x 10 -6 ) in SCZ from different perspectives and combine subGWAS with the genome-wide signi cant GWAS (sigGWAS) loci together to make robust mechanistic inference based on the expanded loci. Speci cally, we use an integrative Risk Gene Selector (iRIGS), a Bayesian model selection framework previously developed to nominate risk genes at GWAS loci (Wang et al., 2019), to jointly analyze both sigGWAS and subGWAS loci together to predict risk genes. We are particularly interested in identifying novel mechanisms involved in SCZ. Based on the expanded gene list, we identi ed a novel category of biological processes involved in neuronal dendrite development and morphogenesis (DDM, GO:0016358 and GO:0048813), key functions that are essential for normal synapse formation and maturation. Of particular note is that these biological processes are only enriced in subGWAS signals while absent in sigGWAS loci, supporting the value of subGWAS in identifying pathways with weaker effect size that are otherwise hard to discover without explicitly analyzing subGWAS loci. We validated the role of DDM in SCZ from multiple perspectives, including through normal and SCZ patient-derived induced pluripotent stem cells (iPSC) experiments. Our analyses indicated that subGWAS loci are a valuable resource to gain new insights into the underlying SCZ risk mechanisms, and the newly implicated DDM processes provide targets for therapeutic development.

SCZ subGWAS loci in large-scale GWAS contain substantial true associated variants
To investigate whether subGWAS loci harbor valid signals, we obtained two SCZ GWAS data sets for analyses, and investigated whether subGWAS loci could convert into sigGWAS with increased sample sizes. The rst, published by the Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) had a sample size of 36,989 cases and 113,075 controls and identi ed 108 independent sigGWAS loci (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). The second is from a meta-analysis combining the CLOZUK and PGC samples with a total sample size of 40,675 cases and 64,643 controls (Pardiñas et al., 2018), referred to as CLOZUK+PGC hereinafter. CLOZUK+PGC identi ed 145 sigGWAS loci, among which there are 52 new loci not identi ed by PGC. We aimed to evaluate whether these new loci overlap subGWAS variants from the PGC study. Brie y, for each index single nucleotide polymorphism (SNP) of the 52 loci, we extended it to a linkage disequilibrium (LD) region by including other SNPs tagged by the index SNP with an r 2 threshold. When requiring r 2 ≥ 0.9, 38 index SNPs from the 52 loci can be extended to LD regions (with the rest having no available SNPs in LD), among which 21 LD regions (55%) overlap at least one subGWAS SNP identi ed in the PGC study. With a loose criterion of r 2 ≥ 0.2, all 52 index SNPs can be extended to valid LD regions, and 28 (54%) overlap at least one subGWAS SNP identi ed in PGC study (SF 1). If we relaxed the criteria of subGWAS in PGC study as 5 x 10 -8 < P ≤ 1 x 10 -5 , 33 of 38 LD regions (87%) overlap ≥ 1 subGWAS SNP when requiring r 2 ≥ 0.9 and the ratio is 44 of 52 (85%) when requiring r 2 ≥ 0.2. We also investigated distributions of Pvalues in those 52 loci from both PGC and CLOZUK+PGC studies and observed similar overall patterns with improved signi cance in CLOZUK+PGC (SF 2, an example). All these results showed that subGWAS loci in SCZ GWAS with thousands of samples contain true genetic signals.

SCZ subGWAS loci are enriched in brain tissue enhancers
The majority of GWAS SNPs lie in non-coding regions, and they affect their target genes through regulatory effects (Maurano et al., 2012). To evaluate whether subGWAS loci are potentially functional in the brain, we collected enhancers identi ed in dorsolateral prefrontal cortex (DLPFC) from the ROADMAP Epigenomics Project (Roadmap Epigenomics  to assess whether subGWAS loci signi cantly overlap with these enhancers. To this end, we de ned 180 independent subGWAS index SNPs based on CLOZUK+PGC data (Methods). For each index SNP, we adopted the r 2 ≥ 0.2 threshold to de ne LD regions and counted how many regions thus de ned overlapped with DLPFC enhancers. We observed 77.2% (132/171) of subGWAS LD regions overlapped with ≥ 1 DLPFC enhancers, which is signi cantly higher than background (P < 1 x 10 -3 , permutation test, SF 3A, Methods), indicating more subGWAS LD regions overlap with DLPFC enhancers. Moreover, we observed that each subGWAS LD region overlaps 6.54 enhancers on average, which is also signi cantly higher than background (P = 5 x enhancers. The signi cance also holds for sigGWAS LD regions (P < 1 x 10 -3 , SF 3C, and P = 5 x 10 -3 , SF 3D). Next, we compared across all tissue types and observed that subGWAS LD regions overlapped with more enhancers in brain tissues compared to non-brain tissues (SF 4A, B; Methods). This trend is signi cant in DLPFC when comparing DLPFC versus all non-brain tissues (P sig = 1.75 x 10 -5 , SF 4A; P sub = 7.5 x 10 -5 , SF 4B; Wilcoxon test), indicating the enrichment of subGWAS in enhancer regions show brainspeci city.
Risk genes inferred from SCZ subGWAS harbor genuine SCZ risk genes We next applied iRIGS, a method developed previously to predict SCZ risk genes (Wang et al., 2019), to identify a set of high-con dence risk genes (HRGs). Brie y, iRIGS is a Bayesian model selection framework that leverages multi-omics data and gene network information to predict risk genes from GWAS loci. Features used in iRIGS include differential expression in SZ cases and controls (Fromer et al., 2016), DRE-promoter links (Andersson et al., 2014;Won et al., 2016;Mifsud et al., 2015), de novo mutations (Turner et al., 2017), and distance to index SNP (DTS). We combined subGWAS and sigGWAS together to boost the power of risk gene inference, and identi ed 304 HRGs in total, among which 135 and 177 were identi ed in sigGWAS and subGWAS loci, respectively (Methods) (ST 1). Accordingly, we termed the 135 genes sigHRGs, the 177 genes subHRGs, and the total 304 genes allHRGs. In addition to HRGs, iRIGS also predicted sets of local background genes (LBGs) as cleaner controls than whole genome background for comparison purposes (Methods); we referred to the corresponding LBGs as sigLBGs, subLBGs, and allLBGs, respectively. SCZ subGWAS-derived risk genes explain signi cant SCZ heritability.
We then evaluated SCZ heritability explained by subHRGs using a strati ed LD score regression (LDSC) method (Finucane et al., 2015). We included SNPs located within a ±10 kb window centered at the transcription start site (TSS) of each gene for LDSC analysis. We observed a signi cant enrichment of heritability in subHRGs (enrichment = 8.97, P = 1.51 × 10 -4 ) compared to subLBGs (enrichment = 3.38, P = 2.28 x 10 -3 , SF 5). We also set the window size as ±100 kb for LDSC and observed consistent results (SF 6). Note that DTS (distance of SNP to gene TSS) used in iRIGS (Methods) is a confounding factor for LDSC, since genes closer to index SNPs are more likely to have a high LDSC score. To avoid this confounding effect, we excluded DTS here in the application of iRIGS to predict HRGs for LDSC enrichment analysis. SCZ subGWAS-derived risk genes show spatiotemporal expression patterns characteristic of SCZ risk genes.
Next, we explored the spatiotemporal expression pattern of HRGs using expression data from GTEx (GTEx  and Brainspan (Miller et al., 2014) (Methods). We observed strong brain speci city for sigHRGs (P = 8.21 x 10 -7 ; Wilcoxon test) and reduced but noticeable brain speci city for subHRGs (P = 2.58 x 10 -3 ; Wilcoxon test), compared to an absence of brain speci city for LBGs (P = 0.707 for sigLBGs, P = 0.688 for subLBGs, SF 7; Wilcoxon test). In addition, we also observed higher expression levels of both sigHRGs (P = 4.37 x 10 -4 , SF 8; Wilcoxon test) and subHRGs (P = 4.48 x 10 -4 , SF 8; Wilcoxon test) at prenatal stages compared to postnatal stages. These results are consistent with the previously discovered spatiotemporal expression pattern of SCZ risk genes (Wang et al., 2019;Sey et al., 2020). SCZ subHRGs inferred from subGWAS loci expand the knowledge discovered by sigHRGs To investigate whether subGWAS signals capture knowledge of disease etiology complementary to that inferred only from sigGWAS, we performed gene set enrichment analysis (GSEA) upon 18 gene sets that have been widely and repeatedly implicated in SCZ (Methods). First, we observed that subGWAS loci enhance the signi cance of enrichment in the seven gene sets that are signi cant in sigGWAS, with the trend that P values for these sets become more extreme, including evolutionarily constrained genes (ECG, P all = 8.76 X 10 -17 , P sig = 6.5 x 10 -11 , P sub = 1.65 x 10 -6 ) and miRNA-137 (miR-137) targets (P = 3.04 X 10 -6 , P sig = 2.11 x 10 -3 , P sub = 1.12 x 10 -2 ) ( Table 1). Second, combining subHRGs and sigHRGs led to the identi cation of new signi cant gene sets that sigHRGs and subHRGs failed to identify separately. For example, sigHRGs and subHRGs are not signi cantly enriched in the presynaptic active zone (PRAZ) set (P sig = 0.27, P sub = 0.288), while allHRGs show signi cant enrichment in the same gene set (P all = 6.22 X 10 -3 , OR = 6.57) after inclusion of subHRGs. The improvement can also be observed for calcium channel and signaling (CCS, P all = 3.55 X 10 -2 , P sig = 5.03 X 10 -2 and P sub = 1) and metabotropic glutamate receptor subtype 5 (mGluR5, P all = 1.38 X 10 -2 , P sig = 7.03 X 10 -2 , P sub =1), respectively. Lastly, we especially note that subHRGs alone can lead to the identi cation of new gene sets that sigHRGs fail to detect. For the set of fragile X mental retardation protein (FMRP) targets de ned by Ascano et al. (Ascano et al., 2012), the enrichment of subHRGs is signi cant (P sub = 7.39 X 10 -3 , OR = 3.13), while the signi cance is absent for sigHRGs (P sig = 0.23, OR = 2.28). All of these observations imply that subGWAS can enhance and expand the knowledge inferred only from sigGWAS signals. We also investigated phenotypic manifestations of gene knockouts in mouse models to ask whether mutations in mouse genes orthologous to HRGs exhibit phenotypes related to SCZ (Methods). Among the 1721 terms (> 50 genes) in the Mammalian Phenotype Ontology (MPO) collected from the Mouse Genome Informatics (MGI) database, we observed 201 terms signi cantly enriched in allHRGs after Bonferroni correction (ST 2). We visualized the signi cant results of the three HRG lists in directed acyclic graphs (DAGs) to show the underlying hierarchical structure of ontology. The nervous system phenotype branch (MP:0003631, P = 6.2 X 10 -21 ) stands out from these enriched terms (SF 9), consistent with prior knowledge that SCZ re ects perturbations of neurodevelopmental processes (Gulsuner et al., 2013). Some terms, however, are independently identi ed by subHRGs (Fig. 1A), e.g., abnormal hindbrain morphology (MP:0000841, P sub = 4.82 x 10 -2 , P sig = 1), abnormal hippocampus morphology (MP:0000807, P sub = 1.51 x 10 -2 , P sig = 0.799), abnormal brain development (MP:0000913, P sub = 1.13 x 10 -3 , P sig = 1) and abnormal forebrain development (MP:0003232, P sub = 3.51 x 10 -2 , P sig = 1). We also observed that subHRGs improve the power to identify terms that sigHRGs and subHRGs failed to identify individually ( Fig. 1B), including abnormal long-term potentiation (MP:0002207, P all = 3.17 x 10 -2 , P sub = 0.236, P sig = 1), abnormal synaptic depression (MP:0002915, P all = 4.03 x 10 -2 , P sub = 1, P sig = 1), abnormal neuron differentiation (MP:0009937, P all = 2.46 x 10 -2 , P sub = 1, P sig = 1) and abnormal neural tube morphology (MP:0002151, P all = 7.74 x 10 -3 , P sub = 1, P sig = 0.625). These ndings are consistent with previous observations from GSEA of gene sets in SCZ, supporting that subGWAS loci are potentially valuable to expanding our understanding of disease etiology.
SCZ subHRGs lead to novel discovery of dendrite development and morphogenesis implicated in SCZ genetic risk that sigHRGs fail to detect To gain more speci c functional insights underlying SCZ pathophysiology, we collected Gene Ontology (GO) terms under the 'biological process' category and performed GSEA (Methods). We observed signi cantly enriched terms across different layers of the hierarchical GO structure (SF 10 and ST 3). We are particularly interested in learning more speci c and interpretable terms, we restricted further analysis to terms of ≤ 500 genes. We visualized enriched terms in Fig. 2, which illustrates three major clusters. In GSEA, sigHRGs make the major contribution to the identi cation of both clusters I and III (Fig. 2, pink dots), in which the enriched GO terms are related to well-known implicated functions, including synapse related processes (Osimo et al., 2019)  Genes in dendrite development and morphogenesis terms explain heritability of SCZ and other relevant psychiatric disordersIn cluster II, we observed a novel functional module involved in dendrite development and morphogenesis. We note that the major contribution to this is from subHRGs (Fig. 2, orange dots), not from sigHRGs. Speci cally, enriched GO terms include dendrite development (DD, GO:001635, P sub = 2.03 x 10 -3 , P sig = 0.24) and dendrite morphogenesis (DM, GO:0048813, P sub = 1.67 x 10 -3 , P sig = 0.24), regulation of dendrite development (RDD, GO:0050773, P sub = 4.86 x 10 -4 , P sig = 0.4), and regulation of dendrite morphogenesis (RDM, GO:0048814, P sub = 7.61 x 10 -4 , P sig = 0.54) (Fig. 3 large orange circles).
We found that the DD term includes ten genes overlapping subHRGs, and four genes overlapping with sigHRGs (ST 1), explaining why subHRGs are the main driver of identi cation of this enriched module (Fig. 3, and ST 1). We also found that the enrichment mainly originated from overlaps between terms involving regulation function in subHRGs, i.e., nine of ten genes overlapping between DD and subHRGs derive from RDD (ST 1 To validate that DDM terms are genetically involved in SCZ, we extracted genes from relevant dendrite GO terms and performed LDSC analysis. To avoid confounding effects due to the use of sigGWAS and subGWAS loci in the discovery of the terms, we excluded allHRGs from these GO term genes for LDSC analysis. We observed signi cant enrichment of heritability in DDM related terms (Fig. 4) Owen, 2016;Brainstorm Consortium et al., 2018). We observed similar enrichment of dendrite related terms in genetic risk of BD and MDD (Fig. 4).
Previous studies have indicated that gene-disrupting rare coding variants are more abundant in SCZ cases compared to controls (Genovese et al., 2016), and risk genes identi ed from common variants (mostly noncoding) are also likely overlapped with genes targeted by rare variants (mostly coding) (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). We sought to probe whether rare coding variants in genes in DD and DM are also associated with SCZ risk. We collected rare variant association summary statistics data at gene level from SCHEMA (https://schema.broadinstitute.org/) (Singh et al., 2020) and observed that genes of DD and DM sets have signi cantly smaller P-values than random background genes (P DD = 8.55 x 10 -6 , P DM = 1.62 x 10 -4 , SF 15; Wilcoxon test). After removing allHRGs from GO terms, the P values of the remaining genes remain signi cant (P DD = 7.03 x 10 -5 , P DM = 5.67 x 10 -4 , SF 15; Wilcoxon test).Rare coding variants in genes in dendrite development and morphogenesis are associated with SCZ Genes in dendrite development and morphogenesis terms show speci city in neurons at fetal and adult stages A current neuropathology model of SCZ is a revised neurodevelopment hypothesis proposing the contribution of neurodevelopmental processes at both early (pre-or perinatal stages) and late (childhood to adolescence) stages (Gogtay et al., 2011). The majority of evidence supporting this model comes from neuroimaging studies of adult brain (Pantelis et al., 2003), e.g. reduction of grey matter areas in cortex and hippocampus (Honea et al., 2005;Takahashi et al., 2009;Levitt et al., 2010). However, there is little direct evidence showing abnormalities in fetal brain. Here, we sought to explore whether dendrite related molecular machinery is likely to function in neurons at both fetal and adult stages at cell type level. We collected four sets of single-cell sequencing (scRNA-seq) data from previous studies, including two of fetal brain and two of adult brain (Li et al., 2018;Nowakowski et al., 2017;Lake et al., 2016) , and calculated the cell type speci city of neuronal cells (Methods). We observed that DD genes shows high cell type speci city in both fetal and adult neurons compared to background (Fig 5), including both excitatory (ExN) and inhibitory (InN) neurons. These results imply the potential defects of dendrites in both types of neurons at fetal and adult stages, supporting the revised neurodevelopment hypothesis of SCZ at molecular level.
SCZ subGWAS risk alleles are associated with expression of dendritic genes Next, we investigated genetic in uences on expression of dendritic genes. We collected four brain expression quantitative trait loci (eQTL) studies among which one was performed in fetal human brain (O'Brien et al., 2018), and the other three were performed in adult human brain (Jaffe et al., 2018;Hoffman et al., 2019;GTEx Consortium, 2015).
LRP8 is one of our predicted subHRGs, and it was also previously suggested as a SCZ risk gene (Li et al., 2016). The encoded protein plays a critical role in dendrite development through interacting with reelin signaling pathway (Niu et al., 2004;Myant, 2010). We observed that the risk allele of one subGWAS SNP, rs5174 (risk allele: C, GWAS P-value: 1.45 x 10 -6 ), signi cantly upregulates the expression level of LRP8 consistently in all four data sets (P Obrien(fetal) = 0.02, P Jaffe(adult) = 1.29 x 10 -4 , P Hoffman(adult) = 1.34 x 10 -7 , and P GTEx(adult) ≤ 1.9 x 10 -7 SF 16, ST 5; Wilcoxon test). A previous differential expression study (Fromer et al., 2016) has shown that LRP8 is over-expressed in SCZ patients (P = 1.22 x 10 -3 ), consistent with the eQTL effect here. Another interesting subHRG is ASAP1, which regulates the formation of dendritic spines in dentate gyrus neurons (Yadav et al., 2017;Jain et al., 2012). In the corresponding subGWAS locus, we found rs17212137 (risk allele: T, GWAS P-value: 8.74 x 10 -7 ) is an eQTL down regulating ASAP1 expression consistently in all the three adult brain studies (P Jaffe(adult) = 6.19 x 10 -5 , P Hoffman(adult) = 6.25 x 10 -6 , and P GTEx(adult) ≤ 8.6 x 10 -7 , SF 16, ST5; Wilcoxon test). We note that the observed the association between genetic variant and LRP8 expression in both prenatal and postnatal stages, while the eQTL support of ASAP1 was only signi cant in adult human brain. We speculate that the genetic in uence on risk gene expression could happen at a speci c stage or span a long period of time, re ecting the temporal expression pattern of dendritic genes observed in the section above.
Cortical neurons differentiated from SCZ patient-derived iPSC lines show abnormal dendrite development and morphogenesis along multiple time points As direct examination of neuroanatomical changes related to SCZ is impractical in fetal brain of patients, we employed an induced pluripotent stem cell (iPSC) model that mimics human neuronal development. We generated iPSC cell lines from 4 sporadic SCZ patients and 4 healthy controls and differentiated the iPSC lines into cortical neurons (Methods). We focused on early stages of neuronal development and examined neuronal morphology at 1-4 weeks post differentiation (Methods). We observed signi cantly longer neurite (dendrite) lengths in developing SCZ neurons compared to controls across all four time points (P = 3.29 x 10 -7 , 1.45 x 10 -11 , 4.35 x 10 -10 and 9.33 x 10 -4 at 1, 2, 3 and 4 weeks, respectively; Wilcoxon test) (Fig. 6). Moreover, the soma size is signi cantly larger in SCZ patient-derived iPSC lines than that in controls (P = 9.72 x 10 -10 , 2.90 x 10 -11 , 1.65 x 10 -6 and 5.95 x 10 -4 at 1, 2, 3 and 4 weeks, respectively; Wilcoxon test). Fig. 6 left panel shows an image of dendrites of neurons from a typical normal control iPSC line (top) and a typical SCZ patient-derived iPSC line (bottom) at 2 weeks, which illustrates the increased dendrite density and soma size.
Genes in dendrite development and morphogenesis terms are likely to be potential drug targets.
Given the important role of DDM in SCZ pathology, we were interested in exploring whether genes corresponding to DD and DM terms have the potential for repositioning existing drugs for SCZ treatment.
We utilized a list of 3,078 con rmed druggable targets curated in a previous study (Gaspar and Breen, 2017) for drug target enrichment analyses. Among the 233 genes from DD and DM terms, 70 of them (30%) are drug targets, including two subHRGs, FYN and CDK5R1. The overlap ratio is signi cantly higher than random (P = 1.67 x 10 -13 , OR = 3.18; Fisher's exact test). In particular, 8 dendritic genes (CHRNA3, CHRNA7, CHRNB2, GRIN3A, DTNBP1, GSK3B, HDAC2, and TRPC5) are targets of nervous system drugs, corresponding to a signi cant enrichment (P = 0.02, OR = 2.60; Fisher's exact test). The protein encoded by CDK5R1 (p35) is a neuron-speci c activator of cyclin-dependent kinase 5 (CDK5), and the decreased expression of p35 observed in SCZ was suggested to affect synaptic protein expression (Gaspar and Breen, 2017). The three genes CHRNA3, CHRNA7, and CHRNB2 are subunits of the neuronal nicotinic acetylcholine receptor family (McKay et al., 2007). GRIN3A encodes a subunit of the N-methyl-Daspartate receptor (NMDAR), a key player in controlling synaptic plasticity (Paoletti et al., 2013), while FYN tightly regulates NMDAR function (Tezuka et al., 1999;Trepanier et al., 2012). Interestingly, FYN is implicated in SCZ risk in different populations (Wu et al., 2013;Tsavou and Curtis, 2019). DTNBP1 has been linked to SCZ susceptibility in multiple studies (Narr et al., 2009) and proposed as a potential therapeutic target for SCZ treatment (Wang et al., 2017). The overexpression of another dendritic gene, HDAC2, was previously shown to reduce dendritic spine density, synaptic plasticity, and synapse number, while its downregulation led to increased synapse number (Guan et al., 2009). These results exemplify the important roles of DD and DM genes in SCZ etiology, synaptic development and plasticity, supporting the idea of drug repositioning of these genes for SCZ therapy.

Discussion
Although GWAS have successfully identi ed more than one hundred loci associated with SCZ, they explain only a small proportion of disease heritability. Increasing sample size is a straightforward solution to improve the power of GWAS; however, collection of additional samples that make meaningful contribution to discovery of novel loci is extremely effort consuming, and the current sample size is already exceptionally large. On the other hand, existing SCZ GWAS with a sample size of tens of thousands of individuals already provides us a wealth of data, and gaining additional information by mining this extant resource will be tremendously cost-effective and advance our understanding of the underlying biology of SCZ. As a proof of concept, we revisited two sequential SCZ GWAS and found that a substantial portion of these 'sub-threshold' signi cant loci from the PGC study would become genomewide signi cant after inclusion of additional samples (in the CLOZUK+PGC study). Moreover, we found that brain-related enhancers are enriched in subGWAS loci, consistent with the knowledge that GWAS loci exert their functions through regulatory effects. These ndings inspired us to explore the subGWAS loci to increase the discovery power beyond sigGWAS loci. As our primary interest is to make novel discoveries of biological processes involved in SCZ, inferring novel genes beyond GWAS loci is critical in achieving the goal.
Given the complexity of SCZ from multiple angles, inferring risk genes from associated loci is an established challenge. We employed a recently developed and validated computational framework for SCZ, iRGIS, to explore the value of subGWAS loci. When applying iRIGS to predict risk genes, we combined subGWAS and sigGWAS loci together as input, which has two bene ts. On one hand, sigGWAS loci are more reliable signals and can improve the accuracy of prediction of subHRGs given that the input loci are jointly modeled. On the other hand, integration of subGWAS and sigGWAS loci increased the number of risk genes and therefore raised the discovery power of underlying biological mechanisms. As expected, some of the functional modules implicated in subHRGs are consistent with sigHRGs. In addition, we especially note that subGWAS loci can also implicate biological functions that are achieved only in subGWAS. Speci cally, it is the analysis of subHRGs that led to the discovery of novel biological processes involved in DDM, which sigHRGs fail to detect. Interestingly, dendrite development is a subprocess of synapse development in neurogenesis, while the dysregulation of synapse function has been repeatedly implicated in SCZ (McGlashan and Hoffman, 2000;Glessner et al., 2010;Lips et al., 2012;Osimo et al., 2019; Berdenis van Berlekom et al., 2020). We reason that this is due to the heterogeneity of genetic architecture of SCZ, and to a certain extent, the heterogeneity can be re ected by effect sizes in the association study. The sigGWAS loci implicating synapse development harbor variants with large effect size, while subGWAS loci implicating DDM harbor variants with relatively weaker effect sizes. The heterogeneity between loci with different effect sizes lead to the discovery of mutually exclusive functional modules even though dendrite development and synapse development are molecularly correlated. Given that functional gene modules may confer different effect sizes, modules with higher effect sizes are more likely to be detected rst from GWAS. Most human complex diseases are highly heterogeneous and identi ed GWAS loci are far from complete due to the relatively limited sample sizes for most. Therefore, many gene modules with weaker effect sizes remain to be discovered. Our strategy presented here demonstrates that focusing on subGWAS signals is a powerful and valuable approach to identify this category of functional gene sets.
Dendritic alterations have long been observed in SCZ and other psychiatric conditions (Konopaske et al., 2014). However, little is known about the link between genetic risk and dendritic dysfunction. In this study, we discovered dendrite dysmorphogenesis in SCZ based largely from GWAS data and validated our ndings from multiple orthogonal angles, including from genetic evidence and iPSC models. The observed increase in dendritic length and density in SCZ patient-derived iPSC models compared to control samples is consistent with previous studies (Dimitrion et al., 2017;Forrest et al., 2017), providing further support of the involvement of dendrite development in SCZ genetic risk. For instance, the deletion of 15q11.2 is associated with SCZ, and in a study of hiPSC-derived neurons bearing 15q11.2 deletion, the authors observed an approximately 7.5-fold increase in dendritic density compared with neurons without the deletion (Dimitrion et al., 2017). In another study, the authors found that one SCZ risk allele located near the miR-137 locus can signi cantly increase dendritic length and arborization in human neurons derived from hiPSCs (Forrest et al., 2017). Reduced dendritic spine density in postmortem brain tissue from SCZ patients, density has also been reported in multiple studies (Konopaske et al., 2014;Moyer et al., 2015;Glausier and Lewis, 2013), seemingly in contradiction to the increased dendritic density observed in hiPSC-derived neurons here. We speculate that SCZ genetic risk leads to increased dendritic density during the prenatal life and early childhood, while neuronal and speci cally dendritic dysfunction subsequently triggers aberrant pruning of synapses during adolescence, nally resulting in the reduced density observed in SCZ postmortem brains. Future studies are necessary to accurately understand the underlying mechanisms.
Synaptic dysfunction is long implicated in SCZ (Glantz and Lewis, 2000;Roberts et al., 1996;Berdenis van Berlekom et al., 2020; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014;Fromer et al., 2014;Osimo et al., 2019;Mirnics et al., 2001). Dendrites are the major sites for synaptic connections (Cline, 2001). Dendritic spines specialized from the shaft contain postsynaptic receptors for the vast majority of excitatory glutamatergic synapses (Sekino et al., 2007). Thus, dendrite development and morphogenesis are critical determinants of synaptic functions. Conversely stabilization of dendrites also requires productive synaptogenesis and active synaptic input (Wu and Cline, 1998;Niell et al., 2004;Sekino et al., 2007). Therefore, dendrites and synapses contribute to the development and function of each. Synaptic loss and dendritic atrophy have been reported for neurodevelopmental and neurodegenerative disorders like SCZ, MDD and Alzheimer disease (AD) (Lin and Koleske, 2010), suggesting a possible therapeutic strategy of maintaining normal dendritic architecture and restoring its related functions (e.g., synapse development). Perhaps preventing synaptic over-pruning during childhood and adolescence could be bene cial (Sakai, 2020). Supporting evidence from an earlier study showed that people chronically exposed to minocycline, an antibiotic with anti-in ammatory effects that reduces pruning, are at signi cant lower risk of incident psychosis (Sellgren et al., 2019). Alternatively, restoration of DDM in adulthood may provide a path to treatment. We note that second-generation antipsychotic drugs such as olanzapine and clozapine can enhance neurite outgrowth and increase spine numbers (Critchlow et al., 2006;Lu and Dwyer, 2005). In addition, another study demonstrated an exciting strategy of designing a synthetic synaptic organizer protein to restore dendritic spine numbers, which nally led to the improvement of hippocampus-dependent learning in AD (Suzuki et al., 2020). Moreover, they anticipated that other extracellular scaffold proteins could be used to restore synaptic connectivity in other neurodevelopment and neurodegenerative diseases. Given the fact that therapeutics with direct genetic evidence are more likely to be successful (King et al., 2019;Nelson et al., 2015), such therapies to restore and stabilize dendrite morphogenesis are an important area of further exploration.

Replication of PGC subGWAS variants in CLOZUK+PGC study
We extracted the 52 new sigGWAS loci from CLOZUK+PGC study (Pardiñas et al., 2018). For each index SNP of the 52 loci, the corresponding LD segment (with relevant PGC study SNPs) was retrieved from HaploReg V4 (Ward and Kellis, 2016). The lowest P value SNP in each region was treated as the tag SNP for the region. We also considered the LD segments containing ≥1 subGWAS SNP (5 X 10 -8 < P ≤ 1 X 10 -6 ) identi ed in the PGC study.

De ning independent subGWAS index SNPs for CLOZUK+PGC data as input to iRIGS
We collected the GWAS summary statistics from the CLOZUK+PGC study and utilized an iterative strategy to obtain independent subGWAS index SNPs: In step 1, we removed all the SNPs within ± 500 kb intervals centered on sigGWAS index SNPs to avoid potential overlaps.
In step 2, we selected the SNP with the lowest P value from the remaining SNPs, and then removed all the other SNPs within its ± 500 kb region.
In step 3, we iterated step 2 until no further SNPs were left.
In step 4, we retained the selected SNPs with 5 x 10 -8 < P ≤ 1 x 10 -6 and de ned these SNPs as subGWAS index SNPs.
In total, we obtained 180 subGWAS index SNPs from the CLOZUK+PGC data.

Enhancers in GWAS LD regions
For each index SNP either from sigGWAS or subGWAS, we adopted a threshold of r 2 to de ne a LD region. Then we collected DLPFC enhancer data from the ROADMAP Epigenomics Project (Roadmap Epigenomics . To evaluate whether sigGWAS or subGWAS LD regions signi cantly overlap with DLPFC enhancers, we followed the strategy proposed in a previous study to construct a set of background SNPs used for a permutation test (Wang et al., 2016). Speci cally, for either sigGWAS or subGWAS, we picked a matched background SNP list requiring that it has the same number of SNPs as sigGWAS/subGWAS and further each SNP in the background SNP list has similar features to a given index SNP in sigGWAS/subGWAS: matched minor allele frequency (±0.1), gene density in 1 Mb region centered at the selected SNP (±3), location of the picked SNP (whether located in a gene body), distance to TSS of the closest protein-coding gene (±25 kb) and if the LD region of the selected SNP contains a similar number of related SNPs under the same criterion, i.e., r 2 ≥ 0.2 (±5). We randomly picked 1000 matched background SNP lists.
Three statistics are presented in the related section above: First, we counted the number of LD regions overlapping ≥ 1 DFPLC enhancers in sigGWAS/subGWAS and 1000 random background SNP lists, respectively. P values were calculated using the observed numbers in sigGWAS/subGWAS compared to the background of random lists.
Second, we also calculated the mean number of enhancers overlapping each LD region in sigGWAS/subGWAS and 1000 random lists. P values were calculated based on the observed mean and background means.
Third, we collected enhancers from other tissues in addition to DFPLC from ROADMAP Epigenomics Project. To obtain clean results on tissues other than brain, we removed brain enhancers from all nonbrain tissues. Then for a given tissue, we calculated the z score for the ith SNP in sigGWAS/subGWAS using the formula: Where C i is the number of enhancers overlapped by the ith SNP LD region in sigGWAS/subGWA, C i ' is the number of enhancers overlapped by the corresponding LD regions for matched SNPs sampled. Z score > 0 means a given test SNP overlaps more enhancers than its random background samples and vice versa.
Applying iRIGS to identify HRGs for SCZ We identi ed HRGs for SCZ by applying iRIGS (Wang et al., 2019), a Bayesian framework integrates multiple features and a gene network to predict risk genes affected by GWAS variants affect. We adopted the same gene network and features from the original study in our analyses here. Integrated features include the distance from target gene to index SNP (DTS), DNMs identi ed in SCZ (Turner et al., 2017, differential expression between SCZ patients and controls (Fromer et al., 2016) , and distal regulatory elements (DREs)-promoter links (Andersson et al., 2014;Won et al., 2016;Mifsud et al., 2015). Refer to the original study for details on how to generate these features and the gene network.
The terms of MPO (Smith and Eppig, 2009) were collected from MGI (http://www.informatics.jax.org/) and the corresponding gene sets were assembled as previously described (Wang et al., 2019). In that study, only central nervous system terms were analyzed, and we modi ed the method here by including all terms to get a more comprehensive phenotypic pro le. Brie y, we determined gene list of a speci c term by including all the genes in itself and its descendant terms. In this study, we compiled 12490 terms in total and selected 1721 terms with > 50 genes for our analysis. P values were computed by hypergeometric tests and corrected for multiple testing using the Bonferroni method.
The GO terms were downloaded from Gene Ontology database (Ashburner et al., 2000) and terms were grouped following the relationships of "is_a", "part_of", and "regulates", i.e., one term is a descendant term of a parent term if it has any of the three kinds of relationships with its parent term. In our hierarchical structure, genes of one term include all the genes residing in its descendant terms. P values were calculated using clusterPro ler (Yu et al., 2012) and the false discovery rate (FDR) method was used for multiple testing correction.

Gene expression analysis
We downloaded the transcript per million (TPM) values of genes from the GTEx portal (https://www.gtexportal.org/), and tissue speci city was measured by Preferential Expression Measure (PEM) (Huminiecki et al., 2003;Kryuchkova-Mostacci and Robinson-Rechavi, 2017). Accounting for the fact that tissues from the same organ may be correlated with each other and underlie the same disease, we calculated the tissue speci city in a grouping manner. For a speci c tissue, we computed its PEM versus all tissues in other organs, e.g., we calculated the speci city of frontal cortex by comparing it with all other non-brain tissues but not include brain tissues.
To calculate the developmental-stage speci city, we downloaded the data from BrainSpan (www.brainspan.org). Similarly, we grouped the stages into two major ones: prenatal (S2-S7) and postnatal stages (S8-S14) and calculated PEM for each stage following the same grouping manner, i.e., each stage in a prenatal category was computed comparing only with postnatal stages, and vice versa.
To calculate the speci city of neuron cell types, we collected four sets of single-cell sequencing (scRNAseq) data from previous studies, including two of fetal brain (Li et al., 2018;Nowakowski et al., 2017) and two of adult brain (Li et al., 2018;Lake et al., 2018). Cell types were annotated in the original studies, including excitatory neuron, inhibitory neuron, astrocyte, microglia, and others. We calculated the speci city of each gene in neuron following the same strategy used in a previous study (Skene et al., 2018), i.e., the mean expression in neuronal cells divided by the sum of mean expression in all cell types. Note that in these studies, sub-clusters (e.g., excitatory neurons in prefrontal cortex and inhibitory neurons in hippocampus) were annotated for excitatory/inhibitory neuron, so we rst calculated the speci city for these sub-clusters and then grouped them into two large categories, i.e., excitatory and inhibitory neurons.
iPSCs were generated using Cytotune-iPSC 2.0 TM Sendai viruses (Thermo Fisher Scienti c) according to manufacturer's instructions. After 3 days, transduced cells were plated on irradiated mouse embryonic broblast (MEFs) and grown for 2 to 3 weeks in hiPSC medium consisting of D-MEM/F12 (Thermo Fisher Scienti c), 20% Knockout Serum Replacement (KSR, Thermo Fisher Scienti c), 1X Glutamax (Thermo Fisher Scienti c), 1X MEM NEAA (Thermo Fisher Scienti c), 100 µM 2-mercaptoethanol (Thermo Fisher Scienti c), and 10 ng/ml human basic FGF (bFGF, PeproTech) until the emergence of individual colonies. Live hiPSC clones were manually picked and grown on MEFs using hiPSC medium for further expansion and characterization. Media were changed daily, and iPSC lines were passaged by collagenase (Thermo This study is supported by US NIH/NHGRI grants U01HG009086 (to R.C., Q. Wang, Q. Wei, Y.J., H.Y., X.Z., N.J.C., and B.L.), U24HG008956, and R01MH113362 (to J.S., N.J.C., and B.L.). The grant U01HG009086 supports the Vanderbilt Analysis Center for the Genome Sequencing Project (GSP) and U24 HG008956 supports the GSP Coordinating Center. The content is solely the responsibility of the authors and does not necessarily represent the o cial views of the National Institutes of Health.
Author Contributions R.C. and B.L. conceived the overall design of the study. R.C. and Q. Wang conducted most of the data analyses. C.X. and Z.W. performed the iPSC related experiments and analyses. Q. Wei, Y.J., H.Y., X.Z., and F.C. provided data integration and analyses. J.S.S., E.H.C., and N.J.C. contributed to the interpretation of the results. R.C., Q. Wang and B.L. wrote the manuscript, and all authors participated in the review and revision of the manuscript.