Unraveling Neuro-Proteogenomic Landscape and Therapeutic Implications for Human Behaviors and Psychiatric Disorders

Understanding the genetic basis of neuro-related proteins is essential for dissecting the molecular basis of human behavioral traits and the disease etiology of neuropsychiatric disorders. Here, the SCALLOP Consortium conducted a genome-wide association meta-analysis of over 12,500 individuals for 184 neuro-related proteins in human plasma. The analysis identified 117 cis-regulatory protein quantitative trait loci (cis-pQTL) and 166 trans-pQTL. The mapped pQTL capture on average 50% of each protein’s heritability. Mendelian randomization analyses revealed multiple proteins showing potential causal effects on neuro-related traits such as sleeping, smoking, feelings, alcohol intake, mental health, and psychiatric disorders. Integrating with established drug information, we validated 13 out of 13 matched combinations of protein targets and diseases or side effects with available drugs, while suggesting hundreds of re-purposing and new therapeutic targets. This consortium effort provides a large-scale proteogenomic resource for biomedical research on human behaviors and other neuro-related phenotypes.

proteins have the potential to provide insight into the pathophysiology of neurological and mental disorders and the genetic architecture of their molecular pathways, setting the basis for the improvement of diagnostic instruments and targeted therapy [16].

Protein levels are more linked to variation in cognitive function than genetic variants alone. Current studies on neurology-related proteins either focused on neurodegenerative disorders or cognitive function specifically or had a limited sample size [17][18][19][20][21][22]. In a recent study, neurology-related proteins were associated with general fluid cognitive abilities in late life, and a portion of these was observed to be mediated by brain volume, measured as a structural brain variable [20].

The field of proteomics has been rapidly expanding in recent years and produced results that have played a fundamental role in the decoding process of molecular mechanisms involved in several traits and diseases, from cardiovascular disease to general health [19, 23-26]. The genomic studies of the human proteome have benefited from various high-throughput measurement techniques, such as mass spectrometry [14, 27], aptamer-based assays [28], and antibody-based assays [15]. Among these, the antibody-based Proximity Extension Assay [29] has high measurement precision, especially for many functional but low-abundant proteins.

This study aims to identify genetic variants associated with 184 neurology-related blood-circulating proteins via a large-scale genome-wide association meta-analysis (GWAMA) and investigate the proteins' genetic and potential causal relationships with potential disease-causing behaviors, common psychiatric disorders, as well as related comorbidities. We systematically investigate the proteins' therapeutic implications based on established drug information. We provide an atlas for the genetic architecture of these proteins as a resource for biomedical research on human behaviors and psychiatric disorders.

Out of the 137 proteins with detected pQTL, 68 proteins had significantly associated variants both in cis- and trans-regulatory loci.

As expected, the identified trans-pQTL were in general more weakly associated than the cis-pQTL. Nevertheless, we found that 24 proteins shared a total of 14 trans-pQTL. For example, well-known pleiotropic loci such as the HLA region and the ABO locus showed trans-regulatory effects across a number of plasma proteins (Fig. 1a). For instance, 19 proteins showed significant trans-pQTL at the ABO locus, although the associations were not completely due to the same causal variants (Supplementary Fig. 3). Most of the mapped pQTL were also found to be expression QTL (eQTL) significantly associated with the expression of the corresponding/nearest genes. However, compared to trans-pQTL, cis-pQTL were much more likely to colocalize with eQTL in terms of the underlying genetic regulation (Supplementary Fig. 1-2). The lead variants of the cis-pQTL were also more centered around the transcription start sites (TSS) of the corresponding coding genes, compared to those of the trans-pQTL around the TSS of the nearest coding genes (Fig. 1b). The cis-pQTL also had stronger effects, less correlated with the minor allele frequencies (MAFs), compared to the trans-pQTL (Fig. 1c-d).

The fact that the trans-pQTL were not colocalized with eQTL could be partly due to the weaker signals of the trans-pQTL than those of the cis-pQTL. However, we hypothesized that the trans-pQTL may not necessarily reflect the biological regulatory mechanisms of the corresponding proteins, but may rather be driven by underlying features of the blood samples, due to their influence on the immuno-reaction of the Olink assay. For example, the pleiotropic trans-pQTL across the proteins highlight major blood coagulation and clotting factors such as KLKB1 (Plasma kallikrein), KNG1 (Kininogen-1), and F12 (Coagulation factor XII), as well as the glycosylation locus ST3GAL4. We thus also looked into the functional pathways and gene sets that involve the closest genes to our trans-pQTL, using gene set enrichment analyses (Supplementary Fig. 6). With a false discovery rate < 5%, 997 significant pathways were found to be enriched for the genes of our trans loci, of which  Table 8). Particularly, the trans-pQTL were found to be enriched in 1) established GWAS traits such as blood protein levels, platelet count, and plateletcrit; 2) GO pathways such as biological adhesion, wound healing, coagulation, and glyco-  individual-level data collected in the ORCADES cohort to assess the narrow-sense heritability for each protein [33]. Across the analyzed proteins, we found that the higher the protein's heritability, the more pQTL were detected for the protein (Fig. 1e), the stronger the cis-pQTL effects (Fig. 1g), and the larger the phenotypic variance captured by the detected pQTL (Fig. 1f). On average, the mapped pQTL together explain 49% of the proteins' heritability. This indicates that proteins as molecular phenotypes have strong major regulatory loci. Nevertheless, their genetic effects can still be widespread across the genome, reflecting a polygenic genetic architecture.

Using data from the ORCADES cohort, we found TDGF1 (Teratocarcinoma-Derived Growth Fac-  Table 2). Of our discovered loci, 113 had already been reported in previous studies. We also checked whether the hits from the meta-analysis were significant in the individual cohorts and observed that 73 of the sentinel variants were statistically significant only in the meta-analysis. We also extracted the established associations between our mapped cis-pQTL and complex traits from the PhenoScanner database (Supplementary Table 3). At a 5% false discovery rate, 39 cis-pQTL showed significant association with both complex traits and other proteins (mostly based on an aptamer-based assay). We found that the level of pleiotropy at the protein level, i.e., being trans-pQTL for other proteins, is associated with the level of pleiotropy on the complex traits (Supplementary Fig. 4).

We performed linkage disequilibrium (LD) pruning (r² < 0.001) to identify secondary independent associations at the cis-pQTL. We identified a total of 769 additional variants across all the 117 proteins with cis-pQTL mapped (Supplementary Table 4).
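The LD pruning step above can be sketched as a greedy procedure: rank variants by association p-value, then keep a variant only if its r² with every variant already kept is below the threshold. This is an illustrative sketch under that assumption, not the software used in this study. The function name `ld_prune` and the precomputed pairwise r² matrix are hypothetical inputs.

```python
from typing import List, Sequence


def ld_prune(
    p_values: Sequence[float],
    r2: Sequence[Sequence[float]],
    threshold: float = 0.001,
) -> List[int]:
    """Greedy LD pruning on a precomputed pairwise r^2 matrix.

    Returns the indices of retained variants, ordered from the most to the
    least significant. A variant is retained only if its r^2 with every
    previously retained variant is below `threshold`.
    """
    if len(p_values) != len(r2):
        raise ValueError("p_values and r2 must describe the same variants")
    # Visit variants from the smallest p-value to the largest.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    kept: List[int] = []
    for i in order:
        if all(r2[i][j] < threshold for j in kept):
            kept.append(i)
    return kept
```

With four variants where the second is in strong LD with the lead variant and the fourth is in LD with the third, only the lead variant and the third would be retained as independent signals.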

This meta-analysis within our SCALLOP collaborative framework is a follow-up of a previous study on the proteins from the Olink Neurology and Neuro-exploratory panels, where data were collected from the two Greek cohorts that we included in this study [36]. Our results replicated over 90% of the established loci, including the previous main discoveries of the cis-pQTL for CD33, GPNMB, and MSR1. Furthermore, we cross-referenced the significant loci discovered in the meta-  Table 1).

Mendelian randomization analysis identifies plausible causal protein markers for neuro-related phenotypes

In order to make statements on potential causality from the proteins to complex traits and dis-  Table 5).

In order to control for false positive inference due to LD, we adopted the HEIDI (heterogeneity in dependent instruments) [39] test statistic to examine the colocalization between each pQTL and its association with the corresponding downstream outcome phenotypes. Nine out of the 24 plausible causal associations had colocalization support by HEIDI (p > 0.05) (Fig. 2-3, Supplementary Table 5). Among these, the single protein CDH6 showed a potential causal effect on neurological and behavioral traits including mood swings, miserableness, leg pain, smoking, and neuroticism, where the effect on smoking had a different direction from the effects on the others. CTSC and LGALS8 were both plausible causal markers for alcohol intake but with opposite effect directions.

CDH17 showed a positive effect on intelligence. DPEP1 showed a negative effect on napping, while as a druggable target it also showed a potential risk-increasing effect on schizophrenia.

Bank database (see Data Availability). There were 13 protein-trait combinations from the significant MR discoveries that matched established drugs. We found that for all the 13 established drug targets (Fig. 5a-b), the MR-inferred causal effect directions matched the corresponding targeting drugs' pharmacological effects (including side effects) (Fig. 5c). For instance, hyaluronic acid is a liver disease biomarker; the protein NCAN binds with hyaluronic acid and thus reduces liver cir-  Clenbuterol was used as a bronchodilator in the treatment of asthma patients, but it can cause long- and short-term side effects, including hypertension. Our MR analysis showed that an increased level of beta-nerve growth factor (beta-NGF), which could be caused by Clenbuterol, could lead to a higher risk of hypertension (Fig. 5d).

The MR analysis reveals that the protein CTSS (cathepsin S) can increase platelets in the blood and reduce mean platelet volume. Fostamatinib, an approved medication for chronic immune thrombocytopenia (ITP) that acts by inhibiting the spleen tyrosine kinase (SYK), can also inhibit CTSS. This indicates that fostamatinib treats ITP via both SYK and CTSS (Fig. 5e).

Cilastatin is a dehydropeptidase 1 (DPEP1) inhibitor used to prevent degradation of imipenem; the two are used together to treat infections. We found that inhibiting DPEP1 can increase the risk of high blood pressure while decreasing the risk of schizophrenia (Fig. 5f). This indicates clinical re-purposing potential of Cilastatin, and other DPEP1 inhibitors, as treatments for schizophrenia, though further investigations are needed.

Overall, besides the validated targets, we also identified 273 suggestive drug re-purposing target-disease pairs for 18 proteins (Fig. 5a-  Regarding the MR methodology, we found that the MR analysis with a single genetic instrument at the cis-pQTL tended to generate a stronger estimated causal effect (Fig. 4). This is partly due to power: compared to multi-instrument MR, single-instrument MR tends to produce causal effect estimates with larger standard errors, so that only the results with large causal effect estimates could reach statistical significance. This indicates that 1) single genetic instrument analysis may be more prone to winner's curse, i.e., more likely to detect an overestimated effect on the outcome trait; and 2) using multiple independent instruments within a locus may not only improve power but also control false discoveries due to overestimated effects in the outcome GWAS.
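The difference in precision between single- and multi-instrument MR can be illustrated with the standard Wald ratio and fixed-effect inverse-variance weighted (IVW) estimators. The sketch below is a textbook formulation under simplifying assumptions (independent instruments, first-order standard errors that ignore uncertainty in the instrument-exposure effects). It is not the pipeline used in this study, and the function names and example numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(b_exposure: float, b_outcome: float, se_outcome: float) -> Tuple[float, float]:
    """Single-instrument causal estimate and its first-order standard error."""
    if b_exposure == 0.0:
        raise ValueError("instrument must have a non-zero effect on the exposure")
    return b_outcome / b_exposure, se_outcome / abs(b_exposure)


def ivw(
    b_exposure: Sequence[float],
    b_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW estimate across independent instruments.

    Equivalent to a weighted average of the per-instrument Wald ratios,
    with weights b_exposure^2 / se_outcome^2.
    """
    if not (len(b_exposure) == len(b_outcome) == len(se_outcome)) or not b_exposure:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [bx ** 2 / s ** 2 for bx, s in zip(b_exposure, se_outcome)]
    numerator = sum(bx * by / s ** 2 for bx, by, s in zip(b_exposure, b_outcome, se_outcome))
    total_weight = sum(weights)
    return numerator / total_weight, math.sqrt(1.0 / total_weight)
```

As an illustration, with instrument-exposure effects `[0.5, 0.3, 0.2]`, instrument-outcome effects `[0.10, 0.06, 0.04]`, and outcome standard errors of 0.02 each, the lead instrument alone gives a Wald ratio of 0.2 with a standard error of 0.04, while IVW over all three gives the same estimate with a smaller standard error of about 0.032. A single-instrument estimate therefore has to be larger in magnitude to reach the same significance threshold.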

As expected, the mapped trans-pQTL did not show good colocalization with nearby genes, and they were enriched in blood clotting and coagulation pathways. For instance, a blood clotting factor KLKB1 appeared to be a trans-regulatory hub for multiple proteins. We thus infer that some of the trans-pQTL discovered are not directly involved in the genetic mechanisms of the corresponding proteins, but rather they regulate blood characteristics that affect the performance of the antibody-based assays. This is an important discovery for biotechnological development in proteomics, sug-  Table 11.

Figure 2
Causality between the proteins and neuro-related phenotypes inferred by Mendelian randomization (MR) analyses. The forest plot shows the significant MR results (false discovery rate < 0.05) based on LD-pruned (r² < 0.001) instrumental variants within each cis-pQTL. Inverse-variance weighted (IVW) estimates are provided as the solid round dots, and the whiskers indicate 95% confidence intervals. The numbers of instrumental variants in the cis-pQTL are given to the right of the whiskers. As a colocalization measure, the HEIDI (heterogeneity in dependent instruments) test evidence (p > 0.05) is given as the diamonds, where the largest diamonds correspond to a p-value of 1. The upper part of the plot shows the results where the proteins are known druggable targets, while the lower part shows the results for new protein targets.

Figure 3
Regional association patterns of the pQTL and the colocalized neuro-related complex traits. The displayed protein-trait pairs correspond to the Mendelian randomization discoveries in Figure 2 with the HEIDI p-value > 0.05. Each subfigure shows the pQTL region of 1 Mb centered at the lead variant. The vertical dashed line in each subfigure marks the transcription start site of the corresponding protein's coding gene.

Figure 4
Causality between the proteins and UK Biobank disease phenotypes inferred by Mendelian randomization (MR) analyses. The forest plot shows the significant MR results (false discovery rate < 0.05) based on LD-pruned (r² < 0.001) instrumental variants within each cis-pQTL. Inverse-variance weighted (IVW) estimates are provided as the solid round dots, and the whiskers indicate 95% confidence intervals. The numbers of instrumental variants in the cis-pQTL are given to the right of the whiskers. As a colocalization measure, the HEIDI (heterogeneity in dependent instruments) test evidence (p > 0.05) is given as the diamonds, where the largest diamonds correspond to a p-value of 1. The upper part of the plot shows the results where the proteins are known druggable targets, while the lower part shows the results for new protein targets.

Figure 5
Drug targets revealed by Mendelian randomization (MR) analyses.

Supplementary Files
This is a list of supplementary files associated with this preprint. SCALLOPNEUXsuppinfocompressed.pdf