Cross-tissue transcriptome-wide association studies of 885,176 individuals and seven diseases of the gut-brain axis identify susceptibility genes shared between schizophrenia and in ammatory bowel disease

David Ellinghaus (  d.ellinghaus@ikmb.uni-kiel.de ) Institute of Clinical Molecular Biology https://orcid.org/0000-0002-4332-6110 Florian Uellendahl-Werth Christian-Albrechts-University of Kiel Carlo Maj Institute for Genomic Statistics and Bioinformatics Oleg Borisov Institute for Genomic Statistics and Bioinformatics Simonas Juzenas https://orcid.org/0000-0001-9293-0691 Eike Wacker Institute of Clinical Molecular Biology https://orcid.org/0000-0002-2068-2550 Isabella Jørgensen University of Copenhagen https://orcid.org/0000-0001-5694-4332 Tim Steiert Institute of Clinical Molecular Biology Saptarshi Bej University of Rostock The IBD Genetics Consortium International PSC Study Group Peter Krawitz https://orcid.org/0000-0002-3194-8625 Per Hoffmann Institute of Human Genetics Christoph Schramm First Department of Internal Medicine Olaf Wolkenhauer University of Rostock https://orcid.org/0000-0001-6105-2937 Karina Banasik University of Copenhagen https://orcid.org/0000-0003-2489-2499 Søren Brunak University of Copenhagen https://orcid.org/0000-0003-0316-5866 Stefan Schreiber Institute for Clinical Molecular Biology and Department of General Tom Karlsen Oslo University Hospital Franziska Degenhardt University of Duisburg-Essen Markus Nöthen School of Medicine & University Hospital Bonn https://orcid.org/0000-0002-8770-2464 Andre Franke Christian-Albrechts-University Kiel Trine Folseraas Oslo University Hospital Rikshospitalet


Introduction
The direct link (gut-brain-axis [GBA]) of intestinal dysfunction and in ammation with brain development and mental illness is one of the most current topics in biomedical research [1][2][3] . Gastrointestinal symptoms have been observed in patients with neurobehavioral, neurodevelopmental and mental diseases, including attention-de cit/hyperactivity disorder (ADHD) 4 , schizophrenia (SCZ) 5 and major depressive disorder (MDD) 6 . Vice versa, psychiatric comorbidity, including depression, anxiety, bipolar disorder (BD) and SCZ, is known to be more prevalent in in ammatory bowel disease (IBD) patients [7][8][9] . Given the polygenic nature of most psychiatric and chronic in ammatory diseases, some of the (potentially shared) phenotypic variation in disease risk is expected to be explained by expression quantitative trait loci (eQTL) active in disease-relevant tissues of the GBA 10 .
In this study, we performed cross-tissue transcriptome-wide association studies (TWAS) for 23 tissues of the GBA using GWAS summary statistics from ten case-control GWAS data sets (total 246,772 patients and 638,454 controls, Supplementary Table 1; Methods) of three clinically related in ammatory diseases of the gastrointestinal tract (CD, UC and primary sclerosing cholangitis (PSC) because PSC patients suffer from a highly increased frequency (62-83%) of IBD called PSC-IBD 11 ) and four diseases of the mind and brain (SCZ, MDD, BD and ADHD) for which moderate to strong genetic correlation (r g >0. 3) at the genome-wide level has been described between chronic in ammatory diseases 12 and psychiatric diseases 13 , respectively. The TWAS approach can be viewed as a transcriptome-wide screening method to test for gene-disease associations from GWAS data sets 14 . By using a new principled cross-tissue TWAS method (UTMOST) 15 -simultaneously training expression imputation models across multiple disease-relevant tissues and performing cross-tissue gene-level conditioned association tests -in combination with GWAS/TWAS conditional analysis for cross-phenotype studies we were able to address several types of confounding factors that have previously compromised the validity of TWAS studies for complex diseases 14,16 : (1) poor TWAS prediction accuracy due to imputation from reference expression panels containing tissues not relevant to disease, (2) a bias in the number of transcriptome-wide signi cant genedisease associations due to different sample sizes of reference datasets in expression imputation models 17 , (3) an excess of TWAS association signals at loci with multiple TWAS signals due to correlated expression among genes of the same locus and (4) an inaccurate determination of association boundaries for TWAS signals based on SNP information.
The objectives of this cross-tissue, cross-phenotype TWAS were to (i) evaluate the effectiveness of cross-tissue imputation analysis using ten GWAS consortium studies of psychiatric and in ammatory disease; (ii) identify transcriptome-wide signi cant gene-disease associations in GBA tissues for each disease and localize the most strongly associated genes through cross-tissue and multiple-gene-conditioned ne-mapping analyses; (iii) quantify the overall component of (shared) genetically regulated expressional changes (ρ E ) for immune-mediated and psychiatric diseases in relation to genome-wide genetic correlation (ρ G ); (iv) identify comorbidities and causal relationships using population-based administrative health data from the entire Danish population and by performing Mendelian Randomization (MR) analyses; (v) identify susceptibility genes and pathways shared between psychiatric disorders and in ammatory diseases of the gastrointestinal tract using bulk RNA-Seq data of different developmental stages and single-cell RNA-sequencing (scRNA-seq) expression data for tissues of the GBA.

Results
Cross-tissue transcriptome-wide association studies (TWAS) for seven diseases of the GBA The UTMOST framework was used to perform cross-tissue TWAS for 17,290 genes and 23 selected disease-relevant tissues, which had been identi ed as disease relevant for diseases of the GBA in a previous study of Finucane and colleagues 18 (Methods), using transcriptome-wide and genome-wide reference data of 4,043 samples provided by consortia projects GTEx 19 , STARNET 20 , and BLUEPRINT 21 (Fig. 1, Supplementary Table 2). A schematic overview of the study work ow is shown in Supplementary Fig. 1. Gene expression imputation was conducted using GWAS summary statistics data of ten consortium metaanalysis studies comprising 246,772 independent cases of European-ancestry (CD: 21 Table 1). We then tested for gene-disease associations for a maximum of 17,290 genes and 23 tissues for each of the ten GWAS meta-analysis studies. Single-tissue gene-wise (marginal; i.e. unconditioned) association P values (analysis Singletissue marginal ; Supplementary Table 3) were combined into (marginal) joint-tissue gene-wise summary P values (analysis GBJ marginal , resulting in one P GBJ_marginal for each gene and disease; Supplementary Table 3) via a generalized Berk-Jones (GBJ) test 22 . The cross-tissue GBJ gene test quanti es genedisease associations across all tissues of the GBA (separately for each disease), adjusts the correlation between TWAS association statistics for individual tissues, and also provides (compared with standard univariate meta-analysis approaches) a substantial increase in statistical power in situations where eQTL effects are not unique to a single tissue (Methods). The GBJ test identi ed a total of 586 (277), 345 (179), 214 (79), 361 (104), 38 (21), 77 (10) and 42 (8) gene-disease associations (number of loci with size of ±1Mb) with transcriptome-wide signi cance (P GBJ_marginal <0.05/15,587 = 3.20×10 − 6 ; Supplementary  Table 3). To evaluate a possible bias in the number of transcriptome-wide signi cant gene-disease associations due to different sample sizes of reference data sets, we further conducted TWAS analyses using pre-trained expression imputation models provided by S-PrediXcan 23 and FUSION 24 (Methods). We observed on average only a moderate positive correlation between reference sample size and the number of signi cant genes per tissue in our cross-tissue TWAS results (average Spearman's ρ UTMOST = 0.46; Supplementary Table 4, Supplementary Fig. 3). This suggests that cross-tissue TWAS imputation models can effectively select predictive cis eQTL variants (within ±1 Mb of the transcription start/end site of a gene) shared by multiple tissues, whereas tissue-speci c eQTLs with strong effects were retained.
Cross-tissue and multiple-gene-conditioned ne-mapping analyses at loci with multiple gene-disease association signals P GBJ_marginal values from cross-tissue TWAS analyses (Supplementary Table 3) showed weak to moderate in ation of GBJ association statistics (λ 1000 in the range [0.94;1.05]; Supplementary Table 5). One reason for this may be co-regulation of multiple genes by the same eQTL, as well as LD between eQTLs with non-zero prediction weights, which may lead to correlated predicted expression between nearby genes and thus multiple correlated TWAS hits per locus 14 , analogous to the situation in a GWAS study where an association signal typically spans a large number of highly signi cant SNPs in high LD with the causal SNP 25 . In order to distinguish between causal and marker gene-disease associations at signi cant TWAS loci, we performed (i) multiple-gene-conditioned single tissue analysis (Single-tissue conditional ) followed by (ii) multiple-gene-conditioned cross-tissue analysis (GBJ conditional ) (Methods) with nearby genes within range of ±1Mb around transcriptome-wide signi cant genes from marginal results (genes highlighted as red dots in the Manhattan plots in Supplementary Fig. 2). This accounted for potential cross-tissue co-regulation and LD between eQTL SNPs, and furthermore, the marginal association result of each gene was conditioned based on all genes within ±1 Mb region (avoiding speci cation of xed locus boundaries based on LD blocks from genotype data). This analysis yielded a total of 215 (80), 152 (50), 150 (22), 89 (53), 9 (5), 5 (5), 10 (6) GBJ gene-disease associations (separate loci of ±1Mb) with transcriptome-wide signi cance (P GBJ conditional <3.20×10 − 6 ) for CD, UC, PSC, SCZ, MDD, BD, ADHD, respectively (Supplementary Table 3 and Supplementary  Table 7). Table 1 Transcriptome-wide signi cant (P conditional <3.20×10 − 6 ) genes from multiple-gene-conditioned ne-mapping analysis at loci with multiple gene-disease association signals, for psychiatric and immune phenotypes in 23 gut-brain-axis (GBA) tissues. For selection of 23 tissues, see Fig. 1 Fig. 2a) from cross-tissue and multiple-gene-conditioned ne-mapping analysis at loci with multiple gene-disease association signals (analysis GBJ conditional ; see Methods). The number of associated loci of size ± 1 Mb in square brackets corresponds to the number of red dots in Fig. 2a. GBJ conditional corrected for nearby genes in the range of ±1Mb around for leading transcriptomewide signi cant genes (red dots in Fig. 2a) across all 23 tissues. n unique genes in total from single tissues, number of transcriptome-wide signi cant genes (P conditional <3.20×10 − 6 ; Supplementary Table 3) from multiple-gene-conditioned single tissue analysis (i.e. Single-tissue conditional , before applying the generalized Berk-Jones (GBJ) test to quantify total gene-disease associations across all 23 tissues). CD, Crohn's disease; UC, ulcerative colitis; PSC, primary sclerosing cholangitis; SCZ, schizophrenia; MDD, major depressive disorder; BIP, bipolar disease (BIP); ADHD, attention-de cit/hyperactivity disorder (ADHD). * signi cant gene-disease associations in at least two of 23 tissues; ** signi cant gene-disease associations in at least two tissues including brain and non-brain tissues (excluding cross-tissue associations for immune and gastro/liver tissues only, respectively, see Fig. 1); in curved brackets: number of gene-disease associations outside the extended major histocompatibility complex (MHC; chr6:25-34Mb).
Comorbidity analysis and trait correlation at the level of predicted gene expressions Using population-based administrative health records from the Danish National Patient Registry (DNPR), we conducted a retrospective cohort study for the period 1994 to 2018 by examining directed diagnosis pairs of 7,191,385 people from the entire Danish population (Methods). We con rmed statistically signi cant comorbidity (false discovery P FDR <0.05) between our four psychiatric and three immunological phenotypes for 9 psychiatric-immunological pairs out of 42 disease pairs (Supplementary Table 8) with an increased incidence of SCZ (relative risk RR = 1.15), BP (RR = 1.28) and depression (RR = 1.52) in CD, depression in UC (RR = 1.34) and PSC (RR = 1.42), CD in SCZ (RR = 1.12), BD (RR = 1.29) and depression (RR = 1.57), UC (RR = 1.37) and PSC (RR = 1.57) in depression.
Next, to examine pairwise trait correlations at the level of predicted (genetically regulated) gene expression (ρ E ) and to compare ρ E with genome-wide genetic correlations (ρ G ), we calculated Spearman's correlations between TWAS gene effect estimates for all pairs of diseases by using the gene-wise Z scores from the analyses Single-tissue marginal and GBJ marginal and a correlation-pruned list of 1,825 co-regulation-and eQTL-independent genes previously used in Mendelian randomization studies 26 (to avoid measuring correlated predicted expression on the transcriptome-wide scale) (Methods). On the single tissue level (Single-tissue marginal ; Supplementary Table 9), the strongest positive correlation across psychiatric and immune phenotypes was observed for CD4 + T-cells (̂E = 0.11 for CD and BP and ̂E = 0.14 for CD and SCZ) ( Supplementary Fig. 4). On the cross-tissue level across all 23 tissues (GBJ marginal ; Supplementary    Table 10). HEIDI-ltering identi ed CSNK1A1 and SGSM3 as potential horizontally pleiotropic genes, suggesting a pleiotropic function in both diseases.
Identi cation of gene-disease associations shared between psychiatric and immune phenotypes After identifying pairs of immune-mediated and psychiatric diseases as putative correlated traits on the transcriptome-wide level, we sought to identify susceptibility genes shared between psychiatric and immune phenotypes from the list of all transcriptome-wide signi cant genes from both analyses Singletissue conditional and GBJ conditional ( Table 1). Out of 288 possible pair-wise tissue-speci c combinations of psychiatric and immune phenotypes (Supplementary   Table 11), gene overlap analysis revealed 31 (of those 4 GBJ conditional ) pairs of psychiatric and immune phenotypes that share at least one susceptibility gene.
After excluding genes within the major histocompatibility complex (MHC; chr6:25-34Mb), which was particularly relevant to pairs PSC-MDD and PSC-SCZ in various tissues, we identi ed three non-MHC TWAS susceptibility genes (NR5A2, SATB2, PPP3CA) from analyses Single-tissue conditional to be shared between brain (SCZ,CD,UC: Hypothalamus; SCZ,UC: Frontal cortex BA9; SCZ,CD: Putamen basal ganglia) and intestinal tissues (CD,UC: Colon transversum; UC: Colon sigmoideum; CD: Colon transversum) ( Table 2; more detailed results in Supplementary Table 3). In summary, these three genes met the transcriptome-wide signi cance threshold of 3.20×10 − 6 for SCZ and CD/UC in the same individual tissues and also showed associations across both brain and intestinal tissues, with substantial gene expression heritability estimates h med 2 (i.e. heritability mediated by the cis genetic component of gene expression levels) across the 23 tissues (h med−min 2 , h med−max 2 ) of (13.4%, 18.2%) for NR5A2, (6.9%, 24.8%) for SATB2, and (5.0%, 10.9%) for PPP3CA (Table 2; Methods). Thus, these genes met the most stringent criteria for shared susceptibility genes, as they were shared between psychiatric and immune phenotypes in the same tissues and across brain and non-brain tissues and were therefore prioritized below. We observed that an increase (decrease) in predicted expression for NR5A2 and SATB2 (PPP3CA) was associated with an increased risk of SCZ, CD and/or UC (Table 2). In addition, four other gene candidates (INO80E, SF3B1, SGSM3, ZC3H7B) from the secondary analysis (GBJ conditional ) met the transcriptome-wide signi cance threshold and showed associations across both brain and intestinal tissues, but each in different tissues for psychiatric and immune phenotypes, implying that for these four genes the most signi cant tissues differ between psychiatric and immune phenotypes (Supplementary Tables 11-13). This suggests that those four genes may be involved in SCZ and CD/UC, but not necessarily in the same tissue. We will show below by gene set enrichment analysis that INO80E, SGSM3, ZC3H7B are likely to be proxies for other genedisease associations at the respective loci. Table 2 Transcriptome-wide signi cant susceptibility genes (NR5A2, SATB2, PPP3CA) shared between schizophrenia and in ammatory bowel diseases expressed in t tissues. Gene overlap analysis of TWAS results from multiple-gene-conditioned ne-mapping analysis (Single-tissue conditional ; Supplementary Table 11, wit   Supplementary Table 3) identi ed three common TWAS susceptibility genes (NR5A2, PPP3CA, SATB2) that meet the transcriptome-wide signi cance threshold for each disease separately; for diseases listed in the "Diseases primary tissue(s)" column). These three genes are shared between psychiatric and immune p tissues and across brain and non-brain tissues (Fig. 3). Increased predicted expression (indicated by a positive Z score) of NR5A2 and SATB2 is associated wit CD as well as SCZ and UC. Decreased predicted expression (indicated by a negative Z score) of PPP3CA is associated with increased risk for SCZ and CD. Ge major histocompatibility complex (MHC; chr6:25-34Mb) were excluded from gene overlap analysis.  (P conditional <1×10 − 4 ) for gene-disease association; Z statistic, Z score from analysis Single-tissue conditional and/or GBJ conditional for primary tissue(s), the plus/ increased/decreased predicted expression of these genes to be associated with increased disease risk, Z scores in bold type meet the transcriptome-wide sign 3.20×10 − 6 , TWAS P conditional , P-value from multiple-gene-conditioned ne-mapping analyses Single-tissue conditional and/or GBJ conditional for primary tissue(s), meet the transcriptome-wide signi cance threshold of 3.20×10 − 6 ; Overlap with GWAS locus, testing an overlap with locus boundaries of established GWAS lo mapping GWAS studies for SCZ 30

GWAS/TWAS conditional analysis to describe the GWAS contribution to the TWAS result
To estimate the contribution of the three shared "same-tissue" susceptibility genes NR5A2, SATB2 and PPP3CA relative to the established GWAS signals as de ned by the recent consortia ne-mapping studies for SCZ 30 and CD/UC 31,32 , we performed a joint GWAS/TWAS ne-mapping analysis across the 23 tissues of the GBA using the GWAS consortium summary statistics listed in Supplementary Table 1 Increased genetically regulated expression of NR5A2 in the hypothalamus and colon transversum was associated with increased risk for SCZ, CD and UC. We found that the TWAS association of NR5A2 at 1q32.1 in the hypothalamus and colon transversum (Fig. 3a) was driven by the nearby (but non-overlapping) established SCZ and CD/UC GWAS signals located 104 and 654 kilobases (kb) upstream of NR5A2, respectively. This GWAS locus was not assigned a gene in the CD/UC consortium ne-mapping study 32 but was assigned an overlap with an epigenetic mark for immune cells (Immune_H3K4me1). Given the functional evidence for NR5A2 in attenuating in ammatory damage in murine and human intestinal organoids 34 (see also Supplementary Box 1) we propose NR5A2 as the best candidate gene for the GWAS locus 1q32.1 with a possible effect on the hypothalamic-pituitary-adrenal axis (HPA axis) 10 .
Increased genetically regulated expression of SATB2 in frontal cortex BA9 and colon sigmoideum was associated with increased risk of SCZ and UC. The TWAS association of SATB2 at 2q33.1 (Fig. 3b) is driven by the nearby (non-overlapping) established SCZ and UC GWAS signals located 67 and 425 kilobases (kb) downstream and upstream of SATB2, respectively. Because multiple TWAS eQTLs were identi ed for SATB2 in the 95% credible set of the UC GWAS peak (consisting of intergenic variants between LOC101927619 and SATB2 32 ), it is convincing that the GWAS/TWAS ne-mapping analysis revealed a possible explanation of the GWAS signal by eQTLs at this locus. SATB2 as a potential susceptibility gene for UC and SCZ has already been implicated by nongenetic studies (Supplementary Box 1).
Decreased genetically regulated expression of PPP3CA in the putamen basal ganglia and colon transversum was associated with increased risk of SCZ and CD. We found that the TWAS association of PPP3CA at 4q24 (Fig. 3c) is driven by the nearby (non-overlapping) established SCZ and CD GWAS signals located 280 and 384 kilobases (kb) upstream of PPP3CA, respectively. GWAS ne-mapping analysis 32 assigned the 95% credible set of ne-mapped SNP variants (n = 170 SNPs) to the gene BANK1, which is closest to the lead GWAS SNP variant at 4q24. Because PPP3CA encodes a subunit of calcineurin (Supplementary Box 1) that has been associated with SCZ and is also a known drug target (due to a direct physical interaction with FKBP Prolyl Isomerase 1A (FKBP1A) 35 , also known as FKBP12) for the treatment of UC via calcineurin inhibitors, we propose PPP3CA as the best candidate gene for the GWAS association signal at 4q24.
Analysis of bulk RNA-Seq data of different developmental stages and scRNA-seq expression data for tissues of the GBA Having tested the three gene-disease associations for plausibility at the GWAS level in the previous section, we hypothesized that high, tissue-or cell-typespeci c expression would be indicative of a particular function in the tissue of interest, and that expression of a gene could also occur only in a small subset of cells of only a particular tissue or at a particular human developmental stage, which is hypothesized for SCZ 36 . We used publicly available non-diseased bulk tissue 37,38 and scRNA-Seq data sets 39,40 (Methods) to look up in which tissues and at which developmental stage of human tissues and cells the shared susceptibility genes are generally expressed, and found that the observed patterns matched the existing knowledge on the genes (Supplementary Box 1). Using GTEx and BLUEPRINT data sets representing organ-and blood cell type-speci c bulk gene expression data (Fig. 4a), we found that NR5A2 was expressed in gastrointestinal tissues but not in the brain and barely in blood cells, whereas SATB2 was expressed mainly in the colon transversum and prefrontal cortex, the same tissues where we detected the shared TWAS associations, and PPP3CA in various brain tissues, but less so in intestinal tissues. It is also possible that an adverse event in utero can interfere with normal brain development and create a brain vulnerability that, in an individual already at risk (genetic predisposition), can lead to SCZ later in life 36 . We therefore examined the expression of our three candidate genes in developmental data from brain 37 and intestine 38 , since it is conceivable that a susceptibility gene is expressed during fetal development but is no longer expressed in adulthood. Again, NR5A2 was almost absent in bulk brain samples from different developmental stages, whereas SATB2 was strongly upregulated speci cally in the fetal forebrain during mid-fetal development, and PPP3CA was weakly expressed in all developmental stages of the brain 37 (Fig. 4b).
In the intestine, it was the opposite: NR5A2 was strongly expressed in the ileum and colon during the fetal and juvenile stages, whereas SATB2 was weakly expressed in the colon sigmoideum and not ileum, and PPP3CA almost not at all 38 (Fig. 4b).
To achieve resolution at the single cell level, we analyzed human single-nucleus and scRNA-seq data from different cell types of the adult brain 39 and gastrointestinal tract 40 , respectively (Fig. 4c). In these data sets, NR5A2 is expressed in the ileal progenitor, stem, and transient amplifying (TA) cells, less strongly in the colon and rectum, and not in the brain, consistent with the bulk expression data. Consistent with the bulk expression from developmental stages of the intestine (Fig. 4b), SATB2 was strongly expressed in cell types of the colon and rectum and barely expressed in brain cells. PPP3CA expression was highly enriched in enteroendocrine and Paneth-like cells of the ileum, colon and rectum despite weak expression in the bulk RNA-Seq data of ileum or colon, which may imply a speci c role of PPP3CA in immune modulation in the gut for CD and UC; in the brain, PPP3CA is also expressed in neurons (Fig. 4c). Bulk tissue and scRNA-seq results for genes INO80E SF3B1, SGSM3, ZC3H7B are shown in Supplementary Fig. 11.
Calcineurin-dependent NFAT and Wnt signaling as shared signaling pathways for IBD and SCZ Based on the statistically unambiguous gene prioritization from multiple-gene-conditioned ne-mapping analyses, it is likely that NR5A2, SATB2 and PPP3CA are shared susceptibility genes for both IBD and SCZ.  Box 1), has been shown to bind to β-catenin 44 , which is also part of the Wnt pathway. Recent ndings also suggest that SCZ and BD are characterized by abnormal Wnt gene expression and plasma protein levels, suggesting that suggest that drugs targeting the Wnt signaling pathway may play a role in the treatment of severe mental disorders 45 . Thus, the Wnt signaling pathway and NFAT activation are the most likely candidate pathways for IBD and SCZ, where risk is mediated by gene expression.

Discussion
By performing cross-tissue and multiple-gene conditioned transcriptome-wide association studies (TWAS) for 23 tissues of the GBA for three clinically related in ammatory diseases of the gastrointestinal tract (CD, UC and PSC) and four diseases of the mind and brain (SCZ, MDD, BD and ADHD) we identi ed susceptibility genes NR5A2, SATB2, and PPP3CA shared between IBD (CD/UC) and SCZ that are (genetically) expressed both in intestinal and brain tissues.
We estimated the proportion of the genetically regulated component of expression (h med−min 2 , h med−max 2 ) with respect to total gene expression, measured across a maximum of 23 different tissues, to be (13.4%, 18.2%) for NR5A2, (6.9%, 24.8%) for SATB2, and (5.0%, 10.9%) for PPP3CA. These three genes could represent a molecular link between psychiatric and in ammatory traits and may contribute to the observed genetic correlations 27,46 and increased prevalence of SCZ in CD/UC patients at the population level 7-9 .
Our bioinformatic cross-tissue TWAS pipeline for selected tissues of GBA addressed typical technical drawbacks of previous TWAS studies and enabled conditional analysis of multiple disease-associations at TWAS loci. In our opinion, TWAS analyses for CD, UC and SCZ had the highest statistical power to detect shared TWAS susceptibility genes, as speci cally for CD, UC and SCZ a much higher expression-mediated heritability (h med 2 ) has recently been estimated than for other psychiatric disorders such as BD 47 . Another key outcome of our GWAS/TWAS conditional analysis for these three genes is that the expression-disease associations largely explain the previously ne-mapped GWAS association signals at nearby GWAS susceptibility loci for SCZ 30 , CD and UC 31,32 , suggesting that risk is mediated by genetically regulated expression at those loci. Together with the results of our gene set enrichment pathway analysis, we were able to show that genetic variation mediated by gene expression in the Wnt signaling pathway and NFAT activation by calcineurin is associated with both SCZ and IBD.
A very interesting observation is the gene-disease association with PPP3CA. PPP3CA encodes a subunit of calcineurin (the catalytic subunit Calcineurin A).
Calcineurin is a Ca 2+/ calmodulin-regulated serine/threonine protein phosphatase 48 , which activates the transcription factor NFAT (nuclear factor of activated T cells) and thus plays a key role in the regulation of the immune response (Fig. 5). After multiple-gene-conditioned ne-mapping analyses at loci with multiple gene-disease association signals, in total, 18 out of 55 genes of the NFAT signaling pathway showed a gene-disease association with either IBD or SCZ, or both ( Fig. 5; Methods). Calcineurin inhibitors (e.g. cyclosporine A and tacrolimus) have long been used as immunosuppressants after solid organ transplantation and are used as rst-line treatment in PSC patients who have undergone liver transplantation 49 . Calcineurin inhibitors are also being studied for their e cacy in a number of autoimmune diseases 50 . Recent clinical trials with tacrolimus have demonstrated the e cacy and safety of tacrolimus in treatment-resistent UC 51 and ulcerative proctitis (UC con ned to the rectum) 52,53 . However, knockout of calcineurin in the forebrain of mice was found to be associated with symptoms of schizophrenia-like psychosis 54 , and recently experiments in rats have shown that calcineurin inhibitor therapy may be a risk factor for the development of neurobehavioral changes 55 . In humans, treatment with calcineurin inhibitors sometimes showed an increase of neuropsychiatric side effects 56, 57 and tracrolimus has been reported to cause relapse of schizophrenia 58 . Calcineurin-inhibiting immunosuppression used after solid organ transplantation have also been associated with multiple neuropsychiatric side effects 59 . Our results therefore suggest that calcineurin inhibitors should be used with caution in psychiatrically predisposed IBD patients and that transplanted PSC-IBD patients using calcineurin inhibitors as rst-line immunosuppressive treatment may be at increased risk of psychiatric side effects. Consistent with these results from clinical and functional studies, we found that decreased predicted expression of PPP3CA is associated with increased risk for SCZ (Table 2).
Our analysis of bulk RNA-Seq and scRNA-Seq data showed that PPP3CA expression is mainly restricted to enteroendocrine and Paneth-like cells of the ileum, colon, and rectum. The gut microbiota produces various metabolites (including short chain fatty acids, secondary bile acids and lipopolysaccharides) that modulate enteroendocrine cells and produce hormonal signals re ecting food intake, microbial composition and epithelial integrity 60 . Paneth cells are secretory epithelial cells of the intestine that contain antibacterial proteins and alpha-defensins that are released into the intestinal lumen in response to a series of stimuli 61 , indicating a possible link to the GBA. PPP3CA is also expressed in the brain, there is evidence for a general implication in SCZ, and studies show more calcineurin immunoreactive neurons in the caudate nucleus of SCZ patients (Supplementary Box 1), which, together with the putamen where we found the TWAS gene-disease association, forms the striatum. Taken together with the literature (Supplementary Box 1), PPP3CA has probably at least two distinct functions relevant to SCZ or IBD, one in neuronal signal transduction in the striatum and the other in signal transduction for immune modulation in Paneth and enteroendocrine cells. NR5A2 (encoding human LRH-1) regulates intestinal glucocorticoid synthesis and is known to protect epithelial integrity and attenuate in ammatory damage in murine and human intestinal organoids, including those derived from IBD patients (Supplementary Box 1). Although NR5A2 appears to be nearly absent from the brain and blood based on the present data, there is evidence of its function in brain tissues (Supplementary Box 1), so we suspect that NR5A2 expression is restricted to less-studied regions of the brain and that it has separate functions in the gut and brain. SATB2 is a promising candidate for a GBA susceptibility gene because it is predominantly expressed in the colon sigmoideum and prefrontal cortex, consistent with our TWAS results in which SATB2 expression is associated with SCZ and UC but not with CD. SATB2 is also a prognostic marker in UC-associated colorectal cancer (Supplementary Box 1). In summary, gene expression analyses for NR5A2, SATB2, and PPP3CA using intestinal and brain tissue data from different developmental stages and from scRNA-seq studies support the ndings from previous human, mouse, and organoid studies and suggest a pleiotropic role for these genes as part of the GBA.

Selection of tissues for TWAS of the GBA
Recently, a heritability enrichment analysis of speci cally expressed genes across 205 tissues and cell types in combined with GWAS summary statistics of CD, UC, SCZ, MDD, BD and ADHD and subsequent validation of enrichments results with chromatin data yielded statistically signi cant enrichment results for tissues of the central nervous system (CNS) for psychiatric diseases and for tissues from blood/immune cell types and the gastrointestinal tract for in ammatory bowel diseases 18 . Based on these results, we restricted our TWAS analyses using genome-wide and transcriptome-wide reference data of 4,043 samples from brain, immune cell and the gastrointestinal tissues (Supplementary Table 2) available from consortia projects GTEx 19 (version V7, dbGaP accession code phs000424.v7.p2), STARNET 20 (dbGaP accession code phs001203.v1.p1), and BLUEPRINT 21 ftp://ftp.ebi.ac.uk/pub/databases/blueprint/).
As described by Hu et al. 15 , biallelic SNPs with minor allele frequency (MAF) > = 0.01 were retained and normalized gene expression information was adjusted to correct for potentially confounding effects of sex, sequencing platform and the three main principal components from principal component analysis (PCA) of the genome-wide genotype data, and with estimation of expression residuals (PEER) factors 67 .

UTMOST expression imputation models
To estimate the genetically regulated expression of each gene in the genome across 23 tissues of the GBA, we applied multivariate-response penalized regression models as implemented in UTMOST (https://github.com/Joker-Jerome/UTMOST) to predict cross-tissue gene expression from reference data sets. Brie y, a standard least-squares loss function was minimized, with a L1 Lasso penalty adaptively set based on sample sizes to select predictive SNP variables in a range of ±1Mb around each gene and force shrinkage in effect size estimation (within-tissue effects), and further with a group-Lasso penalty set across multiple tissues on the effect sizes of each SNP to select eQTLs shared between tissues (cross-tissue effects) 15 . Such cross-tissue models for expression imputation favor eQTLs that are shared across tissues while preserving tissue-speci c effects, and UTMOST's approximation procedure proposed for coordinate descent iteration handles incomplete data in the reference expression matrix.

Expression imputation, association testing and GBJ test
UTMOST was further used to test for gene-based association at the level of gene expression regulation, taking into account cross-tissue regulatory effects. A univariate regression model was applied to test the association between predicted gene expression (i.e., the genetically regulated component of gene expression) and disease status for each tissue separately (although previously trained over several tissues at the same time). To summarize the tissuespeci c results and test whether a gene is signi cant for at least one tissue, a generalized Berk-Jones (GBJ) test 22 was used to combine associations across individual tissues into a uni ed test of association. The GBJ is based on the absolute values of gene trait Z score, allowing for directions of eQTL effects in the different tissues, and uses tissue covariance to correctly weight results across tissues and correct for multiple tissue testing. We used the Bonferroni correction on the number of genes tested to determine transcriptome-wide signi cance (P < 0.05/15,587 = 3.20×10 − 6 ). For the CD, UC and PSC phenotypes, it had to hold additionally that transcriptome-wide signi cant genes replicate with P < 0.05 in the smaller GWAS datasets #2, #4, #6 in Supplementary Table 1.
Comparison with other pre-trained single-tissue expression imputation models The elastic net approach from S-PrediXcan 23 was used for expression imputation of 19 GTEx tissues (Supplementary Table 2) followed by gene-disease association testing using logistic regression for our 10 study data sets (Supplementary Table 1 Whitney U-test was applied to test for differences in slopes (β) between the three tools.

Cross-tissue multiple-locus-genes conditional analysis and GBJ test
We used the UTMOST multiple regression approach to perform multiple-locus-genes-conditional cross-tissue analysis for each transcriptome-wide signi cant gene (from unconditioned analysis) in each of the 10 studies separately to prioritize gene-disease associations at the same locus within the TWAS locus boundaries of the ±1Mb range. Conditional analysis accounts for one of the main problems of the TWAS gene-based association test, namely the potential coregulation of multiple genes by the same eQTLs or the presence of independent eQTLs across genes that are in linkage disequilibrium (LD) 14 . Again, the association statistics of the conditional analysis were combined across tissues using the generalized Berk-Jones score (GBJ) test described above. Again, we used the Bonferroni correction on the number of genes tested to determine transcriptome-wide signi cance (P < 0.05/15,587 = 3.20×10 − 6 ).
Overlap of gene-disease associations with the boundaries of known susceptibility loci from genome-wide association studies (GWAS) for CD, UC, PSC, SCZ, MDD, BD, ADHD For genomic position comparison, locus boundaries of established GWAS loci were adopted from the last ne-mapping GWAS studies for SCZ 30 and CD/UC 31,32 . A gene was labeled "Overlap with GWAS locus" in Table 2 if the gene had an overlap with an established GWAS locus. Gene coordinates (GRCh37, Jan 2020) were obtained using the Ensembl database and BioMart 68 .

Comorbidity analysis in the Danish National Patient Registry (DNPR)
To determine signi cant co-occurrences (disease-pairs) for the seven diseases under study, we screened an independent data set covering ICD10 diagnosis codes from 7,191,385 people of the entire Danish population in the period from 1994 to 2018 69 . The DNPR includes primary and secondary diagnoses coded according to the International Statistical Classi cation of Diseases 10th Revision (ICD-10). We identi ed patients diagnosed with one of the seven disease and then a control group of 20 controls for each case, matched by sex and year of birth. The strength of the association between the diseases was assessed by relative risk (RR). P-values were calculated using a two-sided Fisher exact test and corrected for multiple testing using the false discovery rate (FDR).
Trait correlation at the level of predicted gene expressions Genes are often co-regulated, i.e. they either share the same eQTLs or eQTLs exist in strong LD. For this reason, we used a subset of 1,825 co-regulated and eQTL-independent genes recently provided for Mendelian randomization studies 26 , and further excluded genes from the extended MHC region (chr6:  to avoid in ated correlation measures. Spearman's correlation of all traits with each other was calculated separately for each of the 23 tissues based on the single-tissue marginal Z scores (Supplementary Table 9) and additionally based on the GBJ-test marginal test-statistics (Supplementary Table 9; Fig. 2).

Mendelian randomization (MR) analysis at the level of predicted gene expressions
We performed multi-gene MR analysis at the level of predicted gene expressions using GSMR 29 (Generalised Summary-data-based Mendelian Randomisation) to test for a causal association between of one trait (exposure) on another (outcome) using summary-level TWAS gene association statistics. We used only standardized transcriptome-wide signi cant marginal UTMOST gene-disease associations as instruments (p marginal&conditional <3.2e-6) that were likely independent of neighboring genes by enforcing a minimum distance of 1 Mb between the instrumental genes (see section Cross-tissue multiple-locus-genes conditional analysis). (Results of the conditional GBJ test could not be used because this test does not output a directional effect size (Z score); thus, MR analysis was performed on our tissue-speci c TWAS results.) At least 10 signi cant, independent gene-disease associations were necessary as instruments to obtain reliable regression results 29 . Because genetically regulated expression is (by de nition) heritable and inferred from SNP-disease association effects, the MR approach from GSMR was performed with (statistically independent) TWAS genes. This satis es the requirements of MR analysis according to Bowden et al. 70 . We further used the HEIDI ((heterogeneity in dependent instrument)-outlier test) 29 to detect and eliminate pleiotropic gene-disease associations (genetic instruments) that have apparent pleiotropic effects (horizontal pleiotropy, P HEIDI <0.01) on both traits (exposure and outcome) under investigation.

Gene overlap analysis
To nd potential genes mediating the gut brain axis in the same or different tissues, we considered as candidates all genes that were transcriptome-wide signi cant for both one in ammatory and one psychiatric trait (i) in the same tissue or (ii) in the GBJ test, both marginally (P marginal <3.20×10 − 6 ) and conditionally (P conditional <3.20×10 − 6 ) signi cant. Option (ii) is more liberal and thus includes genes, which are signi cant in one tissue in a psychiatric trait but signi cant in an in ammatory trait in another tissue.

Gene expression heritability estimation
Because the heritability of gene expression of each gene was not given by UTMOST expression reference and training les, we extracted heritability estimates of all available tissues from our FUSION results (see section Comparison with other pre-trained single-tissue expression imputation models) by determining the lowest and the highest heritability value per gene, respectively, to give a kind of con dence range of the expected heritability. Brie y, for heritability estimation, the cis locus was de ned as +/-500kb of gene boundaries. Samples were restricted to Europeans by principal components analysis (PCA). FUSION's heritability models were adjusted for 20 gene expression principal components from PCA, two genetic ancestry PCs, and local structural variation estimated from SNP array data of reference data sets 24 . Heritability estimation was performed using GCTA/GREML 71 .

GWAS/TWAS conditional analysis
If a signi cant TWAS disease gene is present for a given locus, we expect that the eQTLs of this gene explains a portion of the original GWAS signal seen in the input GWAS summary statistics, for example, in the presence of statistically causal protein-coding variants. To estimate the proportion of the GWAS signal explained by the eQTLs of a signi cant TWAS candidate gene, GWAS summary statistics were conditioned with GCTA COJO 72 , version 1.92.2) using a maximal set of noncolinear eQTL variants of a TWAS gene (of ±1Mb anking region around the gene) for each candidate gene of interest. The aim of this analysis was to qualitatively test whether the eQTLs of a transcriptome-wide signi cant gene were a possible statistical explanation for the GWAS association signal. GCTA COJO uses the allele frequency, beta, standard error of the beta, the association p-value and the population size of the GWAS summary statistics as input. All eQTLs with a non-zero effect of a signi cant TWAS candidate gene were given to COJO using parameter --select, whose algorithm is designed to nd non-colinear associated SNPs (MAF > 0.01) to subsequently condition on. The P value threshold for including disease-associated non-colinear eQTL sets was chosen 0.05; this relatively weak association signi cance threshold was chosen to additionally identify eQTLs from non-genome-wide signi cant regions.
The resulting set of non-colinear eQTL SNPs was used to condition the GWAS summary data using the GCTA COJO parameter --cond via conditional linear regression analysis 33 . The resulting TWAS-conditioned GWAS summary statistic represents the minimal portion of the GWAS signal that is not covered by the eQTLs of the selected gene and thus could not have been included in the TWAS association. This may suggest that the TWAS gene, although signi cantly associated, may not be su cient to fully explain the GWAS association at this locus.
In summary, our complete UTMOST-and GCTA COJO-based analysis can answer three questions on the association of genetically regulated expression of a gene in a given tissue; (i) whether the eQTLs weighted from the UTMOST-trained models provide an explanation for a GWAS signal with ±1Mb around a TWAS

Competing interests
The authors have no competing interests or other interests that might be perceived to in uence the results and/or discussion reported Selection of 23 tissue and cell types to perform transcriptome-wide association analyses (TWAS) for gut-brain axis (GBA) diseases. We selected 23 tissues and cell types previously found to contribute to the heritability of psychiatric or immune-mediated disease (Methods). Cross-tissue TWAS training and imputation were performed using transcriptome-wide and genome-wide reference data from 4,043 reference samples from 23 tissues and cell types (Supplementary Table 2) provided by consortium projects GTEx, STARNET, and BLUEPRINT (Methods): gastrointestinal and liver tissues (n=9), brain tissues (n=10), whole blood (n=1) and immune cell types (CD14+ monocytes, CD16+ neutrophils, CD4+ T-cells; n=3). Different colors represent biological samples from different tissue classes. The size of the circles corresponds to the number of reference samples of the respective tissue.  Analysis of bulk and scRNA-seq data suggest a cell-speci c pleiotropic role of NR5A2, SATB2, and PPP3CA in SCZ, CD and UC and showed that PPP3CA expression was strongest in neurons and enteroendocrine and Paneth-like cells of the ileum, colon, and rectum, indicating a possible link to the GBA. (A) Screening of several organ-and blood cell type-speci c bulk expression data from GTeX and BLUEPRINT shows that NR5A2 is expressed in gastrointestinal tissues but not in the brain, SATB2 is expressed mainly in the colon transversum and prefrontal cortex (the same tissues where we detected the gene-disease associations) and PPP3CA is expressed in several brain tissues but less in intestinal tissues. (B, top) Brain expression data from humans of different ages37 con rm GTeX and BLUEPRINT tissue data and show that NR5A2 is almost absent in brain samples. SATB2 is strongly upregulated in the fetal forebrain in midfetal development. This suggests a developmental function that may predispose to disease when dysregulated. PPP3CA expression is strongest in the forebrain and increases with age. Expression in liver is given for reference. PC, post conception. (B, bottom) Data on expression in the developmental gut38 show that NR5A2 is strongly expressed in the ileum and less so in the colon, whereas SATB2 is more abundant in the colon than in the ileum. This is consistent with the observation that SATB2 is signi cant for UC but not for CD (Table 2). PPP3CA expression is not evident from bulk RNA-Seq data of intestinal tissue. (C) NR5A2 is expressed in ileal progenitor, stem cells and transient amplifying (TA) cells, less strongly in colon and rectum, and not in brain, consistent with bulk RNA-seq results. SATB2 is strongly expressed in all cell types of the colon and rectum. PPP3CA, despite weak expression in ileum or colon, is enriched in enteroendocrine and Paneth-like cells of the colon and rectum, supporting a role in immunomodulation. In the brain, PPP3CA is enriched in neurons. Overall, these expression data indicate that separate functions are likely in the gut and brain based on speci c expression patterns. NR5A2 expression was not found in the brain, but literature results make brain-speci c function highly likely