The neuronal chromatin landscape in adult schizophrenia brains is linked to early fetal development

Non-coding variants increase risk of neuropsychiatric disease. However, our understanding of the cell-type specific role of the non-coding genome in disease is incomplete. We performed population scale (N=1,393) chromatin accessibility profiling of neurons and non-neurons from two neocortical brain regions: the anterior cingulate cortex and dorsolateral prefrontal cortex. Across both regions, we observed notable differences in neuronal chromatin accessibility between schizophrenia cases and controls. A per-sample disease pseudotime was positively associated with genetic liability for schizophrenia. Organizing chromatin into cis- and trans-regulatory domains, identified a prominent neuronal trans-regulatory domain (TRD1) active in immature glutamatergic neurons during fetal development. Polygenic risk score analysis using genetic variants within chromatin accessibility of TRD1 successfully predicted susceptibility to schizophrenia in the Million Veteran Program cohort. Overall, we present the most extensive resource to date of chromatin accessibility in the human cortex, yielding insights into the cell-type specific etiology of schizophrenia.


Description of the post-mortem brain samples
Frozen brain tissue derived from ACC (anterior cingulate cortex/Broadmann area 10) and PFC (prefrontal cortex/Broadmann area 9 and 46) were obtained from three separate brain banks, described below.This cohort of schizophrenia (SCZ), bipolar or other affective/mood disorder (AFF) cases and control subjects was assembled after applying stringent inclusion/exclusion criteria.All involved subjects had to meet the appropriate diagnostic DSM-IV criteria, as determined in consensus conferences after review of medical records, direct clinical assessments, and interviews of family members or care providers.All tissue donors were from the Icahn School of Medicine at Mount Sinai (MSSM), University of Pittsburgh (PITT) brain banks and NIMH Human Brain Collection Core (HBCC).Table S1A-D tabulates the demographic information at sample level of the present study, including sex, age of death, and ethnicity, stratified by cell type, brain region, institution, and diagnosis.The link to the complete demographic and clinical information of the present study population is provided on synapse platform (syn52264219).

Processing of data
We have implemented our in-house pipeline for processing of ATAC-seq data previously used for other studies (2,3).The detailed steps are explained below.

Alignment of raw sequencing files
The raw reads were trimmed with Trimmomatic (4) and then mapped to human reference genome GRCh38 analysis set reference genome with the pseudoautosomal region masked on chromosome Y with the STAR aligner (v.2.7.0e) (5).To correct for allelic bias resulting from person-specific genome variation, we ran STAR with enabled WASP module (6) as we provided both ATAC-seq FASTQ file and WGS file or SNP array genotype of corresponding individual.This yielded for each sample a BAM file of mapped paired-end reads sorted by genomic coordinates.From these files, reads that mapped to multiple loci or to the mitochondrial genome were removed using samtools (7) and duplicated reads were removed with PICARD (v2.2.4; http://broadinstitute.github.io/picard).Quality control metrics were reported with phantompeakqualtools (8), ataqv (9), and Picard.All ATAC-seq QC metrics are summarized in Table S1F and details at sample level are provided on synapse platform (syn52264219).

Genotype calling
For ATAC-seq, genotypes were called using GATK (v3.5.0) (10).In brief, these steps were performed: (1) indel-realignment; (2) base score recalibration; and (3) joint genotype calling across all samples for variants with a phred-scaled confidence threshold >= 10.All clustered variants, variants in ENCODE blacklisted regions of the genome (11), and variants not in dbSNP v151 (12) were not considered.Read depth was not used for filtering.Finally, only variants with minor allele frequencies (MAF) ≥ 25% were considered.Pairwise genotype concordance was carried out amongst all ATAC-seq samples and whole genome sequencing (WGS) (or SNP-array samples for donors without WGS) using the kinship coefficient from KING v1.9 (13) and considering only the variants called in the ATAC-seq data.Cutoffs for the kinship values were selected based on the samples that were expected to be from the same brain versus those that were not.Using these cut-offs, we identified and resolved all sample swaps.We also reported potentially contaminated libraries that were matching multiple genotypes (or no genotype), mostly due to mistakenly used identical barcodes within the same pools for more than one sample.Those libraries were removed from the dataset.

Peak/OCR Calling
Peaks or OCRs calling was done as previously described (2, 3) with a modification of FDR threshold to 0.01.

Metrics used for quality control
For each sample, the following quality control metrics were used: the total number of initial reads; the number of uniquely mapped reads; the fraction of reads that were uniquely mapped; further metrics from the STAR aligner; the GC content, duplication and insert metrics from Picard; the rate of reads mapping to the mitochondrial genome; the PCR bottleneck coefficient (PBC), which is an approximate measure of library complexity estimated as uniquely mapped non-redundant reads divided by the number uniquely mapped reads; TSS enrichment in housekeeping genes (calculated per ENCODE ATAC-seq data standards); the normalized strand cross-correlation coefficient (NSC) and the relative strand cross-correlation coefficient (RSC), which are metrics that use cross-correlation of stranded read density profiles to assess sample quality independently of peak calling; and finally the fraction of reads in peaks (FRiP), which is the fraction of reads that fall in detected peaks, the fraction of reads in only blacklisted peaks (11), and the ratio between these two metrics (to calculate these metrics, the consensus set of peaks was used).The main quality metrics are shown in Table S1F and details at sample level are provided on the synapse platform (syn52264219).

Annotating OCRs to genes using ABC model
We used Activity-by-contact model (ABC, v.0.2) (14) to construct a comprehensive regulatory map of enhancer-promoter (E-P) interactions in neuronal and non-neuronal cell types of the two investigated brain regions (DLPFC and ACC).This model requires: (1) contact frequency between putative enhancers and promoters of regulated genes; and (2) enhancer activity data.Contact frequency matrices were generated from neuronal and non-neuronal Hi-C datasets composed of eight post-mortem human brains (2).Here different neocortical regions (dorsolateral prefrontal cortex, orbital frontal cortex, and anterior prefrontal cortex) were profiled across multiple donors aged 34-103 years.Enhancer activity data was represented by the cell type and brain region specific ATAC-seq signal (current study) and the H3K27ac ChIP-seq signal.ChIP-seq data was generated in a subset of ten controls (STG and EC; age of donors ranged between 61-103 years) (15).In accordance with the authors' directions, we filtered out predictions for genes on chromosome Y and lowly expressed genes (genes that did not meet inclusion criteria in our RNA-seq dataset).We used the default threshold of ABC score (a minimum score of 0.02), the default screening window (5MB around the TSS of each gene), and we skip ABC calculation for ubiquitously expressed genes (using default ABC list of 806 genes).

Analysis of differentially accessible OCRs (schizophrenia OCRs)
To assess which OCRs showed differential accessibility in SCZ-and BD-related phenotypes, we analyzed the accessibility statistically.For this, chromatin accessibility was estimated by the number of ATAC-seq reads overlapping a given OCR.The more overlaps seen with an OCR the more accessible the more accessible the OCR was considered.The statistical analysis encompassed the following steps.

Read count and OCR filtering
The starting point here were the sample by OCR matrices (separately for neuronal and non-neuronal samples) of read counts generated as described in the preceding section.From these matrices, we excluded OCRs that were lowly accessible by only keeping OCRs that had at least 1 count per million reads in at least 10% of the samples.This removed 742 and 4 neuronal and non-neuronal OCRs, respectively, and resulted in a final read count matrices of 690 neuronal samples by 391,420 OCRs and 703 non-neuronal samples by 260,431 OCRs.Next, the read counts were normalized using the trimmed mean of M-values (TMM) method (16).

Exploration of covariates and model selection
To explore the effect of technical and biological covariates, we first did a principal component analysis (PCA) on the normalized read counts to identify high-variance components explaining more than 1% of the variance.We then accessed the correlation of covariates with the PCs and selected those that showed a significant correlation with one or more PCs at a lenient FDR cut-off of 0.2 as candidate covariates for the analysis of differential chromatin accessibility.This encompassed 64 covariates including FRiP, GC content metrics, mapping metrics, insert metrics, the fraction of reads mapping to the mitochondrial genome, PBC, RSC, and barcode.These covariates were subsequently assessed as detailed in the following.Next, the starting point for modeling the chromatin accessibility was chosen as the variables "brain region by diagnosis status" (2x2=4 levels) and "sex" (2 levels) for a base model.The variable "sex" was included as it is known to have a strong effect on a few OCRs primarily located on the gonosomes.To assess which covariates should be included in order to have a good average model of OCR accessibility we employed the Bayesian information criterion (BIC).In particular, it was for each additional covariate tested how many OCRs showed an improved BIC score minus how many showed a worse BIC score when the covariate was included in the linear regression model compared to when it wasn't.Here, a covariate was required to improve mean BIC per OCR by at least 10 in at least 2% of OCRs in order for it to be included in the final model.If there are some covariates fulfilling this criteria, the best performing of them is added into the base model and the remaining covariates are tested again.For neuronal dataset, the following numerical covariates were selected: "GC 80-100%" (i.e., normalized coverage over each quintile of GC content ranging from 80 -100%), "Fraction of reads in peaks (FRiP)", "Fraction of unmapped reads", "GC 20-39%", "AT dropout", "GC 40-59%", and "Age at the time of death".For non-neuronal dataset, the following numerical covariates were selected: "GC 20-39%", "GC 80-100%", "GC 0-19%", "Age at the time of death", "Fraction of reads in blacklisted peaks (FRiBP)".Subsequently 14 categorical covariates were considered for inclusion due to the higher number of degrees of freedom of each covariate.None of these fulfilled the BIC criteria for inclusion.Finally it was considered if the selected numeric covariates affected chromatin accessibility as a quadric term by testing the squared FRiP for inclusion.It did not meet the BIC inclusion criteria and was therefore not added to the model.Resultantly the final neuronal and non-neuronal models jointly encompass 14 and 12 degrees of freedom, respectively (Fig. S7).

Analysis of differentially expressed genes
The analysis of differential gene expression follows the same protocol as previously described analysis of chromatin accessibility.Therefore, a description will focus on explanation of changes of parameters in the pipeline.

Read count and OCR filtering
The initial read count matrix of the CMC RNA-seq dataset (19) consisted of 58,930 genes quantified for 1,818 samples (only samples from DLPFC and ACC brain regions with SCZ, BD, and Control diagnosis status were selected).Then, we calculated the correlation of each sample to all other samples and removed the samples with markedly different correlation, i.e., the difference of mean correlation of the given sample with the rest of the dataset and mean correlation of all pairs of samples within the dataset was more than four times higher than the standard deviation calculated upon all pair's correlation.We only kept protein-coding genes with gene expression of at least 1 count per million reads in at least 30% of the samples.These filters led to the exclusion of 66 samples and 44,024 genes so the resultant read count matrix consisted of 1,818 samples by 14,906 genes.We performed quantile normalization of the data followed by normalization using the trimmed mean of M-values (TMM) method (16).
We required that at least 5% of the peaks showed a change of 2 in the BIC score, corresponding to "positive" evidence against the null hypothesis (20).The relatively high number of covariates can be explained by the study design which employed homogenate tissues, not FANS-sorted nuclei.Therefore, the model needed to cope with naturally different neuronal / non-neuronal composition in different brain regions.The final model jointly encompassed 24 degrees of freedom.

Identification of SCZ stages using manifold learning
To identify SCZ stages, neuronal and non-neuronal schizophrenia OCRs were first filtered based on FDR corrected p-values with thresholds of FDR < .0001 in PFC (n=2,411 neurons upregulated schizophrenia OCRs), FDR < .001 in ACC (n=1,495 neurons upregulated schizophrenia OCRs), FDR < .0001 in PFC (n= non-neurons upregulated schizophrenia OCRs), and FDR < .001 in ACC (n= nonneurons upregulated schizophrenia OCRs).Filtered expression matrices of cell-region specific OCRs were then projected onto the first 50 principle components and a nearest neighbour graph was constructed using the UMAP neighbour kernel (21).Diffusion maps (22) were calculated with the resulting neighbour graph using the kernel normalisation introduced in ( 23) (see Fig. S11A-B).We then constructed another UMAP neighbour graph using the first 15 diffusion coordinates and performed Leiden clustering (24) (resolution parameter = 1) to identify SCZ stages in diffusion map space.A total of 6(5) leiden clusters were obtained from PFC(ACC) regions in neuronal samples (see Fig. 2C and  S11C).A diffusion pseudotime calculation was then performed according to (23) with a root node in the cluster with the highest density of control samples.All analysis was performed in scanpy using default parameters (see Fig. S11D-E).We limited the pseudotime estimation to neuron PFC and ACC samples only because of noisy diffusion maps for non-neurons PFC and ACC OCRs expression matrices.
The diffusion maps of the non-neuronal OCRs were notably noisier than those of the neuronal OCRs, leading to worse clustering and SCZ staging.One possible reason for this is the difference in spectral gaps, i.e. the difference between the eigenvalues associated with the first and second principal components of the data.When the data is Gaussian distributed, large spectral gaps lead to clean diffusion maps which follow the shapes shown in (see Fig. S11A-B), while smaller gaps lead to noisier ones.One biological reason for this is that the non-neuronal samples contain a mixture of cell types (astrocytes, oligodendrocytes, and microglia) as opposed to the neurons which only contain different subtypes.The first and second principal components of the mixture distribution can, in general, be orthogonal to the first and second principal components of each individual cell distribution and in such a case the spectral gap of the mixture distribution can be decreased and, in some cases, disappear entirely.If one cell type dominates the mixture distribution with a large number of samples and large variance, its principal components will dominate the mixture distribution while the other cell types will shrink the overall spectral gap of the mixture relative to the dominant cell type.This leads to decent diffusion maps for the first couple of diffusion components, followed by noisy diffusion components that deviate from the expected structure.

Million Veterans Program cohort
The Million Veteran Program (MVP) is a mega-biobank that has been previously described (25)(26)(27).
On top of quality control performed by the MVP Core Team, we perform additional quality control (sex check, relatedness, minor allele frequency, hardy weinberg equilibrium, missingness, etc.. ) recommended for similar studies (28) using electronic medical data v20.1.For this study, we only utilise the European (EUR) population as defined by HARE(29) using the 1000 Genomes Project reference (30).

PRS estimation
Genotyping specifications for the CMC SNP array data used in the PRS analyses have been described previously (19,31).For the dataset used in these analyses specifically, marker positions were lifted-over to GRCh38.Pre-imputation processing then included running the quality control script HRC-1000Gcheck-bim.plfrom the McCarthy Lab Group (https://www.well.ox.ac.uk/~wrayner/tools/), using the Trans-Omics for Precision Medicine (TOPMed) reference (32).Genotypes were then phased and imputed on the TOPMed Imputation Server (https://imputation.biodatacatalyst.nhlbi.nih.gov).Variants with an imputation R 2 > 0.8 were retained.Samples with a mismatch between one's self-reported and genetically inferred sex, suspected sex chromosome aneuploidies, high relatedness as defined by the KING kinship coefficient (13)(KING > 0.177), and outlier heterozygosity (+/-3SD from mean) were removed.Additionally, samples with a sample-level missingness > 0.05 were removed, calculated within a subset of high-quality variants (variant-level missingness ≤ 0.02).
SCZ PRS were calculated on the MVP and CMC cohort samples using summary statistics from the Psychiatric Genomics Consortium (PGC3) SCZ GWAS as training (36).PRS constructed with SNPs underneath schizophrenia OCRs within specific TRDs used as training data PGC3 SCZ GWAS summary statistics that were subsetted to include only SNPs falling within TRD1, TRD6, or TRD7 (for PFC-specific scores) and TRD1 or TRD2 (for ACC-specific scores), respectively.The PRS-CS-auto method (37) was used to apply continuous shrinkage priors to the effect sizes from these summary statistics.
A EUR LD reference panel provided by the developers of PRS-CS was utilized (https://github.com/getian107/PRScs),which draws from UK Biobank data (38).The following PRS-CS default settings were used: parameter a in the γ-γ prior = 1, parameter b in the γ-γ prior = 0.5, MCMC iterations = 1000, number of burn-in iterations = 500, and thinning of the Markov chain factor = 5.The global shrinkage parameter phi was set using a fully Bayesian determination method.Individual-level SCZ PRS and TRD-specific PRS were calculated using PLINK 2.0 (35).SCZ PRS and TRD-specific PRS were calculated in the MVP cohort similarly to how it was calculated above, except utilising a EUR LD reference panel drawing from 1000 Genomes Project data (33).

CRD calling
Cell-region specific covariates corrected expression matrices from four datasets were utilized for CRD analysis.The datasets included: 1) neuron PFC (391,420 OCRs X 291 samples), 2) neuron ACC (391,420 OCRs X 272 samples), 3) non-neuron PFC (260,431 X 299 samples), and 4) non-neuron ACC (260,431 X 278 samples).Since disease-specific changes were not observed in BD vs. control samples at the OCR level, the CRD analysis focused only on SCZ and control samples.The CRD calling pipeline followed the methodology described in Girdhar et.al (39).Briefly, the first step involved estimating hidden confounds using probabilistic estimation of expression residuals (PEER) (40) residualization on each of the four above-mentioned matrices separately.This was done to eliminate global noise and retain the local correlation structure of OCRs in physical proximity to each other.The retained OCR correlation structure facilitated their isolation using decorate software (41) in the next step.
2) The distribution of the mean absolute correlation (MAC) metric was created for both permuted matrices and original matrices separately, for the five cluster sizes (10, 25, 50, 80, 100).MAC is defined as the mean absolute correlation value of a CRD.We estimated the MAC cutoff by taking the MAC value at 95 th percentile from the distribution separately for each cluster size.Figure S13 shows the plot of MAC vs PEERs for each cluster size.We choose PEER25 as optimal PEER for final CRDs all across the four datasets.All CRDs at the five mean cluster sizes were retained with a MAC value greater than MAC cutoff and any overlapping CRDs were merged into one after this step.
Alternatively, we measured the odds of enhancer-promoter OCRs from the ABC model within a CRD vs all OCRs in a CRD.To do this, we ran Fisher test (42) by using 2 x 2 contingency table of cell-region specific OCRs from ABC model ( 14) that are inside/outside cell-region specific CRDs in comparison to all cell specific OCRs that are inside/outside CRDs as shown in Fig S15I.

Identification of schizophrenia CRDs
In our analysis, we employed a two-stage testing approach using the stageR package (44) to identify significant CRDs.The process involved two distinct stages: the screening stage (stage 1) and the confirmation stage (stage 2).In the screening stage, we summarized each CRD by computing its pvalue using the Sidak method (45), which was applied to all p-values of OCRs within that particular CRD.In confirmation stage (stage 2), we focused only on CRDs with Sidak p-values less than 0.05 from stage 1.These selected CRDs underwent individual hypothesis testing to assess the dysregulation of OCRs within them.A CRD was considered a "schizophrenia CRD" if it obtained a p-value < 0.05 in stage 2, indicating significant differences in OCR expression between the SCZ and control samples.In order to summarize the directionality of CRDs, we calculated the mean of log2FC (SCZ vs Controls) of OCRs within each CRD, comparing SCZ samples to control samples.An upregulated CRD had a mean of log2FC (SCZ vs Controls) > 0, while a downregulated CRD had a mean of log2FC (SCZ vs Controls) <0 as depicted in Fig S14C .Differential CRDs summary table of p-values obtained from both stage 1 and stage 2 for all the identified CRDs across the four datasets is available at syn52264219.

Assessing the link of CRDs with schizophrenia OCRs and diseased genes
In order to assess whether the diseased OCRs are more likely to be within the CRD compared to outside the CRD, we employed a generalized linear model using logistic regression.The model used the tstatistics of OCRs from differential OCRs table at syn52264219 to predict the status of OCRs inside or outside the CRD as shown in eq (1). Figure S16B shows the odds of OCRs inside CRD from eq (1).
~   − (  ) (1) ℎ     (1)   (0)  Subsequently, to examine whether the schizophrenia CRDs are indeed associated with schizophrenia OCRs, we utilized eq (2).This model aimed to predict whether the CRDs is schizophrenia CRD or not using the proportion of schizophrenia OCRs inside CRDs.We have added an offset term to control for the number of OCRs within the CRD.The prediction uses Poisson model distribution in glm R function.

Figure S16C
shows the odds of observing the clustered schizophrenia OCRs in SCZ CRD.

𝐶𝑅𝐷 𝑂𝐶𝑅𝑠 𝑠𝑡𝑎𝑡𝑢𝑠 ~ 𝑜𝑓𝑓𝑠𝑒𝑡(𝑙𝑜𝑔(𝑛𝑜 𝑜𝑓 𝑂𝐶𝑅𝑠 𝑤𝑖𝑡ℎ𝑖𝑛 𝐶𝑅𝐷))
+   ℎℎ     (2) ℎ           ℎℎ     ℎℎ  Lastly, to establish a connection between the disease-associated epigenome and the disease-associated transcriptome, we tested whether schizophrenia OCRs inside the schizophrenia CRDs could be predicted using the t-statistics of SCZ associated genes using eq (3).We use differential genes analysis from the CMC consortium (19).The OCRs were mapped to genes using ABC mapped genes table at syn52264219.Figure S16E shows the odds of observing the schizophrenia OCRs inside CRDs using the t-stats of differential genes.

Clustering of schizophrenia CRDs into TRDs
In this section, we evaluate whether the interaction between schizophrenia CRDs across samples can stratify the CRDs into domains called trans-regulatory domains (TRDs) that can inform us about the specific molecular mechanism.We have limited the identification of TRDs to only schizophrenia CRDs in neuron PFC and neuron ACC because no significant association of CRDs with SCZ risk loci was observed in non-neurons ACC and PFC CRDs.The workflow from schizophrenia CRDs to TRDs is explained in schematic in Fig. S17A.
First, to obtain the interaction between schizophrenia CRDs, we took covariate corrected matrices of OCRs within neuron PFC and neuron ACC with dimensions of 1,953 x 360 and 1,691 x 330 respectively.Every CRD is summarized by calculating the average expression of OCRs using the eq(4) below: We took the Pearson correlation of CRD expression matrices that gives 1,953 x 1,953 and 1,691 x 1,691 dimensions of neuron PFC and ACC respectively.We used the first three principle components of correlation matrices for running clustering algorithm.We then fitted these matrices to hierarchical clustering with k=2:20 using diceR package in R using the default method of distance i.e. average linkage (46).
For every value of k, a random subsampling of 80% of the CRD correlation matrix is carried out 20 times.Therefore, not every sample is included in each clustering.The clustering for each iteration of the hierarchical clustering is completed using k-nearest neighbor (47) and majority voting.The optimal number of clusters was found by evaluating Baker-Hubert GAMMA index (48) while varying the cluster size (from =2:20).GAMMA index is a measure of compactness (how similar are the objects within the same cluster), separation (how distinct are objects from different clusters), and robustness (how reproducible are the clusters in other datasets).Using GAMMA index, we found k=10 optimal clusters across both cortical regions for neuron CRDs as shown in Fig. S17B.The list of annotated TRDs to CRDs from the hierarchical clustering at k=10 clusters is available at syn52264219.For every CRD, we estimated its directionality using eq (5).We added these annotations of TRDs and their direction from eq (5) to the heatmap of correlation matrix of schizophrenia CRDs in Fig 3D and Fig. S18A.Also, we curated two lists, one with all OCRs and second one with only schizophrenia OCRs within each TRDs for all downstream analysis.Fig. S17C shows the number of all OCRs and schizophrenia OCRs within each TRDs whereas Fig. S17D is stratified by directionally of SCZ OCRs within TRDs.For Fig. 3H and Fig. S18E, we took all OCRs within TRD1 to measure the spearman correlation with fetal associated changes detected at FDR < .05 in Rahman et.al.(49).For the full differential OCRs table of Fetal cortical ATAC OCRs vs. Adult controls OCR analysis, see syn52264219.

Sample collection
Fetal brain samples were obtained from cortical plates from four distinct donors aged 18-24 pcw.These samples were collected from de-identified prenatal autopsy specimens with no neuropathological abnormalities at the Icahn School of Medicine at Mount Sinai.Library prep and post processing of atac libraries was done similar to described above.

Association test of fetal cell types to TRDs
We obtained scATAC-seq expression data from human fetal cortical samples, as outlined in the study by Trevino et.al (50).The dataset comprises four samples collected over an 8-week period during midgestation, specifically at 16pcw, 20pcw, 21pcw, and 24pcw, containing 6423, 4486, 12675 and 7720 cells.This dataset contains 31,304 nuclei annotated to a total of 22 cell-type clusters, including five types of interneurons (IN), nine different clusters of cortical excitatory neurons (GluN) and precursor cells, such as early radial glia, late radial glia and other non-neuronal cell types.In total, there were 657,930 fetal OCRs genomic regions identified in this dataset.Only the cell types with at least 100 cells in the cell type cluster were considered for this analysis.
To assess the expression score of schizophrenia OCRs within TRDs in fetal cortical cell types, we compiled a list of schizophrenia OCRs from both TRDs and outside TRDs that overlapped at least 99% with the fetal OCRs in both cortical regions.As a result, we obtained the majority of regions for TRD1 and outside TRDs, with neuron TRD1 comprising 3,189(2,490) regions from the PFC(ACC), while outside TRDs encompassed 2,055(1,383) regions from PFC(ACC).In contrast, the number of regions that overlapped with schizophrenia OCRs in TRDs was limited, with only 263(88) regions identified in neuron PFC(ACC) TRDs(2:10).To analyze the data further, we utilized the clustering annotations provided by Trevino et.al and computed the expression scores per cluster/cell types using Seurat's AddModuleScore function.For better visualization, we did Z-score normalization on these module scores and generated plots, as shown in To investigate significant cell types associated with expression scores from our input list, our approach was two fold.First we identified TRDs that exhibited a positive and significant association with a particular cell type, relative to other cell types.To achieve this, we applied a linear mixed model, considering the enrichment scores for each TRD at different developmental stages (pcw16, pcw20, pcw21, pcw24).The model was formulated as follows: (enrichment scores ~ cell type + (1|sample ID)), where enrichment scores were obtained from the AddModuleScore R function (51).The cell type was coded as "cell type " for cells in a given cluster and "non_celltype" for cells in other clusters, and sample ID represented the individual ID of cortical samples.By incorporating the sample ID as a random effect in the model, we accounted for the presence of multiple cells from each individual.We ran this model on all cell type scores to examine their associations.Finally, we created the list of cell types and associated TRDs that had significant and positive association after FDR correction for multiple testing burden from the number of tests.
In the second step, we ran another model to compare the enrichment scores of a given cell type from a significantly associated TRD with the enrichment scores of the same cell type from schizophrenia OCRs outside the TRD.The model was defined as lmer(enrichment scores ~ TRD_OCRs_status + (1|Sample.ID) + (1|Cell.ID)), where enrichment scores were obtained as previously explained, and TRD_OCRs_status was coded as "TRD" for enrichment scores of cells from a significant TRD in a given cell type and "outside_TRD" for cells from outside TRD in the same cell type.The sample ID and cell ID represented the individual ID and cell IDs of cortical samples, respectively.We incorporated the cell ID and sample ID as a random effect to account for the repeated measurements from individuals and cells from the outside TRD enrichment scores.Table S7A-B and Table S7C-D show the output from step1 and step 2 on neuron PFC(ACC) TRDs.Fig. 4B-C and Fig S19B-C show the distribution of enrichment scores coloured by the magnitude of log2FC which is the enrichment score of schizophrenia OCRs in TRD1 in a given cell type vs enrichment score schizophrenia OCRs in TRD1 in other cell types from step 1.The star (*) annotation next to distribution shows whether or not a given cell type is significant after step 2. Our analysis showed significant association of schizophrenia OCRs in TRD1 with early glutamatergic cell types: Glu (1,3,5,8) and late glutamatergic cell types Glu(6,9) across both cortical regions.In inhibitory neuron class, IN (3,4,5) and IN (2,3,4,5) significant association with schizophrenia OCRs in TRD1 from PFC and ACC respectively.Next we gathered the cell specific markers from (50) for all significant cell types and intersect with ABC schizophrenia OCR mapped genes in TRD1 to obtain the cell-region and disease specific genes (see Table S7G-H).We used this list for running magma and pathway analysis as explained below to generate Fig 4D -E and Fig. S19D-E.

LD score enrichment analysis of CRDs
We ran LD score enrichment analysis to estimate the enrichment of brain-related and non-brain-related GWAS in all input OCR regions.For Fig 2 , we did this analysis on all neuronal and non-neuronal OCRs using differential OCRs table (syn52264219), schizophrenia OCRs from neuron PFC, neuron ACC, non-neuron PFC and non-neuron ACC given (syn52264219).For Fig 3 , we estimated SCZ heritability using schizophrenia OCRs within neuron PFC and neuron ACC CRDs, and outside neuron PFC and neuron ACC CRDs given in Table S3, S6, S8.In Fig 4 , we did this analysis on OCRs within TRD1:10 from neuron PFC and ACC using Table S7.For all analyses, the broad MHC-region (chr6:25-35MB) was excluded due to its extensive and complex LD structure, but otherwise default parameters were used for the algorithm.

MAGMA association trait analysis
We used MAGMA version 1.08b ( 52) for all the analysis described below.For Fig. 4E-F and Fig. S19E-F we took the gene markers list obtained from the above section named as "Association test of scATAC from fetal cell types to TRDs''.Table S7G-H tabulates cell-region and disease specific genes markers that were utilized to identify enrichment of GWAS brain and non-brain related traits in cellregion and disease specific genes.For each gene and trait, MAGMA calculates gene-level P-value based on the joint association of all SNPs from summary statistics that fall to the gene region while it accounts for linkage disequilibrium (LD) between SNPs.The gene regions were defined with the window size of 35kb upstream and 10kb downstream and LD was estimated from the European panel of 1000 Genome Project phase 3 (33).Then, MAGMA uses a linear regression framework to test whether the differentially expressed genes are more significantly associated with GWAS traits compared to the rest of the genome.In concordance with ATAC-seq GWAS analysis, we excluded genes that overlap the MHC-region(chr6:25-35MB).

Pathway analysis of OCRs
To functionally interpret the OCRs within TRD1 that overlapped with fetal specific OCRs (49) we used the Genomic Regions Enrichment of Annotations Tool (GREAT) approach to identify the biological function of nearby genes for regions using GREAT as shown in Fig. 4F and Fig S7G and Table S7H respectively.2) "Up+Down": upregulated and downregulated schizophrenia OCRs (41,387(27,771) in PFC(ACC) neurons (red), 8,671 (9,405) in PFC(ACC) non-neurons (blue)) (3) "Up": upregulated schizophrenia OCRs (25,146(15,791) in PFC(ACC) in neurons (red), 3,726 (3,075) in PFC(ACC) non-neurons (blue)); and (3) "Down": dysregulated schizophrenia OCRs (16,241(11,980) in PFC(ACC) neurons (red), 4,945 (6,330) in PFC(ACC) non-neurons (blue)).All upregulated and downregulated OCRs are with log2FC (schizophrenia versus controls) >0 and <0, respectively.Error bars represent standard error in schizophrenia heritability from LD score regression.schizophrenia OCRs located inside or outside neuronal (non-neuronal) CRDs in both cortical regions colored in red(blue).SCZ-associated t-statistics obtained from the differential genes mapped to OCRs within the CRD is an independent variable in the model.The differential genes analysis is of CMC cohort (19).for TRD2; PGC3 SCZ: 350,882).B) Plot of logOR from logistic regression of PRS of 632 individuals from CMC cohort estimated using the SNPs in OCRs from these TRDs to predict their case and control status, C) Average variance explained per variant used in estimating PRS of CMC cohort (PFC TRDs: 3,869 for TRD1, 449 for TRD6 and 357 for TRD7; ACC TRDs: 2,447 for TRD1 and 172 for TRD2; PGC3 SCZ: 350,882).D-E) Turkey bar plot of PRS of samples calculated using the SNPs underneath schizophrenia OCRs within TRD1 (subsetted from the full PGC3 SCZ GWAS summary statistics), stratified by inferred disease severity stages in PFC and ACC respectively.Table S1: Metadata and summary of QC metrics for neuronal PFC, ACC, nonneuronal PFC and non-neuronal ACC samples.

Supplementary Tables
Table S2: GWAS enrichment of brain traits and non-brain related traits in neuronal and nonneuronal OCRs.
Table S3: Inferred stages of SCZ/controls of neuronal and non-neuronal samples in PFC and ACC regions using manifold learning method and their respective polygenic risk scores estimated using PGC3 SCZ GWAS summary statistics.
Table S4: List of genomic coordinates of identified neuronal PFC, neuronal ACC, nonneuronal PFC and non-neuronal ACC CRDs.
Table S5: GWAS enrichment of brain traits and non-brain related traits in OCRs within neuron PFC, neuron ACC, non_neuron PFC and non_neuron ACC CRDs stratified by OCR location (inside or outside CRDs).
Table S6: GWAS enrichment of brain traits and non-brain related traits of schizophrenia OCRs that reside inside and outside of neuronal PFC, neuronal ACC, non-neuronal PFC and nonneuronal ACC CRDs.
Table S7: Association test of scFetal cell types with neuronal TRD expression scores in PFC and ACC regions.Magma enrichment of brain and non-brain related traits in significantly associated scfetal cell types from association test with TRD1 marker genes that are present within the neuron TRD1 in PFC and ACC regions.List of TRD1 associated scfetal cell types marker genes and their functional pathways analysis using GREAT tool.
. S19F.To interpret the biological function of cell-region and disease specific genes from section "Association test of scATAC from fetal cell types to TRDs'', we applied GREAT pipeline to obtain Fig 4D, Fig S19D using the Table

Figure S1 |
Figure S1 | Summary metadata of samples in the study.Figure S2 | Quality control metrics in cell-and region-specific samples stratified by diagnosis.Figure S3 | Quality control metrics of neuronal and non-neuronal consensus OCRs. Figure S4 | Variance explained by identified covariates in normalized counts of samples for all OCRs.Figure S5 | Annotation of OCRs to regulatory elements and similarity of OCRs with other datasets.Figure S6 | Annotation of OCRs to genes using the Activity-By-Contact (ABC) method.Figure S7 | Covariate correction of the expression of cell specific OCRs.Figure S8 | Mean variance distribution of normalized counts of neuronal and non-neuronal OCRs. Figure S9 | Correlation of SCZ and BD specific OCRs.Figure S10 | LDsc enrichment of cell and region specific schizophrenia OCRs. Figure S11 | Inferred SCZ stages from chromatin accessibility generated from ACC in neurons.Figure S12 | Distribution of technical and clinical variables of samples stratified by inferred disease stages.Figure S13 | Optimal number of PEER to call final CRDs.Figure S14 | Cis-regulatory landscape defined by CRDs. Figure S15 | Higher order chromatin structure related characteristics of CRDs. Figure S16 | Disease relevance of CRDs. Figure S17 | Characteristics of trans-regulatory domains of CRDs.Trans regulatory domains from hierarchical clustering of neuronal SCZ CRDs from the ACC Figure S18 | Trans regulatory domains from hierarchical clustering of neuronal SCZ CRDs from the ACC. Figure S19 | Analysis of cell-type specificity of neuronal ACC schizophrenia OCRs in fetal cortical scATAC-seq data.Figure S20 | Enrichment of expression of schizophrenia OCRs in SCZ TRDs in fetal cortical OCRs in scATAC-seq data.Figure S21 | Trans Regulatory Domain one (TRD1) in PFC and ACC can stratify SCZ cases and controls in MVP and CMC cohort.

Figure S5 |
Figure S1 | Summary metadata of samples in the study.Figure S2 | Quality control metrics in cell-and region-specific samples stratified by diagnosis.Figure S3 | Quality control metrics of neuronal and non-neuronal consensus OCRs. Figure S4 | Variance explained by identified covariates in normalized counts of samples for all OCRs.Figure S5 | Annotation of OCRs to regulatory elements and similarity of OCRs with other datasets.Figure S6 | Annotation of OCRs to genes using the Activity-By-Contact (ABC) method.Figure S7 | Covariate correction of the expression of cell specific OCRs.Figure S8 | Mean variance distribution of normalized counts of neuronal and non-neuronal OCRs. Figure S9 | Correlation of SCZ and BD specific OCRs.Figure S10 | LDsc enrichment of cell and region specific schizophrenia OCRs. Figure S11 | Inferred SCZ stages from chromatin accessibility generated from ACC in neurons.Figure S12 | Distribution of technical and clinical variables of samples stratified by inferred disease stages.Figure S13 | Optimal number of PEER to call final CRDs.Figure S14 | Cis-regulatory landscape defined by CRDs. Figure S15 | Higher order chromatin structure related characteristics of CRDs. Figure S16 | Disease relevance of CRDs. Figure S17 | Characteristics of trans-regulatory domains of CRDs.Trans regulatory domains from hierarchical clustering of neuronal SCZ CRDs from the ACC Figure S18 | Trans regulatory domains from hierarchical clustering of neuronal SCZ CRDs from the ACC. Figure S19 | Analysis of cell-type specificity of neuronal ACC schizophrenia OCRs in fetal cortical scATAC-seq data.Figure S20 | Enrichment of expression of schizophrenia OCRs in SCZ TRDs in fetal cortical OCRs in scATAC-seq data.Figure S21 | Trans Regulatory Domain one (TRD1) in PFC and ACC can stratify SCZ cases and controls in MVP and CMC cohort.

Figure S1 |
Figure S1 | Summary metadata of samples in the study.

Figure S2 |
Figure S2 | Quality control metrics in cell-and region-specific samples stratified by diagnosis.A) Distribution of uniquely mapped reads, fraction of duplicated reads, fraction of mitochondrial reads, and median insert size for each sample.B) Distribution of TSS enrichment of OCRs using all housekeeping genes from Refseq (9), fraction of reads in OCRs, and gc content in OCRs for each sample.In both A) and B), box plots are depicted with the center line representing the median, the box indicating the interquartile range (between the 25th and 75th percentile), and whiskers extending to the lowest and highest values within 1.5 times the interquartile range from the median.

Figure S3 |
Figure S3| Quality control metrics of neuronal and non-neuronal consensus OCRs.A) Distribution of median read insert size, distance to TSS, fragment length, and TSS enrichment of cell-specific consensus OCRs.B) Genotype checks were performed using pairwise comparisons of genotypes called from ATAC-seq with genotypes from whole-genome sequencing, as well as with the ATAC-seq libraries themselves.C) Correlations of raw reads coverage (number of reads) over consecutive bins of 10 kb genomic regions between samples originating from the same versus different person (yellow line: samples originating from the same cell type but different brain region; green line: samples originating from different cell types but the same brain regions).Median correlations are highlighted by the "x" symbol."D) Principal component analysis (PCA) was applied to evaluate chromatin accessibility levels in OCRs, with cell types represented in red for neuronal and blue for non-neuronal cells.E) Sex check based on measuring the number reads mapped on chromosome Y.

Figure S4 |
Figure S4 | Variance explained by identified covariates in normalized counts of samples for all OCRs.A) Heatmap representing the Pearson correlation of expression levels for merged chromatin regions across all samples.The bar above the heatmap displays cell and region type labels corresponding to the samples.B) Distribution of variance explained by each OCR with respect to different covariates.The covariates on the x-axis were selected using a BIC model explained in the methods section.Box plots are depicted with the center line indicating the median, the box representing the interquartile range (between the 25th and 75th percentiles), and whiskers extending to the lowest and highest values within 1.5 times the interquartile range from the median.Dots are used to indicate potential outliers beyond this range.

Figure S5 |
Figure S5 | Annotation of OCRs to regulatory elements and similarity of OCRs with other datasets.A) Bar plot depicting the percentage of regulatory elements in each cell-specific OCR, obtained after annotation using ChIPseeker.B) Enrichment analysis showing the overlap of cell-specific OCRs with cell-specific marker genes using the resource from 'Ze' (Zeisel)(53) and 'Zh' (Zhang)(54).'#': significant for enrichment after FDR correction of multiple testing across all tests in the plot (Benjamini-Hochberg test); '•': nominally significant for enrichment.C) Spearman correlation analysis illustrates the expression count (log2cpm) correlation of neuronal OCRs from samples in this study with expression counts (log2cpm) of samples from PFC homogenate in Bryois et al.'s (55) study.D) Pie plot displaying the distribution of OCRs categorized as known (overlapping with other studies(2, 3, 55-59)) and novel (unique to this study).E) Overlap coefficient to show the overlap of OCRs identified in this study with cell-specific OCRs from other published datasets(3, 56, 57).

Figure S6 |
Figure S6 | Annotation of OCRs to genes using the Activity-By-Contact (ABC) method.A) Histogram of the number of OCRABC linked per gene.B) Histogram of the number of genes linked per OCRABC.C) Histogram of the number of genes "skipped" by an OCRABC to reach their linked genes.D) Histogram of the distance of OCRABC to the TSS of regulated genes.

Figure S7 |
Figure S7 | Covariate correction of the expression of cell specific OCRs.A) Violin plots to show the distribution of variation of OCRs for biological and technical covariates in neurons (left) and nonneurons (right) before and B) after correcting for covariates.All technical covariates (i.e.except Sex, Brain region, Person and Residuals) were identified using the BIC model explained in the methods section.C) Mean proportion of overall variance in OCRs attributed to the biological and technical covariates before and after correcting for technical (but not biological) covariates in neurons (left) and non-neurons.

Figure S8 |
Figure S8 | Mean variance distribution of normalized counts of neuronal and non-neuronal OCRs.A) and B) show the plot of mean vs variance in expression of neuronal and non-neuronal OCRs shown as black dots with LOWESS trend as a red line that indicate low -moderate biological variation.

Figure S9 |
Figure S9 | Correlation of SCZ and BD specific OCRs.A) Heatmap of the number of differential OCRs at FDR < .05 in for all 4 datasets: neuron PFC, neuron ACC, non-neuron PFC and non-neuron ACC across SCZ/controls and BD/controls associated.B) Comparison of t-statistics of significant SCZ associated changes at FDR < .05across cortical regions in neuronal (53,673 OCRs) and non-neuronal (14,616 OCRs).White square near the origin corresponds to the OCRs that did not pass the FDR cutoff.

Figure S11 .
Figure S11.Inferred SCZ stages from chromatin accessibility generated from PFC and ACC in neurons.A-B) Estimated low-dimensional space to order samples based on similarity in expression of upregulated neuronal OCRs (n=2,411(1,495)) in SCZ from PFC(ACC) regions.The samples are colored by clinical diagnosis.SCZ samples are in yellow(purple) and controls are in black(black) in PFC(ACC) regions.Stratification of samples by clinical diagnosis as a function of inferred disease stage, where early stage (n=1) to late stage (n=5) is from left to right.D-E) Distribution of pseudotime for neuronal PFC and neuronal ACC samples in C and D respectively.Logistic regression model: schizophrenia/Controls ~ pseudotime; OR=2.9, P value = 1.69e-05 in PFC (D) and OR=2.8,P value = 9.50e-05 in ACC (E).F-G) Plot of PRS of samples calculated using PGC3 SCZ GWAS summary statistics, stratified by inferred disease status in PFC and ACC region respectively.Beta estimate and p value in G-H are obtained using linear regression model: disease stages/status ~ PRS + Age + Age 2 + Sex.H) Plot of PRS of samples calculated using PGC3 SCZ GWAS summary statistics, stratified by inferred disease severity stages in ACC region.Box plots in D-H have lower and upper hinges at the 25th and 75th percentiles and whiskers extending to, at most, 1.5xIQR (interquartile range).

Figure S12 .
Figure S12.Distribution of technical and clinical variables of samples stratified by inferred disease stages.A-C) Bar plot to show the brain-bank, gender and race of neuron PFC samples stratified by inferred disease stages.D-E) Distribution of fraction of reads in peak(frip) and postmortem interval (PMI) in hrs of neuron PFC samples as a function of inferred disease stages.F-H) Bar plot to show the brain-bank, gender and race of neuron ACC samples stratified by inferred disease stages.I-J) Distribution of fraction of reads in peak (frip) and postmortem interval (PMI) in hrs of neuron ACC samples as a function of inferred disease stages.Box plots in D-E) and I-J) have lower and upper hinges at the 25th and 75th percentiles and whiskers extending to, at most, 1.5xIQR (interquartile range).

Figure S13 |
Figure S13 | Optimal number of PEER to call final CRDs.A) Plot of mean absolute correlation (MAC) cut off value which is the value at 95th percentile from MAC distribution of MAC of each CRDs and permuted CRDs at i th PEER.

Figure S14 |
Figure S14 | Cis-regulatory landscape defined by CRDs.A) Summary table of the number of samples, OCRs and CRDs identified across four datasets 1) neuronal PFC, 2) neuronal ACC, 3) nonneuronal PFC and 4) non-neuronal ACC.B) Genome-wide base pair length distribution (x-axis log scale) of CRDs (colored curves) compared to neuronal and non-neuronal PFC subTADs (black curve).C) Schematic of a CRD obtained from the pairwise correlation of four OCRs.The log2FC CRD is summarized as the average of log2FC (SCZ versus controls) of four OCRs.

Figure S15 |
Figure S15 | Higher order chromatin structure related characteristics of CRDs.A) Distribution of number of OCRs per CRD in neuron PFC, neuron ACC, non-neuron PFC and non-neuron ACC CRDs.The x-axis is in log scale.B) Jaccard index to show the overlap of OCRs within the cell and region specific CRDs.C) Odds ratios to find the cell-region specific OCRs within CRDs inside PFC cell specific subTADs when compared to all OCRs.Proportion of CTCF sites in a bin of 1Kb as a function of distance from boundaries of neuronal CRD in D) PFC (red), E) ACC (red), F) non-neuronal CRD PFC (blue), G) ACC (blue), H) PFC neuron subTAD (black) and PFC non-neuronal subTAD(gray).CTCF regions are from H1 stem-cell-differentiated neuronal culture in relation to distance from CRD.I)Odds ratio to find ABC OCRs within neuronal PFC(red), neuronal ACC(red) and non-neuronal PFC(blue), non-neuronal ACC(blue) CRDs.(neurons: OR>3, Fisher's exact test p value < 0.05) and non-neurons CRDs (OR>2, Fisher's exact test p value < 0.05)

Figure S17 |
Figure S17 | Characteristics of trans-regulatory domains of CRDs.A) Schematic workflow depicting identification of TRDs from hierarchical clustering of schizophrenia CRDs.B) Gamma statistics were plotted against the number of clusters in Chromatin Regulatory Domain (CRD) matrices of neuronal PFC and ACC.The dotted line indicates the optimal number of clusters, n=10, which was selected for both neuronal PFC and ACC through the hierarchical clustering method.C) The number of OCRs and SCZ associated OCRs are represented as "All OCRs'' and "schizophrenia OCRs," respectively, within CRDs in neuronal PFC TRDs (top plots) and non-neuronal ACC TRDs (bottom plots).D) The number of upregulated and downregulated schizophrenia OCRs within TRDs are depicted separately in neuronal PFC TRDs (top plots) and neuronal ACC TRDs (bottom plots).

Figure S18 |
Figure S18 | Trans regulatory domains from hierarchical clustering of neuronal SCZ CRDs from the ACC.(A) Heatmap depicting hierarchical clustering of trans interaction of 1,691 SCZ CRDs across 330 samples from the ACC.The clustering results in ten trans regulatory domains (TRDs) using gamma statistics.The directionality annotation bar plot above the heatmap shows upregulation in red (log2SCZ-log2Control >0) and downregulation in navy (log2SCZ-log2Control >0) of SCZ neuronal CRDs.Bottom plot shows a barplot of the number of up and downregulated SCZ neuronal CRDs per TRD from the ACC region.(B) Coefficients of SCZ heritability stratified by TRDs.The overlap of OCRs within the TRDs with SCZ risk variants was assessed using LD score regression.P values are from LD score regression.'#': significant for enrichment in LD score regression after Benjamini-Hochberg FDR correction for multiple testing across all tests in the plot (FDR < 5%).Error bars show standard error in SCZ heritability from LD score regression.(C) Top and bottom bar plot show log2FCfetal_cortical (Fetal vs adult controls) and log2FC (SCZ vs Controls) of neuronal ACC CRD respectively (D) Spearman correlation of log2FCfetal_cortical compared to log2FC (SCZ vs Controls) of neuronal ACC CRDs from TRD1 (n=3,149 OCRs).P value is obtained from Spearman's rank correlation (ρ) test.(E) Functional pathway enrichment of OCRs from neuronal ACC TRD1 that overlap with fetal cortical specific OCRs.

Figure S19 |
Figure S19 | Analysis of cell-type specificity of neuronal ACC schizophrenia OCRs in fetal cortical scATAC-seq data.A) UMAP plot of fetal cortical scATAC-seq data from Trevino et al. in which each cell type is colored by Z-scores of schizophrenia OCRs in neuronal ACC TRD1.The cell types include early radial glia (EarlyRG), late radial glia (LateRG), oligodendrocyte progenitor cell/oligodendrocyte (OPC/Oligo), neuronal intermediate progenitor cell (nIPC), oligo intermediate progenitor cell (oIPC), glutamatergic neuron (GluN), interneuron (IN), endothelial cell (EC), microglia (MG), and pericytes (Peric).B) Distribution of enrichment scores stratified by two major classes of cell types: 1) excitatory(nine glutamatergic neurons) and 2) inhibitory neurons (five inhibitory neurons).The color represents the magnitude of coefficient of association of enrichment scores of schizophrenia OCRs from neuronal ACC TRD1 at each developmental stage separately for each cell types, using the model (enrichment score ~ celltype + (1| sample ID) where cell types are grouped as the cell type of interest vs. all other cell types.* depicts the cell types that are significantly enriched from the latter test and also have higher enrichment than the schizophrenia OCRs outside TRDs.C) Functional pathway analysis conducted on the early and late fetal glutamatergic specific marker genes that are also annotated by schizophrenia OCRs in TRD1.D) Heat map of MAGMA enrichment P values of brain-related and nonbrain related GWAS traits and E) Coefficients of enrichment of common SCZ risk variants using MAGMA in fetal cell specific marker genes that are also annotated to schizophrenia OCRs within neuronal ACC TRD1.The x-axes in D) and E) are 1) early fetal (16+20 pcw) glutamatergic (GluN 1,3,5,8) in magenta, 2) late fetal (21+24 pcw) glutamatergic (GluN 6,9) in blue, 3) early fetal (16+20 pcw) inhibitory (IN2,4) in magenta, 4) late fetal (21+24 pcw) inhibitory (IN 2,3,4,5) in blue fetal cell types and all ABC mapped schizophrenia OCRs to genes within neuronal ACC TRD1 in black.'#':

Figure S21 |
Figure S21 | Trans Regulatory Domain (TRD1) in PFC and ACC can stratify SCZ cases and controls in MVP and CMC cohort.For SCZ TRDs: TRD1, TRD6 and TRD7 from PFC region and TRD1, TRD2 from ACC region.A) Average variance explained per variant used in estimating PRS of MVP cohort (PFC TRDs: 1,388 for TRD1, 139 for TRD6 and 103 for TRD7; ACC TRDs: 815 for TRD1 and 46 For current study samples, covariates included in the analyses were age at the time of death, age squared, the first 20 genotype PCs (computed within EUR ancestry samples only), sex, and institution (Mount Sinai, University of Pennsylvania, University of Pittsburgh, or NIMH Human Brain Collection Core).For the MVP cohort, we utilised the latest MVP Phenotypes (v21.1) and used covariates for age at recruitment, age squared, the first 20 genotypes PCs (computed within the whole cohort) and sex.PRS regression coefficients were reported as the natural logarithm of odds (logOR) along with 95% confidence intervals, and all reported p-values are unadjusted.A variance partitioning tool (https://github.com/GabrielHoffman/misc_vp/blob/master/calcVarPart.R) was used to determine the variance in SCZ(BD, MD) diagnosis status explained by the PRS variable for each regression model.These values were then divided by the number of variants used to compute each PRS, respectively, to obtain the reported "Average variance explained per variant used in estimating PRS" metrics.
OCRs in all four datasets (Fig S15A and TableS4).We estimated cell and region specificity of CRDs by measuring jaccard index across all pairs as shown in Fig S15B.To measure the odds of observing CRDs to be inside a subTADs, we ran Fisher test (42) by using 2 x 2 contingency table of OCRs within CRDs that are inside/outside subTADs in comparison to all OCRs that are inside/outside subTADs as shown in Fig S15C.To test whether the OCRs within the CRD contain three-dimensional genome interactions, we utilised CTCF ChIP-seq genomic regions from ENCODE human neural cells (43) (syn52264219) and quantified the density of CTCF sites in 200 bins (each bin size equals to 1 kb) around CRD boundaries (Fig. neuronal CRDs and 4,612(4,710) non-neuronal CRDs were obtained from PFC(ACC) regions (Fig.3B) which encompassed, on average, of 21.7(22.1)and 18.9(18.7)OCRs per CRD that were present in neuronal and non-neuronal CRDs in PFC(ACC) respectively, with a median number of 6