Cell type-specific Multi-Omics Analysis of Cocaine Use Disorder in the Human Caudate Nucleus

Abstract Structural and functional alterations in the brain's reward circuitry are present in cocaine use disorder (CocUD), but their molecular underpinnings remain unclear. To investigate these mechanisms, we performed single-nuclei multiome profiling on postmortem caudate nucleus tissue from six individuals with CocUD and eight controls. We profiled 31,178 nuclei, identifying 13 cell types including D1- and D2-medium spiny neurons (MSNs) and glial cells. We observed 1,383 differentially regulated genes and 10,235 differentially accessible peaks, with alterations in MSNs and astrocytes related to neurotransmitter activity and synapse organization. Gene regulatory network analysis identified the transcription factor ZEB1 as exhibiting distinct CocUD-specific subclusters, activating downstream expression of ion- and calcium-channels in MSNs. Further, PDE10A emerged as a potential drug target, showing conserved effects in a rat model. This study highlights cell type-specific molecular alterations in CocUD and provides targets for further investigation, demonstrating the value of multi-omics approaches in addiction research.


Introduction
Cocaine is one of the most frequently consumed illicit psychostimulants worldwide and poses a signi cant public health challenge.In the US alone, about 4.8 million people, or 1.7% of the population, used cocaine in the past 12 months and 0.5% are diagnosed with a cocaine use disorder (CocUD) 1 .
Cocaine use is associated with a range of negative side effects, most prominently cardiovascular and respiratory complications 2 .In recent years, there has been a rise in deaths caused by cocaine overdoses, especially in the African American population 3 .
A complex interplay between genetic and environmental factors is assumed to contribute to an individual's susceptibility and resilience to CocUD 4 .In recent years, epigenetics has been increasingly studied as an important mechanism for how environmental factors in uence the transcription of genetic information, thus altering brain function 5 .However, the role of epigenetic changes and their relevance for altering gene expression levels in brain regions implicated in CocUD remains unclear 6 , requiring joint pro ling of epigenetics and gene expression and their integration using multi-omics analyses.
Our current understanding of the neurobiological basis of CUD implicates the involvement of a number of different brain regions involved in the development and maintenance of this disorder.Most notable are the ventral striatum, which is involved in reward processing as part of the mesocorticolimbic pathway 7 , and the dorsal striatum -consisting of the putamen and caudate nucleus -which plays a critical role in habit formation and decision-making 8,9 .Most of our current knowledge comes from preclinical studies, with the majority investigating gene expression in the ventral striatum, particularly the nucleus accumbens.To date, only a few studies have focused on human postmortem brain tissue.An early study using microarray technology identi ed a reduction in myelin-related genes in the nucleus accumbens in CUD 10 .Studies in other brain regions, such as the hippocampus, highlight the relevance of metabolic alterations, particularly in mitochondria, in CocUD 11 , as well as changes in ion-and potassium-channel expression and receptor density 12 .While studies in the human striatum are sparse, more evidence is available from rodent models.On the bulk level, several studies highlight CREB-signaling 13,14 , immediate early genes (IEGs) [13][14][15] , and ion-channel expression [16][17][18] .Interestingly, a recent study showed that transcriptomic signatures associated with CocUD widely overlap in the nucleus accumbens and caudate nucleus in postmortem human brain tissue 18 .Furthermore, differential gene expression in human postmortem brain tissue was concordant with a mouse model, highlighting the translational value of such model systems in substance use disorders 18 .
One major limitation of bulk-level studies results from the lack of cell type speci city of gene expression alterations.So far, cell type-speci c gene expression studies have only been performed in the rat nucleus accumbens and drosophila models of CocUD.These studies suggested the importance of D1-medium spiny neurons (D1-MSNs) 19 and con rmed cell type-speci c alterations in IEG expression 20 .
A second research gap depicts the lack of integrative multi-omics analyses in the same brain tissue specimens at single-cell resolution.Multi-omics analysis allows for a more mechanistic investigation of molecular alterations by combining evidence from multiple layers.With the 10x Genomics Multiome technology gene expression and open chromatin states can be pro led from the same cell, providing a powerful tool for linking epigenetic mechanisms to gene expression changes in CocUD.Here, we can directly infer transcription factor activity and construct gene regulatory networks to identify upstream regulators of transcriptional changes.
In our study, we aim to address these two research gaps by pro ling the differential gene expression patterns and chromatin accessibility landscapes in a cell type-speci c manner within the caudate nucleus of individuals with CocUD compared to non-affected subjects.By employing state-of-the-art sequencing techniques and data analysis, we sought to identify key upstream regulators of transcriptional changes that underlie the pathophysiology of CocUD and validate them in the wellestablished 3-crit model of CocUD in rats 21,22 .To our knowledge, this is the rst study that has pro led human CocUD at the single-cell multi-omics level, in a highly affected but underrepresented population.

Identi cation of cell types in the human caudate nucleus
We isolated nuclei from the human caudate nucleus of six individuals with CocUD (mean age 55) and eight controls (mean age 57), all of African American ancestry (Fig. 1A).CocUD cases and controls were balanced according to age and sex, see also Sup.Table S1.We performed single-nuclei ATAC-seq and RNA-seq using the 10x Genomics Multiome platform, which allows for simultaneous pro ling of chromatin accessibility and gene expression from each cell.After quality control, we retained 31,178 nuclei with a median of 1,641 expressed genes and 14,418 ATAC fragments per nucleus.For normalization and dimension reduction, we used Seurat 23 and Signac 24 .The R package Harmony 25 was used for the integration and reduction of batch effects (Fig. 1B-D, Sup.Fig. S1A-S1E).Weighted nearest neighbor analysis resulted in an integrated UMAP depicting 13 clearly delineated cell types, including astrocytes, microglia, oligodendrocytes, oligodendrocyte precursor cells (OPC), endothelial cells and lymphocytes, as well as different GABAergic neurons, such as D1 and D2 medium spiny neurons (MSNs) and a small population of cholinergic neurons (Fig. 1B).Distribution of cell types was comparable between samples, with one control sample having a higher ratio of astrocytes to oligodendrocytes (Fig. 1E).Annotation of clusters was supported by the expression of known marker genes (Fig. 1F), as well as their promoter accessibility (Fig. 1G).As expected in this brain region 26 , the main neuronal cell types observed were GABAergic D1 and D2 medium spiny neurons (MSN).

Differential Expression & Accessibility Analysis
We performed within cell type differential expression analysis comparing CocUD cases with controls.
Differential accessibility analysis revealed 10,235 differentially accessible (DA) peaks (Sup.Table S3), again with a clear distinction between glial and neuronal cell types (Fig. 2E).DA peaks were overrepresented in promoter and exonic regions, compared to the genomic background (Fig. 2F & 2G).In a GO term enrichment analysis performed separately for promoter and distal peaks, we observed an overrepresentation of voltage-gated ion channel and serine-threonine kinase activity in promoter peaks (Sup.Fig. S2A) and ligand-and voltage-gated channel and methyltransferase activity in distal peaks (Sup.Fig. S2B).
CocUD-speci c peak-linked genes are associated with ribosomal activity, synaptic pathways, and nervous system development To identify CocUD-speci c linked peaks, we linked peaks to gene expression separately for samples from individuals with and without CocUD, revealing N=821 CocUD-speci c, N=3,624 common, and N=1,159 control-speci c gene-peak pairs (Fig. 3A).As multiple peaks can be linked to a single gene, we then further investigated the functional relevance of linked genes.GO term analysis of CocUD-speci c peaklinked genes showed overrepresentation of ribosomal and synaptic pathways, as well as a pathway cluster related to nervous system development (Fig. 3B).

Family-wise motif enrichment of transcription factors in cell types
Altered open chromatin states in CocUD can increase or decrease the availability of transcription factor motifs thereby resulting in transcription factor-mediated downstream effects on gene expression.To evaluate differential chromatin accessibility related to transcription factor motifs, we performed a cell type-speci c motif enrichment analysis (Fig. 3C) suggesting cell type-speci c motif enrichment in CocUD (Fig. 3D).Here, we observed major downregulation of transcription factor (TF) motifs in CocUD of immediate early genes, such as JUND and FOS, in D1-and D2-MSNs, while the same motifs were upregulated in astrocytes, oligodendrocytes and OPCs (Fig. 3E and Sup.Table S4).Strong effects were also present for the SOX and RFX families.To determine whether there was differential TF expression, alongside differential motif availability, we speci cally investigated differential TF expression (Fig. 3F).
Here, the strongest differential expression patterns were observed for ZEB1, RFX3, FOXP2, and NFIB.For ZEB1, we observed reduced availability in CocUD of its motif (Fig. 3G) in neurons while increased availability in astrocytes was found.At the same time, statistically signi cant downregulation of the ZEB1 transcript was found across neuronal and astrocyte clusters in CocUD.We con rmed robust expression of ZEB1 in neuronal and astrocyte clusters across conditions (Fig. 3H) implicating it is a promising candidate transcription factor for inducing downstream differential expression in these cell types.Another deregulated transcription factor in CocUD was FOXP2, for which we found altered motif availability and transcript expression especially in D1-MSNs and microglia (Fig. 3G+3H).

Gene Regulatory Networks reveal TFs accounting for transcriptional changes in CocUD
Following up on the observation of common transcriptional changes in D1-and D2-MSNs and the identi cation of transcription factor candidates such as ZEB1 and FOXP2, we were interested in gene regulatory networks (GRN) related to CocUD.GRNs can be used to infer the relationship between TFs and their downstream targets (Fig. 4A).An MSN-speci c GRN was constructed using pando 27 and metrics are depicted in Sup.Table S5 and Sup.Fig. S3A.As immediate early genes are known to in uence gene expression of a large set of downstream targets after exposure to multiple drugs of abuse, we were interested if there are conserved sets of TFs and their targets in our CocUD dataset that might be of particular importance for the induction of altered transcriptional programs.Appropriately, two subclusters emerged in the GRN, one consisting mainly of immediate early genes, such as FOSB or JUND, targeting heat shock proteins (e.g.HSPA1B, HSPA1B) and downstream map kinases (Fig. 4A).A larger cluster encompasses the TFs highlighted in the motif analysis, targeting calcium-, potassium-and ion-channels.To identify the most relevant TFs, we performed a scoring approach wherein the size of the TF regulon is weighed by the log2FC from a differential regulon expression analysis (Sup.Fig. S3B & Sup.Table S6).The regulon refers to all downstream targets that are either activated or suppressed by the associated TF (Sup.Fig. S3C).Hence, one TF can have two regulons for activated downstream genes, e.g.ZEB1(+), and for suppressed ones, e.g.ZEB1(-).By investigating the regulons, we observed that a small number of TFs account for a large number of genes, many of which have been previously identi ed in the DE analysis (Sup.Fig. S3D).A TF with particularly large effects in both activating and suppressing downstream gene expression was again ZEB1 that is widely expressed in D1-and D2-MSNs (Sup.Fig. S3B & S3C; Sup.Table S6).Interestingly, we observed that ZEB1 has speci c suppressing effects in controls and activating effects in individuals with CocUD: CocUD-speci c subclusters of MSNs express downstream targets activated by ZEB1, while suppressed ZEB1 targets are preferably expressed in control individuals (Fig. 4B).GO ORA analysis of the downstream targets of the three most important TFs derived from the GRN, ZEB1, PBX3, and ZNF148, revealed common suppressing effects on genes related to ribosomal pathways while activated target genes were consistently involved in synaptic signaling pathways (Fig. 4C).To further explore the GRN in MSNs, we characterized the chromatin structure of highly connected TF targets, such as PDE10A and CACNA1C by visualizing differential accessibility in TF binding sites between individuals with CocUD and controls (Sup.Fig. S3E & S3F).

Druggability analysis suggests phosphodiesterase-inhibitors as potential pharmacological targets
When identifying potential pharmacological targets for disorders with complex molecular alterations, it is crucial to preserve this complexity in druggability analysis.Therefore, we performed a druggability analysis of highly connected target genes from the D1-/D2-MSN GRN, using the drug-gene interaction database (DGldb).Here, we observed several pharmacological agents with the potential to treat CocUD (Sup.Table S7).Interestingly, among the top drug targets with large effect size in the DE analysis, we observed PDEs (PDE10A and PDE7B) several times, suggesting brain-expressed phosphodiesterases as a potential treatment target in CocUD.

Validation of differential expression in PDE10A and CACNA1C in the 3-crit model of CocUD
We prioritized the DE genes from the CocUD DE analyses based on their relationship with the DA peaks (CACNA1C), their importance in GO terms (KCNIP4), their role in the GRN (ZEB1) and in the druggability analysis (PDE10A).To validate these ndings, we performed RT-qPCR analysis on mRNA extracted from the caudate nucleus of rats that underwent the 3-crit model of CocUD, a cocaine self-administration procedure after which three addiction-like criteria are assessed: persistence of drug seeking, motivation for drug taking, and resistance to punishment.Animals ful lling all three criteria are classi ed as "addicted-like".We compared ve 3-crit animals, to six 0-crit animals that were exposed to cocaine, but did not develop addiction-like symptoms and six cocaine-naïve control animals (N total = 17, Fig. 4D).We observed differential expression in PDE10A (t=2.89,p=0.022), consistent with an upregulation in the nonneuronal cell types (Sup.Table S2), and in CACNA1C (t=-2.75,p=0.024), in line with a downregulation in MSNs (not signi cant in non-neuronal cell types, Table S2).These effects were signi cant for 0-crit, but not 3-crit animals, compared to controls, indicating either a cocaine-consumption effect or a habituation in the 3-crit animals.No effect was observed for ZEB1 (t=0.15,p=0.88) and KCNIP4 (t=0.05,p=0.96).

Multi-Omics Factor Analysis reveals a sex-speci c factor characterized by metabolic pathway enrichment
In addition to the data-informed approaches, we performed a data-driven multi-omics factor analysis to identify a dimensionality-reduced set of factors associated with CocUD.Of the 15 resulting factors, four were associated with CocUD (Sup.Fig. S4A and S4B).Factor6 is characterized by high factor loadings for microglia, with positive enrichment for immune pathways and negative enrichment for pathways related to neuronal development (Sup.Fig. S4B & S4C).
Factor7 was mainly associated with GABAergic neurons and enriched for GO terms related to metabolic processes and synaptic signaling (Sup.Fig. S4D).Overall, the results for the CocUD-associated factors con rmed the ndings of the DE analysis.
Of particular interest is the CocUD-associated Factor8 that indicated sex-speci c subclusters in the majority of cell types, marked by high factor-loadings in subclusters of OPCs, oligodendrocytes, MSNs, and astrocytes.Characterized by an enrichment for metabolic GO terms, this factor suggests a malespeci c effect.Interestingly, in the promoter-associated motifs of Factor8, we observe a negative enrichment of major regulators from the GRN, such as ZEB1 and TCF4 (Sup.Fig. S4E).
Sex-strati ed DE analysis shows concordant effects in astrocytes, microglia, and D1-/D2-MSNs Because we observed high Factor8 loadings in the majority of cell types, we performed exploratory analyses comparing the effects in the DE analysis between male and female participants using rank-rank hypergeometric overlap (RRHO) analysis.Interestingly, we observed high concordance of DE genes between sexes in astrocytes, GABAergic D1MSN, and microglia, and moderate concordance between GABAergic D2 MSNs, and OPCs (Fig. 4E).At the same time, strong sex-speci c effects were evident in GABAergic CALB2 MSNs and oligodendrocytes, underlining the results from MOFA.

No cell type-speci c enrichment of GWAS signals of CocUD
To test whether GWAS signals of substance use disorders and CocUD in particular, were overrepresented in cell type gene expression patterns, we used single-cell disease-relevance scoring (scDRS).While we observed low enrichment scores of CocUD in microglia and cholinergic neurons, feature plots did not support a clear cell-type speci c effect (Sup.Fig. S4F, Sup.Table S8).

Discussion
Our single-nuclei multi-omics study of CocUD in the human caudate nucleus reveals rst insights into cell type-speci c alterations of gene expression and the underlying chromatin states, directly linking transcription factor activity to downstream gene expression.To our knowledge, this is the rst study that investigated human CocUD at single-nuclei resolution combining ATAC-and RNA-seq in a multi-omics approach in the same cell.Our study highlights synaptic pathways, ribosomal activity, and metabolic pathways as altered in CocUD, often in a cell type-speci c manner.
We present different lines of analysis from data-informed differential gene expression and DNA accessibility analyses to data-driven GRN and factor analyses, all with convergent results.
In both the DE and DA analyses, we observed initial evidence for altered ion-and calcium-channel activity, implicating altered synaptic signaling in CocUD.In addition, we found an overrepresentation of genes in pathways related to synapse organization in CocUD.Changes in synaptic plasticity following repeated cocaine exposure have been well described 28,29 .This is also in line with the large effects observed for ribosomal genes, both in the DE, DA and GO ORA analyses, as the transcription of ribosomal DNA is an important prerequisite to neural plasticity 30,31 .Interestingly, we also observed KEGG-enrichment of neurodegenerative diseases and addiction-related processes in the neuronal cell types, providing evidence that our results re ect known pathologies.In addition, we observed vast expression differences in astrocytes, related to synaptic processes.Consistent with our results, it has been shown that cocaine can trigger astrocyte-mediated synaptogenesis in the nucleus accumbens shell 32 .
To exploit the strength of the multiome framework, the simultaneous pro ling of RNA and open chromatin in the same cell, we calculated a GRN in D1-and D2-MSNs.Here, we observed two clusters, the smaller one consisting of immediate early genes, whose effect has been described extensively 20,33 .The other cluster consisted of a set of major regulators, with downstream effects mainly on calcium-, ion-, and potassium-channels that have also been found in the differential expression analysis.Interestingly, we were able to show CocUD-speci c effects, not on TF activity per se, but on the activation and suppression of downstream gene expression.We observed that ZEB1 activates genes related to synaptic signaling in a CocUD-speci c subcluster of MSNs, ZEB1 could therefore potentially mediate this well-described effect 28 .Interestingly, ZEB1 was also recently identi ed as a major transcriptional regulator in Alzheimer's disease 34 , providing another link between neurodegenerative diseases and SUDs.We further explored the druggability of the highly connected downstream targets, revealing phosphodiesterase inhibitors as a candidate drug, consistent with previous studies in rodent models showing reduced acquisition of cocaine place preference and cue-induced reinstatement of cocaine seeking 35,36 .Because of the large overlap of DE genes in the D1-and D2-MSNs, we constructed the GRN using information from both cell types.It has to be noted that while the transcriptional signatures of the two cell types are largely overlapping, a recent study has shown that transcriptional changes in D1-MSNs are more prevalent in the acute response to cocaine and in D2-MSNs more re ective of a cumulative effect of chronic exposure 37 .
We explored whether broad gene expression changes of GRN-identi ed targets could be observed in the 3-crit model of cocaine use disorder.Here, we validated the effects for PDE10A and CACNA1C in comparing 0-crit, but not 3-crit, to control animals.This highlights an important challenge in addiction research: based on postmortem human brain tissue alone, we cannot conclude whether the observed effects result from substance intake or re ect neuroadaptations as observed in substance use disorder.
In addition to the GRN, we performed another data-driven analysis, Multi-Omics Factor Analysis (MOFA).MOFA con rmed the results of the DE analysis, with the neuronal-speci c factor pointing towards increased metabolic processes and synaptic signaling in CocUD.We also observed a microglia-speci c factor associated with control status, providing further evidence that neuroin ammation is not elevated in CocUD 18 , as opposed to other SUDs, such as alcohol use disorder 38,39 and opioid use disorder 18 .Interestingly, we also observed a sex-speci c factor associated with CocUD and enriched for ribosomal and metabolic processes.While sex-speci c effects have been observed at large in rodent models, suggesting that female rats are more susceptible to cocaine consumption, evidence from human studies suggests an effect speci c to the luteal phase of the menstrual cycle 40 .In our exploratory analyses of potential sex-effects in the gene expression data, we observed strong sex-speci c effects in oligodendrocytes, endothelial cells, and CALB2-GABAergic neurons, providing rst evidence into potential drivers of sex-speci c effects in CocUD.
At the same time, the sex-speci c effects need to be interpreted with caution.While the single-nuclei framework allowed for a high statistical power, because each cell was treated as one observation, the overall sample size was still small and results could lack generalizability.
We did not observe cell type-speci c enrichment of GWAS signals of CocUD.It has to be noted that the currently available GWAS samples of CocUD are small, comprising 2,482 cases and 836 controls of African American ancestry, and have not yet identi ed genome-wide signi cant SNPs 41 .Future studies increasing the sample size of CocUD GWAS are needed 4 , providing the basis to interpret GWAS signals in multi-omics studies.
Our results replicate known neurobiological mechanisms in CocUD and identify new targets, with complex CocUD interaction effects, particularly ZEB1, for future investigation.The present analysis highlights the potential of using a multi-omics framework allowed us to analyze the data in a more mechanistic approach compared to single-omics datasets.

Postmortem human brain tissue
Postmortem human brain tissue samples of the caudate nucleus from six individuals with and eight individuals without CocUD of African American descent were obtained from the University of Texas Health Science Center at Houston (UTHealth Houston) Brain Collection (Sup Table S1).CocUD donors had age > 18 and a lifetime diagnosis of cocaine use disorder based on next-of-kin interviews and psychological autopsy 42 .Controls had no history of substance use or neuropsychiatric disorders.For all cases, exclusion criteria included a known diagnosis of severe neurodevelopmental disorders.

Nuclei isolation
For each sample, nuclei were isolated from 10 mg of frozen postmortem tissue, in order to minimize the content of free-oating debris.Brain samples were processed following the 10x Genomics Chromium Nuclei Isolation kit with RNase Inhibitor user guide.Brie y, the tissues were dissociated in lysis buffer, ltered and the cellular debris eliminated with the designated buffer.Then, alternate steps of centrifugation and wash resulted in clean nuclei suspensions.Nuclei were automatically counted on LUNA-FL™ Dual Fluorescence Cell Counter and particles smaller than 5 µm of diameter and bigger than 15 µm of diameter were excluded from the counting.

Library preparation and sequencing
Single-nuclei RNA-seq and ATAC-seq libraries were generated using the 10x Genomics Chromium Next GEM Single Cell Multiome ATAC + Gene Expression kit and user guide.In brief, single nuclei were loaded into chromium chips J with a capture target of 10,000 nuclei per sample.Final libraries were prepared as indicated in the 10x Genomics protocol and sequenced at NGS Competence Center Tübingen (Tübingen, Germany) using an Illumina NovaSeq6000 sequencer with a depth of 50,000 read pairs per nuclei for the RNA libraries and 25,000 read pairs per nuclei for the ATAC libraries.

Single nuclei RNA-and ATAC sequencing data analysis
Preprocessing and quality control Raw data obtained from single-nuclei RNA-and ATAC-sequencing was preprocessed using the Cell Ranger ARC software (10x Genomics, v.2.0.2) using the human genome built GRCh38 as reference.
Resulting ltered feature barcode matrices were imported in R (v.4.2.1) and converted to a Seurat object using Seurat v.4.3.0 23 .In Seurat, quality control was performed and nuclei were excluded if they did not meet the following parameters: number of RNA features between 200 and 6,500, number of ATAC features between 1,000 and 25,000, percentage of mitochondrial genes < 10%, TSS enrichment score between 2 and 8, and nucleosome signal between 0.5 and 2.

Data integration and clustering
After quality control, individual datasets from the 14 individuals were merged to a single Seurat object using the merge function.ATAC peaks were called in the merged object using MACS2 43 to obtain a uniform set of peaks.After normalization, variable feature identi cation, and data scaling of the RNA assay, we applied dimensionality reduction using the RunPCA function and integrated the expression values across the 14 individuals with harmony 25 (v.0.1.1)using the rst 20 principal components from PCA, followed by UMAP reduction.In a similar approach using Signac (v.1.9.0) 24 , the ATAC assay was normalized using TFIDF, and linear dimensionality reduction was performed using SVD.Based on the principal components 2-30, UMAP reduction was performed followed by data integration using harmony.Finally, the RNA and ATAC assays were combined using a weighted nearest neighbor methodology as implemented in the FindMultiModalNeighbors function in Seurat.Using a resolution threshold of 0.25, a total of 15 clusters were identi ed in the single-nuclei multiomic dataset.

Identi cation of cell types
The expression of known marker genes for cell types in the human striatum 44 and different types of expected interneurons was evaluated in the RNA assay.In addition, we utilized the FindAllMarkers function to generate differential expression pro les distinguishing expression in one cluster from all other clusters (Sup Fig. S5).We cross-checked highly expressed genes in clusters with the Transcriptomics Explorer of the Allen Human Brain Atlas 45 and the Human Protein Atlas 46 leading to the identi cation of 15 different cell type clusters.Two minor clusters were characterized by strong expression of mitochondrial genes as evaluated by the FindAllMarkers function in R. We excluded these clusters expressing oligodendrocytes and excitatory neuron markers from the object and performed further analysis in the 13-cluster object.

Differential expression analysis
Differentially expressed (DE) genes were determined based on a Wilcoxon rank sum test as implemented in the FindMakers function in Seurat.As differential expression cut-offs, absolute log2-fold change > 0.5, minimal expression in at least 25% of cells in either of the compared clusters and an adjusted p-value < 0.05 were used.DE analyses were restricted to clusters consisting of at least 100 nuclei in each condition (CocUD yes/no) to ensure su cient statistical power in the comparison.Visualization of DE genes was performed using the R packages ComplexHeatmap 47 (v.2.14.0) and UpSetR 48 (v.1.4.0).

Differential chromatin accessibility analysis
Identi cation of differentially accessible (DA) peaks in the ATAC assay for each cluster was performed using FindMarkers based on peaks that were at least expressed in 5% of cells in each of the compared clusters.Differential accessibility cut-offs were absolute log2 fold-change > 0.25 and adjusted p-value < 0.05.DA analyses also were restricted to clusters with at least 100 nuclei per condition.

Genomic annotation and overrepresentation analysis of ATAC peaks
Annotation of DA peaks to genomic features based on chromosomal coordinates of the GRCh38 reference genome was performed using the annotatePeak function of the R package ChIPseeker 49 (v.1.34.1).We further tested the enrichment of DA peaks in different genomic feature categories using the annotation of all peaks in the ATAC assay as the reference.A chi-squared test was used to test for signi cant enrichment within a category and Bonferroni-correction was used to adjust resulting p-values for multiple testing.

Gene ontology enrichment analysis
To test the overrepresentation of upregulated and downregulated DE genes in GO terms for the different cell type clusters, we used the compareCluster function of the R package clusterPro ler 50 (v.4.6.2) with the list of signi cant DE genes as input.Results for GO terms with a signi cant enrichment (adjusted pvalue < 0.05) were visualized as pie plots in GO term networks using enrichplot (v.1.18.3).The same approach was applied on gene lists resulting from the annotation of CocUD-associated differentially accessible distal and promoter peaks to genes with the intention of identifying GO terms related to differential accessibility.

Motif enrichment analysis
To assess the enrichment of transcriptional motifs within DA peaks associated with CocUD status, we performed motif enrichment analysis (MEA) using chromVAR 51 (v.1.20.2).First, the RunChromVar function was applied to the full ATAC assay to identify motif activities (z-scores) in the dataset.Up-and downregulated differentially active motifs that are associated with CocUD status were assessed for each cell type cluster using the FindMarkers function resulting in an estimate for the average difference in ChromVAR z-scores.Differentially active motifs with an adjusted p-value < 0.05 were considered statistically signi cant.

Identi cation of gene regulatory networks
We used pando 27 to infer gene regulatory networks (GRN) based on snRNA and snATAC data for each cluster using the respective DE genes as the candidate gene list for inferring GRNs.Pando uses multiomic data together with inferred transcription factor binding sites to generate a GRN consisting of transcription factors and their inferred targets.After initiating the GRN, transcription factor modules were identi ed using the following parameters: p-value threshold = 0.05, number of variables = 5, minimum number of genes per module = 5, and R 2 =0.1.Network graphs for TFs and targets in each of the evaluated clusters were generated using the force-directed Fruchterman-Reingold layout.

Druggability analysis of GRN targets
Reference data from the drug-gene interaction database 52 (DGIdb) was used to evaluate drug-gene interactions and potential druggability of GRN targets.Gene targets from the GRN were prioritized based on the number of statistically signi cant TF-target associations inferred by Pando.We determined quantiles of the TF-target connectivity distribution and selected targets with a minimum number of N=17 signi cant TF-target connections corresponding to the value related to the 95% quantile.All of the identi ed target genes were connected to multiple TFs (mean=16 upstream TFs) in the GRN network highlighting their potential use as convergent downstream drug targets.The prioritized set of target genes was queried using the DGIdb online tool (https://www.dgidb.org,v.5.0.5) with "Interactions by Gene" mode.Results for unique matches of gene-drug interactions were downloaded and sorted by gene-drug interaction score and FDA approval state of connected drugs.

Multi omics factor analysis
We applied multi omics factor analysis (MOFA) 53 to identify factors that are correlated with CocUD consisting of features from both the ATAC and the RNA assay.We separated the ATAC assay by peak annotation to "distal" and "promoter" category as performed in Cell Ranger Arc to investigate the relationship between ATAC peaks in different genomic regions with RNA expression levels in the factor analysis.After the identi cation of highest variable features in the RNA (N=5,000), ATAC promoter (N=16,700) and ATAC distal (N=42,651) assays, a MOFA object based on 15 factors was trained in R using MOFA2 (v.1.8.0).In the resulting object, we tested for association of the resulting factors with CocUD status and performed MOFA-implemented downstream analyses such as gene set enrichment analysis (GSEA) and motif enrichment analyses in the CocUD-associated factors.

Rank-Rank Hypergeometric Overlap
We utilized the R package RRHO2 54 (v.1.074)to conduct rank-rank hypergeometric overlap (RRHO) analysis, aiming to assess convergent and divergent expression patterns at the transcriptome-wide level between male and female donors.RRHO scores were derived from comprehensive differential expression statistics from both male and female datasets.Subsequently, we evaluated overlapping signatures using the hypergeometric testing procedure implemented in RRHO2.

GWAS enrichment analysis
To evaluate the expression signatures of genes containing risk variants associated with substance use disorders including CocUD we performed a GWAS enrichment analysis using single-cell diseaserelevance scoring (scDRS).Here, we aimed to identify cell types of the caudate nucleus that are enriched for the expression of risk genes and might therefore be especially susceptible to the effects of genetic risk variants.The following GWAS summary statistics for African American (AA) populations were included in the analysis: cocaine dependence (N=3,318, Gelernter et al., 2014) 41 , alcohol dependence 55 (N=6,280, Walters et al., 2018), cannabis use disorder 56 (N=9,745, Johnson et al., 2020), and opioid use disorder 57 (N=2555, Polimanti et al., 2020).First, a gene-based analysis was performed for each summary statistics le using MAGMA 58 v.1.10.Next, scDRS was performed according with default settings as outlined in https://martinjzhang.github.io/scDRS/notebooks/quickstart.html.First, a gene-pvalue matrix was created including information on the strength of the association between the gene and the phenotype.The scdrs munge-gs function was applied to convert the GWAS-based gene statistics to scDRS format.Next, single-cell expression data was extracted from the Seurat object and converted to h5ad le-format.Finally, using the h5ad-le and the munged scDRS input le, disease-relevance scoring was performed using the compute-score function in scDRS.We used the scdrs perform-downstream function to evaluate group-level (i.e.cell cluster) associations with the inferred disease score.
Validation 3-crit model of cocaine use disorder and rodent tissue RT-qPCR was performed in tissue from the rat dorsal striatum, corresponding to the human caudate putamen, in male Sprague-Dawley rats that underwent the multisymptomatic 3-CRIT animal model of CocUD.Brie y, following extended intravenous self-administration (45 days), rats were tested on 3 measures modeled from criteria used to diagnose human use disorder in DSM-IV/V: the persistence of drug-seeking (responding when cocaine is signaled as unavailable), motivation for drug-seeking (progressive ratio), and drug taking despite adverse consequences (responding for cocaine in the presence of foot shock).The 3-CRIT model results in a distribution of animals ranging from those showing no addicted-like behavior (0-crit) to those showing addicted-like behavior on all three criteria (3crit) (further described in Pohořalá, et al. 22 ).For RT-qPCR in the current experiment, we analyzed tissue from six 0-crit and ve 3-crit rats, as well as six control rats (N total = 17) that were con ned to the home cage for the entirety of experiment and thus had no exposure to cocaine or behavioral procedures.All rats were sacri ced 2d following the completion of the 3-CRIT protocol, at which point brains were ashfrozen in isopentane and stored at -80° C until further processing.Experimental procedures were conducted according to the NIH ethical guidelines for the care and use of laboratory animals, in compliance with the German Animal Welfare Act, and approved by the local animal care committee (Regierungspräsidium Karlsruhe, Germany).
Quantitative Real-Time PCR in the 3-crit model Total RNA was isolated from rat brain tissues using the RNeasy Mini Kit (Qiagen) and reverse transcribed into cDNA using the High-Capacity cDNA Reverse Transcription Kit (4368813; ThermoFisher).Quantitative real-time PCR (qPCR) was performed on the QuantStudio™ 7 Flex Real-Time PCR System (Applied Biosystems) with the following reaction setup: 10 µL TaqMan™ Gene Expression Master Mix (Applied Biosystems, catalog number: 4369016), 1 µL TaqMan™ Gene Expression Assay (Thermo Fisher Scienti c), and 9 µL cDNA.Speci c TaqMan™ Gene Expression Assays were used to detect the genes of interest: PDE10A (Rn00673152_m1), KCNIP4 (Rn00710466_m1), CACNA1C (Rn00709287_m1), and ZEB1 (Rn01538408_m1).The housekeeping gene GUSB (Rn00566655_m1) served as internal control.Gene expression analysis was performed in triplicates for the genes of interest and in duplicates for the internal control, with all samples analyzed together on the same plate per gene.

Statistical Analysis of 3-crit model qPCR data
We rst generated the delta CT (ΔCt) values by subtracting the Ct mean values of the triplicates of the genes of interest (ZEB1, CACNA1C, PDE10A, KCNIP4) from the Ct mean values of the duplicates of the internal control GUSB.Then, we used Student's t-tests to compare the ΔCt values between different groups.

Supplementary Files
This is a list of supplementary les associated with this preprint.Click to download.

Figure 1 A
Figure 1

Figure 3 A
Figure 3

Figure 4 A
Figure 4