Identification of de novo ASD missense and loss of function nonsense genes
We performed a systematic literature search to identify variants relevant to ASD. The ASD mutation data were compiled from 26 studies (Supplementary Table 1) where ASD is the main phenotype (Supplementary Fig. 1). We included articles that used data from whole exome, whole genome, and targeted sequencing (Supplementary Fig. 1) cohorts (excluding case reports). Among the variants, 92.4% (156,688 out of 169,580) were reported to be de novo and 0.863% (1463 out of 169,580) were rare inherited. Variants in the dataset were classified based on their location in the genome. More than 90% of variants overlap noncoding DNA sequences reported in autism whole genome sequencing projects (e.g., MSSNG, ASC), including intergenic regions, introns, and untranslated regions. Focusing on variants with direct impact on protein structure, de novo exonic and splicing variants were found to make up 6.23% (10,565 out of 169,580) of our curated data. Among the de novo variants, 62.01% (6,551 out of 10,565) were classified as missense (Supplementary Fig. 2, Supplementary Table 2) and 10.29% (1087 out of 10,565) were loss of function (LOF) (splicing (203), nonsense (470), frameshift (414)). We mostly focused on de novo LOF mutations (impacting 852 unique genes) (Supplementary Table 2) for our downstream analysis due to the consistent association of LOF mutations with clinical ASD [18, 19]. Our pathway enrichment analysis of all the de novo ASD LOF mutation genes identified multiple pathways including neuronal regulation and differentiation (p < 5.29 x 10− 11), cytoskeletal and helicase activity (p < 1.07 x 10− 07), brain development (p < 1.76 x 10− 07), synapse and neurotransmission (p < 4.07 x 10− 05) (Supplementary Fig. 3, 4 and Supplementary Table 3).
Construction of single-cell clusters and identification of genes that are differentially expressed across clusters
We have used single cell RNAseq data from 3 brain regions ACC (Anterior cingulate cortex), MTG (middle temporal gyrus) and VISP (primary visual cortex) derived from 8 neurotypical human tissue Allen Brain Atlas donors comprising 32,209 nuclei (ACC-7,283; MTG-15,928; VISP-8,998)[17]. RNAseq data were extensively processed and clustered (Supplementary Fig. 5) (detailed in Methods section) using Seurat v.3 [20], resulting in 17, 17 and 18 distinct cell clusters for ACC, MTG and VISP, respectively (Fig. 1A). The clusters are of varying sizes ranging from 100 to 3364 cells (Supplementary Fig. 6). To identify differentially expressed genes (DEGs) across clusters in all three brain regions, we have applied four statistical tests (Wilcox [21], t test [22], Bimod [23], MAST [24]) and a stringent cutoff of pvaladj < 0.001. Genes that were significant across all four tests were then used for further downstream analysis (Supplementary Table 4). This conservative analysis yielded differentially expressed genes for each cluster ranging from 349 to 2671 genes per cluster (Supplementary Table 4).
Identification of cell clusters that are enriched for constraint genes
We next aimed to prioritise clusters for downstream analysis by identifying those enriched for highly constrained genes intolerant of mutations. To identify single cell clusters that are regulated by constraint genes, we used two different approaches: i) brain critical exons and ii) pLi (probability of being LoF intolerant). Brain critical exons identify genes with exons that are both highly expressed and show constraint against mutation accumulation, whereas the pLi score characterises genes in terms of their tolerance to LOF mutation. Brain critical exons are reported to be highly constrained in human brain [25]. We have constructed a brain critical exon database using the gnomAD mutation database and RNA-seq data from 196 developing human brains [26] from three developmental stages (prenatal, early childhood and adulthood) following the method described in our previous work [25]. Our enrichment analysis identified that differentially regulated cluster genes are significantly enriched with brain critical genes in clusters ACC (2, 13, 10, 9), MTG (11, 10, 17, 16), VISP (16, 11, 8, 15). Brain critical exons are predominantly enriched in differentially expressed cluster genes that are highly expressed in prenatal stages compared to early childhood and adulthood stage (Fig. 1B, Supplementary Table 5). The most significant clusters are ACC-cluster 2 (p < 1.47 x 10− 154, OR = 1.74), MTG-cluster 11 (p < 3.99 x 10− 88, OR = 1.51) and VISP-cluster 16 (p < 1.43 x 10− 132, OR = 1.59). A gene-based score, pLi [27] was applied with a cutoff of pLi ≥ 0.9 to identify single cell clusters that are highly intolerant of LOF mutations. The clusters with most significant (see supplementary Table 6) tolerant genes are ACC (10, 13, 9, 2), MTG (11, 10, 16, 17), VISP (16, 11, 8, 15) (Fig. 1C). pLi is highly effective in quantitating haplo-insufficient genes [28] and our analysis showed that ACC-cluster 10 (p < 8.55 x 10− 106), MTG-cluster 11 (p < 9.05 x 10− 91), and VISP-cluster 16 (p < 4.04 x 10− 122) were those clusters most enriched for genes that are LOF intolerant and haplo-insufficient. In addition, we conducted replication-timing analysis as it has been reported that somatic mutations discovered in cancers [29, 30] and ASD gene mutations [31] are associated with late DNA replication. We found that the cluster genes are constrained in late replication timing compared to early replication (Supplementary Fig. 7, Supplementary Table 7).
Genes harbouring ASD de novo LOF and missense mutations are enriched in constrained clusters
We next conducted enrichment analysis between the single cell clusters and 6,651 de novo missense mutations (impacting 6551 genes) and 1087 de novo LOF mutations (impacting 852 genes) derived from our curation of the ASD mutational landscape as described above (Supplementary Table 8). Using the GeneOverlap package in R, we observed that ACC (10, 9,13), MTG (10, 11, 6), and VISP (16, 11, 8) clusters were most significantly enriched (Fig. 2A-C) for genes harbouring ASD LOF and missense mutations (p <: ACC (1.55 x 10− 09, 1.89 x 10− 09, 1.16 x 10− 08), MTG (1.31 x 10− 09, 3.11 x 10− 09, 6.74 x 10− 08), VISP (5.23 x 10− 11, 6.40 x 10− 11, 6.04 x 10− 09)). Interestingly, the de novo ASD LOF mutation enriched clusters were also those clusters that we have found to be significantly constrained (based on enrichment of high pLi genes and critical exons). In addition, we have conducted enrichment analysis on three additional sets of genes that harbour i) multiple de novo LOF ii) multiple de novo missense and iii) multiple de novo or missense mutations in ASD. We observed the initial LOF enriched clusters ACC (10, 9, 13), MTG (10, 11, 6), VISP (16, 11, 8) are also highly enriched (see Supplementary Fig. 8, Supplementary Table 9 for enrichment) for these three additional sets of genes. These clusters are chosen for further downstream analysis.
Previous studies implicated that fragile X mental retardation protein (FMRP) genes are found to harbour clinically relevant de novo mutations in ASD [32]. There are also phenotypic and genotypic overlap between epilepsy, intellectual disability and ASD. We conducted enrichment analysis of FMRP protein targets, epilepsy and ID LOF gene lists (Supplementary Table 10) across clusters and observed enrichment in the same highly constrained clusters found to be enriched for genes harbouring de novo ASD LOF/missense mutations, hereafter referred to as ‘ASD LOF enriched clusters’ (Supplementary Table 11). Of note, the housekeeping genes (negative control) (Supplementary Table 10) were not enriched in any of the clusters across brain regions. Further, the enrichment analysis of pathway genes involved in autism spectrum disorder (Supplementary Table 10, 12) were also carried out.
Subtypes of glial cells are enriched with ASD de novo LOF mutated genes
Classifying single cell transcriptomic clusters in terms of cell identity is complex; consequently, to assign cluster identity we have used two independent approaches. First, we used an unbiased approach to identify cell types enriched in each cluster based on their molecular signatures employed by a set of known marker genes (Fig. 2D). For our analysis, we used 865 known gene markers curated from literature [33–35] (Supplementary Table 13). Initial analysis on two selected well studied ASD genes (SCN2A, GRIN2B) known to harbour pathogenic de novo LOF mutations that are involved in critical neuronal inhibitory and excitatory functions confirmed their enrichment in various neuronal subtypes (Fig. 2E, 2F).
We next conducted enrichment analysis for the entire de novo LOF mutated genes. In this way we identified that the top ASD de novo LOF enriched clusters are characterised by a significantly higher expression of non-neuronal marker genes for astrocytes, oligodendrocytes, microglia, and oligodendrocyte progenitor cells (OPC) compared to neurons. We observed that top ASD de novo LOF cluster genes were over-represented among non-neuronal cell markers (Fig. 3A, Supplementary Fig. 9). Also, LOF genes had higher composition of known non-neuronal marker genes compared to neuronal marker genes (Supplementary Table 14).
Secondly, we further looked at the composition of the top 20 DE genes, which represent the primary regulatory molecular machinery that defines each cluster. Our analysis observed that the top 3 significant clusters for each brain region included in our study were composed of non-neuronal cells, i.e. astrocytes, oligodendrocytes, and microglia marker genes (Fig. 3B, Supplementary Fig. 10). We further looked for the evidence from expression of well-characterized non-neuronal marker genes for primary brain cell types. Our analysis found that multiple non-neuronal marker genes have restrictively high expression in our selected top three ASD de novo LOF enriched clusters (Fig. 3C). Based on these three lines of evidence, we observed that a subset of ASD LOF enriched constrained clusters were non-neuronal, i.e., comprised astrocytes, oligodendrocytes, and microglia. As such, this indicates that a subgroup of ASD-implicated genes are preferentially involved in non-neuronal brain processes, thereby earmarking these non-neuronal elements in the etiology of ASD among those individuals harbouring such mutations. Hierarchical clustering of cell type depicting lineage relationship based on DEGs was performed using Slingshot [36] (Supplementary Fig. 11). Venn diagrams depicting the overlap between genes among all clusters across 3 brain regions were also plotted (Supplementary Fig. 12).
Spatio-temporal characterisation of ASD LOF enriched clusters
For further characterizing these non-neuronal ASD LOF enriched clusters, we examined the spatio-temporal dynamics across three developmental periods (prenatal, early childhood and adulthood) and in 16 brain regions. We observed the strongest expression in the prenatal developmental stage (Fig. 4A, 4B, Supplementary Table 15). Further, critical genes in the ASD LOF enriched clusters showed significant expression in DFC (p < 2.88 x 10− 277) and V1C (p < 4.64 x 10− 214) brain regions, both known to be critical for cognitive function. The association of DFC and V1C is consistent in all top ASD LOF enriched clusters derived from three brain regions. In addition, network analysis identified that synaptic vesicle, ion activity and cell regulation are significantly enriched and common among all three ASD LOF clusters derived from the three brain regions (Fig. 4C, Supplementary Fig. 13, Supplementary Table 16).
Replication of conserved expression pattern of ASD LOF genes in non-neuronal cell types
For replicating our findings, we examined additional human and mouse brain single cell transcriptomes. Unlike our discovery analysis using single cell transcriptomics data, this human brain single cell dataset used pre-sorted (based on markers) brain cells [37]. We found that the top ASD de novo LOF enriched clusters were characterised by high expression in non-neuronal cells (astrocytes: p < 4.01 x 10− 03; oligodendrocytes: p < 1.22 x 10− 04) (Fig. 5A, Supplementary Fig. 14). The data also showed neuronal expression for other de novo ASD LOF enriched clusters consistent with our previous observation. We also analysed the expression of top ten-fold change genes (Supplementary Table 4) across clusters (Fig. 5B, Supplementary Fig. 15) and observed higher expression in non-neuronal cells compared to neurons. The mean expression of ASD LOF genes was also plotted, which shows higher expression within non-neuronal marker genes compared to neuronal markers (Supplementary Fig. 16).
To assess the robustness of our results and to examine evolutionary conservation across other mammals we used a dataset that identified broad categories of cell types from mouse brain regions [38]. Our analysis confirmed that genes from the ASD de novo LOF enriched clusters show significant expressed in astrocytes (Fig. 5C, Supplementary Fig. 17). A second independent mouse brain single cell transcriptome dataset [39] also showed a similar pattern (Supplementary Fig. 18).
We further used cell type specific gene expression data from another human brain [40] dataset to quantify mean expression of the top ten DEGs from each cluster across fetal and adult astrocytes. We consistently observed that top ten fold change genes from our previous analysis were more active in adult astrocytes, postnatally after the neurons mature (Supplementary Fig. 19). Enrichment of ASD LOF mutations across cell types of cerebrum and cerebellum in fetal brain [41] shows expression in non-neuronal cells (Fig. 5D). We thus observed distinct cell type association between ASD risk genes and glial-specific gene expression throughout development. These results not only support the genetic evidence indicating that non-neuronal cells may play a role in ASD, but also indicate involvement of non-neuronal cell type related disease etiology of ASD.
Our results suggest a subset of ASD genes have significant non-neuronal bias in expression. By way of example, both KANK1 and PLXNB1 showed very restricted expression in non-neuronal cell types in all three brain regions. Both KANK1 and PLXNB1 are shown to harbour multiple LOF mutations in our curated ASD mutation data set. Through personal communication with other labs with ongoing autism research cohorts (including MSSNG database) we have identified 2 de novo LOF mutations, 14 LOFs with unknown inheritance and 4 de novo missense from 11 ASD cases for KANK1 (Fig. 6A, Supplementary Table 17). Similarly, for PLXNB1, we have identified 5 de novo LOF mutations, 2 LOFs with unknown inheritance and 10 de novo missense in 8 ASD cases (Fig. 6A, Supplementary Table 18). There is significant enrichment of frameshift mutations in PLXNB1 (p = 0.0143) and splice site mutations in KANK1 (p = 0.0012, Supplementary Table 19) compared to control (gnomAD). In addition, we looked into other databases (Clinvar entries, DECIPHER) to accumulate more mutations from cases (Fig. 6A, Supplementary Table 17,18). We found that 24 of 115 KANK1 and 5 of 35 PLXNB1 variants are damaging mutations (based on rare variant ACMG guidelines). Further analysis on CNV shows enrichment of de novo CNVs for both PLXNB1 and KANK1 gene within ASD and neurodevelopmental disorder cases (Supplementary Fig. 20, Supplementary Table 17,18). Both KANK1 and PLXNB1 have shown restricted high expression in astrocytes in both of our human single cell data (Fig. 6B, Supplementary Fig. 21). This pattern of restricted astrocyte expression for both genes were found to be significant compared to neuron expression in human brain [37] and multiple mouse brain single cell transcriptome data [38, 39] (Fig. 6C and Fig. 6D, Supplementary Fig. 22, 23, 24). Further, these critical genes for ASD were highly expressed in fetal astrocytes compared to adult astrocytes (Supplementary Fig. 25). Expression of critical genes, KANK1 and PLXNB1 across cell types of cerebrum and cerebellum in fetal brain [41] shows expression in non-neuronal cells (Fig. 6E, Supplementary Fig. 26). KANK1 and PLXNB1 were expressed in non-neuronal cell types (Supplementary Fig. 27) in other single cell datasets too [42, 43].