Identification of de novo variants in ASD probands
We analyzed an ASD cohort consisting of 369 ASD probands and 706 parents from 353 pedigrees recruited from the Department of Child and Adolescent Psychiatry, Shanghai Mental Health Center (SMHC). The cohort comprises 15 multiplex families, each with two ASD children, and 338 simplex families, each with one ASD child. Trained psychiatrists made ASD diagnoses according to the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV).
The proportion of the targeted exome regions covered at ≥ 20x or ≥ 40x read depth indicates sufficient coverage (Fig. S1A). After performing multidimensional scaling of the genotyping data, we identified common exonic SNPs with the PLINK toolkit (a whole-genome association analysis tool)(13). We found that common exonic SNPs in probands of the SMHC cohort clustered adjacent to previously characterized East Asian populations, suggesting that the SMHC cohort faithfully carries the genetic signatures of East Asian populations (Fig. 1A).
After variant filtering, we identified a set of 220 de novo mutations (DNMs) (Table S1). We classified DNMs into three categories: High-impact, Moderate-impact, and Possible-damaging. The High- and Moderate-impact categories were defined by VEP (Ensembl Variant Effect Predictor, https://asia.ensembl.org/info/docs/tools/vep/index.html). Briefly, High-impact variants usually lead to truncation of protein products and include gain or loss of STOP codons as well as frameshift-causing insertions and deletions (INDELs). Interestingly, among the 55 genes containing High-impact SNVs, only 18 were previously reported in the SFARI gene list (Category S, 1, 2, 3), such as SCN2A, PTEN, MECP2, SRCAP and TCF4, indicating that most genes containing High-impact SNVs in the Chinese cohort are novel and not included in the SFARI gene database (Fig. 1B, C).
Moderate-impact variants were defined as variants that change, but do not truncate, the protein sequence, such as missense SNVs and in-frame INDELs. To further categorize the severity of missense variants, we assigned missense SNVs to a new class, Possible-damaging missense DNMs, defined as variants predicted to be damaging by at least two of the following seven prediction algorithms: SIFT(14), PolyPhen-2 HumVar(15), PolyPhen-2 HumDiv(15), LRT(16), Mutation Taster(17), Mutation Assessor(18) and PROVEAN(19), as annotated by dbNSFP4.0a(20, 21). Interestingly, among the 165 Moderate-impact variants, only 23 are present in the SFARI gene list (Fig. 1B, C).
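The "at least two of seven" rule can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the predictor labels and the boolean damaging flags are hypothetical, and the conversion of each tool's dbNSFP output into a damaging or not-damaging call is assumed to have been done upstream.

```python
# Hypothetical predictor labels; the real dbNSFP column names differ.
PREDICTORS = (
    "SIFT", "PolyPhen2_HumVar", "PolyPhen2_HumDiv", "LRT",
    "MutationTaster", "MutationAssessor", "PROVEAN",
)


def is_possible_damaging(calls, min_hits=2):
    """Return True if at least `min_hits` predictors call the variant damaging.

    `calls` maps a predictor label to True (damaging), False (not damaging)
    or None (no prediction). Missing or None entries count as no call.
    """
    hits = sum(1 for name in PREDICTORS if calls.get(name) is True)
    return hits >= min_hits


# Illustrative input only, not a variant from the study.
example = {"SIFT": True, "PROVEAN": True, "LRT": False}
print(is_possible_damaging(example))
```

Treating a missing prediction as "no call" is one possible choice; a stricter variant of the rule could instead require a minimum number of available predictions.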
The more than one thousand ASD risk genes in the SFARI gene list were mainly identified in genetic studies of US and European populations. We therefore asked whether the numerous genes with DNMs in the Chinese ASD cohort that are not included in the SFARI list truly contribute to ASD, or whether some carry common genetic variants that are not associated with the disorder. To address this, we statistically evaluated the contribution of each de novo variant to ASD using the Transmission and De Novo Association Test-Denovo (TADA-Denovo) method. We first measured the frequency of de novo and missense variants in each gene with DNMR-SC-subtype data(22), and then applied the TADA-Denovo method(23). We classified the DNMs into two tiers according to the p values obtained from the TADA-Denovo test (*, p < 0.01; **, p < 0.001) (Table S2). We further obtained the “probability of loss-of-function intolerance” (pLI) score for each variant and likewise categorized variants with significant TADA-Denovo values into two tiers (> 0.9, represented by ##; 0.5-0.9, represented by #)(24). Finally, we found that 11 genes with High-impact mutations and 35 genes with Moderate-impact mutations, none of which are included in the SFARI gene list, were statistically significant by both TADA-Denovo and pLI score, further supporting their contribution to ASD (Fig. 1C).
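The two tiering schemes can be written down as a small sketch. The function names are hypothetical, and the handling of boundary values (a score of exactly 0.9 falling in the lower tier, 0.5 being inclusive) is an assumption, since the text only gives the ranges "> 0.9" and "0.5-0.9".

```python
def tada_tier(p_value):
    """Map a TADA-Denovo p value to the tier symbols used in the text."""
    if p_value < 0.001:
        return "**"
    if p_value < 0.01:
        return "*"
    return ""  # not significant at either threshold


def pli_tier(score):
    """Map a loss-of-function intolerance score to the tier symbols in the text."""
    if score > 0.9:
        return "##"
    if score >= 0.5:  # assumption: lower bound is inclusive
        return "#"
    return ""


def is_prioritized(p_value, score):
    """True when a gene reaches a tier on both measures."""
    return bool(tada_tier(p_value)) and bool(pli_tier(score))


# Illustrative values only, not results from the study.
print(tada_tier(0.005), pli_tier(0.95), is_prioritized(0.005, 0.95))
```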
We next investigated whether genes with de novo variants identified in different ASD genetic studies overlap. Interestingly, de novo ASD risk genes detected in probands of the SMHC cohort showed little overlap with the list of de novo ASD risk genes from the Japanese cohort (Fig. S1B, C)(8). Moreover, there was also little overlap in de novo variants between the SMHC cohort and other studies with 200-400 trios (Fig. S1D)(5, 25-27).
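One simple way to quantify such overlap between two gene lists is the size of their intersection together with the Jaccard index. The sketch below is only an illustration of that idea; the text does not state which overlap measure was used, and the gene labels are placeholders rather than genes from either cohort.

```python
def overlap_summary(genes_a, genes_b):
    """Return the shared genes and the Jaccard index of two gene lists."""
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    union = a | b
    # Jaccard index: shared genes divided by all distinct genes in both lists.
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Placeholder labels, not real cohort data.
cohort_1 = ["GENE_A", "GENE_B", "GENE_C"]
cohort_2 = ["GENE_C", "GENE_D"]
shared, jaccard = overlap_summary(cohort_1, cohort_2)
print(shared, jaccard)
```

A Jaccard index close to 0 corresponds to the "little overlap" situation described above, while a value of 1 would mean identical gene lists.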
Identification of de novo CNVs in ASD risk genes with the WES dataset
Although chromosomal microarray analysis (CMA) is the gold standard for detecting copy number variations (CNVs), various toolkits have emerged to identify CNVs from whole-exome sequencing (WES) datasets(28). However, current algorithms for CNV detection are not optimal for WES datasets and are incompatible with the GRCh38/hg38 reference genome.
We applied a germline CNV calling protocol based on GATK cohort mode (version 4.2.0.0) (see Supplementary Methods) and identified numerous de novo CNVs in the probands (Fig. 2A-N, Table S3). To exclude false-positive hits, we set two criteria for CNV screening. First, duplication or deletion signals had to appear in more than two consecutive exons. Second, CNVs had to fulfill the HIGH-impact criteria, leading to protein truncation, for example through deletion of START or STOP codons.
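The first screening criterion can be sketched as a check on exon indices. This is a minimal illustration under stated assumptions: exons are assumed to be numbered consecutively within a gene, the input is assumed to list the exons that carry the same duplication or deletion signal, and the function names are hypothetical rather than part of GATK.

```python
def longest_consecutive_run(exon_indices):
    """Length of the longest run of consecutive exon numbers."""
    best = current = 0
    previous = None
    for idx in sorted(set(exon_indices)):
        # Extend the run if this exon directly follows the previous one.
        if previous is not None and idx == previous + 1:
            current += 1
        else:
            current = 1
        best = max(best, current)
        previous = idx
    return best


def passes_exon_filter(exon_indices, min_exons=3):
    """'More than two consecutive exons' is read here as at least three."""
    return longest_consecutive_run(exon_indices) >= min_exons


# Illustrative exon numbers only.
print(passes_exon_filter([4, 5, 6, 9]))  # run 4-5-6 has three exons
print(passes_exon_filter([4, 5, 9]))     # longest run has two exons
```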
To prioritize ASD risk genes, we first examined CNVs occurring in known SFARI genes (Fig. 2A-N). We found 18 CNVs with duplications or deletions in known SFARI genes (Cat S: 4 genes; Cat 1: 14 genes), such as duplications of RAI1 and UBE3A and deletions of TBR1, SHANK3, MECP2 and GIGYF1 (Fig. 2A-N). We further validated the CNV results by quantitative PCR, confirming the feasibility and reliability of our new method (Fig. 2O).
Furthermore, among the de novo large CNVs we found, 9 contain genes in the SFARI Cat 2 gene list (Table S3). In total, 26 CNVs contain critical ASD risk genes in the SFARI gene list (Cat S, 1, 2), suggesting that genes affected by these de novo large CNVs may contribute to the pathogenesis of ASD.
Expression of ASD risk genes is enriched in the PC, PRC and BST regions of the developing human brain
The etiology of ASD may involve disruption of neural circuits associated with social behaviors; identifying the expression profile of genes with DNMs in the human brain would therefore provide critical insights into which brain regions may be affected by mutations in ASD risk genes(29). To obtain the expression pattern of ASD risk genes at single-cell resolution, we used a recent single-cell sequencing database of the developing human brain(30, 31). We grouped a total of 17,434 transcriptomes collected from human fetal brains at gestational weeks (GW) 09-26 and categorized them into cell subtypes according to marker genes (Fig. 3A, Fig. S2A-C).
We first examined the expression pattern of the 55 High-impact genes and 165 Moderate-impact genes in various cell types, and found that both sets were highly expressed in several cell subtypes, including NPC-4, Ex-1 and In-2, as well as Cajal-Retzius (CR) cells (Fig. 3B, C, D). We then examined where NPC-4, Ex-1, In-2 and CR cells are localized in the developing human brain. NPC-4 was broadly distributed across the four major lobes of the brain, suggesting that this specific subgroup of neural progenitor cells may be associated with ASD (Fig. 3E, F, Fig. S3A, B). In contrast, Ex-1 and In-2 were specifically enriched in certain subregions of the brain, including the precentral gyrus (PRC), postcentral gyrus (PC) and banks of the superior temporal sulcus (BST) (Fig. S3C-E).
We next investigated whether expression of ASD risk genes is enriched in specific regions of the human brain. In previous work, single-cell sequencing was performed in 22 subregions of the developing human brain (Fig. 4A)(30). Surprisingly, we found that the High- and Moderate-impact genes were significantly enriched in the precentral gyrus (PRC), postcentral gyrus (PC) and banks of the superior temporal sulcus (BST) (Fig. 4B, C, D, E). The PRC is the primary motor cortex (M1), and the PC is the primary somatosensory cortex (S1). The involvement of the PRC and PC in ASD has been reported previously(32, 33). Interestingly, we also found that functional connectivity involving the right S1 (S1R) and M1 (M1R) regions was specifically decreased in MECP2 transgenic monkeys, a non-human primate model of autism, compared with wild-type monkeys(34-36).
Brain imaging analysis
To determine whether these brain regions are affected in ASD patients from different populations, we acquired imaging data from the Autism Brain Imaging Data Exchange (ABIDE-I, http://fcon_1000.projects.nitrc.org/indi/abide/)(37), a publicly available database containing 1112 subjects (539 individuals with ASD and 573 age-matched healthy controls, HCs) from 16 international imaging sites who underwent anatomical and resting-state functional MRI scans. We collected more than 200 age-matched brain imaging datasets from the ASD and HC groups of ABIDE-I (Table S4).
To test whether the PC/PRC and BST show structural (gray matter) alterations in ASD patients, we first performed voxel-based morphometry (VBM) analysis of these regions in the ASD and HC groups using T1 data. Surprisingly, the gray matter volume of the BST in the right hemisphere was significantly smaller in the ASD group than in the HC group (t = 3.61, p = 0.003; t and p values from the linear mixed model detailed in the Statistics section of Methods and Materials), and this effect persisted after controlling for medication status (t = 3.32, p = 0.001) and full-scale intelligence quotient (FIQ) (t = 3.4, p = 0.0007) (Fig. 5A, B, C).
Finally, we investigated the functional connectivity (FC) between the above regions of interest (ROIs) and whole-brain voxels by performing seed-based FC analysis using resting-state functional MRI data from the ASD and HC groups. Consistently, we observed a significant decrease in connectivity between the BST/PC/PRC and sensory areas, the insula and the frontal lobes in ASD compared with HC (Fig. 5D). All six ROIs showed decreased functional connectivity to the occipital lobe, which is commonly associated with vision. We also found decreased connectivity between the bilateral PC/PRC and the sensorimotor region of the parietal lobe. In addition, the right BST showed the most widespread FC decrease among all ROIs, including connections to the right insula and temporal lobes (t = -6.05, FWE-corrected p = 0.0002), to the bilateral frontal lobe and to the occipital lobe (Tables S5-S6 and Fig. 5D).
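For readers unfamiliar with seed-based FC, the core computation can be sketched in a few lines. This is a generic illustration under stated assumptions, not the analysis pipeline of the study: time series are plain Python lists, the seed signal is taken as the mean over the ROI's voxels, correlations are Fisher z-transformed (a common but here assumed step), and preprocessing, group statistics and FWE correction are omitted. All function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def seed_fc_map(seed_voxels, target_voxels):
    """Fisher z-transformed correlation of the mean seed signal with each target voxel."""
    n_timepoints = len(seed_voxels[0])
    # Average the seed ROI's voxels at every time point.
    seed_mean = [
        sum(voxel[t] for voxel in seed_voxels) / len(seed_voxels)
        for t in range(n_timepoints)
    ]
    z_values = []
    for voxel in target_voxels:
        r = pearson_r(seed_mean, voxel)
        # Clip to keep atanh finite when |r| is numerically 1.
        r = max(min(r, 0.999999), -0.999999)
        z_values.append(math.atanh(r))
    return z_values


# Toy time series, not imaging data.
seed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
targets = [[2.0, 4.0, 6.0, 8.5], [4.0, 3.0, 2.0, 1.5]]
print(seed_fc_map(seed, targets))
```

In a group comparison, one such z map per subject would then be entered into a voxel-wise statistical model contrasting ASD and HC.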
The BST has been shown to be a voice-selective area in normal adults and plays a role in voice recognition and the processing of social stimuli (38). In an fMRI study, activation of the BST by speech stimulation appeared compromised in adults with ASD (39). In addition, the BST also exhibits ASD-related functional connectivity alterations (40, 41), gray matter changes (such as lower surface area and greater age-related cortical thinning) (42, 43) and white matter volume reduction (44). Our study indicates that genetic predispositions in ASD patients may lead to structural and functional abnormalities in brain regions associated with the processing of social information, thus providing novel candidate brain regions for intervention in ASD.