Distinct de novo variants in LFA and HFA probands
We first analyzed WES of 772 ASD trios (probands and both unaffected parents) collected from the Department of Developmental Children Care, Xinhua Hospital Affiliated with Shanghai Jiao Tong University School of Medicine. We focused on de novo variants (present in probands but not in parents) in this study. We identified 401 de novo synonymous variants and 708 rare de novo non-synonymous variants (allele frequency < 0.1% in the four genomic databases: 1000G-ALL, 1000G-EAS, ExAc-ALL, ExAc-EAS, database websites see Methods, Fig. 1a, Fig. S1). Among 708 rare de novo non-synonymous variants in protein-coding regions, there were numerous LGD and missense variants, of which we validated 89.55% of total rare de novo non-synonymous variants by Sanger sequencing (Fig. 1a, Table S1).
To classify the severity of protein-changing de novo variants, we analyzed the “probability of loss-of-function intolerance” (pLI) score7 and the integrated “missense badness, PolyPhen-2, constraint” (MPC) score8 for LGD and missense variants, respectively (Fig. 1a-c). Previously, ASD probands were found to carry significantly more LGD variants, but not missense variants, than their unaffected siblings in simplex families4. Interestingly, we found that LFA probands carried significantly more both LGD and missense variants likely having larger effects (pLI > 0.995 or MPC > 2) than HFA probands (Fig. 1b-g), suggesting that LFA carried heavier mutation burdens than HFA. Therefore, both de novo LGD and missense variants may contribute to the etiology of ASD with development delay.
Overall, we identified de novo rare non-synonymous variants (including LGD and missense) in 62% (259 of 415) of families having LFA probands, and 54% (180 of 334) of families having HFA probands (Fig. 1h). Next, to further analyze mutation burdens of LFA and HFA, we examined the de novo variants occurred in each proband. We found that there were significantly more LFA probands carrying more than 1 de novo variants (LGD or missense) than HFA probands, suggesting that comorbid DD/ID is likely associated with heavier mutation burdens (Fig. 1i).
We analyzed the statistical significance using the TADA-denovo method 7,9. Among the genes with de novo LGD variants, we found that 24 out of total 64 genes in LFA probands were presented in the SFARI database (http://gene.sfari.org) which is the most comprehensive ASD gene database (Fig. 1j). Interestingly, in the genes with de novo LGD variants in HFA probands, there are only 6 of 38 genes were previously reported in the SFARI database (Fig. 1k). Many of these novel ASD risk genes passed the TADA-denovo test (p < 0.001 - 0.01), suggesting that they may represent East Asian specific ASD risk genes (Fig. 1j, 1k).
To further investigate the function of genes carrying variants with large effects (LGD or missense MPC >2) discovered in LFA and HFA probands, we performed Gene Ontology analysis and found that genes in LFA probands with likely disruptive variants are primarily associated with learning memory and brain development (Fig. S2a, genes elaborated in Fig. S2b), whereas genes in HFA probands with large effects are mainly associated with various post-translational protein modification, histone modification, and small RNA transcription (Fig. S2c, S2d). These data suggest that the etiology leading to LFA and HFA may be distinct in the course of brain development.
LFA and HFA genes exhibited distinct expression landscapes
To determine the expression profiles of genes with de novo variants in LFA and HFA probands, we first examined the expression of candidate genes in three critical developmental stages, including early embryonic stage (1st - 2nd trimester), late embryonic and early postnatal stage (3rd trimester and infant-toddler), and childhood, adolescence and adult stage. We categorized genes with de novo variants into these three groups according to the stage at which they were expressed at the highest level (Fig. S2e-g). We first found that genes with de novo LGD variants in HFA probands appeared to contain more early embryonic development genes than genes with de novo LGD variants in LFA probands (Fig. 2a)
From the gene expression data from human brain in the BrainSpan database, we found that expression levels of genes with de novo LGD variants in LFA (n = 61, excluding three genes showing low expression level in the brain: NLRP5, R3HDM4, SLCO1B7) showed a similar pattern with genes with de novo LGD variants in HFA (n = 38) at the early embryonic stage (Fig. 2B). Interestingly, genes with de novo LGD variants in HFA exhibited higher expression in late embryonic stage and lower expression in early postnatal stages (Fig. 2C). Moreover, genes with de novo LGD variants in LFA showed higher expression levels in the childhood stage than genes with de novo LGD variants in HFA (Fig. 2D).
We next evaluated enrichment of genes with de novo variants of large effects (LGD, missense MPC > 2) with bulk RNA-seq data in the Genotype-Tissue Expression (GTEx) resource10. We observed significant enrichment of LFA genes in various brain tissues including frontal cortex, cerebellum (FDR < 0.05) and amygdala, hippocampus, ACC, basal ganglia, nucleus accumbens (p < 0.05) (Fig. 2E), suggesting that etiology of LFA may be widely associated with various brain subregions. We further evaluated expression levels of genes with LGD variants in LFA and HFA probands. We found that according to the normalized expression level, genes with LGD could be divided into two groups, a cortically expressed group and a sub-cortically expressed group (Fig. 2F). Interestingly, we found that the cortically expressed groups of HFA probands exhibited higher expression level in central, medial and lateral ganglionic eminence (CGE, MEG, LGE) where the GABAergic neurons are mainly formed prenatally (Fig. 2F-H). This observation suggested that the etiology of HFA may be associated with dysfunction of GABAergic neurons in the brain.
To further examine the expression patterns of genes with de novo variants in LFA and HFA at the single-cell resolution, we collected total 17434 transcriptomes from gestational week (GW) 09-26 of human fetus brains11, 12 and further classified them into sub-types according to marker genes (neural progenitor cells: PAX6, glutamatergic/excitatory cells: NEUROD2, SLC16A6, SLC16A7, GABAergic/inhibitory cells: DLX2, GAD1) (Fig. 3A, 3B).
Interestingly, we found that expression of genes with de novo variants (including LGD and missense variants) in LFA probands showed enrichment in sub-types of excitatory cells (Ex-1) and inhibitory cells (IN-2), whereas genes with de novo variants in HFA specifically enriched in a sub-types of neural progenitor cells (NPC-4) (Fig. 3c, 3d), which was previously reported to be a transient state of neural progenitors that expressed neural stem cell genes HES1 and VIM, intermediate progenitor cell genes EOMES and PPP1R17, neuronal genes STMN2, NEUROD1/2/6 and also the reported layer V gene NPR3 (Fig. 3c, 3d, Fig. S3a-b) 11, 12.
From regional aspects, we found that genes with de novo variants in LFA probands exhibited enrichment in precentral cortex (PRC), postcentral cortex (PC) and banks of superior temporal cortex (BST) regions (Fig. 3e, Table S2), which was consistent with our previous report 13 and suggested that dysfunction of primary somatosensory and motor cortex may be associated with LFA probands.
Gene discovery in Chinese ASD cohorts
In order to comprehensively evaluate the contribution of genes with de novo variants discovered in the Chinese ASD population, we combined the current cohort with a previous cohort containing 369 Chinese ASD probands with their parents13 and analyzed all genes with de novo variants in the dataset of 1141 ASD probands.
We identified 59 genes at the false discovery rate (FDR) < 0.3 with the TADA-based model, of which 22 passed FDR < 0.1 and 13 passed FDR < 0.059 (Fig. 4a, Table S3). Among ASD risk genes at the FDR threshold of 0.1 or less, 40 % (9 of 22) are not present in the SFARI gene set, suggesting that a substantial amount of ASD risk genes in Chinese ASD cohorts may be distinct from genes found in other populations previously studied. These new ASD risk genes from Chinese cohorts are enriched for functions in gene expression regulation, neural communication, cytoskeleton & endomembrane system and others (Fig. 4b).
To determine if de novo variants may be associated with IQ/DQ of probands, we first plotted the distribution of ASD probands according to the IQ/DQ range (Fig. 4c upper panel). Then we listed ASD probands carrying genes with recurrent de novo variants along the IQ/DQ axis (Fig. 4c lower panel). Interestingly, among numerous genes with recurrent de novo variants, some of them appeared only in probands with normal IQ/DQ (SLC35G1, RTEL1, SBNO2, PLEC, PLCB1) (Fig. 4c), suggesting that they are candidate risk genes for ASD without DD/ID. Because of differences in genetic penetrance, variants in the same gene may lead to various IQ/DQ in different individuals, such as NSD1 and RAI1 (Fig. 4c). This observation is consistent with previous discussion that it may be difficult to define the “autism-specific” genes since different variants in the same gene may lead to dramatically different symptoms including with or without DD/ID6. Thus, we took advantage of the opportunity to identify genes primarily contributing to ASD without DD/ID, by focusing on genes with genetic variants that will not lead to DD/ID symptoms in recurrent cases.
SLC35G1 is a novel candidate gene for ASD without DD/ID
We identified a novel ASD risk gene SLC35G1 with the high confidence (FDR = 0.058), which encodes a transmembrane protein transporting nucleotide sugar14. We identified two mutations, K120fs and A310P in two unrelated male ASD patients without DD/ID (Fig. 4c, Fig. 5a, 5b). The K120fs mutation is a frame-shifting variant caused by a deletion of 4 base pair (bp), in which 2 bp are located at the end of exon 1, and 2 bp located at beginning of intron 1 of the SLC35G1 gene. The K120fs mutant leads to pre-mature termination at the 156th amino acid of the SLC35G1 protein. The A310P mutation was predicted as damaging by a series bioinformatic databases including SIFT, Polyphen2, MutationTaster, PROVEAN and fathmmxMKL15-19. The alanine at position 310 of the SLC35G1 protein is conserved among zebrafish, xenopus, chicken, mouse, dog, monkey and human (Fig. 5a). It was previously reported that a variant at a splicing site in the SLC35G1 gene may be associated with ASD20. Moreover, the expression of Slc35g1 enriched in the human brain according to the single-cell sequencing analysis (Fig. S4a, b). Therefore, we speculate that loss-of-function variants of SLC35G1 may lead to ASD.
To investigate the functional consequence of SLC35G1 loss-of-function, we generated a Slc35g1 knockout mouse model using two sgRNAs targeting 5’UTR and 3’UTR of the Slc35g1 gene were used to generate a ~8.5kb deletion (Fig. 5c). We used RT-qPCR to confirm disruption of the Slc35g1 mRNA from brain tissues collected from WT, Slc35g1+/-, and Slc35g1-/- mice. We found that the mRNA level of Slc35g1 gene reduced to nearly 50% and 100% in heterozygous and homozygous deletion mice, indicating the success targeting of Slc35g1 (Fig. 5d).
We next examined Slc35g1 heterozygous mutant mice (Slc35g1+/-) with various behavioral tests, which mimicked people with ASD carrying heterozygous mutations of SLC35G1. Deficiency of social interaction and stereotypic behaviors are two core symptoms of ASD1. We first examined the social behaviors of mice using the classic three-chamber test21. We found that Slc35g1+/- mice decreased interaction time with partners comparing to wild-type (WT) mice (Fig. 5e, 5f), showing reduced social preference from 35.87% to 5.8% (Fig. 5g). Next, we found that Slc35g1+/- mice did not display preference for newly introduced mouse as WT mice did, suggesting that Slc35g1+/- mice exhibited defects in both recognizing mice over objects and novel partners (Fig. 5h). To further evaluate the reciprocal social interaction behaviors of Slc35g1+/- mice, we performed the intruder interactions test for Slc35g1+/- mice (Fig. 5j). Indeed, we found that Slc35g1+/- mice exhibited significantly less interaction time with intruder mice, in comparison with WT mice (Fig. 5k). These data strongly indicated that Slc35g1+/- mice exhibited deficiency in social interaction behaviors, mimicking the phenotype seen in people with ASD.
We then examined whether Slc35g1+/- mice exhibited stereotypic and repetitive behaviors by marble burying test and self-grooming test21. We found that Slc35g1+/- mice had more frequent marble-burying activity than WT mice (Fig. 5l, 5m), but normal self-grooming activity as WT mice (Fig. S4c), suggesting that Slc35g1+/- mice exhibited stereotypic and repetitive behaviors to some extent.
Finally, to examine whether Slc35g1+/- mice had abnormal anxiety level, which is commonly associated with people with ASD21, we used the open field test and elevated plus maze test. We found that Slc35g1+/- mice exhibited a normal level of anxiety level in both tests, comparing to WT mice (Fig. S4d-f). Lastly, we examined the spatial learning ability of Slc35g1+/- mice with the Barnes maze test. We found that Slc35g1+/- mice showed the same latency in the learning curve as WT mice, in consistent with the finding that human patients carrying SLC35G1 mutations exhibited normal IQ/DQ (Fig. S4g, h).