DNVs in our Chinese cohort
In this study, we sequenced 547 target genes in 1,102 Chinese patients with NDDs and identified a set of predicted potential functional variants, including PTVs and Dmis (Fig. 1 and Table S1). Using Sanger sequencing, we successfully validated 1,271 potential functional variants, including 108 de novo variants (54 de novo PTVs and 54 de novo Dmis), 975 inherited variants (156 inherited Dmis and 819 inherited PTVs), and 188 undetermined variants (36 PTVs and 152 Dmis) (Fig. S1 and Table S2). We found that 108 DNVs in 78 genes appeared in approximately 10.91% (102/935) of patients in our Chinese cohort. Among these 78 genes, 36 genes with DNVs are the first to be reported in Chinese patients. Furthermore, we revealed that 21 genes carried multiple DNVs, including 28 de novo PTVs and 23 de novo Dmis (Table 1). Both SCN2A and MECP2 were the most frequently mutated genes, each carrying one de novo PTV and three de novo Dmis (Table 1). Five genes (MED13L, GRIN2B, KCNQ2, CTNNB1, and TCF20) carried three DNVs in our Chinese cohort, while another 14 genes (ASH1L, SATB2, NRXN1, BCL11A, ADNP, SHANK3, MSL2, SYNE1, SYNGAP1, BRAF, GATAD2B, LLGL1, SLC2A1, and KDM5C) carried two DNVs (Table 1). We manually collected 805 known candidate genes that have either been reported in studies on large NDD cohorts or have strong evidence associating them with NDDs in the OMIM or SFARI Gene databases[17,21-23,42,44,41,40,43,45] (Table S3). We found that, of 21 genes with multiple DNVs that we identified, 20 are classified as known candidate genes (Table S4). For example, TCF20, which carries three de novo PTVs (p.S1803Vfs*6, p.C1795Wfs*13, and p.R1907X), was first reported in a Chinese cohort. Interestingly, we identified a potential novel NDD risk gene (MSL2) that carries two de novo PTVs (p.S560Ifs*11 and p.L266Vfs*3). This study is the first to report an association between MSL2 and NDDs.
We then employed probability of loss-of-function intolerance (pLI)[47] analysis from ExAC and residual variation intolerance scores (RVIS)[48] to evaluate the functional impact of genes with multiple DNVs. According to the scores of pLI and RVIS, we ranked them and used percentiles as an indicator of gene intolerance. Genes with low-percentile RVIS or pLI were more likely to be intolerant to genetic variants. As expected, it was shown that genes with multiple DNVs both exhibited significantly lower percentile pLI (p = 4.10 × 10-10) and RVIS (p = 1.55 × 10-10) than background genes (Fig. S2a). In particular, 20 of the 21 multiple DNV genes ranked in the top 50% of pLI and RVIS (Table 1), suggesting that they are less tolerant of damaging variants.
Inherited X-linked hemizygous variants
We identified seven inherited X-linked hemizygous variants in six genes (Table S5). Among these seven variants, three variants, including two Dmis (p.R270C on PTCHD1 and p.R197H on SLC16A2) and one splicing variant (c.1030-1G>C on SLC9A7), were recorded in the dbSNP or gnomAD database. The remaining four variants, including three Dmis (p.Q153R on ATP6AP2, p.R287C on ARHGEF9, and p.C517Y on PLXNA3) and one frameshift (p.N136Kfs*30 on SLC16A2) are reported here for the first time. Compared with the 805 known candidate genes, we note that five of the above genes are classified as known candidate genes and three (PTCHD1, ATP6AP2, and SLC9A7) are reported for the first time in Chinese patient with NDDs. In addition, we identified a novel candidate gene (PLXNA3) carrying a novel hemizygous missense variant (p.C517Y). A previous study has described an ASD patient from Maghreb carrying a rare inherited missense variant (p.D863E) in PLXNA3[49], suggesting that it may contribute to autism.
Prioritisation of candidate genes
Previous studies have demonstrated that both DNVs and rare inherited variants (RIVs) contribute significantly to NDDs[50-55]. Therefore, we integrated data on DNVs and RIVs and employed TADA[38] to prioritise candidate genes in NDDs. Based on DNVs and RIVs identified in our study, we prioritised 17 candidate genes (SCN2A, KCNQ2, SATB2, SHANK3, GATAD2B, NRXN1, MED13L, SYNGAP1, BRAF, MECP2, GRIN2B, TCF20, CTNNB1, ASH1L, ADNP, BCL11A, and LLGL1) with FDR values < 0.1 (Table S6). To increase the power of candidate gene detection[17,22,43,56-58], we integrated data from public datasets of NDD cohorts (N = 16,807) with that of controls (N = 3,391) from the Gene4Denovo[39] database (Table S7), and prioritised 208 candidate genes with FDR values < 0.1, in which 193 candidate genes reached an FDR < 0.05 (Table S8). Together with the six genes characterized by inherited X-linked hemizygous variants, in total, we prioritised 212 NDD candidate genes, with 17 genes defined as novel candidate genes (including 13 genes with FDR values < 0.05, three genes with 0.05 ≤ FDR < 0.1 (Table 2), and one gene with an inherited X-linked hemizygous variant). We observed that the 212 candidate genes prioritised in our study exhibited significantly higher pLI scores (p < 2.20 × 10-16) and lower RVIS (p < 2.20 × 10-16; Fig. S2b), consistent with the result regarding genes with multiple DNVs. A similar result was observed among the 17 novel candidate genes (p = 0.0394 and p = 1.74 × 10-4; two-tailed Wilcoxon rank-sum test; Fig. S2c), suggesting that these novel NDD candidate genes were likely to be intolerant of functional variants.
In addition, we found that DNVs and inherited X-linked hemizygous variants in 212 candidate genes were identified in ~10.59% (99/935) and ~0.94% (7/745) of patients in our study, respectively (Table 3). DNVs and inherited X-linked hemizygous variants in known candidate genes accounted for ~10.05% (94/935) and ~0.81% (6/745) of patients, whereas in novel candidate genes these variants accounted for ~0.53% (5/935) and ~0.13% (1/745) of patients, respectively. Inherited or state unknown variants in all candidate genes were detected in ~35.75% (394/1102) of all patients (Table 3).
Functional characteristics of prioritised candidate genes
We next employed MetaScape[59] to perform functional enrichment analysis. As expected, these candidate genes were significantly enriched in NDD-associated pathways, such as synapse organisation, covalent chromatin modification, head development, behaviour, and regulation of ion transport[17,60-64] (Fig. S3). Interestingly, the novel candidate genes were also involved in similar biological pathways. For example, MSL2 was implicated in covalent chromatin modification, LRRC4 was involved in chemical synaptic transmission, and PLXNA3 was associated with head development and axonogenesis.
We also performed functional cell- and tissue-specific enrichment analyses[65,66] to investigate whether candidate genes were associated with specific tissues or cells. As expected, the 212 NDD candidate genes tended to be enriched in the cerebellum, cortex, and striatum across the early to mid-fetal, mid-late childhood, and young adulthood stages of development (Fig. S4a). In the cell-specific enrichment analyses, we observed a highly significant enrichment in Drd1+ and Drd2+ medium spiny neurons of neostriatum and rods (Fig. S4b), similar to data reported in a previous study[21]. These results suggest that the 212 NDD candidate genes and novel candidate genes are functionally associated with the etiology of NDDs.
Functional network analysis between known and novel candidate genes
To investigate correlations between novel and known candidate genes, we performed a permutation test to estimate the relationship between these genes based on co-expression gene pairs identified from the BrainSpan atlas. We found that nine of the 17 novel candidate genes (p = 0.013, permutation test, Fig. S5a) were co-expressed with 149 known candidate genes (p = 0.01, permutation test, Fig. S5b), with 217 connections between them (p = 5.40 × 10-3, permutation test, Fig. S5c), suggesting that the novel candidate genes are significantly co-expressed with the known candidate genes and are more likely to be related to the pathology of NDDs.
To further investigate the functional relationship between novel and known candidate genes, we constructed a functional network by integrating PPI data from STRING and brain expression data from BrainSpan. Only known candidate genes directly interacting/co-expressed with at least two novel candidate genes were added to the network. The co-expressed/PPI network encompassed 78 genes, including 10 novel candidate genes and 68 known candidate genes (Fig. 2). These genes were enriched in several biological processes known to be related to NDDs, such as covalent chromatin modification (GO:0016569, p = 1.55 × 10-12, Fisher’s exact test), dendritic spine development (GO:0060996, p = 4.74 × 10-6, Fisher’s exact test), and brain development (GO:0007420, p = 1.92 × 10-9, Fisher’s exact test). Furthermore, these genes showed a significant enrichment of previously reported gene sets: FMRP targets[67] (p = 9.82 × 10-13, Fisher’s exact test), genes involved in synaptic genes[68] (p = 0.012, Fisher’s exact test), and genes essential in mice[69] (p = 1.14 × 10-18, Fisher’s exact test). It is worth noting that 10 novel candidate genes (UBR3, PLXNA3, ITSN1, POLR3A, MSL2, LRRC4, PSD3, DCX, SPAG9, and SMAD6) were significantly involved in functional clusters known to be associated with NDDs. For example, MSL2, UBR3, and PLXNA3 are involved in chromatin organization, while ITSN1, POLR3A, LRRC4, UBR3, and PLXNA3 have been implicated in nervous system development. In addition, we found that 10 novel candidate genes showed co-expression/interaction with more than three known candidate genes (Fig. S6). For example, MSL2, which was the most frequently connected novel gene, was co-expressed/interacted with 35 known candidate genes. DCX, ITSN1, and POLR3A were connected with more than 10 known candidate genes. These results suggest that novel candidate genes are functionally associated with known risk genes and that these 10 novel candidate genes may have a stronger influence in the etiology of NDDs.