Effects of gene features on gene co-expression profiles
The potential effect of the four gene features, including mRNA abundance, mRNA size, gDNA size, and GC content of the coding region, on gene co-expression profiles was first analyzed. The BrainSpan human brain transcriptome dataset was used for this analysis. This dataset contains transcriptomes of human (both gender) brain tissues from 16 different brain regions of various developmental stages and ages (from PCW8 to 40Y). A total of 15,942 genes with information on all 4 gene features were identified and used for analyses. These genes were placed in ascending order of mRNA abundance, mRNA size, gDNA size, or GC content as gene lists (Additional file 2: Table S1). The correlation coefficient (CC) of each gene pair was calculated to reflect the co-expression level of the two genes, and the results were displayed in pseudo color-coded matrices. In each of the CC matrices (Fig. 1a), these 15,942 genes were placed in ascending order on both x and y axes from the lowest mRNA abundance and GC content or the smallest gDNA and mRNA size to the highest mRNA abundance and GC content or the largest gDNA and mRNA size. All four CC matrices were found to exhibit uneven color intensity in different areas with higher intensity corresponding to higher CC values. The overall color intensity was the highest in areas corresponding to medium mRNA abundance, medium to high gDNA or mRNA size, and low GC content (Fig. 1a). This result suggests that all four gene features affect gene co-expression profiles.
Most hcASDs are large genes with medium to high mRNA abundance levels, but with no clear bias in GC content (Additional file1: Fig. S1). To determine whether each of these four gene features affects the co-expression of a gene with the hcASD gene set as a whole, the co-expression coefficient (CEC, mean CC between a gene and each of the hcASD genes) of each of the 15,942 genes with the entire hcASD gene set was calculated (blue dots in Fig. 1b; Additional file 2: Table S1). In each of the 4 panels (Fig. 1b), the 15,942 genes were placed in ascending order (x-axis). A noise-reduced (by data averaging) CEC distribution curve was then generated by plotting the average CEC of a gene with its neighboring 20 (10 above and 10 below; ±10), 50 (±25), 100 (±50), or 200 (±100) genes on the gene lists under each gene ranking condition. Results showed a bell-shaped curve when genes were ranked by mRNA abundance, suggesting that genes with medium expression levels are more likely to co-express with the hcASD gene set (Fig. 1b, left panel). There was an overall positive correlation between gDNA or mRNA size of a gene and its CEC with the hcASD gene set (Fig. 1b, middle two panels). The CEC curve peaked at genes with approximately 40% GC content (x-axis between 1000-2000; y-axis between 0.28-0.31) and gradually declined with increasing GC content (Fig. 1b, right panel; Additional file 2: Table S1).
With cubic regression, each noise-reduced CEC distribution curve was found to have an R2 value > 0.9 (Additional file 1: Fig. S2;Additional file 3: Table S2), indicating a very high correlation between each of these gene features and the tendency of co-expression of a gene with the hcASD gene set. When the 15,942 genes were placed in stochastic (random) orders, CECs were evenly distributed, and the noise-reduced CEC distribution curves were flat (Fig. 1c).
Similar genome-wide gene co-expression profiles of the hcASD gene set were observed in transcriptomes of early (8PCW-2Y) and late (4Y-40Y) stages (Additional file 1: Fig. S3a), both gender, and different brain regions (Additional file 1: Fig. S3b, c). These findings suggest that the co-expression profile of hcASD genes is affected by all four gene features, regardless of developmental stages, gender, and brain areas.
The genome-wide co-expression profile of the hcASD gene set was then compared to those of gene sets with an equal number (64) of genes in the top, middle, or bottom positions of the gene lists corresponding to highest, medium, and lowest mRNA abundance and GC content or largest, medium, and smallest mRNA and gDNA sizes (Additional file 4: Table S3; Fig. 1d). When genes were ranked by mRNA abundance, the noise-reduced CEC distribution curve of the hcASD gene set largely overlapped with that of the gene set of median mRNA abundance levels (Middle in Fig. 1d); this result is consistent with the observation that most hcASDs are genes of moderate mRNA abundance. The noise-reduced CECs of the top mRNA abundance gene set (Top in Fig. 1d) were positively correlated with mRNA abundance, whereas those of the bottom mRNA abundance gene set (Bottom in Fig. 1d) were negatively correlated with mRNA abundance. When genes were ranked by gDNA or mRNA size, the noise-reduced CEC distribution curve of the hcASD gene set was most similar to, and higher than, that of the gene set of largest genes (Top in Fig. 1d). This result is also consistent with the fact that most hcASDs are large genes. When genes were ranked by GC content, the noise-reduced CEC distribution curve of the hcASD gene set greatly deviated from those of the Top, Middle, and Bottom gene sets (Fig. 1d), consistent with the lack of correlation with the GC content of hcASD genes.
Similar co-expression profiles of feature-matched gene sets
The genome-wide gene co-expression profile of the hcASD gene set was then compared to the profiles of 200 non-hcASD gene sets, each comprised equal number (64) of randomly selected and feature-matched non-hcASD genes under the four different gene ranking conditions (Fig. 2a). These gene sets were named “matched random” (mRand) gene sets (see methods). In general, the genome-wide CEC distribution of hcASDs was similar to that of each of the 200 mRand gene sets under all four gene ranking conditions. These findings suggest that gene sets with matched gene features have similar genome-wide co-expression profile as the hcASD gene set. However, genes with moderate mRNA abundance had higher noise-reduced CECs with the hcASD gene set than with any of the 200 mRand gene sets. In contrast, both low and high mRNA abundance genes (< 1 and > 30 RPKM; < 3000 and >14000 on x-axis) had lower noise-reduced CECs with the hcASD gene set than with most mRand gene sets. Moreover, genes with medium to large sizes had higher noise-reduced CECs with the hcASD gene set than with most size-matched mRand gene sets. Most genes, except those with highest GC content, had higher noise-reduced CECs with the hcASD gene set than with most GC content-matched mRand gene sets.
Co-expression of ASD risk genes
To determine whether hcASDs exhibit a significant tendency of co-expression with each other, the mean CEC of each of the 64 hcASDs with the hcASD gene set as a whole (hcASD-hcASD, see method) was compared to that of a large number of permuted gene sets, each comprised equal number of feature-matched non-hcASD genes (mRand-mRand) or randomly selected non-hcASD genes (Rand-Rand) and to the CEC between hcASD and mRand (hcASD-mRand) or Rand (hcASD-Rand) gene sets. Two hundred each of mRand and Rand gene sets were first analyzed. Results showed that feature-matched gene sets (mRand) had overall higher CECs than random gene sets (Rand) under all four matched conditions (@@@ in Fig. 2b), suggesting that genes of similar features tend to co-express with each other. The CEC of hcASD-hcASD (dashed line in Fig. 2b) was 1.5 times higher than the interquartile range [Q3 + 1.5 x (Q3-Q1), upper fence] of the CECs of mRand-mRand, Rand-Rand, hcASD-mRand, and hcASD-Rand gene sets. This result suggests that hcASDs have a significantly greater co-expression tendency with each other than other feature-matched non-hcASD genes or randomly selected genes. Results of the Grubbs’ test confirmed this tendency (*** in Fig. 2b). To corroborate this finding, permutation test was further conducted with 100,000 permuted sets of genes with matched or non-matched features (see method). The CEC of hcASD-hcASD was still found to be significantly larger (permutation P-value < 0.001) than that of hcASD-mRand, mRand-mRand, hcASD-Rand, or Rand-Rand (### in Fig. 2b), indicating a significant co-expression tendency of hcASDs.
Significant co-expression of hcASDs was also observed in transcriptomes of brain tissues from both early (8PCW – 2Y) and late (4Y – 40Y) stages (Additional file 1: Fig. S4a, b), both gender (Additional file 1: Fig. S4c, d), and different brain regions (Additional file 1: Fig. S5a-d). These results indicate a highly conserved mechanism for co-expression of hcASDs. Combined ranking of -log10 p-values of the Grubbs’ test under all four different matched conditions was then performed to determine the relative significance level of co-expression of hcASDs with each other in different brain regions (Additional file 1: Fig. S5e). The top five brain regions with the highest significance levels were cerebellum (CB), dorsal frontal cortex (DFC), orbital frontal cortex (OFC), primary sensory cortex (S1C), and striatum (STR); these are the brain regions previously implicated in ASD [28-34]. These results suggest that hcASDs play important roles in the development and function of these ASD-relevant brain regions.
ASD-relevant pathways identified by MGCA
A gene whose CEC with hcASDs was significantly higher than its CECs with permuted sets of feature-matched genes (P < 0.001) was considered as significantly co-expressed with hcASDs. Results of this matched-gene co-expression analysis (MGCA) showed that 3931, 3330, 5629, and 5854 genes were significantly co-expressed with hcASDs under each of the four matched conditions, respectively (Fig. 3a; Additional file 5: Table S4), with a false discovery rate below 5% (FDR < 0.05).
Altogether, 2515 genes were found to significantly co-express with hcASDs under all four matched conditions with an estimated FDR of each gene below 6.25 x 10-6 (0.054); this gene set was named TetraM-2515 (Fig. 3a; Additional file 5: Table S4). TetraM-2515 was then compared with 2515 genes that had the highest CECs with the hcASD gene set (referred to as Top-2515 gene set, Additional file 5: Table S4). TetraM-2515 and Top-2515 gene sets had 1500 genes in common (Overlapped), and each had 1015 non-overlapped genes. These two non-overlapped gene sets were named TetraM-only and Top-only, respectively (Fig. 3b; Additional file 5: Table S4). Most Top-2515 genes had a medium mRNA abundance level, a large gene size, and a high CEC value (> 0.4), whereas TetraM-2515 genes had a broad range of mRNA abundance, gene size, and CEC values (Fig. 3b). TetraM-2515 genes and Top-2525 genes had 34 and 32 genes overlapped with hcASD genes, respectively.
Gene ontology (GO) enrichment analysis of the TetraM-2515 gene set showed significant over representation of genes in molecular pathways closely related to ASD, including covalent chromatin modification, protein polyubiquitination, homophilic cell adhesion, axon guidance, negative regulation of dendrite development, synapse assembly, Wnt signaling pathway, and RNA splicing. The Top-2515 gene set also showed significant enrichment of genes in several pathways relevant to ASD, including covalent chromatin modification, mRNA slicing, protein ubiquitination, and Wnt signaling pathway (Fig. 3c; Additional file 6: Table S5).
To investigate the functional relationship between hcASD and TetraM-2515 or Top-2515 gene sets, an integrated GO enrichment network of multiple gene sets was constructed [35]. Genes of TetraM-only, Top-only, Overlapped, and the hcASD gene sets were divided into 130 nodes based on GO and KEGG annotations. Nodes were connected based on the similarities between node pairs (Kappa similarity > 0.3). These 130 nodes formed 20 networks. Based on the number of genes from each of the four gene sets in each node, some networks were found to be dominated by genes from one of these four gene sets (Fig. 3d;Additional file 7: Table S6). None of hcASDs were in networks dominated by Top-only genes. The hcASD-dominated network was found to connect with networks dominated by TetraM-only genes or Overlapped genes (Fig. 3d; Additional file 7: Table S6), suggesting that hcASDs have a closer functional relationship with TetraM-2515 genes than with Top-2515 genes.
MGCA was performed to further analyze an expanded set of ASD risk genes containing 1166 non-redundant ASD risk genes from ten different sets of previously reported ASD risk genes [6, 17, 25, 36-41] (Additional file 8: Table S7). An integrated GO enrichment network of this combined ASD gene set (cASD) along with TetraM-only, Top-only, and Overlapped gene sets was constructed. Results showed that all gene sets were converged separately in a subset of functional pathways; however the pathway patterns of TetraM-only and Top-only gene sets were largely complementary to each other (Additional file 1: Fig. S6a). The pathways of TetraM-only genes were related to brain developmental processes, such as dendrite development, synapse development, and neuronal projection morphogenesis and that of Top-only genes were related to mRNA processing and covalent chromatin modification. (Additional file 1: Fig. S6a; Additional file 9: Table S8). More co-localized nodes of cASD and TetreM-only genes than those of cASD and Top-only genes were seen in the network graph (Additional file 1: Fig. S6b;Additional file 9: Table S8). These results suggest that cASD genes have a closer functional relationship with TetraM-only genes than with Top-only genes.
Co-expression of cadherin genes with hcASDs
Consistent with previous findings [7], MGCA revealed that homophilic cell adhesion is the most significantly over-represented pathway of TetraM-2515 genes (Fig. 3c, d; Additional file 6: Table S5; Additional file 7: Table S6). Some cadherin family members, such as CDH2, in the TetraM-2515 gene set are known to be high risk genes of ASD (Additional file 10: Table S9) that play important roles in brain circuit development [24]. Several cadherin family members were also found in the TetraM-2515 gene set, including many members of the protocadherins β gene cluster and Dachsous Cadherin-related 1 (DCHS1), suggesting that these genes also participate in the development and function of ASD-relevant brain circuits. Some cadherin genes were not significantly co-expressed with hcASDs under any of the matched conditions; these genes were referred to as tetra-negative genes (TetraN; Additional file 10: Table S9). Several recent genetic studies have implicated two type II cadherins, CDH11 and CDH9, in ASD and other psychiatric diseases [42-46]. As CDH11 and CDH9 belonged to the TetraM and TetraN gene sets, respectively, we hypothesized that CDH11, but not CDH9, is more likely to be an authentic ASD risk gene.
Autistic-like traits of Cdh11-null mice
To assess the functional relevance of CDH11 and CDH9 to ASD, the behaviors of Cdh11 knockout (KO) and Cdh9 KO mice were investigated. In the open field test (OFT), both male and female Cdh11-null mice spent a longer time exploring the central area of the open field arena than wild type (WT) littermates (Fig. 4a, d). Heterozygous littermates showed a similar but less significant pattern. Total locomotion and average moving speed of Cdh11 KO mice were slightly reduced compared to WT littermates (Fig. 4b, c). Both male and female Cdh9-null mice were largely normal in the OFT (Fig. 4e-g).
In the elevated plus maze test, female Cdh11-null mice visited the open arm more frequently and spent a significantly longer time there. Heterozygous females spent a slightly but not statistically significant more time in the open arm (Fig. 4h, i). The increased time and frequency of open arm exploration by female Cdh11-null mice is consistent with the results of a previous study using the same mouse line of mixed gender [47]. Male Cdh9-null mice showed longer exploration of the open arm, but female Cdh9-null mice did not, although female heterozygotes showed an increased frequency of open arm entry (Fig. 4j, k).
Individuals with ASD often have a weaker grip strength than age-matched controls [48]. The gripping strength test and the horizontal bar test showed that both male and female Cdh11-null mice exhibited significantly shorter hanging duration than WT littermates (Fig. 5a, b), indicating reduced gripping strength and/or impaired motor coordination. The gripping strength of Cdh9-null mice was normal (Fig. 5c).
The rotarod test was conducted to evaluate motor-related functions of KO mice. Since female and male mutant mice displayed similar behaviors in most of the above behavioral tests, only female mice were analyzed in this test. Compared to WT littermates, Cdh11-null mice, but not Cdh9-null mice, stayed longer on the rotarod and endured a higher rotation speed in the initial trial (Fig. 5d-g). In subsequent trials, Cdh11-null mice did not display significant improvement in performance (Fig. 5d, e), indicating impaired motor learning. The enhanced performance of Cdh11-null mice in the initial trial was very similar to the phenotype of several other well-characterized ASD mouse models and suggested increased repetitive motion of these mutant mice [49].
Repetitive behaviors were then evaluated by measuring the duration and frequency of self-grooming within a 10-minute period, during which mice were placed in a novel or a relatively familiar environment. As shown in Fig. 5h-i, during the first 10 minutes of exploring a novel chamber, Cdh11-null mice exhibited a significantly greater frequency of self-grooming than WT littermates, indicating elevated repetitive behavior in a novel environment. Cdh11-null mice also showed a significantly higher frequency of self-grooming than WT littermates during the second 10-minute period (Fig. 5j, k), indicating elevated repetitive behavior even in a relatively familiar environment. No such behavioral alteration was observed in Cdh9-null mice (Fig. 5l, m).
The modified three-chamber social preference test was conducted to evaluate the sociability of mutant mice. One main modification was an enlargement of the area for housing social partner mice in order to reduce their potential stress and anxiety. Another major modification to the protocol was using three mice instead of a single mouse as social partners. This was done to increase the availability of social cues and reduce the variability of test results caused by differences in the sociability of individual social partners (Fig. 6a). In addition, the two side chambers were covered on the top to slow the diffusion and mixing of odorant cues. Results showed that female Cdh11-null mice exhibited a significant preference to social partner mice than to an object and to novel partners than to familiar ones (Fig. 6b, c). However, compared to WT littermates, mutant mice spent significantly longer time in the middle chamber but significantly shorter time interacting with partner mice (Fig. 6b, c), indicating reduced sociability. In contrast, Cdh9-null mice did not show any abnormality in this test (Fig. 6d, e).