Identification of circQTLs in ASD brain
We first collected rRNA-depleted paired-end RNA-seq data 5 and genotyping data 26 of postmortem samples from frontal cortex (FC) (Brodmann area 9), temporal cortex (TC) (Brodmann area 22, 41, and 42), and cerebellar vermis (CV) from individuals with ASD and non-ASD controls (the Synapse database) (Table 1). To identify local genetic effects that regulate circRNA expression, we considered the RNA-seq data and the corresponding genotyping data, both of which were derived from the same individuals (Fig. 1a, Table 1, and Supplementary Table 1). The examined circRNAs (1,060 circRNAs; Supplementary Table S1) were extracted from our previous study 10 based on the RNA-seq data used in this study. The expression levels of circRNAs were calculated using the number of supporting circRNA junction reads per million uniquely mapped reads (RPM) 27 and adjusted for covariates such as the corresponding host gene expression, brain region, diagnosis, and other biological/technical confounding factors (see Methods and Supplementary Table 1). For accuracy, the genotyped and imputed SNPs should be satisfied imputation score ³ 0.8 and estimated minor allele frequency ³ 0.01. To focus on cis-effects of circQTLs, we limited our analysis to SNPs in ± 200 kb nucleotides of each back-splice site. CircQTLs were evaluated by testing the correlations between the imputed genotype dosages and covariate-adjusted circRNA expression using Matrix eQTL 28. P values were adjusted across all circQTLs using false positive rate (FDR) correction (Methods). Significant circQTLs were determined if FDR values were less than 0.05. We thus identified 989 circQTLs associated with 81 circRNAs (Supplementary Table 2).
Recent transcriptomics studies of ASD showed that expression profiles of transcripts (including circRNAs 10, miRNAs6, and mRNAs29,30) and allele-specific genes 26 were highly similar between the two cortical regions (FC and TC) but were quite distinct in the CV. We calculated the circQTL effects for the 989 circQTLs in the three brain regions, respectively. We found that the circQTL effect size and direction were highly concordant between the two cortical regions and relatively weakly concordant (or even not concordant) between the cortex and CV (Fig. 1b). The differences in circQTL effect size between the cortex and CV were more significant than that between the two cortical regions (Fig. 1c). We also performed principal component analysis (PCA) and observed the similar results for the expression profiles of the 81 circRNAs (Fig. 1d). Previous studies have reported that the ASD pathophysiology was present in the cortex 31,32 and transcriptional dysregulation in ASD were shown to be consistently stronger in the cortex than in the cerebellum 5. These observations suggest that the cortical regions are more selectively vulnerable to transcriptomic changes than the cerebellum. Therefore, we focused on the RNA-seq data and the corresponding genotyping data from the cortex (FC/TC) samples (105 samples from 53 ASD cases and 52 controls; Table 1) for the subsequent circQTL analyses. For accuracy, we only considered the identified circQTL SNPs (666 circQTLs associated with 48 circRNAs; Supplementary Table 2) in which the number of individuals with the minor allele (heterozygotes and minor allele homozygotes) was larger than three.
We then studied the relationship between circQTLs and circRNA formation. We first showed that the identified circQTLs were preferentially located in the flanking sequences close to the back-splice donor/acceptor sites (Fig. 1e). The majority of circQTLs (90.5%; 603 out of 666; Supplementary Table 2) were located in the flanking sequences of back-splice sites. These observations were consistent with previous studies for distance distribution analysis of circQTLs in other human samples 33,34. Next, back-splicing was reported to be facilitated by reverse complementary sequences (RCSs) residing in the introns flanking circularized exons 27,35,36. We found that circQTLs residing in the flanking sequences of back-splice sites were significantly enriched in RCSs compared with background (empirical P<10-4; see Methods) and non-circQTL SNPs (P=3.2´10-7 by two-sided Fisher’s exact test; Fig. 1f). Of note, the examined non-circQTL SNPs were also located in the flanking sequences of the examined back-splice sites and not determined as circQTLs. Moreover, it was shown that RNA binding proteins (RBPs) can regulate back-splicing by binding to the flanking sequences 20,21. Indeed, the circQTLs located in the flanking sequences were significantly enriched in RBP binding sites compared with background (empirical P<10-4) and non-circQTL SNPs (P=0.01; Fig. 1g). These observations support that circQTLs located in the flanking sequences of back-splice sites may contribute to circRNA formation.
Mediation effects of trans-eQTLs (circQTLs) via circRNA expression
We next examined whether the identified circQTLs were also trans-eQTLs affecting the expression of distant genes (trans-eGenes) using Matrix eQTL. By definition, such circQTL SNPs and trans-eGenes should be located on different chromosomes or the same chromosome separated by a distance greater than 5Mb (from the SNP site to the transcription start site of the trans-eGene). Since some SNPs in high linkage disequilibrium (LD) were associated with different trans-eGenes, non-pruned circQTLs were considered in this study. With controlling for potential biological/technical confounding factors and multiple testing correction (Methods), 546 circQTLs (associated with 47 circRNAs) were identified to be also trans-eQTLs, which affected the expression of 7,165 trans-eGenes and constructed 43,372 circQTL-trans-eGene pairs (Fig. 2a; Supplementary Table 3). Meanwhile, these circQTL-trans-eGene pairs also resulted in 13,379 circRNA-trans-eGene pairs, where the expression levels of circRNAs and trans-eGenes were both regulated through the same circQTLs (Fig. 2a). We then explored the causes for the trans-effect of circQTLs on the expression of trans-eGenes. In addition to the direct effect of circQTLs on trans-eGene expression, we hypothesized that in some cases the expression of trans-eGenes were indirectly regulated through the expression of circRNAs near the circQTLs (i.e., the “mediation effect”; Fig. 2a). For the scenario of mediation effects, the circRNAs served as cis-mediators of trans-eQTLs. We first showed that the expression levels of circRNAs and trans-eGenes in the 13,379 circRNA-trans-eGene pairs were more likely to be correlated with each other than expected (empirical P<10-4; see Supplementary Fig. 1 and Methods). To test the mediation effects of trans-eQTLs, we performed mediation analyses37 for all 43,372 circQTL-trans-eGene pairs to identify the proportion of association between a circQTL and the trans-eGene that was caused by the effect of the circQTL on the corresponding circRNA expression. We found that 19,393 out of the 43,372 (45%) circQTL-trans-eGene pairs passed the mediation test (Supplementary Table 3), which were significantly mediated (P and FDR<0.05 with adjusting for related covariates; see Methods) by expression of circRNAs near the circQTLs. The majority (67%, 13,023 pairs) of the 19,393 pairs exhibited that more than 10% of the circQTL-trans-eGene association were mediated by the expression of circRNAs; in some cases (2,099 pairs), the proportion of mediation of the circQTL-trans-eGene association by the circRNAs was even more than 50% (Fig. 2b).
Regarding the 19,393 circQTL-trans-eGene pairs (or 19,393 circQTL-circRNA-trans-eGene axes), we observed that the percentages of circQTL-trans-eGene pairs passing the mediation test markedly increased with increasing significance levels of trans-eQTL effects for circQTL-trans-eGene pairs (Fig. 2c). Such a percentage even reached more than 70% if FDR values of trans-eQTL effects were smaller than 10-8. We further examined the relationship between mediation effects and the significance levels of circRNA-trans-eGene correlation of expression profile. We found 55% of the circRNA-trans-eGene pairs passing the mediation test if the circRNA expression and trans-eGene expression were significantly (Spearman’s P<0.05) correlated with each other (Fig. 2d), whereas such a percentage was significantly reduced from 55% to 40% (odd ratio=0.56 and P<2.2´10-16 by Fisher’s exact test) if the significant correlation between circRNA and trans-eGene expression disappeared (Fig. 2d). The percentages of the circQTL-trans-eGene pairs passing the mediation test markedly increased with increasing significance levels of the correlation of expression profile between circRNAs and trans-eGenes (Fig. 2d). Such a percentage even reached 90% at Spearman’s P value <0.0005. Similar results were observed in the analyses based on the CV samples (Supplementary Fig. 2 and Supplementary Table 3). These results reveal that the mediation effects of the circQTLs (trans-eQTLs) on circRNA expression are correlated with the magnitudes of trans-eQTL effect and circRNA-trans-eGene correlation of expression profile.
Causal effects between circQTLs and ASD diagnosis
Regarding the identified 43,372 circQTL-trans-eGene pairs, we then conducted causal inference test 38,39 (CIT; see Methods) to examine whether the trans-effects of circQTLs on trans-eGene expression might explain the association between the circQTLs and diagnosis status (ASD vs. non-ASD) and infer the direction of association between circQTLs, trans-eGene expression, and ASD diagnosis (Fig. 3a, top). At P and FDR<0.05, we identified 1,000 circQTL-trans-eGene pairs with a propagation path from circQTL SNP to ASD diagnosis via trans-eGene expression (Fig. 3a, bottom; Supplementary Table 4). We speculated that the trans-eGenes (708 genes; CIT-passing trans-eGenes) involved in the 1,000 circQTL-trans-eGene pairs may be implicated in ASD. We examined enrichment analysis for genes previously implicated in ASD from Simons Foundation Autism Research Institutive (SFARI) 40. We observed that the CIT-passing trans-eGenes were significantly enriched for the SFARI genes, but not for genes implicated in monogenetic forms of other brain disorders (Fig. 3b). These CIT-passing genes were also enriched for other classes of ASD-relevant genes, including genes encoding postsynaptic density (PSD) proteins 41, genes whose transcripts were bound by the RBPs of FMR1 42, RBFOX1 43, and ELAVL1 44, and differentially expressed genes (DEGs) in ASD 5 that were derived from the same cortex samples used in this study (Fig. 3b). In contrast, these trends were not observed in the trans-eGenes not passing CIT (Fig. 3b). Moreover, a previous study has presented a genome-wide prediction of ASD risk genes and provided an estimated probability of ASD-association for each gene 45. On the basis of the probabilities of ASD risk, we observed the CIT-passing genes indeed had a significantly higher probability of ASD risk compared with background gene set (empirical P<0.001; Supplementary Fig. 3) and the trans-eGenes not passing CIT (all P < 10-8 by two-sided Fisher’s exact test; Fig. 3c). We further found that the CIT-passing trans-eGenes were more intolerant of a loss function mutation (measured by gene variant intolerance (pLI) scores 46) compared with the background gene set (Supplementary Fig. 3) and the trans-eGenes not passing CIT (Fig. 3d), reflecting previous observations that genes implicated in ASD tended to be subject to stronger selective constraints than the other genes 45,47. Regarding DEGs in a cell type-specific manner, the CIT-passing genes were most overrepresented in vasoactive intestinal polypeptide (VIP)-expressing interneurons and L2/3 and L4 excitatory neurons. This trend was generally consistent with the previous observation for SFARI genes 48. Gene Ontology (GO) analysis revealed that the CIT-passing genes were enriched in GO terms related to chemical synaptic transmission, synaptic signaling, neurogenesis, synapse, neuron projection, axon, and so on (Fig. 3f and Supplementary Table 4), also reflecting their enrichment for PSD. These observations support the relevance of the CIT-passing trans-eGenes to ASD pathophysiology.
Furthermore, the above mediation analysis has identified 19,393 circQTL-circRNA-trans-eGene axes with significant mediation effects of circQTLs on trans-eGene expression through circRNA expression (Fig. 2b and Fig. 4a, middle). By integrating the CIT-passing circQTL-trans-eGene pairs with the 19,393 circQTL-circRNA-trans-eGene axes, we determined 257 circQTL-circRNA-trans-eGene-ASD diagnosis propagation paths, where the circRNA-trans-eGene axes may act as causal mediators for the circQTL-ASD diagnosis associations (Fig. 4a, left; Supplementary Table 4). The 257 ASD-associated circQTL-circRNA-trans-eGene regulatory axes involved 33 circQTLs (trans-eQTLs), 8 circRNAs, and 124 trans-eGenes (Figs. 4a and 4b). We found that 97 of 124 (78%) trans-eGenes were ASD-relevant genes and 217 of 257 (84%) circQTL-circRNA-trans-eGene axes were associated with the 97 ASD-relevant genes (Fig. 4b). Particularly, 10 of the 97 genes were SFARI genes and five were top SFARI genes with score = 1 or 2, suggesting the regulatory role of the corresponding circQTL-circRNA-trans-eGene axes in ASD brain.
Potential mediators for the circRNA-trans-eGene interactions
Regarding the 19,393 circQTL-circRNA-trans-eGene axes (associated with 5,669 circRNA-trans-eGene axes) passing the mediation test, we proceeded to identify potential mediators for the circRNA-trans-eGene interactions (Fig. 5a, left). Since circRNAs and mRNAs may carry the common miRNA/RBP target sites, numerous cases of circRNAs were demonstrated to act as an upstream regulator of mRNAs through mediating miRNA/RBP activities 20,21. To this end, we utilized crosslinking immunoprecipitation (CLIP)-seq data-supported RNA interactomes (extracted from ENCORI 49) to search for the common miRNA/RBP target sites of the 5,669 circRNA-trans-eGene axes (see Methods). We thus identified 3,141 circRNA-miRNA-trans-eGene axes (associated with 958 circRNA-trans-eGene axes; Fig. 5a, middle) and 54 circRNA-RBP-trans-eGene axes (associated with 30 circRNA-trans-eGene axes; Fig. 5a, right), respectively (Supplementary Table 5). Regarding the 257 abovementioned circQTL-circRNA-trans-eGene axes that passed both the mediation test and CIT (see Fig. 4a, right and Fig. 5b, left), we further found that 90 axes (Fig. 5b, right) overlapped with the 958 circRNA-trans-eGene axes with common miRNA target sites of the circRNAs and trans-eGenes and no axis overlapped with the 30 circRNA-trans-eGene axes with common RBP target sites of the circRNAs and trans-eGenes. Accordingly, 158 circQTL-circRNA-miRNA-trans-eQTL axes that might be causally related to ASD diagnosis were generated (Figs. 5b and 5c; Supplementary Table 5). The 158 axes were associated with 28 circQTLs, 6 circRNA, 17 miRNAs, and 34 trans-eGenes (Fig. 5b, right). Of note, 28 of the 34 trans-eGenes were ASD-relevant genes and one (ANK3) was a top SFARI gene with score = 1 (Fig. 5c). A previous study reported that genetic variations in ANK3 were associated with autism susceptibility 50. ANK3 can regulate the structure and function of glutamatergic synapses 51. Alteration of ANK3 expression can regulate neuronal microtubule dynamics functions and affect the corresponding role in psychiatric illness 52. Taken together, we provided a framework for detecting potentially causal relationships between genetics and trait and generating the corresponding regulatory axes in ASD.