Nexus between genome-wide copy number variations and autism spectrum disorder in Northeast Han Chinese population

Background Autism spectrum disorder (ASD) is a common neurodevelopmental disorder with an increasing prevalence worldwide. Copy number variation (CNV), as one of the genetic factors, is involved in ASD etiology. However, there exist substantial differences in terms of location and frequency of some CNVs in the general Asian population. Whole-genome studies of CNVs in Northeast Han Chinese samples are still lacking, which motivated us to investigate the characteristics of CNVs in a Northeast Han Chinese population with clinically diagnosed ASD.
Methods We performed genome-wide CNV screening in Northeast Han Chinese individuals with ASD using array-based comparative genomic hybridization.
Results We found that 22 kinds of CNVs (6 deletions and 16 duplications) were potentially pathogenic. These CNVs were distributed in chromosome 1p36.33, 1p36.31, 1q42.13, 2p23.1-p22.3, 5p15.33, 5p15.33-p15.2, 7p22.3, 7p22.3-p22.2, 7q22.1-q22.2, 10q23.2-q23.31, 10q26.2-q26.3, 11p15.5, 11q25, 12p12.1-p11.23, 14q11.2, 15q13.3, 16p13.3, 16q21, 22q13.31-q13.33, and Xq12-q13.1. Additionally, we found 20 potential pathogenic genes of ASD in our population, including eight protein-coding genes (six duplications [DRD4, HRAS, OPHN1, SHANK3, SLC6A3, and TSC2] and two deletions [CHRNA7 and PTEN]) and 12 microRNAs-coding genes (ten duplications [MIR202, MIR210, MIR3178, MIR339, MIR4516, MIR4717, MIR483, MIR675, MIR6821, and MIR940] and two deletions [MIR107 and MIR558]).
Conclusion We identified CNVs and genes implicated in ASD risks, providing insight that may help to further reveal ASD etiology.
Supplementary Information The online version contains supplementary material available at 10.1186/s12888-023-04565-7.


Background
Autism spectrum disorder (ASD) is a common neurodevelopmental disorder with an increasing prevalence worldwide [1,2]. ASD manifests as a wide range of symptoms and severity in perception and socialization with others, such as restricted and repetitive patterns of behavior. Both genetic and environmental factors are involved in ASD pathogenesis. Environmental factors, including viral infections, medications during pregnancy, and air pollutants, may contribute to ASD risks [3]. Compared with environmental factors, genetic factors appear to be a prerequisite for ASD development: genetic changes (mutations) may increase ASD risks; additionally, genes such as CHD8 [4], CNTNAP2 [5], DCC [6], neurexin genes [7], SHANK1 [8], SHANK2 [9], SHANK3 [10], and WNT2 [11] may affect brain development or communication between brain cells. ASD heritability has been estimated to be 50%, indicating that genetic factors constitute a major component of ASD etiology [12].
ASD begins in early childhood. Children with ASD usually show symptoms of autism within the first year, and regress during a period between one and two years of age. Although there is no specific medication for ASD patients [13], early treatment can benefit the lives of children with ASD. Gene-based testing provides an opportunity to identify infants at risk of ASD [8].

Study subjects
We enrolled 16 individuals with ASD aged 2 to 7 years from the Chunguang Rehabilitation Hospital in Jilin Province, after cases with fragile X syndrome, Rett syndrome, chromosomal abnormalities, or any neurological or psychiatric disorders were excluded. The individuals with ASD were diagnosed by Pediatric Neurology and Neurorehabilitation doctors using the Diagnostic and Statistical Manual of Mental Disorders (5th edition) [32]. All the individuals with ASD were Northeast Han Chinese.

DNA extraction and detection of CNVs
Genomic DNA was extracted from peripheral blood samples using DNA extraction kits, according to the manufacturer's instructions (DP319 TIANamp Blood DNA Kit, TIANGEN Biotech Co. Ltd, Beijing, China) [33]. We used NanoDrop (Cat# ND-1000, ThermoFisher, Waltham, MA, US) and 1% agarose gel electrophoresis to check the quantity and quality of the isolated DNA. We used aCGH for genome-wide CNV screening (Agilent SurePrint G3 Human CGH 60K). Male and female DNA samples were hybridized with male and female reference DNA samples (G1471, G1521, Promega), respectively.

aCGH data analysis
We converted the raw data using Feature Extraction software 10.7 and analyzed CNVs using Agilent CytoGenomics software 4.0.3.12 (Agilent Technologies, Santa Clara, CA, US). The human genome assembly NCBI36/hg18 was used as a reference. The analysis settings for CNV calling were the Aberration Detection Method 2 algorithm, centralization threshold 6, bin size 10, and a minimum of 3 adjacent probes. Thresholds were set on the log2 ratio (log2R): log2R ≥ 0.25 for detecting duplications and log2R ≤ -0.25 for detecting deletions.
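The threshold step described above can be illustrated with a minimal sketch. It assumes that candidate segments have already been produced by a segmentation step (the Aberration Detection Method 2 algorithm itself is not reproduced here). The `Segment` class, its field names, and the `classify_segment` function are hypothetical names introduced only for illustration, and applying the probe-count filter per segment is a simplification.

```python
from dataclasses import dataclass

# Thresholds taken from the analysis settings described in the text.
DUP_THRESHOLD = 0.25    # log2 ratio at or above this value: duplication
DEL_THRESHOLD = -0.25   # log2 ratio at or below this value: deletion
MIN_PROBES = 3          # minimum number of adjacent probes


@dataclass
class Segment:
    """Hypothetical container for one candidate aberrant interval."""
    chrom: str
    start: int
    end: int
    n_probes: int
    mean_log2_ratio: float


def classify_segment(seg: Segment) -> str:
    """Label a segment using only the probe-count and log2-ratio thresholds."""
    if seg.n_probes < MIN_PROBES:
        return "no call"
    if seg.mean_log2_ratio >= DUP_THRESHOLD:
        return "duplication"
    if seg.mean_log2_ratio <= DEL_THRESHOLD:
        return "deletion"
    return "no call"
```

A segment supported by five probes with a mean log2 ratio of 0.30 would be labeled a duplication under this sketch, whereas the same ratio supported by only two probes would not be called.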

Identification of potential pathogenic CNVs of ASD
We calculated the frequency of each overlapping or non-overlapping CNV in DNA samples from our subjects. CNVs sharing an overlapping sequence were defined as one kind of CNV, and each non-overlapping CNV was also counted as one kind of CNV. The circular plot of the CNV distribution across chromosomes was drawn using the circlize package in R 3.6 [35]. CNVs were considered of strong putative interest when they met the following criteria: (1) they were classified as likely pathogenic or pathogenic; (2) they were of large size (> 100 kb); (3) they had been found in the knowledgebases for the genetic evidence of ASD (Simons Foundation Autism Research Initiative [SFARI, https://www.sfari.org/resource/sfari-gene/], or AutismKB [http://www.autismkb.com]); (4) they had been found in the Database of genomic variation and phenotype in Humans using Ensembl Resources (DECIPHER, https://decipher.sanger.ac.uk/about#overview); and (5) they contained previously reported ASD-related genes.
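One way to read the grouping of CNVs into "kinds" is as interval merging. The following is a minimal sketch under that reading, not the procedure actually used in the study. The tuple layout, the function name `group_into_kinds`, and the decision to group deletions and duplications separately are all assumptions made for illustration.

```python
from collections import defaultdict


def group_into_kinds(cnvs):
    """Merge overlapping CNVs into 'kinds'.

    cnvs: list of (subject, chrom, start, end, cnv_type) tuples (hypothetical layout).
    CNVs on the same chromosome and of the same type whose intervals overlap
    are placed in one kind; a CNV overlapping nothing forms its own kind.
    """
    by_key = defaultdict(list)
    for cnv in cnvs:
        _, chrom, _, _, cnv_type = cnv
        by_key[(chrom, cnv_type)].append(cnv)

    kinds = []
    for (chrom, cnv_type), group in by_key.items():
        group.sort(key=lambda c: c[2])  # sort by start coordinate
        current = None
        for subject, _, start, end, _ in group:
            if current is not None and start <= current["end"]:
                # Overlaps the running interval: extend it and record the carrier.
                current["end"] = max(current["end"], end)
                current["subjects"].add(subject)
            else:
                current = {"chrom": chrom, "type": cnv_type,
                           "start": start, "end": end, "subjects": {subject}}
                kinds.append(current)
    return kinds
```

The number of carriers of each kind is then `len(kind["subjects"])`, which corresponds to the frequency of that kind among the subjects.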

Identification of potential pathogenic genes of ASD
We selected potential pathogenic genes within potential pathogenic CNVs on the basis of the following criteria: (1) genes enriched in ASD-related pathways; and (2) genes shared with the 363 genes in SFARI classified as high-confidence or strong-candidate, or with the 228 genes in AutismKB classified as high-confidence.
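The two criteria amount to set operations on gene lists. The sketch below shows one possible reading, in which a gene must appear in an ASD-related pathway and in at least one of the two knowledgebase lists. The function name and the placeholder gene identifiers are hypothetical and do not reflect the study's data.

```python
def select_candidate_genes(cnv_genes, pathway_genes, sfari_genes, autismkb_genes):
    """Return genes within candidate CNVs that satisfy both selection criteria.

    Criterion 1: the gene belongs to an ASD-related enriched pathway.
    Criterion 2: the gene is listed in SFARI or in AutismKB.
    """
    knowledgebase = set(sfari_genes) | set(autismkb_genes)
    return set(cnv_genes) & set(pathway_genes) & knowledgebase


# Toy example with placeholder identifiers (not real gene lists).
cnv_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
pathway_genes = {"GENE_A", "GENE_B", "GENE_D"}
sfari_genes = {"GENE_A", "GENE_X"}
autismkb_genes = {"GENE_B", "GENE_C"}

candidates = select_candidate_genes(cnv_genes, pathway_genes, sfari_genes, autismkb_genes)
```

In this toy example, `GENE_C` is dropped because it is not in an enriched pathway, and `GENE_D` is dropped because it appears in neither knowledgebase.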

Identification of potential pathogenic microRNAs of ASD
MicroRNAs (miRNAs) are involved in the pathogenesis of ASD [30,36]. Because some genes within the CNVs that we found encode miRNAs, we further selected potential-pathogenic-CNVs-encoded miRNAs by searching PubMed for experimental evidence documenting nervous system dysfunction.

Bioinformatic analysis
The Gene Ontology (GO) and KEGG pathway analyses of the genes from potential pathogenic CNVs were performed using the clusterProfiler package in R 3.6.2 [37,38]. A P-value < 0.05 was considered statistically significant. The miRWalk 2.0 database, which contains 12 miRNA-target-prediction databases, was used to predict target genes of CNVs-encoded miRNAs [39]. We retained only target genes that were predicted by at least seven of the 12 databases. Moreover, the interactions between CNVs-encoded miRNAs and target genes were visualized using Cytoscape 3.8.0 (http://www.cytoscape.org/).
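The target-gene criterion (support from at least seven of the 12 prediction databases) can be sketched as a simple vote count. This is an illustrative sketch only: the input format, the function name `select_targets`, and the database labels used in the example are assumptions, not the miRWalk 2.0 export format.

```python
from collections import Counter

MIN_DATABASES = 7  # a target must be predicted by at least 7 of the 12 databases


def select_targets(predictions, min_databases=MIN_DATABASES):
    """Keep miRNA-gene pairs predicted by enough databases.

    predictions: dict mapping a database name to the set of
    (mirna, gene) pairs it predicts (hypothetical input layout).
    """
    support = Counter()
    for pairs in predictions.values():
        support.update(set(pairs))  # each database votes at most once per pair
    return {pair for pair, n in support.items() if n >= min_databases}


# Toy example: 12 placeholder databases named db1..db12.
predictions = {f"db{i}": set() for i in range(1, 13)}
for i in range(1, 8):                      # predicted by 7 databases: kept
    predictions[f"db{i}"].add(("miR-A", "GENE_1"))
for i in range(1, 7):                      # predicted by 6 databases: dropped
    predictions[f"db{i}"].add(("miR-A", "GENE_2"))

selected = select_targets(predictions)
```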

Identification of CNVs
To detect CNVs, aCGH was performed on all DNA samples from the 16 subjects with ASD (13 males and 3 females). We identified 364 CNVs (153 deletions and 211 duplications) with an average genomic size of 211.982 kb (114.091 kb for deletions and 258.705 kb for duplications). The mean number of CNVs per subject was 22.750 (9.563 for deletions and 13.188 for duplications). The mean number of deletions in males (10.462) was greater than that in females (5.667) (Table 1).
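The per-subject means reported above follow directly from the counts, as the short check below illustrates. It uses only numbers stated in the text; the variable names are introduced for illustration.

```python
N_SUBJECTS = 16
N_MALES, N_FEMALES = 13, 3
N_CNVS, N_DEL, N_DUP = 364, 153, 211

# Mean number of CNVs per subject.
mean_cnvs = N_CNVS / N_SUBJECTS   # 22.75
mean_del = N_DEL / N_SUBJECTS     # 9.5625, reported as 9.563
mean_dup = N_DUP / N_SUBJECTS     # 13.1875, reported as 13.188

# The sex-specific deletion means imply whole-number deletion counts
# that add up to the total number of deletions.
male_del = round(10.462 * N_MALES)      # deletions in the 13 males
female_del = round(5.667 * N_FEMALES)   # deletions in the 3 females
total_del = male_del + female_del
```

The deletion counts recovered from the sex-specific means sum to the 153 deletions reported overall.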

Identification of potential pathogenic CNVs of ASD
A total of 20 CNVs from the 364 CNVs failed to be converted to GRCh37 (hg19); thus, we obtained 72 benign, 65 likely benign, 9 VOUS, 167 likely pathogenic, and 31 pathogenic CNVs (Table 2). We found that more than half of the CNVs were likely pathogenic or pathogenic.
After we calculated the frequency of each overlapping or non-overlapping CNV in DNA samples from our subjects, the 344 CNVs were grouped into 115 kinds of CNVs (45 deletions and 70 duplications). All the 115 kinds of CNVs were further classified (benign: 13 kinds; likely benign: 18 kinds; VOUS: two kinds; likely pathogenic: 60 kinds; and pathogenic: 13 kinds) (Supplementary Table 1). The distribution of the 115 kinds of CNVs across the chromosomes is visualized in a circular plot (Fig. 1).
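As a brief arithmetic check of the classification counts, the sketch below uses only the numbers stated in the text; the variable names are introduced for illustration.

```python
total_called = 364
not_converted = 20

# Classification counts for the CNVs that were converted to GRCh37 (hg19).
classified = {
    "benign": 72,
    "likely benign": 65,
    "VOUS": 9,
    "likely pathogenic": 167,
    "pathogenic": 31,
}

converted = total_called - not_converted   # 344
n_pathogenic_like = classified["likely pathogenic"] + classified["pathogenic"]  # 198
fraction_pathogenic_like = n_pathogenic_like / converted  # about 0.576
```

The five classes account for all 344 converted CNVs, and 198 of them (about 58%) were likely pathogenic or pathogenic, which is consistent with the statement that these make up more than half.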

Identification of potential pathogenic genes with CNVs of ASD
A total of 511 genes from the 22 potential pathogenic  The top 20 pathways are presented in Fig. 2 and Supplementary Table 5. We constructed intersections among the 511 genes that we found, the 363 high-confidence or strong-candidate risk genes of ASD reported in the SFARI database, and the 228 high-confidence risk genes related to ASD reported in the AutismKB database (Fig. 3). After investigating genes in the intersections, we found that the cholinergic receptor nicotinic alpha 7 subunit gene (CHRNA7) was involved in the regulation of excitatory postsynaptic potential and cholinergic synapse; the dopamine receptor D4 gene (DRD4) was involved in the regulation of synaptic transmission, dopamine binding, and glutamatergic synapse; the HRas proto-oncogene (HRAS) played roles in the regulation of excitatory postsynaptic potential, glutamatergic synapse, and the mTOR signaling pathway; the oligophrenin 1 gene (OPHN1) was involved in the regulation of synaptic signaling, ionotropic glutamate receptor binding, and glutamatergic synapse; phosphatase and tensin homolog (PTEN) was implicated in the regulation of synaptic signaling, neuron differentiation in the central nervous system, ionotropic glutamate receptor binding, sphingolipid signaling, and mTOR signaling; the SH3 and multiple ankyrin repeat domains 3 gene (SHANK3) was involved in the regulation of synaptic signaling, ionotropic glutamate receptor binding, neuronal synapse, postsynaptic density, and asymmetric synapse; the solute carrier family 6 member 3 gene (SLC6A3) played roles in dopamine binding, neurotransmitter:sodium cotransporter activity, and neurotransmitter transport activity; and the TSC complex subunit 2 gene (TSC2) was involved in synapses, postsynaptic density, asymmetric synapses, and mTOR signaling pathways. Scores of all these genes (CHRNA7, DRD4, HRAS, OPHN1, PTEN, SHANK3, SLC6A3, and TSC2) in AutismKB and the corresponding ranks in SFARI are listed in Table 4. DRD4, HRAS, OPHN1, SHANK3, SLC6A3, and TSC2 were in regions of CNV duplication. CHRNA7 and PTEN were in regions of CNV deletion.

Identification and analysis of potential pathogenic CNVs-encoded miRNAs of ASD
We found 50 potential-pathogenic-CNVs-encoded miRNAs (45 encoded by duplication regions and 5 encoded by deletion regions). We searched PubMed for experimental evidence documenting nervous system dysfunction and identified 12 CNVs-encoded miRNAs that were previously reported to be associated with brain or nervous system dysfunction (Table 5). We intersected the CNVs-encoded-miRNAs-targeted genes predicted using the miRWalk 2.0 database with the union of SFARI and AutismKB (Supplementary Fig. 1). A total of 219 target genes were chosen for further study. We presented the interaction networks between CNVs-encoded miRNAs and the 219 target genes (Figs. 4 and 5). The CNVs-encoded miRNAs and target genes are presented in Supplementary Tables 6 and 7.
We further investigated potential functions of the 219 target genes using GO analysis. KEGG pathway enrichment analysis showed enriched key pathways, such as glutamatergic synapse (hsa04724), dopaminergic synapse (hsa04728), and Wnt signaling pathway (hsa04310). The top 20 pathways are presented in Fig. 6 and Supplementary Table 11.

Discussion
In the present study, we identified that 22 kinds of CNVs (six deletions and 16 duplications), eight protein-coding genes, and 12 miRNAs-coding genes are associated with ASD risks in Northeast Han Chinese from Jilin Province, China. CNVs have repeatedly been found to correlate with ASD risks [40,41]. In our study, we filtered 22 potential pathogenic CNVs. Individuals with deletions and duplications of 15q13.3 have been found to manifest neuropsychiatric disease and cognitive deficits [42]. In line with the discoveries of Bitar et al. [ 33, and Xq12-q13.1 were associated with ASD risks. Autism-related phenotypes are common in patients with a deletion or duplication at 22q13.3 [48-51]. Most of the defects are due to haploinsufficiency of SHANK3 [49]. Chen et al. found a deletion at 22q13.3 in two male children with ASD and a duplication at 22q13.31-q13.33 in one male child with ASD from Taiwan, China [46]. In our study, we found a duplication at 22q13.31-q13.33 that overlaps SHANK3 in two male children with ASD, indicating that the duplication at 22q13.31-q13.33 may play a key role in ASD etiology in our population. CNVs at 15q13.3 have been found to be involved in a variety of neuropsychiatric diseases, including intellectual disability/developmental delay, epilepsy, schizophrenia, and ASD [42,52-54]. The relation between CHRNA7 at 15q13.3 and the neuropsychiatric disorder phenotype has been validated intensively [53]. In accordance with the discovery of Pinto et al. [28], we also found that a deletion of CHRNA7 was associated with ASD risks.
In addition to CHRNA7 and SHANK3, we found CNVs-duplications (DRD4, HRAS, OPHN1, SLC6A3, and TSC2) and CNVs-deletions (PTEN). For DRD4 and HRAS, we found that seven children with ASD had duplications at 11p15.5, which overlaps DRD4 and HRAS. Mutations in DRD4 are associated with ASD risks [55-57]. The mRNA expression levels of DRD4 in peripheral blood lymphocytes are higher in people with ASD than in healthy controls [58,59]. Herault et al. also found a positive association between HRAS and autism in a French-Caucasian population [60,61]. For OPHN1 at Xq12-q13.1, Celestino-Soper et al. found a deletion of exons 7-15 of OPHN1 at Xq12 in a male child with ASD [45]. In contrast, we found that a male child with ASD had a duplication at Xq12-q13.  [63]. We found duplications at 16p13.3 in two female children with ASD. The involvement of PTEN loss in white matter pathology in humans with ASD is consistent with that in mouse models [64]. We found deletions at 10q23.2-q23.31 overlapping PTEN in 13 male children with ASD but not in the 3 female children with ASD. Thus, these eight genes may be implicated in ASD etiology. MiRNAs encoded within CNVs are important functional variants, providing a new dimension for recognizing the association between genotype and phenotype [65]. MiRNAs play vital roles in governing essential aspects of inhibitory transmission and interneuron development in the nervous system [66]. Deletion or duplication of a chromosomal locus changes the levels of miRNAs, which in turn affect neuronal function and communication [36].
Fig. 3 Venn diagram based on ASD_SFARI, ASD_AutismKB, and genes in our candidate CNVs for ASD. Note: We denote genes in our candidate CNVs for ASD as "ASD_CNV", the 363 high-confidence and strong-candidate autism risk genes in SFARI as "ASD_SFARI", and the 228 high-confidence autism-related genes in AutismKB as "ASD_AutismKB".
In our study, 12 candidate susceptibility miRNAs-coding genes of ASD were identified (ten duplications [MIR202, MIR210, MIR3178, MIR339, MIR4516, MIR4717, MIR483, MIR675, MIR6821, and MIR940] and two deletions [MIR107 and MIR558]). BDNF, a brain-derived neurotrophic factor and a member of the neurotrophic factor family, is a target gene of miR-202 [67]. Moreover, we further predicted that miR-4717-5p, miR-483-3p, and miR-940 also targeted BDNF. Skogstrand et al. found that lower BDNF levels in serum correlate with ASD risks [68,69]. miR-339-5p has been found to be a drug target for Alzheimer's disease, is expressed at low levels in mature neurons, and is related to axon guidance [70,71]. In our study, we found that miR-339-5p targets 42 genes associated with ASD risks. Among these genes, the association between DIP2A and ASD risks has been validated by our team [72]; moreover, Dip2a knockout mice exhibit autism-like behaviors, including excessive repetitive behavior and social novelty defects [73]. Notably, autism-like behaviors and germline transmission in MECP2 transgenic monkeys corroborate the association between miR-339-5p and MECP2 [74]. In addition, miR-202-5p, miR-483-3p, and miR-940 also target MECP2. For these reasons, miRNAs encoded within CNVs may be implicated in ASD etiology. For enrichment analysis, we found that genes were enriched in synapse, synapse-related signal regulation, neurotransmitter activity, neurotransmitter transport, and neurotransmitter binding. Mutations in synapse-related or neurotransmitter-related genes are associated with ASD risks in multiple unbiased, targeted sequencing, and neuropathological studies, indicating that dysregulation in synaptogenesis and neurotransmission is implicated in the pathogenesis of ASD [75-78]. Our results corroborated that ASD pathogenesis is related to dopaminergic synapse, mTOR signaling pathway, insulin signaling pathway, and cholinergic synapse [79-82].
Dopamine affects ASD-related brain regions (basal ganglia, cortex, and amygdala) via dopaminergic synapses [79]. mTOR is involved in integrating signaling from ASD synaptic and regulatory proteins, such as SHANK3, FMRP, and the glutamate receptors mGluR1/5 [63,83]. Dysfunction in mTOR signaling underlies one of the mechanisms of ASD, namely an imbalance between excitatory and inhibitory currents [80]. The insulin signaling pathway has been implicated in the development of autism [81]. Neurochemical abnormalities in the cholinergic system are involved in ASD pathogenesis, highlighting the potential for interventions targeting cholinergic synapses [82].
Fig. 4 Interaction network of the CNVs-encoded-miRNAs-targeted genes in ASD (duplication). Note: Yellow rectangles represent the miRNAs encoded within pathogenic CNV regions, while the 219 CNVs-encoded-miRNAs-targeted genes are denoted by diamonds. Blue, pink, purple, light green, and red diamonds represent target genes that are targeted by 1, 2, 3, 4, and 5 miRNAs, respectively.
Functional network analysis of the 219 CNVs-encoded-miRNAs-targeted genes suggested that a novel regulatory mechanism of these CNVs-encoded miRNAs involves synapse-related functions (glutamatergic synapse, dopaminergic synapse, serotonergic synapse, and GABAergic synapse), axon guidance, ion channels (ion-gated channel and cation channel complex), and the Wnt signaling pathway. Synaptic function and the Wnt signaling pathway are affected by mutations in diverse ASD-related genes, and altered Wnt pathway signaling may be involved in ASD pathogenesis [78]. Interestingly, dysfunction of axon-guidance signaling is integral to the microstructural abnormalities of the brain in people with ASD [84]. Notably, ion channel-related genes have been implicated in ASD etiology, and mutations in ion channel genes contribute to low-to-moderate susceptibility to ASD [85].
Both GO and pathway enrichment analyses showed that CNVs-relating genes and CNVs-encoded-miRNAs-targeted genes mapped to synapse-related functions. Additionally, CNVs-relating genes were also enriched in the mTOR signaling pathway and the insulin signaling pathway. In contrast, CNVs-encoded-miRNAs-targeted genes were enriched in axon guidance, ion channel, and Wnt signaling pathway. These results document the high complexity and heterogeneity of ASD, suggesting that different genomic alterations at the same chromosomal location may confer distinct but complementary effects on the brain of people with ASD.
Our study had some limitations: (1) the sample size was small and may confer limited statistical power to discover significant findings; (2) both genetic and environmental factors contribute to ASD risk, but data on environmental factors were not available to us; and (3) the CNVs could not be classified as de novo or inherited because data from parents were lacking.
Despite these limitations, our study also had some strengths. Firstly, we found eight de novo CNVs (duplications at 1p36. 31 Table 12). Secondly, we identified 20 genes implicated in ASD risks (eight protein-coding genes supported by SFARI and AutismKB, and 12 microRNAs-coding genes that refine the understanding of how ASD-susceptibility genes are involved in etiology). Thirdly, we performed GO and KEGG pathway analyses of CNVs-relating genes and CNVs-encoded-miRNAs-targeted genes, providing a new dimension for revealing ASD etiology.

Conclusions
In summary, we identified that 22 kinds of CNVs (six deletions and 16 duplications), eight protein-coding genes, and 12 miRNAs-coding genes are implicated in ASD risks, providing insight that may help to further reveal ASD etiology.
Fig. 1. Venn diagram based on ASD_SFARI, ASD_AutismKB, and CNVs-encoded-miRNAs-targeted genes.