Endophenotype-Wide Association Study Reveals Genetic Substrates of Core Symptom Domains and Neurocognitive Function in Autism


Background: Autism is a neurodevelopmental disorder largely attributable to rare and common genetic variants. Environmental factors such as maternal immune activation and air pollution exposure can also increase the risk of autism. The genetic heterogeneity of autism has been well recognized from gene discovery efforts over the past decade; however, the genetic substrates of the endophenotypes that constitute phenotypic heterogeneity are not yet known.
Methods: Whole-genome sequencing (WGS) data and a set of phenotype scores that represent neurocognitive development and the severity of core symptoms of autism were collected from the iHART and MSSNG databases and the phenotype database of Autism Speaks. Endophenotype-wide association analysis was performed with genome-wide genotypes and 29 phenotype scores.
Results: One or more genetic loci were associated with each of the phenotype scores at a genome-wide significance threshold (P = 5×10^-8) except for the total score of the Social Responsiveness Scale-2. An intergenic locus on chromosome 15q26.1 was significant for three core symptom domain scores of ADOS Module 1, while each phenotype score was associated with a unique set of genetic loci. The Repetitive Behaviors Scale total score was associated with the largest number of loci (N=132), including loci that overlapped with genes involved in brain development and neurodegenerative disorders. Among the significant genotype-endophenotype associations, the association between verbal intelligence and the OSTN gene was notable. The secretory peptide osteocrin, encoded by OSTN, is implicated in activity-dependent dendritic growth in humans and has potential as a biomarker of autism and an endophenotype marker for verbal intelligence.
Limitations: Validation of our findings in another cohort is required. Several associations involving the ADI-R and ADOS scores may indicate inherited allelic differences between affected and unaffected individuals, since unaffected siblings were included in our analysis.
Conclusions: Our results suggest that autism candidate genes discovered by case-control GWAS may include trait-associated genes for core symptoms.

The diagnostic validity of ASD is well established, (10) although its genetic and phenotypic heterogeneity are evident. (3) Research Domain Criteria (RDoC) were established by the National Institute of Mental Health (NIMH) to create a framework for research on pathophysiology, especially for genomics and neuroscience, which will ultimately inform classification schemes. (11) The idea was to introduce a categorization system parallel to DSM-5 that describes validated dimensions of functioning relevant to mental health that can be linked to underlying biological systems. To this end, endophenotypes (EPs) enable researchers to narrow the gap between mental disorders and their genetic underpinnings. The commonly proposed models of EPs were reviewed by Kendler and Neale: the liability-index (or "risk-indicator") model and the mediational model. (12) The former postulates that the risk for a dichotomous mental disorder and continuous EPs are correlated with a common set of genes. The latter, on the other hand, describes a causal pathway in which genetic variants influence EPs, which in turn lead to the corresponding mental disorder. Although Kendler and Neale noted the stronger and more falsifiable nature of the mediational model, EPs are explained most accurately with a bivariate or multivariate paradigm. In fact, several EPs of a disorder, such as cognitive abnormalities and antisocial behavior in schizophrenia, can be accounted for by distinct components of genetic risk. (13) Here we aimed to find genetic loci associated with EPs assessed by diverse instruments for various aspects of ASD and cognitive systems. We collected phenotype measures in diverse domains of neurocognitive function evaluated by standard instruments and tests-ADI-R, ADOS, Repetitive  (19) and head circumference (HC)-that are essential to evaluate positive and negative valences of ASD, as well as related cognitive systems and social processes in the context of the RDoC framework.
With detailed phenotype data and whole genome sequencing (WGS), we employed a genome-wide association (GWA) analysis framework to discover common genetic variants that are associated with the core symptoms of ASD. We found associations between neurocognitive features of ASD and several variants that have previously been described in the context of psychological disorders, supporting their likely contribution to the genetic underpinnings of ASD.

Method

Participants
Family-based data was collected from all individuals who participated in the Autism Genetic Resource Exchange (AGRE) Consortium, which compiles the WGS and phenotype data of families containing at least one individual with ASD diagnosed by the ADI-R and ADOS. (20) Although both instruments assess the three domains of ASD, they differ in format; the ADI-R is a structured caregiver interview that is shorter, (21) while the ADOS involves observation of the examinee in a series of standardized scenarios. (22) The ADI-R was utilized to characterize individuals in the sample as Autism, Not Quite Autism (NQA), Broad Spectrum, or Not Met. (21) In accordance with previous methods, we classified individuals as "case" if they fell under the Autism or NQA categories while "unaffected" individuals were those who were categorized as Broad Spectrum or Not Met by the AGRE. In addition to ASD-specific diagnostic tests, participants were given an opportunity to complete additional phenotype evaluations and these scores were utilized in GWA studies (GWASs).
Our AGRE dataset consisted of 11,961 individuals with demographic and phenotypic information, including 3,833 individuals with WGS data available. WGS data was collected through MSSNG and the Hartwell Autism Research and Technology Initiative (iHART) consortiums. MSSNG, a joint effort of Autism Speaks, University of Toronto, SickKids Hospital, and Google, is the largest collection of readily available WGS data for ASD researchers. (23) In its first phase of collection, MSSNG aimed to incorporate the phenotype scores and WGS data from individuals who were primarily part of the AGRE. (23) iHART is distinct in that its collection of WGS data from AGRE individuals focuses on multiplex families.(24) Both repositories have allowed for the successful identification of novel ASD-risk genes, which furthers our progress in developing interventions for the disorder. A summary of the demographic data for the entire AGRE dataset as well as for individuals with WGS data can be accessed in Additional File 1.

Phenotype scores
We analyzed 29 scores from nine phenotypic instruments compiled in the AGRE dataset: ADI-R, ADOS, RBS, SRS, PPVT, RCPM, SB-5, VABS, and HC. Each instrument covers one or more core symptom domains of ASD or neurocognitive development by age to assess the severity of a participant's impairment. ADI-R, ADOS, and SRS have components to estimate difficulties in social interaction. RRBs are scored in the ADI-R, ADOS, and RBS, while deficits in verbal and nonverbal communication are mostly measured by the ADI-R and ADOS. General neurocognitive development is estimated by RCPM, PPVT, SB-5, and VABS. Additional File 2 summarizes the instruments and scores used in our study. The number of individuals with scores for each phenotype measure (either in the entire AGRE dataset or with WGS data available) varied because of differences in compliance and completion rates across phenotypic instruments. Among the anthropometric measurements, we incorporated HC, which is well studied in the context of ASD and associated genetic conditions.(25)

Phenotype assessment for core symptoms of ASD
The ADI-R is a standardized, semi-structured interview administered by an experienced rater to a caregiver of a participant suspected of having ASD. Effective for differentiating ASD from similar developmental disorders, the ADI-R is concerned with the participant's development, social functioning, language acquisition, and RRBs. In our study, we used the 4 corresponding domain scores-Social,
The RBS is a caregiver-informant questionnaire that quantifies various forms of RRBs that are characteristic of ASD. (14) Participants are evaluated on six subscales: stereotyped behavior, self-injurious behavior, compulsive behavior, ritualistic behavior, sameness behavior, and restricted behavior. The RBS Overall Score, which combines the five subscale scores (i.e., Ritualistic/sameness, Self-injurious, Stereotypic, Compulsive, and Restricted), provides a measure of RRB severity and was chosen for our analysis. To encompass the distinct social domain of ASD, we also incorporated the SRS T-Score Total in our study. The SRS is a widely accepted measure of social impairment in the realms of social awareness, social cognition, social communication, social motivation, and mannerisms.

Instruments for assessing neurocognitive development
The SB-5 quantifies the cognitive abilities and intelligence of clinical and nonclinical populations. (18) The total scores from these two realms are combined to yield the full-scale IQ (FSIQ) score, which is used in addition to verbal IQ (VIQ) and nonverbal IQ (NVIQ) scores in the present study. All three of these scores are age-normed (mean 100, standard deviation (SD) 15). To provide additional information about each participant's neurocognitive development and encompass receptive vocabulary, we incorporated the PPVT standard score (mean 100, SD 15). The PPVT is an individually administered assessment of receptive lexical knowledge. (16) Of the three different versions recorded for the AGRE cohort, we chose 'Version 3' since it was used for most individuals with a reported PPVT score (1,681 out of 2,239).
Consisting of a series of tasks in which participants are required to identify missing elements of matrix patterns, the RCPM is a measurement of nonverbal intelligence. (26) The assessment serves as a key measurement of nonverbal processing, fluid intelligence, and spatial reasoning. (27) We utilized raw total scores from the RCPM in our analyses. The VABS is a semi-structured caregiver interview examining a participant's adaptive behavior and living skills.(28) An individual's level of functioning within the domains of communication, daily living skills, socialization, and motor skills is evaluated and used to derive the composite standard score, an age-normalized score (mean 100, SD 15) used in the current investigation.

Genotype data
Merged variant call files were downloaded from the MSSNG (version db6, N=9,621) and iHART (version v01, N=2,308) project sites. After selecting individuals with available phenotype scores and filtering genotype data (bi-allelic variants of 0% genotype missing rate, variant allele frequencies between 5% and 95%, Hardy-Weinberg equilibrium), a total of 5,313,961 variants (4,983,916 single nucleotide variants (SNVs) and 330,045 indels) on autosomes across 3,833 individuals were available to be tested for association with phenotypic scores. The top 10 principal components (PCs) calculated from the 3,833 individuals were used as covariates to control for global ancestral backgrounds. For each phenotypic score, we used genetic variants with allele frequencies between 5% and 95% among individuals with the tested phenotypic score since the number of available individuals varied by phenotype score (Additional File 3).
Therefore, the number of tested variants was less than 5,313,961 and varied across tests. For each phenotype score, we performed a GWA analysis using PLINK (version v2.00a3LM, downloaded from https://www.cog-genomics.org/plink/2.0/). Participant age at each test, gender, and the top 10 PCs were used as covariates. A genome-wide significance P-value threshold of 5×10^-8 was applied to select significant genomic variants for each analysis. (29) The summary statistic files from PLINK were used as input to Functional Mapping and Annotation (FUMA, available at https://fuma.ctglab.nl/) for functional annotation and regional plots. (30)
To calculate the proportion of variance in each phenotype score that was explained by genotype and the other covariates, we performed variance component analysis (VCA). The age at completion of each test, gender, genetic ancestry, and polygenic risk score (PRS) for ASD were utilized as covariates. The PRS for ASD was calculated with PLINK using the risk alleles with P-value <0.1 as reported by Grove et al. (31) To make VCA computationally feasible, the numeric covariates (such as age at test and PRS) were discretized: 4 levels for age (7 years old or less; 7 to 9 years old; 9 to 12 years old; 12 years old or more) and decile ranks for PRS. For the head circumference measurement, we grouped ages differently (7 years old or less; 7 to 12 years old; 12 to 18 years old; 18 years old or more) due to the wide range of ages (from 1.7 to 60.2 years old). The statistical language R (version 4.0.5) and the R library 'VCA' (version 1.4.3) were used for the analysis.
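The covariate discretization described above can be sketched as follows. This is a minimal Python illustration, not the R code used in the study. The function names (`age_level`, `decile_ranks`) and the constant `HC_CUTOFFS` are hypothetical, and the treatment of boundary ages (for example, whether a 9-year-old falls in the second or third level) is an assumption, since the text does not specify it.

```python
def age_level(age_years, cutoffs=(7.0, 9.0, 12.0)):
    """Map an age in years to one of four ordered levels (0..3).

    Assumption: the first two cutoffs belong to the lower level, while the
    last cutoff belongs to the top level ("12 years old or more").
    """
    low, mid, high = cutoffs
    if age_years <= low:
        return 0
    if age_years <= mid:
        return 1
    if age_years < high:
        return 2
    return 3


def decile_ranks(scores):
    """Assign decile ranks 1..10 by sorted position (ties keep input order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for position, index in enumerate(order):
        ranks[index] = position * 10 // n + 1
    return ranks


# Wider age bands, as described for head circumference in the text.
HC_CUTOFFS = (7.0, 12.0, 18.0)

if __name__ == "__main__":
    ages = [1.7, 7.0, 8.5, 11.9, 12.0, 60.2]  # illustrative ages only
    print([age_level(a) for a in ages])
    print([age_level(a, HC_CUTOFFS) for a in ages])
```

Discretizing numeric covariates in this way turns them into factors with a small number of levels, which is what makes a factor-based variance component model tractable.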

Phenotype scores
The diagnostic and neurocognitive measurements used in the current study and the number of available participants for each measurement are listed in Additional File 3. Since the number of available phenotype scores varied across individuals, we used all participants for each phenotype score instead of selecting a subgroup (N=509) with all phenotype scores. Thus, each association test included a different number of individuals. For instance, the ADI-R social domain score was available for 3,746 individuals (including 3,386 probands and 358 unaffected siblings) while the SB-5 FSIQ score was available for 833 individuals (including 681 probands and 146 unaffected siblings). Between individuals with available WGS data and all individuals in the AGRE cohort, differences for all phenotype scores except for ADOS Module 3 Behavior Total, ADI-R Verbal Communication Total, RBS Overall Score, and SRS Total T-score were not significant at the threshold of P <0.01 (Wilcoxon test), suggesting that the group with WGS data is a largely unbiased subset of the AGRE dataset. We also compared the distribution of scores in the AGRE cohort with published results. This comparison allowed us to check whether our cohort displayed any bias in terms of severity of ASD and neurocognitive traits. All scores were comparable with published baseline scores for individuals with ASD (Additional File 3).
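The subset comparison above can be illustrated with a small sketch. It assumes that "Wilcoxon test" refers to the two-sample Wilcoxon rank-sum (Mann-Whitney) test, and it uses a normal approximation with tie correction and no continuity correction; the function name `rank_sum_test` and the example scores are hypothetical and are not taken from the study.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    wgs_subset = [12, 15, 9, 20, 14, 11]            # made-up scores
    whole_cohort = [13, 15, 10, 19, 14, 12, 8, 17]  # made-up scores
    z, p = rank_sum_test(wgs_subset, whole_cohort)
    print(f"z = {z:.3f}, p = {p:.3f}")
```

With a threshold of P <0.01, as in the text, a score would be flagged as differing between the two groups only when the returned `p` falls below 0.01.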

Genome-wide Association Analysis of Core Symptoms of Autism
Participants were given an opportunity to complete eight phenotype evaluations (ADI-R, ADOS, SRS, RBS, PPVT, RCPM, SB-5, and VABS), but compliance rates varied by test. For each phenotype measure, one or more domains were chosen to assess the severity of impairment. We performed GWA analysis for phenotype scores and the HC measurement (N=29) using PLINK. A total of 681 variants were associated with 16 scores at the threshold of P <5×10^-8 (Figure 1). Each phenotype score was associated with a median of 1.5 variants (ranging from 1 to 531). For each test, PLINK output was uploaded to the FUMA server to identify risk loci from independent significant variants and annotation, resulting in 174 genomic risk loci (see Additional File 4 for a detailed list and Additional File 5 for regional plots). Of note, we found 68 loci where rare homozygous variants were observed in a small fraction of the study cohort (<1% of all tested individuals) with extreme phenotype scores; at these loci, rare variants transmitted from both parents drove the significant associations. Except for the SRS, all phenotype tests measuring neurocognitive function were associated with one or more genetic loci. The lead variants for four phenotypes (ADI-R Verbal Communication Total and Nonverbal Communication Total scores, RBS Overall Score, and HC) passed the P-value threshold of 1.72×10^-9 (= 5×10^-8/29) adjusted for multiple concurrent hypothesis testing with 29 scores.
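The adjusted threshold quoted above is a Bonferroni-style division of the genome-wide threshold by the number of phenotype scores. The short sketch below reproduces that arithmetic and sorts a P-value into one of two tiers; the function names and tier labels are hypothetical, and the two example P-values are ones reported elsewhere in this article.

```python
GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold used in the text
N_PHENOTYPES = 29         # number of phenotype scores tested concurrently


def adjusted_threshold(alpha=GENOME_WIDE_ALPHA, n_tests=N_PHENOTYPES):
    """Genome-wide threshold divided by the number of concurrent tests."""
    return alpha / n_tests


def significance_tier(p_value, alpha=GENOME_WIDE_ALPHA, n_tests=N_PHENOTYPES):
    """Label a P-value as 'adjusted', 'genome-wide', or 'not significant'."""
    if p_value < adjusted_threshold(alpha, n_tests):
        return "adjusted"
    if p_value < alpha:
        return "genome-wide"
    return "not significant"


if __name__ == "__main__":
    print(f"{adjusted_threshold():.3g}")
    print(significance_tier(7.15e-10))  # rs11754469 (CDYL)
    print(significance_tier(2.14e-8))   # rs7487040 (ACSS3)
```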
All four of the calculated total scores on the ADI-R (Social, Nonverbal Communication, Verbal Communication, and Behavior) that comprise the characteristic deficits of ASD were associated with one or more genetic loci (Table 1 and Additional File 4). Behavior and Verbal Communication total scores were associated with intronic SNVs in the HECW1 and CDYL genes, respectively. For an intronic variant in the CDYL gene (rs11754469, P = 7.15×10^-10), both male and female carriers of the CC genotype displayed significantly decreased scores compared to individuals in the TT and TC genotype groups. MACROD2 is a strong candidate gene for neurodevelopmental disorders, although its molecular function in the developing brain remains poorly understood. We also found that genes directly overlapping with significant variants were enriched with calcium ion binding functionality (Fisher's exact test, adjusted P = 0.030).

Genetic Substrates for Cognitive Systems in Autism
Total scores from the four instruments that measure neurocognitive development (PPVT, RCPM, SB-5, and VABS) were associated with multiple loci. For SB-5, NVIQ and VIQ scores were associated with four and two independent loci, respectively, while FSIQ was not associated with any locus. The intronic region of the ACSS3 gene showed the strongest signal for NVIQ. Acyl-CoA synthetase short-chain family member 3 (ACSS3) is a mitochondrial enzyme that produces acetyl-CoA from short-chain fatty acids, which is necessary for energy production.(44) The NVIQ score was higher for individuals with the TT genotype of the intronic SNV in ACSS3 (rs7487040, P = 2.14×10^-8) (Figure 4A). The ACSS3 gene appears frequently in the literature on psychiatric disorders, ranging from ADHD to schizophrenia. (45,46) A whole-exome sequencing (WES) study of families with single or multiple ADHD cases uncovered a rare variant in the ACSS3 gene.(45) Further, a GWAS that incorporated individuals with schizophrenia, bipolar disorder, and schizoaffective disorder revealed an association between a SNP in the ACSS3 gene (rs7136590, P = 7.43×10^-6) and pars orbitalis volume. (47) The pars orbitalis is part of the inferior frontal gyrus and is noteworthy for our study of ASD because of its importance in the brain's language processing network. (48,49) Two loci that were significant for VIQ were mapped to the PLA2G4A and CDH23 genes.
The GG genotype at chr1:186860544 in PLA2G4A was associated with higher VIQ scores (Figure 4B). The PLA2G4A gene encodes cytosolic phospholipase A2 (PLA2), which is important for normal brain development and synaptic function.(50) This enzyme hydrolyzes phospholipids and thereby releases fatty acids, including arachidonic acid.(50) An earlier GWAS revealed SNPs in PLA2G4A associated with epilepsy,(51) a condition that is estimated to be comorbid with ASD in up to 39% of cases. (3) The CDH23 gene encodes an atypical cadherin that is implicated in Usher syndrome type 1, non-syndromic and age-related hearing loss, pre-pulse inhibition, and Alzheimer's disease. (52) Although the PPVT is a receptive vocabulary test and, therefore, a proxy for verbal intelligence, the SB-5 VIQ and PPVT standard scores were associated with different loci in our cohort. Two intronic loci in the GDPD4 and OSTN genes were significantly associated with the PPVT score (Figure 3B).
Inherited or de novo CNVs encompassing GDPD4 have been found in patients with ASD.(53) Interestingly, the OSTN gene, which encodes osteocrin, restricts activity-dependent dendritic growth in human neurons. In response to sensory input, osteocrin regulates features of neuronal structure and function that are unique to primates.(54) An additional SNV in the OSTN gene was associated with the RBS Total score. Additionally, two intergenic loci were significantly associated with the RCPM score, which is a "paradigmatic" measure of fluid intelligence.(55) The total VABS score, which assesses adaptive behavior, was associated with a locus in the NUGGC gene.
We discovered that eight loci were significantly associated with HC. These loci were mapped to CHD5, GPR137B, NKAIN3, UBASH3B, and intergenic regions. The strongest signal was found for the NKAIN3 gene (Figure 4C), which encodes the Sodium/Potassium Transporting ATPase Interacting 3 protein. NKAIN3 encompasses a risk allele for dyslexia (56) and is a known candidate gene for Dravet syndrome (MIM# 607208), a disorder characterized by infantile-onset epileptic encephalopathy, intellectual disability, and refractory seizures.(57)

Proportion of phenotype variation explained by genotype
We performed a VCA to estimate the contribution of various covariates (age at test, gender, genetic ancestry, and PRS) to the phenotypic scores (Additional File 6). Except for HC, for which the covariates accounted for 71.4% of the variance in the observed values, an average of 11.5% (range 0.92% to 44.18%) of phenotypic variance was attributed to the covariates. For HC, age was the major contributor (54.23%) to the phenotypic variance, followed by gender (14.2%). In fact, gender and age at test were the most frequent top contributing covariates (15 and 12 of the 29 phenotypes, respectively) as well as the largest contributors on average (5.6% for age, ranging from 0 to 38.10%, and 3.4% for gender, ranging from 0 to 14.29%). Overall, the variance of EPs was not likely attributable to the PRS for ASD.
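To make the notion of "proportion of variance attributed to a covariate" concrete, the sketch below computes the between-group share of the total sum of squares (eta squared) for one discretized covariate. This is a deliberately simplified stand-in and an assumption on our part: it does not reproduce the estimator of the R 'VCA' package used in the study, and the function name and example values are hypothetical.

```python
from collections import defaultdict


def variance_fraction(values, levels):
    """Share of the total sum of squares explained by group means (eta squared)."""
    if len(values) != len(levels):
        raise ValueError("values and levels must have the same length")
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0

    # Collect the phenotype values observed at each covariate level.
    groups = defaultdict(list)
    for value, level in zip(values, levels):
        groups[level].append(value)

    between_ss = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return between_ss / total_ss


if __name__ == "__main__":
    head_circumference = [48.0, 49.5, 51.0, 53.0, 54.5, 56.0]  # made-up values (cm)
    age_levels = [0, 0, 1, 1, 2, 2]                            # discretized age
    print(round(variance_fraction(head_circumference, age_levels), 3))
```

A covariate that tracks the phenotype closely, as age does for HC in the results above, yields a fraction near 1, whereas a covariate unrelated to the phenotype yields a fraction near 0.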

Discussion
Independent studies on multiple cohorts have demonstrated that ASD is highly heritable, with genetic underpinnings that are likely polygenic and involve both common and rare variants. Using WGS, previous studies discovered candidate genes with de novo mutations and rare inherited variants enriched in individuals with ASD. Gene discovery efforts with genotyping microarrays, WES, and WGS have successfully catalogued candidate genes for ASD. However, specific genes, molecular mechanisms, and brain circuits implicated in the disorder are yet to be discovered. More importantly, understanding the biological substrates that underlie specific symptoms will be valuable for defining target symptoms for treatment and, thus, for developing therapeutic approaches. To this end, we aimed to discover the genetic basis of the core symptom domains and neurocognitive development in ASD using rich phenotype information and WGS data from the AGRE. All of the phenotype tests that we used in the analysis were associated with one or more genetic loci except for the SRS Total Score.
The most significant association was found between the ADOS Behavior Total Score (Module 3) and a locus including exon 4 and an intronic region of the PTPRD gene. For an intronic SNV of PTPRD (rs12006270), scores were higher in individuals with the AA genotype, with females displaying more severe behavioral deficits. The PTPRD gene encodes the receptor protein tyrosine phosphatase delta (PTPRD), which regulates neurogenesis by modulating the tyrosine kinase signaling pathway.(58) Previously, a homozygous microdeletion of this gene was found in a patient with intellectual disability, hearing loss, and trigonocephaly.(59) Decreased dosage of PTPRD resulted in aberrant neurogenesis and an increased number of cortical neurons in vivo, suggesting that PTPRD is a key regulator of brain development. (58) Indeed, independent studies have reported genetic associations of PTPRD with ASD,(60) ADHD,(61) and obsessive-compulsive disorder.(62) Moreover, neurofibrillary tangle accumulation in autopsy brain samples from individuals with Alzheimer's disease was associated with the PTPRD locus (rs560380, P = 3.8×10^-8). (63)
It is compelling that 132 loci were significantly associated with the RBS Total Score, and 54 of these loci had lead variants with P <1.72×10^-9. RRBs comprise one of the core symptom domains of ASD in the DSM-IV; however, these behaviors are observed in multiple neuropsychiatric conditions (e.g., schizophrenia, bipolar disorder, obsessive-compulsive disorder, drug addiction, L-DOPA-induced dyskinesia, and Huntington's disease).(64) Behavioral approaches are used to treat RRBs, and several pharmacological treatments have been effective in reducing these behaviors in ASD. RRBs are therefore treatment targets; however, the biological pathways and neural circuits associated with RRBs remain undiscovered. Interestingly, eight loci that were associated with the RBS Total Score were enriched with genes involved in the calcium signaling pathway and highly expressed in various regions of the brain.
Parvalbumin (PV) is a calcium-binding protein that is expressed in a subpopulation of neurons called fast-spiking interneurons (i.e., PV+ interneurons). PV+ interneurons are reduced in the prefrontal cortex of individuals with ASD compared to controls.(65) Moreover, a recent study found that dysregulation of calcium signaling in astrocytes of striatal microcircuits contributed to repetitive behaviors in vivo. (66) PV knockout mice exhibit RRBs as well as the other core symptoms of ASD. (67)
Multiple SNVs in coding and intronic regions of the OSTN gene were significant for the total scores on the PPVT and RBS. Osteocrin, a secretory peptide of 103 amino acids, binds to the natriuretic peptide clearance receptor. (68) Osteocrin is involved in activity-dependent regulation of neuronal function, bone growth, and physical endurance. (69) The OSTN gene is highly expressed in multiple areas of the developing brain, especially in the neocortex, and shows higher levels of expression in cortical regions than in other tissue types in the adult human. Ataman and colleagues found that osteocrin was secreted in an activity-dependent manner in human fetal brain cultures, but not in mice. (54) Evolutionary acquisition of the regulatory region of OSTN enables the binding of MEF2, an activity-regulated transcription factor. As a result, activity-dependent dendritic growth is restricted in human neurons. Indeed, integrative analysis of ChIP-seq, transcriptome, and protein-protein interaction data demonstrated that MEF2A and MEF2C binding sites were enriched in the regulatory regions of ASD candidate genes. (70) In our analysis, an intronic variant (rs6783287, P = 7.4×10^-11) was significant for a phenotype score related to RRBs. Scores on the PPVT and RBS were significantly associated with coding and intronic variants in the OSTN gene located at chromosome 3q28.
This region is also a GWAS hot spot for cerebrospinal fluid tau levels in Alzheimer's disease, (71) which suggests that its association with a detriment in verbal intelligence may be indicative of cognitive decline. Since osteocrin is a circulating peptide, it has the potential to be a biomarker of ASD, an endophenotype marker for RRBs and verbal intelligence, and a treatment target for ASD. However, further in vivo studies are required to understand downstream biological pathways in human cells.

Limitations
Firstly, all the loci discovered for phenotype scores need to be reproduced in other cohorts. The AGRE participants are primarily from multiplex families with pervasive developmental disorder (PDD) and Asperger syndrome that were diagnosed by experts using the ADI-R and ADOS. Multiplex families with ASD can have a higher genetic burden than sporadic cases; however, the aim of our analysis was to find genetic substrates of phenotype tests covering core symptom domains and neurocognitive development rather than to discover associations between ASD and neurotypical controls. A similar study can be performed in different cohorts to validate the associations from the current study. Secondly, sample sizes were only moderate for discovering loci with small effect sizes. For instance, ADOS Module 2 scores were available for a subgroup of our cohort (N=311) while 1,881 individual scores were available for the social and behavior domains of the ADI-R. Thirdly, genotype-phenotype associations found in our study may be valid for individuals with ASD and their family members. Indeed, the candidate genes with alleles that were previously reported for intelligence were mapped to the genes associated with diverse phenotype scores, namely ADI-R Nonverbal Communication Total Score (CADM2), Behavior Total Score of ADOS Module 3 (PTPRD and GDA), HC (LNPEP), and RBS (FAM78B, CNTN4, FHIT, ICA1, DGKB, SP4, SGCZ, CDH2, PLCB1, MACROD2, and PTPRT), but not with cognitive measurements such as SB-5 FSIQ and PPVT. As unaffected siblings were included in the analysis, some associations with ADI-R and ADOS scores might indicate genotype differences between affected and unaffected individuals.

Conclusion
In summary, we used WGS and phenotype scores to successfully perform an endophenotype-wide association analysis that extends previous candidate gene discovery for ASD by unveiling the genetic basis of core symptoms and neurocognitive deficits. Notably, several ASD candidate genes that were previously discovered by case-control comparisons were associated with the severity of core symptoms such as RRBs in the present study. It is possible, therefore, that these candidate genes are responsible for specific traits that constitute core symptoms of ASD rather than the disorder itself. Several genes (such as OSTN) identified in our results represent potential biomarkers for ASD; however, further studies are required to replicate our findings and to understand the genetic impacts on molecular pathways, brain circuits, and the phenotype spectrum in the context of the RDoC framework.

Figure legends

[...] but loci that satisfy the adjusted P-value threshold for multiple (N=29) concurrent hypothesis testing (P <1.72×10^-9) are highlighted in red. The genes that overlap with or are in the flanking region of each significant genomic locus are displayed next to the corresponding circles.

[...] score. The individuals with the CC genotype for the lead SNV show a lower score. The females with the CC genotype are unaffected siblings except for one with a score > 15. (b) Two genic regions of PCLO and SEMA3E were found to be significantly associated with the ADI-R nonverbal communication total score. For the lead variant in the PCLO gene, individuals with the AA genotype showed a lower phenotypic score in both males (all unaffected siblings) and females (all probands). Also, the ADI-R nonverbal communication total score decreased among individuals with the GG genotype for the lead SNV in the SEMA3E gene. (c) The HOXC11 and HOXC10 loci are significantly associated with the ADOS Module 1 total score.

Regional plots for the significant loci associated with SB-5 nonverbal and verbal IQ scores and head circumference. (a) and (b) show the genomic loci in the ACSS3 and PLA2G4A genes associated with SB-5 nonverbal and verbal IQ scores, respectively. (c) Head circumference is significantly associated with the locus close to the 5´-untranslated region of NKAIN3.
