Prevalence and Phenotypic Impact of Rare Likely Pathogenic Variants in Autism Spectrum Disorder

Background: Copy number variants (CNVs) and single nucleotide variants (SNVs) are sources of risk for autism spectrum disorder (ASD). The distribution of such pathogenic variants in individuals with ASD, and the characterization of those who carry such variants versus those who do not, are understudied at the population level. We describe a population sample from Sweden, evaluating the distribution of likely pathogenic variants and their impact on medical, neurological, and psychiatric phenotypes.
Methods: The genotyped sample consisted of 1,236 children born in Sweden with autistic disorder (AD), a severe form of ASD (International Classification of Diseases, Tenth Revision, code F84.0). Of these individuals, CNVs were called from 997 and SNVs from 808.
Results: Out of 997 individuals from whom CNVs were called, 104 (10.4%) carried one or more likely pathogenic CNVs, including 15q11q13 (n = 8), 15q13.3 (n = 5), 16p13.11 (n = 5), 16p11.11 (n = 5), and 22q11.2 (n = 5). Of 808 individuals assessed by whole-exome sequencing, 69 (8.5%) had a likely pathogenic SNV, including in GRIN2B (n = 6), POGZ (n = 5), SATB1 (n = 4), DYNC1H1 (n = 4), and CREBBP (n = 3). Fourteen individuals carried two likely pathogenic CNVs, and five carried both a likely pathogenic CNV and a likely pathogenic SNV. Carriers of likely pathogenic CNVs or SNVs were more likely to have intellectual disability (ID), scholastic skill disorders, and epilepsy, with odds ratios of 2.31 (95% CI, 1.55, 3.47), 1.98 (95% CI, 1.19, 3.21), and 1.63 (95% CI, 1.08, 2.44), respectively. Carriers of likely pathogenic CNVs also showed significantly increased rates of congenital anomalies. We compared rates of likely pathogenic CNVs, SNVs, and phenotypes between genotyped AD subjects with and without ID: rates were not significantly different between groups.
Limitations: Because this is a case-control cohort, we did not have de novo information to aid in classification. More broadly, there were judgment calls involved in identifying likely pathogenic variants. For these reasons, some misclassification is possible. In addition, phenotypes are defined from medical registers, which may lead to underestimates of milder findings.
Conclusions: People with ASD who carry likely pathogenic CNVs or SNVs show increased rates of various comorbidities, most prominently ID. Despite the strong association with ID, conditioning on its presence explains little of the variation for other comorbidities and physical traits.


Background
Autism spectrum disorder (ASD) is a childhood-onset neurological and developmental disorder that affects more than 1% of the population. ASD has a complex genetic architecture, with both rare and common variation contributing to risk. While common variation accounts for the majority of genetic liability for autism, rare variation, often de novo, accounts for substantial individual liability (5). For example, while only 14% of individuals with ASD in the Simons Simplex Collection dataset carried a de novo risk CNV or loss-of-function (LoF) SNV, 80% and 57% of these probands would not have ASD if they did not carry the de novo CNV or LoF mutation, respectively (5). As a result, identification of rare genetic variants in individuals with ASD can be clinically beneficial, both for diagnostic and genetic counseling purposes.
Chromosome microarray (CMA) and whole-exome sequencing (WES) are being increasingly used as part of genetic evaluation for ASD. In a study of 258 probands with ASD, 9.3% received a diagnosis by CMA and 8.4% by WES, for a total diagnostic yield of 15.8% with both. This diagnosis rate varied based on the presence of co-morbid major congenital abnormalities and minor physical anomalies, ranging from as low as 6.3% for the least complex group to 37.5% for the most complex group (15).
Despite these findings, which are from either clinic- or cohort-based studies, the prevalence and characterization of CNVs and SNVs in populations is understudied. The objective of this work is to examine and test for possible associations between phenotypes of individuals diagnosed with ASD with, or without, likely pathogenic genetic findings. By systematically curating rare variants based on pathogenicity, we are able to describe the genetic architecture of rare variants in the population. By incorporating robust and relatively unbiased phenotype data obtained from the Swedish national registers, we compare the phenotypes of those with ASD with a known genetic cause and those with ASD without a known genetic cause. In the companion study (Klei et al.), the inter-related role of common variation in ASD risk is explored in the PAGES sample.

Study population
In this study, we used data collected from study participants in PAGES (Population-based Autism Genetics and Environmental Study), a large ongoing population-based cohort study in Sweden that started in 2012 with the overall aim of identifying possible genetic and environmental risk factors for ASD (5). The PAGES study is a collaboration between researchers at Karolinska Institutet in Stockholm, Sweden, and the Icahn School of Medicine at Mount Sinai, New York, USA. The study was approved by the Regional Ethical Review Board in Stockholm, Sweden, and the Institutional Review Board (IRB) at the Icahn School of Medicine at Mount Sinai, New York, USA. All individuals with a diagnosis of ASD according to ICD-9 and ICD-10 (International Classification of Diseases) codes were identified in the Swedish National Patient Register (NPR). Our focus here is on autistic disorder, defined by ICD-9 codes 299.A/B/X and ICD-10 code F84.0. The eligible individuals were born in Sweden between 1960 and 1996 and followed up through 2011.
In PAGES, once a potential case was identified in the NPR and the ASD diagnosis confirmed by the clinic, a team of research nurses informed the family about the genetic study by letter, followed up with a phone call to schedule an appointment, and then visited the family to collect informed consent, summary phenotypic information through a questionnaire, and the biomaterial (blood, in most cases). Information about sex, age at the time of diagnosis, date of admission and discharge, and diagnostic codes for intellectual functioning and psychiatric comorbidities was extracted from the NPR after the consent form was signed. The date of the first registered ASD diagnosis was used as the diagnosis date. Individuals with a diagnosis of Down syndrome were excluded from this study.
In addition to the NPR, we accessed the Multi-generation Register, which allowed for the identification of family relations, and the Swedish Medical Birth Register (MBR), which contains birth characteristics of all Swedish-born children since 1973 (including prenatal, perinatal, and neonatal variables).
As of March 2019, biological samples from 1726 individuals were genotyped: 1236 individuals with ASD, and 238 controls. DNA was genotyped from 1,154 ASD samples using the Infinium Omni Express Exome array and from 82 ASD samples using the Infinium Global Screening Array; furthermore, the DNA from 827 of these samples was whole-exome sequenced.

CNV Calling
CNV calls were generated by PennCNV, using hg19 genomic coordinates for autosomes, from the 1,154 ASD samples genotyped on the Infinium Omni Express Exome array. Data and calls were cleaned using standard procedures in PennCNV (BAF drift <= 0.01, |WF| <= 0.05, LRR SD <= 0.3). We retained CNVs that spanned at least 20 SNPs and whose overlap with common regions was <= 50% of the CNV's own length, leaving 997 high-quality samples from which CNVs were called. A CNV call was classified as likely pathogenic if it either 1) spanned a recurrent CNV, that is, a recurrent locus within a specific chromosomal region that is found to cause a genomic disorder (GD) (for more details, see (4)); or 2) was larger than 1 MB and included at least one gene. For most GDs, there is information on whether the deletion and/or duplication is likely pathogenic (Decipher, ClinVar). If a CNV has such information, we adhere to it; if a given CNV does not include such information, we assume that both the deletion and the duplication are GDs.
Individuals with two or more large CNVs (> 1 MB) were removed from the analysis due to concerns about the quality of the sample; likewise, individuals with a CNV larger than 45 MB were removed.
The GATK team at the Broad Institute generated CNV calls for the PAGES sequencing data using the GATK germline CNV (gCNV) caller (16). We used these calls as a validation step in our analysis.
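The sample-level quality thresholds, the call-level filter, and the likely-pathogenic rule described in this section can be summarized in a minimal Python sketch. The record type `CnvCall`, its field names, and the function names are hypothetical, 1 MB is assumed to mean 1,000,000 bp, and the direction-specific (deletion versus duplication) evidence from Decipher and ClinVar is not modeled. This is an illustration of the stated rules, not the pipeline used in the study.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; 1 MB is assumed to be 1,000,000 bp.
MAX_BAF_DRIFT = 0.01
MAX_ABS_WF = 0.05
MAX_LRR_SD = 0.3
MIN_SNPS = 20
MAX_COMMON_OVERLAP = 0.50
LARGE_CNV_BP = 1_000_000


@dataclass
class CnvCall:
    """Hypothetical record for one CNV call (field names are illustrative)."""
    size_bp: int
    n_snps: int
    common_overlap_frac: float   # share of the CNV's own length in common regions
    n_genes: int
    in_recurrent_gd_locus: bool  # spans a recurrent genomic-disorder locus


def passes_sample_qc(baf_drift: float, wf: float, lrr_sd: float) -> bool:
    """Sample-level cleaning: BAF drift, waviness factor, and LRR SD limits."""
    return (baf_drift <= MAX_BAF_DRIFT
            and abs(wf) <= MAX_ABS_WF
            and lrr_sd <= MAX_LRR_SD)


def passes_call_filter(call: CnvCall) -> bool:
    """Call-level filter: enough SNPs and limited overlap with common regions."""
    return (call.n_snps >= MIN_SNPS
            and call.common_overlap_frac <= MAX_COMMON_OVERLAP)


def is_likely_pathogenic(call: CnvCall) -> bool:
    """Rule 1: recurrent GD locus; rule 2: larger than 1 MB with >= 1 gene."""
    if not passes_call_filter(call):
        return False
    large_and_genic = call.size_bp > LARGE_CNV_BP and call.n_genes >= 1
    return call.in_recurrent_gd_locus or large_and_genic
```

Keeping the sample-level and call-level checks in separate functions mirrors the two cleaning stages in the text and makes each threshold easy to vary in a sensitivity analysis.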

SNV Calling
SNVs were called using the Genome Analysis Toolkit (17) HaplotypeCaller package version 3.4 (for more details, see (4)). We reviewed the SNVs that were identified in known and candidate ASD risk genes based on the Satterstrom et al. gene list (4). SNVs in these 102 genes were classified as likely pathogenic if they were either 1) a rare loss-of-function (LoF) mutation, where rare is defined as absent from the Genome Aggregation Database (gnomAD), or 2) a rare (absent from gnomAD) missense mutation with an MPC score > 2.
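The two-part SNV rule above can be written as a short predicate. In this sketch the function name, the argument names, and the consequence labels `"lof"` and `"missense"` are assumptions made for illustration; `ASD_GENES_SUBSET` lists only a few genes named elsewhere in this paper and stands in for the full 102-gene list.

```python
from typing import Optional

# Illustrative subset only; the study used the full 102-gene list from (4).
ASD_GENES_SUBSET = frozenset({"GRIN2B", "POGZ", "SATB1", "DYNC1H1", "CREBBP", "SHANK3"})

MPC_THRESHOLD = 2.0


def is_likely_pathogenic_snv(gene: str,
                             consequence: str,
                             in_gnomad: bool,
                             mpc: Optional[float] = None,
                             gene_list: frozenset = ASD_GENES_SUBSET) -> bool:
    """Apply the stated rule: rare LoF, or rare missense with MPC > 2."""
    if gene not in gene_list:
        return False              # only known and candidate ASD risk genes are reviewed
    if in_gnomad:
        return False              # "rare" is defined as absent from gnomAD
    if consequence == "lof":
        return True
    if consequence == "missense":
        return mpc is not None and mpc > MPC_THRESHOLD
    return False
```

For example, `is_likely_pathogenic_snv("POGZ", "missense", in_gnomad=False, mpc=2.4)` returns `True`, while the same variant with an MPC score of 1.5 does not qualify.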

Exposure Covariates
We extracted information for the following variables from the NPR and the MBR: intellectual disability (ID), attention-deficit/hyperactivity disorder (ADHD), schizophrenia (SZ), obsessive-compulsive disorder (OCD), anxiety disorder, speech and language disorders, scholastic skill disorders, motor function disorders, epilepsy, sleeping disorders, hypotonia, birth defects, prenatal growth rate, gestational age in weeks, weight and height at birth, head circumference, and Apgar scores (Table S1).
Due to the limited number of individuals diagnosed with OCD and anxiety disorder, we grouped these diagnoses together to gain statistical power and ensure valid statistics. OCD was included in the anxiety disorder section in DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, fourth edition). In DSM-5, OCD is in a separate section from anxiety disorders, albeit with minimal changes in diagnostic criteria (2, 3).
The average head circumference of healthy newborns is 35 cm (13 ¾ inches). While the range depends on the height of the child, among other attributes (18), we used head circumference without adjustment, defining "HC-small" if the circumference was smaller than 32 cm and "HC-big" if it was larger than 38 cm.
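As a small illustration of the unadjusted head-circumference categories, the cutoffs above translate into the following sketch. The function name and the labels `"HC-typical"` and `"missing"` are hypothetical additions for values that fall between the cutoffs or are not recorded; only `"HC-small"` and `"HC-big"` come from the text.

```python
from typing import Optional

HC_SMALL_CM = 32.0  # below this value: "HC-small"
HC_BIG_CM = 38.0    # above this value: "HC-big"


def classify_head_circumference(hc_cm: Optional[float]) -> str:
    """Categorize birth head circumference (cm) using the unadjusted cutoffs."""
    if hc_cm is None:
        return "missing"      # hypothetical label for absent register data
    if hc_cm < HC_SMALL_CM:
        return "HC-small"
    if hc_cm > HC_BIG_CM:
        return "HC-big"
    return "HC-typical"       # hypothetical label for 32-38 cm inclusive
```

Both cutoffs are strict, so a newborn measured at exactly 32 cm or 38 cm falls in the middle category.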

Statistical analysis
To identify comorbidities and birth characteristics associated with ASD probands who carry damaging mutations, we used a logit model in which carrier status for the damaging variant type (carrier or not) was the outcome and the predictors were sex, used as a covariate, and the potential comorbidity or characteristic. Thus, a series of models was fit, one for each potentially associated feature. We report the resulting odds ratios (ORs), P values, and 95% confidence intervals (CIs) for the ORs after adjusting for sex.
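To make the per-feature model concrete, the following standard-library Python sketch fits a logistic regression of carrier status on sex and one binary feature by Newton-Raphson and reports the sex-adjusted OR with a Wald 95% CI. It is an illustration under stated assumptions, not the software used in the study: all function names are hypothetical, the Wald interval is an assumed choice because the text does not say how the CIs were computed, and P values are omitted.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def score_and_information(beta, rows, outcomes):
    """Gradient of the log-likelihood and the Fisher information at beta."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, y in zip(rows, outcomes):
        eta = sum(b * v for b, v in zip(beta, x))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for i in range(p):
            grad[i] += (y - mu) * x[i]
            for j in range(p):
                info[i][j] += w * x[i] * x[j]
    return grad, info


def fit_logit(rows, outcomes, n_iter=25):
    """Newton-Raphson fit; returns coefficients and information at the fit."""
    beta = [0.0] * len(rows[0])
    for _ in range(n_iter):
        grad, info = score_and_information(beta, rows, outcomes)
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    _, info = score_and_information(beta, rows, outcomes)
    return beta, info


def sex_adjusted_or(sex, feature, carrier):
    """OR for the feature, adjusted for sex, with a Wald 95% CI."""
    rows = [[1.0, float(s), float(f)] for s, f in zip(sex, feature)]
    beta, info = fit_logit(rows, carrier)
    # Variance of the feature coefficient: [2][2] entry of the inverse information.
    variance = solve_linear(info, [0.0, 0.0, 1.0])[2]
    se = math.sqrt(variance)
    b = beta[2]
    return math.exp(b), math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)
```

Calling `sex_adjusted_or(sex, feature, carrier)` once per comorbidity or birth characteristic, with three equal-length lists of 0/1 values, reproduces the "one model per feature" design described above. In practice an established statistics package would be preferable to a hand-written solver.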

Results
Demographic Data: For quality control, we removed twelve individuals with a diagnosis of Down syndrome, five individuals with more than two large CNVs, and one individual with a CNV larger than 45 MB. After quality control, genotype data were available for 997 ASD individuals and WES data for 808 ASD individuals (Table 1). Of these probands, 70% were male, and males tended to have a lower age of ASD diagnosis (Table 1). Of the comorbidities and birth characteristics for the population of ASD probands (Table 2), ID was most common (46%), and epilepsy was second (31%). Individuals with congenital anomalies had the lowest age of ASD diagnosis, while individuals with SZ had the highest age of ASD diagnosis. Comorbidities and birth characteristics of the probands who were not genotyped or sequenced are presented in Table S4.

Genetic findings
Of the 997 individuals who were genotyped, 104 (10.4%) carried one or more likely pathogenic CNVs (Table 3, Table S2). Fourteen probands had two likely pathogenic CNVs each, for a total of 118 likely pathogenic CNVs in the population sample; 65 of these were deletions and 53 were duplications, and they ranged in size from 217 kb to 43 Mb (median 3.6 Mb). There were 53 CNVs considered to be recurrent genomic disorders. Those loci with more than one overlapping likely pathogenic CNV/genomic disorder in different probands are described in Table 3; see Table S2 for full results. Of the 808 individuals for whom WES was performed, 69 (8.5%) had a likely pathogenic variant, and no individuals had more than one (Table S3).
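The counts in this paragraph can be cross-checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are those reported in the text. With 104 CNV carriers of whom 14 carry two CNVs, the total is 90 + 2 × 14 = 118, and 69 of 808 sequenced individuals corresponds to about 8.5%.

```python
# Counts as reported in the text.
cnv_called, cnv_carriers, double_cnv_carriers = 997, 104, 14
wes_called, snv_carriers = 808, 69
deletions, duplications = 65, 53

# Carrier proportions, as percentages rounded to one decimal place.
cnv_rate_pct = round(100 * cnv_carriers / cnv_called, 1)
snv_rate_pct = round(100 * snv_carriers / wes_called, 1)

# Single carriers contribute one CNV each, double carriers two each.
total_cnvs = (cnv_carriers - double_cnv_carriers) + 2 * double_cnv_carriers

print(cnv_rate_pct, snv_rate_pct, total_cnvs, deletions + duplications)
```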

Comorbidities and Birth Characteristics of the Probands
Evaluating medical and psychiatric comorbidities among individuals with ASD (Tables 4 and 5), the likely pathogenic CNV and SNV groups showed slightly different average ages of ASD diagnosis by group and by sex, although none of these differences were significant (P value > 0.05). Among the phenotypes of individuals with likely pathogenic CNVs and SNVs (Table 5), ID was the most common disorder. Congenital anomalies and scholastic skill disorders were associated with having likely pathogenic CNVs, with odds ratios (ORs) of 1.85 (95% CI, 1.12, 2.97) and 3.05 (95% CI, 1.64, 5.47), respectively (Table 6). For probands carrying a damaging SNV, versus those who did not, ID and epilepsy showed a significant positive association (Table 6). In general, these patterns were similar when carrier status for either a CNV or SNV was assessed (Table 6). We next compared the effect of carrier status for ASD subjects who do or do not manifest ID for the largest group of genetically characterized subjects, those who were genotyped (Table 7). Although ASD subjects who had a likely pathogenic CNV were more likely to have ID, ASD subjects with and without ID showed no significant differences in the association of carrier status with other, potentially associated phenotypes (P value > 0.05 for all tests). Thus, conditioning on ID status does not explain much of the variation for other CNV-related associations. For instance, the risk for congenital abnormalities is similar for likely pathogenic CNV carriers whether or not they meet criteria for ID, 2.46 versus 3.87 (Table 7).

Discussion
In this study, we characterized CNVs and SNVs in a population sample of individuals from Sweden identified with autistic disorder (AD). Of this population sample, 17.6% had a likely pathogenic CNV or SNV. Carriers of likely pathogenic CNVs make up 10.4% of the individuals with AD in the Swedish population, similar to that reported for European ancestry in other studies (19,20). Among individuals with AD, we observed a significant association between likely pathogenic genetic findings and intellectual disability, scholastic skill disorders, and epilepsy. Individuals with CNVs had elevated risk for congenital anomalies, a relevant risk factor for autism.
Eight individuals had 15q11q13 duplication syndrome, which contributes to delayed development and impairment of motor functions. Six individuals had GRIN2B mutations, which are reported in individuals with muscle tone abnormalities, epilepsy, and ASD (21). One individual had a SHANK3 SNV and two individuals had a CNV including SHANK3; SHANK3 encodes a protein that is essential for proper functioning of the synapse, the junction between neurons.
PAGES is a population sample, and we did not exclude subjects with complex presentations; the sample is therefore likely to be more representative of the population. Importantly, the results agree with previous literature. For example, 15q11-q13 duplication syndrome is often cited as a prominent genomic disorder in ASD (22,23); ASD in the presence of a CNV or SNV in SHANK3 (Phelan-McDermid syndrome) is estimated to account for 0.5-2% of ASD (24); and comorbid epilepsy and ASD are often traced to GRIN2B mutations. POGZ is emerging as a major gene in ASD (4), similar to what is observed here. Other genes with frequently observed ASD-related mutations were SATB1 (n = 4), DYNC1H1 (n = 4), and CREBBP (n = 3).

Limitations
The results of this study should be interpreted in the context of some limitations. First, not all variants were validated by a second method; therefore, some could be artifacts. Nonetheless, a substantial portion of the CNVs were independently validated by calling CNVs from the whole-exome data (16), and the validation rate of SNVs is similarly high, as documented by variant calls from whole-genome versus whole-exome sequencing (25). Second, judgment calls and empirically defined thresholds were used to identify likely pathogenic variants. Third, we did not have de novo information to aid in classification. Fourth, this study focused on AD, and future studies on individuals with less profound ASD are warranted in order to draw a more comprehensive picture of the genetic architecture of the autism spectrum. Finally, phenotypes are defined from medical registers, which may lead to underestimates of milder findings.

Conclusions
This population survey, with its characterization of rare likely pathogenic variants, provides greater insight into the genetic architecture of ASD and associated comorbidities. Moreover, because many of the same subjects have been characterized for genotypes from common variants, we can explore the genetic architecture of ASD in far greater detail. Indeed, in an accompanying manuscript (Klei et al.), we explore the joint contributions of rare and common variation to liability for ASD, finding that they work together approximately additively.