A Novel Missense ZBTB18 Mutation Identified in an Intellectual Disability Family by Whole Genome Sequencing

Abstract

Background
Intellectual disability is a generalized neurodevelopmental disorder characterized by significantly impaired intellectual and adaptive functioning. About a quarter of cases are caused by genetic diseases, and about 5 percent are inherited from a person's parents. The objective of this study was to explore the candidate disease-causing gene in an intellectual disability family.

Methods
Whole genome sequencing was performed on affected twins, their affected children, and unaffected parents. The data were filtered to retain rare variants, and in silico prediction tools were used to assess pathogenicity. Sanger sequencing was then used to validate the causative mutation.

Results
In the family, a previously undescribed heterozygous variant in the ZBTB18 gene, c.1323C>G (p.His441Gln), was identified. The mutation co-segregated with all affected individuals in the family and was not found in unaffected members.

Conclusions
The c.1323C>G mutation in the ZBTB18 gene on chromosome 1 may be related to the phenotype of intellectual disability in this family. WGS is an efficient method for the molecular diagnosis of hereditary intellectual disability.

Background
Intellectual disability (ID) is a generalized neurodevelopmental disorder characterized by substantial impairment in intellectual functioning (reasoning, learning, problem solving) and adaptive behaviour (conceptual, social and practical skills) that originates in the developmental period and affects 1% to 3% of children [1,2]. ID can occur in isolation or in combination with congenital malformations or other neurological features such as epilepsy, sensory impairment and autism spectrum disorders (ASD), and its severity (mild, moderate, severe and profound) is highly variable [3,4]. ID poses medical, financial, and psychological challenges for individuals, families, health-care systems, and societies [5].
ID is genetically and phenotypically extremely heterogeneous [3]. There are many etiologies of ID: genetic causes (e.g., chromosomal abnormalities, copy number variants (CNVs), or mutations), environmental causes (e.g., alcohol and other teratogens, prenatal infections), traumatic brain injury, neurologic/brain disorders, nutritional deficiencies, and inborn errors of metabolism [3,6]. Among them, genetic factors are regarded as one of the most prominent etiologies.
Many studies have been performed to reveal the genetic etiology of ID. The traditional testing methods for ID diagnosis include karyotyping, microarray, polymerase chain reaction, Fragile X testing, fluorescent in situ hybridization and mitochondrial DNA testing [7]. Chromosomal microarray analysis (CMA), based on comparative genomic hybridization (CGH) or SNP arrays, has long been the first-tier test for children with ID of unknown etiology [2,8]. Many previous studies on the genetic etiologies of ID were based on CMA [9][10][11]. These methods have intrinsic, specific limitations. Karyotyping and microarray survey an individual's whole genome, but at low resolution, whereas the other methods (polymerase chain reaction, Fragile X testing, fluorescent in situ hybridization, and mitochondrial DNA testing) offer higher resolution but cover only a small part of the genome.
In recent years, with the rapid development of next generation sequencing (NGS), approaches such as whole-exome sequencing (WES) and whole-genome sequencing (WGS) have made it possible to identify genetic variants both broadly and at high resolution. WES and WGS are increasingly used to identify pathogenic genes for ID [12][13][14][15][16][17][18][19][20][21]. According to the results of previous studies on X-linked, autosomal-dominant and autosomal-recessive ID, over 700 genes and 130 rare CNVs have been identified, which can be used for the genetic diagnosis of both ID and ID-associated disorders [3,22]. In the current study, we performed whole genome sequencing in an ID family to identify the candidate gene.

Subjects Clinical samples
A three-generation family with ID from Sichuan province, China, was recruited after informed consent was obtained (Figure 1). Two affected members (III:1, III:2) received a full clinical evaluation. Blood samples from four affected members (II:2, II:3, III:1, III:2) and two healthy members (I:1, I:2) were collected for further study.
Clinical records and radiographic images are published with the patients' written permission. The study was approved by the Ethics Committee of West China Second University Hospital, Sichuan University (No: 2015011) and adhered to the tenets of the Declaration of Helsinki.

Bioinformatics analysis
We used SOAPnuke to remove adapters and low-quality reads, and the remaining reads were mapped to the human reference genome (UCSC GRCh37/hg19) with the Burrows-Wheeler Aligner (BWA-MEM, version 0.7.10) [23]. Variant calling was performed using the Genome Analysis Toolkit (GATK, version 3.3) [24]. Variant Effect Predictor (VEP) was used to annotate and classify all variants [25]. We then screened variants based on their frequency in public and internal databases (e.g., 1000 Genomes, gnomAD, and our internal database) and retained only variants with a minor allele frequency (MAF) < 0.005. Next, we filtered the variants according to the inheritance model of the pedigree. Finally, we predicted the deleteriousness of the candidate variants using software including SIFT (http://www.sift.jcvi.org) and PolyPhen2 (http://www.genetics.bwh.harvard.edu/pph2/).
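The frequency and inheritance filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Variant` class, the helper functions and the two toy variants are hypothetical, and the segregation rule (heterozygous in every affected member, absent in both unaffected members) is an assumption based on the pedigree described in this paper. Only the MAF cutoff of 0.005 and the sample identifiers come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

MAF_CUTOFF = 0.005  # cutoff stated in the text

# Sample identifiers as given in the pedigree description.
AFFECTED = ("II:2", "II:3", "III:1", "III:2")
UNAFFECTED = ("I:1", "I:2")


@dataclass
class Variant:
    """Hypothetical container for one annotated variant."""
    name: str
    population_mafs: Dict[str, float]  # database -> MAF; absent key = not observed
    genotypes: Dict[str, int]          # individual -> number of alternate alleles


def is_rare(variant: Variant, cutoff: float = MAF_CUTOFF) -> bool:
    """Keep a variant only if its MAF is below the cutoff in every database."""
    return all(maf < cutoff for maf in variant.population_mafs.values())


def segregates(variant: Variant) -> bool:
    """Assumed model: heterozygous in all affected, absent in all unaffected."""
    carried = all(variant.genotypes.get(s, 0) == 1 for s in AFFECTED)
    absent = all(variant.genotypes.get(s, 0) == 0 for s in UNAFFECTED)
    return carried and absent


def filter_candidates(variants: List[Variant]) -> List[Variant]:
    """Apply the frequency filter, then the segregation filter."""
    return [v for v in variants if is_rare(v) and segregates(v)]


het_in_affected = {s: 1 for s in AFFECTED}

TOY_VARIANTS = [
    # Genotype pattern reported in this paper; absent from population databases.
    Variant("ZBTB18:c.1323C>G", {}, dict(het_in_affected)),
    # Toy example: segregates, but too common in one database.
    Variant("toy_common_variant", {"gnomAD": 0.02}, dict(het_in_affected)),
    # Toy example: rare, but also carried by an unaffected grandparent.
    Variant("toy_nonsegregating_variant", {"gnomAD": 0.0001},
            {**het_in_affected, "I:1": 1}),
]

for kept in filter_candidates(TOY_VARIANTS):
    print(kept.name)
```

In practice these filters would be applied to the annotated variant calls rather than to hand-written records; the sketch only makes the order and logic of the two steps explicit.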

Sanger validation
We designed all polymerase chain reaction (PCR) primers for validation with Primer 5.0. The candidate variants were verified by Sanger sequencing to filter out false positive variants. The six family members (affected individuals II:2, II:3, III:1, III:2; unaffected individuals I:1, I:2) were sequenced by bidirectional Sanger sequencing to determine co-segregation of the candidate mutations. PCR and sequencing primers are available upon request.

Clinical features
Four individuals in this family have mild ID. The proband (III:1) was a 7-year-old male patient. At the age of 31 months, a magnetic resonance (MR) scan of the brain and a language impairment screening assessment were performed. The MR scan revealed dysplasia of the corpus callosum. The language assessment gave the following results: overall language ability was at the level of 11-12 months; speech-related ability was at the level of 9 months; auditory-related expression was at the level of 12-13 months; and visual expression was at the level of 12-13 months. At the age of 6 years, magnetic resonance imaging (MRI) showed that the posterior horns of both ventricles were enlarged, the corpus callosum was altered, and the hippocampus was small. Electroencephalogram (EEG) examination suggested an abnormality.
For patient III:2, a general examination was performed at the age of 19 months. The MR scan of the brain indicated mild paraventricular white matter softening. On the Intelligence Development Diagnostic Scale, the child's total developmental quotient (DQ) was 47.2, and the overall level of intellectual development was significantly lower than that of normal children of the same age. Specifically, the ability to cope with people was equivalent to 9 months, the ability to cope with things to 8 months, gross motor ability to 12 months, fine motor ability to 8 months, and speech ability to 8 months. III:1 and III:2 are currently attending special schools for children with ID.
For patients II:2 and II:3, no clinical diagnostic data were available. According to their parents' description, they are identical twins and often had fevers after weaning at 1 year of age. Each fever was accompanied by convulsions. After 3 or 4 episodes, convulsions occurred even when the body temperature was only 37.5℃. A doctor diagnosed possible epilepsy. After medication, the seizures became less frequent. With guidance, they can do only ordinary housework, such as cooking, washing dishes and sweeping the floor, but they cannot go shopping alone: if they overpay and the salesperson does not return the change, they do not know to ask for it.

Mutation detection
To identify the causative variants in the ID family, we performed WGS as described in the Methods. High-quality results were obtained, with mean coverage in excess of 91% and an average depth > 40X (Table 1). Following bioinformatics analysis, a de novo mutation, c.1323C>G (described with reference to RefSeq transcript NM_205768.2), was found in exon 2 of the ZBTB18 gene and segregated completely with the affected family members. The mutation causes an amino acid change from histidine to glutamine at position 441 (p.His441Gln). This variant is predicted to be probably damaging (PolyPhen-2 score 0.997) and deleterious (SIFT score 0).
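The relationship between the coding-sequence position and the protein change can be checked with a short worked example. The sketch below is illustrative only: the function names are hypothetical, and the reference codon is not taken from the transcript sequence but inferred from the variant description itself (of the two histidine codons, CAC and CAT, only CAC has a C at its third position).

```python
# Subset of the standard genetic code needed for this example.
CODON_TABLE = {"CAC": "His", "CAT": "His", "CAA": "Gln", "CAG": "Gln"}


def codon_position(c_pos: int) -> tuple:
    """Map a 1-based coding position to (codon number, position in codon 1-3)."""
    codon_number = (c_pos - 1) // 3 + 1
    position_in_codon = (c_pos - 1) % 3 + 1
    return codon_number, position_in_codon


def substitute(codon: str, position_in_codon: int, ref: str, alt: str) -> str:
    """Apply a single-base substitution, checking that the reference base matches."""
    if codon[position_in_codon - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[:position_in_codon - 1] + alt + codon[position_in_codon:]


codon_number, position_in_codon = codon_position(1323)
print(codon_number, position_in_codon)  # 441 3

# Inferred, not read from the transcript: the His codon with C at this position.
ref_codons = [c for c, aa in CODON_TABLE.items()
              if aa == "His" and c[position_in_codon - 1] == "C"]
ref_codon = ref_codons[0]

mutant_codon = substitute(ref_codon, position_in_codon, "C", "G")
print(ref_codon, "->", mutant_codon, CODON_TABLE[mutant_codon])  # CAC -> CAG Gln
```

Position 1323 is the third base of codon 441, so a C>G change there converts a histidine codon into a glutamine codon, consistent with the reported p.His441Gln.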
Following this result, we validated the mutation by Sanger sequencing in the family members. Sanger sequencing with specific primers confirmed the co-segregation of the mutation (Figure 2). Multiple orthologous sequence alignment revealed that the histidine at codon 441 of ZBTB18 is highly conserved across different species (Figure 3). This suggests that a mutation at this codon may have a deleterious effect. This variant has not been reported in the HGMD or ClinVar databases.

Discussion
In the present study, we performed whole genome sequencing in an ID family. After systematic NGS data analysis and Sanger sequencing verification, we identified a new heterozygous missense mutation (c.1323C>G, p.His441Gln) in ZBTB18 (NM_205768.2) in the four patients but not in the two unaffected individuals of the family.
The ZBTB18 gene, previously known as ZNF238 or RP58, encodes a transcriptional repressor of the BTB (broad complex tramtrack bric-a-brac) zinc finger family. The protein is composed of an N-terminal BTB domain, which mediates protein-protein interaction, and four Cys2-His2-like (C2H2) zinc fingers at its C-terminus, which mediate binding of the protein to regulatory regions within promoters. BTB domain zinc finger factors have been linked to development of the mammalian cerebellum, cerebral cortex and macroglia [26,27]. ZBTB18 is activated during neuronal differentiation in pin-like cells of the ventricular zone, and in migrating multipolar cells [28]. ZBTB18 participates in neuron and astrocyte differentiation by mediating cell-cycle control of neural stem cells [29]. ZBTB18 is essential for the growth and organization of the cerebellum and regulates the development of both GABAergic and glutamatergic neurons [30]. ZBTB18 represses the expression of pax6, ngn2 and neuroD1, and expression of these three sequential proneurogenic genes causes intermediate neurogenic progenitors (INP) to differentiate and migrate.
The ZBTB18 gene is extremely intolerant to genetic variation [34]. In humans, genetic mutations in ZBTB18 are associated with structural brain abnormalities, neuronal migration disorder and ID [12,14,16,17,35-48]. These mutations are located in the BTB domain, the C2H2 zinc finger domain or other sites of the ZBTB18 protein. The types of mutations include missense, nonsense, small deletion, small insertion, and gross deletion. Nonsense mutations, small deletions and small insertions lead to a truncated ZBTB18 protein with a heterozygous loss of the zinc finger domain, which could be considered haploinsufficiency. Gross deletions lead to the total loss of ZBTB18 protein. Researchers have suggested that haploinsufficiency/loss-of-function represents a general pathological mechanism for disease [40,46]. Most of the reported missense variants are clustered in the second, third and fourth zinc finger domains. Variants in a zinc finger domain could affect DNA-binding properties, and impaired binding of ZBTB18 to DNA will disturb its function as a transcriptional repressor [41,46]. In addition, mutating the zinc finger could have a dominant-negative effect by rendering the wild-type protein unable to bind DNA because it dimerizes with a mutant protein [41]. It has been suggested that altered transcriptional regulation could represent an important pathological mechanism for ZBTB18 missense variants in human disease [49].
The novel missense mutation c.1323C>G (p.His441Gln) of the ZBTB18 gene was predicted to be pathogenic based on supportive information. First, it is absent from the normal population and from the National Center for Biotechnology Information (NCBI) single nucleotide polymorphism database (dbSNP). Second, the mutated residue is highly conserved across different species, and the substitution significantly alters the chemical properties of the amino acid. Functional studies are still required to confirm the potential effect of the mutation on the protein.
Identifying the disease-causing mutation in patients with ID is necessary to provide proper genetic counseling and to guide gene-specific therapies in the future. The extensive genetic heterogeneity of this disorder requires genome-wide detection of all types of genetic variation. WGS enables sequencing of the entire 3 billion bases of the human genome, including both coding and noncoding DNA. In recent years, the cost of WGS has gradually decreased. Therefore, WGS is increasingly being used to search for causative mutations in order to reach an accurate molecular diagnosis [13,18,19,21]. Our study also shows that WGS is an efficient method for the molecular diagnosis of hereditary ID.
In conclusion, the c.1323C>G (p.His441Gln) mutation of the ZBTB18 gene expands the spectrum of mutations that cause ID. This study provides supporting evidence that WGS is a highly efficient strategy for the molecular diagnosis of ID.

Figure 2 The validation results of Sanger sequencing of ZBTB18 (NM_205768)
Figure 3 Alignment of ZBTB18 protein sequences across species