Identification of Genomic Rare Variants by Whole Genome Sequencing in Primary Torsion Dystonia

Background: Primary torsion dystonia (PTD) is a group of related movement disorders characterized by abnormal, repetitive, twisting postures due to the involuntary co-contraction of opposing muscle groups. This study uses whole genome sequencing of PTD patients to analyze the pathogenic genes and mutation sites in primary dystonia and the relationship among genotype, clinical phenotype and prognosis. Methods: To investigate the association between the familial disease and its molecular mechanisms, 100 normal Han Chinese donors were also examined. The DNA of all samples was sequenced using whole genome sequencing. Genomic DNA from the participants was submitted to the Macrogen Group (Seoul, Korea) for analysis. Results: For the proband, the data output was 112.91 G, the throughput mean depth was 39.50X, the mappable mean depth was 35.70X and the genome coverage ratio was 99.50%. A novel heterozygous missense variant of uncertain significance (VUS) in ANO3 was found in the patient with primary torsion dystonia, but not in the healthy control group. Conclusions: We report a new mutation, possibly pathogenic, in a known gene associated with similar phenotypes, which lays the foundation for future work. More families will be sequenced to obtain more information, which can help us make the correct molecular diagnosis of the disease and provide better genetic information.


Background
Primary torsion dystonia (PTD) is a disease of the extrapyramidal system characterized by abnormal posture and movement, caused by uncoordinated or excessive contraction of agonist and antagonist muscles. The pathogenesis of PTD is not completely understood. Over the last few decades, several novel disease-associated genes (DYT1-27) have been identified in dystonic syndromes, but the underlying genetic diagnosis remains elusive in most patients [1]. Nevertheless, almost all primary dystonias have a genetic basis [2][3].
At present, the study of genetic diseases focuses mainly on genes, and the principal method is gene sequencing. The development of whole genome sequencing, and in particular the fall in the cost of sequencing a single human genome to about ten thousand yuan, has brought new opportunities for the study of genetic diseases. Many problems that hampered traditional cloning approaches, such as too few family members, sporadic cases, heterogeneity of gene loci, incomplete penetrance, and too many candidate clones in the target region, have been resolved [4]. Whole genome sequencing compensates for the inability of whole exome sequencing to capture disease-related structural variations and non-coding variants. It can detect genomic changes that cannot be detected in other ways, such as mutations in non-coding regions, including promoters, enhancers, introns, and non-coding RNAs (including microRNAs). Chromosomal rearrangements, including inversions, tandem repeats, and deletions, can also be detected. The large number of genetic differences identified supports evolutionary analysis and the prioritization of candidate genes [5]. The technique is used in many fields, such as clinical medicine, population genetics, association analysis and evolutionary analysis. Compared with exome sequencing, whole genome sequencing surveys a wider range of the genome and allows a more thorough analysis of genetic disease [6].
So far, the identification of PTD genes and genetic risk factors has proved to be a difficult task, and the introduction of the latest whole genome sequencing technologies could drive progress in these areas [7]. The advantages of the HiSeq X platform compared with the HiSeq 2000 are shown in Table 1. ANO3 encodes a homodimeric protein that belongs to a family of structurally related proteins, which includes Ca2+-activated chloride ion channels and membrane phospholipid scramblases with different expression patterns. ANO3 consists of eight hydrophobic transmembrane helices that act as Ca2+ sensors for regulating calcium homeostasis [8].
The exact function of ANO3 is unclear. Recent experiments have shown that it does not act as a Ca2+-activated chloride ion channel and may instead act as a Ca2+-dependent phospholipid scramblase [9]. ANO3 appears to play a role in the regulation of neuronal excitability and is highly expressed in the striatum, hippocampus and cortex [10]. Mechanistically, pathogenic variants in ANO3 may lead to abnormal excitability of striatal neurons, which manifests as uncontrolled dystonic movements. The expression level of ANO3 mRNA is highest in the striatum, 5.30 times that in the frontal cortex and 70 times that in the cerebellum [11], and its abnormality can affect the endoplasmic reticulum-related calcium-gated chloride channel, which leads to disease [12].
At present, treatment of the disease relies mainly on drugs and stereotactic surgery, but these treatments are only symptomatic and have many limitations, and the pathogenesis of dystonia is not completely clear. It is therefore necessary to screen for new loci in DYT genes, discover new related genes, and study the mutant genes and related proteins. This study uses whole genome sequencing of PTD patients to analyze the pathogenic genes and mutation sites in primary dystonia and the relationship among genotype, clinical phenotype and prognosis. Detecting mutations in genetic diseases and discovering new genes or mutations can help us make the correct molecular diagnosis of the disease and provide better genetic information.

Human Samples
To investigate the association between the familial disease and its molecular mechanisms, 100 normal Han Chinese donors were examined. The diagnosis of primary torsion dystonia was based on typical clinical and laboratory measurements. Peripheral blood was collected in anticoagulation tubes from all study participants, and genomic DNA was extracted from the leukocytes of family members and normal donors using the phenol-chloroform protocol, following standard procedures. The protocol of the present study was approved by the Ethics Committee of The Second Clinical Medical College, Jinan University, Shenzhen People's Hospital (Shenzhen, China), and written informed consent was obtained from all participants.

Whole Genome Resequencing
The DNA of all samples was sequenced using whole genome sequencing. Each sample was prepared according to the Illumina TruSeq DNA sample preparation guide to obtain a final library with an average insert size of 300-400 bp. The libraries were sequenced on an Illumina HiSeq X sequencer with 2 × 150 bp paired-end reads. One microgram (TruSeq DNA PCR-free library) or 100 nanograms (TruSeq Nano DNA library) of genomic DNA was fragmented using a Covaris system, and the fragments were converted to blunt ends using an End Repair Mix. Following end repair, the appropriate library size was selected using different ratios of the Sample Purification Beads.
PCR was used to amplify the enriched DNA library for sequencing. Quality control analysis was then performed on the sample library, and the DNA library templates were quantified. Illumina sequencing uses a unique "bridged" amplification reaction that occurs on the surface of the flow cell. Sequencing-by-Synthesis chemistry uses four proprietary nucleotides possessing reversible fluorophore and termination properties. Each sequencing cycle occurs in the presence of all four nucleotides. The cycle is repeated, one base at a time, generating a series of images, each representing a single base extension at a specific cluster.

Variant Analysis
Genomic DNA isolated from the participants' blood was submitted to the Macrogen Group (Seoul, Korea) for analysis. After each sample passed quality inspection, shotgun libraries were constructed. The approximately three million SNPs obtained were filtered against public databases to exclude variants common in the population, so as to focus on variants that may change the higher-order structure of the protein and affect gene function. Candidate pathogenic variants were then identified by taking the family situation of the samples into account. The public databases used for filtering were dbSNP, the 1000 Genomes Project and ESP6500. Through structural and database annotation, hundreds of rare variants altering protein-coding amino acids were found in the samples, as shown in Table 2.
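The filtering logic described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline used by the sequencing provider: the `Variant` record, the `candidate_variants` helper, the 1% allele-frequency cutoff and the consequence labels are all hypothetical, and the toy records are invented except for the ANO3 position reported in this paper.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: the paper does not state the frequency threshold used.
MAX_POPULATION_AF = 0.01

# Hypothetical labels for variant classes that may alter the encoded protein.
PROTEIN_ALTERING = {"missense", "frameshift", "stop_gained", "splicing"}


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    consequence: str
    # Allele frequency per database; a missing key means "not reported there".
    population_af: dict = field(default_factory=dict)


def is_rare(variant, max_af=MAX_POPULATION_AF):
    """True if no public database reports the variant above max_af."""
    return all(af <= max_af for af in variant.population_af.values())


def candidate_variants(proband, controls, max_af=MAX_POPULATION_AF):
    """Keep rare, protein-altering proband variants absent from all controls."""
    seen_in_controls = {(v.chrom, v.pos, v.ref, v.alt) for v in controls}
    return [
        v for v in proband
        if is_rare(v, max_af)
        and v.consequence in PROTEIN_ALTERING
        and (v.chrom, v.pos, v.ref, v.alt) not in seen_in_controls
    ]


# Toy records for illustration; only the ANO3 coordinates come from the paper.
proband = [
    Variant("11", 26556013, "A", "C", "ANO3", "missense"),
    Variant("1", 1000000, "G", "A", "GENE_X", "missense",
            {"dbSNP": 0.20, "1000G": 0.18}),
    Variant("2", 2000000, "C", "T", "GENE_Y", "synonymous"),
]
controls = [Variant("1", 1000000, "G", "A", "GENE_X", "missense", {"dbSNP": 0.20})]

for v in candidate_variants(proband, controls):
    print(v.gene, v.chrom, v.pos, f"{v.ref}>{v.alt}")
```

In this sketch, a variant with no frequency reported in any database is treated as rare, which is why a novel variant passes the filter; a real pipeline would also need to handle multi-allelic sites and annotation against a specific transcript.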

Results
In this report we describe a patient with primary torsion dystonia who carries a novel heterozygous missense variant of uncertain significance (VUS) in ANO3 (Chr11: g.26556013 A > C; N294H; NM_031418.2 (ANO3)). A total of 240 Gbp of data was obtained through sequencing. The overall data output and the comparison between samples are described below.

The sequencing data of the proband and healthy control groups
For the proband, the data output was 112.91 G, the throughput mean depth was 39.50X, the mappable mean depth was 35.70X and the genome coverage ratio was 99.50%. The number of SNPs was 3550305 and the number of indels was 558038 (280611 small insertions and 277427 small deletions). The number of CNVs was 824 (542 copy number gains and 282 copy number losses), and the number of SVs was 8222. For the healthy control group, the data output was 127.90 G, the throughput mean depth was 44.70X, the mappable mean depth was 40.30X and the genome coverage ratio was 98.90%. The number of SNPs was 3602141 and the number of indels was 587127 (298385 small insertions and 288742 small deletions). The number of CNVs was 831 (577 copy number gains and 254 copy number losses), and the number of SVs was 8431. Each sample in this experiment yielded more than 100 Gbp of sequencing data, the effective sequencing coverage was more than 30X, and the genome coverage rate was about 99%, which satisfies the requirements of subsequent gene mutation analysis. The numbers of SNPs and indels detected (roughly 3.6 million and 0.6 million per sample, respectively) are in line with routine testing.
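As a quick consistency check, the figures in the paragraph above can be verified with a few lines of Python. This is a minimal sketch: the dictionary keys and the `check` helper are hypothetical names, and the assumption that throughput mean depth equals total output divided by reference genome length is ours, not a definition stated in the text.

```python
# Values transcribed from the paragraph above.
samples = {
    "proband": {
        "output_gb": 112.91, "throughput_depth": 39.50,
        "indels": 558038, "insertions": 280611, "deletions": 277427,
        "cnvs": 824, "gains": 542, "losses": 282,
    },
    "control": {
        "output_gb": 127.90, "throughput_depth": 44.70,
        "indels": 587127, "insertions": 298385, "deletions": 288742,
        "cnvs": 831, "gains": 577, "losses": 254,
    },
}


def check(sample):
    """Check that the sub-counts add up and derive the implied genome length."""
    return {
        "indels_add_up": sample["insertions"] + sample["deletions"] == sample["indels"],
        "cnvs_add_up": sample["gains"] + sample["losses"] == sample["cnvs"],
        # Assumption: throughput depth = total output / reference genome length.
        "implied_genome_gb": round(sample["output_gb"] / sample["throughput_depth"], 2),
    }


total_output_gb = round(sum(s["output_gb"] for s in samples.values()), 2)
print("total output (Gbp):", total_output_gb)
for name, sample in samples.items():
    print(name, check(sample))
```

Under that assumption, both samples imply a reference length of about 2.86 Gb, and the combined output of the two samples matches the total of about 240 Gbp reported earlier.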

The pedigree analysis for the patient and healthy control samples
Based on the variants detected in the samples, we conducted a pedigree analysis of the patient and the healthy controls. A total of 96 mutations were found, in genes including SAMD11, CNR2, CYB5RL, CACHD1 and EVI5; the mutation types included missense, frameshift, protein termination, splicing and other abnormalities. Some of these genes are recorded in OMIM and ClinVar and have known pathogenicity. Based on these 96 mutations and the clinical symptoms of the proband, position 26556013 on chromosome 11 may be the proband's mutation site. The basic information is shown in Table 4. The rare coding variants were not found in the healthy control group.

Table 1. Comparison of HiSeq X and HiSeq 2000

Introduction of primary torsion dystonia
Primary torsion dystonia is caused by uncoordinated or excessive contraction of active muscles and antagonists, and about 75% of patients with dystonia have PTD [13]. Among adults with primary focal dystonia, 15-30% of cases spread to other parts of the body [14][15]. Dystonia is classified along two main axes: clinical features and etiology [16]. Clinical features include age of onset, body distribution, temporal pattern and associated manifestations (other movement disorders or other neurological features); etiology includes nervous system pathology and inheritance pattern.

Human genome sequencing opens up a new way to improve human health
With the development of gene sequencing technology, sequencing times have shortened and costs have fallen, and scientists have used genome sequencing to obtain the genome sequences of a large number of species. On this basis, whole-genome resequencing (WGR), which sequences different individuals of a species with a known genome sequence and compares their sequences, reveals the genetic differences between individuals, including large numbers of single nucleotide polymorphisms (SNPs), copy number variations (CNVs), insertions and deletions (InDels) and structural variations (SVs), and thereby the genetic characteristics of the population. Gene sequencing was first applied to phages, bacteria and viruses, then gradually to animals and plants, and finally to humans. It has achieved fruitful results, deepened the understanding of organisms in various fields, and is gradually being translated into great scientific and social benefits. The sequencing of the human genome accelerates the understanding of human genetic characteristics, genetic diseases, and rare and common diseases, and opens up a new way to improve human health.

Relationship between ANO3 and gene expression in human PTD
Whole genome sequencing identified ANO3 as a candidate gene. This identification needs to be confirmed by independent studies, which can also estimate the frequency of ANO3 mutations compared with other dystonia genes. In addition, more families are required to unravel the complete phenotypic spectrum of DYT24. Although the pathophysiology is not clear, the function of this gene is fascinating, because it is the first time that ion channel dysfunction has been implicated in the pathophysiology of dystonia [17]. In some patients with ANO3 mutations, tremor was the only initial manifestation, with no (or only later) slight dystonic posturing, and these patients were misdiagnosed with essential tremor (ET) [18]. In 2012, Charlesworth et al. [19] used a combination of linkage analysis and whole exome sequencing to carry out a genetic analysis of autosomal dominant craniocervical dystonia in England and found that ANO3 may be the causative gene. The present study is consistent with their results.
In this case, we report a new mutation that may be pathogenic in a known gene associated with similar phenotypes. Clinical genetic studies alone may not be sufficient to confirm pathogenicity, and additional functional studies may be required. With this in mind, it is essential to develop robust functional assays that truly reflect the underlying disease mechanisms, because not all functional effects are equal. As the pathways and mechanisms of the DYT24 gene, and of dystonia in general, become better understood, rare variants will better guide targeted drug design and clinical trials. This will provide the basis for future work in which more families will be sequenced to obtain more information.

Conclusions
Together, our results report a new mutation that may be pathogenic in a known gene associated with similar phenotypes, which will provide the basis for future work in which more families will be sequenced to obtain more information. Detecting mutations in genetic diseases and discovering new genes or mutations can help us make the correct molecular diagnosis of the disease and provide better genetic information.