Four Novel Gene Polymorphisms Cause Nuclear Age Related Cataract in Chinese People

Background More information on genetic variation can be obtained by exon sequencing for the diagnosis of nuclear age-related cataract (NARC). Methods In our present study, genomes of 12 DNA samples were sequenced. The average effective depth was 10× when using Illumina sequencing. After conducting whole-exon sequencing, we further performed depth analysis and spectrum analysis to determine the gene polymorphism sites closely associated with NARC. Results In genes showing single nucleotide polymorphism (SNP), there were 18,699 synonymous mutations and 17,975 missense mutations in the coding region. A total of 4,944 insertions and deletions (indels) were found. Among them, 1329 indels exhibited polymorphism and were further analyzed. Whole-exon sequencing previously showed polymorphism associated with ARC and known pathways associated with protein synthesis and metabolism. Following depth analysis (GO and KEGG analysis), we identified 20 promising candidate genes that were closely related to NARC. We further performed spectrum analysis for 26 polymorphism sites and found that ZNF573 (rs3095726, SNP), ZNF862 (rs62621204, SNP), SYNE3 (rs76499929, indel), and GAS2L2 (rs78557458, SNP) had a statistically significant relationship with NARC. The 3D protein structure showed obvious changes for ZNF573 (rs3095726, SNP) and GAS2L2 (rs78557458, SNP). Our findings provide the basis for further studies and discovery of key genes associated with NARC.

Whole-exon sequencing (WES) is a useful clinical diagnostic tool to identify disease-related variants in patients [26][27] . Currently, to determine whether disease-related mutations in patients are related to mutations in coding regions, researchers are focusing on sequencing exons rather than entire genomes.
Recently, several studies have identified variants strongly associated with disease phenotypes [26][27] by successful application of WES technology.
However, coverage in WES datasets is uneven along the exon length under high-resolution detection, which affects variant calling and can hinder the identification of new variations that may be of clinical significance.
In the present study, we analyzed and determined the key issues related to sequence structure, which lead to low coverage, and systematically studied the different parameters that may affect WES. To date, WES has become a major genetic tool, with over 100,000 exons sequenced in several diagnostic centers [28] .
To fill the gap and identify the mutation accurately, it is very important to improve the mapping algorithm and modify the design of target sequence capture technology and estimation of genetic disease heritability. Large numbers of insertions and deletions (indels) and single-nucleotide polymorphisms (SNPs) have been identified using the next-generation sequencing (NGS) technology. In many species and human diseases, short indels are involved in phenotypic diversity and are regarded as the second most common form of genomic variation [29] .
It is feasible to identify and study the molecular basis of SNPs and indels for ARC. However, to date, very limited investigations have been reported using the method of whole-genome resequencing for genes related to ARC.
The present study aimed to detect polymorphic SNP sites and indels including short indels [1-49 base pairs (bp)] across the whole exon sequence in 8 patients with NARC. In patients with NARC, it is important to identify the differences in SNPs and indels, which result in functional genes.

Methods

Subjects
A total of eight subjects were recruited from the Eye Hospital of Harbin Medical University. All the subjects received comprehensive ophthalmic examinations, including vision, slit lamp microscopy, and ophthalmoscopy. None of the subjects had blood relations (at least not among the four grandparents). All the subjects claimed to be Han (all four grandparents were Han). The study was approved by the institutional review committee, following the principles of the Helsinki declaration, and informed consent was signed by all subjects.

Lens opacity grading
According to LOCS III, a trained ophthalmologist graded the lens opacity of each right eye as cortical (C), nuclear color (NC), nuclear opalescence (NO), posterior subcapsular (P), or mixed type after pupil dilation with 1% tropicamide.

Nuclear ARC group and control group
All subjects with NARC were included in this study, and the case and control groups were recruited according to the grading conditions. The following exclusion criteria were used: (1) history of diseases such as tumor, cancer, respiratory disease, kidney disease, or history of diabetes; (2) pseudophakia or aphakia in both eyes; (3) ocular surgery history in either eye; (4) complications with other eye diseases such as fundus diseases, dislocated lens, glaucoma, trauma, high myopia, and uveitis; and (5) under 45 years of age.

Blood sample collection and DNA isolation
Peripheral blood (12 ml) samples of all subjects (8 disease cases and 4 controls) were collected in EDTA tubes and stored at -80°C before use. DNA was extracted from whole blood cells by using the mammalian blood genomic DNA extraction kit (Shanghai Life Biotech Co., Ltd., China) following the manufacturer's instructions and stored at -20°C until it was used for genotyping.

DNA library construction and sequencing
Qualified genomic DNA samples were randomly sheared into fragments with a main peak of approximately 200-300 bp by an ultrasonic high-performance sample processing system (Covaris). The DNA fragments were then end-repaired, an "A" base was added to the 3′-end, and library adapters were ligated to both ends. The library was prepared by ligation-mediated PCR (LM-PCR). The libraries were hybridized with the exon chip for capture and enrichment, the unenriched fragments were eluted, and the enriched product was amplified. The amplified products were quality-checked by Agilent 2100 Bioanalyzer (Agilent DNA 1000 Reagents) and qPCR, and then sequenced. We used the Illumina HiSeq platform to perform high-throughput sequencing of each qualified library and ensured that the amount of data of each sample met the standard. The original image data obtained by sequencing were transformed into raw reads (paired-end reads) by Illumina Base Calling software. The data were stored in FASTQ file format and are referred to as raw data.
Genomic DNA was extracted from frozen blood samples by the standard phenol/chloroform method. DNA contamination and degradation were assessed on 1% agarose gels, and the purity and concentration were tested using NanoDrop 2000 (Thermo Scientific Inc. Waltham, DE, USA). High-quality DNAs were used in library construction. For each individual, two paired-end libraries were constructed, and the read length was 2×100 bp. The sequencing was then performed on the Illumina HiSeq 2000 instrument (Illumina Inc., San Diego, CA, USA).

Read mapping and variant calling
Information analysis was started with raw data. Raw data contain adapter sequences, bases with low sequencing quality, and undetermined bases (represented by N), which can interfere with subsequent information analysis. Therefore, it is necessary to filter the raw data first to obtain clean data, or clean reads. Then, by using the alignment software Burrows-Wheeler Aligner (BWA) [30] [31] , the clean data of each sample were aligned to the human reference genome (GRCh37/HG19), and the initial alignment result file in BAM format was obtained. To ensure the accuracy of variant detection, we followed the best-practice variant detection and analysis process recommended by the official Genome Analysis Toolkit (GATK) website. For the alignment results, Picard tool [32] was used to remove duplicate reads, and GATK [33,34] was used to perform local realignment and base quality recalibration.
On the basis of the alignment results, evaluation indices such as sequencing depth, coverage, and mapping rate of each sample were statistically analyzed. In addition, to ensure high-quality sequencing data, strict quality control (QC) was applied throughout the analysis process. We used HaplotypeCaller in GATK v3.3.0 to detect genomic variations, including SNPs and indels, and filtered the raw variant calls to yield highly reliable results. Next, SnpEff (http://snpeff.sourceforge.net/SnpEff_manual.html) software was used to annotate the variants and predict their impact. The final variant and annotation results were used for downstream analysis.
To remove noise from sequencing data, we first filtered the data. The raw data filtering methods are as follows: After filtering, "clean data" were obtained, and the sequencing data were statistically analyzed, including the number of sequencing reads, data output, quality value distribution, etc. The BWA software (BWA v0.7.12) was used to align all clean reads to the human reference genome (GRCh37/HG19) with the BWA-MEM algorithm. The sequence data of each lane were aligned, and a read group ID was added to the alignment results. The criteria were as follows: overall base quality score = 20; read depth <100 for each individual; and alternative allele on either forward or reverse supporting reads >3.

GO and KEGG pathway analysis
We then used KOBAS tools for Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the genes with identified indels.
In the process of gene function annotation, a p value of <0.05 as determined by Fisher's exact test was considered significant. In addition, we downloaded genes related to known pathways from the KEGG pathway website (http://www.kegg.jp/), including growth arrest-specific 2 like 2 genes, spectrin repeat-containing nuclear envelope family member 3 genes, and zinc finger protein families, and we further determined whether these genes were present in the list of genes identified by WES data for integrated analysis.
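The significance test used for term annotation can be illustrated with a small example. The following is a minimal sketch of a one-sided Fisher's exact test for over-representation, computed as a hypergeometric tail with the standard library only. The function name and the example counts are hypothetical; this is not the KOBAS implementation, and the numbers are not taken from this study.

```python
from math import comb

def fisher_enrichment_p(hits_in_list, list_size, hits_in_background, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits_in_list), where X is the number of annotated genes
    obtained when list_size genes are drawn without replacement from
    background_size genes, hits_in_background of which carry the term.
    """
    max_hits = min(list_size, hits_in_background)
    total = comb(background_size, list_size)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total

# Hypothetical counts: 5 of 20 candidate genes carry a term that annotates
# 100 of 20,000 background genes.
p_value = fisher_enrichment_p(5, 20, 100, 20000)
print(f"p = {p_value:.3g}, significant at 0.05: {p_value < 0.05}")
```

With many terms tested at once, a multiple-testing correction would normally be applied on top of the raw p values; the text above does not state whether one was used.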

Mass spectrometry analysis
SNP typing with Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF/MS) is as follows. The target sequence was amplified by PCR, and a base was then extended on the SNP site by adding SNP sequence-specific extension primers. The prepared sample analyte was cocrystallized with the matrix of the chip. The crystal was placed in the vacuum tube of the mass spectrometer and then excited by an intense laser pulse of nanosecond (10⁻⁹ s) duration. The matrix molecule absorbed the radiation energy, which led to energy accumulation and rapid heat generation.
Consequently, the matrix crystal sublimated; the nucleic acid molecule was desorbed and transformed into metastable ions, and the ions produced were mostly singly charged. The ions acquired the same kinetic energy in the accelerating electric field, and they were then separated in a field-free drift region according to their mass-to-charge ratios and sent to the detector in a vacuum tube. Time-of-Flight (TOF) detectors are commonly used to detect ions produced by MALDI. The smaller the ion mass, the faster the ion arrives. Because mass spectrometry is highly sensitive to mass, it is easy to distinguish two gene sequences that differ by only one base and thereby deduce SNP and indel genotypes.
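The relation between ion mass and arrival time follows from energy conservation. The short derivation below assumes an idealized linear TOF analyzer; the symbols (accelerating potential $U$, drift length $L$, ion mass $m$, charge number $z$, elementary charge $e$) are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
zeU &= \tfrac{1}{2} m v^{2}
  &&\Rightarrow\quad v = \sqrt{\frac{2zeU}{m}}, \\
t &= \frac{L}{v} = L\sqrt{\frac{m}{2zeU}}
  &&\Rightarrow\quad t \propto \sqrt{\frac{m}{z}} .
\end{aligned}
```

For singly charged ions ($z = 1$), the flight time therefore grows with the square root of the mass. Two extension products that differ by a single base have different masses and so reach the detector at different times.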

Data production and read mapping
In this exon sequencing project, 12 DNA samples were sequenced with the Illumina sequencer, and an average of 14095.68 Mb of raw bases was obtained for each sample. After deleting low-quality reads, 134,176,861 clean reads (13402.31 Mb) per sample were obtained. The clean reads of each sample had high Q20 and Q30 values, which indicated good sequencing quality. The average GC content was 45.47%. The results of the exon sequencing data are shown in Table 1.
The distribution of the content and quality values of the sequence bases on clean reads is shown in Figures 1 and 2, respectively.
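Q20 and Q30 are the fractions of bases whose Phred quality score is at least 20 or 30, where a score Q corresponds to an error probability of 10^(-Q/10). The following minimal sketch computes these fractions from FASTQ quality strings. It assumes Phred+33 encoding; older Illumina pipelines used Phred+64, and the encoding of this study's files is not stated. The function names and the example strings are hypothetical.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to a base-calling error probability."""
    return 10 ** (-q / 10)

def q_fraction(quality_strings, threshold, offset=33):
    """Fraction of bases with Phred score >= threshold.

    offset=33 assumes Phred+33 ASCII encoding; pass offset=64 for
    Phred+64 files.
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0

# Two toy quality strings (hypothetical data).
quals = ["II5+", "?5I#"]
print(q_fraction(quals, 20))  # 0.75
print(q_fraction(quals, 30))  # 0.5
```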

Indel detection
Among the patients with severe lens opacity, we focused on the unique patient who showed gene polymorphisms that were inferred to be related to nuclear cataract.
Finally, after filtering, we obtained an average of 103,623,705 common differential indels that exhibited polymorphisms between patients with nuclear cataract and normal people. In this project, a target region of approximately 43.99 Mb was captured by the chip, and variation was detected within that target region. We used BWA to align the clean reads of each sample to the human reference genome sequence (GRCh37/HG19); on average, 99.94% of the reads mapped to the genome. The number of indels detected in each patient varied from 102,188,315 to 104,851,747, with an average of 103,623,705 (Table 2).
After removing duplicate reads, 103,623,705 effective reads (10290.98 Mb effective bases) were obtained on average. Of the effective bases, 54.01% mapped to the target region (i.e., capture efficiency [capture specificity]). The average sequencing depth of the target region was approximately 126.34×; 99.24% of the target region was covered by at least one read per sample, and 97.70% of the target region was covered by at least 10 reads. In addition, the single base sequencing depth profiles and cumulative sequencing depth profiles of each sample in the target region are shown in Figures 3a, 3b, and 3c, respectively. The insert fragment length (the length of the DNA fragments sequenced) of the paired-end sequencing reads is shown in Figures 3a, 3b, and 3c.
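The coverage figures reported here can be derived from per-base depth values over the target region. The sketch below is a minimal illustration. It assumes that capture efficiency means the share of effective bases that fall on target, and the function names and toy depth values are hypothetical. Under that assumption, the reported mean depth (126.34×), target size (43.99 Mb) and effective bases (10290.98 Mb) give a value of about 0.54, in line with the reported 54.01%.

```python
def coverage_summary(depths, thresholds=(1, 10)):
    """Mean depth and fraction of target positions covered at each threshold."""
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    summary = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        summary[f"frac_ge_{t}x"] = sum(1 for d in depths if d >= t) / n
    return summary

def capture_efficiency(mean_target_depth, target_size_mb, effective_bases_mb):
    """Share of effective bases on target (assumed definition)."""
    return mean_target_depth * target_size_mb / effective_bases_mb

# Toy per-base depths for eight target positions (hypothetical data).
print(coverage_summary([0, 5, 12, 30, 150, 9, 10, 200]))

# Consistency check with the figures reported in the text.
print(round(capture_efficiency(126.34, 43.99, 10290.98), 4))
```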

Functional annotation and genomic distribution
Overall, 4,944 InDels were found in all samples; 69.68% of them were present in the dbSNP database and 57.93% in the database of the 1000 Genomes Project. A total of 1329 InDels were newly discovered. The statistical data of InDel distribution for each sample and the general population are shown in Table 3.
Among the total InDels, 415 frame-shifting mutations occurred in the coding region, eight InDels formed the termination codon and nontermination codon, six InDels formed the initiation codon and noninitiation codon, and 58 InDels changed the splice acceptor or donor in the splice site region (see Table 4).
The length distribution of the InDel mutation in the coding region of each sample is shown in Figure 4.

SNP detection
Overall, 63,896 SNPs were found in all samples; 94.01% of them were present in the dbSNP database and 91.16% in the database of the 1000 Genomes Project. There were 3,178 newly discovered SNPs. The ratio of transitions to transversions was 2.69. The statistical data of SNP distribution for each sample and the general population are shown in Table 5.
Among all SNPs in the coding region, there were 18,966 synonymous mutations and 17,975 missense mutations; 39 SNPs changed a stop codon into a nonstop codon, 170 SNPs changed a codon into a stop codon, 29 SNPs changed a start codon into a noninitiating codon, and 119 SNPs changed the splice acceptor or splice donor in the splice site region (see Table 6).
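The ratio of transitions (A↔G, C↔T) to transversions (all other single-base changes) is a common sanity check on an SNP call set. The following minimal sketch computes it from a list of reference/alternate allele pairs. The function name and the example alleles are hypothetical and are not taken from the study's data.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")

def ti_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) pairs.

    Pairs that are not single-base substitutions between A, C, G and T
    are ignored.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio undefined")
    return ti / tv

# Four transitions and two transversions (hypothetical calls).
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
print(ti_tv_ratio(calls))  # 2.0
```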
The KEGG analysis mainly included the following terms: microtubule organizing center attachment site and others. For Indel polymorphism, the GO analysis mainly included the following significant terms: microtubule organizing center attachment site; cytoskeletal anchoring at nuclear membrane; nuclear outer membrane; cytoskeletal protein binding; single-organism membrane organization; rough endoplasmic reticulum; membrane organization; cytoskeleton organization; maintenance of protein location in cell; maintenance of protein location; maintenance of location in cell; actin filament binding; regulation of cell shape; organelle outer membrane; single-organism organelle organization; nuclear membrane; maintenance of location; establishment of protein localization to membrane; regulation of cellular component organization; actin binding; nuclear envelope; organelle membrane; organelle; regulation of cell morphogenesis; nucleus; intracellular membrane-bounded organelle; organelle organization; endomembrane system; cellular developmental process; protein complex; regulation of anatomical structure morphogenesis; nuclear outer membrane-endoplasmic reticulum membrane network; intracellular organelle, and others. After depth analysis, we found the presence of 20 genes (containing SNPs and Indel polymorphism sites) in six or seven NARC samples, and the gene heatmaps are shown in Figure 5. The GO and KEGG analysis of these 20 genes are shown in Tables 7 and 8.
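Selecting genes that recur across case samples, as in the "six or seven NARC samples" criterion above, amounts to counting how many samples carry a variant in each gene. The following is a minimal sketch under that reading. The sample IDs, gene names, and the function name are hypothetical placeholders, not the study's data or pipeline.

```python
from collections import Counter

def recurrent_genes(genes_by_sample, min_samples):
    """Return {gene: n_samples} for genes variant in >= min_samples samples."""
    counts = Counter()
    for genes in genes_by_sample.values():
        counts.update(set(genes))  # count each gene once per sample
    return {gene: n for gene, n in counts.items() if n >= min_samples}

# Hypothetical variant-bearing genes for eight case samples.
cases = {
    "case1": ["GENE_A", "GENE_B"],
    "case2": ["GENE_A", "GENE_A", "GENE_C"],
    "case3": ["GENE_A", "GENE_B"],
    "case4": ["GENE_A"],
    "case5": ["GENE_A", "GENE_B"],
    "case6": ["GENE_A", "GENE_C"],
    "case7": ["GENE_B"],
    "case8": ["GENE_A"],
}
print(recurrent_genes(cases, min_samples=6))  # {'GENE_A': 7}
```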

Identification of candidate genes associated with NARC
Significant SNPs were identified based on known QTLs, mass spectrometry, and the biological function of genes. Twenty genes with at least one common SNP and indel are considered to be potential genes related to ARC. The results are shown in Figure 5. Depth analysis based on the raw data of whole exon sequences revealed that 15 SNPs and 11 indel polymorphisms were closely associated with the occurrence of cataract because of their significantly high

Indel and SNP validation
To evaluate the reliability of the resequencing data, 26 randomly selected indels and SNPs were validated by mass spectrometry. According to the polymorphism locus, the primer design was optimized with the Assay Design 3.1 software (Sequenom). The synthesized primers were quality checked by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF) to assess whether the actual molecular weight was consistent with the theoretical molecular weight and whether the primer purity met the experimental requirements. In the 384-well plate containing the reaction product, 16 μL triple distilled water was added and centrifuged at 2,000 rpm for 3 min; after the resin was added, the sample was purified and desalted on a reverse shaker for 35 min. This was followed by centrifugation at 2,000 rpm for 3 min. The desalted sample was placed on the sample target and allowed to crystallize naturally. MALDI-TOF-MS was then performed. Typer 4 software was used to detect the mass spectrum peaks and interpret the genotype of each sample target site according to the mass spectrum peak map. SHEsis software (http://analysis.bio-x.cn/SHEsisMain.htm) was used to analyze the difference in allele frequency and haplotype distribution between the normal group and the case group. The chi-square test was performed to analyze the relationship between the SNPs of each gene and disease risk, and P < 0.05 was considered to be significant. After specific amplification and extension of SNP loci related to peripheral blood DNA in all case and control samples, four polymorphic loci were successfully genotyped.
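The allele-level association test can be sketched as a Pearson chi-square test on a 2×2 table of allele counts (cases versus controls, reference versus alternate allele). The following minimal sketch uses only the standard library and no continuity correction, with the p value for one degree of freedom obtained through the complementary error function. The allele counts are hypothetical and this is not the SHEsis implementation. With expected counts as small as those from 8 cases and 4 controls, an exact test would usually be preferred.

```python
from math import erfc, sqrt

def allele_chi_square(case_ref, case_alt, ctrl_ref, ctrl_alt):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Returns (chi2, p_value).
    """
    table = [[case_ref, case_alt], [ctrl_ref, ctrl_alt]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_ref + ctrl_ref, case_alt + ctrl_alt]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 8 cases (16 alleles) and 4 controls (8 alleles).
chi2, p = allele_chi_square(case_ref=4, case_alt=12, ctrl_ref=7, ctrl_alt=1)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```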
Of the 12 indels that were retained after screening, we chose those with polymorphism in patients with severe lens opacification, known as "common differential variants." The process of this study was as follows: we first collected all polymorphic variations in patients and then retained the common variation with the same allele distribution pattern in individuals.

Statistical analysis
We compared the above variants with those in the human SNP database (NCBI dbSNP, http://www.ncbi.nlm.nih.gov, updated on July 11, 2016) and found that three SNP polymorphism sites and one indel polymorphism site were significant. The frequencies of genotypes and alleles of the 26 polymorphism sites in patients with ARC and controls are shown in Table 9.

Discussion
In eukaryotic genomes, zinc finger (ZNF) proteins are among the most abundant proteins. Their functions include DNA recognition, RNA packaging, transcriptional activation, apoptosis regulation, protein folding and assembly, and lipid binding. The structure of ZNF proteins is as diverse as their function. Recently, many ZNF domains with new topological structures have been reported, which provide important insights into the structure/function relationship. In the eukaryotic system, ZNFs are one of the most common proteins with a wide range of biological functions, including apoptosis, protein structure, DNA recognition, and RNA transcription [30] .
Transcriptional inhibition regulates gene expression in mammals and is an important part of the molecular mechanism. Some studies have shown that there are approximately 2,000 transcription factors in the human genome, and there are ZNF motifs in nearly 800 transcription inhibitors [31] . In the regulatory region of the genome, these motifs recognize specific DNA sequences, and the interaction between ZNF proteins and their DNA targets is key to the regulation of gene expression [32] .
The most common of these motifs is the C2H2 or Krüppel type of ZNF (KZNF). It is reported that a motif termed a Krüppel-associated box (KRAB) could recruit histone deacetylase complexes to the DNA region to which the ZNF is attached and occupy one-third portion in KZNFs [33] .
Recently, some structural studies of ZNF proteins have shown new insights into their extraordinary diversity of structure and function. Although a large number of putative zinc finger motifs have been identified, the structures of only a few of them have been characterized. Some studies have proved that there are novel folds, while recent studies have shown that uncharacterized zinc finger domains are built on common structural cores.
Although remarkable progress has been made in this area, the structure, function, and mechanism of ZNF protein need further study. It is, however, very clear that these small, independently folded protein domains play a key role in regulating a series of significant biological functions. The other functions of ZNF protein besides DNA and RNA recognition and packaging are gradually being recognized.
In our study, after whole-exon sequencing, we further performed depth analysis including GO and KEGG analysis, and we found the presence of 20 sites (containing SNPs and indel polymorphism sites) in 6 or 7 NARC samples. Next, we performed mass spectrometry analysis and verified that there are significant differences between the disease group and the control group for rs3095726 (ZNF573). We also investigated the structural variation caused by rs3095726 in the 3D protein. Analysis with the SWISS-MODEL software showed that rs3095726 (Met523Val) caused a structural variation in ZNF573 protein.
The 3D protein structural variation analysis showed that the position of primary two carbon atoms and a hydrogen bond changed from the tail position to the middle position for the corresponding protein of rs3095726. The space ball structure analysis showed that the position of protein from initial obvious position to later hidden position may be caused by the fact that ZNF573 is not easily activated, which influences the function of ZNF573. Because ZNF573 is involved in a broad range of biological functions including apoptosis, protein structure, DNA recognition, and RNA transcription, the structural variation in ZNF573 protein may result in the abnormal growth and development of human lens, further leading to the occurrence of cataract.
The molecular mechanism of KRAB domain inhibition is not clear. In our study, we found significant differences between the disease group and the control group for rs62621204 (ZNF862). The search results of rs62621204 (Arg56Cys) in PubMed indicated that this site is located in the region of the "Krüppel-associated box," which has been confirmed as the transcriptional inhibition region called the Krüppel-associated box (KRAB) and is conserved in several Krüppel-type ZNF proteins [34] .
KRAB is rich in charged amino acids and can be divided into two subdomains: A and B. These two subdomains can be folded into two amphiphilic α helices. KRAB A and B boxes can be separated by varying interval fragments. Many KRAB proteins contain only one box [35] .
The functions of the known members of the KRAB protein family include transcriptional inhibition of RNA polymerase I, II, and III promoters; RNA binding and splicing; and nucleolar function control. When the KRAB domain binds to the template DNA through the DNA binding domain, the KRAB domain acts as a transcription inhibitor. The sequence of 45 amino acids in the KRAB A subdomain has been proven to be sufficient and necessary for transcriptional inhibition.
The B-box itself does not suppress, but it does strengthen the repression imposed by the A-subdomain of KRAB [36] .
The KRAB domain is usually encoded by two exons. The regions encoded by these two exons are called KRAB-A and KRAB-B. Although the functions of KRAB-ZFPs are largely unknown, they seem to play an important role in cell differentiation and development, organ development, and regulation of virus replication and transcription.
The molecular mechanisms of repression by the KRAB domain are not known. In our study, we found significant differences between the disease group and the control group for rs62621204 (ZNF862), and we further investigated the 3D protein variation caused by rs62621204. Analysis with SWISS-MODEL software revealed that rs62621204 (Arg56Cys) was not in the region of protein coding. On the basis of this result, we speculated that although this site was beyond the region of protein coding, rs62621204 (Arg56Cys) caused some variation in the KRAB domain.
Gas2 [MIM: 602835] is expressed in many human tissues, and it is involved in the regulation of microfilament dynamics during cell cycle and apoptosis [37,38] . Gas2 belongs to the microfilament network system and is a protein whose expression is also regulated during the growth arrest of diploid fibroblasts and has remained conserved in the evolution of species. Gas2L2 is a member of the Gas2 family, which includes Gas2, Gas2L1, Gas2L2, and Gas2L3 [39,40] . Gas2L2 has six exons that encode a 97-kDa protein. Previous studies have shown that Gas2L2 is located in actin stress fibers and microtubules and thus promotes the coordinated arrangement of actin and microtubules to different extents [40] . However, little is known about the function and location of Gas2L2 in natural tissues. Cell transformation leads to changes in cell morphology, cell metabolism, gene expression, and growth control. Consequently, the cells become defective in reaching growth stagnation [41] . Growth stagnation, cell cycle arrest, or G0 is generally considered to exist only in the "negative" phase of the cell cycle. The isolation of genes highly expressed in growth arrest (Gas) has proved its existence. This provides a new tool for the study of cell biology of growth stagnation.
The Gas2 protein is induced in cultured cells during growth arrest [42] . It is also associated with apoptosis-related rearrangement of cytoskeleton in cultured cells [43] and possibly with the development of mammalian tissues [44] . During the transition from G0 to G1, the phosphorylation of the Gas2 protein in serine and threonine is believed to be a mechanism for the rapid induction of Gas2 inactivation after serum stimulation of blocked cells [45] .
Therefore, the recognition of the function of components with good characteristics in the microfilament system opens up a new research field for linking the regulation of microfilament network and cell growth.
Cell shape is mainly determined by a complicated network of factors, including membrane cytoskeleton coupling factor, cytoskeleton-related elements, and cytoskeleton components [46] . It remains unclear whether the microfilament network plays a direct role in the generation of growth control signals or only needs to generate motion and chemotaxis responses, which are coupled but independent events in the growth cycle process.
The identification and characterization of new components in the microfilament system that are closely related to the growth condition may be an important step to clarify the role of this system in growth control [47][48][49] . A good candidate for this kind of protein is Gas2, which has been proven to be an integral part of the microfilament system and its expression in mouse and human fibroblasts is highly induced when they are stagnant [50] . There is increasing interest in understanding the molecular processes that lead to growth arrest.
A previous study [51] has demonstrated that Gas2 is an evolutionarily conserved component of the microfilament system. When the growth of cells is restricted, the level of the Gas2 protein increases steadily, but because of its long half-life, there is no significant downregulation in the process of G0-G1 cell transformation.
In fact, cell responses (e.g., cell movement) that depend on the microfilament system are significantly restricted in stationary cells [52] . This probably implies that specific elements are needed for the tissue to have such cytoskeleton-related constraints: Gas2 can represent one of these elements.
In our study, after whole-exon sequencing, we performed depth analysis including GO and KEGG analysis and found that 20 sites (containing SNP and indel polymorphism sites) are present in six or seven NARC samples. We then performed mass spectrometry analysis and found significant differences between the disease group and the control group for rs78557458 (GAS2L2). We also investigated the 3D protein structural variation caused by rs78557458.
There is strong evidence that nesprin-3 (SYNE3) interacts with intermediate filaments in vivo and in vitro through plectin [53,54] . However, the role of nesprin-3 in the pathogenesis of glucocorticoid-induced cataract has not yet been confirmed. In the present study, we assessed the expression and function of nesprin-3 in human lens epithelial cells (HLECs). The results showed that SYNE3 gene polymorphism was different between the disease group and the control group.
Therefore, the functional modifications of SYNE3-mediated 3D protein structural changes in HLECs merit further investigation.
The loss of nesprin-3 changed the cytoskeleton around the nucleus, but not the entire cytoskeleton. This finding was consistent with previous studies [55][56][57] .
We believe that nesprin-3 provides a scaffold for the perinuclear organization of plectin and that it is involved in the link between the centrosome and the nucleus.
The importance of nesprin-3 is not necessarily limited to the nuclear membrane, because the separation of plectin and vimentin on the nuclear membrane may affect its availability in other roles. In fact, recent studies have shown that plectin/vimentin complexes are key in the regulation of focal adhesion and shape of mouse fibroblasts [59]. In addition, the in vivo results of plectin- and vimentin-deficient mice further emphasized the potential importance of the proposed nesprin-3/plectin/vimentin linkage.
The function of nesprin-3 is to maintain the normal nuclear localization of HLECs, which adds new content to the already complex cytoskeleton and tissue structure around the nucleus. Our study showed significant differences between the disease group and the control group for rs76499929 (SYNE3). We then investigated the 3D protein structural variation caused by rs76499929. The SWISS-MODEL software showed that rs76499929 (Leu796del) was not in the region of protein coding. On the basis of this result, we speculated that although this site is beyond the region of protein coding, rs76499929 (Leu796del) influenced the function of SYNE3 protein, which resulted in abnormality of cytoskeleton and tissue structure around the nucleus.
In conclusion, we found four new polymorphism sites associated with ARC, which could be the cause of the disease. Our findings were supported by the results of studies on gene and protein structural variation. In this study, we adopted the whole-exon sequencing data to discover the molecular mechanisms involved in nuclear ARC. The data were verified through several filters and databases, and the results of depth analysis were further validated by mass spectrometry experiments. The corresponding gene expression was measured and compared with those of healthy people, and the results showed that the variations in the 3D protein structure caused by ZNF573 (rs3095726, SNP), GAS2L2 (rs78557458, SNP), ZNF862 (rs62621204, SNP), and SYNE3 (rs76499929, indel) were involved in the mechanism of nuclear ARC formation.

Conclusions
Our findings provide the basis for further studies and discovery of key genes associated with NARC.