Low-frequency and rare coding variants of NUS1 contribute to pathogenesis and phenotype of Parkinson’s disease: a case-control study

Background: NUS1 has recently been identified as a candidate risk gene for Parkinson’s disease (PD), but the contribution of rare and low-frequency NUS1 variants to PD susceptibility and phenotypes is largely unknown. Methods: In this case-control study, whole-exome or Sanger sequencing was performed on the subjects (4,779 cases vs. 4,442 controls) to analyze the coding sequence of NUS1. The associations between variants and phenotypic data were analyzed using the sequence kernel association test and regression models. Results: A total of 13 variants were identified. Ten of them, found in 12 patients and one control, were rare variants, and three were low-frequency variants. Three rare variants (R86L, N144K, D163H) might be pathogenic. We identified a significant burden of rare NUS1 variants in PD (adjusted P = 0.016). Two low-frequency variants, rs550854234 and rs539668656, were associated with PD (odds ratio = 0.76, adjusted P = 0.041; odds ratio = 2.80, adjusted P = 0.016; respectively). Analyses stratified by age at onset showed that the same two variants were associated with late-onset PD (odds ratio = 0.66, adjusted P = 0.025; odds ratio = 2.96, adjusted P = 0.025; respectively). Genotype-phenotype analyses showed that, among patients with PD, carrying rare variants, rs550854234, or rs539668656 was significantly associated with earlier onset age, emotional impairment, and tremor severity. Conclusions: Our study suggests that rare and low-frequency NUS1 variants play an important role in the pathogenesis and phenotype of PD. Moreover, our data will help clarify the role NUS1 plays in the pathogenesis of PD and further the development of personalized treatments for PD.


Introduction
Over the past 20 years, considerable efforts have been made to identify the genetic factors that cause the Parkinson's disease (PD) phenotype. Research has focused on the identification of PD-associated mutations, resulting in important discoveries in the genetics of PD [1]. This knowledge, in turn, may accelerate the development of mechanism-based therapies. To date, association studies have identified many genetic risk loci and many variants associated with the PD phenotype [2]. The discovery of PD-related genetic risk factors could help optimize the prevention and management of PD by enabling the identification of high-risk individuals. Recently, the targeted capture and sequencing of the protein-coding regions of the genome, known as exome sequencing, has been widely used in research on PD and other complex diseases [3,4]. Our recent whole-exome sequencing (WES) analysis [5] identified NUS1 as a candidate risk gene for PD. However, a recent association study with a small sample size [6] failed to observe any significant association between NUS1 variants and PD. Considerable evidence suggests that a few low-frequency variants play a significantly greater role in establishing biomedical traits than do more common variants, and rare variants with large effect sizes are particularly relevant clinically [7]. However, the detection of associations for individual low-frequency and rare variants lacks statistical power with readily attainable sample sizes. Hence, exceptional sample sizes are often required for detecting these variants [8]. In this study, our exploration of the contribution of low-frequency and rare NUS1 variants to PD involved three steps. First, we sequenced the NUS1 coding regions in 1,542 PD patients and 1,625 controls (Cohort A) by WES to explore which aggregate variants are associated with PD.
Second, to further increase power, we combined our previous study cohort [5], which included 3,237 PD patients and 2,817 controls (Cohort B), with Cohort A to analyze the association of individual low-frequency variants with PD. Third, to explore the influence of the PD-associated genetic variants on clinical symptoms, linear and logistic regression analyses were used to reveal genotype-phenotype correlations. Our study systematically investigated the association of NUS1 low-frequency and rare coding variations with PD susceptibility/phenotypes in the Han Chinese population.

Patients and controls
This study involved two cohorts: (A) Cohort A [5], including 1,542 Han Chinese PD patients from mainland China (mean onset age, 46.03 ± 8.25 y; males, 54.28%) and 1,625 age- (mean age, 44.39 ± 8.14 y), gender- (males, 53.29%), and race-matched healthy controls selected from the Health Examination Center of Xiangya Hospital, the recruited patients' spouses, and the communities of Changsha; and (B) Cohort B, our previous cohort screened with Sanger sequencing, including 3,237 patients with PD and 2,817 healthy controls [5]. Cohort A included patients with a family history of PD (at least 1 relative with PD or parkinsonism), patients born to consanguineous parents, or patients with an age at onset (AAO) of no more than 50 y, together with controls without a history of neurological disease. Cohort B mainly included late-onset sporadic PD patients (90.64%) and healthy controls. All patients were diagnosed with PD by a movement disorder neurologist according to the clinical diagnostic criteria of either the UK PD Society Brain Bank [9] or the Movement Disorders Society [10]. Pathogenic variants of high-confidence PD disease-causing genes were ruled out in the patients of Cohort A [11,12]. Patients with PD were defined as having early-onset PD (EOPD) if AAO was ≤50 y [13] and late-onset PD (LOPD) if AAO was >50 y. Clinical information collected on PD patients included disease duration and motor and non-motor manifestations [14].
Blood samples were obtained from all participants. Genomic DNA was prepared from peripheral blood leukocytes according to standard procedures. Our protocol was approved by the Ethics Committee of Xiangya Hospital of Central South University, and written informed consent was collected from all participants according to the Declaration of Helsinki.

Genotyping method, quality control, and analysis of population structure
Cohort A was sequenced using WES technology. The average sequencing depth was 123×, and a minimum of 10× coverage was achieved for 99.32% of the targeted regions. Processing and analysis of the sequencing data were carried out as described previously [15,16] to obtain high-quality variants. ANNOVAR [17] was used to annotate the variants based on the human genome hg19 RefSeq, including gene regions, amino acid alterations, functional effects, and allele frequencies for East Asian and all populations taken from the gnomAD database.
Prior to the association study, we used PLINK v1.90 [18] to screen the WES data for individual and variant quality control. In this step, variants were filtered to exclude those with a missing rate >5% or with deviations from Hardy-Weinberg equilibrium in cases and controls (P < 1.0E-4). In addition, samples with discordant gender, an unusual heterozygosity of >3 standard deviations, or unusual relatedness (identity by descent > 0.15) were also excluded. To assess potential population-structure factors, a principal-components analysis (PCA) was then performed with PLINK v1.90 after linkage disequilibrium (LD) pruning (PLINK option: --indep-pairwise 50 5 0.2). The top two principal components calculated from independent single-nucleotide polymorphisms were included as covariates in subsequent analyses as fixed effects to control for population structure.
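The variant-level filters described above (missing rate >5%, Hardy-Weinberg P < 1.0E-4) can be illustrated with a minimal Python sketch. It is not the PLINK implementation: the function names and the count-based interface are hypothetical, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square approximation, which is unreliable for very rare variants, where an exact test is preferable.

```python
import math


def hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) P-value for deviation from Hardy-Weinberg equilibrium.

    Illustrative approximation only; not the test PLINK itself applies.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_variant_qc(n_hom_ref, n_het, n_hom_alt, n_missing,
                      max_missing=0.05, hwe_alpha=1.0e-4):
    """Return True if a variant passes the missingness and HWE thresholds."""
    called = n_hom_ref + n_het + n_hom_alt
    total = called + n_missing
    if total == 0:
        return False
    if n_missing / total > max_missing:
        return False
    return hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt) >= hwe_alpha
```

In this sketch the check would be run separately on the genotype counts of cases and of controls, mirroring the statement that equilibrium was assessed in both groups.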

Statistical analysis
For variant burden analysis, the sequence kernel association test (SKAT) [21] was performed using the SKAT R package.
Optimized SKAT (SKAT-O) was applied to Cohort A to analyze the joint effect of rare variants, low-frequency variants, and variant sets stratified by functional level after adjusting for AAO in cases (age at entry in controls), sex, and the first two principal components. To further increase the detection power of single low-frequency variant association analyses, we pooled Cohort A and Cohort B. Power to identify associations between low-frequency variation and PD was estimated using QUANTO 1.2 (http://biostats.usc.edu/software) under a log-additive genetic model. Power to detect variants contributing 1% to the phenotypic variation was >90%, depending on the MAF of the variants (range, 0.1%-5%). Further stratification by AAO was performed to assess the association of low-frequency variants with EOPD and LOPD. Logistic regression analyses were performed using PLINK v1.90, with adjustment for sex and AAO in cases (age at entry in controls). For the association analyses between the identified variants and clinical data, linear and logistic regression analyses were performed using PLINK v1.90, with adjustment for sex, AAO, and disease duration.
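To make the idea of aggregating rare variants concrete, the following Python sketch applies a much simpler technique than SKAT-O: it collapses all rare variants into a single carrier/non-carrier indicator and runs a one-sided Fisher exact test. It is an unadjusted illustration, not the analysis performed in this study. The carrier counts (12 patients, 1 control) and Cohort A sizes are taken from the text, under the assumptions that each carrier is counted once and that all carriers belong to Cohort A; the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total,
                         control_carriers, control_total):
    """One-sided Fisher exact P-value for carrier enrichment in cases.

    Sums hypergeometric probabilities of observing at least
    `case_carriers` carriers among the cases, given fixed margins.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    max_k = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, max_k + 1)
    )
    # Python integers are arbitrary precision, so the ratio stays exact
    # until the final conversion to float.
    return numerator / denominator


# Counts assumed from the text: 12 of 1,542 cases and 1 of 1,625 controls.
p_collapsed = fisher_exact_greater(12, 1542, 1, 1625)
print(f"one-sided P = {p_collapsed:.2e}")
```

Unlike SKAT-O, this collapsing test cannot adjust for age, sex, or principal components, and it loses power when risk and protective variants are mixed within the same gene.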
Continuous variables are presented as mean ± standard deviation and categorical variables are presented as frequencies (percentages). The corresponding odds ratios (OR) or beta coefficients (β), 95% confidence intervals (CI), and P-values in association analyses are also provided. For multiple comparisons, traditional Bonferroni correction is considered overly conservative, which may increase false negative errors and lead to the rejection of significant associations [22]. We therefore used the Benjamini-Hochberg method to calculate P values adjusted for the false discovery rate, which reduces the probability of false negative errors while still controlling for false positive errors. Adjusted P values <0.05 were considered statistically significant.
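The Benjamini-Hochberg adjustment can be sketched in a few lines of Python. The function below is a generic implementation, not the software used in this study. The example input is the set of three raw P values reported for the low-frequency variants in the Results (0.0054, 0.027, and 0.48), under the assumption that these three tests formed one correction family.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw_p = [0.0054, 0.027, 0.48]  # rs539668656, rs550854234, rs28362519
adjusted_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adjusted_p])
```

Under that assumption, the adjusted values for the first two variants come out at about 0.0162 and 0.0405, consistent with the reported adjusted P values of 0.016 and 0.041.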

Burden analyses
In Cohort A, a significant association was detected for the NUS1 gene when comparing patients and controls (P = 0.023, adjusted P = 0.046). Further stratification by the variants' predicted functional properties showed that the association was mainly in the nonsynonymous variants (P = 0.022, adjusted P = 0.046). Stratification by allele frequency showed that the significant association mainly involved the rare variants (P = 0.0026, adjusted P = 0.016). However, when synonymous variants (P = 0.071), deleterious variants (P = 0.20), or low-frequency variants (P = 0.21) were considered for analysis, no significant associations were observed (Table 2). Our burden results further support the view that NUS1 is a risk gene for PD.

Association between low-frequency variants and PD
Additionally, we investigated the association between 3 low-frequency variants (rs550854234, rs539668656, and rs28362519) and PD. To further increase the power to detect associations, we collected the results for rs550854234, rs539668656, and rs28362519 from Cohort B and combined the two cohorts for joint analysis. Among the three low-frequency variants, we identified two independent variants (rs550854234 and rs539668656, R² < 0.2) that were associated with PD (adjusted P < 0.05) (Table 3). The frequency of rs539668656 was higher in PD patients than in healthy controls, even after correction for multiple comparisons (OR = 2.80, 95% CI = 1.36-5.80, P = 0.0054, adjusted P = 0.016). The rs550854234 variant was associated with a decreased PD risk after correction for multiple comparisons (OR = 0.76, 95% CI = 0.60-0.97, P = 0.027, adjusted P = 0.041). However, the rs28362519 variant was not significantly associated with PD (P = 0.48).
The patients with PD from the combined cohort were divided into EOPD and LOPD cohorts (i.e., stratified by AAO; Table 3).
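The relationship between an odds ratio, its 95% confidence interval, and the P value can be checked with a short Python sketch. It assumes that the reported intervals are symmetric Wald intervals on the log-odds scale, which is the usual output of logistic regression but is not stated explicitly in the text. The function name is hypothetical, and because the inputs are rounded, the recovered values are only approximate.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio, ci_low, ci_high):
    """Recover the standard error, z statistic, and two-sided P value.

    Assumes a Wald interval: log(OR) +/- Z_95 * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


for name, (or_, lo, hi) in {
    "rs539668656": (2.80, 1.36, 5.80),
    "rs550854234": (0.76, 0.60, 0.97),
}.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```

For rs539668656 the recovered P value is close to the reported 0.0054. For rs550854234 it is slightly lower than the reported 0.027, a gap that is plausible given the rounding of the OR and interval bounds to two decimals.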

Variant-clinical data association
We also analyzed the clinical features of NUS1-variant carriers and non-carriers (Table 4). We found that in Cohort A, patients with rare variants had PD onset 6.16 years earlier than did non-carriers (β = -6.16, P = 0.0087) and had a 6.88-fold higher risk of depression (HAMD [23], OR = 6.88, P = 0.019) than those without rare variants. In the combined cohort, patients with rs550854234 had a lower risk of depression (HAMD [23], OR = 0.51, P = 0.035) than non-carriers, and rs550854234 was positively associated with tremor score in patients (UPDRS Items 20 and 21 [24], β = 0.86, P = 0.027). Patients with rs539668656 had more severe mental/behavioral symptoms and greater emotional impairment (UPDRS-Part I [24], β = 1.38, P = 0.012) than did non-carriers.
Additionally, we validated the effects of the NUS1 variants on PD-related phenotypes using a more conservative strategy. A permutation test with 10,000 random permutations, which requires no distributional assumptions, also showed significant effects of the NUS1 rare variants on AAO (EMP2 = 0.019), of the NUS1 rare variants and rs550854234 on the incidence of depression (EMP2 = 0.0041; EMP2 = 0.041; respectively), of rs550854234 on tremor level (EMP2 = 0.022), and of rs539668656 on mental/behavioral symptoms and emotional impairment (EMP2 = 0.015).

Discussion
NUS1 encodes the Nogo-B receptor, which localizes to the membrane of the endoplasmic reticulum and belongs to the undecaprenyl diphosphate synthase family. A neurodevelopmental phenotype that includes epilepsy and tremor has been reported in association with a 6q22 deletion that includes NUS1 [25]. To date, six variants have been identified as associated with developmental delays and epileptic encephalopathy, a congenital disorder of glycosylation, and early-onset PD [26].
In our previous research, we identified NUS1 as a candidate risk gene for PD through genetic and functional studies. Further studies were needed to identify variants within the protein-coding regions of NUS1 and to explore whether these variations modify the PD phenotype. However, few studies have focused on the association between NUS1 variants and PD. In the present research, we used WES to sequence the exons and exon-intron junctions of NUS1 in 1,542 PD patients and 1,625 healthy controls from the Han Chinese population and identified a total of 13 variants. Among the 13 variants, ten rare variants (H19H, S24S, F30F, K58N, A80A, R86L, T106T, N144K, L159I, and D163H) were identified in PD. Three of these rare variants (R86L, N144K, and D163H) could be deleterious according to in silico predictions. The three variants were extremely rare or absent in the 1,625 neurologically normal Han Chinese controls and in East Asian and all populations described by the gnomAD database. PhastCons analysis showed that these variants were located in a highly conserved region of the NUS1 gene, suggesting that the three variations were pathogenic according to the criteria of the ACMG 24 (Table 1). Our results showed that rare NUS1 mutations made a significant contribution to PD risk, especially in a large sample of familial PD (FPD) and EOPD patients, in whom disease is more likely to be driven by genetic factors. Enrichment of nonsynonymous variants in PD was also observed, which suggests that genetic variation at the NUS1 locus is a significant risk factor for PD in our large sample of Han Chinese FPD and EOPD patients.
previously in a genetic analysis of NUS1 in a Chinese PD population [6], and none were found to have a significant association with sporadic PD. In the present study, an association was found between NUS1 rs550854234 and rs539668656 and PD; rs539668656 was associated with an increased risk of PD, while rs550854234 was associated with a decreased risk of PD, even after correction for multiple comparisons. The main reason for the discrepancy between our results and those of Xiang et al. may lie in differences in sample size and in the genetic heterogeneity of the populations studied. Furthermore, we found that rs550854234 and rs539668656 were also associated with LOPD risk, suggesting that these two variants play an important role in the pathogenesis of LOPD. Notably, we found that rs550854234 and rs539668656 had opposite effects on PD risk. Interestingly, NUS1 is considered intolerant of loss-of-function variants, based on a pLI score of 0.87 (pLI, probability of being loss-of-function intolerant) in the Exome Aggregation Consortium Browser database [27] (http://exac.broadinstitute.org), suggesting that NUS1 variants may lead to disease through haploinsufficiency. Sequence variants within classical splice sites or splicing enhancer sequences that lead to splicing defects are reportedly an important mechanism of pathogenicity, especially for genes in which loss of function is the common pathogenic mechanism [28]. A potential effect of rs550854234 and rs539668656 on splicing was examined by searching the Human Splicing Finder (HSF) [29] (http://www.umd.be/HSF/), which showed that rs539668656 can affect splicing through the alteration of an exonic splicing enhancer site (Table S2), which may affect the regulation of NUS1 expression.
Although rs550854234 seems to have no effect on the amino acid sequences of proteins or on splicing signals, it is commonly accepted that synonymous mutations can cause either enhancement or suppression of local translation rates, conformation, or substrate specificity, depending on the location of the mutation, thereby affecting the function of the protein [30]. These data suggest that rs550854234 and rs539668656 may contribute to PD pathogenesis through different mechanisms.
In this study, genotype-phenotype analysis showed that rare variants were significantly associated with earlier AAO in FPD and EOPD patients. Our results revealed that patients with rare NUS1 variants had a greater risk of depression, while rs550854234 carriers showed a lower risk of depression, and rs539668656 was associated with more severe mental/behavioral symptoms and emotional impairment. Previous studies have shown associations between depression [31], affective disorder, and PD risk [32]. The study of Fang et al. [33] suggested that depression might be an early symptom of PD and might share a common etiological basis with PD. We speculate that the NUS1 rare and low-frequency variants affecting the onset of depression or emotional impairment may contribute to PD risk through an unknown mechanism. Further research is needed to clarify the pathogenic role of NUS1 variants in PD susceptibility/phenotypes.
This study has several limitations. First, although we used a stringent filtering strategy to identify deleterious variants, those identified in our study could not be analyzed for co-segregation to confirm their pathogenicity, because the relatives of these patients declined genetic testing or could not be contacted. Moreover, N144K was identified in PD patients and was absent in controls in the study of Xu et al. [6]. Second, we lack an independent replication cohort to verify the associations between the two low-frequency variants (rs550854234 and rs539668656) and PD. We attempted to analyze the largest available genome-wide association study dataset in PD [2] to examine the relationship, but these two variants were not found in its patients and controls. Third, no functional analysis was performed to verify an effect of rs550854234 and rs539668656 on alternative splicing and expression of NUS1, owing to a lack of RNA samples. Therefore, more replication and functional studies are needed to provide further evidence for the pathogenicity of these rare variants and the association of the two low-frequency variants with PD, and to further clarify the biological functions of rs550854234 and rs539668656 in PD.

Conclusions
In conclusion, our identification of rare and low-frequency NUS1 variants associated with PD suggests that additional studies should focus on this important gene in the Chinese and other populations. Our findings may facilitate a better understanding of the pathophysiological mechanisms of NUS1 variants in PD and the development of personalized treatments.

Additional File
Additional file 1: Table S1. Functional prediction of 7 nonsynonymous variants in NUS1
Additional file 2: Table S2