Reference SVA insertion polymorphisms are associated with dopaminergic degeneration in Parkinson’s Disease and differential gene expression in the PPMI cohort

Background: The development of Parkinson’s disease (PD) involves a complex interaction of genetic and environmental factors. The majority of studies investigating the genetic component of complex diseases, including PD, have focused on single nucleotide polymorphisms as this enables genome wide analysis of a large number of samples. Genome wide association studies have been crucial in identifying PD risk variants, however a large proportion of the heritability of PD remains to be identied. To investigate the component of PD that may involve complex genetic variants we characterised SINE-VNTR-Alus (SVAs), a retrotransposon known to affect gene expression, in the Parkinson’s Progression Markers Initiative (PPMI) cohort. Results: Utilising whole genome sequencing from the PPMI cohort that consisted of 179 healthy controls, 371 individuals with PD and 58 individuals classied as SWEDD (scans without evidence of dopaminergic decit) we genotyped SVAs in the reference genome for their presence or absence identifying 81 such SVAs. Seven of these SVAs were associated with progression of the disease, including four whose specic genotypes were linked to an increase in the gradient of dopaminergic loss when comparing the caudate to putamen from DaTscan imaging analysis. These seven SVAs also demonstrated regulatory properties as they were associated with differential gene expression in whole blood RNA sequencing data. Conclusion: This study highlights the importance of addressing variation of SVAs and potentially other types of retrotransposons in PD genetics, furthermore these SVA elements should be considered as regulatory domains that could play a role in disease progression. Delly2 genotype structural variants in individuals (179 371 58 SWEDD) PPMI cohort. structural variants within the coordinates of the human specic SVAs, available on the UCSC genome (https://genome.ucsc.edu/) (19), +/-100 bp were extracted. Structural variants classied as deletions in these regions were inspected manually to determine if the break points were consistent with the reported coordinates of the reference human specic SVAs. Association analysis of the 81 SVA RIPs identied with PD was performed using logistic regression in PLINK (v1.07) (20), using data from the control and PD samples only, and p values adjusted for multiple testing (Bonferroni correction). VCF les for the PPMI samples were downloaded from the PPMI website (www.ppmi-info.org/data). SNPs with a minor allele frequency greater than 0.001 located in 250 kb either side of the 81 SVA RIPs were extracted and PLINK (v1.07) was used to identify those SNPs in strong LD with the RIPs (r 2 > 0.8). For each SVA RIP the SNP in strongest LD was selected and searched for in the summary statistics from Nalls et al (6) to determine if this proxy SNP for the SVA RIP genotype was associated with PD risk. was then performed on the SVAs that were signicant after FDR correction for an effect size comparison with Tukey adjusted p value reported. Longitudinal data was obtained from PPMI (www.ppmi-info.org/data) and altogether, 111 different clinical variables were analysed in the PD subjects from four visits at baseline (0), 12, 24 and 36 months for the majority of the data points (data relating to the DaTscan SPECT imaging was not available at 36 months).


Background
Parkinson's disease (PD) is the second most common neurodegenerative disease affecting 1% of the population over 60 years of age (1). PD pathology is characterised by the loss of dopaminergic neurons from the substantia nigra and the formation of neuronal inclusions called Lewy bodies. The primary symptoms are related to motor dysfunction (bradykinesia, resting tremor, rigidity and postural instability), however individuals with PD can also develop a range of non-motor symptoms that include sleep disturbances, constipation and impaired olfaction (2). The precise mechanisms underlying the development of PD are unknown. Genetic studies have played a key role in highlighting pathways involved in disease pathogenesis that include autophagy, mitophagy and the innate immune system (3,4). There are highly penetrant mutations in several genes that cause rare monogenic forms of PD in either an autosomal dominant (e.g. SNCA and LRRK2) or an autosomal recessive (e.g. PRKN and PARK7) manner (5). However, monogenic forms of the disease contribute a small percentage to the total incidence of the disease, with the majority of cases classed as sporadic and likely caused by a complex interaction of genetic and environmental risk factors. Large-scale genome wide association studies (GWAS) have been crucial in identifying genetic risk factors of idiopathic disease, however a large proportion of the disease heritability has yet to be identi ed (6,7). In addition, many of the single nucleotide polymorphisms (SNPs) that are associated with disease development are in non-coding regions of the genome and their functional impact is often unknown. We aim to investigate the genetic component of PD further by studying other types of variation and their functional consequences and this study focuses on variation generated by a speci c family of retrotransposons. SINE-VNTR-Alus (SVAs) are hominid speci c retrotransposons and comprise approximately 2700 elements in the reference human genome (8). SVAs are composite elements (domains outlined in Fig. 1a) consisting of seven subfamilies (A-F and F1) and are the evolutionarily youngest member of the non-long terminal repeat class of retrotransposons (9,10). Members of the SVA family are still able to retrotranspose in the human genome (11), and this ongoing mobilisation has led to insertions that are polymorphic in the human population for their presence or absence. Twelve such polymorphic SVA insertions have been identi ed as the cause of diseases including X-linked dystonia parkinsonism (XDP), Neuro bromatosis type 1 and haemophilia B through mechanisms such as loss of function mutation, modulation of splicing and deletions at the site of insertion (12). XDP is associated with a founder haplotype that contains a SVA in intron 32 of the TAF1 gene, reduced expression of TAF1 and alternative splicing and intron retention directed by the SVA insertion (13). Polymorphic retrotransposons, including members of the SVA family, have been shown to regulate gene expression in a population speci c manner and could have an impact on phenotypic differences (14). In the literature retrotransposon insertion polymorphisms (RIPs) have also been identi ed to be in linkage disequilibrium (LD) with SNPs associated with a range of diseases through GWAS, further analysis of selected candidates identi ed six RIPs that could lead to disease through changes in gene regulation (15).
Our study focused on the potential role of reference SVA RIPs established in the population in the risk and progression of PD. To address this we utilised whole genome sequence, transcriptomic and clinical data from the Parkinson's Progression Markers Initiative (PPMI) which was established to identify PD progression markers to help understand disease aetiology and ultimately aid in the development of novel therapeutics (16). The PPMI is a longitudinal study collecting clinical and imaging data and biospecimens from PD subjects, healthy controls and those subjects classi ed as SWEDD (scans without evidence of dopaminergic de cit) (17). Using the extensive dataset from the PPMI cohort, we identi ed 81 SVAs that were polymorphic for their presence or absence in the reference genome. Seven of the identi ed SVA RIPs were associated with PD progression and subsequent analysis of these seven RIPs correlated speci c genotypes with changes in gene expression in the blood.

Methods
Identifying SVAs polymorphic for their presence/absence in the reference genome from whole genome sequencing data from the PPMI cohort Whole genome sequencing data in bam le format aligned to hg38 were obtained from the PPMI. The structural variant caller Delly2 (https://github.com/dellytools/delly) (18), with default settings, was used to genotype structural variants in 608 individuals (179 healthy controls, 371 PD and 58 SWEDD) from the PPMI cohort. Those structural variants located within the coordinates of the human speci c SVAs, available on the UCSC genome browser (https://genome.ucsc.edu/) (19), +/-100 bp were extracted. Structural variants classi ed as deletions in these regions were inspected manually to determine if the break points were consistent with the reported coordinates of the reference human speci c SVAs. Association analysis of the 81 SVA RIPs identi ed with PD was performed using logistic regression in PLINK (v1.07) (20), using data from the control and PD samples only, and p values adjusted for multiple testing (Bonferroni correction). than 0.001 located in 250 kb either side of the 81 SVA RIPs were extracted and PLINK (v1.07) was used to identify those SNPs in strong LD with the RIPs (r 2 > 0.8). For each SVA RIP the SNP in strongest LD was selected and searched for in the summary statistics from Nalls et al (6) to determine if this proxy SNP for the SVA RIP genotype was associated with PD risk.
Association analysis of SVA RIPs with clinical features and progression markers of PD in the PPMI cohort For longitudinal analysis, a linear mixed-effects model combined with FDR correction measured the changes in phenotype scores during info.org/data) and altogether, 111 different clinical variables were analysed in the PD subjects from four visits at baseline (0), 12, 24 and 36 months for the majority of the data points (data relating to the DaTscan SPECT imaging was not available at 36 months).
Analysis of SVA RIP genotype and gene expression in blood RNA-sequencing data from the PPMI cohort DESeq2 package in R was used to detect statistically signi cant differences in the gene expression pro les between the different genotypes of the seven SVAs associated with clinical features of PD at baseline (0 months) and 36 months.

Characterisation of reference SVA RIPs in the PPMI cohort
The PPMI cohort whose whole genome sequence data were analysed in this study consisted of 179 healthy controls, 371 PD and 58 subjects classi ed as SWEDD. The subjects from each group showed similar demographics in terms of age and sex (Table 1) and all subjects included reported race as white. In the 608 subjects analysed, 83 SVAs in the human reference genome were identi ed as absent in at least one genome compared to the reference (hg38) (Additional Table 1). Two of these were detected as absent in all genomes analysed and therefore were not classi ed as polymorphic in this cohort. Of the 81 SVAs RIPs, 28 (34.6%) had not been reported previously in the dbRIP track available on UCSC genome browser. The majority, 85.2%, of the SVA RIPs belonged to subtypes E, F and F1 (Fig. 1b), which was consistent with these human speci c subfamilies being the most recently active (9). The SVA RIPs showed a wide range of insertion allele frequencies from 0.0008-0.994 ( Fig. 1c). Using logistic regression and Bonferroni correction for multiple testing there were no SVA RIPs signi cantly associated with PD when comparing control and PD subjects in this cohort (Additional Table 1). Tagging SNPs (r 2 > 0.8) were identi ed for 76 (93.8%) of the SVA RIPs and were used to act as a proxy for the SVA genotype, enabling the evaluation of these elements in the largest PD and healthy control dataset available from the recent GWA meta-analysis performed by Nalls et al (6). In the summary statistics from this meta-analysis, 75 of the SVA RIP tagging SNPs were present (Additional Table 2). One of these SNPs (rs55653937) was reported as associated with PD at genome wide signi cance (p = 1.64 × 10 − 19 ) and was in strong LD (r 2 = 0.994) with SVA_67 located 12 kb upstream of the KANSL1 gene. Reference SVA RIPs are associated with progression markers of PD in the PPMI cohort The PPMI cohort has extensive clinical and phenotypic data available at four time points, baseline (0 months) and subsequent measurements at 12, 24 and 36 months. This data includes motor, cognitive and olfactory testing, dopamine transporter (DAT) imaging assessments and analysis of markers in biospecimens. Analysis of the 81 SVA RIPs identi ed seven associated with features of PD (Table 2) at 12 months or later. The nearest gene to each of these SVAs are reported in Table 3 and for chromosomal loci see Additional Table 1. Four of the SVA RIPs (36, 47, 67 and 79) in Table 3 were associated with changes in the count density ratios (CDR) of the caudate/putamen, which had been calculated from DaTscan single photon emission computed tomography (SPECT) imaging. Figure 2a (7, 11 and 24) in Table 3 were associated with the following measurements re ecting the progression of PD: Hoehn and Yahr staging and the Movement Disorder Society -Uni ed Parkinson's Disease Rating Scale (MDS-UPDRS). For example, the presence of SVA_11 was signi cantly associated with a lower mean MDS-UPDRS part II score at 36 months ( Fig. 2b) when comparing PP (7.6, 95% CI 6.3-8.9) and PA (8.9, 95% CI 8.2-9.6) genotypes to AA (10.9, 95% CI 9.6-12.2, p = 0.001 and p = 0.02). SVA_24 was signi cantly associated with changes in the categorical Hoehn and Yahr stage in PD subjects (Fig. 2c).  The genotype of speci c SVA RIPs is associated with differential gene expression Transcriptomic data from whole blood was utilised to determine if the seven SVA RIPs associated with PD progression (Table 3) Table 3). The number of genes whose differential expression was signi cantly associated with speci c SVA genotypes (FDR < 0.1) at 0 and 36 months is summarised in Table 3 and the gene with the lowest adjusted p value is shown in brackets. Genotypes of SVA_7 and SVA_36 were associated with changes in gene expression of hundreds or thousands of genes at many different loci. Whereas the effects of other SVA RIPs (24 and 67) were restricted to a smaller number of genes (Table 3), many of which were located at the SVA locus, and several of those genes were affected at both time points analysed. However, one SVA RIP demonstrated a narrower effect on gene expression with only the expression of one gene (RAB3B) associated with SVA_79 genotype. The effect of the SVA RIPs on gene expression varied between the different elements and below we highlight speci c examples of genes or clusters whose expression was associated with multiple SVA genotypes and/or at both time points at which statistical analysis was performed.
The expression of FLACC1 was signi cantly different for individuals with PP genotype compared to AA for SVA_11 (located in an intron of CASP8) at 0 (FDR p = 0·04) and at 36 months (FDR p = 2·22 × 10 − 5 ) (Fig. 3a). FLACC1 is adjacent to CASP8 in the genome and SVA_11 is 3·5 kb downstream of its 3'UTR. The expression of HLA-A was repeatedly the most signi cant gene to be associated with SVA_24 (located 8 kb upstream of HLA-A) genotypes at 0 and 36 months (Table 3). At both 0 and 36 months the subjects with PP genotypes had signi cantly higher HLA-A expression than those with AA genotypes (FDR p = 6·22 × 10 − 51 and p = 2·49 × 10 − 38 ) (Fig. 3b). Those with PA genotypes also showed signi cantly higher HLA-A expression than AA genotypes at 0 and 36 months (FDR p = 1·90 × 10 − 78 and p = 7·51 × 10 − 60 ), however there was no signi cant difference when comparing PP and PA genotypes (Fig. 3b).
SVA_67 at the chromosomal locus 17q21·31 was associated with differential expression of multiple genes, including six in a 1·15 Mb region centred around the SVA RIP (Fig. 4a). At baseline when comparing PP and AA genotypes three (PLEKHM1, ARL17A and CRHR1) out of the four signi cantly associated genes were located in this region as were ve (PLEKHM1, ARL17A, CRHR1, MAPT and LRRC37A) out of seven genes whose expression was signi cantly different when comparing PP to PA genotypes (Fig. 4b-f). Extending the analysis to expression data at 36 months there were 22 genes whose expression was signi cantly different when comparing PP and AA genotypes of SVA_67, 4 of which were located in the 1·15 Mb region (PLEKHM1, ARL17A, CRHR1 and MAPT) (Fig. 4b-e). These same four genes as well as two others (LRCC37A and KANSL1) in this genomic region were also differentially expressed when comparing PP and PA genotypes at 36 months ( Fig. 4b-g). Four of the genes in this region whose expression was associated with SVA_67 (PLEKHM1, ARL17A, CRHR1 and MAPT) all showed higher expression in individuals with PP genotypes and the individuals with the lowest expression had an AA genotype. This was in contrast with the levels of expression of LRCC37A and KANSL1 where the opposite pattern was observed.

Discussion
This study incorporated genetic, clinical and gene expression data from the PPMI cohort to evaluate the role of SVAs in PD, identifying elements associated with disease progression that can modify the transcriptome. In this analysis, we characterised novel variation in the human genome identifying 81 reference SVA RIPs. In the PPMI cohort we did not identify any SVAs that were associated with disease risk. However, in the literature RIPs have been reported as candidate causative variants at known disease associated loci by identifying those in strong LD with trait associated SNPs (15,21). Utilising this approach we identi ed one SVA RIP in our cohort in strong LD with a SNP associated with PD risk at genome wide signi cance. This SVA (SVA_67) is located in a 1.8 Mb region of high LD on chromosome 17 where there are two predominant haplotypes (H1 and H2) and the presence of SVA_67 is part of H1 with the SVA is absent in H2. H1 is associated with increased risk of developing PD in multiple studies (22)(23)(24). This region encompasses several genes and includes MAPT in which mutations can cause the neurodegenerative diseases frontotemporal dementia with parkinsonism and progressive supranuclear palsy (25,26).
However, it remains unclear which variant or gene within the H1 haplotype is responsible for the increased risk to PD and our study has highlighted additional types of variation to be considered at this locus.
The role of SVAs in the progression of PD has not been thoroughly addressed in the literature and here we identi ed seven SVAs associated with the progression of PD (Table 2). Clinical scales such as the Hoehn and Yahr and the MDS-UPDRS are used to measure disease severity and three of the SVA RIPs were associated with signi cant differences in these scores between the different genotypes of the insertions. The remaining four SVA RIPs were associated with changes observed in DaTscan SPECT analysis, speci cally the ratio of caudate to putamen CDR ( Table 2). DaTscan SPECT utilises a radioligand that binds to the DAT that is present on the presynaptic membrane of nigrostriatal neurons.
The loss of the DAT from the striatum is characteristic of PD, correlated with the loss of dopaminergic neurons from the substantia nigra and used to distinguish PD from other neurodegenerative parkinsonisms such as essential tremor (27). The reduction of striatal DAT correlates with symptom severity and changes over the course of the disease, therefore this can be used to monitor dopaminergic degeneration (28, 29). The caudate/putamen ratio is a measure of the anterior to posterior gradient of dopaminergic loss and increases in this gradient were associated with speci c SVA RIP genotypes. The SVAs associated with these changes may not be directly causing these alterations to the gradient of dopaminergic loss but may be involved in modi cations to the disease course that are measured by these features of DaTscan SPECT.
We have previously shown that SVAs can affect gene expression in in vitro and in vivo models using reporter gene assays (8,30). More recently, studies have integrated genetic data regarding SVA RIPs and RNA sequencing to identify those elements that regulate transcription (14,15). The SVA RIPs associated with progression of PD had a signi cant impact on the transcriptome and this could be one of the mechanisms by which the SVAs are exerting an effect of PD progression. Genes at both the SVA RIP locus and distant regions of the genome, including different chromosomes, were associated with speci c SVA RIP genotypes. For example SVA_24 was most signi cantly associated with robust changes of expression of its nearest gene (HLA-A) and SVA_67 was associated with differential expression of a cluster of neighbouring genes. In contrast, SVA_36 was associated with changes in expression of thousands of genes at many different loci including those on different chromosomes to the SVA RIP. These differences observed could be due to the SVA insertion itself, the genomic architecture at each particular locus or a combination of both. SVAs could in uence gene expression through multiple mechanism such as the recruitment of transcription factors, the interaction of SVAs with distant promoters through 3D chromatin structure and intronic insertions affecting mRNA splicing. Regulatory domains can function in a tissue speci c manner and therefore we utilised the GTEx database (https://gtexportal.org/home/) to evaluate the potential for the SVAs to act in other tissues using their proxy SNPs. Tagging SNPs for ve out the seven SVAs were located in the GTEx database and three of these were identi ed as eQTLs in multiple tissues (Table 3), including different regions of the brain such as the caudate, putamen, substantia nigra and cerebellum. This suggests the SVAs (7, 36 and 67) could also be in uencing gene expression in other tissues in addition to whole blood demonstrated in this study.

Conclusions
Using a systematic approach to analyse SVAs in PD integrating multiple different data types we identi ed novel genetic variation in the PPMI cohort and speci c SVA RIPs that were associated with dopaminergic degeneration. We were able to correlate these elements with a regulatory function and could therefore be in uencing the disease course through modulation of gene expression. Structure, subtype classi cation and frequency of polymorphic reference SVAs in the PPMI cohort. A-The structure of a canonical SVA consists of a (CCCTCT)n hexamer repeat, an Alu-like region consisting of two antisense Alu fragments and an intervening unique sequence, a variable number tandem repeat (VNTR), a SINE-R domain and a poly A-tail. The length of an intact SVA can vary depending on the number of repeats present in the hexamer and VNTR domains. SVAs are anked by target site duplications (TSDs). B-The proportion of the 81 reference SVA RIPs that belong to each of the seven subtypes of SVA. C-The number of reference SVA RIPs that belong in each category of frequency of allele insertion.

Figure 2
Polymorphic SVAs were associated with markers of PD progression in the PPMI cohort. A-PD individuals with the AA genotype of SVA_67 had a signi cantly higher ipsilateral caudate/putamen CDR at both 12 and 24 months compared to those with PA and PP genotypes. B-PD individuals with the AA genotype of SVA_11 had a signi cantly higher MDS-UPDRS part II score at 36 months compared to those with PA and PP genotypes. C-PD individuals with the PA genotype of SVA_24 had a signi cantly higher Hoehn and Yahr score at 36 months compared to those with AA and PP genotypes. 95% con dence intervals are represented by the blue bar.*=p<0.05, **=p<0.01 and ***=p<0.001 Figure 3 The genotype of reference SVAs were associated with differential gene expression. A-At 0 months a 1.3 fold change in expression of FLACC1 was signi cantly associated with SVA_11 genotype when comparing PP to AA (FDR p=0.04) and at 36 months a 1.4 fold change in expression when comparing PP to AA (FDR p=2.22x10-5). B-At 0 months a 3 fold change in expression of HLA-A was signi cantly associated with SVA_24 genotype when comparing PA to AA (FDR p=1.90x10-78) and a 3.7 fold change in expression when comparing PP to AA (FDR p=6.22x10-51). At 36 months a 2.5 fold change in expression of HLA-A was signi cantly associated with SVA_24 genotype when comparing PA to AA (FDR p=7.51x10-60) and a 3.6 fold change in expression when comparing PP to AA (FDR p=2.49x10-38). n=608 Figure 4 The expression of several genes at the 17q21.31 locus were associated with SVA_67. A-A schematic modi ed from UCSC genome browser of the 1.15Mb region on chromosome 17 that contains six of the genes whose expression was associated with SVA_67. B-G -The levels of expression of six genes (PLEKHM1, ARL17A, CRHR1, MAPT, LRRC37A and KANSL1) from individuals with different genotypes of SVA_67 (PP, PA and AA). Differential expression of these genes was signi cantly associated with speci c SVA_67 genotypes at 0 or 36 months or both. n=608

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. AdditionalTable1.xlsx AdditionalTable2.xlsx