Spinocerebellar ataxias (SCAs) are an autosomal dominant, progressive, genotypically heterogeneous class of neurodegenerative disorders. They present with abnormal gait and impairment of other cerebellar functions, with debilitating effects on the individual. SCA27B is a late-onset, autosomal dominant disorder caused by a tandem triplet (GAA) repeat expansion in intron 1 of the FGF14 gene on chromosome 13, leading to haploinsufficiency (2).
Genotyping FGF14-GAA locus
We studied the occurrence of SCA27B in a large Indian cohort of 1402 individuals (1256 genetically unsolved ataxia patients; 146 kindreds) with a mean age of 41 ± 18.4 years. Among the 1402 participants, 67% (n=939) were male, while the remaining 33% (n=463) were female.
Initial screening of all 1402 samples, by priming the regions flanking the GAA-repeat locus, revealed heterogeneity in the repeat lengths. The 323 samples showing no or one allelic peak in the electropherogram underwent repeat-primed (RP) PCR. The 75 samples demonstrating a saw-tooth electropherogram profile in RP-PCR underwent long-range (LR) PCR to determine the approximate length of the allele carrying the expanded repeating unit. Samples with >= 250 repeating units were processed for long-read nanopore sequencing. The data obtained from STRique followed a bimodal distribution for each sample, with two distinct clusters of reads. Hence, we adopted a Gaussian mixture model (GMM) to better resolve the short tandem repeat (STR) landscape of each allele in each sample. We obtained multiple clusters for each sample following an unsupervised approach using the R package mclust. With this approach, we were able to obtain the allelic repeat data for n=21/24 samples processed by nanopore sequencing (Figure 2). The remaining 3 samples had low read count and read quality after sequencing and STRique analysis. In comparison to long-read sequencing analysis, the repeat numbers calculated manually from LR-PCR had a standard deviation of ±10 repeats for each allele across samples.
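The two-cluster structure described above can be illustrated with a minimal sketch. It is not the study's pipeline: the study fitted mixtures in R on STRique read-level repeat counts, whereas this sketch fits a two-component, one-dimensional Gaussian mixture with a hand-written EM loop in Python. The function names, the simulated reads (a short allele near 11 repeats and an expanded allele near 350 repeats), and the initialisation scheme are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200, tol=1e-9, var_floor=1e-3):
    """Fit a 1-D, two-component Gaussian mixture with plain EM.

    Returns a list of (mean, variance, weight) tuples sorted by mean.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    # Hypothetical initialisation: one component on each half of the sorted data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    prev_loglik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each read.
        resp = []
        loglik = 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            if total <= 0.0:  # both densities underflowed; split evenly
                resp.append([0.5, 0.5])
                loglik += math.log(1e-300)
            else:
                resp.append([d / total for d in dens])
                loglik += math.log(total)
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk <= 0.0:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, var_floor)
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
    return sorted(zip(means, variances, weights))


# Simulated per-read repeat counts for one sample (NOT data from the study).
random.seed(0)
reads = ([random.gauss(11, 1.5) for _ in range(60)]
         + [random.gauss(350, 12) for _ in range(40)])

for mean, var, weight in fit_two_component_gmm(reads):
    print(f"allele ~{mean:.0f} repeats (sd {math.sqrt(var):.1f}, {weight:.0%} of reads)")
```

Each component mean serves as the repeat-length estimate for one allele, and the mixing weight indicates the share of reads supporting it. A dedicated mixture-modelling package would additionally choose the number of components, which this sketch fixes at two.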
In this paper, we report a prevalence of 1.83% (n=23/1256) for GAA repeat expansion in intron 1 of the FGF14 gene corresponding to SCA27B. These individuals had a repeat expansion in at least one allele beyond the pathogenic threshold of 250 GAA repeats, extending from 262 to 645 repeats (mean GAA repeats: 338 ± 81). Of these patients, 21.7% (n=5) had a biallelic expansion, and 8.7% (n=2) had a homozygous repeat expansion. The premutable (large normal) allele range spans 80 to 250 repeats, as per the two-standard-deviations rule (8). The estimated FGF14-GAA repeating units across the 2804 alleles of our entire cohort show 94.4% (n=2648) normal alleles (<80 repeating units), 4.6% (n=128) large normal alleles (80-250 repeating units), and 1% (n=28) expanded alleles (>250 repeating units). Repeat lengths at this locus ranged from 6 to 645 repeating units, with a mode of 10 repeating units.
An earlier study of Indian index patients (n=31) reported a frequency of 10% for the disorder. However, that figure may be misleading for the overall uncharacterized ataxia cohort, in which SCA27B accounts for 1.83% of cases.
In the genetic spectrum of SCAs in the Indian population, SCA27B may emerge as a strong candidate locus, as its frequency of 1.83% is higher than the previously determined frequencies of SCA6 (0.1%), SCA7 (0.5%), and SCA17 (0.1%). Its frequency is close to that of SCA3 (2%) and FRDA (2.2%), which are well-established ataxia disorders in India (10).
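The allele classes and percentages reported above follow from simple thresholds and ratios, sketched here in Python. The function names are hypothetical, and the handling of an allele with exactly 250 repeats is an assumption (the text gives the classes as <80, 80-250, and >250).

```python
def classify_allele(repeats):
    """Assign a GAA repeat count to one of the three classes used in the text.

    Assumption: exactly 250 repeats is treated as a large normal allele,
    following the ">250" wording for expanded alleles.
    """
    if repeats < 80:
        return "normal"
    if repeats <= 250:
        return "large_normal"
    return "expanded"


def percent(count, total):
    """Share of `count` in `total`, as a percentage."""
    return 100.0 * count / total


# Counts quoted in the text.
print(f"prevalence: {percent(23, 1256):.2f}%")          # 23 positive patients of 1256
for label, n in [("normal", 2648), ("large normal", 128), ("expanded", 28)]:
    print(f"{label}: {percent(n, 2804):.1f}% of 2804 alleles")
```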
We also studied variation at the intronic GAA repeat locus of the FGF14 gene in 86 neurologically healthy individuals serving as controls. These individuals ranged in age from 1 to 66 years (mean: 22 ± 19 years); 59.3% (n=51) were male, while the remaining 40.7% (n=35) were female. The estimated FGF14-GAA repeating units across the 172 alleles of this control cohort show 93% (n=160) normal alleles (<80 repeating units) and 7% (n=12) large normal alleles (80-250 repeating units). The repeat locus was polymorphic, ranging from 9 to 223 GAA repeating units with a mode of 10 repeating units. The entire allelic distribution across cohorts is depicted in Figure 1.
Table 1: Repeat distribution in SCA27B positive samples determined through LR-PCR and confirmed by long-read sequencing
| Sample ID | LR-PCR and gel electrophoresis: Allele 1 | LR-PCR and gel electrophoresis: Allele 2 | Long-read sequencing and STRique: Allele 1 | Long-read sequencing and STRique: Allele 2 |
|---|---|---|---|---|
| 592 | 352 | 352 | | |
| 755 | 10 | 442 | 11 | 428 |
| 861 | 10 | 278 | 10 | 290 |
| 913 | 10 | 352 | 13 | 348 |
| 993 | 11 | 345 | 11 | 338 |
| 2048 | 262 | 398 | 267 | 404 |
| 3299 | 17 | 295 | 18 | 309 |
| 3320 | 11 | 278 | 11 | 295 |
| 3400 | 10 | 445 | | |
| 3522 | 278 | 378 | 316 | 335 |
| 4494 | 78 | 262 | 75 | 246 |
| 4680 | 145 | 378 | | |
| 5016 | 18 | 278 | | |
| 6258 | 70 | 312 | 85 | 336 |
| 6477 | 112 | 302 | 123 | 321 |
| 6681 | 9 | 378 | 8 | 347 |
| 6991 | 28 | 278 | 25 | 283 |
| 7362 | 262 | 645 | | |
| 8146 | 15 | 262 | 14 | 263 |
| 8432 | 15 | 312 | 18 | 321 |
| 8483 | 15 | 278 | 12 | 270 |
| 8506 | 138 | 295 | 137 | 311 |
| 9563 | 378 | 378 | 366 | 366 |
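The agreement between the two sizing methods in Table 1 can be checked with a short sketch. It uses only the rows of Table 1 that have both LR-PCR and long-read estimates; the variable and function names are hypothetical, and the mean absolute difference is just one possible summary, not necessarily the statistic the authors computed.

```python
from statistics import mean

# (sample ID, LR-PCR allele 1, LR-PCR allele 2, long-read allele 1, long-read allele 2)
# Rows copied from Table 1; samples without long-read estimates are omitted.
PAIRED_ESTIMATES = [
    (755, 10, 442, 11, 428), (861, 10, 278, 10, 290), (913, 10, 352, 13, 348),
    (993, 11, 345, 11, 338), (2048, 262, 398, 267, 404), (3299, 17, 295, 18, 309),
    (3320, 11, 278, 11, 295), (3522, 278, 378, 316, 335), (4494, 78, 262, 75, 246),
    (6258, 70, 312, 85, 336), (6477, 112, 302, 123, 321), (6681, 9, 378, 8, 347),
    (6991, 28, 278, 25, 283), (8146, 15, 262, 14, 263), (8432, 15, 312, 18, 321),
    (8483, 15, 278, 12, 270), (8506, 138, 295, 137, 311), (9563, 378, 378, 366, 366),
]


def allele_differences(rows):
    """Long-read minus LR-PCR repeat count for every allele in every sample."""
    diffs = []
    for _sample, pcr1, pcr2, ont1, ont2 in rows:
        diffs.extend([ont1 - pcr1, ont2 - pcr2])
    return diffs


diffs = allele_differences(PAIRED_ESTIMATES)
mean_abs_diff = mean(abs(d) for d in diffs)
print(f"{len(PAIRED_ESTIMATES)} samples, {len(diffs)} alleles")
print(f"mean absolute difference: {mean_abs_diff:.1f} repeats")
print(f"largest absolute difference: {max(abs(d) for d in diffs)} repeats")
```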
Clinical and genotypic correlation
The clinical data revealed that the patients positive for the SCA27B GAA repeat expansion ranged in age from 17 to 79 years (mean = 51.2 ± 16.6 years). Males and females accounted for 82.6% (n=19) and 17.4% (n=4) of the positive patients, respectively.
We obtained detailed clinical data for n=15/23 positive patients, whose age at onset ranged from 14 to 71 years (mean = 50.2 ± 17.22 years). The brain MRI of these individuals revealed diffuse cerebral and cerebellar atrophy. A steady disease advancement was observed in these individuals, with a mean disease duration of 3.5 ± 2.4 years. The most common clinical features included abnormal gait, cerebellar dysarthria, nystagmus, oculomotor apraxia (i.e., broken ocular pursuits), and slow saccades. A few individuals reported pyramidal and extrapyramidal features. Two individuals reported autonomic dysfunctions such as urinary incontinence and dysphagia.
Among the SCA27B-positive patients, 21.7% (n=5) carry a biallelic expansion between 262 and 645 repeating units, and 8.7% (n=2) had a homozygous repeat expansion. It is well established in polyglutamine SCAs that the severity of the phenotype tends to increase, and the age of onset to decrease, with the size of the expansion (26). In SCA27B, the age of onset is weakly correlated with the size of the repeat, but patients with biallelic expansion may manifest an early onset (below 30 years of age) with or without a severe disease phenotype (12,27). We made some intriguing observations in a similar vein. The majority of individuals carrying a biallelic expansion exhibited an earlier onset of symptoms. The biallelic expansion may be linked to a change in the disorder's nature, wherein haploinsufficiency might shift toward a more complete loss of function, rendering it more pathogenic. Interestingly, a subset of patients whose short allele lay in the premutable range also experienced an early onset of symptoms. This implies a potential role of premutable normal alleles and raises the prospect of associated genetic modifiers.
Among the 15 patients with detailed clinical data, 80% (n=12) reported a negative family history, indicating a de novo expansion, while 20% (n=3) reported a relevant family history in one of the parents, indicating the autosomal dominant nature of the disorder. This stochastic de novo manifestation of the disease could be attributed to several factors. Firstly, the FGF14 repeat region is highly variable, ranging between 6 and 223 repeating units in the control cohort, in stark contrast to other intronic repeat expansion disorders such as FRDA, in which repeats typically range between 5 and 33 repeating units. This highlights the extreme instability of the FGF14 repeat region. Secondly, previous instances of apparent de novo occurrence of a trinucleotide repeat expansion disorder have been explained by "disease anticipation". SCA27B exemplifies extensive intergenerational instability with maternal anticipation, in concordance with other trinucleotide repeat expansion disorders such as SCA3 (28), DRPLA (29), and DM1 (30). Thirdly, incomplete penetrance and phenotypic heterogeneity are usual in autosomal dominant disorders like SCAs that have a very narrow range of intermediate repeats (31). SCA27B, paradoxically, has an extensive intermediate range of 80 to 250 repeats; however, there is no clear gap between the pathogenic and non-pathogenic repeat thresholds, making it vulnerable to phenotypic heterogeneity and incomplete penetrance over generations. In summary, the de novo expression of the SCA27B phenotype is likely due to the volatile nature of the FGF14 repeat region, intergenerational instability, potential maternal anticipation, and the complex inheritance patterns seen in similar genetic disorders.
Insights on the haplotype landscape of SCA27B
We studied linkage disequilibrium (LD) around the GAA repeat motif of the FGF14 gene across diverse geographical populations, utilizing data from the 1000 Genomes Consortium and the IndiGen project. A total of 41 single nucleotide polymorphisms (SNPs) within 200 kb upstream and downstream of the repeat region were considered during haplotyping and LD analysis. For each population, the mean D’ (D’mean), mean LOD (LODmean), mean r² value (r2mean), and confidence interval (CI) were calculated.
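For readers less familiar with these LD measures, the sketch below computes D, D’ and r² for a single pair of biallelic SNPs from phased haplotype counts, using the standard textbook definitions. It is an illustration only: the function name and the example counts are hypothetical, the LOD score and confidence interval are not computed, and the source does not state which software produced the reported values.

```python
def ld_metrics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts.

    n_AB is the number of haplotypes carrying allele A at SNP 1 and allele B
    at SNP 2, and so on for the other three combinations.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B  # raw disequilibrium coefficient
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical counts: one of the four haplotypes is never observed, so D' is 1
# even though r^2 is modest because the allele frequencies differ.
d, d_prime, r2 = ld_metrics(50, 0, 30, 20)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

The example shows why a block can have a high mean D’ alongside a much lower mean r²: D’ reflects the absence of historical recombination, while r² also depends on how similar the allele frequencies of the two SNPs are.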
This analysis revealed a prominent LD block encompassing the region of interest, with 9 SNPs spanning 74 kb. This LD block remains stable in South-Asian (D’mean=0.92; LODmean=45.5; r2mean=0.31; CI=0.83-0.95) and Indian (D’mean=0.86; LODmean=80.4; r2mean=0.28; CI=0.79-0.91) populations, while experiencing partial decay in other populations. The length of the LD block varies across groups: 67 kb in East-Asian (D’mean=0.85; LODmean=42.7; r2mean=0.28; CI=0.73-0.92), 60 kb in American (D’mean=0.84; LODmean=19.5; r2mean=0.21; CI=0.63-0.92), and 44 kb in African (D’mean=0.83; LODmean=17.7; r2mean=0.1; CI=0.55-0.9) populations. Notably, in the European population (D’mean=0.73; LODmean=23.7; r2mean=0.25; CI=0.56-0.83), the region of interest is almost in linkage equilibrium, lacking prominent LD. This characteristic may make it more vulnerable to recombination, repeat instability, and expansion. Many cohort studies of European populations indicate a high prevalence of the disorder, including German (8.7-18%) (2,12), French Canadian (59-61%) (12), French (17%) (32), Greek (12%) (33), and Spanish (28%) (34) cohorts. On the other hand, South-Asian and East-Asian populations have a relatively lower prevalence of the disorder, as observed in our Indian cohort (1.83%) and a Japanese cohort (1.2%) (35).
Among 1920 alleles (IndiGen = 1874, Patient = 46), a total of 28 alleles had an expansion over 250 repeats, 69 alleles lay in the intermediate range of 80-250 repeats, and the remaining 1823 were normal repeat alleles. Using PHASE v2.1.1, 38 unique haplotypes were inferred for this 74 kb stretch among these 1920 alleles. Strikingly, 82.1% (n=23) of the 28 expanded alleles shared a common haplotype, AATCCGTGG (Haplo-1), 10.7% (n=3) shared another haplotype, AGCCCGTGG (Haplo-2), and the remaining 2 alleles had haplotypes AGCCTGTAA and CGCTCGTGG, respectively. Similarly, in the 69 alleles with intermediate repeats, the most common haplotypes were the same as those of the expanded alleles, i.e., Haplo-1 and Haplo-2, accounting for 76.8% (n=53) and 14.5% (n=10) of the alleles, respectively. In contrast, the most frequent haplotype in normal alleles is AGCCTGTGA (Haplo-11), accounting for 23.4% (n=427) of the alleles, and it is absent from intermediate and expanded alleles. The most common haplotypes of expanded and intermediate alleles, Haplo-1 and Haplo-2, together accounted for only 19.3% (n=352) of the normal alleles. In conclusion, Haplo-1 and Haplo-2 together account for 92.9%, 91.3%, and 19.3% of the expanded, intermediate, and normal alleles, respectively. This analysis indicates that Haplo-1 may be a prominent risk haplotype, given its predominance in the intermediate and expanded alleles. The association of the distinct haplotypes with the allele groups was assessed using a chi-square test, yielding a statistic of 12.2682. The corresponding p-value is .002168, indicating statistical significance at the threshold of p < .05.
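The relation between the reported chi-square statistic and its p-value can be sketched as follows. The text does not state the degrees of freedom or the contingency table used, so the value of 2 degrees of freedom below is an assumption (it would correspond, for example, to a 2 x 3 table); under that assumption the upper-tail probability has the closed form exp(-x/2). The function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of
    degrees of freedom, using the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Statistic quoted in the text; df = 2 is an assumption, not stated in the source.
p_value = chi2_sf_even_df(12.2682, df=2)
print(f"p = {p_value:.6f}")
```

With 2 degrees of freedom the result is close to the reported .002168, which suggests, but does not confirm, that the test was run on a table with two degrees of freedom.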
Across the diverse population groups, the prevalence of the risk haplotype is highest in European individuals at 29.9%, followed by 21.5% in Indian populations, 19.5% in South-Asian populations, 14.5% in African populations, and 7.6% in American populations, as shown in Figure 3. When the risk haplotype is in LD with normal alleles at the corresponding repeat locus, it indicates a multi-step evolutionary process. This process involves an initial historical mutation giving rise to a large normal allele (proto-mutation). Subsequently, this proto-mutation serves as a reservoir, facilitating gradual expansions that ultimately lead to the development of pathological alleles (26). The continuous range of repeats, with no distinct gap between the pathogenic and the non-pathogenic threshold, is consistent with gradual expansion of the proto-mutant allele.
We used the available SNP data from the patients to investigate the possibility of a common haplotype underlying the expansion through generations. The core haplotype among the expanded and intermediate alleles is located at chr13:102096576-102171079 (hg38), encompassing the FGF14-repeat motif, and is 74 kb in size. Assuming a correlated genealogy, the most recent common ancestor existed 1438.6 generations (95% CI: 851.7-2102.8) ago. Considering a 20-year generation span, a common ancestor with this haplotype would have lived ~28780 (95% CI: 17040-42060) years ago. This suggests a very ancient origin of this mutation, even predating Indo-European divergence. Further, we speculate that the ancient origin and high frequency of at-risk haplotypes and mutable GAA alleles may give rise to an increase in the occurrence of SCA27B in the Indian subcontinent.
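The conversion from generations to calendar years is a single multiplication, sketched below with the values quoted in the text. The function name is hypothetical, and the 20-year generation span is the assumption stated by the authors; a different span would scale all three figures proportionally.

```python
GENERATION_YEARS = 20  # generation span assumed in the text


def generations_to_years(generations, years_per_generation=GENERATION_YEARS):
    """Convert an age expressed in generations into years."""
    return generations * years_per_generation


# Point estimate and 95% confidence bounds, in generations, as quoted in the text.
estimate, lower, upper = 1438.6, 851.7, 2102.8

print(f"estimate: {generations_to_years(estimate):.0f} years")
print(f"95% CI: {generations_to_years(lower):.0f} to {generations_to_years(upper):.0f} years")
```

The exact products (28772, 17034 and 42056 years) agree with the approximate figures in the text once rounded up to the nearest ten.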
Table 2: Repeat size-based distribution of all the unique haplotypes across 1920 alleles at the FGF14-GAA locus
| Nomenclature | Haplotype (6 SNPs-GAA-3 SNPs) | Normal, GAA<80 (n=1823) | Intermediate, GAA = 80-250 (n=69) | Expanded, GAA>250 (n=28) |
|---|---|---|---|---|
| Haplo-1 | AATCCG-TGG | 336 (18.4%) | 53 (76.8%) | 23 (82.1%) |
| Haplo-2 | AGCCCG-TGG | 16 (0.9%) | 10 (14.5%) | 3 (10.7%) |
| Haplo-3 | AGCCTG-TAA | 201 (11%) | - | 1 (3.6%) |
| Haplo-4 | CGCTCG-TGG | - | - | 1 (3.6%) |
| Haplo-5 | AACCCG-TGG | - | 2 (2.9%) | - |
| Haplo-6 | AGTCCG-TGG | 59 (3.2%) | 2 (2.9%) | - |
| Haplo-7 | AGCCTG-TGG | 60 (3.3%) | 1 (1.4%) | - |
| Haplo-8 | AGCTTA-CAA | 146 (8%) | 1 (1.4%) | - |
| Haplo-9 | CGCTTA-CAA | 192 (10.5%) | - | - |
| Haplo-10 | AGTCTG-TGA | 91 (5%) | - | - |
| Haplo-11 | AGCCTG-TGA | 427 (23.4%) | - | - |
| Haplo-12 | AATCTG-TGA | 156 (8.6%) | - | - |
| Others | Haplotypes with frequency <=1% | 139 (7.6%) | - | - |
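The combined share of Haplo-1 and Haplo-2 in each allele class can be recomputed directly from the counts in Table 2. The sketch below is a simple arithmetic check; the dictionary layout and function name are hypothetical.

```python
# Allele counts copied from Table 2 for the two candidate risk haplotypes.
HAPLOTYPE_COUNTS = {
    "Haplo-1": {"normal": 336, "intermediate": 53, "expanded": 23},
    "Haplo-2": {"normal": 16, "intermediate": 10, "expanded": 3},
}

# Total number of alleles in each repeat-size class (Table 2 column headers).
CLASS_TOTALS = {"normal": 1823, "intermediate": 69, "expanded": 28}


def combined_share(haplotypes, allele_class):
    """Percentage of alleles in `allele_class` carried on any of `haplotypes`."""
    carried = sum(HAPLOTYPE_COUNTS[h][allele_class] for h in haplotypes)
    return 100.0 * carried / CLASS_TOTALS[allele_class]


for allele_class in ("expanded", "intermediate", "normal"):
    share = combined_share(["Haplo-1", "Haplo-2"], allele_class)
    print(f"{allele_class}: {share:.1f}% on Haplo-1 or Haplo-2")
```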