Retrotransposon insertion as a novel mutational cause of spinal muscular atrophy

Spinal muscular atrophy (SMA) is an autosomal recessive neuromuscular disorder resulting from biallelic alterations of the SMN1 gene: deletion, gene conversion or, in rare cases, intragenic variants. The disease severity is mainly influenced by the copy number of SMN2, a nearly identical gene, which produces only low amounts of full-length (FL) mRNA. Here we describe the first example of retrotransposon insertion as a pathogenic SMN1 mutational event. The 50-year-old patient is clinically affected by SMA type III with a diagnostic odyssey spanning nearly 30 years. Despite a mild disease course, he carries a single SMN2 copy. Using Exome Sequencing and Sanger sequencing, we characterized a SINE-VNTR-Alu (SVA) type F retrotransposon inserted in SMN1 intron 7. Using RT-PCR and RNASeq experiments on lymphoblastoid cell lines, we documented the dramatic decrease of FL transcript production in the patient compared to subjects with the same SMN1 and SMN2 copy number, thus validating the pathogenicity of this SVA insertion. We described the mutant FL-SMN1-SVA transcript characterized by exon extension and showed that it is subject to degradation by nonsense-mediated mRNA decay. The stability of the SMN-SVA protein may explain the mild course of the disease. This observation exemplifies the role of retrotransposons in human genetic disorders.


Introduction
Spinal muscular atrophy (SMA) is an autosomal recessive neuromuscular disorder characterized by the degeneration of motor neurons of the anterior horns of the spinal cord, leading to progressive symmetrical limb and trunk paralysis associated with muscular atrophy. Estimated incidence is 1 in 6000 to 1 in 10,000 live births, depending on the ethnicity (Sugarman et al. 2012). SMA is associated with a wide spectrum of clinical severity and is usually classified into five groups (type 0-IV) based on age of onset and achieved motor milestones (Munsat and Davies 1992;Zerres et al. 1995;Dubowitz 1999;Wang et al. 2007;Finkel et al. 2015;Grotto et al. 2016). SMA type 0 with prenatal onset is the most severe form of the disease and the vast majority of patients do not survive beyond one month of age. Type I SMA (OMIM#253300) has classically an onset before the age of six months and in the absence of any therapy, death occurs within the first two years of life. Type II SMA (OMIM#253550) is intermediate in severity with onset before 18 months of age. These patients never gain the ability to walk. Patients affected with type III SMA (OMIM#253400) have proximal muscle weakness, starting after the age of 2. Type IV SMA (OMIM#271150), the mildest form of the disease, begins in adulthood, usually after age 20 years. Type I SMA is the most frequent form of the disease (nearly 50% of the SMA patients) whereas type II SMA and type III SMA represent respectively 30% and 19% of the SMA patients while type 0 SMA and type IV SMA remain extremely rare (Wirth 2021).
SMA is caused by the biallelic inactivation of the SMN1 gene (Survival of Motor Neuron 1, MIM*600354) located in the telomeric element of a duplicated region on chromosome 5q13 (Lefebvre et al. 1995). SMN2 (Survival of Motor Neuron 2, MIM*601627), a nearly identical copy of SMN1, is located in the centromeric element of this duplication but is only partially functional. SMN2 is a modifier gene of SMA disease severity with an inverse correlation between SMN2 copy number and disease severity. Among patients with SMA type III, 49% have 3 copies of the SMN2 gene and 44% have 4 copies of the SMN2 gene (Velasco et al. 1996; McAndrew et al. 1997; Burghes 1997; Feldkötter et al. 2002). A translationally silent nucleotide difference between SMN1 and SMN2 at position 6 of exon 7 (c.840C > T) results in inefficient inclusion of exon 7 in SMN2 transcripts (Monani et al. 1999). Therefore, in contrast to SMN1, which exclusively produces full-length (FL) transcripts containing exon 7, SMN2 produces a limited amount of FL transcripts and predominantly a shorter isoform resulting from exon 7 skipping (Δ7). The protein encoded by the Δ7 transcript has a different carboxy-terminal end and is unstable. Ultimately, the pathophysiology of SMA is based on the inability to produce a sufficient amount of FL transcript, and therapeutic strategies aim to increase the inclusion of exon 7 in the SMN2 transcript or to replace the defective SMN1 gene (Wirth 2021).
In about 95% of SMA patients, SMN1 gene inactivation is caused by homozygous gene deletion or gene conversion (Lefebvre et al. 1995;Frugier et al. 2002). In the 5% remaining cases, the loss of SMN1 function results from the deletion on one allele and a small intragenic pathogenic variant on the second allele or, very infrequently, from subtle variations on both alleles. Four recurrent variants in exons 3 and 6 account for almost 70% of these small variants (Alías et al. 2009;Xu et al. 2020).
In this study, we report the insertion of a SINE-VNTR-Alu (SVA) type F retrotransposon in SMN1 intron 7 of a patient clinically affected by SMA and carrying a heterozygous SMN1 deletion. The causative role of this SVA insertion was demonstrated using a transcriptomic analysis that showed a dramatic decrease of the full-length transcript and the generation of a mutant transcript including part of the SVA.

Determination of SMN1 and SMN2 copy number
High quality genomic DNA was extracted from the peripheral blood of the patients (QuickGene DNA Whole Blood Kit L, Kurabo, Osaka, Japan) according to the manufacturer's instructions.
Homozygous deletion of the SMN1 gene was searched for using ACRS (Amplification Created Restriction Site; van der Steege et al. 1995). SMN1 and SMN2 copy numbers were determined using two QMPSF assays as previously described (Quantitative Multiplex PCR of Short fluorescent Fragments; Saugier-Veber et al. 2001;Vezain et al. 2010). The QMPSF-DraI specifically introduces a DraI restriction site into SMN2 amplicons but not into SMN1 amplicons whereas QMPSF-HinfI creates a HinfI restriction site into SMN1 amplicons but not into SMN2 amplicons. PCR products were digested with DraI or HinfI, separated by electrophoresis and analyzed on an ABI Prism 3130 xl Genetic Analyser instrument (Applied Biosystems, Foster City, CA, USA).

DNA sequencing of the SMN1 intron 7-SVA-F region of interest
A specific SMN1-SVA genomic fragment was amplified by long-range PCR using a forward primer at the SMN intron 7-SVA boundary (SMNInt7-SVA-F) and a reverse primer specific to SMN1 (SMN1Int7-215A-R). The long-range PCR was carried out using 0.4 mM dNTP mix, 1.25 units of TaKaRa LA Taq® DNA polymerase (Takara Bio Europe, Saint-Germain-en-Laye, France), 1 μM of each primer and 250 ng of DNA. After an initial denaturation at 94 ℃ for 1 min, 40 cycles were performed, each consisting of denaturation at 98 ℃ for 10 s and annealing at 68 ℃ for 90 s, followed by a final extension at 72 ℃ for 10 min. The PCR fragment was resolved on a 1% agarose gel and purified. The PCR fragment was then Sanger sequenced using a series of primers (SVA-1F to SVA-9F) so as to walk along the fragment. Sanger-sequencing reactions were carried out with the BigDye™ Terminator v3.1 cycle sequencing kit and a 3500 Series Genetic Analyzer (Applied Biosystems). PCR and sequencing primers are listed in Table S2. Description is based on the SMN1 reference sequence NM_000344.3.

Quantification of SMN1 and SMN2 full-length transcript using SNaPshot assay
Blood samples were collected using the PAXgene™ Blood RNA Tube system (Qiagen, Courtaboeuf, France) and RNA was extracted with the PAXgene™ Blood RNA Kit (Qiagen). Total RNA (100 ng) was reverse-transcribed into cDNA using Verso cDNA kit (ThermoFisher Scientific, Waltham, USA) together with a mix of anchored oligo-dT primers and random hexamers (3:1). cDNA were PCR amplified under quantitative conditions using the forward primer at the junction of SMN exon 4 and 5 (SMNEx4-5F) and the reverse primer in exon 7 (SMNEx7-R). Genomic DNA from the same individual was amplified, under quantitative conditions, using the forward primer in SMN intron 6 (SMNInt6-F) and the same reverse primer in exon 7. Purified PCR products (20 ng) were used as templates in SMN exon 7 c.840 specific primer extension reactions (0.2 µM; SMNEx7-840-PE-R) with the SNaPshot™ Multiplex Kit (Applied Biosystems), according to the manufacturer's instructions. Then, the reaction products were treated with Shrimp Alkaline Phosphatase (1U; Roche Diagnostics, Meylan, France), and separated on a 3500 Series Genetic Analyzer (Applied Biosystems). Primer extension products were quantified by measuring the area of the peaks corresponding to each allele. Given the unequal fluorescence intensities of the incorporated fluorophore-labeled dideoxynucleotides, the relative amount (in percentage) of wild-type and mutant FL-SMN transcripts produced by each SMN gene was determined by normalizing the cDNA results to the matching gDNA data. PCR and extension primers are listed in Table S2. Experiments were performed in triplicate.
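The normalization step can be illustrated with a minimal sketch. It assumes, as one plausible reading of the description above, that each gene's cDNA peak area is divided by the matching gDNA peak area so that the dye-specific fluorescence bias cancels out, and that the resulting ratios are then expressed as percentages. The function name, its arguments and the example peak areas are hypothetical and are not taken from the study.

```python
def fl_transcript_shares(cdna_area, gdna_area):
    """Return per-gene FL transcript shares (%) after gDNA normalization.

    cdna_area / gdna_area: dicts mapping a gene label (e.g. "SMN1", "SMN2")
    to the primer-extension peak area measured on cDNA and on genomic DNA.
    Assumption: dividing the cDNA area by the gDNA area cancels the
    dye-specific signal bias of the incorporated ddNTP.
    """
    ratios = {}
    for gene, cdna in cdna_area.items():
        gdna = gdna_area.get(gene, 0.0)
        # A gene with no gDNA signal (zero copies) is given a ratio of 0.
        ratios[gene] = cdna / gdna if gdna > 0 else 0.0
    total = sum(ratios.values())
    if total == 0:
        raise ValueError("no measurable signal for any gene")
    return {gene: 100.0 * ratio / total for gene, ratio in ratios.items()}


# Hypothetical peak areas, for illustration only.
shares = fl_transcript_shares(
    cdna_area={"SMN1": 300.0, "SMN2": 100.0},
    gdna_area={"SMN1": 1000.0, "SMN2": 1000.0},
)
print(shares)
```

Because the gDNA signal also reflects gene copy number, a ratio of this kind reads as expression per gene copy, in arbitrary units.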

RNA sequencing
Total RNA was extracted from LCLs using the NucleoSpin RNA kit, according to the manufacturer's protocol (Macherey Nagel, Hoerdt, France). Total RNA was also extracted from patient LCL after puromycin treatment (10 μg/mL puromycin was added to the culture 5.5 h before harvesting). A second DNase treatment was performed to ensure RNA-specific sequencing (AMPD1, Sigma-Aldrich, St Louis, MO, USA). The yield and quality of the isolated RNAs were assessed using a NanoDrop 2000 (Thermo Fisher Scientific) and a TapeStation 4200 (Agilent Technologies), respectively. rRNA was depleted from 1 µg of high-quality total RNA (RIN > 9) using the NEBNext Ribosomal Depletion kit (New England Biolabs, Ipswich, MA, USA). cDNA libraries were then prepared from each rRNA-depleted RNA sample using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina according to the manufacturer's instructions (New England Biolabs). Total RNA-seq libraries were sequenced on a NextSeq 500 platform (Illumina) using high-output paired-end sequencing (2 × 75 bases). About 90 million reads were generated from each library with a mean quality score above 30 (Table S3). Read demultiplexing and generation of FASTQ files were performed using the bcl2fastq conversion software (Illumina, v2.20). The bioinformatics data analysis was performed using the nf-core RNASeq analysis pipeline (v3.1, Ewels et al. 2020). RNA-Seq reads were aligned to the human reference genome (GRCh37, hg19) using the STAR aligner (v2.6.1d). BAM file visualization and read count determination were performed using the Integrative Genomics Viewer (IGV v2.5.2, Broad Institute).

RNA sequencing of the SMN1-SVA isoform
Total RNA obtained from LCLs was reverse-transcribed into cDNA using Verso cDNA kit (anchored oligo-dT and random hexamers primers). SMN-SVA cDNA fragment was PCR amplified using a forward primer at SMN exon 4-5 boundary (SMNEx4-5F) and a reverse primer in SMN exon 8 (SMNEx8-R).

A diagnostic conundrum in a patient strongly suspected of SMA
Patient P was the first child of unrelated healthy parents of Caucasian origin. Family history was unremarkable. He had a healthy brother. He was first investigated at age 15 years for gait disturbances. He had developed progressive lower limb weakness during the second decade. He first experienced difficulty getting up from a crouching position. Clinical examination showed marked amyotrophy of both limb girdle muscles associated with areflexia. Electromyography revealed severe axonal neuropathy. CPK levels were 800-900 IU/L. He never experienced any sensory or cognitive impairment. The disease was slowly progressive and the patient began to use a wheelchair intermittently from age 36 years on. The clinical presentation is compatible with SMA type III with a slow course of the disease.
Patient P was tested twice using the ACRS assay (Amplification Created Restriction Site) at age 21 and 32 years (van der Steege et al. 1995). These tests failed to identify a homozygous deletion of the SMN1 gene, leading to a diagnostic deadlock (Fig. 1a). Additional molecular investigations were requested at age 50 years using two distinct QMPSF (Quantitative Multiplex PCR of Short fluorescent Fragments) assays to investigate the hypothesis of a heterozygous SMN1 deletion associated with a small variant on the second allele. Nevertheless, this method failed to quantify the SMN1 copy number: the QMPSF profiles did not strictly display a heterozygous deletion and showed only traces of SMN1 exon 7 although the two primer pairs were not overlapping (Fig. 1b). Intriguingly, the father and the brother of patient P also presented an atypical QMPSF profile suggesting that they carried the same SMN1 alteration, thus excluding the hypothesis of a somatic mosaic deletion in patient P (data not shown). In addition, the two independent QMPSF assays performed with non-overlapping primers showed the presence of only one copy of the SMN2 gene in patient P.
After the cause of this form of SMA was elucidated in patient P, treatment with Nusinersen was considered. Nusinersen is an approved drug for SMA that forces inclusion of exon 7 of the SMN2 gene in the final transcript to increase FL transcript production. Patient P was treated by intrathecal injection of Nusinersen for 8 months, without side effects. Up to now, he has received 5 injections. He describes improved distal hand movements, such as opening or closing his belt. This was confirmed by an objective improvement in the pinch test, which assesses the strength of the distal thumb-index pinch. He also notices a decrease in muscular pain and fatigue. He is now able to roll over in bed, or cross and uncross his legs independently. His dyspnea has subsided. He has regular follow-up by a physiotherapist, who has also observed an improvement in motor tests. Between August 2021 and March 2022, the RULM (Revised Upper Limb Module) scale, which assesses upper limb function, improved by 3 points out of 43 (from 27 to 30) and the MFM scale (Motor Function Measurement), which is a global assessment of the motor functions, improved by 5 points out of 96 (from 42 to 47).

Detection of a SVA-F insertion into SMN1 intron 7
We hypothesized the presence of a causative germline aberration in intron 7 of the SMN1 gene that impaired splicing and likely involved a repetitive region that impeded proper DNA amplification. We performed Exome Sequencing (ES, Table S1) on DNA isolated from the patient's fresh blood sample and focused our analysis on the SMN1 gene. We observed the presence of split reads partially mapping to SMN1 intron 7, with a breakpoint at position c.*3 + 32, arguing for the presence of a structural variant downstream in intron 7 (Fig. 2a). We focused on three positions which differ between SMN1 and SMN2 in intron 7 and exon 8, namely c.*3 + 100 and c.*3 + 215 in intron 7 and c.*239 in exon 8 (Blasco-Pérez et al. 2021), and we noticed that these SMN1 regions were normally captured and sequenced, indicating that the second breakpoint should occur between c.*3 + 32 and c.*3 + 100 (data not shown). To elucidate the nature of this rearrangement, we performed a BLAT search (BLAST-Like Alignment Tool; UCSC, GRCh37, hg19) using the first 66 bp of the inserted sequence. More than 200 perfect matches were obtained all over the human genome, corresponding to a retrotransposable element of the SVA type F category. Retrotransposons are genomic sequences with the ability to duplicate themselves at a distance in the genome via an RNA transposition intermediate. SVA are hominid-specific, composite non-coding retrotransposons that contain SINE (Short Interspersed Sequence), VNTR (Variable Number of Tandem Repeat) and Alu sequences (Hancks and Kazazian 2010). Canonical SVA are on average ~ 2 kilobases long and they are classified in seven subfamilies, SVA A-F and F1, depending on their SINE sequences (Wang et al. 2005). The presence of this SVA insertion, called SMN1-SVA allele, was validated with a specific PCR amplification using a primer in the SMN1 gene and a primer in the inserted sequence (Fig. 2b, Table S2).
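The split-read signal described above can be illustrated with a minimal sketch that locates soft-clipped read ends from SAM-style alignment fields and counts how many reads support each candidate breakpoint. This is not the pipeline used in the study: the function name, the toy coordinates and the CIGAR strings are hypothetical, and only the standard SAM conventions (1-based `POS`, CIGAR operation letters) are assumed.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance the reference


def soft_clip_breakpoints(pos, cigar):
    """Return 1-based reference positions adjacent to soft-clipped read ends.

    pos   : leftmost aligned reference position (SAM POS field)
    cigar : SAM CIGAR string, e.g. "40M35S"
    """
    breakpoints = []
    ref = pos
    aligned_seen = False
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "S":
            # Leading clip: report the first aligned base.
            # Trailing clip: report the last aligned base.
            breakpoints.append(ref - 1 if aligned_seen else ref)
        elif op in REF_CONSUMING:
            ref += length
            aligned_seen = True
    return breakpoints


# Hypothetical reads (POS, CIGAR), for illustration only.
reads = [(100, "40M35S"), (110, "30M45S"), (95, "75M"), (140, "20S55M")]
support = Counter(
    bp for pos, cigar in reads for bp in soft_clip_breakpoints(pos, cigar)
)
print(support.most_common(1))  # [(139, 2)]
```

A position supported by several independently clipped reads is a candidate breakpoint; the clipped bases themselves can then be searched against the genome, as was done here with BLAT.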

The SMN1-SVA allele was inherited from the father and was also detectable in the brother whereas the SMN1 deleted allele was inherited from the mother (Fig. 2c).

Fig. 1
Conventional tools failed to confirm the diagnosis of SMA in patient P. a ACRS assay, which is designed to detect SMN1 homozygous deletion, did not detect a homozygous deletion in patient P. Size of the PCR products: SMN1 (201 bp), SMN2 (177 bp) and undigested PCR product (232 bp). MW: DNA Molecular Weight Marker (100 bp ladder). b Two independent QMPSF assays with non-overlapping primers were used to determine SMN1 and SMN2 copy numbers. PCR multiplex amplification of the SMN amplicon, a control amplicon and a digestion control amplicon using dye labeled primers was followed by DraI or HinfI digestion to distinguish SMN1 from SMN2. Then, digested PCR fragments were separated by electrophoresis on an automated sequencer. The dosage quotient (DQ) was calculated using the following formula: $\mathrm{DQ} = \dfrac{A_{\text{patient}} / A_{\text{normal control}}}{A_{C,\text{patient}} / A_{C,\text{normal control}}}$, where $A$ is the fluorescent peak area of the SMN1 or SMN2 amplicon and $A_C$ is the fluorescent peak area of the control amplicon. The normal control individual displayed 2 copies of SMN2 gene and 2 copies of SMN1 gene.
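The dosage quotient defined in the Fig. 1 legend can be written as a short sketch. The function and variable names and the example peak areas are hypothetical; the conversion of DQ into a copy-number estimate simply scales by the control's known copy number (2 in the legend) and is an illustrative assumption, not a validated diagnostic threshold.

```python
def dosage_quotient(a_patient, a_control, ac_patient, ac_control):
    """DQ = (A_patient / A_control) / (A_C,patient / A_C,control).

    a_*  : peak area of the SMN1 or SMN2 amplicon
    ac_* : peak area of the control amplicon in the same sample
    """
    return (a_patient / a_control) / (ac_patient / ac_control)


def estimated_copies(dq, control_copies=2):
    """Scale DQ by the copy number of the normal control (assumed to be 2)."""
    return dq * control_copies


# Hypothetical peak areas, for illustration only.
dq = dosage_quotient(a_patient=500.0, a_control=1000.0,
                     ac_patient=2000.0, ac_control=2000.0)
print(dq, estimated_copies(dq))  # 0.5 1.0
```

Dividing by the control-amplicon ratio corrects for differences in DNA input and amplification efficiency between the patient and control samples.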

Characterization of the SVA-F in SMN1 intron 7
To fully elucidate this SVA insertion in the SMN1 gene, we designed a forward primer mapping to a sequence that overlapped intron 7 and the SVA element and a reverse primer, which specifically mapped to SMN1 intron 7, downstream of the insertion (Table S2). We amplified and Sanger sequenced the whole SVA insertion. According to the RepeatMasker annotation, the obtained sequence was shown to be 88.7% identical to a retrotransposon of the SVA subtype F category. This SVA-F element was approximately 1090 bp long and was flanked by a 13 bp long SMN1 target-site duplication (TSD, chr5(GRCh37):70247841_70247853; NM_000344.3:c.*3 + 20_*3 + 32). This SVA appeared to be 5'-truncated and was missing the CCCTCT hexamer. The presence of TSD, the L1 endonuclease recognition sequence, and the poly-A tail together indicated that this SVA insertion occurred by L1-mediated target-primed reverse transcription (Zingler et al. 2005, Fig. 3 and Online resource 1).

Fig. 2
[...] in SMN1 intron 7 in patient P's DNA. The bam alignment file showed split reads corresponding to the presence of an unaligned sequence downstream of the position SMN1 c.*3 + 32. b SMN specific PCR demonstrating that the insertion took place on SMN1 and not on SMN2 intron 7. Forward primers were used to specifically amplify SMN1 (c.835-44G) or SMN2 (c.835-44A) together with a reverse primer located at the insertion-intron 7 boundary (amplicon size: 171 bp). PCR were performed in each individual of patient P's family. A PCR assay was also designed in PDGFA as an amplification control (amplicon size: 293 bp). MW: DNA Molecular Weight Marker (100 bp ladder). c Recapitulation of the molecular results in the family showing that the SMN1-SVA allele was inherited from the father and was also detectable in the brother whereas the SMN1 deleted allele was inherited from the mother. Copy numbers of SMN1, SMN1 harboring the insertion (SMN1-SVA) and SMN2 are indicated for each member of the family.

This transposable element has never been reported in retrotransposon insertion polymorphism databases such as dbRIP or euL1db. Moreover, by systematically searching for split reads in the SMN1 gene, we failed to identify any SVA insertion in over 1,300 control individuals in our in-house ES database and in 2,800 control individuals in the 1000 Genomes project. To date, only a few SVA insertions have been reported in human disorders, mostly acting through splicing alteration (Table 1;

Dramatic decrease of the SMN1 FL transcript production in patient P
The SVA-F retrotransposon insertion was located within SMN1 intron 7 in a region where several cis-regulatory splicing elements have been described (Singh and Singh 2018). Therefore, we wondered if this retrotransposon insertion could alter SMN1 exon 7 splicing regulation and thus lead to its exclusion from mature transcripts (∆7 SMN). To address this question, the FL-SMN transcript levels generated in total blood from SMN1 and SMN2 genes were quantified in patient P and compared with several relevant controls using the SNaPshot fluorescent primer extension assay (Vezain et al. 2010). This approach allowed us to determine the relative amounts of FL-SMN transcripts produced either by the SMN1 gene or the SMN2 gene, as these transcripts differ at position c.840 (C in SMN1 versus T in SMN2 transcripts). As expected, FL SMN2 transcripts displayed a comparable level in patient P, who carried one copy of the SMN2 gene, and in controls also harboring one SMN2 copy (Fig. 4a). In contrast, FL SMN1 transcripts were drastically reduced in patient P compared with controls with one wild-type SMN1 copy (Fig. 4a). This drastic reduction in the production of SMN1 FL transcripts validated the pathogenicity of the allele carrying the SVA-F insertion. Nevertheless, in contrast with controls affected with SMA harboring a homozygous SMN1 deletion, who had no detectable FL SMN1 transcripts, patient P produced small amounts of FL SMN1 transcripts (Fig. 4a).

Characterization of the SMN transcripts in patient P
To investigate whether the SVA insertion would allow the production of additional SMN1 transcript isoforms leading to functional or partially functional proteins, we analyzed the SMN transcripts in lymphoblastoid cell lines (LCL) of patient P and controls. In a first approach, to avoid any selection bias introduced by targeted RT-PCR method, we performed an Illumina short-read total RNA sequencing in LCLs treated or not with puromycin, which blocks translation and consequently nonsense-mediated mRNA decay (NMD). In patient P, RNA-Seq experiment revealed the expression of a chimeric isoform, called FL-SMN1-SVA transcript, harboring exon 7, the first 32 nucleotides from intron 7 and several nucleotides from the SVA-F (visible only on read 2; Fig. 4b). Due to the high level of repeat sequences in the SVA and short-read sequencing, this experiment did not allow the sequencing of the whole transcripts harboring the SVA sequence.

Fig. 4
Characterization of SMN transcripts in patient P. a Allele specific expression analysis using primer extension of the full-length SMN transcripts generated from the SMN2 gene (FL SMN2, bars in dark gray) and from the SMN1 gene (FL SMN1, bars in light gray) in total RNA blood samples. The cDNA region was PCR amplified using a forward primer at the exon 4-5 boundary and a reverse primer in exon 7 encompassing the position c.840 where C is specific to the SMN1 gene and T is specific to the SMN2 gene. Results are shown for patient P (1 SMN2 copy/1 SMN1 copy carrying the SVA insertion, this mutant allele is represented by a star) and controls (1 SMN2/0 SMN1; 4 SMN2/0 SMN1; 0 SMN2/1 SMN1; 1 SMN2/1 SMN1 and 2 SMN2/1 SMN1). Reverse primer extension was performed at the c.840 position, a fluorescent ddG was incorporated in case of FL SMN1 and a fluorescent ddA in case of FL SMN2. The extended primers were separated by electrophoresis on an automated sequencer and peak areas were measured. The results obtained with cDNA were normalized to those obtained with gDNA and were expressed as relative amount (in arbitrary unit) of wild-type and mutant FL-SMN transcripts produced by each SMN gene copy. Each value represents the mean (± S.E.M.) of three independent assays. ***, p < 0.0001 using the unpaired t test. b RNA sequencing data displayed from Integrative Genomics Viewer surrounding SMN1 exon 7 from patient P lymphoblastoid cell line (top) and patient P lymphoblastoid cell line treated with puromycin (bottom). In both cases, the top track represents read count number and bottom track aligned reads. Reads were properly aligned on the reference sequence before the + 32 position in SMN1 intron 7. Then, as depicted by clipped reads, SVA sequence reads failed to align (red brackets). c SMN transcript quantification from RNA-Seq data. SMN transcripts counts were normalized to total read counts in lymphoblastoid cell line RNA samples from patient P, patient P puromycin treated, and controls (4 SMN2/0 SMN1; 0 SMN2/1 SMN1 and 1 SMN2/2 SMN1). Three SMN transcripts were quantified: FL-SMN at exon 7-exon 8 junction (from SMN1 and SMN2, in dark gray), Δ7-SMN at exon 6-exon 8 junction (from SMN2, in light gray) and FL-SMN1-SVA (from SMN1-SVA, represented by a star; in black). d Schematic representation of SMN RNA processing. A generic SMN premessenger RNA is represented on the top. The red dot represents the splice donor site that was used in the inserted SVA. FL-SMN1-SVA transcript is subjected to NMD as the stop codon within exon 7 is located upstream of the last 50 bp of the exon and is now recognized as a premature stop codon (PTC).

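The quantification of SMN isoforms from junction-spanning reads, normalized to total read counts, can be sketched as follows. The scaling factor (reads per million) is an assumption, since the scale used in the study is not stated, and the function names and example counts are hypothetical.

```python
def normalized_count(junction_reads, total_reads, scale=1_000_000):
    """Junction-spanning reads per `scale` sequenced reads (scale is assumed)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return junction_reads * scale / total_reads


def isoform_shares(normalized_counts):
    """Percentage of each isoform among all quantified SMN junction reads."""
    total = sum(normalized_counts.values())
    if total == 0:
        raise ValueError("no SMN junction reads")
    return {name: 100.0 * c / total for name, c in normalized_counts.items()}


# Hypothetical raw junction counts for one library of 90 million reads.
raw = {"FL-SMN": 4500, "delta7-SMN": 2700, "FL-SMN1-SVA": 1800}
norm = {name: normalized_count(n, 90_000_000) for name, n in raw.items()}
print(norm)
print(isoform_shares(norm))
```

Comparing the share of the chimeric isoform with and without puromycin then indicates whether the transcript is stabilized when NMD is blocked.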
To quantify the relative production of the SMN isoforms, we determined, after total read count normalization, the number of SMN reads overlapping the boundaries of exon 7 and 8 junctions: exon 7-exon 8 for FL-SMN transcript, exon 6-exon 8 for ∆7-SMN transcript and exon 7-int7-SVA for the FL-SMN1-SVA transcript (Fig. 4c). We showed that patient P who carries 1 SMN2 gene/1 SMN1-SVA gene produced less FL transcript than a normal control with 0 SMN2 gene/1 SMN1 gene who produced 100% FL transcript (30.7 counts vs 61.4 counts respectively). We also showed that patient P produced less FL transcript than a patient affected by SMA type III with 4 SMN2 gene/0 SMN1 gene (30.7 counts vs 53.5 counts, respectively). After puromycin treatment of patient P's cell line, we recorded an increased proportion of the FL-SMN1-SVA transcript among all types of transcripts, from 8% (5.6 counts) to 38% (39.1 counts), suggesting that this aberrant chimeric transcript is a target of the NMD surveillance system (Fig. 4b, c). Eventually, we were able to detect only two clipped reads, which were aligned partially on the SVA and on the 5' part of SMN exon 8 (data not shown). Concerning Δ7-SMN transcripts, levels in patient P were comparable to those of an individual carrying 1 SMN2 gene. Treatment with puromycin of patient P's LCLs did not affect the level of Δ7-SMN transcripts. Therefore, the FL-SMN1-SVA transcript did not induce any production of Δ7-SMN transcripts in patient P (Fig. 4c). In a second attempt to fully characterize this FL-SMN1-SVA transcript, we performed a targeted RT-PCR analysis of the chimeric transcript using a forward primer at SMN1 exon 4-5 boundary (SMN1Ex4-5F) and a reverse primer located in SMN exon 8 (SMNex8-R; Table S2). Sanger sequencing of this cDNA fragment with the SMNInt7-SVA-F and SVA-1F to SVA-9F primers (Table S2) revealed an isoform which included SMN1 exon 7, the first 32 nucleotides from intron 7, the first 581 nucleotides of the SVA and exon 8 (Fig. 4d). Bioinformatics splice site algorithms predicted the existence of several cryptic splice sites in the SVA inserted sequence, and notably, a donor site was predicted precisely at position 581 of the SVA (Fig. 4d, Table S4). The score of this cryptic SVA splice site was slightly stronger than that of the SMN1 exon 7 donor site. Therefore, in patient P, this cryptic donor site was used in place of the natural one and was responsible for the retention of part of intron 7 and part of SVA (Fig. 4d). This exonisation of intronic and SVA sequences had drastic consequences on the processing of FL-SMN1-SVA mRNA isoform. Indeed, in the FL-SMN transcript, the natural termination codon is located in exon 7, 3 bp before the last exon-exon boundary. According to current knowledge of NMD surveillance, because this stop codon is located within the last 50-55 nucleotides of the penultimate exon, this FL-SMN transcript escapes mRNA degradation and leads to FL-SMN protein (for review, Kurosaki et al. 2019). On the contrary, in the FL-SMN1-SVA transcript, because the SVA insertion abolished the use of the natural donor splice site and created a stronger new one, the natural stop codon moved to more than 600 nucleotides from the last exon-exon boundary, defining it as a premature termination codon (PTC, Fig. 4d). Therefore, the FL-SMN1-SVA transcript is subject to mRNA degradation as observed when comparing RNA-Seq data with and without puromycin (Fig. 4c).
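The positional rule invoked above can be made concrete with a small worked example. This sketch assumes the simplified rule that a stop codon lying more than about 50-55 nucleotides upstream of the last exon-exon junction is treated as premature, and that the exonised sequence (32 nucleotides of intron 7 plus 581 nucleotides of SVA) adds directly to the 3 bp wild-type distance. The names and the 55-nucleotide default are illustrative choices, not parameters from the study.

```python
NMD_BOUNDARY_NT = 55  # upper end of the 50-55 nt range cited in the text


def is_premature_stop(stop_to_last_junction_nt, boundary_nt=NMD_BOUNDARY_NT):
    """True if the stop codon lies farther upstream of the last exon-exon
    junction than the boundary, i.e. is expected to trigger NMD."""
    return stop_to_last_junction_nt > boundary_nt


wild_type_distance = 3      # natural stop codon, 3 bp before the junction
exonised_intron7_nt = 32    # first 32 nucleotides of intron 7
exonised_sva_nt = 581       # first 581 nucleotides of the SVA

# Assumed additive model of the distance in the FL-SMN1-SVA transcript.
mutant_distance = wild_type_distance + exonised_intron7_nt + exonised_sva_nt

print(is_premature_stop(wild_type_distance))  # False
print(mutant_distance, is_premature_stop(mutant_distance))  # 616 True
```

Under these assumptions the stop codon sits 616 nucleotides upstream of the last junction, consistent with the "more than 600 nucleotides" stated above.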

Discussion
Here we present the first case of germline SVA-F retrotransposon insertion into the SMN1 gene leading to a SMA phenotype. The insertion severely impaired normal splicing, resulting in the degradation of the FL-SMN1-SVA transcript by NMD. The intronic location of the insertion and its repeat-rich nature, which generates cruciform structures, explain why PCR amplifications were impossible and why this genomic alteration was missed with conventional genetic procedures, leading to a diagnostic odyssey of nearly 30 years. Moreover, genetic counseling in the patient's family was not possible until diagnostic confirmation was obtained and the genomic alteration in each of the two alleles characterized.
It is noteworthy that patient P displayed a particularly mild phenotype corresponding to SMA type III despite a single SMN2 copy. Interestingly, a physiological alternative transcript, named SMN6B, generated by exonisation of an intronic-Alu sequence in intron 6 of both SMN1 and SMN2 genes has been described previously (Seo et al. 2016). Despite low abundance of this SMN6B transcript, which was subjected to NMD, the endogenous SMN6B protein was expressed in both neuronal and non-neuronal cells, interacted with the Gemin2 protein, and was more stable than the SMN-∆7 protein. Moreover, SMN∆7 read through product was shown to restore SMN protein functionality by increasing protein stability (Mattis et al. 2008). Indeed, the authors were able to show that these constructs could promote neurite outgrowth in SMN-deficient neurons, and significantly elevate SMN-dependent UsnRNP assembly in extracts from SMA patient fibroblasts. Therefore, we can hypothesize that, in patient P, the small quantity of FL-SMN1-SVA transcript can generate a non-negligible quantity of functional protein explaining the less severe course of the disease. The sequence modification of the C-terminal domain may dictate different protein interactions and complex formation resulting in higher stability of the protein through post-translational modifications. Indeed, post-translational modifications regulating the SMN protein stability were described, like hyperSUMOylation, which modifies the SMN degradation following ubiquitin/proteasome pathway (Locatelli et al. 2015;Yang et al. 2021) or phosphorylation of C-terminal sites, which conditions protein stability and nuclear localization (Rademacher et al. 2020). Stability of the SMN protein is linked to oligomerization and the YG box located at the C-terminal end of the SMN protein utilizes a glycine zipper motif to form dimers (Gupta et al. 2021). 
Knowing that the SVA is inserted downstream the YG box and does not disrupt this domain, the coiled-coil interactions in the YG domains of SMN and FL-SMN1-SVA may be enhanced, contributing at least in part to the stability of the FL-SMN1-SVA protein.
Nearly half of the human genome sequences are estimated to be transposable elements (TEs) in origin (Lander et al. 2001). These TEs are divided into two categories, namely class 1 and class 2, depending on their mechanism of transposition via a DNA or RNA intermediate (Hancks and Kazazian 2016). Class 2 elements, also known as DNA transposons, represent around 2% of our genome, and were mobilized via a DNA intermediate through a cut-and-paste mechanism before losing their ability to mobilize in the human genome. Class 1 elements, also known as retrotransposons, represent the most abundant class of mobile elements in the human genome and mobilize through a "copy-and-paste" mechanism whereby an RNA intermediate is reverse-transcribed into a cDNA copy that is integrated elsewhere in the genome. Long INterspersed nuclear Elements (LINE), Short INterspersed nuclear Elements (SINE), and SINE-VNTR-Alu (SVA) are non-LTR retrotransposons. SVA elements are the most recently evolved family of active non-LTR retrotransposable elements, being hominid-specific, with approximately 2700 SVA copies in humans (Lander et al. 2001). The SVA family of transposable elements can be further divided into 7 subfamilies, SVA A-F and F1, based on sequence divergence and evolutionary age, subfamilies SVA_E and SVA_F being restricted to the human lineage. Structurally, a canonical SVA is composed of 5 main domains: (1) a hexamer repeat of (CCCTCT)n at the 5′ end, which may be variable in copy number, (2) an Alu-like region made up of two antisense Alu fragments, (3) one or two variable number tandem repeat (VNTR) regions, typically with a repeating sequence between 35 and 50 bp, (4) a SINE region derived from the 3′ LTR of the retroviral HERV-K10 element, and (5) a 3′ poly-A signal (Hancks and Kazazian 2016). Finally, the SVA is flanked by target-site duplications. The inserted element that we have fully characterized in the SMN1 gene meets all the criteria of a SVA.
Retrotransposons can drive genetic diversity by modulating gene expression through changes in methylation patterns, chromatin structure, transcription factor binding, or gene splicing. They can also cause disease through a variety of mechanisms (Hancks and Kazazian 2016; Kazazian and Moran 2017; Payer and Burns 2019; Table 1): (i) SVA element insertions may lead to target-site deletions (Vogt et al. 2014). Vogt and colleagues described deletions of 1 Mb and 867 kb in two unrelated NF1 patients, resulting from the insertion of SVA elements in the neighboring gene, SUZ12P, which triggered non-homologous end joining (NHEJ) between SUZ12P and the telomeric NF1 gene region; (ii) SVA element insertions may disrupt gene function through epigenetic changes and/or disruption of regulatory elements; (iii) retrotransposons have long been considered to be agents of genome instability and could result in chromothripsis. An interesting example was provided by the insertion of an SVA-E into an Alu-rich region within the A4GNT gene, resulting in multiple breakpoints at the A4GNT locus (Nazaryan-Petersen et al. 2016). The suspected mechanism involved chromatin loops mediated by homologous Alu elements located in distal regions, followed by cleavage by the L1 endonuclease and non-allelic homologous recombination leading to chromothripsis; (iv) a retrotransposon insertion may create an alternative C-terminus of a protein, which can in turn alter the function of the gene product and lead to disease. For example, an SVA element insertion in the 3′ untranslated region of the FKTN gene was shown to cause Fukuyama congenital muscular dystrophy, one of the most common autosomal recessive disorders in Japan. This SVA element insertion introduced a strong splice acceptor site. The resulting product was a carboxy-terminus truncated FKTN protein that included 129 novel amino acids encoded by the SVA. This protein was mislocalized from the Golgi to the endoplasmic reticulum (Taniguchi-Ikeda et al. 2011); (v) the most frequent mechanism, however, is SVA insertion into an exon, which results in a frameshift, or into an intron, which alters splicing. Recently, an SVA-E insertion was detected in intron 2 of the SMARCB1 gene and shown to disrupt correct splicing, resulting in loss of a functional allele by NMD (Sabatella et al. 2021). Similarly, the SVA-F retrotransposon we describe in intron 7 of the SMN1 gene results in exon extension and production of a FL-SMN1-SVA transcript.
More practically, the repetitive nature of TE sequences poses technical and computational challenges for their detection. In a patient whose clinical presentation is highly suggestive of SMA, if conventional genetic procedures fail to identify a homozygous SMN1 deletion, or a heterozygous SMN1 deletion with a subtle SMN1 variant on the second allele, genome sequencing represents the best option to identify all types of genomic alterations within the SMN1 gene. Special attention must be paid to the choice of bioinformatics pipelines on diagnostic platforms, as analyzing TEs requires specialized tools. Indeed, MELT, which uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements, represents a powerful tool for mobile element insertion detection in next-generation sequencing data (Gardner et al. 2017). Other tools have been developed, such as Tangram (Wu et al. 2014), Mobster (Thung et al. 2014) or TranSurVeyor (Rajaby and Sung 2018). Nevertheless, as numerous SVAs are polymorphic, the pathogenic consequences of any such insertion have to be proven before the result is used for genetic counseling. Finally, optical mapping has been successfully used to detect an SVA-E retrotransposon in the SMARCB1 gene in two siblings with teratoid rhabdoid tumors (Sabatella et al. 2021). However, optical mapping cannot be used to detect SVAs in the SMN1 gene until the reference sequence of the SMN locus is clearly established.
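To illustrate the clipped-read signal that such tools exploit, the following Python sketch scans SAM-style alignments (1-based leftmost position plus CIGAR string) for soft-clipped reads and groups the clip coordinates into candidate insertion breakpoints. It is a deliberately simplified toy under stated assumptions and does not reproduce the algorithm of MELT or of any other published tool: the function names, the thresholds (`min_clip`, `min_support`, `max_gap`) and the example reads are hypothetical, and discordant read pairs and comparison with mobile element consensus sequences are not modeled.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(pos: int, cigar: str, min_clip: int = 10) -> list:
    """Reference coordinates at which a read is soft-clipped by >= min_clip bases.

    `pos` is the 1-based leftmost aligned position, as in the SAM POS field.
    Hard clips (H) are ignored for simplicity.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    points = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        points.append(pos)  # clip sits just before the first aligned base
    ref_end = pos + sum(n for n, op in ops if op in REF_CONSUMING)
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        points.append(ref_end)  # first reference base past the aligned segment
    return points


def candidate_sites(reads, min_support: int = 3, max_gap: int = 5) -> list:
    """Cluster clip coordinates; return (start, end, n_reads) per cluster
    supported by at least `min_support` clipped reads."""
    points = sorted(p for pos, cigar in reads for p in clip_breakpoints(pos, cigar))
    clusters, current = [], []
    for p in points:
        if current and p - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return [(c[0], c[-1], len(c)) for c in clusters if len(c) >= min_support]


# Made-up alignments: three reads are clipped at coordinate 160.
reads = [(100, "60M40S"), (110, "50M50S"), (160, "30S70M"),
         (300, "100M"), (500, "95M5S")]
print(candidate_sites(reads))  # [(160, 160, 3)]
```

In real data, the clipped sequences themselves would then need to be compared with SVA consensus sequences, and the clusters on either side of an insertion are generally expected to be offset by the length of the target-site duplication rather than to coincide exactly.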
In conclusion, whole genome sequencing provides unprecedented opportunities for the detection of retrotransposons causing genetic disorders, in particular SVAs, even if such alterations remain rare mutational events (Torene et al. 2020). Nevertheless, the detection of these retrotransposons remains a challenge, as specific bioinformatics algorithms must be implemented in standard variant calling protocols to detect this underestimated type of variation. Moreover, their interpretation requires functional tests to establish their pathogenicity. Retrotransposon insertion polymorphism databases such as dbRIP or euL1db are also valuable tools to aid interpretation.