We detected abnormally spliced DMD transcripts in 19 of 20 cases of genetically undiagnosed dystrophinopathy analyzed by RNA-seq. Among them, exonization of intronic sequence was observed in 15 cases (Patients 1–12 and 15–17), whereas exon skipping was detected in one case (Patient 13). Further, 13 pathogenic SNVs were identified (Patients 1–13), which could reasonably have caused generation of the associated abnormal transcripts. In three cases (Patients 15–17) with suspected abnormal intron sequences, long-read sequencing using CRISPR-Cas9 gene targeting revealed genomic structural variations. In one case (Patient 18), DMD transcription was prevented by a nucleotide repeat expansion in an intron, while chromosomal rearrangements were detected in two cases (Patients 19, 20), where DMD transcripts were aberrantly spliced and polyadenylated. In the remaining case (Patient 14), although there was a large duplication in an intron, no abnormal transcripts were identified.
Exonization of intronic sequences caused by SNVs
We identified pseudoexon (PE) activation in DMD transcripts in 15 cases (Patients 1–12 and 15–17), among which we detected intronic SNVs related to PEs in 11 cases (Fig. 1, Table 1; Patients 1–9, 11, 12). In Patient 10, we detected exon extension, due to activation of a cryptic splice-acceptor site, and an SNV 13 bp upstream of the authentic exon 56 3' splice site (Fig. 1j).
Exon skipping caused by SNV
In Patient 13, RNA-seq revealed DMD exon 7 skipping (Fig. 1m). No significant variants in exon 7 or its immediate flanking regions were detected; however, an SNV (g.32,827,750A>T) 22 bp upstream of exon 7 was detected by genome sequencing and predicted to disrupt the authentic branchpoint.
Complex PE activation by large intragenic structural changes
We detected large intronic structural changes in genomic DNA in four cases (Patients 14–17, Fig. 2) who all showed complex PE activation in DMD transcripts, except for Patient 14. In Patient 14, although we could not find clear PE activation in intron 2, based on the hg19 reference genome sequence, we detected a remarkable reduction in DMD transcript levels, a genomic duplication in intron 2 (by genome sequencing), and an inverted duplication of a 42.5-kb sequence in intron 2 by targeted long-read sequencing (Fig. 2a). Reanalysis of Patient 14 RNA-seq data, based on their own genomic sequence, revealed no PE (Fig. 2b).
In Patient 15, we detected several reads in intron 13, but failed to detect intron/exon-spanning reads (data not shown). Sequencing of an RT-PCR product spanning exons 13 and 14 revealed an inserted inversely PE (Fig. S1). To discover the precise genomic change in intron 13, which spans 21 kb in the reference genomic sequence, we performed targeted long-read sequencing using the CRISPR-Cas9 system, by designing guide RNAs in exons 13 and 14. Reads of approximately 7-kb containing guide RNA recognition sites at each end were generated from patient genomic DNA, revealing two large deletions, spanning 12 and 2.6 kb near exons 13 and 14, respectively, and an inversion of the flanking 3.7 kb region (Fig. 2c); a PE was found in this inverted region (Fig. 2d).
In Patient 16, we detected low DMD transcript expression and multiple spliced products containing sequences from exon 44 to within intron 44 by RNA-seq. Since transcript termination was excepted, we performed 3' rapid amplification of cDNA ends (RACE) and detected a product terminating at an intronic polyadenylation signal at g.32,142,407 (Fig. S2a). Surprisingly, other 3'RACE products from exon 44, which were unintentionally amplified using endogenous adenine tracts, contained sequences from distant regions of the X chromosome (g.26,895,124–26,895,242/g.26,896,753–26,896,896/g.26,899,638–26,899,733). Further, we found a 55-kb deletion in intron 44 and a 212-kb duplication at g.26,768,158–32,235,180 by genome sequencing. Some reads with clipped sequences were found near the deletion in intron 44. To clarify the intron 44 structure, we designed crRNA in intron 44 and performed targeted long-read sequencing, which detected a 54-kb deletion and translocation of a 211-kb sequence into intron 44 (Fig. 2e). Reanalysis of RNA-seq data, based on patient genomic sequence, identified a new transcript with multiple exons in intron 44 (Fig. 2f).
In Patient 17, we could not detect clear exon-spanning reads after exon 60 by RNA-seq. To define the genomic structure, we designed crRNAs in exons 60 and 61, performed targeted long-read sequencing, and detected a 139-kb inversion from intron 60 to intron 63 (Fig. 2g). By reanalysis of RNA-seq data, based on the patient’s genomic sequence, we identified two new transcripts from the complementary strand of introns 61 and 62 (Fig. 2h).
Transcript termination due to repeat expansion
In Patient 18, reads spanning exons 62 to 63 were decreased (Fig. 2j), and reads containing a CTT nucleotide repeat expansion at g.31,302,724 were detected by genome sequencing. Targeted long-read sequencing showed that the CTT expansion comprised 1381–1502 repeats in the patient, compared to 16 repeats in the reference sequence (Fig. 2i, Fig. S4). Finally, ribo-zero RNA-seq analysis demonstrated that the majority of transcripts stopped in intron 62 (Fig. 2k).
Aberrant transcript generation due to a large chromosomal rearrangement
DMD transcripts from Patients 19 and 20 appeared to terminate in exons 51 and 52, respectively. RNA-seq data from Patient 19 contained a few short reads in exon 51, which had clipped sequences directly connecting to g.34,314,059, and several continuous reads into intron 51, based on visualization using IGV. Two types of 3' transcript sequences were detected by 3'RACE (Fig. S2b): a major product containing intron 51 sequence, which stopped at an intronic polyadenylation signal, and a minor product, including the g.34,295,924–34,296,069 sequence after exon 51. Genome sequencing and long-read sequencing detected a 2.4 Mb inversion of the short arm of chromosome X (p21.2–p21.1), which split the DMD gene (Fig. 2l and m). Reanalysis of RNA-seq data using personalized genomic sequence led to identification of a novel transcript (Fig. 2n).
In Patient 20, two kinds of short reads with sequence from g.31,747,746 connected to exon 52 and terminating in intron 52, similar to those detected in Patient 17, were detected by RNA-seq based on IGV analysis of the hg19 reference sequence. Genomic analysis to detect break points revealed a pericentric inversion of chromosome X (Fig. 2o). RNA-seq reanalysis based on the patient genome reference sequence detected multiple PE sequences after exon 52 (Fig. 2p), while 3'RACE detected an intron 52 sequence, which terminated at an intronic polyadenylation signal, at the end of DMD transcripts (Fig. S2c).
Analysis of mechanisms regulating abnormal splicing usingin silico prediction tools
We detected intronic SNVs related to PE in 11 cases (Fig. 1; Patients 1–9, 11, 12); these variants produced splice sites. We analyzed the spliceogenicity of sequences surrounding and including splice site variants using SpliceAI. Activation of each PE was predicted with high SpliceAI scores, indicating that sequences surrounding the variants had the necessary elements for exon generation, such as branch points (Table 1).
Patient 12 had an intronic variant 21 bases upstream of the PE (Fig. 3a). RNABP predicted that this variant caused a new alternative branch point, which was stronger than that in the reference sequence (Fig. 3b). Further, in Patient 13, we checked whether branch point selectivity was associated with skipping of exon 7 (Fig. 3c) using RNABP, which suggested that this variant created an alternative weaker branch point, while suppressing the authentic branch point (Fig. 3d).
In Patient 10, who had an SNV 13 bp upstream of the authentic 3' splice site of exon 56 (Fig. 3e), we tested the possibility that the variant caused exon extension by influencing the strength of the splice site recognition site using MaxEntScan. The 3' splice site scores calculated were 10.48 and 0.57 at the authentic splice site and the novel splice site, respectively, for the reference sequence, and 5.05 and 8.94 for the variant sequence, consistent with the RNA-seq results (Fig. 3e). SpliceAI also predicted Acceptor Loss (0.97) for the authentic splice-acceptor (–13 bp) and Acceptor Gain (1.00) for the cryptic splice-acceptor (–2 bp).
In Patient 15, we assessed the splice site scores of the PE inside the 3.7 kb inversion-sequence. The 3' and 5' splice site scores calculated by MaxEntScan were 9.30 and 6.32 (Fig. 3f), indicating that they could act as strong 3' and 5' splice sites, respectively.
Interestingly, a g.31,786,268A>G variant was detected alongside an intronic polyadenylation site in intron 51 of Patient 19, and predicted to be a recursive splice site using MaxEntScan (MaxEnt score, reference sequence: 6.40 for 3' splice site, 1.14 for 5' splice site; variant sequence: 4.46 for 3' splice site, –6.62 for 5' splice site) (Fig. S3a and b). Therefore, we evaluated RNA extracted from normal muscle by RT-PCR and detected a splicing intermediate, which contained the reconstituted 5' splice site, suggesting that this site is a recursive splicing site (Fig. S3c). Thus, the intronic polyadenylation in Patient 19 intron 51 is likely to be activated due to suppression of recursive splicing by a variant in the site.