The clinical signs and symptoms of the cohort are presented in Table 1. All patients had previously been diagnosed by measurement of alpha-galactosidase A enzyme activity and by Sanger sequencing. The samples were tested blind except for sex: apart from the fact that all patients were male, neither the genetic variant nor any relationship among the samples was shared with our lab.
Table 1
Clinical presentation of the cohort of Fabry patients. Acrop = acroparesthesias; Hypo = hypohidrosis; Angio = angiokeratoma; Cornea = cornea verticillata; Nephro = nephropathy; Cardio = cardiomyopathy; Stroke = history of stroke.
Sample | Therapy | Acrop | Hypo | Angio | Cornea | Nephro | Cardio | Stroke | Phenotype | Note |
1 | Agalsidase α | + | | | | | | | classic | |
2 | Agalsidase β | + | + | + | + | | | | classic | |
3 | Agalsidase β | + | + | | + | + | | | classic | |
4 | Agalsidase β | + | + | + | + | | | | classic | |
5 | Agalsidase β | + | + | + | + | + | + | | classic | |
6 | No therapy | | | | | | | | benign | |
7 | Agalsidase α | + | + | + | + | + | | | classic | brother of 8 |
8 | Agalsidase α | + | + | + | + | + | + | + | classic | brother of 7 |
9 | Agalsidase α | + | + | | + | + | + | | classic | |
10 | Agalsidase β | | | | | | + | | later-onset | |
11 | Agalsidase α | + | + | + | + | + | + | | classic | |
12 | Agalsidase α | + | + | + | + | + | + | | classic | |
Nanopore testing of the patients
GLA amplicon sequencing
To detect genomic variants in the GLA locus, we designed a single PCR amplicon that produces a 13 kb product spanning the entire gene plus 800 bp of upstream and 2,000 bp of downstream sequence. Sequencing the pooled PCR products of the 12 samples on one MinION flow cell yielded a median of 88,800 reads and 617 million bases per sample (Table S1). The read length distributions of the 12 samples peak around 13 kb, showing that most reads span the full amplicon (Fig. S1). Mapping the reads to the human reference genome gave a median coverage of 44,000X per sample across the GLA region.
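As a rough consistency check on these numbers, per-sample mean depth can be approximated as total bases divided by amplicon length, and the share of full-length reads as the fraction of reads close to the expected amplicon size. The Python sketch below assumes that every sequenced base maps to the ~13,000 bp amplicon and uses a hypothetical ±10% length tolerance; neither assumption comes from the study's pipeline, and the function names are illustrative.

```python
def mean_coverage(total_bases: int, amplicon_bp: int) -> float:
    """Rough mean depth, assuming every sequenced base falls on the amplicon."""
    return total_bases / amplicon_bp


def full_length_fraction(read_lengths, expected_bp, tolerance=0.1):
    """Fraction of reads whose length is within +/- tolerance of expected_bp.

    The 10% default tolerance is a hypothetical choice for illustration.
    """
    if not read_lengths:
        return 0.0
    lo, hi = expected_bp * (1 - tolerance), expected_bp * (1 + tolerance)
    n_full = sum(1 for length in read_lengths if lo <= length <= hi)
    return n_full / len(read_lengths)


# Median per-sample output reported in the text: ~617 million bases.
# 13,000 bp stands in for the approximate "13 kb" amplicon length.
print(round(mean_coverage(617_000_000, 13_000)))  # -> 47462
```

With the reported median of 617 million bases, this crude estimate gives roughly 47,000X, in the same range as the reported 44,000X median mapped coverage.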
SNV and indel detection
We then used nanopolish to call single nucleotide variants (SNVs) and short indels, and filtered the calls by quality score to remove false positives. The filtered variants were first searched in the ClinVar database for any existing classification. Variants with no match in known databases were manually classified for their predicted effect on the protein using several prediction tools (see Methods). Variants with a likely pathogenic effect on the protein are summarized in Table 2. In 10 of the 12 samples we identified a pathogenic or likely pathogenic SNV in an exon or canonical splice site (6 samples) or a short indel (4 samples). In another sample, we detected an intronic SNV (IVS53 + 405T > G) at a possible branch site that is likely to affect splicing. This variant had previously been detected by Sanger sequencing of multiple GLA intronic amplicons; with nanopore sequencing, the same variant was detected with a single PCR assay.
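The quality filtering step can be sketched as a pass over the VCF records produced by the variant caller, keeping those whose QUAL column meets a threshold. This is a minimal illustration assuming standard tab-separated VCF input; `min_qual` is a hypothetical parameter, not the cutoff used in this study, and records with a missing QUAL (`.`) are treated as failing the filter.

```python
def filter_vcf_by_qual(vcf_lines, min_qual):
    """Yield (chrom, pos, ref, alt, qual) for VCF records with QUAL >= min_qual.

    Header lines (starting with '#') and blank lines are skipped.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # First six mandatory VCF columns: CHROM POS ID REF ALT QUAL.
        chrom, pos, _id, ref, alt, qual = fields[:6]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality: likely false-positive call
        yield chrom, int(pos), ref, alt, float(qual)
```

Because it is a generator over lines, the function can be applied directly to an open VCF file handle.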
Table 2
Genotyping results based on ONT amplicon sequencing, and predicted consequence of each genetic variant on GLA. * = variant detected in two samples from the same family; N/A = not applicable.
ONT Barcode | GLA nucleotide variant (Accession: NM_000169) | GLA protein variant (Accession: NP_000160) | Comments |
1 | IVS53 + 405T > G | N/A-deep intronic | deep intronic variant; possible branch site; may affect splicing |
2 | c.1147_1149delTTC | p.Phe383del | pathogenic |
3 | c.744_745delTA* | p.Phe248LeufsX7 | pathogenic; same haplotype as barcode05 suggests blood relationship |
4 | c.559_560delAT | p.Met187Valfs*6 | pathogenic |
5 | c.744_745delTA* | p.Phe248LeufsX7 | pathogenic; same haplotype as barcode03 suggests blood relationship |
6 | c.352C > T | p.Arg118Cys | conflicting interpretations of pathogenicity |
7 | c.370-2A > G* | N/A- splicing | pathogenic; same haplotype as barcode08 suggests blood relationship |
8 | c.370-2A > G* | N/A- splicing | pathogenic; same haplotype as barcode07 suggests blood relationship |
9 | c.704C > A | p.Ser235Tyr | pathogenic |
10 | c.337T > C | p.Phe113Leu | pathogenic |
11 | Exon 2 deletion | N/A | pathogenic; 2,914 bp deletion removes GLA exon 2 (chrX:100658307-100661221) |
12 | c.581C > T | p.Thr194Ile | likely pathogenic |
Structural variants
Although structural variants (SVs), such as insertions and deletions longer than 50 bp, are rare compared to SNVs, many of them are pathogenic. Only 3 SVs in GLA (all pathogenic) are currently described in ClinVar, compared with 226 pathogenic SNVs and short indels. Surprisingly, in one sample of our cohort we detected a 2,914 bp deletion extending from intron 1 to intron 2, which completely removes exon 2 (Fig. S2). The read length distribution of this sample's amplicon sequencing peaks around 10 kb, the expected amplicon length of the deletion allele (Fig. S1, sample 11). As with the intronic SNV, the precise breakpoints of this deletion could be detected only by whole-gene amplicon sequencing, because both boundaries lie deep within introns.
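Removing 2,914 bp from the roughly 13 kb amplicon leaves a product of about 10.1 kb, matching the read length peak. At the alignment level, a deletion of this size can be spotted by scanning CIGAR strings for long D operations. The sketch below is an illustration, not the SV detection method used in this study: it assumes the aligner reports the deletion as a single D operation within one alignment (long deletions may instead be split into supplementary alignments), uses the 50 bp SV size cutoff from the text as its default, and the function name `long_deletions` is hypothetical.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def long_deletions(pos, cigar, min_len=50):
    """Return (start, end, length) for each D operation of at least min_len bp.

    pos is the 1-based leftmost reference position of the alignment (SAM POS);
    start and end are the first and last deleted reference bases, 1-based and
    inclusive.
    """
    ref_pos = pos
    found = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_len:
            found.append((ref_pos, ref_pos + length - 1, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return found
```

For example, `long_deletions(1000, "20S100M2914D50M")` returns `[(1100, 4013, 2914)]`: the soft clip does not consume reference, so the deletion starts 100 bases after POS.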
Variant phasing
One advantage of using ONT in this workflow is that more than 50% of the reads span the full-length gene, so variants on the same read originate from a single molecule and their haplotype phase can be determined directly. Although the analysis was performed blind to the patients' background information, we detected haplotypes shared between two samples (samples 3 and 5, and samples 7 and 8) and predicted that each pair came from blood relatives. When the blinded file was decoded at the end of the analysis, both pairs were confirmed to be siblings. Moreover, all 12 variants were confirmed by prior clinical Sanger sequencing.
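Because all patients were male and GLA is X-linked, each sample carries a single haplotype across the amplicon, so comparing the complete set of called variants (not only the pathogenic one) between samples is a simple way to flag shared haplotypes. The sketch below assumes variants are represented as (position, ref, alt) tuples and flags pairs whose sets are identical (Jaccard similarity of 1.0); the function names and threshold are hypothetical. A shared haplotype points to common ancestry but does not by itself establish how closely two patients are related.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two variant sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def shared_haplotypes(sample_variants, min_similarity=1.0):
    """Return sample pairs whose variant sets have Jaccard >= min_similarity.

    sample_variants maps a sample ID to a set of (position, ref, alt) tuples
    called across the whole amplicon.
    """
    pairs = []
    for (s1, v1), (s2, v2) in combinations(sorted(sample_variants.items()), 2):
        if jaccard(v1, v2) >= min_similarity:
            pairs.append((s1, s2))
    return pairs
```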
Downsampling to design more highly multiplexed sequencing
In this study we multiplexed 12 samples on one MinION flow cell, yielding an average coverage of 45,000X, far more than is needed for high-quality variant calling with ONT reads. To estimate how many samples could be multiplexed in future runs, we randomly downsampled each sample's reads to several read counts, repeated the analysis workflow on the downsampled reads, and measured the detection rate of the variants found in the full dataset. With as few as 500 reads per sample, all variants were detected in all 12 samples (Fig. 1). For several samples, only 30 reads were sufficient for 100% true-positive detection, whereas for others, 250 reads or fewer gave only partial detection, and the number of false-positive variants passing the quality filter increased. For future GLA genotyping with this pipeline, we estimate that 1,000 reads per sample will detect all true variants and efficiently discriminate them from low-quality false positives. For example, sequencing 96 multiplexed samples on a single ONT Flongle flow cell is estimated to achieve this coverage while substantially reducing costs.
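The downsampling experiment and the multiplexing estimate can be sketched in two parts: random subsampling of each sample's reads followed by scoring the re-called variants against the full-dataset calls, and a back-of-the-envelope count of the sequencing output needed for a given multiplex. The variant-calling step itself (nanopolish in this workflow) is not reproduced; the function names are hypothetical, and `bases_needed` assumes every read is a full ~13 kb amplicon, ignoring losses from barcode demultiplexing and read filtering.

```python
import random


def downsample(reads, n, seed=0):
    """Randomly pick n reads without replacement (all reads if fewer than n)."""
    if n >= len(reads):
        return list(reads)
    return random.Random(seed).sample(reads, n)


def score_calls(called, truth):
    """Return (recall, number of false positives) of called vs. truth variants.

    truth is the set of variants found with the full dataset.
    """
    tp = len(called & truth)
    recall = tp / len(truth) if truth else 1.0
    return recall, len(called - truth)


def bases_needed(n_samples, reads_per_sample, amplicon_bp):
    """Total output required to give every sample its target read count."""
    return n_samples * reads_per_sample * amplicon_bp


print(bases_needed(96, 1000, 13_000))  # -> 1248000000
```

At 1,000 reads per sample, 96 samples would therefore need roughly 1.25 Gb of amplicon reads; whether a given Flongle run delivers that depends on its actual yield.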
Deep intronic and large copy number variant detection using long amplicon sequencing
Two patients in the cohort were of special note: one with the deep intronic variant IVS53 + 405T > G and the other with the deletion of exon 2.
The patient with the IVS53 + 405T > G variant had suffered from acroparesthesias, typical Fabry pain crises and abdominal cramping since childhood. Fabry disease was later suspected, and his enzyme activity was found to be 8% of normal. His pathogenic variant remained unidentified for 10 years, until Sanger sequencing was performed on multiple amplicons spanning the entire non-coding sequence of GLA. The patient with the exon 2 deletion, his brother and his mother were all clinically diagnosed with Fabry disease. The patient had zero enzyme activity, but conventional Sanger sequencing did not reveal the variant; as mentioned above, the deletion was found only several years later by extracting mRNA and sequencing the cDNA. In contrast, both variants were readily identified with the ONT long amplicon sequencing method.
All but two of the variants in Table 2 were classified as pathogenic by a combination of variant effect prediction tools (CADD, REVEL, SIFT, MutationTaster, PolyPhen-2), and 4 of them were already classified as pathogenic in ClinVar. One patient carried the variant R118C, which has conflicting interpretations of pathogenicity [29,30]; consistent with this, he presented none of the signs and symptoms shown in Table 1. The second, T194I, was classified as likely pathogenic; this patient presented all the signs and symptoms described in Table 1 except stroke.