Some patients with genetic disorders have structural abnormalities, of which CNVs are the most common type. NGS can detect both SNVs and CNVs, and many CNV detection algorithms based on different principles have been developed. One popular method compares the depths of coverage between cases and controls. The performance of those tools has been evaluated in several benchmark studies and was found to depend strongly on the dataset12,15,23,24. Roca et al. reported that ExomeDepth, DECoN, and ExomeCNV had high sensitivity23. Moreno-Cabrera et al. conducted a benchmark evaluation of five CNV detection tools (DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth, and CODEX2) using targeted panel NGS data and found that DECoN and panelcn.MOPS had high sensitivity, and ExomeDepth had the most balanced performance15. Zhao et al. conducted a performance evaluation of four CNV tools (CoNIFER, cn.MOPS, CNVkit, and exomeCopy) with whole exome sequencing data and found that performance differed according to targeted CNV size and type24. Of note, CNV algorithms based on the depth of coverage method commonly have problems with false-positive calls, which mainly arise in regions with high GC content or poor mappability. In addition, accurate detection of small CNVs remains challenging because most target regions of whole exome sequencing and targeted NGS data are small and noncontiguous12,23.
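The depth-of-coverage comparison between cases and controls described above can be illustrated with a minimal sketch. It is not the algorithm of ExomeDepth or of any other tool named here; the function name `log2_ratios`, the scaling by total depth, and the use of the control median as the reference are all assumptions made for illustration.

```python
import math
from statistics import median


def log2_ratios(case_depths, control_depths):
    """Per-target log2 ratio of a case against the median of the controls.

    Hypothetical helper: every sample is scaled by its own total depth so
    that samples sequenced to different depths become comparable.
    """
    def scale(depths):
        total = sum(depths)
        return [d / total for d in depths]

    case = scale(case_depths)
    controls = [scale(c) for c in control_depths]

    ratios = []
    for i, value in enumerate(case):
        reference = median(ctrl[i] for ctrl in controls)
        if value > 0 and reference > 0:
            ratios.append(math.log2(value / reference))
        else:
            # Zero depth gives an undefined log ratio.
            ratios.append(float("nan"))
    return ratios


# Toy data: four targets; the case has half the depth on the second target.
controls = [[100, 100, 100, 100], [200, 200, 200, 200]]
case = [100, 50, 100, 100]
ratios = log2_ratios(case, controls)
```

In this toy example the second target sits one log2 unit below its neighbours, which is the pattern expected for a heterozygous deletion. Real data would also need correction for GC content and mappability, the sources of false-positive calls noted above.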
To improve the performance of CNV detection, some CNV visualization tools have been developed (Table 1). Users can recognize true CNVs more intuitively by visually inspecting the depth of coverage in the regions of interest. Most CNV visualization tools use a window of a specific length to reduce variability in read depths, and they generally visualize CNVs at the chromosome or gene level25–32. In contrast, CABANA visualizes CNVs with high resolution based on normalized single-base-level read depth. To the best of our knowledge, only one previous tool visualizes CNVs using normalized read depths per base14. With that higher resolution, users can efficiently discriminate true CNVs, both small and large, from false-positive calls. Unlike other CNV visualization tools, CABANA produces uniform, steady lines plotted from the normalized read depths (NRDs); these stable baselines are an important factor in filtering out false-positive calls and greatly increase specificity.
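The difference between windowed and single-base read depths can be made concrete with a small sketch. The normalization below (per-base depth divided by the sample mean, then by the control median at the same base) is an assumption chosen for illustration and is not taken from the CABANA implementation; `normalized_read_depth` and `window_means` are hypothetical names.

```python
from statistics import mean, median


def normalized_read_depth(sample, controls):
    """Per-base depth scaled by the sample mean, then by the control median."""
    def scale(depths):
        m = mean(depths)
        return [d / m for d in depths]

    scaled_sample = scale(sample)
    scaled_controls = [scale(c) for c in controls]

    nrd = []
    for i, value in enumerate(scaled_sample):
        reference = median(c[i] for c in scaled_controls)
        nrd.append(value / reference if reference > 0 else float("nan"))
    return nrd


def window_means(values, size):
    """Average consecutive, non-overlapping windows of `size` bases."""
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]


# Toy data: 20 bases with a 4-base half-depth segment spanning bases 8-11.
controls = [[100] * 20, [120] * 20, [80] * 20]
sample = [100] * 8 + [50] * 4 + [100] * 8

nrd = normalized_read_depth(sample, controls)
windowed = window_means(nrd, 10)
```

Here the per-base values drop to half of the flanking level across the four affected bases, whereas both 10-base windows average to the baseline because the small event straddles the window boundary. This is the kind of small or partial-exon event that a windowed view can dilute.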
Table 1
Copy number variation detection tools with a visualization function. CNV, copy number variation; NGS, next-generation sequencing; WGS, whole genome sequencing.
| Name | Window size | Visualization level | Features | Reference |
| --- | --- | --- | --- | --- |
| ReadDepth | > 500 bp | Chromosome | Improves resolution in low-coverage data | Miller CA et al. (2011) 25 |
| CNView | 100 bp–1 kb | Chromosome | Preliminary CNV screening tool for large WGS datasets | RL Collins et al. (2016) 26 |
| iCopyDAV | User-defined (default 100 bp) | Chromosome; user-defined region | Integrated platform for CNV detection; functional annotations for CNVs | Dharanipragada P et al. (2018) 27 |
| (untitled) | Amplicon-dependent | Gene | CNV visualization method for amplicon-based sequencing data | SY Nishio et al. (2018) 28 |
| CNVkit | Region-dependent | Chromosome; gene | Combination of read depths from both the on- and off-target regions; visualization of segmented copy ratios generated from the algorithm | Talevich E et al. (2016) 29 |
| VisCap | Exon-dependent | Chromosome; exon (low resolution) | CNV visualization for quality control and manual inspection; a visual scoring system for filtration of false-positive calls | TJ Pugh et al. (2016) 30 |
| CNspector | User-defined | Chromosome; exon (low resolution) | Multi-scale CNV visualization with clinically contextual data; web-based tool | Markham JF et al. (2019) 31 |
| DeviCNV | Probe-dependent | Chromosome; exon | Detection tool for exon-level CNVs in targeted NGS data; visualization of CNV candidates with statistical information | Y Kang et al. (2018) 32 |
| (untitled) | Not required | Gene; exon (low resolution) | CNV visualization using the normalized reads per nucleotide; potential to replace CNV confirmation tests | Kerkhof J et al. (2017) 14 |
| CABANA | Not required | Exon (high resolution) | Visualization of CNVs screened by a conventional bioinformatics tool at the single-base level; an efficient method for detecting exon-level CNVs; potential to replace CNV confirmation tests | Present study |
In support of that specificity, all 31 pathogenic CNVs judged to be true by CABANA were confirmed by MLPA. CABANA visualization can therefore decrease the need for additional confirmatory testing, which would increase the cost-effectiveness of NGS and reduce the burden on laboratories. In addition, CABANA detected small deletions and partial exon deletions that were not identified by the conventional CNV algorithm. Because visual inspection with CABANA is very intuitive, even inexperienced users can easily identify true CNVs. Nonetheless, we recommend that confirmation tests be applied in specific instances, such as a single exon deletion.
CNVs in TTN, NEB, and TNXB were frequently called by ExomeDepth, but none of them showed a true-positive pattern in CABANA. The presence of tandem repeat regions in TTN and NEB and a highly homologous pseudogene in TNXB might have influenced the performance of CABANA33–35. Similar to other bioinformatic tools that use the read-depth approach, CABANA seems to have difficulty in determining true CNVs in regions with high GC content, where highly variable NRDs tend to appear7,13,36. Although the CNVs in TTN, NEB, and TNXB called by ExomeDepth were not confirmed by MLPA, the similar patterns recurrently observed in specific regions of those genes in normal healthy controls suggest a very low likelihood that they are true pathogenic CNVs.
In this study, we found that about 36% of patients with NMD harbored molecular abnormalities on the targeted NGS panels. Previous studies reported that clinically significant variants were detected in 20–49% of NMD patients, but that diagnostic yield varied with the NGS panel and cohort group tested9,37,38. A large-scale study on the diagnosis of NMD using multigene panels showed that pathogenic CNVs were identified in 7.6% of NMD patients, with the majority being in SMN1, PMP22, DMD, and SPAST9. Using our bioinformatics pipelines with CABANA, we found pathogenic CNVs in 7% of patients with NMD; in concordance with the previous large-scale study, most of them were in DMD, PMP22, SPAST, and SMN1.
The most commonly mutated gene in our cohort was DMD, a causative gene for Duchenne muscular dystrophy and a major cause of inherited muscular disorders in Korea39. Of 39 pathogenic variants in DMD, 16 (41%) were CNVs. Considering that approximately 70% of Duchenne/Becker muscular dystrophy patients with molecular defects had pathogenic variants of DMD in the form of CNVs, the proportion of CNVs detected by CABANA seems to be low40,41. However, there might have been selection bias in our patient cohort because some patients had proven to be negative for CNVs by MLPA or quantitative PCR before NGS testing. SPAST, a major causative gene for autosomal dominant spastic paraplegia42, was the third most commonly mutated gene in our patients with NMD, and about 29% of its pathogenic variants were CNVs. Previous studies on hereditary spastic paraplegia reported that the proportion of pathogenic CNVs was 2.5–37.5% depending on the characteristics of each cohort43,44. PMP22 is related to CMT type 1A and hereditary neuropathy with pressure palsies, with most patients having duplication and deletion CNVs, respectively45. Consistent with that, all the pathogenic variants found in PMP22 in this study were CNVs. Collectively, the mutation spectrum and proportion of CNVs in these disorders found using CABANA were concordant with the literature. In most patients, the phenotype was consistent with disorders related to the gene carrying the pathogenic CNV. This evidence supports our CABANA algorithm as robust and accurate.
Our study has some limitations. First, the performance of CABANA could not be thoroughly evaluated because of the limited availability of confirmatory tests and other practical considerations; in particular, the false-negative rate remains uncertain. Nonetheless, its performance was deemed acceptable on the basis of comparison with previously reported CNV data in patients with NMD and clinical correlation with our patients’ results. Second, CNV visualization with CABANA was performed only on CNVs called by ExomeDepth, which might have missed true CNVs15. Third, as described above, it can be challenging for CABANA to identify true CNVs in repeat regions, highly homologous regions, and GC-rich regions7,13,36.
In summary, we developed CABANA, a base-level visualization tool, to confirm CNVs called by other algorithms. With its high resolution, CABANA showed excellent fidelity and specificity and could help exclude false-positive calls and identify true CNVs without additional confirmation tests. In patients with NMD, CABANA effectively detected pathogenic CNVs, demonstrating its high utility with clinical samples.