To investigate the poor performance of NanoVar reported by Jiang et al., we regenerated the long-read simulation datasets and benchmarked NanoVar in accordance with the methods stated in Jiang et al. (https://github.com/SQLiu-youyou/The-commands-of-the-evaluation). For comparison, we also included Sniffles⁶ (described as one of the best-performing tools) in the benchmarking. Because Truvari⁷ (v3.0.1) raised errors during the benchmark analysis, some SVs from each tool had to be filtered out (a step not mentioned in Jiang et al.); we discuss this later. Despite employing the identical simulated datasets, our benchmarking yielded better NanoVar performance scores than those of Jiang et al. (Fig. 1). As Jiang et al. did not provide the exact benchmark scores of NanoVar (Table S2 in Jiang et al.), we can only compare our results to the bar graphs in Fig. 2 of their publication. Jiang et al. showed that NanoVar acquired F1 scores of less than 0.1 at all sequencing coverages (3X, 5X, 10X, and 20X), at least threefold lower than the F1 scores of 0.38, 0.45, 0.46, and 0.45 that we observed at the respective coverages (Fig. 1a). The disparity in F1 scores is most likely explained by differences in SV recall: we observed recall rates of 0.28, 0.39, 0.45, and 0.44 at the respective sequencing coverages, compared to less than 0.05 at all coverages in Jiang et al. (Fig. 1b). NanoVar’s F1 scores for SV calling with genotype concordance were also higher in our results (Fig. 1a). Collectively, our repeated benchmark analysis using the same simulated datasets suggests that Jiang et al. may have underestimated NanoVar's performance.
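The link between recall and F1 can be sketched as follows. This is a minimal illustration that assumes the standard definition of F1 as the harmonic mean of precision and recall; the function names are hypothetical and are not part of Truvari or of either benchmark. It shows that a recall below 0.05 caps F1 below 0.1 regardless of precision, which is consistent with low recall accounting for the low F1 scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_for_recall(recall: float) -> float:
    """Upper bound on F1 at a given recall, reached when precision is 1.0."""
    return f1_score(1.0, recall)


if __name__ == "__main__":
    # Illustrative recall values only: the 0.05 bound described for
    # Jiang et al. and the recall rates quoted in the text above.
    for recall in (0.05, 0.28, 0.39, 0.45, 0.44):
        print(f"recall={recall:.2f}  max F1={max_f1_for_recall(recall):.3f}")
```

Under this definition, the F1 scores reported above (0.38 to 0.46) all lie below the corresponding upper bounds, as expected.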
We also observed different performance results for Sniffles. While our F1 scores for Sniffles SV calling by presence are broadly in agreement with Jiang et al., our F1 scores for SV calling by genotype concordance were substantially lower (Fig. 1a). F1 scores for Sniffles (Genotype) ranged from 0.17 to 0.24 in our analysis, whereas Jiang et al. reported F1 scores ranging from 0.50 to 0.70 for sequencing coverages of 3X to 20X. Consequently, NanoVar (Genotype) outperformed Sniffles (Genotype) by more than 0.13 in F1 score at every sequencing coverage (Fig. 1a). Sniffles' reduced performance in SV genotyping is also consistent with the benchmarks of Dierckxsens et al.⁴, where Sniffles’ genotype scores were at least 30% lower than those of other SV callers. These results suggest that Jiang et al. may have overestimated Sniffles' SV genotyping capability.
During our analysis, we modified certain output files within the protocol described by Jiang et al., but only to rectify errors that we encountered. As these errors were not mentioned by Jiang et al., they were unanticipated while following their protocol, and we are uncertain how Jiang et al. resolved them to complete their analysis. The first error occurred during the long-read simulation step using VISOR⁸ (v1.1), where we obtained an empty BAM output file. After consulting the author of VISOR, we discovered that the problem lay in the “SHORtS.LASeR.bed” file provided by Jiang et al., in which the start coordinates of the genomic regions were “0” instead of “1” (see https://github.com/davidebolo1993/VISOR/issues/18). The problem was resolved after we corrected the start coordinates in the file. The second error occurred when NanoVar was run on the simulated long-read BAM file produced by VISOR. The read names of the simulated reads contained commas (,), which caused a parsing error and prevented NanoVar from completing. After we removed the commas from the read names, NanoVar completed its run without errors. As this correction was necessary to obtain any results from NanoVar, it is unclear how Jiang et al. handled it and whether it influenced their results. The third error arose from incompatibilities between Truvari and the VCF files of NanoVar and Sniffles. For NanoVar, an error was raised by the presence of “>” or “.” symbols in the “SVLEN” field of some VCF entries. NanoVar adds these symbols to qualify the SV length or to nullify it for SVs with no length, respectively. When these symbols were omitted from the VCF file, Truvari ran successfully. For Sniffles, the error was caused by the “STRANDBIAS” string in the “FILTER” column of a few entries (< 50), and eliminating these entries resolved the problem. Given these VCF incompatibilities, it is plausible that further nuances in the VCFs of NanoVar and Sniffles impede an accurate assessment by Truvari. Taken together, we are uncertain how these fundamental errors were addressed by Jiang et al. and whether they affected the results.
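The file corrections described above can be sketched in plain Python. This is an illustrative reconstruction under stated assumptions, not the scripts actually used: all function names are hypothetical, the read names are edited on SAM text lines (for example after converting the BAM to SAM), a “.” SVLEN value is handled by dropping the SVLEN key, and a leading “>” is stripped from the value. The text does not specify these details.

```python
def fix_bed_start(line: str) -> str:
    """Change a start coordinate of "0" to "1" in one tab-separated BED line."""
    body = line.rstrip("\n")
    ending = line[len(body):]
    fields = body.split("\t")
    if len(fields) >= 3 and fields[1] == "0":
        fields[1] = "1"
    return "\t".join(fields) + ending


def strip_commas_from_qname(sam_line: str) -> str:
    """Remove commas from the read name (first column) of a SAM record."""
    if sam_line.startswith("@"):  # header lines are left untouched
        return sam_line
    qname, sep, rest = sam_line.partition("\t")
    return qname.replace(",", "") + sep + rest


def sanitize_svlen(info: str) -> str:
    """Strip a leading ">" from SVLEN; drop SVLEN if its value is "." (assumption)."""
    kept = []
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "SVLEN":
            value = value.lstrip(">")
            if value in ("", "."):
                continue
            item = f"SVLEN={value}"
        kept.append(item)
    return ";".join(kept) if kept else "."


def clean_vcf_lines(lines, drop_filter="STRANDBIAS"):
    """Yield VCF lines with sanitized SVLEN, skipping records flagged drop_filter."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 8:
            yield line
            continue
        if drop_filter in fields[6].split(";"):  # column 7 is FILTER
            continue
        fields[7] = sanitize_svlen(fields[7])  # column 8 is INFO
        yield "\t".join(fields) + "\n"
```

Each function works on one line of text, so it can be applied while streaming a file, for example `out.writelines(clean_vcf_lines(open(path)))` with a hypothetical `path`. Dropping or altering records in this way changes the call set that Truvari evaluates, which is why the handling of these cases matters for the reported scores.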