Background: Single-molecule real-time (SMRT) sequencing data are characterized by long reads and high read depth. Compared with next-generation sequencing (NGS) data, SMRT sequencing data can reveal more structural variations (SVs) and offer greater advantages for variant calling. However, SMRT sequencing data contain high levels of sequencing error and noise, which leads to inaccurate SV calls. Most existing tools are unable to overcome these sequencing errors and detect genomic deletions accurately.
Methods and results: We propose MaxDEL, a new method for calling deletions from SMRT sequencing data. MaxDEL effectively overcomes the noise in SMRT sequencing data by integrating machine learning and deep learning techniques. First, it uses a machine learning method to calibrate the deletion regions from a variant call format (VCF) file. Second, it applies a novel feature visualization method that converts variant features into images and uses these images to call deletions accurately with a convolutional neural network (CNN). The results show that MaxDEL achieves higher accuracy and recall than existing methods when calling variants on both real and simulated data.
Conclusions: We propose a method (MaxDEL) for calling deletion variations that effectively utilizes both machine learning and deep learning methods. We tested it on different SMRT data sets and evaluated its effectiveness. The results show that machine learning and deep learning methods have great potential for calling deletion variations.