Purpose: The primary objective of this study was to develop and evaluate a deep neural network model based on convolutional neural networks (CNNs) for accurately classifying SARS-CoV-2 viral sequences and other subtypes within the Coronaviridae family. Because viral genomes evolve rapidly and timely classification is increasingly needed, we aimed to provide a robust and efficient tool that improves the accuracy of viral identification and classification. By applying deep learning, we sought to advance viral genomics research and support the surveillance of emerging viral strains.
Methods: To achieve this goal, we designed and implemented a CNN-based deep neural network architecture that processes complete cDNA genomic sequences. We trained and tested the model on a dataset comprising diverse viral subtypes, including SARS-CoV-2. The dataset was partitioned using a 5-fold cross-validation strategy to ensure rigorous evaluation. Model performance was assessed with accuracy, precision, sensitivity, specificity, F1-score, and the area under the receiver operating characteristic curve (AUROC). Additionally, we conducted artificial mutation tests to evaluate the model’s ability to generalize across sequence variations. For comparison, we also classified the sequences with the BLAST algorithm and analyzed the processing times of both approaches.
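The evaluation protocol above combines a 5-fold partition with per-class metrics. The following is a minimal Python sketch of both, not the authors’ code: it assumes a plain shuffled (non-stratified) split and one-vs-rest metrics for a single target class. The function names `kfold_splits` and `one_vs_rest_metrics` and the example class labels are hypothetical. AUROC is omitted because it requires the model’s continuous output scores rather than hard predictions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def kfold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices once and split them into k disjoint test folds.

    Returns (train_indices, test_indices) pairs; every index appears in exactly
    one test fold. Assumption of this sketch: the split is not stratified by class.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        # Training set = all indices from the other k-1 folds.
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


def one_vs_rest_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                        positive: str) -> Dict[str, float]:
    """Binary metrics for one class, treating every other class as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    def safe(num: float, den: float) -> float:
        # Avoid division by zero when a class is absent from a fold.
        return num / den if den else 0.0

    precision = safe(tp, tp + fp)
    sensitivity = safe(tp, tp + fn)   # recall, true-positive rate
    specificity = safe(tn, tn + fp)   # true-negative rate
    return {
        "accuracy": safe(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Illustrative labels only; not the study's data.
    y_true = ["SARS-CoV-2", "SARS-CoV-2", "MERS-CoV", "HCoV-229E"]
    y_pred = ["SARS-CoV-2", "MERS-CoV", "MERS-CoV", "HCoV-229E"]
    print(one_vs_rest_metrics(y_true, y_pred, positive="SARS-CoV-2"))
```

For the toy labels in the `__main__` block, SARS-CoV-2 has one true positive and one false negative, so sensitivity is 0.5 while precision and specificity are both 1.0. In a real 5-fold run, these metrics would be computed on each test fold and then averaged.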
Results: The CNN-based model performed strongly across all evaluation metrics. In the training phase, it consistently reached the maximum attainable values for accuracy, sensitivity, specificity, and the other key metrics, indicating robust learning. In testing on over 10,000 viral sequences, the model maintained a sensitivity above 99% for sequences with fewer than 2,000 mutations. Compared with the BLAST algorithm, the CNN-based model achieved higher accuracy and significantly reduced processing times. These findings demonstrate the model’s effectiveness in classifying viral sequences and its potential to advance viral genomics research.
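The mutation-count results refer to artificially mutated test sequences. The abstract does not specify how mutations were introduced, so the following Python sketch assumes uniformly random point substitutions at distinct positions (no insertions or deletions). The helper names `mutate` and `sensitivity_under_mutation` are hypothetical, and `classify` stands in for any trained classifier that maps a sequence to a label.

```python
import random
from typing import Callable, Sequence

BASES = "ACGT"


def mutate(seq: str, n_mutations: int, seed: int = 0) -> str:
    """Return a copy of `seq` with point substitutions at n_mutations distinct positions.

    Each chosen position receives a base different from the original, so the
    Hamming distance to the (upper-cased) input equals n_mutations exactly.
    Assumption: substitutions only; the study's actual scheme may differ.
    """
    if not 0 <= n_mutations <= len(seq):
        raise ValueError("n_mutations must be between 0 and len(seq)")
    rng = random.Random(seed)
    chars = list(seq.upper())
    for pos in rng.sample(range(len(chars)), n_mutations):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


def sensitivity_under_mutation(seqs: Sequence[str], true_label: str,
                               classify: Callable[[str], str],
                               n_mutations: int, seed: int = 0) -> float:
    """Fraction of mutated sequences still assigned their true label."""
    hits = sum(
        classify(mutate(s, n_mutations, seed + i)) == true_label
        for i, s in enumerate(seqs)
    )
    return hits / len(seqs)
```

Sweeping `n_mutations` (for example 0, 500, 1,000, 2,000) and plotting the returned sensitivity would give a robustness curve of the kind the results summarize. Using a distinct seed per sequence keeps the mutations independent across sequences while keeping each run reproducible.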
Conclusion: This study introduces a CNN-based deep neural network model as a powerful tool for precisely classifying viral sequences, with a focus on SARS-CoV-2 and other Coronaviridae subtypes. Rigorous evaluation and comparison with existing methods show that the model offers higher accuracy and efficiency. Artificial mutation testing demonstrated the model’s robustness to sequence variation. By leveraging deep learning, the model contributes to viral classification and genomics research. As viral surveillance becomes increasingly critical, it holds promise for the rapid and accurate identification of emerging viral strains.