Deep Learning Applied to the SARS-CoV-2 Classification

doi:10.21203/rs.3.rs-3290221/v1


Research Article

This work is licensed under a CC BY 4.0 License

You are reading this latest preprint version

Purpose: The primary objective of this study was to develop and evaluate a deep neural network model based on convolutional neural networks (CNNs) for accurately classifying SARS-CoV-2 viral sequences and other subtypes within the Coronaviridae family. With the rapid evolution of viral genomes and the increasing need for timely classification, we aimed to provide a robust and efficient tool that could enhance the accuracy of viral identification and classification processes. By harnessing the power of deep learning, we sought to contribute to advancing viral genomics research and aid in surveilling emerging viral strains.

Methods: We designed and implemented a CNN-based deep neural network architecture capable of processing complete cDNA genomic sequences. For training and testing, we used a dataset comprising diverse viral subtypes, including SARS-CoV-2. The dataset was partitioned using a 5-fold cross-validation strategy to ensure rigorous evaluation. Our model’s performance was assessed using accuracy, precision, sensitivity, specificity, F1-score, and AUROC. Additionally, artificial mutation tests were conducted to evaluate the model’s ability to generalize across sequence variations. For comparison, we also classified the sequences with the BLAST algorithm and analyzed the processing times of both approaches.
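As a rough illustration of the data handling described in the Methods, the sketch below shows one way to one-hot encode a nucleotide sequence, introduce artificial point mutations, and build a 5-fold split. The abstract does not specify the encoding, the mutation procedure, or the splitting code, so the function names (`one_hot`, `mutate`, `k_fold_indices`), the four-channel A/C/G/T encoding, and the substitution-only mutation model are all assumptions made for illustration, not the authors' implementation.

```python
import random

BASES = "ACGT"  # assumed channel order; not stated in the abstract


def one_hot(seq):
    """Encode a nucleotide string as rows of 4 values (A, C, G, T).

    Symbols outside ACGT (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def mutate(seq, n_mutations, rng):
    """Return a copy of seq with n_mutations substitutions at distinct positions.

    Hypothetical mutation model: each chosen position is replaced by a
    different base drawn uniformly from the remaining ones.
    """
    chars = list(seq.upper())
    for pos in rng.sample(range(len(chars)), n_mutations):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices, split them into k folds, yield (train, test)."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
reference = "ACGT" * 10         # toy sequence, not real viral data
mutated = mutate(reference, 5, rng)
encoded = one_hot(mutated)
splits = list(k_fold_indices(10, 5, rng))
```

In a mutation test of this kind, the mutated sequences would be encoded and passed to the trained classifier, and sensitivity would be tracked as the number of substitutions grows.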

Results: The developed CNN-based model demonstrated exceptional performance across various evaluation metrics. In the training phase, the model consistently achieved maximum values for accuracy, sensitivity, specificity, and other key metrics, indicating its robust learning ability. Notably, during testing on over 10,000 viral sequences, the model exhibited a sensitivity of over 99% for sequences with fewer than 2,000 mutations. The CNN-based model showcased superior accuracy and significantly reduced processing times compared to the BLAST algorithm. These findings underscore the model’s effectiveness in accurately classifying viral sequences and its potential to revolutionize viral genomics research.
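For reference, the threshold-based metrics named in this abstract have standard confusion-matrix definitions, sketched below for a single class treated one-vs-rest. The helper name `binary_metrics` and the counts in the example call are hypothetical and are not results from the study. AUROC is omitted because it needs ranked scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts for one class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Toy counts chosen only to exercise the formulas.
metrics = binary_metrics(tp=8, fp=2, tn=9, fn=1)
```

For a multi-class problem such as Coronaviridae subtype classification, these values would typically be computed per class and then averaged. Whether the authors did so is not stated here.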

Conclusion: This study introduces a CNN-based deep neural network model as a powerful tool for precisely classifying viral sequences, specifically focusing on SARS-CoV-2 and other Coronaviridae family subtypes. Our model’s superiority is evident through rigorous evaluation and comparison with existing methods, offering enhanced accuracy and efficiency. The application of artificial mutation testing demonstrated the model’s robustness in handling sequence variations. By harnessing deep learning capabilities, our model significantly contributes to viral classification and genomics research. As viral surveillance becomes increasingly critical, our model holds promise in aiding rapid and accurate identification of emerging viral strains.

SARS-CoV-2

COVID-19

viral classification

deep learning

No competing interests reported.


Editorial decision: Revision requested
10 Oct, 2023
Editorial decision: Major revision
10 Oct, 2023
Reviews received at journal
15 Sep, 2023
Reviewers agreed at journal
14 Sep, 2023
Reviewers invited by journal
14 Sep, 2023
Editor invited by journal
05 Sep, 2023
Editor assigned by journal
05 Sep, 2023
Submission checks completed at journal
31 Aug, 2023
First submitted to journal
23 Aug, 2023
