Background: This article aims to provide an updated landscape of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic mutations that have emerged since the virus was first identified and sequenced.
Methods: We downloaded and analyzed all mutations within the SARS-CoV-2 RNA genome submitted up to February 8, 2022, to the website of the National Center for Biotechnology Information (NCBI), which lists all variants found in Sequence Read Archive (SRA) records relative to the prototype SARS-CoV-2 reference sequence NC_045512.2.
Results: Our search identified 26,005 different mutations. The largest number of mutations was located within the gene encoding the Nsp3 protein (20.7%), followed by the gene encoding the spike protein (14.6%). Overall, 17,948/26,005 (69.0%) of these mutations affected single nucleotide positions, thus spanning ~62% of the entire SARS-CoV-2 genome. Of all mutations, 61.5% were non-synonymous, whilst 17.4% of those in the gene encoding the spike protein involved the sequence of the receptor binding domain, 59.2% of which were non-synonymous. When the number of mutations was expressed as a ratio to gene size, the highest ratio was found in the sequence encoding ORF7a (ratio, 2.25), followed by ORF7b (ratio, 1.85), ORF8 (ratio, 1.60) and ORF3a (ratio, 1.48). The gene encoding the RNA-dependent RNA polymerase accounted for only 0.1% of all mutations, with a very low ratio to gene size (ratio, 0.01).
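The mutation-to-gene-size ratio reported in the Results can be illustrated with a minimal Python sketch. It assumes the ratio is the number of distinct mutations in a gene divided by the gene length in nucleotides, which the abstract does not state explicitly. The RdRp (nsp12) length of 2,796 nucleotides is taken from the NC_045512.2 annotation and is likewise an assumption, and the RdRp mutation count is back-calculated from the rounded 0.1% figure, so it is only approximate. The function and variable names are hypothetical.

```python
# Figures quoted in the Results.
TOTAL_MUTATIONS = 26_005
SINGLE_POSITION_MUTATIONS = 17_948

# Assumption: nsp12 (RdRp) length in nucleotides on NC_045512.2;
# this value is not given in the abstract.
RDRP_LENGTH_NT = 2_796


def mutation_to_size_ratio(n_mutations: int, gene_length_nt: int) -> float:
    """Return mutations per nucleotide for one gene (assumed definition)."""
    if gene_length_nt <= 0:
        raise ValueError("gene length must be positive")
    return n_mutations / gene_length_nt


# Approximate RdRp count, reconstructed from the rounded share of 0.1%.
rdrp_mutations = round(0.001 * TOTAL_MUTATIONS)
rdrp_ratio = mutation_to_size_ratio(rdrp_mutations, RDRP_LENGTH_NT)

# Share of mutations affecting single nucleotide positions.
single_position_pct = 100 * SINGLE_POSITION_MUTATIONS / TOTAL_MUTATIONS

print(f"RdRp ratio: {rdrp_ratio:.2f}")
print(f"Single-position share: {single_position_pct:.1f}%")
```

Under these assumptions the sketch yields values consistent with the reported RdRp ratio of 0.01 and the 69.0% share; the same function would apply to ORF7a, ORF7b, ORF8 and ORF3a given their per-gene mutation counts, which the abstract does not list.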
Conclusions: The results of our analysis demonstrate that SARS-CoV-2 has mutated extensively since its first sequence was identified over 2 years ago.