The genetic diversity of coronaviruses arises through mutation and recombination, as has been described for SARS-CoV-2 [10]. Although the RNA-dependent RNA polymerases of coronaviruses possess proof-reading capacity [11], the viruses still undergo mutation, which may lead to amino acid replacement. Such changes can affect the biology of the virus as well as the clinical manifestation of its infection. Recombination involves viral RNA merging with other RNAs, either its own RNA, the RNA of other viruses, or cellular RNA, through template switching during transcription [12]. This process leads to RNA indels. Mutations in SARS-CoV-2 prior to the emergence of variants have been reported [5]. In HIV, deletions have been shown to occur by at least three different mechanisms: (i) misalignment of the growing point; (ii) incorrect synthesis and termination in the primer-binding sequence during synthesis of the plus-strand strong-stop DNA; and (iii) incorrect synthesis and termination before the primer-binding sequence during synthesis of the plus-strand strong-stop DNA [13].
Whole-genome comparisons have been conducted previously, including for Omicron [14]. However, that work focused on phylogeny and did not cover the recently identified BA.2 lineage and GKA clade, the latter colloquially known as Deltacron. Indels and amino acid substitutions unique to specific variants were not described.
By randomly selecting variant representatives with definitive sequences across the genome, we identified unique patterns of indels and amino acid substitutions. Even with only two representatives for each variant, we identified quasispecies or, in the case of a variant, quasivariants. Viral quasispecies refers to a population structure that consists of extremely large numbers of variant genomes, termed mutant spectra, mutant swarms or mutant clouds [15]. For SARS-CoV-2, this phenomenon has been observed even in single infected individuals [16–20]. We propose the term quasivariant because many indels and amino acid substitutions occur in only one of the two representatives of a variant. We expect to find more variation if more representatives of each variant are analysed. Variant or clade annotation errors in the GISAID database are unlikely, as phylogenetic analysis (Fig. 1) shows each sequence clustering with its variant partner.
The amino acid substitutions relative to Wuhan-Hu-1 that are consistently found across all variants are ORF1AB P4715L/F and spike D618G. The D618G substitution has been covered in previous works [5, 21–28]. ORF1AB P4715L/F has also been described [29, 30]. A database-wide survey is needed to determine the frequency of these substitutions.
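The logic of identifying substitutions shared by every sequence can be illustrated with a minimal Python sketch. It assumes that the protein sequences have already been aligned to the reference and uses short toy fragments; the function names and sequences are hypothetical and are not the pipeline or data used in this study.

```python
def call_substitutions(ref_aln, var_aln):
    """Return substitutions such as 'D3G' from two aligned protein strings.

    Positions are numbered along the reference. Columns containing a gap
    ('-') are skipped, since indels would be reported separately.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    subs = set()
    ref_pos = 0
    for r, v in zip(ref_aln, var_aln):
        if r != "-":
            ref_pos += 1  # advance only on reference residues
        if r == "-" or v == "-":
            continue
        if r != v:
            subs.add(f"{r}{ref_pos}{v}")
    return subs


def consistent_substitutions(ref_aln, variants):
    """Substitutions present in every representative of every variant."""
    per_seq = [
        call_substitutions(ref_aln, seq)
        for reps in variants.values()
        for seq in reps
    ]
    return set.intersection(*per_seq) if per_seq else set()


# Toy aligned fragments (hypothetical, not real SARS-CoV-2 sequences).
reference = "MKDLPQ"
toy_variants = {
    "variant_A": ["MKGLPQ", "MKGLLQ"],
    "variant_B": ["MRGLPQ", "MKGLPQ"],
}
print(sorted(consistent_substitutions(reference, toy_variants)))
```

In this toy example only the change at reference position 3 is present in all four sequences, so it is the only substitution reported as consistent.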
The variant that harbours the most variant-specific substitutions relative to Wuhan-Hu-1 is VUM GH/490. Both representatives show five, eight, two, and one amino acid substitution(s) in ORF1AB, spike, NP, and ORF3A, respectively. This VUM is being tracked in Europe, Africa, Asia, and America; however, its genome frequency in GISAID, accessed on March 30, 2022, was lower than 0.3%.
The GKA clade does not comprise a unique variant. It harbours no unique indel or substitution relative to Wuhan-Hu-1, but it shares 34 amino acid replacements relative to Wuhan-Hu-1 with Omicron and BA.2, 13 with Delta, and one in spike with Delta, Omicron, and BA.2. Three spike insertions (Ins216E, Ins217P, and Ins218E) found in one representative of the GKA clade are shared with Omicron. The molecular signatures of Delta and Omicron are obvious in the GKA clade. It is plausible that the GKA clade is an Omicron subvariant. The Delta signatures are understandable, as phylogenetically, the Omicron, BA.2, and GKA clades emerged from Delta (Fig. 1) with a high bootstrap value. We suggest that the clade is not the result of sequencing error, as previously thought [2].
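The kind of tally described above, in which the substitutions of one clade are grouped by the other variants that also carry them, can be sketched with set operations. The following Python example is illustrative only: the function name, clade names, and substitution labels are hypothetical placeholders rather than the substitutions reported here.

```python
def sharing_profile(subs_by_variant, target):
    """Group the target's substitutions by the set of other variants sharing them.

    Returns a dict mapping a frozenset of partner names to the substitutions
    shared with exactly those partners. An empty frozenset marks substitutions
    that are unique to the target.
    """
    profile = {}
    for sub in subs_by_variant[target]:
        partners = frozenset(
            name
            for name, subs in subs_by_variant.items()
            if name != target and sub in subs
        )
        profile.setdefault(partners, set()).add(sub)
    return profile


# Hypothetical substitution sets, each relative to the same reference.
toy = {
    "clade_X": {"A1B", "C2D", "E3F", "G4H"},
    "variant_P": {"A1B", "C2D", "K9L"},
    "variant_Q": {"A1B", "E3F"},
}
profile = sharing_profile(toy, "clade_X")
for partners, subs in sorted(profile.items(), key=lambda kv: sorted(kv[0])):
    label = ", ".join(sorted(partners)) or "unique"
    print(f"{label}: {sorted(subs)}")
```

A clade with no entry under the "unique" label, as reported for GKA, would carry only substitutions that are also present in at least one other variant.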
Interestingly, we identified a truncated ORF3A in the Mu variant. A four-nucleotide deletion shifts the reading frame and generates a stop codon; thus, ORF3A in this variant is 257 amino acids in length, whereas the others are 275 residues long. This accessory protein contributes to the pathogenesis of SARS-CoV-2 by inducing pathological apoptosis [31]. The effect of the Mu variant at the cellular level has not yet been described. One paper on this variant covered the neutralization effect of antibodies [32]. According to the GISAID database accessed on March 30, 2022, this variant has been identified in many countries, with a maximum global genome frequency of less than 1%, which has declined recently.
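How a four-nucleotide deletion can truncate a protein is illustrated by the following minimal Python sketch. It applies the standard genetic code to a toy coding sequence; the sequence, the deletion position, and the function names are hypothetical and do not represent the actual ORF3A sequence of the Mu variant.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence from its first base up to the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def delete_nt(seq, start, length):
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


# Toy coding sequence (hypothetical).
toy_cds = "ATGGCAAAGCTTGACCTGACCGGTTCATAA"
mutant_cds = delete_nt(toy_cds, 6, 4)  # four-nucleotide deletion

print(translate(toy_cds), len(translate(toy_cds)))
print(translate(mutant_cds), len(translate(mutant_cds)))
```

Because four is not a multiple of three, every codon downstream of the deletion is read in a shifted frame, and in this toy sequence a stop codon is reached earlier than in the intact sequence. A three-nucleotide deletion at the same position would instead remove one codon and leave the downstream frame intact.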
BA.2 differs from Omicron in the deletion of 48 nucleotides from the 3’UTR. The 3’UTR of coronaviruses contains all of the cis-acting sequences necessary for viral replication and binds to cellular components as well as the viral nsp1 and N proteins [33], which are required for minus-strand RNA synthesis [34]. This has also been described in SARS-CoV-2, whereby the 3’UTR is involved in genomic dimerization and interacts with cellular micro-RNA [35]. BA.2 has recently increased in frequency in multiple regions of the world, suggesting that it has a selective advantage over Omicron [36–39]. The genome frequency of BA.2 has increased exponentially to 90% of total Omicron submissions, based on GISAID accessed on the above date. Given that the original Wuhan-Hu-1 strain has a basic reproduction number (R0) of 2.4-3 [40], Delta has an R0 of 5 [41], and Omicron has an R0 estimated to be higher than 10, or three times greater than Delta [42], the BA.2 subvariant might have an R0 of 15 or higher. The higher transmissibility of BA.2 might be attributed, at least in part, to the shorter 3’UTR, which may result in faster viral replication; this needs to be investigated further. However, because the coding region across the whole genome of BA.2, particularly for the spike protein, is very close to that of Omicron, people who survived Omicron infection should be naturally protected against BA.2.
In conclusion, whole-genome comparison of representatives of all variants revealed indel patterns that are specific to SARS-CoV-2 variants or subvariants. Polymorphic amino acid comparison across all coding regions also showed amino acid residues shared by specific groups of variants. Additionally, based on the findings, it is plausible that the GKA clade is an Omicron subvariant. Finally, the higher transmissibility of BA.2 might be due, at least in part, to the 48-nucleotide deletion in the 3’UTR, which may result in faster viral replication.