The samples carry between 2 (France and USA 1 variants) and 853 (China 2 variant) mutations, and all samples except the Wuhan and USA 1 variants have unique and specific mutations. In the following, mutations are named in the form “T 241 C/T”. The first letter, T or D, shows the mutation type, i.e., translocation or deletion. The following digits give the base number of the mutation in alignment position order, and C/T shows the substitution of cytosine for thymine (Table 1).
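The naming scheme described above can be read mechanically. The following is a minimal sketch of one way to parse such labels; the `Mutation` class, the `parse_mutation` function, and the field names are hypothetical and introduced only for illustration. The two bases are stored in the order they are written, without assuming which one is the reference.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mutation:
    kind: str                          # "T" or "D", as labelled in the text
    start: int                         # first alignment position
    end: int                           # last alignment position (== start for a single base)
    first_base: Optional[str] = None   # base written before the slash, if any
    second_base: Optional[str] = None  # base written after the slash, if any


# Accepts labels such as "T 241 C/T", "D 23598–23621" or "T 27553–27554 C/T".
# Both the en dash and the ASCII hyphen are allowed as range separators.
_LABEL = re.compile(
    r"^([TD])\s+(\d+)(?:\s*[–-]\s*(\d+))?(?:\s+([ACGTN])/([ACGTN]))?$"
)


def parse_mutation(label: str) -> Mutation:
    """Parse one mutation label into its components (illustrative helper)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    kind, start, end, first, second = match.groups()
    start_pos = int(start)
    end_pos = int(end) if end is not None else start_pos
    return Mutation(kind, start_pos, end_pos, first, second)


if __name__ == "__main__":
    for text in ("T 241 C/T", "D 23598–23621"):
        print(parse_mutation(text))
```

Storing a start and an end for every label lets point substitutions and multi-base deletions share one representation.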
The South African and Brazilian samples (MT324062.1, MT350282.1) have the same length as the Wuhan variant (29,903 bp). However, the South African variant has three translocation mutations that separate it from the Wuhan variant, and the Brazilian variant has ten translocations, four of which are similar to the Wuhan variant. Three C/T mutations are at alignment positions 241, 3037, and 14408 (T 241 C/T, T 3037 C/T, T 14408 C/T), along with a single A/G substitution (T 23403 A/G). Among all of these variants, only France, South Africa, and Turkey have unique mutations.
Four translocation mutations of the Wuhan virus (T 241 C/T, T 3037 C/T, T 23403 A/G, and T 14408 C/T) exist in ten countries, namely Iran, Colombia, Italy, Nepal, Vietnam, India, England, South Korea, and Brazil. This probably shows that these viruses originated from Wuhan, while the others are missing these mutations.
The sequences of the variants from England, Belarus, the Philippines, USA 2, and China 2 have deletion mutations in addition to translocation mutations. Interestingly, the viruses containing deletions are seen in 2021.
The English variant has a 24 nucleotide deletion (D 23598–23621) in gene S. The Belarus virus has two deletions: a 9 nucleotide deletion (D 686–694) in ORF1ab and a 15 nucleotide deletion (D 27764–27778) in ORF7b. The Philippine virus has three deletions: a 9 nucleotide deletion (D 11288–11296) in ORF1ab, a 6 nucleotide deletion (D 21766–21771) in ORF7b, and a 3 nucleotide deletion (D 21994–21996) in gene S. The USA 2 variant has a 10 nucleotide deletion (D 80–89) near the 5’ end, and the China 2 variant has five long deletions: a 26 nucleotide deletion (D 27375–27400) in ORF6, a 130 nucleotide deletion (D 27416–27545) in ORF7a, a 332 nucleotide deletion (D 27555–27886) in ORF7a and ORF7b, a 104 nucleotide deletion (D 27908–28011) in ORF8, and a 233 nucleotide deletion (D 28023–28255) in ORF8. An interesting feature of the China 2 variant is the high density of deletions, which almost completely remove ORF7b, ORF7a, and ORF8. No sample had a deletion in ORF10, gene N, gene M, gene E, or ORF3a. More than 50% of the mutations in the 2020 samples are in ORF1ab, while the 2021 samples have less than 50% of their mutations in ORF1ab. Additionally, the number of mutations in 2021 is 2 to 3 times the 2020 average. Despite all these mutations, the virus continues to function and has not diminished in aggressiveness.
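The deletion sizes quoted above follow from the inclusive coordinate ranges (length = end − start + 1). The following is a minimal sketch that recomputes them from the coordinates given in the text; the names `DELETIONS`, `deletion_length`, and `total_deleted` are hypothetical and used only for illustration.

```python
# Deletion coordinates as reported in the text (inclusive alignment positions).
DELETIONS = {
    "England": [(23598, 23621)],
    "Belarus": [(686, 694), (27764, 27778)],
    "Philippines": [(11288, 11296), (21766, 21771), (21994, 21996)],
    "USA 2": [(80, 89)],
    "China 2": [
        (27375, 27400),
        (27416, 27545),
        (27555, 27886),
        (27908, 28011),
        (28023, 28255),
    ],
}


def deletion_length(start: int, end: int) -> int:
    """Number of nucleotides removed by a deletion with inclusive bounds."""
    return end - start + 1


def total_deleted(sample: str) -> int:
    """Sum of all deleted nucleotides reported for one sample."""
    return sum(deletion_length(s, e) for s, e in DELETIONS[sample])


if __name__ == "__main__":
    for sample, ranges in DELETIONS.items():
        lengths = [deletion_length(s, e) for s, e in ranges]
        print(f"{sample}: lengths={lengths}, total={total_deleted(sample)}")
```

Under these coordinates, the five China 2 deletions together remove 825 nucleotides, which is consistent with the near-complete loss of ORF7a, ORF7b, and ORF8 noted above.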
In the phylogenetic tree (Fig. 1), the differences between the China 2 virus and the Wuhan virus, at the end and the beginning of the graph from 2019–2021, are obvious. According to the graph, the Wuhan variant is the origin and is shown as a subordinate of several viruses. Additionally, the China 2 virus is a completely new variant compared to all other viruses in other branches of the graph, with the Philippine virus on a corresponding branch with different mutations. According to the recent mortality rates in China and the high mutation rates seen in the China 2 variant, with its deletions of ORF7b, ORF7a, and ORF8, these genes can be considered very important.
Mutations in the functional protein PLpro also exist at alignment positions in the South African variant (T 5572 T/G), the Colombia variant (T 5298 N/C), the Philippine variant (T 4964 G/A and T 5388 A/C), and the China 2 variant (T 5653 C/T).
The RdRp protein is mutated in the Indian variant (T 14408 C/T, T 16176 C/T); the Vietnam, Nepal, Italy, Colombia, Iran 1, Brazil, Iran 2, Wuhan, English, and South Korea variants (T 14805 T/C); the Belarus variant (T 15372 T/G); the Philippine variant (T 14676 T/C, T 15279 T/C); and the China 2 variant (T 15165 G/A).
The Nsp13-helicase functional protein is mutated in the Colombia variant (T 17470 T/C), the Brazil variant (T 17247 C/T), the Philippine variant (T 17615 G/A), and the USA 2 variant (T 17014 T/G).
Nsp1 is also mutated in the Spain variant (T 313 T/C) and the France variant (T 618 G/A). ORF7a (27759–27394) is also mutated in the Belarus variant (T 27670 T/G) and the China 2 variant (T 27412, T 27410 C/T, T 27407 C/T, T 27405 A/T, D 2755–27759, T 27553–27554 C/T, D 27416–27545, T 27413 A/T).
Considering the increase in mutation and contagion, we searched for further effective interacting elements, because the virus genome must interact with other parts of the human genome for replication and pathogenesis and uses the human genome for invasive purposes.
In the next part of the analysis, we used RNA sequencing to align 24 virus sequences with the hg38 human genome. We observed that in the genome sequences of all viruses, a small sequence is similar to the human genome and aligns to the EPPK1 gene on chromosome 8. This 17 nucleotide sequence (TCCTGCTGCAGATTTGG), which contains the PstI restriction site, is the only sequence shared between all SARS-CoV-2 sample genomes and the hg38 human genome. In all 24 sample genomes from different countries, the sequence lies in gene N of the SARS-CoV-2 virus, within the approximate 28500–29500 region, at positions 29474–29458 of the virus sequence alignment. This sequence is also located in the EPPK1 gene at positions 3961–3977 (Fig. 2).
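The properties stated for the shared sequence can be checked directly. The following is a minimal sketch that confirms its length, locates the PstI recognition site (CTGCAG), and computes the reverse complement. The helper names are hypothetical. Reading the descending coordinates 29474–29458 as a reverse-strand match is an assumption made here for illustration, not something the text states.

```python
SHARED_SEQUENCE = "TCCTGCTGCAGATTTGG"  # 17-nt sequence quoted in the text
PSTI_SITE = "CTGCAG"                   # PstI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list:
    """Return every 0-based start index at which `site` occurs in `seq`."""
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(index)
        index = seq.find(site, index + 1)
    return positions


def span_length(first: int, last: int) -> int:
    """Length of an inclusive coordinate span, in either orientation."""
    return abs(last - first) + 1


if __name__ == "__main__":
    print("length:", len(SHARED_SEQUENCE))
    print("PstI site at 0-based index:", find_sites(SHARED_SEQUENCE, PSTI_SITE))
    print("reverse complement:", reverse_complement(SHARED_SEQUENCE))
    # Both reported coordinate spans should cover 17 positions.
    print("virus span:", span_length(29474, 29458))
    print("EPPK1 span:", span_length(3961, 3977))
```

Because CTGCAG is its own reverse complement, the PstI site is present whichever strand the match is read from.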