Mutational insights among the structural proteins of SARS-CoV-2; comprehensive analyses in the six continents

Background: Mutations among the structural proteins of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can lead to the emergence of new variants that differ in mortality and in sensitivity to drugs and vaccines. Here we aimed to investigate the mutations among the structural proteins of SARS-CoV-2 globally. Methods: We analyzed samples of amino-acid sequences (AASs) of the envelope (E), membrane (M), nucleocapsid (N), and spike (S) proteins collected from the declaration of coronavirus disease 2019 (COVID-19) as a pandemic to January 2022. Mutations and their locations were identified by sequence alignment to the reference sequence and categorized by frequency and continent. Finally, human genes related to the viral structural genes were identified, and their interactions were reported. Results: The highest relative mutation frequencies among the E, M, N, and S AASs occurred in the regions of 7 to 14, 66 to 88, 164 to 205, and 508 to 635 AAs, respectively. The most frequent mutations in the E, M, N, and S proteins were T9I, I82T, R203M/R203K, and D614G, respectively. D614G was the most frequent mutation in all six geographical areas. Following D614G, L18F, A222V, E484K, and N501Y ranked second to fifth among the most frequent mutations in the S protein globally. In addition, A-kinase Anchoring Protein 8 Like (AKAP8L) was shown to be the linkage unit between the M, E, and N cluster genes. Conclusion: Screening the mutations of structural proteins can help scientists develop better strategies for drug and vaccine development.


Background
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded, positive-sense RNA virus with a genome size of 29,903 nucleotides (1). Coronavirus disease 2019 (COVID-19), the infectious disease caused by SARS-CoV-2, was declared a pandemic on 11 March 2020. Its infectivity and mortality rate vary among countries, and the exact factors influencing this variability are not yet clear (2).
Despite the proofreading mechanisms of the virus, mutation rates of coronaviruses are between 10⁻⁵ and 10⁻³ substitutions per nucleotide site per cell infection (s/n/c); therefore, several mutations have been detected by wide-ranging sequencing (3,4). The genome of SARS-CoV-2 contains 12 open reading frames (ORFs) that encode 22 nonstructural proteins and 4 structural proteins: the envelope (E), membrane (M), nucleocapsid (N), and spike (S) proteins (5). The S protein, which plays a key role in recognition of the human host cell surface receptor angiotensin-converting enzyme 2 (ACE-2), consists of an N-terminal S1 subunit and a C-terminal S2 subunit. A point mutation in the S1 subunit, D614G, gave rise to the dominant variant of SARS-CoV-2. This mutation is associated with higher viral load, increased fitness, and enhanced infectivity of the virus, and with disease at a younger age (3, 6-8). Emergent mutations may give rise to a new variant of concern (VOC) with greater binding affinity to human receptors and higher infectivity and mortality than previous variants. The B.1.617.2 (Delta) variant is one such example: it shows higher transmissibility and resistance to neutralization owing to the combination of mutations in its S protein (9). The B.1.1.529 (Omicron) variant is the fifth and, to date, the last VOC. It carries about 40 mutations, including one in the E protein, 3 in the M protein, 6 in the N protein, and more than 30 in the S protein. It was identified in November 2021 in South Africa, and its large number of mutations has been linked to increased infectivity and a higher potential to stimulate the immune response (10).
These findings highlight the importance of mutation patterns, which affect the prevalence, infectivity, and mortality rate of COVID-19 among different countries and need to be elucidated.
At present, vaccination is the most advantageous strategy for combating COVID-19. Most of the designed vaccines inhibit viral pathogenesis by targeting the S protein (11,12). Hence, investigating the effect of mutations is important for determining the quality of vaccination. In addition, some therapeutic strategies are based on the interaction between drugs and the S protein (13-15), and other structural proteins are considered attractive targets for drugs and vaccine formulations (16,17). Consequently, identifying the geographical distributions and evolutionary trends of the structural proteins of SARS-CoV-2 can inform epidemiological research and the molecular design of vaccines and drugs.
In the current study, we aimed to determine the frequencies of mutations among the structural proteins of SARS-CoV-2 globally and within each continent separately. We then addressed the evolutionary trends of the mutations and their characteristics from the beginning of the pandemic to January 2022. Finally, human genes related to the structural proteins of SARS-CoV-2 and their interactions were studied.

Sequence retrieval
The current study was performed by evaluating all available data for the four structural genes of SARS-CoV-2 (Fig. 1).

SARS-CoV-2 sequences analyses
Python 3.8.0 was used to process the FASTA files, extract the four structural proteins from the other genes, and align and analyze the SARS-CoV-2 sequences in order to detect mutations. Each difference between a sample and the reference sequence was interpreted as a mutation, and its location and the substituted AA were reported. For each structural protein, non-human samples, samples with a different number of AAs from the reference, and samples with non-specified AAs were excluded. The whole process was optimized using the 'Numpy' and 'Pandas' libraries.
The algorithm used to detect mutants is as follows. Because all sequences have equal lengths, the sample is compared with the reference position by position; 'Refseq' and 'seq' refer to the reference sequence and the sample sequence, respectively.

For each position i in Refseq:
    If Refseq[i] differs from seq[i]:
        Report a new mutant
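The position-by-position comparison described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `find_mutations` and `label`, the set `STANDARD_AAS`, and the short toy sequences are assumptions introduced here. The exclusion of non-human samples depends on sample metadata and is not shown.

```python
from typing import List, Optional, Tuple

# Assumption: "non-specified AAs" means any symbol outside the 20 standard residues.
STANDARD_AAS = set("ACDEFGHIKLMNPQRSTVWY")


def find_mutations(refseq: str, seq: str) -> Optional[List[Tuple[int, str, str]]]:
    """Return (1-based position, reference AA, sample AA) for every mismatch.

    Returns None when the sample would be excluded: its length differs from
    the reference, or it contains a non-standard residue symbol.
    """
    if len(seq) != len(refseq):
        return None
    if not set(seq) <= STANDARD_AAS:
        return None
    return [
        (i, ref_aa, aa)
        for i, (ref_aa, aa) in enumerate(zip(refseq, seq), start=1)
        if ref_aa != aa
    ]


def label(mutation: Tuple[int, str, str]) -> str:
    """Format a mutation in the usual notation, e.g. (9, 'T', 'I') -> 'T9I'."""
    pos, ref_aa, aa = mutation
    return f"{ref_aa}{pos}{aa}"


# Toy sequences for illustration only.
ref = "MYSFVSEETGTLIV"
sample = "MYSFVSEEIGTLIV"
print([label(m) for m in find_mutations(ref, sample)])  # ['T9I']
```

Returning `None` for excluded samples keeps them distinguishable from valid samples with no mutation, which return an empty list.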
After extraction of the structural proteins, the continent name and geographical coordinates of each sample were obtained using pycountry-convert 0.5.8 and the 'Titlecase' library in Python in order to depict global prevalence maps of the mutations. Lastly, global maps were drawn using the 'Geobubble' package in Matlab 2021.

Data normalization and statistical analysis
Data normalization was performed using R 4.0.3 and Microsoft Power BI. A normalized frequency was calculated for each studied region to allow a better comparison of the data attributed to each continent: the number of mutations was divided by the number of sequences from that continent, so that the continents could be compared in equal proportions.
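As an illustration of this normalization, the sketch below divides the number of sequences carrying each mutation by the number of sequences from the same continent. The study reports using R and Power BI for this step; Python is used here only for consistency with the sequence analysis, and the function name, input layout, and toy records are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def normalized_frequencies(
    records: Iterable[Tuple[str, List[str]]]
) -> Dict[str, Dict[str, float]]:
    """records holds one (continent, mutation labels) pair per sequence.

    Returns, per continent, the fraction of that continent's sequences
    carrying each mutation.
    """
    n_sequences = Counter()
    mutation_counts: Dict[str, Counter] = {}
    for continent, mutations in records:
        n_sequences[continent] += 1
        counts = mutation_counts.setdefault(continent, Counter())
        counts.update(set(mutations))  # count each mutation once per sequence
    return {
        continent: {m: c / n_sequences[continent] for m, c in counts.items()}
        for continent, counts in mutation_counts.items()
    }


# Toy records for illustration only (not study data).
toy = [
    ("Africa", ["P71L"]),
    ("Africa", []),
    ("Asia", ["T9I"]),
    ("Asia", ["T9I", "V62F"]),
    ("Asia", []),
    ("Asia", []),
]
freqs = normalized_frequencies(toy)
print(freqs["Africa"]["P71L"], freqs["Asia"]["T9I"], freqs["Asia"]["V62F"])
```

Dividing by the per-continent sequence count keeps a heavily sampled continent from dominating the comparison simply because it contributed more sequences.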

Identification of related human genes and interactions
To identify human genes related to the E, M, N, and S genes of SARS-CoV-2, the Human Protein Atlas database (https://www.proteinatlas.org/humanproteome/sars-cov-2) was used to determine target genes.
Additionally, STRING ver. 11.5, with an average local clustering coefficient of 0.527, was used to determine the interactions between target genes. The interaction network was downloaded in .tsv format. An adjacency matrix was built and imported into Cytoscape version 3.8 to visualize the protein-protein interaction (PPI) network. Moreover, the cytoHubba package was used to perform node ranking analysis and identify the hub genes.
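To make the adjacency-matrix step concrete, the sketch below builds a symmetric adjacency matrix from a list of interaction pairs and ranks nodes by degree. It is an illustration under stated assumptions: the edge list is a made-up toy network with placeholder node names, parsing of the STRING .tsv file is omitted, and degree is only one of several ranking methods that cytoHubba offers; the source does not state which method was applied.

```python
from typing import List, Sequence, Tuple


def adjacency_matrix(edges: Sequence[Tuple[str, str]]):
    """Build a symmetric 0/1 adjacency matrix from undirected edges."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return nodes, matrix


def rank_by_degree(nodes: List[str], matrix: List[List[int]]):
    """Rank nodes by number of neighbours, highest first (ties by name)."""
    degrees = {n: sum(row) for n, row in zip(nodes, matrix)}
    return sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network; node names are placeholders, not real genes.
toy_edges = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("p1", "p2")]
nodes, matrix = adjacency_matrix(toy_edges)
ranking = rank_by_degree(nodes, matrix)
print(ranking)  # [('hub', 3), ('p1', 2), ('p2', 2), ('p3', 1)]
```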

Results
Mutations quantity among geographical areas
We first identified the mutations that had occurred in order to understand their incidence rates and to recognize potentially important mutations statistically. In total, 6,394,483, 6,177,403, 5,841,477, and 895,738 sequences belonging to the E, M, N, and S proteins, respectively, qualified and were studied to determine the number of mutations.
According to the data, 96.40% of E amino-acid sequences (AASs) displayed no mutation (Fig 2). The corresponding values for M, N, and S AASs were 36.76%, 2.20%, and 2.11%, respectively. The incidence of exactly one mutation was 3.56%, 59.64%, 5.68%, and 26.86% for E, M, N, and S AASs, respectively, and 0.02%, 2.80%, 7.11%, and 26.15% of E, M, N, and S AASs displayed two mutations. The incidence rates of three and four mutations for E, M, N, and S AASs are shown in Fig 2. For the E protein, 77.72% of AASs in Africa and 95.72% of AASs in Asia did not display any mutation (Fig 2) and 4.31% of N AASs in the area of Asia, Europe, North America, South America and Oceania were without mutation occurrence, respectively. In contrast to Africa, where 19.35% of AASs displayed one mutation, the incidence of one mutation in the other five areas was noticeably lower, and, except for Oceania, these areas displayed similar rates. The percentage of N AASs with two mutations was higher in Oceania and Africa than in the other areas: 28.81% and 16.55%, respectively, compared with 7.07%, 4.8%, 9.53%, and 4.95% in Asia, Europe, North America, and South America. For the S protein, only 0.46% of AASs in South America did not display any mutation, and about 82% displayed four or more mutations. The proportion of S AASs without mutation was highest in Oceania, at 8.45%. The incidence of one mutation among S AASs was 36.01%, 35.81%, 20.75%, 31.70%, 6.95%, and 13.78% in Africa, Asia, Europe, North America, South America, and Oceania, respectively. In descending order, 81.96%, 35.92%, 24.07%, 17.14%, 16.99%, and 2.52% of S AASs displayed four or more mutations in South America, North America, Asia, Africa, Europe, and Oceania.
In Africa, Asia, and Europe, AASs with one mutation were more prevalent than the other categories (Fig 2). In Oceania and the Americas, the most prevalent categories were two mutations and more than three mutations, respectively.
Next, we drew a heat map of the mutations to show their frequency in total and within each area. The highest mutation frequencies relative to the total AASs among the E, M, N, and S AASs occurred in the regions of 7 to 14 AA (0.0018 frequency), 66 to 88 AA (0.0279 frequency), 164 to 205 AA (0.0294 frequency), and 508 to 635 AA (0.0079 frequency), respectively (Fig 3). The second-highest mutation frequencies in the E, M, N, and S AASs arose in the regions of 56 to 63 AA (0.0006 frequency), 1 to 22 AA (0.0010 frequency), 205 to 246 AA (0.0201 frequency), and 1 to 127 AA (0.0048 frequency), respectively. The heat map is needed because both the number of mutations and their dispersion across samples vary.
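The percentages of sequences with zero, one, two, three, and four or more mutations reported above can be tallied as in the following sketch. The function name, the cap of four, and the toy input are assumptions made for illustration; the input is simply the number of mutations found in each qualified sequence.

```python
from collections import Counter
from typing import Dict, Iterable


def mutation_count_distribution(
    mutations_per_sequence: Iterable[int], cap: int = 4
) -> Dict[int, float]:
    """Percentage of sequences with 0, 1, ..., cap-or-more mutations."""
    counts = Counter(min(n, cap) for n in mutations_per_sequence)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences supplied")
    return {k: 100 * counts.get(k, 0) / total for k in range(cap + 1)}


# Toy input for illustration only: mutation counts for eight sequences.
dist = mutation_count_distribution([0, 0, 1, 2, 5, 7, 3, 1])
print(dist)  # {0: 25.0, 1: 25.0, 2: 12.5, 3: 12.5, 4: 25.0}
```

The key `4` collects every sequence with four or more mutations, matching the "four and more" category used in the text.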
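One plausible way to obtain region-level frequencies of the kind shown in a heat map is to bin mutation positions into equal-width windows along the protein and divide each window's count by the number of sequences. The sketch below does this under loudly stated assumptions: ten windows per protein, ceiling-based window width, and normalization by sequence count are choices made here, not details given in the text, and the resulting window boundaries need not match the regions reported above.

```python
from typing import Iterable, List, Tuple


def window_frequencies(
    positions: Iterable[int],
    protein_length: int,
    n_sequences: int,
    n_windows: int = 10,
) -> List[Tuple[Tuple[int, int], float]]:
    """Mutations per sequence in each equal-width window of the protein.

    positions: 1-based positions of all observed mutations, one entry per
    mutation per sequence.
    """
    width = -(-protein_length // n_windows)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[(pos - 1) // width] += 1
    result = []
    for w, c in enumerate(counts):
        start = w * width + 1
        if start > protein_length:
            break  # short proteins may not fill every window
        end = min((w + 1) * width, protein_length)
        result.append(((start, end), c / n_sequences))
    return result


# Toy input: four sequences of a 75-residue protein with mutations at
# positions 9, 9, 62 and 71 (illustrative values only).
freq = window_frequencies([9, 9, 62, 71], protein_length=75, n_sequences=4)
print(freq[1], freq[7], freq[8])
```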

Mutation features based on the geographical areas
In the next step, the locations of mutations in the protein structure and their frequencies were considered in order to identify further dimensions of the mutations. As shown in Fig 4, the most frequent mutation in the E protein was T9I, with a frequency of 0.0128, followed by P71L (0.0068), V62F (0.0066), L21F/L21V (0.0017/0.0003), and V58F (0.0013). The locations of the three most frequent mutations are shown in Fig 5 section A. T9I was the most frequent mutation in Europe, Oceania, North America, and South America, with frequencies of 0.0187, 0.0249, 0.0066, and 0.0049, respectively, whereas P71L was the most frequent mutation in Africa and Asia, with frequencies of 0.1643 and 0.0146, respectively. V62F was one of the first ten frequent mutations in Asia (0.0118 frequency), Europe (0.0016 frequency), and North America (0.0024 frequency), in contrast to Africa (0.0011 frequency), Oceania (0.0004 frequency), and South America (0.0012 frequency), where it ranked eighth, sixth, and ninth, respectively.
Regarding the M protein, the analysis showed that I82T (0.6015 frequency), D3G (0.0077 frequency), A63T (0.0073 frequency), Q19E (0.0072 frequency), and A2S (0.0033 frequency) were the five most frequent mutations, in that order (Fig 4). I82T was the most frequent mutation in all six areas (Fig 5 section B). By contrast, D3G was not the second most frequent mutation in Asia and North America, where F28L (0.020 frequency) and A81S (0.0083 frequency), respectively, held the second position. In Africa and Europe, A63T was the third most frequent mutation, with frequencies of 0.0224 and 0.0091, respectively. The third most frequent mutations in Asia, Oceania, North America, and South America were D3G (0.0048 frequency), Q19E (0.0259 frequency), S197N (0.0069 frequency), and R164H (0.004 frequency), respectively.
Analysis of the N AASs showed that R203M/R203K, with frequencies of 0.6084/0.2489, held the first position among frequent mutations (Fig 5 section C). Globally, D377Y (0.6134 frequency) ranked second, D63G (0.6002 frequency) third, G215C (0.5479 frequency) fourth, and G204R/G204P (0.2352/0.0134 frequencies) fifth (Fig 4). In all continents except South America, the first four positions matched the global results; the data from South America showed a different order. The frequency of R203M was higher than that of R203K in all continents except South America. The R203M/R203K frequencies were 0.4195/0.1965 in Africa, 0.6033/0.3052 in Asia, 0.6074/0.2826 in Europe, 0.6310/0.1776 in North America, and 0.6143/0.3008 in Oceania, whereas in South America they were 0.3570/0.5700. South America also differed from the other areas in its second and third most frequent mutations, which were G204R (0.5685 frequency) and P80R (0.4184 frequency), respectively.
The pattern of mutation frequency for S AASs showed that, worldwide, D614G held first place among frequent mutations, with a frequency of 0.9756. It was followed by L18F (0.1680 frequency), A222V (0.1579 frequency), E484K (0.1454 frequency), and N501Y (0.1120 frequency), which ranked second to fifth (Fig 4). The most frequent mutation in S AASs was identical in all six geographical areas; however, its frequency differed between them (Fig 5 section D). The frequency of D614G in Africa, Asia,

The T9 mutation, the most frequent mutation in E AASs worldwide, began to prevail in October 2021 and by January 2022 was present at a frequency of 0.0693. In comparison, the P71 mutation gained in prevalence in May 2020 and, after decreasing in August 2020, started to increase again from September 2020. It reached its maximum frequency (0.0257) in March 2021. Although the V62 mutation was present from the first days of the pandemic, it increased from August 2021 and reached its maximum frequency (0.0058) in October 2021. In Africa, the emergence of P71 was detected in August 2020. It then increased until April 2021, when it reached its highest frequency (0.3673), and decreased until September 2021. The timing of the highest P71 frequency in all other continents was almost identical to that in Africa. In Asia, the V62 mutation increased noticeably from August 2021 and reached its maximum frequency (0.0901) in October 2021. At the beginning of the pandemic in South America, the V58 mutation grew and reached its highest frequency (0.0457) in May 2020; it declined from July 2020.
I82, the most frequent mutation of M AASs worldwide, had a notable frequency in January 2021 (0.1095). The second global peak of I82 prevalence started in May 2021, and the mutation reached its highest frequency (0.9969) in October 2021 (Fig 6). Except in South America, the evolutionary trend in the distribution of the I82 mutation followed an almost identical pattern in all continents. Although I82 was detectable in South America from the beginning of the pandemic, its frequency remained almost constant and near zero before April 2021, and it started to prevail from July 2021. In all areas, the Q19 mutation increased from October 2021, at the time when the I82 mutation began to decrease worldwide.
The prevalence of mutations among N AASs fluctuated. The R203 mutation showed a peak in frequency in January 2020 and began to increase from February 2020 until August 2020. Globally, it started to prevail again from November 2020 and continued to grow, reaching a frequency of 0.9907 in January 2022 (Fig 6). The evolutionary trends in the individual areas showed a similar pattern of R203 prevalence, except in South America, where the frequency of R203 was almost steady from April 2020 to January 2022. The evolutionary trends of the D63 and G215 mutations followed approximately identical patterns, and both started to increase from April 2021 in all continents. In addition, P80, one of the frequent mutations exclusive to South America, increased from November 2020 and started to decrease from June 2021.
The D614 mutation began its growing trend in February 2020 worldwide. In contrast to other areas, in Africa this mutation did not follow a steady pattern and fluctuated. The L18 mutation increased from August 2020 and began to decrease from July 2021 worldwide. The same pattern was shown by E484 and N501 globally (Fig 6). By contrast, the A222 mutation displayed a different trend: it gained in prevalence from July 2020 and started to decrease from October 2020. In May 2020, the S477 mutation, one of the ten most frequent mutations in Oceania, began to increase, and it decreased just three months later (August 2020). The results from South America showed that, except for D614, almost all other mutations clearly started to increase from November 2020.
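The monthly frequency rates discussed in this section can be illustrated with a short sketch that, for one amino-acid position, divides the number of sequences mutated at that position in a given month by all sequences collected in that month. The function name, the 'YYYY-MM' month keys, and the toy records are assumptions for illustration and do not reproduce the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def monthly_frequency(
    records: Iterable[Tuple[str, Set[int]]], position: int
) -> Dict[str, float]:
    """records holds one (collection month 'YYYY-MM', mutated positions) pair
    per sequence. Returns the fraction of each month's sequences that are
    mutated at the given position, in chronological order."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, mutated_positions in records:
        totals[month] += 1
        if position in mutated_positions:
            hits[month] += 1
    # 'YYYY-MM' strings sort chronologically, so sorted() orders the timeline.
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Toy records for illustration only.
toy_records = [
    ("2021-11", {9, 71}),
    ("2021-10", {9}),
    ("2021-10", set()),
    ("2021-11", {9}),
]
trend = monthly_frequency(toy_records, position=9)
print(trend)  # {'2021-10': 0.5, '2021-11': 1.0}
```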

Protein-Protein Interaction (PPI) network presentation
The Protein-Protein Interaction (PPI) network, with 57 nodes and 153 edges, presents the interactions between the E, M, N, and S proteins of SARS-CoV-2 and human proteins (Fig 7) (see Additional file 9). In the ranking analysis, Ras GTPase-activating protein-binding protein 1 (G3BP1) was identified as the highest-ranked human gene (Fig 8). Additional data are provided in Additional file 10. The network showed a linkage between the M protein cluster and the E and N members through the A-kinase anchoring protein 8 like (AKAP8L) human gene, which acts as a bottleneck. In this network, Zinc Finger DHHC-Type Palmitoyltransferase 5 (ZDHHC5) and Golgin A7 (GOLGA7) were the human genes with the highest interaction with the S protein.

Discussion
Since the early days of March 2020 and the announcement of COVID-19 as a pandemic, the disease has spread globally with variable presentations among countries. This variability is due to differences in the infectivity and mortality of the prevailing virus, which are influenced by distinct factors including community characteristics such as age and genetic background, and mutations in the predominant virus (21). Vaccination for prevention and therapeutic strategies for treatment are two important tools for curbing the global disease. Structural proteins have attracted scientists as targets for drug and vaccine development (22,23). Therefore, considering the mutations in these proteins and identifying their impact on function will support the efficient, high-quality development of such preventive and therapeutic tools.
The focus on the S protein as the initiator of pathogenicity and a stimulator of antibodies has driven efforts to identify newly emerging S mutations and to consider the efficacy of drugs and vaccines against new variants. N501Y was one of the 17 mutations in the viral genome attributed to the B.1.1.7 (Alpha) variant and enhances viral attachment and infectivity (24). N501Y was reported in the UK in December 2020 and entered the USA at the end of December 2020. The evolutionary trend of mutation emergence at S AA position 501 is consistent with these time periods. This mutation is significantly associated with enhanced binding affinity and, consequently, increased viral infectivity (25). The mutation at S AA position 501, together with D614G and E484K, was among the nine mutations detected in the B.1.351 (Beta) variant, which was reported in the second wave of COVID-19 in South Africa in October 2020. The accumulation of such mutations, which were among the ten most frequent mutations globally in our study, not only increased the potential for transmission but also reduced neutralization by monoclonal antibody therapy (26). The B.1.1.28 (Gamma) variant, which was detected for the first time in Brazil in December 2020, harbors ten mutations in the S protein including L18F, T20N, P26S, D138Y, R190S, H655Y, T1027I, V1176, K417T, E484K, and N501Y (27). According to the current study, all of these mutations except the one at S AA position 138 were among the ten most frequent mutations in South America. Mutations at positions 18, 417, and 484 in the receptor-binding domain (RBD) are the main factors enhancing viral binding in both the Beta and Gamma variants. The next VOC, the Delta variant, was the main cause of the COVID-19 wave in India in April 2021.
In addition to T19R, T478K, P681R, and D950N, the Delta variant carries six other mutations and, like the Gamma variant, harbors ten mutations in the S protein in total (28). All four of these mutations were found in the current study to be among the frequent mutations in Asia, and they increased noticeably after April 2021. The T19R, T478K, P681R, and D950N mutations help the Delta variant to escape the immune response, increase viral attachment, and enhance viral replication, respectively (29,30). The mutation at AA position 681 of the S protein was also detected in earlier variants: P681H is one of the mutations observed in the Alpha variant. According to our results, the change from proline to histidine is the most frequent mutation at S AA position 681 in North America and the second most frequent mutation at that position in the other continents except Europe and South America. P681R is more frequent than P681H in Asia, Africa, and Oceania. Because both mutations increase cleavage of the S1/S2 site within the S protein by the furin enzyme, they correlate positively with increased viral infectivity and virulence of COVID-19 (31,32). L452R is another mutation of the Delta variant in the RBD of the S protein.
In the current study, this mutation was one of the ten most frequent mutations in Asia as well as in Africa, North America, and Oceania. Co-occurrence of L452R with T478K can enhance viral transmissibility through its influence on viral attachment. The combination of these mutations is characteristic of the Delta variant. Similarly, in the Beta and Gamma variants, the E484K/N501Y double mutation in the RBD has the same outcome (33). The number of mutations occurring in the Omicron variant, which is the last VOC so far, is increasing daily, and because of these mutations, especially in the S protein, it is believed that containment of the pandemic and the end of fatal presentations of COVID-19 are nearer than before (10,34). Our analysis showed ongoing frequencies of two N protein mutations related to the Alpha, Gamma, and Omicron variants, G204R and R203K, at approximately the time these variants were identified in Africa. Furthermore, the frequency of the R203M mutation was higher than that of R203K in all continents except South America during the study period. Co-occurrence of these two mutations correlates positively with infectivity and viral virulence through increased viral replication (35). D614G, in turn, was the most frequent mutation in the S protein worldwide and, independently, in each of the continents. The change from aspartic acid to glycine enhances binding affinity to ACE-2. This mutation in the S protein increases transmissibility and is related to greater viral load (3). Despite these effects, there is no evidence that D614G increases the severity of COVID-19 (3,36). Even though the mutation increases sensitivity to antibody neutralization, its higher infectivity and transmissibility, together with its lack of effect on severity, explain why it is the most frequent mutation in the S protein globally.
The M protein is another structural protein that has not attracted as much scientific attention as it should. Although it is highly conserved between SARS-CoV and SARS-CoV-2, its role in viral assembly and pathogenicity is important enough to justify investigating its mutations (37). The I82T mutation was observed to be the most frequent mutation globally. Similar results regarding mutation frequencies were reported in another study of M mutations in a geographical region (38). Additionally, the Q19E mutation of the M protein in the Omicron variant displayed an increasing frequency in November 2021, mainly in Africa and also globally.
The E protein, the smallest structural protein of SARS-CoV-2, has a critical role in viral assembly and pathogenesis. Previous studies reached the same conclusion as ours, namely that the evolutionary trend of E mutations is slow (39-41). The T9I mutation, the most frequent mutation globally in our study, increased in frequency in about November 2021. This mutation is one of the mutations found in the Omicron variant. The global pattern of mutation emergence and distribution in our study confirms the slower progress of E mutations compared with the other proteins. According to our results, 96.40% of E AASs did not carry any mutation. Owing to the high conservation of this structural protein, it might be attractive to vaccine and drug researchers as a potential target in COVID-19 (42).
Although the S protein has attracted most attention, considering the other structural proteins may yield similar or even better results in drug and vaccine development. Recently, a new serological method for the detection of SARS-CoV-2 using the N protein has been introduced (34,43).
Because the conserved regions in the N protein of SARS-CoV-2 harbor the highest AA similarity with SARS-CoV, researchers should investigate more specific regions if they wish to develop new methods based on the SARS-CoV-2 N protein (44). Mutations can also affect the results of diagnostic tests. The T135I mutation can result in a false negative with a rapid antigen test. Although this mutation was not among the ten most frequent mutations of the N protein in the current study, mutations must be screened accurately in order to produce and develop reliable diagnostic tests. Therapeutic strategies targeting the structural proteins must take into account the mutations, their frequencies, and their functional consequences. Some S protein mutations, such as E484K and L452R, reduce sensitivity to antibodies and require higher titers of convalescent sera for effective neutralization (9,45,46). The N501Y mutation is also related to decreased sensitivity to some specific antibodies, such as COVA1-12 and CB6, although it has no pronounced effect on the neutralizing activity of plasma or sera from vaccinated persons (47,48).
Identifying the human genes and proteins related to SARS-CoV-2 genes can broaden our understanding of the host targets of the virus. According to previous proteomic studies, ZDHHC5 and GOLGA7 are two human genes with high interaction with the S protein of SARS-CoV-2 (49). ZDHHC5 is localized in the endoplasmic reticulum (ER) or Golgi apparatus and has an important role in catalyzing protein palmitoylation. GOLGA7 is another component of the cellular palmitoylation system and plays a role in regulating ZDHHC enzyme activity (50,51). Thus, ZDHHC enzymes and GOLGA7 are essential for palmitoylation of viral proteins and, given the recent finding of their high-confidence interaction with the S protein of SARS-CoV-2, they can be proposed as potential drug targets. In confirmation of these findings, a high interaction of ZDHHC5 and GOLGA7 with the S protein was found in the current study. Furthermore, in our analyses, G3BP1 was the human gene with the highest rank among all human genes. It has been reported that the N protein of SARS-CoV-2 can interact with this protein in order to inhibit host stress granule (SG) assembly and promote viral infection (52). SGs are dynamic ribonucleoprotein (RNP) assemblies formed under stressful conditions such as oxidative stress and viral infection. G3BP1 and G3BP2 are key SG-nucleating factors that are targeted by viral proteins through interaction with their N-terminal nuclear transport factor 2-like domain (53,54). The PPI network also displayed AKAP8L as a linkage unit between the M protein cluster and the E and N members. AKAP8 belongs to a family of structurally diverse scaffold proteins, and AKAP8L is its homologue. They have been described as proteins influencing tumorigenesis and metastasis of cancer cells (55, 56).
Although the PPI network identified human genes related to the structural genes of SARS-CoV-2, the precise effects of ZDHHC5 and GOLGA7 on the S protein of SARS-CoV-2, the mechanism of interaction between the N protein and SG disassembly, and the exact role of AKAP8L in viral infection remain unclear, and more molecular research is needed to explain these processes.
Our study encountered two limitations. First, we considered AASs without studying the nucleotide sequences, so other aspects of newly emerging variants, such as codon bias, were not examined. Second, sampling and mutation reporting vary between regions. Differences in reporting mean that mutations that are present but undetected are missed, and consequently new variants may go unstudied.

Conclusion

Figure 2 Pie chart plot of the number of mutations among structural proteins of SARS-CoV-2 up to January 2022. The incidence rates of one, two, three, and four or more mutations, together with the rate of AASs without any mutation, among the E, M, N, and S proteins are displayed in panels A to D, respectively. Worldwide, 96.40% of E AASs, 36.76% of M AASs, 2.20% of N AASs, and 2.11% of S AASs were classified as AASs without any mutation. The rates of four or more mutations among the M, N, and S AASs were 0.01%, 76.54%, and 30.62% globally. This index is not displayed for E AASs in the pie chart because its value is very low ( = 3.75E-0.004% ).

Figure 3 The heat map of mutations among structural proteins of SARS-CoV-2 as of January 2022; the values indicate the rate of mutation per 100 amino acids. The highest frequency rates among the E, M, N, and S AASs occurred in the regions of 7 to 14 AA, 66 to 88 AA, 164 to 205 AA, and 508 to 635 AA, respectively. Worldwide, the region of 56 to 63 of the E AASs had the second-highest relative mutation frequency. The corresponding regions were 1 to 22, 205 to 246, and 1 to 127 among the M, N, and S AASs globally.

Figure 6 Timeline of mutation reports and evolutionary trends of the ten most frequent mutations in the structural proteins of SARS-CoV-2 in different geographic areas, including North America, South America, Europe, Asia, Oceania, and Africa, and globally.

Figure 7 The visualization of the PPI network with 57 nodes and 153 edges.

Figure 8 Hub genes were identified by node ranking analysis. As shown, the AKAP8L human gene is the linkage gene between the M protein cluster gene and the E and N genes of SARS-CoV-2. Moreover, ZDHHC5 and GOLGA7 showed high interaction with the S protein of SARS-CoV-2.