Comparative physicochemical and evolutionary study of SARS-CoV-2 and SARS with special reference to salt bridge and their microenvironment: A plausible explanation for divergency, stability and severity of SARS-CoV-2

A cluster of pneumonia cases in Wuhan city, Hubei province of China, was first reported on December 30, 2019. The disease is now known as COVID-19 and has become a nightmare for the whole world. SARS-CoV was first reported in 2002 but did not spread worldwide. After 18 years, in 2020, it reappeared and spread worldwide as SARS-CoV-2 (COVID-19), one of the most dangerous disease-causing viruses in the world. Is a favorable evolution possible within such a short time? If so, which properties changed in SARS-CoV-2 to make it so hard to defeat? What are the basic differences between SARS-CoV-2 and SARS? This study addresses these questions. Here, all protein sequences of SARS-CoV-2 and SARS were retrieved from the database to check their physicochemical, evolutionary and structural properties. The results showed that charged residues play a key role in SARS-CoV-2 evolution. SARS-CoV-2 increases its polarity with the help of charged residues, not polar residues. Its divergence is also strictly restricted. The introduction of salt bridges with high energies makes it very stable under extreme conditions. Microenvironment residues also play a crucial role in its stability: most of them are favorable and contribute high energies. These microenvironment residues could be targeted by protein engineering to reduce the stability of the virus and weaken it. This comparative study will help to understand the evolution from SARS to SARS-CoV-2.


Introduction
Coronaviruses are enveloped viruses, 60 nm to 140 nm in size, with a positive-sense single-stranded RNA genome and spike proteins on their surface [1]. There are four subtypes of coronavirus: alpha, beta, gamma, and delta. Most of the highly pathogenic viruses, such as severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS-CoV-2, are β-coronaviruses [2]. The disease caused by SARS-CoV-2 is known as coronavirus disease 2019 (COVID-19). SARS-CoV first emerged in the Guangdong province of China in 2002 and spread into five countries, infecting 8,098 people and causing 774 deaths, a mortality rate of 11%. In 2012, MERS-CoV emerged in the Arabian Peninsula and spread into 27 countries, infecting a total of 2,494 individuals and taking 858 lives, a mortality rate of 34%. SARS-CoV-2 was found in Wuhan city, Hubei province of China, in December 2019. To date, over 10 million cases of COVID-19 and over half a million deaths (mortality rate around 3%) have been reported across 218 countries. On March 11, 2020, the World Health Organization declared COVID-19 a pandemic. People of all ages can catch this viral infection, but immunocompromised elderly people with co-morbidities are the most vulnerable.
Older people and males with chronic diseases (such as diabetes, heart disease and cancer) are more susceptible than other groups [3]. The virus is easily transmitted through droplets generated when infected people cough or sneeze [4]. These infectious droplets can spread up to 1-2 meters and settle on surfaces. The virus can survive on metal surfaces for several hours or even days in favorable conditions, but can be destroyed by disinfectants such as hydrogen peroxide and sodium hypochlorite [5]. The incubation period varies from 2 to 14 days. Common clinical symptoms are fever (except in asymptomatic cases), dry cough, sore throat, fatigue, headache, breathlessness, and sudden loss of smell and taste. Without proper treatment, the disease can cause pneumonia, respiratory failure and even death. Recovery generally starts after about one week. It has been observed in patients that progression of the disease increases the release of cytokines, including interleukin (IL)-6 and IL-10, whereas the levels of CD4+ T and CD8+ T cells are reduced [6]. As of now, there is no approved treatment for COVID-19, but drugs such as remdesivir and tocilizumab are in use for treatment [7].
Generally, the β-coronavirus genome contains six open reading frames (ORFs). The first ORFs (ORF1a/b) span about two-thirds of the whole genome and encode 16 nonstructural proteins (nsps). A frameshift between ORF1a and ORF1b produces two polypeptides, pp1a and pp1ab. The main protease (Mpro), also known as the chymotrypsin-like protease (3CLpro), is involved in the processing of these polypeptides [8,9]. The other ORFs, near the 3′-terminus of the genome, encode the four main structural proteins: spike glycoprotein, membrane, envelope, and nucleocapsid proteins [10]. Genome analysis of SARS-CoV-2 revealed 79.5% and 97% similarity with the whole genome sequences of SARS-CoV and bat SARS-CoV, respectively (Chen et al., 2020). SARS-CoV-2 enters the host respiratory mucosa by binding to the angiotensin-converting enzyme 2 (ACE2) receptor with its spike glycoprotein [11]. A recent study has shown that SARS-CoV-2 binds ACE2 with 10-fold higher affinity than SARS-CoV [12]. The basic reproduction number (R0), which is the average number of secondary infections produced by a patient, is 2.47-2.86 for SARS-CoV-2, whereas it is 2.2-3.6 for SARS-CoV and 2.0-6.7 for MERS-CoV [13][14][15]. These results suggest that SARS-CoV-2 has a comparatively high transmission ability. Sequence analysis of the spike glycoproteins of SARS-CoV-2, SARS-CoV and other SARS-related coronaviruses (SARSr-CoV) showed that four amino acids are inserted at positions 681-684, between the S1 and S2 subunits of SARS-CoV-2 [16].
SARS-CoV ORF 3b, ORF 6, and N proteins inhibit the expression of beta interferon (IFN-β) [17]. The envelope (E) protein in coronavirus is a small membrane protein that has several functions in virion assembly and ion-channel activity, through which it can interact with the host [18].
With no specific vaccines or antiviral drugs available for SARS-CoV-2, science demands sincere efforts in the field of drug design and discovery for COVID-19. SARS has been present on this earth since 2002, but it created a dangerous, pandemic situation only after 18 years. Why? Why is this virus so dangerous to us? What are the basic differences between SARS-CoV-2 and SARS? How did evolution make it stronger than SARS? How does it gain stability in such extreme environments? To answer these questions, protein sequences of all 4 types (spike proteins, membrane proteins, nucleoproteins and ORF proteins) of SARS-CoV-2 and SARS were extracted from the database for analysis of their physicochemical, evolutionary and structural properties. To check their stability, salt bridges were also extracted together with their microenvironment residues. Salt bridges play an important role in protein stability, allowing proteins to carry out their physiological activity in extreme environments [19][20][21].

Materials And Methods

Dataset
A detailed analysis of the sequences and structure of SARS-CoV-2 was performed with reference to the earlier SARS. All protein sequences of SARS-CoV-2 and SARS were retrieved from the UniProt database [22]. We took 4 types of SARS-CoV-2 and SARS proteins, i.e. spike proteins, membrane proteins, nucleoproteins and ORF proteins. The crystal structures of SARS-CoV-2 and SARS proteins were retrieved from the RCSB Protein Data Bank (PDB) [23]. For the structural comparison, we took the protease, which is a membrane protein.

Physicochemical and evolutionary properties
All protein sequences were subjected to multiple sequence alignment with the help of Clustal Omega [24]. Both the block and non-block formats of the sequences were analyzed. Sequence blocks were prepared in BLOCK format using ABPT [25]. Both formats were analyzed with PHYSICO2 [26] to calculate physicochemical properties such as amino acid composition, GRAVY, Zimmerman bulkiness, Grantham polarity, aliphatic index etc. The program runs in a UNIX-like environment [27], including Cygwin, and is interpreted by the AWK programming language [28]. It takes various forms of FASTA [29] files as input. Mean relative abundance (MRA) was calculated from the mean value for SARS-CoV-2 proteins relative to that for SARS proteins in order to compare the data. Evolutionary properties such as maximum diverse residue (MDR), maximum conserved residue (MCR), dominant hetero pair (DHP), the ratio of non-conserved to conserved substitutions, E value etc. were calculated by APBEST [25].
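The composition and MRA calculation described above can be illustrated with a minimal Python sketch. It is not the PHYSICO2 implementation. The exact MRA formula is not stated in the text, so the definition used here, (mean in SARS-CoV-2 - mean in SARS) / mean in SARS, is an assumption chosen so that a positive value means higher abundance in SARS-CoV-2. The function names and the toy sequences are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
STANDARD = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the percentage of each standard amino acid in one sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in STANDARD)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD}


def mean_composition(seqs):
    """Average the per-sequence percentages over a set of sequences."""
    comps = [composition(s) for s in seqs]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in STANDARD}


def mra(mean_cov2, mean_sars):
    """ASSUMED definition of MRA: (mean_CoV2 - mean_SARS) / mean_SARS.

    Positive values mean the residue is more abundant in SARS-CoV-2.
    The value is None where the SARS mean is zero (ratio undefined).
    """
    result = {}
    for aa in STANDARD:
        if mean_sars[aa] == 0:
            result[aa] = None
        else:
            result[aa] = (mean_cov2[aa] - mean_sars[aa]) / mean_sars[aa]
    return result


if __name__ == "__main__":
    # Toy sequences for illustration only; not real viral proteins.
    cov2 = ["KKDEAL", "KRDEAL"]
    sars = ["KSDTAL", "KSDTAL"]
    values = mra(mean_composition(cov2), mean_composition(sars))
    for aa in "DEHRK":  # charged residues as defined in this study
        print(aa, values[aa])
```

Real input would be the FASTA sequences retrieved from UniProt; only the per-residue bookkeeping is shown here.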

Analysis of crystal structure
The SARS-CoV-2 protease (5R81) and the SARS protease (1P9U) were extracted from the RCSB PDB [23] for structural comparison. All structures were minimized by AUTOMINv.1 [30]. The core and surface compositions of the structures were identified by COSURIM [31]. The secondary structure was analyzed by PROPAB [32] to find the amino acid abundance in coil, helix and strand.

Analysis of salt bridges and their surrounding microenvironment
Salt bridges were extracted with their positions and types by SBION2 [33], and their energies were calculated by ADSBET2 [34]. Poisson-Boltzmann calculations were done with the help of APBS [35], PDB2PQR v1.9.0 [36] and the CHARMM22 [37] force field. Four energy terms were calculated in this analysis: desolvation (ΔΔG_dslv), bridge (ΔΔG_brd), background (ΔΔG_bac) and total net energy (ΔΔG_net). The NUM method [38] was used to calculate network salt bridge energies. Salt bridge residues are surrounded by charged, polar and hydrophobic residues, which might affect the salt bridge energies. These surrounding residues are referred to as microenvironment residues, a new concept in structural biology. The energies of the microenvironments were calculated by an in-house automated method [39].
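The extraction step can be sketched in Python as follows. This is not the SBION2 algorithm. The 4.0 Å distance cutoff between side-chain charged atoms is a commonly used criterion and is an assumption here, since the text does not state the cutoff. The split into isolated and network salt bridges follows the usual reading that a network bridge shares a residue with at least one other bridge. Function names, the atom tuple layout and the toy coordinates are hypothetical.

```python
import math
from collections import defaultdict

# Side-chain atoms that carry the charge (standard PDB atom names).
ACIDIC_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC_ATOMS = {"LYS": {"NZ"}, "ARG": {"NE", "NH1", "NH2"}, "HIS": {"ND1", "NE2"}}
CUTOFF = 4.0  # Angstrom; ASSUMED cutoff, not taken from the paper


def find_salt_bridges(atoms, cutoff=CUTOFF):
    """Return a set of ((acid_name, acid_id), (base_name, base_id)) pairs.

    atoms is a list of (resname, resid, atom_name, (x, y, z)) tuples.
    """
    acids = [(r, i, xyz) for r, i, a, xyz in atoms if a in ACIDIC_ATOMS.get(r, ())]
    bases = [(r, i, xyz) for r, i, a, xyz in atoms if a in BASIC_ATOMS.get(r, ())]
    pairs = set()
    for a_name, a_id, a_xyz in acids:
        for b_name, b_id, b_xyz in bases:
            if math.dist(a_xyz, b_xyz) <= cutoff:
                pairs.add(((a_name, a_id), (b_name, b_id)))
    return pairs


def classify(pairs):
    """Split residue pairs into isolated and network salt bridges."""
    degree = defaultdict(int)  # number of bridges each residue takes part in
    for acid, base in pairs:
        degree[acid] += 1
        degree[base] += 1
    isolated = {p for p in pairs if degree[p[0]] == 1 and degree[p[1]] == 1}
    return isolated, pairs - isolated


if __name__ == "__main__":
    # Toy coordinates for illustration only; not taken from a PDB entry.
    atoms = [
        ("ASP", 10, "OD1", (0.0, 0.0, 0.0)),
        ("LYS", 20, "NZ", (3.0, 0.0, 0.0)),
        ("GLU", 30, "OE1", (20.0, 0.0, 0.0)),
        ("ARG", 40, "NH1", (22.0, 0.0, 0.0)),
        ("ARG", 50, "NH2", (20.0, 3.0, 0.0)),
    ]
    isolated, network = classify(find_salt_bridges(atoms))
    print("isolated:", sorted(isolated))
    print("network:", sorted(network))
```

The energy terms themselves require Poisson-Boltzmann calculations (APBS) and are outside the scope of this sketch.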

Divergence in SARS-CoV-2
To identify differences at homologous positions between the SARS-CoV-2 and SARS sequences, we analyzed the BLOCK format [25] for evolutionary properties such as MCR, MDR, DHP and the R ratio.
For the maximum conserved residue (MCR) (Table 1), SARS-CoV-2 showed charged, uncharged polar and hydrophobic residues, whereas SARS showed hydrophobic and neutral polar residues. The maximum diverse residue (MDR) showed results similar to the MCR. The dominant hetero pair (DHP) also showed a higher abundance of charged residues in SARS-CoV-2, which is lacking in SARS. The R ratio (NCS:CS) was lower in every protein of SARS-CoV-2 than in SARS.

Effect of charged residues on SARS-CoV-2 sequence
Here, D, E, H, R and K were taken as charged residues, and C, S, T, N, Q, Y and W as polar uncharged residues. Amino acid compositions were calculated from the non-block format, whereas the block format was used to calculate disorder forming residues (Dis), Zimmerman bulkiness (ZB), aliphatic index (AI), Grantham polarity (GB), order forming residues (Ofr) etc. Is there a preference for particular amino acids in SARS-CoV-2 relative to SARS? To answer this, we calculated all these physicochemical properties.
Spike proteins showed a higher abundance of charged residues (except D) in SARS-CoV-2 (Fig. 1a). Polar residues (except C and S) showed a higher abundance in SARS. The other proteins (membrane proteins, nucleoproteins, ORF proteins) showed almost similar results, with charged residues showing positive MRA in SARS-CoV-2 (Fig. 1c, e, g). Polar and hydrophobic residues showed negative MRA in SARS-CoV-2, which means that those residues are more abundant in SARS proteins. However, when we checked the polarity of those proteins by Grantham polarity (Fig. 1b, d, f, h), it was higher in SARS-CoV-2 than in SARS. Disorder forming residues (Dis) are more abundant in SARS-CoV-2 than in SARS.
On the other hand, order forming residues (Ofr) showed negative MRA in SARS-CoV-2. Zimmerman bulkiness and the aliphatic index are also higher in SARS-CoV-2. GRAVY (grand average of hydropathy) is calculated by adding the hydropathy value [40] of each residue and dividing by the length of the protein sequence. The lower GRAVY value indicates the hydrophilic nature of SARS-CoV-2 (Table 2).
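The GRAVY definition given above, together with the aliphatic index, can be written as a short Python sketch. The hydropathy values are the Kyte-Doolittle scale, which is assumed to be the scale meant by [40]. The aliphatic index uses the commonly cited Ikai formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent; whether PHYSICO2 computes it in exactly this way is an assumption. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (assumed to be the scale cited as [40]).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Sum of residue hydropathy values divided by sequence length.

    Non-standard characters (gaps, X) are ignored.
    """
    values = [KYTE_DOOLITTLE[ch] for ch in seq.upper() if ch in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard residues")
    return sum(values) / len(values)


def aliphatic_index(seq):
    """Ikai aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    residues = [ch for ch in seq.upper() if ch in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    n = len(residues)

    def mole_percent(aa):
        return 100.0 * residues.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy peptides for illustration only.
    print(gravy("KDER"))            # charged residues give a negative GRAVY
    print(gravy("AVIL"))            # aliphatic residues give a positive GRAVY
    print(aliphatic_index("AVIL"))
```

A more negative GRAVY corresponds to a more hydrophilic sequence, which is the sense in which the lower GRAVY of SARS-CoV-2 is read in Table 2.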

Charged residues on surface of SARS-CoV-2
The increase in charged residues in SARS-CoV-2 protein sequences suggests that they might have an effect on structure. To check this, the protease structures from SARS-CoV-2 (5R81) and SARS (1P9U) were analyzed.
Acidic residues showed similar abundance in the core of SARS-CoV-2 and SARS, but higher abundance on the surface of SARS-CoV-2 (Table 3). Polar residues also showed higher abundance on the surface of SARS-CoV-2, whereas hydrophobic residues showed slightly lower abundance on the surface of SARS-CoV-2 than of SARS.

Charged residues in helix of SARS-CoV-2
Charged residues showed higher abundance in the helix of SARS-CoV-2 (Table 4). SARS-CoV-2 has significantly increased its stability by stabilizing its helix structure. In SARS, charged residues are mostly present in coil rather than helix. Polar residues are also present in high numbers in the coil of SARS, whereas in SARS-CoV-2 they are mostly present in strand. Hydrophobic residues also have higher abundance in the helix of SARS-CoV-2 and in the coil of SARS.

Stability of SARS-CoV-2
Salt bridges have a large effect on protein stability [41,42], and charged residues participate in their formation. Normally, two types of salt bridges are found in proteins [43]. The increased number of charged residues in SARS-CoV-2 suggests that they might contribute to salt bridge formation to gain more stability.
Results of salt bridges (Table 5)

Effect of microenvironment residues in SARS-CoV-2
Because SARS has a higher number of isolated salt bridges (ISB), it has more surrounding residues than SARS-CoV-2 (Table 6). However, the total energy of the ISB microenvironment is higher in SARS-CoV-2. SARS-CoV-2 has engineered its microenvironment for more stability. Thr 292 in SARS-CoV-2 contributes the highest energy, i.e. -24.39 kJ/mol. In the network salt bridge microenvironment, SARS-CoV-2 has a higher number of residues and higher energies. Most of the microenvironment residues are present in coil, followed by strand and helix.

Discussion
The comparative study between SARS-CoV-2 and SARS reveals how favorable evolution makes SARS-CoV-2 more dangerous and stronger than SARS. The R ratio (Table 1) indicates that their divergence must be restricted [44]. Although polar residues have lower abundance in SARS-CoV-2, its polarity is higher, which means that polarity increased in SARS-CoV-2 due to the higher abundance of charged residues, not polar residues (Fig. 1). Those acidic and basic residues play a major role in such alteration [39]. The higher number of disorder forming residues in SARS-CoV-2 indicates that it can easily create toxicity or disease in humans [45,46]. The high value of Zimmerman bulkiness in SARS-CoV-2 indicates that its proteins need longer heating periods in hydrolysis [47], so they can tolerate heat better than those of SARS. The high value of the aliphatic index in SARS-CoV-2 suggests that it is more thermally stable than SARS [48]. The hydrophilic nature of SARS-CoV-2 (Table 2) gives a clue that it can interact easily with water or an aqueous medium and spread more easily than SARS. SARS-CoV-2 increases hydrophilic residues (Table 3) on the surface to interact easily with the aqueous medium, which helps in its stabilization [49][50]. On the other hand, the helix has a great effect on protein stability. The introduction of a high number of charged residues in the helix makes proteins more resistant to acidic environments or temperature denaturation and helps to increase stability [51][52].
The high number of salt bridges contributes high energies to the stability of SARS-CoV-2 in extreme environments (Table 5). SARS-CoV-2 significantly increases its total salt bridge energy by lowering the desolvation cost. Formation of the cyclic salt bridge is also a great evolutionary step with respect to protein stability, since a cyclic salt bridge always makes a high contribution to protein stability [53]. SARS-CoV-2 has specially designed salt bridges to gain more stability in extreme environments [19][20][21].
The analysis of the microenvironment residues of those salt bridges revealed that they play an important role in salt bridge energies and overall protein stability (Table 6). The polar residues, and the charged residues that do not form any salt bridge, contribute to the microenvironment to gain more stability [39]. So, the evolution of SARS-CoV-2 has a great role in its stability. It is also a clue to how the severity of SARS-CoV-2 infection might be reduced. Protein engineering can help in this process: unfavorable mutation of those favorable microenvironment residues can decrease the stability of SARS-CoV-2. The concept of the microenvironment is the key factor in identifying those residues.

Figure 1: MRA of amino acid compositions, disorder forming residues (Dis), Zimmerman bulkiness (ZB), aliphatic index (AI), Grantham polarity (GB) and order forming residues (Ofr) of spike proteins (Fig. 1a, 1b), membrane proteins (Fig. 1c, 1d), nucleoproteins (Fig. 1e, 1f) and ORF proteins (Fig. 1g, 1h) from SARS-CoV-2.