In-silico study of SARS-CoV-2 and SARS with special reference to intra-protein interactions: a plausible explanation for the stability, divergence and severity of SARS-CoV-2

COVID-19 is currently a nightmare for the whole world. A cluster of pneumonia cases in Wuhan city, Hubei province, China, was first reported on December 30, 2019. SARS-CoV first emerged in 2002 but did not spread worldwide. Eighteen years later, in 2020, a related coronavirus, SARS-CoV-2, the cause of COVID-19, emerged and spread worldwide as one of the most dangerous viruses in the world. Can favorable evolution occur within such a short time (18 years)? If so, which properties or factors changed in SARS-CoV-2 to make it so hard to defeat? What are the fundamental differences between SARS-CoV-2 and SARS? This study addresses these questions. Here, sequences of four types of proteins from SARS-CoV-2 and SARS were retrieved from databases to compare their physicochemical and structural properties. The results showed that charged residues play a pivotal role in SARS-CoV-2 evolution. These charged residues also contribute to helix stabilization in SARS-CoV-2. The formation of a cyclic salt bridge and other intra-protein interactions also plays a crucial role in SARS-CoV-2. This comparative study will help in understanding the evolution from SARS to SARS-CoV-2 and may also assist protein engineering.

and over 1.3 million deaths (mortality rate around 2.40%) have been reported across 218 countries. On March 11, 2020, the World Health Organization declared COVID-19 a pandemic. People of all ages can contract this viral infection, but immunocompromised older people with comorbidities are the most vulnerable. Older people and males with chronic diseases (e.g., diabetes, heart disease, cancer) are more susceptible than other groups 1. The virus is easily transmitted through droplets generated when infected people cough or sneeze 2. These infectious droplets can travel up to 1-2 meters and settle on surfaces.
The virus can survive on metal surfaces for several hours or even days under favorable conditions, but it can be inactivated by disinfectants such as hydrogen peroxide and sodium hypochlorite 3. The incubation period ranges from 2 to 14 days. Common clinical symptoms include fever (absent in asymptomatic cases), dry cough, sore throat, fatigue, headache, breathlessness, and sudden loss of smell and taste. Without proper treatment, the disease can cause pneumonia, respiratory failure and even death. Recovery generally begins after about one week. In patients, disease progression has been associated with increased release of cytokines, including interleukin (IL)-6 and IL-10, and with reduced levels of CD4+ T and CD8+ T cells 4. As of now, there is no approved treatment specific to COVID-19, but drugs such as the antiviral remdesivir and the IL-6 receptor antibody tocilizumab are in use 5.

Coronaviruses are enveloped viruses with a positive-sense, single-stranded RNA genome and spike proteins on their surface; the virions are 60 nm to 140 nm in size 6. There are four genera of coronavirus: alpha, beta, gamma and delta. Highly pathogenic viruses such as severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV) and SARS-CoV-2 are all β-coronaviruses 7. Generally, the β-coronavirus genome contains six open reading frames (ORFs); the first ORFs (ORF1a/b) occupy about two-thirds of the genome and encode 16 nonstructural proteins (nsps). A ribosomal frameshift between ORF1a and ORF1b produces two polypeptides, pp1a and pp1ab. These polypeptides are processed by the main protease (Mpro), also known as the 3C-like or chymotrypsin-like protease (3CLpro), together with the papain-like protease (PLpro) [8][9]. The other ORFs, near the 3′ terminus, encode the four main structural proteins: spike glycoprotein, membrane, envelope and nucleocapsid proteins 10. Genome analysis revealed that SARS-CoV-2 shares 79.5% and 97% whole-genome sequence similarity with SARS-CoV and bat SARS-CoV, respectively (Chen et al., 2020). SARS-CoV-2 enters the host respiratory mucosa by binding its spike glycoprotein to the angiotensin-converting enzyme 2 (ACE2) receptor 11. A recent study has shown that SARS-CoV-2 binds ACE2 with 10-fold higher affinity than SARS-CoV 12. The basic reproduction number (R0), the average number of secondary infections produced by one infected person, is 2.47-2.86 for SARS-CoV-2, compared with 2.2-3.6 for SARS-CoV and 2.0-6.7 for MERS-CoV [13][14][15]. These results indicate that SARS-CoV-2 has a comparatively high transmission ability, although its R0 range overlaps with those of SARS-CoV and MERS-CoV.
Sequence analysis of the spike glycoproteins of SARS-CoV-2, SARS-CoV and other SARS-related coronaviruses (SARSr-CoV) showed that four amino acids are inserted at positions 681-684, between the S1 and S2 subunits of SARS-CoV-2 16. The SARS-CoV ORF3b, ORF6 and N proteins inhibit the expression of interferon beta (IFN-β) 17. The envelope (E) protein of coronaviruses is a small membrane protein with several functions, including virion assembly and ion-channel activity, through which it can interact with the host 18.
In the absence of specific vaccines and antiviral drugs for nCoV, sincere efforts are needed in drug design and discovery for COVID-19. SARS has been known since 2002, yet a dangerous pandemic arose only 18 years later. Why? Why is this virus so harmful to us? What are the basic differences between SARS-CoV-2 and SARS? How did evolution make SARS-CoV-2 stronger than SARS? How does it gain stability in extreme environments? Salt bridges and other intra-protein interactions play an essential role in the stability that allows proteins to carry out their physiological activity in extreme environments [19][20][21]. Do they play a vital role in SARS-CoV-2? To answer these questions, sequences of four protein types (spike proteins, membrane proteins, nucleoproteins and ORF proteins) of SARS-CoV-2 and SARS were extracted from databases for physicochemical and structural analysis. To assess their stability, salt bridges and other stabilizing factors were also determined.

Materials and Methods
Dataset
A detailed investigation of the sequences and structures of SARS-CoV-2 proteins was performed with reference to SARS. Four types of SARS-CoV-2 and SARS proteins were used: spike proteins, membrane proteins, nucleoproteins and ORF proteins (ORF3, ORF6, ORF7, ORF8 and ORF9). All protein sequences of SARS-CoV-2 and SARS were retrieved from the UniProt 22 database (Table 1). Crystal structures of SARS-CoV-2 and SARS proteins were retrieved from the RCSB Protein Data Bank (PDB) 23. For structural comparison, we used the protease structure because the protease is heavily used as a target in drug discovery: the crystal structures of SARS-CoV-2 protease (5R80) and SARS protease (2H2Z) were obtained from RCSB PDB. All structures were energy-minimized for 1000 steps using UCSF Chimera 37. Secondary structure was analyzed with CFSSP 38 to determine amino acid abundance in coil, helix, sheet and turn.
The number of salt bridges was determined with the WHAT IF server 39. Intra-protein interactions were determined with the PIC server 40 and the Arpeggio server 41. Solvation free energy was calculated with the ProWaVE server 42. Surface area and volume were determined with the CASTp server 43. Protein phosphorylation sites were identified with the NetPhos server 44. Protein mutations were analyzed with DUET 45.

Results
Effect of charged residues on the SARS-CoV-2 sequence
Here, D, E, H, R and K were taken as charged residues, and C, S, T, N, Q, Y and W as uncharged polar residues.
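As an illustration only (not the authors' pipeline), the residue classes above can be turned into composition percentages of the kind compared in Fig. 1. The Python sketch below assumes plain one-letter sequences; the function names and the toy fragment are hypothetical and do not come from the study.

```python
from collections import Counter

# Residue classes as defined in the text (one-letter codes).
CHARGED = set("DEHRK")
POLAR_UNCHARGED = set("CSTNQYW")


def residue_percentages(seq: str) -> dict[str, float]:
    """Percentage of each amino acid letter in a one-letter sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def class_percentage(seq: str, residue_class: set[str]) -> float:
    """Percentage of residues in seq that belong to residue_class."""
    seq = seq.upper()
    return 100.0 * sum(aa in residue_class for aa in seq) / len(seq)


if __name__ == "__main__":
    # Toy fragment for illustration; not a real SARS or SARS-CoV-2 sequence.
    fragment = "MDEKRHSTAW"
    print(class_percentage(fragment, CHARGED))          # 50.0
    print(class_percentage(fragment, POLAR_UNCHARGED))  # 30.0
```

Applying the same functions to the SARS and SARS-CoV-2 sequences of each protein type gives directly comparable per-residue and per-class abundances.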
Amino acid compositions were calculated from the non-block format, whereas the block format was used to calculate disorder-forming residues (Dis), order-forming residues (Ofr), bulkiness, aliphatic index (AI) and polarity. GRAVY (grand average of hydropathy) is calculated by adding the hydropathy values 46 of all residues and dividing by the length of the protein sequence. Is there a preference for particular amino acids in SARS-CoV-2 relative to SARS? To answer this, we calculated all of these physicochemical properties.
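The GRAVY definition above, and the Kyte-Doolittle hydropathy profile discussed later in the Results, can both be computed from one hydropathy table. This minimal Python sketch assumes that the cited hydropathy values are the Kyte-Doolittle scale and uses an unweighted 9-residue window as an assumed default; the window actually used for the study's plots is not stated, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: summed KD values / sequence length.

    Non-standard letters (e.g. X) raise KeyError rather than being skipped.
    """
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Unweighted sliding-window mean of KD values (one value per full window)."""
    values = [KD[aa] for aa in seq.upper()]
    if window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    print(round(gravy("AILV"), 3))  # 3.575
    print(round(gravy("DEKR"), 3))  # -3.85
```

A negative GRAVY, like the one for the charged toy fragment "DEKR", is what the text interprets as hydrophilic character.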
SARS-CoV-2 spike proteins showed a higher abundance (Fig. 1) of charged residues (except D). Polar residues were also more abundant (except T and W) in SARS-CoV-2. In SARS-CoV-2 nucleoproteins, the charged residues D, K and R were more abundant, whereas E and H were less abundant. Polar residues in nucleoproteins were also more abundant (except T and N) in SARS-CoV-2. Surprisingly, C is absent from the nucleoproteins of both viruses. The other proteins, i.e. membrane proteins and ORF proteins, showed abundance patterns similar to these results. Disorder-forming residues are more abundant in SARS-CoV-2 than in SARS, whereas order-forming residues are less abundant. The higher number of disorder-forming residues in SARS-CoV-2 suggests that it may more easily cause toxicity or disease in humans. The lower GRAVY values (except for nucleoproteins) indicate the hydrophilic nature of SARS-CoV-2 proteins, so they may mix more readily with aqueous media. The aliphatic index is higher in every SARS-CoV-2 protein, which suggests that SARS-CoV-2 proteins are more thermally stable than SARS proteins 47.
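The aliphatic index mentioned above is commonly computed with Ikai's formula, which estimates the relative volume occupied by aliphatic side chains. A minimal Python sketch, assuming that definition (the text does not state which implementation was used); the function name is hypothetical.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index after Ikai (1980).

    AI = X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu)), where X(.) is the mole
    percent of each residue and a = 2.9, b = 3.9 are the side-chain volumes of
    Val and of Ile/Leu relative to Ala.
    """
    seq = seq.upper()
    mole_pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


if __name__ == "__main__":
    print(round(aliphatic_index("AVIL"), 1))  # 292.5
    print(aliphatic_index("GGGG"))            # 0.0
```

Because the weights grow with side-chain size, substitutions toward Val, Ile or Leu raise the index more than substitutions toward Ala.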
When we checked the polarity of these proteins, SARS-CoV-2 proteins showed slightly higher values than SARS proteins (Fig. 2). Bulkiness is likewise higher in SARS-CoV-2 than in SARS. The higher bulkiness of SARS-CoV-2 proteins indicates that they need longer heating periods during hydrolysis 48, suggesting that they tolerate heat better than SARS proteins. The Kyte-Doolittle hydrophobicity scale indicates that SARS-CoV-2 proteins are hydrophilic (Fig. 3). This hydrophilic nature suggests that SARS-CoV-2 can interact more readily with water or aqueous media and spread more easily than SARS [49][50]. Intrinsically disordered regions are much more abundant in SARS-CoV-2 than in SARS, which indicates that SARS-CoV-2 proteins interact more with other proteins than SARS proteins do 51-52.
Abundance of charged residues in the helices of SARS-CoV-2
Amino acids, the building blocks of proteins, occur in four secondary-structure states: coil, helix, sheet and turn. Charged residues were more abundant in every state (turn, helix, coil and sheet) of SARS-CoV-2 than of SARS (Table 2). In both proteins, charged residues are most abundant within helices. Introducing a high number of charged residues into a helix makes proteins more resistant to acidic environments or thermal denaturation and helps increase stability [53][54]. Hydrophobic residues are more abundant in SARS than in SARS-CoV-2 (except in coil), whereas polar residues are more abundant in SARS-CoV-2.
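The per-state comparison of charged residues (Table 2) amounts to tallying residue classes for each predicted secondary-structure state. The sketch below assumes the prediction is available as a string with one state letter per residue (H helix, E sheet, T turn, C coil); that encoding and the function name are hypothetical and not CFSSP's documented output format.

```python
from collections import Counter

CHARGED = set("DEHRK")  # as defined in the Results section
# Assumed one-letter state code: H helix, E sheet, T turn, C coil.
STATES = {"H": "helix", "E": "sheet", "T": "turn", "C": "coil"}


def charged_by_state(seq: str, ss: str) -> dict[str, int]:
    """Count charged residues in each secondary-structure state.

    seq and ss must be aligned position by position (one state letter per residue).
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and state string must have the same length")
    counts = Counter(STATES[state]
                     for aa, state in zip(seq.upper(), ss.upper())
                     if aa in CHARGED)
    # Report every state, including those with no charged residues.
    return {name: counts[name] for name in STATES.values()}


if __name__ == "__main__":
    # Toy sequence and state string, for illustration only.
    print(charged_by_state("AEKLRDGSTV", "HHHHHCCTTE"))
    # {'helix': 3, 'sheet': 0, 'turn': 0, 'coil': 1}
```

When proteins of different lengths are compared, these counts would normally be divided by the number of residues assigned to each state.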

Effect of intra-protein interactions on the stability of SARS-CoV-2
Salt bridges have a significant effect on protein stability [55][56][57][58]. Charged residues participate in the formation of salt bridges. Two types of salt bridges are normally found in proteins: isolated salt bridges, in which one acidic and one basic residue pair only with each other, and network salt bridges, in which at least one residue pairs with more than one partner. The increased number of charged residues in SARS-CoV-2 suggests that they may form additional salt bridges that confer greater stability. Other intra-protein interactions, such as metal ion binding sites 59 and aromatic-aromatic interactions [60][61], also help stabilize proteins.
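To make the isolated/network distinction concrete, the sketch below detects salt bridges from side-chain atom coordinates and then groups them with a graph search. It assumes a simple criterion (a carboxyl oxygen of Asp/Glu within 4.0 Å of a side-chain nitrogen of Lys/Arg/His) that is common in the literature but is not necessarily the rule applied by the WHAT IF server. Parsing of PDB files is omitted, and the tuple layout and function names are hypothetical.

```python
import math
from itertools import product

# Assumed criterion for illustration (not necessarily WHAT IF's exact rule).
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2"),
         ("HIS", "ND1"), ("HIS", "NE2")}
CUTOFF = 4.0  # Angstrom


def salt_bridges(atoms):
    """atoms: iterable of (residue_id, resname, atomname, (x, y, z)).

    Returns sorted (acidic_residue, basic_residue) pairs; several atom
    contacts between the same two residues count as one bridge.
    """
    atoms = list(atoms)
    acidic = [a for a in atoms if (a[1], a[2]) in ACIDIC]
    basic = [a for a in atoms if (a[1], a[2]) in BASIC]
    pairs = {(acid[0], base[0]) for acid, base in product(acidic, basic)
             if math.dist(acid[3], base[3]) <= CUTOFF}
    return sorted(pairs)


def classify(pairs):
    """Split bridges into isolated ones (a component of exactly two residues)
    and networks (a component of three or more residues)."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, isolated, networks = set(), [], []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        (isolated if len(component) == 2 else networks).append(sorted(component))
    return isolated, networks
```

With this grouping, counts such as "7 isolated and 1 network salt bridges" correspond to the lengths of the two returned lists.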
SARS-CoV-2 has a larger pocket area than SARS (Fig. 4), which gives it more possibilities for protein-protein or protein-ligand interactions (Table 3). Protein volume is also larger in SARS-CoV-2 than in SARS. The SARS-CoV-2 protease has 7 isolated salt bridges and 1 network salt bridge, whereas the SARS protease has 5 isolated and 1 network salt bridge. This indicates that SARS-CoV-2 is more strongly stabilized by salt bridges. SARS-CoV-2 also has more metal ion binding sites than SARS. Solvation free energy is a thermodynamic quantity that governs protein solvation and denaturation 62; it indicates how readily a protein denatures. The solvation free energy is higher in SARS-CoV-2 than in SARS, indicating that the SARS-CoV-2 protein does not denature easily in contact with solvent. Aromatic-aromatic interactions are more numerous in SARS-CoV-2 than in SARS (Table 4). Beyond their number, the residues participating in aromatic-aromatic interactions in SARS-CoV-2 (Phe8, Tyr37, Phe103, Tyr101, Phe150, Phe159) form a very long network, which to our knowledge has not been reported in previous proteomics research.
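The aromatic network described above can be sketched as follows: compute each ring centroid, link residues whose centroids fall within a distance window, and merge linked residues into networks with a union-find structure. The 4.5-7.0 Å window is an assumption modeled on the centroid criterion reportedly used by the PIC server, and the residue labels and coordinates in the usage are toy values, not the protease structure.

```python
import math
from itertools import combinations

# Assumed geometric window for an aromatic-aromatic contact (Angstrom).
MIN_DIST, MAX_DIST = 4.5, 7.0


def centroid(points):
    """Mean x, y, z of the ring atoms."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def aromatic_pairs(rings):
    """rings: dict mapping a residue label (e.g. 'F8') to ring-atom coordinates.

    Returns sorted label pairs whose ring centroids lie within the window.
    """
    cents = {rid: centroid(pts) for rid, pts in rings.items()}
    return sorted(
        (a, b) for a, b in combinations(sorted(cents), 2)
        if MIN_DIST <= math.dist(cents[a], cents[b]) <= MAX_DIST
    )


def networks(pairs):
    """Group interacting residues into connected components with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())
```

Labels are compared as strings, so "F103" sorts before "F8"; this affects only the ordering of the output, not which residues are grouped.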
SARS-CoV-2 has 9 isolated and 1 network aromatic-aromatic interactions, whereas SARS has only 9 isolated aromatic-aromatic interactions. SARS-CoV-2 has 54 phosphorylation sites (Fig. 5), compared with 45 in SARS. The higher number of phosphorylation sites in SARS-CoV-2 may increase the strength of protein-protein interactions and also aid stability 63.

SARS-CoV-2 has a cyclic salt bridge
Proteins generally have two types of salt bridges: isolated and network salt bridges. Both proteases have only one network salt bridge, but SARS-CoV-2 has a special cyclic salt bridge (Fig. 6, Table 5). Residue 35, a threonine in SARS, is substituted by valine in SARS-CoV-2; this mutation contributes a large energy (-2.24 kcal/mol) to protein stability. Through these specific point mutations, SARS-CoV-2 ultimately gains -7.46 kcal/mol, which makes it more stable than SARS.
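The text does not describe how the cyclic salt bridge was identified. One way to flag such a motif, sketched here as an assumption, is to treat residues as nodes and salt bridges as edges and to test each connected component for a cycle: a connected graph with V nodes and no cycle has exactly V - 1 edges. Because salt bridges only join acidic to basic residues, the graph is bipartite, so any cycle involves an even number of residues, at least four. The function name and residue labels are hypothetical.

```python
def cyclic_components(pairs):
    """Return salt-bridge components (sorted residue lists) that contain a cycle.

    A connected simple graph with V vertices is a tree exactly when it has
    V - 1 edges, so a component with E >= V edges must contain a cycle.
    """
    # Normalize orientation and drop duplicate or self pairs.
    edges = {tuple(sorted(p)) for p in pairs if p[0] != p[1]}
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, cyclic = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        n_edges = sum(len(graph[v]) for v in component) // 2
        if n_edges >= len(component):
            cyclic.append(sorted(component))
    return cyclic


if __name__ == "__main__":
    # Toy bridges: a four-residue ring, one isolated bridge, one open chain.
    bridges = [("D1", "K2"), ("K2", "E3"), ("E3", "R4"), ("R4", "D1"),
               ("D10", "K11"), ("D20", "K21"), ("K21", "E22")]
    print(cyclic_components(bridges))  # [['D1', 'E3', 'K2', 'R4']]
```

Under this test, an open network such as D20-K21-E22 is a network salt bridge but not a cyclic one.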

Conclusion
The comparative study between SARS-CoV-2 and SARS reveals how favorable evolution has made SARS-CoV-2 more dangerous and stronger than SARS. Acidic and basic residues play a major role in this evolution. Charged residues are also present in helices, where they increase protein stability. The long network of aromatic-aromatic interactions also contributes to stability. To our knowledge, this is the first report of a cyclic salt bridge and of a long aromatic-aromatic interaction network in structural biology. Increased numbers of metal ion binding sites and phosphorylation sites also play a crucial role in SARS-CoV-2 protein stability. Thus, evolution has contributed greatly to the stability of SARS-CoV-2. These point mutations show how SARS-CoV-2 has adapted to gain stability, and they also offer a clue to how the severity of SARS-CoV-2 infection might be reduced, an effort in which protein engineering can help. This study will also be beneficial for drug and vaccine development against SARS-CoV-2.