Comparison of random and site directed mutation effects on the efficacy between lead SARS-CoV2 anti-protease drugs Indinavir and Hydroxychloroquine
CURRENT STATUS: POSTED

The diversification of viruses can be attributed to random mutations, which can lead to the development of drug resistance. These variations can be inherited from one generation to the next, rendering a drug ineffective. However, such pharmacologically induced selection pressure can be countered by introducing drugs that are better adapted to work under rapid mutation. In this study, we explore the effect of site-directed and random substitution mutations simulated in the ligand binding region of the SARS-CoV2 protease. Amongst six anti-protease drugs currently studied for COVID-19, Indinavir and Hydroxychloroquine were chosen based on their high binding affinity scores of -6.81 and -4.81, respectively. The effect of mutations on protein-ligand binding was analysed in two steps. First, analysis of over 90 homologous protease sequences and 100 SARS-CoV-2 orf1ab regions revealed unconserved residues in the ligand binding sites. Gly170 and Thr190 were identified and substituted with polar residues such as ARG and ASN and non-polar residues such as ALA and ILE. The resulting mutants were modelled, minimized and docked with Indinavir and Hydroxychloroquine. A higher binding affinity was observed for Indinavir, whereas Hydroxychloroquine showed less variance in binding affinity. These results were consistent for random mutations as well. Second, a Bio.SeqIO-based pipeline was built to simulate changes in the ligand binding site. Assuming that every position in the ligand binding region has an equal probability of mutation (a continuous uniform distribution over the region), 200 cycles of mutation were carried out in the nucleotide region corresponding to the ligand binding site. A paired t-test revealed a significant difference between the binding affinities of the mutant Indinavir-protease and Hydroxychloroquine-protease complexes.
Further, both the mean and the variance of the binding affinity were higher for the Indinavir-protease complexes, whereas Hydroxychloroquine displayed lower variance, pointing to a more consistent binding capability towards the mutants. Our study highlights the role of Hydroxychloroquine as a drug that can keep binding to an evolving SARS-CoV2 main protease.


Introduction
The first coronavirus infection was described as early as 1949, and coronaviruses have since been detected in a wide range of animal species, including humans [1]. Of the many known coronaviruses, only seven cause disease in humans, of which three cause severe respiratory infections [2]. SARS-CoV2, one of these three, has been identified as the cause of COVID-19, the disease that began in Wuhan, China.
The COVID-19 epidemic probably originated from a bat coronavirus that later acquired mutations in the spike glycoprotein, enabling its transmission to humans [3]. Rapid investigation is required to study the impact of mutation on potential drug targets. Retroviral genomes evolve rapidly, which helps them evade selective pressure from the host immune system that could otherwise decrease their chance of survival [4,5]. Viral recombination rate, replication rate, viral population size, selective forces and viral mutation rate are the main drivers of genetic variation [6]. The location of a mutation is crucial for understanding its effect on viral viability. Viral replication is mediated by proteins encoded in open reading frame 1 (ORF1), which comprises nearly two-thirds of the viral genome. Two polyproteins, ORF1a and ORF1ab, are translated from this region. These non-structural polyproteins are processed by the main protease and the papain-like protease, and this crucial role makes the viral protease a very promising target for drug discovery [7,8]. In the present study, we focus on the SARS-CoV2 protease. Variation in this protease can affect drug binding, leaving the host susceptible to infection despite medication [9]. Studies are currently being carried out to identify drugs capable of targeting it. The combination of Lopinavir and ritonavir, used as an anti-influenza and anti-HIV agent, has been suggested, with successful results [10]. Another study suggests the protease inhibitor Indinavir, which is currently used for HIV infections [11]. Non-specific Chloroquine analogs have been tested against a broad range of emerging viruses. However, no evidence of clinical benefit has been found for Chloroquine, and Hydroxychloroquine combined with azithromycin likewise shows no significant antiviral effect [12,13,14].
In the present study, we examine the effect of mutations on the ability of Indinavir and Hydroxychloroquine to bind the SARS-CoV2 protease. We use both the binding affinity and its variability to assess how robust each drug's effect is under site-directed and random mutations.

Protease Structural analysis
The SARS-CoV2 protease consists of two protomers that associate to form a dimer. Each protomer is composed of three domains. Domains I and II are six-stranded antiparallel β-barrels, and the substrate-binding site is located in a cleft between these two domains. Domain III, which consists of five α-helices, is connected to domains I and II through a long loop (Figure 1). Sequence analysis showed that 41.5% of the residues are hydrophobic, 8.5% acidic, 9.48% basic and 40.52% neutral.
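The composition figures above are class counts expressed as percentages of the sequence length. A minimal sketch of that arithmetic follows. It assumes a hypothetical residue grouping: the text does not say which classification was used, so the sets below are only one common convention and will not necessarily reproduce the reported percentages.

```python
from collections import Counter

# Hypothetical residue classes (one common convention); the grouping used
# in the study is not stated, so these sets are an assumption.
RESIDUE_CLASSES = {
    "hydrophobic": set("AVLIMFWPG"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "neutral": set("STNQCY"),
}


def composition(sequence: str) -> dict[str, float]:
    """Return the percentage of residues in each class, rounded to 2 decimals."""
    seq = sequence.upper()
    counts = Counter(seq)
    total = len(seq)
    result = {}
    for name, members in RESIDUE_CLASSES.items():
        n = sum(counts[aa] for aa in members)
        result[name] = round(100.0 * n / total, 2)
    return result


if __name__ == "__main__":
    # Toy fragment for illustration only, not the full protease sequence.
    print(composition("SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDD"))
```

Residues outside the 20 standard letters (for example `X`) are counted in the total but in no class, so the percentages would then sum to less than 100.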

Binding studies with retroviral protease drugs
For the minimized structure of the SARS-CoV2 protease (PE = -1.55498 × 10^6 kJ/mol), forty-five residues were predicted to be actively involved in ligand binding (

Site-directed mutation
To examine the effect of mutations in unconserved regions, the ConSurf server was applied to a multiple sequence alignment of 98 coronavirus and related polyprotein sequences, and it predicted 20 unconserved residues. However, the ligand binding site remained largely conserved. Amongst the 21 residues predicted to have a ligand binding role, Gly170 and Thr190 were directly involved in interactions with Indinavir and Hydroxychloroquine, respectively. Substituting GLY and THR with polar and non-polar residues produced variation both in the modelled structures, measured as RMSD, and in the resulting binding affinities. For the Indinavir complexes, RMSD ranged from 0.375 Å to 0.435 Å and binding affinity from -5.98 to -8.04. For the Hydroxychloroquine complexes, RMSD ranged from 0.376 Å to 0.404 Å and binding affinity from -5.86 to -6.90 (Supplementary file 3). The very low correlation between RMSD and binding affinity (r = -0.04 and 0.2) indicates that the structural effects of the mutations have a negligible influence on binding affinity. A paired t-test (α = 0.05) revealed a significant difference (p = 0.0016) between the binding energies of Indinavir and Hydroxychloroquine.
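The RMSD-affinity correlation and the paired t-test above follow standard formulas. The sketch below is a dependency-free illustration. It assumes that the two lists passed to `paired_t_test` hold binding energies of the same mutants docked with each ligand, in the same order. In practice `scipy.stats.pearsonr` and `scipy.stats.ttest_rel` compute these quantities directly. The function names and the sample values in the usage block are hypothetical and are not the study's data.

```python
import math
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def paired_t_test(a: list[float], b: list[float],
                  steps: int = 20000) -> tuple[float, float]:
    """Two-sided paired t-test on matched samples; returns (t, p).

    The p-value integrates the t density from 0 to |t| with Simpson's
    rule (steps must be even) and uses the symmetry of the distribution.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    df = n - 1
    h = abs(t) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    p = min(1.0, max(0.0, 1.0 - 2.0 * area))
    return t, p


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    rmsd = [0.38, 0.39, 0.41, 0.43, 0.40]
    indinavir = [-7.1, -6.5, -8.0, -6.9, -7.4]
    hcq = [-6.2, -6.0, -6.5, -6.1, -6.4]
    print("r(RMSD, Indinavir) =", pearson_r(rmsd, indinavir))
    print("paired t-test (t, p) =", paired_t_test(indinavir, hcq))
```

A zero standard deviation of the differences (identical shifts for every mutant) makes the t statistic undefined. That case raises `ZeroDivisionError` rather than returning a misleading value.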

Random mutation effect
General features of the SARS-CoV-2 complete genome (GenBank accession MN908947.3) were retrieved, and the corresponding protein information was tabulated (Tables 1 and 2).

Discussion
Viral proteases catalyse the cleavage of peptide bonds specific to viral polyprotein precursors [15].
Viral proteases also mediate assembly and disassembly: they process the polyproteins into capsid components, and the capsids disassemble upon entry into a newly infected cell [16]. Proteases are widely considered a potential target for the current novel coronavirus infection. Proteases generally recognize their substrates through a conserved shape, so a protease variant may still process its substrate while no longer binding inhibitors efficiently, thereby becoming resistant [17]. To examine inhibitor effects under mutation, a set of six major protease inhibitors was selected. These six compounds, Amprenavir, Atazanavir, Darunavir, Fosamprenavir, Indinavir and Hydroxychloroquine, were docked with the SARS-CoV-2 protease structure 6LU7, giving binding energies of -4.74, -3.27, -4.17, -3.26, -6.81 and -4.81, respectively. More residues were involved in hydrophobic interactions than in hydrogen bonds. Hydrophobic interactions are important contributors to ligand-receptor binding affinity, because the tendency of nonpolar molecular surfaces to interact with other nonpolar surfaces contributes significantly to binding [18]. The higher binding affinity of the Indinavir complex was therefore attributed mainly to these hydrophobic interactions.
These hydrophobic residues and their neighbours were analysed further. Alignment of homologous ORF1a polyprotein sequences from bat, human, Infectious bronchitis virus, Bottlenose dolphin coronavirus and other sources showed that very few residues in the ligand binding sites vary across species. Only two variable positions, Gly170 and Thr190, were observed amongst the residues involved in ligand binding, indicating a high degree of conservation in the viral protease. Nevertheless, these two variable residues were used to simulate changes in binding affinity upon substitution, with both polar and non-polar residues substituted at each position. The effects of these site-directed mutations were quantified structurally (RMSD) and biophysically (binding affinity). Since no direct correlation was found between the two, the binding affinity was analysed further. The site-directed mutations revealed a significant difference between the binding affinities of the Indinavir-protease and Hydroxychloroquine-protease complexes. Although Indinavir had a higher mean binding affinity, Hydroxychloroquine showed comparatively lower variance. This was further validated with 200 random protease mutants, where the difference in variance observed with the few site-directed mutations increased further. Interestingly, the mean binding affinity of Hydroxychloroquine increased.
Based on our observations, Hydroxychloroquine has less variance in binding affinity than Indinavir under the same set of mutations, while the mean binding affinity of Indinavir is only 0.7 higher than that of Hydroxychloroquine. Given a similar binding affinity profile and comparatively lower variance, Hydroxychloroquine can be assumed to be more stable under protease mutation. Previous studies using Hydroxychloroquine together with Azithromycin have provided no strong evidence of protease inhibition [19]. A few studies have used Zn2+, which alters RNA polymerase activity, to inhibit coronaviruses [20]. This potential inhibitor is being tried together with Hydroxychloroquine in a few countries, although no scientific reference has yet been produced. Besides Hydroxychloroquine, Indinavir, an anti-protease drug used for HIV infections, has also been tried. This drug has shown a high degree of specificity to HIV protease [21]. In this study, we have compared the binding affinities of two prominent drugs currently being tried worldwide.
Indinavir and Hydroxychloroquine bind to the SARS-CoV-2 protease with different binding affinities, but under mutation the latter appears more stable. Based on these data, we propose Hydroxychloroquine as a drug that remains stable under rapid mutation.

Protease Structural analysis
The X-ray crystal structure of the SARS-CoV-2 protease (PDB ID: 6LU7) was retrieved from the RCSB PDB database [22]. The PyMOL visualisation tool [23] was used for structure analysis.

Docking studies
Prior to further analysis, the protein structure was energy-minimised using the GROMOS96 54a7 force field in the Gromacs 5.1 package [24]. To identify the ligand binding sites, residues interacting with the bound ligand in the PDB structure were identified. In addition, the online tool COACH [25][26] was used to obtain the highest-ranked ligand binding pockets. The top-ranked site was then chosen for analysing the binding efficiency of six major ligands. These six compounds, Amprenavir, Atazanavir, Darunavir, Fosamprenavir, Indinavir and Hydroxychloroquine, were chosen based on a literature survey of recent studies searching for suitable protease inhibitor candidates (Supplementary figure 1). Molecular docking was performed using Autodock v.1.5.6 [27]. Kollman charges were assigned to the protein, and a maximum of 6 torsions was allowed for the ligands. The grid box was set based on the ligand binding residues ranging from THR25 to GLN192. Docking was initialised using the Genetic Algorithm with 100 runs. Besides binding affinity, comparative values for total energy, H-bond energy and other energy terms were also computed. The residues taking part in hydrophobic interactions and H-bonds were identified using the PoseView online tool [28].
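Setting a grid box from binding-site residues usually means taking the bounding box of their atoms, adding a margin, and converting the size into grid points. The sketch below shows one way to do that. It assumes that the atom coordinates (in Å) of the residues from THR25 to GLN192 have already been extracted from the structure. The 4 Å padding is an arbitrary illustrative choice, and 0.375 Å is AutoDock's default grid spacing. The helper name `grid_box` is hypothetical, and this is not necessarily how the box in the study was chosen.

```python
import math


def grid_box(coords: list[tuple[float, float, float]],
             padding: float = 4.0,
             spacing: float = 0.375) -> tuple[tuple[float, ...], tuple[int, ...]]:
    """Return (center, npts) of a grid box enclosing the given atoms.

    center: midpoint of the padded bounding box on each axis (Å).
    npts:   grid points per axis, rounded up to an even number, since
            AutoGrid expects even point counts.
    """
    center, npts = [], []
    for axis in range(3):
        values = [c[axis] for c in coords]
        lo, hi = min(values) - padding, max(values) + padding
        center.append(round((lo + hi) / 2, 3))
        n = math.ceil((hi - lo) / spacing)
        npts.append(n + n % 2)  # bump odd counts to the next even number
    return tuple(center), tuple(npts)


if __name__ == "__main__":
    # Hypothetical atom coordinates standing in for the binding-site residues.
    site_atoms = [(-12.1, 11.5, 68.9), (-8.4, 16.2, 72.3), (-15.0, 19.8, 66.1)]
    print(grid_box(site_atoms))
```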

Conserved versus variable region in the ligand binding region
To examine the effect of mutations in unconserved regions, the ConSurf server [29] was first used to identify variable regions based on a multiple sequence alignment of 98 homologous coronavirus and other similar polyprotein sequences. Protease genes from the orf1ab region were downloaded from 100 complete SARS-CoV-2 genomes using NCBI [30]. These were aligned and analysed using MEGA [31].
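ConSurf scores conservation from phylogeny-based evolutionary rates (Rate4Site), which is beyond a short example. Per-column Shannon entropy is a simpler, different technique that also flags variable alignment positions. The sketch below illustrates the idea on an alignment that is assumed to be already loaded as equal-length strings. The function names, the choice to ignore gaps, and the 0.5-bit threshold are all illustrative assumptions.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def variable_positions(alignment: list[str], threshold: float = 0.5) -> list[int]:
    """Return 1-based alignment columns whose entropy exceeds the threshold."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return [i + 1 for i in range(length)
            if column_entropy("".join(seq[i] for seq in alignment)) > threshold]


if __name__ == "__main__":
    # Toy alignment for illustration only.
    print(variable_positions(["GAT", "GCT", "GAT", "GTT"]))
```

Alignment columns are not residue numbers. To map a flagged column onto a position such as Gly170, count only the non-gap characters of the reference sequence up to that column.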

Site directed In-silico mutation
Based on the homologous protein analysis, site-directed mutagenesis was carried out on the variable residues. GLY170 and THR190 were replaced with polar residues such as ARG and ASN and non-polar residues such as ALA and ILE. In total, 13 residues were used to observe the effect of mutation. The mutant proteins were modelled, and their RMSD was evaluated with respect to the wild type.
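At the sequence level, each site-directed substitution is a single-character change at the chosen position before the mutant is modelled. The sketch below generates those mutant sequences. It assumes that the sequence string is numbered from residue 1, so that position 170 refers to Gly170. The function name and the substitute list are illustrative; the text names ARG, ASN, ALA and ILE among the 13 residues used. Structure modelling and RMSD evaluation are done with external tools and are not shown.

```python
def point_mutants(sequence: str, position: int, wild_type: str,
                  substitutes: str) -> dict[str, str]:
    """Return {label: mutant_sequence} for single substitutions at one site.

    position is 1-based, matching residue numbering such as Gly170.
    Labels follow the usual wild-type/position/mutant form, e.g. "G170A".
    """
    index = position - 1
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {sequence[index]}")
    mutants = {}
    for aa in substitutes:
        if aa == wild_type:
            continue  # skip the silent "substitution"
        label = f"{wild_type}{position}{aa}"
        mutants[label] = sequence[:index] + aa + sequence[index + 1:]
    return mutants


if __name__ == "__main__":
    # Toy sequence for illustration; with the real protease sequence one
    # would call point_mutants(seq, 170, "G", "RNAI") and point_mutants(seq, 190, "T", "RNAI").
    print(point_mutants("MGTA", 2, "G", "RNAI"))
```

Checking the wild-type residue first guards against off-by-one numbering, which is the most common way such substitutions land on the wrong residue.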
Indinavir and Hydroxychloroquine were docked against the generated mutant proteases based on the above protocol.

Random mutation effect
Nucleotides that code for the ligand binding sites of the protease were subjected to random changes.
A custom pipeline was written for this simulation. The start and end positions of the mutational site, along with the mutation rate, were defined from an evolutionary analysis of similar sequences: the proteases were aligned and the variable region was computed based on the above analysis (section). Mutational sites are chosen from a uniform distribution, and the substituted nucleotide is chosen at random. The pipeline was built with the Bio.SeqIO interface of Biopython. Its inputs are a FASTA file of the organism's nucleotide sequence, the start (p1) and end (pn) nucleotide positions of the active site, and the mutation rate (μ). The selected mutational sites are defined as (pn - p1)/μ + σ and (pn - p1)/μ - σ, where σ is the standard deviation of the uniform distribution. At each of these mutational sites, A/T/G/C are substituted to yield a mutated variant of the input protease. The simulated sequence was then translated into a protein sequence to study the resulting changes in the protein. In this way, 200 models were generated and then energy-minimized using the steepest-descent algorithm. Indinavir and Hydroxychloroquine were then docked with these 200 structures. Pearson correlation was computed to analyse the relationship between RMSD changes and binding affinity for the Indinavir and Hydroxychloroquine complexes. Means and variances were computed, and a t-test was performed to evaluate whether the binding affinities of the two ligands differ significantly.
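The mutation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. The text reads the FASTA input with Bio.SeqIO (for example `SeqIO.read(path, "fasta")`), and Biopython's `Seq.translate()` could handle translation. So that it runs without Biopython, the sketch replaces both with a small standard-library codon table. It also reads μ as a per-nucleotide rate, so that about (pn - p1 + 1)·μ sites change per cycle. That is an interpretation and does not reproduce the expression given in the text. Finally, each chosen site always receives a base different from the original. All function names are hypothetical.

```python
import random

BASES = "TCAG"
# Standard genetic code in TCAG order: codon index = 16*b1 + 4*b2 + b3.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna: str) -> str:
    """Translate codon by codon; stops appear as '*', unknown codons as 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - len(dna) % 3, 3))


def mutate_region(dna: str, p1: int, pn: int, mu: float,
                  rng: random.Random) -> str:
    """Substitute random nucleotides at uniformly chosen sites in [p1, pn].

    p1 and pn are 1-based inclusive positions. mu is assumed to be a
    per-nucleotide rate; at least one site is always mutated.
    """
    region = range(p1 - 1, pn)
    n_sites = max(1, round(len(region) * mu))
    seq = list(dna.upper())
    for site in rng.sample(region, n_sites):  # uniform, without replacement
        # Assumption: the new base always differs from the original one.
        seq[site] = rng.choice([b for b in "ACGT" if b != seq[site]])
    return "".join(seq)


def simulate(dna: str, p1: int, pn: int, mu: float,
             cycles: int = 200, seed: int = 0) -> list[str]:
    """Return translated protein sequences for `cycles` random mutants."""
    rng = random.Random(seed)
    return [translate(mutate_region(dna, p1, pn, mu, rng))
            for _ in range(cycles)]


if __name__ == "__main__":
    # Hypothetical coding sequence and site bounds, for illustration only.
    demo = "ATGAGTGGTTTTAGAAAAATGGCATTCCCATCTGGTAAAGTTGAG"
    for protein in simulate(demo, 10, 30, 0.1, cycles=3, seed=42):
        print(protein)
```

Seeding a private `random.Random` instance keeps the 200 cycles reproducible without touching the global random state.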

Acknowledgement
The authors thank the SRM Supercomputer Centre, SRM Institute of Science and Technology, for providing the computational facility.

Tables
Table 1: Genome
The active site, depicted in blue, lies at the intersection of two domains.

Supplementary Files
This is a list of supplementary files associated with this preprint.