In Silico Induction of Missense Mutation in NNRTI Protein: Computational Modelling Studies on Design of Modelled Proteins and their Stability Studies.

doi:10.21203/rs.3.rs-3090755/v1

Research Article

https://doi.org/10.21203/rs.3.rs-3090755/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 28 Nov, 2023

Read the published version in Journal of Mathematical Chemistry →

You are reading this latest preprint version

The work presents in silico mutational studies on the energetics of the HIV-1 reverse transcriptase protein 4G1Q, the highest-resolution protein structure of NNRTIs of HIV-1. In silico mutations are induced on the twenty neighbouring residues surrounding the embedded ligand, within 6 Å of the centre of the ligand. Each of these 20 residues is mutated to the 19 other amino acids, giving a set of 380 novel proteins designed in silico for the present study. The effects of mutation on the change in folding-unfolding free energy (ΔΔG), protein stability and solvation energy have been analysed and compared with the parent protein. A two-fold study is performed to assess the effect of mutation (i) by and (ii) on a specific amino acid residue. The results suggest that folding is highly favoured in 12 designed proteins (ΔΔG < -3.0), leading to highly stable conformations. In 11 designed proteins ΔΔG > 0.5, suggesting unfavourable mutations that make the resultant designed proteins unstable. In 171 designed proteins ΔΔG lies between -3.0 and -1.0, suggesting that these mutations lead to stable conformations of the designed proteins. Overall, of the 380 designed proteins, 11 showed highly unfavourable, 69 less favourable and 270 favourable folding-unfolding transformations.

Proteins engage in highly selective interactions in living systems, from the micro to the macro scale. Variation (mutation) in the sequence can cause significant perturbation or complete abolishment of function, potentially leading to disease. It is therefore important to understand the impact of variation on protein structure. The stability of proteins plays an important role in characterizing their functions, activity and regulation [1].

One way to assess the effect of a mutation on protein binding affinity or stability is to measure it experimentally. However, such methods can be time-consuming and costly. With the amalgamation of computing technology with chemistry, physics, and biology, it has become convenient to estimate the impact of mutations on protein stability and energy theoretically, with accuracy close to that of the experimental results [2].

The current era of genome sequencing has unravelled a large number of human genetic variations, many of which may affect protein binding and function. [3]

Protein stability refers to the ability of a protein to maintain its native three-dimensional structure under a given set of conditions. The Gibbs free energy (ΔG) is a thermodynamic parameter that describes the tendency of a system to change spontaneously from one state to another. In the context of protein stability, ΔG is a measure of the free energy difference between the folded (native) and unfolded (denatured) states of the protein [4].

A negative ΔG value indicates that the protein is stable in its folded state, while a positive ΔG value indicates that the protein is unstable and has a tendency to unfold. The magnitude of ΔG reflects the strength of the interactions that stabilize the folded protein, such as hydrogen bonds, hydrophobic interactions, and electrostatic interactions [5].
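The sign convention described above can be written compactly. The text does not state these formulas explicitly, so the following is a sketch under the assumption that ΔΔG is taken as the folding free energy of the designed (mutant) protein minus that of the parent, which is the convention consistent with the classification used later (negative ΔΔG read as stabilizing):

```latex
\[
\Delta G_{\mathrm{fold}} = G_{\mathrm{folded}} - G_{\mathrm{unfolded}},
\qquad
\Delta\Delta G = \Delta G_{\mathrm{fold}}^{\mathrm{mutant}} - \Delta G_{\mathrm{fold}}^{\mathrm{parent}}
\]
```

With this definition the parent protein has ΔΔG = 0 by construction, ΔΔG < 0 indicates a designed protein whose folded state is favoured more strongly than in the parent, and ΔΔG > 0 indicates the opposite. Other tools and papers may report ΔΔG with the opposite sign, so the convention should be checked before comparing values across sources.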

Experimental techniques such as protein folding assays, circular dichroism spectroscopy, and differential scanning calorimetry can be used to measure protein stability and ΔG values under various conditions, such as changes in temperature, pH, and ionic strength. Computational methods such as molecular dynamics simulations and free energy calculations can also be used to predict protein stability and ΔG values based on the protein's structure and environmental conditions [6].

The AIDS pandemic, caused by the retrovirus HIV-1, has claimed more than 30 million lives over the past four decades. Antiretroviral therapy (ART), which is required lifelong, has transformed the disease into a manageable one. The CD4+ T lymphocyte is the main target cell of HIV-1, which enters by binding to its receptor CD4 and to co-receptors such as CC-chemokine receptor-5 (CCR5). The fusion of the viral and human cell membranes, prompted by this binding, initiates a complex intracellular life cycle that produces new viruses [7].

Computational chemistry is a multidisciplinary field that combines principles of chemistry, physics, and computer science to investigate and understand chemical phenomena using computer simulations and mathematical models. It involves the development and application of theoretical methods, algorithms, and software tools to estimate the structures, properties, reactivity and behaviour of molecules, materials and chemical reactions.

The use of computational methods in chemistry has revolutionized the way researchers approach the study of molecules and materials. It enables the exploration of complex chemical systems that are often difficult or even impossible to study experimentally. Computational chemistry techniques provide insights into molecular interactions, reaction mechanisms, and properties of compounds, helping researchers to design new drugs, catalysts, and materials.

Computational chemistry has many applications, including drug discovery, materials science, catalysis, and environmental chemistry. By using computational methods, the properties of molecules and materials can be predicted with near-experimental accuracy without the need for expensive and time-consuming experiments. This saves time and thereby enables faster and more efficient development of new drugs, materials, and technologies.

Computational chemistry is a broader field that encompasses a wide range of computational methods and techniques used to study chemical systems. In addition to MD simulations and protein modelling, computational chemistry also includes techniques such as quantum chemistry, molecular mechanics, and molecular docking, among others. [8]

Commonly used computational chemistry methods, including those applied in computer-aided drug design (CADD), are molecular mechanics, quantum mechanics, density functional theory, and molecular dynamics simulations. These methods vary in their level of accuracy and computational cost and are chosen based on the specific research question and the available computational resources.

Overall, computational chemistry plays an important role in advancing our understanding of chemical systems and developing new technologies that can improve our lives.

Computer-aided drug design (CADD) is a computational approach that involves the use of computer algorithms and software to assist in the drug discovery process. This approach uses various computational tools to identify potential drug candidates and optimize their properties before they are tested in the laboratory. [9]

CADD has become an essential tool in drug discovery, allowing researchers to rapidly screen large numbers of compounds and optimize their properties before investing time and resources in expensive experimental studies.

Virtual screening is a computational technique used to predict the potential activity of small molecules (ligands) against a specific target protein. It involves the use of computer software to analyse large databases of molecules and predict their affinity and activity for a specific target. It can be used in drug discovery to identify potential drug candidates that can bind to the target protein and modulate its activity [10]. It is a powerful tool in drug discovery as it can significantly reduce the time and cost involved in the drug discovery process by identifying potential drug candidates with high affinity and specificity for the target protein.

Molecular Dynamics (MD) simulation is a computational technique used in computational chemistry to study the behaviour of atoms and molecules over time [11]. In an MD simulation, the system of interest is described by a set of equations of motion that define the behavior of each atom or molecule in the system. The equations of motion take into account the interactions between atoms or molecules, which are described by a potential energy function. MD simulations can be used to study a wide range of chemical and biochemical systems, including proteins, DNA, and small molecules. They can provide insights into the dynamics and thermodynamics of these systems, such as the conformational changes that occur in proteins and the binding of ligands to enzymes. The simulation proceeds by solving the equations of motion numerically, typically using a numerical integration method such as the Verlet algorithm or the leapfrog algorithm [12]. The simulation calculates the position, velocity, and acceleration of each atom or molecule at each time step, and the positions of the atoms or molecules are updated based on these calculations.
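The integration step described above can be illustrated with a small sketch. The text names the Verlet and leapfrog algorithms but gives no equations, so the example below uses the velocity Verlet variant on a toy one-dimensional harmonic oscillator; the spring constant, mass, time step and step count are illustrative assumptions, not values from the study.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1-D with the velocity Verlet scheme."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Toy system (assumed for illustration): a unit mass on a unit spring.
K_SPRING = 1.0
MASS = 1.0


def spring_force(x):
    return -K_SPRING * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=MASS, dt=0.01, n_steps=1000)
energy_start = total_energy(*traj[0])
energy_end = total_energy(*traj[-1])
print(f"energy drift over {len(traj) - 1} steps: {energy_end - energy_start:.2e}")
```

A production MD code applies the same update to every atom in three dimensions, with forces derived from the potential energy function of a force field; the near-constant total energy in this toy run is the property that makes Verlet-type integrators popular.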

In the context of protein modelling, MD simulations are a common tool for studying the structural and dynamic properties of proteins, including their folding and unfolding processes, interactions with ligands, and conformational changes [13].

Protein modelling is the process of predicting the three-dimensional structure of a protein from its amino acid sequence. The three-dimensional structure of a protein is essential to understanding its function, interactions, and biochemical properties. There are several methods used to model protein structures, including homology modelling, ab initio modelling, and molecular dynamics simulations.

Homology modelling assumes that a protein whose amino acid sequence is similar to that of a known protein will also have a similar structure and function [14]. In homology modelling, the known protein structure is used as a template to predict the structure of the target protein. The accuracy of homology modelling depends on the similarity between the amino acid sequences of the target protein and the template protein.
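Sequence similarity between target and template is often summarized as percent identity over an alignment. The sketch below shows one simple convention (identical positions divided by aligned, non-gap positions); other denominators are also in use, and the two short sequences are invented purely for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical pre-aligned target and template fragments (not from 4G1Q).
target = "ACDEFGHIK"
template = "ACDQFG-IK"
identity = percent_identity(target, template)
print(f"identity: {identity:.1f}%")
```

The function expects sequences that have already been aligned; producing the alignment itself is a separate step that is not shown here.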

Ab initio modelling, also known as de novo modelling, is a method that predicts the structure of a protein without using a template structure. Ab initio modelling is based on physical principles such as energy minimization and can be computationally expensive. This method is more challenging than homology modelling but can be used for proteins that do not have a close homolog with a known structure. [15]

Protein modelling is an essential tool for understanding protein function and structure. It has applications in drug design, protein engineering, and understanding the mechanisms of protein-protein interactions. A protein could have multiple structures available, and if another structure of the same protein is used, the predicted change in stability for structure-based methods might be different. The mutation causes a change in the stability of a protein.

The DUET online server is used for these computations. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM) [16]. The approach improves the overall accuracy of the predictions in comparison with either method individually and performs as well as or better than similar methods. DUET is a bioinformatics web server created for gaining insight into the effects of nsSNPs on protein stability. It integrates two complementary methods into a consensus/optimized prediction, as a way to leverage the best of SDM, a statistical potential energy function that relies on substitution tables derived from homologous protein families and incorporates constraints on residue environments during evolution, and mCSM, a machine learning algorithm that takes into account the residue 3D physicochemical environment summarized as a graph-based structural signature [16].

Mutations can be classified into three categories (a) “Good” which increases fitness, (b) “Indifferent or Neutral”, as the effects are too small and, (c) “Bad” which decreases fitness. [16]

ΔΔG results will fall into three categories:

ΔΔG > 0.5: Positive results suggest that a mutation would be destabilizing. Such mutations are usually avoided during design and can be classified as “Bad”.
-0.5 < ΔΔG < 0.5: Values near 0 are within the noise range and should be considered indifferent or neutral. These mutations can be included in the design to allow neutral changes that may compensate for other changes in the protein. They can be classified as “Neutral” or “Indifferent”.
ΔΔG < -0.5: Negative results suggest that the mutation would lead to a more stable protein and can be classified as “Good”.
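The three categories above translate directly into a small helper. This is a minimal sketch: the function name is hypothetical, and because the text uses strict inequalities, values of exactly ±0.5 are assigned to “Neutral” here as an assumption. The three example values are taken from Table 1.

```python
def classify_ddg(ddg, threshold=0.5):
    """Map a ΔΔG value to the Good / Neutral / Bad categories described in the text."""
    if ddg > threshold:
        return "Bad"       # destabilizing
    if ddg < -threshold:
        return "Good"      # stabilizing
    return "Neutral"       # within the noise range (boundary values included by assumption)


examples = {
    "E138_B -> M": 0.797,
    "F227 -> S": -3.593,
    "K101 -> D": -0.048,
}
for mutation, ddg in examples.items():
    print(f"{mutation}: {ddg:+.3f} -> {classify_ddg(ddg)}")
```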

Protein modelling of missense mutations involves predicting the structural and functional consequences of amino acid substitutions that alter the protein sequence. Missense mutations are single-nucleotide variations that change a single amino acid residue in a protein sequence, potentially affecting protein stability, interactions, or enzymatic activity.

There are several computational tools and methods available for protein modelling of missense mutations, including homology modelling, molecular dynamics simulations, and machine learning-based approaches. These methods use various algorithms to predict the effect of a missense mutation on protein structure and function, such as changes in protein stability, folding, dynamics, and interactions. [17-21]

One common approach is to compare the predicted structure and stability of the wild-type protein with that of the mutated protein. If the mutation destabilizes the protein or alters its structural integrity, it may affect the protein's function or interactions with other molecules.

Overall, protein modelling of missense mutations can provide valuable insights into the potential effects of genetic variations on protein structure and function, which can help in understanding the molecular basis of genetic diseases and designing therapeutic interventions.

This is an attempt to study the impact of mutation “on” and “by” specific amino acid residues. Missense mutations have been introduced in silico to test their effect on the stability of the newly designed proteins.

In the present study the HIV-1 NNRTI protein 4G1Q [22], downloaded from the Protein Data Bank (www.rcsb.org), was used to perform mutations and to assess and compare the relative stability of the designed proteins with the parent protein [23]. The DUET server was used for performing mutations in 4G1Q on the twenty neighbouring residues surrounding the active ligand, within 6 Å of the centre of the ligand [16]. A dataset of 380 designed proteins was created. Further, ΔΔG was estimated for all the 380 designed proteins to compare their relative stability with the parent protein, 4G1Q. The snapshot of protein 4G1Q is presented in figure 1.
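A distance-based selection of this kind can be sketched as follows. The text does not say how the ligand centre or the residue distance was computed, so the choices below (geometric centroid of the ligand atoms, and a residue counted as neighbouring when any of its atoms lies within the cutoff) are assumptions, and the coordinates and residue labels are invented for illustration rather than taken from 4G1Q.

```python
import math


def centroid(points):
    """Geometric centre of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def residues_within(residue_atoms, centre, cutoff=6.0):
    """Labels of residues with at least one atom within `cutoff` Å of `centre`."""
    selected = []
    for label, atoms in residue_atoms.items():
        if any(math.dist(atom, centre) <= cutoff for atom in atoms):
            selected.append(label)
    return sorted(selected)


# Hypothetical coordinates in Å (not real 4G1Q data).
ligand_atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.5, 0.0)]
residue_atoms = {
    "RES_A": [(4.0, 0.5, 0.0)],     # 3.0 Å from the ligand centre
    "RES_B": [(1.0, 0.5, 5.9)],     # 5.9 Å from the ligand centre
    "RES_C": [(10.0, 10.0, 10.0)],  # well outside the cutoff
}

ligand_centre = centroid(ligand_atoms)
neighbours = residues_within(residue_atoms, ligand_centre)
print(neighbours)
```

In practice the coordinates would be read from the PDB file of the structure, and the resulting residue list would then be passed to the mutation step.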

The fasta sequence of the protein 4g1q is given herewith

In order to understand the impacts of non-synonymous single nucleotide polymorphisms (nsSNPs) on the structure and function of the proteome, as well as to guide protein engineering, accurate in silico methodologies are needed to study and predict their effects on protein stability. The change in folding free energy upon mutation (ΔΔG in kcal/mol) is used as the measure of the impact of the mutation. DUET, a web server for an integrated computational approach to studying missense mutations in proteins, is used for this purpose.

In order to do so, complementary information regarding the mutation, such as secondary structure (used by SDM) and a pharmacophore vector that accounts for the changes between wild-type and mutant residue (used by mCSM), is also calculated and used by DUET. As described previously, the pharmacophore vector is obtained by comparing the frequency of eight possible atom characteristics between wild-type and mutant residues (positive, negative, hydrophobic, hydrogen donor, hydrogen acceptor, aromatic, sulphur and neutral) [16].

The results of the missense mutations induced in the protein (4g1q.pdb) and their effects on the stability of the designed proteins are detailed in this section.

Missense mutations were introduced in silico at a total of 20 amino acid residues (AARs), and 380 mutated proteins were designed. The stability of the designed proteins is assessed by comparing their ΔΔG values, a metric of how a single point mutation affects protein stability, with that of the parent protein 4G1Q.
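The count of 380 follows from substituting each of the 20 selected residues by the 19 other standard amino acids (20 × 19 = 380). A minimal sketch of that enumeration is given below; the residue list is taken from Table 1, while the mutation label format (wild type, position, new residue) and the omission of the chain identifier for E138 (chain B in Table 1) are simplifying assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Surrounding residues as listed in Table 1 (chain label of E138 omitted here).
SITES = [
    "E138", "F227", "G190", "H235", "K101", "K102", "K103", "L100", "L228", "P095",
    "P225", "P226", "P236", "V106", "V179", "W229", "Y181", "Y183", "Y188", "Y318",
]


def enumerate_mutations(sites, alphabet=AMINO_ACIDS):
    """List every single point substitution, skipping the wild-type residue itself."""
    mutations = []
    for site in sites:
        wild_type, position = site[0], int(site[1:])
        for new_residue in alphabet:
            if new_residue != wild_type:
                mutations.append(f"{wild_type}{position}{new_residue}")
    return mutations


mutations = enumerate_mutations(SITES)
print(len(mutations), mutations[:3])
```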

The impact of the mutations on protein stability based on ΔΔG are assessed in two ways:

Impact on stability of designed protein on mutation of a specific surrounding amino acid residue.
Impact on stability of designed protein by mutation of a specific amino acid residue.

The ΔΔG values of all the 380 designed proteins, on a mutation of surrounding amino acid residues, are presented in table 1, in which ΔΔG for 4G1Q is taken as zero and comparisons are made.

Table 1: ΔΔG values for the designed proteins* as obtained from DUET Server

		Amino Acid Residues causing Mutation (columns A to Y)
S.No.	Surrounding AA Residue	A	C	D	E	F	G	H	I	K	L	M	N	P	Q	R	S	T	V	W	Y	Unstable Proteins
	E138_B	-0.325	-0.126	-0.826	0	-0.412	-0.897	-0.767	0.567	-0.224	0.547	0.797	-0.604	-0.166	-0.597	-0.235	-1.089	-0.652	0.304	-0.368	-0.349	4
	F227	-3.032	-1.913	-3.156	-2.858	0	-3.255	-1.484	-1.063	-2.386	-1.501	-0.957	-2.905	-2.031	-2.887	-0.193	-3.593	-3.236	-1.859	-0.364	-0.545	0
	G190	-0.645	-1.255	-2.179	-2.129	-1.067	0	-1.655	0.242	-1.494	0.081	-0.451	-1.344	-0.979	-1.633	-1.132	-1.739	-1.441	0.16	-1.402	-1.233	3
	H235	-1.548	-0.597	-1.515	-1.34	0.388	-1.895	0	-0.323	-1.141	-0.411	-0.649	-1.606	-1.086	-1.223	-1.636	-1.65	-1.197	-0.665	0.476	0.495	3
	K101	-0.603	-0.573	-0.048	0.236	-0.133	-0.91	-0.82	0.756	0	0.66	0.044	-0.607	-0.157	-0.536	-0.183	-1.428	-0.815	0.418	-0.146	0.149	6
	K102	-0.219	-0.471	0.388	0.547	-0.1	-0.502	-0.982	0.747	0	0.704	0.263	-0.129	0.601	-0.264	-0.256	-0.899	-0.484	0.421	-0.038	0.143	8
	K103	-0.364	-0.731	0.03	0.255	-0.306	-0.896	-1.118	0.655	0	0.53	0.162	-0.43	-0.442	-0.626	-0.296	-1.387	-0.813	0.361	-0.143	0.143	7
	L100	-2.46	-1.552	-2.81	-2.612	-1.619	-3.065	-2.402	-0.918	-2.074	0	-0.863	-2.222	-1.521	-2.327	-1.218	-2.985	-2.632	-1.739	-1.872	0.168	1
	L228	-0.94	-0.547	-0.15	0.033	-0.701	-1.113	-0.374	-0.043	-0.295	0	-0.347	-0.248	-0.843	-0.402	0.2	-0.905	-0.589	-0.16	-0.737	-0.569	2
	P095	-1.563	-1.154	-2.626	-2.36	-1.183	-2.651	-2.175	0.08	-1.607	0.143	-0.105	-1.812	0	-1.739	-1.305	-2.548	-2.035	-0.352	-1.655	-1.39	2
	P225	-0.841	-0.385	-0.981	-1.037	-0.936	-0.401	-0.769	-0.196	-0.77	-0.194	-0.174	-0.557	0	-0.664	-0.289	-0.962	-0.834	-0.378	-0.936	-0.861	0
	P226	-0.815	-0.352	-1.018	-1.25	-1.068	-1.594	-1.045	-0.35	-0.777	0.026	-0.11	-0.583	0	-0.891	-0.422	-1.081	-0.918	-0.57	-0.773	-1.039	1
	P236	-1.362	-0.368	-1.455	-1.369	-0.754	-1.78	-1.233	-0.256	-1.014	-0.212	-0.051	-0.859	0	-0.972	-0.697	-1.558	-1.156	-0.509	-1.004	-0.951	0
	V106	-1.991	-1.202	-1.651	-1.254	-1.267	-2.355	-1.627	-0.247	-1.268	-0.425	-0.616	-1.34	-1.487	-1.31	-0.737	-2.194	-1.721	0	-1.435	-1.179	0
	V179	-0.759	-0.695	-0.197	-0.129	-0.729	-0.894	-0.373	0.092	-0.495	-0.054	-0.522	-0.496	-0.455	-0.681	0.008	-1.107	-0.82	0	-0.758	-0.513	2
	W229	-1.652	-1.5	-2.502	-2.346	-0.677	-1.547	-1.616	-0.519	-1.936	-0.811	-0.853	-2.614	-1.091	-2.32	-2.044	-2.647	-2.392	-1.026	0	-0.767	0
	Y181	-2.272	-1.355	-1.62	-1.359	-0.361	-2.523	-0.866	-0.715	-1.657	-0.844	-1.065	-2.419	-1.852	-2.166	-1.426	-2.968	-2.537	-1.184	0.086	0	1
	Y183	-2.226	-1.134	-1.823	-1.6	-0.878	-2.378	-0.8	-1.067	-1.425	-1.165	-1.15	-2.088	-1.846	-1.948	-1.171	-2.532	-2.12	-1.262	-0.466	0	0
	Y188	-2.335	-1.106	-2.639	-2.3	-0.055	-2.675	-1.36	-0.514	-1.583	-0.772	-0.669	-2.587	-1.778	-2.271	-1.578	-3.021	-2.508	-0.943	0.405	0	1
	Y318	-3.512	-1.64	-3.437	-3.151	-1.222	-3.955	-2.341	-2.159	-2.06	-2.115	-1.925	-2.784	-2.946	-2.628	-1.624	-3.215	-2.863	-2.711	-0.6	0	0
	Unstable Proteins	0	0	2	4	1	0	0	7	0	7	4	0	1	0	2	0	0	5	3	5	41

* The value of 0 (zero) is for the parent protein (4G1Q).

A bar graph showing the comparative ΔΔG values of all the 380 designed proteins is presented in figure 1. Values above the x-axis (positive) are the ΔΔG values of proteins that are less stable than the parent 4G1Q, while those below the x-axis (negative) are the ΔΔG values of proteins that are more stable than the parent 4G1Q.

From the ΔΔG values estimated using the DUET server and presented in table 1, it is observed that of the 380 designed (mutated) proteins, 41 exhibit positive and 339 exhibit negative ΔΔG values. This suggests that 339 stable and 41 unstable proteins are obtained, indicating a stabilizing effect of mutation in nearly 90% of cases.
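The counting behind these totals can be reproduced row by row. The sketch below applies it to the K102 row of Table 1 and also recomputes the overall percentage from the reported totals; the helper name is hypothetical, and the parent entry (ΔΔG = 0) is excluded from the count.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# K102 row of Table 1, in the column order A to Y.
K102_ROW = [
    -0.219, -0.471, 0.388, 0.547, -0.1, -0.502, -0.982, 0.747, 0.0, 0.704,
    0.263, -0.129, 0.601, -0.264, -0.256, -0.899, -0.484, 0.421, -0.038, 0.143,
]


def count_by_sign(row, parent, alphabet=AMINO_ACIDS):
    """Return (stable, unstable) counts for one residue, excluding the parent entry."""
    values = [v for aa, v in zip(alphabet, row) if aa != parent]
    stable = sum(1 for v in values if v < 0)
    unstable = sum(1 for v in values if v > 0)
    return stable, unstable


stable, unstable = count_by_sign(K102_ROW, parent="K")
percent_stable = round(100 * 339 / 380, 1)
print(stable, unstable, percent_stable)
```

For K102 this gives 11 stable and 8 unstable designed proteins, in agreement with the last column of Table 1, and 339 of 380 corresponds to about 89.2%.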

A. Effect of mutation on stability of a specific surrounding amino acid residue.

Table 2 presents the order of stability of newly designed proteins formed on mutations of a specific AAR. This also gives a detailed insight into the effect of mutation of a specific AAR.

Table 2: Order of stability of designed proteins on mutation of a specific surrounding amino acid residue (SAAR).*

S. No.	Surrounding AA Residues	Order of Stability on Mutation^*
1	E138_B	S > G > D > H > T > N > Q > F > W > Y > A > R > K > P > C > E > V > L > I > M
2	F227	S > G > T > D > A > N > Q > E > K > P > C > V > L > H > I > M > Y > W > R > F
3	G190	D > E > S > H > Q > K > T > W > N > C > Y > R > F > P > A > M > G > L > V > I
4	H235	G > S > R > N > A > D > E > Q > T > K > P > V > M > C > L > I > H > F > W > Y
5	K101	S > G > H > T > N > A > C > Q > R > P > W > F > D > K > M > Y > E > V > L > I
6	K102	H > S > G > T > C > Q > R > A > N > F > W > K > Y > M > D > V > E > P > L > I
7	K103	S > H > G > T > C > Q > P > N > A > F > R > W > K > D > Y > M > E > V > L > I
8	L100	G > S > D > T > E > A > H > Q > N > K > W > V > F > C > P > R > I > M > L > Y
9	L228	G > A > S > P > W > F > T > Y > C > Q > H > M > K > N > V > D > I > L > E > R
10	P095	G > D > S > E > H > T > N > Q > W > K > A > Y > R > F > C > V > M > P > I > L
11	P225	E > D > S > W > F > Y > A > T > K > H > Q > N > G > C > V > R > I > L > M > P
12	P226	G > E > S > F > H > Y > D > T > Q > A > K > W > N > V > R > C > I > M > P > L
13	P236	G > S > D > E > A > H > T > K > W > Q > Y > N > F > R > V > C > I > L > M > P
14	V106	G > S > A > T > D > H > P > W > N > Q > K > F > E > C > Y > R > M > L > I > V
15	V179	S > G > T > A > W > F > C > Q > M > Y > N > K > P > H > D > E > L > V > R > I
16	W229	S > N > D > T > E > Q > R > K > A > H > G > C > P > V > M > L > Y > F > I > W
17	Y181	S > T > G > N > A > Q > P > K > D > R > E > C > V > M > H > L > I > F > Y > W
18	Y183	S > G > A > T > N > Q > P > D > E > K > V > R > L > M > C > I > F > H > W > Y
19	Y188	S > G > D > N > T > A > E > Q > P > K > R > H > C > V > L > M > I > F > Y > W
20	Y318	G > A > D > S > E > P > T > N > V > Q > H > I > L > K > M > C > R > F > W > Y

^*The protein depicted in red colour is parent 4G1Q (unmutated).
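Each row of Table 2 is the corresponding row of Table 1 sorted from the most negative to the most positive ΔΔG. A minimal sketch for the E138_B row is shown below; the function name is hypothetical and ties, which do not occur in this row, would be left in column order.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# E138_B row of Table 1, in the column order A to Y (E is the parent, ΔΔG = 0).
E138_ROW = dict(zip(AMINO_ACIDS, [
    -0.325, -0.126, -0.826, 0.0, -0.412, -0.897, -0.767, 0.567, -0.224, 0.547,
    0.797, -0.604, -0.166, -0.597, -0.235, -1.089, -0.652, 0.304, -0.368, -0.349,
]))


def stability_order(row):
    """Residues sorted from most stabilizing (lowest ΔΔG) to most destabilizing."""
    return [aa for aa, _ in sorted(row.items(), key=lambda item: item[1])]


order_string = " > ".join(stability_order(E138_ROW))
print(order_string)
```

The output reproduces the first row of Table 2, with the parent residue E sitting between the stabilizing and destabilizing substitutions.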

From table-2 the following observations are made:

A total of 380 newly designed proteins were obtained by single point mutation.
Of these, 339 are more stable than the parent 4G1Q.
The remaining 41 designed proteins are less stable than the parent 4G1Q.
All the designed proteins obtained by mutating F227, P225, P236, V106, W229, Y183 and Y318 are more stable than the parent 4G1Q, suggesting that mutation at these AAR positions has no destabilizing effect.
Mutating P226, Y181 and Y188 yields 54 designed proteins (18 each, out of a total of 57) that are more stable than 4G1Q, suggesting that mutation of these AARs also stabilizes the designed (mutated) protein, but to a lesser extent.
21 of the 41 unstable proteins were obtained when the lysine (K) residues K101, K102 and K103 were mutated. The highest number (08) of unstable designed proteins is obtained when K102 is mutated, while mutation of K103 and K101 yielded 7 and 6 unstable designed proteins, respectively (table 1). This suggests that lysine might be highly important in deciding the stability of the protein, i.e. when lysine is mutated the stability of the protein tends to decrease. It further suggests that the instability introduced might affect, and in all probability enhance, the process of denaturation.

B. Effect of mutation on stability by a specific amino acid residue.

Table 3 presents the impact of mutation by a specific AAR on the stability of the designed proteins.

Table 3: Effect of mutation by specific AAR*.

S.No.	Mutating AAR	Order of Stability on Mutation^*
1	A	Y318 > F227 > L100 > Y188 > Y181 > Y183 > V106 > W229 > P095 > H235 > P236 > L228 > P225 > P226 > V179 > G190 > K101 > K103 > E138_B > K102
2	C	F227 > Y318 > L100 > W229 > Y181 > G190 > V106 > P095 > Y183 > Y188 > K103 > V179 > H235 > K101 > L228 > K102 > P225 > P236 > P226 > E138_B
3	D	Y318 > F227 > L100 > Y188 > P095 > W229 > G190 > Y183 > V106 > Y181 > H235 > P236 > P226 > P225 > E138_B > V179 > L228 > K101 > K103 > K102
4	E	Y318 > F227 > L100 > P095 > W229 > Y188 > G190 > Y183 > P236 > Y181 > H235 > V106 > P226 > P225 > V179 > E138_B > L228 > K101 > K103 > K102
5	F	L100 > V106 > Y318 > P095 > P226 > G190 > P225 > Y183 > P236 > V179 > L228 > W229 > E138_B > Y181 > K103 > K101 > K102 > Y188 > F227 > H235
6	G	Y318 > F227 > L100 > Y188 > P095 > Y181 > Y183 > V106 > H235 > P236 > P226 > W229 > L228 > K101 > E138_B > K103 > V179 > K102 > P225 > G190
7	H	L100 > Y318 > P095 > G190 > V106 > W229 > F227 > Y188 > P236 > K103 > P226 > K102 > Y181 > K101 > Y183 > P225 > E138_B > L228 > V179 > H235
8	I	Y318 > Y183 > F227 > L100 > Y181 > W229 > Y188 > P226 > H235 > P236 > V106 > P225 > L228 > P095 > V179 > G190 > E138_B > K103 > K102 > K101
9	K	F227 > L100 > Y318 > W229 > Y181 > P095 > Y188 > G190 > Y183 > V106 > H235 > P236 > P226 > P225 > V179 > L228 > E138_B > K103 = K102 = K101
10	L	Y318 > F227 > Y183 > Y181 > W229 > Y188 > V106 > H235 > P236 > P225 > V179 > L100 = L228 > P226 > G190 > P095 > K103 > E138_B > K101 > K102
11	M	Y318 > Y183 > Y181 > F227 > L100 > W229 > Y188 > H235 > V106 > V179 > G190 > L228 > P225 > P226 > P095 > P236 > K101 > K103 > K102 > E138_B
12	N	F227 > Y318 > W229 > Y188 > Y181 > L100 > Y183 > P095 > H235 > G190 > V106 > P236 > K101 > E138_B > P226 > P225 > V179 > K103 > L228 > K102
13	P	Y318 > F227 > Y181 > Y183 > Y188 > L100 > V106 > W229 > H235 > G190 > L228 > V179 > K103 > E138_B > K101 > P095 = P236 = P226 = P225 > K102
14	Q	F227 > Y318 > L100 > W229 > Y188 > Y181 > Y183 > P095 > G190 > V106 > H235 > P236 > P226 > V179 > P225 > K103 > E138_B > K101 > L228 > K102
15	R	W229 > H235 > Y318 > Y188 > Y181 > P095 > L100 > Y183 > G190 > V106 > P236 > P226 > K103 > P225 > K102 > E138_B > F227 > K101 > V179 > L228
16	S	F227 > Y318 > Y188 > L100 > Y181 > W229 > P095 > Y183 > V106 > G190 > H235 > P236 > K101 > K103 > V179 > E138_B > P226 > P225 > L228 > K102
17	T	F227 > Y318 > L100 > Y181 > Y188 > W229 > Y183 > P095 > V106 > G190 > H235 > P236 > P226 > P225 > V179 > K101 > K103 > E138_B > L228 > K102
18	V	Y318 > F227 > L100 > Y183 > Y181 > W229 > Y188 > H235 > P226 > P236 > P225 > P095 > L228 > V106 = V179 > G190 > E138_B > K103 > K101 > K102
19	W	L100 > P095 > V106 > G190 > P236 > P225 > P226 > V179 > L228 > Y318 > Y183 > E138_B > F227 > K101 > K103 > K102 > W229 > Y181 > Y188 > H235
20	Y	P095 > G190 > V106 > P226 > P236 > P225 > W229 > L228 > F227 > V179 > E138_B > Y318 = Y183 = Y181 = Y188 > K103 > K102 > K101 > L100 > H235

^*The proteins depicted in red colour are 4G1Q (unmutated).

From table 3 the following observations are made:

The impact of mutation by a specific AAR can be observed from the table.
On mutation by the AARs G, H and K, all the designed proteins are observed to be more stable than the parent 4G1Q.
A slightly lesser impact is observed when mutation is performed by F and P, wherein only one designed protein less stable than the parent 4G1Q is obtained for each.
Isoleucine (I) and leucine (L) produce the highest number (07 each) of unstable designed proteins (table 1).
In 10 of the 20 cases, the least stable designed protein is obtained when K102 is mutated.
Mutation of lysine (K101, K102 and K103) creates instability in the designed protein, suggesting that lysine should not be mutated; mutation by lysine, in contrast, yields no unstable designed protein (table 1).

The designed proteins have been classified on the basis of mutation of a specific AAR and their stability (ΔΔG) range. Table 4 shows the details of these mutations and stability (ΔΔG range) of designed proteins.

Table 4: Classification of designed proteins

From table 4 the following observations are made

As stated earlier, a total of 339 stable and 41 unstable designer proteins are obtained.
Of the 339 stable designer proteins:
1. 12 highly stable designed proteins are obtained on the mutation of F227, L100, Y188 and Y318. Their ΔΔG values lie between -4.0 and -3.0. Of these 12 designed proteins, the maximum number (05 each) is obtained when F227 and Y318 are mutated.
2. These 12 stable proteins are obtained on mutation of hydrophobic AARs.
3. 58 proteins having ΔΔG values between -3.0 and -2.0 are obtained. The highest number (09 each) of designed proteins within this stability range is obtained when L100 and Y318 are mutated.
4. 113 proteins having ΔΔG values between -2.0 and -1.0 are obtained. These can be classified as moderately stable.
5. 87 proteins having ΔΔG values between -1.0 and -0.5 are obtained, and these can be classified as relatively less stable.
99 designer proteins having ΔΔG values between -0.5 and 0.5 are obtained; their stability cannot be established, as the ΔΔG values are within the noise range, so they should be considered indifferent or neutral.
A total of 11 highly unstable designed proteins are obtained on the mutation of E138_B, K101, K102, and K103. Their ΔΔG values are greater than 0.5. These unstable designed proteins are obtained when the charged AARs (E and K) are mutated.
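The stability ranges used in this classification can be applied programmatically. The sketch below bins the Y318 row of Table 1; the bin labels are hypothetical, and placing values that fall exactly on a boundary into the higher bin is an assumption, since the text does not say how boundary values are handled.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Y318 row of Table 1, in the column order A to Y (Y is the parent, ΔΔG = 0).
Y318_ROW = [
    -3.512, -1.64, -3.437, -3.151, -1.222, -3.955, -2.341, -2.159, -2.06, -2.115,
    -1.925, -2.784, -2.946, -2.628, -1.624, -3.215, -2.863, -2.711, -0.6, 0.0,
]


def stability_bin(ddg):
    """Assign a ΔΔG value to one of the ranges used in the classification."""
    if ddg < -3.0:
        return "below -3.0"
    if ddg < -2.0:
        return "-3.0 to -2.0"
    if ddg < -1.0:
        return "-2.0 to -1.0"
    if ddg < -0.5:
        return "-1.0 to -0.5"
    if ddg <= 0.5:
        return "-0.5 to 0.5"
    return "above 0.5"


bins = Counter(
    stability_bin(v) for aa, v in zip(AMINO_ACIDS, Y318_ROW) if aa != "Y"
)
print(dict(bins))
```

For Y318 this gives five designed proteins below -3.0 and nine between -3.0 and -2.0, matching the counts quoted for this residue in the observations above.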

Table 5 shows the number of stable proteins obtained on mutation of a specific AAR.

S.No.	Mutated AARs	Number of stable Designer proteins
	E138_B	15
	F227	19
	G190	16
	H235	16
	K101	13
	K102	11
	K103	12
	L100	18
	L228	17
	P095	17
	P225	19
	P226	18
	P236	19
	V106	19
	V179	17
	W229	19
	Y181	18
	Y183	19
	Y188	18
	Y318	19

From table 5 it is observed that mutation of F227, P225, P236, V106, W229, Y183, and Y318 yields designer proteins that are all more stable than the parent 4G1Q. These results are contrary to the belief that mutation induces instability in a protein and that naturally occurring proteins adopt the most stable form. The lysine residues (101, 102 and 103) are the most affected AARs and produce the least number of stable designer proteins. Since the missense mutations are induced in silico, the results need to be verified experimentally.

Another way in which the designed proteins have been classified is on the basis of mutation by a specific AAR and their stability range (ΔΔG). Table 6 shows the details of these mutations and stability (ΔΔG range) of designed proteins.

Table 6: Classification of designed proteins

From Table 6 the following observations are made:

As stated earlier, a total of 339 stable and 41 unstable designer proteins are obtained.

Of the 339 stable designer proteins:
12 highly stable designed proteins are obtained on mutation by A, D, G, E, S and T. Their ΔΔG values are between -4.0 and -3.0. Of these 12 designed proteins, the maximum number (03 each) of most stable proteins is obtained on mutation by G and S. No regular pattern linking the impact of mutation to a specific property of the substituting AAR is observed.
58 proteins having ΔΔG values between -3.0 and -2.0 are obtained. Of these, the highest number (07 each) within this stability range is obtained on mutation by N and T.
113 proteins having ΔΔG values between -2.0 and -1.0 are obtained. These can be classified as moderately stable.
87 proteins having ΔΔG values between -1.0 and -0.5 are obtained, and these can be classified as relatively less stable.
99 designed proteins having ΔΔG values between -0.5 and 0.5 are obtained. Their stability cannot be established because these ΔΔG values lie within the noise range, so they should be considered indifferent or neutral.
A total of 11 highly unstable designed proteins, with ΔΔG values greater than 0.5, are obtained on mutation by E, I, L, M, and P. In this case, mutation by hydrophobic AARs has given the most unstable designed proteins.

Table 7 shows the number of stable and unstable proteins obtained on mutation by a specific AAR.

S.No.	Mutation by AAR	Number of stable designer proteins	Number of unstable designer proteins
1	A	20	0
2	C	20	0
3	D	18	2
4	E	15	4
5	F	18	1
6	G	19	0
7	H	19	0
8	I	13	7
9	K	17	0
10	L	11	7
11	M	16	4
12	N	20	0
13	P	15	1
14	Q	20	0
15	R	18	2
16	S	20	0
17	T	20	0
18	V	13	5
19	W	16	3
20	Y	11	5

From Table 7 it is observed that mutation by A, C, G, H, K, N, Q, S, and T yields designer proteins that are all more stable than the parent 4G1Q protein. Mutation by I and L gives the highest number (07 each) of unstable designer proteins, suggesting that I and L follow, better than the other AARs, the natural trend wherein mutation causes instability in the protein.
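
The counts in Table 7 can be cross-checked against the list of mutated positions in Table 5. The sketch below takes both tables as printed; the variable names are illustrative. It assumes that a substituting AAR cannot replace a position already holding that residue, so the number of designs per substituting AAR is 20 minus the number of mutated positions with that native residue.

```python
from collections import Counter

# Mutated positions, copied from Table 5 (the first letter is the native residue).
positions = [
    "E138_B", "F227", "G190", "H235", "K101", "K102", "K103", "L100",
    "L228", "P095", "P225", "P226", "P236", "V106", "V179", "W229",
    "Y181", "Y183", "Y188", "Y318",
]

# (stable, unstable) counts per substituting AAR, copied from Table 7.
table7 = {
    "A": (20, 0), "C": (20, 0), "D": (18, 2), "E": (15, 4), "F": (18, 1),
    "G": (19, 0), "H": (19, 0), "I": (13, 7), "K": (17, 0), "L": (11, 7),
    "M": (16, 4), "N": (20, 0), "P": (15, 1), "Q": (20, 0), "R": (18, 2),
    "S": (20, 0), "T": (20, 0), "V": (13, 5), "W": (16, 3), "Y": (11, 5),
}

# Consistency check: designs per substituting AAR = positions not already
# occupied by that residue.
native_counts = Counter(name[0] for name in positions)
row_totals_consistent = all(
    stable + unstable == len(positions) - native_counts[aar]
    for aar, (stable, unstable) in table7.items()
)

total_stable = sum(stable for stable, _ in table7.values())
total_unstable = sum(unstable for _, unstable in table7.values())

# Substituting AARs that never gave an unstable design.
always_stable = [aar for aar, (_, unstable) in table7.items() if unstable == 0]

# Substituting AARs with the largest number of unstable designs.
max_unstable = max(unstable for _, unstable in table7.values())
most_unstable = [aar for aar, (_, u) in table7.items() if u == max_unstable]

print(row_totals_consistent, total_stable, total_unstable)
print(always_stable)
print(most_unstable, max_unstable)
```

Taking the table as printed, the row totals are consistent with 20 mutated positions and the column totals reproduce the 339 stable and 41 unstable designs.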

The comparative stability analysis reveals the combinations that give the 11 most unstable de novo designed proteins; these are presented in Table 8.

Table 8: Mutated AARs and substituting AARs yielding the most unstable designer proteins.

S.No.	Mutation in AAR	Mutation by AAR
1	K101	I
2	E138_B	M
3	K102	I
4	K102	L
5	K101	L
6	K103	I
7	K102	P
8	E138_B	I
9	E138_B	L
10	K102	E
11	K103	L

From Table 8 it is observed that K102 contributes the largest number of the most unstable proteins, through the combinations K102-I/L/P/E.
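
How often each position and each substituting residue occurs among the 11 most unstable designs can be counted directly from Table 8. This is a minimal sketch using the table entries as printed; the variable names are illustrative.

```python
from collections import Counter

# (mutated position, substituting AAR) pairs, copied from Table 8.
table8 = [
    ("K101", "I"), ("E138_B", "M"), ("K102", "I"), ("K102", "L"),
    ("K101", "L"), ("K103", "I"), ("K102", "P"), ("E138_B", "I"),
    ("E138_B", "L"), ("K102", "E"), ("K103", "L"),
]

# Occurrences of each mutated position and of each substituting residue.
by_position = Counter(position for position, _ in table8)
by_substituent = Counter(substituent for _, substituent in table8)

print(by_position.most_common())
print(by_substituent.most_common())
```

The position tally shows K102 occurring four times, and the substituent tally shows I and L occurring four times each, with M, P and E once each.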

The study has given surprising results: a higher number of stable designer proteins was obtained on mutation. As the work considers only single-point mutations and nothing else, the results are non-traditional. However, the environment at each position should be considered. If interacting molecules are not present in the model, such as at a known zinc-binding site, then a seemingly favourable mutation will not be favourable in reality.

A position with many negative ΔΔG values could indicate that a destabilizing residue evolved there because it is necessary for catalytic activity, for binding another molecule, or for another functionally relevant reason.

Moreover, it must be kept in mind that ΔΔG here quantifies a single-point mutation. Sometimes sufficient stability can only be attained through several interrelated changes, and only one mutation can be predicted by ΔΔG at a time. To determine whether multiple mutations have a cumulative effect on stability, the mutations must be induced together and several relax iterations run. Calculating even a near-exact ΔΔG takes considerably more time.

Acknowledgements

The authors would like to thank Dr. Subhash Basak and Dr. Tanmoy Chakraborty, Conveners, 8th IUWMC, 2022, for accepting the submission for presentation.

Ethical approvals: Not Applicable

Competing Interests: Not Applicable

Authors' Contribution: Contributed equally.

Funding: Not Applicable

Availability of data and material: All provided with the manuscript

Tanford, Protein Denaturation. Advances in Protein Chemistry, 121, 121, (1968)
Gainza, P., Wehrle, S., Van Hall-Beauvais, A. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176 (2023). https://doi.org/10.1038/s41586-023-05993-x
W.-W. Liao, M. Asri, J. Ebler, D. Doerr, M. Haukness, G. Hickey, S. Lu, J. K. Lucas, J. Monlong, H. J. Abel, S. Buonaiuto, X. H. Chang, H. Cheng, J. Chu, V. Colonna, J. M. Eizenga, X. Feng, C. Fischer, R. S. Fulton, S. Garg, C. Groza, A. Guarracino, W. T. Harvey, S. Heumos, K. Howe, M. Jain, T.-Y. Lu, C. Markello, F. J. Martin, M. W. Mitchell, K. M. Munson, M. N. Mwaniki, A. M. Novak, H. E. Olsen, T. Pesout, D. Porubsky, P. Prins, J. A. Sibbesen, C. Tomlinson, F. Villani, M. R. Vollger, H. P. R. Consortium, G. Bourque, M. J. Chaisson, P. Flicek, A. M. Phillippy, J. M. Zook, E. E. Eichler, D. Haussler, E. D. Jarvis, K. H. Miga, T. Wang, E. Garrison, T. Marschall, I. Hall, H. Li, and B. Paten, Nature617, 312 (2023).
R. A. Langan, S. E. Boyken, A. H. Ng, J. A. Samson, G. Dods, A. M. Westbrook, T. H. Nguyen, M. J. Lajoie, Z. Chen, S. Berger, V. K. Mulligan, J. E. Dueber, W. R. P. Novak, H. El-Samad, and D. Baker, Nature 572, 205 (2019).
K. E. Dunn, F. Dannenberg, T. E. Ouldridge, M. Kwiatkowska, A. J. Turberfield, and J. Bath, Nature 525, 82 (2015).
J. P. Renaud, C. W. Chung, U. H. Danielson, U. Egner, M. Hennig, R. E. Hubbard, and H. Nar, Nat Rev Drug Discov 15, 679 (2016).
E. O. Freed, Nat Rev Microbiol 13, 484 (2015).
Big Data Analytics in Chemoinformatics and Bioinformatics: With Applications to Computer-Aided Drug Design, Cancer Biology, Emerging Pathogens and Computational Toxicology, Subhash C Basak, Marjan Vračko, Publisher Elsevier, 2022
Dean, P. Drug design in the 1990s. Nat Biotechnol 15, 1018 (1997).
J. Lyu, J. J. Irwin, and B. K. Shoichet, Nat Chem Biol, 19, 712 (2023).
T. A. Collier, T. J. Piggot, and J. R. Allison, in Methods in Molecular Biology 2072, 311 (2020).
L. Verlet, Physical Review 159, 98 (1967).
T. Blundell, D. Carney, S. Gardner, F. Hayes, B. Howlin, T. Hubbard, J. Overington, D. A. Singh, B. L. Sibanda, and M. Sutcliffe, Eur J Biochem 172, 513 (1988).
T. Schwede, J. Kopp, N. Guex, and M. C. Peitsch, Nucleic Acids Res 31, 3381 (2003)
Jothi, Protein Pept Lett 19, 1191 (2012).
L. Loewe and W. G. Hill, Philosophical Transactions of the Royal Society B: Biological Sciences 365, 1153 (2010).
D. E. V. Pires, D. B. Ascher, and T. L. Blundell, Nucleic Acids Res 42, W314 (2014).
K. P. Tan, T. R. Kanitkar, C. K. Kwoh, and M. S. Madhusudhan, Front Mol Biosci 8, (2021).
S. Iqbal, F. Ge, F. Li, T. Akutsu, Y. Zheng, R. B. Gasser, D. J. Yu, G. I. Webb, and J. Song, J Chem Inf Model 62, 4270 (2022).
S. Iqbal, D. Hoksza, E. Pérez-Palma, P. May, J. B. Jespersen, S. S. Ahmed, Z. T. Rifat, H. O. Heyne, M. S. Rahman, J. R. Cottrell, F. F. Wagner, M. J. Daly, A. J. Campbell, and D. Lal, Nucleic Acids Res 48, W132 (2021).
C. H. M. Rodrigues, D. E. V. Pires, and D. B. Ascher, Protein Science 30, 60 (2021).
https://www.rcsb.org/structure/4g1q
Kuroda, D., Bauman, J., Challa, J. et al. Snapshot of the equilibrium dynamics of a drug bound to HIV-1 reverse transcriptase. Nature Chem 5, 174 (2013).

No competing interests reported.


Editorial decision: Major revision
17 Aug, 2023
Reviews received at journal
30 Jun, 2023
Reviewers agreed at journal
30 Jun, 2023
Reviewers invited by journal
27 Jun, 2023
Editor assigned by journal
26 Jun, 2023
Submission checks completed at journal
22 Jun, 2023
First submitted to journal
21 Jun, 2023
