The flow chart illustrates the overall procedure for identifying and categorizing detrimental SNPs in HMGCR, together with the analysis of the structural and functional consequences of mutation (Fig. 1).
The HMGCR gene is prone to point mutations and rich in missense variants
The HMGCR gene (25783 bp) consists of 23 exons. SNP data for the gene were collected from dbSNP, the largest polymorphism database, although it houses both validated and non-validated polymorphisms [22]. dbSNP contains a total of 6815 SNPs for HMGCR, of which 388 are missense SNPs (Fig. 2).
Of the 388 missense SNP rsIDs submitted from dbSNP, SIFT predicted 7 to have a deleterious effect, using a tolerance index (TI) cutoff of ≤ 0.05 (Table 1). These 7 missense SNPs (rs112503211, rs113949962, rs147043821, rs147818666, rs148335635, rs193026499, and rs368129510) are reported with a tolerance index of 0.1 and were considered damaging to the HMGCR gene (Table 1).
Table 1
Impact of amino acid substitution on protein function using SIFT.
| SI | SNP | Ref allele | Alt allele | Amino acid change | Gene ID | Transcript ID | Protein ID | Region | SIFT prediction |
| 1. | rs112503211 | T | C | S147P | ENSG00000113161 | ENST00000287936 | ENSP00000287936 | CDS | Deleterious |
| 2. | rs113949962 | G | A | M1I | ENSG00000113161 | ENST00000287936 | ENSP00000287936 | CDS | Deleterious |
| 3. | rs147043821 | G | C | L218F | ENSG00000113161 | ENST00000287936 | ENSP00000287936 | CDS | Deleterious |
| 4. | rs147818666 | G | C | G663A | ENSG00000113161 | ENST00000287936 | ENSP00000287936 | CDS | Deleterious |
| 5. | rs148335635 | A | G | N204S | ENSG00000113161 | ENST00000287936 | ENSP00000287936 | CDS | Deleterious |
| 6. | rs193026499 | C | T | R595C | ENSG00000113161 | ENST00000287936 | ENSP00000287936 | CDS | Deleterious |
| 7. | rs368129510 | C | T | R159C | ENSG00000113161 | ENST00000287936 | ENSP00000287936 | CDS | Deleterious |
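The SIFT cutoff quoted above can be expressed as a one-line rule. The following is a minimal sketch, assuming the ≤ 0.05 tolerance index cutoff stated in the text; the function name and the example scores are hypothetical and are not values from Table 1.

```python
# Minimal sketch of a SIFT-style call from a tolerance index (TI).
# The 0.05 cutoff is the one quoted in the text; sift_call is a hypothetical name.
SIFT_CUTOFF = 0.05


def sift_call(tolerance_index: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Return 'Deleterious' when TI <= cutoff, otherwise 'Tolerated'."""
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("tolerance index must lie in [0, 1]")
    return "Deleterious" if tolerance_index <= cutoff else "Tolerated"


# Hypothetical scores, for illustration only (not taken from Table 1).
for ti in (0.00, 0.05, 0.32):
    print(f"TI={ti:.2f} -> {sift_call(ti)}")
```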
Coding missense SNPs rs147043821 and rs193026499 are the two most probable damaging mutations in HMGCR
The PolyPhen program was used to identify missense SNPs with the potential to cause structural modifications through amino acid substitution. A total of 388 missense SNP rsIDs were submitted to the PolyPhen server; in the resulting output, 27 amino acid substitutions were reported as probably damaging, with PSIC scores ranging from 0.539 to 1. The seven missense SNPs identified by SIFT as deleterious (rs112503211, rs113949962, rs147043821, rs147818666, rs148335635, rs193026499, and rs368129510) were also marked as damaging by the PolyPhen-2 program (Table 2).
Table 2
Functional characterization of missense SNPs by PolyPhen.
| SI | SNP | Protein Acc | Position | AA1 | AA2 | HumDiv prediction/confidence | HumDiv probability | HumVar prediction/confidence | HumVar probability |
| 1. | rs112503211 | P04035 | 147 | S | P | PROBABLY DAMAGING | Score: 1.000; Sensitivity: 0.00; Specificity: 1.00 | PROBABLY DAMAGING | Score: 0.993; Sensitivity: 0.47; Specificity: 0.96 |
| 2. | rs113949962 | P04035 | 1 | M | I | PROBABLY DAMAGING | Score: 0.97; Sensitivity: 0.77; Specificity: 0.95 | POSSIBLY DAMAGING | Score: 0.650; Sensitivity: 0.79; Specificity: 0.84 |
| 3. | rs147043821 | P04035 | 218 | L | F | PROBABLY DAMAGING | Score: 1.000; Sensitivity: 0.00; Specificity: 1.00 | PROBABLY DAMAGING | Score: 1.000; Sensitivity: 0.00; Specificity: 1.00 |
| 4. | rs147818666 | P04035 | 663 | G | A | POSSIBLY DAMAGING | Score: 0.659; Sensitivity: 0.86; Specificity: 0.91 | POSSIBLY DAMAGING | Score: 0.519; Sensitivity: 0.82; Specificity: 0.81 |
| 5. | rs148335635 | P04035 | 204 | N | S | PROBABLY DAMAGING | Score: 1.000; Sensitivity: 0.00; Specificity: 1.00 | PROBABLY DAMAGING | Score: 0.999; Sensitivity: 0.09; Specificity: 0.99 |
| 6. | rs193026499 | P04035 | 595 | R | C | PROBABLY DAMAGING | Score: 0.999; Sensitivity: 0.14; Specificity: 0.99 | PROBABLY DAMAGING | Score: 0.983; Sensitivity: 0.56; Specificity: 0.94 |
| 7. | rs368129510 | P04035 | 159 | R | C | PROBABLY DAMAGING | Score: 1.000; Sensitivity: 0.00; Specificity: 1.00 | PROBABLY DAMAGING | Score: 0.939; Sensitivity: 0.66; Specificity: 0.91 |
To further validate the results of the preceding tools, we analyzed the missense SNPs with the following in silico SNP prediction algorithms: PMUT, SNAP2, PANTHER, MutPred, and SNPs&GO. The missense SNPs marked as deleterious by both the SIFT and PolyPhen-2 servers were principally selected. The results generated by these tools were then combined and compared with the results of SIFT and PolyPhen. In the combined results for the 388 missense SNPs, only 7 (rs114166108, rs113949962, rs182539049, rs145415894, rs142939718, rs35896902, and rs150721457) were predicted as disease-related by at least 5 of the 7 tools (Fig. 3).
Two missense SNPs, rs147043821 and rs193026499, showed positive results in all 7 tools (Tables 3–7).
Table 3
Disease association study of missense SNPs by PMUT.
| SI | SNP | Protein | Position | Mutation | Prediction |
| 1. | rs112503211 | P04035 | 147 | S → P (Ser → Pro) | 0.48 (83%) Neutral |
| 2. | rs113949962 | P04035 | 1 | M → I (Met → Ile) | 0.60 (83%) Disease |
| 3. | rs147043821 | P04035 | 218 | L → F (Leu → Phe) | 0.55 (81%) Disease |
| 4. | rs147818666 | P04035 | 663 | G → A (Gly → Ala) | 0.43 (85%) Neutral |
| 5. | rs148335635 | P04035 | 204 | N → S (Asn → Ser) | 0.48 (83%) Neutral |
| 6. | rs193026499 | P04035 | 595 | R → C (Arg → Cys) | 0.79 (89%) Disease |
| 7. | rs368129510 | P04035 | 159 | R → C (Arg → Cys) | 0.48 (83%) Neutral |
Table 4
Prediction of the damaging effect of SNPs in the HMGCR gene by SNAP2.
| SI | SNP | Wildtype Amino Acid | Position | Variant Amino Acid | Predicted Effect | Score | Expected Accuracy |
| 1. | rs112503211 | S | 147 | P | neutral | -22 | 61% |
| 2. | rs113949962 | M | 1 | I | neutral | -27 | 61% |
| 3. | rs147043821 | L | 218 | F | effect | 12 | 59% |
| 4. | rs147818666 | G | 663 | A | effect | 5 | 53% |
| 5. | rs148335635 | N | 204 | S | effect | 48 | 71% |
| 6. | rs193026499 | R | 595 | C | effect | 47 | 71% |
| 7. | rs368129510 | R | 159 | C | effect | 73 | 85% |
Table 5
Prediction of damaging polymorphisms by PANTHER.
| SI | SNP | Substitution | Preservation time | Message |
| 1. | rs112503211 | S147P | 673 | probably damaging |
| 2. | rs113949962 | M1I | 3807 | probably damaging |
| 3. | rs147043821 | L218F | 673 | probably damaging |
| 4. | rs147818666 | G663A | 3806 | probably damaging |
| 5. | rs148335635 | N204S | 673 | probably damaging |
| 6. | rs193026499 | R595C | 673 | probably damaging |
| 7. | rs368129510 | R159C | 673 | probably damaging |
Table 6
Effects on functional motifs in HMGCR predicted by MutPred2.
| SI | SNP | Substitution | MutPred2 Score | Affected PROSITE and ELM Motifs | Molecular mechanisms with P-values ≤ 0.05 |
| 1. | rs112503211 | S147P | 0.652 | ELME000063, ELME000064, ELME000136, ELME000159, ELME000202, ELME000239, ELME000249, ELME000 | Gain of Phosphorylation at S146, Prob: 0.29, P-value: 0.02; Altered Transmembrane protein, Prob: 0.17, P-value: 9.1e-03; Loss of Ubiquitylation at K142, Prob: 0.16, P-value: 0.03; Loss of GPI-anchor amidation at N148, Prob: 0.03, P-value: 8.4e-03 |
| 2. | rs113949962 | M1I | 0.881 | ELME000355 | Altered Disordered interface, Prob: 0.39, P-value: 5.1e-03; Altered Ordered interface, Prob: 0.27, P-value: 5.6e-03; Altered Signal peptide, Prob: 0.18, P-value: 8.8e-04; Loss of N-terminal acetylation at M1, Prob: 0.03, P-value: 5.6e-03 |
| 3. | rs147043821 | L218F | 0.781 | ELME000239, ELME000333, ELME000335 | Loss of Helix, Prob: 0.33, P-value: 1.2e-03; Gain of Strand, Prob: 0.28, P-value: 7.5e-03 |
| 4. | rs147818666 | G663A | 0.856 | ELME000063, PS00008 | Gain of Helix, Prob: 0.28, P-value: 0.03; Loss of Allosteric site at M659, Prob: 0.24, P-value: 0.02; Loss of Acetylation at K662, Prob: 0.24, P-value: 0.02 |
| 5. | rs193026499 | R595C | 0.569 | ELME000155 | Altered Ordered interface, Prob: 0.31, P-value: 0.01; Gain of Allosteric site at R590, Prob: 0.24, P-value: 0.01; Loss of Catalytic site at R590, Prob: 0.22, P-value: 8.8e-03; Gain of ADP-ribosylation at R598, Prob: 0.19, P-value: 0.04; Altered Transmembrane protein, Prob: 0.13, P-value: 0.02; Altered Metal binding, Prob: 0.05, P-value: 0.04 |
| 6. | rs368129510 | R159C | 0.304 | - | - |
Table 7
Disease probability prediction of SNPs by SNPs&GO.
| SI | SNP | Mutation | Prediction | RI | Probability | Method |
| 1. | rs112503211 | S147P | Disease | 4 | 0.710 | PhD-SNP: F[S] = 52% F[P] = 0% Nali = 66 |
| 2. | rs113949962 | M1I | Neutral | 3 | 0.352 | PhD-SNP: F[M] = 100% F[I] = 0% Nali = 37 |
| 3. | rs147043821 | L218F | Disease | 4 | 0.713 | PhD-SNP: F[L] = 91% F[F] = 0% Nali = 66 |
| 4. | rs147818666 | G663A | Neutral | 6 | 0.197 | PhD-SNP: F[G] = 62% F[A] = 33% Nali = 377 |
| 5. | rs148335635 | N204S | Disease | 6 | 0.820 | PhD-SNP: F[N] = 75% F[S] = 0% Nali = 66 |
| 6. | rs193026499 | R595C | Disease | 6 | 0.793 | PhD-SNP: F[R] = 51% F[C] = 0% Nali = 356 |
| 7. | rs368129510 | R159C | Disease | 6 | 0.806 | PhD-SNP: F[R] = 52% F[C] = 0% Nali = 66 |
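The per-tool calls in Tables 1–7 can be tallied into a simple consensus count. The sketch below is illustrative only: the calls are transcribed from the tables, a "possibly damaging" PolyPhen call is counted as damaging, and the 0.5 cutoff applied to the MutPred2 score is an assumption of this sketch rather than a value stated in the text. The names `OTHER_CALLS`, `MUTPRED_SCORES`, and `consensus` are hypothetical.

```python
# Consensus tally across the seven tools, transcribed from Tables 1-7.
# True = damaging / disease / effect; False = neutral.
MUTPRED_CUTOFF = 0.5  # assumed cutoff for the MutPred2 score (not from the text)

# MutPred2 scores from Table 6; None marks a variant with no row in that table.
MUTPRED_SCORES = {
    "rs112503211": 0.652, "rs113949962": 0.881, "rs147043821": 0.781,
    "rs147818666": 0.856, "rs148335635": None, "rs193026499": 0.569,
    "rs368129510": 0.304,
}

# Column order: SIFT, PolyPhen-2, PMUT, SNAP2, PANTHER, SNPs&GO.
OTHER_CALLS = {
    "rs112503211": (True, True, False, False, True, True),
    "rs113949962": (True, True, True, False, True, False),
    "rs147043821": (True, True, True, True, True, True),
    "rs147818666": (True, True, False, True, True, False),
    "rs148335635": (True, True, False, True, True, True),
    "rs193026499": (True, True, True, True, True, True),
    "rs368129510": (True, True, False, True, True, True),
}


def consensus(rsid):
    """Return (tools calling the variant damaging, tools that made a call)."""
    calls = list(OTHER_CALLS[rsid])
    score = MUTPRED_SCORES[rsid]
    if score is not None:
        calls.append(score >= MUTPRED_CUTOFF)
    return sum(calls), len(calls)


for rsid in OTHER_CALLS:
    hits, total = consensus(rsid)
    print(f"{rsid}: {hits}/{total} tools")
```

Under these assumptions, rs147043821 and rs193026499 are the only variants flagged by every tool, which agrees with the statement in the text.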
High-Risk Missense SNPs are located in the conserved region
Biological processes rely on functional sites of proteins, such as catalytic sites, allosteric sites, and protein-protein interaction sites. Amino acids in these biologically active sites tend to be more highly conserved than other residues in the protein. Substitution of these residues generally leads to complete loss of biological function and has a severely damaging effect on the biological process itself [25]. The amino acid sequence corresponding to the missense SNPs was used to identify a suitable template for building the 3D structure. To this end, the retrieved amino acid sequence was submitted to the NCBI protein BLAST tool to find the structures of the most closely related proteins. The structure with PDB ID 3cd5, which has 99.8% identity, was selected and used to build the 3D structure. The InterEvDock2 server was then used to calculate the degree of evolutionary conservation at each amino acid position of the HMGCR protein. InterEvDock2 identifies putative structural and functional residues and determines their evolutionary conservation [26]. Although a complete analysis was performed, we focused on the conservation profile of the 7 selected high-risk missense SNP locations. The analysis showed that residues S147, M1, L218, G663, N204, R595, and R159 are highly conserved (Fig. 4).
These conserved residues in HMGCR may be functionally important and are classified as functional or structural according to their location relative to the protein surface or the protein core.
High-risk missense SNPs are capable of inducing protein unfolding
The neural-network-based tool I-Mutant 2.0 was used to examine potential changes in protein stability due to the mutations. Models with mutations at S147, M1, L218, G663, N204, R595, and R159 were submitted to the server for DDG stability prediction and RSA calculation. All the mutations decreased protein stability except rs113949962, which was predicted to increase structural stability (0.58 kcal/mol). Mutation rs368129510 had the lowest DDG value (−3.34 kcal/mol), indicating that it destabilizes the protein the most (Fig. 5).
Figure 5: Free Energy Calculation of polymorphisms.
The other mutations, rs112503211, rs147043821, rs147818666, and rs193026499, have DDG values of −0.88 kcal/mol, −1.27 kcal/mol, −0.67 kcal/mol, and −0.56 kcal/mol, respectively; because these values are less than 0, they suggest decreased protein stability (Fig. 5). We also analyzed the Solvent Accessible Surface Area (SASA) and the backbone angles of the alpha helices and beta sheets in both the wild-type and mutant models of HMGCR. Apart from S147, which lies in a coil, the mutations reside in an alpha helix or a beta sheet, where they can induce a conformational change in the protein (Table 8).
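The sign convention used for the DDG values can be made explicit with a short sketch. The values below are the ones quoted in the text for I-Mutant 2.0; the dictionary and function names are hypothetical, and the rule simply restates the convention in the text that DDG < 0 indicates decreased stability.

```python
# DDG values (kcal/mol) as quoted in the text for I-Mutant 2.0.
DDG = {
    "rs113949962": 0.58,
    "rs368129510": -3.34,
    "rs112503211": -0.88,
    "rs147043821": -1.27,
    "rs147818666": -0.67,
    "rs193026499": -0.56,
}


def stability_effect(ddg):
    """Sign convention from the text: DDG < 0 means decreased stability."""
    if ddg < 0:
        return "decreased stability"
    if ddg > 0:
        return "increased stability"
    return "no predicted change"


# Rank from most destabilizing to most stabilizing.
for rsid, ddg in sorted(DDG.items(), key=lambda item: item[1]):
    print(f"{rsid}: {ddg:+.2f} kcal/mol -> {stability_effect(ddg)}")
```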
Table 8
Structural variation upon mutational impact.
| SI | Mutation | Mutation Type | Residue Name and Number | Structure | SASA | Phi Angle | Psi Angle |
| 1. | rs112503211 | Wild | S147 | Coil | 15.4 | -57.58 | 144.73 |
|  |  | Mutant | 147P |  | 18.8 | -60.07 | 140.74 |
| 2. | rs368129510 | Wild | R159 | Alpha Helix | 85.6 | -57.10 | -39.91 |
|  |  | Mutant | 159C | Alpha Helix | 68.3 | -66.57 | -41.71 |
| 3. | rs147043821 | Wild | L218 | Alpha Helix | 21.4 | -65.25 | -40.28 |
|  |  | Mutant | 218F | Alpha Helix | 78.2 | -64.22 | -44.63 |
| 4. | rs193026499 | Wild | R595 | Beta Sheet | 2.5 | 125.89 | 145.20 |
|  |  | Mutant | 595C | Beta Sheet | 67.7 | 126.90 | 144.29 |
| 5. | rs147818666 | Wild | G663 | Alpha Helix | 0.0 | -64.63 | -39.48 |
|  |  | Mutant | 663A | Alpha Helix | 0.0 | -64.96 | -39.53 |
SASA may change when an amino acid substitution occurs. Here, the SASA value changed for all the mutations except rs147818666. The SASA value of rs368129510 decreased slightly relative to the wild-type model. The increased SASA values of the mutations rs147043821 and rs193026499 may lead to unfolding of the protein 3D structure and thus to loss of the biological functions of HMGCR (Table 8).
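The SASA comparison amounts to subtracting the wild-type value from the mutant value for each variant. A minimal sketch using the values in Table 8 follows; the names `SASA` and `sasa_change` are hypothetical, and the units are whatever the table reports, since the source does not state them.

```python
# (wild-type SASA, mutant SASA) pairs transcribed from Table 8.
SASA = {
    "rs112503211": (15.4, 18.8),
    "rs368129510": (85.6, 68.3),
    "rs147043821": (21.4, 78.2),
    "rs193026499": (2.5, 67.7),
    "rs147818666": (0.0, 0.0),
}


def sasa_change(rsid):
    """Mutant minus wild-type SASA, rounded to the table's one decimal place."""
    wild, mutant = SASA[rsid]
    return round(mutant - wild, 1)


for rsid in SASA:
    delta = sasa_change(rsid)
    if delta > 0:
        trend = "more exposed"
    elif delta < 0:
        trend = "less exposed"
    else:
        trend = "unchanged"
    print(f"{rsid}: {delta:+.1f} ({trend})")
```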
High-risk polymorphisms are likely to alter the domain structures of HMGCR
The PROSITE tool on ExPASy was used to search for domain structures in HMGCR and to map the mutations onto those domains, in order to determine the changes they might cause. The tool searches the UniProtKB database for motifs; the result showed a Sterol-Sensing Domain (SSD) and a Hydroxymethylglutaryl-coenzyme A reductase (HMG-Co-A) domain in HMGCR. The SSD spans amino acid residues 61–218 of HMGCR, and the HMG-Co-A domain spans residues 464–871. All the mutations (S147P, L218F, G663A, N204S, R595C, and R159C) except M1I are located in the SSD or the HMG-Co-A domain (Fig. 6).
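Mapping each substitution onto the reported domain boundaries is a simple interval lookup. The sketch below uses the residue ranges given in the text (SSD 61–218, HMG-Co-A 464–871) and the positions from Table 1; the names `DOMAINS`, `VARIANTS`, and `domain_of` are hypothetical.

```python
# Domain boundaries (inclusive residue ranges) as reported in the text.
DOMAINS = {
    "SSD": (61, 218),
    "HMG-Co-A": (464, 871),
}

# Substitutions and their residue positions, from Table 1.
VARIANTS = {
    "M1I": 1, "S147P": 147, "R159C": 159, "N204S": 204,
    "L218F": 218, "R595C": 595, "G663A": 663,
}


def domain_of(position):
    """Return the name of the domain containing the residue, or None."""
    for name, (start, end) in DOMAINS.items():
        if start <= position <= end:
            return name
    return None


for substitution, position in VARIANTS.items():
    print(f"{substitution}: {domain_of(position) or 'outside both domains'}")
```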