In-silico analysis of mutations in ANK1, SPTB, SPTA1, SLC4A1, and EPB4.2 genes responsible for Hereditary Spherocytosis


 Hereditary spherocytosis (HS) is a rare inherited disorder of red blood cells (RBCs), characterized by spherical rather than biconcave (doughnut-shaped) cells with reduced deformability, which leads to gallstones and splenomegaly. The role of mutations in the genes that regulate the synthesis and structure of RBC proteins is well studied, and mutations in five genes are known to result in hereditary spherocytosis. We therefore aimed to study the consequences of non-synonymous mutations in ANK1, EPB4.2, SPTA1, SPTB and SLC4A1 using advanced in silico methods. nsSNPs were studied with in silico resources and techniques including OMIM, ClinVar, SIFT, PolyPhen and homology modelling. Missense nsSNPs were identified in all the selected genes, and their effects on protein structure, stability and function were studied. The results showed that 52 nsSNPs were predicted to be responsible for changes in the shape of RBCs. After the nsSNPs were identified, the protein structures were modelled and their RMSD, relative solvent accessibility and stability were analysed. Protein stability analysis revealed a significant change in free energy (ΔΔG) for most of the identified nsSNP variants. These findings may be helpful for genotype-phenotype research as well as for pharmacogenetic studies. Finally, this study demonstrates the value of in silico methods for identifying highly pathogenic genomic variants that affect the structure and function of HS-causing proteins.

nsSNPs are important because they may play a major role in the functional diversity of encoded proteins in humans and can be responsible for various complex or hereditary diseases [16,17]. nsSNPs may be deleterious or tolerated in nature [18]. Deleterious nsSNPs affect the function, interactions and structure of proteins. Damaging effects may include altered protein stability, charge, geometry, translation and inter- or intra-protein interactions, as well as changes in gene regulation, transcription and the structural integrity of cells [19,20].
All these studies support the view that SNPs, especially missense SNPs, are the simplest and most common source of genetic polymorphism in the human genome. Missense SNPs have been reported to be responsible for HS, which leads to hereditary anemia. This is the first ever study which involves
RMSD is a vital parameter used to measure the deviation of a mutant protein structure from the native protein structure: the larger the structural deviation, the higher the RMSD value. An increased RMSD value indicates a difference in configuration between the mutant and native structures, which implies that the mutation may strongly affect protein structure, stability and function.
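For reference, RMSD is conventionally defined as below, where $N$ is the number of paired atoms and $\mathbf{r}_i$ are their positions after the two structures have been optimally superposed. Pairing on Cα atoms is an assumption here; the text does not state which atoms were compared. As a worked example, three paired atoms that deviate by 0, 0 and 3 Å give $\sqrt{(0 + 0 + 9)/3} \approx 1.73$ Å.

$$
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{r}_i^{\mathrm{native}} - \mathbf{r}_i^{\mathrm{mutant}} \right\rVert^{2}}
$$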
Relative solvent accessibility (RSA) is an important parameter for examining protein stability. RSA was examined with the NetSurfP server. Solvent accessibility studies determine the solvent-exposed surface area of a protein. NetSurfP classifies each residue into one of two categories, buried (B) or exposed (E), indicating low and high accessibility to solvent, respectively [22]. The I-Mutant predictions, RMSD calculations and NetSurfP predictions are given in Table 2.

Effect of amino acid changes on membrane protein phenotype:
Amino acid point mutations (nsSNPs) may change protein structure and function. This study compares residues whose substitution locally changes the three-dimensional structure of the protein. Such local conformational changes may affect protein function and cause disease, particularly when they involve binding sites or protein folding. For instance, disruption of hydrophobic interactions or hydrogen bonding, introduction of charged residues into buried sites, or mutations that break β-sheets often have severe phenotypic effects and increase susceptibility to disease [39][40][41].
The results of this comparison are given in Table 3.
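The native-versus-mutant comparisons above weigh size, charge and hydrophobicity. The sketch below shows one way to tabulate those three properties for a substitution. It is a minimal illustration, not the tool used in this study: the function name `compare_residues` is hypothetical, hydrophobicity uses the Kyte-Doolittle hydropathy scale, size is approximated by average residue mass, and histidine is counted as positive (matching how the comparisons treat it). All three choices are assumptions.

```python
# Hypothetical helper (not the authors' pipeline): compare native vs mutant
# residue properties. Assumed scales:
#   hydrophobicity -> Kyte-Doolittle hydropathy index
#   size proxy     -> average residue mass (Da)
#   charge         -> formal side-chain charge near pH 7, His counted as +1
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1, "H": 1}


def compare_residues(native: str, mutant: str) -> dict:
    """Return property differences for a native -> mutant substitution."""
    for aa in (native, mutant):
        if aa not in KYTE_DOOLITTLE:
            raise ValueError(f"unknown residue code: {aa!r}")
    charge_native = CHARGE.get(native, 0)
    charge_mutant = CHARGE.get(mutant, 0)
    return {
        # positive -> mutant residue is larger (by mass)
        "size_change_da": round(RESIDUE_MASS[mutant] - RESIDUE_MASS[native], 2),
        # positive -> mutant residue is more hydrophobic
        "hydropathy_change": round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[native], 2),
        "charge_native": charge_native,
        "charge_mutant": charge_mutant,
        "charge_changed": charge_native != charge_mutant,
    }


if __name__ == "__main__":
    # R -> W substitution (e.g. R1218W or R646W as named in this study)
    print(compare_residues("R", "W"))
```

For R → W this reports a larger, more hydrophobic and uncharged mutant residue, the same qualitative pattern the comparisons describe for arginine-to-tryptophan changes.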

Generation of the 3D structure of membrane proteins
The experimentally determined structures of ANK1, EPB42 and SLC4A1 do not cover the complete sequences, so computational techniques were used to generate their 3D structures (Figure 1). The FASTA sequences of ANK1 (UniProt ID: P16157, 1881 amino acid residues), SLC4A1 (P02730, 911 residues) and EPB42 (P16452, 691 residues) were retrieved from the UniProt database. Templates were identified by submitting each target sequence to BlastP and selecting the hits with maximum identity and the best (lowest) E-value. PDB ID 4RLV was selected as the template for ANK1 on the basis of query cover (45%) and identity (65%) with the target. PDB ID 4PYG was selected as the template for EPB42 on the basis of query cover (99%) and identity (33%) with the target.
PDB ID 4YZF was selected as the template for SLC4A1 on the basis of query cover (99%) and identity (99%) with the target, whereas the other two proteins had below 35% similarity.

The validation of the model
The 3D structures of the protein models were validated with PROCHECK and ProSA for model integrity [43]. The Ramachandran plot (Figure 2) of the ANK1 protein shows 80.1% (584 aa) of the residues in the most favoured regions and 14.1% (103 aa) in additionally allowed regions, indicating a good-quality model; the mutant ANK1 protein shows 76.4% (557 aa) of the residues in the most favoured regions and 18.2% (133 aa) in additionally allowed regions, indicating an effect of the mutation on the model structure. For SLC4A1, 86.5% (526 aa) of the residues are in the most favoured regions and 10.9% (66 aa) in additionally allowed regions, indicating a good-quality model, whereas the mutant SLC4A1 protein shows 86.4% (586 aa) in the most favoured regions and 10.7% (85 aa) in additionally allowed regions, indicating an effect of the mutation on the model structure. For EPB42, 89.5% (544 aa) of the residues are in the most favoured regions and 8.1% (49 aa) in additionally allowed regions, indicating a good-quality model, whereas the mutant EPB42 protein shows 89.74% (558 aa) in the most favoured regions and 7.9% (49 aa) in additionally allowed regions, indicating an effect of the mutation on the model structure. Moreover, the ProSA Z-scores (dark spot) are 0.33, -3.82 and -7.83, which fall within the range of values for known protein structures determined by X-ray crystallography (light blue) and NMR (dark blue).
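Each Ramachandran result above pairs a residue count with a percentage, so the two regions of one model should imply the same residue total. PROCHECK typically computes these percentages over non-glycine, non-proline residues, so that total need not equal the sequence length. The hypothetical helpers below back-calculate the implied total from the most-favoured figures and check that the additionally-allowed figures agree; the 0.15-point tolerance is an assumption chosen to absorb one-decimal rounding.

```python
def implied_total(count: int, percent: float) -> int:
    """Residue total implied by `count` residues being `percent` % of it (rounded)."""
    if count < 0 or percent <= 0:
        raise ValueError("count must be >= 0 and percent > 0")
    return round(count * 100.0 / percent)


def percent_of(count: int, total: int) -> float:
    """Percentage of `total` made up by `count`, to one decimal place."""
    return round(100.0 * count / total, 1)


def consistent(fav_count: int, fav_pct: float,
               allowed_count: int, allowed_pct: float,
               tol: float = 0.15) -> bool:
    """True if both reported region percentages fit one residue total."""
    total = implied_total(fav_count, fav_pct)
    return (abs(percent_of(fav_count, total) - fav_pct) <= tol
            and abs(percent_of(allowed_count, total) - allowed_pct) <= tol)


if __name__ == "__main__":
    # Native ANK1 model: 584 aa (80.1%) most favoured, 103 aa (14.1%) allowed
    print(implied_total(584, 80.1), consistent(584, 80.1, 103, 14.1))
```

For the native ANK1 model, 584 residues at 80.1% imply about 729 residues, and 103 of 729 is 14.1%, so the two reported figures are mutually consistent.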

Superimposition of mutant protein structures
There are five HS-related genes, namely ANK1, SPTA1, SLC4A1, EPB42 and SPTB. Because these genes play a vital role in providing the RBC with its shape, their damaging mutations need to be examined at the structural level. Each mutant structure was compared with the native structure. In the ANK1 gene, the mutations (R1218W, T472I, R446T, T188M, N251K, R699Q),

Discussion
RBC membrane disorders weaken the stability of the cell membrane, leading to increased fragility, membrane loss and morphological changes due to irregular shape. This is a rare disorder. The genes responsible for maintaining RBC shape and associated with membrane disorders include ANK1 (ankyrin-1), SLC4A1 (band 3), SPTA1 (α-spectrin), SPTB (β-spectrin) and EPB4.2 (protein 4.2). RBCs are able to function because of their unique shape. Studies have reported that mutations in membrane proteins alter the shape and function of cells and affect protein-protein interactions.
To date, various genetic mutations in membrane genes have been found with the help of advanced techniques.
It is necessary to be aware of the rare causes of inherited RBC disorders that are not easily detected by routine laboratory approaches. Compound heterozygosity is a good example: defects in RBC membrane genes that are clinically insignificant in the parents may together cause significant hemolysis in the offspring. However, the biological outcome of such allele combinations is not clearly understood. Correlating genomic variants with precise clinical data through molecular approaches is expensive and time-consuming, whereas in silico analysis is a useful and advanced approach that can help select pathogenic variants in genetic variant studies and predict structural and functional protein phenotypes.
In the present investigation, advanced computational approaches were used to refine missense mutations and predict their structural and functional impacts on RBC membrane proteins. SIFT and ClinVar were used to screen and identify the most pathogenic missense variants of RBC membrane proteins.

Collection of Missense Variants
The dataset of missense variants in the human genome was collected from the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/). All collected SNVs were classified as coding or non-coding based on the nature and condition of the variants. Only missense variants (nsSNPs) were chosen for computational analysis because of their ability to cause structural changes in proteins [23].
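Missense variants are named in this study in one-letter form (for example R446T), while variant databases usually give protein changes in HGVS three-letter notation (for example p.Arg446Thr). The sketch below converts simple HGVS protein substitutions to that short form and returns `None` for changes that are not missense, such as nonsense (Ter), synonymous (`=` or same residue) or non-protein notation. It assumes the `p.` string has already been extracted from each record; the function name `missense_short_form` is hypothetical and not part of any ClinVar API.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Simple protein substitution, optionally parenthesised (predicted change):
# "p.Arg446Thr" or "p.(Arg446Thr)". Frameshifts, indels and "=" do not match.
SUBSTITUTION = re.compile(r"^p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?$")


def missense_short_form(hgvs_p: str):
    """Return e.g. 'R446T' for a missense change, otherwise None."""
    match = SUBSTITUTION.match(hgvs_p.strip())
    if match is None:
        return None
    ref, pos, alt = match.groups()
    # "Ter" (stop) is not in the table, so nonsense changes are dropped here
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE or ref == alt:
        return None
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


if __name__ == "__main__":
    for change in ["p.Arg446Thr", "p.(Thr188Met)", "p.Arg446Ter", "p.Arg446="]:
        print(change, "->", missense_short_form(change))
```

Filtering on the converted value (keeping only non-`None` results) gives the missense-only subset described above.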

Identification of nsSNP and their functional effect
Missense SNPs were identified using the Sorting Intolerant From Tolerant (SIFT) tool, an online bioinformatics tool used to distinguish deleterious coding nsSNPs from other SNPs [24]. SIFT (http://sift.jcvi.org/) predicts, for a submitted chromosome number and position, whether the resulting amino acid substitution affects the protein. It assigns each substitution a score between 0 and 1; nsSNPs with a score of 0.05 or less are considered intolerant. The results were cross-checked with another online tool, PolyPhen-2 [25], which predicts the functional significance of SNPs.
The WHESS.db module of PolyPhen-2 is a rapid and reliable tool that returns results quickly for submitted nsSNP queries. The outcome is probably damaging, possibly damaging or benign, with a score between 0 and 1: nsSNPs scoring 0 to 0.15 are considered benign, and those scoring 0.15 to 1.0 are considered damaging. The PolyPhen-2 server was accessed at http://genetics.bwh.harvard.edu/pph2/index.shtml [26].
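The two score cut-offs above can be applied mechanically. The sketch below uses the thresholds as stated in this section (SIFT ≤ 0.05 intolerant; PolyPhen-2 0 to 0.15 benign, above that damaging). Two choices are assumptions: a score of exactly 0.15 is treated as benign, and a variant is kept only when both tools agree. The function names are hypothetical.

```python
def sift_call(score: float) -> str:
    """SIFT: score <= 0.05 is treated as deleterious (intolerant)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT score must be in [0, 1]")
    return "deleterious" if score <= 0.05 else "tolerated"


def polyphen_call(score: float) -> str:
    """PolyPhen-2: 0-0.15 benign, above 0.15 damaging (boundary assumed benign)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 score must be in [0, 1]")
    return "benign" if score <= 0.15 else "damaging"


def consensus_deleterious(sift_score: float, polyphen_score: float) -> bool:
    """Keep a variant only if both tools flag it (an assumed cross-check rule)."""
    return (sift_call(sift_score) == "deleterious"
            and polyphen_call(polyphen_score) == "damaging")


if __name__ == "__main__":
    # Hypothetical scores, not results from this study
    print(consensus_deleterious(0.01, 0.98), consensus_deleterious(0.01, 0.10))
```

Requiring agreement between both predictors is a conservative filter; a looser rule (either tool) would retain more candidate nsSNPs.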

Mutant protein stability prediction
Protein stability of the mutant proteins was analysed with I-Mutant2.0 [27], a support vector machine (SVM)-based tool that predicts the change in protein stability upon a single point mutation, taking either the target protein sequence or its structure (PDB format and chain) as input.
All nsSNPs associated with hereditary spherocytosis were submitted to I-Mutant2.0 to predict free energy change values. The results are given as the direction of the stability change (increased or decreased) and as the change in Gibbs free energy (DDG) [28]. Decreased protein stability reflects increased degradation, misfolding and aggregation of the protein, and vice versa. The sequences of ankyrin, α- and β-spectrin, band 3 and erythrocyte membrane protein band 4.2 were retrieved from the UniProt database and subjected to protein stability prediction with I-Mutant2.0, using accessions P16157 (ankyrin), P16452 (erythrocyte membrane protein band 4.2), P02730 (band 3), P02549 (α-spectrin) and P11277 (β-spectrin).
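I-Mutant2.0 output is usually read by the sign of DDG (kcal/mol): a negative value means the mutation decreases stability and a positive value means it increases stability. The hypothetical helper below applies that convention. The optional `neutral_band` is an assumption: a ±0.5 kcal/mol band is sometimes used to mark small changes as neutral (for example in I-Mutant 3.0's three-state mode), whereas the default of 0 reproduces the plain sign rule.

```python
def stability_call(ddg: float, neutral_band: float = 0.0) -> str:
    """Classify a predicted DDG (kcal/mol) by sign.

    DDG < -neutral_band -> "decrease"; DDG > neutral_band -> "increase";
    otherwise "neutral". neutral_band is an assumed, optional threshold.
    """
    if neutral_band < 0:
        raise ValueError("neutral_band must be non-negative")
    if ddg < -neutral_band:
        return "decrease"
    if ddg > neutral_band:
        return "increase"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical DDG values, not predictions from this study
    for value in (-1.2, -0.3, 0.3):
        print(value, stability_call(value), stability_call(value, neutral_band=0.5))
```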

Effect of mutant protein on solvent accessibility
Solvent accessibility of the mutant proteins was determined using the online NetSurfP server, which predicts the solvent-accessible surface of a protein from its FASTA sequence. The server estimates how accessible each amino acid residue is to the solvent (water) and classifies it as either buried or exposed [29,30].
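Relative solvent accessibility is the ratio of a residue's accessible surface area to the maximum accessible area for that residue type, and the two-class buried/exposed call is made by thresholding it. The sketch below assumes a 25% RSA cut-off, a common convention for this two-state split, and adds a hypothetical `burial_changed` check for whether a substitution moves a site between classes. All function names are hypothetical; NetSurfP itself predicts these values from sequence rather than computing them from a structure.

```python
def relative_solvent_accessibility(asa: float, max_asa: float) -> float:
    """RSA = accessible surface area / maximum ASA for that residue type."""
    if asa < 0 or max_asa <= 0:
        raise ValueError("asa must be >= 0 and max_asa > 0")
    return asa / max_asa


def burial_class(rsa: float, threshold: float = 0.25) -> str:
    """Two-state call: RSA at or above the (assumed 25%) threshold is exposed."""
    return "exposed" if rsa >= threshold else "buried"


def burial_changed(native_rsa: float, mutant_rsa: float,
                   threshold: float = 0.25) -> bool:
    """True if the site switches between buried and exposed after mutation."""
    return burial_class(native_rsa, threshold) != burial_class(mutant_rsa, threshold)


if __name__ == "__main__":
    # Hypothetical values: 50 A^2 exposed out of a 200 A^2 maximum
    rsa = relative_solvent_accessibility(50.0, 200.0)
    print(rsa, burial_class(rsa))
```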

3D modelling of the protein and RMSD calculations
The 3D structures of ankyrin and the erythrocyte membrane proteins were generated with MODELLER 9.22 [31]. The amino acid sequences of the membrane proteins were retrieved from the UniProtKB database in FASTA format and submitted to BLASTp to identify suitable templates. Templates with more than 30% identity were selected for generating the 3D structures of the targets [32][33][34].
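The template choice above combines an identity cut-off (more than 30%) with query cover, identity and E-value. The sketch below shows one way to rank BLASTp hits under those criteria. The ranking order (query cover first, then identity, then lower E-value) is an assumption, the `Hit` fields are a hypothetical structure rather than a BLAST output format, and the example hits use placeholder IDs, not real search results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    pdb_id: str          # placeholder template identifier
    identity: float      # percent identity to the target
    query_cover: float   # percent of the target sequence covered
    evalue: float


def pick_template(hits: List[Hit], min_identity: float = 30.0) -> Optional[Hit]:
    """Return the best hit with identity > min_identity, or None.

    Assumed ranking: higher query cover, then higher identity, then lower E-value.
    """
    eligible = [h for h in hits if h.identity > min_identity]
    if not eligible:
        return None
    return max(eligible, key=lambda h: (h.query_cover, h.identity, -h.evalue))


if __name__ == "__main__":
    # Hypothetical hits for illustration only
    hits = [
        Hit("TPL1", identity=65.0, query_cover=45.0, evalue=1e-50),
        Hit("TPL2", identity=28.0, query_cover=99.0, evalue=1e-20),
        Hit("TPL3", identity=33.0, query_cover=99.0, evalue=1e-30),
    ]
    print(pick_template(hits))
```

Here TPL2 is excluded by the 30% identity cut-off, and TPL3 is preferred over TPL1 because it covers far more of the target; weighting identity above coverage would reverse that choice.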
PyMOL was used to calculate the RMSD (root mean square deviation) between the native structure and each mutant form of each protein [35][36].
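The sketch below computes RMSD together with per-residue deviations, which helps locate where a mutant model departs from the native one. It assumes the two structures have already been superposed (as PyMOL's `align` or `super` commands do) and that coordinates are paired residue by residue, for example Cα atoms. By default PyMOL's `align` also rejects outlier atoms over several refinement cycles, so its reported RMSD can be lower than the plain all-pairs value computed here. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_residue_deviation(native: Sequence[Coord], mutant: Sequence[Coord]) -> List[float]:
    """Distance between paired, already-superposed atoms (e.g. C-alpha)."""
    if len(native) != len(mutant) or not native:
        raise ValueError("coordinate lists must be non-empty and equal length")
    return [math.dist(a, b) for a, b in zip(native, mutant)]


def rmsd(native: Sequence[Coord], mutant: Sequence[Coord]) -> float:
    """Root mean square of the per-residue deviations (no fitting performed)."""
    deviations = per_residue_deviation(native, mutant)
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def most_displaced(native, mutant, labels, top: int = 3):
    """Residue labels with the largest deviations, largest first."""
    deviations = per_residue_deviation(native, mutant)
    ranked = sorted(zip(labels, deviations), key=lambda t: t[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    mutant = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
    print(rmsd(native, mutant), most_displaced(native, mutant, ["r1", "r2", "r3"], top=1))
```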

3D structure validation
The quality of the generated models was evaluated with PROCHECK and Protein Structure Analysis (ProSA). PROCHECK is a server that relies on the Ramachandran plot for structure verification and assesses the stereochemical quality of a model. ProSA was used to validate the experimental and modelled protein structures [37,38].

(rs138732899, rs41273519, rs35733059, rs36057043, rs121918634, rs121918642), SLC4A1 (rs121912741, rs121912755, rs121912758, rs121912754) were found to be highly damaging and deleterious to red blood cell protein structure, stability and function and were predicted to be a key cause of hereditary spherocytosis. These predicted deleterious mutations may alter the function of the five genes, leading to hereditary spherocytosis. ANK1, SPTB, SPTA1, EPB42 and SLC4A1 are expressed in red blood cells but are also found in brain and muscle cells; their products are located at the cell membrane and bind to other membrane proteins, which affects the stability and structure of red blood cells. This was predicted to be the outcome associated with the effect of the above nsSNPs on ankyrin, the spectrins (alpha and beta), band 3 and erythrocyte membrane protein band 4.2.

Conclusion
These nsSNPs have potential effects on protein structure and function. This study should help in selecting important nsSNPs for wet-lab evaluation and in the development of potent drugs.

Tables

Table 1: The list of genes, damaged nsSNPs and affected amino acids, with their tolerance index, predicted impact and probability score by SIFT and PolyPhen

Native residue → mutant residue: comparison
? → ?: Glycine is more hydrophobic than arginine.
D → N: Aspartic acid (native) is negatively charged and asparagine (mutant) is neutral. The difference in charge will disturb ionic interactions.
R → H: Both residues are positively charged, but arginine (native) is bigger than histidine (mutant).
V → M: Methionine (mutant) is bigger than valine (native); both residues are hydrophobic.
W → R: Tryptophan (native) is bigger than arginine (mutant). Arginine is positively charged and tryptophan is neutral. The mutant residue is more hydrophobic than the native residue.
P → L: Proline (mutant) is neutral and leucine (native) is a non-polar, aliphatic residue.
T → A: Threonine (native) is bigger than alanine (mutant). Threonine is hydrophilic, whereas alanine is hydrophobic.
S → P: Serine is a polar residue that forms hydrogen bonds, whereas proline is hydrophobic and normally buried inside the protein core.
G → D: Glycine (native) is smaller than aspartic acid (mutant). Aspartic acid is negatively charged and forms salt bridges.
S → F: Phenylalanine (mutant) is bigger than serine (native). Serine is a polar residue that forms hydrogen bonds, whereas phenylalanine is hydrophobic and normally buried inside the protein core.
R → C: Arginine (native) is positively charged and cysteine (mutant) is hydrophilic. Arginine is bigger than cysteine.
R → S: Arginine (native) is positively charged and serine (mutant) is hydrophilic. Arginine is bigger than serine.
L → M: Methionine (mutant) is bigger than leucine (native).
A → T: Threonine (mutant) is bigger than alanine (native). Threonine is hydrophilic, whereas alanine is hydrophobic.
E → D: Both are negatively charged, but glutamic acid is bigger than aspartic acid.
L → F: Phenylalanine (mutant) is bigger than leucine (native).
V → A: Both residues are hydrophobic, but valine (native) is bigger than alanine (mutant).
R → I: Arginine (native) is bigger than isoleucine (mutant). Arginine is positively charged, whereas isoleucine is hydrophobic and normally buried inside the protein core.
Fig. 1. (A) EPB42: (I) A112T mutation; (II) R280Q mutation. (B) SLC4A1: (I) native structure of SLC4A1; native structure superimposed on the mutant structures for (II) R589H, (III) S613F, (IV) R646W, (V) G701D, (VI) G771D and (VII) S773P. (C) ANK1: native structure in cartoon representation, superimposed with the mutant structures for (I) R446T, (II) N251K, (III) T188M and (VI) R699Q. Native residues are highlighted in cyan and mutant residues in magenta.

Fig. 2. Stereochemical analysis of membrane proteins: (A) EPB4.2, (B) SLC4A1 and (C) ANK1. The red region marks the most favoured regions for residues and the yellow region the additionally allowed regions. The Ramachandran plots show 92% of EPB42 residues, 86% of SLC4A1 residues and 80.1% of ANK1 residues in allowed regions.