In-silico predictions of deleterious SNPs in Human ephrin type-A receptor 3 (EPHA3) gene

Dipankor Chatterjee, University of Dhaka Faculty of Biological Sciences
Umar Faruq Chowdhury (umarfchy@gmail.com), University of Dhaka Faculty of Biological Sciences, https://orcid.org/0000-0001-5088-4658
Mohammad Umer Sharif Shohan, University of Dhaka Faculty of Biological Sciences
Md Mohasin, University of Dhaka Faculty of Biological Sciences
Yearul Kabir, University of Dhaka Faculty of Biological Sciences


Introduction
The ephrin receptor (EPH) family, one of the largest subgroups of receptor tyrosine kinases, is involved in many important functions, such as development, segmentation, axon guidance, fasciculation, angiogenesis, and limb development [1]. These proteins contain five domains: an extracellular ligand-binding domain, two type III fibronectin domains, and two intracellular domains, the tyrosine kinase and sterile-alpha-motif (SAM) domains. Two types of ephrins (EPH ligands), type A and type B, bind these receptors; type A ephrins are anchored to the cell membrane through a glycosyl-phosphatidylinositol (GPI) moiety and type B ephrins through a transmembrane domain. The interaction between Eph receptor-bearing and ephrin-bearing cells leads to bidirectional signaling, triggering a series of events that either stabilize cell-cell contact or lead to cell repulsion. Downstream signaling from Eph receptors regulates cell shape via Rho activation, cell growth via the MAPK pathway, cell-cell adhesion via cadherin complexes, and cell-matrix interactions via integrin complexes. Receptor-ligand dimers form heterotetramers, which further assemble into higher-order signaling clusters. A high-affinity binding site in the N-terminal domain of the EPH receptor mediates the intercellular Eph-ephrin interaction. Intracellular signaling by EphA3 receptors is initiated by autophosphorylation of three defined tyrosine residues, two in the highly conserved juxtamembrane region and the third in the activation loop of the kinase domain (Y779) [2]. Rapid reorganization of the actin and myosin cytoskeleton follows, leading to retraction of cellular protrusions, membrane blebbing, and cell detachment, followed by the association of the adaptor protein CrkII with tyrosine-phosphorylated EphA3 and activation of RhoA signaling [3]. EPHA3 is therefore a vital receptor protein for biological activities.
EPHA3 is highly expressed during vertebrate development. In adult tissues, its expression is restricted and is detected at a significantly lower level than in the early developmental stages of vertebrates. Abnormal expression of EPHA3 is associated with many cancers. These abnormalities may arise from mutations in specific regions of the EPHA3 gene. Analysis of the EPHA3 receptor protein is therefore needed to better understand the diseases involving the receptor and to devise therapeutic strategies. Many studies have reported that this protein is associated with cancers such as lung adenocarcinoma, colorectal carcinoma, metastatic melanoma, and pancreatic cancer, among others.
In-silico studies are now a widely used approach to find specific point mutations in a protein that has been associated with diseases or cancers. Many tools are used in these approaches to screen for functionally deleterious SNPs of a particular gene. These tools predict changes in structure and function and provide scores based on their algorithms, which have a fairly high degree of accuracy. Using these in-silico methods, functional and structural SNPs of many genes, such as BRCA1 (Breast cancer type 1 susceptibility protein), BRAF (B-Raf proto-oncogene) [4], and IGF1R (Insulin-like growth factor 1 receptor) [5], have been identified.
In this study, several computational approaches were used to find the most deleterious nsSNPs in the EPHA3 gene, which may be associated with loss of function and structural instability and may confer susceptibility to many diseases or cancers. This study will help to extend knowledge of the effect of nsSNPs on the function and stability of the EPHA3 protein.

Collection of nsSNPs
SNPs for the EPHA3 gene were collected from the NCBI dbSNP database [6] (http://www.ncbi.nlm.nih.gov/SNP/), and the EPHA3 protein sequence (FASTA format) was collected from the UniProt database (https://www.uniprot.org/). Only SNPs reported for Homo sapiens were considered. From these, the non-synonymous SNPs (missense and nonsense) were selected for analysis.

Identification of damaging nsSNPs by SIFT
Non-synonymous SNPs (nsSNPs) of the EPHA3 gene were first screened with the Sorting Intolerant From Tolerant (SIFT, https://sift.bii.a-star.edu.sg/) program to find the most damaging SNPs. SIFT predicts whether an amino acid substitution affects protein function by analyzing sequence homology with similar proteins and the physical properties of amino acids. Prediction scores less than or equal to 0.05 are considered damaging, and scores greater than 0.05 are considered tolerated [7].
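The SIFT decision rule described above can be written as a one-line threshold check. The sketch below is only an illustration of that rule; the function name, the variant labels, and the example scores are hypothetical and are not results from this study.

```python
# Cut-off quoted in the text: score <= 0.05 is damaging, otherwise tolerated.
SIFT_CUTOFF = 0.05


def classify_sift(score: float) -> str:
    """Label a SIFT score as 'damaging' or 'tolerated' using the 0.05 cut-off."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie between 0 and 1, got {score}")
    return "damaging" if score <= SIFT_CUTOFF else "tolerated"


# Hypothetical (label, score) pairs for illustration only.
example_scores = {"variant_A": 0.00, "variant_B": 0.05, "variant_C": 0.32}
labels = {name: classify_sift(score) for name, score in example_scores.items()}
print(labels)
```

Note that the boundary value 0.05 itself falls in the damaging class, as stated in the text.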

Analyzing the effects of damaging SNPs
To corroborate the SIFT predictions, the PolyPhen-2, I-Mutant Suite, and PROVEAN tools were used.
PolyPhen-2 (Polymorphism Phenotyping version 2) predicts the effects of nsSNPs on the structure and function of proteins using a machine learning method, the TMHMM (transmembrane helix prediction by hidden Markov model) algorithm, and high-quality multiple sequence alignment. PolyPhen-2 categorizes nsSNPs as probably damaging, possibly damaging, or benign [8].
I-Mutant Suite predicts changes in protein stability for any single point mutation using a Support Vector Machine (SVM) based predictor [9]. PROVEAN (Protein Variation Effect Analyzer, http://provean.jcvi.org/index.php), which predicts the impact on protein function of an amino acid substitution or insertion-deletion, was also used [10].
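Combining several predictors amounts to keeping only the variants that every tool flags. The following is a minimal sketch of that consensus step, assuming each tool's output has already been reduced to a set of variant labels; the labels `v1` to `v5` and the per-tool sets are hypothetical placeholders, not data from this study.

```python
from typing import Dict, Set


def consensus(calls: Dict[str, Set[str]]) -> Set[str]:
    """Return the variants flagged as deleterious by every tool in `calls`."""
    if not calls:
        return set()
    return set.intersection(*calls.values())


# Hypothetical per-tool calls, keyed by tool name.
calls = {
    "SIFT": {"v1", "v2", "v3", "v4"},
    "PolyPhen-2": {"v1", "v2", "v4"},
    "I-Mutant": {"v1", "v4", "v5"},
    "PROVEAN": {"v1", "v2", "v4"},
}
shared = consensus(calls)
print(sorted(shared))
```

A strict intersection is the most conservative choice; a majority vote would keep more variants at the cost of weaker agreement.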

Identification of nsSNPs in protein domains
The FASTA sequence of EPHA3 protein was submitted to the InterPro server (https://www.ebi.ac.uk/interpro/). InterPro predicts families, conserved domains, and important sites using signatures provided by many databases. The positions of nsSNPs in the different domains can be identified through this server [15].

EPHA3-protein interactions by STRING database
STRING v11.0 (https://string-db.org/) was used to study protein-protein interactions of EPHA3 at a high confidence score (≥ 90%). The server identifies functional and physical associations by computational prediction. It also connects to other databases to better characterize protein interaction networks [16].

Conservational analysis of EPHA3
The ConSurf bioinformatics tool (https://consurf.tau.ac.il/) was used to analyze the evolutionary conservation of nsSNP positions in the protein sequence [17]. This tool analyzes the phylogenetic relations between homologous proteins. The FASTA sequence of EPHA3 protein was submitted to the server, which identified the nsSNPs located at highly conserved positions [18].

Identification of functional SNPs in the 3' UTR by PolymiRTS
The PolymiRTS (Polymorphism in microRNAs and their target sites, http://compbio.uthsc.edu/miRSNP/) database comprises naturally occurring DNA variations in microRNA (miRNA) seed regions and miRNA target sites [20]. PolymiRTS reports results in four classes: D (the derived allele disrupts a conserved miRNA site), N (the derived allele disrupts a nonconserved miRNA site), C (the derived allele creates a new miRNA site), and O (other cases).
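The four PolymiRTS classes can be kept as a small lookup table when tabulating results. The sketch below assumes the class letters have already been extracted from the PolymiRTS output; the list of example records is hypothetical and does not reproduce the study's results.

```python
from collections import Counter

# Class definitions as given in the text.
POLYMIRTS_CLASSES = {
    "D": "derived allele disrupts a conserved miRNA site",
    "N": "derived allele disrupts a nonconserved miRNA site",
    "C": "derived allele creates a new miRNA site",
    "O": "other cases",
}


def describe(code: str) -> str:
    """Translate a PolymiRTS class letter into its description."""
    try:
        return POLYMIRTS_CLASSES[code.upper()]
    except KeyError:
        raise ValueError(f"unknown PolymiRTS class: {code!r}") from None


# Hypothetical class letters, one per predicted SNP-miRNA pair.
example_records = ["D", "C", "C", "D", "O"]
tally = Counter(example_records)
for code, count in sorted(tally.items()):
    print(code, count, describe(code))
```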

Results
The progression of the study, methods, and tools used to find damaging nsSNPs for the EPHA3 gene are summarized in Fig. 1 (Fig. 2). nsSNPs in the EPHA3 gene were selected for analysis because they can alter amino acids.

Deleterious nsSNPs identified by SIFT
SIFT predicts damaging SNPs by analyzing the sequence homology of homologous proteins and the physical properties of amino acids. Out of 631 nsSNPs, SIFT predicted 281 as damaging, and these SNPs were further analyzed with the remaining tools.

Analysis of the effects of deleterious nsSNPs
PolyPhen-2 reports "probably damaging" nsSNPs as the most damaging class, predicted with high confidence; out of 281 nsSNPs, PolyPhen-2 predicted 222 to be in this class. I-Mutant 3.0 (Suite) predicted 183 nsSNPs to alter the stability of the protein. The PROVEAN algorithm predicted 211 nsSNPs as deleterious nsSNPs that could affect protein function.
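The counts above can be expressed as fractions to make the filtering funnel easier to compare across tools. This sketch only does the arithmetic on the numbers reported in the text; it assumes, as the text implies, that I-Mutant and PROVEAN were run on the same 281 SIFT-damaging nsSNPs as PolyPhen-2.

```python
# Counts reported in the Results.
TOTAL_NSSNPS = 631
SIFT_DAMAGING = 281
followup_counts = {
    "PolyPhen-2 (probably damaging)": 222,
    "I-Mutant (stability change)": 183,
    "PROVEAN (deleterious)": 211,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


sift_pct = percent(SIFT_DAMAGING, TOTAL_NSSNPS)
# Assumption: every follow-up tool was applied to the 281 SIFT-damaging nsSNPs.
followup_pct = {tool: percent(n, SIFT_DAMAGING) for tool, n in followup_counts.items()}

print(f"SIFT damaging: {sift_pct}% of all nsSNPs")
for tool, pct in followup_pct.items():
    print(f"{tool}: {pct}% of SIFT-damaging nsSNPs")
```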

The conservational outcome of nsSNPs and phylogenetic analysis of EPHA3 protein
Highly conserved residues are most likely to be involved in a protein's structural integrity and function. The conservation profile of EPHA3 protein was therefore evaluated for further analysis. The ConSurf algorithm reported structural and functional conservation levels for all amino acid residues of EPHA3 protein, illustrated in Fig. 4. ConSurf predicted 77 of 81 nsSNPs to be conserved. The results showed R66, R104, R160, Y779, R842, P846, and G766 as functional residues that are highly conserved and exposed. On the other hand, G628, I652, N751, Y596, D598, R745, and P824 were predicted as highly conserved and buried structural residues. These outcomes therefore further support the predicted nsSNPs as high-risk deleterious nsSNPs, which may affect the structural stability and function of EPHA3 protein. These findings were also verified by multiple sequence alignment (MSA) with the MEGA X package.
Phylogenetic analysis by MEGA X revealed that humans (Homo sapiens) and chimpanzees (Pan troglodytes) have a high degree of sequence homology, indicating that EPHA3 protein is highly conserved in primates (Fig. 5).

Functional SNP prediction in the 3' UTR by PolymiRTS
PolymiRTS was used to predict SNPs in miRNA target sites, as any alteration in these sites may create or disrupt miRNA-mRNA interactions. The server predicted 21 SNPs in miRNA target sites (Supplementary Table 4). Two of these were INDELs whose ancestral alleles are undefined and which can alter the miRNA target site; the other 19 SNPs can either disrupt a conserved miRNA target site (D) or create a new miRNA target site (C).

Discussion
Several in-silico SNP prediction tools with different algorithms were used in this study, which concluded that multiple nsSNPs in highly conserved regions can potentially have detrimental consequences, considering the importance of conserved regions for sustaining a protein's structure and function [21]. The eighty-one nsSNPs commonly predicted by eight different prediction tools are possibly the most relevant candidate mutations for various diseases. Evolutionary conservation profile analysis of all amino acids of EPHA3 was needed to support the deleterious impact of the selected nsSNPs, and the ConSurf server predicted conservation levels for all amino acids of EPHA3. The server predicted the functionally exposed nsSNPs R104Q, R160H, Y779C, R842Q, P846A/T, and G766V to lie at highly conserved residues with a conservation level of 9, and R66S at a functional conserved residue with level 8. On the other hand, it predicted the nsSNPs G628E, I652M, N751D, Y596H, D598G, R745Q, and P824S to lie at structurally highly conserved buried residues with a conservation level of 9. Given this high degree of conservation, these nsSNPs are more likely to be deleterious in EPHA3 protein and could be used as genetic markers. To further evaluate the ConSurf web server results, multiple sequence alignment and phylogenetic tree analysis were performed with the MEGA X package. The analysis showed the sequence homology of 12 different.
Phylogenetic tree generation by MEGA X using 1000 bootstraps with a 50% cut-off revealed the close association of Homo sapiens and Pan troglodytes, indicating that the catalytic kinase domain of EPHA3 protein is evolutionarily conserved, mostly in primates. Protein-protein interactions analyzed by STRING v11 revealed that EPHA3 protein is associated with many important functions. Interactions with ephrin type A and ephrin type B ligands induce growth cone collapse, adhesion, and migration through contact-dependent bidirectional pathways, and also have a pivotal role in forebrain function. The adapter molecule Crk-II mediates attachment-induced MAPK8 activation, membrane ruffling, and cell motility in a Rac-dependent manner. The transforming protein RhoA is involved in microtubule-dependent signaling, which is important for formation of the myosin contractile ring during cytokinesis. The interactions also stimulate PKN2 (serine/threonine-protein kinase N2) kinase activity. These results indicate that EPHA3 participates in many important pathways and that any disruption may lead to abnormal body functions [22].
In ATP phosphorylation events, magnesium ions are chelated by the β and γ phosphate groups of AMP-PNP and by the side chains of the catalytic residues Asn751 (present in a helix) and Asp764 [23,24]. The catalytic asparagine at position 751 was found to be substituted with aspartate, meaning that a polar neutral amino acid is replaced by a negatively charged one; this charge difference might hamper protein-protein interactions. N751D may therefore have a high probability of impairing protein function. EPHA3 activation depends on three autophosphorylation events [25]: the first occurs in the juxtamembrane segment (JMS), which contains two residues for phosphorylation, JX1 (Tyr596) and JX2 (Tyr602); the second is the phosphorylation of Tyr779; and the third occurs at Tyr701. These three phosphorylation events are largely responsible for protein activation. Disruption of these autophosphorylation events contributes to the progression of various cancers [22]. Tyr596 (Y596) and Tyr779 (Y779) were found to be substituted with histidine (H) and cysteine (C), respectively. In the unphosphorylated state of EPHA3, JX2, in concert with JX1 (Y596), stabilizes the interaction of the JMS with the core of the kinase domain. Substitution of tyrosine (Y) with the positively charged histidine (H) residue is likely to hamper autophosphorylation. Autophosphorylation of Y779, located in the activation loop (AL), is involved in the catalytic activation of EPHA3. Phosphorylated Y779 can bind the SH2 domain-containing Crk adaptor protein, which ultimately initiates a signaling pathway that activates RhoA, required for the growth cone collapse induced by ephrin-A5 through activation of EphA receptors [26]. A mutation at this residue might therefore hamper important biological functions. Attenuation of EPHA3 receptor function might be linked to lung cancer [27].
In this study, Y779 was found to be substituted with the polar cysteine (C) residue, which may cause inappropriate interactions with surrounding residues and lead to the deregulation of many important biological events [23]. These observations therefore support the prediction that the Y596H and Y779C nsSNPs are most likely to have deleterious effects on EPHA3 protein.
The highly conserved DFG segment, an extension of the activation loop, is associated with the phosphorylation state of EPHA3. In the active state, the DFG motif is positioned to interact with activating Mg2+ ions. A previous study found that in phosphatase-treated samples, which contain a well-defined JMS segment, the activation loop (AL) is ordered only up to the DFG motif (Gly766), and residues Leu767-Ile786 are disordered [28]. An SNP was found in the DFG segment at position 766. As glycine (G), the smallest and most flexible residue, is substituted with the bulkier nonpolar hydrophobic valine (V) residue, this substitution may hamper catalytic activity, which can lead to various abnormalities [29]. This supports the prediction that the G766V nsSNP might have deleterious effects on EPHA3 protein function.
The sterile alpha motif (SAM) domain, which is involved in the oligomerization of receptors, interacts with the kinase domain through a linker peptide [30]. This linker peptide is accommodated in a deep pocket formed by the αGH loop (residues Tyr841-Pro846) of the kinase domain. Mutations in this region could affect the dimerization and phosphorylation state of the EPHA3 receptor [31]. In this study, the R842Q and P846A/T/R nsSNPs were found to be associated with the pocket. In R842Q, the positively charged arginine (R) residue is substituted with the polar glutamine (Q) residue, which may alter the charge density and lead to abnormal interactions. The proline (P) residue at position 846 can be substituted with three different residues: alanine (A), threonine (T), and arginine (R). Substitutions at this position may disrupt the favorable interactions in the αGH loop, so the linker-kinase domain interaction may not be strong enough to connect the SAM domain. As a result, the EPHA3 protein might lose its ability to pass on the signal because the oligomerization process becomes inefficient.
Pretreatment of the protein with ATP orders the activation loop (AL), and this ordered state is stabilized by many H-bonds and van der Waals interactions. One of the residues, Arg745, helps stabilize the ordered structure of the AL by forming an H-bond with Leu767 [32]. R745Q may have deleterious effects on protein function, as this substitution alters the charge density. Further analysis of the EPHA3 structure reveals that the Asp598 (D598) and Tyr596 (Y596) residues might also be important in the regulatory mechanism, as they are involved in ordering the activation loop. The D598G and Y596H nsSNPs were found in this study, which suggests that these nsSNPs may have deleterious effects on the activation of EPHA3 protein.
G628 and I652 are located in the ATP-binding domain (627-653) [33]. G628 is present in the β1 strand of the G-loop, which is involved in the closed conformation of the N-terminal lobe. Phosphorylation events increase the flexibility or dynamics of the N-terminal lobe, which helps the protein move towards its active state. The G628E and I652M nsSNPs may disrupt the ATP-binding mechanism and alter EPHA3 protein activation. This could lead to loss of the tumor-suppressive effect of EPHA3 protein [22,27].
R104 in the EPH ligand-binding domain forms a salt bridge with ephrin-A5, and this is one of the most conserved binding interactions. R104Q, in which the positively charged arginine (R) is substituted with the polar uncharged glutamine (Q) residue, may affect receptor-ligand interactions and thus impair signaling pathways [34]. The R160H and R66S nsSNPs may also disrupt the interactions between the EPH ligand-binding domain and ephrin-A5. Interactions between ephrin-A5 and the EPHA3 receptor are important for cell communication in normal development, and disruption of these interactions may lead to neoplasia [35]. R104Q, R160H, and R66S were therefore predicted as deleterious nsSNPs.
The substrate-binding regions are formed by the αF-αG loop/αG helix [33]. An SNP was found at position 824 in the αF-αG loop (residues 821-829). Here the nonpolar hydrophobic proline (P) residue, which is highly conserved and buried in the structure, is substituted with the polar hydrophilic serine (S) residue. This substitution, in concert with neighboring residues, may alter the secondary structure of the substrate-binding domain. P824S is thus considered to have deleterious effects on protein structure and function.
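Several of the arguments above rest on whether a substituted position falls inside a named structural region. A minimal sketch of that lookup is given below, using only the residue ranges quoted in this Discussion (ATP-binding domain 627-653, αF-αG loop 821-829, αGH loop 841-846). The function and dictionary names are illustrative, and the ranges are not a complete domain annotation of EPHA3.

```python
import re
from typing import Optional

# Residue ranges quoted in the Discussion; both ends inclusive.
REGIONS = {
    "ATP-binding domain": (627, 653),
    "alphaF-alphaG loop": (821, 829),
    "alphaGH loop": (841, 846),
}

# Matches single-substitution labels such as "G628E".
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def region_of(variant: str) -> Optional[str]:
    """Return the named region containing the variant position, or None."""
    match = VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"not a single-substitution label: {variant!r}")
    position = int(match.group(2))
    for name, (start, end) in REGIONS.items():
        if start <= position <= end:
            return name
    return None


for label in ["G628E", "I652M", "P824S", "R842Q", "N751D"]:
    print(label, "->", region_of(label))
```

Positions outside the listed ranges return `None`, which here means only that the text quotes no range for them, not that they lie outside every domain.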
A small hydrophobic cluster is formed by M734, Y810, and F871, surrounding the conserved aspartate in the regulatory spine (known as the R-spine-Asp). Together, these three residues form an interaction network that extends the R-spine to the E and I helices in the C-lobe. These PTK-conserved residues are referred to as the 'extended R-spine-network'. Precise positioning of the R-spine-Asp is important for kinase activation and regulation [36]. The Y810H nsSNP may therefore interfere with the activation and regulatory mechanism of EPHA3 protein, as the polar tyrosine (Y) is replaced by the positively charged histidine (H) residue.
Five nsSNPs (N751D, D746E, R750Q, R712C, G784R) were found at positions reported in a previous study [37] to take part in receptor-substrate interaction. Substitutions at these positions may therefore have deleterious consequences.
In addition, 9 SNPs were found in the fibronectin type 3 domain, of which V415M, L389F, G414V, S327T, R354G, T432N, W345R, and R354Q were highly conserved, and 8 SNPs were found in the fibronectin type 2 domain, of which Y470H and I467K/T were highly conserved (by both ConSurf and MEGA X analysis). These nsSNPs may therefore alter the stability of the fibronectin domain and affect EPHA3 protein function [38]. Previous studies show that the G518L mutation was associated with lung adenocarcinoma [39]. It is therefore plausible that G518E is associated with disease, as the small, uncharged glycine (G) residue is substituted by the negatively charged glutamate (E) residue. Differential expression of EPHA3 is associated with many cancers, such as breast cancer, prostate cancer [40], lung cancer, glioblastoma multiforme (GBM) [41], melanoma, sarcoma, renal carcinoma, and hematopoietic tumors [42]. EPHA3 is highly expressed in tumor cells and plays a significant part in tumor angiogenesis [43].
EPHA3 can be targeted for the possible treatment of many diseases because it is associated with the development and maturation of lymphoid and myeloid cells [44,45]. As differential expression of the EPHA3 gene is associated with many cancers, controlling its expression is a potential strategy. Silencing the mRNA is one possible route to control differential EPHA3 gene expression. siRNA (small interfering RNA) can be designed against a specific conserved region of the mRNA to knock down the gene [46], a strategy that has been proposed for other diseases as well [47,48]. Another possible route to limit the effects of abnormal EPHA3 receptors is to design antibodies against them. ChIIIA4 is one antibody that acts on EPHA3 receptors, inhibits tumor growth, and disrupts the function of tumor micro-vessels [43]. KB004 is another anti-EPHA3 antibody, which can be used for treating hematologic malignancies [49]. Sitravatinib, Hesperadin, Vandetanib, and Fostamatinib are inhibitory drugs, collected from the Drug Gene Interaction Database [50], which could be used to limit the detrimental effects of the EPHA3 receptor. Designed siRNAs, antibodies, or clinical drugs could thus be further modified to target the regions of the EPHA3 receptor where deleterious nsSNPs are located, which may help prevent the abnormal effects of the EPHA3 receptor.
Substitutions in conserved regulatory regions alter the regulation of signaling pathways, which promotes abnormal cell differentiation or proliferation. Among the damaging nsSNPs found in this study, N751D, Y596H, Y779C, D598G, P824S, G628E, I652M, R66S, R160H, and R104Q were predicted to have the most deleterious impacts on protein structure and function, as these residues hold highly conserved and important positions in the EPHA3 protein and may be responsible for altering the expression of the EPHA3 gene, which can contribute to cancers or tumor growth.
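Much of the reasoning in this Discussion compares the charge of the original and substituted residues. The sketch below tabulates that comparison for the ten nsSNPs listed above. It assumes the conventional grouping of side chains at physiological pH (Asp and Glu negative; Lys, Arg, and His positive, following the text's treatment of histidine), which is a simplification, since histidine is only partly protonated near neutral pH.

```python
# Conventional side-chain charge groups (an assumption; see lead-in).
NEGATIVE = set("DE")
POSITIVE = set("KRH")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def charge_class(residue: str) -> str:
    """Classify a one-letter amino acid code as positive, negative, or neutral."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code: {residue!r}")
    if residue in NEGATIVE:
        return "negative"
    if residue in POSITIVE:
        return "positive"
    return "neutral"


def charge_change(variant: str) -> tuple:
    """Return (reference class, substituted class) for a label such as 'N751D'."""
    return charge_class(variant[0]), charge_class(variant[-1])


variants = ["N751D", "Y596H", "Y779C", "D598G", "P824S",
            "G628E", "I652M", "R66S", "R160H", "R104Q"]
for label in variants:
    before, after = charge_change(label)
    note = "charge altered" if before != after else "charge class unchanged"
    print(f"{label}: {before} -> {after} ({note})")
```

Charge is only one of the properties discussed; size, polarity, and local structure also enter the arguments and are not captured by this table.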

Conclusion
Using an extensive computational approach, this study identified several nsSNPs that may be deleterious and may affect the structure and function of the EPHA3 protein. Of these, 77 nsSNPs showed a high level of conservation and could therefore be used as genetic markers to facilitate the identification of diseases associated with EPHA3 protein. The study identified significant mutational alterations that may change the conformation of the EPHA3 protein structure and ultimately disrupt protein function. This study may therefore open possibilities for work on diseases or cancers associated with EPHA3 protein and support the development of target-specific therapeutic agents or drugs.

Declarations

Declaration of Competing Interest
The authors declare that they have no competing interests.

Ethical approval
This study doesn't involve any human or animal-related experiment.

Funding
The research did not receive any funds.

Conservation predictions by ConSurf of human EPHA3 residues.