In-silico analysis of BCL2 gene using multiple bioinformatics tools to identify the most lethal mutations that are crucial for its structural and functional integrity

BCL2 was the first gene identified with anti-apoptotic activity; it encodes an essential protein of the outer mitochondrial membrane. In tumorigenesis, deregulated expression of BCL2 and related proteins is characteristic of several human cancers, and there is concrete evidence that deregulated expression of BCL2-like proteins plays a vital role in tumor development, persistence and therapeutic resistance. It is therefore important to identify the BCL2 polymorphisms that are both structurally and functionally important, so that their possible malfunctions and therapeutic relevance can be investigated. In this study, we used a variety of bioinformatics tools to identify the most damaging nsSNPs that may affect the structure and function of BCL2. The in silico tools PROVEAN, SIFT, SNPs&GO, PhD-SNP, and PolyPhen2 were used together with I-Mutant, MutPred, and ConSurf to study the conservation profile, stability, and structural and functional impact of the nsSNPs. Post-translational modification sites were also predicted, followed by 3D modelling with the I-TASSER and Phyre2 tools. Furthermore, gene interactions were mapped via STRING and GeneMANIA. We found that the nsSNPs Q118R (rs759928495), G193R (rs1197820694), R129C (rs777784952), and L181V (rs752310933) are the most damaging nsSNPs in the BCL2 gene; they may play a vital part in BCL2 protein defects and possibly contribute to different cancers. Gene-gene interaction analysis showed the relation of BCL2 with other genes, depicting its importance in several pathways and co-expression networks. This research is the first of its kind and offers future prospects for the development of dedicated medicines. The effects of these BCL2 variants can also be tested in animal models of disease, and BCL2 proteins from cancer patients should be studied as well.


Introduction
Different categories of mutations may occur in the human DNA sequence. Among them, single base pair changes, termed SNPs (single nucleotide polymorphisms), constitute ~90%. It is estimated that around 500,000 SNPs occur in the coding regions of the human genome (Collins et al., 1998). nsSNPs (nonsynonymous SNPs) are the SNPs that give rise to the functional and structural diversity of human proteins (Lander, 1996). Such nsSNPs change amino acids and can have favorable or deleterious effects on protein structure and/or function (Capriotti and Altman, 2011). These alterations include shifts in gene regulation, instability of protein structure (Boroso et al., 1999), changes in hydrophobicity, structure and protein charge (Petukh et al., 2015), and effects on translation, relative stability, and inter- and intra-protein interactions and dynamics (Chasman and Adams, 2001; Kucukkal et al., 2015), all of which can harm the structural and functional integrity of the cell (Thomas et al., 1999).
Single nucleotide polymorphisms have also been found in the BCL2 gene, which encodes an integral outer mitochondrial membrane protein that controls apoptosis and is upregulated in nearly 50 percent of all human cancers. BCL2 family proteins are the principal regulators of the mitochondrial pathways of apoptosis. This mechanism is important for normal embryonic development and for cancer prevention. These proteins regulate the outer mitochondrial membrane, which releases cytochrome c and further apoptotic factors into the cytosol. Based on intracellular function and sequence homology, the BCL2 family, whose members have up to four BCL2 homology domains (BH1, BH2, BH3 and BH4), is generally classified into two groups: anti-apoptotic and pro-apoptotic proteins. BCL2 plays a major role as a regulator of apoptosis in nearly one half of all human cancers (Cory et al., 2003; Yip and Reed, 2008; Reed, 2008). BCL2 is over-expressed in small cell lymphomas, namely chronic lymphocytic leukemia (CLL), peripheral lymphoma and mantle cell lymphoma, although less than 5 percent are BCL2-related (Tomite, 2011). Elevated BCL2 expression is also often reported in nearly all patients suffering from acute lymphocytic leukemia and acute myeloid leukemia (Yip and Reed, 2008). Although the BCL2 protein shows inappropriate expression in most adult follicular lymphoma (FL) cases, the pediatric form of FL is negative for BCL2 expression. About 30 percent of patients with diffuse large B-cell lymphoma (DLBCL) are regarded as BCL2-positive (Schuetz et al., 2012). Inappropriate BCL2 expression is also suspected in non-hematological solid tumors such as prostate, breast, small cell and non-small cell lung cancers.
In small cell lung cancer, strong BCL2 expression was identified in more than 90 percent of patients (Verdoodt et al., 2013; Anagnostou et al., 2010; Oakes et al., 2012). Inappropriate BCL2 expression has also been widely recognized in ovarian cancer, neuroblastoma, lung and colorectal cancers, and certain cancers of the head and neck (Beierle et al., 2002; Yasmeen et al., 2011; Cho et al., 2006; Carter, 2019; Koehler, 2013). We have also proposed a 3D model of the wild-type BCL2 protein and models of its possibly deleterious nsSNP variants. For BCL2, this is the first study of its type to cover in silico analysis of the protein, and it may be helpful in the future for the treatment of BCL2-associated diseases caused by nsSNPs.

Methodology
The work was carried out in several steps, for which a schematic flow is given in Figure 1.

Predicting Functional and Structural Effects
MutPred 1.2 was used to model the functional and structural impact of nsSNPs (amino acid residue changes). It is a web-based platform that evaluates amino acid substitutions and also predicts the molecular cause of disease. The tool evaluates numerous structural and functional properties, such as a gain in helical propensity or loss of a phosphorylation site. The protein sequence of BCL2 was submitted in FASTA format together with the selected deleterious nsSNPs (amino acid changes). P < 0.05 was considered confident, and P < 0.01 very confident.

Protein Stability Analysis
I-Mutant 2.0 was used to test the stability of our target protein (Capriotti et al., 2005). It is a web server based on support vector machines. It predicts the change in protein stability upon mutation and provides a reliability index (RI) that ranges from 0 to 10, where 0 and 10 represent the lowest and highest reliability, respectively. To predict the impact of deleterious nsSNPs on the BCL2 protein, its protein sequence was submitted with conditions set at 25 °C and pH 7.0.

Prediction of Evolutionary Conservation of Protein
The evolutionary conservation of all amino acids in the protein chain was determined with ConSurf (Berezin et al., 2004). ConSurf analysis is based on the phylogenetic relationships between homologous sequences (Ashkenazy et al., 2010; Celniker et al., 2013; Ashkenazy et al., 2016).

Predicting 3D Protein Structure
Phyre2 is a modelling tool for the prediction of 3D protein structures (Kelly et al., 2015). 3D models were created for wild-type BCL2 and its 12 mutants associated with the most deleterious nsSNPs. TM-align was used to compare wild-type BCL2 with the chosen mutants. It reports the TM-score, the RMSD (root mean square deviation), and the structural superposition. TM-scores range from 0 to 1, where 1 indicates the greatest structural similarity. The higher the RMSD value, the greater the difference between the mutant and wild-type structures (Carugo and Pongor, 2010; Zhang and Skolnick, 2005). For further analysis of 3D structural differences, three mutants with higher RMSD values were submitted to I-TASSER along with wild-type BCL2 (Zhang, 2008; Roy et al., 2010; Yang et al., 2015). Chimera 1.11 was used to visualize the corresponding protein structures and analyze their molecular features (Petterson et al., 2004).
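The two comparison metrics can be illustrated with a minimal sketch. It assumes the structures are already superposed and aligned residue by residue, so it evaluates a single fixed superposition and does not perform the search over superpositions that TM-align carries out. The function names and the example distances are hypothetical; the TM-score expression follows the published definition by Zhang and Skolnick.

```python
import math


def rmsd(distances):
    """Root mean square deviation over aligned residue pairs (angstroms)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances between aligned C-alpha atoms (angstroms).
    target_length: number of residues in the reference (wild-type) chain.
    """
    if target_length < 19:
        # The length-dependent scale d0 is only positive from 19 residues up.
        raise ValueError("target_length must be at least 19 residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical per-residue distances, not values from this study.
example_distances = [0.8] * 150 + [4.0] * 50
print(f"RMSD = {rmsd(example_distances):.2f} A")
print(f"TM-score = {tm_score(example_distances, target_length=200):.3f}")
```

Because each residue pair contributes at most 1 to the sum, the TM-score stays between 0 and 1, whereas the RMSD has no upper bound and is dominated by the most divergent regions.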

Post-Translational Modification (PTM) Sites Prediction
A thorough study of PTMs in a protein helps in predicting protein activity. GPS-MSP 3.0 was used to predict methylation sites in the BCL2 protein (Deng et al., 2017). Phosphorylation sites at serine, threonine, and tyrosine residues in the BCL2 protein chain were predicted using GPS 3.0 and NetPhos 3.1. GPS 3.0 gave a more specific outcome with a higher phosphorylation potential than NetPhos 3.1 (Xue et al., 2008). NetPhos 3.1 uses ensembles of neural networks, and a threshold of 0.5 was set (Blom et al., 1999); residues with a score above this threshold are predicted to be phosphorylated. BDM-PUB and UbPred were used to predict ubiquitination sites in the BCL2 protein. For UbPred, a threshold of 0.62 was chosen, and lysine residues with a score equal to or above this value were predicted to be ubiquitinated (Radivojac et al., 2010).
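The two cut-offs described above can be applied as in the following sketch. The thresholds come from the text; the function name, the dictionary layout (residue position mapped to score) and every score value are hypothetical placeholders, not server output.

```python
NETPHOS_THRESHOLD = 0.5  # NetPhos 3.1: scores above this count as phosphorylated
UBPRED_THRESHOLD = 0.62  # UbPred: scores at or above this count as ubiquitinated


def predicted_sites(scores, threshold, inclusive):
    """Return the sorted residue positions whose score passes the threshold."""
    if inclusive:
        return sorted(pos for pos, score in scores.items() if score >= threshold)
    return sorted(pos for pos, score in scores.items() if score > threshold)


# Hypothetical scores keyed by residue position (illustration only).
netphos_scores = {24: 0.91, 70: 0.48, 87: 0.50}
ubpred_scores = {17: 0.62, 22: 0.40}

phospho_sites = predicted_sites(netphos_scores, NETPHOS_THRESHOLD, inclusive=False)
ubiquitin_sites = predicted_sites(ubpred_scores, UBPRED_THRESHOLD, inclusive=True)
print(phospho_sites, ubiquitin_sites)
```

The `inclusive` flag only matters for scores that fall exactly on a cut-off, which is where the two rules stated in the text differ.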

Gene-Gene Interaction of BCL2
GeneMANIA and STRING were used to examine the relationship of the BCL2 gene with other genes and to predict the impact of BCL2 nsSNPs on associated genes (Farley et al., 2010; Gasteiger et al., 2003). GeneMANIA predicts gene-gene interactions based on co-expression, pathways, co-localization, protein domain similarity, and genetic and protein interactions. STRING predictions were limited to the top 10 interacting genes, for which the criteria were gene fusions, co-occurrence, co-expression, experimental, and biochemical data. The combined score for each interacting gene lies between 0 and 1, where 0 is the weakest and 1 the strongest interaction.
BCL2 was submitted as the input gene for generating the gene-gene interaction network.
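Restricting a network to its top-scoring interactors can be sketched as below. This is an illustration of the ranking step only, assuming the combined scores have already been exported from STRING; the function name and the partner labels and scores are hypothetical placeholders.

```python
def top_interactors(combined_scores, n=10):
    """Return the n partners with the highest combined score, best first."""
    ranked = sorted(combined_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Placeholder partner names and scores on the 0 to 1 scale (not real results).
example_scores = {"GENE_A": 0.999, "GENE_B": 0.412, "GENE_C": 0.987, "GENE_D": 0.705}

for partner, score in top_interactors(example_scores, n=3):
    print(partner, score)
```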

Results And Discussion
Retrieved nsSNPs
A total of 44931 SNPs were retrieved from dbSNP, the largest SNP database, comprising 159 non-synonymous SNPs, 398 in the 5' UTR, 1842 in the 3' UTR, and other types of SNPs (Figure 2). Only the nsSNPs were taken forward for further analysis.
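The share of each SNP class follows directly from the counts reported above. The short sketch below works through the arithmetic; the variable names are illustrative, and the "other" class is simply the remainder after the three reported classes are subtracted from the total.

```python
TOTAL_SNPS = 44931

# Counts reported in the text.
counts = {"nonsynonymous": 159, "5' UTR": 398, "3' UTR": 1842}
# Everything not in the three reported classes.
counts["other"] = TOTAL_SNPS - sum(counts.values())

percentages = {name: 100.0 * n / TOTAL_SNPS for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n} ({percentages[name]:.2f}%)")
```

On these counts the nsSNPs make up well under 1 percent of all SNPs recorded for the gene.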
Identification of deleterious nsSNPs
All 159 nsSNPs retrieved from dbSNP were submitted to four separate in silico tools, PROVEAN, PhD-SNP, SIFT, and SNPs&GO, to predict their impact on BCL2 protein function and structure. In PROVEAN, the threshold was set at -2.5, and variants with a final score below this threshold were considered deleterious. According to the PROVEAN findings, 35 nsSNPs have deleterious effects.
In SIFT, a tolerance index (TI) cut-off of 0.05 was used, and variants scoring below this value were deemed deleterious or intolerant. SIFT predicted 35 nsSNPs to be deleterious. PhD-SNP predicted 4 nsSNPs to be disease-associated, and SNPs&GO classified 98 nsSNPs as disease-associated. The results are shown in Figure 3. We selected the 4 nsSNPs that were predicted to be deleterious by all four tools and that also had a TI of 0.00 in SIFT. These 4 nsSNPs were submitted to PolyPhen2, which classifies variants as benign, possibly damaging or probably damaging; of these, "probably damaging" is the most confident prediction of a harmful effect. PolyPhen2 also provides a score from 0 to 1. For further study, the 4 nsSNPs with a score of 1 were selected, as they are deemed the most damaging.
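The selection rule (deleterious in all four tools) can be sketched as a simple consensus filter. The cut-offs and the PROVEAN scores for G193R and L181V are taken from the text; the record layout, the function name and the third, tolerated variant are hypothetical and included only to show a variant being rejected.

```python
PROVEAN_CUTOFF = -2.5  # scores below this are deleterious
SIFT_CUTOFF = 0.05     # tolerance index below this is deleterious


def is_consensus_deleterious(prediction):
    """True only if all four tools call the variant deleterious or disease."""
    return (
        prediction["provean"] < PROVEAN_CUTOFF
        and prediction["sift"] < SIFT_CUTOFF
        and prediction["phd_snp_disease"]
        and prediction["snps_go_disease"]
    )


predictions = [
    {"variant": "G193R", "provean": -7.039, "sift": 0.00,
     "phd_snp_disease": True, "snps_go_disease": True},
    {"variant": "L181V", "provean": -2.706, "sift": 0.00,
     "phd_snp_disease": True, "snps_go_disease": True},
    # Hypothetical variant, not from the study, that fails the filter.
    {"variant": "hypothetical_tolerated", "provean": -1.2, "sift": 0.31,
     "phd_snp_disease": False, "snps_go_disease": True},
]

selected = [p["variant"] for p in predictions if is_consensus_deleterious(p)]
print(selected)
```

Requiring agreement across all tools is a conservative choice: it lowers the number of candidates at the cost of possibly discarding variants that only some predictors flag.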

MutPred Prediction for Structural and Functional Modification
The 4 nsSNPs selected from PolyPhen2 were submitted to the MutPred server. The results are shown in Table 2 together with their probability scores. The predictions showed that certain nsSNPs can alter the protein and influence its function or structure.

BCL2 Stability Prediction
I-Mutant was used to determine the stability of the BCL2 protein for the chosen nsSNPs and their amino acid substitutions. All selected nsSNPs were submitted separately, and the predicted decrease or increase in stability was obtained along with an RI ranging from 0 to 10, as presented in Table 3. None of the 4 selected nsSNPs was predicted to increase stability; all were predicted to decrease it. This result suggests that, by reducing stability, these 4 nsSNPs may cause greater harm to the BCL2 protein. Note: the pH was set at 7.0 and the temperature at 25 °C.

Evolutionary Conservation of BCL2 Protein
ConSurf provided the evolutionary conservation profile for all amino acids of the BCL2 protein. According to the ConSurf predictions, the residues affected by R129C, Q118R and G193R are highly conserved, exposed and functional, whereas the residue affected by L181V is highly conserved, buried and structural. Conservation scores for each of the selected nsSNPs are given in Table 4.
These results show that nsSNPs located in highly conserved regions are the most damaging to BCL2 protein function and structure.

3D-Modelling of BCL2 Protein
I-Mutant predicted that the 4 nsSNPs decrease the stability of the BCL2 protein, and these were chosen for final protein modelling. I-TASSER, regarded as one of the most accurate and sophisticated methods for protein structure prediction, was used to generate the structures. The wild-type sequence and the mutant sequences, each carrying a single amino acid substitution, were submitted to I-TASSER, which produced five models for the wild-type BCL2 protein and for each mutant. I-TASSER used the templates 2co9 (83 per cent identity) and 2nbiA (85 per cent threading alignment coverage). I-TASSER used ten threading programs, including SPARKS-X, MUSTER, HHSEARCH2, FFAS-3D, HHSEARCH I, HHSEARCH, Neff-PPAS, pGenTHREADER, and wdPPAS. The I-TASSER models were submitted to TM-align to measure the TM-score and RMSD value for each of the mutant models, as specified in Table 5. Chimera 1.11 was used to visualize the protein structures and to analyze the molecular characterization shown in Figure 2. A TM-score between 0.0 and 0.30 indicates random structural similarity, and a TM-score between 0.5 and 1.00 indicates about the same fold.
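Preparing a mutant sequence for submission amounts to applying one substitution, written in the usual notation (for example Q118R), to the wild-type sequence. The sketch below shows one way to do this with a check that the wild-type residue matches; the function name is hypothetical and the five-residue sequence is a toy example, not the BCL2 sequence.

```python
import re


def apply_substitution(sequence, mutation):
    """Apply a substitution such as 'Q118R' to a one-letter protein sequence.

    Positions are 1-based. Raises ValueError if the notation is malformed or
    the wild-type residue does not match the sequence at that position.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if position < 1 or position > len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Toy sequence for illustration only.
toy_sequence = "MAQGL"
print(apply_substitution(toy_sequence, "Q3R"))
```

The wild-type check guards against off-by-one errors in residue numbering, which would otherwise silently produce the wrong mutant model.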

Predicted PTMs (Post-Translational Modifications)
Protein structure and function are regulated by PTMs, which are involved in key events in biological systems such as cell signaling and protein-protein interactions (Dai and Gu, 2010; Shiloh and Ziv, 2013). In our analysis, we examined whether the chosen nsSNPs alter PTMs of the BCL2 protein. Different bioinformatics tools were therefore used to predict the PTM sites in this protein.
Methylation: Methylation is an essential PTM, as methylation of lysine residues influences DNA binding and alters gene expression in certain proteins. GPS-MSP 3.0 was used for this purpose and predicted that no sites in BCL2 would be methylated.

Phosphorylation: GPS 3.0 and NetPhos 3.1 were used to predict phosphorylation sites in the BCL2 protein. GPS 3.0 predicted 34 residues (Ser: 43%, Thr: 34% and Tyr: 23%) as targets for phosphorylation. NetPhos 3.1, on the other hand, predicted 25 residues (Ser: 14, Thr: 9, Tyr: 2) that could be phosphorylated. The findings of GPS 3.0 and NetPhos 3.1 are tabulated for comparison, and the residues common to both are given in the table in Fig. 6.
Ubiquitination: BDM-PUB and UbPred were used for ubiquitination prediction. BDM-PUB predicted lysine residues to be ubiquitinated, while UbPred predicted four of the lysine residues to be ubiquitinated (Fig. 5 and 6, respectively).

Discussion:
The protein encoded by the BCL2 gene plays a very important role in controlling cell death mechanisms such as apoptosis, necrosis and autophagy. The gene product acts as an apoptotic regulator by controlling mitochondrial apoptosis, and it is involved in normal embryonic cell growth and protection from cancer. BCL2 is one of the major factors implicated in nearly 50% of all human cancers, such as lymphocytic leukemia, acute myeloid leukemia, follicular lymphoma (FL), prostate cancer, breast cancer, lung cancer, neuroblastoma and colorectal cancer. The aim of this study was to identify the SNPs in the BCL2 gene region that may affect the regulatory regions of the gene and its protein product. From the foregoing analysis, we predict that the analyzed SNPs have a significant role in the above-mentioned diseases.
The analysis of all SNPs in the selected region revealed that 159 were nsSNPs, while 398 and 1842 were located in the 5' UTR and 3' UTR regions, respectively. The remaining SNPs were of other types (nonsense, uncategorized, intronic etc.) and were not included in the analysis. These data suggest that only a very small number of BCL2 SNPs directly affect the protein. Of all the nsSNPs in this region, four, Q118R (rs759928495), G193R (rs1197820694), R129C (rs777784952) and L181V (rs752310933), were predicted to be the most damaging. Among these four, PROVEAN assigned G193R the most deleterious score (-7.039) and L181V the least deleterious score (-2.706). PolyPhen2, which scores nsSNPs on a scale of 0 to 1, gave Q118R, R129C, L181V and G193R a score of exactly 1. PhD-SNP and SNPs&GO predicted G193R to be the most detrimental, with the highest scores of 10 and 0.992, respectively.
We cross-checked these nsSNPs with several additional tools such as CADD. For I-Mutant, the RI values range from 0 to 10, where 0 indicates zero reliability and 10 implies full reliability. The four amino acid substitutions G193R, Q118R, L181V and R129C were predicted to decrease protein stability; none was predicted to increase it. The reliability indexes for G193R, Q118R, L181V and R129C were high (see Table 3). To make our results more dependable, we further tested these mutations on the CUPSAT server (http://cupsat.tu-bs.de/). The CUPSAT predictions are entirely in line with the I-Mutant predictions.
ConSurf predicted the conservation profile of the BCL2 protein, indicating which amino acids are conserved, exposed or buried, and functionally or structurally significant. Based on their role in the protein, highly conserved amino acids are expected to be either functionally or structurally significant (Berezin et al., 2004). Highly conserved and exposed amino acids are expected to have important functions, such as taking part in protein interactions.
Of our selected nsSNPs, G193R, Q118R and R129C affect residues that are highly conserved, exposed and functionally essential, whereas L181V affects a residue that is highly conserved, buried and structurally significant. This further supports the deleterious effect of these nsSNPs on the BCL2 protein.
The protein structures were modelled using I-TASSER. Only the FASTA sequence of the protein was used as input; the method itself performs template selection and structure simulation. The wild-type model, built on the 2nbiA and 2co9 templates, had 85% of residues in favored and allowed regions and 7.2% outlier residues. For the Q118R mutant structure, 92.8% of residues were in favored and allowed regions and 7.2% were outliers. For R129C, L181V, and G193R, the favored and allowed residues were 91.5%, 95% and 92.4%, and the outliers were 8.4%, 5.1% and 7.6%, respectively. Protein models are regarded as good-quality structures if their RAMPAGE values exceed 80 percent (Morris et al., 1992).
PTMs regulate protein structure and function (Dai and Gu, 2010; Shiloh and Ziv, 2013). We therefore also searched for PTM sites in the BCL2 protein to determine whether any of them coincide with the nsSNP positions.
Interestingly, none of the predicted phosphorylation or ubiquitination sites coincided with the positions of the most damaging nsSNPs.
We then looked for other nsSNPs at the 34 positions predicted by GPS 3.0 and the 25 positions predicted by NetPhos 3.1 to be phosphorylated (Table 4). Among the possible phosphorylation sites predicted by both GPS 3.0 and NetPhos 3.1, we found one nsSNP, at amino acid position 7. Likewise, only one nsSNP was found at a possible ubiquitination site predicted by both BDM-PUB and UbPred, at amino acid position 17.
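The comparison described above is a set intersection: first the sites predicted by both tools, then the nsSNP positions that fall on those shared sites. A minimal sketch follows. The four damaging positions come from the variant names, and position 7 is the shared phosphorylation site reported in the text; the function names and all other positions are hypothetical placeholders.

```python
# Positions of the four most damaging nsSNPs (Q118R, R129C, L181V, G193R).
DAMAGING_POSITIONS = {118, 129, 181, 193}


def shared_sites(sites_a, sites_b):
    """Residue positions predicted as modified by both tools."""
    return set(sites_a) & set(sites_b)


def snps_at_sites(snp_positions, sites):
    """Sorted nsSNP positions that coincide with predicted PTM sites."""
    return sorted(set(snp_positions) & set(sites))


# Placeholder site lists (illustration only, not the server output).
gps_sites = {7, 24, 70}
netphos_sites = {7, 70, 87}
common = shared_sites(gps_sites, netphos_sites)

print(snps_at_sites(DAMAGING_POSITIONS, common))  # damaging nsSNPs on shared sites
print(snps_at_sites({7, 17, 118}, common))        # a hypothetical wider nsSNP set
```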
Our study was conducted in depth, and all findings were cross-checked to avoid ambiguity. Every study has limitations, and so does ours. Although our analysis was thorough, it relies on web servers and computational software driven by mathematical and statistical algorithms, and its predictions require experimental validation.

Conclusion
Our study found BCL2 gene SNPs that could be vital in nearly 50 percent of human cancers. This study shows that different nsSNPs can disturb the structure and/or function of the BCL2 protein. Four major mutations were found in the BCL2 gene: glutamine to arginine at position 118 (Q118R, rs759928495), glycine to arginine at position 193 (G193R, rs1197820694), arginine to cysteine at position 129 (R129C, rs777784952), and leucine to valine at position 181 (L181V, rs752310933). These were predicted to be the most damaging of all. These SNPs can have deleterious effects on the BCL2 protein and can play a key role in disease. These four nsSNPs can therefore be regarded as key candidates. They may contribute towards effective drug discovery and the development of precision medicines for diseases related to BCL2 malfunction.
Careful investigations and wet-laboratory experiments are necessary to explore the effects of these polymorphisms on the structure and function of the protein. Animal models carrying these important mutations in the BCL2 protein could also be extremely helpful in exploring these diseases. Our study may help in designing personalized medicines for various types of cancer patients, and may also help in future cancer prediction and in cancer diagnosis in people with a family history of cancer. Our study was carried out in detail; however, it is an in silico study, so wet-laboratory studies on cell cultures and animal models containing these SNPs are required.

Declarations
Conflict of Interest
The authors have declared no conflict of interest.

Figure 2
All SNPs in BCL2 gene

Gene-gene interaction of BCL2 with other genes proposed by STRING
