Identification of Most Lethal Non-synonymous SNPs in TOLLIP Gene: An In-Silico Analysis

The Toll-interacting protein (TOLLIP), first detected in 2000 by hybrid screening using the interleukin-1 receptor accessory protein, is ubiquitously expressed and negatively regulates the TLR signaling cascade, in particular by impeding the TLR4 and TLR2 pathways. TOLLIP facilitates the intracellular localization and lysosomal degradation of TLR and TGF-β type 1 receptor, and exerts its anti-apoptosis and pro-autophagy effects in autoimmunity through interaction with target of Myb1 membrane trafficking protein (TOM1). It also protects intestinal epithelial cells from apoptosis induced by TNF-α and IFN-γ signaling and acts as a cargo adaptor linking autophagy gene 8 (ATG8) and microtubule-associated protein 1 light chain 3. Ubiquitin-modified cell debris, together with coated autophagosomes, removes harmful protein aggregates and maintains cellular homeostasis. Detecting TOLLIP polymorphisms is therefore structurally and functionally important for indicating possible malfunctions and therapeutic targets. We identified a gap in the available data on nsSNPs in the TOLLIP gene in previous studies. Hence, we used a wide range of bioinformatic techniques in this study to identify the most destructive nsSNPs in the TOLLIP gene. The in-silico tools PROVEAN, SIFT, SNPs&GO, PhD-SNP, and PolyPhen2 were used, followed by I-Mutant, MutPred, and ConSurf. The 3D modeling was carried out with I-TASSER and Phyre2, and STRING and GeneMANIA were used to predict the gene-gene interactions of TOLLIP. Our study identified G19D (rs866744102), G32R (rs1308704061), D71N (rs777772934), and E72G (rs1202660177) as the four most lethal non-synonymous SNPs in the TOLLIP gene, which may play an essential part in defects of the TOLLIP protein and may contribute to various diseases. This is the first study of its kind.
Predictions from STRING showed that IRAK1 is the most interactive gene with TOLLIP. TOLLIP did not show co-localization with any gene. In pathways, it is related to IRAK2, IL1RN, EHHADH, MYD88, IL1B, IL1A, and IL1R1, and it has no genetic interactions with other genes.
PolyPhen2, which scores variants on a scale of 0 to 1, gave G19D, G32R, D71N, and E72G scores of 0.999, 1.000, 0.999, and 0.995, respectively, and classed all four as probably damaging. We accessed Ensembl genome browser 96 to cross-check these nsSNPs against further predictors, namely MetaLR, REVEL, Mutation Assessor, and CADD. Mutation Assessor predicted all four mutations to be highly harmful. Likewise, the other tools predicted all 4 nsSNPs to be damaging. CADD scored E72G at 34, the highest of the four, compared with 26, 26, and 27 for G19D, G32R, and D71N, respectively (a CADD score of 30 corresponds to the top 0.1%, and a score of 20 to the top 1%, of the most damaging SNPs in the human genome). MutPred predicted effects such as variation in the ordered interface, loss of methylation, and gain of acetylation. Of the four mutants, G19D had the highest p-value (0.607), whereas the other mutations (G32R, D71N, and E72G) showed 0.599, 0.412
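The CADD percentiles quoted above follow from the PHRED-like scaling of CADD scores, in which a scaled score s corresponds to roughly the top 10^(-s/10) fraction of ranked variants. The following Python sketch makes that arithmetic explicit for the four reported scores; the function name is illustrative and is not part of CADD itself.

```python
def cadd_top_fraction(phred_score: float) -> float:
    """Fraction of ranked variants at or above a PHRED-scaled CADD score."""
    return 10 ** (-phred_score / 10)


# CADD scores for the four nsSNPs as reported in the text.
cadd_scores = {"G19D": 26, "G32R": 26, "D71N": 27, "E72G": 34}

for variant, score in cadd_scores.items():
    percent = cadd_top_fraction(score) * 100
    print(f"{variant}: CADD {score} -> top {percent:.3f}% of variants")
```

Under this scaling, scores of 20 and 30 reproduce the 1% and 0.1% levels mentioned in the text.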


Introduction
Various types of mutations may occur in human DNA. Single base-pair variations, referred to as single nucleotide polymorphisms (SNPs), comprise about 90% of them. The coding regions of the human genome are known to contain about 500,000 SNPs 1. Non-synonymous SNPs (nsSNPs) lead to structural and functional variation in human proteins 2. Such nsSNPs alter amino acids in ways that can be beneficial or harmful to the structure and/or function of the protein 3. These alterations are associated with changes in gene regulation and protein structure stability 4,5. They are also related to protein structure instability 6, to the hydrophobicity, geometry, and charge of the protein 7, and to translation, relative strength, inter- and intra-protein interactions, and dynamics 8. SNPs in the Toll-interacting protein gene have also been described as downregulating the proinflammatory cytokines interleukin-6 (IL-6) and tumor necrosis factor-alpha (TNF-α) and inducing the anti-inflammatory cytokine interleukin-10 (IL-10) 9. TOLLIP gene polymorphisms are linked to infectious diseases because TOLLIP controls the Toll-like receptors (TLRs), a family of evolutionarily conserved specific receptors that identify and respond to pathogen-associated molecular patterns (PAMPs) produced by certain microbes, such as PGN, LPS, CpG-DNA, and single- and double-stranded RNA (dsRNA) 10. Thus, TLRs are an important component of innate immune responses, acting as a buffer by initiating signaling cascades that trigger immune and inflammatory genes against invading pathogens.
However, disturbed activation of TLR pathways leads to an unnecessary release of signaling molecules and proinflammatory cytokines; these processes may lead to inflammation and to the development of infectious and autoimmune diseases 11. Therefore, the signaling equilibrium of TLR pathways must be carefully maintained to preserve the immune response. TOLLIP was first identified as a component of the IL-1R pathway in mice and was later reported in other species 12. The TOLLIP protein consists of three conserved domains: an N-terminal Tom1-binding domain, a central conserved C2 domain, and a C-terminal coupling of ubiquitin to ER degradation (CUE) domain. In mammals, TOLLIP proteins work by shutting down MyD88-dependent signaling pathways, either by inactivating IRAK-1 or by directly binding to TLR2 and TLR4 13. TOLLIP also regulates other inflammatory processes. The TOLLIP C2 domain can bind phosphatidic acid, and the CUE domain can interact with ubiquitinated proteins 14. The C2 domain is involved in phagocytosis 15. The JAK/STAT (Janus Kinase/Signal Transducers and Activators of Transcription) pathway is shown to be built around lysosomes 16. The C2 domain helps in liberating and transferring transmembrane proteins from the lysosome, and mitochondrial TOLLIP plays a substantial role in stimulating oxygen-derived free radicals 17. Through its upregulated expression in intestinal epithelial cells, TOLLIP induces tolerance to the normal enteric flora. Failure of this upregulation of the TOLLIP gene can lead to chronic inflammation in patients with inflammatory bowel disease 18. TOLLIP has been associated with atopic dermatitis, but the actual mechanism of the disease is still not known 19. TOLLIP has been shown to act as a negative regulator by attenuating the function of IL1β 20. Further studies are needed to understand TOLLIP and its functional and structural variants.
We used bioinformatics methods to identify the most harmful nsSNPs in the TOLLIP protein. We propose 3D models of the wild-type TOLLIP protein and of the variants that may be deleterious because of non-synonymous SNPs in the gene. This is the first study of its kind to cover protein structure prediction and mutation analysis for TOLLIP.

Methodology
The study was completed in several steps. The figure below depicts the tools used to carry out the work.

Extracting non-synonymous SNPs
The database of single nucleotide polymorphisms was accessed via the NCBI. Complete information on the TOLLIP gene, including its SNPs and their minor allele frequencies (MAF), locations, and residue changes, was retrieved as described by Bhagwat 21, and the non-synonymous SNPs of the gene were then extracted from it.

Identification of Damaging nsSNPs
The effect of the identified nsSNPs on the TOLLIP gene was assessed using SIFT 22, PROVEAN 23, PhD-SNP 24, and SNPs&GO 25. The nsSNPs marked as intolerant by all of these tools were then further filtered with PolyPhen2 26.

Examining structural and functional effects
To examine the conformational and functional changes of the target protein caused by the nsSNPs, the MutPred 27 tool was used. This web application detects molecular disorder by investigating any amino acid replacement in the protein structure. The server screens for variations in the conformation and configuration of the protein, such as an increase in helical propensity, and for disruption of the protein's normal function, such as the loss of a phosphorylation site. In this study, we submitted the FASTA sequence of the TOLLIP protein and marked the positions of the amino acid changes (deleterious nsSNPs). A prediction is regarded as confident when its probability value is less than 0.05 and as very confident when it is less than 0.01.
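The confidence levels described above reduce to a simple threshold rule on the reported probability value. The sketch below encodes that rule in Python; the function name and the "not confident" label for values of 0.05 or more are assumptions made for illustration and are not MutPred output.

```python
def mutpred_confidence(p_value: float) -> str:
    """Label a MutPred property p-value using the 0.05 / 0.01 cut-offs."""
    if p_value < 0.01:
        return "very confident"
    if p_value < 0.05:
        return "confident"
    return "not confident"  # assumed label for values at or above 0.05


# Hypothetical p-values chosen only to exercise each branch.
for p in (0.004, 0.03, 0.2):
    print(p, mutpred_confidence(p))
```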

Analysis of TOLLIP's stability
To analyze the stability of the TOLLIP protein, I-Mutant 2.0 28 was used. This web server estimates changes in protein stability caused by a mutation. The TOLLIP protein sequence was submitted to the tool with the pH set to 7 and the temperature set to 25 °C. The reliability index (RI) reported by the server for a mutant protein normally falls in the range 0-10.

Identification of Evolutionarily Conserved Amino Acids
The phylogenetic relationships among homologous sequences, and the conservation of amino acids in the TOLLIP protein, were identified with ConSurf 29,30,31,32. The degree of evolutionary conservation of each amino acid was estimated using 50 distinct homologous sequences. The damaging nsSNPs found at highly conserved amino acid positions were then studied further.

Predicting TOLLIP's 3D Structure
Three-dimensional homology modeling of the TOLLIP protein was carried out with Phyre2 33. It produced models of the four mutants carrying the damaging non-synonymous SNPs, in addition to the wild-type TOLLIP model. The wild-type and mutant models were compared using the template modeling (TM) score. Similarity was quantified from the calculated RMSD and TM-score values. Studies have shown that a higher root mean square deviation (RMSD) value corresponds to a greater structural difference between wild type and mutant 34,35. Based on these values, the deleterious mutants were then given as input to I-TASSER 36,37,38. The molecular features of the resulting protein structures were examined and visualized interactively using Chimera 1.11 39.
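To make the RMSD comparison concrete, the following Python sketch computes the root mean square deviation between two sets of matched atom coordinates. It assumes the two structures have already been superimposed and that atoms are paired one to one (for example, Cα atoms of the same residues); the coordinates are hypothetical toy values, not TOLLIP data, and the study's own values came from the modeling servers rather than from code like this.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atoms are matched
    by index; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates (in angstroms) for three matched atoms.
wild_type = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.0), (1.5, 0.3, 0.0), (3.0, 0.0, 0.4)]

print(f"RMSD = {rmsd(wild_type, mutant):.3f}")
```

A larger value from this calculation indicates a greater deviation of the mutant model from the wild-type model, which is how the RMSD values in the study are interpreted.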
Phosphorylation sites were assessed at the tyrosine, serine, and threonine residues of the TOLLIP protein with the NetPhos threshold set to 0.5 43. Scores above this threshold indicated the presence of a phosphorylation site. Similarly, the threshold for ubiquitylation was set at 0.62 44; a lysine was predicted to carry a ubiquitylation site when its score was equal to or higher than the threshold, and not otherwise. PTMs trigger many biological events because they are directly involved in maintaining the normal function and structure of proteins in processes such as protein-protein interactions and cell signaling cascades 45,46. When a lysine residue of the target protein is methylated, its DNA-binding capacity is disrupted, which alters gene expression 47. Ubiquitylation, on the other hand, aids the repair of DNA damage 48. Similarly, signal transduction pathways are activated or deactivated when certain amino acid residues are phosphorylated to achieve the required structural conformation 49,50,51,52.
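The two cut-offs described above can be expressed as simple per-residue checks. The Python sketch below assumes each prediction is a (residue, position, score) tuple; this layout, the function names, and the example scores are hypothetical and do not reflect the actual output format of NetPhos or of the ubiquitylation predictor.

```python
NETPHOS_THRESHOLD = 0.5          # phosphorylation: score must exceed this
UBIQUITYLATION_THRESHOLD = 0.62  # ubiquitylation: score must be >= this


def is_phosphosite(residue: str, score: float) -> bool:
    """Ser/Thr/Tyr residues scoring above the NetPhos threshold."""
    return residue in {"S", "T", "Y"} and score > NETPHOS_THRESHOLD


def is_ubiquitylation_site(residue: str, score: float) -> bool:
    """Lysine residues scoring at or above the ubiquitylation threshold."""
    return residue == "K" and score >= UBIQUITYLATION_THRESHOLD


# Hypothetical predictions: (residue, position, score).
predictions = [("S", 10, 0.91), ("T", 42, 0.48), ("K", 60, 0.62), ("K", 88, 0.40)]

phosphosites = [(r, pos) for r, pos, s in predictions if is_phosphosite(r, s)]
ubiquitylation_sites = [(r, pos) for r, pos, s in predictions if is_ubiquitylation_site(r, s)]
print(phosphosites, ubiquitylation_sites)
```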

Gene-Gene Interaction of TOLLIP
The influence of the non-synonymous SNPs on other genes was assessed with two widely used tools, STRING and GeneMANIA 53,54. GeneMANIA analyzes interactions among genes based on biochemical or signaling pathways and on similarity of protein domains. STRING reports a combined score for interacting genes. In this step, we provided TOLLIP as the input gene and then carried out the analysis.

Recruited nsSNPs
A total of 11535 SNPs of the TOLLIP gene were extracted from the NCBI database, comprising 196 non-synonymous SNPs, 320 in the 5' untranslated region, and 836 in the 3' untranslated region. The remaining 6104 were other forms of SNPs, including intronic SNPs, 198 synonymous SNPs, splice-site SNPs, and uncategorized SNPs; their distribution is shown in Fig. 2. Our target was the 196 nsSNPs, which were examined further. The findings of this analysis show that 4 non-synonymous SNPs can shorten the protein by causing premature termination of mRNA translation, yielding a truncated protein. Table S1 contains information on the nsSNPs.

Identification of Damaging nsSNPs
To detect the effect of the 196 nsSNPs on the structure and function of the TOLLIP gene, the following bioinformatics tools were applied. The Protein Variation Effect Analyzer (PROVEAN) classed 74 nsSNPs as damaging because their final scores were lower than the established threshold (-2.5). Sorting Intolerant From Tolerant (SIFT) classed 118 nsSNPs as intolerant because their values were lower than the tolerance index (0.05). SNPs&GO characterized 39 nsSNPs as disease-associated. From these tools we picked the 4 shared nsSNPs that showed the most deleterious impact and submitted them to Polymorphism Phenotyping v2 (PolyPhen2). This tool classifies the impact of a variant as probably damaging, possibly damaging, or benign. Its findings further confirmed these 4 nsSNPs as probably damaging, and they were then taken forward for further analysis.
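The consensus step described above amounts to keeping only the variants that every tool flags. The Python sketch below illustrates this using the PROVEAN and SIFT cut-offs given in the text. The record layout, the "Disease"/"Neutral" label strings, the SIFT score for G19D, and the variant X100Y are all hypothetical placeholders; only the G19D PROVEAN score is taken from the study.

```python
from dataclasses import dataclass

PROVEAN_CUTOFF = -2.5  # scores below this are treated as damaging
SIFT_CUTOFF = 0.05     # scores below this are treated as intolerant


@dataclass
class Prediction:
    variant: str
    provean_score: float
    sift_score: float
    snps_go_label: str  # assumed labels: "Disease" or "Neutral"
    phd_snp_label: str  # assumed labels: "Disease" or "Neutral"


def is_consensus_deleterious(p: Prediction) -> bool:
    """True only when all four tools flag the variant."""
    return (
        p.provean_score < PROVEAN_CUTOFF
        and p.sift_score < SIFT_CUTOFF
        and p.snps_go_label == "Disease"
        and p.phd_snp_label == "Disease"
    )


predictions = [
    # PROVEAN score from the text; the SIFT score and labels are placeholders.
    Prediction("G19D", -6.001, 0.00, "Disease", "Disease"),
    # Entirely hypothetical variant that fails every filter.
    Prediction("X100Y", -1.2, 0.30, "Neutral", "Neutral"),
]

consensus = [p.variant for p in predictions if is_consensus_deleterious(p)]
print(consensus)
```

Requiring agreement across all tools is a conservative choice: it reduces false positives at the cost of possibly discarding variants that only some predictors flag.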

Prediction of Structural and Functional Effects of nsSNPs
The MutPred server was used for the 4 selected nsSNPs. This web application characterizes nsSNPs that can disrupt the function and the structure of the protein. The p-values derived from this server are shown in Table 2. I-Mutant, a neural network-based web server, was used to predict the stability of the TOLLIP protein for all the selected nsSNPs. Whether protein stability increases or decreases depends on the amino acid substitution; the results were obtained with RI values in the range 0-10, which are given in the

Evolutionary Conservation of TOLLIP Protein
To determine the functional regions of the TOLLIP protein, ConSurf was used to analyze the four nsSNPs. Its findings classed G19D and G32R as highly conserved and buried, and D71N and E72G as highly conserved and exposed. The analysis therefore indicates that non-synonymous SNPs located in extremely conserved regions tend to be very harmful to the proper functioning and conformation of the TOLLIP protein.
ConSurf result for rs1202660177 (E72G): conservation score 9, highly conserved and exposed (f).

3D Modeling of TOLLIP's Mutants
I-Mutant predicted that the four nsSNPs decrease the stability of the TOLLIP protein, and they were therefore selected for the final comparative modeling of the protein. The wild-type TOLLIP sequence and the mutant sequences, each carrying a single amino acid substitution, were submitted to I-TASSER to generate the TOLLIP protein structures. It is among the most reliable and advanced tools for predicting protein structure. Five models were generated for the wild-type protein and for each TOLLIP mutant. 2co9 (83% identity) and 2nbiA (85% coverage of the threading alignment) were the templates employed. I-TASSER used ten threading programs to develop the 3D protein structures, such as wdPPAS, FFAS-3D, SPARKS-X, pGenTHREADER, HHSEARCH (1 and 2), MUSTER, and Neff-PPAS. The I-TASSER models were then submitted to the TM-score algorithm to calculate their scores. The RMSD value of every mutant was also investigated (see Table 6). The structure and molecular depiction of the TOLLIP protein were visualized with Chimera 1.11.

We investigated the following modifications in this study; the results are as follows.
Methylation: No methylated site was recognized in the human TOLLIP gene using GPS-MPS 3.0.
Phosphorylation: TOLLIP protein phosphorylation sites were predicted with ModPred and NetPhos 3.1. ModPred marked 21 residues (S: 35%, T: 35%, and Y: 29%) and NetPhos 3.1 marked 16 residues (Ser: 10%, Thr: 9%, Tyr: 4%) as phosphorylation sites.
The outcomes from both servers are compared in the following table. From these findings, it is apparent that phosphorylation is the only modification that can have major effects on the structure and function of the TOLLIP protein.

Discussion
Numerous studies have previously examined the association of TOLLIP gene polymorphisms with disorders such as leprosy (Hansen's disease) 55, tuberculosis 56, visceral leishmaniasis 57, and idiopathic pulmonary fibrosis 58. In this study, we examined the most damaging non-synonymous SNPs in the TOLLIP gene that may have a critical role in certain diseases.
The results covered 196 nsSNPs; data on other kinds of single nucleotide polymorphisms extracted from the employed servers were excluded from this study. The present results show that the TOLLIP protein carries four highly damaging nsSNPs. PROVEAN gave the highest scores to D71N (-4.017) and G19D (-6.001), followed by E72G (-6.617), with the lowest score for G32R (-7.000). PolyPhen2, which scores variants on a scale of 0 to 1, gave G19D, G32R, D71N, and E72G scores of 0.999, 1.000, 0.999, and 0.995, respectively [...] Table 3.
ConSurf predicted the protein conservation profile. According to Berezin 59, highly conserved residues are expected to be structurally or functionally essential, depending on whether they lie in the core or on the surface of the protein. Likewise, Miller and Kumar 60 described that amino acids involved in important biological processes are located in the most conserved regions. On this basis, non-synonymous SNPs found in highly conserved regions are likely to cause the greatest impairment to TOLLIP. Of the 4 nsSNPs, G19D and G32R lie in conserved and buried regions, which are considered structurally important, while D71N and E72G lie in highly conserved and exposed regions, which are functionally very significant. This further supports the adverse effect of these nsSNPs on the TOLLIP protein. I-TASSER was employed to model the structure of the TOLLIP protein. The FASTA sequence of the protein was the only input. This server is well developed because it collects templates and initiates the simulation of the protein.
Predictions from STRING and GeneMANIA indicate that IRAK1 is the most interactive gene with the TOLLIP gene; IRAK1 is associated with many diseases, which indicates its importance.
Thus, it can be inferred that all 4 of the most destructive nsSNPs in the TOLLIP gene may eventually affect and interrupt the regular function of other genes, based on their interaction patterns and their co-expression profiles in many diseases. In pathways, TOLLIP is related to the IRAK2, IL1RAP, IL1RN, EHHADH, MYD88, IL1B, IRAK3, TRAF6, IL1A, IL1R1, IRAK1, and IRAK4 genes, which indicates its importance.

Conclusion
We conclude from these findings that the four selected non-synonymous SNPs can damage the TOLLIP protein. Malfunction in the conformation and function of this protein might lead to a variety of diseases. The four major damaging nsSNPs found in this study are rs866744102 (glycine → aspartic acid at position 19), rs1308704061 (glycine → arginine at position 32), rs777772934 (aspartic acid → asparagine at position 71), and rs1202660177 (glutamic acid → glycine at position 72). These four nsSNPs were also predicted to decrease protein stability. According to the ConSurf predictions, D71N and E72G are located in highly conserved and exposed regions as functional residues, and the presence of all four nsSNPs in highly conserved regions indicates their threatening impact on the TOLLIP protein. In this study, we also analyzed the interaction of the TOLLIP gene with other genes. The STRING results revealed that TOLLIP is most interactive with the IRAK1 (interleukin-1 receptor-associated kinase 1) gene. On the other hand, the TOLLIP gene is not co-localized with any gene. Hence, we carried out an in-silico analysis of these four most damaging non-synonymous SNPs using various bioinformatics tools, so that the derived data may support careful investigation of the comprehensive effect of these nsSNPs on the structure and function of the TOLLIP protein.

Figure captions:
Summary of complete methodology in flow chart format.
All SNPs in TOLLIP gene.
TOLLIP's gene-gene interaction exhibited by STRING.