The development of alcohol dependence is a complex and dynamic process that needs to be investigated in terms of comorbidities with psychiatric disorders, especially to develop new psychotherapeutic and pharmacotherapeutic options (Farren et al. 2012). Studies support the importance of genetic influences in substance abuse and dependence, and some specific genes of interest are associated with alcohol use disorders (Mayfield et al. 2008). A study from Germany documented GABRA2 gene sequence variation in relation to alcohol abuse. Of the four haplotypes investigated, the T-CA-C-A-T-T-C haplotype was present significantly more often in alcohol-dependent subjects than in controls (Soyka et al. 2008).
Strong associations were found between alcohol dependence and SNPs in GABRA2, the gene encoding the alpha2 subunit of the GABA-A receptor; these SNPs also affected brain oscillations, as seen in distinct electroencephalography patterns (Edenberg et al. 2004). The link between alcohol abuse and SNPs in GABRA2 was also found in subjects with illicit drug dependence (Agrawal et al. 2006). Genetic variations, particularly nsSNPs resulting in amino acid changes, disrupt potential functional sites responsible for protein activity, structure, or stability (Schaefer et al. 2012). Investigating the role of nsSNPs in the structural and functional changes in GABRA2 will help in understanding the genetic mechanism of alcohol dependence associated with impulsive disorders. In a study of 295 Americans, 97% of whom were of Caucasian origin, primarily the intronic regions of the GABRA2 gene were analyzed to detect association with impulsiveness and lifetime alcohol problems. Our study looked at missense variants in exons, collected from the NCBI database and representing diverse population data, as these would directly affect GABRA2 protein stability and function.
The enormous amount of human genomic sequence information obtained from large-scale projects is helpful in several computational approaches to identify protein mutants in terms of single amino acid changes that disrupt gene function. Several prediction tools have been developed to identify amino acid variants. Programs such as SIFT, PolyPhen-2, Mutation Assessor, MAPP, PANTHER, LogR.E-value, Condel and several others predict the effect of missense variants on protein function. The SIFT prediction, run through PSI-BLAST, indicated 29 nsSNPs as damaging, with scores ≤ 0.05. The 14 nsSNPs predicted by all programs had the lowest score of zero, indicating the high predictive ability of the program.
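The SIFT cutoff described above can be sketched as a simple threshold rule. This is a minimal illustration, assuming only the ≤ 0.05 cutoff stated in the text; the function name `classify_sift` and the example scores are hypothetical and are not values from this study.

```python
# Cutoff stated in the text: SIFT scores at or below 0.05 are called damaging.
SIFT_DAMAGING_CUTOFF = 0.05


def classify_sift(score: float) -> str:
    """Return 'damaging' for scores <= 0.05, otherwise 'tolerated'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT score must lie in [0, 1], got {score}")
    return "damaging" if score <= SIFT_DAMAGING_CUTOFF else "tolerated"


# Hypothetical scores for illustration only; not data from the study.
example_scores = {"variant_A": 0.00, "variant_B": 0.05, "variant_C": 0.32}
calls = {name: classify_sift(score) for name, score in example_scores.items()}
print(calls)
```

A score of exactly 0.05 falls on the damaging side because the cutoff is inclusive.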
PolyPhen-2 uses eight sequence-based and three structure-based predictive features, comparing the properties of the wild-type and the corresponding mutant allele that together define an amino acid replacement (Adzhubei et al. 2010). Our SNP data were analyzed in terms of pph2_prob (classifier probability of the variation being damaging), pph2_FPR [classifier model False Positive Rate (1 - specificity) at the above probability] and pph2_TPR [classifier model True Positive Rate (sensitivity) at the above probability]. Among the 14 nsSNPs identified by all six programs, 13 had a high pph2_prob compared with the other predicted mutations, ranging from 0.99 to 1, and were predicted as probably damaging.
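The two rates defined above follow the usual confusion-matrix definitions. The sketch below shows how a false positive rate (1 - specificity) and a true positive rate (sensitivity) are computed from counts; the counts are hypothetical, and this is not how PolyPhen-2 itself derives pph2_FPR and pph2_TPR internally.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN), which equals 1 - specificity."""
    return fp / (fp + tn)


def true_positive_rate(tp: int, fn: int) -> float:
    """TPR = TP / (TP + FN), also called sensitivity."""
    return tp / (tp + fn)


# Hypothetical benchmark counts at one probability threshold (illustration only).
tp, fn, fp, tn = 90, 10, 5, 95

fpr = false_positive_rate(fp, tn)
tpr = true_positive_rate(tp, fn)
specificity = tn / (tn + fp)

print(f"FPR={fpr:.2f}  TPR={tpr:.2f}  1-specificity={1 - specificity:.2f}")
```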
The PANTHER program estimates the likelihood that a particular nonsynonymous coding SNP will have a functional impact on the protein (Tang et al. 2016). The position-specific evolutionary preservation (PSEP) tool employed in PANTHER uses a distinct metric based on evolutionary preservation: it calculates the length of time (in millions of years) a given amino acid has been preserved in the lineage leading to the protein of interest. The longer the preservation time, the greater the likelihood of functional impact. A total of 76 SNPs, including the 14 nsSNPs, were identified as probably damaging.
Out of the 86 nsSNPs, the PROVEAN server predicted 25 to be functionally damaging, of which 18 were also predicted by other programs such as SIFT, PolyPhen, and PANTHER. The 14 nsSNPs predicted by all six programs had PROVEAN scores ranging from −3.13 to −11.96. The prediction accuracy of this program for human protein variations was reported to be 79.5%. Choi et al. (2012) compared the performance of PROVEAN using two different protein databases: the NCBI NR (non-redundant) protein database and the UniProtKB/Swiss-Prot protein database. Their results indicated that accuracy was reduced by 7% when the UniProtKB/Swiss-Prot database was used instead of the NCBI NR protein database. The authors highlight the usefulness of the program for identifying deleterious single nucleotide variants and variants that cause protein sequence indels. We found this program useful for predicting deleterious SNPs, consistent with other prediction servers.
To predict the pathological effect of the 14 identified nsSNPs, the PMut program was used. Of the 86 nsSNPs tested, 37, including the 14 nsSNPs, were predicted to be pathological. PMut is reported to be a powerful tool for predicting the functional consequences of protein sequence variants (Lopez-Ferrando et al. 2017).
We used six different bioinformatics programs (SIFT, PolyPhen, PANTHER, PROVEAN, Mutation Assessor and PMut), each based on a different method, to predict the nsSNPs with a deleterious effect on GABRA2 protein function. The 14 of 86 nsSNPs that were predicted as most damaging by all six programs were further analyzed for changes in structural stability.
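The consensus step, keeping only variants flagged by every program, amounts to a set intersection. The following is a minimal sketch of that idea; the tool names come from the text, while the variant identifiers (`v1` to `v5`), the per-tool calls, and the `consensus` helper are placeholders for illustration, not the study's actual predictions or pipeline.

```python
def consensus(calls_by_tool: dict[str, set[str]]) -> set[str]:
    """Return the variants that every tool flagged as deleterious."""
    if not calls_by_tool:
        return set()
    return set.intersection(*calls_by_tool.values())


# Placeholder calls per tool (hypothetical; not the study's results).
predictions = {
    "SIFT": {"v1", "v2", "v3", "v4"},
    "PolyPhen": {"v1", "v2", "v4"},
    "PANTHER": {"v1", "v2", "v3", "v5"},
    "PROVEAN": {"v1", "v2"},
    "Mutation Assessor": {"v1", "v2", "v5"},
    "PMut": {"v1", "v2", "v3"},
}

shared = consensus(predictions)
print(sorted(shared))
```

Requiring agreement across all tools is a conservative choice: it lowers the number of candidates at the cost of discarding variants that only some methods flag.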
Among the 14 nsSNPs, 10 were found in conserved regions of the protein sequence, as identified by ConSurf analysis. nsSNPs located at highly conserved amino acid positions tend to be more deleterious than nsSNPs located at non-conserved sites. In general, highly conserved amino acids, whether buried (structural) or exposed (functional), are more likely than other residues to act as biologically active sites. Substitutions at these residues may lead to complete loss of biological function or cause more severe deleterious effects than polymorphisms at non-conserved sites (Dakal et al. 2017).
The functional and structural sites were identified with the ConSurf program, which combines evolutionary data and solvent accessibility predictions. In our study, among the 14 nsSNPs, 6 (P445H, P280S, V208F, S186C, T43A, R58T) were found on the exposed surface and 8 (I255T, L170S, W122L, I121N, F93C, Y73C, F93C, A188T) were buried.
We therefore analyzed the predicted structural consequences using the tools available in Swiss-PDB viewer. The 3D structures of the variants and the wild type were generated with the I-TASSER program. For each variant and for the wild type, the model with a high C-score was selected for analysis in the DeepView Swiss-PDBViewer.
The total energy of the native protein structure was determined to be −19946 kJ/mol. Among the 14 mutants, R58T had the total energy of greatest magnitude after energy minimization (−19646 kJ/mol), the closest to the native value. The remaining 13 mutants had energies of considerably smaller magnitude, ranging from −13567.015 to −13857.964. The variant with a high RMSD difference, 0.150 Å, was S186C. The Ramachandran plot, analyzed with the RAMPAGE program, indicated no major changes in terms of shifts to the favoured or allowed regions; the number of residues in both regions remained the same. However, the I-Mutant suite, which was used to analyze changes in protein structural stability, predicted 10 of 14 nsSNPs as disease-related and 3 nsSNPs as neutral polymorphisms. The reliability index ranged from 2 to 10 for the 13 variants.
In our study, of the 14 variants analyzed, one variant, R58T, had a DDG value of 0.23, indicating a weak effect; 12 other variants (excluding I255T) had scores below −0.5, indicating a largely destabilizing effect; and two variants, W122L and S186C, had scores close to the cutoff (−0.49 and −0.47, respectively), indicating a weak effect. Interestingly, among the 14 variants analyzed, all except P445H were located in the neurotransmitter-gated ion-channel ligand-binding domain of the protein. This indicates a potential role of these SNPs in the functionality of the protein.
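The DDG interpretation used above can be written as a three-way rule. This sketch assumes the −0.5 cutoff stated in the text and, as an added assumption not stated in the text, a symmetric +0.5 cutoff for a "largely stabilizing" class; the function name `classify_ddg` is hypothetical. Only the three DDG values reported in the paragraph are used.

```python
def classify_ddg(ddg: float, cutoff: float = 0.5) -> str:
    """Classify a DDG value into a stability-effect category.

    Below -cutoff: largely destabilizing (as stated in the text).
    Above +cutoff: largely stabilizing (assumed symmetric class).
    Otherwise: weak effect.
    """
    if ddg < -cutoff:
        return "largely destabilizing"
    if ddg > cutoff:
        return "largely stabilizing"
    return "weak effect"


# DDG values reported in the text for three of the variants.
reported = {"R58T": 0.23, "W122L": -0.49, "S186C": -0.47}

for variant, ddg in reported.items():
    print(variant, ddg, classify_ddg(ddg))
```

All three reported values fall inside the ±0.5 band, which matches the "weak effect" reading given in the text.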
While studying alcohol abuse, we have to be conscious of "Emergent Complexity", i.e., alcoholism could be the product of multiple factors contributing to this psychiatric condition. One such factor is polymorphism in important proteins, which contributes to changes in the function of pathways in the central nervous system. The SNPs in the GABRA2 gene are one such observation. Our study showed the effect of these SNPs on the structure and function of the protein (Agrawal et al. 2012).
The MutPred program predicts the impact of single amino acid substitutions on more than 50 different protein properties to infer the molecular mechanisms of pathogenicity. The software package includes genetic and molecular data of amino acid substitutions leading to varied pathology. It includes a general pathological prediction and a ranked list of specific molecular alterations potentially affecting the phenotype (Pejaver et al. 2017).
In our study, two variants (W122L, I121N) were available in the dbSNP database with a frequency of < 0.01. Nine variants (R58T, Y73C, W122L, L170S, I255T, A188T, P280S, P426S, P445H) were available in the Exome Aggregation Consortium (ExAC) database (http://exac.broadinstitute.org/) and were identified there as "rare" variations (Minor Allele Frequency < 0.01). This database lists a total of 53 missense variants for the gene, with a constraint metric (Z score) of 3.34. Positive Z scores indicate increased constraint (intolerance to variation), meaning that the gene had fewer variants than expected.
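The "rare" label above is a threshold on minor allele frequency (MAF). As a minimal sketch, assuming only the MAF < 0.01 cutoff given in the text, the frequency can be computed from allele counts and compared with that cutoff. The helper names and the counts are hypothetical and are not taken from ExAC or dbSNP.

```python
# Cutoff stated in the text: variants with MAF below 0.01 are called rare.
RARE_MAF_CUTOFF = 0.01


def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_count <= total_alleles:
        raise ValueError("counts must satisfy 0 <= alt_count <= total_alleles, total > 0")
    alt_freq = alt_count / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(maf: float) -> bool:
    """True when the minor allele frequency is strictly below the cutoff."""
    return maf < RARE_MAF_CUTOFF


# Hypothetical allele counts for illustration only.
maf = minor_allele_frequency(alt_count=12, total_alleles=120_000)
print(maf, is_rare(maf))
```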
A rare functional variant could alter gene function significantly even though it occurs at low frequency in a population. The “common-disease rare-variant” hypothesis holds that variants affecting health are under purifying selection and thus should be found only at low frequencies in human populations. Rare variants are increasingly being studied as a consequence of exome and whole-genome sequencing efforts. While these variants are individually infrequent, there are many of them in human populations, and they can be unique to specific populations. They are more likely to be deleterious than common variants, as a result of rapid population growth and weak purifying selection (Nelson et al. 2012).
Our overall results indicate that a substantial number of nsSNPs (14/89; 16%) are predicted to be associated with GABRA2 protein dysfunction. Gene-gene interactions were studied to highlight candidate genes that could be associated with alcohol dependence, especially if haplotypes are to be studied in the future. Among the 14 nsSNPs, 9 were within conserved regions. Of the functionally deleterious nsSNPs, 10 were predicted to be associated with disease, though none of them showed structural variation in the Ramachandran plot.
GABRA2 gene variants are associated with alcohol dependence and other mental disorders, but the nsSNPs of the GABRA2 gene have not been studied earlier. To our knowledge, this is the first report on SNPs that focuses primarily on exons and on the functional changes caused by missense variants in the gene. It could be speculated that these nsSNPs play a vital role in an individual's alcohol dependence. However, this should be further evaluated through studies on individuals with varying degrees of alcohol dependence, in addition to Genome-Wide Association Studies (GWAS). A case-control study comparing these SNPs in alcoholics and non-alcoholics will help rule out type I error in the observations (Ray et al. 2009). These data could help institute social intervention programs. With a better understanding of the genetic basis of alcoholism, it may become possible to pre-screen 'at-risk' individuals and design personalized early interventions, especially among the youth population (counselling, change of peer groups, improved family support, and assured employment and domiciliary status if necessary).