According to freely accessible in silico analysis tools, five of the six effect variants whose frequencies in the Lithuanian population differed significantly from those in European populations are considered benign (according to Varsome or UniProt) or a risk factor (according to Ensembl). These five variants (PPARG: rs1801282, SLC30A8: rs13266634, ZC3HC1: rs11556924, PLCE1: rs2274223, SH2B1: rs7498665) were selected from scientific articles for our catalogue of effect variants. In these articles, the variants were identified as candidate protective genome variants after GWAS data were filtered for nonsynonymous SNPs, to increase the likelihood of their being functional, and after bioinformatic analyses were performed to detect evidence of positive natural selection for the effect variant and to estimate the probability of the mutation being damaging. In addition, a variant was considered protective when it was more frequent in controls than in cases [5].
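The case-control criterion described above can be illustrated with a minimal sketch. The function names and the allele counts in the usage note are hypothetical and are not taken from the cited studies; the sketch only compares effect-allele frequencies and computes an allelic odds ratio, without any significance testing.

```python
def allele_odds_ratio(case_effect, case_other, control_effect, control_other):
    """Allelic odds ratio of the effect allele in cases relative to controls.

    All arguments are allele counts (not genotype counts). A value below 1
    means the effect allele is under-represented among cases.
    """
    counts = (case_effect, case_other, control_effect, control_other)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    return (case_effect * control_other) / (case_other * control_effect)


def is_candidate_protective(case_effect, case_other, control_effect, control_other):
    """True when the effect allele is more frequent in controls than in cases."""
    case_freq = case_effect / (case_effect + case_other)
    control_freq = control_effect / (control_effect + control_other)
    return control_freq > case_freq
```

For example, with hypothetical counts of 30 effect and 70 other alleles in cases, and 50 of each in controls, `is_candidate_protective(30, 70, 50, 50)` returns `True` and the odds ratio is 3/7. In practice such a comparison would be accompanied by a statistical test and correction for multiple testing.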
Variant rs698 in the ADH1C gene is classified as protective (according to Ensembl, ClinVar and OMIM) and has an impact on ethanol metabolism. Although databases define this variant as protective, various studies suggest that it is associated with slower ethanol metabolism, which could lead to longer periods of alcohol consumption and to the consumption of greater quantities. Therefore, people carrying the variant have a higher risk of heavy and excessive drinking [15, 16]. According to one study, common SNPs are responsible for as much as 30% of the variance in alcohol dependence, but few have been identified [17]. Power analyses, however, indicate that additional SNPs associated with alcohol dependence are likely to have small effect sizes that are more consistent with those seen in more common psychiatric disorders [18]. This shows that the molecular mechanisms involved in excessive alcohol consumption and other complex conditions are still unresolved and that large numbers of well-characterised cases and controls need to be collected.
Besides function, the origin of effect alleles must also be addressed. Every disease-associated SNP has two alleles, of which one is considered risk-associated and the other disease-protective. A common practice to ascertain whether a nonsynonymous SNP is protective (i.e. the respective derived allele is protective) is to deduce which allele is derived and which is ancestral, since the minor allele is not necessarily the derived (mutant) one. The origin of an allele can be determined using genomic alignments with primate species. The effect variants we analysed did not have a very low (< 1%) minor allele frequency, so we cannot assume that the rarer allele is the derived allele. However, if a derived allele provides a protective function and gives an individual a selective advantage, one might expect positive selection to sweep it to become the most common allele in the population [5]. This may be the reason why the effect variants we analysed have allele frequencies greater than 1%. Moreover, this could be the reason why databases and SNP analysis tools call these variants polymorphisms. Comparison with primate species showed that the variants analysed in PLCE1, ADH1C and SH2B1 are in fact ancestral. A genomic variant can be considered protective only when the allele is derived, which is why we did not interpret these variants as protective. Despite the contradictory data, the significant variants may have some effect on the aetiopathogenesis of particular complex diseases.
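As an illustration of the ancestral-versus-derived reasoning above, the following sketch infers the ancestral allele from aligned primate bases using a simple majority rule. This is a simplified assumption for illustration only, not the procedure used in this study; the function names and the example bases are hypothetical, and real analyses would rely on curated multi-species alignments.

```python
from collections import Counter


def infer_ancestral_allele(human_alleles, outgroup_alleles):
    """Return the human allele supported by most outgroup species, or None.

    human_alleles: the two alleles observed at the SNP, e.g. ("A", "G").
    outgroup_alleles: bases at the aligned position in primate genomes;
    gaps or bases matching neither human allele are ignored.
    """
    human_set = {a.upper() for a in human_alleles}
    informative = [a.upper() for a in outgroup_alleles if a and a.upper() in human_set]
    if not informative:
        return None  # no outgroup evidence at this position
    ranked = Counter(informative).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # outgroups are evenly split, so the call is ambiguous
    return ranked[0][0]


def allele_state(effect_allele, human_alleles, outgroup_alleles):
    """Classify the effect allele as 'ancestral', 'derived' or 'unknown'."""
    ancestral = infer_ancestral_allele(human_alleles, outgroup_alleles)
    if ancestral is None:
        return "unknown"
    return "ancestral" if effect_allele.upper() == ancestral else "derived"
```

With hypothetical input, `allele_state("G", ("A", "G"), ["G", "G", "G"])` classifies the effect allele as ancestral, whereas the same call with `"A"` as the effect allele classifies it as derived. Under the criterion used here, only the latter case would be a candidate for a protective interpretation.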
In our study, the effect variants may contribute to protection against type 2 diabetes (variants in the PPARG and SLC30A8 genes) and coronary heart disease (the ZC3HC1 variant). It is important to keep in mind the effects of environmental factors. According to data from the Lithuanian Department of Statistics (Statistics Lithuania) [19], the largest share of deaths in 2018 (55.4%) was caused by diseases of the cardiovascular system. In 2014, 4.4% of the population had type 2 diabetes and 7.5% had coronary heart disease. Even though our population has effect variants that may protect against these diseases, lifestyle and other environmental factors may influence morbidity. In addition, many studies concentrate on effect variants in coding parts of the genome, whereas interactions between coding and non-coding variants are equally important but have not been examined sufficiently. Although these effect variants may reduce the risk of disease (or maintain health), additional genetic mechanisms control this process. Not only are the effects of single genomic variants important, but their interactions and the interactions between regulatory regions are also consequential [9].
Butler et al. [5] estimated integrated haplotype scores for the effect variants that we analysed in the PPARG, SLC30A8, and ZC3HC1 genes; the scores indicated that these variants may have undergone recent positive selection [20]. This suggests that the derived allele is beneficial for an individual’s fitness and may be protective. However, the functional impact of these variants has to be confirmed, and additional analysis is needed. According to Plenge et al., most alleles associated with complex diseases (approximately 85%) fall outside the protein-coding sequence, and thus each disease-associated allele should be evaluated to see whether it is in linkage disequilibrium with a variant that changes protein structure. If it is, these findings should be fast-tracked for functional studies in human cells and animal models to assess gain or loss of function. For non-coding effect variants, the effect on gene expression should be evaluated in a relevant human cell type. For example, if a risk allele is associated with higher gene expression, then pharmacological inhibition may be effective in treating the disease [21].
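The linkage disequilibrium check mentioned above is commonly quantified with the r² statistic. The sketch below computes it from haplotype and allele frequencies; the function name and the frequencies in the usage note are hypothetical, and the calculation assumes phased haplotype frequencies are already available, which the cited work does not specify.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at the first locus
          and allele B at the second locus.
    p_a, p_b: frequencies of allele A and allele B, respectively.
    """
    for name, value in (("p_a", p_a), ("p_b", p_b)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
```

For instance, `ld_r_squared(0.5, 0.5, 0.5)` describes two alleles that always occur together and gives an r² of 1, whereas `ld_r_squared(0.25, 0.5, 0.5)` describes independent loci and gives 0. A non-coding associated allele with high r² to a protein-altering variant would be a candidate for the functional follow-up described above.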