Genetic Variation at the Catalytic Subunit of Glutamate Cysteine Ligase Contributes to the Susceptibility to Colorectal Cancer: A Pilot Study

Marina Bykanova (marina.bickanova@yandex.ru), Kursk State Medical University: Kurskij Gosudarstvennyj Medicinskij Universitet, https://orcid.org/0000-0001-5420-3557
Maria Solodilova, Kursk State Medical University: Kurskij Gosudarstvennyj Medicinskij Universitet
Iuliia Azarova, Kursk State Medical University: Kurskij Gosudarstvennyj Medicinskij Universitet
Elena Klyosova, Kursk State Medical University: Kurskij Gosudarstvennyj Medicinskij Universitet
Olga Bushueva, Kursk State Medical University: Kurskij Gosudarstvennyj Medicinskij Universitet
Anna Polonikova, Kursk State Medical University: Kurskij Gosudarstvennyj Medicinskij Universitet
Mikhail Churnosov, Belgorod State University: Belgorodskij gosudarstvennyj nacional'nyj issledovatel'skij universitet
Alexey Polonikov, Kursk State Medical University: Kurskij Gosudarstvennyj Medicinskij Universitet


Introduction
Colorectal cancer (CRC) is the third most common malignant tumor and the second leading cause of cancer death worldwide [1,2]. About 1.9 million new cases of CRC and 935,000 deaths were recorded worldwide in 2020 [3]. The incidence of this cancer type is higher in developed countries than in countries with emerging economies, and the incidence rate of CRC is progressively growing in many countries, including the Russian Federation [4].
Colorectal cancer is a multifactorial disease resulting from interactions between genetic and environmental factors such as a lack of regular physical activity, cigarette smoking, alcohol consumption, and various dietary factors [4,5,6]. Moreover, substances with carcinogenic activity such as drugs, pesticides, food additives, and chemicals released during cooking have been found to increase the risk of colorectal cancer. A large number of candidate gene and genome-wide association studies have investigated the role of genetic factors in CRC susceptibility, and numerous single nucleotide polymorphisms (SNPs) have been identified as associated with disease risk [8,9,10]. There is increasing evidence that dietary factors such as low consumption of fruits, vegetables, and fibers, a high-fat diet, and a diet high in processed meats play an important role in the development of CRC, and dietary modification therefore holds promise for reducing disease incidence [4,11,12]. The anti-carcinogenic properties of fruits and vegetables, as well as unprocessed meats, are attributed to numerous natural components of these foods, among which glutathione is of particular importance. Glutathione (GSH) is an intracellular thiol tripeptide (consisting of cysteine, glycine, and glutamic acid) that is present in the majority of cell types at high concentrations and is involved in xenobiotic detoxification, antioxidant defense, maintenance of mitochondrial function, modulation of cellular proliferation, inhibition of apoptosis, and many other crucial biological functions [13,14]. Glutathione is known to detoxify a wide variety of exogenous and endogenous carcinogens and free radicals, and this function makes GSH a powerful molecule protecting against carcinogenesis [15,16]. Importantly, glutathione deficiency is associated with increased susceptibility to oxidative stress, which is implicated in the development and progression of cancer [17,18].
In the context of the anti-carcinogenic function of GSH, Shiraishi and co-authors reported that long-term ingestion of reduced glutathione suppressed the accelerating effect of a beef tallow diet on colon carcinogenesis in rats [19]. Hoensch and co-workers [20] observed that glutathione levels in the large intestine are relatively low and decrease from the proximal colon (transverse colon) to the distal colon (sigmoid colon). Genetic polymorphisms of glutathione metabolism enzymes may explain interindividual differences in glutathione biosynthesis and thus influence susceptibility to colorectal cancer. Polymorphic genes encoding enzymes involved in glutathione biosynthesis, such as glutamate-cysteine ligase (GCL), are therefore attractive biomarkers for testing genetic susceptibility to colorectal cancer. However, the contribution of genes responsible for glutathione biosynthesis such as GCLC, which encodes the catalytic subunit of glutamate-cysteine ligase catalyzing the initial rate-limiting step of GSH biosynthesis [21], to the predisposition to colorectal cancer has not so far been investigated. Therefore, the purpose of this pilot study was to investigate whether common SNPs at the GCLC gene are associated with the risk of colorectal cancer in a population of Central Russia.

Study participants
The Ethical Review Committee of Kursk State Medical University approved the research protocol. All participants gave written informed consent before enrollment in this study. A total of 681 unrelated individuals (283 patients with CRC and 398 healthy controls) from Central Russia were recruited for the study. The patients were enrolled at the Kursk Regional Oncological Dispensary between 2013 and 2016. The diagnosis of CRC was verified by experienced oncologists based on the results of clinical, laboratory, and instrumental methods. The control group was recruited from the same population and included healthy blood volunteers and hospital-based patients with no clinical evidence of CRC, as described previously [22,23]. The criterion for inclusion in the control group was the absence of oncological and other chronic diseases.

DNA analysis
Whole blood samples (5 mL) were collected from all study participants into EDTA-coated tubes and maintained at -20 °C until processed. Genomic DNA was isolated using the standard procedure of phenol-chloroform extraction. Six common functional SNPs of the GCLC gene (minor allele frequency in the European population higher than 5%), namely rs12524494, rs17883901, rs606548, rs636933, rs648595, and rs761142, were selected for the study using the SNPinfo, GenePipe, and FuncPred bioinformatics tools [24], as described previously [7]. Genotyping of the SNPs was performed with the MassArray-4 system (Agena Bioscience Inc, San Diego, CA, USA) at the Research Institute for Genetic and Molecular Epidemiology of Kursk State Medical University (Kursk, Russia). To ensure quality control, 10% of the samples were chosen at random for repeat genotyping, which was performed blinded to case-control status, and the repeatability test yielded a 100% concordance rate.

Statistical and bioinformatics analysis
Allele frequencies were estimated by the gene counting method. The chi-square test was applied to assess significant departures of genotype frequencies from Hardy-Weinberg equilibrium (HWE). A P-value ≤ 0.05 was considered statistically significant. Allele and genotype frequencies and their association with CRC were analyzed using the SNPStats software [25], available online at https://snpstats.net. SNP-disease associations were evaluated by multiple logistic regression (codominant genetic model) with the calculation of odds ratios (ORs) and 95% confidence intervals (95%CI) adjusted for the covariates age and sex. SNPStats was also used to estimate GCLC haplotypes and their association with CRC risk, as well as to assess linkage disequilibrium (LD; D and D' values) between SNPs. Genotype combinations were compared between the study groups using the chi-square test, and the false discovery rate (FDR) method was applied to all SNP-disease associations to control for multiple testing (FDR calculator available online at http://www.sdmproject.com/utilities/?show=FDR).
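For illustration, the gene counting estimate and the chi-square test for Hardy-Weinberg equilibrium can be sketched in Python as follows. This is a minimal sketch and not the SNPStats implementation; the function name `hwe_chi_square` and the genotype counts in the usage lines are hypothetical and are not data from this study. The P-value uses the closed form of the chi-square survival function with one degree of freedom.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (freq_a, chi2, p_value) for one biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    """
    n = n_aa + n_ab + n_bb
    # Gene counting: each homozygote carries two copies, each heterozygote one.
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    freq_b = 1.0 - freq_a
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * freq_a ** 2, 2 * n * freq_a * freq_b, n * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return freq_a, chi2, p_value

# Hypothetical counts that follow Hardy-Weinberg proportions exactly.
freq_a, chi2, p_value = hwe_chi_square(25, 50, 25)
print(freq_a, chi2, p_value)
```

A P-value above 0.05 from such a test corresponds to no detectable departure from HWE at the threshold used in this study.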

Results
Allele and genotype frequencies of the GCLC polymorphisms in the studied population
The mean ages of the case and control groups were 66.13±10.02 and 66.08±5.27 years, respectively (P=0.93). The number of males was similar in the case (N=146, 51.59%) and control (N=228, 57.29%) groups (P=0.61). The genotype and allele frequencies are shown in Table 1. The genotype distribution for all studied polymorphisms of the GCLC gene was in Hardy-Weinberg equilibrium (P>0.05). Minor allele frequencies (MAF) for SNPs rs12524494 and rs606548 were in accordance with those reported in European populations (www.ensembl.org) as a part of the 1000 Genomes Project. However, the MAF for SNPs rs17883901, rs636933, rs648595, and rs761142 of GCLC in the Russian population differed significantly (P ≤ 0.05) from the European one.
Association of the GCLC polymorphisms with the risk of colorectal cancer
Statistically significant differences in minor allele frequencies for SNPs rs606548 (P=0.041) and rs761142 (P=0.032) of the GCLC gene were observed between the case and control groups. Carriage of genotype rs606548-C/T (OR=2.24; 95%CI 1.24-4.03; P=0.007) was associated with increased risk of colorectal cancer regardless of sex and age (overdominant effect of the SNP). Furthermore, SNP rs761142 (OR=1.30; 95%CI 1.01-1.66; P=0.041) of GCLC showed an association with increased susceptibility to colorectal cancer (log-additive SNP effect).
Joint effects of the GCLC polymorphisms on CRC susceptibility
Table 2 shows genotype combinations associated with the risk of colorectal cancer. As can be seen from Table 2, eight out of ten genotype combinations were associated with increased risk of CRC. The elevated disease risk of these genotype combinations was attributed to the presence of heterozygotes such as rs12524494-G/A, rs636933-G/A, rs648595-G/T, and rs606548-C/T. In contrast, two genotype combinations, rs636933-G/G⋅rs761142-A/A (G3) and rs606548-C/C⋅rs17883901-G/G (G9), were protective against the risk of CRC. However, this association did not survive correction for multiple testing using the FDR procedure.
We estimated haplotype frequencies in CRC patients and controls (Supplementary table 1). No difference was observed in the haplotype distribution between the study groups (P>0.05). Supplementary table 2 shows data on linkage disequilibrium between the studied SNPs in the Russian population. SNPs rs12524494 and rs636933 were in positive linkage disequilibrium (D′=0.812, P=0.0011). The SNP pairs rs606548 and rs761142, and rs12524494 and rs761142, were in strong linkage disequilibrium (D′=0.9987 and D′=0.9114, respectively).
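To make the reported effect sizes easier to interpret, the following Python sketch computes a crude odds ratio with a Wald 95% confidence interval from a 2×2 table of genotype carriage by case-control status. It is only an illustration under stated assumptions: the estimates in this study come from logistic regression adjusted for age and sex in SNPStats, which this sketch does not reproduce, and the function name `odds_ratio_wald_ci` and the counts in the usage lines are hypothetical.

```python
import math

def odds_ratio_wald_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds_ratio, lower, upper) for a 2x2 table.

    z=1.96 gives an approximate 95% Wald interval on the log-odds scale.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts, not taken from Table 1.
or_value, lower, upper = odds_ratio_wald_ci(20, 80, 10, 90)
print(f"OR={or_value:.2f}; 95%CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, as for rs606548-C/T above, indicates an association at the 5% level before any correction for multiple testing.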
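The effect of an FDR correction on a set of association P-values can be sketched as below. The text does not state which FDR procedure the online calculator applies, so this sketch assumes the Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the P-values in the usage lines are hypothetical and are not results of this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (Q-values) in input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone Q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Hypothetical P-values for four tests.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [value <= 0.05 for value in q]
print(q, significant)
```

An association is said to survive the correction when its Q-value stays at or below the chosen threshold.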
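The linkage disequilibrium coefficients D and D′ can be illustrated with a short Python sketch. It assumes that the frequency of the haplotype carrying allele A at the first SNP and allele B at the second is already known; in practice SNPStats estimates haplotype frequencies from unphased genotypes, which is not reproduced here. The function name `ld_d_prime` and the frequencies in the usage lines are hypothetical.

```python
def ld_d_prime(p_ab, p_a, p_b):
    """Return (D, D') for two biallelic loci.

    p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies.
    D' is signed and bounded by -1 and 1.
    """
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # Monomorphic loci give d_max == 0; report no disequilibrium in that case.
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime

# Hypothetical frequencies: alleles A and B always occur together.
d, d_prime = ld_d_prime(0.5, 0.5, 0.5)
print(d, d_prime)
```

Because D′ is normalized by the largest D attainable for the given allele frequencies, its absolute value cannot exceed 1, and values close to 1 indicate strong linkage disequilibrium.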
Functional annotation for CRC-associated polymorphisms of the GCLC gene
Functional annotation of the studied SNPs was done using the Vannovar bioinformatics tools (Table 3). We found that all the polymorphisms represent functional genetic variants through which expression levels of the GCLC gene might be modulated in colon and rectal cells. SNPs rs12524494 and rs606548 were of particular interest since these variants showed association with CRC susceptibility. The polymorphisms of the GCLC gene were predicted as likely pathogenic variants with oncogenicity scores. VannoPortal data on regulatory chromatin states from DNase-Seq, ATAC-seq, histone ChIP-Seq, and selected transcription factor ChIP-seq from 869 biosamples, as a part of the Epimap Epigenomics 2021 project, were analyzed [26]. We observed that SNP rs12524494 is associated with histone mark H3K36me3 (the tri-methylation at the 36th lysine residue of the DNA packaging protein histone H3) in malignant cell types such as lung epithelial and hepatocellular carcinoma, sarcoma, melanoma, B cell lymphoma, acute lymphoblastic leukemia, testicular embryonal carcinoma, eye retinoblastoma, and neuroblastoma. In addition, rs12524494 is associated with epigenetic modification H3K79me2 (the di-methylation at the 79th lysine residue of the histone H3 protein) in lung epithelial carcinoma. SNP rs12524494 was found to be associated with strong gene transcription in mucosal cells of the colon and rectum (Roadmap Epigenomics) and is also related to epigenetic modification H3K36me3. Importantly, 3D Genomes data from VannoPortal show that SNP rs12524494 is associated with enhancer/promoter activity of GCLC and AL033397.2 miRNA (antisense) in colorectal adenocarcinoma epithelial cells. The analysis of the Epimap Epigenomics data showed that SNP rs606548 is associated with histone marks such as H3K79me2 and H4K20me1 in colorectal adenocarcinoma cells.
H4K20me1 (the mono-methylation at the 20th lysine residue of the histone H4 protein) is associated with transcriptional activation and is important for cell cycle regulation [27]. According to the Roadmap Epigenomics data, polymorphism rs606548 of the GCLC gene is associated with weak transcriptional activity in mucosal cells of the colon and rectum as well as with histone mark H3K79me2, which in turn is associated with enhancer/promoter activity of GCLC in colorectal adenocarcinoma epithelial cells, as identified by the 3D Genomes project.
Tissue-specific eQTL data from VannoPortal on the polymorphisms of the GCLC gene were analyzed. In addition, bioinformatics databases such as eQTLGen and GTEx (mRNA expression in different tissues and whole-genome genotype data) were used to assess the functional effects of the SNPs. Table 4 shows the tissue-specific eQTL analysis for polymorphisms of the GCLC gene. In whole blood, allele rs12524494-G is associated with decreased levels of GCLC (eQTLGen Consortium, Q<0.001) and increased levels of pseudogene ERHP2 (VannoPortal, Q=8.44×10⁻⁴). Allele rs606548-T is associated with decreased expression of GCLC in whole blood (eQTLGen Consortium, Q<0.001) and increased expression of ELOVL5 (VannoPortal, Q=1.44×10⁻⁷) in neutrophils and monocytes, as assessed on the transcriptomic data of Chen and co-workers [28]. Allele rs761142-C is associated with decreased expression of GCLC in whole blood. Thus, none of the CRC-associated polymorphisms is associated with expression levels of GCLC in either the sigmoid or the transverse colon.
Tissue and cell type-specific prioritization of regulatory variants that are in linkage disequilibrium with the CRC-associated GCLC gene polymorphisms has revealed that these variants are likely regulated in both colonic mucosa and sigmoid colon through epigenetic mechanisms, as predicted by the VannoPortal tool (VarNote-REG V1.1) on the 1000 Genomes Project, Phase 3 (data of European ancestry). In particular, SNP rs761142 is associated with histone mark H3K79me2 (REG score=0.86537) and is in LD with the variant rs9474579, which is associated with histone marks such as H3K27ac, H3K4me2, and H3K79me2 (REG score=0.87631) in mucosal cells of the colon. In the sigmoid colon, SNP rs9474579 linked to the rs761142 variant is also associated with histone marks such as H3K27ac, H3K4me2, and H3K79me2 (REG score=0.87631). The regulatory variants rs17885586 (REG score=0.88625), rs1555907 (REG score=0.78026), and rs1555906 (REG score=0.74814) linked to the rs761142 polymorphism are associated with histone mark H3K79me2, whereas the regulatory variant rs2268326 (REG score=0.75310) is associated with histone mark H3K4me2.
The 3D Genomes data from VannoPortal show that the CRC-associated polymorphisms rs12524494, rs606548, and rs761142 are associated with enhancer/promoter activity of GCLC in colorectal adenocarcinoma epithelial cells. In addition, all these SNPs are associated with the weak transcriptional activity of the GCLC gene in mucosal cells of the colon, as assessed by regulatory chromatin states from the DNase-Seq and histone ChIP-Seq data of the Roadmap Epigenomics Project (VannoPortal).

Discussion
Glutathione is a tripeptide, γ-L-glutamyl-L-cysteinyl glycine, present in all tissues at high (1-10 mM) concentrations and is considered the most abundant non-protein thiol antioxidant, playing a critical role in maintaining redox homeostasis and defending the cell against oxidative damage [14,29,30]. GSH has numerous vital functions in the cell such as detoxifying xenobiotics, scavenging free radicals, maintaining the essential thiol status of proteins, providing a reservoir for cysteine, and modulating critical cellular processes such as DNA synthesis, microtubule dynamics, and immune function [14,31]. The major determinants of intracellular GSH production are the availability of cysteine, a sulfur amino acid, and the activity of glutamate-cysteine ligase (GCL), the rate-limiting enzyme of glutathione biosynthesis. GCL is composed of catalytic (GCLC) and modifier (GCLM) subunits, which are differentially regulated [30].
The levels of reduced glutathione were found to be elevated in numerous types of human cancers such as bone marrow [32], breast [33], and lung [34] as well as colorectal cancer [21]. Moreover, increased expression of GCLC has been identified in lung, breast, liver, and other types of cancer [35]. It has been observed that increased resistance to chemotherapeutic drugs and radiation therapy might be associated with increased levels of GSH [36], suggesting that increased glutathione is a secondary event in which tumor cells enhance glutathione biosynthesis to ensure their vital functions. In addition, GCLC was found to be overexpressed in patients with liver metastases, where the enzyme is thought to promote tumor cell survival under hypoxic and cell-dense conditions [37]. Nguyen and co-workers [38] observed that RNAi-mediated inhibition of glutathione synthesis impaired survival of multiple colon cancer cell lines.
The present study is the first to identify significant associations between polymorphisms of GCLC and the risk of CRC. In particular, polymorphism rs606548 of GCLC showed a significant association with the risk of colorectal cancer in the Russian population regardless of age and sex. Two other SNPs of the GCLC gene, rs12524494 and rs761142, showed a weak association with disease risk, and the association did not survive correction for multiple testing. Furthermore, ten GCLC genotype combinations were associated with the risk of CRC. Functional SNP annotation using multiple bioinformatics tools revealed that polymorphisms rs606548, rs12524494, and rs761142, despite being located in non-coding regions of the gene, represent regulatory variants that themselves, or through tightly linked SNPs, may affect the expression level of GCLC through epigenetic mechanisms such as histone modification and DNase sensitivity.
According to the literature, GCLC gene polymorphisms are known to be associated with breast and prostate cancer [39,40]. In particular, SNP rs12524494 is associated with susceptibility to breast cancer [41]. Polymorphism rs761142 of GCLC has been found to affect drug metabolism, but no evidence for association with any type of cancer was observed [42]. Polymorphisms rs17883901 [43,44], rs41303970, and rs12524494 [7] were found to be associated with the risk of diabetes. Interestingly, ELOVL5, whose decreased expression level in the sigmoid colon is correlated with allele rs17883901-A of GCLC (data obtained from the GTEx portal), was found to be highly expressed in colorectal cancer tissues [45]. It is proposed that changes in expression may be indicative of the increased regulation of fatty acid biosynthesis that contributes to the reprogramming of the cellular phospholipidome and membrane alterations in colon cancer [46]. There is also evidence for an association between polymorphism rs606548 of the GCLC gene and the risk of ischemic stroke [47]. Bioinformatics analysis showed that the CRC-associated alleles are associated with decreased expression of the GCLC gene, and the modulating effects of these variants are most likely realized through epigenetic mechanisms, including histone modifications operating in a tissue-specific manner [48]. We propose that histone modifications such as H3K79me2, H3K4me2, and H3K36me3 as well as H3K79me2 and H3K36me3 might contribute to the weak transcriptional activity of the GCLC gene in the sigmoid colon and colon mucosa, respectively. Taken together, our findings suggest that decreased transcription of the GCLC gene in carriers of the rs606548 variant, and the associated decrease in glutathione levels, make mucosal cells of the colon more sensitive to environmental carcinogens.

Study limitations
The present study has a limitation in that the results were obtained with a relatively small number of CRC patients and healthy controls. The link between polymorphisms of GCLC and colorectal cancer observed in the studied population of relatively small sample size should be considered an exploratory finding that highlights the need for validation in a larger independent population with a focus on a wider spectrum of polymorphisms of the GCLC gene. Moreover, the present study did not analyze gene-environment interactions, that is, the joint effect of the GCLC gene polymorphisms and well-recognized environmental factors such as physical inactivity, cigarette smoking, alcohol consumption, and dietary factors on the risk of colorectal cancer.
In conclusion, the present study is the first to show an association between single nucleotide polymorphisms of the GCLC gene and the risk of colorectal cancer. Based on the observed associations, we suppose that the GCLC gene may contribute to CRC susceptibility through diminished biosynthesis of glutathione in the large intestine, where the tripeptide is crucial for the regulation of multiple cellular processes, including cell differentiation, proliferation, and apoptosis, as well as for the detoxification and removal of carcinogens and free radicals leading to oxidative stress that has been implicated in cancer development and progression [18,49]. However, before drawing a definitive conclusion on the role of the GCLC gene in colorectal cancer, further studies with a larger sample size are required to confirm the association between the gene polymorphisms and the risk of colorectal cancer and to investigate whether environmental factors modify the effects of the SNPs on disease susceptibility. A better understanding of the impact of the GCLC gene polymorphisms on glutathione biosynthesis and their contribution to colorectal cancer susceptibility will open new avenues for disease prevention through glutathione replenishment and provide opportunities for effective genotype-
2 Odds ratio with 95% confidence intervals (crude analysis) with one degree of freedom.
3 Odds ratio with 95% confidence intervals adjusted for age and sex.
Bold is significant P-values.
NA, not available.