Genetic analysis of colorectal carcinoma using high throughput single nucleotide polymorphism genotyping technique within the population of Jammu and Kashmir

SNP genotyping has become increasingly common for understanding the genetic basis of complex diseases such as cancer. SNP genotyping through MassARRAY™ is a cost-effective method to quantitatively analyse the variation of gene expression in multiple samples, making it a potential tool to identify the underlying causes of colorectal carcinogenesis. In the present study, SNP genotyping was carried out using Agena MassARRAY™, a cost-effective, robust, and sensitive method for analysing multiple SNPs simultaneously. We analysed 7 genes in 492 samples (100 cases and 392 controls) for association with colorectal cancer (CRC) within the population of Jammu and Kashmir (J&K). The SNPs were selected based on their reported association with multiple cancers in the literature. Seven SNPs with a call rate above 90% were included in the study. Of these, five SNPs (rs2234593, rs1799966, rs2229080, rs8034191, and rs1042522) were significantly associated with CRC under the allelic model: rs2234593, odds ratio (OR) = 2.981 (1.731–5.136 at 95% CI), p value = 4.81E-05; rs1799966, OR = 1.685 (1.073–2.647 at 95% CI), p value = 0.02292; rs2229080, OR = 1.5 (1.1–2.3 at 95% CI), p value = 0.02; rs8034191, OR = 1.699 (1.035–2.791 at 95% CI), p value = 0.03521; and rs1042522, OR = 20.07 (11.26–35.75), p value = 1.84E-34. To our knowledge, this is the first study to examine the relation of these genetic variants with colorectal cancer within the studied population using high-throughput MassARRAY™ technology. These variants should be evaluated in other population groups, which may aid in understanding the genetic complexity of the disease and help account for the missing heritability.


Introduction
Colorectal cancer (CRC) is the third most common type of cancer worldwide, with 1-2 million new cases each year [1]. In 2020, CRC was estimated to account for nearly 10% of all cancer cases [1]. The incidence of CRC has been associated with obesity, red meat consumption, and physical inactivity [2,3]. In addition, genetic factors and epigenetic changes play a key role in the initiation and progression of CRC [4,5]. Delayed diagnosis is a major hurdle in the management of CRC, as is evident from the rise in new cases each year. It is therefore critical to identify markers that may help in early prognosis and in the development of appropriate therapeutic interventions.
In India, CRC accounted for about 1,479,783 deaths in the year 2020 [1]. A recent hospital-based report ranked colorectal cancer as the fourth most common cancer in the Jammu and Kashmir (J&K) region [6], and the incidence of gastric, oesophageal, and colorectal cancers has risen there in recent years. This rise may be a result of lifestyle and dietary habits [7]. However, the genetic component of gastrointestinal cancer cannot be ruled out (11). In this regard, identifying CRC-related genetic variants in the J&K region is necessary for proper prognosis and CRC management, and this can be achieved through single nucleotide polymorphism (SNP) genotyping. Several studies have examined mutations in critical genes involved in the cell cycle, cell growth, DNA damage repair, and a variety of other processes in the J&K population with regard to various malignancies [8][9][10].
SNP genotyping is a powerful tool that has helped identify the genetic basis of complex diseases, including CRC [11]. The SNPs studied in this way provide key insights into the molecular pathogenesis of cancer that can be translated into diagnostic and therapeutic biomarkers [12][13][14]. Identifying the role of these genetic variants may inform prognosis and help optimize therapies for the treatment of CRC. The Agena Bioscience MassARRAY™ System provides genotype data for several user-defined SNPs in a large number of DNA samples in a high-throughput and cost-effective manner [11]. In this study, we carried out SNP genotyping using Agena MassARRAY™ to analyse multiple SNPs across multiple samples simultaneously.
A previous study in a Chinese population showed that rs2229080 of DCC (Deleted in Colorectal Carcinoma Netrin 1 Receptor) was associated with low breast cancer risk [15]. In contrast, DCC rs2229080 displayed no significant association with esophageal cancer risk in the J&K region [16]. In the present study, we analysed 7 genes and found that rs2234593, rs1799966, rs2229080, rs8034191, and rs1042522 were significantly associated with colorectal cancer in the studied population of the J&K region. Taken together, our study investigated the role of cancer-related genetic variants in CRC in the population of J&K. The study of these genetic variants may provide valuable insights for prognosis and CRC management. However, studies with larger sample sizes are required to further support the present findings.

Sample collection
This study was approved by the Institutional Ethics Review Board (IERB) of Shri Mata Vaishno Devi University (SMVDU). All details were recorded in a pre-designed proforma and the written informed consent was obtained from each participant before conducting the study.
A total of 492 participants, comprising 100 colorectal cancer patients and 392 age- and sex-matched healthy controls, were recruited from hospitals and clinics in the Jammu and Kashmir region of India. From each participant, 2 ml of venous blood was collected in ethylenediaminetetraacetic acid (EDTA) vacutainer tubes. The clinical parameters of both cases and controls are provided in Supplementary Table 1.

DNA extraction
Genomic DNA was isolated from the blood samples using the Qiagen™ DNA isolation kit (Catalogue No. #51206, Hilden, Germany) according to the manufacturer's protocol. The DNA was quantified using Eppendorf's BioSpectrometer™ (Hamburg, Germany) at wavelengths of 260 nm and 280 nm, and the OD260nm/OD280nm ratio was used as the criterion for DNA purity. The quality of the genomic DNA was checked by agarose gel electrophoresis (Bio-Rad Gel Doc™ EZ imager).
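The purity criterion described above is a simple ratio of two absorbance readings. The sketch below illustrates it; a ratio near 1.8 is commonly taken to indicate DNA reasonably free of protein, but the acceptance window of 1.7 to 2.0, the function names, and the sample readings are illustrative assumptions and are not taken from the study.

```python
def purity_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 absorbance ratio for one DNA sample."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def is_acceptable(ratio: float, low: float = 1.7, high: float = 2.0) -> bool:
    # The 1.7-2.0 window is an assumed acceptance range, not a value from the study.
    return low <= ratio <= high


# Hypothetical spectrometer readings: sample ID -> (A260, A280).
readings = {"S001": (0.90, 0.50), "S002": (0.60, 0.50)}

for sample_id, (a260, a280) in readings.items():
    ratio = purity_ratio(a260, a280)
    print(sample_id, round(ratio, 2), "pass" if is_acceptable(ratio) else "fail")
```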

Genotyping
The Agena MassARRAY™ platform was used for SNP genotyping at the Central Analyzer MassARRAY™ facility at SMVDU. It is a robust, cost-effective, and highly sensitive tool for SNP genotyping and involves multiplex PCR [17]. Customized forward, reverse, and single base extension primers were designed using Agena Design Suite V.2.0. Multiplex PCR was used to detect variation in the initially targeted region. For this, 1 µl of genomic DNA (concentration of 10 ng/µl) was loaded into 384-well PCR plates and dried at 85 °C for 10 min. After drying, a reaction mixture containing dNTPs, the primer pool (forward and reverse), reaction buffer, and DNA polymerase was added. After completion of the first PCR, the reaction was treated with shrimp alkaline phosphatase (SAP). The multiplex PCR product was then subjected to a single base extension reaction using mass-modified ddNTPs and the pooled single base extension primers. The PCR cycling conditions were adopted from Gabriel et al. 2009 [17]. The final PCR product was then treated with cationic resin, subjected to an energy reaction to check the quality of genotyping, and transferred to a spectro-chip. The transferred product was then fired in the MT analyser. The data were processed and analysed with the preinstalled Typer Analyzer v.4.0. The genotyping results were validated by replicating a random 10% of the samples, and the concordance rate was 98.3%. One negative and one positive control were included in each 384-well plate to check the quality of the reaction mixture.
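The replicate check described above amounts to counting how often a repeated genotype call agrees with the original call. The following is a minimal sketch of such a concordance calculation. The data layout (dictionaries keyed by sample and SNP), the function name, and the example calls are assumptions made for illustration; the study does not state how its 98.3% figure was computed.

```python
def concordance_rate(original: dict, replicate: dict) -> float:
    """Fraction of shared (sample, SNP) keys whose calls agree in both runs.

    Missing calls are represented by None and are skipped.
    """
    shared = [
        key for key in original
        if key in replicate
        and original[key] is not None
        and replicate[key] is not None
    ]
    if not shared:
        raise ValueError("no overlapping genotype calls to compare")
    # Sort the alleles so that "CG" and "GC" count as the same unphased genotype.
    matches = sum(sorted(original[k]) == sorted(replicate[k]) for k in shared)
    return matches / len(shared)


# Hypothetical calls for one SNP in four samples (not data from the study).
first_run = {
    ("S1", "rs1042522"): "CG",
    ("S2", "rs1042522"): "GG",
    ("S3", "rs1042522"): "CC",
    ("S4", "rs1042522"): None,   # failed call, ignored
}
second_run = {
    ("S1", "rs1042522"): "GC",
    ("S2", "rs1042522"): "GG",
    ("S3", "rs1042522"): "CG",   # discordant replicate
}

print(f"concordance = {concordance_rate(first_run, second_run):.3f}")
```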

Genotyping quality control
The accuracy and precision of downstream data analyses are highly dependent on the data quality of single nucleotide polymorphism (SNP) arrays. Low-quality genotypes can lead to false-positive results and reduce the precision of genomic predictions. Individual call rate, defined as the proportion of SNPs per individual for which a genotype was called, is a common quality control measure [18]. Accordingly, SNPs with a call rate above 90% were included in the statistical analysis [19]. Hardy-Weinberg equilibrium (HWE) among cases and controls was used to assess the quality of the genotypes after analysis.
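The two quality control measures above can be sketched in a few lines. The example computes a call rate from a list of calls and tests genotype counts against Hardy-Weinberg proportions with a one-degree-of-freedom chi-square test. This is a simplified illustration: the function names and the genotype counts are hypothetical, and the chi-square form of the HWE test may differ from the test implemented in the software the authors used.

```python
import math


def call_rate(calls: list) -> float:
    """Proportion of non-missing genotype calls; None marks a failed call."""
    if not calls:
        raise ValueError("no calls supplied")
    return sum(call is not None for call in calls) / len(calls)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple:
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value) with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p                          # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical data: one SNP typed in ten samples, one call failed.
calls = ["CC", "CG", "GG", None, "CG", "CC", "CG", "GG", "CG", "CC"]
rate = call_rate(calls)
print(f"call rate = {rate:.2f}, passes 90% filter: {rate > 0.90}")

# Hypothetical genotype counts in controls.
chi2, p_value = hwe_chi_square(n_aa=180, n_ab=170, n_bb=42)
print(f"HWE chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A small p value from the HWE test in controls is usually read as a warning sign for genotyping error rather than as evidence of association.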

Statistical analysis
The statistical analysis was performed using Plink V.1.0962 with a maximum of 10,000 permutations [20]. Each SNP was tested for Hardy-Weinberg equilibrium (HWE), and the association of SNPs with CRC was evaluated by 3 × 2 chi-square tests of genotypic frequencies between cases and controls. Logistic regression analysis was further performed using SPSS V.23 to obtain the odds ratio (OR), confidence interval (CI), and p value adjusted for confounding factors such as age and BMI. The power of the study was calculated using PS: Power and Sample Size Calculation software (PS version 3.1.6) and was > 90% (41).
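To make the quantities named above concrete, the sketch below computes an unadjusted allelic odds ratio with a log-based (Woolf) 95% confidence interval and a 3 × 2 genotypic chi-square test. It does not reproduce the authors' pipeline: it applies no permutation testing and no adjustment for age or BMI, the function names are illustrative, and every count in the example is invented.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Unadjusted odds ratio for the tested allele with a Woolf 95% CI.

    Arguments are allele counts (two alleles per genotyped individual).
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def genotypic_chi_square(case_counts, ctrl_counts):
    """3 x 2 chi-square test of genotype counts (AA, AB, BB) by status.

    Returns (chi2, p_value) with two degrees of freedom, which assumes
    all three genotypes are observed in the combined sample.
    """
    case_total = sum(case_counts)
    ctrl_total = sum(ctrl_counts)
    grand_total = case_total + ctrl_total
    chi2 = 0.0
    for obs_case, obs_ctrl in zip(case_counts, ctrl_counts):
        row_total = obs_case + obs_ctrl
        for observed, col_total in ((obs_case, case_total), (obs_ctrl, ctrl_total)):
            expected = row_total * col_total / grand_total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 2 df.
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Invented allele counts for 100 cases (200 alleles) and 392 controls (784 alleles).
odds_ratio, lower, upper = allelic_odds_ratio(80, 120, 200, 584)
print(f"OR = {odds_ratio:.3f} ({lower:.3f}-{upper:.3f} at 95% CI)")

# Invented genotype counts (AA, AB, BB) for the same group sizes.
chi2, p_value = genotypic_chi_square((45, 30, 25), (220, 144, 28))
print(f"genotypic chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

An interval that excludes 1 corresponds to a significant allelic association at the 5% level, which is how the reported odds ratios and intervals can be read.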

Results
The current case-control association study included a total of 492 participants: 100 colorectal cancer cases and 392 healthy controls. The patient cohort included 56 males and 44 females, with an average age of 62.87 (± 9.8) years and an average BMI of 20.75 (± 0.869) kg/m². The healthy controls comprised 280 males and 112 females, with an average age of 48.81 (± 15.3) years and an average BMI of 24.9 (± 0.869) kg/m². The clinicopathological characteristics of all the participants are shown in Supplementary Table 1.
In the current study, we evaluated genetic variants that had not previously been studied in association with CRC in the population of Jammu and Kashmir but are associated with other types of cancer. These variants were studied to determine whether they are associated with increased or reduced risk of colorectal cancer in this population. We analysed 7 SNPs using MassARRAY™. After a stringent quality check, all of these SNPs had a genotyping call rate greater than 90% (Table 1). This is the first study to explore these SNPs in relation to colorectal cancer within the J&K population. Of the seven SNPs selected for the study, five (rs2234593, rs1799966, rs2229080, rs8034191, and rs1042522) were found to be significantly associated with colorectal cancer under the allelic model with an odds ratio OR = 2.

Discussion
Colorectal cancer (CRC) is the third most common cancer and a prominent cause of cancer-related morbidity and mortality worldwide [1]. CRC evolves through the progressive accumulation of genetic and epigenetic modifications in the colonic epithelium, which transform it into colorectal adenomas and adenocarcinomas [21].
Genetics is an important risk factor associated with CRC. In the present study, we therefore explored genetic differences between cases and controls. We analyzed seven genes in 492 samples consisting of 100 cases and 392 controls. All seven SNPs had a call rate above 90%. These identified variants were rs2229080 of DCC, rs10046 of CYP19A1, rs1042522 of TP53, rs10228682 of POT1, rs10069690 of TERT, rs1051266 of SLC19A1, and rs1026071 of ARTNL (Supplementary Fig. 1). The GWAS threshold is a statistical threshold for establishing the significance of a claimed relationship between a single nucleotide polymorphism (SNP) and a trait in genome-wide association studies. The most widely used threshold is p < 5 × 10⁻⁸, which is calculated by applying a Bonferroni adjustment for all of the independent common SNPs across the human genome. Table 1 shows that five of the seven SNPs were significantly associated with CRC; however, only rs1042522 of TP53 reached the GWAS threshold. To check the putative role of the associated variants in colorectal cancer, we used GTEx (Table 2).

rs2234593 of WT1 (Wilms tumor 1, transcription factor)
This tumor suppressor gene plays an important role in cell growth and apoptosis. WT1 is located on chromosome 11 and has been found to be associated with acute myeloid leukemia, lung cancer, brain tumors, breast cancer, colorectal adenocarcinoma, thyroid cancer, desmoid tumors, etc. [22,23]. One study found that the variant rs2234593 of WT1 was linked with overall survival (OS) and relapse in leukemia patients [24]. Another study reported overexpression of WT1 in colorectal cancer [25]. In the present study, the variant rs2234593 of WT1 was found to be associated with a higher risk of CRC in the J&K population, with OR = 2.981 (1.731–5.136 at 95% CI), p value = 4.81E-05.

rs1799966 of BRCA1 (breast cancer type 1 susceptibility protein)
The variant rs1799966 of BRCA1 has been reported to be linked with pancreatic cancer, with a hazard ratio of 1.23 (95% CI: 1.09-1.40, P = 0.0010), in the population of China [26], but it was not associated with the risk of breast carcinoma [27]. Mutations in BRCA1 or BRCA2 have been shown to result in multiple cancers, including colorectal cancer and pancreatic adenocarcinoma [28,29]. In the present study, the genetic variant rs1799966 of BRCA1 was evaluated with respect to colorectal cancer and was found to be associated with a higher risk of colorectal cancer in the J&K population, with OR = 1.685 (1.073-2.647 at 95% CI), p value = 0.022.

rs2229080 of DCC (deleted in colorectal carcinoma, netrin 1 receptor)
DCC, initially discovered in CRC, encodes the netrin-1 receptor, a member of the immunoglobulin superfamily of cell adhesion molecules, and has been characterized as a potential tumor suppressor gene [30][31][32]. When netrin-1 binds to DCC, the receptor induces cell migration and proliferation. In the absence of netrin-1, the intracellular domain of DCC is cleaved by a caspase, which induces apoptosis through a caspase-9-dependent pathway (Supplementary Fig. 2) [33]. DCC is frequently silenced or inactivated in various human cancers through epigenetic silencing or loss of heterozygosity at the chromosome 18q21 region [31,34]. Loss of DCC gene expression has been shown to be an independent prognostic factor in colorectal cancer [12], AML [35], and gastric cancer [13,36] patients. Various studies have demonstrated a significant association of DCC polymorphisms with esophageal, colorectal, and gastric cancer risk [16][37][38][39]. Further, rs2229080, a missense variation replacing Arg with Gly at DCC codon 201, was reported to increase the risk of colorectal cancer [40] and neuroblastoma [41]. In the present study, the genetic variant rs2229080 of DCC was evaluated with respect to colorectal cancer and was found to be associated with a higher risk of colorectal cancer in the J&K population, with OR = 1.5 (1.1-2.3 at 95% CI), p value = 0.02.

rs1801133 of MTHFR (methylenetetrahydrofolate reductase)
The MTHFR gene provides instructions for producing the enzyme methylenetetrahydrofolate reductase. This enzyme has a function in the processing of amino acids, the basic components of proteins [42]. Although a number of mutations have been described, the single nucleotide polymorphisms (SNPs) 1298A > C (rs1801131) and 677C > T (rs1801133) are the two most common mutations in MTHFR [43]. These two polymorphisms are responsible for the creation of a thermolabile form of MTHFR [43]. The 677 TT genotype is common in southern Italy (26%), Mexico (32%), and northern China (20%). The 677C > T mutation (rs1801133) in MTHFR is a key cause of mild hyperhomocysteinemia, whereas the second polymorphism, at nucleotide position 1298, is less well described [44]. A meta-analysis by Zan Teng (2013) suggests that the risk of colorectal cancer increases with the MTHFR rs1801133 (677C > T) polymorphism, although no association was found in African populations in the subgroup analysis by ethnicity [45]. In this study, we examined the relationship of the variant rs1801133 of MTHFR with CRC in the J&K population, but the differences in the allelic frequency distribution of rs1801133 between cases and controls were statistically insignificant.

rs10046 of CYP19A1 (cytochrome P450 family 19 subfamily A member 1)
Many studies have reported that the rs10046 variant of CYP19A1 is linked with gastric, breast, lung, and colorectal cancer [46,47]. The current study explored the association of the variant rs10046 with CRC in Jammu and Kashmir, but the differences in the allelic frequency distribution of rs10046 between cases and controls were statistically insignificant.

rs8034191 of HYKK (hydroxylysine kinase)
HYKK is a protein-coding gene located on chromosome 15. HYKK has been reported to be associated with susceptibility to lung cancer [48,49], and many studies have indicated that the variant rs8034191 of HYKK is linked with lung cancer risk [49][50][51][52].

rs1042522 of TP53 (tumor protein p53)
The TP53 gene provides instructions for the production of a protein known as tumor protein p53. This gene encodes a tumor suppressor protein that contains DNA-binding, oligomerization, and transcriptional activation domains [53]. Because of its function in inhibiting cancer growth and regulating cell division, p53 is also called the "guardian of the genome". TP53 mutations are widespread across several cancer types [54]. The loss of a tumor suppressor is most often caused by severely damaging events, such as frameshift mutations or premature stop codons. Abnormalities of tumor suppressor genes, such as those of TP53, are common but are currently not clinically actionable [55]. Many studies have indicated that the TP53 rs1042522 (C > G) polymorphism is linked with susceptibility to different forms of cancer, such as cervical cancer, breast cancer, lung cancer, CRC, endometrial cancer, and ovarian cancer [56,57]. In contrast, a retrospective study in the Taiwan region found that carriers of the C allele of rs1042522 had a reduced colorectal cancer risk [58]. In the present study, the genetic variant rs1042522 of TP53 was evaluated with respect to colorectal cancer and was found to be associated with a higher risk of colorectal cancer in the J&K population, with OR = 20.07 (11.26-35.75); p value = 1.84E-34.
To the best of our knowledge, no study to date has examined the role of the single nucleotide variants rs2234593, rs1799966, rs2229080, rs8034191, and rs1042522 in colorectal cancer within the population of J&K. This is the first preliminary study to investigate the possible correlation between these polymorphisms and susceptibility to colorectal cancer.

Conclusion
In the present study, we explored the link between environmental factors, genetics, and colorectal cancer. This study is the first to investigate the relation of these genetic variants with colorectal cancer within Jammu and Kashmir. It could provide insights into the genetic variation associated with the risk of developing colorectal cancer. If investigated further in a larger cohort, these findings could help unravel the biological significance of these SNPs in colorectal cancer in the Jammu and Kashmir population.