We conducted a replicative case-control study of approximately 550 samples from the population of Jammu & Kashmir, genotyping various BC loci in cases and controls on the Agena MassARRAY platform. We identified four genome-wide loci that were significantly and independently associated with BC development in the population under study. Two of these, rs12190287 (TCF21) and rs1051266 (SLC19A1), conferred risk in our population. Six further variants were in Hardy-Weinberg equilibrium (HWE) but were not significantly associated with BC. The allele frequencies of all variants are shown in Figures S2-S11.
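This section does not restate which tests were used for the HWE check and the association analysis. As a minimal sketch, assuming a standard 1-df chi-square goodness-of-fit test for HWE and an allelic odds ratio with a Woolf (log-based) 95% confidence interval, both calculations can be written in plain Python. The function names `hwe_chi_square` and `allelic_odds_ratio` are hypothetical, and the genotype counts are illustrative, not data from this study.

```python
import math
from typing import Tuple

# Genotype counts ordered as (homozygous reference, heterozygous, homozygous alternate).
Genotypes = Tuple[int, int, int]


def hwe_chi_square(counts: Genotypes) -> Tuple[float, float]:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    hom_ref, het, hom_alt = counts
    n = hom_ref + het + hom_alt
    p = (2 * hom_ref + het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def allelic_odds_ratio(cases: Genotypes, controls: Genotypes) -> Tuple[float, float, float]:
    """Odds ratio for the alternate vs reference allele, with a Woolf 95% CI."""
    def allele_counts(g: Genotypes) -> Tuple[int, int]:
        hom_ref, het, hom_alt = g
        return het + 2 * hom_alt, 2 * hom_ref + het  # (alternate, reference)

    a, b = allele_counts(cases)
    c, d = allele_counts(controls)
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Illustrative counts only; not the genotype data from this study.
    controls = (100, 100, 25)
    cases = (80, 100, 45)
    chi2, p = hwe_chi_square(controls)
    print(f"HWE in controls: chi2={chi2:.3f}, p={p:.3f}")
    or_, lo, hi = allelic_odds_ratio(cases, controls)
    print(f"Allelic OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The HWE check is usually run on controls only, because a genuine disease association can itself shift genotype proportions in cases.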
The variant rs1051266 is located in the SLC19A1 gene. SLC19A1 (Solute Carrier Family 19 Member 1) has been implicated in placental carcinomas and pediatric osteosarcomas. SLC19A1 variants have been associated with BC risk in populations worldwide (24), including African American women (25). Our data showed that rs1051266 is significantly associated with BC risk in the population under study. Bioinformatic analysis further revealed that the associated variants are conserved in primates, including humans, and lie in a conserved domain region (Figure S13). We also examined the effect of each variant on gene expression using Genotype-Tissue Expression (GTEx) data; the normalized effect size (NES) values are given in Table S3. The GTEx NES quantifies the correlation between a genetic variant and gene expression in human tissues. The variant rs1051266 (SLC19A1) had a significant effect on expression in breast tissue (NES = -0.4333, p = 2.4e-6).
TCF21 (Transcription Factor 21) is a tumor suppressor gene associated with uterine corpus carcinoma and pericoronitis. TCF21 is mutated in several types of cancer (26), and studies have shown that lower TCF21 expression in breast tumor tissue is associated with larger tumor size and increased lymph node metastasis (27). We analyzed the variant rs12190287 G>C of TCF21 and found it to be significantly associated with BC, conferring risk in the studied population. The variant rs12190287 (TCF21) had a significant effect on expression in breast tissue (NES = 0.210, p = 3.3e-5); the positive NES indicates up-regulated expression in breast tissue.
ERCC1 (Excision Repair Cross-Complementing rodent repair deficiency, complementation group 1), which harbors the rs2298881 variant, functions in the nucleotide excision repair pathway (28). ERCC1 has been associated with multiple cancers, and ERCC1 variants have also been linked to an increased risk of BC in women (29). The variant rs2298881 C>A was significantly associated with BC and conferred protection against BC in the studied population. The variant rs2298881 (ERCC1) had a significant effect on expression in breast tissue (NES = -0.260, p = 3.8e-9).
DCC (Deleted in Colorectal Cancer) encodes the netrin-1 receptor, a transmembrane receptor belonging to the immunoglobulin superfamily. DCC is a tumor suppressor gene that is frequently mutated in colorectal carcinomas. It is abundantly expressed by neurons and stimulates cell survival and axon regeneration. Beyond colorectal cancer, studies have highlighted a role for DCC in BC: the DCC variant rs2229080 has been associated with increased BC risk (13). In our study, rs2229080 G>C was significantly associated with BC, and the altered C allele conferred protection in the studied population. However, rs2229080 (DCC) had no significant effect on expression in breast tissue (NES = 0.054, p = 0.3). Overall, the positive NES of rs12190287 (TCF21) indicates up-regulated expression in breast tissue, whereas the negative NES values of rs1051266 (SLC19A1) and rs2298881 (ERCC1) point towards down-regulated expression.
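The direction calls above follow a single rule: for a significant eQTL, a positive NES means higher expression and a negative NES means lower expression. In GTEx, the NES is the effect of the alternative allele relative to the reference allele. The sketch below applies that rule to the breast-tissue values reported in this section, assuming the fixed p < 0.05 cutoff used in the text; GTEx's own significance calls may rely on gene-specific thresholds, so this is a simplification. `EqtlResult` and `expression_direction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqtlResult:
    variant: str
    gene: str
    nes: float      # GTEx normalized effect size (alternative vs reference allele)
    p_value: float


def expression_direction(result: EqtlResult, alpha: float = 0.05) -> str:
    """Classify an eQTL as up-regulated, down-regulated, or not significant."""
    if result.p_value >= alpha:
        return "not significant"
    if result.nes > 0:
        return "up-regulated"
    if result.nes < 0:
        return "down-regulated"
    return "no change"


# Breast-tissue NES and p-values as reported in the text.
RESULTS = [
    EqtlResult("rs1051266", "SLC19A1", -0.4333, 2.4e-6),
    EqtlResult("rs12190287", "TCF21", 0.210, 3.3e-5),
    EqtlResult("rs2298881", "ERCC1", -0.260, 3.8e-9),
    EqtlResult("rs2229080", "DCC", 0.054, 0.3),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.variant} ({r.gene}): NES={r.nes:+.3f}, p={r.p_value:.1e} -> "
              f"{expression_direction(r)}")
```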
RNA folding analysis revealed differences in minimum free energy (MFE) and secondary structure between the wild-type and altered alleles. For the risk variants rs12190287 and rs1051266, the wild-type allele had a lower MFE than the altered allele, giving it a more stable structure. In contrast, for rs2229080 (DCC) and rs2298881 (ERCC1), which conferred protection against BC, the altered allele had the lower MFE, suggesting a more stable structure; this greater stability coincided with protection against BC. Further analysis of the secondary structures of the genes carrying the variants showed substantial differences in both the MFE and the MFE of the centroid secondary structure. The differences in MFE values are summarized in Table S4, and the differences in secondary structure are shown in Figure 1.

Comparing the frequencies of the associated alleles with 1000 Genomes Project data revealed substantial differences between populations (Figure S12). The variant rs1051266 had an intermediate allele frequency of around 0.4 (on a scale from 0 to 1) in the Indian subcontinent populations PJL (Punjabi from Lahore, Pakistan), ITU (Indian Telugu from the UK) and STU (Sri Lankan Tamil from the UK). Similar frequencies were observed in the GIH (Gujarati Indian from Houston, Texas), BEB (Bengali from Bangladesh), GBR (British in England and Scotland) and CEU (Utah residents with Northern and Western European ancestry) populations. The variant rs12190287 had a higher frequency, around 0.7, in the Indian subcontinent, and similarly high frequencies were seen in the BEB, MXL (Mexican Ancestry from Los Angeles, USA) and PEL (Peruvians from Lima, Peru) populations. The variant rs2298881 had a comparatively low frequency worldwide. Its frequency in India was around 0.2, similar to that in the BEB and ASW (Americans of African Ancestry in SW USA) populations, whereas the East Asian populations JPT (Japanese in Tokyo, Japan) and CHB (Han Chinese in Beijing, China) showed a higher frequency of this variant. The variant rs2229080 had a very high frequency of about 0.8 in the Indian population, similar to that in the JPT and BEB populations. Such similar frequencies could indicate a similar genetic susceptibility to BC in these regions. The wide differences in genetic background between populations make it essential to analyze genetic heterogeneity across populations.
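The stability argument in the RNA folding analysis compares the MFE of the wild-type and altered allele sequences: the allele with the lower (more negative) MFE folds into the more stable structure. The sketch below shows that comparison, assuming the MFE values (in kcal/mol) have already been computed, for example with the ViennaRNA package, whose Python binding `RNA.fold(sequence)` returns the structure and its MFE. `FoldComparison` and `matches_reported_pattern` are hypothetical names, and the numbers in the example are illustrative rather than the values in Table S4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldComparison:
    """MFE values (kcal/mol) for the same sequence window carrying each allele."""
    label: str
    mfe_wild: float
    mfe_altered: float

    @property
    def delta_mfe(self) -> float:
        # Positive: the altered allele has a higher (less negative) MFE, so it is less stable.
        return self.mfe_altered - self.mfe_wild

    def more_stable_allele(self) -> str:
        if self.delta_mfe > 0:
            return "wild"
        if self.delta_mfe < 0:
            return "altered"
        return "equal"


def matches_reported_pattern(comp: FoldComparison, effect: str) -> bool:
    """Pattern described in the text: the wild-type allele is more stable for
    risk variants; the altered allele is more stable for protective variants."""
    expected = {"risk": "wild", "protective": "altered"}[effect]
    return comp.more_stable_allele() == expected


if __name__ == "__main__":
    # Illustrative MFE values only; the study's values are in Table S4.
    examples = [
        (FoldComparison("hypothetical_risk_variant", -30.0, -28.5), "risk"),
        (FoldComparison("hypothetical_protective_variant", -25.0, -26.2), "protective"),
    ]
    for comp, effect in examples:
        print(comp.label, f"dMFE={comp.delta_mfe:+.2f}",
              comp.more_stable_allele(), matches_reported_pattern(comp, effect))
```

Framing the comparison as a signed difference (altered minus wild) keeps the sign convention explicit, which matters because "lower MFE" and "more stable" are easy to conflate with "smaller magnitude".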