Association Study between SNPs within MicroRNA Binding Sites and the Prognosis of Breast Cancer


Background: Single nucleotide polymorphisms (SNPs) within microRNA binding sites can affect the binding of microRNA to mRNA and regulate gene expression, thereby contributing to the prognosis of cancer. We performed this study to explore the association between SNPs within microRNA binding sites and the prognosis of breast cancer.
Methods: We carried out a two-stage study including 2647 breast cancer patients. In stage I, we genotyped 192 SNPs within microRNA binding sites using the Illumina GoldenGate platform. In stage II, we validated the SNPs significantly associated with breast cancer prognosis in another dataset using the TaqMan platform. Survival time was calculated, and Kaplan-Meier curves and Cox regression models were used to analyze the survival of breast cancer patients with different genotypes.
Results: We identified 8 SNPs significantly associated with breast cancer prognosis in stage I (P<0.05), and only rs10878441 was statistically significant in stage II (AA vs CC: adjusted HR=2.21, 95% CI: 1.11-4.42, P=0.024). When the data from stage I and stage II were combined, the CC genotype of rs10878441, compared with the AA genotype, was significantly associated with poor survival of breast cancer (HR=1.69, 95% CI: 1.18-2.42, P=0.004; adjusted HR=2.19, 95% CI: 1.30-3.70, P=0.003). Stratified analyses demonstrated that rs10878441 was related to breast cancer prognosis in grade II patients and lymph node-negative patients (P<0.05).
Conclusions: The LRRK2 rs10878441 CC genotype is associated with poor prognosis of breast cancer in a Chinese population, and it could be used as a potential prognostic biomarker for breast cancer. Further studies are warranted.


In China, breast cancer is predicted to account for about 15% of all new cancer cases among women [2]. An estimated 3-6 million SNPs in the human genome could provide a means of elucidating the genetic component of complex diseases [3].
MicroRNAs are endogenous non-coding small RNAs (about 22 nucleotides long) that regulate gene expression by Watson-Crick pairing with the 3' untranslated region (3'UTR) of their target genes. It has been reported that microRNAs regulate nearly 30% of human genes [4] and play important roles in most physiological and pathological processes, such as tumorigenesis and proliferation. The binding of microRNA to mRNA is critical for regulating the mRNA level and protein expression.
However, this binding can be affected by SNPs that reside in the microRNA binding sites. SNP variations may therefore interfere with or disrupt the binding of microRNAs to their target mRNAs, which may affect the regulation of target genes by microRNAs and thereby contribute to the prognosis of cancer [5][6][7].
In recent years, a number of studies have reported a link between SNPs within microRNA binding sites and the prognosis of various types of cancer, including breast cancer [7][8][9]. Teo et al [10] reported a role for rs7180135 in RAD51 in the prognosis of breast cancer patients, with carriers of the G minor allele having improved breast cancer-specific survival. Brendle et al [11] found that the A allele of the SNP rs743554 in the 3'UTR of the ITGB4 gene was associated with oestrogen receptor-negative tumors and worse survival in patients with breast cancer. Zhang et al [12] found that the miR-367 binding site SNP rs1044129 in the RYR3 gene was associated with poor survival of patients with breast cancer. Liu et al [13] found that the TT genotype of rs16917496 in the SET8 3'UTR was significantly associated with poor outcome of breast cancer in a Chinese population.
However, there is still a lack of large-scale association studies between SNPs within microRNA binding sites and the prognosis of breast cancer in China. Therefore, we carried out a two-stage cohort study to investigate the relationship between SNPs within microRNA binding sites and breast cancer prognosis.

Study subjects
We performed a two-stage cohort study including 2647 breast cancer patients, with 1297 patients in stage I and 1350 patients in stage II. All patients were newly diagnosed with histologically confirmed breast cancer at Tianjin Medical University Cancer Hospital (TJMUCH) from January 2006 to December 2012. A detailed description of the Tianjin Cohort of Breast Cancer Cases (TBCCC) is available in our previous study [14]. Demographic and epidemiological data were collected by trained personnel using face-to-face questionnaires. Clinical data and pathology reports were taken from medical records. All patients were followed up by telephone annually. In addition, we confirmed the accuracy of self-reported information through the hospital information system (HIS) at TJMUCH and the death registration system. The study was approved by the Ethics Committee of Tianjin Medical University Cancer Institute and Hospital, and all participating patients provided written informed consent.

SNP selection
The "Patrocles" database (http://www.patrocles.org/) was used to select genome-wide microRNA target SNPs. Of the 5,035 SNPs within microRNA binding sites provided by the database, 1,742 had been confirmed. SNPs were included if they met the following criteria: (1) the SNP was located at the binding site of the microRNA seed region, with the seed region defined according to the "7-mirs" criteria [12]; (2) Chinese population frequency data were available for the SNP (http://www.ncbi.nlm.nih.gov/snp/), all three genotypes were observed, and the minor allele frequency (MAF) was ≥0.05. Finally, 192 microRNA target SNPs were included in our study; detailed information on these SNPs is shown in Table S1.
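The inclusion criteria above can be read as a simple filter. The following is a minimal sketch in Python, not the procedure actually used in the study: the record layout (`CandidateSnp`, `in_seed_site`, `genotype_counts`) is hypothetical, and it assumes that the MAF is computed from genotype counts in the reference population.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CandidateSnp:
    """Hypothetical record for one candidate SNP (layout is an assumption)."""
    rsid: str
    in_seed_site: bool  # criterion 1: lies in a microRNA seed-region binding site
    # (homozygote 1, heterozygote, homozygote 2) counts in the reference
    # population, or None when no population frequency data are available.
    genotype_counts: Optional[Tuple[int, int, int]]


def minor_allele_frequency(counts: Tuple[int, int, int]) -> float:
    """Minor allele frequency from the three genotype counts."""
    hom_1, het, hom_2 = counts
    total_alleles = 2 * (hom_1 + het + hom_2)
    freq_allele_2 = (het + 2 * hom_2) / total_alleles
    return min(freq_allele_2, 1.0 - freq_allele_2)


def passes_selection(snp: CandidateSnp, maf_threshold: float = 0.05) -> bool:
    """Apply criteria (1) and (2) from the text to one candidate SNP."""
    if not snp.in_seed_site or snp.genotype_counts is None:
        return False
    # All three genotypes must be observed in the reference population.
    if any(count == 0 for count in snp.genotype_counts):
        return False
    return minor_allele_frequency(snp.genotype_counts) >= maf_threshold


# Made-up candidates for illustration only (not SNPs from the study).
candidates = [
    CandidateSnp("demo_common", True, (60, 35, 5)),
    CandidateSnp("demo_rare", True, (95, 4, 1)),
    CandidateSnp("demo_outside_seed", False, (60, 35, 5)),
]
selected = [snp.rsid for snp in candidates if passes_selection(snp)]
```

Keeping the MAF calculation separate from the filter makes the threshold easy to vary, for example to check how many low-frequency SNPs a cutoff of 0.05 excludes.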

SNP genotyping
Genomic DNA was extracted from peripheral blood using the QIAGEN DNA Extraction Kit (QIAGEN Inc.) [15]. The Illumina GoldenGate SNP genotyping array was used to genotype the 192 SNPs in stage I. The TaqMan platform was used in stage II to genotype the 8 SNPs associated with breast cancer prognosis.
We used a 5-μl reaction mixture containing 20 ng of genomic DNA, 2.5 μl of 2×TaqMan Genotyping Master Mix, 0.1 μl of 40×probe and 1.9 μl of double-distilled water. The PCR conditions were 95℃ for 10 minutes, followed by 50 cycles of 92℃ for 30 seconds and 60℃ for 1 minute. Amplification was carried out in 384-well reaction plates, and genotypes were called using SDS 2.4 software (Applied Biosystems, Foster City, CA, USA). To ensure the accuracy and reliability of the experimental results, approximately 5% of the samples were randomly selected for retesting.

Follow-up of breast cancer patients
Follow-up information included follow-up date, vital status (alive, dead, or lost to follow-up), tumor progression (recurrence, metastasis), and treatment after tumor progression. Overall survival (OS) was defined as the time from the date of breast cancer diagnosis to the date of death from any cause.
Disease-free survival (DFS) was calculated as the time from breast cancer diagnosis to the date of tumor progression (recurrence, metastasis or death). For patients lost to follow-up, the date of the last visit was taken as the follow-up date. Follow-up of this study was completed on December 31, 2017.
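As an illustration of these endpoint definitions, the sketch below derives a survival time and an event indicator for OS and DFS from calendar dates. It is not the authors' code. The function names, the conversion of days to months (30.4375 days per month) and the handling of events dated after the end of follow-up are assumptions made for the example.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2017, 12, 31)  # end of follow-up stated in the text
DAYS_PER_MONTH = 30.4375        # assumption: average month length


def _months_between(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_MONTH


def _time_and_event(diagnosis: date, event_date: Optional[date],
                    last_contact: date) -> Tuple[float, bool]:
    """Return (time in months, event observed) for one patient."""
    if event_date is not None and event_date <= STUDY_END:
        return _months_between(diagnosis, event_date), True
    # No event within follow-up: censor at the last visit or at the
    # end of follow-up, whichever comes first.
    censor_date = min(last_contact, STUDY_END)
    return _months_between(diagnosis, censor_date), False


def overall_survival(diagnosis: date, last_contact: date,
                     death: Optional[date] = None) -> Tuple[float, bool]:
    """OS: diagnosis to death from any cause."""
    return _time_and_event(diagnosis, death, last_contact)


def disease_free_survival(diagnosis: date, last_contact: date,
                          progression: Optional[date] = None,
                          death: Optional[date] = None) -> Tuple[float, bool]:
    """DFS: diagnosis to the first of recurrence, metastasis or death."""
    observed = [d for d in (progression, death) if d is not None]
    first_event = min(observed) if observed else None
    return _time_and_event(diagnosis, first_event, last_contact)


# Hypothetical patient: progression after two years, death after four.
os_time, os_event = overall_survival(
    date(2010, 1, 1), date(2014, 1, 1), death=date(2014, 1, 1))
dfs_time, dfs_event = disease_free_survival(
    date(2010, 1, 1), date(2014, 1, 1),
    progression=date(2012, 1, 1), death=date(2014, 1, 1))
```

The (time, event) pairs produced this way are the usual input to Kaplan-Meier and Cox analyses.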

Statistical analysis
The Kaplan-Meier method was used to calculate survival estimates, and the log-rank test was used to compare survival across the genotypes of each SNP. To determine potential prognostic risk factors, univariate Cox regression was used to evaluate the relationship between demographic, epidemiological and clinicopathological characteristics and breast cancer prognosis, presented as hazard ratios (HRs) and 95% confidence intervals (CIs). Cox regression was used to assess the association between SNPs and breast cancer OS, with and without adjustment for age at diagnosis, education, occupation, age at menarche, number of live births, breastfeeding duration, abortion, menopause, TNM stage, tumor size, histopathologic classification, grade, lymph node status, ER, PR, and HER2. Similarly, Cox regression was used to assess the relationship between SNPs and breast cancer DFS, with and without adjustment for age at diagnosis, number of live births, breastfeeding duration, abortion, menopause, BBD, TNM stage, tumor size, histopathologic classification, grade, lymph node status, ER, PR, and HER2. We further analyzed the relationship between the SNP rs10878441 and breast cancer OS stratified by clinical characteristics. All statistical tests were two-sided, and P<0.05 was considered statistically significant. All statistical analyses were performed using SPSS 20.0 software (SPSS Inc., Chicago, IL, USA) and R version 3.4.3.
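To make the first two steps concrete, the following sketch implements the Kaplan-Meier estimator and a two-group log-rank test in plain Python. It is an illustration only: the study itself used SPSS and R, the function names are hypothetical, and the data are made up. Cox regression is left out because it needs an iterative fit that is better taken from a statistics package.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_test(times_a: Sequence[float], events_a: Sequence[bool],
                 times_b: Sequence[float], events_b: Sequence[bool]
                 ) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square, two-sided P value)."""
    all_pairs = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in all_pairs if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Made-up survival times in months for two genotype groups.
group_1 = ([1, 2, 3, 4, 5], [True, True, True, True, True])
group_2 = ([6, 7, 8, 9, 10], [True, True, True, True, True])
chi_square, p_value = logrank_test(*group_1, *group_2)
```

With three genotypes per SNP, the same idea extends to a log-rank test with two degrees of freedom, or to pairwise comparisons against a reference genotype.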

Demographic and epidemiological characteristics of patients
The demographic and epidemiological characteristics of the 2647 breast cancer patients were tabulated, and several of them were significantly associated with breast cancer OS. In addition, age at diagnosis, number of live births, breastfeeding duration, abortion, menopause, and BBD were significantly related to breast cancer DFS.

Clinicopathological characteristics of patients
The clinicopathological characteristics of all participants are presented in Table 2.
In stage I, 8 SNPs were significantly associated with breast cancer prognosis (P<0.05) (Figure S1). The associated SNPs were rs1053739 located in NMT1 at 17q21.31, rs2693 located in KIF13B at 8p12, rs698761 located in PREPL at 2p21, rs8602 located in MKNK1 at 1p33, rs10878441 located in LRRK2 at 12q12, rs10318 located in GREM1 at 15q13.3, rs10075853 located in ST8SIA4 at 5q21.1 and rs8410 located in PREPL at 2p21. We further analyzed the association between the 8 SNPs and breast cancer DFS; rs1053739, rs698761, rs10878441, rs10318, and rs8410 showed a significant association with breast cancer DFS (Figure S2).

Association between 8 SNPs and breast cancer prognosis in Stage II
In stage II, the median follow-up time was 67 months (range, 0 to 143). Among the 8 SNPs identified in stage I, only rs10878441 in the LRRK2 gene (the duplex structure between miR-550-3p and LRRK2 is shown in Figure S3) was significantly associated with the OS of breast cancer (AA vs CC: HR=2.21, 95% CI: 1.11-4.42, P=0.024) (Figure 1). Furthermore, we evaluated the association between rs10878441 and breast cancer OS stratified by clinical characteristics.
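A reported hazard ratio, confidence interval and P value can be checked against each other, because a Wald-type interval is symmetric on the log scale. The sketch below is a reader's consistency check, not part of the original analysis. It assumes a Wald test on the log hazard ratio, and the function name is hypothetical. Applied to the stage II estimate (HR=2.21, 95% CI: 1.11-4.42), it gives a standard error of about 0.35 and a P value close to the reported 0.024.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_check(hr: float, ci_low: float, ci_high: float,
               level: float = 0.95) -> Tuple[float, float, float]:
    """Recover (standard error, z statistic, two-sided P) from an HR and its CI.

    Assumes the interval was built as exp(log(HR) +/- z_crit * SE).
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2.0)
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / standard_error
    p_value = 2.0 * (1.0 - normal.cdf(abs(z)))
    return standard_error, z, p_value


# Stage II estimate for rs10878441 as reported in the text.
standard_error, z, p_value = wald_check(2.21, 1.11, 4.42)
```

A large gap between the recovered and the reported P value would suggest a transcription error or an interval computed by a different method.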

Discussion
Through this association study, we genotyped 192 SNPs within microRNA binding sites and found that 8 SNPs were associated with the prognosis of breast cancer. We further replicated the 8 SNPs in an independent data set, and identified that the SNP rs10878441 (C allele) in LRRK2 gene was significantly associated with poor prognosis of breast cancer. This study provided some evidence for a novel prognostic locus for breast cancer.
In the present study, two SNPs (MKNK1 rs8602 and GREM1 rs10318) had previously been reported in the context of cancer prognosis. MKNK1 regulates diverse biologic processes including translation, cell proliferation, and differentiation [16,17]. Berger et al found that the MKNK1 polymorphism rs8602 might serve as a predictive marker in KRAS wild-type metastatic colorectal cancer patients treated with first-line FOLFIRI and bevacizumab [18]. Neckmann et al showed that GREM1 was associated with metastasis and predicted poor prognosis in ER-negative breast cancer patients [19]. Dai et al indicated that the GREM1 polymorphism rs10318 was associated with recurrence in stage II colorectal cancer patients [20]. Our study found a significant association between these two SNPs and breast cancer prognosis only in stage I, while no significant association was observed in stage II (the validation set).
The LRRK2 gene, located on human chromosome 12q12, is a member of the leucine-rich repeat kinase family and encodes a protein with multiple domains, including a leucine-rich repeat (LRR) domain, a RAS domain, a GTPase domain, a kinase domain and several protein-protein interaction domains [21]. Mutations in the LRRK2 gene have been shown to be associated with autosomal-dominant Parkinson's disease [22,23]. Single nucleotide polymorphisms in the LRRK2 gene have also been related to Crohn's disease [24,25]. The LRRK2 gene is reported to be involved in a variety of cellular processes, including cell transformation, proliferation and tumorigenesis, and is linked to various types of cancer [26,27]. Gu et al. demonstrated that high expression of LRRK2 promoted the proliferation and migration of intrahepatic cholangiocarcinoma (ICC) cells and predicted worse prognosis in ICC patients [28]. Looyenga et al indicated that MET and LRRK2 cooperated to promote efficient tumor cell growth and survival in papillary renal and thyroid carcinomas [26]. Warø et al reported that LRRK2 mutation carriers had an increased risk of non-skin cancer [29].
Our findings suggest that the C allele of rs10878441 in LRRK2 is associated with poor prognosis in breast cancer. LRRK2 expression may be regulated in a variety of ways, and the association between rs10878441 and the prognosis of breast cancer might be caused by differential microRNA regulation. SNP rs10878441 (A/C) is located within the miR-550-3p binding site and is likely to affect the miR-550-3p/LRRK2 interaction. As shown in Figure S3, the C allele cannot be targeted by miR-550-3p, which would lead to increased expression of LRRK2 protein and thereby alter the prognosis of breast cancer. The definite mechanism underlying the association with the prognosis of breast cancer remains unknown.
Although we conducted a large, systematic two-stage cohort study to evaluate microRNA target SNPs and breast cancer prognosis, our study has several limitations. First, we selected only high-frequency SNPs with MAF ≥ 0.05 and therefore inevitably missed low-frequency SNPs that may have an impact on breast cancer prognosis. Second, the type I error arising from multiple testing was not corrected in this study, although the large sample size and the replication set support the reproducibility of our findings. Third, owing to the good prognosis of breast cancer patients, the numbers of deaths and tumor progression events were small, and further follow-up will be required to confirm the reliability of the results. In addition, functional research is needed in future studies to elucidate the mechanism of this association.
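For reference, the scale of the multiple-testing issue mentioned among the limitations can be worked out directly. The sketch below uses the Bonferroni correction, which is only one common and conservative option; the study did not apply it and does not name any correction method. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of nominally significant tests if no SNP has any effect."""
    return alpha * n_tests


# 192 SNPs were tested in stage I at a nominal threshold of P<0.05.
stage1_threshold = bonferroni_threshold(0.05, 192)      # about 2.6e-4
stage1_expected = expected_false_positives(0.05, 192)   # about 9.6
```

The same calculation for the 8 SNPs carried into stage II gives a per-test threshold of 0.05/8 = 0.00625.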

Conclusions
In conclusion, the LRRK2 rs10878441 CC genotype is associated with poor prognosis of breast cancer in a Chinese population, suggesting that it could be a potential prognostic biomarker for breast cancer. Further studies to elucidate the underlying mechanism of this association are warranted.

Consent for publication
Not applicable.

Availability of data and materials
The data sets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Author Contributions
LWZ and LH developed the ideas and drafted the manuscript. YBH, ZWF, LYL, JXL and XW were responsible for data processing and statistical analysis. HXL, FFS, HZ and PSW supervised the study procedure and revised the manuscript. FJS and KXC were also involved in data analysis and interpretation, as well as manuscript preparation. All authors read and approved the final manuscript.

Acknowledgments
Not applicable.