Study on the Susceptibility of LncRNA PCAT1 SNPs and Breast Cancer Risk in the Chinese Population

The purpose of this study was to explore the relationship between PCAT1 SNPs and breast cancer (BC) susceptibility. Logistic regression analysis was applied to determine the association between PCAT1 SNPs and BC risk. The relative expression of PCAT1 in different genotypes was detected by qRT-PCR. The binding between the C/T genotypes at the rs4473999 locus and miR-149-5p was confirmed by dual-luciferase reporter assays. The proliferation, migration and invasion of BC cells with dysregulated expression of miR-149-5p were evaluated by CCK8, scratch and Transwell assays, respectively. Logistic regression analysis revealed that PCAT1 SNPs were related to BC susceptibility: rs117117537 (OR: 2.413, 95%CI: 1.057–5.508) and rs4473999 (OR: 2.137, 95%CI: 1.065–4.286) were risk factors for BC when the menopausal age was ≥ 50. The haplotype G rs1551514 T rs1551513 C rs4473999 C rs9656964 T rs17762938 C rs7823297 T rs785003 T rs117117537 may increase the risk of BC (OR: 1.614, 95%CI: 1.116–2.333), and there was an interaction between genetic and reproductive factors (OR: 2.487, 95%CI: 1.929–3.206). Preliminary functional studies demonstrated that PCAT1 interacted with miR-149-5p when rs4473999 carried the wild-type C allele. In addition, dysregulated expression of miR-149-5p may affect the proliferation, invasion and migration of BC cells. Our study shows that PCAT1 gene polymorphism is related to BC susceptibility, and that the PCAT1 rs4473999 C/T genotype may affect the occurrence of BC by modulating the interaction with miR-149-5p.

case frequency by age) were included. The cases were women newly diagnosed with breast cancer by pathology at a Grade III, Class A hospital in Henan Province; all were Han Chinese women who had not received radiotherapy, chemotherapy or surgery. The controls were drawn from the cardiovascular survey of Henan Province, excluding individuals with a family history of breast cancer or with breast cancer-related diseases, and none of the subjects were related by kinship. All samples were cryopreserved at -80°C for later use. Basic information and clinical characteristics of the patients were obtained from their medical records, including age, age at menarche, menopausal status, menopausal age, number of pregnancies, number of miscarriages, history of breastfeeding and family history of breast cancer. Hormone receptor status was also obtained for the cases, including estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor-2 (HER-2) status, as determined by immunohistochemistry (IHC). The study was approved by the Ethics Review Committee of the Ethics Committee of Medical and Health Research of Zhengzhou University.

DNA extraction, SNP selection and genotyping
Total DNA was extracted from whole blood with a DNA extraction kit (Shanghai Laifeng Biotechnology Co., Ltd.) according to the manufacturer's instructions and stored at -80°C until use. Functional SNPs and tag SNPs of PCAT1 were obtained from NCBI (accessed in December 2017), lncRNASNP2 and the Haploview software. The SNPs were then verified against NCBI, the Ensembl database and 1000 Genomes, requiring a minor allele frequency (MAF) of >0.05 in the CHB population. Finally, 8 SNPs (functional regulatory region SNPs and tag SNPs) were selected; their basic information is shown in Table S1. Based on the SNP sequence characteristics and the cost-effectiveness of typing, and after pre-typing grading of the SNPs by the biotechnology company, three genotyping methods were used: rs17762938, rs7823297, rs9656964, rs1551513 and rs1551514 were genotyped with the SNPscan™ multiple SNP typing kit; rs117117537 and rs785003 were genotyped with the imLDR™ multiple SNP typing kit; and rs4473999 was genotyped by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). For all SNPs, 10% of the genotyped samples were selected for sequencing to verify genotyping accuracy.

Bioinformatics
1) Secondary structure prediction
The online software RNAfold was used to predict the secondary structure around the significant SNPs and to observe whether the secondary structure changed before and after the mutation (http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAfold.cgi).

2) Function prediction
LncRNASNP2 was used to predict the miRNA binding that the SNPs might affect, as shown in Table S2.

Quantitative real-time PCR analysis (qRT-PCR)
Total plasma RNA was extracted from randomly selected healthy controls with TRIzol reagent; DNA was then removed with a Takara reagent kit, and the RNA was reverse transcribed into cDNA. The relative expression of PCAT1 in the three genotypes of SNPs rs4473999 and rs1551514 was determined by SYBR Green qRT-PCR on the ABI Prism 7500 Fast Real-Time PCR System. The relative expression of PCAT1 was normalized to GAPDH as the endogenous control and is presented as the 2^(-ΔCT) value. The primer sequences used are listed in Table S3.
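The 2^(-ΔCT) calculation described above can be illustrated with a minimal sketch. It assumes that ΔCT is defined as the Ct of the target (PCAT1) minus the Ct of the reference (GAPDH); the function name and the Ct values below are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Return 2^(-dCt), with dCt = Ct(target) - Ct(reference).

    Assumption: the target is PCAT1 and the reference is GAPDH.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical (Ct_PCAT1, Ct_GAPDH) pairs, for illustration only.
hypothetical_samples = {
    "sample_1": (24.0, 25.0),
    "sample_2": (26.0, 25.0),
}

for name, (ct_pcat1, ct_gapdh) in hypothetical_samples.items():
    print(name, relative_expression(ct_pcat1, ct_gapdh))
```

A lower Ct for the target relative to the reference gives a larger 2^(-ΔCT) value, that is, higher relative expression.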

Dual-luciferase reporter assay
According to the prediction of LncRNASNP2, the mutated PCAT1 rs4473999 may lose the binding site for miR-149-5p. In this study, HEK 293T cells, which have high transfection efficiency and are well established for dual-luciferase assays, were used to validate the binding between the SNP and the miRNA. Transfection was performed in 12-well plates when the HEK 293T cells reached a confluence of 40%. The wild-type or mutated pmirGLO plasmid was co-transfected with the miR-149-5p mimic or miR-NC using the riboFECT™ CP kit. Luciferase signals were detected 72 hours after transfection, and the relative luciferase activity was calculated as the firefly/Renilla ratio according to the instructions of the dual-luciferase reporter assay system.
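As an illustration of the firefly/Renilla normalization, the sketch below computes the ratio per well and then scales each group to the mean of a control group. The scaling to a control group, the function names and the readings are assumptions made for illustration; the study only states that relative activity was calculated according to the kit instructions.

```python
from statistics import mean


def relative_luciferase(firefly: float, renilla: float) -> float:
    """Firefly signal normalized to the Renilla signal for one well."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def normalized_to_control(group, control):
    """Scale each well's firefly/Renilla ratio to the mean ratio of the control wells.

    Both arguments are lists of (firefly, renilla) tuples.
    """
    control_mean = mean(relative_luciferase(f, r) for f, r in control)
    return [relative_luciferase(f, r) / control_mean for f, r in group]


# Hypothetical raw readings (firefly, renilla), for illustration only.
wt_plus_nc = [(1000.0, 500.0), (1100.0, 500.0), (900.0, 500.0)]
wt_plus_mimic = [(500.0, 500.0), (450.0, 500.0), (550.0, 500.0)]

print(normalized_to_control(wt_plus_mimic, wt_plus_nc))
```

Under this convention, a value below 1 for the WT plus mimic group would be consistent with the miRNA binding the reporter construct and repressing the firefly signal.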

CCK8, scratch and Transwell assays
miR-149-5p interference and overexpression lentiviral vectors were constructed and transduced into breast cancer cells (MCF-7 and MDA-MB-231), and stable cell lines were selected. qPCR was then performed to determine the efficiency of miRNA interference and overexpression. CCK8, Transwell and scratch assays were performed to evaluate the effects of interfering with and overexpressing miR-149-5p on proliferation, invasion and migration, respectively, in MCF-7 and MDA-MB-231 cells.

Statistical analysis
Continuous variables were compared between the case and control groups using the t test, and categorical variables were compared using the chi-square test. Susceptibility analysis and stratified analysis by basic characteristics were performed for each SNP using multivariable unconditional logistic regression to calculate odds ratios (ORs) with 95% confidence intervals (95%CI). MDR software was used to predict the interaction between SNPs and environmental factors; haplotype analysis was performed using the SHEsis online software [19]; and false positive report probability (FPRP) analysis was used to verify the reliability of the results obtained. The expression of PCAT1 in different groups was compared using a t test, with a p-value of <0.05 considered significant. Dual-luciferase activities were compared between groups using a t test. In the CCK8 experiment, an independent-samples t test was used to analyze the difference in OD values between groups. In the scratch and Transwell experiments, a t test was used to analyze the differences in scratch healing rate and number of invaded cells between groups.
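To make the OR and 95%CI quantities concrete, the following minimal sketch computes an unadjusted odds ratio from a 2x2 table with a Wald (Woolf) confidence interval. This is a simplification: the study used multivariable logistic regression, which adjusts for covariates, so this code would not reproduce the reported adjusted values. The counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: exposed cases,    b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    z = 1.96 corresponds to a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level under this approximation.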

Clinical characteristics of the patients
We included the basic information of 504 BC cases and 505 healthy controls in the analysis, including age at disease onset, age at menarche, menopausal status, menopausal age, reproductive history, number of abortions, breastfeeding history, family history and hormone receptor status (Table 1). Logistic regression analysis showed that the age at menarche differed between the BC patients and the healthy controls (P=0.030). Multiple pregnancies (OR=1.964, 95%CI: 1.355-2.796) and a family history of breast cancer (OR=1.869, 95%CI: 1.116-3.130) may be related to an increased risk of breast cancer, whereas a history of breastfeeding (OR=0.724, 95%CI: 0.535-0.980) may be related to a reduced risk of breast cancer.

Susceptibility analysis
The association between PCAT1 SNP genotypes and breast cancer susceptibility is presented in Table 2. The analysis was performed under four genetic models (codominant, dominant, recessive and overdominant). In the adjusted logistic regression analysis, SNP rs4473999 was a risk factor for breast cancer in the overdominant model (OR=1.360, 95%CI: 1.009-1.832). To ensure the representativeness of the control group, the Hardy-Weinberg equilibrium test was applied. For all SNPs, the genotype distribution in the controls was consistent with Hardy-Weinberg equilibrium (P>0.05), indicating that the control samples were representative.
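A Hardy-Weinberg equilibrium check of the kind described above can be sketched as a chi-square goodness-of-fit test with one degree of freedom. This is a minimal illustration under stated assumptions: the study does not say which software or test variant was used, and the genotype counts below are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi2, p_value) for observed counts of the AA, AB and BB genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("both alleles must be observed")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(250, 200, 55)
print(chi2, p_value)
```

A p-value above 0.05 is the usual criterion for treating the control genotypes as consistent with equilibrium, which matches the threshold reported in the text.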

Stratified analysis
The stratified analysis consisted of three parts. First, the data were stratified by the patients' clinical information, including age, age at menarche, menopausal status, menopausal age, number of pregnancies, number of abortions, breastfeeding history and family history, as shown in Table 3. In the dominant model, SNP rs117117537 was a risk factor for breast cancer in women with menopausal age >50 years (OR=2.413, 95%CI: 1.057-5.508); rs4473999 was a risk factor for breast cancer in women with menopausal age >50 years (OR=2.137, 95%CI: 1.065-4.286) and in women with fewer than 2 abortions (OR=1.510, 95%CI: 1.045-2.181). Second, the breast cancer cases were stratified by hormone receptor status. As shown in Table S4, only the TT genotype of SNP rs785003 (OR=0.158, 95%CI: 0.029-0.864) was associated with HER-2 receptor status. Finally, the data were stratified by molecular subtype of breast cancer; the analysis showed that the GA+AA genotype of SNP rs1551514 (OR=0.671, 95%CI: 0.452-0.997) was correlated with luminal-type breast cancer (Table S5).
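The genetic models referred to in the susceptibility and stratified analyses (for example, the dominant model comparing CT+TT with CC) differ only in how genotypes are coded before regression. The sketch below shows one common coding; the function name is hypothetical, and the codominant case simply returns a category label that would be treated as a categorical variable.

```python
def encode_genotype(genotype: str, variant_allele: str, model: str) -> int:
    """Code a two-letter genotype under a given genetic model.

    dominant:     carriers of at least one variant allele vs. none
    recessive:    homozygous variant vs. all others
    overdominant: heterozygotes vs. both homozygotes
    codominant:   category label 0/1/2 (to be treated as categorical)
    """
    if len(genotype) != 2:
        raise ValueError("genotype must have exactly two alleles")
    count = genotype.count(variant_allele)
    if model == "dominant":
        return int(count >= 1)
    if model == "recessive":
        return int(count == 2)
    if model == "overdominant":
        return int(count == 1)
    if model == "codominant":
        return count
    raise ValueError(f"unknown model: {model}")


# Example with rs4473999 (C wild type, T variant).
for g in ("CC", "CT", "TT"):
    print(g, [encode_genotype(g, "T", m)
              for m in ("dominant", "recessive", "overdominant", "codominant")])
```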

Haplotype analysis and Gene-reproductive interaction
Haplotype analysis was used to determine the joint effect of the lncRNA SNPs; haplotypes with frequencies below 3% are not presented. For each haplotype, subjects were divided into a haplotype group and a non-haplotype group, with the non-haplotype group as the reference. As shown in Table 4, the G rs1551514 T rs1551513 C rs4473999 C rs9656964 T rs17762938 C rs7823297 T rs785003 T rs117117537 haplotype of PCAT1 was associated with an increased risk of breast cancer (OR=1.614, 95%CI: 1.116-2.333). Table 5 shows the results of the interaction between genetic and reproductive factors analyzed using MDR software. Among the first- to third-order interaction models produced by fitting, the third-order model was optimal: the average accuracy on the training set was 0.6137, the average accuracy on the test set was 0.5837, and the cross-validation consistency in ten-fold cross-validation was 10/10. The model includes three factors (rs4473999, number of pregnancies and breastfeeding history), which indicates an interaction between genetic and reproductive factors.

False positive report probability (FPRP)
In this study, FPRP analysis [20] was used to evaluate the reliability of the positive associations between PCAT1 SNPs and breast cancer susceptibility. The critical value of FPRP was set at 0.5. As shown in Table S6, when the prior probability was 0.25, the FPRP values for the positive results of rs4473999, rs1551514 and rs117117537 were all below the critical value, suggesting that rs4473999, rs1551514 and rs117117537 may be truly associated with breast cancer susceptibility.
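For readers unfamiliar with FPRP, the quantity is commonly computed as α(1-π) / (α(1-π) + (1-β)π), where α is the observed p-value, 1-β is the statistical power to detect a specified odds ratio, and π is the prior probability of a true association. The sketch below implements this formula directly; the power value used in the example is hypothetical, since the study's power calculations are reported only in Table S6.

```python
def fprp(p_value: float, power: float, prior: float) -> float:
    """False positive report probability.

    p_value: observed significance level (alpha)
    power:   statistical power (1 - beta) to detect the assumed effect
    prior:   prior probability of a true association (pi)
    """
    if not (0.0 < prior < 1.0):
        raise ValueError("prior must be strictly between 0 and 1")
    if not (0.0 <= p_value <= 1.0 and 0.0 < power <= 1.0):
        raise ValueError("p_value must be in [0, 1] and power in (0, 1]")
    false_positive_mass = p_value * (1.0 - prior)
    true_positive_mass = power * prior
    return false_positive_mass / (false_positive_mass + true_positive_mass)


# Prior of 0.25 and critical value of 0.5 as stated in the text;
# the p-value and power below are hypothetical.
value = fprp(p_value=0.04, power=0.8, prior=0.25)
print(value, value < 0.5)
```

A finding is considered noteworthy when its FPRP falls below the chosen critical value, here 0.5.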

Real-time fluorescent quantitative PCR (qPCR)
As shown in Figure 1, for rs4473999, 41, 21 and 14 samples of the CC, CT and TT genotypes, respectively, were randomly selected for qPCR. The relative expression of PCAT1 in the three genotypes was 1.50±0.70, 1.07±0.83 and 0.75±0.64, respectively. Pairwise comparisons showed that the differences between CC and CT (P=0.038) and between CC and TT (P=0.001) were statistically significant, and the expression levels of PCAT1 in the CT and TT groups were lower than that in the CC group. For rs1551514, 25 samples had the GG genotype, with a relative PCAT1 expression of 1.63±0.97; 33 samples had the GA genotype, with a relative expression of 1.10±0.61; and 13 samples had the AA genotype, with a relative expression of 0.94±0.79. The differences between GG and GA (P=0.021) and between GG and AA (P=0.033) were both statistically significant.

Dual-luciferase reporter assay
Figure 2 shows the results of the dual-luciferase reporter experiment. The luciferase activity of the NC group was significantly higher than that of the miR-149-5p group (P=0.001). Likewise, the luciferase activity of the mutant-type (MUT) plus miR-149-5p group was significantly higher than that of the wild-type (WT) plus miR-149-5p group (P<0.001). These results indicate binding between rs4473999-WT and miR-149-5p, whereas there was no evidence of binding between rs4473999-MUT and miR-149-5p, which is consistent with the earlier prediction.

Cytological experiment
Verification of the stable knockdown and overexpression of miR-149-5p combined with the PCAT1 SNP showed that miR-149- Similarly, the healing rate of MCF-7 cells in the miR-149-5p low-expression group was lower (P=0.021), while the healing rate of MCF-7 cells in the miR-149-5p high-expression group was higher (P=0.014) (Figure 4B). The cell invasion experiments showed that in both MDA-MB-231 and MCF-7 cells, the number of invaded cells in the miR-149-5p low-expression group was lower than that in the NC group, and the number of invaded cells in the miR-149-5p high-expression group was higher than that in the NC group (Figure 5).

Discussion
Through a series of experiments, we report for the first time that PCAT1 SNPs are related to breast cancer susceptibility: rs117117537 (OR = 2.413, 95%CI: 1.057-5.508) and rs4473999 (OR = 2.137, 95%CI: 1.065-4.286) were identified as risk factors for breast cancer when the menopausal age was ≥ 50. In addition, rs785003 was associated with the HER2 status of breast cancer (P = 0.033) and rs1551514 was related to luminal-type breast cancer (P = 0.048). The haplotype GTCCTCTT of the PCAT1 SNPs is a risk factor for breast cancer (OR = 1.614, 95%CI: 1.116-2.333). rs4473999, in interaction with number of pregnancies and breastfeeding history, was also identified as a risk factor for breast cancer (OR = 2.487, 95%CI: 1.929-3.206). The false positive report probability analysis supported the reliability of these results. Preliminary functional assays demonstrated that the relative expression of PCAT1 differed between genotypes of SNPs rs1551514 and rs4473999, namely wild-type vs heterozygous (P = 0.021 and P = 0.038, respectively) and wild-type vs homozygous mutant (P = 0.033 and P = 0.001, respectively). In addition, the dual-luciferase reporter assay showed that miR-149-5p bound the wild type of rs4473999 but not the mutant type. The cell function experiments showed that low expression of miR-149-5p could inhibit the proliferation, invasion and migration of breast cancer cells, while high expression could promote the proliferation, invasion and migration of breast cancer cells.
It has been reported that single nucleotide polymorphisms (SNPs) of lncRNAs can be associated with cancer susceptibility. For example, Peng R et al. confirmed that the tagSNPs of lncRNA MALAT1 (rs3200401, rs619586) were related to breast cancer susceptibility through changes in serum mRNA expression level [6]; SNP rs2073859 of LIMK2 may affect the risk of bladder cancer by specifically up- or down-regulating miR-135a [7]. Bayram S et al. found that the HOTAIR rs920778 polymorphism may play an important role in the genetic susceptibility to and invasiveness of breast cancer in the Turkish population [21]. Some studies have found that SNPs in PCAT1 are associated with susceptibility to different cancers, e.g. rs1902432 with prostate cancer [22], rs710886 with bladder cancer [23] and rs2632159 with colorectal cancer [24]. However, no study has reported an association between any SNP in PCAT1 and breast cancer susceptibility. Here we show for the first time that the rs4473999 CT genotype is associated with an increased risk of breast cancer in the overdominant model, and that carriers of the variant genotypes CT + TT had a higher risk of breast cancer than homozygous wild-type CC carriers when the menopausal age was ≥ 50 years and when the number of miscarriages was less than 2. This is the first study to demonstrate an association between PCAT1 SNPs and breast cancer susceptibility.
In this study, the dual-luciferase reporter assay was used to verify whether the rs4473999 mutation affected the binding between PCAT1 and miR-149-5p. The results showed that PCAT1 could bind miR-149-5p when rs4473999 carried the wild-type C allele, whereas after the mutation there was no evidence that PCAT1 could bind miR-149-5p, which is consistent with the prediction of LncRNASNP2. That is, the rs4473999 mutation may affect the binding between PCAT1 and miR-149-5p. Studies have shown that miR-149-5p is closely related to the development of a variety of cancers, such as liver cancer [25], nasopharyngeal carcinoma [26] and non-small cell lung cancer [27], but there has been no research on miR-149-5p in relation to breast cancer progression. In this study, CCK8, scratch and Transwell experiments were used to investigate the effect of miR-149-5p, combined with rs4473999 of PCAT1, on the proliferation, migration and invasion of breast cancer cells. The results showed that low expression of miR-149-5p may inhibit, and overexpression of miR-149-5p may promote, the proliferation, migration and invasion of breast cancer cells.
This study is the first report on the association between PCAT1 SNPs and breast cancer susceptibility. Based on bioinformatics prediction and experimental verification, we found that PCAT1 rs4473999 may affect the proliferation, invasion and migration of breast cancer cells by regulating miR-149-5p. The main strengths of this study are as follows. First, all cases included were newly diagnosed, which helps to control prevalence-incidence bias. Second, the control group was randomly selected from the chronic disease investigation project of 20,000 community residents in Henan Province, which can reduce selection bias. Finally, for all SNPs, 10% of the genotyped samples were randomly selected for sequencing verification, and all cell function experiments were repeated more than three times, which supports the reliability of the results. Nevertheless, this study has some limitations. First, all subjects were Han Chinese, so the results need to be verified in other populations. Second, the effect of PCAT1 SNPs on breast cancer was explored only preliminarily, and the function of these variants needs further investigation.
In summary, our study shows that PCAT1 gene polymorphism is associated with the occurrence of breast cancer, which may help to improve our understanding of breast cancer susceptibility. In addition, the PCAT1 rs4473999 C/T variant may affect the binding of miR-149-5p to PCAT1, thereby affecting breast cancer susceptibility by regulating the expression of miR-149-5p.

Declarations
Ethical Approval and Consent to participate
The study was approved by the Ethics Review Committee of the Ethics Committee of Medical and Health Research of Zhengzhou University.

Consent for publication
We agree to authorize the article for publication.
