Identification of a prognostic long non-coding RNA signature in breast cancer

Background: Breast cancer is the most common malignant disease among women, and long non-coding RNAs (lncRNAs) are receiving increasing attention in breast cancer research. We aimed to investigate the expression profiles of lncRNAs and to construct a prognostic lncRNA signature for predicting overall survival (OS) in breast cancer. Methods: The lncRNA expression profiles and clinical data of patients with breast cancer were obtained from The Cancer Genome Atlas (TCGA). Differentially expressed lncRNAs were screened with the R package limma. Survival probability was estimated by the Kaplan-Meier method. Cox regression models were used for univariate and multivariate analysis. The risk score (RS) was defined as the sum, over the lncRNAs in the signature, of each lncRNA's expression level (exp) multiplied by its regression coefficient (β) from the multivariate Cox regression analysis: $RS = \mathrm{exp}_{a_1} \times \beta_{a_1} + \mathrm{exp}_{a_2} \times \beta_{a_2} + \dots + \mathrm{exp}_{a_n} \times \beta_{a_n}$. Functional enrichment analysis was performed with Metascape. Results: A total of 3404 differentially expressed lncRNAs were identified. Among them, CYTOR, MIR4458HG and MAPT-AS1 were significantly associated with survival in breast cancer. The RS based on these three lncRNAs could predict OS in breast cancer ($RS = \mathrm{exp}_{\text{CYTOR}} \times \beta_{\text{CYTOR}} + \mathrm{exp}_{\text{MIR4458HG}} \times \beta_{\text{MIR4458HG}} + \mathrm{exp}_{\text{MAPT-AS1}} \times \beta_{\text{MAPT-AS1}}$). Moreover, the three-lncRNA signature was confirmed to be an independent prognostic biomarker for breast cancer (HR=3.040, P<0.001). Conclusions: This study established a three-lncRNA signature, which might be a novel prognostic biomarker for breast cancer.

heterogeneous disease, so these factors have predictive value in only a subset of patients. One study showed that their predictive ability was effective in only approximately 30% of patients [6]. In addition, the above clinicopathological parameters are too general for the precise management of breast cancer. In recent years, multigene assays at the molecular and genetic levels, such as the 21-gene, 12-gene and 50-gene assays, have shown greater accuracy than traditional prognostic biomarkers in patients with HR-positive, HER2-negative disease [7][8][9][10].
Increasing attention has been paid to long non-coding RNAs (lncRNAs), which are involved in biological signaling pathways and in the pathogenesis of malignant diseases such as breast cancer.
LncRNAs are RNA transcripts longer than 200 nucleotides that do not encode proteins. A number of biological processes, such as cell proliferation, differentiation, chromosome remodeling, epigenetic modulation, and transcriptional and posttranscriptional modification, are related to lncRNAs [11]. Various studies have indicated that dysregulated expression of lncRNAs is involved in the occurrence and progression of multiple cancers, including breast cancer [11,12]. Moreover, a growing number of dysregulated lncRNAs have been identified and have shown potential as biomarkers for predicting clinical outcome in breast cancer. For instance, over-expression of the lncRNAs HOTAIR and BCAR4 promotes the proliferation and metastasis of tumor cells in breast cancer [13,14].
possessed the activity to suppress the development and progression of tumors [16][17][18][19][20][21]. When these lncRNAs are absent or expressed at low levels, breast cancer tends to be more aggressive. With the development of gene sequencing technology, more and more lncRNAs have been found and identified. However, prognostic signatures based on lncRNAs have rarely been reported. In addition, the biological functions and signaling pathways of lncRNAs in breast cancer remain unclear. Therefore, the relationship between lncRNAs and breast cancer deserves further exploration.
In this study, we analyzed lncRNA expression profiles from a cohort of previously published gene expression data in The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/) to establish a prognostic lncRNA signature in breast cancer. We identified three lncRNAs as a prognostic signature related to OS in breast cancer. These findings may provide a novel prognostic model and a research concept at the genetic level for breast cancer studies.

Methods
Differentially expressed lncRNAs and breast cancer cases from TCGA
The RNA sequencing and clinical data were downloaded from the TCGA data portal (https://cancergenome.nih.gov/). The RNA-Seq data contained the expression profiles of 60483 mRNAs, and the clinical data covered 1222 cases, including 1109 breast cancer cases and 113 cases of normal breast tissue. Next, we extracted 14847 lncRNAs from the expression data of 60483 mRNAs. TCGA is a community resource project that makes its data available for research. Approval by a local ethics committee was not required because the current study adhered to the TCGA publication guidelines and data access policies. We then selected 3404 significantly differentially expressed lncRNAs with the R package limma, using the criteria P < 0.01 and |log2FC| > 1. The clinical data of breast cancer were also obtained from TCGA as mentioned above. Cases with incomplete clinical data, such as OS and other clinicopathological parameters, were excluded.

The risk score (RS) for predicting OS was defined as the sum of each lncRNA's expression level (exp) multiplied by its regression coefficient (β) from the multivariate Cox regression analysis: $RS = \mathrm{exp}_{\text{CYTOR}} \times \beta_{\text{CYTOR}} + \mathrm{exp}_{\text{MIR4458HG}} \times \beta_{\text{MIR4458HG}} + \mathrm{exp}_{\text{MAPT-AS1}} \times \beta_{\text{MAPT-AS1}}$. The 565 breast cancer cases were divided into low-risk and high-risk groups using the median RS as the cut-off. The relationships between the RS groups and the expression of CYTOR, MIR4458HG and MAPT-AS1 were analyzed by t-test. Receiver operating characteristic (ROC) curves were used to assess the sensitivity and specificity of the RS with the three-lncRNA signature for predicting OS. The area under the curve (AUC) was calculated from the ROC curves. All tests were two-sided. P < 0.01 was considered to indicate a statistically significant difference.
For the multivariate analysis of CYTOR, MIR4458HG and MAPT-AS1 expression, P < 0.05 was also considered statistically significant.
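
The RS calculation and the median split described above can be illustrated with a minimal Python sketch. The coefficients and expression values below are hypothetical placeholders chosen only for illustration (they are not the values in Table 1), and the function names are our own. Patients with an RS exactly equal to the median are assigned to the low-risk group here, which is an assumption because the handling of ties is not stated in the text.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Sum of expression level times Cox coefficient over the signature lncRNAs."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())

def split_by_median(scores):
    """Label a patient 'high' if the RS exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical coefficients for illustration only (not taken from Table 1).
betas = {"CYTOR": 0.5, "MIR4458HG": -0.25, "MAPT-AS1": -0.5}

# Hypothetical expression levels for four patients.
patients = {
    "P1": {"CYTOR": 4.0, "MIR4458HG": 1.0, "MAPT-AS1": 1.0},
    "P2": {"CYTOR": 1.0, "MIR4458HG": 4.0, "MAPT-AS1": 2.0},
    "P3": {"CYTOR": 2.0, "MIR4458HG": 2.0, "MAPT-AS1": 2.0},
    "P4": {"CYTOR": 3.0, "MIR4458HG": 2.0, "MAPT-AS1": 1.0},
}

scores = {pid: risk_score(expr, betas) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

With these placeholder values, the patients with high CYTOR and low MIR4458HG and MAPT-AS1 expression fall into the high-risk group.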

Co-expression Analysis Of lncRNAs With mRNAs
Co-expression network analysis (http://bio-bigdata.hrbmu.edu.cn/Co-LncRNA/) was performed to evaluate the co-expression of the prognostic lncRNAs with mRNAs in breast cancer. Linear regression was performed, and P < 0.01 was used to define significant correlations.
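
As a rough illustration of the linear regression behind a co-expression test, the following Python sketch fits a least-squares line between one lncRNA and one mRNA and reports the Pearson correlation. It is not the Co-LncRNA implementation. The expression values and the function name are hypothetical, and the P-value step (which requires a t distribution) is omitted.

```python
from math import sqrt

def linear_fit(x, y):
    """Return (slope, intercept, pearson_r) for a least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical expression levels of one lncRNA and one mRNA in five samples.
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
mrna = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept, r = linear_fit(lncrna, mrna)
print(round(slope, 3), round(intercept, 3), round(r, 4))
```

A pair would be called co-expressed only if the P-value of the fitted slope fell below the chosen threshold.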

Functional Prediction Of The Three-lncRNA Signature
Finally, the lncRNAs and the co-expressed mRNAs were entered into the online tool Metascape (http://metascape.org/gp/index.html) for functional enrichment analysis. Gene Ontology (GO) terms for biological process (BP), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and other categories were tested for enrichment. Terms with P < 0.01 were considered statistically significant. All resulting terms were then grouped into clusters based on their similarities.

Differentially expressed lncRNAs in breast cancer
We extracted 14847 lncRNAs from the expression data of 60483 mRNAs. A total of 3404 differentially expressed lncRNAs were obtained with the R package limma (P < 0.01), and these lncRNAs are presented in a volcano plot (Fig. 1).
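
The selection step can be illustrated with a short Python sketch that applies the two cut-offs to a table of test results. This does not reimplement limma. It assumes that log2 fold changes and P-values have already been computed and that the criteria are P < 0.01 and |log2FC| > 1. The lncRNA names and numbers are hypothetical.

```python
def is_differentially_expressed(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.01):
    """True when both the fold-change and the P-value criteria are met."""
    return abs(log2_fc) > fc_cutoff and p_value < p_cutoff

def classify(log2_fc, p_value):
    """Label a lncRNA as 'up', 'down' or 'not significant' for a volcano plot."""
    if not is_differentially_expressed(log2_fc, p_value):
        return "not significant"
    return "up" if log2_fc > 0 else "down"

# Hypothetical (name, log2FC, P-value) rows for illustration only.
results = [
    ("lncRNA_A", 2.3, 0.0001),
    ("lncRNA_B", -1.8, 0.004),
    ("lncRNA_C", 0.4, 0.0005),
    ("lncRNA_D", 1.5, 0.2),
]

labels = {name: classify(fc, p) for name, fc, p in results}
print(labels)
```

Only lncRNAs that pass both criteria are coloured as up- or down-regulated in a volcano plot of this kind.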

Association Of Differentially Expressed lncRNAs With OS In Breast Cancer
The clinical data of 1222 cases were downloaded from TCGA, and 565 cases with complete clinicopathological information, including OS, age, T, N, M, TNM stage, HER2 expression, and ER/PR expression, were selected for prognostic analysis. Kaplan-Meier analysis was performed to evaluate the relationship between differentially expressed lncRNAs and OS, which yielded 30 lncRNAs with prognostic value in breast cancer (P < 0.01).
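
For readers unfamiliar with the Kaplan-Meier estimator, a minimal Python sketch is given below. It computes the product-limit survival probability at each observed death time and does not include the significance test used to compare groups. The follow-up times and event indicators are hypothetical, and the function name is our own.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
times = [5, 8, 12, 12, 20, 25]
events = [1, 0, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at death times.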

Prognostic lncRNA Signature In Breast Cancer
Among these 30 lncRNAs with prognostic value, co-expression network analysis (http://bio-bigdata.hrbmu.edu.cn/Co-LncRNA/) revealed that three lncRNAs, ENSG00000222041 (CYTOR), ENSG00000247516 (MIR4458HG) and ENSG00000264589 (MAPT-AS1), were co-expressed with mRNAs. Further univariate and multivariate Cox regression analysis indicated that CYTOR, MIR4458HG and MAPT-AS1 were independent prognostic factors in breast cancer patients (P < 0.05; Table 1). Next, we established the RS for predicting OS based on these three lncRNAs: $RS = \mathrm{exp}_{\text{CYTOR}} \times \beta_{\text{CYTOR}} + \mathrm{exp}_{\text{MIR4458HG}} \times \beta_{\text{MIR4458HG}} + \mathrm{exp}_{\text{MAPT-AS1}} \times \beta_{\text{MAPT-AS1}}$, where β is the coefficient from the multivariate Cox regression analysis and exp is the expression level of the lncRNA. Using this formula, we calculated the RS for each of the 565 selected breast cancer patients and divided the patients into low-risk and high-risk groups using the median RS as the cut-off (Fig. 2).

Table 1 The detailed information of three prognostic lncRNAs in breast cancer

Kaplan-Meier analysis was performed to evaluate the correlation between RS and OS, and patients in the high-risk group had significantly shorter OS than those in the low-risk group (P = 4e-04; Fig. 3). The RS with the three-lncRNA signature could therefore predict OS in breast cancer.
Furthermore, multivariate Cox regression analysis showed that the RS with the three-lncRNA signature was an independent predictor of OS in breast cancer, as was TNM stage, when multiple factors including age, T, N, M, TNM stage, HER2 expression, ER expression, and PR expression were controlled for (P < 0.01; Table 2). We also evaluated the relationship between the expression of the three lncRNAs and the RS groups (high-risk and low-risk) by t-test. The results showed that MIR4458HG and MAPT-AS1 were expressed at lower levels in the high-risk group than in the low-risk group, while CYTOR was expressed at a higher level in the high-risk group (Fig. 4). We used ROC analysis to assess the sensitivity and specificity of survival prediction in our model. The ROC curves indicated that the AUC of the three-lncRNA signature was 0.802 (Fig. 5).
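
To make the AUC concrete, the following Python sketch computes it as the probability that a randomly chosen patient with the event has a higher RS than a randomly chosen patient without it, with ties counted as one half. It treats the outcome as a simple binary label, which is a simplifying assumption because the text does not state how follow-up time was handled in the ROC analysis. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical risk scores; label 1 = death observed, 0 = alive at last follow-up.
rs_values = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]

auc = roc_auc(rs_values, outcomes)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a score with no discriminating ability, and 1.0 to perfect separation of the two outcome groups.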

Functional Prediction Of Three LncRNAs
Functional enrichment analysis, including GO terms and KEGG pathways, was used to predict potential biological processes for the three lncRNAs. The results revealed that 1110 terms were enriched, including 954 BP terms and 24 KEGG pathways, among others. The three lncRNAs may partake following cancer. For example, lncRNAs act as a signal or decoy to promote or suppress gene expression [22].
LncRNAs regulate the translation of mRNAs and control their stability by forming double-stranded RNA with mRNAs, or regulate protein stability by binding to them [23]. An increasing number of lncRNAs have been found to be abnormally expressed in cancers, giving them potential as prognostic biomarkers and therapeutic targets [15,24].
In this study, we first screened out the differentially expressed lncRNAs and then identified three prognostic lncRNAs: CYTOR, MIR4458HG and MAPT-AS1. CYTOR is also called LINC00152. Previous studies showed that CYTOR expression was up-regulated in multiple malignant diseases, such as gastrointestinal cancer, liver cancer, lung adenocarcinoma and esophageal squamous cell carcinoma [25][26][27][28]. Yue et al. [29] showed that CYTOR promoted metastasis of colon cancer through the Wnt/β-catenin signaling pathway. Reon et al. [30] reported that CYTOR promoted invasion via a 3'-hairpin structure and was an effective biomarker for predicting survival in glioblastoma. To our knowledge, there are only a few studies on CYTOR in breast cancer. It might be involved in the EGFR/mammalian target of rapamycin pathway, promoting triple-negative breast cancer progression by affecting the stability of PTEN protein [31]. A previous study indicated that high expression of MAPT-AS1 was correlated with better survival in breast cancer [32]. However, we have not found any report relating MIR4458HG to breast cancer. Our data showed that the above three lncRNAs were associated with OS in breast cancer. Furthermore, MAPT-AS1 and MIR4458HG were expressed at lower levels in the high-risk group than in the low-risk group, while CYTOR was expressed at a higher level in the high-risk group, indicating that our results were consistent with previous studies. However, studies of these three lncRNAs in breast cancer remain rare, so they merit further study as potential targets.
We also constructed a prognostic signature with the three lncRNAs from their expression profiles and their coefficients in the multivariate Cox regression analysis. We calculated the RS for each of the 565 selected breast cancer patients and divided them into low-risk and high-risk groups. Patients in the high-risk group showed significantly shorter OS than those in the low-risk group, so the RS might be used as an independent predictor of OS. Moreover, the ROC curve analysis suggested that the three-lncRNA signature had high sensitivity and specificity for survival prediction in our model. To our knowledge, this is the first investigation to establish a prognostic signature with RNA-

Their functions above were associated with the three lncRNAs; therefore, the three lncRNAs that we screened out might serve as targets influencing the onset and progression of breast cancer. Sun et al. and Zhou et al. applied additional independent GEO datasets to validate their lncRNA signatures. We also attempted to validate the prognostic value of the three-lncRNA signature using the GEO database. Regrettably, these GEO datasets did not provide sufficient cases with data on these three lncRNAs. Therefore, the prognostic value of the three-lncRNA signature needs to be further validated with other methods and confirmed with a larger sample size in the future.

Conclusions
In summary, we evaluated the lncRNA data of 565 breast cancer cases from the TCGA database and established a three-lncRNA signature, which might be a novel prognostic biomarker for breast cancer patients. However, further clinical studies are needed to validate its prognostic value, and more experiments are needed to explore the biological functions of these lncRNAs in breast cancer.

Declarations
Authors' contributions
GCL designed the study. GCL, XLY, and RLG prepared the manuscript. YNZ, JJL, and WL collected and processed data. GCL, XLY, and RLG analyzed data. RLG acquired funding. All authors read and approved the final manuscript.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Figure 1
Volcano plot of the differentially expressed lncRNAs between breast cancer and normal tissues. Red and green dots indicate high and low expression, respectively. Black dots show lncRNAs with both log2FC<1 and P-value<0.01. The Y axis represents the P-value and the X axis represents the log2FC.

clusters, colored by P-values. The smaller the P-value, the deeper the color.