Identification and selection of prognostic-related lncRNAs
Compared with 36 normal specimens, 69 downregulated and 217 upregulated lncRNAs were identified in 578 breast carcinoma specimens using the DESeq2 package[13] (Fig. 1a, b). In all, 577 samples with complete follow-up data were randomly assigned to the training or testing set. In the training set, the lncRNAs were subjected to univariate Cox regression analysis. Among the 286 differentially expressed lncRNAs, 32 were significantly correlated with overall survival (OS, P < 0.001) (Supplementary Table 1).
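The random assignment of the 577 samples can be sketched as follows. This is a minimal illustration, not the procedure actually used: the sample identifiers, the seed and the function name `split_samples` are assumptions, and the 289/288 split sizes are taken from the group sizes reported for the training and testing sets.

```python
import random

def split_samples(sample_ids, n_train, seed=0):
    """Randomly assign sample IDs to a training and a testing set."""
    shuffled = list(sample_ids)
    # A fixed seed makes the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 577 samples with follow-up data.
sample_ids = [f"S{i:03d}" for i in range(577)]
train_ids, test_ids = split_samples(sample_ids, n_train=289)
print(len(train_ids), len(test_ids))  # 289 288
```

Fixing the seed is a design choice for reproducibility; the source does not state how the randomization was seeded.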
Construction of a lncRNA-based prognostic prediction system and validation in the training group
Stepwise random survival forests analysis and the multivariate Cox regression model were then used to select the best prognostic indicators among the 32 candidate lncRNAs. Four lncRNAs showed an independent, statistically significant association with survival (Fig. 2a). Three of them (RP1-193H18.2, AL022341.3, WDR86-AS1) had negative coefficients, indicating that higher expression was associated with a lower risk of death and thus longer survival. The remaining lncRNA (LINC00511) had a positive coefficient, indicating that higher expression was associated with a higher risk of death and thus shorter survival.
Based on the expression levels of the four lncRNAs, a risk scoring equation weighted by their regression coefficients was constructed to predict the survival of breast cancer patients: risk score = (-0.858 × expression level of RP1-193H18.2) + (-0.684 × expression level of AL022341.3) + (-0.720 × expression level of WDR86-AS1) + (0.466 × expression level of LINC00511). In the training set, the equation was used to calculate a risk score for every patient, and the median risk score served as the threshold to divide the set into a high-risk (n=144) and a low-risk group (n=145). The Kaplan-Meier curve confirmed that the prognosis of the high-risk group was significantly worse than that of the low-risk group. The median survival times for the high-risk and low-risk groups were 10.85 and 17.27 months, respectively (P value=2.38e-04, Fig. 2b). Furthermore, the 3- and 6-year survival rates were 81.4% and 75.1% in the high-risk group, whereas the corresponding rates were 100% and 91.7% in the low-risk group. We applied time-dependent receiver operating characteristic (ROC) curves to evaluate the prognostic accuracy of the four-lncRNA signature. The AUCs of the signature were 0.78, 0.82 and 0.80 at 1, 3 and 5 years, respectively, indicating good performance in predicting prognosis (Fig. 2c). The risk score and survival status of each patient were plotted as individual dots (Fig. 2d). Patients with a high risk score had higher mortality than those with a low risk score. A heat map showed the expression pattern of the four lncRNAs in the training set, with patients ordered according to risk score (Fig. 2e).
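The risk scoring equation and the median split described above can be sketched as follows. The coefficients are those reported in the equation; the patient records and expression values are hypothetical, and assigning scores equal to the threshold to the low-risk group is an assumption (the tie-handling rule is not stated), although it is consistent with a 144/145 split of 289 patients.

```python
from statistics import median

# Regression coefficients as reported in the risk scoring equation.
COEFFICIENTS = {
    "RP1-193H18.2": -0.858,
    "AL022341.3": -0.684,
    "WDR86-AS1": -0.720,
    "LINC00511": 0.466,
}

def risk_score(expression):
    """Weighted sum of the four lncRNA expression levels."""
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())

def assign_group(score, threshold):
    """Assumed rule: strictly above the threshold is high-risk."""
    return "high-risk" if score > threshold else "low-risk"

# Hypothetical expression values for three illustrative patients.
patients = {
    "P1": {"RP1-193H18.2": 1.0, "AL022341.3": 1.0, "WDR86-AS1": 1.0, "LINC00511": 1.0},
    "P2": {"RP1-193H18.2": 0.0, "AL022341.3": 0.0, "WDR86-AS1": 0.0, "LINC00511": 2.0},
    "P3": {"RP1-193H18.2": 0.0, "AL022341.3": 0.0, "WDR86-AS1": 0.0, "LINC00511": 0.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
threshold = median(scores.values())  # median of the training-set scores
groups = {pid: assign_group(s, threshold) for pid, s in scores.items()}
```

The same `threshold` value, computed once on the training set, would then be reused unchanged when scoring patients in any other set.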
Among the four lncRNAs, LINC00511 had a positive coefficient in the multivariate Cox regression model, indicating that LINC00511 could be a risk predictor, as its overexpression signified a shorter OS. By contrast, the other three lncRNAs (RP1-193H18.2, AL022341.3 and WDR86-AS1) had negative coefficients in the multivariate Cox regression model. As their expression levels were higher in the low-risk group than in the high-risk group, these three lncRNAs could be protective factors.
Verification of the ability of the four-lncRNA signature to predict the prognosis in the testing set
We next examined whether the four-lncRNA signature maintained its prognostic value in the testing set. Using the same algorithm as in the training set, a risk score was computed for every patient in the testing set, and patients were divided into a low-risk (n=128) and a high-risk group (n=160) by the same threshold used in the training set. Kaplan-Meier analysis demonstrated that the high-risk group had a shorter survival time than the low-risk group in the testing set (11.52 months vs. 14.62 months; P value=0.0058; Fig. 3a). The 3- and 6-year survival rates were 86.2% and 79.2% in the high-risk group and 98.4% and 91.9% in the low-risk group. The AUCs at 1, 3 and 5 years also indicated that the four-lncRNA signature maintained good predictive accuracy in the testing set (Fig. 3b). Additionally, Fig. 3c presents the risk score and survival status of each patient in the testing set. As expected, the risk-associated lncRNA tended to be upregulated in patients with a high risk score, whereas the protective lncRNAs were highly expressed in patients with a low risk score (Fig. 3d).
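For readers unfamiliar with how survival rates at fixed time points are read from a Kaplan-Meier curve, the estimator can be sketched as below. This is a minimal illustration on made-up follow-up data, not the software used in the study; the function names `kaplan_meier` and `survival_at` are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    prob = 1.0
    for time, s in curve:
        if time <= t:
            prob = s
    return prob

# Hypothetical follow-up data for five patients (times in years).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Calling `survival_at(curve, 3)` on each risk group's curve would give the kind of 3-year survival rate quoted above; censored patients reduce the number at risk without lowering the curve.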
Correlation between the four-lncRNA signature and standard clinicopathologic characteristics
To illustrate the clinical relevance of the four-lncRNA signature in breast carcinoma, all patients were divided into a high-risk and a low-risk group according to the median risk score obtained from the training set. On this basis, associations between the four-lncRNA signature and the clinicopathologic characteristics of breast carcinoma were evaluated. The four-lncRNA signature was significantly associated with PR status, ER status, treatment therapy and breast cancer subtype (Table 1). Because the signature was associated with breast carcinoma subtypes, its relationship with the prognosis of luminal-type, Her2-type and triple-negative-type patients was assessed by Kaplan-Meier analysis. Luminal-type patients in the high-risk group had shorter OS than those in the low-risk group: the 3- and 6-year OS rates were 88.7% and 79.0% in the high-risk group, compared with 100% and 94.7% in the low-risk group (Fig. 4a). However, in Her2-type and triple-negative-type breast carcinoma, no statistically significant difference in OS was observed between the high-risk and low-risk groups (Fig. 4b, c).
Effect of chemotherapy for patient groups defined by the four-lncRNA signature
In light of the strong association between the four-lncRNA signature and treatment therapy, we assessed whether the signature could be used to estimate chemosensitivity in patients with breast carcinoma. Patients were categorized into four subgroups according to whether they received chemotherapy and their signature risk group. In the low-risk subgroup, the 3- and 6-year OS rates of patients who received no chemotherapy were 100% and 100%, compared with 99.0% and 90.5% among patients who received chemotherapy (P=0.369) (Fig. 4d). In the high-risk subgroup, by contrast, OS differed significantly between the no-chemotherapy and chemotherapy groups (P=0.030): the 3-year OS was 73.6% vs. 88.3% and the 6-year OS was 58.3% vs. 85.5%, respectively (Fig. 4d). We further examined whether the four-lncRNA signature had a similar role within the subtypes of breast carcinoma. For the luminal type, no statistically significant difference in OS was identified between the no-chemotherapy and chemotherapy groups in patients with high or low signature risk (Supplementary Fig. 1a-c). We further divided the luminal type into ER+/PR+ and ER+/PR- subsets. In the ER+/PR+ subset, OS did not differ between chemotherapy and no chemotherapy for patients with high signature risk (Fig. 4e); in the ER+/PR- subset, however, high-risk patients who received chemotherapy had significantly longer OS than those who did not (P=0.001) (Fig. 4f). In the triple-negative type, chemotherapy did not affect the OS of the low-risk subgroup (Supplementary Fig. 1d). By contrast, chemotherapy prolonged the survival of the high-risk subgroup (Fig. 4g). Thus, the four-lncRNA signature could act as a predictor of chemotherapy benefit.
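The four-way categorization by chemotherapy status and signature risk can be sketched as follows. The patient records, field names and threshold value are hypothetical and serve only to show the grouping logic.

```python
from collections import defaultdict

def subgroup_key(risk_score, threshold, received_chemo):
    """Combine signature risk group and chemotherapy status into one label."""
    risk = "high-risk" if risk_score > threshold else "low-risk"
    chemo = "chemotherapy" if received_chemo else "no chemotherapy"
    return (risk, chemo)

def group_patients(patients, threshold):
    """Map each (risk, chemo) subgroup to the list of patient IDs in it."""
    groups = defaultdict(list)
    for p in patients:
        key = subgroup_key(p["risk_score"], threshold, p["chemo"])
        groups[key].append(p["id"])
    return dict(groups)

# Hypothetical records; the threshold stands in for the training-set median.
patients = [
    {"id": "P1", "risk_score": 0.9, "chemo": True},
    {"id": "P2", "risk_score": 0.7, "chemo": False},
    {"id": "P3", "risk_score": -1.2, "chemo": True},
    {"id": "P4", "risk_score": -0.4, "chemo": False},
]
subgroups = group_patients(patients, threshold=0.0)
```

Survival would then be compared between the chemotherapy and no-chemotherapy subgroups within each risk group, rather than across risk groups.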
Effect of hormonotherapy for patient groups defined by the four-lncRNA signature
Since the four-lncRNA signature was closely related to hormone receptor status and treatment therapy, we also assessed whether it could be used to guide hormonotherapy in luminal-type breast carcinoma. Kaplan-Meier analysis showed that hormonotherapy did not affect the survival time of the low-risk subgroup (Fig. 5a); in contrast, hormonotherapy markedly prolonged the OS of the high-risk subgroup (Fig. 5b). Moreover, among luminal-type patients who did not receive chemotherapy, the OS rate of the high-risk subgroup was worse than that of the low-risk subgroup (Fig. 5c). In the high-risk luminal-type patients who did not receive chemotherapy, the OS rate was markedly increased by hormonotherapy (Fig. 5d). The four-lncRNA signature could therefore serve as a reference tool when counseling patients about hormonotherapy options.
Independence of the four-lncRNA signature and other clinical characteristics
To verify the independence of the four-lncRNA signature from other clinicopathological features, including age, AJCC stage, progesterone receptor, HER2 level, estrogen receptor, treatment therapy, radiotherapy and subtype, univariate and multivariate Cox regression analyses were performed. Univariate Cox regression showed that the four-lncRNA signature, AJCC stage, progesterone receptor, estrogen receptor, treatment therapy, radiotherapy and subtype were each associated with the outcome of patients with breast carcinoma (Table 2). In the multivariate Cox regression analysis, the factors with independent prognostic significance for breast carcinoma patients were the four-lncRNA signature and AJCC stage (Table 2). To explore whether the four-lncRNA signature maintained its prognostic capacity within the same AJCC stage, a stratified analysis was performed. According to AJCC stage, patients were categorized into two strata: early-stage (stage I/II, n=429) and late-stage (stage III/IV, n=148). Using the threshold from the training set, the four-lncRNA signature was used to further divide patients into high-risk and low-risk subgroups within each stratum. Kaplan-Meier plots showed that the survival rate of the high-risk group was significantly poorer than that of the low-risk group (Fig. 6a-c). Taken together, the four-lncRNA signature was an independent clinical prognostic biomarker for patients with breast carcinoma.
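A two-group comparison of the kind shown in the Kaplan-Meier plots can be sketched with a log-rank test. The source does not name the test behind its Kaplan-Meier P values; the log-rank test is a common choice and is assumed here. The follow-up data are made up, and the function name `logrank_test` is an assumption.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value).

    times_*:  follow-up time per patient
    events_*: 1 if death was observed, 0 if censored
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical stratum: high-risk patients die early, low-risk patients late.
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
```

In a stratified analysis, the test would be run separately within the early-stage and late-stage strata, each time comparing the high-risk and low-risk subgroups.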
Functional characteristics of four prognostic lncRNAs
To explore the potential mechanisms of the four lncRNAs in the tumorigenesis of breast carcinoma, functional enrichment analysis was applied. Because lncRNAs can act as cis-regulators of their neighboring protein-coding genes (PCGs), Pearson correlation analysis was used to calculate the correlations between the four lncRNAs and PCGs. In total, 1,446 genes were closely correlated with at least one of the four lncRNAs (Pearson's correlation coefficient > 0.4, P < 0.05). Functional enrichment analysis suggested that the lncRNA-correlated PCGs were mainly enriched in 362 gene ontology (GO) terms and 17 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (P < 0.005); these GO terms were further clustered into functional categories (Fig. 7a, b). The distribution and expression changes of these genes in the top 10 functional categories and the top 10 KEGG pathways are displayed in Fig. 7c and d. Collectively, the four prognostic lncRNAs could be crucial regulatory factors for breast carcinoma-related signaling pathways through their interactions with PCGs.
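The correlation filter can be sketched as follows. This is a simplified illustration: it applies only the reported coefficient cutoff of 0.4 and omits the P < 0.05 criterion, and the gene names and expression vectors are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant vector as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def correlated_genes(lncrna_expr, gene_expr, r_cutoff=0.4):
    """Names of genes whose correlation with the lncRNA exceeds the cutoff."""
    return [
        gene
        for gene, values in gene_expr.items()
        if pearson_r(lncrna_expr, values) > r_cutoff
    ]

# Hypothetical expression of one lncRNA and three genes across five samples.
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    "GENE_C": [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated
}
selected = correlated_genes(lncrna, genes)
```

Repeating the filter for each of the four lncRNAs and taking the union of the selected genes would give a gene set of the kind passed on to GO and KEGG enrichment.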