Screening prognostic lncRNAs in ESCC
We used 119 ESCC data from public GSE53624 as training group, 98 RNA-seq data as test group and 84 qPCR data as validation group. Detailed clinicopathological characteristics of all these ESCC patients were showed in Table 1. There were a total of 217 samples with lncRNA expression data including GSE53624 dataset and test dataset. From the two datasets, we found a total of 6253 lncRNAs were expressed in ESCC tissue. We then performed univariate cox analysis and Kaplan–Meier analysis to analyze the relationship between the lncRNA expression and ESCC OS. Based on the two ESCC profiling datasets and the corresponding clinical follow-up information, univariate cox analyses identified 368 lncRNAs in the GSE53624 and 290 lncRNAs in the RNA-seq dataset which were significantly associated with ESCC OS (Cox P < 0.05) and Kaplan–Meier analyses identified 386 survival associated lncRNAs in the GSE53624 and 312 survival associated lncRNAs in the RNA-seq dataset (log rank P < 0.05). Through comparing the above four groups of survival related lncRNA data, we found 19 identical lncRNAs significantly correlated with OS in two datasets (COX P < 0.05, log rank P < 0.05) (Fig. 1A). Subsequently, we found that 10 of the 19 prognostic-related lncRNAs showed a consistent risk trend in the four groups data, displaying a risk or protective role in ESCC(Supplementary table S1).
Constructing prognostic multi-lncRNA signature
Using 10 prognostic lncRNAs, we constructed 210-1 = 1023 signatures. In order to obtain a signature with the strongest prognostic ability, we performed multiple ROC analyses in the training dataset (Supplementary table S2). We screened the 5-lncRNA combination with the largest AUC value, namely the prognostic signature containing AC007179.1, MORF4L2-AS1, RP11-488I20.9, RP13-30A9.2, RP4-735C1.6 (Fig. 1B, Table 2). The lncRNA risk score was calculated as follows: (0.47 × expression value of AC007179.1) + (0.58 × expression value of MORF4L2-AS1) + (-0.47 × expression value of RP11-488I20.9) + (-0.64 × expression value of RP13-30A9.2) + (0.62 × expression value of RP4-735C1.6). As shown in Fig. 1C, the AUC of the screened lncRNA signature was 0.71.
Predictive power of the lncRNA signature for patients with ESCC
After calculated risk scores of ESCC patients in the GSE53624 dataset, the median risk classified ESCC patients into high- and low- risk groups (n = 59/60). Kaplan–Meier analysis found the survival of ESCC patients in the low-risk group were significantly longer than those in the high-risk group (median survival: 1.75 years vs. 4.01 years, log-rank test P < 0.001; Fig. 2A). The 3-year survival rate for patients in the high-risk group was only 23.73%, while that of patients in the low-risk group reached as high as 66.67%.
Then we evaluated the predictive ability of the five-lncRNA signature in ESCC RNA-seq dataset (n = 98). The median risk score of patients in the test dataset was calculated and used to divide patients into high- and low-risk groups. Figure 2B showed the Kaplan–Meier analysis result. The survival rate of ESCC patients in the low-risk group was significantly higher than patients in the high-risk group. (Median survival: 2.44 years vs. 3.97 years, log-rank test P = 0.014; Fig. 2B). The 3-year survival rate for the patients of high-risk group was only 40.82%, while that of the low-risk group was 61.22%.
In addition, we performed qPCR experiment with the primers (Supplementary table S3) in 84 ESCC tissues and obtained the five lncRNAs expression level. Through calculated risk scores and used the median risk score as the cutoff value, we obtained the high risk score group and low risk score group. Kaplan–Meier analysis discovered the survival difference between two risk groups. The 3-year survival rate of ESCC patients with low-risk scores was still significantly higher than patients with high-risk scores. (Median survival: 2.82 years vs. 4.08 years, log-rank test P = 0.011; Fig. 2C).
Figure 3 displayed the survival time, risk score and lncRNA expression level of each patients in training, test and qPCR validation datasets. As shown, the higher expression of AC007179.1, MORF4L2-AS1 and RP4-735C1.6, the higher the risk and the more deaths. On the contrary, patients with lower-risk scores tended to have higher expression level of protective lncRNAs (RP13-30A9.2, RP11-488I20.9) and have a lower mortality rate .
The five-lncRNA signature can predict the survival of ESCC patients independently
The Chi-square test found lncRNA signature was not associated with clinicopathological factors, including age, sex and pTNM stage in the training, test and qPCR validation groups (Table 3). Univariate cox analysis of the training, test and independent validation datasets showed TNM staging was associated with prognosis. The results of multivariable Cox regression analyses showed only the lncRNA signature was an independent prognostic factor for ESCC patients (High-risk group vs. Low-risk group, HR training = 3.50, 95% CI 2.13–5.76, P < 0.001, n = 119; HR test = 1.03, 95% CI 1.00–1.06, P = 0.014, n = 98; HR independent = 2.95, 95% CI 1.21–7.17, P = 0.017, n = 84, Table 4).
The prediction performance of the five-lncRNA signature and TNM stage
TNM stage is currently used as the main indicator for prognosis evaluation of ESCC. Our study confirmed that TNM staging and the five-lncRNA signature were related to the prognosis of patients. Therefore, we performed stratification analysis to explore the predictive power of the five-lncRNA signature in patients with known TNM stage. We first divided all the ESCC patients involved in this study (n = 301) into two groups, one group was patients with TNM low (I + II) stage and the other group was patients with high TNM stage (III + IV). Then we used the lncRNA signature to conduct stratification analyses of above two groups of patients. Kaplan–Meier results showed that the lncRNA signature could further separated the patients with TNM low (I + II) stage (Fig. 4A) or patients with high TNM stage (III + IV)(Fig. 4B) into two subgroups with significantly different survival(log-rank test P < 0.001).
To analyze the five-lncRNA signature's predictive performance advantages, TimeROC analyses were performed in the three datasets (n = 301). The AUCs of the signature were 0.690/0.648/0.639 at 5/3/1 years (Fig. 4C), while the AUCs of TNM were 0.643/0.654/0.637 at 5/3/1 years (Fig. 4D), suggesting the five-lncRNA signature was superior to TNM stage in terms of ESCC prognosis evaluation, especially at 5 years survival. Moreover, we found AUC values were maximized when signature and TNM staging were combined to assess patient prognosis (0.742/0.723/0.702 at 5/3/1 years, Fig. 4E), indicating the lncRNA signature improve the accuracy of TNM in evaluating prognosis.
Functional prediction of five prognostic lncRNAs
In order to predict function of the five prognostic lncRNAs, we performed GSEA analyses based on high-risk and low-risk groups in the GSE53624, and RNAseq datasets, respectively. The four-lncRNA risk signature was significantly enriched in 83 different GO categories and KEGG pathways in the GSE53624, and enriched in 385 GO categories and KEGG pathways in RNA seq dataset, simultaneously (P < 0.05, Supplementary Table S4), especially involved in RNA metabolism and immune response processes such as cell division, mRNA processing, ncRNA metabolic process, RNA splicing, activation of immune response, immune response regulating cell surface receptor signaling pathway, innate immune response, immune effector process (Fig. 5).