Screening prognostic lncRNAsin ESCC
There were a total of 217 samples with lncRNA expression profiles including GSE53624 (n=119) and RNA-seq (n=98) dataset in this study. From the above two datasets, we found a total of 6253 lncRNAs were expressed in ESCC tissues. We then performed univariate cox analysis and Kaplan–Meier analysis to analyze the relationship between the lncRNA expression and ESCC OS. Based on the two ESCC profiling datasets and the corresponding clinical follow-up information, univariate cox analyses identified 368 lncRNAs in the GSE53624 and 290 lncRNAs in the RNA-seq dataset which were significantly associated with ESCC OS (Cox P < 0.05) and Kaplan–Meier analyses identified 386 survival associated lncRNAs in the GSE53624 and 312 survival associated lncRNAs in the RNA-seq dataset (log rank P < 0.05). Through comparing the above four groups of survival related lncRNA data, we found 19 identical lncRNAs significantly correlated with OS in two datasets (COX P<0.05, log rank P <0.05) (Figure 1A). Subsequently, we found that 10 of the 19 prognostic-related lncRNAs showed a consistent risk trend in the four groups, displaying a risk or protective role in ESCC (Supplementary table S1).
Constructing prognostic multi-lncRNA signatures
We used 119 ESCC data from public GSE53624 as the training group, 98 RNA-seq data as test group and 84 qPCR data as validation group. Using above selected 10 prognostic lncRNAs, we constructed 210-1=1023 signatures in the training dataset. In order to obtain a signature with the strongest prognostic ability, we performed ROC analyses in the training dataset (Supplementary table S2). We screened the 5-lncRNA combination with the largest AUC value, namely the prognostic signature containing AC007179.1, MORF4L2-AS1, RP11-488I20.9, RP13-30A9.2, RP4-735C1.6 (Figure 1B, Table 2). The lncRNA risk score was calculated as follows: (0.47× expression value of AC007179.1) + (0.58× expression value of MORF4L2-AS1) + (-0.47× expression value of RP11-488I20.9) + (-0.64 × expression value of RP13-30A9.2) + (0.62 × expression value of RP4-735C1.6). As shown in Figure 1C, the AUC of the screened lncRNA signature was 0.71.
Predictive power of the lncRNA signature for patients with ESCC
After calculated risk scores of ESCC patients in the GSE53624 dataset, the median risk classified ESCC patients into high- and low- risk groups (n = 59/60). Kaplan–Meier analysis found the survival of ESCC patients in the low-risk group was significantly longer than those in the high-risk group (median survival: 1.75 years vs. 4.01 years, log-rank test P<0.001; Figure 2A). The 3-year survival rate for patients in the high-risk group was only 23.73%, while that of patients in the low-risk group reached as high as 66.67%.
Then we evaluated the predictive ability of the five-lncRNA signature in ESCC RNA-seq dataset (n=98), which was treated as the test set. The median risk score of patients in the test dataset was calculated and used to divide patients into high- and low-risk groups. Figure 2B showed the Kaplan–Meier analysis result. The prognosis of ESCC patients in the low-risk group was significantly better than patients in the high-risk group (Median survival: 2.44 years vs. 3.97 years, log-rank test P=0.014; Figure 2B). The 3-year survival rate for the patients of high-risk group was only 40.82%, while that of the low-risk group was 61.22%.
In addition, we performed qPCR experiment and obtained the five lncRNAs expression level in 84 ESCC tissues. Through calculated risk scores and used the median risk score as the cutoff value, we obtained the high score group and low score group. Kaplan–Meier analysis discovered the survival difference between two risk groups. The survival of ESCC patients with low-risk scores was still significantly greater than patients with high-risk scores. (Median survival: 2.82 years vs. 4.08 years, log-rank test P=0.011; Figure 2C).
Figure 3 displayed the survival time, risk score and lncRNA expression level of each patient in above two ESCC lncRNA expression profile datasets, including GSE53624 and RNA-seq datasets. As shown, patients with the higher expression of AC007179.1, MORF4L2-AS1 and RP4-735C1.6, the higher the risk were with the more deaths. On the contrary, patients with lower-risk scores tended to have higher expression level of protective lncRNAs (RP13-30A9.2, RP11-488I20.9) and have a lower mortality rate.
The five-lncRNA signature can independently predict the survival of ESCC patients
The Chi-square test found the lncRNA signature was not associated with clinicopathological factors, including age, sex and pTNM stage in the training, test and qPCR validation groups (Table 3). Although univariate cox analysis of the training, test and independent validation datasets showed TNM stage and the signature were both associated with the prognosis of ESCC patients, multivariable Cox regression analyses showed only the lncRNA signature was an independent prognostic factor for ESCC patients (High-risk group vs. Low-risk group, HR training = 3.50, 95% CI 2.13–5.76, P < 0.001, n=119; HR test = 1.03, 95% CI 1.00–1.06, P = 0.014, n=98; HR independent = 2.95, 95% CI 1.21–7.17, P = 0.017, n=84, Table 4).
The prediction performance of the five-lncRNA signature and TNM stage
TNM stage is currently used as the main indicator for prognosis evaluation of ESCC. Our study confirmed that TNM stage and the five-lncRNA signature were related to the prognosis of patients. Therefore, we performed stratification analysis to explore the predictive power of the five-lncRNA signature in patients with known TNM stage. We first divided all the ESCC patients involved in this study (n=301) into two groups, one group was patients with TNM low (I+II) stage and the other group was patients with high TNM stage (III+IV). Then we used the lncRNA signature to conduct stratification analyses of above two groups. Kaplan–Meier results showed that the lncRNA signature could further separate the patients with TNM low (I+II) stage (Figure 4A) or patients with high TNM stage (III+IV)(Figure 4B) into two subgroups with significantly different survival(log-rank test P<0.001).
To analyze the predictive performance advantages of the five-lncRNA signature, TimeROC analyses were performed in the three datasets (n=301). The AUCs of the signature were 0.690/0.648/0.639 at 5/3/1 years (Figure 4C), while the AUCs of TNM were 0.643/0.654/0.637 at 5/3/1 years (Figure 4D), suggesting the five-lncRNA signature was superior to TNM stage in terms of ESCC prognosis evaluation, especially at 5 years survival. Moreover, we found AUC values were maximized when signature and TNM staging were combined to assess patient prognosis (0.742/0.723/0.702 at 5/3/1 years, Figure 4E), indicating the lncRNA signature improve the accuracy of TNM in evaluating prognosis.
Functional prediction of the five-lncRNAs signature
In order to predict function of the five-lncRNAs signature, we performed Pearson analysis to obtain the genes implicated in the correlation of these 5-lncRNA signature in the GSE53624 and RNAseq datasets, respectively. Then we got 988 co-expression genes in total (Pearson coefficient >0.4/<-0.4, P<0.001, Figure 5A). SubpathwayMiner suggested these 988 genes were significantly enriched in 73 different KEGG pathways (P < 0.05, Supplementary Table S4), especially involved in p53 signaling pathway, ErbB signaling pathway, Regulation of actin cytoskeleton, MAPK signaling pathway, PPAR signaling pathway VEGF signaling pathway, Pathways in cancer and Toll-like receptor signaling pathway (Figure 5B), which were ranked in the top 20 and were closely related to the cancer development.