LncRNA expression profile of NSCLC tissues detected by a custom microarray in a discovery cohort
The 366 NSCLC patients from Sun Yat-Sen University Cancer Center in South China were randomly divided into a discovery cohort and a validation cohort. The clinical characteristics of these patients are listed in Table 1. The lncRNA expression profile was first determined in lung tissues of 194 NSCLC patients and in 97 normal tissues in the discovery cohort using a custom lncRNA microarray containing 2412 human lncRNA probes. After background subtraction, normalization, and log transformation of the microarray data, the lncRNA expression profile was analyzed using the SAM program and Student's t-test. A total of 305 lncRNAs were differentially expressed between NSCLC and adjacent normal lung tissues (FDR = 0 and fold change > 1·25), of which 138 were up-regulated and 167 were down-regulated in the NSCLC tissues.
To assess whether the differentially expressed lncRNAs could distinguish lung cancer tissues from normal lung tissues, a hierarchical clustering analysis was performed on the 194 lung cancer samples and the corresponding normal lung tissues using the 305 differentially expressed lncRNAs in the discovery cohort. The result showed that a 41-lncRNA signature could distinguish NSCLC tissues from normal tissues with an accuracy of 96·44% (Supplementary Fig S1); only 10 samples (7 tumor samples and 3 normal samples) were misclassified by the 41-lncRNA signature. This suggests that these differentially expressed lncRNAs may play an important role in the development and progression of lung cancer.
Identification of a 4-lncRNA prognostic signature for NSCLC patients in the discovery cohort
To elucidate the prognostic significance of lncRNAs in NSCLC, univariate Cox regression analysis was performed on all 305 differentially expressed lncRNAs in the discovery cohort. At a threshold of P < 0·05, 15 lncRNAs were significantly associated with OS of the lung cancer patients (Table 2), of which 6 were high-risk (hazard ratio > 1) and 9 were protective (hazard ratio < 1).
Table 2
Summary of 15 lncRNAs associated with overall survival of NSCLC patients in the discovery cohort
No. | LncRNA | Weight | P value | HR (95% CI) | Putative function |
1 | BF768381 | 0·168 | 0·048 | 1·183 (1·001–1·390) | High-risk |
2 | DD3 | 0·212 | 0·035 | 1·236 (1·015–1·500) | High-risk |
3 | BF944729 | 0·228 | 0·045 | 1·255 (1·005–1·560) | High-risk |
4 | SRG1 | 0·439 | 0·006 | 1·552 (1·136–2·120) | High-risk |
5 | NEAT1 | 0·412 | 0·003 | 1·510 (1·154–1·970) | High-risk |
6 | Zeb2NAT | 0·344 | 0·019 | 1·411 (1·057–1·880) | High-risk |
7 | ASLNC03555 | -0·574 | 0·025 | 0·563 (0·342–0·920) | Protective |
8 | ASLNC09137 | -0·488 | 0·025 | 0·614 (0·401–0·940) | Protective |
9 | GSO_1539211_377 | -0·578 | 0·039 | 0·561 (0·324–0·970) | Protective |
10 | GSO_1539832_035 | -0·486 | 0·041 | 0·615 (0·386–0·980) | Protective |
11 | Lnc-GAN1 | -0·349 | 0·048 | 0·705 (0·499–0·990) | Protective |
12 | GSO_1539211_480 | -0·446 | 0·007 | 0·640 (0·463–0·880) | Protective |
13 | ASLNC11245 | -1·269 | < 0·001 | 0·281 (0·143–0·550) | Protective |
14 | BF375442 | -0·348 | 0·026 | 0·706 (0·520–0·950) | Protective |
15 | GSO_1539832_023 | -0·503 | 0·010 | 0·605 (0·412–0·880) | Protective |
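The Weight column in Table 2 appears to be the univariate Cox regression coefficient, in which case each hazard ratio should equal exp(weight). The following minimal sketch checks that relationship on a few rows copied from Table 2; the function and variable names are illustrative and are not part of the original analysis.

```python
import math

# (lncRNA, weight, reported HR) copied from Table 2
ROWS = [
    ("BF768381", 0.168, 1.183),
    ("NEAT1", 0.412, 1.510),
    ("Lnc-GAN1", -0.349, 0.705),
    ("ASLNC11245", -1.269, 0.281),
    ("GSO_1539832_023", -0.503, 0.605),
]


def hazard_ratio(weight):
    """Hazard ratio implied by a Cox regression coefficient."""
    return math.exp(weight)


def putative_function(weight):
    """A positive coefficient raises the hazard; a negative one lowers it."""
    return "High-risk" if weight > 0 else "Protective"


for name, weight, reported in ROWS:
    implied = hazard_ratio(weight)
    print(f"{name}: exp({weight}) = {implied:.3f} "
          f"(reported {reported}), {putative_function(weight)}")
```

Under this assumption, the implied and reported hazard ratios agree to about three decimal places for the rows shown.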
To confirm the reliability and repeatability of the microarray results, 5 of the 15 selected prognostic lncRNAs were evaluated by qRT-PCR in 30 pairs of samples randomly selected from the discovery cohort. Of the 5 lncRNAs, two (NEAT1 and XLOC_009261) were up-regulated and three (XLOC_005302, XLOC_001306, and lnc-GAN1) were down-regulated in the lung cancer tissues compared with the normal lung tissues. The cancer-to-adjacent-normal expression ratios of the 5 lncRNAs detected by qRT-PCR were consistent with the results obtained by microarray analysis (Fig. 1a), and significant correlations were found between the qRT-PCR and microarray data for all five lncRNAs (Fig. 1b-1f). These results indicate that the lncRNA expression levels detected by the lncRNA microarray are reliable and reproducible and can be used for further analysis.
To identify an optimal lncRNA combination (signature) for predicting survival outcome in NSCLC patients, the 15 survival-associated lncRNAs were used to establish a prognostic signature with a previously reported risk-score method.[26, 27] Using this method, the 4-lncRNA signature with the highest prognostic power was established, consisting of NEAT1, lnc-GAN1, ASLNC11245, and GSO_1539832_023. Based on the expression levels of the 4 lncRNAs measured by microarray, each weighted by its regression coefficient from the univariate Cox regression analysis, the risk score formula is as follows:
Risk score = (0·412 × NEAT1 level) + (-0·349 × lnc-GAN1 level) + (-1·269 × ASLNC11245 level) + (-0·503 × GSO_1539832_023 level).
A risk score was calculated for each patient using the risk-score formula, and patients were divided into high- and low-risk groups according to the median risk score. Kaplan-Meier survival analysis showed that patients in the high-risk group had markedly poorer OS and DFS than those in the low-risk group (Fig. 2a), suggesting that this lncRNA signature could be an effective prognostic signature for NSCLC patients.
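The risk-score calculation and median split described above can be sketched as follows. The weights are those of the discovery-cohort formula; the patient identifiers and expression values are hypothetical, and assigning scores strictly above the median to the high-risk group is an assumption, since the text does not state how a score equal to the median is handled.

```python
from statistics import median

# Regression coefficients from the discovery-cohort risk-score formula
WEIGHTS = {
    "NEAT1": 0.412,
    "lnc-GAN1": -0.349,
    "ASLNC11245": -1.269,
    "GSO_1539832_023": -0.503,
}


def risk_score(expression, weights=WEIGHTS):
    """Weighted sum of the expression levels of the signature lncRNAs."""
    return sum(w * expression[name] for name, w in weights.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    Assumption: scores strictly above the median are 'high'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression levels, for illustration only
patients = {
    "P1": {"NEAT1": 2.0, "lnc-GAN1": 1.0, "ASLNC11245": 0.5, "GSO_1539832_023": 1.0},
    "P2": {"NEAT1": 1.0, "lnc-GAN1": 2.0, "ASLNC11245": 1.0, "GSO_1539832_023": 2.0},
    "P3": {"NEAT1": 3.0, "lnc-GAN1": 0.5, "ASLNC11245": 0.2, "GSO_1539832_023": 0.5},
    "P4": {"NEAT1": 1.5, "lnc-GAN1": 1.5, "ASLNC11245": 0.8, "GSO_1539832_023": 1.2},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
for pid in patients:
    print(pid, round(scores[pid], 4), groups[pid])
```

Passing a different `weights` dictionary to `risk_score` would apply the same calculation with coefficients re-estimated in another cohort.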
Validation of the 4-lncRNA prognostic signature in NSCLC patients selected from a multicenter registry
The prognostic value of the 4-lncRNA signature identified in the discovery cohort was then validated in NSCLC patients from two different geographical areas, one serving as an internal validation cohort and the other as an independent validation cohort. The signature was first tested in the validation cohort (172 NSCLC samples) acquired from the same center as the discovery cohort in South China. These samples were profiled with the same lncRNA microarray as the discovery cohort, and a risk score was computed for each patient using the same risk-score formula. Based on the risk scores, patients were classified into high-risk and low-risk groups. Survival analysis showed that patients in the high-risk group had much worse OS and DFS than those in the low-risk group (Fig. 2b), consistent with the results obtained in the discovery cohort.
The 4-lncRNA prognostic signature was then tested in 73 additional NSCLC samples (an independent cohort) obtained from another medical center in Southwest China, in which the expression of the 4 lncRNAs was measured by qRT-PCR. Univariate Cox regression analysis was performed on the 4 lncRNAs, and a risk-score formula was derived using the same method as in the discovery cohort:
Risk score = (0·297 × NEAT1 level) + (-0·259 × lnc-GAN1 level) + (-0·706 × ASLNC11245 level) + (-0·153 × GSO_1539832_023 level).
The risk score for each patient in the independent cohort was calculated using this formula. The median risk score was applied as the cutoff point, and patients were categorized into high- and low-risk groups. As shown in Fig. 2c, OS and DFS of NSCLC patients in the high-risk group were significantly worse than those in the low-risk group, in concordance with the results obtained in the discovery and validation cohorts. These results show that the 4-lncRNA signature is significantly correlated with the prognosis of NSCLC patients from multiple centers in different geographical areas, suggesting that it is a new and powerful prognostic biomarker for NSCLC patients from different areas of China.
The 4-lncRNA prognostic signature is independent of the TNM staging system
To assess the clinical significance of the 4-lncRNA signature, its correlation with the clinical characteristics was first analyzed. The results show that the 4-lncRNA signature is not significantly correlated with any of the clinical characteristics in the three cohorts (Table 3), implying that the signature is independent of the clinical characteristics. A univariate Cox regression analysis was then carried out on the signature and the clinical characteristics. The results indicate that only the 4-lncRNA signature and the TNM stage are associated with OS (Table 4) and DFS (Table 5) of NSCLC patients in all three cohorts, providing further evidence that the 4-lncRNA signature is a prognostic factor. Finally, a multivariate Cox regression analysis was performed on the signature and all the clinicopathological variables. Both the 4-lncRNA signature and the TNM stage were significantly correlated with OS and DFS of patients in all three cohorts, whereas no other factor was consistently significant across the three cohorts (Table 6). The independence of the signature as a predictive factor for survival was further examined by a stratified analysis across the three clinical stages. Based on the risk score of the 4-lncRNA prognostic signature, patients in the same TNM stage (stage I, II, or III) were divided into high- and low-risk subgroups. NSCLC patients with high risk scores generally had significantly worse OS and DFS than those with low risk scores in stages I, II, and III (Fig. 3), indicating that the prognostic signature is independent of the TNM staging system. Taken together, these results indicate that the 4-lncRNA molecular signature is a powerful and independent prognostic factor for NSCLC patients.
Table 3
Clinical characteristics of NSCLC patients with high and low signature risk scores.
| Discovery cohort (N = 194) | | Validation cohort (N = 172) | | Independent cohort (N = 73) |
Characteristics | Low-risk | High-risk | P value | | Low-risk | High-risk | P value | | Low-risk | High-risk | P value |
| n (%) | n (%) | | | n (%) | n (%) | | | n (%) | n (%) | |
Age | | | | | | | | | | | |
≥ 60 | 52 (53·6) | 49 (50·5) | 0·706 | | 49 (57·0) | 46 (53·5) | 0·816 | | 19 (51·4) | 21 (58·3) | 0·493 |
< 60 | 45 (46·4) | 48 (49·5) | | | 37 (43·0) | 40 (46·5) | | | 18 (48·6) | 15 (41·7) | |
Gender | | | | | | | | | | | |
Male | 82 (84·5) | 72 (74·2) | 0·269 | | 67 (77·9) | 69 (80·2) | 0·374 | | 29 (78·4) | 23 (63·9) | 0·29 |
Female | 15 (15·5) | 25 (25·8) | | | 19 (22·1) | 17 (19·8) | | | 8 (21·6) | 13 (36·1) | |
TNM Stage | | | | | | | | | | | |
I | 46 (47·4) | 41 (42·3) | 0·637 | | 39 (45·3) | 35 (40·7) | 0·702 | | 13 (35·1) | 10 (27·8) | 0·518 |
II | 11 (11·3) | 21 (21·6) | | | 18 (21·0) | 16 (18·6) | | | 8 (21·6) | 12 (33·3) | |
III | 40 (41·2) | 35 (36·1) | | | 29 (33·7) | 35 (40·7) | | | 16 (43·2) | 14 (38·9) | |
Histological Type | | | | | | | | | | | |
ADC | 55 (56·7) | 40 (41·2) | 0·304 | | 39 (45·3) | 50 (58·1) | 0·297 | | 25 (67·6) | 22 (61·1) | 0·451 |
SCC | 39 (40·2) | 49 (50·5) | | | 40 (46·5) | 36 (41·9) | | | 12 (32·4) | 14 (38·9) | |
ADC/SCC | 3 (3·1) | 8 (8·3) | | | 7 (8·2) | 0 (0) | | | 0 (0) | 0 (0) | |
Tumor Size | | | | | | | | | | | |
< 5 cm | 59 (60·8) | 49 (50·5) | 0·332 | | 46 (53·5) | 51 (59·3) | 0·573 | | 13 (35·1) | 18 (50·0) | 0·197 |
≥ 5 cm | 38 (39·2) | 48 (49·5) | | | 40 (46·5) | 35 (40·7) | | | 24 (64·9) | 18 (50·0) | |
Differentiation | | | | | | | | | | | |
Well/Moderate | 58 (59·8) | 68 (70·1) | 0·402 | | 44 (51·2) | 61 (70·9) | 0·203 | | 19 (51·4) | 24 (66·7) | 0·31 |
Poor | 39 (40·2) | 29 (29·9) | | | 42 (48·8) | 25 (29·1) | | | 18 (48·6) | 12 (33·3) | |
Lymphatic metastasis | | | | | | | | | | | |
No | 47 (48·5) | 58 (59·8) | 0·257 | | 39 (45·3) | 42 (48·8) | 0·574 | | 21 (56·8) | 24 (66·7) | 0·297 |
Yes | 50 (51·5) | 39 (40·2) | | | 47 (54·7) | 44 (51·2) | | | 16 (43·2) | 12 (33·3) | |
Smoking History | | | | | | | | | | | |
No | 36 (37·1) | 43 (44·3) | 0·503 | | 29 (33·7) | 33 (38·4) | 0·692 | | 15 (40·5) | 18 (50·0) | 0·307 |
Yes | 61 (62·9) | 54 (55·7) | | | 57 (66·3) | 53 (61·6) | | | 22 (59·5) | 18 (50·0) | |
Family Cancer History | | | | | | | | | | | |
No | 84 (86·6) | 77 (79·3) | 0·396 | | 73 (84·9) | 78 (90·7) | 0·417 | | 36 (97·3) | 35 (97·2) | 0·664 |
Yes | 13 (13·4) | 20 (20·7) | | | 13 (15·1) | 8 (9·3) | | | 1 (2·7) | 1 (2·8) | |
Table 4
Univariate Cox regression analysis of the impact of the lncRNA signature and other clinicopathological features on OS in three NSCLC patient cohorts.
Parameters | Discovery cohort | | Validation cohort | | Independent cohort |
| Hazard Ratio (95% CI) | P Value | | Hazard Ratio (95% CI) | P Value | | Hazard Ratio (95% CI) | P Value |
Signature | | | | | | | | |
(high vs low) | 3·20 (0·58 − 1·65) | < 0·001 | | 2·84 (1·59 − 5·07) | < 0·001 | | 2·84 (1·59 − 5·07) | 0·009 |
Age | | | | | | | | |
(≥ 60 vs < 60) | 1·24 (0·73 − 2·09) | 0·417 | | 1·13 (0·67 − 1·91) | 0·33 | | 0·88 (0·36 − 2·13) | 0·782 |
Gender | | | | | | | | |
(male vs female) | 0·86 (0·48 − 1·52) | 0·619 | | 1·24 (0·63 − 2·41) | 0·05 | | 0·98 (0·37 − 2·57) | 0·978 |
TNM Stages | | | | | | | | |
(I vs II vs III) | 1·67 (1·28 − 2·19) | < 0·001 | | 1·70 (1·30 − 2·23) | 0·001 | | 1·74 (1·04 − 2·89) | 0·031 |
Histological Type | | | | | | | | |
(ADC vs SCC) | 1·30 (0·74 − 2·29) | 0·346 | | 1·39 (0·82 − 2·36) | 0·589 | | 0·53 (0·15 − 1·83) | 0·32 |
Tumor Size | | | | | | | | |
(≥ 5 cm vs < 5 cm) | 1·17 (0·69 − 1·98) | 0·545 | | 1·15 (0·68 − 1·95) | 0·017 | | 3·00 (1·00 − 9·00) | 0·048 |
Differentiation | | | | | | | | |
(Poor vs Well/Moderate) | 0·98 (0·58 − 1·65) | 0·951 | | 1·43 (0·85 − 2·42) | 0·079 | | 1·80 (0·74 − 4·32) | 0·188 |
Lymphatic metastasis | | | | | | | | |
(Yes vs No) | 1·44 (0·86 − 2·43) | 0·163 | | 0·73 (0·43 − 1·24) | 0·025 | | 1·38 (0·46 − 4·14) | 0·561 |
Smoking History | | | | | | | | |
(Yes vs No) | 0·91 (0·54 − 1·54) | 0·736 | | 1·63 (0·92 − 2·91) | 0·024 | | 1·11 (0·46 − 2·68) | 0·812 |
Family Cancer History | | | | | | | | |
(Yes vs No) | 1·04 (0·52 − 2·06) | 0·899 | | 1·18 (0·58 − 2·42) | 0·58 | | 0·47 (2·97 − 7·64) | 0·618 |
Table 5
Univariate Cox regression analysis of the impact of the lncRNA signature and other clinicopathological features on DFS in three NSCLC patient cohorts.
Parameters | Discovery cohort | | Validation cohort | | Independent cohort |
| Hazard Ratio (95% CI) | P Value | | Hazard Ratio (95% CI) | P Value | | Hazard Ratio (95% CI) | P Value |
Signature | | | | | | | | |
(high vs low) | 2·61 (1·50 − 4·56) | < 0·001 | | 3·21 (1·80 − 5·71) | < 0·001 | | 2·18 (1·10 − 4·34) | 0·025 |
Age | | | | | | | | |
(≥ 60 vs < 60) | 1·30 (0·76 − 2·21) | 0·33 | | 1·30 (0·76 − 2·21) | 0·51 | | 1·19 (0·60 − 2·35) | 0·599 |
Gender | | | | | | | | |
(male vs female) | 0·57 (0·33 − 1·00) | 0·05 | | 0·97 (0·53 − 1·78) | 0·945 | | 0·77 (0·37 − 1·60) | 0·496 |
TNM Stages | | | | | | | | |
(I vs II vs III) | 1·55 (1·18 − 2·04) | 0·001 | | 1·70 (1·29 − 2·25) | < 0·001 | | 1·46 (1·00 − 2·12) | 0·045 |
Histological Type | | | | | | | | |
(ADC vs SCC) | 0·83 (0·44 − 1·59) | 0·589 | | 1·48 (0·88 − 2·49) | 0·133 | | 1·02 (0·45 − 2·29) | 0·954 |
Tumor Size | | | | | | | | |
(≥ 5 cm vs < 5 cm) | 1·89 (1·12 − 3·22) | 0·017 | | 1·57 (0·94 − 2·61) | 0·082 | | 1·92 (0·92 − 4·03) | 0·081 |
Differentiation | | | | | | | | |
(Poor vs Well/Moderate) | 0·62 (0·36 − 1·05) | 0·079 | | 1·22 (0·73 − 2·04) | 0·427 | | 1·29 (0·65 − 2·57) | 0·453 |
Lymphatic metastasis | | | | | | | | |
(Yes vs No) | 1·82 (1·07 − 3·10) | 0·025 | | 1·72 (1·01 − 2·91) | 0·042 | | 2·13 (0·87 − 5·23) | 0·095 |
Smoking History | | | | | | | | |
(Yes vs No) | 0·54 (0·32 − 0·92) | 0·024 | | 1·15 (0·68 − 1·95) | 0·586 | | 1·00 (0·51 − 1·97) | 0·989 |
Family Cancer History | | | | | | | | |
(Yes vs No) | 0·80 (0·38 − 1·71) | 0·58 | | 0·69 (0·31 − 1·53) | 0·369 | | 1·23 (0·16 − 9·12) | 0·834 |
Table 6
Multivariate Cox regression analysis of the impact of the lncRNA signature and clinicopathological features on OS and DFS in three NSCLC patient cohorts.
Dataset | Parameters | Overall Survival | | Disease-free Survival |
| | Hazard Ratio (95% CI) | P Value | | Hazard Ratio (95% CI) | P Value |
Discovery | Signature | 3·18 (1·62 − 6·23) | 0·001 | | 2·17 (1·35 − 3·47) | 0·001 |
| Age | 1·07 (0·78 − 1·45) | 0·401 | | 1·03 (0·76 − 1·40) | 0·845 |
| Gender | 1·61 (0·57 − 4·51) | 0·365 | | 1·42 (0·54 − 3·73) | 0·474 |
| TNM Stages | 1·61 (1·09 − 2·15) | 0·008 | | 1·47 (1·06 − 2·05) | 0·022 |
| Histological Types | 0·99 (0·61 − 1·61) | 0·965 | | 0·94 (0·58 − 1·51) | 0·783 |
| Tumor Sizes | 0·91 (0·47 − 1·73) | 0·767 | | 0·86 (0·45 − 1·64) | 0·657 |
| Differentiation | 1·18 (0·78 − 1·78) | 0·43 | | 1·12 (0·75 − 1·68) | 0·587 |
| Pleural Invasion | 1·60 (0·85 − 3·01) | 0·147 | | 1·79 (0·96 − 3·35) | 0·068 |
| Vascular Invasion | 2·17 (0·75 − 6·28) | 0·154 | | 1·86 (0·65 − 5·34) | 0·248 |
| Smoking History | 2·94 (1·15 − 7·50) | 0·024 | | 2·60 (1·09 − 6·24) | 0·032 |
| Family Cancer History | 0·58 (0·25 − 1·37) | 0·218 | | 0·53 (0·23 − 1·25) | 0·146 |
Validation | Signature | 2·41 (1·47 − 3·97) | 0·001 | | 2·49 (1·53 − 4·05) | < 0·001 |
| Age | 1·14 (0·87 − 1·49) | 0·359 | | 1·13 (0·87 − 1·48) | 0·349 |
| Gender | 0·64 (0·32 − 1·27) | 0·201 | | 0·79 (0·41 − 1·55) | 0·498 |
| TNM Stages | 1·40 (1·03 − 1·91) | 0·031 | | 1·40 (1·04 − 1·88) | 0·026 |
| Histological Types | 0·92 (0·62 − 1·38) | 0·697 | | 0·94 (0·63 − 1·40) | 0·763 |
| Tumor Sizes | 1·41 (0·80 − 2·49) | 0·24 | | 1·27 (0·73 − 2·23) | 0·402 |
| Differentiation | 0·89 (0·59 − 1·32) | 0·552 | | 0·88 (0·60 − 1·31) | 0·537 |
| Pleural Invasion | 1·42 (0·85 − 2·39) | 0·185 | | 1·51 (0·90 − 2·52) | 0·116 |
| Vascular Invasion | 5·40 (1·73 − 16·8) | 0·004 | | 4·91 (1·59 − 15·17) | 0·006 |
| Smoking History | 0·54 (0·28 − 1·04) | 0·064 | | 0·52 (0·27 − 1·00) | 0·05 |
| Family Cancer History | 1·22 (0·66 − 2·28) | 0·521 | | 1·48 (0·81 − 2·70) | 0·206 |
Independent | Signature | 1·88 (1·15 − 3·08) | 0·012 | | 1·80 (1·14 − 2·84) | 0·012 |
| Age | 1·00 (0·70 − 1·42) | 0·988 | | 1·19 (0·86 − 1·66) | 0·294 |
| Gender | 0·80 (0·46 − 1·40) | 0·43 | | 0·70 (0·42 − 1·18) | 0·183 |
| TNM Stages | 1·80 (1·28 − 2·54) | 0·001 | | 1·69 (1·24 − 2·30) | 0·001 |
| Histological Types | 1·26 (0·74 − 2·13) | 0·395 | | 1·12 (0·69 − 1·84) | 0·64 |
| Tumor Sizes | 1·78 (1·06 − 2·97) | 0·028 | | 1·92 (1·19 − 3·09) | 0·008 |
| Differentiation | 1·66 (1·03 − 2·66) | 0·037 | | 1·81 (1·16 − 2·83) | 0·009 |
| Pleural Invasion | 1·26 (0·68 − 2·33) | 0·466 | | 1·59 (0·87 − 2·91) | 0·13 |
| Vascular Invasion | 1·75 (0·39 − 7·92) | 0·468 | | 2·81 (0·79 − 9·98) | 0·11 |
The 4-lncRNA signature provides additional prognostic information to the TNM staging system in NSCLC patients
In clinical practice, the traditional TNM staging system is the main approach for predicting the survival of patients with NSCLC and determining the treatment strategy. However, the TNM staging system is based mainly on anatomic information and does not include tumor biology factors. Therefore, this system is insufficient to predict survival outcome in NSCLC patients.[28] For example, Kaplan-Meier survival analysis on the three cohorts in this study showed that the TNM staging system could not effectively predict the prognosis of NSCLC patients in different stages, especially in stages I and II (Fig. 4). To improve the survival prediction of the TNM staging system, a new risk score model was established by combining the risk score of the signature with the TNM stage. Low- and high-risk cases were scored as 0 and 1, respectively, while stages I, II, and III were scored as 1, 2, and 3, respectively. Patients with a combined score of 1, 2–3, and 4 were classified as low-, medium-, and high-risk, respectively. Kaplan-Meier survival analysis was then performed on the patients with different combined risk scores in the three cohorts. The results showed a significant difference in OS and DFS among patients with low-, medium-, and high-risk scores in the discovery cohort (Fig. 5a), and these results were confirmed in the validation and independent cohorts (Fig. 5b-5c).
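The combined scoring rule described above can be written out as a short sketch. The scores and cutoffs are taken from the text; the function names and the string labels for stages and risk groups are illustrative choices.

```python
# Scores assigned in the text: signature group 0/1, TNM stage 1/2/3
SIGNATURE_SCORE = {"low": 0, "high": 1}
STAGE_SCORE = {"I": 1, "II": 2, "III": 3}


def combined_score(stage, signature_group):
    """Sum of the TNM stage score and the signature risk-group score."""
    return STAGE_SCORE[stage] + SIGNATURE_SCORE[signature_group]


def combined_risk_group(stage, signature_group):
    """Map the combined score to low (1), medium (2-3), or high (4)."""
    score = combined_score(stage, signature_group)
    if score == 1:
        return "low"
    if score in (2, 3):
        return "medium"
    return "high"  # only stage III with a high-risk signature reaches 4


for stage in ("I", "II", "III"):
    for group in ("low", "high"):
        print(stage, group, combined_score(stage, group),
              combined_risk_group(stage, group))
```

Under this rule, only stage I patients with a low-risk signature fall in the low-risk group, and only stage III patients with a high-risk signature fall in the high-risk group; all other combinations are medium-risk.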
ROC analysis was then performed to compare the accuracy of the TNM staging system and the combined risk model. The combined risk model achieved a significantly higher predictive accuracy than the TNM staging system for OS (AUC = 0·726 vs 0·644) and DFS (AUC = 0·723 vs 0·641) in the discovery cohort (Fig. 6a), and consistent results were observed in the validation and independent cohorts (Fig. 6b-6c). Together, these results indicate that the 4-lncRNA signature provides additional prognostic information and enhances the prognostic power of the TNM staging system.
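As an illustration of how two scoring systems can be compared by AUC, the sketch below computes the AUC as the probability that a patient with an event has a higher score than a patient without one, counting ties as one half. This treats the outcome as a simple binary label, which is an assumption; the text does not specify how the ROC curves were constructed from the survival data, and the toy scores and events are hypothetical.

```python
def auc(scores, events):
    """AUC via pairwise comparison of event vs non-event scores.

    events: 1 if the patient had the event (e.g. death), else 0.
    Ties between an event and a non-event count as 0.5.
    """
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only
events = [0, 0, 0, 1, 0, 1, 1, 1]
tnm_stage_score = [1, 1, 2, 1, 2, 2, 3, 3]   # stage I/II/III as 1/2/3
combined_model = [1, 1, 2, 2, 2, 3, 4, 4]    # stage score plus signature score

print("TNM stage AUC:", auc(tnm_stage_score, events))
print("Combined model AUC:", auc(combined_model, events))
```

In this toy example the combined score separates events from non-events better than the stage score alone, which mirrors the direction of the comparison reported above but says nothing about its magnitude.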