Data source and processing
Initially, we obtained a total of 14143 lncRNA expression and 19659 gene expression profiles from 1053 breast cancer samples and 111 normal samples. In addition, the corresponding clinical data of 986 patients were downloaded from TCGA. The immune-related gene list downloaded from the ImmPort database contained 1534 immune related genes (Table S1). We then obtained 937 immune-related lncRNAs through co-expression analysis of the immune gene list (p<0.001). The top 10 positively and negatively correlated lncRNAs are shown sorted by correlation coefficient in Table 1.
Identification of immune-related lncRNAs and construction of the prognostic model
The data of breast cancer patients was allocated randomly to the training and validation cohort, 494 patient samples in the training cohort, 492 patient samples in the validation cohort. We carried out univariate Cox regression analysis on the expression profiles of the lncRNAs in the training set and obtained 15 candidate immune-related lncRNAs, significantly associated with survival, p<0.01(Figure 1a,Table2). We performed Lasso regression on these lncRNAs. In order to avoid over fitting of the predicted signal, the prediction accuracy wasestimated by tenfold cross-validation (Figure 1b,c). A total of 8 immune-related lncRNAs were obtained, including OTUD6B-AS1(HR=1.698; 95% CI 1.066-2.707, p=0.026), AL122010.1(HR=0.404; 95% CI 0.209-0.782, p=0.007), AC136475.2(HR=0.596; 95% CI 0.369-0.964, p=0.035), AL161646.1(HR=1.215; 95% CI 0.954-1.549, p=0.115), AC245297.3(HR=0.710;95% CI 0.450-1.121, p=0.142), LINC00578(HR=1.269; 95% CI 1.001-1.609, p=0.049), LINC01871(HR=0.657; 95% CI 0.448-0.964, p=0.032), AP000442.2 (HR=0.335; 95% CI 0.101-1.115, p=0.075) (Figure 1d, Table 3). So OTUD6B-AS1, AL161646 and LINC00578 were risk factors with HR >1, while AL122010.1, AC136475.2, AC245297.3, LINC01871 and AP000442.2 were protective factors with an HR <1. The expression of all the 8 immune-related lncRNAs in breast cancer was shown in the Figure S1 and Table S2, then we compared their expression between cancer samples and normal samples(Figure S2). Except for AL161646.1 and LINC00578(p<0.0001), lncRNAs, including OTUD6B-AS1, AC245297.3, AC136475.2, AL122010.1, AL161646.1 and LINC01871, were high expressed between breast cancers and normal tissues. Some of the correlation between the lncRNAs and immune genes were shown in the Figure S3. The risk score for each sample was calculated based on the expression levels of these 8 lncRNAs. Risk score = (0.53* OTUD6B-AS1) + (- 0.91* AL122010.1) + (- 0.52* AC136475.2)+(0.20* AL161646.1)+(- 0.34* AC245297.3) + (0.24* LINC00578) + (- 0.42* LINC01871) + (- 1.10* AP000442.2) (Table3).
The immune-related lncRNA model is a robust prognostic tool for breast cancer
Breast cancer patients were divided into low- and high-risk groups according to the median risk score in the training set. Figure 2a presents the result of the Kaplan-Meier test. The p value of the log-rank test was 1.215e−06, indicating that patients in the low-risk group had a 10 year longer median OS compared with the high-risk group. To assess the accuracy of the prognostic model, further examinations in both the testing set and the whole set were performed with the same algorithm cutoff. Both sets yielded similar results. The low-risk group exhibited remarkably better overall survival (OS) than the high-risk group, which indicated that the prognostic signature was effective (p=0.0069 in the validation set; p=1.233e−07 in the total set) (Figure 2b and 2c).
The ROC curve analysis of the model in the training set demonstrated its promising predictive value for breast cancer survival (1-year AUC = 0.812, 5-year AUC = 0.772, 8-year AUC = 0.81, 10-year AUC = 0.857, Figure 2d). We then validated the model in the testing set, and the 1-year AUC was 0.615, the 5-year AUC was 0.599, the 8-year AUC was 0.68, and the 10-year AUC was 0.655 (Figure 2e). As for the total cohort, the 1-year AUC was 0.725, the 5-year AUC was 0.678, the 8-year AUC was 0.742,and the 10-year AUC was 0.741 (Figure 2f).
Principal component analysis (PCA) of the training, testing, and total breast cancer cohort demonstrated a different distribution pattern between the high- and low-risk groups, based on the expression of the 8 immune-related lncRNAs. This was indicative of the difference between the immune phenotypes of the groups (Figure S4).
Assessment of the correlation between candidate lncRNAs and clinicopathological characteristics
We generated risk curves and scatter plots to display the risk score and survival status of each breast cancer patient, not only in the training set, but also in the testing and in the total set. The risk coefficient and mortality in the low-risk group were lower than those in the high-risk group (Figure 3a-f).
Tumors with high prognostic scores expressed high-risk immune-related lncRNAs, whereas tumors with low prognostic scores expressed protective immune-related lncRNAs. The heatmap revealed that OTUD6B−AS1, AL161646.1, and LINC00578 were highly expressed in the low-risk group, while AC136475.2, AL122010.1, LINC01871, and AP000442.2 were highly expressed in the high-risk group (Figure 3g-i).
Further, in the overall sample, we analyzed the relationship between the expression of the 8 candidate lncRNAs and different clinicopathological factors (such as T, N, M, stage of 7th AJCC, molecular typing, etc.). Our results confirmed that differential expression of AC136475.2 (p<0.01), AL122010.1 (p<0.001), AL161646.1 (p<0.05), LINC01871(p<0.01) could be observed among different T grades (Figure 4a). The differences in expression of AC136475.2 (p<0.05), AL161646.1 (p<0.01), and OTUD6B−AS1 (p<0.05) were statistically significant between different N groups(Figure 4b). High expression of AL161646.1 (p<0.05) and LINC00578 (p<0.001) was observed in M1 group,while low expression of AC136475.2 was observed in M1 group (p<0.05) (Figure 4c). The differences in expression of AL122010.1 (p<0.001) and AL161646.1 (p<0.01) were statistically significant between different stage groups(Figure 4d). Except for AP000442.2 and OTUD6B−AS1, lncRNAs, including AC136475.2, AC245297.3 (p<0.001), AL122010.1 (p<0.001), AL161646.1 (p<0.001), LINC00578 (p<0.001), and LINC01871 (p<0.001), were differently expressed between different breast cancer molecular types(Figure 4e).
Evaluation of the immune-related lncRNA signature as an independent prognostic factor in patients with breast cancer
We carried out univariate and multivariate Cox regression analyses to verify that the model could serve as an independent prognostic factor for breast cancer, also accounting for certain clinicopathological variables (such as age, ER status, PR status, AJCC 7th T stage, etc.) (Figure 5). The univariate Cox analysis revealed that the high-risk group was significantly correlated with shorter survival in the training set (HR= 1.483; 95% CI 1.273-1.729,p<0.001), validation set (HR= 1.147; 95% CI 1.012−1.301, p = 0.032), and whole set (HR= 1.220; 95% CI 1.128−1.318, p<0.001). Multivariate Cox regression analyses of the above mentioned factors indicated that the immune-related lncRNA model was a reliable and independent prognostic factor for OS in the training set (HR= 1.432; 95% CI 1.204−1.702, p<0.001), validation set (HR= 1.162; 95% CI 1.004−1.345, p = 0.044), and whole set (HR=1.240; 95% CI 1.128−1.362, p<0.001). In the whole set, multivariate analysis revealed that age (HR= 1.040; 95% CI 1.013−1.067, p = 0.003) and PR status (HR= 0.401; 95% CI 0.173−0.931, p = 0.034) were independent prognostic factors for OS.
Gene set enrichment analysis for functional annotation of the immune-related risk signature
GSEA of the risk signature was performed using the GSEA software. The results revealed that immune-related responses were further enriched in low-risk groups compared to high-risk groups. We demonstrated 5 immune-related gene ontology terms in the GSEA results with FDR<0.25 (Figure 6), including the positive regulation of immune effector processes, positive regulation of the adaptive immune response, positive regulation of lymphocyte activation, regulation of T cell activation, and the T cell receptor signaling pathway.