Acquisition of TCGA data and construction of a prognostic signature
1222 samples were downloaded from TCGA database, in which 1109 belonged to BC patients and 113 were from normal tissues. By excluding incomplete samples, we kept 1089 patients. A total of 14142 lncRNA data were collected from tumor tissues. According to the Human Autophagy Database, we downloaded autophagy regulatory genes and screened 1270 autophagy related lncRNAs by pearson correlation analysis. Then, by comparing overall survival (OS), the univariate Cox regression models identified 51 autophagy-related lncRNAs (Table S1, Fig. 1a), 12 of which were finally screened to construct prognostic signature by a multivariate Cox regression analyses (Table 1, Fig. S1). Among these, lncRNA AC008105.3, Z68871.1 and AL109811.3 are risk factors for the prognosis. While AL731571.1, AC061992.1, AL136368.1, MAPT-AS1, SEMA3B-AS1, LINC01871, ST7-AS1, AL122010.1 and AC121761.2 are protective factors. With the accuired regression coefficients, the confirmed signature formula is: risk score = (0.286*ExpAC008105.3) + (0.461*ExpZ68871.1) + (-0.452*ExpAL731571.1) + (-0.270*ExpAC061992.1) + (-0.515*ExpAL136368.1) + (-0.285*ExpMAPT-AS1) + (-0.078*ExpSEMA3B-AS1) + (-0.333*ExpLINC01871) + (-0.322*ExpST7-AS1) + (-0.292*ExpAL122010.1) + (0.197*ExpAL109811.3) + (-0.377*ExpAC121761.2). Then we calculated and recorded the risk score of every sample. And according to the median risk score, the patients were divided into high- (n = 544) and low- risk groups (n = 545) (Fig. 1c). Then by assembling the patients’ survivle time and drawing survival curve, we found the OS of the high risk group is significantly worse than the low risk group (Fig. 1b). Meanwhile, as the risk score increases, the mortality of the patient increases to some extent (Fig. 1d). To verify the significance of the signature, we compared some clinical factors including age, gender, stage, T, N and M phase, and risk scores, respectively. By univariate Cox regression (Fig. e), we found that risk score is significant prognostic factors related to worse OS, while multivariate Cox regression (Fig. f) showed only age and risk score were independent prognostic factors. Besides, the area uder the ROC curve (AUC) equals to 0.882, which indicates its great predictive effect. All those results revealed that the signature is meaningful and is able to predict the prognosis of BC.
Table 1
Multivariate Cox regression of autophagy-related lncRNAs
lncRNAs | Beta | SE | HR | HR.95L | HR.95H | P value |
AC008105.3 | 0.286 | 0.181 | 1.33 | 0.93 | 1.90 | 0.114 |
Z68871.1 | 0.461 | 0.167 | 1.59 | 1.14 | 2.20 | 0.006 |
AL731571.1 | -0.452 | 0.201 | 0.64 | 0.43 | 0.94 | 0.024 |
AC061992.1 | -0.270 | 0.143 | 0.76 | 0.58 | 1.01 | 0.059 |
AL136368.1 | -0.515 | 0.252 | 0.60 | 0.37 | 0.98 | 0.041 |
MAPT-AS1 | -0.285 | 0.109 | 0.75 | 0.61 | 0.93 | 0.009 |
SEMA3B-AS1 | -0.078 | 0.036 | 0.92 | 0.86 | 0.99 | 0.029 |
LINC01871 | -0.333 | 0.092 | 0.72 | 0.60 | 0.86 | 0.000 |
ST7-AS1 | -0.322 | 0.194 | 0.72 | 0.50 | 1.06 | 0.097 |
AL122010.1 | -0.292 | 0.106 | 0.75 | 0.61 | 0.92 | 0.006 |
AL109811.3 | 0.197 | 0.088 | 1.22 | 1.02 | 1.45 | 0.026 |
AC121761.2 | -0.377 | 0.205 | 0.69 | 0.46 | 1.02 | 0.066 |
Stratified Analysis Of Prognosis Signature Based On Clinical Features
Clinical characteristics are important factors to evaluate the patients’ condition and predict the OS. So, we next investigated the relationship between risk scores and survival time under the same clinical condition. The clinical information acquired from TCGA includes age, gender, stage, T, N and M phase. Then we divided the clinical features into several subgroups, which are age ( < = 65 and > 65), stage (I&II and III&IV), gender (only female was investigated because of the lack of data for male), T (T1-2 and T3-4), N (N0 and N1-3), and M (M0 and M1) phase, respectively. The results revealed that all the subgroups showed a significant superior survival time for low-risk groups compared to high-risk groups with P values < 0.05 (Fig. 2). So, our findings indicated that the signature is suitable for multiple clinical factors, thus is universal under different circumstances and important in prognosis of BC.
Associated Relation Between The Clinical Factors And Autophagy-related Lncrnas
According to the assigned subgroups mentioned above, we further explored the expression of the 12 autophagy-related lncRNAs and the distribution of the clinical factors in high- and low-risk groups, which were represented by heat map (Fig. 3a). We found that between the two groups, stage (p < 0.001), N (p < 0.05) phase, M (p < 0.05) phase and survival condition (p < 0.001) were distinguished. Then by comparing the risk scores within the corresponding subgroups, we discovered that the patients with lower age (Fig. 3b), earlier T phase (Fig. 3c) or earlier N phase (Fig. 3d) generally had lower risk scores with P values equals to 0.0048, 0.026 and 0.0041, respectively, in accordance with the fact and our prediction. However, there was no significant difference between the subgroups of M phase. Hence, we could draw a brief conclusion that in BC patients, the autophagy-related lncRNA signature may be associated with some clinical characteristics.
To further explore the relationship between the expression of the respective 12 autophagy-related lncRNAs and clinical factors, we ulteriorly divided some clinical characteristics including stage (I, II, III and IV), T (1,2,3 and 4) and N (0,1,2 and 3) phase. Then we recorded the expression level of the lncRNAs in different subgroups and found that lncRNA AL731571.1, LINC01871, ST7-AS1 and SEMA3B-AS1 expressed differently in the subgroups about age (Fig. 4a). In regard to the stage, lncRNA AL136368.1, MAPT-AS1 and AL122010.1 were significantly distinguishing (Fig. 4b). And for T phase, lncRNA AL136368.1, MAPT-AS1, SEMA3B-AS1, LINC01871, ST7-AS1, AL122010.1 and AL109811.3 were associated with the tumor depth of BC (Fig. 4c). In addition, lncRNA SEMA3B-AS1was correlated with lymph node metastasis (Fig. 4d). While no lncRNA was found to express discrepantly in the groups divided by M phase (Fig. 4e). In sum, lncRNA AL731571.1, AL136368.1, MAPT-AS1, SEMA3B-AS1, LINC01871, ST7-AS1, AL122010.1 and AL109811.3 are approximately suppressors for BC, which are consistent with the results before. Therefore, these lncRNAs were proved to be correlated with some clinical factors and thus might be available biomarkers of BC.
Construction Of Lncrna-mrna Co-expression Network And Function Analysis
Among all the functions of lncRNA, regulatory role in gene expression is one of the most important, also known as lncRNA-mRNA co-expression. As is shown in Fig. 5a, 27 mRNAs were found to be correlated with the autophagy-related lncRNAs, 19 of which were regulated by more than one autophagy-related lncRNAs. In addition, a Sankey diagram demonstrated the corresponding relationship among lncRNA, mRNA and risk factors (Fig. 5b). The co-expression network and Sankey diagram both showed that the center and major of the signature could be Z68871.1, AL109811.3, AC008105.3 and LINC01871. Except for LINC01871, the others are risk factors, thus we could roughly recognize that the signature is in direct proportion to risk, which is consistent with the nature of our signature. Then, to further search the correlated pathways, we applied and visualized KEGG enrichment analysis (Fig. 5c), which revealed that autophagy, neurodegeneration and shigellosis were the top 3 enriched pathways. Almost 40% of the co-expressed mRNAs located in autophagy-related signaling pathway. There were 88 pathways enriched in autophagy signaling pathway, including 65 pathways that exist in animal (Table S3). Besides, by Go analysis, we got 58 genes. And the main genes included ubiquitin-like protein ligase binding, ubiquitin protein ligase binding and protein serine/threonine kinase activity, etc. (Fig. S2, Table S2), which all act on cell autophagy process.
The essence of the samples compared in high- and low-risk group.
Since KEGG and GO analysis had found the signaling pathways regulated by autophagy-related lncRNAs, further research should be done to confirm the essence of the correlated genes. So, with the autophagy-related lncRNA set and whole gene expression profiles, we have applied principal component analysis. According to the distribution of the gene in high- and low-risk groups presented in Fig. 6a and 6b respectively, significant difference was observed between the two groups, showing a distinguishing autophagy-related gene profiles between the two groups. However, with genome-wide expression profiles, PCA demonstrated no distinct separation of the groups (Fig. 6c). Then to search for the pathways of the differentially expressed gene sets, we used GSEA analysis and found 43 enriched gene sets ((cutoff NOM P value < 0.05), within which 12 were enriched in high-risk group and 31 in low-risk group. Then we enumerated the top 10 gene sets of the two groups respectively (Fig. 6d, 6e) and observed extracellular matrix (ECM) receptor interaction, TGF-β signaling pathway, adhesion junction, focal adhesion and o-glycan biosynthesis were enriched signaling pathways in high-risk group and autoimmune thyroid disease, arachidonic acid metabolism, intestinal immune network for IgA production, alpha-Linolenic acid metabolism and ether lipid metabolism in high-risk group.
Mutation profiles difference in high- and low-risk groups
Since the research about expression of the autophagy-related lncRNA and pathway enrichment analysis have been finished, we wanted to deeply explore the diversity in mutation profile. So, 498 patients have been selected to high-risk group while 475 patients to low-risk group. With waterfall plots, we could get the information of mutated genes in each sample, as well as the proportion and type of mutation in total. (Fig. 7a, b) With more statistics of the data, we found that in both groups, missense mutation accounted for the largest fraction of mutation. Meanwhile, single nucleotide polymorphism (SNP) was most common type of mutation and the transversion C>T, C>G, C>A were the top 3 SNV classes. More details of variants per sample and variant classification summaries in both groups were shown simultaneously. We could find that the median variants per sample in high-risk group is 32, comparing 26 in low-risk group (Fig. 7c, d), indicating that more mutants occur in high-risk group. According to the mutation rate, the top 10 mutated genes in high-risk group are TP53, PIK3CA, TTN, MUC16, GATA3, MAP3K1, CDH1, KMT2C, MUC4 and USH2A, and in low-risk group are PIK3CA, TP53, TTN, CDH1, MAP3K1, MUC16, GATA3, MUC4, KMT2C and PTEN. Most top 10 mutated genes in both groups are the same except for USH2A in high-risk group and PTEN in low-risk group. Finally, we used gene cloud demonstrating the frequency of the mutation, making it more visualized according to the size of the gene (Fig. 7e, f).
Immunology status based on risk score.
Immune has become the hot spot ever since it was discovered. It is known to have close relationship with multiple tumor progression and to play crucial roles in tumor cell death escape. The GSEA analysis has enriched several immune-related pathways, so we compared immune status between separated risk groups by CIBERSORTx, trying to figure out the impact of the autophagy-related signature on immunity. Heat map (Fig. 8a) and box plot (Fig. 8b) revealed that 9 of the infiltrated immune cell types had significant difference between high- and low-risk groups. Among them, we found that in high-risk group, the majority of immune cells increased, including CD8+ T cell (P <0.001), activated CD4+ B memory cell (P <0.001), macrophages M1 (P <0.001), Treg cell (p <0.01), T follicular helper cell (P <0.05), activated NK cell (P <0.05) and resting dendritic cell (P <0.05), accompanied with higher immune scores. Those cells most function as suppressors for tumor. Thus, it is revealed that in high-risk group, immune system may be more vibrant. Interestingly, macrophages M2 had higher amount in low-risk group, while macrophages M1 showed more numbers in high-risk group. Macrophage M2 is recognized as tumor associated macrophages that promotes tumor growth while M1 could prevent autophagy of cells. The results may be in accordance with the possible activated immunity of the patients with higher risk, fighting for cancer.
Feasibility verification of the 12 autophagy-related lncRNAs in tumor and normal tissues.
Until now, we have almost finished the construction and exploration of the prognostic model. To primarily verify the function of the 12 target lncRNAs in tumor, we compared the expression of those lncRNAs between 1109 tumor tissues and 113 normal tissues. By heat map (Fig. 9a) and box plot (Fig. 9b), we found the differentially expressed lncRNA Z68871.1, AL731571.1, AC061992.1, AL136368.1, LINC01871, ST7-AS1, AL122010.1, AL109811.3 and AC121761.2. Among them, lncAC061992.1 was significantly highly expressed in tumor tissues, and the rest in normal tissues by mean expression levels. However, our signature recognized lncAL109811.3 as a risk factor while AC061992.1 as protective factor in prognosis. Hence, we could find that lncAC061992.1 is associated with the occurrence of breast cancer but is a protector for the prognosis while lncAL109811.3 acts an opposite way.