Complete Identication of Autophagy-Related lncRNA Prognostic Signature for Breast Cancer

Purpose: Breast cancer (BC) has a relatively high morbidity and mortality for women. The research about BC prognosis is signicant. Autophagy is an essential process for tumor progression, which could play its role with lncRNA, a kind of ncRNA that have regulatory roles in multiple tumors. Therefore, constructing an autophagy-related prognostic model for breast cancer is meaningful. Methods: We download data from the TCGA and HADb. Pearson correlation analysis was performed to excavate autophagy-related lncRNA. Then with gene expression difference analysis, etc. we explored the relationship between clinical features and the signature. We applied Cytoscape as well as KEGG, etc. to explore expression condition. And the autophagy status of our signature was investigated by GSEA, etc. We also searched the immune distinction by CIBERSORTx to extend our study and preliminarily veried our study in the end. Results: Firstly, we got an independent autophagy-related lncRNA prognostic model, by which BC patients were divided into high- and low-risk groups. We found that the OS of high-risk group is signicantly lower than that of low-risk group, which was identical to those within various clinical subgroups. Then, the KEGG and GO analysis enriched several pathways including autophagy. PCA and GSEA analysis demonstrated the autophagy status. Several distinguishing immune cell types in separated groups were revealed by immunity analysis. Then the verication in the end proved the feasibility of our signature. Conclusion: In this study, we acquired an independent autophagy-related lncRNA signature involving 12 lncRNAs, which contributes to the prediction of prognosis of BC patients.

autophagy-related genes and its process involves abundant signaling pathways. The function of autophagy includes cell metabolism [Reggiori and Klionsky 2002, Poillet-Perez and White 2019], immune response [Cui, et al. 2019, Wu and Adamopoulos 2017], repair of damage [Boya, et al. 2018] and programmed cell death [Ou, et al. 2017]. Thus, autophagy is crucial to the basal homeostasis [He, et al. 2012]. Furthermore, it also plays important roles in cancer, not only as a suppressor, but also a promotor under different circumstances [Mowers, et al. 2018, Levy, et al. 2017]. Hence, the research about autophagy is quite necessary.
Long non-coding RNA (lncRNA) is one kind of non-coding RNAs (ncRNA) that has various functions in almost every activity of cells. More and more researches have shown that lncRNA plays essential roles in cancer progression, such as cell proliferation, apoptosis, metastasis and immune inhibition [Zhao, et al. 2019, Zhao, et al. 2018, Wei and Wang 2017]. With the discovery of autophagy, a large number of lncRNAs were found to be related with the process [Zhou, et al. 2020]. Thus, many scientists have shown a deep interest in this led and the signaling pathways about it is being completed gradually. There have been a large number of researches demonstrating the biomarker function of lncRNA. But the majority of these research claims single molecule as potential predicted marker. There are two articles about breast cancer and autophagy-related lncRNA prognostic model. But the research did not involve in-depth and extended exploration. So, in this study, we not only applied data in TCGA database, ltrating autophagy-related lncRNAs and construct a predictive signature to be a more systematic biomarker of breast cancer, but also verify its feasibility, universality and correlation with clinical features as well as extent our results to immune status. We believe that with the signature, there could be a new signi cant way to evaluate the prognosis of BC patients.

Methods And Materials
Acquisition of clinical and transcriptome data We searched and downloaded the data of transcriptome pro ling and clinical information of breast cancer patients in The Cancer Genome Atlas (TCGA) database. Fragments per kilobase of transcript per million (FPKM) from gene expression quanti cation were screened for our research. Then we separated and rearranged the data of lncRNAs and mRNAs. In addition, clinical information including age, stage, tumor depth (T) phase, lymph node (N) phase, metastasis (M) phase and follow-up outcomes were downloaded and settled for further research. By eliminating some incomplete data, we nally got the chart could be used in the research.

Selection of autophagy-related lncRNA
The autophagy-related gene expression data were downloaded from Human Autophagy Database (HADb). To select autophagy-related lncRNA, we performed Pearson correlation analysis and found autophagy co-expressed lncRNA with the correlation e ciency (|R| greater than 0.3 and P value less than 0.001). The statistical software R (version 3.6.0) were applied. And with the processed transcriptome data, we have got the information of lncRNA in BC patients. Then we picked the expression data of autophagy-related lncRNA for further research.

Construction of prognostic signature
Univariate Cox regression was used to explore the relationship between single autophagy-related lncRNA expression and prognosis of breast cancer patients with P value less than 0.05. Besides, we could get HR value to differentiated the dangerous or protective character of the lncRNA. Then multivariate Cox regression was performed to exclude interaction of lncRNAs and construct the predicted model for BC. And not only HR and its section, but also regression coe cients (β) and P values were demonstrated. So, we could get the risk scores of every patient with the expression of the lncRNAs and the formula exhibited as follows: risk score = coef(lncRNA1) × expr (lncRNA1) + coef(lncRNA2) × expr (lncRNA2) +…+ coef(lncRNAn) × expr (lncRNAn).

Prediction analysis of the prognosis signature
According to the median score, the patients were separated into high-risk group and low-risk group. Using the Kaplan-Meier analysis, we depicted and compared the survival curve of the two groups. Also, receiver operating characteristic (ROC) curve was depicted and area under curve were calculated to evaluate the e ciency of the signature. In addition, by Cox regression, we explored the in uence of several clinical factors and risk score for survival. Age, gender, T, N, M phase were involved. Because the lack of data of male patients, we only got the result of female. And we also detected the independency of the risk score by Cox regression. After that, we divided the samples into different subgroups according to the separation of several clinical factors and depicted the survival curve of those subgroups respectively. Then we searched the differentiated risk scores between the subgroups as well as the relationship among every target lncRNA and clinical factors by heat map and Student's t-test, proving the feasibility and universality of our signature.
Establishment of co-expression network and pathway analysis.
Expression data of autophagy-related lncRNA and mRNA were input into Cytoscape to construct coexpression network of lncRNA-mRNA. Then, a Sankey diagram was depicted to visualize the relationship among expression of autophagy gene, autophagy-related lncRNA and risk factors. Besides, Kyoto Encyclopedia Genes and Genomes (KEGG) and Gene Ontology (GO) analysis were used to excavate signaling pathways and the function of the target lncRNA. Principal component analysis (PCA) was performed to explore the differences of high-and low-risk groups based on autophagy-related lncRNA sets and genome-wide expression pro les. In addition, Gene set enrichment analysis (GSEA) have observed enrichment condition of gene sets in high-and low-risk groups respectively. The version of GSEA is v4.1.0, with P<0.05.

Analysis of mutant pro ling
We used water plot to investigate mutant information of BC patients and compared the results between high-and low-risk groups. The variant classi cation, type, SNV class were calculated to nd the rule of autophagy-related gene mutant, and gene cloud was depicted to visualize the mutant frequencies.
Differences of immune status in Groups divided by risk score By CIBERSORTx (https://cibersortx.stanford.edu/), the data from TCGA were input. Then the predictive results of various in ltrated immune cells were used to compare the immune difference between the highand low-risk groups. The analysis could investigate the association of the autophagy-related lncRNA signature and immunity, providing a new way of research.
Veri cation of target lncRNA in tumor and normal tissues Gene Expression Pro ling Interactive Analysiscompared the lncRNA expression data of tumor tissues and normal tissues. Then with heat map and box plot, we could clearly get the signi cantly distinct lncRNA and primarily verify the feasibility of our signature.

Statistical Analysis
R software was used for our statistical analysis. Pearson correlation analysis contributed to the relationship analysis. Kaplan-Meier analysis and log-rank test were performed for survival curve while Student's t-test for correlation among lncRNA and several clinical factors. Univariate and multivariate Cox Regression were implemented to construct the prognostic model and evaluate the independency of clinical factors and risk score. The signi cance was con rmed as P < 0.05.

Results
Acquisition of TCGA data and construction of a prognostic signature 1222 samples were downloaded from TCGA database, in which 1109 belonged to BC patients and 113 were from normal tissues. By excluding incomplete samples, we kept 1089 patients. A total of 14142 lncRNA data were collected from tumor tissues. According to the Human Autophagy Database, we downloaded autophagy regulatory genes and screened 1270 autophagy related lncRNAs by pearson correlation analysis. Then, by comparing overall survival (OS), the univariate Cox regression models identi ed 51 autophagy-related lncRNAs (Table S1, Fig. 1a), 12 of which were nally screened to construct prognostic signature by a multivariate Cox regression analyses (Table 1 2). Then we calculated and recorded the risk score of every sample. And according to the median risk score, the patients were divided into high-(n = 544) and low-risk groups (n = 545) (Fig. 1c). Then by assembling the patients' survivle time and drawing survival curve, we found the OS of the high risk group is signi cantly worse than the low risk group (Fig. 1b).
Meanwhile, as the risk score increases, the mortality of the patient increases to some extent (Fig. 1d). To verify the signi cance of the signature, we compared some clinical factors including age, gender, stage, T, N and M phase, and risk scores, respectively. By univariate Cox regression (Fig. e), we found that risk score is signi cant prognostic factors related to worse OS, while multivariate Cox regression (Fig. f) showed only age and risk score were independent prognostic factors. Besides, the area uder the ROC curve (AUC) equals to 0.882, which indicates its great predictive effect. All those results revealed that the signature is meaningful and is able to predict the prognosis of BC. The clinical information acquired from TCGA includes age, gender, stage, T, N and M phase. Then we divided the clinical features into several subgroups, which are age ( < = 65 and > 65), stage (I&II and III&IV), gender (only female was investigated because of the lack of data for male), T (T1-2 and T3-4), N (N0 and N1-3), and M (M0 and M1) phase, respectively. The results revealed that all the subgroups showed a signi cant superior survival time for low-risk groups compared to high-risk groups with P values < 0.05 (Fig. 2). So, our ndings indicated that the signature is suitable for multiple clinical factors, thus is universal under different circumstances and important in prognosis of BC.

Associated Relation Between The Clinical Factors And Autophagy-related Lncrnas
According to the assigned subgroups mentioned above, we further explored the expression of the 12 autophagy-related lncRNAs and the distribution of the clinical factors in high-and low-risk groups, which were represented by heat map (Fig. 3a). We found that between the two groups, stage (p < 0.001), N (p < 0.05) phase, M (p < 0.05) phase and survival condition (p < 0.001) were distinguished. Then by comparing the risk scores within the corresponding subgroups, we discovered that the patients with lower age (Fig. 3b), earlier T phase (Fig. 3c) or earlier N phase ( Fig. 3d) generally had lower risk scores with P values equals to 0.0048, 0.026 and 0.0041, respectively, in accordance with the fact and our prediction. However, there was no signi cant difference between the subgroups of M phase. Hence, we could draw a brief conclusion that in BC patients, the autophagy-related lncRNA signature may be associated with some clinical characteristics.

Construction Of Lncrna-mrna Co-expression Network And Function Analysis
Among all the functions of lncRNA, regulatory role in gene expression is one of the most important, also known as lncRNA-mRNA co-expression. As is shown in Fig. 5a, 27 mRNAs were found to be correlated with the autophagy-related lncRNAs, 19 of which were regulated by more than one autophagy-related lncRNAs. In addition, a Sankey diagram demonstrated the corresponding relationship among lncRNA, mRNA and risk factors (Fig. 5b). The co-expression network and Sankey diagram both showed that the center and major of the signature could be Z68871.1, AL109811.3, AC008105.3 and LINC01871. Except for LINC01871, the others are risk factors, thus we could roughly recognize that the signature is in direct proportion to risk, which is consistent with the nature of our signature. Then, to further search the correlated pathways, we applied and visualized KEGG enrichment analysis (Fig. 5c), which revealed that autophagy, neurodegeneration and shigellosis were the top 3 enriched pathways. Almost 40% of the coexpressed mRNAs located in autophagy-related signaling pathway. There were 88 pathways enriched in autophagy signaling pathway, including 65 pathways that exist in animal (Table S3). Besides, by Go analysis, we got 58 genes. And the main genes included ubiquitin-like protein ligase binding, ubiquitin protein ligase binding and protein serine/threonine kinase activity, etc. (Fig. S2, Table S2), which all act on cell autophagy process.
The essence of the samples compared in high-and low-risk group.
Since KEGG and GO analysis had found the signaling pathways regulated by autophagy-related lncRNAs, further research should be done to con rm the essence of the correlated genes. So, with the autophagyrelated lncRNA set and whole gene expression pro les, we have applied principal component analysis.
According to the distribution of the gene in high-and low-risk groups presented in Fig. 6a and 6b respectively, signi cant difference was observed between the two groups, showing a distinguishing autophagy-related gene pro les between the two groups. However, with genome-wide expression pro les, PCA demonstrated no distinct separation of the groups (Fig. 6c). Then to search for the pathways of the differentially expressed gene sets, we used GSEA analysis and found 43 enriched gene sets ((cutoff NOM P value < 0.05), within which 12 were enriched in high-risk group and 31 in low-risk group. Then we enumerated the top 10 gene sets of the two groups respectively (Fig. 6d, 6e) and observed extracellular matrix (ECM) receptor interaction, TGF-β signaling pathway, adhesion junction, focal adhesion and oglycan biosynthesis were enriched signaling pathways in high-risk group and autoimmune thyroid disease, arachidonic acid metabolism, intestinal immune network for IgA production, alpha-Linolenic acid metabolism and ether lipid metabolism in high-risk group.

Mutation pro les difference in high-and low-risk groups
Since the research about expression of the autophagy-related lncRNA and pathway enrichment analysis have been nished, we wanted to deeply explore the diversity in mutation pro le. So, 498 patients have been selected to high-risk group while 475 patients to low-risk group. With waterfall plots, we could get the information of mutated genes in each sample, as well as the proportion and type of mutation in total. (Fig. 7a, b) With more statistics of the data, we found that in both groups, missense mutation accounted for the largest fraction of mutation. Meanwhile, single nucleotide polymorphism (SNP) was most common type of mutation and the transversion C>T, C>G, C>A were the top 3 SNV classes. More details of variants per sample and variant classi cation summaries in both groups were shown simultaneously. We could nd that the median variants per sample in high-risk group is 32, comparing 26 in low-risk group (Fig. 7c, d), indicating that more mutants occur in high-risk group. According to the mutation rate, the top 10 mutated genes in high-risk group are TP53, PIK3CA, TTN, MUC16, GATA3, MAP3K1, CDH1, KMT2C,   MUC4 and USH2A, and in low-risk group are PIK3CA, TP53, TTN, CDH1, MAP3K1, MUC16, GATA3, MUC4, KMT2C and PTEN. Most top 10 mutated genes in both groups are the same except for USH2A in high-risk group and PTEN in low-risk group. Finally, we used gene cloud demonstrating the frequency of the mutation, making it more visualized according to the size of the gene (Fig. 7e, f).
Immunology status based on risk score.
Immune has become the hot spot ever since it was discovered. It is known to have close relationship with multiple tumor progression and to play crucial roles in tumor cell death escape. The GSEA analysis has enriched several immune-related pathways, so we compared immune status between separated risk groups by CIBERSORTx, trying to gure out the impact of the autophagy-related signature on immunity.
Heat map (Fig. 8a) and box plot (Fig. 8b) revealed that 9 of the in ltrated immune cell types had signi cant difference between high-and low-risk groups. Among them, we found that in high-risk group, the majority of immune cells increased, including CD8 + T cell (P <0.001), activated CD4 + B memory cell (P <0.001), macrophages M1 (P <0.001), Treg cell (p <0.01), T follicular helper cell (P <0.05), activated NK cell (P <0.05) and resting dendritic cell (P <0.05), accompanied with higher immune scores. Those cells most function as suppressors for tumor. Thus, it is revealed that in high-risk group, immune system may be more vibrant. Interestingly, macrophages M2 had higher amount in low-risk group, while macrophages M1 showed more numbers in high-risk group. Macrophage M2 is recognized as tumor associated macrophages that promotes tumor growth while M1 could prevent autophagy of cells. The results may be in accordance with the possible activated immunity of the patients with higher risk, ghting for cancer.
Feasibility veri cation of the 12 autophagy-related lncRNAs in tumor and normal tissues.
Until now, we have almost nished the construction and exploration of the prognostic model. To primarily verify the function of the 12 target lncRNAs in tumor, we compared the expression of those lncRNAs between 1109 tumor tissues and 113 normal tissues. By heat map (Fig. 9a) and box plot (Fig. 9b), we found the differentially expressed lncRNA Z68871.1, AL731571.1, AC061992.1, AL136368.1, LINC01871, ST7-AS1, AL122010.1, AL109811.3 and AC121761.2. Among them, lncAC061992.1 was signi cantly highly expressed in tumor tissues, and the rest in normal tissues by mean expression levels. However, our signature recognized lncAL109811.3 as a risk factor while AC061992.1 as protective factor in prognosis.
Hence, we could nd that lncAC061992.1 is associated with the occurrence of breast cancer but is a protector for the prognosis while lncAL109811.3 acts an opposite way.

Discussion
Currently, there have been comprehensive and standardized procedures for the diagnosis and treatment of breast cancer. Although the clinical classi cation of TNM stage and molecular subtype is relatively clear and widely used, further research about prediction of survival for BC patients is still needed. Therefore, the establishment of prognosis model has positive signi cance for the prediction of clinical survival. Many scientists have reported the function of lncRNA to the oncogenesis as well as development of breast cancer. For example, Sun, etc. [Sun, et al. 2020]reported that low expression of lncRNA-MALAT1 contributed to high 5-year overall survival of BC patients and is proved to be an independent prognosis factor. There is another research about lncRNA-miRNA-mRNA axis, claiming the ceRNA function of lncPCNAP1 in BC [Yu, et al. 2020]. And autophagy plays a crucial role in removing useless or even harmful wastes from cells and recycling energy. Previous studies revealed that autophagy may interact with lncRNA [Gu, et al. 2018], circRNA [Liang, et al. 2020] and cancer antigen ] and promotes the progression of BC. In this study, bioinformatics method was used to conduct a series of information statistics and analysis, thus an autophagy-related prognostic model of breast cancer was established, which has certain signi cance.
In this study, we rstly collected patients' information through TCGA database and acquired all the lncRNA recorded. Then, autophagy-related genes data were obtained from HADb. By Pearson correlation analysis and Cox regression, we ltered 12 autophagy-related lncRNAs which were associated with OS of BC. So prognostic formula was acquired with the regression coe cients. Then with the univariate and multivariate Cox regression analysis, we nally constructed an independent prognostic model.
Since the signature was established, we further detected its relation with some clinical characteristics.
Based on the grouping of patients' risk scores and the strati cation of clinical features, we can not only get the OS difference of the BC patients in high-and low-risk groups, but also nd the survival probability of the high-and low-risk groups and distinctions of risk score under different clinical subgroups. Moreover, we also observed that there were different expressions of several autophagy-related lncRNAs among clinical subgroups, in which T phase subgroups have the most associated lncRNAs while M phase subgroups contained no signi cant lncRNA. In addition, lncRNA AL136368.1, MAPT-AS1, SEMA3B-AS1, LINC01871, ST7-AS1 and AL122010.1 have been reported to be related with more than one clinical characteristic. In all, the results indicated that our signature is associated with certain clinical features and thus might be used to predict the patients' clinical condition.
Co-expression of lncRNA-mRNA is another key promoting the research process. We not only got the network of co-expressed lncRNA-mRNA and risk type, but also acquired signaling pathway enriched by KEGG and GO analysis. We found 4 main lncRNAs, Z68871.1, AL109811.3, AC008105.3 and LINC01871, 3 of which were risk factors for BC thereby were the basis of the signature. In addition, several pathways were enriched, and autophagy is the top one, indicating the close relation between our signature and the autophagy process. Moreover, the GSEA analysis screened 12 and 31 signaling pathway in high-and lowrisk groups respectively. The pathways were correlated with metabolism, cell adhesion, autophagy and immune, etc. Then, PCA analysis demonstrated that it might be the discrepant autophagy status divided by the signature that caused the different OS between patients in high-and low-risk groups. All the ndings have indirectly proved the autophagy property of our signature.
With that, the mutant information was explored and represented in Fig. 7. With gene sets analysis, we got the top 3 mutant genes in both groups, TP53, PIK3CA and TTN, which occupied more than 70% mutant in total and all are proved to have close relationship with BC progression [Duffy, Synnott and Crown 2018, Martinez-Saez, et al. 2020, Feng, et al. 2020]. The research not only demonstrated the mutant details, but also the frequencies of the gene sets, making it easier to understand the distinct gene pro les in high-and low-risk groups.
Distinguished from other similar research, we extended our study by comparing the abundance of multiple immune cells in high-and low-risk groups. We have found several distinct cell types, most of which were immune activator, observed increased in high-risk groups. With the research, we could draw a brief conclusion that the BC patients with high-risk scores were likely to have a more abundant immune cell types and higher immune scores, which may be caused by higher burden of tumor induced activation of the immune system.
Ultimately, veri cation is necessary. So, we investigated the expression of target lncRNAs between tumor tissues and normal tissues and found that 8 out of the 12 autophagy-related lncRNAs showed signi cant difference. From the research we also found the risk factors for prognosis might be the protector for tumor occurrence.
There are two articles about constructing autophagy-related lncRNA prognostic model, one of which was from Li, etc. [Li, et al. 2021],whose study is relatively brief, only involving the construction and exploration of prognostic model, pathway analysis and primary veri cation. The other is reported by zhang, etc. [Zhang, et al. 2020]. Although the research consists more experiments, the study did not further detect the survival differences between high-and low-risk groups within various clinical subgroups, or carry out preliminary veri cation of the obtained signature in the end. So, the experiment is also incomplete. Besides, by comparison, it is found that the AUC of our signature is 0.882, which has a better prediction e ciency than that of 0.788. Surprisingly, compared with the 9 autophagy-related lncRNA obtained in the paper, we only shared two jointly screened lncRNA, MAPT-AS1 and ST7-AS1, which were protective factors for BC patients in both of the prognostic models. So, the two lncRNAs might play crucial roles in the prognosis of BC.
This study is aimed at the risk strati cation of patients in line with autophagy-related lncRNA expression level to predict their survival prognosis. This paper not only is comprehensive, but also explored the immune status of patients in high-and low-risk groups, which may be of positive signi cance for future studies on the association between autophagy and immunity. However, there are still some improvable research in this paper. For example, our study did not investigate the difference among molecular subtyping of BC. In addition, there is a lack of multidimensional proof for the signature veri cation in this paper, and the few of data on male BC patients may lead to insu cient representativeness of some studies, so more data are needed to further verify the results of these experiments.
Not only lncRNA, there are plenty of non-coding RNAs that could play essential regulatory roles in autophagy process in tumors including but not limited to breast cancer [Liang,  . And in addition to tumor prognosis, they may also contribute to cancer screening, treatment and evaluation. Therefore, there is a lot of research space and potential clinical application value for this study.

Conclusion
Our study acquired an independent autophagy-related lncRNA signature, which had universality among

Competing interests
The authors declare that they have no competing interests Availability of data and materials The datasets generated and/or analysed during the current study are available in the TCGA and HADb   Comparison of survival time between high-and low-risk groups within the clinical subgroups. The survival difference between high-and low-risk groups within the subgroups of Age<=65 (a) and Age>65 (b), Stage I&II (c) and Stage III&IV (d), T1-2 (e), and T3-4 (f) phase, N0 (g) and N1-3 (h) phase, gender (i) as well as M0 (j) and M1(k) phase.  The expression level of the 12 autophagy-related lncRNAs changed with clinical factors. a-e represents age, stage, T, N and M phase, respectively. *p<0.05, **p<0.01, ***p<0.001. PCA and GSEA analysis showed the essence of the genes in high-and low-risk groups. According to PCA, there were separations of autophagy-related lncRNAs sets between the high-(a) and low-risk (b) groups.
(c) There was no signi cant difference of the genome-wide expression pro les between the two groups