Identication and validation of autophagy-related lncRNA prognostic signature for breast cancer

Background: Long noncoding RNAs (lncRNAs) are emerging as crucial regulators to the development of breast cancer and are involved in controlling autophagy. LncRNAs are also widely known as valuable prognostic factors for breast cancer patients. It is critical to identify autophagy-related lncRNAs with prognostic value in breast cancer. Methods: A coexpression network of autophagy-related mRNAs-lncRNAs from The Cancer Genome Atlas (TCGA) was constructed. Univariate and multivariate Cox proportional hazards analyses were used to identify an autophagy risk model with prognostic value. Kaplan-Meier analysis, univariate and multivariate Cox regression analyses and time-dependent receiver operating characteristic (ROC) curve analysis were performed to validate the risk model. Principal component analysis (PCA) and Gene Set Enrichment Analysis (GSEA) functional annotation were conducted to analyze the risk model. Results: In this study, autophagy-related lncRNAs in breast cancer were identied. We evaluated the prognostic value of these autophagy-related lncRNAs and eventually obtained a prognostic risk model consisting of 11 autophagy-related lncRNAs (U62317.4, LINC01016, LINC02166, C6orf99, LINC00992, BAIAP2-DT, AC245297.3, AC090912.1, Z68871.1, LINC00578 and LINC01871). The risk model was further veried as a novel independent prognostic factor for breast cancer patients based on the calculated risk score. Moreover, based on the risk model, the low risk and high risk groups displayed different autophagy and oncogenic statues. Conclusions: These ndings suggested that the risk model of the 11 autophagy-related lncRNAs has signicant prognostic value for breast cancer and might be a promising prognostic signature and therapeutic targets in clinical practice. (HER2) Metastasis (TNM) stage; node metastasis; distant metastasis; and risk score using the R package. Time-dependent receiver operating characteristic (ROC) curves the for survival through different clinical pathological factors and risk scores using the survivalROC R package.

Long noncoding RNAs (lncRNAs) are a class of transcript RNAs longer than 200 nucleotides without the capacity of protein-coding [10]. LncRNAs are considered as one of the most sensitive and speci c cancer biomarkers, which participate in the development and progression of various cancers at different levels including epigenetic, transcriptional and posttranscriptional regulation [11][12][13][14]. Moreover, accumulating studies have suggested that lncRNAs promote cancer progression and predict worse prognosis in numerous cancers via regulating autophagy [15][16][17]. Therefore, it is valuable to identify key lncRNAs closely related to autophagy and prognosis in breast cancer.
In the present study, we analyzed a dataset of lncRNA expression in breast cancers from The Cancer Genome Atlas (TCGA) and screened out autophagy-related lncRNAs with prognostic value. We identi ed an eleven autophagy-related lncRNA signature with the potential to predict the survival prognosis of breast cancer patients.

Patient data sets
Clinical information and pathology records of patients with breast cancer were taken from the TCGA (https://cancergenome.nih.gov/). The edgeR package was used to normalize gene expression. A total of 1053 TCGA female breast cancer patients with lncRNA expression pro les were utilized in the present study.
Among them, 986 patients with complete follow-up information and survival time ≥ 30 days and 539 patients with complete clinicopathological data were applied to subsequent analyses. The clinical characteristics are detailed in Table 1. To identify autophagy-related lncRNAs associated with survival, univariate Cox proportional hazards analysis was performed with p < 0.01 as the criteria and multivariate Cox analysis was subsequently performed to establish the optimal prognostic risk model based on the Akaike information criterion (AIC = 1444.62), using the Survival R package. Then, the risk score for each patient was calculated based on the following formula: Risk score = coef (lncRNA1) × expr (lncRNA1) + coef (lncRNA2) × expr (lncRNA2) +……+ coef (lncRNAn) × expr (lncRNAn) coef (lncRNAn) was de ned as the coe cient of lncRNAs correlated with survival. expr (lncRNAn) was de ned as the expression of lncRNAs.
Breast cancer patients in the TCGA were divided into a high risk group and a low risk group according to the median risk score. Kaplan-Meier survival analysis was conducted to evaluate the survival difference between the two groups using the Survival and survminer R packages.

Independent Prognostic Analysis And Roc Curve Plotting
Univariate and multivariate Cox regression analyses were performed to assess the relationship between survival prognosis and age; estrogen receptor (ER) expression; progesterone receptor (PR) expression; human epidermal growth factor receptor (HER2) expression; Tumor, Node, Metastasis (TNM) stage; tumor size; lymph node metastasis; distant metastasis; and risk score using the Survival R package. Timedependent receiver operating characteristic (ROC) curves were plotted to evaluate the predictive accuracy for survival time through different clinical pathological factors and risk scores using the survivalROC R package.

Statistical analysis
All statistical analyses were performed using R software (version 3.6.2). The correlation between the expression of autophagy-related lncRNAs and clinicopathological factors was analyzed by ggpubr R package. Principal component analysis (PCA) was performed for effective dimension reduction, pattern recognition, and exploratory visualization of high-dimensional data of the whole-genome, autophagyrelated encoding genes and the risk model of the autophagy-related lncRNAs expression pro les, respectively [18,19]. Gene Set Enrichment Analysis (GSEA) was used for functional annotation. GSEA (https://www.gsea-msigdb.org/gsea/index.jsp) is a powerful analytical approach for interpreting genome-wide expression pro les [20]. GSEA focuses on gene sets rather than just high scoring genes, which can detect biological processes such as several cancer-related pathways, metabolic pathways, transcriptional programs, and stress responses, and so on. GSEA tends to be more reproducible and more interpretable to analyze molecular pro ling data. Two-tailed p < 0.05 was considered statistically signi cant.
According to the risk score formula and the calculated median risk score, breast cancer patients were divided into a high risk group and a low risk group. Kaplan-Meier survival analysis showed that the high risk group presented worse overall survival (OS) than the low risk group (p = 3.048E-10) (Fig. 2a), suggesting that the risk score had prognostic value. The risk curve and scatterplot were made to illustrate the risk score and the corresponding survival status of breast cancer patients. The results showed that the higher the risk score was, the more mortality occurred (Fig. 2b-c). The heatmap of the expression of these 11 autophagy-related lncRNAs expression in breast cancer samples showed that C6orf99, LINC00992, Z68871.1 and LINC00578 were upregulated in the high risk group, while U62317.4, LINC01016, LINC02166, BAIAP2-DT, AC245297.3, AC090912.1 and LINC01871 were highly expressed in the low risk group (Fig. 2d). Therefore, these above studies identi ed 11 autophagy-related lncRNAs with prognostic signi cance for breast cancer. Evaluation of the risk model of the 11 autophagy-related lncRNAs as an independent prognostic factor for breast cancer patients Univariate and multivariate Cox regression analyses were performed to assess whether the risk model of the above 11 autophagy-related lncRNAs was an independent prognostic factor for breast cancer. The hazard ratio (HR) of the risk score and 95% CI were 1.507 and 1.308-1.736 (p < 0.001) in univariate Cox regression analysis (Fig. 3a), and 1.489 and 1.280-1.732 (p < 0.001) in multivariate Cox regression analysis (Fig. 3b), respectively, indicating that the risk model of the 11 autophagy-related lncRNAs was the most signi cant prognostic factor for breast cancer, independent of clinicopathological parameters such as age, ER expression, PR expression, HER2 expression, molecular subtypes, TNM stage, tumor size, lymph node metastasis and distant metastasis. To evaluate the predictive speci city and sensitivity of the risk score on the prognosis of breast cancer patients, the area under the ROC curve (AUC) of the risk score was estimated. The AUC of the risk score was 0.774, similar to the AUC of tumor size, followed by the AUC of age and more than the AUCs of other clinicopathological factors (Fig. 3c), suggesting that the prognostic risk model of the 11 autophagy-related lncRNAs for breast cancer was considerably reliable. All of the above results indicated that the 11 autophagy-related lncRNAs were signi cant independent prognostic factors for breast cancer patients.
Correlation of the expression of the 11 autophagy-related lncRNAs expression with clinicopathological factors To further investigate whether the 11 autophagy-related lncRNAs were involved in the development of breast cancer, we assessed the association of the expression of the 11 autophagy-related lncRNAs with clinicopathological factors. There was a signi cant correlation between the 11 autophagy-related lncRNAs and ER expression, PR expression, HER2 expression and molecular subtypes, as shown in Fig. 4.
Different stemness statuses in the low risk and high risk groups PCA was performed to compare the difference between low-risk and high-risk groups based on the risk model of the 11 autophagy-related lncRNAs, 395 autophagy-related encoding genes and the wholegenome expression pro les from the TCGA, respectively (Fig. 5). The results showed that the low risk and high risk groups were distributed in distinct directions, prior to the other two patterns, suggesting that the risk model could divide breast cancer patients into two parts and that the autophagy status of breast cancer patients in the high risk group differed from those in the low risk group. Functional annotation was further conducted using GSEA, and the results displayed that the differentially expressed genes between the high risk and low risk groups based on the risk model of the 11 autophagy-related genes were enrichment in autophagy processes and oncogenic signatures (Fig. 6). Above these, it all indicated that the low risk and high risk groups showed different autophagy and oncogenic statuses.

Discussion
In the eld of clinical treatment, although the OS of breast cancer patients has gained great improvements, metastasis and recurrence of breast cancer have constantly grown, which were the major causes of breast cancer mortality. Considerable evidence has revealed that the role of autophagy in the development of cancer is a double-edged sword. That is, autophagy may serve as either a pro-survival or pro-death mechanism under different circumstances [21,22]. Consistent with the theory, autophagy functioned dual roles in breast cancer. Autophagy induced the recurrence of metastatic breast cancer by elevating the survival of dormant breast cancer cells [8]. Cytostatic autophagy suppressed the proliferation and metastasis of triple-negative breast cancer cells [9]. Furthermore, numerous researches indicated the crucial role of lncRNAs in autophagy-inducing progression or inhibition of various cancers, such as hepatocellular carcinoma, lung cancer, breast cancer [23][24][25]. Based on the above, it led us to nd potential speci c lncRNAs associated with autophagy and the survival prognosis. In this study, we identi ed the risk model of the 11 autophagy-related lncRNAs as an independent prognostic factor for breast cancer. So far, among these 11 autophagy-related lncRNAs, only LINC01016 and LINC00578 have been studied in breast cancer or other cancers. LINC01016-miR-302a-3p/miR-3130-3p/NFYA/SATB1 axis plays a crucial role in the occurrence of endometrial cancer [26]. In addition, LINC01016 is a direct target of ERα, associated with positive survival prognosis of breast cancer [27]. It has been reported that LINC00578 is associated with poor OS in pancreatic cancer and lung adenocarcinoma [28,29]. In accordance with our results, LINC01016 was low-risk autophagy-related lncRNA and LINC00578 was high-risk autophagy-related lncRNA, both with prognostic value in breast cancer patients. Breast cancer is a complex disease and highly heterogeneous tumor [30]. To assess survival prognosis and guide individual therapeutic decisions, breast cancer has been divided into distinct molecular subtypes based primarily on the expression status of hormonal receptors such as ER, PR, HER2 and Ki67 (tumor proliferation index) as follows: luminal A/B (ER and/or PR positive), HER2 enriched (HER2 positive) and triple-negative breast cancer (ER, PR and HER2 negative) [31]. It is also well known that ER and/or PR positive indicates an effective endocrine therapy outcome and better survival prognosis; HER2 positive represents a more aggressive phenotype but is sensitive to HER2-targeted therapy; triple-negative is enriched with breast cancer stem cells and associated with a worse prognosis due to the lack of effective therapeutic targets [32,33]. Thus, distinct ER, PR and HER2 statuses indicated different biological processes of breast cancer and survival outcomes. The signi cant correlations of the 11 autophagyrelated lncRNAs with ER expression, PR expression, HER2 expression and molecular subtypes indicated that the 11 autophagy-related lncRNAs might be involved in regulating ER, PR and HER2 gene expression, further affecting molecular subtype formation. More interestingly, there were no signi cant relationship between most of the 11 autophagy-related lncRNAs and tumor size (T), lymph node (N) status, and TNM stage, indicating that the risk model of the 11 autophagy-related lncRNAs has no close correlation with the sooner or later for nding and diagnosis of breast cancer, but only strongly linked to the intrinsic biological characteristics of distinct subtypes of breast cancer. Our results also showed that the risk model of the 11 autophagy-related lncRNAs was the most statistically signi cant prognostic signature compared with other clinicopathological factors, and the ROC result con rmed that the risk model is reliable. Combined with the AUC of the risk model score, these results all indicated that the risk model of the 11 autophagy-related lncRNAs had superior prognostic value to other clinicopathological factors. The results of PCA and GSEA functional annotation illustrated that the high risk and low risk groups showed different distribution directions and aggregation centers based on the risk model, rather than the wholegenome pro les and autophagy genes expression, indicating that the signi cant difference in OS between the high risk and low risk groups might result from different autophagy and oncogenic statuses induced by the risk model of the 11 autophagy-related lncRNAs. These results also suggested that the risk model of the 11 autophagy-related lncRNAs represented more signi cant autophagy characteristics than autophagy-related genes. Taken together, these results suggested that the prognostic signature of the 11 autophagy-related lncRNAs might be a feasible independent prognostic factor for breast cancer in clinical practice. To date, a key challenge of precision genomic medicine is to make reliable and accurate predictions of clinical outcomes from high-dimensional molecular data [34]. To solve this problem, there have been some advances in the Cox regression with prognostic value in recent years. A Cox elastic net has been used in objective and data-driven feature selection with time-to-event data [35]. Cox-nnet is an arti cial neural network approach that has been utilized in predicting low-dimensional survival prognosis [36]. Bayesian-optimized deep survival models (SurvivalNet models) have successfully improved the accuracy of prognostic prediction for high-dimensional cancer genomic pro les [37]. In addition, Cox-nnet produces a better performance than SurvivalNet models, and SurvivalNet models provide comparable performance to the Cox elastic net [37,38]. Moreover, Cox-PASNet, which is a novel pathway-based sparse deep neural network for survival analysis that intergrats high-dimensional genomic data and clinical data, has been applied to identify signi cant prognostic factors [38]. However, our study has some limitations. We applied univariate and multivariate Cox proportional-hazards analysis to establish and estimate the prognostic value of the risk model of 11 autophagy-related lncRNAs. Although the method has been approved and employed in many researches, it is necessary to improve our further study with more advanced methodologies and technologies in future. To further validate our bioinformatics prediction results, in-depth studies on the 11 autophagy-related lncRNAs, including functional experiments and molecular mechanisms, are needed.

Availability of data and materials
All data utilized in this study are included in this article and all data supporting the ndings of this study are available on reasonable request from the corresponding author. Figure 1 Identi cation of autophagy-related lncRNAs with signi cant prognostic value in breast cancer. a The forest showed the HR (95%CI) and p-value of selected lncRNAs by univariate Cox proportional-hazards analysis. b A visualization coexpression network of the prognostic risk model of the 11 autophagy-related lncRNAs with autophagy-related mRNAs.   The correlation of the expression of the 11 autophagy-related lncRNAs with clinicopathological factors. a ER expression. b PR expression. c HER2 expression. d Subtypes (LuminalA/B; HER2 ampli cation; TNBC: triple-negative breast cancer). e TNM stage. f Tumor size (T1: <2cm; T2: ≥2cm and <5cm; T3: ≥5cm; T4: invasion of chest wall and/or skin). g N classi cation (N0: no lymph node metastasis; N1: 1-3 lymph node metastasis; N2: 4-9 lymph node metastasis; N3: ≥10 lymph node metastasis). h M classi cation (M0: no distant metastasis; M1: distant metastasis). ns: no statistical signi cance, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

Figure 5
The low risk and high risk groups displayed different autophagy statuses. a-c Principal components analysis (PCA) between low-risk and high-risk groups based on the whole-genome, autophagy-related encoding genes and the risk model of the 11 autophagy-related lncRNAs expression pro les.

Figure 6
Functional enrichment analysis based on the risk model of the 11 autophagy-related lncRNAs by GSEA.
Signi cantly enriched autophagy of Gene Ontology (GO), KEGG pathways and oncogenic signatures in the low risk group.