Identification of DE-ARGs
Autophagy has been reported to contribute to TNBC progression. In this study, we want to construct a prognosis model using ARGs signature for TNBCs. The overall experimental design in this study was indicated as diagram (Figure 1). We first obtained the expression profiles contain 162 normal samples and 103 TNBC samples from TCGA database. A total of 43 differentially expressed ARGs (DE-ARGs) were identified by comparing normal and tumor samples with the cut off criteria of FDR< 0.05 and |log2 FC| >1. The volcano map (Figure 2.A), box plots (Figure 2.C), and heatmap (Figure 2.D) demonstrated that 21 ARGs were significantly down-regulated, while 22 ARGs were up-regulated in TNBC patients. These DE-ARGs interacted with each other forming an autophagy network as measured by string (Figure 2.B). Moreover, we observed many mutations occur on these DE-ARGs in TNBCs (Supplementary Figure 1).
Enrichment Analysis of DE-ARGs
To determine the functional enrichment of DE-ARGs, we performed GO and KEGG enrichment analysis. We found that these 43 DE-ARGs were highly correlated to autophagy, process utilizing autophagic mechanism and peptidyl serine modification in the term of biological process (BP). In the aspect of cellular components (CC), these genes were enriched in nuclear envelope, mitochondrial outer membrane and organelle outer membrane. For the molecular functions (MF), these genes were mainly concentrated in ubiquitin protein ligase binding, ubiquitin-like protein ligase binding and protein phosphatase (Supplementary Figure 2.A). Moreover, KEGG enrichment analysis indicated that the DE-ARGs were involved in the signaling pathways such as autophagy-animal, EGFR tyrosine kinase inhibitor resistance and apoptosis (Supplementary Figure 2.B). Overall, these data suggested that these ARGs play a role in other biological process in addition to autophagy.
Construction of a Prognostic ARG Signature of TNBC in the Train Set
To build a ARG prognostic model, we first analyze the risk score of all ARGs in TNBC by performing univariate Cox regression analysis. Eight ARGs were screened out including seven potential risky genes and one potential protective gene (Figure 3.A). Subsequently, we performed Lasso regression analysis on the basis of univariate Cox regression analysis (Figure 3.C, D). Then, we constructed the prognostic ARG signature for OS using CDKN1A, CTSD, CTSL, EIF4EBP1, TMEM74 and VAMP3 by Lasso regression. Finally, we performed multivariate Cox regression analysis and screened out four ARGs including three potential risk genes and one potential protective gene (Figure 3.B).
Next, we tested if the expression of these six ARGs were correlated with prognosis of TNBCs. We found that high expression of EIF4EBP1 (p=0.046), CTSL (p=0.009) and CTSD (p=0.07) might indicate a worse prognosis. There were no statistical differences in the survival analyses of CDKN1A (p=0.362), TMEM74 (p=0.017) and VAMP3 (p=0.0189) (Supplementary Figure 3). Then we want to validate whether this ARG signature can predict OS of TNBC. We first divided TNBC patients into “high risk” (n=50) and “low risk” (n=51) group according to the threshold of the median risk score (Figure 4.A). The risk score for each patient was calculated based on the formulate : risk score=(0.246026×CDKN1A)+( 0.359130×CTSD)+(0.234375×CTSL)+(0.590736×EIF4EBP1)+(-0.281261×TMEM74)+(0.338378×VAMP3).Patients were assigned to high-risk (n=50) and low-risk groups (n=51) according to the threshold of the median risk score. Patients with higher scores were more likely to poorer prognosis (Figure 4.C). A heatmap was used to visualize differences in expression levels of the six ARGs between groups (Figure 4.D). Survival curves further indicated that patients in high risk group showed a significantly lower probability of survival compared to low risk group (p < 0.05) (Figure 4.B). ROC analysis showed that the AUCs for 1-year, 3-years and 5-years OS were 0.925,0.866 and 0.784, respectively (Figure 4.E). These data suggested that ARGs signature in our model could benefit the prognosis prediction of TNBCs.
Validation of the Risk Score of ARG signature in a GEO Test Set
To further validate the prognostic and predictive role of ARGs signature, we employed another GEO cohort as a test set to calculate risk scores using the same formula used in the train set. The patients from of the test set were divided into the high-risk group (n=34) and low-risk group (n=73) by the median value of the train set (Figure 5.A), and higher risk score predicted poorer prognosis in the patients (Figure 5.C). A heatmap was presented to visualize the difference expression levels of the six ARGs between test groups (Figure 5.D). Similar to the train set, patients in the high-risk score group showed a poorer prognosis compared to low-risk group in the test set (p < 0.05) (Figure 5.B). Time-dependent ROC analysis showed that the prognostic accuracy of OS was 0.798 at 1 year, 0.564at 3 years and 0.696 at 5 years (Figure 5.E).
Independent Prognostic Indicator of the Prognostic Risk Model
To confirm whether risk scores can be used as an independent predictor for TNBC patients’ survival, we further performed univariate analysis in training set. Univariate Cox regression analysis revealed that N, T, stage and risk score were meaningful for predicting OS (Figure 6.A). Subsequently, we performed a multivariate Cox regression analysis to verify risk score as independent predictor (p<0.001) (Figure 6.B). Moreover, we identified that the expression of CTSD was significantly associated with stages (p=0.025) (Supplementary Figure 4.A) and T (p=0.031) (Supplementary Figure 4.B). These data demonstrated that our model could be a reliable prognostic predictor and biomarker in addition to known clinical classification.
Construction of the Nomogram and Performance Validation
To provide the clinician with a better quantitative method to predict prognosis of TNBC patient, we established a nomogram with multiple factors including N, T, stage and risk score (Figure 7.A). The nomogram was used to evaluate the survival probability of 1-, 3-, and 5-year. Nomograms showed a good performance with a high C-index of 0.764, suggesting that it could be served as an effective tool for prognostic evaluation of patients with TNBC. In addition, we constructed calibration curves which showed that the predicted and actual survival rates were in agreement with 1, 3, and 5 years (Figure 7.B-D). Finally, we compared the predictive accuracy for TNBC between the nomogram and clinicopathological risk factors by the values of AUC. Our model’s AUC value (AUC of 1-year, 3-year, 5-year OS) was higher than the traditional prognostic scoring systems (Figure 7.E-G). These findings revealed that the nomogram with our risk scores can improve predicting OS.
Enrichment Analysis between high-risk group and low-risk group
Finally, we performed Gene Set Enrichment Analysis (GSEA) between the high-risk group and the low-risk group in TCGA and GSE58812 cohort respectively to further provide biological insight. We found that the enriched KEGG pathways of high-risk group in TCGA cohort included Apoptosis, Fc epsilon RI signaling pathway, Glycosylphosphatidylinositol GPI anchor biosynthesis, Lysosome and Olfactory transduction. Meanwhile, enriched KEGG pathways of low risk group included Protein export, RIG-I like receptor signaling pathway, RNA polymerase, Taste transduction, Toll-like receptor signaling pathway (Supplementary Figure 5.A). In addition, KEGG enrichment pathway analysis of high-risk group in GSE58812 cohort indicated that the genes were enriched in ABC transporters, Arginine and proline metabolism, Lysosome, Pathogenic Escherichia coli infection and Pentose phosphate pathway. KEGG enrichment pathways analysis of the low-risk group in GSE58812 cohort were mainly concentrated in Regulation of autophagy, RIG-I like receptor signaling pathway, RNA degradation, Spliceosome and Vibrio cholerae infection (Supplementary Figure 5.B).
Knockdown of eIF4EBP1 inhibited TNBC cell proliferation and migration
We next want to test the biological function of these ARGs in our model in TNBCs. Among these six genes, the function of eIF4EBP1 in TNBCs remains unknown. We knockdown eIF4EBP1 using two independent siRNAs in two TNBC cell lines: MDA-MB-231 and BT549 (Figure. 8A). Depletion of eIF4EBP1resulted in dramatic decrease of cell growth and colony formation (Figure 8.B, C, D, F). Edu staining showed that knockdown of eIF4EBP induced a dramatically decrease in proliferation (Figure. 8E, H). In addition, eIF4EBP1 knockdown significantly impaired cell metastasis as measured by trans-well and wound healing assay (Figure 8.G, I, J, K). Furthermore, we observed increased eIF4EBP1 expression in primary TNBC samples compared to adjacent normal tissues in collected 3 TNBC patients (Figure 8.L). Based on the Human Protein Atlas database, the protein expression levels of eIF4EBP1 were evaluated by CAB005032 antibody. Among 12 TNBC tissues examined, 6 cases had medium to high staining (4 medium and 8 high), while no cases had low staining. Representative IHC image showed that eIF4EBP1 staining was higher in TNBC than in normal tissues (Figure 8.M). Overall, these findings suggest a potential oncogenic role of eIF4EBP1 in TNBCs supporting the importance of our prognosis model in TNBCs.