Identification of gradually up/down-regulated genes
Compared to normal breast tissue samples, a total of 5770 DEGs in stage I BCs, 6083 DEGs in stage II BCs, 5966 DEGs in stage III BCs, and 5855 DEGs stage IV BCs, a total of 4656 common DEGs were identified in stage I-IV BCs (Fig. 1A-B). We summarized 10 major variation tendencies of gene expression by STEM analysis in BC gene expression profiles in TCGA (Fig. 1C). Based on these tendencies, DEGs were classified into two groups: one group (BC1 cluster) containing gradually up-regulated genes with the development of BC, these genes showed an up-regulated tendency during the stage from Ⅰto Ⅳ (Fig. 1D-E), indicating the progression of BC, there were totally 476 up-regulated DEGs, named BC unfavorable gene set; while the second group (BC2 cluster) containing gradually down-regulated genes with the development of BC (Fig. 1F-G), which meant good prognosis, containing totally 262 down-regulated DEGs, named BC favorable gene set.
To identify BC DEGs associated with prognosis, we first used a univariate Cox regression analysis to assess the association between the expression levels of each DEG, and found that 26 genes were significantly associated with BC prognosis in BC unfavorable gene set (Supplemental Table 1; P < 0.01) and 14 genes in BC favorable gene set (Supplemental Table 2; P < 0.01).
Establishment and Evaluation of Prognostic Model Based on BC DEGs
Lasso regression analysis was performed to establish risk models concerning BC prediction and prognosis based on two characteristic gene sets of BC development. The results showed a total of 12 DEGs (Fig. 2A-D) of BC unfavorable gene set and 7 DEGs (Fig. 2E-H) of BC favorable gene set were significantly associated with BC prognosis. On the basis, two risk models were obtained, consisting of 12 genes and 7 genes respectively, named BC unfavorable model (BC1 model) and BC favorable model (BC2 model).
The ROC analysis was performed to evaluate the prognostic efficacy of two risk models. In BC1 model, the AUC was 0.80 (Fig. 2I), presenting great capability of the model for BC prognosis; while in BC2 model, the AUC was 0.66 (Fig. 2L). Compared with the BC2 model, BC1 model was more efficient in prognostic capability. We also carried out ROC analysis on BC patients for 1, 2, 3 years after diagnosis, the AUC was 0.80, 0.77, and 0.75, respectively in BC1 model (Fig. 2J), and 0.66, 0.69, 0.74 in BC2 model (Fig. 2M). The results showed that both BC1 and BC2 models could accurately predict the survival rate of BC patients, especially for BC1 model.
Additionally, we compared the efficacy of risk score, age, stage, T, M and N for BC prognosis. In BC1 model, the AUC of risk score was 0.80, which remained the highest among these clinical characteristics, illustrating that BC1 model was more efficient than other clinical factors for prognosis (Fig. 2K). Whereas in BC2 model, the AUC of risk score was 0.64, which was lower than those clinical characteristics, including 0.71 in age, 0.64 in stage, 0.70 in T, 0.55 in M and 0.62 in N (Fig. 2N).
Survival analysis was performed to evaluate the survival possibility of BC patients divided into high risk group and low risk group based on these two models. In BC1 model, the results of survival analysis manifested that, the high risk group survival possibility was lower than low risk group, and the gap between two groups getting much wider with the survival years getting longer (P < 0.001; Fig. 3A). The final survival time of high risk group was less than 22 years, shorter than the low risk group, which was 23.5 years. In the scattergraphs, we ranked BC patients by the risk scores evaluated with our model from low scores to high scores in the horizontal axis. As a result, the vertical axis showed their survival time, green scatters represented alive cases and red represented dead, the graphs demonstrated that more dead cases appeared in patients with high risk scores and more alive cases appeared in patients with low risk scores, the survival time of patients with low risk scores was longer than that in patients with high risk scores. As shown in Fig. 3B, the survival time in high-risk group was significantly lower than that in low-risk group using BC1 model. In BC2 model, the results were similar (Fig. 3C-D), indicating that these two models can accurately predict the survival rate of BC patients.
Association of risk score from BC models with clinical characteristics
Multivariate Cox regression analysis and Univariate Cox regression analysis were used to identify factors independently associated with BC prognosis. For both BC1 and BC2 models, P values of age, N and risk score were all less than 0.001, suggesting the factors age, lymphatic metastasis and the risk score were all independently associated with BC prognosis. These findings demonstrated that the risk score derived from our model was an independent prognostic predictor of BC patients (Fig. 3E-H).
The performance of the risk scores was tested with each clinical characteristic. Significant differences were observed between risk score and age (P < 0.001; Fig. 3I) in BC1 model, while risk score and tumor topography was significantly correlated (P < 0.001; Fig. 3J) in BC2 model.
In the heatmap, there were obviously more aged ( > = 60 years) in patients with high risk scores, yet there was no distinct regularity between risk scores and other features, which was consistent with the results of multivariate Cox regression analysis that age was independent factor correlated with BC1 model (Fig. 3I). In BC2 model, marked difference was observed between tumor topography and the risk score (Fig. 3J).
Immune infiltration analysis and drug susceptibility
To further explore the immune-related mechanism involved in BC patients, we performed the immune infiltration analysis using two BC prognosis models. It is strongly suggested that M2 macrophage was significantly correlated with BC1 and BC2 models (Fig. 4A-B). Moreover, more Macrophage M2 infiltration was remarkably higher in high risk group than in low risk group (Fig. 4C-D).
For BC chemotherapy, Bortezomib is a first-in-class selective and reversible proteasome inhibitor that targets the 26S proteasome (13). Previous studies have observed that Bortezomib combined with antiestrogen therapy might have therapeutic advantage in the management of early-stage breast cancer (14). Our study also found out that, patients in high risk group were more sensitive to Bortezomib than low risk group (Fig. 4E-F).
To further test these two models in cohorts from different populations, we assessed the testing power of two models of breast cancer development-related gene sets in GSE4922. The AUC of BC1 model was 0.79, higher than that of BC2 model, implying that the testing power of BC1 model was stronger than BC2 model (Fig. 4G-H).