Identification of prognostic immune-related genes
The work flow of this study is delineated in Fig. 1. A total of 2600 genes were identified from datasets and literature search. And then, the gene expression profiles from 1369 patients with ER or PR positive and HER2 negative breast cancer identified in the METABRIC dataset were used to perform the univariate cox analysis. As a result, 102 genes were discovered to be significantly associated with overall survival.
Then we performed the Lasso Cox regression analysis to eliminate the redundant collinearity and further validate the robustness. As a result, 7 genes were identified in Lasso regression from the 102 genes. Then we did stepwise multivariate Cox regression analysis of the 7 genes in the METABRIC dataset and ultimately, a prognostic signature comprising these 7 genes, including SHMT2, AGA, COL17A1, FLT3, SLC7A2, ATP6AP1, and CCL19, were selected to construct a prediction model. As shown in the forest plot, SHMT2 and ATP6AP1 were risk factors, whereas the other five genes were protective factors.
The comprehensive risk score (RS) was imputed as follows: h0(t)*exp(β1X1 + β2X2 +.... + βnXn). The Nbclust and ConsensusClusterPlus analysis showed that the optimal number of clusters was two. Therefore, the median risk score was set as the cut-off value, patients were categorized as two groups, RS-low and RS-high groups. (Fig. 2)
Performance of Risk Signature in Luminal breast cancer from METABRIC and TCGA
In order to validate the prognostic predicting role of the RS-based group in different datasets, METABRIC, TCGA, GSE20685 and GSE 9195 cohorts were included. The RS performed consistently in predicting long-term outcomes, including overall survival, disease free survival or distant-metastasis free survival. (Fig. 3)
In subgroup analysis, the RS also showed excellent predicting ability in luminal A, luminal B, stage II, with or without lymph node, and different age groups in both METABRIC and TCGA datasets. (Fig. 3)
Multivariate Cox analysis showed that the risk signature was the independent prognostic factor in both METABRIC and TCGA cohorts, after adjusting for established prognostic variables including TNM stage, age and tumor grade.(Table.1–2)
The Association Between Risk Signature And Clinicopathological Parameters
Subsequently, we analysed the relationship between the RS and clinicopathological parameters. It is shown that the RS-high group accounted for the highest percentage (> 80%) in the HER2-positive disease, followed by Luminal-B and Basal subtype in both METABRIC and TCGA cohort.(Fig. 4)
In order to explore the relationship between other gene prognostic score and our risk signature. 70-gene prognostic score and tamoxifen resistance signature (TAMR13) were calculated by genefu package. The results demonstrated that both 70-gene prognostic score and TAMR13 were significantly higher in the RS-high compared with RS-low group.(Fig. 4)
Correlation Of The Risk Signature With Tumor Microenvironment
To investigate tumor immunity relevance of the risk signature, the association of the RS with tumor purity and the presence of infiltrating stromal/immune cells in tumor tissues were evaluated by applying the ESTIMATE and CIBERSORT algorithm. As showed in Fig. 5, the stromal- and immune- score were both significantly higher in the RS-low group. RS was also strongly correlated with several important immune-related genes such as IL33 and TGFBR2. (Fig. 5)
By applying the CIBERSORT algorithm, the relative proportions of 22 immune cell subsets of ER or PR negative and HER2 positive breast cancer in the TCGA and METABRIC datasets were estimated. Consecutively, compared with RS-high group, the infiltration levels of CD4 memory resting T cells was increased, while the level of macrophage M0 was decreased significantly in RS-low group.(Fig. 5)
We further investigated the prognostic values of the abundance of infiltrative immune cells. The high abundance of CD4 memory resting T cell was significantly associated with favourable prognosis (p < 0.001), while the high abundance of macrophage M0 was associated with unfavourable survival (p = 0.021).(Fig. 5)
Risk signature is associated with immune and cell cycle related phenotypes
In order to further characterize the phenotype contributing to the worse prognosis in the high-risk group, we firstly performed differential expressed gene (DEGs) analysis of RS-high versus low group in the METABRIC and TCGA dataset. Then we performed GSEA using the collection of the MsigDB for these DEGs. As a result,we found that in the RS-low group, the immune-related pathway such as GSE22886_NAIVE_BCELL_VS_NEUTROPHIL_UP and KEGG_CYTOKINE-CYTOKINE RECEPTOR_INTERACTION were significantly enriched, while in the RS-high group, the cell cycle-related pathway such as KONG_E2F3_TARGETS and ISHIDA_E2F_TARGETS were enriched. (Fig. 6)
The heatmap displayed the expression of the core genes that contribute to pathway enrichment between the two groups. Notably, those cell cycle-related genes were predominantly up-regulated in the high-risk group, while the immune-related genes were significantly up-regulated in the low risk group.(Fig. 6)
In order to explore the relationship between the enrichment of pathway and survival, we selected gene sets involving immune and cell cycle pathway. A total of 10 gene sets from the MsigDB were selected and the gene set enrichment score of each sample was calculated by the GSVA method. The activity of each gene set in each sample was estimated by the enrichment score. Then the univariate cox analysis were performed. As showed in Fig. 7, the enrichment of cell cycle-related pathway was significantly associated with worse survival, and the enrichment of immune-related pathway was significantly related with favourable survival.