Identification of TME subtypes
The infiltration patterns were analyzed with CIBERSORT deconvolution algorithms by quantifying the fractions of 22 immune cell types in TME of 1095 TCGA-BRCA patients. The landscape of immunological infiltration was exhibited in Fig. 1, Figure S2 and Table S2. Data on the estimated proportions of infiltrating cells indicated an evident heterogeneity, considering, an unsupervised clustering analysis was performed to determine the potential TME subtypes. On the basis of consensus clustering method, three robust TME clusters were retrieved, and the prognosis of patients from these curated subtypes was different with statistical significance (P = 0.023) (Fig. 2a-b). With the aim of validation, the bulk tissue gene expression profiles of 1525 patients from METABRIC database was also estimated for immune infiltrations and undergone a clustering analysis (Table S3), from which three clusters were obtained with statistical difference in OS (P < 0.001) (Fig. 2c-d), indicating this partition was stable for TME subtypes in terms of immunological infiltrations.
Signatures of TME phenotypes
To explore the contributing mechanisms for versatile TME phenotypes, we systematically conducted differential analysis to identify the DEGs and enrichment analyses to elucidate the biological profiles. A total of 20530 DEGs were identified through differential analysis, of which 664 genes significantly expressed heterogeneity among three immune-related TME subtypes, with 233 genes up-regulated and 431 genes down-regulated (Fig. 3a-b, Table S4). Among the identified DEGs, MMP9, SLC16A3, CDT1, CA9 and CDC20 were revealed to be the leading up-regulated, while P2RY12, GPR34, ABCB1, IGF1, and CX3CR1 were estimated to express down-regulated with the foremost significance.
Enrichment analyses, including GO, KEGG, and GSEA, were carried out to portray the landscape of biological profiles of DEGs, and illustrate the mechanism contributing to the differential phenotypes.
GO analysis demonstrated that the significant DEGs intensively mapped to the GO terms concerning cellular localizations including collagen-containing extracellular matrix, condensed chromosome, centromeric region, and spindle (Fig. 3c). Downstream analysis from KEGG suggested that DEGs significantly involved in the enriched pathways comprising neuroactive ligand-receptor interaction, PI3K-Akt signaling pathways, and cell cycle (Fig. 3d). GSEA results showed the DEGs, in terms of cancer hallmarks, were ensemble of cell cycle dysfunction, including E2F targets and G2M checkpoints, while were significantly muted in metabolism such as adipogenesis and myogenesis. Regarding immune-related signatures, the improved effectiveness of CD8 T lymphocyte was expected to facilitate the genesis of TME phenotypes (Fig. 3e). Built on cluster analysis, PPI networks were built to clarify the interactive functions of significant DEGs, and the corresponding relationships were shown with ranked degree (Fig. 3f).
Prognostic value of signature genes
Survival analysis was carried out using the KM method to identify the significant genes with predictive values for survival. The whole 20530 DEGs were successively assessed based on the corresponding expression from each patient, and a total of 2274 genes were exhibited a statistical significance for OS. Then, random-forest analysis and Lasso analysis were synchronously performed to further recognize the significant genes for prognosis, which discriminated 411 and 269 genes, respectively, and the intersected 44 genes were prepared for the following analysis (Fig. 4a-c). Next, we carried out univariate COX regression analysis for identification of the most significant variables for survival, and the panel with a total of 15 genes was finally determined (Fig. 5a-b, Table S5).
Subsequently, the pIRS was constructed based on the expression of 15 genes in combinations with the corresponding coefficient for each patient, and all cases were stratified into high pIRS group and low pIRS group. Survival analysis suggested that patients from these two groups exhibited a divergent clinical prognosis with statistical significance (P < 0.0001), which was in accordance with the result of the validation cohort (P < 0.0001) (Fig. 5c-d).
Comparative analysis for immunological phenotypes was performed, of which the results indicated that the abundance of B lymphocytes and CD8 T cells was notably higher in low pIRS group (Fig. 6a). Distributions of leukocyte infiltration among groups with diverse clinicopathological factors was shown in Figure S3. The expression levels of immunologic modulators were also evaluated between two groups with a total of 73 signature genes included[15] (Figure S4). Interactive correlations were shown in Fig. 6b, and the genes, in particular, associated with the immune checkpoint were undergone comparative analyses (Fig. 6c). Results from paired analyses showed that the expression of PDCD1 (PD-1) and ICAM1 was significantly higher in low pIRS group, while VEGFA presented an increasing trend of expression in high pIRS group.
Prediction model for prognosis of breast cancer
A total of 677 breast cancer patients with complete clinicopathological characteristics including age, pathological TNM stage (pTNM), molecular features, PAM50 subtypes, and pIRS group were adopted into the COX proportional model for quantitatively estimation of survival. Results from multivariate regression analysis demonstrated that the age, pTNM, and pIRS group were independent factors for prognosis of breast cancer, which were utilized to construct nomogram for the prediction of 3-year, 5-year, and 10-year survival probability (Fig. 7a, Table S6). Time-dependent ROC suggested that the time-dependent under curve area were ranging from 0.77 to 0.78, indicating the curated prognostic model was well performed (Fig. 7b), and the quantified C-index was 0.823 obtained from training cohort and 0.776 from validation cohort, respectively, which were generally higher than those computed from TNM staging system, revealing the robustness in addition to a better accuracy of this prediction model (Table S7).