Identification of high immunogenicity modules in TNBC by WGCNA
First, we performed differential mRNAs expression analysis between 123 TNBC tumors and 113 normal control in TCGA database. As a result, 6927 differentially expressed mRNAs were screened, of which 3584 genes were upregulated and 3343 genes were downregulated in the TNBC samples compared with normal samples (Fig. S1). Taking the intersection of DEGs with IRGs, 1139 differentially expressed IRGs were obtained, of which 690 genes were upregulated and 449 were downregulated in TNBC samples (Fig. 1a). Functional annotation and pathway analysis revealed that these genes were involved in the process of T cell activation, negative regulation of immune system and chemokine signaling pathway (Fig. 1b-1c). To obtain the high immunogenicity modules in TNBC, WGCNA analysis was performed based on the expression of DEIGs. The outliers’ sample was removed by sample clustering (Fig. S2). The optimal soft-thresholding power of 6 was selected determined by the scale-free network (Fig. 1d). A hierarchical cluster analysis was applied to detect co-expression clusters and 7 co-expression modules were found using dynamic tree-cutting methods (Fig. 1e). Among the 7 co-expression modules, DEIGs in five modules (magenta, green, pink, turquoise, grey) were higher expression in TNBC tumors than normal (Fig. 1f). Since these IRGs from the five modules were more enrichment in TNBC samples, indicating TNBC patients showed a higher expression of these IRGs. Finally, the expression of IRGs in the five high immunogenicity modules was extracted for further analysis.
Construction of distinct IRGs expression patterns in TNBC
We combined TCGA-TNBC and METABRIC-TNBC cohort to generate a final metacohort. Consensus clustering analysis was applied to classify TNBC samples into different IRGs patterns and two IRGs clusters were identified (Fig. 2a and Fig. S3). We term it as IRGcluster A-B and survival analysis showed that TNBC patients with IRGcluster B had a better overall survival (OS) than IRGcluster A (Fig. 2b). ssGSEA scores were calculated to quantify the abundances of 23 immune-infiltrating cells in different IRGs clusters. IRGcluster B subtype presented a higher abundance of activated CD8 T cell, activated dendritic cell, natural killer cells and so on (Fig. 2c). Heatmap demonstrated that the expression of IRGs was higher in IRGcluster B subtype compared to IRGcluster A (Fig. 2d). GSVA was conducted to explore the biological process in two IRGs clusters. As shown in Figure 2e, IRGcluster-B was enriched in KRAS signaling, TNFA signaling and interferon gamma response. Furthermore, immune-related pathways including natural killer cell mediated cytotoxicity, T cell receptor signaling, and antigen processing and presentation were highly enriched in IRGcluster-B subgroup (Fig. 2f). Together these results demonstrated that ICGcluster B was as a high immunogenicity TNBC subtype characterized by abundant immune cells infiltration and activated immune-related pathways.
Generation of IRGs gene signatures and construction of the immune-related gene panel (IRGP)
We further screened the DEGs between two IRGclusters. In total, 530 DEGs were recognized, and functional enrichment analysis demonstrated that these genes were involved in leukocyte mediate immunity, MHC protein complex and various immune processes (Fig. S4). Then, we performed univariate cox regression analysis and 266 prognostic IRGs were associated with OS in TNBC (Table S1). Unsupervised consensus clustering further classified TNBC into two IRG gene clusters (Fig. 3a). Kaplan-Meier curve illustrated that IRG-genecluster B was correlated with better OS compared with IRG-genecluster A (Fig. 3b). To construct an Immune-Related Gene Panel (IRGP) for predicting the OS of TNBC patients, we applied LASSO regression based on the expression of 266 prognostic IRGs and a three-gene signature was identified (Fig. 3c and Fig. 3d). The riskscore was calculated by the formula IRGP = expression level of TAPBPL *(-0.41) + expression level of FBP1*(-0.22) + expression level of GPRC5C*(0.26). Interestingly, a lower riskscore was observed in IRGcluster B and IRG-genecluster B (Fig. 3e and Fig. 3f). Alluvial diagram illustrated that IRGcluster B had a high overlap with the IRG-genecluster B subtype, which showed a lower IRGriskscore with a favorable survival prognosis (Fig. 3g).
Validation of the capacity of IRGP
Based on the median IRGriskscore, TNBC cohort was divided into low-risk and high-risk subgroup. TNBC patients were randomly classified as two datasets: the training
cohort (N = 211) and testing cohort (N = 210). Kaplan–Meier survival curves showed that TNBC patients with lower IRGriskscore had a longer survival time than those with higher IRGriskscore in both the training and testing cohort (Fig. 4a, b). ROC curves with AUC at 1 year, 3 year and 5 year also demonstrated the moderate accuracy of our IRGP (Fig. 4c, d). Distribution of risk scores between high risk and low risk groups was shown in Fig. 4e and Fig. 4f. The scatter plots demonstrated that an increase of deaths was accompanied by increasing risk score in both the training cohort and testing cohort (Fig. 4g, h). As shown in Fig. 4i, j, the expression of TAPBPL and FBP1 was lower in high-risk group, while a higher expression of GPRC5C was observed in high-risk group.
Prognostic value and genomic features in different IRGP subgroups
PCA and t-SNE analysis showed that IRGP subgroups presented obvious segregation in training cohort (Fig. 5a, b). A similar result was also observed in testing cohort (Fig. 5c, d). To investigate whether the effect of IRGP on OS was an independent factor for TNBC patients, univariate and multivariate Cox analyses for variables including age, tumor stage, tumor size, node status and IRGriskscore were performed. Univariable Cox regression analysis revealed that IRGriskscore was a high-risk factor for TNBC [HR =2.188, 95% CI (1.360, 3.522), p < 0.001, Fig. 5e]. Multivariate Cox analysis indicated that IRGP could serve as an independent prognostic factor [HR =2.216, 95% CI (1.384, 3.548), p < 0.001, Fig. 5f]. Then, we explored the genomic features in different IRGP subgroups of TCGA-TNBC cohort. The waterfall plot revealed that the low IRGriskscore group presented a decreased mutation burden compared with the high IRGriskscore group (Fig. 5g, h). Among the top 20 most significant mutated gene, TP53 (84% vs. 75%) had higher mutation rates in the high IRGriskscore group.
Immune characteristics of different IRGP subgroups
Immune checkpoints gene expression was often considered as indicators for ICIs treatment. Firstly, the expression of PD-1, PD-L1 and CTLA-4 was significantly higher in low IRGriskscore group compared to high IRGriskscore group (Fig. 6a-6c). IRGriskscore also had a significant negative correlation with the expression of PD-1, PD-L1 and CTLA-4 (Fig. 6d-6f). Immune cells landscape suggested that the proportion of CD8 T cells, B naïve cell, plasma cells, M1 macrophages and T helper cells were more abundant in the low IRGriskscore subgroup, while M2 macrophages were more abundant in the high IRGriskscore subgroup (Fig. 6g). Correlation analysis further suggested that IRGriskscore presented a positive correlation with M0 macrophages, M2 macrophages and T regulatory cells (Fig. 6h-6j). However, a negative correlation was found between IRGriskscore and activated NK cells, CD4 memory T cells, CD8 cells and M1 macrophages (Fig. 6k-6n). All in all, TNBC patients with low IRGriskscore had a higher abundance with tumor-infiltrating immune cells.
Relationship of IRGP subgroups with immune subtypes and IPS
As for the correlation between IRGP and clinicopathological parameters, we found from Fig. 7a that our IRGriskscore was significantly correlated with patents tumor size. Thorsson et al. reported the immune landscape of pan-cancer of TCGA database and defined six immune subtypes [28]. As shown in Fig. 7b, TNBC patients mainly belonged to the subtype of C1 (Wound Healing) and C2 ((IFN-γ Dominant). More C2 subtypes were enriched in low IRGriskscore group (p = 0.001, chi-square test). Our IRGP signature contains three immue genes: FBP1, GPRC5C and TAPBPL. As in Fig. 7c, FBP1 had a strong correlation with infiltration of resting dendritic cell, resting mast cells and T helper cells. The expression of GPRC5C was strongly associated with the infiltration of M2 macrophages, resting mast cells and activated CD4 memory T cells. Furthermore, a strong positively correlation was seen between the expression of TAPBPL and M1 macrophages, activated NK cells, activated CD4 memory T cells and CD8 T cells. We then used IPS to assess the potential clinical efficacy of ICIs in different IRGP groups. The higher IPS prediction score indicated a better response to ICIs therapy. There was no significant difference for PD-1 and CTLA-4 negative expression patients (Fig. 7d). However, low IRGriskscore might be a robust indicator for ICIs therapy in TCGA-TNBC cohort for patients with PD-1 or CTLA-4 positive expression (Fig. 7e-7g). Although TNBC patients with high IRGriskscore might not be the most appropriate population for ICIs therapy, our drug sensitivity analysis demonstrated that high IRGriskscore subgroup were more sensitivity to PARP inhibitors (ABT.888 and AZD.2281), doxorubicin, gemcitabine and methotrexate (Fig. S4).