Identification of DEGs between normal and BC tissues
The pyroptosis-related gene expression levels between 113 normal and 1109 BC tissues from TCGA database were compared, and 38 DEGs (P<0.01) were identified. Among them, 21 genes including NLRP7, GSDMC, AIM2, PYCARD, NOD2, NLRP6, IL18, BAX, GZMA, GSDMD, BAK1, CASP6, IRF1, CASP3, CYCS, CHMP4B, CHMP2A, CHMP4C, GSDMB, CHMP6, CASP8 were upregulated while 17 genes containing PLCG1, TIRAP, IRF2, GPX4, SCAF11, CHMP3, CASP4, CASP1, IL1B, NOD1, NLRP3, GSDME, PJVK, NLRP1, ELANE, TP63, IL6 were downregulated in the tumor group. The expression levels of these genes are shown in heatmap (Fig. 1A) (blue: low expression level; red: high expression level). We also created a protein–protein interaction (PPI) network diagram to show the interactions of these pyroptosis-related genes (Fig. 1B). The minimum required interaction score for the PPI analysis was set at 0.4 (the medium confidence). The number of edges in the network is 177 indicating that there are 177 interactions between these genes. The correlations network of all pyroptosis-related genes is shown in Fig. 1C (red: positive correlations; blue: negative correlations).
Tumor clustering based on DEGs
A consensus clustering analysis with all 1109 BC patients in the TCGA database was carried out, showing the relations between the expression of the 38 pyroptosis-related DEGs and BC subsets. We increased the clustering variable(K)from 2 to 10, and the results showed that when k=2, the intragroup correlations were the highest and intergroup correlations were low, suggesting that 1109 BC patients could be well divided into two groups according to 38 DEGs (Fig. 2A). The survival curve showed significant differences in OS between the two clusters (P<0.05) (Fig. 2B). Gene expression levels and clinical features including age (≤65 or>65 years), TNM and stages between the two clusters are shown on the heatmap (Fig. 2C), but no significant differences presented in clinical features between the two groups.
Identification of prognostic genes and construction of risk models based on TCGA database
We performed univariate Cox regression analysis on 1089 BC samples who have complete survival information in TCGA database, and 7 prognosis-related genes were identified. Among them, 6 genes (CHMP6, IL18, IRF1, IRF2, TP63, GZMA) were protective genes with HRs<1 while TIRAP(HR>1)was associated with high risk (Fig. 3A). By applying the Cox regression analysis, 5 characteristic genes (CHMP6, IL18, IRF2, TP63, TIRAP) were screened out to constructed prognostic model and the risk coef for each gene was calculated. 1089 BC patients were evenly assigned to the high- and low-risk groups based on the median risk score (Fig. 3B). The principal component analysis (PCA) demonstrated that patients could be well classified into high- or low-risk group (Fig. 3C). The survival chart showed that the number of deaths increased as the risk increased (Fig. 3D). We performed time-dependent receiver operating characteristic (ROC) analysis to evaluate the sensitivity and specificity of the prognostic model, the result showed that the area under the ROC curve (AUC) was 0.707 for 1-year, 0.669 for 3-year, and 0.663 for 5-year survival (Fig. 3E). Significant difference in OS between the high and low risk groups was presented (P<0.001)(Fig. 3F).
Independent prognostic value assessment of prognostic model
To evaluate whether the prognostic model based on gene signatures could be used as an independent prognostic factor, univariate and multivariable Cox regression analyses were applied. Both analyses demonstrated risk score as an independent prognostic factor (P<0.01)(univariate:HR=1.926, 95% CI: 1.516–2.447; multivariable: HR=2.024, 95% CI: 1.553–2.639)(Fig. 4A and 4B). In addition, a heatmap showed the distribution of clinical features and signature genes in the high- and low-risk groups according to TCGA (Fig. 4C), in which we found that patient age and T stage distributed differently between the two groups (P<0.01).
Functional enrichment analysis of prognostic models
By using R software, FDR<0.05, |log2FC|>1 as screening conditions, we identified 19 DEGs between high- and low-risk groups. All the DEGs were downregulated in high-risk group. GO enrichment analysis and KEGG pathway analysis based on these DEGs were carried out, suggesting that the DEGs mainly enriched in immune response, chemokine-mediated signaling pathway, inflammatory response, cellular response to tumor necrosis factor, B cell receptor signaling pathway, positive regulation of B cell activation, Cytokine-cytokine receptor interaction and Chemokine signaling pathway (Fig. 5A and 5B).
Comparison of immune function between high and low risk groups
The single-sample gene set enrichment analysis (ssGSEA) was used to assess the immune cell infiltration and immune pathway action of each sample between the high - and low-risk groups in TCGA data. The distribution of immune cells showed that most immune cells were reduced in the high-risk group, especially CD8+T cells, B cells, regulatory T (Treg) cells, T helper (Th) cells (Tfh, Th1, and Th2 cells) and natural killer (NK) cell (Fig. 6A). All immune pathways were reduced in the high-risk group (Fig. 6B).