Immune scores and stromal scores are closely related to breast cancer subtypes but not associated with breast cancer stage
Our analysis covered a total of 1097 breast cancer patients with complete gene expression data and relevant clinical information from the TCGA database. To evaluate the TME, the stromal scores and immune scores of each patient were calculated by the ESTIMATE algorithm. The stromal scores ranged from -2164.14 to 2050.55, and the immune scores were distributed between -1724.88 and 3459.35. First, we assessed the relationships between stromal scores or immune scores and different stages of breast cancer. However, neither stromal scores nor immune scores showed significant differences in different stages of breast cancer (Figure 1A, 1B). Then, we explored whether stromal scores or immune scores were correlated with breast cancer subtypes. The average stromal scores of the basal-like subtype were the lowest of all four subtypes (Figure 1C); however, the average immune scores of the basal-like subtype ranked highest (Figure 1D). In addition, the immune scores of the HER2-enriched subtype were higher than those of the luminal B subtype (Figure 1D). Finally, we divided the scores into high and low groups and evaluated the survival probabilities of distinct subtypes in both groups of stromal scores and immune scores. Unfortunately, the high- and low-score groups did not show any survival differences in the four breast cancer subtypes (Figure S1).
Breast cancer gene mutations were correlated with stromal scores or immune scores
Some gene mutations are implicated in cancer susceptibility. Thus, we downloaded single-nucleotide polymorphism data on several conventional mutant genes in the clinic, such as BRCA1, BRCA2, CHEK2, PALB2, BRIP1, TP53, PTEN, STK11, CDH1, ATM, BARD1, MLH1, MRE11A, MSH2, MSH6, NBN, MUTYH, PMS1, PMS2, RAD50, and RAD51C. Then, the patients were divided into mutant and wild-type groups, and the distributions of stromal scores and immune scores were plotted based on the status of the mutant gene in breast cancer. When compared with the wild-type group, either the BRCA1 or BRCA2 mutant group exhibited no significantly difference in the stromal scores and immune scores (Figure 2A, 2B, 2F, 2G). However, CDH1 mutant cases had higher stromal scores and immune scores (Figure 2C, 2H). Although stromal scores of the PTEN/TP53 mutant and wild-type groups were not statistically significant (Figure 2D, 2E), PTEN/TP53 mutant cases exhibited higher immune scores than wild-type cases (Figure 2I, 2J). Other mutant genes, such as CHEK2, PALB2, BRIP1, STK11, ATM, BARD1, MLH1, MRE11A, MSH2, MSH6, NBN, MUTYH, PMS1, PMS2, RAD50, and RAD51C, had no significant differences in stromal scores or immune scores between the mutant and wild-type groups (data not shown). Thus, CDH1, PTEN and TP53 mutations were closely associated with stromal/immune scores in breast cancer.
Differential gene expression profiles with stromal scores and immune scores in breast cancer
To determine the correlation between comprehensive gene expression profiles and stromal/immune scores, we analyzed the 1097 cases acquired from the TCGA database. The heatmap shows the top 100 distinct gene expression profiles of cases belonging to the high vs. low score groups (Figure 3A, 3B). Based on the stromal score grouping, 2832 genes were upregulated and 253 genes were downregulated in the high vs. low stromal score groups. Similarly, 1930 genes were upregulated and 491 genes were downregulated in the high vs. low immune score groups (p<0.05, log2 FC>1). Figure 3C and 3D show the volcano plot of differentially expressed genes (DEGs) in high vs. low stromal/immune score groups.
Next, in order to comprehend the function of DEGs, we selected the upregulated DEGs for GO enrichment and KEGG pathway analyses. The top GO terms identified for the upregulated DEGs from the high vs. low stromal score groups were extracellular structure organization, collagen-containing extracellular matrix, and receptor ligand activity (Figure 4A). In addition, the top GO terms of the upregulated DEGs from the high vs. low immune score groups were T cell activation, external side of the plasma membrane, and receptor ligand activity (Figure 4B). Moreover, the top KEGG pathways of the upregulated DEGs from the high vs. low stromal score and high vs. low immune score groups were neuroactive ligand-receptor interaction and cytokine-cytokine receptor interaction, respectively (Figure 4C, 4D).
For deeply exploring the gene profiles of the top GO terms, we integrated the top 1 term of the BP, MF, or CC categories as gene datasets. Then, we screened the overlapping genes between upregulated DEGs and the gene datasets (Table S1). Venn diagrams showed that 225 genes overlapped between the upregulated genes from the high vs. low stromal score groups and the gene datasets (for convenience of description, we named these genes overlapping 1 genes), and 169 genes overlapped between the upregulated genes from the high vs. low immune score groups and the gene datasets (for convenience of description, we named these genes overlapping 2 genes) (Figure 4E, 4F).
The screened immune-related DEGs are associated with good prognosis in breast cancer
We performed a univariate survival analysis to reveal the relationship between the selected overlapping genes and prognosis in breast cancer patients from TCGA. Although the overlapping 1 genes did not show statistical significance in the OS analysis, the overlapping 2 genes exhibited highly significant differences in the prognosis of breast cancer, so we chose the overlapping 2 genes for further analysis. Using p value <0.05, 54 genes related to good prognosis were included. Then, a time-dependent ROC curve (area under the curve (AUC) >0.6) was employed to assess the prediction accuracy, and 31 genes with prognostic value in breast cancer remained. To better understand the interplay among the identified 31 genes, we performed PPI analysis, and Figure 5A shows the overall PPI network, which included 29 nodes and 110 edges (Figure 5A). Then, we selected the extremely critical module containing 15 nodes for further analysis (Figure 5B).
The 15 immune-related genes (CD226, KLRD1, KLRC4-KLRK1, IL2, KLRK1, ITK, SPN, SLAMF1, CD1C, FASLG, CD40LG, TBX21, IL7, LAT and ITGAX) in the TME that are associated with good prognosis are shown in Figure 6A-6O and Table 1.
Figure 7A-7O displays the 15-gene signature of the time-dependent ROC curve, and their 95% confidence interval (CI) of AUC were shown in Table 2. The accuracies in predicting the 1-year, 3-year and 5-year OS of patients with CD226 (AUC 0.73, 0.62, and 0.57, respectively), KLRD1 (AUC 0.73, 0.6, and 0.54, respectively), and KLRC4-KLRK1 (AUC 0.72, 0.6, and 0.56, respectively) were higher than those with the other selected genes, especially for the 1-year prediction accuracy (Figure 7 and Table 2). Therefore, we identified CD226, KLRD1, and KLRC4-KLRK1 for further functional evaluation.
High expression of CD226 and KLRC4-KLRK1 results in a better prognosis in stage II, stage III or luminal B breast cancer
To further explore the identified CD226, KLRD1, and KLRC4-KLRK1 genes with prognostic value, we analyzed the expression of these genes and their association with OS in different breast cancer stages or subtypes. Although the high expression of KLRD1 was not related to a better prognosis in any stage or subtype variation (data not shown), the expression of CD226 and KLRC4-KLRK1 displayed a strong correlation with OS for stages or subtypes. High CD226 expression was related to better prognosis in stage II and stage III breast cancer, as well as in luminal B breast cancer (Figure 8A-8C). Similarly, despite not being significant for stage II (Figure 8D), high KLRC4-KLRK1 expression revealed preferable survival probabilities in stage III breast cancer and in the luminal B subtype (Figure 8E, 8F). In other stages or subtypes of breast cancer, CD226 and KLRC4-KLRK1 expression was not significantly related to OS (Figure S2).
Validating that the prognostic genes identified from the TCGA database are equally significant in the GEO database
To validate whether the prognostic genes identified from the TCGA database are equally significant in the GEO database, two gene profiles from GEO (GSE20685 and GSE42568) were used as validation datasets. The gene expression data of 327 cases from GSE20685 and 104 cases from GSE42568 with clinical information were downloaded and evaluated for OS. Data from GSE42568 demonstrated that CD226 was related to good prognosis in breast cancer patients, while KLRC4-KLRK1 did not show any correlation with breast cancer OS (Figure 9A, 9B). However, the data from GSE20685 showed the reversed results; instead of CD226, KLRC4-KLRK1 was related to good prognosis in breast cancer (Figure 9C, 9D). Next, we employed the data from GSE20685 to better confirm the relationships between gene expression and OS in different cancer stages (Figure 9E-9J), and found CD226 was firmly associated with good prognosis in stage II breast cancer patients. Thus, our validation datasets partly proved that high CD226 and KLRC4-KLRK1 expression levels would predict satisfied overall survival of breast cancer, especially in specific stages.