3.1 Relationship between T, N, M stages (or pathological stages) and immune, stromal, and ESTIMATE scores
The primary pathologic and clinical data of 898 cases with breast cancer were obtained from the TCGA database after exclusion of samples with non-conforming tumor tissue and incomplete clinical data. We used the ESTIMATE algorithm to estimate the range of immune score (-1284.33 to 3659.11), the range of stromal score –(2068.96 to 2098.24), and the range of estimation score –(2926.51 to 5331.17) (Table 1). Additionally, the immune score remained unchanged in T and M stage or pathological stage, while the stromal score showed a decrease with the increase in T stage and pathological stage, and no correlation was seen with the M stage. Besides a decrease in the ESTIMATE score as T stage progressed, there was no other correlation with changes in M stage and pathological stage (Figure 1).
Table 1 Immune scores, stromal scores, and clinical data of patients with breast cancer.
Characteristic
|
n
|
Stromal score
(range)
|
Immune score
(range)
|
Age
|
|
|
|
≤60
|
510
|
-2068.96 to 2056.97
|
-1284.33 to 3113.22
|
>60
|
388
|
-1784.53 to 2098.24
|
-1070.49 to 3659.11
|
Stage
|
|
|
|
I
|
158
|
-1784.53 to 2098.24
|
-1070.49 to 3070.83
|
II
|
524
|
-2068.96 to 2094.93
|
-1284.33 to 3659.11
|
III
|
199
|
-1522.73 to 1931.95
|
-762.70 to 3113.22
|
IV
|
17
|
-783.05 to 1627.01
|
-734.57 to 2490.13
|
T stage
|
|
|
|
T1
|
233
|
-1784.53 to 2098.24
|
-1070.49 to 3070.83
|
T2
|
531
|
-2068.96 to 2094.93
|
-1284.33 to 3659.11
|
T3
|
102
|
-1185.72 to 1702.80
|
-734.57 to 3113.22
|
T4
|
32
|
-1522.73 to 1299.00
|
-553.68 to 2059.34
|
N stage
|
|
|
|
N0
|
445
|
-2068.96 to 2098.24
|
-1284.33 to 3659.11
|
N1
|
296
|
-1267.08 to 2056.97
|
-1128.40 to 3113.22
|
N2
|
103
|
-1142.45 to 1931.95
|
-762.70 to 2798.21
|
N3
|
54
|
-1055.12 to 1563.09
|
-734.57 to 2407.86
|
M stage
|
|
|
|
M0
|
881
|
-2068.96 to 2098.24
|
-1284.33 to 3659.11
|
M1
|
17
|
-783.05 to 1627.01
|
-734.57 to 2490.13
|
3.2 Higher immune score is associated with better prognosis
In order to explore the potential correlation between the differences in immune, stromal, and ESTIMATE scores and the prognosis of breast cancer patients, and Previous studies have suggested that overall survival and immune score have statistical significance [11]. The samples were then divided into two groups according to their immune scores as high immune score group and low immune score group. According to the Kaplan-Meier survival curve, higher immune score predicts better prognosis for breast cancer patients (Figure 2a, p= 0.018). In contrast, there was no correlation between stromal (Figure 2b, p= 0.379) and ESTIMATE score (Figure 2c, p= 0.281) and the overall survival of breast cancer patients.
3.3 Identification of differentially expressed genes
Figure 3a shows the heat map of 888 DEGs in the high and low immune score groups. A total of 790 genes were up-regulated and 98 genes were down-regulated in the high score group when compared with those seen in the low score group. Figure 3b shows the heat map of 797 DEGs in the high and low stromal score groups, including 675 up-regulated and 122 down-regulated genes. Venn diagrams showed that 247 genes were common in both the groups, including 221 up-regulated (Figure 3c) and 26 down-regulated (Figure 3d) genes in the immune and stromal group.
3.4 Enrichment analysis of common genes
GO analyses and KEGG pathways were used to analyze the biological functions of 247 common genes. Most of these genes were associated with several biological processes related to T cell activation. The molecular function process analysis revealed their significant correlation with the cytokine receptor activity. Data from the cellular components process analysis revealed that most of the target genes were localized to the external region of plasma membrane (Figure 4a). Moreover, the KEGG pathways revealed that these genes were enriched for cytokine-cytokine receptor interaction (Figure 4b).
3.5 Protein-protein interaction (PPI) analysis of common genes
We constructed a protein-protein interaction (PPI) network using the STRING database. Screening was carried out with the minimum required interaction score of 0.700 presents interconnections systemly. Cytoscape software was used for visualizing the data (Figure 5a). PTPRC, IL6, CCR5, CD2, CD28, CCR2, CD5, CD3E, CXCR3, and CSF1R were found as the top 10 hub genes in the network by CytoHubba plugin. MCODE was used to identify 135 genes and 9 function modules from the PPI network. We chose the top two modules for further analysis. The major module was called module a (Figure 5b), which included 12 nodes and 66 edges and all the nodes had the same node degree. PTPRC, CD2, and CD28 were connected more effectively in another module, called module b (Figure 5c). Moreover, we also analyzed the biological processes, cellular components, and molecular functional processes associated with the genes in module a. The results showed that these genes were enriched in the cell chemotaxis pathway, external region of plasma membrane, and C-C chemokine receptor activity (Supplementary Figure 1a). The genes in module b are mainly enriched in the biological process of T cell and lymphocyte differentiation and T cell activation (Supplementary Figure 1b).
3.6 Survival analysis of hub genes
Further survival analysis of the hub genes was essential to predict the prognosis of breast cancer patients in clinically. The Kaplan-Meier survival curves were constructed for the hub genes. CCR5, CD2, CD3E and CD5 displayed a significant correlation with the prognosis of patients (p<0.05 was considered as statistically significant). The survival curves are shown in figure 6.