Data preprocessing and DEGs analysis
After preprocessing, the overall expression pattern of the samples was similar for each dataset. There was no significant batch effect among the samples in each dataset (Figure 2A).
A total of 153 DEGs were obtained from the GSE76714 dataset, including 117 upregulated and 36 downregulated genes. Total 1898 DEGs were obtained from the GSE43837, including 563 upregulated and 1335 downregulated genes. Additionally, 954 DEGs were obtained from the GSE125989, including 311 upregulated genes and 643 downregulated genes. The volcano plots for DEGs are shown in Figure 2B. The bidirectional hierarchical clustering heatmaps of DEGs is shown in Figure 2C.
Tumor immune microenvironment analysis
In GSE76714, the stromal score of brain metastases group was significantly lower than that of primary cancer group, while the immune score, ESTIMATE score, and cytolytic activity score showed no significant difference between two groups (Figure 3A). For the datasets of GSE43837 and GSE125989, the stromal score, immune score, and ESTIMATE score in brain metastases group were significantly lower than that in primary cancer group. Additionally, cytolytic activity scores for the two groups were not significantly different (Figure 3B and 3C). The result may indicate the difference of immune microenvironment between two groups.
Immune cell infiltration abundance analysis
The infiltration abundance matrix of 22 kinds of immune cells in all samples of GSE76714 was estimated using Cibersort algorithm. The result showed that among the 71 samples, 60 were valid, including 40 cases of PC and 20 cases of BM. The immune cell infiltration abundance for GSE76714 is shown in Figure 4. The barplot (Figure 4A) and clustering heatmap (Figure 4B) showed that the infiltration rates of T cells gamma delta (green), T cells CD4 naive (yellow) and mast cells activated cells (pink) in each sample were relatively high. The correlation heatmap indicated that there existed difference in immune cell infiltration pattern between PC and BM groups (Figure 4C). For instance, the correlation between NK cell activated and T cell helper in BM was very low (r = 0.02), while it was relatively higher in PC (r = 0.6). The violin plot showed that the infiltration ratio of plasma cells in brain metastases samples was significantly higher than that in primary cancer samples (p = 0.026), while the infiltration ratio of macrophages M2 cells in brain metastases samples was lower than that in primary cancer samples (p = 0.003) (Figure 4D). For GSE43837 and GSE125989, there was not enough CIBERSORT for analysis.
Identification of immune-related genes
Correlation analysis was conducted between macrophages M2 cell infiltration and the expression of DEGs, and 42 immune-related genes were screened. T-test revealed that 27 genes, such as THY1, present significant differences in expression levels between two groups. These genes were significantly enriched in BP term associated with positive regulation of GTPase activity (ALDH1A1, APC2, DOCK2 and THY1). Additionally, other glycan degradation pathway was significantly enriched (Figure 5).
TF-miRNA-mRNA network analysis
A total of 9 miRNA-mRNA pairs were predicted, which included 5 mRNAs and 9 miRNAs, such as miR-520a and miR-361-3p. In addition, 12 TF-mRNA pairs were obtained, involving 8 mRNAs and 11 TFs. Based on the miRNA-mRNA and TF-mRNA intraction pairs, a TF-miRNA-mRNA regulatory network was constructed (Figure 6A). Function analysis showed that the TFs and mRNAs in the network were significantly enriched in lipid homeostasis, and cholesterol homeostasis associated BP terms (Figure 6B). Moreover, they were involved in 8 pathways, such as transcriptional misregulation in cancer, PPAR signaling pathway, AMPK signaling pathway, and breast cancer (Figure 6C).
PPI network analysis
Based on the immune-related genes, 46 PPI pairs were obtained and the constructed PPI network included 31 nodes (21 up-regulated and 10 down-regulated ones) (Figure 7). Among the 31 nodes, NEUROD1, THY1, ALDH1A1, GBX2, MIXL1, CDH8 and ASPN had degrees more than 5, and were considered as hub nodes.
Drug-gene interaction analysis
A total of 10 drug-gene interaction pairs were identified based on the immune-related genes, which involved 10 drugs (busulfan, retinol, tretinoin, zanamivir, deferoxamine, temazepam, diazepam, oxazepam, bromazepam and nitrazepam) and 5 mRNAs (PRSS1, ALDH1A1, NEU2, NEUROD1 and GABRQ). In detail, GABRQ interacted with temazepam, diazepam, oxazepam, bromazepam and nitrazepam; ALDH1A1 interacted with tretinoin and retinol; NEU2 interacted with zanamivir; PRSS1 interacted with busulfan (Figure 8).
Metastatic recurrence survival analysis of key genes
The Venn diagram of the intersection of DEGs in the three datasets and immune-related genes is shown in Figure 9A. The genes verified by GSE43817 dataset were DOCK2, HCN4, HASPIN, STK33 and KYNU. The gene verified by GSE125989 dataset was THY1. The genes verified by the two datasets were ASPN and CD1B. These genes were considered as key genes and were performed metastatic recurrence survival analysis. Based on the analysis of the bc-GenExMiner v4.4 database, CD1B, THY1 and DOCK2 were found to affect the metastatic recurrence of triple-negative breast cancer. As shown in Figure 9B8B, high expression of THY1 was more likely to cause metastasis of breast cancer, while low expression of CD1B and DOCK2 was likely to cause metastasis of breast cancer.
Literature retrieval of key genes
Among the key genes, ASPN, DOCK2, THY1 and KYNU were found to be associated with breast cancer based on NCBI Entrez database. Based on the GenCLiP 2.0 database, only THY1 was associated with breast cancer.