Data preprocessing and DEGs analysis
After preprocessing, the overall expression pattern of the samples in the datasets was similar for the three datasets. There was no significant batch effect among the samples, and the difference between groups was small (Fig. 2A).
A total of 153 DEGs were obtained from the GSE76714 dataset, including 117 upregulated and 36 downregulated genes. Total 1898 DEGs were obtained from the GSE43837, including 563 upregulated and 1335 downregulated genes. Additionally, 954 DEGs were obtained from the GSE125989, including 311 upregulated genes and 643 downregulated genes. The volcano plots for DEGs are shown in Fig. 2B. The bidirectional hierarchical clustering heatmaps of DEGs is shown in Fig. 2C.
Tumor immune microenvironment analysis
In GSE76714, the stromal score of brain metastases group was significantly lower than that of primary cancer group, while the immune score, ESTIMATE score, and cytolytic activity score showed no significant difference between two groups (Fig. 3A). For the datasets of GSE43837 and GSE125989, the stromal score, immune score, and ESTIMATE score in brain metastases group were significantly lower than that in primary cancer group. Additionally, cytolytic activity scores for the two groups were not significantly different (Fig. 3B and 3C). The result may indicate the difference of immune microenvironment between two groups.
Immune cell infiltration abundance analysis
The immune cell infiltration abundance for GSE76714 is shown in Figure 4. The barplot (Figure 4A) and clustering heatmap (Figure 4B) showed that the infiltration rates of T cells gamma delta (green), T cells CD4 naive (yellow) and mast cells activated cells (pink) in each sample were relatively high. The correlation heatmap indicated the difference of immune cell infiltration pattern between two groups (Figure 4C). The violin plot showed that the infiltration ratio of plasma cells in brain metastases samples was significantly higher than that in primary cancer samples (p = 0.026), while the infiltration ratio of macrophages M2 cells in brain metastases samples was lower than that in primary cancer samples (p = 0.003) (Figure 4D). For GSE43837 and GSE125989, there was not enough CIBERSORT for analysis.
Identification Of Immune-related Genes
Correlation analysis was conducted between macrophages M2 cell infiltration and the expression of DEGs, and 42 immune-related genes were screened. T-test revealed that 27 genes, such as THY1, present significant differences in expression levels between two groups. These genes were significantly enriched in BP term associated with positive regulation of GTPase activity (ALDH1A1, APC2, DOCK2 and THY1). Additionally, other glycan degradation pathway was significantly enriched (Fig. 5).
TF-miRNA-mRNA Network Analysis
A total of 9 miRNA-mRNA pairs were predicted, which included 5 mRNAs and 9 miRNAs, such as miR-520a and miR-361-3p. In addition, 12 TF-mRNA pairs were obtained, involving 8 mRNAs and 11 TFs. Based on the miRNA-mRNA and TF-mRNA intraction pairs, a TF-miRNA-mRNA regulatory network was constructed (Fig. 6A). Function analysis showed that the TFs and mRNAs in the network were significantly enriched in lipid homeostasis, and cholesterol homeostasis associated BP terms (Fig. 6B). Moreover, they were involved in 8 pathways, such as transcriptional misregulation in cancer, PPAR signaling pathway, AMPK signaling pathway, and breast cancer (Fig. 6C).
PPI Network Analysis
A total of 60 PPI pairs were obtained from STRING database based on all DEGs. The PPI network was constructed, which contained 120 nodes (89 up-regulated genes corresponding proteins and 31 down-regulated genes corresponding proteins) (Fig. 7A). By mining the sub-network module from the PPI network, two sub-networks with score = 4 were obtained, both containing 4 nodes and 6 protein interaction relationships (Fig. 7B). Based on the immune-related genes, 46 PPI pairs were obtained and the constructed PPI network included 31 nodes (21 up-regulated and 10 down-regulated ones) (Fig. 7C).
Drug-gene Interaction Analysis
A total of 10 drug-gene interaction pairs were identified based on the immune-related genes, which involved 10 drugs (busulfan, retinol, tretinoin, zanamivir, deferoxamine, temazepam, diazepam, oxazepam, bromazepam and nitrazepam) and 5 mRNAs (PRSS1, ALDH1A1, NEU2, NEUROD1 and GABRQ) (Fig. 8).
Metastatic Recurrence Survival Analysis Of Key Genes
The Venn diagram of the intersection of DEGs in the three datasets and immune-related genes is shown in Fig. 9A. The genes verified by GSE43817 dataset were DOCK2, HCN4, HASPIN, STK33 and KYNU. The gene verified by GSE125989 dataset was THY1. The genes verified by the two datasets were ASPN and CD1B. These genes were considered as key genes and were performed metastatic recurrence survival analysis. Based on the analysis of the bc-GenExMiner v4.4 database, CD1B and THY1 were found to affect the metastatic recurrence of triple-negative breast cancer. As shown in Fig. 9B, high expression of THY1 was more likely to cause metastasis of breast cancer, while low expression of CD1B was likely to cause metastasis of breast cancer.
Literature retrieval of key genes
Among the key genes, ASPN, DOCK2, THY1 and KYNU were found to be associated with breast cancer based on NCBI Entrez database. Based on the GenCLiP 2.0 database, only THY1 was associated with breast cancer.