Screening and Identication of Key Biomarkers in Breast Cancer with Brain Metastasis: Evidence from Bioinformatic Analysis

Purpose. Breast cancer (BC) has a poor prognosis when brain metastases (BM) occur, and the treatment effect is limited. In this study, we aim to identify representative candidate biomarkers for clinical prognosis of patients with BM and explore the mechanisms underlying the progression of BC. Methods. Herein, we examined the Microarray datasets (GSE125989) obtained from the Gene Expression Omnibus database to nd the target genes in BC patients with BM. We employed the GEO2R tool to lter the differentially expressed genes (DEGs) that participate in primary BC and BC with BM. Subsequently, using the DAVID tool, we conducted an enrichment analysis with the screened DEGs based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) functional annotation. The STRING database was employed to analyze the protein-protein interactions of the DEGs and visualized using Cytoscape software. Lastly, the Kaplan-Meier plotter database was employed to determine the prognostic potential of hub genes in BC. Results. We screened out 311 and 104 downregulated DEGs. The enrichment analyses revealed that all the DEGs were` enriched in the biological process of extracellular matrix organization, cell adhesion, proteolysis, collagen catabolic process and immune response. The signicant enrichment pathways were focal adhesion, protein absorption and digestion, ECM-receptor interaction, PI3K-Akt signalling pathway, and Pathways in cancer. The top ten hub nodes screened out included FN1, VEGFA, COL1A1, MMP2, COL3A1, COL1A2, POSTN, DCN, BGN and LOX. The Kaplan-Meier plotter results showed that the three hub genes (FN1, VEGFA and DCN) are candidate biomarkers for clinical prognosis of patients with BM.


Introduction
Breast cancer is one of the most common tumours in females globally and one of the cancers that cause high mortality. About 2.1 million cases and more than 626,000 deaths were recorded in 2018 [1]. Breast cancer (BC) has a poor prognosis when brain metastases (BM) occur, and the treatment effect is limited. A previous study has shown that more than half of patients with brain metastases dead of central nervous system progression [2]. Considering the high morbidity, high mortality and the limitations of effective treatment of BC patients with brain metastases, it is particularly important to conduct timely screening and intervention for patients at high-risk and develop individualized treatment. High-throughput microarrays are one of the practical tools for identifying biomarkers to diagnose and detect BC phenotype features with distant metastasis. Therefore, by analyzing the microarray gene expression pro le of BC patients with brain metastases, we could establish a comprehensive PPI network to obtain biomarkers that can be used for clinical prognosis of BM patients.
Herein, we rst analyzed the data on gene expression pro les retrieved from the GEO database to nd DEGs in breast cancer with brain metastatic and matched non-brain metastatic breast cancer. Then, the DEGs were used for GO function annotation analysis and KEGG pathway enrichment analysis. Next, a PPI network was established to identify hub genes related to breast cancer brain metastases. Finally, through survival analysis, further search for genes related to prognosis among the important genes screened. These results will help us understand the molecular basis of breast cancer with brain metastatic and provide potential targets for future clinical intervention.

Data Retrieval
We downloaded gene expression pro le of GSE125989 datasets from the GEO (http://www.ncbi.nlm.nih.gov/geo/) database. GSE125989 was based on Agilent GPL571 (Affymetrix Human Genome U133A 2.0 Array) and contained 16 samples each of non-metastatic primary BC and BC with BM.

Identi cation of DEGs
We used GEO2R in the identi cation of the DEGs, then statistically compared their expressions between the primary BC group and BC with the BM group. We chose the DEGs according to the standard: p < 0.05 and |logFC|> 1.

Enrichment analysis of DEGs
Using DAVID tool [3], we performed KEGG pathway and GO functional annotation enrichment analyses [4] with the screened DEGs.

Assessment of PPI network and identi cation of hub genes
We employed the STRING database [5] to examine the PPI of the DEGs, which were visualized using the Cytoscape software. The Cytohubba plug-in based on Cytoscape was utilized to perform the hub gene analysis [6,7], and the top rank 10 genes with degrees > 50 in all modules were considered to be the hub genes. Subsequently, the KEGG analyses for hub genes in this module were performed using DAVID.

Prognostic Potential of hub genes in Breast Cancers
The Kaplan-Meier (KM) plotter can be employed to evaluate the in uence of up to 54, 000 genes on prognosis in 21 types of cancer. Currently, breast cancer has the largest dataset (n = 6,234) followed by ovarian, lung, and gastric cancers with 2,190, 3,452, and 1,440 datasets, respectively. The miRNA subsystems comprise 11,000 samples sourced from 20 different types of cancer and include RNA-seq data and gene chip-data for the databases are sourced from TCGA, GEO, and EGA. Primarily, this tool is used to discover and validate prognostic markers based on meta-analysis. Herein, we employed KM plotter (http://kmplot.com/analysis/) to determine the correlation between the expression of hub genes and BC prognosis [8]. Also calculated were the log-rank P-value and the hazard ratio (HR) with 95% con dence intervals.

Identi cation Of Degs In Bc With Bm
We selected GSE125989 containing data on 16 samples each of non-metastatic primary BC and BC with BM samples for analysis of DEG through GEO2R. Subsequently, we identi ed 415 DEGs based on the stander: P < 0.05 and |log FC|>1. The DEGs included 311 upregulated and 104 downregulated genes ( Fig. 1).

GO and KEGG analyses of DEGs
We performed the analyses mentioned above to explore further the main roles of the DEGs, as well as key pathways in BC patients with BM. The GO function enrichment was categorized as follows: biological process (BP), cellular component (CC), and molecular function (MF). The upregulated DEGs that were considerably enriched in MF included: calcium ion binding, heparin-binding, extracellular matrix structural constituent, serine-type endopeptidase activity and integrin-binding, et al. (Fig. 2A). For BP, enrichment of the upregulated DEGs was observed in extracellular matrix organization, cell adhesion, proteolysis, collagen catabolic process and immune response, et al. Furthermore, the CC analysis indicated that the signi cantly enriched were in extracellular region, extracellular exosome, extracellular space, extracellular matrix, and proteinaceous extracellular matrix, et al. The signalling pathway analysis by KEGG show that the upregulated DEGs were enriched in Focal adhesion, ECM-receptor interaction, PI3K-Akt signalling pathway, Protein absorption and digestion, Proteoglycans in cancer, Pathways in cancer, and et al. (Table 1). On the other hand, in BP, enrichment of the downregulated DEGs were mainly observed in ubiquitin-dependent protein catabolic process, positive regulation of the apoptotic process, chemical synaptic transmission, canonical glycolysis, and glycolytic process, et al. (Fig. 2B). For MF, they were mainly enriched in the following activities: protein binding, thiol-dependent ubiquitin-speci c protease, translation initiation factor and guanyl-nucleotide exchange factor, SUMO binding and phosphatidylinositol-4,5-bisphosphate 5-phosphatase. Additionally, the CC terms for these genes were related to the cytoplasm, cytosol, membrane, nucleoplasm and perinuclear region of cytoplasm, et al. The KEGG pathway results showed that the down-regulated DEGs were enriched in the HIF-1 signalling pathway, Insulin secretion, Glycolysis / Gluconeogenesis, Fructose and mannose metabolism, Carbon metabolism, and Aldosterone-regulated sodium reabsorption (Table 2).  -11  COMP, CAV1, COL1A1, COL1A2, COL3A1, COL5A1, COL5A2,  COL5A3, COL6A1, COL6A2, COL6A3, CCND2, FN1, FLNA,  IGF1, LAMA4, MYL9, MYLPF, PDGFD, PDGFRA, PDGFRB,  THBS1, THBS2, THBS4 PI3K-Akt signaling pathway

Construction of PPI network and Hub genes analysis
The STRING database was applied to construct the PPI network to explore the molecular mechanisms about BC with BM progression. In total, 390 nodes and 2,197 edges were identi ed among the DEGs (Fig. 3A). Subsequently, Cytoscape software was employed for further assessment of the network to enable us to understand the DEGs with regards to the PPI network. Further, the hub genes were selected with select degree > 50 as the standard by the plug-in CytoHubba, and we extracted the top ten hub nodes with higher degrees. They included FN1, VEGFA, COL1A1, MMP2, COL3A1, COL1A2, POSTN, DCN, BGN and LOX (Fig. 3B). Moreover, the DAVID online tool was used to predict possible enrichment pathways of the hub genes, and the pathways closely related to Focal adhesion, ECM-receptor interaction, Amoebiasis, PI3K-Akt signalling pathway, Proteoglycans in cancer, Protein absorption and digestion, Platelet activation, Bladder cancer and Pathways in cancer (Table 3).

Prognostic Potential of hub genes in Breast Cancer
To examine the prognostic potential of the ten hub genes in Breast Cancer, KM plotter database was employed to determine the hub genes prognostic value based on Affymetrix microarrays. As shown in

Discussion
Evidence indicates that breast cancer patients with brain metastasis (BM) have a very poor prognosis and a short survival time following BM [9]. However, much progress has been made regarding the diagnosis of the condition, especially with the application of PET-CT and MRI [10]. Despite these, the progress towards effective therapies has been slow, mainly because the standard treatment options for BM, such as surgical operation, chemotherapy, and radiotherapy are ineffective [11]. As such, there is a need to identify sensitive markers to improve the prognosis of BC with BM.
First, we screened the differential expressed genes which may be associated with BM in BC in our study.
GO term analysis revealed that most of the DEGs participate in tumorigenesis through regulating cell proliferation, differentiation, apoptosis adhesion, and migration, as well as angiogenesis. Also, the KEGG pathway analysis indicated that most of the DEGs had a strong relation to the development of tumours through the key signalling pathways, including the PI3K-AKT pathway, HIF-1 signalling pathway, pathways in cancer and the metabolic pathway [12][13][14]. Subsequently, the PPI network was constructed to explore the interactions of the DEGs. The top ten hub genes were signi cantly related to the pathway of Focal adhesion, ECM-receptor interaction, Amoebiasis, PI3K-Akt signalling pathway, Proteoglycans in cancer, Protein absorption and digestion, Platelet activation, Bladder cancer and Pathways in cancer (Table 1). FN1 and VEGFA were the most connected nodes in the module, in which FN1 ( bronectin 1 protein) regulates the migratory and adhesion ability of cells. Upregulation of FN1 has been observed BC metastases [15]. VEGFA has a vital and dual function in angiogenesis and tumour metastasis and has been implicated in the pathogenesis of breast cancer [16]. Lastly, we performed survival analysis to determine the prognostic value of ten hub genes, we found that the three hub genes (FN1, VEGFA and DCN) were correlated with worse OS (Fig. 4A), and the seven hub genes (FN1, VEGFA, COL1A1, POSTN, DCN, BGN and LOX) were correlated with worse DMFS (Fig. 4B). Of the 10 genes, ve genes (FN1, VEGFA, COL1A1, BGN and LOX) have been implicated in tumour progression, as well as in predicting disease outcomes. For example, COL1A1 expression increases the invasion and metastatic capacity of gastric cancer cells [17]. BGN up-regulation has been demonstrated in many malignancies [18]. Also, LOX is upregulated in many cancers, such as BC, nasopharyngeal carcinoma, hepatocellular carcinoma, head and neck tumours, colorectal cancer and is related to poor prognosis [19]. These indicating that our big data analysis above has prognostic values in the BC cohort of GEO database. Moreover, POSTN and DCN genes were identi ed herein for the rst time as possible markers for BC prognosis.
Herein, we identi ed seven possible biomarkers FN1, VEGFA, COL1A1, POSTN, DCN, BGN and LOX. However, their role in the formation and progression of cancers should be investigated further. Several studies have explored the molecular dynamics that occur during the formation and development of breast cancer [20,21]. As a result, many BC prognostic markers have been identi ed. Most of these studies were conducted using animal models, in vitro cell experiments, and clinical tumour samples involving small sample size [22][23][24]. However, the complex interplay of BC with BM and the delicate nature of the condition requires that we conduct comprehensive studies that involve large sample sizes.
Luckily, substantial progress has been made so far in bioinformatics, including the establishment of highthroughput tumour databases like GEO and TCGA [25]. The databases collect and manage data which are made public for use by researchers and has so far assisted in big data analysis consisting of large-scale cohorts of BC with BM [26,27].
The current study is unique because, other than focusing on the mechanisms by which the activation of tumour intrinsic gene affect BC pathogenesis, we focused on gene signatures in BC with BM and their effect on BC prognosis. The ndings of the present study could offer new insights into the understanding of the complex interactions between BC and BC with BM. However, this study had a few limitations.
Firstly, the clinical data in the GEO database was not complete, which will prevent us from performing a comprehensive survival analysis. Secondly, we only studied one data set, which may lead to biases in our results. Thirdly, our results still need further veri cation. For example, Western blot and real-time PCR can be employed to verify the selected hub genes expression in the basic experiments.

Conclusions
In conclusion, we identify seven genes with poor prognosis from the BC with BM cohort of GEO database.
The three genes, namely, FN1, VEGFA and DCN can be perceived as candidate genes for the prognostic prediction of BC with BM. At the same time, COL1A1, POSTN, BGN and LOX may be related to the distant transformation of breast cancer. These genes need further research to help us fully understand the relationship between them and the prognosis of breast cancer with brain metastasis. Our BM-related gene mining strategy once again validated the feasibility of using big data analysis in the discovery of prognostic value biomarkers for malignant tumors.