Identication of key genes and long non-coding RNA related to bone metastasis in breast cancer by bioinformatics analysis

The molecular mechanism of bone metastasis in breast cancer is largely unknown. We aimed to nd key genes and long noncoding RNAs (lncRNAs) related to the bone metastasis of breast cancer using a bioinformatic approach. We downloaded breast cancer bone metastasis gene expression data (GSE66206) and probe annotation les from the public database Gene Expression Omnibus. Then, we analyzed the network nodes to nd core driver genes. Weighted correlation network analysis was performed on the gene expression prole of breast cancer bone metastasis to identify potential pathogenic modules. We performed function and pathway enrichment analyses of the genes and lncRNAs in the breast cancer bone metastasis-related modules.


Abstract Background
The molecular mechanism of bone metastasis in breast cancer is largely unknown. We aimed to nd key genes and long noncoding RNAs (lncRNAs) related to the bone metastasis of breast cancer using a bioinformatic approach.

Methods
We downloaded breast cancer bone metastasis gene expression data (GSE66206) and probe annotation les from the public database Gene Expression Omnibus. Then, we analyzed the network nodes to nd core driver genes. Weighted correlation network analysis was performed on the gene expression pro le of breast cancer bone metastasis to identify potential pathogenic modules. We performed function and pathway enrichment analyses of the genes and lncRNAs in the breast cancer bone metastasis-related modules.

Results
Based on our results, we constructed a differentially expressed lncRNA-mRNA network related to bone metastases in breast cancer and identi ed core driver genes, including BNIP3 and RP11-317-J19.1.

Conclusions
Our ndings provide a theoretical foundation for explaining the intrinsic molecular mechanism of breast cancer bone metastasis.

Highlight
We compared breast cancer bone metastasis and normal samples with bioinformatic tools We identi ed a module of differentially expressed protein-coding and lncRNA genes BNIP3 and RP11-317-J19.1 were associated with breast cancer bone metastases

Background
Breast cancer is a malignant tumor that occurs in the epithelial tissue of the breast [1]. It is one of the most common malignant tumors in women worldwide, accounting for 30% of the total incidence of tumors in women. Continuous improvements in radical mastectomy technology and the application of various targeted drugs have markedly improve the survival rate and quality of life of breast cancer patients, but tumor recurrence and metastasis are still the main prognostic factors [2]. Among patients with recurrent and metastatic advanced breast cancer, bone metastasis is the most common form of metastasis, with an incidence of 65-75%; patients with rst bone metastases account for 27-50% of cases [3]. Metastases commonly occur in the thoracolumbar vertebrae, sacrum, and ribs [4]. Patients with bone metastases from breast cancer often experience bone-related events such as bone pain, pathological fractures, vertebral compression or deformation, spinal cord compression, and hypercalcemia, which seriously affect patient quality of life [5]. The purpose of this study was to identify genes and long noncoding RNAs (lncRNAs) that are related to bone metastases in breast cancer to determine the molecular mechanism of bone metastases in breast cancer. The key genes and lncRNAs we identi ed provide a theoretical foundation for the intrinsic molecular mechanism of breast cancer bone metastasis.

Gene expression data
We downloaded gene chip expression data (GSE66206) and probe annotation les from mouse breast cancer bone metastases from the Gene Expression Omnibus (GEO) database (n = 12 normal samples and n = 12 breast cancer bone metastasis samples). The probe annotation les include all probe ID les and probe sequence les for the platform.

Preparation of the data for probe re-annotation
We downloaded the V19 version of the human protein-coding gene reference transcript sequence and lncRNA reference genomic sequence data from the GENCODE database. The probe sequences of the Affymetrix chip GPL6246 platform used for GSE66206 were downloaded from GEO.

Chip probe re-annotation
First, we used a library comprising the human protein-coding gene reference transcript sequence data and lncRNA reference genome sequence data in fasta format in GENCODE to build our database. Then, based on the constructed transcription and lncRNA libraries, we re-annotated the probe sequences of GSE66206 via the blastn algorithm in preparation for construction of the mRNA and lncRNA expression pro les. When re-annotating, we ensured that all the remaining probes met the following 3 conditions: 1) The probe sequence fell completely within the transcript sequence of the protein-coding gene (PCG) or lncRNA and matched exactly; 2) if a probe sequence aligned to the transcripts of multiple PCGs or lncRNAs, it was ltered out; and 3) each PCG or lncRNA was supported by at least 2 probe sequences.

Differential expression analysis
First, we assigned the appropriate gene symbol to each probe based on re-annotation and calculated the mean expression value for all probes corresponding to the same symbol to determine the expression value for the gene in the breast cancer bone metastasis and normal samples. The gene expression pro le and lncRNA expression pro le were determined using the R package limma to analyze the differentially expressed genes and differentially expressed lncRNAs between the breast cancer bone metastasis samples and normal samples; the thresholds for expression differences were fold change > 1.2 or fold change < 5/6 where p < 0.05. We compared the screened differentially expressed genes with related genes on the National Center for Biotechnology Information (NCBI) database to obtain veri ed genes or potential genes.

Construction of an interaction network related to bone metastasis in breast cancer
The Spearman correlation coe cient was calculated from the gene and lncRNA expression data of the breast cancer bone metastasis and normal samples, and a rank-sum test was performed to construct a differentially expressed lncRNA-mRNA interaction network for breast cancer bone metastases (coe cient | r | ≥ 0.3, p < 0.05 determined by rank-sum test). We visualized the network and analyzed the node degrees to nd the core driving genes.

Weighted correlation network analysis of co-expression
Co-expression analysis of all genes and lncRNAs associated with breast cancer bone metastasis was performed with the weighted correlation network analysis (WGCNA) R package. We used the WGCNA algorithm to mine co-expressed gene modules, and then analyzed the associations between these modules and the sample phenotype. The identi ed breast cancer metastasis-related modules were displayed in the network using Cytoscape.

Functional enrichment analysis of the gene modules
The functions and pathways of the breast cancer bone metastasis-related and differentially expressed genes and lncRNAs were enriched with KOBAS and Enrichr, respectively.

Screening of differentially expressed genes
The differentially expressed genes and lncRNAs in 12 breast cancer bone metastasis samples and 12 normal samples were identi ed using the R limma package for differential expression. We used breast cancer bone metastasis expression pro le GSE66206. A total of 133 differentially expressed genes and 23 differentially expressed lncRNAs (fold change >1.2 or fold change <5/6) were identi ed. Among them, 3.2 Constructing an interaction network related to bone metastasis in breast cancer We calculated Spearman correlation coe cients for the co-expression of PCGs and lncRNAs, which we corrected using the rank-sum test. We identi ed 747,870 PCG-lncRNA, PCG-PCG, and lncRNA-lncRNA interactions, of which 7,451 were differentially expressed PCG-lncRNA interactions (| r | ≥ 0.3, p < 0.05).
We then visualized the network ( Figure 1A) and analyzed the node degrees. The Cytoscape analysis revealed that the degree of the network nodes ranged from 1 to 905. In order to identify the hub nodes in the network, we set the threshold to 20 and extracted all possible core driver genes. Twenty-eight different core driver genes were identi ed ( Figure 1B and Table 1). We also identi ed regulatory interactions between some differentially expressed genes and lncRNAs ( Figure 1C). We consulted the Human Protein Atlas Interactive Analysis to validate the protein and RNA levels of the core driver genes in breast cancer, including the levels of IDI1, BNIP3, IFRD1, COQ10B, and ZBTB10 ( Figure 2). Immunohistochemistry and RNA expression analysis indicated that IDI1, BNIP3, IFRD1, and ZBTB10 were signi cantly differentially expressed in cancer and healthy tissues. Gene expression pro les for 1,533 PCG and lncRNA genes related to breast cancer bone metastases were constructed with the WGCNA R package to build a co-expression network (merge Cut Height = 0.25, verbose = 3). The optimal threshold for the WGCNA was 12 ( Figure 3A). The sample clustering chart is shown in Figure 3B. We excavated 5 modules from the breast cancer bone metastasis gene expression pro le, including the blue (112 genes), turquoise (1,274 genes), brown (37 genes), yellow (20 genes), and grey (90 genes) modules ( Figure 3C). The results of the co-expression analysis are shown in Figure 3D. The relationships between the various modules and traits related to bone metastases in breast cancer are shown in Figure 3E. The genes in the yellow module are related to the disease characteristics of breast cancer bone metastases and breast cancer. In addition, the occurrence of bone metastases positively correlated with the expression of the genes in the yellow module.

Functional enrichment analysis and veri cation
The yellow module comprises 14 genes (Table 2). We used KOBAS to perform functional enrichment analysis using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) for the genes in the yellow module [6]. With a signi cance threshold of p < 0.05, we identi ed 7 signi cant KEGG functional pathways, which revealed that the core genes in the yellow module are enriched in the prolactin signaling pathway [7] ( Figure 4A). GO enrichment analysis revealed 300 related GO functions (p < 0.05) involving a large number of pathways that may be related to cancer development, including Enrichr was used to enrich the KEGG and GO functions of the 5 lncRNAs in the yellow module (p < 0.05). Some pathways enriched for the lncRNAs were related to amino acid transport ( Figure 4C and D), including alanine transport (GO: 0015808, p = 0.00249) and amino acid transmembrane transport (GO: 0003333, p = 0.00747). We identi ed the GABAergic synapse (p = 0.02205) in the lncRNA-enriched KEGG pathway, which is closely related to breast cancer metastasis [8,9]. Therefore, the yellow module is likely to be a pathogenic module.

Discussion
By comparing the differentially expressed genes identi ed by our screening with genes related to breast cancer bone metastasis in the NCBI database, we identi ed genes with veri ed links to breast cancer bone metastases, such as HSPB1 and PRL, in our data set, thereby validating our approach. HSPB1 expression is associated with a variety of human cancers with poor clinical prognosis. Furthermore, the HSPB1-encoded protein promotes cancer cell proliferation and metastasis, while protecting cancer cells from apoptosis. In addition, prolactin promotes breast cancer bone metastasis [14][15][16]. Expression of the receptor for prolactin can modulate the microenvironment to induce osteoclast formation.
Through functional analysis of lncRNA RP11-317-J19.1, PTP4A1, and BNIP3, we found that the interaction pairs formed by these genes participate in programmed cell death via cell apoptosis and autophagy [10][11][12][13]. The protein encoded by the PTP4A1 gene is a cell signaling molecule with a regulatory role in various processes, such as cell proliferation and migration; dysregulation of the protein may also be involved in the occurrence and metastasis of cancer [17][18][19][20][21]. The BNIP3 gene encodes a mitochondrial protein that contains a BH3 domain and functions as a pro-apoptotic factor. BNIP3 silencing may be mediated by lncRNA RP11-317-J19.1, allowing the protein encoded by the PTP4A1 gene to function normally, which may induce breast cancer cells to undergo uncontrolled proliferation and migration. PTP4A1 is highly expressed in several cancer types, and the overexpression of PTP4A1, which is associated with aggressive tumor characteristics, may be regulated by the PI3K/AKT pathway [22]. PTP4A1 expression can be regulated by microRNAs that control cellular processes in breast cancer, whereas miR-601 targets PTP4A1 to inhibit breast cancer growth and invasion [23]. The function of BNIP3 is similar to that of PTP4A1 response to the inhibition of cancer aggressive. PTP4A1 can be modulated by the lncRNAs HULC and RP4 in response to cellular injury [24,25]. These ndings suggest that RP11-317-J19.1, PTP4A1, and BNIP3 may play crucial roles in restraining the aggressiveness of cancer, thereby serving as strong predictors of breast cancer bone metastasis.

Conclusions
In conclusion, we constructed a differentially expressed lncRNA-mRNA network related to bone metastases in breast cancer and identi ed core driver genes. We found that expression modules related to alanine transport and amino acid transmembrane transport were differentially regulated in bone metastasis and normal samples. Our results reveal key genes and lncRNAs, including BNIP3 and RP11-317-J19.1, that are related to breast cancer bone metastasis. Our ndings lay the foundation for understanding the molecular basis of breast cancer bone metastasis.