Candidate biomarkers and gene modules investigation for bone tumor samples derived from castration-resistant prostate cancer bone metastasis patients using WGCNA

Background About 80-90% of castration-resistant cancer (CRPC) patients would develop bone metastasis, which leads to the disorder of bone metabolism and induces skeletal related events. However, except for the few approved radiotherapeutic and chemotherapy drugs, like radium-223 and denosumab, there is still lack of effective treatment targeting the bone metastatic tumor. It is necessary and significant to explore the mechanisms of bone metastasis and tumorigenesis, especially the differences between the tumor and normal cells in bone after metastatic colonization, which will provide a set of candidate genes for the screening of novel bone targeting agents. Results 4 datasets (GSE32269, GSE101607, GSE29650 and GSE74685) were obtained from the GEO database. 1983 differentially expressed genes (DEGs) were first identified between bone marrow tumor samples and normal marrow samples in GSE32269, followed by the weighted gene co-expression analysis. Most of the top 10 DEGs are found to be related with prostate cancer. 7 co-expression modules were then detected based on the 1469 DEGs shared by the 4 datasets, and 3 of them were found highly preserved among the other three datasets. The top 30 hub genes of the 3 modules were extracted. Among the enriched pathways of preserved modules, Cell adhesion molecules (CAMs) and Leukocyte transendothelial migration might play significant important roles in the tumor development in bone marrow. Literature searches further showed that a set of DEGs and hub genes might also contribute to the development of tumor in bone. Conclusions Together, our findings not only provide outline of expression profile in CRPC bone metastasis, but also screen a set of genes associated with CPRC tumor

cell colonization and development of bone tumor, which could be helpful for novel bone targeting agents screening.

Background
Prostate cancer (PCa) is one of the most common cancers and the tenth most common cause of cancer related mortality in men in China [1]. The rankings rise first in men in the developed countries [2]. Castration-resistant prostate cancer (CRPC) is an advanced form of prostate cancer by disease progression following surgical or pharmaceutical castration. This process is not inevitable, which is usually companied by poor prognosis and reduced survival time. To be known, CRPC patients are also at high risk of developing metastases. The common sites are bone, lymph nodes, liver, lungs and brain. However, bone is the most prominent site for metastases. About 80-90% of CRPC patients develop bone metastases [3]. Bone metastases could lead to the disorder of bone metabolism and induce skeletal related events(SREs), such as pathological fracture, spinal cord compression and hypercalcemia, which not only reduce survival time and life quality, but also increase burden of treatment [4].
At present, the diagnose of CRPC bone metastases was mainly based on Symptoms, Imaging and Histopathology. The treatment of CRPC bone metastases was mainly divided into three categories [5]: first, Radiotherapeutic drug, like radium-223, which was approved to treat the mCRPC patients; second, bone targeting therapy, like Denosumab, which was also approved to treat the patients with solid tumor bone metastases; third, trial therapy, like Cabozantinib and Dasatinib. Among them, the bone targeting agents showed a significantlly and potentially clinical actionability. However, the development of novel bone targeting agents needs a 4 deep understanding of mechanisms of bone metastasis and tumorigenesis. At present, the widely accepted mechanism of bone metastasis is the 'seed and soil hypothesis', which describes an interaction between circulating tumor cell and microenvironment of bone tissue [6]. Most of further researches focus on dissecting the process of initiation to development of distant metastasis, such as cancer cells migrate through the endothelial cells to gain access to systemic circulation via the tortuous and leaky tumor vasculature and cell signaling aberrations [7,8]. However, these researches do not explore the state of tumor cells after metastatic colonization and also do not detect the differences between the tumor cells and normal cells in bone, which could be more helpful to screen novel bone targeted agents. Therefore, based on the mentioned consideration, this study selected multiple expression datasets of bone marrow tumor samples derived from CRPC bone metastases patients, and hope to identify some candidate modules and genes through differential expression analysis and weighted gene co-expression network analysis. These modules and genes represent the significant changes of expression profiles in bone tumor cells, which will provide a cluster of candidates for further agents screening and validation.

Data collection and preprocessing
Four expression profile datasets containing CRPC bone metastasis were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo). Dataset GSE32269 was chosen for further analysis with 29 CRPC bone metastatic marrow samples and 4 normal bone marrow samples, which was used for bone cancer significantly genes selection and correlated modules detection. The other three datasets GSE101607, GSE29650 and GSE74685 were kept with only CRPC bone metastatic samples, which was used to validate and screen the truly significant and preserved bone cancer related modules. Detailed information of datasets was shown in table 1.
Before the analysis, all the raw data were reprocessed. Probes were mapped to gene symbols. Empty probes and probes mapping to multiple genes were both discarded according to each annotation platform. If there were multiple probes that mapped to the same gene symbol, their mean values were considered as the gene expression value. The reprocessed data was normalized by the limma (linear models for microarray data) package in R [9].

Identification of differentially expressed genes (DEGs)
The eBayes analysis was used to detect the DEGs between bone metastatic marrow samples and bone normal marrow samples in GSE32269 using limma package [9].
The adjusted P-value <0.05 and |log-fold change|>1 were set as the threshold for DEGs screening.
Enrichment Analysis R package clusterProfiler [10] was used for the Enrichment analysis of genes. False discovery rate (FDR) <0.05 was set as the threshold for the identification of significant GO-Enrichment terms and Pathway-Enrichment terms.

WGCNA analysis
The co-expression network analysis was performed using weighted gene coexpression network analysis (WGCNA) [11]. First, the soft threshold for network construction was selected, which is the lowest power for which the scale-free topology fit index curve flattens out upon reaching a high value. Second, the function blockwiseModules was used for one-step network construction and module 6 detection. Then, the module eigengene (ME) of each module was calculated, and the correlation between MEs was also calculated. Finally, the key node (hub gene) was determined by high intramodule connectivity of genes. According to the intromodule connectivity, the top 30 hub genes in modules were visualized using VisANT software [12].
In addition, network preservation at the module level was conducted between GSE322269 and the other three datasets using the function modulePreservation [13].
The comparability of two datasets is assessed by correlating measures of average gene expression and overall connectivity of two datasets. The higher the correlations of these properties, the better chance you will have of finding similarities between the two datasets at subsequent stages of analysis. GFI1, EPB42, which were functionally associated with prostate cancer or bone marrow. For example, KLK3 is a candidate marker for diagnosis and monitoring of prostate cancer [14]. EFNA1 plays an important role in angiogenesis and tumor neovascularization [15]. The expression profiles of these DEGs were showed as heatmap in Fig1B

WGCNA analysis
Since the 4 datasets come from different platforms, we should ensure that the 4 datasets are comparable. First, we need to limit the analysis to genes that expressed among the datasets. The intersection was taken among the DEGs of GSE32269 and the genes of other three datasets. 1469 genes were selected, and the corresponding expression profiles of these genes in 4 datasets were then prepared. Second, the comparability of GSE32269 and other dataset was assessed by measuring the average gene expression and overall connectivity between two datasets (Fig 2). It's clear to see that the correlations are positive and the p-value are significant in all cases, which suggests that the datasets are comparable.
The 1469 genes were further investigated as input for hierarchical clustering using the function hclust. We found that the 29 samples mainly yielded two clusters Module validation among the other 3 datasets In order to detect whether these modules are preserved between the other three datasets, module preservation statistics were calculated using the function modulePreservation. We set the threshold Z>10 to screen the highly preserved modules. As a result, Green, yellow and turquoise module are determined as the preserved modules. Besides, hub genes were determined according to the network topological index. Top 30 hub genes were investigated from each module as showed in Fig5.

Discussion
Development of bone metastases is a key and usual event in the progression of CRPC, which could lead to disorders of bone metabolism and skeletal related events.
The median survival form men with bone metastases CRPC is approximately 1.5 to 2 years. At present, bisphosphonate and denosumab are the standard therapeutic drug for bone metastasis, but the overall therapeutic effectiveness is limited. Therefore, the development of novel bone targeting agents is an imperious demand.
The purpose of this study was to dissect the expression profile differences between the established metastatic bone marrow samples and normal bone marrow samples and then screened some differentially expressed genes, co-expression modules and module-hub genes, all of which might be candidate biomarkers for the diagnose and treatment of bone tumor.
First, the screened differentially expressed genes are related with prostate cancer or bone development or blood development. For example, among the top 10 upregulated genes, KLK3 and KLK2, are highly enriched in prostate cancer, which are taken as effective biomarkers for diagnose and prognostic monitoring of prostate cancer [16]. GOLM1 [17], FOLH1B [18], STEAP1 [19] and PLPP1 [20] are also identified as a candidate biomarker for prostate cancer. AGR2 expressed strongly in prostate tissue and show increased expression in prostate cancer [21]. Interestingly, there are two genes showing different results. ACCP gene acts as a tumor suppressor of prostate cancer through dephosphorylation of ERBB2 and deactivation of MAPKmediated signaling. It should be expected to express lowly in the prostate cancer, including CRPC bone metastases. Similarly, decreased TSPAN1 was identified to promote prostate cancer progression [22]. However, both genes showed a highly expressed values with more than 16 fold change.
As for the top10 down-regulated genes, all of them are identified to be overexpressed in whole blood according to GTEx [23] and take part in embryonic development of blood and bone according to LifeMap Discovery [24]. However, further literature search results show no clear connection between bone tumor and these genes. Some of them were found to take an important in the tumorigenesis of other types of tumors. For example, Luo G et al. found that down-regulated LTF may serve as an important role in the dysregulation of the MAPK signaling pathway, which could induce the tumorigenesis of gastric tissue. CEACAM1 was reported to be highly expressed in several different cancers and is correlated with tumor progression, metastasis and overall survival [25,26]. The consistency and inconsistency with previous researches suggested that the tumorigenesis mechanism of bone metastasis of CRPC has changed compared to the tumorigenesis mechanism of primary prostate cancer. In a word, the genes showing inconsistent results might be significant features to detect and dissect tumorigenesis mechanisms of bone tumor.
Besides above DEGs, the modules identified among the DEGs using WGCNA analysis should get more attention, which predicts clusters of candidate genes involved in the tumorigenesis of bone marrow samples. Especially, the 3 modules (yellow, turquoise and green module), validated in other three datasets (GSE101607, GSE29650 and GSE74685), need further exploration. The top 30 hub genes were extracted from highly preserved modules and the corresponding expression profile were presented (Fig6). Surprisingly, all of these hub genes were all down-regulated in the bone marrow tumor samples. Pathway enrichment results of these hub genes were also showed. Compared to the enrichment results of whole module genes, three pathways were kept respectively in yellow and green module. Two of them should get more attention, which are Cell adhesion molecules (CAMs) and Leukocyte transendothelial migration. As fa as we know, CAMs take part in the process of integrating differentiation, proliferation and pro-survival signals from the surrounding microenvironment to the inner cell, which enables the numerous cellcell and cell-matrix interactions within the bone marrow microenvironment and the controlled lifelong self-renewal and progeny of hematopoietic stem and progenitor cells [27]. Leukocyte transendothelial migration is generally activated in cancer progression, which hampers the anti-tumor responses of the host [28]. The involved hub genes of these two pathways are CTSS, CXCR4, HLA-A/E, ITGA4, NCF2, PTPRC and VCAM1. Literature searches show that low expression of them might lead to the dyregulation of the pathways, which could directly affect tumor development. For example, CXCR4, the G-protein-coupled receptor, could lead to chemotaxis, enhanced intracellular calcium, cell adhesion, survival, proliferation, and gene transcription. Abnormal expression of CXCR4 is reported to be related with tumor growth, invasion, angiogenesis, metastasis, relapse, and therapeutic resistance [29].
Down-regulation of cathepsin S (CTSS) was also identified to suppress triple-negative breast cancer growth and metastasis [30]. NCF2 encodes p67phox, the cytosolic subunit of the NADPH oxidase enzyme complex, which acts as a p53 target gene. Down-regulation of NCF2 could stimulates apoptosis [31]. In the bone marrow, cancer cell VCAM-1 attracts and tethers α4 integrin-expressing osteoclast progenitors to facilitate their maturation into multinucleated osteoclasts that mediate osteolytic metastasis. Aberrant expression of VCAM-1 mediates distinct tumor-stromal interactions in bone microenvironments [32].
In addition, some of other hub genes were also found to show correlation with tumor or bone development. For example, PRKCB, was reported as a disease-specific druggable target for treatment of Ewing sarcoma, and the loss of PRKCB could induce apoptosis in vitro and prevented tumor growth [33]. CCNB2 [34] and RRM2 [35], could promote carcinogenesis. Gas7 [36,37], TNFAIP2 [38], CENPF [39], TOP2A [40], PTTG1 [41], and CTSS [30], are reported to be overexpression in different cancers. Low expression or knockdown of these genes could lead to the suppression of tumor cell proliferation, metastasis and invasion.
In a word, this study has screened a set of candidate genes that might contribute to the tumorigenesis of bone marrow cells. But the expression levels of these genes presented an obvious difference in bone tumor compared to other types of tumors.
This phenomenon suggested that tumor development in bone induced by CRPC bone metastases could have a different mechanism. The screened modules and hub genes would be good targets for further researches of tumorigenesis of bone and biomarkers screening.

Conclusion
In summary, this research creatively uses public data to identify tumor-related 13 modules and genes based on transcriptional network analysis. Three modules were found to be highly preserved among 4 GEO datasets. Literature searches showed that some of them are involved in bone development or related with tumorigenesis.
And some of the enriched hub genes are also implicated to be correlated with tumorigenesis in different types of cancers. Therefore, the hub genes of preserved modules might be used to offer clusters of candidate genes associated with the The data employed in this study are publicly available at: https://www.ncbi.nlm.nih.gov/geo. These genetic data have been employed in published studies and have been approved by the corresponding ethics committees.

Availability of data and materials
The data used in this study are available at: https://www.ncbi.nlm.nih.gov/geo. The accession numbers are GSE32269, GSE101607, GSE29650 and GSE74685.

Consent for publication
Not applicable.