Identification and characterization of metastasis-associated individualized gene expression signature in osteosarcoma

Introduction: Osteosarcoma (OS) patients with complete surgical resection still relapse with poor prognosis. Part of this is due to the inability to accurately detect distant metastasis. Thus it’s enssential to identify metastasis-related biomarkers for OS. Methods: In present study, a computational pipeline based on relative expression orderings (REOs) using gene expression proles was constructed in metastases and non-metastases OS patients. Results: 138 metastasis-associated gene pair signature (MGPS) were identified follow two independent datasets. In order to further extract metastasis-associated biomarker for clinical application, a metastases-specific co-expressed MGPS network was constructed and analyzed. MGPS such as MYL5 and RPL27A showed strong positive correlation (Cor =0.75, P <0.001) in metastatic OS patients. There were thirteen MGPSs in above network were associated with prognosis. These prognostic MGPSs could become as a specific classifier to distinguish metastases and non-metastases OS patients. Functional analysis showed MGPSs were associated with cancer metastasis-related functions. Drug and MGPS network could provide some drug candidates for treatment of OS. Conclusions: Collectively, the roles of the MGPSs in OS were elucidated, which could be beneficial for understanding OS pathogenesis and treatment.


Background
The American Cancer Society reported that the incidence of childhood and adolescent cancers is increasing rapidly. Among these, osteosarcoma (OS) is one of the cancer with highest mortality rate in the paediatric population [1]. In the last 35 years, there was no significant improvement in the fiveyear survival rate if metastases are present during diagnosis [2,3]. Similar to the observation in other cancers, metastasis is a major factor fo most death (75%) of OS patients [4]. For non-metastatic OS patients, tumor resection together with chenotherapy is a successful treatment plan. However, planning appropriate treatment for metastatic OS is still a great challenge for researchers and clinicians.
Cancer metastasis includ two process: (1) cancer cells disseminate from primary tumors to whole body. (2) establish secondary lesions in distant organs [5]. Identifying metastases-associated biomarker was a key insight of diagnosis and treatment for OS. Recent years, some attempt has been contributed to identify metastases-associated biomarkers using gene expression profiles for many kinds of cancers [6][7][8]. However, the number of metastases-associated biomarkers which could be applicated for clinic is very small. Usually, most of the suggested metastases-associated biomarkers, classify patients into different risk groups by differential expression or clustering. The thresholds of differential expression which generated from training data set could't be directly applied to another data set. One of major reason is the gene expression level is sensitve based on diverse batch effect and platform of microarray. Furthermore, this means that the generated data of all the samples detected together should be normalized when only deal with a single sample. In other words, risk composition of other samples which are normalized together determines the risk prediction of a single sample [9].
One of insensitive approach for systematic biases of microarray is within-sample relative expression orderings (REOs). REOs are invariant and robust for monotonic data normalization and against differences within biological indibiduals [9, 10]. In past years, REOs have been proposed to identify prognostic signatures [11,12]. However, the performance of REOs in predicting metastasesassociated signatures for OS has been not reported. Therefore, it is worth tring to identify robust metastases-associated biomarkers for clinical application in OS based on the rank-based approach.
In present study, a computational pipeline based on REOs using gene expression proles was constructed in metastases and non-metastases OS patients. Follow two independent datasets, 138 common metastasis-associated gene pair signature (MGPS) were identified in OS. A metastasesspecific co-expressed MGPS network and its topological characteristics was constructed and analyzed.
Some of MGPS such as MYL5 and RPL27A showed strong positive correlation (Cor = 0.75, P < 0.001) in metastatic OS patients. Thirteen prognostic MGPSs in co-expressed MGPS network were discovered.
These prognostic MGPSs could better classify metastases and non-metastases OS patients. Functional analyses suggested that MGPSs were related with regulation of cancer and cancer-related pathways.
Drug and MGPS network could provide some drug candidates for treatment of OS. Collectively, this study could provide assistance for diagnosis and treatment of OS. 4

Extract GMPSs in OS for metastases and non-metastases patients
We built an integrated and computational pipeline based on REO approach using gene expression profiles ( Figure 1A). First, 245 metastases-related genes were identified and they would be used to follow analysis. Second, a 0-1 matrix for a pair of genes (G m and G n ) in metastases and nonmetastases OS patients according to expression pattern of the gene pair. Then, if the frequency of metastases OS patients accord with a particular REO pattern was measured by fisher's exact test compared with non-metastases controls. The particular REO pattern was expression Gm > Gn or Gm < Gn. Lastly, MGPSs were identified if the P values of Fisher's exact test were smaller than 0.05. We performed this pipeline based on dataset from TARGET. The distribution of P values of MGPSs with fluctuation is between 0.01 and 0.05 ( Figure 1B). We divide all P values into five regions: 0 to 0.01, 0.01 to 0.02, 0.02 to 0.03, 0.03 to 0.04 and 0.04 to 0.05. The amount in each region had no great difference ( Figure 1C). In order to ensure the accuracy and stability of MGPSs for OS, another dataset GSE33382 from GEO was used for identifying MGPSs. The P values of GSE33382 showed similar distribution to TARGET dataset ( Figure 1D, 1E). Lastly, we got 138 common MGPSs in both dataset TARGET and GSE33382 for OS ( Figure 1F).

OS
The correlation values of MGPSs would showed great difference between metastases and nonmetastases OS patients. According to the inference, a metastases-specific co-expressed network was constructed (Figure 2A). The network contained 71 nodes and 57 edges. More than half MGPSs showed great difference (absolute values of difference for PCCs > 0.25) between metastases and nonmetastases OS patients. There were 50.88% and 5.26% MGPSs were both positive and negative correlated in metastases and non-metastases OS patients ( Figure 2B). The correlated direction of 43.86% MGPSs were reversed in metastases and non-metastases OS patients. The global network showed scale-free distribution (R square= 0.75) which is a specific topological feature of transcriptional regulatory network ( Figure 2C). We found metastases-related gene RPL27A had 5 highest degree in this metastases-specific co-expressed network (degree= 19). Most of the interacted strength between RPL27A and other genes had great difference in metastases and non-metastases OS patients. For example, gene RPL27A and MYL5 had strong positive correlations (Cor= 0.75, P< 0.001) in metastatic OS patients ( Figure 2D). However, gene RPL27A and MYL5 were not correlated in non-metastatic OS patients (Cor= 0.02, P= 0.86, Figure 2E). The results indicated that these MGPSs maybe play key and specific roles in metastases process for OS patients.

Some genes in MGPSs were associated with prognosis for OS patients
As we known, time to metastases and number of lesions are the most important prognostic factors for OS patients [13]. Thus it's essential to consider the prognostic roles of MGPSs in OS patients. 13 genes were associated with survival in above metastases-specific co-expressed network for OS ( Figure 3A). For example, prognosis-related MGPSs CYBB and P4HA1 showed strong negative correlations in metastases OS patients (Cor= -0.35, Figure 3B). Prognosis-related MGPSs CYBB and P4HA1 also showed strong negative correlations in metastases OS patients (Cor= -0.25, Figure 3C).
Gene P4HA1, CYBB and CD53 were all related with survival ( Figure 3D, E, F). In addition, samples with high expression for these genes usually showed worse survival. All the results indicated that some genes in MGPSs were associated with prognosis for OS patients.

Prognosis-related genes in MGPSs could classify metastases and non-metastases OS patients
We constructed a prognosis-related MGPS network which extracted from metastases-specific coexpressed network. This network contained prognosis-related genes and their direct neighbored genes in metastases-specific co-expressed network ( Figure 4A). There were 25 nodes (13 prognosisrelated genes) and 18 edges in this prognosis-related MGPS network. We found three MGPSs including P4HA1-CYBB, P4HA1-EVI2B and P4HA1-CD53 were all associated with survival. In order to validate if these prognosis genes in MGPSs could become biomarkers for metastases of OS patients, we classify metastases and non-metastases OS patients using gene expression profile follow a consensus clustering method. The prognosis-related genes in MGPSs could distinguish all samples into diverse groups. Final number of group was 4 based on area under CDF curve plot (Figure 4B and C). Each 6 OS patients group had a consensus expression pattern and could distinguish clearly ( Figure 4D).
Most OS patients could be classified accurately (Chi-square test, P< 0.001) and especially the last group (C4) were all matched with the non-metastases OS patients ( Figure 4E). Collectively, all the above results indicated that the expression of prognosis-related genes in MGPSs could be served as specific biomarkers to distinguish metastases and non-metastases OS patients. On the contrary, metastasis would be promoted by self-eating based on strengthening fitness of tumor cell environmental stresses response including anoikis during metastatic progression [17]. For KEGG pathway enrichment analyses, these MGPSs were enrichment in some key pathways associated with cancer or metastasis such as Natural killer cell mediated cytotoxicity, B cell receptor signaling pathway and MAPK signaling pathway ( Figure 5B). Natural killer cells participate to immune response against metastasis [18]. These functional enrichment analysis showed these GMPSs were associated with cancer metastasis.
We also constructed a drug-related MGPS network to explore drug repurposing candidates for OS ( Figure 5C). We found Proline (DB00172) was a drug repurposing candidate and its target gene were P4HA1 and PYCR2. Proline is one of the twenty amino acids in organism, which is a component of protein. Normal functions of jonts and tendons and maintain and strengthen all reply on proline at a great extent [19]. Azacytidine is clinically used to treat myelodysplastic syndrome, a group of heterogeneous bone marrow stem cell diseases [20]. In our analysis, Azacitidine was considered as a drug candidate for OS. Azacitidine is a cytidine nucleoside analogue, which has the clinical activity of 7 myelodysplastic syndrome and the potential activity of solid tumor.

Discussion
In this study, we suggested that the MGPSs using REO approach follow expression level of genes play specific roles in metastasis process of OS patients. These MGPSs could be as metastasis-related biomarkers for OS. Differential expression analysis for gene expression profiles between metastases and non-metastases samples was considered as a common approach. The adoption of normalization for other samples contribute to differential expression of patients and it would generate a lot of uncertainty for risk classification of patient. Especially, the uncertainty would increase if the sample size was small and could't represent all the disease patients [21]. Specially, qualitative REOs-based MGPSs would identify more accurated personalized information of individual patient than traditional differential expression in clinical application.
A hypothesis that the OS patients with poor prognosis might harbor metastases was advanced. Thus we performed survival analysis and identified 13 prognosis-related genes in MGPSs. The problems of nsufficient power and too complex to evaluate metastasis would be present based on all the MGPSs in OS patients. According the clinical needs, we proposed prognosis-related MGPSs based a strict voting criterion for survival analysis and proved that these prognosis-related MGPSs performed better than all the MGPSs. The results indicated that prognosis-related MGPSs maybe more suitable for clinical application of predicting metastasis. In order to explore if these prognosis-related genes in MGPSs could be specific biomarkers of metastasis in OS patients, we used them to classify metastases and non-metastases patients based on gene expression. The 13 prognosis-related genes could classify metastases and non-metastases patients (P < 0.001).
In prognosis-related GMPSs network, gene P4HA1 was a key node with highest degree. The previous study has reported that P4HA1 was the active catalytic component of prolyl 4-hydroxylase in cancers [22], glioma [23], prostate cancer [24] and pancreatic cancer [25], P4HA1 can promote chemoresistance, tumor growth and metastasis. The prognosis-related gene CD53 could form a MGPS with P4HA1 in OS patients. CD53 is essential for CD2 signal transduction, growth regulation and cell survival of cancer [26]. P4HA1 was also a drug target in drug-related MGPS network. The roles of 8 P4HA1 in OS should be explored and validated.

Conclusions
In this study, some MGPSs were identified and characterized for OS patients based on gene expression profiles. Correlations of some MGPSs showed obvious difference between metastases and non-metastases samples. Prognosis-related genes in metastases-specific co-expressed network could become as specific biomarkers to classify metastases and non-metastases OS patients. The functional analysis showed the association between MGPSs and cancer metastasis in OS patients. Collectively, our study provides novel insights into the mechanisms underlying the roles of MGPSs in metastasis process for OS.

Clinical and gene expression profile datasets of OS
There were two main independent datasets in present study. First dataset including gene expression and related clinical information of 92 OS patients containing 23 and 69 metastatic and non-metastatic patients were obtained from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET, https://ocg.cancer.gov/programs/target) data portal, which included 17070 mRNAs. TARGET program provides a conprehensive genomic landscape to explore molecular characteristics of childhood cancers. Providing a novel guide for developing effective therapeutic plans based on generated data is the major goal of TARGET program. The case selection criteria and sample details could be obtained at https://ocg.cancer.gov/programs/target/projects/osteosarcoma. Then, the second gene expression profile was obtained from gene expression omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33382). Dataset GSE33382 gene expression profile included 23 and 69 metastatic and non-metastatic patients. The detailed information of patients could be found in previous study [27,28]. Corresponding platform files (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL10295) was used to map the gene Ids to genes. The averaged expression values were calculated and applied when multiple probes correspongding to a same gene for each sample. The probes which could't match any gene or matched not only a gene would be removed.

Identification of MGPS in OS
First, some metastases-related genes were obtained from CancerSEA (http://biocc.hrbmu.edu.cn/CancerSEA/), which is a database describes the functions of cancer cells [29]. We got 245 metastases-related genes and they would be used to follow analysis. The gene pair must have at least one metastases-related gene. Then, we built a 0-1 matrix for a pair of genes (G m and G n ) in metastases and non-metastases OS patients. Lastly, if the frequency of metastases OS patients accord with a particular REO pattern was measured by fisher's exact test compared with nonmetastases controls. The particular REO pattern was expression G m > G n or G m < G n . The significant gene pairs evaluated with P < 0.05 were considered as MGPSs.

Construction and analysis of metastases-specific co-expressed MGPS network and its topological features
In order to construct a metastases-specific co-expressed MGPS network, pearson's correlation coefficients (PCCs) are calculated for each MGPS in metastases and non-metastases patients, respectively. The MGPSs were considered as metastases-specific co-expressed MGPSs, if absolute values of difference between PCCs in metastases and non-metastases patients were bigger than 0.2.
Then a metastases-specific co-expressed MGPS network was constructed using Cytoscape 3.3.0 (http://www.cytoscape.org/). The degree analysis of the network was also using Cytoscape 3.3.0.

Survival analysis of MGPSs for OS patients
In order to evaluate the performance about the MGPS for prognosis in OS, we performed survival analysis for genes in each MGPS. Follow the median value of expression level for each gene, the OS patients were divided into two risk groups. And then, Kaplan-Meier (K-M) survival analysis was used for the two groups. P < 0.05 was consider as prognostic gene for OS.

Classification power of the prognostic MGPS in OS
Consensus clustering approach was used to classify metastases and non-metastases OS patients based on expression data of genes [30]. A R package named ConsensusClusterPlus (https://www.rproject.org/) was performed to this process. Best category number was select when the areas of Cumulative distribution function (CDF) curves were smallest. Chi-square test was applied to evaluate if metastases and non-metastases OS patients could be classified using this method (P<0.01).

Functional and drug enrichment analysis for MGPSs in OS
Online Enrichr (http://amp.pharm.mssm.edu/Enrichr/) tool was applied with default parameters to functional enrichment analysis for genes in MGPSs [31]. The enriched GO terms (P<0.01) and KEGG pathways (P<0.05) were extracted and considered as MGPS-associated functions. The gene-drug interaction data are download from DrugBank (https://www.drugbank.ca/) [32]. Then a drug-related MGPS network was constructed and analyzed to identify drug repurposing candidates for OS.

Funding
Not applicable.

Disclosure of interest
The authors declare that they have no competing interest.