Identication of potential prognostic markers for osteosarcoma by integrated analysis

Background: Osteosarcoma (OS), one of the utmost common and malignant cancer, accounts for over 30% among skeletal sarcomas. Although great efforts have been made, the mechanism of OS still remains largely unknown. Here, we intend to identify gene modules and candidate biomarkers for clinical diagnosis of patients with OS, and reveal the mechanisms of OS progression. Methods: Weighted gene co-expression network analysis (WGCNA) was conducted to build a coexpression network and investigate the relationship between modules and clinical traits. Functional enrichment analysis was performed on module genes. Protein-protein interaction (PPI) network was constructed to identify the hub gene and the expression level of hub genes was validated based on another dataset. Results: A total of 9854 genes were included in WGCNA, and 17 gene modules were constructed. Gene module related with OS in sacrum was mainly enriched in skeletal system development, bone development and extracellular structure organization. Furthermore, we screened the top 10 hub genes and further validated 5 of the 10 (MMP13, DCN, GNG2, PCOLCE and RUNX2), the expression of which were upregulated as compared with normal tissues. Conclusion: The hub gene we identied show great promise as prognostic markers for the management of OS and our ndings also provide new insight for molecular mechanism of OS.


Introduction
Osteosarcoma (OS), as one of the utmost common and malignant cancer, mainly affects the adolescent and elderly individuals [1,2]. OS can be divided into diverse sub-types associated with varying degrees of aggressiveness. Most OSs are high-grade malignancies, which is characterized by its aggressiveness and strong tendency for the development of pulmonary metastases [3]. The 5-year survival rate for patients with OS is less than 25%, indicating an extremely poor prognosis. Therefore, a better understanding of underlying mechanisms regarding the tumorigenesis and prognosis of OS plays a vital role to actively develop novel therapeutic targets [4,5].
Previous researches have analyzed differentially expressed genes (DEGs) between metastatic and nonmetastatic samples and functional enrichment analysis by microarray data of OS [6]. Several DEGs such as astrocyte elevated gene-1(AEG-1) and epidermal growth factor receptor (EGFR) were found to play an oncogenic role in the progression of OS [7,8]. However, these analyses cannot reveal the connections and interactions among genes that are crucial in biological processes.
Weighted gene co-expression network analysis (WGCNA) is an effective tool to reveal the correlations between genes and phenotypes in different samples. It functions as a kind of method to investigate gene modules which includes associated genes and establishes a weighted gene co-expression network.
WGCNA also serves a role to correlates gene modules of external sample trait [9]. As a result, WGCNA has been globally applied to help discover potential biomarkers and therapeutic sites during various biological processes including cancer [10][11][12].
In our study, we proposed to build co-expression module by analyze the data of osteosarcoma and nd the modules of co-expressed genes which affected speci c sites of OS. We applied the GO and KEGG enrichment analysis on interested modules to investigate involved cell signal pathways and functional genes to understand the main gene function in speci c module. Furthermore, module gene network was created and investigated hub genes in the module by Cytoscape software. Finally, we validated the top values of genes in the PPI network based on another data set and got 5 hub genes, such as MMP13, DCN, GNG2, PCOLCE and RUNX2, which may be associated with the OS development and clinical characteristics. Our study not only laid a theoretical foundation for further research and the discovery of new OS target biomarkers, but also helped in nding a novel and potential clinical therapy.

Materials And Methods
Data acquisition and processing Series matrix les and data tables in microarray platform for WGCNA searched with keywords "Osteosarcoma" (Title) AND "Homo Sapiens"(Organism) were attained from NCBI Gene Expression Omnibus (GSE19276). The joint data set involved 47 osteosarcoma samples, including tibial group (n = 13), femoral group (n = 18), pelvic group (n = 7), Sacral group (n = 2), rib group (n = 2) and calcaneus group (n = 5). Before WGCNA analysis, we applied the array annotations to pair array probes with corresponding gene IDs. Then we removed the probes with multiple genes and counted mean expression of the gene values detected by multiple probes. An appropriate threshold was determined on the basis of different threshold of expression value. We applied WGCNA algorithm to assess the expression of different genes, and then we used ashClust to apply cluster analysis with proper threshold [9,13,14].

Construction of WGCNA and identi cation of modules
By constructing co-expression network gene modules related to osteosarcoma, we applied function soft Connectivity in WGCNA package to build a co-expression network in GSE19276 dataset. The top 5000 genes were screened by this algorithm for further analysis with the soft threshold set at 9. Weighted coexpression relationships among datasets was evaluated by paired Pearson correlations [9]. Hierarchical clustering analysis was performed by applying topological overlap matrix (TOM) and the clustering dendrogram was constructed by hierarchical clustering. Different branches in clustering dendrogram denoted diverse gene modules. All the genes were classi ed based on their expression patterns and weighted correlation coe cients of genes. Module was de ned as the different genes which have similar patterns. Module-trait association was valued by matching each module and clinical data provided by GSE19276 dataset.
Functional and pathway enrichment of gene modules Kyoto encyclopedia of genes genomes (KEGG) pathway analysis and gene ontology (GO) were performed to recognize the function and pathways regarding each module. The DAVID 6.8 (https://david.ncifcrf.gov/) database was used with the threshold p < 0.05 for the KEGG and GO [15,16].

Hub Gene Analysis and Validation
Hub genes have been shown to be functionally signi cant and exceedingly connected with nodes among modules.
The protein-protein network (PPI) was constructed by uploading the genes among certain module to identify the importance of the top graded genes [17]. The Cytoscape software was further used to conduct the network analysis and we identi ed hub genes as genes of top 10 degree [18]. Independent gene expression pro les (GSE9460) containing osteosarcoma and healthy bone tissue were attained by the Gene Expression Omnibus. And the test sets were used to validate expression level of 10 genes with the highest degree. The dataset GSE9460 contained 15 primary osteosarcoma tissues and 3 healthy bone tissues.

Results
Data processing A total of 47 tissue sample raw les were attained from NCBI Gene Expression Omnibus. Firstly, we performed Robust Multiarray Averaging (RMA) correction, transformed signals with log2 and normalized the data through quantile normalization. Then, median polishing probe sets were summarized by "affy" R package and 9854 probes were selected for further analysis.

Construction of Weighted Gene Co-Expression Network of osteosarcoma
The sample clustering dendrogram of osteosarcoma is shown in Fig. 1A. The soft-power threshold β was measured by scale independence and mean connectivity analysis of modules with different power values ranging from 1 to 14. As revealed in Figure 1B-C, β = 9 were selected for further analysis of the osteosarcoma. We detected gene modules based on the TOM matrix and 17 modules were generated in osteosarcoma for further analysis and labeled 1-17. The major module had 1440 genes with the minimum of 30 genes and module included 294 genes in average. Figure 2A-B showed the interaction relationship analysis among co-expression genes.

Module-clinical trait correlations
It is of importance to identify diverse genes associated with speci c clinical feature so as to clarify hiding mechanism behind clinical trait. The clinical trait of osteosarcoma patients involving caocaneum, femur, pelvis, rib, sacrum and tibia were selected for the analysis. As is shown in Figure 3A, modules of similar expression forms related to speci c clinical feature were recognized on the basis of eigengene and clinic traits correlation. In order to identify related eigengenes, the heatmap and eigengene dendrogram was performed. As shown in Figure 3B-C, the dendrogram shows that green module signi cantly associated with osteosarcoma clinical traits (sacrum), and the scatterplot of Module Membership vs Gene Signi cance was plotted in green module ( Figure 3D).

Functional enrichment analysis of genes in interested modules
In order to better understand the function of 266 candidate module genes, we used the online database DAVID, GO analysis, KEGG analysis using R package of DOSE and the clusterPro ler for pathway enrichment analysis [19,20]. We conducted Go enrichment analysis in the selected green module [16]. As shown in Figure 4, the great majority of genes of green module were enriched in GO:0001501 (skeletal system development), GO:0060348 (bone development) and GO:0043062 (extracellular structure organization).

Module genes network analysis and hub genes identi cation
In order to understand the association between genes in the green module, we performed PPI network by Cytoscape software based on WGCNA. As shown in Figure 5, the co-expression networks of top-ranked genes for the selected green module was constructed. Genes interconnected with over 6 nodes were identi ed as hub nodes (MMP13, COL6A1, DCN, FAM20C, FBN1, GNG2, PCOLCE, PCOLCE2, RUNX2 and VCAN) and the expression level of above hub genes was depicted in Figure 6A.

Hub gene validation
The expression level of genes with the top 10 values were validated using the GSE9460 data set ( Figure  6B). In the validation set, we found that 5 of the 10 genes (MMP13, DCN, GNG2, PCOLCE and RUNX2) were signi cantly up-regulated in osteosarcoma group (p < 0.05), consistent with the analysis of GSE19276.

Discussion
Osteosarcoma (OS) is one of the utmost common and malignant cancer, which mainly affects children and adolescents with the clinical feature of high metastatic potential. During the past several decades, despite great improvement in treatment strategies has been achieved to eliminate the local tumor, the survival rate for patients still gets unchanged in the long term [1,2]. So as to improve survival rate of patients suffered from osteosarcoma, it is of great importance to put more efforts in underlying mechanism of osteosarcoma. Previous studied have demonstrated several target biomarkers of OS based on both gene expression pro ling and micro-RNA pro ling of the primary tumor [21]. Wang and Baumhoer et al. had analyzed the OS sample by RT-PCR or immunohistochemistry and observed that insulin-like growth factor-1 receptor (IGF-1R) was independent prognostic markers for OS and the high expression of IGF-1R was correlated with progression and metastasis of OS [22,23]. A study performed by Duan et al. evaluated different miRNA in OS by Real-time PCR and reported that 3 miRNA were increased in osteoblast as comparted with different kinds of OS cells, while 2 miRNAs were downregulated in osteoblast [24]. Still, we still lack powerful prognostic biomarkers for patients with OS. WGCNA, an effective method to detect the complicated relationships between different genes and phenotypes, helps us apperceive signaling networks associated with clinical traits. The analysis of WGCNA helps us to identify modules related to biological process and hub genes, which might be biomarkers for identi cation or therapy [9]. By constructing WGCNA to detect gene expression of OS, Fan et al. identi ed matrix metallopeptidase 3 (MMP3) as a vital gene for OS development [25]. Likewise, Wang et al. reported the involvement of Matrix extracellular phosphoglycoprotein (MEPE) and Hemoglobin A2 (HBA2) in OS [26]. However, the precise mechanism of OS remains to be elucidated and more candidate biomarkers or therapeutic target needs to be discovered. Here, we utilized WGCNA to investigate co-expression modules associated with OS and forecasted candidate gene in speci c bioprocess.
We rst constructed 17 co-expression modules among 9854 genes in 47 OS samples via WGCNA, followed by module-trait correlations. As a result, we identi ed the green module that related to clinical traits (patient's OS location). Then the functional enrichment analysis was conducted in green module.
Consistent with the previous study by Xiong and colleague, our result exhibited the green module was mainly enriched in pathway related to skeletal system development, bone development and extracellular structure organization [6]. Since the development of OS was accompanied with oncogenes and antioncogenes of abnormal levels, the enriched pathway that the green module involved in may regulate genes which were associated with skeletal system development. Heng et al. assessed the DEGs and microRNAs in OS samples and found that downregulated DEGs COL5A1 were enriched in cellular structure organization as well as extracellular collagen biosynthesis pathway [27]. Increased extracellular collagen degradation might promote the development of OS and metastasis to lungs, which indicated that genes involved in collagen synthesis might have a negative effect on extracellular matrix, which may be responsible for OS tumor invasion in an undirect manner.
In addition, PPI network was built and hub genes in OS-related regulatory network was recognized. In this study, we assessed the importance of genes via degree in network. The hub genes with highest degree were identi ed and COL6A1, MMP13, DCN, FAM20C, FBN1, GNG2, PCOLCE, PCOLCE2, RUNX2 and VCAN were viewed as hub genes. Furthermore, we validated 5 of 10 hub genes, MMP13, DCN, GNG2, PCOLCE, and RUNX2, had signi cantly high expression pro le in the validation data set.
As is reported, numerous genes had correlations with OS. Matrix metalloproteinase-13(MMP13) had been recognized as one of the most important regulator involved in tumor osteolysis. Ma et al. identi ed MMP-13 as the target gene of proteolytic enzyme that regulated tumor-induced osteolysis [28]. Hirahata and colleagues showed that high MMP13 expression was of great importance for the survival of OS cells [29]. Decorin (DCN), Procollagen C-proteinase enhancer protein(PCOLCE) and Runx2 had an vital part in lung metastasis of osteosarcoma [30,31]. Runx2 is a kind of protein that play major roles in bone formation. A study by Won assessed the expression of Runx2 in OS tissues and concluded that overexpression of Runx2 was associated with OS metastasis. Also, patients of osteosarcoma who had good reaction to chemotherapy usually had lower expression level of Runx2 compared with bad responders [32]. No reports have revealed that G Protein Subunit Gamma 2 (GNG2) was related to OS and further study was needed.

Conclusion
To sum up, the green module was viewed as a critical module correlated with progression of osteosarcoma patients. Furthermore, the hub genes in the module were con rmed through another dataset, which might help in identifying diagnostic and prognostic biomarkers of OS.

Declarations Acknowledgments
We greatly appreciate the nancial support from the National Natural Science Foundation of China (81772433).

Funding
This work was supported by grants from the National Natural Science Foundation of China (no. 81772433).

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate

Competing interests
The authors declare that they have no competing interests.   The Cytoscape 3.4 software was applied to visualize proteins in different expression patterns in enriched KEGG pathways. Nodes containing proteins from related KEGG terms are colored the same.