Data processing
A total of 47 tissue sample raw files were attained from NCBI Gene Expression Omnibus. Firstly, we performed Robust Multiarray Averaging (RMA) correction, transformed signals with log2 and normalized the data through quantile normalization. Then, median polishing probe sets were summarized by “affy” R package and 9854 probes were selected for further analysis.
Construction of Weighted Gene Co-Expression Network of osteosarcoma
The sample clustering dendrogram of osteosarcoma is shown in Fig. 1A. The soft-power threshold β was measured by scale independence and mean connectivity analysis of modules with different power values ranging from 1 to 14. As revealed in Figure 1B-C, β = 9 were selected for further analysis of the osteosarcoma. We detected gene modules based on the TOM matrix and 17 modules were generated in osteosarcoma for further analysis and labeled 1-17. The major module had 1440 genes with the minimum of 30 genes and module included 294 genes in average. Figure 2A-B showed the interaction relationship analysis among co-expression genes.
Module-clinical trait correlations
It is of importance to identify diverse genes associated with specific clinical feature so as to clarify hiding mechanism behind clinical trait. The clinical trait of osteosarcoma patients involving caocaneum, femur, pelvis, rib, sacrum and tibia were selected for the analysis. As is shown in Figure 3A, modules of similar expression forms related to specific clinical feature were recognized on the basis of eigengene and clinic traits correlation. In order to identify related eigengenes, the heatmap and eigengene dendrogram was performed. As shown in Figure 3B-C, the dendrogram shows that green module significantly associated with osteosarcoma clinical traits (sacrum), and the scatterplot of Module Membership vs Gene Significance was plotted in green module (Figure 3D).
Functional enrichment analysis of genes in interested modules
In order to better understand the function of 266 candidate module genes, we used the online database DAVID, GO analysis, KEGG analysis using R package of DOSE and the clusterProfiler for pathway enrichment analysis[19, 20]. We conducted Go enrichment analysis in the selected green module[16]. As shown in Figure 4, the great majority of genes of green module were enriched in GO:0001501 (skeletal system development), GO:0060348 (bone development) and GO:0043062 (extracellular structure organization).
Module genes network analysis and hub genes identification
In order to understand the association between genes in the green module, we performed PPI network by Cytoscape software based on WGCNA. As shown in Figure 5, the co-expression networks of top-ranked genes for the selected green module was constructed. Genes interconnected with over 6 nodes were identified as hub nodes (MMP13, COL6A1, DCN, FAM20C, FBN1, GNG2, PCOLCE, PCOLCE2, RUNX2 and VCAN) and the expression level of above hub genes was depicted in Figure 6A.
Hub gene validation
The expression level of genes with the top 10 values were validated using the GSE9460 data set (Figure 6B). In the validation set, we found that 5 of the 10 genes (MMP13, DCN, GNG2, PCOLCE and RUNX2) were significantly up-regulated in osteosarcoma group (p < 0.05), consistent with the analysis of GSE19276.