3.1 Clinical characteristics of different cohorts
Clinical characteristics of TARGET and GSE21257 cohorts were shown in Table 1. Because age and overall survival time (OS.time) were not always normally distributed in the two sets, so Kruskal test was adopted in the comparison between the two sets. Results showed that there was no significant difference between the two sets in gender, site, metastasis, OS.time and OS status. In GSE21257, the median of age was elder than that in TARGET cohort, with P value = 0.048. The two cohorts were considered to be similar and comparable. Thus, we considered GSE21257 to be a suitable validation set.
Table 1
Clinical characteristics of TARGET-OS and GSE21257 cohorts
Characteristics
|
Overall
|
TARGET
|
GSE21257
|
p
|
test
|
Sample size
|
137
|
84
|
53
|
|
|
Age (median)
|
15.10 [12.35, 18.08]
|
14.35 [12.20, 17.23]
|
16.67 [13.67, 18.58]
|
0.048
|
kruskal.test
|
Gender
|
|
Female
|
56 (40.9)
|
37 (44.0)
|
19 (35.8)
|
0.440
|
chisq.test
|
Male
|
81 (59.1)
|
47 (56.0)
|
34 (64.2)
|
Site
|
|
Axial skeleton
|
4 (2.9)
|
4 (4.8)
|
0 (0.0)
|
0.372
|
chisq.test
|
Femur
|
65 (47.4)
|
38 (45.2)
|
27 (50.9)
|
Tibia
|
36 (26.3)
|
21 (25.0)
|
15 (28.3)
|
others
|
32 (23.4)
|
21 (25.0)
|
11 (20.8)
|
Metastasis
|
|
Absent
|
102 (74.5)
|
63 (75.0)
|
39 (73.6)
|
1.000
|
chisq.test
|
Present
|
35 (25.5)
|
21 (25.0)
|
14 (26.4)
|
OS.time (median)
|
46.00 [22.77, 77.00]
|
48.65 [20.63, 69.15]
|
45.00 [27.00, 94.00]
|
0.122
|
kruskal.test
|
OS status
|
|
Alive
|
87 (63.5)
|
57 (67.9)
|
30 (56.6)
|
0.250
|
chisq.test
|
Dead
|
50 (36.5)
|
27 (32.1)
|
23 (43.4)
|
3.2 Results of the prognostic gene identification
14 genes were screened by univariate Cox regression analysis with prognostic value (P value < 0.05). 11 of them with hazard ratio (HR) > 1 and 3 with HR ≤ 1. The prognostic value of genes with HR close to 1 were limited, such as MYC. While with HR value far from 1 were more suitable as prognostic genes. The minimum and maximum HR values were 0.94 for CTNNBIP1 and 1.1 for FZD3, respectively. Here, CTNNBIP1 with the smallest HR value 0.94 was considered to be one of the best candidates for a prognostic model. Finally, 9 genes with P value < 0.01 (DKK1, MYC, WNT16, PLCB4, PPP3CC, PRKX, CTNNBIP1, WNT10B, WNT11) were select as candidates for the next screening (Fig. 1).
Result of Boruta feature selection showed that CTNNBIP1, PRKX, DKK1, PPP3CC and MYC were genes confirmed according the importance of the genes (Fig. 2A-B) and CTNNBIP1 got the highest importance score. KM analysis of the 9 genes showed that CTNNBIP1, PLCB4, MYC, PRKX were genes with P value < 0.05 between their high and low expression groups respectively and CTNNBIP1 got the smallest P value 0.0024 (Fig. 2C-F). Therefore, taking the above results together, we considered CTNNBIP1 to be a hubgene with the highest prognostic value in Wnt signaling pathway of osteosarcoma and it was selected as an indicator for the prognostic model.
3.3 Identify clinical and immune cell infiltration indicators for the prognostic model
Clinical features were screened by univariate and multivariate Cox regression analysis in TARGET-OS cohort and the results were shown in Table 2. Metastasis and tumor site were considered to be indicators with prognostic value (P < 0.05).
Table 2
Univariate and multivariate Cox regression analysis of the clinical features
Characteristics
|
Univariate Cox regression
|
Multivariate Cox regression
|
Hazard.Ratio (95%CI)
|
P.Value
|
Hazard.Ratio (95%CI)
|
P.Value
|
Age
|
1 (0.92–1.09)
|
0.976
|
#N/A
|
#N/A
|
Ethnicity
|
0.38 (0.13–1.07)
|
0.068
|
#N/A
|
#N/A
|
Gender
|
0.71 (0.34–1.52)
|
0.382
|
#N/A
|
#N/A
|
Metastasis
|
4.76 (2.22–10.22)
|
0
|
4.43 (2.01–9.75)
|
0
|
Race
|
1.31 (0.38–4.56)
|
0.672
|
#N/A
|
#N/A
|
Region
|
0.34 (0.11–1.04)
|
0.059
|
#N/A
|
#N/A
|
Site (control: axial skeleton)
|
|
Tibia
|
0.1 (0.02–0.51)
|
0.006
|
0.1 (0.02–0.52)
|
0.006
|
Femur
|
0.34 (0.1–1.19)
|
0.092
|
0.25 (0.07–0.92)
|
0.038
|
Others
|
0.19 (0.04–0.78)
|
0.022
|
0.17 (0.04–0.76)
|
0.02
|
Surgical strategy
|
2.34 (0.3-18.02)
|
0.413
|
#N/A
|
#N/A
|
Immune scores of 64 immune cells for patients in TARGET-OS and GSE21257 cohorts were calculated by “xCell” package. Univariate Cox regression analysis in TARGET-OS cohorts indicated that Plasma cells, MPP, Macrophages, Macrophages M2, Sebocytes, CD4 T cells, Macrophages M1, Keratinocytes were candidate indicators with P value < 0.05 (additional file 1). Then, GSE21257 cohort was used to validate the prognostic value of these cells, result showed that only Macrophages M2 was successfully validated in GSE21257 with P value = 0.018 in univariate Cox regression analysis (additional file 2).
3.4 Establishment and evaluation of the prognostic model in training set.
The survival characteristics of patients in high and low CTNNBIP1 expression groups in TARGET-OS cohort were shown in Fig. 3A. Patients were divided into high and low expression group by median of CTNNBIP1 and more alive patients with longer survival time were found in the high expression group (more green dots in the upper left of the plot). Then, the prognostic model, established by CTNNBIP1, metastasis, site and M2 macrophages infiltration was visualized by a nomogram (Fig. 3B). By summing up the points of each indicator in the nomogram to predict the survival probability of an osteosarcoma patient, we could see that the expression of CTNNBIP1 and M2 macrophages infiltration acted as protective factors, while present metastasis and tumor site at axial skeleton were risk factors for survival. C-index of the model in training set was 0.812 (95%CI: 0.768–0.856). Meanwhile, calibration analysis showed that the predict overall survival rates of 1-, 3- ,5-year were in high agreement with the observed overall survival rates (the solid color line were close to gray dashed ideal line) (Fig. 3C). Results of C-index and calibration analysis suggested high prediction accuracy of the model in training set. AUCs of Time dependent ROC analysis for 1-, 3- and 5-year prediction of the model were 0.96, 0.81 and 0.81, respectively (Fig. 3D). Decision curve analysis showed that patients could got higher net benefit from the model (nomogram) than other strategies (based on 5-year prediction) (Fig. 3E). Besides, although lower than the nomogram strategy, CTNNBIP1 alone was also a good strategy with higher net benefit compared with treat all and treat none strategies.
3.5 Validation of the model in GSE21257
KM analysis showed that there was a significant difference in the prognosis of osteosarcoma patients between high and low CTNNBIP1 expression groups in GSE21257 with P value = 0.0063 (Fig. 4A). The survival characteristics of patients in high and low CTNNBIP1 expression groups in GSE21257 cohort showed that there were more alive patients with longer survival time in high expression group (more green dots in the upper left of the plot) and this result was consistent with that in training set (Fig. 4B). Nomogram in GSE21257 showed that the expression of CTNNBIP1 and M2 macrophages infiltration acted as protective factors and present metastasis acted as a risk factor for survival, the same with that in training set (Fig. 4C). While there were also some minor differences in the contribution of different tumor sites to the survival rate between the two sets, which probably due to the small sample size and the absence of patients with tumor site at axial skeleton in validation set. C-index of the model in training set is 0.787 (95%CI: 0.739–0.835), a little lower than that in training set. Calibration analysis indicated that the predict overall survival rate of 1-, 3- ,5-year were in high agreement with the observed overall survival rate (the solid color line were close to gray dashed ideal line) (Fig. 4D). Though C-index in validation set was a litter lower than it in training set, but the results of C-index and calibration analysis still suggested high prediction accuracy of the model in validation set. AUCs of Time dependent ROC analysis for 1-, 3- and 5-year prediction of the model were 0.8, 0.83 and 0.89, respectively (Fig. 4E). Decision curve analysis showed that compared with treat all and treat none strategies, patients could get higher net benefit from the model (nomogram) and CTNNBIP1 strategy (based on 5-year prediction) (Fig. 4F). Meanwhile, when the threshold > 20%, the nomogram strategy was better.
3.6 Identification and functional analysis of the differentially expressed genes between normal and tumor groups.
There were 10759 differentially expressed genes between GTEX normal and TARGET-OS tumor groups with adjusted P value < 0.05 and |log2fold change| > 1 by “DESeq2” package. The names of genes with adjusted P value < 0.0001 and |log2fold change| > 8.5 were shown in the volcano plot (Fig. 5A). Biological processes (BP) of GO clustering of the DEGs showed that they were top clustered in mitochondrial gene expression, mitochondrial translation, translational elongation and termination. The top clustered cellular components (CC) by GO clustering were mitochondrial matrix, mitochondrial inner membrane, mitochondrial protein complex, contractile fiber and the top clustered molecular functions (MF) were actin binding, coenzyme binding, electron transfer activity, oxidoreductase activity acting on NAD(P)H. KEGG clustering indicated that besides specific diseases, the DEGs were top clustered in pathways like p53 signaling pathway, thermogenesis, oxidative phosphorylation, focal adhesion, cell cycle and etc. (Fig. 5C, additional file 3). GSEA showed that calcium signaling pathway, JAK-STAT signaling pathway, NF-kappa B signaling pathway, PPAR signaling pathway were top enriched pathways (additional file 4). Besides, we found that Wnt signaling pathway, cell cycle, oxidative phosphorylation and citrate cycle (TCA cycle) were up regulated and calcium signaling pathway, JAK-STAT signaling pathway, NF-kappa B signaling pathway, PPAR signaling pathway and focal adhesion were down regulated in osteosarcoma patients (Fig. 5D).
3.7 Identification and functional analysis of the CTNNBIP1 co-expressed genes.
2740 CTNNBIP1 co-expressed genes in osteosarcoma were identified with P value < 0.05 by Pearson correlation analysis. GO biological processes clustering of these genes showed that they were top clustered in regulation of cell morphogenesis, regulation of actin filament-based process, cell-substrate adhesion, cell junction organization. While focal adhesion, cell-substrate adherens junction, cell-substrate junction were top 3 clustered CC and actin binding, cell adhesion molecule binding, small GTPase binding, Ras GTPase binding were top clustered MF (Fig. 6A). As to KEGG clustering, CTNNBIP1 co-expressed genes were mainly clustered in focal adhesion, adherens junction, regulation of actin cytoskeleton, endocytosis, proteoglycans in cancer and etc. (Fig. 6B). PPI network of the proteins encoded by CTNNBIP1 and the top 100 CTNNBIP1 co-expressed genes ranked by P value indicated that CLSTN1 might interact with CTNNBIP1 (Fig. 6C) and FLNA and MMP11 were hubgenes that might be directly or indirectly associated with CTNNBIP1. Proteins which might interact with CTNNBIP1 were shown on the right side of Fig. 6C and they were mainly involved in the Wnt signaling pathway.
3.8 CTNNBIP1 expression in osteosarcoma tissues by immunohistochemistry.
CTNNBIP1 was stained brown in immunohistochemistry. In patient 1, the cytoplasm of tumor cells was stained brown, whereas nontumor cells, such as vascular endothelial cells, were not stained (Fig. 7A-B). Meanwhile, in the paracancer tissues, cells were not stained brown as well (Fig. 7C-D). The same result was observed in patient 2, the difference was that the paracancer tissue in patient 2 were normal bone tissue (Fig. 7E-H). These results strongly support the overexpression of CTNNBIP1 in osteosarcoma cells.