Demographic baseline and clinical characteristics
According to the inclusion and exclusion criteria, a total of 2195 osteosarcoma patients were identified from the SEER database (Fig 1). The optimal tumor size cutoff was determined by ROC analysis using the clinical data of all osteosarcoma patients, with the tumor size that gave the largest Youden index taken as the optimal cutoff. The AUC of this analysis was 0.650 (95% CI 0.630–0.670). The resulting cutoff of 9.5 cm had a diagnostic sensitivity of 60.24% (95% CI 55.4–65.0), a specificity of 62.31% (95% CI 60.0–64.6) and a Youden index of 0.2255 (Fig 2).
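The cutoff selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `youden_cutoff` and the toy data are hypothetical, and the rule "test-positive when tumor size is at or above the cutoff" is an assumption. The last lines check that the reported Youden index agrees with the reported sensitivity and specificity.

```python
def youden_cutoff(sizes, has_metastasis):
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's J.

    Assumption: a patient is test-positive when tumor size >= cutoff.
    """
    positives = sum(has_metastasis)
    negatives = len(has_metastasis) - positives
    best = None
    for cutoff in sorted(set(sizes)):
        tp = sum(1 for s, m in zip(sizes, has_metastasis) if m and s >= cutoff)
        tn = sum(1 for s, m in zip(sizes, has_metastasis) if not m and s < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1  # Youden's J statistic
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Hypothetical toy data (tumor size in cm, metastasis flag); NOT SEER data.
toy_sizes = [4.0, 6.5, 8.0, 9.5, 11.0, 12.5, 7.0, 10.0]
toy_mets = [0, 0, 0, 1, 1, 1, 0, 0]
print(youden_cutoff(toy_sizes, toy_mets))

# Consistency check on the values reported in the text.
reported_j = 0.6024 + 0.6231 - 1
print(round(reported_j, 4))
```

The exhaustive scan over observed sizes is adequate for a sketch; a dedicated ROC routine would be preferable on a cohort of this size.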
Table 1 shows the demographic characteristics of all osteosarcoma patients. Of the 2195 patients, 1682 (76.6%) were unmarried at diagnosis and the remaining 23.4% were married. Tumors were located in the axial bones in 19.2% of patients and in the bones of the extremities in the remaining 80.8%. Distant metastasis was present in 420 patients (19.1%) and absent in the other 1775 (80.9%). All patients were randomly assigned either to the training subgroup (n=1464), which was used to establish the nomogram and validate it internally, or to the validation subgroup (n=731), which was used to validate it externally. In the training subgroup, 659 patients (45.0%) were female and 805 (55.0%) were male. Chi-square tests showed no significant differences between the training and validation subgroups in patient sex, age, marital status, race, histologic subtype, tumor size, tumor site, tumor grade or use of surgery (Table 1).
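The subgroup comparison can be illustrated with a Pearson chi-square test on a 2x2 table. This is a sketch under stated assumptions: the helper `chi_square_2x2` is hypothetical, no continuity correction is applied, and the validation-subgroup sex counts are invented for illustration because the text does not report them. Only the counts 659 and 805 come from the text.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability is erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: subgroup; columns: (female, male).
# The second row is HYPOTHETICAL; it only has to sum to n=731.
sex_by_subgroup = [
    [659, 805],  # counts stated in the text (sum 1464)
    [330, 401],  # hypothetical validation-subgroup counts
]
stat, p_value = chi_square_2x2(sex_by_subgroup)
print(f"chi-square = {stat:.4f}, p = {p_value:.3f}")
```

A p-value above 0.05 for each characteristic is what "no significant difference" between the subgroups refers to.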
Feature selection
In the LASSO regression model, all 9 candidate features retained nonzero coefficients and were therefore selected as potential predictors (Fig 3A and 3B). These clinical features were patient sex, age, marital status, race, tumor site, tumor grade, tumor size, histologic subtype and use of surgery.
Predictive factors for the risk of metastasis
In the training subgroup, we used univariate logistic regression to analyze patient sex, age, marital status, race, tumor site, tumor grade, tumor size, histologic subtype and use of surgery. Sex, age, use of surgery, tumor size and tumor grade were associated with the risk of metastasis in osteosarcoma patients in the univariate analysis (p<0.05), whereas tumor site, histologic subtype, marital status and race were not (p>0.05) and were excluded (Table 2). We then performed multivariate logistic regression to control for confounding variables. Sex, age, tumor grade, tumor size and use of surgery were identified as independent predictive factors for the risk of metastasis in osteosarcoma patients.
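For a single binary predictor, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2x2 table, and the Wald standard error of its log is the square root of the summed reciprocal cell counts. The sketch below uses that shortcut; the function name `odds_ratio_with_ci` and all counts are hypothetical and are not taken from Table 2. It does not reproduce the multivariate step, which requires a full model fit.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: factor present, metastasis     b: factor present, no metastasis
    c: factor absent, metastasis      d: factor absent, no metastasis
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# HYPOTHETICAL counts, e.g. tumor size above vs below the cutoff.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A confidence interval that excludes 1 corresponds to p<0.05 for that predictor in the univariate analysis.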
Construction and validation of nomogram for metastasis risk in osteosarcoma patients
The independent predictive factors (sex, age, tumor size, tumor grade and use of surgery) were used to construct a nomogram predicting the risk of metastasis in osteosarcoma patients (Fig 4). The nomogram assigns each predictive factor a score on the point scale (Table 3). For an individual patient, the nomogram is used by adding up the points of each predictive factor and reading off the predicted probability that corresponds to the total points. The nomogram was validated both internally and externally. The calibration curve showed good agreement between the actual and predicted probabilities (Fig 5A). Fig 6 displays the predictive accuracy of the nomogram: the AUC of the model was 0.743, and the AUC in the external validation subgroup was 0.680. These AUC results indicate that the nomogram is reasonably accurate.
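The AUC reported above can be read as the probability that a randomly chosen patient with metastasis receives a higher predicted risk than a randomly chosen patient without. A minimal sketch of that pairwise definition follows; the function name `auc` and the toy predictions are hypothetical, and the quadratic loop is chosen for clarity, not speed.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# HYPOTHETICAL predicted probabilities and metastasis labels.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect discrimination, so values of 0.743 and 0.680 indicate moderate discrimination.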
Clinical use
The decision curve analysis (DCA) for the nomogram is presented in Fig 5B. The decision curve was used to assess the clinical applicability of the metastasis prediction model across threshold probabilities. The analysis showed that when the threshold probability of metastasis is below 3% or above 76%, decisions based on the nomogram offer no benefit. In other words, the nomogram provides additional net benefit over the treat-all and treat-none schemes for threshold probabilities between 3% and 76%. This result suggests that the model is clinically useful across a wide range of threshold probabilities.
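The quantity plotted in a decision curve is the net benefit at each threshold probability, computed with the standard formula: true positives per patient minus false positives per patient weighted by the odds of the threshold. The sketch below is illustrative only; the function names and toy data are hypothetical. The treat-none scheme has a net benefit of zero by definition.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and not y)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# HYPOTHETICAL predicted risks and metastasis labels.
toy_probs = [0.9, 0.1, 0.2, 0.3, 0.05]
toy_labels = [1, 0, 0, 0, 0]
for pt in (0.1, 0.25, 0.5):
    print(pt, net_benefit(toy_probs, toy_labels, pt),
          net_benefit_treat_all(toy_labels, pt))
```

The model is useful at a given threshold only when its net benefit exceeds both the treat-all value and zero, which is the comparison behind the 3% to 76% range reported above.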