Characteristics of the study subjects
Based on the SEER database, we plotted the age-adjusted incidence of GBM from 2000 to 2017 (Supplementary Fig. 1). The incidence rate remained steadily high over this period in the total population and in males and females separately. Overall, the incidence rate of GBM was significantly higher in males than in females.
Our study included 4310 patients with GBM diagnosed from 2004 to 2015; the demographic characteristics of all subjects are summarized in Table 1. The mean age at diagnosis was 59.8 years (standard deviation: 12.2), and the median age was 75 years (quartiles: 52, 68). After converting the continuous variable “age at diagnosis” into a categorical variable, the 40–65 years group was the largest, accounting for 58% of cases. Males outnumbered females (59.5% male). Most patients were white (91.2%); the proportions of black patients (5.1%) and patients of other races (3.7%) were relatively low. Details of the other prognostic factors can be found in Table 1.
Table 1. Characteristics and multivariable Cox regression analysis of the independent prognostic factors for disease-specific survival among patients with glioblastoma diagnosed in the SEER-18 database, 2004–2015.
a Asian or Pacific Islander.
CI, confidence interval; SD, standard deviation; IQR, interquartile range. Statistically significant p values are bolded. Red bars indicate risk factors, green bars indicate protective factors, and blue bars indicate no statistical significance.
Survival analysis
By the end of follow-up, 3490 of the included patients had died of GBM. Supplementary Table 1 lists the complete 1-, 3-, and 5-year DSS rates. DSS was strongly age-dependent: elderly patients (> 65 years) had a poorer prognosis than younger patients (< 40 years, 40–65 years). Likewise, males and white patients were more likely to have poor DSS. Married patients had better short-term survival than unmarried patients, but the opposite held for long-term survival. Patients with tumors smaller than 30 mm had significantly better 1-year DSS (64.1%) than those with tumors of 30–50 mm (55.9%) or > 50 mm (52.4%). Among primary sites, tumors located in the occipital lobe had the highest 3-year DSS, and tumors located in the frontal lobe had the highest 5-year DSS. Left-sided laterality and diagnosis after 2010 were both associated with better DSS.
In terms of treatment, patients who underwent surgery had a clear DSS advantage over those who did not, and patients who received radiotherapy alone had a worse prognosis than those who received combined radiotherapy and chemotherapy. The sequence of radiotherapy and surgery did not show a significant effect on DSS. These results are shown in Fig. 1 and Supplementary Figs. 2–3.
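As an illustration of how 1-, 3-, and 5-year DSS rates like those above can be read off a survival curve, the following is a minimal Kaplan–Meier sketch. The source does not state which estimator or software produced Supplementary Table 1, so the choice of Kaplan–Meier is an assumption, and the toy cohort (`times`, `events`) is hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per patient (here: months)
    events : 1 if the patient died of the disease, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk   # conditional survival past time t
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

def survival_at(curve, horizon):
    """Step-function lookup: the last estimate at or before the horizon."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s

# Hypothetical toy cohort, not SEER data.
times = [3, 8, 12, 12, 20, 30, 36, 40, 61, 70]
events = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
for months in (12, 36, 60):
    print(f"{months // 12}-year DSS: {survival_at(curve, months):.3f}")
```

Censored patients (for example, those who died of other causes or were alive at last contact) reduce the number at risk without lowering the survival estimate, which is what distinguishes this from a simple proportion of survivors.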
Variable selection
In Fig. 2(A), the ordinate represents the coefficient value and the upper abscissa represents the number of non-zero coefficients in the model. The figure shows that, as lambda decreases, fewer coefficients are shrunk to zero and the absolute values of the coefficients increase. Figure 2(B) shows the cross-validation used to select variables before fitting the model. In the default ten-fold cross-validation, the data are divided into ten parts; nine parts are used to build the model and the remaining part is used to validate it, with each part serving as the validation set in turn. Repeatedly building and validating the model in this way yields more stable results. Overall, Figs. 2(A) and (B) show how the model changes as lambda changes.
Finally, we selected the best lambda value (lambda.min) and output the corresponding coefficients of the variables (Supplementary Table 2). A non-zero coefficient indicated that the variable was informative for predicting DSS. The Lasso regression retained age at diagnosis, sex, marital status, race, tumor size, primary site, laterality, surgery, radiotherapy and chemotherapy, radiotherapy sequence with surgery, and year of diagnosis (all coefficients non-zero).
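The selection procedure described above (a coefficient path over lambda, ten-fold cross-validation, then reading off the non-zero coefficients at lambda.min) can be sketched as follows. To stay self-contained, this sketch swaps in a plain least-squares Lasso fitted by coordinate descent; it is not the penalized Cox model used in the study. The data, the lambda grid, and all function names are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                      # y - X b, starting from b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def mse(X, y, beta):
    preds = (sum(b * x for b, x in zip(beta, row)) for row in X)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def cv_lambda_min(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest mean k-fold validation error."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]    # k roughly equal parts
    mean_err = []
    for lam in lambdas:
        errs = []
        for held_out in folds:               # each part validates once
            held = set(held_out)
            train = [i for i in idx if i not in held]
            beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            errs.append(mse([X[i] for i in held_out], [y[i] for i in held_out], beta))
        mean_err.append(sum(errs) / k)
    best = min(range(len(lambdas)), key=mean_err.__getitem__)
    return lambdas[best], mean_err

# Synthetic data: 6 hypothetical predictors, only 3 of which carry signal.
rng = random.Random(1)
true_beta = [2.0, -1.5, 0.0, 0.0, 1.0, 0.0]
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(60)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.5) for row in X]

# Smallest lambda at which every coefficient is shrunk to zero.
lam_max = max(abs(sum(row[j] * yi for row, yi in zip(X, y))) / len(y) for j in range(6))
lambdas = [lam_max * f for f in (1.0, 0.5, 0.2, 0.1, 0.05, 0.01)]

for lam in lambdas:   # coefficients enter the model as lambda decreases
    nonzero = sum(b != 0.0 for b in lasso_fit(X, y, lam))
    print(f"lambda={lam:.3f}  non-zero coefficients={nonzero}")

lam_min, _ = cv_lambda_min(X, y, lambdas)
beta_min = lasso_fit(X, y, lam_min)
selected = [j for j, b in enumerate(beta_min) if b != 0.0]
print("lambda.min =", round(lam_min, 3), "selected predictors:", selected)
```

For survival data the squared-error loss would be replaced by the Cox partial likelihood, but the roles of the penalty, the folds, and lambda.min stay the same.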
Multivariate Cox regression model
The Cox regression model is a semi-parametric model that uses survival outcome and survival time as dependent variables to analyze the impact of multiple prognostic factors on survival. Based on the Lasso regression results, the factors that may affect the prognosis of GBM were entered into the Cox regression model for multivariate analysis (Table 1). Age at diagnosis, sex, marital status, race, tumor size, surgery, radiotherapy and chemotherapy, and year of diagnosis were all significantly associated with DSS (p < 0.05). Taking the < 40 years group as the reference, the 40–65 years group (HR = 2.16, 95% CI = 1.84–2.54, p < 0.001) and the ≥ 65 years group (HR = 3.61, 95% CI = 3.05–4.27, p < 0.001) showed significantly increased risk. Compared with white patients, patients of other races (Asian or Pacific Islander) had a 20% lower risk. Larger tumors carried a higher risk of death from GBM (30–50 mm: HR = 1.17, 95% CI = 1.06–1.31, p = 0.003; > 50 mm: HR = 1.27, 95% CI = 1.15–1.41, p < 0.001). Patients who did not undergo surgery had a higher risk than those who did (HR = 1.77, 95% CI = 1.00–3.12, p = 0.050). Male sex, unmarried status, and radiotherapy alone were also risk factors.
In addition, tumor location in the temporal lobe and a later year of diagnosis showed protective effects on DSS. No other prognostic factors were significantly associated with DSS (p > 0.05).
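For readers less familiar with how the hazard ratios above relate to the fitted model, the sketch below shows the standard conversions: HR = exp(β), a 95% CI of exp(β ± 1.96·SE), and the reading of an HR below 1 as a percentage reduction in hazard. The coefficients and standard errors are not reported in this section, so they are backed out from the published interval under the assumption that it is a Wald interval symmetric on the log scale; the function names are hypothetical.

```python
import math

def hazard_ratio(beta, se, z=1.96):
    """HR and 95% CI from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_ci(lower, upper, z=1.96):
    """Back out (beta, se) from a reported CI, assuming symmetry on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

def percent_change(hr):
    """Relative change in hazard versus the reference group, in percent."""
    return (hr - 1.0) * 100.0

# Reported for 40-65 years vs. < 40 years: HR = 2.16, 95% CI 1.84-2.54.
beta, se = coef_from_ci(1.84, 2.54)
hr, lo, hi = hazard_ratio(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}, HR={hr:.2f} ({lo:.2f}-{hi:.2f})")

# A 20% lower risk corresponds to HR = 0.80.
print(f"HR 0.80 -> {percent_change(0.80):.0f}% change in hazard")
```

The round trip reproducing the reported HR of 2.16 from its interval is a quick consistency check that can be applied to any row of a Cox regression table.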
Restricted cubic spline Cox regression
In the Cox regression model, the continuous variables age at diagnosis and tumor size were converted into categorical variables before analysis, which made it impossible to observe how the risk varied continuously. Categorizing continuous variables may also introduce bias, because the association between changes in a continuous variable and DSS is obscured. We therefore used restricted cubic spline Cox regression to examine the effect of continuous changes in age and tumor size on DSS. Taking the minimum age of the included population (19 years) as the reference, the hazard ratio increased gradually with age and rose faster in the elderly population (> 75 years), suggesting that advanced age was a significant risk factor for DSS (Fig. 3(A)). With a tumor size of 30 mm as the reference, the hazard ratio increased rapidly up to 75 mm and remained fairly stable thereafter (Fig. 3(B)).
The non-linear associations between the continuous variables (age, tumor size) and DSS were statistically significant (p < 0.0001), indicating that the use of restricted cubic spline regression was warranted. Similar trends for age and tumor size were maintained when patients were stratified by other variables (Supplementary Figs. 4–7).
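To make the spline construction concrete, the sketch below builds a restricted cubic spline basis in the common truncated-power form and evaluates a log hazard ratio relative to a reference value, as in the curves described above. The number and position of the knots and the coefficients are hypothetical; the source does not report them, and only the 19-year reference age is taken from the text.

```python
import math

def _pos_cube(u):
    """Truncated cubic: u^3 for u > 0, otherwise 0."""
    return u ** 3 if u > 0 else 0.0

def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. The non-linear terms
    are zero below the first knot and linear above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")
    scale = (t[-1] - t[0]) ** 2      # keeps the terms on the scale of x
    span = t[-1] - t[-2]
    terms = [float(x)]
    for j in range(k - 2):
        s = (_pos_cube(x - t[j])
             - _pos_cube(x - t[-2]) * (t[-1] - t[j]) / span
             + _pos_cube(x - t[-1]) * (t[-2] - t[j]) / span)
        terms.append(s / scale)
    return terms

def log_hazard_ratio(x, ref, knots, coefs):
    """log HR of x versus ref: sum_j coef_j * (B_j(x) - B_j(ref))."""
    bx, br = rcs_basis(x, knots), rcs_basis(ref, knots)
    return sum(c * (a - b) for c, a, b in zip(coefs, bx, br))

# Hypothetical knots (years of age) and hypothetical fitted coefficients.
age_knots = [40, 55, 65, 80]
age_coefs = [0.02, 0.01, 0.03]

for age in (19, 40, 60, 80):
    hr = math.exp(log_hazard_ratio(age, 19, age_knots, age_coefs))
    print(f"age {age}: HR vs. age 19 = {hr:.2f}")
```

Constraining the tails to be linear is what makes the spline "restricted"; it avoids the erratic behavior an unrestricted cubic can show where data are sparse, for example at the extremes of age or tumor size.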
Nomogram
The nomogram was constructed from the multivariable Cox regression model by assigning scores to the independent prognostic factors, and it could provide guidance for clinical prediction (Fig. 4 and Supplementary Table 3). New samples generated by bootstrap resampling were used to evaluate the accuracy of the nomogram. The horizontal axis of the calibration plot is the predicted survival rate and the vertical axis is the actual survival rate. In theory, the ideal curve is a straight line through the origin with a slope of 1; the closer the calibration curve is to this line, the better the predictive ability of the nomogram. Supplementary Fig. 8 shows that the 1-, 3-, and 5-year probabilities predicted by the nomogram built on the Lasso-selected variables agreed closely with the observed probabilities. We also selected variables using traditional univariate Cox regression analysis (Supplementary Table 4) and constructed a second nomogram (Supplementary Fig. 9), whose predicted probabilities were poorly calibrated (Supplementary Fig. 10). Therefore, the choice of disease prognostic factors should not be limited to traditional analysis methods.
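The scoring step behind a nomogram can be sketched as a rescaling of log hazard ratios onto a 0–100 point axis. The example below uses only the age and tumor-size hazard ratios quoted in the multivariate analysis, so its point values are illustrative and need not match Fig. 4 or Supplementary Table 3, which depend on every factor in the full model. The scaling convention (the factor with the widest effect spans 100 points) is a common one but is an assumption here, and `nomogram_points` is a hypothetical name.

```python
import math

def nomogram_points(log_hr_by_factor):
    """Scale each level's log hazard ratio to a 0-100 point axis.

    log_hr_by_factor: {factor: {level: log HR vs. the reference level}}
    The factor with the widest range of log HR spans the full 100 points;
    within each factor, the lowest-risk level gets 0 points.
    """
    ranges = {f: max(lv.values()) - min(lv.values())
              for f, lv in log_hr_by_factor.items()}
    widest = max(ranges.values())
    return {
        f: {level: 100.0 * (b - min(lv.values())) / widest for level, b in lv.items()}
        for f, lv in log_hr_by_factor.items()
    }

# Hazard ratios as quoted in the multivariate analysis; reference levels have HR = 1.
log_hr = {
    "age": {"<40": 0.0, "40-65": math.log(2.16), ">=65": math.log(3.61)},
    "tumor_size": {"<30mm": 0.0, "30-50mm": math.log(1.17), ">50mm": math.log(1.27)},
}

points = nomogram_points(log_hr)
for factor, levels in points.items():
    for level, pts in levels.items():
        print(f"{factor:10s} {level:8s} {pts:6.1f}")

# A patient's total score is the sum of the points for their own levels.
total = points["age"][">=65"] + points["tumor_size"][">50mm"]
print("total points (age >=65, tumor >50mm):", round(total, 1))
```

In a full nomogram the total score is then mapped to predicted 1-, 3-, and 5-year survival through the baseline survival function, and it is those predicted probabilities that the calibration plot compares with the observed ones.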