Data source
We used data from the publicly available SEER database (http://www.seer.cancer.gov), which covers approximately 28% of the US population. SEER provides data on patient demographics, primary tumor site, tumor stage, surgical treatment, and patient survival, among other variables. Relevant data were retrieved using SEER*Stat software (version 8.3.5).
Study population
To be included in this study, patients must have presented with metastatic RCC at diagnosis. The RCC must have been identified by the morphology codes 8050/3, 8260/3, 8310/3, 8317/3, 8318/3, or 8319/3 of the International Classification of Diseases for Oncology (3rd edition). In addition, the RCC must have been the first and only primary malignancy. Diagnoses had to be confirmed by histological examination, and complete follow-up data had to be available. Patients younger than 18 years at diagnosis and those with missing data on follow-up, race, marital status, Fuhrman grade, tumor size, tumor stage, lymph node status, metastasis, or surgery were excluded. Cases identified only by autopsy or death certificate were also excluded. In the end, 2315 patients with metastatic RCC fulfilled the inclusion criteria and were included in the final analyses. The flow diagram for patient selection is presented in Figure 1.
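The selection criteria above can be expressed as a single filter over patient records. The following is a minimal sketch, not the procedure actually run in SEER*Stat: the field names (for example `first_and_only_primary` and `reporting_source`) are hypothetical and do not correspond to real SEER column labels; only the morphology codes are taken from the text.

```python
# Morphology codes listed in the text (behavior code /3 = malignant).
RCC_HISTOLOGY = {8050, 8260, 8310, 8317, 8318, 8319}

# Hypothetical field names; a real SEER*Stat export labels its columns differently.
REQUIRED_FIELDS = (
    "follow_up_months", "race", "marital_status", "fuhrman_grade",
    "tumor_size", "t_stage", "n_stage", "metastasis", "surgery",
)


def is_eligible(patient):
    """Return True if a patient record (a dict) passes every criterion."""
    if patient.get("histology") not in RCC_HISTOLOGY or patient.get("behavior") != 3:
        return False
    if not patient.get("metastatic_at_diagnosis"):
        return False
    if not patient.get("first_and_only_primary"):
        return False
    if not patient.get("histologically_confirmed"):
        return False
    # Cases known only from autopsy or death certificate are excluded.
    if patient.get("reporting_source") in {"autopsy", "death certificate"}:
        return False
    age = patient.get("age")
    if age is None or age < 18:
        return False
    # Any missing value among the required variables excludes the patient.
    return all(patient.get(field) is not None for field in REQUIRED_FIELDS)
```

Applying `is_eligible` to every record and keeping those that return `True` would yield the analysis cohort under these assumed field names.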
Measurable variables
The following demographic and clinical variables were captured: age at diagnosis, race (black, white, other), sex, marital status (married, unmarried), histologic subtype (clear cell renal cell carcinoma (CCRCC), papillary renal cell carcinoma (PRCC), chromophobe renal cell carcinoma (CHRCC), sarcomatoid renal cell carcinoma (SRCC), collecting duct renal cell carcinoma (CDRCC)), Fuhrman grade (grade Ⅰ, grade Ⅱ, grade Ⅲ, grade Ⅳ), tumor classification (T1, T2, T3, T4, TX), lymph node status (N0, N1, NX), sarcomatoid feature (yes, no, unknown), cancer-directed surgery (recommended and performed, recommended but not performed, not recommended), bone, brain, liver, and lung metastasis status, survival time, and vital status. Tumor stage was evaluated according to the AJCC Cancer Staging Manual (7th edition, 2010).
Ascertainment of the outcome
OS was the primary outcome of this study, defined as the time from diagnosis of metastatic RCC to death from any cause. OS was ascertained from the “vital status” code in the SEER database.
Statistical analysis
Selected patients were randomly divided into training and validation sets of equal size. Descriptive statistics were used to summarize the baseline characteristics of the patients in both sets. Normally distributed continuous variables were expressed as mean ± standard deviation, whereas non-normally distributed continuous variables were presented as median (interquartile range). Categorical variables were summarized as frequencies and percentages. Univariable and multivariable Cox regression analyses were performed on the training set to obtain crude and adjusted hazard ratios (HRs) and to identify prognostic factors significantly associated with OS. Prognostic factors were selected by a backward stepwise process using the Bayesian information criterion. Schoenfeld residuals were used to assess the proportional hazards assumption of the Cox regression models.
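The random division into two sets of equal size can be illustrated as follows. This is a sketch under assumptions: the seed and the function name `split_in_half` are hypothetical, the patients are represented only by index, and the exact procedure used in R is not described in the text.

```python
import random


def split_in_half(patient_ids, seed=2024):
    """Shuffle the identifiers reproducibly and cut the list in the middle."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for reproducibility
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd cohort size the two sets differ by one patient.
    return shuffled[:half], shuffled[half:]


training_ids, validation_ids = split_in_half(range(2315))
```

Because 2315 is odd, one of the two sets necessarily contains one more patient than the other. Model fitting and variable selection would then use only `training_ids`, with `validation_ids` held back for evaluation.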
Nomograms are pictorial representations that quantify the risk and probability of clinical events by assigning scores to the contributing factors. They have been demonstrated to generate more precise predictions than the conventional AJCC staging system in several types of cancer [10, 11]. In this study, a nomogram for predicting 1-, 3-, and 5-year OS was derived from the results of the multivariable Cox regression analysis.
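To make the scoring idea concrete, the sketch below shows one common way of turning Cox regression coefficients into nomogram points: each level's contribution is rescaled so that the predictor with the widest range spans 0 to 100 points. The coefficients in `COEFS` are invented purely for illustration and are not results of this study; the function names are likewise hypothetical.

```python
import math

# Hypothetical log hazard ratios, invented for illustration only.
COEFS = {
    "fuhrman_grade": {"I": 0.0, "II": 0.2, "III": 0.5, "IV": 1.0},
    "n_stage": {"N0": 0.0, "N1": 0.5},
}


def points_table(coefs):
    """Rescale coefficients so the widest-ranging predictor spans 0 to 100."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return {
        var: {lvl: 100 * (beta - min(lv.values())) / widest for lvl, beta in lv.items()}
        for var, lv in coefs.items()
    }


def total_points(patient, table):
    """Sum the points of the patient's level on every predictor."""
    return sum(table[var][patient[var]] for var in table)


def survival_probability(baseline_survival, linear_predictor):
    """Cox model relation: S(t | x) = S0(t) ** exp(linear predictor)."""
    return baseline_survival ** math.exp(linear_predictor)
```

In a printed nomogram, the total points axis is aligned with the 1-, 3-, and 5-year survival axes through the relation in `survival_probability`, where the baseline survival at each horizon is estimated from the fitted model.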
Discrimination and calibration, both important properties in evaluating model performance [12, 13], were assessed in our study. The C-index was used to evaluate the discriminative ability of the nomogram; it represents the probability that the predicted risk is higher for a randomly chosen patient who experiences an event than for a randomly chosen patient who does not. Across all possible pairs of patients, the C-index is 0.5 if the model cannot discriminate between patients with and without events, and 1 if the predicted probability is always higher for patients with events than for those without [14]. The robustness of the model was assessed using the original and optimism-corrected C-indices. Calibration plots, which visualize the relationship between predicted and observed risk, were generated using the bootstrap resampling method [14]. A calibration curve that falls on the 45-degree diagonal line indicates excellent absolute risk estimates. NRI and IDI are commonly used to quantify the improvement in risk prediction of a new model over an old one [15]. NRI is based on reclassification tables constructed separately for patients with and without events and quantifies correct reclassification across risk categories. It is calculated by adding the percentage of patients with events who are correctly reclassified to the percentage of patients without events who are correctly reclassified [14]. IDI reflects the improvement in the sensitivity and specificity of a model and can also be viewed as an integrated difference in Youden’s indices [15]. It is calculated by adding the increase in predicted probability of the new model over the old model for patients with events to the corresponding decrease for patients without events [15]. Therefore, NRI and IDI were both employed to compare the discriminative ability of the new model with that of the AJCC staging system.
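The three measures described above can be sketched for the simplified case of a binary event indicator at a fixed time horizon. This is an illustration under assumptions, not the analysis performed in R: it ignores censoring (survival implementations restrict the C-index to comparable pairs), the NRI is computed in its net form (upward minus downward moves for events, downward minus upward for non-events), and the function names and risk cutoffs are hypothetical.

```python
from bisect import bisect_right
from itertools import product


def c_index(risk, event):
    """Share of (event, non-event) pairs where the event patient has higher risk."""
    cases = [r for r, e in zip(risk, event) if e == 1]
    controls = [r for r, e in zip(risk, event) if e == 0]
    score = 0.0
    for case_risk, control_risk in product(cases, controls):
        if case_risk > control_risk:
            score += 1.0
        elif case_risk == control_risk:
            score += 0.5  # tied predictions count as half concordant
    return score / (len(cases) * len(controls))


def nri(p_new, p_old, event, cutoffs):
    """Categorical net reclassification improvement for sorted risk cutoffs."""
    def net_upward(flag):
        moves = [bisect_right(cutoffs, n) - bisect_right(cutoffs, o)
                 for n, o, e in zip(p_new, p_old, event) if e == flag]
        up = sum(1 for m in moves if m > 0)
        down = sum(1 for m in moves if m < 0)
        return (up - down) / len(moves)
    # Moving up is correct for events; moving down is correct for non-events.
    return net_upward(1) - net_upward(0)


def idi(p_new, p_old, event):
    """Mean rise in predicted risk for events plus mean fall for non-events."""
    def mean_change(flag):
        diffs = [n - o for n, o, e in zip(p_new, p_old, event) if e == flag]
        return sum(diffs) / len(diffs)
    return mean_change(1) - mean_change(0)
```

For example, with `event = [1, 1, 1, 0, 0, 0]`, old predictions `[0.6, 0.4, 0.5, 0.5, 0.3, 0.2]`, and new predictions `[0.8, 0.6, 0.5, 0.4, 0.3, 0.1]`, the new predictions rank every event patient above every non-event patient, whereas the old ones do not, and both NRI (with a single cutoff at 0.5) and IDI are positive.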
Notably, although NRI and IDI have become increasingly popular, they should be interpreted with caution [16]. IPA is a promising metric that combines discrimination and calibration in a single value and improves interpretability by adjusting for a benchmark model [17, 18]. IPA was therefore also reported in this study to reflect the performance of the model. DCA is a method for evaluating the benefits of a diagnostic test across a range of patient preferences for accepting the risk of undertreatment and overtreatment, which facilitates decisions about test selection and use [19]. Unlike sensitivity, specificity, and the area under the curve, DCA directly assesses the utility of clinical risk prediction models for decision making [20]. Herein, DCA was plotted to evaluate the clinical value of the nomogram by quantifying its net benefit in comparison with the AJCC staging system.
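Both quantities can be illustrated for a binary outcome without censoring. The sketch below is an assumption-laden simplification rather than the computation used in the study: net benefit is taken as true positives per patient minus false positives per patient weighted by the odds of the threshold probability, and IPA is taken as one minus the ratio of the model's Brier score to that of a null model predicting the overall event rate. With censored survival data, both require additional weighting that is omitted here; the function names are hypothetical.

```python
def net_benefit(p, event, threshold):
    """Net benefit of treating patients whose predicted risk reaches the threshold."""
    n = len(event)
    true_pos = sum(1 for pi, e in zip(p, event) if pi >= threshold and e == 1)
    false_pos = sum(1 for pi, e in zip(p, event) if pi >= threshold and e == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def brier_score(p, event):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - e) ** 2 for pi, e in zip(p, event)) / len(event)


def ipa(p, event):
    """Index of prediction accuracy relative to a null model (event rate only)."""
    prevalence = sum(event) / len(event)
    null_score = brier_score([prevalence] * len(event), event)
    return 1 - brier_score(p, event) / null_score
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds for each model and for the "treat all" strategy (every predicted risk set to 1); the "treat none" strategy has a net benefit of zero at every threshold. An IPA of 0 means the model performs no better than the null model, and negative values indicate worse performance.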
Statistical analyses were performed using R software (version 3.5.2, http://www.r-project.org/). All tests were two-sided, with statistical significance set at P < 0.05.