Development of a novel nomogram for the prediction of cause-specific survival for patients with non-metastatic Chromophobe Cell Renal Carcinoma following radical nephrectomy

Background: The applicability of the nomogram approach to the prognosis of patients with Chromophobe Cell Renal Carcinoma (CCRC) following radical nephrectomy has not yet been established or tested. This study aimed to establish a novel prognostic nomogram for estimating cause-specific survival (CSS) in patients with non-metastatic CCRC after radical nephrectomy, using data from a large, publicly available US cancer database, in order to provide a clinically useful tool for estimating a patient's survival probability. Methods: A total of 1040 patients with non-metastatic CCRC who had undergone radical nephrectomy were identified in the Surveillance, Epidemiology, and End Results (SEER) database (2004 through 2014). Variables associated with CSS time were selected using the Cox proportional-hazards model, and a novel nomogram was constructed from them. The nomogram was validated against a subset of the data (n = 520 patients) from 9 randomly selected SEER cancer registries by calculating Harrell's concordance index (C-index) and the calibration curve for the time-related probability of survival. Results: Multivariate analysis of the training dataset (n = 1040 patients) identified age at diagnosis, tumor size and tumor grade as independent factors associated with CSS, and these were included in the nomogram. The calibration curve for the time-related probability of survival showed good agreement between the nomogram predictions and actual observations. The C-index of the nomogram for predicting survival was 0.81 (95% CI 0.75-0.87), which was statistically significantly higher than that of the 6th/7th edition AJCC staging systems (0.75, 95% CI 0.67-0.83).
The nomogram prediction of survival in the validation dataset was also superior to that of the AJCC staging system.


Background
Chromophobe Cell Renal Carcinoma (CCRC) is a distinct subtype of renal cell carcinoma (RCC) derived from the epithelial cells of the collecting ducts. It is relatively uncommon, accounting for only about 5% of RCC cases [1]. The average age at presentation is 50 years, and the disease occurs equally in males and females [2]. Recent clinical research has demonstrated a more favorable prognosis after surgery for CCRC than for other renal cancer subtypes [3]. CCRC tumors are generally low grade, but clinical evidence has shown that more than 50% of CCRC patients do not exhibit the typical symptoms of RCC, namely hematuria, back pain and an abdominal mass [4,5]. In addition, some 6-7% of patients develop distant metastasis, which may involve the lung (48.4% of cases), bone (23.2% of cases) or liver (12.9% of cases) [6]. These features complicate both the clinical diagnosis of CCRC and the accurate prediction of an individual patient's prognosis.

The staging system most commonly used to evaluate the prognosis of renal cancer is the American Joint Committee on Cancer (AJCC) classification system. However, the AJCC staging system does not take into account prognostic factors such as age, gender and ethnicity, which can significantly affect an individual's likelihood of surviving certain types of cancer. As a potential alternative to, or refinement of, the existing system, prognostic nomograms based on the Cox proportional hazards model have been developed [7]. A nomogram can predict individual survival probability more accurately because it incorporates each patient's own values for the parameters of interest [8]. Since their inception, nomograms have become useful tools for clinicians because they provide predictions tailored to the characteristics of individual patients [9].
However, the applicability of the nomogram approach to the prognosis of CCRC patients following radical nephrectomy has not yet been established or tested. In this study, we used data from a large, publicly available US cancer database to obtain a large sample of CCRC patients, and used this dataset to evaluate the independent prognostic risk factors for cause-specific survival (CSS) in patients with non-metastatic CCRC. The data were then used to construct a novel prognostic nomogram for CCRC following radical nephrectomy, in order to provide a clinically useful tool for estimating a patient's survival probability.
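To make the link between a Cox model and a nomogram concrete, the Python sketch below shows the two calculations a Cox-based nomogram encodes graphically: each covariate's contribution to the linear predictor is rescaled onto a points axis (the covariate with the widest range of contributions spans 0-100 points), and the linear predictor is converted into an individual survival probability through S(t | x) = S0(t)^exp(lp). The coefficients, covariate ranges and baseline survival are hypothetical placeholders, not estimates from this study, and the points scaling follows a common convention rather than any particular software package.

```python
import math
from typing import Dict, Tuple

# Hypothetical log hazard ratios, for illustration only (NOT estimates from this study).
COEFS: Dict[str, float] = {"age_decades": 0.30, "size_cm": 0.10, "high_grade": 0.80}
# Hypothetical observed range of each covariate, used to lay out the points axes.
RANGES: Dict[str, Tuple[float, float]] = {
    "age_decades": (2.0, 9.0),
    "size_cm": (1.0, 20.0),
    "high_grade": (0.0, 1.0),
}


def linear_predictor(patient: Dict[str, float]) -> float:
    """Cox linear predictor: sum of beta_j * x_j over the model covariates."""
    return sum(beta * patient[name] for name, beta in COEFS.items())


def nomogram_points(patient: Dict[str, float]) -> float:
    """Total points; the covariate with the widest contribution range spans 0-100."""
    spans = {n: abs(b) * (RANGES[n][1] - RANGES[n][0]) for n, b in COEFS.items()}
    scale = 100.0 / max(spans.values())
    total = 0.0
    for name, beta in COEFS.items():
        lo, hi = RANGES[name]
        floor = min(beta * lo, beta * hi)  # smallest contribution this covariate can make
        total += scale * (beta * patient[name] - floor)
    return total


def survival_probability(baseline_survival: float, lp: float) -> float:
    """S(t | x) = S0(t) ** exp(lp), where S0(t) is the (hypothetical) baseline
    survival at time t for a patient whose linear predictor equals 0."""
    return baseline_survival ** math.exp(lp)
```

Because the total points are an affine function of the linear predictor, the "total points" axis of a printed nomogram maps one-to-one onto its survival-probability axes.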

Patients
Data records of CCRC patients diagnosed between 2004 and 2014 were obtained from the November 2016 submission of the Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/). The SEER database incorporates data from 18 population-based cancer registries in the United States. Patients were eligible only if they had undergone radical nephrectomy. Furthermore, only patients with a pathologically confirmed unilateral Chromophobe Cell Renal Carcinoma (ICD-O-3 histology code 8317/3) and no distant metastasis were included. Patients were excluded if their AJCC T or N stage was unknown or if the CCRC tumor was accompanied by other malignant tumors. Patients who had been diagnosed with another type of malignant tumor before CCRC were also excluded, as were those for whom the sequence of cancer diagnoses was unknown. Patients for whom survival time (in months) or cause of death was not available were likewise excluded from the study cohort. Finally, to maintain the quality of the predictions made using the dataset, patients whose tumor grade was ambiguous at diagnosis were omitted. After these exclusion criteria had been applied, data on the following variables were extracted: sex, race, age at diagnosis, marital status at diagnosis, tumor size, tumor grade, primary site, AJCC stage, and surgery of the primary site.
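The inclusion and exclusion criteria above amount to a sequence of record-level filters. The Python sketch below expresses them as a single predicate; the `Record` class and all of its field names are hypothetical stand-ins, since actual SEER variable names and codings differ and are not given here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical field names; real SEER variables and codings differ.
    histology_code: str           # ICD-O-3 histology/behavior, e.g. "8317/3"
    unilateral: bool
    distant_metastasis: bool
    radical_nephrectomy: bool
    t_stage: Optional[str]        # None when the AJCC T stage is unknown
    n_stage: Optional[str]        # None when the AJCC N stage is unknown
    grade: Optional[int]          # None when the grade is ambiguous or unknown
    other_malignancy: bool        # concurrent or prior malignant tumor
    sequence_known: bool          # sequence of cancer diagnoses is known
    survival_months: Optional[int]
    cause_of_death_known: bool


def eligible(r: Record) -> bool:
    """True only if the record passes every inclusion and exclusion criterion."""
    return (
        r.histology_code == "8317/3"
        and r.unilateral
        and not r.distant_metastasis
        and r.radical_nephrectomy
        and r.t_stage is not None
        and r.n_stage is not None
        and not r.other_malignancy
        and r.sequence_known
        and r.survival_months is not None
        and r.cause_of_death_known
        and r.grade is not None
    )


def select_cohort(records: List[Record]) -> List[Record]:
    return [r for r in records if eligible(r)]
```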

Statistical analysis
The patient data extracted from the SEER database were assigned as the training dataset (n = 1040), and a subset of these data, comprising patients from 9 randomly selected cancer registries, was assigned as the validation dataset (n = 520). CSS, the outcome of the nomogram model, was defined as the duration (in months) from the date of diagnosis to the date of death from CCRC or the date of last follow-up. Patient age and tumor size were incorporated as predictive variables, using restricted cubic spline regression to translate the continuous data into categorical data [10,11]. The association of each variable with CSS was then assessed using the Cox proportional-hazards model. Backward stepwise selection with the Akaike information criterion (AIC) was used to identify the most predictive variables for inclusion in the multivariate Cox model [12,13], and the model with the lowest AIC value was selected for construction of the final nomogram predictive model. The discrimination and calibration of the resulting model were evaluated by calculating Harrell's C-index and the calibration curve for the time-related probability of survival [14]. Statistical analyses were performed using Microsoft R version 3.4.2 (http://www.r-project.org). A P value < 0.05 was considered statistically significant.
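Harrell's C-index, used above to measure discrimination, is the proportion of usable patient pairs in which the patient with the higher predicted risk dies first. A minimal Python sketch for right-censored data follows. It assumes that a higher risk score means shorter predicted survival, counts a pair as usable only when the patient with the shorter follow-up had the event (death from CCRC), gives half credit to tied risk scores, and, for simplicity, skips pairs with tied follow-up times; established implementations treat such ties in slightly different ways.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[bool], risk_scores: Sequence[float]
) -> float:
    """Harrell's C-index for right-censored survival data.

    times: follow-up time per patient (e.g. months)
    events: True if the patient died of the disease at that time, False if censored
    risk_scores: higher value = higher predicted risk (shorter expected survival)
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still under follow-up when i died
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk died first: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied prediction: half credit
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the C-index values of 0.81 and 0.75 reported for the nomogram and the AJCC system are compared.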

Patient characteristics
The training dataset comprised 1040 patients who fulfilled the inclusion criteria. The mean age was 58.15 ± 14.07 years. Approximately 57.4% of the patients were male (n = 597) and 42.6% were female (n = 443). The mean tumor size was 7.2 ± 4.29 cm. The majority of patients had moderate- or poor-grade tumors (55.1% and 31.6%, respectively). According to the AJCC staging system, the majority of patients had pT1 stage tumors (n = 558, 53.7%), and only 13 patients had pN1 stage tumors. The median follow-up was 115.8 months. A total of 116 patients (21.2%) died during follow-up. The 3-year, 5-year and 10-year survival probabilities were 0.996, 0.991 and 0.982, respectively.
The validation dataset comprised 520 non-metastatic CCRC patients who had been selected randomly from 9 cancer registries within the SEER database. The median patient age was 57 years (range 23-92), and more than 50% of the patients were male. The mean tumor size was 7.17 ± 4.24 cm. Patient characteristics for the two datasets are shown in Table 1.

Standard demographic and tumor characteristic variables that have been found to be associated with CCRC CSS were selected for analysis [15]. First, the clinical and pathological variables were transformed and examined to ensure that they satisfied the proportional-hazards and linearity assumptions of Cox regression before model construction. Continuous variables, such as patient age and tumor size, were translated into categorical variables using restricted cubic spline regression [16]. Both of these variables were found to have significant non-linear effects on the hazard ratio (HR) of CSS. Sensitivity analysis showed that the Wald χ² statistic was maximized with three knots for both tumor size (χ² = 7.2) and age (χ² = 24.5). The log relative mortality hazard was relatively homogeneous below approximately 50 years of age and above a tumor size of about 8 cm (Fig. 1).
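For readers unfamiliar with restricted cubic splines, the Python sketch below computes the spline basis in the unscaled form described by Harrell: the fitted curve is cubic between knots and constrained to be linear below the first knot and above the last, so k knots give one linear term plus k - 2 non-linear terms (with three knots, a single non-linear term, so the test for non-linearity has one degree of freedom). The helper that places three knots at the 10th, 50th and 90th percentiles follows a commonly recommended default and is an assumption; the knot locations used in this study are not reported.

```python
from statistics import quantiles
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Unscaled restricted cubic spline basis: [x, X_1, ..., X_{k-2}]."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_plus(u: float) -> float:
        return u ** 3 if u > 0 else 0.0  # truncated cubic (u)_+^3

    denom = t[k - 1] - t[k - 2]
    terms = [x]  # the linear term
    for j in range(k - 2):
        # These three pieces cancel the cubic and quadratic terms beyond the last knot,
        # which is what makes the spline linear in the tail.
        terms.append(
            cube_plus(x - t[j])
            - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / denom
            + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / denom
        )
    return terms


def default_knots_3(values: Sequence[float]) -> List[float]:
    """Assumed default: three knots at the 10th, 50th and 90th percentiles."""
    deciles = quantiles(values, n=10)  # 9 cut points: 10th, 20th, ..., 90th
    return [deciles[0], deciles[4], deciles[8]]
```

In a Cox model, each patient's age (or tumor size) would be replaced by the output of `rcs_basis`, and the Wald test on the non-linear coefficient(s) is the kind of χ² statistic compared across knot counts in a sensitivity analysis.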
Subsequently, univariate analysis showed that age at diagnosis, tumor size, T stage, N stage and tumor grade were all associated with survival. All of these factors were entered into the multivariate analysis, which identified age at diagnosis, tumor size and tumor grade as independent factors for CSS (Table 2). Finally, backward stepwise selection with the AIC was used to identify variables for the multivariate Cox proportional-hazards model.
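Backward stepwise selection with the AIC can be summarized as follows: start from the full model, try removing each variable in turn, drop the variable whose removal lowers the AIC (AIC = 2k - 2 log L, where k is the number of estimated parameters and log L the maximized log partial likelihood) the most, and stop when no removal lowers it further. The Python sketch below implements that loop generically. The `fit` callable, which must return the maximized log partial likelihood and the parameter count of a Cox model fitted on a given variable subset (including the empty subset), is a hypothetical interface that the reader would supply, for example by wrapping a survival-analysis library.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical interface: variable subset -> (maximized log partial likelihood, n_params)
FitFn = Callable[[FrozenSet[str]], Tuple[float, int]]


def aic(log_likelihood: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_likelihood


def backward_aic(variables: Sequence[str], fit: FitFn) -> FrozenSet[str]:
    """Backward elimination: repeatedly drop the variable whose removal lowers AIC most."""
    current = frozenset(variables)
    current_aic = aic(*fit(current))
    while current:
        # AIC of every model with exactly one variable removed
        candidates = [(aic(*fit(current - {v})), v) for v in sorted(current)]
        best_aic, best_var = min(candidates)
        if best_aic >= current_aic:
            break  # no single removal improves the AIC: stop
        current, current_aic = current - {best_var}, best_aic
    return current
```

In a toy check where each variable adds a fixed amount to the log-likelihood at the cost of one parameter, any variable that adds less than 1 log-likelihood unit is dropped, because its removal saves 2 AIC units for the parameter while costing only twice the lost log-likelihood.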
Based on the results of that multivariate analysis, the model with the lowest AIC value was selected for construction of the final nomogram predictive model [12] (Fig. 2). In the same way, the AJCC staging variables were entered into a separate model to establish the conventional AJCC predictive model. The nomogram was first validated internally by the bootstrap method and then validated using the validation dataset. Harrell's C-index was used to measure the predictive accuracy of the final nomogram model and of the conventional AJCC model. In the bootstrap validation, the nomogram predicted CSS more accurately, with an unadjusted C-index of 0.81 (95% CI 0.75-0.87) and a bootstrap-adjusted C-index of 0.80 (95% CI 0.73-0.83), both higher than the C-index of the AJCC staging system (0.75, 95% CI 0.67-0.83). Figure 3 shows the calibration plot of the nomogram predictions of 3-year, 5-year and 10-year survival in the training dataset, in which the predicted survival rates corresponded closely with the actual survival rates.
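A bootstrap-adjusted C-index is typically obtained with Harrell's optimism correction: the whole modelling procedure is refitted on each bootstrap resample, the drop in performance from the resample to the original data estimates the optimism, and the average optimism is subtracted from the apparent (unadjusted) performance. The Python sketch below implements that scheme for any `fit`/`score` pair; both callables, the number of resamples and the seed are assumptions for illustration, and in this setting `score` would compute the C-index of a fitted Cox model on a dataset.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")  # one patient record
M = TypeVar("M")  # a fitted model


def optimism_corrected(
    data: Sequence[R],
    fit: Callable[[Sequence[R]], M],
    score: Callable[[M, Sequence[R]], float],
    n_boot: int = 200,
    seed: int = 1,
) -> float:
    """Apparent performance minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    apparent = score(fit(data), data)
    optimism: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))  # resample patients with replacement
        model = fit(sample)  # refit the full modelling procedure on the resample
        # Performance on the resample it was fitted on vs. on the original data
        optimism.append(score(model, sample) - score(model, data))
    return apparent - mean(optimism)
```

Refitting the full procedure (including any variable selection) inside the loop matters: if selection is done only once, the optimism it introduces is not captured.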
For the validation of the nomogram in the validation cohort, the C-index was 0.83, indicating better discrimination than the AJCC staging system (C-index 0.71, p = 0.038). Figure 3 also shows the calibration plot of the nomogram for 3-, 5- and 10-year survival in the validation set; as in the training set, the predicted survival corresponded closely with the actual survival.
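Calibration plots of this kind are usually built by grouping patients by their predicted survival at a fixed horizon (for example 36, 60 or 120 months) and comparing each group's mean predicted survival with its Kaplan-Meier estimate of observed survival. The Python sketch below does that; the number of groups and the use of equal-sized groups are assumptions, as the grouping used for the figure is not reported.

```python
from typing import List, Sequence, Tuple


def kaplan_meier_at(times: Sequence[float], events: Sequence[bool], t: float) -> float:
    """Kaplan-Meier estimate of the survival probability at time t."""
    survival = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)  # still under follow-up at u
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == u)
        survival *= 1.0 - deaths / at_risk
    return survival


def calibration_points(
    predicted: Sequence[float],  # model-predicted survival at `horizon`, per patient
    times: Sequence[float],
    events: Sequence[bool],
    horizon: float,
    n_groups: int = 3,
) -> List[Tuple[float, float]]:
    """(mean predicted, Kaplan-Meier observed) survival per group of patients
    ordered by predicted survival."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    points: List[Tuple[float, float]] = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = kaplan_meier_at([times[i] for i in idx], [events[i] for i in idx], horizon)
        points.append((mean_pred, observed))
    return points
```

Points lying close to the 45-degree line (mean predicted equal to observed) indicate good calibration.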

Discussion
Although researchers have developed nomogram models for predicting the prognosis of renal cancer [17,18], these models are based only on TNM staging. Predictive models for the prognosis of individual renal cancer subtypes are still lacking, and this gap needs to be addressed. Using data from the SEER cancer database, we propose a nomogram prediction model for CCRC as a subtype of renal cancer, and use part of the data to verify the model internally. Furthermore, given the uncertain nature of disease progression in metastatic patients, we limited this nomogram to non-metastatic patients to improve the clinical utility of the final nomogram model.
Nomograms are currently considered to be the most accurate tools with which to predict oncological outcomes following surgical treatment, and they may be especially useful for the management of uncommon diseases such as CCRC, where evidence-based medicine is lacking [19,20]. The C-index of our nomogram was 0.81, which is comparable to that of other nomogram models in the cancer field. Using the AIC in the model selection process enabled us to select the model that combined the greatest predictive accuracy with the fewest variables. Indeed, not all independent predictive factors of CSS improved the model's prediction of clinical outcomes, and only age, tumor size and tumor grade were included in the final nomogram. These variables have been proposed in earlier models for predicting the postoperative recurrence risk of patients diagnosed with RCC [21,23], and are also consistent with the main prognostic factors of CCRC. Race has also been shown to be strongly predictive of risk after radical nephrectomy [22], but according to the AIC analysis, the model including race performed worse.
Clinical decisions are particularly crucial in CCRC patients [23,24]. We propose that clinicians could discuss cancer management on the basis of the CSS predictions generated by the nomogram developed in this study. A large number of studies have shown that CCRC patients who undergo radical nephrectomy early in the course of the disease have a good prognosis [25]. According to a clinical study by Lee (2010), the five-year overall survival rate for CCRC can be as high as 88% [26]. However, the potential severity of the disease should not be underestimated. A recent study has suggested that, among patients who develop postoperative metastasis, the prognosis is poorer when the primary tumor diameter exceeds 8 cm than when it is less than 8 cm. In the present study, tumors larger than 8 cm were likewise associated with shorter CSS. Therefore, from a prognostic perspective, patients with tumors larger than 8 cm in diameter should undergo a more thorough preoperative assessment, because their outcome is poorer than that of patients with smaller tumors.
There are certain limitations to the current study that should be considered. First, our model was built from the data contained in the SEER database, so its accuracy is directly linked to the accuracy of data collection and entry into that database. Second, in developing the nomogram we included only clinical and histological parameters, whereas other biochemical parameters, such as the serum level of lactate dehydrogenase (LDH), might also be associated with adverse pathological features and with oncological outcomes [27]. Moreover, postoperative adjuvant therapy is also a major factor influencing the prognosis of the disease; however, none of these variables were included in the present model because they were not available for the patients in the study. We recommend that such a model be continuously improved by including additional relevant variables and by updating it in clinical practice.

Conclusion
The nomogram prediction model we built has a higher C-index than the AJCC staging system, indicating better discrimination and predictive value. A prospective study with a larger sample is needed to verify our findings. This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication
Not applicable.

Availability of data and materials
The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
All authors declare that they have no conflict of interest.

Fig. 1: Transformation of continuous variables in univariate analysis using restricted cubic splines