Prognostic nomogram to predict overall survival for patients with perihilar cholangiocarcinoma: A population-based study of SEER database

Background: Perihilar cholangiocarcinoma (pCCA) is a highly aggressive malignancy with poor prognosis. Accurate prediction of survival outcome is of great significance for patients. The present study aimed to propose a prognostic nomogram for predicting overall survival (OS) in patients with pCCA.
Methods: We conducted a retrospective analysis of 940 patients enrolled from the Surveillance, Epidemiology, and End Results (SEER) program and developed a nomogram based on the prognostic factors identified by Cox regression analysis. The concordance index (C-index), risk group stratification and calibration curves were used to test the discrimination and calibration of the nomogram with the bootstrap method. Decision curves were also plotted to evaluate the net benefit in clinical use against the TNM staging system.
Results: On the basis of multivariate analysis, five independent prognostic factors (age, summary stage, surgery, chemotherapy, and radiation) were selected and entered into the nomogram model. The C-index of the model was significantly higher than that of the TNM system in the training set (0.703 vs 0.572, P < 0.001), which was confirmed in the validation set (0.718 vs 0.588, P < 0.001). The calibration curves for 1-, 2-, and 3-year OS probabilities showed good agreement between the nomogram predictions and the actual observations. Decision curves showed that the nomogram obtained more net benefit than the TNM staging system in a clinical context. The OS curves of the two risk groups stratified by nomogram-predicted survival outcome differed significantly.
Conclusions: We established and validated an easy-to-use prognostic nomogram, which can provide more accurate individualized prediction and assist decision making for pCCA patients.


Background
Cholangiocarcinoma, also known as bile duct cancer, is a relatively rare tumor, yet it is the most common biliary tract malignancy and the second most common primary liver malignancy [1]. According to anatomical location [2], cholangiocarcinomas are classified into intrahepatic cholangiocarcinoma (iCCA), perihilar cholangiocarcinoma (pCCA), and distal cholangiocarcinoma (dCCA). pCCA, located between the second-degree hepatic ducts and the insertion of the cystic duct into the common bile duct [3] and also known as Klatskin tumor, is the most common subtype, accounting for approximately 50% of cholangiocarcinomas [4]. The prognosis of Klatskin tumor is dismal, with a 5-year survival rate of 10% [5].
It is generally accepted that surgical resection and liver transplantation remain the mainstay of potentially curative treatment. However, only about one in five patients is a candidate for such therapies because of late-stage disease or metastasis at the time of presentation [6]. Unfortunately, despite such a low resection rate, most patients (76%) still experience recurrence after resection [7], which emphasizes the need for better adjuvant strategies. Some experts hold the view that adjuvant therapy can extend the survival of patients who undergo surgical resection, especially margin-positive resection [8]. For inoperable patients, palliative management with chemotherapy, radiation or chemoradiation might also be a life-extending option [4]. Moreover, evidence suggests that patients could benefit from novel treatment options such as interventional therapy, targeted therapy or immunotherapy [9,10]. Currently, the widely used staging systems for perihilar cholangiocarcinoma include the Bismuth-Corlette [11,12] and Memorial Sloan-Kettering Cancer Center (MSKCC) staging systems [13], but they are mainly applied to determine the operative method or to predict resectability and the likelihood of metastasis. In addition, the American Joint Committee on Cancer (AJCC) TNM system and the Mayo Clinic system can classify patients into different prognostic stages based on nonoperative information at diagnosis. However, these staging systems contain only a restricted number of prognosis-associated factors and give insufficient consideration to other significant variables, such as age, gender, tumor differentiation, and therapeutic strategy. Thus, their roles in accurate prognosis prediction are limited [14][15][16]. In the era of personalized cancer treatment, an accurate individualized evaluation is critical for selecting patient-specific treatment or care options.
For these reasons, a more precise model for predicting the survival of patients with pCCA is urgently required.
A nomogram, as a simple graphical visualization of a statistical model, can provide an individual patient with accurate survival information [17], and it has proved favorable for a range of cancer types [18][19][20][21][22]. Such a prognostic tool has the potential to become an alternative and even a new standard in cancer prognosis evaluation. With respect to nomograms established for pCCA, to our knowledge, a model constructed from a real-world study that combines demographic variables, surgery, chemotherapy, and radiation treatment is still lacking. Hence, the current study aimed to derive and validate an effective prognostic nomogram for pCCA patients using data from the Surveillance, Epidemiology, and End Results (SEER) program.

Patients and study design
The SEER program collects and publishes cancer occurrence and survival data in the United States, covering about 26% of the population [23]. In this retrospective study, baseline and clinical data of pCCA patients were obtained through SEER*Stat version 8. The exclusion criteria were as follows: (1) patients with unknown demographic information, such as age at diagnosis, gender or race; (2) patients with unknown survival time information; (3) patients with unknown diagnostic confirmation or diagnosed only via a death certificate or autopsy; (4) patients with indefinite treatment information, including surgery and radiation. After patient identification, 940 eligible cases were enrolled in this study. For further analysis, the whole cohort was randomly split into a training set (n = 658) and a validation set (n = 282) at a ratio of 7:3 using the 'sample' function in R software. The training set was used to establish the predictive nomogram, while the validation set was used for external validation. Ethical approval and informed consent were unnecessary for our study because the SEER dataset is publicly available and all patient data are de-identified. We also signed a data use agreement and obtained permission to access the database.
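The 7:3 random split described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name `split_cohort`, the integer patient IDs, and the seed value are assumptions introduced here only for illustration.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly split patient IDs into a training and a validation set."""
    rng = random.Random(seed)  # hypothetical seed, chosen only for reproducibility
    n_train = round(len(patient_ids) * train_fraction)
    train = set(rng.sample(patient_ids, n_train))  # sampling without replacement
    validation = [pid for pid in patient_ids if pid not in train]
    return sorted(train), validation

# 940 placeholder IDs standing in for the eligible SEER cases
ids = list(range(1, 941))
train_ids, validation_ids = split_cohort(ids)
print(len(train_ids), len(validation_ids))
```

With 940 cases, a 0.7 fraction gives the 658 and 282 patients reported for the two sets.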

Variables extraction
Baseline and clinicopathological characteristics were gathered from the SEER database, including age, race, gender, marital status, grade, T stage (AJCC, 6th ed.), N stage (AJCC, 6th ed.), M stage (AJCC, 6th ed.), SEER summary stage, surgery, radiation, chemotherapy, survival time and vital status. Age at diagnosis was categorized into three subgroups (less than 68 years old, aged between 69 and 84 years old and 85 years or older group). The fields "Surg Prim Site" and "Reason no cancer-directed surgery" were used to identify patients who received surgery. Surgery and radiation therapy were defined as receiving the relevant treatment or not. Chemotherapy was classified as receiving or not/unknown. The endpoint was overall survival (OS), defined as the length of time from diagnosis to all-cause death or the last follow-up.

Statistical analysis
Continuous variables were transformed into categorical variables using cut-off values derived from X-tile software (Version 3.6.1, Yale University, New Haven, CT, USA). Categorical data were compared by the Chi-square test or Fisher's exact test. The Cox proportional hazards regression model was used in the univariate and multivariate analyses. Hazard ratios (HR) and the corresponding 95% confidence intervals (CI) were calculated. Prognostic predictors were determined by multivariate analysis and a nomogram was formulated. Performance of the constructed nomogram was assessed via the concordance index (C-index) and calibration plots [24,25]. Bootstrapping with 1,000 resamples was used in validation. Calibration plots represented the relationship between the predicted and the observed OS probability. The nomogram and the traditional AJCC TNM staging system were compared by C-index using the R 'Hmisc' package. The C-index (range: 0.5-1) reflects the discrimination capacity of the model, and larger values indicate better discrimination among different survival outcomes. Additionally, decision curve analysis was used to assess the clinical net benefit of the predictive model. After the total nomogram scores were calculated for patients in the training and validation cohorts and the optimal cut-off values were determined by X-tile, individuals were separated into low-risk or high-risk subgroups. Survival was estimated by the Kaplan-Meier (KM) method and differences between survival curves were analysed by the log-rank test. All analyses were performed with SPSS 26.0 for Windows (IBM Corporation; Chicago, IL, USA) and R version 3.6.2 (Institute for Statistics and Mathematics; Vienna, Austria) using the 'rms', 'survival', 'Hmisc', and 'stdca' packages. A two-sided P value less than .05 was considered statistically significant.
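To make the C-index concrete, the following is a minimal Python sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the 'Hmisc' implementation used in the study: the function name `harrell_c_index` and the toy data are assumptions, and pairs with tied survival times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of usable pairs in which the higher-risk patient dies first."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has an observed death
            # strictly before patient j's death or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable

# Toy data (hypothetical): survival in months, 1 = death, 0 = censored
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.7, 0.3, 0.1]
c = harrell_c_index(times, events, risk)
print(c)  # 7 of 8 usable pairs are concordant -> 0.875
```

A value of 0.5 corresponds to random ordering and 1 to perfect ordering of patients by risk.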

Results
The characteristics of patients with pCCA
A total of 940 eligible patients were included in the whole study cohort, which was then randomly divided into the training set and the validation set. The detailed process of patient selection is shown in Figure 1. The demographic and clinicopathological characteristics of all patients are presented in Table 1. All variables were similar in both sets. In the whole cohort, there were 876 events (deaths).

Independent predictors for OS in the training set
As described in Table 2, in the univariate analysis, a total of 10 covariates were related to OS (P < 0.05), while gender and race had no correlation with survival. Variables that were statistically significant in the univariate Cox analysis were entered into the subsequent multivariate analysis. After adjustment for other covariates, multivariable analysis indicated that age, summary stage, surgery, chemotherapy and radiation remained independently associated with OS.

Development and validation of the nomogram
The predictive nomogram model incorporating all significant prognostic factors identified by Cox regression analysis is presented in Figure 2. The length of each point scale indicates the extent of that factor's contribution to survival outcome. As shown in the nomogram, surgery made the largest contribution, followed by summary stage and age. In addition, the presence or absence of radiation had a moderate effect on prognosis. In the internal validation, the nomogram had a C-index of 0.703 (95% CI: 0.680-0.725). As shown in Figure 3, the calibration plots for 1-, 2-, and 3-year OS probabilities indicated high consistency between the nomogram predictions and the actual observations. In the external validation, the nomogram still exhibited good accuracy, supported by a C-index of 0.718 (95% CI: 0.684-0.752). Calibration curves for external validation, presented in Figure 3, also confirmed satisfactory agreement between OS predictions and the actual observations. In summary, good discrimination and calibration in both internal and external validation supported the reliability and reproducibility of the established nomogram.
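The link between point-scale length and contribution can be sketched as follows. In a typical nomogram, the factor with the widest range of Cox log hazard ratios spans 0 to 100 points and all other factors are scaled proportionally. The coefficients below are hypothetical placeholders, not the study's estimates, and `build_point_table` and `total_points` are illustrative names.

```python
# Hypothetical log hazard ratios per category (NOT the study's estimates)
coefficients = {
    "surgery": {"yes": 0.0, "no": 1.2},
    "stage": {"localized": 0.0, "regional": 0.3, "distant": 0.9},
    "radiation": {"yes": 0.0, "no": 0.4},
}

def build_point_table(coefs):
    """Scale each factor so the widest coefficient range maps to 0-100 points."""
    max_range = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    table = {}
    for var, levels in coefs.items():
        low = min(levels.values())
        table[var] = {
            level: round(100 * (beta - low) / max_range, 1)
            for level, beta in levels.items()
        }
    return table

def total_points(table, patient):
    """Sum the points of a patient's category in every factor."""
    return sum(table[var][level] for var, level in patient.items())

table = build_point_table(coefficients)
patient = {"surgery": "no", "stage": "regional", "radiation": "yes"}
print(table)
print(total_points(table, patient))  # 100.0 + 25.0 + 0.0 = 125.0
```

In this sketch a higher total corresponds to a higher predicted hazard; the mapping from total points to 1-, 2-, and 3-year OS probabilities additionally requires the baseline survival function, which is not reproduced here.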

Comparison between the nomogram and AJCC TNM system
The discriminative ability of the nomogram and the traditional TNM staging system was compared by calculating the corresponding C-indices. In the training cohort, our model had a notably higher value than that of TNM (0.703 vs 0.572, P < 0.001), which was then verified in the validation set (0.718 vs 0.588, P < 0.001). To further compare the clinical utility of the nomogram and the TNM system, decision curve analysis was performed. As depicted in Figure 4, the nomogram yielded greater net benefit than the TNM staging system in predicting 1-year, 2-year and 3-year OS over a relatively broad range of threshold probabilities.
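The net benefit plotted in a decision curve can be sketched in Python as below. This is a simplified binary-outcome version under stated assumptions: it ignores censoring, whereas the 'stdca' approach used in the study handles survival data. The function names and toy data are hypothetical.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of flagging every patient whose predicted risk >= threshold."""
    n = len(died)
    true_pos = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and d == 1)
    false_pos = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and d == 0)
    # False positives are weighted by the odds of the threshold probability
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)

def net_benefit_treat_all(died, threshold):
    """Reference strategy: flag every patient regardless of predicted risk."""
    prevalence = sum(died) / len(died)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (hypothetical): predicted 1-year death risk and observed status
risk = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
died = [1, 1, 0, 1, 0, 0]
nb_model = net_benefit(risk, died, 0.5)
nb_all = net_benefit_treat_all(died, 0.5)
print(nb_model, nb_all)
```

A decision curve repeats this calculation over a range of threshold probabilities; a model is preferred where its curve lies above both the "treat all" and "treat none" (net benefit of zero) references.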

The performance of nomogram on risk group stratification
Based on the nomogram total points calculated for each patient and the optimal cut-off value detected by X-tile analysis, patients were categorized into low- and high-risk subgroups. In this study, the cases were divided into a low-risk group (less than 214 total points) and a high-risk group (greater than or equal to 214 total points). The KM curves plotted in Figure 5 illustrate that patients in the low-risk group had significantly higher OS probabilities than those in the high-risk group. These results indicate that the nomogram was reliable in predicting the probability of OS for patients with Klatskin tumors and could help balance study arms in clinical study design.
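The stratification step can be sketched in Python: patients are assigned to a risk group with the 214-point cut-off reported above, and a Kaplan-Meier curve is estimated per group. The patient tuples are hypothetical, the function names are illustrative, and the log-rank test is omitted for brevity.

```python
def risk_group(total_points, cutoff=214):
    """Assign the risk group using the cut-off reported in the text."""
    return "high" if total_points >= cutoff else "low"

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time."""
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical patients: (total points, follow-up in months, 1 = death)
patients = [(150, 30, 0), (180, 24, 1), (200, 36, 0),
            (220, 6, 1), (250, 10, 1), (300, 12, 0)]

groups = {"low": [], "high": []}
for points, months, died in patients:
    groups[risk_group(points)].append((months, died))

curves = {
    name: kaplan_meier([m for m, _ in rows], [d for _, d in rows])
    for name, rows in groups.items()
}
print(curves)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimator remains valid when many patients are still alive at the end of the study.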

Discussion
Most pCCAs are diagnosed at an advanced stage [26]. Therefore, individualized survival evaluation via easily accessible measures is important for patients. In the current study, five independent determinants were identified through univariate and multivariate Cox proportional hazards regression analysis, namely age, surgery, chemotherapy, radiation and summary stage. We developed the nomogram with these five factors to predict 1-, 2-, and 3-year OS probabilities of pCCA, which allows physicians to predict their patients' prognosis in an easy-to-implement way. The nomogram-predicted OS was consistent with the observed OS, with C-indices of 0.703 and 0.718 in the training and validation sets, respectively. We additionally observed that the calibration for 3-year OS was slightly reduced on external validation, whereas the discrimination remained good. Tumor characteristics, management strategies and differences among patients may be responsible for the suboptimal calibration in the validation set. In addition, the survival curves of the distinct risk groups differed significantly, indicating the feasibility of the constructed nomogram.
As our results indicated, age was significantly correlated with OS, with older age corresponding to shorter OS, which was in accordance with published studies [27,28]. To obtain the most significant difference between age groups, optimal cut-off values derived from X-tile analysis were applied to both the training and validation cohorts. Older patients are recognized to tolerate surgery and other treatments less well and to have more comorbidities, which may negatively affect overall survival. However, Koerkamp et al [29] reported that age was not a significant predictor of disease-specific survival (DSS). We speculate that deaths from competing risks such as cardiovascular events may partially explain this result. Interestingly, in the nomogram presented by Chen et al [30], age was also related to OS, but an age of 55 years indicated the lowest OS rate, with the OS rate increasing between the ages of 55 and 85. The underlying reason for this contradiction remains unclear.
SEER summary stage is a basic method of describing how far a cancer has extended from its point of origin, mainly comprising the local, regional, and distant stages. The regional stage further consists of three detailed classifications: direct extension only, lymph nodes involved only, and both direct extension and lymph node involvement. The present study detected the prognostic value of summary stage in association with OS probabilities, which was similar to the results published by Qi et al [28]. Compared with cases at the regional and distant stages, only tumors at the localized stage had a better prognosis.
With regard to treatment, chemotherapy and radiation, in addition to surgery, also demonstrated a significant correlation with survival outcome. Surgery with curative intent can prolong the median OS of patients from 8 months to 40 months [31]. Nevertheless, only a limited proportion of patients are eligible for surgical treatment, and the recurrence rate after resection is high, so options such as chemotherapy or radiation should be considered in these cases as adjuvant or palliative treatment. Controversy also exists regarding the correlation between chemotherapy or radiation and OS. Chen et al [30] reported that chemotherapy and radiotherapy were not related to OS, whereas both were independent prognostic factors in our study and were thus integrated into the nomogram. According to a randomized trial [32], adjuvant capecitabine chemotherapy should be given to patients with resected cholangiocarcinoma for 6 months [33], at a dose of 1,250 mg/m² twice a day in every 3-week cycle. In addition, several lines of evidence suggest potential benefits of chemotherapy in both the preoperative and postoperative settings for Klatskin tumors [34][35][36]. The regimen of gemcitabine and cisplatin was recommended for inoperable or postoperative pCCA patients [4,37], but detailed chemotherapy regimen information was unavailable in the SEER database. Patients with extrahepatic cholangiocarcinoma may be offered chemoradiation therapy, administered at a dose of 45 Gy to the regional lymphatics and 54 to 59.4 Gy to the tumor bed [38,39]. With respect to radiotherapy for perihilar cholangiocarcinoma, Leng et al [40] argued that adjuvant radiotherapy was not associated with survival improvement in resected pCCA. A retrospective study [41] of the SEER dataset also concluded that radiation had no therapeutic benefit for pCCA.
However, according to some reviews [42,43], external beam and endoluminal radiation therapy is an option for unresectable hilar cholangiocarcinoma. In general, although the efficacy of radiotherapy for pCCA is still in dispute, it was retained in our final model on multivariate analysis. Methods of radiation mainly comprised beam radiation, radioactive implants (including brachytherapy), and a combination of beam radiation with implants or isotopes. Additionally, the appropriate dosing, timing and radioactive source require further investigation.
Several prognostic models for pCCA have been reported, including a point scoring system and three nomograms [28][29][30]44]. As specified in several previous studies, lymph node counts and lymph node metastasis were often identified as independent predictors for pCCA [45,46]. For instance, the lymph node ratio (LNR) was often used to develop nomograms for biliary tract diseases [47,48]. In our study, however, LNR was hard to calculate because of missing data, and AJCC N stage did not enter the final model because it was not statistically significant. Compared with the scoring system, the current study confirmed the prognostic roles of age and chemoradiation in patients with pCCA, and the proposed nomogram can serve as both a scoring system and a visualized prediction approach. Unlike the nomograms developed for patients with resected pCCA by Koerkamp et al [29] and Chen et al [30], our model did not include portal vein or hepatic artery involvement, tumor pathological differentiation, lymph node status or margin status, which might influence the prognosis of pCCA patients, but SEER summary stage could reflect these tumor characteristics to a great extent.
Moreover, we have encompassed the chemotherapy and radiation factors rarely mentioned in other studies. With regard to a recently published nomogram with C-index of 0.651, it was only based on 317 patients from the SEER database [28], without consideration of chemotherapy and radiation information.
In contrast, our model was based on a large cohort of cases, and has a comparable discrimination and calibration to those relevant models.
Despite the good performance and ease of use of our nomogram as a prognostic model, some limitations of our study should be noted. First, owing to the retrospective nature of the study, the nomogram was based on data from the SEER program, so selection bias is a potential concern. Furthermore, some important clinicopathological parameters and widely used tumor biomarkers for pCCA, such as surgical margin status, serum carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA) and vascular encasement, were unavailable in the SEER database. In addition, many promising options such as interventional therapy [49], immunotherapy [50] and targeted treatment [51] are increasingly used among pCCA patients, but they are not yet recorded in the SEER program. Although the nomogram is a user-friendly tool for prognosis prediction and decision-making, it cannot be expected to provide consistently excellent predictions in clinical practice, because not all significant prognostic factors were incorporated.

Conclusions
In conclusion, the proposed nomogram in our study could predict the OS of patients with Klatskin tumors effectively and efficiently, and could be used for individualized prognosis prediction and patient stratification.

Figure 1 The flowchart of patient selection.