Construction and Validation of a Nomogram for the Prediction of Overall Survival in Intrahepatic Cholangiocarcinoma

Objective: This study aimed to establish and validate a nomogram to predict the overall survival (OS) of patients with intrahepatic cholangiocarcinoma (ICC). Patients and methods: ICC patients diagnosed from 2004 to 2015 were collected from the Surveillance, Epidemiology, and End Results (SEER) database. Independent prognostic factors were identified in the training set using the Cox regression model and used to establish a nomogram. Results: We identified 3675 eligible patients with a median survival time of 9 months (range, 0–153 months). In multivariate analysis, age, sex, marital status, grade, T stage, N stage, M stage, surgery, chemotherapy and radiotherapy were identified as independent prognostic factors for ICC (all P<0.05), and these factors were incorporated into the nomogram. Compared with the AJCC 8th TNM classification system and the SEER summary stage system, the nomogram showed better discrimination, as measured by the C-index (all P<0.001). Internal and external calibration curves showed that the predicted results were highly consistent with the observed ones. The nomogram also outperformed the AJCC 8th TNM classification system and the SEER summary stage system in predicting 3- and 5-year OS, as suggested by time-dependent area under the curve (tAUC) values. Conclusion: The nomogram performs well, indicating its potential as an efficient approach to evaluating the prognosis of ICC patients.


Introduction
Intrahepatic cholangiocarcinoma (ICC) is an uncommon liver cancer that originates from the intrahepatic bile duct epithelium, and its incidence is much lower than that of hepatocellular carcinoma (HCC) 1. ICC has the lowest incidence among all types of cholangiocarcinoma, compared with those originating from the upper third of the biliary tract or from the two thirds involving the common hepatic duct bifurcation (Klatskin tumors) 2. Despite its rarity, the incidence of ICC has increased over the past few decades 3. In addition, its clinical characteristics and prognostic outcomes differ significantly from those of HCC 4,5. Because ICC is rare and heterogeneous, identifying reliable prognostic features has been a challenge.
At present, the 8th edition of the Tumor-Node-Metastasis (TNM) classification system developed by the American Joint Committee on Cancer (AJCC) 6 is widely used to evaluate the prognosis of ICC. The TNM classification system predicts cancer outcomes by evaluating the tumor site and size (T), regional lymph node involvement (N) and distant metastasis (M). However, other important factors, including age, race, gender, degree of tumor differentiation and treatment, may also affect individual patient survival 7. As a result, the TNM 8th classification system cannot sufficiently predict the prognosis of individual patients. An approach to classifying ICC prognosis that is technically feasible and easily accessible is therefore urgently needed.
A nomogram is a simple tool for statistical prediction that is widely used for prognosis prediction in clinical practice [8][9][10]. Constructing a nomogram requires taking the prognostic weight of each factor into account when calculating an outcome probability and integrating several independent factors into a single estimate. Compared with the AJCC TNM classification system, a nomogram can evaluate patient survival more precisely by combining the key prognostic factors 11. As far as we know, no nomogram has been constructed to predict the prognosis of ICC. This work therefore aimed to construct and validate a nomogram to predict OS for ICC patients collected from the Surveillance, Epidemiology, and End Results (SEER) database.

Ethical statement
The SEER program, organized by the National Cancer Institute, integrates population-based data sources that cover approximately 30% of the US population from diverse geographic regions 12,13. To extract data from the SEER database, we signed the Research Data Agreement under reference number 19858-Nov2018. Data were obtained in accordance with the approved guidelines. All the collected information was public and de-identified.
Therefore, this study did not require approval by the institutional review board.

Study population
Eligible cases were screened using the SEER*Stat software (version 8.3.6, released on August 8, 2019). We applied the International Classification of Diseases for Oncology, third edition (ICD-O-3) to identify ICC cases, selecting them by the ICD-O-3 site codes C22.0 or C22.1 (liver and intrahepatic bile duct) and the ICD-O-3 histological code 8160/3 7. Patients meeting any of the following criteria were excluded: (1) more than one primary tumor; (2) clinical diagnosis only, or diagnosis based on autopsy or death certificate; (3) insufficient clinicopathological data, such as TNM stage or surgical classification; (4) no data on prognosis; (5) no data on race or marital status. The remaining patients were included in the initial SEER cohort.

Covariates and endpoint
The following clinicopathological features were examined: age (<65, ≥65 years); gender (male, female); marital status (married, unmarried); race (white, black, others); insurance status (uninsured/unknown, any Medicaid/insured); T stage (T1-4); N stage (N0-1); M stage (M0-1); grade (I/II, III/IV, unknown); surgery (no surgery, local tumor removal/segmental resection, lobectomy/hepatectomy); radiotherapy (yes, no/unknown); and chemotherapy (yes, no/unknown). Widowed, single (never married or having a domestic partner), divorced and separated patients were classified as unmarried 14,15. Age was grouped according to previous studies 16,17. Cancer stage was first classified according to the AJCC 6th classification system, as adopted for SEER-derived patients diagnosed between 2004 and 2015. Eligible patients were then reclassified according to the AJCC 8th classification system. The endpoint was overall survival (OS), defined as the interval between diagnosis and death from any cause. The follow-up cutoff was set to November 2018, based on the SEER 2018 submission database.
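Because OS as defined above is right-censored (patients alive at the cutoff have no observed death time), survival rates at a fixed horizon are usually estimated with a method that accounts for censoring. The following is a minimal Python sketch of the Kaplan-Meier estimator, given only as an illustration: the text does not state which estimator was used for the reported OS rates, and the function name and the follow-up values are hypothetical.

```python
def kaplan_meier_at(times, events, horizon):
    """Kaplan-Meier estimate of the survival probability at `horizon`.

    times:  follow-up in months for each patient
    events: 1 = death from any cause observed, 0 = censored
    """
    survival = 1.0
    # Distinct times at which at least one death was observed, up to the horizon.
    death_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
    return survival


# Hypothetical follow-up data for illustration only; not taken from the study.
example_times = [6, 12, 20, 36, 40, 60]
example_events = [1, 1, 0, 1, 0, 1]
os_3_year = kaplan_meier_at(example_times, example_events, horizon=36)
```

In this toy cohort the patient censored at 20 months leaves the risk set without counting as a death, which is the property that distinguishes this estimate from a simple proportion of survivors.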

Statistical analysis
Nomogram construction
Categorical variables were expressed as frequencies and proportions and compared using the chi-square test or Fisher's exact test. Univariate and multivariate Cox proportional hazards regression analyses were performed to identify factors significantly related to prognosis, and the results were presented as hazard ratios (HRs) with the corresponding 95% confidence intervals (CIs).
Factors with P<0.1 in univariate analysis were entered into multivariate backward stepwise analysis to identify independent risk factors. Based on all independent prognostic factors obtained from the training set, we established a nomogram to predict 3- and 5-year OS using the rms package in R (version 3.5.1).
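The two reporting steps described here can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code of the study, which was run in R: the helper names, the coefficient, the standard error and the P values are all hypothetical. A Cox coefficient β with standard error SE corresponds to HR = exp(β) with a 95% CI of exp(β ± 1.96·SE), and the univariate screen keeps factors with P<0.1.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def screen_univariate(p_values, threshold=0.1):
    """Keep the factors whose univariate P value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical inputs for illustration only; not taken from the study.
hr, ci_low, ci_high = hazard_ratio_ci(beta=0.5, se=0.1)
candidates = screen_univariate({"age": 0.001, "race": 0.35, "grade": 0.04})
```

The factors returned by the screen would then be passed to the multivariate backward stepwise model; that step is not reproduced here.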

Nomogram validation
The nomogram was validated by measuring its discrimination and calibration in the internal (training) set and the external (validation) set. We used the concordance index (C-index) to evaluate the discrimination of the model, that is, the agreement between predicted and observed outcomes 18. A greater C-index indicates a better ability to discriminate between patients with different prognostic outcomes. We also used the rcorrp.cens function of the R package Hmisc to compare the C-index of our nomogram with that of the existing TNM classification system and the SEER summary stage system. Calibration plots of the marginal estimate versus the model were used to assess the agreement between nomogram-predicted and observed survival. A calibration plot lying along the 45-degree line indicates a perfect model, with full consistency between the predicted and actual outcomes. In addition, receiver operating characteristic (ROC) curves were drawn to validate the nomogram score. R (version 3.5.1, www.r-project.org) and SPSS 19.0 (SPSS Inc., Chicago, USA) were used for statistical analysis. A two-tailed P<0.05 was considered statistically significant.
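To make the C-index concrete, the following is a minimal Python sketch of Harrell's concordance index for right-censored data. It is an illustration only and not the R routine used in the study; the function name and the example values are hypothetical. A pair of patients is comparable when the one with the shorter follow-up had an observed death, and it is concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time for each patient
    events:      1 = death observed, 0 = censored
    risk_scores: predicted risk (higher means worse expected survival)
    Ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have died strictly before patient j's follow-up ended.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data for illustration only; not taken from the study.
c_index = concordance_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.6, 0.1],
)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. This quadratic-time loop is adequate for illustration, but a cohort of several thousand patients would normally be handled by an optimized library routine.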

Patient characteristics
A total of 3675 eligible ICC cases diagnosed from 2004 to 2015 were included in the present work, with 2573 in the training set and 1102 in the validation set. Figure 1 shows the data collection flow chart. The age of the included patients ranged from 14 to 104 (median, 65) years, with a male-to-female ratio of nearly 1:1. Most patients were insured (83.13%), white (76.71%) and married (59.40%). As for AJCC stage, many cases were at early stages, including T0/T1 (38.20%), N0 (72.98%) and M0 (65.28%). More than half of the included patients received chemotherapy in both sets, but only 16.65% received radiotherapy. The follow-up period ranged from 0 to 153 (median, 9.0) months. The 3-year and 5-year OS rates of all cases were 14.89% and 9.83%, respectively. Table 1 lists all the demographic and clinicopathological features of both groups. No variable differed significantly between the two groups.

Nomogram construction
Table 2 presents the independent factors significantly related to OS identified by multivariate analysis. After adjustment for other risk factors, ten factors were identified as independent variables: age (P<0.001), gender (P<0.001), marital status (P=0.016), T stage (P<0.001), N stage (P<0.001), M stage (P<0.001), grade (P<0.001), surgery (P<0.001), radiotherapy (P<0.001) and chemotherapy (P<0.001). We then established a nomogram to predict 3- and 5-year OS using these independent factors (Figure 2). Surgery made the greatest contribution to prognosis, chemotherapy ranked second, and AJCC stage ranked third. The scores of all selected factors were then summed to determine the survival probability of each individual patient.
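The point-summing step can be sketched as follows. This is a simplified Python illustration of how a nomogram converts regression coefficients into points, not the rms implementation used in the study, and all coefficients below are hypothetical values rather than those of Table 2. The factor with the widest coefficient range spans 0 to 100 points, the other factors are scaled relative to it, and the points of a patient are summed across factors.

```python
def build_points_table(coefficients):
    """Map each factor level to nomogram points on a 0-100 scale.

    coefficients: {factor: {level: log hazard ratio}}, reference level = 0.0.
    The factor with the widest coefficient range receives the full 100 points.
    """
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefficients.values())
    if widest == 0:
        raise ValueError("all coefficients are identical")
    return {
        factor: {
            level: 100.0 * (beta - min(levels.values())) / widest
            for level, beta in levels.items()
        }
        for factor, levels in coefficients.items()
    }


def total_points(points_table, patient):
    """Sum the points of a patient's level on every factor."""
    return sum(points_table[factor][patient[factor]] for factor in points_table)


# Hypothetical coefficients for illustration only; not taken from Table 2.
example_coefficients = {
    "surgery": {"lobectomy/hepatectomy": 0.0, "no surgery": 1.0},
    "chemotherapy": {"yes": 0.0, "no/unknown": 0.5},
    "age": {"<65": 0.0, ">=65": 0.25},
}
points = build_points_table(example_coefficients)
patient_total = total_points(
    points, {"surgery": "no surgery", "chemotherapy": "yes", "age": ">=65"}
)
```

In a full nomogram the total is then read against a second axis that maps points to 3- and 5-year survival probability using the baseline survival of the fitted model; that mapping is omitted from this sketch.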

Nomogram validation
Our nomogram was validated internally and externally. In internal validation on the training set, the nomogram had a C-index of 0.737 (95% CI, 0.726-0.748) for predicting OS. In external validation on the validation set, the C-index was 0.744 (95% CI, 0.726-0.762), indicating good consistency with the actual OS. We also compared the discrimination of our nomogram with that of the TNM 8th classification system and the SEER stage system in both datasets. The nomogram showed higher discrimination in predicting OS than the other two systems (all P<0.001) (Table 3). Furthermore, the internal and external calibration plots showed that the nomogram-predicted OS closely matched the actual results (Figure 4).

ICC is a subtype of bile duct adenocarcinoma that involves the small intrahepatic ducts 19. It is the second most common primary liver cancer after HCC 20. Because ICC is rare, its prognosis cannot be accurately predicted using the conventional classification systems alone, and an effective system for estimating patient prognosis is needed. This study therefore aimed to construct and validate a novel prognostic nomogram for ICC based on SEER-derived samples. A total of 3675 ICC cases were examined. The nomogram exhibited high discrimination, as validated internally and externally, and the calibration plots showed that the predicted OS was close to the actual result. Our nomogram outperformed the current AJCC TNM classification system and might serve as a clinical tool to support patient counseling and individualized treatment.
In this work, 10 clinicopathological factors were identified as independent predictors of prognosis: age, gender, marital status, T stage, N stage, M stage, grade, surgery, radiotherapy and chemotherapy. Among them, age has been identified as an important factor affecting OS in previous reports 21,22. Apart from histological grade and the AJCC classification system, gender [7] and marital status 15 have also been reported as prognostic factors for ICC. In addition, radical surgery is regarded as the only effective treatment, and aggressive surgery is recommended in many institutions 23. Numerous studies have reported that patients undergoing chemotherapy or radiotherapy show a survival benefit [24][25][26], consistent with our results.
Nomograms play a key part in modern medical decision-making 27. A nomogram graphically presents a statistical prediction model that provides the probability of a specific outcome 28,29. The variables considered must therefore be easily accessible and measurable. It is increasingly reported that nomograms outperform the traditional AJCC TNM classification system in predicting prognosis for several cancers, and they have been proposed as an alternative tool or even a new standard 30,31. Compared with the widely applied TNM classification system, our nomogram is easy to use and predicts prognosis quantitatively. Moreover, a nomogram can assist clinicians in managing complicated situations for which there is no fixed clinical guideline.
This work has certain strengths. We obtained sufficient clinicopathological data from SEER-derived ICC cases, which supported the construction of an accurate prognostic nomogram. Our nomogram outperformed the TNM 8th classification system in discrimination for OS prediction. The performance and validity of the nomogram were also confirmed by calibration. In addition, our work used 10 clinical factors that are routinely available in practice, which makes the nomogram convenient to use.
Nonetheless, several limitations must be noted. Firstly, selection bias was inevitable because of the retrospective nature of the study. Secondly, some important prognosis-related clinicopathological factors, including surgical margin status and detailed data on radiotherapy and chemotherapy, were not available from the SEER database, and future research should focus on these aspects. Thirdly, although our nomogram might serve as a user-friendly aid for clinical decision-making, it does not incorporate every prognostic factor and cannot always offer accurate prognosis prediction in clinical practice.

Conclusion
To sum up, this study is the first to construct and validate a nomogram for predicting the 3- and 5-year OS of ICC patients using a large population-based cohort. The nomogram performs well and may serve as an efficient approach to predicting prognosis for ICC patients. Nonetheless, further external validation is needed.

Declarations
Funding This work received no funding.
Conflict of interest All authors declare that they have no conflicts of interest associated with this study.

Author contributions
Chun-jiao Wu designed this study. Yong-jing Yang, Ling Cao, Ling Yan, Jing Zhu and Qiang Li wrote the main manuscript text. Chun-jiao Wu performed the final revision of the manuscript. All authors reviewed the manuscript.

Ethical Approval
The Office for Human Research Protections of the United States Department of Health and Human Services considers analysis of these data to be non-human-subjects research, as the data are publicly available and de-identified. Thus, this study did not require approval by the institutional review board.

Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.

All are compared with the nomogram; HR: hazard ratio; CI: confidence interval.
Figure 1 Flow chart for screening eligible patients.
Vertical bars indicate 95% confidence intervals.