Development and validation of magnetic resonance imaging-based nomograms for patients with perihilar cholangiocarcinoma after radical resection: Preoperative prediction of survival




Abstract
Background
Survival status prediction for perihilar cholangiocarcinoma (pCCA) patients is essential for postoperative clinical decision making. This study aimed to develop and validate preoperative prediction models for overall survival (OS) in pCCA patients.

Materials and Methods
A total of 184 patients who underwent curative resection for pCCA between January 2010 and December 2018 were enrolled. Of these, 110 patients were randomly selected for model development and the remaining 74 patients were used for model testing. Imaging-derived radiomics signatures were developed. Independent preoperative clinical predictors were used alone or in combination with the radiomics signatures to construct different preoperative models with the multivariable Cox proportional hazards method. Nomograms were constructed to predict OS, and their performance was evaluated by discrimination ability, time-dependent receiver operating characteristic (ROC) curves, calibration curves and decision curves.

Results
The clinical model (Model clinic) was constructed from three independent variables: preoperative CEA, cN stage and invasion of the hepatic artery on imaging. The best-performing model (Model clinic&AP&PVP) was built from these three independent variables together with Signature AP and Signature PVP. In the training and testing cohorts, the concordance indexes (C-indexes) of Model clinic were 0.846 (95% CI, 0.735-0.957) and 0.755 (95% CI, 0.540-0.969), respectively, and Model clinic&AP&PVP showed favorable performance with C-indexes of 0.962 (95% CI, 0.905-1) and 0.814 (95% CI, 0.569-1). Both models outperformed the TNM staging system (C-indexes, 0.616, 95% CI, 0.522-0.711 and 0.599, 95% CI, 0.490-0.708). Good agreement was observed in the calibration curves, and decision curve analysis indicated favorable clinical utility for both Model clinic and Model clinic&AP&PVP.

Conclusions
Two preoperative nomograms were constructed to predict 1-, 3- and 5-year survival following surgery for individual pCCA patients. These methods are easy to perform and have potential clinical application in decision-making and in patient stratification for randomized controlled trials.

Background
Over the past four decades, the overall incidence of cholangiocarcinoma (CCA) has increased progressively worldwide [1]. CCA is a devastating malignancy that derives from cells of the biliary tree and may occur anywhere within it [2]. CCA is categorized by anatomical location into intrahepatic cholangiocarcinoma (iCCA), perihilar cholangiocarcinoma (pCCA), and distal cholangiocarcinoma (dCCA) [3]. The most common type is pCCA, which is sometimes referred to as 'Klatskin' tumor [4]. pCCA arises in the perihilar region, whose upper boundary is the second-order bile ducts and whose lower boundary is the junction of the cystic duct and the common hepatic duct [5]. Radical surgical resection is the only curative option for pCCA. Unfortunately, even after surgical resection the prognosis remains poor, with a high rate of local recurrence [6,7]. Several recent studies indicated that adjuvant therapy could improve outcomes in resected pCCA [4,5]. It is therefore valuable to identify patients who have a short life expectancy and may benefit from oncologic treatment.
Oncologists and patients alike desire a reliable prognostic tool tailored to the individual patient. The American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system was designed mainly for predicting prognosis. The Amsterdam Medical Centre (AMC) Hilar Cholangiocarcinoma Group has proposed a nomogram that incorporates postoperative tumor biological characteristics [8]. The Memorial Sloan-Kettering Cancer Centre (MSKCC) system, primarily used for assessing resectability, has also been reported to stratify patients by prognosis [9]. However, the accuracy of these tools is unsatisfactory [10]. There is an urgent need for an alternative tool to predict prognosis for individual patients.
To achieve personalized medicine, personalized imaging is essential. Tumors are characterized by genetic and phenotypic variation. Heterogeneity of a malignancy is associated with treatment failure and thus with a poor prognosis [11]. Biomedical images contain information that reflects the underlying pathophysiology, and these relationships can be revealed by quantitative image analysis [12]. Radiomics uses dedicated algorithms to extract quantitative data that are undetectable by visual morphologic analysis [13]. Radiomics features have been reported to correlate with the prognosis of patients with malignancies [14][15][16][17], and radiomics signatures have also been used to construct nomograms for individualized diagnosis [18,19]. A nomogram is a simple graphical representation of a statistical predictive model that yields the numerical probability of a clinical event, and it is widely used in oncology [20,21]. The ability of a nomogram to generate individualized predictions enables its use in clinical decision making. For many cancers, nomograms compare favorably with the traditional TNM staging systems and have even been proposed as an alternative or a new standard [22][23][24][25].
To our knowledge, a radiomics-based nomogram for predicting prognosis in pCCA has not yet been established. Accordingly, the aim of this study was to develop and validate a decision support tool that incorporates preoperative clinical variables and an MRI-based radiomics signature for pCCA patients in a high-volume center. We hypothesized that the newly developed tool could accurately predict the prognosis of pCCA resected with curative intent.

Materials And Methods
The current study was performed as a retrospective study. The ethics committee of West China Hospital, Sichuan University approved this retrospective analysis and waived the requirement for informed consent. All data were transferred to a dedicated workstation (Advantage Workstation 4.6, GE Healthcare) for analysis. Two radiologists with 16 (Wei Zhang, reader1) and 15 (Jun Zhang, reader2) years of experience in abdominal MRI diagnosis independently reviewed the imaging features of all patients. The radiologists were aware that the patients had pCCA, but they were blinded to the exact pathologic type and TNM stage. Differences in their findings were resolved by consensus. Conventional MRI findings were assessed with a focus on invasion of the hepatic artery, invasion of the portal vein, clinical lymph node status (cN stage) and growth patterns of the tumors. Invasion of the hepatic artery or portal vein was considered present on the basis of the following signs: (1) overt tumor thrombus in the vessel; (2) deformation, stenosis or occlusion of the vessel; (3) 50% or more of the vessel circumference surrounded by tumor; (4) an irregular border between the tumor and the relevant vessel. cN stage was assigned according to the 8th edition of the AJCC staging system. Stage cN0 represents the absence of metastatic regional lymph nodes (LNs). Stage cN1 was defined as 1-3 metastatic regional LNs. Stage cN2 was defined as > 3 metastatic regional LNs [3]. A metastatic regional LN was defined by: (1) a short-axis diameter of more than 1 cm; (2) fusion of the lymph nodes; (3) the presence of central necrosis.

Univariate and multivariate COX regression analysis of preoperative variables
After univariate analysis, variables with P < 0.10 were entered into the multivariate analysis. Independent variables were tested for the proportional hazards (PH) assumption and used for model construction.

Image acquisition, Region-of-Interest segmentation
All images were stored in Digital Imaging and Communications in Medicine (DICOM) format and anonymized. Prior to image analysis, the MR images at the contrast-enhanced arterial phase (CE-AP) and portal venous phase (CE-PVP) were preprocessed in two steps: (1) Intensity standardization (Z-score normalization) was applied to all MR images to reduce the variability across different image acquisitions and to improve radiomics feature reproducibility to some extent [29]. (2) Every standardized MR image was resampled to a uniform voxel dimension of 1.0 × 1.0 × 1.0 mm³ before region-of-interest (ROI) segmentation. Image registration of the preoperative arterial and portal venous phase MR images was first performed with in-house software (Artificial Intelligence Kit, AK, version 3.2.2, GE Healthcare).
ROI segmentation was performed using ITK-SNAP software (version 3.6.0, open source software; https://itk.org/). The ROI was delineated around the entire tumor on each slice of both the axial CE-AP and CE-PVP images in each patient. To assess intra- and inter-observer reproducibility, slice-by-slice segmentation along the tumor contour was performed separately on the CE-AP and CE-PVP MR images by two radiologists (twice by reader1 and once by reader2) for 30 randomly selected patients. The two radiologists segmented the images in a double-blinded manner. Reader1 (Wei Zhang) then completed the tumor segmentation for all remaining patients.
Radiomics feature extraction and selection, and model development
Radiomics features were extracted from the CE-AP and CE-PVP images separately, based on the segmented tumor ROI, using in-house software (Artificial Intelligence Kit, AK, version 3.2.2, GE Healthcare). For each of the CE-AP and CE-PVP image sets, 396 radiomics features were extracted, including 42 histogram features, 9 morphological features, 144 gray-level co-occurrence matrix (GLCM) features, 180 gray-level run length matrix (GLRLM) features, 11 gray-level size zone matrix (GLSZM) features and 10 Haralick features. The radiomics features involved in the radiomics OS prediction models are described in Appendix E2.
Missing values of radiomics features were first replaced with median values. Prior to feature selection, we used the intra- and interclass correlation coefficient (ICC) with the 95% confidence interval (CI) to assess the agreement for each extracted feature. After ICC calculation, CE-AP and CE-PVP features with ICC > 0.7 (indicating good agreement) were kept for further selection. The dataset was then randomly divided at a ratio of 3:2 into training (110 patients) and testing (74 patients) cohorts, and feature selection was performed in the training cohort.
The independent significant radiomics features for OS prediction were selected separately from the CE-AP and CE-PVP features. Following Z-score normalization of the radiomics features in the training and testing cohorts, redundant features in the training set were excluded by pairwise correlation analysis at a cutoff of 0.9, meaning that only one feature was retained when the correlation coefficient between two features was larger than 0.9. Next, univariate Cox regression (P < 0.05) and least absolute shrinkage and selection operator (LASSO) Cox regression with 10-fold cross validation were used to select independent features for OS. Two radiomics signatures (Signature AP and Signature PVP) were built as linear combinations of the selected features weighted by their corresponding coefficients from the LASSO Cox regression models.
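The redundancy filter and the linear-combination signature described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature names, the toy values and the coefficients are made up, the absolute value of the Pearson coefficient is used, and keeping the first feature of a correlated pair is an arbitrary choice that the text does not specify.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_redundant(features, cutoff=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= cutoff.

    `features` maps a feature name to its values over the training patients.
    Assumption: the first feature encountered in a correlated pair is kept.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

def signature(row, coefs):
    """Linear combination of selected feature values weighted by coefficients."""
    return sum(coefs[name] * row[name] for name in coefs)

# Toy, made-up training data: f2 is nearly a multiple of f1, f3 is unrelated.
train = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_redundant(train)

# Hypothetical coefficients standing in for the LASSO Cox output.
coefs = {"f1": 0.5, "f3": -0.2}
score = signature({"f1": 1.2, "f3": -0.4}, coefs)
```

In the study itself, the coefficients come from LASSO Cox regression fitted on the training cohort; the sketch only shows how a fitted signature is evaluated for one patient.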

Assessment of models
We then compared the performance of the models. The C-index, which is commonly used to evaluate the discriminative ability of prognostic models in survival analysis, was employed to assess the discrimination power of the models. For a useful model the C-index ranges from 0.5 (no discriminative ability) to 1.0 (perfect discrimination). The best predictive model was identified by comparing performance and was selected for subsequent use. In addition, the models were compared with the AJCC staging system using concordance probabilities.
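To make the C-index concrete, the sketch below computes Harrell's concordance index on a toy cohort. It assumes the common pairwise definition for right-censored data; the data are made up, and the study itself reports C-indexes computed in R, so this is an illustration of the measure rather than the authors' code.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has the shorter follow-up
    time and an observed event. The pair is concordant when patient i also
    has the higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy, made-up cohort: follow-up in months, event = 1 means death observed.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [2.1, 0.4, 1.0, 0.9, -0.3]

c = harrell_c_index(times, events, risks)  # 6 concordant of 8 comparable pairs
```

Here 8 pairs are comparable and 6 of them are ranked correctly, giving a C-index of 0.75.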

Development of nomogram and validation of the prediction model
To provide a more visual and individualized predictive model, we drew nomograms for the best predictive model and the clinical model. The discriminative accuracy of the models was further evaluated using the area under the curve (AUC) of the time-dependent receiver operating characteristic (ROC) curve. Calibration was assessed graphically by plotting the nomogram-predicted probabilities against the observed rates, using a bootstrap method with 1000 resamples. Decision curve analysis (DCA) was employed to evaluate the clinical usefulness of the models by calculating the net benefit over a range of threshold probabilities [30]. In addition, risk stratification using the constructed models was conducted. Kaplan-Meier survival curves were plotted, and log-rank tests were used to compare OS between the low-risk and high-risk groups. The performance of the models was then validated in an independent testing set by applying the formula generated from the training cohort.
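The net benefit used in decision curve analysis can be illustrated with a short sketch. It assumes a binary outcome at a fixed time horizon and ignores censoring, which is a simplification of DCA for survival data; the probabilities and outcomes are made up and the function names are hypothetical.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)
    return tp / n - fp / n * weight

def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the 'treat all' reference strategy."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy, made-up predicted event probabilities and observed outcomes.
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]

nb_model = net_benefit(probs, outcomes, 0.5)
nb_all = net_benefit_treat_all(outcomes, 0.5)
nb_none = 0.0  # 'treat none' has zero net benefit by definition
```

A decision curve repeats this calculation over a range of thresholds and plots the model against the "treat all" and "treat none" strategies.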

Sample size estimation
Because the available sample size was limited and there is no generally accepted method for estimating the sample size needed to develop radiomics-based pCCA risk prediction models, we ensured that the number of positive outcome events in our study was close to or larger than 10 times the number of covariates (predictors), with reference to the TRIPOD statement [26]. In the current study, there were 73 positive outcome events in the training set (110 patients in total), which meets this condition. There were 8, 2 and 3 independent predictors retained in the arterial-phase radiomics, portal-venous-phase radiomics and preoperative clinical OS prediction models, respectively.
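The events-per-variable check described above is simple arithmetic, shown here as a small sketch. The event count and predictor counts are taken from the text; the dictionary keys are illustrative labels only.

```python
# Number of outcome events in the training cohort, as stated in the text.
events_training = 73

# Number of predictors retained in each OS prediction model, as stated in the text.
predictors = {
    "arterial-phase radiomics": 8,
    "portal-venous-phase radiomics": 2,
    "preoperative clinical": 3,
}

# Events per variable (EPV) for each model.
epv = {name: events_training / k for name, k in predictors.items()}

for name, value in epv.items():
    print(f"{name}: EPV = {value:.2f}")
```

The arterial-phase model has the lowest ratio (73 / 8 = 9.125), which is why the text describes the condition as "close to or larger than" 10 events per variable.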

Statistical analysis
All statistical analyses were performed using Statistical Package for Social Sciences software (SPSS, version 25.0, IBM, Armonk, NY, USA) and R software (version 3.5.3, https://www.r-project.org). Continuous variables were presented as the mean and standard deviation (SD). The Mann-Whitney U-test or the t-test was used to compare continuous variables. Cut-off values were determined using X-tile software (version 3.6.1, Yale University, New Haven, CT) [31]. Categorical variables were compared between groups using the chi-squared (χ2) test or Fisher's exact test. The Schoenfeld residual test was used to test the proportional hazards (PH) assumption for the selected clinical features [32]. All C-indexes and HRs were reported with 95% CIs. A two-tailed P-value < 0.05 was considered statistically significant. The following R packages were mainly used: "glmnet" for the LASSO Cox regression algorithm; "survival" for survival analyses including the PH assumption test, Cox regression analysis, Kaplan-Meier analysis and decision curve analysis; and "tdROC" for time-dependent ROC analyses.

Clinical characteristics and OS
The patient selection workflow of the current study is shown in Fig. S1. A total of 184 patients who underwent radical resection for pCCA were enrolled (115 male, 69 female; mean age 59.2 ± 10.3 years). The dataset was randomly divided into the training cohort (110 patients) and the testing cohort (74 patients).
The detailed clinical characteristics and treatment parameters of patients with pCCA in the training and testing cohorts are presented in tabular form. Cut-off values of preoperative variables were determined by X-tile software and are listed in Table S1. Univariate analysis of clinical factors found that preoperative white blood cell count, neutrophil count, lymphocyte count, CEA, CA19-9, cN stage, and invasion of the hepatic artery and portal vein on imaging were potentially associated with OS. Multivariate Cox regression analysis identified CEA, cN stage and invasion of the hepatic artery on imaging as independent variables (Table S1). The Schoenfeld residual test showed that the selected variables, and the constructed model as a whole, satisfied the proportional hazards assumption (Fig. S2, P = 0.766).

Models building, comparison and validation
Signature AP, Signature PVP and the three independent clinical variables (preoperative CEA, cN stage and invasion of the hepatic artery on imaging) were used to build Cox regression models in various combinations. The C-indexes of these models for survival prediction in the training and testing cohorts are listed in Table 2. Nomograms were constructed for Model clinic (Fig. 2a) and Model clinic&AP&PVP (Fig. 2b). Furthermore, time-dependent ROC analysis was applied to validate Model clinic and Model clinic&AP&PVP in predicting OS, with satisfactory results (Fig. 3a-d).
The nomograms showed a clear improvement in predicting survival over the TNM staging system.
In the training cohort, Model clinic showed better prognostic capability than the TNM staging system (C-index: 0.846 vs. 0.616). In the testing cohort, its C-index was also higher than that of the TNM staging system (C-index: 0.755 vs. 0.599). In the training cohort, Model clinic&AP&PVP showed better prognostic capability than the TNM staging system (C-index: 0.962 vs. 0.616). In the testing cohort, better prognostic ability was also observed for this novel nomogram (C-index: 0.814 vs. 0.599). The time-dependent ROC curves showed that Model clinic (Fig. 3a and Fig. 3b) and Model clinic&AP&PVP (Fig. 3c and Fig. 3d) outperformed the TNM staging system (Fig. 3e and Fig. 3f) in both the training and testing cohorts in predicting short- and long-term survival.
The calibration curves were plotted for Model clinic (Fig. 4a-b) and Model clinic&AP&PVP (Fig. 4c-d). For both Model clinic and Model clinic&AP&PVP, good agreement was observed between the nomogram-estimated 1-, 3- and 5-year OS rates and the observed OS rates. Less satisfactory results were observed for the TNM staging system (Fig. 4e-f).

Subgroup survival analysis stratified by the nomograms
Using Model clinic, we calculated the risk score for each patient in the training and testing cohorts. The formula was as follows: risk score clinic = 0.7643512 × CEA + 0.6273337 × cN + 1.4823995 × invasion of hepatic artery in images. The optimal cutoff value of 0.627 was calculated with the R function "surv_cutpoint", which is an outcome-oriented method. Based on this cutoff value, the patients were assigned to low-risk and high-risk groups. In the training cohort, the risk score of the high-risk subgroup was 1.  Fig. 5b).
Similarly, the risk score for each patient was obtained with Model clinic&AP&PVP. The formula was as follows: risk score clinic&AP&PVP = 0.7137881 × Signature AP + 0.4683287 × Signature PVP + 0.9269602 × CEA + 0.4928044 × cN + 1.1506710 × invasion of hepatic artery in images. The optimal cutoff value was 0.198, which stratified the patients into low-risk and high-risk groups. In the training cohort, the risk score of the high-risk subgroup was 1.566 (range, 1.422 to 1.711), and that of the low-risk subgroup was −0.423 (range, −0.719 to −0.128). In the testing cohort, the risk scores of the high-risk and low-risk subgroups were 1.444 (range, 1.288 to 1.600) and −0.446 (range, −0.889 to −0.004), respectively. Patients in the high-risk subgroup of the training cohort showed a poorer survival rate than those in the low-risk subgroup in the subgroup survival analysis (1-  Fig. 5d).
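The two risk-score formulas and cutoffs above can be applied to a single patient as in the sketch below. The coefficients and cutoffs are those quoted in the text. Everything else is an assumption made for illustration: CEA and hepatic artery invasion are coded 0/1, cN is coded 0/1/2, a score strictly above the cutoff is labelled high-risk, and the example patient and signature values are made up.

```python
# Coefficients and cutoffs as quoted in the text.
CLINIC_COEFS = {"cea": 0.7643512, "cn": 0.6273337, "ha_invasion": 1.4823995}
CLINIC_CUTOFF = 0.627

COMBINED_COEFS = {
    "signature_ap": 0.7137881,
    "signature_pvp": 0.4683287,
    "cea": 0.9269602,
    "cn": 0.4928044,
    "ha_invasion": 1.1506710,
}
COMBINED_CUTOFF = 0.198

def risk_score(patient, coefs):
    """Weighted sum of predictor values (the linear predictor of the Cox model)."""
    return sum(coefs[name] * patient[name] for name in coefs)

def risk_group(score, cutoff):
    """Assumption: scores strictly above the cutoff are labelled high-risk."""
    return "high" if score > cutoff else "low"

# Hypothetical patient; variable coding is assumed, not taken from the paper.
patient = {
    "cea": 1,            # preoperative CEA above the X-tile cutoff (assumed 0/1)
    "cn": 0,             # cN0 (assumed coding 0/1/2)
    "ha_invasion": 0,    # no hepatic artery invasion on imaging (assumed 0/1)
    "signature_ap": -0.5,
    "signature_pvp": -0.2,
}

s_clinic = risk_score(patient, CLINIC_COEFS)
s_combined = risk_score(patient, COMBINED_COEFS)
print(risk_group(s_clinic, CLINIC_CUTOFF), risk_group(s_combined, COMBINED_CUTOFF))
```

Under this assumed coding, any single positive clinical predictor already places a patient above the 0.627 cutoff of the clinical model, since the smallest coefficient (0.6273337) exceeds it.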

Clinical Use
The decision curve analyses for Model clinic, Model clinic&AP&PVP and the TNM staging system are presented in Fig. S4a-d. According to DCA, when the threshold probability for a patient was within the range of 20-80%, the Model clinic nomogram added more net benefit than the "treat all" or "treat none" strategies in the training cohort in predicting both 3-year survival (Fig. S4a) and 5-year survival (Fig. S4b). Model clinic&AP&PVP had similar performance in the training cohort (Fig. S4a and Fig. S4b for 3-year and 5-year survival). In the testing cohort, Model clinic gave a good result in predicting 3-year survival (Fig. S4c) but only a fair result in predicting 5-year survival (Fig. S4d). Model clinic&AP&PVP had similar performance (Fig. S4c and Fig. S4d for 3-year and 5-year survival). Both Model clinic and Model clinic&AP&PVP had more favorable clinical utility than the TNM staging system.

Discussion
In recent years, encouraging progress has been made in several studies of cholangiocarcinoma.
A radiomics nomogram based on radiomics features and clinical characteristics was shown to preoperatively predict early recurrence of iCCA after resection [33]. Radiomics signatures based on ultrasound (US) images have been shown to predict the biological behavior of iCCA with moderate efficiency [34]. In a study by Ji et al. [35], a radiomics model was established for predicting LN metastasis of iCCA. Another study also confirmed the applicability of this method [36]. A radiomics model has also been proposed to predict the degree of differentiation and lymph node metastasis of extrahepatic cholangiocarcinoma [37]. To our knowledge, no study has investigated individual prediction of survival in pCCA using MRI features. In this study, several MRI-based prognostic models were derived and validated to predict OS after curative-intent resection of pCCA. The models were based on Signature AP, Signature PVP and three independent clinical variables (CEA, cN stage and invasion of the hepatic artery on imaging). Model clinic and Model clinic&AP&PVP, which were presented as nomograms, showed good discrimination. Calibration was good in both the training and testing cohorts. Both models outperformed the AJCC staging system and can be used to assist clinical decision-making, inform patients of their prognosis and stratify patients in randomized controlled trials.
The performance of the AJCC TNM staging system in predicting long-term outcome was poor (C-index < 0.7) in both the current training cohort (C-index, 0.616) and testing cohort (C-index, 0.599). Ruzzenente et al. [38] demonstrated that the 8th edition was superior to the 7th edition, with a C-index of 0.624. A study from South Korea confirmed this result with a C-index of 0.621 [39]. The C-index of the 8th edition was 0.67 in another study, from the Netherlands, of patients who underwent curative-intent resection [40]. Our results were similar to those in the literature, and Model clinic and Model clinic&AP&PVP had improved performance in predicting OS compared with the TNM staging system. cohort [42]. One study incorporated age, surgery, SEER historic stage and lymph node metastases to establish a nomogram, and the C-index for OS prediction was 0.651 [43]. However, a validation cohort was absent in that study. Groot Koerkamp et al. [44] proposed a nomogram based on data from the Memorial Sloan Kettering Cancer Center (MSKCC) for patients with resected pCCA. The concordance index was 0.73 in the primary cohort and 0.72 in the validation cohort. Buettner et al. [10] demonstrated that the nomogram proposed by MSKCC for pCCA performed poorly, with a C-index of 0.587.
In that study, a revised nomogram was constructed, but its performance was still unsatisfactory (C-index, 0.682). Most of these studies considered only postoperative clinicopathological characteristics. Our nomograms were derived from preoperative variables and had higher C-indexes. In clinical practice, when radiomics features cannot be obtained conveniently, the clinical model could act as an effective substitute.
The calibration curves of Model clinic and Model clinic&AP&PVP indicated that the constructed nomograms were reliable and robust predictive models. Few patients survive five years after surgery, so the 3-year survival rate is the most important factor affecting clinical decision-making. Therefore, although the decision curve analysis showed only a fair result in predicting 5-year survival in the testing cohort, the clinical utility should not be affected, and the proposed MRI-based models can be applied directly in clinical practice. The decision curve analyses also confirmed that our nomograms had better clinical usefulness than the 8th edition of the AJCC TNM staging system.
The current study has several limitations. Firstly, our study is limited by its retrospective nature, and selection bias may exist. Secondly, this study was performed in a single institution with a single scanner, which may impede generalization; further multinational and multi-institutional studies are required to confirm the results. Thirdly, regarding ROI segmentation, development of robust and reliable methods for auto-segmentation may be necessary in the future [45]. Fourthly, this study included only the CE-AP and CE-PVP sequences and did not include diffusion-weighted imaging (DWI), non-enhanced T1WI or T2WI.
The reason for this selection is that the tumor is generally small and difficult to delineate on non-enhanced images. Furthermore, the cohort of this study consisted of patients who had curative-intent resection, so the use of the models is limited to the operative setting. Finally, knowledge of the different molecular markers (gene and protein expression) in bile duct tumors is developing rapidly. Introducing those markers into the model may further increase its accuracy.

Conclusion
We constructed two nomograms with potential for easy day-to-day clinical use in individual patients. The nomograms provide tools to preoperatively predict 1-, 3

Figures

Figure 1
Workflow of the necessary steps in the current study. Tumors were segmented manually on axial hepatic arterial and portal venous phase MRI sections. Radiomic features were extracted from within the defined tumor contours on MRI images to quantify tumor intensity, shape, and texture. For feature selection, two successive steps were applied to the extracted features: first, inter- and intra-observer reliability assessment, and then univariate and multivariate LASSO-Cox regression. To provide a more understandable outcome measure, nomograms were built for individualized evaluation. The performance of the model was assessed, followed by decision curve analysis and survival prediction.

and testing cohort (d). The AJCC TNM staging system had unsatisfactory performance in the training cohort (e) and testing cohort (f).

Figure 4
Calibration curves of models to predict survival probability. The curves of Modelclinic showed good agreement between the predicted and the actual 1-, 3- and 5-year OS in the training cohort (a) and testing cohort (b). The curves of Modelclinic&AP&PVP showed favorable agreement between the predicted and the actual 1-, 3- and 5-year OS in the training cohort (c) and testing cohort (d). The curves of the AJCC TNM staging system showed fair agreement between the predicted and the actual 1-, 3- and 5-year OS in the training cohort (e) and testing cohort (f). Model-predicted OS probability was plotted on the x-axis. The actual OS rate was plotted on the y-axis. The closeness of the lines indicated absence of systematic bias.
All CIs lie over the 45° dotted line of perfect calibration in Modelclinic and Modelclinic&AP&PVP.