Preoperative Differentiation of Pancreatic Cystic Neoplasm Subtypes on CT-Based Radiomics: SCN, MCN And IPMN

Background: Serous cystic neoplasm (SCN), mucinous cystic neoplasm (MCN), and intraductal papillary mucinous neoplasm (IPMN) account for a large proportion of pancreatic cystic neoplasms (PCNs). Appropriate clinical management of MCN and IPMN is essential for the early detection of pancreatic cancer and for improving the 5-year survival rate. However, the differential diagnosis of PCN before treatment remains a major challenge for surgeons. A reliable diagnostic tool is therefore urgently needed to improve diagnostic precision. [...] radiomics model (AUC = 0.850), with a large net benefit in the decision curve analysis. The radiomics-based nomogram provided the correct predicted probability for the diagnosis of PCN. Conclusion: The proposed radiomics models combining clinical-radiologic parameters and radiomics features helped to accurately differentiate SCN, MCN, and IPMN, advancing personalized medicine.


Introduction
Pancreatic cancer is a highly aggressive malignant gastrointestinal tumor with a 5-year survival rate of about 10%, the lowest among all cancers [1,2]. It is projected to become the second leading cause of cancer-associated death in the USA within the next few decades [3]. However, by the time apparent clinical symptoms are present, almost all patients are diagnosed at an advanced stage [2,4]. Early detection of pancreatic cancer is widely recognized to benefit patients through longer median survival times [5,6]. Among pancreatic cystic neoplasms (PCNs), mucinous cystic neoplasm (MCN) and intraductal papillary mucinous neoplasm (IPMN) are important precursor lesions that offer an opportunity for screening and early diagnosis of pancreatic cancer [7,8]. Consequently, the correct diagnosis of PCNs before treatment is essential for clinical decision support.
Pancreatic cystic neoplasms are a heterogeneous group of cystic lesions consisting mainly of serous cystic neoplasm (SCN), mucinous cystic neoplasm (MCN), intraductal papillary mucinous neoplasm (IPMN), solid pseudopapillary neoplasm (SPN), and cystic pancreatic neuroendocrine tumor (PNET), of which IPMN, SCN, and MCN are the major types [9,10]. Because the malignant potential of these subtypes varies, the ability to differentiate SCN, MCN, and IPMN before treatment is necessary to improve management and clinical outcomes [11]. Patients with IPMN or MCN, which carry a high malignant potential, mostly require either surgical resection or lifelong follow-up for surveillance of disease progression [12,13]. Conversely, SCN is considered a benign lesion; surgery and follow-up are not recommended unless specific symptoms exist [14].
The reported prevalence of pancreatic cystic neoplasms is increasing markedly with the widespread recommendation of annual physical examinations [15]. At present, most PCNs are detected incidentally on abdominal computed tomography (CT) examinations, and these CT scans therefore provide a readily available imaging dataset for diagnostic studies [16]. Pancreatic protocol CT and gadolinium-enhanced MRI with MRCP are the main imaging modalities for identifying PCN subtypes, and endoscopic ultrasonography (EUS) is recognized as a complementary means of obtaining a more precise diagnosis [17,18]. Despite promising advances in imaging techniques, distinguishing between the various PCN types remains a great clinical challenge for practicing physicians [19]. Consequently, misdiagnosis of PCN patients is commonly reported, even in renowned hospitals in large cities [20].
Rapid advances in medical image analysis have given rise to the field of radiomics, the high-throughput extraction of numerous quantitative features from medical images [21,22]. As a non-invasive approach, radiomics has great potential to assess the extensive inter- and intratumoural heterogeneity hidden in radiologic images [23]. Radiomics is becoming an important adjunct to clinical decision making in both diagnosis and prognosis [24,25]. Previous studies suggested that the use of radiomics in pancreatic lesions has significantly improved diagnosis and clinical management [26-28]. Moreover, some recent studies indicate that radiomics holds great promise for the differential diagnosis of PCN subtypes [29-31]. Nevertheless, there is still an urgent need to improve the diagnostic performance of radiomics-based models.
Our study aims to construct an automated model for the precise diagnosis of PCNs. Radiomics models that combine clinical-radiologic parameters with radiomics features are expected to improve diagnostic accuracy and help physicians make optimized treatment decisions, with the goal of reducing the morbidity and mortality of pancreatic cancer.

Patients
This retrospective study was approved by the ethics committee of the Institutional Review Board of Nanjing Drum Tower Hospital, and the requirement for informed consent was waived.
Between February 2016 and December 2020, 302 consecutive patients with PCNs who underwent curative resection at our center were retrospectively evaluated (Fig. 1).
The inclusion criteria were as follows: (a) a postoperative pathological diagnosis of MCN, SCN, or IPMN; (b) an identifiable cystic lesion on contrast-enhanced CT images obtained within three weeks before curative resection; (c) no systemic or focal therapy before CT examination and surgery; (d) available clinicopathological parameters.
The exclusion criteria were as follows: (a) concurrent hepatic-biliary-pancreatic malignant tumors (n = 27); (b) artifacts on CT images in the arterial or venous phase (n = 9); (c) difficulty in outlining the tumor margin because of small size (maximum diameter < 1 cm) and poor lesion contrast (n = 10).
Eventually, a total of 143 patients (median age, 53 years; interquartile range, 43-66 years; 77 females, 66 males) were enrolled in the study cohort. The patients were randomly divided at a ratio of 7:3 into a development cohort and a test cohort. The development cohort consisted of 102 patients (MCN 35, SCN 33, IPMN 34) and the test cohort consisted of 41 patients (MCN 10, SCN 14, IPMN 17).
Preoperative clinical data and postoperative pathological information of those patients were obtained from the electronic medical record system.

CT Technique
All CT examinations were performed with a multidetector spiral CT scanner (LightSpeed, VCT, or Discovery HD750, GE Healthcare, US). The parameters of the contrast-enhanced CT scans were as follows: tube voltage, 120 kVp; tube current, 250-300 mAs; slice thickness, 1.25 mm; slice interval, 1.25 mm; matrix, 512 × 512; rotation time, 0.6 s; helical pitch, 1.375; field of view, 35-40 cm. All patients were required to fast for at least 6-8 hours before the CT scan and were placed in the supine position with their arms raised during scanning. The scan range extended from the dome of the diaphragm to the lower pole of the kidney. After the unenhanced CT scan, an iodinated contrast agent (Omnipaque 350 mgI/mL, GE Healthcare, US) was injected intravenously at 1.5 mL/kg body weight with a high-pressure syringe (Medrad Stellant CT injector system; One Medrad Drive Indianola, PA, US). The pancreatic phase and portal venous phase were obtained at 40-50 seconds and 65-70 seconds, respectively, after injection of the contrast agent. The mean interval between the CT scan and surgery was 8.5 ± 3.6 days (range, 7-20 days).

CT Images Acquisition And Segmentation
The radiomics workflow is shown in Figure 2. Contrast-enhanced CT images of PCNs with 1.25 mm slice thickness (DICOM format) were obtained from the Picture Archiving and Communication System (PACS) of our hospital. The pancreatic phase showed a distinct difference in contrast between normal pancreatic tissue and the pancreatic cystic tumor, and was therefore selected for the analysis of tumor heterogeneity and its correlation with clinical parameters and imaging features. The regions of interest (ROIs) covering the whole tumor were automatically segmented with a deep-learning segmentation algorithm. The segmentations were then corrected on each section by two radiologists (Z.Y.F. and W.J., with 4 and 10 years of experience in abdominal imaging, respectively) using ITK-SNAP (version 3.8.0, https://www.itksnap.org/), avoiding the pancreatic parenchyma, large vessels around the tumor, and adjacent abdominal organs. Both radiologists were blinded to the clinicopathological information of all patients during segmentation. The radiologists independently performed the segmentation twice within 2 weeks to evaluate the inter- and intra-observer reproducibility and thus the stability of each radiomic feature extracted from the ROIs.

Radiomic Feature Extraction And Selection
Image pre-processing and radiomic feature extraction were implemented with the open-source PyRadiomics package (version 3.0.1, https://github.com/Radiomics/pyradiomics). Resampling and normalization were performed to improve the repeatability, stability, and accuracy of feature extraction. The CT images were resampled to a voxel size of 1 × 1 × 1 mm³, and voxel intensities were discretized into 64 gray levels to reduce the effect of spatial intensity non-homogeneity. A total of 1218 radiomics features were extracted from each three-dimensional segmentation in the pancreatic phase, consisting of first-order statistics (n = 19), shape descriptors (3D, n = 16), texture features (GLCM, GLSZM, GLRLM, NGTDM, GLDM; n = 75), wavelet decompositions (n = 688), and LoG-based features (n = 430). Subsequently, all feature values were centered and standardized as z-scores for all analyses.
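The gray-level discretization and z-score standardization described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the PyRadiomics implementation: the function names `discretize_fixed_bin_count` and `zscore` are hypothetical, the toy values are invented, and "64 gray levels" is assumed to mean a fixed bin count over the intensity range of the ROI.

```python
import math
from statistics import mean, pstdev


def discretize_fixed_bin_count(intensities, n_bins=64):
    """Map raw intensities to integer gray levels 1..n_bins (fixed bin count)."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A constant region carries no texture; put every voxel in bin 1.
        return [1 for _ in intensities]
    levels = []
    for x in intensities:
        level = math.floor(n_bins * (x - lo) / (hi - lo)) + 1
        levels.append(min(level, n_bins))  # the maximum falls into the last bin
    return levels


def zscore(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy intensity values for a handful of voxels (illustrative only).
voxels = [-20.0, 0.0, 15.5, 40.0, 108.0]
gray_levels = discretize_fixed_bin_count(voxels, n_bins=64)

# Toy values of one feature across four patients (illustrative only).
feature_column = [2.0, 4.0, 4.0, 6.0]
standardized = zscore(feature_column)
```

In this sketch the standardization uses the population standard deviation; a sample standard deviation would be an equally plausible reading of the text.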
Feature selection followed three steps to identify robust features. Before screening, all data were normalized to the range [0,1] to avoid the impact of large discrepancies between feature values. In the first step, radiomic features with poor stability (intraclass correlation coefficient lower than 0.75 in the assessment of test-retest and interobserver agreement) were excluded from the dataset. Subsequently, the Pearson correlation test was used to filter out redundant features with an average absolute correlation coefficient higher than 0.8. Finally, the Boruta algorithm, built on the Random Forest (RF) classifier, was applied to select the important features that were highly related to the differential diagnosis of PCNs and to establish the radiomic signature.
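The normalization to [0,1] and the correlation-based redundancy filter can be sketched as follows. This is a minimal sketch under assumptions, not the authors' code: the helper names and the toy feature table are hypothetical, and the rule used to decide which member of a highly correlated pair is dropped (the one with the larger mean absolute correlation to the other features) is an assumed interpretation of "average and absolute coefficient".

```python
import math


def min_max_scale(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_redundant(features, cutoff=0.8):
    """Return the feature names kept after removing one of each pair with |r| > cutoff."""
    names = list(features)
    if len(names) < 2:
        return names
    r = {a: {b: abs(pearson(features[a], features[b])) for b in names} for a in names}
    # Mean absolute correlation of each feature with all other features.
    mean_abs = {a: sum(r[a][b] for b in names if b != a) / (len(names) - 1) for a in names}
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if r[a][b] > cutoff:
                # Assumed rule: drop the feature that is more correlated overall.
                dropped.add(a if mean_abs[a] >= mean_abs[b] else b)
    return [name for name in names if name not in dropped]


# Hypothetical feature table: one list of values per feature, one value per patient.
raw = {
    "mean_intensity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "median_intensity": [1.1, 2.0, 3.2, 3.9, 5.1],  # nearly duplicates the mean
    "contrast": [2.0, 1.0, 4.0, 1.5, 3.0],
}
scaled = {name: min_max_scale(values) for name, values in raw.items()}
kept = drop_redundant(scaled, cutoff=0.8)
```

Because the Pearson coefficient is invariant to linear rescaling, the min-max step does not change which features are flagged; it only puts the values on a common scale for the later steps.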

Model Development And Validation
We developed two types of preoperative radiomics models with the RF classifier: a multi-class prediction model to discriminate the three pancreatic cystic neoplasm subtypes, and binary-class prediction models for pairwise classification among MCN, SCN, and IPMN. These radiomics models included the radiomic signature and the significant preoperative clinical-radiologic characteristics. An overview of the RF modeling process is shown in Figure 3. Clinical-radiologic parameters with a P value lower than 0.1 in univariate analysis were entered into multivariable analysis. The final risk factors among the clinical and imaging features were selected using multivariable logistic regression with stepwise backward elimination of nonsignificant features. All models were validated in an independent test cohort.

Statistical Analysis
The Shapiro-Wilk test and Levene test were used to assess normality and homogeneity of variances, respectively. Continuous variables were expressed as median (interquartile range), and categorical variables as counts and frequencies. Univariate analysis of clinical-radiologic parameters used the chi-square test, corrected chi-square test, or Fisher test for categorical variables and the independent-sample t-test or Mann-Whitney U test for continuous variables. Multivariate logistic regression analysis was applied to select the important clinical predictors. The radiomics signature was built from the radiomics features selected with the "Boruta" package in R. Receiver operating characteristic (ROC) analysis was used to evaluate the predictive efficiency of the models, and the area under the ROC curve (AUC), sensitivity, and specificity were obtained to assess model performance. Decision curve analysis was performed to evaluate the net benefits of the prediction models. Statistical analysis was conducted in R (version 4.1.0, http://www.r-project.org). A two-tailed P value less than 0.05 was considered statistically significant.
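For readers less familiar with these performance measures, the AUC, sensitivity, and specificity of a binary model can be computed as in the sketch below. The analysis in this study was done in R; this Python version is only an illustration, the function names are hypothetical, and the labels and scores are invented toy data. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: 1 = subtype of interest, 0 = the other subtype.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.7, 0.6]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
```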

Patients Characteristics of the Study Cohorts
No significant differences in preoperative clinical-radiologic characteristics were found between the development and test cohorts, as shown in Table 1.

Univariate And Multivariate Analysis Of Clinical-radiologic Parameters
Table 1 shows the baseline characteristics that were retrospectively collected and analyzed because they were considered clinically relevant to the diagnosis. Univariate logistic regression analysis of the clinical data and radiological features indicated that age, sex, abdominal symptoms, serum tumor markers [alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA)], serum alanine aminotransferase (ALT), serum aspartate aminotransferase (AST), tumor diameter, calcification, bile duct dilatation, and lesion location differed significantly (P < 0.1) among the three subtypes in the development cohort. These variables were then entered into multivariate logistic regression to identify the risk factors for the diagnosis of PCNs. The full results of the multivariate analysis are shown in Table 2. Age (P = 0.001, 0.005, 0.041), sex (P = 0.005, 0.041, 0.017), and tumor diameter (P < 0.001, P = 0.020, 0.021) were independent risk factors for the differential diagnosis between MCN and SCN, MCN and IPMN, and SCN and IPMN, respectively.
The Boruta algorithm selected 13 features, which were used to construct a radiomics signature by RF analysis. The important features selected by the Boruta algorithm are presented in Figure 4. The radiomics signature demonstrated good prediction ability, with an out-of-bag (OOB) error of 0.317 and a C index of 0.772 in the test cohort; its diagnostic performance is summarized in Table 3.

Prediction Models Development And Validation
The radiomics-based models were built in the development cohort from the radiomics signature and the three clinical-radiologic parameters that were significant in multivariable analysis.
For the multi-class prediction model, the classification error (out-of-bag estimate) stabilized at a minimum of 19.61% when the number of trees exceeded 500 and three variables were tried at each split. Figure 5 illustrates the relationship between the error rate and the number of trees during construction of the multi-class model. In the development dataset, the multi-class radiomics model had an overall accuracy of 0.804 and a precision of 0.800, 0.727, and 0.929 for SCN, MCN, and IPMN, respectively (Table 4). In the test dataset, the overall accuracy for classifying the three tumor types was 0.707, and the precision for SCN, MCN, and IPMN was 0.750, 0.667, and 0.722, respectively. [...] (Table 5A). For the MCN-IPMN model, the precision was 0.917 and 0.939 in the development cohort (Table 5B). Meanwhile, the precision of the SCN-IPMN model was 0.853 and 0.879 in the development cohort (Table 5C).
All binary-class prediction models showed higher overall accuracy and F1-score than the multi-class prediction model in both the development and test cohorts. In particular, the model for classifying MCN and IPMN yielded favorable predictive performance. In ROC analysis of the test cohort, the multi-class radiomics model integrating radiomic and clinical-radiologic features improved diagnostic accuracy compared with the radiomics signature alone (Figure 6), with AUC values of 0.850 and 0.772, respectively. However, the binary-class radiomics models showed the best discriminatory ability, with AUC values of 0.914 for SCN versus MCN, 0.863 for SCN versus IPMN, and 0.926 for MCN versus IPMN in the test cohort.
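The overall accuracy, per-class precision, and F1-score used to compare the models all derive from a confusion matrix. The sketch below shows the arithmetic for a three-class case. It is illustrative only: the function name is hypothetical and the confusion matrix is invented toy data, not the study's results.

```python
def classification_report(matrix, labels):
    """Compute overall accuracy plus per-class precision, recall, and F1.

    matrix[i][j] counts cases whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    k = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    report = {"accuracy": correct / total}
    for j, name in enumerate(labels):
        predicted = sum(matrix[i][j] for i in range(k))  # column sum
        actual = sum(matrix[j])                          # row sum
        tp = matrix[j][j]
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Invented toy confusion matrix (rows: true subtype, columns: predicted subtype).
labels = ["SCN", "MCN", "IPMN"]
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
report = classification_report(matrix, labels)
```

Precision answers "of the cases predicted as a subtype, how many truly were that subtype", which is why it is reported per class, whereas accuracy is a single figure over all cases.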
The calibration curves demonstrated that the model-predicted subtype agreed well with the pathologically confirmed subtype in the binary-class radiomics models (Figure 7). In decision curve analysis, the three binary-class prediction models displayed a large net benefit across a suitable range of threshold probabilities in the test dataset (Figure 7). As shown in Figure 8, nomograms were constructed to visualize the binary-class radiomics models and to provide the predicted probability of each tumor subtype for individual patients.
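The net benefit plotted in a decision curve is commonly defined, for a threshold probability p_t, as TP/n − (FP/n) × p_t/(1 − p_t), and is compared against a "treat all" strategy. A minimal sketch of that standard definition follows. It assumes this is the formulation used in the analysis; the function names are hypothetical and the labels and probabilities are invented toy data.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of acting on predictions with probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Invented toy data: 1 = subtype of interest, 0 = the other subtype.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.7, 0.6]

# One point of the decision curve per threshold probability.
curve = {t: net_benefit(labels, probabilities, t) for t in (0.25, 0.5)}
reference = {t: net_benefit_treat_all(labels, t) for t in (0.25, 0.5)}
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none strategy).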

Discussion
This study retrospectively enrolled patients with pathologically confirmed PCNs of three subtypes in order to predict the precise histological type preoperatively. The RF classifier was the optimal algorithm among the multiple radiomics methods evaluated and was used to construct all the models. Ultimately, all the classification models were composed of the three significant clinical-radiologic characteristics (age, sex, tumor diameter) and a radiomics signature of thirteen radiomics features derived in the development cohort.
Compared with previous studies, our study had a distinct advantage in the image segmentation method. The ROIs of the cystic neoplasms were obtained by automatic segmentation with a new deep-learning network. Explicitly introducing the geometric information of the targets into the deep-learning network helped to obtain a better segmentation boundary [32]. The proposed segmentation method led to significant improvements in the stability and robustness of the radiomic features.
Two types of classification models were developed to diagnose MCN, SCN, and IPMN more accurately. The binary-class prediction models displayed a higher overall accuracy than the multi-class prediction model in both the development and test datasets, and they also achieved higher AUC values in the test cohort. Among the three binary-class classification models, the MCN-IPMN model provided the highest AUC (0.926) and presented an excellent net benefit in decision curve analysis in the test cohort. Visualized as nomograms, the radiomics models showed potential for the differential diagnosis of pancreatic cystic neoplasms. Currently, the treatment protocols for MCN, SCN, and IPMN differ significantly with respect to surgery and surveillance. According to current evidence and guidelines, resection is recommended for MCN larger than 40 mm or in the presence of symptoms and risk factors; otherwise, follow-up should be performed to monitor disease progression [12]. Patients with IPMN are recommended to undergo surgical resection and to receive lifelong postoperative surveillance for evidence of IPMN recurrence or the development of IPMN-associated aggressive malignancy [34]. Once the diagnosis of SCN is clear, surgery is recommended only for patients with symptoms caused by compression of adjacent organs; for asymptomatic SCN, follow-up after one year of surveillance is continued based on the presence of clinical symptoms. Improving diagnostic accuracy is urgently needed to make personalized and precision medicine a reality for every patient.
Our research has some limitations. First, all data were from a single medical center and the number of included PCN patients was relatively limited, although internal validation was applied to enhance the reliability of the radiomics models. Second, the study was a retrospective analysis, which may have resulted in selection bias. Third, all of the contrast-enhanced CT images were acquired with the same CT scanner and fixed parameters. Therefore, the reproducibility and stability of the radiomics features extracted from CT images require further validation on multiple brands of CT scanners. Finally, the dataset was limited to the three most common pancreatic cystic neoplasms. In the future, other cystic subtypes will be included in the dataset.

Conclusion
In conclusion, the radiomics model combining clinical-radiologic characteristics and radiomics features based on contrast-enhanced CT images demonstrates promising diagnostic performance and discrimination ability for the subtypes of pancreatic cystic neoplasms. The binary-class prediction models are superior to the multi-class prediction model in overall diagnostic accuracy. Further studies are required to improve the clinical utility of the radiomics models and to help surgeons provide precise diagnostic information for PCN patients.

Figure 3
The overview of the Random Forest modeling process. OOB, out-of-bag.

Figure 4
The importance of radiomics features in feature selection with the Boruta algorithm. The thirteen features with the highest importance rank scores were retained.