This study was approved by the institutional review board of our hospital (No. 2021QT339), and the requirement for individual informed consent was waived owing to its retrospective design. Between January 2017 and January 2021, patients with pathologically confirmed RC were screened for enrollment. The inclusion criteria were as follows: (1) preoperative CT examination within two weeks before surgery; (2) histological type of classical adenocarcinoma; (3) no preoperative chemotherapy or chemoradiotherapy; (4) no concomitant malignancy; and (5) available MSI testing. The flowchart of patient enrollment is shown in Fig. 1. Finally, 497 RC patients, comprising 96 with MSI status and 401 with MSS status, were retrospectively included in this study.
All patients underwent triphasic CT examinations, comprising unenhanced, arterial, and venous phases, on a 64-slice (127 patients) or 128-slice (370 patients) CT scanner (SOMATOM Definition AS, Siemens, Germany). Contrast medium (iomeperol 350) was injected at a dose of 1.3 mL/kg and a rate of 3.0 mL/s, and the arterial and venous phases were acquired 15 s and 30 s after the unenhanced phase, respectively. The uniform scanning parameters were as follows: tube voltage, 120 kV; tube current, 200 mA; field of view, 360 mm; rotation time, 0.75 s; collimation, 64 × 0.625 mm; and slice thickness and interval, 5 mm.
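Because the contrast dose is weight-based, the injected volume and injection duration follow directly from the protocol values above. The sketch below is illustrative only: `contrast_plan` is a hypothetical helper, the 70 kg weight is an example rather than study data, and it assumes a single fixed-rate bolus with no saline flush (the protocol does not mention one).

```python
# Hypothetical helper: derives contrast volume and injection time from the
# weight-based protocol described above (1.3 mL/kg iomeperol 350 at 3.0 mL/s).
DOSE_ML_PER_KG = 1.3
RATE_ML_PER_S = 3.0


def contrast_plan(weight_kg: float) -> tuple[float, float]:
    """Return (volume in mL, injection duration in s) for one fixed-rate bolus."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    volume_ml = DOSE_ML_PER_KG * weight_kg
    duration_s = volume_ml / RATE_ML_PER_S
    return volume_ml, duration_s


if __name__ == "__main__":
    volume, duration = contrast_plan(70.0)  # hypothetical 70 kg patient
    print(f"{volume:.1f} mL injected over {duration:.1f} s")  # 91.0 mL injected over 30.3 s
```

For a 70 kg patient this gives about 91 mL delivered over roughly 30 s.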
Evaluation of clinicopathological characteristics and MSI status
The clinicopathological characteristics of RC patients comprised age, gender, tumor diameter on CT, location, carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), lymph node metastasis ratio (LNR), perineural invasion (PNI), extramural venous invasion (EMVI), and history of smoking, drinking, diabetes, and hypertension. LNR was defined as the number of positive lymph nodes divided by the total number of examined lymph nodes × 100%. Tumor location was classified by distance from the anal margin as low-lying (within 5 cm), middle-lying (5 to 10 cm), or high-lying (more than 10 cm); tumors located at the rectosigmoid junction were classified as high-lying. The cutoff values for CEA and CA19-9 were 5.0 ng/mL and 37.0 U/mL, respectively. PNI and EMVI were defined as histopathological tumor invasion of perineural structures and of extramural veins, respectively. MSI status was assessed by immunohistochemical staining for the mismatch repair (MMR) proteins MLH1, MSH2, MSH6, and PMS2. Patients with loss of expression of one or more MMR proteins were assigned to the MSI group, and the remaining patients to the MSS group.
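The derived variables above reduce to simple rules. The following Python sketch is illustrative only: all function names are hypothetical, and the handling of the exact 5 cm and 10 cm boundaries and of marker values equal to the cutoffs are assumptions, since the text does not specify them.

```python
# Hypothetical helpers mirroring the variable definitions in this section.
CEA_CUTOFF_NG_ML = 5.0
CA199_CUTOFF_U_ML = 37.0


def lymph_node_ratio(positive_nodes: int, examined_nodes: int) -> float:
    """LNR (%) = positive lymph nodes / examined lymph nodes x 100."""
    if examined_nodes <= 0:
        raise ValueError("LNR is undefined when no lymph nodes were examined")
    return positive_nodes / examined_nodes * 100.0


def tumor_location(distance_cm: float, rectosigmoid_junction: bool = False) -> str:
    """Classify by distance from the anal margin (boundary handling is assumed)."""
    if rectosigmoid_junction or distance_cm > 10.0:
        return "high"
    if distance_cm > 5.0:
        return "middle"  # assumed: exactly 10 cm counts as middle-lying
    return "low"  # assumed: exactly 5 cm counts as "within 5 cm"


def is_elevated(value: float, cutoff: float) -> bool:
    """Assumed rule: a marker is abnormal only if it exceeds the cutoff."""
    return value > cutoff


def mmr_group(mlh1: bool, msh2: bool, msh6: bool, pms2: bool) -> str:
    """True means IHC expression is retained; loss of any protein gives the MSI group."""
    return "MSS" if all((mlh1, msh2, msh6, pms2)) else "MSI"


if __name__ == "__main__":
    print(lymph_node_ratio(3, 12))              # 25.0
    print(tumor_location(7.5))                  # middle
    print(is_elevated(6.2, CEA_CUTOFF_NG_ML))   # True
    print(mmr_group(True, True, False, True))   # MSI
```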
Tumor segmentation and radiomic feature selection
Tumor segmentation was performed in three steps. (1) Before segmentation, the DICOM images were automatically standardized in A.K. software (Artificial Intelligence Kit, GE Healthcare) by resampling to a voxel size of 1.0 × 1.0 × 1.0 mm and discretizing the gray levels into the range 1 to 32. (2) The tumoral volume of interest (VOI-t) was manually delineated in ITK-SNAP software (version 3.4.0, https://www.itksnap.org/) by two radiologists, each with approximately 10 years of diagnostic experience (Fig. 2a). (3) The peritumoral VOI (VOI-pt) was automatically generated in A.K. software by expanding the VOI-t outward by 5 mm (Fig. 2b). Intraluminal air and adjacent structures, including bone, bowel, prostate, and uterus, were excluded from the VOI-pt contours.
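A.K. software is proprietary, so its internal implementation is not reproduced here. As a rough open-source analogue, the sketch below uses NumPy and SciPy to resample a volume, discretize gray levels into 32 bins, and build a 5 mm peritumoral shell from a tumor mask. It assumes that the VOI-pt is a shell excluding the tumor itself, that discretization uses a fixed number of bins over the intensity range of the array supplied, and that an exclusion mask (air, bone, bowel, and so on) is already available; all function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(volume, spacing, new_spacing=1.0, order=1):
    """Resample a 3D array from `spacing` (mm per axis) to isotropic `new_spacing`.

    Use order=1 (linear) for CT intensities and order=0 (nearest) for masks.
    """
    factors = [s / new_spacing for s in spacing]
    return ndimage.zoom(volume, zoom=factors, order=order)


def discretize(volume, n_bins=32):
    """Map intensities linearly onto integer gray levels 1..n_bins (fixed bin number)."""
    lo, hi = float(volume.min()), float(volume.max())
    if hi == lo:
        return np.ones(volume.shape, dtype=np.int32)
    levels = np.floor((volume - lo) / (hi - lo) * n_bins).astype(np.int32) + 1
    return np.clip(levels, 1, n_bins)  # the maximum would otherwise map to n_bins + 1


def peritumoral_shell(tumor_mask, margin_mm=5.0, spacing=(1.0, 1.0, 1.0), exclude_mask=None):
    """Voxels outside the tumor lying within `margin_mm` of it, minus excluded tissue."""
    tumor = np.asarray(tumor_mask, dtype=bool)
    # Euclidean distance (mm) from every non-tumor voxel to the nearest tumor voxel;
    # tumor voxels themselves get distance 0.
    dist = ndimage.distance_transform_edt(~tumor, sampling=spacing)
    shell = (dist > 0) & (dist <= margin_mm)
    if exclude_mask is not None:
        shell &= ~np.asarray(exclude_mask, dtype=bool)  # e.g. air, bone, bowel
    return shell
```

In practice, the intensity range used for discretization is usually taken within the VOI rather than over the whole image, and masks should be resampled with nearest-neighbor interpolation so they stay binary.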
Radiomic feature selection was performed in four steps. (1) After segmentation of the VOI-t and VOI-pt, radiomic features were automatically extracted in A.K. software. (2) Inter-observer agreement between the features derived from the two radiologists' segmentations was assessed with the intraclass correlation coefficient (ICC); an ICC greater than 0.75 was considered to indicate good reliability. The CT phase yielding the largest number of features with an ICC greater than 0.75 was selected for further analysis. (3) The cohort was randomly divided into a training set and a validation set at a ratio of 7:3. (4) The dimensionality of the radiomic features was reduced by preprocessing, variance filtering, correlation analysis, and the least absolute shrinkage and selection operator (LASSO). Details are provided in the Supplementary Materials.
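The ICC screening and the 7:3 split in steps (2) and (3) can be sketched as follows. The text does not say which ICC form was used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss) and an unstratified random split. Function names and the seed are hypothetical.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_subjects, k_raters); here k = 2 radiologists and each
    row is one patient's value of a single radiomic feature.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between-subject
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between-rater
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def reproducible_features(rater1, rater2, threshold=0.75):
    """Indices of feature columns whose inter-observer ICC exceeds `threshold`."""
    r1, r2 = np.asarray(rater1, float), np.asarray(rater2, float)
    return [j for j in range(r1.shape[1])
            if icc_2_1(np.column_stack([r1[:, j], r2[:, j]])) > threshold]


def split_7_3(n_patients, seed=0):
    """Random (unstratified) 7:3 split of patient indices into training/validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patients)
    n_train = int(round(0.7 * n_patients))
    return order[:n_train], order[n_train:]
```

Running `reproducible_features` on each phase's (patients × features) matrices and comparing the counts mirrors the phase-selection rule in step (2).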
Radiomics-based machine learning
After feature selection, six machine learning algorithms, namely logistic regression (LR), Bayes, support vector machine (SVM), random forest (RF), k-nearest neighbor (KNN), and decision tree (DT), were used to construct radiomics models. The stability of each algorithm was quantified with 100 bootstrap replicates and the relative standard deviation (RSD) of the resulting AUC values: RSD = (standard deviation of the 100 AUC values of the algorithm) / (mean of the 100 AUC values) × 100%. A lower RSD indicates a more stable algorithm; therefore, the algorithm with the lowest RSD was selected for further analysis. Finally, a radiomics score (Rad-score) was calculated from the selected algorithm to quantify its prediction of MSI status in RC.
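The text does not state how each bootstrap AUC was obtained. The sketch below assumes that, in each replicate, a model is refit on a bootstrap resample of the training set and scored on the fixed validation set. It uses scikit-learn with mostly default hyperparameters, Gaussian naive Bayes standing in for "Bayes", synthetic data from `make_classification`, and the sample standard deviation (ddof = 1); these choices are assumptions for illustration, not the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Factories so that every bootstrap replicate gets a fresh, unfitted model.
MODELS = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "Bayes": lambda: GaussianNB(),
    "SVM": lambda: SVC(),
    "RF": lambda: RandomForestClassifier(n_estimators=50, random_state=0),  # fewer trees for speed
    "KNN": lambda: KNeighborsClassifier(),
    "DT": lambda: DecisionTreeClassifier(random_state=0),
}


def positive_scores(model, X):
    """Ranking scores for the positive class; AUC does not need calibrated probabilities."""
    if isinstance(model, SVC):
        return model.decision_function(X)
    return model.predict_proba(X)[:, 1]


def bootstrap_rsd(X_tr, y_tr, X_va, y_va, n_boot=100, seed=0):
    """Return {name: (mean AUC, RSD in %)} over `n_boot` bootstrap refits."""
    rng = np.random.default_rng(seed)
    results = {}
    for name, make in MODELS.items():
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_tr), size=len(y_tr))  # sample with replacement
            if len(np.unique(y_tr[idx])) < 2:
                continue  # skip degenerate resamples that contain one class only
            model = make().fit(X_tr[idx], y_tr[idx])
            aucs.append(roc_auc_score(y_va, positive_scores(model, X_va)))
        aucs = np.asarray(aucs)
        results[name] = (aucs.mean(), aucs.std(ddof=1) / aucs.mean() * 100.0)
    return results


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
    res = bootstrap_rsd(X_tr, y_tr, X_va, y_va, n_boot=100)
    for name, (mean_auc, rsd) in res.items():
        print(f"{name}: mean AUC = {mean_auc:.3f}, RSD = {rsd:.2f}%")
    print("Lowest RSD:", min(res, key=lambda k: res[k][1]))
```

Evaluating on a fixed validation set isolates the variability caused by resampling the training data; an out-of-bag evaluation would be an equally plausible reading of the text.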
Integrated analysis of radiomics and clinicopathological characteristics
Multivariable logistic regression with backward stepwise selection was used to combine the radiomic features with the significant clinicopathological characteristics into an integrative model. Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC) with its 95% confidence interval (95% CI) was calculated to evaluate model performance; AUCs were compared using the DeLong test. The Hosmer-Lemeshow test was used to evaluate the goodness of fit (calibration) of the model.
All statistical analyses for radiomic feature selection and machine learning were performed in R software (version 3.5.1, https://www.r-project.org/) and Python (version 3.5.6, https://www.python.org/). Clinicopathological characteristics were compared using the independent-samples t-test and Pearson's chi-square test in SPSS software (version 22, https://spss-64bits.en.softonic.com/). ROC analysis and the DeLong test were performed in MedCalc software (version 18.2, https://www.medcalc.org/). A two-tailed P value < 0.05 was considered statistically significant.
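MedCalc performs the DeLong test internally. For readers working in Python, the sketch below is one possible implementation of the nonparametric DeLong, DeLong and Clarke-Pearson (1988) comparison of two correlated AUCs measured on the same patients (for example, a radiomics model versus the integrative model). It uses the O(m × n) pairwise form for clarity rather than a faster rank-based algorithm; the function names are hypothetical and this is not MedCalc's code.

```python
import numpy as np
from scipy import stats


def _structural_components(pos, neg):
    """AUC plus DeLong placement values for one set of scores."""
    # psi = 1 if a positive outranks a negative, 0.5 for a tie, 0 otherwise
    psi = (pos[:, None] > neg[None, :]).astype(float) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    y = np.asarray(y_true).astype(bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    auc_a, v10_a, v01_a = _structural_components(a[y], a[~y])
    auc_b, v10_b, v01_b = _structural_components(b[y], b[~y])
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    # 2x2 covariance matrices of the placement values (ddof = 1)
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    cov = s10 / n_pos + s01 / n_neg
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    z = (auc_a - auc_b) / np.sqrt(var_diff)
    p_value = 2.0 * stats.norm.sf(abs(z))
    return auc_a, auc_b, z, p_value
```

A P value below 0.05 from `delong_test` would indicate that the two AUCs differ at the significance level used in this study. If both models produce identical rankings, the variance of the difference is zero and the statistic is undefined.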