This retrospective study was approved by the local institutional review board (Committee on Ethics of Biomedicine, Second Military Medical University), and the requirement for written informed consent was waived. Between March 2017 and September 2018, 182 consecutive patients with untreated rectal lesions identified at colonoscopy were enrolled. All patients underwent rectal MRI and postoperative pathological examination. Patients were excluded for the following reasons: chemotherapy or radiotherapy before or after MRI (n = 20), poor image quality (n = 6), and distant metastases (n = 4). Therefore, 152 patients were included in the final analysis.
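As an arithmetic check of the patient flow, the three exclusion counts account exactly for the difference between the enrolled and analyzed cohorts:

```latex
N_{\text{analyzed}} = 182 - (20 + 6 + 4) = 182 - 30 = 152
```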
Magnetic resonance imaging
All patients were scanned on a 3T MRI scanner (MAGNETOM Skyra, Siemens Healthcare, Erlangen, Germany) using an 18-channel pelvic phased-array coil. Each patient fasted for 4 h before the scan. Transverse high-resolution T2-weighted turbo spin echo images were acquired with the following parameters: TR/TE = 4000/108 ms, FOV = 180 × 180 mm², matrix = 320 × 320, slice thickness = 3 mm, gap = 0 mm, acceleration factor = 3, echo train length = 16, and acquisition time = 4 min 10 s. All patients underwent surgery 8.9 ± 5.8 days (range, 2–28 days) after the MRI examination.
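The nominal voxel size follows from the listed FOV, matrix, and slice thickness. This worked example assumes that the reconstructed matrix equals the 320 × 320 acquisition matrix (no interpolation or zero-filling), which the text does not state:

```latex
\Delta x = \Delta y = \frac{180\ \text{mm}}{320} = 0.5625\ \text{mm}, \qquad
V = 0.5625 \times 0.5625 \times 3\ \text{mm}^3 \approx 0.95\ \text{mm}^3
```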
The tissue sections were stained with hematoxylin and eosin. All mesorectal lymph nodes were retrieved from the surgical specimens, ensuring that at least 12 lymph nodes were collected per patient. The final histopathological reports detailed the tumor TN stage, histological grade, and circumferential resection margin (CRM). TN status was determined according to the American Joint Committee on Cancer (AJCC) staging system, eighth edition [14, 15]. For each pathological feature, patients were divided into two groups: histological grade, high-to-moderate versus poor differentiation; T stage, T1–2 versus T3–4; and N stage, N0 versus N1–2.
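The dichotomization above amounts to mapping each pathological label to a binary class. The following is a minimal sketch, assuming hypothetical label strings such as "moderate", "T4b", or "N1c"; the study does not describe its actual data format, and the function names are illustrative only.

```python
def binarize_grade(grade: str) -> int:
    """Histological grade: 0 = high-to-moderate, 1 = poor differentiation."""
    g = grade.strip().lower()
    if g in ("high", "moderate"):
        return 0
    if g == "poor":
        return 1
    raise ValueError(f"unknown histological grade: {grade!r}")


def binarize_t_stage(t_stage: str) -> int:
    """T stage: 0 = T1-2, 1 = T3-4 (sub-categories such as T4b are allowed)."""
    digit = int(t_stage.strip().upper().lstrip("T")[0])  # "T4b" -> 4
    if digit not in (1, 2, 3, 4):
        raise ValueError(f"unexpected T stage: {t_stage!r}")
    return 0 if digit <= 2 else 1


def binarize_n_stage(n_stage: str) -> int:
    """N stage: 0 = N0, 1 = N1-2 (sub-categories such as N1c are allowed)."""
    digit = int(n_stage.strip().upper().lstrip("N")[0])  # "N1c" -> 1
    if digit not in (0, 1, 2):
        raise ValueError(f"unexpected N stage: {n_stage!r}")
    return 0 if digit == 0 else 1


# Hypothetical patient record, for illustration only.
record = {"grade": "moderate", "t": "T3", "n": "N1b"}
labels = (
    binarize_grade(record["grade"]),
    binarize_t_stage(record["t"]),
    binarize_n_stage(record["n"]),
)
print(labels)  # (0, 1, 1)
```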
Radiomics features were extracted from the volumes of interest (VOIs) on the high-resolution T2-weighted images (T2WI), as confirmed by a radiologist with 8 years of experience, using a radiomics analysis platform [Radcloud, Huiying Medical Technology (Beijing, China) Co., Ltd.] (Fig. 1). A total of 1029 features, spanning multiple feature classes and filter classes, were automatically extracted by the platform, whose feature extraction is based on the “pyradiomics” package in Python (version 2.1.2, https://pyradiomics.readthedocs.io/).
(See Formula 1 in the Supplementary Files)
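To make the notion of a feature class concrete, the sketch below computes a few first-order (histogram) features from the voxel intensities inside a VOI. It follows the PyRadiomics documentation's definitions of Energy, Variance, and Entropy with fixed-bin-width discretization (bin width 25, the PyRadiomics default), but it is an illustrative plain-Python re-implementation, not the Radcloud platform's or PyRadiomics' own code; the flat list of VOI intensities is an assumed input.

```python
import math
from collections import Counter


def first_order_features(voxels, bin_width=25.0):
    """Compute a few first-order features from VOI voxel intensities.

    Definitions follow the PyRadiomics documentation (no intensity shift,
    fixed-bin-width discretization); this is an illustrative sketch only.
    """
    if not voxels:
        raise ValueError("the VOI contains no voxels")
    n = len(voxels)
    mean = sum(voxels) / n
    # Energy: sum of squared intensities.
    energy = sum(v * v for v in voxels)
    # Variance: mean squared deviation from the mean (population form).
    variance = sum((v - mean) ** 2 for v in voxels) / n
    # Entropy: discretize into fixed-width bins, then Shannon entropy in bits.
    # (PyRadiomics shifts bin indices so the lowest bin is 1; that offset
    # does not change the entropy.)
    counts = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"Mean": mean, "Energy": energy, "Variance": variance, "Entropy": entropy}


print(first_order_features([0, 10, 30, 40]))
# {'Mean': 20.0, 'Energy': 2600, 'Variance': 250.0, 'Entropy': 1.0}
```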
First, to ensure feature robustness, a test–retest analysis was performed, and only features with an intraclass correlation coefficient (ICC) greater than 0.6 were retained. The robust features were then further selected with the least absolute shrinkage and selection operator (LASSO) method to retain those most predictive of the classification target. In the LASSO method, leave-one-out cross-validation was used to select the optimal regularization parameter α, defined as the value that minimized the mean squared error averaged over all left-out patients. With the optimal α, features with nonzero LASSO coefficients were retained.
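The two selection steps can be sketched in plain Python. This sketch assumes the ICC form is ICC(2,1) (two-way random effects, absolute agreement, single measurement), which the text does not specify, and uses a from-scratch coordinate-descent LASSO minimizing (1/(2n))‖y − b − Xw‖² + α‖w‖₁ (the same objective scikit-learn's `Lasso` uses), with α chosen from a user-supplied grid by leave-one-out mean squared error. Function names, the data layout, and the α grid are hypothetical; it is not the study's implementation.

```python
import math


def icc_2_1(ratings):
    """ICC(2,1) for n subjects (rows) each measured k times (columns)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def robust_features(test, retest, threshold=0.6):
    """Indices of features whose test-retest ICC exceeds the threshold.

    test[f][i] and retest[f][i] hold feature f for patient i (hypothetical layout).
    """
    return [f for f, (a, b) in enumerate(zip(test, retest))
            if icc_2_1([list(pair) for pair in zip(a, b)]) > threshold]


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - b - Xw||^2 + alpha*||w||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centre X
    resid = [v - y_mean for v in y]  # residual of centred y with w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:  # constant feature: coefficient stays 0
                continue
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
    intercept = y_mean - sum(m * wj for m, wj in zip(x_mean, w))
    return w, intercept


def loo_mse(X, y, alpha):
    """Mean squared error averaged over every left-out patient."""
    errors = []
    for i in range(len(X)):
        w, b = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], alpha)
        pred = b + sum(wj * xj for wj, xj in zip(w, X[i]))
        errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)


def lasso_select(X, y, alphas):
    """Pick the alpha with the smallest LOO MSE; keep nonzero coefficients."""
    best = min(alphas, key=lambda a: loo_mse(X, y, a))
    w, _ = lasso_fit(X, y, best)
    return best, [j for j, wj in enumerate(w) if wj != 0.0]


# Hypothetical toy data: feature 0 drives y, feature 1 is constant.
X = [[float(i), 5.0] for i in range(6)]
y = [2.0 * i + 1.0 for i in range(6)]
print(lasso_select(X, y, [0.001, 0.01]))  # (0.001, [0])
```

Because leave-one-out refits the model n times per candidate α, its cost grows quickly with cohort size; a library routine such as scikit-learn's `LassoCV` with a `LeaveOneOut` splitter would normally be used instead.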
Prediction model analysis
Machine learning was implemented with the “scikit-learn” package in Python (version 0.21.3, https://scikit-learn.org/stable/). The dataset was randomly divided into a training set (70%) and a test set (30%). To reduce the effect of class imbalance in histological grade and N stage, the synthetic minority oversampling technique (SMOTE) was applied to the training set. Multilayer perceptron (MLP), logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbor (KNN) classifiers were trained with fivefold cross-validation to build the prediction models (the parameters of the six classifiers are shown in Table 5). The mean model from fivefold cross-validation was used as the final model, and its performance was evaluated on the independent test set. For the statistically significant pathological features, model performance was assessed using sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve (AUC). A P value < 0.05 was considered statistically significant.
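The data-handling and evaluation steps can be sketched without any library. The sketch below assumes a simple random 70/30 split of patient indices, implements the basic SMOTE idea (interpolating between a minority-class sample and one of its k nearest minority-class neighbours, with k = 5 by default as in imbalanced-learn's `SMOTE`) to be applied to the training set only, and computes sensitivity, specificity, and AUC (as the probability that a positive case scores higher than a negative one, ties counting one half). The function names are hypothetical; the paper does not state which SMOTE implementation or parameters it used, and the six scikit-learn classifiers are not reproduced here.

```python
import math
import random


def split_70_30(n, seed=0):
    """Randomly split patient indices into a 70% training and 30% test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (apply to the training set only)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
print(sensitivity_specificity(y_true, y_pred), auc(y_true, scores))
# (0.5, 0.5) 0.75
```

In practice, library implementations such as imbalanced-learn's `SMOTE` (via `fit_resample`) and scikit-learn's `roc_auc_score` would normally be used; the point here is only to show what each step computes. Oversampling only the training set keeps synthetic samples out of the independent test set.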