Two datasets were collected to investigate the effects of delineation of VOIs on radiomics analysis. The first problem is to distinguish whether metastasis occurs in patients with NPC before radiotherapy. In clinical practice, the majority of NPC patients with metastasis before radiotherapy suffer from poor prognosis [21-23]. Hence, it will be beneficial to improve prognosis if the risk of transfer before treatment can be distinguished accurately and take timely intervention on patients with high risk of metastasis. Our study retrospectively recruited 238 patients with NPC who had been diagnosed by histopathology between August 2009 and January 2013. All patients were divided into two groups in accordance with the metastasis status: (i) metastasizing (TM) group with 126 patients; (ii) non-metastasizing (NM) group with 112 patients.
The second problem is the prediction of sentinel lymph node (SLN) metastasis in patients with breast cancer, as described in [24]. It is of great significance using radiomics analysis to predict SLN metastasis for treatment decision making in breast cancer. A total of 146 consecutive patients with histologically confirmed breast cancer between March 2014 and June 2016 were retrospectively enrolled in this work. The patients consisted of two groups on the basis of SLN metastasis: (i) TM group with 55 patients; (ii) NM group with 91 patients. The inclusion criteria are available in Supplementary note 1.
The patients were divided into a strictly training cohort for radiomics model building and an independent validation cohort (25% in NPC group and 33% in SLN group) for evaluating the final prediction performance. The detailed demographic characteristics and clinical information are summarized in Table 1.
Image acquisition protocol
All analyses were carried out in accordance with the relevant guidelines and regulations, and the requirement to obtain informed consent was waived. This retrospective study was approved by the local institutional review board.
a) NPC group: All patients had scanned axial contrast- enhanced T1-weighted (CET1-w) and T2-weighted (T2-w) images acquired from a 1.5-T GE scanner (Signa EXCITE HD, TwinSpeed, GE Healthcare, Milwaukee, WI, USA) and a 1.5-T Philips scanner (Achieva, Philips Healthcare, The Netherlands). The GE MRI acquisition parameters were as follows: Contrast-enhanced T1-weighted images (TR/TE: 410/Min Full ms, FOV = 230 × 230 mm2, NEX = 2.0, slice thickness = 4 mm, spacing = 1 mm); T2-weighted images (TR/TE: 5000/85 ms, FOV = 230 × 230 mm2, NEX = 2.0, slice thickness = 4 mm, spacing = 1 mm). The Philips MRI acquisition parameters were as follows: Contrast- enhanced T1-weighted images (TR/TE: 636/20 ms, FOV = 220× 220 mm2, NEX = 4.0, slice thickness = 4.5 mm, spacing = 1 mm); T2-weighted images (TR/TE: 3700/100 ms, FOV = 220 × 220 mm2, NEX = 3.0, slice thickness =5 mm, spacing = 1 mm).
b) SLN group: All patients underwent pretreatment T2-weighted fat suppression (T2-FS) and diffusion-weighted images (DWI) scan. The anatomical MRI data were acquired on a 1.5-T MR scanner (Achieva, Philips Healthcare, Best, The Netherlands) equipped with a 4-channel SENSE breast coil in prone position. Axial DWI with bilateral breast coverage were obtained (TR/TE = 5065/66 ms, FOV = 300 × 300 mm2, matrix = 200 × 196, slice thickness = 5 mm, slice gap = 1mm, b values of 0 and 1000 s/mm2) by using single-shot spin-echo echo-planar imaging (EPI). T2-FS images of breast were collected (TR/TE = 3400/90 ms, FOV = 320 × 260 mm2, matrix = 348 × 299, slice thickness = 3 mm, slice gap = 0.3 mm).
As for the subjects in two datasets enrolled in our study, multi-sequence MR images are required from several MR scanners with different protocols, hence image standardization are essential for all images to avoid the inhomogeneity. Prior to analyzing MR images, additional image standardization involving bias field correction and intensity normalization were conducted to avoid inhomogeneity. First, the N4ITK algorithm [25] was applied to remove the bias field artifacts in the MR images. Subsequently, intensity normalization [26] was utilized to reduce the variability across image acquisitions from different manufactures. Fig. 1 illustrates the schematic framework of the radiomics analysis in this work.
Volume of interest segmentation
All MR images were imported into the ITK-SNAP software designed by Yushkevich et al. [27] to define the VOI of each tumor. In view of patient examination and medical history, the tumor contours were manually outlined slice-by-slice by a radiologist (with ten years of experience) and verified by a senior radiologist (with twelve years of experience).
On the basis of the original segmented regions, erosion, dilation, and smoothing were performed on the VOIs slice-by-slice to generate diverse VOIs. For the dilation operation, the radius sizes (number of pixels) of the circular structural element to dilate the VOIs were separately set as 3, 5, and 7. Given that certain tumors were extremely small, the size for the erosion operation was only set to 3. Image smoothing for VOIs was implemented by a Gaussian smoothing filter configured with correlation operator, where sigma was set as 3 and the template size was 7×7. Pixel values outside the bounds of the region of interest were set to the value of the nearest border. The three types of operations were respectively implemented using the functions imdilate, imerode, and imfilter of MATLAB version 8.5 (MathWorks, R2015a). No additional processing was implemented on the contours. For the sake of analysis, five operations on VOIs were abbreviated as Erosion, Smoothing, Dilation, Dilation5, Dilation7, respectively. The VOIs accurately delineated by radiologists were denoted as Baseline. Fig. 2 exemplifies the VOI in a single slice of the original tumor and presents the corresponding drawing of partial enlargement under different operations simultaneously. The degree of tumor volume change of diverse delineations in relation to the accurate delineation is summarized in Table 2.
A total of 2068 radiomics features were extracted for each VOI. In reference to [12], four non-texture features that describe the geometric characteristics were calculated, including tumor volume, size, solidity, and eccentricity. In view of the effects of varying extraction parameters on texture features, three extraction parameters, respectively, isotropic voxel size, quantization of gray levels, and quantization algorithm, were adopted, thereby leading to 2064 textural features for each patient. The textural features consisted of Global (extracted from the intensity histogram with 100 bins of the tumor region), grey-level co-occurrence matrix (GLCM), grey-level run length matrix (GLRLM), grey-level size zone matrix (GLSZM) and neighbourhood grey-tone difference matrix (NGTDM) [28-30].The detailed extraction parameters and description are available in Supplementary note 2 and Table S1.
The feature selection was performed within the training cohort. Maximum Relevance Minimum Redundancy (mRMR) [31], which has good trade-off between the maximum relevance and minimum redundancy, was firstly explored to identify a well-ranked feature set that included 100 features. Referring to [12, 32], the 0.632+ bootstrap method combined with the area under the receiver operating characteristic curve (AUC) metric were adopted to evaluated the predictability of features (Supplementary note 3). One thousand iterations were performed with 63.2% random data resampling from the training cohort between runs. The features selected in the previous step were ranked through maximizing the 0.632+ bootstrap AUC to determine the final twenty top-predictive features that maximally distinguished two classes.
Development of radiomics model
Once the discriminative features were identified, radiomics models were built based on different feature sets. A new feature set was composed when one feature from higher to lower rank was added, which contributed to 20 radiomics models. We used the random forest classifier [33] to evaluate the capability of foregoing radiomics models across 10-fold cross-validation in the training cohort, with 150 decision trees used for training ultimately. The model that possessed the most superior properties was determined for further analysis.
First, Mann-Whitney U was used to compare the difference in age and other continuous variables between TM and NM. Chi-square test was performed to analyze the differences based on factors, such as gender and clinical stages. Statistical analysis was performed on SPSS version 22.0 (IBM, Armonk, NY, USA).
The robustness of features against delineation differences versus the accurate VOIs was quantified using the intra-class correlation coefficient (ICC). Features with ICC ≥ 0.9 were considered excellent robust. The performance of diverse VOI-derived radiomics models were assessed by AUC, and the differences were compared by the method of DeLong et al. [34] using the MedCalc version 15.2.2 (MedCalc Software bvba, Ostend, Belgium). Note that a two-tailed p value less than 0.05 indicated statistical significance in this work.