Study population
A total of 112 cases that underwent breast MRI between March 2013 and September 2017 were identified by searching our institutional picture archiving and communication system (PACS). The inclusion criteria were as follows: (1) patients with suspected breast tumors who underwent breast MRI; (2) malignant breast tumors confirmed by histopathologic examination; (3) ER, PR, HER2 and Ki-67 status available from immunohistochemical analysis; and (4) high-quality DW images suitable for outlining the lesions, with no size threshold for the lesions. The exclusion criteria were as follows: (1) any treatment of the breast lesion before breast MRI, including surgery, chemotherapy and/or radiotherapy, or anti-HER2 therapy; (2) bilateral breast lesions; (3) suspected metastatic tumors of the breast; (4) DW images unsuitable for assessment; (5) pseudo-tumors or tumor-like lesions, including chronic inflammatory nodules, breast adenosis and fat necrosis nodules; and (6) tumors located in the skin or areola.
Clinicopathologic subtyping
The immunohistochemical data of the 112 cases were retrieved from the hospital information system. ER, PR, HER2 and Ki-67 status was determined by immunohistochemistry. ER and PR expression was considered positive if at least 1% of tumor cells showed positive nuclear staining [13]. HER2 status was defined as positive if the immunohistochemical score was 3+ and/or in situ hybridization was positive [14]. A Ki-67 index higher than 14% was regarded as high [15]. Breast cancers were classified into five clinicopathological subtypes [5]: luminal A (ER and/or PR positive, HER2 negative and Ki-67 low [<14%]); luminal B/HER2- (luminal B, HER2 negative: ER and/or PR positive, HER2 negative and Ki-67 high); luminal B/HER2+ (luminal B, HER2 positive: ER and/or PR positive, HER2 overexpressed or amplified, any Ki-67); HER2 positive (HER2 overexpressed or amplified, ER and PR absent); and triple negative (ER and PR absent, HER2 negative).
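The subtype definitions above amount to a small decision rule. The sketch below encodes it for illustration only; the function name `assign_subtype` and its inputs are hypothetical. Because the text defines Ki-67 low as <14% and high as higher than 14%, it is an assumption here that an index of exactly 14% is treated as high.

```python
def assign_subtype(er_pct, pr_pct, her2_ihc_score, ish_positive=False, ki67_pct=0.0):
    """Hypothetical helper: map IHC results to one of the five subtypes.

    er_pct, pr_pct: percentage of tumor cells with positive nuclear staining.
    her2_ihc_score: HER2 IHC score (0, 1, 2 or 3).
    ish_positive: True if in situ hybridization shows HER2 amplification.
    ki67_pct: Ki-67 index in percent.
    """
    er_pos = er_pct >= 1            # positive if >= 1% of tumor cells stain
    pr_pos = pr_pct >= 1
    her2_pos = her2_ihc_score == 3 or bool(ish_positive)  # IHC 3+ and/or ISH positive
    ki67_high = ki67_pct >= 14      # assumption: exactly 14% counted as high
    hormone_pos = er_pos or pr_pos

    if hormone_pos and not her2_pos:
        return "luminal B/HER2-" if ki67_high else "luminal A"
    if hormone_pos and her2_pos:
        return "luminal B/HER2+"    # any Ki-67
    if her2_pos:
        return "HER2 positive"      # ER and PR absent
    return "triple negative"        # ER, PR and HER2 all negative


print(assign_subtype(er_pct=60, pr_pct=20, her2_ihc_score=1, ki67_pct=10))
```

Note that HER2 status is checked before Ki-67 for hormone-receptor-positive tumors, which mirrors the "any Ki-67" clause of the luminal B/HER2+ definition.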
Imaging data
All 112 patients underwent breast MR examination on a 3.0T MR system (Siemens Healthineers, MAGNETOM Skyra). Only the DW images were used for radiomic analysis in this study. DW imaging was performed with a 4-channel breast coil with the patient lying prone, using the following parameters: axial imaging plane; repetition time/echo time, 5400/55 ms; field of view, 350 mm; voxel size, 1.8 × 1.8 × 5.0 mm; slice thickness, 5 mm; spacing, 0 mm; NEX, 2; acquisition matrix, 128 × 128; b values, 0 and 800 s/mm². The acquisition time of DWI was about 125 seconds. The other imaging protocols were as follows: (1) axial fat-suppressed (SPAIR) T2-weighted imaging: repetition time/echo time, 3500/68 ms; field of view, 350 mm; voxel size, 0.5 × 0.5 × 5.0 mm; slice thickness, 5 mm; flip angle, 80°; NEX, 1; (2) sagittal fat-suppressed T2-weighted imaging: repetition time/echo time, 3200/66 ms; field of view, 260 mm; voxel size, 0.8 × 0.8 × 4.0 mm; slice thickness, 4 mm; flip angle, 120°; NEX, 1; (3) axial T1-weighted imaging without fat suppression: repetition time/echo time, 6/2.46 ms; field of view, 340 mm; voxel size, 0.8 × 0.8 × 1.6 mm; slice thickness, 1.6 mm; flip angle, 15°; NEX, 1; (4) 3D pre-contrast fat-suppressed T1-weighted imaging: repetition time/echo time, 4.49/1.68 ms; field of view, 340 mm; voxel size, 1.0 × 1.0 × 1.2 mm; slice thickness, 1.2 mm; flip angle, 10°; NEX, 1; and (5) 3D dynamic contrast-enhanced (DCE) fat-suppressed T1-weighted imaging with injection of Gd-DTPA (0.1 mmol/kg), acquiring seven post-contrast phases. The total acquisition time was about 26 minutes.
Image segmentation and feature extraction
The DW images of each case were exported to the radiomics analysis software Artificial Intelligent Kit (A.K.; GE Healthcare, Shanghai; version 3.0.1). The T2-weighted and DCE-MR images were reviewed to validate the lesions. The breast tumor was segmented on the DW image (b = 800 s/mm²) in two steps: first, the tumor margin was delineated manually slice by slice to obtain regions of interest (ROIs); second, the A.K. software automatically merged these ROIs into the volume of interest (VOI) of the tumor. Cystic and necrotic areas of the tumor were included in the ROIs. In patients with multiple unilateral tumors, only the largest lesion was selected.
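The merge step and the largest-lesion rule can be sketched with boolean masks, assuming each manual ROI is stored as a 2D mask keyed by its slice index. This is not how A.K. implements the merge; the helper names are hypothetical, and the voxel size is taken from the DWI protocol.

```python
import numpy as np

# Voxel size of the DW images (1.8 x 1.8 x 5.0 mm), from the DWI protocol.
VOXEL_MM = (1.8, 1.8, 5.0)


def merge_rois(roi_by_slice, shape):
    """Hypothetical helper: merge per-slice 2D ROI masks into one 3D VOI mask.

    roi_by_slice maps a slice index to a boolean (rows, cols) mask;
    shape is (n_slices, rows, cols) of the DW volume.
    """
    voi = np.zeros(shape, dtype=bool)
    for z, mask in roi_by_slice.items():
        voi[z] |= np.asarray(mask, dtype=bool)
    return voi


def voi_volume_mm3(voi, voxel_mm=VOXEL_MM):
    """Lesion volume = number of VOI voxels x volume of one voxel."""
    return float(np.count_nonzero(voi)) * float(np.prod(voxel_mm))


def largest_lesion(vois):
    """Return the VOI with the most voxels (the 'largest lesion' rule)."""
    return max(vois, key=np.count_nonzero)
```

Here lesion size is measured by voxel count on the DW volume; the text does not say which size measure was used to pick the largest lesion, so this choice is an assumption.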
A total of 396 radiomic features were extracted from the VOI on the DW image by the A.K. software (Figure 1). These features fall into six categories: texture parameters, gray level size zone matrix (GLSZM), gray level co-occurrence matrix (GLCM), form factor parameters, run length matrix (RLM) and histogram features. Texture parameters describe the appearance of a surface and how its elements are distributed. GLSZM quantifies gray-level zones, i.e., groups of connected voxels that share the same gray level, by counting the zones of each gray level and zone size. GLCM represents the joint probability that pairs of pixels separated by a given offset have certain gray-level values. RLM counts the number of runs of consecutive pixels with gray level i and run length j along a given direction θ.
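To make the GLCM and RLM definitions concrete, the following 2D numpy sketch computes both matrices for an image that is assumed to be already discretized into integer gray levels. It only illustrates the definitions; A.K.'s discretization, offsets, directions and normalization are not described here and may differ. The function names are hypothetical.

```python
import numpy as np


def glcm(img, offset=(0, 1), levels=None, symmetric=True, normed=True):
    """Gray level co-occurrence matrix for one (row, col) offset.

    offset=(0, 1) pairs each pixel with its right-hand neighbor (theta = 0 deg).
    P[i, j] counts pixel pairs whose first pixel has level i and second level j.
    """
    img = np.asarray(img, dtype=int)
    levels = int(img.max()) + 1 if levels is None else levels
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1
    if symmetric:
        P = P + P.T                  # count each pair in both directions
    if normed and P.sum() > 0:
        P = P / P.sum()              # turn counts into joint probabilities
    return P


def glcm_contrast(P):
    """Example GLCM feature: contrast = sum_ij P(i, j) * (i - j)^2."""
    i, j = np.indices(P.shape)
    return float(np.sum(P * (i - j) ** 2))


def run_length_matrix(img, levels=None):
    """Horizontal (theta = 0 deg) run length matrix.

    R[i, j - 1] is the number of runs of gray level i with run length j.
    """
    img = np.asarray(img, dtype=int)
    levels = int(img.max()) + 1 if levels is None else levels
    R = np.zeros((levels, img.shape[1]), dtype=int)
    for row in img:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                R[run_val, run_len - 1] += 1
                run_val, run_len = v, 1
        R[run_val, run_len - 1] += 1  # close the last run in the row
    return R


img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
print(glcm(img, symmetric=False, normed=False))
print(run_length_matrix(img))
```

Other directions θ correspond to other offsets (for example (1, 0) for the vertical direction); radiomics packages typically average features over several directions.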
Preprocessing
The training dataset consisted of the 396 radiomic features of the 112 breast cancer cases. To eliminate redundant radiomic features, the training dataset was preprocessed as follows (Figure 2): first, any value of a radiomic feature lying outside the range of mean ± standard deviation (SD) was considered an outlier and removed from the dataset; second, Pearson correlation analysis was performed between each pair of radiomic features in the training dataset, and if the correlation coefficient of a pair was above 0.9, one of the two features was removed at random; third, the features were standardized by mean centering and scaling by the SD so that all variables share a common scale; finally, noise was reduced by linear smoothing filtering, performed automatically by the A.K. software [16][17].
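The first three preprocessing steps can be sketched in numpy as follows. This is an illustration under stated assumptions, not A.K.'s implementation: "removed" outliers are masked as missing values (NaN), correlations use pairwise-complete observations, strong negative correlations are treated like positive ones (|r| > 0.9), and the smoothing step is omitted. The multiplier `k` on the SD defaults to 1 as stated in the text. All function names are hypothetical.

```python
import numpy as np


def flag_outliers(X, k=1.0):
    """Mask values outside mean +/- k*SD of their feature column as NaN.

    Assumption: 'removed from the dataset' is modeled as a missing value.
    """
    X = np.asarray(X, dtype=float).copy()
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    X[np.abs(X - mu) > k * sd] = np.nan
    return X


def pairwise_pearson(a, b):
    """Pearson r on the cases where both features are present."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    if ok.sum() < 3:
        return 0.0
    a, b = a[ok], b[ok]
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


def drop_correlated(X, threshold=0.9, seed=0):
    """Return a boolean mask of features kept after correlation filtering.

    For every pair with |r| > threshold, one of the two is dropped at random.
    """
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    keep = np.ones(n_feat, dtype=bool)
    for i in range(n_feat):
        for j in range(i + 1, n_feat):
            if keep[i] and keep[j] and abs(pairwise_pearson(X[:, i], X[:, j])) > threshold:
                keep[rng.choice([i, j])] = False
    return keep


def standardize(X):
    """Mean-center each feature and scale it by its SD (z-score)."""
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    sd[sd == 0] = 1.0                # leave constant features unscaled
    return (X - mu) / sd


def preprocess(X, k=1.0, threshold=0.9, seed=0):
    Xm = flag_outliers(X, k)
    keep = drop_correlated(Xm, threshold, seed)
    return standardize(Xm[:, keep]), keep
```

Fixing `seed` makes the random choice within each correlated pair reproducible, which matters because different choices can change the retained feature set.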
Classifier building
Fisher discriminant analysis with a backward selection method [18] was used for clinicopathological subtyping. After 104 iterations, 84 variables were selected to establish the Fisher discriminant model (Functions 1 to 5).
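A five-function model of this kind corresponds to Fisher's linear classification functions, with one score function per subtype and each case assigned to the subtype with the highest score. The numpy sketch below fits such functions. It omits the backward selection step, and it assumes that priors are proportional to class sizes unless given. The study's software settings may differ, and the function names are hypothetical.

```python
import numpy as np


def fit_fisher(X, y, priors=None):
    """Fit Fisher's linear classification functions (one per class).

    Function k scores a case as Y_k = c_k0 + sum_i c_ki * X_i, with
    c_k = Sw^-1 mean_k and c_k0 = -0.5 * mean_k' Sw^-1 mean_k + ln(prior_k),
    where Sw is the pooled within-class covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance from residuals around each class mean.
    resid = X - means[np.searchsorted(classes, y)]
    Sw = resid.T @ resid / (len(y) - len(classes))
    Sw_inv = np.linalg.pinv(Sw)      # pseudo-inverse guards against a singular Sw
    if priors is None:
        # Assumption: priors proportional to class sizes.
        priors = np.array([(y == c).mean() for c in classes])
    coef = means @ Sw_inv            # row k equals (Sw^-1 mean_k)'
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, intercept


def classification_scores(model, X):
    """Score every case with every class function: shape (n_cases, n_classes)."""
    _, coef, intercept = model
    return np.asarray(X, dtype=float) @ coef.T + intercept


def predict_fisher(model, X):
    """Assign each case to the class whose function gives the highest score."""
    classes = model[0]
    return classes[np.argmax(classification_scores(model, X), axis=1)]
```

With five subtypes, `classification_scores` returns five scores per case, one for each of Functions 1 to 5.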
To illustrate how the Fisher discriminant model differentiates the five clinicopathological subtypes of breast cancer, the equations are shown below, where Xi denotes the radiomic features used to build the functions and Yi denotes the score of a given unknown patient.
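For reference, Fisher's linear classification functions take the general textbook form below. It does not give the fitted coefficients. $X_i$ are the radiomic features and $Y_k$ is the score from Function $k$. The remaining symbols are introduced here for illustration: $p$ is the number of retained features, $\mathbf{S}_w$ is the pooled within-class covariance matrix, $\bar{\mathbf{x}}_k$ is the mean feature vector of subtype $k$, and $\pi_k$ is its prior probability. A patient is assigned to the subtype with the highest score.

```latex
\begin{aligned}
Y_k &= c_{k0} + \sum_{i=1}^{p} c_{ki} X_i, \qquad k = 1, \dots, 5,\\
\mathbf{c}_k &= \mathbf{S}_w^{-1}\,\bar{\mathbf{x}}_k, \qquad
c_{k0} = -\tfrac{1}{2}\,\bar{\mathbf{x}}_k^{\top}\mathbf{S}_w^{-1}\bar{\mathbf{x}}_k + \ln \pi_k,\\
\hat{k} &= \arg\max_{k} Y_k .
\end{aligned}
```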
The Fisher discriminant model was tested by leave-one-out cross-validation. For a sample size of n, each sample in turn was predicted by a discriminant model built on the remaining n − 1 samples. After n iterations, the predictions for all samples were pooled and used as the reference standard for evaluating the predictive performance of the model.
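The leave-one-out loop can be sketched as follows. To keep the block self-contained, it includes a compact version of Fisher's linear classification functions, which assumes priors proportional to class sizes. It does not repeat the backward selection inside each fold, and the function names are hypothetical.

```python
import numpy as np


def fit_fisher(X, y):
    """Compact Fisher linear classification functions (priors = class sizes)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = X - means[np.searchsorted(classes, y)]
    Sw_inv = np.linalg.pinv(resid.T @ resid / (len(y) - len(classes)))
    coef = means @ Sw_inv
    priors = np.array([(y == c).mean() for c in classes])
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, intercept


def predict_fisher(model, X):
    classes, coef, intercept = model
    return classes[np.argmax(X @ coef.T + intercept, axis=1)]


def loocv_predict(X, y, fit, predict):
    """Predict each case with a model trained on the other n - 1 cases."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    preds = np.empty_like(y)
    for i in range(n):
        train = np.arange(n) != i          # leave case i out
        model = fit(X[train], y[train])
        preds[i] = predict(model, X[i:i + 1])[0]
    return preds
```

The pooled predictions can then be compared with the true subtypes, for example as overall accuracy `(preds == y).mean()` or as a confusion matrix.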
Receiver operating characteristic (ROC) curve analysis
The performance of radiomic features in predicting the status of each immunohistochemical biomarker was assessed by ROC analysis in two steps. First, the preprocessed radiomic features were entered into binary logistic regression models to calculate predicted values for ER status, PR status, HER2 status and Ki-67 index. Second, ROC analyses were performed on these predicted values to differentiate ER-positive from ER-negative, PR-positive from PR-negative, HER2-positive from HER2-negative, and Ki-67-high from Ki-67-low tumors. The z test was used to compare the areas under the ROC curve (AUROCs) between different radiomic features.
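The second step can be illustrated with the following sketch, which takes the logistic-regression predicted values as input. It computes the AUROC as the Mann-Whitney probability that a positive case scores higher than a negative one. It also compares two AUROCs with a z statistic using the Hanley-McNeil standard error. That variant is swapped in here as an assumption, because the text does not specify how the z test accounted for correlation between AUROCs measured on the same cases. The default `r=0` treats the two AUROCs as independent. The function names are hypothetical, and this is not SPSS's procedure.

```python
import math

import numpy as np


def auc_mann_whitney(scores, labels):
    """AUROC = P(score_pos > score_neg) + 0.5 * P(tie)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]           # every positive/negative pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUROC (Hanley and McNeil approximation)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def z_test_auc(auc1, se1, auc2, se2, r=0.0):
    """Two-sided z test for the difference between two AUROCs.

    r is the correlation between the two AUROCs (0 = independent samples).
    """
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    p = math.erfc(abs(z) / math.sqrt(2))         # equals 2 * (1 - Phi(|z|))
    return z, p
```

A ROC curve itself can be traced by sweeping a threshold over the predicted values and plotting sensitivity against 1 − specificity. The rank-based AUROC above equals the area under that empirical curve.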
Binary logistic regression and ROC analyses were performed with IBM SPSS version 19.0 (IBM Corporation, New York). A P value < 0.05 was considered statistically significant.