This retrospective study included breast cancer patients who underwent both DCE-MRI and immunohistochemistry examinations. Two anonymized, publicly available datasets were downloaded from The Cancer Imaging Archive (TCIA). The first dataset, comprising 222 female patients, was used as the cross-validation group, and the second dataset, comprising 64 female patients, was used as the test group (Fig. 1a). Some of these patients were analyzed in previous studies to predict recurrence-free survival after neoadjuvant chemotherapy. In the present study, patients with incomplete immunohistochemistry information on HER2 expression (n=21), patients with incomplete DCE-MRI series (n=24), and patients whose tumors could not be clearly observed on the images (n=5) were excluded. In the end, 189 patients in the cross-validation group and 47 patients in the test group met the inclusion criteria.
DCE-MRI of the cross-validation group was performed on a 1.5T system (Siemens Healthcare, Erlangen, Germany, or General Electric Healthcare, Fairfield, USA) using a dedicated radiofrequency coil. The image acquisition protocol included a localization scan and a T2-weighted sequence followed by a contrast-enhanced T1-weighted series. All imaging was performed unilaterally over the symptomatic breast in the sagittal orientation. The contrast-enhanced series consisted of a high-resolution (≤1 mm in-plane spatial resolution), three-dimensional, fat-suppressed, T1-weighted gradient-echo sequence with TR ≤ 20 ms, TE = 4.5 ms, flip angle ≤ 45°, 16-18 cm field of view, minimum matrix 256 × 192, 64 slices, and slice thickness ≤ 2.5 mm. The scan time for the T1-weighted sequence was required to be between 4.5 and 5 minutes. The sequence was acquired once before contrast injection and repeated at least twice after injection.
DCE-MRI of the test group was acquired on a 1.5T scanner (General Electric Healthcare, Fairfield, USA) using a bilateral phased-array coil. The imaging protocol included a 3D localizer and a unilateral sagittal DCE acquisition (TR/TE 8/4.2 ms; flip angle 20°; field of view 18-20 cm; acquisition matrix 256 × 192 × 60; section thickness 2 mm; spatial resolution 0.7 × 0.94 × 2.0 mm³). Imaging time was approximately 5 minutes per acquisition, resulting in effective early and late postcontrast time points of 2.5 minutes and 7.5 minutes from the start of the contrast injection, respectively, using standard k-space sampling. Fat suppression was performed using a frequency-selective inversion recovery preparatory pulse.
The central position of each tumor was marked by an experienced radiologist who was blinded to all clinical and immunohistochemistry information. Image segmentation using a fuzzy C-means (FCM) algorithm was performed on the median postcontrast series. Skin and fatty tissues were excluded from the breast images to obtain the entire parenchyma surrounding the tumor. As in the previous study, peritumoral parenchyma <20 mm from the tumor margin was analyzed. The tissue boundary of each tumor or parenchyma was generated automatically by a segmentation program in MATLAB v.2019a. Poor segmentations identified by our investigators were corrected manually.
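The FCM step can be illustrated with a minimal sketch. It assumes two clusters, a fuzzifier m = 2, and synthetic one-dimensional intensities; none of these settings are stated in this study, and the function name `fuzzy_c_means` is hypothetical. It is not the MATLAB segmentation program used here.

```python
import numpy as np

def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Cluster 1-D intensities with fuzzy C-means; returns (centers, memberships)."""
    x = np.asarray(values, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    u = rng.random((x.size, n_clusters))
    u /= u.sum(axis=1, keepdims=True)

    def weighted_centers(u):
        um = u ** m
        return (um * x[:, None]).sum(axis=0) / um.sum(axis=0)

    for _ in range(n_iter):
        centers = weighted_centers(u)
        # Distance of every voxel to every center (small floor avoids division by zero).
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        # Standard update: u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return weighted_centers(u), u

# Synthetic region of interest (assumed values): 500 background and 200 enhancing voxels.
rng = np.random.default_rng(1)
roi = np.concatenate([rng.normal(100, 10, 500), rng.normal(300, 20, 200)])
centers, u = fuzzy_c_means(roi)
# Treat the brighter cluster as tumor and harden the fuzzy memberships into a mask.
mask = np.argmax(u, axis=1) == np.argmax(centers)
```

Hardening the memberships with `argmax` is only one possible choice; a membership threshold could be used instead, and the study does not say which was applied.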
The CAM decomposition tool (MATLAB source code is available at www.cbil.ece.vt.edu/software.htm) was developed to dissect complex tissues into subregions with differential contrast enhancement patterns. The tracer concentration x(i,t) of voxel i can be expressed as a nonnegative linear combination of latent tissue-specific compartmental time-series curves a_j(t), weighted by the relative tissue type proportions K_j(i), so that the DCE-MRI measured values satisfy the definition of a convex set [30, 31]:
$$x(i,t) = \sum_{j=1}^{J} K_j(i)\, a_j(t), \qquad K_j(i) \ge 0, \qquad \sum_{j=1}^{J} K_j(i) = 1,$$
where J is the number of functional tissue compartments, and a_j is the vector notation of a_j(t) over time.
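The mixture model behind CAM, in which each voxel time-series is a nonnegative combination of compartment curves with proportions that sum to one, can be sketched as follows. The three curves in `A` are made-up values. The sketch estimates proportions by nonnegative least squares with the curves already known, which is a simplification: CAM itself identifies the curves from the data, and that step is not reproduced here.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical compartment curves a_j(t) at 6 time points (one row per compartment, J = 3).
A = np.array([
    [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],   # slow-flow: steady enhancement
    [0.0, 0.7, 0.9, 1.0, 1.0, 1.0],   # fast-flow: plateau
    [0.0, 1.0, 0.7, 0.5, 0.4, 0.3],   # plasma-input: washout
])

def mix(K, A):
    """x(i, t) = sum_j K_j(i) * a_j(t) for proportions K of shape (n_voxels, J)."""
    return K @ A

def unmix(X, A):
    """Estimate nonnegative proportions per voxel by NNLS, then rescale each row to sum to 1."""
    K = np.array([nnls(A.T, x)[0] for x in X])
    # Assumes every voxel has a nonzero signal, so the row sums are positive.
    return K / K.sum(axis=1, keepdims=True)

# One pure voxel (first row) and two mixed voxels.
K_true = np.array([[1.0, 0.0, 0.0],
                   [0.2, 0.5, 0.3],
                   [0.0, 0.4, 0.6]])
X = mix(K_true, A)
K_est = unmix(X, A)
```

In this noise-free example the estimated proportions match the true ones; with real data the fit is approximate, and the first voxel illustrates a pure voxel whose time-series coincides with a single compartment curve.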
Because the imaging protocols differed between the cross-validation group and the test group, the time intervals of the DCE-MRI series also differed. Therefore, images were linearly interpolated between consecutive time points to obtain a unified 60 s time interval. Following the definition in previous CAM studies, the three main subregions were termed slow-flow (steady enhancement), fast-flow (plateau of signal intensity), and plasma-input (washout of signal intensity). In addition, a voxel with K(i) of approximately one was defined as belonging to a pure subregion, and a voxel with K(i) between zero and one was defined as belonging to an overlapped subregion (Fig. 2).
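The temporal resampling can be sketched as voxel-wise linear interpolation onto a 60 s grid. The example uses the test-group postcontrast times of 2.5 and 7.5 minutes and assumes the precontrast series sits at t = 0; the signal values and the function name `resample_to_uniform` are hypothetical.

```python
import numpy as np

def resample_to_uniform(times_s, series, step_s=60.0):
    """Linearly interpolate voxel time-series (n_voxels, n_times) onto a uniform time grid."""
    times_s = np.asarray(times_s, dtype=float)
    series = np.asarray(series, dtype=float)
    # Uniform grid from the first acquisition up to, at most, the last acquisition.
    grid = np.arange(times_s[0], times_s[-1] + 1e-9, step_s)
    resampled = np.stack([np.interp(grid, times_s, s) for s in series])
    return grid, resampled

# Acquisition times in seconds: precontrast (assumed t = 0), 2.5 min and 7.5 min postcontrast.
times = [0.0, 150.0, 450.0]
# Made-up signal intensities for a single voxel.
series = np.array([[100.0, 250.0, 220.0]])
grid, resampled = resample_to_uniform(times, series)
```

The grid stops at the last multiple of 60 s that does not exceed the final acquisition, so no values are extrapolated beyond the measured series.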
Radiomic Features Extraction
Three-dimensional radiomic features, including texture and histogram features, capture signal intensity variation in DCE-MRI and have been widely used to quantify pathological characteristics in medical studies [23, 24, 27, 33]. As shown in Table 1, 18 texture features and 10 histogram features were extracted from each DCE-MRI series. Because the signal intensity of DCE-MRI changes over time after tracer injection, the precontrast, median postcontrast, and last postcontrast series were all used to extract these radiomic features. In total, 84 (18 × 3 + 10 × 3) features were extracted from the images of each undecomposed tissue or CAM subregion.
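A minimal sketch of how per-series features can be concatenated into one vector is given below. The seven histogram statistics are illustrative assumptions and are not the ten histogram features of Table 1; texture features are omitted, and the voxel values are synthetic.

```python
import numpy as np
from scipy.stats import skew, kurtosis

N_TEXTURE, N_HISTOGRAM, N_SERIES = 18, 10, 3
N_FEATURES = (N_TEXTURE + N_HISTOGRAM) * N_SERIES  # 18 x 3 + 10 x 3

def histogram_features(voxels):
    """A few first-order statistics of the voxel intensities in one region (illustrative set)."""
    v = np.asarray(voxels, dtype=float).ravel()
    return {
        "mean": v.mean(),
        "std": v.std(ddof=1),
        "skewness": skew(v),
        "kurtosis": kurtosis(v),
        "median": np.median(v),
        "p10": np.percentile(v, 10),
        "p90": np.percentile(v, 90),
    }

def feature_vector(region_by_series):
    """Concatenate the features of each series into one named feature vector."""
    names, values = [], []
    for series_name, voxels in region_by_series.items():
        for feat_name, value in histogram_features(voxels).items():
            names.append(f"{series_name}_{feat_name}")
            values.append(value)
    return names, np.array(values)

# Synthetic intensities of one region in the three series used for feature extraction.
rng = np.random.default_rng(0)
region = {
    "precontrast": rng.normal(100, 10, 1000),
    "median_postcontrast": rng.normal(250, 30, 1000),
    "last_postcontrast": rng.normal(220, 25, 1000),
}
names, values = feature_vector(region)
```

Prefixing each feature name with its series keeps the three time points distinguishable once the vector is passed to a classifier.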
Differences in patient race and tumor laterality were assessed using the χ² test, or Fisher’s exact test if the expected frequency in any cell of the contingency table was <5. Differences in patient age, maximum diameter, lesion volume, and parenchymal density were assessed using analysis of variance (ANOVA). One-way ANOVA was used to evaluate the variation of each feature among cancers with different HER2 expression, and the ability of each feature to predict HER2 expression was evaluated using a univariate logistic model. A random forest model built on the 84 radiomic features served as the predictor of HER2 expression. The model was trained and tested using leave-one-out cross-validation (LOOCV). Predictive power was assessed using the area under the receiver operating characteristic (ROC) curve (AUC). In each LOOCV loop, 10-fold cross-validation was applied within the training set to select the optimal parameters of the random forest model. The 95% confidence interval of a single AUC and the comparison between two AUCs were determined using a bootstrap test. A two-tailed P value of <0.05 was considered statistically significant, and Bonferroni correction was applied in multiple-comparison tests. All statistical analyses were performed in R v.3.4.2 and MATLAB v.2019a.
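The bootstrap confidence interval of an AUC can be sketched as follows. The percentile bootstrap with 2000 resamples is an assumption, since the bootstrap variant and number of resamples are not specified here, and the labels and scores are toy values. The original analysis was run in R and MATLAB, not with this code.

```python
import numpy as np

def auc(y_true, scores):
    """AUC computed from the Mann-Whitney U statistic (ties count as 0.5)."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap confidence interval of the AUC."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, y.size, y.size)   # resample patients with replacement
        if y[idx].min() == y[idx].max():        # resample lacks one class; AUC undefined
            continue
        stats.append(auc(y[idx], s[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc(y, s), lo, hi

# Toy labels (1 = HER2-positive) and predicted scores, e.g. pooled over all LOOCV loops.
y = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.6, 0.3])
point, lo, hi = bootstrap_auc_ci(y, scores)
```

In LOOCV each left-out patient contributes one predicted score, so an AUC is typically computed on the scores pooled across all loops, as assumed in the usage above.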