The institutional review board approved our study (No. 2018-L-23) while waiving the need for written informed consent.
Patient enrollment
We retrospectively reviewed the health-related information from the database of our institution between January 2018 and October 2021. Finally, 320 women satisfying the following inclusion criteria were chosen: (1) MRI performed within two weeks before biopsy; (2) initial diagnosis of unilateral primary breast cancer based on preoperative percutaneous core needle biopsy; and (3) confirmed luminal cancer based on immunohistochemistry (ER or PR positive and HER2 negative) based on biopsy specimens. Ultimately, we removed 180 patients based on the following criteria: (1) MRI examination not performed within two weeks before biopsy (n = 120); (2) missing values of Ki-67 (n = 22); (3) nonmass or multifocal breast cancer (n = 18); and (4) inadequate breast MR images (lacking T2WI, DWI, ADC, DCE) or unclear image quality affecting lesion segmentation and evaluation (n = 20). Ultimately, 140 women were consecutively enrolled (mean age with a standard deviation of 51 years ± 12; age range: 26–77 years). Based on the St. Gallen consensus[8], we divided the participants into two subgroups: (1) a high Ki-67 expression group (≥ 14%, luminal B (HER2-negative) type, n = 85) and (2) a low Ki-67 expression group (< 14%, luminal A type, n = 55) (Fig. 1).
MRI examination
MRI was performed using a 3.0-T scanner (Achieva; Philips Healthcare, best, Netherland) in the axial orientation with a specific coil of a seven-channel breast array. All the imaging sections were obtained by putting the patients in a prone position. Bilateral axial T2-weighted images with fat suppression were acquired (TR, 5375 ms; TE, 65 ms; FOV, 340 mm × 340 mm; flip angle, 90°, matrix, 620 × 303; slice thickness, 3 mm; space, 0 mm). DWI was acquired utilizing echo-planar imaging in the axial plane with fat suppression (TR/TE, 5417 ms/72 ms; FOV, 320 mm×320 mm; flip angle, 90°, matrix, 96 × 126; slice thickness, 3 mm; space, 0 mm; b values, 0 and 1000 s/mm2). A 3D T1-weighted gradient-echo sequence, which was rapidly spoiled, was used to obtain one precontrast and five postcontrast dynamic series by suppressing the fat (TR/TE, 5 ms/2 ms; FOV 340 mm × 340 mm; matrix, 436 × 436; flip angle, 12°; slice thickness, 1 mm; space, 0 mm) by intravenously injecting 0.1 mmol/kg gadoterate (Gadovist, Bayer Healthcare).
Routine clinical MRI evaluation
Two breast radiologists, with 15 and 5 years of experience, independently and blindly assessed all the MR images. They utilized the 2013 Breast Imaging Reporting and Data System MR lexicon for their assessment. Routine clinical variables included patient age, tumor maximum diameter, shape, invasion (skin, nipple, pectoral muscle, chest wall), T2 signal, ADC value, first-minute enhancement rate, maximum enhancement rate, time to peak (Tmax), and time-signal intensity curves (TICs).
Radiomic feature extraction
T2WI, DCE (the 1st phase), DWI (b = 1000), and ADC datasets were incorporated into the 3D-Slicer V4.11.2 free software (https://WWW.slicer.org/), which is typically used for processing images[25]. The same two radiologists performed a semiautomated delineation of each slice of the lesions (including necrosis and liquefaction, which showed intertumoral hyperintensity on T2WI). The regions of interest (ROIs) were delineated using T2WI, DWI and DCE sequences. Subsequently, the ROIs extracted from the DWI sequence were replicated onto the corresponding ADC maps. Radiomic features were extracted from four ROIs using the PyRadiomics package in Python version 3.7 (https://www.python.org/). We extracted 851 features from each segmented sequence of the tumor 3D-ROI (DCE, DWI, T2WI, and ADC). The features had 162 first-order, 14 shapes, and 675 textures, primarily derived from the maximum short axis, kurtosis, and gray-level matrices of co-occurrence, dependency, run length, etc.
Selection of characteristic features and development of radiomic signatures
To reduce redundant features and avoid overfitting, three procedures were sequentially executed to determine the best features for predicting the Ki-67 status in luminal breast cancer. First, the independent sample t test helped eliminate the radiomic features without statistically significant differences between the two groups. Subsequently, the Z score method was utilized to separately normalize the selected radiomic features from the ROIs of each MRI sequence. Then, the LASSO algorithm was used to extract the features with nonzero coefficients at a specific coefficient lambda. Fivefold cross-validation was used to determine the optimal coefficient lambda in the LASSO method.
Establishment and evaluation of machine learning classifiers
We randomly divided the data into training and validation sets at a 7:3 ratio (70% training and 30% validation). (1) Building and validating the routine clinical MRI signature. We developed signatures using the training set with clinical and routine MRI characteristics, and features were incorporated into further selection at a significance of p < 0.05. Next, four classification algorithms (logistic regression (LR), random forest (RF), support vector machine (SVM), and multilayer perceptron (MLP)) were selected for model training and validation. (2) MRI radiomic signature development and authentication. The training set was initially evaluated for predicting Ki-67 expression and then validated using 5-fold cross-validation in the inner loop. In the training set, we established signatures using radiomic features extracted from the T2WI, DCE, DWI and ADC images, as well as mpMRI radiomic features derived from four MRI sequences. Next, the four classification algorithms (LR, RF, MLP, SVM) were used for model training and validation.
Constructing the clinical-radiomic nomogram and decision curve analysis
A predictive clinical-radiomic nomogram model was formulated by constructing multivariate analysis and regression modeling using the Rad-score combined with routine clinical MRI variables. The Rad score was calculated from mpMRI radiomic features and using the intercept and coefficients obtained from LR, as shown in the following formula:
Rad-score = 1.1998 − 0.03480*F1-0.3029*F2 + 0.3678*F3 + 0.2167*F4 + 0.3736*F5 + 0.6594*F6 + 0.5211*F7 + 0.6672*F8 + 0.1984*F9-0.2013*F10 + 0.2827*F11
F1 = wavelet-LLH_gldm_SmallDependenceLowGrayLevelEmphasis (ADC)
F2 = wavelet-HLL_gldm_SmallDependenceLowGrayLevelEmphasis (DCE)
F3 = wavelet-HHH_gldm_DependenceNonUniformity (DCE)
F4 = wavelet-LHL_glcm_Correlationoriginal_shape_MinorAxisLength(T2WI)
F5 = wavelet-LHH_glcm_Imc2original_shape_MinorAxisLength(T2WI)
F6 = wavelet-LHH_gldm_DependenceVarianceoriginal_shape_MinorAxisLength(T2WI)
F7 = wavelet-LHH_glrlm_LongRunLowGrayLevelEmphasisoriginal_shape_MinorAxisLength(T2WI)
F8 = wavelet-HLH_glcm_Correlationoriginal_shape_MinorAxisLength(T2WI)
F9 = wavelet-HLH_glrlm_LongRunEmphasisoriginal_shape_MinorAxisLength(T2WI)
F10 = wavelet-HHH_firstorder_Meanoriginal_shape_MinorAxisLength(T2WI)
F11 = wavelet-HHH_glcm_Idnoriginal_shape_MinorAxisLength(T2WI)
Then, calibration curves were used to evaluate the performance of the nomogram in the training and validation sets. The clinical usefulness of the nomogram was evaluated using clinical decision curve analysis (DCA). Three decision curves were plotted based on the routine clinical variables, radiomic signature and nomogram, and the net benefits of each decision curve were calculated. Python 3.7 was used for conducting all the analyses. We implemented the features and classifiers using the Scikit-learn 1.0.2 package. Figure 2 presents a concise flowchart outlining the workflow of this study.
Statistical analysis
We analyzed and compared the categorical groups with the χ2 or Fisher’s exact tests. Additionally, we compared two groups of continuous variables with the independent-sample t test. ROC analysis helped predict the signature performance. The area under the ROC curve (AUC) facilitated the sensitivity and specificity evaluation for every signature. We compared the AUCs for two ROC curves using the DeLong test. Multivariate LR models were constructed for the nomogram. Statistical significance was determined for all analyses with a two-sided p value threshold < 0.05. We conducted the statistical analyses with SPSS version 26.0 and R software (version 4.2.1, http://www.r-project.org).