Data set
A total of 310 patients with complete DCE-BMRI and pathological data were collected from January 2017 to December 2020, including 17 patients with bilateral lesions (a benign lesion on one side and a malignant lesion on the other). All lesions were pathologically confirmed by surgical or needle biopsy. Lesions were divided into benign and malignant groups, and age, pathological type, and tumor diameter were compared between the two groups (Table 1). There was no significant difference in age between the groups (P = 0.52), whereas the difference in lesion diameter was statistically significant (P < 0.05). The inclusion criteria were as follows: Ⅰ: no chemotherapy or chemoradiotherapy was received before the MRI examination; Ⅱ: no puncture or surgical procedure was performed before the MRI examination. Images in which benign and malignant lesions appeared together were discarded.
To avoid interference between benign and malignant lesions in the bilateral breasts of the same patient, each DCE-BMRI image covered only one breast. Finally, 2124 benign images (benign group) and 2226 malignant images (malignant group) were obtained. The images of each group were randomly divided into a training set (benign group: 1704 images; malignant group: 1786 images), a test set (benign group: 210 images; malignant group: 220 images), and a validation set (benign group: 210 images; malignant group: 220 images) at a ratio of 8:1:1.
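The 8:1:1 split described above can be sketched as follows; this is an illustrative stand-in, not the authors' actual script, and the file names are hypothetical. Note that simple integer truncation reproduces the ratio only approximately:

```python
import random

def split_8_1_1(images, seed=42):
    """Shuffle a list of image paths and split it 8:1:1 into
    training, test, and validation subsets."""
    rng = random.Random(seed)
    items = list(images)
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * 0.8)
    n_test = int(n * 0.1)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val

# Hypothetical file names standing in for the 2124 benign images.
benign = [f"benign_{i:04d}.png" for i in range(2124)]
train, test, val = split_8_1_1(benign)
# 2124 images -> 1699 / 212 / 213 here; the paper reports 1704 / 210 / 210.
```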
MRI Techniques
MRI was performed using two 3T MRI scanners with a dedicated breast coil in the prone position. Gd-DTPA (0.1 mmol/kg, 2.50 mL/s) was administered via elbow vein injection. A total of six phase images were acquired (one pre-contrast phase and five post-contrast dynamic enhancement phases). The detailed scanning parameters are listed in Table 2. MRI was performed preoperatively and before therapy initiation.
Proposed model
The computer environment was a 64-bit Windows 10 Enterprise operating system with an Intel(R) Core(TM) i7-10700F CPU and an NVIDIA RTX 2060 GPU (6 GB). All other programs were closed while the models were running. To facilitate comparison of network performance, the same training and validation sets were used for each network. The decision threshold of each model was set to 0.5: if the output was ≥ 0.5, the image was classified as malignant; otherwise, it was classified as benign. The architecture of the proposed DTL model for breast lesion classification is shown in Figure 1.
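The thresholding rule above amounts to a one-line decision on the model's sigmoid output; a minimal sketch (function name is illustrative):

```python
def classify(prob_malignant, threshold=0.5):
    """Map a model's sigmoid output to a class label using the 0.5 cutoff;
    ties go to malignant because >= is used."""
    return "malignant" if prob_malignant >= threshold else "benign"

print(classify(0.73))  # malignant
print(classify(0.5))   # malignant
print(classify(0.31))  # benign
```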
Data augmentation
The images were randomly shuffled programmatically, and data augmentation was performed before model training. The original images were augmented by flipping and rotation, so that the augmented images retained the original medical characteristics. The augmentation parameters and values are listed in Table 3.
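The exact augmentation parameters are those of Table 3; as an illustration only, a flip-and-rotate augmentation that leaves pixel intensities unchanged can be sketched as follows (restricting rotations to multiples of 90° is an assumption of this sketch, not necessarily the paper's setting):

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and rotate an image by a multiple of 90 degrees.
    Pixel values are only rearranged, never altered, so the medical
    characteristics of the original image are preserved."""
    if rng.random() < 0.5:
        image = np.fliplr(image)   # horizontal flip
    if rng.random() < 0.5:
        image = np.flipud(image)   # vertical flip
    k = rng.integers(0, 4)         # rotate by 0, 90, 180, or 270 degrees
    return np.rot90(image, k)

rng = np.random.default_rng(0)
img = np.arange(224 * 224, dtype=np.float32).reshape(224, 224)
aug = augment(img, rng)
```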
Network structure of MobileNetV1
The main principle of the MobileNetV1 model is the use of depth-wise separable convolutions, each composed of a depth-wise convolution followed by a pointwise convolution [19]. Each kernel of the depth-wise convolutional layer slides over, and convolves with, only one input channel. The pointwise convolutional layer uses a kernel size of 1 × 1; the number of channels in its output matrix equals the number of its convolutional kernels, while the number of channels of each kernel equals the number of channels in its input matrix.
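The two-step structure described above can be illustrated with a naive NumPy implementation (valid padding, stride 1; a didactic sketch, not the optimized library operation):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable convolution, valid padding, stride 1.

    x:          (H, W, C_in) input feature map
    dw_kernels: (k, k, C_in) one spatial kernel per input channel (depth-wise step)
    pw_kernels: (C_in, C_out) 1x1 kernels mixing channels (pointwise step)
    """
    h, w, c_in = x.shape
    k = dw_kernels.shape[0]
    oh, ow = h - k + 1, w - k + 1
    # Depth-wise step: each kernel convolves with exactly one input channel.
    dw = np.zeros((oh, ow, c_in))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * dw_kernels[:, :, c])
    # Pointwise step: a 1x1 convolution is a matrix product over channels,
    # so the number of output channels equals the number of 1x1 kernels.
    return dw @ pw_kernels

x = np.random.default_rng(1).standard_normal((8, 8, 3))
dw_k = np.ones((3, 3, 3))
pw_k = np.ones((3, 16))
y = depthwise_separable_conv(x, dw_k, pw_k)
# y.shape == (6, 6, 16): 16 pointwise kernels give 16 output channels.
```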
Network structure of MobileNetV2
In MobileNetV2, convolutional layers, bottleneck layers, and an average pooling layer form the basic network structure. The structure of the bottleneck layers is described in reference [18]; each usually includes a pointwise convolution and a depth-wise convolution, and when the stride is 1 the input is added to the output. Another structural feature of MobileNetV2 is the inverted residual [18]. In addition, ReLU6 serves as the activation function in the inverted residuals; it is defined by the expression:
$$\text{y}=\text{ReLU6}\left(\text{x}\right)=\text{min}\left(\text{max}\left(\text{x},0\right),6\right)$$
1
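The clipping behavior of ReLU6 (Equation 1) can be verified numerically with a short sketch:

```python
import numpy as np

def relu6(x):
    """ReLU6: clips activations to the range [0, 6] (Equation 1)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

x = np.array([-3.0, 0.0, 2.5, 6.0, 9.0])
print(relu6(x).tolist())  # -> [0.0, 0.0, 2.5, 6.0, 6.0]
```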
Fine-tuning strategy
In this study, we designed two fine-tuning strategies, S0 and S1, for MobileNetV1 and MobileNetV2. In S0, all parameters were frozen (non-trainable) except those in the fine-tuned fully connected layers, whereas in S1 all parameters were trainable and, together with the parameters in the fine-tuned fully connected layers, participated in model training (Fig. 2). In this way, four models were generated for our study: MobileNetV1_False (V1_False), MobileNetV1_True (V1_True), MobileNetV2_False (V2_False), and MobileNetV2_True (V2_True).
The network structure was not changed throughout the training process. We selected parameter convergence and generalization capacity as the primary outcome measures for the DTL models.
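The two strategies above can be sketched in a framework-agnostic way, assuming a Keras-style model in which each layer exposes a boolean `trainable` flag (class and function names here are illustrative, not the authors' code):

```python
class Layer:
    """Minimal stand-in for a network layer with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def apply_strategy(base_layers, head_layers, strategy):
    """S0: freeze the pretrained base and train only the new fully
    connected head. S1: leave every layer trainable."""
    for layer in base_layers:
        layer.trainable = (strategy == "S1")
    for layer in head_layers:
        layer.trainable = True  # the fine-tuned head is always trained
    return base_layers + head_layers

base = [Layer(f"conv_{i}") for i in range(5)]
head = [Layer("fc_new")]
frozen = apply_strategy(base, head, "S0")  # corresponds to V1_False / V2_False
```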
Hyperparameter settings
We used binary cross-entropy as the loss function. To explore suitable hyperparameter combinations for the DTL workflow, we trained a DTL model for each classification task and each hyperparameter combination (Fig. 3). The input image size was 224 × 224. Training on the DCE-BMRI images required 60 epochs with a batch size of 64 images. In addition, the activation functions in the fine-tuned fully connected layers were ReLU and sigmoid, as shown in Equations 2 and 3:
$$\text{ReLU}\left(\text{x}\right)=\text{f}\left(\text{x}\right)=\left\{\begin{array}{c}x, x\ge 0\\ 0, x<0\end{array}\right.$$
2
$$\text{s}\text{i}\text{g}\text{m}\text{o}\text{i}\text{d}\left(\text{x}\right)=\text{f}\left(\text{x}\right)=\frac{1}{1+{\text{e}}^{-\text{x}}}$$
3
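The binary cross-entropy loss used for the benign (0) versus malignant (1) task, combined with the sigmoid output of Equation 3, can be sketched as follows (the epsilon clipping is a standard numerical-stability assumption, not a detail taken from the paper):

```python
import numpy as np

def sigmoid(x):
    """Equation 3: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy averaged over a batch; probabilities are
    clipped away from 0 and 1 to keep the logarithms finite."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_prob)
                           + (1 - y_true) * np.log(1 - y_prob))))

y_true = np.array([1.0, 0.0, 1.0, 0.0])   # 1 = malignant, 0 = benign
logits = np.array([2.0, -1.5, 0.3, -3.0])
loss = binary_cross_entropy(y_true, sigmoid(logits))
```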
Evaluation metrics
To compare the performance of the DTL models, five performance indices were calculated: accuracy (Ac), precision (Pr), recall (Rc), F1 score (\(\text{f}1\)), and the area under the receiver operating characteristic (ROC) curve (AUC) [20]. The malignant and benign groups were assigned as the positive and negative cases, respectively. Hence, true positives (TP) and true negatives (TN) represent the numbers of correctly diagnosed malignant and benign lesions, respectively, while false positives (FP) and false negatives (FN) represent the numbers of benign lesions misdiagnosed as malignant and malignant lesions misdiagnosed as benign, respectively. The mathematical formulations of Ac, Pr, Rc, and \(\text{f}1\) are as follows:
$$\text{A}\text{c}=\frac{\text{T}\text{P}+\text{T}\text{N}}{\text{T}\text{P}+\text{T}\text{N}+\text{F}\text{P}+\text{F}\text{N}}$$
4
$$\text{P}\text{r}=\frac{\text{T}\text{P}}{\text{T}\text{P}+\text{F}\text{P}}$$
5
$$\text{R}\text{c}=\frac{\text{T}\text{P}}{\text{T}\text{P}+\text{F}\text{N}}$$
6
$$\text{f}1=\frac{2\times \text{P}\text{r}\times \text{R}\text{c}}{\text{P}\text{r}+\text{R}\text{c}}$$
7
Ac does not take the class distribution of the data into account. \(\text{f}1\) is a balanced metric determined by precision and recall, and is useful when the classes in the dataset are imbalanced.
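Equations 4 to 7 translate directly into code; the confusion-matrix counts below are hypothetical, chosen only to illustrate the computation on a test set of 220 malignant and 210 benign images:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix
    counts, with malignant as the positive class (Equations 4-7)."""
    ac = (tp + tn) / (tp + tn + fp + fn)
    pr = tp / (tp + fp)
    rc = tp / (tp + fn)
    f1 = 2 * pr * rc / (pr + rc)
    return ac, pr, rc, f1

# Hypothetical example: 200 of 220 malignant and 190 of 210 benign
# test images classified correctly.
ac, pr, rc, f1 = classification_metrics(tp=200, tn=190, fp=20, fn=20)
```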
The false positive rate (FPR) is the number of negatives incorrectly classified as positive by the model divided by the total number of negatives. It can be evaluated as:
$$\text{F}\text{P}\text{R}=\frac{\text{F}\text{P}}{\text{F}\text{P}+\text{T}\text{N}}$$
8
The false negative rate (FNR) is the number of positives misclassified as negative by the model divided by the total number of positives. It can be evaluated as:
$$\text{F}\text{N}\text{R}=\frac{\text{F}\text{N}}{\text{F}\text{N}+\text{T}\text{P}}$$
9
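Equations 8 and 9 can likewise be computed directly from the confusion-matrix counts; the counts below reuse the same hypothetical example as above:

```python
def error_rates(tp, tn, fp, fn):
    """False positive rate (Equation 8) and false negative rate (Equation 9)."""
    fpr = fp / (fp + tn)  # fraction of benign lesions called malignant
    fnr = fn / (fn + tp)  # fraction of malignant lesions called benign
    return fpr, fnr

# Hypothetical counts: 20 of 210 benign and 20 of 220 malignant misclassified.
fpr, fnr = error_rates(tp=200, tn=190, fp=20, fn=20)
# fpr == 20/210, fnr == 20/220
```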