2.1 Study population
The breast MRI images were obtained from the Duke Breast Cancer MRI dataset (publicly available online at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226903). The dataset comprises a retrospectively collected single-institution cohort of 922 patients with biopsy-confirmed invasive breast cancer who underwent preoperative MRI between January 1, 2000 and March 23, 2014. The MR images are available in DICOM format and include non-fat-saturated T1-weighted, fat-saturated gradient echo T1-weighted pre-contrast, and post-contrast sequences. An overview of the study population used in the training, validation, and test sets can be found in Table 1. Other relevant clinical data, such as tumor characteristics, genomic data, and clinical outcomes, are also included in the public dataset. More information on the dataset can be found in a previous publication.11
Table 1. Overview of the study population.

| Characteristic | Patients (n = 127) |
| --- | --- |
| Age, yrs (mean ± SD) | 53.1 ± 10.8 |
| Race, no. (%) | |
| • White | 91 (71.6) |
| • Black | 28 (22.0) |
| • Hispanic | 3 (2.3) |
| • Other | 5 (4.1) |
| Menopause status, no. (%) | |
| • Positive | 72 (56.7) |
| • Negative | 53 (41.7) |
| • Not reported | 2 (1.6) |
2.2 Magnetic resonance imaging protocol
With the patients placed in a prone position, MRI was performed on 1.5 T or 3.0 T scanners (GE Healthcare and Siemens) to obtain axial breast images. Slice thickness was 1.04–2.5 mm, repetition time was 3.54–7.39 ms, and echo time was 1.25–2.76 ms. The acquisition matrix ranged from 320 \(\times\) 320 to 448 \(\times\) 448. Flip angle was 7–12 degrees and field of view was 250–480 mm.
2.3 Image selection and annotation
From the Duke Breast Cancer MRI dataset, we selected 127 fat-saturated gradient echo T1-weighted pre-contrast MRI studies. For each study, three image slices were chosen for breast segmentation and three image slices were chosen for FGT segmentation. To ensure that the selected slices covered the full extent of each volume, the following selection rules were applied (illustrated in the sketch below). For breast segmentation, the MRI volume was evenly divided into thirds and one image slice was randomly selected from each third. For FGT segmentation, the volume was evenly divided into fourths; two image slices were randomly selected from the middle half (since most fibroglandular tissue is concentrated in the middle of the breast) and one image slice was randomly selected from the first or last quarter.
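As an illustration, the selection rules can be expressed in Python as follows; the function name, interface, and use of Python's random module are illustrative assumptions rather than part of our published code:

```python
import random

def select_slices(num_slices, seed=None):
    """Pick three slices for breast segmentation and three for FGT
    segmentation, following the selection rules described above."""
    rng = random.Random(seed)

    # Breast segmentation: one random slice from each third of the volume.
    third = num_slices // 3
    breast_slices = [
        rng.randrange(0, third),
        rng.randrange(third, 2 * third),
        rng.randrange(2 * third, num_slices),
    ]

    # FGT segmentation: two distinct random slices from the middle half
    # (second and third quarters) and one from the first or last quarter.
    quarter = num_slices // 4
    fgt_slices = rng.sample(range(quarter, 3 * quarter), 2)
    outer = list(range(0, quarter)) + list(range(3 * quarter, num_slices))
    fgt_slices.append(rng.choice(outer))

    return sorted(breast_slices), sorted(fgt_slices)
```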
Manual segmentation was performed on the selected images using 3D Slicer (https://www.slicer.org/) to serve as ground truth.12 For breast segmentation, the outer contour of each breast was traced; for FGT segmentation, manual thresholding was applied to classify the FGT voxels. All manual segmentations were reviewed and confirmed by a breast radiologist with seven years of experience.
All segmentation masks, along with additional information about the dataset, can be found on our website: https://sites.duke.edu/mazurowski/resources/breast-cancer-mri-dataset-fgt-and-breast-segmentation-2d/.
Our study was determined to be exempt from review by the Duke Health Institutional Review Board, which also waived the requirement for informed consent because we used a publicly available database. All methods were performed in accordance with the relevant guidelines and regulations.
2.4 Automated image segmentation using U-net
The U-net architecture has been successfully used in multiple medical imaging segmentation tasks.13–18 This architecture consists of three sections: contraction, bottleneck, and expansion. The contraction section is made of several contraction blocks, each of which applies two 3\(\times\)3 convolution layers followed by 2\(\times\)2 max pooling. The number of kernels, or feature maps, doubles after each block so that the architecture can effectively learn complex structures. The bottleneck mediates between the contraction and expansion sections; it applies two 3\(\times\)3 convolution layers followed by a 2\(\times\)2 up-convolution layer. Each expansion block upsamples the feature maps with a 2\(\times\)2 up-convolution, concatenates them with the corresponding feature maps from the contraction section via a skip connection, and applies two 3\(\times\)3 convolution layers; a final 1\(\times\)1 convolution maps the features to the per-pixel segmentation output.
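For illustration, a minimal U-net of this form is sketched below in PyTorch; the network depth, channel widths, and ReLU activations are illustrative assumptions rather than the exact configuration in our published code:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by ReLU (padding keeps size fixed).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    """Minimal U-net: contraction path, bottleneck, and expansion path
    with skip connections. Depth and widths here are illustrative."""

    def __init__(self, in_ch=1, out_ch=1, base=64):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)  # 2x2 max pooling
        self.bottleneck = conv_block(base * 4, base * 8)
        self.up3 = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)  # 2x2 up-conv
        self.dec3 = conv_block(base * 8, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, out_ch, kernel_size=1)  # per-pixel logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))  # skip connection
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # sigmoid + thresholding are applied at inference
```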
Before training the network, we preprocessed the dataset in two steps. First, we normalized the image intensity to the range of zero to one based on the 5th and 95th intensity percentiles. Next, all images were resized to 512\(\times\)512 using bilinear interpolation.19
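These two steps correspond to the following Python sketch; the use of OpenCV for resizing and the clipping of values outside the percentile window are assumptions of this sketch:

```python
import numpy as np
import cv2  # OpenCV, assumed here for the bilinear resize

def preprocess(image):
    """Normalize intensities to [0, 1] using the 5th/95th percentiles,
    then resize to 512x512 with bilinear interpolation."""
    p5, p95 = np.percentile(image, (5, 95))
    image = (image.astype(np.float32) - p5) / max(p95 - p5, 1e-8)
    image = np.clip(image, 0.0, 1.0)  # clip values outside the percentile window
    return cv2.resize(image, (512, 512), interpolation=cv2.INTER_LINEAR)
```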
We selected 100 cases for development and 27 cases for testing. During development, 10% of the training cases (10 cases) were randomly chosen as the validation set. The models for breast and FGT segmentation were trained independently using the same U-net structure. The network was trained from random weights initialized with He initialization.20 Training ran for a total of 200 epochs before convergence. The Adam optimizer was used for back-propagation of gradients. To prevent overfitting, an L2 regularization term was added to the binary cross-entropy loss. An initial learning rate of 1e-4 was used and decayed by a factor of 0.8 every 40 steps. The final segmentation masks were predicted with a threshold of 0.5. These hyperparameters were chosen after iterative tests on the validation set with varying learning rates, batch sizes, and total epochs.
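The following PyTorch sketch summarizes this optimization setup; the L2 regularization strength is an assumed placeholder value, and the exact implementation is in the code repository linked below:

```python
import torch
from torch import nn, optim

model = UNet()  # from the architecture sketch in Section 2.4
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on the logits
optimizer = optim.Adam(
    model.parameters(),
    lr=1e-4,            # initial learning rate
    weight_decay=1e-5,  # L2 regularization strength (assumed, not reported)
)
# Decay the learning rate by a factor of 0.8 every 40 steps.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.8)

def train_step(images, masks):
    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()   # back-propagation of gradients
    optimizer.step()
    scheduler.step()
    return loss.item()

def predict(images):
    # Probabilities above the 0.5 threshold become foreground pixels.
    with torch.no_grad():
        return (torch.sigmoid(model(images)) > 0.5).float()
```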
All of the code used for data preprocessing, model training, and model evaluation can be found online at: https://github.com/MaciejMazurowski/Breast-dense-tissue-segmentation
2.5 Evaluation of the model for segmentation accuracy
Performance of our automated segmentation model was assessed with the Dice similarity coefficient (DSC), a statistical validation metric that measures the similarity between two datasets and is a commonly used spatial overlap index for evaluating image segmentation models. DSC values range from 0 (no spatial overlap between two sets of binary segmentation results) to 1 (complete overlap). For this study, DSCs were calculated by comparing the segmentations produced by the model against the manual segmentations on the selected images in the training and test sets.
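For a predicted segmentation \(X\) and the corresponding manual segmentation \(Y\), the DSC is defined as:

$$DSC=\frac{2\left|X\cap Y\right|}{\left|X\right|+\left|Y\right|}$$

where \(\left|X\cap Y\right|\) is the number of voxels segmented by both methods, and \(\left|X\right|\) and \(\left|Y\right|\) are the numbers of voxels in each segmentation.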
2.6 Evaluation of the model for FGT percentage assessment
To evaluate the performance of the model in assessing overall breast density, we used an extended test set consisting of all slices from 50 cases (including the 27 test-set cases used in the previous sections). Three fellowship-trained, board-certified breast radiologists reviewed the pre-contrast images of these studies and classified the FGT into four categories according to the 2013 Breast Imaging-Reporting and Data System (BI-RADS) Atlas: almost entirely fat, scattered fibroglandular tissue, heterogeneous fibroglandular tissue, and extreme fibroglandular tissue (categories a, b, c, and d, respectively).
The FGT percentage for the algorithm was computed based on the breast and FGT segmentations using the following formula21:
$$\text{FGT percentage}=\frac{\left|\text{FGT}\right|}{\left|\text{Breast}\right|} \times 100$$
where |FGT| is the total volume of FGT and |Breast| is the total volume of the breast. To calculate |FGT| and |Breast|, we applied the trained breast and FGT segmentation models to the whole MRI study (i.e., all image slices) in the test set to obtain the predicted breast and FGT masks. |FGT| was then calculated by summing the FGT area across all slices of the MRI study, and |Breast| was computed in the same fashion (Fig. 1).
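For illustration, this computation reduces to the following Python sketch; the array names and shapes are illustrative:

```python
import numpy as np

def fgt_percentage(breast_masks, fgt_masks):
    """Compute the volumetric FGT percentage from per-slice binary masks.

    breast_masks, fgt_masks: arrays of shape (num_slices, H, W) holding
    the predicted binary masks for the whole MRI study.
    """
    breast_volume = np.sum(breast_masks)  # |Breast|: foreground summed over slices
    fgt_volume = np.sum(fgt_masks)        # |FGT|
    return 100.0 * fgt_volume / breast_volume
```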
The radiologist assessments were regarded as ground truth and were compared to the FGT percentage calculated by our algorithm. We assessed the association between the algorithm and the radiologists using Spearman’s correlation coefficient (r). The inter-reader variability among the three radiologists was measured with Cohen’s kappa coefficient (κ).
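For illustration, these statistics can be computed as follows; the data arrays below are placeholders standing in for the study measurements, and computing pairwise kappas is one assumed way to summarize agreement among three readers with this statistic:

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Placeholder data: algorithm FGT percentages for five studies and three
# radiologists' BI-RADS FGT categories encoded as 0=a, 1=b, 2=c, 3=d.
algorithm_fgt = np.array([4.2, 11.8, 27.5, 9.1, 41.0])
radiologist_ratings = np.array([
    [0, 1, 2, 1, 3],
    [0, 1, 2, 1, 3],
    [0, 2, 2, 1, 3],
])

# Spearman correlation between the algorithm and each reader's categories.
for reader in range(3):
    r, p = spearmanr(algorithm_fgt, radiologist_ratings[reader])
    print(f"Reader {reader + 1}: r = {r:.2f} (p = {p:.3f})")

# Pairwise Cohen's kappa between the three readers.
for i, j in combinations(range(3), 2):
    kappa = cohen_kappa_score(radiologist_ratings[i], radiologist_ratings[j])
    print(f"Readers {i + 1} vs {j + 1}: kappa = {kappa:.2f}")
```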