Deep Learning Based Pectoral Muscle Segmentation on MIAS Mammograms

Background: The purpose of this study was to propose a deep learning-based method for automated detection of the pectoral muscle, in order to reduce misdetection in a computer-aided diagnosis (CAD) system for diagnosing breast cancer in mammography. This study also aimed to assess the performance of the deep learning method for pectoral muscle detection by comparing it to an image processing-based method using the random sample consensus (RANSAC) algorithm.
Methods: Using the 322 images in the Mammographic Image Analysis Society (MIAS) database, the pectoral muscle detection model was trained with the U-Net architecture. Of the total data, 80% was allocated as training data and 20% was allocated as test data, and the performance of the deep learning model was tested by 5-fold cross validation.
Results: The image processing-based method for pectoral muscle detection using RANSAC showed 92% detection accuracy. Using the 5-fold cross validation, the deep learning-based method showed a mean sensitivity of 95.55%, mean specificity of 99.88%, mean accuracy of 99.67%, and mean Dice similarity coefficient (DSC) of 95.88%.
Conclusions: The proposed deep learning-based method of pectoral muscle detection performed better than an existing image processing-based method. In the future, by collecting data from various medical institutions and devices to further train the model and improve its reliability, we expect that this model could greatly reduce misdetection rates by CAD systems for breast cancer diagnosis.


Background
Among breast imaging methods, mammography is widely used in screening for breast cancer [1].
However, the results can be difficult to interpret for dense breasts, which leads to a high risk of misdiagnosis [2]. For this reason, there has been ongoing research on computer-aided diagnosis (CAD) systems, in order to reduce the number of misdiagnoses and to improve the accuracy of diagnosis by radiologists using mammography [3][4][5]. CAD systems use computer algorithms to enable objective and accurate detection of lesions that are difficult to distinguish with the naked eye. However, accurate lesion detection by these CAD systems can be negatively affected for various reasons. In particular, in CAD systems used to diagnose breast cancer, the pectoral muscle shows pixel intensity similar to that of lesions in right mediolateral oblique (RMLO) and left mediolateral oblique (LMLO) views, which can cause misdetection [6]. To prevent this, a separate pectoral muscle detection algorithm is required.
In 2016, we developed an image processing-based automated pectoral muscle detection algorithm using the random sample consensus (RANSAC) algorithm on images from the Mammographic Image Analysis Society (MIAS) database [7]. This algorithm had a detection accuracy of 92.2%, which was higher than that found in other studies using the MIAS database [8][9][10][11][12]. Nevertheless, detection accuracy was poor in some images due to the complex shape of the pectoral muscle, and so the algorithm needed to be improved.
Recent advances in hardware have created a favorable environment for deep learning techniques, which have been applied in various fields. Convolutional neural networks (CNNs) especially, as one type of deep learning technique, have been used with outstanding results in various imaging fields [13,14]. In medical imaging as well, numerous studies using CNNs have reported better performance than conventional image processing techniques [15][16][17]. Thus, in this study, we aimed to use deep learning in pectoral muscle detection, to improve upon the previously encountered problems for the complex-shaped pectoral muscle, and to enhance detection accuracy. The deep learning model for pectoral muscle detection was trained using the same MIAS database as before, and the performance of the algorithm was assessed in comparison to the results of the image processing-based method using RANSAC.

Data
This study used mammograms from the mini-MIAS database, which is one of several open access databases. The MIAS database consists of scans of 322 mammogram films taken as part of the United Kingdom's national breast cancer screening program [18]. All the images include RMLO or LMLO views, and the image sizes are 1,024 × 1,024 pixels. For annotation data to use in learning, regions of interest (ROIs) for the pectoral muscle were drawn directly by a specialist. A binary mask was made, using 0 for the background and 1 for the pectoral muscle, and this was used as the ground truth. Figure 1 shows one of the binary mask images used as the ground truth.
The size of the data set used in this study was not large, and so 5-fold cross validation was used to ensure that the model would be robust in terms of data dependency. For each cross validation, 80% of the overall data (257-258 scans) was used as learning data, and 20% (64-65 scans) was separately constructed and used as test data. Each data point was used exactly once as test data, without duplication.
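The fold construction described above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code; the function name and random seed are assumptions, but the fold sizes (two folds of 65 scans, three of 64, so 257-258 training scans per fold) match the figures in the text.

```python
import random

def five_fold_splits(n_scans=322, seed=0):
    """Partition scan indices into 5 disjoint test folds (~20% each).

    Each scan appears in exactly one test fold, so every data point
    is used exactly once as test data, without duplication.
    """
    idx = list(range(n_scans))
    random.Random(seed).shuffle(idx)
    base, remainder = divmod(n_scans, 5)   # 64 per fold, 2 scans left over
    folds, start = [], 0
    for k in range(5):
        size = base + (1 if k < remainder else 0)  # fold sizes: 65, 65, 64, 64, 64
        folds.append(idx[start:start + size])
        start += size
    return folds

folds = five_fold_splits()
for test_idx in folds:
    # the remaining 257-258 scans form the training set for this fold
    train_idx = [i for f in folds if f is not test_idx for i in f]
```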

Development environment
The system for deep learning consisted of 4 NVIDIA TITAN Xp (NVIDIA Corp., Santa Clara, CA, USA) graphics processing units (GPUs), a Xeon E5-1650 v4 (Intel Corp., Santa Clara, CA, USA) central processing unit (CPU), and 128 GB of random access memory (RAM). Deep learning was conducted using Python 2.7.6 and the Keras 2.1.5 framework with a TensorFlow backend in the Ubuntu 14.04 operating system.

Data augmentation
The images from the MIAS database used in this experiment were insufficient to train the deep learning model. Data augmentation was performed to acquire a sufficient quantity of learning data [19]. An arbitrary combination of flips, rotations, translations, and stacking was used to expand the learning data set 20-fold.
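A minimal sketch of such an augmentation step is shown below. The paper does not specify the exact parameter ranges, so the flip probability, rotation angles, and translation magnitude here are assumptions for illustration; the key point is that the image and its pectoral-muscle mask must be transformed together.

```python
import numpy as np

def augment(image, mask, rng):
    """Apply one random combination of flip, 90-degree rotation, and
    translation to an image and its binary mask together.

    Parameter ranges are illustrative, not taken from the paper.
    Translation is approximated here with a wrap-around roll.
    """
    if rng.random() < 0.5:                      # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                 # rotate by a multiple of 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    dy, dx = rng.integers(-20, 21, size=2)      # small random shift in pixels
    image = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    mask = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return image, mask

rng = np.random.default_rng(42)
img = np.zeros((1024, 1024), dtype=np.float32)  # placeholder mammogram
msk = np.zeros((1024, 1024), dtype=np.uint8)    # placeholder ground-truth mask
pairs = [augment(img, msk, rng) for _ in range(20)]  # 20-fold expansion
```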
Training the deep learning model

For the CNN in this study, the U-Net model was used. One advantage of the U-Net model is that skip connections pass feature maps from the encoding phase directly to the decoding phase (Fig. 2); as a result, when images are reconstructed by the network, fine details of the original image are preserved, meaning that the output images have excellent quality [20,21]. For training, the batch size was set to 8, the number of epochs was fixed at 300, and the Adam optimizer was used. The learning rate was set to 0.001 up to Epoch 100, 0.0001 from Epoch 100 to 250, and 0.00001 from Epoch 250 to 300.
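The stepwise learning-rate schedule described above can be expressed as a plain function of the epoch index. In the Keras environment the paper uses, such a function can be passed to the standard `keras.callbacks.LearningRateScheduler` callback; how the authors actually implemented the schedule is not stated, so this is only a sketch.

```python
def learning_rate(epoch):
    """Step schedule from the paper: 1e-3 up to Epoch 100,
    1e-4 from Epoch 100 to 250, and 1e-5 from Epoch 250 to 300."""
    if epoch < 100:
        return 1e-3
    if epoch < 250:
        return 1e-4
    return 1e-5

# In Keras this would be wired in as, e.g.:
#   model.fit(x, y, batch_size=8, epochs=300,
#             callbacks=[keras.callbacks.LearningRateScheduler(learning_rate)])
```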

Results
In this study, deep learning was used to train a model to detect pectoral muscle in MIAS database images. The trained model was applied to separately constructed test data to assess its performance. Figure 3 compares the ground truth data from the test data with the results automatically extracted using the trained model.
We performed 5-fold cross validation to ensure that the model was robust in terms of data dependency.
For each cross validation, 20% of the total data was used as test data, and each data point was used exactly once as test data, without duplication. For each cross validation, the model was tested using 4 statistical indices: sensitivity, specificity, accuracy, and Dice similarity coefficient (DSC). The extracted results of the deep learning model were compared pixel-by-pixel with the ground truth data; the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts were obtained; and the statistical indices were calculated as sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy = (TP + TN)/(TP + TN + FP + FN), and DSC = 2TP/(2TP + FP + FN). From the results of the 5 cross validations, the mean sensitivity was 95.55%, the mean specificity was 99.88%, the mean accuracy was 99.67%, and the mean DSC was 95.88% (Table 1). The deep learning-based pectoral muscle detection algorithm was assessed using the same method as our previous study on an image processing-based method using the RANSAC algorithm, and the results of the two models were compared. We assessed the differences between the automated detection results of the deep learning model and the manually drawn ground truth data. Concordance ≥ 90% between the deep learning-based automated detection and the manual detection images was defined as "good", concordance ≥ 50% and < 90% was defined as "acceptable", and concordance < 50% was defined as "unacceptable". The previous method using the RANSAC algorithm showed 264 "good" results, whereas the deep learning model showed 322 "good" results (Table 2). The FP and FN rates of the previous method were, respectively, 4.51 ± 6.53% and 5.68 ± 8.57% (Table 3). In contrast, the FP and FN rates of the deep learning method were, respectively, 2.88 ± 6.05% and 4.27 ± 8.72%.
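The pixel-by-pixel evaluation described above follows the standard definitions of these four indices, and can be sketched as follows. The function name is an assumption; the formulas are the conventional ones.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Pixel-wise sensitivity, specificity, accuracy, and DSC
    for a predicted binary mask against the ground-truth mask."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)       # muscle pixels correctly detected
    tn = np.sum(~pred & ~truth)     # background pixels correctly rejected
    fp = np.sum(pred & ~truth)      # background wrongly marked as muscle
    fn = np.sum(~pred & truth)      # muscle pixels missed
    return {
        "sensitivity": tp / (tp + fn),        # TP / (TP + FN)
        "specificity": tn / (tn + fp),        # TN / (TN + FP)
        "accuracy": (tp + tn) / pred.size,    # (TP + TN) / all pixels
        "dsc": 2 * tp / (2 * tp + fp + fn),   # 2*TP / (2*TP + FP + FN)
    }
```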

Discussion
This study proposed a pectoral muscle detection method using deep learning with the MIAS database.
Although the proposed method used all the images in the MIAS database, there was too little data to construct a separate validation set. We used 5-fold cross validation to supplement the shortage of validation data and to minimize the data dependency of the model. The proposed model showed high accuracy, with a mean sensitivity of 95.55% and mean DSC of 95.88%.
We also compared the results of the deep learning-based pectoral muscle detection method with a previous image processing-based pectoral muscle detection method using the RANSAC algorithm. While the RANSAC method showed "unacceptable" results for 22 images, the deep learning method did not show "unacceptable" results for even a single image. Moreover, when the misdetection rate was inspected, the RANSAC algorithm showed FP and FN rates < 5% for only 56.5% of the images, whereas the deep learning algorithm showed a higher proportion of images (71.0%) with FP and FN rates < 5%.
These results demonstrate that the deep learning algorithm achieved more accurate and more stable detection results than the RANSAC algorithm.
In image processing, RANSAC is an algorithm for approximation. Approximation cannot guarantee the detection of an accurate pectoral muscle region. Although there have been attempts to approximate the area of a curved pectoral muscle using nonlinear RANSAC, it is quite difficult to accurately approximate the pectoral muscle. This weakness is thought to have led to the 22 "unacceptable" results. In contrast, because deep learning makes an overall judgment about the shape and the relationship between attenuation and pixel values based on the training images, it requires generalized, diverse training data. In this study, we used the same data set in order to compare the new model objectively with the previous image processing-based pectoral muscle detection algorithm. The data set was somewhat small to train a deep learning model. The fact that we were still able to obtain relatively good results is thought to be because the shape and position of the pectoral muscle was fairly consistent, and did not differ greatly between patients.
Although the possibility of overfitting can be suspected, these doubts can be partially resolved based on the results of cross validation.
This study had some limitations. In current clinical settings, digital mammography is used in most instances, but the MIAS database contains data obtained by scanning film mammograms. Therefore, there is no guarantee that a deep learning model trained on MIAS data will show good results for digital mammograms. In order to resolve this issue, it will be necessary to collect more digital mammograms, and to further train a deep learning model based on the collected data. It will also be important to collect data from diverse medical institutions and devices, and to objectively validate the deep learning model through multi-center validation. We expect that these additional studies would further enhance the clinical reliability of the deep learning-based pectoral muscle detection method proposed in this report.

Conclusions
The experiments in this study demonstrated that the deep learning-based pectoral muscle detection method achieved a more accurate and stable detection rate than a previous image processing-based method. We expect that this type of deep learning technique could be useful to overcome the limitations of conventional image processing techniques.