3.1. Overview
This paper proposes a multi-class semantic segmentation method for breast tissues to reconstruct the breast shape in a standing position using a U-Net based on Haar wavelet pooling. Labeled breast tissue data are necessary to train deep learning networks. MRI images, which are essential for breast cancer screening, have higher resolution and lower noise than CT or ultrasound images. Moreover, MRI images are effective for representing and segmenting fine soft tissues in the breast, such as fat and fibroglandular tissue.
Figure 1 shows the breast tissue segmentation process based on a deep learning network for breast shape reconstruction. In the first step, MRI images in Digital Imaging and Communications in Medicine (DICOM) format were collected. In the second step, breast tissues in the collected MRI images were labeled to build a dataset for training the deep learning network. Labeling consisted of removing noise from the MRI images with a median filter, segmenting tissues with Otsu's threshold algorithm, extending the segmentation to all MRI slices with a template-based method, and verifying the labels with a radiologist. In the third step, a U-Net based on Haar wavelet pooling was designed to segment breast tissues for breast shape reconstruction. To improve multi-class semantic segmentation performance on MRI breast images, this study combined U-Net with Haar wavelet pooling. The U-Net based on Haar wavelet pooling uses the LL sub-band, which holds the approximate values of the input image, and three high-frequency sub-bands (LH, HL, and HH), which hold detailed edge features. Breast tissue features could therefore be extracted effectively while reducing the loss of image information during subsampling. The U-Net based on Haar wavelet pooling was trained with the constructed dataset and its performance was then tested.
3.2. Building an MRI dataset for breast tissue segmentation
A dataset including a label for every pixel of the MRI data is required for breast tissue segmentation with deep learning. In this study, an MRI dataset was collected from the breast-diagnosis database [22] of The Cancer Imaging Archive (TCIA), an open-access database of medical images for cancer research. The breast-diagnosis database contains medical images of breast cancer patients as well as breast-diagnosis cases such as high-risk normal, DCIS, fibroids, and lobular carcinoma. Each image was captured with three pulse sequences (T2, STIR, BLISS) using a Philips 1.5 T MRI system. Breast MRI images of 89 breast cancer patients were obtained at 2 mm slice intervals at resolutions of 500 to 600 DPI, with 80 to 90 MRI slices per person. This study used the MRI slice images of the T2-weighted pulse sequence data.
Figure 2 shows the step-by-step process of labeling breast tissues in MRI slice images. The breast tissue should be segmented into skin, fat, fibroglandular tissue, and background because the upper part of the pectoral muscle is incised in a mastectomy. As shown in Fig. 2(a), the noise of the slice image was removed using a median filter. Otsu's threshold algorithm was then used to segment breast tissues from the denoised MRI images. This algorithm separates foreground from background with a threshold based on the distribution of pixel values in the image. Multiple thresholds were used to separate breast tissues with similar pixel distributions.
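Otsu's method chooses the threshold that maximizes the between-class variance of the intensity histogram. A minimal single-threshold sketch in NumPy (the labeling pipeline above applies multiple thresholds in succession; the function name is illustrative, not from the paper):

```python
import numpy as np

def otsu_threshold(pixels, nbins=256):
    """Return the intensity threshold maximizing between-class variance."""
    hist, edges = np.histogram(pixels, bins=nbins)
    hist = hist.astype(float) / hist.sum()          # normalized histogram
    centers = (edges[:-1] + edges[1:]) / 2          # bin-center intensities
    best_t, best_var = centers[0], -1.0
    for i in range(1, nbins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (hist[:i] * centers[:i]).sum() / w0   # class means
        mu1 = (hist[i:] * centers[i:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t
```

For a bimodal intensity distribution such as fibroglandular tissue against background, the returned threshold falls between the two modes.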
Figure 2(b) shows the result of segmenting fibroglandular tissue (yellow region) from the background (light blue region) using Otsu's threshold algorithm, with fibroglandular tissue set as the foreground. Figure 2(c) shows the result of segmenting fat (green region) from the rest of the breast tissues (red region) through Otsu's threshold algorithm, with fat, muscle, and chest wall set as the foreground. Figure 2(d) shows the background (brown region) and the inside of the human body (green region) separated through Otsu's threshold algorithm. Skin data are lost in T2-weighted pulse sequence images. Hence, the boundary line between the green and brown regions was offset inward by pixels corresponding to a thickness of 2 mm, matching the thickness of human skin, and the result was used as skin data. These segmented skin data were validated against BLISS MRI images of the breast cancer patients. In this process, breast-diagnosis cases such as fibroma and breast cancer were recognized and segmented as fibroglandular tissue because they are conditions in which hormonal action expands the mammary glands. As shown in Fig. 2(f), the breast tissues obtained in the previous steps were integrated. Light blue, blue, green, and yellow regions represent the background, skin, fat, and fibroglandular tissue, respectively.
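One way to realize the skin offset is to peel a fixed-width band of pixels just inside the body/background boundary. A rough NumPy sketch using repeated 4-neighbour binary erosion (`thickness` is the band width in pixels derived from the 2 mm skin thickness; this is an illustration, not the authors' exact implementation):

```python
import numpy as np

def boundary_band(body_mask, thickness):
    """Return the `thickness`-pixel band just inside the body boundary."""
    m = body_mask.copy()
    for _ in range(thickness):
        # 4-neighbour binary erosion via shifted copies; the mask is
        # assumed not to touch the array border.
        eroded = m.copy()
        eroded[1:, :] &= m[:-1, :]
        eroded[:-1, :] &= m[1:, :]
        eroded[:, 1:] &= m[:, :-1]
        eroded[:, :-1] &= m[:, 1:]
        m = eroded
    # Skin band = body pixels removed by the erosion.
    return body_mask & ~m
```

Eroding a square body mask by one pixel, for example, leaves exactly its one-pixel perimeter as the band.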
To reduce manual labeling of breast tissues, we used template-based segmentation. Template-based segmentation uses a single segmented MRI slice as a reference template and extends its labels to the remaining MRI slices. With a segmented cross-sectional slice set as the reference template, it extracts individual breast tissue features after analyzing each tissue's location, size, shape, and pixel distribution values. The fully segmented MRI image set is then obtained by extracting breast tissues with similar characteristics from consecutive MRI slices. Figure 3 shows the process of segmenting the entire MRI slice set through template-based segmentation. T2-weighted pulse sequence data were input for breast tissue segmentation, and the segmented breast tissues were output at 800 × 800 DPI resolution. The data labeled with template-based segmentation were validated by a radiologist.
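The paper does not spell out the exact matching rule. As one much-simplified illustration of propagating template labels by pixel distribution alone, each pixel of an adjacent slice could be assigned the label of the tissue class whose mean intensity in the template is closest (the location, size, and shape cues the method also uses are omitted here; all names are hypothetical):

```python
import numpy as np

def propagate_labels(template_img, template_labels, next_img):
    """Assign each pixel of the next slice the label of the template
    tissue class with the nearest mean intensity."""
    classes = np.unique(template_labels)
    # Mean intensity of each tissue class in the reference template.
    means = np.array([template_img[template_labels == c].mean()
                      for c in classes])
    # Distance from every pixel to every class mean; pick the nearest.
    dist = np.abs(next_img[..., None] - means)    # (H, W, n_classes)
    return classes[np.argmin(dist, axis=-1)]
```

Because consecutive 2 mm slices change little, such nearest-distribution matching carries most labels across; ambiguous pixels are exactly what the radiologist's verification step catches.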
3.3. Designing U-Net based on Haar wavelet pooling
This study proposed a U-Net that uses Haar wavelet pooling in the subsampling stage. The wavelet transform carries information in both the spatial domain and the frequency domain; a wavelet is a zero-mean oscillation of finite duration. The wavelet transform can effectively detect sudden signal changes because it describes regional features and provides signal analysis at different scales and levels. The wavelet transform of a signal \(x\left(t\right)\) is defined by Eq. (1) [23]:
$${W}_{a}x\left(b\right)= \frac{1}{\sqrt{a}}\int_{-{\infty }}^{+{\infty }}x\left(t\right)\,{\varPsi }^{*}\!\left(\frac{t-b}{a}\right)dt,\qquad a>0 \tag{1}$$
where \(a\) is the scale parameter, \(b\) is the displacement, and \({\varPsi }^{\text{*}}\left(t\right)\) is the complex conjugate of a continuous basis function called the mother wavelet. In this study, a two-dimensional (2D) discrete Haar wavelet transform, which minimizes the amount of computation when transforming MRI breast images inside the deep learning network, was used.
The 2D wavelet transform represents the input image as a matrix of two-dimensional signals based on pixel brightness. Data passing through the 2D wavelet transform are divided into four bands according to the applied filters. Figure 4 shows the structure of wavelet pooling. Wavelet pooling proceeds in two steps; in each step, the input data are decomposed by a high-pass filter and a low-pass filter and then down-sampled, reducing the data size. In the first step, the horizontal filter separates the input image into a low-frequency component (L) and a high-frequency component (H): the low-frequency component carries the approximate values of the input image and the high-frequency component carries the detail values. In the second step, the vertical filter separates the low- and high-frequency components again, decomposing the data into four bands: LL, LH, HL, and HH. The resolution of every band is half that of the input data. The LL band holds the low-frequency component, representing the overall trend of the input data, while the LH, HL, and HH bands hold edge features for the vertical, horizontal, and diagonal components, respectively. Segmentation performance can be improved because the decomposed data represent fine features such as micro breast tissue structures.
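The two-step decomposition above can be sketched in NumPy with the orthonormal Haar filters, where each filtering step is a pairwise sum or difference followed by down-sampling (sub-band naming conventions vary between libraries; here the first letter denotes the horizontal pass):

```python
import numpy as np

def haar_pool(x):
    """Single-level 2D Haar decomposition of an (H, W) array.

    Returns LL, LH, HL, HH sub-bands, each of shape (H//2, W//2).
    """
    # Step 1: horizontal low/high-pass filtering with down-sampling.
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)   # low-frequency (L)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)   # high-frequency (H)
    # Step 2: vertical filtering of each component, again down-sampled.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh
```

For a constant image all three detail bands are zero and LL carries the (scaled) average, matching the description of LL as the overall trend of the input.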
U-Net consists of a contracting path that extracts features from the training data and an expansive path that restores the original resolution. The contracting path performs down-sampling by setting the convolution stride to two in each step, whereas the expansive path performs up-sampling using transposed convolutions. The max pooling used in the subsampling stage of previous studies is difficult to generalize because it is sensitive to overfitting of the dataset [24]. Although some studies have attempted to solve the vanishing gradient problem by passing information from the contracting path to the expansive path through skip connections, overfitting still occurs [25]. Breast tissues are fine structures spanning only a few pixels, so max pooling may lose information about them. This study therefore designed a deep learning network with a U-Net architecture that uses Haar wavelet pooling for subsampling to segment breast tissues.
Figure 5 shows the deep learning network architecture that combines Haar wavelet pooling with U-Net. The network was composed of 12 convolution layers, 5 Haar wavelet pooling layers, and 5 inverse-wavelet up-sampling layers. Input breast image data were converted into LL, LH, HL, and HH band data by Haar wavelet pooling, and the converted data were then passed to the convolution layers. The resolution was restored using the inverse wavelet transform, which reconstructs data from the wavelet pooling outputs. In the proposed architecture, batch normalization and a ReLU activation function were used with each convolution layer. Applying Haar wavelet pooling reduced the amount of computation compared with previous studies. Pooling itself did not change the number of channels in the network; however, because the U-Net based on Haar wavelet pooling used the LL, LH, HL, and HH bands simultaneously, the number of input channels to the subsequent convolution layer increased by a factor of four.
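The channel bookkeeping and the inverse-wavelet up-sampling can be checked with a NumPy sketch: stacking the four sub-bands of every feature map halves the spatial size and quadruples the channel count, and the inverse transform restores the input exactly (function names are illustrative; the actual network applies this per feature map inside the U-Net):

```python
import numpy as np

def haar_decompose(x):
    """Single-level 2D Haar transform of an (H, W) array into 4 sub-bands."""
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

def wavelet_pool(x):
    """(C, H, W) -> (4C, H/2, W/2): spatial size halves, channels quadruple."""
    bands = [b for c in x for b in haar_decompose(c)]
    return np.stack(bands)

def inverse_wavelet_unpool(y):
    """(4C, H/2, W/2) -> (C, H, W): exact inverse of wavelet_pool."""
    out = []
    for i in range(0, y.shape[0], 4):
        ll, lh, hl, hh = y[i], y[i + 1], y[i + 2], y[i + 3]
        # Undo the vertical filtering step.
        lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
        hi = np.empty_like(lo)
        lo[0::2, :] = (ll + lh) / np.sqrt(2)
        lo[1::2, :] = (ll - lh) / np.sqrt(2)
        hi[0::2, :] = (hl + hh) / np.sqrt(2)
        hi[1::2, :] = (hl - hh) / np.sqrt(2)
        # Undo the horizontal filtering step.
        x = np.empty((lo.shape[0], lo.shape[1] * 2))
        x[:, 0::2] = (lo + hi) / np.sqrt(2)
        x[:, 1::2] = (lo - hi) / np.sqrt(2)
        out.append(x)
    return np.stack(out)
```

The perfect reconstruction property is what lets the up-sampling path restore resolution without the information loss incurred by max pooling.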