In this study, we generated highly realistic synthetic mammographic images using a state-of-the-art generative network, StyleGAN2, and developed an unsupervised anomaly detection method. The StyleGAN2 model generated coarse breast structures efficiently from the early stages of training, and as training progressed, it generated complex parenchymal tissues. Although most generated images showed fidelity comparable to that of real images, unusual noise-like patterns were present in some synthesized images. These artifacts might have arisen from the StyleGAN2 architecture, wherein the generator and discriminator use a progressive training process, which means that the networks are trained layer by layer and geometric features are learned from coarse to fine scales. Moreover, per-pixel noise was injected after each convolution, which could compensate for the information lost through compression so as to capture high-variance details [26]. In our model, complex and highly variable internal parenchymal structures might be too fine to be learned as styles and could be regulated only via the injected noise.
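The per-pixel noise injection mentioned above can be illustrated with a minimal sketch. It assumes feature maps laid out as (batch, channels, height, width), a single-channel Gaussian noise image shared across channels, and a scalar `strength` that would normally be learned; the function name `inject_noise` and these choices are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def inject_noise(features: np.ndarray, strength: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Add per-pixel Gaussian noise to a batch of feature maps.

    features: array of shape (N, C, H, W), e.g. the output of a convolution.
    strength: scalar noise scale (assumed fixed here; learned in practice).
    """
    n, _, h, w = features.shape
    # One noise value per pixel, broadcast across all channels.
    noise = rng.standard_normal((n, 1, h, w))
    return features + strength * noise

rng = np.random.default_rng(0)
x = np.zeros((2, 3, 4, 4))
y = inject_noise(x, strength=0.5, rng=rng)
```

Because the noise varies per pixel but not per channel, it can only perturb fine, local detail, which is consistent with the suggestion that highly variable parenchymal texture may be governed by the noise inputs rather than by the styles.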
Furthermore, we developed an anomaly detection method using this generative model, trained on normal mammograms only, and evaluated it for breast cancer classification. We also compared the results according to the number of synthetic image seeds created per image. The best performance was obtained with nine different seeds, for which the AUC, sensitivity, and specificity based on anomaly scores were 70%, 78%, and 52%, respectively. Using a single seed per synthetic image might be insufficient to remove false positives in the normal dataset. When 16 seeds were used, the sensitivity was the same, but performance might have decreased because the difference map was averaged over too many different images.
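The multi-seed scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions: the difference map is taken as the absolute pixel difference, the anomaly score as the mean of the seed-averaged map, and the function names (`anomaly_score`, `auc`, `sensitivity_specificity`) are hypothetical. The exact difference measure and score reduction used in this study may differ.

```python
import numpy as np

def anomaly_score(real: np.ndarray, projections: np.ndarray):
    """Average per-seed difference maps and reduce them to one score.

    real:        test image, shape (H, W).
    projections: synthetic images projected from different seeds,
                 shape (n_seeds, H, W).
    """
    real = np.asarray(real, dtype=np.float64)
    projections = np.asarray(projections, dtype=np.float64)
    diff_maps = np.abs(projections - real[None, ...])  # assumed: absolute difference
    mean_map = diff_maps.mean(axis=0)                  # average over seeds
    return float(mean_map.mean()), mean_map            # assumed: mean as the score

def auc(scores, labels) -> float:
    """AUC via pairwise comparison of positive and negative scores."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def sensitivity_specificity(scores, labels, threshold: float):
    """Classify as cancer when the anomaly score reaches the threshold."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    pred = scores >= threshold
    tp = np.sum(pred & labels)
    fn = np.sum(~pred & labels)
    tn = np.sum(~pred & ~labels)
    fp = np.sum(pred & ~labels)
    return float(tp / (tp + fn)), float(tn / (tn + fp))

# Toy data only; these are not results from the study.
real = np.zeros((4, 4))
projections = np.stack([np.ones((4, 4)), 3 * np.ones((4, 4))])
score, mean_map = anomaly_score(real, projections)

toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.35)
```

Averaging over seeds trades variance for blur: a single projection may leave spurious residuals in a normal image, whereas a very large number of dissimilar projections can wash out the residual that marks a true lesion.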
Although the classification performance was not high enough and the validation dataset was small, these preliminary results highlight the potential of this method to classify breast cancer in mammographic images in an unsupervised manner using the anomaly detection using GANs (AnoGAN) architecture. However, the projected synthetic images that were the most similar to normal mammograms did not strictly correspond to the real images, which increased the number of false-positive cases. Additionally, benign diseases were not excluded when collecting mammographic images for training the generative model, and this might have also increased the number of false-negative cases because some benign conditions look similar to cancer. To overcome these weaknesses, future studies could consider methods to improve the acquisition of corresponding projection images for test images, as well as a staged model that can reduce the number of false-negative cases. A staged model could be implemented by training on normal mammograms with and without benign conditions, and then filtering out first the normal cases and then the benign cases via staged projection and anomaly scoring.
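The staged model proposed above has not been implemented in this study; the following is only a hypothetical sketch of how such a filter might be organized. It assumes two scoring functions, one backed by a generator trained on normal mammograms without benign conditions and one backed by a generator trained on normal mammograms with benign conditions, each with its own threshold. All names and thresholds are illustrative.

```python
from typing import Callable
import numpy as np

ScoreFn = Callable[[np.ndarray], float]

def staged_triage(image: np.ndarray,
                  score_normal: ScoreFn, score_benign: ScoreFn,
                  t_normal: float, t_benign: float) -> str:
    """Two-stage filtering by anomaly score.

    Stage 1 filters out images well explained by the normal-only model.
    Stage 2 filters out images well explained by the model that also
    saw benign conditions. Whatever remains is flagged for review.
    """
    if score_normal(image) < t_normal:
        return "normal"
    if score_benign(image) < t_benign:
        return "benign"
    return "suspicious"

# Stand-in scoring functions for illustration; real ones would project the
# image into each generator's latent space and compute an anomaly score.
toy_score_normal = lambda img: float(img.mean())
toy_score_benign = lambda img: float(img.max())

labels = [
    staged_triage(np.full((4, 4), v), toy_score_normal, toy_score_benign,
                  t_normal=0.5, t_benign=2.0)
    for v in (0.0, 1.0, 3.0)
]
```

In this arrangement only images that neither model can reconstruct well reach the final stage, so benign findings are filtered out separately instead of being absorbed into the normal model.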
Although supervised methods perform well in breast cancer classification [27–29], they might have some limitations. Supervised learning can be regarded as a method that only memorizes patterns in the training data. Therefore, classifying unseen data that were never encountered during training is difficult, and a large amount of labeled data is inevitably required. However, in clinical situations, abnormal data are usually very scarce compared with normal data, and annotating large datasets for supervised learning requires enormous manpower and time. Thus, semisupervised or unsupervised methods could be very useful alternatives to supervised methods [30, 31]. In a similar context, our anomaly detection method for breast cancer detection could serve as an additional screening tool and/or alarm system alongside a supervised method.
This study has several limitations. First, only craniocaudal views of mammograms with limited resolution were used for the generation of images and the detection of anomalies. Therefore, including mediolateral oblique views and full high-resolution images (i.e., 2294 \(\times\) 1914 pixels), which are commonly used in clinical practice, could allow for more accurate anomaly detection. Second, we could not conduct an image Turing test with radiologists to evaluate the qualitative performance of the generated images. Especially for medical images, qualitative evaluation might be more crucial than quantitative evaluation, which only measures the difference in density between two distributions in a high-dimensional feature space. Finally, our preliminary results for breast cancer detection showed insufficient performance for clinical application. Therefore, improvements that yield more similar projections for test images, together with a staged model trained on a large, high-quality dataset in which benign cases are separately identified, should be considered to investigate the potential of this method as an additional screening tool.
In conclusion, this study proposes a generative model based on StyleGAN2 for the generation of high-quality synthetic mammographic images and an anomaly detection method based on AnoGAN for the unsupervised detection of breast cancer on mammograms. Our generative model showed fidelity comparable to that of real images, and the anomaly detection method, which uses this generative model trained with only normal mammograms, could differentiate between normal and cancer-positive mammograms. This method could provide additional information and help overcome the weaknesses of current supervised methods for breast cancer detection.