In this study, we compared the FA model with two non-FA models in terms of accuracy and uncertainty, and examined the clinical utility of synthetic FA generated from color fundus images. The FA model achieved the best accuracy, although the two non-FA models attained comparable accuracy. In terms of uncertainty, the FA model yielded the most stable predictions, whereas the color fundus model showed the highest Monte Carlo uncertainty.
Despite their comparable accuracy, the non-FA models produced less stable predictions than the FA model. This is likely because some color fundus images lacked visible abnormalities such as hemorrhages. In other words, the accuracy of NPA prediction from color fundus images was inconsistent: when lesions are visible in both color fundus and FA images, the color fundus model can perform comparably to the FA model; when lesions are visible only in FA, the color fundus model performs worse. Furthermore, the variable image quality of color fundus images might also have destabilized the color fundus model. Compared to FA, color fundus images are more vulnerable to artifacts arising from factors such as the camera angle and the direction of lighting (e.g., shadows). These artifacts can lower the Dice score by degrading image quality.
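To make the effect on the Dice score concrete, the following is a minimal sketch of the Dice coefficient for binary masks. The function name `dice_score`, the flat-list mask representation, and the toy masks are illustrative assumptions and are not taken from the study's implementation.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 8-pixel ground-truth NPA mask (1 = non-perfusion area).
truth = [0, 1, 1, 1, 1, 0, 0, 0]
# Hypothetical prediction when the lesion is visible in the input image.
visible = [0, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical prediction when the input shows few indicative features.
hidden = [0, 0, 0, 1, 0, 0, 0, 0]

print(round(dice_score(visible, truth), 3))  # 0.857
print(round(dice_score(hidden, truth), 3))   # 0.4
```

In this toy case, missing most of the NPA pixels lowers the score from about 0.86 to 0.40, which mirrors the kind of sample-dependent drop described above.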
The impact of synthetic FA on accuracy was mixed; it led to both increases and decreases in Dice scores, depending on the sample. Meanwhile, synthetic FA consistently reduced Monte Carlo uncertainty, likely owing to the image enhancement capability of GANs. As previously mentioned, some color fundus images suffer from quality issues, which lead to greater uncertainty, and GANs can improve image quality. In fact, GANs are commonly used in image enhancement tasks such as noise reduction and super-resolution 15–19. In this study, our GAN model presumably acquired an image enhancement capability because the target images (FA) had better image quality than the source images (color fundus).
By using synthetic FA, we could reduce misleading uncertainty estimates. This improvement is clinically beneficial. Uncertainty can help clinicians identify areas that require further examination for abnormalities. However, "false alarms" in uncertainty estimates can mislead clinicians into investigating completely normal areas, thereby wasting their time. Using synthetic FA for NPA prediction can reduce such false alarms, making the uncertainty estimates more reliable and helpful.
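The notion of a "false alarm" can be illustrated with a minimal sketch. It assumes that Monte Carlo uncertainty is summarized as the per-pixel variance across stochastic forward passes, and that a false alarm is a normal pixel whose uncertainty exceeds a threshold. Both definitions, the function names, and the toy values are hypothetical and may differ from the measures used in this study.

```python
from statistics import pvariance


def mc_uncertainty(samples):
    """Per-pixel variance across T stochastic forward passes.

    samples: list of T probability maps, each a flat list of per-pixel
    NPA probabilities (assumed to be collected with dropout left active).
    """
    return [pvariance(pixel_probs) for pixel_probs in zip(*samples)]


def false_alarm_rate(uncertainty, truth, threshold):
    """Fraction of normal pixels (truth == 0) flagged as uncertain."""
    normal = [u for u, t in zip(uncertainty, truth) if t == 0]
    if not normal:
        return 0.0
    return sum(u > threshold for u in normal) / len(normal)


# Hypothetical 4-pixel example with three forward passes.
passes = [
    [0.1, 0.2, 0.9, 0.5],
    [0.1, 0.8, 0.9, 0.7],
    [0.1, 0.5, 0.9, 0.3],
]
truth = [0, 0, 1, 1]  # 1 = non-perfusion area

uncertainty = mc_uncertainty(passes)
print(false_alarm_rate(uncertainty, truth, threshold=0.05))  # 0.5
```

Here one of the two normal pixels is flagged despite containing no NPA. Under these assumptions, a lower false-alarm rate means fewer normal regions are marked for unnecessary inspection.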
However, an inappropriate “improvement” in image quality could also harm segmentation accuracy. Most generative AI models are subject to hallucination, in which the model confidently generates false information 20,21. In this study, although the GAN model was trained on images that contain NPAs, most regions were normal, and the GAN model might therefore have tended to generate normal-looking images. This would have obscured abnormalities, negatively affecting Dice scores and sensitivity.
Although our results suggest that synthetic data had a limited effect on accuracy, some studies have reported that GAN-generated images can improve prediction performance in medical imaging tasks such as contrast-enhanced CT synthesis and in other medical fields such as pathology 22,23. Taken together, these findings suggest that the effectiveness of GANs is task-dependent. In general, GANs cannot acquire additional information from patients; they can only refine features already present in the input images. This imposes a fundamental limitation: a GAN cannot reveal what is not present in the input. Therefore, in theory, using GAN-generated images for downstream tasks would be beneficial only when the downstream models fail to extract features effectively.
Our analysis was limited by the small size of the dataset, which could have affected the accuracy of both the segmentation model and the GAN model. Therefore, more extensive research is needed to determine the utility of synthetic data in medical imaging AI. Future work should aim to develop a more stable NPA segmentation model that performs well even when abnormalities are subtle or not readily apparent on color fundus images.
In conclusion, deep learning models can predict NPAs solely from color fundus images with acceptable accuracy. This result is promising for the goal of providing RVO patients with safe and accessible examinations. However, at this point, NPA prediction relying solely on color fundus images can lead to missed lesions because of its instability, and further research is needed to overcome this challenge. The unstable performance can be attributed to two factors. First, the color fundus model performs comparably to the FA model only when NPA-related lesions are visible in the color fundus images; when an input image completely lacks indicative features, model performance deteriorates. Second, the quality of color fundus images is more likely than that of FA to be impaired by image artifacts such as shadowing. The primary contribution of GAN-generated FA is its image enhancement effect, such as noise reduction and brightness adjustment. Although the improvement in accuracy is subtle, GAN-generated FA reduces “false alarms” in Monte Carlo dropout uncertainty estimates and thereby enhances their clinical utility as an indicator of regions in a fundus image that require further inspection by a clinician.