We compare QGAN models whose discriminators consist of fully connected layers with those using convolutional layers. We first assess the basic image-generation capability, selecting the digit "0" (MNIST) and "Ankle boot" (Fashion MNIST) as test subjects, and use our RBF-QGAN architecture to generate grayscale images for these categories.
The following figures show the generated results together with the evolution of the generator and discriminator losses during training. As depicted in Fig. 6 and Fig. 7, our algorithm converges quickly to the Nash equilibrium: the cross-entropy losses of both the generator and the discriminator stabilize around 0.7, indicating convergence. Visual inspection further shows that the generated images closely resemble the chosen sample pictures. These findings demonstrate that the proposed RBF-QGAN performs image generation effectively on the MNIST and Fashion MNIST datasets.
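The observed value of roughly 0.7 is consistent with theory: at the Nash equilibrium the discriminator cannot distinguish real from generated samples and outputs 0.5, at which point the binary cross-entropy evaluates to -ln(0.5) ≈ 0.693. A minimal check (plain Python, no framework assumed):

```python
import math

def bce(y_true, y_pred):
    # Binary cross-entropy for a single prediction.
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# At the Nash equilibrium the discriminator outputs 0.5 for every input,
# so its loss on both real (label 1) and fake (label 0) samples is -ln(0.5).
loss_real = bce(1.0, 0.5)
loss_fake = bce(0.0, 0.5)
print(round(loss_real, 3), round(loss_fake, 3))  # 0.693 0.693
```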
Notably, our model achieves a low mean cross-entropy loss between generated and original images within only 70 epochs, underscoring the effectiveness and efficiency of the RBF-QGAN approach in generating high-quality images while maintaining fidelity to the original dataset samples.
5.1 Stability of the RBF-QGANs
To assess the stability of our RBF-QGANs, we adopt the coefficient of variation (CV) [42] as a measure of the consistency of the discriminator's performance. The CV is a standardized dispersion measure, defined as the ratio of the standard deviation to the mean of the data. Because it is unitless, it allows variability to be compared across different datasets or models. By tracking the CV of the loss values throughout training, we obtain detailed insight into the training stability of the model's components. The CV is computed as:
$$CV = \frac{S_D}{M_D} \qquad \left(8\right)$$
where $S_D$ represents the standard deviation and $M_D$ the mean of the discriminator's loss values. A smaller CV value indicates greater system stability, as demonstrated in Ref. [42].
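As a concrete sketch of Eq. (8), the CV of a discriminator's per-iteration loss values can be computed with only Python's standard library (the loss traces below are hypothetical, for illustration only):

```python
import statistics

def coefficient_of_variation(losses):
    """CV = S_D / M_D: standard deviation of the loss values divided by their mean."""
    mean = statistics.mean(losses)
    std = statistics.pstdev(losses)  # population standard deviation
    return std / mean

# Hypothetical per-iteration discriminator losses for two training runs.
stable_run = [0.70, 0.69, 0.71, 0.70, 0.70]
unstable_run = [0.40, 0.95, 0.55, 1.10, 0.50]

print(coefficient_of_variation(stable_run))    # small CV -> stable training
print(coefficient_of_variation(unstable_run))  # larger CV -> unstable training
```

Because the CV normalizes dispersion by the mean, the two runs can be compared even if their loss curves settle at different absolute levels.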
To examine the effectiveness and stability of the RBF-enhanced discriminator more closely, we devised a specific experimental setup. During the training of each model variant, we record the discriminator's loss value at every iteration; from these data we compute the CV of the loss values for each training approach, yielding a quantifiable measure of stability.
Figure 6 visualizes how the CV changes with increasing noise level in our RBF-QGAN framework, compared directly against QGAN setups whose discriminators use fully connected (FC) or 1-dimensional convolutional (CONV) layers. We conducted this comparison on both the MNIST and Fashion MNIST datasets, each subjected to three distinct noise profiles: Gaussian, uniform, and salt-and-pepper noise.
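For reference, the three noise profiles can be injected into normalized grayscale images roughly as follows (a NumPy sketch; the exact strength parameterization used in the experiments is an assumption, as the text does not define it):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(img, strength):
    # Zero-mean Gaussian noise with standard deviation `strength`.
    return np.clip(img + rng.normal(0.0, strength, img.shape), 0.0, 1.0)

def add_uniform(img, strength):
    # Uniform noise drawn from [-strength, strength].
    return np.clip(img + rng.uniform(-strength, strength, img.shape), 0.0, 1.0)

def add_salt_pepper(img, strength):
    # Set a fraction `strength` of pixels to 0 (pepper) or 1 (salt).
    noisy = img.copy()
    mask = rng.random(img.shape) < strength
    noisy[mask] = rng.integers(0, 2, img.shape)[mask].astype(img.dtype)
    return noisy

# Example on a dummy 28x28 image with values in [0, 1].
img = rng.random((28, 28))
for fn, s in [(add_gaussian, 0.4), (add_uniform, 0.1), (add_salt_pepper, 0.015)]:
    out = fn(img, s)
    assert out.shape == img.shape and out.min() >= 0.0 and out.max() <= 1.0
```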
This comparative study enables a thorough evaluation of the model's robustness and adaptability under various noise conditions. As shown in Fig. 6, the proposed RBF-QGAN consistently achieves lower CV values across all noise types than the classical FC-based and CONV-based QGAN architectures. These smaller CV values indicate that our model identifies and processes noisy image inputs more stably. Such stability is crucial in real-world scenarios, where noise is inherent to data acquisition and processing pipelines; our approach therefore not only performs well in noisy environments but is also well suited to practical deployments that require resilient and reliable image processing.
5.2 Robustness in RBF-QGANs
Robustness is a cornerstone attribute of generative adversarial networks [43]: a model's ability to maintain performance and effectiveness despite variations in input data, noise, or other perturbations. A robust GAN consistently generates high-quality, realistic outputs even under uncertainty or adversarial inputs. This property ensures the model's reliability and generalizability across a range of datasets and conditions, and guards against pitfalls such as mode collapse or sensitivity to small changes in input data.
Fig. 7 Comparison of the loss values of QGAN models with an RBF discriminator and those with FC and CONV discriminators under noisy data input.
To evaluate the robustness of our algorithm, we selected the digit "0" from the MNIST dataset for model training and systematically introduced various levels of noise to test the model's resilience. The resulting robustness comparison, depicted in Fig. 7, assesses the model's performance under diverse noise conditions.
Our evaluation combines qualitative observations with quantitative analysis, using cross-entropy loss as the performance metric. Specifically, we examined the performance of the hybrid QGAN with each of the three discriminator variants under varying degrees of noise. Figure 7 shows that under Gaussian noise of strength 0.4, uniform noise of strength 0.1, and salt-and-pepper noise of strength 0.015, the RBF-QGAN remains robust: its loss values increase only minimally despite the noise. These results confirm the model's consistency across noise levels, reaffirming its robustness in adverse conditions.
Such robustness is pivotal for the algorithm's efficacy in real-world scenarios. Prior work by Cheng et al. addressed robustness concerns in image synthesis with QGANs on NISQ devices [44]. However, whereas that work focused on developing a new quantum generator tailored for pure quantum circuits, this study enhances the robustness of image synthesis in hybrid models with classical data input, achieved through modifications to the classical discriminator component.
5.3 Comparison of Similar Works in Image Generation
In our comparative analysis, we evaluate our architecture (shown in Fig. 8(d)) against pertinent prior works in the QGAN domain. Huang et al. [19] introduced a novel patch quantum generative adversarial network: MNIST images are downscaled to a compact 8 × 8 format and subdivided into four distinct sections, each processed by a dedicated 5-qubit quantum generator. The final image is synthesized by combining the outputs of the four generators, as illustrated in Fig. 8(a).
Moreover, Ref. [45] presented a versatile quantum GAN framework designed to generate images with intricate high-dimensional features. The approach leverages quantum superposition to train multiple examples concurrently, a significant step forward in quantum image generation. Its notable achievement is the successful learning and generation of real-world handwritten digit images on a superconducting quantum processor, as shown in Fig. 8(b).
Fig. 8 Comparison of our generated MNIST dataset images with those of other works.
Additionally, we compared our image generation with QuGAN [46], displayed in Fig. 8(c), a pure quantum GAN architecture whose discriminator and generator operate on quantum state fidelity, computing quantum-based loss values via swap tests on qubits. In contrast, our hybrid framework is better adapted to the current development stage of NISQ devices and better positioned to leverage quantum advantages.
Overall, the images generated by RBF-QGAN exhibit distinct handwritten digits with well-defined edges and minimal distortion. This comparative examination underscores the diversity of methodologies in the growing field of QGANs and the continuing innovation driving advances in quantum image synthesis and generation.