Generative Adversarial Networks: A Survey

Generative modelling has been an extensive area of research because it finds use cases across multiple domains. Various models have been proposed in the recent past, including Fully Visible Belief Nets, NADE, MADE, PixelRNN, Variational Autoencoders, Markov Chains, and Generative Adversarial Networks. Among these models, Generative Adversarial Networks (GANs) have consistently shown great potential, with developments in art, music, semi-supervised learning, handling of missing data, drug discovery, and unsupervised learning. This emerging technology has reshaped the research landscape in the field of generative modelling. GANs were introduced by Ian J. Goodfellow et al. in 2014 [1]. Since then, many variants have been proposed over the years, and they are considered state-of-the-art models in generative modelling. In this survey, we provide a comprehensive review of the original GAN model and its modified versions, covering model architectures, objective and loss functions, and applications. First, we summarise the different architectures proposed, along with the objective functions and loss functions used. Second, we cover the evolution of GANs, followed by a comparative analysis of various GANs. Then, we review applications proposed by various authors that are built on these architectures in different domains. Finally, we highlight the technical challenges and several promising directions.


Introduction
Artificial Intelligence has taken leaps forward since its inception, owing to immense improvements in the hardware capabilities of machines. Advances in parallel processing on GPU-based machines have made it possible to perform thousands of computations simultaneously. Parallel processing, in particular, has paved the way for Artificial Intelligence in general and Deep Learning in particular, as deep neural networks require massive parallel computation for fast execution. Different sectors are tapping the potential of deep learning in their own ways, with uses ranging from healthcare, image segmentation, data generation, autonomous vehicles, and text analytics to various other domains. The problem-solving capabilities of our systems have grown rapidly in the past decade, and they now address problems that were earlier considered extremely difficult to solve. These problems involve teaching machines to read, write, paint, and generate text, images, and videos.
Generative modelling has been instrumental in solving these problems. A taxonomy of generative modelling is shown in Figure 1. Among the various generative models proposed, Generative Adversarial Networks have become a versatile and powerful problem-solving model. In principle, a GAN performs unsupervised learning by framing it as a supervised learning problem, and it generates synthetic data. A GAN model comprises two sub-models: a Generator, which generates plausible examples from the problem domain, and a Discriminator, which classifies examples as real or fake [1].

Survey Structure
The survey is structured as follows. Section 2 gives an overview of GANs. Section 3 covers the evolution of GAN architectures, followed by an extensive analysis of the proposed GAN architectures in Section 4. Table 1 lists the GAN variants reviewed in Section 4. In Section 5, we present a comparative analysis of the GAN architectures. Section 6 covers the limitations of Generative Adversarial Networks, and Section 7 covers the use cases proposed by researchers to date, categorised by architecture and by type of data used. Finally, Section 8 concludes the survey with potential future research directions on the topic.

Related Surveys
Several relevant surveys of GANs have investigated the various architectures, trends in generative modelling and GANs, and their potential applications in different domains.

Generative Adversarial Network: Overview
Generative Adversarial Networks belong to the family of generative models and have several advantages over other generative models. These include the generation of samples in parallel, as opposed to the serial generation in Fully Visible Belief Networks [11]. Unlike Boltzmann machines, GANs do not require Markov chains. It has also been observed that GANs produce better samples than other models. GANs are based on a minimax game [1] in which the Generator directly produces samples, whereas the Discriminator attempts to distinguish between samples drawn from the training data and samples drawn from the Generator [12].
The Generator model takes a fixed-length random vector as input and generates a sample in the domain, such as an image. The vector is drawn randomly from a Gaussian distribution and is used to seed the generative process. After training, points in this multidimensional vector space correspond to points in the problem domain, forming a compressed representation of the data distribution. This vector space is referred to as a latent space, that is, a vector space comprised of latent variables. Latent variables, or hidden variables, are variables that are important for a domain but are not directly observable [13]. An example of a latent space is shown in the figure below.
Training is driven by the feedback from the Discriminator. If the Discriminator classifies an image correctly, the Discriminator is rewarded, whereas the Generator is penalized and made to update its weights. If the Discriminator classifies the image incorrectly, the Discriminator is penalized and made to update its weights. As a result, the distribution of fake images starts to resemble the distribution of real images, and the Discriminator ends up classifying an image as real or fake with a probability of 0.5, as shown in the figure below. The objective function for training, as given in [1], is:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (1)$$

Year    Related Literature Surveys
2017    Survey on Generative Adversarial Networks [2]
2018    Generative Adversarial Networks: An Overview [3]
        Comparative Study on Generative Adversarial Networks [4]
2019    How Generative Adversarial Nets and its Variants Work: An Overview of GAN [5]
…

Figure 4: Architecture for GAN training
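As a rough numerical illustration of the minimax objective from [1], the sketch below estimates V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] from lists of discriminator outputs. The function name gan_value and the sample probabilities are assumptions made for illustration only; no networks are trained here.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for samples x drawn from the training data.
    d_fake: D(G(z)) for samples produced by the Generator.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical outputs of a Discriminator that separates real from fake well.
confident = gan_value([0.9, 0.8, 0.95], [0.1, 0.2, 0.05])

# A Discriminator that outputs 0.5 everywhere cannot tell the two apart.
fooled = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

print(f"V with a confident Discriminator: {confident:.4f}")
print(f"V when D outputs 0.5 everywhere:  {fooled:.4f}")
print(f"-log(4) for reference:            {-math.log(4):.4f}")
```

When the Discriminator outputs 0.5 for every sample, the estimate equals -log 4, which is the value of the objective at the optimum described in [1].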

Evolution of GAN
Generative Adversarial Networks have come a long way since their inception, and many researchers have contributed to the development of new GAN variants (Figure 5).
Conditional GANs (CGANs) were introduced in 2014. The paper introduced the idea that data generation can be carried out in a more directed manner if both the Generator and the Discriminator are conditioned on a piece of additional information, y. The condition may be a class label or data from a different modality. In the Generator, the prior input noise z and y are combined in a joint hidden representation, and the adversarial training framework allows considerable flexibility in how this hidden representation is composed. In the Discriminator, x and y are presented as inputs to the model. Unlike the vanilla GAN, a CGAN can therefore control what data is generated.
After conditioning, the objective function of the two-player minimax game becomes:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))] \qquad (2)$$
A comparison of Equations 1 and 2 shows that the only difference between the two models is the added condition y, which helps the CGAN produce more directed output.
The architecture of the Conditional GAN is shown in Figure 6.
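One simple way to picture the conditioning described above is to append a one-hot encoding of the label y to the noise vector z (for the Generator) and to the sample x (for the Discriminator). The sketch below assumes plain concatenation and uses hypothetical helper names; the CGAN paper allows other ways of composing the joint representation.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label y as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(z, label, num_classes):
    """Joint Generator input: noise z followed by the one-hot condition y."""
    return list(z) + one_hot(label, num_classes)

def discriminator_input(x, label, num_classes):
    """Joint Discriminator input: sample x followed by the one-hot condition y."""
    return list(x) + one_hot(label, num_classes)

rng = random.Random(0)                       # fixed seed for repeatability
z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # 4-dimensional Gaussian noise
g_in = generator_input(z, label=2, num_classes=3)
d_in = discriminator_input([0.2, 0.7], label=2, num_classes=3)

print(len(g_in), g_in[-3:])
print(len(d_in), d_in[-3:])
```

Changing the label while keeping z fixed changes only the last entries of the input, which is what lets the condition steer the generated output.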

Deep Convolutional Generative Adversarial Networks (DCGAN) [15]
This model specifies a particular architecture that enhances the capability of GANs by adding new constraints and modifying the CNN architecture. Given a restricted set of images, DCGANs are capable of high-resolution image generation (Figure 7). The main architectural modifications in this model can be summarised as follows. DCGAN uses the Adam optimizer instead of stochastic gradient descent with momentum. It uses ReLU and LeakyReLU as activation functions in the Generator and the Discriminator, respectively. Pooling layers are not used in this model, and transposed convolution is used for upsampling. Batch normalization is used in both sub-models, with the exception of the last layer of the Generator and the first layer of the Discriminator, to ensure correct learning of the mean and scale of the data distribution. The structure of DCGAN is shown in Figure 8.
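To make the upsampling step concrete, the sketch below computes the spatial size produced by a stack of transposed convolutions, using the usual output-size relation for a transposed convolution without dilation or output padding. The kernel size 4, stride 2, and padding 1 are assumed here as a commonly used configuration; they are not taken from the text above.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_resolutions(start, layers, kernel=4, stride=2, padding=1):
    """Spatial size after each transposed-convolution layer of a generator."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes

# Starting from a 4x4 feature map, four such layers double the size each time.
sizes = generator_resolutions(start=4, layers=4)
print(sizes)  # [4, 8, 16, 32, 64]
```

With these assumed settings each layer doubles the resolution, so no pooling or interpolation layer is needed to reach the output size.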

Bidirectional GAN [16][17]
The concept was published simultaneously in two papers, Adversarial Feature Learning and Adversarially Learned Inference. Both papers attempt to learn an inverse mapping, i.e., projecting the data back into the latent space. It is also demonstrated that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, and competitive with contemporary approaches to unsupervised and self-supervised feature learning. The overall model is depicted in Figure 9 (Adversarially Learned Inference) and Figure 10 (Adversarial Feature Learning).
In addition to the Generator G, the BiGAN architecture includes an encoder E that maps the data x to latent representations z. Unlike a standard GAN discriminator, which discriminates only in data space (x versus G(z)), the BiGAN discriminator D discriminates jointly in data and latent space (tuples (x, E(x)) versus (G(z), z)).
An important observation is that the Encoder and the Generator (which plays the role of the decoder) do not communicate directly with each other, as they would in a conventional autoencoder. The BiGAN training objective is defined as:
$$\min_{G, E} \max_D V(D, E, G), \qquad V(D, E, G) = \mathbb{E}_{x \sim p_X}[\log D(x, E(x))] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z), z))]$$
where p_X is the data distribution and p_Z is the latent prior. In addition to the properties of GANs, BiGANs guarantee that at the global optimum G and E are each other's inverse. BiGANs are also closely related to autoencoders with an L0 loss function.
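The role of the joint tuples can be illustrated with a toy one-dimensional example. The generator G and the two encoders below are hypothetical linear maps chosen only for illustration: when the encoder inverts G, the tuples (x, E(x)) and (G(z), z) coincide, so a discriminator has nothing left to separate; with a mismatched encoder they differ.

```python
def G(z):
    """Toy generator: maps a latent value to a data value."""
    return 2.0 * z + 1.0

def E_inverse(x):
    """Toy encoder that exactly inverts G."""
    return (x - 1.0) / 2.0

def E_mismatched(x):
    """Toy encoder that does not invert G."""
    return x / 2.0

def generator_pairs(latents):
    """Tuples (G(z), z) seen by the BiGAN discriminator."""
    return [(G(z), z) for z in latents]

def encoder_pairs(data, encoder):
    """Tuples (x, E(x)) seen by the BiGAN discriminator."""
    return [(x, encoder(x)) for x in data]

latents = [-1.0, 0.0, 0.5, 2.0]
data = [G(z) for z in latents]  # pretend the data lies exactly in the range of G

gen = generator_pairs(latents)
match = encoder_pairs(data, E_inverse)
mismatch = encoder_pairs(data, E_mismatched)

print("inverse encoder matches generator tuples:   ", match == gen)
print("mismatched encoder matches generator tuples:", mismatch == gen)
```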

Super-resolution generative adversarial network (SRGAN) [18]
Super-resolution refers to estimating a high-resolution (HR) image from its low-resolution (LR) counterpart. SRGAN was proposed by Ledig et al. Its network architecture is based on that of DCGAN (Figure 11) and uses a very deep convolutional residual network (ResNet) with skip connections. Its objective function includes an adversarial loss and a feature loss. The feature loss is computed as the distance between the feature maps of the generated image and those of the real image, both extracted from a pre-trained VGG19 network. Compared with previous approaches to super-resolution, this model moves away from mean squared error as the sole optimization target: it defines a novel perceptual loss using high-level feature maps of the VGG network, combined with a discriminator that encourages solutions that are perceptually hard to distinguish from the HR reference images. Experimentally, SRGAN shows high performance on public datasets.
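The combination of feature loss and adversarial loss can be sketched as follows. The feature maps are stand-in lists of numbers rather than real VGG19 activations, the function names are hypothetical, and the default weight of 1e-3 on the adversarial term is our recollection of the weighting used by Ledig et al., so it should be treated as an assumption.

```python
import math

def content_loss(features_sr, features_hr):
    """Mean squared error between two flattened feature maps."""
    squared = [(a - b) ** 2 for a, b in zip(features_sr, features_hr)]
    return sum(squared) / len(squared)

def adversarial_loss(d_on_sr):
    """Generator-side adversarial loss: sum of -log D(G(LR)) over the batch."""
    return sum(-math.log(p) for p in d_on_sr)

def perceptual_loss(features_sr, features_hr, d_on_sr, adv_weight=1e-3):
    """Content (feature) loss plus a weighted adversarial loss."""
    return content_loss(features_sr, features_hr) + adv_weight * adversarial_loss(d_on_sr)

# Stand-in feature maps of a super-resolved image and its HR reference.
feat_sr = [1.0, 3.0]
feat_hr = [1.0, 1.0]
loss = perceptual_loss(feat_sr, feat_hr, d_on_sr=[0.5])
print(f"perceptual loss: {loss:.6f}")
```

The content term dominates numerically in this toy case; the adversarial term mainly pushes the output towards images that the Discriminator accepts as real.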

Information Maximizing Generative Adversarial Networks (InfoGAN) [19]
InfoGAN attempts to learn disentangled representations in a completely unsupervised manner. It is an information-theoretic extension of GAN. It decomposes the input noise vector into a standard latent vector z and a latent code c that captures the salient semantic features of real samples. Thus, the generator network becomes G(z, c). The objective function for InfoGAN is:
$$\min_{G, Q} \max_D V_{InfoGAN}(D, G, Q) = V(D, G) - \lambda L_I(G, Q) \qquad (3)$$
where λ is a hyperparameter, L_I(G, Q) is a regularisation term, and Q is a neural-network-based model. The Generator takes the concatenated input {z, c} and attempts to maximize the mutual information between c and G(z, c). The mutual information term is extremely hard to maximize directly, so a lower bound on it, L_I(G, Q), is obtained using Variational Mutual Information Maximization. What distinguishes InfoGAN from the standard GAN is the regularisation term in Equation 3, which captures the information shared between the interpretable variables c and the generator output. The optimisation of the objective function is carried out in two steps:
Step 1: D* = arg max_D V_InfoGAN(D, G, Q) = arg max_D V(D, G)
Step 2: G* = arg min_G V_InfoGAN(D, G, Q) = arg min_G { E_{z ~ p(z), c ~ p(c)}[log(1 − D(G(z, c)))] − λ L_I(G, Q) }
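For a discrete latent code, the variational lower bound on the mutual information can be written as L_I = E[log Q(c | x)] + H(c), where x = G(z, c). The sketch below evaluates this expression for a uniform categorical code with 10 categories; the function names and the Q values are illustrative assumptions, not outputs of a trained model.

```python
import math

def entropy(probs):
    """Shannon entropy H(c) of a discrete prior, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def variational_lower_bound(prior, q_values):
    """Estimate L_I = E[log Q(c | x)] + H(c).

    prior: P(c) over the categories of the code.
    q_values: Q(c | x) evaluated at the code c that was actually fed to the
              Generator, one value per generated sample x = G(z, c).
    """
    expected_log_q = sum(math.log(q) for q in q_values) / len(q_values)
    return expected_log_q + entropy(prior)

prior = [0.1] * 10  # uniform categorical code with 10 categories

# Q always recovers the code exactly: the bound reaches H(c) = log 10.
perfect = variational_lower_bound(prior, [1.0, 1.0, 1.0])

# Q ignores the sample and guesses uniformly: the bound drops to 0.
blind = variational_lower_bound(prior, [0.1, 0.1, 0.1])

print(f"L_I with a perfect Q: {perfect:.4f}")
print(f"L_I with a blind Q:   {blind:.4f}")
```

The bound is largest when Q can read the code back from the generated sample, which is what encourages the Generator to keep c visible in its output.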

Cycle GAN [21]
Computer vision also deals with problems that aim to learn the mapping between an input image and an output image using a training set of aligned image pairs (Figure 12). However, aligned image pairs are often unavailable or expensive to curate. The paper addresses this problem with an approach that does not require aligned image pairs (Figure 13). The model learns a mapping G: X → Y such that the distribution of images G(X) is indistinguishable from the distribution Y. The model optimizes two types of loss: an adversarial loss to learn the mapping and, because this mapping is highly under-constrained, a cycle consistency loss coupled with an inverse mapping F: Y → X to enforce F(G(X)) ≈ X (and vice versa).
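The cycle consistency idea, F(G(x)) ≈ x and G(F(y)) ≈ y, can be sketched with toy mappings on short lists of numbers. The mappings and names below are hypothetical stand-ins for the two generators, and the L1 distance is averaged over elements for readability.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_consistency_loss(xs, ys, G, F):
    """Average of |F(G(x)) - x| over xs plus average of |G(F(y)) - y| over ys."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

def double(v):    # toy mapping G: X -> Y
    return [2.0 * e for e in v]

def halve(v):     # toy mapping F: Y -> X that inverts G
    return [e / 2.0 for e in v]

def identity(v):  # toy mapping F that does not invert G
    return list(v)

xs = [[1.0, 2.0], [0.5, -1.0]]
ys = [[2.0, 4.0]]

consistent = cycle_consistency_loss(xs, ys, double, halve)
inconsistent = cycle_consistency_loss(xs, ys, double, identity)
print(consistent, inconsistent)
```

The loss is zero only when each mapping undoes the other, which is the constraint that narrows down the otherwise under-constrained mapping G.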

Analysis of GAN architectures
The previous section covered various popular GANs and their architectures. In this section, we analyse them as follows. Section 5.1 analyses the GANs in terms of their merits and demerits. Section 5.2 presents a comparative analysis of the GANs over 8 different attributes, including type of learning, architecture, dataset, optimizer, hyperparameters, and activation functions used. Prior knowledge of the strengths and limitations of a model is important before applying it successfully to a practical application.

Comparison based on merits and demerits [14][15][16][17][18][19][20][21]
CGAN applies conditions to the Generator and the Discriminator to control image generation. CGAN-based models thus dictate the type of data generated through the applied condition. This yields a general framework for different applications and, potentially, a powerful tool for producing new image datasets.
DCGAN is the first GAN architecture that utilizes a convolutional neural network (CNN) [16]. It demonstrates a stable training procedure and achieves strong performance in generating sharp, high-quality images. However, the output quality degrades significantly when the batch normalization layers are removed, and the diversity of the generated images also decreases.
InfoGANs are mostly used for non-complex datasets, as they have not proved very efficient on complex datasets. They make the image generation procedure more controllable, and the outcome more interpretable, through the introduction of mutual information.
Bidirectional GAN attempts to learn an inverse mapping, i.e., projecting the data back into the latent space. The motivation is to build a type of GAN that can learn rich representations in an unsupervised setting. BiGANs are agnostic to the domain of the data.
SRGAN is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors. The model uses a perceptual loss that comprises a content loss and an adversarial loss. However, the construction of the perceptual loss is empirical and remains an open field of research.
CycleGAN introduces a cycle consistency loss in addition to the adversarial loss to learn a mapping G: X → Y such that the distribution of images G(X) is indistinguishable from the distribution Y. This helps reduce the possibility of mode collapse, where all input images map to the same output image and the optimization fails to make progress. The model works on unpaired images, unlike pix2pix, which takes paired images as input. However, the model does not perform well on tasks that require geometric changes.

Comparative Analysis
This section presents a comparative analysis of the GANs discussed earlier over 7 parameters, including activation function, hyperparameters, objective function, architecture, and datasets used.

Limitation of GANs
Although GANs are state-of-the-art models, they are highly unstable in nature. This is due to several reasons, which are reviewed in this section.

Evaluation metrics
Because the generation task is very general in nature, it is difficult to evaluate one approach against another in the absence of a standard evaluation metric. For this reason, the evaluation of GAN models is an active research area at present. As different papers recommend different evaluation metrics, there is no consensus on the parameters for a fair model comparison.

Mode collapse
Another challenge faced while training GANs is mode collapse. Mode collapse refers to a state where all inputs map to the same output image and the optimization fails to make progress. The Generator then always produces an identical output, which leads to low diversity, whereas diversity is a key requirement for a model to be useful in applications across various domains, including computer vision tasks.

Nash equilibrium
As the cost of the discriminator and that of the generator each depend on the other's parameters, while neither can control the other's parameters, training requires reaching a Nash equilibrium [1]. A Nash equilibrium is a state where neither the discriminator nor the generator can improve its cost unilaterally. However, reaching a Nash equilibrium is very difficult, as the Discriminator has a relatively easier task than the Generator. This results in an unstable GAN.
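A small toy game, which is not taken from this survey, illustrates why gradient-based training can fail to settle at an equilibrium. In the game V(x, y) = x * y, one player chooses x to minimise V and the other chooses y to maximise it, and the equilibrium is (0, 0). The sketch below applies simultaneous gradient steps and records the distance from the equilibrium; the step size and starting point are arbitrary assumptions.

```python
import math

def simultaneous_gradient_steps(x, y, lr, steps):
    """Play V(x, y) = x * y: x descends on V while y ascends on V."""
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x                      # dV/dx = y, dV/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y    # both players update at once
        distances.append(math.hypot(x, y))
    return distances

d = simultaneous_gradient_steps(1.0, 1.0, lr=0.1, steps=100)
print(f"distance from equilibrium at start: {d[0]:.4f}")
print(f"distance from equilibrium at end:   {d[-1]:.4f}")
```

Each simultaneous step multiplies the distance from (0, 0) by sqrt(1 + lr^2), so the players spiral away from the equilibrium instead of converging to it.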

Vanishing Gradient
Another challenge faced by GANs is the vanishing gradient problem. Vanishing gradients occur when the gradients in the Generator network become so small that the model stops learning. In this state, the Generator fails to produce images that are good in both diversity and quality. The Discriminator then classifies the generated data as fake with very high probability, which leads to further instability.
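The effect can be seen in a one-variable sketch. Assume the Discriminator output on a generated sample is D(G(z)) = sigmoid(a) for some logit a (an assumption made for illustration). The gradient of the generator term log(1 - D(G(z))) with respect to a is -sigmoid(a), which shrinks towards zero when the Discriminator confidently rejects the sample. For comparison, the sketch also shows the alternative generator loss -log D(G(z)) suggested in [1], whose gradient does not vanish in that regime.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)), the generator term of the minimax objective."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)), the alternative generator loss from [1]."""
    return -(1.0 - sigmoid(a))

# A very negative logit means the Discriminator is sure the sample is fake.
for a in (0.0, -5.0, -10.0):
    print(f"logit {a:6.1f}: D = {sigmoid(a):.5f}, "
          f"saturating grad = {saturating_grad(a):.5f}, "
          f"non-saturating grad = {non_saturating_grad(a):.5f}")
```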

Internal covariate shift
Internal covariate shift occurs when the input distribution of a layer's activations changes as a consequence of parameter updates in the preceding layers. When the input distribution changes, the intermediate (hidden) layers must adapt to the new distribution. This continual adaptation slows down training, because the learning rate has to be adjusted to cope with the shifting distributions. The model therefore requires a much longer training time to counter these shifts, and the longer training time increases the training cost because computational resources are occupied for longer.

Applications of GAN
Generative Adversarial Networks, the state-of-the-art generative models, have found wide application in various niche areas and can significantly reduce the time needed for work that earlier involved years of research. Although GANs are a relatively new technology, a great deal of research is being carried out to refine them and to create better and more powerful GAN-based models, and GANs are penetrating into wider domains. Some of the applications proposed by researchers that are based on the architectures discussed in this survey are presented below.

Architecture based
This section covers various use cases categorised by the type of architecture used.
Feature Filter for EEG [71], [72], [73], [74]
Text-to-Image Translation; Semantic-Image-to-Photo Translation; Generate New Human Poses; Photos to Emojis; Photograph Editing; Face Aging
Text-to-Video Synthesis [34]; Multiple object tracking in UAV videos [35]; Maps from Satellite Images [36]; CT Image Augmentation [37]; Region-based Activity Recognition [38]; Fine Art [39]
Medical Imaging [55]; Reconstruction of turbulent velocity fields [56]; Super-resolving face images [57]; HD movie from low-quality movie [58]
Multi-Modal Distribution of Pedestrian Trajectories [64]; Pronunciation Fluency [65]; Unsupervised classification of street architectures [66]

Conclusion
This paper aims to provide a comparative analysis along with the evolution of GANs since their inception. The paper discusses various attributes towards the realization of GANs including Architecture, Loss Functions, Hyperparameters,

Face aging
Face aging refers to generating images of how a person will look at a particular age in the future. Identity-Preserved CGAN generates photo-realistic faces by introducing an age classification term and an identity-preserved term into age synthesis. Similarly, Wavelet-GAN-based face aging ensures the consistency of features by feeding the facial feature vector, along with the face image, as inputs to both G and D.

Image super-resolution
Image super-resolution upsamples low-resolution images to high-resolution images. Super-Resolution GAN (SRGAN) does this with a 4× upscaling factor. Enhanced SRGAN (ESRGAN) further improves the visual quality of images by using the Residual-in-Residual Dense Block (RRDB).

Sketch synthesis
Sketch synthesis refers to converting a sketch into a realistic image with maximum resemblance to the input sketch. Texture GAN (TGAN) is one such GAN. Its generator takes a texture image, a sketch image, and a color image as inputs and generates a new image as output. The pix2pix architecture is used to achieve this.

Image-to-image translation
Image-to-image translation replaces the visual representation of an image with a new visual representation. The GAN based on the pixel-to-pixel (pix2pix) translation method adopts a supervised learning technique for this. The generator of pix2pix translates the source image into the target image based on the applied condition, and the discriminator checks whether the condition is met, with a pixel-wise loss also taken into account.

Steganographic applications
Steganography is the practice of hiding secret information within non-secret information. Steganographic GAN (S-GAN) and Secure SGAN (SS-GAN) are used for this purpose. Stegano-GAN is another popular approach.

Image in-painting
Image in-painting refers to estimating the missing pixels of an image. It is an advanced reconstruction technique used in photo and video editing applications. Exemplar GAN (Ex-GAN) is used for this purpose. PGGAN is another model used for image in-painting and has produced good results by aggregating local and global information.

Image Based Applications
Video Based