Best Selection of Generative Adversarial Networks Hyper-Parameters Using Genetic Algorithm

Generative adversarial networks (GANs) are among the most popular generative frameworks and have achieved compelling performance. They follow an adversarial approach in which two deep models, a generator and a discriminator, compete with each other. In this paper, we propose a generative adversarial network with best hyper-parameter selection: the generator produces fake images of the digits 1–9, and the discriminator is trained to decide whether the generated images are fake or real. A genetic algorithm (GA) was used to adapt the GAN hyper-parameters; the resulting algorithm is named GANGA: generative adversarial network with genetic algorithm. GANGA achieved high performance: it was able to drive the loss function to zero for the generator and the discriminator separately. The Anaconda environment with the TensorFlow library was used, and Python was adopted as the programming language together with the needed libraries. The implementation was validated on the MNIST dataset. The proposed method lets the genetic algorithm choose the best values of the hyper-parameters by minimizing a cost function such as a loss function, or by maximizing an accuracy function; it is used to find the best values of the learning rate, batch size, number of neurons, and the keep probability of the dropout layer.


Introduction
Many machine learning (ML) systems have complicated input; there are two types of input fed to a machine learning system: structured and unstructured data. Structured data are highly organized and formatted so that they are easily searchable in relational databases (tables in CSV or DAT files, matrices, or any other format with clear tables). Unstructured data have no predefined format or organization (text, audio, video, or images with more than one dimension), which makes them much more difficult to collect, process, and analyze. A machine learning system can, in some way, handle all these types of data. ML systems accept a complicated input such as an image and produce a simple, specific output (a label); the output differs according to the type of ML task: classification, recognition, re-identification, or regression. The output can be a categorical label such as "male or female" or a numeric label such as 1, 2, or any other number that represents a class. By contrast, the goal of a generative model is something like the opposite: take an input (possibly a few random numbers or a noise vector) and produce a complex output, such as an image of a realistic-looking face. A generative adversarial network (GAN) is an especially effective type of generative model, introduced only a few years ago, which has become a subject of intense interest in the machine learning community [1]. As shown in Fig. 9, a generative model processes its input so that the points of noise come to simulate an image: it builds a net between these points and trains the links so that their values reshape these points into something like a real image.
The idea of a GAN, creating realistic images from scratch, can seem like magic, but GANs use a special method to turn an unclear, seemingly impossible goal into reality. The method is to use randomness as an ingredient. Thinking in terms of probabilities also helps us cast the problem of generating realistic images into a natural mathematical framework. The system should be able to distinguish which images are likely to be real and which are not. Mathematically, this involves modeling a probability distribution on images, that is, a function that determines which images are real and which are not. This type of problem, modeling a function on a high-dimensional space, is exactly the sort of thing neural networks are made for. A GAN sets up this modeling problem as a kind of contest; this is where the "adversarial" part of the name comes from. A GAN builds two competing networks: a generator and a discriminator. The generator is trained to create synthetic outputs from random inputs, while the discriminator is trained to tell these apart from real outputs. The hope is that the two networks will both get better with more training. Figure 1 below explains a GAN simply.
A GAN has two use cases according to the application: the first is to use the generator to fool a specific system, while the second is to use the discriminator to detect fake inputs and tell them apart from real ones.
Generative adversarial networks are neural networks that learn to draw samples from a special distribution, taking a random distribution or noise vector (the "generative" part) as input. A GAN is about creating things from noise, which makes it hard to compare with other deep learning fields. In other words, generative adversarial networks (GANs) are deep neural network architectures comprising two nets, competing one against the other.
In machine learning, a hyper-parameter is a parameter whose value is used to control the learning subprocess of an ML task (classification, regression, etc.). By contrast, the values of other parameters (typically node weights) are learned via training. Hyper-parameters can be classified as model hyper-parameters, which cannot be deduced while fitting the machine to the training set because they refer to the model selection task, or algorithm hyper-parameters, which in principle have no influence on the performance of the model but affect the speed and quality of the learning process. The choice of hyper-parameters can significantly affect the resulting model's performance, but determining good values can be complex [2] (Fig. 2).

SN Computer Science
The possibilities of GANs are enormous because they can learn to simulate and map any data. A trained GAN can be used to create material similar to our own in any domain: images, animation, news anchors, speech. The main work here is to build a GAN that can be used in two scopes: to cheat systems with fake items, or to improve system security by detecting fake items. After building this GAN, a tuning process for the hyper-parameters is performed to obtain an optimal GAN for both scopes.

GAN Applications
GANs have interesting applications that are commonly used in industry right now.
• GANs for image editing: most image editing software these days does not give us full freedom to make creative changes in pictures, for example, changing the appearance of a 90-year-old person by changing his or her hairstyle. This cannot be done with current image editing software. A similar application is image de-raining (removing rainy texture from images). Using a GAN gives the ability to make all such edits to a specific image [3].
• GANs for security: industrial applications should be robust to cyber attacks, since there is a lot of confidential information to protect, and GANs are proving to be of immense help here. GANs are used to make existing deep learning models more robust to these attacks by creating more such fake examples and training the model to identify them [4].
• Generating data with GANs: many domains and fields do not have enough data, especially domains where training data are needed for supervised deep learning algorithms. GANs can be used to generate synthetic data for supervision [5].
• GANs for 3D object generation: game designers work countless hours recreating 3D avatars and backgrounds to give them a realistic feel, and it certainly takes a lot of effort to create 3D models from imagination. GANs have incredible power to automate the entire process and create 3D models [6].
There are also other applications, so GANs are a very important, interesting, and useful tool to understand and study well.

Background and Related Work
There are many works related to GAN hyper-parameter tuning. In [7], the authors tried to find an appropriate structure more conveniently and efficiently. A multiobjective method was proposed to obtain the optimal structure for GANs: the nondominated sorting genetic algorithm II (NSGA-II) is utilized to optimize the hyper-parameters and structure of a deep convolutional generative adversarial network (DCGAN). Experiments conducted on the MNIST and Malware datasets demonstrate the efficiency and high performance of the proposed method.
In [8], the authors propose the use of conditional generative adversarial networks (cGANs) for trading strategy calibration and aggregation. They provide a full methodology on (i) the training and selection of a cGAN for time series data; (ii) how each sample is used for strategy calibration; and (iii) how all generated samples can be used for ensemble modeling. They designed an experiment with multiple trading strategies encompassing 579 assets, and compared the cGAN with an ensemble scheme and model validation methods, both suited for time series. The results suggest that cGANs are a suitable alternative for strategy calibration and combination, providing outperformance when the traditional techniques fail to generate any alpha. Their problem can be decomposed into two tasks: model validation and hyper-parameter optimization. For each hyper-parameter, they have a space of values, and the hope is that this space contains the best value of the hyper-parameter, the one giving the maximum accuracy or the minimum loss or error.
In [9], a conditional version of generative adversarial networks (cGAN) is used to approximate the true data distribution and generate data for the minority class of various imbalanced datasets. The performance of the cGAN is compared against multiple standard oversampling algorithms. The authors present empirical results that show a significant improvement in the quality of the generated data when the cGAN is used as an oversampling algorithm. The hyper-parameters of the cGAN are the dimension of the noise space, the hyper-parameters related to the architecture of the G (generator) and D (discriminator) networks, and their training options. Hyper-parameter tuning of the classifiers and the various oversampling algorithms was done to maximize the AUC (area under the curve) on the validation set.
In [10], the authors propose and study an architectural modification (self-modulation) that improves GAN performance across different datasets, architectures, losses, regularizers, and hyper-parameter settings. They found that self-modulation allows the intermediate feature maps of a generator to change as a function of the input noise vector. While reminiscent of other conditioning techniques, it requires no labeled data. They also observe a relative decrease of 5–35% in FID (Frechet inception distance); their modification to the generator leads to improved performance in 86% of the studied settings.
In [11], the authors studied crowd counting in images and created a feature map of the crowd in each image; a CNN was used to obtain the feature maps. A GAN can be used to obtain a true image of a crowd and tell whether the crowd in an image is fake or not.
In [12], the authors studied an approach for semantic segmentation in real urban scenes. Given a source domain (synthetic data) with pixel/object-level labels and a target domain (real-world scenes) with only object-level labels, their goal is to train a segmentation model to predict the per-pixel labels of the target domain; their model resembles a GAN.
In general, authors have found that most models can reach similar scores with enough hyper-parameter optimization and random restarts. They suggested that improvements can arise from a higher computational budget and more tuning, rather than from fundamental algorithmic changes. To overcome the limitations of some metrics, they also proposed several datasets on which precision and recall can be computed. Their experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures.

Our Work
In this paper, we studied the effects of hyper-parameters in GANs and how they affect the generator and discriminator. Our work is therefore distinguished by:
• choosing the best learning rate for the GAN;
• choosing the best dropout keep probability;
• choosing the best batch size;
• choosing the best number of neurons in the dense layers.
Briefly, the proposed method chooses optimal values of the hyper-parameters (learning rate, dropout keep probability, batch size, and number of neurons in the dense layers) to make the designed system better. The designed system can be used in two different scopes:
• detection of fake items (this scope has its own optimal hyper-parameters);
• faking a real system with fake items (this scope also has its own optimal hyper-parameters).
For both scopes, the tool for finding optimal hyper-parameters is the genetic algorithm. Hyper-parameters are important because they directly control the behavior of the training algorithm and have a significant impact on the performance of the model being trained. Choosing appropriate hyper-parameters plays a crucial role in the success of a neural network architecture, since it has a huge impact on the learned model. For example, if the learning rate is too low, the model will miss important patterns in the data; if it is too high, training may oscillate or diverge. Choosing good hyper-parameters gives two benefits:
• efficiently searching the space of possible hyper-parameters;
• easily managing a large set of experiments for hyper-parameter tuning.

Methods
Here, the concept of our work is explained in detail; a genetic algorithm was used to find the best values of some hyper-parameters.

Genetic Algorithm
According to [13], a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions for optimization and search problems by relying on biologically inspired operators such as mutation, crossover, and selection. For more details about GAs, see [14].
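As a minimal illustration of the operators just mentioned (selection, crossover, mutation), the following pure-Python sketch minimizes a toy one-dimensional loss; the population size, rates, and toy objective are illustrative choices, not the paper's actual GA settings or library.

```python
import random

def genetic_minimize(f, lo, hi, pop_size=20, generations=60,
                     mutation_rate=0.3, seed=0):
    """Minimize f over [lo, hi] with a toy real-coded genetic algorithm."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)                       # selection: keep the fittest half
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0             # crossover: midpoint of two parents
            if rng.random() < mutation_rate:  # mutation: small random perturbation
                child += rng.gauss(0, (hi - lo) * 0.05)
            children.append(min(hi, max(lo, child)))
        pop = parents + children
    return min(pop, key=f)

# Toy "loss" with its minimum at x = 0.25, standing in for a GAN loss.
best = genetic_minimize(lambda x: (x - 0.25) ** 2, 0.0, 1.0)
```

In the paper's setting, the toy lambda would be replaced by a function that trains the GAN with a candidate hyper-parameter value and returns the observed loss.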

GAN
The GAN model consists of two main parts: a generator and a discriminator. The training phases of the discriminator and generator are kept separate; in other words, the weights of the generator remain fixed while it produces examples for the discriminator to train on, and vice versa. The discriminator training process looks like that of any other neural network: the discriminator classifies both real samples and fake data from the generator. The discriminator loss function penalizes the discriminator for misclassifying a real instance as fake or a fake instance as real, and the discriminator's weights are updated via back-propagation. The discriminator has both a loss and an accuracy function for validation.
Similarly, the generator generates samples which are then classified by the discriminator as fake or real. The results are fed into a loss function that penalizes the generator for failing to fool the discriminator, and back-propagation is used to modify the generator's weights. The generator has a loss function.
As the generator improves with training, the discriminator's performance gets worse, because the discriminator increasingly fails to distinguish between real and fake. If the generator succeeds perfectly, the discriminator has 50% accuracy (no better than random chance). The latter poses a real problem for convergence of the GAN as a whole: if the GAN continues training past the point where the discriminator is giving completely random feedback, the generator starts to train on junk feedback, and its own performance may degrade. The generator is typically a de-convolutional neural network and the discriminator a convolutional neural network [15]; when the generator is very accurate, i.e., its loss tends to zero, the discriminator has bad accuracy, and vice versa.
New studies on adaptive loss functions and entropy regularization were carried out in [16]. Figure 3 shows the generator (a de-convolutional neural network) and the discriminator (a convolutional neural network).

Hyper-Parameter Tuning
The tuning process with a GA needs a mathematical term (a loss function) to be minimized during the GA run. From the above description of the GAN, we have the generator loss g_loss, the discriminator loss d_loss, and the discriminator accuracy d_acc.
Performance parameters might therefore be g_loss, d_loss, or (1 − d_acc), each of which the GA can minimize.
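The loss and accuracy terms listed above can be sketched as simple fitness wrappers for the GA; note that `train_gan_step` below is a hypothetical stand-in for one training/evaluation run of the GAN, not the paper's actual code.

```python
# Hypothetical evaluation stub: in the real system this would train the GAN
# with the candidate hyper-parameters and report its metrics.
def train_gan_step(params):
    g_loss, d_loss, d_acc = 0.7, 0.6, 0.55   # placeholder metric values
    return g_loss, d_loss, d_acc

def fitness_g_loss(params):      # minimize to favour a strong generator
    return train_gan_step(params)[0]

def fitness_d_loss(params):      # minimize to favour a strong discriminator
    return train_gan_step(params)[1]

def fitness_d_error(params):     # equivalently, minimize 1 - accuracy
    return 1.0 - train_gan_step(params)[2]
```

Whichever wrapper is handed to the GA as f(X) determines which scope (generator or discriminator) the tuned hyper-parameters favour.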

Programming Environment
According to many references, Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, data processing, predictive analytics, etc.) that aims to simplify package management and deployment. The distribution includes data science packages suitable for Windows, Linux, and macOS. It is developed and maintained by Anaconda, Inc., which was founded by Peter Wang and Travis Oliphant in 2012. Details about the Python version are shown in Fig. 4.
The implementation was done with TensorFlow version 1.14.0, Python version 3.7.4, and Keras version 2.2.5. Keras is an open-source library that provides a Python interface for artificial neural networks and acts as an interface for the TensorFlow library. Keras is an API designed for human beings, not machines: it follows best practices for reducing cognitive load by offering consistent and simple APIs, minimizing the number of user actions required for common use cases, and providing clear and actionable error messages. It also has extensive documentation and developer guides [15].

Basic Model
The basic generator of the GAN model is shown in Fig. 5, while each layer of the generator is shown in Fig. 6.
The basic discriminator of the GAN model is shown in Fig. 7.
The optimizer was Adam (adaptive moment estimation), an algorithm for first-order gradient-based optimization of stochastic objective functions [17]. The discriminator and generator loss function is binary cross-entropy, also called sigmoid cross-entropy loss: a sigmoid activation plus a cross-entropy loss. It is independent of each vector component (class), meaning that the loss computed for each discriminator output vector component is not affected by the other component values. That is why it is used for multi-label classification (the MNIST dataset is a multi-label classification task in this sense), where the insight that an element belongs to a certain class should not influence the decision for another class. It is called binary cross-entropy loss because it sets up a binary classification problem between two classes for every class in C.
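The binary cross-entropy just described can be computed directly; this small sketch uses plain Python rather than the Keras implementation the paper relies on, and the example probabilities are illustrative.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of sigmoid outputs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct discriminator predictions give a small loss ...
low = binary_cross_entropy([1, 0], [0.9, 0.1])
# ... while confident, wrong predictions are penalized heavily.
high = binary_cross_entropy([1, 0], [0.1, 0.9])
```

A perfect discriminator drives this loss toward 0, which is exactly the quantity the GA minimizes during tuning.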

Layer Details
For the LeakyReLU activation function, alpha was chosen to be 0.2, where LeakyReLU is defined as f(x) = x for x > 0 and f(x) = alpha * x otherwise. The default value of alpha is 0.3.
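A minimal sketch of this activation, with the paper's choice of alpha = 0.2 as the default:

```python
def leaky_relu(x, alpha=0.2):
    """LeakyReLU: pass positives through, scale negatives by alpha."""
    return x if x > 0 else alpha * x
```

Unlike plain ReLU, negative inputs keep a small gradient (alpha), which helps avoid "dead" units in the discriminator.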
For batch normalization, momentum is the importance given to the moving average, or the lag in learning the mean and variance, so that noise due to the mini-batch can be ignored, as described by the update moving_mean = momentum × moving_mean + (1 − momentum) × batch_mean. By default, momentum is set to a high value of about 0.99, meaning high lag and slow learning. When batch sizes are small, more steps are run, so a high momentum results in slow but steady learning (more lag) of the moving mean; in this case it is helpful. But when the batch size is bigger, as used here, i.e., 5 K images (out of 50 K) in a single step, the number of steps is smaller, and the statistics of a mini-batch are mostly the same as those of the population. In that case the momentum has to be smaller, so that the mean and variance are updated quickly. Hence, a ground rule is:
• small batch size: high momentum (0.9–0.99);
• big batch size: low momentum (0.6–0.85).
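The moving-average update behind this momentum discussion can be sketched as follows (this matches the Keras convention, where a higher momentum means a slower-moving, higher-lag estimate); the numbers in the usage lines are illustrative only.

```python
def update_moving_mean(moving_mean, batch_mean, momentum):
    """Exponential moving average used by batch normalization."""
    return momentum * moving_mean + (1.0 - momentum) * batch_mean

# High momentum (0.99): the estimate barely moves per step (more lag).
slow = update_moving_mean(0.0, 1.0, momentum=0.99)
# Low momentum (0.6): the estimate tracks the mini-batch quickly.
fast = update_moving_mean(0.0, 1.0, momentum=0.6)
```

With one update per step, a big-batch run performs few steps, so only the low-momentum version moves the estimate far enough, which is the ground rule stated above.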
Features were selected for the discriminator by a convolutional neural network; there are many ways of selecting features, explained in [18]. Since the discriminator is a convolutional neural network, and dropout layers are important in training CNNs because they prevent over-fitting on the training data, a dropout layer can also be added to the discriminator. Dropout refers to dropping out units (both hidden and visible) in a neural network: during the training phase, a certain set of neurons, chosen at random according to a term named the keep probability, is ignored, i.e., these units are not considered during a particular forward or backward pass [19]. Figure 8 shows how a dropout layer ignores some neurons from its input.
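Dropout with a keep probability can be sketched as below; this is the common "inverted dropout" formulation (kept units are scaled by 1/keep_prob so the expected activation is unchanged), offered here as an illustration rather than the exact Keras internals.

```python
import random

def dropout(values, keep_prob, seed=0):
    """Inverted dropout: zero out each unit with probability 1 - keep_prob."""
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]

# Roughly 80% of units survive (scaled up), the rest are ignored this pass.
out = dropout([1.0] * 1000, keep_prob=0.8)
```

The keep probability here is exactly the dropout hyper-parameter the GA tunes later in the paper.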

Genetic Algorithm Implementation
A genetic algorithm library was used to implement the genetic algorithm; Table 1 shows the main code of the implementation. Here, dimension refers to the number of variables for which the GA will find the best values; variable_type refers to the types of these variables; and variable_boundaries refers to the minimum and maximum limits of the search area for each variable. Other parameters can be set using algorithm_param [20].

Experiment Results
Results for the two scopes of the GAN are presented: first the scope of the discriminator, where the goal is detection of fake items, and then the scope of the generator, where the goal is to fake a real system with fake items.

Tuning for Discriminator Loss
In this test, the preferred part is the discriminator: the goal of the GAN is to detect fake items in the used system. The mission of the GA was therefore to minimize the discriminator loss, so the value returned by the f(X) function in the GA was the discriminator loss (Table 1). We could also minimize the value (1 − discriminator accuracy).

Learning Rate
First, the learning rate was chosen to be tuned, after setting all the other needed parameters. The mission of the GA was to search for the best value of the learning rate that minimizes the discriminator loss; the results gave some values of the learning rate that yielded a discriminator loss of 0 and a discriminator accuracy of 100%. Table 2 shows some results obtained during training.
Results filtered for 100% accuracy only are shown in Table 3; the maximum generator loss was 12.89 and the minimum value was 8.06.
For the maximum generator loss, the learning rate took several values, highlighted in green; these results are a sample of the results during training. In the end, the most repeated value was 0.00019. For the minimum generator loss, the learning rate had the value 0.3.

Other Parameter Results
Repeating the previous test to find the best values of the other parameters (keep probability, dense neurons, batch size), the best values are shown in Table 4. For finding the optimal batch size, the GA searches for this value, but the step size of the search cannot be fixed: supposing the first value is 8, the GA at each iteration changes this value by an undetermined step within the defined range. According to many studies, the batch size is an integer number, so we impose the constraint in the code below to choose its value (Table 5).
If the batch size is a multiple of 8, the function returns the true value of g_loss; if it is not, it returns 100 × g_loss, so the minimum value of g_loss can only occur when the batch size is a multiple of 8. Suppose we did not apply this constraint: for iterations 1, 2, 3, ..., 10, the batch size would change as 8, 9, 10, 12, 13, ..., 18, and g_loss would be calculated each time.
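The multiples-of-8 constraint described above can be sketched as a penalty inside the GA's objective; `train_and_get_g_loss` is a hypothetical stand-in for a real training run, not the paper's code.

```python
def train_and_get_g_loss(batch_size):
    # Hypothetical stand-in: a real run would train the GAN and
    # return the generator loss observed for this batch size.
    return 0.5

def constrained_objective(batch_size):
    """Penalize batch sizes that are not multiples of 8 so the GA avoids them."""
    g_loss = train_and_get_g_loss(batch_size)
    if batch_size % 8 == 0:
        return g_loss
    return 100.0 * g_loss   # heavy penalty steers the search to multiples of 8
```

Because the GA minimizes the returned value, any candidate outside the constraint is 100 times worse than an equivalent valid one, so the search settles on multiples of 8.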

Tuning for Generator Loss
In this test, the mission of the GA was to minimize the generator loss, so the value returned by the f(X) function in the GA was the generator loss (Table 6).

Learning Rate
First, the learning rate was chosen to be tuned, after setting all the other needed parameters. The mission of the GA was to search for the best value of the learning rate that minimizes the generator loss; the results gave some values of the learning rate that yielded a generator loss of 0. Table 7 shows some results obtained during training.
Results filtered for a loss of 0.0 only are shown in Table 8; the maximum discriminator loss was 7.97 and the minimum value was 4.78.

For the maximum discriminator loss, the most repeated value was 0.171924; the highlighted rows refer to the maximum discriminator loss. For the minimum discriminator loss, the learning rate had the value 0.0154.

Other Parameter Results
Repeating the previous test to find the best values of the other parameters (keep probability, dense neurons, batch size), the best values are shown in Table 9. Figure 9 shows results during generator learning when all the best parameters were set.
For generating a fake item, each point of noise represents a pixel in the image during the generating process. The value of each pixel, the point of noise, is changed according to the desired output so as to minimize the error function between the output and the desired output. The changes keep happening until the error decreases to an acceptable value.
As seen above, in the left corner (step a) the pixels are like zeros and ones with no consideration of their locations. After a while, their values change (step b) to take shape. At the end, the noise has been converted into a meaningful image, as in the final step. This image is created from noise; it is not a real image. The real image could be, for example, the face of someone, or a fingerprint.

Discussion
From the results, it can be said that the GAN with GA (GANGA) is a useful and powerful tool that can be applied in many fields. In our case, it is applied to the MNIST dataset, but it can be applied to any other dataset, such as a fingerprint dataset; a GAN may be used to generate fingerprint images and test their authenticity in security applications [21]. A GAN could also be used to detect fake and real iris images [22].
According to the scopes explained previously, one main usage of a GAN is detecting fake images produced by applications such as Photoshop; these fake images may contain fake faces, irises, or any other part of any view. A GAN is able to distinguish these fake images.
A GAN can be used to generate fake items, such as fake fingerprints, fake iris prints, and fake signatures, that look like real items; on the other hand, it can detect such fake items. MNIST in our research is just a case study: once the user has all the details of the application, GANGA can be applied to find the best parameters. Compared with [23], where the author validates the value 0.0001 for the learning rate and 16 for the batch size with the same loss type, the maximum discriminator accuracy was 96.25% using ACGAN (auxiliary classifier generative adversarial network). In our case, the discriminator accuracy was 100% for a learning rate of 0.00019984, a batch size of 64, and a keep probability of 0.812.
The best values of the hyper-parameters are described in Tables 5 and 8; the main point is that we have two options for selecting these parameters:
• If the goal of the GAN is to detect fake items in the used system (the preferred part is the discriminator), Table 5 has to be considered for the hyper-parameters.
• If the goal of the GAN is to fake the used system with fake items (the preferred part is the generator), Table 8 has to be considered for the hyper-parameters.
While tuning the hyper-parameters, we searched for each one separately in order to understand and analyze the effect of each parameter; searching for many parameters at the same time would not let us understand which parameter affected the system performance. The process of finding the best values of the hyper-parameters can thus be simplified as follows: tune one hyper-parameter at a time with the GA while keeping the others fixed, then carry its best value forward to the next search. At the end of this process, the combination of all tuned hyper-parameters gave us the best performance. Figure 10 shows pseudo-code for all our work.
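The per-parameter search described above can be sketched as a loop that tunes one hyper-parameter at a time while holding the others fixed; `evaluate` and `ga_search` below are hypothetical stand-ins for the GAN training run and the GA optimizer, respectively (a simple grid search stands in for the GA in the toy usage).

```python
def tune_sequentially(params, search_spaces, evaluate, ga_search):
    """Tune each hyper-parameter in turn, holding the others fixed."""
    best = dict(params)
    for name, (lo, hi) in search_spaces.items():
        def objective(value):
            trial = dict(best)
            trial[name] = value
            return evaluate(trial)        # e.g. g_loss or d_loss after training
        best[name] = ga_search(objective, lo, hi)
    return best

# Toy usage: pretend the loss is minimized at lr = 0.2 and keep_prob = 0.8.
def toy_loss(p):
    return (p["lr"] - 0.2) ** 2 + (p["keep_prob"] - 0.8) ** 2

def grid_search(f, lo, hi, steps=101):    # stand-in for the GA optimizer
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(xs, key=f)

best = tune_sequentially({"lr": 0.5, "keep_prob": 0.5},
                         {"lr": (0.0, 1.0), "keep_prob": (0.0, 1.0)},
                         toy_loss, grid_search)
```

Each parameter's best value is fixed before the next search starts, matching the one-at-a-time analysis used in the experiments.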

Conclusion
A generative adversarial network was presented in this study to generate fake images of the digits 1–9 and to train the discriminator part to classify the results as fake or real. A genetic algorithm was used to select the best values for some hyper-parameters of the GAN, and the results showed the importance of the GA in selecting hyper-parameters. As future work, the structure and design of the GAN can be revised, together with the best values of the hyper-parameters, to make the GAN as robust as possible.