GGADN: Guided generative adversarial dehazing network

Image dehazing has always been a challenging topic in image processing. The development of deep learning methods, especially generative adversarial networks (GANs), provides a new way to dehaze images. In recent years, many GAN-based deep learning methods have been applied to image dehazing. However, GANs face two problems in this task. Firstly, haze not only reduces the quality of an image but also blurs its details, and it is difficult for the generator to restore the details of the whole image while removing the haze. Secondly, the GAN model is defined as a minimax problem, which weakens the loss function: it is difficult to tell whether a GAN is making progress during training. Therefore, we propose a guided generative adversarial dehazing network (GGADN). Different from other generative adversarial networks, GGADN adds a guided module to the generator. The guided module verifies each layer of the generator and, at the same time, strengthens the details of the map generated by each layer. Network training is based on a pre-trained VGG feature model and an L1-regularized gradient prior, combined through new loss function parameters. Dehazing results on synthetic and real images show that the proposed method outperforms state-of-the-art dehazing methods.


Introduction
In computer vision, weather is an important factor affecting image quality. References (Khan 2018; O'Mahony 2019; Nanni et al. 2017) show the latest achievements of deep learning in this field, while references (Voulodimos 2018; Janai 2020; Kendall and Yarin 2017) elaborate its challenges. Haze is especially problematic: floating particles lead to fading and blurring of pictures and a reduction in contrast and softness. They absorb and scatter light, resulting in serious color attenuation, poor clarity and contrast, and poor visual effect, which severely impacts subsequent computer vision tasks. Therefore, it is necessary to remove haze effectively.
In recent years, research on image dehazing algorithms has made great progress. Current image dehazing research is mainly divided into two types: feature-based methods (Hou et al. 2020; Dharejo 2021) and learning-based methods (Zhao 2021). The difficulty of feature-based methods lies in feature extraction and prior selection. The common dehazing features and priors are as follows:

- Contrast: Tan found that the contrast of haze-free images is high; thus, image dehazing is performed by maximizing the local contrast of the image.
- Dark channel prior: He found that the dark channel value of a haze-free image is close to zero, which can then be used to estimate the transmission map. References (Zhang 2017; Iwamoto et al. 2020) improve the dark channel method and achieve better haze removal results.
- Color attenuation prior: Zhu found a statistical relationship between haze concentration and brightness and saturation, created a linear model of scene depth to solve for the scene, and then calculated the haze-free image. Zhang et al. (2021) improve this algorithm and apply it to dehazing of water regions.
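The dark channel prior mentioned above is straightforward to compute directly: take the per-pixel minimum over the RGB channels, then a minimum over a local patch. A minimal NumPy sketch (the patch size and test image are illustrative, not the authors' code):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel: per-pixel min over RGB, then min over a local patch.
    For haze-free outdoor images this is close to zero (He's prior)."""
    min_rgb = img.min(axis=2)
    h, w = min_rgb.shape
    r = patch // 2
    padded = np.pad(min_rgb, r, mode='edge')
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# A saturated, colorful image has a near-zero dark channel; adding a
# uniform haze veil (I = 0.6*J + 0.4*A with A = 1) lifts it everywhere.
rng = np.random.default_rng(0)
img = np.stack([rng.random((32, 32)),           # red channel
                np.zeros((32, 32)),             # green forced to zero
                rng.random((32, 32))], axis=2)  # blue channel
hazy = 0.6 * img + 0.4
```

Estimating the transmission map from the dark channel then follows He's derivation; only the prior itself is shown here.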
Learning-based dehazing algorithms can be divided into two kinds, step-by-step learning algorithm (Dudhane and Murala 2019;Chen 2020) and end-to-end learning algorithm (Li 2021;Shyam et al. 2021).
The step-by-step learning algorithm is similar to the traditional method, focusing on the prediction of intermediate variables. For example, Cai (2016) designed DehazeNet by analyzing artificial prior features to complete the prediction of the transmission map. Similarly, Ren (2016) proposed a multi-scale convolutional neural network (MSCNN), which can accurately predict the transmission map through two network models at different scales. The end-to-end learning algorithm can realize image dehazing simply and efficiently through the design of a fully convolutional neural network. For example, considering that the above algorithms ignore the reasonable prediction of the atmospheric light value, Li (2017) integrated the multiple intermediate variables of the atmospheric scattering model into one through a linear transformation and proposed AOD-Net to directly predict haze-free images.
In this paper, we present a guided generative adversarial dehazing Network (GGADN). The innovation factors of this paper are listed as follows.
- Different from other generative adversarial networks, GGADN adds a guided module to the generator. The guided module verifies each layer of the generator and, at the same time, strengthens the details of the map generated by each layer.
- GGADN is trained and corrected on a synthetic hazy image dataset containing indoor and outdoor images. Network training is based on a pre-trained visual geometry group (VGG) feature model and an L1-regularized gradient prior, combined through new loss function parameters.
Dehazing results on synthetic and real images show that the proposed method outperforms state-of-the-art dehazing methods, and the dehazed image is clearer in detail.
The remainder of this paper is organized as follows. In Sect. 2, we briefly review the atmospheric scattering model and generative adversarial networks. In Sect. 3, we present the details of our proposed method. Experiments are presented in Sect. 4. The conclusion is drawn in Sect. 5.

Atmospheric scattering model
The purpose of image dehazing is to restore a clear image from the blurred image corroded by haze or smoke. In the field of computer vision, in order to overcome the image distortion caused by haze, McCartney (1976) proposed an atmospheric scattering model which can be used to describe the formation process of a haze image (Ju 2021; Ju et al. 2017; Dai 2020). The equation is as follows:

I(x) = J(x)T(x) + A(x)(1 − T(x))     (1)

where I(x) is the haze image and J(x) is the haze-free image. A(x) is the value of atmospheric light, representing the intensity of atmospheric light. T(x) is the transmittance, which indicates the part of light that is not scattered when it reaches the imaging device through the atmospheric medium, and x represents the pixel position. In the formula, the first term on the right is the direct attenuation term, which represents the reflected light of the object after atmospheric attenuation, and the second term is the airlight contributed by atmospheric scattering. When the composition of the atmosphere is uniform, that is, A(x) is constant, the transmittance can be expressed as:

T(x) = e^(−βD(x))     (2)

where β is the attenuation coefficient of the atmosphere and D(x) is the depth of the scene. From the formula, it is not difficult to find that the scene depth and atmospheric light value have a great influence on the dehazing effect. In single-image dehazing, only the haze image is known, the atmospheric light value and scene depth are unknown, and the assumption of uniform atmospheric composition is not necessarily true. Therefore, how to remove haze effectively is a challenging problem.
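As a quick numerical illustration of the model, the following NumPy sketch (values chosen arbitrarily, with A assumed uniform) shows the two limiting behaviours: at zero depth the observation equals the scene radiance, and distant pixels converge to the airlight A.

```python
import numpy as np

def transmittance(depth, beta):
    """T(x) = exp(-beta * D(x)): fraction of light surviving scattering."""
    return np.exp(-beta * depth)

def hazy_image(J, depth, A, beta):
    """Atmospheric scattering model: I(x) = J(x)T(x) + A(1 - T(x))."""
    T = transmittance(depth, beta)
    return J * T + A * (1.0 - T)

J = np.array([0.2, 0.5, 0.7])                     # clear-scene radiance (RGB)
A = 0.9                                           # assumed uniform airlight
near = hazy_image(J, depth=0.0, A=A, beta=1.0)    # no haze at zero depth
far = hazy_image(J, depth=50.0, A=A, beta=1.0)    # heavy haze far away
```

Inverting this model for J requires estimates of both A and T, which is exactly the ill-posed part of single-image dehazing noted above.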

Generative adversarial networks
The GAN (generative adversarial network) is a deep learning model designed by Goodfellow et al. (2014) and is one of the most promising unsupervised learning methods for complex distributions in recent years (Schonfeld et al. 2020). References (Gui et al. 2020; Deldjoo et al. 2021) introduce the development of this field. The model produces good output through game-theoretic learning between two modules in the framework: a generative model and a discriminative model (Karras et al. 2020). In the original GAN theory, G and D are not required to be neural networks; they only need to fit the corresponding generating and discriminating functions. In practice, however, deep neural networks are generally used as G and D (Lin et al. 2021). An excellent GAN application needs a good training method; otherwise, the output may not be ideal due to the freedom of the neural network model (Yamamoto et al. 2020).
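The game between G and D is defined by the minimax value V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], which D maximizes and G minimizes. A small NumPy sketch of evaluating this value on sample discriminator outputs (illustrative numbers only):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Minimax value V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs on real samples, in (0, 1).
    d_fake: discriminator outputs on generated samples, in (0, 1).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (real -> ~1, fake -> ~0) attains a higher
# value than one the generator has fooled (both outputs near 0.5).
v_confident = gan_value([0.9, 0.95], [0.05, 0.1])
v_fooled = gan_value([0.5, 0.5], [0.5, 0.5])
```

Because only this game value is optimized, neither player's objective behaves like an ordinary loss curve, which is precisely the training-progress problem discussed below.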
However, the network structure of GAN is unstable in the training process, and artifacts such as noise and color shift are often produced in the synthesized image. Sohn et al. (2015) introduced conditional information into GAN. The additional conditional variables ensure the stability of the learning process to a certain extent and improve the representation ability of the generator, but the running time is too long. Engin et al. (2018) propose an end-to-end network called Cycle-Dehaze for single-image dehazing; to improve the quality of recovered texture information and produce a visually better haze-free image, the CycleGAN architecture is enhanced by combining cycle consistency and perceptual loss. Du and Xin (2018) also use an end-to-end learning method to learn the mapping from hazy image to haze-free image directly. This dehazing network is trained adversarially through the GAN model, an adaptive loss function is used in the discriminator, and a post-processing method using guided filtering is proposed to remove halo artifacts. Dong et al. (2020) propose a multi-scale boosted dehazing network with dense feature fusion based on the U-Net architecture. This method is designed on two principles, boosting and error feedback, which suit the dehazing problem; by incorporating the Strengthen-Operate-Subtract boosting strategy in the decoder, it develops a simple yet effective boosted decoder to progressively restore the haze-free image. A non-local dehazing network (NLDN) has also been proposed, which learns the mapping between hazy images and haze-free images.
This network architecture consists of three components: the first is a fully point-wise convolutional part, which extracts non-local statistical regularities; the second is a feature combination part, which learns the spatial relations of these regularities; the third is a reconstruction part, which recovers the haze-free image from the features extracted by the second part. With these three components, the network obtains a high-quality dehazing result.
Although the existing GAN network has achieved some results in haze removal, there are also some problems. This is the motivation of this paper.
- Firstly, the GAN model is defined as a minimax problem, which weakens the loss function: it is difficult to tell whether the GAN is making progress during training.
The learning process of a GAN may suffer from mode collapse, in which the generator degenerates: it always generates the same sample points and cannot continue learning. When the generative model collapses, the discriminative model also points in a similar direction for these similar sample points, so training cannot continue.
- Secondly, haze not only reduces the quality of the image but also blurs its details. For a GAN, it is difficult for the generator to restore the details of the whole image while removing the haze; especially for images with complex structure, the dehazing effect is not ideal.
Based on the above motivations, this paper improves the dehazing GAN and designs a guided generative adversarial dehazing network (GGADN). The main research contributions of the proposed dehazing network are as follows:

- Firstly, the proposed network training is based on a pre-trained visual geometry group (VGG) feature model and an L1-regularized gradient prior, combined through new loss function parameters. The whole loss function framework has a distributed structure: the loss function of the generator includes not only the output of the generator but also the result of the discriminator. The composite loss function avoids repeated sample points; because the feedback of the discriminator is incorporated, the probability of generator degradation is greatly reduced. The new loss function therefore enables the proposed network to make measurable progress during training.
- Secondly, in order to improve the detail quality of the dehazed image, we add a guide module to the generator. The guidance module corrects the details of each layer of the generator. The guide maps the high-level features of the data distribution to low-level representations of the data and provides feature mappings to filter the data and parameters by processing the image training set.
The next section will present the proposed method.

The proposed method
This paper presents a guided generative adversarial dehazing network (GGADN). In GGADN, the generator and discriminator architectures are modified. The network is trained end-to-end on a synthetic dataset. The loss function is modified using pre-trained VGG features and an L1-regularized gradient. A sigmoid function is introduced in the last layer of the discriminator for feature mapping, so that the discriminant results can be normalized to [0, 1] for probability analysis.

Guided generative adversarial dehazing network (GGADN)
In the proposed algorithm, an end-to-end dehazing network is trained to avoid the image distortion or artifacts caused by estimating the transmittance and atmospheric light value. In order for the generator to produce better dehazed images, the generator and discriminator architectures in GGADN are designed to capture more useful information. At the same time, to generate realistic, clear images and remove artifacts, the dehazing network is trained and corrected on a synthetic hazy image dataset containing indoor and outdoor images. Network training is based on a pre-trained VGG feature model and an L1-regularized gradient prior, combined through new loss function parameters. The VGG features include:

- Small convolution kernels, replacing larger kernels with 3 × 3 kernels.
- Small pooling kernels, replacing 3 × 3 pooling kernels with 2 × 2 pooling kernels.
- Deeper and wider feature maps. Because the convolution kernels focus on expanding the number of channels and pooling focuses on reducing width and height, the model architecture becomes deeper and wider while the growth in computation slows down.
- Fully convolutional connections. In the test phase, the three fully connected layers of the training phase are replaced by three convolutions, and the trained parameters are reused, so that the fully convolutional network can accept input of any width or height because there is no fully connected layer to limit it.
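The benefit of the small 3 × 3 kernels can be checked with a quick parameter count: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 convolution while using fewer weights. A back-of-the-envelope sketch (bias terms omitted; the channel count is an arbitrary example):

```python
def conv_params(k, c_in, c_out):
    """Number of weights in one k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def stacked_receptive_field(k, n):
    """Receptive field of n stacked k x k convolutions at stride 1."""
    return n * (k - 1) + 1

C = 64
# Two stacked 3x3 convs see a 5x5 window but use fewer weights
# (2 * 9 * C^2 = 18 C^2) than a single 5x5 conv (25 C^2).
two_3x3 = 2 * conv_params(3, C, C)
one_5x5 = conv_params(5, C, C)
```

The stacked form also inserts an extra nonlinearity between the two convolutions, which is part of why the deeper-but-thinner VGG design works well.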
The overall framework of proposed method is shown in Fig. 1.

Generator model
The function of the generator is to generate a clear image from the input haze image. It should not only preserve the structure and details of the input image but also remove the haze as much as possible. The generator introduces the skip connections of symmetric layers from the ResNet and U-Net models to break through the bottleneck of information redundancy in the dehazing process. In addition, the generator discards the approach of simply concatenating all channels of the symmetric layers and instead uses summation to capture more useful information. The generation process is mainly based on downsampling operations.
In order to better guide the generator to produce haze-free images, we add a guide module to the generator. The guide maps the high-level features of the data distribution to low-level representations of the data and provides feature mappings to filter the data and parameters by processing the image training set. The guide module mainly uses upsampling operations and nonlinear spatial transformations: it absorbs the low-level representation of the training dataset and outputs a high-level representation of the same data. Specifically, the generator modifies the loss function and selects the weights for training so as to obtain a clear image. Figure 2 shows the generator structure.
The main function of the generation module is to generate the haze-free image. As depicted in Fig. 1, the generation module includes five convolutional layers. These convolutional layers form multi-scale features by fusing filters of varied sizes. They are defined from left to right as follows:

- Conv1: In-channels are three. Out-channels are three.
The guidance module provides guidance to the generation module. As depicted in Fig. 2, the guidance module includes five convolutional layers with the same structure:

- Conv1: In-channels are three. Out-channels are three.
Every layer of the guidance module outputs guide data. These guide data ensure that the generation module does not deviate significantly when generating the haze-free image.
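The paper does not spell out exactly how the guide data enter the generation module, but the summation-based fusion described for the generator (adding matched feature maps rather than concatenating them, so the channel count is unchanged) can be sketched as follows; the shapes and names here are hypothetical:

```python
import numpy as np

def fuse_by_summation(gen_feat, guide_feat):
    """Element-wise summation fusion of two matching feature maps.
    Unlike channel concatenation, the channel count stays the same."""
    assert gen_feat.shape == guide_feat.shape
    return gen_feat + guide_feat

# Hypothetical 3-channel 8x8 feature maps from corresponding layers
# of the generation and guidance modules.
rng = np.random.default_rng(0)
gen_feat = rng.standard_normal((3, 8, 8))
guide_feat = rng.standard_normal((3, 8, 8))
fused = fuse_by_summation(gen_feat, guide_feat)
```

Keeping the channel count fixed means the downstream convolutions need no extra parameters, which matches the paper's stated preference for summation over concatenation.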

Discriminator model
The function of the discriminant network is to distinguish the real image from the generated image. As in most GAN structures, the design of the discriminant network is generally not too complicated. The discriminant network in this paper is shown in Fig. 3. Different from the original GAN discriminator, which acts as a true/false binary classifier, the purpose of D(x) in GGADN is to approximate the Wasserstein distance between the real sample and the generated sample, which is a regression task. Thus the network output does not use a sigmoid activation function but directly outputs the result.
The discriminant network includes six convolutional layers. They are defined from left to right as follows:

- Conv1: In-channels are 64. Out-channels are 64. Kernel size is 4 × 4.

Loss function
In order to train the proposed network, a loss function is needed to measure the effect of image dehazing. In this paper, a multi-task loss function is used, which includes three parts: discrimination loss, image mean square error loss, and perceptual loss. Let {I_i, i = 1, 2, ..., N} and {J_i, i = 1, 2, ..., N} denote the haze images and the corresponding clear images, respectively. GGADN training directly uses the adversarial loss, expressed as:

L_A = (1/N) Σ_{i=1}^{N} log(1 − D(Ĵ_i))     (3)

where D is the discriminant network and Ĵ_i = G(I_i) is the output of the generator G. The discriminator receives its input in minibatch mode: in image processing, the whole large training set is divided into several small subsets to improve computational efficiency and help train the model quickly. The size of each minibatch lies between the two extreme cases, and the computation can be vectorized without waiting for the whole training set. Formula 3 alone cannot restore a clear image, so a perceptual loss based on pre-trained VGG features is introduced to constrain the generator. The perceptual loss over the VGG features is defined as:

L_P = (1/N) Σ_{i=1}^{N} ||F_l(G(I_i)) − F_l(J_i)||²₂     (4)

where G is the generating network and F_l is the feature map of layer l; the VGG network is pre-trained on ImageNet. Equation 4 helps detail recovery and haze removal, but it introduces artifacts into the restored image, which reduces its quality. In order to eliminate artifacts while preserving details and structure, an L1-regularized gradient is applied to the output of the generator together with a content-based pixel-wise loss, defined as:

L_R = (1/N) Σ_{i=1}^{N} ( λ||∇G(I_i)||₁ + ||∇(G(I_i) − J_i)||₁ )     (5)

where ||∇G(I_i)||₁ is the total variation regularization, ||∇(G(I_i) − J_i)||₁ represents the content-based gradient loss, and λ is the regularization weight. This loss removes detail artifacts while preserving the details in the image. Finally, these losses are combined to form the overall objective of the proposed GGADN.
The total loss function H is defined as:

H = αL_A + βL_P + γL_R     (6)

where α, β, γ are positive weights. After the generator is updated according to Equation 6, the discriminator is updated by Equation 7.
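The composite objective described here, combining adversarial, perceptual, and gradient terms with positive weights, can be sketched in NumPy. The exact per-term definitions are not fully specified in the text, so the forms below are plausible reconstructions for illustration (the weights match the values reported in the implementation section):

```python
import numpy as np

def grad_l1(img):
    """L1 norm of horizontal and vertical finite-difference gradients."""
    dx = np.abs(np.diff(img, axis=-1)).sum()
    dy = np.abs(np.diff(img, axis=-2)).sum()
    return dx + dy

def total_loss(d_fake, feat_fake, feat_real, out, target,
               alpha=1.0, beta=150.0, gamma=150.0, lam=1e-5):
    """H = alpha*L_A + beta*L_P + gamma*L_R (weights from the paper;
    the per-term forms below are reconstructions, not verbatim)."""
    l_adv = np.mean(np.log(1.0 - d_fake))                # adversarial term
    l_per = np.mean((feat_fake - feat_real) ** 2)        # VGG perceptual term
    l_grad = lam * grad_l1(out) + grad_l1(out - target)  # L1 gradient prior
    return alpha * l_adv + beta * l_per + gamma * l_grad

rng = np.random.default_rng(1)
out = rng.random((16, 16))
target = out.copy()   # perfect restoration: gradient-difference term is zero
h_perfect = total_loss(np.array([0.5]), np.zeros(4), np.zeros(4), out, target)
```

In a real training loop `feat_fake`/`feat_real` would come from a pre-trained VGG and `d_fake` from the discriminator; plain arrays stand in for both here.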

Experimental results and analysis
To evaluate the performance of the proposed model, we compare it with several advanced single-image dehazing algorithms on synthetic datasets and real scene images. The comparison algorithms selected in this paper are: DCP (He et al. 2010), DehazeNet (Cai 2016), MSCNN (Ren 2016), and AODNet (Qian et al. 2020). DCP is a traditional prior-based method; DehazeNet and MSCNN are deep learning methods based on multi-step learning; and AODNet is an end-to-end image dehazing algorithm. All experiments are performed in Python on a PC with an Nvidia Quadro P400 GPU.

Synthetic experimental data set
Since there is almost no real haze image dataset for image dehazing, this paper uses a synthesized dataset including indoor and outdoor images to train the network. The NYU depth dataset (Huynh et al. 2020), containing only indoor images, and the Make3D dataset (Sagar 2020) of outdoor images are selected as the training data. Among them, 3000 synthetic images are randomly selected for training and 300 test images are extracted. Given a clear image J and the corresponding ground-truth depth D, a random atmospheric light value A = [n1, n2, n3] is generated, where n ∈ (0.8, 1.0). A random scattering coefficient ρ ∈ (0.8, 1.6) is used for each image. Finally, the haze image I is synthesized according to formula 1. Before synthesis, the clear image and the corresponding scene depth map are resized to a standard size of 512 × 512 pixels. Directly synthesized images usually have obvious artifacts; to eliminate them, a smoother haze image is generated by guided filtering of the depth map.
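The synthesis recipe described above can be sketched directly; the parameter ranges are taken from the text, while the stand-in clear image and depth map (and the omission of the guided-filtering step) are simplifications:

```python
import numpy as np

rng = np.random.default_rng(42)

def synthesize_hazy(J, depth):
    """Synthesize a hazy image from a clear image and its depth map,
    following I = J*T + A*(1 - T) with T = exp(-rho * depth)."""
    A = rng.uniform(0.8, 1.0, size=3)    # random per-channel airlight
    rho = rng.uniform(0.8, 1.6)          # random scattering coefficient
    T = np.exp(-rho * depth)[..., None]  # broadcast over RGB channels
    return J * T + A * (1.0 - T)

H, W = 512, 512                          # standard training size
J = rng.random((H, W, 3))                # stand-in clear image
depth = rng.random((H, W)) * 5.0         # stand-in depth map
I = synthesize_hazy(J, depth)
```

Applied to real NYU/Make3D image-depth pairs, this loop would produce the 3000-image training set the section describes.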

Implementation
In the network training stage, the convolution kernels of each layer are initialized from a Gaussian distribution, the biases are initialized to 0, and the learning rate is initialized to 0.0001. The parameters of the loss function are α = 1, β = 150, γ = 150, λ = 10^−5. The model is trained in small batches, and
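The stated initialization can be sketched as follows; the standard deviation is an assumption, since the text only says the kernels follow a Gaussian distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_conv_layer(k, c_in, c_out, std=0.02):
    """Gaussian-initialized kernel with zero-initialized bias.
    std=0.02 is an assumed value; the paper does not specify it."""
    weight = rng.normal(0.0, std, size=(c_out, c_in, k, k))
    bias = np.zeros(c_out)
    return weight, bias

learning_rate = 1e-4                # initial learning rate from the paper
w, b = init_conv_layer(4, 64, 64)   # e.g. a 4x4, 64->64 discriminator layer
```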

Analysis of experimental results on synthetic data sets
In order to verify the dehazing effect of the proposed algorithm, this paper compares the proposed model with the existing classical algorithms in the synthetic data test set.
In the synthetic dataset, the clear image corresponding to each haze image is known, so the performance of the model can be evaluated both qualitatively and quantitatively. Figure 4 shows the dehazing results for two typical samples in the test set. In the first image, the background is bright, and the difficulty is to remove the haze while preserving the details of the scene. It is not difficult to find that the DCP algorithm over-dehazes, resulting in color distortion, while the learning-based algorithms still suffer from insufficient dehazing and loss of image detail. The background of the second image is dim, so restoring image details is undoubtedly a challenge. As can be seen from the figure, the traditional DCP result is again over-dehazed, and the learning-based methods DehazeNet, MSCNN, and AOD-Net under-dehaze compared with the proposed method. The results show that the proposed algorithm has strong applicability and better dehazing results.
In order to evaluate the algorithms objectively, this paper uses the peak signal-to-noise ratio (PSNR) (Helmrich et al. 2020) and the structural similarity index (SSIM) (Jin 2020) as objective evaluation indexes. PSNR reflects the integrity of image structure information; the higher the PSNR, the less the image is affected by noise. PSNR is defined as:

PSNR = 10 · log₁₀(MAX² / MSE)

where MAX is the maximum possible pixel value and MSE is the mean squared error between the two images. SSIM reflects the similarity of image structure information; the larger the value, the closer the images and the smaller the distortion. SSIM is defined as:

SSIM(x, y) = ((2μ_x μ_y + c₁)(2σ_xy + c₂)) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂))

where μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, and σ_xy is the covariance of x and y. c₁ = (k₁L)² and c₂ = (k₂L)² are constants used to maintain stability, and L is the dynamic range of the pixel values.
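Both metrics are simple to implement. The sketch below computes PSNR and a global single-window SSIM (standard practice averages SSIM over local windows, which is omitted here for brevity; k₁ = 0.01, k₂ = 0.03 are the usual constants):

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """PSNR = 10 * log10(MAX^2 / MSE); higher means less distortion."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, k1=0.01, k2=0.03, L=1.0):
    """Global (single-window) SSIM over the whole image."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    return (((2 * mu_x * mu_y + c1) * (2 * cov + c2))
            / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

rng = np.random.default_rng(3)
clean = rng.random((32, 32))
noisy = np.clip(clean + rng.normal(0, 0.05, clean.shape), 0, 1)
```

A uniform offset of 0.1 on pixels in [0, 1] gives an MSE of 0.01 and hence a PSNR of 20 dB, a handy sanity check.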
The PSNR and SSIM index results of the five algorithms after dehazing are shown in Table 1. Compared with other algorithms, the SSIM of proposed algorithm is higher, which shows that the algorithm has better restoration effect in the image edge details. And this algorithm has obvious advantages in PSNR index, which shows that the dehazing image of this algorithm is less affected by noise, the image is clearer, and the dehazing quality is significantly improved.

Analysis of experimental results on real images
The proposed model is trained on the synthetic haze image dataset, and its haze removal effect on the synthetic test set is ideal. Because there is still a gap in visual perception between real scenes and synthetic haze images, the proposed method is compared with DCP, DehazeNet, MSCNN, and AODNet on natural scene images to verify its generalization ability. We selected five difficult dehazing images, and the results are shown in Fig. 5. These five haze images are challenging: they either contain complex objects or large sky and white areas. The proposed approach handles these challenges well. Although the DCP method can eliminate some haze, it produces artifacts and color distortion, as in the second image. Due to inaccurate transmittance estimation, DehazeNet and MSCNN still leave artifacts and haze residue, as in the third and fourth images. The AOD-Net method can generate a clear image, as shown in the fifth image, but compared with the image produced by the proposed algorithm, object details are less clear, the colors are dark, and the sky region shows slight color distortion. Compared with the other dehazing methods (DCP, DehazeNet, MSCNN, and AOD-Net), the advantages of the proposed method are that the detail information of the dehazed image is preserved completely, the color recovery is more natural, and the degree of dehazing is moderate. Therefore, the proposed algorithm outperforms the comparison algorithms on natural haze images and has strong applicability.

Discussion
Based on the analysis of the above experiments, we have some discoveries.
- The guidance module effectively checks the generator network, which makes the details of the final dehazed image clearer.
- The reasonable loss function design enables the proposed method to achieve better results on challenging dehazing images, such as haze images of rivers and sky areas.
In summary, the experiments prove the key role of proposed two innovations in image dehazing.

Conclusions
Fig. 5 Qualitative comparisons with state-of-the-art dehazing methods for hazy images on real data sets (columns: Haze Image, DCP, DehazeNet, MSCNN, AODNet, Proposed)

This paper presents a guided generative adversarial dehazing network (GGADN). In GGADN, the generator and discriminator architectures are modified. The network is trained end-to-end on a synthetic dataset. The loss function is modified using pre-trained VGG features and an L1-regularized gradient. A sigmoid function is introduced in the last layer of the discriminator for feature mapping. The dehazing results on synthetic and real images show that the proposed method outperforms state-of-the-art dehazing methods, and the dehazed image is clearer in detail. The proposed dehazing network makes two contributions. Firstly, because the feedback of the discriminator model is incorporated, the probability of generator degradation is greatly reduced; the new loss function therefore enables the proposed network to make progress during training. Secondly, in order to improve the detail quality of the dehazed image, we add a guide module to the generator. The guidance module corrects the details of each layer of the generator. The guide maps the high-level features of the data distribution to low-level representations of the data and provides feature mappings to filter the data and parameters by processing the image training set.
However, it is very difficult to collect a haze image and a haze-free image of the same natural scene. Therefore, the dehazing training set is composed of haze-free images and synthetic haze images. Synthetic haze is not real haze, and this affects the training of haze removal; this is the first limitation of this paper. The discriminator model uses a conventional discriminator network structure, which can be biased when judging special images, such as predominantly white or dim images; this is the second limitation of this paper.
In the future, we will explore at least two directions based on the proposed method. The first is to train the dehazing network without reference clear images (Wang et al. 2021). As mentioned above, the training set currently uses pairs of synthetic images; because a synthetic image is not a real image, it may mislead the training (Huang et al. 2021; Caviglione et al. 2020). The second is to explore video dehazing (Sekh 2020; Rani et al. 2021). At present, the proposed method achieves good results in single-image dehazing, but for real-time processing we need to continue to improve the network for video dehazing (Preethaa and Sabari 2020; Ahmad and Peters 2020).