Single image dehazing using frequency-guided filtering and progressive physics learning

Abstract. Current learning-based dehazing methods rely heavily on paired synthetic datasets and physical models, which can hardly describe the complicated degradation encountered in practical applications. These methods still struggle to achieve haze removal, color fidelity, and detail restoration simultaneously, and they ignore the differences in frequency characteristics and the importance of prior knowledge. To address these problems, we propose an unpaired stage-wise framework integrating frequency-guided filtering and progressive physics learning into an adversarial dehazing network, called FPD-Net. Specifically, a guided filter based on frequency information is employed to decompose the high- and low-frequency components for better feature extraction. We further merge prior and physical knowledge to form progressive physics learning, which produces pleasing haze-free outputs with high visibility and realism. For better atmospheric light estimation, a variational auto-encoder and Kullback–Leibler loss are included to represent the illumination information. Extensive experiments on both synthetic and real datasets show that the designed FPD-Net outperforms the comparison dehazing models both visually and quantitatively.


Introduction
Images captured under hazy conditions suffer from blurring, color distortion, and other degradation problems. When hazy images are taken directly as input, advanced computer vision tasks may be severely affected by the low quality, resulting in poor performance. As a vital prerequisite, single image haze removal is therefore an important issue that needs to be studied extensively.
Generally, the atmospheric scattering model (ASM) for the hazing process is described as

I(x) = J(x)t(x) + A(1 − t(x)),  (1)

where I(x) and J(x) refer to the hazy image and the corresponding clear one, respectively, and x is the pixel location. A and t(x) denote the global atmospheric light and the medium transmission map, respectively. Image dehazing attempts to recover the dehazed image J(x) from the hazy image I(x), which is considered an ill-posed problem.
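As a concrete illustration, the forward model of Eq. (1) can be sketched in a few lines of NumPy (a toy example with synthetic values; not part of FPD-Net itself):

```python
import numpy as np

def apply_scattering_model(J, t, A):
    """Compose a hazy image via I(x) = J(x) * t(x) + A * (1 - t(x))."""
    # t is per-pixel (H, W); add a channel axis so it broadcasts over RGB.
    return J * t[..., None] + A * (1.0 - t[..., None])

# Synthetic example: a 2x2 RGB "clear" scene, uniform transmission, gray airlight.
J = np.zeros((2, 2, 3))          # black scene radiance
t = np.full((2, 2), 0.5)         # medium transmission
A = np.array([0.8, 0.8, 0.8])    # global atmospheric light
I = apply_scattering_model(J, t, A)
# Each pixel becomes A * (1 - t) = 0.4 in every channel.
```

Dehazing inverts this map: given only I, the three unknowns J, t, and A must all be estimated, which is why the problem is ill-posed.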
To solve this problem, conventional approaches employ various hand-crafted priors to distinguish haze from clear images. Earlier dehazing studies tried to increase image quality through image enhancement and parameter estimation; these include Retinex processing, 1 homomorphic filtering, 2 dark channel prior (DCP), 3 color attenuation prior, 4 non-local prior (NLP), 5 and so on. These methods are efficient in visibility restoration; however, they often lead to abnormal color shifts and halos in various scenarios.
Recently, data-driven methods have employed learning-based models to investigate richer cues that can hardly be found using only empirical knowledge. [10][11][12][13][14][15] Cai et al. 8 tried to estimate the transmission map and then used nonlinear regression to restore the haze-free image. EPDN 15 treats the dehazing task as an image-to-image translation problem and uses an embedded GAN and enhancer to produce perceptually satisfactory results. Unpaired methods have also been explored. [17][18][19] CycleGAN 16 employs unpaired, cycle-consistent translation to adapt the domain between hazy and haze-free images. Semi-dehaze 17 combines supervised and unsupervised training branches jointly for dehazing purposes. As a weakly supervised dehazing method, RefineDNet 18 applies two-stage adversarial learning to refine dehazed results following a physical-prior stage. Despite this progress, existing unpaired dehazing methods suffer from the under-constrained problem and fail to restore hazy images well, especially real ones. Previous approaches ignored frequency cues and did not effectively use the different features at high and low frequencies, and when physical models are used alone, the parameter estimation lacks effective constraints.
To address these issues, we propose a stage-wise dehazing framework named FPD-Net, which removes haze properly and restores structural details in an unpaired manner. Using frequency-guided filtering, frequency-aware image decomposition is performed before the feature extraction module. Our model thus generates high- and low-frequency feature maps, and these features are continuously refined during the subsequent training process. Next, we design a progressive physics learning stage to efficiently process the features from the low-frequency maps. In detail, DCP is applied to produce preliminary results for better visibility restoration, and three subnets are then designed to estimate the transmission map (T), atmospheric light (A), and dehazed image (J) of the physical model. To improve the atmospheric light estimation, we embed a variational auto-encoder and Kullback–Leibler (KL) loss to constrain the training process. The contributions can be summarized as follows.
1. We propose a stage-wise unpaired dehazing model, FPD-Net, which first adopts a frequency-guided filtering stage to separate the frequency information for better feature extraction. This stage aids in understanding the hazy scenes and improves the subsequent processing steps. We then apply the progressive physics learning stage to improve the parameter estimation of the ASM. By gradually refining the physics understanding, we achieve accurate and reliable parameter estimation for hazy scenes.

2. A novel progressive physics learning strategy is designed to integrate the advantages of prior- and learning-based haze removal methods. This strategy leverages a variational auto-encoder and KL loss to improve the atmospheric light estimation. By integrating these techniques, our model generates visually pleasing dehazed results with enhanced visibility.

Proposed Method
In this section, we present the designed FPD-Net, a stage-wise model that deals with the dehazing task in an unpaired way. As shown in Fig. 1, the proposed FPD-Net decomposes the input image into high- and low-frequency components and then reformulates the low-frequency parts with a progressive physics learning strategy. For better physical learning, we further constrain the atmospheric light using a variational auto-encoder and KL loss. The details of our proposed method are explained in the following subsections.

Frequency-Guided Filtering Stage
In general, hazy images contain numerous high- and low-frequency features. The high-frequency features reflect structures and edges, whereas the low-frequency features represent the color and illumination distributions. Estimating the high-frequency features directly is difficult because haze accumulates strongly in the lower frequency range. This entanglement also makes it harder to calculate the haze parameters J, A, and T. Therefore, it is necessary to decompose the high and low frequencies and handle the dehazing problem according to the specific frequency range. Inspired by Ref. 20, a trainable guided filter is used in our frequency-guided filtering module. Our filtering process decomposes the frequency information using a differentiable CNN layer and residue channel information. We smooth the raw input image I, and the smoothed output serves as the low-frequency component I_L. The residual part, calculated as I_H = I − I_L, contains abundant high-frequency information about the background details. In each part, Eq. (1) becomes

I_H(x) = J_H(x)t(x) + A_H(1 − t(x)),
I_L(x) = J_L(x)t(x) + A_L(1 − t(x)),  (2)

where (·)_H and (·)_L represent the high-frequency component and the low-frequency component, respectively. Because the high frequencies carry mostly texture information, the frequency variation is not significant for color. Assuming that the atmospheric light A is constant, we take A_H = 0, and the cross effects of A_H on I_L and A_L on I_H are relatively small. Therefore, the previous equation becomes

I_H(x) = J_H(x)t(x),
I_L(x) = J_L(x)t(x) + A_L(1 − t(x)).  (3)

As the haze and atmospheric light are approximately constant, the low-frequency components represent the major features of haze, depth, and illumination. For better frequency decomposition, we employ the residual image 21 as a reference image to guide the filtering in the low-pass smoothing process. This guided filtering has a spatially varying low-frequency passband, which greatly facilitates the learning of haze-related information. The residual image is expressed as

I_res(x) = max_{c∈{r,g,b}} I_c(x) − min_{d∈{r,g,b}} I_d(x),  (4)

where I_c and I_d are the color channels of image I. This residual channel is a transformed version of the background with extremely low haze. Thus, it provides information to guide and adapt the passband in the low-frequency smoothing, and it helps to reduce the loss of background detail in the hazy image. In other words, this strategy makes the extracted high-frequency components contain only background texture information, whereas the haze features remain in the low-frequency components. To deal with the large variation of the background, frequency-guided filtering applies several smoothing kernels with sizes k = 2^i, i = 0, 1, to handle different frequencies. The filter outputs of each channel are concatenated and then sent to a 1×1 convolution for further channel feature selection and parameter estimation. The 1×1 convolution is essentially equivalent to a cross-channel parametric pooling layer at a certain level. It realizes cross-channel interaction and information fusion and thus enhances the expressive capability of the network. Therefore, the 1×1 convolution learns linear combinations of the input channels, which helps to capture more complex and informative features from the previous steps.
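The decomposition described above can be sketched as follows. This is a simplified NumPy illustration: a plain box filter stands in for the trainable guided filter of the actual module, and the kernel size is a placeholder:

```python
import numpy as np

def box_smooth(img, k=3):
    """Simple box-filter low-pass as a stand-in for the trainable guided filter."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape[:2]
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean(axis=(0, 1))
    return out

def decompose(I, k=3):
    """Split I into a low-frequency (smoothed) and a high-frequency (residual) part."""
    I_L = box_smooth(I, k)
    I_H = I - I_L                       # I_H = I - I_L carries edges and texture
    return I_L, I_H

def residual_channel(I):
    """I_res(x) = max_c I_c(x) - min_d I_d(x), the guide for the low-pass filtering."""
    return I.max(axis=2) - I.min(axis=2)
```

By construction I_L + I_H reproduces the input exactly, and the residual channel vanishes on achromatic (haze-dominated) regions, which is what makes it a useful guide.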

Progressive Physics Learning Stage
Due to the lack of paired training samples, learning-based methods have limited abilities on real dehazing tasks. A progressive physics learning strategy is therefore exploited to cascade the physical prior operations with the reformulated scattering-model-based dehazing procedure. DCP, proposed by He et al., 3 points out that in a haze-free image each pixel has one or more color channels whose values are close to zero. It is one of the most effective physical prior models for image dehazing. In the first stage, progressive physics learning utilizes DCP to perform the initial dehazing, but DCP alone is not enough, as it introduces considerable color shift and haze residue.
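The DCP step used for the initial dehazing can be sketched as below. This is a NumPy illustration following He et al.'s formulation; the patch size and omega are the conventional values from that prior, not parameters taken from this paper:

```python
import numpy as np

def dark_channel(I, patch=15):
    """Dark channel: per-pixel channel minimum, then a local minimum filter."""
    min_c = I.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_c, pad, mode="edge")
    h, w = min_c.shape
    out = np.empty_like(min_c)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(I, A, omega=0.95, patch=15):
    """t(x) = 1 - omega * dark_channel(I / A), following the DCP assumption."""
    return 1.0 - omega * dark_channel(I / A, patch)
```

On a haze-free region the dark channel is near zero, so the estimated transmission approaches 1; dense haze raises the dark channel and drives t toward 1 − omega.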
Referring to the ASM, we then use prior knowledge to acquire the atmospheric light (A), transmission map (T), and clear image (J) in the second stage. As A(x) is an independent factor, the global atmospheric light can be treated as a latent sample from a Gaussian distribution in the hazy image. Therefore, the estimation of A(x) can be considered a variational inference problem. Specifically, we introduce a variational auto-encoder to form the A-Net, which has three main modules: encoder, intermediate space, and decoder. The encoder-decoder process is a symmetric U-Net structure 22 that includes four downsampling and four upsampling convolution layers. To learn the latent Gaussian model, the intermediate block transforms the encoder output into a latent Gaussian distribution by minimizing the KL loss. Through the reparameterization of the vector z, the latent code is reconstructed by resampling from the Gaussian distribution. The reparameterized latent variable z is passed to the variational decoder to obtain the reconstruction of the disentangled atmospheric light A(x). The corresponding objective is defined as

L_KL = KL( q_φ(z) ‖ p_θ(z) ),  (5)

where q_φ(z) is the actual feature distribution and p_θ(z) is the target distribution, which is the unit Gaussian distribution.
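The reparameterization step of the A-Net can be illustrated as follows (a NumPy sketch of the generic trick, independent of the actual network weights):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    The reparameterization keeps the sampling step differentiable, so the
    encoder that predicts (mu, logvar) can be trained end to end.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps
```

As the predicted variance shrinks toward zero, the sample collapses onto the mean, so the decoder receives an almost deterministic code for the atmospheric light.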
Inspired by the work of Ref. 18, a similar auto-encoder network is applied to estimate the transmission map and clear images. In addition, the high-frequency information is involved to produce a more realistic reconstructed image.

Objective Function
A hybrid loss function is proposed to reach the clear image domain; it includes the adversarial, reconstruction, and KL losses.
Adversarial loss: to generate better haze-free images, we introduce a multiscale adversarial discriminator to train the intermediate output J. Specifically, the multiscale discriminator D is trained to detect whether an image is "real" or "fake." The adversarial loss is described as

L_adv = E_{J~clear}[log D(J)] + E_{I~hazy}[log(1 − D(G(I)))],  (6)

where G(I) denotes the dehazed output generated from the hazy input I. The reconstruction loss is employed to evaluate the difference between the input hazy image I(x) and the corresponding reconstructed hazy image Î(x), and it is formulated as

L_rec = ‖ I(x) − Î(x) ‖_p,  (7)

where ‖·‖_p denotes the Frobenius norm of the given data matrix. Given a hazy image I(x), Î(x) is computed by Eq. (1) using the outputs of the three sub-networks.
The KL loss is proposed to reduce the mismatch between the latent variables z ∈ N(μ_z, σ_z²) and the standard Gaussian distribution N(0, 1). Using standard stochastic gradient methods, the reparameterized optimization can be utilized to achieve a lower-bound estimation, 23 mathematically

L_KL = Σ_i KL( N(μ_{z_i}, σ_{z_i}²) ‖ N(0, 1) ),  (8)

where z_i is the i'th dimension of z and KL(·) represents the KL divergence between two distributions.
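The three loss terms can be sketched as below. This is a NumPy illustration; the weights w_rec, w_kl, and w_adv are hypothetical placeholders (the paper does not give the weighting here), and the adversarial term is passed in as a precomputed scalar:

```python
import numpy as np

def reconstruction_loss(I, I_hat):
    """Frobenius-norm distance between the hazy input and its reconstruction."""
    return np.linalg.norm(I - I_hat)

def kl_loss(mu, logvar):
    """Closed-form sum over dimensions of KL( N(mu_i, sigma_i^2) || N(0, 1) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def hybrid_loss(I, I_hat, mu, logvar, adv_term,
                w_rec=1.0, w_kl=0.1, w_adv=1.0):
    """Weighted sum of the three terms; the weights are placeholders."""
    return (w_adv * adv_term
            + w_rec * reconstruction_loss(I, I_hat)
            + w_kl * kl_loss(mu, logvar))
```

The KL term is zero exactly when the latent code matches the unit Gaussian (mu = 0, log-variance = 0), which is the constraint Eq. (8) imposes on the A-Net.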
Experimental Results

Datasets setup
We evaluate the proposed FPD-Net on three datasets: the RESIDE dataset, 24 the Foggy Cityscapes dataset, 25 and a real hazy dataset. The RESIDE-unpaired 18 dataset is chosen for training and includes 3577 clear images and 2903 hazy images. For testing, we evaluate the dehazing results on 500 indoor images of the synthetic objective testing set (SOTS) and 10 synthetic hazy images of the hybrid subjective testing set (HSTS). The Foggy Cityscapes dataset 25 contains 5000 images of urban street scenes at 1024 × 2048 pixels each, with 2975 images in the training set, 500 in the validation set, and 1525 in the test set. Additionally, we select 50 real-world hazy images collected via Google search for practical testing.

Experimental setup
The overall structure and parameter settings of FPD-Net are shown in Fig. 1, and all experiments are implemented in PyTorch on a server with an NVIDIA Tesla V100 GPU. For training, the Adam optimizer 26 is applied with an initial learning rate of 0.0002, and the batch size is set to 8. As in a typical GAN, the generator and discriminator are updated alternately. FPD-Net is trained for a total of 200 epochs, and the learning rate is halved every 40 epochs. For better practical use, the whole framework is trained without paired hazy datasets.
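Under one common reading of this schedule (step decay from the initial value at fixed 40-epoch intervals; an assumption, since the exact decay rule is not spelled out in the text), the learning rate at a given epoch can be computed as:

```python
def learning_rate(epoch, base_lr=2e-4, step=40):
    """Halve the base learning rate every `step` epochs (200 epochs total)."""
    return base_lr * 0.5 ** (epoch // step)
```

For example, epochs 0-39 train at 2e-4, epochs 40-79 at 1e-4, and the final epochs at 1.25e-5; in PyTorch this corresponds to a step scheduler with step_size=40 and gamma=0.5.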

Comparison of Dehazing Performance
We compare FPD-Net with a prior-based model (DCP 3 ), two supervised learning-based networks (DehazeNet 8 and EPDN 15 ), and three unpaired methods (CycleGAN, 16 Semi-Dehaze, 17 and RefineDNet 18 ). For a fair comparison, we employ the same training sets, and the comparison algorithms are trained according to their own source implementations. For the synthetic testing data, two typical quality metrics, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), 27 are applied to evaluate the dehazing performance. We also adopt the no-reference metric natural image quality evaluator (NIQE) 28 for quantitative comparison under real-world conditions without ground truth.
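For reference, PSNR can be computed as below (a standard NumPy implementation of the metric, not code from this paper; SSIM is more involved and is typically taken from a library such as scikit-image):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Higher PSNR means the dehazed output is numerically closer to the ground-truth clear image; NIQE, by contrast, needs no reference, which is why it is used on the real-world set.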

Results on the synthetic datasets
As an unpaired model, our designed model achieves the best results on both PSNR and SSIM in Table 1, which means that it obtains very competitive results in both indoor and outdoor conditions. The improved scores reflect that our model can properly remove haze and maintain details across various scenarios. For a further visual comparison, we present a few challenging samples with a dark environment or sky scene, as shown in Figs. 3(a) and 3(b). As displayed, DCP and CycleGAN suffer from severe color distortion, especially in the sky or low-light conditions. The other learning-based methods can deal with haze removal to a certain extent; however, some hazy residuals remain, as can be seen by zooming into the color boxes. Compared against the ground truth, our FPD-Net outperforms the comparison models in handling haze at long distances and in large sky regions, and it preserves structures well even in dark surroundings. SSIM imitates the theory of structural similarity in the human visual system and is sensitive to local structural changes in images. SSIM quantifies the attributes of an image based on brightness, contrast, and structure, so the higher SSIM of our method indicates that the restored images are superior in these attributes. These results show that the frequency-guided filtering module helps the model restore detail, and the progressive physics learning module increases the ability to deal with complex lighting conditions.

Results on the unpaired real-world datasets
To verify practical applicability, we perform fair experiments against the other methods on the real-world hazy data from Google search. Table 2 and Fig. 3(c) show the competing dehazing results quantitatively and visually. As the table shows, our designed method achieves a lower NIQE value, which represents better dehazed outputs with higher fidelity and naturalness.
According to the pictures, DCP, EPDN, and CycleGAN produce unusual color shifts. Semi-dehaze and RefineDNet produce several artifacts that seriously blur the dehazed images. Even without ground truth for reference, obvious haze residual can be observed in all of the comparison models. By contrast, our FPD-Net conducts the haze removal task while restoring the color and details to a natural level. This benefits from the frequency-guided filtering strategy and the constraints on the calculation of the physical model.

Ablation Studies
We perform ablation studies to analyze the effectiveness of the main components and the impact of the loss functions. For fairness, the training settings and datasets are kept the same.

Network architecture
We study the structural strategies of the frequency-guided filtering (F) and the variational auto-encoder for atmospheric light (V) to evaluate their effects on feature extraction and light estimation. The studies are constructed with different module combinations: (1) with only F, (2) with only V, and (3) with both F and V. The visual and quantitative results are presented in Fig. 4, with the NIQE values given under the images. The network with only the F module leaves too much haze and fails to restore detail, whereas the one with only V produces some unusual artifacts. Our model achieves better results with both F and V than with either of the other two configurations; each strategy makes its own contribution to the dehazing task.

Effect of loss function
We further analyze the effect of the loss functions. Three combinations of loss functions are shown in Table 3. We find that performance drops sharply without the adversarial loss and the reconstruction loss. As mentioned above, the KL loss helps the network obtain a better light estimation. Therefore, the hybrid loss with all three parts achieves the best value and benefits the dehazing process; the losses complement each other to obtain better dehazing performance.

Application
To demonstrate the effectiveness of our FPD-Net model, we employ the Google Vision online tool to evaluate our dehazed results on outdoor vision applications such as object detection and recognition. As illustrated in Fig. 5, more vehicles are detected and the recognition accuracy is increased after our dehazing process, indicating that FPD-Net is useful for high-level tasks. Thus, our unpaired dehazing method can improve the scalability of practical applications.

Conclusion
We proposed a stage-wise FPD-Net for unpaired dehazing tasks. In the first stage, frequency-guided filtering was introduced to decompose the low- and high-frequency components for better joint feature extraction. In the second stage, we integrated the DCP and the physical model to form progressive physics learning, along with a variational auto-encoder and KL loss to deal with complex light environments. Comprehensive experimental evaluations showed that our FPD-Net exhibits superior performance in handling both synthetic and real-world hazy conditions. Quantitative analyses revealed significant improvements in metrics such as PSNR and SSIM compared with state-of-the-art approaches. These findings validate the effectiveness of our proposed method for unpaired dehazing tasks.


Fig. 1
Fig. 1 Overall structure of the proposed FPD-Net network for unpaired single image dehazing.

Fig. 3
Fig. 3 Visual comparison of the qualitative evaluation on synthetic and real-world examples, including (a) Foggy Cityscapes, (b) HSTS for synthetic, and (c) HSTS for real-world. Zooming into the samples provides a proper view of the dehazing capability.

Fig. 4
Fig. 4 Visualization results of different structure strategies with NIQE values under the pictures. (a) Hazy image/3.6631, (b) with F only/3.5980, (c) with V only/3.3522, and (d) with F and V/3.2469.

Fig. 5
Fig. 5 Vehicle recognition results tested with the Google Vision API for the input hazy image and our dehazed result. (a) Dehazed image detection result and (b) hazy image detection result.

Table 1
Quantitative comparison of the methods. The PSNR and SSIM are the average results on three hazy datasets. Bold values represent the best results among the dehazing methods.

Table 2
Quantitative comparison on the real dataset.The bold ones represent the best results.

Table 3
Ablation studies on the loss functions using the SOTS-indoor dataset.