Color Correction and Repair of Haze Images under Hue-saturation-intensity Color Space and Machine Learning

To improve the visual effect and quality of haze images after fog removal, a model for color correction and repair of haze images in the hue-saturation-intensity (HSI) color space combined with machine learning is proposed. First, a haze image imaging model is constructed according to atmospheric scattering theory. Second, a color enhancement and fog removal model for haze images based on the HSI color space is proposed, and a haze image-transmittance gallery is constructed. Third, a visual dictionary of transmittance maps is obtained by training a k-means clustering algorithm optimized with density parameters and a support vector machine algorithm optimized with a genetic algorithm. Fourth, based on the visual dictionary and the atmospheric scattering model, the haze image is repaired and defogged, and the subjective visual effects and objective evaluation indexes of color enhancement and fog removal are compared. The results show that the algorithm can effectively preserve the detail and clarity of the image after defogging.


Introduction
Severe weather such as fog, haze, and sandstorms often occurs in North China [1]. In these inclement weather conditions, suspended particles in the atmosphere reduce the contrast and clarity of images collected outdoors and may also cause color shifts [2]. The loss of image details may cause failures in road monitoring systems, military reconnaissance systems, and satellite monitoring systems, which in turn affect people's lives, social production, and national defense security [3].
In order to solve this problem, experts and scholars in relevant fields at home and abroad have successively proposed degraded image enhancement algorithms to improve the quality of hazy or sandstorm images. Dong et al. [4] proposed a defogging algorithm based on an improved multi-scale Retinex algorithm, which has a better defogging effect than adaptive histogram equalization and repairs the image degradation problem. To address uneven fog density and color imbalance in haze weather, Mei et al. [5] proposed a multi-layer fusion-based haze image enhancement algorithm, estimated transmission images based on the dark channel prior theory, and verified the model qualitatively and quantitatively. Wen et al. [6] applied the Kuwahara filter to the transmittance refinement process and proved that it can improve the speed of haze image defogging. Chaudhry et al. [7] proposed an image defogging algorithm combining median filtering, accelerated local Laplacian filtering, and image decomposition based on L0-restricted gradients, and verified it qualitatively and quantitatively on outdoor RGB images and remote sensing images.
At present, most research on image defogging is based on the dark channel prior, and few studies apply machine learning to image defogging. Traditional prior-based defogging methods have disadvantages: they struggle to capture some features of haze images, and their defogging effect is limited. For example, during defogging they are susceptible to interference from bright areas and white objects, resulting in image distortion. Existing haze image algorithms are all based on the atmospheric scattering model, in which the transmittance and atmospheric light are the key variables affecting the defogging result; however, these two variables cannot be obtained directly from a single haze image. Therefore, accurate estimation of the transmittance and atmospheric light of the haze image is the key to improving the dehazing effect. The dark channel prior algorithm obtains the transmittance and atmospheric light by comparing the original image and the haze image, which imposes strong limitations. To solve this problem, a machine learning algorithm is proposed to capture the features of haze images and estimate their transmittance more accurately. This article first uses the HSI color space method to construct the haze image-transmittance gallery [8]; second, the machine learning algorithm is built from the k-means clustering algorithm optimized by the density parameter method and the GA-SVM algorithm, a support vector machine improved by a genetic algorithm [9][10][11]. The machine learning algorithm is used to train the haze transmittance image library to obtain visual words with different characteristics, and these visual words are used to construct a visual dictionary.
Then, when defogging a haze image, the image is first analyzed, and the best transmittance map is selected according to its characteristics; finally, combined with the atmospheric scattering model, the haze image is defogged. The experimental section verifies the effectiveness of the algorithm from both subjective and objective aspects.

Haze image imaging model
In haze weather, there are more suspended particles in the atmosphere than usual, and these particles generate Mie scattering during natural light propagation [12]. During image acquisition, light from the sky and ground scattered into the lens forms a haze image [13]. Therefore, based on the principle of Mie scattering, a model describing the degradation process of the haze image is of great significance for the clear processing of haze images. The mathematical expression of the haze image imaging process is:

F(x) = Z(x) t(x) + A (1 − t(x)) (1)

In the above formula, x represents the position of a pixel in the image, F is the input haze image, Z is the fog-free image, A is the atmospheric light value, and t is the transmittance. In the haze imaging model, t and A are unknown quantities.
To get a clear image Z, we need to estimate the value of image transmittance t and the value of atmospheric light A.
When the atmosphere is homogeneous, the formula for calculating the transmittance t is:

t(x) = e^(−β d(x)) (2)

In the above formula, β is the scattering coefficient of the atmosphere, and d is the distance between the target object and the lens.
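As an illustration, the imaging model (formula (1)) and the exponential transmittance formula can be used to synthesize a hazy image from a clear one. This is a minimal NumPy sketch; the image values, depth map scale, and atmospheric light A used here are illustrative assumptions (values normalized to [0, 1]):

```python
import numpy as np

def transmittance(depth, beta=1.0):
    """t(x) = exp(-beta * d(x)) under a homogeneous atmosphere."""
    return np.exp(-beta * depth)

def synthesize_haze(clear, depth, A=0.9, beta=1.0):
    """Apply the scattering model F = Z*t + A*(1 - t) per pixel."""
    t = transmittance(depth, beta)[..., None]  # broadcast over RGB channels
    return clear * t + A * (1.0 - t)
```

At zero depth the output equals the clear image; at large depth it converges to the atmospheric light A, matching the intuition that distant scenery washes out toward the sky color.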

Refinement of transmittance based on guided filter
It can be seen from formula (1) that, in fog and haze, the collected image combines two components [14]: Z(x) t(x) is the direct light attenuation term, and A(1 − t(x)) is the atmospheric light term. However, the formula contains three unknown parameters and is an ill-posed equation, so a prior condition must be introduced into the calculation, such as the dark channel prior.
The mathematical expression of the dark channel is:

Z^dark(x) = min_{y∈Ω(x)} min_c (Z^c(y)) → 0 (3)

In the above formula, Ω(x) is the local window centered at point x, and Z^c is a color channel of the image Z in RGB color space.
According to the dark channel prior theory, dividing formula (1) by A^c and taking the minimum over the window and the color channels gives:

min_{y∈Ω(x)} min_c (F^c(y)/A^c) = t(x) min_{y∈Ω(x)} min_c (Z^c(y)/A^c) + (1 − t(x)) (4)

Since A^c is always a positive number and the dark channel of the fog-free image tends to 0, the coarse transmittance is:

t̃(x) = 1 − min_{y∈Ω(x)} min_c (F^c(y)/A^c) (5)

In real life, when viewing distant objects, a small amount of smog remains visible. Therefore, to keep a small amount of fog in the distance, a constant ω ∈ (0, 1] is substituted into formula (5):

t̃(x) = 1 − ω min_{y∈Ω(x)} min_c (F^c(y)/A^c) (6)

In the above equation, t̃(x) is the coarse transmittance, and ω is set to 0.95.
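The dark channel and the coarse transmittance t̃(x) described above can be sketched as follows; the patch size and the example values are illustrative assumptions:

```python
import numpy as np

def dark_channel(img, patch=3):
    """Min over RGB channels, then min over a patch window Omega(x)."""
    chan_min = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(chan_min, pad, mode='edge')
    out = np.full_like(chan_min, np.inf)
    H, W = chan_min.shape
    for dy in range(patch):          # sliding-window minimum via offsets
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + H, dx:dx + W])
    return out

def coarse_transmittance(haze, A, omega=0.95, patch=3):
    """t~(x) = 1 - omega * min_y min_c (F^c(y) / A^c)."""
    return 1.0 - omega * dark_channel(haze / A, patch)
```

For a uniform image equal to the atmospheric light, the normalized dark channel is 1 everywhere, so the coarse transmittance is 1 − ω = 0.05, i.e., the region is treated as dense haze.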
To solve the halo effect that occurs when the coarse transmittance is used for image defogging, a guided filter is introduced for image processing. The formula of the guided filter is:

q_i = a_k F_i + b_k, ∀i ∈ w_k (7)

In the above formula, q is the output image, F is the guide map, w_k is the local window, and a_k and b_k are the linear constant coefficients of w_k. To ensure that the error between the output of the function and the actual values is minimal, the coefficients a_k and b_k can be obtained by the least squares method:

a_k = ( (1/|w|) Σ_{i∈w_k} F_i p_i − µ_k p̄_k ) / (σ_k² + ε), b_k = p̄_k − a_k µ_k (8)

In the above formula, p is the input image, µ_k is the mean of F in w_k, σ_k² is the variance of F in w_k, |w| is the total number of pixels in w_k, p̄_k is the mean of p in w_k, and ε is a regularization parameter. Then formula (7) can be updated to:

q_i = ā_i F_i + b̄_i

where ā_i and b̄_i are the averages of a_k and b_k over all windows covering pixel i.

Image enhancement and defogging based on HSI color space

When the dark channel prior method performs haze image defogging, there will be deviations in areas where the dark channel is not 0, resulting in distortion of the original image, and the soft matting algorithm has high time and space complexity. Therefore, the HSI color space method is proposed for image enhancement and defogging of haze images. First, the HSI color space is used to roughly estimate the transmittance and atmospheric light of the haze image; second, a guided filter is used to refine the transmittance; then the color mapping equation is used to adjust the image; finally, a clear image is obtained. Experiments show that the HSI haze image enhancement algorithm has a better dehazing effect.
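The guided filter introduced above can be sketched in NumPy as follows; the window radius r and regularization ε are free parameters, and their values here are illustrative:

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window, edge-padded."""
    padded = np.pad(img, r, mode='edge')
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    k = 2 * r + 1
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (k * k)

def guided_filter(F, p, r=2, eps=1e-3):
    """q = a*F + b with per-window least-squares coefficients a_k, b_k."""
    mu = box_mean(F, r)                    # mean of guide F in each window
    mu_p = box_mean(p, r)                  # mean of input p in each window
    var = box_mean(F * F, r) - mu * mu     # variance of F
    cov = box_mean(F * p, r) - mu * mu_p   # covariance of F and p
    a = cov / (var + eps)
    b = mu_p - a * mu
    return box_mean(a, r) * F + box_mean(b, r)
```

A useful sanity property: filtering a constant input against any guide returns (approximately) the same constant, since the covariance term vanishes.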
In the same image, the physical quantities of the HSI and RGB color spaces are not the same, but they can be converted to each other. Compared with the RGB color space [15], the HSI color space is more in line with human visual perception of color. Therefore, this paper uses the HSI color space for image defogging and color enhancement. The mathematical expressions for converting RGB to HSI color space are:

H = θ if B ≤ G, H = 360° − θ if B > G, where θ = arccos{ [(R − G) + (R − B)] / ( 2 sqrt((R − G)² + (R − B)(G − B)) ) } (9)

S = 1 − 3 min(R, G, B) / (R + G + B) (10)

I = (R + G + B) / 3 (11)

Based on formulas (9), (10), (11), and (1), the image degradation model in the HSI color space can be obtained. In this model, H_F, S_F, and I_F are the HSI components of the haze image, H_Z, S_Z, and I_Z are the HSI components of the fog-free image, and A_I is the atmospheric light in the intensity channel.
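The RGB-to-HSI conversion of formulas (9), (10), and (11) can be sketched as follows; the small constants guard against division by zero and are an implementation detail, not part of the formulas, and H is returned in radians:

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an (..., 3) RGB array with values in [0, 1] to H, S, I."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = (r + g + b) / 3.0
    # Saturation: 1 - min(R,G,B)/I, guarded against black pixels.
    s = 1.0 - np.minimum(np.minimum(r, g), b) / np.maximum(i, 1e-12)
    # Hue: angle theta, reflected when B > G.
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b <= g, theta, 2.0 * np.pi - theta)
    return h, s, i
```

For a pure red pixel (1, 0, 0) this yields hue ≈ 0, full saturation, and intensity 1/3, as expected.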
In the dark channel expression in the HSI color space, S_Z and I_Z are the variables of the model. When performing color enhancement of an image, it is first necessary to estimate the atmospheric light and transmittance. The atmospheric light in the intensity channel can be regarded as the average value of the atmospheric light in the RGB channels:

A_I = (A^R + A^G + A^B) / 3 (14)

The transmittance is then calculated by formula (15), in which the constant X is fixed at 0.95. Based on formulas (1), (14), and (15), the restored image is obtained, and the color mapping equation is then used to adjust the colors in the image. In the color mapping equation, R_d is the image after color adjustment, Z_cmax is the maximum value in the corresponding RGB channel, λ is the offset parameter (value 0.85), and L_dmax is the maximum brightness (value 100).
Image dehazing based on machine learning

First, this paper downloads 10,000 haze-containing images from the Baidu image database with the keyword "haze image". Second, the HSI color space image enhancement algorithm proposed in Section 1.3 is used to generate a transmittance library for these images. To increase the number of samples in the training set and enrich the features of the transmittance library, the Gaussian pyramid block sampling method is then used to sample these images and obtain the transmittance blocks corresponding to the haze images. Then the LBP (local binary pattern) information of the images is extracted to label them, and the images are clustered with the improved k-means algorithm to obtain k initial visual words and the corresponding transmittance maps. The traditional k-means algorithm [16] usually measures similarity by the Euclidean distance between two samples, but it depends strongly on the selection of the initial cluster centers. Therefore, this paper uses density parameters to optimize the k-means algorithm. The specific improvement steps are:
(1) Input the data set S, the number of clusters k, and the constant Minpts;
(2) Calculate the density values of all objects in the data set S based on the constant Minpts, solve the density mean, and obtain the density set C;
(3) Select the data object corresponding to the maximum value in the set C as the center point C0 of the initial cluster;
(4) According to the above method, obtain k initial cluster center points in high-density areas;
(5) Use the traditional k-means clustering steps to cluster the samples, and finally output the clustering results.
The specific flow chart is shown in Fig. 1.
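The density-based center initialization of steps (1)–(4) can be sketched as follows. Here the density of a point is taken as the number of neighbors within a given radius, which is a plausible reading of the Minpts-based density rather than the paper's exact definition, and the standard k-means iterations of step (5) are omitted:

```python
import numpy as np

def density_init_centers(S, k, radius):
    """Pick k initial centers from the densest regions: repeatedly take the
    densest remaining point and suppress its neighborhood."""
    S = np.asarray(S, dtype=float)
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)  # pairwise dists
    density = (d <= radius).sum(axis=1)      # neighbors within the radius
    alive = np.ones(len(S), dtype=bool)
    centers = []
    for _ in range(k):
        idx = int(np.argmax(np.where(alive, density, -1)))
        centers.append(S[idx])
        alive &= d[idx] > radius             # move on to the next dense area
    return np.array(centers)
```

Because each chosen center suppresses its own neighborhood, the k centers land in k distinct high-density areas, which is the property the improved initialization relies on.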
Then SVM is used to cross-validate the k visual words of different categories, improving the distinctiveness and representativeness of each visual word and yielding multiple visual words and their corresponding transmittance maps. Research shows that the kernel function parameter g and penalty factor c in SVM affect the performance of the algorithm [17]. To improve the performance of the SVM algorithm, the parameters g and c need to be optimized. The Gaussian kernel is selected as the kernel function of the SVM in this paper, and the SVM is improved based on GA to obtain the GA-SVM algorithm [18,19]. The specific optimization process is shown in Fig. 2.
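A minimal genetic-algorithm loop for optimizing the pair (c, g) might look as follows. The fitness function is supplied by the caller; in the paper it would be the cross-validation accuracy of the SVM, and the elitist selection, blend crossover, and resampling mutation used here are simple illustrative choices, not the paper's exact operators:

```python
import random

def ga_optimize(fitness, bounds, pop_size=40, generations=60,
                pc=0.4, pm=0.01, seed=0):
    """Maximize `fitness` over box-constrained parameters with a simple GA:
    elitist selection, blend crossover (prob. pc), per-gene mutation (pm)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # best individuals first
        elite = pop[: pop_size // 2]             # elitism: keep the top half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = list(a)
            if rng.random() < pc:                # blend crossover
                child = [(x + y) / 2 for x, y in zip(a, b)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < pm:            # mutation: resample the gene
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

Elitism guarantees the best fitness never decreases across generations, so the search converges toward the optimum of the supplied fitness surface.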
After obtaining the optimal g and c of the SVM, the improved SVM algorithm is used to cross-validate the k initial visual words, so that the obtained visual words are more representative. Finally, a large number of visual words and corresponding transmittance maps are obtained to form a visual dictionary. Suppose the structure of the image is that the upper part is far and the lower part is close, that is, the gray value of the upper part of the transmittance image is 0 and that of the lower part is 1. With the upper left corner of the transmittance image defined as the origin (0, 0) of the image coordinates, the gray value of each point in the image is:

t(x, y) = y / H (17)

In the above formula, x and y are the horizontal and vertical coordinates, and H is the height of the image. When defogging a haze image, Gaussian pyramid blocks are used to sample the image, and the transmittance map corresponding to the visual word most similar to each block is selected. When selecting a similar transmittance map, a matching map may not be found for certain areas. To solve this mismatch problem, after the transmittance map of the haze image is obtained, the guided filter is used to refine it and improve the quality of image defogging. Since the structure of the image is assumed to be far at the top and near at the bottom, the upper area of the picture is generally the area with the densest smog or the sky. Select the first % of the pixel values from the top quarter of the image to form a set, and then select the maximum value of each RGB channel at the corresponding positions in the set as the atmospheric light value A^c of the atmospheric scattering model.
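The far-at-top transmittance prior and the atmospheric light estimation described above can be sketched as follows. Since the exact percentage of pixels is left unspecified in the text, it is exposed here as a parameter frac, and the linear ramp is normalized to reach exactly 1 at the bottom row:

```python
import numpy as np

def prior_transmittance(H, W):
    """Far-at-top prior: gray value grows linearly from 0 (top) to 1 (bottom),
    i.e. t(x, y) = y / (H - 1) with the origin at the upper-left corner."""
    y = np.arange(H) / (H - 1)
    return np.tile(y[:, None], (1, W))

def estimate_atmospheric_light(haze, frac=0.001):
    """Take the brightest `frac` of pixels in the top quarter of the image and
    return the per-channel maximum there. The percentage is a free parameter;
    the text leaves it unspecified."""
    top = haze[: haze.shape[0] // 4]
    gray = top.mean(axis=2).ravel()
    n = max(1, int(gray.size * frac))
    idx = np.argsort(gray)[-n:]              # indices of the brightest pixels
    return top.reshape(-1, 3)[idx].max(axis=0)
```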
Based on the transmittance map and atmospheric light value obtained above, the atmospheric scattering model formula Z^c(x) = (F^c(x) − A^c) / t(x) + A^c is used to restore the haze image. The haze image defogging process proposed in this paper is shown in Fig. 3.
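The restoration step is a direct inversion of the scattering model. The lower bound t_min used below is a common safeguard against division by near-zero transmittance and is an assumption here, not something stated in the text:

```python
import numpy as np

def dehaze(haze, t, A, t_min=0.1):
    """Invert the scattering model: Z^c = (F^c - A^c) / t + A^c.
    t is clamped below by t_min to avoid amplifying noise in dense haze."""
    t = np.maximum(t, t_min)[..., None]   # broadcast over RGB channels
    return (haze - A) / t + A
```

As a round-trip check, a haze image synthesized with known t and A is exactly recovered by this inversion when t is above the clamp.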

Results and Discussion
In the experiment, an Intel(R) Core(TM) i5 CPU @ 3.00 GHz with 8.00 GB of memory was used as the hardware simulation environment. The software simulation environment was Qt 5.5.0 and Matlab R2012a on the Ubuntu operating system.

Defogging experiment based on HSI haze image enhancement
In order to verify the haze image enhancement effect based on the HSI color space proposed in this paper, the He algorithm [20], the Huang algorithm [13], and the histogram equalization algorithm [17] are chosen for comparative analysis of the image enhancement effect.
It can be seen from Fig. 4 that, due to the influence of haze, the scene in the collected image is almost unrecognizable and the visual effect is very poor. In Fig. 4(b), the scene can be observed after processing with the He algorithm, and the overall effect is good. Fig. 4(c) and (d) show the results of the Huang and histogram equalization algorithms: the scene can be observed, but the overall color of the image is dark and the distant scenery is severely over-saturated. Fig. 4(e) shows the result of the haze image enhancement and defogging algorithm based on the HSI color space proposed in this paper. It can be seen that the algorithm effectively enhances the color of the haze image, and overall the scene clarity and saturation are better than those of the other algorithms.
In order to better evaluate the defogging performance of the HSI color space method for haze images, five indicators (G, H, σ_g, e, and r) are used for quantitative evaluation. It can be seen from Table 1 that the average gradient, information entropy, standard deviation, and r of the algorithm proposed in this paper are all the largest, showing that it has obvious advantages over the other algorithms in color enhancement and defogging of haze images. The proposed algorithm can not only eliminate the color shift in the haze image but also effectively restore image details while improving color saturation, definition, and contrast. Therefore, the HSI color space image processing algorithm proposed in this study achieves a very significant improvement in color enhancement and haze removal.

Testing of improved machine learning algorithms
This article first tests the performance of the improved machine learning algorithms to ensure their effectiveness when defogging haze images.

Performance test of the improved k-means algorithm
To test the superiority of the improved k-means algorithm in sample clustering, the Iris, Seeds, and Wine data sets were selected for performance testing. The Average Between-Within Proportion (ABWP) and the Average Improved Between-Within Proportion (AIBWP) are used to evaluate the clustering effect of the improved k-means algorithm. The Between-Within Proportion is given by formula (18) and the Improved Between-Within Proportion by formula (19); the averages of the BWP and IBWP values over all samples are then calculated. The larger the average values, the better the clustering effect of the algorithm on the samples.
In the above formulas, b(j, i) is the between-class distance, i.e., the minimum average distance between sample i in class j and the samples of the other classes; w(j, i) is the within-class distance, i.e., the average distance between sample i in class j and the other samples of that class. ib(j, i) is the between-class distance defined as the minimum distance between sample i in class j and the center points of the other classes, and iw(j, i) is the within-class distance, defined in the same way as w(j, i). The within-class distance w(j, i) in BWP reflects the degree of connection between the sample and the other objects in the same class, and the between-class distance b(j, i) reflects the relationship between the sample and samples in other classes. The larger b(j, i), the better the separation between classes; the smaller w(j, i), the tighter the distribution of samples within a class. In IBWP, the between-class distance ib(j, i) of sample i better reflects the relationship between the sample and the other classes: if sample i is far from its nearest class, it is even farther from the others. In addition, if sample point i is not classified into class j, it must be classified into the class closest to it. Compared with BWP, IBWP can better evaluate the clustering validity of a single sample.
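The per-sample BWP and IBWP indexes can be sketched as follows, taking the usual (b − w)/(b + w) form for combining the between-class and within-class distances, since formulas (18) and (19) themselves are not reproduced in this text:

```python
import numpy as np

def bwp_score(X, labels, i):
    """BWP(j, i) = (b - w) / (b + w): b is the minimum mean distance from
    sample i to any other class, w its mean distance to its own classmates."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.linalg.norm(X - X[i], axis=1)
    own = (labels == labels[i]) & (np.arange(len(X)) != i)
    w = d[own].mean()
    b = min(d[labels == c].mean()
            for c in set(labels.tolist()) if c != labels[i])
    return (b - w) / (b + w)

def ibwp_score(X, labels, i):
    """IBWP replaces b with the minimum distance from sample i to the CENTER
    of any other class."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.linalg.norm(X - X[i], axis=1)
    own = (labels == labels[i]) & (np.arange(len(X)) != i)
    w = d[own].mean()
    ib = min(float(np.linalg.norm(X[labels == c].mean(axis=0) - X[i]))
             for c in set(labels.tolist()) if c != labels[i])
    return (ib - w) / (ib + w)
```

Both scores approach 1 for a sample in a tight, well-separated cluster, which is why larger average values indicate a better clustering.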
It can be seen from Fig. 5 that the index values of the improved k-means algorithm proposed in this paper are more stable across different data sets. The AIBWP value is more stable than the ABWP value, indicating that the clustering results of the improved k-means algorithm are more stable. At the same time, the AIBWP value is greater than the ABWP value, proving that the improved k-means algorithm achieves high clustering accuracy.

SVM classification test based on GA parameter optimization
In the initial algorithm, c ∈ [0.1, 100] and g ∈ [0.01, 10]; the initial population size of the GA-SVM algorithm was set to 60, the crossover probability to 0.4, the mutation probability to 0.01, and the maximum number of iterations to 100. The fitness value was output every five iterations, and the SVM parameter optimization algorithm constructed in this paper was used for classification experiments on the Wine, Spect Heart, and Balance Scale data sets. The test results are shown in Table II and Fig. 6. It can be seen from Table II that the GA-SVM algorithm proposed in this paper outputs the optimal parameters c and g, and the classification accuracy on the different data sets is higher than 80%. As can be seen from Fig. 6, the average fitness value of the GA-SVM algorithm gradually stabilizes after 30, 10, and 20 iterations on the Wine, Spect Heart, and Balance Scale data sets, respectively. The results show that the GA-SVM algorithm can achieve intelligent optimization of the c and g parameters and has good performance.

Image defogging effect based on machine learning
To verify the effect of the machine learning-based haze image defogging algorithm proposed in this paper, images containing haze are taken as the research object, and the He algorithm [20], Mei algorithm [5], Liu Haibo algorithm [21], and Choi algorithm [22] are selected for comparative analysis of the defogging effect, as shown in Fig. 7.
As can be seen from Fig. 7, the input haze image contains no gray or bright areas, and almost all of the compared defogging algorithms are related to the dark channel prior: their transmittance calculation methods are similar, although atmospheric light is acquired in different ways. Therefore, these algorithms obtain similar results when removing fog.
However, the defogging effects of the Choi, He, Liu, and Mei algorithms are not obvious. In Choi's result, distant scenery and leaves are darker in color, indicating that the algorithm lost details during defogging. The He algorithm handles the near scene well, while haze over the distant scene remains. The image colors of the Liu and Mei algorithms differ from the real scenes people are subjectively familiar with, and color over-saturation is serious. The result of the algorithm in this paper is shown in Fig. 7(f): it can protect the detailed information of the image during defogging and also prevent color distortion. Experiments show that, compared with the other algorithms, the machine learning-based defogging algorithm proposed in this paper can effectively complete the defogging of haze images.
To better evaluate the performance of the algorithm in this paper, the following five indexes are used to quantitatively evaluate the images after the fog removal of different algorithms.
(1) The average gradient (G) is the average value of all points on the image gradient map, reflecting the tiny details and texture changes of the picture as well as the sharpness of the image. The larger the average gradient, the sharper the image.

G = (1 / ((M − 1)(N − 1))) Σ_{i=1}^{M−1} Σ_{j=1}^{N−1} sqrt( ( (F(i+1, j) − F(i, j))² + (F(i, j+1) − F(i, j))² ) / 2 ) (20)

In the above formula, F(i, j) is the gray value of pixel (i, j); M is the height of the image; N is the width of the image.
(2) Image entropy (H) reflects the average amount of information in an image. The larger the value of H, the more detailed information the image contains.

H = − Σ_i Σ_j f(i, j) log₂ f(i, j) (21)

In the above formula, i is the gray value of the pixel, j is the gray level of its neighborhood, and f(i, j) is the frequency of occurrence of the 2-tuple (i, j).
(3) The standard deviation (σ_g) represents the degree of dispersion between the pixel values and the mean value of the image. The larger the σ_g value, the better the image quality.

σ_g = sqrt( (1 / (M N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (p(i, j) − µ)² ) (22)

In the above formula, p(i, j) represents the pixel value in row i and column j, and µ represents the mean value of the image.
(4) The ratio of the number of visible edges (e) is the comparison of the number of visible edges before and after image defogging and represents the change in image richness after defogging.

e = (n_r − n_0) / n_0 (23)

In the above formula, n_0 is the number of visible edges in the original image; n_r is the number of visible edges in the image after defogging.
(5) The normalized gradient mean of visible edges (r) is the comparison of the average gradient of the image before and after defogging, representing the change in image clarity after haze image processing.
r = g_r / g_0 (24)

In the above formula, g_0 is the average gradient of the original image; g_r is the average gradient of the image after defogging. The five indicators G, H, σ_g, e, and r are used to quantitatively analyze the defogging effect of each algorithm; the results are shown in Table II. It can be seen that the values of G, H, σ_g, e, and r are the largest for the image defogged by the algorithm in this paper. This shows that the algorithm can not only improve the clarity and color vividness of the image but also enhance its saturation and contrast during defogging.
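The five indicators can be computed as follows; the entropy here uses a one-dimensional gray-level histogram as a simplification of the 2-tuple version defined above, and images are assumed to be float arrays in [0, 1]:

```python
import numpy as np

def average_gradient(img):
    """G: mean of sqrt((dFx^2 + dFy^2) / 2) over the image."""
    gx = np.diff(img, axis=1)[:-1, :]   # horizontal differences
    gy = np.diff(img, axis=0)[:, :-1]   # vertical differences
    return float(np.sqrt((gx ** 2 + gy ** 2) / 2.0).mean())

def entropy(img, bins=256):
    """H: Shannon entropy of the gray-level histogram (1-D simplification)."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def std_dev(img):
    """sigma_g: dispersion of pixel values around the image mean."""
    return float(np.sqrt(((img - img.mean()) ** 2).mean()))

def visible_edge_ratio(n0, nr):
    """e: relative change in the number of visible edges."""
    return (nr - n0) / n0

def gradient_ratio(img0, imgr):
    """r: ratio of the average gradient after defogging to that before."""
    return average_gradient(imgr) / average_gradient(img0)
```

On a perfectly flat image all three absolute indicators are zero, and doubling an image's contrast doubles its gradient ratio, which matches the intended interpretation of each metric.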

Conclusions
To improve the defogging effect of haze images and overcome the disadvantage of traditional prior-based defogging methods, which struggle to capture certain features of haze images, this paper proposes a color correction and restoration method for haze images based on the HSI color space and machine learning. A large number of haze images is first obtained from the Internet to form a haze image library; second, the HSI color space haze image restoration algorithm is used to correct the smog images and obtain a haze image transmittance library; then a machine learning haze image defogging model is built from the density-parameter-optimized k-means algorithm and the GA-SVM algorithm, and this model is trained on the haze image transmittance library to obtain many visual words representing transmittance maps with different characteristics, which form a visual dictionary. When defogging a haze image, Gaussian pyramid block sampling is first applied, and the visual word corresponding to each haze image block is found to obtain a coarse transmittance map; the guided filter is then used to refine the coarse transmittance map into the transmittance map of the haze image. Finally, according to the method of obtaining the atmospheric light value proposed in this paper, combined with the atmospheric scattering model, a clear image is obtained. Through subjective comparison and objective evaluation, experiments prove that the proposed method can effectively improve the defogging effect of haze images and enhance their clarity, detail, and color. However, this algorithm is aimed at a single haze image, which is a significant limitation in practical applications. In the future, we plan to study real-time haze image restoration and correction based on the theory of this paper.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.