Underwater Image Restoration with a Complementary Haar Wavelet Transform Restoration and an Ensemble of Triple Elite Correction Algorithms using Bootstrap Aggregation and Random Forests

This paper presents both a new strategy for traditional underwater image restoration using the Haar wavelet transform and a new learned model that generates an ensemble of triple elite correction algorithm labels based on histogram quadrants' cumulative distribution features instead of generating pixel intensities. The Haar wavelet transform is applied to the input image and to its contrast-stretched version to generate the degraded wavelet coefficients, which are refined using a Gaussian pyramid of the saliency weights to restore the original image. The ensemble of triple corrections applies three elite color correction algorithms sequentially to the degraded image for restoration. The ensemble of algorithms covers red channel mean shifting, global RGB adaptation, global luminance adaptation, global saturation adaptation, luminance stretching, saturation stretching, contrast stretching, adaptive Gamma correction for the red spectrum, even-to-odd middle intensity transference using a look-up table, green-to-red spectrum transference using histogram equalization, local brightening, Dark Channel Prior (DCP), fusion restoration, and our Haar wavelet transform restoration. The source is available at https://github.com/vahidr213/Underwater-Image-Restoration-And-Enhancement-Collection


Introduction
Researchers have recently devised a variety of mathematical methods to define an effective solution for the color losses and limitations of underwater images, and the machine vision literature has approached the underwater image restoration problem from many different perspectives. Underwater image restoration is by no means trivial; it is as challenging as most problems in machine vision and image processing.
In the past decade, machine vision has become universal across disciplines where automatic inspection and analysis of images is needed. Object detection [1], recognition [2] and tracking [3] are some well-known fields of machine vision that are recently extending their reach into the aquatic world.
The primary factor in underwater image degradation is the high attenuation of the light radiated from the surface of objects [4]. An early and famous single-image haze removal method for outdoor images, the Dark Channel Prior (DCP) [6], introduced the assumption that at least one of the RGB channels is dark in any non-sky patch of the image. The Underwater Dark Channel Prior (UDCP) [5] is derived from the DCP for underwater images; it assumes the red channel is the dark channel and hence uses the green and blue channels to estimate the medium transmission matrix.
From the application point of view, 3D reconstruction of the seafloor encounters objects in images that require automatic true-color recovery, since degradation effects can limit the ability to assess changes to underwater organisms such as those living in the benthic zone [7].
The Sea-thru method [8] examined the dark channel prior in detail, finding the optimized medium transmission matrix by estimating the range-dependent attenuation coefficient using the local space average color. An alternative to the dark channel prior, the red channel prior [9], is a rearrangement of the original DCP equation that also refines the estimated transmission map with a guided filter instead of a matting technique [10]. Recently, the transmission map of the DCP method for the red channel has been estimated [11] by applying an adaptive non-convex and non-smooth variation to a rough initial transmission map; this optimized transmission map is then used to obtain the red global background light via thermal exchange optimization [12], and the refined red channel is recombined with the green and blue channels to form the restored image.
Despite the prolific DCP and UDCP works presented so far, none have a comprehensively sufficient positive effect on the underwater image restoration problem: the dark channel prior formation model is tuned to atmospheric haze and, being a weak function of wavelength, it is inadequate for modeling light propagation in harsh oceanic environments [4].
Other strategies, such as multi-scale fusion, have drawn much attention in recent years. The method proposed by Ancuti et al. [13] fuses four weights (namely, exposedness, Laplacian of luminance, locally averaged luminance and saliency) extracted from both a contrast-stretched image and an equalized-luminance image to determine how each pixel should be modified.
Inspired by UDCP and multi-scale fusion, an algorithm was proposed [14] in which the medium transmission map produced by UDCP is decomposed into saliency and Laplacian weights. These weights are joined by multi-scale fusion to produce a refined medium transmission map, which is used in the DCP image formation model to restore the original image. However, the fusion methods mentioned above produce equalized distributions in the RGB spectrums, which can cause over-enhancement or under-enhancement in regions of the image where the distribution should have remained untouched, or should have been carried to the far ends of the intensity range. Moreover, in both methods the saliency weights largely dominate the principal structure of the weighting, while the other weights have negligible effect.
Over the past decades, convolutional neural networks (CNNs) have expanded into many visual recognition fields [15]. An encoding-decoding convolutional neural network [16] has been used to restore underwater images, where the encoding and decoding levels are realized by convolution and deconvolution operations respectively. The inner structure of the convolutional part is similar to the popular AlexNet [17] network.
They accelerated the training of the CNN, while also including low-level features, by using skip connections [18], [19] between the encoding and decoding stages. Skip connections are used because, although a few convolutions extract important features, convolution can smooth the image and destroy edges if over-applied, especially in underwater images.
The above-mentioned networks still exhibit a considerable failure rate because they do not handle saturation and contrast problems. In [31], an end-to-end CNN with three blocks named RGB (basic operations), HSV (saturation and luminance adjustment) and attention (a quality-enhancement stage) is proposed, where the final restored image is produced by a weighted sum between the RGB block output and the attention block's RGB component, as well as a weighted sum between the HSV block output and the attention block's HSV component. The input of the attention block is a concatenation of the raw image and the outputs of the other two blocks (the RGB block and the HSV block).
The following sections explain the proposed method, the inputs, the structure of the multi-scale fusion restoration and the 2-D Haar wavelet transform restoration, the structure of the Ensemble of Triple Elite Correction Algorithms (ETCA), and finally the experimental results and a comparison with other state-of-the-art algorithms.

Proposed Method
In this paper, our contribution to the problem of underwater image restoration includes a traditional complementary method as well as a learned model using new features. Traditional methods solve portions of the whole problem very well. We use an alternative single-image solution based on refinement of the input image's Haar wavelet coefficients. Our complementary approach should be applied after the multi-scale fusion restoration to further improve the quality of the restored image. The flowchart of this approach is shown in Figure 1. On the other hand, a method that generalizes to random degradation patterns requires trained models. In this paper, we use the histogram quadrants' cumulative distribution as a new input feature instead of pixel intensities. The input features are learned using Bootstrap Aggregation and Random Forests. We train three models separately. The responses of the models are also not pixel intensities: each model generates a numeric label corresponding to a specific color correction algorithm, and the three selected algorithms are applied sequentially to restore the original image.

Input Images
The input images of the fusion process can generally differ, because fusion combines multiple sources to preserve their significant features. In this regard, contrast stretching is one of the most popular initial color correction tools and has so far outperformed significant initial white balancing.
Numerous white balancing methods have been suggested, but none of them are experimentally appropriate in all underwater scenes. The Gray-World approach of Buchsbaum [32] is a mediocre example of a white balancing method; it creates color deviation by introducing reddish artifacts where the overall appearance is blue. The following method, which is simple and efficient and creates fewer red artifacts, is used to adjust the white balance initially. First, a three-element vector is defined, where K is a constant (e.g. 0.005) that bounds the vector of probability ratios in the interval [0, 1]. The three cumulative probabilities expressed as ratios, and their complements (1 - ratio), determine the percentage of the data that should be saturated to the lowest and highest values in each of the three RGB channels respectively.
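The percentile-saturation step described above can be sketched as follows. The exact per-channel ratios derived from the paper's probability vector are not reproduced in this text, so using the bound K itself as the saturation fraction is an illustrative assumption (Python rather than the paper's Matlab).

```python
def saturate_channel(channel, low_ratio, high_ratio):
    """Clip the lowest `low_ratio` and highest `high_ratio` fraction of
    pixel values, then linearly rescale the surviving range to [0, 255]."""
    values = sorted(channel)
    n = len(values)
    lo = values[min(n - 1, int(low_ratio * n))]
    hi = values[max(0, int((1.0 - high_ratio) * n) - 1)]
    if hi <= lo:                       # degenerate channel: leave untouched
        return list(channel)
    out = []
    for v in channel:
        v = min(max(v, lo), hi)        # saturate to [lo, hi]
        out.append(round((v - lo) * 255.0 / (hi - lo)))
    return out

K = 0.005  # upper bound on the saturation ratios, as in the text
channel = list(range(256))             # a toy 1-D "channel" of intensities
balanced = saturate_channel(channel, K, K)
```

Applying this independently per RGB channel implements the initial white balance; the amount clipped at each end is what the three-element ratio vector controls.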

Haar wavelet inputs
Our proposed method uses the Haar wavelet transform, which is fed with the raw input image as well as the contrast-stretched version of the raw input mentioned above. The input image for the Haar transform is resized to 640×640 so that the Haar pyramid dimensions match the Gaussian pyramid of the weights (W 1 , W 2 ), and so that the dimensions are divisible by a power of two, which facilitates the wavelet transform.
The input image can also be resized to 1024×1024 to increase the accuracy.
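A minimal single-level 2-D Haar decomposition illustrates the coefficients that the refinement stage later operates on. This is a sketch in Python rather than the paper's Matlab; note that subband naming conventions (which band is "horizontal" vs "vertical") vary between implementations.

```python
def haar2d(img):
    """One level of the 2-D Haar transform: returns approximation (A),
    horizontal (H), vertical (V) and diagonal (D) coefficient matrices,
    each half the input size. `img` is a list of lists with even dims."""
    rows, cols = len(img), len(img[0])
    # 1-D Haar along rows: pairwise averages (low) and differences (high)
    low, high = [], []
    for r in img:
        low.append([(r[2 * j] + r[2 * j + 1]) / 2 for j in range(cols // 2)])
        high.append([(r[2 * j] - r[2 * j + 1]) / 2 for j in range(cols // 2)])

    def cols_pass(mat):
        # 1-D Haar along columns of a band
        lo, hi = [], []
        for i in range(rows // 2):
            lo.append([(mat[2 * i][j] + mat[2 * i + 1][j]) / 2
                       for j in range(len(mat[0]))])
            hi.append([(mat[2 * i][j] - mat[2 * i + 1][j]) / 2
                       for j in range(len(mat[0]))])
        return lo, hi

    A, V = cols_pass(low)     # approximation, vertical detail
    H, D = cols_pass(high)    # horizontal detail, diagonal detail
    return A, H, V, D

A, H, V, D = haar2d([[4, 0], [0, 0]])
```

Repeating `haar2d` on the approximation band A builds the l-level pyramid used in the refinement equations; the inverse transform reverses the averaging/differencing steps.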

Multi-scale fusion input
The first image for the pyramid of multi-scale fusion [13] is the contrast-stretched version of the input image explained above, and the second is an equalized-luminance image provided by adaptive histogram equalization [33].

Weights of Multi-Scale Fusion
The choice of correct weights reduces computations and artifacts. There are four weights in [13], which are explained below:

Laplacian contrast weight (W L )
Laplacian contrast weight is the absolute value of the Laplacian filter applied to the luminance component of the image.

Local contrast weight (W LC )
Local contrast weight is an additional contrast measure that recovers contrast in regions where the Laplacian contrast weight alone is insufficient.

Saliency weight (W S )
Saliency weights emphasize the pixels standing farthest from the mean of the reddish, bluish and luminance components collectively. In other words, the salient objects in a scene are recognized biologically, and defined mathematically, as occupying a region of center-surround contrast. The saliency algorithm of Achanta et al. [34], defined in the Lab color space, is used.

Exposedness weight (W E )
Exposedness weights are a rough estimation of how much a pixel is exposed to light. This weight is defined by a bell-shaped Gaussian profile with mean value 0.5 and standard deviation 0.25. Therefore, pixels close to the average value are emphasized, producing better results when combined with the saliency weights.

The final weights W 1 and W 2 combine the four weights above and have the same size as the input image (like all four weights). They determine the amount of modification each pixel receives after the fusion is applied, and are normalized so that the two weights sum to one at every pixel. Figure 2 demonstrates the structure of updating the weights.
In the refinement process, only the saliency weights are used, since the saliency weights dominate the principal structure among the four weights described above. The refinement weights (W 1 , W 2 ) are computed, and the degraded Haar wavelet coefficients are updated, according to the refinement equations, where H l , V l and D l are the horizontal, vertical and diagonal coefficient matrices of the kth image for channel c ∈ {R, G, B} at level l, and A k is the approximation coefficient matrix of image k.
The inverse Haar transform is then applied to the refined coefficients (A new , H new , V new , D new ) to restore the original image.
The Haar wavelet restoration is independent and capable of restoring underwater images on its own. Our experiments showed an overall improvement in performance when the Haar wavelet restoration method is used as a complementary algorithm, so we applied it to the restored image produced by multi-scale fusion. Figure 1 shows the flowchart of the combined Haar wavelet and multi-scale fusion structure. The overall algorithm achieved an average mean square error (MSE) better than almost all trained models and traditional methods except for the Dive+ algorithm.

Multi Scale Fusion
The fusion is represented by a weighted sum of the images at every location (x, y). Because directly applying this equation introduces halos and artifacts in R(x, y), both the weights and the input images are decomposed into multi-scale pyramids, where l denotes the number of pyramid levels. Each scale is derived by downsampling the previous level; the initial level is the original image.
To preserve the important and desired information, an operation such as Gaussian filtering or Laplacian filtering can be applied before downsampling. In this regard, we decompose the weights into a Gaussian pyramid and the input images into a Laplacian pyramid. The Laplacian pyramid is created by first building a pyramid of downsampled images and then calculating the difference of each lower-level image from the resized (enlarged) upper-level image (using bicubic interpolation). The first level of both the Gaussian pyramid and the Laplacian pyramid is the same size as the input image, to increase the accuracy.
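The pyramid blending described above can be sketched end to end. To keep the sketch dependency-free it works on 1-D signals, uses pairwise averaging in place of Gaussian filtering and nearest-neighbour enlargement in place of bicubic interpolation, and assumes the weights are already normalised to sum to one per pixel; the 2-D image version follows the same pattern.

```python
def downsample(sig):
    """Halve a 1-D signal by averaging pairs (stand-in for Gaussian
    filtering followed by decimation)."""
    return [(sig[2 * i] + sig[2 * i + 1]) / 2 for i in range(len(sig) // 2)]

def upsample(sig, n):
    """Nearest-neighbour enlargement to length n (the paper uses bicubic)."""
    return [sig[min(i * len(sig) // n, len(sig) - 1)] for i in range(n)]

def gaussian_pyramid(sig, levels):
    pyr = [sig]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def laplacian_pyramid(sig, levels):
    g = gaussian_pyramid(sig, levels)
    pyr = [[a - b for a, b in zip(g[l], upsample(g[l + 1], len(g[l])))]
           for l in range(levels - 1)]
    pyr.append(g[-1])   # coarsest level keeps the residual approximation
    return pyr

def fuse(images, weights, levels):
    """Multi-scale fusion: Laplacian pyramids of the inputs blended by
    Gaussian pyramids of the (normalised) weights, then collapsed."""
    lps = [laplacian_pyramid(im, levels) for im in images]
    gps = [gaussian_pyramid(w, levels) for w in weights]
    fused = []
    for l in range(levels):
        n = len(lps[0][l])
        fused.append([sum(gps[k][l][i] * lps[k][l][i]
                          for k in range(len(images))) for i in range(n)])
    out = fused[-1]
    for l in range(levels - 2, -1, -1):
        out = [a + b for a, b in zip(fused[l], upsample(out, len(fused[l])))]
    return out

# two identical toy "images" with equal weights reconstruct the signal
img = [1, 2, 3, 4, 5, 6, 7, 8]
restored = fuse([img, img], [[0.5] * 8, [0.5] * 8], levels=2)
```

Blending per pyramid level rather than per pixel is what suppresses the halos mentioned above: each level mixes only structures of one spatial scale.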

Ensemble Of Triple Elite Correction
Traditional methods present a global solution (or a limited set of joint solutions) for a typical prior assumption such as homogeneous lighting along the line of sight, an unbalanced color spectrum in the RGB channels, saturated colors, low contrast and brightness, blurriness, and especially degradation based on the underwater image formation model. On the other hand, whether a specific traditional method can comprehensively handle the different patterns of underwater degradation remains undecided.
There are several conventional color correction techniques aimed at increasing the visual quality of an image, including stretching, global adaptation, sharpening, unsharp masking, histogram equalization and many others. In this paper, we propose to predict the best combination of three methods from an ensemble of conventional color correction techniques and our proposed methods, to be applied to a single image according to the features extracted from that image. In this regard, we have trained three models using Bootstrap Aggregation and Random Forests. Each model is responsible for generating a numeric label that corresponds to a specific color correction method among the fifteen available methods. We also propose new features to be learned, which are explained in detail in the feature extraction section. The ensemble of color correction methods comprises red channel mean shifting and sharpening, global RGB adaptation, global luminance adaptation, global saturation adaptation, luminance stretching, saturation stretching, contrast stretching, adaptive Gamma correction for the red spectrum, our even-to-odd middle intensity transference using a look-up table, green-to-red spectrum transference using histogram equalization, local brightening, DCP [6], fusion restoration [13], and our Haar wavelet transform restoration.
It should be noted that the triple elite corrections are applied sequentially, as the block diagram of the proposed ETCA method in Figure 3 shows.
The total number of available methods, including the null operation, is fifteen. The null operation leaves the image intact. It is necessary because the collection contains some fully enhanced images that differ negligibly from their reference images (negligible mean square error). Each of the other color correction algorithms is labeled with a distinct number identifying the corresponding operation; the color correction methods are described in detail in the feature extraction section.

Features extraction
Feature extraction plays a key role in the quality of learning-based underwater enhancement methods. As far as we know, most recent deep learning models use RGB values as their features. A majority of state-of-the-art underwater deep learning methods map pixel to pixel or pixel to difference. Pixel-to-pixel mapping produces an output pixel in response to each input pixel. Pixel-to-difference mapping produces a positive or negative amount to be added to or subtracted from the input pixel intensity. However, the possibility of generating inconsistent RGB combinations in the outputs of these networks remains unsolved and limits the generalization of such conventional deep learning-based underwater restoration methods.
In this regard, we propose a probability-based feature as an alternative to pixel intensity. We use the cumulative distribution function (CDF) evaluated over 4 closed intervals, or quadrants, of the probability distribution function (PDF) of the input image. Each quadrant of the image's PDF is therefore assigned a CDF scalar. Since 8-bit RGB images have three channels, a total of twelve scalars are extracted for each input image.
Here F I is the cumulative distribution function for the intensity I, f is the probability distribution function (histogram), c ∈ {R, G, B}, and Ω 1 , …, Ω 4 are the maximum boundaries of the quadrants, whose numeric values are given in the experimental results.
We use these twelve CDF values as input features. The cumulative distribution of each histogram quadrant decreases the number of training features significantly in contrast to the huge training data used in the above-mentioned networks, and also decreases the training time significantly.
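The twelve-scalar feature vector can be extracted as follows. A minimal sketch that assumes 8-bit channels supplied as flat pixel lists and a 256-bin histogram, with the quadrant boundaries taken from the experimental-results section:

```python
def quadrant_cdf_features(channels, bounds=(63, 127, 191, 255)):
    """For each channel, evaluate the empirical CDF at the four quadrant
    boundaries Omega_1..Omega_4 of a 256-bin histogram: 12 scalars total."""
    features = []
    for pixels in channels:          # channels = (R_pixels, G_pixels, B_pixels)
        n = float(len(pixels))
        hist = [0] * 256
        for v in pixels:
            hist[v] += 1
        cdf, running = [], 0
        for count in hist:
            running += count
            cdf.append(running / n)  # cdf[i] = P(intensity <= i)
        features.extend(cdf[b] for b in bounds)
    return features

feats = quadrant_cdf_features(([0, 64, 128, 192], [10, 10, 10, 10], [255] * 4))
```

Because each feature is a probability mass, the representation is independent of image size, which is what makes one small vector usable in place of the full pixel grid.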
The training process also needs the responses corresponding to the extracted input features. In this work, the response is not a pixel intensity either. The response to each input feature vector is an ensemble of optimized triple labels minimizing the cost function (in our case, the mean square error between the reference image and the restored image). Each numeric label corresponds to a single color correction method, and each ensemble of triple labels is applied to the input image sequentially to restore the original image. Therefore, an optimization is performed to generate the optimum triple correction methods (labels) for each input image.
A total of fifteen color correction methods are used in the evaluations. One of them is the null operation, labeled zero, indicating that the input image requires no color correction; it is necessary because the UIEB dataset contains some fully enhanced images. The other color correction methods are described in detail below.
The contrast stretching in RGB color space, luminance stretching in Lab color space, and saturation stretching in HSV color space are performed simply by saturating the bottom 1% and the top 1% of all pixel values (with appropriate L and H) and linearly mapping the remaining range, where I(x, y) is an M × N gray-scale image and δ is a small value that avoids singularity (e.g. δ = 10^-3).
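A sketch of this percentile stretching on a flat list of channel values, assuming the values are normalised afterwards to [0, 1] (the same routine serves contrast, luminance or saturation stretching depending on which channel it is given):

```python
def stretch(values, p=0.01, delta=1e-3):
    """Saturate the bottom and top fraction p of values, then linearly
    map [L, H] to [0, 1]; delta guards against H == L."""
    s = sorted(values)
    n = len(s)
    L = s[int(p * n)]                      # 1st percentile
    H = s[min(n - 1, int((1 - p) * n))]    # 99th percentile
    return [min(max((v - L) / (H - L + delta), 0.0), 1.0) for v in values]

out = stretch(list(range(100)))
```
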
As the log-average intensity Ī increases, the global adaptation curve I g behaves more linearly than logarithmically. Therefore, darker scenes are brightened more than brighter scenes through global adaptation.
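The behaviour described above can be illustrated with a log-based global adaptation curve. This is a sketch of the standard form (pixel value normalised by the log-average, compressed by a logarithm); the paper's exact expression is not reproduced here and may differ in normalisation:

```python
import math

def global_adaptation(v, v_avg, v_max):
    """Log-based global adaptation: v is a pixel value, v_avg the image's
    log-average and v_max its maximum. When v_avg is large, v / v_avg is
    small and log(1 + x) is approximately x, so the curve turns linear;
    when v_avg is small (a dark scene), the log compression boosts values."""
    return math.log(v / v_avg + 1.0) / math.log(v_max / v_avg + 1.0)

dark_scene = global_adaptation(0.2, 0.1, 1.0)     # low average: strong boost
bright_scene = global_adaptation(0.2, 0.5, 1.0)   # high average: mild boost
```
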
Adaptive Gamma correction is performed using the method proposed in [37], which adaptively computes the Gamma for a given image instead of using a constant scalar. For a gray-scale image I, the method depends on the scalar μ, the average value of the image, the scalar σ, the standard deviation of the image, and δ, an infinitesimal value that avoids a zero input to the Heaviside function.
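To make the role of μ, σ, δ and the Heaviside step concrete, here is a hedged sketch of one common adaptive-gamma scheme of this family; the exact expressions used in [37] are not reproduced in this text and may differ:

```python
import math

def adaptive_gamma(mu, sigma, delta=1e-6):
    """Illustrative adaptive gamma selection from the image mean mu and
    standard deviation sigma (both normalised to [0, 1]). A Heaviside
    step switches between a low-contrast and a high-contrast rule."""
    def heaviside(x):
        return 1.0 if x > 0 else 0.0

    low_contrast = heaviside(0.25 - sigma)           # narrow histogram?
    if low_contrast:
        return -math.log2(sigma + delta)             # spread a narrow histogram
    return math.exp((1.0 - (mu + sigma)) / 2.0)      # temper a wide histogram

g_low = adaptive_gamma(0.5, 0.1)    # low-contrast image: gamma > 1 branch
g_high = adaptive_gamma(0.5, 0.3)   # high-contrast image: exponential branch
```
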
Red channel mean shifting and sharpening is composed of two operations. It should be noted that this approach is used in only a few cases, since it requires a user-defined scalar for blending.
The first operation shifts the mean of the red channel toward the gray image's mean and then blends the green spectrum into the red spectrum. The second operation sharpens the red channel I R using unsharp masking, which is performed by subtracting a blurred (unsharp) version of the image from the initial image.
In this work, we propose the idea of even-to-odd middle intensity transference using a look-up table to augment the probability of middle intensities (roughly doubling it). After transferring the even middle intensities to their adjacent odd intensities, every odd middle intensity roughly has a doubled probability. This follows from the continuity of the cumulative distribution of the underlying data: two adjacent intensities differ only slightly in their probabilities. The selected middle intensities are between . Other intensities are left intact. This method only partially affects the image and has no extensive positive or negative effect on it.
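The transference is a simple look-up table. The actual middle-intensity range used in the paper is not reproduced in this text, so the range [64, 191] below is an illustrative assumption:

```python
def even_to_odd_lut(lo=64, hi=191):
    """Look-up table mapping each even intensity in a middle range to its
    adjacent odd intensity, leaving all other intensities intact.
    The [lo, hi] range here is a hypothetical stand-in."""
    lut = list(range(256))
    for v in range(lo, hi + 1):
        if v % 2 == 0:
            lut[v] = v + 1   # transfer the even intensity to its odd neighbour
    return lut

lut = even_to_odd_lut()
transferred = [lut[v] for v in [10, 100, 101, 200]]
```

After applying the LUT, each odd bin in the middle range collects its own pixels plus those of its even neighbour, which is the "roughly doubled" probability described above.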
Green-to-red spectrum transference using histogram equalization is performed because we have observed many positive effects when using the green spectrum rather than the blue spectrum. In the transference equation, ρ green is the green spectrum, which is used in the histogram equalization.
Local brightening [38] is used as one of our color correction methods, but it should be noted that local brightening can noticeably degrade underwater images. Due to this effect, we have set the blending option of the brightening method to 0.1.
We use Bootstrap Aggregation (Bagging) and Random Forests [39], [40] to create the three trained models with which the three labels are generated. The first, second and third labels are generated separately by their designated trained models, and then the three labels are applied sequentially to the input according to their order.
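The bagging idea behind each of the three models can be sketched without any machine-learning library. The sketch below substitutes 1-nearest-neighbour base learners for the paper's decision trees (an assumption made only to keep the example dependency-free); the bootstrap resampling and majority vote are the bagging mechanism itself:

```python
import random
from collections import Counter

def bagged_predict(train_X, train_y, x, n_learners=25, seed=0):
    """Minimal bagging: each base learner is a 1-nearest-neighbour
    classifier fit on a bootstrap resample of the training set; the
    ensemble predicts by majority vote over the learners."""
    rng = random.Random(seed)
    n = len(train_X)
    votes = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        best = min(idx, key=lambda i: sum((a - b) ** 2
                                          for a, b in zip(train_X[i], x)))
        votes.append(train_y[best])
    return Counter(votes).most_common(1)[0][0]

# toy 12-D quadrant-CDF features mapped to hypothetical correction labels
X = [[0.10] * 12, [0.12] * 12, [0.08] * 12, [0.90] * 12]
y = [3, 3, 3, 7]
label = bagged_predict(X, y, [0.11] * 12)
```

In the paper, three such ensembles are trained independently, one per position in the triple, each emitting one of the fifteen numeric correction labels.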

Experimental Results
A comprehensive and fair comparison of several traditional underwater image enhancement methods is performed in the UIEB dataset paper [41], from which our raw and reference images are drawn. Likewise, a comprehensive evaluation of several recent learned (deep learning) models is performed in [49], from which we quote results in Table 1. The UIEB dataset [41] has 890 real-world paired images (raw and reference) plus 60 challenging raw images. Our simulations are executed in Matlab, and the source is freely available for evaluation.
The speed performance depends on the image size and the methods applied to the image. The larger the image, the slower the result is produced, since all of our color correction algorithms are applied to the whole image; the number of pixels therefore has a direct influence on speed. The maximum latency occurs when both multi-scale fusion [13] and our proposed Haar wavelet restoration appear together inside the ensemble of triple algorithms.
The number of levels used to decompose the input images and weights in both the multi-scale fusion and the Haar wavelet transform is l = 5.
The maximum boundary of each quadrant depends on the number of histogram bins. We use a 256-bin histogram; therefore, the maximum boundaries Ω 1 , Ω 2 , Ω 3 , Ω 4 are {63, 127, 191, 255}.
The underwater image restoration using Haar wavelet coefficient refinement shows a clear improvement in terms of average MSE, PSNR and SSIM over the traditional and deep learning methods on the whole dataset, as can be seen in Table 1. The only method superior to our proposed traditional method is the commercial app Dive+. As mentioned before, the images are resized to 640×640 in the Haar wavelet restoration, which decreases accuracy while increasing processing speed. It is possible to resize the input image to higher dimensions to increase precision, since the Gaussian pyramid of the saliency weights will then carry more information for the modification of the final pixels.
Our ETCA method is flexible, and it is possible to substitute more robust color correction algorithms for some of our current fifteen methods. As Table 1 shows, the best performance among all the presented methods belongs to our proposed ETCA. Our evaluations show that ETCA achieves an average MSE improvement of nearly 83.4%, which means that most of the MSEs are low and only a few cases are marginally improved.
During optimization, more than 1460 triple permutations are evaluated to reach the best ensemble of labels. The number of images to which ETCA applies the null operation is 11. Table 1 quotes the performance of our methods as well as the performance of the other methods according to [41], [49].
Our technique shows limitations in the few cases where the optimization process produces a weak ensemble of correction algorithms. By evaluating the restored images visually, we found that most of the images are well restored and only in very few cases are reddish artifacts produced; an example of such a case is shown in Figure 4. This situation is due to the choice of cost function: since our cost function relies only on the mean square error (MSE), a small possibility of finding a weak minimum is unavoidable.
An example of no operation is shown in Figure 5. In Figure 6 we compare our results with the reference images presented in UIEB.
Out of the fifteen color correction algorithms, only two rely heavily on experimental user-defined parameters and can create extreme changes when applied with inappropriate parameters: red channel mean shifting and local brightening. Both require a blending parameter, which we have set to 0.1.

Figure 1

The flowchart of the complementary Haar wavelet transform for underwater image restoration.

Figure 2

The block diagram illustrating the Haar wavelet coefficient refinement.

Figure 3

The block diagram illustrating the flowchart of the Ensemble of Triple Elite Correction. M 1 , M 2 , and M 3 are the trained models generating a numeric label for the C 1 , C 2 , and C 3 sequential color correction blocks respectively.

Figure 4
An example of our method failing to find the best three color correction algorithms. Red artifacts are introduced in the image by the weak ensemble of algorithms.

Figure 5
An example of output where our proposed method has applied no color correction on the input image.

Figure 6
A comparison between the restored images of ETCA (left) and the reference images.