KinD-LCE: Curve Estimation and Retinex Fusion on Low-Light Image

doi:10.21203/rs.3.rs-3340164/v1

Research Article

This work is licensed under a CC BY 4.0 License

Journal Publication

Published 06 Dec, 2023 in Signal, Image and Video Processing

Low-light images often suffer from noise and color distortion. Object detection, semantic segmentation, instance segmentation, and other tasks are challenging on low-light images because of image noise and chromatic aberration. We also found that the conventional Retinex theory loses information when adjusting the image for low-light tasks. To address these problems, this paper proposes an algorithm for low-illumination enhancement. The proposed method, KinD-LCE, uses a light curve estimation module to enhance the illumination map in the Retinex decomposed image, improving the overall image brightness. A fusion module for the illumination map and reflection map is also proposed to restore image details and reduce detail loss. Additionally, a total variation loss function is applied to eliminate noise. The proposed method was trained using the GladNet dataset and tested with the Low-Light dataset. The ExDark dataset was used for validation in downstream tasks. The benchmark experiments demonstrated that the proposed algorithm achieved PSNR (19.7216) and SSIM (0.8213) values, which are close to state-of-the-art results.

Deep learning has gained immense popularity in computer vision because of its ability to process and recognize images through data-driven approaches. However, when capturing photos, images are often affected by environmental factors, such as backlighting, blurring, and distortion. Conventional computer vision algorithms are trained and designed for normal images, but they cannot achieve the same level of recognition accuracy for low-light images, leading to a reduction in image quality and industrial production efficiency.

Numerous approaches have been introduced to address the effect of low-light images on computer vision applications [1, 2, 5–9, 33]. One of the mainstream approaches for designing low-light enhancement networks is based on the Retinex theory [3]. Moreover, night-time driving poses a significant challenge for autonomous driving technology. To overcome this challenge, we must enhance the images, and semantic segmentation should be used to extract useful information. Thus, deploying low-light enhancement algorithms on IoT devices at the edge is a problem that must be addressed.

Researchers have proposed algorithms to enhance the quality of low-light images. The most widely used algorithm in the early research work was histogram equalization [15]. However, histogram equalization has certain limitations in some cases: 1) The equalization effect is not natural enough; 2) It can quickly produce over-enhanced effects; 3) The computational efficiency is not high. Recent studies [2, 4, 5, 7–11, 19, 24] have demonstrated that the Retinex theory [3] is more effective in restoring images in low light enhancement and is widely used in image processing tasks. However, the Retinex theory assumes that the illumination map is smooth, so blurring can occur at the edges leading to loss of edge details and an increase in noise. Therefore, the Retinex theory requires noise suppression, recovery of smooth and non-smooth parts of the illumination map, and suppression of the effect on edge details.

When extracting features from low-light images, researchers have proposed using a U-shaped network structure to compensate for features and avoid losing spatial information during upsampling and downsampling, so that more feature information is retained when decomposing images. Although the U-shaped network generalizes well and can extract various features, encoding features from the image introduces redundant information and additional noise, which is amplified after merging.

To address these issues, we propose a Retinex Fusion module based on the U-shaped network structure. It extends the Retinex theory by fusing the features of the reflectance map with the illumination map, which addresses the edge-blurring problem in the illumination map. A light curve estimation module then adjusts the image brightness distribution through the trainable parameter α, which reduces the noise generated during upsampling and downsampling and the loss of edge detail. Finally, the total variation term in the loss function of the restoration network suppresses the generation of noise.

The main contributions of this study are as follows.

The feature information contained in the reflectance map of the image is fused with the illumination map of the image to optimize the image quality, and the image brightness is enhanced using the reflectance map to calculate the illumination coefficients.

The brightness of the image is improved with the Light Curve Estimation (LCE) module, which brings out more detail in the image.

The total variation loss function is included in the restoration network to further eliminate the noise, reduce the parts with significant variation, and smooth the image edges.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the problem. Section 4 presents our proposed approach, and Section 5 demonstrates our method experimentally. Section 6 concludes the paper.

The analysis and solution presented in this section are motivated by the increasing significance of handling low-light images in computer vision. We begin by examining our previous work and then introduce our novel approach.

2.1 Low-Light Enhancement

The primary task of low-light enhancement is to increase the brightness of the image to improve the recognition accuracy of computer vision tasks, such as image classification, target detection, and image segmentation, and to reduce the effect of environmental factors on recognition accuracy.

Histogram equalization [15] was the most widely used low-illumination enhancement algorithm in early research. It uses the cumulative distribution function to spread the pixel values of an image across the entire grey-level range, which constrains the histogram of the output image and enhances the overall image contrast. The parameters can also be calculated block by block (block-wise histogram equalization), which avoids the excessive enhancement that results from global equalization. This method is simple to implement, suitable for various images, and produces better results than many conventional processing methods. However, the technique has certain limitations in low-light enhancement. In low-light images, the limited information and the presence of noise can lead to issues such as over-enhancement, excessive denoising, and distortion when global histogram equalization and other global histogram adjustment algorithms are used. These methods may also increase the noise level of the image, decreasing the image quality. Therefore, more attention should be paid to local control to improve the effectiveness of low-light enhancement.
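The cumulative-distribution mapping behind global histogram equalization can be sketched in a few lines. This is a minimal illustration written with plain Python lists, not the implementation of [15]; the function name `equalize_histogram`, the 256-level default, and the tiny example image are assumptions made here for demonstration.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of a 2-D list of integer grey levels."""
    # Count how often each grey level occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution function (unnormalized).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Map each grey level so that the CDF spans the full output range.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lookup[value] for value in row] for row in image]


# A dark 3x3 image whose values are squeezed into the range 10..20.
dark = [[10, 10, 12], [12, 14, 14], [14, 14, 20]]
bright = equalize_histogram(dark)
```

After equalization the darkest level maps to 0 and the brightest to 255 while the ordering of grey levels is preserved, which is the contrast stretch the paragraph describes. Noise in the dark input is stretched along with the signal.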

The Retinex model [3] and its multi-scale version [21, 32] decompose the image into illumination and reflection maps and process the two maps separately before fusing them to obtain the recovered image. The model processes images based on the human eye's perception of color and simulates changes in a scene's lighting conditions, allowing for a better visual experience in low light or other adverse environments. However, it is difficult to achieve good robustness using only hand-designed prior knowledge. In low-light conditions, the grayscale values of an image are very close to each other, which makes them difficult to distinguish during multi-scale decomposition. This results in information loss and artifacts, and the decomposition is strongly affected by noise. These factors can make it impossible to recover details, colors, and other features in the image. Therefore, it is necessary to adjust the prior knowledge dynamically according to the current environment and to preserve more of the otherwise lost information during multi-scale decomposition [19, 35].

Inspired by Photoshop-like image editing software, researchers [1] have recently designed a curve-based adjustment method that adaptively adjusts an image's light curve. A low-light image is mapped automatically to a light curve, and its brightness is adjusted according to that curve. Such a curve has three goals: 1) the pixel values of the image to be enhanced should be normalized to [0, 1] to avoid gradient explosion caused by large loss values; 2) the curve should be monotonic to preserve the differences between adjacent pixels; 3) the curve should be differentiable so that gradients can be computed by backpropagation. In practical applications, this method effectively solves the gamma correction problem and achieves high fidelity. However, as a curve estimation-based method, its results can be unstable and limited, which may destroy features of the original image and affect its structure.

Meanwhile, several studies have focused on using unsupervised, zero-shot, and few-shot learning methods to achieve low-light enhancement. Guo et al. [1, 28] recently proposed the Zero-DCE and Zero-DCE++ networks, which can be trained without paired training data by using a non-reference loss function. The zero-reference strategy reduces the burden of data preparation and increases the training speed. However, because of this strategy, the network may show poor enhancement results for low-light images with high noise levels.

Zhang et al. [2] proposed a multi-stage low-light enhancement model based on the Retinex theory, in which the decomposition and fusion steps of the Retinex method are performed by neural networks. The method is divided into three main stages. First, the image is decomposed into illumination and reflection maps by an Encoder-Decoder with a U-shaped structure. Then the reflection map is restored by the restoration network, and the illumination map is enhanced by the enhancement network. Finally, the enhanced and restored maps are fused. Because the method builds on the Retinex theory [3], it inherits the previously mentioned problems: blurred edges lead to loss of edge details and increased noise. The KinD-Plus [14] network was also designed based on the Retinex theory and has the same problems. In addition, Jiang et al. [29] proposed an unsupervised method that learns the correspondence between low-light and normal lighting from normally lit images, without using low-light images as reference objects. However, this training method degrades image quality by introducing noise and losing edge details. More recently, Liu et al. [31] and Xie et al. [30] proposed using semantic information to guide the reconstruction of Retinex images to eliminate noise. By contrast, it is more advantageous for us to enhance the Illumination stage directly to eliminate image enhancement. In conclusion, whether the enhancement is unsupervised or supervised, and whether it is guided by contrastive images or by semantic information, each approach has advantages and disadvantages.

2.2 Image Fusion

Most neural networks use image fusion to enhance the features extracted using the backbone network, usually fusing the features extracted at different scales or categories in the middle of the neural network to make the extracted features more effective [20,21,23,34]. The process of downsampling the neural network causes a loss of information, resulting in the loss of details after resampling because of insufficient information.

In low-light enhancement, image fusion and feature fusion can compensate for the loss of details caused by downsampling and preserve the details of the original image while enhancing its brightness. Ying et al. [12, 13] argued that cameras and the human eye share many similarities: after the eye has acquired the image, the brain fuses information from different luminance ranges, resulting in a high dynamic range image with complete detail retention. Based on this conjecture, they proposed a method that obtains high dynamic range images by transforming the luminance of low-light images and then fusing the images of different luminance ranges.

However, because of the requirement for exposure fusion, there are specific requirements for the brightness differences between adjacent images, which may not apply to all image scenes. Moreover, compared with other low-light enhancement algorithms, the enhancement effect of this algorithm may not be natural enough, leading to excessive enhancement and loss of details in the image.

3.1 Basic Information

In this paper, an image is defined as a matrix with pixel values ranging from 0 to 255, representing the degree of lightness or darkness of the image. The brightness of an image is influenced by multiple pixels and can be adjusted using factors such as contrast and saturation. Low-light images lack scene information because of various factors, such as the environment, which makes it difficult for the human eye to perceive details in the image. Based on the Retinex model, the original image is decomposed into two components: the reflection map, which represents the essential properties of the image (excluding brightness) and contains edge details and color information, and the illumination map, which captures the overall scene outline and luminance distribution. The reflection and illumination maps are then processed to generate images with normal brightness.

3.2 Problem Description

The optimization objective of low-light image enhancement is to learn the mapping from low-light images to normal-light images. A neural network takes paired images (a low-light image and a normal-light image) as input, which can be expressed by the following equation:

$$I^{\prime}=F\left(I_{low},I_{normal}\right) \tag{3-1}$$

where $I^{\prime}$ denotes the image after enhancement, $F\left(\cdot\right)$ denotes the low-light enhancement neural network, $I_{low}$ denotes the low-light image, and $I_{normal}$ denotes the normal-light image. There are various realizations of $F\left(\cdot\right)$, such as the traditional histogram equalization (HE) method from digital image processing [15] and methods based on the Retinex model [2, 5–9]. In this paper, recovery with the Retinex model is divided into three main stages: image decomposition (Decomposition), illumination map enhancement (Illumination), and reflection map recovery (Reflectance), followed by restoration of the image. In the Decomposition stage, the input image is decomposed into illumination and reflectance maps for subsequent processing.

$$I,R=F_{decom}\left(I_{in}\right) \tag{3-2}$$

where $I$ denotes the illumination map, which reflects the brightness details of the different areas of the image; $R$ denotes the reflection map, which reflects the contour details of the different parts of the image after the brightness is removed; $F_{decom}\left(\cdot\right)$ denotes the decomposition network; and $I_{in}$ denotes the input image. The illumination map and reflection map produced by Decomposition are then processed separately. The decomposition network is optimized by minimizing a loss function, generally the MSE loss, which is expressed as follows.

$$L_{mse}=\left\|\widehat{I}-I_{p}\right\|_{2}^{2} \tag{3-3}$$

where $\widehat{I}$ denotes the normal-light image and $I_{p}$ denotes the predicted image. Assuming that the parameters of the $k$-th layer are $W^{(k)}$ and $b^{(k)}$, the final optimization objective is

$$Loss=\min\left(0,\frac{\partial L_{mse}}{\partial W^{(k)}}\right) \tag{3-4}$$

where $W^{(k)}$ denotes the weights of the model and $b^{(k)}$ denotes the biases of the model.

In the Illumination stage, the illumination maps decomposed from the low-light image and the normal-light image are both input to the network, which learns the luminance mapping from low-light illumination maps to normal illumination maps and outputs the enhanced illumination map. This is expressed as

$$I_{illum}=F_{illum}\left(I_{low},I_{normal}\right) \tag{3-5}$$

where $I_{illum}$ denotes the enhanced illumination map, $I_{low}$ denotes the illumination map obtained by decomposing the low-light image, $I_{normal}$ denotes the illumination map obtained by decomposing the normal-light image, and $F_{illum}\left(\cdot\right)$ denotes the illumination enhancement network, which accepts both illumination maps and produces the enhanced illumination map for the low-light image. A loss function is likewise used for optimization.

Meanwhile, in the Reflectance stage, the reflectance maps decomposed from the low-light image and the normal-light image are input, and the neural network recovers the information of the low-light reflectance map, including details and edges. This is expressed as

$$R_{reflect}=F_{reflect}\left(R_{low},R_{normal}\right) \tag{3-6}$$

where $R_{reflect}$ denotes the recovered reflectance map, $R_{low}$ denotes the reflectance map obtained by decomposing the low-light image, $R_{normal}$ denotes the reflectance map obtained by decomposing the normal-light image, and $F_{reflect}\left(\cdot\right)$ denotes the reflectance recovery network, which accepts both reflectance maps and produces the recovered reflectance map for the low-light image. A loss function is likewise used for optimization. Finally, the outputs of the Illumination and Reflectance stages are fused to obtain

$$I_{out}=I_{illum}\cdot R_{reflect} \tag{3-7}$$

Low-light image enhancement based on the Retinex model thus decouples the image into an illumination map and a reflection map. Enhancing the illumination map recovers the brightness of the original image, and recovering the reflection map avoids introducing excess noise.
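The fusion step in Eq. (3-7) is an element-wise product, which can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the paper's code: the illumination map is taken to be single-channel and broadcast over an RGB reflectance map, values are normalized to [0, 1], and the helper name `recompose` is hypothetical.

```python
def recompose(illumination, reflectance):
    """Element-wise product of a 1-channel illumination map and an RGB reflectance map."""
    return [
        [tuple(light * channel for channel in rgb) for light, rgb in zip(i_row, r_row)]
        for i_row, r_row in zip(illumination, reflectance)
    ]


# The same reflectance rendered under a dim and under an enhanced illumination map.
reflectance = [[(0.8, 0.4, 0.2), (0.5, 0.5, 0.5)]]
dim_out = recompose([[0.1, 0.1]], reflectance)
lit_out = recompose([[0.5, 0.5]], reflectance)
```

Because the reflectance is unchanged, raising the illumination scales all three channels of a pixel by the same factor, so the pixel becomes brighter while its color ratios stay the same.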

In this section, we present our proposed solutions to the aforementioned problems. Our approach optimizes the enhancement effect while minimizing the effect on image quality. For the first time, we combine the Retinex image fusion module and the light curve adjustment module to solve the image restoration problem caused by incomplete decomposition in the Retinex theory. The structure of the Retinex image fusion module is described in Section 4.3, and the structure of the light curve adjustment module is discussed in Section 4.4. To enhance image quality, we fuse the illumination and reflectance maps and calculate the illumination coefficients based on the reflectance maps to suppress noise generation. We then use the total variation loss function for noise removal and the light curve adjustment module to mitigate the effect of noise removal on luminance.

4.1 Network Architecture

In this paper, the TV loss [36], the Light Curve Estimation module, and the Retinex Fusion (RF) module are applied to the KinD-Plus [14] network to design the KinD-LCE network. The total variation loss attaches a regularization term to the model, which constrains the noise in the output image in combination with the other loss functions. The Light Curve Estimation module adjusts the brightness of the illumination map: it maps the pixels to a continuous, differentiable numerical space and adjusts the light curve using parameters trained by backpropagation, which suppresses checkerboard artifacts and improves the image quality. The Retinex Fusion module fuses the reflection map and the illumination map in the illumination enhancement stage to compensate for lost features. The network structure of KinD-LCE is depicted in Fig. 4.1.

In Fig. 4.1, the Retinex image decomposition network structure is shown in the Retinex Decomposition and Image Fusion (RDIF) block on the left side of the figure. The model is divided into three parts. During training, the low-illumination image $\text{input}_{low}$ and the normal image $\text{input}_{normal}$ are fed into the Decomposition module $F_{decom}\left(\cdot\right)$ to obtain two sets of illumination maps $\left(I_{low},I_{normal}\right)$ and reflection maps $\left(R_{low},R_{normal}\right)$.

$$I_{low},R_{low}=F_{decom}\left(\text{input}_{low}\right) \tag{4-1}$$

$$I_{normal},R_{normal}=F_{decom}\left(\text{input}_{normal}\right) \tag{4-2}$$

$I_{low}$ and $R_{low}$ are passed to the Restoration module $F_{restore}\left(\cdot\right)$ to restore $R_{low}$. The result is used to generate the enhanced image, and the restoration also acts as noise reduction.

$$R_{out}=F_{restore}\left(I_{low},R_{low}\right) \tag{4-3}$$

$I_{low},I_{normal},R_{low},R_{normal}$ are passed to the RDIF module $F_{rdif}\left(\cdot\right)$ to restore $I_{low}$: the Retinex Fusion module fuses the four maps to obtain $x_{rf}$, which is fed to the Illumination module $F_{illum}\left(\cdot\right)$ to produce the enhanced illumination map $I_{out}$. $F_{illum}\left(\cdot\right)$ uses an image enhancement method based on Light Curve Estimation, which mitigates the noise and loss of edge detail caused by upsampling and downsampling.

$$x_{rf}=F_{rdif}\left(I_{low},R_{low},I_{normal},R_{normal}\right) \tag{4-4}$$

$$I_{out}=F_{illum}\left(x_{rf}\right) \tag{4-5}$$

Finally, based on Retinex theory, the illumination map $I_{out}$ and reflection map $R_{out}$ are fused at the pixel level to obtain the enhanced image, denoted as

$$P_{r}=R_{out}\otimes I_{out} \tag{4-6}$$

4.2 TV loss

The TV loss is a regularization term that measures the total variation in an image. It is well-known that the total variation of a noisy image is higher than that of a noise-free image [36]. By minimizing the TV loss, we can achieve the goal of denoising. This is accomplished by calculating the difference between neighbouring pixel values in the image. In this paper, we employ the TV loss to reduce the noise generated during network enhancement and the noise generated during the fusion of illumination and reflection maps. The TV loss is defined as follows.

$$L_{tv}=\sum_{i,j}\sqrt{\left(x_{i,j+1}-x_{i,j}\right)^{2}+\left(x_{i+1,j}-x_{i,j}\right)^{2}} \tag{4-7}$$

In low-light images, hidden noise is prevalent, and convolutional neural networks can introduce additional noise while upsampling and downsampling the images. To mitigate this noise, we use the total variation (TV) loss, as it measures the total change in the image while preserving its edges. Gaussian noise and Poisson noise are both prevalent in such images, and Poisson noise is related to illumination. Because the TV loss preserves edges, both Gaussian noise and Poisson noise can be eliminated through it. The total loss function used by the method in this paper in the Illumination module is expressed as

$$L=\beta L_{grad}+L_{mse}+L_{tv} \tag{4-8}$$

$$L_{grad}=\left\|\,\left|\nabla\tilde{I}\right|-\left|\nabla I_{p}\right|\,\right\|_{2}^{2} \tag{4-9}$$

$$L_{mse}=\left\|\tilde{I}-I_{p}\right\|_{2}^{2} \tag{4-10}$$

where $L_{grad}$ denotes the gradient loss, $L_{mse}$ denotes the mean square loss, $I_{p}$ denotes the normally illuminated image, $\tilde{I}$ denotes the adjusted image, which is the image in the process of mapping from the low-illumination image to the normal image, $\nabla$ denotes the gradient operator, and $\beta$ denotes the weight of $L_{grad}$. The parameter $\beta=0.01$ is set empirically in this paper.
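The way the three terms of Eq. (4-8) combine can be sketched as follows. Several details are assumptions made for this illustration and are not stated in the text: the gradient magnitude uses forward differences, the squared norms are plain sums rather than means, and the TV term is evaluated on the adjusted image. The function names are hypothetical.

```python
import math


def gradient_magnitude(img):
    """Forward-difference gradient magnitude (assumed discretization)."""
    rows, cols = len(img), len(img[0])
    return [
        [math.hypot(img[i][j + 1] - img[i][j], img[i + 1][j] - img[i][j])
         for j in range(cols - 1)]
        for i in range(rows - 1)
    ]


def squared_l2(a, b):
    """Sum of squared differences between two equally sized 2-D lists."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def illumination_loss(adjusted, target, beta=0.01):
    """beta * L_grad + L_mse + L_tv, following Eqs. (4-8) to (4-10)."""
    grad_adjusted = gradient_magnitude(adjusted)
    l_grad = squared_l2(grad_adjusted, gradient_magnitude(target))
    l_mse = squared_l2(adjusted, target)
    # Assumption: total variation is measured on the adjusted image.
    l_tv = sum(value for row in grad_adjusted for value in row)
    return beta * l_grad + l_mse + l_tv


target = [[0.2, 0.4, 0.6] for _ in range(3)]
too_dark = [[0.5 * value for value in row] for row in target]

loss_perfect = illumination_loss(target, target)
loss_dark = illumination_loss(too_dark, target)
```

Under these assumptions a perfect prediction still has a nonzero loss when the target contains gradients, because the TV term acts as a regularizer and is not a fidelity term.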

4.3 Retinex Fusion Module

The Retinex Fusion module processes low-illumination images in two parts. The local illumination coefficient $t_{local}$ is multiplied with the unit matrix $E$, which ignores spatial features and strengthens the effect of the illumination coefficient. The global illumination coefficient $t_{global}$ is multiplied with $R_{low}$ to focus on global detail features. Together, the two parts make the neural network attend to more image information while still considering local features of the image. The module thus adjusts the illumination attributes of the image so that the edge illumination information of $R_{low}$ and $I_{low}$ is complemented when they are combined, and it restores unsmooth areas of the image to preserve edge detail.

First, the global illumination matrix $S_{global}$ is computed using $R_{low}$. The two illumination maps $I_{low}$ and $I_{normal}$ are compared and then averaged to obtain the original illumination coefficient $t_{global}$, which is then multiplied by $R_{low}$ to recover the edge detail missing from $I_{low}$ because of incomplete decomposition in the Decomposition stage. $S_{global}$ is computed as follows.

$$S_{global}=R_{low}\cdot t_{global} \tag{4-11}$$

$$t_{global}=\frac{1}{MN}\sum_{i,j}\frac{I_{low}\left(i,j\right)}{I_{normal}\left(i,j\right)} \tag{4-12}$$

where $N$ and $M$ denote the height and width of $I_{low}$.

Second, the local illumination matrix $S_{local}$ is calculated using Eq. (4-13). The illumination map of the cropped low-light image $\left(I_{lowcrop}\right)$ and the illumination map of the normal image $\left(I_{normalcrop}\right)$ are compared and then averaged to obtain the local illumination coefficient $t_{local}$, which is multiplied with the unit matrix $E$ of the same size as $I_{lowcrop}$. This copes with the detail reduction problem of $I_{low}$ under different illumination conditions and improves robustness. $S_{local}$ is calculated as follows.

$$S_{local}=E\cdot t_{local} \tag{4-13}$$

$$t_{local}=\frac{1}{WH}\sum_{i,j}\frac{I_{lowcrop}\left(i,j\right)}{I_{normalcrop}\left(i,j\right)} \tag{4-14}$$

where $H$ and $W$ denote the height and width of $I_{lowcrop}$. After the above calculation, $I_{lowcrop}$ and $S_{local}$, and $I_{low}$ and $S_{global}$, are concatenated to obtain $I_{flocal}$ and $I_{fglobal}$, respectively. Finally, $I_{flocal}$ and $I_{fglobal}$ are concatenated to obtain $x_{rf}$, which is input to the light adjustment network module.

$$I_{flocal}=\text{Concat}\left(S_{local},S_{global}\right) \tag{4-15}$$

$$I_{fglobal}=\text{Concat}\left(S_{global},I_{low}\right) \tag{4-16}$$

$$x_{rf}=\text{Concat}\left(I_{flocal},I_{fglobal}\right) \tag{4-17}$$

The information from the reflectance map is introduced in the Retinex image fusion module to compensate for the loss of illumination map edge information because of incomplete decomposition of the decomposition network, allowing the unsmooth areas in the image to be correctly adjusted and improving the quality of the output image. The method is shown in Fig. 4.2.
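The coefficient computations in Eqs. (4-11) to (4-14) can be sketched as below; the concatenation steps are omitted. This is an illustration under several assumptions that the text does not confirm: $E$ is treated as an all-ones matrix, $R_{low}$ is shown as a single channel, a small `eps` guards against division by zero, and the crop is given as a hypothetical `(top, left, height, width)` box.

```python
def mean_ratio(i_low, i_normal, eps=1e-6):
    """Average of I_low / I_normal over all pixels (Eqs. 4-12 and 4-14)."""
    ratios = [
        low / (normal + eps)  # eps is an added safeguard, not part of the equations
        for row_low, row_normal in zip(i_low, i_normal)
        for low, normal in zip(row_low, row_normal)
    ]
    return sum(ratios) / len(ratios)


def crop(img, top, left, height, width):
    """Rectangular crop of a 2-D list."""
    return [row[left:left + width] for row in img[top:top + height]]


def fusion_matrices(i_low, i_normal, r_low, box):
    """Return (S_global, S_local) for one image pair."""
    t_global = mean_ratio(i_low, i_normal)
    s_global = [[value * t_global for value in row] for row in r_low]  # Eq. (4-11)

    t_local = mean_ratio(crop(i_low, *box), crop(i_normal, *box))
    _, _, height, width = box
    # Eq. (4-13), assuming E is an all-ones matrix the size of the crop.
    s_local = [[t_local] * width for _ in range(height)]
    return s_global, s_local


i_low = [[0.1, 0.2], [0.1, 0.2]]
i_normal = [[0.5, 0.5], [0.5, 0.5]]
r_low = [[1.0, 0.5], [0.5, 1.0]]
s_global, s_local = fusion_matrices(i_low, i_normal, r_low, box=(0, 0, 2, 1))
```

In this example the global coefficient averages the ratios 0.2 and 0.4 to about 0.3, while the crop covering only the darker left column yields a local coefficient of about 0.2, so the two matrices carry different brightness information.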

4.4 Light Curve Estimation with Illumination Map

In Retinex theory, the reflectance map contains information such as the feature details and edges of the original image, and the illumination map contains information such as the contrast and brightness of the image. Further optimization of the decomposed reflection and illumination maps can therefore enhance the image. Inspired by image editing software, we introduce the light curve to make the illumination curve of the illumination map smoother and more natural, with each pixel enhanced adaptively. Zero-DCE [1] calculates the light curve for the original image, so its adjustment affects the colors of the original image. By contrast, our method uses the Retinex theory to separate the illumination and detail features of the image and applies the light curve only to the illumination, which reduces color shift.

The light curve adjustment module differs from the conventional Encoder-Decoder network, which downsamples and then upsamples the image. Upsampling easily produces checkerboard artifacts, which cause specific parts of the image to appear darker than others. These artifacts occur when the ratio of the convolution kernel size to the convolution stride is not an integer. In our work, the light curve adjustment module is used as the illumination map enhancement network: it enhances the illumination map through the light curve, which makes the image brightness smoother and eliminates the effect of checkerboard artifacts. A schematic of the light curve estimation method is shown in Fig. 4.3.

At the beginning of the Illumination phase, the image processed by the RDIF module is used as input. It passes through four $3\times 3$ convolutions and concat operations and is finally decomposed into 8 light estimation coefficient matrices $\alpha_{i}\ \left(i=1,2,\cdots,8\right)$, which partially maps the light estimation coefficient matrices into a continuous space.

$\alpha_{i}$ is a trainable parameter with range $\left[-1,1\right]$. The illumination map processed by the RDIF module is input to the illumination map enhancement module (Illumination). After four $3\times 3$ convolutions, the feature map obtained from each convolution layer is mapped separately, and the final feature map is $x_{7}$, which represents the illumination estimation ratio for each pixel of the image.

$${x}_{1}={\text{Conv}}_{3\times 3}\left({x}_{rf}\right), {x}_{2}={\text{Conv}}_{3\times 3}\left({x}_{1}\right)$$

$${x}_{3}={\text{Conv}}_{3\times 3}\left({x}_{2}\right), {x}_{4}={\text{Conv}}_{3\times 3}\left({x}_{3}\right)$$

$${x}_{5}={\text{Conv}}_{3\times 3}\left(\text{Concat}\left({x}_{3},{x}_{4}\right)\right),{x}_{6}={\text{Conv}}_{3\times 3}\left(\text{Concat}\left({x}_{2},{x}_{5}\right)\right)$$

$${x}_{7}={\text{Conv}}_{3\times 3}\left(\text{Concat}\left({x}_{1},{x}_{6}\right)\right)$$

Depending on the adjustment scale factor, ${x}_{7}$ is split into 8 coefficient matrices, each of which is fed into the Light Curve Estimation module to adjust the light curve, using the formulas

$${{\alpha }}_{1},\dots ,{{\alpha }}_{8}=\text{split}\left({x}_{7}\right) \tag{4-18}$$

$${I}_{i}={x}_{rf}\left(j\right)+{{\alpha }}_{i}{x}_{rf}\left(j\right)\left(1-{x}_{rf}\left(j\right)\right) \tag{4-19}$$

$${I}_{out}=\text{sigmoid}\left(\text{Concat}\left({I}_{1},\dots ,{I}_{8}\right)\right) \tag{4-20}$$

where ${x}_{rf}\left(j\right)$ represents the $j$-th pixel of ${x}_{rf}$, ${I}_{i}$ represents the image adjusted by the $i$-th coefficient ${{\alpha }}_{i}$, and ${I}_{out}$ represents the adjusted image.
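The split, curve adjustment, and sigmoid steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the illumination map is represented as a flat list of pixel values in $[0,1]$, the 8 coefficient maps are passed in directly rather than produced by the convolutions, and the names `light_curve`, `sigmoid`, and `adjust` are hypothetical.

```python
import math
from typing import List


def light_curve(x: float, alpha: float) -> float:
    """Quadratic curve for one pixel: x + alpha * x * (1 - x)."""
    return x + alpha * x * (1.0 - x)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def adjust(x_rf: List[float], alphas: List[List[float]]) -> List[List[float]]:
    """Apply every coefficient map alpha_i to x_rf pixel by pixel, then
    squash the stacked results with a sigmoid.

    x_rf   : flattened illumination map (assumed layout)
    alphas : one coefficient map per curve, each as long as x_rf
    The outer list of the return value plays the role of Concat.
    """
    adjusted = [
        [light_curve(x, a) for x, a in zip(x_rf, alpha_i)]
        for alpha_i in alphas
    ]
    return [[sigmoid(v) for v in i_map] for i_map in adjusted]


if __name__ == "__main__":
    x_rf = [0.0, 0.25, 1.0]
    alphas = [[1.0, 1.0, 1.0] for _ in range(8)]  # brightest setting
    print(adjust(x_rf, alphas)[0])
```

For any $\alpha \in [-1,1]$ and pixel value in $[0,1]$, the curve keeps the endpoints 0 and 1 fixed and stays inside $[0,1]$: $\alpha = 1$ gives $1-(1-x)^2$ (brightening) and $\alpha = -1$ gives $x^2$ (darkening).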

5.1 Implementation Details

Datasets & Settings. We conduct experiments on two datasets: the LOw-Light dataset [4] and the GladNet dataset [10]. The GladNet dataset, proposed by Wang et al. [10], contains 5000 pairs of 8-bit RGB low-light/normal-light images, which we use as the training data. We split it into 3500 pairs for training, 1000 pairs for validation, and 500 pairs for testing. The LOw-Light dataset [4], which contains 500 low-light/normal-light image pairs, is used as a comparison test set to validate the enhancement effect. All images in both datasets are captured from real scenes. Image sizes in the GladNet dataset vary, while all images in the LOw-Light dataset are 600 × 400 × 3. For the downstream task, we perform semantic segmentation on the ExDark [26] dataset, which is commonly used for low-light scenes. This dataset includes night-scene images taken from 13 regions, covering more than 10 categories of scenes such as buildings, streets, and cars.

Metrics. Peak signal-to-noise ratio (PSNR) [18], structural similarity index (SSIM) [17], mean absolute error (MAE), and mean squared error (MSE) are widely used evaluation metrics in image processing and computer vision tasks. PSNR measures the peak error between two images and is often used to evaluate image reconstruction quality; higher values indicate better image quality. SSIM evaluates the similarity between two images by comparing their luminance, contrast, and structure; higher values also indicate better quality. MAE and MSE measure the difference between two images by computing the absolute and squared differences between pixel values, respectively; for these two metrics, lower values indicate better performance. All four metrics are commonly used to evaluate the accuracy of image restoration and enhancement methods.
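As a minimal sketch of the pixel-wise metrics, the functions below compute MSE, MAE, and PSNR for two images given as flat sequences of pixel values. The flat layout and the peak value of 255 (8-bit images) are assumptions made for this illustration; SSIM is omitted because it requires windowed local statistics.

```python
import math
from typing import Sequence


def mse(ref: Sequence[float], out: Sequence[float]) -> float:
    """Mean squared error between two equally sized images."""
    return sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)


def mae(ref: Sequence[float], out: Sequence[float]) -> float:
    """Mean absolute error between two equally sized images."""
    return sum(abs(r - o) for r, o in zip(ref, out)) / len(ref)


def psnr(ref: Sequence[float], out: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, out)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


if __name__ == "__main__":
    reference = [0, 0, 255, 255]
    enhanced = [3, 4, 255, 255]
    print(mse(reference, enhanced), mae(reference, enhanced), psnr(reference, enhanced))
```

Because PSNR is a decreasing function of MSE, a method can only rank differently under the two metrics if they are averaged over images in different ways.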

Experimental environment. All experimental data in this section were generated on an Nvidia 2080 Ti GPU, except for the data generated on an Nvidia NX for the model optimization deployment test. Some values may differ from those reported in the original papers.

5.2 Ablation Study

This section presents our ablation experiments on the GladNet and LOw-Light datasets, which validate the proposed improvements. We evaluated the model on the validation sets of both datasets.

Ablation for Light Curve Estimation. In studying the KinD-Plus [14] network, we found that its illumination map enhancement module did not consider color and light balance, so we redesigned the illumination map enhancement network with light curve estimation, which yields our proposed KinD-LCE network. For the same image input, our light curve estimation module improved PSNR from 16.2156 to 18.9548, while SSIM changed slightly from 0.8173 to 0.8112. The visual results are shown in Fig. 5.1, and the metrics are listed in Table 5.1.

Table 5.1

PSNR and SSIM metrics with and without Light Curve Estimation
Algorithm	PSNR↑	SSIM↑
KinD-Plus w/o LCE	16.2156	0.8173
KinD-Plus w/ LCE	18.9548	0.8112

Ablation for Retinex Fusion. We fused the reflectance map with the illumination map and adjusted the illumination enhancement coefficients using the pixel values of the reflectance map. This fusion compensates for the loss of edge details and other problems introduced by the original image decomposition. The image fusion module improved the PSNR and SSIM metrics over the original KinD-Plus [14] network by 16.66% and 3.52%, respectively. The visual results are shown in Fig. 5.2, and the metrics are listed in Table 5.2.

Table 5.2

PSNR and SSIM metrics with and without Retinex Fusion
Algorithm	PSNR↑	SSIM↑
KinD-Plus w/o Fusion	16.2156	0.8173
KinD-Plus w/ Fusion	19.4154	0.8433

Ablation for TV loss. We add a TV loss to the Illumination module. It computes the total variation of the illumination map in order to denoise the map and reduce the noise introduced by the decomposition network. The metrics are listed in Table 5.3.

Table 5.3

PSNR and SSIM metrics with and without TV loss
Algorithm	PSNR↑	SSIM↑
KinD-Plus w/o TV loss	16.2156	0.8173
KinD-Plus w/ TV loss	18.5241	0.7822
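
To make the total variation term in this ablation concrete, the sketch below computes a common anisotropic form: the mean absolute difference between horizontally and vertically neighbouring pixels of a single-channel map. This is an assumed, generic formulation rather than the exact loss used in the paper, and `tv_loss` is a hypothetical name.

```python
from typing import List


def tv_loss(img: List[List[float]]) -> float:
    """Anisotropic total variation of a 2-D map, averaged over all
    horizontal and vertical neighbour pairs."""
    h, w = len(img), len(img[0])
    # differences between each pixel and its right-hand neighbour
    dh = sum(abs(img[y][x + 1] - img[y][x]) for y in range(h) for x in range(w - 1))
    # differences between each pixel and the neighbour below it
    dv = sum(abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w))
    pairs = h * (w - 1) + (h - 1) * w
    return (dh + dv) / pairs if pairs else 0.0


if __name__ == "__main__":
    smooth = [[0.5, 0.5], [0.5, 0.5]]
    noisy = [[0.0, 1.0], [1.0, 0.0]]
    print(tv_loss(smooth), tv_loss(noisy))
```

A perfectly flat map has zero total variation, while pixel-level noise raises it, so minimizing this term pushes the illumination map toward piecewise-smooth solutions.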

5.3 Comparison of KinD-LCE and Other Methods

After comparing the different module combinations experimentally, we found that adding all three modules proposed in this paper yields the highest PSNR (19.7216), while its SSIM (0.8213) is second only to that of the Fusion-only configuration (0.8433). The experimental results are shown in Table 5.4.

Table 5.4

PSNR and SSIM metrics for different combinations of modules
Light Curve Estimation	Fusion	TV loss	PSNR↑	SSIM↑
			16.2156	0.8173
✓			18.9548	0.8112
	✓		19.4154	0.8433
		✓	18.5241	0.7822
✓	✓		18.2518	0.7912
	✓	✓	17.4779	0.8113
✓	✓	✓	19.7216	0.8213

The method described in this paper was compared with the best methods available. The enhancement effect was evaluated using the PSNR and SSIM metrics on the 15 validation images of the Low-Light dataset. The images are shown in Fig. 4.3, and the metrics are shown in Table 5.5. According to Table 5.5, our method achieves the highest PSNR and SSIM among the compared methods, which provides evidence for its enhancement effect on low-light tasks, although its MAE and MSE are not the best. This evidence is consistent with our observations, but the robustness of our method still needs to be improved: although there has been some optimization, some regions in some images still show color and detail variations.

Table 5.5

Comparison of metrics with other methods, with "↑" indicating larger is better and "↓" indicating smaller is better.
Algorithm	PSNR↑	SSIM↑	MAE↓	MSE↓
KinD-Plus [14]	16.2156	0.8023	101.3138	103.6879
MBLLEN [33]	17.8583	0.7247	97.1637	77.0107
RetinexNet [4]	14.9774	0.5392	120.9511	102.1104
EnlightenGAN [7]	18.9248	0.8112	147.7995	96.0031
Zero-DCE [1]	14.7971	0.6640	68.2873	108.8530
Ours	19.7216	0.8213	101.1897	108.1573

5.4 Comparison in Downstream Task

Applying downstream tasks (e.g., object detection and semantic segmentation) directly to conventional dark images without any image processing significantly reduces their detection effectiveness. Therefore, each dark image was first processed with our method and then passed to the downstream task, regardless of whether an object was detected in the image.

To highlight the effectiveness of the algorithm in this paper, we experimentally compared images enhanced by different low-light enhancement networks to verify how effective each network is for downstream tasks. The downstream validation experiment used the HRNet [25] network for semantic segmentation of enhanced low-light images in driving scenes. We used the ExDark [26] dataset for this experiment with a model trained on the Cityscapes dataset [27], and the trained model was validated on driving scenes. The experimental comparison plots are shown in Fig. 5.4.

Based on the implementation results (Fig. 5.4), our algorithm performed well in color restoration and edge information restoration, which makes it more effective for practical downstream applications. However, in some extreme cases, heavy noise and large brightness differences in the original image cause our algorithm to produce color distortion. We believe that optimizing the fusion process could solve these problems more effectively.

5.5 Test in Model optimization deployment

We tested a model optimization deployment of our method to verify its advantages when deployed on edge devices. We chose the Nvidia NX as the test device and the Low-Light dataset as the test data. We deployed and ran our code on the device, obtaining the following results:

Table 5.6

The performance of our method before and after model optimization
parameter	Origin	Model optimization
Average frame count	3.263	0.042

After deploying and optimizing the method described in this paper on edge devices, its average frame rate performance has significantly improved, indicating its potential in edge computing tasks.

6 Conclusions

This paper proposes the KinD-LCE network, which integrates light curve estimation with the fusion of illumination and reflectance maps for low-light enhancement tasks. Our approach improves upon the KinD-Plus [14] method by compensating for the introduced noise and achieving effective image illumination enhancement. Specifically, we adjust the illumination map using light curve estimation and fuse the features of the reflectance map into the illumination map to improve the brightness of the image while reducing noise. We demonstrated the effectiveness of our method through ablation experiments and experiments on downstream tasks. However, our method has limitations when evaluated with metrics such as MAE and MSE, as image quality can still be compromised during processing. Retinex theory requires that the image be completely separated into illumination and reflection components, but the decomposition network cannot achieve complete decomposition. Although we have implemented optimization measures in this regard, this issue still occurs in subjective evaluation.

7.1 Ethical Approval

Not applicable.

7.2 Competing interests

No potential conflict of interest was reported by the authors.

7.3 Authors’ Contributions

Conceptualization, Xiaochun Lei and Junlin Xie; Data Curation, Junlin Xie and Weiliang Mai; Formal Analysis, Xiaochun Lei and Junlin Xie; Methodology, Junlin Xie and Xiaochun Lei; Supervision, Zetao Jiang and Zhaoting Gong; Validation, Ziqi Shan and Linjun Lu; Visualization, Linjun Lu, Chang Lu, Zhaoting Gong, and Weiliang Mai; Writing (original draft preparation), Xiaochun Lei and Junlin Xie; Writing (review and editing), Zetao Jiang, Zhaoting Gong, and Weiliang Mai. All authors have read and agreed to the published version of the manuscript.

7.4 Funding

National Natural Science Foundation of China (61876049, 62172118)

Nature Science key Foundation of Guangxi (2021GXNSFDA196002)

7.5 Availability of data and materials

The data that support the findings of this study are available from the corresponding author, Jiang, upon reasonable request.

References

[1] Guo, Chunle, et al. "Zero-reference deep curve estimation for low-light image enhancement." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[2] Zhang, Yonghua, Jiawan Zhang, and Xiaojie Guo. "Kindling the darkness: A practical low-light image enhancer." Proceedings of the 27th ACM international conference on multimedia. 2019.
[3] Land, Edwin H., and John J. McCann. "Lightness and retinex theory." JOSA 61.1 (1971).
[4] Wei, Chen, et al. "Deep retinex decomposition for low-light enhancement." arXiv preprint arXiv:1808.04560 (2018).
[5] Gharbi, Michaël, et al. "Deep bilateral learning for real-time image enhancement." ACM Transactions on Graphics (TOG) 36.4 (2017).
[6] Li, Chongyi, et al. "Low-light image and video enhancement using deep learning: A survey." IEEE Transactions on Pattern Analysis & Machine Intelligence 01 (2021).
[7] Jiang, Yifan, et al. "EnlightenGAN: Deep light enhancement without paired supervision." IEEE Transactions on Image Processing 30 (2021): 2340–2349.
[8] Guo, Xiaojie, Yu Li, and Haibin Ling. "LIME: Low-light image enhancement via illumination map estimation." IEEE Transactions on image processing 26.2 (2016).
[9] Lore, Kin Gwn, Adedotun Akintayo, and Soumik Sarkar. "LLNet: A deep autoencoder approach to natural low-light image enhancement." Pattern Recognition 61 (2017).
[10] Wang, Wenjing, et al. "Gladnet: Low-light enhancement network with global awareness." 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018). IEEE, 2018.
[11] Zhang, Rongkai, et al. "Rellie: Deep reinforcement learning for customized low-light image enhancement." Proceedings of the 29th ACM International Conference on Multimedia. 2021.
[12] Ying, Zhenqiang, Ge Li, and Wen Gao. "A bio-inspired multi-exposure fusion framework for low-light image enhancement." arXiv preprint arXiv:1711.00591 (2017).
[13] Ying, Zhenqiang, et al. "A new image contrast enhancement algorithm using exposure fusion framework." International conference on computer analysis of images and patterns. Springer, Cham, 2017.
[14] Zhang, Yonghua, et al. "Beyond brightening low-light images." International Journal of Computer Vision 129.4 (2021).
[15] Pizer, Stephen M., et al. "Adaptive histogram equalization and its variations." Computer vision, graphics, and image processing 39.3 (1987).
[16] Guo, Lanqing, et al. "Enhancing Low-Light Images in Real World via Cross-Image Disentanglement." arXiv preprint arXiv:2201.03145 (2022).
[17] Wang, Zhou, et al. "Image quality assessment: from error visibility to structural similarity." IEEE transactions on image processing 13.4 (2004): 600–612.
[18] Huynh-Thu, Quan, and Mohammed Ghanbari. "Scope of validity of PSNR in image/video quality assessment." Electronics letters 44.13 (2008): 800–801.
[19] Liu, Risheng, et al. "Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[20] Liu, Yu, et al. "Image fusion with convolutional sparse representation." IEEE signal processing letters 23.12 (2016).
[21] Zhou, Jingchun, Dehuan Zhang, and Weishi Zhang. "Multiscale fusion method for the enhancement of low-light underwater images." Mathematical Problems in Engineering 2020 (2020): 1–15.
[22] Stark, J. Alex. "Adaptive image contrast enhancement using generalizations of histogram equalization." IEEE Transactions on image processing 9.5 (2000).
[23] M. Yan, C. A. Chan, W. Li, C. L. I, S. Bian, A. F. Gygax, C. Leckie, K. Hinton, E. Wong, and A. Nirmalathas, "Network Energy Consumption Assessment of Conventional Mobile Services and Over-the-Top Instant Messaging Applications," IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3168–3180, 2016.
[24] Yang, Jingyu, et al. "Low-light image enhancement based on Retinex decomposition and adaptive gamma correction." IET Image Processing 15.5 (2021): 1189–1202.
[25] Yuan, Yuhui, Xilin Chen, and Jingdong Wang. "Object-contextual representations for semantic segmentation." European conference on computer vision. Springer, Cham, 2020.
[26] Loh, Yuen Peng, and Chee Seng Chan. "Getting to know low-light images with the exclusively dark dataset." Computer Vision and Image Understanding 178 (2019): 30–42.
[27] Cordts, Marius, et al. "The cityscapes dataset for semantic urban scene understanding." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[28] Chongyi Li, et al. "Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation." IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021.
[29] Jiang, Yifan, et al. "Enlightengan: Deep light enhancement without paired supervision." IEEE Transactions on Image Processing 30 (2021): 2340–2349.
[30] Xie, Junyi, et al. "Semantically-guided low-light image enhancement." Pattern Recognition Letters 138 (2020): 308–314.
[31] Liu, Risheng, et al. "Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[32] Jobson, Daniel J., Zia-ur Rahman, and Glenn A. Woodell. "A multiscale retinex for bridging the gap between color images and the human observation of scenes." IEEE Transactions on Image processing 6.7 (1997): 965–976.
[33] Lv, Feifan, et al. "MBLLEN: Low-Light Image/Video Enhancement Using CNNs." BMVC. Vol. 220. No. 1. 2018.
[34] M. Yan, W. Li, C. A. Chan, S. Bian, C. I and A. F. Gygax, "PECS: Towards Personalized Edge Caching for Future Service-Centric Networks," China Communications, vol. 16, no. 8, pp. 93–106, 2019.
[35] M. Yan, X. Lou, C. A. Chan, Y. Wang, and W. Jiang, "A semantic and emotion-based latent variable generation model for a dialogue system," CAAI Trans. Intell. Technol. 2023, pp. 1–12, 2023. https://doi.org/10.1049/cit2.12153
[36] Rudin, Leonid I., Stanley Osher, and Emad Fatemi. "Nonlinear total variation based noise removal algorithms." Physica D: nonlinear phenomena 60.1-4 (1992).



Editorial decision: Major revision
27 Sep, 2023
Reviews received at journal
26 Sep, 2023
Reviewers agreed at journal
24 Sep, 2023
Reviewers invited by journal
21 Sep, 2023
Submission checks completed at journal
19 Sep, 2023
Editor assigned by journal
19 Sep, 2023
First submitted to journal
09 Sep, 2023
