Feature-Level Fusion of Infrared Image and Visible Light Image Based on Two-Dimensional Discrete Wavelet Transform and High-Pass Filtering

Visible light images and infrared images complement each other in visual information, so effective fusion of the two can provide both better visual effects and more visual information. In this paper, the discrete wavelet transform is used to obtain the basic information of the image, a high-pass filter is used to obtain its characteristic information, and the two kinds of information are effectively fused. A two-level discrete wavelet transform is performed on the image, which extracts features effectively without distorting the image; image quantization is then used to encode, compress and decode the image, reducing the amount of image data, increasing the computing speed, and allowing fusion operations to be performed effectively on color images. Representative image evaluation metrics are used to evaluate the fused image, and the effectiveness of the proposed method is discussed. Experiments show that this method is more effective than the compared methods, and the effect is most obvious in the fusion of color images with infrared images.


Introduction
Any object above absolute zero (-273.15 ℃ / 0 K) in nature is constantly emitting infrared rays of different intensities and wavelengths.
Because of this characteristic, infrared sensors can be used for image acquisition. Although infrared sensors work in all weather conditions and adapt well to the environment, their applications are limited by weak perception of scene brightness and low image resolution. Visible light sensors acquire images with high resolution and strong perception of scene brightness, which effectively compensates for the shortcomings of infrared sensors. Therefore, the subject information and details of a target can be captured well by fusing visible light and infrared sensor images. Fusion methods for visible images (VI) and infrared images (IR) fall into two categories: region-based fusion and pixel-based fusion. Region-based methods work through image segmentation or by extracting salient regions and regions of interest; at present the most common such approach uses neural networks. Pixel-based methods are usually easy to implement and generally operate in a transform domain, such as an image pyramid, wavelet transform or contourlet transform [1][2][3][4][5]. The key of region-based fusion is the locally salient region information of the IR, so detailed image information is easily ignored; pixel-based fusion focuses on extracting image details, so the salient region information is expressed less clearly. Based on the above, this paper proposes a pixel-level fusion method that uses the wavelet transform to express the basic information of the IR and VI, and high-pass filtering to extract their details. Since the VI carries more visual data, image quantization, compression and decoding are used to remove data redundancy and improve the computing speed.
A multi-level decomposition based on latent low-rank representation (MDLatLRR) [6,7] is used to decompose the source images, extracting their base information and salient details. It proceeds in two steps: first, the image details are reconstructed through an adaptive fusion strategy and a reshaping operator, while the base information is fused through an averaging strategy; finally, the fused image is obtained by combining the fused details and base information.

Image Fusion based on MDLatLRR Fusion Framework
A fusion framework based on multi-level image decomposition (MDLatLRR) is used to fuse infrared and visible images. First, the projection matrix L is learned by LatLRR [7]; the matrix is then used to extract the detail and base information of the input images at several representation levels. MDLatLRR extracts multi-level salient features of the image, and finally the fused image is reconstructed from the detail and base information by an adaptive fusion strategy.
The MDLatLRR framework is generic: it provides an effective decomposition method for extracting multi-level features from any number of input images, and it can also be applied to other image processing fields with different projection matrices.

Image Fusion based on DWT [9]
Compared with the Fourier transform, the wavelet transform (WT) not only retains the local analysis idea of the short-time Fourier transform, but also overcomes the disadvantage that the size of the local window does not change with frequency. The traditional WT-based fusion algorithm exploits the fact that the human eye is sensitive to changes in the local contrast of an image or signal: regions of interest are selected in the source images according to certain fusion rules, the most prominent features in those regions are selected, and these features are retained in the final composite image. For the fusion of a grayscale IR and a VI, the visible details of both are effectively fused by the WT.
However, the subsampling operation of the traditional WT divides the picture into different levels, which can cause visual distortion of the image. The traditional WT is also not shift-invariant, so features cannot be expressed accurately. The Stationary Wavelet Transform (SWT) is used to address the shift invariance of the WT, while the Discrete Cosine Transform (DCT) and local spatial frequency (LSF) are used to express part of the image features and mitigate image distortion, in order to achieve better fusion results.

Fusion of visible light image and grayscale IR
The common WT processes color images in grayscale in order to reduce the dimensionality of the image information, reduce the matrix dimension for recognition, and improve the operation speed. In this paper, based on the principle of image compression coding and decoding, intra-frame data compression is used to reduce the amount of VI data, compress a large quantity of data, and improve the operation rate so as to shorten the image fusion time. To preserve the details and feature information of the image more finely, this paper fuses the image by using the Discrete Wavelet Transform (DWT) and High-Pass Filtering (HPF) at the same time.

2-D DWT
The two-dimensional discrete WT (2-D DWT) is developed from the continuous WT: to compute it on a computer, the continuous function is first sampled. The 1-D DWT of the sampled function f(x) of length M is given by formula (1):

W_φ(j_0, k) = (1/√M) Σ_{x=0}^{M-1} f(x) φ_{j_0,k}(x),   W_ψ(j, k) = (1/√M) Σ_{x=0}^{M-1} f(x) ψ_{j,k}(x),  j ≥ j_0   (1)

To extend the 1-D DWT to two dimensions, a 2-D scaling function and three 2-D wavelet functions are defined separably:

φ(x,y) = φ(x)φ(y)
ψ^H(x,y) = ψ(x)φ(y)
ψ^V(x,y) = φ(x)ψ(y)
ψ^D(x,y) = ψ(x)ψ(y)

The three 2-D wavelet functions respond to intensity and grayscale variations in three directions: ψ^H varies horizontally, ψ^V vertically, and ψ^D diagonally.
Starting from scale j_0, f(x,y) is recovered by the inverse 2-D DWT:

f(x,y) = (1/√(MN)) Σ_m Σ_n W_φ(j_0, m, n) φ_{j_0,m,n}(x,y) + (1/√(MN)) Σ_{i=H,V,D} Σ_{j=j_0}^∞ Σ_m Σ_n W_ψ^i(j, m, n) ψ^i_{j,m,n}(x,y)

As in the 1-D case, using separable 2-D scaling and wavelet functions, the rows of the image are first decomposed by the 1-D DWT, and then the columns are decomposed by the 1-D DWT [12]. The decomposition yields the low-frequency component LL and the high-frequency components HL, LH and HH of the original image in the horizontal, vertical and diagonal directions. In this paper the image is decomposed by a two-level WT, that is, the low-frequency component LL is further transformed by the 2-D DWT, which effectively quantifies the image to meet the requirements of feature extraction. The specific decomposition process is shown in Fig. 1.
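As an illustration, a one-level 2-D DWT with the Haar wavelet can be written in a few lines; the function name and the choice of the Haar basis are assumptions for illustration, not the paper's exact implementation. Applying the same transform again to the returned LL subband gives a two-level decomposition.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: filter the rows, then the columns."""
    def split(a, axis):
        a = np.moveaxis(a.astype(float), axis, 0)
        lo = (a[0::2] + a[1::2]) / np.sqrt(2)   # low-pass on adjacent pairs
        hi = (a[0::2] - a[1::2]) / np.sqrt(2)   # high-pass on adjacent pairs
        return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

    L, H = split(img, 1)        # horizontal filtering
    LL, LH = split(L, 0)        # vertical filtering of the low band
    HL, HH = split(H, 0)        # vertical filtering of the high band
    return LL, LH, HL, HH
```

A constant image produces zero detail subbands, and calling `haar_dwt2` again on LL yields the two-level decomposition used in this paper.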

Image coding and compression
Redundant data are often produced in the process of image processing. Generally, to improve the transmission efficiency and the operation speed, the redundant data are deleted during processing; deleting them effectively reduces the image data without destroying the effective information of the image [13]. This paper mainly removes the visual redundancy and spatial redundancy of the color image without destroying its detail and feature information. Image coding gradually reduces the amount of data, improves the transmission rate and accelerates computation. In practical applications the data must also be decoded and restored to the form of a 2-D image. To ensure the fidelity of the decoded image relative to the original, i.e. that no information is lost through compression and decoding, an objective fidelity criterion is needed.
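A minimal sketch of the quantize/encode/decode round trip described above: a uniform scalar quantizer with a lookup-table decoder. The function names and the choice of 16 levels are assumptions for illustration, not the paper's exact coder.

```python
import numpy as np

def quantize(img, levels=16):
    """Uniform scalar quantization of an 8-bit image to `levels` levels."""
    step = 256 / levels
    idx = np.clip((img / step).astype(np.int32), 0, levels - 1)   # code indices
    table = (np.arange(levels) + 0.5) * step    # reconstruction lookup table
    return idx, table

def dequantize(idx, table):
    """Decode by looking up each code index in the reconstruction table."""
    return table[idx]
```

Compression comes from storing the small index array at log2(levels) bits per pixel instead of 8; the reconstruction error is bounded by the quantization step.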

Root Mean Square Error
Root Mean Square Error (RMSE) is a commonly used fidelity criterion. For an image of size M × N, the RMSE between the original image f(x,y) and the compressed-and-decoded image f̂(x,y) is:

RMSE = [ (1/(MN)) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} (f̂(x,y) - f(x,y))^2 ]^{1/2}
Root Mean Square Signal-to-Noise Ratio

Because noise may be introduced during image compression and decoding, write f̂(x,y) = f(x,y) + n(x,y), where n(x,y) is the noise signal. The root mean square signal-to-noise ratio of the output image of size M × N is:

SNR_rms = [ Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f̂(x,y)^2 / Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} (f̂(x,y) - f(x,y))^2 ]^{1/2}
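The two fidelity criteria above translate directly to code; a sketch (function names are illustrative):

```python
import numpy as np

def rmse(f, f_hat):
    """Root mean square error between original f and decoded f_hat."""
    f, f_hat = f.astype(float), f_hat.astype(float)
    return np.sqrt(np.mean((f_hat - f) ** 2))

def snr_rms(f, f_hat):
    """Root mean square signal-to-noise ratio of the output image."""
    f, f_hat = f.astype(float), f_hat.astype(float)
    return np.sqrt(np.sum(f_hat ** 2) / np.sum((f_hat - f) ** 2))
```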

High-Pass Filtering method
The HPF method uses a high-pass filter to suppress the low-frequency part of the image; an ideal high-pass filter removes low-frequency image signals by setting a cutoff threshold. In this paper, a high-pass filter extracts the high-frequency information from the quantized images, the high-frequency information of the IR and the VI is superimposed, and the low-frequency information of the quantized images is replaced by its mean value, reducing the influence of the HPF result on the details of the final fused image.
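An ideal frequency-domain high-pass filter of the kind described here can be sketched as follows; the cutoff radius is a free parameter and the function name is an assumption, not the paper's exact filter.

```python
import numpy as np

def ideal_highpass(img, cutoff):
    """Zero out all frequencies within `cutoff` of the DC term."""
    F = np.fft.fftshift(np.fft.fft2(img))          # move DC to the center
    rows, cols = img.shape
    y, x = np.ogrid[:rows, :cols]
    dist = np.sqrt((y - rows // 2) ** 2 + (x - cols // 2) ** 2)
    F[dist <= cutoff] = 0                          # suppress low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))
```

Since the DC term is removed, the output has (near) zero mean; a constant image is filtered to zero, while edges and fine detail pass through.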

Fusion steps and parameter settings
Step 1, an image editor is used to scale the IR and VI to the same size.
Step 2, the two images are each decomposed by the 2-D DWT, and the LL subband of the decomposed image is decomposed by a second-level WT.
Step 3, the decomposed images are quantized, and DCT-based image coding is used to compress them, removing information ignored by the human visual system and reducing the amount of image data.
Step 4, HPF is applied to the compressed data; low-frequency data are averaged and high-frequency data are retained to obtain feature image A.
Step 5, the compressed image data are de-quantized and decoded by the lookup-table method; the de-quantized data are inverse-DCT-transformed to form the decoded image.
Step 6, the inverse WT is performed on the decoded image to obtain detail image B, which expresses the fused detail information.
Step 7, feature image A and detail image B are fused to produce the final fused image.
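A simplified end-to-end sketch of the decomposition-and-fusion idea behind these steps, using a one-level Haar DWT and omitting the quantization/coding stage: the base (LL) information is fused by averaging and the detail subbands by keeping the coefficient with the larger magnitude. All names and the Haar basis are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def _split(a, axis):
    # one Haar analysis step along `axis`
    a = np.moveaxis(a.astype(float), axis, 0)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def _merge(lo, hi, axis):
    # the matching Haar synthesis step (exact inverse of _split)
    lo, hi = np.moveaxis(lo, axis, 0), np.moveaxis(hi, axis, 0)
    out = np.empty((2 * lo.shape[0],) + lo.shape[1:])
    out[0::2] = (lo + hi) / np.sqrt(2)
    out[1::2] = (lo - hi) / np.sqrt(2)
    return np.moveaxis(out, 0, axis)

def dwt2(img):
    L, H = _split(img, 1)
    (LL, LH), (HL, HH) = _split(L, 0), _split(H, 0)
    return LL, (LH, HL, HH)

def idwt2(LL, details):
    LH, HL, HH = details
    return _merge(_merge(LL, LH, 0), _merge(HL, HH, 0), 1)

def fuse(ir, vi):
    # base information fused by averaging, details by max-absolute choice
    ll_a, det_a = dwt2(ir)
    ll_b, det_b = dwt2(vi)
    ll = (ll_a + ll_b) / 2
    det = tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                for a, b in zip(det_a, det_b))
    return idwt2(ll, det)
```

Fusing an image with itself reproduces it exactly, which is a quick sanity check that the analysis/synthesis pair is a true inverse.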

Evaluation system
To judge whether the method proposed in this paper is effective, the images fused by the three methods above are comprehensively evaluated and analyzed. Traditional comprehensive evaluation falls into two categories: subjective and objective. Subjective evaluation judges image quality through human observation, but it is easily affected by environment, mood and so on, so its results are not intuitive and its stability is poor. Objective evaluation quantitatively assesses image quality based on a mathematical model, with strong stability and simple operation. In this paper, objective evaluation is used to assess the quality of the fused images and to discuss whether the proposed method is effective. The evaluation indices used are: Image Gradient (IG); Peak Signal-to-Noise Ratio (PSNR); Structural Similarity (SSIM); Mutual Information (MI); and Information Entropy (EN) [2, 16-21].

Image Gradient

IG exploits content to which the human eye is highly sensitive, and it represents the structural information of the fused image well.
For a 2-D grayscale image [22], the horizontal and vertical gradients are computed first:

G_H(x,y) = f(x+1,y) - f(x,y),   G_V(x,y) = f(x,y+1) - f(x,y)

The point gradient is then the sum of the two:

G(x,y) = |G_H(x,y)| + |G_V(x,y)|

The point gradient captures the edge information of the fused image features; the larger the gradient value, the better the algorithm extracts image feature edges.
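Using first-order differences as the horizontal and vertical gradients, the average point gradient over the image can be computed as follows (a sketch; the function name is assumed):

```python
import numpy as np

def image_gradient(img):
    """Average point gradient: mean of |horizontal| + |vertical| differences."""
    img = img.astype(float)
    gx = np.abs(np.diff(img, axis=1))[:-1, :]   # horizontal differences
    gy = np.abs(np.diff(img, axis=0))[:, :-1]   # vertical differences
    return np.mean(gx + gy)                     # averaged over the common region
```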
Peak Signal-to-Noise Ratio

PSNR, measured in decibels, is the ratio of the peak signal energy to the average noise energy:

PSNR = 10 log_10( (2^bits - 1)^2 / MSE )   (18)

where MSE is the mean square error of the image and bits is the number of bits per sample (bits = 8 in this paper). The larger the PSNR, the higher the fidelity of the fused image and the better the image quality [23].
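Formula (18) translates directly to code (a sketch; the function name is assumed):

```python
import numpy as np

def psnr(ref, img, bits=8):
    """Peak signal-to-noise ratio in dB, per Eq. (18)."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    if mse == 0:
        return float('inf')       # identical images
    peak = 2 ** bits - 1          # 255 for 8-bit images
    return 10 * np.log10(peak ** 2 / mse)
```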

Structural Similarity
SSIM is an index measuring the similarity of two images; when the two images are the versions before and after compression, SSIM evaluates the quality of the compressed image. SSIM mainly compares luminance, contrast and image structure [24,25]. The average gray level of each image is used as the luminance estimate. Luminance comparison function l(x,y):

l(x,y) = (2 μ_x μ_y + C_1) / (μ_x^2 + μ_y^2 + C_1)   (20)

Contrast comparison function c(x,y):

c(x,y) = (2 σ_x σ_y + C_2) / (σ_x^2 + σ_y^2 + C_2)   (21)

Structure comparison function s(x,y):

s(x,y) = (σ_xy + C_3) / (σ_x σ_y + C_3)   (22)

SSIM(x,y) = f(l(x,y), c(x,y), s(x,y)) = l(x,y)^α · c(x,y)^β · s(x,y)^γ   (23)

where μ_x and μ_y are the mean grayscales of the two images, σ_x and σ_y are their standard deviations, and σ_xy is their covariance. C_1, C_2 and C_3 are small constants that keep the denominators from being zero, and the weights α, β, γ > 0 adjust the contribution of each term. With α = β = γ = 1 and C_3 = C_2 / 2:

SSIM(x,y) = (2 μ_x μ_y + C_1)(2 σ_xy + C_2) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2))   (24)

The closer the value of the function is to 1, the higher the similarity between the two images.
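A global (whole-image) evaluation of Eq. (24) can be sketched as follows; the constants C_1 = (0.01L)^2 and C_2 = (0.03L)^2 are the conventional choices and are assumptions here, as the paper does not specify them.

```python
import numpy as np

def ssim_global(x, y, bits=8):
    """Global SSIM per Eq. (24), with alpha = beta = gamma = 1, C3 = C2/2."""
    L = 2 ** bits - 1
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()               # luminance estimates
    vx, vy = x.var(), y.var()                 # variances
    cov = ((x - mx) * (y - my)).mean()        # covariance
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

In practice SSIM is often computed over local windows and averaged; the global version above is the simplest faithful reading of Eq. (24).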

Mutual Information
To compute how much information is transmitted from the source images to the fused image, the key question is how much information the fused image obtains from them, which is essentially a problem of information transmission. The amount of information received depends on how much uncertainty is eliminated, which is the mutual information (MI) [21], defined as:

I(x,y) = I(x) - I(x|y)   (25)

where I(x,y) indicates the amount of information obtained after transmission, I(x) represents the amount of information before transmission, and I(x|y) indicates the amount of information lost during transmission.
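In image fusion, MI is usually estimated from the joint gray-level histogram via I(A;B) = H(A) + H(B) - H(A,B); a sketch (the function name and bin count are assumptions):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate MI between two images from their joint gray-level histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = hist / hist.sum()                    # joint distribution
    px, py = p.sum(axis=1), p.sum(axis=0)    # marginal distributions

    def H(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))       # Shannon entropy in bits

    return H(px) + H(py) - H(p.ravel())
```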

Information Entropy
EN is a common index for evaluating image quality. Generally speaking, the larger the information entropy, the more information the image contains and the better the image quality. However, heavy noise in the image also raises the entropy significantly, in which case entropy alone cannot determine image quality. The image information entropy is defined as [21]:

H = - Σ_i p_i log p_i   (26)

where p_i is the probability that grayscale value i appears in the image, and i ranges over the gray levels of the image, whose maximum is generally 255.
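Eq. (26) computed from the normalized gray-level histogram (log base 2, giving bits; a sketch with an assumed function name):

```python
import numpy as np

def entropy(img):
    """Shannon entropy of the 8-bit grayscale histogram, per Eq. (26)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()        # gray-level probabilities
    p = p[p > 0]                 # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))
```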

Evaluation index system and experimental results
The fused images produced by the three methods above are evaluated with the indices described, to verify the effectiveness of the proposed method.
The first group of infrared and visible images is shown in Fig. 2 (a) and (b), with a resolution of 632 × 496; this is a commonly used test image. From the images obtained by the different fusion methods it can be seen that the effect of the proposed method is more obvious than that of the other two, with no ghosting in the extracted features. The proposed method is therefore effective and superior to the other two methods, with better feature extraction for the fused image.
The image quality of the fused images in Fig. 2 (c), (d) and (e) is evaluated. Table 1 shows the specific values of the image quality evaluation indices for the different images. The table shows that the image obtained by the proposed fusion method has richer gradient information, higher structural similarity with the original image, and larger information entropy. There is no obvious noise in the image, which indicates better image quality.

The second group of infrared and visible light images, shown in Fig. 3, is a common infrared test set. The scene is a city road with light sources, and the image resolution is 632 × 496. MDLatLRR (c), DWT (d) and the proposed method (e) are applied respectively. Compared with the other two fusion methods, the proposed method obtains more information. Under subjective analysis there is no obvious difference between MDLatLRR and the proposed method, so the objective evaluation method is used. Table 2 shows the five indices for each fused image, directly reflecting the image quality obtained by the different methods. The data in the table show intuitively that the image quality of the proposed method is better than that of MDLatLRR; at the same time, the fused image obtained by the proposed method has greater structural similarity and mutual information with the IR (a). Although the PSNR and SSIM of the DWT result are higher, its information entropy is smaller, its detail information is less, and its artifact problem is more serious. In summary, the image obtained by the proposed method is rich in detail information.
In the third group of infrared and visible images, the main subjects are street lights and a rural road with pedestrians; the environment is mostly trees, with fairly obvious infrared radiation. This is also a common test set, with an image resolution of 632 × 464. It can be seen in Fig. 4 that the fused image produced by DWT handles detail information poorly, while the method proposed in this paper extracts rich image details.
Table 3 shows that the SSIM and MI of the fused image (e) obtained by the proposed method are significantly higher than those of the other two methods, and the IG is also higher, indicating that the overall structure of fused image (e) is better. However, because of some noise in the original IR and VI, the PSNR and EN have limitations in evaluating image quality here.

The fourth group of IR and VI is shown in Fig. 5 (a) and (b), with a resolution of 770 × 562. The picture shows an office building in sunlight and contains much detail information; due to exposure problems, the VI lost a lot of information. Three fused images (c), (d) and (e) were obtained after fusing the IR and VI. Comparison with the other two images shows that the fused image obtained by the proposed method contains more detail information and demonstrates good feature extraction. Table 4 lists the fusion quality coefficients for the fourth group. There is some noise in the fused images, so PSNR and EN again have limitations in evaluating image quality, but the SSIM and MI of the proposed method are clearly higher than those of the other two methods. This shows that the proposed method eliminates more uncertain information, increases the amount of image information, and produces less distortion and higher similarity with the original image; the fused image obtained by the proposed method is of high quality.

The fifth group of IR and VI is shown in Fig. 6 (a) and (b). Table 5 lists the image quality indices of the different fusion methods for this group. The PSNR and EN of the images obtained by the DWT method are higher than those of the proposed method. However, the MI of the proposed method is higher than that of the other two fusion methods, indicating that more information is obtained from the original images, although the image quality is not as good as that of the DWT fusion.
For the fifth group of images, the fused image obtained by the DWT method has better quality, while the image obtained by the proposed method extracts more information from the original images.

Conclusion
In this paper, the image is decomposed by the 2-D DWT to extract its basic information, the detail features are extracted by HPF, and the basic information and detail features are fused. The fused image retains more information from the original images, and the basic information of the original images is extracted effectively at the same time. At present, the algorithm can extract the detail features of the fused image but cannot solve the problem of IR and VI registration; further registration of the IR and VI would make feature extraction more accurate and eliminate artifacts and similar problems. The experimental results show that the proposed method effectively extracts the basic and detail information of the original images while maintaining high structural similarity and mutual information, indicating that the fused image has less distortion and better detail feature extraction than the region-based MDLatLRR fusion and the pixel-based DWT fusion, and that the amount of image information is increased.