Import vertical characteristic of rain streak for single image deraining

Recently, deep convolutional neural networks have shown good results for single image deraining. These networks usually adopt conventional convolutions to extract features, which may neglect the characteristics of rain streaks. A novel vertical module is proposed to focus on the vertical characteristic of rain streaks. The module uses a 1 × X convolution kernel to extract the vertical information of rain streaks and an X × X convolution kernel to keep relative location information. Using this module at the front of a deraining network can better detach rain streaks from the background. In addition, contrastive learning is employed to improve the performance of the model. Extensive experimental results demonstrate the superiority of deraining methods equipped with the proposed techniques in comparison with the base ones.


Introduction
Rain is a common natural phenomenon. Images captured outdoors in rainy weather suffer from several defects: rain streaks obscure and distort the background scene, and heavy rain can hide important objects entirely.
In image processing, rain streaks and other noise in an image obscure and distort the background scene, disturbing the key information in the image and hindering its subsequent use.
However, many computer vision applications for outdoor scenes generally require high-quality, clear images. Images occluded and distorted by rain can make these applications less effective or deprive them of their original functionality. Driven by these demands, image restoration techniques such as deraining [1] and dehazing [2,3] have developed rapidly in recent years. With the development of deep learning, deep learning-based deraining networks have been widely used. Reference [1] proposed a progressively optimized residual network (PRN) and a progressively optimized recurrent network (PReNet) for image rain removal. Reference [4] proposed a bilateral recurrent network that further improves the accuracy of deraining. To generate higher quality derained images, Ding et al. [5] proposed a recurrent distributed feedback network (DFN), in which a new feedback block uses high-level information to correct low-level input.
Rain streaks have distinct features in the vertical direction. When capturing images on a rainy day, motion blur caused by the movement of raindrops distorts the rain streaks and blurs objects in the occluded areas. Due to gravity, raindrops mainly move downward, so rain streaks usually appear as vertical strips in the image. However, many previous deraining networks have not fully exploited these vertically oriented features. We believe that using the vertical feature of rain streaks can help the network locate the regions covered by rain streaks more accurately. To make reasonable use of this feature, we propose the vertical module shown in Fig. 1. The convolution kernel of the vertical module is shaped as a vertical strip, which allows the convolution to better extract feature information in the vertical direction. Adding the vertical module before the deraining network can better extract the low-level information of rain streaks.
To gain better performance, the proposed framework uses a contrastive learning method, shown in Fig. 2. Contrastive learning has been widely used in various deep learning networks and has achieved good results [6][7][8]. The proposed method uses a no-rain picture and a rainy picture (the same as the input of the model) as samples, and compares the output of the model with the sample pictures through a pre-trained VGG to calculate a new loss. Pulling the trained model toward the positive sample (no-rain picture) and away from the negative sample (rainy picture) yields a better model.

Related work
In this section, we present a brief overview of the approaches and results of deraining algorithms.
These studies can be divided into three parts: deep network-based deraining methods, special convolutions, and contrastive learning.

Developing of deraining models
A rainy image consists of a clean image and a rain layer. Linear summation deraining models have been used in video deraining [9][10][11][12][13]. Inspired by this success in video editing, linear summation models have been developed in combination with proper regularizers on both the background image and the rain layer. With the help of Gaussian mixture models and other models, linear summation image deraining models have shown significant promise [14]. A further improved model, the screen blend model [15,16], can remove rain streaks by using discriminative dictionary learning to separate them.
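As a toy illustration of the linear summation model above, a rainy image can be synthesized by adding a rain layer to a clean background. This sketch is purely illustrative; the streak density, length, and intensity values are assumptions, not parameters from any cited model.

```python
import numpy as np

def add_rain_layer(background, rain):
    """Linear summation model: rainy image O = B + R, clipped to [0, 1]."""
    return np.clip(background.astype(np.float32) + rain.astype(np.float32), 0.0, 1.0)

def make_vertical_rain(h, w, density=0.01, length=12, rng=None):
    """Draw short vertical streaks on an empty layer (toy rain synthesis).
    Streaks are vertical because gravity makes raindrops fall downward."""
    rng = rng or np.random.default_rng(0)
    rain = np.zeros((h, w), dtype=np.float32)
    n = int(h * w * density)
    ys = rng.integers(0, h - length, size=n)
    xs = rng.integers(0, w, size=n)
    for y, x in zip(ys, xs):
        rain[y:y + length, x] = 0.8  # streak intensity (arbitrary choice)
    return rain
```

Real deraining datasets such as Rain100L are built with more sophisticated streak rendering, but the additive composition is the same.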
Deep learning-based models for removing rain streaks have been developed since 2017. From DerainNet to DDN [17,18], Fu et al. proposed residual neural networks based on processing the relationship between the rain layer and the background layer.
Based on previous studies, Ren et al. [1] provided a better and simpler baseline deraining network by improving the network architecture, the input and output, and the loss functions; by repeatedly applying a ResNet stage, the resulting progressive ResNet takes advantage of recursive computation. Based on bilateral LSTMs, BRN [4] was later proposed to improve the interplay between rain layers and background layers.

Special convolution
Asymmetric convolutions are a type of special convolution that reduces the computational complexity of square-kernel convolutions, optimizing them for compression and acceleration.
Previous studies [19,20] have shown that a typical convolution with a d × d kernel can be replaced with two convolutions with d × 1 and 1 × d kernels to reduce the number of parameters. However, because the intrinsic rank of the learned kernels is generally greater than one, applying the new kernels instead of the typical one can result in significant information loss [21]. Denton et al. [19] developed an optimization algorithm with a low-rank approximation based on SVD and successfully reduced the losses of this method. Jin et al. [21] used structural constraints to separate 2D kernels and obtained performance comparable to conventional CNNs with a 2× speed-up. Asymmetric convolutions also have a good effect on reducing the number of parameters.
EDANet [22] replaced the 3 × 3 convolution kernel with a similar method and reduced the parameters and computations of the relevant part by one third with minor performance degradation. Ding et al. proposed Asymmetric Convolution Blocks (ACNet [23]), which replace the traditional 3 × 3 convolution in an existing network with the sum of 1 × 3, 3 × 1, and 3 × 3 convolutions as the output of the convolutional layer.

Contrastive learning
Contrastive learning is widely used in self-supervised learning [24][25][26][27][28] and in deep learning-based image processing models [6][26][27][28][29]. Contrastive learning uses a positive sample and a negative sample to judge the output loss of the training model; by changing the way the loss value is calculated, it pulls the output of the model toward the positive result and pushes it away from the negative result faster and more accurately. It has been shown that contrastive learning leads to better results in image-to-image translation. Recently, Wu et al. [6] proposed a new sampling method and a novel pixel-wise contrastive loss, and successfully applied them to image dehazing. We propose that a similar model is applicable to deraining networks.

Deraining frame with vertical module and contrastive learning
This section describes the composition of the proposed deraining framework. The proposed method first uses the vertical module to help the network better extract features, and then uses contrastive learning to optimize the training of the model.

Structure of vertical module
Special convolutions provide an efficient way to process and extract information. According to the information to be processed, a special convolution can improve the accuracy or efficiency of feature extraction, or reduce the number of operations and parameters, by changing the way the convolution operates. To improve single image deraining networks, we propose the vertical module (Fig. 3) to extract low-level information from the rain image. Due to gravity, rain streaks are usually oriented vertically, so the proposed method uses a 1 × X strip convolution kernel to extract the vertical features of rain streaks. Simultaneously, in order to preserve the positional relationship between rain streaks and the surrounding image, an X × X convolution kernel is also used to extract relevant spatial position information.

ACNet [23] shows that additivity holds for 2D convolutions even with different kernel sizes:

I ∗ K_1 + I ∗ K_2 = I ∗ (K_1 ⊕ K_2),

where I is the input image of the vertical module, K_1 and K_2 are two 2D kernels with sizes 1 × X and X × X respectively, and ⊕ adds the smaller kernel onto the corresponding positions of the larger one. Taking advantage of this additivity of convolution, we complete the design of the vertical module. The vertical module applies a 1 × X convolution kernel and an X × X convolution kernel in a sliding-window fashion to obtain two parallel layers, each followed by batch normalization (Fig. 4). Finally, we add the two resulting parallel layers to complete the vertical module. When X = k, the vertical module can be expressed as:

O = BN(I ∗ A) + BN(I ∗ B),

where A represents the k × k convolution kernel, B represents the 1 × k convolution kernel, and BN denotes batch normalization. The proposed method uses this computation to form the final vertical module, achieving better extraction of the vertical-direction features of rain streaks without losing position information.
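A minimal PyTorch sketch of the two-branch structure described above. The branch layout (strip convolution plus square convolution, each with batch normalization, summed) follows the text; the channel counts and the interpretation of the 1 × X strip as vertically oriented in PyTorch's (height, width) kernel convention are assumptions.

```python
import torch
import torch.nn as nn

class VerticalModule(nn.Module):
    """Two parallel branches: a vertical-strip kernel to extract vertical
    rain-streak features and a square kernel to keep relative position
    information. Each branch is batch-normalized, then the two are summed."""

    def __init__(self, in_ch, out_ch, x=3):
        super().__init__()
        # Vertical strip: x rows by 1 column in PyTorch's (H, W) convention,
        # so the kernel slides over vertically elongated structures.
        self.vertical = nn.Conv2d(in_ch, out_ch, kernel_size=(x, 1),
                                  padding=(x // 2, 0), bias=False)
        # Square kernel preserving the spatial neighborhood of each streak.
        self.square = nn.Conv2d(in_ch, out_ch, kernel_size=x,
                                padding=x // 2, bias=False)
        self.bn_v = nn.BatchNorm2d(out_ch)
        self.bn_s = nn.BatchNorm2d(out_ch)

    def forward(self, img):
        # O = BN(I * A) + BN(I * B), matching the formula in the text.
        return self.bn_v(self.vertical(img)) + self.bn_s(self.square(img))
```

Because both branches use "same" padding, the module can be prepended to an existing deraining network without changing the spatial resolution of its input.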

Vertical module with single image derain networks
To improve single image deraining networks, the vertical module is added to the network. The vertical module is most effective at extracting low-level information. Because of the operations of the neural network, the low-level information describing rain streaks is already destroyed in the middle and rear parts of the network, which prevents the proposed module from detaching rain streaks from the background there. In the front section of the network, however, the data still contains the obvious vertical features of rain streaks, which lets the module work better. After considering these factors, we found that adding the vertical module in front of the deraining network, so that it processes rain streaks directly, detaches rain streaks from the background and removes them more effectively. Based on this, the proposed method adds the vertical module to the single image deraining neural network to improve the effect of the original deraining method.

Contrastive learning for deraining method
Contrastive learning aims to learn a representation that pulls the output toward a positive sample in some metric space and pushes it away from a negative sample. Applying this method to rain removal involves two parts: finding suitable positive and negative samples, and finding a proper way to measure the difference between the network output and each sample. In this paper, we use common deraining datasets such as Rain100L. Deraining datasets usually contain a certain number of rainy images, such as Figs. 7 and 8, and their corresponding clear images, such as Fig. 6. In the contrastive learning of the single image deraining neural network, we define clear images as positive samples and rainy images as negative samples. To better extract image characteristics for this comparison, we use the pre-trained VGG-19 [30] to extract features of the pictures and thus judge the difference between the output picture and the sample pictures more accurately and efficiently. The contrastive learning part can be expressed as:

L(A, B, O) = ‖V(O) − V(A)‖_1 / ‖V(O) − V(B)‖_1,

where V represents the VGG-19 feature extractor, L is the loss function in contrastive learning, and A, B, and O represent the positive sample, the negative sample, and the output of the deraining model, respectively.

Fig. 4 Contrastive learning for single image deraining method

Contrastive learning makes the image output by the model closer to the no-rain image and farther from the rainy image, achieving the purpose of single image deraining. In addition, we found that L1 loss works better than L2 loss [31] and therefore choose L1 loss to cooperate with the proposed module. The overall loss function can be further formulated as:

Loss = Loss_0 + m · Σ_i k_i · ‖V_i(O) − V_i(A)‖_1 / ‖V_i(O) − V_i(B)‖_1,

where V_i extracts the i-th hidden features from VGG-19, k_i is a weight coefficient for each feature, Loss_0 is the loss of the original model, and m is a weight coefficient balancing the contrastive learning loss and the loss of the original model.

Figure 5 shows the structure and usage of the proposed framework. We add the vertical module in front of the single image deraining network to extract the low-level features in which rain streaks show vertical regularity in rainy pictures. This allows the position of the rain streaks to be identified more accurately, and the characteristics of the regions occluded by rain streaks can be recovered accordingly. Because the traditional loss alone does not train the restoration well after the rain image is processed by the deraining network, we introduce a contrastive learning module and use the VGG-19 inside it to better extract the differential features of images. By comparing the output image with the clear image and the rainy image, the output of the model is pushed away from the rainy image and toward the clear image.

Fig. 5 The overall structure of the proposed frame

The cooperation of the two proposed modules effectively improves the effect of the rain removal network.
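The ratio-form loss above can be sketched in PyTorch as follows. The extractor is passed in as a parameter so the sketch stays self-contained; the paper uses hidden layers of a pre-trained VGG-19 for this role, and the layer choices and weights k_i here are assumptions.

```python
import torch
import torch.nn as nn

class ContrastiveLoss(nn.Module):
    """Pull the derained output O toward the clear positive sample A and push
    it away from the rainy negative sample B in a frozen feature space.
    `extractors` is a list of feature modules (VGG-19 layers in the paper);
    `weights` are the per-layer coefficients k_i."""

    def __init__(self, extractors, weights):
        super().__init__()
        self.extractors = nn.ModuleList(extractors)
        self.weights = weights
        self.l1 = nn.L1Loss()  # the paper reports L1 working better than L2

    def forward(self, output, positive, negative):
        loss = 0.0
        for k, v in zip(self.weights, self.extractors):
            d_pos = self.l1(v(output), v(positive))  # distance to clear image
            d_neg = self.l1(v(output), v(negative))  # distance to rainy image
            loss = loss + k * d_pos / (d_neg + 1e-7)  # ratio form of the loss
        return loss
```

In training this term would be scaled by the balance coefficient m and added to the original model loss Loss_0.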

Experimental results
In this section, we compare the proposed framework with other methods, including PReNet [1] and BRN, among others.
In addition, we also compare the proposed framework with its variant without contrastive learning and with other methods using special convolutions, including ACNet [23] and dynamic convolution [32], among others.

Datasets and training settings
We conduct experiments on multiple datasets, including Rain100L [33], Rain12 [14], and RainLight [34], among others. Datasets such as Rain100L are widely used in rain removal tasks; these datasets contain pictures of light or heavy rain, simulating various rainy scenes. Figure 6 shows a no-rain picture from the dataset, and Figs. 7 and 8 show the corresponding rainy versions.

Environment and implementation
We used the PyTorch framework to train the proposed model. We refer to the method in Ref. [23] for the optimized implementation of the vertical module, and to [6] for the optimized implementation of the proposed contrastive learning and for adjusting the scale coefficients of the contrastive learning loss and the SSIM loss according to the training situation.
The model was trained with 2 NVIDIA Titan Xp graphics cards and an Intel i7-6950X CPU with 128 GB RAM. Our dataset processor uses h5py to pack the dataset into groups and then feeds each group of data into the network. The training batch size is set to 18 to fit the available computing power, and each experiment runs for 200 epochs so that the model reaches a stable effect. We use 0.001 as the initial learning rate and automatically adjust it multiple times during training.
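The h5py-based data pipeline described above can be sketched as a small PyTorch dataset. The group and dataset names ("pair_0", "rain", "norain") are assumptions for illustration; the paper does not specify its HDF5 layout.

```python
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset

class H5RainDataset(Dataset):
    """Read rainy/clear image pairs that were packed into an HDF5 file,
    one group per pair, as in the dataset processor described in the text."""

    def __init__(self, path):
        self.path = path
        with h5py.File(path, "r") as f:
            self.keys = sorted(f.keys())  # one group per rainy/clear pair

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Reopen per access so the dataset works with multi-worker loaders.
        with h5py.File(self.path, "r") as f:
            g = f[self.keys[idx]]
            rain = torch.from_numpy(np.array(g["rain"], dtype=np.float32))
            clear = torch.from_numpy(np.array(g["norain"], dtype=np.float32))
        return rain, clear
```

A `torch.utils.data.DataLoader` with `batch_size=18` over this dataset would reproduce the batching setup reported above.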

Experiments on Rain100L datasets
Rain100L is a widely used single image deraining dataset with a training set of 200 images and a test set of 100 images. We train both the original single image deraining network PReNet and PReNet with our proposed framework on the Rain100L training set and test them on the Rain100L test set to show the effect of the proposed framework.
We use PSNR and SSIM as metrics to measure the accuracy of the test results; bold values indicate the best results. PSNR (peak signal-to-noise ratio) measures the ratio between the maximum possible power of a signal and the power of the destructive noise that affects the accuracy of its representation. The PSNR of a clear image x and a model output image y is calculated as:

PSNR(x, y) = 10 · log_10(MAX² / MSE),

where MAX is the maximum possible pixel value in the image and MSE is the mean squared error:

MSE = (1 / mn) Σ_i Σ_j (x(i, j) − y(i, j))².

SSIM (structural similarity) measures the similarity between two images. The SSIM of pictures x and y is calculated as:

SSIM(x, y) = (2 μ_x μ_y + c_1)(2 σ_xy + c_2) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2)),

where μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, and σ_xy is the covariance of x and y. c_1 = (0.01 L)² and c_2 = (0.03 L)² are constants used to maintain stability, and L is the dynamic range of pixel values.

Table 1 shows that the proposed framework plays a great role in the task of single image deraining. The vertical module in the proposed framework better extracts the vertical-direction features of rain streaks in light-rain images, improving the accuracy of rain removal. Adding this module to a single image deraining network improves the peak signal-to-noise ratio of the output image and thus the image quality.

We select one of the output pictures from the models trained on Rain100L. In Figs. 11 and 12, comparisons between the outputs of the original model and of the model with our proposed framework suggest that, through the vertical module, the model with our framework better recognizes and removes rain streaks.
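The two metrics can be computed in a few lines of NumPy. Note that the SSIM here is the single-window (global) form of the formula above; practical implementations such as scikit-image average this statistic over local sliding windows.

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """PSNR(x, y) = 10 * log10(MAX^2 / MSE), per the formula above."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM using the means, variances, and covariance of the
    whole images, with the standard stability constants c1 and c2."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

Identical images give an SSIM of 1.0, and a uniform one-level error at 8-bit range gives a PSNR of 20 · log_10(255) ≈ 48.13 dB, which matches the reported scores' order of magnitude.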

Experiments on other datasets
To further test the proposed method, we also evaluated it on the RainLight and Rain12 test datasets.
Specifically, we trained the proposed model on top of PReNet, tested it on these test datasets, and compared our results with PReNet and other deraining networks.
The experimental data in Table 2 show that our proposed framework performs well for the rain removal task. The framework improves PSNR from 37.48 to 37.87 on dataset Rain100L and from 36.66 to 37.17 on dataset Rain12.
The experimental data in Figs. 9, 10, 11, 12, 13, 14, 15 and 16 show that, with the help of the vertical module, the experimental model preserves many vertical and

Ablation study
To better understand the contributions of the vertical module and contrastive learning to image deraining, we changed the network structure and trained two additional models to compare with the full network proposed in this study: one with only the vertical module and the other with only contrastive learning. The experimental results in Table 3 show that the vertical module and contrastive learning each improve single image deraining, and the proposed framework, which integrates the advantages of both, leads to further improvements.

Conclusion
We have developed a novel framework for single image deraining. A novel vertical module was proposed to detach rain streaks from the background. The module combines a 1 × X and an X × X convolution kernel and gains a better deraining effect when used in front of deraining networks. In addition, we introduce contrastive learning to make the optimization direction more accurate and the training more effective. Extensive tests on widely used deraining datasets demonstrate the validity and effectiveness of our approach.