Here we provide details of our experimental setup, datasets, and performance metrics, and we evaluate the performance and effectiveness of the DeTformer network on benchmark synthetic and real-rain datasets.
a) Experimental setup
Our proposed network was implemented in the PyTorch 1.7 deep learning framework. The AdamW optimizer was used to train the network for a total of 10^5 iterations, with a fixed learning rate of 3 × 10^-4. The batch size was set to 8, and variable patch sizes of 128×128, 160×160 and 192×192 were adopted progressively. To make the proposed network more robust, augmentation techniques such as horizontal and vertical flips were applied during training. In all TBs, the window size was fixed to 8. All experiments were carried out on Google Colab Pro with a Tesla V100 16 GB GPU.
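The hyper-parameters above can be collected into a single configuration. The dictionary below is an illustrative sketch: the key names and layout are ours, while the values are those stated in the text.

```python
# Training configuration for DeTformer, as described above.
# Key names are illustrative; values come from the text.
TRAIN_CONFIG = {
    "framework": "PyTorch 1.7",
    "optimizer": "AdamW",
    "iterations": 10 ** 5,
    "learning_rate": 3e-4,               # fixed (no decay schedule)
    "batch_size": 8,
    "patch_sizes": (128, 160, 192),      # progressive patch sizes
    "window_size": 8,                    # fixed window size in all TBs
    "augmentations": ("horizontal_flip", "vertical_flip"),
    "hardware": "Tesla V100 16 GB (Google Colab Pro)",
}
```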
b) Datasets
The effectiveness of the proposed network was evaluated on synthetic paired rain datasets and a real-rain dataset [28]. The synthetic data includes Rain100L [6], Rain100H [6], Rain800 [10], Rain1200 [8], Rain12 [7] and Rain14000 [9]; the corresponding test sets are renamed Rain100L, Rain100H, Test100, Test1200 and Test2800. Table 1 gives a brief summary of the datasets used in this work.
Table 1. Summary of used datasets.

| Dataset | Train Imgs. | Test Imgs. | Testset Renamed |
|---|---|---|---|
| Rain800 [10] | 700 | 100 | Test100 |
| Rain14000 [9] | 11200 | 2800 | Test2800 |
| Rain1800 [6] | 1800 | 0 | NC |
| Rain100L [6] | 0 | 100 | Rain100L |
| Rain100H [6] | 0 | 100 | Rain100H |
| Rain1200 [8] | 0 | 1200 | Test1200 |
| Rain12 [7] | 12 | 0 | NC |
| Total | 13712 | 4300 | |
c) Evaluation metrics
To assess the effectiveness and performance of the proposed network, we evaluated the quality of the derained images using two metrics: peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), computed on the restored images. In general, the larger these values are, the better the deraining effect.
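Both metrics can be computed directly from the restored and ground-truth images. The NumPy sketch below gives PSNR and a simplified single-window SSIM; practical evaluations normally use the windowed SSIM (e.g. scikit-image's implementation), so the global simplification here is ours.

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    # Peak signal-to-noise ratio in dB between reference and restored image.
    mse = np.mean((np.asarray(ref, dtype=np.float64)
                   - np.asarray(img, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, img, max_val=255.0):
    # Simplified SSIM computed over the whole image as one window
    # (standard implementations slide an 11x11 Gaussian window instead).
    x = np.asarray(ref, dtype=np.float64)
    y = np.asarray(img, dtype=np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

For identical images PSNR is infinite and SSIM equals 1, which is why larger values indicate better restoration.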
4.1. Comparison with the state‑of‑the‑art networks
We comprehensively compared the performance of DeTformer with several state-of-the-art (SOTA) deraining networks, including JORDER [6], DID-MDN [8], RESCAN [30], SSTL [17], PReNet [32], DerainNet [25], UMRL [26], MSPNet [29], SAPNet [46], SEMI [31], OUCD [33], ECNet [34], PMSDNet [35], RCDNet [36], DualGCN [37], MPRNet [38], RLNet [39] and DFIANet [40].
The quantitative results of the DeTformer network on synthetic rain datasets are shown in Table 2. It is clear from the table that our proposed network achieves superior performance over all the SOTA networks on every synthetic dataset. In particular, on the Rain100L and Rain100H datasets, DeTformer obtains 38.99 and 31.45 dB PSNR, which is +3.79 and +1.97 dB higher than DFIANet [40] and clearly shows that our network removes heavy and complex rain streaks more effectively. Table 2 shows that DeTformer achieves the highest PSNR and SSIM values on the Rain100L, Rain100H, Test100, Test1200 and Test2800 synthetic datasets. This is because our network exploits the benefits of the transformer and models long-range contextual information better.
The qualitative results of the proposed network on synthetic rain datasets are shown in Fig. 3, Fig. 4 and Fig. 5, respectively. Although some networks (PReNet, ECNet and DFIANet) remove heavy rain streaks, visible artifacts and blurred details can still be observed in their derained outputs, as shown in Fig. 3.
Table 2. Quantitative results (PSNR/SSIM) of the proposed network on synthetic datasets, compared with the SOTA networks.

| Network | Year | Rain100L [28] | Rain100H [28] | Test100 [27] | Test2800 [13] | Test1200 [15] | Avg. |
|---|---|---|---|---|---|---|---|
| JORDER [6] | 2017 | 31.27/0.92 | 27.75/0.84 | 24.72/0.85 | 31.60/0.91 | 31.27/0.90 | 29.32/0.88 |
| DIDMDN [8] | 2018 | 25.23/0.74 | 17.35/0.52 | 22.56/0.82 | 28.13/0.86 | 29.65/0.90 | 24.58/0.77 |
| RESCAN [30] | 2018 | 29.80/0.88 | 26.36/0.78 | 25.01/0.83 | 31.29/0.90 | 30.51/0.88 | 28.59/0.85 |
| SSTL [17] | 2019 | 25.03/0.84 | 16.56/0.48 | 22.35/0.78 | 24.43/0.78 | 26.05/0.82 | 22.88/0.74 |
| PReNet [32] | 2019 | 32.44/0.95 | 26.77/0.85 | 24.81/0.85 | 31.75/0.91 | 31.36/0.91 | 29.42/0.89 |
| DerainNet [25] | 2017 | 27.03/0.88 | 14.92/0.59 | 22.77/0.81 | 24.31/0.86 | 23.38/0.83 | 22.48/0.79 |
| UMRL [26] | 2019 | 29.18/0.92 | 26.01/0.83 | 24.41/0.83 | 29.97/0.90 | 30.55/0.91 | 28.02/0.88 |
| MSPNet [29] | 2020 | 32.44/0.95 | 28.66/0.86 | 27.50/0.87 | 32.82/0.93 | 32.39/0.91 | 30.76/0.74 |
| SEMI [31] | 2019 | 22.25/0.84 | 18.08/0.57 | 20.72/0.68 | 24.38/0.73 | 23.91/0.71 | 21.87/0.70 |
| OUCD [33] | 2021 | 29.84/0.90 | 24.38/0.73 | 23.58/0.80 | 28.72/0.89 | 26.09/0.82 | 26.52/0.83 |
| ECNet [34] | 2022 | 33.42/0.95 | 27.91/0.86 | 27.55/0.88 | 32.42/0.93 | 30.05/0.90 | 30.27/0.90 |
| PMSDNet [35] | 2022 | 36.41/0.97 | 30.38/0.89 | 30.32/0.90 | 33.62/0.93 | 32.96/0.92 | 32.74/0.92 |
| RCDNet [36] | 2020 | 38.60/0.98 | 28.83/0.88 | 24.59/0.82 | - | 29.81/0.86 | 30.45/0.88 |
| DualGCN [37] | 2021 | 38.05/0.99 | 29.06/0.91 | 28.28/0.89 | - | 32.98/0.93 | 25.67/0.93 |
| MPRNet [38] | 2021 | 36.69/0.97 | 27.65/0.87 | 27.86/0.85 | - | 31.73/0.91 | 30.98/0.90 |
| RLNet [39] | 2021 | 37.38/0.98 | 28.87/0.90 | 27.95/0.87 | - | 32.62/0.91 | 31.70/0.91 |
| SAPNet [46] | 2022 | 34.77/0.97 | 29.46/0.89 | 29.13/0.88 | 32.18/0.93 | 32.46/0.91 | 31.60/0.91 |
| DFIANet [40] | 2022 | 35.20/0.95 | 29.48/0.87 | 28.90/0.88 | 33.12/0.93 | 32.92/0.92 | 31.92/0.91 |
| Proposed | - | 38.99/0.97 | 31.45/0.90 | 32.07/0.90 | 34.17/0.94 | 33.18/0.92 | 33.97/0.93 |
From the derained images in Fig. 3, this situation occurs in the cloud, sky and roof regions and appears in the JORDER [6], RESCAN [30], SEMI [31] and DFIANet [40] networks. Because the background colour is similar to that of the rain streaks, some networks perform excessive deraining and remove fine details of similar colour, as in the second row of Fig. 3. When the test images contain denser objects, it is difficult to remove rain streaks completely while simultaneously recovering finer details, as is clear from the telephone booth and black fence in the third and fourth rows for SEMI [31], PReNet [32], ECNet [34], JORDER [6] and DFIANet [40]. The OUCD [33] network combines global information but pays attention mainly to local features, and thus fails to remove heavy rain streaks completely. Compared with all these SOTA networks, our proposed network avoids these problems and restores derained images that are highly similar to the target images.
Fig. 4 shows that our network achieves impressive deraining results when removing both diverse and heavy rain. From the images in the 10th column, our network restores clear image details and appropriate contrast, similar to the ground-truth images. Additional sample deraining results of the proposed network on the Rain100H synthetic dataset, along with their mean squared error (MSE), PSNR and SSIM, are shown in Fig. 5.
To demonstrate the robustness and efficiency of the proposed network, we also compared it with SOTA networks on a real-rain dataset [28]. Fig. 6 shows the derained results of the proposed network on this dataset, together with a comparative analysis against the DualGCN [37], MPRNet [38], RLNet [39] and DFIANet [40] networks. Many of these methods produce artifacts during deraining, and their restored images are not as clean as those produced by our network. Our network removes unevenly distributed rain streaks, achieves impressive performance on heavy rain streaks, and outputs clear, detailed results. Despite the complex rain scenes present in nature, our network generates excellent results when removing rain streaks under realistic conditions.
We also report the number of parameters and floating-point operations (FLOPs) required on the Rain100H dataset and compare with SOTA networks in Table 3. The number of parameters in the proposed network is reduced because general convolutions were replaced by transformer blocks.
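As a rough, back-of-the-envelope illustration of why swapping convolutions for attention projections can shrink the parameter count, the sketch below compares a k×k convolution with the four c×c projection matrices of a self-attention block. The 64-channel layer is hypothetical, not the actual DeTformer configuration.

```python
def conv_params(c_in, c_out, k):
    # Weight tensor (c_out * c_in * k * k) plus one bias per output channel.
    return c_out * c_in * k * k + c_out

def window_attention_params(c):
    # Q, K, V projections plus the output projection, each c x c with bias.
    return 4 * (c * c + c)

# Hypothetical 64-channel layer:
print(conv_params(64, 64, 3))       # 36928
print(window_attention_params(64))  # 16640
```

For this (illustrative) setting, the attention projections use fewer than half the parameters of a 3×3 convolution with the same channel width.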
Table 3. Number of parameters and FLOPs of SOTA networks.

| Network | FLOPs (G) | Parameters (M) |
|---|---|---|
| MPRNet [38] | 175.80 | 28.46 |
| SwinIR [25] | 238.00 | 36.94 |
| Uformer [20] | 174.70 | 32.58 |
| HiNet [47] | 293.79 | 26.59 |
| Proposed | 87.70 | 25.31 |
4.2. Ablation studies
A series of ablation studies was conducted to show the impact of various factors on the DeTformer network and to evaluate its ability during the deraining process. All ablation studies use the Rain100H dataset for both training and testing.
4.2.1. Effect of basic composition
Table 4 shows ablation results assessing the importance of each component separately. As seen from the table, when the GDWCFN module was replaced with a standard FFN, PSNR dropped by 1.88 dB; this confirms the effectiveness of GDWCFN in enhancing and preserving local feature information, alleviating the weakness of the original transformer in extracting local features. If the MDWCTA module is removed, PSNR drops by 1.31 dB, showing that multi-scale feature fusion improves the network's performance. PSNR was drastically reduced, by 2.52 dB, when all the up-sampling and down-sampling layers were removed, which shows the effectiveness of the designed U-shaped transformer structure. We also report the number of parameters and FLOPs for each configuration.
Table 4. Effects of basic components in the proposed network.

| Component | PSNR | FLOPs (G) | Parameters (M) |
|---|---|---|---|
| Remove MDWCTA | 30.14 | 83.7 | 24.86 |
| Replace GDWCFN with FFN | 29.57 | 85.3 | 25.02 |
| Replace TB with CNN | 27.86 | 81.3 | 23.02 |
| Eliminate up- and down-sampling layers | 28.93 | 83.4 | 24.53 |
| Proposed structure | 31.45 | 87.7 | 25.31 |
We also experimented with the number of scales employed in the encoder-decoder structure to remove rain streaks effectively during the deraining process.
4.2.2. Effect of number of scales
Table 5 shows the impact of the number of scales S employed in the proposed network, demonstrating the effectiveness of the multi-scale structure. When S = 1, PSNR drops by 1.24 dB, since multi-resolution features help the DeTformer network remove heavy and complex rain streaks more effectively. With S = 4 we achieve the highest PSNR and SSIM values.
Table 5. Impact of the number of scales in the proposed network.

| No. of scales | PSNR | SSIM |
|---|---|---|
| 1 | 30.21 | 0.87 |
| 2 | 31.13 | 0.88 |
| 3 | 31.26 | 0.89 |
| 4 | 31.45 | 0.90 |
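The multi-scale idea can be illustrated with a simple image pyramid. The 2× average-pooling sketch below is a generic stand-in for the network's actual down-sampling layers, which it does not reproduce:

```python
import numpy as np

def pyramid(img, scales):
    # Build a multi-scale pyramid by 2x average pooling at each level.
    levels = [img]
    for _ in range(scales - 1):
        h, w = levels[-1].shape
        x = levels[-1][:h - h % 2, :w - w % 2]  # crop to even size
        x = (x[0::2, 0::2] + x[1::2, 0::2]
             + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0
        levels.append(x)
    return levels

levels = pyramid(np.zeros((64, 64)), 4)
print([lvl.shape for lvl in levels])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

With S = 4, rain streaks are processed at four resolutions, so wide streaks that span many pixels at full resolution become compact patterns at coarser levels.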
4.2.3. Effect of the λ hyper-parameter
From equation (9), the total weighted loss function depends on the hyper-parameter λ, which was set to 0.05. To obtain the best network performance, we performed an ablation study to fix λ. Table 6 shows the influence of λ on PSNR and SSIM. From these observations, we set λ = 0.05 in the weighted loss function, as it achieves the highest PSNR and SSIM.
Table 6. Impact of the λ parameter on the total loss function.

| λ | PSNR | SSIM |
|---|---|---|
| 0 | 31.16 | 0.89 |
| 0.05 | 31.45 | 0.90 |
| 1 | 31.32 | 0.90 |
| 2 | 30.97 | 0.88 |
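Equation (9) is not reproduced in this section, so the composition below is only an assumed generic form: a primary reconstruction term plus a λ-scaled auxiliary term, with λ = 0.05 as fixed by the ablation.

```python
LAMBDA = 0.05  # value fixed by the ablation in Table 6

def weighted_loss(primary_term, aux_term, lam=LAMBDA):
    # Assumed generic form of a weighted total loss; the exact terms of
    # equation (9) are not reproduced in this section.
    return primary_term + lam * aux_term

print(weighted_loss(0.8, 0.4))  # ≈ 0.82
```

Note that λ = 0 (auxiliary term disabled) corresponds to the first row of Table 6, which is 0.29 dB worse than λ = 0.05.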
4.2.4. Effect of number of transformer blocks in encoder-decoder network
To decide the number of transformer blocks (N) to employ in the encoder-decoder network, we performed an ablation study. Table 7 shows the impact of N on deraining performance and computational burden. Although larger N yields higher PSNR, to balance deraining performance against model complexity and computational cost we adopt N = 2 in our network.
Table 7. Impact of N in the proposed network.

| N | PSNR | SSIM |
|---|---|---|
| 1 | 31.26 | 0.89 |
| 2 | 31.45 | 0.90 |
| 3 | 32.94 | 0.91 |
| 4 | 33.27 | 0.91 |
4.2.5. Effect of different loss functions in the proposed network
An ablation study was conducted to show the effectiveness of the Charbonnier loss, comparing it with two other popular loss functions, L1 and L2. Table 8 confirms the effectiveness of the Charbonnier loss, so we adopted it for reconstructing the derained image.
Table 8. Effect of the loss function on deraining performance.

| Loss function | PSNR | SSIM |
|---|---|---|
| L1 | 31.36 | 0.89 |
| L2 | 31.39 | 0.89 |
| Charbonnier | 31.45 | 0.90 |
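For reference, the three losses compared in Table 8 can be sketched as follows. The Charbonnier constant ε = 10⁻³ is a common choice, though the paper's exact value is not stated in this section.

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute error: robust to outliers, but non-smooth at zero.
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    # Mean squared error: smooth, but heavily penalises large residuals.
    return np.mean((pred - target) ** 2)

def charbonnier_loss(pred, target, eps=1e-3):
    # sqrt((x - y)^2 + eps^2): behaves like L2 near zero and like L1 for
    # large errors, and is differentiable everywhere (unlike L1 at zero).
    return np.mean(np.sqrt((pred - target) ** 2 + eps ** 2))

residuals = np.array([0.0, 0.5, 1.0])
zeros = np.zeros(3)
print(l1_loss(residuals, zeros), l2_loss(residuals, zeros),
      charbonnier_loss(residuals, zeros))
```

The smooth L1-like behaviour is what makes the Charbonnier loss a popular reconstruction objective in image restoration.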
4.2.6. Impact of progressive learning
The impact of the progressive learning strategy adopted in our network is shown in Table 9. Progressive patch training achieves better results than fixed-patch training at a comparable training time.
Table 9. Impact of progressive learning on the proposed network.

| Patch size | PSNR | Train time (hours) |
|---|---|---|
| Progressive (128² to 192²) | 31.45 | 23.4 |
| Fixed (128²) | 31.33 | 22.3 |
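A progressive-patch schedule of the kind described above can be sketched as follows; the even split of iterations across patch sizes is an assumption, not the paper's exact schedule:

```python
def patch_schedule(total_iters, sizes=(128, 160, 192)):
    # Assumed schedule: split training evenly across the patch sizes,
    # growing the patch as training proceeds.
    span = total_iters // len(sizes)
    def patch_at(it):
        return sizes[min(it // span, len(sizes) - 1)]
    return patch_at

patch = patch_schedule(100_000)  # 10^5 iterations, as in the setup
print(patch(0), patch(40_000), patch(99_999))  # 128 160 192
```

Training starts on small patches (cheap iterations, more diverse crops) and finishes on large ones, which exposes the network to longer rain streaks late in training at only a modest increase in total training time.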
4.3. Limitation
Although our DeTformer deraining network achieves superior performance over SOTA networks, it has certain limitations. During testing, when fed a rain-drop image, the network behaves inconsistently and is unable to remove the rain drops, as shown in Fig. 7. This is because the proposed network was not trained on rain-drop images.