Dual-fusion Active Contour Model with Semantic Information for Saliency Target Extraction of Underwater Images

Abstract: Underwater vision research is the foundation of marine-related disciplines. Target contour extraction is of great significance to target tracking and visual information mining. Aiming at the problem that conventional active contour models cannot effectively extract the contours of salient targets in underwater images, we propose a dual-fusion active contour model with semantic information. First, saliency images are introduced as semantic information, and salient target contours are extracted by fusing the Chan–Vese and local binary fitting models. Then, the original underwater images are used to supplement the missing contour information through local image fitting. Compared with state-of-the-art contour extraction methods, our dual-fusion active contour model can effectively filter out background information and accurately extract salient target contours.


1 Introduction
In recent years, the development and utilization of the ocean have gradually become an important research direction. Since underwater vision research is the basis of marine-related disciplines, the rapid development of underwater image processing technology is inevitable [1][2]. Image segmentation is a basic method of target extraction, which aims to partition an image into several meaningful constituent regions, each with coherent features such as intensity, color, and texture [3].

Some results have already been achieved in underwater image segmentation. Liu et al. [4] proposed an improved level set algorithm based on the gradient descent method and applied it to segment underwater biological images. Wei et al. [5] improved the K-means algorithm to segment underwater image backgrounds and addressed the issue of improper K value determination; the improved algorithm minimizes the impact of the initial centroid position in grayscale images. SM et al. [6] used the Canny edge detection algorithm to segment underwater images, but Canny edge detection is greatly affected by background noise. Sun et al. [7] and Li et al. [9] used the fuzzy C-means algorithm, and Rajeev et al. [8] used the K-means algorithm, to segment underwater images. However, these clustering algorithms are greatly affected by local gray-level unevenness in underwater images; they also suffer from local convergence errors and are only suitable for underwater images with a single background gray level.

Some investigators have segmented underwater images based on optical properties and achieved good results. For example, Chen et al. [10] proposed an optical feature extraction, calculation, and decision method to identify the collimated region of artificial light, and employed a level set method to segment the objects within the collimated region.
This method could better identify the target region, but the level set method could not filter out background noise when the target region contains background information. Xuan et al. [11] proposed an RGB color channel fusion segmentation method for underwater images. The method obtains a grayscale image with high foreground-background contrast and employs thresholding to conduct fast segmentation. However, when the color of the background region is similar to that of the foreground region, the target cannot be segmented.

Active contour models have also been used for underwater image segmentation. Zhu et al. [12] used a cluster-based algorithm for co-saliency detection to highlight the salient regions in underwater images, and then used a local statistical active contour model to extract the target contours. Qiao et al. [13] proposed an improved method based on the active contour model. The method uses the RGB color space and contrast limited adaptive histogram equalization (CLAHE) to increase the contrast of the sea cucumber thorns and body, respectively, and then extracts the edges of the sea cucumber thorns with an active contour model. Li et al. [14] improved traditional level set methods by avoiding the calculation of the signed distance function (SDF) to segment underwater images; the improved method reduces the computational cost and requires no re-initialisation. Bai et al. [15] proposed a method based on morphological component analysis (MCA) and adaptive level set evolution. MCA is used to sparsely decompose the image into texture and cartoon parts, and a new adaptive level set evolution method, combining a threshold piecewise function with a variable right coefficient and a halting speed function, is used to obtain the edges of the cartoon part.
Shelei et al. [16] segmented underwater grayscale images by fusing the geodesic active contour (GAC) model and the Chan-Vese (CV) model. However, this method requires that the target region of the underwater image has uniform grayscale. Chen et al. [17] integrated the transmission map and the saliency map into a unified level set formulation to extract the salient target contours of underwater images.

As a new image processing technology, neural networks have also been used for underwater image segmentation. O'Byrne et al. [18] proposed the use of photorealistic synthetic imagery for training a deep encoder-decoder network. This method synthesizes virtual underwater images, each rendered image having a corresponding ground-truth per-pixel label map, and then establishes the mapping between the underwater images and the segmented images by training the encoder-decoder network. Zhou et al. [19] proposed a deep neural network architecture for underwater scene segmentation. The architecture extracts features with a pre-trained VGG-16 and learns to expand the lower-resolution feature maps with a decoder. Neural networks have achieved certain results in underwater image segmentation, but the lack of underwater data sets with corresponding annotations is still a problem.

In general, most existing underwater image segmentation methods are designed for images with high foreground-background contrast and a single background grayscale. For underwater images with varying background grayscale and targets with complex texture, the segmentation results of the above methods are not satisfactory. To address this problem, we propose a novel dual-fusion model with semantic information for salient object segmentation of underwater images with complex backgrounds.
In summary, the contributions of our model are as follows:
- We introduce saliency maps as semantic information to separate foreground information from background information;
- A dual-fusion energy equation is proposed to extract the contours of saliency targets by integrating local and global intensity fitting terms;
- For missing saliency target information, we propose a correction module that corrects saliency target contour errors by introducing the original image contour information.

This paper is organized as follows. Section 2 reviews related works. In Section 3, we introduce in detail the derivation of the dual-fusion model. Section 4 describes the experiments; we compare the proposed method with state-of-the-art segmentation methods, and the results demonstrate its superiority. Section 5 discusses the parameters of the proposed model. Section 6 concludes the paper.

2 Related works

2.1 The C-V model
The Chan-Vese (CV) model [20] is derived from the Mumford-Shah (MS) functional [21]. The MS functional aims to find an optimal piecewise smooth approximation $u$ of an image $I$ and a contour $C$:

$$E^{MS}(u, C) = \int_\Omega (u - I)^2 \,dx + v \int_{\Omega \setminus C} |\nabla u|^2 \,dx + \mu |C|, \qquad (1)$$

where $\mu > 0$ and $v > 0$ are positive weighting constants and $|C|$ is the length of the contour $C$. However, the non-convexity of this energy functional makes it difficult to minimize, so the CV model was proposed to simplify and modify it. The energy functional of the CV model can be defined as follows:

$$E^{CV}(c_1, c_2, C) = \mu |C| + \lambda_1 \int_{inside(C)} |I(x) - c_1|^2 \,dx + \lambda_2 \int_{outside(C)} |I(x) - c_2|^2 \,dx, \qquad (2)$$

where $\lambda_1, \lambda_2 > 0$ are weighting constants, and $c_1$ and $c_2$ are the average intensities inside and outside the contour $C$:

$$c_1 = \frac{\int_{inside(C)} I(x)\,dx}{\int_{inside(C)} dx}, \qquad c_2 = \frac{\int_{outside(C)} I(x)\,dx}{\int_{outside(C)} dx}. \qquad (3)$$

2.2 The LIF model

The local image fitting (LIF) model [22] minimizes the difference between the original image and a local fitted image:

$$E^{LIF}(\phi) = \frac{1}{2} \int_\Omega \left| I(x) - I^{LFI}(x) \right|^2 dx, \qquad (4)$$

where $\phi$ is the level set function. The local fitting intensities $m_1$ and $m_2$ are the mean intensities of $I$ in the regions $\{\phi > 0\} \cap W_x$ and $\{\phi < 0\} \cap W_x$, respectively:

$$m_1 = \mathrm{mean}\big(I(y) : y \in \{\phi > 0\} \cap W_x\big), \qquad m_2 = \mathrm{mean}\big(I(y) : y \in \{\phi < 0\} \cap W_x\big), \qquad (5)$$

where $W_x$ is a truncated Gaussian window or a constant window centered at $x$. The local fitted image can then be expressed as:

$$I^{LFI}(x) = m_1 H_\varepsilon(\phi(x)) + m_2 \big(1 - H_\varepsilon(\phi(x))\big), \qquad (6)$$

where $H_\varepsilon$ is a regularized Heaviside function. The LIF model then uses the calculus of variations and the steepest descent method to minimize $E^{LIF}$.

3 The proposed method

In this section, we propose a dual-fusion active contour model with semantic information to extract the target contours of underwater images. Without semantic information, existing methods cannot separate the target contour from the background, so it is necessary to introduce semantic information and roughly extract the saliency target contour from the complex background. To avoid extraction errors in the saliency target, we introduce the original image contour to correct and supplement the missing contour information. Through the semantic information and the correction module, the proposed model can accurately extract the saliency target contour from a complex background.
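As a hedged numerical illustration of the CV fitting reviewed above (NumPy; the helper names are ours, not from the original implementation), the constants c1 and c2 are simply region means, and the sign of the pointwise data force tells each pixel which side of the contour fits it better:

```python
import numpy as np

def cv_region_means(image, phi):
    """CV constants: c1 = mean inside the contour (phi > 0), c2 = mean outside."""
    inside = (phi > 0).astype(float)
    c1 = (image * inside).sum() / inside.sum()
    c2 = (image * (1 - inside)).sum() / (1 - inside).sum()
    return c1, c2

def cv_fitting_force(image, phi, lam1=1.0, lam2=1.0):
    """Pointwise CV data term: negative where a pixel fits the inside mean better."""
    c1, c2 = cv_region_means(image, phi)
    return lam1 * (image - c1) ** 2 - lam2 * (image - c2) ** 2

# Toy image: a bright 4x4 square (intensity 1.0) on a dark background (0.0).
img = np.zeros((8, 8)); img[2:6, 2:6] = 1.0
phi = np.where(img > 0.5, 1.0, -1.0)  # contour initialised exactly on the square
c1, c2 = cv_region_means(img, phi)
print(c1, c2)  # prints: 1.0 0.0
```

When the contour matches the object, the force is negative inside and positive outside, so a gradient descent on the CV energy leaves the contour in place.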

3.1 Saliency image fitting energy
In this paper, we use the pyramid feature attention network [23] to acquire the saliency images. However, due to the low contrast of underwater images, there are some errors in the saliency detection results, such as locally inhomogeneous intensity, background noise, and missing contour information. In view of the locally inhomogeneous intensity of the saliency images, we preliminarily employ local binary fitting to construct the energy functional $E_{sal}$:

$$E_{sal}(C, f_1, f_2) = \lambda_1 \int_\Omega \int_{inside(C)} K_\sigma(x - y) \left| S(y) - f_1(x) \right|^2 dy\,dx + \lambda_2 \int_\Omega \int_{outside(C)} K_\sigma(x - y) \left| S(y) - f_2(x) \right|^2 dy\,dx,$$

where $S$ is the saliency image, $C$ is a contour in the image domain $\Omega$, $K_\sigma$ is a Gaussian kernel, and $f_1$ and $f_2$ are the image local fitting intensities near the point $x$. The local fitting intensities $f_1$ and $f_2$ can be expressed as follows [24][25]:

$$f_1(x) = \frac{K_\sigma(x) * \big[H_\varepsilon(\phi(x)) S(x)\big]}{K_\sigma(x) * H_\varepsilon(\phi(x))}, \qquad f_2(x) = \frac{K_\sigma(x) * \big[(1 - H_\varepsilon(\phi(x))) S(x)\big]}{K_\sigma(x) * \big(1 - H_\varepsilon(\phi(x))\big)},$$

where $\phi$ is the level set function and the regularized Heaviside function $H_\varepsilon$ can be expressed as:

$$H_\varepsilon(z) = \frac{1}{2}\left[1 + \frac{2}{\pi} \arctan\left(\frac{z}{\varepsilon}\right)\right].$$

However, local binary fitting may introduce local minima and is sensitive to noise. Affected by the accuracy of saliency detection, the saliency maps of underwater images inevitably contain background noise; the initialization curve also greatly affects the segmentation results. To solve these problems, we introduce the global fitting term of the CV model into the energy functional $E_{sal}$. The local-global fitting intensities can be expressed as follows:

$$I_1(x) = \omega c_1 + (1 - \omega) f_1(x), \qquad I_2(x) = \omega c_2 + (1 - \omega) f_2(x),$$

where $I_1$ and $I_2$ are the mixed intensities, $c_1$ and $c_2$ are the two constants derived from Eq. (3), and $\omega \in [0, 1]$ is a weight balancing the global and local fitting terms. For the test images in this paper, $\omega$ can be taken from 0.5 to 0.9; the more inhomogeneous the image intensity, the smaller the value of $\omega$ should be. With the level set representation, the energy functional can be expressed as follows:

$$E_{sal}(\phi, f_1, f_2) = \lambda_1 \int_\Omega \int_\Omega K_\sigma(x - y) \left| S(y) - I_1(x) \right|^2 H_\varepsilon(\phi(y))\, dy\,dx + \lambda_2 \int_\Omega \int_\Omega K_\sigma(x - y) \left| S(y) - I_2(x) \right|^2 \big(1 - H_\varepsilon(\phi(y))\big)\, dy\,dx.$$

The improved fitting energy $E_{sal}$ not only takes local intensity information into account but also avoids local minimization. Therefore, for the saliency images of underwater images, the improved energy functional can extract the contours of inhomogeneous images more accurately.
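A minimal sketch of the local-global mixing above, assuming SciPy's `gaussian_filter` as the kernel $K_\sigma$ (the function names are illustrative, not the authors' code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_fitting(S, phi, sigma=3.0, eps=1e-8):
    """LBF local fitting intensities f1, f2: Gaussian-window weighted means of
    the saliency map S inside (phi > 0) and outside (phi < 0) the contour."""
    H = 0.5 * (1 + (2 / np.pi) * np.arctan(phi))  # smoothed Heaviside (eps = 1)
    f1 = gaussian_filter(H * S, sigma) / (gaussian_filter(H, sigma) + eps)
    f2 = gaussian_filter((1 - H) * S, sigma) / (gaussian_filter(1 - H, sigma) + eps)
    return f1, f2

def mixed_intensities(S, phi, c1, c2, omega=0.7, sigma=3.0):
    """Local-global mix: omega weights the global CV constants c1, c2;
    a smaller omega leans more on the local fit (inhomogeneous images)."""
    f1, f2 = local_fitting(S, phi, sigma)
    return omega * c1 + (1 - omega) * f1, omega * c2 + (1 - omega) * f2
```

With `omega = 1.0` the mix reduces to the purely global CV constants; with `omega = 0.0` it reduces to the purely local LBF intensities, matching the parameter discussion above.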

3.2 Original image fitting energy
The problems of locally inhomogeneous intensity and noise can be solved by fusing the local intensity fitting and the CV model. However, the missing contour information of the saliency image still needs to be handled. Therefore, the original underwater images are used to make up the missing contour information.

In this paper, we use the local image fitting (LIF) model [22] to extract the contours of the original underwater images. The energy functional can be expressed as:

$$E_{org}(\phi) = \frac{1}{2} \int_\Omega \left| I(x) - I^{LFI}(x) \right|^2 dx,$$

where $I$ is the original underwater image and $I^{LFI}(x)$ is the local fitted image shown in Eq. (6). Although models such as LBF [24][25], RMPCM [3], and LGIF [26] can also extract the target contours of underwater images very well, as shown in Fig. 1, the LIF model has higher efficiency. The higher efficiency is because the energy functional of the LIF model does not include a kernel function. Also, the LIF model can fit the original image well while significantly reducing noise, by minimizing the difference between the fitted image and the original image.

Fig. 1 shows the segmentation results of the LBF, LGIF, LIF, and RMPCM models. In Fig. 1, the LBF, LGIF, and LIF models could all extract the target contour reasonably well, but LBF was more sensitive to the initial contour curve. The energy functionals of LGIF and RMPCM both involve a kernel function, which performs more than one convolution operation per iteration step, so their evolution speed is slow. The running times of the above models are shown in Table 1. Therefore, the LIF model is used to supplement the missing contour information of the salient target.
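A minimal sketch of the LIF fitting described above, with a uniform window standing in for $W_x$ and a sharp Heaviside for simplicity (names and simplifications are ours):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lif_fitted_image(I, phi, window=5, eps=1e-8):
    """Local fitted image I_LFI = m1*H + m2*(1-H), where m1 and m2 are the
    window means of I inside (phi > 0) and outside (phi < 0) the contour."""
    H = (phi > 0).astype(float)  # sharp Heaviside for this sketch
    m1 = uniform_filter(I * H, window) / (uniform_filter(H, window) + eps)
    m2 = uniform_filter(I * (1 - H), window) / (uniform_filter(1 - H, window) + eps)
    return m1 * H + m2 * (1 - H)

def lif_energy(I, phi, window=5):
    """E_LIF = 1/2 * sum (I - I_LFI)^2; note there is no kernel convolution
    in the energy itself, which is the source of the LIF speed advantage."""
    diff = I - lif_fitted_image(I, phi, window)
    return 0.5 * float((diff ** 2).sum())
```

On a piecewise-constant image the energy is near zero when the contour matches the step edge, and grows when the contour ignores it, which is what drives the evolution.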

3.3 Dual-fusion active contour model
To make the fitting energy smaller at target contours than elsewhere, we use an edge indicator function [27]. The function can be expressed as follows:

$$g = \frac{1}{1 + |\nabla (G_\sigma * I)|^2},$$

where $G_\sigma$ is a Gaussian kernel with standard deviation $\sigma$. Then we define the dual-fusion intensity fitting energy functional as follows:

$$E(\phi) = E_{sal}(\phi) + \mu_1 E_{org}(\phi),$$

with the edge indicator $g$ weighting the fitting terms pointwise, where $\mu_1 > 0$ is a constant that controls the relative influence of the saliency image fitting energy and the original image fitting energy. The functional is minimized by the steepest descent method. Instead of the traditional regularization term, a Gaussian filter is used to regularize the level set function. Therefore, the smoothing process of the level set function can be expressed as:

$$\phi^{k+1} = G_\sigma * \phi^{k},$$

where $\sigma$ is the standard deviation of the Gaussian filter and $\Delta t$ is the time-step of the evolution. In fact, the smoothing effect of Gaussian filtering on the level set function is slightly worse than that of the traditional regularization term and is greatly affected by the time-step. However, the computing efficiency of Gaussian filtering is much higher than that of the traditional regularization term.
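These two ingredients, the edge indicator and the Gaussian regularization of the level set, can be sketched as follows (SciPy-based; helper names are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_indicator(I, sigma=1.5):
    """g = 1 / (1 + |grad(G_sigma * I)|^2): close to 0 on strong edges,
    close to 1 in flat regions, so fitting energy is lowest on contours."""
    Is = gaussian_filter(np.asarray(I, dtype=float), sigma)
    gy, gx = np.gradient(Is)  # row derivative first, then column derivative
    return 1.0 / (1.0 + gx ** 2 + gy ** 2)

def smooth_level_set(phi, sigma=1.0):
    """Gaussian regularisation of phi, replacing the traditional penalty term;
    applied once per evolution step."""
    return gaussian_filter(np.asarray(phi, dtype=float), sigma)
```

The trade-off noted above is visible here: one `gaussian_filter` call per iteration is cheap, but its smoothing strength is tied to `sigma` and the chosen time-step rather than to an energy term.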

4 Experiments
In this section, the proposed method is tested on several underwater images with intensity inhomogeneity and compared with state-of-the-art contour extraction methods in terms of efficiency and accuracy. To ensure the fairness of the comparison, all contour extraction results were produced on the same computer, configured with an Intel(R) Core(TM) i7-8650U CPU @ 2.11 GHz, 16.00 GB memory, and the Windows 10 (x64) system, with MATLAB R2017a as the software platform. We use the same parameters for all experiments.

4.1 The benefits of local-global intensity fitting
A comparative experiment was performed to demonstrate the effectiveness of the local-global intensity fusion described in Section 3.1. We conducted different experiments, as shown in Table 2.

Table 2 The comparative experiment of local-global intensity

As shown in Fig. 2, Experiment A could extract the target contour in the intensity-inhomogeneous region, but the result was greatly affected by the initial contour curve (blue circled area) and was sensitive to noise (green circled area); the method of Experiment A also extracts the contours of non-boundary regions. Experiment B could extract the target contour in the intensity-homogeneous region and was not disturbed by noise, but could not extract the target contour in the intensity-inhomogeneous region. Our method could extract the target contour in both the homogeneous and inhomogeneous regions without being disturbed by noise.

In Fig. 3(b), the marked coordinate point lies on the target edge, but in Fig. 3(c) the coordinate point at the same position is inside the target instead of on the target edge. This error is caused by the deviation of saliency detection. Therefore, it is necessary to use the original image to supplement the missing information. We use the local image fitting method to extract the contour information of the original image, and then use this contour information to correct the deviation caused by saliency detection. The result of the correction is shown in Fig. 3(e): the missing contour information of the saliency image is accurately supplemented, and the background information is filtered out.

As shown in Fig. 4, our method can filter out the background information and accurately extract the target contour. Fig. 4(b) shows the saliency images of the original underwater images; the red circles mark the intensity-inhomogeneous regions, the yellow circles the noise regions, and the green circles the missing regions of the target.
For the regions of intensity inhomogeneity and noise, our method can still extract the target contour well through the local-global intensity fitting term. Although the saliency image of the first image obviously lacks part of the target information (green circled region), our method can still extract the complete target contour by integrating the original image contour information, as can be seen from Fig. 5. The third row shows the results of our method.

As can be seen in Fig. 6, even though our method, Ref. [12], and Ref. [17] all introduce semantic information, our method can extract the target contour more accurately than Ref. [12] and Ref. [17]. As shown in the blue circled regions of Fig. 6(a) and Fig. 6(b), our method extracts the target contour in the detail regions more accurately. This is because the added local-global fitting term better extracts the contours of locally inhomogeneous regions, and the original image correction module corrects errors in the semantic information. As shown in the green circled regions of Fig. 6(c) and Fig. 6(d), our method filters out background noise better than Ref. [17] and is more robust.

To further verify the superiority of the proposed method, we also compared the contour extraction results of several classic models with the saliency image as input. To test robustness, we only selected low-quality saliency images (with inhomogeneous local intensity and incomplete saliency information) for the comparison experiments. As shown in Fig. 7, the segmentation results of LBF are severely affected by the initial contour curve and are disturbed by the inhomogeneous regions inside the target. The LGIF model can avoid the influence of the initial contour curve, but cannot extract complete contour information, as shown in the green dotted region in Fig. 7(2). The LIF model can extract the target contour relatively completely, but it easily falls into a local optimum and is also affected by the initial contour curve. The RMPCM model avoids the local optimum error, but it also cannot extract the contour information completely, as shown in the green dotted region in Fig. 7(2). Our method can not only effectively avoid local optima but also supplement the missing contour information through the original image, so its results are more accurate and complete than those of the other methods.

4.2 Quantitative comparison
In the following experiment, we compare the proposed method with the aforementioned methods using several evaluation indices to conduct a quantitative analysis. Three evaluation indicators, namely the mean absolute error (MAE), the error rate (ER), and the detection rate (DR), are employed for the quantitative comparison. The MAE can be expressed as:

$$MAE = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| Det(i,j) - gt(i,j) \right|,$$

where $m$ and $n$ represent the length and width of the image, $Det$ is the result of image segmentation, and $gt$ is the hand-crafted ground truth; the ER measures the proportion of wrongly segmented pixels, and the DR measures the proportion of correctly extracted contour pixels. The evaluation results of the five methods are shown in Table 3, Table 4, and Table 5.

Table 3 The MAE results of LBF, LGIF, LIF, RMPCM and our method
Table 4 The ER results of LBF, LGIF, LIF, RMPCM and our method
Table 5 The DR results of LBF, LGIF, LIF, RMPCM and our method

As shown in Tables 3 and 4, the differences between our results and the ground truth are the smallest, so the proposed method has the highest accuracy. Table 5 shows the detection rates of the above five methods. The detection rate represents how many contour pixels are correctly extracted; therefore, our model, with the highest detection rate, extracts the target contour most accurately.
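A minimal sketch of the MAE computation above (NumPy; the exact ER/DR formulas used in the paper are not reproduced here):

```python
import numpy as np

def mae(det, gt):
    """Mean absolute error between a binary segmentation and the ground truth,
    averaged over all m*n pixels."""
    det = np.asarray(det, dtype=float)
    gt = np.asarray(gt, dtype=float)
    m, n = gt.shape
    return float(np.abs(det - gt).sum() / (m * n))

# Toy 4x4 example: one wrong pixel out of 16 -> MAE = 1/16.
gt = np.zeros((4, 4)); gt[1:3, 1:3] = 1
det = gt.copy(); det[0, 0] = 1
print(mae(det, gt))  # prints: 0.0625
```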

5 Discussion
In this paper, the parameter $\mu_1$ is a constant which controls the relative influence of the saliency image fitting energy and the original image fitting energy. When the missing information of the saliency target contour is severe, $\mu_1$ should be relatively large; otherwise, $\mu_1$ should take a small value. Also, $\omega$ should be taken smaller when the intensity inhomogeneity of the saliency image is severe, because the local intensity fitting can better segment targets in intensity-inhomogeneous regions, and the contour extraction results then rely more on the local intensity fitting.

Tables

Table 1 Iterations and CPU time (in seconds)