Convolutional Neural Network
With the application of convolutional neural networks to image segmentation, the shortcomings of earlier segmentation methods (such as threshold[44], edge detection[45], and region-based[46] methods) have become increasingly obvious. Threshold-based segmentation sets one or more grayscale thresholds, partitions the grayscale histogram into ranges, and treats pixels falling within the same range as belonging to the same class[47]. This method is simple and computationally efficient, but it performs poorly on images that contain rich information.

Edge-based segmentation is a classic technique whose basic principle is to analyze brightness differences between pixels to locate possible boundaries[49]. If the brightness difference between a pixel and its neighbors is significant, that pixel is assumed to lie on a boundary[50]. By detecting and connecting such boundary pixels, edges are formed and the image is divided into regions[51]. This method can effectively distinguish different regions, but image noise may interfere with boundary detection, so appropriate preprocessing and filtering are required[52].

Region-based segmentation connects pixels with similar features and thereby separates different regions[53]. Compared with the other methods, it can effectively reduce the impact of missing spatial continuity on the segmentation result[54]. However, it is prone to improper segmentation, in which pixels that belong to the same region are split into different regions, degrading the result[55].
Therefore, when using this method, the parameters must be selected and tuned carefully. With the development of convolutional neural networks, they have become able to exploit image information effectively to solve a variety of segmentation problems and achieve accurate image segmentation[56], and the field has since developed greatly. Current research on image segmentation methods based on convolutional neural networks mainly covers fully convolutional networks (FCN), the DeepLab family, the Mask Region-based Convolutional Neural Network (Mask R-CNN), and the Pyramid Scene Parsing Network (PSPNet).
2.1 Image segmentation model based on FCN
FCN first applied convolutional neural networks to image segmentation, laying the foundation for the basic network framework of segmentation models[57]. Earlier convolutional neural networks usually stack several convolutional layers and perform feature mapping in a final fully connected layer. In contrast, FCN is fully convolutional: the fully connected layers are replaced by convolutions, and a deconvolution (upsampling) operation restores the output of the final layer to the resolution of the input.
FCN uses a convolutional neural network to extract image features and then uses deconvolution to upsample the feature maps, restoring their resolution. To obtain more accurate segmentation results, FCN uses a skip connection mechanism to fuse feature maps of different resolutions[58]. As shown in Figure 1, the high-resolution and low-resolution feature maps are fused by element-wise addition. A 1×1 convolutional layer then serves as the classification layer, reducing the dimensionality of the feature map and outputting a classification result for each pixel. The output of the classification layer is a probability map giving the probability that each pixel belongs to each category; in segmentation, the category with the highest probability is taken as that pixel's label. Depending on which pooling layers' features are fused, the results are denoted FCN-32s, FCN-16s, and FCN-8s. The segmentation results in Figure 2 show that, owing to the fusion of multi-level features, FCN-8s is significantly better than FCN-32s and FCN-16s.
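The skip-connection fusion and 1×1 classification layer described above can be sketched in NumPy. Everything here is a hypothetical stand-in for learned quantities: the channel count, the random feature maps, and the 3-class weight matrix, with nearest-neighbour repetition standing in for the learned deconvolution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps (channels, height, width): a coarse map from a
# deep layer and a finer map from a shallower layer.
deep = rng.normal(size=(16, 4, 4))      # low resolution
shallow = rng.normal(size=(16, 8, 8))   # high resolution

# Upsample the deep map 2x (nearest neighbour stands in for the learned
# deconvolution) and fuse it with the shallow map by element-wise addition.
up = deep.repeat(2, axis=1).repeat(2, axis=2)
fused = up + shallow

# A 1x1 convolution is a per-pixel linear map over channels; W projects
# the 16 feature channels down to 3 hypothetical classes.
W = rng.normal(size=(3, 16))
scores = np.einsum('kc,chw->khw', W, fused)

# Each pixel's label is the class with the highest score.
pred = scores.argmax(axis=0)            # shape (8, 8)
```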
2.2 Image segmentation model based on DeepLab
The DeepLab series of network models is designed for semantic segmentation. To obtain context information, it uses atrous (dilated) convolution to enlarge the receptive field, overcoming the limitation that a traditional convolution operation only sees information in a local receptive field[62]. Atrous convolution enhances feature extraction by inserting holes between the taps of the convolutional kernel, so it can cover a larger image region without adding parameters or computational cost. This method can efficiently handle a large receptive field and improve model performance[63].
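The parameter-free growth of the receptive field can be checked with a small calculation: a k×k kernel with dilation rate d spans d·(k−1)+1 pixels per side while still holding only k·k weights.

```python
def effective_kernel(k, d):
    """Spatial extent (per side) of a k x k kernel with dilation rate d.
    The taps spread apart, but the parameter count stays k * k."""
    return d * (k - 1) + 1

# A 3x3 kernel (9 parameters in every case) at increasing dilation rates:
spans = {d: effective_kernel(3, d) for d in (1, 2, 4, 8)}
# spans == {1: 3, 2: 5, 4: 9, 8: 17}
```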
In the early DeepLab models[64], as shown in Figure 4(a), the input image first passes through a convolutional feature extraction module. The feature map is then processed by the atrous convolution module, in which kernels with different dilation rates enlarge the receptive field and gather more context information. Next, a global pooling module compresses the feature map into a vector that captures the contextual information of the whole image. Finally, the pooled output is expanded back to the size of the original input image by bilinear interpolation, yielding the final segmentation result[65].
In the DeepLab-V2 model, atrous convolution is used more flexibly through Atrous Spatial Pyramid Pooling (ASPP): exploiting the advantages of atrous convolution, features are extracted at several scales and fused, and a fully connected CRF module is retained for post-processing[66]. On this basis, an encoder-decoder structure built on atrous convolution was proposed. The encoder applies a series of convolutional operations to extract features from the input image while maintaining the integrity of the image information[67]. The decoder upsamples the encoder output and uses skip connections to fuse shallow and deep features, recovering richer image detail[68]. In addition, atrous convolution enlarges the receptive field over the target and preserves its edge and detail information, enabling effective extraction of the target.
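A minimal single-channel sketch of the ASPP idea follows. It is illustrative, not the DeepLab implementation: one shared 3×3 kernel is applied at several dilation rates with "same" padding so the resulting context maps keep the input resolution and can be stacked.

```python
import numpy as np

def dilated_conv2d(x, w, d):
    """Valid 2-D correlation of x with kernel w at dilation rate d."""
    k = w.shape[0]
    span = d * (k - 1) + 1                    # effective kernel extent
    H, W = x.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + span:d, j:j + span:d]   # dilated sampling
            out[i, j] = (patch * w).sum()
    return out

def aspp(x, w, rates=(1, 2, 4)):
    """Apply the same kernel at several dilation rates ('same' padding)
    and stack the resulting context maps along a new axis."""
    k = w.shape[0]
    maps = []
    for d in rates:
        pad = d * (k - 1) // 2                # keeps the output size fixed
        maps.append(dilated_conv2d(np.pad(x, pad), w, d))
    return np.stack(maps)                     # (len(rates), H, W)

x = np.random.default_rng(1).normal(size=(12, 12))
w = np.ones((3, 3)) / 9.0                     # a simple averaging kernel
pyramid = aspp(x, w)
```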
In the DeepLab-V3 model, the image is fed into the feature extraction module, and the resulting feature map is processed by a multi-scale atrous convolution module, in which kernels with different dilation rates obtain broader context and richer semantic information[69]. The ASPP module further enlarges the receptive field by convolving the feature map with atrous kernels of different rates, obtaining contextual information over a wide range of scales. A feature fusion module then improves segmentation accuracy by fusing the features extracted by the different modules. Finally, bilinear interpolation restores the output to the size of the input image[66].
In the DeepLab-V3+ model, as shown in Figure 4(b), an encoder-decoder structure is used, where the encoder follows DeepLab-V3[70]. After the image passes through the backbone network, two feature layers are obtained: the high-level feature layer enters the ASPP module in the encoder, while the low-level feature layer goes directly to the decoder, where a 1×1 convolution compresses its channels, effectively reducing the weight of the low-level features. Figure 5 highlights the segmentation results of DeepLab-V3+ on different objects (dogs, people): the first column is the input image and the second the segmentation result. The figure shows that the segmented image clearly separates foreground from background and also preserves the boundary information of the objects. Experimental results show that this algorithm can effectively segment fine-grained images.
2.3 Image segmentation model based on Mask R-CNN
Mask R-CNN is an instance segmentation model based on Faster R-CNN, which adds pixel-level segmentation for each object on top of object detection[71]. The framework of Mask R-CNN is shown in Figure 6. First, the input image passes through a convolutional neural network to extract feature maps, and candidate target regions are generated by the Region Proposal Network (RPN). The Region of Interest (ROI) pooling layer then converts the features inside each candidate region into a feature vector of fixed size.
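The ROI pooling step can be sketched as follows. The feature map, box coordinates, and output grid size are illustrative; the fixed-size output is formed by max-pooling the cropped region over a grid of roughly equal cells.

```python
import numpy as np

def roi_pool(feat, box, out=2):
    """Max-pool the region of a (C, H, W) feature map under `box`
    (x1, y1, x2, y2 in integer feature-map coordinates) into a fixed
    out x out grid, taking one max per grid cell."""
    x1, y1, x2, y2 = box
    crop = feat[:, y1:y2, x1:x2]
    C, h, w = crop.shape
    ys = np.linspace(0, h, out + 1).astype(int)   # row cell boundaries
    xs = np.linspace(0, w, out + 1).astype(int)   # column cell boundaries
    pooled = np.empty((C, out, out))
    for i in range(out):
        for j in range(out):
            cell = crop[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            pooled[:, i, j] = cell.max(axis=(1, 2))
    return pooled

feat = np.arange(64, dtype=float).reshape(1, 8, 8)
fixed = roi_pool(feat, (0, 0, 8, 8), out=2)
# fixed[0] == [[27, 31], [59, 63]] -- the max of each quadrant
```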
Next, the extracted feature vectors pass through two fully connected layers that predict the category and bounding-box coordinates of each candidate object. In addition, a branch called the mask head predicts the category of every pixel inside each candidate box, yielding pixel-level segmentation information. Finally, the target boxes are filtered by Non-Maximum Suppression (NMS) and the corresponding masks are output as the model's final result.
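The final NMS step can be sketched as a greedy loop; the box format and the 0.5 IoU threshold here are illustrative defaults.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box and
    drop every other box that overlaps it by more than `thresh` IoU."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        mask = [iou(boxes[best], boxes[j]) < thresh for j in rest]
        order = rest[mask]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
# Box 1 overlaps box 0 heavily (IoU ~0.68) and is suppressed; box 2 survives.
kept = nms(boxes, scores, thresh=0.5)   # -> [0, 2]
```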
Unlike semantic segmentation models such as FCN and DeepLab, Mask R-CNN performs instance segmentation on top of semantic segmentation. Compared with earlier instance segmentation models such as FCIS[72] and MNC[73], Mask R-CNN is more flexible and more accurate, and can serve a wider range of image processing tasks, including instance segmentation[74] and object detection[75].
Figure 7 illustrates the segmentation effect of the Mask R-CNN model on objects in different information environments. The model not only enables accurate detection and localization of objects in the image but also performs precise segmentation, thereby allowing differentiation of individual instances within the same class of objects.
2.4 Image segmentation model based on PSPNet
PSPNet is a model for semantic segmentation of scene objects that makes full use of contextual information to parse complex environments. It was the first to append a pyramid pooling module to the convolutional feature map produced by the backbone network: features are pooled over sub-regions of different sizes and then upsampled[76]. In this way the model combines the features of each sub-region into a representation containing both local and global information. Finally, a softmax layer classifies the fused features and a convolution produces the final prediction for every pixel. The model offers multi-scale analysis, feature sharing, gradual refinement, and robustness, which improve the accuracy and stability of image recognition, making PSPNet a commonly used deep learning network in computer vision tasks, with remarkable results in semantic segmentation of scene objects[77].
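The pyramid pooling module described above can be sketched in NumPy, assuming the bin sizes divide the feature-map resolution evenly and using nearest-neighbour repetition in place of the bilinear upsampling of the real model.

```python
import numpy as np

def pool_to_bins(x, n):
    """Average-pool a (C, H, W) map into an n x n grid (H, W divisible by n)."""
    C, H, W = x.shape
    return x.reshape(C, n, H // n, n, W // n).mean(axis=(2, 4))

def pyramid_pool(x, bins=(1, 2, 3, 6)):
    """PSPNet-style pyramid pooling sketch: pool at several grid sizes,
    upsample each grid back to the input resolution, and concatenate
    with the original map along the channel axis."""
    C, H, W = x.shape
    levels = [x]
    for n in bins:
        p = pool_to_bins(x, n)
        levels.append(p.repeat(H // n, axis=1).repeat(W // n, axis=2))
    return np.concatenate(levels, axis=0)

feat = np.random.default_rng(0).normal(size=(8, 6, 6))
out = pyramid_pool(feat)   # 8 * (1 + 4 levels) = 40 channels at 6x6
```

The n=1 level is the global pooling branch: every spatial position of those channels carries the channel-wide mean, i.e. image-level context.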
PSPNet is an advanced semantic segmentation model that adapts to a variety of complex scenes and tasks. Through the pyramid pooling module it combines contextual information from different regions and strengthens the global feature representation[78]. The model also adopts a deeply supervised optimization strategy to train the deep network, and it achieves excellent segmentation results on multiple datasets, surpassing models such as FCN and DeepLab-V2[79]. However, the model also has some drawbacks, such as limited precision under occlusion and inaccurate boundary segmentation for some targets (humans, airplanes, cows), as shown in Figure 8.
2.5 Comparison and analysis of experimental results
Qualitative analysis was conducted on the above methods using three datasets commonly used in image segmentation, PASCAL VOC[81], Microsoft COCO[82], and Cityscapes[83], to compare the performance of convolutional neural network-based segmentation methods and obtain objective, fair test results, as shown in Figure 9. The comparative experiments on the Microsoft COCO dataset show that, relative to the semantic segmentation ground truth (Figure 9(b)), the FCN-8s method (Figure 9(d)) can effectively distinguish the classes of most objects. PSPNet (Figure 9(e)) can classify most targets and achieves good results even in traffic scenes with complex image content. DeepLab-V3+ (Figure 9(f)) can effectively segment most objects and handles boundary details well, giving a very clear overall result. Mask R-CNN (Figure 9(g)) performs instance segmentation: compared with another instance segmentation method (Figure 9(c)), it achieves high classification accuracy while separating different individuals of the same class on top of the semantic segmentation. In summary, FCN, PSPNet, and DeepLab-V3+ can perform semantic segmentation effectively, while Mask R-CNN is suited to instance segmentation and classifies objects very accurately.
The performance of the image segmentation methods introduced in this paper is compared quantitatively with other convolutional neural network-based methods under the existing experimental conditions. The results are shown in Tables 1-3: the mean intersection-over-union (MIoU) is used as the accuracy measure in Tables 1 and 2, and pixel accuracy (PA) in Table 3. Tables 1 and 2 show that, compared with the other segmentation methods, DeepLab-V3+ obtains the highest accuracy on the PASCAL VOC and Cityscapes test datasets, with 89.1% and 82.0% respectively. Table 3 shows that, compared with the other two methods, Mask R-CNN achieves the highest segmentation accuracy on the Microsoft COCO dataset, with pixel accuracy reaching 37.00%.
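The two accuracy measures used in Tables 1-3 can both be computed from a per-class confusion matrix; a minimal sketch on a toy 2-class prediction:

```python
import numpy as np

def confusion(pred, gt, n_cls):
    """n_cls x n_cls confusion matrix; rows = ground truth, cols = prediction."""
    m = np.zeros((n_cls, n_cls), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        m[g, p] += 1
    return m

def mean_iou(pred, gt, n_cls):
    """Mean intersection-over-union across classes (the MIoU of Tables 1-2)."""
    m = confusion(pred, gt, n_cls)
    tp = np.diag(m)                                   # per-class intersection
    union = m.sum(axis=0) + m.sum(axis=1) - tp        # per-class union
    return (tp / union).mean()

def pixel_accuracy(pred, gt):
    """Fraction of pixels labelled correctly (the PA of Table 3)."""
    return (pred == gt).mean()

gt   = np.array([[0, 0, 1], [1, 1, 1]])
pred = np.array([[0, 1, 1], [1, 1, 0]])
# Class 0: IoU 1/3; class 1: IoU 3/5 -> MIoU = (1/3 + 3/5) / 2; PA = 4/6
```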
Table 1 Performance of various segmentation methods on the PASCAL VOC dataset

| No. | Segmentation method | MIoU/% |
| 1 | DeepLab-v3+ | 89.1 |
| 2 | DeepLab-v3 | 85.6 |
| 3 | DeepLab-v2 | 79.8 |
| 4 | PSPNet | 85.5 |
| 5 | FCN-8s | 67.1 |
| 6 | CRF-RNN | 74.8 |
| 7 | DPN | 77.4 |
Table 2 Performance of various segmentation methods on the Cityscapes dataset

| No. | Segmentation method | MIoU/% |
| 1 | DeepLab-v3+ | 82.0 |
| 2 | DeepLab-v3 | 81.2 |
| 3 | DeepLab-v2 | 70.3 |
| 4 | PSPNet | 81.1 |
| 5 | FCN-8s | 65.2 |
| 6 | CRF-RNN | 62.4 |
| 7 | DPN | 66.7 |
Table 3 Performance of various segmentation methods on the Microsoft COCO dataset

| No. | Segmentation method | PA/% |
| 1 | Mask R-CNN | 37.00 |
| 2 | FCIS | 33.50 |
| 3 | MNC | 24.50 |