Superpixel Driven Unsupervised Deep Image Super-Resolution

Most existing deep learning-based image super-resolution methods require large training datasets with ground truth. However, such methods are not well suited to restoring real images from different domains. Recently, Deep Image Prior (DIP) has explored single-image priors by using the network structure itself as an implicit image prior to recover images, but it ignores the explicit prior information carried by the image itself. Adding image priors can effectively alleviate the ill-posed nature of image restoration models. Therefore, in this paper, we propose a segmentation-driven unsupervised deep image super-resolution (SR) method. Intuitively, a clear image has a clearer segmentation boundary, so forcing the restored image to have clear boundaries drives the deep neural network toward a higher-quality SR image. To let this energy flow into DIP effectively, we adopt a fully convolutional network-based superpixel method and use back propagation to inject the gradient generated by the segmentation entropy energy into DIP, yielding lower-energy optimized parameters. Experiments show that the images generated by our method have clearer boundaries and better performance than those generated by DIP on Set5, Set14, and BSD100.


Introduction
Image super-resolution reconstruction aims to generate a high-resolution (HR) image from a low-resolution (LR) image by algorithmic means. In practice, image resolution is limited by signal transmission bandwidth and noise interference, and current manufacturing constraints make it difficult to improve hardware resolution. Therefore, increasing attention is being paid to image super-resolution algorithms. With the development of deep learning, researchers are both studying how to deploy models on edge devices [1,2] and pursuing the high-quality results that deep learning can deliver. Super-resolution algorithms are now widely used in medical imaging [3][4][5], remote sensing imaging [6][7][8], and image compression [9][10][11].
Existing image super-resolution methods fall into three main categories: interpolation-based, reconstruction-based, and deep learning-based. Interpolation methods [12][13][14] select appropriate pixel coordinates for image interpolation based on the relationship between adjacent pixels. However, these methods do not consider the semantic information of the whole image. Because they only use the values of adjacent pixels in the original image to increase the resolution, the edges of the reconstructed image are poor.
Reconstruction-based methods introduce effective prior information into the reconstruction process and use it to constrain the image. Schultz [15] introduced probabilistic priors into image super-resolution, while Sun [16] applied image gradient profile information to restore edge contours. When the up-sampling factor is too large, reconstruction-based methods obtain little prior knowledge and their performance declines sharply.
Deep learning-based methods mine detailed image features with convolutional neural networks and recover the image using their strong nonlinear fitting ability. Supervised SR methods [17][18][19][20][21] use paired low- and high-resolution images to learn the mapping from LR to HR. However, these methods require a large amount of training data and ground truth (GT). Unsupervised methods do not use GT and instead train on properties of the image itself. The methods in [22][23][24][25] learn the degradation process by learning the domain mapping between different images, [26,27] train super-resolution networks based on frequency separation, [28,29] use the network structure as the prior, and [30,31] exploit image statistics within a single image. Unsupervised methods learn the image degradation process so that the restored image better matches the domain distribution of real images. However, these methods neglect the treatment of image boundary details. Superpixel segmentation yields multiple superpixel blocks and their edges, from which richer segmentation boundary information can be extracted.
In this paper, we propose a segmentation-driven unsupervised deep image SR method. Intuitively, a clear image has a clearer segmentation boundary. We use a superpixel segmentation method to extract the contour boundaries of the image and force them to become clear. The gradient generated by the segmentation entropy energy flows into the neural network of DIP [28] through back propagation, forcing DIP to produce a higher-quality SR image. Specifically, we add segmentation entropy to each iteration of DIP as a driver; this makes DIP focus more on edge detail recovery in each iteration and yields an image with clear edge contours. Experiments on the Set5, Set14, and BSD100 datasets show that the images generated by our method have clearer boundaries and better performance than those generated by DIP.
The main contributions of this paper are summarized as follows.
• We propose a segmentation-driven unsupervised deep image SR method and use back propagation to inject the gradient generated by the segmentation entropy energy into DIP.
• We propose a method for computing segmentation entropy within a deep learning framework.
• We address the problem of insufficient edge detail recovery in unsupervised super-resolution methods.

Related Works
Existing image super-resolution methods fall into three main categories: interpolation-based, reconstruction-based, and deep learning-based.
Interpolation-based methods mainly use the relationship between adjacent pixels to select appropriate pixel coordinates for image interpolation. Blu [12] proposed setting the value of an interpolation point to that of the pixel with the shortest Euclidean distance from it. However, the results show an obvious sawtooth effect and the magnification quality is unsatisfactory. To address this, linear interpolation over the four adjacent pixels in the vertical and horizontal directions is used to interpolate the image [14]. The sawtooth effect of the enlarged image is reduced, but the edges are blurred. Li [13] then proposed an edge-guided interpolation method, which assumes that low-resolution and high-resolution images share the same edge information. The prediction coefficients of the optimal linear super-resolution mapping are derived by calculating the local covariance of the edges of the low-resolution image. This method solves the problem of edge sharpening, but its algorithmic complexity is high. Because interpolation-based methods do not consider the semantic information of the whole image, their reconstruction performance is limited.
Reconstruction-based methods mainly use prior information to constrain image restoration. Schultz [15] introduced Maximum A Posteriori (MAP) estimation into image super-resolution reconstruction. However, the edges of the resulting high-resolution images are overly smooth. To solve this problem, Sun [16] proposed an edge-guided image prior reconstruction method that models the statistics of the image's gradient profiles and effectively sharpens image edges. The reconstruction quality depends on how well the statistical model matches the gradient profiles of the image, and once the magnification factor of such reconstruction-based methods becomes too large, their performance drops sharply.
Deep learning-based methods extract deep features of an image through a convolutional neural network and recover the image using its strong nonlinear fitting ability. They can be divided into supervised and unsupervised SR.
Supervised super-resolution methods use large datasets with ground truth (GT) to learn the mapping between low-resolution and high-resolution images, and then predict high-resolution images from the learned mapping. Dong [17] introduced convolutional neural networks into image super-resolution and proposed SRCNN. Specifically, SRCNN resizes the image by interpolation and then obtains the high-resolution image through a nonlinear mapping implemented with a three-layer convolutional network; the mapping from low- to high-resolution images is learned by this network. However, resizing the image by interpolation before feeding it to the network degrades restoration performance. Shi [19] proposed the sub-pixel convolution layer, which does not require an up-sampling step on the low-resolution input but instead realizes the enlargement indirectly through sub-pixel convolution, improving the reconstruction quality. However, the above methods use the mean square error as the objective, which tends to produce overly smooth images that lack perceptual realism. Ledig [18] proposed SRGAN, applying GAN [32] to the super-resolution task and defining a new perceptual loss on the high-level feature maps of a VGG [33] network; the discriminator makes the generated high-resolution image visually similar to the ground truth. More recently, RCAN [20] and SAN [21] have introduced channel attention and second-order channel attention to exploit feature correlations for improved performance. However, the datasets used by these methods are obtained through known degradation processes, so models trained on them often perform poorly on real-world low-resolution images from different domains. Therefore, researchers are paying more attention to unsupervised super-resolution methods.
Unsupervised SR realizes image restoration by learning the image degradation process. Bulat proposed a two-stage process to learn the degradation [22]. First, unpaired LR-HR images are used to train an HR-to-LR GAN that learns the degradation process and produces natural LR images from HR images to simulate real low-resolution data. Then, an LR-to-HR GAN is trained with LR-HR pairs generated by the first GAN. However, this method does not take the feature distribution of the generated high-resolution images into account. Yuan therefore proposed the Cycle-in-Cycle (CinCGAN) structure [23], which treats the LR space and HR space as two domains and learns the mapping between them with a cycle-in-cycle structure. The network first maps the noisy, blurred input to a clean low-resolution space that matches the real-world feature distribution, and then compares the feature distribution of the output high-resolution image with that of unpaired real-world high-resolution images. This accounts for the feature distributions of both the low- and high-resolution images. However, the SR part of this model is a pre-trained model. To address this, Maeda starts from a high-resolution image, downsamples it, maps the downsampled image into the real low-resolution domain, and then passes it through an up-sampling network to obtain a high-resolution image, comparing the result with real images in that domain [24]. Recently, Wei proposed a domain-distance map to further reduce domain bias [25], assigning different importance to regions according to their distance from the target domain.
FSSR [26] and Zhou et al. [27] proposed learning a downsampling process to generate paired data and then training an SR network on the generated data in a supervised manner. FSSR introduces frequency separation, which guides the network to perform domain migration of the high-frequency components and uses the migrated images for SR network training [26]. Zhou et al. improved on FSSR [26] by proposing a color-guided domain mapping network to alleviate the color shift in the domain transformation process, and modified the discriminator of the super-resolution stage so that the network preserves both high-frequency and low-frequency features [27].
However, these methods do not use prior information from the image to constrain the restoration process. Deep Image Prior (DIP) [28] uses a randomly initialized convolutional neural network (CNN) as the prior, relying on the observation that the CNN structure alone captures a large amount of low-level image statistics. It takes a random vector z as input and tries to generate the target HR image. Since the network is randomly initialized and never pre-trained, the only prior is the CNN structure itself. However, because this method uses the network only as an implicit prior, the quality of the restored images is limited.
Enhanced Image Prior (EIP) [29] improves on DIP by introducing an external high-resolution reference image to enhance the image prior and update the input noise.
Other approaches learn from the internal information of an image, on the premise that the statistics within a single image provide sufficient information for super-resolution (SR). ZSSR [30] exploits the nonlocal self-similarity of the image, i.e., the internal recurrence of information within a single image. To super-resolve a low-resolution (LR) image, the image is downsampled again so that super-resolution parameters can be learned between the LR image and its downsampled version; these parameters are then applied to the LR image to obtain the high-resolution (HR) image. The method relies on internal learning, with the mapping learned from the image itself. However, ZSSR [30] has a long training time and requires thousands of iterations. To address this issue, MZSR [31] introduces a training scheme based on meta-transfer learning that learns an effective initial weight for rapid adaptation to new tasks.

Methodology
While Deep Image Prior (DIP) can generate high-resolution images by using the network structure as an implicit prior, the resulting image quality is not always satisfactory. We therefore propose incorporating explicit priors into DIP, which helps alleviate the ill-posed nature of image restoration models, especially for super-resolution tasks.
We observe that a clear image has a clearer segmentation boundary, so forcing the restored image to have clear boundaries drives the DNN toward a higher-quality SR image. The FCN-based superpixel method divides pixels into multiple superpixel blocks and edges. However, it may lose some segmentation details, because the segmentation loss is obtained by weighted averaging over the predicted correlation matrix q and the subsequent reconstruction. To overcome this limitation, we propose using the correlation matrix q directly and transforming it into a segmentation entropy.
To obtain lower-energy optimized parameters, we use back propagation to inject the gradient generated by the segmentation entropy energy into Deep Image Prior (DIP). Based on this idea, we calculate the segmentation entropy with a convolutional network and add it to each iteration of DIP to force the DNN to generate higher-performance super-resolution (SR) images. Figure 1 shows the segmentation entropy of images with clear and blurred edges; the clearer the boundary, the lower the segmentation entropy. To verify this idea, we conducted experiments on the BSD500 dataset, which contains 500 complex natural images. As shown in Fig. 2, we processed the dataset with varying degrees of blur and calculated the segmentation entropy of the images at each level of blur. The results demonstrate that the clearer the boundary, the smaller the segmentation entropy. We used different sampling methods, including nearest, bilinear, and bicubic interpolation, to downsample and then upsample the dataset. The edges obtained by bicubic interpolation are clearer than those obtained by bilinear interpolation, so the segmentation entropy over the whole dataset is smaller. Since images obtained by nearest interpolation exhibit a sawtooth effect, the more obvious the effect, the smaller the segmentation entropy. We downsampled the dataset by factors of 2, 4, 6, and 8 and then upsampled it back to the original image size. In Fig. 2, 'Up_factor' and 'Down_factor' denote the up- and downsampling operations applied to the dataset; the abscissa is the scaling factor of these operations and the ordinate is the segmentation entropy of the dataset. For bilinear and bicubic interpolation, the larger the factor, the more blurred the resulting image boundary. These results confirm that the clearer the segmentation boundary, the smaller the entropy. We used a convolutional neural network to calculate the correlation matrix q and converted it into segmentation entropy. To prevent DIP from falling into local minima, we added L2 regularization as a constraint.
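As a rough illustration of the degradation used in this experiment, the sketch below down- and then up-samples an image tensor by a given factor with different interpolation modes; the degraded images would then be passed through the superpixel network to measure their segmentation entropy. The function name and tensor layout are illustrative, not the paper's code.

```python
import torch.nn.functional as F

def blur_by_resampling(img, factor, mode):
    """Down- then up-sample a (1, C, H, W) image tensor by `factor`.

    `mode` is one of 'nearest', 'bilinear', 'bicubic'; for the smooth modes,
    larger factors give blurrier results, mirroring the Fig. 2 experiment.
    """
    h, w = img.shape[-2:]
    small = F.interpolate(img, scale_factor=1.0 / factor, mode=mode)
    return F.interpolate(small, size=(h, w), mode=mode)
```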
In this paper, we propose a segmentation-driven unsupervised deep image SR method. The reconstruction of the super-resolution image proceeds in three steps. First, a random code vector z is used as the input to the DIP network, which outputs a high-resolution image. Second, the high-resolution image is sent to the segmentation network for training to obtain the segmentation entropy. Finally, we use back propagation to inject the gradient generated by the segmentation entropy energy into DIP, while also adding an L2 regularization term as a constraint. The method is described step by step in the following sections.

Deep Image Prior Network
The image is generated by x = f_θ(z), which maps the random code vector z to the image x sampled from the real image distribution. We interpret the neural network as a parameterized function x = f_θ(z), where z is a random code vector, θ denotes the network parameters, and x is the output obtained with parameters θ. To illustrate the effect of this parameterization, we consider the image inverse problem, which can be expressed as the energy minimization problem in Eq. (1).
Here E(x; x_0) is a task-dependent data term, R(x) is a regularization term that captures generic priors of natural images, x* is the target image, and x_0 is the image to be restored. In this paper, we replace the regularization term R(x) with a neural network that learns the mapping from the random code vector z to the degraded image, which allows us to replace Eq. (1) with Eq. (2).
The image x* is obtained using the learned parameters θ. The code tensor z has 32 feature maps of the same spatial size as x* and is filled with uniform noise. In Eq. (2), R(x) is not eliminated but rather implicit, taking the extreme form R(x) = 0; the image is instead generated from z through a specific CNN structure.
A high-capacity network can thus serve as the prior. We aim to find a set of parameters θ that reconstructs the target image x_0, including its random noise, without imposing any constraint on the generated image. The optimization function is given by Eq. (3), which computes the L2-norm between x and x_0.
Substituting Eq. (3) into Eq. (2) yields the optimization problem in Eq. (4). For super-resolution tasks, the data term is set as in Eq. (5). The training process is outlined in Algorithm 1.
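Equations (1)-(5) are not legible in this version; a plausible reconstruction following the standard DIP formulation is given below, where d(·) denotes the fixed downsampling operator of the SR task. The exact notation in the original may differ slightly.

```latex
% Plausible reconstruction of Eqs. (1)-(5), following the standard DIP formulation.
\begin{align}
  x^{*} &= \operatorname*{arg\,min}_{x}\; E(x; x_{0}) + R(x) \tag{1} \\
  \theta^{*} &= \operatorname*{arg\,min}_{\theta}\; E\bigl(f_{\theta}(z); x_{0}\bigr),
  \qquad x^{*} = f_{\theta^{*}}(z) \tag{2} \\
  E(x; x_{0}) &= \lVert x - x_{0} \rVert_{2}^{2} \tag{3} \\
  \theta^{*} &= \operatorname*{arg\,min}_{\theta}\; \lVert f_{\theta}(z) - x_{0} \rVert_{2}^{2} \tag{4} \\
  E(x; x_{0}) &= \lVert d(x) - x_{0} \rVert_{2}^{2} \tag{5}
\end{align}
```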

Algorithm 1 Algorithm for solving DIP
Require: Random code vector z and low-resolution image x_0.
Ensure: High-resolution image x.
1: repeat
2: Update the DIP loss by Eq. (5).
4: Update θ_x(i) using the ADAM algorithm.
5: ...
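A minimal PyTorch sketch of Algorithm 1, assuming `f_theta` is the randomly initialized DIP generator (a U-Net-style network in this paper), `z` is the fixed noise code, `x_lr` is the observed LR image, and `downsample` is the fixed operator d(·); all names and defaults are illustrative.

```python
import torch

def train_dip(f_theta, z, x_lr, downsample, iters=2000, lr=5e-4):
    """Minimal sketch of Algorithm 1 (DIP for SR).

    f_theta   : randomly initialized generator network (e.g., a U-Net)
    z         : fixed random code tensor, same spatial size as the target HR image
    x_lr      : observed low-resolution image x_0
    downsample: fixed operator d(.) mapping HR to LR (e.g., bicubic/area resize)
    """
    opt = torch.optim.Adam(f_theta.parameters(), lr=lr, betas=(0.9, 0.999))
    for _ in range(iters):
        opt.zero_grad()
        x_hr = f_theta(z)                                   # candidate HR image
        loss = ((downsample(x_hr) - x_lr) ** 2).mean()      # data term of Eq. (5)
        loss.backward()                                     # gradients flow into theta only
        opt.step()
    return f_theta(z).detach()                              # final HR estimate
```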

Segmentation Entropy
In the previous section, we described how we obtain a high-resolution image by inputting the random code vector z into the DIP network. This high-resolution image is then sent to the segmentation network for training. The specific process is as follows.
Firstly, a regular grid of size h × w is used to partition the H × W image, and each grid cell is regarded as an initial superpixel; here h and w are the height and width of a superpixel block, and H and W are the height and width of the image. This step initializes the superpixel centers. To obtain the final superpixel segmentation map, we need a mapping g that determines which cluster center s = (i, j) each pixel p = (u, v) belongs to, where (u, v) is the coordinate of the pixel and (i, j) is the coordinate of the cluster center. If pixel p belongs to cluster center s, we set g_s(p) = g_{i,j}(u, v) = 1. A pixel is only related to the few cluster centers around it, and its connection with other cluster centers can be ignored; to reduce computation, we only consider the nine surrounding cluster centers, so the mapping is written as g ∈ Z^{H×W×9}. There are many ways to compute the mapping g, such as by Euclidean distance, but in this paper we use a deep neural network to predict g directly. To make the objective function differentiable, we replace g with a soft correlation mapping q ∈ R^{H×W×N_p}, where q holds the predicted weights of each pixel with respect to the nine surrounding cluster centers, the weights sum to 1, and N_p is the number of cluster centers. Thus q_s(p) is the probability that pixel p is assigned to the surrounding cluster center s, where s ∈ N_p and Σ_{s∈N_p} q_s(p) = 1.
The correlation matrix q of the superpixels is predicted with a standard encoder-decoder design with skip connections. First, the encoder takes the high-resolution image generated by DIP as input and produces a high-level feature map through the convolutional network; the decoder then gradually upsamples the feature map through deconvolution layers to predict the correlation matrix q. In the loss function, f(p) denotes the pixel attribute, which in this method is the 3D CIELAB color vector, and the pixel position is represented by its image coordinate p = [x, y]^T, where x is the abscissa and y is the ordinate.
Through the association map q, we can predict the color and location attributes of the superpixel centers by Eq. (6). The attributes of a superpixel center are written C_s = (U_s, I_s), where U_s is its CIELAB color attribute vector and I_s is its location attribute vector.
Equation (6) is a clustering process: each pixel assigns its own attributes to the surrounding cluster centers with certain weights. If a pixel is predicted to belong to the current cluster center, the corresponding weight is large, and the attributes of the cluster center are updated accordingly.
Clustering from pixels to cluster centers alone cannot complete the training of the superpixel segmentation; Eq. (7) is also needed to reconstruct the original pixel attributes, i.e., the original pixel attributes are reconstructed from the correlation matrix q and the superpixel centers. These two steps complete the superpixel segmentation: first the superpixel centers are found by pixel clustering, and then the attributes of the superpixel centers are mapped back to pixel attributes.
The reconstructed pixel color attribute is denoted f'(p) and the reconstructed pixel position is denoted p'. To train the superpixel segmentation network, the loss function in Eq. (8) is designed. It consists of two parts: a content reconstruction loss, which uses the L2-norm to constrain the color attributes, and a spatial location loss, which forces each superpixel to occupy a compact region. Here, S denotes the superpixel sampling interval, and m balances the two terms; its value is set by us. The training process is shown in Algorithm 2.
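Equations (6)-(8) do not appear legibly in this version. A plausible reconstruction, based on the fragment visible in the text and on common FCN-based superpixel formulations (the original notation, e.g. U_s, I_s versus u_s, l_s, and the exact normalization may differ), is:

```latex
% Plausible reconstruction of Eqs. (6)-(8); S is the superpixel sampling interval
% and m the balancing weight mentioned in the text.
\begin{align}
  u_{s} &= \frac{\sum_{p:\, s \in N_{p}} f(p)\, q_{s}(p)}{\sum_{p:\, s \in N_{p}} q_{s}(p)},
  \qquad
  l_{s} = \frac{\sum_{p:\, s \in N_{p}} p\, q_{s}(p)}{\sum_{p:\, s \in N_{p}} q_{s}(p)} \tag{6} \\
  f'(p) &= \sum_{s \in N_{p}} u_{s}\, q_{s}(p),
  \qquad
  p' = \sum_{s \in N_{p}} l_{s}\, q_{s}(p) \tag{7} \\
  \mathcal{L}_{seg} &= \sum_{p} \Bigl( \lVert f(p) - f'(p) \rVert_{2}
  + \frac{m}{S}\, \lVert p - p' \rVert_{2} \Bigr) \tag{8}
\end{align}
```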

Algorithm 2 Segmentation Entropy
Require: High-resolution image x.
Ensure: A soft correlation mapping q.
1: repeat
2: Update the superpixel centers C_s = (U_s, I_s) by Eq. (6).
4: Update the reconstructed f'(p) and p' by Eq. (7).
7: Update the seg loss by Eq. (8).
8: Update θ_s(j) using the ADAM algorithm.
9: j ← j + 1
10: until j > j_max
11: Build H(q) according to Eq. (9).
13: Update the loss by Eq. (10).
14: Update θ_x(i) using the ADAM algorithm.

During the joint training process, a random code vector z_x is mapped to a high-resolution (HR) image through a neural network. In each iteration of Deep Image Prior (DIP), the generated HR image is fed into a superpixel segmentation network that produces a correlation matrix q with an encoder-decoder structure. The matrix q is a weight matrix that predicts which of the nine surrounding cluster centers each pixel belongs to; q ∈ R^{H×W×N_p}, where H, W, and N_p denote the height, width, and number of cluster centers, respectively. Using Eq. (6), a clustering operation similar to downsampling is performed on the input HR image; then, using Eq. (7), the original image is reconstructed in an operation similar to upsampling. The superpixel segmentation network is trained with Eq. (8), and the corresponding segmentation entropy is obtained with Eq. (9), where Σ_{i=1}^{N_p} q_i = 1. Here, q_i is the probability that pixel p belongs to the surrounding superpixel block and N_p is the number of cluster centers. We send the high-resolution image to the segmentation network to obtain the segmentation entropy and add it to each iteration of DIP to guide the generation of high-resolution images; in this way, the segmentation entropy also acts on the DIP network parameters. The overall process is illustrated in Figs. 3 and 4 and Algorithm 3.
The training process is unsupervised, meaning that there is no labeled data.During the training process, the gradient generated by the segmentation entropy energy flows into the DIP network through backpropagation, ultimately leading to improved training parameters for the deep neural network (DNN).
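A plausible form of Eq. (9) is H(q) = −Σ_p Σ_{i=1}^{N_p} q_i(p) log q_i(p), and Eq. (10) presumably combines the DIP data term, the segmentation entropy, and the L2 regularizer. The sketch below shows how such terms might be computed and combined, assuming q has shape (1, 9, H, W) with softmax-normalized channels; the weights lambda_ent and lambda_l2, the form of the L2 term, and all names are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def segmentation_entropy(q, eps=1e-8):
    """Eq. (9) sketch: mean per-pixel entropy of the soft assignments q.

    q: tensor of shape (1, K, H, W) whose K channels sum to 1 at every pixel
       (K = 9 surrounding cluster centers in the paper's setting).
    """
    return -(q * torch.log(q + eps)).sum(dim=1).mean()

def joint_loss(x_hr, x_lr, q, downsample, lambda_ent=1.0, lambda_l2=0.0):
    """Sketch of Eq. (10): DIP data term + segmentation entropy + L2 regularization.

    The L2 term is shown on the output image as an assumption; the paper does not
    specify its exact form or the weighting values.
    """
    dip_term = ((downsample(x_hr) - x_lr) ** 2).mean()   # data term of Eq. (5)
    ent_term = segmentation_entropy(q)                   # Eq. (9)
    l2_term = (x_hr ** 2).mean()                         # assumed L2 regularizer
    return dip_term + lambda_ent * ent_term + lambda_l2 * l2_term
```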

Dataset
To evaluate the proposed algorithm, we conducted experiments on the commonly used benchmark datasets Set5, Set14, and BSD100, which contain five, fourteen, and one hundred complex natural images, respectively. These datasets are typically used for single-image super-resolution reconstruction. In the experiments, the high-resolution image has the same size as the original image, and the low-resolution image is obtained by downscaling it with bicubic interpolation according to the scaling factor, yielding a matched pair of low-resolution (I_LR) and high-resolution (I_HR) images. The I_LR images are used for network training, and the I_HR images are used to evaluate the training results. The batch size for each image during training is set to 1.
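For illustration, a matched LR-HR pair of this kind can be produced roughly as follows (a sketch using Pillow; the paper's exact preprocessing beyond bicubic downscaling is not specified):

```python
from PIL import Image

def make_lr(hr_path, scale):
    """Create a bicubic-downscaled LR image from an HR image on disk."""
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    # crop so that the size is divisible by the scale factor, then downscale
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return hr, lr
```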

Implementation Details
Our model was implemented using PyTorch. In the DIP network, we set the number of iterations to 2000 and used the Adam optimizer with β1 = 0.9 and β2 = 0.999, with the learning rate initialized to 0.0005. We first downscaled the high-resolution image to match the input size of the segmentation network. In the segmentation network, we used the Adam optimizer with α1 = 0.9 and α2 = 0.999 and used loss_seg in Eq. (8) as the segmentation loss function, with m = 0.003. We set the size of the superpixel cell to 4 × 4, which determines the number of superpixel cluster centers. For the total training, we used Eq. (10) as the loss function.
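The optimizer settings above can be sketched as follows; the two networks are shown as placeholder modules, since the paper's U-Net generator and encoder-decoder segmentation network are not reproduced here, and the segmentation learning rate is not reported:

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the paper's U-Net-style DIP generator and
# encoder-decoder segmentation network (architectures not reproduced here).
dip_net = nn.Conv2d(3, 3, kernel_size=3, padding=1)
seg_net = nn.Conv2d(3, 9, kernel_size=3, padding=1)

# Hyperparameters reported in the text.
dip_opt = torch.optim.Adam(dip_net.parameters(), lr=5e-4, betas=(0.9, 0.999))
seg_opt = torch.optim.Adam(seg_net.parameters(), betas=(0.9, 0.999))  # lr not reported
m = 0.003         # balances the two terms of loss_seg in Eq. (8)
cell_size = 4     # 4 x 4 superpixel cells set the number of cluster centers
num_iters = 2000  # DIP iterations
```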

Evaluation Metrics
The evaluation of single-image super-resolution methods is usually divided into subjective and objective evaluation. Subjective evaluation visually compares the original image with the generated image. To verify model quality objectively, criteria such as the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) are usually used to assess the reconstruction quality of the images generated by different models. PSNR measures reconstruction quality by computing the error between corresponding pixels; the higher the value, the stronger the restoration ability of the network.
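As a reference, PSNR can be computed from the mean squared error as in the sketch below (for 8-bit images with a peak value of 255); SSIM is typically computed with a library routine such as skimage.metrics.structural_similarity.

```python
import numpy as np

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio between two images of equal shape."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```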

Experimental Results
The experiments in this paper compare the reconstructed images generated by our proposed method with those generated by existing methods. We compare our unsupervised method with existing supervised and unsupervised methods on the Set5, Set14, and BSD100 datasets at three different scale factors. Our method aims to improve the edge details of the reconstructed images by incorporating the segmentation entropy obtained through image segmentation. As presented in Table 1, our proposed method achieves better results than DIP when the scale factor is 2 or 4, thanks to the addition of segmentation entropy. However, the improvement is not significant when the scale factor is 8. This is because when the scale factor is not too large, the image generated by the DIP network already has fairly clear edges; the clearer the edge, the lower the segmentation entropy, and lower segmentation entropy indicates a higher probability that each pixel belongs to the correct cluster center, which better forces the DIP network to produce clearer images.
As shown in Figs. 5 and 6, we compare the subjective results of our method with those of other methods. The role of the segmentation entropy is to make each pixel better assigned to its superpixel block in each training iteration, so that the edge details of the image are better recovered. Figures 5 and 6 show that, by adding segmentation entropy, our method produces clearer image edges than DIP.
However, supervised methods use pairs of data for training, which results in clearer image edges.In contrast, our method only uses a single image for restoration and does not rely on paired data.Therefore, our method restores the image solely based on the image prior, without any supplement from external data.
Table 2 shows the segmentation entropy of images produced by DIP and by our method on the Set5 dataset. The results demonstrate that our method yields lower segmentation entropy on every image, which explains why it produces clearer edges.
In Table 3, we examine the influence of the number of cluster centers on the experimental results for the Set5 dataset. We adjust the number of cluster centers by changing the size of the superpixel cells, where a larger cell size leads to fewer cluster centers. The results in Table 3 show that a smaller number of cluster centers leads to smaller segmentation entropy and a higher PSNR value. This is because fewer cluster centers produce a greater weight difference between each pixel and its surrounding cluster centers, resulting in smaller segmentation entropy during training, and smaller segmentation entropy can drive DIP to produce clearer image edges (Table 4). As shown in Table 5, we also compare the parameters and FLOPs of the different methods. SRCNN has the smallest parameter count and FLOPs, 0.57 M and 2.56 G respectively, because it has only three convolutional layers and a single-channel input; with only three convolutional layers for restoration, however, its restoration quality is limited. To better restore the image from the noisy input, we use a U-Net, which includes more convolutional layers and can better learn image features, and to guide the restoration with segmentation results we add a segmentation network to obtain them. This increases the number of parameters: our method has 4.49 M parameters and 66.56 G FLOPs.
We also compared the effects of different DIP downsampling methods on the results for scale factors of 2, 4, and 8. Among area, bicubic, and nearest interpolation, area interpolation gave the best image quality, so joint training with area interpolation produced the best results among the three. These results demonstrate that using a better interpolation method within DIP leads to better image quality during joint training.

Conclusions
In this paper, we propose a segmentation-driven unsupervised deep image SR method. Our method requires no pre-training and no large datasets with ground truth (GT); only a single image is needed for super-resolution. The proposed method is completely unsupervised and relies only on the image itself. Adding the segmentation-driven term to DIP yields additional improvements.
We use back propagation to inject the gradient generated by the segmentation entropy energy into DIP, which yields lower-energy optimized parameters. Adding segmentation entropy forces the restored image to have clear boundaries, as demonstrated by the extensive experimental results on edges and details.
Although our method produces clearer image edges than DIP, it does not perform well when the scale factor is too large. A further drawback is that the segmentation network must be trained in every iteration of DIP, which results in long training times.

Fig. 1
Fig. 1 Segmentation. The first row shows images with blurred and clear boundaries, respectively. The second row shows the corresponding superpixel segmentation results. The third row shows the segmentation entropy

Fig. 5 Fig. 6
Fig. 5 Subjective results. The image is the third image in the Set5 dataset

Table 1
Comparison with existing methods on image super-resolution tasks


Table 2
Segmentation entropy generated by our method and DIP on the Set5 and Set14 datasets. We send the images obtained by our method and by DIP into the segmentation network and use Eq. (9) to calculate the segmentation entropy; the entropy generated by our method is smaller

Table 3
We compare the effect of the number of cluster centers on the experimental results under different scale factors on the Set5 dataset. The cell size determines the number of superpixel cluster centers; the larger the size, the fewer the cluster centers

Table 4
The influence of different down-sampling methods on the results of DIP