Inverse Halftoning Based on Weighted Nuclear Norm Minimization

Inverse halftoning refers to restoring a continuous-tone image from a halftone image containing only bi-level pixels. However, recovering continuous-tone images from their halftoned versions is normally ill-posed, which makes inverse halftoning very challenging. In this paper, we propose an optimization model with two alternate projections (TAP) for image inverse halftoning under the weighted nuclear norm minimization (WNNM) framework. The main contributions are two-fold. First, a WNNM nonlocal regularization term is established, which offers a powerful nonlocal self-similarity mechanism to ensure a more reliable estimation. Second, alternate minimization projections are formulated for solving image inverse halftoning, which reconstruct the continuous-tone image without destroying image details and structures. Experimental results show that the proposed method outperforms the state of the art in terms of both objective measurements and subjective visual quality.


Introduction
The halftoning algorithm represents a gray-scale image with only one color through the use of ink dots, and is widely used in today's digital image printing, publishing and displaying applications, such as printers, newspapers, books and magazines [11,13]. Most printing devices are based on binary technologies: halftoning renders an image with patterns of ink dots at a high spatial frequency. In the proposed method, visually similar patches are grouped so that spatially distant manifolds can be well separated; therefore, the similarity of nonlocal patches within each group can be enhanced. In order to optimize the whole framework, we also construct an iterative projection method upon TAP-WNNM to ensure the lower bound of each group of singular values. Experimental results show that the proposed method performs well. The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 presents the proposed framework. Section 4 reports experimental results. Finally, Section 5 draws conclusions.

Weighted Nuclear Norm Minimization Regularization
The weighted nuclear norm minimization (WNNM) regularization [5,6] is a generalization of the nuclear norm minimization (NNM) model [1]. WNNM is described as:

$$\min_{X} \|Y - X\|_F^2 + \|X\|_{w,*}, \quad (1)$$

where $\|X\|_{w,*} = \sum_i w_i \sigma_i(X)$ is the weighted nuclear norm (WNN) of matrix $X$, $w = [w_1, \cdots, w_n]^T$ ($w_i \geq 0$) is the weight vector, and $\sigma_i(X)$ is the $i$th singular value of $X$. Corollary 1 in [6] proved that Eq. (1) has a closed-form solution if the weights are in non-descending order:

$$\hat{X} = U S_{w/2}(\Sigma) V^T, \quad (2)$$

where $Y = U \Sigma V^T$ is the singular value decomposition (SVD) [3] of $Y$ and $S_{w/2}(\cdot)$ is the soft-thresholding operator generated by the weight vector $w$:

$$S_{w/2}(\Sigma)_{ii} = \max(\Sigma_{ii} - w_i/2,\ 0). \quad (3)$$

The WNNM model has demonstrated competitive performance for image denoising. However, if we directly extend WNNM to image inverse halftoning by denoising the inverse-halftoned version, inverse halftoning and denoising artifacts may appear, resulting in missing details and textures. In this paper, we propose an optimization model with two alternate projections (TAP) for image inverse halftoning under the WNNM framework, which preserves the structure and texture details of the inverse-halftoned image while removing the artifacts and halftoning noise.
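As a concrete illustration, the closed-form solution of Eqs. (1)-(3) can be sketched in a few lines of NumPy. The function name `wnnm_shrink` is ours, not from the paper; this is a minimal sketch of weighted singular-value soft-thresholding, not the authors' implementation:

```python
import numpy as np

def wnnm_shrink(Y, w):
    """Closed-form WNNM solution of min_X ||Y - X||_F^2 + ||X||_{w,*},
    valid when the weights w are in non-descending order: soft-threshold
    the singular values of Y by w/2 and reconstruct."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - w / 2.0, 0.0)  # S_{w/2}: subtract half-weight, clip at zero
    return U @ np.diag(s_shrunk) @ Vt
```

With zero weights the operator is the identity, and with very large weights every singular value is suppressed, which matches the role of Eq. (3).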

Image Inverse Halftoning
During the last decade, many approaches have been proposed for image inverse halftoning. Among them, the Bayesian-based approach [17] employs the least-mean-square (LMS) algorithm to build the map between the current processing position and its corresponding neighboring positions in each type of halftone image. The look-up-table (LUT) method [7] is an effective method for constructing the halftone image and can approximate the dot distribution of the learned halftone image set. In the wavelet-based method [20], the halftone image is first deconvolved with the operator specified by the model, and the residual noise is then attenuated by scalar wavelet-domain shrinkage. However, these methods explore only the internal information of halftone images to improve inverse halftoning performance.
There also exist some methods exploiting the internal similarities of an image [19,27]. A sparsity-based inverse halftoning method [27] is proposed to solve the problem of cross-style image restoration from halftone image to continuous-tone image. These works use only internal information for inverse halftoning and are generally inferior to methods exploiting external information. Inspired by the recent success of deep learning in image processing applications, some convolutional neural network (CNN) based works [9,10] have been developed, which explore external information for halftone images. Using external halftone and continuous-tone images, the CNN-based inverse halftoning approach [9] trains a deep CNN as a nonlinear transformation function to map a halftone image to a continuous-tone image. However, the external methods build their models directly from the low-quality inputs, without exploiting internal similarities. Hence it remains a challenge to combine internal and external priors for image inverse halftoning.
The BM3D-based method [16] is a representative one, which adds a regularization term of nonlocal similarity and significantly improves performance. To avoid blurry solutions, the G-SVD-based approach [28] exploits the similarity of nonlocal image patches and achieves state-of-the-art performance.
This paper proposes an optimization framework with two alternate projections (TAP) for image inverse halftoning under the WNNM framework, which utilizes the strong low-rank prior of nonlocal similar patches as internal information. The external information between the images of adjacent iterations is also explored to enhance inverse halftoning performance.

The Inverse Halftoning Framework
The proposed inverse halftoning model reconstructs the continuous-tone image from its halftone version while trying to preserve local smoothness under the constraint of nonlocal similarity. In this section we first explain how to utilize the unconstrained local smoothness (the first projection), and then discuss the proposed WNNM constraint (the second projection). Finally, an iterative projection strategy built upon TAP-WNNM is established to optimize the framework.

Preserving Local Smoothness
The halftone image $Y$ is generated from its continuous-tone version $X$ by $Y = H(X)$, where $H$ represents a generic halftoning model, such as dot-diffusion dithering or error diffusion. In this paper, the halftoning model is error diffusion. In order to preserve local smoothness, we look for an approximation $\hat{X}$ which can optimally represent $X$:

$$\hat{X} = \arg\min_{X} \|Y - H(X)\|_2^2. \quad (4)$$
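For reference, error diffusion, the halftoning model $H$ adopted in this paper, can be sketched as follows. This is a standard Floyd-Steinberg implementation written by us for illustration, not the authors' code:

```python
import numpy as np

def floyd_steinberg(x):
    """Floyd-Steinberg error diffusion: continuous-tone x in [0, 1] -> binary halftone.
    The quantization error of each pixel is distributed to its unprocessed
    neighbors with the classic 7/16, 3/16, 5/16, 1/16 weights."""
    x = x.astype(np.float64).copy()
    h, w = x.shape
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            old = x[i, j]
            new = 1.0 if old >= 0.5 else 0.0   # bi-level quantization
            out[i, j] = new
            err = old - new
            if j + 1 < w:
                x[i, j + 1] += err * 7 / 16
            if i + 1 < h and j > 0:
                x[i + 1, j - 1] += err * 3 / 16
            if i + 1 < h:
                x[i + 1, j] += err * 5 / 16
            if i + 1 < h and j + 1 < w:
                x[i + 1, j + 1] += err * 1 / 16
    return out
```

The output contains only 0s and 1s, while the local dot density approximates the local gray level of the input, which is exactly what makes inverting $H$ ill-posed.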

Constraining Nonlocal Similarity With WNNM
Obviously, the optimal result of Eq. (4) would be an approximation $\hat{X}$ exactly equal to the real continuous-tone image $X$. However, this is usually impossible because the optimization is non-convex and highly ill-posed. Worse, many works have shown that Eq. (4) tends to preserve local smoothness and gets stuck in blurry solutions [16]. Therefore, in order to limit the solution set, avoid blurry results and better restore discontinuities, we add the WNNM constraint to promote nonlocal patch similarity:

$$\hat{X} = \arg\min_{X} \|Y - H(X)\|_2^2 + \|X\|_{w,*}, \quad (5)$$

where $H$ is the halftoning operator, $\|X\|_{w,*} = \sum_i w_i \sigma_i(X)$ is the weighted nuclear norm of matrix $X$, $w = [w_1, \cdots, w_n]^T$ ($w_i \geq 0$) is the weight vector, and $\sigma_i(X)$ is the $i$th singular value of $X$. More specifically, evaluating the WNNM constraint consists of a grouping phase, a WNNM phase and a pixel-averaging phase. For a reconstructed image $\hat{X}$, similar image patches are first grouped; then the singular values of each group are decomposed and soft-thresholded, retaining the weighted non-zero singular values; finally, the average pixel value of overlapping image patches is computed. Usually the components corresponding to the leading singular values represent the common pattern, while the other components may be corrupted by noise. Shrinking the leading singular values with small weights and the remaining singular values with large weights increases the importance of significant patterns relative to halftone noise, thus reducing artifacts and enhancing patch similarity.
We will discuss these three stages in detail below.

Grouping Phase
Intuitively, visually similar patches usually concentrate near a manifold, while visually different patches are distributed on different manifolds. The purpose of the grouping phase is to separate spatially distant manifolds so as to find a good linear representation of each manifold in the following decomposition phase. Similar ideas can be found in work related to denoising, for example [21]. For a given image of size H × W, every h × w adjacent pixels are extracted, producing (H − h + 1) × (W − w + 1) regular patches of hw dimensions. Among them, N reference patches covering the whole image are selected uniformly. For each reference patch I_i, the C most similar regular patches within a p × q surrounding window are found and sorted by Euclidean distance. These C patches are then rearranged into an hw × C matrix, with each column storing one patch.
In summary, the grouping phase (GROUP) takes an image and the number of groups (i.e., N) as input and outputs N matrices of size hw × C. The goal of this phase is to describe the low-dimensional manifold of each group in order to separate the significant patterns concentrated near the manifold from the noise, which is often orthogonal to it. Fig. 2 shows an illustration. After the image is corrupted by zero-mean Gaussian noise, the leading singular values usually do not change much, while the remaining singular values change considerably. Therefore, by limiting the lower bound of the non-zero minimum singular value to a large value, the noise can be successfully suppressed.
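The grouping phase can be sketched as follows for a single reference patch. The function name `group_patches` and the exhaustive window scan are our own simplifications; the paper's implementation may organize the search differently:

```python
import numpy as np

def group_patches(img, ref_ij, h=8, w=8, C=70, search=20):
    """Grouping phase sketch: for the reference patch at (i0, j0), collect the
    C most similar h-by-w patches inside a search-by-search window and stack
    them as the columns of an hw-by-C matrix."""
    i0, j0 = ref_ij
    ref = img[i0:i0 + h, j0:j0 + w].reshape(-1)
    H, W = img.shape
    cands = []
    top = max(0, i0 - search // 2)
    left = max(0, j0 - search // 2)
    for i in range(top, min(H - h, i0 + search // 2) + 1):
        for j in range(left, min(W - w, j0 + search // 2) + 1):
            p = img[i:i + h, j:j + w].reshape(-1)
            cands.append((np.sum((p - ref) ** 2), p))
    cands.sort(key=lambda t: t[0])  # sort candidates by Euclidean distance
    return np.stack([p for _, p in cands[:C]], axis=1)  # hw x C matrix
```

Since the reference patch itself lies inside the window with distance zero, it always appears as the first column of the returned group.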

Pixel-Averaging Phase
For the overlapping patches extracted in the patch-grouping phase above, each pixel may have several estimates corrupted by random halftone noise. It is natural to use their average as the new approximate value at that position. In this way, the main information is likely to be enhanced while false or rough textures are removed.
The restored grouped patches $\hat{G}_p$ are the output of the WNNM phase. The patches recovered in each $\hat{G}_p$ are put back, and the input image is reconstructed by averaging the overlapping pixels (we denote this procedure as "Back"). We describe this phase in detail below.
We define the extraction operation:

$$G_p = R_p(\hat{X}), \quad (6)$$

where $R_p(\cdot)$ is the extraction operator that extracts the patches of group $p$ from image $\hat{X}$ and stacks them as the columns of $G_p$, and its transpose is denoted by $R_p^T(\cdot)$. Applying $R_p^T(\cdot)$ puts the vectorized patches back to their corresponding positions in the reconstructed image and fills zeros elsewhere.
Averaging over all the matrices $\{\hat{G}_{p_i}\}$, $i = 1, 2, \cdots, N$, where $p_i$, $i = 1, 2, \cdots, N$ are the patch groups extracted from the image, the recovery of the whole image $\hat{X}$ from $\{\hat{G}_{p_i}\}$ can be described as:

$$\hat{X} = \left(\sum_{i=1}^{N} R_{p_i}^T(\hat{G}_{p_i})\right) \oslash \left(\sum_{i=1}^{N} R_{p_i}^T(\mathbf{1}_{hw \times C})\right), \quad (7)$$

where $\oslash$ represents element-wise division and $\mathbf{1}_{n \times m}$ is an $n \times m$ matrix of all ones. Fig. 3 shows the overall schematic diagram of the WNNM-constrained nonlocal similarity, including the grouping phase, the WNNM phase and the pixel-averaging phase. These three phases operate only on the internal information of the halftone image.
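The element-wise division in Eq. (7) can be sketched as below. The helper name `aggregate` and the convention that each group carries the top-left coordinates of its patches are our own assumptions for illustration:

```python
import numpy as np

def aggregate(groups, positions, shape, h=8, w=8):
    """Pixel-averaging phase sketch (Eq. (7)): put each restored patch back at
    its position, accumulate values and per-pixel counts, then divide."""
    num = np.zeros(shape)  # sum of R^T(G_hat) over all groups
    den = np.zeros(shape)  # sum of R^T(1), i.e., how often each pixel is covered
    for G, pos_list in zip(groups, positions):
        for c, (i, j) in enumerate(pos_list):
            num[i:i + h, j:j + w] += G[:, c].reshape(h, w)
            den[i:i + h, j:j + w] += 1.0
    den[den == 0] = 1.0  # uncovered pixels stay zero instead of dividing by zero
    return num / den
```

Pixels covered by several overlapping patches thus receive the average of all their estimates, which is the denoising effect described above.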

Iterative Projection
Under unconstrained conditions, the residual error in Eq. (5) can be optimized by iteratively perturbing $\hat{X}$ around the decision boundary wherever $H(\hat{X})$ and $Y$ are inconsistent [16]. We refer to this iteration as the coding step.
To handle the added constraint, we insert an approximate update step after each coding step to ensure that the WNNM constraint is satisfied after the projection. Given an approximation $\hat{X}$, the projection is performed by the following steps:

1. Apply grouping on the N reference patches of $\hat{X}$ to generate a set of matrices $\{G_1, G_2, \ldots, G_N\}$.
2. For each matrix $G_i$, compute its singular value decomposition $G_i = U_i \Sigma_i V_i^T$.
3. Apply the generalized soft-thresholding operator with weight vector $w$: $\hat{G}_i = U_i S_{w/2}(\Sigma_i) V_i^T$.
4. Put the thresholded groups $\{\hat{G}_i\}$ back and average the overlapping pixels, as in Eq. (7), to obtain the updated $\hat{X}$.

It is easy to see that the above process (also shown in Fig. 3) is an extension of WNNM. The projection continues iteratively until the stopping condition is met. The whole optimization process is summarized in Alg. 1.
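Putting the two projections together, the alternation can be sketched as follows. This simplified version treats the whole image as a single patch matrix (no grouping) and accepts an arbitrary callable for $H$, so it illustrates the coding/projection alternation rather than the full algorithm; all names are ours:

```python
import numpy as np

def wnnm_project(x, w):
    """WNNM projection step: weighted soft-thresholding of singular values
    (the single-matrix case of steps 2-3 above)."""
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    return U @ np.diag(np.maximum(s - w / 2.0, 0.0)) @ Vt

def tap_iterate(y, halftone, w, n_iter=10, step=0.5):
    """Alternate the coding step (pull H(x) toward y) and the WNNM projection.
    `halftone` is any callable approximating the halftoning operator H."""
    x = y.astype(np.float64).copy()  # initialize from the (smoothed) halftone
    for _ in range(n_iter):
        x = x + step * (y - halftone(x))  # coding step
        x = wnnm_project(x, w)            # projection step
    return x
```

With zero weight on the leading singular value and large weights on the rest, the projection keeps the dominant pattern and discards the remaining components, mirroring the weighting scheme discussed in Section 3.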

Experiments
This section first investigates the effect of the iteration number, and then compares our algorithm with state-of-the-art inverse halftoning algorithms.

Data and Implementation
[Data] Following the standard protocol [16,20,28], several images including Lena, Barbara, Boat, Hill, House, Man and Peppers are used as test images, with their sizes scaled to H × W = 256 × 256. Twelve Kodak images (Kodim01-04, Kodim09-12 and Kodim21-24) are converted to gray-scale versions, with their sizes also reshaped to 256 × 256. The corresponding halftone images are obtained by standard Floyd-Steinberg error diffusion [4].
[Implementation] For our framework, the patch size is set to H/32 × W/32, i.e., 8 × 8. A larger patch size would give better performance, but is slower. C = 70 similar patches within a p × q = 20 × 20 surrounding window are found for each of the N = 127 × 127 reference patches. The step between neighboring patches is set to 2 pixels. Although p, q and N can be larger at the expense of running time, we do not observe significant improvements. The initial input of the optimization is simply set to a smoothed version of the input halftone image.

Iteration Number
We show the effect of the number of iterations. Fig. 6 shows the results of our inverse halftoning framework on the first seven test images under different iteration numbers. We can see that PSNR and SSIM increase steadily at first and then gradually saturate. This phenomenon supports the correctness of the proposed TAP-WNNM iterative method: as the number of iterations increases, our method gradually improves the quality of the inverse halftoned image rather than impairing performance.

Comparisons with State of the Arts
We compare our proposed inverse halftone method with three representative methods: total-variation-based (TV) [15], BM3D based [16] and G-SVD based [28] inverse halftoning.
As shown in Table 1, our framework achieves better or comparable performance compared with the state of the art on all test images in terms of both PSNR and SSIM. More precisely, we obtain PSNRs of more than 29 dB on Man and Kodim11, dramatically outperforming the state of the art. Moreover, for the test images House and Man, our framework achieves 35.05 dB and 29.05 dB, respectively. To the best of our knowledge, this is the first time in the open literature that inverse halftoning of House or Man achieves PSNR values larger than 35 dB and 29 dB, respectively.
In addition to Table 1, which contains the objective PSNR/SSIM quality measurements for all approaches, Fig. 4 and Fig. 5 offer subjective quality comparisons. Due to space limitations, we only show results of the first seven test images (Lena, Barbara, Boat, Hill, House, Man and Peppers) in Fig. 4, and the first seven selected Kodak test images (Kodim01, Kodim02, Kodim03, Kodim04, Kodim09, Kodim10 and Kodim11) in Fig. 5. We can see that in Barbara, the texture of the scarf is reconstructed with higher fidelity, and there are fewer artifacts in the smooth regions of Peppers.

Conclusion
This paper studies how to build an effective inverse halftoning framework. To avoid blurry solutions, we propose a TAP-WNNM constraint to take advantage of the similarity of nonlocal image patches. Our TAP-WNNM constraint may also benefit related image reconstruction problems, e.g., denoising. Work in these directions is ongoing and will be presented in the future.

Compliance with Ethical Standards statements
Ethical approval: This article does not contain any studies with human participants/animals performed by any of the authors.

Conflict of interests:
The authors declare that they have no conflict of interest regarding the publication of this article.
Informed Consent: The authors consent to the declarations and the publication of this article.