Blind color image watermarking incorporating a residual network for watermark denoising and super-resolution reconstruction

Watermarking is a technique for hiding secret information in various types of multimedia data to protect intellectual property rights. The integration of deep learning technology with image watermarking is currently reshaping the application and promotion of relevant techniques developed so far. This paper presents a novel type of blind color image watermarking method that embeds a downsized color image into a host color image. Watermarking implementation involves partitioning the host image into non-overlapping blocks of 8 × 8 pixels, performing discrete cosine transform (DCT) for each block of every channel, and then manipulating the magnitudes of three designated DCT coefficients subject to a minimization constraint. The experimental results confirmed that the proposed image watermarking method outperformed six other methods in terms of zero-normalized cross-correlation (ZNCC). Moreover, watermark imperceptibility, as reflected by the measured peak signal-to-noise ratio and mean structural similarity metrics, remained satisfactory. In addition to this new style of color image watermarking, we employed a deep residual network to reduce noise and increase the resolution of the retrieved watermarks. Overall, the residual network achieved a satisfactory ZNCC level (> 0.88) when the watermark images were super-resolved by a factor of sixteen.


Introduction
Currently, the sharing and distribution of digital images are commonplace in our daily lives, especially with the advent of smart devices, internet technology, and social media (Arnold et al. 2003). Simultaneously, software tools have been created that can tamper with image data distributed across network environments. As a result, various intentional or unintentional violations of intellectual property occur frequently. Image watermarking techniques, which hide secret information within host images, are considered promising for preventing such intellectual property infringements (Mahto and Singh 2021). Provided the embedded information is retrievable, these methods are effective for applications including copyright protection, content authentication, forensic investigation, and covert communication.
Image watermarking can be classified into two categories: blind and non-blind. The non-blind approach requires the original image or additional side information for watermark extraction, whereas blind techniques require neither. Because the original image is not always accessible during watermark extraction, the blind watermarking approach is more prevalent in practical applications.
Blind watermark embedding can be implemented in either the spatial or transformed domain. Transform-domain methods are the most popular, as they can exploit both image characteristics and human visual perception. To perform watermarking, transform-domain methods usually convert the host image into a transformed domain first, and then hide the watermark information by modifying the transformed coefficients. Commonly employed transformations for image watermarking include the discrete Fourier transform (DFT) (Fares et al. 2020; Hsu and Hu 2020; Urvoy et al. 2014), discrete wavelet transform (DWT) (Barni et al. 2001; Hu and Hsu 2017; Huynh-The et al. 2016), DCT (Hu et al. 2022; Moosazadeh and Ekbatanifard 2019; Patra et al. 2010; Singh and Bhatnagar 2018), matrix decomposition (MD) (Ali et al. 2015; Chang et al. 2005; Hsu and Tu 2020; Koley 2021; Nha et al. 2022; Su et al. 2020b), and combinations of these transforms (Hu and Hsu 2015; Kang et al. 2020).
In recent years, many investigators have developed effective watermarking techniques for applications that involve color images. The adopted watermark can be any type of digital data, such as text strings, photo signatures, logos, images, audio, biometrics, and multimedia. Image logos are the most commonly used watermark in the literature, as the performance of watermarking methods can be easily examined through visual inspection of the retrieved watermarks. Unsurprisingly, attempts to embed color watermarks into color images have increased. However, embedding color watermarks requires an extremely high payload capacity. In addition to the graphical content, color watermarks require information pertaining to the purity or intensity of color.
The recent literature surveys (Mahto and Singh 2021;Singh et al. 2021;Wan et al. 2022) indicate that various color image watermarking approaches have been published in the last two decades. Although all of these watermarking methods achieved certain degrees of success in terms of capacity and robustness, it remains a difficult challenge to achieve high capacity, imperceptibility, and robustness simultaneously. Hence, this paper aims to develop an effective and high-capacity blind color image watermarking method that cooperates with a deep-learning neural network to further enhance the visual quality of extracted watermarks.
The contributions of this paper include: (1) we introduce a novel embedding method, termed constrained magnitude modulation, to embed integer numbers directly into the DCT domain; (2) a large payload capacity enables the embedding of a color watermark of 64 × 64 pixels into a host image of 512 × 512 pixels; (3) a deep learning network is incorporated into the watermarking method to produce high-quality, high-resolution watermarks for visual examination.
The rest of the paper proceeds as follows: Sect. 2 provides an overview of recent high-capacity image watermarking methods. In Sect. 3, we outline the technical background, covering the fundamentals of DCT and color space conversion. Section 4 discusses the proposed watermarking method for embedding color pixel values into the DCT domain. Section 5 presents the experimental results, including details of a residual network that was employed to enhance the watermark in terms of perceptual quality and super-resolution. Finally, the conclusions are presented in Sect. 6.
Overview of recent high-capacity image watermarking methods

In this section, we briefly review some representative high-capacity blind color image watermarking techniques capable of embedding a full-color image of 64 × 64 pixels (equivalent to 64 × 64 × 3 × 8 bits) into a host color image of 512 × 512 pixels. Here, the targeted payload capacity is 3/8 bit per pixel (bpp). To achieve such a high capacity, Zhang et al. (2020) exploited the unique feature of the direct current (DC) component of the DFT derived from image blocks of size 2 × 2, where each binary bit was determined by manipulating the inequality between the DC components drawn from two adjacent image blocks. In the original design, they hid two 24-bit color images of 32 × 32 pixels in a host color image of 512 × 512 pixels. Nonetheless, it is possible to embed 64 × 64 × 3 × 8 bits into the host image if every 2 × 2 block is used for watermarking. Liu et al. (2021b) quantized the coefficient with maximum energy after taking the Haar transform of an image block. Given that one bit was hidden in every two 2 × 2 blocks, the maximum allowable capacity was exactly 64 × 64 × 3 × 8 bits. Chen et al. (2022) presented an efficient blind watermarking algorithm using the Walsh-Hadamard transform (WHT), in which two binary bits were embedded into a 4 × 4 WHT matrix block by fine-tuning the paired coefficients in the first row of the transformed matrix. Su et al. (2020a) devised a combined-domain approach to color image watermarking. They analyzed the first-level approximation coefficients of the DWT in the spatial domain and quantized the designated coefficients into binary categories. A capacity of 3/8 bpp could be achieved if half of the approximation coefficients participated in the watermarking process.
Due to the exploitation of the relationship between the DWT coefficients and spatial-domain image pixels, the presented algorithm was claimed to be as robust as the transform-domain watermarking methods and as efficient as the spatial-domain watermarking methods.
Because an image can be considered a two-dimensional matrix of pixel values, image watermarking using various matrix decomposition methods has also been attempted. Hsu et al. (2022) proposed a high-capacity QR decomposition (QRD)-based image watermarking algorithm that embeds two bits into each non-overlapping block of 4 × 4 pixels. The required operations involved manipulating two pairs of elements located in the first column of the orthogonal matrix after applying the QRD to each block. In addition, a super-resolution convolutional neural network (SRCNN) was employed to enhance the visual quality of the retrieved watermark; the SRCNN contributed an improvement of 0.191 to image quality in terms of the mean structural similarity index measure (MSSIM). Liu et al. (2021a) adopted a different strategy that embedded a quaternary value into non-overlapping blocks of size 4 × 4. After acquiring the eigenvalues of an intended block using Schur decomposition (SD), they quantized the maximum eigenvalue into four intervals so as to achieve quaternary watermarking. Simulation results indicated that the presented scheme satisfied the needs of visual imperceptibility, large payload capacity, and strong robustness. Table 1 gives an overview of the abovementioned high-capacity watermarking methods. As revealed by this table, all these methods tended to perform binary watermarking in relatively small image blocks (containing either 2 × 2 or 4 × 4 pixels) for each channel in pursuit of sufficient payload capacity. Such an arrangement usually burdens the performance in imperceptibility and robustness. Furthermore, as indicated by Mahto and Singh (2021) in their survey of color image watermarking, watermarking carried out over the RGB channels individually cannot survive JPEG compression attacks. Hence, our method is proposed to address the above limitations.
The last row of Table 1 presents the attributes of our color image watermarking method. The novelty of the proposed approach lies exactly in its contrast to the others mentioned above. Following the specifications used for JPEG compression, the proposed approach deliberately embeds three 8-bit unsigned numbers into each 8 × 8 matrix block in the luma (i.e., Y′) channel, thus allowing us to formulate high-capacity watermarking that is also resistant to JPEG compression. Furthermore, as deep learning techniques have proved contributive to watermark enhancement, we also employed a residual neural network to denoise and super-resolve the extracted watermark images. The novelties in this study consist of three aspects: (1) devising a novel number-based image watermarking method, (2) taking the RGB channels into account as a whole to resist JPEG compression, and (3) enhancing the extracted watermark image using a residual network model.

Two-dimensional discrete cosine transform
For image watermarking, the transform-based approach has become popular due to its advantage of exploiting spectral features and human visual characteristics. Among the transformations mentioned in the introduction section, the DCT has attracted much attention, as this transformation can efficiently compact the signal energy in low frequencies (Rao and Yip 1990). Such a property renders the DCT eminently suitable for image compression and watermarking applications.
Let $\{x(m,n)\}_{M\times N}$ denote a matrix block drawn from one of the RGB channels of a color image. The DCT can be expressed as

$$C(i,j) = u_i\, u_j \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} x(m,n)\cos\!\left[\frac{(2m+1)i\pi}{2M}\right]\cos\!\left[\frac{(2n+1)j\pi}{2N}\right],$$

where $C(i,j)$ denotes the $(i,j)$th DCT coefficient derived from $\{x(m,n)\}_{M\times N}$. Terms $u_i$ and $u_j$ are index-related constants defined as follows:

$$u_i = \begin{cases}\sqrt{1/M}, & i = 0,\\ \sqrt{2/M}, & 1 \le i \le M-1,\end{cases}\qquad
u_j = \begin{cases}\sqrt{1/N}, & j = 0,\\ \sqrt{2/N}, & 1 \le j \le N-1.\end{cases}$$

The inverse DCT that converts $\{C(i,j)\}_{M\times N}$ back to $\{x(m,n)\}_{M\times N}$ takes the form

$$x(m,n) = \sum_{i=0}^{M-1}\sum_{j=0}^{N-1} u_i\, u_j\, C(i,j)\cos\!\left[\frac{(2m+1)i\pi}{2M}\right]\cos\!\left[\frac{(2n+1)j\pi}{2N}\right].$$

In the sequel, the DCT matrix derived from the block in a specific channel is designated $C_{\mathrm{ch}}(i,j)$, with the subscript representing the channel name. For example, the DCT matrices computed in the R, G, and B channels are denoted as $\{C_R(i,j)\}$, $\{C_G(i,j)\}$, and $\{C_B(i,j)\}$, respectively.
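For reference, the forward and inverse transforms above can be realized with a separable matrix formulation, $C = T\,X\,T^{\mathsf T}$, where $T$ collects the cosine basis terms. The sketch below is an illustrative NumPy implementation (the function names are ours, not the paper's):

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II basis matrix T with T[i, m] = u_i * cos((2m+1)i*pi / 2M)."""
    T = np.zeros((M, M))
    for i in range(M):
        u = np.sqrt(1.0 / M) if i == 0 else np.sqrt(2.0 / M)
        for m in range(M):
            T[i, m] = u * np.cos((2 * m + 1) * i * np.pi / (2 * M))
    return T

def dct2(block):
    """2D DCT of a square block: C = T @ X @ T.T."""
    T = dct_matrix(block.shape[0])
    return T @ block @ T.T

def idct2(coeffs):
    """Inverse 2D DCT; T is orthogonal, so the inverse uses its transpose."""
    T = dct_matrix(coeffs.shape[0])
    return T.T @ coeffs @ T
```

Because $T$ is orthogonal, `idct2(dct2(X))` recovers `X` exactly (up to floating-point error), which matches the energy-compaction property exploited by JPEG and the proposed method.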

Color space conversion between RGB and Y′CbCr

Many RGB-based color image watermarking techniques have been introduced during the past two decades (Mahto and Singh 2021). However, regardless of the success of these developed watermarking schemes, they all faced difficulties with JPEG compression attacks. The effect of JPEG compression can be better understood from its encoding process, which consists of the following steps: (i) converting the RGB color model into the color space represented by one luma and two chroma components (Y′, Cb, and Cr); (ii) dividing each channel component in the Y′CbCr model into non-overlapping blocks of 8 × 8 pixels; (iii) performing two-dimensional (2D) DCT on each block; (iv) quantizing the DCT coefficients with a prescribed quantization table; and (v) applying Huffman coding (CCITT 1992; Hudson et al. 2018) to the quantized data. The color space conversion described in (i) is performed as follows:

$$x_{Y'} = 0.299\,x_R + 0.587\,x_G + 0.114\,x_B,\qquad
x_{C_b} = -0.1687\,x_R - 0.3313\,x_G + 0.5\,x_B + 128,\qquad
x_{C_r} = 0.5\,x_R - 0.4187\,x_G - 0.0813\,x_B + 128,\tag{5}$$

where $x_R$, $x_G$, and $x_B$ denote the pixel values in the red, green, and blue channels, and $x_{Y'}$, $x_{C_b}$, and $x_{C_r}$ are the converted luma, blue-difference chroma, and red-difference chroma, respectively. The inverse conversion from Y′CbCr to RGB adopts the following form:

$$x_R = x_{Y'} + 1.402\,(x_{C_r} - 128),\qquad
x_G = x_{Y'} - 0.3441\,(x_{C_b} - 128) - 0.7141\,(x_{C_r} - 128),\qquad
x_B = x_{Y'} + 1.772\,(x_{C_b} - 128).\tag{6}$$

From Eqs. (5) and (6), it is evident that applying the 2D DCT to the left- and right-hand sides of any of the above equations preserves the equality. Specifically,

$$\mathrm{DCT}_{2D}\{x_{Y'}\} = \mathrm{DCT}_{2D}\{0.299\,x_R + 0.587\,x_G + 0.114\,x_B\},\tag{7}$$

where $\mathrm{DCT}_{2D}\{\cdot\}$ represents the 2D DCT transformation. Based on the linear nature of the DCT, the transformed DCT coefficients obey the same relationship:

$$C_{Y'}(i,j) = 0.299\,C_R(i,j) + 0.587\,C_G(i,j) + 0.114\,C_B(i,j).\tag{8}$$

A similar inference applies to the other equations listed in Eqs. (5) and (6). As JPEG compression involves the quantization of $C_{Y'}(i,j)$, $C_{C_b}(i,j)$, and $C_{C_r}(i,j)$, these quantization processes inevitably impair the watermarks hidden in the Y′, Cb, and Cr channels. At the receiving end, JPEG decoding is simply the reverse of the previously mentioned steps.
After restoring the DCT coefficients from the coded bit stream, the decoder performs the inverse 2D DCT to acquire the Y′, Cb, and Cr channel components block-by-block. The final step requires a conversion from Y′CbCr back to the RGB color space. This conversion between the two color spaces further undermines a watermark hidden inside the RGB channels. Many researchers have recognized that watermarking in the Y′CbCr color model can improve resilience against JPEG compression (Mahto and Singh 2021). Moreover, among the Y′, Cb, and Cr components, the luma Y′ is the most suitable for embedding watermarks (Cheema et al. 2020; Liu et al. 2018; Tan et al. 2019).
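The conversion pair in Eqs. (5) and (6) can be sketched as follows. The rounding and clipping that a JPEG codec would subsequently apply are omitted, and the function names are illustrative:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """JFIF (BT.601) full-range RGB -> Y'CbCr; rgb is a float array in [0, 255]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycc):
    """Inverse conversion Y'CbCr -> RGB (chroma is centered at 128)."""
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128.0, ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.stack([r, g, b], axis=-1)
```

Without the quantization step of a real codec, the two functions invert each other almost exactly; the loss that damages watermarks comes from steps (iv) and (v) of the JPEG pipeline, not from the conversion itself.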

Blind color image watermarking in the discrete cosine transform domain
This section presents a novel number-based image watermarking method resistant to color JPEG compression.

Constrained magnitude modulation
Given the previous discussion, we considered embedding the watermark into the Y′ channel. Specifically, to increase the survival rate of the watermark in the presence of JPEG compression, we developed a constrained magnitude modulation (CMM) method to embed three 8-bit unsigned integers (pixel values) into the DCT coefficients derived from the Y′ channel. However, owing to the connections across the Y′, R, G, and B channels, the watermarking process takes effect in the R, G, and B channels simultaneously.
The concept of the proposed watermarking method (hereafter referred to as DCT-CMM) is illustrated in Fig. 1. Within the figure, Fig. 1a demonstrates the block partition of the "Lena" image, while Fig. 1b highlights the locations of the DCT coefficients involved in the subsequent derivation. Three 8-bit unsigned integers were embedded into three DCT coefficients in the luma channel, with the positions selected as $(i_1, j_1) = (3, 1)$, $(i_2, j_2) = (2, 2)$, and $(i_3, j_3) = (1, 3)$, respectively.

Procedures for watermark embedding and extraction
The complete watermark embedding process is displayed in Fig. 2. The watermarks used in this study were downsampled color images of 64 × 64 pixels. For the sake of information security, each watermark was scrambled using the Arnold transform (Arnold and Avez 1968) with a specific encryption key before proceeding with watermark embedding; the correct key is then required to restore the watermark content during extraction. The proposed DCT-CMM method begins by partitioning the host image into non-overlapping 8 × 8 blocks, followed by applying the 2D DCT to each block in the R, G, and B color channels. For the DCT coefficients at locations $(i_1, j_1)$, $(i_2, j_2)$, and $(i_3, j_3)$, we determined the intended intensity levels of the luma components (i.e., q) using Eq. (9) and then pursued the optimal solutions of $\Delta_R$, $\Delta_G$, and $\Delta_B$ using Eq. (16). Adding the derived results to $C_R(i_1, j_1)$, $C_G(i_1, j_1)$, and $C_B(i_1, j_1)$ completed the embedding of the red pixel value. A similar procedure was then applied to the coefficients located at $(i_2, j_2)$ and $(i_3, j_3)$ to embed the green and blue pixel values, respectively. Overall, Fig. 2 presents three branches for embedding the three color pixel values into a color image block.
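The Arnold scrambling step can be sketched as below for an N × N channel. We assume the standard cat map $(x, y) \mapsto (x + y,\; x + 2y) \bmod N$, with the iteration count serving as the encryption key; the paper does not state its exact map parameters, so this is illustrative:

```python
import numpy as np

def arnold_scramble(img, iterations):
    """Apply the Arnold cat map to an N x N channel `iterations` times."""
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                # pixel (x, y) moves to ((x + y) mod n, (x + 2y) mod n)
                nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out

def arnold_descramble(img, iterations):
    """Invert the map by reading each pixel back from its scrambled position."""
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        prv = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                prv[x, y] = out[(x + y) % n, (x + 2 * y) % n]
        out = prv
    return out
```

Because the map is a bijection on the pixel grid, descrambling with the same iteration count recovers the original watermark exactly; an attacker without the key sees only a noise-like image.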
Watermark extraction is the process of retrieving a watermark from a watermarked color image. The extraction procedure is illustrated in Fig. 3, together with the use of a residual learning network to further enhance the recovered watermark. The first step is to convert the watermarked image from the RGB to the Y′CbCr color space and then perform block partitioning. At first glance, this step appears different from the one used in watermark embedding, although the purpose is the same: it simply prepares the luma component $\{x_{Y'}(m,n)\}$. Because $\{C_{Y'}(i,j)\}$ can be obtained from $\{x_{Y'}(m,n)\}$ directly, watermark extraction does not require the individual DCTs of the R, G, and B channels. This arrangement saves two rounds of 2D DCT computation, thus accelerating the processing speed.
Supposing that the three DCT coefficients $\tilde{C}_{Y'}(i_1, j_1)$, $\tilde{C}_{Y'}(i_2, j_2)$, and $\tilde{C}_{Y'}(i_3, j_3)$ have been extracted from the $(p, q)$th block after the 2D DCT, substituting these three coefficients into Eq. (17) yields the red, green, and blue levels of a color pixel.
In Eq. (17), $\tilde{w}_R(p,q)$, $\tilde{w}_G(p,q)$, and $\tilde{w}_B(p,q)$ denote the intensity levels of the $(p,q)$th pixel of the watermark in the red, green, and blue channels, respectively. The tildes on the variables indicate that the watermark image might have suffered intentional attacks or unintentional modifications. After gathering all the pixels, the final step is to use the secret key to decrypt the watermark image.

Pixel value regulation
In the experiments, we observed that some of the extracted watermarks appeared brighter (such as those recovered after unsharp filtering), while others appeared darker (such as those recovered after median/Wiener filtering). Interestingly, these watermarks with altered luminance were still visually recognizable, suggesting that the attacks had simply produced a stretching effect on the pixel values.
Consequently, regulating the pixel values back to the normal luminance range was expected to render a better view of the recovered watermarks. Luminance regulation starts by sorting, in ascending order, the pixel values gathered from every channel of the color watermark. The average of the sorted values lying in the 90–95% range (termed g) then serves as a gauge marking the dynamic range of the watermark. Depending on the resulting g, each pixel value of the recovered watermark is proportionally rescaled according to Eq. (18) to ensure that the regulated g falls within a reasonable range. Here, $\tilde{w}_{\mathrm{reg}}(p,q)$ denotes the regulated outcome of the retrieved pixel value $\tilde{w}(p,q)$. In this way, the regulator can bring the luminance of the color watermark back to a normal range. It should also be noted that the scale adjustment given in Eq. (18) does not influence the textural content, although it renders the watermark more visually distinguishable.

Fig. 4 Test color images. Each image consists of 512 × 512 pixels
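The regulation step described above can be sketched per channel as follows. Since Eq. (18) is not reproduced here, the target level to which the gauge g is mapped (230) is our assumption:

```python
import numpy as np

def regulate_channel(ch, lo=0.90, hi=0.95, target=230.0):
    """Rescale pixel values so that the average of the sorted values in the
    90-95% range (the gauge g) maps to `target`. The target level is an
    assumption; the paper's Eq. (18) specifies the exact range."""
    s = np.sort(ch.ravel())
    i0, i1 = int(lo * s.size), int(hi * s.size)
    g = s[i0:i1].mean()
    if g <= 0:
        return ch.copy()  # degenerate (all-dark) channel: leave unchanged
    return np.clip(ch * (target / g), 0.0, 255.0)
```

Because the operation is a single multiplicative scale (followed by clipping to the valid range), relative pixel differences, and hence texture, are preserved while the overall luminance is normalized.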

Performance evaluation
The test materials comprised 25 color images (512 × 512 pixels) collected from the USC-SIPI (1997) and CVG-UGR (2002) image databases, which are presented in Fig. 4 as a 5 × 5 array. The watermarks were 25 downsampled color images (64 × 64 pixels) chosen from the IAPR TC-12 Benchmark (Grubinger et al. 2006). The IAPR TC-12 database includes 20,000 still natural images of people, animals, cities, and landscapes, making it suitable for training deep-learning neural networks for image denoising and super-resolution reconstruction. For security reasons, each color channel of the watermark image was scrambled using the Arnold transform (Arnold and Avez 1968) before embedding. The test watermarks and their scrambled versions examined in our experiments are presented in Fig. 5. The embedding strength α for the proposed DCT-CMM method was empirically set to 30 to maintain a balance between the imperceptibility and robustness of the watermark. In addition to the proposed DCT-CMM, six recently published blind color image watermarking methods were employed for performance evaluation and comparison. For simplicity, these six methods are abbreviated to DFT20 (Zhang et al. 2020), Haar21 (Liu et al. 2021b), WHT22 (Chen et al. 2022), DWT20 (Su et al. 2020a), SD21 (Liu et al. 2021a), and QRMM22 (Hsu et al. 2022).

Imperceptibility test
To assess the influence on the host image due to the watermarking process, we adopted the peak signal-to-noise ratio (PSNR) and MSSIM metrics to determine the degradation of image quality. The definitions of PSNR and MSSIM are as follows:

$$\mathrm{PSNR} = 10\log_{10}\frac{255^2 \cdot N_{row} N_{col} \cdot 3}{\sum_{m,n,t}\bigl(I_{m,n,t} - \hat{I}_{m,n,t}\bigr)^2},\qquad
\mathrm{MSSIM} = \frac{1}{L \times K \times 3}\sum_{t=1}^{3}\sum_{l=1}^{L}\sum_{k=1}^{K}\mathrm{SSIM}\bigl(B_{l,k,t}, \hat{B}_{l,k,t}\bigr),$$

where $I = \{I_{m,n,t}\}_{N_{row}\times N_{col}\times 3}$ and $\hat{I} = \{\hat{I}_{m,n,t}\}_{N_{row}\times N_{col}\times 3}$, respectively, represent the original and watermarked images of $N_{row} \times N_{col}$ pixels with a depth of three color channels. Terms $B_{l,k,t}$ and $\hat{B}_{l,k,t}$ correspond to the $(l,k)$th windows in the $t$th channel acquired from $I$ and $\hat{I}$, respectively. The denominator $L \times K \times 3$ indicates the number of image windows, while the function $\mathrm{SSIM}(\cdot)$ measures the degree of similarity between $B_{l,k,t}$ and $\hat{B}_{l,k,t}$:

$$\mathrm{SSIM}\bigl(B_{l,k,t}, \hat{B}_{l,k,t}\bigr) = \frac{\bigl(2\mu_{B_{l,k,t}}\mu_{\hat{B}_{l,k,t}} + k_1\bigr)\bigl(2\sigma_{B_{l,k,t}\hat{B}_{l,k,t}} + k_2\bigr)}{\bigl(\mu_{B_{l,k,t}}^2 + \mu_{\hat{B}_{l,k,t}}^2 + k_1\bigr)\bigl(\sigma_{B_{l,k,t}}^2 + \sigma_{\hat{B}_{l,k,t}}^2 + k_2\bigr)},$$

where $\mu_{B_{l,k,t}}$ and $\sigma_{B_{l,k,t}}^2$, respectively, represent the mean and variance of $B_{l,k,t}$; $\sigma_{B_{l,k,t}\hat{B}_{l,k,t}}$ denotes the covariance between $B_{l,k,t}$ and $\hat{B}_{l,k,t}$; and $k_1$ and $k_2$ are constants introduced to ensure the stability of the SSIM. By default, these values were $k_1 = 0.01$ and $k_2 = 0.02$. Table 2 presents the statistical results of the PSNR and MSSIM values for the test images and watermarks. The proposed DCT-CMM resulted in an average PSNR of 37.253 dB with an MSSIM of 0.961. The QRMM22 achieved comparable outcomes, with the PSNR remaining above 37 dB; however, its MSSIM was somewhat lower than that of our method. Compared with the QRMM22, the embedding strength used in the SD21 was stronger, as reflected by a decrease of approximately 1.6 dB in PSNR. Consequently, the MSSIM of the SD21 was worse than that of the QRMM22.
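The two metrics can be sketched as below, computed over non-overlapping 8 × 8 windows per channel. The exact window layout and the scaling of the stability constants (here $(k\cdot 255)^2$, as in the common SSIM convention) are our assumptions:

```python
import numpy as np

def psnr(img, ref, peak=255.0):
    """Peak signal-to-noise ratio over all pixels and channels."""
    mse = np.mean((img.astype(float) - ref.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def mssim(img, ref, win=8, k1=0.01, k2=0.02, peak=255.0):
    """Mean SSIM over non-overlapping win x win windows of each channel."""
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    vals = []
    for t in range(img.shape[2]):
        for r in range(0, img.shape[0] - win + 1, win):
            for c in range(0, img.shape[1] - win + 1, win):
                a = img[r:r+win, c:c+win, t].astype(float)
                b = ref[r:r+win, c:c+win, t].astype(float)
                mu_a, mu_b = a.mean(), b.mean()
                va, vb = a.var(), b.var()
                cov = ((a - mu_a) * (b - mu_b)).mean()
                vals.append(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                            ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2)))
    return float(np.mean(vals))
```

For identical inputs, each window's SSIM evaluates to exactly 1, so MSSIM = 1; any watermark-induced distortion lowers both metrics in a complementary way (PSNR tracks pixel error, MSSIM tracks structural change).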
Among the seven compared methods, the PSNR obtained by the DFT20 tended to be the lowest, but its MSSIM was among the leading group. The reason is that the DFT20 merely modifies the DC component of each block, leaving the structural content intact. Because the Haar21 modulated the most significant Haar coefficients derived from smaller image blocks of 2 × 2 pixels, its scope of influence on the host image was rather broad; the resulting MSSIM was unacceptable, even though the corresponding PSNR appeared normal. A similar phenomenon was observed for the DWT20, whose embedding process affected every pixel. As a result, the DWT20 received the lowest MSSIM, while its PSNR remained moderate. In contrast, the affected area for the WHT22 in each embedding block was only partial, leading to a relatively high MSSIM.

Robustness test
The second phase of the performance evaluation concerned the robustness of the compared watermarking methods in the presence of commonly encountered attacks. The types of attacks considered in this study included image compression, noise corruption, filtering, histogram equalization, and geometric correction. Table 3 lists the specifications of the intended attacks.
As explained in Sect. 4.2, the DCT-CMM was designed to embed three 8-bit pixel values into the mid-frequency DCT coefficients derived from an 8 × 8 color image block. Consequently, a color image of 512 × 512 pixels can hide a color watermark of 64 × 64 × 24 bits. According to our literature survey, there have been very few attempts to deliver such high-capacity watermarking. Moreover, previously developed color image watermarking methods were designed to embed binary information. Because the embedding object of the proposed DCT-CMM differs from those that performed binary watermarking, this difference imposed difficulties during the performance comparisons. A feasible assessment applicable to arbitrary data types is the zero-normalized cross-correlation (ZNCC), which measures the similarity of two data sequences:
$$\mathrm{ZNCC} = \frac{\sum \bigl(w_v - \bar{w}_v\bigr)\bigl(\tilde{w}_v - \bar{\tilde{w}}_v\bigr)}{\sqrt{\sum \bigl(w_v - \bar{w}_v\bigr)^2 \sum \bigl(\tilde{w}_v - \bar{\tilde{w}}_v\bigr)^2}},\tag{21}$$

where $\bar{w}_v$ denotes the mean of the pixel values taken from a watermark image $W_v = \{w_v\}$ of size $n_L \times n_K\ (= 64 \times 64)$ with $n_{ch}\ (= 3)$ color channels, and the tilde signs signify retrieval after a possible attack. Where the watermark consists of binary bits, the ZNCC adopts the following form:

$$\mathrm{ZNCC} = \frac{\sum \bigl(w_b - \bar{w}_b\bigr)\bigl(\tilde{w}_b - \bar{\tilde{w}}_b\bigr)}{\sqrt{\sum \bigl(w_b - \bar{w}_b\bigr)^2 \sum \bigl(\tilde{w}_b - \bar{\tilde{w}}_b\bigr)^2}},\tag{22}$$

where $\bar{w}_b$ denotes the mean of the binary values taken from a watermark image $W_b = \{w_b\}$ of size $n_L \times n_K \times n_{ch} \times n_b$. Here, $n_L = n_K = 64$, $n_{ch} = 3$, and $n_b = 8$.
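The ZNCC above amounts to a Pearson correlation over the flattened watermark data, which makes it invariant to the global brightness shifts and contrast stretches discussed in the pixel-regulation section. A minimal sketch:

```python
import numpy as np

def zncc(w, w_tilde):
    """Zero-normalized cross-correlation between two same-shaped arrays."""
    a = w.astype(float).ravel() - w.mean()
    b = w_tilde.astype(float).ravel() - w_tilde.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Because the means are subtracted and the result is normalized by the standard deviations, any watermark of the form a·w + b (a > 0) scores exactly 1 against w, while uncorrelated images score near 0.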
It should be noted that the ZNCC specified in Eq. (22) can also be used to verify the legitimacy of the watermark against illicit ones. The verification process is demonstrated in Fig. 6, which shows the ZNCC between the desired watermark and arbitrary candidates selected from a pool of 1000 watermark images.

Fig. 6 Panel (c) depicts the ZNCC coefficients obtained from a subset of the IAPR TC-12 image database, with the scrambled version (shown in (b)) of the 495th watermark image (shown in (a)) chosen as the targeted object

The distinct value of 1 indicates its identity at location 495, while the other ZNCCs are mostly distributed between -0.2 and 0.2. Consequently, a ZNCC value > 0.5 can be considered a strong indicator of the existence of the watermark. Table 4 summarizes the average ZNCCs for the watermarks retrieved under various attacks. As revealed by the tabulated data, except for two cases (salt-and-pepper noise corruption and histogram equalization), the proposed DCT-CMM generally outperformed the other six methods. The inferior performance of the DCT-CMM under salt-and-pepper noise corruption can be ascribed to the fact that the extreme alteration of even a few pixels in an image block is enough to perturb the DCT magnitudes noticeably. In contrast, the other six methods demonstrated superior resistance against salt-and-pepper noise corruption. This advantage is conceivable because the noise attack only affects part of the small-size image blocks. For histogram equalization, the ZNCC obtained by the proposed method was also below that acquired by the WHT22. In this particular case, the higher ZNCCs of the WHT22 benefited from its transformed domain and embedding scheme.
A general comparison across the other six methods revealed that each had its advantages and disadvantages when facing different attacks. Importantly, none of them could match the proposed DCT-CMM under overall consideration, and most could hardly survive attacks such as JPEG compression and median filtering. To provide a common view of how the extracted watermarks looked, Table 5 presents two representative watermarks that were embedded into the "Lena" color image and recovered after various attacks. Here, the merit of the DCT-CMM is evident, since the extracted watermarks were all visually recognizable, regardless of the type of attack.

Watermark enhancement via deep learning
As the retrieved watermarks were only 64 × 64 color pixels, their resolution was insufficient when zooming was required for more detailed inspection. However, advances in deep learning techniques make it possible to enhance the quality and resolution of watermark images using a very deep super-resolution (SR) network. SR reconstruction is challenging because the original high-frequency content does not typically reside in low-resolution images. Following the procedure presented by Ledig et al. (2017), we employed a residual learning network to learn the residual, defined as the difference between the high-resolution reference image and the up-sampled low-resolution image. Theoretically, provided the residual image retains the high-frequency details of an image, adding the residual to the up-sampled low-resolution image is expected to compensate for the missing high-frequency components.
The residual network (ResNet) used to perform image denoising and super-resolution reconstruction is depicted in Fig. 7, which adopts the same network architecture used by Ledig et al. (2017). The ResNet model can be thought of as a composition of K functions, each involving adjustable weights W and biases b, forming a nonlinear transformation between an input x of dimension n and an output y of dimension p.
The determination of the Ws and bs requires supervised learning techniques along with a sufficient number of data samples. In Fig. 7, once the watermark image is fed into a convolutional (abbreviated as "Conv") layer with the parametric rectified linear unit (PReLU) as the activation function, the SR model uses an identity mapping function to learn the high-frequency details. The upper branch of the additive residual learning network carries the identity mapping, while the residual branch is a stacked structure composed of five layers arranged as "convolution, batch normalization (BN), PReLU, convolution, and BN." For the specifications of each layer, readers are directed to the original paper (Ledig et al. 2017). After combining the identity mapping and the additive residual, the result is further propagated through two stacks formed by "Conv, PixelShuffle, and PReLU" layers. Here, the primary function of "PixelShuffle" is to acquire high-resolution features from low-resolution feature maps through convolution and reorganization across multiple channels. The number within the parentheses of the label "PixelShuffle(2)" signifies the desired up-sampling factor in each dimension. As there are two PixelShuffle(2) layers in the SR network model, the outcome is a 256 × 256-pixel color image.
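The architecture described above can be sketched in PyTorch as an SRResNet-style generator (after Ledig et al. 2017): a Conv + PReLU head, stacked residual blocks of Conv-BN-PReLU-Conv-BN with identity mappings, and two PixelShuffle(2) stages for the 4× upsampling from 64 × 64 to 256 × 256. The layer widths and block count below are assumptions, not the paper's exact settings:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-PReLU-Conv-BN with a skip connection (identity mapping)."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)  # identity plus learned residual

class SRResNet(nn.Module):
    """64x64 RGB watermark in, 256x256 RGB reconstruction out (4x per dim)."""
    def __init__(self, blocks=16, ch=64):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, ch, 9, padding=4), nn.PReLU())
        self.body = nn.Sequential(*[ResidualBlock(ch) for _ in range(blocks)])
        self.up = nn.Sequential(
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(ch, 3, 9, padding=4))

    def forward(self, x):
        f = self.head(x)
        return self.up(f + self.body(f))  # global skip over the residual body
```

Each PixelShuffle(2) rearranges a `4*ch`-channel feature map into a `ch`-channel map of twice the spatial size, so the two stages together deliver the 16-fold increase in pixel count reported in the abstract.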
The watermarks used for training the SR network consisted of 1000 color images taken from the IAPR TC-12 database (Grubinger et al. 2006). Before launching the network training, we adopted a preprocess analogous to image augmentation to expand the dataset. Specifically, we first trimmed each watermark image to 256 × 256 pixels and then down-sampled it to 64 × 64 pixels. The four classical images ("Lena," "Baboon," "Peppers," and "F16") depicted in the top row of Fig. 4 served as the host images in a simulation of watermark extraction under attacks. Apart from the ideal case without any attack, we applied every attack listed in Table 3 to the four watermarked images and then retrieved the watermarks using the expressions in Eqs. (17) and (18). Each retrieved watermark (64 × 64 pixels) and its original high-resolution image (256 × 256 pixels) formed an input-output pair for supervised learning. Overall, 68,000 (= 4 × 17 × 1000) watermark samples were used for SR network training. Our purpose here was twofold: (1) training the SR network to remove the noise resulting from imperfect watermark extraction, and (2) attaining a higher resolution. During the training phase, we selected the Adam optimizer, used a mini-batch of 16 observations at each iteration, and set the maximum number of epochs to 100. The ResNet model training was performed using the PyTorch framework with an NVIDIA 3080 GPU to accelerate processing.
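The training configuration described above (Adam optimizer, mini-batches of 16 low-resolution/high-resolution pairs) can be sketched as a generic supervised training step; the pixel-wise MSE loss and the learning rate are our assumptions, since the paper does not state them:

```python
import torch
import torch.nn as nn

def train_step(model, opt, lr_batch, hr_batch, loss_fn=nn.MSELoss()):
    """One optimization step: predict HR from LR, backpropagate the loss."""
    opt.zero_grad()
    loss = loss_fn(model(lr_batch), hr_batch)
    loss.backward()
    opt.step()
    return loss.item()
```

In the paper's setting, `lr_batch` would hold 16 retrieved 64 × 64 watermarks and `hr_batch` their 256 × 256 originals, iterated over the 68,000-pair dataset for up to 100 epochs.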
The improvement from employing the SR network is demonstrated in Fig. 8, which presents the exemplary watermarks retrieved from the watermarked "Lena" image under the examined attacks. Inside each cell, the left side comprises the retrieved watermark in the upper-left corner along with its version up-sampled fourfold in each dimension using bicubic interpolation, while the right side shows the high-resolution image produced by the SR network. As the contrast within each cell of Fig. 8 reveals, the SR network generally rendered clearer high-resolution images with more high-frequency details.
To measure the improvement quantitatively, we compared the ZNCCs obtained with and without the SR network. In the final experiment, apart from the training set of 1000 64 × 64-pixel color watermarks used in network training, we adopted another 500 color watermarks from the same database to examine the efficiency of the trained SR network. Figures 9 and 10, respectively, present the average ZNCCs between the original and retrieved watermarks in the presence of various attacks for the training and test datasets. Both figures show three types of ZNCCs ("w64," "BIw256," and "D&SRw256"), which, respectively, correspond to the results derived from the 64 × 64-pixel watermarks, the bicubically interpolated 256 × 256-pixel watermarks, and the denoised and super-resolved 256 × 256-pixel watermarks. The "w64" bars in these two figures generally reflect the statistical distribution described in Table 3. Directly applying bicubic interpolation to the watermark, as shown by the "BIw256" bars, did not yield any significant improvement in ZNCC. In contrast, the benefit of the SR network is evident, as the resulting ZNCCs generally reached a level > 0.9, regardless of the type of attack. The ZNCCs obtained from the test dataset exhibited a similar trend; compared to the results for the training dataset, these values varied somewhat, although the differences were negligible. The biggest change from involving the SR network occurred in case F (i.e., median filtering), where the SR network increased the ZNCC from 0.503 to 0.887 for the training set and from 0.504 to 0.882 for the test set.
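For reference, the ZNCC between two equal-size images subtracts each image's mean and normalizes by the pixel standard deviations, which makes the score insensitive to global brightness and contrast shifts. A straightforward NumPy sketch (the function name is ours):

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation between two equal-size images.

    Returns 1.0 for identical images and values near 0 for unrelated ones.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()          # remove the mean (the "zero-normalized" part)
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

img = np.random.rand(64, 64, 3)
noisy = img + 0.05 * np.random.randn(*img.shape)
print(zncc(img, img))    # 1.0 (up to floating-point rounding)
print(zncc(img, noisy))  # close to 1, degraded by the added noise
```

A retrieved watermark that survives an attack well therefore scores near 1 even if the attack has shifted its overall brightness.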

Conclusions
In this study, we developed a DCT-based color watermarking method, entitled constrained magnitude modulation (CMM), which directly embeds color pixel values into the designated DCT coefficients. Moreover, to reinforce robustness against JPEG compression, watermark embedding was performed on the luma components derived from the three prime color channels. In particular, we incorporated the Lagrange multiplier to ensure the minimum alteration of each involved DCT coefficient. For high-capacity color watermarking at a rate of 3/64 8-bit unsigned integers per pixel, the use of the suggested embedding strength in the DCT-CMM achieved an average PSNR of approximately 37.253 dB and an MSSIM of 0.961, both comparable to the other recently developed watermarking methods tested. The robustness of the proposed DCT-CMM was evidenced by the ZNCC between the original and extracted watermarks under various attacks. Except for salt-and-pepper noise corruption and histogram equalization, the proposed method was generally the most successful. Among the attacks considered in this study, median filtering presented the most difficult challenge. Nonetheless, thanks to the robustness of the DCT-CMM, the resulting ZNCC still reached an average of 0.525, while the six compared methods achieved considerably inferior ZNCCs (0.004-0.456).

After extracting the color watermark from the watermarked image, we employed a residual network to perform image denoising and super-resolution reconstruction. Our experimental results confirmed that the SR network could increase the ZNCC to a level > 0.88, regardless of the encountered attack scenario. Moreover, the improvement due to the SR network was conspicuous even through visual perception, as the processed outputs rendered more recognizable, higher-quality, and higher-resolution color images.
The proposed approach has generally shown superiority in the examined aspects of blind image watermarking, but there is still room for improvement. For example, the approach lacks sufficient resilience against median and Wiener filtering. In addition, the enhanced color watermark appears somewhat over-smoothed after the resolution is raised. Our future work will focus on two aspects: (1) developing a watermarking method that can resist various filtering attacks, and (2) employing another type of super-resolution technique, such as a generative adversarial network, to improve the perceived quality of recovered watermarks.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants performed by any authors.