A Unified Framework of Deep Unfolding for Compressed Color Imaging

Abstract: Traditional iterative reconstruction algorithms for compressed color imaging often suffer from long reconstruction times and low reconstruction accuracy at extremely low subsampling rates. This paper proposes a model-driven deep learning framework for compressed color imaging. In the training step, image blocks at the same position of the R, G, and B channel images are extracted as the ground truth, and singular value decomposition is performed on the measurement matrix to obtain an optimized measurement matrix and low-dimensional measurements. The ground truth and the optimized measurements are then used to construct a large number of training data pairs to train an 'end-to-end' deep unfolding model for compressed color imaging. In the test step, the single pretrained model reconstructs high-quality images from the optimized low-dimensional compressed measurements of each channel, and the channels are synthesized into a color image. Numerical experiments demonstrate that the proposed unified framework achieves high-accuracy, real-time reconstruction of color images at extremely low subsampling rates.


I. Introduction
Compressed sensing (CS) theory shows that when a signal is sparse, it can be recovered from far fewer measurements than required by the Nyquist-Shannon sampling theorem [1]. CS has been successfully used in sensor networks [2], medical imaging, and other fields. Because color images provide more information about the real world than grayscale images and show more detail to meet the visual and physical needs of the audience, research on color images has attracted increasing attention.
Compressed sensing has been introduced into color image sensing, where the correlation between the channels of a color image is exploited to achieve accurate reconstruction under compressive sampling, which greatly improves the sampling efficiency. In 2009, Nagesh et al. exploited the inter-channel correlation of color images and proposed a non-convex weighted least squares method based on group sparsity to achieve accurate reconstruction of color images [3].
In 2010, Majumdar et al. proposed two new non-convex group sparse optimization methods under the premise that each color channel is sparse and highly correlated in a specific domain (DCT, wavelet, etc.). The original color image is reconstructed from random projections (subsampling) of each color channel. The experimental results show that, compared with ordinary sparse optimization algorithms, incorporating group sparsity into the reconstruction problem further improves the image reconstruction accuracy [4].
In 2012, Liu et al. proposed separating the RGB components of a color image into dense and sparse parts; the dense part is encoded with a traditional coding method, and the sparse part is encoded with compressed sensing, completing the coding of the color image [5]. In 2019, Ye et al. proposed transforming the components into the YCbCr color space and then applying a compressed sensing super-resolution method to reduce the distortion and aliasing in super-resolution reconstruction of color images [6].
One key problem in color image sensing is the sparse optimization algorithm used to recover the color image from compressed measurements; the most common methods are model-driven iterative optimization algorithms. Although structured priors such as group sparsity can improve the reconstruction quality, these methods face two major challenges. On the one hand, the iterative reconstruction process is extremely time-consuming, especially for large-scale images. On the other hand, traditional compressed imaging methods have low reconstruction quality at very low sampling rates.
To solve such ill-posed inverse problems, the rise of deep learning has opened a "data-driven" era, in which data features are extracted and recognized from a large number of data samples.
The problem of recovering the original signal is transformed into the problem of learning a mapping from the original input to the desired output. This end-to-end learning strategy has a synergistic advantage, making it more likely to obtain an optimal solution [7].
In 2015, Mousavi et al. applied deep learning to compressed sensing for the first time and proposed using Stacked Denoising Autoencoders (SDA) to reconstruct signals from subsampled measurements [8]. In 2017, Mousavi et al. also proposed using a Deep Convolutional Neural Network (DCNN) to learn the inverse transformation from the measurement vector to the signal [9]. In 2017, Yao et al. proposed the Deep Residual Reconstruction Network (DR2-Net), in which a fully connected layer serves as a linear mapping for a preliminary reconstruction, and the result is then refined by training residual learning blocks on the difference between the preliminary reconstruction and the ground truth [10,11]. In 2019, Zhou et al. addressed block-based image compressed sensing by developing a block-based CS algorithm that uses the correlation between image blocks to reconstruct blocks at different sampling rates and assemble them within a single model [12]. To solve the nonlinear optimization problem in compressed sensing, end-to-end learning does not adopt a piecewise linear approach to approximate the globally optimal solution, but instead learns the mapping from the original input to the expected output through a deep neural network.
To overcome the poor reconstruction quality at extremely low sampling rates and the time-consuming reconstruction of traditional iterative approaches for color image compressed sensing, this paper combines the iterative approach with deep learning, embedding physical prior knowledge into the network, and proposes a unified model-driven deep learning framework for compressed sensing of color images, Deep Unfolding Compressed Color Imaging (DUCCI). A singular value decomposition preprocessing layer, which produces an optimized measurement matrix and optimized measurements, is then added to DUCCI to train the deep neural network to reconstruct the R, G, and B channels of the color image from low-rate compressed measurements; the resulting method is dubbed SVD-DUCCI. SVD-DUCCI takes advantage of both the deep learning approach and the classical iterative optimization method: it makes the network interpretable, avoids the computational complexity of iterative optimization, and realizes non-iterative, real-time, high-quality reconstruction of color images from extremely low-rate subsampling.

Compressive Color Imaging
Color images can be divided into three channels: red (R), green (G), and blue (B). The component of each channel is assumed to be sparse in a certain transform domain (such as a DCT or wavelet basis), that is,

$$x_c = \Psi s_c, \quad c \in \{R, G, B\},$$

where $c$ denotes the $c$-th channel of the color image and $s_c$ denotes its sparse transform coefficients. Since the image is sparse in the transform domain, the high-dimensional signal can be projected onto a low-dimensional space through a random measurement matrix $\Phi$, and the measurements are obtained as

$$y_c = \Phi x_c.$$

Therefore, the compressed sampling model of the color image can be written as

$$\min_{x} \; \frac{1}{2} \left\| \Theta x - y \right\|_2^2 + \lambda \left\| \Psi x \right\|_1 \tag{5}$$

where $y = [y_R, y_G, y_B]^T$ is the measurement of the color image and $\Theta$ is the measurement matrix, which is block diagonal:

$$\Theta = \begin{bmatrix} \Phi & 0 & 0 \\ 0 & \Phi & 0 \\ 0 & 0 & \Phi \end{bmatrix}.$$

The Iterative Shrinkage Thresholding Algorithm (ISTA) is suitable for solving many large-scale linear inverse problems. ISTA solves the CS reconstruction problem in equation (5) by iterating between the following update steps:

$$r^{(k)} = x^{(k-1)} - \rho \, \Theta^T \left( \Theta x^{(k-1)} - y \right) \tag{6}$$

$$x^{(k)} = \arg\min_{x} \; \frac{1}{2} \left\| x - r^{(k)} \right\|_2^2 + \lambda \left\| \Psi x \right\|_1 \tag{7}$$

where $k$ is the index of the ISTA iteration and $\rho$ is the step size. When $\Psi$ is a non-orthogonal or non-linear complex transformation basis, many iterations are required for $x^{(k)}$ in equation (7) to become an accurate solution, which causes a high computational cost. The optimal transformation basis $\Psi$ and all parameters, for example $\lambda$ and $\rho$, are defined manually, which makes the adjustment of prior knowledge very challenging.

$\mathcal{G}$ is a learnable linear convolution operator, which N P represents the number of feature maps.
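The ISTA update steps described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the identity matrix stands in for the sparsifying basis, the sizes `n`, `m`, `k` and the value of `lam` are arbitrary toy choices, and the shrinkage threshold is written as `rho * lam`, the usual proximal-gradient scaling. The block-diagonal matrix is built with `np.kron` from one shared per-channel matrix, which is an assumption about how the three channels are measured.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(Theta, y, x, lam):
    """0.5 * ||Theta x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((Theta @ x - y) ** 2) + lam * np.sum(np.abs(x))

def ista(Theta, y, lam=0.01, num_iters=200):
    """Plain ISTA with an identity sparsifying basis (assumption)."""
    rho = 1.0 / np.linalg.norm(Theta, 2) ** 2      # step size 1/L
    x = np.zeros(Theta.shape[1])
    for _ in range(num_iters):
        r = x - rho * Theta.T @ (Theta @ x - y)    # gradient step on the data term
        x = soft_threshold(r, rho * lam)           # shrinkage step
    return x

rng = np.random.default_rng(0)
n, m, k = 64, 32, 4                                # toy sizes per channel
Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # shared per-channel matrix
Theta = np.kron(np.eye(3), Phi)                    # block-diagonal matrix for R, G, B

x_true = np.zeros(3 * n)                           # sparse stacked signal
support = rng.choice(3 * n, size=3 * k, replace=False)
x_true[support] = rng.standard_normal(3 * k)
y = Theta @ x_true                                 # stacked measurements

x_hat = ista(Theta, y)
print("objective at zero :", objective(Theta, y, np.zeros(3 * n), 0.01))
print("objective at x_hat:", objective(Theta, y, x_hat, 0.01))
```

With a step size of 1/L the objective does not increase from one iteration to the next, but many iterations may be needed, which is the cost that deep unfolding aims to avoid.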

ISTA-Net Framework
Ⅲ. Deep Unfolding Compressed Color Imaging

Singular value decomposition
ISTA-Net+ requires the reconstruction matrix to be semi-orthogonal. In practical applications, this condition is very difficult to meet. In the added preprocessing layer, we perform singular value decomposition on the measurement matrix to obtain optimized training data pairs, which relaxes the requirements on the front-end measurement matrix design while ensuring that the reconstruction matrix is semi-orthogonal.
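Singular value decomposition is performed on the measurement matrix $\Phi$ in equation (4), so that the measurement model can be expressed as follows:

$$y = \Phi x = U \begin{bmatrix} D_1 & 0 \end{bmatrix} V^T x \tag{8}$$

where $U$ and $V$ are orthogonal matrices and $D_1$ is the diagonal matrix of singular values. Left-multiplying both sides of equation (8) by $D_1^{-1} U^T$ gives the following equation:

$$D_1^{-1} U^T y = \begin{bmatrix} I & 0 \end{bmatrix} V^T x = V_1^T x \tag{9}$$

where $V_1$ is the submatrix of $V$ formed by its leading columns, one per singular value. Combining equations (8) and (9), the optimized measurement matrix is $\Phi_{SVD} = V_1^T$, which satisfies $\Phi_{SVD} \Phi_{SVD}^T = I$, and the optimized measurement is $y_{SVD} = D_1^{-1} U^T y$.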
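The SVD preprocessing step can be checked numerically with a short NumPy sketch. The function name `svd_preprocess`, the Gaussian test matrix, and the 5% rate on a 33*33 block are illustrative assumptions; the sketch also assumes the measurement matrix has full row rank so that all singular values are non-zero.

```python
import numpy as np

def svd_preprocess(Phi, y):
    """Return the optimized matrix V1^T and the optimized measurement D1^{-1} U^T y."""
    # Thin SVD: Phi = U @ diag(s) @ Vt, with Vt of shape (m, n) when m < n
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    Phi_svd = Vt                 # rows are orthonormal, i.e. semi-orthogonal
    y_svd = (U.T @ y) / s        # left-multiply by D1^{-1} U^T
    return Phi_svd, y_svd

rng = np.random.default_rng(0)
n = 33 * 33                      # length of one vectorized image block
m = round(0.05 * n)              # number of measurements at a 5% rate
Phi = rng.standard_normal((m, n))
x = rng.random(n)                # stand-in for a vectorized block
y = Phi @ x

Phi_svd, y_svd = svd_preprocess(Phi, y)
print("semi-orthogonal:", np.allclose(Phi_svd @ Phi_svd.T, np.eye(m)))
print("consistent     :", np.allclose(Phi_svd @ x, y_svd))
```

The second check confirms that the optimized pair describes the same unknown block as the original pair, so the front-end measurement matrix does not need to be redesigned.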

Sub-channel training strategy
The diagram of compressed sensing and rapid reconstruction for color images is depicted in Figure 2. First, we input an original color image, extract the R, G, and B channel images, and obtain different measurements through a degradation system. This process is called the forward process of image recovery. The inverse process of recovering the original image from these measurements is called the recovery problem. Traditional solution methods mainly include convex optimization and greedy algorithms, but as the model of the inverse problem becomes more complicated, the computational complexity of the optimization algorithm grows. Deep learning-based methods can directly learn the non-linear mapping from input to output, which further improves the quality and speed of image recovery.

A Unified 'End-to-End' Framework for Compressed Color Imaging

Ⅳ. Numerical Experiments
This section evaluates the performance of the proposed method for compressed sensing of color images. The datasets used for training and testing are described first, then the detailed parameters of neural network training are presented. Finally, different methods are compared.

Training and test sets
In the experiment, we use 91 color images [14] as the training set for the network. An SVD post-processing layer is added to ISTA-Net+ to extend the ISTA-Net+ framework to other generalized compressed imaging systems with non-orthogonal measurement matrices, which allows the optical imaging process and the reconstruction process, i.e., the measurement matrix $\Phi$ and the reconstruction matrix $\Phi_{SVD}$, to be designed and optimized separately.
To verify the superiority of our proposed U-SVD-ISTA-Net+ framework for color compressed sensing, we test on a standard test set of 11 color images, including Lena, Baboon, etc. SNR, PSNR, and SSIM are used as the evaluation indices for image reconstruction.
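For reference, SNR and PSNR can be computed as in the sketch below. The paper does not state its exact definitions, so the formulas here are the commonly used ones and should be read as assumptions: PSNR uses a peak value of 255 for 8-bit images, and SNR is the ratio of signal energy to error energy. The pixel values are made-up toy data, and SSIM is omitted because it needs windowed statistics.

```python
import math

def mse(ref, rec):
    """Mean squared error between two equally long pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    err = mse(ref, rec)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def snr(ref, rec):
    """Signal-to-noise ratio in dB: signal energy over error energy."""
    noise = sum((a - b) ** 2 for a, b in zip(ref, rec))
    signal = sum(a ** 2 for a in ref)
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)

# Toy example: four reference pixels and their reconstructions
ref = [52.0, 55.0, 61.0, 59.0]
rec = [53.0, 54.0, 60.0, 60.0]
print("PSNR (dB):", psnr(ref, rec))
print("SNR  (dB):", snr(ref, rec))
```

For a color image, one option is to evaluate each channel separately and average the three values; whether the paper does this is not stated.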

Training details
As shown in Fig. 1, the first convolution operator equivalently, where A and B respectively correspond to the two convolution operators mentioned above. Because $\mathcal{F}(\cdot)$ is learnable and non-linear, it is expected to achieve a more compact representation of natural images.
In this paper, TensorFlow [15] is used to train sub-ISTA-Net+ at different sampling rates. To reduce the computational overhead, the data is trained in batches with a batch size of 64. Both sub-ISTA-Net+ and U-ISTA-Net+ are trained for 60 epochs. Adam optimization [16] is used to update the parameters, with a learning rate of 0.0001 for all 60 epochs. All experiments are performed on a GeForce RTX 2070 GPU.

Experiment
To verify that U-SVD-ISTA-Net+ greatly reduces training time while preserving reconstruction accuracy, we first select a semi-orthogonal matrix $\Phi_{orth}$ as the measurement matrix and conduct a set of experiments as a baseline for comparison. The experimental results are depicted in Fig. 6.
We consider three different measurement matrices: the measurement matrix with row normalization (RN), $\Phi_{RN}$; the measurement matrix after semi-orthogonalization, $\Phi_{orth}$; and the measurement matrix obtained by singular value decomposition of a Gaussian matrix, $\Phi_{SVD}$ (our proposed method). For the sake of fairness, we compare and analyze these three measurement matrices at sampling rates of 5%, 10%, 20%, and 30% on the same training set and test set. The quantitative measures of color image reconstruction quality are presented in Table 1.
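The three kinds of measurement matrix can be generated and compared as in the following sketch. The construction details are assumptions, since the paper does not specify them: row normalization divides each row by its Euclidean norm, semi-orthogonalization is done here with a QR factorization, and the SVD variant keeps the right singular vectors of the Gaussian matrix. The helper `gram_error` is a hypothetical diagnostic that reports how far a matrix is from having orthonormal rows.

```python
import numpy as np

def row_normalize(Phi):
    """Scale every row of Phi to unit Euclidean norm."""
    return Phi / np.linalg.norm(Phi, axis=1, keepdims=True)

def orthogonalize_rows(Phi):
    """Orthonormalize the rows of Phi via QR of its transpose (assumed method)."""
    Q, _ = np.linalg.qr(Phi.T)          # Q has shape (n, m) with orthonormal columns
    return Q.T

def svd_rows(Phi):
    """Right singular vectors of Phi, used as the SVD-optimized matrix."""
    _, _, Vt = np.linalg.svd(Phi, full_matrices=False)
    return Vt

def gram_error(A):
    """Largest deviation of A A^T from the identity."""
    return np.max(np.abs(A @ A.T - np.eye(A.shape[0])))

rng = np.random.default_rng(0)
n = 33 * 33                              # length of one vectorized block
errors = {}
for rate in (0.05, 0.10, 0.20, 0.30):
    m = round(rate * n)                  # number of measurements per block
    Phi = rng.standard_normal((m, n))    # Gaussian starting matrix
    errors[rate] = {
        "RN": gram_error(row_normalize(Phi)),
        "orth": gram_error(orthogonalize_rows(Phi)),
        "SVD": gram_error(svd_rows(Phi)),
    }
    print(rate, m, errors[rate])
```

Only the semi-orthogonalized and SVD-based matrices have orthonormal rows; the row-normalized matrix keeps non-zero correlation between rows.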

Ⅵ. Conclusions
This paper proposes a unified deep unfolding framework for compressive color imaging with non-iterative real-time reconstruction. On the one hand, the method reduces the design requirements of the forward imaging system by applying singular value decomposition to the measurement matrix, which eliminates the correlation between the measured values and yields an optimized measurement matrix and optimized training data pairs. On the other hand, by applying the deep unfolding method to the three channels of color images, the computational complexity of traditional iterative optimization is avoided, and high-quality images can be recovered in real time.
Experimental results show that the proposed U-SVD-ISTA-Net+ framework achieves non-iterative, real-time, high-quality reconstruction for compressive color imaging.
There are three main types of compressed sensing reconstruction methods: model-driven optimization methods, data-driven optimization methods, and model-based data-driven optimization methods. Model-based data-driven methods combine prior knowledge of the problem with learning from sample data. Starting from the practical problem, a model family and an algorithm family are designed, and the algorithm family is then unfolded into a deep learning network. The advantage is that this not only improves the learning efficiency and generalization ability of the network on massive data, but also makes the algorithm universal and theoretically grounded.

ISTA-Net is a typical model-based data-driven optimization method. It uses deep learning to learn the image transformations and parameters involved in the original ISTA algorithm. The basic idea is to unroll the ISTA update steps into a fixed number of phases, so that each phase of the resulting deep network corresponds to one iteration of traditional ISTA. The calculation speed is more than 100 times faster than that of classical model-driven optimization methods. It is well known that the residuals of natural images and videos are more compressible than the images and videos themselves [13]. ISTA-Net+ is an enhanced version of ISTA-Net [14]: the nonlinear transformation of the image in ISTA-Net is replaced by modeling $\mathcal{F} = \mathcal{H} \circ \mathcal{D}$ and using the sum of $x^{(k)}$ and $\mathcal{G}^{(k)}$ as the input of the next phase, which further improves the universality and reconstruction performance of the neural network. The schematic diagram of the ISTA-Net+ framework is depicted in Figure 1.

Fig. 1. Framework of ISTA-Net+

As shown in Fig. 1, ISTA-Net+ consists of $N_P$ phases, each phase strictly corresponds to

Fig. 2. Schematic Diagram of Compressed Color Imaging

As described in Section 3.1, singular value decomposition is used to optimize the measurement matrix and the measurement vector, so the orthogonality requirement that ISTA-Net+ places on the measurement matrix is eliminated. The block diagram of ISTA-Net+ training is depicted in Fig. 4. First, the R, G, and B channels of the color image are extracted and cropped into small blocks, and each image block is vectorized. Next, the optimized measurement matrix and measurement vectors are obtained through the SVD preprocessing layer and used as the input for network training. The pipelines of our proposed DUCCI and its enhanced version SVD-DUCCI are depicted in Figure 3. Taking the R channel as an example, the image of the extracted channel is first cropped and then vectorized.

The sub-channel training strategy needs to train the three-channel images separately, which results in a long training time and a large storage requirement. This paper proposes a unified 'End-to-End' framework for compressed color imaging based on the ISTA-Net+ framework and SVD preprocessing, dubbed U-SVD-ISTA-Net+, which supports imaging with a non-orthogonal measurement matrix by adding an SVD post-processing layer. The principle block diagram is illustrated in Fig. 5. As illustrated in Fig. 5, U-SVD-ISTA-Net+ completes the training of the three-channel images within a single training process. First, the three channel images of the color image are extracted and cropped into 33*33 blocks. The image blocks at the same position of the three channel images are selected for vectorization and compressed sensing; the optimized measurement matrix and measured values are then obtained through the SVD post-processing layer, and the resulting training data pairs are used as the input of the untrained U-SVD-ISTA-Net+. Finally, the trained U-SVD-ISTA-Net+ is obtained.
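The block extraction and vectorization step can be sketched as follows. The function name `extract_blocks`, the toy image size, and the non-overlapping stride are assumptions made for illustration; the stride is left as a parameter because overlapping blocks may be used for training.

```python
import numpy as np

def extract_blocks(channel, block=33, stride=33):
    """Crop a 2-D channel into block x block patches and vectorize each one."""
    h, w = channel.shape
    patches = [
        channel[i:i + block, j:j + block].reshape(-1)
        for i in range(0, h - block + 1, stride)
        for j in range(0, w - block + 1, stride)
    ]
    return np.stack(patches)             # shape: (num_blocks, block * block)

rng = np.random.default_rng(0)
# Toy RGB image standing in for a training image (height 99, width 132)
image = rng.integers(0, 256, size=(99, 132, 3)).astype(np.float64)

# Row t of each array is the block at the same position in R, G, and B
blocks = {name: extract_blocks(image[:, :, c]) for c, name in enumerate("RGB")}
for name, arr in blocks.items():
    print(name, arr.shape)
```

Because the three arrays share the same row ordering, blocks at the same position of the three channels can be fed to a single network within one training process.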

Fig. 5. Illustration of our proposed U-SVD-ISTA-Net+ for compressive color imaging

As illustrated in Fig. 5(b), in the reconstruction (i.e., test) step, the R, G, and B channels of the test color image are first extracted, cropped into 33*33 image blocks, and vectorized; then, after the SVD preprocessing step on the measurement matrix yields the optimized measurement matrix and measurement vector, the data pair

First, the color channels R, G, and B of the 91 color images in the training set are extracted, as shown in Fig. 3. The three channel images are cropped into 33*33 blocks with stride 6 and vectorized, denoted as $b_i$

size 33 in the experiments), and the second convolution operator corresponds to another

Fig. 6. The reconstruction results under different sampling rates of the sub-channel training method

At an extremely low sampling rate of 5%, the SNR is 18.78 dB, the PSNR is 24.519 dB, and the SSIM is 0.936. As the CS ratio increases, the accuracy of image reconstruction gradually increases. When the sampling rate reaches 30%, the average SNR is 27.72 dB, the PSNR is 33.360 dB, and the SSIM is 0.989. Due to the low correlation between column vectors of Net+ framework, as shown in Figure 5(a), we can see that without modifying the imaging system, the measurement matrix can meet the orthogonality requirement, which is more meaningful for actual imaging applications.

The ISTA-Net+ deep learning framework proposed in this paper has obvious advantages in compressed color imaging compared with traditional optimization algorithms. For example, Nagesh et al. proposed obtaining measurements with the Bayer measurement method and then using the EJSM algorithm for joint reconstruction [3], and Majumdar et al. regarded the reconstruction of the R, G, and B channel images as a group sparse problem and proposed two non-convex group sparse optimization methods. To ensure the consistency and fairness of the experimental variables, we compare the experimental results of the four different methods at a CS ratio of 30%. The results are shown in Table 2.

Table 2
Comparison of experimental results between the method proposed in this paper and the traditional optimization algorithms at a CS ratio of 30%