Greedy Randomized and Maximal Weighted Residual Kaczmarz Methods with Oblique Projection

For solving large-scale consistent linear systems, we combine two efficient row index selection strategies with the Kaczmarz-type method with oblique projection, and propose a greedy randomized Kaczmarz method with oblique projection (GRKO) and a maximal weighted residual Kaczmarz method with oblique projection (MWRKO). With these methods, the number of iteration steps and the running time needed to find the least-norm solution can be reduced considerably, especially when the rows of the matrix A are close to linearly dependent. Theoretical analysis and numerical results show that the GRKO method and the MWRKO method are more effective than the greedy randomized Kaczmarz method and the maximal weighted residual Kaczmarz method, respectively.


Introduction
Consider solving a large-scale consistent linear system

Ax = b,    (1)

where the matrix A ∈ R^{m×n} and b ∈ R^m. One solution of the system (1) is x* = A†b, the least Euclidean norm solution; in particular, when the coefficient matrix A has full column rank, x* is the unique solution of the system (1). Much research addresses solving the system (1) by iterative methods, among which the Kaczmarz method is a representative and efficient row-action method. The Kaczmarz method [25] selects the rows of the matrix A by a cyclic rule, and in each iteration the current iterate is orthogonally projected onto the corresponding hyperplane. Due to its simplicity and performance, the Kaczmarz method has been applied in many fields, such as computerized tomography [1,2], image reconstruction [3,4,5,6], distributed computing [7], and signal processing [8,1,2], among others [9,10,11,12]. Since the Kaczmarz method cycles through the rows of A, its performance may depend heavily on the ordering of these rows; a poor ordering may result in a very slow convergence rate. McCormick [13] proposed a maximal weighted residual Kaczmarz (MWRK) method and proved its convergence; a new theoretical convergence estimate for the MWRK method was recently given in [39]. Strohmer and Vershynin [37] proposed a randomized Kaczmarz (RK) method which selects a row with probability proportional to the squared Euclidean norm of that row of the coefficient matrix A, and proved its convergence. Following this work, research on Kaczmarz-type methods was recently reignited; see, for example, the randomized block Kaczmarz-type methods [14,15,16], the greedy versions of Kaczmarz-type methods [38,32,31,18,19], the extended versions of Kaczmarz-type methods [28,21], and many others [17,40,39,34,30]. Research on the Kaczmarz method has also accelerated the development of column-action iterative methods, represented by the coordinate descent method [26]; see [42,41,36,33,29,27,23,20], etc.
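As a point of reference for what follows, the classical cyclic Kaczmarz sweep can be sketched in a few lines of NumPy; the function name and iteration cap are our own choices, not from the literature cited above.

```python
import numpy as np

def kaczmarz_cyclic(A, b, x0, n_steps):
    """Classical Kaczmarz: cycle through the rows and orthogonally
    project the current iterate onto the hyperplane of each row."""
    x = np.asarray(x0, dtype=float).copy()
    m = A.shape[0]
    for k in range(n_steps):
        i = k % m                                  # cyclic row selection
        ai = A[i]
        x += (b[i] - ai @ x) / (ai @ ai) * ai      # orthogonal projection onto H_i
    return x
```

For a consistent system the iterates approach a solution, but the convergence rate depends on the row ordering, which is exactly what motivates the greedy and randomized selection rules discussed below.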
Recently, Bai and Wu [18] proposed a new randomized row index selection strategy, aimed at grasping the larger entries of the residual vector at each iteration, and constructed a greedy randomized Kaczmarz (GRK) method. They proved that the convergence of the GRK method is faster than that of the RK method. Owing to this greedy row selection strategy, a large number of greedy variants of the Kaczmarz method have been developed and studied. At present, most of this work is based on orthogonal projection. In [45,34], Constantin Popa gives the definition of oblique projection, which breaks the limitation of orthogonal projection. In this paper, we therefore propose a new descent direction based on the oblique projection, which guarantees that two entries of the residual are zero at each iteration, so as to accelerate convergence. Based on the row index selection rules of two representative randomized and non-randomized Kaczmarz-type methods, the GRK method and the MWRK method, we propose two new Kaczmarz-type methods with oblique projection (KO-type), the GRKO method and the MWRKO method respectively, and prove their convergence both theoretically and numerically. We emphasize the efficiency of the proposed methods when the rows of the matrix A are nearly linearly dependent, and find that Kaczmarz-type methods based on orthogonal projection perform poorly when applied to such matrices.
The organization of this paper is as follows. In Section 2, we introduce the KO-type method and give two lemmas for it. In Section 3, we propose the GRKO method and the MWRKO method and prove their convergence. In Section 4, some numerical examples are provided to illustrate the efficiency of our new methods. Finally, brief concluding remarks are given in Section 5.
In this paper, ⟨·,·⟩ stands for the scalar product, and ‖x‖ is the Euclidean norm of x ∈ R^n. For a given matrix G, the symbols g_i, G^T, G^†, R(G), N(G), ‖G‖_F and λ_min(G) are used to denote the ith row, the transpose, the Moore-Penrose pseudoinverse [22], the range space, the null space, the Frobenius norm, and the smallest nonzero eigenvalue of G, respectively. P_C(x) is the orthogonal projection of x onto C; x̄ is any solution of the system (1); x* = A†b is the least-norm solution of the system (1). Let E_k denote the expected value conditional on the first k iterations, that is, E_k[·] = E[· | j_0, j_1, ..., j_{k−1}], where j_s (s = 0, 1, ..., k − 1) is the row index chosen at the sth iteration.

Kaczmarz-type Method with Oblique Projection and its Lemmas
The sets H_i = {x ∈ R^n | ⟨a_i, x⟩ = b_i} (i = 1, 2, ..., m) are the hyperplanes associated with the ith equation of the system (1). To project the current iterate x^{(k)} onto one of the hyperplanes, the oblique projection [45,34] can be expressed as follows:

P^d_{H_i}(x) = x + ((b_i − ⟨a_i, x⟩) / ⟨a_i, d⟩) d,

where d ∈ R^n is a given direction. The oblique projection is illustrated in Figure 1, and the framework of the KO-type method is given in Algorithm 1.

Algorithm 1 Kaczmarz-type Method with Oblique Projection
1: Input A, b, K, and the initial point x^{(0)}
2: Choose a row index i_1
3: Compute x^{(1)} = x^{(0)} + ((b_{i_1} − ⟨a_{i_1}, x^{(0)}⟩)/‖a_{i_1}‖²) a_{i_1}
4: for k = 1, 2, ..., K do
5: Choose i_{k+1} based on a certain selection rule
6: Compute D_{i_k} = ⟨a_{i_k}, a_{i_{k+1}}⟩ and r^{(k)}_{i_{k+1}} = b_{i_{k+1}} − ⟨a_{i_{k+1}}, x^{(k)}⟩
7: Compute w^{(i_k)} = a_{i_{k+1}} − (D_{i_k}/‖a_{i_k}‖²) a_{i_k} and h_{i_k} = ‖w^{(i_k)}‖²
8: Set x^{(k+1)} = x^{(k)} + (r^{(k)}_{i_{k+1}}/h_{i_k}) w^{(i_k)}
9: end for
10: Output x^{(K+1)}

We now give two lemmas for the KO-type method. The selection rule for the row index i_{k+1} does not affect these lemmas.
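A single oblique-projection update of Algorithm 1 can be written as follows in NumPy; the helper name `ko_step` is ours. Running it once after an initial orthogonal projection also illustrates the key property proved below: the residual entries of both rows involved become zero.

```python
import numpy as np

def ko_step(A, b, x, i_prev, i_next):
    """One KO-type update: move from x along the oblique direction
    w = a_next - (<a_prev, a_next>/||a_prev||^2) a_prev, which zeroes
    the residual entries of rows i_prev and i_next simultaneously."""
    a_p, a_n = A[i_prev], A[i_next]
    D = a_p @ a_n                          # D_{i_k} = <a_{i_k}, a_{i_{k+1}}>
    w = a_n - (D / (a_p @ a_p)) * a_p      # orthogonal to a_prev
    h = w @ w                              # nonzero when the rows are not parallel
    r = b[i_next] - a_n @ x                # residual entry of row i_next
    return x + (r / h) * w
```

After an orthogonal projection onto the hyperplane of row `i_prev`, one call with a second row `i_next` leaves both residual entries at zero, which is the content of Lemma 1.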

Lemma 1.
For the Kaczmarz-type method with oblique projection, the residual satisfies the following equations:

r^{(k+1)}_{i_{k+1}} = 0 (k ≥ 0) and r^{(k+1)}_{i_k} = 0 (k ≥ 1).    (4)

Proof. For k = 0, x^{(1)} is the orthogonal projection of x^{(0)} onto H_{i_1}, so r^{(1)}_{i_1} = b_{i_1} − ⟨a_{i_1}, x^{(1)}⟩ = 0. For k ≥ 1, from the definition of the KO-type method we have

r^{(k+1)}_{i_{k+1}} = b_{i_{k+1}} − ⟨a_{i_{k+1}}, x^{(k+1)}⟩ = r^{(k)}_{i_{k+1}} − (r^{(k)}_{i_{k+1}}/h_{i_k}) ⟨a_{i_{k+1}}, w^{(i_k)}⟩ = 0,

where the last equality holds because ⟨a_{i_{k+1}}, w^{(i_k)}⟩ = ‖a_{i_{k+1}}‖² − D²_{i_k}/‖a_{i_k}‖² = ‖w^{(i_k)}‖² = h_{i_k}. Moreover,

r^{(k+1)}_{i_k} = b_{i_k} − ⟨a_{i_k}, x^{(k+1)}⟩ = r^{(k)}_{i_k} − (r^{(k)}_{i_{k+1}}/h_{i_k}) ⟨a_{i_k}, w^{(i_k)}⟩ = r^{(k)}_{i_k} = 0,

where we used ⟨a_{i_k}, w^{(i_k)}⟩ = D_{i_k} − D_{i_k} = 0 and the first part of the lemma at the previous step. Thus the equation (4) holds.

Lemma 2.
The iteration sequence {x^{(k)}}_{k=0}^∞ generated by the Kaczmarz-type method with oblique projection satisfies the following equation:

‖x^{(k+1)} − x̄‖² = ‖x^{(k)} − x̄‖² − ‖x^{(k+1)} − x^{(k)}‖²,    (5)

where x̄ is an arbitrary solution of the system (1).

Proof. For k = 0, we have ⟨x^{(1)} − x̄, a_{i_1}⟩ = ⟨x^{(0)}, a_{i_1}⟩ + (b_{i_1} − ⟨a_{i_1}, x^{(0)}⟩) − ⟨x̄, a_{i_1}⟩ = b_{i_1} − b_{i_1} = 0, which shows that x^{(1)} − x̄ is orthogonal to a_{i_1}. Since x^{(1)} − x^{(0)} is parallel to a_{i_1}, it follows that ‖x^{(1)} − x̄‖² = ‖x^{(0)} − x̄‖² − ‖x^{(1)} − x^{(0)}‖². For k > 0, we have

⟨x^{(k)} − x̄, w^{(i_k)}⟩ = ⟨x^{(k)} − x̄, a_{i_{k+1}}⟩ − (D_{i_k}/‖a_{i_k}‖²) ⟨x^{(k)} − x̄, a_{i_k}⟩ = −r^{(k)}_{i_{k+1}},

where ⟨x^{(k)} − x̄, a_{i_k}⟩ = −r^{(k)}_{i_k} = 0 by Lemma 1. Hence

⟨x^{(k+1)} − x̄, w^{(i_k)}⟩ = ⟨x^{(k)} − x̄, w^{(i_k)}⟩ + (r^{(k)}_{i_{k+1}}/h_{i_k}) ‖w^{(i_k)}‖² = −r^{(k)}_{i_{k+1}} + r^{(k)}_{i_{k+1}} = 0,

so x^{(k+1)} − x̄ is orthogonal to w^{(i_k)}, while x^{(k+1)} − x^{(k)} is parallel to w^{(i_k)}. Therefore the equation (5) holds for all k ≥ 0. In particular, according to the iterative formula, every update direction lies in R(A^T); thus when P_{N(A)}(x̄) = P_{N(A)}(x^{(0)}), the error x^{(k)} − x̄ lies in R(A^T) for all k.
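The Pythagorean identity of Lemma 2 is easy to check numerically. The sketch below builds a small consistent system, performs one orthogonal step followed by one oblique step, and compares the two sides of the identity; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
x_bar = rng.standard_normal(4)          # a solution of the consistent system
b = A @ x_bar
x0 = rng.standard_normal(4)

# k = 0: orthogonal projection onto the hyperplane of row 0
x1 = x0 + (b[0] - A[0] @ x0) / (A[0] @ A[0]) * A[0]

# k = 1: oblique-projection step with i_1 = 0, i_2 = 1
w = A[1] - (A[0] @ A[1]) / (A[0] @ A[0]) * A[0]
x2 = x1 + (b[1] - A[1] @ x1) / (w @ w) * w

# ||x2 - x_bar||^2 should equal ||x1 - x_bar||^2 - ||x2 - x1||^2
lhs = np.linalg.norm(x2 - x_bar) ** 2
rhs = np.linalg.norm(x1 - x_bar) ** 2 - np.linalg.norm(x2 - x1) ** 2
```

The two quantities agree to machine precision because x2 − x_bar is orthogonal to the step direction w.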

Greedy Randomized and Maximal Weighted Residual Kaczmarz methods with Oblique Projection
In this section, we combine the oblique projection with the GRK method [18] and the MWRK method [13] to obtain the GRKO method and the MWRKO method, and prove their convergence. Theoretical results show that the KO-type method can accelerate convergence when a suitable row index selection strategy is available.

Greedy Randomized Kaczmarz Method with Oblique Projection
The core of the GRK method [18] is a new probability criterion, which grasps the large entries of the residual vector at each iteration and randomly selects an index with probability proportional to the retained residual norms. Theory and experiments show that it speeds up convergence. We use its row index selection rule in combination with the KO-type method to obtain the GRKO method. At each iteration k, the GRK rule first computes the threshold

ε_k = (1/2) ( max_{1≤i≤m} { |r^{(k)}_i|² / ‖a_i‖² } / ‖r^{(k)}‖² + 1/‖A‖_F² ),

then determines the index set of positive integers U_k = { i : |r^{(k)}_i|² ≥ ε_k ‖r^{(k)}‖² ‖a_i‖² }, and finally selects i_{k+1} ∈ U_k randomly with probability proportional to |r^{(k)}_{i_{k+1}}|². The convergence of the GRKO method is provided as follows.

Theorem 1. Consider the consistent linear system (1), where the coefficient matrix A ∈ R^{m×n} and b ∈ R^m. Let x^{(0)} ∈ R^n be an arbitrary initial approximation, and let x̄ be a solution of system (1) such that P_{N(A)}(x̄) = P_{N(A)}(x^{(0)}). Then the iteration sequence {x^{(k)}}_{k=1}^∞ generated by the GRKO method obeys the bound (8), where γ_1, γ_2 and Δ are defined by equations (9), (10) and (11), respectively. In addition, if x^{(0)} ∈ R(A^T), the sequence {x^{(k)}}_{k=1}^∞ converges to the least-norm solution x* = A†b of the system (1).

Proof. When k = 1, the second equality of the corresponding expansion holds due to the equation (3); when k > 1, it holds due to the equations (3) and (4). Under the GRKO method, Lemma 2 still holds, so we can take the conditional expectation on both sides of the equation (5) and obtain the bound (14) for k = 0 and the bound (15) for k > 0. The first inequalities of the equations (14) and (15) use the fact that the maximum of the weighted residuals |r^{(k)}_i|²/‖a_i‖² is no smaller than any convex combination of them, and the second inequality of the equation (15) uses the definition of ε_k. In the last inequalities of the equations (14) and (15), we have used the estimate ‖Au‖² ≥ λ_min(A^T A)‖u‖², which holds for any u ∈ R^n belonging to the range of A^T; by Lemma 2, the error vectors indeed lie in R(A^T).
By making use of the equations (12), (13) and (15), we get the desired recursion. Finally, by recursion and taking the full expectation, the equation (8) holds.
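A compact sketch of the full GRKO iteration, combining the GRK selection rule of [18] with the oblique-projection update, may look as follows in NumPy; the function signature, iteration cap, and tolerance are our own choices.

```python
import numpy as np

def grko(A, b, x0, max_iter=20000, tol=1e-10, seed=None):
    """Greedy randomized Kaczmarz with oblique projection (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    row_sq = np.einsum('ij,ij->i', A, A)           # ||a_i||^2
    fro_sq = row_sq.sum()                          # ||A||_F^2
    # First step: orthogonal projection onto the row with largest weighted residual.
    r = b - A @ x
    i_prev = int(np.argmax(r ** 2 / row_sq))
    x += (r[i_prev] / row_sq[i_prev]) * A[i_prev]
    for _ in range(max_iter):
        r = b - A @ x
        rn2 = r @ r
        if rn2 < tol ** 2:
            break
        ratio = r ** 2 / row_sq
        eps_k = 0.5 * (ratio.max() / rn2 + 1.0 / fro_sq)   # GRK threshold
        idx = np.flatnonzero(ratio >= eps_k * rn2)          # candidate set U_k
        p = r[idx] ** 2 / (r[idx] ** 2).sum()               # sampling probabilities
        i_next = int(rng.choice(idx, p=p))
        # Oblique-projection update.
        a_p, a_n = A[i_prev], A[i_next]
        w = a_n - ((a_p @ a_n) / row_sq[i_prev]) * a_p
        h = w @ w
        if h > 0:              # guard; rows parallel to a_prev are not selected
            x += ((b[i_next] - a_n @ x) / h) * w
        i_prev = i_next
    return x
```

On a consistent random system the iterates converge to a solution; selected rows always have nonzero residual, so the guard `h > 0` reflects Remark 1 rather than an extra assumption.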

Remark 1.
In the GRKO method, h_{i_k} is not zero. Suppose h_{i_k} = 0; then there exists λ ≠ 0 such that a_{i_{k+1}} = λ a_{i_k}. Since the system is consistent, it holds that ⟨a_{i_{k+1}}, x^{(k)}⟩ − b_{i_{k+1}} = λ(⟨a_{i_k}, x^{(k)}⟩ − b_{i_k}) = 0, that is, r^{(k)}_{i_{k+1}} = 0, because r^{(k)}_{i_k} = 0 by Lemma 1. From the row selection rule of the GRKO method, which only chooses rows with nonzero residual, we know that such an index i_{k+1} will not be selected.

Maximal Weighted Residual Kaczmarz Method with Oblique Projection
The selection strategy for the index i_{k+1} used in the maximal weighted residual Kaczmarz (MWRK) method [13] is: set

i_{k+1} = arg max_{i ∈ {1,2,...,m}} |r^{(k)}_i| / ‖a_i‖.

McCormick proved the exponential convergence of the MWRK method, and a new convergence conclusion for the MWRK method is given in [39]. We use this row index selection rule combined with the KO-type method to obtain the MWRKO method; the resulting algorithm is Algorithm 1 with step 5 instantiated by the rule above. The convergence of the MWRKO method is provided as follows.

Theorem 2. Consider the consistent linear system (1), where the coefficient matrix A ∈ R^{m×n} and b ∈ R^m. Let x^{(0)} ∈ R^n be an arbitrary initial approximation, and let x̄ be a solution of system (1) such that P_{N(A)}(x̄) = P_{N(A)}(x^{(0)}). Then the iteration sequence {x^{(k)}}_{k=1}^∞ generated by the MWRKO method obeys the bound (16) for all k > 1, where γ_1, γ_2 and Δ are defined by equations (9), (10) and (11), respectively. In addition, if x^{(0)} ∈ R(A^T), the sequence {x^{(k)}}_{k=1}^∞ converges to the least-norm solution of the system (1).

Proof. Under the MWRKO method, Lemma 2 still holds. For k = 0, we obtain the estimate (17); for k = 1, the estimate (18); and for k > 1, the estimate (19). In the last inequalities of the equations (17), (18) and (19), we have used the estimate ‖Au‖² ≥ λ_min(A^T A)‖u‖², which holds for any u ∈ R^n belonging to the range of A^T; by Lemma 2, the error vectors indeed lie in R(A^T). Combining the equations (17), (18) and (19), the equation (16) holds.

Remark 3. When multiple indices i_{k+1} attain the maximum in the selection step of the MWRKO method during the iteration, we randomly select any one of them.
Remark 4. In the MWRKO method, the reason why h_{i_k} is nonzero is similar to that in Remark 1.
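Analogously, a deterministic sketch of the MWRKO iteration only changes the selection rule: pick the row with the largest weighted residual, then apply the same oblique-projection update. Names and defaults below are ours.

```python
import numpy as np

def mwrko(A, b, x0, max_iter=20000, tol=1e-10):
    """Maximal weighted residual Kaczmarz with oblique projection (sketch)."""
    x = np.asarray(x0, dtype=float).copy()
    row_sq = np.einsum('ij,ij->i', A, A)           # ||a_i||^2
    r = b - A @ x
    i_prev = int(np.argmax(r ** 2 / row_sq))       # MWRK rule for the first row
    x += (r[i_prev] / row_sq[i_prev]) * A[i_prev]  # orthogonal first step
    for _ in range(max_iter):
        r = b - A @ x
        if r @ r < tol ** 2:
            break
        i_next = int(np.argmax(r ** 2 / row_sq))   # maximal weighted residual
        a_p, a_n = A[i_prev], A[i_next]
        w = a_n - ((a_p @ a_n) / row_sq[i_prev]) * a_p
        h = w @ w
        if h > 0:                                  # rows not parallel (cf. Remark 4)
            x += ((b[i_next] - a_n @ x) / h) * w
        i_prev = i_next
    return x
```

After each update the residual entries of the last two selected rows vanish (Lemma 1), so the arg max automatically moves on to a different row at the next step.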

Numerical Experiments
In this section, some numerical examples are provided to illustrate the effectiveness of the greedy randomized Kaczmarz (GRK) method, the greedy randomized Kaczmarz method with oblique projection (GRKO), the maximal weighted residual Kaczmarz method (MWRK), and the maximal weighted residual Kaczmarz method with oblique projection (MWRKO). All experiments are carried out using MATLAB (version R2019b) on a personal computer with a 1.60 GHz central processing unit (Intel(R) Core(TM) i5-10210U CPU), 8.00 GB memory, and a Windows operating system (64-bit Windows 10). In our implementations, the right-hand side is b = Ax*, where the exact solution x* ∈ R^n is a vector generated by the MATLAB function rand. We measure the relative solution error (RRE) at the kth iteration. The initial point x^{(0)} ∈ R^n is set to be the zero vector, and the iterations are terminated once RRE < ω or the number of iteration steps exceeds 100,000. If the number of iteration steps exceeds 100,000, it is denoted as "-". We compare the numerical performance of these methods in terms of the number of iteration steps (denoted "IT") and the computing time in seconds (denoted "CPU"). Here CPU and IT are the arithmetic means of the elapsed running times and the required iteration steps over 50 repeated trials of the corresponding method.
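The stopping quantity used in the experiments can be coded directly. Here we assume the convention RRE = ‖x^{(k)} − x*‖² / ‖x*‖² for the relative solution error; this particular formula is our assumption, chosen to match the stopping rule described above.

```python
import numpy as np

def rre(x, x_star):
    """Relative solution error: ||x - x*||^2 / ||x*||^2 (assumed convention)."""
    d = x - x_star
    return float(d @ d) / float(x_star @ x_star)

def stop(x, x_star, k, omega=0.5e-8, max_iter=100_000):
    """Stopping rule from the experiments: RRE < omega, or the iteration cap is hit."""
    return rre(x, x_star) < omega or k >= max_iter
```

A trial driver would generate x* with uniform random entries, set b = A @ x_star and x^{(0)} = 0, and call `stop` after each iteration of the method under test.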

Experiments for Random Matrix Collection in [0, 1]
The random matrix collection in [0, 1] is randomly generated by using the MATLAB function rand, and the numerical results are reported in Tables 1-2 and Figures 2-3. In this subsection, we let ω = 0.5 × 10^{−8}. According to the characteristics of matrices generated by the MATLAB function rand, Table 1 and Table 2 report the experiments for overdetermined and underdetermined consistent linear systems, respectively. Under the premise of convergence, all methods can find the unique least Euclidean norm solution x*.
From Table 1 and Figure 2, we can see that when the linear system is overdetermined, with the increase of m, the IT of all methods decreases, but the CPU shows an increasing trend. Our new methods, the GRKO method and the MWRKO method, perform better than the GRK method and the MWRK method respectively in both iteration steps and running time. Among the four methods, the MWRKO method performs best. From Table 2 and Figure 3, we can see that in the underdetermined case, with the increase of m, both the IT and the CPU of all methods decrease.
In this group of experiments, whether the linear system is overdetermined or underdetermined, and whether measured by IT or CPU, the GRKO method and the MWRKO method perform very well compared with the GRK method and the MWRK method. These experimental observations are consistent with the theoretical convergence conclusions we obtained.

Experiments for Random Matrix Collection in [c, 1]
In this subsection, the entries of the coefficient matrix are randomly generated in the interval [c, 1]. This set of experiments was also done in [30] and [46], which pointed out that when the value of c is close to 1, the rows of the matrix A are closer to linearly dependent. Theorem 1 and Theorem 2 indicate the effectiveness of the GRKO method and the MWRKO method in this case. In order to verify this phenomenon, we construct several 1000 × 500 and 500 × 1000 matrices A whose entries are independent identically distributed uniform random variables on some interval [c, 1]. Note that there is nothing special about this interval; other intervals yield the same results when the interval length remains the same. In this subsection, we take ω = 0.5 × 10^{−8}. From Table 3 and Figure 4, it can be seen that when the linear system is overdetermined, as c gets closer to 1, the GRK method and the MWRK method show a significant increase in the number of iterations and the running time; when c increases to 0.7, both methods exceed the maximum number of iterations. In contrast, the IT and CPU of the GRKO method and the MWRKO method show decreasing trends. From Table 4 and Figure 5, we can see that the numerical experiments with the coefficient matrix A in the underdetermined case exhibit behavior similar to the overdetermined case.
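The [c, 1] test matrices are simple to generate, and one can check directly that the rows become nearly parallel as c approaches 1; the generator below is our sketch of the construction described above.

```python
import numpy as np

def random_matrix_c(m, n, c, seed=None):
    """m-by-n matrix with entries i.i.d. uniform on [c, 1]; as c -> 1
    the rows become close to linearly dependent."""
    rng = np.random.default_rng(seed)
    return c + (1.0 - c) * rng.random((m, n))

A = random_matrix_c(1000, 500, 0.9, seed=0)
a0, a1 = A[0], A[1]
cosine = (a0 @ a1) / (np.linalg.norm(a0) * np.linalg.norm(a1))
```

With c = 0.9 the cosine between two rows is very close to 1, which is precisely the regime where orthogonal-projection Kaczmarz methods slow down.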
In this group of experiments, it can be observed that when the rows of the matrix are close to linearly dependent, the GRKO method and the MWRKO method can find the least Euclidean norm solution much more quickly than the GRK method and the MWRK method.

Experiments for Sparse Matrix
In this subsection, we give three examples to illustrate the effectiveness of our new methods applied to sparse matrices. The coefficient matrices A of these three examples come from the practical problems in [44] and two test problems from [43]. We uniformly take ω = 0.5 × 10^{−5} in these three numerical examples.

Example 1.
We solve the problem (1) with the coefficient matrix A ∈ R^{m×n} chosen from the University of Florida sparse matrix collection [44]. The matrices are divorce, photogrammetry, Ragusa18, Trec8, Stranke94, and well1033. In Table 5, we list some properties of these matrices, where the density is defined as follows: density = (number of nonzeros of the m-by-n matrix) / (mn).
In order to solve Example 1, we list the IT, CPU and convergence histories of the GRK, GRKO, MWRK, and MWRKO methods in Figure 6 and Table 6, respectively. It can be seen that the IT and CPU of the MWRKO method are the smallest. Although the GRKO method is not faster than the MWRK method for most of the experiments in Table 6, it is always faster than the GRK method.

Example 2. We use an example from 2D tomography, implemented in the MATLAB package AIR Tools [43], which generates a sparse matrix A, an exact solution x*, and b = Ax*. We set N = 60, θ = 0 : 0.5 : 179.5°, P = 50; the resulting matrix is of size 32400 × 3600. We test the RRE every 10 iterations and run these four methods until RRE < ω is satisfied, where ω = 0.5 × 10^{−5}.
We first remove the rows of A whose entries are all 0, and normalize each row of A, scaling b accordingly. We emphasize that this does not change x*. In Figure 7, we give 60 × 60 images of the exact phantom and the approximate solutions obtained by the GRK, GRKO, MWRK, and MWRKO methods. All four methods can basically restore the original image, but subgraph (f) of Figure 7 shows that the MWRKO method needs the fewest iteration steps, and the GRKO method takes fewer iteration steps than the GRK method. It can be observed from Table 7 that the MWRKO method is the best in terms of IT and CPU.

Example 3. We use an example from 2D seismic travel-time tomography reconstruction, implemented in the function seismictomo(N, s, p) in the MATLAB package AIR Tools [43], which generates a sparse matrix A, an exact solution x*, and b = Ax*. We set N = 12, s = 24, p = 35; the resulting matrix is of size 840 × 144. We run these four methods until RRE < ω is satisfied, where ω = 0.5 × 10^{−5}.
We first remove the rows of A whose entries are all 0, and normalize each row of A, scaling b accordingly. In Figure 8, we give 12 × 12 images of the exact phantom and the approximate solutions obtained by the GRK, GRKO, MWRK, and MWRKO methods. From subgraph (f) of Figure 8 and Table 8, we can see that the MWRKO method, the GRKO method, and the MWRK method perform similarly in the number of iteration steps, each requiring about half as many iteration steps as the GRK method. It can be observed from Table 8 that the MWRKO method is the best in terms of IT and CPU.

Conclusion
Combining representative randomized and non-randomized row index selection strategies with oblique projection, two Kaczmarz-type methods for solving large-scale consistent linear systems are proposed, namely the GRKO method and the MWRKO method. The exponential convergence of the GRKO method and the MWRKO method is established. Theoretical and experimental results show that the convergence rates of the GRKO method and the MWRKO method are better than those of the GRK method and the MWRK method, respectively. Numerical experiments demonstrate the effectiveness of these two methods, especially when the rows of the coefficient matrix A are close to linearly dependent.