N-mode Minimal Tensor Extrapolation Methods

The purpose of this work is to present, using the n-mode product, a new approach that generalizes, for tensor sequences, the well-known vector extrapolation methods MPE (Minimal Polynomial Extrapolation) and RRE (Reduced Rank Extrapolation). We define the notion of the n-mode minimal polynomial of a matrix with respect to a tensor. This polynomial is used, through the iterative solution of some tensor linear systems, to introduce the tensor versions of MPE and RRE. These methods involve only the terms of the sequences produced by the underlying iterative methods. The implementation of these methods on some sequences of tensors confirms the effectiveness and applicability of our approach.


Introduction
Computing the limits of scalar or vector sequences is an important problem that arises in different areas of science and engineering. Fixed-point iterative methods generate sequences whose limits are simply the required solutions. In most cases of interest, however, the convergence of these sequences is extremely slow: to approximate the required solution with a reasonable prescribed level of accuracy, we need a very large number of iterations, which means computing many terms of these sequences before reaching one of acceptable accuracy, a defect that makes this way of approximating very expensive. One practical way of overcoming this problem is to apply to these sequences a suitable convergence acceleration method (or extrapolation method). For scalar sequences, one of the earliest and most popular extrapolation methods is the well-known Aitken ∆² process, introduced for the first time in 1926 by Aitken [1]. It transforms a sequence (x_n) converging to x into a new sequence (t_n) defined by

t_n = x_n − (∆x_n)² / ∆²x_n,

where the first and second forward differences are defined by ∆x_n = x_{n+1} − x_n and ∆²x_n = ∆x_{n+1} − ∆x_n. In [6], it is proven that, under specific assumptions, the new sequence (t_n) converges faster than (x_n) to the same limit x. For sequences of vectors, two different types of vector extrapolation methods exist in the literature:

* Polynomial methods: the minimal polynomial extrapolation method (MPE) proposed by Cabay and Jackson [8], the reduced rank extrapolation method (RRE) introduced by Kaniel and Stein [20] and Mesina [22], and the modified minimal polynomial extrapolation method (MMPE) of Brezinski [5], Pugachev [24], and Sidi, Ford, and Smith [29].
* Epsilon algorithms: the vector epsilon algorithm (VEA) of Wynn [31], which is a vectorization of the scalar epsilon algorithm (SEA) proposed by the same author in [32] (a recursive procedure for implementing the transformation of Shanks [27]), and the topological epsilon algorithm (TEA) of Brezinski [4].
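As a concrete illustration (this snippet is ours, not part of the original paper), Aitken's ∆² process takes only a few lines of NumPy; for a purely geometric error, as below, the transform is even exact:

```python
import numpy as np

def aitken_delta2(x):
    """Aitken's Delta^2 transform: t_n = x_n - (dx_n)^2 / d2x_n,
    with dx_n = x_{n+1} - x_n and d2x_n = x_{n+2} - 2 x_{n+1} + x_n."""
    x = np.asarray(x, dtype=float)
    dx = x[1:-1] - x[:-2]                  # first forward differences
    d2x = x[2:] - 2.0 * x[1:-1] + x[:-2]   # second forward differences
    return x[:-2] - dx**2 / d2x

# x_n = 1 + 0.9^n converges slowly to 1; Aitken is exact for this sequence
x = [1.0 + 0.9**n for n in range(12)]
print(aitken_delta2(x))   # all entries equal 1.0 (up to rounding)
```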
In several application fields, such as Google PageRank [6], statistics, or the solution of discretized Navier–Stokes problems [13], the polynomial-type methods, especially MPE and RRE, have been widely used and have proved to be good convergence accelerators; they need less computation and storage for the same degree of accuracy. See [16][17][18] for more details on applications of vector extrapolation methods. Block and global versions of these methods were also introduced in [15,19].
In the last few decades, tensors have been gaining importance due to developments in various disciplines, like higher-order statistics (HOS), chemometrics, psychometrics, econometrics, image processing, biomedical signal processing, etc. It is therefore natural to ask about the generalization of these vector extrapolation methods to the tensor case. Recently, the first paper in this direction was proposed by Jbilou et al. in [14]. In their approach, the authors defined, using the Einstein product, a new product between two tensors that generalizes the diamond product. They used orthogonal and oblique projections onto subspaces of small dimension to present a generalization of MPE and RRE, namely GT-MPE and GT-RRE (see [14] for details).
In this paper, using the n-mode product, we present a new approach that generalizes the vector extrapolation methods MPE and RRE to tensor sequences. In our derivation, we go through the iterative solution of tensor linear systems of equations; that is, we work within the context of tensor linear systems, making sure that these methods involve only the terms of the sequence produced by the iterative method used. This is essentially the approach adopted in [28] to develop the vector extrapolation methods MPE and RRE. We introduce the notion of the n-mode minimal polynomial of a matrix with respect to a tensor, which is the key tool. The implementation of these methods on some sequences of tensors confirms their effectiveness and applicability.
The remainder of this paper is organized as follows. The next section presents some basic definitions and properties related to tensors, together with notions and results essential for deriving our methods. In Section 3, we discuss the solution of some tensor linear systems of equations, using the n-mode product, by a fixed-point iterative method. Section 4 deals with the construction of the solution of these systems; we introduce the n-mode minimal polynomial of a matrix with respect to a tensor, which generalizes the minimal polynomial of a matrix with respect to a vector. In Section 5, we derive, from the solution of two different least-squares problems, the tensor versions of the vector polynomial extrapolation methods, namely tensor MPE (TMPE) and tensor RRE (TRRE). Section 6 summarizes these methods and describes their implementation. Section 7 is devoted to the algorithms TMPE and TRRE, and in the last section we present some numerical experiments with applications to color image processing.

Preliminaries and Notations
In this section, we summarize some of the basic facts about tensors and their computations that will be used in the remainder of this paper.
Definition 1 A tensor is a multidimensional array of data whose elements are referred to by multiple indices. The number of indices (modes or ways) required is called the order of the tensor. For a given N-order tensor A ∈ R^{I_1×I_2×⋯×I_N}, we use the notation A = (a_{i_1 i_2 … i_N}), where the a_{i_1 i_2 … i_N} are the entries of A.

Definition 2 The n-mode product of a tensor A ∈ R^{I_1×I_2×⋯×I_N} with a matrix M ∈ R^{J×I_n}, denoted A ×_n M, is the (I_1 × ⋯ × I_{n−1} × J × I_{n+1} × ⋯ × I_N)-tensor defined by

(A ×_n M)_{i_1 … i_{n−1} j i_{n+1} … i_N} = Σ_{i_n=1}^{I_n} a_{i_1 i_2 … i_N} m_{j i_n}.

The n-mode product of the tensor A with a vector w ∈ R^{I_n}, denoted A ×_n w, is the tensor of order N − 1 and size I_1 × ⋯ × I_{n−1} × I_{n+1} × ⋯ × I_N obtained by contracting the nth mode of A against w.

Proposition 1 [12] For matrices B and C of compatible sizes, (A ×_n B) ×_n C = A ×_n (CB); in particular, for two polynomials in the same matrix M, (A ×_n P(M)) ×_n Q(M) = A ×_n (P(M)Q(M)).

Definition 3 (Inner product of two tensors [10]) Let A and B be two tensors of the same size. The (Frobenius) inner product of A and B is the scalar

⟨A, B⟩ = Σ_{i_1,…,i_N} a_{i_1 … i_N} b_{i_1 … i_N},

leading to the tensor norm ‖A‖ = ⟨A, A⟩^{1/2}. (3)
In the following, we introduce the definition of the n-mode minimal polynomial of a matrix with respect to a tensor, which generalizes the minimal polynomial of a matrix with respect to a vector.
Definition 4 Given a nonzero tensor A ∈ R^{I_1×I_2×⋯×I_N} and a matrix M ∈ R^{I_n×I_n} (1 ≤ n ≤ N), the n-mode minimal polynomial of M with respect to A is the monic polynomial P of smallest degree satisfying

A ×_n P(M) = 0. (5)

Theorem 2
The n-mode minimal polynomial P of M with respect to A exists and is unique.
Proof Let Q be the minimal polynomial of M; it satisfies Q(M) = 0 and hence A ×_n Q(M) = 0. Therefore, there is a monic polynomial P of smallest degree k, with k ≤ d ≤ I_n, where d is the degree of Q, satisfying A ×_n P(M) = 0. Suppose that there is another monic polynomial P̃ of the same degree k satisfying A ×_n P̃(M) = 0. Then the difference P − P̃ also satisfies A ×_n (P − P̃)(M) = 0 and its degree is less than k, which contradicts the minimality of k unless P − P̃ = 0. Therefore, P is unique.
Proposition 3 Suppose that A ×_n T(M) = 0 for some polynomial T with deg(T) ≥ deg(P). Then P divides T.
Proof Let T be of degree k̃ with k̃ ≥ k and A ×_n T(M) = 0. By Euclidean division, there exist polynomials T_1 of degree k̃ − k and T_2 of degree at most k − 1 such that T(x) = T_1(x)P(x) + T_2(x). Therefore, A ×_n T(M) = A ×_n P(M)T_1(M) + A ×_n T_2(M) = A ×_n T_2(M) = 0, since A ×_n P(M)T_1(M) = (A ×_n P(M)) ×_n T_1(M) = 0. Then, since T_2 has degree less than k, the minimality of P gives T_2 = 0, and hence P divides T.
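To make Definition 4 concrete, the following NumPy sketch (ours; the helper name nmode_minpoly is hypothetical) estimates the n-mode minimal polynomial numerically by detecting the first linear dependence among A, A ×_n M, A ×_n M², …:

```python
import numpy as np

def nmode_minpoly(A, M, n, tol=1e-10):
    """Numerically estimate the n-mode minimal polynomial of M with respect
    to A (Definition 4): find the smallest k such that A x_n M^k lies in
    span{A x_n M^i : i < k}, and return the monic coefficients
    (theta_0, ..., theta_{k-1}, 1) of P."""
    def times_M(X):  # n-mode product X x_n M
        return np.moveaxis(np.tensordot(X, M, axes=(n, 1)), -1, n)
    cols, Ak = [A.ravel()], A
    for _ in range(M.shape[0]):          # the degree is at most I_n (Theorem 2)
        Ak = times_M(Ak)
        G = np.stack(cols, axis=1)
        c, *_ = np.linalg.lstsq(G, Ak.ravel(), rcond=None)
        if np.linalg.norm(G @ c - Ak.ravel()) <= tol * max(np.linalg.norm(Ak), 1.0):
            return np.append(-c, 1.0)    # monic: P(x) = x^k - sum_i c_i x^i
        cols.append(Ak.ravel())
    raise RuntimeError("no dependence detected within I_n steps")
```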
As an example, one can take a matrix M ∈ R^{3×3} and a 3 × 3 × 3 tensor A built from given random matrices, and compute explicitly the 3-mode minimal polynomial of M with respect to A.

Fixed-point iterative method for linear n-mode systems

Let A ∈ R^{I_n×I_n} be a nonsingular matrix and B ∈ R^{I_1×I_2×⋯×I_N}, and consider the tensor linear system

X ×_n A = B, (6)

where X ∈ R^{I_1×I_2×⋯×I_N} is the unknown tensor to be determined. Splitting the matrix A as A = I − M, the problem (6) can be expressed as

X = X ×_n M + B, (7)

whose solution will be denoted by S ∈ R^{I_1×I_2×⋯×I_N}. Let the sequence of tensors {X_k} be generated by

X_{k+1} = X_k ×_n M + B, k = 0, 1, …, (8)

starting with some initial tensor X_0 ∈ R^{I_1×I_2×⋯×I_N}. We would like the sequence {X_k} to converge to S. Convergence can be addressed in terms of ρ(M), the spectral radius of M: a necessary and sufficient condition for {X_k} to converge to S from an arbitrary X_0 is ρ(M) < 1. Writing the system (7) in the form X ×_n (I − M) = B, it is clear that the uniqueness of the solution is guaranteed when the matrix A = I − M is nonsingular or, equivalently, when M does not have one as an eigenvalue. Given the sequence {X_k} generated as in (8), and provided ρ(M) < 1, lim_{k→∞} X_k exists and equals S.
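The following NumPy sketch (ours, for illustration) implements the n-mode product and the basic iteration (8); mode_n_product and fixed_point are hypothetical helper names:

```python
import numpy as np

def mode_n_product(X, M, n):
    """n-mode product X x_n M, with M in R^{J x I_n} acting on mode n of X."""
    # tensordot contracts X's axis n with M's axis 1; the new axis J comes
    # out last, so move it back to position n.
    return np.moveaxis(np.tensordot(X, M, axes=(n, 1)), -1, n)

def fixed_point(B, M, n, iters=200, X0=None):
    """Basic iteration (8): X_{k+1} = X_k x_n M + B, converging to the
    solution S of X x_n (I - M) = B whenever rho(M) < 1."""
    X = np.zeros_like(B) if X0 is None else X0.copy()
    history = [X]
    for _ in range(iters):
        X = mode_n_product(X, M, n) + B
        history.append(X)
    return history   # the sequence {X_k} to be extrapolated
```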
Let us define the first and second forward differences and the error tensor

U_k = X_{k+1} − X_k, W_k = U_{k+1} − U_k, E_k = X_k − S, k = 0, 1, …. (9)

Using (8) and (9), it follows that

E_{k+1} = E_k ×_n M and U_{k+1} = U_k ×_n M.

Therefore,

E_k = E_0 ×_n M^k, U_k = U_0 ×_n M^k, k = 0, 1, ….

We can, in addition, relate E_k and U_k to U_k and W_k respectively, and vice versa, via

U_k = E_k ×_n (M − I), W_k = U_k ×_n (M − I), (14)

and

E_k = U_k ×_n (M − I)^{−1}, U_k = W_k ×_n (M − I)^{−1}. (15)

We will also need the following simple relations:

X_{k+i} = X_k + Σ_{j=0}^{i−1} U_{k+j}, U_{k+i} = U_k ×_n M^i, i = 1, 2, ….

Construction of solutions via n-mode minimal polynomial
We first give the following result that will be used.
Theorem 4 Let P_k be the n-mode minimal polynomial of M with respect to E_k = X_k − S, written as

P_k(x) = Σ_{i=0}^{m} θ_i^k x^i, θ_m^k = 1. (16)

Then Σ_{i=0}^{m} θ_i^k ≠ 0, and the solution S can be expressed as

S = (Σ_{i=0}^{m} θ_i^k X_{k+i}) / (Σ_{i=0}^{m} θ_i^k). (18)

Proof By definition of P_k and from (16), we have

0 = E_k ×_n P_k(M) = Σ_{i=0}^{m} θ_i^k E_k ×_n M^i = Σ_{i=0}^{m} θ_i^k E_{k+i} = Σ_{i=0}^{m} θ_i^k (X_{k+i} − S).

Therefore,

(Σ_{i=0}^{m} θ_i^k) S = Σ_{i=0}^{m} θ_i^k X_{k+i}.

Provided that Σ_{i=0}^{m} θ_i^k ≠ 0, we get (18). Now, P_k(1) = Σ_{i=0}^{m} θ_i^k, and since one is not an eigenvalue of M, one is not a root of P_k; hence Σ_{i=0}^{m} θ_i^k = P_k(1) ≠ 0. The problem is that, since the error E_k is not known, we cannot compute P_k directly from the errors. The following result gives a way of computing this polynomial using only the forward tensor differences U_k.
Theorem 5 The n-mode minimal polynomial P_k of M with respect to E_k is also the n-mode minimal polynomial of M with respect to U_k = X_{k+1} − X_k.
Proof Let P_k be the n-mode minimal polynomial of M with respect to E_k as before, and denote by P̃_k the n-mode minimal polynomial of M with respect to U_k. Thus,

E_k ×_n P_k(M) = 0 (21) and U_k ×_n P̃_k(M) = 0. (22)

Multiplying (with the n-mode product) the equality (21) by (M − I) and recalling from (14) that E_k ×_n (M − I) = U_k, we obtain U_k ×_n P_k(M) = 0; this implies, by Proposition 3, that P̃_k divides P_k. On the other hand, using (14), we can rewrite (22) as E_k ×_n (M − I)P̃_k(M) = 0, which, upon multiplying by (M − I)^{−1}, gives E_k ×_n P̃_k(M) = 0; this implies that P_k divides P̃_k. Therefore, P̃_k ≡ P_k.

It follows from this last result that

Σ_{i=0}^{m} θ_i^k U_{k+i} = U_k ×_n P_k(M) = 0. (23)

Hence, since θ_m^k = 1,

U_{k+m} = −Σ_{i=0}^{m−1} θ_i^k U_{k+i}. (24)

Relations (23) and (24) give, respectively,

U_m^k ×_{N+1} θ^k = 0 (25) and U_{m−1}^k ×_{N+1} θ̃^k = −U_{k+m}, (26)

where θ^k = (θ_0^k, θ_1^k, …, θ_m^k)^T, θ̃^k = (θ_0^k, θ_1^k, …, θ_{m−1}^k)^T, and U_m^k ∈ R^{I_1×⋯×I_N×(m+1)} is the (N+1)-order tensor such that the jth frontal slice (obtained by fixing the last index at j) is (U_m^k)_{:,:,…,:,j} = U_{k+j}, 0 ≤ j ≤ m.
Using Theorem 5, the equation (26) is consistent and has a unique solution θ̃^k. Therefore, with the θ_i^k available and θ_m^k = 1, we set

δ_i^k = θ_i^k / (Σ_{j=0}^{m} θ_j^k), 0 ≤ i ≤ m. (27)

Then

Σ_{i=0}^{m} δ_i^k = 1.

Therefore, using (18), the solution S can be expressed as

S = Σ_{i=0}^{m} δ_i^k X_{k+i}.

Invoking (27) and dividing (25) by Σ_{i=0}^{m} θ_i^k, we find that the δ_i^k satisfy the constrained system

U_m^k ×_{N+1} δ^k = 0, Σ_{i=0}^{m} δ_i^k = 1. (30)

This is a consistent system in the m + 1 unknowns δ_0^k, δ_1^k, …, δ_m^k that has a unique solution.
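A small sketch of the normalization step (27), assuming the least-squares solution θ̃^k of (26) has already been computed (the helper name theta_to_delta is ours):

```python
import numpy as np

def theta_to_delta(theta_tilde):
    """Append theta_m = 1 to the solution of (26) and normalize as in (27),
    so that the resulting delta satisfies sum(delta) = 1."""
    theta = np.append(np.asarray(theta_tilde, dtype=float), 1.0)
    s = theta.sum()
    if abs(s) < 1e-14:
        raise ValueError("sum of theta vanishes; (27) is undefined")
    return theta / s
```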
Before closing this section, it is useful to state the following remark about the n-mode minimal polynomial P_k in Theorem 5.
Remark 1 With m the degree of P_k, the tensors U_k, U_{k+1}, …, U_{k+i}, i < m, are linearly independent, while U_k, U_{k+1}, …, U_{k+m} are not, since the tensor U_{k+m} is a linear combination of the U_{k+i}, 0 ≤ i ≤ m − 1, as expressed in (24).

Derivation of TMPE and TRRE methods
In the previous section, we saw that S can be computed via a sum of the form S = Σ_{i=0}^{m} δ_i^k X_{k+i}, once P_k(x) = Σ_{i=0}^{m} θ_i^k x^i, the n-mode minimal polynomial of M with respect to E_k, has been determined. We have also seen that P_k is the n-mode minimal polynomial with respect to U_k and can be determined uniquely by solving the system (26) or the system (30). But determining S in this way is usually expensive in computation time and storage, because the dimension I_n of the matrix M can be very large in general, and so can the degree of P_k. To get around this problem, we replace the n-mode minimal polynomial of M with respect to U_k by another polynomial whose degree is much smaller than I_n and is at our disposal. If we denote by m_0 the degree of the n-mode minimal polynomial of M with respect to U_k and choose m < m_0, then, by Remark 1, the sets {U_k, U_{k+1}, …, U_{k+i}}, i < m_0, are linearly independent, while the set {U_k, U_{k+1}, …, U_{k+m_0}} is not. This means that the systems (26) and (30) no longer have a solution in the ordinary sense, and a modification is needed to get around this problem and obtain an effective approximation to the (exact) solution S.

Derivation of TMPE
We have seen that, when m is chosen smaller than the degree of the n-mode minimal polynomial of M with respect to U_k (hence also with respect to E_k), and therefore also smaller than I_n, the system U_{m−1}^k ×_{N+1} θ̃^k = −U_{k+m} in (26) has no solution for θ_0^k, θ_1^k, …, θ_{m−1}^k in the ordinary sense. To overcome this problem, we can solve the system in the least-squares sense; such a solution always exists. Next, we compute the δ_i^k precisely as in (27) and then compute the tensor X_{k,m} = Σ_{i=0}^{m} δ_i^k X_{k+i} as our approximation to S. The resulting method is the tensor MPE (TMPE). The steps of the TMPE method can be summarized as follows (see also the sketch after this list):
1. Choose k and m and input X_k, X_{k+1}, …, X_{k+m+1}.
2. Compute the tensors U_k, U_{k+1}, …, U_{k+m} and form the tensor U_{m−1}^k.
3. Solve for θ̃^k = (θ_0^k, θ_1^k, …, θ_{m−1}^k)^T the system (26) in the least-squares sense; this leads to solving

min_{θ̃} ‖U_{m−1}^k ×_{N+1} θ̃ + U_{k+m}‖, (31)

or, equivalently,

min_{θ_0,…,θ_{m−1}} ‖Σ_{i=0}^{m−1} θ_i U_{k+i} + U_{k+m}‖,

where ‖·‖ is the tensor norm defined in (3).
4. Set θ_m^k = 1, compute the δ_i^k as in (27), and form X_{k,m} = Σ_{i=0}^{m} δ_i^k X_{k+i}.
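The steps above can be prototyped in a few lines of NumPy (an illustrative sketch of ours, not the paper's implementation; it reuses the theta_to_delta helper sketched earlier and solves (31) by flattening each tensor, since the tensor norm (3) coincides with the 2-norm of the flattened tensor):

```python
import numpy as np

def tmpe(X_seq, m):
    """TMPE sketch for X_seq = [X_k, ..., X_{k+m+1}] (tensors of equal shape)."""
    U = [X_seq[i + 1] - X_seq[i] for i in range(m + 1)]       # U_k, ..., U_{k+m}
    A = np.stack([u.ravel() for u in U[:m]], axis=1)          # columns U_k..U_{k+m-1}
    theta_tilde, *_ = np.linalg.lstsq(A, -U[m].ravel(), rcond=None)   # (31)
    delta = theta_to_delta(theta_tilde)                       # (27)
    return sum(d * X for d, X in zip(delta, X_seq[:m + 1]))   # X_{k,m}
```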

Derivation of TRRE
Again, when m is chosen smaller than the degree of the n-mode minimal polynomial of M with respect to U_k (hence also with respect to E_k), and therefore also smaller than I_n, the system U_m^k ×_{N+1} δ^k = 0 in (30), subject to Σ_{i=0}^{m} δ_i^k = 1, has no solution for δ_0^k, δ_1^k, …, δ_m^k in the ordinary sense. Similarly, to get around this problem, we can solve the system in the least-squares sense, with the equation Σ_{i=0}^{m} δ_i^k = 1 serving as a constraint. Such a solution always exists, and once the δ_i^k are available, we compute the tensor X_{k,m} = Σ_{i=0}^{m} δ_i^k X_{k+i} as our approximation to S. The resulting method is the tensor RRE (TRRE). Solving the system U_m^k ×_{N+1} δ^k = 0 in (30), subject to Σ_{i=0}^{m} δ_i^k = 1, in the least-squares sense amounts to solving the constrained optimization problem

min_δ ‖U_m^k ×_{N+1} δ‖ subject to Σ_{i=0}^{m} δ_i = 1. (33)

In fact, we can replace this constrained minimization problem by an equivalent unconstrained one. Setting

μ_j^k = Σ_{i=j+1}^{m} δ_i^k, 0 ≤ j ≤ m − 1, (34)

and using the relation in (9), we have, for any δ^k with Σ_{i=0}^{m} δ_i^k = 1,

Σ_{i=0}^{m} δ_i^k U_{k+i} = U_k + Σ_{j=0}^{m−1} μ_j^k W_{k+j}. (35)

Then the problem (33) can be replaced by the unconstrained problem

min_μ ‖U_k + Σ_{j=0}^{m−1} μ_j W_{k+j}‖,

which can be expressed in tensor form as

min_μ ‖U_k + W_{m−1}^k ×_{N+1} μ‖,

where W_{m−1}^k is the (N+1)-order tensor whose jth frontal slice is W_{k+j}, 0 ≤ j ≤ m − 1. Note that the δ_i^k can also be computed from the μ_j^k as

δ_0^k = 1 − μ_0^k, δ_i^k = μ_{i−1}^k − μ_i^k (1 ≤ i ≤ m − 1), δ_m^k = μ_{m−1}^k. (36)

Therefore, we can also compute X_{k,m} through

X_{k,m} = X_k + Σ_{j=0}^{m−1} μ_j^k U_{k+j}.

The steps of the TRRE method can be summarized as follows (see also the sketch after this list):
1. Choose k and m and input X_k, X_{k+1}, …, X_{k+m+1}.
2. Compute the tensors U_k, U_{k+1}, …, U_{k+m} and form the tensor U_m^k.
3. Solve the constrained least-squares problem (33) for δ^k = (δ_0^k, δ_1^k, …, δ_m^k)^T.
4. With the δ_i^k available, compute X_{k,m} = Σ_{i=0}^{m} δ_i^k X_{k+i},
where ‖·‖ is the tensor norm defined in (3).
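A matching NumPy sketch of TRRE via the unconstrained formulation (again ours, for illustration only):

```python
import numpy as np

def trre(X_seq, m):
    """TRRE sketch for X_seq = [X_k, ..., X_{k+m+1}]: solve the unconstrained
    problem min_mu ||U_k + sum_j mu_j W_{k+j}|| and set
    X_{k,m} = X_k + sum_j mu_j U_{k+j}."""
    U = [X_seq[i + 1] - X_seq[i] for i in range(m + 1)]   # U_k, ..., U_{k+m}
    W = [U[j + 1] - U[j] for j in range(m)]               # W_k, ..., W_{k+m-1}
    A = np.stack([w.ravel() for w in W], axis=1)
    mu, *_ = np.linalg.lstsq(A, -U[0].ravel(), rcond=None)
    X = X_seq[0].copy()
    for j in range(m):
        X = X + mu[j] * U[j]
    return X
```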

Remark 2 We note that the approximations produced by TMPE and TRRE depend only on the tensor terms X_k, X_{k+1}, …, X_{k+m+1} and not on how the sequence {X_k} is generated, linearly or nonlinearly, or even on whether {X_k} is obtained directly from measurements or from a probabilistic process.

Implementation of TMPE and TRRE
This section is devoted to an efficient implementation of TMPE and TRRE. Their definitions are based, respectively, on the solution of the least-squares problems (31) and (33). We begin this section with some definitions and properties that will be used for this purpose.

Definition 5 Let A ∈ R^{I_1×⋯×I_N×m} and B ∈ R^{I_1×⋯×I_N×p} be two (N+1)-order tensors with frontal slices A_1, …, A_m and B_1, …, B_p, respectively. The product A ⊡_{N+1} B is the m × p matrix whose (i, j) entry is ⟨A_i, B_j⟩. In particular, for an N-order tensor B (p = 1), A ⊡_{N+1} B ∈ R^m is the vector with entries ⟨A_i, B⟩.

As a particular case, for B = A, the matrix A ⊡_{N+1} A ∈ R^{m×m} is symmetric positive semidefinite.

Proposition 6 With the above notation, the following assertions are equivalent:
1. the subtensors {A_i}_{i=1}^m are linearly independent;
2. the columns {A ⊡_{N+1} A_i}_{i=1}^m are linearly independent, that is, the matrix A ⊡_{N+1} A is nonsingular (hence symmetric positive definite).

Proof By definition of the involved tensor products: A ⊡_{N+1} A is the Gram matrix of the slices A_1, …, A_m with respect to the inner product (3), and a Gram matrix is nonsingular if and only if the underlying family is linearly independent.

Tensor Global-QR decomposition [14]
Theorem 8 Let A ∈ R^{I_1×⋯×I_N×m} be an (N+1)-order tensor whose frontal slices A_1, …, A_m are linearly independent. Then there exist an (N+1)-order tensor Q ∈ R^{I_1×⋯×I_N×m} satisfying Q ⊡_{N+1} Q = I_{m×m} (the identity matrix) and an upper triangular matrix R ∈ R^{m×m} such that A = Q ×_{N+1} R^T. This is summarized in the following algorithm.
Algorithm 1 Global QR decomposition (modified Gram–Schmidt)
1: Input: A with frontal slices A_1, …, A_m; set Q_i = A_i, i = 1, …, m
2: for i = 1, …, m do
3:   for j = 1, …, i − 1 do
4:     r_{ji} = ⟨Q_j, Q_i⟩; Q_i = Q_i − r_{ji} Q_j
5:   end for
6:   r_{ii} = ‖Q_i‖; Q_i = Q_i / r_{ii}
7: end for

Theorem 9 Consider the least-squares problem

min_{x∈R^n} ‖A ×_{N+1} x − B‖, (44)

where A ∈ R^{I_1×⋯×I_N×n} and B ∈ R^{I_1×⋯×I_N}. The vector x is a solution to (44) if and only if it satisfies the n × n system

(A ⊡_{N+1} A) x = A ⊡_{N+1} B.

Proof We have to show that x is a solution to (44) if and only if the residual R = A ×_{N+1} x − B satisfies A ⊡_{N+1} R = 0. For any vector x̃ = x + e, one has

‖A ×_{N+1} x̃ − B‖² = ‖R‖² + 2 e^T (A ⊡_{N+1} R) + ‖A ×_{N+1} e‖².

First, suppose that A ⊡_{N+1} R = 0. Then we obtain ‖A ×_{N+1} x̃ − B‖² = ‖R‖² + ‖A ×_{N+1} e‖² ≥ ‖R‖², implying that x is a solution to (44). Assume now that x is a solution to (44) but A ⊡_{N+1} R ≠ 0. Choose x̃ = x + λ(A ⊡_{N+1} R), where λ is a real scalar, and set

g(λ) = ‖A ×_{N+1} x̃ − B‖² − ‖R‖² = 2λ ‖A ⊡_{N+1} R‖² + λ² ‖A ×_{N+1} (A ⊡_{N+1} R)‖².

Since ‖A ×_{N+1} x̃ − B‖² ≥ ‖R‖², we must have g(λ) ≥ 0 for all real λ. However, because A ⊡_{N+1} R ≠ 0, we have g(λ) < 0 for λ < 0 with |λ| sufficiently close to zero, which contradicts the fact that x is a solution to (44). As a result, A ⊡_{N+1} R = 0.
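Algorithm 1 translates directly into NumPy (an illustrative sketch of ours, assuming the frontal slices are linearly independent):

```python
import numpy as np

def global_qr(slices):
    """Global QR by modified Gram-Schmidt on the frontal slices A_1..A_m:
    returns Q (slices with <Q_i, Q_j> = delta_ij) and upper triangular R
    with A_i = sum_{j<=i} R[j, i] * Q_j."""
    m = len(slices)
    Q = [np.array(s, dtype=float) for s in slices]
    R = np.zeros((m, m))
    for i in range(m):
        for j in range(i):
            R[j, i] = np.sum(Q[j] * Q[i])    # Frobenius inner product (3)
            Q[i] = Q[i] - R[j, i] * Q[j]
        R[i, i] = np.linalg.norm(Q[i])
        Q[i] = Q[i] / R[i, i]
    return Q, R
```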
Before giving the main theorem of this section, we need the following lemma.
Lemma 1 (see [28]) Let M ∈ R^{m×m} be a symmetric positive definite matrix and let h ∈ R^m, h ≠ 0. Then the minimization problem

min_{x∈R^m} x^T M x subject to h^T x = 1

has the unique solution

x = M^{−1} h / (h^T M^{−1} h).

In addition, the minimum value is 1 / (h^T M^{−1} h).

Theorem 10 Consider the constrained least-squares problem

min_{x∈R^m} ‖A ×_{N+1} x‖ subject to h^T x = 1, (49)

where the frontal slices of A are linearly independent, i.e., rank(A ⊡_{N+1} A) = m. Then the problem (49) has the unique solution

x = (A ⊡_{N+1} A)^{−1} h / (h^T (A ⊡_{N+1} A)^{−1} h).

In addition, the minimum value of ‖A ×_{N+1} x‖² is 1 / (h^T (A ⊡_{N+1} A)^{−1} h).

Proof Notice first that

‖A ×_{N+1} x‖² = x^T (A ⊡_{N+1} A) x.

Then the proof of Theorem 10 is achieved by applying Lemma 1 with M = A ⊡_{N+1} A ∈ R^{m×m}, which is symmetric positive definite due to the assumption that rank(A ⊡_{N+1} A) = m.
Numerically, to solve the problem (49), we use the global QR factorization of A: with A = Q ×_{N+1} R^T, we have A ⊡_{N+1} A = R^T R, so we obtain the vector v as the solution of the m × m linear system R^T R v = h by solving two triangular linear systems. Finally, provided h^T v ≠ 0, we get x = v / (h^T v).
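A sketch of this solution procedure (ours), reusing the global_qr helper above; SciPy's solve_triangular performs the two triangular solves:

```python
import numpy as np
from scipy.linalg import solve_triangular

def constrained_ls(slices, h):
    """Solve min ||sum_i x_i A_i|| subject to h^T x = 1 (Theorem 10):
    solve R^T R v = h via two triangular solves, then x = v / (h^T v)."""
    _, R = global_qr(slices)                  # sketch above
    w = solve_triangular(R.T, h, lower=True)  # R^T w = h
    v = solve_triangular(R, w, lower=False)   # R v = w
    denom = h @ v
    if abs(denom) < 1e-14:
        raise ValueError("h^T v is (numerically) zero")
    return v / denom
```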

Algorithms for TMPE and TRRE
Going back to Section 5, we see from the summaries of TMPE and TRRE that, once the scalars δ_i^k are determined, the computation of the approximation X_{k,m} is done in the same way for both methods, that is,

X_{k,m} = Σ_{i=0}^{m} δ_i^k X_{k+i}.

Invoking the relations (34), (35), and (36), we have

X_{k,m} = X_k + Σ_{j=0}^{m−1} μ_j^k U_{k+j},

where μ_j^k = Σ_{i=j+1}^{m} δ_i^k, 0 ≤ j ≤ m − 1. To implement this efficiently, we first compute the μ_j^k from the δ_i^k and then compute the approximate solution X_{k,m} as

X_{k,m} = X_k + U_{m−1}^k ×_{N+1} μ^k, μ^k = (μ_0^k, …, μ_{m−1}^k)^T.
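A small sketch (ours) of this implementation detail, computing the μ_j^k from the δ_i^k and accumulating X_{k,m}:

```python
import numpy as np

def combine(X_seq, delta):
    """Form X_{k,m} = X_k + sum_j mu_j U_{k+j} with mu_j = sum_{i>j} delta_i;
    X_seq holds X_k, ..., X_{k+m} and delta has length m + 1."""
    delta = np.asarray(delta, dtype=float)
    mu = np.cumsum(delta[::-1])[::-1][1:]     # mu_j = delta_{j+1} + ... + delta_m
    X = X_seq[0].copy()
    for j, mu_j in enumerate(mu):
        X = X + mu_j * (X_seq[j + 1] - X_seq[j])   # U_{k+j}
    return X
```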

Determination of δ k i for TMPE
Applying Theorem 9 with A = U_{m−1}^k and B = −U_{k+m}, the solution θ̃^k of the least-squares problem (31) can be determined via the solution of the linear system R^T R θ̃^k = −r_m, with r_m = U_{m−1}^k ⊡_{N+1} U_{k+m}. With θ̃^k available, we let θ_m^k = 1 and compute the δ_i^k as in (27). All these steps are given in the following algorithm.
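These steps can be sketched as follows (our illustration, reusing the global_qr helper from the previous section):

```python
import numpy as np
from scipy.linalg import solve_triangular

def tmpe_delta(U_slices):
    """Determine delta for TMPE: with A = U_{m-1}^k and r_m[i] = <U_{k+i}, U_{k+m}>,
    solve R^T R theta = -r_m via the global QR of A, then normalize as in (27)."""
    *head, last = U_slices                    # U_k..U_{k+m-1}, then U_{k+m}
    _, R = global_qr(head)
    r = np.array([np.sum(u * last) for u in head])
    w = solve_triangular(R.T, -r, lower=True)
    theta_tilde = solve_triangular(R, w, lower=False)
    theta = np.append(theta_tilde, 1.0)
    return theta / theta.sum()
```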

Determination of δ k i for TRRE
Applying Theorem 10 with A = U_m^k and h = (1, 1, …, 1)^T (the constraint Σ_{i=0}^{m} δ_i^k = 1 can be expressed in the form h^T δ^k = 1), the solution δ^k to the constrained least-squares problem in (33) can be determined as follows. First, we solve the system R^T R v = h, and then compute β = 1 / (h^T v) to get δ^k = βv. The different steps are summarized in the following algorithm.
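In code, this amounts to a single call to the constrained_ls sketch given above, with h the all-ones vector (our illustration):

```python
import numpy as np

def trre_delta(U_slices):
    """Determine delta for TRRE: Theorem 10 with A = U_m^k and h = ones,
    so that h^T delta = 1 encodes the constraint sum_i delta_i = 1."""
    return constrained_ls(U_slices, np.ones(len(U_slices)))
```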

Numerical experiments
This section is devoted to some numerical experiments illustrating the acceleration effect of our methods. The computations were carried out in MATLAB R2015b, with machine epsilon about 2 · 10^{−16}. We compared the performance of our proposed methods (TMPE and TRRE) with Nesterov's extrapolation process [23], defined by

Y_{k+1} = X_{k+1} + ((β_k − 1)/β_{k+1}) (X_{k+1} − X_k),

where {β_k} is the scalar sequence defined recursively by β_{k+1} = (√(4β_k² + 1) + 1)/2 with β_1 = 1. Several works have investigated the convergence of Nesterov's process, e.g., [11]. For a better evaluation of the algorithms, and in addition to the relative error, we also computed a ratio that measures the rate of acceleration.

Example 1
For this example, we considered the tensor sequence X_k ∈ R^{d×d×d} obtained from the basic linear iteration X_{k+1} = X_k ×_3 M + B, with M = UΣU^T ∈ R^{d×d}, where U is a random orthogonal matrix and Σ = diag(v), with v ∈ R^d a chosen vector such that ρ(M) < 1. The tensor B = S − S ×_3 M is such that the exact solution is the tensor S whose elements are all equal to one. We chose the vector v so that the sequence {X_k} converges very slowly.
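A possible setup for this experiment (our sketch, reusing the mode_n_product, fixed_point, and tmpe helpers introduced earlier; the seed and spectrum are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
U_mat, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal U
v = np.linspace(0.2, 0.999, d)                         # spectrum with rho(M) < 1
M = U_mat @ np.diag(v) @ U_mat.T
S = np.ones((d, d, d))                                 # exact solution
B = S - mode_n_product(S, M, 2)                        # B = S - S x_3 M (mode 2, 0-based)
X_seq = fixed_point(B, M, n=2, iters=12)               # slowly converging {X_k}
# compare the plain iterate with the extrapolated tensor (k = 0, m = 6)
print(np.linalg.norm(X_seq[7] - S), np.linalg.norm(tmpe(X_seq[:8], 6) - S))
```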
In Table 1, we report the number of iterations, the relative error, and the required CPU time for the three methods. The results reveal the acceleration effect of our proposed methods: for the different dimensions (d = 10, 30), the number of iterations and the CPU time for the TMPE and TRRE methods are smaller than those obtained for the initial sequence (basic iteration) and for the Nesterov process.

Example 2
Consider the linear system of tensor equations given in [10], derived from the numerical solution of the partial differential equation (54) considered there. The standard central difference discretization of (54) on equidistant nodes leads to the tensor equation with the Einstein product

A ∗ X = B, (55)

where A = [a_{ijkl}] ∈ R^{N×N×N×N} and B = [b_{ij}] ∈ R^{N×N}; the entries of A are assembled from the discretization stencil using the Kronecker product ⊗ and the index mapping function ivec. Starting from an initial guess X_0, and using Algorithm 3.1 in [26], we construct the linear sequence {X_k}, whose limit is the solution of equation (55). In Table 2, we report the number of iterations, the relative error, and the required CPU time for the four methods for different dimensions N (N = 10, 20, and 50). The obtained results clearly show the effectiveness of our proposed methods: the number of iterations and the CPU time for the TMPE and TRRE methods are smaller than those obtained for the initial sequence (basic iteration) and for the Nesterov process.

Example 3
This numerical example is devoted to text removal, a data inpainting process that aims to remove text added to an image or a video; see [30] for more details. Let X_exact denote the original image, and let the tensor B represent its painted (text-added) data. To remove the added text, we consider the tensor total variation inpainting problem, in which a total-variation term is minimized subject to matching the observed entries, where P_E is the tensor projection defined as

(P_E(X))_{i_1,i_2,…,i_N} = X_{i_1,i_2,…,i_N} if (i_1, i_2, …, i_N) ∈ E, and 0 otherwise,

with E = {(i_1, i_2, …, i_N) : X_{i_1,i_2,…,i_N} is observed} (see [9]).
To solve this problem, we adopt the tensorial double proximal gradient (TDPG) algorithm proposed in [2]. Since TDPG is a gradient-descent-like algorithm, it has a slow convergence rate. Applying our methods to the sequence generated by this algorithm in image and video inpainting problems confirms, according to the results below, their effectiveness and performance.
In this example, we took the original "Onion" color image of size 150 × 200 × 3 and added text to it, as shown in Figure 6.
Figure 7 shows the images recovered by TDPG, Nesterov, TMPE, and TRRE. We can see clearly that our proposed methods removed nearly all of the added text, which was not the case for the other methods.

Example 4
In this example, we took the original "Lighthouse" color image of size 640 × 480 × 3. We added a paint on it as shown in Figure 10. Figure 11 shows the recovered images by TDPG , TMPE and TRRE.We can see clearly that our methods removed more effectively the added noise than TDPG method.The relative error, the PSNR, as well as the cpu-time in seconds for the three methods are reported in Table 3.It follows from the obtained results that our proposed methods gave good results in terms of relative error as well as in terms of cpu-time .We can observe that TRRE is more effective as compared to TMPE.In Figure 12, we plotted the PSNR curves in graph (a), the acceleration rate in graph (b) and the relative error in graph (c).We see clearly from graphs b and c that both TMPE and TRRE converge faster than TDPG as the iteration number increasing.We can also see the advantage of our methods.

Fig. 4: The residual error (left) and the rate of acceleration (right) for N = 10.

Fig. 5: The residual error (left) and the rate of acceleration (right) for N = 40.

Figure 8 presents the relative error (left graph) and the rate of acceleration (right graph) versus the iteration index k for the four methods (TMPE, TRRE, Nesterov, and TDPG). In Figure 9, we plot the PSNR curves associated with the four methods. Compared to TDPG and the Nesterov process, the curves clearly show that our proposed algorithms are more effective.

Table 1: Comparison between the proposed methods for dimensions d = 40, 100.

Table 2: Comparison of the four methods for dimensions N = 10, 20, and 50.

Table 3: Comparison between the proposed acceleration methods (Example 4).