A class of spectral conjugate gradient methods for Riemannian optimization

Spectral conjugate gradient (SCG) methods combine the spectral gradient method with conjugate gradient (CG) methods and have been well studied in Euclidean space. In this paper, we aim to extend this class of methods to solve optimization problems on Riemannian manifolds. Firstly, we present a Riemannian version of the spectral parameter, which guarantees that the search direction always satisfies the sufficient descent property without the help of any line search strategy. Secondly, we introduce a generic algorithmic framework for Riemannian SCG methods, in which the selection of the CG parameter is very flexible. Under the Riemannian Wolfe conditions, the global convergence of the proposed algorithmic framework is established whenever the absolute value of the CG parameter is no more than the Riemannian Fletcher-Reeves CG parameter. Finally, some preliminary numerical results are reported and compared with several classical Riemannian CG methods, showing that our new methods are efficient.


Introduction
The research on conjugate gradient (CG) methods can be traced back to the work of Hestenes and Stiefel [1] in 1952 for solving linear equations with a symmetric positive definite coefficient matrix. Later, the so-called nonlinear CG method was first proposed by Fletcher and Reeves [2] to solve unconstrained nonlinear optimization problems of the form

min f(x), s.t. x ∈ R^n,  (1.1)

where f : R^n → R is a continuously differentiable function, whose gradient is denoted by g(x) := ∇f(x).
For the current iterate x_k, the iterative form of CG methods is as follows:

x_{k+1} = x_k + α_k d_k, k = 0, 1, 2, ...,  (1.2)

where α_k > 0 is the stepsize determined by a certain line search, and d_k ∈ R^n is the search direction generated by

d_k = -g_k + β_k d_{k-1}, with d_0 = -g_0,  (1.3)

where g_k = ∇f(x_k), and β_k ∈ R is the so-called CG parameter, whose selection may significantly affect the theoretical properties and numerical performance of the algorithm. Some classical and famous CG parameters can be found in [1-4]. CG methods have been widely studied over the past several decades because of their low storage requirements, low computational cost, and fast convergence. In addition to the classical formulas, there are many variants of CG methods, such as three-term CG methods [5-9], hybrid CG methods [10-12], and spectral CG (SCG) methods [13-16]. Here, we are particularly interested in the class of SCG methods, which originated with Birgin and Martínez [13]. The SCG methods combine the spectral gradient method [17] with CG methods, and their direction is of the form

d_k = -θ_k g_k + β_k d_{k-1},  (1.4)

where θ_k is the spectral parameter and β_k is the CG parameter. If θ_k = 1, then the SCG methods reduce to CG methods. If θ_k > 1, then more emphasis is placed on the negative gradient, which is expected to yield a direction with a larger reduction. Some specific choices of θ_k can be found in [14-16].
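To make the Euclidean starting point concrete, the following minimal NumPy sketch runs the SCG iteration (1.2) and (1.4) on a strictly convex quadratic with the FR parameter; the function name, the exact-line-search stepsize, and the constant spectral parameter are illustrative choices, not prescribed by the methods above.

```python
import numpy as np

def scg_quadratic(A, b, x0, theta=1.0, tol=1e-8, max_iter=500):
    """Euclidean SCG sketch for f(x) = 0.5*x'Ax - b'x (toy example, A SPD).

    Direction update (1.4): d_k = -theta*g_k + beta_k*d_{k-1},
    here with the FR parameter and a constant spectral parameter theta.
    """
    x = x0.copy()
    g = A @ x - b                       # gradient of the quadratic
    d = -g                              # d_0 = -g_0, cf. (1.3)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ A @ d)  # exact line search along d
        x = x + alpha * d               # iteration (1.2)
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves parameter
        d = -theta * g_new + beta * d      # SCG direction (1.4)
        g = g_new
    return x
```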
In this paper, we aim to extend the SCG methods to solve optimization problems on Riemannian manifolds (Riemannian optimization for short) of the form

min f(x), s.t. x ∈ M,  (1.5)

where f : M → R is a smooth function and M is a Riemannian manifold. Riemannian optimization has been an active research field in recent years and frequently arises in practice, for example in computer vision, quantum computation, machine learning, and engineering; see the monographs [18-21]. The extension is not straightforward, since the manifold M usually has a nonlinear structure, and therefore special tools of Riemannian optimization, such as retractions and vector transports, must be incorporated into the methods. The study of Riemannian CG methods dates back to the work of Lichnewsky [22] and Smith [23], in which the concept of Riemannian optimization was introduced and the tools of Riemannian geometry, namely the exponential map and parallel translation, were used to handle the nonlinearity of the manifold M. However, these two tools are computationally expensive in general. To overcome this difficulty, Absil et al. [18] proposed to use retractions and vector transports to relax the exponential map and parallel translation, respectively. Taking advantage of these new tools, Ring and Wirth [24] presented a Riemannian FR CG method and proved its global convergence under the assumption that the vector transport does not increase the norm of tangent vectors. Sato and Iwai [25] removed this assumption by introducing a scaled vector transport, which also guarantees the global convergence of the related Riemannian CG method. Sato [26] proposed a DY-type Riemannian CG method and proved its global convergence under the Riemannian Wolfe conditions. Zhu [27] proposed an efficient Riemannian CG method for optimization on the Stiefel manifold, in which two novel types of vector transport based on the Cayley transform are introduced so that the Ring-Wirth nonexpansive condition is satisfied. Sakai and Iiduka [28,29] proposed several globally convergent Riemannian hybrid CG methods, and numerical results show that these hybrid methods are very efficient. Sato [30] proposed a general framework of Riemannian CG methods, which covers a variety of existing related methods.
To the best of our knowledge, our work is the first to extend the SCG methods in Euclidean space to the Riemannian optimization problem (1.5). Firstly, we present a Riemannian version of the spectral parameter, which generalizes the corresponding formula in [16] to the Riemannian setting. The construction of the new spectral parameter guarantees that the search direction always satisfies the sufficient descent property, independently of the line search strategy. Secondly, we introduce a generic algorithmic framework for Riemannian SCG methods. An interesting feature is that, to ensure convergence, the selection of the CG parameter is very flexible: it is only required that |β_k^{RCG}| ≤ β_k^{R-FR}, where β_k^{R-FR} (defined below) is the Riemannian version of the FR CG parameter. Global convergence of the proposed algorithmic framework is established under the Riemannian Wolfe conditions. Finally, some preliminary numerical results are reported and compared with several classical Riemannian CG methods, which show that our new methods have advantages in terms of the number of iterations and CPU time.
The paper is organized as follows. In Section 2, we review some basic concepts of Riemannian optimization and some existing Riemannian CG methods. In Sections 3 and 4, we present our algorithmic framework and prove its global convergence, respectively. Some encouraging numerical results are reported in Section 5, and concluding remarks are made in Section 6. Notation: T_xM denotes the tangent space to M at x; TM denotes the tangent bundle of M; ⟨·,·⟩_x denotes the inner product on T_xM; ‖ξ‖_x := √⟨ξ,ξ⟩_x represents the norm of a tangent vector ξ ∈ T_xM; grad f(x) denotes the Riemannian gradient of f at x.

Preliminaries
In order to extend the SCG methods to the Riemannian setting, we shall use the following retraction map and vector transport (see [18]).

Definition 1 A smooth map R : TM → M is called a retraction on M if its restriction R_x to T_xM satisfies (i) R_x(0_x) = x, where 0_x is the zero element of T_xM; (ii) DR_x(0_x) = id_{T_xM}, the identity map on T_xM.

Definition 2 A smooth map T : TM ⊕ TM → TM, (η, ξ) ↦ T_η(ξ), is called a vector transport on M if there exists a retraction R, called the retraction associated with T, such that (i) T_η(ξ) ∈ T_{R_x(η)}M for all η, ξ ∈ T_xM; (ii) T_{0_x}(ξ) = ξ for all ξ ∈ T_xM; (iii) T_η(aξ + bζ) = a T_η(ξ) + b T_η(ζ) for all a, b ∈ R and η, ξ, ζ ∈ T_xM.
Based on Definition 1, the classical formula (1.2) can be generalized as

x_{k+1} = R_{x_k}(α_k η_k), k = 0, 1, 2, ...,  (2.1)

where η_k is a tangent vector in the tangent space T_{x_k}M, which plays the same role as d_k in (1.2) and will be specified later. Let R be a retraction associated with a given vector transport T. Then, from (2.1) and Definition 2, we know that T_{α_{k-1}η_{k-1}}(η_{k-1}) ∈ T_{x_k}M. Hence, the formula (1.3) can be generalized as

η_k = -grad f(x_k) + β_k^{RCG} T_{α_{k-1}η_{k-1}}(η_{k-1}), with η_0 = -grad f(x_0),  (2.2)

where β_k^{RCG} is a Riemannian version of some CG parameter β_k defined in Euclidean space. In order to prove global convergence, Ring and Wirth [24] assumed a nonexpansive condition on the vector transport T, i.e.,

‖T_{α_k η_k}(η_k)‖_{x_{k+1}} ≤ ‖η_k‖_{x_k}.  (2.3)

In this paper, we consider using the differentiated retraction T^R of R as a vector transport, which is defined by

T^R_η(ξ) := DR_x(η)[ξ] = (d/dt) R_x(η + tξ) |_{t=0}, η, ξ ∈ T_xM.

In this way, the retraction R is associated with T^R. However, the vector transport T^R is not always an isometry, and thus it does not satisfy the Ring-Wirth nonexpansive condition (2.3) in general. To overcome this difficulty, Sato and Iwai [25] proposed to use a scaled form of T^R as follows:

T^S_{α_k η_k}(η_k) := min{1, ‖η_k‖_{x_k} / ‖T^R_{α_k η_k}(η_k)‖_{x_{k+1}}} T^R_{α_k η_k}(η_k).

It can be easily verified that the nonexpansive condition (2.3) holds for the scaled vector transport T^S, which will be used to guarantee the global convergence of the proposed methods. By replacing T with T^S in (2.2), we can rewrite η_k as

η_k = -grad f(x_k) + β_k^{RCG} T^S_{α_{k-1}η_{k-1}}(η_{k-1}).

Given constants ρ, σ ∈ (0, 1), the Riemannian Armijo condition determines the stepsize α_k = ρ^{m_k}, where m_k is the smallest nonnegative integer m such that

f(R_{x_k}(ρ^m η_k)) ≤ f(x_k) + σ ρ^m ⟨grad f(x_k), η_k⟩_{x_k}.  (2.4)

The Riemannian Wolfe conditions require α_k to satisfy the two relations

f(R_{x_k}(α_k η_k)) ≤ f(x_k) + σ_1 α_k ⟨grad f(x_k), η_k⟩_{x_k},  (2.5)
⟨grad f(R_{x_k}(α_k η_k)), T^R_{α_k η_k}(η_k)⟩_{x_{k+1}} ≥ σ_2 ⟨grad f(x_k), η_k⟩_{x_k},  (2.6)

where 0 < σ_1 < σ_2 < 1. The Riemannian strong Wolfe conditions require α_k to satisfy both (2.5) and the following inequality:

|⟨grad f(R_{x_k}(α_k η_k)), T^R_{α_k η_k}(η_k)⟩_{x_{k+1}}| ≤ σ_2 |⟨grad f(x_k), η_k⟩_{x_k}|.  (2.7)

Finally, following [29], we summarize some existing Riemannian CG parameters:

β_k^{R-FR} = ‖grad f(x_k)‖²_{x_k} / ‖grad f(x_{k-1})‖²_{x_{k-1}},
β_k^{R-DY} = ‖grad f(x_k)‖²_{x_k} / (⟨grad f(x_k), T^S_{α_{k-1}η_{k-1}}(η_{k-1})⟩_{x_k} - ⟨grad f(x_{k-1}), η_{k-1}⟩_{x_{k-1}}),
β_k^{R-PRP} = ⟨grad f(x_k), grad f(x_k) - T^S_{α_{k-1}η_{k-1}}(grad f(x_{k-1}))⟩_{x_k} / ‖grad f(x_{k-1})‖²_{x_{k-1}},
β_k^{R-HS} = ⟨grad f(x_k), grad f(x_k) - T^S_{α_{k-1}η_{k-1}}(grad f(x_{k-1}))⟩_{x_k} / (⟨grad f(x_k), T^S_{α_{k-1}η_{k-1}}(η_{k-1})⟩_{x_k} - ⟨grad f(x_{k-1}), η_{k-1}⟩_{x_{k-1}}).
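As an illustration of these tools, the sketch below implements the retraction R_x(η) = (x + η)/‖x + η‖ on the unit sphere, its differentiated retraction T^R, and the scaled transport T^S; the choice of the sphere and the function names are our own illustrative conventions.

```python
import numpy as np

def retract(x, eta):
    """Retraction on the unit sphere: R_x(eta) = (x + eta)/||x + eta||."""
    v = x + eta
    return v / np.linalg.norm(v)

def diff_retraction(x, eta, xi):
    """Differentiated retraction T^R_eta(xi) = D R_x(eta)[xi] on the sphere."""
    v = x + eta
    y = v / np.linalg.norm(v)          # y = R_x(eta)
    return (xi - (y @ xi) * y) / np.linalg.norm(v)

def scaled_transport(x, eta, xi):
    """Scaled vector transport T^S: shrink T^R so that (2.3) holds."""
    t = diff_retraction(x, eta, xi)
    s = min(1.0, np.linalg.norm(xi) / np.linalg.norm(t))
    return s * t
```

Note that diff_retraction returns a vector orthogonal to y = R_x(eta), i.e., a tangent vector at the new point, as Definition 2(i) requires.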

Riemannian spectral conjugate gradient methods
In this section, we propose a generic algorithmic framework for a class of Riemannian spectral CG methods (Algorithm 1) and discuss its properties. First of all, we generalize the search direction (1.4) of the SCG methods to the form

η_k = -θ_k^{RS} grad f(x_k) + β_k^{RCG} T^S_{α_{k-1}η_{k-1}}(η_{k-1}), with η_0 = -grad f(x_0),  (3.1)

where β_k^{RCG} is a Riemannian CG parameter, and θ_k^{RS} is a Riemannian spectral parameter, which should be carefully selected so as to combine the spectral gradient method and the CG method well. Compared with the fixed coefficient 1 of the negative gradient -grad f(x_k) in standard Riemannian CG methods, the adaptive choice of θ_k^{RS} is expected to bring some nice theoretical properties and numerical advantages.
In this paper, we propose the following specific selection of the parameter θ_k^{RS}:

θ_k^{RS} = 1 + β_k^{RCG} ⟨grad f(x_k), T^S_{α_{k-1}η_{k-1}}(η_{k-1})⟩_{x_k} / ‖grad f(x_k)‖²_{x_k},  (3.2)

which is motivated by the spectral parameter proposed in [16] in the Euclidean space, taking into account that the relation ⟨grad f(x_{k-1}), η_{k-1}⟩_{x_{k-1}} / ‖grad f(x_{k-1})‖²_{x_{k-1}} = -1 always holds for the proposed methods (see Lemma 1 below); as a result, the first term on the right-hand side of (3.2) can simply be set to 1.
A direction η_k is called a sufficient descent direction if it satisfies

⟨grad f(x_k), η_k⟩_{x_k} ≤ -c ‖grad f(x_k)‖²_{x_k}  (3.3)

for some constant c > 0. A remarkable advantage of the formula (3.2) is that the corresponding search direction is a sufficient descent direction in each iteration, regardless of the line search strategy used; see the following lemma.

Lemma 1 Let the sequences {x_k} and {η_k} be generated by (2.1), (3.1), and (3.2). Then, for all k ≥ 0,

⟨grad f(x_k), η_k⟩_{x_k} = -‖grad f(x_k)‖²_{x_k},  (3.4)

i.e., η_k is a sufficient descent direction in the sense of (3.3) with c = 1.

Proof For k = 0, we have η_0 = -grad f(x_0), and (3.4) holds trivially. For k ≥ 1, taking the inner product of (3.1) with grad f(x_k) and substituting (3.2) yields

⟨grad f(x_k), η_k⟩_{x_k} = -θ_k^{RS} ‖grad f(x_k)‖²_{x_k} + β_k^{RCG} ⟨grad f(x_k), T^S_{α_{k-1}η_{k-1}}(η_{k-1})⟩_{x_k} = -‖grad f(x_k)‖²_{x_k}.
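Lemma 1 can also be checked numerically: for any value of β, the parameter θ from (3.2) forces ⟨grad f(x_k), η_k⟩ = -‖grad f(x_k)‖². Below is a small sanity check with random tangent vectors on a sphere (illustrative data; for this embedded manifold the Euclidean inner product plays the role of the Riemannian metric).

```python
import numpy as np

# Illustrative check of Lemma 1: with theta from (3.2), the direction
# eta = -theta*g + beta*t_eta satisfies <g, eta> = -||g||^2 for ANY beta.
rng = np.random.default_rng(0)
x = rng.standard_normal(5); x /= np.linalg.norm(x)
g = rng.standard_normal(5); g -= (x @ g) * x              # a tangent "gradient" at x
t_eta = rng.standard_normal(5); t_eta -= (x @ t_eta) * x  # stands for T^S(eta_{k-1})

for beta in [-0.7, 0.0, 1.3]:
    theta = 1.0 + beta * (g @ t_eta) / (g @ g)   # spectral parameter (3.2)
    eta = -theta * g + beta * t_eta              # direction (3.1)
    assert np.isclose(g @ eta, -(g @ g))         # sufficient descent with c = 1
```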
Based on the above analysis, we are now ready to present our algorithmic framework for the Riemannian SCG methods as follows (Algorithm 1).

Algorithm 1 A generic algorithmic framework for the Riemannian SCG methods.
Step 0. Choose an initial point x_0 ∈ M; set η_0 = -grad f(x_0) and k := 0.
Step 1. If ‖grad f(x_k)‖_{x_k} = 0, stop.
Step 2. Compute a stepsize α_k > 0 satisfying the Riemannian Wolfe conditions (2.5) and (2.6), and set x_{k+1} = R_{x_k}(α_k η_k).
Step 3. Choose a CG parameter β_{k+1}^{RCG} satisfying (3.5), compute θ_{k+1}^{RS} by (3.2), and generate η_{k+1} by (3.1).
Step 4. Set k := k + 1 and go to Step 1.
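The following Python sketch mirrors the steps of Algorithm 1 for a manifold embedded in Euclidean space (so that ⟨·,·⟩_x is the Euclidean inner product); the callables retract, transport_S, beta_rule, and line_search are hypothetical placeholders for the tools of Section 2, not a definitive implementation.

```python
import numpy as np

def riemannian_scg(cost, grad, retract, transport_S, x0,
                   beta_rule, line_search, tol=1e-6, max_iter=10000):
    """Sketch of Algorithm 1. `transport_S` is the scaled vector transport,
    `beta_rule(g_new, g_old, t_eta, eta_old)` must satisfy |beta| <= beta^{R-FR},
    and `line_search` returns a Wolfe stepsize; all are user-supplied callables."""
    x = x0
    g = grad(x)
    eta = -g                                     # eta_0 = -grad f(x_0)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = line_search(cost, grad, x, eta)  # Riemannian Wolfe conditions
        x_new = retract(x, alpha * eta)
        g_new = grad(x_new)
        t_eta = transport_S(x, alpha * eta, eta) # T^S of the old direction
        beta = beta_rule(g_new, g, t_eta, eta)   # any choice satisfying (3.5)
        theta = 1.0 + beta * (g_new @ t_eta) / (g_new @ g_new)  # (3.2)
        eta = -theta * g_new + beta * t_eta      # (3.1)
        x, g = x_new, g_new
    return x
```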
Remark 1 (i) In order to ensure convergence, the choice of the CG parameter β_k^{RCG} is very flexible, since it is only required to satisfy

|β_k^{RCG}| ≤ β_k^{R-FR}.  (3.5)

In practice, hybrid forms of β_k^{R-FR} and β_k^{R-PRP} may produce satisfactory numerical results. Here, we present three possible such choices:

β_k^{RCG} = β_k^{R-PRP} if 0 ≤ β_k^{R-PRP} ≤ β_k^{R-FR}, and β_k^{RCG} = β_k^{R-FR} otherwise,  (3.6)
β_k^{RCG} = max{0, min{β_k^{R-PRP}, β_k^{R-FR}}},  (3.7)
β_k^{RCG} = max{-β_k^{R-FR}, min{β_k^{R-PRP}, β_k^{R-FR}}},  (3.8)

which are the Riemannian versions of the classical formulas in [10], [11], and [31], respectively. Note that the formulas (3.6) and (3.8) are first presented in this article, while formula (3.7) was presented by Sakai and Iiduka [29]. It is not difficult to verify that these choices of β_k^{RCG} all satisfy (3.5).
(ii) Motivated by the general framework of Sato [30], our algorithmic framework can be further extended. More precisely, the term T^S_{α_k η_k}(η_k) can be extended to s_k T^R_{α_k η_k}(η_k), where the scaling parameter s_k satisfies 0 < s_k ≤ min{1, ‖η_k‖_{x_k} / ‖T^R_{α_k η_k}(η_k)‖_{x_{k+1}}}; the scaled vector transport T^S is recovered if s_k is set to its upper bound. In addition, the term T^S_{α_k η_k}(grad f(x_k)) can be extended to l_k T^R_{α_k η_k}(grad f(x_k)) for some scaling parameter l_k > 0. If l_k = min{1, ‖grad f(x_k)‖_{x_k} / ‖T^R_{α_k η_k}(grad f(x_k))‖_{x_{k+1}}}, then l_k T^R_{α_k η_k}(grad f(x_k)) = T^S_{α_k η_k}(grad f(x_k)); if l_k = 1, then the vector transport T^R is not scaled. This means that the scaling of the transported gradient can be independent of the Ring-Wirth condition. Note that the theoretical results in the next section hold immediately for the above extensions, while the numerical performance of the extended variants needs further investigation in the future. During our numerical experiments, it is interesting to observe that the condition ‖T^R_{α_k η_k}(grad f(x_k))‖_{R_{x_k}(α_k η_k)} ≤ ‖grad f(x_k)‖_{x_k} is always satisfied for the test problems and methods. This means that the vector transport T^R need not be scaled, which is equivalent to setting l_k ≡ 1.
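Assuming the statement of (3.6)-(3.8) above, the three hybrid choices can be computed as follows (embedded-manifold inner products; g_old_transported stands for T^S of the previous gradient, and the function names are illustrative):

```python
import numpy as np

def beta_fr(g_new, g_old):
    """Riemannian FR parameter (Euclidean inner products assumed)."""
    return (g_new @ g_new) / (g_old @ g_old)

def beta_prp(g_new, g_old_transported, g_old):
    """Riemannian PRP parameter; g_old_transported stands for
    T^S applied to grad f(x_{k-1})."""
    return g_new @ (g_new - g_old_transported) / (g_old @ g_old)

def beta_hybrid(g_new, g_old_transported, g_old, variant):
    """The three FR/PRP hybrids of Remark 1; each satisfies (3.5)."""
    fr = beta_fr(g_new, g_old)
    prp = beta_prp(g_new, g_old_transported, g_old)
    if variant == 1:                      # (3.6), Touati-Ahmed and Storey type
        return prp if 0.0 <= prp <= fr else fr
    if variant == 2:                      # (3.7), Hu and Storey type
        return max(0.0, min(prp, fr))
    return max(-fr, min(prp, fr))         # (3.8), Gilbert and Nocedal type
```

In each case the returned value lies in [-β_k^{R-FR}, β_k^{R-FR}], so the constraint (3.5) holds by construction.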

Global convergence
In this section, we establish the global convergence of Algorithm 1 under the Riemannian Wolfe conditions, provided that the Riemannian CG parameter β_k^{RCG} satisfies (3.5). To this end, we make an assumption on the objective function f that is commonly used in the related literature (see, e.g., [25-27]).

Assumption 1
The objective function f is bounded below and of class C¹. Moreover, there exists a Lipschitz constant L > 0 such that, writing φ_{x,η}(t) := f(R_x(tη)),

|φ'_{x,η}(t) - φ'_{x,η}(0)| ≤ L t  (4.1)

for all t ≥ 0 and x ∈ M, η ∈ T_xM with ‖η‖_x = 1.
The following lemma presents a fundamental lower bound on the stepsize α_k, which will be used to prove the Zoutendijk-type condition.

Lemma 2 Suppose that Assumption 1 holds and that the stepsize α_k satisfies the Riemannian Wolfe conditions (2.5) and (2.6). Then

α_k ≥ ((1 - σ_2) / L) · (-⟨grad f(x_k), η_k⟩_{x_k} / ‖η_k‖²_{x_k}).  (4.2)
Based on Lemmas 1 and 2, we can obtain the following Zoutendijk-type condition, whose proof can be found in [24, Theorem 2].

Lemma 3 Suppose that the conditions in Lemma 2 hold and that the sequence {x_k} is generated by Algorithm 1. Then

Σ_{k=0}^{∞} ⟨grad f(x_k), η_k⟩²_{x_k} / ‖η_k‖²_{x_k} < +∞.  (4.3)

Lemma 4
Suppose that the conditions in Lemma 2 hold, that the Riemannian CG parameter β_k^{RCG} satisfies (3.5), and that the sequence {x_k} is generated by Algorithm 1. Then we have

‖η_k‖²_{x_k} ≤ ‖grad f(x_k)‖⁴_{x_k} Σ_{j=0}^{k} ‖grad f(x_j)‖^{-2}_{x_j}.

Proof For k = 0, the claim holds immediately since η_0 = -grad f(x_0). For k ≥ 1, from (3.1) and (3.2), we can obtain

η_k = -grad f(x_k) + β_k^{RCG} w_k,  (4.4)

where w_k := T^S_{α_{k-1}η_{k-1}}(η_{k-1}) - (⟨grad f(x_k), T^S_{α_{k-1}η_{k-1}}(η_{k-1})⟩_{x_k} / ‖grad f(x_k)‖²_{x_k}) grad f(x_k). It is easy to see that ⟨grad f(x_k), w_k⟩_{x_k} = 0. This together with (4.4) shows that

‖η_k‖²_{x_k} = ‖grad f(x_k)‖²_{x_k} + (β_k^{RCG})² ‖w_k‖²_{x_k} ≤ ‖grad f(x_k)‖²_{x_k} + (β_k^{R-FR})² ‖η_{k-1}‖²_{x_{k-1}},

where the inequality uses (3.5), the bound ‖w_k‖_{x_k} ≤ ‖T^S_{α_{k-1}η_{k-1}}(η_{k-1})‖_{x_k}, and the nonexpansive property (2.3) of T^S. Recalling that β_k^{R-FR} = ‖grad f(x_k)‖²_{x_k} / ‖grad f(x_{k-1})‖²_{x_{k-1}} and applying this recursion repeatedly gives the claim.

Combining Lemmas 1, 3, and 4 now yields the global convergence of Algorithm 1 by the standard argument for FR-type methods, i.e., lim inf_{k→∞} ‖grad f(x_k)‖_{x_k} = 0. Indeed, if ‖grad f(x_k)‖_{x_k} ≥ ε > 0 for all k, then Lemma 4 gives ‖η_k‖²_{x_k} ≤ (k+1) ‖grad f(x_k)‖⁴_{x_k} / ε², and hence, by Lemma 1, Σ_k ⟨grad f(x_k), η_k⟩²_{x_k} / ‖η_k‖²_{x_k} ≥ Σ_k ε² / (k+1) = +∞, contradicting Lemma 3.

Numerical experiments
In this section, we evaluate the practical performance of the proposed algorithmic framework (Algorithm 1). Three specific Riemannian SCG methods are tested, denoted by RSCG1, RSCG2, and RSCG3, whose CG parameters β_k^{RCG} are given by (3.6), (3.7), and (3.8), respectively.
As mentioned before, these choices of β_k^{RCG} all satisfy (3.5). In addition, we compare our methods with eight Riemannian CG methods, whose search directions are generated by (2.2) and whose CG parameters are as follows: β_k^{R-FR}, β_k^{R-DY}, β_k^{R-PRP}, and β_k^{R-HS} given in Section 2, together with the hybrid parameters denoted by R-HS-DY, R-Hyb1, R-Hyb2, and R-Hyb3 from [28,29].

Test problems and implementation details
Five different types of Riemannian optimization problems (P1-P5) are tested; see Table 1 for details.
In the experiments, the specific data settings of problems P1, P2, and P3 are the same as those in [29], and P4 and P5 are set as in [28]. For each problem, we randomly generated 50 instances, so a total of 250 instances are tested. The tests were performed in Python 3.10 with the NumPy 1.22.4 and Pymanopt 0.2.5 packages on a laptop equipped with an 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz CPU and 16.0GB of RAM. Our codes were developed based on the public software Pymanopt (see [34]), which is available via https://pymanopt.org/.
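For reference, a problem such as P1 can be set up in a few lines; the sketch below assumes the Pymanopt 0.2.x API (Problem, solvers) and uses Pymanopt's built-in CG solver as a baseline. It only illustrates the problem definition and is not the authors' experiment code.

```python
import numpy as np
import autograd.numpy as anp
from pymanopt import Problem
from pymanopt.manifolds import Sphere
from pymanopt.solvers import ConjugateGradient

# Rayleigh-quotient minimization on the unit sphere (problem P1):
# min_{x in S^{n-1}} x^T A x, whose minimizer is an eigenvector
# associated with the smallest eigenvalue of A.
n = 100
A0 = np.random.randn(n, n)
A = 0.5 * (A0 + A0.T)               # random symmetric matrix

manifold = Sphere(n)

def cost(x):
    return anp.dot(x, anp.dot(A, x))

problem = Problem(manifold=manifold, cost=cost)
solver = ConjugateGradient()        # Pymanopt's built-in Riemannian CG
x_opt = solver.solve(problem)
```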
Table 1 Test problems
P1  The Rayleigh-quotient minimization problem on the unit sphere [18]
P2  The Brockett-cost-function minimization problem on a Stiefel manifold [18]
P3  The off-diagonal cost function minimization problem on an oblique manifold [32]
P4  Computation of the stability number [33]
P5  The closest unit norm column approximation problem [28]

The algorithm terminates when one of the following three cases occurs:
- ‖grad f(x_k)‖_{x_k} ≤ 10^{-6} (the algorithm converges and an approximate optimal solution is obtained);
- the maximum number of iterations, 10000, is reached;
- the line search returns a displacement vector whose norm is less than 10^{-10}, i.e., ‖α_k η_k‖_{x_k} < 10^{-10}.
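One way to bundle the three termination tests in code (function name and return values are illustrative):

```python
def should_stop(grad_norm, n_iters, step_norm,
                tol=1e-6, max_iters=10000, min_step=1e-10):
    """The three termination tests used in the experiments."""
    if grad_norm <= tol:          # converged: ||grad f(x_k)|| <= 1e-6
        return "converged"
    if n_iters >= max_iters:      # iteration budget exhausted
        return "max_iterations"
    if step_norm < min_step:      # ||alpha_k * eta_k|| below 1e-10
        return "tiny_step"
    return None
```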
For each method, we tested three different types of Riemannian line search conditions: the Riemannian Armijo condition (2.4), the Riemannian Wolfe conditions (2.5) and (2.6), and the Riemannian strong Wolfe conditions (2.5) and (2.7). For simplicity, they are denoted by "Armijo," "Wolfe," and "Strong," respectively, in the following figures. The corresponding parameters are set as follows: ρ = 0.5, σ = 10⁻⁴, σ_1 = 10⁻⁴, and σ_2 = 0.9. In our implementation, the backtracking line search algorithm [29, Algorithm 2] is used to find a stepsize satisfying the Riemannian Armijo condition; the line search algorithm [29, Algorithm 3] is used to return a stepsize satisfying the Riemannian strong Wolfe conditions, and it is also used to find a stepsize satisfying the Riemannian Wolfe conditions by changing the condition in step 6 of [29, Algorithm 3] to φ'(α_i) ≥ σ_2 φ'(0).
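A minimal backtracking routine for the Riemannian Armijo condition (2.4) looks as follows; this is a sketch in the spirit of [29, Algorithm 2], not a verbatim transcription, and it again assumes Euclidean inner products on an embedded manifold.

```python
def armijo_backtracking(cost, grad, retract, x, eta,
                        alpha0=1.0, rho=0.5, sigma=1e-4, max_backtracks=50):
    """Backtracking search for the Riemannian Armijo condition (2.4):
    f(R_x(alpha*eta)) <= f(x) + sigma*alpha*<grad f(x), eta>."""
    f0 = cost(x)
    slope = grad(x) @ eta          # <grad f(x), eta>_x; negative for descent
    alpha = alpha0
    for _ in range(max_backtracks):
        if cost(retract(x, alpha * eta)) <= f0 + sigma * alpha * slope:
            return alpha
        alpha *= rho               # shrink the stepsize
    return alpha
```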

Numerical results and presentation
For the convenience of numerical comparisons, we use the performance profiles introduced by Dolan and Moré [35] to illustrate the efficiency of all the methods. In more detail, the benchmark results are generated by running a set of solvers S on a set of problems P and recording information of interest, such as the number of iterations and the CPU time. If there are n_p problems and n_s solvers, then for each problem p ∈ P and each solver s ∈ S, the performance ratio is defined as

r_{p,s} := t_{p,s} / min_{s' ∈ S} t_{p,s'},

where t_{p,s} denotes the number of iterations (CPU time, or another measure) required by solver s to solve problem p. A comprehensive evaluation of solver performance can then be defined by

ρ_s(τ) := (1/n_p) size{p ∈ P : r_{p,s} ≤ τ},

where size A represents the number of elements of a set A. In fact, ρ_s(τ) is the probability for solver s ∈ S that the performance ratio r_{p,s} is within a factor τ ∈ R of the best possible ratio. As a result, ρ_s is the (cumulative) distribution function of the performance ratio. Intuitively, the solver whose curve lies on top wins over the rest of the solvers. See [35] for more details.
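The profile ρ_s(τ) is straightforward to compute from a table of raw measures; a small NumPy sketch follows (the array layout is our own convention).

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles. T is an (n_p, n_s) array with T[p, s]
    the measure (iterations, CPU time, ...) of solver s on problem p; failed
    runs can be marked with np.inf. Returns rho of shape (len(taus), n_s)."""
    best = T.min(axis=1, keepdims=True)        # best solver per problem
    ratios = T / best                          # r_{p,s}
    taus = np.asarray(taus).reshape(-1, 1, 1)
    # rho_s(tau): fraction of problems with r_{p,s} <= tau
    return (ratios[None, :, :] <= taus).mean(axis=1)
```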
In what follows, to make the figures clearer, we divide the eleven compared methods into two groups. In the first group, we compare our methods (RSCG1, RSCG2, and RSCG3) with the methods R-HS, R-DY, R-PRP, and R-FR. In the second group, we compare our methods with the hybrid methods R-HS-DY, R-Hyb1, R-Hyb2, and R-Hyb3. For each group, the above-mentioned three Riemannian line searches are tested. (Figures: performance profiles of the two groups under the Armijo, Wolfe, and strong Wolfe line searches.) Across these profiles, we observe that RSCG1 has the best performance, followed by RSCG2 and RSCG3.
In summary, the six groups of figures above show that the proposed Riemannian spectral CG methods outperform the compared Riemannian CG methods on the test problems. In particular, RSCG1 performs best of all the methods, followed by RSCG2 and RSCG3. It can also be seen that the performance of the proposed methods depends less on the line search used than that of the other Riemannian CG methods.

Conclusion
We have presented a class of spectral CG methods for solving Riemannian optimization problems. To the best of our knowledge, this is the first work to generalize the classical spectral CG methods in Euclidean space to the Riemannian setting. The proposed Riemannian spectral parameter guarantees that the search direction satisfies the sufficient descent property, independently of the line search. Our algorithmic framework is very flexible in the sense that the Riemannian CG parameter is only required to satisfy |β_k^{RCG}| ≤ β_k^{R-FR}. Global convergence is established under the Riemannian Wolfe conditions. In the numerical experiments, we tested three specific choices of β_k^{RCG} and compared them with classical Riemannian CG methods under different types of line search conditions. Numerical results show that our methods generally require fewer iterations and less CPU time. In future work, it would be interesting to explore more advanced choices of the Riemannian spectral parameter θ_k^{RS} in order to obtain more efficient algorithms.