A regularized limited memory subspace minimization conjugate gradient method for unconstrained optimization

In this paper, based on limited memory techniques and subspace minimization conjugate gradient (SMCG) methods, a regularized limited memory subspace minimization conjugate gradient method is proposed, which contains two types of iterations. In the SMCG iteration, we obtain the search direction by minimizing an approximate quadratic model or an approximate regularization model. In the RQN iteration, combining the regularization technique with the BFGS method, a modified regularized quasi-Newton method is used in the subspace to improve the orthogonality. Moreover, some simple acceleration criteria and an improved strategy for selecting the initial stepsize are designed to enhance the efficiency of the algorithm. Additionally, a generalized nonmonotone line search is utilized, and the global convergence of the proposed algorithm is established under mild conditions. Finally, numerical results show that the proposed algorithm yields a significant improvement over ASMCG_PR and is superior to the well-known limited memory conjugate gradient software packages CG_DESCENT (6.8) and CGOPT (2.0) on the CUTEr library.


Introduction
Consider the unconstrained optimization problem min_{x ∈ R^n} f(x), (1) where f : R^n → R is a continuously differentiable nonlinear function.
Throughout the article, we use the following notations.
• ‖·‖ represents the Euclidean norm and λ_max denotes the maximum eigenvalue. Moreover, dist{x, S} = inf{‖y − x‖ : y ∈ S}, where x ∈ R^n and S ⊆ R^n.
The nonlinear conjugate gradient (CG) method is a well-known method for solving problem (1), whose main iteration is x_{k+1} = x_k + α_k d_k, (2) where x_k is the kth iteration point, α_k > 0 is the stepsize and d_k is the search direction obtained by d_k = −g_k + β_k d_{k−1} (with d_0 = −g_0), (3) where g_k is the gradient of f(x_k) and β_k is the conjugate parameter.
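To make the iteration (2)-(3) concrete, the following is a minimal sketch of a nonlinear CG method. It uses a PRP+ conjugate parameter, a plain Armijo backtracking line search and a steepest-descent restart as safeguards; these choices are illustrative and not the line search or parameter used in the paper.

```python
import numpy as np

def cg_prp(f, grad, x0, tol=1e-6, max_iter=1000):
    """Minimal nonlinear CG sketch: x_{k+1} = x_k + alpha_k d_k with
    d_k = -g_k + beta_k d_{k-1} (PRP+ parameter, Armijo backtracking)."""
    x = x0.astype(float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # backtracking Armijo line search for the stepsize alpha_k
        alpha, c, fx = 1.0, 1e-4, f(x)
        while f(x + alpha * d) > fx + c * alpha * (g @ d):
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        # PRP+ conjugate parameter: max{g_new^T(g_new - g)/||g||^2, 0}
        beta = max(g_new @ (g_new - g) / (g @ g), 0.0)
        d = -g_new + beta * d
        if g_new @ d >= 0:          # safeguard: restart with steepest descent
            d = -g_new
        x, g = x_new, g_new
    return x
```

On a well-conditioned strictly convex quadratic this sketch converges to the unique minimizer.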
It has been shown in theory that the convergence and numerical performance of different CG methods depend on the selection of the conjugate parameter. Some classical choices of the conjugate parameter β_k are Fletcher-Reeves (FR) [9], Polak-Ribière-Polyak (PRP) [30,31], Dai-Yuan (DY) [7] and Hestenes-Stiefel (HS) [16]. CG algorithms have evolved considerably, and some well-known CG packages such as CG DESCENT [12,14] and CGOPT [5] have been proposed in recent years. Other recent related studies on nonlinear CG algorithms can be found in [4,13].
The subspace minimization conjugate gradient (SMCG) algorithm, as a generalization of the CG algorithm, has received much attention [1,37] and can be traced back to the work of Yuan and Stoer [39]. The search direction of the SMCG method is obtained by minimizing the problem min_{d ∈ Ω_k} g_k^T d + (1/2) d^T B_k d, (4) where Ω_k is a subspace spanned by the vectors g_k and s_{k−1}, i.e., Ω_k = Span{g_k, s_{k−1}}, and B_k ∈ R^{n×n} is a positive definite and symmetric approximation of the Hessian matrix. The search direction d is then given by d = u g_k + v s_{k−1}, (5) where u and v are both real parameters. Substituting (5) into (4) and combining with the standard secant equation B_k s_{k−1} = y_{k−1}, formula (4) is reorganized as (6), where ρ_k ≈ g_k^T B_k g_k. On the basis of the Barzilai-Borwein (BB) method [2], Dai and Kou [6] proposed an effective BBCG3 method for strictly convex quadratic minimization problems. Afterwards, based on the BBCG3 method, Liu and Liu [26] proposed the SMCG BB method for solving general unconstrained optimization problems. Motivated by the SMCG BB method, several efficient SMCG methods [20,21,36,42] were later proposed, among which the method based on the regularization model presented by Zhao et al. [42] has the best numerical performance.
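As a sketch of how the SMCG direction is obtained, the snippet below solves the 2×2 stationarity system for (u, v) that results from substituting d = u g_k + v s_{k−1} into the quadratic model and using the secant pair (s_{k−1}, y_{k−1}); the estimate rho of g_k^T B_k g_k is left to the caller (a BBCG3-style choice is one option, but the paper's exact formula may differ).

```python
import numpy as np

def smcg_direction(g, s_prev, y_prev, rho):
    """Sketch of the SMCG direction d = u*g + v*s_prev minimizing the
    quadratic model over Span{g, s_prev}. The secant pair (s_prev, y_prev)
    supplies B*s_prev = y_prev; rho is an estimate of g^T B g."""
    gg, gs = g @ g, g @ s_prev
    gy, sy = g @ y_prev, s_prev @ y_prev
    # Stationarity of the model in (u, v):
    # [rho  gy] [u]   [-gg]
    # [gy   sy] [v] = [-gs]
    M = np.array([[rho, gy], [gy, sy]])
    rhs = -np.array([gg, gs])
    u, v = np.linalg.solve(M, rhs)
    return u * g + v * s_prev
```

When rho equals g^T B g exactly and y_prev = B s_prev, the model residual g + B d is orthogonal to the subspace, which is what the test below checks.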
The nonlinear CG method is very effective for unconstrained optimization problems. However, its convergence can be very slow for some ill-conditioned problems and even for quadratic problems of very small dimension, which may be due to the loss of orthogonality [15]. Hager and Zhang [15] pointed out theoretically that the successive gradients generated by either the CG method or the L-BFGS method on a quadratic test problem should be orthogonal. Yet they observed that, when solving the strictly convex quadratic minimization problem PALMER1C in the CUTEr library [10], the CG method loses orthogonality due to rounding errors, while the L-BFGS method preserves it. In view of this, they developed the limited memory CG method (CG DESCENT(6.8)) to correct the possible loss of orthogonality in ill-conditioned optimization problems. For the test problems in the CUTEr library [10], their results indicated that CG DESCENT(6.8) yields a significant improvement over their previously proposed package CG DESCENT(5.3).
Although CG DESCENT(6.8)[15] is an efficient method for unconstrained optimization, it still suffers from the following shortcomings: (i) In the numerical implementation, the AWolfe line search [14] utilized in the algorithm CG DESCENT (6.8) does not guarantee global convergence.
(ii) CG DESCENT(6.8) contains the following three pre-conditioners, corresponding to three different iterations: where σ_k is determined by (4.2) of [15], and B̂_{k+1}, Z_k and Ẑ_k are given by the matrices in [15]. These three pre-conditioners make the algorithm CG DESCENT(6.8) look complex.
(iii) In the convergence analysis, the algorithm CG DESCENT(6.8) needs to impose the following assumptions on the pre-conditioners: where γ_0 > 0, γ_1 > 0 and γ_2 > 0. These assumptions are comparatively strict and difficult to verify in practice.
To address the above-mentioned shortcomings, Liu et al. [27] presented an improved Dai-Kou CG algorithm called CGOPT(2.0), which combines the limited memory technique and the Dai-Kou CG method.
In CGOPT(2.0) [27], they utilized a modified quasi-Newton method to restore the lost orthogonality, and established the convergence of CGOPT(2.0) under fewer assumptions. Numerical experiments indicated that CGOPT(2.0) outperforms the famous CG software package CG DESCENT(6.8) [15].
In view of the above discussion, a regularized limited memory subspace minimization conjugate gradient method on the basis of SMCG method and limited memory technique is studied in this paper.To recover orthogonality, we propose a modified regularized quasi-Newton method.The major contributions of this paper are the following.
1.A regularized limited memory subspace minimization conjugate gradient algorithm is proposed, which combines limited memory technology and SMCG method.
2. Based on the idea of regularization and BFGS method, an improved regularized quasi-Newton method is exploited to improve orthogonality.
3. Some simple acceleration criteria and an improved initial stepsize selection strategy are designed to enhance the efficiency of the algorithm. Additionally, a generalized nonmonotone line search condition is presented, which may be regarded as an extension of Zhang and Hager's [41] nonmonotone line search.
4. The convergence of the method is built under mild conditions and the corresponding numerical performance shows that the new method is much more effective than the existing methods.
The structure of the paper is as follows. In Section 2, we describe the regularized limited memory subspace minimization conjugate gradient algorithm in detail, including the direction selection of the SMCG iteration and the regularized quasi-Newton iteration, and an effective acceleration technique. Moreover, the choice of the initial stepsize and the generalized nonmonotone Wolfe line search are also given in this section. In Section 3, some important properties of the search direction are analyzed and the global convergence of the proposed algorithm is established. Numerical experiments comparing the algorithms are shown in Section 4. Conclusions are given in the last section.

A Regularized Limited Memory Subspace Minimization Conjugate Gradient Algorithm
In this section, combining the idea of subspace minimization with a regularized quasi-Newton method, we present a regularized limited memory subspace minimization conjugate gradient algorithm. Firstly, we give the choices of search direction under the different iterations. Subsequently, we develop a very effective acceleration technique, a modified initial stepsize selection strategy and a generalized nonmonotone line search technique to optimize the performance of the proposed algorithm. Finally, the details of the algorithm RL SMCG are described.

Direction Selection of SMCG Iteration and Regularized Quasi-Newton Iteration
The regularized limited memory subspace minimization conjugate gradient method mainly contains two kinds of iterations which are SMCG iteration and regularized quasi-Newton(RQN) iteration, respectively.
Furthermore, the search direction derivation of the two iterations is also different.

SMCG iteration
The search direction selection of the SMCG iteration is closely related to the properties of the objective function f(x) at the iteration point x_k. Following [3,38], a quantity is defined to describe how closely f(x) approaches a quadratic function on the line segment between x_{k−1} and x_k. Literature [24] indicates that if condition (10) is satisfied, where ξ_4 and ξ_5 are small positive constants with ξ_4 < ξ_5, then f(x) may be close to a quadratic function on the line segment between x_{k−1} and x_k. Moreover, according to [32], if condition (11) is satisfied, then the condition number of the Hessian matrix of the objective function may not be very large; here ξ_1 and ξ_2 are positive constants.
Similar to [42], based on certain properties of the function f(x) at the current point x_k, we derive different search directions by distinguishing the following four cases.
(i) If condition (11) is satisfied while condition (10) is not, the quadratic model may not approximate the objective function f(x) well at the current iteration point x_k. Then the search direction d_k is obtained by minimizing the following cubic regularized subproblem:
where Ω_k is a subspace spanned by the vectors g_k and s_{k−1}, B_k ∈ R^{n×n} is a positive definite and symmetric approximation of the Hessian matrix satisfying the secant condition, σ_k is an adaptive regularization parameter obtained from an interpolation condition, and d_k is determined by (13), where v_k and u_k are parameters to be determined. Obviously, (12) can be obtained by adding to (4) a weighted regularization term (σ_k/3)‖d_k‖³_{B_k}. Substituting (13) into (12), it is easy to see that (12) is equivalent to (14), where B̂_k is a positive definite and symmetric matrix and ρ_k is an estimate of g_k^T B_k g_k. Similar to BBCG3 [6], we use (3/2)I to estimate B_k in the term ρ_k. Then, by solving problem (14) we obtain the following solutions for u_k and v_k; here σ_k and ϖ* are the same as those in [42], which will not be repeated here.
(ii) If both conditions (11) and (10) hold, the objective function f(x) may be well approximated by the quadratic model at the current iteration point x_k. In that case, we set σ_k = 0, i.e., we derive the search direction by solving the minimization problem (6). As in (i), we choose ρ_k analogously, and ∆_k is determined by (16); we then obtain the following unique solution of the quadratic approximation problem (6): here the search direction is calculated by d_k = ū_k g_k + v̄_k s_{k−1}, where ū_k and v̄_k are determined by (17).
(iii) If condition (11) is not satisfied but condition (18) holds, where 0 ≤ ξ_3 ≤ 1, the condition number of the Hessian matrix may be large, so the search directions derived in cases (i) and (ii) may not perform well. However, condition (18) ensures the sufficient descent and linear growth conditions for the HS conjugate gradient method. Moreover, because of the finite termination property of the HS conjugate gradient method on exact strictly convex quadratic minimization problems, this choice of direction allows faster convergence of the algorithm. In this case, the search direction is determined by (3) with the HS conjugate parameter. (iv) If neither (11) nor (18) holds, then we pick the direction d_k = −g_k. In summary, the search direction in the SMCG iteration is given by (20): d_k is determined by (15) if (11) holds and (10) does not; by (17) if both (11) and (10) hold; by the HS direction if (11) does not hold and (18) holds; and d_k = −g_k otherwise, where u_k and v_k are determined by (15) and ū_k and v̄_k are determined by (17).
If the successive gradients have orthogonality or the lost orthogonality is restored, the algorithm performs SMCG iteration.On the contrary, if the orthogonality is lost, the iteration will turn to the following regularized quasi-Newton iteration to improve the orthogonality.

Regularized Quasi-Newton(RQN) iteration
When the successive gradients lose their orthogonality, the iteration switches from the SMCG iteration to the RQN iteration. In other words, a modified regularized BFGS algorithm in the subspace S_k is proposed to restore the orthogonality, where S_k is the subspace generated by the most recent m search directions and m is the memory size. In this article, the memory m selected in our algorithm does not exceed 11. Then, as soon as the orthogonality is corrected, the RQN iteration is terminated and the SMCG iteration is triggered immediately.
First, we introduce some preparations for turning to the RQN iteration. Let S_k ∈ R^{n×m} be the matrix whose columns are the stored search directions. In similar fashion to the limited memory CG method [15], we assume that the columns of S_k are linearly independent. Let the QR factorization of S_k be S_k = Z_k R̂_k, where the columns of Z_k ∈ R^{n×m} form an orthonormal basis for the subspace S_k and R̂_k ∈ R^{m×m} is an upper triangular matrix with positive diagonal entries.
If g_k is almost contained in the subspace S_k, then the orthogonality property of the algorithm may be lost. In this case, we interrupt the SMCG iteration and instead minimize the objective function over the subspace S_k, i.e., solve (21). The solution of the subspace problem (21) will improve the orthogonality and yield a suitable search direction that leads out of the subspace S_k. Similar to [15], we utilize the distance from g_k to the subspace: if condition (22) is satisfied, where 0 < η_0 < 1 and η_0 is small, we consider g_k to be almost contained in S_k, which means that the orthogonality of the successive gradients has been lost. We then switch to the RQN iteration to solve the subspace problem (21) until the gradient is sufficiently orthogonal to the subspace to meet condition (23), where 0 < η_0 < η_1 < 1. At that point, the algorithm leaves the subspace S_k and returns to the SMCG iteration. Because the columns of Z_k form an orthonormal basis of S_k, it follows from the definition of dist{g_k, S_k} that (22) and (23) can be expressed as (24) and (25). In [15], Hager and Zhang utilized the limited memory BFGS (L-BFGS) [22,28] method to solve the subspace problem (21) to restore the orthogonality, and achieved good numerical results. However, the convergence analysis of the limited memory CG method [15] requires imposing the strict assumptions (8) on the pre-conditioners (7). Moreover, the dimension m of the chosen subspace S_k is usually small, and when orthogonality is lost, the properties of the function at the iteration point may not be very good. Based on these observations, we consider a regularized L-BFGS method in the subspace S_k for solving the subproblem (21).
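The loss-of-orthogonality test (22)/(24) can be sketched with a QR factorization as below; the threshold eta0 here is an illustrative placeholder, not the paper's value.

```python
import numpy as np

def orthogonality_lost(g, S, eta0=0.01):
    """Sketch of the test: g is deemed almost contained in the subspace
    spanned by the columns of S when the relative distance
    dist{g, S} / ||g|| drops below eta0."""
    Z, _ = np.linalg.qr(S)                    # orthonormal basis Z_k of S_k
    dist = np.linalg.norm(g - Z @ (Z.T @ g))  # distance from g to span(S)
    return dist <= eta0 * np.linalg.norm(g)
```

A gradient lying in the subspace triggers the switch to the RQN iteration; a gradient orthogonal to it does not.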
The search direction of a general quasi-Newton method [40] for the unconstrained optimization problem (1) is of the form d_k = −B_k^{−1} g_k, where B_k is a positive definite and symmetric approximation of the Hessian matrix. As one of the most popular quasi-Newton methods, the L-BFGS method stores an approximation of the Hessian matrix of the objective function using a small amount of memory and computes the search direction d_k using the most recent m vector pairs (s_{k−i}, y_{k−i}), i = 0, 1, . . ., m − 1.
Ueda and Yamashita [35] presented a regularized Newton method for nonconvex unconstrained optimization, whose search direction d_k is obtained by solving the following linear equations: where µ > 0 is referred to as the regularization parameter. The regularized Newton method [35] generally uses a default stepsize of 1, and global convergence is guaranteed by controlling the parameter µ_k. However, as a Newton-type method, it must compute the Hessian matrix of f, which is computationally expensive. To address this drawback, some scholars proposed regularized limited memory BFGS-type methods [33,23] for unconstrained optimization, in which the search direction d_k is the solution of the following equations: where the matrix B_k is an approximate Hessian determined by a particular quasi-Newton method. The regularization technique can effectively improve the efficiency of quasi-Newton methods on ill-conditioned problems. Nevertheless, when B_k is computed by the L-BFGS method, it is very hard to calculate (B_k + µI)^{−1}.
Hence, motivated by [34], we present a regularized quasi-Newton method which combines the BFGS method with the regularization technique to improve orthogonality in the m-dimensional subspace S_k. In this paper, we consider B_k + µI as an approximation of ∇²f(x_k) + µI: since the matrix B_k is an approximate Hessian of f at x_k, the matrix B_k + µI can be used as an approximate Hessian of f(x) + (µ/2)‖x‖² at x_k. Accordingly, we utilize the pair (s_k, y_k(µ)) instead of (s_k, y_k), where y_k(µ) = y_k + µ s_k. Note that the regularized BFGS method stores as many vector pairs as the traditional BFGS method and hence requires no additional memory.
In [19], Li and Fukushima proposed an effective BFGS quasi-Newton method for nonconvex unconstrained minimization, in which the matrix B_{k+1} is updated by a cautious rule with parameters υ > 0 and α > 0. Some recent advances on modified BFGS methods can be found in [18,11,34].
Inspired by the quasi-Newton methods described above, we propose an improved regularized BFGS method to solve the subproblem (21) in subspace S k .

Remark 1.
In what follows, the variables with hats belong to subspace S k , distinguished from the ones found in the full space R n .
The subproblem (21) can be expressed as (28). Similar to [27], because the regularized quasi-Newton directions in the subspace S_k are always transformed back to the full space R^n, and using the QR decomposition of the matrix S_k, we obtain (28). Let B_k(µ) = B_k + µI; then, inspired by Li and Fukushima [19], we develop an improved regularized BFGS method to solve the subproblem (28) with a search direction of the form (29), where B̂_{k+1}(µ) is given by (30), υ > 0, mod(k, l) = 0 means that the remainder of k modulo l is zero, ŷ_k(µ) = ŷ_k + µŝ_k and µ > 0 is an important regularization parameter. The condition mod(k, l) = 0 means that the matrix B̂_k(µ) is reset to the identity matrix Î after l updates, which ensures good convergence of the algorithm. In this paper, we set l = max(m², 20). Obviously, ŝ_k^T ŷ_k(µ) > 0, and as long as the matrix B̂_k(µ) is symmetric and positive definite, it is not hard to prove that the matrix B̂_{k+1}(µ) is symmetric and positive definite.
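A sketch of the regularized BFGS update (30), assuming the standard BFGS formula applied to the pair (ŝ_k, ŷ_k(µ)) with the periodic reset; the cautious condition with parameter υ is omitted for brevity.

```python
import numpy as np

def reg_bfgs_update(B, s, y, mu, k, l):
    """Sketch of the regularized BFGS update in the subspace: the pair
    (s, y) is replaced by (s, y + mu*s), and B is reset to the identity
    every l updates (mod(k, l) == 0), mirroring (30)."""
    m = B.shape[0]
    if k % l == 0:
        return np.eye(m)                       # periodic reset to I
    y_mu = y + mu * s                          # regularized gradient change
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)      # remove old curvature
            + np.outer(y_mu, y_mu) / (s @ y_mu))  # add regularized curvature
```

The update satisfies the modified secant equation B̂_{k+1}(µ) ŝ_k = ŷ_k(µ) and preserves positive definiteness whenever ŝ_k^T ŷ_k(µ) > 0.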
As a very important regularization parameter, µ is closely related to the convergence analysis of the regularized BFGS method. In this paper, the idea of the trust-region radius is used to find a suitable search direction by controlling µ; in other words, the ratio of the reduction of the objective function to the reduction of the model function is utilized. We define the ratio function r_k(d̂_k, µ) as follows, where q̂_k is the model function. If the ratio r_k(d̂_k, µ) is relatively large, the reduction of the objective function is large enough compared with the reduction of the model function, and we decrease the parameter µ. Conversely, if the ratio r_k(d̂_k, µ) is relatively small, i.e., f(x_k) − f(x_k + α_k d̂_k) is small, we increase µ. In addition, to ensure good convergence, we restrict µ to an interval, i.e., 0 < µ_min < µ < µ_max. In general, if the next iteration point is very close to the current one, the reduction of the function value may not be significant. In that case we hope to obtain a new iteration point by modifying the search direction, and a search direction adjusted through the regularization parameter µ may be a good choice. Therefore, if ‖ŝ_k‖² ≤ τ (τ > 0), our choice and update of µ are as follows: where 0 < σ_1 ≤ 1, σ_2 > 1 and 0 < σ_3 ≤ 1. Otherwise, we choose µ = 0, i.e., the regularized BFGS method reduces to the standard BFGS method.
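The trust-region-style control of µ can be sketched as follows; all thresholds and scaling factors below (stand-ins for σ_1, σ_2 and the ratio cutoffs) are illustrative placeholders, not the paper's values.

```python
def update_mu(r, mu, mu_min=1e-4, mu_max=1e4,
              sigma1=0.5, sigma2=2.0, eta_low=0.25, eta_high=0.75):
    """Sketch of the trust-region-style update of the regularization
    parameter mu from the ratio r of actual to predicted reduction,
    safeguarded to the interval [mu_min, mu_max]."""
    if r >= eta_high:            # good agreement: relax regularization
        mu = max(sigma1 * mu, mu_min)
    elif r < eta_low:            # poor agreement: regularize more strongly
        mu = min(sigma2 * mu, mu_max)
    return mu                    # middling ratio: keep mu unchanged
```

This mirrors the trust-region radius logic: a large ratio shrinks µ (longer, more Newton-like steps), a small ratio grows it (shorter, more gradient-like steps).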
Remark 2. In order to simplify the symbol and facilitate writing, we still record the updated symbol µ k+1 as µ.
In the process of algorithm implementation, the search direction (29) in subspace S k always converts to the full space R n at each RQN iteration, i.e., where and Bk+1 (µ) is given by (30).
In Section 3, we will show that matrices Bk+1 (µ) and P k have some good properties in the RQN iteration, which is critical for the convergence analysis.

An Effective Acceleration Technique
In order to improve the performance of the algorithm, Sun et al. [32] proposed an acceleration technique, which replaces (2) with the new iterative form (36), where η̄_k ≥ 0 is an acceleration parameter obtained from an interpolation function. In view of the numerical effectiveness of this acceleration technique, our algorithm also adopts it. Similar to [32], we minimize the following interpolation function to obtain the acceleration parameter η̄_k: η̄_k = arg min q(ϕ_k(η)), (37) where η ≥ 0, ϕ_k(η) = f(x_k + ηα_k d_k), and q(ϕ_k(η)) denotes the interpolation function defined by ϕ_k(η).
In this paper, we consider minimizing the quadratic interpolation function (38) [29]. By minimizing (38) we obtain the acceleration parameter (39), where ǭ > 0 is a small constant. We propose the following acceleration criterion, which is simpler than the rule in [32]: where ǭ, τ, τ̄, c, ς and ς̄ are all small positive constants, b̄_k is given in (39) and g_z = ∇f(z). When condition (40) holds, we accelerate the algorithm and update the relevant variables. In addition, a necessary condition for successful acceleration is that the trial iteration point satisfies the line search condition. Therefore, if the acceleration succeeds, the iteration point x_{k+1} is updated by (36); otherwise the acceleration fails and we return to the original algorithm, at which point η̄_k = 1 and x_{k+1} is updated by (2).
In reference [32], the acceleration criterion is divided into three cases, which seems to be more complex, while our acceleration criterion has only one case and the form is simpler.
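The quadratic-interpolation acceleration step can be sketched as below, assuming the Sun et al.-style quantities ā_k = α_k g_k^T d_k and b̄_k = α_k (g_z − g_k)^T d_k (the paper's exact b̄_k and safeguard constants may differ); the minimizer of the interpolating quadratic is then η̄_k = −ā_k / b̄_k.

```python
def acceleration_parameter(alpha, gk_dot_d, gz_dot_d, eps=1e-10):
    """Sketch of the acceleration parameter from quadratic interpolation
    along z = x_k + alpha*d_k, with a = alpha * g_k^T d_k and
    b = alpha * (g_z - g_k)^T d_k; returns 1.0 (no acceleration) when
    the curvature estimate b is not safely positive."""
    a = alpha * gk_dot_d
    b = alpha * (gz_dot_d - gk_dot_d)
    if b <= eps:                 # insufficient curvature: skip acceleration
        return 1.0
    return -a / b
```

For a one-dimensional quadratic f(x) = x²/2 started at x = 1 with d = −1, the formula recovers the exact line minimizer.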

Choices of the Initial Stepsize and the Generalized Nonmonotone Wolfe Line Search
It is well known that the design of the search direction and the conditions of the line search are two critical factors which affect the efficiency of the line search algorithm.In this subsection, we will develop an improved nonmonotone Wolfe line search which can be regarded as an extension of the Zhang-Hager's [41] nonmonotone line search.In addition, an improved initial step selection strategy is designed.
For convenience, we write the one-dimensional line search function as φ_k(α) = f(x_k + α d_k). The choice of the initial stepsize α_k^0 is of great importance for a line search in an optimization method. For Newton-like methods, choosing the initial stepsize α_k^0 = 1 is important for fast convergence. For conjugate gradient methods, it is essential to use information from the current iteration of the problem to make the initial guess [29]. Various ways of choosing the initial stepsize have been proposed for the conjugate gradient method, see e.g. [5,12,15,29]; however, there is no consensus on which is best.
In particular, Hager and Zhang [15] select the initial stepsize in CG DESCENT as follows: where q(φ_k(0), φ'_k(0), φ_k(τ_1 α_{k−1})) denotes the interpolation function given by the three values φ_k(0), φ'_k(0) and φ_k(τ_1 α_{k−1}), and τ_1 and τ_2 are positive parameters. In CGOPT, Dai and Kou [5] determine the initial stepsize in the following way. Most recently, Liu and Liu [26] developed a very effective initial stepsize selection strategy for the SMCG method by combining the BB methods with the interpolation technique.
Based on the above research, we devise an improved strategy to obtain the initial stepsize.We first consider the initial stepsize for the search direction in the RQN iteration.
(ii) Initial stepsize of the search direction (34) with B_{k+1}(µ) = I. For the initial stepsize of the search direction in the SMCG iteration: if the search direction d_k is calculated by (20) with d_k = −g_k, the initial stepsize is chosen in the same way as in the RQN iteration, i.e., determined by (43). If the search direction d_k is given by (19), the initial stepsize is determined by where ᾱ_k is determined by (45) and α̃_k is the minimizer of the interpolation function q(φ_k(0), φ'_k(0), φ_k(ᾱ_k)).
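One common ingredient of such initial-stepsize strategies is the safeguarded Barzilai-Borwein stepsize α = s^T s / s^T y; the sketch below is illustrative (the safeguard bounds and the fallback value are placeholders, not the paper's choices).

```python
import numpy as np

def bb_initial_stepsize(s, y, alpha_min=1e-10, alpha_max=1e10):
    """Sketch of a safeguarded BB initial stepsize: alpha = s^T s / s^T y,
    clipped to [alpha_min, alpha_max]; falls back to 1.0 when the
    curvature condition s^T y > 0 fails."""
    sy = s @ y
    if sy <= 0:
        return 1.0                       # curvature failed: default stepsize
    return min(max((s @ s) / sy, alpha_min), alpha_max)
```

On a quadratic with Hessian A, this stepsize is the Rayleigh-quotient estimate 1/(s^T A s / s^T s) of an inverse eigenvalue of A.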
Next, we introduce a generalized line search condition, which can be regarded as a development of Zhang and Hager's nonmonotone line search. We recall the nonmonotone line search introduced by Zhang and Hager [41], given by (47) and (48). From (48), it is easy to see that C_{k+1} is a convex combination of f_{k+1} and C_k; it is thus clear that C_k can be regarded as a convex combination of the function values f_0, f_1, . . ., f_k. As reported in [41], the nonmonotone line search proposed by Zhang and Hager plays a crucial role in generating an appropriate stepsize compared with monotone line search methods. Based on (47) and (48), Huang et al. [17] presented a very effective nonmonotone line search technique, which can be regarded as an extension of Zhang-Hager's nonmonotone line search, given by (49). Inspired by the previous discussion, we study a generalized nonmonotone Wolfe line search technique based on (48) and (49). Taking the acceleration technique into account, the generalized nonmonotone Wolfe line search conditions are as follows: where 0 < δ_min < δ_k < δ_max < 1, σ ∈ (0, 1), Q_0 = 1, C_0 = f_0, η̄_k is the acceleration parameter determined by (39), and C_k and Q_k are updated as follows: when k ≥ 1, C_{k+1} and Q_{k+1} are updated by (52), and η_k is given by (53). Here η_k is a parameter that controls the degree of nonmonotonicity; see [25].
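The Zhang-Hager recurrences for the reference value C_k can be sketched directly; this is the standard form Q_{k+1} = η_k Q_k + 1 and C_{k+1} = (η_k Q_k C_k + f_{k+1}) / Q_{k+1} from [41].

```python
def zhang_hager_update(C, Q, f_new, eta):
    """Sketch of the Zhang-Hager nonmonotone reference-value recurrence:
    Q_{k+1} = eta_k * Q_k + 1,
    C_{k+1} = (eta_k * Q_k * C_k + f_{k+1}) / Q_{k+1},
    so C_{k+1} is a convex combination of C_k and f_{k+1}."""
    Q_new = eta * Q + 1.0
    C_new = (eta * Q * C + f_new) / Q_new
    return C_new, Q_new
```

The parameter eta controls the degree of nonmonotonicity: eta = 0 recovers the monotone Armijo reference C_{k+1} = f_{k+1}, while eta = 1 averages all past function values.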
Furthermore, we show that the generalized nonmonotone Wolfe line search is an extension of Zhang-Hager's nonmonotone Wolfe line search. It follows from (50) that (56) holds. It is easy to see that if δ_k = δ Q_{k+1}, the nonmonotone line search condition (56) reduces to Zhang-Hager's nonmonotone Wolfe line search condition (47). This means that Zhang-Hager's nonmonotone Wolfe line search condition in [41] can be considered a particular case of (50).

A Regularized Limited Memory Subspace Minimization Conjugate Gradient Algorithm(RL SMCG)
In this subsection, we describe the regularized limited memory subspace minimization conjugate gradient algorithm in detail. As mentioned above, the algorithm consists of two kinds of iterations. The "state" in Algorithm 1 indicates the type of iteration: state = "SMCG" means that an SMCG iteration will be carried out, and state = "RQN" means that an RQN iteration will be performed.
If the condition (40) holds, then go to 6.1.
If (state = "SMCG"), then: if (24) holds, set state = "RQN". Elseif (state = "RQN"), then: if (25) holds, set state = "SMCG". End. Step 11. Set k := k + 1 and go to Step 1. Remark 3. Notably, when the lost orthogonality is corrected, our algorithm terminates the RQN iteration and immediately calls the SMCG iteration. However, the limited memory CG method [15] first carries out the complex pre-conditioned CG iteration after the orthogonality is improved. This means that the algorithm RL SMCG is simpler than the limited memory CG method [15].

Convergence Analysis
In the section, we establish the global convergence of the algorithm RL SMCG under the following assumptions and properties.
Define N to be an open neighborhood of the level set L (x 0 ) = {x ∈ R n : f (x) ≤ f (x 0 )} , where x 0 is an initial point.
Assumption 1 (i) The objective function f is continuously differentiable in N and bounded from below on the level set. (ii) The gradient g of the objective function is Lipschitz continuous in N, i.e., there exists a constant L > 0 such that ‖g(x) − g(y)‖ ≤ L‖x − y‖, ∀x, y ∈ N.
Under these assumptions, we have the following properties. Lemma 1 Suppose that Assumption 1 holds. Then, for B̂_{k+1}(µ) in (30), there exist three constants ξ̄_1 > 0, ξ̄_2 > 0 and ξ̄_3 > 0 such that Proof We know that Z_k is an orthonormal basis of S_k and the dimension m < +∞, hence we have According to (30) and the properties of the matrix norm in finite-dimensional spaces, we can get that λ_max(B̂_k(µ)) = 1 or .
Further, by ŷk (µ) = ŷk + µŝ k , µ > 0, we get The fourth inequality above is obtained from ŷk = Z T k y k , Z k ≤ ξ 0 and Assumption 1 (ii).Because Bk (µ) will be set to Î after a maximum of l updates, combining with (57) easy to get λ max Bk+1 (µ) Let Pk (µ) = B−1 k+1 (µ).According to (30) and some simple matrix operations, we have that Pk (µ) = Î or . For any ẑ = 0 ∈ R m and Pk (µ) in (58), we have The above inequality is divided by ẑ 2 , and the resulting inequality is maximized, then we have The third inequality above is obtained from ŷk = Z T k y k , Z k ≤ ξ 0 and Assumption 1 (ii).Because Pk (µ) will be set to Î after a maximum of l updates, it is easy to know that there exists a constant ξ2 > 0 such that λ max B−1 k+1 (µ) = λ max Pk (µ) ≤ ξ2 .Since B−1 k+1 (µ) is a positive definite and symmetric matrix, we have B−1 As a result, using the equivalence property of matrix norm in a finite dimensional space, it follows that there exists a constant ξ3 > 0 such that B−1 k+1 (µ) ≤ ξ3 .The proof is completed.⊓ ⊔ Lemma 2 Suppose that Assumption 1 holds.Then, for P k in (35), there exist three constants γ 0 > 0, γ 1 > 0 and γ 2 > 0 such that where P −1 k denotes the pseudoinverse of P k .
Proof By (25), (35) and Lemma 1, we obtain that Therefore, we can get the conclusions.The proof is completed.

⊓ ⊔
Subsequently, we provide some properties of the search directions produced by the algorithm RL SMCG, which are crucial for the following convergence analysis.
Lemma 3 Suppose that Assumption 1 holds. Then, there exists a constant c̄_1 > 0 such that the search directions (20) and (34) calculated by algorithm RL SMCG satisfy the sufficient descent condition: Proof We divide the proof into the following two cases.
(i) SMCG iteration.Similar to the proof of Lemma 4.1 of [42], it is easy to have where .
(ii) RQN iteration. According to Lemma 2, we have the desired bound. By setting c̄_1 = min{c_1, γ_1}, we obtain (60). The proof is completed. ⊓⊔ Lemma 4 Suppose that Assumption 1 holds. Then, there exists a constant c̄_2 > 0 such that the search directions (20) and (34) calculated by algorithm RL SMCG satisfy Proof We divide the proof into the following two cases.
(i) SMCG iteration.Referring to the proof procedure of Lemma 4.2 of [42], it is easy to get (ii) RQN iteration.According to Lemma 2, we obtain By setting c2 = min {c 2 , γ 0 }, we can obtain (61).The proof is completed.

⊓ ⊔
The following lemmas are very critical for the convergence analysis of algorithm RL SMCG.
Lemma 5 Suppose that Assumption 1 holds, and the sequence {x k } is generated by the algorithm RL SMCG.
Then: If acceleration succeeds: If acceleration fails: where σ is given by (51).
Proof We divide the proof into the following two cases.
(ii) If acceleration fails: Let ηk = 1, and the rest of the proof procedure is the same as before.⊓ ⊔ Lemma 6 Suppose that Assumption 1 holds, and the sequence {x k } is generated by the algorithm RL SMCG.
Then, there holds that f k ≤ C k for each k.
Proof We divide the proof into the following two cases.
(i) If acceleration succeeds: the new iterative update form is x_{k+1} = x_k + η̄_k α_k d_k, where η̄_k = −ā_k/b̄_k. Through (56), we have the desired inequality. Combining (52), δ_k > 0, Lemma 5 and the sufficient descent property of the direction d_{k+1}, we have f_{k+1} < C_k. The remaining proof follows Lemma 5.1 in [42]; we obtain f_{k+1} ≤ C_{k+1}, hence f_k ≤ C_k holds for each k.

(ii) If acceleration fails:
Let ηk = 1, and the rest of the proof procedure is the same as before.
⊓ ⊔ Theorem 1 Suppose that Assumption 1 holds, the sequence {x k } is generated by the algorithm RL SMCG. Then, Proof We divide the proof into the following two cases.
(i) If acceleration succeeds: By Assumptions 1, lemmas 3 -5 and the generalized nonmonotone Wolfe line search conditions (50) and (51), we get that Where β = (ii) If acceleration fails: Let ηk = 1, and the rest of the proof procedure is the same as before.⊓ ⊔
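The conclusion of Theorem 1 was lost in extraction; as a hedged reconstruction, the standard global convergence statement for CG-type methods under a (nonmonotone) Wolfe line search, consistent with the abstract's claim, reads:

```latex
% Hedged reconstruction of the conclusion of Theorem 1:
\liminf_{k \to \infty} \|g_k\| = 0 .
```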

Numerical Experiments
In this section, we compare the numerical performance of RL_SMCG with ASMCG_PR [32], CG_DESCENT(6.8) [15] and CGOPT(2.0) [27] on the 145 test problems from the CUTEr library [10]. We show the performance of the tested methods using the performance profiles introduced by Dolan and Moré [8]. In the following Figs. 1-12, "N_iter", "N_f", "N_g" and "T_cpu" represent the number of iterations, the number of function evaluations, the number of gradient evaluations and the CPU time (s), respectively.
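The Dolan-Moré performance profiles used throughout this section can be computed as follows. This is a minimal sketch: the solver names, metric values, and the helper `performance_profile` are illustrative, not part of the compared packages.

```python
import numpy as np

def performance_profile(metrics, taus):
    """Dolan-More performance profile.

    metrics: dict mapping solver name -> array of per-problem costs
             (e.g. N_iter or T_cpu; np.inf marks a failure).
    Returns: dict mapping solver name -> list of rho_s(tau), the fraction
             of problems whose performance ratio
             r_{p,s} = cost_{p,s} / min_s cost_{p,s} is <= tau.
    """
    solvers = list(metrics)
    costs = np.vstack([metrics[s] for s in solvers])  # (n_solvers, n_problems)
    best = costs.min(axis=0)                          # best cost per problem
    ratios = costs / best                             # performance ratios r_{p,s}
    return {s: [(ratios[i] <= tau).mean() for tau in taus]
            for i, s in enumerate(solvers)}

# Illustrative data: iteration counts of two hypothetical solvers on 4 problems.
data = {"A": np.array([10.0, 20.0, 30.0, np.inf]),
        "B": np.array([20.0, 20.0, 15.0, 40.0])}
profile = performance_profile(data, taus=[1.0, 2.0, 4.0])
```

A solver whose curve lies higher in the profile solves a larger fraction of problems within a factor tau of the best solver, which is how Figs. 1-12 should be read.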
We divide the numerical experiments into three sets.
In the first set of numerical experiments, Figs. 1-4 illustrate the performance profiles of RL_SMCG and ASMCG_PR [32]. From Figs. 1-4, we observe that RL_SMCG yields a quite significant improvement over ASMCG_PR in terms of the number of iterations and the number of function evaluations. In the second set of numerical experiments, we compare the performance profiles of RL_SMCG and CG_DESCENT(6.8) [15]. Regarding the number of iterations and the number of function evaluations in Figs. 5 and 6, respectively, we observe that RL_SMCG is slightly better than CG_DESCENT(6.8). As shown in Fig. 7, RL_SMCG is much better than CG_DESCENT(6.8) in terms of the number of gradient evaluations: RL_SMCG performs best on about 71.5% of the CUTEr test problems, while the percentage for CG_DESCENT(6.8) is below 40%. It can be observed from Fig. 8 that RL_SMCG is faster than CG_DESCENT(6.8) in terms of CPU time. By Theorem 1, RL_SMCG is globally convergent with the generalized nonmonotone Wolfe line search, while CG_DESCENT(6.8) does not guarantee global convergence when using the rather efficient approximate Wolfe (AWolfe) line search. This suggests that RL_SMCG is superior to CG_DESCENT(6.8) on the CUTEr library both in theory and in numerical performance.
In the third set of numerical experiments, we compare the performance of RL_SMCG with CGOPT(2.0) [27]. As shown in Figs. 9 and 10, RL_SMCG performs almost always better than CGOPT(2.0) in terms of the number of iterations and the number of function evaluations. Figures 11 and 12 indicate that RL_SMCG also outperforms CGOPT(2.0) in terms of the number of gradient evaluations and CPU time for the CUTEr library.

Conclusions

In this paper, combining the subspace minimization conjugate gradient method with the limited memory technique, we presented a regularized limited memory subspace minimization conjugate gradient method, which contains two types of iterations. In the proposed algorithm, a modified regularized quasi-Newton method is applied in a small-dimensional subspace to improve the orthogonality, and an improved initial stepsize selection strategy and some simple acceleration criteria are designed. Moreover, we establish the global convergence of the proposed algorithm under some mild assumptions by utilizing a generalized nonmonotone Wolfe line search. Numerical results suggest that our algorithm yields a substantial improvement over ASMCG_PR and outperforms the state-of-the-art limited memory CG software packages CG_DESCENT(6.8) and CGOPT(2.0).

Ethical Approval
Not Applicable

Availability of supporting data
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Competing interests
The authors declare no competing interests.
means that C_k can employ information about the known function values from previous iterations. The Zhang-Hager nonmonotone line search (47) reduces to the standard Armijo line search condition when η_k = 0 for each k.
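The reference-value recursion behind Zhang-Hager's nonmonotone line search can be sketched as follows, assuming the standard update C_{k+1} = (η_k Q_k C_k + f_{k+1}) / Q_{k+1} with Q_{k+1} = η_k Q_k + 1; the function values below are illustrative.

```python
def zhang_hager_refs(f_values, eta):
    """Zhang-Hager nonmonotone reference values C_k.

    C_0 = f_0, Q_0 = 1, and for each k:
        Q_{k+1} = eta * Q_k + 1,
        C_{k+1} = (eta * Q_k * C_k + f_{k+1}) / Q_{k+1}.
    With eta = 0 this reduces to C_k = f_k (standard Armijo reference);
    with eta = 1, C_k is the running average of f_0, ..., f_k.
    """
    C, Q = f_values[0], 1.0
    refs = [C]
    for f_next in f_values[1:]:
        Q_next = eta * Q + 1.0
        C = (eta * Q * C + f_next) / Q_next
        Q = Q_next
        refs.append(C)
    return refs

fs = [5.0, 3.0, 4.0, 2.0]          # illustrative function values
armijo_refs = zhang_hager_refs(fs, eta=0.0)   # equals fs itself
average_refs = zhang_hager_refs(fs, eta=1.0)  # running averages
```

This makes the reduction explicit: with eta = 0 the reference value C_k carries no memory of earlier function values.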

Step 4. Determine a stepsize α_k satisfying the generalized nonmonotone Wolfe line search conditions (50) and (51) with the initial stepsize α_k^0.
Step 5. Compute the trial point z = x_k + α_k d_k and g_z = ∇f(z). If ‖g_z‖_∞ ≤ ε, then stop; otherwise, go to Step 6.
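Conditions (50) and (51) are not reproduced in this excerpt. As a hedged sketch of the acceptance test in Step 4, a standard nonmonotone Wolfe check against the reference value C_k has the following shape; the parameters delta and sigma and the one-dimensional example are illustrative, not the paper's actual conditions.

```python
def nonmonotone_wolfe_ok(phi, dphi, alpha, C_k, gtd, delta=1e-4, sigma=0.9):
    """Accept stepsize alpha if
         phi(alpha)  <= C_k + delta * alpha * g_k^T d_k   (nonmonotone Armijo part)
         dphi(alpha) >= sigma * g_k^T d_k                 (curvature part),
    where gtd = g_k^T d_k < 0 for a descent direction and
    phi(a) = f(x_k + a d_k), dphi(a) = its derivative in a."""
    return (phi(alpha) <= C_k + delta * alpha * gtd
            and dphi(alpha) >= sigma * gtd)

# One-dimensional example: f(x) = x^2 at x_k = 1 with d_k = -f'(x_k) = -2.
phi = lambda a: (1.0 - 2.0 * a) ** 2           # f(x_k + a * d_k)
dphi = lambda a: 2.0 * (1.0 - 2.0 * a) * (-2.0)  # directional derivative
gtd = -4.0                                        # g_k^T d_k
accepted = nonmonotone_wolfe_ok(phi, dphi, 0.4, C_k=1.0, gtd=gtd)
```

A tiny stepsize such as alpha = 0.001 satisfies the Armijo part here but fails the curvature part, which is exactly what the curvature condition is designed to rule out.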
