A Dynamical Alternating Direction Multiplier Method for Two-Block Optimization Problems


In this paper, we propose a dynamical alternating direction method of multipliers for two-block separable optimization problems. The well-known classical ADMM is recovered by explicit time discretization of the dynamical system. Under suitable conditions, we prove that the trajectory asymptotically converges to a saddle point of the Lagrangian function of the problem. When the coefficient matrices in the constraint are identity matrices, we prove a worst-case O(1/t) convergence rate in the ergodic sense.

As is well known, if problem (1.1) satisfies the constraint conditions (1.3) and has an optimal solution (x * , z * ), then there exists an optimal solution y * of problem (1.2) such that equation (1.4) holds. Conversely, (x * , z * , y * ) satisfies the optimality condition (1.4) if and only if (x * , z * ) is an optimal solution of (1.1) and y * is an optimal solution of (1.2).
Let S + (H) denote the set of continuous, linear, positive semidefinite and self-adjoint operators from H to H. For M ∈ S + (H) we define the seminorm ∥ · ∥ M : H → ℜ by ∥x∥ M = √⟨x, M x⟩. We recall the Loewner partial ordering on S + (H): for M 1 , M 2 ∈ S + (H), M 1 ⪰ M 2 if and only if ⟨x, M 1 x⟩ ≥ ⟨x, M 2 x⟩, i.e. ∥x∥ M 1 ≥ ∥x∥ M 2 , ∀x ∈ H.
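In finite dimensions (H = ℜⁿ, with M a symmetric positive semidefinite matrix), the seminorm and the Loewner ordering can be checked numerically. The following is an illustrative sketch of our own, not an example from the paper:

```python
import numpy as np

# Finite-dimensional sketch (H = R^n, M symmetric positive semidefinite):
# the seminorm ||x||_M = sqrt(<x, M x>), and the Loewner ordering M1 >= M2,
# which holds iff M1 - M2 is positive semidefinite, i.e. ||x||_M1 >= ||x||_M2
# for every x.

def seminorm(M, x):
    """Seminorm induced by a positive semidefinite matrix M."""
    return float(np.sqrt(x @ M @ x))

def loewner_geq(M1, M2, tol=1e-12):
    """Check M1 >= M2 in the Loewner ordering via the spectrum of M1 - M2."""
    return bool(np.linalg.eigvalsh(M1 - M2).min() >= -tol)

M1 = np.diag([2.0, 1.0])
M2 = np.diag([1.0, 0.5])
x = np.array([3.0, 4.0])

assert loewner_geq(M1, M2) and not loewner_geq(M2, M1)
assert seminorm(M1, x) >= seminorm(M2, x)
```

Here `eigvalsh` is used because M1 − M2 is symmetric, so positive semidefiniteness reduces to nonnegativity of its smallest eigenvalue.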
In fact, since the 1970s, dynamical systems related to monotone inclusion and optimization problems have attracted extensive attention (see Brezis, Baillon and Bruck, Crandall and Pazy [8,14,15,16]). This is not only because of their inherent importance in fields such as differential equations and applied functional analysis, but also because they are a useful tool for finding and studying numerical optimization algorithms obtained by discretization of continuous dynamical systems. In the field of optimization, the dynamical-system counterpart of an iterative method can provide in-depth understanding of the method's expected behavior. In addition, the methods and techniques used in the continuous case can be carried over to the discretized algorithms. For more information on the relationship between continuous and discrete dynamics, the reader may refer to [1].
In recent years, there has been a growing body of research on dynamical systems associated with numerical algorithms. In [5], Abbas and Attouch considered a forward-backward dynamical system. Banert and Bot in [7] introduced a forward-backward-forward dynamical system. Bot and others proposed an implicit dynamical system in [13].
Csetnek and others considered a Douglas-Rachford type dynamical system in [17]. More recently, a primal-dual dynamical system was introduced in [2], and a dynamical system associated with the proximal alternating minimization algorithm was proposed in [3].
One of the motivations for our study is that the classical ADMM algorithm results from the discretization of this dynamical system, as we will see in Remark 3. The algorithm and its variants are widely used to solve various optimization problems, and it is one of the research hotspots in the optimization field. Therefore, it is of great significance to deepen the understanding of the algorithm by studying its continuous counterpart through tools such as differential equations. Here, we compare this paper with the existing literature in terms of the discrete algorithms corresponding to the dynamical systems. In reference [2], the discrete algorithm for the dynamical system is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm. In reference [3], the discrete algorithm corresponding to the dynamical system is the proximal AMA algorithm. In reference [4], the discrete algorithms corresponding to the dynamical system are some fast inertial approximate ADMM algorithms.
In addition, in the course of the analysis, this paper uses tools similar to those of references [2] and [3]. Next, we focus on the differences between this paper and these two references. First of all, the structure of the problem we consider is different: compared with [2], our constraint contains two linear operators. Secondly, our dynamical system corresponds to different discrete algorithms, as mentioned above. Finally, our dynamical system contains an additional time-dependent parameter c(t), which differs from reference [2], as well as another additional parameter σ, which differs from reference [3] (the motivation for the parameters c and σ comes from reference [6], where the numerical ADMM scheme also contains the parameters c and σ).
The remainder of the article is organized as follows. In the next section, we obtain the classical ADMM algorithm by explicit time discretization of the dynamical system. In addition, we give the solution concept for the dynamical system (1.5) and an equivalent formulation of (1.5). Furthermore, we give a numerical example illustrating how the choice of parameters affects the convergence of the trajectory. In the third section, we prove the existence and uniqueness of strong global solutions of the dynamical system (1.5); the analysis relies mainly on the Cauchy-Lipschitz-Picard theorem. In Section 4, relying on a continuous variant of the Opial lemma and a Lyapunov analysis, in which the choice of appropriate energy functions plays a key role, we prove that the solution trajectory of the proposed dynamical system converges weakly to a saddle point of the Lagrangian L. Furthermore, when the coefficient matrices in the constraint are identity matrices, we prove a worst-case O(1/t) convergence rate in the ergodic sense. In addition, we demonstrate how the dynamical alternating direction method of multipliers can be applied in the absence of a strongly convex block.

Solution concept, discretization, example
Let us review the definition of a locally absolutely continuous map before specifying what we mean by a solution of (1.5).
Definition 1 A function x : [0, ∞) → H is said to be locally absolutely continuous if it is absolutely continuous on every interval [0, T ], T > 0; that is, for every T > 0 there exists an integrable function y : [0, T ] → H such that x(t) = x(0) + ∫ 0 t y(s) ds for all t ∈ [0, T ].
Remark 2 (a) Every absolutely continuous function is differentiable almost everywhere.
(b) Let T > 0 and let x : [0, T ] → H be absolutely continuous. This is equivalent to: for every ϵ > 0 there exists η > 0 such that, for any finite family of pairwise disjoint intervals I k = (a k , b k ) ⊆ [0, T ] with Σ k (b k − a k ) < η, it holds that Σ k ∥x(b k ) − x(a k )∥ < ϵ. From this characterization it is easy to see that, if M : H → H is L-Lipschitz continuous with L ≥ 0, then the function z = M ∘ x is absolutely continuous, too. This means that z is differentiable almost everywhere and ∥ż(·)∥ ≤ L∥ẋ(·)∥ holds almost everywhere.
The following definition specifies which type of solutions we consider in the analysis of the dynamical system (1.5).
We say that the function (x, z, y) : [0, +∞) → H × G × K is a strong global solution of (1.5) if the following properties are satisfied: (i) the functions x, z, y are locally absolutely continuous; (ii) the relations in (1.5) hold for almost every t ∈ [0, +∞).
Remark 3 Let us consider a discretization of the dynamical system under consideration. The first two inclusions in (1.5) can be written in an equivalent way for t ∈ [0, +∞). Through explicit discretization with respect to the time variable t with constant step size, and using convex subdifferential calculus, the resulting relations can be written equivalently for all k ≥ 0. Hence the dynamical system (1.5) provides, through explicit time discretization, the following numerical algorithm: choose (x 0 , z 0 , y 0 ) ∈ H × G × K, σ > 0 and (c k ) k≥0 , and for all k ≥ 0 generate the sequence (x k , z k , y k ) k≥0 accordingly. The algorithm above is the classical ADMM introduced by Glowinski in [6].
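The discretized scheme can be sketched on a toy instance. The following is a minimal sketch assuming quadratic blocks f(x) = ½(x − p)², g(z) = ½(z − q)², A = B = Id and constant step size c_k = c, so that both subproblems have closed-form minimizers; the dual step size σ·c is also an assumption of this sketch:

```python
# Classical ADMM on a toy two-block problem (our own illustrative instance):
#   min f(x) + g(z)  s.t.  x + z = b,
# with f(x) = 0.5*(x - p)^2 and g(z) = 0.5*(z - q)^2, so both augmented
# subproblems are quadratic with closed-form minimizers.

def admm(p, q, b, c=1.0, sigma=1.0, iters=200):
    x = z = y = 0.0
    for _ in range(iters):
        # x-update: argmin_x f(x) + <y, x> + (c/2)(x + z - b)^2
        x = (p - y + c * (b - z)) / (1.0 + c)
        # z-update: argmin_z g(z) + <y, z> + (c/2)(x + z - b)^2
        z = (q - y + c * (b - x)) / (1.0 + c)
        # dual ascent on the multiplier
        y = y + sigma * c * (x + z - b)
    return x, z, y

# KKT for this instance: y* = (p + q - b)/2, x* = p - y*, z* = q - y*,
# i.e. (x*, z*, y*) = (0.5, 1.5, 0.5) for p = 1, q = 2, b = 2.
x, z, y = admm(p=1.0, q=2.0, b=2.0)
```

The iterates converge to the saddle point (0.5, 1.5, 0.5) of the associated Lagrangian, which can be verified directly from the stationarity conditions x* = p − y*, z* = q − y*, x* + z* = b.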

Remark 4
In this paper we will often use the following equivalent formulation of the dynamical system (1.5): for U (t) = (x(t), z(t), y(t)), (1.5) can be written as a single first-order evolution equation in U (t).
Note that the functions F (t, ·) + (c(t)/2)∥ · −u∥ 2 and G(t, ·) + (c(t)/2)∥ · −v∥ 2 are proper, convex and lower semicontinuous. If f and g are strongly convex, then the first and second relations of (2.3) hold with equality.
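The regularized subproblems of the form min_x f(x) + (c/2)∥x − u∥² are proximal steps: their minimizer is prox_{f/c}(u). A minimal sketch for the illustrative (assumed, not from the paper) choice f = λ∥·∥₁, whose proximal step is componentwise soft-thresholding:

```python
import numpy as np

# argmin_x lam*||x||_1 + (c/2)||x - u||^2 is soft-thresholding of u with
# threshold lam / c; f = lam*||.||_1 is an assumed illustrative choice here.

def prox_l1(u, lam, c):
    """Componentwise soft-thresholding with threshold lam / c."""
    t = lam / c
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

u = np.array([3.0, -0.2, 0.6])
x = prox_l1(u, lam=1.0, c=2.0)   # threshold lam/c = 0.5 -> [2.5, 0.0, 0.1]
# optimality at a nonzero coordinate: lam*sign(x_i) + c*(x_i - u_i) = 0
assert np.allclose(x, [2.5, 0.0, 0.1])
```

Adding the quadratic term (c/2)∥· − u∥² makes each subproblem strongly convex even when f itself is not, which is why these regularized functions are proper, convex and lower semicontinuous with unique minimizers.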
According to Theorem 2, in order to ensure asymptotic convergence of the trajectory, the parameters have to satisfy the conditions involving the strong convexity parameters σ f , σ g of f and g, together with ċ(t) ≤ 0 for all t. Since ∥A∥ 2 = 1 and ∥B∥ 2 = 1, we choose the functions c 1 (t), . . . , c 4 (t) (with, e.g., c 4 (t) = 1.99) to satisfy the above conditions. From Figures 1-4 we can see that the larger σ is, the faster all three trajectories converge, independently of the choice of c(t). Furthermore, when σ is sufficiently large, the convergence of the primal trajectories x(t) and z(t) improves only marginally as σ continues to increase.
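The qualitative effect of σ can be reproduced on a toy instance. The following explicit Euler integration of a continuous ADMM-type flow is an illustrative sketch of our own (quadratic blocks, A = B = Id, constant c(t) = c), not the example behind Figures 1-4:

```python
# Toy explicit Euler integration of an assumed continuous ADMM-type flow:
#   xhat = argmin_x { f(x) + <y, x> + (c/2)(x + z - b)^2 },    x' = xhat - x,
#   zhat = argmin_z { g(z) + <y, z> + (c/2)(xhat + z - b)^2 }, z' = zhat - z,
#   y' = sigma * c * (xhat + zhat - b),
# with f(x) = 0.5*(x - p)^2 and g(z) = 0.5*(z - q)^2, so the argmins are
# closed-form. This system is an assumption for illustration only.

def flow_error(sigma, p=1.0, q=2.0, b=2.0, c=1.0, h=0.05, T=20.0):
    """Integrate the flow by explicit Euler; return distance to the saddle point."""
    x = z = y = 0.0
    ys = (p + q - b) / 2.0       # KKT: y* = (p + q - b)/2
    xs, zs = p - ys, q - ys      # x* = p - y*, z* = q - y*
    for _ in range(int(T / h)):
        xhat = (p - y + c * (b - z)) / (1.0 + c)
        zhat = (q - y + c * (b - xhat)) / (1.0 + c)
        x += h * (xhat - x)
        z += h * (zhat - z)
        y += h * sigma * c * (xhat + zhat - b)
    return ((x - xs) ** 2 + (z - zs) ** 2 + (y - ys) ** 2) ** 0.5

# On this toy instance, a larger sigma yields a smaller error at time T.
assert flow_error(sigma=2.0) < flow_error(sigma=0.5) < 1e-1
```

On this instance the linearized dynamics are stable for both values of σ, and the slowest decay rate improves as σ grows, matching the qualitative observation about the trajectories above.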

Existence and uniqueness of the trajectories
In this section we will prove the existence and uniqueness of the trajectories generated by the dynamical system (1.5). We start with two very important lemmas, which will be used frequently in the subsequent proofs. To this end, we make the following assumption: (P) f and g are σ f - and σ g -strongly convex functions, respectively.

Lemma 1 Let assumption (P) hold true and t ∈ [0, +∞).
Then the operator
Lemma 2 Let assumption (P) hold true, let (x, z, y) ∈ H × G × K and consider the associated maps. Then the following estimates hold for every t, r ∈ [0, ∞).
With these estimates, we can now prove the existence and uniqueness of the trajectories.

Convergence of the trajectories

In this section, we first give some results which we will use to prove the convergence of the trajectories of the dynamical system (1.5). In what follows, we say that a map M : [0, +∞) → L(H) is derivable at t 0 ∈ [0, +∞) if the limit of (M (t) − M (t 0 ))/(t − t 0 ) as t → t 0 , taken with respect to the norm topology of L(H), exists. When this is the case, we denote by Ṁ (t 0 ) ∈ L(H) the value of the limit.
In case M : [0, +∞) → L(H) is derivable at t 0 ∈ [0, +∞) and x, y : [0, +∞) → H are also derivable at t 0 , we will use the following product rule:

d/dt ⟨x(t), M (t)y(t)⟩ | t=t 0 = ⟨ẋ(t 0 ), M (t 0 )y(t 0 )⟩ + ⟨x(t 0 ), Ṁ (t 0 )y(t 0 )⟩ + ⟨x(t 0 ), M (t 0 )ẏ(t 0 )⟩.

We start with a result showing that, under appropriate conditions, the second derivatives of the trajectories exist almost everywhere, and we also give an upper bound on their norms. This will be used in the proof of the main result, Theorem 2.
Proof See Appendix B.
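The product rule above can be checked numerically on smooth finite-dimensional curves. The following illustrative check (our own toy data, not from the paper) compares a central finite difference with the three-term expression:

```python
import numpy as np

# Numerical check of  d/dt <x(t), M(t) y(t)>
#   = <x'(t), M y> + <x, M'(t) y> + <x, M y'(t)>
# for smooth toy curves x, y : R -> R^2 and M : R -> L(R^2).

def x(t): return np.array([np.sin(t), t])
def y(t): return np.array([t**2, np.cos(t)])
def M(t): return np.array([[1.0 + t, 0.5], [0.5, 2.0]])

def phi(t): return x(t) @ M(t) @ y(t)

t0, h = 0.7, 1e-6
numeric = (phi(t0 + h) - phi(t0 - h)) / (2 * h)   # central difference

dx = np.array([np.cos(t0), 1.0])                  # x'(t0)
dy = np.array([2 * t0, -np.sin(t0)])              # y'(t0)
dM = np.array([[1.0, 0.0], [0.0, 0.0]])           # M'(t0)
analytic = dx @ M(t0) @ y(t0) + x(t0) @ dM @ y(t0) + x(t0) @ M(t0) @ dy

assert abs(numeric - analytic) < 1e-5
```

The central difference agrees with the three-term expression to within the expected floating-point tolerance, confirming the formula on this example.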
In the following we recall two results which we need for the asymptotic analysis (see [21]).

Lemma 4 Assume that u : [0, +∞) → ℜ is locally absolutely continuous and bounded from below, and that there exists g ∈ L 1 ([0, +∞), ℜ) such that u̇(t) ≤ g(t) for almost every t ∈ [0, +∞).
Then there exists lim t→+∞ u(t) ∈ ℜ.
By a Lyapunov analysis, we obtain that the trajectory generated by the dynamical system (1.5) converges to a saddle point of the Lagrangian of problem (1.1).
Theorem 2 In the setting of the optimization problem (1.1), assume that (P) holds, that the set of saddle points of the Lagrangian L is nonempty, and that c : [0, +∞) → (0, +∞) is Lipschitz continuous and monotonically decreasing.
For an arbitrary starting point (x 0 , z 0 , y 0 ) ∈ H × G × K, let (x, z, y) : [0, +∞) → H × G × K be the unique strong global solution of the dynamical system (1.5). Then the trajectory (x(t), z(t), y(t)) converges weakly to a saddle point of L as t → ∞.
Proof See Appendix B.
In the case A = B = Id, we provide rates both for the violation of the feasibility condition by the ergodic trajectories and for the convergence of the objective function along these ergodic trajectories to its minimal value. Let (x, z, y) : [0, +∞) → H × G × K be the unique strong global solution of (1.5), and consider further for every t ∈ (0, +∞) the ergodic trajectories. Then there exists K ≥ 0 such that the stated rate estimates hold for every t ∈ (0, +∞); in addition, they hold for every x̄ ∈ H, z̄ ∈ G with x̄ + z̄ = b.

The dynamical alternating direction multiplier method for problem (1.1) in the absence of a strongly convex block

In Section 4.1 we assumed that both f and g are strongly convex. However, in many applications only one block of the objective function is strongly convex. In this section, we demonstrate how to properly apply the dynamical alternating direction method of multipliers to a convex optimization problem in Hilbert spaces when only one block of the objective is strongly convex. We consider the following convex programming problem: where H, G and K are real Hilbert spaces, h : H → ℜ̄ := ℜ ∪ {+∞} is a proper, σ h -strongly convex and lower semicontinuous function, θ : G → ℜ̄ := ℜ ∪ {+∞} is a proper, convex and lower semicontinuous function, A is a nonzero continuous linear operator and b ∈ K. In (4.3), θ is not strongly convex, so we cannot directly apply the dynamical alternating direction method of multipliers to obtain convergence results similar to those of Section 4.1. In order to obtain strongly convex terms, we restate the problem as the following equivalent problem: Then f is strongly convex with modulus σ h − ρ∥A∥ 2 , and g is strongly convex with modulus ρ. Hence (4.4) has the form of (1.1) and satisfies assumption (P).
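The stated moduli can be verified by a short computation, assuming the standard splitting f := h − (ρ/2)∥A · ∥² and g := θ + (ρ/2)∥ · ∥² (this concrete splitting is our assumption, chosen to be consistent with the stated moduli):

```latex
% Assumed splitting: f(x) := h(x) - (rho/2)||Ax||^2, g(z) := theta(z) + (rho/2)||z||^2.
\[
  f(x) - \frac{\sigma_h - \rho\|A\|^2}{2}\|x\|^2
  = \underbrace{h(x) - \frac{\sigma_h}{2}\|x\|^2}_{\text{convex, since } h \text{ is } \sigma_h\text{-strongly convex}}
  \;+\;
  \underbrace{\frac{\rho}{2}\bigl(\|A\|^2\|x\|^2 - \|Ax\|^2\bigr)}_{\text{convex, since } \|Ax\|^2 \le \|A\|^2\|x\|^2},
\]
so $f$ is $(\sigma_h - \rho\|A\|^2)$-strongly convex whenever $0 < \rho < \sigma_h/\|A\|^2$,
while $g - \frac{\rho}{2}\|\cdot\|^2 = \theta$ is convex, i.e.\ $g$ is $\rho$-strongly convex.
```

The second term is convex because its Hessian-type operator ρ(∥A∥² Id − A*A) is positive semidefinite, which is exactly the operator-norm inequality invoked above.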
Remark 5 It is easy to see that the dynamical alternating direction method of multipliers for (4.3) is defined implicitly as follows: By comparison, (4.6) is very similar to (4.7).
Similarly to Theorem 2, we can obtain the following convergence result for the dynamical alternating direction method of multipliers (4.6):

Corollary 1 In the setting of the optimization problem (4.3), assume that the set of saddle points of the associated Lagrangian is nonempty and that c is Lipschitz continuous and monotonically decreasing.
Conclusions
In this paper, we introduce and study a dynamical system for solving the two-block separable strongly convex minimization problem with linear constraints. In the framework of a Lyapunov analysis, by finding an appropriate energy functional, we prove that the solution trajectory of the system converges to a saddle point of the Lagrangian function of the problem. Furthermore, when the coefficient matrices in the constraint are identity matrices, we prove a worst-case O(1/t) convergence rate in the ergodic sense. In addition, we demonstrate how the dynamical alternating direction method of multipliers can be applied in the absence of a strongly convex block. An explicit time discretization of the considered dynamical system yields the classical ADMM.
A. Proofs for Section 3

Lemma 1
Proof Let t ∈ [0, +∞) be fixed. Then we have For all u, v ∈ H we obtain and Since ∂f + c(t)A * A is σ f -strongly monotone, we have Using the Cauchy-Schwarz inequality it follows For t ∈ [0, +∞) fixed we have For all u, v ∈ G we obtain and Since ∂g + c(t)B * B is σ g -strongly monotone, we have Using the Cauchy-Schwarz inequality it follows
(i) From the definition of R (x,z,y) we have and, if we add c(t)A * A(R (x,z,y) (r) + x) on both sides of the relation above, we have

From (A.2) and (A.3), and using that ∂f
From the Cauchy-Schwarz inequality and the definition of P (x,z,y) it follows
(ii) From the definition of Q (x,z,y) we have and ∈ ∂g(Q (x,z,y) (r) + z) + c(r)B * B(Q (x,z,y) (r) + z).
If we add c(t)B * B(Q (x,z,y) (r) + z) on both sides of the relation above, we have (A.5)

From (A.4) and (A.5), and using that ∂g
From the Cauchy-Schwarz inequality, the definition of P (x,z,y) and (i), it follows

Theorem 1
Proof In the following we use the equivalent formulation of the dynamical system described in Remark 4. We show the existence and uniqueness of a strong global solution using the Cauchy-Lipschitz-Picard Theorem.

Furthermore, by taking into account Lemma 1, we have
According to Lemma 1, J t is (c(t)/σ g )-Lipschitz continuous. We derive that Γ (t, ·, ·, ·) is L(t)-Lipschitz continuous, where the constant L(t) collects the Lipschitz estimates above. Since c(t) is bounded, it follows that L(·) ∈ L 1 loc ([0, +∞), ℜ).
Next, we prove the second statement. Note that c(t) is bounded for all t ∈ [0, +∞). Then L 1 , L 2 and L 3 can be taken to be global constants, so that (B.1), (B.2) and (B.6) hold for every t, r ∈ [0, +∞).

Theorem 2
Proof We need an appropriate energy functional in order to conclude. This will be accomplished in (B.18) below. Let (x * , z * , y * ) ∈ H × G × K be a saddle point of the Lagrangian L. Then it satisfies the system of optimality conditions From (2.1) we have for almost every t ∈ [0, +∞) and by taking into account the strong monotonicity of ∂f we have In an analogous way, according to (2.2), we have for almost every t ∈ [0, +∞) and by taking into account the strong monotonicity of ∂g we have We use the last equation of (1.5) and the optimality condition Ax * + Bz * = b to obtain for almost every t ∈ [0, +∞) By summing up (B.11) and (B.12) and by taking into account (B.13), we obtain for almost every t ∈ [0, +∞) (B.14) We have for almost every t ∈ [0, +∞) (using the last equality for ẏ in (1.5)): Similarly, for almost every t ∈ [0, +∞), by using the last equality for ẏ in (1.5), we have.
Taking into account that we obtain that .