On the Convergence Rate of Multi-Block ADMM

arXiv:1408.4265v1 [math.OC] 19 Aug 2014

On the Convergence Rate of Multi-Block ADMM

Tianyi Lin∗    Shiqian Ma∗    Shuzhong Zhang†

March 30, 2014

Abstract

The alternating direction method of multipliers (ADMM) is widely used in solving structured convex optimization problems. Despite its success in practice, the convergence properties of the standard ADMM for minimizing the sum of N (N ≥ 3) convex functions, with N block variables linked by linear constraints, have remained unclear for a very long time. In this paper, we present convergence and convergence rate results for the standard ADMM applied to N-block (N ≥ 3) convex minimization problems, under the condition that one of these functions is convex (not necessarily strongly convex) and the other N − 1 functions are strongly convex. Specifically, in that case the ADMM is proven to converge with rate O(1/t) in a certain ergodic sense, and o(1/t) in a non-ergodic sense, where t denotes the number of iterations.

Keywords: Alternating Direction Method of Multipliers, Convergence Rate, Convex Optimization

1 Introduction

We consider solving the following multi-block convex minimization problem:
$$
\begin{array}{rl}
\min & f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N) \\
\mathrm{s.t.} & A_1 x_1 + A_2 x_2 + \cdots + A_N x_N = b \\
& x_i \in \mathcal{X}_i, \quad i = 1, \ldots, N,
\end{array}
\tag{1.1}
$$
where $A_i \in \mathbf{R}^{p\times n_i}$, $b \in \mathbf{R}^p$, $\mathcal{X}_i \subset \mathbf{R}^{n_i}$ are closed convex sets, and $f_i : \mathbf{R}^{n_i} \to \mathbf{R}$ are closed convex functions. One recently popular way to solve (1.1), when the functions $f_i$ are of special structure, is to apply the ADMM. The ADMM is closely related to the Douglas-Rachford [6] and Peaceman-Rachford [22] operator splitting methods that date back to the 1950s. These operator splitting methods were further studied later in [20, 10, 12, 7]. The ADMM has been revisited recently due to its success in solving problems with special structures arising from compressed sensing, machine learning, image processing, and so on; see the recent survey papers [2, 8] for more information.

∗ Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China.
† Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55455, USA.


ADMM for solving (1.1) is based on an augmented Lagrangian method framework. The augmented Lagrangian function for (1.1) is defined as

$$
L_\gamma(x_1, \ldots, x_N; \lambda) := \sum_{j=1}^{N} f_j(x_j) - \Big\langle \lambda, \sum_{j=1}^{N} A_j x_j - b \Big\rangle + \frac{\gamma}{2}\Big\| \sum_{j=1}^{N} A_j x_j - b \Big\|^2,
$$

where $\lambda$ is the Lagrange multiplier and $\gamma > 0$ is a penalty parameter. In a typical iteration of the standard ADMM for solving (1.1), the following updating procedure is implemented:
$$
\left\{
\begin{aligned}
x_1^{k+1} &:= \arg\min_{x_1\in\mathcal{X}_1} L_\gamma(x_1, x_2^k, \ldots, x_N^k; \lambda^k) \\
x_2^{k+1} &:= \arg\min_{x_2\in\mathcal{X}_2} L_\gamma(x_1^{k+1}, x_2, x_3^k, \ldots, x_N^k; \lambda^k) \\
&\;\;\vdots \\
x_N^{k+1} &:= \arg\min_{x_N\in\mathcal{X}_N} L_\gamma(x_1^{k+1}, x_2^{k+1}, \ldots, x_{N-1}^{k+1}, x_N; \lambda^k) \\
\lambda^{k+1} &:= \lambda^k - \gamma \Big( \sum_{j=1}^{N} A_j x_j^{k+1} - b \Big).
\end{aligned}
\right.
\tag{1.2}
$$
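To make the update order in (1.2) concrete, the following minimal Python sketch (our illustration, not code from the paper) runs the standard N-block ADMM on a toy instance in which each $f_i(x_i) = \frac{\sigma_i}{2}\|x_i - c_i\|^2$ and $\mathcal{X}_i = \mathbf{R}^{n_i}$, so every subproblem reduces to a linear solve; the function name and the quadratic test objective are our own assumptions made purely for illustration.

import numpy as np

def multiblock_admm(A, c, sigma, b, gamma, iters=500):
    """Standard N-block ADMM (1.2) on the illustrative problem
    min sum_i (sigma_i/2)*||x_i - c_i||^2  s.t.  sum_i A_i x_i = b,
    with X_i = R^{n_i}, so each subproblem is an unconstrained quadratic."""
    N = len(A)
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    lam = np.zeros(b.shape[0])
    for _ in range(iters):
        for i in range(N):
            # Gauss-Seidel sweep: blocks 1..i-1 are already at iterate k+1,
            # blocks i+1..N are still at iterate k
            s = sum(A[j] @ x[j] for j in range(N) if j != i) - b
            # closed-form minimizer of the i-th augmented-Lagrangian subproblem:
            # (sigma_i I + gamma A_i^T A_i) x_i = sigma_i c_i + A_i^T (lam - gamma s)
            lhs = sigma[i] * np.eye(A[i].shape[1]) + gamma * A[i].T @ A[i]
            rhs = sigma[i] * c[i] + A[i].T @ (lam - gamma * s)
            x[i] = np.linalg.solve(lhs, rhs)
        # dual update with full step size gamma, as in the last line of (1.2)
        lam = lam - gamma * (sum(A[i] @ x[i] for i in range(N)) - b)
    return x, lam

Note that in the setting of this paper only $f_2,\ldots,f_N$ need to be strongly convex; the quadratic choice for every block above is made solely so that all argmin steps have closed forms.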

Note that the standard ADMM (1.2) minimizes, in each iteration, the augmented Lagrangian function with respect to $x_1, \ldots, x_N$ alternately, in a Gauss-Seidel manner. The ADMM (1.2) for solving two-block convex minimization problems (i.e., N = 2) has been studied extensively in the literature. The global convergence of ADMM (1.2) when N = 2 has been shown in [11, 9]. There are also some very recent works that study the convergence rate properties of ADMM when N = 2 (see, e.g., [18, 21, 5, 1, 17]).

However, the convergence of ADMM (1.2) when N ≥ 3 has remained unclear for a long time. Some recent progress on this problem is as follows. In [13], it is shown that ADMM (1.2) converges if all the functions $f_1, \ldots, f_N$ are further assumed to be strongly convex. In a very recent work by Chen et al. [3], a counter-example is constructed that shows the failure of ADMM (1.2) when N ≥ 3. In fact, it is shown in [3] that for a given $\gamma$, it is possible to construct a counter-example such that ADMM fails. The authors in [3] also give some sufficient conditions to guarantee the convergence of ADMM (1.2). The sufficient condition given in [3] for N = 3 is that two of the matrices are orthogonal, i.e., $A_1^\top A_2 = 0$, or $A_2^\top A_3 = 0$, or $A_1^\top A_3 = 0$. This condition is essentially very restrictive and rules out many interesting applications. In a recent work by Hong and Luo [19], a variant of ADMM (1.2) with a small step size in updating the Lagrange multiplier is studied. Specifically, [19] proposes to replace the last equation in (1.2) by
$$
\lambda^{k+1} := \lambda^k - \alpha\gamma \Big( \sum_{j=1}^{N} A_j x_j^{k+1} - b \Big),
$$
where $\alpha > 0$ is a small step size. Linear convergence of this variant is proved under the assumption that the objective function satisfies certain error bound conditions. However, it is noted that the selection of $\alpha$ is in fact bounded by some parameters associated with the error bound conditions in order to guarantee convergence. Therefore, it might be difficult to choose $\alpha$ in practice. There are also studies on the convergence rate of some other variants of ADMM (1.2), and we refer the interested readers to [15, 16, 14, 4] for details of these variants. In this paper, we focus on the original standard ADMM (1.2).

Our contribution. The main contributions of this paper are as follows. We show that the standard ADMM (1.2) with N ≥ 3 converges with rate O(1/t) in an ergodic sense and o(1/t) in a non-ergodic sense, under the assumption that $f_2, \ldots, f_N$ are strongly convex and $f_1$ is convex but not necessarily strongly convex. It should be pointed out that our assumption is weaker than the one used in [13], in which all the functions are required to be strongly convex. Moreover, unlike the sufficient condition suggested in [3], we do not make any assumption on the matrices $A_1, \ldots, A_N$. To the best of our knowledge, the convergence rate results given in this paper are the first sublinear convergence rate results for the standard ADMM when N ≥ 3.

Organization. The rest of this paper is organized as follows. In Section 2 we provide some preliminaries for our convergence rate analysis. In Section 3, we prove the convergence rate of ADMM (1.2) in the ergodic sense. In Section 4, we prove the convergence rate of ADMM (1.2) in the non-ergodic sense. Section 5 draws some conclusions and points out some future directions.

2 Preliminaries

We only prove the convergence results of ADMM for N = 3, because the analysis extends easily to arbitrary N. As a result, for ease of presentation and succinctness, we assume N = 3 in the rest of the analysis and present the results for general N without proof. We restate problem (1.1) for N = 3 as
$$
\begin{array}{rl}
\min & f_1(x_1) + f_2(x_2) + f_3(x_3) \\
\mathrm{s.t.} & A_1 x_1 + A_2 x_2 + A_3 x_3 = b \\
& x_1 \in \mathcal{X}_1,\; x_2 \in \mathcal{X}_2,\; x_3 \in \mathcal{X}_3.
\end{array}
\tag{2.1}
$$

The ADMM for solving (2.1) can be summarized as follows (note that some constant terms in the three subproblems are discarded):
$$
\begin{aligned}
x_1^{k+1} &:= \arg\min_{x_1\in\mathcal{X}_1}\; f_1(x_1) + \frac{\gamma}{2}\Big\|A_1x_1 + A_2x_2^k + A_3x_3^k - b - \frac{1}{\gamma}\lambda^k\Big\|^2, && (2.2)\\
x_2^{k+1} &:= \arg\min_{x_2\in\mathcal{X}_2}\; f_2(x_2) + \frac{\gamma}{2}\Big\|A_1x_1^{k+1} + A_2x_2 + A_3x_3^k - b - \frac{1}{\gamma}\lambda^k\Big\|^2, && (2.3)\\
x_3^{k+1} &:= \arg\min_{x_3\in\mathcal{X}_3}\; f_3(x_3) + \frac{\gamma}{2}\Big\|A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3 - b - \frac{1}{\gamma}\lambda^k\Big\|^2, && (2.4)\\
\lambda^{k+1} &:= \lambda^k - \gamma\left(A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} - b\right). && (2.5)
\end{aligned}
$$

The first-order optimality conditions for (2.2)-(2.4) are given, respectively, by $x_i^{k+1}\in\mathcal{X}_i$, $i=1,2,3$, and
$$
\begin{aligned}
(x_1-x_1^{k+1})^\top\!\left[ g_1(x_1^{k+1}) - A_1^\top\lambda^k + \gamma A_1^\top(A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b)\right] &\ge 0, \quad \forall x_1\in\mathcal{X}_1, && (2.6)\\
(x_2-x_2^{k+1})^\top\!\left[ g_2(x_2^{k+1}) - A_2^\top\lambda^k + \gamma A_2^\top(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b)\right] &\ge 0, \quad \forall x_2\in\mathcal{X}_2, && (2.7)\\
(x_3-x_3^{k+1})^\top\!\left[ g_3(x_3^{k+1}) - A_3^\top\lambda^k + \gamma A_3^\top(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b)\right] &\ge 0, \quad \forall x_3\in\mathcal{X}_3, && (2.8)
\end{aligned}
$$

where $g_i(x_i^{k+1}) \in \partial f_i(x_i^{k+1})$ is a subgradient of $f_i$ at $x_i^{k+1}$, $i=1,2,3$. Moreover, by combining with (2.5), (2.6)-(2.8) can be rewritten as
$$
\begin{aligned}
(x_1-x_1^{k+1})^\top\!\left[ g_1(x_1^{k+1}) - A_1^\top\lambda^{k+1} + \gamma A_1^\top A_2(x_2^k-x_2^{k+1}) + \gamma A_1^\top A_3(x_3^k-x_3^{k+1})\right] &\ge 0, \quad \forall x_1\in\mathcal{X}_1, && (2.9)\\
(x_2-x_2^{k+1})^\top\!\left[ g_2(x_2^{k+1}) - A_2^\top\lambda^{k+1} + \gamma A_2^\top A_3(x_3^k-x_3^{k+1})\right] &\ge 0, \quad \forall x_2\in\mathcal{X}_2, && (2.10)\\
(x_3-x_3^{k+1})^\top\!\left[ g_3(x_3^{k+1}) - A_3^\top\lambda^{k+1}\right] &\ge 0, \quad \forall x_3\in\mathcal{X}_3. && (2.11)
\end{aligned}
$$

We denote $\Omega = \mathcal{X}_1\times\mathcal{X}_2\times\mathcal{X}_3\times\mathbf{R}^p$ and the optimal set of (2.1) as $\Omega^*$; the following assumption is made throughout this paper.

Assumption 2.1 The optimal set $\Omega^*$ for problem (2.1) is non-empty.

According to the first-order optimality conditions for (2.1), solving (2.1) is equivalent to finding $(x_1^*,x_2^*,x_3^*,\lambda^*)\in\Omega^*$ such that the following holds:
$$
\left\{
\begin{aligned}
(x_1-x_1^*)^\top\left( g_1(x_1^*) - A_1^\top\lambda^*\right) &\ge 0, \quad \forall x_1\in\mathcal{X}_1,\\
(x_2-x_2^*)^\top\left( g_2(x_2^*) - A_2^\top\lambda^*\right) &\ge 0, \quad \forall x_2\in\mathcal{X}_2,\\
(x_3-x_3^*)^\top\left( g_3(x_3^*) - A_3^\top\lambda^*\right) &\ge 0, \quad \forall x_3\in\mathcal{X}_3,\\
A_1x_1^* + A_2x_2^* + A_3x_3^* - b &= 0,
\end{aligned}
\right.
\tag{2.12}
$$
where $g_i(x_i^*)\in\partial f_i(x_i^*)$, $i=1,2,3$.

Furthermore, the following condition is assumed in our subsequent analysis.

Assumption 2.2 The functions $f_2$ and $f_3$ are strongly convex with parameters $\sigma_2>0$ and $\sigma_3>0$, respectively; i.e., the following two inequalities hold:
$$
\begin{aligned}
f_2(y) &\ge f_2(x) + (y-x)^\top g_2(x) + \frac{\sigma_2}{2}\|y-x\|^2, && \forall x,y\in\mathcal{X}_2, && (2.13)\\
f_3(y) &\ge f_3(x) + (y-x)^\top g_3(x) + \frac{\sigma_3}{2}\|y-x\|^2, && \forall x,y\in\mathcal{X}_3, && (2.14)
\end{aligned}
$$
or equivalently,
$$
\begin{aligned}
(y-x)^\top\left(g_2(y)-g_2(x)\right) &\ge \sigma_2\|y-x\|^2, && \forall x,y\in\mathcal{X}_2, && (2.15)\\
(y-x)^\top\left(g_3(y)-g_3(x)\right) &\ge \sigma_3\|y-x\|^2, && \forall x,y\in\mathcal{X}_3, && (2.16)
\end{aligned}
$$
where $g_2(x)\in\partial f_2(x)$ and $g_3(x)\in\partial f_3(x)$ are subgradients of $f_2$ and $f_3$, respectively.

In our analysis, the following two well-known identities are used frequently:
$$
\begin{aligned}
(w_1-w_2)^\top(w_3-w_4) &= \frac{1}{2}\left( \|w_1-w_4\|^2 - \|w_1-w_3\|^2 \right) + \frac{1}{2}\left( \|w_3-w_2\|^2 - \|w_4-w_2\|^2 \right), && (2.17)\\
(w_1-w_2)^\top(w_3-w_1) &= \frac{1}{2}\left( \|w_2-w_3\|^2 - \|w_1-w_2\|^2 - \|w_1-w_3\|^2 \right). && (2.18)
\end{aligned}
$$

Notations. For simplicity, we use the following notation to denote the stacked vectors or tuples:
$$
u = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}, \qquad
u^k = \begin{pmatrix} x_1^k\\ x_2^k\\ x_3^k \end{pmatrix}, \qquad
u^* = \begin{pmatrix} x_1^*\\ x_2^*\\ x_3^* \end{pmatrix}.
$$
We denote by $f(u) \equiv f_1(x_1)+f_2(x_2)+f_3(x_3)$ the objective function of problem (2.1); $g_i$ denotes a subgradient of $f_i$; $\lambda_{\max}(B)$ denotes the largest eigenvalue of a real symmetric matrix $B$; $\|x\|$ denotes the Euclidean norm of $x$.
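The step-size conditions in Sections 3 and 4 are stated in terms of the quantities $\sigma_i$ and $\lambda_{\max}(A_i^\top A_i)$ introduced above. As a small illustration (ours, not from the paper; the helper name admissible_gamma is hypothetical), the bound $\gamma \le \min\{\sigma_2/(2\lambda_{\max}(A_2^\top A_2)),\, \sigma_3/(2\lambda_{\max}(A_3^\top A_3))\}$ used for the ergodic rate can be evaluated for a given instance as follows.

import numpy as np

def admissible_gamma(A2, A3, sigma2, sigma3):
    """Largest gamma allowed by the ergodic-rate condition of Section 3:
    gamma <= min{ sigma2 / (2*lmax(A2^T A2)), sigma3 / (2*lmax(A3^T A3)) }."""
    lmax2 = np.linalg.eigvalsh(A2.T @ A2).max()  # largest eigenvalue of A2^T A2
    lmax3 = np.linalg.eigvalsh(A3.T @ A3).max()  # largest eigenvalue of A3^T A3
    return min(sigma2 / (2.0 * lmax2), sigma3 / (2.0 * lmax3))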

3 Ergodic Convergence Rate of ADMM

In this section, we prove the $O(1/t)$ convergence rate of ADMM (2.2)-(2.5) in the ergodic sense.

Lemma 3.1 Assume that $\gamma \le \min\left\{\frac{\sigma_2}{2\lambda_{\max}(A_2^\top A_2)}, \frac{\sigma_3}{2\lambda_{\max}(A_3^\top A_3)}\right\}$, where $\sigma_2$ and $\sigma_3$ are defined in Assumption 2.2. Let $(x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda^{k+1})\in\Omega$ be generated by ADMM from given $(x_2^k,x_3^k,\lambda^k)$. Then, for any $u^*=(x_1^*,x_2^*,x_3^*)\in\Omega^*$ and $\lambda\in\mathbf{R}^p$, it holds that
$$
\begin{aligned}
& f(u^*) - f(u^{k+1}) +
\begin{pmatrix} x_1^*-x_1^{k+1}\\ x_2^*-x_2^{k+1}\\ x_3^*-x_3^{k+1}\\ \lambda-\lambda^{k+1} \end{pmatrix}^{\!\top}
\begin{pmatrix} -A_1^\top\lambda^{k+1}\\ -A_2^\top\lambda^{k+1}\\ -A_3^\top\lambda^{k+1}\\ A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b \end{pmatrix} \\
&\quad + \frac{1}{2\gamma}\left( \|\lambda-\lambda^k\|^2 - \|\lambda-\lambda^{k+1}\|^2 \right)
+ \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^*+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^*+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) \\
&\ge \frac{\gamma}{2}\|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2.
\end{aligned}
\tag{3.1}
$$

Proof. Note that combining (2.9)-(2.11) yields
$$
\begin{pmatrix} x_1-x_1^{k+1}\\ x_2-x_2^{k+1}\\ x_3-x_3^{k+1} \end{pmatrix}^{\!\top}
\left[
\begin{pmatrix} g_1(x_1^{k+1}) - A_1^\top\lambda^{k+1}\\ g_2(x_2^{k+1}) - A_2^\top\lambda^{k+1}\\ g_3(x_3^{k+1}) - A_3^\top\lambda^{k+1} \end{pmatrix}
+
\begin{pmatrix} \gamma A_1^\top A_2 & \gamma A_1^\top A_3\\ 0 & \gamma A_2^\top A_3\\ 0 & 0 \end{pmatrix}
\begin{pmatrix} x_2^k-x_2^{k+1}\\ x_3^k-x_3^{k+1} \end{pmatrix}
\right] \ge 0.
\tag{3.2}
$$
The key step in our proof is to bound the two cross terms
$$
(x_1-x_1^{k+1})^\top A_1^\top\left( A_2(x_2^k-x_2^{k+1}) + A_3(x_3^k-x_3^{k+1}) \right)
\quad\text{and}\quad
(x_2-x_2^{k+1})^\top A_2^\top A_3(x_3^k-x_3^{k+1}).
$$

For the first term, we have
$$
\begin{aligned}
&(x_1-x_1^{k+1})^\top A_1^\top\left[ A_2(x_2^k-x_2^{k+1}) + A_3(x_3^k-x_3^{k+1}) \right] \\
&= \left[ (A_1x_1-b) - (A_1x_1^{k+1}-b) \right]^\top \left[ (-A_2x_2^{k+1}-A_3x_3^{k+1}) - (-A_2x_2^k-A_3x_3^k) \right] \\
&= \frac{1}{2}\left( \|A_1x_1+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{1}{2}\left( \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 \right) \\
&= \frac{1}{2}\left( \|A_1x_1+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) + \frac{1}{2\gamma^2}\|\lambda^{k+1}-\lambda^k\|^2 \\
&\quad - \frac{1}{2}\|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2,
\end{aligned}
$$
where in the second equality we used the identity (2.17), and the last equality follows from the updating formula for $\lambda^{k+1}$ in (2.5). For the second term, we have
$$
\begin{aligned}
&(x_2-x_2^{k+1})^\top A_2^\top A_3(x_3^k-x_3^{k+1}) \\
&= \left( (A_1x_1+A_2x_2-b) - (A_1x_1+A_2x_2^{k+1}-b) \right)^\top \left( (-A_3x_3^{k+1}) - (-A_3x_3^k) \right) \\
&= \frac{1}{2}\left( \|A_1x_1+A_2x_2+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{1}{2}\left( \|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 - \|A_1x_1+A_2x_2^{k+1}+A_3x_3^k-b\|^2 \right) \\
&\le \frac{1}{2}\left( \|A_1x_1+A_2x_2+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{1}{2}\|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2,
\end{aligned}
$$
where in the second equality we applied the identity (2.17). Therefore, we have
$$
\begin{aligned}
&(x_1-x_1^{k+1})^\top\gamma A_1^\top\left( A_2(x_2^k-x_2^{k+1}) + A_3(x_3^k-x_3^{k+1}) \right) + (x_2-x_2^{k+1})^\top\gamma A_2^\top A_3(x_3^k-x_3^{k+1}) \\
&\le \frac{\gamma}{2}\left( \|A_1x_1+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{\gamma}{2}\left( \|A_1x_1+A_2x_2+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{1}{2\gamma}\|\lambda^{k+1}-\lambda^k\|^2 + \frac{\gamma}{2}\|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \\
&\quad - \frac{\gamma}{2}\|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2.
\end{aligned}
\tag{3.3}
$$

Combining (3.3), (3.2) and (2.5), it holds for any $\lambda\in\mathbf{R}^p$ that
$$
\begin{aligned}
& \begin{pmatrix} x_1-x_1^{k+1}\\ x_2-x_2^{k+1}\\ x_3-x_3^{k+1}\\ \lambda-\lambda^{k+1} \end{pmatrix}^{\!\top}
\begin{pmatrix} g_1(x_1^{k+1}) - A_1^\top\lambda^{k+1}\\ g_2(x_2^{k+1}) - A_2^\top\lambda^{k+1}\\ g_3(x_3^{k+1}) - A_3^\top\lambda^{k+1}\\ A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b \end{pmatrix}
+ \frac{1}{\gamma}(\lambda-\lambda^{k+1})^\top(\lambda^{k+1}-\lambda^k) \\
&\quad + \frac{1}{2\gamma}\|\lambda^{k+1}-\lambda^k\|^2 + \frac{\gamma}{2}\|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \\
&\quad + \frac{\gamma}{2}\left( \|A_1x_1+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{\gamma}{2}\left( \|A_1x_1+A_2x_2+A_3x_3^k-b\|^2 - \|A_1x_1+A_2x_2+A_3x_3^{k+1}-b\|^2 \right) \\
&\ge \frac{\gamma}{2}\|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2.
\end{aligned}
\tag{3.4}
$$

Using the identity
$$
\frac{1}{\gamma}(\lambda-\lambda^{k+1})^\top(\lambda^{k+1}-\lambda^k) + \frac{1}{2\gamma}\|\lambda^{k+1}-\lambda^k\|^2 = \frac{1}{2\gamma}\left( \|\lambda-\lambda^k\|^2 - \|\lambda-\lambda^{k+1}\|^2 \right),
$$
letting $u = u^*$ in (3.4), and applying the facts that (invoking (2.13) and (2.14), together with the convexity of $f_1$)
$$
\begin{aligned}
f_2(x_2^*) - f_2(x_2^{k+1}) - \frac{\sigma_2}{2}\|x_2^*-x_2^{k+1}\|^2 &\ge (x_2^*-x_2^{k+1})^\top g_2(x_2^{k+1}),\\
f_3(x_3^*) - f_3(x_3^{k+1}) - \frac{\sigma_3}{2}\|x_3^*-x_3^{k+1}\|^2 &\ge (x_3^*-x_3^{k+1})^\top g_3(x_3^{k+1}),
\end{aligned}
$$
and
$$
\begin{aligned}
\frac{\gamma}{2}\|A_1x_1^*+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2
&= \frac{\gamma}{2}\|A_2(x_2^{k+1}-x_2^*)+A_3(x_3^{k+1}-x_3^*)\|^2 \\
&\le \gamma\left( \lambda_{\max}(A_2^\top A_2)\|x_2^{k+1}-x_2^*\|^2 + \lambda_{\max}(A_3^\top A_3)\|x_3^{k+1}-x_3^*\|^2 \right),
\end{aligned}
$$
we obtain
$$
\begin{aligned}
& f(u^*) - f(u^{k+1}) +
\begin{pmatrix} x_1^*-x_1^{k+1}\\ x_2^*-x_2^{k+1}\\ x_3^*-x_3^{k+1}\\ \lambda-\lambda^{k+1} \end{pmatrix}^{\!\top}
\begin{pmatrix} -A_1^\top\lambda^{k+1}\\ -A_2^\top\lambda^{k+1}\\ -A_3^\top\lambda^{k+1}\\ A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b \end{pmatrix} \\
&\quad + \frac{1}{2\gamma}\left( \|\lambda-\lambda^k\|^2 - \|\lambda-\lambda^{k+1}\|^2 \right)
+ \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^*+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^*+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \left( \gamma\lambda_{\max}(A_2^\top A_2) - \frac{\sigma_2}{2} \right)\|x_2^{k+1}-x_2^*\|^2
+ \left( \gamma\lambda_{\max}(A_3^\top A_3) - \frac{\sigma_3}{2} \right)\|x_3^{k+1}-x_3^*\|^2 \\
&\quad + \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) \\
&\ge \frac{\gamma}{2}\|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2.
\end{aligned}
$$
This, together with the facts that $\gamma\lambda_{\max}(A_2^\top A_2) - \frac{\sigma_2}{2} \le 0$ and $\gamma\lambda_{\max}(A_3^\top A_3) - \frac{\sigma_3}{2} \le 0$, implies the desired inequality (3.1). $\Box$

Now, we are ready to present the $O(1/t)$ ergodic convergence rate of the ADMM.

Theorem 3.2 Assume that $\gamma \le \min\left\{\frac{\sigma_2}{2\lambda_{\max}(A_2^\top A_2)}, \frac{\sigma_3}{2\lambda_{\max}(A_3^\top A_3)}\right\}$. Let $(x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda^{k+1})\in\Omega$ be generated by ADMM (2.2)-(2.5) from given $(x_2^k,x_3^k,\lambda^k)$. For any integer $t>0$, let $\bar{u}^t=(\bar{x}_1^t,\bar{x}_2^t,\bar{x}_3^t)$ and $\bar{\lambda}^t$ be defined as
$$
\bar{x}_1^t = \frac{1}{t+1}\sum_{k=0}^{t} x_1^{k+1}, \qquad
\bar{x}_2^t = \frac{1}{t+1}\sum_{k=0}^{t} x_2^{k+1}, \qquad
\bar{x}_3^t = \frac{1}{t+1}\sum_{k=0}^{t} x_3^{k+1}, \qquad
\bar{\lambda}^t = \frac{1}{t+1}\sum_{k=0}^{t} \lambda^{k+1}.
$$
Then, for any $(u^*,\lambda^*)\in\Omega^*$, by defining $\rho := \|\lambda^*\|+1$, we have
$$
\begin{aligned}
0 &\le f(\bar{u}^t) - f(u^*) + \rho\|A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b\| \\
&\le \frac{\gamma}{2(t+1)}\|A_3x_3^* - A_3x_3^0\|^2 + \frac{\rho^2 + \|\lambda^0\|^2}{\gamma(t+1)} + \frac{\gamma}{2(t+1)}\|A_1x_1^* + A_2x_2^0 + A_3x_3^0 - b\|^2.
\end{aligned}
$$
Note that this also implies that both the error of the objective function value and the residual of the equality constraint converge to 0 with rate $O(1/t)$, i.e.,
$$
|f(\bar{u}^t) - f(u^*)| = O(1/t), \quad\text{and}\quad \|A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b\| = O(1/t).
\tag{3.5}
$$

Proof. Because $(u^k,\lambda^k)\in\Omega$, it holds that $(\bar{u}^t,\bar{\lambda}^t)\in\Omega$ for all $t\ge 0$. By Lemma 3.1 and invoking the convexity of the function $f(\cdot)$, we have
$$
\begin{aligned}
& f(u^*) - f(\bar{u}^t) + \lambda^\top(A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b) \\
&= f(u^*) - f(\bar{u}^t) +
\begin{pmatrix} x_1^*-\bar{x}_1^t\\ x_2^*-\bar{x}_2^t\\ x_3^*-\bar{x}_3^t\\ \lambda-\bar{\lambda}^t \end{pmatrix}^{\!\top}
\begin{pmatrix} -A_1^\top\bar{\lambda}^t\\ -A_2^\top\bar{\lambda}^t\\ -A_3^\top\bar{\lambda}^t\\ A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b \end{pmatrix} \\
&\ge \frac{1}{t+1}\sum_{k=0}^{t}\left[ f(u^*) - f(u^{k+1}) +
\begin{pmatrix} x_1^*-x_1^{k+1}\\ x_2^*-x_2^{k+1}\\ x_3^*-x_3^{k+1}\\ \lambda-\lambda^{k+1} \end{pmatrix}^{\!\top}
\begin{pmatrix} -A_1^\top\lambda^{k+1}\\ -A_2^\top\lambda^{k+1}\\ -A_3^\top\lambda^{k+1}\\ A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b \end{pmatrix} \right] \\
&\ge \frac{1}{t+1}\sum_{k=0}^{t}\Big[ \frac{1}{2\gamma}\left( \|\lambda-\lambda^{k+1}\|^2 - \|\lambda-\lambda^k\|^2 \right)
+ \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^*+A_3x_3^{k+1}-b\|^2 - \|A_1x_1^*+A_2x_2^*+A_3x_3^k-b\|^2 \right) \\
&\qquad\qquad + \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 - \|A_1x_1^*+A_2x_2^k+A_3x_3^k-b\|^2 \right) \Big] \\
&\ge -\frac{1}{2\gamma(t+1)}\|\lambda-\lambda^0\|^2 - \frac{\gamma}{2(t+1)}\|A_1x_1^*+A_2x_2^*+A_3x_3^0-b\|^2 - \frac{\gamma}{2(t+1)}\|A_1x_1^*+A_2x_2^0+A_3x_3^0-b\|^2.
\end{aligned}
$$

Note that this inequality holds for all $\lambda\in\mathbf{R}^p$. From weak duality of (2.1) we obtain
$$
0 \ge f(u^*) - f(\bar{u}^t) + (\lambda^*)^\top(A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b).
$$
Moreover, since $\rho := \|\lambda^*\|+1$, $\|\lambda-\lambda^0\|^2 \le 2(\rho^2+\|\lambda^0\|^2)$ for all $\|\lambda\|\le\rho$, and $A_1x_1^*+A_2x_2^*+A_3x_3^* = b$, we obtain
$$
\begin{aligned}
0 &\le f(\bar{u}^t) - f(u^*) + \rho\|A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b\| \\
&\le \frac{\gamma}{2(t+1)}\|A_3(x_3^*-x_3^0)\|^2 + \frac{\rho^2+\|\lambda^0\|^2}{\gamma(t+1)} + \frac{\gamma}{2(t+1)}\|A_2(x_2^*-x_2^0)+A_3(x_3^*-x_3^0)\|^2.
\end{aligned}
\tag{3.6}
$$
We now define the function
$$
v(\xi) = \min\{ f(u) \mid A_1x_1+A_2x_2+A_3x_3-b = \xi,\; x_1\in\mathcal{X}_1,\, x_2\in\mathcal{X}_2,\, x_3\in\mathcal{X}_3 \}.
$$
It is easy to verify that $v$ is convex, $v(0) = f(u^*)$, and $\lambda^*\in\partial v(0)$. Therefore, from the convexity of $v$, it holds that
$$
v(\xi) \ge v(0) + \langle \lambda^*, \xi\rangle \ge f(u^*) - \|\lambda^*\|\,\|\xi\|.
\tag{3.7}
$$
Let $\bar{\xi} = A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b$; then $f(\bar{u}^t) \ge v(\bar{\xi})$. Therefore, by denoting the constant
$$
C := \frac{\gamma}{2}\|A_3x_3^* - A_3x_3^0\|^2 + \frac{\|\lambda^0\|^2}{\gamma} + \frac{\gamma}{2}\|A_1x_1^* + A_2x_2^0 + A_3x_3^0 - b\|^2,
$$
and combining (3.6) and (3.7), we can get
$$
\frac{C + \rho^2/\gamma}{t+1} - \rho\|\bar{\xi}\| \ge f(\bar{u}^t) - f(u^*) \ge -\|\lambda^*\|\,\|\bar{\xi}\|,
$$
which, by using $\rho = \|\lambda^*\|+1$, yields
$$
\|A_1\bar{x}_1^t + A_2\bar{x}_2^t + A_3\bar{x}_3^t - b\| = \|\bar{\xi}\| \le \frac{C + \rho^2/\gamma}{t+1}.
\tag{3.8}
$$
Moreover, by combining (3.6), (3.7) and (3.8), one obtains that
$$
-\frac{\rho C + \rho^3/\gamma}{t+1} \le f(\bar{u}^t) - f(u^*) \le \frac{C + \rho^2/\gamma}{t+1}.
\tag{3.9}
$$
(3.8) and (3.9) imply (3.5) immediately. $\Box$

Therefore, we have established the $O(1/t)$ convergence rate of the ADMM (2.2)-(2.5) in an ergodic sense. Our proof readily extends to the case of the $N$-block ADMM (1.2). The following theorem shows the $O(1/t)$ convergence rate of the $N$-block ADMM (1.2); we omit the proof for the sake of succinctness.

Theorem 3.3 Assume that
$$
\gamma \le \min_{i=2,\ldots,N-1}\left\{ \frac{2\sigma_i}{(2N-i)(i-1)\lambda_{\max}(A_i^\top A_i)},\; \frac{2\sigma_N}{(N-2)(N+1)\lambda_{\max}(A_N^\top A_N)} \right\},
$$
where $\sigma_i$ is the strong convexity parameter of $f_i$, $i=2,\ldots,N$. Let $(x_1^{k+1},x_2^{k+1},\ldots,x_N^{k+1},\lambda^{k+1})\in\Omega$ be generated by the $N$-block ADMM (1.2). For any integer $t>0$, we define
$$
\bar{x}_i^t = \frac{1}{t+1}\sum_{k=0}^{t} x_i^{k+1}, \quad 1\le i\le N, \qquad
\bar{\lambda}^t = \frac{1}{t+1}\sum_{k=0}^{t} \lambda^{k+1}.
$$
Then, for $\rho := \|\lambda^*\|+1$, it holds that
$$
\sum_{i=1}^{N}\left( f_i(\bar{x}_i^t) - f_i(x_i^*) \right) + \rho\Big\| \sum_{i=1}^{N} A_i\bar{x}_i^t - b \Big\|
\le \frac{\rho^2 + \|\lambda^0\|^2}{\gamma(t+1)} + \frac{\gamma}{2(t+1)}\sum_{i=1}^{N-1}\Big\| \sum_{m=i+1}^{N} A_m(x_m^0 - x_m^*) \Big\|^2.
$$
Similarly to Theorem 3.2, this also implies that the $N$-block ADMM (1.2) converges with rate $O(1/t)$ in terms of both the error of the objective function value and the residual of the equality constraints, i.e., it holds that
$$
|f(\bar{u}^t) - f(u^*)| = O(1/t), \quad\text{and}\quad \Big\| \sum_{i=1}^{N} A_i\bar{x}_i^t - b \Big\| = O(1/t).
$$
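As a numerical companion to Theorems 3.2 and 3.3, one can track the ergodic averages along the iterations and observe that the objective error and the constraint residual decay at least like O(1/t). The sketch below is our own illustration (not part of the paper); it assumes the per-iteration block iterates have been stored, e.g., by a loop like the one sketched after (1.2), and only post-processes that history.

import numpy as np

def ergodic_diagnostics(x_hist, A, b, f):
    """x_hist[k][i] is block i at iteration k+1; for each t this returns the
    constraint residual || sum_i A_i xbar_i^t - b || and the objective value
    f(xbar^t) evaluated at the ergodic averages used in Theorems 3.2/3.3."""
    N = len(A)
    running = [np.zeros(Ai.shape[1]) for Ai in A]
    residuals, objectives = [], []
    for t, xk in enumerate(x_hist):
        for i in range(N):
            running[i] += xk[i]
        xbar = [running[i] / (t + 1) for i in range(N)]  # ergodic average over k = 0..t
        residuals.append(np.linalg.norm(sum(A[i] @ xbar[i] for i in range(N)) - b))
        objectives.append(f(xbar))
    return residuals, objectives

Here f is any callable evaluating the objective on a list of blocks; plotting t times the residual (or objective error) against t gives a quick empirical check of the O(1/t) rate under the stated assumptions.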

4 Non-Ergodic Convergence Rate of ADMM

In this section, we prove an $o(1/k)$ non-ergodic convergence rate for ADMM (2.2)-(2.5). Let us first observe the following (see also Lemma 4.1 in [13]). Suppose that at the $(k+1)$-th iteration of ADMM (2.2)-(2.5), we have
$$
\left\{
\begin{aligned}
A_2x_2^{k+1} - A_2x_2^k &= 0,\\
A_3x_3^{k+1} - A_3x_3^k &= 0,\\
A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} - b &= 0.
\end{aligned}
\right.
\tag{4.1}
$$
Then, (2.9)-(2.11) would immediately lead to
$$
\left\{
\begin{aligned}
(x_1-x_1^{k+1})^\top\left[ g_1(x_1^{k+1}) - A_1^\top\lambda^{k+1} \right] &\ge 0, \quad \forall x_1\in\mathcal{X}_1,\\
(x_2-x_2^{k+1})^\top\left[ g_2(x_2^{k+1}) - A_2^\top\lambda^{k+1} \right] &\ge 0, \quad \forall x_2\in\mathcal{X}_2,\\
(x_3-x_3^{k+1})^\top\left[ g_3(x_3^{k+1}) - A_3^\top\lambda^{k+1} \right] &\ge 0, \quad \forall x_3\in\mathcal{X}_3.
\end{aligned}
\right.
$$
In other words, if (4.1) is satisfied, then $(x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda^{k+1})$ would already be an optimal solution for (2.1). It is therefore natural to introduce a residual of the linear system (4.1) as an optimality measure. Below is such a measure, denoted by $R_{k+1}$:
$$
R_{k+1} := \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 + 2\|A_2x_2^{k+1}-A_2x_2^k\|^2 + 3\|A_3x_3^{k+1}-A_3x_3^k\|^2.
\tag{4.2}
$$

In the sequel, we will show that $R_k$ converges to 0 at the rate $o(1/k)$. Note that this gives the convergence rate of ADMM (2.2)-(2.5) in the non-ergodic sense. We first show that $R_k$ is non-increasing.
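For monitoring purposes, the residual (4.2) is cheap to evaluate from two consecutive iterates. The minimal sketch below is our own illustration (function name and interface are hypothetical), not code from the paper.

import numpy as np

def residual_R(A, b, x_new, x_old):
    """Optimality measure R_{k+1} of (4.2) for the 3-block case:
    ||A1 x1^{k+1} + A2 x2^{k+1} + A3 x3^{k+1} - b||^2
      + 2*||A2 (x2^{k+1} - x2^k)||^2 + 3*||A3 (x3^{k+1} - x3^k)||^2."""
    feas = sum(A[i] @ x_new[i] for i in range(3)) - b
    return (np.linalg.norm(feas) ** 2
            + 2.0 * np.linalg.norm(A[1] @ (x_new[1] - x_old[1])) ** 2
            + 3.0 * np.linalg.norm(A[2] @ (x_new[2] - x_old[2])) ** 2)

Under the step-size conditions of Lemma 4.1 and Theorem 4.2 below, this quantity is non-increasing and tends to 0 at the rate o(1/k), so it can serve as a stopping criterion.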

Lemma 4.1 Assume $\gamma \le \min\left\{ \frac{\sigma_2}{\lambda_{\max}(A_2^\top A_2)}, \frac{\sigma_3}{\lambda_{\max}(A_3^\top A_3)} \right\}$. Let the sequence $\{x_1^k,x_2^k,x_3^k,\lambda^k\}$ be generated by ADMM (2.2)-(2.5). It holds that $R_k$ defined in (4.2) is non-increasing, i.e.,
$$
R_{k+1} \le R_k, \quad k = 0,1,2,\ldots.
\tag{4.3}
$$

Proof. Letting $x_1 = x_1^k$ in (2.6) yields
$$
(x_1^k - x_1^{k+1})^\top\left[ g_1(x_1^{k+1}) - A_1^\top\lambda^k + \gamma A_1^\top(A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b) \right] \ge 0,
$$
with $g_1(x_1^{k+1})\in\partial f_1(x_1^{k+1})$, which further implies that
$$
\begin{aligned}
(x_1^{k+1}-x_1^k)^\top g_1(x_1^{k+1})
&\le (x_1^k-x_1^{k+1})^\top(-A_1^\top\lambda^k) + (x_1^k-x_1^{k+1})^\top\gamma A_1^\top(A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b) \\
&= (A_1x_1^k-A_1x_1^{k+1})^\top(-\lambda^k) + \gamma(A_1x_1^k-A_1x_1^{k+1})^\top(A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b) \\
&= (A_1x_1^k-A_1x_1^{k+1})^\top(-\lambda^k) + \frac{\gamma}{2}\Big( \|A_1x_1^k+A_2x_2^k+A_3x_3^k-b\|^2 \\
&\qquad - \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^k-A_1x_1^{k+1}\|^2 \Big),
\end{aligned}
\tag{4.4}
$$
where the last equality is due to the identity (2.18). Letting $x_1 = x_1^{k+1}$ in (2.9) with $k+1$ changed to $k$ yields
$$
(x_1^{k+1}-x_1^k)^\top\left[ g_1(x_1^k) - A_1^\top\lambda^k + \gamma A_1^\top A_2(x_2^{k-1}-x_2^k) + \gamma A_1^\top A_3(x_3^{k-1}-x_3^k) \right] \ge 0,
$$
which further implies that
$$
\begin{aligned}
(x_1^k-x_1^{k+1})^\top g_1(x_1^k)
&\le (x_1^{k+1}-x_1^k)^\top(-A_1^\top\lambda^k) + \gamma(x_1^{k+1}-x_1^k)^\top A_1^\top\left[ A_2(x_2^{k-1}-x_2^k) + A_3(x_3^{k-1}-x_3^k) \right] \\
&= (A_1x_1^{k+1}-A_1x_1^k)^\top(-\lambda^k) + \gamma(A_1x_1^{k+1}-A_1x_1^k)^\top\left[ A_2(x_2^{k-1}-x_2^k) + A_3(x_3^{k-1}-x_3^k) \right] \\
&\le (A_1x_1^{k+1}-A_1x_1^k)^\top(-\lambda^k) + \frac{\gamma}{2}\left( \|A_1x_1^{k+1}-A_1x_1^k\|^2 + \|A_2(x_2^{k-1}-x_2^k)+A_3(x_3^{k-1}-x_3^k)\|^2 \right) \\
&\le (A_1x_1^{k+1}-A_1x_1^k)^\top(-\lambda^k) + \frac{\gamma}{2}\left( \|A_1x_1^{k+1}-A_1x_1^k\|^2 + 2\|A_2(x_2^{k-1}-x_2^k)\|^2 + 2\|A_3(x_3^{k-1}-x_3^k)\|^2 \right).
\end{aligned}
\tag{4.5}
$$
Combining (4.4) and (4.5) gives
$$
\begin{aligned}
(x_1^{k+1}-x_1^k)^\top\left[ g_1(x_1^{k+1}) - g_1(x_1^k) \right]
&\le \frac{\gamma}{2}\Big( \|A_1x_1^k+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 && (4.6)\\
&\qquad + 2\|A_2(x_2^{k-1}-x_2^k)\|^2 + 2\|A_3(x_3^{k-1}-x_3^k)\|^2 \Big). && (4.7)
\end{aligned}
$$
Letting $x_2 = x_2^k$ in (2.7) yields
$$
(x_2^k-x_2^{k+1})^\top\left[ g_2(x_2^{k+1}) - A_2^\top\lambda^k + \gamma A_2^\top(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b) \right] \ge 0,
\tag{4.8}
$$

which further implies that
$$
\begin{aligned}
(x_2^{k+1}-x_2^k)^\top g_2(x_2^{k+1})
&\le (x_2^k-x_2^{k+1})^\top(-A_2^\top\lambda^k) + (x_2^k-x_2^{k+1})^\top\gamma A_2^\top(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b) \\
&= (A_2x_2^k-A_2x_2^{k+1})^\top(-\lambda^k) + \gamma(A_2x_2^k-A_2x_2^{k+1})^\top(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b) \\
&= (A_2x_2^k-A_2x_2^{k+1})^\top(-\lambda^k) + \frac{\gamma}{2}\Big( \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 \\
&\qquad - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b\|^2 - \|A_2x_2^k-A_2x_2^{k+1}\|^2 \Big),
\end{aligned}
\tag{4.9}
$$

where the last equality is due to the identity (2.18). Letting $x_2 = x_2^{k+1}$ in (2.10) with $k+1$ changed to $k$ yields
$$
(x_2^{k+1}-x_2^k)^\top\left[ g_2(x_2^k) - A_2^\top\lambda^k + \gamma A_2^\top A_3(x_3^{k-1}-x_3^k) \right] \ge 0,
$$
which further implies that
$$
\begin{aligned}
(x_2^k-x_2^{k+1})^\top g_2(x_2^k)
&\le (x_2^{k+1}-x_2^k)^\top(-A_2^\top\lambda^k) + \gamma(x_2^{k+1}-x_2^k)^\top A_2^\top A_3(x_3^{k-1}-x_3^k) \\
&= (A_2x_2^{k+1}-A_2x_2^k)^\top(-\lambda^k) + \gamma(A_2x_2^{k+1}-A_2x_2^k)^\top(A_3x_3^{k-1}-A_3x_3^k) \\
&\le (A_2x_2^{k+1}-A_2x_2^k)^\top(-\lambda^k) + \frac{\gamma}{2}\left( \|A_2x_2^{k+1}-A_2x_2^k\|^2 + \|A_3x_3^{k-1}-A_3x_3^k\|^2 \right).
\end{aligned}
\tag{4.10}
$$
Combining (4.9) and (4.10) gives
$$
\begin{aligned}
(x_2^{k+1}-x_2^k)^\top\left[ g_2(x_2^{k+1}) - g_2(x_2^k) \right]
&\le \frac{\gamma}{2}\Big( \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b\|^2 \\
&\qquad + \|A_3x_3^{k-1}-A_3x_3^k\|^2 \Big).
\end{aligned}
\tag{4.11}
$$

Letting $x_3 = x_3^k$ in (2.11), letting $x_3 = x_3^{k+1}$ in (2.11) with $k+1$ changed to $k$, and adding the two resulting inequalities, yields
$$
\begin{aligned}
(x_3^{k+1}-x_3^k)^\top\left[ g_3(x_3^{k+1}) - g_3(x_3^k) \right]
&\le (x_3^{k+1}-x_3^k)^\top(A_3^\top\lambda^{k+1} - A_3^\top\lambda^k) \\
&= (A_3x_3^{k+1}-A_3x_3^k)^\top(\lambda^{k+1}-\lambda^k) \\
&= \gamma(A_3x_3^k-A_3x_3^{k+1})^\top\left( A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b \right) \\
&= \frac{\gamma}{2}\Big( \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \\
&\qquad - \|A_3x_3^k-A_3x_3^{k+1}\|^2 \Big),
\end{aligned}
\tag{4.12}
$$
where the last equality is due to the identity (2.18).

Combining (4.7), (4.11) and (4.12) yields
$$
\begin{aligned}
& (x_1^{k+1}-x_1^k)^\top\left[ g_1(x_1^{k+1}) - g_1(x_1^k) \right] + (x_2^{k+1}-x_2^k)^\top\left[ g_2(x_2^{k+1}) - g_2(x_2^k) \right] + (x_3^{k+1}-x_3^k)^\top\left[ g_3(x_3^{k+1}) - g_3(x_3^k) \right] \\
&\le \frac{\gamma}{2}\Big[ \|A_1x_1^k+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 + 2\|A_2(x_2^{k-1}-x_2^k)\|^2 \\
&\qquad + 2\|A_3(x_3^{k-1}-x_3^k)\|^2 + \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b\|^2 \\
&\qquad + \|A_3x_3^{k-1}-A_3x_3^k\|^2 + \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b\|^2 \\
&\qquad - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 - \|A_3x_3^k-A_3x_3^{k+1}\|^2 \Big] \\
&= \frac{\gamma}{2}\Big[ \|A_1x_1^k+A_2x_2^k+A_3x_3^k-b\|^2 + 2\|A_2(x_2^{k-1}-x_2^k)\|^2 + 3\|A_3(x_3^{k-1}-x_3^k)\|^2 \\
&\qquad - \|A_3x_3^k-A_3x_3^{k+1}\|^2 - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \Big] \\
&= \frac{\gamma}{2}\left[ R_k - R_{k+1} + 2\|A_2x_2^{k+1}-A_2x_2^k\|^2 + 2\|A_3x_3^k-A_3x_3^{k+1}\|^2 \right].
\end{aligned}
\tag{4.13}
$$
Note that (2.15) and (2.16) imply that
$$
\begin{aligned}
(x_2^{k+1}-x_2^k)^\top\left[ g_2(x_2^{k+1}) - g_2(x_2^k) \right] &\ge \sigma_2\|x_2^{k+1}-x_2^k\|^2,\\
(x_3^{k+1}-x_3^k)^\top\left[ g_3(x_3^{k+1}) - g_3(x_3^k) \right] &\ge \sigma_3\|x_3^{k+1}-x_3^k\|^2.
\end{aligned}
\tag{4.14}
$$
Combining (4.13) and (4.14), and the fact that $\gamma \le \min\left\{ \frac{\sigma_2}{\lambda_{\max}(A_2^\top A_2)}, \frac{\sigma_3}{\lambda_{\max}(A_3^\top A_3)} \right\}$, it is easy to see that $R_{k+1} \le R_k$ for $k = 0,1,2,\ldots$. $\Box$

We are now ready to present the o(1/k) non-ergodic convergence rate of the ADMM (2.2)-(2.5).

Theorem 4.2 Assume $\gamma \le \min\left\{ \frac{\sigma_2}{2\lambda_{\max}(A_2^\top A_2)}, \frac{\sigma_3}{2\lambda_{\max}(A_3^\top A_3)} \right\}$. Let the sequence $\{x_1^k,x_2^k,x_3^k,\lambda^k\}$ be generated by ADMM (2.2)-(2.5). Then $\sum_{k=1}^{\infty} R_k < +\infty$ and $R_k = o(1/k)$.

Proof. Combining (4.11) and (4.12) yields
$$
\begin{aligned}
& (x_2^{k+1}-x_2^k)^\top\left[ g_2(x_2^{k+1}) - g_2(x_2^k) \right] + (x_3^{k+1}-x_3^k)^\top\left[ g_3(x_3^{k+1}) - g_3(x_3^k) \right] \\
&\le \frac{\gamma}{2}\Big[ \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b\|^2 + \|A_3x_3^{k-1}-A_3x_3^k\|^2 \\
&\qquad + \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^k-b\|^2 - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 - \|A_3x_3^k-A_3x_3^{k+1}\|^2 \Big] \\
&= \frac{\gamma}{2}\Big[ \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 + \|A_3x_3^{k-1}-A_3x_3^k\|^2 - \|A_3x_3^k-A_3x_3^{k+1}\|^2 \\
&\qquad - \|A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \Big] \\
&= \frac{\gamma}{2}\Big[ \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 + \|A_3x_3^{k-1}-A_3x_3^k\|^2 - \|A_3x_3^k-A_3x_3^{k+1}\|^2 - R_{k+1} \\
&\qquad + 2\|A_2x_2^{k+1}-A_2x_2^k\|^2 + 3\|A_3x_3^k-A_3x_3^{k+1}\|^2 \Big] \\
&\le \frac{\gamma}{2}\Big[ \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 + \|A_3x_3^{k-1}-A_3x_3^k\|^2 - \|A_3x_3^k-A_3x_3^{k+1}\|^2 - R_{k+1} \Big] \\
&\qquad + \gamma\lambda_{\max}(A_2^\top A_2)\|x_2^{k+1}-x_2^k\|^2 + \frac{3\gamma}{2}\lambda_{\max}(A_3^\top A_3)\|x_3^{k+1}-x_3^k\|^2.
\end{aligned}
$$
Using (4.14) and the assumption that $\gamma \le \min\left\{ \frac{\sigma_2}{2\lambda_{\max}(A_2^\top A_2)}, \frac{\sigma_3}{2\lambda_{\max}(A_3^\top A_3)} \right\}$, we obtain
$$
R_{k+1} \le \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 + \|A_3x_3^{k-1}-A_3x_3^k\|^2 - \|A_3x_3^k-A_3x_3^{k+1}\|^2.
\tag{4.15}
$$

From the optimality conditions (2.12) and the convexity of $f$, it follows that
$$
f(u^*) - f(u^{k+1}) \le (x_1^*-x_1^{k+1})^\top(A_1^\top\lambda^*) + (x_2^*-x_2^{k+1})^\top(A_2^\top\lambda^*) + (x_3^*-x_3^{k+1})^\top(A_3^\top\lambda^*).
\tag{4.16}
$$
By combining (3.1) and (4.16), we have
$$
\begin{aligned}
& \begin{pmatrix} x_1^*-x_1^{k+1}\\ x_2^*-x_2^{k+1}\\ x_3^*-x_3^{k+1} \end{pmatrix}^{\!\top}
\begin{pmatrix} A_1^\top(\lambda^*-\lambda^{k+1})\\ A_2^\top(\lambda^*-\lambda^{k+1})\\ A_3^\top(\lambda^*-\lambda^{k+1}) \end{pmatrix}
+ \frac{1}{2\gamma}\|\lambda^k-\lambda^{k+1}\|^2 \\
&\quad + \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^*+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^*+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \frac{\gamma}{2}\left( \|A_1x_1^*+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) \\
&\ge \frac{\gamma}{2}\|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2.
\end{aligned}
\tag{4.17}
$$
Note that the first term in (4.17) is equal to
$$
\begin{aligned}
& -(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b)^\top(\lambda^*-\lambda^{k+1})
= \frac{1}{\gamma}(\lambda^{k+1}-\lambda^k)^\top(\lambda^*-\lambda^{k+1}) \\
&= \frac{1}{2\gamma}\left( \|\lambda^*-\lambda^k\|^2 - \|\lambda^{k+1}-\lambda^k\|^2 - \|\lambda^*-\lambda^{k+1}\|^2 \right).
\end{aligned}
$$

Therefore, (4.17) can be rearranged as
$$
\begin{aligned}
& \frac{1}{\gamma^2}\left( \|\lambda^*-\lambda^k\|^2 - \|\lambda^*-\lambda^{k+1}\|^2 \right)
+ \left( \|A_1x_1^*+A_2x_2^*+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^*+A_3x_3^{k+1}-b\|^2 \right) \\
&\quad + \left( \|A_1x_1^*+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right) \\
&\ge \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2.
\end{aligned}
\tag{4.18}
$$
By (4.15) and (4.18) we get that
$$
\begin{aligned}
\sum_{k=1}^{\infty} R_{k+1}
&\le \sum_{k=1}^{\infty}\left[ \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 + \|A_3x_3^{k-1}-A_3x_3^k\|^2 - \|A_3x_3^k-A_3x_3^{k+1}\|^2 \right] \\
&\le \sum_{k=1}^{\infty} \|A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\|^2 + \|A_3x_3^0-A_3x_3^1\|^2 \\
&\le \|A_3x_3^0-A_3x_3^1\|^2 + \sum_{k=1}^{\infty}\Big[ \left( \|A_1x_1^*+A_2x_2^*+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^*+A_3x_3^{k+1}-b\|^2 \right) \\
&\qquad + \left( \|A_1x_1^*+A_2x_2^k+A_3x_3^k-b\|^2 - \|A_1x_1^*+A_2x_2^{k+1}+A_3x_3^{k+1}-b\|^2 \right)
+ \frac{1}{\gamma^2}\left( \|\lambda^*-\lambda^k\|^2 - \|\lambda^*-\lambda^{k+1}\|^2 \right) \Big] \\
&\le \|A_3x_3^0-A_3x_3^1\|^2 + \|A_1x_1^*+A_2x_2^*+A_3x_3^1-b\|^2 + \|A_1x_1^*+A_2x_2^1+A_3x_3^1-b\|^2 + \frac{1}{\gamma^2}\|\lambda^*-\lambda^1\|^2.
\end{aligned}
$$
Note that we have proved that $R_k$ is monotonically non-increasing, and $\sum_{k=1}^{\infty} R_k < +\infty$. As observed in Lemma 1.2 of [4], one has
$$
k R_{2k} \le R_k + R_{k+1} + \cdots + R_{2k} \to 0, \quad\text{as } k\to\infty,
$$
since, by monotonicity, each summand is at least $R_{2k}$ and the series is convergent; therefore $R_k = o(1/k)$. $\Box$

Note that our analysis can be extended to the $N$-block ADMM (1.2) easily. The results are summarized in the following theorem, and the proof is omitted for the sake of succinctness.

Theorem 4.3 Assume that
$$
\gamma \le \min_{i=2,\ldots,N-1}\left\{ \frac{2\sigma_i}{(2N-i)(i-1)\lambda_{\max}(A_i^\top A_i)},\; \frac{2\sigma_N}{(N-2)(N+1)\lambda_{\max}(A_N^\top A_N)} \right\}.
$$
Let $(x_1^{k+1},x_2^{k+1},\ldots,x_N^{k+1},\lambda^{k+1})\in\Omega$ be generated by ADMM (1.2). Then $\sum_{k=1}^{\infty} R_k < +\infty$ and $R_k = o(1/k)$, where $R_k$ is defined via
$$
R_{k+1} := \Big\| \sum_{i=1}^{N} A_i x_i^{k+1} - b \Big\|^2 + \sum_{i=2}^{N} \frac{(2N-i)(i-1)}{2}\|A_i x_i^k - A_i x_i^{k+1}\|^2.
$$

5 Conclusions

In this paper, we analyzed the sublinear convergence rate of the standard multi-block ADMM in both the ergodic and the non-ergodic sense. These are the first sublinear convergence rate results for the standard multi-block ADMM. Using the techniques developed in this paper, we can also analyze the convergence rate of some variants of the standard multi-block ADMM, such as the ones studied in [14] and [4], where the primal variables are updated in a Jacobian manner; we plan to pursue this direction of research in the future.

Acknowledgements

Research of Shiqian Ma was supported in part by the Hong Kong Research Grants Council (RGC) Early Career Scheme (ECS) (Project ID: CUHK 439513). Research of Shuzhong Zhang was supported in part by the National Science Foundation under Grant Number CMMI-1161242.

References

[1] D. Boley. Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM Journal on Optimization, 23(4):2183-2207, 2013.

[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.

[3] C. Chen, B. He, Y. Ye, and X. Yuan. The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Preprint, 2013.

[4] W. Deng, M. Lai, Z. Peng, and W. Yin. Parallel multi-block ADMM with o(1/k) convergence. Preprint, 2014.

[5] W. Deng and W. Yin. On the global and linear convergence of the generalized alternating direction method of multipliers. Technical report, Rice University CAAM, 2012.

[6] J. Douglas and H. H. Rachford. On the numerical solution of the heat conduction problem in 2 and 3 space variables. Transactions of the American Mathematical Society, 82:421-439, 1956.

[7] J. Eckstein. Splitting methods for monotone operators with applications to parallel optimization. PhD thesis, Massachusetts Institute of Technology, 1989.

[8] J. Eckstein. Augmented Lagrangian and alternating direction methods for convex optimization: A tutorial and some illustrative computational results. Preprint, 2012.

[9] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55:293-318, 1992.

[10] M. Fortin and R. Glowinski. Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland, 1983.

[11] D. Gabay. Applications of the method of multipliers to variational inequalities. In M. Fortin and R. Glowinski, editors, Augmented Lagrangian Methods: Applications to the Solution of Boundary Value Problems. North-Holland, Amsterdam, 1983.

[12] R. Glowinski and P. Le Tallec. Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia, Pennsylvania, 1989.

[13] D. Han and X. Yuan. A note on the alternating direction method of multipliers. Journal of Optimization Theory and Applications, 155(1):227-238, 2012.

[14] B. He, L. Hou, and X. Yuan. On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. Preprint, 2013.

[15] B. He, M. Tao, and X. Yuan. Alternating direction method with Gaussian back substitution for separable convex programming. SIAM Journal on Optimization, 22:313-340, 2012.

[16] B. He, M. Tao, and X. Yuan. Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Preprint, 2013.

[17] B. He and X. Yuan. On nonergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Preprint, 2012.

[18] B. He and X. Yuan. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM Journal on Numerical Analysis, 50:700-709, 2012.

[19] M. Hong and Z. Luo. On the linear convergence of the alternating direction method of multipliers. Preprint, 2012.

[20] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16:964-979, 1979.

[21] R. D. C. Monteiro and B. F. Svaiter. Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM Journal on Optimization, 23:475-507, 2013.

[22] D. H. Peaceman and H. H. Rachford. The numerical solution of parabolic and elliptic differential equations. SIAM Journal on Applied Mathematics, 3:28-41, 1955.