PG EXTRA Appendix Small o 1 K v0.1

A Note: the Non-Ergodic $o(1/k)$ Rate of PG-EXTRA
Wei Shi
Abstract
This note provides a detailed proof of the non-ergodic $o(1/k)$ rate of PG-EXTRA [1], including EXTRA [2], under a general convexity assumption.
I. PRELIMINARIES
To make this note self-contained, we state the basic problem formulation, the algorithms, the assumptions, and the supporting lemmas in this section.
A. Consensus Optimization
We consider a connected network of $n$ agents that cooperatively solve the consensus optimization problem
$$\underset{x \in \mathbb{R}^p}{\text{minimize}} \quad \bar{f}(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x), \quad \text{where } f_i(x) := s_i(x) + r_i(x), \tag{1}$$
and $s_i, r_i : \mathbb{R}^p \to \mathbb{R}$ are, respectively, a convex differentiable function and a convex, possibly non-differentiable function, both kept private by agent $i$.
B. PG-EXTRA and P-EXTRA
We develop the algorithm PG-EXTRA to solve problem (1). PG-EXTRA is outlined in Algorithm 1.
Algorithm 1: PG-EXTRA
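Set mixing matrices $W \in \mathbb{R}^{n \times n}$ and $\tilde{W} \in \mathbb{R}^{n \times n}$;
Choose step size $\alpha > 0$;
1. For all agents $i$, pick any initial state $x^0_{(i)} \in \mathbb{R}^p$, and
   $\mathbf{x}^{1/2} = W \mathbf{x}^0 - \alpha \nabla \mathbf{s}(\mathbf{x}^0)$;
   $\mathbf{x}^1 = \arg\min_{\mathbf{x}} \ \mathbf{r}(\mathbf{x}) + \frac{1}{2\alpha} \|\mathbf{x} - \mathbf{x}^{1/2}\|_F^2$;
2. for $k = 0, 1, \ldots$, for all agents $i$, do
   $\mathbf{x}^{k+1+1/2} = W \mathbf{x}^{k+1} + \mathbf{x}^{k+1/2} - \tilde{W} \mathbf{x}^k - \alpha [\nabla \mathbf{s}(\mathbf{x}^{k+1}) - \nabla \mathbf{s}(\mathbf{x}^k)]$;
   $\mathbf{x}^{k+2} = \arg\min_{\mathbf{x}} \ \mathbf{r}(\mathbf{x}) + \frac{1}{2\alpha} \|\mathbf{x} - \mathbf{x}^{k+1+1/2}\|_F^2$;
end for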
W. Shi is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Email:
wilburs@illinois.edu.
October 27, 2015
DRAFT
When $\mathbf{r} = 0$, all proximal steps vanish and PG-EXTRA reduces to EXTRA [2]; when $\mathbf{s} = 0$, all explicit gradient computations vanish and PG-EXTRA reduces to P-EXTRA, short for proximal EXTRA.
The two mixing matrices, $W$ and $\tilde{W}$, satisfy Assumption 1 stated in the next subsection.
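The updates of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the row-stacked storage of the local copies $x_{(i)}$, the function names, the three-agent path graph, the data `b`, and the choices $s_i(x) = \frac{1}{2}\|x - b_i\|_2^2$ and $r_i(x) = \lambda \|x\|_1$ are all assumptions made for this example only.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1, applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pg_extra(W, W_tilde, grad_s, prox_r, x0, alpha, num_iters):
    """Run Algorithm 1; row i of x is agent i's local copy x_(i)."""
    x_prev = x0
    x_half = W @ x_prev - alpha * grad_s(x_prev)        # x^{1/2}
    x_curr = prox_r(x_half, alpha)                      # x^1
    for _ in range(num_iters):
        # x^{k+1+1/2} = W x^{k+1} + x^{k+1/2} - W~ x^k
        #               - alpha [grad s(x^{k+1}) - grad s(x^k)]
        x_half = (W @ x_curr + x_half - W_tilde @ x_prev
                  - alpha * (grad_s(x_curr) - grad_s(x_prev)))
        x_prev, x_curr = x_curr, prox_r(x_half, alpha)  # x^{k+2}
    return x_curr

# Illustrative data (assumptions): 3 agents on a path graph, p = 2,
# s_i(x) = 0.5 * ||x - b_i||^2 (so L_s = 1) and r_i(x) = lam * ||x||_1.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
W_tilde = (np.eye(3) + W) / 2
b = np.array([[1.0, -2.0], [2.0, 0.0], [3.0, -1.0]])
lam = 0.5
alpha = 0.5  # lambda_min(W_tilde) = 0.5 here, so 2 * lambda_min / L_s = 1

x_final = pg_extra(W, W_tilde,
                   grad_s=lambda x: x - b,
                   prox_r=lambda v, a: soft_threshold(v, a * lam),
                   x0=np.zeros_like(b), alpha=alpha, num_iters=300)

# For this toy problem the minimizer has a closed form:
# soft-thresholding of the average of the b_i.
x_star = soft_threshold(b.mean(axis=0), lam)
```

For this toy instance every row of `x_final` should agree with `x_star` up to numerical precision, which gives a quick sanity check of both consensus and optimality. Setting `prox_r` to the identity map recovers the EXTRA special case.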
C. Assumptions
Unless otherwise stated, the convergence results in this section are given under Assumptions 1-3.
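Assumption 1 (Mixing matrices). Consider a connected network $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ consisting of a set of agents $\mathcal{V} = \{1, 2, \ldots, n\}$ and a set of undirected edges $\mathcal{E}$. An unordered pair $(i, j) \in \mathcal{E}$ if agents $i$ and $j$ have a direct communication link. The mixing matrices $W = [w_{ij}] \in \mathbb{R}^{n \times n}$ and $\tilde{W} = [\tilde{w}_{ij}] \in \mathbb{R}^{n \times n}$ satisfy
1) (Decentralization property) If $i \neq j$ and $(i, j) \notin \mathcal{E}$, then $w_{ij} = \tilde{w}_{ij} = 0$.
2) (Symmetry property) $W = W^T$, $\tilde{W} = \tilde{W}^T$.
3) (Null space property) $\mathrm{null}\{W - \tilde{W}\} = \mathrm{span}\{\mathbf{1}\}$; $\mathrm{null}\{I - \tilde{W}\} \supseteq \mathrm{span}\{\mathbf{1}\}$.
4) (Spectral property) $\tilde{W} \succ 0$ and $\frac{I + W}{2} \succeq \tilde{W} \succeq W$.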
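As an illustration of Assumption 1, the sketch below builds one admissible pair of mixing matrices and checks the four properties numerically. The Metropolis weight rule, the choice $\tilde{W} = (I + W)/2$, the five-agent ring network, and the helper names are assumptions made for this example; the assumption itself does not prescribe them.

```python
import numpy as np

def metropolis_weights(n, edges):
    """Symmetric, doubly stochastic W from the Metropolis rule."""
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
    W[np.diag_indices(n)] = 1.0 - W.sum(axis=1)  # rows sum to one
    return W

def check_assumption_1(W, W_tilde, edges, tol=1e-10):
    """Return one boolean per property of Assumption 1."""
    n = W.shape[0]
    I, ones = np.eye(n), np.ones(n)
    linked = np.eye(n, dtype=bool)               # diagonal is always allowed
    for i, j in edges:
        linked[i, j] = linked[j, i] = True
    gap = np.linalg.eigvalsh(W_tilde - W)        # eigenvalues, ascending
    return {
        "decentralization": bool(np.all(W[~linked] == 0)
                                 and np.all(W_tilde[~linked] == 0)),
        "symmetry": bool(np.allclose(W, W.T)
                         and np.allclose(W_tilde, W_tilde.T)),
        "null_space": bool(abs(gap[0]) < tol and gap[1] > tol
                           and np.allclose((W_tilde - W) @ ones, 0)
                           and np.allclose((I - W_tilde) @ ones, 0)),
        "spectral": bool(np.linalg.eigvalsh(W_tilde)[0] > tol
                         and np.linalg.eigvalsh((I + W) / 2 - W_tilde)[0] > -tol
                         and gap[0] > -tol),
    }

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]     # ring network (illustrative)
W = metropolis_weights(n, edges)
W_tilde = (np.eye(n) + W) / 2
checks = check_assumption_1(W, W_tilde, edges)
```

With this choice of $\tilde{W}$, the bound $\frac{I+W}{2} \succeq \tilde{W}$ in the spectral property holds with equality, so the check on that term passes only up to the tolerance `tol`.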
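Assumption 2 (Convex objective with partial Lipschitz gradients). For all $i$, the functions $r_i$ and $s_i$ are proper closed convex, and $s_i$ satisfies
$$\|\nabla s_i(x) - \nabla s_i(y)\|_2 \le L_{s_i} \|x - y\|_2, \quad \forall x, y \in \mathbb{R}^p,$$
where the $L_{s_i} > 0$ are constants.

Following Assumption 2, $\mathbf{f}(\mathbf{x}) = \sum_{i=1}^{n} f_i(x_{(i)})$ is proper closed convex and $\nabla \mathbf{s}$ satisfies
$$\|\nabla \mathbf{s}(\mathbf{x}) - \nabla \mathbf{s}(\mathbf{y})\|_F \le L_s \|\mathbf{x} - \mathbf{y}\|_F, \quad \forall \mathbf{x}, \mathbf{y} \in \mathbb{R}^{n \times p},$$
with constant $L_s = \max_i \{L_{s_i}\}$.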

Assumption 3 (Solution existence). The solution set $\mathcal{X}^*$ of problem (1) is nonempty.
D. Supporting Lemmas
We first give a lemma on the first-order optimality conditions of problem (1).
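Lemma 1 (First-order optimality conditions). Given mixing matrices $W$ and $\tilde{W}$ and the economical-form singular value decomposition $\tilde{W} - W = V S V^T$, define $U := V S^{1/2} V^T = (\tilde{W} - W)^{1/2} \in \mathbb{R}^{n \times n}$. Then, under Assumptions 1-3, the following two statements are equivalent:
• $\mathbf{x}^* \in \mathbb{R}^{n \times p}$ is consensual, that is, $x^*_{(1)} = x^*_{(2)} = \cdots = x^*_{(n)}$, and $x^*_{(1)}$ is optimal to problem (1);
• There exist $\mathbf{q}^* = U \mathbf{p}$ for some $\mathbf{p} \in \mathbb{R}^{n \times p}$ and a subgradient $\tilde{\nabla} \mathbf{r}(\mathbf{x}^*) \in \partial \mathbf{r}(\mathbf{x}^*)$ such that
$$U \mathbf{q}^* + \alpha \left( \nabla \mathbf{s}(\mathbf{x}^*) + \tilde{\nabla} \mathbf{r}(\mathbf{x}^*) \right) = 0, \tag{2a}$$
$$U \mathbf{x}^* = 0. \tag{2b}$$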
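To make the role of $U$ in condition (2b) concrete, the following sketch computes $U = (\tilde{W} - W)^{1/2}$ through a symmetric eigendecomposition and confirms that $U \mathbf{x}$ vanishes for a consensual $\mathbf{x}$ but not for a non-consensual one. The helper `sqrt_psd`, the three-agent path graph, and the test points are assumptions chosen for illustration.

```python
import numpy as np

def sqrt_psd(M, tol=1e-12):
    """Symmetric square root V S^{1/2} V^T of a PSD matrix.

    Eigenvalues below tol are treated as exactly zero, which mimics
    the economical-form decomposition used to define U.
    """
    vals, vecs = np.linalg.eigh(M)
    vals = np.where(vals > tol, vals, 0.0)
    return (vecs * np.sqrt(vals)) @ vecs.T

# Path graph on 3 agents with Metropolis weights (illustrative choice).
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
W_tilde = (np.eye(3) + W) / 2
U = sqrt_psd(W_tilde - W)

x_consensual = np.tile([1.5, -0.5], (3, 1))      # identical rows
x_disagree = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])

residual_consensual = np.linalg.norm(U @ x_consensual)
residual_disagree = np.linalg.norm(U @ x_disagree)
```

Because $\mathrm{null}\{U\} = \mathrm{null}\{\tilde{W} - W\} = \mathrm{span}\{\mathbf{1}\}$ under Assumption 1, `residual_consensual` is zero up to round-off while `residual_disagree` is strictly positive.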
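Let $\mathbf{x}^*$ and $\mathbf{q}^*$ satisfy the optimality conditions (2a) and (2b). Introduce the auxiliary sequence
$$\mathbf{q}^k := \sum_{t=0}^{k} U \mathbf{x}^t.$$
The next lemma restates the updates of PG-EXTRA in terms of $\mathbf{x}^k$, $\mathbf{q}^k$, $\mathbf{x}^*$, and $\mathbf{q}^*$ for convergence analysis.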
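Lemma 2 (Recursive relations of PG-EXTRA). In PG-EXTRA, the quadruple sequence $\{\mathbf{x}^k, \mathbf{q}^k, \mathbf{x}^*, \mathbf{q}^*\}$ obeys
$$(I + W - 2\tilde{W}) \mathbf{x}^{k+1} + \tilde{W} (\mathbf{x}^{k+1} - \mathbf{x}^k) = -U \mathbf{q}^{k+1} - \alpha \nabla \mathbf{s}(\mathbf{x}^k) - \alpha \tilde{\nabla} \mathbf{r}(\mathbf{x}^{k+1}) \tag{3}$$
and
$$(I + W - 2\tilde{W}) (\mathbf{x}^{k+1} - \mathbf{x}^*) + \tilde{W} (\mathbf{x}^{k+1} - \mathbf{x}^k) = -U (\mathbf{q}^{k+1} - \mathbf{q}^*) - \alpha [\nabla \mathbf{s}(\mathbf{x}^k) - \nabla \mathbf{s}(\mathbf{x}^*)] - \alpha [\tilde{\nabla} \mathbf{r}(\mathbf{x}^{k+1}) - \tilde{\nabla} \mathbf{r}(\mathbf{x}^*)], \tag{4}$$
for any $k = 0, 1, \ldots$.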
II. NON-ERGODIC $o(1/k)$ CONVERGENCE OF PG-EXTRA
In Lemma 3 of reference [1], we prove that the successive difference $\|\mathbf{z}^k - \mathbf{z}^{k+1}\|_G^2$ of P-EXTRA is monotonic, which gives the non-ergodic $o(1/k)$ rate. In PG-EXTRA and EXTRA, however, the explicit gradient step and the error correction make the iteration more complicated than basic gradient descent, so it was not clear whether the successive difference $\|\mathbf{z}^k - \mathbf{z}^{k+1}\|_G^2$ is still monotonic, and hence whether PG-EXTRA and EXTRA attain the non-ergodic $o(1/k)$ rate.
We recently found that the monotonicity of the successive difference $\|\mathbf{z}^k - \mathbf{z}^{k+1}\|_G^2$ in PG-EXTRA is provable (see Lemma 3 below). This property improves the convergence rate of PG-EXTRA and EXTRA from ergodic $O(1/k)$ to non-ergodic $o(1/k)$. The technique used in the proof is basic and may provide insight into rate proofs for other explicit-gradient-based optimization algorithms.
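The convergence analysis of PG-EXTRA is based on (3) and (4). Define
$$\mathbf{z}^k := \begin{pmatrix} \mathbf{q}^k \\ \mathbf{x}^k \end{pmatrix}, \quad \mathbf{z}^* := \begin{pmatrix} \mathbf{q}^* \\ \mathbf{x}^* \end{pmatrix}, \quad G := \begin{pmatrix} I & 0 \\ 0 & \tilde{W} \end{pmatrix}.$$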
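Lemma 3. Under the same assumptions as in Theorem 1 of [1], for any step size $\alpha \in \left( 0, \frac{2 \lambda_{\min}(\tilde{W})}{L_s} \right)$, the sequence $\{\mathbf{z}^k\}$ generated by PG-EXTRA satisfies
$$\|\mathbf{z}^{k+1} - \mathbf{z}^{k+2}\|_G^2 \le \|\mathbf{z}^k - \mathbf{z}^{k+1}\|_G^2, \quad k = 0, 1, \ldots. \tag{5}$$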
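Proof: To ease the description of the proof, let us define
$$\Delta \mathbf{x}^{k+1} := \mathbf{x}^k - \mathbf{x}^{k+1}, \quad \Delta \mathbf{q}^{k+1} := \mathbf{q}^k - \mathbf{q}^{k+1}, \quad \Delta \mathbf{z}^{k+1} := \mathbf{z}^k - \mathbf{z}^{k+1},$$
$$\Delta \tilde{\nabla} \mathbf{r}(\mathbf{x}^{k+1}) := \tilde{\nabla} \mathbf{r}(\mathbf{x}^k) - \tilde{\nabla} \mathbf{r}(\mathbf{x}^{k+1}), \quad \text{and} \quad \Delta \nabla \mathbf{s}(\mathbf{x}^{k+1}) := \nabla \mathbf{s}(\mathbf{x}^k) - \nabla \mathbf{s}(\mathbf{x}^{k+1}).$$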

By the convexity of $\mathbf{r}$, we have
$$0 \le \langle \Delta \mathbf{x}^{k+1}, \Delta \tilde{\nabla} \mathbf{r}(\mathbf{x}^{k+1}) \rangle. \tag{6}$$
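By the convexity of $\mathbf{s}$ and the Lipschitz continuity of $\nabla \mathbf{s}$, we have
$$\begin{aligned}
\frac{1}{L_s} \|\Delta \nabla \mathbf{s}(\mathbf{x}^k)\|_F^2
&\le \langle \Delta \mathbf{x}^k, \Delta \nabla \mathbf{s}(\mathbf{x}^k) \rangle \\
&= \langle \Delta \mathbf{x}^{k+1}, \Delta \nabla \mathbf{s}(\mathbf{x}^k) \rangle + \langle \Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}, \Delta \nabla \mathbf{s}(\mathbf{x}^k) \rangle \\
&\le \langle \Delta \mathbf{x}^{k+1}, \Delta \nabla \mathbf{s}(\mathbf{x}^k) \rangle + \frac{L_s}{4} \|\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}\|_F^2 + \frac{1}{L_s} \|\Delta \nabla \mathbf{s}(\mathbf{x}^k)\|_F^2,
\end{aligned} \tag{7}$$
that is,
$$0 \le \langle \Delta \mathbf{x}^{k+1}, \Delta \nabla \mathbf{s}(\mathbf{x}^k) \rangle + \frac{L_s}{4} \|\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}\|_F^2. \tag{8}$$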
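The two inequalities in (7) rest on standard facts, spelled out here as a short worked step. The first is the co-coercivity of the gradient of a convex function with $L_s$-Lipschitz gradient. The second is Young's inequality with a particular weight; the shorthand $a$ and $b$ below is introduced only for this illustration.

```latex
% Shorthand (for this illustration only):
%   a := \Delta x^k - \Delta x^{k+1},   b := \Delta \nabla s(x^k).
\begin{align*}
0 &\le \Big\| \tfrac{\sqrt{L_s}}{2}\, a - \tfrac{1}{\sqrt{L_s}}\, b \Big\|_F^2
   = \tfrac{L_s}{4} \|a\|_F^2 - \langle a, b \rangle + \tfrac{1}{L_s} \|b\|_F^2, \\
\text{hence} \quad
\langle a, b \rangle &\le \tfrac{L_s}{4} \|a\|_F^2 + \tfrac{1}{L_s} \|b\|_F^2 .
\end{align*}
```

The weight is chosen so that the term $\frac{1}{L_s} \|\Delta \nabla \mathbf{s}(\mathbf{x}^k)\|_F^2$ appears with the same coefficient on both sides of (7) and cancels, which leaves (8).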
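Taking the difference of two consecutive iterations of (3) gives
$$\alpha \Delta \tilde{\nabla} \mathbf{r}(\mathbf{x}^{k+1}) + \alpha \Delta \nabla \mathbf{s}(\mathbf{x}^k) + U \Delta \mathbf{q}^{k+1} + (I + W - 2\tilde{W}) \Delta \mathbf{x}^{k+1} + \tilde{W} (\Delta \mathbf{x}^{k+1} - \Delta \mathbf{x}^k) = 0. \tag{9}$$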
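Taking the inner product of (9) with $\Delta \mathbf{x}^{k+1}$ and combining the result with (6) and (8), it follows that
$$\frac{\alpha L_s}{4} \|\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}\|_F^2 - \langle \Delta \mathbf{x}^{k+1}, U \Delta \mathbf{q}^{k+1} \rangle + \langle \Delta \mathbf{x}^{k+1}, \tilde{W} (\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}) \rangle \ge \|\Delta \mathbf{x}^{k+1}\|_{I + W - 2\tilde{W}}^2. \tag{10}$$
From the definition of $\mathbf{q}^k$, we know that $\Delta \mathbf{q}^{k+1} = -U \mathbf{x}^{k+1}$, thus
$$\Delta \mathbf{q}^k - \Delta \mathbf{q}^{k+1} = -U \Delta \mathbf{x}^{k+1}. \tag{11}$$
Substituting (11) into (10) and using the symmetry of $U$, it follows that
$$\frac{\alpha L_s}{4} \|\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}\|_F^2 + \langle \Delta \mathbf{q}^k - \Delta \mathbf{q}^{k+1}, \Delta \mathbf{q}^{k+1} \rangle + \langle \Delta \mathbf{x}^{k+1}, \tilde{W} (\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}) \rangle \ge \|\Delta \mathbf{x}^{k+1}\|_{I + W - 2\tilde{W}}^2, \tag{12}$$
or equivalently,
$$\frac{\alpha L_s}{4} \|\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}\|_F^2 + \langle \Delta \mathbf{z}^k - \Delta \mathbf{z}^{k+1}, G \Delta \mathbf{z}^{k+1} \rangle \ge \|\Delta \mathbf{x}^{k+1}\|_{I + W - 2\tilde{W}}^2. \tag{13}$$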
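Applying the basic equality $2 \langle \Delta \mathbf{z}^k - \Delta \mathbf{z}^{k+1}, G \Delta \mathbf{z}^{k+1} \rangle = \|\Delta \mathbf{z}^k\|_G^2 - \|\Delta \mathbf{z}^{k+1}\|_G^2 - \|\Delta \mathbf{z}^k - \Delta \mathbf{z}^{k+1}\|_G^2$ to (13), we finally get
$$\|\Delta \mathbf{z}^k\|_G^2 - \|\Delta \mathbf{z}^{k+1}\|_G^2 \ge \|\Delta \mathbf{q}^k - \Delta \mathbf{q}^{k+1}\|_F^2 + \|\Delta \mathbf{x}^k - \Delta \mathbf{x}^{k+1}\|_{\tilde{W} - \frac{\alpha L_s}{2} I}^2 + 2 \|\Delta \mathbf{x}^{k+1}\|_{I + W - 2\tilde{W}}^2 \ge 0, \tag{14}$$
where the last inequality holds because $\tilde{W} - \frac{\alpha L_s}{2} I \succeq 0$ for the chosen step size and $I + W - 2\tilde{W} \succeq 0$ by the spectral property in Assumption 1. This means (5) holds and completes the proof.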
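The monotonicity in (5) can also be observed numerically. The sketch below runs PG-EXTRA on a small synthetic instance and records $d_k = \|\mathbf{z}^k - \mathbf{z}^{k+1}\|_G^2$, using $\|\mathbf{q}^k - \mathbf{q}^{k+1}\|_F^2 = \langle \mathbf{x}^{k+1}, (\tilde{W} - W) \mathbf{x}^{k+1} \rangle$ so that $U$ never has to be formed. The random least-squares data, the $\ell_1$ regularizer, the path graph, and the step size $\alpha = \lambda_{\min}(\tilde{W}) / L_s$ are assumptions made for this illustration, and a numerical run is of course no substitute for the proof.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1, applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def successive_differences(W, W_tilde, grad_s, prox_r, x0, alpha, num_iters):
    """Run PG-EXTRA; return d_k = ||z^k - z^{k+1}||_G^2, k = 0..num_iters."""
    gap = W_tilde - W                                # equals U @ U
    x_prev = x0
    x_half = W @ x_prev - alpha * grad_s(x_prev)     # x^{1/2}
    x_curr = prox_r(x_half, alpha)                   # x^1
    diffs = []
    for _ in range(num_iters + 1):
        dx = x_prev - x_curr
        dq_sq = np.sum(x_curr * (gap @ x_curr))      # ||q^k - q^{k+1}||_F^2
        dx_sq = np.sum(dx * (W_tilde @ dx))          # ||x^k - x^{k+1}||_{W~}^2
        diffs.append(dq_sq + dx_sq)
        x_half = (W @ x_curr + x_half - W_tilde @ x_prev
                  - alpha * (grad_s(x_curr) - grad_s(x_prev)))
        x_prev, x_curr = x_curr, prox_r(x_half, alpha)
    return np.array(diffs)

# Synthetic instance (assumptions): s_i(x) = 0.5 * ||A_i x - b_i||^2,
# r_i(x) = lam * ||x||_1, 3 agents on a path graph, p = 2.
rng = np.random.default_rng(0)
n, m, p, lam = 3, 4, 2, 0.1
A = rng.standard_normal((n, m, p))
b = rng.standard_normal((n, m))

def grad_s(x):
    residual = np.einsum("imp,ip->im", A, x) - b     # A_i x_(i) - b_i
    return np.einsum("imp,im->ip", A, residual)      # A_i^T residual_i

W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
W_tilde = (np.eye(n) + W) / 2
L_s = max(np.linalg.norm(A[i], 2) ** 2 for i in range(n))
alpha = np.linalg.eigvalsh(W_tilde)[0] / L_s         # half the upper bound

diffs = successive_differences(
    W, W_tilde, grad_s,
    prox_r=lambda v, a: soft_threshold(v, a * lam),
    x0=np.zeros((n, p)), alpha=alpha, num_iters=50)
is_monotone = bool(np.all(diffs[1:] <= diffs[:-1] + 1e-12))
```

The small additive slack in `is_monotone` only guards against floating-point round-off once the differences become tiny.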
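Theorem 1. In the same setting as Lemma 3, the following rates hold for PG-EXTRA:
(i) Successive difference:
$$\|\mathbf{z}^k - \mathbf{z}^{k+1}\|_G^2 = o\left( \frac{1}{k} \right);$$
(iv) Optimality residuals:
$$\left\| U \mathbf{q}^k + \alpha \left( \nabla \mathbf{s}(\mathbf{x}^k) + \tilde{\nabla} \mathbf{r}(\mathbf{x}^{k+1}) \right) \right\|_{\tilde{W}}^2 = o\left( \frac{1}{k} \right) \quad \text{and} \quad \|U \mathbf{x}^k\|_F^2 = o\left( \frac{1}{k} \right).$$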
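The step from monotonicity to the $o(1/k)$ rate in (i) follows a standard argument, sketched below. The shorthand $a_k$ is introduced only for this illustration, and the summability of the successive differences is taken here as an assumption; we expect it to follow from the convergence analysis in [1], but it is not re-derived in this note.

```latex
% Shorthand: a_k := \| z^k - z^{k+1} \|_G^2 \ge 0.
% Assumption: \sum_{t=0}^{\infty} a_t < \infty.
% Lemma 3:    a_{k+1} \le a_k for all k.
\begin{align*}
k \, a_{2k} \;\le\; \sum_{t=k+1}^{2k} a_t \;\longrightarrow\; 0
\quad \text{as } k \to \infty,
\end{align*}
% since the tail sums of a convergent series vanish. Hence a_{2k} = o(1/k),
% and a_{2k+1} \le a_{2k} extends the bound to odd indices: a_k = o(1/k).
```

Monotonicity is what allows the sum over the last $k$ terms to be bounded below by $k$ times its final term; without it, summability alone would only give a running-best rate.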
REFERENCES
[1] W. Shi, Q. Ling, G. Wu, and W. Yin, "A Proximal Gradient Algorithm for Decentralized Composite Optimization," UCLA CAM Report 15-17, 2015.
[2] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization," arXiv preprint arXiv:1404.6264, 2014.
