Chapter 9: Parallel and Distributed Algorithms

1 Global Consensus

Consider the problem
$$\min_{x\in\mathbb{R}^n} f(x) = \sum_{j=1}^N f_j(x). \qquad (1.1)$$
For example, $x$ represents the parameters in a model and $f_j$ represents the loss function associated with the $j$th block of data or measurements. In this case we say that $x$ is found by collaborative filtering, since the data sources are "collaborating" to develop a global model.
A related composite form is
$$\min_x f(x) = g(x) + h(x).$$
The problem can be rewritten with local variables $x_j \in \mathbb{R}^n$ and a common global variable $z \in \mathbb{R}^n$:
$$\min_{\tilde x,\, z}\; \sum_j f_j(x_j) \quad \text{s.t.}\quad x_j - z = 0,\; j = 1,\dots,N, \qquad (1.2)$$
where $\tilde x = (x_1, x_2, \dots, x_N)^\top \in \mathbb{R}^{nN}$. This is called the global consensus problem, since the constraint is that all the local variables should agree, i.e., be equal:
$$x_1 - z = 0, \quad x_2 - z = 0, \quad \dots, \quad x_N - z = 0. \qquad (1.3)$$
ADMM for (1.2) can be derived either directly from the augmented Lagrangian
$$L_\rho(x_1,\dots,x_N;\, z) = \sum_j \Big( f_j(x_j) + \langle y_j,\, x_j - z\rangle + \frac{\rho}{2}\|x_j - z\|_2^2 \Big),$$
or simply as a special case of the constrained optimization problem
$$\min_{x\in C} f(x) = \sum_i f_i(x_i), \qquad (1.4)$$
$$C = \big\{(x_1,\dots,x_N) \in \mathbb{R}^{nN} : x_1 = x_2 = x_3 = \dots = x_N\big\}. \qquad (1.5)$$
The resulting ADMM algorithm is as follows.
$$\begin{aligned}
x_i^{k+1} &= \operatorname*{arg\,min}_{x_i}\; f_i(x_i) + \frac{\rho}{2}\big\|x_i - z^k + y_i^k/\rho\big\|_2^2 = \operatorname{Prox}_{f_i/\rho}\big(z^k - y_i^k/\rho\big), \quad i = 1,\dots,N,\\
z^{k+1} &= \operatorname*{arg\,min}_{z}\; \sum_i \frac{\rho}{2}\big\|x_i^{k+1} - z + y_i^k/\rho\big\|_2^2 = \frac{1}{N}\sum_{j=1}^N \big(x_j^{k+1} + y_j^k/\rho\big),\\
y_i^{k+1} &= y_i^k + \rho\big(x_i^{k+1} - z^{k+1}\big), \quad i = 1,\dots,N.
\end{aligned} \qquad (1.6)$$
The processing element that handles the global variable $z$ is sometimes called the central collector or fusion center. Note that the $z$-update is simply the projection of $x^{k+1} + y^k/\rho$ onto the constraint set $C$ of "block constant" vectors, i.e., $z^{k+1}$ is the average of the $N$ elements $x_j^{k+1} + y_j^k/\rho$:
$$z^{k+1} = \frac{1}{N}\sum_{j=1}^N x_j^{k+1} + \frac{1}{N}\sum_{j=1}^N y_j^k/\rho = \bar x^{k+1} + \bar y^k/\rho, \qquad (1.7)$$
which is consensus ADMM. Here each pair $(x_i^{k+1}, y_i^{k+1})$ can be updated in parallel for $i = 1,\dots,N$, similar to a Jacobi-type iteration.
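As a concrete illustration, here is a minimal NumPy sketch of the updates (1.6)-(1.7), assuming quadratic local losses $f_i(x) = \frac{1}{2}\|A_i x - b_i\|_2^2$ so that each prox step has a closed form; the function name and toy data are illustrative, not from the original text.

```python
import numpy as np

def consensus_admm(A_list, b_list, rho=1.0, iters=100):
    """Consensus ADMM (1.6) for f_i(x) = 0.5*||A_i x - b_i||^2.

    For this f_i, Prox_{f_i/rho}(v) = (A_i^T A_i + rho I)^{-1}(A_i^T b_i + rho v).
    """
    N, n = len(A_list), A_list[0].shape[1]
    x = np.zeros((N, n))              # local variables x_i
    y = np.zeros((N, n))              # dual variables y_i
    z = np.zeros(n)                   # global variable z
    for _ in range(iters):
        # x-update: one prox per agent; these N solves can run in parallel
        for i in range(N):
            v = z - y[i] / rho
            x[i] = np.linalg.solve(A_list[i].T @ A_list[i] + rho * np.eye(n),
                                   A_list[i].T @ b_list[i] + rho * v)
        # z-update (1.7): average of x_i + y_i/rho, done at the fusion center
        z = (x + y / rho).mean(axis=0)
        # dual update
        y += rho * (x - z)
    return z

rng = np.random.default_rng(0)
A_list = [rng.standard_normal((5, 3)) for _ in range(4)]
b_list = [rng.standard_normal(5) for _ in range(4)]
print(consensus_admm(A_list, b_list))
```

Only the averaging step needs the fusion center; everything else is local to agent $i$.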
For consensus ADMM, the primal and dual residuals are
$$\begin{cases}
\gamma^k = \big(x_1^k - \bar x^k,\; \dots,\; x_N^k - \bar x^k\big)^\top,\\[2pt]
s^k = -\rho\,\big(\bar x^{k-1} - \bar x^k,\; \dots,\; \bar x^{k-1} - \bar x^k\big)^\top,
\end{cases} \qquad (1.11)$$
so that the squared norms are
$$\begin{cases}
\|\gamma^k\|_2^2 = \sum_{j=1}^N \|x_j^k - \bar x^k\|_2^2,\\[2pt]
\|s^k\|_2^2 = N\rho^2 \|\bar x^k - \bar x^{k-1}\|_2^2.
\end{cases} \qquad (1.12)$$
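In code, the residual norms (1.12), which typically drive the stopping test, can be computed directly from the stacked local iterates (a small sketch; the array-shape convention is an assumption):

```python
import numpy as np

def consensus_residuals(x, x_prev, rho):
    """Squared residual norms (1.12) for consensus ADMM.

    x, x_prev: (N, n) arrays of the local iterates at steps k and k-1.
    """
    xbar, xbar_prev = x.mean(axis=0), x_prev.mean(axis=0)
    r2 = np.sum((x - xbar) ** 2)                              # ||gamma^k||_2^2
    s2 = len(x) * rho ** 2 * np.sum((xbar - xbar_prev) ** 2)  # ||s^k||_2^2
    return r2, s2
```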
3 Sharing
The sharing problem is
$$\min_{x_1,\dots,x_N} \sum_{j=1}^N f_j(x_j) + g\Big(\sum_{j=1}^N x_j\Big), \qquad (3.1)$$
where $f_j$ is a local cost function for subsystem $j$, and $g$ is the shared objective. The sharing problem is important because (a) many useful problems can be put into this form, and (b) it enjoys a dual relationship with the consensus problem.
Sharing can be written in ADMM form by copying all variables:
$$\min_{x,z}\; \sum_j f_j(x_j) + g\Big(\sum_j z_j\Big) \quad \text{s.t.}\quad x_j - z_j = 0,\; j = 1,\dots,N, \qquad (3.2)$$
with $x = (x_1,\dots,x_N)^\top \in \mathbb{R}^{nN}$ and $z = (z_1,\dots,z_N)^\top \in \mathbb{R}^{nN}$. The ADMM iteration is
$$\begin{aligned}
x_j^{k+1} &= \operatorname*{arg\,min}_{x_j}\; f_j(x_j) + \frac{\rho}{2}\big\|x_j - z_j^k + y_j^k/\rho\big\|_2^2 = \operatorname{Prox}_{f_j/\rho}\big(z_j^k - y_j^k/\rho\big),\\
z^{k+1} &= \operatorname*{arg\,min}_{z}\; g\Big(\sum_j z_j\Big) + \frac{\rho}{2}\sum_j \big\|x_j^{k+1} - z_j + y_j^k/\rho\big\|_2^2,\\
y_j^{k+1} &= y_j^k + \rho\big(x_j^{k+1} - z_j^{k+1}\big).
\end{aligned} \qquad (3.3)$$
The $x_j$- and $y_j$-steps can be carried out independently, in parallel, for each $j = 1,\dots,N$.
The $z$-minimization decouples once the average $\bar z$ is fixed: writing $v_j^k = x_j^{k+1} + y_j^k/\rho$, the minimizer satisfies $z_j^{k+1} = v_j^k + \bar z^{k+1} - \bar v^k$ (equation (3.6)), and the dual update becomes
$$\begin{aligned}
y_j^{k+1} &= y_j^k + \rho\big(x_j^{k+1} - z_j^{k+1}\big)\\
&= y_j^k + \rho\big(x_j^{k+1} - v_j^k - \bar z^{k+1} + \bar v^k\big)\\
&= y_j^k + \rho\big(x_j^{k+1} - x_j^{k+1} - y_j^k/\rho - \bar z^{k+1} + \bar x^{k+1} + \bar y^k/\rho\big)\\
&= \bar y^k + \rho\big(\bar x^{k+1} - \bar z^{k+1}\big).
\end{aligned} \qquad (3.7)$$
Equation (3.7) shows that the dual variables $y_j^{k+1}$ are all equal (i.e., in consensus) and can be replaced by a single dual variable $y \in \mathbb{R}^n$:
$$y^{k+1} = y^k + \rho\big(\bar x^{k+1} - \bar z^{k+1}\big). \qquad (3.8)$$
Substituting the equivalent expression for $z_j$,
$$\begin{aligned}
z_j^{k+1} &= v_j^k + \bar z^{k+1} - \bar v^k\\
&= x_j^{k+1} + y_j^k/\rho + \bar z^{k+1} - \bar x^{k+1} - \bar y^k/\rho\\
&= x_j^{k+1} + y^k/\rho + \bar z^{k+1} - \bar x^{k+1} - y^k/\rho\\
&= x_j^{k+1} + \bar z^{k+1} - \bar x^{k+1},
\end{aligned} \qquad (3.9)$$
into the $x$-update, we get
$$\begin{aligned}
x_j^{k+1} &= \operatorname*{arg\,min}_{x_j}\; f_j(x_j) + \frac{\rho}{2}\big\|x_j - z_j^k + y^k/\rho\big\|_2^2\\
&= \operatorname*{arg\,min}_{x_j}\; f_j(x_j) + \frac{\rho}{2}\big\|x_j - \big(x_j^k + \bar z^k - \bar x^k - y^k/\rho\big)\big\|_2^2\\
&= \operatorname{Prox}_{f_j/\rho}\big(x_j^k + \bar z^k - \bar x^k - y^k/\rho\big).
\end{aligned} \qquad (3.10)$$
Combining (3.10), (3.6), and (3.8), we get the final algorithm:
$$\begin{aligned}
x_j^{k+1} &= \operatorname{Prox}_{f_j/\rho}\big(x_j^k + \bar z^k - \bar x^k - y^k/\rho\big),\\
\bar x^{k+1} &= \frac{1}{N}\sum_j x_j^{k+1},\\
\bar z^{k+1} &= \frac{1}{N}\operatorname{Prox}_{Ng/\rho}\Big(N\big(\bar x^{k+1} + y^k/\rho\big)\Big),\\
y^{k+1} &= y^k + \rho\big(\bar x^{k+1} - \bar z^{k+1}\big).
\end{aligned} \qquad (3.11)$$
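A minimal sketch of the simplified iteration (3.11), with illustrative quadratic choices $f_j(x) = \frac{1}{2}\|A_j x - b_j\|_2^2$ and $g(w) = \frac{\lambda}{2}\|w\|_2^2$ so that both prox steps are closed-form; in particular the $\bar z$-step solves $\min_{\bar z} g(N\bar z) + \frac{N\rho}{2}\|\bar z - \bar x^{k+1} - y^k/\rho\|_2^2$, which here gives $\bar z = \rho(\bar x^{k+1} + y^k/\rho)/(\lambda N + \rho)$:

```python
import numpy as np

def sharing_admm(A_list, b_list, lam=0.5, rho=1.0, iters=100):
    """Simplified sharing ADMM (3.11) for f_j(x) = 0.5*||A_j x - b_j||^2
    and g(w) = 0.5*lam*||w||^2 (illustrative choices)."""
    N, n = len(A_list), A_list[0].shape[1]
    x = np.zeros((N, n))
    y = np.zeros(n)                   # single shared dual variable
    xbar = np.zeros(n)
    zbar = np.zeros(n)
    for _ in range(iters):
        # x_j-update: Prox_{f_j/rho}(x_j + zbar - xbar - y/rho), in parallel
        for j in range(N):
            v = x[j] + zbar - xbar - y / rho
            x[j] = np.linalg.solve(A_list[j].T @ A_list[j] + rho * np.eye(n),
                                   A_list[j].T @ b_list[j] + rho * v)
        xbar = x.mean(axis=0)
        # zbar-update: closed form for g(w) = 0.5*lam*||w||^2
        zbar = rho * (xbar + y / rho) / (lam * N + rho)
        # dual update (3.8)
        y += rho * (xbar - zbar)
    return x, zbar

rng = np.random.default_rng(0)
x, zbar = sharing_admm([rng.standard_normal((6, 3)) for _ in range(4)],
                       [rng.standard_normal(6) for _ in range(4)])
```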
4 Duality between Sharing and Consensus

The dual function of the sharing problem (3.2), whose Lagrangian is $L = \sum_j f_j(x_j) + g\big(\sum_j z_j\big) + \sum_j \langle y_j,\, x_j - z_j\rangle$, is
$$\begin{aligned}
\Gamma(y_1,\dots,y_N) &= \inf_{x,z} L(x_1,\dots,x_N,\, z_1,\dots,z_N,\, y_1,\dots,y_N)\\
&= \inf_x \sum_{j=1}^N \big[f_j(x_j) + \langle y_j, x_j\rangle\big] + \inf_z \Big[g\Big(\sum_j z_j\Big) - \sum_j \langle y_j, z_j\rangle\Big]\\
&= -\sum_j f_j^*(-y_j) - \sup_z \Big(\sum_j \langle y_j, z_j\rangle - g\Big(\sum_j z_j\Big)\Big)\\
&= \begin{cases} -\sum_j f_j^*(-y_j) - g^*(w), & \text{if } y_1 = \dots = y_N = w,\\ -\infty, & \text{otherwise,} \end{cases}
\end{aligned} \qquad (4.1)$$
since the supremum over $z$ is finite only when all the $y_j$ coincide, in which case it equals $g^*$ of the common value.
Letting $\psi = g^*$ and $h_j(y_j) = f_j^*(-y_j)$, the dual problem can be written as
$$-\min_{y,w}\; \sum_j h_j(y_j) + \psi(w) \quad \text{s.t.}\quad y_j - w = 0,\; j = 1,\dots,N, \qquad (4.2)$$
which is exactly a global consensus problem in the dual variables.
5 Optimal Exchange
The exchange problem is $\min_{x_1,\dots,x_N} \sum_{j=1}^N f_j(x_j)$ subject to $\sum_{j=1}^N x_j = 0$. ADMM applied to this problem reduces to
$$\begin{aligned}
x_j^{k+1} &= \operatorname{Prox}_{f_j/\rho}\big(x_j^k - \bar x^k - y^k/\rho\big),\\
\bar x^{k+1} &= \frac{1}{N}\sum_j x_j^{k+1},\\
y^{k+1} &= y^k + \rho\,\bar x^{k+1}.
\end{aligned} \qquad (5.3)$$
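As a sketch, take the illustrative choice $f_j(x) = \frac{1}{2}\|x - c_j\|_2^2$ (agent $j$ prefers the allocation $c_j$, and the constraint $\sum_j x_j = 0$ clears the exchange); the shared dual variable $y$ can be interpreted as a price vector:

```python
import numpy as np

def exchange_admm(c, rho=1.0, iters=200):
    """Exchange ADMM (5.3) with f_j(x) = 0.5*||x - c_j||^2 (illustrative)."""
    N, n = c.shape
    x = np.zeros((N, n))
    y = np.zeros(n)                            # shared dual variable (prices)
    for _ in range(iters):
        v = x - x.mean(axis=0) - y / rho       # x_j^k - xbar^k - y^k/rho
        x = (c + rho * v) / (1.0 + rho)        # closed-form Prox_{f_j/rho}
        y += rho * x.mean(axis=0)              # y^{k+1} = y^k + rho*xbar^{k+1}
    return x, y

c = np.array([[1.0], [2.0], [-0.5]])
x, y = exchange_admm(c)
print(x.sum(axis=0))                           # approximately 0 at convergence
```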
6 Decentralized Consensus
When we solve the global consensus optimization problem
$$\min f(x) = \sum_{j=1}^N f_j(x), \qquad (6.1)$$
we use the constrained reformulation
$$\min_{x_1,\dots,x_N} \sum_{i=1}^N f_i(x_i) \quad \text{s.t.}\quad x_j - z = 0,\; j = 1,\dots,N, \qquad (6.2)$$
where $\hat x = (x_1,\dots,x_N)$.
Here $z$ is the central collector. However, in many scenarios the data is collected or stored in a distributed manner, and a fusion center is either disallowed or not economical. Consequently, any computing task must be accomplished in a decentralized and collaborative manner by the agents. This approach can be powerful and efficient, since (a) the computing tasks are distributed over all the agents, and (b) information exchange occurs only between agents with direct communication links. There is no risk of central computation overload or network congestion.
For example, consider the graph with vertex set $V = \{v_1,\dots,v_5\}$ and edge set $\mathcal{E} = \{(1,2), (1,5), (1,3), (2,5), (3,2), (3,4), (4,5)\}$.
7 Distributed ADMM
In the optimization problem (6.3), the consensus constraint $x_1 = x_2 = x_3 = \dots = x_N$ can be written as
$$x_1 - x_2 = 0, \quad x_2 - x_3 = 0, \quad \dots, \quad x_{N-1} - x_N = 0,$$
i.e., using the edge-node incidence matrix $A$,
$$Ax = 0,$$
where
$$A = \begin{pmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & \ddots & \ddots \\ & & 1 & -1 \end{pmatrix} \in \mathbb{R}^{(N-1)\times N}$$
is the incidence matrix of the chain (note that $A$ itself is not the Laplacian; the graph Laplacian is $A^\top A$).
Therefore, we obtain the problem
$$\min\; \sum_{j=1}^N f_j(x_j) \quad \text{s.t.}\quad Ax = 0, \qquad (7.1)$$
or, writing $A_j$ for the $j$th column of $A$,
$$\min_{x_1,\dots,x_N}\; \sum_{j=1}^N f_j(x_j) \quad \text{s.t.}\quad \sum_{j=1}^N A_j x_j = 0. \qquad (7.2)$$
Next, we give the distributed ADMM for solving (7.1). For agent $i$:
$$\begin{cases}
x_i^{k+1} = \operatorname*{arg\,min}_{x_i}\; f_i(x_i) + \dfrac{\rho}{2}\displaystyle\sum_{j\in N(i)} \big\|x_j^k - x_i + y_{ij}^k/\rho\big\|_2^2, & i = 1,\dots,N,\\[6pt]
y_{ij}^{k+1} = y_{ij}^k + \rho\big(x_j^{k+1} - x_i^{k+1}\big), & j \in N(i).
\end{cases} \qquad (7.3)$$
Splitting each neighborhood as $N_i = N_i^+ \cup N_i^-$ with
$$N_i^+ = \{j : (i,j) \in \mathcal{E},\; i < j\}, \qquad N_i^- = \{j : (i,j) \in \mathcal{E},\; i > j\}, \qquad (7.4)$$
we can design the following Gauss-Seidel distributed ADMM: sweeping over $i = 1,\dots,N$, the neighbors in $N_i^-$ have already been updated in the current sweep, so agent $i$'s optimality condition and dual update read
$$\begin{aligned}
0 &\in \partial f_i\big(x_i^{k+1}\big) + \rho\sum_{j\in N_i^-}\big(x_i^{k+1} - x_j^{k+1} - y_{ij}^k/\rho\big) + \rho\sum_{j\in N_i^+}\big(x_i^{k+1} - x_j^k - y_{ij}^k/\rho\big),\\
y_{ij}^{k+1} &= y_{ij}^k + \rho\big(x_j^{k+1} - x_i^{k+1}\big),
\end{aligned} \qquad (7.5)$$
where the $x_i^{k+1}$-update involves only the neighborhood $N_i$ of agent $i$.
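The following sketch runs the Gauss-Seidel scheme (7.5) on the example graph from Section 6, with illustrative scalar losses $f_i(x) = \frac{1}{2}(x - c_i)^2$ so that each subproblem is closed-form; because the sweep updates $x$ in place, agent $i$ automatically uses $x_j^{k+1}$ for already-updated neighbors ($j \in N_i^-$) and $x_j^k$ otherwise:

```python
import numpy as np

# Example graph from Section 6 (0-indexed): V = {1,...,5},
# E = {(1,2),(1,5),(1,3),(2,5),(3,2),(3,4),(4,5)}
edges = [(0, 1), (0, 4), (0, 2), (1, 4), (2, 1), (2, 3), (3, 4)]
N, rho = 5, 1.0
nbrs = [set() for _ in range(N)]
for i, j in edges:
    nbrs[i].add(j)
    nbrs[j].add(i)

c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # f_i(x) = 0.5*(x - c_i)^2
x = np.zeros(N)
y = {(i, j): 0.0 for i in range(N) for j in nbrs[i]}  # one dual per ordered pair

for k in range(100):
    # Gauss-Seidel sweep: closed-form minimizer of
    # 0.5*(x_i - c_i)^2 + (rho/2)*sum_{j in N_i} (x_j - x_i + y_ij/rho)^2
    for i in range(N):
        s = sum(rho * x[j] + y[(i, j)] for j in nbrs[i])
        x[i] = (c[i] + s) / (1.0 + rho * len(nbrs[i]))
    for i, j in y:
        y[(i, j)] += rho * (x[j] - x[i])      # y_ij += rho*(x_j - x_i)

print(x)   # entries should approach the consensus value mean(c) = 3.0
```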
For the variant (7.2) with a general coupling constraint $\sum_j A_j x_j = b$, the corresponding update is
$$x_i^{k+1} = \operatorname*{arg\,min}_{x_i}\; f_i(x_i) + \frac{\rho}{2}\Big\|\sum_{j\neq i} A_j x_j^k + A_i x_i - b + y^k/\rho\Big\|_2^2.$$
8 Decentralized Gradient Descent

At iteration $k$, every agent $i$ sends its current iterate $x_i^k$ to its neighbors $j \in N_i$ and receives their iterates $x_j^k$ in return. Then every agent $i$ executes the consensus update step
$$x_i^{k+1} = a_{ii} x_i^k + \sum_{j\in N_i} a_{ij} x_j^k = \sum_{j\in N_i\cup\{i\}} a_{ij} x_j^k,$$
where the weights satisfy $a_{ij} > 0$ for $j \in N_i \cup \{i\}$ and $\sum_{j\in N_i\cup\{i\}} a_{ij} = 1$.
Here the positive scalars $a_{ij}$, $j \in N_i \cup \{i\}$, are referred to as convex weights, and the vector $x_i^{k+1}$ is said to be a convex combination (or weighted average) of the points $x_j^k$, $j \in N_i \cup \{i\}$. Note that the weights $a_{ij}$, $j \in N_i \cup \{i\}$, are selected by agent $i$.
Let us define an $N \times N$ weight matrix $A$ with entries
$$a_{ij} \begin{cases} > 0, & j \in N_i \cup \{i\},\\ = 0, & j \notin N_i \cup \{i\}, \end{cases} \qquad \sum_{j\in N_i\cup\{i\}} a_{ij} = 1. \qquad (8.1)$$
The sum of each row of $A$ equals $1$, i.e., $\sum_{j=1}^N a_{ij} = 1$; we refer to such a matrix as row stochastic. The consensus update then becomes
$$x_i^{k+1} = \frac{1}{\sum_{j=1}^N a_{ij}} \sum_{j=1}^N a_{ij} x_j^k = \sum_{j=1}^N a_{ij} x_j^k. \qquad (8.2)$$
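One standard way to construct weights satisfying (8.1) that are also symmetric, and hence doubly stochastic as required below, is the Metropolis rule $a_{ij} = 1/(1 + \max(d_i, d_j))$ for $j \in N_i$, with the self-weight $a_{ii}$ absorbing the remainder of the row; a sketch:

```python
import numpy as np

def metropolis_weights(nbrs):
    """Symmetric, doubly stochastic weight matrix satisfying (8.1),
    built from a neighbor structure via the Metropolis rule."""
    N = len(nbrs)
    deg = [len(s) for s in nbrs]
    A = np.zeros((N, N))
    for i in range(N):
        for j in nbrs[i]:
            A[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        A[i, i] = 1.0 - A[i].sum()    # self-weight makes each row sum to 1
    return A

# the 5-node example graph again
nbrs = [{1, 2, 4}, {0, 2, 4}, {0, 1, 3}, {2, 4}, {0, 1, 3}]
A = metropolis_weights(nbrs)
print(A.sum(axis=0), A.sum(axis=1))   # all ones: doubly stochastic
```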
Combining the consensus step with a local gradient step, $x_i^{k+1} = \tilde x_i^k - \alpha_k \nabla f_i(\tilde x_i^k)$ with $\tilde x_i^k = \sum_j a_{ij} x_j^k$, and averaging over the $L$ agents gives
$$\begin{aligned}
\frac{1}{L}\sum_{i=1}^L x_i^{k+1} &= \frac{1}{L}\sum_{i=1}^L \tilde x_i^k - \frac{\alpha_k}{L}\sum_{i=1}^L \nabla f_i\big(\tilde x_i^k\big)\\
&= \frac{1}{L}\sum_{j=1}^L \Big(\sum_{i=1}^L a_{ij}\Big) x_j^k - \frac{\alpha_k}{L}\sum_{i=1}^L \nabla f_i\big(\tilde x_i^k\big).
\end{aligned} \qquad (8.5)$$
If the matrix $A$ is in addition column stochastic, i.e., doubly stochastic, with $\sum_{i=1}^L a_{ij} = 1$ for every $j$, then
$$\begin{cases}
\tilde x_i^k = \sum_j a_{ij} x_j^k,\\[2pt]
\bar x^{k+1} = \bar x^k - \dfrac{\alpha_k}{L}\sum_{j=1}^L \nabla f_j\big(\tilde x_j^k\big),
\end{cases} \qquad (8.6)$$
so the average iterate performs an (inexact) gradient step on $\frac{1}{L}\sum_j f_j$, with the gradients evaluated at the mixed local points $\tilde x_j^k$.
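Putting (8.2) and (8.6) together gives a decentralized gradient method: each agent first mixes with its neighbors and then takes a local gradient step. A minimal sketch with quadratic local losses, where the graph, Metropolis weights, step-size rule, and data are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
L, n = 5, 3
nbrs = [{1, 2, 4}, {0, 2, 4}, {0, 1, 3}, {2, 4}, {0, 1, 3}]   # example graph
deg = [len(s) for s in nbrs]
W = np.zeros((L, L))
for i in range(L):                    # Metropolis weights: doubly stochastic
    for j in nbrs[i]:
        W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

A = [rng.standard_normal((10, n)) for _ in range(L)]  # f_i = 0.5*||A_i x - b_i||^2
b = [rng.standard_normal(10) for _ in range(L)]
x = np.zeros((L, n))

for k in range(500):
    alpha = 1e-2 / np.sqrt(k + 1)     # diminishing step size alpha_k
    xt = W @ x                        # mixing step: xtilde_i = sum_j a_ij x_j
    grads = np.stack([A[i].T @ (A[i] @ xt[i] - b[i]) for i in range(L)])
    x = xt - alpha * grads            # local gradient step at xtilde_i

print(np.ptp(x, axis=0))   # spread across agents shrinks toward consensus
```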
Figure 2: Distributed Computing
9 Numerical Example

For agent $i$ on the graph of Figure 2, the distributed ADMM $x$-update reads
$$x_i^{k+1} = \operatorname*{arg\,min}_{x_i}\; f_i(x_i) + \frac{\rho}{2}\sum_{j\in N_i} \big\|x_j^k - x_i + y_{ij}^k/\rho\big\|_2^2, \qquad i = 1,\dots,L. \qquad (9.1)$$
We consider the problem
$$\min_{x\in\mathcal{X}} \sum_{i=1}^N f_i(x;\, a_i, b_i), \qquad (9.2)$$
where $\mathcal{X} = [-1,1]^n$ and the $f_i$ are the loss functions associated with the dataset. For example, the $f_i$ may be quadratic, i.e.,
$$\min_{x\in\mathcal{X}} \sum_{i=1}^N \big(a_i^\top x - b_i\big)^2. \qquad (9.3)$$
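As a sketch of how (9.3) might be solved in a decentralized way, each agent can hold one data pair $(a_i, b_i)$, run the mixing-plus-gradient iteration of Section 8, and enforce $x \in \mathcal{X} = [-1,1]^n$ by clipping after every step; the projection step and all parameters below are assumptions for illustration, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 5, 3
nbrs = [{1, 2, 4}, {0, 2, 4}, {0, 1, 3}, {2, 4}, {0, 1, 3}]
deg = [len(s) for s in nbrs]
W = np.zeros((N, N))
for i in range(N):                    # Metropolis weights again
    for j in nbrs[i]:
        W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

a = rng.standard_normal((N, n))       # agent i holds (a_i, b_i)
b = rng.standard_normal(N)
x = np.zeros((N, n))

for k in range(2000):
    alpha = 0.05 / np.sqrt(k + 1)
    xt = W @ x                                        # mixing step
    r = np.einsum('ij,ij->i', a, xt) - b              # residuals a_i^T x - b_i
    x = np.clip(xt - alpha * 2.0 * r[:, None] * a,    # gradient of (a_i^T x - b_i)^2
                -1.0, 1.0)                            # projection onto [-1,1]^n

print(x.mean(axis=0))
```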