Distributed Optimization with Nonconvexities and Limited Communication
SINDRI MAGNÚSSON
Licentiate Thesis
Stockholm, Sweden 2016
KTH Royal Institute of Technology
School of Electrical Engineering
SE-100 44 Stockholm, Sweden
TRITA-EE 2016:006
ISSN 1653-5146
ISBN 978-91-7595-854-5
© 2016 Sindri Magnússon, unless otherwise stated.
Printed by: Universitetsservice US AB
Abstract
In economical and sustainable operation of cyber-physical systems, a number of
entities often need to cooperate over a communication network to solve optimization
problems. A challenging aspect in the design of robust distributed solution
algorithms to these optimization problems is that as technology advances and the
networks grow larger, the communication bandwidth used to coordinate the solution
is limited. Moreover, even though most research has focused on distributed convex
optimization, in cyberphysical systems nonconvex problems are often encountered,
e.g., localization in wireless sensor networks and optimal power flow in smart grids,
the solution of which poses major technical difficulties. Motivated by these chal-
lenges this thesis investigates distributed optimization with emphasis on limited
communication for both convex and nonconvex structured problems. In particular,
the thesis consists of four articles as summarized below.
The first two papers investigate the convergence of distributed gradient so-
lution methods for the resource allocation optimization problem, where gradient
information is communicated at every iteration, using limited communication. In
particular, the first paper investigates how distributed dual descent methods can
perform demand-response in power networks by using one-way communication. To
achieve the one-way communication, the power supplier first broadcasts a coordina-
tion signal to the users and then updates the coordination signal by using physical
measurements related to the aggregated power usage. Since the users do not com-
municate back to the supplier, but instead they only take a measurable action, it
is essential that the algorithm remains primal feasible at every iteration to avoid
blackouts. The paper demonstrates how such blackouts can be avoided by appro-
priately choosing the algorithm parameters. Moreover, the convergence rate of the
algorithm is investigated. The second paper builds on the work of the first paper
and considers a more general resource allocation problem with multiple resources. In
particular, a general class of quantized gradient methods is studied, where the gradient
direction is approximated by a finite quantization set. Necessary and sufficient
conditions on the quantization set are provided to guarantee the ability of these
methods to solve a large class of dual problems. A lower bound on the cardinality
of the quantization set is provided, along with specific examples of minimal quan-
tizations. Furthermore, convergence rate results are established that connect the
fineness of the quantization and number of iterations needed to reach a predefined
solution accuracy. The results provide a bound on the number of bits needed to
achieve the desired accuracy of the optimal solution.
The third paper investigates a particular nonconvex resource allocation prob-
lem, the Optimal Power Flow (OPF) problem, which is of central importance in the
operation of power networks. An efficient novel method to address the general non-
convex OPF problem is investigated, which is based on the Alternating Direction
Method of Multipliers (ADMM) combined with sequential convex approximations.
The global OPF problem is decomposed into smaller problems associated to each
bus of the network, the solutions of which are coordinated via a light communica-
tion protocol. Therefore, the proposed method is highly scalable. The convergence
properties of the proposed algorithm are mathematically and numerically substanti-
ated. The fourth paper builds on the third paper and investigates the convergence
of distributed algorithms as in the third paper but for more general nonconvex
optimization problems. In particular, two distributed solution methods, including
ADMM, that combine the fast convergence properties of augmented Lagrangian-based
methods with the separability properties of alternating optimization are investigated.
The convergence properties of these methods are analysed and sufficient
conditions under which the algorithms asymptotically reach the first order
necessary conditions for optimality are established. Finally, the results are numeri-
cally illustrated on a nonconvex localization problem in wireless sensor networks.
The results of this thesis advocate the promising convergence behaviour of some
distributed optimization algorithms on nonconvex problems. Moreover, the results
demonstrate the potential of solving convex distributed resource allocation prob-
lems using very limited communication bandwidth. Future work will consider how
even more general convex and nonconvex problems can be solved using limited
communication bandwidth and also study lower bounds on the bandwidth needed
to solve general resource allocation optimization problems.
Sindri Magnússon
Stockholm, January 2016
Contents
I Thesis Overview
1 Introduction
1.1 Motivating Examples
1.1.1 Future Power Distribution Networks
1.1.2 Wireless Sensor Networks
1.2 Problem Formulation
1.3 Outline and Contribution of the Thesis
2 Background
2.1 Notation
2.2 Resource Allocation
2.3 Optimal Power Flow Problem
2.4 Dual Decomposition
2.5 Alternating Direction Method of Multipliers
II Included Papers
C.6 Conclusions
C.7 Appendix
C.7.1 On the use of quadratic programming (QP) solvers
C.7.2 Proofs
Bibliography
List of Figures
1.1 The future distribution network will consist of various devices that
cooperate to achieve more economical and environmentally friendly power
distribution. A challenging aspect is how to achieve the best operation
using limited communication bandwidth for coordination and
communication among the devices. (source: https://www.flickr.com)
1.2 Wireless sensor network operating a city. Some of the devices coordinate
the power flows through the city, others the vehicular traffic
and water distribution. (source: Yuzhe Xu's Licentiate Thesis [1])
1.5 The figure depicts the convergence of ADMM for the nonconvex OPF
problem, of the form (1.1), studied in the third paper. The algorithm
is distributed between the nodes/buses of a power network where
each node keeps a private estimate of its neighbours' voltages and δ
is a measure of the consensus/consistency between the voltage esti-
mates. ρ is an algorithm parameter that penalizes violations of the
consensus constraint. Figure 1.5a depicts δ over the course of the al-
gorithm for different ρ’s. The figure shows that the algorithm always
reaches consensus among the nodes with high numerical accuracy
and that a larger ρ enforces more consistency. Figure 1.5b depicts the objective
function value compared with the consensus, δ, where n indicates
the iteration number. The figure shows that when ρ = 10^6 then the
algorithm converges almost to the red vertical line depicting a lower
bound on the optimal value given by a relaxation.
1.6 The figure depicts the convergence of the distributed algorithms
studied in the fourth reported paper when (1.1) represents a nonconvex
estimation problem, i.e., localization based on noisy distance
measurements. Figure 1.6a depicts the problem setup, where the sen-
sors, black markers, estimate their own location by measuring the dis-
tances to their neighbours in the network and communicating over
the network. The grey circles are anchor nodes that know their own
location and the coloured markers denote the estimations obtained
from different algorithms. Figure 1.6b depicts the objective function,
i.e., a penalty of violating the distance measurements, at every iter-
ation. Since the problem is nonconvex the algorithms can converge
to different local minima.
A.1 Using a sufficiently small step size, the feasibility of the primal problem
is maintained. The upper boundary of this feasible set is denoted
by the dotted line.
A.2 Convergence of the algorithm for different numbers of users N =
5, 10, 20, 30, 40, 150, 1000. In Figures A.2a and A.2c the behaviour of
the algorithm is identical for all N. The blue dotted line in A.2c is
the theoretical bound from Proposition 5.
B.1 Gradient and objective function value over the course of the algorithm.
C.1 The feasible set of ((vkRe)r, (vkIm)r) where A=(vkmax)r and B=(vkmin)r.
C.2 Graphical illustration of Algorithm 2.
C.3 CDF displaying DFk,n of Eq. (C.38) for every subproblem k and
every ADMM iteration n for each of the four examples.
C.4 Histograms displaying DFk,n in Eq. (C.38) for every subproblem k
and every ADMM iteration n for the considered test networks.
C.5 δ versus number of ADMM iterations.
C.6 versus number of ADMM iterations.
C.7 δ versus the objective function value.
Part I
Thesis Overview
Chapter 1
Introduction
The task of finding good operation points in cyberphysical systems can often be
formulated as an optimization problem with a global network-wide objective func-
tion and problem data that are scattered between the network entities. Distributed
algorithms for solving these optimization problems have been much investigated recently,
but almost exclusively for well-behaved convex problems. In contrast,
distributed algorithms for the more challenging nonconvex problems have received
little attention despite the emerging need for scalable solutions for many large scale
nonconvex applications, e.g., localization in wireless sensor networks and optimal
power flow in smart grids. Fortunately, most large scale nonconvex problems share
the structures that are usually exploited to achieve distributed/scalable algorithms
for their convex counterparts; however, the convergence properties of the solution
algorithms for nonconvex problems are left largely unexplored. In this thesis, we
investigate some prominent nonconvex optimization problems that appear in cyber-physical
systems, as we survey in the next section, which provides some motivating
examples.
Figure 1.1: The future distribution network will consist of various devices that cooperate
to achieve more economical and environmentally friendly power distribution.
A challenging aspect is how to achieve the best operation using limited communication
bandwidth for coordination and communication among the devices. (source:
https://www.flickr.com)
with the future developments. Fortunately, the many recent technical advancements
such as i) cheaper, smaller, and more effective sensing-, computing-, and actuator
devices, ii) the promising evaluation of fast and distributed signal processing and
control algorithms, and iii) more reliable communication, have the potential to
revolutionize today’s distribution networks. In particular, in the future smart dis-
tribution networks [3] (smart grids) users will be equipped with smart meters which
are small computational devices that can communicate with the power operator to
help consumers use power in a more economical way. The smart meter will both i)
affect/control the users' behaviour positively by incentivising them to consider the
state of the distribution network when using power, e.g., lowering the price when
the total demand is low, and ii) increase the efficiency of the distribution network
by gathering the needed information and coordinating with other network entities.
Moreover, the household appliances will be more autonomous and use power in
a more economical and environmentally friendly manner, in sync with the user. The future
distribution network will also generate power from many distributed renewable
generators such as wind turbines. The consumers will also be able to inject power
into the grid from various renewable sources such as solar panels, training bikes, or
from the batteries of their electrical vehicles.
While the vision of the next generation distribution networks has the potential
to revolutionize today's power networks, many technical problems must first be
solved. With the smart grid now in its infancy, industry and academia will inves-
tigate new hardware that will be integrated to the grid and solve advanced tasks
with the new technology. With the distribution networks growing, due to the inte-
gration of all the new technologies, it is essential to carefully consider from the start
how the different tasks can be accomplished with minimal communication so the
smart grid will evolve in a sustainable manner and will not break down as its scale
grows. Another reason for considering efficient communication is that
no sophisticated communication infrastructure has yet been integrated into today's
distribution networks. Nevertheless, power networks are equipped with a natural communication
infrastructure, i.e., power line communication [4, 5], which can already
be used but has limited communication bandwidth. The power line communication
has the potential to kickstart many of the early integrations of smart grid before a
dedicated communication network is merged with the grid, which could take a long
time since industry will carefully consider any steps towards integrating costly new
infrastructure into the grid. Therefore, it is essential to address the communication
challenges right away, both for the sake of early integration of the smart grid and so
that the smart grid can keep up with future challenges.
Another central problem in power networks is the optimal power flow problem,
which finds the best feasible power flow through the network, where what is
best depends on the engineering need. The problem is well studied in transmission
networks, but due to lack of control and information in traditional distribution
networks the problem has only recently become relevant. A challenge in solving the
optimal power flow problem in distribution networks is that DC-optimal power flow,
an approximation of the problem which works well in transmission networks, is not
appropriate for distribution networks due to their low voltage. Therefore, when
solving power flow problems in distribution networks the full AC optimal power
flow problem needs to be considered, which turns out to be highly nonconvex. As
a result, it is important to find distributed solution methods for nonconvex power
flow problems in distribution networks to operate them more efficiently.
Figure 1.2: Wireless sensor network operating a city. Some of the devices coordinate
the power flows through the city, others the vehicular traffic and water distribution.
(source: Yuzhe Xu’s Licentiate Thesis [1])
environmental constraints, such as when the networks are mobile (e.g., vehicle net-
works [7]), are underwater [8], or are attached or even inside the human body (e.g.,
body area networks [9]). Even when wired communications are feasible, wireless
communication can still be a better option due to the many benefits of WSNs, e.g.,
they do not need an existing communication infrastructure and are usually easier to
deploy. However, to achieve all the benefits of wireless operation of cyber-physical
systems, lightweight communications are essential because communication over
wireless channels is a scarce resource that has to be well coordinated due to interference
and message collisions, and may be very expensive. Moreover, the wireless
devices are usually battery powered with no or limited power sources, and therefore
economical usage of communication is essential to prolong the lifetime of the
networks.
Localization and tracking is a fundamental task in many WSNs, since the physical
location of the network devices usually has a large statistical impact on the sensed
information and is also relevant in controlling the network. Moreover, in many
WSN applications, such as indoor positioning systems, the localization/tracking
is the sensing task of the network. Using the Global Positioning System (GPS)
is often infeasible due to the lack of line of sight to a satellite, e.g., in indoor positioning,
or unsatisfactory accuracy, e.g., in interbody wireless sensor networks, in addition
to being generally unattractive due to the power and cost constraints of the in-
dividual sensors. A more attractive option is self-localization of the network [10],
where the locations of the devices are estimated from the known locations of ref-
erence nodes and distance measurements between communication neighbors in the
network. Localization using distance measurements is a nonconvex optimization
Figure 1.3: The figure depicts the convergence of the one-way communication dual
decomposition, reported in the first paper, for solving an instance of (1.1) where the
number of users is N = 5, 10, 20, 30, 40, 150, 1000. Figures 1.3a and 1.3b depict the
distance from the dual and primal variables, at each iteration, to the optimal
dual and primal variables, respectively. The blue dotted line in Figure 1.3a depicts
a theoretical bound on the convergence provided in the paper. For illustration
purposes the primal problems have been constructed so that the dual problem does
not change when the number of users increases and hence the dual convergence shown
in Figure 1.3a is the same for all N. The paper discusses how the primal problem
can be regulated so the dual problem does not change significantly when the number
of users increases, indicating superior scalability properties.
the step sizes in the dual descent algorithm that give the optimal convergence rate
O(1/t^2), where t is the iteration index, for the given structure of the dual
problem, which is convex with Lipschitz continuous gradients, and the algorithm
generally has the convergence rate O(1/t). Nevertheless, we show that under mild
structure on the resource allocation problem, a linear convergence rate O(c^t), with
c ∈ [0, 1), is achieved. Moreover, we provide additional problem structure under which the
linear convergence rate is independent of the number of users, hence demonstrating
superior scalability properties. Finally, we illustrate the results using numerical
simulations.
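As a rough worked illustration of what the three rates mentioned above mean in practice, suppose the error after t iterations is bounded by C/t, C/t^2, or C c^t, respectively. The constants C = 1, c = 1/2 and the target accuracy ε = 10^{-4} are hypothetical values chosen here only for the arithmetic; they are not taken from the paper.

```latex
\begin{align*}
\frac{C}{t} \le \epsilon
  &\iff t \ge \frac{C}{\epsilon} = 10^{4}, \\
\frac{C}{t^{2}} \le \epsilon
  &\iff t \ge \sqrt{\frac{C}{\epsilon}} = 10^{2}, \\
C c^{t} \le \epsilon
  &\iff t \ge \frac{\log (C/\epsilon)}{\log (1/c)}
   = \frac{4 \log 10}{\log 2} \approx 13.3
   \quad (\text{so } t = 14 \text{ suffices}).
\end{align*}
```

Under these assumed constants, the linear rate needs orders of magnitude fewer iterations, and hence fewer communication rounds, than the sublinear rates.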
This article has been accepted to appear in:
Figure 1.4: The figure depicts iterations to obtain the solution to optimization
problem (1.1) based on the quantized gradient methods from the second paper
reported in the thesis. The vertical axis in Figures 1.4a and 1.4b depicts the norm
of the gradient and the primal objective function, respectively, at every iteration.
The green dotted line is an approximation of the gradient direction method (B.3)
where only 4 bits are communicated per iteration, but yet achieves almost the same
performance.
The second paper investigates distributed gradient methods, where the gradient
is communicated at every iteration of the algorithm, when bandwidth is limited.
In particular, the paper considers quantized gradient methods (QGM) where the
gradient descent direction is projected to a finite quantization set D before being
communicated. The paper investigates necessary and sufficient conditions that en-
sure the quantization set D be proper, in the sense that the QGMs can minimize any
convex function f : RN → R with Lipschitz continuous gradients and non-empty,
bounded set of minimizers. We use this characterization to provide examples of
proper quantization sets D. We also show that if |D| ≤ N then D cannot be proper
and there exists an optimization problem, from the aforementioned class, which
QGMs cannot solve. Moreover, we show that there exist proper quantization sets
with |D| = N + 1, hence the minimal cardinality of a proper quantization set is
N + 1, which can be communicated using log2(N + 1) bits. We provide a bound on
the number of iterations needed to achieve any accuracy on the optimal solution
that depends on the fineness of the quantization set D. Specifically, the bound on
number of iterations decreases when the quantization set becomes finer. We also
show that, when the step sizes are non-summable but square summable, the
iterates of QGMs converge to the set of optimal values. Finally, we demonstrate
how the theory can be applied to a resource allocation problem in power networks.
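To make the idea of a quantized gradient method concrete, the following minimal Python sketch runs such a method on a toy quadratic. It is an illustration under stated assumptions, not the algorithm analysed in the paper: the quantization set D = {±e_1, ..., ±e_N} (2N directions, so larger than the minimal N + 1), the rule of picking the direction in D best aligned with the gradient, the objective, and the step sizes 1/(t + 1) (non-summable but square summable) are all choices made here.

```python
import math
import numpy as np

def quantization_set(n):
    """Assumed quantization set D = {+e_i, -e_i}: 2n unit directions."""
    eye = np.eye(n)
    return np.vstack([eye, -eye])

def quantized_gradient_method(grad_f, x0, directions, iterations):
    """Move along the direction in D best aligned with the gradient."""
    x = np.array(x0, dtype=float)
    for t in range(iterations):
        # Only the index of the chosen direction would need to be communicated.
        index = int(np.argmax(directions @ grad_f(x)))
        step = 1.0 / (t + 1)  # non-summable but square summable
        x = x - step * directions[index]
    return x

x_star = np.array([1.0, -2.0, 0.5])  # hypothetical minimizer

def grad_f(x):
    """Gradient of the toy objective 0.5 * ||x - x_star||^2."""
    return x - x_star

D = quantization_set(3)
x_final = quantized_gradient_method(grad_f, np.zeros(3), D, 2000)
bits_per_iteration = math.ceil(math.log2(len(D)))  # bits to index |D| = 6 directions
```

In this sketch each iteration transmits one index into D, so the per-iteration communication is fixed by the cardinality of D rather than by the numerical precision of the gradient.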
This paper has been submitted to:
Figure 1.5: The figure depicts the convergence of ADMM for the nonconvex OPF
problem, of the form (1.1), studied in the third paper. The algorithm is distributed
between the nodes/buses of a power network where each node keeps a private
estimate of its neighbours' voltages and δ is a measure of the consensus/consistency
between the voltage estimates. ρ is an algorithm parameter that penalizes violations
of the consensus constraint. Figure 1.5a depicts δ over the course of the algorithm for
different values of ρ. The figure shows that the algorithm always reaches consensus among
the nodes with high numerical accuracy and that a larger ρ enforces more consistency.
Figure 1.5b depicts the objective function value compared with the consensus, δ,
where n indicates the iteration number. The figure shows that when ρ = 10^6 then
the algorithm converges almost to the red vertical line depicting a lower bound on
the optimal value given by a relaxation.
Figure 1.6: The figure depicts the convergence of the distributed algorithms
studied in the fourth reported paper when (1.1) represents a nonconvex estimation
problem, i.e., localization based on noisy distance measurements. Figure 1.6a depicts
the problem setup, where the sensors, black markers, estimate their own location by
measuring the distances to their neighbours in the network and communicating over
the network. The grey circles are anchor nodes that know their own location and
the coloured markers denote the estimations obtained from different algorithms.
Figure 1.6b depicts the objective function, i.e., a penalty of violating the distance
measurements, at every iteration. Since the problem is nonconvex the algorithms
can converge to different local minima.
highly scalable. The convergence properties of the proposed algorithm are math-
ematically substantiated. Finally, the algorithm is evaluated on a number of test
examples, where the convergence properties are investigated and the performance
is compared with a global optimal method.
The paper has been published in:
function method and is called the Alternating Direction Penalty Method (ADPM).
Unlike the original quadratic penalty function method, in which single-step op-
timizations are adopted, ADPM uses an alternating optimization, which in turn
makes it scalable. The second method is the well-known Alternating Direction
Method of Multipliers (ADMM). It is shown that ADPM for nonconvex prob-
lems asymptotically converges to a primal feasible point under mild conditions.
Additional conditions ensuring that ADPM asymptotically reaches the standard
first order necessary conditions for local optimality are introduced. In the case of
the ADMM, novel sufficient conditions under which the algorithm asymptotically
reaches the standard first order necessary conditions are established. Based on this,
complete convergence of ADMM for a class of low dimensional problems is characterized.
Finally, the results are illustrated by applying ADPM and ADMM to a
nonconvex localization problem in wireless sensor networks.
The chapter is based on the following papers:
• S. Magnússon, P. C. Chathuranga, M. Rabbat, C. Fischione, "On the Convergence
of Alternating Direction Lagrangian Methods for Nonconvex Structured
Optimization Problems," accepted, to appear in IEEE Transactions on Control of
Network Systems, 2016.
• S. Magnússon, P. C. Chathuranga, M. Rabbat, C. Fischione, "On the convergence
of an alternating direction penalty method for nonconvex problems," in
48th Asilomar Conference on Signals, Systems and Computers, pp. 793-797,
2-5 Nov. 2014.
Chapter 2
Background
This chapter summarizes the background theory used in the contribution of the the-
sis. In particular, Section 2.2 introduces general resource allocation problems that
are studied in this thesis. Section 2.3 introduces the optimal power flow problem, a
specific resource allocation problem that plays an important role in the operation
of power networks. In the following Sections 2.4 and 2.5 we discuss distributed so-
lution methods for the studied problems. In particular, Section 2.4 discusses dual
decomposition and 2.5 discusses the alternating direction method of multipliers
(ADMM).
2.1 Notation
We use the following notation in this chapter. Vectors and matrices are represented
by boldface lower and upper case letters, respectively. The sets of real n-vectors
and n×m matrices are denoted by R^n and R^{n×m}, respectively, and C represents
the set of complex numbers. Otherwise, we use calligraphic letters to represent
sets. The superscript (·)^T stands for transpose. j denotes the imaginary unit
√−1. diag(A_1, ..., A_n) denotes the block diagonal matrix with A_1, ..., A_n on the
diagonal. ||·|| denotes the 2-norm. dom f denotes the domain of the function f : R^n →
R^m. int X denotes the interior of the set X.
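\[
\begin{aligned}
\underset{z_1,\dots,z_N}{\text{minimize}} \quad & \sum_{k=1}^{N} f_k(z_k) \\
\text{subject to} \quad & g_k(z_{\mathcal{N}_k}) = 0, \quad \text{for all } k \in \mathcal{N}, \\
& z_k \in \mathcal{X}_k, \quad k \in \mathcal{N}.
\end{aligned}
\tag{2.3}
\]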
The utility functions fk and the local constraints Xk can vary depending on the
engineering need. For example, the utilities can be chosen so that (2.3) minimizes
the power loss or generation costs to satisfy demand of the users given by Xk or
the utilities can be constructed so (2.3) finds some fair allocation of power between
the households.
We now formally express the power flow equations g(z_1, ..., z_N) = 0; see any
textbook on power system analysis for more details, e.g., [2, 13]. To express the
power flows, set z_k = (s_k, v_k), where s_k, v_k ∈ C are the power and voltage at user
(or bus) k ∈ N, respectively. Then
\[
g_k(z_{\mathcal{N}_k}) = s_k - v_k \sum_{i \in \mathcal{N}_k} (y_{ki} v_i)^{*}, \tag{2.4}
\]
where (·)^* denotes the complex conjugate and y_ki = g_ki + j b_ki ∈ C, with
g_ki, b_ki ∈ R, is the admittance in the flow line (k, i) ∈ E. The power flow
equations are also commonly expressed with the voltage in polar coordinates. In that
case, we set z_k = (p_k, q_k, |v_k|, θ_k), where s_k = p_k + j q_k, with p_k, q_k ∈ R,
and v_k = |v_k| e^{jθ_k}, and the power flow equations (2.4) become
\[
g_k(z_{\mathcal{N}_k}) =
\begin{bmatrix}
p_k - \sum_{i \in \mathcal{N}_k} |v_i||v_k| \left( g_{ki} \cos(\theta_k - \theta_i) + b_{ki} \sin(\theta_k - \theta_i) \right) \\
q_k - \sum_{i \in \mathcal{N}_k} |v_i||v_k| \left( g_{ki} \sin(\theta_k - \theta_i) - b_{ki} \cos(\theta_k - \theta_i) \right)
\end{bmatrix}. \tag{2.5}
\]
The power flow equations (2.4) and (2.5) are nonlinear, which renders the
optimization problem (2.3) nonconvex.
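The relation between the complex form (2.4) and the polar form (2.5) can be checked numerically. The sketch below is only an illustration: the 3-bus admittances, voltages and powers are hypothetical, every bus is assumed to be in every neighbourhood N_k (including its own), and the complex form is evaluated under the usual convention that the current term enters conjugated, which is what the polar expressions correspond to.

```python
import numpy as np

def mismatch_complex(k, s, v, y, neighbours):
    """Complex-form mismatch at bus k: s_k - v_k * sum_i conj(y_ki * v_i)."""
    current_term = sum(np.conj(y[k, i] * v[i]) for i in neighbours[k])
    return s[k] - v[k] * current_term

def mismatch_polar(k, s, v, y, neighbours):
    """Polar-form mismatch at bus k, returned as (real part, reactive part)."""
    p_k, q_k = s[k].real, s[k].imag
    mag, theta = np.abs(v), np.angle(v)
    p_flow = q_flow = 0.0
    for i in neighbours[k]:
        g_ki, b_ki = y[k, i].real, y[k, i].imag
        diff = theta[k] - theta[i]
        p_flow += mag[i] * mag[k] * (g_ki * np.cos(diff) + b_ki * np.sin(diff))
        q_flow += mag[i] * mag[k] * (g_ki * np.sin(diff) - b_ki * np.cos(diff))
    return np.array([p_k - p_flow, q_k - q_flow])

# Hypothetical 3-bus data, chosen only for illustration.
y = np.array([[2 - 6j, -1 + 3j, -1 + 3j],
              [-1 + 3j, 2 - 6j, -1 + 3j],
              [-1 + 3j, -1 + 3j, 2 - 6j]])
v = np.array([1.00, 0.98, 1.02]) * np.exp(1j * np.array([0.0, -0.05, 0.03]))
s = np.array([0.5 + 0.2j, -0.3 - 0.1j, -0.2 - 0.1j])
neighbours = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}

# The two forms should give the same mismatch at every bus.
forms_agree = all(
    np.allclose(
        [mismatch_complex(k, s, v, y, neighbours).real,
         mismatch_complex(k, s, v, y, neighbours).imag],
        mismatch_polar(k, s, v, y, neighbours),
    )
    for k in range(3)
)
```

The products of voltage magnitudes with sines and cosines of angle differences are what make the constraint set nonconvex, regardless of how the objective is chosen.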
As argued in the introduction, it is essential to solve problems such as (2.1),
(2.2), and (2.3) in a distributed manner. We next review standard distributed meth-
ods for these problems.
which results in
The dual problem of (2.1) with respect to the coupling constraint g is then given
by
\[
\begin{aligned}
\underset{p}{\text{minimize}} \quad & D(p) \\
\text{subject to} \quad & p \geq 0.
\end{aligned}
\tag{2.10}
\]
The dual function, D, is always convex and therefore the dual problem (2.10) is
always convex, even when the primal problem (2.1) is nonconvex. Therefore, D has a
gradient (or a subgradient) at every interior point of the domain of D [15, Theorem
1.7], i.e., at every p ∈ int dom D, and the (sub)gradients are given by
If optimization problem (2.9) has a unique solution, then (2.11) is a gradient, other-
wise every solution p of (2.9) provides a subgradient (2.11). When D is everywhere
differentiable, which is for example the case when fi are strictly concave for all
i ∈ N and g is convex, then (2.10) can be solved by using the dual descent method
with appropriate step-size choice γ(t) ∈ R+ , see [14]. When D is not differentiable
but subgradients exist everywhere, which is for example the case if X is compact
and fi and g are continuous, then (2.10) can be solved using the subgradient method,
which follows the recursion (2.12) using subgradients.
The interesting aspect about the dual descent method (2.12) is that a decom-
position structure in the coupling constraint g can be used to decouple the re-
cursion (2.12) between the users. For example if g is separable between the users,
i.e.,
\[
g(x) = \sum_{k=1}^{N} g_k(x_k),
\]
then optimization problem in (2.7) is fully separable between the users and the
(sub)gradient ∇D(p) can be computed without coordination where each user k ∈ N
solves the local problem
However, as we are interested in solving the primal problem (2.1) but not the dual
problem (2.10), the dual descent method is only usable if the primal solution can
be constructed from the optimal dual solution p⋆, e.g., if the duality gap is zero
and (2.7) has a unique solution for p⋆.
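The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions made here, not the thesis's formulation: the primal problem is taken to be maximizing the sum of a_k log(1 + x_k) subject to a single capacity constraint on the sum of x_k and 0 ≤ x_k ≤ x_max, and the utilities, capacity, initial price and step size are hypothetical. For this problem the dual gradient is the capacity minus the total usage, so the coordinator only needs the aggregate, in the spirit of the one-way communication scheme of the first paper.

```python
import numpy as np

def user_response(price, a, x_max):
    """Each user k maximizes a_k * log(1 + x) - price * x over [0, x_max]."""
    return np.clip(a / max(price, 1e-12) - 1.0, 0.0, x_max)

def dual_descent(a, capacity, x_max, price, step, iterations):
    """Projected gradient descent on the dual function over price >= 0."""
    peak_usage = 0.0
    for _ in range(iterations):
        x = user_response(price, a, x_max)  # solved locally by each user
        total = x.sum()                     # only the aggregate is observed
        peak_usage = max(peak_usage, total)
        gradient = capacity - total         # gradient of the dual function
        price = max(0.0, price - step * gradient)
    return price, user_response(price, a, x_max), peak_usage

a = np.array([1.0, 2.0, 3.0])               # hypothetical utility weights
price, allocation, peak_usage = dual_descent(
    a, capacity=6.0, x_max=10.0, price=1.0, step=0.05, iterations=200
)
```

With these assumed values the price starts above its optimal value and decreases towards it, so the total usage approaches the capacity from below; a larger step size could overshoot and temporarily violate the capacity constraint.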
The dual descent method can be unstable in some cases since the dual gradient
might not exist everywhere, e.g., outside of the domain of D where D(p) = ∞.
Moreover, to ensure that the primal optimal solution can be constructed from the
dual optimal solution, strong assumptions must be made on the primal problem,
such as strong convexity. In addition, the convergence of the dual descent method is heavily
dependent on the step size choice γ(t). These drawbacks of dual decomposition
have motivated a more robust variant of the dual descent method, which is now
introduced.
where the information kept by the two users is (x, f, A) and (z, g, B), respectively,
and both users know c. A method for addressing problems of the form (2.13)
cooperatively between the two users is the Alternating Direction Method of Mul-
tipliers (ADMM), a variant of the dual descent method where the dual function is
obtained from a regularized Lagrangian function given by
Note that the users need to communicate Ax(t + 1) and Bz(t + 1) over the course
of the algorithm and the dual variable p can be maintained by either or both
users. Compared to the dual decomposition, the ADMM has very good convergence
properties and is guaranteed to converge to the optimal value of (2.13) for any ρ > 0
if f and g are closed, proper, and convex and L0 (the standard Lagrangian) has a
saddle point [16].
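A minimal two-user sketch of the ADMM iterations is given below. It assumes the standard unscaled form of the method (alternating minimization of the augmented Lagrangian over x and z, followed by a dual step along the constraint residual) and a toy instance chosen here: f(x) = 0.5||x − a||^2, g(z) = 0.5||z − b||^2 with the constraint x − z = 0. The data a, b and the value of ρ are hypothetical.

```python
import numpy as np

def admm_two_users(a, b, rho, iterations):
    """ADMM for: minimize 0.5*||x - a||^2 + 0.5*||z - b||^2 s.t. x - z = 0."""
    x = np.zeros_like(a)
    z = np.zeros_like(b)
    p = np.zeros_like(a)  # dual variable of the constraint x - z = 0
    for _ in range(iterations):
        # x-update: minimize the augmented Lagrangian over x with z, p fixed.
        x = (a - p + rho * z) / (1.0 + rho)
        # z-update: minimize the augmented Lagrangian over z with x, p fixed.
        z = (b + p + rho * x) / (1.0 + rho)
        # Dual update: step of size rho along the constraint residual.
        p = p + rho * (x - z)
    return x, z, p

a = np.array([1.0, 3.0])
b = np.array([3.0, -1.0])
x, z, p = admm_two_users(a, b, rho=1.0, iterations=100)
```

For this toy instance both updates have closed forms, and the iterates approach the common point (a + b)/2; in each iteration the first user only needs z and p, and the second only needs x and p.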
Problem (2.13) can also cover multiuser scenarios, for instance, by letting the
x variable collect the private variables of the different users and z be a coupling variable
that ensures consensus. Consider for example the optimal power flow problem given
in (2.3). Let x = (x_1, ..., x_N), where x_k is a local copy that user k keeps of its
own variable z_k and its neighbours' variables z_i for i ∈ N_k, where N_k = {i ∈ N | (k, i) ∈ E}.
For convenience we use the notation xkk to denote the component of xk associated
with z_k. Now problem (2.3) can be formulated equivalently in the form of (2.13),
as follows:
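\[
\begin{aligned}
\underset{x_1,\dots,x_N}{\text{minimize}} \quad & \sum_{k=1}^{N} f_k(x_{kk}) \\
\text{subject to} \quad & x - Ez = 0, \\
& g_k(x_k) = 0, \quad \text{for all } k \in \mathcal{N}, \\
& x_k \in \mathcal{X}_k, \quad k \in \mathcal{N},
\end{aligned}
\tag{2.18}
\]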
where E = (E1 , · · · , EN ) is a binary matrix where component Ek captures the
coupling constraints x_k = z_{N_k} = E_k z. Notice that (2.18) is in the form of (2.13)
where the coupling constraint is x − Ez = 0. Moreover, the x-update of the ADMM
(cf. (2.15)) can be performed in a fully distributed manner between the users N
where each user k ∈ N solves the local subproblem
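\[
\begin{aligned}
\underset{x_k}{\text{minimize}} \quad & f_k(x_{kk}) + p_k^{T}(t)\,\bigl(x_k - E_k z(t)\bigr) + \frac{\rho}{2}\,\|x_k - E_k z(t)\|^{2} \\
\text{subject to} \quad & g_k(x_k) = 0, \\
& x_k \in \mathcal{X}_k,
\end{aligned}
\tag{2.19}
\]
where p_k(t) denotes the block of the dual variable p(t) associated with user k.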
The z-update of the ADMM (cf. (2.16)) reduces to a simple averaging between
neighbors, i.e.,
\[
z_k = \frac{1}{|\mathcal{N}_k|} \sum_{i \in \mathcal{N}_k} x_i,
\]
and can be achieved by a simple coordination between the neighbors in the network.
The dual variable update (cf. (2.17)) can then be achieved locally by each user.
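The averaging structure of the z-update can be seen in a small consensus sketch. The example below is a simplification made here rather than the OPF formulation (2.18): all users are assumed to be neighbours of each other, the shared variable z is a scalar, there are no constraints g_k, and each user has the hypothetical local cost 0.5 (x_k − a_k)^2. With the dual variables initialised to zero their sum stays zero, so the dual terms cancel in the z-update and it is a plain average of the local copies.

```python
import numpy as np

def consensus_admm(a, rho, iterations):
    """Consensus ADMM: minimize sum_k 0.5*(x_k - a_k)^2 s.t. x_k = z for all k."""
    n = len(a)
    x = np.zeros(n)   # local copies, one per user
    p = np.zeros(n)   # dual variables, one per user
    z = 0.0           # shared consensus variable
    for _ in range(iterations):
        # Local x-updates: each user k only needs a_k, p_k and the current z.
        x = (a - p + rho * z) / (1.0 + rho)
        # z-update: plain averaging of the local copies.
        z = x.mean()
        # Local dual updates: each user k only needs x_k and z.
        p = p + rho * (x - z)
    return x, z, p

a = np.array([1.0, 2.0, 6.0])  # hypothetical local data
x, z, p = consensus_admm(a, rho=1.0, iterations=100)
```

In this sketch the only quantities exchanged per iteration are the local copies x_k (to form the average) and the resulting z; the costs and dual variables never leave the users.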