Distributed Optimization with Nonconvexities and Limited Communication

SINDRI MAGNÚSSON

Licentiate Thesis
Stockholm, Sweden 2016

TRITA-EE 2016:006
ISSN 1653-5146
ISBN 978-91-7595-854-5

KTH Royal Institute of Technology
School of Electrical Engineering
SE-100 44 Stockholm
SWEDEN

Academic thesis which, with the permission of KTH Royal Institute of Technology (Kungl Tekniska högskolan), is submitted for public examination for the degree of Licentiate of Engineering in Electrical and Systems Engineering on Friday, 19 February 2016, at 10:00 in room E3, Osquarsbacke 14, Main Building (Huvudbyggnaden), KTH Campus.

© 2016 Sindri Magnússon, unless otherwise stated.

Printed by: Universitetsservice US AB

Abstract
In the economical and sustainable operation of cyber-physical systems, a number of entities often need to cooperate over a communication network to solve optimization problems. A challenging aspect in the design of robust distributed solution algorithms for these optimization problems is that, as technology advances and the networks grow larger, the communication bandwidth used to coordinate the solution is limited. Moreover, even though most research has focused on distributed convex optimization, nonconvex problems are often encountered in cyber-physical systems, e.g., localization in wireless sensor networks and optimal power flow in smart grids, and their solution poses major technical difficulties. Motivated by these challenges, this thesis investigates distributed optimization with emphasis on limited communication for both convex and nonconvex structured problems. In particular, the thesis consists of four articles, as summarized below.
The first two papers investigate the convergence of distributed gradient solution methods for the resource allocation optimization problem when the gradient information exchanged at every iteration is subject to limited communication. In
particular, the first paper investigates how distributed dual descent methods can
perform demand-response in power networks by using one-way communication. To
achieve the one-way communication, the power supplier first broadcasts a coordina-
tion signal to the users and then updates the coordination signal by using physical
measurements related to the aggregated power usage. Since the users do not com-
municate back to the supplier, but instead they only take a measurable action, it
is essential that the algorithm remains primal feasible at every iteration to avoid
blackouts. The paper demonstrates how such blackouts can be avoided by appro-
priately choosing the algorithm parameters. Moreover, the convergence rate of the
algorithm is investigated. The second paper builds on the work of the first paper and considers a more general resource allocation problem with multiple resources. In particular, a general class of quantized gradient methods is studied, where the gradient direction is approximated by a finite quantization set. Necessary and sufficient
conditions on the quantization set are provided to guarantee the ability of these
methods to solve a large class of dual problems. A lower bound on the cardinality
of the quantization set is provided, along with specific examples of minimal quan-
tizations. Furthermore, convergence rate results are established that connect the
fineness of the quantization and number of iterations needed to reach a predefined
solution accuracy. The results provide a bound on the number of bits needed to
achieve the desired accuracy of the optimal solution.
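The following Python sketch illustrates the general idea of a quantized gradient method on a toy problem. It is an assumption-based illustration rather than the construction analysed in the second paper: the quantization set is taken to be 16 equally spaced unit vectors in the plane (so 4 bits identify a direction), the step sizes follow a predefined diminishing schedule assumed known to both sender and receiver, and the names `make_directions`, `quantize` and `quantized_descent` are hypothetical. Because every vector lies within an angle of π/16 of some element of this set, the quantized direction is always a descent direction.

```python
import math

def make_directions(bits):
    """Hypothetical quantization set: 2**bits equally spaced unit vectors in R^2."""
    k = 2 ** bits
    return [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
            for j in range(k)]

def quantize(g, directions):
    """Return the index of the direction best aligned with the descent direction -g.

    Only this index (log2(len(directions)) bits) would need to be communicated.
    """
    return max(range(len(directions)),
               key=lambda j: -(directions[j][0] * g[0] + directions[j][1] * g[1]))

def quantized_descent(grad, x0, bits=4, iters=5000):
    """Move along the quantized direction with the diminishing step 1/(k+1)."""
    directions = make_directions(bits)
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        if g[0] == 0.0 and g[1] == 0.0:
            break  # exact stationary point reached
        d = directions[quantize(g, directions)]
        step = 1.0 / (k + 1)  # schedule assumed known to both sides, so it is not sent
        x = [x[0] + step * d[0], x[1] + step * d[1]]
    return x

# Toy objective f(x) = 0.5 * ||x - c||^2 with minimizer c (an assumption for illustration).
c = (3.0, -2.0)
grad_f = lambda x: (x[0] - c[0], x[1] - c[1])
x_final = quantized_descent(grad_f, (0.0, 0.0))
print(x_final)
```

With 4 bits per iteration the iterates approach the minimizer c, at a speed limited by the diminishing step rather than by the coarse direction information.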
The third paper investigates a particular nonconvex resource allocation prob-
lem, the Optimal Power Flow (OPF) problem, which is of central importance in the
operation of power networks. A novel, efficient method to address the general nonconvex OPF problem is investigated, which is based on the Alternating Direction Method of Multipliers (ADMM) combined with sequential convex approximations. The global OPF problem is decomposed into smaller problems associated with each bus of the network, the solutions of which are coordinated via a light communication protocol. Therefore, the proposed method is highly scalable. The convergence
properties of the proposed algorithm are mathematically and numerically substanti-
ated. The fourth paper builds on the third paper and investigates the convergence
of distributed algorithms as in the third paper but for more general nonconvex
optimization problems. In particular, two distributed solution methods, including ADMM, that combine the fast convergence properties of augmented Lagrangian-based methods with the separability properties of alternating optimization are investigated. The convergence properties of these methods are studied, and sufficient conditions are established under which the algorithms asymptotically reach the first-order necessary conditions for optimality. Finally, the results are numerically illustrated on a nonconvex localization problem in wireless sensor networks.
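The alternating structure of ADMM can be seen on a deliberately tiny nonconvex problem. The Python sketch below is an illustrative assumption, not the algorithm analysed in the fourth paper: it applies scaled-form ADMM to minimize (x − a)²/2 subject to x ∈ {−1, +1}, written as the split x = z with z restricted to the nonconvex set {−1, +1}. The x-update has a closed form, the z-update is a projection onto the nonconvex set, and the scaled dual variable u accumulates the consensus violation x − z. The function names `project_binary` and `admm_binary` and the parameter values are hypothetical.

```python
def project_binary(v):
    """Euclidean projection onto the nonconvex set {-1, +1} (ties broken towards +1)."""
    return 1.0 if v >= 0.0 else -1.0

def admm_binary(a, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimize 0.5*(x - a)**2 subject to x = z, z in {-1, +1}."""
    x, z, u = 0.0, 0.0, 0.0
    for _ in range(iters):
        # x-update: argmin_x 0.5*(x - a)**2 + (rho/2)*(x - z + u)**2, in closed form
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: projection of x + u onto the nonconvex set
        z = project_binary(x + u)
        # scaled dual update with the consensus residual x - z
        u = u + x - z
    return x, z, u

x, z, u = admm_binary(0.3)
print(x, z, abs(x - z))
```

For a = 0.3 the iterates reach consensus at x = z = 1, which is the global minimizer of this toy problem. On general nonconvex problems, ADMM may stall at a non-optimal point or fail to converge, which is why sufficient conditions for convergence, such as those studied in the fourth paper, are needed.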
The results of this thesis indicate the promising convergence behaviour of some
distributed optimization algorithms on nonconvex problems. Moreover, the results
demonstrate the potential of solving convex distributed resource allocation prob-
lems using very limited communication bandwidth. Future work will consider how
even more general convex and nonconvex problems can be solved using limited
communication bandwidth and also study lower bounds on the bandwidth needed
to solve general resource allocation optimization problems.

Keywords: Distributed Optimization, Resource Allocation, Power Networks, Limited Communication, Nonconvex Optimization, Wireless Sensor Networks, Cyberphysical Systems.

Acknowledgments
First of all, I would like to thank my supervisor, Associate Professor Carlo Fischione, for his unlimited support, insightful guidance, and inspiring spirit. I am grateful to Prof. Bo Wahlberg for being my co-advisor. I would also like to express my deep gratitude to Pradeep Chathuranga Weeraddana for patiently guiding me through my first steps of research. During my studies, I also had the great pleasure of working with Associate Professor Michael Rabbat, Professor Vahid Tarokh, Assistant Professor Na Li, Chinwendu Enyioha, and Kathryn Heal. Thanks for all the fun and productive times we had together; I learned so much from all of you! In particular, I would like to express sincere appreciation to Prof. Vahid Tarokh for inviting me to work in his group at Harvard University and making me feel so welcome during my stay there. I would also like to thank the Engblom Foundation for supporting my research at Harvard University. I would also like to thank all my colleagues at the Department of Automatic Control, School of Electrical Engineering, KTH Royal Institute of Technology, for contributing to making work so much fun. Finally, I would like to thank my girlfriend, friends, and family for their unconditional support during my studies.

Sindri Magnússon
Stockholm, January 2016
Contents

Contents

List of Figures

List of Acronyms

I Thesis Overview

1 Introduction
1.1 Motivating Examples
1.1.1 Future Power Distribution Networks
1.1.2 Wireless Sensor Networks
1.2 Problem Formulation
1.3 Outline and Contribution of the Thesis

2 Background
2.1 Notation
2.2 Resource Allocation
2.3 Optimal Power Flow Problem
2.4 Dual Decomposition
2.5 Alternating Direction Method of Multipliers

II Included Papers

A Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks
A.1 Introduction
A.1.1 Contributions of This Work
A.1.2 Notation and Definitions
A.2 System Model and Algorithm
A.3 Convergence analysis of Algorithm 2
A.3.1 General Convergence Result
A.3.2 Linear Convergence Rate
A.4 Numerical Results
A.5 Summary and Conclusion

B Convergence of Limited Communications Gradient Methods
B.1 Introduction
B.1.1 Contributions of This Work
B.1.2 Notation
B.2 Preliminaries and Application Example
B.2.1 Application Example: Distributed Power Allocation
B.3 Quantized Gradient Descent Methods
B.3.1 θ-Covers: Solution to Question A)
B.3.2 Algorithm: Solution to Questions B) and C)
B.3.3 Minimal Quantization: Solution to Question D)
B.4 Convergence
B.4.1 Constant Step Size
B.4.2 Diminishing Step Size
B.5 Simulation Results
B.6 Conclusions and future work
B.7 Appendix

C A Distributed Approach for the Optimal Power Flow Problem Based on ADMM and Sequential Convex Approximations
C.1 Introduction
C.1.1 Previous work
C.1.2 Our Contributions
C.1.3 Organization and Notations
C.2 System model and Problem Formulation
C.2.1 Centralized formulation
C.2.2 Distributed formulation
C.3 Distributed solution method
C.3.1 Outline of the algorithm
C.3.2 The subproblems: Private variable update
C.3.3 On the use of quadratic programming (QP) solvers
C.3.4 Net variables and dual variable updates
C.4 Properties of the distributed solution method
C.4.1 Graphical illustration of Algorithm 2
C.4.2 Optimality properties of Algorithm 2 solution
C.4.3 Optimality properties of ADMM-DOPF solution
C.5 Numerical results
C.5.1 Properties of Algorithm 2
C.5.2 Connection to Proposition 8
C.5.3 Convergence and scalability properties
C.6 Conclusions
C.7 Appendix
C.7.1 On the use of quadratic programming (QP) solvers
C.7.2 Proofs

D On the Convergence of Alternating Direction Lagrangian Methods for Nonconvex Structured Optimization Problems
D.1 Introduction
D.1.1 Related Literature
D.1.2 Notation and Definitions
D.2 Problem Statement, Related Background, and Contribution of the Paper
D.2.1 Problem Statement
D.2.2 Penalty and Augmented Lagrangian Methods
D.2.3 Alternating Direction Lagrangian Methods
D.2.4 Contribution and Structure of the Paper
D.3 Alternating Direction Penalty Method
D.3.1 Algorithm Description
D.3.2 Algorithm Properties: Unconstrained Case
D.3.3 Algorithm Properties: Constrained Case
D.4 Alternating Direction Method of Multipliers
D.4.1 Algorithm Description
D.4.2 Algorithm Properties
D.5 Application: Cooperative Localization in Wireless Sensor Networks
D.5.1 Numerical Results
D.6 Conclusions

Bibliography

List of Figures

1.1 The future distribution network will consist of various devices that cooperate to achieve more economical and environmentally friendly power distribution. A challenging aspect is how to achieve the best operation using limited communication bandwidth for coordination and communication among the devices. (source: https://www.flickr.com)

1.2 Wireless sensor network operating a city. Some of the devices coordinate the power flows through the city, others the vehicular traffic and water distribution. (source: Yuzhe Xu’s Licentiate Thesis [1])

1.3 The figure depicts the convergence of the one-way communication dual decomposition, reported in the first paper, for solving an instance of (1.1) where the number of users is N = 5, 10, 20, 30, 40, 150, 1000. Figures 1.3a and 1.3b depict the distance from the dual and primal variables, respectively, at each iteration, to the optimal dual and primal variables. The blue dotted line in Figure 1.3a depicts a theoretical bound on the convergence provided in the paper. For illustration purposes, the primal problems have been constructed so that the dual problem does not change when the number of users increases, and hence the dual convergence shown in Figure 1.3a is the same for all N. The paper discusses how the primal problem can be regulated so that the dual problem does not change significantly when the number of users increases, indicating superior scalability properties.

1.4 The figure depicts the iterations to obtain the solution to optimization problem (1.1) based on the quantized gradient methods from the second paper reported in the thesis. The vertical axes in Figures 1.4a and 1.4b depict the norm of the gradient and the primal objective function value, respectively, at every iteration. The green dotted line is an approximation of the gradient direction method (B.3) where only 4 bits are communicated per iteration, yet it achieves almost the same performance.

1.5 The figure depicts the convergence of ADMM for the nonconvex OPF problem, of the form (1.1), studied in the third paper. The algorithm is distributed among the nodes/buses of a power network, where each node keeps a private estimate of its neighbours' voltages, and δ is a measure of the consensus/consistency between the voltage estimates. ρ is an algorithm parameter that penalizes violations of the consensus constraint. Figure 1.5a depicts δ over the course of the algorithm for different values of ρ. The figure shows that the algorithm always reaches consensus among the nodes with high numerical accuracy and that larger ρ enforces more consistency. Figure 1.5b depicts the objective function value against the consensus measure δ, where n indicates the iteration number. The figure shows that when ρ = 10^6, the algorithm converges almost to the red vertical line, which depicts a lower bound on the optimal value given by a relaxation.

1.6 The figure depicts the convergence of the distributed algorithms studied in the fourth reported paper when (1.1) represents a nonconvex estimation problem, i.e., localization based on noisy distance measurements. Figure 1.6a depicts the problem setup, where the sensors (black markers) estimate their own locations by measuring the distances to their neighbours in the network and communicating over the network. The grey circles are anchor nodes that know their own location, and the coloured markers denote the estimates obtained from different algorithms. Figure 1.6b depicts the objective function, i.e., a penalty for violating the distance measurements, at every iteration. Since the problem is nonconvex, the algorithms can converge to different local minima.

A.1 Using a sufficiently small step size, the feasibility of the primal problem is maintained. The upper boundary of this feasible set is denoted by the dotted line.
A.2 Convergence of the algorithm for different numbers of users N = 5, 10, 20, 30, 40, 150, 1000. In Figures A.2a and A.2c, the behaviour of the algorithm is identical for all N. The blue dotted line in A.2c is the theoretical bound from Proposition 5.
B.1 Gradient and objective function value over the course of the algorithm.
C.1 The feasible set of $((v_k^{\mathrm{Re}})_r, (v_k^{\mathrm{Im}})_r)$, where $A = (v_k^{\max})_r$ and $B = (v_k^{\min})_r$.
C.2 Graphical illustration of Algorithm 2.
C.3 CDF displaying $\mathrm{DF}_{k,n}$ of Eq. (C.38) for every subproblem $k$ and every ADMM iteration $n$ for each of the four examples.
C.4 Histograms displaying $\mathrm{DF}_{k,n}$ in Eq. (C.38) for every subproblem $k$ and every ADMM iteration $n$ for the considered test networks.
C.5 δ versus number of ADMM iterations.
C.6 versus number of ADMM iterations.
C.7 δ versus the objective function value.
C.8 Relative objective function.
C.9 $T_p$, $|f - f^\star|/f^\star$, δ, and as a function of ADMM iterations.
D.1 Example where ADPM fails to converge to a feasible point when sets X and Z are nonconvex.
D.2 The results of running all 5 algorithms on the test network.
D.3 The position estimate each algorithm converges to.
List of Acronyms

ADMM Alternating Direction Method of Multipliers
CDF Cumulative Distribution Function
OPF Optimal Power Flow
QGM Quantized Gradient Methods
RA Resource Allocation
WSN Wireless Sensor Network
Part I

Thesis Overview

Chapter 1

Introduction

Cyber-physical systems, consisting of networked computing and sensing devices used to control physical systems, appear in numerous modern engineering applications. Examples range from large systems such as smart cities, power grids, and vehicular networks, to smaller systems such as robotic teams, smart homes, and inter-body wireless sensor networks. The different network entities, e.g., computing, sensing, and actuator devices, are usually scattered throughout the physical system that they operate, and they coordinate through a communication network. The traditional way to operate these systems is a centralized control/decision-making approach, where all the entities send their data to a central node, which makes an executive decision. However, the centralized approach is generally not applicable in practice due to the constraints of real-world engineering systems, e.g., their large scale, the need for real-time operation, or communication limits. Moreover, the centralized approach may violate the autonomy and privacy of the individual entities, which is unacceptable in most practical applications. Therefore, reliable control/decision-making algorithms for cyber-physical systems must run in a distributed fashion, where the entities cooperate over a communication network.
In addition to the distributed nature, a challenging aspect in the design of robust distributed algorithms is that, as technology advances and the networks become larger, the communication bandwidth used to coordinate these networks has fundamental limits or may be scarce due to high costs. In particular, computing, sensing, and actuator devices keep becoming cheaper, smaller, and more convenient to deploy in larger quantities. At the same time, both academia and industry are constantly coming up with new ways to revolutionize networked infrastructures and cyber-physical systems using the latest technology. Yet, a bottleneck for future developments is the communication bandwidth, which is a scarce or expensive resource. Therefore, distributed algorithms for cyber-physical systems that are agnostic about the communication infrastructure are not sustainable. As a result, it is important to investigate communication-efficient algorithms for completing the main tasks of cyber-physical systems, such as optimization, decision making, or estimation.


The task of finding good operating points in cyber-physical systems can often be formulated as an optimization problem with a global network-wide objective function and problem data that are scattered among the network entities. Distributed algorithms for solving these optimization problems have been extensively investigated recently, but almost exclusively for well-behaved convex problems. In contrast, distributed algorithms for the more challenging nonconvex problems have received little attention, despite the emerging need for scalable solutions for many large-scale nonconvex applications, e.g., localization in wireless sensor networks and optimal power flow in smart grids. Fortunately, most large-scale nonconvex problems share the structures that are usually exploited to obtain distributed/scalable algorithms for their convex counterparts; however, the convergence properties of the resulting solution algorithms for nonconvex problems remain largely unexplored. In this thesis, we investigate some prominent nonconvex optimization problems that appear in cyber-physical systems, some of which are surveyed in the motivating examples of the next section.

1.1 Motivating Examples

We now provide two motivating examples for the theory developed in this thesis.

1.1.1 Future Power Distribution Networks

Power networks are arguably among the largest infrastructures made by humans and an essential foundation for most modern technology. Power networks consist of two layers, transmission networks and distribution networks, which are connected by electrical substations [2]. The transmission networks deliver power from big power plants to large factories and electrical substations. The distribution networks take over at the electrical substations and deliver the power within the cities to the end users. In the transmission networks, all the network entities are owned by the power companies and possibly a few other big corporations. Therefore, the information and the control actions needed for efficiently operating these networks are easily attainable. In fact, transmission networks are now operated at almost optimal efficiency after being heavily investigated for many decades. On the other hand, the distribution networks consist of a large number of users, such as households and small businesses, whose electricity consumption is unpredictable. Therefore, distribution networks have traditionally been operated using crude information, based on statistical inference of the aggregate power usage, and without timely and efficient influence on the users' behaviour.
The rapid developments of modern electronics over the last decades have resulted in an unprecedented growth of power demand in the distribution networks. At the same time, both legislators and consumers have largely rejected power generation that comes at the expense of nature. Therefore, future power networks are confronted with the conflicting goals of increasing generation and reducing environmental impact. The distribution networks must be reformed to keep up with these future developments.

Figure 1.1: The future distribution network will consist of various devices that cooperate to achieve more economical and environmentally friendly power distribution. A challenging aspect is how to achieve the best operation using limited communication bandwidth for coordination and communication among the devices. (source: https://www.flickr.com)

Fortunately, the many recent technical advancements
such as i) cheaper, smaller, and more effective sensing-, computing-, and actuator
devices, ii) the promising evolution of fast and distributed signal processing and control algorithms, and iii) more reliable communication, have the potential to revolutionize today’s distribution networks. In particular, in future smart distribution networks [3] (smart grids), users will be equipped with smart meters, which are small computational devices that can communicate with the power operator to help consumers use power more economically. The smart meter will both i) positively affect/control the users' behaviour by incentivising them to consider the state of the distribution network when using power, e.g., by lowering the price when the total demand is low, and ii) increase the efficiency of the distribution network by gathering the needed information and coordinating with other network entities.
Moreover, household appliances will be more autonomous and use power in a more economical and environmentally friendly manner, in sync with the user. The future distribution network will also generate power from many distributed renewable generators, such as wind turbines. The consumers will also be able to inject power into the grid from various renewable sources, such as solar panels and exercise bikes, or from the batteries of their electric vehicles.
While the vision of next-generation distribution networks has the potential to revolutionize today's power networks, many technical problems must first be solved. With the smart grid now in its infancy, industry and academia will investigate new hardware to be integrated into the grid and solve advanced tasks with the new technology. As the distribution networks grow, due to the integration of all the new technologies, it is essential to consider carefully from the start how the different tasks can be accomplished with minimal communication, so that the smart grid evolves in a sustainable manner and does not collapse under its growing future scale. Another reason for considering efficient communication is that no sophisticated communication infrastructure has yet been integrated into today's distribution networks. Nevertheless, power networks are equipped with a natural communication infrastructure, i.e., power line communication [4, 5], which can already be used but has limited communication bandwidth. Power line communication has the potential to kickstart many of the early integrations of the smart grid before a dedicated communication network is merged with the grid, which could take a long time, since industry will carefully consider any steps towards integrating costly new infrastructure into the grid. Therefore, it is essential to address the communication challenges right away, both for the sake of early integration of the smart grid and so that the smart grid can keep up with future challenges.
Another central problem in power networks is the optimal power flow problem, which seeks the best feasible power flow through the network, where what is best depends on the engineering need. The problem is well studied in transmission networks, but due to the lack of control and information in traditional distribution networks, the problem has only recently become relevant there. A challenge in solving the optimal power flow problem in distribution networks is that DC optimal power flow, an approximation of the problem that works well in transmission networks, is not appropriate for distribution networks due to their low voltage. Therefore, when solving power flow problems in distribution networks, the full AC optimal power flow problem needs to be considered, which turns out to be highly nonconvex. As a result, it is important to find distributed solution methods for nonconvex power flow problems in distribution networks in order to operate them more efficiently.
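A small check shows one source of this nonconvexity. In AC formulations, each bus voltage is a complex number whose magnitude must lie between a lower and an upper limit, so the feasible set of a single bus voltage is a ring (annulus) rather than a convex region. The Python sketch below uses hypothetical per-unit limits of 0.95 and 1.05 and two hypothetical voltages at angles ±0.4 rad, and shows that their average can violate the lower limit. It is an illustration only and does not model the full power flow equations.

```python
import cmath

V_MIN, V_MAX = 0.95, 1.05  # hypothetical per-unit voltage magnitude limits

def voltage_feasible(v):
    """Check V_MIN <= |v| <= V_MAX for a complex bus voltage v."""
    return V_MIN <= abs(v) <= V_MAX

v1 = cmath.rect(1.0, 0.4)    # 1.0 p.u. at angle +0.4 rad
v2 = cmath.rect(1.0, -0.4)   # 1.0 p.u. at angle -0.4 rad
midpoint = 0.5 * (v1 + v2)   # convex combination of two feasible voltages

print(voltage_feasible(v1), voltage_feasible(v2), voltage_feasible(midpoint))
```

The midpoint has magnitude cos(0.4) ≈ 0.921 p.u., below the lower limit, so a convex combination of two feasible points leaves the feasible set.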

1.1.2 Wireless Sensor Networks

Wireless sensor networks (WSN) [6] are one of the fundamental parts of cyber-physical systems, where the autonomous devices used to monitor/control the physical system cooperate over a wireless medium. In many cyber-physical systems, wireless communication is the only feasible way to coordinate the network devices due to environmental constraints, such as when the networks are mobile (e.g., vehicular networks [7]), are underwater [8], or are attached to or even inside the human body (e.g., body area networks [9]).

Figure 1.2: Wireless sensor network operating a city. Some of the devices coordinate the power flows through the city, others the vehicular traffic and water distribution. (source: Yuzhe Xu’s Licentiate Thesis [1])

Even when wired communications are feasible, wireless
communication can still be a better option due to the many benefits of WSNs, e.g.,
they do not need an existing communication infrastructure and are usually easier to deploy. However, to achieve all the benefits of wireless operation of cyber-physical systems, lightweight communication is essential, because communication over wireless channels is a scarce and sometimes very expensive resource that has to be well coordinated due to interference and message collisions. Moreover, the wireless devices are usually battery powered with limited or no other power sources, and therefore economical use of communication is essential to prolong the lifetime of the networks.
Localization and tracking is a fundamental task in many WSNs. The physical location of the network devices usually has a large statistical impact on the sensed information and is also relevant for controlling the network. Moreover, in many WSN applications, such as indoor positioning systems, localization/tracking is the sensing task of the network. The Global Positioning System (GPS) is often infeasible because there is no line of sight to the satellites, e.g., in indoor positioning. In other cases it gives unsatisfactory accuracy, e.g., in in-body wireless sensor networks. GPS is also generally unattractive because of the power and cost constraints of the individual sensors. A more attractive option is self-localization of the network [10]. Here, the locations of the devices are estimated from the known locations of reference nodes and from distance measurements between communication neighbours in the network. Localization using distance measurements is a nonconvex optimization problem. Therefore, it is essential to find distributed solution methods that solve such challenging nonconvex optimization problems cooperatively among the network devices, using efficient communication schemes.
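The nonconvexity can be seen on a very small example. The sketch assumes the common least-squares cost $\sum_j (\|x - a_j\| - d_j)^2$ for a single sensor at position $x$, with two anchors $a_j$ and measured distances $d_j$. This cost is an assumption and is not necessarily the one used in the papers. The anchor positions and distances are also hypothetical. Two mirror-image positions both fit the measurements, while their midpoint does not, which a convex cost could not allow.

```python
import math

# Hypothetical toy instance (an assumption, not data from the thesis):
# two anchors at known positions and one sensor with unknown position.
ANCHORS = [(-1.0, 0.0), (1.0, 0.0)]
MEASURED = [math.sqrt(2.0), math.sqrt(2.0)]  # measured sensor-anchor distances


def localization_cost(x, y):
    """Least-squares mismatch between estimated and measured distances."""
    cost = 0.0
    for (ax, ay), d in zip(ANCHORS, MEASURED):
        estimated = math.hypot(x - ax, y - ay)
        cost += (estimated - d) ** 2
    return cost


a = localization_cost(0.0, 1.0)    # fits both measurements (cost ~ 0)
b = localization_cost(0.0, -1.0)   # mirror image, also cost ~ 0
mid = localization_cost(0.0, 0.0)  # midpoint of the two candidates

# A convex cost would satisfy mid <= (a + b) / 2; this one violates it.
print(a, b, mid, mid <= (a + b) / 2)
```

Distance measurements alone cannot distinguish a configuration from its reflections. This is one reason why local methods for this problem can end up in different local minima.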

1.2 Problem Formulation

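In this thesis, we consider general problems of the form

$$
\begin{aligned}
\underset{x}{\text{minimize}}\quad & \sum_{i=1}^{N} f_i(x_i)\\
\text{subject to}\quad & (x_1,\dots,x_N)\in\mathcal{X},\quad x_k\in\mathcal{X}_k,\ k=1,\dots,N,
\end{aligned}
\tag{1.1}
$$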

where $i = 1,\dots,N$ index the users in a cyber-physical system, or the nodes in a network. The objective function is separable between the users, and the part $f_i$ of user $i = 1,\dots,N$ represents his/her preferences. The constraints are coupled between the users and can, for example, represent shared resources. Chapter 2 presents the theoretical background on problems of the form (1.1). In this thesis, we consider distributed solution methods for (1.1). In these methods, the users keep their private objective function parts $f_k$ and constraints $\mathcal{X}_k$, and communicate, possibly over a bandwidth-limited channel, to reach the solution. Moreover, we consider some nonconvex instances of problem (1.1).

1.3 Outline and Contribution of the Thesis

This thesis is based on four published or submitted papers. We now summarize the contributions of each of them.

Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks
The first paper studies how to solve problem (1.1) with a distributed dual descent power allocation algorithm for power networks that uses only one-way communication. At every iteration of the algorithm, the power supplier (i) broadcasts a scalar coordinating (price) signal to the users and (ii) measures the aggregate power usage in response to that signal. The algorithm therefore communicates only one scalar message per iteration. The users actually update their power consumption while the algorithm is running, so the total power usage may exceed the supplier's power capacity and cause blackouts. To avoid such blackouts, the resource allocation problem must remain feasible at every iteration of the algorithm, i.e., the power usage must always stay below the supplier's capacity. We show how the algorithm parameters can be chosen to ensure primal feasibility while the algorithm runs and hence avert costly blackouts. However, to ensure primal feasibility at every iteration, it is not possible to choose step sizes that give the convergence rate $O(1/t^2)$, where $t$ is the iteration index. This rate is optimal for the structure of the dual problem, which is convex with Lipschitz continuous gradients. In general, the algorithm has the convergence rate $O(1/t)$. Nevertheless, we show that under mild structural assumptions on the resource allocation problem, a linear convergence rate $O(c^t)$, with $c \in [0, 1)$, is achieved. Moreover, we provide additional problem structure under which the linear convergence rate is independent of the number of users, which demonstrates superior scalability. Finally, we illustrate the results using numerical simulations.

(a) Dual Iterates (b) Primal Iterates

Figure 1.3: The figure depicts the convergence of the one-way communication dual decomposition, reported in the first paper, for solving an instance of (1.1) where the number of users is N = 5, 10, 20, 30, 40, 150, 1000. Figures 1.3a and 1.3b depict the distance from the dual and primal variables, at each iteration, to the optimal dual and primal variables, respectively. The blue dotted line in Figure 1.3a depicts a theoretical bound on the convergence provided in the paper. For illustration purposes, the primal problems have been constructed so that the dual problem does not change when the number of users increases. The dual convergence shown in Figure 1.3a is therefore the same for all N. The paper discusses how the primal problem can be regulated so that the dual problem does not change significantly when the number of users increases, indicating superior scalability.
This article has been accepted to appear in:

• S. Magnússon, K. Heal, C. Enyioha, N. Li, C. Fischione, and V. Tarokh, Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks, accepted to appear in the 50th Annual Conference on Information Sciences and Systems (CISS) 2016, Princeton, NJ, USA.
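The one-way protocol can be illustrated on a toy instance of the single-resource problem (2.2) from Chapter 2, written in the utility-maximization convention of Section 2.4. The quadratic utilities, capacity, initial price and step size below are all hypothetical. In this instance, starting from a price above its optimal value and using a small step keeps the aggregate usage below Q at every iteration. This is only an illustration and not the parameter rule derived in the paper.

```python
# Hypothetical toy instance (assumed values): user k maximizes
# -(x_k - a_k)^2 / 2 - price * x_k over x_k >= 0.
A = [4.0, 3.0, 2.0]  # hypothetical preference parameters a_k
Q = 6.0              # hypothetical supplier capacity


def user_response(a_k, price):
    """Local best response of one user; nothing is sent back explicitly."""
    return max(0.0, a_k - price)


def run_one_way(price0, step, iters):
    price = price0
    history = []
    for _ in range(iters):
        # (i) the supplier broadcasts one scalar price; users react locally
        usage = sum(user_response(a_k, price) for a_k in A)
        # (ii) the supplier only measures the aggregate usage
        history.append(usage)
        # projected dual step: raise the price when usage exceeds Q
        price = max(0.0, price + step * (usage - Q))
    return price, history


price, history = run_one_way(price0=2.0, step=0.1, iters=200)
feasible = all(u <= Q + 1e-9 for u in history)  # small float tolerance
print(price, feasible)
```

In this instance, the usage is 9 − 3p for prices between 1 and 2, so the optimal price is 1. Starting at 2 with step 0.1, the price decreases geometrically toward 1 and the usage rises toward Q from below.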

(a) Gradient (b) Objective Function Value

Figure 1.4: The figure depicts the iterations for obtaining the solution to optimization problem (1.1) with the quantized gradient methods of the second paper reported in the thesis. The vertical axes in Figures 1.4a and 1.4b depict the norm of the gradient and the primal objective function value, respectively, at every iteration. The green dotted line is an approximation of the gradient direction method (B.3). It communicates only 4 bits per iteration, yet achieves almost the same performance.

Convergence of Limited Communications Gradient Methods

The second paper investigates distributed gradient methods, in which the gradient is communicated at every iteration, when the bandwidth is limited. In particular, the paper considers quantized gradient methods (QGMs), where the gradient descent direction is projected onto a finite quantization set D before being communicated. The paper investigates necessary and sufficient conditions for the quantization set D to be proper. Here, proper means that QGMs can minimize any convex function $f : \mathbb{R}^N \to \mathbb{R}$ with Lipschitz continuous gradients and a non-empty, bounded set of minimizers. We use this characterization to provide examples of proper quantization sets D. We also show that if $|D| \le N$, then D cannot be proper: some optimization problem from the aforementioned class cannot be solved by QGMs. Moreover, we show that there exist proper quantization sets with $|D| = N + 1$. Hence, the minimal cardinality of a proper quantization set is $N + 1$, and an element of such a set can be communicated using $\lceil \log_2(N + 1) \rceil$ bits. We provide a bound on the number of iterations needed to achieve any accuracy of the optimal solution, and this bound depends on the fineness of the quantization set D. Specifically, the bound on the number of iterations decreases as the quantization set becomes finer. We also show that when the step sizes are non-summable but square-summable, the iterates of QGMs converge to the set of optimal values. Finally, we demonstrate how the theory can be applied to a resource allocation problem in power networks.
This paper has been submitted to:

(a) Consensus (δ versus number of ADMM iterations, $\rho = 10^6, \dots, 10^{13}$) (b) Objective Function (δ versus objective function value, with the SDP relaxation bound)

Figure 1.5: The figure depicts the convergence of ADMM for the nonconvex OPF problem, of the form (1.1), studied in the third paper. The algorithm is distributed among the nodes/buses of a power network, where each node keeps a private estimate of its neighbours' voltages. δ is a measure of the consensus (consistency) between the voltage estimates, and ρ is an algorithm parameter that penalizes violations of the consensus constraint. Figure 1.5a depicts δ over the course of the algorithm for different values of ρ. The figure shows that the algorithm always reaches consensus among the nodes with high numerical accuracy, and that larger values of ρ enforce more consistency. Figure 1.5b depicts the objective function value against the consensus measure δ, where n indicates the iteration number. The figure shows that when $\rho = 10^6$, the algorithm converges almost to the red vertical line, which depicts a lower bound on the optimal value given by a relaxation.

• S. Magnússon, K. Heal, C. Enyioha, N. Li, C. Fischione, and V. Tarokh, Convergence of Limited Communications Gradient Methods, submitted to IEEE American Control Conference (ACC) 2016.
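A quantized gradient method of the kind summarized above can be sketched for N = 2. The sketch makes several assumptions. The quantizer picks the element of D with the largest inner product with the negative gradient, and D is a hypothetical set of three unit vectors forming a regular simplex ($|D| = N + 1$). The quadratic objective is also hypothetical, and the step sizes $1/(t+1)$ are non-summable but square-summable. This is an illustration, not the paper's exact algorithm or its characterization of proper sets.

```python
import math

# Hypothetical quantization set for N = 2: three unit vectors 120 degrees
# apart (|D| = N + 1 = 3), so every nonzero vector is within 60 degrees of one.
D = [(math.cos(a), math.sin(a))
     for a in (math.pi / 2, 7 * math.pi / 6, 11 * math.pi / 6)]
C = (1.0, 1.0)  # hypothetical minimizer of f(x) = ||x - C||^2 / 2


def grad(x):
    return (x[0] - C[0], x[1] - C[1])


def quantize(g):
    """Index of the element of D best aligned with the descent direction -g.
    Only this index (ceil(log2 3) = 2 bits) would need to be communicated."""
    scores = [-(d[0] * g[0] + d[1] * g[1]) for d in D]
    return max(range(len(D)), key=scores.__getitem__)


def qgm(x, iters):
    for t in range(iters):
        step = 1.0 / (t + 1)  # non-summable, square-summable step sizes
        d = D[quantize(grad(x))]
        x = (x[0] + step * d[0], x[1] + step * d[1])
    return x


x = qgm((0.0, 0.0), 5000)
print(math.dist(x, C))
```

The quantized direction always makes an angle of at most 60 degrees with the negative gradient. Each step therefore makes progress while the iterate is far from the minimizer, and the shrinking steps let the iterates settle near it.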

A Distributed Approach for the Optimal Power Flow Problem Based on ADMM and Sequential Convex Approximations
The third paper considers the optimal power flow (OPF) problem, which plays a central role in operating electrical networks. The problem consists in finding the optimal flow of power through a power network. It is nonconvex and NP-hard. Therefore, designing efficient algorithms of practical relevance is crucial, even though their global optimality is not guaranteed. The paper develops a novel, efficient method to address the general nonconvex OPF problem. The proposed method is based on the alternating direction method of multipliers combined with sequential convex approximations. The global OPF problem is decomposed into smaller problems associated with each bus of the network, whose solutions are coordinated via a light communication protocol. Therefore, the proposed method is highly scalable. The convergence properties of the proposed algorithm are mathematically substantiated. Finally, the algorithm is evaluated on a number of test examples, where its convergence properties are investigated and its performance is compared with a globally optimal method.

(a) Location Estimates (b) Objective Function ($f(x(t))$ versus number of iterations $t$)

Figure 1.6: The figure depicts the convergence of the distributed algorithms studied in the fourth reported paper when (1.1) represents a nonconvex estimation problem, i.e., localization based on noisy distance measurements. Figure 1.6a depicts the problem setup. The sensors (black markers) estimate their own locations by measuring the distances to their neighbours and communicating over the network. The grey circles are anchor nodes that know their own locations, and the coloured markers denote the estimates obtained from different algorithms. Figure 1.6b depicts the objective function, i.e., a penalty on violations of the distance measurements, at every iteration. Since the problem is nonconvex, the algorithms can converge to different local minima.
The paper has been published in:

• S. Magnússon, P. C. Weeraddana, C. Fischione, “A Distributed Approach for the Optimal Power-Flow Problem Based on ADMM and Sequential Convex Approximations,” IEEE Transactions on Control of Network Systems, vol. 2, no. 3, pp. 238-253, Sept. 2015.

On the Convergence of Alternating Direction Lagrangian Methods for Nonconvex Structured Optimization Problems
The fourth reported paper investigates the convergence of distributed methods for nonconvex structured optimization problems. In particular, the paper investigates two distributed solution methods that combine the fast convergence properties of augmented Lagrangian-based methods with the separability properties of alternating optimization. The first method is adapted from the classic quadratic penalty function method and is called the Alternating Direction Penalty Method (ADPM). The original quadratic penalty function method optimizes jointly over all variables in a single step. ADPM instead uses alternating optimization, which in turn makes it scalable. The second method is the well-known Alternating Direction Method of Multipliers (ADMM). It is shown that ADPM for nonconvex problems asymptotically converges to a primal feasible point under mild conditions. The paper also introduces additional conditions ensuring that ADPM asymptotically reaches the standard first-order necessary conditions for local optimality. For ADMM, the paper establishes novel sufficient conditions under which the algorithm asymptotically reaches the standard first-order necessary conditions. Based on this, the complete convergence of ADMM is characterized for a class of low-dimensional problems. Finally, the results are illustrated by applying ADPM and ADMM to a nonconvex localization problem in wireless sensor networks.
The chapter is based on the following papers:
• S. Magnússon, P. C. Weeraddana, M. Rabbat, C. Fischione, “On the Convergence of Alternating Direction Lagrangian Methods for Nonconvex Structured Optimization Problems,” accepted to appear in IEEE Transactions on Control of Network Systems, 2016.
• S. Magnússon, P. C. Weeraddana, M. Rabbat, C. Fischione, “On the convergence of an alternating direction penalty method for nonconvex problems,” in Proc. 48th Asilomar Conference on Signals, Systems and Computers, pp. 793-797, 2-5 Nov. 2014.
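The ADPM idea can be sketched on a scalar nonconvex instance of the two-block problem (2.13) from Chapter 2. The objective, the initial penalty and the geometric penalty schedule ρ ← 2ρ are assumptions made for illustration. The paper's precise update rules and conditions are not reproduced here. Each iteration performs an exact x-minimization and then an exact z-minimization of the quadratic penalty function, and afterwards increases ρ. This drives the residual x − z toward zero, i.e., toward primal feasibility.

```python
import numpy as np

# Hypothetical scalar instance of (2.13) with A = 1, B = -1, c = 0:
#   minimize f(x) + g(z) subject to x - z = 0,
# where f(x) = (x^2 - 1)^2 is nonconvex and g(z) = (z - 0.5)^2.


def f(x):
    return (x**2 - 1.0) ** 2


def x_update(z, rho):
    """Globally minimize f(x) + rho/2 (x - z)^2 over x.

    Its stationary points solve 4x^3 + (rho - 4)x - rho*z = 0. The global
    minimizer is one of the real roots, so picking the best real part of all
    roots returns a global minimizer (up to rounding)."""
    cands = np.roots([4.0, 0.0, rho - 4.0, -rho * z]).real
    vals = f(cands) + 0.5 * rho * (cands - z) ** 2
    return float(cands[np.argmin(vals)])


def z_update(x, rho):
    """Closed-form minimizer of (z - 0.5)^2 + rho/2 (x - z)^2."""
    return (1.0 + rho * x) / (2.0 + rho)


def adpm(z=0.5, rho=1.0, beta=2.0, iters=25):
    """Alternate exact x- and z-minimizations, then increase the penalty."""
    for _ in range(iters):
        x = x_update(z, rho)
        z = z_update(x, rho)
        rho *= beta  # assumed geometric penalty schedule
    return x, z, rho


x, z, rho = adpm()
print(abs(x - z), rho)
```

After each z-update, the residual equals (2x − 1)/(2 + ρ). It therefore shrinks roughly like 1/ρ as the penalty grows. Which local solution the iterates approach depends on the starting point.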
Chapter 2

Background

This chapter summarizes the background theory used in the contributions of the thesis. In particular, Section 2.2 introduces the general resource allocation problems studied in this thesis. Section 2.3 introduces the optimal power flow problem, a specific resource allocation problem that plays an important role in the operation of power networks. Sections 2.4 and 2.5 then discuss distributed solution methods for the studied problems. Section 2.4 covers dual decomposition, and Section 2.5 covers the alternating direction method of multipliers (ADMM).

2.1 Notation
We use the following notation in this chapter. Vectors and matrices are represented by boldface lower and upper case letters, respectively. The sets of real n-vectors and n×m matrices are denoted by $\mathbb{R}^n$ and $\mathbb{R}^{n\times m}$, respectively, and $\mathbb{C}$ represents the set of complex numbers. Otherwise, we use calligraphic letters to represent sets. The superscript $(\cdot)^{\mathsf T}$ stands for transpose. $j$ denotes the imaginary unit $\sqrt{-1}$. $\mathrm{diag}(A_1,\dots,A_n)$ denotes the block diagonal matrix with $A_1,\dots,A_n$ on the diagonal. $\|\cdot\|$ denotes the 2-norm. $\mathrm{dom}\, f$ is the domain of the function $f : \mathbb{R}^n \to \mathbb{R}^m$, and $\mathrm{int}\,\mathcal{X}$ denotes the interior of the set $\mathcal{X}$.

2.2 Resource Allocation

The question of how to distribute limited resources among a number of entities is central to every society. Indeed, it is the theme of a whole field of social science, namely economics. The question also has deep roots in engineering, since many engineering systems are designed to distribute limited resources, e.g., power, data/bandwidth, or water, among users. To design engineering systems that distribute limited resources in a sensible manner, the question must be expressed formally as a mathematical problem. It is then clear which resource allocations are more desirable than others. A standard approach is to formulate the resource allocation problem as an optimization problem by associating a utility function $f_i$ with each user $i \in \mathcal{N}$, where $\mathcal{N} = \{1, \dots, N\}$ is the set of $N$ users. The resource allocation problem can then be formulated as follows.
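$$
\begin{aligned}
\underset{x}{\text{minimize}}\quad & \sum_{k=1}^{N} f_k(x_k)\\
\text{subject to}\quad & g(x_1,\dots,x_N)\le 0 \quad \big(\text{or } g(x_1,\dots,x_N) = 0\big),\\
& x_k\in\mathcal{X}_k,\quad \text{for all } k=1,\dots,N,
\end{aligned}
\tag{2.1}
$$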
where the function $g$ is associated with the resources that are shared among the users, and $\mathcal{X}_k$ is a local constraint associated with user $k \in \mathcal{N}$, which expresses his/her preferences. We refer to the constraint $g(x_1,\dots,x_N) \le 0$ (or $g(x_1,\dots,x_N) = 0$) as the coupling constraint, since it couples the allocations of the users. The coupling constraint can be an equality or an inequality constraint, depending on whether the resources must be used up or not. The utility functions $f_k$ can represent the preferences of user $k \in \mathcal{N}$ for the allocated resources. They can also be designed to achieve various types of equilibrium points or fairness among the users [11, 12].
An illustrative and practical instance of (2.1) is when one resource is shared
among all the users as follows:
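$$
\begin{aligned}
\underset{x}{\text{minimize}}\quad & \sum_{k=1}^{N} f_k(x_k)\\
\text{subject to}\quad & \sum_{k=1}^{N} x_k \le Q \quad \Big(\text{or } \sum_{k=1}^{N} x_k = Q\Big),\\
& x_k\in\mathcal{X}_k,\quad \text{for all } k=1,\dots,N.
\end{aligned}
\tag{2.2}
$$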
Resource allocation problem (2.2) can, for example, appear in power networks where a limited amount of power is available and needs to be shared among the users. Here, $Q$ denotes the power available in the network, $x_k$ the power allocated to user $k \in \mathcal{N}$, and $\sum_{k=1}^{N} x_k$ the aggregate power usage. The set $\mathcal{X}_k$ and the function $f_k$ represent the preferences of user $k \in \mathcal{N}$. More complicated variants of (2.1) can involve several resources, or even coupling between the different resources.
In power networks, it is important both to find a good or fair allocation of the available power among the users and to ensure that the network can support that allocation. This is because power flows in power networks according to physical laws, i.e., Ohm's and Kirchhoff's laws, which are represented by the power flow equations. We next present a special case of (2.1) that includes these physical properties of power networks.

2.3 Optimal Power Flow Problem

Consider a power network $(\mathcal{N}, \mathcal{E})$, where $\mathcal{N}$ and $\mathcal{E}$ are the users (vertices) and power lines (edges), respectively. We represent the power flow through each user $k \in \mathcal{N}$ by the power flow equation

$$g_k(z_{\mathcal{N}_k}) = 0,$$

where $z_{\mathcal{N}_k} = (z_i)_{i\in\mathcal{N}_k}$ and $\mathcal{N}_k = \{j \mid (k,j)\in\mathcal{E}\}\cup\{k\}$. The vector $z_i$ contains all the needed physical quantities of user $i$, such as voltage and power injection. The optimal power flow problem consists in finding the optimal flows that satisfy the power flow equations. More formally, the problem can be written as

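$$
\begin{aligned}
\underset{z_1,\dots,z_N}{\text{minimize}}\quad & \sum_{k=1}^{N} f_k(z_k)\\
\text{subject to}\quad & g_k(z_{\mathcal{N}_k}) = 0,\quad \text{for all } k\in\mathcal{N},\\
& z_k\in\mathcal{X}_k,\quad k\in\mathcal{N}.
\end{aligned}
\tag{2.3}
$$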

Notice that (2.3) is a special case of (2.1) with

$$g(z_1,\dots,z_N) = \big(g_k(z_{\mathcal{N}_k})\big)_{k\in\mathcal{N}}.$$

The utility functions $f_k$ and the local constraints $\mathcal{X}_k$ can vary depending on the engineering need. For example, the utilities can be chosen so that (2.3) minimizes the power loss, or the generation cost of satisfying the users' demand given by $\mathcal{X}_k$. Alternatively, the utilities can be constructed so that (2.3) finds a fair allocation of power among the households.
We now formally express the power flow equations $g(z_1,\dots,z_N) = 0$; see any textbook on power system analysis, e.g., [2, 13], for more details. To express the power flows, set $z_k = (s_k, v_k)$, where $s_k, v_k \in \mathbb{C}$ are the power and the voltage at user (or bus) $k \in \mathcal{N}$, respectively. Then
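$$
g_k(z_{\mathcal{N}_k}) = s_k - v_k \sum_{i\in\mathcal{N}_k} y_{ki}^{*}\, v_i^{*},
\tag{2.4}
$$

where $(\cdot)^{*}$ denotes complex conjugation. Here $y_{ki} = g_{ki} + j b_{ki} \in \mathbb{C}$, with $g_{ki}, b_{ki} \in \mathbb{R}$, is the $(k,i)$ entry of the bus admittance matrix, which is built from the admittances of the flow lines in $\mathcal{E}$. The power flow equations are also commonly expressed with the voltages in polar coordinates. In that case, we set $z_k = (p_k, q_k, |v_k|, \theta_k)$, where $s_k = p_k + j q_k$ with $p_k, q_k \in \mathbb{R}$ and $v_k = |v_k| e^{j\theta_k}$. The power flow equations (2.4) then become

$$
g_k(z_{\mathcal{N}_k}) =
\begin{bmatrix}
p_k - \sum_{i\in\mathcal{N}_k} |v_i||v_k|\big(g_{ki}\cos(\theta_k-\theta_i) + b_{ki}\sin(\theta_k-\theta_i)\big)\\
q_k - \sum_{i\in\mathcal{N}_k} |v_i||v_k|\big(g_{ki}\sin(\theta_k-\theta_i) - b_{ki}\cos(\theta_k-\theta_i)\big)
\end{bmatrix}.
\tag{2.5}
$$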

The power flow equations (2.4) and (2.5) are nonlinear, and as equality constraints they render the optimization problem (2.3) nonconvex.
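As a numerical sanity check of the rectangular and polar forms, the sketch below computes the power injections from both. It uses random voltages and a random complex symmetric matrix standing in for the entries $y_{ki}$, both hypothetical, and every bus is connected to every other so that $\mathcal{N}_k$ contains all buses. The sketch uses the standard relation $s_k = v_k \sum_{i} y_{ki}^{*} v_i^{*}$, with $(\cdot)^{*}$ the complex conjugate, and checks that the two forms agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # hypothetical 3-bus network in which every bus neighbours every other

# Random complex symmetric matrix standing in for the entries y_ki (hypothetical).
Y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Y = (Y + Y.T) / 2
G, B = Y.real, Y.imag  # y_ki = g_ki + j b_ki

mag = rng.uniform(0.9, 1.1, size=n)     # |v_k|
theta = rng.uniform(-0.2, 0.2, size=n)  # theta_k
v = mag * np.exp(1j * theta)

# Rectangular form: s_k = v_k * sum_i conj(y_ki) * conj(v_i)
s_rect = v * np.conj(Y @ v)

# Polar form with angle differences theta_k - theta_i
dth = theta[:, None] - theta[None, :]
vv = mag[:, None] * mag[None, :]
p = np.sum(vv * (G * np.cos(dth) + B * np.sin(dth)), axis=1)
q = np.sum(vv * (G * np.sin(dth) - B * np.cos(dth)), axis=1)

print(np.allclose(s_rect, p + 1j * q))
```

The agreement only holds with the conjugates in place. Without them, the signs of the sine terms in the polar form would not match.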
As argued in the introduction, it is essential to solve problems such as (2.1),
(2.2), and (2.3) in a distributed manner. We next review standard distributed meth-
ods for these problems.

2.4 Dual Decomposition

We start by recalling how problems of the form (2.1) can be decomposed among the users in $\mathcal{N}$ using duality theory; for a more comprehensive overview, see, for example, [14, Chapter 6]. The dual function $D$ of problem (2.1) with respect to the coupling constraint is obtained by maximizing the Lagrangian function, given by
$$
L(x, p) = \sum_{k=1}^{N} f_k(x_k) - p^{\mathsf T} g(x_1,\dots,x_N),
\tag{2.6}
$$

which results in

\begin{align}
D(p) &= \max_{x\in\mathcal{X}}\; L(x, p) \tag{2.7}\\
&= L(x(p), p), \tag{2.8}
\end{align}
where $x = (x_1,\dots,x_N)$, $\mathcal{X} = \prod_{i=1}^{N} \mathcal{X}_i$, and

$$
x(p) = \operatorname*{argmax}_{x\in\mathcal{X}}\; L(x, p).
\tag{2.9}
$$

The dual problem of (2.1) with respect to the coupling constraint g is then given
by
$$
\begin{aligned}
\underset{p}{\text{minimize}}\quad & D(p)\\
\text{subject to}\quad & p \ge 0.
\end{aligned}
\tag{2.10}
$$
The dual function $D$ is always convex, since it is a pointwise supremum of functions that are affine in $p$. The dual problem (2.10) is therefore always convex, even when the primal problem (2.1) is nonconvex. Consequently, $D$ has a subgradient at every interior point of its domain [15, Theorem 1.7], i.e., at every $p \in \mathrm{int}\,\mathrm{dom}\,D$, and a (sub)gradient is given by

$$
\nabla D(p) = -g(x_1(p),\dots,x_N(p)).
\tag{2.11}
$$

If optimization problem (2.9) has a unique solution, then (2.11) is a gradient; otherwise, every solution $x(p)$ of (2.9) provides a subgradient (2.11). $D$ is everywhere differentiable, for example, when the $f_i$ are strictly concave for all $i \in \mathcal{N}$ and $g$ is convex. In that case, (2.10) can be solved using the dual descent method

$$
p(t+1) = \big[p(t) - \gamma(t)\nabla D(p(t))\big]_+ = \big[p(t) + \gamma(t)\, g(x(p(t)))\big]_+,
\tag{2.12}
$$

where $[\cdot]_+$ denotes the projection onto the nonnegative orthant and the step size $\gamma(t) \in \mathbb{R}_+$ is chosen appropriately; see [14]. Subgradients exist everywhere even when $D$ is not differentiable, for example if $\mathcal{X}$ is compact and the $f_i$ and $g$ are continuous. In that case, (2.10) can be solved using the subgradient method, which follows the recursion (2.12) with subgradients in place of gradients.

The interesting aspect of the dual descent method (2.12) is that a decomposition structure in the coupling constraint $g$ can be used to decouple the recursion (2.12) among the users. For example, suppose that $g$ is separable between the users, i.e.,

$$
g(x) = \sum_{k=1}^{N} g_k(x_k).
$$

Then the optimization problem in (2.7) is fully separable between the users. Each user $k \in \mathcal{N}$ can compute its part of the (sub)gradient $\nabla D(p)$ without coordination by solving the local problem

$$
x_k(p) = \operatorname*{argmax}_{x_k\in\mathcal{X}_k}\; f_k(x_k) - p^{\mathsf T} g_k(x_k).
$$

However, we are interested in solving the primal problem (2.1) rather than the dual problem (2.10). The dual descent method is therefore only usable if the primal solution can be constructed from the optimal dual solution $p^\star$, e.g., if the duality gap is zero and (2.7) has a unique solution for $p^\star$.
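The sketch below illustrates the decomposition numerically. It uses a hypothetical instance of (2.2) in the maximization convention used here, with $f_k(x_k) = -(x_k - a_k)^2/2$ and $\mathcal{X}_k = [0, \infty)$. The coupling function $g(x) = \sum_k x_k - Q$ is split, as an assumption, into $g_k(x_k) = x_k - Q/N$. Each user solves its local maximization in closed form. The sum of the local terms $-g_k(x_k(p))$ is then compared with a central finite difference of the dual function $D$ from (2.6) and (2.7).

```python
# Hypothetical instance of (2.2) in the maximization convention of Section 2.4:
# f_k(x_k) = -(x_k - a_k)^2 / 2, X_k = [0, inf), g_k(x_k) = x_k - Q / N.
A = [4.0, 3.0, 2.0]
Q = 6.0
N = len(A)


def local_solution(a_k, p):
    """User k's local problem: maximize f_k(x_k) - p * g_k(x_k) over x_k >= 0."""
    return max(0.0, a_k - p)


def dual_function(p):
    """D(p) = max_x L(x, p), evaluated through the local solutions."""
    xs = [local_solution(a_k, p) for a_k in A]
    utility = sum(-0.5 * (x - a_k) ** 2 for x, a_k in zip(xs, A))
    return utility - p * sum(x - Q / N for x in xs)


def dual_gradient(p):
    """-g(x(p)), assembled from each user's local term -g_k(x_k(p))."""
    return -sum(local_solution(a_k, p) - Q / N for a_k in A)


p, h = 0.5, 1e-6
finite_diff = (dual_function(p + h) - dual_function(p - h)) / (2 * h)
print(dual_gradient(p), finite_diff)
```

For prices below 2, this instance has $D(p) = 1.5p^2 - 3p$, so both numbers should be close to $-1.5$ at $p = 0.5$. A positive constraint violation thus gives a negative derivative, and descent raises the price.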
The dual descent method can be unstable in some cases, since the dual gradient might not exist everywhere, e.g., outside the domain of $D$, where $D(p) = \infty$. Moreover, strong assumptions on the primal problem, such as strong convexity, are needed to ensure that the primal optimal solution can be constructed from the dual optimal solution. In addition, the convergence of the dual descent method depends heavily on the step-size choice $\gamma(t)$. These drawbacks of dual decomposition have motivated a more robust variant of the dual descent method, which we introduce next.

2.5 Alternating Direction Method of Multipliers

Consider now a different setup, where two users wish to cooperatively solve an optimization problem; we will see later that this setup naturally generalizes to the multi-user case, e.g., to (2.1). The two users wish to solve a problem of the form

$$
\begin{aligned}
\underset{x,z}{\text{minimize}}\quad & f(x) + g(z)\\
\text{subject to}\quad & Ax + Bz = c,
\end{aligned}
\tag{2.13}
$$

where the information kept by the two users is $(x, f, A)$ and $(z, g, B)$, respectively, and both users know $c$. One method for addressing problems of the form (2.13) cooperatively between the two users is the Alternating Direction Method of Multipliers (ADMM). ADMM is a variant of the dual descent method in which the dual function is obtained from a regularized (augmented) Lagrangian function, given by

$$
L_\rho(x, z, p) = f(x) + g(z) + p^{\mathsf T}(Ax + Bz - c) + \frac{\rho}{2}\|Ax + Bz - c\|_2^2.
\tag{2.14}
$$


Unlike the standard Lagrangian, the regularized Lagrangian is not separable in $x$ and $z$, even though the primal problem (2.13) has a useful decomposition structure in $x$ and $z$. Therefore, the usual primal variable update (cf. (2.7)) cannot be performed in parallel by the two users without communication. Instead, ADMM approximately solves the primal problem in two steps, first with respect to the $x$ variable and then with respect to the $z$ variable. It then updates the dual variable, as follows:

\begin{align}
x(t+1) &= \operatorname*{argmin}_{x}\; L_\rho(x, z(t), p(t)), \tag{2.15}\\
z(t+1) &= \operatorname*{argmin}_{z}\; L_\rho(x(t+1), z, p(t)), \tag{2.16}\\
p(t+1) &= p(t) + \rho\,\big(Ax(t+1) + Bz(t+1) - c\big). \tag{2.17}
\end{align}

Note that the users need to communicate $Ax(t+1)$ and $Bz(t+1)$ over the course of the algorithm, and the dual variable $p$ can be maintained by either or both users. Compared to dual decomposition, ADMM has very good convergence properties. It is guaranteed to converge to the optimal value of (2.13) for any $\rho > 0$ if $f$ and $g$ are closed, proper, and convex and $L_0$ (the standard Lagrangian) has a saddle point [16].
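The updates (2.15)-(2.17) can be sketched on a hypothetical scalar instance of (2.13) with $f(x) = (x-a)^2/2$, $g(z) = (z-b)^2/2$, $A = 1$, $B = -1$ and $c = 0$, for which both minimizations have closed forms. The sketch assumes the penalty term $(\rho/2)\|Ax + Bz - c\|_2^2$ in the augmented Lagrangian, which is the scaling that matches the dual step $\rho$ in (2.17).

```python
# Hypothetical scalar instance of (2.13): f(x) = (x - a)^2 / 2,
# g(z) = (z - b)^2 / 2, constraint x - z = 0 (A = 1, B = -1, c = 0).
def admm(a, b, rho, iters):
    x = z = p = 0.0
    for _ in range(iters):
        # (2.15): minimize over x, i.e. solve (x - a) + p + rho (x - z) = 0
        x = (a - p + rho * z) / (1.0 + rho)
        # (2.16): minimize over z with the new x, i.e. solve
        # (z - b) - p - rho (x - z) = 0
        z = (b + p + rho * x) / (1.0 + rho)
        # (2.17): dual update on the residual Ax + Bz - c = x - z
        p = p + rho * (x - z)
    return x, z, p


x, z, p = admm(a=1.0, b=3.0, rho=1.0, iters=100)
print(x, z, p)  # x and z approach (a + b) / 2 = 2
```

Here the first user only needs $z$ (i.e., $-Bz$) from the second user, and vice versa. This mirrors the communication pattern described above.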
Problem (2.13) can also cover multi-user scenarios. For instance, the $x$ variable can collect private variables of the different users, and $z$ can be a coupling variable that ensures consensus. Consider, for example, the optimal power flow problem (2.3). Let $x = (x_1,\dots,x_N)$, where $x_k$ is a local copy that user $k$ keeps of its own variable $z_k$ and of its neighbours' variables $z_i$. Thus $x_k$ is a copy of $z_{\mathcal{N}_k}$, with $\mathcal{N}_k$ defined as in Section 2.3. For convenience, we use the notation $x_{kk}$ to denote the component of $x_k$ associated with $z_k$. Problem (2.3) can now be formulated equivalently in the form of (2.13) as follows:
as follows
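$$
\begin{aligned}
\underset{x_1,\dots,x_N,\,z}{\text{minimize}}\quad & \sum_{k=1}^{N} f_k(x_{kk})\\
\text{subject to}\quad & x - Ez = 0,\\
& g_k(x_k) = 0,\quad \text{for all } k\in\mathcal{N},\\
& x_k\in\mathcal{X}_k,\quad k\in\mathcal{N},
\end{aligned}
\tag{2.18}
$$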
where $E = (E_1,\dots,E_N)$ is a binary matrix whose block $E_k$ captures the coupling constraints $x_k = z_{\mathcal{N}_k} = E_k z$. Notice that (2.18) is of the form (2.13), with the coupling constraint $x - Ez = 0$. Moreover, the x-update of the ADMM (cf. (2.15)) can be performed in a fully distributed manner among the users in $\mathcal{N}$, where each user $k \in \mathcal{N}$ solves the local subproblem
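$$
\begin{aligned}
\underset{x_k}{\text{minimize}}\quad & f_k(x_{kk}) + p_k^{\mathsf T}(t)\,\big(x_k - E_k z(t)\big) + \frac{\rho}{2}\|x_k - E_k z(t)\|^2\\
\text{subject to}\quad & g_k(x_k) = 0,\\
& x_k\in\mathcal{X}_k,
\end{aligned}
\tag{2.19}
$$

where $p_k(t)$ denotes the block of $p(t)$ associated with the constraint $x_k = E_k z$.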

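The z-update of the ADMM (cf. (2.16)) reduces to a simple averaging among neighbours, i.e.,

$$
z_k = \frac{1}{|\mathcal{N}_k|}\sum_{i\in\mathcal{N}_k} x_{ik},
$$

where $x_{ik}$ denotes the component of $x_i$ associated with $z_k$. This averaging can be achieved by simple coordination between the neighbours in the network.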
The dual variable update (cf. (2.17)) can then be achieved locally by each user.