Chapter 13

Consensus in Multi-agent Systems
13.1 Introduction
This chapter brings together game theory and consensus in multi-agent systems. A multi-
agent system involves n dynamic agents; these can be vehicles, employees, or computers,
each one described by a differential or difference equation. The interaction is modeled
through a communication graph. In a consensus problem the agents implement a dis-
tributed consensus protocol, i.e., distributed control policies based on local information.
The goal of a consensus problem is to make the agents reach consensus, that is, to converge
to the same value, called the consensus value.
The core message in this chapter is that the consensus problem can be turned into a
noncooperative differential game, where the dynamic agents are the players. To do this, we
formulate a mechanism design problem where a supervisor “designs” the objective func-
tions such that if the agents are rational and use their best-response strategies, then they
converge to a consensus value. We illustrate the results by simulating the vertical align-
ment maneuver of a team of unmanned aerial vehicles (UAVs).
Unfortunately, solving the mechanism design problem is a difficult task, unless the
problem can be modeled as an affine quadratic game. Given such a game, the main idea
is then to translate it into a sequence of more tractable receding horizon problems. At
each discrete time tk , each agent optimizes over an infinite planning horizon T → ∞
and executes the controls over a one-step action horizon δ = tk+1 − tk . The neighbors’
states are kept constant over the planning horizon. At time tk+1 each agent reoptimizes its
controls based on the new information on its neighbors' states that has become available.
We then take the limit for δ → 0.
This chapter is organized as follows. Section 13.2 formulates the consensus problem
(Problem 13.1) and the mechanism design problem (Problem 13.2). Section 13.3 provides
a solution to the consensus problem. Section 13.4 addresses the mechanism design problem.
Section 13.5 illustrates the results on a case study involving a team of UAVs performing
a vertical alignment maneuver. Finally, Section 13.6 provides notes and references on
the topic.
13.2 Consensus and mechanism design problems

Consider a network of n dynamic agents described by an undirected and connected graph G = (Γ, E), where Γ = {1, ..., n} is the set of vertices, each representing an agent, and E is the set of edges; such a network describes the interactions between pairs of agents. By undirected we mean that if (i, j) ∈ E, then (j, i) ∈ E. By connected we mean that for any vertex i ∈ Γ there exists a path in E that connects i with any other vertex j ∈ Γ. Recall that a path from i to j is a sequence of edges (i, k_1)(k_1, k_2) ... (k_r, j) in E. Note that in general the network G is not complete; that is to say, each vertex i has a direct link only to a subset of other vertices, denoted by N_i = { j : (i, j) ∈ E }. This subset is referred to as the neighborhood of i.

The interpretation of an edge (i, j) in the edge set E is that the state of vertex j is available to vertex i. As the network is undirected, communication is bidirectional; namely, the state of agent i is also available to agent j.
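As a small illustration of this representation, here is a minimal sketch in Python of how the neighborhoods N_i can be built from an undirected edge list; the 4-vertex ring used below is purely illustrative.

```python
from collections import defaultdict

def neighborhoods(vertices, edges):
    """Build N_i = {j : (i, j) in E} from a list of undirected edges;
    each edge is stored in both directions, so (i, j) in E implies (j, i) in E."""
    N = defaultdict(set)
    for i, j in edges:
        N[i].add(j)
        N[j].add(i)  # undirected: state information flows both ways
    return {i: sorted(N[i]) for i in vertices}

# Illustrative 4-agent ring; any connected topology works.
print(neighborhoods(range(4), [(0, 1), (1, 2), (2, 3), (3, 0)]))
# -> {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```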
Let x_i be the state of agent i. The evolution of x_i is determined by the following first-order differential equation, driven by a distributed and stationary control policy:

    ẋ_i(t) = u_i(x_i(t), x^{(i)}(t)),   i ∈ Γ,                      (13.1)

where x^{(i)} represents the vector collecting the states of only the neighbors of i. In other words, for the jth component of x^{(i)} we have

    x^{(i)}_j = x_j  if j ∈ N_i,
    x^{(i)}_j = 0    otherwise.
The control policy is distributed, as the control ui depends only on xi and x (i ) . The
control policy is stationary, as there is no explicit dependence of ui on time t . Occasion-
ally, we also call such a control policy time invariant. Let the state of the collective system
be defined by the vector x(t ) = {xi (t ), i ∈ Γ }, and let the initial state be x(0). Similarly,
denote by u(x) = {ui (xi , x (i ) ) : i ∈ Γ } the collective control vector, which we occasionally
call simply protocol. Fig. 13.1 depicts a possible network of dynamic agents. In the graph,
for some of the vertices, we indicate the corresponding dynamics.
The consensus problem consists in determining how to make the players reach agreement on a so-called consensus value. To give a precise definition of such a value, consider a function χ̂ : ℝ^n → ℝ. This function is a generic continuous and differentiable function of n variables x_1, ..., x_n which is permutation invariant. In other words, for any permutation σ(.) from the set Γ to the set Γ, the function satisfies

    χ̂(x_1, ..., x_n) = χ̂(x_{σ(1)}, ..., x_{σ(n)}).                  (13.2)

Reaching consensus means that the collective system converges to χ̂(x(0))·1, where 1 denotes the n-dimensional vector of all ones, and where the consensus value satisfies

    min_{i∈Γ} x_i(0) ≤ χ̂(x(0)) ≤ max_{i∈Γ} x_i(0).

In other words, the consensus value is a point in the interval from the minimum to the maximum values of the agents' initial states.
In preparation for the formulation of the consensus problem as a game, let us also introduce a cost functional for agent i, displayed below:

    J_i(x_i, x^{(i)}, u_i) = lim_{T→∞} ∫_0^T [ F(x_i, x^{(i)}) + ρ u_i² ] dt,        (13.3)

where ρ is a positive weight on the control effort and the penalty F(x_i, x^{(i)}) accounts for the deviation of player i from his neighbors. With the above cost functional in mind, a protocol is said to be optimal if each control u_i minimizes the corresponding cost functional J_i. Fig. 13.2 depicts a network of dynamic agents and the cost functionals corresponding to different agents.
Figure 13.2. Network of dynamic agents with the cost functionals J_i(x_i(t), x^{(i)}(t), u_i(t)) assigned to the players.
After this preamble, the problems under study can be stated as follows.

Problem 13.1 (Consensus). Given the network G = (Γ, E) and the dynamics (13.1), find a distributed and stationary protocol u(x) such that the agents asymptotically reach consensus on the value χ̂(x(0)), i.e., lim_{t→∞} x(t) = χ̂(x(0))·1.

Problem 13.2 (Mechanism design). Given the network G = (Γ, E) and the dynamics (13.1), find a penalty function F(.) and a protocol u(.) such that u(.) is optimal for the cost functionals (13.3) and solves Problem 13.1.

Note that a pair (F(.), u(.)) which is a solution to Problem 13.2 must guarantee that all cost functionals in (13.3) converge to a finite value. For this to be true, the integrand in (13.3) must be null at the consensus state χ̂(x(0))·1.
13.3 A solution to the consensus problem

Assumption 13.1 (Structure of χ̂(.)). Assume that the agreement function χ̂(.) verifies (13.2) and is such that χ̂(x) = f( Σ_{i∈Γ} g(x_i) ) for some f, g : ℝ → ℝ with dg(x_i)/dx_i ≠ 0 for all x_i.
It is worth noting that the class of agreement functions contemplated in the above assumption covers any value in the range between the minimum and the maximum of the initial states. This is clear if we look at Table 13.1 and note that, to span the whole interval, we can simply consider the mean of order p and let p vary between −∞ and ∞.
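For instance, following the structure of Assumption 13.1 with g(x) = x^p and f(s) = (s/n)^{1/p} (for positive initial states), the mean of order p reads

    χ̂(x) = [ (1/n) Σ_{i∈Γ} x_i^p ]^{1/p},

which gives the harmonic mean for p = −1, the geometric mean in the limit p → 0, the arithmetic mean for p = 1, and tends to min_{i∈Γ} x_i and max_{i∈Γ} x_i as p → −∞ and p → +∞, respectively.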
Theorem 13.1 (Solution to the Consensus Problem). The following protocol is a solution to the consensus problem:

    u_i(x_i, x^{(i)}) = α (1/(dg/dx_i)) Σ_{j∈N_i} φ̂( ϑ(x_j) − ϑ(x_i) )   for all i ∈ Γ,    (13.4)

where

• the parameter α > 0, and the function φ̂ : ℝ → ℝ is continuous, locally Lipschitz, odd, and strictly increasing;

• the function ϑ : ℝ → ℝ is differentiable, with dϑ(x_i)/dx_i locally Lipschitz and strictly positive;

• the function g(.) is strictly increasing, that is, dg(y)/dy > 0 for all y ∈ ℝ.
Proof. For each i ∈ Γ, define the disagreement variable η_i = g(x_i) − (1/n) Σ_{j∈Γ} g(x_j(0)), collect these in the vector η, and consider the candidate Lyapunov function V(η) = ½ Σ_{i∈Γ} η_i². We have V(η) = 0 if and only if η = 0. In addition, V(η) > 0 for all η ≠ 0. Our goal is to show that V̇(η) < 0 for all η ≠ 0. To this purpose, let us first rewrite V̇(η) as follows:

    V̇(η) = Σ_{i∈Γ} η_i η̇_i = Σ_{i∈Γ} η_i (dg(x_i)/dx_i) ẋ_i.                      (13.5)
Now, from (13.4) we can rewrite (13.5) as

    V̇(η) = Σ_{i∈Γ} η_i (dg(x_i)/dx_i) u_i
         = Σ_{i∈Γ} η_i (dg(x_i)/dx_i) α (1/(dg/dx_i)) Σ_{j∈N_i} φ̂(ϑ(x_j) − ϑ(x_i))    (13.6)
         = α Σ_{i∈Γ} Σ_{j∈N_i} η_i φ̂(ϑ(x_j) − ϑ(x_i)).

Now, by noting that j ∈ N_i if and only if i ∈ N_j for each i, j ∈ Γ, and pairing the terms associated with (i, j) and (j, i) (so that each undirected edge is counted once), from (13.6) we can rewrite

    V̇(η) = −α Σ_{(i,j)∈E} ( g(x_j) − g(x_i) ) φ̂( ϑ(x_j) − ϑ(x_i) ).              (13.7)
From (13.7) we conclude that V̇ (η) ≤ 0 for all η and, more specifically, V̇ (η) = 0 only
for η = 0. To see this, observe that for any (i, j ) ∈ E, x j > xi implies g (x j ) − g (xi ) > 0,
ϑ(x j ) − ϑ(xi ) > 0, and φ̂(ϑ(x j ) − ϑ(xi )) > 0. This is true, as α > 0 and g (.), φ̂(.), and
ϑ(.) are strictly increasing. Therefore we have α( g (x j ) − g (xi ))φ̂(ϑ(x j ) − ϑ(xi )) > 0 if
x j > xi . A similar argument can be used if x j < xi .
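It is worth making explicit why the agents can only agree on χ̂(x(0)). Since φ̂(.) is odd and the network is undirected, the quantity Σ_{i∈Γ} g(x_i) is invariant along the trajectories of (13.4):

    (d/dt) Σ_{i∈Γ} g(x_i) = Σ_{i∈Γ} (dg(x_i)/dx_i) ẋ_i = α Σ_{i∈Γ} Σ_{j∈N_i} φ̂(ϑ(x_j) − ϑ(x_i)) = 0,

because the terms associated with the edge pairs (i, j) and (j, i) cancel by the oddness of φ̂. Hence χ̂(x(t)) = f( Σ_{i∈Γ} g(x_i(t)) ) is constant, and since a consensus state c·1 yields χ̂(c·1) = c, the only value on which the agents can agree is χ̂(x(0)).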
13.4 A solution to the mechanism design problem

Let the following update times be given: t_k = t_0 + δk, where k = 0, 1, .... Let x̂_i(τ, t_k) and x̂^{(i)}(τ, t_k), τ ≥ t_k, be the predicted states of agent i and of his neighbors, respectively. The problem we wish to solve is the following one.
Problem 13.3 (Receding horizon). For all agents i ∈ Γ and times t_k, k = 0, 1, ..., given the initial states x_i(t_k) and x^{(i)}(t_k), find

    û_i(τ, t_k) = arg min 𝒥_i( x_i(t_k), x^{(i)}(t_k), û_i(τ, t_k) ),

where

    𝒥_i( x_i(t_k), x^{(i)}(t_k), û_i(τ, t_k) ) = lim_{T→∞} ∫_{t_k}^T [ ℱ( x̂_i(τ, t_k), x̂^{(i)}(τ, t_k) ) + ρ û_i²(τ, t_k) ] dτ    (13.8)

subject to the following constraints:

    dx̂_i(τ, t_k)/dτ = û_i(τ, t_k),   τ ≥ t_k,                       (13.9)
    dx̂^{(i)}(τ, t_k)/dτ = 0,         τ ≥ t_k,                       (13.10)
    x̂_i(t_k, t_k) = x_i(t_k),                                       (13.11)
    x̂^{(i)}(t_k, t_k) = x^{(i)}(t_k).                               (13.12)
The above set of constraints involves the predicted state dynamics of agent i and of
his neighbors; see (13.9) and (13.10), respectively.
The constraints also involve the boundary conditions at the initial time tk ; see condi-
tions (13.11) and (13.12). Note that, by setting x̂ (i ) (τ, tk ) = x (i ) (tk ) for all τ > tk , agent i
restrains the states of his neighbors to be constant over the planning horizon.
At t_{k+1}, new information on x^{(i)}(t_{k+1}) becomes available. Then the agents update their best-response strategies, which we refer to as receding horizon control policies. Consequently, for all i ∈ Γ, we obtain the closed-loop system

    ẋ_i(t) = û_i(t, t_k),   t ∈ [t_k, t_{k+1}),   k = 0, 1, ....
The complexity reduction introduced by the method derives from turning Problem 13.3 into n one-dimensional problems. This is a consequence of constraint (13.10), which forces x̂^{(i)} to be constant in (13.8). Further evidence of this derives from rewriting ℱ(.) so as to highlight its dependence on the state x̂_i(τ, t_k) alone. By doing this, the cost functional (13.8) takes the form

    𝒥_i = lim_{T→∞} ∫_{t_k}^T [ ℱ(x̂_i(τ, t_k)) + ρ û_i²(τ, t_k) ] dτ.              (13.13)

Consequently, the problem simplifies, as it involves only the computation of the optimal control û_i(τ, t_k) that minimizes (13.13).
Fig. 13.3 illustrates the receding horizon formulation. Given the dynamics of x_j(t) for all j ∈ N_i (solid line), agent i replaces it by the value measured at time t_k (small circles) and keeps it constant from t_k on (thin horizontal lines).
Figure 13.3. Receding horizon formulation for agent i: at each sampling time (circles) the estimated state of neighbor j, x̂_j(.), is maintained constant over the horizon (thin solid); the actual state x_j(.) changes with time (thick solid).
Let us now use the Pontryagin Minimum Principle to prove that the control û_i(τ, t_k) is a best-response strategy. Before doing this, let the Hamiltonian function be given by

    H(x̂_i, û_i, p_i) = ℱ(x̂_i) + ρ û_i² + p_i û_i,                   (13.14)

where p_i is the co-state. In the above we have dropped the dependence on τ and t_k. After doing this, the Pontryagin necessary conditions yield the following set of equalities:
    Optimality condition:   ∂H(x̂_i, û_i, p_i)/∂û_i = 0 ⇒ p_i = −2ρ û_i.            (13.15)

    Multiplier condition:   ṗ_i = −∂H(x̂_i, û_i, p_i)/∂x̂_i.                        (13.16)

    State equation:         x̂˙_i = ∂H(x̂_i, û_i, p_i)/∂p_i ⇒ x̂˙_i = û_i.           (13.17)

    Minimality condition:   ∂²H(x̂_i, û_i, p_i)/∂û_i² |_{x̂_i=x̂_i*, û_i=û_i*, p_i=p_i*} ≥ 0 ⇒ ρ ≥ 0.    (13.18)

    Transversality condition:   H(x̂_i, û_i, p_i) |_{x̂_i=x̂_i*, û_i=û_i*, p_i=p_i*} = 0 for all τ ≥ t_k.    (13.19)
The boundary condition (13.19) restrains the Hamiltonian to be null along any optimal path {x̂_i*(t), t ≥ 0} (see, e.g., [52, Sect. 3.4.3]).

Recall from Section 9.2.1 that the Pontryagin Minimum Principle yields conditions that are, in general, necessary but not sufficient (see also [52]). However, sufficiency is guaranteed under the following additional assumption of Mangasarian type:

    the Hamiltonian H(x̂_i, û_i, p_i*) is jointly convex in (x̂_i, û_i).             (13.20)
If we impose further restraints on the structure of ℱ(x̂_i), we obtain sufficient conditions that yield a unique optimal control policy û_i(.). This is established in the next result.

Theorem 13.2. Let agent i evolve according to the first-order differential equation (13.1). At times t_k, k = 0, 1, ..., let the agents be assigned the cost functional (13.8), where the penalty is given by

    ℱ(x̂_i(τ, t_k)) = ρ [ (1/(dg/dx_i)) Σ_{j∈N_i} ( ϑ(x_j(t_k)) − ϑ(x̂_i(τ, t_k)) ) ]²,    (13.21)

and where g(.) is increasing, ϑ(.) is concave, and (dg(y)/dy)^{−1} is convex.
Then the control policy

    û_i(τ, t_k) = u_i(x_i(τ)) = α (1/(dg/dx_i(τ))) Σ_{j∈N_i} ( ϑ(x_j(t_k)) − ϑ(x_i(τ)) ),   α = 1,    (13.22)

is the unique best-response strategy of agent i.

Proof. Differentiating (13.15) with respect to time and using (13.16), we obtain

    2ρ û̇_i = ∂H(x̂_i, û_i, p_i)/∂x̂_i.                               (13.23)
Also, from (13.17), we have û̇_i = (∂û_i/∂x̂_i) x̂˙_i = (∂û_i/∂x̂_i) û_i. Then (13.23) yields 2ρ (∂û_i/∂x̂_i) û_i = ∂H(x̂_i, û_i, p_i)/∂x̂_i. After integration, and from (13.19), we have that the solution of (13.23) must satisfy

    ρ û_i² = ℱ(x̂_i).                                               (13.24)

(Indeed, substituting p_i = −2ρ û_i from (13.15) into (13.14) gives H = ℱ(x̂_i) − ρ û_i², so the null-Hamiltonian condition (13.19) is precisely (13.24).) Then it suffices to note that û_i(τ, t_k) = (1/(dg/dx̂_i)) Σ_{j∈N_i} ( ϑ(x_j(t_k)) − ϑ(x̂_i(τ, t_k)) ) verifies the above condition.
To prove uniqueness, let us prove that ℱ(x̂_i) is convex. To this purpose, we can write ℱ = ℱ_3(ℱ_1(x̂_i), ℱ_2(x̂_i)), where ℱ_1(x̂_i) = (∂g/∂x̂_i)^{−1}, ℱ_2(x̂_i) = Σ_{j∈N_i} ( ϑ(x_j(t_k)) − ϑ(x̂_i) ), and ℱ_3 = ( ℱ_1(x̂_i) · ℱ_2(x̂_i) )². As ℱ_3(.) is nondecreasing in each argument, the function ℱ_3(.) is convex if both functions ℱ_1(.) and ℱ_2(.) are convex [64]. Function ℱ_1(.) is convex, as (dg/dx̂_i)^{−1} is convex by hypothesis. Analogously, ℱ_2(.) is convex, as ϑ(.) is concave, and this concludes the proof.
The above theorem also holds for α = −1 if dg/dx_i < 0 for all x_i(0).
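To make the receding horizon scheme concrete, the following is a minimal Python sketch of the closed loop, assuming forward-Euler integration and the closed-form best response (13.22) specialized to g(x) = x and ϑ(x) = x (the arithmetic-mean case of Section 13.5); the topology, step sizes, and horizons are illustrative.

```python
import numpy as np

def best_response(i, xi_hat, snapshot, neighbors):
    # (13.22) with g(x) = x and theta(x) = x:
    # u_i(tau) = sum_{j in N_i} (x_j(t_k) - x_i(tau)), neighbors frozen at t_k.
    return sum(snapshot[j] - xi_hat for j in neighbors[i])

def receding_horizon(x0, neighbors, delta=0.05, updates=200, substeps=10):
    """At each update time t_k, freeze the neighbors' states (constraint (13.10))
    and apply the best response over the action horizon [t_k, t_{k+1})."""
    x = np.asarray(x0, dtype=float)
    h = delta / substeps
    for _ in range(updates):
        snapshot = x.copy()            # x^{(i)}(t_k), measured at time t_k
        for _ in range(substeps):      # integrate x_i' = u_i over one delta
            u = np.array([best_response(i, x[i], snapshot, neighbors)
                          for i in range(x.size)])
            x = x + h * u
    return x

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # illustrative topology
print(receding_horizon([5.0, 5.0, 10.0, 20.0], ring))  # ~ the arithmetic mean
```

As the corollary below makes precise, letting δ → 0 recovers the protocol (13.27) exactly; for finite δ the scheme is only an approximation of it.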
From the above theorem, we can derive the following corollary.
Corollary 13.3. Let a network of dynamic agents G = (Γ, E) be given. Assume that the agents evolve according to the first-order differential equation (13.1). At times t_k, k = 0, 1, ..., let the agents be assigned the cost functional (13.8), where the penalty is given by

    ℱ(x̂_i(τ, t_k)) = ρ [ (1/(dg/dx_i)) Σ_{j∈N_i} ( ϑ(x_j(t_k)) − ϑ(x̂_i(τ, t_k)) ) ]²,    (13.25)

and where g(.) is increasing, ϑ(.) is concave, and (dg(y)/dy)^{−1} is convex. If we take δ → 0, then we have
(i) the penalty function

    ℱ(x̂_i(τ, t_k)) → F(x_i, x^{(i)}) = ρ [ (1/(dg/dx_i)) Σ_{j∈N_i} ( ϑ(x_j) − ϑ(x_i) ) ]²    (13.26)

and

(ii) the applied receding horizon control law

    u_i^{RH}(τ) → u_i(x_i, x^{(i)}) = (1/(dg/dx_i)) Σ_{j∈N_i} ( ϑ(x_j) − ϑ(x_i) ).    (13.27)
The above corollary provides a solution to the mechanism design problem (Problem 13.2). To see this, imagine that a game designer wishes the agents to asymptotically reach consensus on the consensus value χ̂(x) = f( Σ_{i∈Γ} g(x_i) ). He can accomplish this by assigning the agents the cost functional (13.3), where the penalty is as in (13.26), and where g(.) is increasing, (dg(y)/dy)^{−1} is convex, and δ is "sufficiently" small.
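As a concrete instance of this recipe (anticipating the second scenario of Section 13.5), to target the geometric mean the designer can take g(x) = ln x and f(s) = e^{s/n}, so that χ̂(x) = ( Π_{i∈Γ} x_i )^{1/n}. Since 1/(dg/dx_i) = x_i, choosing ϑ(x) = x (concave) gives the penalty and best response

    F(x_i, x^{(i)}) = ρ [ x_i Σ_{j∈N_i} (x_j − x_i) ]²,   u_i(x_i, x^{(i)}) = x_i Σ_{j∈N_i} (x_j − x_i),

which agree with protocol (13.29) below.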
13.5 Numerical example: Team of UAVs

Figure 13.4. Communication network of the team of four UAVs, with vertices v1, v2, v3, v4.
Let us now illustrate the results on a team of four UAVs. The UAVs are initially at different heights, and they are performing a vertical alignment maneuver in longitudinal flight. Each vehicle controls its vertical rate on the basis of the neighbors' heights. The UAVs interact as described by the communication network depicted in Fig. 13.4. The goal of the mission is to make the UAVs reach consensus on the formation center. We analyze four different vertical alignment maneuvers, where the formation center is the (i) arithmetic mean, (ii) geometric mean, (iii) harmonic mean, and (iv) mean of order 2 of the initial heights of all UAVs. Set the initial heights as x(0) = (5, 5, 10, 20)^T.
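For these initial heights the four target formation centers are readily computed: the arithmetic mean is (5 + 5 + 10 + 20)/4 = 10; the geometric mean is (5 · 5 · 10 · 20)^{1/4} = 5000^{1/4} ≈ 8.41; the harmonic mean is 4/(1/5 + 1/5 + 1/10 + 1/20) = 4/0.55 ≈ 7.27; and the mean of order 2 is √((5² + 5² + 10² + 20²)/4) = √137.5 ≈ 11.73.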
Simulations are performed using the following algorithm.
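A minimal Python sketch of such a simulation integrates the closed-loop dynamics ẋ_i = u_i under the best-response protocols (13.28)–(13.31) by forward Euler; the ring topology, step size, and horizon below are illustrative assumptions (the actual network is the one of Fig. 13.4, and the time scales differ across protocols; cf. Fig. 13.5).

```python
import numpy as np

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # illustrative topology

def protocol(kind, x, i, neighbors):
    """Best responses (13.28)-(13.31), as printed below."""
    s = sum(x[j] - x[i] for j in neighbors[i])
    if kind == "arithmetic":
        return s                      # (13.28)
    if kind == "geometric":
        return x[i] * s               # (13.29)
    if kind == "harmonic":
        return -x[i] ** 2 * s         # (13.30)
    if kind == "order2":
        return s / (2.0 * x[i])       # (13.31)
    raise ValueError(kind)

def simulate(kind, x0=(5.0, 5.0, 10.0, 20.0), delta=1e-4, horizon=5.0):
    x = np.array(x0)
    for _ in range(int(horizon / delta)):
        u = np.array([protocol(kind, x, i, ring) for i in range(x.size)])
        x = x + delta * u             # forward-Euler step of x_i' = u_i
    return x

for kind in ("arithmetic", "geometric", "order2"):
    print(kind, simulate(kind))       # compare with the means computed above
```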
In the first simulation, the UAVs are assigned the cost functional (13.3), where the penalty is F(x_i, x^{(i)}) = [ Σ_{j∈N_i} (x_j − x_i) ]². The UAVs use their best responses

    u(x_i, x^{(i)}) = Σ_{j∈N_i} (x_j − x_i),                        (13.28)

and, as a result, they asymptotically reach consensus on the arithmetic mean of x(0). We illustrate this in Fig. 13.5(a).
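Indeed, under (13.28) the sum Σ_{i∈Γ} x_i is invariant, since d/dt Σ_{i∈Γ} x_i = Σ_{i∈Γ} Σ_{j∈N_i} (x_j − x_i) = 0 by the symmetry of E, so the common limit must be the arithmetic mean (here 10). This is the g(x) = x instance of the general invariance argument of Section 13.3.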
In the second simulation, the UAVs are assigned a cost functional where the penalty is F(x_i, x^{(i)}) = [ x_i Σ_{j∈N_i} (x_j − x_i) ]². By using their best responses

    u(x_i, x^{(i)}) = x_i Σ_{j∈N_i} (x_j − x_i),                    (13.29)

they asymptotically reach consensus on the geometric mean of x(0). A graphical illustration of this is available in Fig. 13.5(b).
In the third simulation scenario, the UAVs are assigned a cost functional where for the penalty we have F(x_i, x^{(i)}) = [ x_i² Σ_{j∈N_i} (x_j − x_i) ]². The implementation of their best responses

    u(x_i, x^{(i)}) = −x_i² Σ_{j∈N_i} (x_j − x_i)                   (13.30)

leads them to asymptotically reach consensus on the harmonic mean of x(0). A sketch of the resulting dynamics is given in Fig. 13.5(c).
In the fourth simulation scenario, the UAVs are assigned cost functionals where the penalty is F(x_i, x^{(i)}) = [ (1/(2x_i)) Σ_{j∈N_i} (x_j − x_i) ]². The UAVs' best responses

    u(x_i, x^{(i)}) = (1/(2x_i)) Σ_{j∈N_i} (x_j − x_i)              (13.31)

lead them to asymptotically reach consensus on the mean of order 2 of x(0). This is sketched in Fig. 13.5(d).
Finally, Fig. 13.6 depicts a vertical alignment maneuver when the UAVs use the protocol

    u(x_i, x^{(i)}) = ( max_{i∈Γ} {x_i(0)} / (2x_i) ) Σ_{j∈N_i} (x_j − x_i).       (13.32)

The above protocol is obtained by scaling the protocol (13.31) by an upper bound of the initial heights, max_{i∈Γ} {x_i(0)}.
Figure 13.5. Longitudinal flight dynamics converging to (a) the arithmetic mean under protocol (13.28); (b) the geometric mean under protocol (13.29); (c) the harmonic mean under protocol (13.30); (d) the mean of order 2 under protocol (13.31). Reprinted with permission from Elsevier [30].
Figure 13.6. Vertical alignment to the mean of order 2 on the vertical plane. Reprinted with permission from Elsevier [30].
13.6 Notes and references

This chapter shows how to turn a consensus problem into a noncooperative differential game. Consensus is the result of a mechanism design in which a game designer imposes individual objective functions. The agents then asymptotically reach consensus as a side effect of the optimization of their own individual objectives. The results of this chapter are important, as they shed light on the game-theoretic nature of a consensus problem.
We refer the reader to a few classical references on consensus [128, 194, 193, 205, 206, 252].
Consensus arises in several application domains, such as autonomous formation flight
[94, 102], cooperative search of UAVs [46], swarms of autonomous vehicles or robots
[100, 161], and joint replenishment in multi-retailer inventory control [31, 32]. More
details on mechanism design, or inverse game theory, can be found in [196, Chap. 10]. For
more details on receding horizon control we refer the reader to [89] and [158].
Part of the material contained in this chapter is borrowed from [30]. We refer the
reader to the original work for further details on invariance properties of the consensus
value. In this chapter, the presentation of the topic has been tailored to emphasize the
game theory perspective on the problem. Additional explanatory material and figures
have been added to help the reader gain a better insight and physical interpretation of the
different concepts.