Decentralized Control for Large-Scale Nonlinear Systems With Unknown Mismatched Interconnections via Policy Iteration
Abstract: In this paper, the decentralized control problem is solved based on a policy iteration algorithm for large-scale nonlinear systems with unknown mismatched interconnections. The unknown interconnection is approximated by a neural network that uses the local states of the isolated subsystem and the substituted reference states of the coupled subsystems. Then, an adaptive estimation term is utilized to construct the improved local performance index function that reflects the substitution error. Hereafter, the closed-loop large-scale nonlinear system is guaranteed to be ultimately uniformly bounded by the implementation of a set of developed decentralized optimal control policies. Two simulation examples are given to verify the effectiveness of the presented scheme. The significant contribution of this scheme is that it removes the common assumptions of a matching condition and of upper boundedness of the interconnections when designing the decentralized optimal control for large-scale nonlinear systems.

Index Terms: Adaptive dynamic programming (ADP), decentralized control, large-scale systems, neural networks (NNs), optimal control, policy iteration (PI), reinforcement learning, unknown mismatched interconnections.

Manuscript received November 22, 2016; revised February 15, 2017; accepted March 30, 2017. Date of publication April 24, 2017; date of current version September 14, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant U1501251, Grant 61603387, Grant 61533017, Grant 61374051, Grant 61374105, and Grant 61503379, in part by the Scientific and Technological Development Plan Project in Jilin Province of China under Grant 20150520112JH and Grant 20160414033GH, and in part by the Beijing Natural Science Foundation under Grant 4162065. This paper was recommended by Associate Editor Z. Wang. (Corresponding author: Ding Wang.)
B. Zhao, D. Wang, and G. Shi are with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]; [email protected]; [email protected]).
D. Liu is with the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China (e-mail: [email protected]).
Y. Li is with the Department of Control Science and Engineering, Changchun University of Technology, Changchun 130012, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMC.2017.2690665

I. INTRODUCTION

The increasing demands of production quality and economic efficiency have led to increasingly large-scale and complex modern systems, such as ecosystems, communication systems, transportation systems, urban traffic systems, and power systems. In general, a large-scale system consists of a set of subsystems coupled by interconnections, which make analysis and synthesis increasingly difficult when centralized control is used. To overcome the difficulties in controlling such systems, a decentralized control strategy, which utilizes only the local states of each subsystem, is an efficient and effective approach. In the past few decades, considerable attention has been paid to the design of decentralized controllers for large-scale systems. For example, Wang et al. [1] presented an adaptive neural decentralized control approach for stochastic systems with strongly interconnected nonlinearities in both the drift and diffusion terms. Li et al. [2] proposed a decentralized adaptive neural control scheme for a class of interconnected large-scale uncertain systems with input saturation. Zhao et al. [3], [4] presented decentralized fault-tolerant control schemes based on a self-tuned local feedback gain and a local nonlinear velocity observer against actuator failures.

As is well known, the optimal control problem for nonlinear systems can be addressed by the Hamilton–Jacobi–Bellman (HJB) equation, which can be solved by adaptive dynamic programming (ADP) [5] to avoid the "curse of dimensionality" with the help of function approximators, such as neural networks (NNs). Since NNs have a strong approximation capability, Wang et al. [6], [7] proposed adaptive neural control schemes for nonlinear systems with dynamic uncertainties and completely unknown dynamics. Li et al. [8] used NNs to tackle the control problems of nonlinear systems with unknown dead-zone, time-varying delays [9], [10], and unmodeled dynamics [11]. Many synonyms are used for ADP, such as adaptive dynamic programming [12], approximate dynamic programming [13], adaptive critic designs [14], neuro-dynamic programming [15], [16], and reinforcement learning [17]. Recently, ADP algorithms were further employed to solve control problems of continuous-time systems [18]–[20], discrete-time systems [21]–[23], trajectory tracking [24]–[26], input/output constraints [22], [27], external disturbances and uncertainties [28]–[30], zero-sum games [31], fault-tolerant control [32]–[34], etc. We can see from the literature that ADP algorithms can be categorized into heuristic dynamic programming (HDP) [24], dual heuristic programming (DHP) [35], action-dependent HDP (ADHDP) [36], action-dependent DHP (ADDHP) [37], globalized DHP (GDHP) [38], and action-dependent GDHP (ADGDHP) [14].

As previously mentioned, local controllers should be designed for their corresponding subsystems in a decentralized control strategy. Saberi [39] established a decentralized
2168-2216 c 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Peter the Great St. Petersburg Polytechnic Univ. Downloaded on February 21,2024 at 00:40:27 UTC from IEEE Xplore. Restrictions apply.
1726 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 48, NO. 10, OCTOBER 2018
optimal control for local subsystems of interconnected system. It is shown that the optimal control can be implemented to design a decentralized controller. Several works have addressed the optimal design in decentralized control. Jiang and Jiang [19] presented a decentralized control design by using robust ADP theory and the policy iteration (PI) technique for complex systems with unknown parameters and dynamic uncertainties. Bian et al. [40] then extended the method to systems with unmatched uncertainties. Lu et al. [41] presented a direct HDP method to address nonlinear coordinated control for a large power system with uncertainties. For damping low-frequency oscillations in power systems, Molina et al. [42] developed an intelligent controller based on local signals by using virtual generators and the ADP technique. Bernstein et al. [43] presented an optimal PI algorithm to handle the decentralized partially observable Markov decision process. Liu et al. [44] developed an NN-based online learning optimal control approach to stabilize nonlinear interconnected large-scale systems; then, by introducing an integral PI algorithm, a model-free optimal control method was extended to unknown interconnected systems [45]. Karimi et al. [47] proposed a reinforcement learning-based backstepping decentralized control scheme for electric power systems, where the gains of the decentralized controllers were tuned by reinforcement learning to adapt to various operating conditions. Mehraeen and Jagannathan [48] solved the HJB equation via direct neural dynamic programming for the decentralized near-optimal regulation of nonlinear interconnected discrete-time systems. However, most of the previously mentioned literature focused on plants that are linear or that satisfy assumed matching conditions. In many applications, the interconnections are unknown and mismatched, and ADP-based decentralized control approaches were not presented in previous works for systems in these situations.

Motivated by [44]–[46], this paper addresses the decentralized control problem for large-scale nonlinear systems with unknown mismatched interconnections. By using the local states of the isolated subsystem and the substituted reference states of the coupled subsystems, the unknown interconnection is approximated by an NN. Then, the improved local performance index function, which reflects the substitution error, is constructed with the help of the estimated term. Hereafter, the PI algorithm is developed to solve the HJB equation via the constructed critic NN, and the approximated decentralized control policy can be directly obtained. It is proven that the closed-loop large-scale nonlinear system can be guaranteed to be ultimately uniformly bounded (UUB) based on the Lyapunov stability theorem. Two numerical simulation examples are provided to verify the effectiveness of the proposed scheme.

The main contributions of this paper include the following two aspects.
1) Unlike the literature previously mentioned, this paper extends the ADP algorithm to deal with the decentralized control problem with an improved local performance index function for large-scale nonlinear systems with unknown mismatched interconnections.
2) The unknown mismatched interconnection is estimated by a local observer, which utilizes the local states and the substituted reference states of the coupled subsystems. As a result, the proposed scheme avoids the common assumptions of a matching condition and of upper boundedness of the interconnections of large-scale nonlinear systems made in previous ADP-based approaches.

The rest of this paper is organized as follows. In Section II, the problem statement is presented, and the interconnection is approximated by employing the local states of the isolated subsystem and the substituted reference states of the coupled subsystems. In Section III, the decentralized optimal control policy is derived for the isolated subsystem, and NNs are employed to approximate the interconnection and to construct the critic, respectively. Then, the local online PI algorithm is presented. In Section IV, two numerical simulation examples are provided to demonstrate the effectiveness of the developed scheme. In Section V, the conclusion is drawn.

II. PROBLEM STATEMENT

In this paper, we consider a large-scale nonlinear system composed of $N$ subsystems with unknown mismatched interconnections as

$$
\begin{cases}
\dot{x}_1(t) = f_1(x_1(t)) + g_1(x_1(t))u_1(x_1(t)) + h_1(x(t)) \\
\quad\vdots \\
\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))u_i(x_i(t)) + h_i(x(t)) \\
\quad\vdots \\
\dot{x}_N(t) = f_N(x_N(t)) + g_N(x_N(t))u_N(x_N(t)) + h_N(x(t)).
\end{cases} \tag{1}
$$

The $i$th ($i = 1, 2, \ldots, N$) interconnected subsystem is described by

$$\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))u_i(x_i(t)) + h_i(x(t)) \tag{2}$$

where $x_i(t) \in \mathbb{R}^{n_i}$ and $u_i(x_i(t)) \in \mathbb{R}^{m_i}$ are the state vector and input vector of the $i$th subsystem, respectively. $x = [x_1, x_2, \ldots, x_N]^T \in \mathbb{R}^n$ with $n = \sum_{i=1}^{N} n_i$ denotes the entire system state, and $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ are the local control inputs. For the $i$th subsystem, $f_i(\cdot)$ and $g_i(\cdot)$ are known, locally Lipschitz, and differentiable in their arguments with $f_i(0) = 0$. $h_i(x(t))$ is the unknown mismatched interconnection term.

As is well known, fuzzy logic systems, NNs, etc. are excellent approximators for unknown nonlinearities. Furthermore, since the radial basis function NN (RBFNN) has a simple structure and excellent approximation capability, an RBFNN is employed on a given compact set $\Omega \subset \mathbb{R}^n$ to approximate the unknown interconnection $h_i(x(t))$, that is

$$h_i(x(t)) = W_{ih}^T \sigma_{ih}(x(t)) + \varepsilon_i(x(t)) \tag{3}$$

where $\sigma_{ih}(x(t))$ is called the basis function, which is commonly selected as a Gaussian function

$$\sigma_{ih}(x) = \exp\left(\frac{-(x - c_i)^T (x - c_i)}{b_i^2}\right)$$

where the constant vector $c_i$ is the center of the basis function, and $b_i > 0$ is a real number which is the width
ZHAO et al.: DECENTRALIZED CONTROL FOR LARGE-SCALE NONLINEAR SYSTEMS 1727
of the basis function. The optimal weight vector $W_{ih} = [w_{i1}, w_{i2}, \ldots, w_{ik}]^T$ is defined as

$$W_{ih} = \arg\min_{\hat{W}_{ih} \in \mathbb{R}^k} \left\{ \sup_{x \in \Omega} \left| h_i(x) - \hat{W}_{ih}^T \sigma_{ih}(x(t)) \right| \right\}$$

and $\varepsilon_i(x)$ is the NN approximation error, which can be decreased by increasing the NN hidden node number $k$.

Assumption 1: The NN approximation error $\varepsilon_i$ is upper bounded, i.e., $|\varepsilon_i(x)| \le \phi_{i1}$, where $\phi_{i1}$ is an unknown positive constant.

To relax the upper boundedness assumption on the interconnections, we approximate the interconnection term in the $i$th subsystem by an RBFNN using the states of the local subsystem and the reference states of the coupled subsystems, that is

$$
\begin{aligned}
h_i(x) &= W_{ih}^T \sigma_{ih}(x_{iD}) + \Delta_i(x, x_{iD}) + \varepsilon_i(x_i) \\
&= h_{id}(x_{iD}) + \Delta_i(x, x_{iD}) + \varepsilon_i(x_i)
\end{aligned} \tag{4}
$$

where $x_{iD} = [x_{1d}, x_{2d}, \ldots, x_i, \ldots, x_{Nd}]^T$, $x_{id}$ indicates the reference states of the coupled subsystems, $h_{id}(x_{iD}) = W_{ih}^T \sigma_{ih}(x_{iD})$, and $\Delta_i(x, x_{iD}) = W_{ih}^T \sigma_{ih}(x) - W_{ih}^T \sigma_{ih}(x_{iD})$ is the substitution error, since it arises from the substitution of the NN inputs.

Similar to [49], the Gaussian function $\sigma_{ih}(x_i)$ satisfies the global Lipschitz condition, which implies

$$\|\Delta_i\| \le \sum_{j=1, j \ne i}^{N} d_{ij} E_j$$

where $E_j = \|x_j - x_{jd}\|$ and $d_{ij} > 0$ is an unknown global Lipschitz constant.

Remark 1: Equation (4) is obtained simply by adding and subtracting the term $h_{id}(x_{iD})$, which can be approximated by the RBFNN. This avoids the common upper boundedness assumption on the interconnection term in the $i$th subsystem. In other words, the function $h_{id}(x_{iD})$ depends only on the corresponding local states and the reference states, which are shared with each subsystem before the system runs.

For the $i$th isolated subsystem

$$\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))u_i(x_i(t)) + h_{id}(x_{iD}) \tag{5}$$

since $f_i(\cdot)$ and $g_i(\cdot)$ are locally Lipschitz continuous on a set $\Omega_i \subset \mathbb{R}^{n_i}$, the subsystem (5) is controllable. Different from the interconnected subsystem (2), the isolated subsystem (5) depends only on its local states.

Remark 2: To avoid confusion, it is necessary to distinguish the concepts of interconnected subsystems, isolated subsystems, and coupled subsystems. In this paper, we call all the subsystems interconnected with the $i$th one coupled subsystems. We call (2) the interconnected subsystem, since $h_i(x(t))$ contains the actual states of all the subsystems. In contrast, $h_{id}(x_{iD})$ in the isolated subsystem (5) depends only on the local states and the substituted reference states of the coupled subsystems. That is to say, $h_{id}(x_{iD})$ is independent of the interconnection $h_i(x(t))$, so (5) can be called the isolated subsystem.

The main objective of this paper is to find a set of local control policies $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ as the decentralized control law to stabilize the system (1). To handle the optimal control problem, we need to obtain the optimal control policy $u_i^*(x_i)$ for the $i$th subsystem. Thus, it is desired to find the feedback control policy $u_i(x_i)$ that minimizes the improved local infinite horizon performance index function

$$J_i(x_{i0}) = \int_0^{\infty} \left( \hat{\delta}_i \nabla J_i^T(x_i(\tau)) E_i + U_i(x_i(\tau), u_i(\tau)) \right) d\tau \tag{6}$$

where $U_i(x_i, u_i) = x_i^T Q_i x_i + u_i^T R_i u_i$ is the utility function, $U_i(0, 0) = 0$, and $U_i(x_i, u_i) \ge 0$ for all $x_i$ and $u_i$, in which $Q_i \in \mathbb{R}^{n_i \times n_i}$ and $R_i \in \mathbb{R}^{m_i \times m_i}$ are positive definite matrices. $\hat{\delta}_i$ is a positive function which will be defined later. $\nabla J_i(x_i)$ denotes the partial derivative of the local performance index function $J_i(x_i)$ with respect to the local state $x_i$, i.e., $\nabla J_i(x_i) = \partial J_i(x_i)/\partial x_i$.

III. DECENTRALIZED CONTROLLER DESIGN AND STABILITY ANALYSIS

In this section, we present the optimal decentralized controller design and stability analysis in detail.

A. Optimal Control

Based on optimal control theory, the designed feedback control policy must be admissible. Therefore, before the optimal control is presented, the definition of admissible control is introduced.

Definition 1: For the $i$th isolated subsystem (5), a control policy $u_i(x_i)$ is defined to be admissible with respect to (6) if $u_i(x_i)$ is continuous on a set $\Omega_i \subset \mathbb{R}^{n_i}$, $u_i(0) = 0$, $u_i(x_i)$ stabilizes the isolated subsystem (5), and $J_i(x_{i0})$ in (6), where $x_{i0}$ is the initial state of $x_i$, is finite for all $x_i \in \Omega_i$.

Consider the $i$th isolated subsystem (5). For any admissible control policy $u_i(x_i) \in \psi_i(\Omega_i)$, where $\psi_i(\Omega_i)$ denotes the set of admissible controls, if the improved local value function

$$V_i(x_i) = \int_0^{\infty} \left( \hat{\delta}_i \nabla V_i^T(x_i) E_i + U_i(x_i(\tau), u_i(\tau)) \right) d\tau \tag{7}$$

is continuously differentiable, then the infinitesimal version of (7) is the so-called local nonlinear Lyapunov equation

$$0 = \hat{\delta}_i \nabla V_i^T(x_i) E_i + U_i(x_i, u_i) + \nabla V_i^T(x_i)\left(f_i(x_i) + g_i(x_i)u_i(x_i) + h_{id}(x_{iD})\right) \tag{8}$$

with $V_i(0) = 0$.

Define the local Hamiltonian as

$$H_i(x_i, u_i, \nabla V_i(x_i)) = \hat{\delta}_i \nabla V_i^T(x_i) E_i + U_i(x_i, u_i) + \nabla V_i^T(x_i)\left(f_i(x_i) + g_i(x_i)u_i(x_i) + h_{id}(x_{iD})\right)$$

and the optimal local value function as

$$V_i^*(x_i) = \min_{u_i \in \psi_i(\Omega_i)} \int_0^{\infty} \left( \hat{\delta}_i \nabla V_i^{*T}(x_i) E_i + U_i(x_i(\tau), u_i(\tau)) \right) d\tau. \tag{9}$$
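$$\cdots \le -\hat{\delta}_i \nabla V_i^{*T} E_i - U_i(x_i, u_i) + \sum_{j=1, j \ne i}^{N} \max_{ij} d_{ij} \nabla V_i^{*T} E_j + \nabla V_i^{*T} \phi_{i1}.$$

$$\cdots \le \sum_{i=1}^{N} \left( -x_i^T Q_i x_i + \nabla V_i^{*T} \sum_{j=1, j \ne i}^{N} d_{ij} E_j + \nabla V_i^{*T} \phi_{i1} \right) \le \sum_{i=1}^{N} \left( \cdots - \lambda_{\min}(Q_i) \|x_i\|^2 \right)$$

where $\lambda_{\min}(\cdot)$ denotes the minimum eigenvalue of the matrix. Hence, we can conclude that $\dot{V}^* < 0$ when $x_i$ lies outside of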
where $\nabla \sigma_{ic}(x_i) = \partial \sigma_{ic}(x_i)/\partial x_i \in \mathbb{R}^{l_i \times n_i}$ and $\nabla \varepsilon_{ic}(x_i)$ are the gradients of the activation function and the approximation error, respectively.

From (20), the approximate critic NN can be expressed by

$$\hat{V}_i(x_i) = \hat{W}_{ic}^T \sigma_{ic}(x_i).$$

Then, the gradient of $\hat{V}_i(x_i)$ along the corresponding state is

$$\nabla \hat{V}_i(x_i) = (\nabla \sigma_{ic}(x_i))^T \hat{W}_{ic}.$$

For the isolated subsystem (5), substituting (21) into the nonlinear Lyapunov equation (8), we have

$$0 = \hat{\delta}_i \nabla V_i^T(x_i) E_i + U_i(x_i, u_i) + \left( W_{ic}^T \nabla \sigma_{ic}(x_i) + \nabla \varepsilon_{ic}^T(x_i) \right) \left( f_i(x_i) + g_i(x_i)u_i(x_i) + h_{id}(x_{iD}) \right).$$

Let $\upsilon_i = \nabla V_i^T(x_i) - \nabla \hat{V}_i^T(x_i)$. For the interconnected subsystem (2), the Hamiltonian can be expressed as

$$
\begin{aligned}
H_i(x_i, u_i, W_{ic}) &= \hat{\delta}_i \nabla \hat{V}_i^T(x_i) E_i + U_i(x_i, u_i) + W_{ic}^T \nabla \sigma_{ic}(x_i) \dot{x}_i \\
&= -\hat{\delta}_i \upsilon_i E_i - \nabla \varepsilon_{ic}^T(x_i) \dot{x}_i \\
&= e_{icH}
\end{aligned} \tag{22}
$$

where $e_{icH}$ is the approximation error of the critic NN.

Thus, the approximate local Hamiltonian can be obtained by

$$H_i(x_i, u_i, \hat{W}_{ic}) = \hat{\delta}_i \nabla \hat{V}_i^T(x_i) E_i + U_i(x_i, u_i) + \hat{W}_{ic}^T \nabla \sigma_{ic}(x_i) \dot{x}_i = e_{ic}.$$

Let $\theta_i = \nabla \sigma_{ic}(x_i) \dot{x}_i$. By the steepest descent algorithm, the objective function $E_{ic} = (1/2) e_{ic}^T e_{ic}$ can be minimized in order to adjust the weight vector of the critic NN $\hat{W}_{ic}$, which should be updated by

$$\dot{\hat{W}}_{ic} = -l_{ic} e_{ic} \theta_i \tag{23}$$

where $l_{ic} > 0$ is the learning rate.

Define the weight approximation error as $\tilde{W}_{ic} = W_{ic} - \hat{W}_{ic}$. According to (22) and (23), one has

$$e_{ic} = e_{icH} - \tilde{W}_{ic}^T \theta_i.$$

The critic NN weight approximation error can be updated by

$$\dot{\tilde{W}}_{ic} = -\dot{\hat{W}}_{ic} = l_{ic} \left( e_{icH} - \tilde{W}_{ic}^T \theta_i \right) \theta_i. \tag{24}$$

Therefore, according to (11) and (20), the ideal local control policy can be expressed as

$$u_i(x_i) = -\frac{1}{2} R_i^{-1} g_i^T(x_i) \left( (\nabla \sigma_{ic}(x_i))^T W_{ic} + \nabla \varepsilon_{ic}^T(x_i) \right).$$

It can be approximated as

$$\hat{u}_i(x_i) = -\frac{1}{2} R_i^{-1} g_i^T(x_i) (\nabla \sigma_{ic}(x_i))^T \hat{W}_{ic}. \tag{25}$$

From the above equation, we can observe that the local control policy is derived from the critic NN, and the training of an action NN is no longer required.

Theorem 3: Consider the interconnected subsystem (2). If the weight of the critic NN is updated by (24), the dynamics of the weight approximation error vector can be guaranteed to be UUB.

Proof: Select the Lyapunov function candidate as

$$L_{i2} = \frac{1}{2 l_{ic}} \tilde{W}_{ic}^T \tilde{W}_{ic}.$$

Its time derivative is

$$
\begin{aligned}
\dot{L}_{i2} &= \frac{1}{l_{ic}} \tilde{W}_{ic}^T \dot{\tilde{W}}_{ic} \\
&= \tilde{W}_{ic}^T \left( e_{icH} - \tilde{W}_{ic}^T \theta_i \right) \theta_i \\
&= \tilde{W}_{ic}^T e_{icH} \theta_i - \left\| \tilde{W}_{ic}^T \theta_i \right\|^2 \\
&\le \frac{1}{2} e_{icH}^2 - \frac{1}{2} \left\| \tilde{W}_{ic}^T \theta_i \right\|^2.
\end{aligned}
$$

Hence, $\dot{L}_{i2} < 0$ when $\tilde{W}_{ic}$ lies outside of the compact set

$$\Omega_{\tilde{W}_{ic}} = \left\{ \tilde{W}_{ic} : \|\tilde{W}_{ic}\| \le \left\| \frac{e_{icH}}{\theta_{iM}} \right\| \right\}$$

where $\|\theta_i\| \le \theta_{iM}$, and $\theta_{iM}$ is a positive constant. Based on the Lyapunov stability theorem, the dynamics of the weight approximation error vector is UUB. This completes the proof.

Remark 6: Since the convergence rate of the RBFNN is higher than that of the back propagation NN (BPNN), the RBFNN is employed by the developed local state observer (15). However, the local control policy (25) requires the partial derivative of the local critic NN, which carries a heavy computational burden if an RBFNN is employed. To trade off between the convergence rate and the computational burden, a BPNN is selected for the local critic NN. Thus, different structures are chosen for these two NNs.

C. Stability Analysis

Theorem 4: Consider the interconnected subsystem (2), together with the improved local value function (7), where $\hat{\delta}_i$ is updated by

$$\dot{\hat{\delta}}_i = \Gamma_{i\delta} \nabla V_i^{*T}(x_i) E_i \tag{26}$$

and $\Gamma_{i\delta} > 0$ is a constant. Then the $N$ approximated decentralized control policies developed by (25) guarantee the closed-loop large-scale nonlinear system (1) to be UUB. In other words, the control policies $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ are the decentralized control law for the large-scale nonlinear system composed of $N$ subsystems as (2).

Proof: Select the Lyapunov function candidate for the $i$th interconnected subsystem as

$$L_{i3} = \sum_{i=1}^{N} \left( V_i^* + \frac{1}{2} \tilde{\delta}_i^T \Gamma_{i\delta}^{-1} \tilde{\delta}_i \right).$$

Its time derivative is

$$\dot{L}_{i3} = \sum_{i=1}^{N} \left( \nabla V_i^{*T} \dot{x}_i - \tilde{\delta}_i^T \Gamma_{i\delta}^{-1} \dot{\hat{\delta}}_i \right).$$
According to (8) and (14), we can obtain

$$\dot{L}_{i3} \le \sum_{i=1}^{N} \left( \tilde{\delta}_i^T \nabla V_i^{*T} E_i - U_i(x_i, u_i) + \nabla V_i^{*T} \phi_i - \tilde{\delta}_i^T \Gamma_{i\delta}^{-1} \dot{\hat{\delta}}_i \right). \tag{27}$$

Substituting (26) into (27), we have

$$
\begin{aligned}
\dot{L}_{i3} &\le \sum_{i=1}^{N} \left( -U_i(x_i, u_i) + \nabla V_i^{*T} \phi_i \right) \\
&\le \sum_{i=1}^{N} \left( -\lambda_{\min}(Q_i) \|x_i\|^2 + \nabla V_i^{*T} \phi_i \right).
\end{aligned}
$$

We can conclude that $\dot{L}_{i3} \le 0$ when $x_i$ lies outside of the compact set

$$\Omega_{x_i} = \left\{ x_i : \|x_i\| < \sqrt{\frac{\left\| \nabla V_i^{*T} \phi_i \right\|}{\lambda_{\min}(Q_i)}} \right\}.$$

From the Lyapunov stability theorem, the closed-loop large-scale nonlinear system (1) is UUB with the control policies $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$. This completes the proof.

Remark 7: The positive function $\hat{\delta}_i$ defined in (6) can be updated by (26). This update alone cannot guarantee that $\hat{\delta}_i$ is positive at the very beginning of updating. Noticing that the right-hand side of (26) is positive when $t > 0$, $\hat{\delta}_i$ is guaranteed to be positive all the time as long as its initial value satisfies $\hat{\delta}_{i0} \ge 0$. Thus, (7) can be guaranteed to be a Lyapunov equation with a proper initial value.

Algorithm 1 Local Online PI Algorithm
1: For $i = 1, 2, \ldots, N$, select a set of small positive constants $\xi_i$, let $p = 0$ and $V_i^{(0)}(x_i) = 0$, and begin with admissible control policies $u_i^{(0)}(x_i)$.
2: (Local policy evaluation) Let $p > 0$. Based on the local control policy $u_i^{(p)}(x_i)$, solve the following local nonlinear Lyapunov equation associated with $u_i^{(p)}(x_i)$:

$$0 = \hat{\delta}_i \nabla V_i^{(p)T}(x_i) E_i + U_i\left(x_i, u_i^{(p)}\right) + \nabla V_i^{(p)T}(x_i) \left( f_i(x_i) + g_i(x_i) u_i^{(p)}(x_i) + h_{id}(x_{iD}) \right). \tag{28}$$

3: (Local policy improvement) Update the local control policy $u_i^{(p)}(x_i)$ by

$$u_i^{(p+1)}(x_i) = -\frac{1}{2} R_i^{-1} g_i^T(x_i) \nabla V_i^{(p)}(x_i). \tag{29}$$

4: If $\left\| V_i^{(p+1)}(x_i) - V_i^{(p)}(x_i) \right\| \le \xi_i$, stop and obtain the approximated optimal control; else, let $p = p + 1$ and return to 2.

Similarly, this conclusion can be extended to the case of $N$ isolated subsystems. Additionally, we denote $p_0 = \max\{p_{0i}\}$. Therefore, for any $\zeta$, where $\zeta = \max\{\zeta_i\}$, there exists an integer $p_0$ such that for any $p \ge p_0$, (30) and (31) are true for $i = 1, 2, \ldots, N$. That is to say, the algorithm will converge to the improved local optimal value functions and local optimal controls of the $N$ isolated subsystems. This completes the proof.
states of isolated subsystem and the substituted reference states of coupled subsystems are employed to approximate interconnection terms. Then, an improved local performance index function is established to reflect the NN substitution error. At last, by the Lyapunov stability theorem, the closed-loop large-scale nonlinear system is guaranteed to be UUB via the developed decentralized control scheme. The simulation results verify that the proposed decentralized control scheme is effective.

REFERENCES

[1] H. Wang, X. Liu, and K. Liu, “Robust adaptive neural tracking control for a class of stochastic nonlinear interconnected systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 3, pp. 510–523, Mar. 2016.
[2] T. Li, D. Wang, J. Li, and Y. Li, “Adaptive decentralized NN control of nonlinear interconnected time-delay systems with input saturation,” Asian J. Control, vol. 15, no. 2, pp. 533–542, 2013.
[3] B. Zhao, Y. Li, and D. Liu, “Self-tuned local feedback gain based decentralized fault tolerant control for a class of large-scale nonlinear systems,” Neurocomputing, vol. 235, pp. 147–156, Apr. 2017.
[4] B. Zhao and Y. Li, “Local joint information based active fault tolerant control for reconfigurable manipulator,” Nonlin. Dyn., vol. 77, no. 3, pp. 859–876, 2014.
[5] P. J. Werbos, “A menu of designs for reinforcement learning over time,” in Neural Network Control. Cambridge, MA, USA: MIT Press, 1990, pp. 67–95.
[6] H. Wang, P. Shi, H. Li, and Q. Zhou, “Adaptive neural tracking control for a class of nonlinear systems with dynamics uncertainties,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2016.2607166.
[7] H. Wang, W. Sun, and P. X. Liu, “Adaptive intelligent control of nonaffine nonlinear time-delay systems with dynamic uncertainties,” IEEE Trans. Syst., Man, Cybern., Syst., to be published, doi: 10.1109/TSMC.2016.2627048.
[8] Z. Li, T. Li, and G. Feng, “Adaptive neural control for a class of stochastic nonlinear time-delay systems with unknown dead zone using dynamic surface technique,” Int. J. Robust Nonlin. Control, vol. 26, no. 4, pp. 759–781, 2016.
[9] T. Li, Z. Li, D. Wang, and C. L. P. Chen, “Output-feedback adaptive neural control for stochastic nonlinear time-varying delay systems with unknown control directions,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1188–1201, Jun. 2016.
[10] Y. Li, T. Li, and S. Tong, “Adaptive neural networks output feedback dynamic surface control design for MIMO pure-feedback nonlinear systems with hysteresis,” Neurocomputing, vol. 198, pp. 58–68, Jul. 2016.
[11] Y. Li, T. Li, B. Miao, and C. L. P. Chen, “Adaptive NN control for a class of stochastic nonlinear systems with unmodeled dynamics using DSC technique,” Neurocomputing, vol. 149, pp. 142–150, Feb. 2015.
[12] F. Y. Wang, H. Zhang, and D. Liu, “Adaptive dynamic programming: An introduction,” IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39–47, May 2009.
[13] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943–949, Aug. 2008.
[14] D. V. Prokhorov and D. C. Wunsch, “Adaptive critic designs,” IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997–1007, Sep. 1997.
[15] D. P. Bertsekas and J. N. Tsitsiklis, “Neuro-dynamic programming: An overview,” in Proc. 34th IEEE Conf. Decis. Control, 1995, pp. 560–564.
[16] D. Wang, C.-X. Mu, and D.-R. Liu, “Data-driven nonlinear near-optimal regulation based on iterative neural dynamic programming,” Acta Automatica Sinica, vol. 43, no. 3, pp. 366–375, 2017.
[17] L. P. Kaelbling, M. L. Littman, and A. M. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, 1996.
[18] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009.
[19] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012.
[20] D. Liu, X. Yang, and H. Li, “Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics,” Neural Comput. Appl., vol. 23, nos. 7–8, pp. 1843–1850, 2013.
[21] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, “Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming,” Automatica, vol. 48, no. 8, pp. 1825–1832, 2012.
[22] H. Zhang, Y. Luo, and D. Liu, “Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints,” IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490–1503, Sep. 2009.
[23] D. Liu, Q. Wei, and P. Yan, “Generalized policy iteration adaptive dynamic programming for discrete-time nonlinear systems,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 45, no. 12, pp. 1577–1591, Dec. 2015.
[24] L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, “Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, pp. 1617–1622, Dec. 2009.
[25] Q. Wei and D. Liu, “Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification,” IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1020–1036, Oct. 2014.
[26] B. Zhao, D. Liu, X. Yang, and Y. Li, “Observer-critic structure-based adaptive dynamic programming for decentralised tracking control of unknown large-scale nonlinear systems,” Int. J. Syst. Sci., 2017, doi: 10.1080/00207721.2017.1296982.
[27] X. Yang, D. Liu, and D. Wang, “Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints,” Int. J. Control, vol. 87, no. 3, pp. 553–566, 2014.
[28] D. Wang, D. Liu, and H. Li, “Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems,” IEEE Trans. Autom. Sci. Eng., vol. 11, no. 2, pp. 627–632, Apr. 2014.
[29] Y. Jiang and Z.-P. Jiang, “Robust adaptive dynamic programming for large-scale systems with an application to multimachine power systems,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 10, pp. 693–697, Oct. 2012.
[30] D. Wang, D. Liu, H. Li, B. Luo, and H. Ma, “An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 5, pp. 713–717, May 2016.
[31] D. Liu, H. Li, and D. Wang, “Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 8, pp. 1015–1027, Aug. 2014.
[32] Z. Wang, L. Liu, H. Zhang, and G. Xiao, “Fault-tolerant controller design for a class of nonlinear MIMO discrete-time systems via online reinforcement learning algorithm,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 5, pp. 611–622, May 2016.
[33] B. Zhao, D. Liu, and Y. Li, “Online fault compensation control based on policy iteration algorithm for a class of affine non-linear systems with actuator failures,” IET Control Theory Appl., vol. 10, no. 15, pp. 1816–1823, Oct. 2016.
[34] B. Zhao, D. Liu, and Y. Li, “Observer based adaptive dynamic programming for fault tolerant control of a class of nonlinear systems,” Inf. Sci., vol. 384, pp. 21–33, Apr. 2017.
[35] G. K. Venayagamoorthy, R. G. Harley, and D. C. Wunsch, “Dual heuristic programming excitation neurocontrol for generators in a multimachine power system,” IEEE Trans. Ind. Appl., vol. 39, no. 2, pp. 382–394, Mar./Apr. 2003.
[36] D. Fuselli et al., “Action dependent heuristic dynamic programming for home energy resource scheduling,” Int. J. Elect. Power Energy Syst., vol. 48, pp. 148–160, Jun. 2013.
[37] S. Song, G. Cai, and X. Lin, “Optimal neuron-controller for fluid triple-tank system via improved ADDHP algorithm,” in Advances in Computational Intelligence. Berlin, Germany: Springer, 2009, pp. 483–492.
[38] J. Si and Y.-T. Wang, “Online learning control by association and reinforcement,” IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264–276, Mar. 2001.
[39] A. Saberi, “On optimality of decentralized control for a class of nonlinear interconnected systems,” Automatica, vol. 24, no. 1, pp. 101–104, 1988.
[40] T. Bian, Y. Jiang, and Z.-P. Jiang, “Decentralized adaptive optimal control of large-scale systems with application to power systems,” IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2439–2447, Apr. 2015.
[41] C. Lu, J. Si, and X. Xie, “Direct heuristic dynamic programming for damping oscillations in a large power system,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 1008–1013, Aug. 2008.
[42] D. Molina, G. K. Venayagamoorthy, J. Liang, and R. G. Harley, “Intelligent local area signals based damping of power system oscillations using virtual generators and approximate dynamic programming,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 498–508, Mar. 2013.
[43] D. S. Bernstein, C. Amato, E. A. Hansen, and S. Zilberstein, “Policy iteration for decentralized control of Markov decision processes,” J. Artif. Intell. Res., vol. 34, no. 1, pp. 89–132, 2009.
[44] D. Liu, D. Wang, and H. Li, “Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 418–428, Feb. 2014.
[45] D. Liu, C. Li, H. Li, D. Wang, and H. Ma, “Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics,” Neurocomputing, vol. 165, pp. 90–98, Oct. 2015.
[46] D. Wang, H. He, B. Zhao, and D. Liu, “Adaptive near-optimal controllers for non-linear decentralised feedback stabilisation problems,” IET Control Theory Appl., vol. 11, no. 6, pp. 799–806, Apr. 2017.
[47] A. Karimi, S. Eftekharnejad, and A. Feliachi, “Reinforcement learning based backstepping control of power system oscillations,” Elect. Power Syst. Res., vol. 79, no. 11, pp. 1511–1520, 2009.
[48] S. Mehraeen and S. Jagannathan, “Decentralized optimal control of a class of interconnected nonlinear discrete-time systems by using online Hamilton-Jacobi-Bellman formulation,” IEEE Trans. Neural Netw., vol. 22, no. 11, pp. 1757–1769, Nov. 2011.
[49] W. Chen and J. Li, “Decentralized output-feedback neural control for systems with unknown interconnections,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 1, pp. 258–266, Feb. 2008.
[50] Y. Tang, M. Tomizuka, G. Guerrero, and G. Montemayor, “Decentralized robust control of mechanical systems,” IEEE Trans. Autom. Control, vol. 45, no. 4, pp. 771–776, Apr. 2000.
[51] C. Hua, Y. Li, H. Wang, and X. Guan, “Decentralised fault-tolerant finite-time control for a class of interconnected non-linear systems,” IET Control Theory Appl., vol. 9, no. 16, pp. 2331–2339, Oct. 2015.

Bo Zhao (M’16) received the B.S. degree in automation and the Ph.D. degree in control science and engineering from Jilin University, Changchun, China, in 2009 and 2014, respectively.
He has been a Post-Doctoral Fellow with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China, since 2014. His current research interests include adaptive dynamic programming, fault diagnosis and tolerant control, neural-network-based control, optimal control, and robot control.

Ding Wang (M’15) received the B.S. degree in mathematics from the Zhengzhou University of Light Industry, Zhengzhou, China, the M.S. degree in operations research and cybernetics from Northeastern University, Shenyang, China, and the Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2007, 2009, and 2012, respectively.
He was a Visiting Scholar with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA, from 2015 to 2017. He is currently an Associate Professor with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His current research interests include adaptive and learning systems, computational intelligence, and intelligent control. He has published over 90 journal and conference papers, and co-authored two monographs.
Dr. Wang was a recipient of the Excellent Doctoral Dissertation Award of Chinese Academy of Sciences in 2013, and a nomination of the Excellent Doctoral Dissertation Award of Chinese Association of Automation (CAA) in 2014. He serves as an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems and Neurocomputing. He is a member of Asia–Pacific Neural Network Society and CAA.

Guang Shi received the B.S. degree in automation from Zhejiang University, Hangzhou, China, in 2012. He is currently pursuing the Ph.D. degree with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
His current research interests include neural networks, adaptive dynamic programming, and optimal control and energy management in smart grids.

Derong Liu (S’91–M’94–SM’96–F’05) received the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 1994.
He was a Staff Fellow with General Motors Research and Development Center, from 1993 to 1995. He was an Assistant Professor with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA, from 1995 to 1999. He joined the University of Illinois at Chicago, Chicago, IL, USA, in 1999, and became a Full Professor of Electrical and Computer Engineering and of Computer Science in 2006. He served as the Associate Director of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China, from 2010 to 2015. He is currently a Full Professor with the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing. He has published 15 books (six research monographs and nine edited volumes).
Dr. Liu was a recipient of the Faculty Early Career Development Award from the National Science Foundation in 1999, the University Scholar Award from University of Illinois from 2006 to 2009, the Overseas Outstanding Young Scholar Award from the National Natural Science Foundation of China in 2008, and the Outstanding Achievement Award from Asia Pacific Neural Network Assembly in 2014. He was selected for the “100 Talents Program” by the Chinese Academy of Sciences in 2008. He is an Elected AdCom Member of the IEEE Computational Intelligence Society and he is the Editor-in-Chief of the Artificial Intelligence Review. He was the General Chair of 2014 IEEE World Congress on Computational Intelligence and is the General Chair of 2016 World Congress on Intelligent Control and Automation. He is a Fellow of the International Neural Network Society.

Yuanchun Li received the Ph.D. degree in general mechanics from the Harbin Institute of Technology, Harbin, China, in 1990.
He is currently a Professor in Control Science and Engineering with the Changchun University of Technology, Changchun, China. His current research interests include adaptive dynamic programming, complex system modeling, robot control, and intelligent control.