
Decentralized_Control_for_Large-Scale_Nonlinear_Systems_With_Unknown_Mismatched_Interconnections_via_Policy_Iteration

This paper addresses the decentralized control problem for large-scale nonlinear systems with unknown mismatched interconnections using a policy iteration algorithm. It proposes an adaptive estimation method that approximates unknown interconnections through neural networks, allowing for improved local performance without the need for matching conditions. The effectiveness of the proposed decentralized control strategy is demonstrated through two simulation examples, ensuring the closed-loop system remains uniformly bounded.

Uploaded by

lekhanh2410

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 48, NO. 10, OCTOBER 2018 1725

Decentralized Control for Large-Scale Nonlinear Systems With Unknown Mismatched Interconnections via Policy Iteration
Bo Zhao, Member, IEEE, Ding Wang, Member, IEEE, Guang Shi, Derong Liu, Fellow, IEEE, and Yuanchun Li

Abstract—In this paper, the decentralized control problem is solved based on a policy iteration algorithm for large-scale nonlinear systems with unknown mismatched interconnections. The unknown interconnection is approximated by a neural network with local states of isolated subsystem and substituted reference states of coupled subsystems. Then, an adaptive estimation term is utilized to construct the improved local performance index function that reflects the substitution error. Hereafter, the closed-loop large-scale nonlinear system is guaranteed to be ultimately uniformly bounded by the implementation of a set of developed decentralized optimal control policies. Two simulation examples are given to verify the effectiveness of the presented scheme. The significant contribution of this scheme lies in that it removes the common assumptions on satisfying matching condition and upper boundedness of interconnections, when designing the decentralized optimal control for large-scale nonlinear systems.

Index Terms—Adaptive dynamic programming (ADP), decentralized control, large-scale systems, neural networks (NNs), optimal control, policy iteration (PI), reinforcement learning, unknown mismatched interconnections.

Manuscript received November 22, 2016; revised February 15, 2017; accepted March 30, 2017. Date of publication April 24, 2017; date of current version September 14, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant U1501251, Grant 61603387, Grant 61533017, Grant 61374051, Grant 61374105, and Grant 61503379, in part by the Scientific and Technological Development Plan Project in Jilin Province of China under Grant 20150520112JH and Grant 20160414033GH, and in part by the Beijing Natural Science Foundation under Grant 4162065. This paper was recommended by Associate Editor Z. Wang. (Corresponding author: Ding Wang.)
B. Zhao, D. Wang, and G. Shi are with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]; [email protected]; [email protected]).
D. Liu is with the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China (e-mail: [email protected]).
Y. Li is with the Department of Control Science and Engineering, Changchun University of Technology, Changchun 130012, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMC.2017.2690665
2168-2216 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Peter the Great St. Petersburg Polytechnic Univ. Downloaded on February 21,2024 at 00:40:27 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

The increasing demands of production quality and economic efficiency have led to increasingly large-scale and complex modern systems, such as ecosystems, communication systems, transportation systems, urban traffic systems, and power systems. In general, a large-scale system consists of a set of subsystems coupled by interconnections, which lead to increasing difficulties of analysis and synthesis when utilizing centralized control. To overcome the difficulties in controlling such systems, decentralized control strategy, which utilizes local states of each subsystem, is an efficient and effective approach. In the past few decades, considerable attention has been paid to the design of decentralized controllers for large-scale systems. For example, Wang et al. [1] presented an adaptive neural decentralized control approach for stochastic systems with strong interconnected nonlinearities both in drift and diffusion terms. Li et al. [2] proposed a decentralized adaptive neural control scheme for a class of interconnected large-scale uncertain systems with input saturation. Zhao et al. [3], [4] presented decentralized fault-tolerant control schemes based on self-tuned local feedback gain and local nonlinear velocity observer against actuator failures.

As is well known, the optimal control problem for nonlinear systems can be addressed by the Hamilton–Jacobi–Bellman (HJB) equation, which can be solved by adaptive dynamic programming (ADP) [5] to avoid the difficulty in the “curse of dimensionality” with the help of function approximators, such as neural networks (NNs). Since NNs have a strong approximation capability, Wang et al. [6], [7] proposed adaptive neural control schemes for nonlinear systems with dynamic uncertainties and completely unknown dynamics. Li et al. [8] tackled the control problems by using NNs for nonlinear systems with unknown dead-zone, time-varying delays [9], [10], and unmodeled dynamics [11]. There are many synonyms used for ADP, such as ADP [12], approximate dynamic programming [13], adaptive critic designs [14], neuro-dynamic programming [15], [16], and reinforcement learning [17]. Recently, ADP algorithms were further employed to solve control problems of continuous-time systems [18]–[20], discrete-time systems [21]–[23], trajectory tracking [24]–[26], input/output constraints [22], [27], external disturbances and uncertainties [28]–[30], zero-sum games [31], fault tolerant [32]–[34], etc. We can see from the literature that ADP algorithms can be categorized into heuristic dynamic programming (HDP) [24], dual HDP (DHP) [35], action-dependent HDP (ADHDP) [36], ADDHP [37], globalized DHP (GDHP) [38], and ADGDHP [14].

As previously mentioned, local controllers should be designed for their corresponding subsystems in decentralized control strategy. Saberi [39] established a decentralized

optimal control for local subsystems of interconnected system. It is shown that the optimal control can be implemented to design a decentralized controller. Several studies have addressed optimal design in decentralized control. Jiang and Jiang [19] presented a decentralized control design by using robust ADP theory and policy iteration (PI) technique for complex systems with unknown parameters and dynamic uncertainties. Bian et al. [40] then extended the method to systems with unmatched uncertainties. Lu et al. [41] presented a direct HDP method to address nonlinear coordinated control for a large power system with uncertainties. For damping low frequency oscillations in power systems, Molina et al. [42] developed an intelligent controller based on local signals by using virtual generators and ADP technique. Bernstein et al. [43] presented an optimal PI algorithm to handle the decentralized partially observable Markov decision process. Liu et al. [44] developed an NN-based online learning optimal control approach to stabilize nonlinear interconnected large-scale systems, and then by introducing an integral PI algorithm, a model-free optimal control method was extended to unknown interconnected systems [45]. Karimi et al. [47] proposed a reinforcement learning-based backstepping decentralized control scheme for electric power systems, where gains of decentralized controllers were tuned by reinforcement learning to adapt to various operating conditions. Mehraeen and Jagannathan [48] solved the HJB equation via direct neural dynamic programming for the decentralized near optimal regulation of nonlinear interconnected discrete-time systems. However, most of the previously mentioned literature focused on plants that are linear or that satisfy assumed matching conditions. In many applications, interconnections are unknown and mismatched, and ADP-based decentralized control approaches for systems in these situations were not presented in previous works.

Motivated by [44]–[46], this paper addresses the decentralized control problem for large-scale nonlinear systems with unknown mismatched interconnections. By using local states of isolated subsystem and substituted reference states of coupled subsystems, the unknown interconnection is approximated by an NN. Then, the improved local performance index function which reflects the substitution error is constructed with the help of the estimated term. Hereafter, the PI algorithm is developed to solve the HJB equation via the constructed critic NN, and the approximated decentralized control policy can be directly obtained. It is proven that the closed-loop large-scale nonlinear system can be guaranteed to be ultimately uniformly bounded (UUB) based on Lyapunov stability theorem. Two numerical simulation examples are provided to verify the effectiveness of the proposed scheme.

The main contributions of this paper include the following two aspects.
1) Unlike the literature previously mentioned, this paper extends the ADP algorithm to deal with the decentralized control problem with an improved local performance index function for large-scale nonlinear systems with unknown mismatched interconnections.
2) The unknown mismatched interconnection is estimated by local observer, which utilizes the local states and the substituted reference states of the coupled subsystems. As a result, the proposed scheme avoids the common assumptions on satisfying matching condition and upper boundedness of interconnections of large-scale nonlinear systems in previous ADP-based approaches.

The rest of this paper is organized as follows. In Section II, the problem statement is presented, and the interconnection is approximated by employing local states of isolated subsystem and substituted reference states of coupled subsystems. In Section III, the decentralized optimal control policy is derived for the isolated subsystem, and NNs are employed to approximate the interconnection and the critic NN, respectively. Then, the local online PI algorithm is presented. In Section IV, two numerical simulation examples are provided to demonstrate the effectiveness of the developed scheme. In Section V, the conclusion is drawn.

II. PROBLEM STATEMENT

In this paper, we consider a large-scale nonlinear system composed of $N$ subsystems with unknown mismatched interconnections as

$$
\begin{cases}
\dot{x}_1(t) = f_1(x_1(t)) + g_1(x_1(t))u_1(x_1(t)) + h_1(x(t)) \\
\quad\vdots \\
\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))u_i(x_i(t)) + h_i(x(t)) \\
\quad\vdots \\
\dot{x}_N(t) = f_N(x_N(t)) + g_N(x_N(t))u_N(x_N(t)) + h_N(x(t)).
\end{cases} \tag{1}
$$

The $i$th ($i = 1, 2, \ldots, N$) interconnected subsystem is described by

$$\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))u_i(x_i(t)) + h_i(x(t)) \tag{2}$$

where $x_i(t) \in \mathbb{R}^{n_i}$ and $u_i(x_i(t)) \in \mathbb{R}^{m_i}$ are the state vector and input vector of the $i$th subsystem, respectively. $x = [x_1, x_2, \ldots, x_N]^T \in \mathbb{R}^n$ with $n = \sum_{i=1}^{N} n_i$ denotes the entire system state and $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ are local control inputs. For the $i$th subsystem, $f_i(\cdot)$ and $g_i(\cdot)$ are known, locally Lipschitz and differentiable in their arguments with $f_i(0) = 0$. $h_i(x(t))$ is the unknown mismatched interconnection term.

As is well known, fuzzy logic systems, NNs, etc. are excellent approximators for unknown nonlinearities. Furthermore, since the radial basis function NN (RBFNN) has simple structure and excellent approximation capability, RBFNN is employed in a given compact set $\Omega \subset \mathbb{R}^n$ to approximate the unknown interconnection $h_i(x(t))$, that is

$$h_i(x(t)) = W_{ih}^T \sigma_{ih}(x(t)) + \varepsilon_i(x(t)) \tag{3}$$

where $\sigma_{ih}(x(t))$ is called the basis function that is commonly selected as a Gaussian function

$$\sigma_{ih}(x) = \exp\left( \frac{-(x - c_i)^T (x - c_i)}{b_i^2} \right)$$

where the constant vector $c_i$ is the center of the basis function, and $b_i > 0$ is a real number which is the width


of the basis function. The optimal weight vector $W_{ih} = [w_{i1}, w_{i2}, \ldots, w_{ik}]^T$ is defined as

$$W_{ih} = \arg\min_{\hat{W}_{ih} \in \mathbb{R}^k} \left\{ \sup_{x \in \Omega} \left\| h_i(x) - \hat{W}_{ih}^T \sigma_{ih}(x(t)) \right\| \right\}$$

and $\varepsilon_i(x)$ is the NN approximation error, which can be decreased by increasing the NN hidden node number $k$.

Assumption 1: The NN approximation error $\varepsilon_i$ is upper bounded, i.e., $|\varepsilon_i(x)| \le \phi_{i1}$, where $\phi_{i1}$ is an unknown positive constant.

To relax the upper boundedness assumption of interconnections, we approximate the interconnection term in the $i$th subsystem by RBFNN using the states of local subsystem and the reference states of the coupled subsystems, that is

$$
\begin{aligned}
h_i(x) &= W_{ih}^T \sigma_{ih}(x_{iD}) + \Delta_i(x, x_{iD}) + \varepsilon_i(x_i) \\
&= h_{id}(x_{iD}) + \Delta_i(x, x_{iD}) + \varepsilon_i(x_i)
\end{aligned} \tag{4}
$$

where $x_{iD} = [x_{1d}, x_{2d}, \ldots, x_i, \ldots, x_{Nd}]^T$, $x_{id}$ indicates the reference states of the coupled subsystems, $h_{id}(x_{iD}) = W_{ih}^T \sigma_{ih}(x_{iD})$, and $\Delta_i(x, x_{iD}) = W_{ih}^T \sigma_{ih}(x) - W_{ih}^T \sigma_{ih}(x_{iD})$ is the substitution error since it arises from the substitution of NN inputs.

Similar to [49], the Gaussian function $\sigma_{ih}(x_i)$ satisfies the global Lipschitz condition, which implies

$$\|\Delta_i\| \le \sum_{j=1, j \ne i}^{N} d_{ij} E_j$$

where $E_j = \|x_j - x_{jd}\|$ and $d_{ij} > 0$ is an unknown global Lipschitz constant.

Remark 1: We can observe that (4) can be obtained only by adding and subtracting the term $h_{id}(x_{iD})$, which can be approximated by RBFNN. So it can avoid the common upper boundedness assumption of the interconnection term in the $i$th subsystem. In other words, the function $h_{id}(x_{iD})$ depends only on the corresponding local states and the reference states, which are shared with each subsystem before the system runs.

For the $i$th isolated subsystem

$$\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t))u_i(x_i(t)) + h_{id}(x_{iD}) \tag{5}$$

since $f_i(\cdot)$ and $g_i(\cdot)$ are locally Lipschitz continuous on a set $\Omega_i \subset \mathbb{R}^{n_i}$, the subsystem (5) is controllable. Different from the interconnected subsystem (2), the isolated subsystem (5) depends only on its local states.

Remark 2: To eliminate confusion, it is necessary to distinguish the concepts of interconnected subsystems, isolated subsystems, and coupled subsystems. In this paper, we call all the subsystems interconnected with the $i$th one coupled subsystems. On the other hand, we call (2) the interconnected subsystem, since $h_i(x(t))$ contains the actual states of all the subsystems. Different from it, $h_{id}(x_{iD})$ in the isolated subsystem (5) depends only on the local states and the substituted reference states of the coupled subsystems. That is to say, $h_{id}(x_{iD})$ is independent from interconnection $h_i(x(t))$. So it can be called isolated subsystem.

The main objective of this paper is to find a set of local control policies $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ as the decentralized control law to stabilize the system (1). To handle the optimal control problem, we need to obtain the optimal control policy $u_i^*(x_i)$ for the $i$th subsystem. Thus, it is desired to find the feedback control policy $u_i(x_i)$ to minimize the improved local infinite horizon performance index function as

$$J_i(x_{i0}) = \int_0^{\infty} \left( \hat{\delta}_i \left\| \nabla J_i^T(x_i(\tau)) \right\| E_i + U_i(x_i(\tau), u_i(\tau)) \right) d\tau \tag{6}$$

where $U_i(x_i, u_i) = x_i^T Q_i x_i + u_i^T R_i u_i$ is the utility function, $U_i(0, 0) = 0$, and $U_i(x_i, u_i) \ge 0$ for all $x_i$ and $u_i$, in which $Q_i \in \mathbb{R}^{n_i \times n_i}$ and $R_i \in \mathbb{R}^{m_i \times m_i}$ are positive definite matrices. $\hat{\delta}_i$ is a positive function which will be defined later. $\nabla J_i(x_i)$ denotes the partial derivative of local performance index function $J_i(x_i)$ with respect to local state $x_i$, i.e., $\nabla J_i(x_i) = \partial J_i(x_i)/\partial x_i$.

III. DECENTRALIZED CONTROLLER DESIGN AND STABILITY ANALYSIS

In this section, we present the optimal decentralized controller design and stability analysis in detail.

A. Optimal Control

Based on the optimal control theory, the designed feedback control policy must be admissible. Therefore, before the optimal control is presented, the definition of admissible control is introduced.

Definition 1: For the $i$th isolated subsystem (5), a control policy $u_i(x_i)$ is defined to be admissible with respect to (6) if $u_i(x_i)$ is continuous on a set $\Omega_i \subset \mathbb{R}^{n_i}$, $u_i(0) = 0$ and $u_i(x_i)$ stabilizes the isolated subsystem (5), and $J_i(x_{i0})$ in (6), where $x_{i0}$ is the initial state of $x_i$, is finite for all $x_i \in \Omega_i$.

Consider the $i$th isolated subsystem (5), for any admissible control policy $u_i(x_i) \in \psi_i(\Omega_i)$, where $\psi_i(\Omega_i)$ denotes the set of admissible control, if the improved local value function

$$V_i(x_i) = \int_0^{\infty} \left( \hat{\delta}_i \left\| \nabla V_i^T(x_i) \right\| E_i + U_i(x_i(\tau), u_i(\tau)) \right) d\tau \tag{7}$$

is continuously differentiable, then the infinitesimal version of (7) is the so-called local nonlinear Lyapunov equation

$$0 = \hat{\delta}_i \left\| \nabla V_i^T(x_i) \right\| E_i + U_i(x_i, u_i) + \nabla V_i^T(x_i)\big(f_i(x_i) + g_i(x_i)u_i(x_i) + h_{id}(x_{iD})\big) \tag{8}$$

with $V_i(0) = 0$.

Define the local Hamiltonian as

$$H_i(x_i, u_i, \nabla V_i(x_i)) = \hat{\delta}_i \left\| \nabla V_i^T(x_i) \right\| E_i + U_i(x_i, u_i) + \nabla V_i^T(x_i)\big(f_i(x_i) + g_i(x_i)u_i(x_i) + h_{id}(x_{iD})\big)$$

and the optimal local value function as

$$V_i^*(x_i) = \min_{u_i \in \psi_i(\Omega_i)} \int_0^{\infty} \left( \hat{\delta}_i \left\| \nabla V_i^{*T}(x_i) \right\| E_i + U_i(x_i(\tau), u_i(\tau)) \right) d\tau. \tag{9}$$
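As a minimal numerical sketch of the local Hamiltonian defined above (an illustration, not part of the original derivation), consider a hypothetical scalar subsystem with $f_i(x) = ax$, $g_i(x) = b$, utility $qx^2 + ru^2$, and a quadratic candidate $V_i(x) = wx^2$. The names `local_hamiltonian`, `a`, `b`, `q`, `r`, `w`, `delta_hat`, `e_i`, and `h_id` are assumptions introduced only for this example. With $\hat{\delta}_i = 0$ and $h_{id} = 0$ the problem reduces to a standard linear-quadratic one, so the Hamiltonian should vanish when $w$ is the positive root of the scalar Riccati equation $q + 2aw - b^2w^2/r = 0$ and $u = -\frac{1}{2} r^{-1} b \nabla V_i$. A nonzero $\hat{\delta}_i E_i$ then adds exactly $\hat{\delta}_i |\nabla V_i| E_i$.

```python
import math


def local_hamiltonian(x, u, grad_v, *, a, b, q, r,
                      delta_hat=0.0, e_i=0.0, h_id=0.0):
    """Scalar local Hamiltonian:
    H = delta_hat*|grad_v|*E_i + (q*x^2 + r*u^2) + grad_v*(a*x + b*u + h_id).
    All arguments are illustrative scalars (assumption, not from the paper)."""
    utility = q * x * x + r * u * u
    drift = a * x + b * u + h_id
    return delta_hat * abs(grad_v) * e_i + utility + grad_v * drift


# Hypothetical scalar subsystem and weights (assumed values).
a, b, q, r = -1.0, 1.0, 1.0, 1.0

# Positive root of the scalar Riccati equation q + 2*a*w - b^2*w^2/r = 0.
w = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

x = 0.7
grad_v = 2.0 * w * x            # gradient of V(x) = w*x^2
u = -0.5 * b * grad_v / r       # linear-quadratic feedback

# Hamiltonian without the adaptive term and without interconnection.
h0 = local_hamiltonian(x, u, grad_v, a=a, b=b, q=q, r=r)

# Same point with a nonzero adaptive term delta_hat * |grad_v| * E_i.
h1 = local_hamiltonian(x, u, grad_v, a=a, b=b, q=q, r=r,
                       delta_hat=0.5, e_i=0.2)

print(h0, h1)
```

In this special case `h0` is zero up to rounding, while `h1` equals the added penalty term, which is how the improved index charges the policy for the reference-state substitution error.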


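According to optimal control theory, $V_i^*(x_i)$ satisfies the HJB equation

$$0 = \min_{u_i \in \psi_i(\Omega_i)} H_i\big(x_i, u_i, \nabla V_i^*(x_i)\big) \tag{10}$$

where $\nabla V_i^*(x_i) = \partial V_i^*(x_i)/\partial x_i$. Assuming the solution $V_i^*(x_i)$ exists and is continuously differentiable, the local optimal control policy can be described as

$$u_i^*(x_i) = -\frac{1}{2} R_i^{-1} g_i^T(x_i) \nabla V_i^*(x_i). \tag{11}$$

For the considered large-scale nonlinear system (1), the local feedback control policies $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ should be presented to guarantee that the entire closed-loop system is stable. To achieve this goal, we will transform the stabilization problem into designing a set of local optimal controllers with proper local value functions.

Theorem 1: For the $i$th interconnected subsystem (2) with the improved local value function (7), $V_i^*(x_i)$ is the optimal solution of the HJB (10), and $u_i^*(x_i)$ is the optimal control policy by (11). It implies that the control policies $u_1^*(x_1), u_2^*(x_2), \ldots, u_N^*(x_N)$ are the decentralized control law of large-scale nonlinear system (1).

Proof: The theorem can be proved by showing that $V_i^*(x_i)$ is a Lyapunov function. From the definition of each term in (9), we can observe that $V_i^*(x_i) > 0$ for any $x_i \ne 0$ and $V_i^*(x_i) = 0$ for $x_i = 0$, which implies that $V_i^*(x_i)$ is a positive definite function. Therefore, the time derivative of $V_i^*(x_i)$ along the corresponding state of the closed-loop interconnected subsystem is described by

$$
\begin{aligned}
\dot{V}_i^*(x_i) &= \big(\nabla V_i^*(x_i)\big)^T \dot{x}_i \\
&= \big(\nabla V_i^*(x_i)\big)^T \big(f_i(x_i) + g_i(x_i)u_i(x_i) + h_{id}(x_{iD}) + \Delta_i + \varepsilon_i\big).
\end{aligned} \tag{12}
$$

Denoting $(\nabla V_i^*(x_i))^T$ as $\nabla V_i^{*T}$ for simplicity, and substituting (8) into (12), we have

$$
\begin{aligned}
\dot{V}_i^*(x_i) &= -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \nabla V_i^{*T}(\Delta_i + \varepsilon_i) \\
&\le -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \|\Delta_i\| + \|\nabla V_i^{*T}\| \|\varepsilon_i\|.
\end{aligned}
$$

Considering Assumption 1, we have

$$
\begin{aligned}
\dot{V}_i^*(x_i) &\le -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \sum_{j=1, j \ne i}^{N} d_{ij} E_j + \|\nabla V_i^{*T}\| \phi_{i1} \\
&\le -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \max_{ij}\{d_{ij}\} \|\nabla V_i^{*T}\| \sum_{j=1, j \ne i}^{N} E_j + \|\nabla V_i^{*T}\| \phi_{i1}.
\end{aligned}
$$

Therefore

$$
\begin{aligned}
\dot{V}^* &= \sum_{i=1}^{N} \dot{V}_i^* \\
&\le \sum_{i=1}^{N} \left( -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \phi_{i1} \right) + \sum_{i=1}^{N} \left( \max_{ij}\{d_{ij}\} \|\nabla V_i^{*T}\| \sum_{j=1, j \ne i}^{N} E_j \right) \\
&\le \sum_{i=1}^{N} \left( -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \phi_{i1} \right) + \max_{ij}\{d_{ij}\} \sum_{i=1}^{N} \|\nabla V_i^{*T}\| \sum_{j=1, j \ne i}^{N} E_j.
\end{aligned} \tag{13}
$$

Since $E_i = \|x_i - x_{id}\| \ge 0$, (13) becomes

$$
\begin{aligned}
\dot{V}^* &\le \sum_{i=1}^{N} \left( -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \phi_{i1} \right) + \max_{ij}\{d_{ij}\} \sum_{i=1}^{N} \|\nabla V_i^{*T}\| \sum_{j=1}^{N} E_j \\
&= \sum_{i=1}^{N} \left( -\hat{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \phi_{i1} \right) + N \cdot \max_{ij}\{d_{ij}\} \sum_{i=1}^{N} \|\nabla V_i^{*T}\| E_i \\
&\quad - \max_{ij}\{d_{ij}\} \sum_{i=1}^{N} \|\nabla V_i^{*T}\| \sum_{j=1}^{N} (E_i - E_j).
\end{aligned}
$$

Let $\delta_i = N \cdot \max_{ij}\{d_{ij}\}$. We have

$$
\dot{V}^* \le \sum_{i=1}^{N} \left( \tilde{\delta}_i \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \phi_{i1} \right) - \max_{ij}\{d_{ij}\} \sum_{i=1}^{N} \|\nabla V_i^{*T}\| \sum_{j=1}^{N} (E_i - E_j) \tag{14}
$$

where $\tilde{\delta}_i = \delta_i - \hat{\delta}_i$. Denoting $\eta_{i1} = \tilde{\delta}_i \|\nabla V_i^{*T}\| E_i + \|\nabla V_i^{*T}\| \phi_{i1} - \max_{ij}\{d_{ij}\} \sum_{i=1}^{N} \|\nabla V_i^{*T}\| \sum_{j=1}^{N} (E_i - E_j)$, which is assumed to be bounded, i.e., $|\eta_{i1}| \le \Pi_i$, we have

$$
\begin{aligned}
\dot{V}^* &\le \sum_{i=1}^{N} \left( \Pi_i - x_i^T Q_i x_i - u_i^T R_i u_i \right) \\
&\le \sum_{i=1}^{N} \left( \Pi_i - x_i^T Q_i x_i \right) \\
&\le \sum_{i=1}^{N} \left( \Pi_i - \lambda_{\min}(Q_i) \|x_i\|^2 \right)
\end{aligned}
$$

where $\lambda_{\min}(\cdot)$ denotes the minimum eigenvalue of the matrix. Hence, we can conclude that $\dot{V}^* < 0$ when $x_i$ lies outside of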


the compact set

$$\Omega_{x_i} = \left\{ x_i : \|x_i\| \le \sqrt{\frac{\Pi_i}{\lambda_{\min}(Q_i)}} \right\}.$$

It implies that $V^*(x)$ is a Lyapunov function. This indicates that $x_i(t)$ will converge to a small neighborhood wherever the initial position is. This completes the proof.

Remark 3: In [44], the interconnection term is required to satisfy the assumed matching condition, which plays an important role in guaranteeing the closed-loop isolated subsystem to be stable. Unlike the method in [44], in this paper, we can see from the detailed proof that the strong assumption is relaxed by moving the substituted interconnection into the isolated subsystem, and leaving the bounded term $\|\nabla V_i^*(x_i)\| \phi_{i1}$ to be handled. Furthermore, the UUB stability is guaranteed for the large-scale nonlinear system (1), rather than the isolated subsystem (5). Therefore, the assumption on the matched interconnection can be removed.

B. Neural Network Implementation

In this section, two NNs are employed to approximate the unknown mismatched interconnection and the assumed differentiable local performance index function.

1) Approximation of the Interconnection: In this part, a state observer is employed to estimate the state of interconnected subsystem (2) as

$$\dot{\hat{x}}_i(t) = f_i(\hat{x}_i) + g_i(\hat{x}_i)u_i + \hat{h}_{id}(\hat{x}_{iD}) + l_i e_i \tag{15}$$

where $e_i = x_i - \hat{x}_i$ is the observation error, and $l_i = \mathrm{diag}[l_{i1}, l_{i2}, \ldots, l_{in}] \in \mathbb{R}^{n_i \times n_i}$ is the observer gain matrix with all positive elements. Noticing that the unknown mismatched interconnection $h_i$ is approximated as shown in (3), it can be estimated by $\hat{h}_{id}$, which is expressed as

$$\hat{h}_{id} = \hat{W}_{ih}^T \sigma_{ih}(\hat{x}_{iD}) \tag{16}$$

whose weight vector is updated by

$$\dot{\hat{W}}_{ih} = \Gamma_{ih} e_i^T \sigma_{ih}(\hat{x}_{iD}) \tag{17}$$

with $\Gamma_{ih} > 0$ a constant.

Combining (2), (4) with (15), we have

$$\dot{e}_i = f_i(x_i) - f_i(\hat{x}_i) + \big(g_i(x_i) - g_i(\hat{x}_i)\big)u_i + h_{id}(x_{iD}) + \Delta_i(x, x_{iD}) + \varepsilon_i - \hat{h}_{id}(\hat{x}_{iD}) - l_i e_i.$$

Since $f_i(\cdot)$ and $g_i(\cdot)$ are locally Lipschitz, we have

$$
\begin{aligned}
\dot{e}_i &\le D_{if} \|e_i\| + D_{ig} e_i u_i + h_{id}(x_{iD}) + \Delta_i(x, x_{iD}) + \varepsilon_i - \hat{h}_{id}(\hat{x}_{iD}) - l_i e_i \\
&= D_{if} \|e_i\| + D_{ig} e_i u_i + \tilde{W}_{ih}^T \sigma_{ih}(\hat{x}_{iD}) + \Delta_i(x, x_{iD}) + \varepsilon_i - l_i e_i
\end{aligned}
$$

where $D_{if}$ and $D_{ig}$ are positive constants.

Theorem 2: Consider the interconnected subsystem (2), together with the approximation of the unknown mismatched interconnection (16) and the update law (17). Then the observation error $e_i$, which is derived by combining (2) with the developed state observer (15), is guaranteed to be UUB.

Proof: Select the Lyapunov function candidate as

$$L_{i1} = \frac{1}{2} e_i^T e_i + \frac{1}{2} \tilde{W}_{ih}^T \Gamma_{ih}^{-1} \tilde{W}_{ih} \tag{18}$$

where $\tilde{W}_{ih} = W_{ih} - \hat{W}_{ih}$ is the weight approximation error.

Denoting $\eta_{i2} = D_{ig} e_i u_i + \Delta_i(x, x_{iD}) + \varepsilon_i$, the time derivative of (18) is

$$
\begin{aligned}
\dot{L}_{i1} &= e_i^T \dot{e}_i - \tilde{W}_{ih}^T \Gamma_{ih}^{-1} \dot{\hat{W}}_{ih} \\
&\le e_i^T \left( D_{if} \|e_i\| + \tilde{W}_{ih}^T \sigma_{ih}(\hat{x}_{iD}) + \eta_{i2} - l_i e_i \right) - \tilde{W}_{ih}^T \Gamma_{ih}^{-1} \dot{\hat{W}}_{ih}.
\end{aligned}
$$

Suppose that the norm of the entire error is bounded as $\|\eta_{i2}\| \le \phi_{i2}$ with $\phi_{i2} > 0$ as an unknown constant, we have

$$\dot{L}_{i1} \le -\big(\lambda_{\min}(l_i) - D_{if}\big) \|e_i\|^2 + \|e_i\| \phi_{i2} + \tilde{W}_{ih}^T \left( e_i^T \sigma_{ih}(\hat{x}_{iD}) - \Gamma_{ih}^{-1} \dot{\hat{W}}_{ih} \right). \tag{19}$$

Substituting (17) into (19), we have

$$
\begin{aligned}
\dot{L}_{i1} &\le -\big(\lambda_{\min}(l_i) - D_{if}\big) \|e_i\|^2 + \|e_i\| \phi_{i2} \\
&= \|e_i\| \left( \big(-\lambda_{\min}(l_i) + D_{if}\big) \|e_i\| + \phi_{i2} \right).
\end{aligned}
$$

Therefore, we can conclude that $\dot{L}_{i1} \le 0$ when $e_i$ lies outside of the compact set

$$\Omega_{e_i} = \left\{ e_i : \|e_i\| < \frac{\phi_{i2}}{\lambda_{\min}(l_i) - D_{if}} \right\}$$

where $\lambda_{\min}(l_i) > D_{if}$. According to Lyapunov's direct method, the observation error is UUB with the approximation and substitution of the unknown mismatched interconnection. This completes the proof.

Remark 4: It is reasonable to assume $\eta_{i1}$ and $\eta_{i2}$ in Theorems 1 and 2 to be bounded. Taking $\eta_{i1}$ in Theorem 1 as an example, the bounded $\eta_{i1}$ is necessary to guarantee the closed-loop system to be UUB, since we cannot promise that $\eta_{i1}$ is a positive or negative function. It means that $x_i(t)$ will converge to a small neighborhood, which may be smaller than the given $\Omega_{x_i}$, but never larger than it.

Remark 5: From Theorems 1 and 2, we can see that the conclusions are obtained by the boundedness assumptions of $\eta_{i1}$ and $\eta_{i2}$. It indicates that the stability verifications are based on the boundedness of the states, rather than the boundedness of interconnections. Thus, it removes the assumption on available upper boundedness of interconnections in [44].

2) Critic Neural Network: Since the term $\hat{\delta}_i \|\nabla V_i(x_i)\| E_i$ in (7) is not completely known, we need to use parametric structures, such as NNs, to approximate it. We can observe that the unknown part $\nabla V_i(x_i)$ is the gradient along the corresponding state $x_i$ of the critic NN, and it can be indirectly obtained by approximating $V_i(x_i)$ with a single layer NN on the compact set $\Omega_i$ as

$$V_i(x_i) = W_{ic}^T \sigma_{ic}(x_i) + \varepsilon_{ic}(x_i) \tag{20}$$

where $W_{ic} \in \mathbb{R}^{l_i}$ is the ideal weight vector, $\sigma_{ic}(x_i) \in \mathbb{R}^{l_i}$ is the activation function, $l_i$ is the number of neurons in the hidden layer, and $\varepsilon_{ic}(x_i)$ is the approximation error. Then, its gradient along corresponding state $x_i$ is

$$\nabla V_i(x_i) = \big(\nabla \sigma_{ic}(x_i)\big)^T W_{ic} + \nabla \varepsilon_{ic}^T(x_i) \tag{21}$$


where $\nabla \sigma_{ic}(x_i) = \partial \sigma_{ic}(x_i)/\partial x_i \in \mathbb{R}^{l_i \times n_i}$ and $\nabla \varepsilon_{ic}(x_i)$ are the gradients of the activation function and the approximation error, respectively.

From (20), the approximate critic NN can be expressed by

$$\hat{V}_i(x_i) = \hat{W}_{ic}^T \sigma_{ic}(x_i).$$

Then, the gradient of $\hat{V}_i(x_i)$ along the corresponding state is

$$\nabla \hat{V}_i(x_i) = \big(\nabla \sigma_{ic}(x_i)\big)^T \hat{W}_{ic}.$$

For the isolated subsystem (5), substituting (21) into the nonlinear Lyapunov equation (8), we have

$$0 = \hat{\delta}_i \left\| \nabla V_i^T(x_i) \right\| E_i + U_i(x_i, u_i) + \left( W_{ic}^T \nabla \sigma_{ic}(x_i) + \nabla \varepsilon_{ic}^T(x_i) \right) \big( f_i(x_i) + g_i(x_i)u_i(x_i) + h_{id}(x_{iD}) \big).$$

Let $\upsilon_i = \nabla V_i^T(x_i) - \nabla \hat{V}_i^T(x_i)$. For the interconnected subsystem (2), the Hamiltonian can be expressed as

$$
\begin{aligned}
H_i(x_i, u_i, W_{ic}) &= \hat{\delta}_i \left\| \nabla \hat{V}_i^T(x_i) \right\| E_i + U_i(x_i, u_i) + W_{ic}^T \nabla \sigma_{ic}(x_i) \dot{x}_i \\
&= -\hat{\delta}_i \|\upsilon_i\| E_i - \nabla \varepsilon_{ic}^T(x_i) \dot{x}_i \\
&= e_{icH}
\end{aligned} \tag{22}
$$

where $e_{icH}$ is the approximation error of the critic NN. Thus, the approximate local Hamiltonian can be obtained by

$$H_i\big(x_i, u_i, \hat{W}_{ic}\big) = \hat{\delta}_i \left\| \nabla \hat{V}_i^T(x_i) \right\| E_i + U_i(x_i, u_i) + \hat{W}_{ic}^T \nabla \sigma_{ic}(x_i) \dot{x}_i = e_{ic}.$$

Let $\theta_i = \nabla \sigma_{ic}(x_i) \dot{x}_i$. By the steepest descent algorithm, the objective function $E_{ic} = (1/2) e_{ic}^T e_{ic}$ can be minimized in order to adjust the weight vector of the critic NN $\hat{W}_{ic}$, which should be updated by

$$\dot{\hat{W}}_{ic} = -l_{ic} e_{ic} \theta_i \tag{23}$$

where $l_{ic} > 0$ is the learning rate.

Define the weight approximation error as $\tilde{W}_{ic} = W_{ic} - \hat{W}_{ic}$. According to (22) and (23), one has

$$e_{ic} = e_{icH} - \tilde{W}_{ic}^T \theta_i.$$

The critic NN weight approximation error can be updated by

$$\dot{\tilde{W}}_{ic} = -\dot{\hat{W}}_{ic} = l_{ic} \left( e_{icH} - \tilde{W}_{ic}^T \theta_i \right) \theta_i. \tag{24}$$

Therefore, according to (11) and (20), the ideal local control policy can be expressed as

$$u_i(x_i) = -\frac{1}{2} R_i^{-1} g_i^T(x_i) \left( \big(\nabla \sigma_{ic}(x_i)\big)^T W_{ic} + \nabla \varepsilon_{ic}^T(x_i) \right).$$

And it can be approximated as

$$\hat{u}_i(x_i) = -\frac{1}{2} R_i^{-1} g_i^T(x_i) \big(\nabla \sigma_{ic}(x_i)\big)^T \hat{W}_{ic}. \tag{25}$$

From the above equation, we can observe that the local control policy is derived by the critic NN, and the training of the action NN is no longer required.

Theorem 3: Consider the interconnected subsystem (2). If the weight of the critic NN is updated by (24), the dynamics of the weight approximation error vector can be guaranteed to be UUB.

Proof: Select the Lyapunov function candidate as

$$L_{i2} = \frac{1}{2 l_{ic}} \tilde{W}_{ic}^T \tilde{W}_{ic}.$$

Its time derivative is

$$
\begin{aligned}
\dot{L}_{i2} &= \frac{1}{l_{ic}} \tilde{W}_{ic}^T \dot{\tilde{W}}_{ic} \\
&= \tilde{W}_{ic}^T \left( e_{icH} - \tilde{W}_{ic}^T \theta_i \right) \theta_i \\
&= \tilde{W}_{ic}^T e_{icH} \theta_i - \left\| \tilde{W}_{ic}^T \theta_i \right\|^2 \\
&\le \frac{1}{2} e_{icH}^2 - \frac{1}{2} \left\| \tilde{W}_{ic}^T \theta_i \right\|^2.
\end{aligned}
$$

Hence, $\dot{L}_{i2} < 0$ when $\tilde{W}_{ic}$ lies outside of the compact set

$$\Omega_{\tilde{W}_{ic}} = \left\{ \tilde{W}_{ic} : \left\| \tilde{W}_{ic} \right\| \le \left\| \frac{e_{icH}}{\theta_{iM}} \right\| \right\}$$

where $\|\theta_i\| \le \theta_{iM}$, and $\theta_{iM}$ is a positive constant. Based on the Lyapunov stability theorem, the dynamics of the weight approximation error vector is UUB. This completes the proof.

Remark 6: Since the convergence rate of RBFNN is higher than that of back propagation NN (BPNN), the RBFNN is employed by the developed local state observer (15). However, the local control policy (25) requires the partial derivative of the local critic NN, which has a heavy computational burden if RBFNN is employed. To trade off between the convergence rate and the computational burden, BPNN is selected for the local critic NN. Thus, different structures are chosen for these two NNs.

C. Stability Analysis

Theorem 4: Consider the interconnected subsystem (2), together with the improved local value function (7), where $\hat{\delta}_i$ is updated by

$$\dot{\hat{\delta}}_i = \Gamma_{i\delta} \left\| \big(\nabla V_i^*(x_i)\big)^T \right\| E_i \tag{26}$$

with $\Gamma_{i\delta} > 0$ a constant. Then the $N$ approximated decentralized control policies developed by (25) guarantee the closed-loop large-scale nonlinear system (1) to be UUB. In other words, the control policies $u_1(x_1), u_2(x_2), \ldots, u_N(x_N)$ are the decentralized control law for the large-scale nonlinear system composed of $N$ subsystems as (2).

Proof: Select the Lyapunov function candidate for the $i$th interconnected subsystem as

$$L_{i3} = \sum_{i=1}^{N} \left( V_i^* + \frac{1}{2} \tilde{\delta}_i^T \Gamma_{i\delta}^{-1} \tilde{\delta}_i \right).$$

Its time derivative is

$$\dot{L}_{i3} = \sum_{i=1}^{N} \left( \nabla V_i^{*T} \dot{x}_i - \tilde{\delta}_i^T \Gamma_{i\delta}^{-1} \dot{\hat{\delta}}_i \right).$$


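According to (8) and (14), we can obtain

$$\dot{L}_{i3} \le \sum_{i=1}^{N} \left( \tilde{\delta}_i^T \|\nabla V_i^{*T}\| E_i - U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \phi_i - \tilde{\delta}_i^T \Gamma_{i\delta}^{-1} \dot{\hat{\delta}}_i \right). \tag{27}$$

Substituting (26) into (27), we have

$$
\begin{aligned}
\dot{L}_{i3} &\le \sum_{i=1}^{N} \left( -U_i(x_i, u_i) + \|\nabla V_i^{*T}\| \phi_i \right) \\
&\le \sum_{i=1}^{N} \left( -\lambda_{\min}(Q_i) \|x_i\|^2 + \|\nabla V_i^{*T}\| \phi_i \right).
\end{aligned}
$$

We can conclude that $\dot{L}_{i3} \le 0$ when $x_i$ lies outside of the compact set

$$\Omega_{x_i} = \left\{ x_i : \|x_i\| < \sqrt{\frac{\|\nabla V_i^{*T}\| \phi_i}{\lambda_{\min}(Q_i)}} \right\}.$$

Algorithm 1 Local Online PI Algorithm
1: For $i = 1, 2, \ldots, N$, select a set of small positive constants $\xi_i$, let $p = 0$ and $V_i^{(0)}(x_i) = 0$, and begin with admissible control policies $u_i^{(0)}(x_i)$.
2: (Local policy evaluation) Let $p > 0$, based on the local control policy $u_i^{(p)}(x_i)$, solve the following local nonlinear Lyapunov equation for $u_i^{(p)}(x_i)$:
$$0 = \hat{\delta}_i \left\| \nabla V_i^{(p)T}(x_i) \right\| E_i + U_i\big(x_i, u_i^{(p)}\big) + \nabla V_i^{(p)T}(x_i) \big( f_i(x_i) + g_i(x_i) u_i^{(p)}(x_i) + h_{id}(x_{iD}) \big). \tag{28}$$
3: (Local policy improvement) Update the local control policy $u_i^{(p)}(x_i)$ by
$$u_i^{(p+1)}(x_i) = -\frac{1}{2} R_i^{-1} g_i^T(x_i) \nabla V_i^{(p)}(x_i). \tag{29}$$
4: If $\left\| V_i^{(p+1)}(x_i) - V_i^{(p)}(x_i) \right\| \le \xi_i$, stop and obtain the approximated optimal control; else, let $p = p + 1$ and return to 2.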
From Lyapunov stability theorem, the closed-loop large-
scale nonlinear system (1) is UUB with the control policies
u1 (x1 ), u2 (x2 ), . . . , uN (xN ). This completes the proof. Similarly, this conclusion can be extended to the case of N
Remark 7: The positive function δ̂i defined in (6) can be isolated subsystems. Additionally, we denote p0 = max{p0i }.
updated by (26). It cannot guarantee δ̂i to be positive at the Therefore, there exists any integer p0 for any ζ , where
very beginning of updating. Noticing that the right hand side ζ = max{ζi }, such that for any p ≥ p0 , (30) and (31) are true
of (26) is positive when t > 0, δ̂i will be guaranteed to be posi- for i = 1, 2, . . . , N. That is to say, the algorithm will converge
tive all the time as long as its initial value δ̂i0 ≥ 0 for updating. to the improved local optimal value functions and local opti-
Thus, (7) can be guaranteed to be a Lyapunov equation with mal controls of the N isolated subsystems. This completes the
a proper initial value. proof.

D. Local Online PI Algorithm IV. S IMULATION S TUDY


Here, a local online PI algorithm is introduced to solve HJB For large-scale nonlinear systems with unknown mis-
equations. The local online PI algorithm consists of the local matched interconnections, two simulation examples are given
policy evaluation based on (8) and the local policy improve- in order to show the effectiveness of the proposed decentral-
ment based on (11), and its iteration process can be described ized control scheme in this section.
as Algorithm 1. Example 1: Consider the following large-scale nonlinear
(0) system:
From Algorithm 1, we can see that Vi (xi ) = 0 is required.  
It is required to prove the convergence of Algorithm 1, e.g., x12 − x11
(p) (p) ẋ1 =
Vi (xi ) → Ji∗ (xi ) and ui (xi ) → u∗i (xi ) as p → ∞. −0.5(x11 + x12 ) − 0.5x12 (cos(2x11 ) + 2)2
 
Theorem 5: For the ith isolated subsystem (5), given N ini- 0
(0)
tial admissible control policies ui (xi ), where i = 1, 2, . . . , N. + u (x )
cos(2x1 ) + 2 1 1
Then, using the local PI algorithm described by (28) and (29),  
0 

the improved local value functions and control policies con- +
(p) 4(x11 + x22 ) sin x12
3 cos(0.5x
21 )
verge to the optimal ones as p → ∞, i.e., Vi (xi ) → Ji∗ (xi )    
(p) x22 0
and ui (xi ) → u∗i (xi ). ẋ2 = + u (x )
(p) −x21 − 0.5x22 + 0.5x21 2 x
22 x21 2 2
Proof: For the ith subsystem, we have ui (xi ) ∈ (i ) 
for any p ≥ 0 with a given initial admissible control pol- 0  2
(0) +
icy ui (xi ). Furthermore, there exists an integer p0i for any 0.5(x12 + x22 ) cos ex21
ζi such that for any p ≥ p0i , the following formulas hold
simultaneously: where xi = [xi1 , xi2 ]T ∈ R2 and ui (xi ) ∈ R are the state and
control input of ith subsystem, respectively. It is assumed that
(p)
sup Vi (xi ) − Ji∗ (xi ) < ζi (30) the interconnections are unknown and mismatched.
xi ∈i Let the initial states of the system be x10 = x20 = [1, −1]T ,
(p)
sup ui (xi ) − u∗i (xi ) < ζi . (31) and the initial states of the observer be x̂10 = [2, −2]T and
xi ∈i x̂20 = [1.5, −1.5]T , respectively. Because it is a regulation

Authorized licensed use limited to: Peter the Great St. Petersburg Polytechnic Univ. Downloaded on February 21,2024 at 00:40:27 UTC from IEEE Xplore. Restrictions apply.
1732 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 48, NO. 10, OCTOBER 2018

Fig. 1. State estimation errors of Example 1.

problem, the reference states of the coupled subsystems can


be chosen as xid = 0. In this simulation, the RBFNN in
the local observer is chosen as 2–7–1 with 2 input neu-
rons, 7 hidden neurons, and 1 output neuron. Meanwhile, the
improved local value function (6) is approximated by a critic
NN, whose structure is chosen as 2–3–1 with 2 input neu-
rons, 3 hidden neurons, and 1 output neuron, and the weight
vector as Ŵic = [Ŵic1 , Ŵic2 , Ŵic3 ]T with the initial values Fig. 2. Weights of critic NNs of Example 1.
W1c0 = [1.6, 0.4, 0.6]T and W2c0 = [0.3, 0.4, 1.3]T . The
activation function of the critic NN is selected as σic (xi ) =
2 , x x , x2 ]. Let Q = 20I , R = 20I, the weight learn-
[xi1 i1 i2 i2 i 2 i
ing rates of the approximated interconnection and the critic NN
be ih = 10 and lic = 0.05, the updated rate of δ̂i in improved
local value function (7) be iδ = 0.0001, the state observer
gain matrix be li = 10I2 , where In denotes the identity matrix
with n dimensions, respectively.
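The critic configuration above can be made concrete with a short sketch. It evaluates the approximated policy (25) for subsystem 1 at the initial state with the initial critic weights, and takes one forward-Euler step of the weight update law (23). This is only an illustration under stated assumptions: the input gain is read as $g_1(x_1) = [0, \cos(2x_{11}) + 2]^{T}$, $R_1$ is treated as the scalar 20, and the residual `e_demo`, the regressor `theta_demo`, and the step size `dt` are hypothetical placeholders, since their actual values come from equations outside this section.

```python
import numpy as np

def grad_sigma(x):
    """Jacobian (3x2) of sigma_ic(x) = [x1^2, x1*x2, x2^2] with respect to x."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])

def g1(x):
    """Input gain of subsystem 1 in Example 1 (assumed argument: first state)."""
    return np.array([[0.0], [np.cos(2.0 * x[0]) + 2.0]])

def approx_policy(x, W_hat, g, R):
    """Eq. (25): u_hat = -1/2 * R^{-1} g(x)^T (grad sigma(x))^T W_hat."""
    return -0.5 * np.linalg.solve(R, g(x).T @ grad_sigma(x).T @ W_hat)

def critic_euler_step(W_hat, e_ic, theta, l_ic, dt):
    """One forward-Euler step of Eq. (23): dW_hat/dt = -l_ic * e_ic * theta."""
    return W_hat + dt * (-l_ic * e_ic * theta)

# Values taken from the Example 1 setup.
x10 = np.array([1.0, -1.0])          # initial state of subsystem 1
W1c0 = np.array([1.6, 0.4, 0.6])     # initial critic weights
R1 = np.array([[20.0]])              # R_1 = 20 for a scalar input
u0 = approx_policy(x10, W1c0, g1, R1)

# Hypothetical residual, regressor, and step size, for illustration only.
e_demo = 0.1
theta_demo = np.array([1.0, 0.0, -1.0])
W_next = critic_euler_step(W1c0, e_demo, theta_demo, l_ic=0.05, dt=0.01)

print("u_hat_1(x10) =", u0)
print("W_hat after one step =", W_next)
```

Because the policy is computed directly from the critic weights, no separate action network appears in the sketch, which mirrors the remark following (25).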
The simulation results are shown in Figs. 1–3. Fig. 1 shows the state estimation errors obtained with the local state observer (15), which implies that the unknown interconnection can be approximated precisely online. As shown in Fig. 2, the weights of the two critic NNs converge to $[2.297408, -0.339473, 1.384527]^{T}$ and $[2.838536, -1.982101, 3.267496]^{T}$. Fig. 3 shows that the system states converge to zero by using the improved local value function (7) and the developed local PI algorithm. Therefore, the simulation results verify the effectiveness of the proposed decentralized control scheme.

Fig. 3. System states of Example 1.
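For intuition about the two-step structure of the local PI algorithm (policy evaluation (28) followed by policy improvement (29)), the loop can be sketched on a linear-quadratic special case. This is not the NN-based nonlinear implementation used in the simulations. It assumes a single isolated linear subsystem $\dot{x} = Ax + Bu$ with the interconnection and $\hat{\delta}_i$ terms dropped, a quadratic utility $x^{T}Qx + u^{T}Ru$, and a value function $V^{(p)}(x) = x^{T}P^{(p)}x$. Under these assumptions, (28) reduces to a matrix Lyapunov equation and (29) to the gain update $K = R^{-1}B^{T}P$ with $u = -Kx$. The matrices `A`, `B`, `Q`, `R`, and `K0` below are illustrative values, not taken from the paper.

```python
import numpy as np

def solve_lyapunov(Ac, M):
    """Solve Ac^T P + P Ac + M = 0 for P by row-major vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    # vec(Ac^T P) = kron(Ac^T, I) vec(P) and vec(P Ac) = kron(I, Ac^T) vec(P)
    L = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, xi=1e-10, max_iter=50):
    """PI loop: evaluate the current policy, improve it, stop when P settles."""
    K = K0
    P_prev = np.zeros_like(A)  # plays the role of V^(0) = 0
    for _ in range(max_iter):
        # Policy evaluation: linear counterpart of (28) for u = -K x.
        P = solve_lyapunov(A - B @ K, Q + K.T @ R @ K)
        # Policy improvement: linear counterpart of (29).
        K = np.linalg.solve(R, B.T @ P)
        # Stopping test: analogue of step 4 with threshold xi.
        if np.max(np.abs(P - P_prev)) <= xi:
            break
        P_prev = P
    return P, K

# Illustrative stable system, so K0 = 0 is an admissible initial policy.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.zeros((1, 2))

P, K = policy_iteration(A, B, Q, R, K0)
# Residual of the algebraic Riccati equation, which the PI limit should satisfy.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print("P =", P)
print("K =", K)
print("max |ARE residual| =", np.max(np.abs(residual)))
```

The Riccati residual serves as a check that the iteration has reached the optimal value function of the simplified problem, which is the linear analogue of the convergence stated in Theorem 5.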
Remark 8: In the considered large-scale nonlinear systems, the known dynamic $f_i(x_i)$ and $g_i(x_i)$ are estimated well when the estimation error is guaranteed to be UUB. Thus, in this case, the unknown mismatched interconnection can be approximated successfully.

Example 2: In order to further show the effectiveness of the proposed decentralized control scheme based on local PI algorithm, a hard spring connected parallel inverted pendulum system [50], [51] is employed in our simulation. The model of the parallel inverted pendulum system shown in Fig. 4 can be expressed by

$$\begin{aligned} m_1 l_1^{2}\ddot{\theta}_1 - m_1 g l_1 \sin\theta_1 + b_1\dot{\theta}_1 - F a_1\cos(\theta_1 - \beta) &= \delta_1 u_1 \\ m_2 l_2^{2}\ddot{\theta}_2 - m_2 g l_2 \sin\theta_2 + b_2\dot{\theta}_2 - F a_2\cos(\theta_2 - \beta) &= \delta_2 u_2 \end{aligned} \tag{32}$$

where $b_1$ and $b_2$ are damping coefficients, and

$$F = k\left(1 + A^{2}(l_k - l_0)^{2}\right)(l_k - l_0), \quad |A(l_k - l_0)| < 1$$

$$\beta = \arctan\left(\frac{a_1\cos\theta_1 - a_2\cos\theta_2}{l_0 - a_1\sin\theta_1 + a_2\sin\theta_2}\right)$$

$$l_k = \left[(l_0 - a_1\sin\theta_1 + a_2\sin\theta_2)^{2} + (a_1\cos\theta_1 + a_2\cos\theta_2)^{2}\right]^{2}.$$

In this simulation, parameters of the coupled inverted pendulums are chosen as: $\delta_1 = \delta_2 = 1$, $m_1 = m_2 = 1$ kg, $l_1 = l_2 = 0.5$ m, $l_0 = 1$ m, $g = 9.8$ m/s$^2$, $b_1 = b_2 = 0.009$, $k = 30$, $A = 0.1$, and the spring position $a_1 = a_2 = 0.1$.


Fig. 4. Parallel inverted pendulum system.

Fig. 6. Weights of critic NNs of Example 2.

Fig. 5. State estimation errors of Example 2.

Let $x_i = [x_{i1}, x_{i2}]^{T} = [\theta_i, \dot{\theta}_i]^{T} \in \mathbb{R}^{2}$, the model (32) can be expressed as

$$\begin{aligned} \dot{x}_{11} &= x_{12} \\ \dot{x}_{12} &= \delta_1 u_1 + f_1(x_1) + h_1(x) \\ \dot{x}_{21} &= x_{22} \\ \dot{x}_{22} &= \delta_2 u_2 + f_2(x_2) + h_2(x) \end{aligned}$$

where $f_1(x_1) = 5.88\sin x_{11} - 0.036x_{12}$, $f_2(x_2) = 5.88\sin x_{21} - 0.036x_{22}$, $h_1(x) = 4Fa_1\cos(x_{11} - \beta)$, and $h_2(x) = 4Fa_2\cos(x_{21} - \beta)$.

Let the initial states of the parallel inverted pendulum and the structure of the critic NN be the same as those of Example 1. Let the initial values of the weight vectors be $W_{1c0} = [1, 1.8, 1.6]^{T}$ and $W_{2c0} = [1.6, 1, 1.2]^{T}$, respectively, $Q_i = 0.1I_2$, and $R_i = 0.01I$, the weight learning rates of the approximated interconnection and the critic NN be $\Gamma_{ih} = 100$ and $l_{ic} = 0.2$, the update rate of $\hat{\delta}_i$ in value function (7) be $\Gamma_{i\delta} = 0.001$, and the state observer gain matrix be $l_i = 10I_2$, respectively.

The simulation results are illustrated in Figs. 5–7. Fig. 5 illustrates that the unknown interconnection can be estimated successfully online. As shown in Fig. 6, the weights of the two critic NNs converge to $[0.896337, 1.932218, 1.369870]^{T}$ and $[0.621462, 1.340702, 1.113269]^{T}$, respectively. Fig. 7 shows that the system states can converge to zero by using the presented decentralized control policy. The simulation results reveal that the proposed decentralized control scheme can be applied to large-scale nonlinear systems with unknown mismatched interconnections.

Fig. 7. System states of Example 2.

V. CONCLUSION

In this paper, we proposed a decentralized control scheme based on local PI algorithm for large-scale nonlinear systems with unknown mismatched interconnections. To relax the common boundedness assumption of the interconnection, the local


states of isolated subsystem and the substituted reference states of coupled subsystems are employed to approximate interconnection terms. Then, an improved local performance index function is established to reflect the NN substitution error. Finally, by the Lyapunov stability theorem, the closed-loop large-scale nonlinear system is guaranteed to be UUB via the developed decentralized control scheme. The simulation results demonstrate that the proposed decentralized control scheme is effective.

REFERENCES

[1] H. Wang, X. Liu, and K. Liu, “Robust adaptive neural tracking control for a class of stochastic nonlinear interconnected systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 3, pp. 510–523, Mar. 2016.
[2] T. Li, D. Wang, J. Li, and Y. Li, “Adaptive decentralized NN control of nonlinear interconnected time-delay systems with input saturation,” Asian J. Control, vol. 15, no. 2, pp. 533–542, 2013.
[3] B. Zhao, Y. Li, and D. Liu, “Self-tuned local feedback gain based decentralized fault tolerant control for a class of large-scale nonlinear systems,” Neurocomputing, vol. 235, pp. 147–156, Apr. 2017.
[4] B. Zhao and Y. Li, “Local joint information based active fault tolerant control for reconfigurable manipulator,” Nonlin. Dyn., vol. 77, no. 3, pp. 859–876, 2014.
[5] P. J. Werbos, “A menu of designs for reinforcement learning over time,” in Neural Network Control. Cambridge, MA, USA: MIT Press, 1990, pp. 67–95.
[6] H. Wang, P. Shi, H. Li, and Q. Zhou, “Adaptive neural tracking control for a class of nonlinear systems with dynamics uncertainties,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2016.2607166.
[7] H. Wang, W. Sun, and P. X. Liu, “Adaptive intelligent control of nonaffine nonlinear time-delay systems with dynamic uncertainties,” IEEE Trans. Syst., Man, Cybern., Syst., to be published, doi: 10.1109/TSMC.2016.2627048.
[8] Z. Li, T. Li, and G. Feng, “Adaptive neural control for a class of stochastic nonlinear time-delay systems with unknown dead zone using dynamic surface technique,” Int. J. Robust Nonlin. Control, vol. 26, no. 4, pp. 759–781, 2016.
[9] T. Li, Z. Li, D. Wang, and C. L. P. Chen, “Output-feedback adaptive neural control for stochastic nonlinear time-varying delay systems with unknown control directions,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1188–1201, Jun. 2016.
[10] Y. Li, T. Li, and S. Tong, “Adaptive neural networks output feedback dynamic surface control design for MIMO pure-feedback nonlinear systems with hysteresis,” Neurocomputing, vol. 198, pp. 58–68, Jul. 2016.
[11] Y. Li, T. Li, B. Miao, and C. L. P. Chen, “Adaptive NN control for a class of stochastic nonlinear systems with unmodeled dynamics using DSC technique,” Neurocomputing, vol. 149, pp. 142–150, Feb. 2015.
[12] F. Y. Wang, H. Zhang, and D. Liu, “Adaptive dynamic programming: An introduction,” IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39–47, May 2009.
[13] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943–949, Aug. 2008.
[14] D. V. Prokhorov and D. C. Wunsch, “Adaptive critic designs,” IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997–1007, Sep. 1997.
[15] D. P. Bertsekas and J. N. Tsitsiklis, “Neuro-dynamic programming: An overview,” in Proc. 34th IEEE Conf. Decis. Control, 1995, pp. 560–564.
[16] D. Wang, C.-X. Mu, and D.-R. Liu, “Data-driven nonlinear near-optimal regulation based on iterative neural dynamic programming,” Acta Automatica Sinica, vol. 43, no. 3, pp. 366–375, 2017.
[17] L. P. Kaelbling, M. L. Littman, and A. M. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, 1996.
[18] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009.
[19] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012.
[20] D. Liu, X. Yang, and H. Li, “Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics,” Neural Comput. Appl., vol. 23, nos. 7–8, pp. 1843–1850, 2013.
[21] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, “Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming,” Automatica, vol. 48, no. 8, pp. 1825–1832, 2012.
[22] H. Zhang, Y. Luo, and D. Liu, “Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints,” IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490–1503, Sep. 2009.
[23] D. Liu, Q. Wei, and P. Yan, “Generalized policy iteration adaptive dynamic programming for discrete-time nonlinear systems,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 45, no. 12, pp. 1577–1591, Dec. 2015.
[24] L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, “Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, pp. 1617–1622, Dec. 2009.
[25] Q. Wei and D. Liu, “Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification,” IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1020–1036, Oct. 2014.
[26] B. Zhao, D. Liu, X. Yang, and Y. Li, “Observer-critic structure-based adaptive dynamic programming for decentralised tracking control of unknown large-scale nonlinear systems,” Int. J. Syst. Sci., 2017, doi: 10.1080/00207721.2017.1296982.
[27] X. Yang, D. Liu, and D. Wang, “Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints,” Int. J. Control, vol. 87, no. 3, pp. 553–566, 2014.
[28] D. Wang, D. Liu, and H. Li, “Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems,” IEEE Trans. Autom. Sci. Eng., vol. 11, no. 2, pp. 627–632, Apr. 2014.
[29] Y. Jiang and Z.-P. Jiang, “Robust adaptive dynamic programming for large-scale systems with an application to multimachine power systems,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 10, pp. 693–697, Oct. 2012.
[30] D. Wang, D. Liu, H. Li, B. Luo, and H. Ma, “An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 5, pp. 713–717, May 2016.
[31] D. Liu, H. Li, and D. Wang, “Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 8, pp. 1015–1027, Aug. 2014.
[32] Z. Wang, L. Liu, H. Zhang, and G. Xiao, “Fault-tolerant controller design for a class of nonlinear MIMO discrete-time systems via online reinforcement learning algorithm,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 5, pp. 611–622, May 2016.
[33] B. Zhao, D. Liu, and Y. Li, “Online fault compensation control based on policy iteration algorithm for a class of affine non-linear systems with actuator failures,” IET Control Theory Appl., vol. 10, no. 15, pp. 1816–1823, Oct. 2016.
[34] B. Zhao, D. Liu, and Y. Li, “Observer based adaptive dynamic programming for fault tolerant control of a class of nonlinear systems,” Inf. Sci., vol. 384, pp. 21–33, Apr. 2017.
[35] G. K. Venayagamoorthy, R. G. Harley, and D. C. Wunsch, “Dual heuristic programming excitation neurocontrol for generators in a multimachine power system,” IEEE Trans. Ind. Appl., vol. 39, no. 2, pp. 382–394, Mar./Apr. 2003.
[36] D. Fuselli et al., “Action dependent heuristic dynamic programming for home energy resource scheduling,” Int. J. Elect. Power Energy Syst., vol. 48, pp. 148–160, Jun. 2013.
[37] S. Song, G. Cai, and X. Lin, “Optimal neuron-controller for fluid triple-tank system via improved ADDHP algorithm,” in Advances in Computational Intelligence. Berlin, Germany: Springer, 2009, pp. 483–492.
[38] J. Si and Y.-T. Wang, “Online learning control by association and reinforcement,” IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264–276, Mar. 2001.
[39] A. Saberi, “On optimality of decentralized control for a class of nonlinear interconnected systems,” Automatica, vol. 24, no. 1, pp. 101–104, 1988.
[40] T. Bian, Y. Jiang, and Z.-P. Jiang, “Decentralized adaptive optimal control of large-scale systems with application to power systems,” IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2439–2447, Apr. 2015.


[41] C. Lu, J. Si, and X. Xie, “Direct heuristic dynamic programming for damping oscillations in a large power system,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 1008–1013, Aug. 2008.
[42] D. Molina, G. K. Venayagamoorthy, J. Liang, and R. G. Harley, “Intelligent local area signals based damping of power system oscillations using virtual generators and approximate dynamic programming,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 498–508, Mar. 2013.
[43] D. S. Bernstein, C. Amato, E. A. Hansen, and S. Zilberstein, “Policy iteration for decentralized control of Markov decision processes,” J. Artif. Intell. Res., vol. 34, no. 1, pp. 89–132, 2009.
[44] D. Liu, D. Wang, and H. Li, “Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 418–428, Feb. 2014.
[45] D. Liu, C. Li, H. Li, D. Wang, and H. Ma, “Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics,” Neurocomputing, vol. 165, pp. 90–98, Oct. 2015.
[46] D. Wang, H. He, B. Zhao, and D. Liu, “Adaptive near-optimal controllers for non-linear decentralised feedback stabilisation problems,” IET Control Theory Appl., vol. 11, no. 6, pp. 799–806, Apr. 2017.
[47] A. Karimi, S. Eftekharnejad, and A. Feliachi, “Reinforcement learning based backstepping control of power system oscillations,” Elect. Power Syst. Res., vol. 79, no. 11, pp. 1511–1520, 2009.
[48] S. Mehraeen and S. Jagannathan, “Decentralized optimal control of a class of interconnected nonlinear discrete-time systems by using online Hamilton-Jacobi-Bellman formulation,” IEEE Trans. Neural Netw., vol. 22, no. 11, pp. 1757–1769, Nov. 2011.
[49] W. Chen and J. Li, “Decentralized output-feedback neural control for systems with unknown interconnections,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 1, pp. 258–266, Feb. 2008.
[50] Y. Tang, M. Tomizuka, G. Guerrero, and G. Montemayor, “Decentralized robust control of mechanical systems,” IEEE Trans. Autom. Control, vol. 45, no. 4, pp. 771–776, Apr. 2000.
[51] C. Hua, Y. Li, H. Wang, and X. Guan, “Decentralised fault-tolerant finite-time control for a class of interconnected non-linear systems,” IET Control Theory Appl., vol. 9, no. 16, pp. 2331–2339, Oct. 2015.

Bo Zhao (M’16) received the B.S. degree in automation and the Ph.D. degree in control science and engineering from Jilin University, Changchun, China, in 2009 and 2014, respectively.
He is currently a Post-Doctoral Fellow with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China, from 2014. His current research interests include adaptive dynamic programming, fault diagnosis and tolerant control, neural-network-based control, optimal control, and robot control.

Ding Wang (M’15) received the B.S. degree in mathematics from the Zhengzhou University of Light Industry, Zhengzhou, China, the M.S. degree in operations research and cybernetics from Northeastern University, Shenyang, China, and the Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2007, 2009, and 2012, respectively.
He was a Visiting Scholar with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA, from 2015 to 2017. He is currently an Associate Professor with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His current research interests include adaptive and learning systems, computational intelligence, and intelligent control. He has published over 90 journal and conference papers, and co-authored two monographs.
Dr. Wang was a recipient of the Excellent Doctoral Dissertation Award of Chinese Academy of Sciences in 2013, and a nomination of the Excellent Doctoral Dissertation Award of Chinese Association of Automation (CAA) in 2014. He serves as an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS and Neurocomputing. He is a member of Asia–Pacific Neural Network Society and CAA.

Guang Shi received the B.S. degree in automation from Zhejiang University, Hangzhou, China, in 2012. He is currently pursuing the Ph.D. degree with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
His current research interests include neural networks, adaptive dynamic programming, and optimal control and energy management in smart grids.

Derong Liu (S’91–M’94–SM’96–F’05) received the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 1994.
He was a Staff Fellow with General Motors Research and Development Center, from 1993 to 1995. He was an Assistant Professor with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA, from 1995 to 1999. He joined the University of Illinois at Chicago, Chicago, IL, USA, in 1999, and became a Full Professor of Electrical and Computer Engineering and of Computer Science in 2006. He served as the Associate Director of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China, from 2010 to 2015. He is currently a Full Professor with the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing. He has published 15 books (six research monographs and nine edited volumes).
Dr. Liu was a recipient of the Faculty Early Career Development Award from the National Science Foundation in 1999, the University Scholar Award from University of Illinois from 2006 to 2009, the Overseas Outstanding Young Scholar Award from the National Natural Science Foundation of China in 2008, and the Outstanding Achievement Award from Asia Pacific Neural Network Assembly in 2014. He was selected for the “100 Talents Program” by the Chinese Academy of Sciences in 2008. He is an Elected AdCom Member of the IEEE Computational Intelligence Society and he is the Editor-in-Chief of the Artificial Intelligence Review. He was the General Chair of 2014 IEEE World Congress on Computational Intelligence and is the General Chair of 2016 World Congress on Intelligent Control and Automation. He is a Fellow of the International Neural Network Society.

Yuanchun Li received the Ph.D. degree in general mechanics from the Harbin Institute of Technology, Harbin, China, in 1990.
He is currently a Professor in Control Science and Engineering with the Changchun University of Technology, Changchun, China. His current research interests include adaptive dynamic programming, complex system modeling, and robot control, intelligent control.
