Information Sciences
journal homepage: www.elsevier.com/locate/ins
ARTICLE INFO

Article history:
Received 8 March 2021
Received in revised form 15 August 2021
Accepted 18 August 2021
Available online 20 August 2021

Keywords:
Switched nonlinear systems
Adaptive optimal control
Actor-critic (AC) architecture
Sliding mode surface (SMS)
Average dwell time (ADT)

ABSTRACT

This paper studies the problem of sliding-mode surface (SMS)-based adaptive optimal control for a class of continuous-time switched nonlinear systems with average dwell time (ADT) using an actor-critic (AC) reinforcement learning (RL) strategy. By developing a specific cost function related to the SMS, the original control problem is equivalently transformed into the problem of finding a series of optimal control policies. Then, the error terms separated from the Hamilton-Jacobi-Bellman (HJB) equation are integrated into a single function, which effectively reduces the influence of repeated magnifications caused by approximation errors. Besides, the solution to the HJB equation is identified by SMS-based AC neural networks (NNs), where the actor and critic NNs are developed to carry out the RL strategy simultaneously. The actor updating law implements control actions based on the system output, while the critic updating law assesses the current control action and feeds the assessment back to the actor. Based on Lyapunov stability theory, the proposed adaptive AC optimal control method is shown to guarantee the boundedness of all signals in the considered closed-loop switched nonlinear systems. Finally, a simulation example is given to illustrate the effectiveness of the proposed adaptive optimal control method.
© 2021 Elsevier Inc. All rights reserved.
1. Introduction
As a representative class of hybrid systems, switched systems [1–3] have attracted extensive attention over the past two decades. This is mainly because many practical applications inevitably exhibit multimodality due to changes in environmental factors, so the models of actual plants can be described by a switched architecture, such as biological ecosystems [4], flight systems [5], robot systems [6], etc. In general, a switched system is constituted of a switching signal, a family of continuous-time or discrete-time subsystems and a switching rule [7]. Each subsystem of a switched system can be regarded as a mode, and the switching rule can be designed to achieve appropriate switching among modes. A large number of control problems on switched systems have been investigated in [8–11], to mention just a few. Besides, many outstanding strategies concerning the switching rule design of switched systems have been developed, such as the minimum switching strategy [12], the delay switching strategy [13], the average
⇑ Corresponding author.
E-mail address: [email protected] (H. Zhang).
https://doi.org/10.1016/j.ins.2021.08.062
0020-0255/© 2021 Elsevier Inc. All rights reserved.
H. Zhang, H. Wang, B. Niu et al. Information Sciences 580 (2021) 756–774
dwell time strategy [14], the arbitrary switching strategy [15], etc. Compared with the arbitrary switching strategy, the average dwell time switching strategy restricts the switching signals so that the number of switches over any finite interval is finite and the average dwell time is not less than a constant; arbitrary switching is its limiting case [16]. Therefore, research on the average dwell time switching strategy is more meaningful.
It is well known that many systems in practical engineering are nonlinear to some extent, which makes them exhibit diverse behaviors and high uncertainty [17–19]; hence, designing an appropriate control law for such nonlinear systems is more complicated or even infeasible. Over the past few decades, great importance has been attached to the control of nonlinear systems, and some significant approaches have been developed, such as fuzzy control [20], variable structure control [21], neural network (NN) control [22], the feedback linearization method [23], and so on. Although these control methods have been widely discussed, their applications inevitably encounter some restrictions, such as the acquisition and determination of fuzzy control rules and membership functions, the choice of the number of hidden-layer nodes, the chattering phenomenon of variable structure control, feedback linearization conditions, etc. By comparison, adaptive optimal control [24,25] is a special control method that adaptively adjusts controller parameters according to changes in the environment, so as to optimize the performance index of the controlled system. Optimal control is therefore necessary for some practical applications, such as optimal path planning for unmanned ground vehicles, real-time optimal control for spacecraft orbit transfer, and optimal trajectory planning and collision avoidance for underwater vehicles. In brief, this control method can not only deal with the control issues of nonlinear systems but also guarantee the expected system performance. However, the linchpin of optimal control law design is solving the Hamilton-Jacobi-Bellman (HJB) equation [26,27]. Because the HJB equation is a nonlinear partial differential equation, its solution is quite hard to obtain, which has become the major obstacle to designing an appropriate optimal control law.
To cope with the optimal control of nonlinear systems, some relevant control technologies have been extensively discussed in the past decade. As an effective approximate optimal method, adaptive dynamic programming (ADP) [28] has attracted the attention of the control community. This control method combines reinforcement learning (RL) [29] and dynamic programming (DP), and overcomes limitations of traditional DP such as the curse of dimensionality [30,31]. Besides, ADP is able to approximate the solution to the HJB equation from online input-state data, without precise model knowledge, to achieve approximate optimal control. For practical systems, stability is the basic requirement, while transient and steady-state performance, as well as optimality, are often further considerations. Recently, ADP control based on an actor-critic (AC) framework [32,33] has received more and more attention. The actor-critic framework is generally composed of two NNs: a critic NN and an actor NN. The critic NN evaluates the current system performance and provides the feedback used to modify the control action produced by the actor NN. Guided by the critic NN, the actor NN continually generates control outputs so that optimal performance can be achieved. Combining the features of actor-only and critic-only architectures, the AC architecture usually exhibits better properties and has been successfully applied to achieve adaptive optimal stabilization for nonlinear dynamic systems.
To improve the system response speed, sliding mode control (SMC) [34–36] has attracted extensive research interest in recent years. SMC is an effective robust control approach. One of its characteristics is that the structure of the sliding mode surface is not fixed, and different control effects are obtained according to the changeable sliding mode surfaces composed of system states [37]. Besides, SMC can overcome the influence of system uncertainties [38] and is strongly robust to unknown disturbances and perturbations [39]. Several types of sliding mode control have so far been developed, such as integral sliding mode control [40], terminal sliding mode control [41] and hierarchical sliding mode control [42]. Recently, plenty of adaptive optimal control problems combined with sliding mode control have been studied in depth. The authors in [43] investigated the nearly optimal control problem for affine nonlinear systems with actuator faults and state constraints by using an integral sliding mode surface. In [44], a novel optimal guaranteed cost sliding mode control for constrained-input nonlinear systems was studied. The sliding-mode surface-based approximate optimal control problem for uncertain nonlinear systems with an asymptotically stable critic structure was investigated in [45]. However, there is currently no research on sliding-mode surface-based adaptive optimal control for uncertain switched nonlinear systems with average dwell time under an actor-critic framework, which inspires the present work.
Motivated by the above discussions, in this article we focus on the problem of adaptive AC optimal control for switched nonlinear systems by using the SMC technique and the ADT strategy. The main contributions are listed as follows:
1) On the basis of the actor-critic RL architecture, a sliding-mode surface-based adaptive optimal control approach for
switched nonlinear systems with average dwell time is proposed for the first time. In addition, a class of switching rules
with ADT is given.
2) Compared with the existing results in [46,47], the control scheme of this paper is extended from the original non-
switched framework to a switched actor-critic dual networks framework. In addition, different from the optimal control
methods requiring information of system state, this paper combines the optimal control theory with sliding mode tech-
nology, which improves the response rate of the considered switched system to a certain extent.
3) In the proposed control scheme, the critic updating law is designed via using the gradient descent method. Also, the
designed critic updating law can evaluate the current control action and make feedback to the actor updating law, which
minimizes the cost function to achieve the optimal control.
The rest of this paper is organized as follows. The problem formulation and preliminaries are given in Section 2. In Section 3, we present an adaptive optimal control scheme under the actor-critic dual-network architecture. In Section 4, the stability of the system is rigorously proved based on Lyapunov stability theory. In Section 5, we present a practical example of dual inverted pendulums to illustrate the validity of the proposed control method. Finally, the conclusion is summarized in Section 6. In addition, some key notations used in this paper are summarized in Table 1.
Table 1
Notation description.
Symbol Meaning
switching signal. For $k \in \Xi_\sigma$, $u_k(t) \in \mathbb{R}^n$ is the control input, $g_k(x) \in \mathbb{R}^n$ with $g_k(0) = 0_n$ is the internal dynamic, and $h_k(x) \in \mathbb{R}^{n \times n}$ denotes the control gain matrix. $g_k(x)$ and $h_k(x)$ are differentiable, and $g_k(x) + h_k(x)u_k(t)$ is Lipschitz continuous, which guarantees that system (1) has a unique solution for any bounded initial value.
Assumption 1. For $k \in \Xi_\sigma$, the control gain matrix $h_k(x)$ and the internal dynamic $g_k(x)$ are assumed to be known and norm-bounded. Moreover, the control gain matrix $h_k(x)$ is nonsingular, i.e., $h_k(x)$ is invertible.
Similar to [42], the sliding mode variable is defined as
\[ s = G x_I + x \tag{2} \]
where $x_I = \left[\int_0^t x_1(\tau)\,d\tau,\ \int_0^t x_2(\tau)\,d\tau,\ \ldots,\ \int_0^t x_n(\tau)\,d\tau\right]^T \in \mathbb{R}^n$, and $G \in \mathbb{R}^{n \times n}$ is a diagonal positive-definite matrix. Furthermore, the time derivative of the sliding mode variable $s$ is obtained as
\[ \dot{s} = G x + \dot{x} \tag{3} \]
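As a rough numerical illustration of (2), the sketch below approximates the integral state $x_I$ with the trapezoidal rule over uniformly sampled states and then forms $s = G x_I + x$. The function name `sliding_variable`, the sampling step `dt` and the constant test trajectory are assumptions made only for this example; the diagonal entries of `G` are borrowed from the simulation section.

```python
import numpy as np

def sliding_variable(x_hist, dt, G):
    """Approximate s(t) = G @ x_I(t) + x(t) at the last sample of x_hist.

    x_hist: array of shape (N, n) holding samples of x on a uniform grid
            with step dt, starting at time 0 (hypothetical data layout).
    G:      (n, n) diagonal positive-definite matrix.
    """
    x_hist = np.asarray(x_hist, dtype=float)
    # Trapezoidal rule for x_I = integral of x from 0 to t, per component.
    x_int = dt * (x_hist[:-1] + x_hist[1:]).sum(axis=0) / 2.0
    return G @ x_int + x_hist[-1]

# Constant state x = [1, 2] held for 1 s, so x_I = [1, 2].
G = np.diag([6.5, 9.55])
x_hist = np.tile([1.0, 2.0], (101, 1))
s = sliding_variable(x_hist, dt=0.01, G=G)
```

For a constant state the integral is exact, so the result here is simply $G x + x$ evaluated component-wise.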
Definition 1. [14] A switching signal $\sigma(t)$ has an ADT $\beta_a \in \mathbb{R}^+$ if there exists a positive constant $\Gamma_0 \in \mathbb{R}^+$ such that
\[ \Gamma_\sigma(T, t) \le \Gamma_0 + \frac{T - t}{\beta_a}, \quad \forall\, T \ge t \ge 0, \]
where $\Gamma_\sigma(T, t)$ is the number of switches occurring in the interval $[t, T)$.
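To make the ADT condition concrete, the following sketch tests a finite list of switching instants against the bound $\Gamma_\sigma(T,t) \le \Gamma_0 + (T-t)/\beta_a$. It only inspects the tightest windows, namely those that start at one switching instant and end just after a later one. The function name `satisfies_adt` and the sample switching times are hypothetical.

```python
def satisfies_adt(switch_times, beta_a, gamma_0):
    """Check the average-dwell-time bound on a finite switching sequence.

    A window [t_i, T) with T just after t_j contains j - i + 1 switches
    and has length arbitrarily close to t_j - t_i, so these are the
    windows in which the bound is hardest to meet.
    """
    ts = sorted(switch_times)
    for i in range(len(ts)):
        for j in range(i, len(ts)):
            count = j - i + 1
            # Small tolerance guards against floating-point round-off.
            if count > gamma_0 + (ts[j] - ts[i]) / beta_a + 1e-12:
                return False
    return True

# Switches once per second: fine for beta_a = 1, too fast for beta_a = 2.
slow_enough = satisfies_adt([1.0, 2.0, 3.0, 4.0], beta_a=1.0, gamma_0=1.0)
too_fast = satisfies_adt([1.0, 2.0, 3.0, 4.0], beta_a=2.0, gamma_0=1.0)
```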
For any continuous and smooth function $F(v)$ defined on the set $\Xi_v \subset \mathbb{R}$, there exists an NN $(W^*)^T \Phi(v)$ satisfying [15]
\[ F(v) = (W^*)^T \Phi(v) + \delta(v) \]
and
\[ W^* = \arg\min_{W \in \mathbb{R}^m} \left[ \sup_{v \in \Xi_v} \left| F(v) - W^T \Phi(v) \right| \right] \]
where $\delta(v)$ is the estimation error and $m$ is the number of neural nodes. $W^* = [W_1^*, W_2^*, \ldots, W_m^*]^T$ represents the ideal weight vector. Besides, $\Phi(v) = [\Phi_1(v), \Phi_2(v), \ldots, \Phi_m(v)]^T$ denotes the basis function vector with the Gaussian function
\[ \Phi_i(v) = \exp\left[ -\frac{(v - \zeta_i)^T (v - \zeta_i)}{\rho_i^2} \right], \quad i = 1, 2, \ldots, m \]
where $\zeta_i$ and $\rho_i$ stand for the center and width of the Gaussian function, respectively.
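The Gaussian basis vector can be evaluated as in the minimal sketch below. The helper name `gaussian_basis` is an assumption, and the five centers and common width are illustrative values chosen to resemble the simulation section rather than a statement of the authors' implementation.

```python
import numpy as np

def gaussian_basis(v, centers, widths):
    """Return [Phi_1(v), ..., Phi_m(v)] for Gaussian radial basis functions.

    v:       scalar or 1-D input vector.
    centers: m centers, each a scalar or a vector of the same size as v.
    widths:  m positive widths.
    """
    v = np.atleast_1d(np.asarray(v, dtype=float))
    widths = np.asarray(widths, dtype=float)
    centers = np.asarray(centers, dtype=float).reshape(len(widths), -1)
    diff = v[None, :] - centers                   # shape (m, dim)
    sq_dist = np.sum(diff * diff, axis=1)         # (v - c_i)^T (v - c_i)
    return np.exp(-sq_dist / widths ** 2)

# Five nodes with illustrative centers and a common width of 2.
phi = gaussian_basis(0.0, centers=[-2, -1, 0, 1, 2], widths=[2] * 5)
```

Each entry lies in $(0, 1]$ and equals 1 only when the input coincides with the corresponding center.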
\[ N_1 N_2 \le \frac{\rho^{a_1}}{a_1} |N_1|^{a_1} + \frac{1}{a_2 \rho^{a_2}} |N_2|^{a_2} \]
where $\rho > 0$, $a_1 > 1$, $a_2 > 1$ and $(a_1 - 1)(a_2 - 1) = 1$.
Define an infinite-horizon integral function as the performance index with respect to the sliding mode surface $s$ as follows
\[ J_k(s) = \int_t^{\infty} \ell_k(s(\tau), u_k(\tau))\, d\tau \tag{4} \]
where $\ell_k(s(t), u_k(t)) = s^T(t) Q_k s(t) + u_k^T(t) R_k u_k(t)$, $k \in \Xi_\sigma$, is a local cost function with $\ell_k(0, 0) = 0$, and $Q_k \in \mathbb{R}^{n \times n}$ and $R_k \in \mathbb{R}^{n \times n}$ are positive-definite matrices.
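The local cost and a finite-horizon approximation of the performance index (4) can be evaluated numerically as sketched below. The names `local_cost` and `truncated_cost`, the truncation of the infinite horizon, and the constant test trajectory are assumptions for illustration; the weighting matrices reuse the mode-1 values from the simulation section.

```python
import numpy as np

def local_cost(s, u, Q, R):
    """Quadratic local cost l(s, u) = s^T Q s + u^T R u."""
    s = np.asarray(s, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(s @ Q @ s + u @ R @ u)

def truncated_cost(s_traj, u_traj, Q, R, dt):
    """Trapezoidal approximation of the integral of l over sampled data.

    The infinite horizon in (4) is cut off at the end of the samples,
    which is only reasonable once s and u have (nearly) converged.
    """
    costs = np.array([local_cost(s, u, Q, R) for s, u in zip(s_traj, u_traj)])
    return float(dt * (costs[:-1] + costs[1:]).sum() / 2.0)

Q1 = np.diag([5.36, 9.8])
R1 = np.diag([9.6, 8.0])
c = local_cost([1.0, 0.0], [0.0, 1.0], Q1, R1)       # 5.36 + 8.0
J = truncated_cost([[1.0, 0.0]] * 101, [[0.0, 1.0]] * 101, Q1, R1, dt=0.01)
```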
Definition 2. [26] A control protocol $u_k$, $k \in \Xi_\sigma$, is said to be admissible on the compact set $\phi_k$, denoted by $u_k \in N_k(\phi_k)$, if it is continuous with $u_k(0) = 0$ and $J_k(s)$ is finite on $\phi_k$.
Remark 1. From (9), the gradient term $\nabla J_k^*(s)$ can be obtained. Then, the optimal control protocol $u_k^*$ follows by combining it with Eq. (8). However, the gradient term $\nabla J_k^*(s)$ is difficult to handle because it is unknown and strongly nonlinear. To solve this issue, the NN-based actor-critic RL method is considered in this paper. Besides, the block diagram of the proposed AC optimal control scheme is shown in Fig. 1.
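By applying the NNs, the optimal performance index (6) can be approximated as
\[ J_k^*(s) = W_k^{*T} \Phi_k(s) + \delta_k(s) \tag{10} \]
and its gradient with respect to $s$ is
\[ \nabla J_k^*(s) = \nabla\Phi_k^T(s) W_k^* + \nabla\delta_k(s) \tag{11} \]
where $\nabla\Phi_k(s) \in \mathbb{R}^{m \times n}$ and $\nabla\delta_k(s) \in \mathbb{R}^{n \times 1}$ are the gradients of $\Phi_k(s)$ and $\delta_k(s)$, respectively.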
Substituting (11) into (8), we have
\[ u_k^* = -\frac{1}{2} R_k^{-1} h_k^T(x) \nabla\Phi_k^T(s) W_k^* - \frac{1}{2} R_k^{-1} h_k^T(x) \nabla\delta_k(s) \tag{12} \]
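Since $J_k^*(s)$ and $u_k^*$ are unknown, they are not available for control design. Based on the NN approximation, (10) is approximated as
\[ \hat{J}_k(s) = \hat{W}_{c_k}^T \Phi_k(s) \tag{13} \]
where $\hat{W}_{c_k} \in \mathbb{R}^m$ is the critic NN weight vector, and the control policy is taken as
\[ u_k = -\gamma_k R_k^{-1} h_k^T(x) s(t) - h_k^{-1}(x) G x - \frac{1}{2} R_k^{-1} h_k^T(x) \nabla\Phi_k^T(s) \hat{W}_{a_k} \tag{14} \]
where $\gamma_k > 0$, $k \in \Xi_\sigma$, is a design parameter, and $\hat{W}_{a_k} \in \mathbb{R}^m$ is the actor NN weight vector, whose updating law will be given later.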
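A minimal sketch of evaluating a control policy of this form is given below, assuming it reads $u_k = -\gamma_k R_k^{-1} h_k^T s - h_k^{-1} G x - \tfrac12 R_k^{-1} h_k^T \nabla\Phi_k^T \hat W_{a_k}$, i.e. a sliding-variable feedback term, a term cancelling $Gx$, and the learned actor term. The function name `control_policy`, the argument layout and the toy numbers are hypothetical.

```python
import numpy as np

def control_policy(x, s, h, G, R, grad_phi, w_actor, gamma):
    """Evaluate u = -gamma R^{-1} h^T s - h^{-1} G x - 0.5 R^{-1} h^T grad_phi^T w.

    h, G, R:  (n, n) matrices, with h and R invertible.
    grad_phi: (m, n) gradient of the basis vector with respect to s.
    w_actor:  (m,) actor weight estimate.
    """
    R_inv_hT = np.linalg.solve(R, h.T)                 # R^{-1} h^T
    feedback = -gamma * R_inv_hT @ s                   # sliding-variable feedback
    cancel = -np.linalg.solve(h, G @ x)                # -h^{-1} G x
    learned = -0.5 * R_inv_hT @ grad_phi.T @ w_actor   # actor NN contribution
    return feedback + cancel + learned

n, m = 2, 3
u = control_policy(
    x=np.array([1.0, 1.0]), s=np.array([2.0, 0.0]),
    h=np.eye(n), G=np.diag([1.0, 2.0]), R=np.eye(n),
    grad_phi=np.zeros((m, n)), w_actor=np.ones(m), gamma=0.5,
)
```

With a zero basis gradient the actor term vanishes, so only the feedback and cancellation terms contribute in this toy call.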
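where $\omega_k(t) = \frac{1}{4} W_k^{*T} \nabla\Phi_k(s) \Im_k(x) \nabla\delta_k(s) + \frac{1}{4} \nabla\delta_k^T(s) \Im_k(x) \nabla\Phi_k^T(s) W_k^* + W_k^{*T} \nabla\Phi_k(s) G x + \frac{1}{4} \nabla\delta_k^T(s) \aleph_k(x) \nabla\delta_k(s) + \nabla\delta_k^T(s) \left( G x + g_k(x) + h_k(x) u_k \right) \in \mathbb{R}$ with $\Im_k(x) = h_k(x) R_k^{-1} h_k^T(x) \in \mathbb{R}^{n \times n}$ and $\aleph_k(x) = G^T h_k^{-T} R_k h_k^{-1} G \in \mathbb{R}^{n \times n}$, and $\wp_k(x, s) = \nabla\Phi_k(s) \Im_k(x) \nabla\Phi_k^T(s) \in \mathbb{R}^{m \times m}$.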
Remark 2. The function $\omega_k(t)$ contains all terms related to the NN approximation errors of (15). To alleviate the influence of repeated magnifications caused by the errors, in this paper, $\omega_k(t)$ is assumed to be bounded.
The gradient of $\hat{J}_k(s)$ in (13) can be written as
\[ \nabla\hat{J}_k(s) = \nabla\Phi_k^T(s) \hat{W}_{c_k} \tag{16} \]
Substituting (14) and (16) into (7), one has
h i
Hk s; uk ; r^J k ðsÞ ¼ sT ðtÞQ k sðtÞ þ ck ck sT ðtÞ þ 12 W^T W
ak
^ T rUk ðsÞ @k ðxÞsðtÞ
ck
1 1
þ ck sT ðt Þ þ xT GT hk ðxÞRk hk ðxÞ Gx þ 12 ck sT ðtÞIk ðxÞ ð17Þ
þxT GT rUTk ðsÞ W ca 1 W ^ T 1W ^ T } ðx; sÞ W ca
k 2 ck 2 ak k k
Based on (18), the square of the Bellman residual error is defined as
\[ E_B(t) \triangleq \frac{1}{2} H_k^2(t) \tag{19} \]
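To minimize the Bellman residual error, the critic updating law $\dot{\hat{W}}_{c_k}$ is designed by using the gradient descent method and is presented as follows
\[ \dot{\hat{W}}_{c_k} = -\frac{\eta_{c_k}}{2} e_k(t) \left[ s^T(t) Q_k s(t) + 3 e_k^T(t) \hat{W}_{c_k} + e_k^T(t) \nabla\Phi_k(s) \aleph_k(x) \nabla\Phi_k^T(s) \hat{W}_{a_k} + \frac{1}{4} \hat{W}_{a_k}^T \wp_k(x, s) \hat{W}_{a_k} \right] \tag{20} \]
where $e_k(t) = \nabla\Phi_k(s) \left[ g_k(x) - \gamma_k \Im_k(x) s(t) - \frac{1}{2} \Im_k(x) \nabla\Phi_k^T(s) \hat{W}_{a_k} \right] \in \mathbb{R}^{m \times 1}$, $k \in \Xi_\sigma$, and $\eta_{c_k} > 0$ is a design parameter.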
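Assumption 3. [32] Persistence of excitation (PE): Within the interval $[t, t + \Delta t]$ with $\Delta t > 0$, there exist positive constants $\underline{\varphi}_k$, $\bar{\varphi}_k$, $k \in \Xi_\sigma$, such that $e_k(t) e_k^T(t)$ satisfies
\[ \underline{\varphi}_k I_m \le e_k(t) e_k^T(t) \le \bar{\varphi}_k I_m \]
where $I_m \in \mathbb{R}^{m \times m}$ is an identity matrix.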
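Besides, the actor updating law $\dot{\hat{W}}_{a_k}$ is designed as
\[ \dot{\hat{W}}_{a_k} = -\frac{\eta_{c_k} \bar{\varphi}_k}{2} \nabla\Phi_k(s) \aleph_k(x) \nabla\Phi_k^T(s) \hat{W}_{c_k} + \frac{\eta_{c_k}}{8} e_k(t) \hat{W}_{a_k}^T \wp_k(x, s) \hat{W}_{c_k} + \frac{1}{2} \nabla\Phi_k(s) \Im_k^T(x) s(t) - \eta_{a_k} \hat{W}_{a_k} \tag{21} \]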
Remark 3. To minimize $H_k(t)$, the squared term $E_B(t) \triangleq \frac{1}{2} H_k^2(t)$ is required to satisfy $\dot{E}_B(t) \le 0$, which drives the control policies toward the optimal one. To this end, the critic updating law (20) is designed via the gradient descent method so that $\dot{E}_B(t) \le 0$ is satisfied.
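The idea in Remark 3 can be illustrated with a deliberately simplified sketch: if a residual is linear in the critic weights, $H = e^T W + c$ with a fixed regressor $e$, then a gradient step on $E = \tfrac12 H^2$ moves $W$ along $-H e$ and $E$ does not increase for a sufficiently small step. This is not the updating law (20); the names `critic_step`, `rate`, `dt` and all numbers are assumptions.

```python
import numpy as np

def critic_step(w, e, c, rate, dt):
    """One explicit-Euler gradient-descent step on E = 0.5 * H**2.

    H = e @ w + c is a residual that is linear in the weights w, so the
    gradient of E with respect to w is H * e.
    """
    H = e @ w + c
    return w - dt * rate * H * e, H

w = np.zeros(3)
e = np.array([1.0, -0.5, 0.25])      # fixed regressor (hypothetical)
c = 2.0                              # weight-independent part of the residual
residual_energy = []
for _ in range(200):
    w, H = critic_step(w, e, c, rate=0.5, dt=0.1)
    residual_energy.append(0.5 * H * H)
```

Each step scales the residual by $1 - \text{dt}\cdot\text{rate}\cdot\|e\|^2$, so the recorded energy decreases monotonically for this choice of step size.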
In this section, the main results and stability analysis of this paper will be summarized in the following theorem. For the
considered system (1), a detailed process to prove the validity of Theorem 1 will be presented.
Theorem 1. Consider the switched nonlinear system (1) under Assumptions 1–3. With the control policy (14), the critic updating law (20), the actor updating law (21) and the sliding mode surface (2), the developed adaptive optimal control scheme ensures that all signals of system (1) are bounded for every switching signal with ADT $\beta_a > \log\varsigma / (\lambda - \xi)$.
Remark 4. In modern control theory, Lyapunov's second method is a main tool for studying system stability. However, there is no general method for constructing a Lyapunov function, and thus its construction has to be analyzed for each practical application. It is worth noting that, based on Lyapunov stability theory, the Lyapunov function is usually selected as a positive-definite quadratic function; control laws are then designed to ensure that its time derivative is negative definite, so that the considered practical system is stable.
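Combining (1) and (3), the time derivative of (22) is given as follows
\[ \begin{aligned} \dot{V}_k(t) &= s^T(t) \dot{s}(t) - \tilde{W}_{c_k}^T \dot{\hat{W}}_{c_k} - \tilde{W}_{a_k}^T \dot{\hat{W}}_{a_k} \\ &= s^T(t) \left( G x + g_k(x) + h_k(x) u_k(t) \right) - \tilde{W}_{c_k}^T \dot{\hat{W}}_{c_k} - \tilde{W}_{a_k}^T \dot{\hat{W}}_{a_k} \end{aligned} \tag{23} \]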
Substituting the optimal controller (14) and the critic updating law (20) into (23), one has
h
V_ k ðt Þ ¼ sT ðt Þ Gx þ g k ðxÞ þ hk ðxÞ ck R1
T 1
k hk ðxÞsðt Þ hk ðxÞGx
i h
12 R1
T c ~ T gck ek ðtÞ sT ðt ÞQ k sðt Þ
k hk ðxÞrUk ðsÞ W ak
T
W ck 2
ca
þeTk ðt ÞrUk ðsÞ@k ðxÞrUTk ðsÞ W k
i
1 ^ T c ~ T ^_
þ 4 W ak }k ðx; sÞ W ak W ak W ak
ð26Þ
ca
þeTk ðt ÞrUk ðsÞ@k ðxÞrUTk ðsÞ W k
i
^ T } ðx; sÞ W
þ 14 W ca W~T W ^_ a þ 1 kg ðxÞk2
ak k k ak k 2 k
ca
6 sT ðtÞ ck Ik ðxÞ 12 sðtÞ 12 sT ðt ÞIk ðxÞrUTk ðsÞ W ð28Þ
k
gc T gc T
k ~ T k ~ T
þ 8 W ck ek ðtÞ W k }k ðx; sÞW k 2 W ck ek ðt Þ W k rUk ðsÞg k ðxÞ
gc
k ~ T ek ðtÞxk ðt Þ þ 3gck W
W ~ T ek ðtÞeT ðt Þ W
cc
2 ck 2 ck k k
gc
þ k ~ T ek ðtÞeT ðt ÞrUk ðsÞ@k ðxÞrUT ðsÞ W
W ca
2 ck k k k
þ
gc
k ~ T e k ðt ÞW
W ^ T } ðx; sÞ W
ca W
~T W ^_ a þ 1 kg ðxÞk2
8 ck ak k k ak k 2 k
3gc uk
k ~
6 2 W ck W ck W ck
T
ð30Þ
k
3g c u k
3gc u 2
6 k ~ T Wc þ
W k
W ck
4 ck k 4
gc T gc T
k ~ T k ~ T
þ 8 W ck ek ðtÞ W k }k ðx; sÞW k 2 W ck ek ðt Þ W k rUk ðsÞg k ðxÞ
k
5gc u
~ T W c þ gck W
W ~ T ek ðt ÞeT ðtÞrUk ðsÞ@k ðxÞrUT ðsÞ W
ca
ð31Þ
k
8 ck k 2 ck k k k
þ
gc
k ~ T e k ðt ÞW
W ^ T } ðx; sÞ W
ca ~T W
W ^_ a þ gck jx
k j2
8 ck ak k k ak k 2
k
3gc u 2
þ 12 kg k ðxÞk2 þ k
4
W ck
gck ~ T T k
gc u g 2
W ck ek ðt Þ W k rUk ðsÞg k ðxÞ 6 k W ~TW ~ c þ ck W T rUk ðsÞg ðxÞ ð33Þ
ck k k
2 k
8 2
into (31) results in
V_ k ðt Þ 6 sT ðtÞ ck Ik ðxÞ 12 sðtÞ 12 sT ðt ÞIk ðxÞrUTk ðsÞ W
ca
k
k
3gc u
k ~ T W c þ gck W
W ~ T ek ðt ÞeT ðtÞrUk ðsÞ@k ðxÞrUT ðsÞ W
ca
8 ck k 2 ck k k k
þ
gc
k ~ T e k ðt ÞW
W ^ T } ðx; sÞ W
ca W
~T W ^_ a þ 1 kg ðxÞk2 þ gck jx
k j2
8 ck ak k k ak k 2 k 2 ð34Þ
k
3gc u 2 gc
T
2
þ k
4
W ck þ 32
k
W k }k ðx; sÞW k
gc T 2
þ 2
k
W k rUk ðsÞg k ðxÞ
^_ a þ 1 IT ðxÞrUT ðsÞW
gc 2
þ k ~ T e k ðt ÞW
W ^ T } ðx; sÞ W
ca W
~T W
8 ck ak k k ak k 2 k k ak ð36Þ
gc k
3gc u 2 gc T 2
þ 12 kg k ðxÞk2 þ 2
k kj þ
jx 2 k
4
W ck þ 32
k
W k }k ðx; sÞW k
gc
T
2
þ 2
k
W k rUk ðsÞg k ðxÞ
2 ð37Þ
þ 12 ITk ðxÞrUTk ðsÞW ak
gc k
3gc u 2 gc T 2
þ 12 kg k ðxÞk2 þ 2
k k j2 þ
jx k
4
W ck þ 32
k
W k }k ðx; sÞW k
gc T 2
þ 2
k
W k rUk ðsÞg k ðxÞ
k
6 ~ T rUk ðsÞ@k ðxÞrUT ðsÞW a þ gck u k W
W
gc u
k ~ T rUk ðsÞ@k ðxÞrUT ðsÞW
ck 2 k k 2 ck k ak
k
gc u
T k
gc u
6 k2 W ck rUk ðsÞ@k ðxÞrUk ðsÞW ak þ k2 W
T ~ T rUk ðsÞ@k ðxÞrU ðsÞW
T
ck k ak
k
gc u
^ T rUk ðsÞ@k ðxÞrUT ðsÞW a
ð38Þ
þ k
2
W ck k k
k 2
6
gc u
k ~ T W c þ gck u k W
W ~ T W a þ gck u k rUk ðsÞ@k ðxÞrUT ðsÞW
8 ck k 4 ak k 2 k ak
k
gc u
T 2
k
gc u
þ k
W ck rUk ðsÞ@k ðxÞrUTk ðsÞ þ k ^ T rUk ðsÞ@k ðxÞrUT ðsÞW a
W
4 2 ck k k
k
gc u
k ~ T W c þ gck u k W
W ~ T W a gck W
~ T ek ðt ÞW
^ T } ðx; sÞ W
cc
4 ck k 4 ak k 8 ak ak k k
2
~T W
þgak W c a þ gck jx
k j2 þ 12 kg k ðxÞk2 þ 12 ITk ðxÞrUTk ðsÞW a
ak k 2 k
gc T 2 gc T 2 ð39Þ
þ 32k W k }k ðx; sÞW k þ 2
k
W k rUk ðsÞg k ðxÞ
k
3gc u 2 k
gc u 2
þ k
4
W ck þ k
2
rUk ðsÞ@k ðxÞrUTk ðsÞW ak
k
gc u
T 2
þ k
4
W ck rUk ðsÞ@k ðxÞrUTk ðsÞ
gc
T g
8k W~ T ek ðtÞ W } ðx; sÞW a ck W ~ T ek ðt ÞW
~ T } ðx; sÞW
ck ak k k 8 ck ak k ak
gc
þ k
W ~ T } ðx; sÞW a
~ T ek ðtÞW
8 ck ak k k
k
6
gc u
k
W~ T W c þ gck u k W
~ T W a þ gck W^ T ek ðt ÞW
^ T } ðx; sÞW a
8 ck k 16 ak k 8 ck ak k k
gc
T 2
gc
T
þ 16k
W ak }k ðx; sÞW ak þ 16k W ~ T } ðx; sÞW W
}Tk ðx; sÞW ak
ak k ak ak
ð40Þ
gc
T 2
gc
T
þ 16k W ak }k ðx; sÞW ck ~ T } ðx; sÞW W }T ðx; sÞW a
þ 16k W ak k ck ck k k
k
6 W~ T W c þ gck u k W
gc u
k ~ T W a þ gck W ~ T ðAk þ Bk ÞW a
ck 8k 16 ak k 16 ak k
gc g
T 2
^ T ek ðtÞW
þ 8k W ^ T } ðx; sÞW a þ k W c
}k ðx; sÞW ak
ck ak k k 16 ak
gc
T 2
þ 16k W ak }k ðx; sÞW ck
where $A_k = \wp_k(x, s) W_{a_k}^* W_{a_k}^{*T} \wp_k^T(x, s) \in \mathbb{R}^{m \times m}$ and $B_k = \wp_k(x, s) W_{c_k}^* W_{c_k}^{*T} \wp_k^T(x, s) \in \mathbb{R}^{m \times m}$.
It can be readily verified from (39) and (40) that
k
gc u
V_ k ðt Þ 6 sT ðtÞ ck Ik ðxÞ 58 sðtÞ k8 W ~ T W c þ 3gck u k W
~ T Wa
ck k 8 ak k
gc T 2
~ T ðAk þ Bk ÞW a þ g W
þ 16k W ~T W c a þ gck W } ðx; sÞW
ak k ak ak k 16 ak k ak
gc T 2 k
gc u 2
þ 2
k
W k rUk ðsÞg k ðxÞ þ k
2
rUk ðsÞ@k ðxÞrUTk ðsÞW ak
2 gc
T 2 ð41Þ
þ 12 ITk ðxÞrUTk ðsÞW ak þ 12 kg k ðxÞk2 þ 16
k
W ak }k ðx; sÞW ck
gc k
3gc u 2 gc T 2
þ 2
k k j2 þ
jx k
4
W ck þ 32
k
W k }k ðx; sÞW k
k
gc u
T 2
þ k
4
W ck rUk ðsÞ@k ðxÞrUTk ðsÞ
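Substituting
\[ \eta_{a_k} \tilde{W}_{a_k}^T \hat{W}_{a_k} \le -\frac{\eta_{a_k}}{2} \tilde{W}_{a_k}^T \tilde{W}_{a_k} + \frac{\eta_{a_k}}{2} \left\| W_{a_k}^* \right\|^2 \tag{42} \]
into (41) yields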
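\[ \begin{aligned} \dot{V}_k(t) \le{} & -s^T(t) \left( \gamma_k \Im_k(x) - \frac{5}{8} \right) s(t) - \frac{\eta_{c_k} \underline{\varphi}_k}{8} \tilde{W}_{c_k}^T \tilde{W}_{c_k} \\ & - \frac{\eta_{c_k}}{2} \left[ 1 - \frac{3 \bar{\varphi}_k}{4} - \frac{1}{8} \lambda_{\min}(A_k + B_k) \right] \tilde{W}_{a_k}^T \tilde{W}_{a_k} + \theta_k(t) \end{aligned} \tag{43} \]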
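where $\lambda_{\min}(A_k + B_k)$ is the minimum eigenvalue of the matrix $(A_k + B_k)$, and
\[ \begin{aligned} \theta_k(t) ={} & \frac{\eta_{c_k} \bar{\varphi}_k}{2} \left\| \nabla\Phi_k(s) \aleph_k(x) \nabla\Phi_k^T(s) W_{a_k}^* \right\|^2 + \frac{\eta_{c_k} \bar{\varphi}_k}{4} \left\| W_{c_k}^{*T} \nabla\Phi_k(s) \aleph_k(x) \nabla\Phi_k^T(s) \right\|^2 \\ & + \frac{\eta_{a_k}}{2} \left\| W_{a_k}^* \right\|^2 + \frac{\eta_{c_k}}{16} \left\| W_{a_k}^{*T} \wp_k(x, s) W_{a_k}^* \right\|^2 + \frac{1}{2} \left\| \Im_k^T(x) \nabla\Phi_k^T(s) W_{a_k}^* \right\|^2 \\ & + \frac{1}{2} \left\| g_k(x) \right\|^2 + \frac{\eta_{c_k}}{16} \left\| W_{a_k}^{*T} \wp_k(x, s) W_{c_k}^* \right\|^2 + \frac{\eta_{c_k}}{2} \left| \bar{\omega}_k \right|^2 + \frac{3 \eta_{c_k} \bar{\varphi}_k}{4} \left\| W_{c_k}^* \right\|^2 \\ & + \frac{\eta_{c_k}}{32} \left\| W_k^{*T} \wp_k(x, s) W_k^* \right\|^2 + \frac{\eta_{c_k}}{2} \left\| W_k^{*T} \nabla\Phi_k(s) g_k(x) \right\|^2 \end{aligned} \]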
Let
\[ \lambda_{1k} = 2 \gamma_k \lambda_{\min}(\Im_k(x)) - \frac{5}{4} \tag{44} \]
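where $\lambda_{\min}(\Im_k(x))$ is the minimum eigenvalue of the matrix $\Im_k(x)$, and
\[ \lambda_{2k} = \frac{1}{4} \eta_{c_k} \underline{\varphi}_k \tag{45} \]
\[ \lambda_{3k} = \eta_{c_k} \left( 1 - \frac{3 \bar{\varphi}_k}{4} - \frac{1}{8} \lambda_{\min}(A_k + B_k) \right) \tag{46} \]
\[ \varsigma = \max_{k \in \Xi} \frac{\lambda_{\max}(\Im_k)}{\lambda_{\min}(\Im_k)} \tag{47} \]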
where $\lambda_{\min}(\Im_k)$ and $\lambda_{\max}(\Im_k)$ are the minimum and maximum eigenvalues of the matrix $\Im_k$, respectively. To guarantee the stability of the considered closed-loop system, the parameters $\gamma_k$ and $\bar{\varphi}_k$, $k \in \Xi_\sigma$, are required to satisfy $\gamma_k > 5 / (8 \lambda_{\min}(\Im_k(x)))$ and $\bar{\varphi}_k < (8 - \lambda_{\min}(A_k + B_k)) / 6$, respectively.
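The two parameter conditions above can be checked numerically for given matrices. The sketch below assumes the gain matrix enters through $h_k R_k^{-1} h_k^T$ evaluated at a single operating point and that the matrix $A_k + B_k$ is supplied directly; the function name `gain_conditions` and the test matrices are hypothetical.

```python
import numpy as np

def gain_conditions(h, R, A_plus_B, gamma, phi_bar):
    """Return (gamma_ok, phi_ok) for the two stability conditions.

    gamma_ok: gamma   > 5 / (8 * lambda_min(h R^{-1} h^T))
    phi_ok:   phi_bar < (8 - lambda_min(A + B)) / 6
    Both matrices are symmetrised before the eigenvalue computation.
    """
    I_k = h @ np.linalg.solve(R, h.T)
    lam_I = np.linalg.eigvalsh((I_k + I_k.T) / 2.0).min()
    lam_AB = np.linalg.eigvalsh((A_plus_B + A_plus_B.T) / 2.0).min()
    return bool(gamma > 5.0 / (8.0 * lam_I)), bool(phi_bar < (8.0 - lam_AB) / 6.0)

h = np.eye(2)
R = np.diag([0.5, 0.25])             # gives h R^{-1} h^T = diag(2, 4)
AB = np.diag([2.0, 5.0])             # stand-in for A_k + B_k
ok_small = gain_conditions(h, R, AB, gamma=0.5, phi_bar=0.13)
ok_large = gain_conditions(h, R, AB, gamma=0.5, phi_bar=4.3)
```

In this toy setting the thresholds are $\gamma > 0.3125$ and $\bar\varphi < 1$, so the second call fails the excitation-bound condition.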
With regard to (44), (45) and (46), and taking (43) into account, the following inequality is obtained
\[ \dot{V}_k \le -\lambda_{1k} s^T(t) s(t) - \lambda_{2k} \tilde{W}_{c_k}^T \tilde{W}_{c_k} - \lambda_{3k} \tilde{W}_{a_k}^T \tilde{W}_{a_k} + \theta_k(t) \tag{48} \]
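Furthermore, we have
\[ \dot{V}_k(t) \le -\lambda V_k(t) + \bar{\theta} \tag{49} \]
where
\[ \lambda = \min_{k \in \Xi} \left\{ \lambda_{1k}, \lambda_{2k}, \lambda_{3k} \right\} \]
and
\[ \bar{\theta} = \sup_{k \in \Xi} \left\{ \theta_k \right\} \]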
Remark 5. Compared with some general optimal control schemes for nonlinear systems, the control scheme developed in this paper can also guarantee the optimal control performance. Different from optimal control of nonlinear systems based directly on system states, the proposed scheme provides a faster control response by incorporating the sliding mode technique. Besides, the computational complexity of solving the HJB equation grows rapidly as the dimensionality of the system increases. To cope with this problem, the AC dual-network architecture is used in this paper, under which the curse of dimensionality can be avoided.
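In addition, there exist $\underline{\mu}, \bar{\mu} \in \mathcal{K}_\infty$ such that $\underline{\mu}(\|\Psi\|) \le V_k(\Psi) \le \bar{\mu}(\|\Psi\|)$ with $\Psi = \left[ s, \tilde{W}_{a_k}, \tilde{W}_{c_k} \right]^T$. According to (47), we can deduce that
\[ V_k(\Psi) \le \varsigma V_l(\Psi), \quad \forall k, l \in \Xi \tag{50} \]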
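It is not hard to observe that the function $\Lambda(t) = e^{\lambda t} V_{\sigma(t)}(\Psi(t))$ is piecewise differentiable along solutions of system (1). In view of (49), on each interval $[t_p, t_{p+1})$, we get
\[ \begin{aligned} \dot{\Lambda}(t) &= \lambda e^{\lambda t} V_{\sigma(t)}(\Psi(t)) + e^{\lambda t} \dot{V}_{\sigma(t)}(\Psi(t)) \\ &\le \bar{\theta} e^{\lambda t}, \quad t \in [t_p, t_{p+1}) \end{aligned} \tag{51} \]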
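Recalling the inequality $V_k(\Psi) \le \varsigma V_l(\Psi)$, $\forall k, l \in \Xi$, the following results can be obtained
\[ \begin{aligned} \Lambda(t_{p+1}) &= e^{\lambda t_{p+1}} V_{\sigma(t_{p+1})}(\Psi(t_{p+1})) \\ &\le \varsigma e^{\lambda t_{p+1}} V_{\sigma(t_p)}(\Psi(t_{p+1})) = \varsigma \Lambda(t_{p+1}^-) \\ &\le \varsigma \left[ \Lambda(t_p) + \int_{t_p}^{t_{p+1}} \bar{\theta} e^{\lambda t}\, dt \right] \end{aligned} \tag{52} \]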
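Selecting an arbitrary $T > t_0 = 0$ and iterating (52) from $p = 0$ to $p = \Gamma_\sigma(T, 0) - 1$, one has
\[ \begin{aligned} \Lambda(T^-) &\le \Lambda(t_{\Gamma_\sigma(T,0)}) + \int_{t_{\Gamma_\sigma(T,0)}}^{T} \bar{\theta} e^{\lambda t}\, dt \\ &\le \varsigma \left[ \Lambda(t_{\Gamma_\sigma(T,0)-1}) + \int_{t_{\Gamma_\sigma(T,0)-1}}^{t_{\Gamma_\sigma(T,0)}} \bar{\theta} e^{\lambda t}\, dt + \varsigma^{-1} \int_{t_{\Gamma_\sigma(T,0)}}^{T} \bar{\theta} e^{\lambda t}\, dt \right] \\ &\le \cdots \\ &\le \varsigma^{\Gamma_\sigma(T,0)} \left[ \Lambda(0) + \sum_{p=0}^{\Gamma_\sigma(T,0)-1} \varsigma^{-p} \int_{t_p}^{t_{p+1}} \bar{\theta} e^{\lambda t}\, dt + \varsigma^{-\Gamma_\sigma(T,0)} \int_{t_{\Gamma_\sigma(T,0)}}^{T} \bar{\theta} e^{\lambda t}\, dt \right] \end{aligned} \tag{53} \]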
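Since $\beta_a > \log\varsigma / \lambda$, for any $\xi \in (0, \lambda - \log\varsigma / \beta_a)$, one has $\beta_a > \log\varsigma / (\lambda - \xi)$. According to Definition 1, it holds that
\[ \Gamma_\sigma(T, t) \le \Gamma_0 + \frac{(\lambda - \xi)(T - t)}{\log\varsigma}, \quad \forall\, T \ge t \ge 0 \tag{54} \]
Noticing that $\Gamma_\sigma(T, 0) - p \le 1 + \Gamma_\sigma(T, t_{p+1})$ for $p = 0, 1, \ldots, \Gamma_\sigma(T, 0)$, (54) implies
\[ \varsigma^{\Gamma_\sigma(T,0) - p} \le \varsigma^{1 + \Gamma_0} e^{(\lambda - \xi)(T - t_{p+1})} \tag{55} \]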
Moreover, noting that $\xi < \lambda$, we have
\[ \int_{t_p}^{t_{p+1}} \bar{\theta} e^{\lambda t}\, dt \le e^{(\lambda - \xi) t_{p+1}} \int_{t_p}^{t_{p+1}} \bar{\theta} e^{\xi t}\, dt \tag{56} \]
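This means
\[ \begin{aligned} \underline{\mu}(\|\Psi(T)\|) &\le V_{\sigma(T)}(\Psi(T)) \\ &\le e^{\Gamma_0 \log\varsigma} e^{\left( \frac{\log\varsigma}{\beta_a} - \lambda \right) T} \bar{\mu}(\|\Psi(0)\|) + \varsigma^{1 + \Gamma_0} \frac{\bar{\theta}}{\xi} \left( 1 - e^{-\xi T} \right) \\ &\le e^{\Gamma_0 \log\varsigma} e^{\left( \frac{\log\varsigma}{\beta_a} - \lambda \right) T} \bar{\mu}(\|\Psi(0)\|) + \varsigma^{1 + \Gamma_0} \frac{\bar{\theta}}{\xi}, \quad \forall\, T > 0 \end{aligned} \tag{58} \]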
According to (58) and $\xi > 0$, if $\beta_a > \log\varsigma / (\lambda - \xi)$, then the variables $s$, $\tilde{W}_{c_k}$ and $\tilde{W}_{a_k}$ are bounded for bounded initial values. Furthermore, it can be obtained that $x$, $\hat{W}_{c_k}$ and $\hat{W}_{a_k}$ are also bounded. Thus, we can conclude that all the signals of the considered switched closed-loop system are bounded under an arbitrary switching signal $\sigma(t)$ satisfying the ADT $\beta_a > \log\varsigma / (\lambda - \xi)$.
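For a quick feel of the dwell-time requirement, the helper below evaluates the lower bound $\log\varsigma / (\lambda - \xi)$ for given constants. The function name `adt_lower_bound` and the sample values are assumptions; in the analysis above, $\varsigma$ comes from (47), $\lambda$ from (49), and $\xi$ is any constant in $(0, \lambda)$.

```python
import math

def adt_lower_bound(varsigma, lam, xi):
    """Return log(varsigma) / (lam - xi), the required lower bound on the ADT.

    varsigma is a ratio of a largest to a smallest eigenvalue, so it is
    at least 1; xi must lie strictly between 0 and lam.
    """
    if varsigma < 1.0:
        raise ValueError("varsigma must be >= 1")
    if not 0.0 < xi < lam:
        raise ValueError("xi must lie in (0, lam)")
    return math.log(varsigma) / (lam - xi)

bound = adt_lower_bound(varsigma=math.e, lam=2.0, xi=1.0)
no_restriction = adt_lower_bound(varsigma=1.0, lam=2.0, xi=1.0)
```

When $\varsigma = 1$ the bound is zero, which matches the intuition that a common Lyapunov function places no restriction on the switching rate.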
Remark 6. The obtained optimal control policy (14) can stabilize the switched nonlinear system (1) and minimize the local cost function when the system moves on the sliding mode surface (2). It is worth mentioning that ideal sliding mode control is considered in this paper, which means that the system trajectories remain strictly on the switching surface $s = 0$. However, in practice, due to the chattering phenomenon, the sliding mode motion does not occur strictly on the switching surface $s = 0$; this problem will be addressed in our future work.
Remark 7. It should be mentioned that general non-optimal control approaches can only guarantee that all signals in the
closed-loop system are bounded and convergent, but cannot achieve the optimal control objective, which fails to make
the tradeoff between performance and control cost, while this issue is of great importance for many practical applications.
In this paper, the proposed optimal control method can not only ensure that all signals in the closed-loop system are con-
vergent and bounded, but also guarantee that the control cost is minimal.
5. Simulation study
In this section, to demonstrate the applicability of the developed control scheme, a practical example of two inverted pendulums connected by a spring [48] is presented to show the validity of the proposed SMS-based adaptive AC optimal control method. The dynamic equation of the dual inverted pendulums is given by
\[ \begin{cases} J_1 \ddot{\theta}_1 = \left( M_1 g_0 h - \dfrac{k_0 h^2}{4} \right) \sin(\theta_1) + \dfrac{k_0 h}{2} (l_0 - l) + u_{\sigma(t),1} + \dfrac{k_0 h^2}{4} \sin(\theta_2) \\[2mm] J_2 \ddot{\theta}_2 = \left( M_2 g_0 h - \dfrac{k_0 h^2}{4} \right) \sin(\theta_2) + \dfrac{k_0 h}{2} (l_0 - l) + u_{\sigma(t),2} + \dfrac{k_0 h^2}{4} \sin(\dot{\theta}_1) \end{cases} \tag{59} \]
The structure of the dual inverted pendulums is shown in Fig. 2. $u_{\sigma(t),1}$ and $u_{\sigma(t),2}$, $\sigma(t) \in \{1, 2\}$, denote the control inputs. It should be emphasized that many practical applications exhibit multimodality due to changes in environmental factors, and thus their system models can be described by a switched architecture. In this part, we assume that there are two modes in the dual inverted pendulums system, so the switching signal takes values in $\sigma(t) \in \{1, 2\}$. The parameters of the dual inverted pendulums used in this example and their meanings are provided in Table 2.
Table 2
Parameters used in the two inverted pendulums.
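Define $[x_1, x_3]^T = [\theta_1, \theta_2]^T$ and $[x_2, x_4]^T = [\dot{\theta}_1, \dot{\theta}_2]^T$ as the angular positions and angular rates of the dual inverted pendulums, respectively. Then, the dynamic Eq. (59) can be rewritten as
\[ \begin{cases} \dot{x}_1 = x_2 \\[1mm] \dot{x}_2 = \left( \dfrac{M_1 g_0 h}{J_1} - \dfrac{k_0 h^2}{4 J_1} \right) \sin(x_1) + \dfrac{k_0 h}{2 J_1} (l_0 - l) + \dfrac{u_{\sigma(t),1}}{J_1} + \dfrac{k_0 h^2}{4 J_1} \sin(x_3) \\[2mm] \dot{x}_3 = x_4 \\[1mm] \dot{x}_4 = \left( \dfrac{M_2 g_0 h}{J_2} - \dfrac{k_0 h^2}{4 J_2} \right) \sin(x_3) + \dfrac{k_0 h}{2 J_2} (l_0 - l) + \dfrac{u_{\sigma(t),2}}{J_2} + \dfrac{k_0 h^2}{4 J_2} \sin(x_2) \end{cases} \tag{60} \]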
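In view of (60) and the system (1), we can get that $h_{\sigma(t)}(x) = \mathrm{diag}\left\{ x_2, J_1^{-1}, x_4, J_2^{-1} \right\}$ and
\[ g_{\sigma(t)}(x) = \begin{bmatrix} 0 \\[1mm] \left( \dfrac{M_1 g_0 h}{J_1} - \dfrac{k_0 h^2}{4 J_1} \right) \sin(x_1) + \dfrac{k_0 h}{2 J_1} (l_0 - l) + \dfrac{k_0 h^2}{4 J_1} \sin(x_3) \\[2mm] 0 \\[1mm] \left( \dfrac{M_2 g_0 h}{J_2} - \dfrac{k_0 h^2}{4 J_2} \right) \sin(x_3) + \dfrac{k_0 h}{2 J_2} (l_0 - l) + \dfrac{k_0 h^2}{4 J_2} \sin(x_2) \end{bmatrix}. \]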
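For readers who want to experiment, the state-space form of the pendulum model can be coded as a right-hand-side function like the sketch below. It follows the structure of (60) as read here, including the coupling term in the last equation as it is printed. All parameter values in `params` are hypothetical placeholders, since the entries of Table 2 are not reproduced in this text, and the function name `pendulum_rhs` is likewise an assumption.

```python
import numpy as np

def pendulum_rhs(x, u, p):
    """Right-hand side of the two-pendulum model in first-order form.

    x = [x1, x2, x3, x4]: angles (x1, x3) and angular rates (x2, x4).
    u = [u1, u2]: torques of the active mode.
    p: dict with keys M1, M2, J1, J2, k0, h, l0, l, g0.
    """
    x1, x2, x3, x4 = x
    spring = p["k0"] * p["h"] ** 2 / 4.0                  # k0 h^2 / 4
    offset = p["k0"] * p["h"] / 2.0 * (p["l0"] - p["l"])  # k0 h (l0 - l) / 2
    dx2 = ((p["M1"] * p["g0"] * p["h"] - spring) * np.sin(x1)
           + offset + u[0] + spring * np.sin(x3)) / p["J1"]
    # The last coupling term uses x2, following the equation as printed.
    dx4 = ((p["M2"] * p["g0"] * p["h"] - spring) * np.sin(x3)
           + offset + u[1] + spring * np.sin(x2)) / p["J2"]
    return np.array([x2, dx2, x4, dx4])

# Placeholder parameters (not taken from the paper); l0 = l removes the offset.
params = dict(M1=2.0, M2=2.5, J1=0.5, J2=0.625, k0=100.0,
              h=0.5, l0=0.5, l=0.5, g0=9.81)
at_rest = pendulum_rhs(np.zeros(4), np.zeros(2), params)
pushed = pendulum_rhs(np.zeros(4), np.array([1.0, 0.0]), params)
```

With these placeholder values the origin is an equilibrium of the unforced model, and a unit torque on the first pendulum produces an angular acceleration of $1/J_1$.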
To solve the SMS-based HJB Eq. (7) and obtain the optimal control policy, the actor-critic dual NNs are introduced in this paper. Under this architecture, an approximate solution to the HJB equation can be obtained and the curse of dimensionality can be avoided. For the NN approximation, radial basis function NNs with 5 nodes are used. The basis function vector is given as $\Phi(v) = [\Phi_1(v), \Phi_2(v), \ldots, \Phi_5(v)]^T$ with the Gaussian-type functions $\Phi_i(v) = \exp\left[ -(v - \zeta_i)^T (v - \zeta_i) / \rho_i^2 \right]$, $i = 1, 2, \ldots, 5$. The centers and widths of the Gaussian-type functions are chosen as $\zeta_i \in \{-2, -1, 0, 1, 2\}$ and $\rho_i = 2$. Based on the basis function vectors, the weight estimation vectors in the actor-critic dual networks are chosen as
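\[ \begin{aligned} \hat{W}_{a_1} &= \left[ \hat{W}_{a_{1,1}}, \hat{W}_{a_{1,2}}, \hat{W}_{a_{1,3}}, \hat{W}_{a_{1,4}}, \hat{W}_{a_{1,5}} \right]^T, \\ \hat{W}_{c_1} &= \left[ \hat{W}_{c_{1,1}}, \hat{W}_{c_{1,2}}, \hat{W}_{c_{1,3}}, \hat{W}_{c_{1,4}}, \hat{W}_{c_{1,5}} \right]^T, \\ \hat{W}_{a_2} &= \left[ \hat{W}_{a_{2,1}}, \hat{W}_{a_{2,2}}, \hat{W}_{a_{2,3}}, \hat{W}_{a_{2,4}}, \hat{W}_{a_{2,5}} \right]^T, \\ \hat{W}_{c_2} &= \left[ \hat{W}_{c_{2,1}}, \hat{W}_{c_{2,2}}, \hat{W}_{c_{2,3}}, \hat{W}_{c_{2,4}}, \hat{W}_{c_{2,5}} \right]^T. \end{aligned} \]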
It is worth noting that the initial values of the actor-critic weight vectors are selected by trial and error, and it is found that the switched system is stable when the initial values of the weight vectors are selected as
\[ \hat{W}_{a_1}(0) = [0.2, 0.1, 0.3, 0.4, 0.5]^T, \quad \hat{W}_{c_1}(0) = [0.01, 0.06, 0.08, 0.07, 0.05]^T, \]
In addition, the initial values of the system states are selected as $x = [x_1, x_2, x_3, x_4]^T = [0, 0, 0, 0]^T$. For $\sigma(t) \in \{1, 2\}$, the cost functions for the different modes are formulated as follows:
\[ \begin{aligned} J_1(s) &= \int_t^{\infty} \left[ s^T(\tau) Q_1 s(\tau) + u_1^T(\tau) R_1 u_1(\tau) \right] d\tau, \\ J_2(s) &= \int_t^{\infty} \left[ s^T(\tau) Q_2 s(\tau) + u_2^T(\tau) R_2 u_2(\tau) \right] d\tau. \end{aligned} \]
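To ensure that the matrices $Q_1$, $Q_2$, $R_1$ and $R_2$ are all invertible and positive definite, they are selected as
\[ Q_1 = \begin{bmatrix} 5.36 & 0 \\ 0 & 9.8 \end{bmatrix}, \quad Q_2 = \begin{bmatrix} 4.85 & 0 \\ 0 & 4.7 \end{bmatrix}, \quad R_1 = \begin{bmatrix} 9.6 & 0 \\ 0 & 8 \end{bmatrix}, \quad R_2 = \begin{bmatrix} 5 & 0 \\ 0 & 4.12 \end{bmatrix}. \]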
The controller parameters are taken as $\gamma_1 = 0.5$, $\gamma_2 = 0.42$, $\bar{\varphi}_1 = 4.3$ and $\bar{\varphi}_2 = 0.13$. For the critic updating laws (20) and the actor updating laws (21), the updating rates are set as $\eta_{c_1} = 0.08$, $\eta_{a_1} = 0.13$, $\eta_{c_2} = 0.9$ and $\eta_{a_2} = 0.27$, respectively. The positive-definite matrix in (2) is selected as
\[ G = \begin{bmatrix} 6.5 & 0 \\ 0 & 9.55 \end{bmatrix}. \]
The simulation is implemented in MATLAB (2018a) and the obtained results are shown in Figs. 3–10. Fig. 3 shows the convergence of the system states. The state trajectories fluctuate during the first 27 s because chattering is inevitable when the sliding-mode technique is used; after that, the system states gradually converge to a small neighborhood of the origin. Figs. 4–7 present the weight estimates of the critic and actor networks for the different system modes, which start to converge to constants after 23 s. It should also be mentioned that the actor NN executes the control actions, while the critic NN evaluates these actions and feeds the evaluation back to the actor NN. Moreover, the persistence of excitation condition in Assumption 3 guarantees that the weight estimates of the critic and actor NNs converge close to their optimal values. From the critic updating law (20) and the actor updating law (21), we know that the actor NN and the critic NN are tuned to minimize $E_B(t) \triangleq (1/2) H_k^2(t)$.

Fig. 4. Actor network weight vector $\hat{W}_1^a$.
Fig. 5. Critic network weight vector $\hat{W}_1^c$.
Fig. 6. Actor network weight vector $\hat{W}_2^a$.
Fig. 7. Critic network weight vector $\hat{W}_2^c$.

The trajectories of the adaptive optimal control policies are displayed in Figs. 8 and 9, from which we can see that the control signals fluctuate during the first 25 s due to chattering and then quickly converge to a small neighborhood of the origin. The switching signal is given in Fig. 10. Therefore, we can conclude that the boundedness of all the signals is ensured, i.e., $[x_1, x_2, x_3, x_4]^T = [h_1, \dot{h}_1, h_2, \dot{h}_2]^T$, $\hat{W}_1^a$, $\hat{W}_1^c$, $\hat{W}_2^a$, $\hat{W}_2^c$, $u_{1,1}$, $u_{1,2}$, $u_{2,1}$ and $u_{2,2}$ are bounded and convergent under the adaptive actor-critic optimal control developed in this paper. These simulation results also indicate that the proposed control scheme is feasible for guaranteeing the stability of the considered closed-loop system.
6. Conclusions
This paper presents a sliding-mode surface-based adaptive optimal control method for switched nonlinear systems with average dwell time under an actor-critic architecture. A specific cost function related to the sliding-mode surface is constructed to find a series of optimal control policies more quickly. The HJB equation is solved by an SMS-based critic updating law nested with the actor updating law. The designed critic and actor updating laws guarantee that the weights of the critic and actor NNs converge to small neighborhoods of their ideal values. Based on the Lyapunov stability theory, all the signals of the closed-loop system are proved to be bounded. The simulation results show the effectiveness of the proposed optimal control scheme. On the basis of the results in this paper, input and output constraint problems for switched pure-feedback nonlinear systems will be the focus of our future research.
CRediT authorship contribution statement
Haoyan Zhang: Conceptualization, Methodology, Writing - original draft. Huanqing Wang: Conceptualization, Visualization. Ben Niu: Conceptualization, Visualization. Liang Zhang: Conceptualization, Visualization. Adil M. Ahmad: Conceptualization, Visualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia has funded this project,
under grant no. (FP-161-43). The authors, therefore, acknowledge with thanks DSR for technical and financial support. This
work was also partially supported by the Education Committee Liaoning Province, China (LJ2019002).
References
[1] Y. Wang, B. Niu, H. Wang, N. Alotaibi, E. Abozinadah, Neural network-based adaptive tracking control for switched nonlinear systems with prescribed
performance: an average dwell time switching approach, Neurocomputing 435 (2021) 295–306.
[2] Y. Chang, P. Zhou, B. Niu, H. Wang, N. Xu, M. Alassafi, A. Ahmad, Switched-observer-based adaptive output-feedback control design with unknown gain
for pure-feedback switched nonlinear systems via average dwell time, Int. J. Syst. Sci. (2021) 1–15, https://doi.org/10.1080/00207721.2020.1863503.
[3] N. Xu, X. Zhao, G. Zong, Y. Wang, Adaptive Control Design for Uncertain Switched Nonstrict-Feedback Nonlinear Systems to Achieve Asymptotic
Tracking Performance, Applied Mathematics and Computation 408 (2021) 126344.
[4] A. Molter, M. Rafikov, Nonlinear optimal control of population systems: applications in ecosystems, Nonlinear Dyn. 76 (2) (2014) 1141–1150.
[5] X. Tang, D. Zhai, X. Li, Adaptive fault-tolerance control based finite-time backstepping for hypersonic flight vehicle with full state constrains, Inf. Sci.
507 (2020) 53–66.
[6] H. Xiao, C. P. Chen, Time-varying Nonholonomic Robot Consensus Formation Using Model Predictive Based Protocol With Switching Topology, Inf. Sci.
567 (2021) 201–215.
[7] X. Zhao, X. Wang, L. Ma, G. Zong, Fuzzy approximation based asymptotic tracking control for a class of uncertain switched nonlinear systems, IEEE
Trans. Fuzzy Syst. 28 (4) (2019) 632–644.
[8] Z. Liu, B. Chen, C. Lin, Adaptive neural quantized control for a class of switched nonlinear systems, Inf. Sci. 537 (2020) 313–333.
[9] X. Su, L. Wu, P. Shi, C.P. Chen, Model approximation for fuzzy switched systems with stochastic perturbation, IEEE Trans. Fuzzy Syst. 23 (5) (2014)
1458–1473.
[10] Y. Li, S. Sui, S. Tong, Adaptive fuzzy control design for stochastic nonlinear switched systems with arbitrary switchings and unmodeled dynamics, IEEE
Trans. Cybern. 47 (2) (2016) 403–414.
[11] J. Lu, Z. She, W. Feng, S.S. Ge, Stabilizability of time-varying switched systems based on piecewise continuous scalar functions, IEEE Trans. Autom.
Control 64 (6) (2018) 2637–2644.
[12] R. Ma, S. An, Minimum dwell time for global exponential stability of a class of switched positive nonlinear systems, IEEE/CAA J. Autom. Sin. 6 (2) (2018)
471–477.
[13] X. Liu, S. Zhong, Q. Zhao, Dynamics of delayed switched nonlinear systems with applications to cascade systems, Automatica 87 (2018) 251–257.
[14] L. Ma, N. Xu, X. Huo, X. Zhao, Adaptive finite-time output-feedback control design for switched pure-feedback nonlinear systems with average dwell
time, Nonlinear Anal.: Hybrid Syst. 37 (2020) 100908.
[15] B. Jiang, Q. Shen, P. Shi, Neural-networked adaptive tracking control for switched nonlinear pure-feedback systems under arbitrary switching,
Automatica 61 (2015) 119–125.
[16] G. Chesi, P. Colaneri, Homogeneous rational Lyapunov functions for performance analysis of switched systems with arbitrary switching and dwell time
constraints, IEEE Trans. Autom. Control 62 (10) (2017) 5124–5137.
[17] P. Zhou, L. Zhang, S. Zhang, A.F. Alkhateeb, Observer-Based Adaptive Fuzzy Finite-Time Control Design with Prescribed Performance for Switched Pure-
Feedback Nonlinear Systems, IEEE Access 9 (2021) 69481–69491.
[18] R.E. Precup, R.C. Roman, T.A. Teban, A. Albu, E.M. Petriu, C. Pozna, Model-free control of finger dynamics in prosthetic hand myoelectric-based control
systems, Stud. Inf. Control 29 (4) (2020) 399–410.
[19] A. Turnip, J. Panggabean, Hybrid controller design based magneto-rheological damper lookup table for quarter car suspension, Int. J. Artif. Intell 18 (1)
(2020) 193–206.
[20] Y. Li, N. Xu, B. Niu, Y. Chang, J. Zhao, X. Zhao, Small-gain technique-based adaptive fuzzy command filtered control for uncertain nonlinear systems
with unmodeled dynamics and disturbances, International Journal of Adaptive Control and Signal Processing 35 (9) (2021) 1664–1684.
[21] S. Wen, M.Z. Chen, Z. Zeng, X. Yu, T. Huang, Fuzzy control for uncertain vehicle active suspension systems via dynamic sliding-mode approach, IEEE
Trans. Syst. Man Cybern.: Syst. 47 (1) (2016) 24–32.
[22] S. Yin, H. Yang, H. Gao, J. Qiu, O. Kaynak, An adaptive NN-based approach for fault-tolerant control of nonlinear time-varying delay systems with
unmodeled dynamics, IEEE Trans. Neural Networks Learn. Syst. 28 (8) (2016) 1902–1913.
[23] J. Lei, H.K. Khalil, Feedback linearization for nonlinear systems with time-varying input and output delays by using high-gain predictors, IEEE Trans.
Autom. Control 61 (8) (2015) 2262–2268.
[24] D. Liu, X. Yang, H. Li, Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics, Neural
Comput. Appl. 23 (7–8) (2013) 1843–1850.
[25] Y. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with
dead-zone, IEEE Trans. Fuzzy Syst. 24 (1) (2015) 16–28.
[26] Y. Lv, J. Na, Q. Yang, X. Wu, Y. Guo, Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics, Int. J.
Control 89 (1) (2016) 99–112.
[27] B. Luo, D. Liu, H. Wu, Adaptive constrained optimal control design for data-based nonlinear discrete-time systems with critic-only structure, IEEE
Trans. Neural Networks Learn. Syst. 29 (6) (2017) 2099–2111.
[28] N. Xu, B. Niu, H. Wang, X. Huo, X. Zhao, Single-network ADP for solving optimal event-triggered tracking control problem of completely unknown
nonlinear systems, Int. J. Intell. Syst. doi: 10.1002/int.22491.
[29] G. Wen, C.P. Chen, S.S. Ge, H. Yang, X. Liu, Optimized adaptive nonlinear tracking control using actor–critic reinforcement learning strategy, IEEE Trans.
Ind. Inf. 15 (9) (2019) 4969–4977.
[30] V. Schmid, Solving the dynamic ambulance relocation and dispatching problem using approximate dynamic programming, Eur. J. Oper. Res. 219 (3)
(2012) 611–621.
[31] D. Liu, D. Wang, D. Zhao, Q. Wei, N. Jin, Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized
dual heuristic programming, IEEE Trans. Autom. Sci. Eng. 9 (3) (2012) 628–634.
[32] S. Bhasin, R. Kamalapurkar, M. Johnson, K.G. Vamvoudakis, F.L. Lewis, W.E. Dixon, A novel actor–critic–identifier architecture for approximate optimal
control of uncertain nonlinear systems, Automatica 49 (1) (2013) 82–92.
[33] W. Wang, X. Chen, Model-free optimal containment control of multi-agent systems based on actor-critic framework, Neurocomputing 314 (2018)
242–250.
[34] R. Palm, D. Driankov, Design of a fuzzy gain scheduler using sliding mode control principles, Fuzzy Sets Syst. 121 (1) (2001) 13–23.
[35] T. Haidegger, L. Kovács, R.E. Precup, S. Preitl, B. Benyó, Z. Benyó, Cascade control for telerobotic systems serving space medicine, IFAC Proc. Vol. 44 (1)
(2011) 3759–3764.
[36] R.M. Asl, Y.S. Hagh, R. Palm, Robust control by adaptive non-singular terminal sliding mode, Eng. Appl. Artif. Intell. 59 (2017) 205–217.
[37] C. Chiu, Derivative and integral terminal sliding mode control for a class of MIMO nonlinear systems, Automatica 48 (2) (2012) 316–326.
[38] A. Nasiri, S.K. Nguang, A. Swain, D. Almakhles, Passive actuator fault tolerant control for a class of MIMO nonlinear systems with uncertainties, Int. J.
Control 92 (3) (2019) 693–704.
[39] J. Fei, C. Lu, Adaptive sliding mode control of dynamic systems using double loop recurrent neural network structure, IEEE Trans. Neural Networks
Learn. Syst. 29 (4) (2017) 1275–1286.
[40] R. Cui, L. Chen, C. Yang, M. Chen, Extended state observer-based integral sliding mode control for an underwater robot with unknown disturbances and
uncertain nonlinearities, IEEE Trans. Industr. Electron. 64 (8) (2017) 6785–6795.
[41] J. Xiong, G. Zhang, Global fast dynamic terminal sliding mode control for a quadrotor UAV, ISA Trans. 66 (2017) 233–240.
[42] X. Zhao, H. Yang, G. Zong, Adaptive neural hierarchical sliding mode control of nonstrict-feedback nonlinear systems and an application to electronic
circuits, IEEE Trans. Syst. Man Cybern.: Syst. 47 (7) (2016) 1394–1404.
[43] Q. Fan, G. Yang, Nearly optimal sliding mode fault-tolerant control for affine nonlinear systems with state constraints, Neurocomputing 216 (2016) 78–88.
[44] H. Zhang, N. Xu, G. Zong, A. Alkhateeb, Adaptive fuzzy hierarchical sliding mode control of uncertain under-actuated switched nonlinear systems with
actuator faults, International Journal of Systems Science 52 (8) (2021) 1499–1514.
[45] B. Zhao, D. Liu, C. Alippi, Sliding-Mode Surface-Based Approximate Optimal Control for Uncertain Nonlinear Systems With Asymptotically Stable Critic
Structure, IEEE Trans. Cybern. doi:10.1109/TCYB.2019.2962011.
[46] J. Zhao, J. Na, G. Gao, Adaptive dynamic programming based robust control of nonlinear systems with unmatched uncertainties, Neurocomputing 395
(2020) 56–65.
[47] J. Zhao, M. Gan, Finite-horizon optimal control for continuous-time uncertain nonlinear systems using reinforcement learning, Int. J. Syst. Sci. 51 (13)
(2020) 2429–2440.
[48] L. Long, Z. Wang, J. Zhao, Switched adaptive control of switched nonlinearly parameterized systems with unstable subsystems, Automatica 54 (2015)
217–228.