
Information Sciences 580 (2021) 756–774

Contents lists available at ScienceDirect

Information Sciences
journal homepage: www.elsevier.com/locate/ins

Sliding-mode surface-based adaptive actor-critic optimal control for switched nonlinear systems with average dwell time

Haoyan Zhang a,*, Huanqing Wang b, Ben Niu c, Liang Zhang a, Adil M. Ahmad d
a College of Control Science and Engineering, Bohai University, Jinzhou, Liaoning 121013, China
b College of Mathematical Science, Bohai University, Jinzhou, Liaoning 121013, China
c School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
d Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

Article info

Article history:
Received 8 March 2021
Received in revised form 15 August 2021
Accepted 18 August 2021
Available online 20 August 2021

Keywords:
Switched nonlinear systems
Adaptive optimal control
Actor-critic (AC) architecture
Sliding mode surface (SMS)
Average dwell time (ADT)

Abstract

This paper studies the problem of sliding-mode surface (SMS)-based adaptive optimal control for a class of continuous-time switched nonlinear systems with average dwell time (ADT) by using an actor-critic (AC) reinforcement learning (RL) strategy. By developing a specific cost function related to the SMS, the original control problem is equivalently transformed into the problem of finding a series of optimal control policies. Then, the error terms separated from the Hamilton-Jacobi-Bellman (HJB) equation are integrated into a single function, which effectively reduces the influence of repeated magnifications caused by approximation errors. Besides, the solution to the HJB equation is identified by SMS-based AC neural networks (NNs), where the actor and critic NNs are developed to carry out the RL strategy simultaneously. The actor updating law implements control actions based on the system output, while the critic updating law assesses the current control action and feeds the assessment back to the actor. Based on Lyapunov stability theory, the proposed adaptive AC optimal control method is proved to guarantee the boundedness of all signals in the considered closed-loop switched nonlinear systems. Finally, a simulation example is given to illustrate the effectiveness of the proposed adaptive optimal control method.
© 2021 Elsevier Inc. All rights reserved.

1. Introduction

As a representative class of hybrid systems, switched systems [1–3] have attracted extensive attention over the past two decades. This is mainly because many practical applications inevitably exhibit multimodality as environmental factors change, so the models of actual plants can be described by a switchable architecture, such as biological ecosystems [4], flight systems [5], robot systems [6], etc. In general, a switched system consists of a switching signal, a family of continuous-time or discrete-time subsystems, and a switching rule [7]. Each subsystem of a switched system can be regarded as a mode, and the switching rule can be designed to achieve appropriate switching among modes. A large number of control problems for switched systems have been investigated in [8–11], to mention just a few works. Besides, many outstanding strategies for the switching rule design of switched systems have been developed, such as the minimum switching strategy [12], delay switching strategy [13], average dwell time strategy [14], arbitrary switching strategy [15], etc. Compared with the arbitrary switching strategy, the average dwell time switching strategy means that the number of switches of a group of restricted switching signals is finite and that the average time between consecutive switches is not less than a constant; its limit case is arbitrary switching [16]. Therefore, research on the average dwell time switching strategy is more meaningful.

* Corresponding author.
E-mail address: [email protected] (H. Zhang).
https://doi.org/10.1016/j.ins.2021.08.062
0020-0255/© 2021 Elsevier Inc. All rights reserved.
It is well known that many systems in practical engineering are nonlinear to some extent, which makes them exhibit diverse behaviors and high uncertainties [17–19]; hence, designing an appropriate control law for such nonlinear systems is more complicated and sometimes impossible. Over the past few decades, great importance has been attached to the control of nonlinear systems, and some significant approaches have been developed, such as fuzzy control [20], variable structure control [21], neural network (NN) control [22], the feedback linearization method [23], and so on. Although these control methods have been widely discussed, their applications inevitably encounter some restrictions, such as the acquisition and determination of fuzzy control rules and membership functions, the choice of the number of hidden-layer nodes, the chattering phenomenon of variable structure control, feedback linearization conditions, etc. By comparison, adaptive optimal control [24,25] is a special control method that adaptively adjusts the controller parameters according to changes in the environment so as to optimize the performance index of the controlled system. Hence, optimal control is quite necessary for some practical applications, such as optimal path planning for unmanned ground vehicles, real-time optimal control for spacecraft orbit transfer, and optimal trajectory planning and collision avoidance for underwater vehicles. In brief, this control method can not only deal with the control of nonlinear systems but also guarantee the expected system performance. However, the linchpin of optimal control law design is solving the Hamilton-Jacobi-Bellman (HJB) equation [26,27]. Because the HJB equation is a nonlinear partial differential equation, its solution is quite hard to obtain, which has become the major obstacle to designing an appropriate optimal control law.
To cope with the optimal control of nonlinear systems, some relevant control technologies have been extensively discussed in the past decade. As an effective approximate optimal method, adaptive dynamic programming (ADP) [28] has attracted the attention of the control community. This method combines reinforcement learning (RL) [29] and dynamic programming (DP), and overcomes limitations of traditional DP such as the curse of dimensionality [30,31]. Besides, ADP can approximate the solution to the HJB equation from online input-state data without precise model knowledge, thereby achieving approximate optimal control. For practical systems, stability is the basic requirement; nevertheless, the transient and steady-state performance, as well as optimality, are often further considerations. Recently, ADP control technology based on an actor-critic framework [32,33] has received more and more attention. The actor-critic framework is generally composed of two NNs: a critic NN and an actor NN. The critic NN evaluates the current system performance and feeds this evaluation back to modify the control action generated by the actor NN. Guided by the critic NN, the actor NN continually generates control outputs so that optimal performance can be achieved. Combining the features of actor-only and critic-only architectures, the AC architecture usually exhibits better properties, and it has been successfully applied to achieve adaptive optimal stabilization for nonlinear dynamic systems.
To improve the system response speed, sliding mode control (SMC) [34–36] has attracted extensive research interest in recent years. SMC is an effective robust control approach. One of its characteristics is that the structure of the sliding mode surface is not fixed, and different control effects are implemented according to the changeable sliding mode surfaces composed of system states [37]. Besides, SMC can overcome the influence of system uncertainties [38] and possesses strong robustness to unknown disturbances and perturbations [39]. Several types of sliding mode control have been developed so far, such as integral sliding mode control [40], terminal sliding mode control [41] and hierarchical sliding mode control [42]. Recently, plenty of adaptive optimal control problems combined with sliding mode control have been studied in depth. The authors in [43] investigated the nearly optimal control problem for affine nonlinear systems with actuator faults and state constraints by using an integral sliding mode surface. In [44], a novel optimal guaranteed cost sliding mode control for constrained-input nonlinear systems was studied. The sliding-mode surface-based approximate optimal control problem for uncertain nonlinear systems with an asymptotically stable critic structure was investigated in [45]. However, there is so far no research on sliding-mode surface-based adaptive optimal control for uncertain switched nonlinear systems with average dwell time under an actor-critic framework, which inspires the present work.
Motivated by the above discussions, this article focuses on the problem of adaptive AC optimal control for switched nonlinear systems by using the SMC technique and the ADT strategy. The main contributions are listed as follows:

1) On the basis of the actor-critic RL architecture, a sliding-mode surface-based adaptive optimal control approach for
switched nonlinear systems with average dwell time is proposed for the first time. In addition, a class of switching rules
with ADT is given.
2) Compared with the existing results in [46,47], the control scheme of this paper is extended from the original non-
switched framework to a switched actor-critic dual networks framework. In addition, different from the optimal control
methods requiring information of system state, this paper combines the optimal control theory with sliding mode tech-
nology, which improves the response rate of the considered switched system to a certain extent.
3) In the proposed control scheme, the critic updating law is designed via using the gradient descent method. Also, the
designed critic updating law can evaluate the current control action and make feedback to the actor updating law, which
minimizes the cost function to achieve the optimal control.

The rest of this paper is organized as follows. The problem formulation and preliminaries are given in Section 2. In Sec-
tion 3, we present an adaptive optimal control scheme under actor-critic dual networks architecture. In Section 4, the sta-
bility of the system is strictly proved based on Lyapunov stability theory. In Section 5, we present a practical example of the
dual inverted pendulums to illustrate the validity of the proposed control method. Finally, the conclusion is summarised in
Section 6. In addition, some key notations in this paper are summarized in Table 1.

Table 1
Notation description.

Symbol                      Meaning
$\mathbb{R}^n$              The n-dimensional Euclidean space
$\mathbb{R}^{n\times n}$    The set of $n\times n$ real matrices
$\mathbb{R}^+$              The set of all positive numbers
$\mathcal{K}$               The set of functions $\mathbb{R}^+\to\mathbb{R}^+$ that are continuous, strictly increasing and vanishing at zero
$\mathcal{K}_\infty$        The unbounded class-$\mathcal{K}$ functions
$|\cdot|$                   The absolute value of a real number
$\|\cdot\|$                 The Frobenius norm of a vector or matrix
$\nabla(\cdot)$             The gradient operator
$T$                         The transpose operator
$\triangleq$                Definition of a formula
$\lambda_{\min}(\cdot)$     The minimum eigenvalue of a matrix
$\lambda_{\max}(\cdot)$     The maximum eigenvalue of a matrix

2. Problem formulation and preliminaries

2.1. Problem formulation

Consider a class of continuous-time switched nonlinear systems described by

$$\dot x(t) = g_{\sigma(t)}(x) + h_{\sigma(t)}(x)\,u_{\sigma(t)}(t) \tag{1}$$

where $x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T \in \mathbb{R}^n$ is the system state vector with $x(0) = 0_n$, and $\sigma(t): [0,+\infty) \to \Omega_\sigma = \{1, 2, \ldots, l\}$ is the switching signal. For $k \in \Omega_\sigma$, $u_k(t) \in \mathbb{R}^n$ is the control input, $g_k(x) \in \mathbb{R}^n$ with $g_k(0) = 0_n$ is the internal dynamics, and $h_k(x) \in \mathbb{R}^{n\times n}$ denotes the control gain matrix. Both $g_k(x)$ and $h_k(x)$ are differentiable, and $g_k(x) + h_k(x)u_k(t)$ is Lipschitz continuous, which guarantees that system (1) has a unique solution for any bounded initial value.

Assumption 1. For $k \in \Omega_\sigma$, the control gain matrix $h_k(x)$ and the internal dynamics $g_k(x)$ are assumed to be known and norm-bounded. Moreover, the control gain matrix $h_k(x)$ is nonsingular, i.e., invertible.

Similar to [42], the sliding mode variable is defined as

$$s = Gx_I + x \tag{2}$$

where $x_I = \left[\int_0^t x_1(\tau)\,d\tau,\ \int_0^t x_2(\tau)\,d\tau,\ \ldots,\ \int_0^t x_n(\tau)\,d\tau\right]^T \in \mathbb{R}^n$, and $G \in \mathbb{R}^{n\times n}$ is a diagonal positive-definite matrix. Furthermore, the time derivative of the sliding mode variable $s$ is obtained as

$$\dot s = Gx + \dot x \tag{3}$$
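The definition (2) can be evaluated on sampled data. The following is a minimal sketch, assuming the state is available on a time grid and approximating the integral $x_I$ with the trapezoidal rule; the function name `sliding_variable`, the example trajectory and the gain values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sliding_variable(t, x, G):
    """Sampled s(t_i) = G @ x_I(t_i) + x(t_i), with x_I the running integral of x.

    t: (N,) sample times; x: (N, n) sampled states; G: (n, n) diagonal gain.
    """
    dt = np.diff(t)[:, None]                          # (N-1, 1) step sizes
    increments = 0.5 * (x[1:] + x[:-1]) * dt          # trapezoidal rule per step
    x_I = np.vstack([np.zeros((1, x.shape[1])),       # x_I(0) = 0
                     np.cumsum(increments, axis=0)])
    return x_I @ G.T + x                              # row i is s(t_i)

# Hypothetical trajectory and gain, chosen only for illustration.
t = np.linspace(0.0, 2.0, 2001)
x = np.column_stack([np.exp(-t), np.cos(t)])
G = np.diag([2.0, 0.5])
s = sliding_variable(t, x, G)
```

At $t = 0$ the integral term vanishes, so $s(0) = x(0)$; afterwards the integral term accumulates the history of the state, weighted by $G$.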

Definition 1. [14] A switching signal $\sigma(t)$ has an ADT $\beta_a \in \mathbb{R}^+$ if there exists a positive constant $\Gamma_0 \in \mathbb{R}^+$ such that

$$\Gamma_\sigma(T,t) \le \Gamma_0 + \frac{T-t}{\beta_a}, \quad \forall\, T \ge t \ge 0,$$

where $\Gamma_\sigma(T,t)$ is the number of switches occurring in the interval $[t, T)$.
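Definition 1 can be checked on a finite record of switching instants. The sketch below assumes the upper-bound form of the ADT condition, $\Gamma_\sigma(T,t) \le \Gamma_0 + (T-t)/\beta_a$; the helper name `satisfies_adt` is hypothetical.

```python
def satisfies_adt(switch_times, beta_a, gamma_0):
    """Check Gamma_sigma(T, t) <= gamma_0 + (T - t) / beta_a on a finite switching record.

    The tightest windows [t, T) start exactly at one switch and end just after
    another, so it is enough to test every ordered pair of switching instants.
    """
    ts = sorted(switch_times)
    for i in range(len(ts)):
        for j in range(i, len(ts)):
            switches = j - i + 1                      # switches in [ts[i], ts[j] + eps)
            if switches > gamma_0 + (ts[j] - ts[i]) / beta_a + 1e-12:
                return False
    return True
```

For example, switches at times 1, 2, 3, 4 satisfy the condition with $\beta_a = 1$ and $\Gamma_0 = 1$, but not with $\beta_a = 2$ and $\Gamma_0 = 1$, because two switches occur within one time unit.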

2.2. Neural networks (NNs)

For any continuous and smooth function $F(v)$ defined on the set $\Omega_v \subset \mathbb{R}$, there exists an NN $(W^*)^T\Phi(v)$ satisfying [15]

$$F(v) = (W^*)^T\Phi(v) + \delta(v)$$

and

$$W^* = \arg\min_{W\in\mathbb{R}^m}\left[\sup_{v\in\Omega_v}\left|F(v) - W^T\Phi(v)\right|\right]$$

where $\delta(v)$ is the estimation error and $m$ is the number of neural nodes. $W^* = [W_1^*, W_2^*, \ldots, W_m^*]^T$ represents the ideal weight vector. Besides, $\Phi(v) = [\Phi_1(v), \Phi_2(v), \ldots, \Phi_m(v)]^T$ denotes the basis function vector with the Gaussian functions

$$\Phi_i(v) = \exp\left[-\frac{(v-\zeta_i)^T(v-\zeta_i)}{\rho_i^2}\right], \quad i = 1, 2, \ldots, m$$

where $\zeta_i$ and $\rho_i$ stand for the center and width of the Gaussian function, respectively.
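The basis vector and the linear-in-weights approximator can be sketched as follows. The centers and widths below are hypothetical values for illustration; the paper does not specify them in this section.

```python
import numpy as np

def gaussian_basis(v, centers, widths):
    """Phi(v) with Phi_i(v) = exp(-(v - zeta_i)'(v - zeta_i) / rho_i**2)."""
    v = np.atleast_1d(np.asarray(v, dtype=float))
    diff = np.asarray(centers, dtype=float) - v       # (m, d): zeta_i - v
    return np.exp(-np.sum(diff ** 2, axis=1) / np.asarray(widths, dtype=float) ** 2)

def nn_output(W, v, centers, widths):
    """Linear-in-weights approximator W' Phi(v)."""
    return np.asarray(W, dtype=float) @ gaussian_basis(v, centers, widths)

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # hypothetical zeta_i
widths = np.array([1.0, 2.0, 1.0])                          # hypothetical rho_i
phi = gaussian_basis([0.0, 0.0], centers, widths)
```

Each basis function equals 1 at its own center and decays with the squared distance from it, scaled by the width.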

Lemma 1. [7] For all $(N_1, N_2) \in \mathbb{R}^2$, the following inequality holds:

$$N_1N_2 \le \frac{q^{a_1}}{a_1}|N_1|^{a_1} + \frac{1}{a_2\,q^{a_2}}|N_2|^{a_2}$$

where $q > 0$, $a_1 > 1$, $a_2 > 1$ and $(a_1-1)(a_2-1) = 1$.
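As an illustration of how Lemma 1 is typically used (the parameter choices here are picked for illustration), taking $a_1 = a_2 = 2$ satisfies $(a_1-1)(a_2-1) = 1$ and gives the familiar quadratic form:

```latex
N_1 N_2 \;\le\; \frac{q^{2}}{2}\,N_1^{2} + \frac{1}{2q^{2}}\,N_2^{2}, \qquad q > 0,
\qquad\text{and in particular, for } q = 1:\quad
N_1 N_2 \;\le\; \tfrac12 N_1^{2} + \tfrac12 N_2^{2}.
```

This special case follows directly from $(qN_1 - N_2/q)^2 \ge 0$. The case $q = 1$ is the form of the bound on $s^T(t)g_k(x)$ in (25).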

3. Optimal control design

Define an infinite-horizon integral function as the performance index with respect to the sliding mode variable $s$ as follows:

$$J_k(s) = \int_t^{\infty} \ell_k\big(s(\tau), u_k(\tau)\big)\,d\tau \tag{4}$$

where $\ell_k(s(t), u_k(t)) = s^T(t)Q_ks(t) + u_k^T(t)R_ku_k(t)$, $k \in \Omega_\sigma$, is a local cost function with $\ell_k(0,0) = 0$, and $Q_k \in \mathbb{R}^{n\times n}$ and $R_k \in \mathbb{R}^{n\times n}$ are positive-definite (hence invertible) matrices.

Definition 2. [26] A control protocol $u_k$, $k \in \Omega_\sigma$, is admissible on the compact set $\phi_k$, denoted by $u_k \in N_k(\phi_k)$, if it is continuous with $u_k(0) = 0$ and $J_k(s)$ is finite on $\phi_k$.

Associated with (3) and (4), the Hamiltonian is defined as

$$\begin{aligned}
H_k\big(s, u_k, \nabla J_k(s)\big) &= \ell_k(s, u_k) + \nabla J_k^T(s)\,\dot s \\
&= s^T(t)Q_ks(t) + u_k^T(t)R_ku_k(t) + \nabla J_k^T(s)\,(Gx + \dot x) \\
&= s^T(t)Q_ks(t) + u_k^T(t)R_ku_k(t) + \nabla J_k^T(s)\big(Gx + g_k(x) + h_k(x)u_k(t)\big)
\end{aligned} \tag{5}$$

where $\nabla J_k(s) = \partial J_k(s)/\partial s$ is the gradient of $J_k(s)$.
Let $u_k^*$ represent the optimal control law; then the optimal performance index is given by

$$J_k^*(s) = \min_{u_k \in N_k(\phi_k)} \int_t^{\infty} \ell_k\big(s(\tau), u_k(\tau)\big)\,d\tau = \int_t^{\infty} \ell_k\big(s(\tau), u_k^*(\tau)\big)\,d\tau \tag{6}$$

In light of the optimal performance index (6), the HJB equation is given by

$$\begin{aligned}
H_k\big(s, u_k^*, \nabla J_k^*(s)\big) &= \ell_k(s, u_k^*) + \nabla J_k^{*T}(s)\,\dot s \\
&= s^T(t)Q_ks(t) + u_k^{*T}R_ku_k^* + \nabla J_k^{*T}(s)\big(Gx + g_k(x) + h_k(x)u_k^*\big) = 0
\end{aligned} \tag{7}$$

where $\nabla J_k^*(s) = \partial J_k^*(s)/\partial s$ is the gradient of $J_k^*(s)$.

By solving the stationarity condition $\partial H_k\big(s, u_k^*, \nabla J_k^*(s)\big)/\partial u_k^* = 0$, we get

$$u_k^* = -\frac12 R_k^{-1}h_k^T(x)\,\nabla J_k^*(s) \tag{8}$$

Substituting (8) into (7) yields

$$H_k\big(s, u_k^*, \nabla J_k^*(s)\big) = s^T(t)Q_ks(t) - \frac14\nabla J_k^{*T}(s)\,h_k(x)R_k^{-1}h_k^T(x)\,\nabla J_k^*(s) + \nabla J_k^{*T}(s)\big(Gx + g_k(x)\big) = 0 \tag{9}$$
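The step from the Hamiltonian to (8) and (9) can be checked numerically. The sketch below uses randomly generated, hypothetical data: a symmetric positive-definite $R_k$, an arbitrary gradient vector, and a vector `drift` standing in for $Gx + g_k(x)$. It evaluates the Hamiltonian at $u^* = -\tfrac12 R^{-1}h^T\nabla J$ and compares it with the closed form on the left-hand side of (9) before it is set to zero.

```python
import numpy as np

def hamiltonian(s, u, grad_J, Q, R, drift, h):
    """H = s'Qs + u'Ru + grad_J'(drift + h u), where drift stands for Gx + g_k(x)."""
    return s @ Q @ s + u @ R @ u + grad_J @ (drift + h @ u)

def optimal_control(grad_J, R, h):
    """Stationary point of H in u: u* = -1/2 R^{-1} h' grad_J, as in (8)."""
    return -0.5 * np.linalg.solve(R, h.T @ grad_J)

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)          # symmetric positive definite (assumed)
Q = np.eye(n)
h = rng.standard_normal((n, n)) + 3 * np.eye(n)
s, grad_J, drift = (rng.standard_normal(n) for _ in range(3))

u_star = optimal_control(grad_J, R, h)
H_star = hamiltonian(s, u_star, grad_J, Q, R, drift, h)
# Closed form after substituting u* back into the Hamiltonian
H_closed = s @ Q @ s - 0.25 * grad_J @ h @ np.linalg.solve(R, h.T @ grad_J) + grad_J @ drift
```

Because the Hamiltonian is a convex quadratic in $u$ when $R_k$ is positive definite, any perturbation of `u_star` can only increase its value, which is why the stationary point is the minimizer.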


Remark 1. If the gradient term $\nabla J_k^*(s)$ could be solved from (9), the optimal control protocol $u_k^*$ would follow directly from (8). However, $\nabla J_k^*(s)$ is unknown and strongly nonlinear, so it is difficult to obtain. To address this issue, the NN-based actor-critic RL method is adopted in this paper. The block diagram of the proposed AC optimal control scheme is shown in Fig. 1.
By applying the NNs, the optimal performance index (6) can be approximated as

$$J_k^*(s) = W_k^{*T}\Phi_k(s) + \delta_k(s) \tag{10}$$

Furthermore, the gradient of $J_k^*(s)$ can be expressed as follows:

$$\nabla J_k^*(s) = \nabla\Phi_k^T(s)W_k^* + \nabla\delta_k(s) \tag{11}$$

where $\nabla\Phi_k(s) \in \mathbb{R}^{m\times n}$ and $\nabla\delta_k(s) \in \mathbb{R}^{n\times 1}$ are the gradients of $\Phi_k(s)$ and $\delta_k(s)$, respectively.
Substituting (11) into (8), we have

$$u_k^* = -\frac12 R_k^{-1}h_k^T(x)\nabla\Phi_k^T(s)W_k^* - \frac12 R_k^{-1}h_k^T(x)\nabla\delta_k(s) \tag{12}$$

Since the ideal weight vector $W_k^*$ is unknown, $J_k^*(s)$ and $u_k^*$ are not available. Based on the NN approximation, (10) is estimated as

$$\hat J_k(s) = \hat W_{c_k}^T\Phi_k(s) \tag{13}$$

where $\hat J_k(s)$ is the estimate of $J_k^*(s)$, and $\hat W_{c_k} \in \mathbb{R}^m$ denotes the critic NN weight vector.

Then, the optimal control policy is designed as

$$u_k = -\gamma_kR_k^{-1}h_k^T(x)s(t) - h_k^{-1}(x)Gx - \frac12 R_k^{-1}h_k^T(x)\nabla\Phi_k^T(s)\hat W_{a_k} \tag{14}$$

where $\gamma_k > 0$, $k \in \Omega_\sigma$, is a design parameter, and $\hat W_{a_k} \in \mathbb{R}^m$ is the actor NN weight vector, whose updating law will be given later.
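The control policy (14) is a sum of three terms: a sliding-variable feedback $-\gamma_kR_k^{-1}h_k^T s$, a term $-h_k^{-1}Gx$ that cancels $Gx$ in $\dot s$, and the actor NN term $-\tfrac12R_k^{-1}h_k^T\nabla\Phi_k^T\hat W_{a_k}$. A minimal sketch of its evaluation, with hypothetical numerical values and an illustrative function name, is:

```python
import numpy as np

def control_policy(x, s, W_a, grad_phi, h, R, G, gamma):
    """u_k = -gamma R^{-1} h' s - h^{-1} G x - 1/2 R^{-1} h' grad_phi' W_a.

    grad_phi is the (m, n) Jacobian of the basis vector; W_a is the (m,) actor weight.
    """
    Rinv_hT = np.linalg.solve(R, h.T)                 # R^{-1} h'
    return (-gamma * Rinv_hT @ s
            - np.linalg.solve(h, G @ x)               # h^{-1} G x, h nonsingular by Assumption 1
            - 0.5 * Rinv_hT @ grad_phi.T @ W_a)

# Hypothetical data, for illustration only.
rng = np.random.default_rng(1)
n, m = 2, 4
x, s = rng.standard_normal(n), rng.standard_normal(n)
W_a, grad_phi = rng.standard_normal(m), rng.standard_normal((m, n))
h = np.array([[2.0, 0.3], [-0.1, 1.5]])
R = np.diag([1.0, 2.0])
G = np.diag([1.0, 0.5])
gamma = 3.0

u = control_policy(x, s, W_a, grad_phi, h, R, G, gamma)
I_k = h @ np.linalg.solve(R, h.T)                     # h R^{-1} h'
closed_loop_input = G @ x + h @ u                     # part of s_dot = Gx + g_k + h u shaped by u
```

With this policy, $Gx + h_ku_k$ reduces to $-\gamma_k h_kR_k^{-1}h_k^Ts - \tfrac12 h_kR_k^{-1}h_k^T\nabla\Phi_k^T\hat W_{a_k}$, so the sliding variable is driven by $g_k(x)$ and these two terms only.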

Substituting (11) and (12) into (7), it is true that

$$H_k\big(s, u_k^*, \nabla J_k^*(s)\big) = s^T(t)Q_ks(t) - \frac14W_k^{*T}\wp_k(x,s)W_k^* + W_k^{*T}\nabla\Phi_k(s)g_k(x) + \omega_k(t) = 0 \tag{15}$$

Fig. 1. Block diagram of the proposed AC optimal control scheme.


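where

$$\begin{aligned}
\omega_k(t) ={}& -\tfrac14W_k^{*T}\nabla\Phi_k(s)\Im_k(x)\nabla\delta_k(s) + \tfrac14\nabla\delta_k^T(s)\Im_k(x)\nabla\Phi_k^T(s)W_k^* + W_k^{*T}\nabla\Phi_k(s)Gx \\
&+ \tfrac14\nabla\delta_k^T(s)\aleph_k(x)\nabla\delta_k(s) + \nabla\delta_k^T(s)\big(Gx + g_k(x) + h_k(x)u_k^*\big) \in \mathbb{R}
\end{aligned}$$

with $\Im_k(x) = h_k(x)R_k^{-1}h_k^T(x) \in \mathbb{R}^{n\times n}$, $\aleph_k(x) = G^Th_k^{-T}(x)R_kh_k^{-1}(x)G \in \mathbb{R}^{n\times n}$, and $\wp_k(x,s) = \nabla\Phi_k(s)\Im_k(x)\nabla\Phi_k^T(s) \in \mathbb{R}^{m\times m}$.

Assumption 2. There exists a positive constant $\bar\omega_k$, $k \in \Omega_\sigma$, such that $|\omega_k(t)| \le \bar\omega_k$.

Remark 2. The function $\omega_k(t)$ contains all terms of (15) related to the NN approximation errors. To alleviate the influence of repeated magnifications caused by the errors, $\omega_k(t)$ is assumed to be bounded in this paper.
The gradient of $\hat J_k(s)$ in (13) can be written as

$$\nabla\hat J_k(s) = \nabla\Phi_k^T(s)\hat W_{c_k} \tag{16}$$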
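Substituting (14) and (16) into (7), one has

$$\begin{aligned}
H_k\big(s, u_k, \nabla\hat J_k(s)\big) ={}& s^T(t)Q_ks(t) + \gamma_k\Big[\gamma_ks^T(t) + \big(\tfrac12\hat W_{a_k}^T - \hat W_{c_k}^T\big)\nabla\Phi_k(s)\Big]\aleph_k(x)s(t) \\
&+ \Big(\gamma_ks^T(t) + x^TG^Th_k^{-1}(x)R_kh_k^{-1}(x)\Big)Gx + \Big(\tfrac12\gamma_ks^T(t)\Im_k(x) + x^TG^T\Big)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&- \Big(\tfrac12\hat W_{c_k}^T - \tfrac12\hat W_{a_k}^T\Big)\wp_k(x,s)\hat W_{a_k}
\end{aligned} \tag{17}$$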

The Bellman residual error $\Theta_k(t)$ is denoted as follows:

$$\Theta_k(t) = H_k\big(s, u_k, \nabla\hat J_k(s)\big) - H_k\big(s, u_k^*, \nabla J_k^*(s)\big) = H_k\big(s, u_k, \nabla\hat J_k(s)\big) \tag{18}$$

Based on (18), the squared Bellman residual error is defined as

$$E_B(t) \triangleq \frac12\Theta_k^2(t) \tag{19}$$

To minimize the Bellman residual error, the critic updating law for $\hat W_{c_k}$ is designed by using the gradient descent method and is presented as follows:

$$\dot{\hat W}_{c_k} = -\frac{\eta_{c_k}e_k(t)}{2}\Big[s^T(t)Q_ks(t) + 3e_k^T(t)\hat W_{c_k} + e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} + \frac14\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k}\Big] \tag{20}$$

where $e_k(t) = \nabla\Phi_k(s)\big[g_k(x) - \gamma_k\Im_k(x)s(t) - \tfrac12\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k}\big] \in \mathbb{R}^{m\times 1}$, $k \in \Omega_\sigma$, and $\eta_{c_k} > 0$ is a design parameter.

Assumption 3. [32] Persistence of excitation (PE): Within the interval $[t, t+\Delta t]$ with $\Delta t > 0$, there exist positive constants $\underline{\varphi}_k$, $\bar\varphi_k$, $k \in \Omega_\sigma$, such that $e_k(t)e_k^T(t)$ satisfies

$$\underline{\varphi}_kI_m \le e_k(t)e_k^T(t) \le \bar\varphi_kI_m$$

where $I_m \in \mathbb{R}^{m\times m}$ is an identity matrix.
Besides, the actor updating law for $\hat W_{a_k}$ is designed as

$$\dot{\hat W}_{a_k} = \frac{\eta_{c_k}\bar\varphi_k}{2}\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{c_k} + \frac{\eta_{c_k}}{8}e_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{c_k} + \frac12\nabla\Phi_k(s)\Im_k^T(x)s(t) - \eta_{a_k}\hat W_{a_k} \tag{21}$$

Remark 3. To minimize $\Theta_k(t)$, the squared term $E_B(t) \triangleq \tfrac12\Theta_k^2(t)$ is required to satisfy $\dot E_B(t) \le 0$, which drives the control policies toward the optimal one. To this end, the critic updating law (20) is designed via the negative gradient descent method such that $\dot E_B(t) \le 0$ can be satisfied.
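The idea behind Remark 3 can be illustrated with a deliberately simplified sketch. It is not the updating law (20) itself, which contains additional terms. It assumes a residual that is linear in the critic weight, $\Theta = c + e^T\hat W_c$, with the regressor $e$ and offset $c$ held fixed, and applies the plain gradient rule $\dot{\hat W}_c = -\eta\,\Theta\,\partial\Theta/\partial\hat W_c = -\eta\,\Theta\,e$ with a forward-Euler step. All numerical values and names are hypothetical.

```python
import numpy as np

def critic_gradient_step(W_c, e, c, eta, dt):
    """One Euler step of W_c' = -eta * Theta * e, with Theta = c + e' W_c."""
    theta = c + e @ W_c                  # residual, linear in the critic weight
    E_B = 0.5 * theta ** 2               # squared residual before the step
    return W_c - dt * eta * theta * e, E_B

e = np.array([1.0, -2.0, 0.5])   # hypothetical regressor dTheta/dW_c, held fixed
c = 0.7                          # hypothetical weight-independent part of Theta
W_c = np.zeros(3)
history = []
for _ in range(200):
    W_c, E_B = critic_gradient_step(W_c, e, c, eta=1.0, dt=0.01)
    history.append(E_B)
```

In this simplified setting each step multiplies the residual by $1 - \eta\,dt\,\|e\|^2$, so $E_B$ decreases monotonically as long as the step size keeps that factor between $-1$ and $1$.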

4. Main results and stability analysis

In this section, the main results and the stability analysis of this paper are summarized in the following theorem. For the considered system (1), a detailed proof of Theorem 1 is presented.

Theorem 1. Consider the switched nonlinear system (1) under Assumptions 1–3. With the sliding mode variable (2), the optimal control policy (14), the critic updating law (20) and the actor updating law (21), the developed adaptive optimal control scheme ensures that all signals of the closed-loop system are bounded for every switching signal with ADT $\beta_a > \log\varsigma/(\lambda - \xi)$.

Proof. Consider the following Lyapunov function candidate:

$$V_k(t) = \frac12s^T(t)s(t) + \frac12\tilde W_{c_k}^T\tilde W_{c_k} + \frac12\tilde W_{a_k}^T\tilde W_{a_k} \tag{22}$$

where $\tilde W_{c_k} = W_{c_k}^* - \hat W_{c_k}$ and $\tilde W_{a_k} = W_{a_k}^* - \hat W_{a_k}$.

Remark 4. In modern control theory, Lyapunov's second method is one of the main tools for studying system stability. However, there is no general method for constructing a Lyapunov function, so its construction has to be analyzed for each practical application. Based on Lyapunov stability theory, the Lyapunov function is usually selected as a positive-definite quadratic function, and control laws are then designed so that its time derivative is negative definite, which makes the considered practical system stable.
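Combining (1) and (3), the time derivative of (22) is given as follows:

$$\begin{aligned}
\dot V_k(t) &= s^T(t)\dot s(t) - \tilde W_{c_k}^T\dot{\hat W}_{c_k} - \tilde W_{a_k}^T\dot{\hat W}_{a_k} \\
&= s^T(t)\big(Gx + g_k(x) + h_k(x)u_k(t)\big) - \tilde W_{c_k}^T\dot{\hat W}_{c_k} - \tilde W_{a_k}^T\dot{\hat W}_{a_k}
\end{aligned} \tag{23}$$

Substituting the optimal controller (14) and the critic updating law (20) into (23), one has

$$\begin{aligned}
\dot V_k(t) ={}& s^T(t)\Big[Gx + g_k(x) + h_k(x)\Big(-\gamma_kR_k^{-1}h_k^T(x)s(t) - h_k^{-1}(x)Gx - \tfrac12R_k^{-1}h_k^T(x)\nabla\Phi_k^T(s)\hat W_{a_k}\Big)\Big] \\
&+ \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)\Big[s^T(t)Q_ks(t) + 3e_k^T(t)\hat W_{c_k} + e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} + \tfrac14\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k}\Big] - \tilde W_{a_k}^T\dot{\hat W}_{a_k} \\
={}& s^T(t)g_k(x) - \gamma_ks^T(t)\Im_k(x)s(t) - \tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)\Big[s^T(t)Q_ks(t) + 3e_k^T(t)\hat W_{c_k} + e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} + \tfrac14\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k}\Big] - \tilde W_{a_k}^T\dot{\hat W}_{a_k}
\end{aligned} \tag{24}$$

By utilizing Lemma 1, we have

$$s^T(t)g_k(x) \le \frac12\|s(t)\|^2 + \frac12\|g_k(x)\|^2 \tag{25}$$

Substituting (25) into (24) yields

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac12\big)s(t) - \tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)\Big[s^T(t)Q_ks(t) + 3e_k^T(t)\hat W_{c_k} + e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} + \tfrac14\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k}\Big] \\
&- \tilde W_{a_k}^T\dot{\hat W}_{a_k} + \tfrac12\|g_k(x)\|^2
\end{aligned} \tag{26}$$

According to (15), the following equation holds:

$$s^T(t)Q_ks(t) = \frac14W_k^{*T}\wp_k(x,s)W_k^* - W_k^{*T}\nabla\Phi_k(s)g_k(x) - \omega_k(t) \tag{27}$$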


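Inserting (27) into (26) produces

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac12\big)s(t) - \tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)\Big[\tfrac14W_k^{*T}\wp_k(x,s)W_k^* - W_k^{*T}\nabla\Phi_k(s)g_k(x) - \omega_k(t) + 3e_k^T(t)\hat W_{c_k} \\
&\qquad + e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} + \tfrac14\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k}\Big] - \tilde W_{a_k}^T\dot{\hat W}_{a_k} + \tfrac12\|g_k(x)\|^2 \\
\le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac12\big)s(t) - \tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)W_k^{*T}\wp_k(x,s)W_k^* - \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)W_k^{*T}\nabla\Phi_k(s)g_k(x) \\
&- \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)\omega_k(t) + \tfrac{3\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\hat W_{c_k} \\
&+ \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k} - \tilde W_{a_k}^T\dot{\hat W}_{a_k} + \tfrac12\|g_k(x)\|^2
\end{aligned} \tag{28}$$

By utilizing Assumptions 2 and 3, we have

$$-\frac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)\omega_k(t) \le \frac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \frac{\eta_{c_k}}{2}|\bar\omega_k|^2 \tag{29}$$

$$\begin{aligned}
\frac{3\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\hat W_{c_k} &\le \frac{3\eta_{c_k}\bar\varphi_k}{2}\tilde W_{c_k}^T\hat W_{c_k} \\
&\le \frac{3\eta_{c_k}\bar\varphi_k}{2}\tilde W_{c_k}^T\big(W_{c_k}^* - \tilde W_{c_k}\big) \\
&\le -\frac{3\eta_{c_k}\bar\varphi_k}{4}\tilde W_{c_k}^T\tilde W_{c_k} + \frac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2
\end{aligned} \tag{30}$$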

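It follows from (28) to (30) that

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac12\big)s(t) - \tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)W_k^{*T}\wp_k(x,s)W_k^* - \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)W_k^{*T}\nabla\Phi_k(s)g_k(x) \\
&- \tfrac{5\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k} - \tilde W_{a_k}^T\dot{\hat W}_{a_k} + \tfrac{\eta_{c_k}}{2}|\bar\omega_k|^2 \\
&+ \tfrac12\|g_k(x)\|^2 + \tfrac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2
\end{aligned} \tag{31}$$

Substituting the facts that

$$\frac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)W_k^{*T}\wp_k(x,s)W_k^* \le \frac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \frac{\eta_{c_k}}{32}\big\|W_k^{*T}\wp_k(x,s)W_k^*\big\|^2 \tag{32}$$

$$-\frac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)W_k^{*T}\nabla\Phi_k(s)g_k(x) \le \frac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \frac{\eta_{c_k}}{2}\big\|W_k^{*T}\nabla\Phi_k(s)g_k(x)\big\|^2 \tag{33}$$

into (31) results in

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac12\big)s(t) - \tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&- \tfrac{3\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k} - \tilde W_{a_k}^T\dot{\hat W}_{a_k} + \tfrac12\|g_k(x)\|^2 + \tfrac{\eta_{c_k}}{2}|\bar\omega_k|^2 \\
&+ \tfrac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{32}\big\|W_k^{*T}\wp_k(x,s)W_k^*\big\|^2 + \tfrac{\eta_{c_k}}{2}\big\|W_k^{*T}\nabla\Phi_k(s)g_k(x)\big\|^2
\end{aligned} \tag{34}$$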


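By virtue of Lemma 1, it can be deduced that

$$\begin{aligned}
-\tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} &= -\tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)W_{a_k}^* + \tfrac12s^T(t)\Im_k(x)\nabla\Phi_k^T(s)\tilde W_{a_k} \\
&\le \tfrac12\tilde W_{a_k}^T\nabla\Phi_k(s)\Im_k^T(x)s(t) + \tfrac18s^T(t)s(t) + \tfrac12\big\|\Im_k^T(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2
\end{aligned} \tag{35}$$

Substituting (35) into (34), it is true that

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac58\big)s(t) + \tfrac12\tilde W_{a_k}^T\nabla\Phi_k(s)\Im_k^T(x)s(t) \\
&- \tfrac{3\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k} - \tilde W_{a_k}^T\dot{\hat W}_{a_k} + \tfrac12\big\|\Im_k^T(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 \\
&+ \tfrac12\|g_k(x)\|^2 + \tfrac{\eta_{c_k}}{2}|\bar\omega_k|^2 + \tfrac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{32}\big\|W_k^{*T}\wp_k(x,s)W_k^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}}{2}\big\|W_k^{*T}\nabla\Phi_k(s)g_k(x)\big\|^2
\end{aligned} \tag{36}$$

Combining the actor updating law (21) with (36) yields

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac58\big)s(t) - \tfrac{3\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} \\
&+ \tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} + \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k} \\
&- \tilde W_{a_k}^T\Big[\tfrac{\eta_{c_k}\bar\varphi_k}{2}\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{c_k} + \tfrac{\eta_{c_k}}{8}e_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{c_k} - \eta_{a_k}\hat W_{a_k}\Big] \\
&+ \tfrac12\big\|\Im_k^T(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 + \tfrac12\|g_k(x)\|^2 + \tfrac{\eta_{c_k}}{2}|\bar\omega_k|^2 + \tfrac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}}{32}\big\|W_k^{*T}\wp_k(x,s)W_k^*\big\|^2 + \tfrac{\eta_{c_k}}{2}\big\|W_k^{*T}\nabla\Phi_k(s)g_k(x)\big\|^2
\end{aligned} \tag{37}$$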

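With the help of Lemma 1 and Assumption 3, we derive

$$\begin{aligned}
\tfrac{\eta_{c_k}}{2}\tilde W_{c_k}^Te_k(t)e_k^T(t)\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \le{}& \tfrac{\eta_{c_k}\bar\varphi_k}{2}\tilde W_{c_k}^T\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\hat W_{a_k} \\
\le{}& \tfrac{\eta_{c_k}\bar\varphi_k}{2}\tilde W_{c_k}^T\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)W_{a_k}^* - \tfrac{\eta_{c_k}\bar\varphi_k}{2}\tilde W_{c_k}^T\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\tilde W_{a_k} \\
\le{}& \tfrac{\eta_{c_k}\bar\varphi_k}{2}\tilde W_{c_k}^T\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)W_{a_k}^* - \tfrac{\eta_{c_k}\bar\varphi_k}{2}W_{c_k}^{*T}\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\tilde W_{a_k} \\
&+ \tfrac{\eta_{c_k}\bar\varphi_k}{2}\hat W_{c_k}^T\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\tilde W_{a_k} \\
\le{}& \tfrac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{\eta_{c_k}\bar\varphi_k}{4}\tilde W_{a_k}^T\tilde W_{a_k} + \tfrac{\eta_{c_k}\bar\varphi_k}{2}\big\|\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^{*T}\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\big\|^2 + \tfrac{\eta_{c_k}\bar\varphi_k}{2}\hat W_{c_k}^T\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\tilde W_{a_k}
\end{aligned} \tag{38}$$

Substituting (38) into (37), one can obtain that

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac58\big)s(t) + \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k} \\
&- \tfrac{\eta_{c_k}\bar\varphi_k}{4}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{\eta_{c_k}\bar\varphi_k}{4}\tilde W_{a_k}^T\tilde W_{a_k} - \tfrac{\eta_{c_k}}{8}\tilde W_{a_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{c_k} \\
&+ \eta_{a_k}\tilde W_{a_k}^T\hat W_{a_k} + \tfrac{\eta_{c_k}}{2}|\bar\omega_k|^2 + \tfrac12\|g_k(x)\|^2 + \tfrac12\big\|\Im_k^T(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}}{32}\big\|W_k^{*T}\wp_k(x,s)W_k^*\big\|^2 + \tfrac{\eta_{c_k}}{2}\big\|W_k^{*T}\nabla\Phi_k(s)g_k(x)\big\|^2 + \tfrac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}\bar\varphi_k}{2}\big\|\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 + \tfrac{\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^{*T}\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\big\|^2
\end{aligned} \tag{39}$$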


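According to Lemma 1, one has

$$\begin{aligned}
\tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\hat W_{a_k} ={}& \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)W_{a_k}^{*T}\wp_k(x,s)W_{a_k}^* - \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)W_{a_k}^{*T}\wp_k(x,s)\tilde W_{a_k} \\
&- \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\tilde W_{a_k}^T\wp_k(x,s)W_{a_k}^* + \tfrac{\eta_{c_k}}{8}\tilde W_{c_k}^Te_k(t)\tilde W_{a_k}^T\wp_k(x,s)\tilde W_{a_k} \\
\le{}& \tfrac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{\eta_{c_k}\bar\varphi_k}{16}\tilde W_{a_k}^T\tilde W_{a_k} + \tfrac{\eta_{c_k}}{8}\hat W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\tilde W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{a_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{16}\tilde W_{a_k}^T\wp_k(x,s)W_{a_k}^*W_{a_k}^{*T}\wp_k^T(x,s)\tilde W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{c_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{16}\tilde W_{a_k}^T\wp_k(x,s)W_{c_k}^*W_{c_k}^{*T}\wp_k^T(x,s)\tilde W_{a_k} \\
\le{}& \tfrac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{\eta_{c_k}\bar\varphi_k}{16}\tilde W_{a_k}^T\tilde W_{a_k} + \tfrac{\eta_{c_k}}{16}\tilde W_{a_k}^T(A_k + B_k)\tilde W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{8}\hat W_{c_k}^Te_k(t)\hat W_{a_k}^T\wp_k(x,s)\tilde W_{a_k} + \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{a_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{c_k}^*\big\|^2
\end{aligned} \tag{40}$$

where $A_k = \wp_k(x,s)W_{a_k}^*W_{a_k}^{*T}\wp_k^T(x,s) \in \mathbb{R}^{m\times m}$ and $B_k = \wp_k(x,s)W_{c_k}^*W_{c_k}^{*T}\wp_k^T(x,s) \in \mathbb{R}^{m\times m}$.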
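It can be readily verified from (39) and (40) that

$$\begin{aligned}
\dot V_k(t) \le{}& -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac58\big)s(t) - \tfrac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} + \tfrac{3\eta_{c_k}\bar\varphi_k}{8}\tilde W_{a_k}^T\tilde W_{a_k} \\
&+ \tfrac{\eta_{c_k}}{16}\tilde W_{a_k}^T(A_k + B_k)\tilde W_{a_k} + \eta_{a_k}\tilde W_{a_k}^T\hat W_{a_k} + \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{a_k}^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}}{2}\big\|W_k^{*T}\nabla\Phi_k(s)g_k(x)\big\|^2 + \tfrac{\eta_{c_k}\bar\varphi_k}{2}\big\|\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 \\
&+ \tfrac12\big\|\Im_k^T(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 + \tfrac12\|g_k(x)\|^2 + \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{c_k}^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}}{2}|\bar\omega_k|^2 + \tfrac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{32}\big\|W_k^{*T}\wp_k(x,s)W_k^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^{*T}\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\big\|^2
\end{aligned} \tag{41}$$

Substituting the fact that

$$\eta_{a_k}\tilde W_{a_k}^T\hat W_{a_k} \le -\frac{\eta_{a_k}}{2}\tilde W_{a_k}^T\tilde W_{a_k} + \frac{\eta_{a_k}}{2}\big\|W_{a_k}^*\big\|^2 \tag{42}$$

into (41) yields

$$\dot V_k(t) \le -s^T(t)\big(\gamma_k\Im_k(x) - \tfrac58\big)s(t) - \frac{\eta_{c_k}\bar\varphi_k}{8}\tilde W_{c_k}^T\tilde W_{c_k} - \frac{\eta_{c_k}}{2}\Big[1 - \frac{3\bar\varphi_k}{4} - \frac18\lambda_{\min}(A_k + B_k)\Big]\tilde W_{a_k}^T\tilde W_{a_k} + \theta_k(t) \tag{43}$$

where $\lambda_{\min}(A_k + B_k)$ is the minimum eigenvalue of the matrix $(A_k + B_k)$, and

$$\begin{aligned}
\theta_k(t) ={}& \tfrac{\eta_{c_k}\bar\varphi_k}{2}\big\|\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 + \tfrac{\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^{*T}\nabla\Phi_k(s)\aleph_k(x)\nabla\Phi_k^T(s)\big\|^2 \\
&+ \tfrac{\eta_{a_k}}{2}\big\|W_{a_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{a_k}^*\big\|^2 + \tfrac12\big\|\Im_k^T(x)\nabla\Phi_k^T(s)W_{a_k}^*\big\|^2 \\
&+ \tfrac12\|g_k(x)\|^2 + \tfrac{\eta_{c_k}}{16}\big\|W_{a_k}^{*T}\wp_k(x,s)W_{c_k}^*\big\|^2 + \tfrac{\eta_{c_k}}{2}|\bar\omega_k|^2 + \tfrac{3\eta_{c_k}\bar\varphi_k}{4}\big\|W_{c_k}^*\big\|^2 \\
&+ \tfrac{\eta_{c_k}}{32}\big\|W_k^{*T}\wp_k(x,s)W_k^*\big\|^2 + \tfrac{\eta_{c_k}}{2}\big\|W_k^{*T}\nabla\Phi_k(s)g_k(x)\big\|^2
\end{aligned}$$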

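Let

$$\lambda_{1k} = 2\gamma_k\lambda_{\min}(\Im_k(x)) - \frac54 \tag{44}$$

where $\lambda_{\min}(\Im_k(x))$ is the minimum eigenvalue of the matrix $\Im_k(x)$, and

$$\lambda_{2k} = \frac14\eta_{c_k}\bar\varphi_k \tag{45}$$

$$\lambda_{3k} = \eta_{c_k}\Big[1 - \frac{3\bar\varphi_k}{4} - \frac18\lambda_{\min}(A_k + B_k)\Big] \tag{46}$$

$$\varsigma = \max_{k\in\Omega}\frac{\lambda_{\max}(\Im_k)}{\lambda_{\min}(\Im_k)} \tag{47}$$

where $\lambda_{\min}(\Im_k)$ and $\lambda_{\max}(\Im_k)$ are the minimum and maximum eigenvalues of the matrix $\Im_k$, respectively. To guarantee the stability of the considered closed-loop system, the parameters $\gamma_k$ and $\bar\varphi_k$, $k \in \Omega_\sigma$, are required to satisfy $\gamma_k > 5/(8\lambda_{\min}(\Im_k(x)))$ and $(8 - \lambda_{\min}(A_k + B_k))/6 > \bar\varphi_k$, respectively, so that $\lambda_{1k}$ and $\lambda_{3k}$ are positive.
In view of (44), (45) and (46), and noting that the coefficients in (43) equal $\lambda_{1k}/2$, $\lambda_{2k}/2$ and $\lambda_{3k}/2$, inequality (43) can be written as

$$\dot V_k(t) \le -\frac{\lambda_{1k}}{2}s^T(t)s(t) - \frac{\lambda_{2k}}{2}\tilde W_{c_k}^T\tilde W_{c_k} - \frac{\lambda_{3k}}{2}\tilde W_{a_k}^T\tilde W_{a_k} + \theta_k(t) \tag{48}$$

Furthermore, we have

$$\dot V_k(t) \le -\lambda V_k(t) + \bar\theta \tag{49}$$

where

$$\lambda = \min_{k\in\Omega}\{\lambda_{1k}, \lambda_{2k}, \lambda_{3k}\}$$

and

$$\bar\theta = \sup_{k\in\Omega}\{\theta_k\}$$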

Remark 5. Compared with some general optimal control schemes for nonlinear systems, the control scheme developed in this paper can also guarantee the optimal control performance. Unlike optimal control of nonlinear systems based directly on system states, the proposed scheme provides a faster control response by incorporating the sliding mode technique. Besides, the computational complexity of solving the HJB equation grows rapidly as the dimension of the system increases. To cope with this problem, the AC dual-network architecture is used in this paper, under which the curse of dimensionality can be well avoided.
In addition, there exist $\underline{\mu}, \bar\mu \in \mathcal{K}_\infty$ such that $\underline{\mu}(\|\Psi\|) \le V_k(\Psi) \le \bar\mu(\|\Psi\|)$ with $\Psi = [s^T, \tilde W_{a_k}^T, \tilde W_{c_k}^T]^T$. According to (47), we can deduce that

$$V_k(\Psi) \le \varsigma V_l(\Psi), \quad \forall\, k, l \in \Omega \tag{50}$$

It is not hard to observe that the function $\Lambda(t) = e^{\lambda t}V_{\sigma(t)}(\Psi(t))$ is piecewise differentiable along solutions of the system (1). In view of (49), on each interval $[t_p, t_{p+1})$, we get
$$\dot\Lambda(t) = \lambda e^{\lambda t}V_{\sigma(t)}(\Psi(t)) + e^{\lambda t}\dot V_{\sigma(t)}(\Psi(t)) \le \bar\theta e^{\lambda t}, \quad t \in [t_p, t_{p+1}) \tag{51}$$

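Recalling the inequality $V_k(\Psi) \le \varsigma V_l(\Psi)$, $\forall k, l \in \Omega$, the following result can be obtained:
$$
\begin{aligned}
\Lambda(t_{p+1}) &= e^{\lambda t_{p+1}}V_{\sigma(t_{p+1})}(\Psi(t_{p+1})) \\
&\le \varsigma e^{\lambda t_{p+1}}V_{\sigma(t_p)}(\Psi(t_{p+1})) = \varsigma\Lambda(t_{p+1}^-) \\
&\le \varsigma\left[\Lambda(t_p) + \int_{t_p}^{t_{p+1}}\bar\theta e^{\lambda t}\,dt\right]
\end{aligned} \tag{52}
$$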

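Selecting an arbitrary $T > t_0 = 0$ and applying (52) recursively from $p = 0$ to $p = \Gamma_\sigma(T,0) - 1$, one has
$$
\begin{aligned}
\Lambda(T) &\le \Lambda(t_{\Gamma_\sigma(T,0)}) + \int_{t_{\Gamma_\sigma(T,0)}}^{T}\bar\theta e^{\lambda t}\,dt \\
&\le \varsigma\left[\Lambda(t_{\Gamma_\sigma(T,0)-1}) + \int_{t_{\Gamma_\sigma(T,0)-1}}^{t_{\Gamma_\sigma(T,0)}}\bar\theta e^{\lambda t}\,dt + \varsigma^{-1}\int_{t_{\Gamma_\sigma(T,0)}}^{T}\bar\theta e^{\lambda t}\,dt\right] \\
&\le \cdots \\
&\le \varsigma^{\Gamma_\sigma(T,0)}\left[\Lambda(0) + \sum_{p=0}^{\Gamma_\sigma(T,0)-1}\varsigma^{-p}\int_{t_p}^{t_{p+1}}\bar\theta e^{\lambda t}\,dt + \varsigma^{-\Gamma_\sigma(T,0)}\int_{t_{\Gamma_\sigma(T,0)}}^{T}\bar\theta e^{\lambda t}\,dt\right]
\end{aligned} \tag{53}
$$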


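Since $\beta_a > \log\varsigma/\lambda$, for any $\xi \in (0, \lambda - \log\varsigma/\beta_a)$, one has $\beta_a > \log\varsigma/(\lambda - \xi)$. According to Definition 1, it then holds that
$$\Gamma_\sigma(T,t) \le \Gamma_0 + \frac{(\lambda - \xi)(T - t)}{\log\varsigma}, \quad \forall T \ge t \ge 0 \tag{54}$$
Noticing that $\Gamma_\sigma(T,0) - p \le 1 + \Gamma_\sigma(T,t_{p+1})$ for $p = 0, 1, \ldots, \Gamma_\sigma(T,0)$, (54) implies
$$\varsigma^{\Gamma_\sigma(T,0)-p} \le \varsigma^{1+\Gamma_0}e^{(\lambda-\xi)(T-t_{p+1})} \tag{55}$$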

Moreover, since $\xi < \lambda$, we have
$$\int_{t_p}^{t_{p+1}}\bar\theta e^{\lambda t}\,dt \le e^{(\lambda-\xi)t_{p+1}}\int_{t_p}^{t_{p+1}}\bar\theta e^{\xi t}\,dt \tag{56}$$

Then, it follows from (53)–(56) that
$$\Lambda(T) \le \varsigma^{\Gamma_\sigma(T,0)}\Lambda(0) + \varsigma^{1+\Gamma_0}e^{(\lambda-\xi)T}\int_0^T\bar\theta e^{\xi t}\,dt \tag{57}$$

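This means
$$
\begin{aligned}
\underline{\mu}(\|\Psi(T)\|) &\le V_{\sigma(T)}(\Psi(T)) \\
&\le e^{\Gamma_0\log\varsigma}e^{(\log\varsigma/\beta_a - \lambda)T}\bar{\mu}(\|\Psi(0)\|) + \varsigma^{1+\Gamma_0}\frac{\bar\theta}{\xi}\left(1 - e^{-\xi T}\right) \\
&\le e^{\Gamma_0\log\varsigma}e^{(\log\varsigma/\beta_a - \lambda)T}\bar{\mu}(\|\Psi(0)\|) + \varsigma^{1+\Gamma_0}\frac{\bar\theta}{\xi}, \quad \forall T > 0
\end{aligned} \tag{58}
$$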

According to (58) and $\xi > 0$, if $\beta_a > \log\varsigma/(\lambda - \xi)$, then the variables $s$, $\tilde W_{ck}$ and $\tilde W_{ak}$ are bounded for bounded initial values. Furthermore, it can be obtained that $x$, $\hat W_{ck}$ and $\hat W_{ak}$ are also bounded. Thus, we can conclude that all the signals of the considered switched closed-loop system are bounded under an arbitrary switching signal $\sigma(t)$ satisfying the ADT condition $\beta_a > \log\varsigma/(\lambda - \xi)$.
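The ADT condition and the residual term in (58) are simple to evaluate once $\varsigma$, $\lambda$, $\xi$, $\Gamma_0$ and $\bar\theta$ are known. The following Python sketch is illustrative only. It assumes that Definition 1 is the usual ADT bound $\Gamma_\sigma(T,t) \le \Gamma_0 + (T-t)/\beta_a$, and all function names and sample values are hypothetical.

```python
import math


def min_average_dwell_time(varsigma, lam, xi):
    """Lower bound log(varsigma) / (lam - xi) that the ADT beta_a must exceed."""
    if varsigma <= 1.0:
        raise ValueError("varsigma is expected to exceed 1")
    if not 0.0 < xi < lam:
        raise ValueError("xi must lie strictly between 0 and lam")
    return math.log(varsigma) / (lam - xi)


def residual_bound(varsigma, gamma0, theta_bar, xi):
    """Constant term varsigma**(1 + Gamma_0) * theta_bar / xi appearing in (58)."""
    return varsigma ** (1 + gamma0) * theta_bar / xi


def satisfies_adt(switch_times, beta_a, gamma0):
    """Check the assumed ADT bound on every window spanned by two switching instants.

    A window that just encloses switches i..j contains j - i + 1 switches and has
    length arbitrarily close to t_j - t_i, so this is the tightest case to test.
    """
    times = sorted(switch_times)
    for i in range(len(times)):
        for j in range(i, len(times)):
            if (j - i + 1) > gamma0 + (times[j] - times[i]) / beta_a:
                return False
    return True


# Hypothetical numbers for illustration.
beta_min = min_average_dwell_time(math.e, 1.0, 0.5)
residual = residual_bound(2.0, 1, 0.5, 0.25)
```

A switching sequence that passes `satisfies_adt` with some `beta_a` larger than `beta_min` meets the condition used in the boundedness argument above.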

Remark 6. The obtained optimal control policy (14) can stabilize the switched nonlinear system (1) and minimize the local cost function when the system moves on the sliding mode surface (2). It is worth mentioning that ideal sliding mode control is considered in this paper, which means that the system trajectories remain strictly on the switching surface $s = 0$. In practice, however, because of the chattering phenomenon, the sliding mode motion does not occur strictly on the switching surface $s = 0$; this problem will be addressed in our future work.

Remark 7. It should be mentioned that general non-optimal control approaches can only guarantee that all signals in the closed-loop system are bounded and convergent; they cannot achieve the optimal control objective and therefore fail to balance performance against control cost, a trade-off that is of great importance in many practical applications. In this paper, the proposed optimal control method not only ensures that all signals in the closed-loop system are convergent and bounded, but also guarantees that the control cost is minimal.

5. Simulation study

In this section, to illustrate the applicability of the developed control scheme, a practical example of two inverted pendulums connected via a spring [48] is presented to show the validity of the proposed SMS-based adaptive AC optimal control method. The dynamic equations of the dual inverted pendulums are given by
$$
\begin{cases}
J_1\ddot\theta_1 = \left(M_1 g_0 h - \dfrac{k_0 h^2}{4}\right)\sin(\theta_1) + \dfrac{k_0 h}{2}(l_0 - l) + u_{\sigma(t),1} + \dfrac{k_0 h^2}{4}\sin(\theta_2) \\[2ex]
J_2\ddot\theta_2 = \left(M_2 g_0 h - \dfrac{k_0 h^2}{4}\right)\sin(\theta_2) + \dfrac{k_0 h}{2}(l_0 - l) + u_{\sigma(t),2} + \dfrac{k_0 h^2}{4}\sin(\dot\theta_1)
\end{cases} \tag{59}
$$

The structure of the dual inverted pendulums is shown in Fig. 2, where $u_{\sigma(t),1}$ and $u_{\sigma(t),2}$, $\sigma(t) \in \{1, 2\}$, denote the control inputs. Many practical systems exhibit multiple operating modes because of changing environmental factors, so their models can be described by a switched architecture. Here we assume that the dual inverted pendulum system has two modes, so the switching signal takes values $\sigma(t) \in \{1, 2\}$. The parameters of the dual inverted pendulums used in this example and their meanings are provided in Table 2.

Fig. 2. Two inverted pendulums connected via a spring.

Table 2
Parameters used in the two inverted pendulums.

Parameter   Meaning                                    Value
$J_1$       Moment of inertia (the first pendulum)     0.3 kg m^2
$J_2$       Moment of inertia (the second pendulum)    0.4 kg m^2
$M_1$       Mass of the first pendulum end             1 kg
$M_2$       Mass of the second pendulum end            1 kg
$l$         Distance between the pendulum hinges       0.8 m
$h$         The pendulum height                        1 m
$g_0$       Gravitational acceleration                 9.80665 m/s^2
$k_0$       Spring constant                            6 N/m
$l_0$       Natural length of the spring               0.85 m

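Define $[x_1, x_3]^T = [\theta_1, \theta_2]^T$ and $[x_2, x_4]^T = [\dot\theta_1, \dot\theta_2]^T$ as the angular positions and angular rates of the dual inverted pendulums, respectively. Then, the dynamic Eq. (59) can be rewritten as
$$
\begin{cases}
\dot x_1 = x_2 \\[1ex]
\dot x_2 = \left(\dfrac{M_1 g_0 h}{J_1} - \dfrac{k_0 h^2}{4J_1}\right)\sin(x_1) + \dfrac{k_0 h}{2J_1}(l_0 - l) + \dfrac{u_{\sigma(t),1}}{J_1} + \dfrac{k_0 h^2}{4J_1}\sin(x_3) \\[2ex]
\dot x_3 = x_4 \\[1ex]
\dot x_4 = \left(\dfrac{M_2 g_0 h}{J_2} - \dfrac{k_0 h^2}{4J_2}\right)\sin(x_3) + \dfrac{k_0 h}{2J_2}(l_0 - l) + \dfrac{u_{\sigma(t),2}}{J_2} + \dfrac{k_0 h^2}{4J_2}\sin(x_2)
\end{cases} \tag{60}
$$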
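For readers who want to reproduce the open-loop behaviour, the state-space model (60) with the Table 2 values can be coded directly. The Python sketch below is illustrative and makes several assumptions: the second input enters the fourth state equation as $u_{\sigma(t),2}/J_2$ (consistent with (59)), the last coupling term uses $\sin(x_2)$ exactly as printed, and a forward Euler step stands in for whatever solver the authors used. The function names are hypothetical.

```python
import math

# Parameter values from Table 2.
J1, J2 = 0.3, 0.4        # moments of inertia, kg m^2
M1, M2 = 1.0, 1.0        # masses of the pendulum ends, kg
L_HINGE = 0.8            # distance between the pendulum hinges l, m
H = 1.0                  # pendulum height h, m
G0 = 9.80665             # gravitational acceleration, m/s^2
K0 = 6.0                 # spring constant, N/m
L0 = 0.85                # natural length of the spring l_0, m


def pendulum_rhs(x, u):
    """Right-hand side of (60) for state x = [x1, x2, x3, x4] and input u = [u1, u2]."""
    x1, x2, x3, x4 = x
    u1, u2 = u
    dx2 = ((M1 * G0 * H / J1 - K0 * H ** 2 / (4 * J1)) * math.sin(x1)
           + K0 * H / (2 * J1) * (L0 - L_HINGE)
           + u1 / J1
           + K0 * H ** 2 / (4 * J1) * math.sin(x3))
    dx4 = ((M2 * G0 * H / J2 - K0 * H ** 2 / (4 * J2)) * math.sin(x3)
           + K0 * H / (2 * J2) * (L0 - L_HINGE)
           + u2 / J2                                   # assumption: u_2 over J_2
           + K0 * H ** 2 / (4 * J2) * math.sin(x2))    # sin(x2) as printed in (60)
    return [x2, dx2, x4, dx4]


def euler_step(x, u, dt):
    """One forward Euler step (an assumed integrator, chosen for brevity)."""
    dx = pendulum_rhs(x, u)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]


rhs_at_origin = pendulum_rhs([0.0, 0.0, 0.0, 0.0], [0.0, 0.0])
```

Note that with zero input the constant spring term $k_0 h (l_0 - l)/(2J_i)$ keeps the accelerations nonzero at the origin, so the controller has to compensate for it.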

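In view of (60) and the system (1), we can get that $h_{\sigma(t)}(x) = \mathrm{diag}\{x_2, J_1, x_4, J_2\}$ and
$$
g_{\sigma(t)}(x) = \begin{bmatrix}
0 \\[1ex]
\left(\dfrac{M_1 g_0 h}{J_1} - \dfrac{k_0 h^2}{4J_1}\right)\sin(x_1) + \dfrac{k_0 h}{2J_1}(l_0 - l) + \dfrac{k_0 h^2}{4J_1}\sin(x_3) \\[2ex]
0 \\[1ex]
\left(\dfrac{M_2 g_0 h}{J_2} - \dfrac{k_0 h^2}{4J_2}\right)\sin(x_3) + \dfrac{k_0 h}{2J_2}(l_0 - l) + \dfrac{k_0 h^2}{4J_2}\sin(x_2)
\end{bmatrix}.
$$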

To solve the SMS-based HJB Eq. (7) and obtain the optimal control policy, actor-critic dual NNs are introduced in this paper. Under this architecture, an approximate solution to the HJB equation can be obtained and the curse of dimensionality can be well avoided. For the NN approximation, radial basis function NNs with 5 nodes are used. The basis function vector is given as $\Phi(v) = [\Phi_1(v), \Phi_2(v), \ldots, \Phi_5(v)]^T$ with the Gaussian-type functions $\Phi_i(v) = \exp\left[-(v - \zeta_i)^T(v - \zeta_i)/\rho_i^2\right]$, $i = 1, 2, \ldots, 5$. The centers and widths of the Gaussian-type functions are chosen as $\zeta_i \in \{-2, -1, 0, 1, 2\}$ and $\rho_i = 2$. Based on the basis function vectors, the weight estimation vectors in the actor-critic dual networks are chosen as

$$\hat W_{a1} = [\hat W_{a1,1}, \hat W_{a1,2}, \hat W_{a1,3}, \hat W_{a1,4}, \hat W_{a1,5}]^T,$$
$$\hat W_{c1} = [\hat W_{c1,1}, \hat W_{c1,2}, \hat W_{c1,3}, \hat W_{c1,4}, \hat W_{c1,5}]^T,$$
$$\hat W_{a2} = [\hat W_{a2,1}, \hat W_{a2,2}, \hat W_{a2,3}, \hat W_{a2,4}, \hat W_{a2,5}]^T,$$
$$\hat W_{c2} = [\hat W_{c2,1}, \hat W_{c2,2}, \hat W_{c2,3}, \hat W_{c2,4}, \hat W_{c2,5}]^T.$$

It is worth noting that the initial values of the actor-critic weight vectors are selected by trial and error, and it is found that the switched system is stable when the initial values of the weight vectors are selected as
$$\hat W_{a1}(0) = [0.2, 0.1, 0.3, 0.4, 0.5]^T,$$
$$\hat W_{c1}(0) = [0.01, 0.06, 0.08, 0.07, 0.05]^T,$$
$$\hat W_{a2}(0) = [0.3, 0.6, 0.2, 0.1, 0.5]^T,$$
$$\hat W_{c2}(0) = [0.02, 0.08, 0.04, 0.07, 0.05]^T.$$
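The Gaussian basis vector and the weighted network output $\hat W^T\Phi(v)$ described above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes NumPy, reads the centers as $-2, -1, 0, 1, 2$ (the minus signs are inferred), and applies each scalar center to every component of the input $v$. The function names are hypothetical.

```python
import numpy as np

CENTERS = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # zeta_i, signs inferred
WIDTH = 2.0                                        # rho_i, identical for all nodes

# Initial actor weights for mode 1, as listed in the text.
W_A1_INIT = np.array([0.2, 0.1, 0.3, 0.4, 0.5])


def rbf_basis(v, centers=CENTERS, width=WIDTH):
    """Return [Phi_1(v), ..., Phi_5(v)] with Phi_i = exp(-|v - zeta_i|^2 / rho^2).

    Assumption: the scalar centre zeta_i is broadcast over all components of v.
    """
    v = np.atleast_1d(np.asarray(v, dtype=float))
    diff = v[None, :] - centers[:, None]            # shape (5, dim of v)
    return np.exp(-np.sum(diff ** 2, axis=1) / width ** 2)


def nn_output(weights, v):
    """Weighted sum W^T Phi(v) of the basis functions."""
    return float(np.dot(weights, rbf_basis(v)))


phi_at_zero = rbf_basis(0.0)
actor_output_at_zero = nn_output(W_A1_INIT, 0.0)
```

Because the widths are equal and the centers are symmetric about zero, the basis vector at $v = 0$ is symmetric, with the middle node fully active.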

In addition, the initial values of the system states are selected as $x = [x_1, x_2, x_3, x_4]^T = [0, 0, 0, 0]^T$. For $\sigma(t) \in \{1, 2\}$, the cost functions for the different modes are formulated as follows:
$$J_1(s) = \int_t^\infty \left(s^T(\tau)Q_1 s(\tau) + u_1^T(\tau)R_1 u_1(\tau)\right)d\tau,$$
$$J_2(s) = \int_t^\infty \left(s^T(\tau)Q_2 s(\tau) + u_2^T(\tau)R_2 u_2(\tau)\right)d\tau.$$
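The matrices $Q_1$, $Q_2$, $R_1$ and $R_2$ are required to be positive definite (and hence invertible), and are selected as
$$
Q_1 = \begin{bmatrix} 5.36 & 0 \\ 0 & 9.8 \end{bmatrix}, \quad
Q_2 = \begin{bmatrix} 4.85 & 0 \\ 0 & 4.7 \end{bmatrix}, \quad
R_1 = \begin{bmatrix} 9.6 & 0 \\ 0 & 8 \end{bmatrix}, \quad
R_2 = \begin{bmatrix} 5 & 0 \\ 0 & 4.12 \end{bmatrix}.
$$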
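The positive definiteness of the weighting matrices and the integrand of the cost functions can be verified with a few lines of code. The Python sketch below assumes NumPy and uses a Cholesky factorization as the definiteness test. The helper names are hypothetical, and the sample vectors are chosen only for illustration.

```python
import numpy as np

Q1 = np.diag([5.36, 9.8])
Q2 = np.diag([4.85, 4.7])
R1 = np.diag([9.6, 8.0])
R2 = np.diag([5.0, 4.12])


def is_positive_definite(matrix):
    """True if the matrix is symmetric and admits a Cholesky factorization."""
    matrix = np.asarray(matrix, dtype=float)
    if not np.allclose(matrix, matrix.T):
        return False
    try:
        np.linalg.cholesky(matrix)
    except np.linalg.LinAlgError:
        return False
    return True


def running_cost(s, u, Q, R):
    """Integrand s^T Q s + u^T R u of the cost function for one mode."""
    s = np.asarray(s, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(s @ Q @ s + u @ R @ u)


all_positive_definite = all(is_positive_definite(m) for m in (Q1, Q2, R1, R2))
sample_cost = running_cost([1.0, 0.0], [0.0, 1.0], Q1, R1)
```

Since all four matrices are diagonal with positive entries, the check passes trivially here; the helper becomes useful when non-diagonal weights are tried.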
The controller parameters are taken as $\gamma_1 = 0.5$, $\gamma_2 = 0.42$, $\varphi_1 = 4.3$ and $\varphi_2 = 0.13$. For the critic updating laws (20) and the actor updating laws (21), the updating rates are set as $\eta_{c1} = 0.08$, $\eta_{a1} = 0.13$, $\eta_{c2} = 0.9$ and $\eta_{a2} = 0.27$, respectively. The positive-definite matrix in (2) is selected as
$$G = \begin{bmatrix} 6.5 & 0 \\ 0 & 9.55 \end{bmatrix}.$$
The simulations are implemented in MATLAB (2018a) and the obtained results are shown in Figs. 3–10. Fig. 3 shows the convergence of the system states. It can be seen from Fig. 3 that the system state trajectories fluctuate during the first 27 s, because the chattering phenomenon is inevitable when the sliding mode technique is used. After that, the system states gradually converge to a small neighborhood of the origin. Figs. 4–7 present the weight estimates of the critic and actor networks for the different system modes, which start to converge to constants after 23 s. It should also be mentioned that the actor NN executes the control actions, while the critic NN evaluates those actions and feeds the evaluation back to the actor NN. Moreover, the persistence of excitation condition in Assumption 3 guarantees that the weight estimates of the critic and actor NNs converge close to their optimal values. From the critic updating law (20) and the actor updating law (21), we know that the actor NN and critic NN are tuned to minimize $E_B(t) \triangleq (1/2)H_k^2(t)$. The trajectories of the adaptive optimal control policies are displayed in Figs. 8 and 9, from which we can see that the control inputs fluctuate during the first 25 s due to the chattering phenomenon and then quickly converge to a small neighborhood of the origin. The switching signal is given in Fig. 10. Therefore, we can conclude that the boundedness of all the signals can be ensured, i.e., $[x_1, x_2, x_3, x_4]^T = [\theta_1, \dot\theta_1, \theta_2, \dot\theta_2]^T$, $\hat W_{a1}$, $\hat W_{c1}$, $\hat W_{a2}$, $\hat W_{c2}$, $u_{1,1}$, $u_{1,2}$, $u_{2,1}$ and $u_{2,2}$ are bounded and convergent under the adaptive actor-critic optimal control developed in this paper. These simulation results also indicate that the proposed control scheme is able to guarantee the stability of the considered closed-loop system.

Fig. 3. Trajectories of system states.

Fig. 4. Actor network weight vector $\hat W_{a1}$.

Fig. 5. Critic network weight vector $\hat W_{c1}$.

Fig. 6. Actor network weight vector $\hat W_{a2}$.

Fig. 7. Critic network weight vector $\hat W_{c2}$.

Fig. 8. The control inputs $u_{1,1}$ and $u_{1,2}$.

Fig. 9. The control inputs $u_{2,1}$ and $u_{2,2}$.

Fig. 10. Switching signal.

6. Conclusions

This paper presents a sliding-mode surface-based adaptive optimal control method for switched nonlinear systems with average dwell time under an actor-critic architecture. A specific cost function related to the sliding mode surface is constructed to find a series of optimal control policies more quickly. The HJB equation is solved by an SMS-based critic updating law nested with the actor updating law. The designed critic and actor updating laws guarantee that the weights of the critic and actor NNs converge to small neighborhoods of their ideal values. Based on the Lyapunov stability theory, all the signals of the closed-loop system are proved to be bounded. The simulation results show the effectiveness of the proposed optimal control scheme. Building on the results in this paper, input-output constraint problems for switched pure-feedback nonlinear systems will be the focus of our future research.

CRediT authorship contribution statement

Haoyan Zhang: Conceptualization, Methodology, Writing - original draft. Huanqing Wang: Conceptualization, Visualization. Ben Niu: Conceptualization, Visualization. Liang Zhang: Conceptualization, Visualization. Adil M. Ahmad: Conceptualization, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.

Acknowledgements

The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia has funded this project,
under grant no. (FP-161-43). The authors, therefore, acknowledge with thanks DSR for technical and financial support. This
work was also partially supported by the Education Committee Liaoning Province, China (LJ2019002).

References

[1] Y. Wang, B. Niu, H. Wang, N. Alotaibi, E. Abozinadah, Neural network-based adaptive tracking control for switched nonlinear systems with prescribed
performance: an average dwell time switching approach, Neurocomputing 435 (2021) 295–306.


[2] Y. Chang, P. Zhou, B. Niu, H. Wang, N. Xu, M. Alassafi, A. Ahmad, Switched-observer-based adaptive output-feedback control design with unknown gain
for pure-feedback switched nonlinear systems via average dwell time, Int. J. Syst. Sci. (2021) 1–15, https://doi.org/10.1080/00207721.2020.1863503.
[3] N. Xu, X. Zhao, G. Zong, Y. Wang, Adaptive Control Design for Uncertain Switched Nonstrict-Feedback Nonlinear Systems to Achieve Asymptotic
Tracking Performance, Applied Mathematics and Computation 408 (126344) (2021).
[4] A. Molter, M. Rafikov, Nonlinear optimal control of population systems: applications in ecosystems, Nonlinear Dyn. 76 (2) (2014) 1141–1150.
[5] X. Tang, D. Zhai, X. Li, Adaptive fault-tolerance control based finite-time backstepping for hypersonic flight vehicle with full state constrains, Inf. Sci.
507 (2020) 53–66.
[6] H. Xiao, C. P. Chen, Time-varying Nonholonomic Robot Consensus Formation Using Model Predictive Based Protocol With Switching Topology, Inf. Sci.
567 (2021) 201–215.
[7] X. Zhao, X. Wang, L. Ma, G. Zong, Fuzzy approximation based asymptotic tracking control for a class of uncertain switched nonlinear systems, IEEE
Trans. Fuzzy Syst. 28 (4) (2019) 632–644.
[8] Z. Liu, B. Chen, C. Lin, Adaptive neural quantized control for a class of switched nonlinear systems, Inf. Sci. 537 (2020) 313–333.
[9] X. Su, L. Wu, P. Shi, C.P. Chen, Model approximation for fuzzy switched systems with stochastic perturbation, IEEE Trans. Fuzzy Syst. 23 (5) (2014)
1458–1473.
[10] Y. Li, S. Sui, S. Tong, Adaptive fuzzy control design for stochastic nonlinear switched systems with arbitrary switchings and unmodeled dynamics, IEEE
Trans. Cybern. 47 (2) (2016) 403–414.
[11] J. Lu, Z. She, W. Feng, S.S. Ge, Stabilizability of time-varying switched systems based on piecewise continuous scalar functions, IEEE Trans. Autom.
Control 64 (6) (2018) 2637–2644.
[12] R. Ma, S. An, Minimum dwell time for global exponential stability of a class of switched positive nonlinear systems, IEEE/CAA J. Autom. Sin. 6 (2) (2018)
471–477.
[13] X. Liu, S. Zhong, Q. Zhao, Dynamics of delayed switched nonlinear systems with applications to cascade systems, Automatica 87 (2018) 251–257.
[14] L. Ma, N. Xu, X. Huo, X. Zhao, Adaptive finite-time output-feedback control design for switched pure-feedback nonlinear systems with average dwell
time, Nonlinear Anal.: Hybrid Syst. 37 (2020) 100908.
[15] B. Jiang, Q. Shen, P. Shi, Neural-networked adaptive tracking control for switched nonlinear pure-feedback systems under arbitrary switching,
Automatica 61 (2015) 119–125.
[16] G. Chesi, P. Colaneri, Homogeneous rational lyapunov functions for performance analysis of switched systems with arbitrary switching and dwell time
constraints, IEEE Trans. Autom. Control 62 (10) (2017) 5124–5137.
[17] P. Zhou, L. Zhang, S. Zhang, A.F. Alkhateeb, Observer-Based Adaptive Fuzzy Finite-Time Control Design with Prescribed Performance for Switched Pure-
Feedback Nonlinear Systems, IEEE Access 9 (2021) 69481–69491.
[18] R.E. Precup, R.C. Roman, T.A. Teban, A. Albu, E.M. Petriu, C. Pozna, Model-free control of finger dynamics in prosthetic hand myoelectric-based control
systems, Stud. Inf. Control 29 (4) (2020) 399–410.
[19] A. Turnip, J. Panggabean, Hybrid controller design based magneto-rheological damper lookup table for quarter car suspension, Int. J. Artif. Intell 18 (1)
(2020) 193–206.
[20] Y. Li, N. Xu, B. Niu, Y. Chang, J. Zhao, X. Zhao, Small-gain technique-based adaptive fuzzy command filtered control for uncertain nonlinear systems
with unmodeled dynamics and disturbances, International Journal of Adaptive Control and Signal Processing 35 (9) (2021) 1664–1684.
[21] S. Wen, M.Z. Chen, Z. Zeng, X. Yu, T. Huang, Fuzzy control for uncertain vehicle active suspension systems via dynamic sliding-mode approach, IEEE
Trans. Syst. Man Cybern.: Syst. 47 (1) (2016) 24–32.
[22] S. Yin, H. Yang, H. Gao, J. Qiu, O. Kaynak, An adaptive nn-based approach for fault-tolerant control of nonlinear time-varying delay systems with
unmodeled dynamics, IEEE Trans. Neural Networks Learn. Syst. 28 (8) (2016) 1902–1913.
[23] J. Lei, H.K. Khalil, Feedback linearization for nonlinear systems with time-varying input and output delays by using high-gain predictors, IEEE Trans.
Autom. Control 61 (8) (2015) 2262–2268.
[24] D. Liu, X. Yang, H. Li, Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics, Neural
Comput. Appl. 23 (7–8) (2013) 1843–1850.
[25] Y. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with
dead-zone, IEEE Trans. Fuzzy Syst. 24 (1) (2015) 16–28.
[26] Y. Lv, J. Na, Q. Yang, X. Wu, Y. Guo, Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics, Int. J.
Control 89 (1) (2016) 99–112.
[27] B. Luo, D. Liu, H. Wu, Adaptive constrained optimal control design for data-based nonlinear discrete-time systems with critic-only structure, IEEE
Trans. Neural Networks Learn. Syst. 29 (6) (2017) 2099–2111.
[28] N. Xu, B. Niu, H. Wang, X. Huo, X. Zhao, Single-network adp for solving optimal event-triggered tracking control problem of completely unknown
nonlinear systems, Int. J. Intell. Syst., doi:10.1002/int.22491.
[29] G. Wen, C.P. Chen, S.S. Ge, H. Yang, X. Liu, Optimized adaptive nonlinear tracking control using actor–critic reinforcement learning strategy, IEEE Trans.
Ind. Inf. 15 (9) (2019) 4969–4977.
[30] V. Schmid, Solving the dynamic ambulance relocation and dispatching problem using approximate dynamic programming, Eur. J. Oper. Res. 219 (3)
(2012) 611–621.
[31] D. Liu, D. Wang, D. Zhao, Q. Wei, N. Jin, Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized
dual heuristic programming, IEEE Trans. Autom. Sci. Eng. 9 (3) (2012) 628–634.
[32] S. Bhasin, R. Kamalapurkar, M. Johnson, K.G. Vamvoudakis, F.L. Lewis, W.E. Dixon, A novel actor–critic–identifier architecture for approximate optimal
control of uncertain nonlinear systems, Automatica 49 (1) (2013) 82–92.
[33] W. Wang, X. Chen, Model-free optimal containment control of multi-agent systems based on actor-critic framework, Neurocomputing 314 (2018)
242–250.
[34] R. Palm, D. Driankov, Design of a fuzzy gain scheduler using sliding mode control principles, Fuzzy Sets Syst. 121 (1) (2001) 13–23.
[35] T. Haidegger, L. Kovács, R.E. Precup, S. Preitl, B. Benyó, Z. Benyó, Cascade control for telerobotic systems serving space medicine, IFAC Proc. Vol. 44 (1)
(2011) 3759–3764.
[36] R.M. Asl, Y.S. Hagh, R. Palm, Robust control by adaptive non-singular terminal sliding mode, Eng. Appl. Artif. Intell. 59 (2017) 205–217.
[37] C. Chiu, Derivative and integral terminal sliding mode control for a class of mimo nonlinear systems, Automatica 48 (2) (2012) 316–326.
[38] A. Nasiri, S.K. Nguang, A. Swain, D. Almakhles, Passive actuator fault tolerant control for a class of mimo nonlinear systems with uncertainties, Int. J.
Control 92 (3) (2019) 693–704.
[39] J. Fei, C. Lu, Adaptive sliding mode control of dynamic systems using double loop recurrent neural network structure, IEEE Trans. Neural Networks
Learn. Syst. 29 (4) (2017) 1275–1286.
[40] R. Cui, L. Chen, C. Yang, M. Chen, Extended state observer-based integral sliding mode control for an underwater robot with unknown disturbances and
uncertain nonlinearities, IEEE Trans. Industr. Electron. 64 (8) (2017) 6785–6795.
[41] J. Xiong, G. Zhang, Global fast dynamic terminal sliding mode control for a quadrotor uav, ISA Trans. 66 (2017) 233–240.
[42] X. Zhao, H. Yang, G. Zong, Adaptive neural hierarchical sliding mode control of nonstrict-feedback nonlinear systems and an application to electronic
circuits, IEEE Trans. Syst. Man Cybern.: Syst. 47 (7) (2016) 1394–1404.
[43] Q. Fan, G. Yang, Nearly optimal sliding mode fault-tolerant control for affine nonlinear systems with state constraints, Neurocomputing 216 (2016) 78–
88.
[44] H. Zhang, N. Xu, G. Zong, A. Alkhateeb, Adaptive fuzzy hierarchical sliding mode control of uncertain under-actuated switched nonlinear systems with
actuator faults, International Journal of Systems Science 52 (8) (2021) 1499–1514.


[45] B. Zhao, D. Liu, C. Alippi, Sliding-Mode Surface-Based Approximate Optimal Control for Uncertain Nonlinear Systems With Asymptotically Stable Critic
Structure, IEEE Trans. Cybern., doi:10.1109/TCYB.2019.2962011.
[46] J. Zhao, J. Na, G. Gao, Adaptive dynamic programming based robust control of nonlinear systems with unmatched uncertainties, Neurocomputing 395
(2020) 56–65.
[47] J. Zhao, M. Gan, Finite-horizon optimal control for continuous-time uncertain nonlinear systems using reinforcement learning, Int. J. Syst. Sci. 51 (13)
(2020) 2429–2440.
[48] L. Long, Z. Wang, J. Zhao, Switched adaptive control of switched nonlinearly parameterized systems with unstable subsystems, Automatica 54 (2015)
217–228.

$\mathbb{R}^n$: the n-dimensional Euclidean space

2. Problem formulation and preliminaries

2.1. Problem formulation

Consider a class of continuous-time switched nonlinear systems described by

2.2. Neural networks (NNs)

$$F(v) = (W^{*})^{T}\Phi(v) + \delta(v)$$
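As a rough illustration of the linear-in-parameters form $F(v) = (W^{*})^{T}\Phi(v) + \delta(v)$, the sketch below fits a weight vector so that $W^{T}\Phi(v)$ approximates a known scalar function. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian basis, the centres and width, the target function `math.sin`, and the plain batch gradient-descent fit. The leftover error after fitting plays the role of $\delta(v)$.

```python
import math

# Hypothetical basis settings (assumptions for illustration only).
CENTERS = [-2.0 + 0.5 * j for j in range(9)]  # Gaussian centres on [-2, 2]
WIDTH = 0.5                                    # common Gaussian width

def basis(v):
    """Basis vector Phi(v): one Gaussian per centre."""
    return [math.exp(-((v - c) ** 2) / (2.0 * WIDTH ** 2)) for c in CENTERS]

def nn_output(weights, v):
    """Linear-in-parameters output W^T Phi(v)."""
    return sum(w * p for w, p in zip(weights, basis(v)))

def mean_squared_error(weights, samples, target):
    """Average squared approximation error over the sample points."""
    return sum((target(v) - nn_output(weights, v)) ** 2 for v in samples) / len(samples)

def fit(samples, target, steps=500, rate=0.1):
    """Batch gradient descent on the mean squared approximation error."""
    weights = [0.0] * len(CENTERS)
    for _ in range(steps):
        grad = [0.0] * len(CENTERS)
        for v in samples:
            phi = basis(v)
            err = target(v) - sum(w * p for w, p in zip(weights, phi))
            for j, p in enumerate(phi):
                grad[j] += err * p / len(samples)
        # Move the weights along the error-weighted average of Phi(v).
        weights = [w + rate * g for w, g in zip(weights, grad)]
    return weights

SAMPLES = [-2.0 + 0.1 * i for i in range(41)]

if __name__ == "__main__":
    w = fit(SAMPLES, math.sin)
    print("MSE with zero weights:", mean_squared_error([0.0] * 9, SAMPLES, math.sin))
    print("MSE after fitting:   ", mean_squared_error(w, SAMPLES, math.sin))
```

The fit only adjusts the weights; the basis functions stay fixed, which is what makes the approximator linear in its parameters.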

Lemma 1. [7] For all $(N_1, N_2) \in \mathbb{R}^2$, the following inequality holds

3. Optimal control design

Associated with (3) and (4), the Hamiltonian is defined as

$$H_k(s, u_k, \nabla J_k(s)) = \ell_k(s, u_k) + \nabla J_k^{T}(s)\,\dot{s}$$

where $\nabla J_k(s) = \partial J_k(s)/\partial s$ is the gradient of $J_k(s)$.

Furthermore, the gradient of $J_k(s)$ can be expressed as follows

$$\nabla J_k(s) = \nabla\Phi_k^{T}(s)\,W_k + \nabla\delta_k(s) \qquad (11)$$

where $\hat{J}_k(s)$ is the estimate of $J_k(s)$, and W

Then, the ideal optimal control policy is designed as

Substituting (11) and (12) into (7), it is true that

Fig. 1. Block diagram of the proposed AC optimal control scheme.

k; $k \in X_r$, such that $|x_k(t)| \le x$

The Bellman residual error $H_k(t)$ is denoted as follows

4. Main results and stability analysis

Proof. Consider the following Lyapunov function candidate:

c c þ eT ðtÞrUk ðsÞ@k ðxÞrUT ðsÞ W

By utilizing Lemma 1, we have

According to (15), there is the following equation

Inserting (27) into (26) produces

c c þ eT ðtÞrUk ðsÞ@k ðxÞrUT ðsÞ W

By utilizing Assumptions 2 and 3, we have

It follows from (28) to (30) that

Substituting the facts that

By virtue of Lemma 1, it can be deduced that

Substituting (35) into (34), it is true that

Combining the actor updating law (21) with (36) yields

With the help of Lemma 1 and Assumption 3, we derive

Substituting (38) into (37), one can obtain that

According to Lemma 1, one has

Substituting the fact that

Then, it follows from (53) to (56) that

Fig. 2. Two inverted pendulums connected via a spring.

Parameter | Meaning | Value
$M_1$ | Mass of the first pendulum end | 1 kg

$\hat{W}_a(0) = [0.3, 0.6, 0.2, 0.1, 0.5]^{T}$,

$\hat{W}_c(0) = [0.02, 0.08, 0.04, 0.07, 0.05]^{T}$.

Fig. 3. Trajectories of system states.

Fig. 8. The control inputs $u_{1,1}$ and $u_{1,2}$.

Fig. 9. The control inputs $u_{2,1}$ and $u_{2,2}$.

Fig. 10. Switching signal.

CRediT authorship contribution statement

Declaration of Competing Interest
