
Chinese Journal of Aeronautics, (2023), 36(11): 271–280
Chinese Society of Aeronautics and Astronautics & Beihang University
Chinese Journal of Aeronautics
[email protected]
www.sciencedirect.com

Disturbance observer based actor-critic learning control for uncertain nonlinear systems

Xianglong LIANG a, Zhikai YAO b, Yaowen GE a, Jianyong YAO a,*

a School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
b College of Artificial Intelligence, Nanjing University of Post and Telecommunication, Nanjing 210023, China

Received 8 October 2022; revised 3 December 2022; accepted 17 January 2023; available online 27 June 2023

KEYWORDS: Actor-critic structure; Composite adaptation; Disturbance observer; Robot manipulator; Uncertain nonlinear system

Abstract: This paper investigates disturbance observer based actor-critic learning control for a class of uncertain nonlinear systems in the presence of unmodeled dynamics and time-varying disturbances. The proposed control algorithm integrates a filter-based design method with an actor-critic learning architecture and a disturbance observer to circumvent the unmodeled dynamics and the time-varying disturbance. To be specific, the actor network is employed to estimate the unknown system dynamics, the critic network is developed to evaluate the control performance, and the disturbance observer is leveraged to provide an efficient estimate of the compounded disturbance, which includes the time-varying disturbance and the actor-critic network approximation error. Consequently, high-gain feedback is avoided and improved tracking performance can be expected. Moreover, a composite weight adaptation law for the actor network is constructed by utilizing two types of signals: the cost function and the modeling error. Theoretical analysis demonstrates that the developed controller guarantees bounded stability. Extensive simulations and experiments on a robot manipulator validate the performance of the resulting control strategy.

© 2023 Production and hosting by Elsevier Ltd. on behalf of Chinese Society of Aeronautics and Astronautics. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* Corresponding author. E-mail address: [email protected] (J. YAO).
https://doi.org/10.1016/j.cja.2023.06.028

1. Introduction

Owing to its significance, both from a practical and a theoretical perspective, the control design of uncertain nonlinear systems has been a major research topic over the past decades.1–4 Some remarkable control approaches can be found in Refs. 5–7, including backstepping control, adaptive control, and observer-based nonlinear control, to name a few. Among them, adaptive control is an effective approach to address unknown parameters, and its combination with the backstepping technique plays an important role in the control of nonlinear systems. However, all these aforementioned methods cannot be directly applied to nonlinear systems containing a completely unknown dynamic structure, which hinders their widespread application.

In recent years, scholars have observed that the Neural Network (NN) displays an excellent ability in dealing with unknown nonlinearity due to its universal function approximation properties, and substantial control problems have been addressed by utilizing NNs.8–10 For instance, in Ref. 11, an adaptive neural tracking control problem was investigated for strict-feedback nonlinear systems with unmodeled dynamics. By introducing robust control and disturbance observer techniques, the authors in Refs. 12–13 presented robust adaptive neural control and disturbance observer based adaptive neural control, which can handle both unmodeled dynamics and time-varying disturbances.
Different from the traditional NN-based control, an advanced neural learning control strategy has further been developed according to the actor-critic learning architecture.14,15 To be specific, the actor-critic learning architecture consists of two networks: an actor network and a critic network. The actor network is leveraged to approximate an unknown function and generate the action or control signal, while the critic network is leveraged to evaluate the control performance. With its generalized learning structure, the actor-critic architecture can be easily applied to the control of other nonlinear systems.

Enlightened by the philosophy in Ref. 15, extensive control approaches have been studied that integrate the actor-critic learning architecture with traditional control approaches for unknown nonlinear systems. In Refs. 16–18, the actor-critic architecture was successfully applied to estimate the unknown nonlinearity online and achieved satisfactory results, yet these works ignore the negative influence of time-varying unknown disturbances on control performance. For practical systems (e.g., vehicular systems, robot manipulators, and unmanned aerial vehicles19–21), however, time-varying perturbations always exist. A time-varying disturbance can produce unexpected results, such as degraded control performance or even system divergence, and it is difficult to circumvent its influence with actor-critic learning control alone. Moreover, the approximation inaccuracy caused by actor-critic learning also influences the tracking performance. In this regard, the above factors motivate the combination of actor-critic control with robust control or disturbance observer based control. In Ref. 22, the actor-critic structure is used to estimate the modeling uncertainties of a small unmanned helicopter, and a discontinuous sliding mode based robust component is introduced to eliminate the influence of the actor network approximation error and the unknown disturbance. To overcome the discontinuities of the control input, a prescribed performance fault-tolerant control approach is developed in Ref. 23 by integrating the actor-critic learning scheme with a Robust Integral of the Sign of the Error (RISE) feedback term, which requires less system information and achieves asymptotic stability. However, large feedback gains are required to resist unknown disturbances in robust control, which reduce the stability margin, may stimulate high-frequency dynamics, and can then result in system instability. Inspired by feedforward design, disturbance observer based control24–27 can be used to estimate the impact of a disturbance and then compensate for it. In Refs. 28,29, a reinforcement learning based controller integrated with a disturbance observer is established to reject real-time external disturbances; it can guarantee robust stability and nominal performance even for an uncertain plant, and obtains satisfactory results in numerical simulation. However, both of these works merely focus on the regulation control problem.

Inspired by the aforementioned challenges, a disturbance observer based actor-critic learning control is developed for a class of uncertain nonlinear systems in the presence of unknown dynamics and time-varying disturbances. To cope with the unknown dynamics, the actor-critic learning architecture provides feedforward compensation by approximating the unknown nonlinearity. Considering the effect of the time-varying disturbance and the actor-critic network approximation error on tracking performance, the actor-critic learning algorithm is combined with the disturbance observer to circumvent these effects. In addition, a composite weight adaptation law for the actor network is constructed by utilizing two types of signals: the cost function and the modeling error. Consequently, high-gain feedback is avoided and improved tracking performance can be achieved. Eventually, extensive simulations and experiments on a robot manipulator are implemented to validate the performance of the resulting control strategy.

The key contributions of this paper are listed as follows:

(1) An actor-critic learning architecture is developed to estimate the unknown system dynamics online, which requires less model information and effectively improves robustness to unmodeled dynamics.

(2) A disturbance observer is effectively combined to compensate for the time-varying disturbance and the actor-critic network approximation error, which avoids high-gain feedback and achieves improved tracking performance.

To the best of our knowledge, few studies have integrated the actor-critic learning architecture with a disturbance observer for tracking control of uncertain nonlinear systems with unmodeled dynamics and time-varying unknown disturbances.

The remainder of this paper is organized as follows. The problem description is provided in Section 2. Section 3 states the disturbance observer based actor-critic learning control scheme and the system stability analysis. Simulation and experimental studies on a single-link robot are provided in Section 4, and conclusions are drawn in Section 5.

2. Problem description

Consider a class of nth-order Multiple Input Multiple Output (MIMO) nonlinear systems of the following form:

$$
\begin{cases}
\dot{x}_i = x_{i+1}, & i = 1, 2, \ldots, n-1\\
\dot{x}_n = g(x)u + f(x) + d(t)\\
y(t) = x_1
\end{cases}
\tag{1}
$$

where $x = [x_1^T, x_2^T, \ldots, x_n^T]^T \in \mathbb{R}^{mn}$ denotes the system state vector with $x_i \in \mathbb{R}^m$, which is assumed to be available for measurement; $u \in \mathbb{R}^m$ is the control input; $y \in \mathbb{R}^m$ is the system output; $f \in \mathbb{R}^m$ is the unknown smooth nonlinear function; $g(x)$ is the known nonzero gain function; and $d(t) \in \mathbb{R}^m$ is the time-varying disturbance.

The main control objective of this study is to propose a disturbance observer based actor-critic learning control strategy that achieves high tracking accuracy in the presence of unmodeled dynamics and time-varying unknown disturbances. To facilitate the presentation, some related assumptions and lemmas are necessary.

Assumption 1. The reference trajectory $y_r(t) \in \mathbb{R}^m$ and its derivatives up to the $n$th order, $y_r^{(n)}$, are available, smooth, and bounded.

Assumption 2. The time-varying disturbance $d(t)$ and its first derivative are bounded, i.e., $\|d\| \le d_m$ and $\|\dot{d}\| \le \bar{d}_m$.
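To make the problem setting concrete, the following minimal simulation sketch instantiates the system class of Eq. (1) for $n = 2$ and $m = 2$. The particular choices of `f`, `g`, and `d` are illustrative placeholders (not the plant of Section 4), and the forward-Euler integration is an assumption made only for this sketch.

```python
import numpy as np

# Illustrative instance of the system class in Eq. (1) with n = 2, m = 2.
def f(x1, x2):
    """Unknown smooth nonlinearity f(x) (placeholder example)."""
    return -0.5 * x2 - 0.2 * np.sin(x1)

def g(x1):
    """Known nonzero input-gain function g(x) (placeholder example)."""
    return np.eye(2)

def d(t):
    """Time-varying disturbance d(t); bounded with bounded derivative
    as required by Assumption 2 (placeholder example)."""
    return 0.1 * np.array([np.sin(t), np.cos(2.0 * t)])

def plant_step(x1, x2, u, t, dt=1e-3):
    """One forward-Euler step of Eq. (1):
    x1_dot = x2, x2_dot = g(x)u + f(x) + d(t), y = x1."""
    x1_next = x1 + dt * x2
    x2_next = x2 + dt * (g(x1) @ u + f(x1, x2) + d(t))
    return x1_next, x2_next
```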
Lemma 1. The NN universal approximation property indicates that a continuous function $\Phi: S \to \mathbb{R}^{N_1}$ ($S$ a compact set) can be approximated as

$$ \Phi(x) = W^T \varphi(x) + \varepsilon(x) \tag{2} $$

where $x \in \mathbb{R}^{N_2}$ is the input vector, $W \in \mathbb{R}^{N_3 \times N_1}$ is the ideal weight matrix, and $N_1$, $N_2$ and $N_3$ are the numbers of neurons in the output, input, and hidden layers, respectively; $\varphi(x) \in \mathbb{R}^{N_3}$ is the nonlinear activation function. According to Ref. 22, the ideal NN weights, the nonlinear activation function, and the approximation error are assumed to be bounded by $\|W\| \le W_m$, $\|\varphi\| \le \varphi_m$, $\|\dot{\varphi}\| \le \bar{\varphi}_m$, $\|\varepsilon\| \le \varepsilon_m$, $\|\dot{\varepsilon}\| \le \bar{\varepsilon}_m$.

3. Main results

In this section, the disturbance observer based actor-critic learning control strategy for the uncertain nonlinear system Eq. (1) is presented. First, we provide the backstepping controller design based on a filter-based design approach. Then, we design the actor-critic network to deal with the unknown nonlinear dynamics, where the critic network is leveraged to evaluate the control performance while the actor network is leveraged to approximate the unknown function. The architecture of the developed control scheme is depicted in Fig. 1.

Fig. 1 Structure of disturbance observer based actor-critic learning control strategy.

3.1. Controller design

To quantify the aforementioned control objective, the tracking error $z_1 \in \mathbb{R}^m$ is defined as $z_1 = y - y_r$, and the following filtered tracking errors are introduced to facilitate the controller design:

$$
\begin{cases}
z_2 = \dot{z}_1 + k_1 z_1\\
z_i = \dot{z}_{i-1} + k_{i-1} z_{i-1}, & i = 3, 4, \ldots, n
\end{cases}
\tag{3}
$$

where $k_1, k_2, \ldots, k_{n-1} \in \mathbb{R}$ denote positive control gains. By substituting Eq. (1) into Eq. (3), the dynamics of the filtered tracking error $z_n$ can be written as

$$
\dot{z}_n = g(x)u + f(x) + d - y_r^{(n)} + k_1 z_1^{(n-1)} + k_2 z_2^{(n-2)} + \cdots + k_{n-1} z_{n-1}^{(1)} = g(x)u + f(x) + d - y_r^{(n)} + \bar{k}\bar{z}
\tag{4}
$$

where $\bar{k} = [k_1, k_2, \ldots, k_{n-1}]$ and $\bar{z} = [z_1^{(n-1)}, z_2^{(n-2)}, \ldots, z_{n-1}^{(1)}]^T$, and the unknown nonlinear function $f(x)$ can be approximated by an NN-based actor network

$$ f(x_a) = W_a^T \varphi(x_a) + \varepsilon_a \tag{5} $$

where $x_a = [x_1^T, x_2^T, \ldots, x_n^T]^T$ denotes the input vector, $W_a$ denotes the weight vector of the actor network, and $\varepsilon_a$ denotes the actor network function reconstruction inaccuracy, which satisfies $\|\varepsilon_a\| \le \varepsilon_{am}$ and $\|\dot{\varepsilon}_a\| \le \bar{\varepsilon}_{am}$. Defining $\bar{f} = \varepsilon_a + d$, the expression in Eq. (4) can be further written as

$$ \dot{z}_n = g(x)u + \hat{W}_a^T \varphi_a - \tilde{W}_a^T \varphi_a + \bar{f} - y_r^{(n)} + \bar{k}\bar{z} \tag{6} $$

where $\tilde{W}_a = \hat{W}_a - W_a$ and $\hat{W}_a$ is the estimate of $W_a$, which will be introduced later.

Generally, time-varying external disturbances can be estimated by model-based disturbance observers.30 Herein, the time-varying external disturbance and the residual function reconstruction inaccuracy of the actor network are lumped together as $\bar{f}$. The adaptive neural disturbance observer for estimating the lumped disturbance $\bar{f}$ is designed by using the neural network approximation:

$$
\begin{cases}
\dot{h} = -a(h + a x_n) - a\left(\hat{W}_a^T \varphi_a + g(x)u\right)\\
\hat{f} = h + a x_n
\end{cases}
\tag{7}
$$

where $h$ is an internal observer state and $a \in \mathbb{R}$ is a positive constant. Therefore, the control input $u$ can be given by

$$ u = g^{-1}(x)\left(y_r^{(n)} - \bar{k}\bar{z} - k_n z_n - \hat{W}_a^T \varphi_a - \hat{f}\right) \tag{8} $$

where $k_n \in \mathbb{R}$ denotes a positive control gain. Substituting Eq. (8) into Eq. (6), the dynamics of the filtered error $z_n$ can be rewritten as

$$ \dot{z}_n = -k_n z_n - \tilde{W}_a^T \varphi_a + \tilde{f} \tag{9} $$

with $\tilde{f} = \bar{f} - \hat{f}$, whose dynamics satisfy

$$ \dot{\tilde{f}} = -a\tilde{f} + a\tilde{W}_a^T \varphi_a + \dot{\bar{f}} \tag{10} $$
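A minimal discrete-time sketch of the controller in Eqs. (3), (7), and (8) for the second-order case ($n = 2$) follows. The Gaussian RBF feature map, the Euler discretization, and the gain values are illustrative assumptions; the paper does not specify its basis functions.

```python
import numpy as np

def rbf(x, centers, width=1.0):
    """Gaussian RBF feature vector phi(x); an assumed instance of the
    nonlinear activation in Eq. (2)."""
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * width ** 2))

def control_step(x1, x2, yr, yr_dot, yr_ddot, h, W_a, centers, g_x,
                 k1=30.0, k2=10.0, a=20.0, dt=1e-3):
    """One step of the filtered-error controller with the adaptive neural
    disturbance observer, Eqs. (3), (7), (8), for n = 2."""
    # Eq. (3): z1 = y - y_r, z2 = z1_dot + k1*z1
    z1, z1_dot = x1 - yr, x2 - yr_dot
    z2 = z1_dot + k1 * z1

    phi_a = rbf(np.concatenate([x1, x2]), centers)   # actor features phi(x_a)
    f_hat = h + a * x2                               # Eq. (7): lumped-disturbance estimate

    # Eq. (8): u = g^{-1}(x)(yr^(n) - k_bar*z_bar - k_n*z_n - W_a^T phi_a - f_hat);
    # for n = 2 the term k_bar*z_bar reduces to k1*z1_dot.
    u = np.linalg.solve(g_x, yr_ddot - k1 * z1_dot - k2 * z2
                        - W_a.T @ phi_a - f_hat)

    # Eq. (7): h_dot = -a(h + a*x_n) - a(W_a^T phi_a + g(x)u), Euler-integrated
    h = h + dt * (-a * (h + a * x2) - a * (W_a.T @ phi_a + g_x @ u))
    return u, h, z1, z2, phi_a
```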


Remark 1. Apart from the external disturbance, the residual function reconstruction inaccuracy of the actor network is also estimated by the designed adaptive neural disturbance observer, which is different from conventional disturbance observers.30 The performance and robustness of the controlled plant can be greatly improved by utilizing the adaptive neural disturbance observer for disturbance compensation.

3.2. Critic network design

The critic network is utilized to provide the evaluation function for the current strategy, which can test the performance of the current policy and generate rewards/punishments as feedback for adaptive learning. Therefore, we introduce the infinite-horizon performance index function as follows:

$$ V(t) = \int_t^{\infty} \exp[-c(s-t)]\, r(s)\, \mathrm{d}s \tag{11} $$

where $c > 0$ represents a discount factor, which can guarantee the boundedness of the cost function even if the reference trajectory does not converge to zero, and $r(t)$ represents an instantaneous cost function

$$ r(t) = z_1^T Q z_1 + u^T R u \tag{12} $$

where $Q \in \mathbb{R}^{m \times m}$ and $R \in \mathbb{R}^{m \times m}$ are the weighting matrices for the lumped tracking error $z_1$ and the control input $u$, respectively.

To achieve optimal control, the cost-to-go function is supposed to be minimized. Given that it is difficult to obtain the cost-to-go function, an NN-based critic network is introduced:

$$ V(x_c) = W_c^T \varphi(x_c) + \varepsilon_c \tag{13} $$

where $x_c = z_1$ denotes the input vector, $W_c$ denotes the weight vector of the critic network, and $\varepsilon_c$ denotes the critic network function reconstruction inaccuracy, which satisfies $\|\varepsilon_c\| \le \varepsilon_{cm}$ and $\|\dot{\varepsilon}_c\| \le \bar{\varepsilon}_{cm}$. The cost-to-go function can be approximated by

$$ \hat{V}(x_c) = \hat{W}_c^T \varphi(x_c) \tag{14} $$

where $\hat{W}_c$ is the estimate of $W_c$.

The weight vector $\hat{W}_c$ is selected to minimize the objective function $E_c = 0.5 e_c^T e_c$, and according to Eq. (11) and Eq. (12), the prediction error $e_c(\cdot)$ can be expressed as

$$ e_c = r(t) + \dot{\hat{V}}(t) - c\hat{V}(t) \tag{15} $$

and the critic network weight parameters are updated by the following update law:

$$ \dot{\hat{W}}_c = -k_{c1} e_c \left(-c\varphi_c + \nabla\varphi_c \dot{x}_c\right) - k_{c2}\hat{W}_c = -k_{c1}\left(r + \hat{W}_c^T \Lambda\right)\Lambda - k_{c2}\hat{W}_c \tag{16} $$

where $k_{c1}, k_{c2}$ are positive parameters and $\Lambda = -c\varphi_c + \nabla\varphi_c \dot{x}_c$.

3.3. Actor network design

The actor network is leveraged to estimate the unknown function $f(x)$ in Eq. (4), and it can generate the appropriate control policy by gradually accumulating system experience. The approximation of $f(x)$ is designed as follows:

$$ \hat{f}(x_a) = \hat{W}_a^T \varphi(x_a) \tag{17} $$

Define a prediction error $e_a(\cdot)$ as

$$ e_a = \Lambda_V(\hat{V} - V_d) + \hat{W}_a^T \varphi_a \tag{18} $$

where $\Lambda_V \in \mathbb{R}^m$ is a positive design parameter and $V_d = 0$ is the ideal value of the cost-to-go. The weight vector $\hat{W}_a$ is selected to minimize the objective function $E_a = 0.5 e_a^T e_a$, and the actor network weight parameters are updated by the following update law:

$$ \dot{\hat{W}}_a = -k_{a1}\varphi_a\left(\Lambda_V \hat{V} + \hat{W}_a^T \varphi_a\right) - k_{a2}\hat{W}_a \tag{19} $$

with positive parameters $k_{a1}$ and $k_{a2}$. To further improve the convergence of the estimated weights $\hat{W}_a$ and the function approximation precision of the actor network, another prediction error $\tilde{x}_n$, named the modeling error,24 is defined as $\tilde{x}_n = \hat{x}_n - x_n$, in which $\hat{x}_n$ can be obtained by constructing the following serial-parallel estimation model31:

$$
\begin{cases}
\dot{\hat{x}}_i = \hat{x}_{i+1}, & i = 1, 2, \ldots, n-1\\
\dot{\hat{x}}_n = g(x)u + \hat{W}_a^T \varphi_a + \hat{f} - b\tilde{x}_n
\end{cases}
\tag{20}
$$

and the dynamic equation of $\tilde{x}_n$ is written as

$$ \dot{\tilde{x}}_n = -b\tilde{x}_n - \tilde{W}_a^T \varphi_a - \tilde{f} \tag{21} $$

in which $b \in \mathbb{R}$ is a positive constant.

Therefore, the actor network weight parameters are adjusted by the following composite update law:

$$ \dot{\hat{W}}_a = -k_{a1}\varphi_a\left(\Lambda_V \hat{V} + \hat{W}_a^T \varphi_a\right) - k_{a2}\hat{W}_a - k_{a3}\varphi_a \tilde{x}_n^T \tag{22} $$

where $k_{a3}$ is a positive parameter.

Remark 2. Different from traditional actor network weight updating,18,22,23 a composite weight adaptation law for the actor network is constructed by using both the prediction error $e_a$ and the modeling error $\tilde{x}_n$, which ensures that the estimated weights $\hat{W}_a$ converge better to the unknown weights $W_a$ and that a more precise approximation of the nonlinear function is achieved.32

3.4. Stability analysis

Theorem 1. Consider the nonlinear system Eq. (1) in the presence of unmodeled dynamics and time-varying disturbances. If the control input Eq. (8), the critic network weight adaptive law Eq. (16), the actor network weight adaptive law Eq. (22), and the adaptive neural disturbance observer Eq. (7) are designed as above, then all system signals are bounded. Proof details are given in Appendix A.
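The learning laws above can be summarized in a short discrete-time sketch. The Euler steps, the rate values (matching Section 4), and the per-output-channel (vector-valued) cost estimate — chosen to match the $\hat{W}_c = \mathrm{zeros}(10, 2)$ initialization used later — are implementation assumptions, not the authors' code.

```python
import numpy as np

def critic_update(W_c, phi_c, dphi_dxc, xc_dot, r,
                  c=0.1, kc1=2.0, kc2=0.1, dt=1e-3):
    """One Euler step of the critic law, Eq. (16), with
    Lambda = -c*phi_c + grad(phi_c) @ xc_dot as defined below Eq. (16).
    dphi_dxc: Jacobian of the feature vector w.r.t. x_c (N3 x m)."""
    Lam = -c * phi_c + dphi_dxc @ xc_dot
    e_c = r + W_c.T @ Lam                  # prediction error, Eq. (15)
    W_c = W_c + dt * (-kc1 * np.outer(Lam, e_c) - kc2 * W_c)
    return W_c, e_c

def actor_update(W_a, phi_a, V_hat, Lam_V, x_tilde,
                 ka1=20.0, ka2=1.0, ka3=5.0, dt=1e-3):
    """One Euler step of the composite actor law, Eq. (22): gradient term
    from e_a (Eq. (18), V_d = 0), leakage term, and modeling-error term."""
    e_a = Lam_V * V_hat + W_a.T @ phi_a
    W_a = W_a + dt * (-ka1 * np.outer(phi_a, e_a)
                      - ka2 * W_a
                      - ka3 * np.outer(phi_a, x_tilde))
    return W_a

def serial_parallel_step(x1_hat, x2_hat, x2, u, g_x, W_a, phi_a, f_hat,
                         b=100.0, dt=1e-3):
    """Serial-parallel estimation model, Eq. (20), for n = 2; returns the
    modeling error x_tilde_n = x_hat_n - x_n used in Eq. (22)."""
    x_tilde = x2_hat - x2
    x1_hat = x1_hat + dt * x2_hat
    x2_hat = x2_hat + dt * (g_x @ u + W_a.T @ phi_a + f_hat - b * x_tilde)
    return x1_hat, x2_hat, x_tilde
```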
4. Simulation and experiments

4.1. Simulation

To substantiate the feasibility and effectiveness of the developed control strategy, we consider a two-degree-of-freedom robot manipulator (see Ref. 32) with the following dynamic equation:

$$ M(q)\ddot{q} + C(q, \dot{q})\dot{q} + F(\dot{q}) + \tau_d = \tau \tag{23} $$

where $q, \dot{q}, \ddot{q}$ denote the position, velocity, and acceleration, respectively; $M(q)$ is the inertia matrix; $C(q, \dot{q})$ is the centripetal-Coriolis matrix; $F(\dot{q})$ is the friction; $\tau_d$ is the external disturbance; and $\tau$ is the control input.

The matrices $M(q)$, $C(q, \dot{q})$, $F(\dot{q})$ and $\tau_d$ are given as follows:

$$ M(q) = \begin{bmatrix} p_1 + 2p_3\cos(q_2) & p_2 + p_3\cos(q_2)\\ p_2 + p_3\cos(q_2) & p_2 \end{bmatrix} $$

$$ C(q, \dot{q}) = \begin{bmatrix} -p_3\sin(q_2)\dot{q}_2 & -p_3\sin(q_2)(\dot{q}_1 + \dot{q}_2)\\ p_3\sin(q_2)\dot{q}_1 & 0 \end{bmatrix} $$

$$ F(\dot{q}) = \begin{bmatrix} f_{d1} & 0\\ 0 & f_{d2} \end{bmatrix}\begin{bmatrix} \dot{q}_1\\ \dot{q}_2 \end{bmatrix}, \quad \tau_d = \begin{bmatrix} \tau_{d1}\\ \tau_{d2} \end{bmatrix} $$

where $p_1 = 3.473\ \mathrm{kg\cdot m^2}$, $p_2 = 0.196\ \mathrm{kg\cdot m^2}$, $p_3 = 0.242\ \mathrm{kg\cdot m^2}$, $f_{d1} = 5.3\ \mathrm{N\cdot m\cdot s}$, $f_{d2} = 1.1\ \mathrm{N\cdot m\cdot s}$, $\tau_{d1} = 3\sin(t)$ and $\tau_{d2} = 0.2\sin(t)$.

Then the dynamics in Eq. (23) can be transformed into the state-space form considered in this paper, i.e.,

$$
\begin{cases}
\dot{x}_1 = x_2\\
\dot{x}_2 = g(x)u + f(x) + d(t)
\end{cases}
\tag{24}
$$

with $x_1 = [q_1, q_2]^T$, $x_2 = [\dot{q}_1, \dot{q}_2]^T$, $g(x) = M^{-1}(x_1)$, $u = \tau$, $f(x) = -M^{-1}[C(x_1, x_2)x_2 + F(x_2)]$ and $d(t) = -M^{-1}\tau_d$.

The following two control strategies are compared to validate the effectiveness of the proposed approach:

Controller 1. This is the proposed controller, or more specifically, actor-critic learning control integrated with the disturbance observer. The control parameters are chosen as $k_1 = 30$, $k_2 = 10$, $a = 20$, $b = 100$, $k_{c1} = 2$, $k_{c2} = 0.1$, $k_{a1} = 20$, $k_{a2} = 1$ and $k_{a3} = 5$. The initial weights of the actor-critic networks are chosen as $\hat{W}_a = \mathrm{zeros}(10, 4)$ and $\hat{W}_c = \mathrm{zeros}(10, 2)$. The discount factor is chosen as $c = 0.1$, and the positive matrices $Q$ and $R$ in the cost function are selected as $Q = \mathrm{diag}([50, 200])$ and $R = \mathrm{diag}([0.1, 0.1])$, respectively.

Controller 2. This is the actor-critic learning control approach without disturbance feedforward compensation. To ensure a fair comparison, the selected control parameters are consistent with Controller 1.

The reference trajectories of the two joints are chosen as $y_{r1} = 0.6\sin(3.14t)[1 - \exp(-t)]$ and $y_{r2} = 0.8\sin(3.14t)[1 - \exp(-t)]$. The simulation results are depicted in Figs. 2–6. As depicted in Fig. 2 and Fig. 3, Controller 1 can follow the reference signal well and achieves the best tracking performance in terms of convergence speed and steady tracking error, since the disturbance observer is introduced. Over the last 20 s, the maximum amplitude of the steady tracking error is $M_{z1} = [0.0009, 0.0058]\ \mathrm{rad}$ under Controller 1, while it is $M_{z1} = [0.0021, 0.0103]\ \mathrm{rad}$ under Controller 2.

The results in Figs. 4 and 5 depict the compound estimation of $f + d$. It can be found that the composite estimation architecture established by actor-critic learning and the disturbance observer approximates the unmodeled dynamics and the time-varying disturbance well in comparison with the estimation architecture using actor-critic learning only. This phenomenon explains why accurate feedforward compensation results in higher tracking accuracy. Eventually, the control inputs are shown in Fig. 6; they are regular and bounded.

Fig. 2 Tracking performance with the proposed Controller 1.

Fig. 3 Tracking errors for Joints 1 and 2 under Controller 1 and Controller 2.

Fig. 4 Compound estimation $(f+d)[1]$ for Joint 1 under Controller 1 and Controller 2.

Fig. 5 Compound estimation $(f+d)[2]$ for Joint 2 under Controller 1 and Controller 2.

Fig. 6 Control inputs of two joints under Controller 1 and Controller 2.
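For reference, the simulated plant of Eqs. (23)–(24) can be transcribed directly as below; this is a sketch using the parameter values listed above, with NumPy as an assumed implementation choice.

```python
import numpy as np

P1, P2, P3 = 3.473, 0.196, 0.242        # inertia parameters, kg*m^2
FD = np.diag([5.3, 1.1])                # viscous-friction coefficients, N*m*s

def manipulator_terms(q, qd, t):
    """M, C, F, tau_d of the 2-DOF manipulator in Eq. (23)."""
    c2, s2 = np.cos(q[1]), np.sin(q[1])
    M = np.array([[P1 + 2*P3*c2, P2 + P3*c2],
                  [P2 + P3*c2,   P2        ]])
    C = np.array([[-P3*s2*qd[1], -P3*s2*(qd[0] + qd[1])],
                  [ P3*s2*qd[0],  0.0                  ]])
    F = FD @ qd
    tau_d = np.array([3.0*np.sin(t), 0.2*np.sin(t)])
    return M, C, F, tau_d

def state_space_terms(x1, x2, t):
    """g(x), f(x), d(t) of Eq. (24) obtained from Eq. (23):
    g = M^{-1}, f = -M^{-1}(C x2 + F), d = -M^{-1} tau_d."""
    M, C, F, tau_d = manipulator_terms(x1, x2, t)
    Minv = np.linalg.inv(M)
    return Minv, -Minv @ (C @ x2 + F), -Minv @ tau_d
```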
4.2. Experiments

To further substantiate the superiority of the developed control strategy, an experiment was conducted on the single-degree-of-freedom robot manipulator platform shown in Fig. 7. The test rig includes a bench case, a motor actuator (consisting of a Kollmorgen DH063A DC motor, a Kollmorgen ServoStar 620 electrical driver, a Heidenhain ERN180 rotary encoder, and a revolute joint), a link, a payload, and a control module. The control module consists of real-time control software built on an Advantech PCI-1723 card and a Heidenhain IK-220 counter card, together with monitoring software. The sampling time is 0.5 ms.

Fig. 7 Single-degree-of-freedom robot manipulator platform.

The dynamics of the single-degree-of-freedom robot manipulator can also be written in the state-space form of Eq. (24) with $x_1 = q$, $x_2 = \dot{q}$, $x = [x_1, x_2]^T$, $g(x) = J^{-1}$, $u = \tau$, $f(x) = -J^{-1}[F(\dot{q}) + G(q)]$ and $d(t) = -J^{-1}\tau_d$, where $q, \dot{q}$ denote the position and velocity, respectively; $F(\dot{q})$ is the unknown friction; $G(q)$ is the unknown gravity; $\tau_d$ is the external disturbance; $\tau$ is the control input; and $J = J_r + J_l + J_p$ is the total moment of inertia, with joint moment of inertia $J_r$, link moment of inertia $J_l = m_l L^2/3$, and payload moment of inertia $J_p = m_p L^2$. The system parameters are $J_r = 0.3\ \mathrm{kg\cdot m^2}$, $L = 0.5\ \mathrm{m}$, $m_l = 0.5\ \mathrm{kg}$ and $m_p = 0$–$1\ \mathrm{kg}$.

Likewise, the two aforementioned controllers are tested in the experiments. The reference signal is chosen as $y_r = 10[1 - \cos(3.14t)][1 - \exp(-t)]$, and the control parameters are chosen as $k_1 = 150$, $k_2 = 50$, $a = 10$, $b = 50$, $k_{c1} = 2$, $k_{c2} = 0.5$, $k_{a1} = 20$, $k_{a2} = 2$ and $k_{a3} = 5$. The initial weights of the actor-critic networks are chosen as $\hat{W}_a = \mathrm{zeros}(10, 2)$ and $\hat{W}_c = \mathrm{zeros}(10, 1)$. The discount factor is chosen as $c = 0.1$, and the positive matrices $Q$ and $R$ in the cost function are selected as $Q = [50]$ and $R = [1]$, respectively. In this scenario, the control performance is tested with different payloads: $m_p = 0\ \mathrm{kg}$, $m_p = 0.5\ \mathrm{kg}$ and $m_p = 1\ \mathrm{kg}$. Furthermore, to quantitatively evaluate the tracking performance of the controllers, three performance indices (the maximum $M_z$, average $\mu$, and standard deviation $\sigma$ of the tracking error) from Ref. 33 are introduced.

Case 1. The two controllers are compared under the no-load condition with $m_p = 0\ \mathrm{kg}$. The tracking errors and the performance indices (over the last 20 s) of the two controllers are presented in Fig. 8. From these results, it can be observed that the tracking performance of Controller 1 is improved compared with that of Controller 2, since the disturbance observer is integrated with the actor-critic learning control; the compound estimate $\hat{f} + \hat{d}$ produced by the actor network and the disturbance observer is depicted in Fig. 9.

Fig. 8 System tracking errors of Controller 1 and Controller 2 under no-load condition with $m_p = 0$ kg.

Fig. 9 Compound estimation $\hat{f} + \hat{d}$ of Controller 1 under no-load condition with $m_p = 0$ kg.
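The three indices can be computed from sampled tracking-error data as sketched below. The specific formulas (maximum, mean, and standard deviation of the absolute error over the evaluation window) are stated here as an assumption following the usual usage of Ref. 33, since the paper does not reproduce them.

```python
import numpy as np

def performance_indices(z1, window=None):
    """Maximum M_z, average mu, and standard deviation sigma of the
    absolute tracking error; assumed definitions following Ref. 33."""
    e = np.abs(np.asarray(z1))
    if window is not None:              # e.g., samples from the last 20 s
        e = e[-window:]
    M_z = e.max()
    mu = e.mean()
    sigma = np.sqrt(np.mean((e - mu) ** 2))
    return M_z, mu, sigma
```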
Case 2. The two controllers are compared under the light-load condition with $m_p = 0.5\ \mathrm{kg}$. The tracking errors and the performance indices are shown in Fig. 10. It can be seen that the performance of Controller 1 still outperforms that of Controller 2. In comparison with the no-load condition, $M_z$ of Controller 1 changed very little, increasing by only 1.6%, while that of Controller 2 increased by about two times; this phenomenon illustrates that actor-critic learning control alone is not enough to compensate for the impact of load change on the system. In addition, the result of the compound estimation is depicted in Fig. 11. Compared with Fig. 9, it can be observed that as the payload becomes larger, the proposed composite estimation scheme adapts well to the change of system uncertainty, which further illustrates that Controller 1 can effectively compensate for the influence of load change.

Fig. 10 System tracking errors of Controller 1 and Controller 2 under light load condition with $m_p = 0.5$ kg.

Fig. 11 Compound estimation $\hat{f} + \hat{d}$ of Controller 1 under light load condition with $m_p = 0.5$ kg.

Case 3. The two controllers are compared under the heavy-load condition with $m_p = 1\ \mathrm{kg}$. The tracking errors and the performance indices are shown in Fig. 12. Likewise, the performance of Controller 1 still outperforms that of Controller 2; compared with the no-load condition, $M_z$ of Controller 1 increased by 5.5%, while that of Controller 2 increased by about 2.3 times, which further illustrates that Controller 1 can effectively compensate for the impact of load change on the system. The result of the compound estimation is depicted in Fig. 13. Compared with Fig. 9, it can be observed that as the payload becomes larger, Controller 1 still adapts well to the change of system uncertainty.

Fig. 12 System tracking errors of Controller 1 and Controller 2 under heavy load condition with $m_p = 1$ kg.

Fig. 13 Compound estimation $\hat{f} + \hat{d}$ of Controller 1 under heavy load condition with $m_p = 1$ kg.
In order to more intuitively observe the impact of load change on the tracking performance of Controller 1 and Controller 2, the $M_z$ values of the two controllers under different payloads are collected together, as shown in Fig. 14. It is evident that Controller 1 can effectively compensate for the impact of load change on the system and maintain its tracking performance, while the tracking performance of Controller 2 gradually deteriorates as the payload increases. Therefore, the proposed Controller 1 is robust to unknown uncertainties and achieves improved tracking performance.

Fig. 14 $M_z$ of two controllers under different payloads.

5. Conclusions

In this paper, disturbance observer based actor-critic learning control has been investigated for a class of nonlinear systems in the presence of unknown dynamics and time-varying disturbances. A composite weight adaptation law for the actor network is constructed from both the cost function and the modeling error, and a disturbance observer component is combined to compensate for the residual function reconstruction inaccuracy caused by the actor network and for the time-varying disturbance. Extensive simulations and experiments on a robot manipulator show that the developed disturbance observer based actor-critic learning control strategy can effectively circumvent the influence of unknown system dynamics and time-varying disturbances, and that higher tracking accuracy can be achieved. Considering that full state information is required to implement the developed control strategy, we will explore an output feedback control approach for uncertain nonlinear systems in the future.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2021YFB2011300) and the National Natural Science Foundation of China (No. 52075262).

Appendix A. Proof of Theorem 1

Consider the following Lyapunov candidate function:

$$ L = L_1 + L_2 + L_3 + L_4 + L_5 \tag{A1} $$

where

$$ L_1 = \sum_{i=1}^{n}\frac{1}{2}z_i^T z_i,\quad L_2 = \frac{1}{2}\tilde{f}^T\tilde{f},\quad L_3 = \frac{1}{2}\mathrm{tr}\left(\tilde{W}_c^T\tilde{W}_c\right),\quad L_4 = \frac{1}{2}\mathrm{tr}\left(\tilde{W}_a^T\tilde{W}_a\right),\quad L_5 = \frac{1}{2}\tilde{x}_n^T\tilde{x}_n \tag{A2} $$

Using Eq. (3) and Eq. (9), the derivative of $L_1$ can be expressed as

$$
\begin{aligned}
\dot{L}_1 ={}& -k_1 z_1^T z_1 + z_1^T z_2 - k_2 z_2^T z_2 + z_2^T z_3 + \cdots - k_{n-1} z_{n-1}^T z_{n-1} + z_{n-1}^T z_n - k_n z_n^T z_n + z_n^T\!\left(-\tilde{W}_a^T\varphi_a + \tilde{f}\right)\\
\le{}& -\left(k_1 - \tfrac{1}{2}\right)\|z_1\|^2 - (k_2 - 1)\|z_2\|^2 - \cdots - \left(k_n - \tfrac{3}{2}\right)\|z_n\|^2 + \tfrac{1}{2}\|\tilde{f}\|^2 + \tfrac{1}{2}\|\tilde{W}_a\|^2\|\varphi_a\|^2
\end{aligned}
\tag{A3}
$$

Using Eq. (10), the derivative of $L_2$ can be expressed as

$$
\begin{aligned}
\dot{L}_2 &= -a\tilde{f}^T\tilde{f} + a\tilde{f}^T\tilde{W}_a^T\varphi_a + \tilde{f}^T\dot{\bar{f}}\\
&\le -a\|\tilde{f}\|^2 + \tfrac{a}{2}\|\tilde{f}\|^2 + \tfrac{a}{2}\|\tilde{W}_a\|^2\|\varphi_a\|^2 + \tfrac{1}{2}\|\tilde{f}\|^2 + \tfrac{1}{2}\|\dot{\bar{f}}\|^2\\
&\le -\tfrac{1}{2}(a-1)\|\tilde{f}\|^2 + \tfrac{a}{2}\|\tilde{W}_a\|^2\|\varphi_a\|^2 + \tfrac{1}{2}\|\dot{\bar{f}}\|^2
\end{aligned}
\tag{A4}
$$

Using Eq. (13) and Eq. (16), the derivative of $L_3$ can be expressed as

$$
\begin{aligned}
\dot{L}_3 &= -k_{c1}\tilde{W}_c^T\left(r(t) + \hat{W}_c^T\Lambda\right)\Lambda - k_{c2}\tilde{W}_c^T\hat{W}_c\\
&= -k_{c1}\tilde{W}_c^T\left(\tilde{W}_c^T\Lambda + \bar{\varepsilon}_c\right)\Lambda - k_{c2}\tilde{W}_c^T\left(\tilde{W}_c + W_c\right)\\
&\le -\tfrac{1}{2}\left(k_{c1}\lambda_{\min}(\Lambda\Lambda^T) + k_{c2}\right)\|\tilde{W}_c\|^2 + \tfrac{k_{c1}}{2}\|\bar{\varepsilon}_c\|^2 + \tfrac{k_{c2}}{2}\|W_c\|^2
\end{aligned}
\tag{A5}
$$

where $\bar{\varepsilon}_c = c\varepsilon_c - \dot{\varepsilon}_c$, which is bounded, i.e., $\|\bar{\varepsilon}_c\| \le \bar{\varepsilon}_{cm}$.

Using Eq. (22), the derivative of $L_4$ can be expressed as

$$
\begin{aligned}
\dot{L}_4 &= -k_{a1}\tilde{W}_a^T\varphi_a\left(\Lambda_V\hat{V} + \hat{W}_a^T\varphi_a\right) - k_{a2}\tilde{W}_a^T\hat{W}_a - k_{a3}\tilde{W}_a^T\varphi_a\tilde{x}_n^T\\
&= -k_{a1}\tilde{W}_a^T\varphi_a\tilde{W}_a^T\varphi_a - k_{a1}\tilde{W}_a^T\varphi_a\left(W_a^T\varphi_a + \Lambda_V\hat{W}_c^T\varphi_c\right) - k_{a2}\tilde{W}_a^T\left(\tilde{W}_a + W_a\right) - k_{a3}\tilde{W}_a^T\varphi_a\tilde{x}_n^T\\
&\le -\tfrac{1}{2}\left(k_{a1}\lambda_{\min}(\varphi_a\varphi_a^T) + k_{a2} - k_{a3}\|\varphi_a\|^2\right)\|\tilde{W}_a\|^2 + k_{a1}\|W_a\|^2\|\varphi_a\|^2 + 2k_{a1}\Lambda_V^T\Lambda_V\|\varphi_c\|^2\|\tilde{W}_c\|^2\\
&\quad + 2k_{a1}\Lambda_V^T\Lambda_V\|W_c\|^2\|\varphi_c\|^2 + \tfrac{k_{a2}}{2}\|W_a\|^2 + \tfrac{k_{a3}}{2}\|\tilde{x}_n\|^2
\end{aligned}
\tag{A6}
$$

Using Eq. (21), the derivative of $L_5$ can be expressed as

$$ \dot{L}_5 = \tilde{x}_n^T\left(-b\tilde{x}_n - \tilde{W}_a^T\varphi_a - \tilde{f}\right) \le -(b-1)\|\tilde{x}_n\|^2 + \tfrac{1}{2}\|\tilde{f}\|^2 + \tfrac{1}{2}\|\tilde{W}_a\|^2\|\varphi_a\|^2 \tag{A7} $$

Then, combining Eqs. (A3)–(A7), the derivative of $L$ can be written as

$$
\begin{aligned}
\dot{L} \le{}& -(k_1 - 0.5)\|z_1\|^2 - \cdots - (k_i - 1)\|z_i\|^2 - \cdots - (k_n - 1.5)\|z_n\|^2\\
& - \tfrac{1}{2}\left(k_{c1}\lambda_{\min}(\Lambda\Lambda^T) + k_{c2} - 4k_{a1}\Lambda_V^T\Lambda_V\|\varphi_c\|^2\right)\|\tilde{W}_c\|^2\\
& - \tfrac{1}{2}\left(k_{a1}\lambda_{\min}(\varphi_a\varphi_a^T) - (k_{a3} + a + 2)\|\varphi_a\|^2 + k_{a2}\right)\|\tilde{W}_a\|^2\\
& - \tfrac{1}{2}(a-3)\|\tilde{f}\|^2 - \tfrac{1}{2}(2b - k_{a3} - 2)\|\tilde{x}_n\|^2 + \varrho_1\\
\le{}& -\varrho_0 L + \varrho_1
\end{aligned}
\tag{A8}
$$
where

$$
\begin{aligned}
\varrho_0 = \min\{&2(k_1 - 0.5),\, 2(k_2 - 1),\, \ldots,\, 2(k_n - 1.5),\, a - 3,\, 2b - k_{a3} - 2,\\
&k_{c1}\lambda_{\min}(\Lambda\Lambda^T) + k_{c2} - 4k_{a1}\Lambda_V^T\Lambda_V\|\varphi_c\|^2,\\
&k_{a1}\lambda_{\min}(\varphi_a\varphi_a^T) - (k_{a3} + a + 2)\|\varphi_a\|^2 + k_{a2}\}
\end{aligned}
$$

$$ \varrho_1 = 0.5\left(k_{c1}\bar{\varepsilon}_{cm}^2 + \bar{\varepsilon}_{am}^2 + \bar{d}_m^2\right) + \varepsilon_{am}^2 + d_m^2 + \left(2k_{a1}\Lambda_V^T\Lambda_V\varphi_{cm}^2 + 0.5k_{c2}\right)W_{cm}^2 + \left(0.5k_{a2} + k_{a1}\varphi_{am}^2\right)W_{am}^2 $$

To ensure $\varrho_0 > 0$, the following conditions must be fulfilled:

$$
\begin{cases}
k_1 > 0.5,\ k_2 > 1,\ \ldots,\ k_n > 1.5\\
a > 3,\quad 2b - k_{a3} - 2 > 0\\
k_{c1}\lambda_{\min}(\Lambda\Lambda^T) + k_{c2} - 4k_{a1}\Lambda_V^T\Lambda_V\|\varphi_c\|^2 > 0\\
k_{a1}\lambda_{\min}(\varphi_a\varphi_a^T) + k_{a2} - (k_{a3} + a + 2)\|\varphi_a\|^2 > 0
\end{cases}
\tag{A9}
$$

Solving the differential inequality Eq. (A8) yields

$$ L(t) \le \left(L(0) - \frac{\varrho_1}{\varrho_0}\right)\exp(-\varrho_0 t) + \frac{\varrho_1}{\varrho_0} \le L(0) + \frac{\varrho_1}{\varrho_0} \tag{A10} $$

Consequently, all system signals are bounded according to the definition of $L$ in Eq. (A1).

References

1. Han SS, Jiao ZX, Wang CW, et al. Fuzzy robust nonlinear control approach for electro-hydraulic flight motion simulator. Chin J Aeronaut 2015;28(1):294–304.
2. Yao JY, Deng WX. Active disturbance rejection adaptive control of uncertain nonlinear systems: Theory and application. Nonlinear Dyn 2017;89(3):1611–24.
3. Deng WX, Yao JY, Ma DW. Time-varying input delay compensation for nonlinear systems with additive disturbance: An output feedback approach. Int J Robust Nonlinear Control 2018;28(1):31–52.
4. Lu Y. Disturbance observer-based backstepping control for hypersonic flight vehicles without use of measured flight path angle. Chin J Aeronaut 2021;34(2):396–406.
5. Yao JY, Jiao ZX, Ma DW. Extended-state-observer-based output feedback nonlinear robust control of hydraulic systems with backstepping. IEEE Trans Ind Electron 2014;61(11):6285–93.
6. Chen M, Ge SS, Ren BB. Adaptive tracking control of uncertain MIMO nonlinear systems with input constraints. Automatica 2011;47(3):452–65.
7. Wu XQ, Xu KX, He XX. Disturbance-observer-based nonlinear control for overhead cranes subject to uncertain disturbances. Mech Syst Signal Process 2020;139:106631.
8. Bu XW, Wu XY, Ma Z, et al. Novel adaptive neural control of flexible air-breathing hypersonic vehicles based on sliding mode differentiator. Chin J Aeronaut 2015;28(4):1209–16.
9. Ouyang YC, Dong L, Xue L, et al. Adaptive control based on neural networks for an uncertain 2-DOF helicopter system with input deadzone and output constraints. IEEE/CAA J Autom Sin 2019;6(3):807–15.
10. Ma L, Xu N, Zhao XD, et al. Small-gain technique-based adaptive neural output-feedback fault-tolerant control of switched nonlinear systems with unmodeled dynamics. IEEE Trans Syst Man Cybern 2021;51(11):7051–62.
11. Zhang T, Ge SS, Hang CC. Adaptive neural network control for strict-feedback nonlinear systems using backstepping design. Automatica 2000;36(12):1835–46.
12. Yao ZK, Yao JY, Sun WC. Adaptive RISE control of hydraulic systems with multilayer neural-networks. IEEE Trans Ind Electron 2019;66(11):8638–47.
13. Wang XJ, Yin XH, Wu QH, et al. Disturbance observer based adaptive neural control of uncertain MIMO nonlinear systems with unmodeled dynamics. Neurocomputing 2018;313:247–58.
14. Sutton RS, Barto AG. Reinforcement learning: An introduction. 2nd ed. Cambridge: MIT Press; 2018. p. 331–2.
15. Widrow B, Gupta NK, Maitra S. Punish/reward: Learning with a critic in adaptive threshold systems. IEEE Trans Syst Man Cybern 1973;3(5):455–65.
16. Cui RX, Yang CG, Li Y, et al. Adaptive neural network control of AUVs with control input nonlinearities using reinforcement learning. IEEE Trans Syst Man Cybern 2017;47(6):1019–29.
17. Guo XX, Yan WS, Cui RX. Event-triggered reinforcement learning-based adaptive tracking control for completely unknown continuous-time nonlinear systems. IEEE Trans Cybern 2020;50(7):3231–42.
18. He W, Gao HJ, Zhou C, et al. Reinforcement learning control of a flexible two-link manipulator: An experimental investigation. IEEE Trans Syst Man Cybern 2021;51(12):7326–36.
19. Yang J, Su JY, Li SH, et al. High-order mismatched disturbance compensation for motion control systems via a continuous dynamic sliding-mode approach. IEEE Trans Ind Inform 2014;10(1):604–14.
20. Razmjooei H, Shafiei MH, Palli G, et al. Non-linear finite-time tracking control of uncertain robotic manipulators using time-varying disturbance observer-based sliding mode method. J Intell Rob Syst 2022;104(2):1–13.
21. Liang YQ, Dong Q, Zhao YJ. Adaptive leader-follower formation control for swarms of unmanned aerial vehicles with motion constraints and unknown disturbances. Chin J Aeronaut 2020;33(11):2972–88.
22. Xian B, Zhang X, Zhang HN, et al. Robust adaptive control for a small unmanned helicopter using reinforcement learning. IEEE Trans Neural Netw Learn Syst 2022;33(12):7589–97.
23. Wang XR, Wang QL, Sun CY. Prescribed performance fault-tolerant control for uncertain nonlinear MIMO system using actor-critic learning structure. IEEE Trans Neural Netw Learn Syst 2022;33(9):4479–90.
24. Xu B, Sun FC, Pan YP, et al. Disturbance observer based composite learning fuzzy control of nonlinear systems with unknown dead zone. IEEE Trans Syst Man Cybern 2017;47(8):1854–62.
25. Jing YH, Yang GH. Fuzzy adaptive quantized fault-tolerant control of strict-feedback nonlinear systems with mismatched external disturbances. IEEE Trans Syst Man Cybern 2020;50(9):3424–34.
26. Zhang R, Xu B, Shi P. Output feedback control of micromechanical gyroscopes using neural networks and disturbance observer. IEEE Trans Neural Netw Learn Syst 2022;33(3):962–72.
27. Min HF, Xu SY, Fei SM, et al. Observer-based NN control for nonlinear systems with full-state constraints and external disturbances. IEEE Trans Neural Netw Learn Syst 2022;33(9):4322–31.
28. Ran MP, Li JC, Xie LH. Reinforcement-learning-based disturbance rejection control for uncertain nonlinear systems. IEEE Trans Cybern 2022;52(9):9621–33.
29. Kim JW, Shim H, Yang I. On improving the robustness of reinforcement learning-based controllers using disturbance observer. 2019 IEEE 58th Conference on Decision and Control (CDC). Piscataway: IEEE Press; 2020. p. 847–52.
30. Chen WH, Yang J, Guo L, et al. Disturbance-observer-based control and related methods—An overview. IEEE Trans Ind Electron 2015;63(2):1083–95.
31. Xu B, Shi ZK, Yang CG, et al. Composite neural dynamic surface control of a class of uncertain nonlinear systems in strict-feedback form. IEEE Trans Cybern 2014;44(12):2626–34.
32. Hojati M, Gazor S. Hybrid adaptive fuzzy identification and control of nonlinear systems. IEEE Trans Fuzzy Syst 2002;10(2):198–210.
33. Yao ZK, Liang XL, Zhao QT, et al. Adaptive disturbance observer-based control of hydraulic systems with asymptotic stability. Appl Math Model 2022;105:226–42.
