Disturbance observer based actor-critic learning control
KEYWORDS: Actor-critic structure; Composite adaptation; Disturbance observer; Robot manipulator; Uncertain nonlinear system

Abstract: This paper investigates the disturbance observer based actor-critic learning control for a class of uncertain nonlinear systems in the presence of unmodeled dynamics and time-varying disturbances. The proposed control algorithm integrates a filter-based design method with an actor-critic learning architecture and a disturbance observer to circumvent the unmodeled dynamics and the time-varying disturbance. To be specific, the actor network is employed to estimate the unknown system dynamics, the critic network is developed to evaluate the control performance, and the disturbance observer is leveraged to provide an efficient estimate of the compounded disturbance, which includes the time-varying disturbance and the actor-critic network approximation error. Consequently, high-gain feedback is avoided and improved tracking performance can be expected. Moreover, a composite weight adaptation law for the actor network is constructed by utilizing two types of signals, the cost function and the modeling error. Eventually, theoretical analysis demonstrates that the developed controller can guarantee bounded stability. Extensive simulations and experiments on a robot manipulator are implemented to validate the performance of the resulting control strategy.

© 2023 Production and hosting by Elsevier Ltd. on behalf of Chinese Society of Aeronautics and Astronautics. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
control, which can handle both unmodeled dynamics and time-varying disturbances. Different from the traditional NN-based control, an advanced neural learning control strategy is further developed according to the actor-critic learning architecture.14,15 To be specific, the actor-critic learning architecture consists of two networks: an actor network and a critic network. The actor network is leveraged to approximate an unknown function and generate the action or control signal, while the critic network is leveraged to evaluate the control performance. The actor-critic learning architecture, with its generalized learning structure, can be easily applied to control other nonlinear systems.

Enlightened by the philosophy in Ref. 15, extensive control approaches were studied by integrating the actor-critic learning architecture with traditional control approaches for unknown nonlinear systems. In Refs. 16-18, the actor-critic architecture has been successfully applied to estimate the unknown nonlinearity online and achieved satisfactory results, yet these works ignore the negative influence of time-varying unknown disturbances on control performance. However, for practical systems (e.g., vehicular systems, robot manipulators and unmanned aerial vehicles19-21), time-varying perturbations always exist. A time-varying disturbance can generate unexpected results, such as degraded control performance or even system divergence, and it is difficult to circumvent the influence of time-varying disturbances with actor-critic learning control alone. Moreover, the approximation inaccuracy caused by actor-critic learning also influences the tracking performance. In this regard, the above-mentioned factors motivate the combination of actor-critic control with robust control or disturbance observer based control. In Ref. 22, the actor-critic structure is used to estimate the modeling uncertainties of a small unmanned helicopter, and a discontinuous sliding mode based robust component is introduced to eliminate the influence of the actor network approximation error and the unknown disturbance. To overcome the discontinuities of the control input, a prescribed performance fault-tolerant control approach is developed in Ref. 23 by integrating the actor-critic learning scheme with a Robust Integral of the Sign of the Error (RISE) feedback term, which requires less system information and achieves asymptotic stability. However, large feedback gains are required to resist unknown disturbances in robust control, which reduces the stability margin, may stimulate high-frequency dynamics, and can then result in system instability. Inspired by the feedforward design, disturbance observer based control24-27 can be used to estimate the impact of the disturbance and then compensate for it. In Refs. 28,29, a reinforcement learning based controller integrated with a disturbance observer is established to reject the real-time external disturbance, which can guarantee robust stability and nominal performance even for an uncertain plant, with satisfactory results obtained in numerical simulation. However, both of them merely focus on the regulation control problem.

Inspired by the aforementioned challenges, a disturbance observer based actor-critic learning control is developed for a class of uncertain nonlinear systems in the presence of unknown dynamics and time-varying disturbances. To cope with the unknown dynamics, the actor-critic learning architecture is developed to provide feedforward compensation by approximating the unknown nonlinearity. Considering the effect of the time-varying disturbance and the actor-critic network approximation error on tracking performance, the actor-critic learning algorithm is combined with the disturbance observer to circumvent the effects of these factors. In addition, a composite weight adaptation law for the actor network is constructed by utilizing two types of signals, the cost function and the modeling error. Consequently, high-gain feedback is avoided and improved tracking performance can be achieved. Eventually, extensive simulations and experiments on a robot manipulator are implemented to validate the performance of the resulting control strategy.

The key contributions of this paper are listed as follows:

(1) An actor-critic learning architecture is developed to estimate the unknown system dynamics online, which requires less model information and effectively improves the robustness to unmodeled dynamics.

(2) A disturbance observer is effectively combined to compensate for the time-varying disturbance and the actor-critic network approximation error, which avoids high-gain feedback and achieves improved tracking performance.

To the best of our knowledge, few studies have integrated the actor-critic learning architecture with a disturbance observer for tracking control of uncertain nonlinear systems with unmodeled dynamics and time-varying unknown disturbances.

The remainder of this paper is organized as follows. The problem description is provided in Section 2. Section 3 states the disturbance observer based actor-critic learning control scheme and the system stability analysis. Simulation and experimental studies on a single-link robot are provided in Section 4, and conclusions are drawn in Section 5.

2. Problem description

Consider a class of $n$th-order Multiple Input Multiple Output (MIMO) nonlinear systems with the following form:

$$\begin{cases} \dot{x}_i = x_{i+1}, & i = 1, 2, \ldots, n-1 \\ \dot{x}_n = g(x)u + f(x) + d(t) \\ y(t) = x_1 \end{cases} \quad (1)$$

where $x = [x_1^{\mathrm T}, x_2^{\mathrm T}, \ldots, x_n^{\mathrm T}]^{\mathrm T} \in \mathbb{R}^{mn}$ denotes the system state vector with $x_i \in \mathbb{R}^m$, which is assumed to be available for measurement; $u \in \mathbb{R}^m$ is the control input; $y \in \mathbb{R}^m$ is the system output; $f(x) \in \mathbb{R}^m$ is the unknown smooth nonlinear function; $g(x) \in \mathbb{R}^{m \times m}$ is the known nonzero function; and $d(t) \in \mathbb{R}^m$ is the time-varying disturbance.

The main control objective of this study is to propose a disturbance observer based actor-critic learning control strategy to achieve high tracking accuracy under unmodeled dynamics and time-varying unknown disturbances. To facilitate the presentation, some related assumptions and lemmas are necessary.

Assumption 1. The reference trajectory $y_r(t) \in \mathbb{R}^m$ and its $n$th-order derivative $y_r^{(n)}$ are available, smooth, and bounded.

Assumption 2. The time-varying disturbance $d(t)$ and its first derivative are bounded, i.e., $\|d\| \le d_m$ and $\|\dot d\| \le \bar d_m$.
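As a minimal illustration of the system class in Eq. (1), the sketch below integrates one Euler step of a scalar second-order instance; the particular $f(x)$, $g(x)$ and $d(t)$ are hypothetical placeholders chosen for this example and are not the manipulator models studied later.

```python
import numpy as np

def plant_step(x1, x2, u, t, dt=1e-3):
    """One Euler step of a scalar second-order instance of Eq. (1).

    Hypothetical choices for illustration only:
    f(x) = -x1 - 0.5*x2**3, g(x) = 1 + 0.1*sin(x1), d(t) = 0.2*sin(2*t).
    """
    f = -x1 - 0.5 * x2 ** 3           # unknown smooth nonlinearity f(x)
    g = 1.0 + 0.1 * np.sin(x1)        # known nonzero input gain g(x)
    d = 0.2 * np.sin(2.0 * t)         # bounded time-varying disturbance d(t)
    x1_next = x1 + dt * x2                  # x1_dot = x2
    x2_next = x2 + dt * (g * u + f + d)     # x2_dot = g(x)u + f(x) + d(t)
    return x1_next, x2_next
```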
Lemma 1. The NN universal approximation property indicates that a continuous function $\Phi : S \to \mathbb{R}^{N_1}$ ($S$ is a compact set) can be approximated as

$$\Phi(x) = W^{\mathrm T}\varphi(x) + \varepsilon(x) \quad (2)$$

where $x \in \mathbb{R}^{N_2}$ is the input vector, $W \in \mathbb{R}^{N_3 \times N_1}$ is the ideal weight matrix, and $N_1$, $N_2$ and $N_3$ are the numbers of neurons in the output, input, and hidden layer, respectively. $\varphi(x) \in \mathbb{R}^{N_3}$ is the nonlinear activation function. According to Ref. 22, the ideal NN weights, nonlinear activation function and approximation error are assumed to be bounded by $\|W\| \le W_m$, $\|\varphi\| \le \varphi_m$, $\|\dot\varphi\| \le \bar\varphi_m$, $\|\varepsilon\| \le \varepsilon_m$ and $\|\dot\varepsilon\| \le \bar\varepsilon_m$.

3. Main results

In this section, the disturbance observer based actor-critic learning control strategy for the uncertain nonlinear system Eq. (1) is presented. First, we provide the backstepping controller design based on a filter-based design approach. Then, we design the actor-critic network to deal with the unknown nonlinear dynamics, where the critic network is leveraged to evaluate the control performance while the actor network is leveraged to approximate the unknown function. The architecture of the developed control scheme is depicted in Fig. 1.

3.1. Controller design

To quantify the aforementioned control objective, the tracking error $z_1 \in \mathbb{R}^m$ is defined as $z_1 = y - y_r$, and the following filtered tracking errors are introduced to facilitate the controller design:

$$\begin{cases} z_2 = \dot{z}_1 + k_1 z_1 \\ z_i = \dot{z}_{i-1} + k_{i-1} z_{i-1}, & i = 3, 4, \ldots, n \end{cases} \quad (3)$$

where $k_1, k_2, \ldots, k_{n-1} \in \mathbb{R}$ denote positive control gains. By substituting Eq. (1) into Eq. (3), the dynamics of the filtered tracking error $z_n$ can be obtained as

$$\dot{z}_n = g(x)u + f(x) + d - y_r^{(n)} + k_1 z_1^{(n-1)} + k_2 z_2^{(n-2)} + \cdots + k_{n-1} z_{n-1}^{(1)} = g(x)u + f(x) + d - y_r^{(n)} + \bar{k}\bar{z} \quad (4)$$

where $\bar{k} = [k_1, k_2, \ldots, k_{n-1}]$ and $\bar{z} = [z_1^{(n-1)}, z_2^{(n-2)}, \ldots, z_{n-1}^{(1)}]^{\mathrm T}$, and the unknown nonlinear function $f(x)$ can be approximated by an NN-based actor network

$$f(x_a) = W_a^{\mathrm T}\varphi(x_a) + \varepsilon_a \quad (5)$$

where $x_a = [x_1^{\mathrm T}, x_2^{\mathrm T}, \ldots, x_n^{\mathrm T}]^{\mathrm T}$ denotes the input vector, $W_a$ denotes the weight vector of the actor network and $\varepsilon_a$ denotes the actor network function reconstruction inaccuracy, which satisfies $\|\varepsilon_a\| \le \varepsilon_{am}$ and $\|\dot\varepsilon_a\| \le \bar\varepsilon_{am}$. Defining $\zeta = \varepsilon_a + d$, the expression in Eq. (4) can be further expressed as

$$\dot{z}_n = g(x)u + \hat{W}_a^{\mathrm T}\varphi_a - \tilde{W}_a^{\mathrm T}\varphi_a + \zeta - y_r^{(n)} + \bar{k}\bar{z} \quad (6)$$

where $\tilde{W}_a = \hat{W}_a - W_a$ and $\hat{W}_a$ is the estimate of $W_a$, which will be introduced later.

Generally, time-varying external disturbances can be estimated by using model-based disturbance observers.30 Herein, the time-varying external disturbance and the residual function reconstruction inaccuracy of the actor network are lumped as $\zeta$. The adaptive neural disturbance observer for estimating the lumped disturbance $\zeta$ is designed by using the neural network approximation:

$$\begin{cases} \dot{h} = -a(h + a x_n) - a\left(\hat{W}_a^{\mathrm T}\varphi_a + g(x)u\right) \\ \hat\zeta = h + a x_n \end{cases} \quad (7)$$

where $h \in \mathbb{R}^m$ is an internal state and $a \in \mathbb{R}$ is a positive constant. Therefore, the control input $u$ can be given by

$$u = g^{-1}(x)\left(y_r^{(n)} - \bar{k}\bar{z} - k_n z_n - \hat{W}_a^{\mathrm T}\varphi_a - \hat\zeta\right) \quad (8)$$

where $k_n \in \mathbb{R}$ denotes a positive control gain. Substituting Eq. (8) into Eq. (6), the dynamics of the filtered error $z_n$ can be rewritten as $\dot{z}_n = -k_n z_n - \tilde{W}_a^{\mathrm T}\varphi_a + \tilde\zeta$, with $\tilde\zeta = \zeta - \hat\zeta$ denoting the estimation error of the lumped disturbance.

3.2. Critic network design

The critic network is utilized to provide the evaluation function for the current strategy, which can test the performance of the current policy and generate rewards/punishments as the feedback for adaptive learning. Therefore, we introduce the infinite-horizon performance index function as follows:

$$V(t) = \int_t^{\infty} \exp[-\gamma(s-t)]\, r(s)\, \mathrm{d}s \quad (11)$$

where $\gamma > 0$ represents a discount factor, which can guarantee the boundedness of the cost function even if the reference trajectory does not converge to zero, and $r(t)$ represents an instantaneous cost function

$$r(t) = z_1^{\mathrm T} Q z_1 + u^{\mathrm T} R u \quad (12)$$

where $Q \in \mathbb{R}^{m \times m}$ and $R \in \mathbb{R}^{m \times m}$ are the weighting matrices for the lumped tracking error $z_1$ and the control input $u$, respectively.

To achieve optimal control, the cost-to-go function is supposed to be minimized. Given that it is difficult to obtain the cost-to-go function, an NN-based critic network is introduced:

$$V(x_c) = W_c^{\mathrm T}\varphi(x_c) + \varepsilon_c \quad (13)$$

where $x_c = z_1$ denotes the input vector, $W_c$ denotes the weight vector of the critic network and $\varepsilon_c$ denotes the critic network function reconstruction inaccuracy, which satisfies $\|\varepsilon_c\| \le \varepsilon_{cm}$ and $\|\dot\varepsilon_c\| \le \bar\varepsilon_{cm}$. The cost-to-go function can then be approximated by

$$\hat{V}(x_c) = \hat{W}_c^{\mathrm T}\varphi(x_c) \quad (14)$$

where $\hat{W}_c$ is the estimate of $W_c$.

The weight vector $\hat{W}_c$ is selected to minimize the objective function $E_c = 0.5\, e_c^{\mathrm T} e_c$, and according to Eq. (11) and Eq. (12), the prediction error $e_c$ can be expressed as

$$e_c = r(t) + \dot{\hat{V}}(t) - \gamma\hat{V}(t) \quad (15)$$

and the critic network weight parameters are updated by the following update law:

$$\dot{\hat{W}}_c = -k_{c1}\, e_c\left(-\gamma\varphi_c + \nabla\varphi_c\,\dot{x}_c\right) - k_{c2}\hat{W}_c = -k_{c1}\left(r + \hat{W}_c^{\mathrm T}\Lambda\right)\Lambda - k_{c2}\hat{W}_c \quad (16)$$

where $k_{c1}$, $k_{c2}$ are positive parameters and $\Lambda = -\gamma\varphi_c + \nabla\varphi_c\,\dot{x}_c$.

3.3. Actor network design

The actor network is leveraged to estimate the unknown function $f(x)$ that exists in Eq. (4), which can generate the appropriate control policy by gradually accumulating the system experience. The actor weights are first driven by the critic evaluation signal with positive parameters $k_{a1}$ and $k_{a2}$. To further improve the convergence of the estimated weights $\hat{W}_a$ and the function approximation precision of the actor network, another prediction error $\tilde{x}_n$, named the modeling error,24 is defined as $\tilde{x}_n = \hat{x}_n - x_n$, in which $\hat{x}_n$ can be obtained by constructing the following serial-parallel estimation model31:

$$\begin{cases} \dot{\hat{x}}_i = x_{i+1} - b\tilde{x}_i, & i = 1, 2, \ldots, n-1 \\ \dot{\hat{x}}_n = g(x)u + \hat{W}_a^{\mathrm T}\varphi_a + \hat\zeta - b\tilde{x}_n \end{cases} \quad (20)$$

and the dynamic equation of $\tilde{x}_n$ is written as

$$\dot{\tilde{x}}_n = -b\tilde{x}_n + \tilde{W}_a^{\mathrm T}\varphi_a - \tilde\zeta \quad (21)$$

in which $b \in \mathbb{R}$ is a positive constant. Therefore, the actor network weight parameters are adjusted by the following composite update law:

$$\dot{\hat{W}}_a = -k_{a1}\varphi_a\left(K_V\hat{V} + \hat{W}_a^{\mathrm T}\varphi_a\right) - k_{a2}\hat{W}_a - k_{a3}\varphi_a\tilde{x}_n^{\mathrm T} \quad (22)$$

where $k_{a3}$ is a positive parameter.

Remark 2. Different from the traditional actor network weight updating,18,22,23 a composite weight adaptation law for the actor network is constructed by using both the prediction error $e_a$ and the modeling error $\tilde{x}_n$, which ensures that the estimated weights $\hat{W}_a$ converge better to the unknown weights $W_a$ and that a more precise approximation of the nonlinear function is achieved.32

3.4. Stability analysis

Theorem 1. In consideration of the nonlinear system Eq. (1) in the presence of unmodeled dynamics and time-varying disturbance, if the control input Eq. (8), the critic network weight adaptive law Eq. (16), the actor network weight adaptive law Eq. (22) and the adaptive neural disturbance observer Eq. (7) are designed, then all system signals are bounded. Proof details are given in Appendix A.
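For readers who prefer pseudocode, the sketch below assembles the main computations of Eqs. (3), (7), (8), (15), (16) and (20)-(22) for a scalar second-order plant. It is a minimal illustration under several assumptions not fixed by the paper: Gaussian RBF features shared by both networks, $g(x) = 1$, a scalar $K_V$, Euler integration, a finite-difference approximation of $\dot\varphi_c$, and placeholder gains rather than the tuned values reported in Section 4.

```python
import numpy as np

# Illustrative Gaussian RBF features shared by the actor and critic networks
CENTERS = np.linspace(-2.0, 2.0, 10)

def rbf(v, width=1.0):
    """Gaussian RBF activation vector for a scalar input v."""
    return np.exp(-(v - CENTERS) ** 2 / (2.0 * width ** 2))

class DOBActorCriticController:
    """Sketch of the observer-based actor-critic controller for a scalar
    second-order plant with g(x) = 1; all gains and K_V are illustrative."""

    def __init__(self, dt=1e-3):
        self.dt = dt
        self.k1, self.k2 = 5.0, 5.0                      # filtered-error gains, Eqs. (3), (8)
        self.a, self.b = 10.0, 50.0                      # observer and predictor gains
        self.gamma = 0.1                                 # discount factor, Eq. (11)
        self.kc1, self.kc2 = 2.0, 0.1                    # critic gains, Eq. (16)
        self.ka1, self.ka2, self.ka3 = 20.0, 1.0, 5.0    # actor gains, Eq. (22)
        self.KV, self.Q, self.R = 1.0, 50.0, 0.1
        self.Wa = np.zeros(10)                           # actor weights (hat W_a)
        self.Wc = np.zeros(10)                           # critic weights (hat W_c)
        self.h = 0.0                                     # observer internal state, Eq. (7)
        self.x2_hat = 0.0                                # serial-parallel predictor, Eq. (20)
        self.phi_c_prev = rbf(0.0)

    def step(self, x1, x2, yr, yr_d, yr_dd):
        g = 1.0
        # Filtered tracking errors, Eq. (3) with n = 2
        z1 = x1 - yr
        z1_d = x2 - yr_d
        z2 = z1_d + self.k1 * z1
        phi_a = rbf(x1)
        f_hat = self.Wa @ phi_a                  # actor estimate of f(x), Eq. (5)
        zeta_hat = self.h + self.a * x2          # lumped-disturbance estimate, Eq. (7)
        # Control law, Eq. (8)
        u = (yr_dd - self.k1 * z1_d - self.k2 * z2 - f_hat - zeta_hat) / g
        # Observer internal-state update, Eq. (7)
        self.h += self.dt * (-self.a * (self.h + self.a * x2)
                             - self.a * (f_hat + g * u))
        # Critic evaluation and update, Eqs. (12), (14)-(16)
        r = self.Q * z1 ** 2 + self.R * u ** 2
        phi_c = rbf(z1)
        phi_c_dot = (phi_c - self.phi_c_prev) / self.dt  # crude d(phi_c)/dt
        self.phi_c_prev = phi_c
        V_hat = self.Wc @ phi_c
        Lam = -self.gamma * phi_c + phi_c_dot
        e_c = r + self.Wc @ Lam
        self.Wc += self.dt * (-self.kc1 * e_c * Lam - self.kc2 * self.Wc)
        # Serial-parallel predictor and modeling error, Eqs. (20)-(21)
        x_tilde = self.x2_hat - x2
        self.x2_hat += self.dt * (g * u + f_hat + zeta_hat - self.b * x_tilde)
        # Composite actor update, Eq. (22)
        self.Wa += self.dt * (-self.ka1 * phi_a * (self.KV * V_hat + self.Wa @ phi_a)
                              - self.ka2 * self.Wa
                              - self.ka3 * phi_a * x_tilde)
        return u
```

A full closed loop is obtained by pairing `DOBActorCriticController.step` with a plant integrator such as the `plant_step` sketch given after Section 2; the last term of the actor update is the composite, modeling-error-driven correction that distinguishes Eq. (22) from a purely critic-driven adaptation law.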
4.1. Simulation

The two-link robot manipulator dynamics are written in the state-space form of Eq. (1) with $x_1 = [q_1, q_2]^{\mathrm T}$, $x_2 = [\dot{q}_1, \dot{q}_2]^{\mathrm T}$, $g(x) = M^{-1}(x_1)$, $u = \tau$, $f(x) = -M^{-1}[C(x_1, x_2)x_2 + F(x_2)]$ and $d(t) = M^{-1}\tau_d$.

The following two control strategies are compared to validate the effectiveness of the proposed approach:

Controller 1. This is the proposed controller or, more specifically, actor-critic learning control integrated with the disturbance observer. The control parameters are chosen as $k_1 = 30$, $k_2 = 10$, $a = 20$, $b = 100$, $k_{c1} = 2$, $k_{c2} = 0.1$, $k_{a1} = 20$, $k_{a2} = 1$ and $k_{a3} = 5$. The initial weights of the actor-critic networks are chosen as $\hat{W}_a = \mathrm{zeros}(10, 4)$ and $\hat{W}_c = \mathrm{zeros}(10, 2)$. The discount factor is chosen as $\gamma = 0.1$, and the positive matrices $Q$ and $R$ in the cost function are selected as $Q = \mathrm{diag}([50, 200])$ and $R = \mathrm{diag}([0.1, 0.1])$, respectively.

Controller 2. This is the actor-critic learning control approach without disturbance feedforward compensation. To ensure a fair comparison, the selected control parameters are consistent with those of Controller 1.

The reference trajectories of the two joints are chosen as $y_{r1} = 0.6\sin(3.14t)[1 - \exp(-t)]$ and $y_{r2} = 0.8\sin(3.14t)[1 - \exp(-t)]$. The simulation results are depicted in Figs. 2-6. As depicted in Fig. 2 and Fig. 3, Controller 1 follows the reference signal well and achieves the best tracking performance in terms of convergence speed and steady tracking error, since the disturbance observer is introduced. Over the last 20 s, the maximum amplitude of the steady tracking error is $M_{z1} = [0.0009, 0.0058]$ rad under Controller 1, while it is $M_{z1} = [0.0021, 0.0103]$ rad under Controller 2.

Fig. 3 Tracking errors for Joints 1 and 2 under Controller 1 and Controller 2.

The results in Figs. 4 and 5 depict the compound estimation of $f + d$, and it can be found that the composite estimation architecture established by actor-critic learning and the disturbance observer approximates the unmodeled dynamics and the time-varying disturbance well in comparison with the estimation architecture using only actor-critic learning. This phenomenon is attributed to the feedforward compensation provided by the disturbance observer.

Fig. 4 Compound estimation $(f + d)[1]$ for Joint 1 under Controller 1 and Controller 2.
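The steady-state index quoted above is straightforward to reproduce from logged data; the sketch below assumes, for illustration, a 40 s simulation horizon and a 1 ms step, since only the 20 s evaluation window is stated in the text.

```python
import numpy as np

# Reference trajectories of the two joints (Section 4.1)
t = np.arange(0.0, 40.0, 1e-3)                       # assumed horizon and step
yr1 = 0.6 * np.sin(3.14 * t) * (1.0 - np.exp(-t))    # Joint 1 reference (rad)
yr2 = 0.8 * np.sin(3.14 * t) * (1.0 - np.exp(-t))    # Joint 2 reference (rad)

def steady_state_max_error(z1, t, window=20.0):
    """Maximum amplitude of the tracking error over the last `window` seconds,
    i.e. the M_z1 index compared for Controllers 1 and 2."""
    mask = t >= (t[-1] - window)
    return np.max(np.abs(z1[mask]), axis=0)
```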
4.2. Experiments

To further substantiate the superiority of the developed control strategy, an experiment was conducted on a single-degree-of-freedom robot manipulator platform shown in Fig. 7. The test rig includes a bench case, a motor actuator (consisting of a DC motor Kollmorgen DH063A, an electrical driver Kollmorgen ServoStar 620, a rotary encoder Heidenhain ERN180, and a revolute joint), a link, a payload and a control module. The control module consists of real-time control software composed of an Advantech PCI-1723 card and a Heidenhain IK-220 counter card, together with monitoring software. The sampling time is 0.5 ms.

Fig. 7 A single-degree-of-freedom robot manipulator platform.

The dynamics of the single-degree-of-freedom robot manipulator can also be written in the state-space form shown in Eq. (24) with $x_1 = q$, $x_2 = \dot{q}$, $x = [x_1, x_2]^{\mathrm T}$, $g(x) = J^{-1}$, $u = \tau$, $f(x) = -J^{-1}[F(\dot{q}) + G(q)]$ and $d(t) = J^{-1}\tau_d$, where $q$, $\dot{q}$ denote the position and velocity, respectively, $F(\dot{q})$ is the unknown friction, $G(q)$ is the unknown gravity, $\tau_d$ is the external disturbance, $\tau$ is the control input, and $J = J_r + J_l + J_p$ is the total moment of inertia with the joint moment of inertia $J_r$, the link moment of inertia $J_l = m_l L^2/3$ and the payload moment of inertia $J_p = m_p L^2$. The system parameters are provided as $J_r = 0.3\ \mathrm{kg\cdot m^2}$, $L = 0.5\ \mathrm{m}$, $m_l = 0.5\ \mathrm{kg}$ and $m_p = 0\text{-}1\ \mathrm{kg}$.

Likewise, the aforementioned two controllers are tested in the experiments. The reference signal is chosen as $y_r = 10[1 - \cos(3.14t)][1 - \exp(-t)]$, and the control parameters are chosen as $k_1 = 150$, $k_2 = 50$, $a = 10$, $b = 50$, $k_{c1} = 2$, $k_{c2} = 0.5$, $k_{a1} = 20$, $k_{a2} = 2$ and $k_{a3} = 5$. The initial weights of the actor-critic networks are chosen as $\hat{W}_a = \mathrm{zeros}(10, 2)$ and $\hat{W}_c = \mathrm{zeros}(10, 1)$. The discount factor is chosen as $\gamma = 0.1$, and the positive matrices $Q$ and $R$ in the cost function are selected as $Q = [50]$ and $R = [1]$, respectively. In this scenario, the control performance with different payloads $m_p = 0\ \mathrm{kg}$, $m_p = 0.5\ \mathrm{kg}$ and $m_p = 1\ \mathrm{kg}$ is tested. Furthermore, to quantitatively evaluate the tracking performance of the aforementioned controllers, three performance indices (i.e., the maximum $M_z$, average $\mu$, and standard deviation $\sigma$) in Ref. 33 are introduced.
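As a quick sanity check of the experimental setup, the total inertia for each tested payload and the three performance indices can be computed as follows; the exact averaging convention of Ref. 33 is not reproduced here, and absolute values of the error are assumed.

```python
import numpy as np

def total_inertia(mp, Jr=0.3, ml=0.5, L=0.5):
    """Total moment of inertia J = Jr + Jl + Jp of the single-link arm,
    with Jl = ml*L**2/3 and Jp = mp*L**2 (parameters from Section 4.2)."""
    return Jr + ml * L ** 2 / 3.0 + mp * L ** 2

def performance_indices(z):
    """Maximum M_z, average mu, and standard deviation sigma of the
    tracking error used for the quantitative comparison."""
    z = np.abs(np.asarray(z, dtype=float))
    return z.max(), z.mean(), z.std()

# Inertia for the three tested payloads: mp = 0, 0.5 and 1 kg
for mp in (0.0, 0.5, 1.0):
    print(f"mp = {mp:.1f} kg -> J = {total_inertia(mp):.4f} kg*m^2")
```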
where

$$L_1 = \sum_{i=1}^{n}\frac{1}{2} z_i^{\mathrm T} z_i,\quad L_2 = \frac{1}{2}\tilde\zeta^{\mathrm T}\tilde\zeta,\quad L_3 = \frac{1}{2}\mathrm{tr}\!\left(\tilde{W}_c^{\mathrm T}\tilde{W}_c\right),\quad L_4 = \frac{1}{2}\mathrm{tr}\!\left(\tilde{W}_a^{\mathrm T}\tilde{W}_a\right),\quad L_5 = \frac{1}{2}\tilde{x}_n^{\mathrm T}\tilde{x}_n \quad (\mathrm{A2})$$

where

$$q_0 = \min\Big\{2(k_1 - 0.5),\ 2(k_2 - 1),\ \ldots,\ 2(k_n - 1.5),\ a - 3,\ 2b - k_{a3} - 2,\ k_{c1}\lambda_{\min}(\Lambda\Lambda^{\mathrm T}) + k_{c2} - 4k_{a1}K_V^{\mathrm T}K_V\|\varphi_c\|^2,\ k_{a1}\lambda_{\min}(\varphi_a\varphi_a^{\mathrm T}) - (k_{a3} + a + 2)\|\varphi_a\|^2 + k_{a2}\Big\}$$

$$q_1 = 0.5\left(k_{c1}\bar\varepsilon_{cm}^2 + \bar\varepsilon_{am}^2 + \bar{d}_m^2\right) + \varepsilon_{am}^2 + d_m^2 + \left(2k_{a1}K_V^{\mathrm T}K_V\|\varphi_{cm}\|^2 + 0.5k_{c2}\right)\|W_{cm}\|^2 + \left(0.5k_{a2} + k_{a1}\|\varphi_{am}\|^2\right)\|W_{am}\|^2$$

To ensure $q_0 > 0$, the following conditions must be fulfilled:

$$\begin{cases} k_1 > 0.5,\ k_2 > 1,\ \ldots,\ k_n > 1.5 \\ a > 3,\quad 2b - k_{a3} - 2 > 0 \\ k_{c1}\lambda_{\min}(\Lambda\Lambda^{\mathrm T}) + k_{c2} - 4k_{a1}K_V^{\mathrm T}K_V\|\varphi_c\|^2 > 0 \\ k_{a1}\lambda_{\min}(\varphi_a\varphi_a^{\mathrm T}) + k_{a2} - (k_{a3} + a + 2)\|\varphi_a\|^2 > 0 \end{cases} \quad (\mathrm{A9})$$

Solving the aforementioned differential inequality Eq. (A8), we have

$$L(t) \le \left(L(0) - \frac{q_1}{q_0}\right)\exp(-q_0 t) + \frac{q_1}{q_0} \le L(0) + \frac{q_1}{q_0} \quad (\mathrm{A10})$$

Consequently, all system signals are bounded according to the definition of $L$ in Eq. (A1).

References

1. Han SS, Jiao ZX, Wang CW, et al. Fuzzy robust nonlinear control approach for electro-hydraulic flight motion simulator. Chin J Aeronaut 2015;28(1):294–304.
2. Yao JY, Deng WX. Active disturbance rejection adaptive control of uncertain nonlinear systems: Theory and application. Nonlinear Dyn 2017;89(3):1611–24.
3. Deng WX, Yao JY, Ma DW. Time-varying input delay compensation for nonlinear systems with additive disturbance: An output feedback approach. Int J Robust Nonlinear Control 2018;28(1):31–52.
4. Lu Y. Disturbance observer-based backstepping control for hypersonic flight vehicles without use of measured flight path angle. Chin J Aeronaut 2021;34(2):396–406.
5. Yao JY, Jiao ZX, Ma DW. Extended-state-observer-based output feedback nonlinear robust control of hydraulic systems with backstepping. IEEE Trans Ind Electron 2014;61(11):6285–93.
6. Chen M, Ge SS, Ren BB. Adaptive tracking control of uncertain MIMO nonlinear systems with input constraints. Automatica 2011;47(3):452–65.
7. Wu XQ, Xu KX, He XX. Disturbance-observer-based nonlinear control for overhead cranes subject to uncertain disturbances. Mech Syst Signal Process 2020;139:106631.
8. Bu XW, Wu XY, Ma Z, et al. Novel adaptive neural control of flexible air-breathing hypersonic vehicles based on sliding mode differentiator. Chin J Aeronaut 2015;28(4):1209–16.
9. Ouyang YC, Dong L, Xue L, et al. Adaptive control based on neural networks for an uncertain 2-DOF helicopter system with input deadzone and output constraints. IEEE/CAA J Autom Sin 2019;6(3):807–15.
10. Ma L, Xu N, Zhao XD, et al. Small-gain technique-based adaptive neural output-feedback fault-tolerant control of switched nonlinear systems with unmodeled dynamics. IEEE Trans Syst Man Cybern 2021;51(11):7051–62.
11. Zhang T, Ge SS, Hang CC. Adaptive neural network control for strict-feedback nonlinear systems using backstepping design. Automatica 2000;36(12):1835–46.
12. Yao ZK, Yao JY, Sun WC. Adaptive RISE control of hydraulic systems with multilayer neural-networks. IEEE Trans Ind Electron 2019;66(11):8638–47.
13. Wang XJ, Yin XH, Wu QH, et al. Disturbance observer based adaptive neural control of uncertain MIMO nonlinear systems with unmodeled dynamics. Neurocomputing 2018;313:247–58.
14. Sutton RS, Barto AG. Reinforcement learning: An introduction. 2nd ed. Cambridge: MIT Press; 2018. p. 331–2.
15. Widrow B, Gupta NK, Maitra S. Punish/reward: Learning with a critic in adaptive threshold systems. IEEE Trans Syst Man Cybern 1973;3(5):455–65.
16. Cui RX, Yang CG, Li Y, et al. Adaptive neural network control of AUVs with control input nonlinearities using reinforcement learning. IEEE Trans Syst Man Cybern 2017;47(6):1019–29.
17. Guo XX, Yan WS, Cui RX. Event-triggered reinforcement learning-based adaptive tracking control for completely unknown continuous-time nonlinear systems. IEEE Trans Cybern 2020;50(7):3231–42.
18. He W, Gao HJ, Zhou C, et al. Reinforcement learning control of a flexible two-link manipulator: an experimental investigation. IEEE Trans Syst Man Cybern 2021;51(12):7326–36.
19. Yang J, Su JY, Li SH, et al. High-order mismatched disturbance compensation for motion control systems via a continuous dynamic sliding-mode approach. IEEE Trans Ind Inform 2014;10(1):604–14.
20. Razmjooei H, Shafiei MH, Palli G, et al. Non-linear finite-time tracking control of uncertain robotic manipulators using time-varying disturbance observer-based sliding mode method. J Intell Rob Syst 2022;104(2):1–13.
21. Liang YQ, Dong Q, Zhao YJ. Adaptive leader-follower formation control for swarms of unmanned aerial vehicles with motion constraints and unknown disturbances. Chin J Aeronaut 2020;33(11):2972–88.
22. Xian B, Zhang X, Zhang HN, et al. Robust adaptive control for a small unmanned helicopter using reinforcement learning. IEEE Trans Neural Netw Learn Syst 2022;33(12):7589–97.
23. Wang XR, Wang QL, Sun CY. Prescribed performance fault-tolerant control for uncertain nonlinear MIMO system using actor-critic learning structure. IEEE Trans Neural Netw Learn Syst 2022;33(9):4479–90.
24. Xu B, Sun FC, Pan YP, et al. Disturbance observer based composite learning fuzzy control of nonlinear systems with unknown dead zone. IEEE Trans Syst Man Cybern 2017;47(8):1854–62.
25. Jing YH, Yang GH. Fuzzy adaptive quantized fault-tolerant control of strict-feedback nonlinear systems with mismatched external disturbances. IEEE Trans Syst Man Cybern 2020;50(9):3424–34.
26. Zhang R, Xu B, Shi P. Output feedback control of micromechanical gyroscopes using neural networks and disturbance observer. IEEE Trans Neural Netw Learn Syst 2022;33(3):962–72.
27. Min HF, Xu SY, Fei SM, et al. Observer-based NN control for nonlinear systems with full-state constraints and external disturbances. IEEE Trans Neural Netw Learn Syst 2022;33(9):4322–31.
28. Ran MP, Li JC, Xie LH. Reinforcement-learning-based disturbance rejection control for uncertain nonlinear systems. IEEE Trans Cybern 2022;52(9):9621–33.
29. Kim JW, Shim H, Yang I. On improving the robustness of reinforcement learning-based controllers using disturbance observer. 2019 IEEE 58th conference on decision and control (CDC). Piscataway: IEEE Press; 2020. p. 847–52.
30. Chen WH, Yang J, Guo L, et al. Disturbance-observer-based control and related methods—an overview. IEEE Trans Ind Electron 2015;63(2):1083–95.
31. Xu B, Shi ZK, Yang CG, et al. Composite neural dynamic surface control of a class of uncertain nonlinear systems in strict-feedback form. IEEE Trans Cybern 2014;44(12):2626–34.
32. Hojati M, Gazor S. Hybrid adaptive fuzzy identification and control of nonlinear systems. IEEE Trans Fuzzy Syst 2002;10(2):198–210.
33. Yao ZK, Liang XL, Zhao QT, et al. Adaptive disturbance observer-based control of hydraulic systems with asymptotic stability. Appl Math Model 2022;105:226–42.