

Computers and Electrical Engineering 121 (2025) 109870

Contents lists available at ScienceDirect

Computers and Electrical Engineering


journal homepage: www.elsevier.com/locate/compeleceng

Nonlinear robust integral based actor–critic reinforcement learning control for a perturbed three-wheeled mobile robot with mecanum wheels

Phuong Nam Dao *, Minh Hiep Phung

School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam

ARTICLE INFO

Keywords: Reinforcement learning; Actor/critic structure; Mecanum-wheel mobile robots (MWMR); Optimal tracking control; Robust integral of the sign of the error (RISE)

ABSTRACT

In this article, a novel Robust Integral of the Sign of the Error (RISE)-based Actor/Critic reinforcement learning control structure is established, which addresses the trajectory tracking control problem, optimality performance, and observer effectiveness of a three-mecanum-wheeled mobile robot subject to slipping effects. The Actor–Critic reinforcement learning algorithm with a discount factor is introduced in integration with the nonlinear RISE feedback term, which is designed to eliminate the dynamic uncertainties/disturbances from the affine nominal system. Moreover, the persistence of excitation (PE) condition can be dispensed with due to the presence of the RISE term. Stability analyses in two proposed theorems demonstrate that all the signals in the closed-loop system and the learning weights are uniformly ultimately bounded (UUB), and that the RISE term promotes the tracking effectiveness. Finally, simulation results and comparisons illustrate the tracking capability as well as the economy in control resources of the proposed method.

1. Introduction

The motion control field for Mecanum-Wheeled Mobile Robots (MWMRs) has attracted a great deal of interest owing to its widespread applications in the service industry, the military, and aerospace [1,2]. In recent years, several significant strategies for the trajectory tracking control of MWMRs have been reported in [2–10]. To cope with external disturbances and unmodeled dynamics, several control approaches for MWMRs appear in the literature. Among them, sliding mode control (SMC) designs have been discussed to achieve trajectory tracking [2,3,6,8] using appropriate sliding surfaces. In [8], a simple sliding surface was presented to compute control signals that drive the closed-loop system toward the sliding surface (SS) and keep it there. However, finite-time convergence of the reaching phase must be ensured for the trajectory tracking requirement to be satisfied. Therefore, several extensions were proposed with exponential-function-based sliding surfaces, such as the nonsingular terminal sliding mode (NTSM) approach [6]. Moreover, by considering an extended state observer (ESO), several complete output feedback control designs were presented for MWMRs to improve the tracking performance [2,3]. The authors in [2] extended conventional nonlinear control to prescribed-time formation tracking control using linear matrix inequalities (LMIs). To overcome the challenge of external disturbances and dynamic uncertainties, a different approach has been developed based on Fuzzy/Neural Network estimation [10]. On the other hand, several other objectives have been addressed for MWMR systems, such as calibration and indoor positioning [4,5] by the Monte Carlo method, minimizing a special cost function.

* Corresponding author.
E-mail address: [email protected] (P.N. Dao).

https://doi.org/10.1016/j.compeleceng.2024.109870
Received June 17, 2024; Received in revised form 5 September 2024; Accepted November 10, 2024
Available online 22 November 2024
0045-7906/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Most studies on control design for MWMRs consider conventional nonlinear control structures based on Lyapunov stability theory, focusing on trajectory tracking. However, few works have actually developed an optimal control approach in connection with the trajectory tracking problem [1].
Another approach to dealing with external disturbances and dynamic uncertainties is the RISE technique [11–14]. The key idea of the RISE method is to employ the integral of the sign of the error to obtain a control design without the dynamic model. However, due to the effect of the sign function, the existence of a solution of the closed-loop system must be considered in the sense of Filippov [11]. In [12], the RISE method was extended with a trigonometric function to address actuator saturation in classical manipulator control systems. Furthermore, to satisfy transient performance requirements, the RISE method was modified by adding a smooth decreasing performance function [14] to obtain a control design for manipulators. A time-varying RISE technique was also investigated to handle the sensitivity to disturbances and the limits of tuning capability [13]. However, a disadvantage of the RISE method is that it requires the initial state variables to be located in a prescribed region of attraction. On the other hand, to the best of our knowledge, comprehensive research on MWMRs under the framework of RL-based optimal control with a RISE feedback term remains unreported.
Also, research on the optimal performance of dynamic systems has been widely pursued in recent decades [1,15,16]. Approximate/adaptive dynamic programming (ADP) is considered a branch of reinforcement learning (RL), which is able to find the optimal control scheme without directly solving the Hamilton–Jacobi–Bellman (HJB) partial differential equation (PDE). For the development of RL algorithms for the trajectory tracking control problem, it is important to handle non-autonomous systems due to the time-varying reference [1,15]. Most existing studies solve the challenge of non-autonomous systems by the indirect method of transforming them into equivalent autonomous systems with extended state variables [1,15,17]. It should be emphasized that few works have actually addressed direct RL algorithms for non-autonomous systems [18,19], which utilize a Taylor approximation to estimate the value function. In general, two major directions of RL control design have been developed for robotic systems. The first approach is to find the optimal control based on data collection and the integral RL equation, with the advantage of being model-free [20–24]. In [22], to avoid solving for the value function after each step, value iteration (VI) is employed with guaranteed convergence. The authors in [20,21,24] developed a model-free approach using an off-policy algorithm for discrete-time systems after transforming the problem into a static optimization subject to algebraic equations based on Sylvester maps. This approach was extended to linear continuous-time multi-agent systems (MASs) by considering a zero-sum game problem [21]. Additionally, the work in [23] addressed actuator saturation and complete dynamic uncertainty by considering a trigonometric function and the state-following (StaF) method to approximate the optimal value function, respectively. However, this first RL control direction requires the collected data to satisfy the persistence of excitation (PE) condition. In addition, the satisfaction of not only the tracking performance but also the convergence of the learning weights is difficult to prove. The second approach of RL control design is developed via the Actor/Critic learning mechanism, which evaluates the feedback control scheme through Actor and Critic learning networks [1,15,16,25–28]. For the Actor/Critic learning mechanism, the authors in [1,15,25] simultaneously implement the learning algorithm in Actor and Critic neural networks based on the minimization of the Hamiltonian function without a discount factor. Additionally, the Actor/Critic method was also extended to the overall system by the backstepping procedure [27,29], and it can be implemented for discrete-time systems [28]. It should be emphasized that the effect of dynamic uncertainties and external disturbances was handled directly by integrating them into the cost function, leading to the Hamilton–Jacobi–Isaacs (HJI) equation [28]. However, the impact of a discount factor on the Actor/Critic learning mechanism has rarely been investigated.
In the aforementioned literature, the dynamic uncertainties/external disturbances in MWMRs are always handled by conventional robust control or disturbance observers [2,10]. However, it is difficult, or even impossible, to guarantee optimal effectiveness with these techniques. Furthermore, the RL-based optimal control strategy for MWMR systems in [1] did not investigate the effect of dynamic uncertainties/external disturbances, which reduces the tracking performance. On the other hand, RL-based optimal control schemes for robotics require the PE condition [1,15]. Meanwhile, it is well known that the RISE algorithm [11–14], which is model-free, can be utilized to provide guarantees without satisfying the PE condition. That is to say, RISE can improve the trajectory tracking performance through its observer capability in the absence of the PE condition. Therefore, RISE presents a better opportunity to unify trajectory tracking control and the optimality requirement than conventional robust control and classical disturbance observers. Inspired by the aforementioned challenges and the advantages of the RISE algorithm, and to ensure the unification of trajectory tracking control, optimality performance, and observer effectiveness against external disturbances/dynamic uncertainties, we develop a RISE-based RL strategy with a discount factor for MWMRs. In contrast to previous works, the main contributions of this article are summarized as follows:

(1) The RL-based optimal control strategy with a discount factor is first constructed for the controller design of MWMRs by transforming the tracking error model of MWMRs into an autonomous affine system using filtered variables; the results are summarized in Theorem 1. The trajectory tracking objective and the convergence of the learning process are guaranteed in the presence of the dynamic uncertainties/external disturbances caused by unknown parameters and the slipping term. Meanwhile, a stability analysis is given for the closed-loop system with the proposed discount factor.
(2) To deal with the dynamic uncertainties/external disturbances in MWMRs, and in contrast to existing approximation [10] and robust controllers [3,6,8], a control framework combining the RISE term and the Actor/Critic strategy is proposed in Theorem 2 to guarantee that the observation error converges to the origin. In addition, the presented method achieves better tracking performance than the conventional RL-based optimal control scheme in [1].
(3) Unlike previous RL methods, exemplified by [15,25], the PE condition in the RL-based optimal control strategy is eliminated in Theorem 2 while preserving tracking and convergence, improving the control performance of the MWMR control system. Two theorems with strict proofs and comparative simulations are given to demonstrate the proposed control structure.


The remainder of this paper is organized as follows: preliminaries and the problem statement are presented in Section 2, and the proposed control design together with the stability, optimality, and observer analyses is described in Section 3. The simulation results are given and discussed in Section 4. Finally, Section 5 summarizes the conclusions.
Notation: The following notation is used throughout this article. Let $\mathbb{R}$, $\mathbb{R}^{n}$, and $\mathbb{R}^{n\times m}$ denote the spaces of real numbers, real vectors, and real matrices, respectively. $I_n$ is the identity matrix of compatible dimension and $(\cdot)^{\top}$ represents the transpose operation. $\|x\|$ expresses the Euclidean norm of a vector $x=[x_1,\dots,x_n]^{\top}\in\mathbb{R}^{n}$, i.e., $\|x\|=\sqrt{x_1^{2}+\dots+x_n^{2}}$. Furthermore, $\lambda_{\min}(A)$ denotes the minimal eigenvalue of a symmetric real matrix $A\in\mathbb{R}^{n\times n}$. $\mathbb{R}_{>0}$ and $\mathbb{R}_{\geq 0}$ denote the sets of positive and non-negative real numbers, respectively. $0_{m\times n}$ defines the $m\times n$ zero matrix.

2. Preliminaries and problem formulation

2.1. Kinematics and dynamics of MWMR with consideration of the slipping effect

In this part, we study the kinematics and dynamics of a MWMR under the impact of slipping, which is treated as an external disturbance. First, a brief description of the distinctive mobility of the MWMR: it is capable of simultaneous and independent movement in both translation and rotation through the incorporation of three omni-wheels uniformly placed at 120 degrees around its body (Fig. 1), which facilitates stable operation as well as rapid adaptability of the robot across various tasks. Furthermore, this configuration, together with the specially designed wheels, enables the MWMR to perform motions that are not achievable by conventional wheeled robots [30]. Second, to bring the tracking task closer to reality, the slipping effect is examined with the expectation of accuracy enhancement. Consider the following dynamic equation of a MWMR described in the earth-fixed frame and the body-fixed frame (Fig. 1) as in [1]:
$$\begin{cases}\dot q = v\\ \dot v = f(q) + g(q)\,\tau + d(q, v, \zeta)\end{cases}\qquad(1)$$
where $q(t)=[x, y, \psi]^{\top}\in\mathbb{R}^{3}$ denotes the vector of position and heading angle of the MWMR in the earth-fixed frame, $v(t)\in\mathbb{R}^{3}$ denotes the velocity of the center of mass (CM) in the inertial frame (Fig. 1), and $\tau(t)\in\mathbb{R}^{3}$ corresponds to the torques generated by the 3 motors of the MWMR. The drift term $f(q)\in\mathbb{R}^{3}$ denotes the mechanical and friction effects, constructed from the friction coefficient $\mu$, $\bar M^{-1}$, and the transformation $H_1^{-1}(\psi)$; $g(q)\in\mathbb{R}^{3\times 3}$ represents the effect of the control input $\tau(t)$ on the state, constructed from $\bar M^{-1}$ and $H_1^{-1}(\psi)$; and $d\in\mathbb{R}^{3}$ is a vector of nonlinear disturbances caused by the slipping effect, where the slipping term $\zeta\in[-1,1]$, which represents the effect of tire deformation [1], is inherently unpredictable. The parameter $\mu\in\mathbb{R}_{\geq 0}$ represents the friction coefficient, and the matrices $H_1(\psi), \bar M\in\mathbb{R}^{3\times 3}$ are determined by
$$H_1(\psi) \triangleq \begin{bmatrix} \sin\!\big(\psi+\tfrac{\pi}{3}\big) & -\cos\!\big(\psi+\tfrac{\pi}{3}\big) & -L\\ -\sin(\psi) & \cos(\psi) & -L\\ -\cos\!\big(\psi+\tfrac{\pi}{6}\big) & -\cos\!\big(\psi-\tfrac{\pi}{3}\big) & -L \end{bmatrix},\qquad \bar M \triangleq \begin{bmatrix} \bar m_1 & \bar m_2 & \bar m_2\\ \bar m_2 & \bar m_1 & \bar m_2\\ \bar m_2 & \bar m_2 & \bar m_1 \end{bmatrix},$$

where $L, r\in\mathbb{R}_{>0}$ denote the length from the center of mass to a wheel and the radius of a wheel, respectively (Fig. 1). Next, the constant entries $\bar m_1, \bar m_2$ are determined by the robot mass $m$, the inertia $I$, the geometric length $L$, and the wheel radius $r$; the matrix $\bar M$ satisfies the following property:

Property 1. The inertia matrix $\bar M$ is symmetric and bounded by two matrices as $0 < m_1 I_3 \leq \bar M \leq m_2 I_3$ with two positive constants $m_1 > 0$ and $m_2 > 0$.
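To make the model structure concrete, the following minimal Python sketch assembles $H_1(\psi)$ and $\bar M$ numerically; this is a sketch under stated assumptions: the trigonometric entries and the constants $\bar m_1, \bar m_2$ are illustrative values consistent with the reconstructed structure above, not the paper's exact expressions.

```python
import numpy as np

def H1(psi, L=0.5):
    """Wheel-geometry matrix for three omni-wheels spaced 120 degrees apart.
    Entries follow the reconstructed structure above (assumed, not exact)."""
    return np.array([
        [ np.sin(psi + np.pi/3), -np.cos(psi + np.pi/3), -L],
        [-np.sin(psi),            np.cos(psi),           -L],
        [-np.cos(psi + np.pi/6), -np.cos(psi - np.pi/3), -L],
    ])

def M_bar(m1=5.0, m2=1.0):
    """Constant symmetric inertia matrix with the pattern of Property 1
    (m1 on the diagonal, m2 elsewhere); the values are placeholders."""
    return np.array([[m1, m2, m2],
                     [m2, m1, m2],
                     [m2, m2, m1]])
```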

2.2. Problem formulation and control objective

To leverage the RISE controller for handling disturbances as described in [31], the dynamic equations of the MWMR (1) are rewritten using an earth-fixed representation of the dynamic model as follows:
$$\bar M \ddot q + \bar C(q)\dot q = \bar\tau + \bar d \qquad(2)$$
where $\bar C(q)\dot q = -\bar M f(q)$ collects the friction effects of (1) in the earth-fixed frame, $\bar\tau = \bar M g(q)\tau$ is the transformed control input, and $\bar d = \bar M d$ is the transformed disturbance. The objective is to design a controller ensuring that the system state $q(t)$ tracks a desired time-varying trajectory $q_d(t)$. In order to measure the success of the tracking objective, a trajectory tracking error, denoted by $e_1\in\mathbb{R}^{3}$, is defined as
$$e_1 \triangleq q - q_d \qquad(3)$$

where $q_d\in\mathbb{R}^{3}$ defines the desired time-varying trajectory. To facilitate the subsequent RISE analysis, two filtered tracking errors, denoted by $e_2, r\in\mathbb{R}^{3}$, are also established as
$$e_2 \triangleq \dot e_1 + \alpha_1 e_1,\qquad r \triangleq \dot e_2 + \alpha_2 e_2 \qquad(4)$$

3
Machine Translated by Google

PN Dao and MH Phung Computers and Electrical Engineering 121 (2025) 109870

Fig. 1. Coordinate frames and configuration parameters for a MWMR [1].

where $\alpha_1\in\mathbb{R}^{3\times 3}$ and $\alpha_2\in\mathbb{R}$ are a positive-definite matrix and a positive constant, respectively. To develop a state-space model for the tracking errors in (3) and (4), we differentiate (4) with respect to time and substitute the model (2), (3); the error system for $e_2(t)$ can be obtained as:
$$\bar M\dot e_2 = -\bar C e_2 + S + \bar\tau + \bar d \qquad(5)$$
where $S(e_1,\dot e_1,q_d,\dot q_d,\ddot q_d)\in\mathbb{R}^{3}$ represents a vector of uncertainty defined as
$$S \triangleq -\bar C(\dot q_d - \alpha_1 e_1) - \bar M(\ddot q_d - \alpha_1\dot e_1) \qquad(6)$$
On the other hand, the closed-loop error system for $e_1(t)$ is developed from (4):
$$\dot e_1 = -\alpha_1 e_1 + e_2 \qquad(7)$$
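As a quick illustration of how the filtered variables (3)-(4) are formed from measured signals, a minimal sketch follows; the backward difference used for $\dot e_2$ is an implementation choice, not part of the paper.

```python
import numpy as np

alpha1 = 3.0 * np.eye(3)   # positive-definite gain (value from Table 1)
alpha2 = 50.0              # positive scalar gain (value from Table 1)

def filtered_errors(q, q_dot, q_d, q_d_dot, e2_prev, dt):
    e1 = q - q_d                           # tracking error (3)
    e2 = (q_dot - q_d_dot) + alpha1 @ e1   # filtered error (4): e2 = e1_dot + alpha1 e1
    r = (e2 - e2_prev) / dt + alpha2 * e2  # filtered error (4): backward-difference e2_dot
    return e1, e2, r
```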

With the intention of addressing the RL-based optimization problem on the nominal system and removing the disturbance terms from the dynamic system, the controller input is designed as follows:
$$\bar\tau \triangleq u + \hat\mu \qquad(8)$$
where $\hat\mu$ is designed to approximate the term $-(S+\bar d)$, as shown in Section 3.2, and the RL-based optimal control scheme $u$ is investigated in Section 3.1. Combining (5) and (7) and using the controller input (8), the tracking error dynamics are expressed by the following equation with the notation of (3):

$$\begin{bmatrix}\dot e_1\\ \dot e_2\end{bmatrix} = \underbrace{\begin{bmatrix} -\alpha_1 e_1 + e_2\\ -\bar M^{-1}\bar C e_2 \end{bmatrix}}_{\bar F(e_1, e_2, X_d)} + \underbrace{\begin{bmatrix} 0_{3\times 3}\\ \bar M^{-1} \end{bmatrix}}_{\bar G(e_1, e_2)}\,(u + \Delta) \qquad(9)$$
where the error $\Delta\in\mathbb{R}^{3}$ is defined as the difference between $\hat\mu$ and $-(S+\bar d)$, expressed as
$$\Delta \triangleq \hat\mu + (S + \bar d) \qquad(10)$$

By letting $z \triangleq [e_1^{\top}\ e_2^{\top}]^{\top}\in\mathbb{R}^{6}$, Eq. (9) can be rewritten as
$$\dot z = \bar F(z, X_d) + \bar G(z)\,(u + \Delta) \qquad(11)$$

Then, the following assumption is introduced to serve the tracking purpose:

Assumption 1. The reference dynamics are expressed in Eq. (12) and assumed to be generated by a known Lipschitz-continuous function $\Phi(\cdot,\cdot):\mathbb{R}^{3}\times\mathbb{R}^{3}\to\mathbb{R}^{6}$:
$$\begin{bmatrix}\dot q_d\\ \ddot q_d\end{bmatrix} = \Phi(q_d, \dot q_d) \qquad(12)$$
where the overdot denotes the time-derivative operator.

Remark 1. In practical systems, the trajectory is generated at each time instant depending on various external factors, such as the task requirements, the robot's maneuverability, and the complexity of the environment. The trajectory references are represented in certain mathematical forms; for instance, polynomial and trigonometric functions are appropriate for $q_d$ and $\dot q_d$. Assumption 1 is utilized to transform the time-varying system (11) into an affine system for developing the RL strategy, as described in the next sections.

4
Machine Translated by Google

PN Dao and MH Phung Computers and Electrical Engineering 121 (2025) 109870

In view of Eq. (11), the presence of $X_d(t)$ renders the tracking error dynamics a non-autonomous closed-loop system. Based on Assumption 1, the vector of augmented system state variables is defined so that the system can be examined as an affine system:
$$X = [z^{\top}\ X_d^{\top}]^{\top}\in\mathbb{R}^{12} \qquad(13)$$
where $X_d \triangleq [q_d^{\top}\ \dot q_d^{\top}]^{\top}$. Then the dynamics of $X(t)$ are formulated as:
$$\dot X = F(X) + G(X)\,(u + \Delta) \qquad(14)$$
where
$$F(X) = \begin{bmatrix}\bar F(z, X_d)\\ \Phi(q_d, \dot q_d)\end{bmatrix},\qquad G(X) = \begin{bmatrix}\bar G(e_1, e_2)\\ 0_{6\times 3}\end{bmatrix}.$$
Differently from the zero-sum-game approach, which integrates the disturbance into the performance index via Nash stability theory [32], here the RISE feedback term [31] is utilized as an auxiliary term to handle the dynamic uncertainties and external disturbances (10), under the following assumptions (a code sketch of the augmented dynamics is given after the assumptions):

Assumption 2. The position and direction vector and its derivative are bounded by positive constants $\gamma_1, \gamma_2\in\mathbb{R}_{+}$ such that $\|q(t)\|\leq\gamma_1$, $\|\dot q(t)\|\leq\gamma_2$.

Assumption 3. The first and second partial derivatives of $\bar C(q)$ exist and are bounded.

Assumption 4. The trajectory reference $q_d$ and its first and second derivatives are bounded.

Assumption 5. The unknown disturbance $\bar d$ and its first and second derivatives with respect to time are bounded.

Assumption 6. The matrix $G(X)$ is known and bounded; there exists a known positive constant $\bar g\in\mathbb{R}_{+}$ satisfying $0 < \|G(X)\| \leq \bar g$.
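As referenced above, the following minimal sketch shows how the augmented affine system (13)-(14) is assembled from the pieces defined so far; F_bar, G_bar, and Phi_d stand for $\bar F$, $\bar G$, and $\Phi$ and are assumed to be supplied by the user.

```python
import numpy as np

def augmented_dynamics(X, u, Delta, F_bar, G_bar, Phi_d):
    """X_dot = F(X) + G(X)(u + Delta), per (14), with X = [z; X_d] in R^12."""
    z, X_d = X[:6], X[6:]
    dz = F_bar(z, X_d) + G_bar(z) @ (u + Delta)  # tracking-error dynamics (11)
    dXd = Phi_d(X_d)                             # reference dynamics (12)
    return np.concatenate([dz, dXd])
```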
Control objective: In this article, the problem of tracking a desired time-varying trajectory is addressed in the presence of the ineluctable uncertainties/disturbances $(S+\bar d)$ acting on the mecanum wheels; in particular, the impact of slipping is examined elaborately. Generally speaking, this article concentrates on the following issues:

(1) To track a reference time-varying trajectory and handle the external disturbance simultaneously from the optimization viewpoint, by investigating the optimal control equations under the impact of the discount factor using the modified HJB equation.
(2) To introduce the RISE controller with the assignment of compensating for the uncertainties/disturbances $(S+\bar d)$ under the slipping effect and dynamic uncertainties, improving the trajectory tracking performance.

Remark 2. According to $G(X) = [\bar G^{\top}(e_1,e_2)\ \ 0_{3\times 6}]^{\top}$ and $\bar G(e_1,e_2) = [0_{3\times 3}\ \ \bar M^{-\top}]^{\top}$, Assumption 6 is evidently satisfied, and the tracking control is developed in Section 3 for the variable $X = [z^{\top}\ X_d^{\top}]^{\top}\in\mathbb{R}^{12}$, with $z = [e_1^{\top}\ e_2^{\top}]^{\top}\in\mathbb{R}^{6}$ being one term of $X$. On the other hand, it is worth pointing out that, differently from [1], which considers classical order reduction with the joint derivative, the filtered variable (4) is utilized here to establish the affine system (14) for developing the RL control design in Section 3. The filtered variable (4) is also employed to develop the observer problem for the uncertainties/disturbances $(S+\bar d)$ by the RISE approach, as shown in Section 3.2, which improves the tracking effectiveness. Therefore, we can cope with the general problem of finding the optimal control term without considering the uncertainties/disturbances, while simultaneously handling the uncertainties/disturbances in a distinct part.

3. RISE-based online Actor–Critic algorithm with a discount factor

In this section, a framework combining RISE and an online Actor/Critic algorithm is presented for a perturbed MWMR; the stability of the closed-loop control system is studied in two theorems, and the observer capability of the RISE feedback term is discussed.

3.1. Optimal control with discount factor and the modified HJB equation

Due to the presence of RISE, the dynamic uncertainties/disturbances can be handled by estimating their effects on the system through the term (47), which means that the error $\Delta$ in (10) converges to 0; the proof of this convergence is detailed in Theorem 2. In this section, the nominal system (15), obtained from the perturbed system (14) by ignoring the term $\Delta$, is taken into account in the optimal control design. Moreover, the error $\Delta$ is considered in determining the stability of the closed-loop system, as mentioned in Theorems 1 and 2.
$$\dot X = F(X) + G(X)\,u \qquad(15)$$

The objective of the infinite-horizon optimal control problem is to find an optimal control scheme $u^{*}(t)$ that both achieves the tracking goal and minimizes the following cost function with a positive discount factor $\gamma > 0$:
$$V\big(X(t), u(\cdot)\big) = \int_{t}^{\infty} e^{-\gamma(s-t)}\, r\big(X(s), u(s)\big)\, ds \qquad(16)$$

5
Machine Translated by Google

PN Dao and MH Phung Computers and Electrical Engineering 121 (2025) 109870
where $r(X, u) \triangleq X^{\top}\bar Q X + u^{\top}R u$, $R\in\mathbb{R}^{3\times 3}$ is a positive-definite symmetric constant matrix, and $\bar Q$ is defined as
$$\bar Q = \begin{bmatrix} Q & 0_{6\times 6}\\ 0_{6\times 6} & 0_{6\times 6} \end{bmatrix}$$

where $Q\in\mathbb{R}^{6\times 6}$ is a positive-definite matrix. The addition of the discount factor in the performance index (16) keeps the cost finite as time goes to infinity. Moreover, the admissible-control-policy requirement for the optimal control signal $u^{*}(t)$, as mentioned in [15,25], is not necessary: the constraint set $\Psi(\Omega)$ of the control signal $u(t)$ only requires finiteness of the performance index (16). By inserting the desired trajectory into the state-variable vector (13), the challenge of developing the RL algorithm for time-varying systems is handled. Therefore, by the dynamic programming principle, the Bellman function with respect to an arbitrary time is expressed by the following static function $V^{*}(X(t))$:
$$V^{*}(X(t)) = \min_{u(\cdot)\in\Psi(\Omega)} V\big(X(t), u(\cdot)\big) \qquad(17)$$
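A numerical stand-in for the discounted cost (16), under the assumed weights $Q = 10 I_6$ and $R = 3 I_3$ of Table 1, can be sketched as follows; the left-rectangle quadrature is an implementation choice.

```python
import numpy as np

gamma = 0.5                 # discount factor (Table 1)
Q = 10.0 * np.eye(6)        # penalty on z = [e1; e2] (assumed value)
R = 3.0 * np.eye(3)
Q_bar = np.block([[Q, np.zeros((6, 6))],
                  [np.zeros((6, 6)), np.zeros((6, 6))]])  # only z is penalized

def discounted_cost(Xs, us, dt):
    """Approximates (16) along a dt-sampled trajectory (Xs[k], us[k])."""
    return sum(np.exp(-gamma * k * dt) * (X @ Q_bar @ X + u @ R @ u) * dt
               for k, (X, u) in enumerate(zip(Xs, us)))
```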

The modified Hamiltonian function with a discount factor is formulated by taking the time derivative of the static Bellman function $V^{*}(X(t))$ in (17) by two different methods. The first approach is to differentiate directly:
$$\dot V^{*}(X(t)) = \frac{\partial V^{*\top}}{\partial X}\dot X = \frac{\partial V^{*\top}}{\partial X}\big(F(X) + G(X)u\big) \qquad(18)$$

The second approach is to compute it by employing the dynamic programming (DP) principle for the static Bellman function (17) at time $t$, which yields:
$$V^{*}(X(t)) = \int_{t}^{t+\Delta t} e^{-\gamma(s-t)}\, r\big(X(s), u(X(s))\big)\, ds + e^{-\gamma\Delta t}\, V^{*}\big(X(t+\Delta t)\big) \qquad(19)$$
Hence, it follows that:
$$\frac{V^{*}(X(t)) - V^{*}\big(X(t+\Delta t)\big)}{\Delta t} = \frac{1}{\Delta t}\int_{t}^{t+\Delta t} e^{-\gamma(s-t)}\, r\big(X(s), u(X(s))\big)\, ds + \frac{e^{-\gamma\Delta t} - 1}{\Delta t}\, V^{*}\big(X(t+\Delta t)\big) \qquad(20)$$

In view of (18) and (20), as $\Delta t\to 0$, the following partial differential equation (21) is obtained for $V^{*}(X(t))$:
$$r\big(X(t), u(t)\big) - \gamma V^{*}(X(t)) + \frac{\partial V^{*\top}}{\partial X}\big(F(X) + G(X)u\big) = 0 \qquad(21)$$
ÿ
The modified optimization problem for obtaining the optimal policy $u^{*}(X(t))\in\Psi(\Omega)$ from the optimal value function $V^{*}(X(t))$ in (17) follows from the DP principle as:
$$V^{*}(X(t)) = \min_{u(\cdot)\in\Psi(\Omega)}\left\{\int_{t}^{t+\Delta t} e^{-\gamma(s-t)}\, r(X, u)\, ds + e^{-\gamma\Delta t}\, V^{*}\big(X(t+\Delta t)\big)\right\} \qquad(22)$$
As $\Delta t\to 0^{+}$, the following problem is obtained for the optimal policy $u^{*}(X(t))\in\Psi(\Omega)$:
$$\min_{u(\cdot)\in\Psi(\Omega)}\left\{r(X, u) - \gamma V^{*}(X) + \frac{\partial V^{*\top}}{\partial X}\big(F(X) + G(X)u\big)\right\} = 0 \qquad(23)$$

By utilizing the modified Hamiltonian function with the addition of a discount factor $\gamma > 0$,
$$H\Big(X, u, \frac{\partial V^{*}}{\partial X}, V^{*}\Big) = X^{\top}\bar Q X + u^{\top}R u - \gamma V^{*}(X) + \frac{\partial V^{*\top}}{\partial X}\big(F(X) + G(X)u\big) \qquad(24)$$
with $\frac{\partial V^{*}}{\partial X} \triangleq \nabla V^{*}(X)$, and according to (23), (24), the optimal control signal $u^{*}(t)$ is then obtained from the following optimization:
$$u^{*}(X) = \arg\min_{u(\cdot)\in\Psi(\Omega)} H\Big(X, u, \frac{\partial V^{*}}{\partial X}, V^{*}\Big) = -\frac{1}{2}R^{-1}G^{\top}(X)\nabla V^{*}(X) \qquad(25)$$

Additionally, in view of the optimal control $u^{*}(t)$ in (25) and the optimization problem (23), the following partial differential equation (PDE) is established:
$$H\Big(X, u^{*}, \frac{\partial V^{*}}{\partial X}, V^{*}\Big) = X^{\top}\bar Q X - \gamma V^{*}(X) - \frac{1}{4}\nabla V^{*\top}(X)\, G(X) R^{-1} G^{\top}(X)\,\nabla V^{*}(X) + \nabla V^{*\top}(X)\, F(X) = 0 \qquad(26)$$

However, it is impossible to solve the PDE (26) analytically. Therefore, the on-policy actor/critic strategy is employed to seek the optimal control signal in Section 3.1.1.
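The closed form (25) translates directly into code; in this sketch, grad_V is assumed to return $\nabla V^{*}(X)\in\mathbb{R}^{12}$ (for instance, from the NN model of the next subsection).

```python
import numpy as np

def u_star(X, G, grad_V, R):
    """Optimal policy (25): u* = -1/2 R^{-1} G(X)^T grad V*(X)."""
    return -0.5 * np.linalg.solve(R, G(X).T @ grad_V(X))
```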

Remark 3. Note that it is necessary to employ a discount factor in the cost function (16). In fact, the reference in (13) may not converge to zero in most practical systems. Hence, the cost function (16) would become infinite without the discount factor, whereas the optimal value function $V^{*}(X(t))$ is required to be finite. Additionally, differently from the standard quadratic performance index [15,25], we consider an additional discount factor in the cost function (16) to keep the cost well defined as time goes to infinity. Hence, the modified Hamiltonian function (24) is established by computing the time derivative of the Bellman function by the two different approaches above.

6
Machine Translated by Google

PN Dao and MH Phung Computers and Electrical Engineering 121 (2025) 109870

3.1.1. Online Actor–Critic algorithm implementation with a discount factor


In this section, an online actor–critic algorithm, which involves tuning both critic and actor NNs simultaneously in real time, is used to find an approximate solution of the HJB Eq. (26). According to the Weierstrass high-order approximation theorem [25], the Bellman function $V^{*}(X)$ can be approximated by a single-layer NN as
$$V^{*}(X) = W^{\top}\Phi(X) + \varepsilon(X) \qquad(27)$$
where $W\in\mathbb{R}^{N}$ is an ideal constant weight vector, bounded by a known positive constant $W_M$ such that $\|W\|\leq W_M$. Moreover, $\Phi(X)\in\mathbb{R}^{N}$ is a suitable activation function vector, and $N$, $\varepsilon(X)$ are the number of neurons and the approximation error, respectively. The corresponding gradient of (27) is given by:
$$\nabla V^{*}(X) = \nabla\Phi^{\top}(X)\, W + \nabla\varepsilon(X) \qquad(28)$$

As is well known, on the compact set $\Omega$, the approximation error $\varepsilon(X)$ and its gradient $\nabla\varepsilon(X)$ are bounded, as shown in [25]: $\|\varepsilon\|\leq\varepsilon_M$ and $\|\nabla\varepsilon\|\leq\varepsilon_{dM}$. For the activation functions $\Phi(X)$, we make the following assumption to estimate the attractor of the tracking error in Section 3.3.

Assumption 7. The activation functions $\Phi(X)$ and their gradients are bounded, i.e., $\|\Phi(X)\|\leq\Phi_M$, $\|\nabla\Phi(X)\|\leq\Phi_{dM}$.

The optimal control scheme (25) is rewritten as:
$$u^{*}(X) = -\frac{1}{2}R^{-1}G^{\top}(X)\big(\nabla\Phi^{\top}(X)\, W + \nabla\varepsilon(X)\big) \qquad(29)$$

As the number of neurons tends to infinity, the approximation errors $\varepsilon$, $\nabla\varepsilon$ tend to zero. Additionally, the value function and the control policy in the RL algorithm are given, for a fixed number $N\in\mathbb{N}$, as:
$$\hat V(X, \hat W_c) = \hat W_c^{\top}\Phi(X),\qquad \hat u(X, \hat W_a) = -\frac{1}{2}R^{-1}G^{\top}(X)\nabla\Phi^{\top}(X)\,\hat W_a \qquad(30)$$
where $\hat W_c, \hat W_a\in\mathbb{R}^{N}$ are the estimates of the converged weights $W$, with the deviations of the critic and actor NNs defined as $\tilde W_c \triangleq W - \hat W_c$ and $\tilde W_a \triangleq W - \hat W_a$. The training laws of $\hat W_c$, $\hat W_a$ are obtained after computing the Bellman error as:
$$\delta \triangleq r(X, \hat u) + \hat W_c^{\top}\omega \qquad(31)$$
where
$$\omega(X, \hat u) = \nabla\Phi\big(F(X) + G(X)\hat u\big) - \gamma\Phi \qquad(32)$$
is the regressor combining the model (14), the activation function and its partial derivative, and the approximate control signal.
In light of [1,15], the learning law of the critic weights is formulated as:
$$\dot{\hat W}_c = -k_c\,\Gamma\,\frac{\omega}{1+\nu\,\omega^{\top}\Gamma\omega}\,\delta,\qquad \dot\Gamma = -k_c\Big(-\lambda\,\Gamma + \Gamma\,\frac{\omega\omega^{\top}}{1+\nu\,\omega^{\top}\Gamma\omega}\,\Gamma\Big) \qquad(33)$$
where $k_c, \nu\in\mathbb{R}_{+}$ are positive constants, $\lambda\in(0,1)$ is the constant forgetting factor, and $\Gamma\in\mathbb{R}^{N\times N}$ is the estimated gain matrix. The actor weights are given by the following learning algorithm:
$$\dot{\hat W}_a = -\frac{k_{a1}}{\sqrt{1+\omega^{\top}\omega}}\,\nabla\Phi\, G(X) R^{-1} G^{\top}(X)\nabla\Phi^{\top}\big(\hat W_a - \hat W_c\big) - k_{a2}\big(\hat W_a - \hat W_c\big) \qquad(34)$$
where $k_{a1}, k_{a2}\in\mathbb{R}_{+}$ are positive learning gains.
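One forward-Euler integration step of the tuning laws (31)-(34), as reconstructed above, can be sketched as follows; the gains follow Table 1, and the exact damping structure of (34) should be treated as an assumption rather than the paper's definitive form.

```python
import numpy as np

def ac_step(Wc, Wa, Gamma, X, u, F, G, Phi, grad_Phi, Q_bar, R,
            gamma=0.5, k_c=0.08, k_a1=0.2, k_a2=50.0,
            nu=1e-3, lam=1e-3, dt=1e-3):
    omega = grad_Phi(X) @ (F(X) + G(X) @ u) - gamma * Phi(X)   # regressor (32)
    delta = X @ Q_bar @ X + u @ R @ u + Wc @ omega             # Bellman error (31)
    m = 1.0 + nu * omega @ Gamma @ omega                       # normalizer
    dWc = -k_c * (Gamma @ omega / m) * delta                   # critic law (33)
    dGamma = -k_c * (-lam * Gamma
                     + Gamma @ np.outer(omega, omega) @ Gamma / m)
    D = grad_Phi(X) @ G(X) @ np.linalg.solve(R, G(X).T) @ grad_Phi(X).T
    dWa = (-k_a1 / np.sqrt(1.0 + omega @ omega) * (D @ (Wa - Wc))
           - k_a2 * (Wa - Wc))                                 # actor law (34)
    return Wc + dt * dWc, Wa + dt * dWa, Gamma + dt * dGamma
```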
According to (24) and (27), the following equality holds:
$$H\Big(X, u^{*}, \frac{\partial V^{*}}{\partial X}, V^{*}\Big) = W^{\top}\omega(X, u^{*}) + r(X, u^{*}) - \gamma\varepsilon + \nabla\varepsilon^{\top}\big(F(X) + G(X)u^{*}\big) = 0 \qquad(35)$$
Combining (35) with (31), the Bellman error can be rewritten as:
$$\delta = -\tilde W_c^{\top}\omega(X, \hat u) + r(X, \hat u) - r(X, u^{*}) + W^{\top}\big(\omega(X, \hat u) - \omega(X, u^{*})\big) + \gamma\varepsilon - \nabla\varepsilon^{\top}\big(F(X) + G(X)u^{*}\big) \qquad(36)$$
Additionally, according to (29) and (30), the following equality holds:
$$u^{*}(X) = \hat u(X, \hat W_a) + \varepsilon_{u1} + \varepsilon_{u2} \qquad(37)$$

7
Machine Translated by Google

PN Dao and MH Phung Computers and Electrical Engineering 121 (2025) 109870

where:
$$\varepsilon_{u1} = -\frac{1}{2}R^{-1}G^{\top}(X)\nabla\Phi^{\top}(X)\,\tilde W_a,\qquad \varepsilon_{u2} = -\frac{1}{2}R^{-1}G^{\top}(X)\nabla\varepsilon(X) \qquad(38)$$

The term involving the regressor difference is rewritten using (32) and (37) as:
$$W^{\top}\big(\omega(X, \hat u) - \omega(X, u^{*})\big) = W^{\top}\nabla\Phi\, G(X)\big(\hat u - u^{*}\big) = -W^{\top}\nabla\Phi\, G(X)\big(\varepsilon_{u1} + \varepsilon_{u2}\big) \qquad(39)$$

Furthermore, based on (30) and (38), it follows that:
$$W^{\top}\nabla\Phi\, G(X)\,\varepsilon_{u1} + \frac{1}{2}W^{\top}\nabla\Phi\, G(X) R^{-1} G^{\top}(X)\nabla\Phi^{\top}(X)\,\tilde W_a = 0 \qquad(40)$$

On the other hand, according to (36), (39) and (40), the Bellman error term is formulated as:
$$\delta = -\tilde W_c^{\top}\omega + \frac{1}{4}\tilde W_a^{\top}\nabla\Phi\, G R^{-1} G^{\top}\nabla\Phi^{\top}\,\tilde W_a + \varepsilon_{\delta} \qquad(41)$$
The dynamics of the critic weight estimation error are obtained by substituting (41) into (33) as:
$$\dot{\tilde W}_c = -k_c\,\Gamma\,\frac{\omega\omega^{\top}}{1+\nu\,\omega^{\top}\Gamma\omega}\,\tilde W_c + k_c\,\Gamma\,\frac{\omega}{1+\nu\,\omega^{\top}\Gamma\omega}\Big(\varepsilon_{\delta} + \frac{1}{4}\tilde W_a^{\top}\nabla\Phi\, G R^{-1} G^{\top}\nabla\Phi^{\top}\,\tilde W_a\Big) \qquad(42)$$
where:
$$\varepsilon_{\delta} = \gamma\varepsilon - \nabla\varepsilon^{\top}\big(F + G u^{*}\big) + \frac{1}{4}\nabla\varepsilon^{\top} G R^{-1} G^{\top}\nabla\varepsilon + \frac{1}{2}W^{\top}\nabla\Phi\, G R^{-1} G^{\top}\nabla\varepsilon \qquad(43)$$
and the normalized regressor $\bar\omega \triangleq \frac{\omega}{1+\nu\,\omega^{\top}\Gamma\omega}$ is guaranteed to satisfy:
$$\|\bar\omega\| \leq \frac{1}{2\sqrt{\nu\,\lambda_{\min}(\Gamma)}} \qquad(44)$$
Furthermore, the dynamics of the critic weight estimation error constitute the following perturbed system:
$$\dot{\tilde W}_c = \Lambda_1 + \Lambda_2 \qquad(45)$$
where $\Lambda_1 = -k_c\,\Gamma\,\frac{\omega\omega^{\top}}{1+\nu\,\omega^{\top}\Gamma\omega}\,\tilde W_c$ and $\Lambda_2 = k_c\,\Gamma\,\frac{\omega}{1+\nu\,\omega^{\top}\Gamma\omega}\big(\varepsilon_{\delta} + \frac{1}{4}\tilde W_a^{\top}\nabla\Phi\, G R^{-1} G^{\top}\nabla\Phi^{\top}\,\tilde W_a\big)$.

3.2. Designing the RISE feedback controller

This section presents the development of the RISE controller. To complete the control input (8), substitute (5) into (4) to obtain:
$$\bar M r = -\bar C e_2 + S + \bar\tau + \bar d + \alpha_2\bar M e_2 \qquad(46)$$

The term $\hat\mu$ in the control signal (8) is designed, based on the subsequent RISE analysis, to estimate the input disturbance $-(S+\bar d)$ [31] as follows:
$$\hat\mu(t) = -(k_s+1)\big(e_2(t) - e_2(0)\big) - \int_{0}^{t}\Big[(k_s+1)\,\alpha_2\, e_2(\sigma) + \beta_1\,\mathrm{sgn}\big(e_2(\sigma)\big)\Big]\, d\sigma \qquad(47)$$
where $k_s, \beta_1\in\mathbb{R}_{+}$ are two positive constant coefficients selected such that:
$$\beta_1 > \zeta_1 + \frac{1}{\alpha_2}\zeta_2 \qquad(48)$$
where $\zeta_1, \zeta_2$ denote two given bounds such that $\|N_d\|\leq\zeta_1$ and $\|\dot N_d\|\leq\zeta_2$, respectively, as described in [31] and (72).
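A discrete-time realization of the RISE term (47) can accumulate the integral with a forward-Euler sum; this is an implementation choice, and the sign convention matches the reconstruction above, in which $\hat\mu$ estimates $-(S+\bar d)$.

```python
import numpy as np

class RISE:
    """Discrete-time RISE feedback (47); gains k_s, a2, b1 follow Table 1."""
    def __init__(self, e2_0, k_s=6.0, a2=50.0, b1=100.0):
        self.e2_0, self.k_s, self.a2, self.b1 = e2_0.copy(), k_s, a2, b1
        self.integral = np.zeros_like(e2_0)

    def update(self, e2, dt):
        # accumulate the integral of (k_s+1)*a2*e2 + b1*sgn(e2)
        self.integral += ((self.k_s + 1.0) * self.a2 * e2
                          + self.b1 * np.sign(e2)) * dt
        # mu_hat(t) = -(k_s+1)(e2(t) - e2(0)) - accumulated integral
        return -(self.k_s + 1.0) * (e2 - self.e2_0) - self.integral
```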

Under the proposed control signal (8), the tracking error of the closed-loop system can be developed by substituting (8) into (46):
$$\bar M r = -\bar C e_2 + S + \bar d + \alpha_2\bar M e_2 + u + \hat\mu \qquad(49)$$

In view of the dynamic Eq. (2) and the control laws (8), (30), it follows that:
$$\alpha_2\bar M e_2 - \bar C e_2 + \big(\hat u(X) + \hat\mu\big) + \big(S + \bar d\big) = \bar M r \qquad(50)$$
which relates the RISE term $\hat\mu$ to the lumped disturbance $(S+\bar d)$ through the filtered error $r$.


Remark 4. The RISE term (47) is added to improve the tracking performance of the RL-based control signal (30), owing to its observer property with respect to the dynamic uncertainties/disturbance $-(S+\bar d)$ (Theorem 2).

3.3. Stability analysis of closed-loop system

For the performance analysis, two theorems are given to consider the stability of the closed-loop system under the control framework (8) combining the RL controller and the RISE term. Theorem 1 focuses on the optimal control problem, with the tracking performance shown to be uniformly ultimately bounded (UUB). Theorem 2 investigates the tracking effectiveness of the RISE feedback term (47) under requirements on the initial state variables.

Theorem 1. Let Assumptions 1, 6, 7 hold and consider the mecanum WMR system expressed by the dynamic Eq. (1). Suppose the proposed control framework (8) is established with the RISE feedback term (47) and the RL-based optimal control strategy (30) under the adjusting mechanisms (34), (33), that the signal vector $\bar\omega$ satisfies the PE condition (52), and that the design parameters are selected to satisfy $c_3 > k_c b_1 b_2$ and $k_{a2} > \frac{1}{2}b_2 + k_c b_1 b_2$, with $c_3$ obtained from (53) and $b_1, b_2$ from the estimates (54). Then, the following properties are achieved:

(1) The errors of the weights of the Actor–Critic NNs are uniformly ultimately bounded (UUB); and
(2) the tracking error $z\in\mathbb{R}^{6}$ of the mecanum WMR is also UUB.

Proof. It should be noted that, after ignoring the effect of the perturbation $\Lambda_2$ in (45), the exponential convergence of $\tilde W_c$ for the following nominal system
$$\dot{\tilde W}_c = \Lambda_1 = -k_c\,\Gamma\,\frac{\omega\omega^{\top}}{1+\nu\,\omega^{\top}\Gamma\omega}\,\tilde W_c \qquad(51)$$
is obtained under satisfaction of the persistence of excitation (PE) condition (52) [15,25]:
$$\mu_1 I_N \leq \int_{t_0}^{t_0+T}\bar\omega(s)\,\bar\omega^{\top}(s)\, ds \leq \mu_2 I_N,\qquad \forall t_0\geq 0 \qquad(52)$$

Due to the time-varying property of the tracking error models in Eqs. (45), (51), the Lyapunov function candidate must be chosen as a time-varying function for studying the stability of the closed-loop system; the traditional quadratic term $\frac{1}{2}\tilde W_c^{\top}\tilde W_c$ alone is therefore inappropriate. However, thanks to the satisfaction of the PE condition (52), there exists a time-varying function $P:\mathbb{R}^{N}\times[0,\infty)\to\mathbb{R}$, to be used as a term of the complete Lyapunov function candidate, with the following properties:
$$c_1\|\tilde W_c\|^{2} \leq P(\tilde W_c, t) \leq c_2\|\tilde W_c\|^{2},\qquad \frac{\partial P}{\partial t} + \frac{\partial P}{\partial \tilde W_c}\Lambda_1 \leq -c_3\|\tilde W_c\|^{2},\qquad \Big\|\frac{\partial P}{\partial \tilde W_c}\Big\| \leq c_4\|\tilde W_c\| \qquad(53)$$

where $c_1, c_2, c_3, c_4\in\mathbb{R}_{+}$ are positive constant coefficients. According to Assumptions 6 and 7, the following estimates, with positive constants $b_1, b_2, b_3, b_4$, are used to obtain the attractor after taking the time derivative of the Lyapunov function candidate:
$$\|\Gamma\bar\omega\| \leq b_1;\quad \Big\|\frac{1}{2}\nabla\Phi\, G R^{-1} G^{\top}\nabla\Phi^{\top}\Big\| \leq b_2;\quad \|\varepsilon_{\delta}\| \leq b_3;\quad \Big\|\frac{1}{4}\nabla\Phi\, G R^{-1} G^{\top}\nabla\varepsilon + \frac{1}{2}\nabla\Phi\, G R^{-1} G^{\top}\nabla\Phi^{\top} W\Big\| \leq b_4 \qquad(54)$$

Additionally, in order to study the convergence of not only $\tilde W_c$ but also $\tilde W_a$, and the trajectory tracking performance of $X(t)$ in (14), the Lyapunov function candidate is augmented with the optimal value function $V^{*}(X)$ and the quadratic form $\frac{1}{2}\tilde W_a^{\top}\tilde W_a$, giving:
$$\mathcal{L} \triangleq V^{*}(X) + P(\tilde W_c, t) + \frac{1}{2}\tilde W_a^{\top}\tilde W_a \qquad(55)$$

Since the Bellman function $V^{*}(X)$ in (17) is a smooth, positive-definite function, there exist two class-$\mathcal{K}$ functions $\kappa_1, \kappa_2$ such that:
$$\kappa_1(\|X\|) \leq V^{*}(X) \leq \kappa_2(\|X\|),\qquad X = [z^{\top}\ X_d^{\top}]^{\top}\in\mathbb{R}^{12} \qquad(56)$$

Based on (53) and (56), the Lyapunov function candidate is proper, as shown by:
$$\kappa_1(\|X\|) + c_1\|\tilde W_c\|^{2} + \frac{1}{2}\|\tilde W_a\|^{2} \leq \mathcal{L}(X, \tilde W_c, \tilde W_a) \leq \kappa_2(\|X\|) + c_2\|\tilde W_c\|^{2} + \frac{1}{2}\|\tilde W_a\|^{2}$$


Taking the time derivative of $\mathcal{L}$ along the closed-loop trajectory (14) under the RL-based optimal control strategy $\hat u(X)$ in (30), we achieve:
$$\dot{\mathcal{L}} = \frac{\partial V^{*\top}}{\partial X}\big(F(X) + G(X)\hat u\big) + \frac{\partial P}{\partial t} + \frac{\partial P}{\partial \tilde W_c}\big(\Lambda_1 + \Lambda_2\big) + \tilde W_a^{\top}\dot{\tilde W}_a \qquad(57)$$
where
$$\dot{\tilde W}_a = -\dot{\hat W}_a,\qquad \dot{\tilde W}_c = \Lambda_1 + \Lambda_2 \qquad(58)$$

Thanks to the property of the Hamiltonian (21) expressed through (26), it follows that:
$$\frac{\partial V^{*\top}}{\partial X}F(X) = \gamma V^{*}(X) - X^{\top}\bar Q X + \frac{1}{4}\nabla V^{*\top}\, G R^{-1} G^{\top}\,\nabla V^{*} \qquad(59)$$

Substituting (59) into (57), then combining with $\dot{\tilde W}_a = -\dot{\hat W}_a$, (53), and the inequalities (54), (25), (34), we obtain:
$$\dot{\mathcal{L}} \leq \gamma V^{*}(X) - X^{\top}\bar Q X + 2\tilde W_a^{\top}\Theta(X) - c_3\|\tilde W_c\|^{2} + c_4\|\tilde W_c\|\,\|\Lambda_2\| - k_{a2}\|\tilde W_a\|^{2} + \big(k_{a1}b_2 + k_{a2}\big)\|\tilde W_a\|\,\|\tilde W_c\| \qquad(60)$$
where $\Theta(X)$ collects the bounded cross terms induced by the mismatch $\hat u - u^{*}$.

According to (29), (37), (38) and using the estimates (54), it follows that:
$$2\tilde W_a^{\top}\Theta(X) \leq \frac{1}{2}b_2\|\tilde W_a\|^{2} + \frac{1}{2}b_2\|\tilde W_a\|\,\|\tilde W_c\| + b_4\|\tilde W_a\| \qquad(61)$$
and the term $\Lambda_2$ in (58) is bounded by:
$$\|\Lambda_2\| \leq k_c b_1\big(b_3 + b_2\|\tilde W_a\|^{2}\big) \qquad(62)$$

In view of the inequality (54), the Bellman error $\delta$ in (36), and the classical inequalities $x^{\top}y \leq \|x\|\,\|y\|$, $\frac{\|\omega\|}{1+\nu\,\omega^{\top}\Gamma\omega} \leq 1$, and $\frac{1}{1+\nu\,\omega^{\top}\Gamma\omega} \leq 1$, the estimate is modified as:
$$c_4\|\tilde W_c\|\,\|\Lambda_2\| \leq k_c b_1 b_2\|\tilde W_c\|^{2} + k_c b_1 b_2\|\tilde W_a\|^{2} + k_c b_1 b_2 b_3\|\tilde W_c\| + k_c b_1 b_2 b_3 \qquad(63)$$

On the one hand, since $X_d$ is bounded (Assumption 4), it can be seen that:
$$\gamma V^{*}(X) \leq \gamma\kappa_2(\|X\|) \leq \gamma\kappa_2\big(2\|z\|\big) + \gamma\kappa_2\big(2\bar X_d\big) \qquad(64)$$
where $\bar X_d$ denotes a bound on $\|X_d\|$.

Moreover, it is obvious that:
$$-X^{\top}\bar Q X = -z^{\top}Q z \leq -\lambda_{\min}(Q)\|z\|^{2} \qquad(65)$$

Combining (60), (61), (62), (63) and (65), we obtain the estimate:
$$\dot{\mathcal{L}} \leq -\big(\lambda_{\min}(Q)\|z\|^{2} - \gamma\kappa_2(2\|z\|)\big) - \big(c_3 - k_c b_1 b_2\big)\|\tilde W_c\|^{2} - \Big(k_{a2} - \frac{1}{2}b_2 - k_c b_1 b_2\Big)\|\tilde W_a\|^{2} + b_5\|\tilde W_c\| + b_6\|\tilde W_a\| + b_7 \qquad(66)$$
with lumped positive constants $b_5, b_6, b_7$.

In view of the Cauchy inequality $ab \leq \theta a^{2} + \frac{1}{4\theta}b^{2}$, it follows that:
$$\dot{\mathcal{L}} \leq -(1-\theta)\Big[\big(\lambda_{\min}(Q)\|z\|^{2} - \gamma\kappa_2(2\|z\|)\big) + \big(c_3 - k_c b_1 b_2\big)\|\tilde W_c\|^{2} + \Big(k_{a2} - \frac{1}{2}b_2 - k_c b_1 b_2\Big)\|\tilde W_a\|^{2}\Big] + C_0 \qquad(67)$$
where $C_0$ is a computable positive constant formed from $b_5, b_6, b_7$, $\theta$, and $\gamma\kappa_2(2\bar X_d)$.

Let us choose the parameters satisfying $0 < \theta < 1$, $c_3 > k_c b_1 b_2$, $k_{a2} > \frac{1}{2}b_2 + k_c b_1 b_2$, and a discount factor $\gamma$ small enough that $\lambda_{\min}(Q)\|z\|^{2} - \gamma\kappa_2(2\|z\|)$ is positive definite. For the purpose of considering the convergence of $X(t)$ as well as $\tilde W_c$, $\tilde W_a$, we define the vector $Z \triangleq [\|z\|, \|\tilde W_c\|, \|\tilde W_a\|]^{\top}$. It can be seen that there exist two class-$\mathcal{K}$ functions $\kappa_3$, $\kappa_4$ satisfying:
$$\kappa_3(\|Z\|) \leq (1-\theta)\Big[\big(\lambda_{\min}(Q)\|z\|^{2} - \gamma\kappa_2(2\|z\|)\big) + \big(c_3 - k_c b_1 b_2\big)\|\tilde W_c\|^{2} + \Big(k_{a2} - \frac{1}{2}b_2 - k_c b_1 b_2\Big)\|\tilde W_a\|^{2}\Big] \leq \kappa_4(\|Z\|) \qquad(68)$$


Based on (68), the inequality (67) can be written as:
$$\dot{\mathcal{L}} \leq -\kappa_3(\|Z\|) + C_0 \qquad(69)$$

Therefore, we can conclude that $Z$ is UUB, with the attraction region:
$$\Omega_Z = \Big\{Z : \|Z\| \leq \kappa_3^{-1}(C_0)\Big\} \qquad(70)$$
The proof of Theorem 1 is completed. □

Remark 5. It can be seen that, according to (32) and (51), the satisfaction of the PE condition (52) depends on the discount factor $\gamma$. Furthermore, the attraction region (70) is also determined by this discount factor. Differently from the RL control scheme for mecanum WMR systems in [1], in which the uncertainties/disturbance were not handled, the proposed controller (8) estimates them to improve the tracking performance. On the other hand, it should be noted that the disadvantage of the PE requirement (52) in Theorem 1 can be removed by the RISE feedback term (47), which is considered in Section 3.2.

It can be seen that the attraction region (70) depends on the effectiveness of not only the RL-based optimal control scheme (30) but also the RISE feedback term (47), as described by the error $\Delta$. The following Theorem 2 focuses on the tracking performance handled by the RISE feedback term (47). To evaluate the stability of the closed-loop system with the RISE control term $\hat\mu(t)$ in (47), the following Lemma 1 [11] and Lemma 2 [33] are utilized to construct the Lyapunov candidate function and to estimate the attractor, respectively:

Lemma 1 ([11]). Let the term $L_1(t)$ be defined as follows:
$$L_1(t) = r^{\top}\big(N_d - \beta_1\,\mathrm{sgn}(e_2)\big) \qquad(71)$$
where $N_d(t)$ denotes the disturbance-dependent part of the $r$-dynamics (81), which by Assumptions 2–5 satisfies
$$\|N_d\| \leq \zeta_1,\qquad \|\dot N_d\| \leq \zeta_2 \qquad(72)$$
If $\beta_1$ is selected according to (48), then the following inequality can be obtained:
$$\int_{0}^{t} L_1(\sigma)\, d\sigma \leq \beta_1\sum_{i=1}^{3}\big|e_{2,i}(0)\big| - e_2^{\top}(0)\, N_d(0) \qquad(73)$$

Proof. See the proof in [11]. □

Lemma 2 ([33]). Assume that a solution of the nonlinear system $\dot y = h(y, t)$, $h:\mathbb{R}^{p}\times\mathbb{R}_{\geq 0}\to\mathbb{R}^{p}$, exists. Let the region $\mathcal{D}$ be defined as $\mathcal{D} = \{y\in\mathbb{R}^{p} : \|y\| < \epsilon\}$ and let $V_L:\mathcal{D}\times\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ be a continuously differentiable function such that:
$$U_1(y) \leq V_L(y, t) \leq U_2(y)\quad\text{and}\quad \dot V_L(y, t) \leq -U(y),\qquad \forall t\geq 0,\ \forall y\in\mathcal{D} \qquad(74)$$
where $U_1(y)$, $U_2(y)$ are continuous positive-definite functions and $U(y)$ is a uniformly continuous positive semi-definite function. If the inequality (74) is satisfied and the initial point satisfies $y(0)\in\mathcal{S}$, then we obtain:
$$U(y(t)) \to 0\quad\text{as}\quad t\to\infty \qquad(75)$$
where the region $\mathcal{S}$ is given by:
$$\mathcal{S} = \{y\in\mathcal{D} : U_2(y) \leq \delta_s\},\qquad 0 < \delta_s < \min_{\|y\|=\epsilon} U_1(y) \qquad(76)$$

Proof. See the proof in [33]. □

Theorem 2. Consider the perturbed MWMR (1) (Fig. 1) under Assumptions 2–5. The proposed control structure (8), consisting of the RISE control law $\hat\mu(t)$ given in (47) and the Actor/Critic strategy (30), guarantees that all closed-loop signals are bounded, achieves optimal performance, and provides the following convergence in the closed-loop MWMR control system:
$$\lim_{t\to\infty}\|\bar z(t)\| = 0,\qquad \forall\, y(0)\in\mathcal{S} \qquad(77)$$
where $\bar z\in\mathbb{R}^{9}$ is defined as $\bar z \triangleq [e_1^{\top}\ e_2^{\top}\ r^{\top}]^{\top}$ and the region $\mathcal{S}$ is given in (93). On the other hand, if $\Phi(X)\in\mathbb{R}^{N}$ in (27) is chosen such that $\nabla\Phi(X)\big|_{z=0} = 0_{N\times 12}$, with $X = [z^{\top}\ X_d^{\top}]^{\top}\in\mathbb{R}^{12}$, then the observer effectiveness is also guaranteed through the following convergence:
$$\hat\mu(t) \to -(S + \bar d)\quad\text{as}\quad t\to\infty \qquad(78)$$

Proof. The RL control signal (30) can be written as
$$\hat u(X) = -\frac{1}{2}R^{-1}\bar G^{\top}(e_1, e_2)\nabla\Phi^{\top}(X)\,\hat W_a \qquad(79)$$

The proof contains three parts. First, based on (49) and the Actor/Critic strategy (79), we have the closed-loop tracking error system:
$$\bar M r = -\bar C e_2 + S + \bar d + \alpha_2\bar M e_2 + \hat u(X) + \hat\mu(t) \qquad(80)$$

To further analyze the convergence of $\bar z(t)$, we take the time derivative of (80) under the RISE term $\hat\mu(t)$ in (47), whose derivative is $\dot{\hat\mu} = -(k_s+1)r - \beta_1\,\mathrm{sgn}(e_2)$, and the Actor/Critic control of Theorem 1, obtaining the dynamics of $r(t)$:
$$\bar M\dot r = \tilde N + N_d - (k_s+1)r - \beta_1\,\mathrm{sgn}(e_2) \qquad(81)$$
with the lumped term
$$\tilde N + N_d = \frac{d}{dt}\Big(-\bar C e_2 + \alpha_2\bar M e_2 + \hat u(X) + S + \bar d\Big) \qquad(82)$$
where $N_d(t)$ collects the bounded components depending only on the reference trajectory and the disturbance (Assumptions 4, 5), while $\tilde N(e_1, e_2, r, t)$ collects the components that vanish with the tracking errors.

Second, to handle the sign term in (81), according to Lemma 1, the positive-definite time-varying Lyapunov function candidate is chosen as:
$$V_L(y, t) = \frac{1}{2}e_1^{\top}e_1 + \frac{1}{2}e_2^{\top}e_2 + \frac{1}{2}r^{\top}\bar M r + \bar P(t) \qquad(83)$$
with the following auxiliary term:
$$\bar P(t) = \beta_1\sum_{i=1}^{3}\big|e_{2,i}(0)\big| - e_2^{\top}(0)\, N_d(0) - \int_{0}^{t} L_1(\sigma)\, d\sigma \qquad(84)$$
and the selected state-variable vector augmented with the term $\sqrt{\bar P(t)}$:
$$y(t) = \big[\bar z^{\top}\ \ \sqrt{\bar P(t)}\big]^{\top}\in\mathbb{R}^{10} \qquad(85)$$

Furthermore, it is worth emphasizing that, for the employment of Lemma 2, the work of Filippov in [11] establishes the existence of a solution $y(t)$ of the set of differential Eqs. (46), (81), (84).
Third, differentiating the time-varying Lyapunov function (83) along the closed-loop error system (81) and substituting the auxiliary term (84), we have:
$$\dot V_L(y, t) = -e_1^{\top}\alpha_1 e_1 - \alpha_2\|e_2\|^{2} + e_1^{\top}e_2 + r^{\top}\tilde N(e_1, e_2, r, t) - (k_s+1)\|r\|^{2} \qquad(86)$$
Since the conventional inequality $2 e_1^{\top}e_2 \leq \|e_1\|^{2} + \|e_2\|^{2}$ holds, the estimate can be verified as:
$$\dot V_L(y, t) \leq r^{\top}\tilde N - (k_s+1)\|r\|^{2} - \frac{2\lambda_{\min}(\alpha_1) - 1}{2}\|e_1\|^{2} - \frac{2\alpha_2 - 1}{2}\|e_2\|^{2} \qquad(87)$$

According to the work in [11,31], there exists a non-decreasing function $\mathcal{F}:\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ such that:
$$\|\tilde N(e_1, e_2, r, t)\| \leq \mathcal{F}(\|\bar z\|)\,\|\bar z\| \qquad(88)$$
where $\bar z \triangleq [e_1^{\top}\ e_2^{\top}\ r^{\top}]^{\top}$. Based on (87) and (88), the following inequality is derived:
$$\dot V_L(y, t) \leq -\eta_3\|\bar z\|^{2} - k_s\|r\|^{2} + \mathcal{F}(\|\bar z\|)\,\|r\|\,\|\bar z\| \qquad(89)$$

where $\eta_3 \triangleq \min\big\{\frac{2\lambda_{\min}(\alpha_1)-1}{2},\ \frac{2\alpha_2-1}{2},\ 1\big\}$. According to Cauchy's inequality, $\mathcal{F}(\|\bar z\|)\,\|r\|\,\|\bar z\| - k_s\|r\|^{2} \leq \frac{\mathcal{F}^{2}(\|\bar z\|)\,\|\bar z\|^{2}}{4 k_s}$. As a result, the estimate (89) can be extended to:
$$\dot V_L(y, t) \leq -\eta_3\|\bar z\|^{2} + \frac{\mathcal{F}^{2}(\|\bar z\|)\,\|\bar z\|^{2}}{4 k_s} \qquad(90)$$

The inequality (90) implies that there exists a constant $c\in\mathbb{R}_{>0}$ such that $\dot V_L(y, t) \leq -c\|\bar z\|^{2}$ for every $y\in\mathcal{D}$, where
$$\mathcal{D} = \Big\{y\in\mathbb{R}^{10} : \|y\| \leq \mathcal{F}^{-1}\big(2\sqrt{k_s\,\eta_3}\big)\Big\} \qquad(91)$$

In the view of Property 1, we can confirm the conditions in Lemma 2 as follows:

1()ÿ(,)ÿ2() (92)


where $U_1(y) = \frac{1}{2}\min\{1, m_1\}\,\|y\|^{2}$ and $U_2(y) = \max\big\{1, \frac{m_2}{2}\big\}\,\|y\|^{2}$.
In the situation that $y\in\mathcal{D}$, from (91) and (92) we obtain $\dot V_L(y, t) \leq 0$ and hence $V_L(y, t)\in\mathcal{L}_{\infty}$; it follows that $e_1, e_2, r\in\mathcal{L}_{\infty}$. In view of (4), $\dot e_1, \dot e_2\in\mathcal{L}_{\infty}$ also holds, and therefore $q(t), \dot q(t)\in\mathcal{L}_{\infty}$. As a result, we can employ Assumption 3, combined with Assumption 2, to conclude that $\bar C(q), \dot{\bar C}(q)\in\mathcal{L}_{\infty}$. Based on the mecanum model (2) and Assumption 2, it can then be verified that $\hat\mu(t)\in\mathcal{L}_{\infty}$, and the boundedness of all closed-loop signals is achieved. Due to the requirement $y(t)\in\mathcal{D}$, $\forall t\geq 0$, for applying Lemma 2, the following set of initial states $y(0)$ is obtained from (92) and the region $\mathcal{D}$ in (91) under the selection of the parameter $k_s$:
$$\mathcal{S} = \Big\{y\in\mathcal{D} : U_2(y) < \frac{1}{2}\min\{1, m_1\}\Big(\mathcal{F}^{-1}\big(2\sqrt{k_s\,\eta_3}\big)\Big)^{2}\Big\} \qquad(93)$$
On the other hand, Lemma 2 requires the uniform continuity of $U(y) = c\|\bar z\|^{2}$ in (91), which can be verified as follows. Since $y(t)\in\mathcal{L}_{\infty}$ and (47) holds, it is true that $\hat\mu\in\mathcal{L}_{\infty}$; using (81), it is concluded that $\dot e_1(t), \dot e_2(t), \dot r(t)\in\mathcal{L}_{\infty}$, so $U(y)$ is uniformly continuous. Finally, in light of Lemma 2, it follows that $\lim_{t\to\infty}\|\bar z(t)\| = 0$, $\forall y(0)\in\mathcal{S}$. According to (2), (6) and (50), the RISE control scheme (47) leads to the relation rewritten as follows:

$$\alpha_2\bar M e_2 - \bar C e_2 + \big(\hat u(X) + \hat\mu\big) + \big(S + \bar d\big) = \bar M r \qquad(94)$$

After obtaining $\lim_{t\to\infty}\|\bar z(t)\| = 0$, $\forall y(0)\in\mathcal{S}$, the definition of $\bar z$ in (85) implies $\|e_1\|, \|e_2\|, \|r\|\to 0$ as $t\to\infty$; along with the expressions in (4), we obtain $\|e_1\|, \|\dot e_1\|, \|e_2\|, \|\dot e_2\|\to 0$ as $t\to\infty$. Therefore, we achieve:
$$z(t)\to 0\quad\text{as}\quad t\to\infty \qquad(95)$$
Furthermore, according to (79) and the condition $\nabla\Phi(X)\big|_{z=0} = 0_{N\times 12}$ of Theorem 2, it follows that $\hat u(X)\big|_{z=0} = 0$. Combining the continuity of $\hat u(X)$ with (95), we obtain $\hat u(X)\to 0$ as $t\to\infty$. Hence, along with (94), we have $\hat\mu(t)\to -(S+\bar d)$ as $t\to\infty$, and the statement (78) is achieved. □

Remark 6. The selection of the activation function $\Phi(X)\in\mathbb{R}^{N}$ in (27) can be implemented to satisfy $\nabla\Phi(X)\big|_{z=0} = 0_{N\times 12}$, $X = [z^{\top}\ X_d^{\top}]^{\top}\in\mathbb{R}^{12}$ (see Section 4). It is worth emphasizing that, unlike the previous Refs. [15,25], the advantage of the proposed control structure (8) is that satisfaction of the PE condition is not required in Theorem 2. However, although the PE condition (52) is not necessary in Theorem 2, the requirement on the initial states in (77) is imposed instead. On the other hand, the result (77) in Theorem 2 points out that the tracking error converges to 0 as $t\to\infty$. It follows that the RISE term (47) is able to play the role of not only a disturbance observer (78), but also a guarantee of the system performance (77). It is worth emphasizing that the strict convergence in (77) is an advantage compared to previous RL studies compensated by a disturbance observer (DO), which only achieve uniformly ultimately bounded (UUB) stability [1,15,17,23,25,27,29].

4. Simulation results

To validate the effectiveness of the proposed RISE-based RL algorithm for the MWMR, two numerical examples are presented. In Case 1, the robot is subject to unfavorable settings characterized by time-varying slippage. In Case 2, a simulation is conducted with constant slippage to compare the performance of our proposed controller with the existing controller in [1]. The robot parameters for the simulations are taken as follows: $m = 10$ kg, wheel geometry $[L, r] = [0.5, 0.05]$ m, inertias $[I, I_w] = [5, 0.1]$ kg·m², and friction coefficient of the surface $\mu = 1$. The activation function $\Phi(X)$, $X\in\mathbb{R}^{12}$, for the approximation of the Bellman function in (27) is chosen with $N = 51$ nodes as:
$$\begin{aligned}\Phi(X) = \big[\, & X_1X_4;\ X_1X_5;\ X_1X_6;\ X_2X_4;\ X_2X_5;\ X_2X_6;\ X_3X_4;\ X_3X_5;\ X_3X_6;\\ & X_1^{2};\ X_2^{2};\ X_3^{2};\ X_4^{2};\ X_5^{2};\ X_6^{2};\\ & X_1^{2}X_7^{2};\ X_1^{2}X_8^{2};\ X_1^{2}X_9^{2};\ X_1^{2}X_{10}^{2};\ X_1^{2}X_{11}^{2};\ X_1^{2}X_{12}^{2};\ X_2^{2}X_7^{2};\ \dots;\ X_6^{2}X_{12}^{2}\,\big]^{\top}\end{aligned}$$
i.e., the 9 bilinear terms $X_iX_j$ ($i\in\{1,2,3\}$, $j\in\{4,5,6\}$), the 6 squares $X_1^{2},\dots,X_6^{2}$, and the 36 products $X_i^{2}X_j^{2}$ ($i\in\{1,\dots,6\}$, $j\in\{7,\dots,12\}$), for a total of 51 nodes.
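The 51-node vector above (as reconstructed) is purely polynomial and quadratic in the error part $z = X_{1:6}$, so its gradient vanishes at $z = 0$, as Theorem 2 requires; a direct transcription:

```python
import numpy as np

def phi(X):
    """51 polynomial features: 9 bilinear X_i X_j (i in 1..3, j in 4..6),
    6 squares X_1^2..X_6^2, and 36 products X_i^2 X_j^2 (i in 1..6, j in 7..12)."""
    feats = [X[i] * X[j] for i in range(3) for j in range(3, 6)]
    feats += [X[i] ** 2 for i in range(6)]
    feats += [X[i] ** 2 * X[j] ** 2 for i in range(6) for j in range(6, 12)]
    return np.array(feats)   # shape (51,)
```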

Remark 7. It is worth emphasizing that a disadvantage of the proposed method lies in obtaining the activation function: a method for choosing an appropriate activation function in general cases has not yet been established. However, a selection was proposed for a particular case of the cost function, as shown in Theorem 2 of [15].

4.1. The case of severe tire erosion with time-varying slippage

In this case, we conduct a simulation of a tracking task with the reference trajectory $q_d(t) = [0.5\cos(0.2t), -0.5\cos(0.2t), 0.5\sin(0.2t)]^{\top}$ in an environment with time-varying slippage $\zeta(t) = \cos(0.5t)\sin(0.5t) + \cos^{2}(0.5t)\sin(0.5t)$. The selection of parameters for the performance index (16) and the control algorithm of Theorem 1 is provided in detail in Table 1. A probing noise $n(t)$, composed of sums of products of sinusoids at mixed frequencies (e.g., $9\big(\sin^{2}(t)\cos(t) + \sin^{2}(2t)\cos(0.1t) + \sin^{2}(1.2t)\cos(0.5t) + \dots\big)$), is added to $u$ in the interval $[0, 50]$ s to satisfy the PE condition.


Table 1
Initial conditions and parameters for the performance index (16) and the control algorithms of Theorems 1, 2 for Case 1.

Initial conditions: $q(0) = [0.1, 0.2, 0.3]^{\top}$, $\dot q(0) = [0.1, 0.2, 0.3]^{\top}$, $\Gamma(0) = I_N$, $\hat W_c(0) = 0.2\,\mathrm{rand}(N, 1)$, $\hat W_a(0) = 0.3\,\mathrm{rand}(N, 1)$ (where $\mathrm{rand}(a, b)$ is an $a$-by-$b$ matrix of random numbers in $[-1, 1]$).

Parameters for the performance index (16): $R = 3 I_3$, $Q = 10 I_6$, $\gamma = 0.5$.

Gains and parameters for the ADP update laws (33), (34): $k_c = 0.08$, $k_{a1} = 0.2$, $k_{a2} = 50$, $\nu = 0.001$, $\lambda = 0.001$.

Gains and parameters for the RISE controller (47): $\alpha_1 = 3 I_3$, $\alpha_2 = 50$, $k_s = 6$, $\beta_1 = 100$.

Table 2
The mean integral absolute error (MIAE), total applied input, and the selected parameters of each controller.

Methodology            MIAE (×10⁻²) (m)   Total applied input (×10⁵) (Nms)   Parameter values
Robust-based RL [1]    3.44               1.392                              10 I₆, 10⁻³, 2.3, 8, 10⁻³; [3, 10⁻³, 10⁻³, 10⁻³]
RISE-based RL          0.12               1.349                              (Q, γ, k_c, k_a1, k_a2) = (10 I₆, 0.5, 0.08, 0.2, 50); (α₁, ν, λ, α₂, k_s, β₁) = (3 I₃, 10⁻³, 10⁻³, 50, 6, 100)
DO-based RL [15]       1.13               1.372                              (Q, γ, k_c, k_a1, k_a2) = (10 I₆, 0.5, 1.22, 0.2, 53); remaining gains 3, 10⁻³, 10⁻³, 50/3, 3
AITSM [34]             0.47               1.426                              [5, 3, 5/3, 22/3, 5/3, 10/3]

Fig. 2. Tracking performance under the developed algorithm for each axis (left) and a closer view of the tracking errors (right).

The simulation lasts 150 seconds, with the results presented in Figs. 2–5. Fig. 2a demonstrates that the robot approaches the reference trajectory within a few seconds from the beginning, even under the probing noise. After the learning phase of 50 seconds, the probing noise is turned off, and the tracking errors shown in Fig. 2b range from $-5\times 10^{-4}$ to $2\times 10^{-4}$ (m or rad). Fig. 3 shows that convergence of both Actor and Critic NN weights is achieved at about 35 s. Fig. 4 illustrates the response of the control inputs generated under the proposed controller. Finally, Fig. 5 showcases the successful disturbance tracking capability of the RISE controller, with the estimation error remaining within the range $(-0.05, 0.05)$ in the 3 degrees of freedom of the system. According to these results, the RISE-based RL controller has shown its effectiveness for the tracking task in the unfavorable environment with time-varying slippage. To emphasize the tracking capability of our algorithm compared with other existing methods, the following subsection is presented.

4.2. The constant slippage case and comparisons

To evaluate the enhancements of our proposed algorithm over existing approaches, we conduct a simulation on the same MWMR as described earlier in this section. This simulation considers four different methods, referenced in [1,15], and [34], for tracking a figure-eight reference trajectory $q_d(t) = [\sin(t), \sin(2t), \cos(0.5t)]^{\top}$. The parameters for each controller are provided in

Fig. 3. Evolution of the actor (a) and critic (b) NN weights.

Fig. 4. Torque generated by the 3 motors of the MWMR over time.

Table 2 for constant slippage $\zeta = -0.1$. Furthermore, the success of the tracking task is quantitatively analyzed using the Mean Integral Absolute Error (MIAE), and the control resource consumption is evaluated through the integral of the absolute input, as shown in Table 2, with the following formulas:
$$\mathrm{MIAE} = \frac{1}{3(T_2 - T_1)}\sum_{i=1}^{3}\int_{T_1}^{T_2}\big|e_{1,i}(t)\big|\, dt,\qquad E_u = \sum_{i=1}^{3}\int_{T_1}^{T_2}\big|\tau_i(t)\big|\, dt$$
where $[T_1, T_2]$ is the examined time interval of the system performing the task. In this scenario, the calculation starts at $T_1 = 50$ s, when all of the controllers have learned. The comparison is made against the work in [1] and two algorithms inspired by [15,34], which are applied to the model established in (2) as follows (a code sketch of the two metrics is given after this list):

• DO-based RL controller: a traditional disturbance observer cooperating with RL-based optimal control as in [15], where the term $-(S+\bar d)$ is estimated through a term $\hat\mu_{DO} = \xi + P(e_2)$ with the following update rule:
$$\dot\xi = -\frac{\partial P}{\partial e_2}\big(\xi + P(e_2) + \varphi(e_2)\big) \qquad(96)$$
where $\xi\in\mathbb{R}^{3}$ is the intermediate term and $P(e_2)$ is a designed function with gain $\kappa > 0$, specified in Table 2. Only the disturbance observer in [15] is utilized, while the RL-based optimal control is handled as in Section 3.1. Further information and the stability analysis of this algorithm are detailed in [15].
• AITSM controller: the existing algorithm in [34] is leveraged to control an MWMR with the model (1); the torques $\tau\in\mathbb{R}^{3}$ are designed with an adaptive integral terminal sliding mode law (see the definitions of symbols and the stability analysis detailed in [34]).
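As referenced before the list, the two comparison metrics of Table 2 can be computed from logged trajectories as follows; trapezoidal quadrature is an implementation choice.

```python
import numpy as np

def miae(t, e1, T1=50.0, T2=150.0):
    """Mean integral absolute error over [T1, T2]; e1 has shape (len(t), 3)."""
    m = (t >= T1) & (t <= T2)
    return np.trapz(np.abs(e1[m]).sum(axis=1), t[m]) / (3.0 * (T2 - T1))

def total_input(t, tau, T1=50.0, T2=150.0):
    """Total applied input E_u over [T1, T2]; tau has shape (len(t), 3)."""
    m = (t >= T1) & (t <= T2)
    return np.trapz(np.abs(tau[m]).sum(axis=1), t[m])
```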

The tracking performance across the different degrees of freedom of the system and the nominal resource consumption under the four controllers are presented in Fig. 6 and in the bar graph of Fig. 7, respectively. Based on these illustrative results and


Fig. 5. Disturbance under the compensation of the RISE controller. The green dashed line corresponds to the actual disturbance signal affecting the system, the blue solid line represents the output of the developed RISE estimator, and the red line shows the error between the unmeasurable effects and the designed estimator.

Fig. 6. Tracking errors under the developed algorithm and the three other controllers.


Fig. 7. Nominal control resource consumption of each method.

Fig. 8. Performance indices of our proposed algorithm and those of [1,15].

quantitative analysis in Table 2, we reach the following discussion. It can be observed that the AITSM algorithm shows rapid tracking convergence and a tracking error with an MIAE of $4.7\times 10^{-3}$ m, which is smaller than the errors produced by the DO-based RL and Robust-based RL algorithms. However, AITSM consumes the most energy, with $E_u = 1.426\times 10^{5}$ Nms, compared to the RL-based optimal algorithms, revealing a trade-off between accuracy and control effort. The RISE-based RL controller (our proposed method) achieves the best performance, with an MIAE of $0.12\times 10^{-2}$ m and a total applied input of $1.349\times 10^{5}$ Nms, indicating both higher accuracy and lower control effort. The Robust-based and DO-based RL controllers also exhibit optimality with lower control resources, at $1.392\times 10^{5}$ and $1.372\times 10^{5}$ Nms, respectively, in comparison with the conventional AITSM controller. Turning to Fig. 8, our developed algorithm accommodates the approximation in (30) with an infinitesimal mismatch against $u^{*}$. That is evidence of the enhancement in operational efficiency provided by the RISE term over the robust term added to the cost function [1] and the traditional DO as in [15].

Remark 8. The results above demonstrate the superiority of our proposed method, but a notable drawback is its strict dependence on an accurately measured model. If the model is inaccurately identified, the learned controller may become suboptimal, negatively impacting task performance. Additionally, the focus on a specific model limits the method's adaptability to varying environments. To address this issue, a model-free approach can be developed, as suggested in [23].


5. Conclusions

This article has studied the satisfaction of trajectory tracking control, optimality effectiveness, and the observer problem for a three-mecanum-wheeled mobile robot subject to external disturbance. Considering a filtered variable and the minimization of the Hamiltonian function, we constructed an Actor/Critic strategy for an order-reduced affine model. Additionally, the RISE scheme provides the estimation of dynamic uncertainties/disturbances as well as an improvement of tracking effectiveness. The convergence of the learning process and the tracking and observer performance are discussed in two theorems, with and without the PE condition. Simulation results have shown that the proposed method obtains tracking, observer, and optimality performance effectively. In the future, the authors plan to conduct experimental validation and extend the RL controller with model-free RL algorithms that do not require complete system dynamics, which is more challenging.

CRediT authorship contribution statement

Phuong Nam Dao: Writing – original draft, Writing – review & editing, Supervision, Methodology. Minh Hiep Phung: Software, Data curation, Conceptualization.

Declaration of competing interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no
significant financial support for this work that could have influenced its outcome.

Data availability

The data that has been used is confidential.

References

[1] Zhang D, Wang G, Wu Z. Reinforcement learning-based tracking control for a three Mecanum wheeled mobile robot. IEEE Trans Neural Netw Learn Syst
2022.
[2] Chang S, Wang Y, Zuo Z, Luo X. Prescribed-time formation control for wheeled mobile robots with time-varying super-twisting extended state observer.
Appl Math Comput 2023;457:128189.
[3] Yuan Z, Tian Y, Yin Y, Wang S, Liu J, Wu L. Trajectory tracking control of a four mecanum wheeled mobile platform: an extended state observer-based sliding
mode approach. IET Control Theory Appl 2020;14(3):415–26.
[4] Bayar G, Hambarci G. Improving measurement accuracy of indoor positioning system of a Mecanum wheeled mobile robot using Monte Carlo-latin
hypercube sampling based machine learning algorithm. J Franklin Inst 2022.
[5] Savaee E, Rahmani Hanzaki A. A new algorithm for calibration of an omni-directional wheeled mobile robot based on effective kinematic parameters
estimation. J Intell Robot Syst 2021;101:1–11.
[6] Sun Z, Xie H, Zheng J, Man Z, He D. Path-following control of Mecanum-wheels omnidirectional mobile robots using nonsingular terminal sliding mode.
Mech Syst Signal Process 2021;147:107128.
[7] Bayar G, Ozturk S. Investigation of the effects of contact forces acting on rollers of a mecanum wheeled robot. Mechatronics 2020;72:102467.
[8] Alakshendra V, Chiddarwar SS. Adaptive robust control of Mecanum-wheeled mobile robot with uncertainties. Nonlinear Dynam 2017;87:2147–69.
[9] Li S, Zhang J, Zhao K, Zhang Y, Sun Z, Xia Y. Trajectory tracking control for four-mecanum-wheel mobile vehicle: A variable gain active disturbance
rejection control approach. Internat J Robust Nonlinear Control 2022;32(4):1990–2006.
[10] Qin P, Zhao T, Dian S. Interval type-2 fuzzy neural network-based adaptive compensation control for omni-directional mobile robot. Neural Comput Appl
2023;1–15.
[11] Xian B, Dawson DM, de Queiroz MS, Chen J. A continuous asymptotic tracking control strategy for uncertain nonlinear systems. IEEE Trans Autom Control 2004;49(7):1206–11.
[12] Fischer N, Kan Z, Kamalapurkar R, Dixon WE. Saturated RISE feedback control for a class of second-order nonlinear systems. IEEE Trans Autom Control 2013;59(4):1094–9.
[13] Saied H, Chemori A, Bouri M, El Rafei M, Francis C, Pierrot F. A new time-varying feedback RISE control for 2nd-order nonlinear MIMO systems: Theory and
experiments. Internat J Control 2019;1–25.
[14] Asl HJ, Narikiyo T, Kawanishi M. RISE-based prescribed performance control of Euler–Lagrange systems. J Franklin Inst 2019;356(13):7144–63.
[15] Pham TL, Dao PN, et al. Disturbance observer-based adaptive reinforcement learning for perturbed uncertain surface vessels. ISA Trans 2022;130:277–92.
[16] Annaswamy AM, Guha A, Cui Y, Tang S, Fisher PA, Gaudio JE. Integration of adaptive control and reinforcement learning for real-time control and learning. IEEE Trans Autom Control 2023.
[17] Dao PN, Liu YC. Adaptive reinforcement learning in control design for cooperating manipulator systems. Asian J Control 2022;24(3):1088–103.
[18] Sun T, Sun XM. An adaptive dynamic programming scheme for nonlinear optimal control with unknown dynamics and its application to turbofan engines.
IEEE Trans Ind Inf 2020.
[19] Wei Q, Liao Z, Yang Z, Li B, Liu D. Continuous-time time-varying policy iteration. IEEE Trans Cybern 2019;50(12):4958–71.
[20] Jiang Y, Kiumarsi B, Fan J, Chai T, Li J, Lewis FL. Optimal output regulation of linear discrete-time systems with unknown dynamics using reinforcement
learning. IEEE Trans Cybern 2019.
[21] Jiang Y, Gao W, Wu J, Chai T, Lewis FL. Reinforcement learning and cooperative H∞ output regulation of linear continuous-time multi-agent systems. Automatica 2023;148:110768.
[22] Xiao G, Zhang H. Convergence analysis of value iteration adaptive dynamic programming for continuous-time nonlinear systems. IEEE Trans Cybern 2023.
[23] Deptula P, Bell ZI, Doucette EA, Curtis JW, Dixon WE. Data-based reinforcement learning approximate optimal control for an uncertain nonlinear system
with control effectiveness faults. Automatica 2020;116:108922.
[24] Zhai G, Tian E, Luo Y, Liang D. Data-driven optimal output regulation for unknown linear discrete-time systems based on parameterization approach.
Appl Math Comput 2024;461:128300.

18
Machine Translated by Google

PN Dao and MH Phung Computers and Electrical Engineering 121 (2025) 109870

[25] Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis KG, Lewis FL, Dixon WE. A novel actor–critic–identifier architecture for approximate optimal control
of uncertain nonlinear systems. Automatica 2013;49(1):82–92.
[26] Liu J, Zhang N, Li Y, Xie X, Tian E, Cao J. Learning-based event-triggered tracking control for nonlinear networked control systems with unmatched disturbance. IEEE Trans Syst Man Cybern: Syst 2022.
[27] Wen G, Chen CP, Ge SS, Yang H, Liu X. Optimized adaptive nonlinear tracking control using actor–critic reinforcement learning strategy. IEEE Trans Ind
Inform 2019;15(9):4969–77.
[28] Li S, Ding L, Zheng M, Liu Z, Li X, Yang H, et al. NN-based reinforcement learning optimal control for inequality-constrained nonlinear discrete-time systems
with disturbances. IEEE Trans Neural Netw Learn Syst 2023.
[29] Wen G, Ge SS, Chen CP, Tu F, Wang S. Adaptive tracking control of surface vessels using optimized backstepping technique. IEEE Trans Cybern
2018;49(9):3420–31.
[30] Kim Y, Singh T. Energy-time optimal trajectory tracking control of wheeled mobile robots. IEEE/ASME Trans Mechatronics 2023.
[31] Dupree K, Patre PM, Wilcox ZD, Dixon WE. Asymptotic optimal control of uncertain nonlinear Euler–Lagrange systems. Automatica 2011;47(1):99–107.
[32] Wang K, Mu C. Learning-based control with decentralized dynamic event-triggering for vehicle systems. IEEE Trans Ind Inf 2022;19(3):2629–39.
[33] Khalil HK. Nonlinear systems, vol. 115. 3rd ed. Prentice Hall; 2002.
[34] Sun Z, Hu S, He D, Zhu W, Xie H, Zheng J. Trajectory-tracking control of Mecanum-wheeled omnidirectional mobile robots using adaptive integral terminal
sliding mode. Comput Electr Eng 2021;96:107500.

Phuong Nam Dao received the Ph.D. degree in Industrial Automation from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2013. He currently holds the position of Associate Professor at Hanoi University of Science and Technology, Vietnam. His research interests include the control of robotic systems and robust/adaptive and optimal control. He is the author/co-author of more than 90 papers (journals, conferences, etc.). E-mail: [email protected]

Minh Hiep Phung is currently studying for the Bachelor's degree in the talent program of Control Engineering and Automation at Hanoi University of Science and Technology, Hanoi, Vietnam, and is working toward a Ph.D. degree. His research interests include optimal control, intelligent control, and their applications to robotic systems. E-mail: [email protected]
