Data Driven Control IEEE Paper
Abstract
This paper investigates a data-driven control approach for autonomous systems using reinforcement
learning (RL). The project focuses on stabilizing an inverted pendulum and robotic arm through
Q-learning, a model-free RL algorithm. The system dynamics, including torque and angular
velocities, are represented discretely to enable optimal control policy learning. Simulations
demonstrate the successful stabilization of both systems, showing the effectiveness of RL in control applications.
Keywords
Reinforcement learning, Q-learning, data-driven control, inverted pendulum, robotic arm, autonomous systems
I. Introduction
Optimal control has been a fundamental topic in control theory for decades, applied in numerous
fields such as robotics, missile guidance, and energy systems. However, for complex nonlinear
systems, traditional control methods often fall short. This paper presents a reinforcement learning
approach to stabilize two classic systems: an inverted pendulum and a robotic arm. Using
Q-learning, the agent learns to apply optimal torques to stabilize both systems, demonstrating the potential of data-driven methods for controlling nonlinear dynamics.
II. Related Work
Reinforcement learning, particularly Q-learning, has seen extensive application in control systems
due to its ability to learn optimal policies without requiring a model of the environment. Previous
studies have demonstrated its effectiveness in stabilizing various control systems. This work contributes to this line of research by applying Q-learning to autonomous systems with unknown
dynamics.
III. System Description
Two benchmark systems are considered:
1. Inverted Pendulum: A classic example of a nonlinear system, where the goal is to stabilize the pendulum in its upright position.
2. Robotic Arm: A multi-joint system where the objective is to apply optimal torques to achieve a desired joint configuration.
In both systems, the state is represented by the angular displacement and velocity, which are
discretized for use with Q-learning. The torque is treated as a discrete action, and the reward function penalizes deviation from the desired equilibrium, as formalized in the sketch below.
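One possible formalization of this discretized problem, given here only as an illustrative sketch (the weights c1 and c2 are assumptions and not values reported in this paper), is

    s = (θ, θ̇) ∈ S,   a = τ ∈ A,   r(s, a) = −(θ² + c1 θ̇² + c2 τ²),

where S and A are finite sets obtained by discretizing the angle/velocity ranges and the admissible torque range, and the reward is largest at the desired equilibrium (θ, θ̇) = (0, 0).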
IV. Methodology
A. Q-Learning Framework
Q-learning is a model-free reinforcement learning algorithm where the agent learns the optimal
policy by updating a Q-table based on observed rewards. The algorithm operates in discrete time
steps: at each step, the agent selects an action, observes the resulting reward, and updates its Q-table accordingly. Three design choices define the framework (a concrete sketch follows this list):
1. State Representation: The continuous state space (angular displacement and velocity) is discretized into a finite number of bins.
2. Action Representation: The torque applied to the pendulum and robotic arm is discretized into a finite set of admissible torque levels.
3. Reward Function: A custom reward function is used to penalize large deviations from equilibrium.
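To make these three choices concrete, the following Python sketch shows one possible discretization and reward function; the bin counts, torque range, and reward weights are illustrative assumptions rather than the exact values used in this work.

    import numpy as np

    # Illustrative discretization; the exact bin counts, torque levels, and
    # reward weights used in this work are not specified in the text above.
    N_THETA, N_OMEGA = 21, 21                        # bins for angle and angular velocity
    theta_bins = np.linspace(-np.pi, np.pi, N_THETA)
    omega_bins = np.linspace(-8.0, 8.0, N_OMEGA)
    torques = np.linspace(-2.0, 2.0, 5)              # finite set of admissible torques (N*m)

    def discretize(theta, omega):
        """Map a continuous state (theta, omega) to a pair of bin indices."""
        i = int(np.clip(np.digitize(theta, theta_bins) - 1, 0, N_THETA - 1))
        j = int(np.clip(np.digitize(omega, omega_bins) - 1, 0, N_OMEGA - 1))
        return i, j

    def reward(theta, omega, tau):
        """Penalize deviation from the upright equilibrium and large control effort."""
        return -(theta**2 + 0.1 * omega**2 + 0.001 * tau**2)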
B. Training Process
The agent is trained for 10,000 episodes, during which it explores and exploits different actions; the exploration rate decays over time. The Q-values are updated using the Bellman equation, written out below.
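The update referred to above is the standard Q-learning form of the Bellman equation,

    Q(s_k, a_k) <- Q(s_k, a_k) + α [ r_k + γ max_{a'} Q(s_{k+1}, a') − Q(s_k, a_k) ],

where α is the learning rate and γ the discount factor. A minimal Python sketch of the training loop, reusing the discretization and reward helpers above and assuming hypothetical reset_pendulum/step_pendulum environment functions (the learning rate, discount factor, epsilon schedule, and episode length are illustrative assumptions), is:

    import numpy as np

    alpha, gamma = 0.1, 0.99                         # assumed learning rate and discount factor
    eps, eps_decay, eps_min = 1.0, 0.999, 0.05       # assumed decaying exploration schedule
    Q = np.zeros((N_THETA, N_OMEGA, len(torques)))   # Q-table over discretized states and actions

    for episode in range(10_000):
        theta, omega = reset_pendulum()              # hypothetical: sample an initial condition
        for step in range(200):                      # assumed episode length
            s = discretize(theta, omega)
            # Epsilon-greedy selection: explore with probability eps, otherwise exploit.
            if np.random.rand() < eps:
                a = np.random.randint(len(torques))
            else:
                a = int(np.argmax(Q[s]))
            theta, omega = step_pendulum(theta, omega, torques[a])  # hypothetical dynamics step
            r = reward(theta, omega, torques[a])
            s_next = discretize(theta, omega)
            # Bellman (Q-learning) update.
            Q[s + (a,)] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s + (a,)])
        eps = max(eps_min, eps * eps_decay)          # exploration rate decays over time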
V. Results
A. Pendulum Stabilization
The Q-learning agent successfully stabilized the inverted pendulum, bringing both the angular displacement and velocity to zero. The agent was able to adapt to various initial conditions, consistently returning the pendulum to its upright equilibrium.
B. Robotic Arm Stabilization
Similarly, the robotic arm was stabilized by the RL agent, which successfully applied torques to the joints to reach and maintain the desired configuration.
VI. Conclusion
This paper presented a data-driven, Q-learning-based approach to stabilizing autonomous systems such as the inverted pendulum and robotic arm. The results indicate that RL can effectively stabilize nonlinear and uncertain systems, providing a foundation for real-world deployment. Future work includes:
- Testing the trained policies on real-world systems to evaluate their performance in physical
environments.