Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning in which an agent learns to make
decisions by interacting with an environment to maximize a cumulative reward signal. Unlike
supervised learning, where a model learns from labelled examples, RL learns from the
consequences of actions, exploring the environment to discover optimal behaviours.
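
To make the agent-environment loop concrete, here is a minimal sketch in Python using the Gymnasium library (an assumption; the original text names no specific tooling). A random agent interacts with the CartPole environment and accumulates reward:

import gymnasium as gym  # assumed installed: pip install gymnasium

# Create an environment; CartPole-v1 is a standard control benchmark.
env = gym.make("CartPole-v1")

obs, info = env.reset(seed=0)
total_reward = 0.0
done = False
while not done:
    # A real agent would choose actions from a learned policy;
    # here we sample randomly just to show the interaction loop.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()

A learning algorithm replaces the random action choice with a policy that improves from the observed rewards.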
Main Approaches to RL
1. Model-Free RL:
   - Learns directly from interaction with the environment, without building a model of its dynamics.
   - Examples: Q-Learning (off-policy) and SARSA (on-policy); see the Q-learning sketch after this list.
2. Model-Based RL:
   - Learns or is given a model of the environment's dynamics and uses it for planning (e.g., Dyna-Q).
3. Policy Gradient Methods:
   - Directly optimize the policy using gradient ascent on the expected return.
   - Examples: REINFORCE and Proximal Policy Optimization (PPO); see the REINFORCE sketch after this list.
4. Deep Reinforcement Learning:
   - Combines RL with deep neural networks to handle high-dimensional state and action spaces.
   - Examples: Deep Q-Networks (DQN) and Actor-Critic methods (A3C, DDPG); see the DQN sketch after this list.
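
As an illustration of the model-free, off-policy case, the sketch below implements tabular Q-learning. The environment interface (reset/step returning a done flag, an n_actions attribute) and the hyperparameter values are illustrative assumptions, not a definitive implementation:

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; assumes a small discrete env with
    reset() -> state and step(a) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value
    actions = list(range(env.n_actions))  # assumed attribute

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behaviour policy (exploration).
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Off-policy: the TD target uses the greedy (max) action,
            # regardless of what the behaviour policy actually did.
            best_next = max(Q[(next_state, a)] for a in actions)
            td_target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state = next_state
    return Q

SARSA differs only in the target: it uses the Q-value of the action actually taken next, which makes it on-policy.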
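
For policy gradient methods, the following sketch shows the core REINFORCE update in PyTorch (the framework choice, network sizes, and episode format are assumptions). It computes discounted returns and performs gradient ascent on the log-probability of the actions taken, weighted by those returns:

import torch
import torch.nn as nn

# Small policy network for a discrete action space (sizes are illustrative).
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, rewards, gamma=0.99):
    """One REINFORCE update from a single episode.
    states: [T, 4] float tensor, actions: [T] long tensor,
    rewards: list of T floats."""
    # Discounted returns G_t, computed backwards through the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    # Normalising returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs[torch.arange(len(actions)), actions]
    loss = -(chosen * returns).sum()  # minimizing -J = gradient ascent on J

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()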
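
For deep RL, the heart of a DQN update is a temporal-difference loss between an online Q-network and a periodically synced target network. The sketch below shows just that step, with the replay buffer and training loop omitted; batch shapes and layer sizes are assumptions:

import copy
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)  # frozen copy, synced periodically
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones, gamma=0.99):
    """One gradient step on a sampled minibatch of transitions.
    actions: [B] long tensor; dones: [B] float tensor of 0/1 flags."""
    # Q(s, a) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # TD target from the target network; no gradients flow through it.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1 - dones)

    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Using a separate target network, rather than bootstrapping from the network being trained, is one of the tricks that stabilises deep Q-learning.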
Applications of RL
1. Robotics: Training robots to perform tasks like walking, grasping, and assembling.
2. Game Playing: Achieving superhuman performance in games like Go, Chess, and
StarCraft (e.g., AlphaGo, AlphaStar).
3. Autonomous Vehicles: Learning to navigate and make driving decisions.
4. Healthcare: Personalized treatment planning and drug discovery.
5. Finance: Portfolio management and algorithmic trading.
Challenges of RL
1. Exploration vs. Exploitation: Balancing trying new actions (exploration) against using
known strategies (exploitation); see the epsilon-greedy sketch after this list.
2. Sparse Rewards: Rewards might be infrequent, making learning difficult.
3. Computational Complexity: Requires significant computational resources,
especially for deep RL.
4. Stability: Training RL models can be unstable and sensitive to hyperparameters.
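
To illustrate the exploration-exploitation trade-off from item 1, a common heuristic is an epsilon-greedy policy with a decaying exploration rate: act randomly with probability epsilon, otherwise act greedily, and shrink epsilon as learning progresses. The schedule constants below are illustrative assumptions:

import random

def epsilon_greedy(q_values, step, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Pick an action index from q_values (a list of per-action values),
    exploring with probability epsilon, which decays with the step count."""
    epsilon = max(eps_end, eps_start * (decay ** step))
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit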