
Machine Learning CT604EN

ColleGPT

10. Reinforcement Learning

Prepared and Edited by:- Mayank Yadav Designed by:- Kussh Prajapati

Get Prepared Together

www.collegpt.com [email protected]
Prepared By : Mayank Yadav Machine Learning

Reinforcement Learning

⭐Reinforcement Learning is a feedback-based machine learning technique in which an agent
learns to behave in an environment by performing actions and observing their results. For each
good action, the agent receives positive feedback (a reward), and for each bad action, it receives
negative feedback (a penalty).

● In Reinforcement Learning, the agent learns automatically using feedback without any labeled
data, unlike supervised learning.

● Since there is no labeled data, the agent learns only from its own experience.

● RL solves a specific type of problem where decision making is sequential, and the goal is
long-term, such as game-playing, robotics, etc.

● The agent interacts with the environment and explores it by itself. The primary goal of an agent
in reinforcement learning is to improve its performance by maximizing the positive rewards it
receives.

● The agent learns through trial and error, and based on this experience, it learns to perform the
task better over time. Hence, we can say that "Reinforcement learning is a type of machine
learning method where an intelligent agent (computer program) interacts with the environment
and learns to act within it."

● It is a core part of Artificial Intelligence, and many AI agents are built on the concept of
reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own
experience without any human intervention.

● The agent continues doing these three things (take action, change state or remain in the same
state, and get feedback), and through these actions, it learns and explores the environment.

Terms used in Reinforcement Learning:

● Agent: An entity that can perceive/explore the environment and act upon it.

● Environment: The situation in which an agent is present or by which it is surrounded. In RL, we
assume a stochastic environment, which means it is random in nature.

● Action: Actions are the moves taken by an agent within the environment.

● State: A situation returned by the environment after each action taken by the agent.

● Reward: Feedback returned to the agent from the environment to evaluate the agent's action.

● Policy: The strategy applied by the agent to decide the next action based on the current state.

● Value: The expected long-term return with the discount factor, as opposed to the short-term
reward.

● Q-value: Mostly similar to the value, but it takes one additional parameter, the current
action (a).
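The Value and Q-value terms are commonly written with a discount factor γ. The standard formulation below uses conventional notation (not taken from these notes):

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_0 = s\right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a\right]
```

Here γ ∈ [0, 1) weights long-term rewards, and the Q-value conditions on the current action a in addition to the state s.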

Reinforcement Learning Algorithms

There are three approaches to implementing reinforcement learning algorithms:

● Value-Based – The main goal of this method is to maximize a value function. Here, the agent
learns the long-term return it can expect from the current state under a policy.

● Policy-Based – In policy-based methods, you come up with a strategy that helps to gain maximum
reward in the future through the possible actions performed in each state. The two types of
policy-based methods are deterministic and stochastic.

● Model-Based – In this method, we create a virtual model of the environment, and the agent
learns to perform within that specific environment.

Common RL Techniques:

● Q-Learning: A value-based method where the agent learns a Q-value for each state-action pair,
representing the expected future reward for taking an action in a given state.
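The Q-learning update can be sketched on a toy problem. The state/action sizes, learning rate, and the transition below are made up purely for illustration:

```python
import numpy as np

# Toy setup: 3 states, 2 actions; Q-table initialized to zero.
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))

alpha = 0.1   # learning rate
gamma = 0.9   # discount factor

def q_learning_update(state, action, reward, next_state):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

# One hypothetical transition: in state 0, action 1 gave reward 1.0 and led to state 2.
q_learning_update(state=0, action=1, reward=1.0, next_state=2)
print(Q[0, 1])  # 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```

Note the `max` over the next state's actions: Q-learning is off-policy, bootstrapping from the best available next action regardless of which action the agent actually takes next.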

● SARSA (State-Action-Reward-State-Action): Another value-based method that updates the
Q-value based on the current state, the action taken, the received reward, the next observed
state, and the next chosen action.
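For contrast, here is a minimal SARSA sketch under a similarly made-up setup. The only difference from Q-learning is that it bootstraps from the next action the agent actually chose, not the maximum:

```python
import numpy as np

Q = np.zeros((3, 2))      # toy Q-table: 3 states, 2 actions (illustrative only)
alpha, gamma = 0.1, 0.9

def sarsa_update(state, action, reward, next_state, next_action):
    """On-policy update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)),
    where a' is the action actually chosen in the next state."""
    td_target = reward + gamma * Q[next_state, next_action]
    Q[state, action] += alpha * (td_target - Q[state, action])

# One hypothetical (s, a, r, s', a') transition:
sarsa_update(state=0, action=1, reward=1.0, next_state=2, next_action=0)
print(Q[0, 1])  # 0.1
```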

● Policy Gradient Methods: Focus on directly improving the policy by estimating the gradient of
the expected reward with respect to the policy parameters.

● Deep Reinforcement Learning: Combines RL with deep neural networks for powerful agents
capable of handling complex environments and high-dimensional states.

⭐Concept of Penalty and Reward in Reinforcement Learning⭐


In Reinforcement Learning (RL), rewards and penalties are the core feedback mechanisms that guide an
agent's learning process. They act as the language of the environment, telling the agent which actions
are desirable (rewarded) and which are undesirable (penalized). This feedback loop allows the agent to
gradually learn the optimal behavior to achieve its goals.

Rewards

-> Positive reinforcement: Positive reinforcement occurs when an event that follows a specific
behavior increases the strength and frequency of that behavior. It has a positive impact on behavior.

Advantages:
● Maximizes the performance of an action
● Sustains change for a longer period
Disadvantage:
● Excess reinforcement can lead to an overload of states, which can diminish the results.

Example: In a game of Pong, the agent receives a reward for hitting the ball past the opponent's paddle.

-> Shaping rewards: Complex tasks might require breaking them down into smaller sub-goals with
associated rewards. These intermediate rewards can guide the agent's learning towards the final goal.

● Example: In a robot learning to walk, initial rewards might be given for taking a step, then for
maintaining balance, and finally for achieving forward movement.
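A shaped reward for this walking-robot example might be sketched as follows. The sub-goal signals and reward magnitudes are hypothetical:

```python
def shaped_reward(stepped, balanced, forward_distance):
    """Hypothetical shaped reward for a walking robot: small intermediate
    rewards for sub-goals guide learning toward the final goal (forward motion)."""
    reward = 0.0
    if stepped:
        reward += 0.1                     # sub-goal 1: taking a step
    if balanced:
        reward += 0.2                     # sub-goal 2: maintaining balance
    reward += 1.0 * forward_distance      # final goal: forward movement
    return reward

print(round(shaped_reward(stepped=True, balanced=True, forward_distance=0.5), 2))  # 0.8
```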

Penalties

-> Negative reinforcement: Negative reinforcement is the strengthening of a behavior because a
negative condition is stopped or avoided as a consequence of that behavior.

Advantages:
● Maximizes the desired behavior
● Provides a decent, minimum standard of performance
Disadvantage:
● It only encourages behavior sufficient to meet the minimum standard

Example: In a maze navigation task, the agent receives a penalty for hitting a wall.

-> Importance of appropriate penalties: Penalties should be strong enough to discourage
undesirable actions, but not so harsh that they hinder exploration.

Key Points about Rewards and Penalties:

● Magnitude: The magnitude of rewards and penalties can influence the learning speed and
effectiveness.
○ Larger rewards can encourage faster learning towards desired behavior, but smaller,
more frequent rewards can provide more continuous feedback.
● Sparsity: Rewards and penalties might not be received after every action, especially in complex
environments. The agent needs to learn to deal with delayed rewards and sparse feedback.
● Exploration vs. Exploitation: A balance needs to be struck between exploration (trying new
actions to learn about the environment) and exploitation (repeating actions known to yield
rewards).
● Reward Engineering: Defining appropriate reward signals is crucial for effective RL. Rewards
should be clear, consistent, and aligned with the desired goals.
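The exploration-exploitation balance described above is often handled with an epsilon-greedy rule. A minimal sketch follows; epsilon-greedy is one common choice rather than the only one, and the Q-values below are invented:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore (pick a random action);
    otherwise, exploit (pick the action with the highest estimated value)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

q = [0.2, 0.8, 0.5]
print(epsilon_greedy(q, epsilon=0.0))  # always exploits -> action 1
```

Decaying epsilon over time is a common refinement: explore heavily early on, then exploit more as the value estimates become reliable.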

Reinforcement Learning Framework

The Reinforcement Learning (RL) framework provides a structured approach for training agents to learn
and make decisions in an interactive environment. Unlike supervised learning with labeled data, RL
agents learn by trial and error, receiving rewards for desired actions and penalties for undesirable ones.

Core Elements:

● Agent: The learning entity that interacts with the environment and aims to maximize its
long-term reward.
● Environment: The system or world the agent operates in. It provides the agent with observations
(state) and rewards based on its actions.
● State: The representation of the environment relevant to the current situation (e.g., game board
configuration, robot's sensor readings). It captures the information necessary for the agent to
make decisions.
● Action: The choices the agent can make in a given state. These actions influence the
environment and the agent's future state.
● Reward: A signal indicating the desirability of an action. Positive rewards encourage repetition,
while negative rewards (penalties) discourage it. Rewards provide feedback to the agent about
the consequences of its actions.

The RL Loop:

● Perception: The agent observes the current state of the environment through sensors or other
information sources.
● Decision-Making: Based on the perceived state, the agent selects an action using its policy
(strategy). This policy defines the mapping between states and actions.
● Action: The agent takes the chosen action in the environment, potentially changing the
environment's state.
● Reward: The environment provides a reward signal based on the outcome of the action. This
reward reflects the desirability of the chosen action.
● Update: The agent updates its policy (learning) based on the observed reward and the transition
from the previous state to the current state. The goal is to adjust the policy to favor actions that
lead to higher rewards in the long run.
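The five steps above can be sketched end-to-end. The two-state environment, reward values, and Q-learning-style update below are all invented for illustration:

```python
import random

# A made-up two-state environment: action 1 in state 0 pays off.
def env_step(state, action):
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = (state + 1) % 2
    return next_state, reward

# Tabular value store and a simple update, showing the
# perceive -> decide -> act -> reward -> update cycle.
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

random.seed(0)
state = 0
for _ in range(300):
    # Decision-making: epsilon-greedy over current estimates.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: Q[(state, a)])
    # Action + reward: the environment reacts to the chosen action.
    next_state, reward = env_step(state, action)
    # Update: shift the estimate toward the observed outcome.
    best_next = max(Q[(next_state, a)] for a in range(2))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state  # perception of the new state

print(Q[(0, 1)] > Q[(0, 0)])  # the rewarding action should rank higher
```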

Reinforcement Learning Applications

● Robotics: RL is used in Robot navigation, Robo-soccer, walking, juggling, etc.


● Control: RL can be used for adaptive control, such as factory processes and admission control in
telecommunications; helicopter piloting is another example of reinforcement learning.
● Game Playing: RL can be used in Game playing such as tic-tac-toe, chess, etc.
● Chemistry: RL can be used for optimizing chemical reactions.
● Business: RL is now used for business strategy planning.
● Manufacturing: In various automobile manufacturing companies, the robots use deep
reinforcement learning to pick goods and put them in some containers.
● Finance Sector: RL is currently used in the finance sector for evaluating trading strategies.
ColleGPT

All the Best


"Enjoyed these notes? Feel free to share them with

your friends and provide valuable feedback in your

review. If you come across any inaccuracies, don't

hesitate to reach out to the author for clarification.

Your input helps us improve!"

Visit: www.collegpt.com
ColleGPT - [email protected]
