Winter Semester 2023-24 - CSE4037 - ETH - AP2023246000594 - 2024-01-05 - Reference-Material-I

Reinforcement Learning

Dr. Ch. Balaram Murthy
What is Reinforcement Learning (RL)?

To understand Reinforcement Learning, let’s start with the big picture.

The idea behind Reinforcement Learning is that an agent (an AI) will
learn from the environment by interacting with it (through trial and
error) and receiving rewards (negative or positive) as feedback for
performing actions.

Learning from interactions with the environment comes from our natural experiences.
Main points in Reinforcement learning :

Input: The input should be an initial state from which the model
will start.

Output: There are many possible outputs, as there are a variety of solutions to a particular problem.

Training: The training is based upon the input. The model will return a state, and the user will decide whether to reward or punish the model based on its output.

The model continues to learn.

The best solution is decided based on the maximum reward.
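To make this input/output/training loop concrete, here is a minimal Python sketch of the agent-environment interaction cycle. The GridEnvironment class, its reset/step interface, and the random agent are hypothetical stand-ins (loosely modeled on the common Gymnasium-style API), not part of this material:

import random

class GridEnvironment:
    """Toy stand-in environment: move along a line, reward at the goal."""
    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0          # Input: the initial state the model starts from
        return self.state

    def step(self, action):
        # action: -1 (move left) or +1 (move right)
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else -0.1   # positive feedback at the goal, small penalty otherwise
        return self.state, reward, done

env = GridEnvironment()
state = env.reset()
done = False
while not done:
    action = random.choice([-1, 1])          # trial and error
    state, reward, done = env.step(action)   # environment returns the next state and a reward

A learning agent would use the reward signal to prefer actions that lead to the maximum cumulative reward, rather than choosing at random.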


First, we need to define the problem setting. RL algorithms require us to define the following properties:
Terms used in Reinforcement Learning
Agent: An entity that can perceive/explore the environment and act upon it.
Environment: A situation in which an agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.

Action: The moves taken by an agent within the environment.
State: The situation returned by the environment after each action taken by the agent.

Reward: Feedback returned to the agent from the environment to evaluate the agent's action.

Policy: The strategy the agent applies to decide the next action based on the current state.

Value: The expected long-term return with discounting, as opposed to the short-term reward.

Q-value: Similar to the value, but it takes one additional parameter, the current action (a).
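In standard notation (added here for reference; the symbols γ for the discount factor and r_t for the reward at step t are conventional and not defined in the original slides), the value and Q-value are the expected discounted long-term returns:

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0} = s \right]

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0} = s,\ a_{0} = a \right]

where 0 ≤ γ < 1 is the discount factor that weights short-term rewards against long-term ones.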
Examples of Reinforcement Learning

Any real-world problem where an agent must interact with an uncertain environment to meet a specific goal is a potential application of RL.

Robotics. Robots with pre-programmed behavior are useful in structured environments, such as the assembly line of an automobile manufacturing plant, where the task is repetitive in nature. In the real world, where the response of the environment to the behavior of the robot is uncertain, pre-programming accurate actions is nearly impossible. In such scenarios, RL provides an efficient way to build general-purpose robots.
Examples of Reinforcement Learning

AlphaGo. One of the most complex strategic games is a 3,000-year-old Chinese board game called Go. Its complexity stems from the fact that there are 10^270 possible board combinations, several orders of magnitude more than the game of chess.

In 2016, an RL-based Go agent called AlphaGo defeated the greatest human Go player. Much like a human player, it learned by experience, playing thousands of games with professional players.

The latest RL-based Go agent has the capability to learn by playing against itself, an advantage that the human player doesn’t have.
Examples of Reinforcement Learning

Autonomous Driving. An autonomous driving system must perform multiple perception and planning tasks in an uncertain environment.

Some specific tasks where RL finds application include vehicle path planning and motion prediction.

Vehicle path planning requires several low- and high-level policies to make decisions over varying temporal and spatial scales.

Motion prediction is the task of predicting the movement of pedestrians and other vehicles, to understand how the situation might develop based on the current state of the environment.
Benefits of Reinforcement Learning

RL is applicable to a wide range of complex problems that cannot be tackled with other ML algorithms.

RL is closer to artificial general intelligence (AGI), as it possesses the ability to seek a long-term goal while exploring various possibilities autonomously.
Benefits of Reinforcement Learning
Some of the benefits of RL include:

Focuses on the problem as a whole. Conventional ML algorithms are designed to excel at specific subtasks, without a notion of the big picture.

RL, on the other hand, doesn’t divide the problem into sub-problems; it directly works to maximize the long-term reward. It has an obvious purpose, understands the goal, and is capable of trading off short-term rewards for long-term benefits.
Benefits of Reinforcement Learning
Does not need a separate data collection step.
In RL, training data is obtained via the direct interaction of
the agent with the environment.

Training data is the learning agent’s experience, not a separate collection of data that has to be fed to the algorithm.

This significantly reduces the burden on the supervisor in charge of the training process.
Benefits of Reinforcement Learning
Works in dynamic, uncertain environments.
RL algorithms are inherently adaptive and built to respond to
changes in the environment.

In RL, time matters and the experience that the agent collects is not
independently and identically distributed (i.i.d.), unlike conventional
ML algorithms.

Since the dimension of time is deeply buried in the mechanics of RL, the learning is inherently adaptive.
Challenges with Reinforcement Learning
RL agent needs extensive experience.

RL methods autonomously generate training data by interacting with the environment. Thus, the rate of data collection is limited by the dynamics of the environment. Environments with high latency slow down the learning curve.

Furthermore, in complex environments with high-dimensional state spaces, extensive exploration is needed before a good solution can be found.
Challenges with Reinforcement Learning
Delayed rewards.
The learning agent can trade off short-term rewards for long-term
gains. While this foundational principle makes RL useful, it also
makes it difficult for the agent to discover the optimal policy.

This is especially true in environments where the outcome is unknown until a large number of sequential actions are taken. In this scenario, assigning credit to a previous action for the final outcome is challenging and can introduce large variance during training.

The game of chess is a relevant example here, where the outcome of the game is unknown until both players have made all their moves.
Challenges with Reinforcement Learning
Lack of interpretability.

Once an RL agent has learned the optimal policy and is deployed in the environment, it takes actions based on its experience. To an external observer, the reason for these actions might not be obvious.

This lack of interpretability interferes with the development of trust between the agent and the observer. If an observer could explain the actions that the RL agent takes, it would help them understand the problem better and discover the limitations of the model, especially in high-risk environments.
Characteristics of Reinforcement Learning

Important characteristics of reinforcement learning.

• There is no supervisor, only a real number or reward signal.
• Sequential decision making.
• Time plays a crucial role in reinforcement learning problems.
• Feedback is always delayed, not instantaneous.
• Agent’s actions determine the subsequent data it receives.
Elements of Reinforcement Learning

There are four main elements of Reinforcement Learning:

• Policy
• Reward Signal
• Value Function
• Model of the environment
Approaches to implement Reinforcement Learning

There are mainly three ways to implement reinforcement learning in ML:

1. Value-based:

The value-based approach tries to find the optimal value function V(s), which is the maximum value achievable at a state under any policy. In this method, the agent expects a long-term return of the current states under policy π.
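As a concrete illustration of the value-based approach, below is a minimal value-iteration sketch on a toy MDP. The transition table and reward numbers are invented purely for illustration; only the Bellman backup itself is the standard technique:

# Toy MDP: states 0..2, deterministic transitions, made-up rewards.
transitions = {
    (0, 'right'): (1, 0.0),   # (state, action) -> (next_state, reward)
    (0, 'left'):  (0, 0.0),
    (1, 'right'): (2, 1.0),
    (1, 'left'):  (0, 0.0),
    (2, 'right'): (2, 0.0),
    (2, 'left'):  (1, 0.0),
}
gamma = 0.9                           # discount factor
V = {s: 0.0 for s in range(3)}

for _ in range(100):                  # repeat Bellman backups until values settle
    for s in range(3):
        V[s] = max(reward + gamma * V[nxt]
                   for (state, action), (nxt, reward) in transitions.items()
                   if state == s)

print(V)  # V(s) approximates the maximum discounted return achievable from s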
Approaches to implement Reinforcement Learning

2. Policy-based:

In a policy-based RL method, you try to come up with a policy such that the action performed in every state helps you gain maximum reward in the future.

The policy-based approach has mainly two types of policy:

Deterministic: The same action is produced by the policy (π) at any given state.
Stochastic: In this policy, probability determines the action produced.
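A minimal Python sketch of the distinction between the two policy types (the states, actions, and probabilities below are hypothetical, chosen only for illustration):

import random

# Deterministic policy: a fixed lookup from state to action.
deterministic_policy = {'s0': 'right', 's1': 'left'}
action = deterministic_policy['s0']             # always 'right' in state s0

# Stochastic policy: a probability distribution over actions per state.
stochastic_policy = {'s0': {'right': 0.8, 'left': 0.2}}
dist = stochastic_policy['s0']
action = random.choices(list(dist), weights=dist.values())[0]  # sampled action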
Approaches to implement Reinforcement Learning

3. Model-based: In the model-based approach, a virtual model is created for the environment, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach because the model representation is different for each environment.
Types of Reinforcement Learning

There are two types of Reinforcement:

Positive: Positive reinforcement is defined as an event that, occurring because of a particular behavior, increases the strength and frequency of that behavior. In other words, it has a positive effect on behavior. Advantages of positive reinforcement are:
 Maximizes performance
 Sustains change for a long period of time
However, too much reinforcement can lead to an overload of states, which can diminish the results.
Types of Reinforcement Learning

Negative: Negative reinforcement is defined as the strengthening of a behavior that occurs because a negative condition is stopped or avoided.

It helps you to define the minimum standard of performance.

However, the drawback of this method is that it provides only enough to meet the minimum behavior.
Advantages of Reinforcement learning
 It is used to solve very complex problems that cannot be solved by
conventional techniques.
 The model can correct the errors that occurred during the training process.
 In RL, training data is obtained via the direct interaction of the agent with the environment.
 RL can handle environments that are non-deterministic, meaning that the
outcomes of actions are not always predictable. This is useful in real-world
applications where the environment may change over time or is uncertain.
 RL can be used to solve a wide range of problems, including those that involve
decision making, control, and optimization.
 RL is a flexible approach that can be combined with other ML techniques, such
as deep learning, to improve performance.
Disadvantages of Reinforcement learning
 RL is not preferable for solving simple problems.
 RL needs a lot of data and a lot of computation.
 RL is highly dependent on the quality of the reward function. If the reward
function is poorly designed, the agent may not learn the desired behavior.
 RL can be difficult to debug and interpret. It is not always clear why the agent
is behaving in a certain way, which can make it difficult to diagnose and fix
problems.
Learning Models of Reinforcement

There are two important learning models in RL:


• Markov Decision Process (MDP)
• Q learning
Markov Decision Process
The following parameters are used to get a solution:
Set of actions - A
Set of states - S
Reward - R
Policy - π
Value - V

The mathematical approach for mapping a solution in RL is known as a Markov Decision Process (MDP).
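Q-learning, the second learning model listed earlier, estimates Q(s, a) directly from experience using the update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. Below is a minimal tabular sketch on the toy line-world from the interaction-loop example; the environment, learning rate, and episode count are illustrative assumptions, not part of this material:

import random

SIZE, ACTIONS = 5, [-1, 1]               # toy line-world: goal is state 4
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(SIZE) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(SIZE - 1, s + a))
    done = s2 == SIZE - 1
    return s2, (1.0 if done else -0.1), done

for _ in range(500):                     # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2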
Reinforcement Learning vs. Supervised Learning
Reinforcement Learning Applications
• Robot navigation, Robo-soccer, walking, juggling, etc.
• Robots that pick goods and put them in containers using deep reinforcement learning
• Optimizing chemical reactions
• Business strategy planning
