Winter Semester 2023-24 - CSE4037 - ETH - AP2023246000594 - 2024-01-05 - Reference-Material-I
Dr.Ch.Balaram Murthy
What is Reinforcement Learning (RL)?
The idea behind Reinforcement Learning is that an agent (an AI) will
learn from the environment by interacting with it (through trial and
error) and receiving rewards (negative or positive) as feedback for
performing actions.
Input: The input should be an initial state from which the model
will start.
Training: Training builds on that input. The model returns a
state, and the user decides whether to reward or punish the
model based on its output.
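The interaction described above can be sketched as a simple loop. This is a minimal illustration only: the CorridorEnv class, its reward values, and the two policies are hypothetical and not part of the course material, and the environment's step function stands in for the user's decision to reward or punish.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, with the goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Input: the initial state from which the agent starts.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); the position is clipped
        # so the agent cannot leave the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: +1.0 at the goal, a small penalty otherwise.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward and the visited steps."""
    state = env.reset()
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                       # agent chooses an action
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory


def random_policy(state):
    """Trial and error: choose a direction at random."""
    return random.choice([-1, 1])


def always_right(state):
    """A fixed policy that always moves towards the goal."""
    return 1
```

Calling run_episode(CorridorEnv(), random_policy) several times gives different totals, which is the trial-and-error experience an RL algorithm would learn from.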
Action(): Actions are the moves taken by an agent within the environment.
State(): State is a situation returned by the environment after each action taken by
the agent.
Reward(): Feedback returned to the agent from the environment to evaluate the
action of the agent.
Policy(): Policy is a strategy applied by the agent for the next action based on the
current state.
Value(): The expected long-term return, discounted by the discount factor, as
opposed to the short-term reward.
Q-value(): Similar to the value, but it takes one additional parameter, the
current action (a).
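The value and Q-value above both build on the discounted return, G = r_0 + γ·r_1 + γ²·r_2 + ..., where γ (gamma) is the discount factor between 0 and 1. The value of a state is the expected return when starting from that state; the Q-value is the expected return when starting from that state and taking action a first. A minimal sketch of the return computation follows; the function name and the default gamma of 0.9 are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a list of rewards."""
    g = 0.0
    # Work backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, discounted_return([1, 1, 1], gamma=0.5) evaluates to 1 + 0.5 + 0.25 = 1.75. A smaller gamma makes the agent weigh immediate rewards more heavily; a gamma close to 1 makes it weigh long-term rewards more heavily.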
Examples of Reinforcement Learning
RL does not divide the problem into sub-problems; it works directly
to maximize the long-term reward. It has a clear purpose, understands
the goal, and is capable of trading off short-term rewards for
long-term benefits.
Benefits of Reinforcement Learning
Does not need a separate data collection step.
In RL, training data is obtained via the direct interaction of
the agent with the environment.
In RL, time matters: the experience that the agent collects is not
independently and identically distributed (i.i.d.), unlike the data
that conventional ML algorithms assume.
Elements of Reinforcement Learning
• Policy
• Reward Signal
• Value Function
• Model of the environment
Approaches to implement Reinforcement Learning
1. Value-based:
2. Policy-based:
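As a rough illustration of how the two approaches differ when choosing an action: a value-based agent typically keeps value estimates and acts on the best one, while a policy-based agent represents the policy directly, often as action probabilities. The sketch below makes that contrast concrete under stated assumptions; the function names, the dictionary layout, and the use of a softmax are illustrative choices, not taken from the course material.

```python
import math
import random


def greedy_action(q_values):
    """Value-based: pick the action with the highest estimated Q-value."""
    return max(q_values, key=q_values.get)


def softmax_policy(preferences):
    """Policy-based: turn per-action preferences into probabilities."""
    m = max(preferences.values())  # subtract the max for numerical stability
    exps = {a: math.exp(p - m) for a, p in preferences.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def sample_action(probs, rng=random):
    """Draw one action according to the policy's probabilities."""
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

With q_values = {"left": 0.2, "right": 0.7}, greedy_action returns "right". With equal preferences for both actions, softmax_policy assigns each a probability of 0.5, so sample_action picks either one.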