RL Module 1
Reinforcement Learning (RL) is the science of decision-making: it is about learning the optimal
behavior in an environment so as to obtain the maximum reward. Unlike supervised or
unsupervised machine learning, RL is not given a fixed dataset as input; instead, the data is
generated by the learning system itself through trial-and-error interaction with the environment.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to
take next. After each action, the algorithm receives feedback that helps it determine whether
the choice it made was correct, neutral, or incorrect. It is a good technique to use for automated
systems that have to make a lot of small decisions without human guidance.
The agent performs actions with the aim of maximizing rewards; in other words, it learns by
doing in order to achieve the best outcomes.
RL enables robots to learn complex tasks by trial and error, which is especially useful in
scenarios where it is difficult or impossible to pre-program all the possible actions. For
example, consider a robot tasked with picking and placing objects in a warehouse. RL can be
used to train the robot to pick and place objects on its own: the robot interacts with the
environment, takes actions, receives feedback, and learns from its mistakes. Over time, the robot
improves its performance and its actions become more efficient and accurate. RL has been used
to train robots for a variety of tasks, such as locomotion, manipulation, and navigation, and to
perform tasks in environments with changing conditions.
The RL framework consists of an agent, an environment, actions, states, rewards, and policies.
Agent: The entity that interacts with the environment, learns, and takes actions
Environment: The surrounding where the agent interacts and receives feedback (reward)
Actions: The decisions taken by the agent in response to the environment
States: The current situation or context of the environment at a given time step
Rewards: The feedback signal received by the agent indicating how well it did in the
environment
Policies: A mapping of states to actions that guide the agent's behavior
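A minimal sketch of this interaction loop in Python (the environment, states, and reward values below are hypothetical and purely for illustration):

import random

# Hypothetical toy environment: the state is an integer position from 0 to 4,
# the goal state is 4, and the agent's actions are to move left (-1) or right (+1).
class SimpleEnvironment:
    def __init__(self):
        self.state = 0  # starting state

    def step(self, action):
        # Apply the agent's action, keep the state in the valid range,
        # and return the new state together with a reward signal.
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward

class RandomAgent:
    def select_action(self, state):
        # A trivial policy: pick left or right at random, regardless of state.
        return random.choice([-1, +1])

env = SimpleEnvironment()
agent = RandomAgent()
state = env.state
for t in range(10):
    action = agent.select_action(state)  # the agent takes an action
    state, reward = env.step(action)     # the environment returns feedback
    print(f"step {t}: action={action:+d}, state={state}, reward={reward}")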
Example: Let's consider the game of chess. In this case, the agent is the computer program that
plays chess, the environment is the chessboard, the actions are the moves that the agent can
make, the states are the positions of the chess pieces on the board, the rewards are the points
received by the agent for winning or losing the game, and the policy is the set of rules the agent
follows to make decisions. The chess-playing agent will explore the board by making moves
and evaluating the resulting state. The agent learns by trial and error and adjusts its policy to
increase its chances of winning the game. The reward function would give a positive reward
for winning, a negative reward for losing, and a neutral reward for a draw.
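For instance, the reward function described in this example could be written as simply as the following sketch (the outcome labels are assumptions, not from a real chess engine):

def chess_reward(outcome):
    # Terminal reward for the chess-playing agent: +1 win, -1 loss, 0 draw.
    if outcome == "win":
        return 1.0
    if outcome == "loss":
        return -1.0
    return 0.0  # draw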
Beyond the agent and the environment, the four main sub-elements of a reinforcement learning
system are the policy (the agent's way of behaving at a given time), the reward signal (the
immediate feedback that defines the goal of the problem), the value function (an estimate of how
much reward the agent can expect to accumulate in the long run), and, optionally, a model of the
environment (used for planning).
In Reinforcement Learning (RL), the policy is a function that maps states to actions. A policy
can be deterministic or stochastic.
π(s) = a
Where π is the policy function, s is the state, and a is the action chosen by the policy in state s.
For example, consider a game of chess where the agent is the player. A deterministic policy in
chess would always choose a particular move for a given position. If the agent observes the
current state of the chessboard as a state, the deterministic policy would always suggest the
same move for that state.
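As a rough illustration, a deterministic policy can be represented as a plain lookup table from state to action (the board-state names and moves below are made-up placeholders):

# A deterministic policy as a simple lookup table: each (hypothetical)
# board state always maps to exactly one move.
deterministic_policy = {
    "opening_position": "e2e4",
    "kings_gambit_accepted": "g1f3",
}

def pi(state):
    # Always returns the same action for a given state.
    return deterministic_policy[state]

print(pi("opening_position"))  # always prints "e2e4"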
Stochastic Policy: If an agent follows policy π at time t, then π(a|s) is the probability that At
= a if St = s. This means that at time t, under policy π, the probability of taking action a in state
s is π(a|s). For each state s ∈ S, π is a probability distribution over a ∈ A(s)
For example, consider an autonomous car learning to navigate a busy street. A stochastic
policy in this case would choose the next action probabilistically based on the traffic condition,
pedestrian activity, and other factors in the environment. The policy may suggest a different
action with different probabilities for the same state based on the conditions at the time.
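A rough sketch of a stochastic policy in Python (the states, actions, and probabilities are invented for illustration):

import random

# A stochastic policy: for each (hypothetical) traffic state, a probability
# distribution over the available actions.
stochastic_policy = {
    "pedestrian_ahead": {"brake": 0.8, "slow_down": 0.15, "maintain_speed": 0.05},
    "clear_road":       {"brake": 0.05, "slow_down": 0.15, "maintain_speed": 0.8},
}

def sample_action(state):
    dist = stochastic_policy[state]
    actions, probs = zip(*dist.items())
    # The same state can yield different actions on different calls.
    return random.choices(actions, weights=probs, k=1)[0]

print(sample_action("pedestrian_ahead"))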
7. Discuss state-value function and action-value function for policy π with
their mathematical definitions
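Value functions quantify how good it is for the agent to be in a given state (or to take a given action in a state) when following policy π. Using the discounted return Gt = Rt+1 + γRt+2 + γ²Rt+3 + … with discount factor γ ∈ [0, 1], the standard definitions are:
State-value function: vπ(s) = Eπ[Gt | St = s], the expected return when the agent starts in state s and follows policy π thereafter, for all s ∈ S.
Action-value function: qπ(s, a) = Eπ[Gt | St = s, At = a], the expected return when the agent starts in state s, takes action a, and then follows policy π thereafter.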
8. Explain the concept of exploration and exploitation in RL
Exploitation is a greedy approach in which the agent tries to get more reward by acting on its
current value estimates rather than the true (unknown) values; in other words, the agent makes
the best decision given the information it currently has. Exploration, in contrast, means trying
actions whose outcomes are still uncertain in order to gather more information about the
environment.
The dilemma is between choosing what you already know and getting something close to what
you expect (exploitation), and choosing something you are not sure about and possibly learning
more (exploration). The reinforcement learning agent must constantly decide whether to exploit
its partial knowledge to receive some reward now, or to explore unknown actions that could
result in greater reward later.
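A common way to balance the two is an ε-greedy rule: with probability ε the agent explores a random action, and otherwise it exploits the action with the highest estimated value. A minimal sketch (the value estimates below are arbitrary illustrative numbers):

import random

epsilon = 0.1  # probability of exploring

# Hypothetical estimated values for three actions.
estimated_values = {"left": 0.2, "right": 0.7, "forward": 0.5}

def epsilon_greedy(values, eps):
    if random.random() < eps:
        # Explore: pick a random action to gather more information.
        return random.choice(list(values))
    # Exploit: pick the action with the highest estimated value.
    return max(values, key=values.get)

print(epsilon_greedy(estimated_values, epsilon))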
Reinforcement learning typically uses a single agent that learns by interacting with the
environment in different ways. Evolutionary algorithms, by contrast, usually start with many
candidate "agents", and only the "strong ones" survive to the next generation. A reinforcement
learning agent learns from both positive and negative actions, whereas evolutionary algorithms
keep only the best-performing solutions, so the information carried by negative or suboptimal
solutions is discarded and lost.
11. Describe how RL and Evolutionary methods will approach the scenario
of changing room temperature from 15° to 23°
Using reinforcement learning, the agent tries a variety of actions that increase and decrease the
temperature. Eventually it learns that increasing the temperature yields a good reward, and it
also learns that reducing the temperature yields a bad reward.
An evolutionary algorithm instead starts with a population of random agents, each with a
preprogrammed sequence of actions it will perform. The agents that happen to take the "increase
temperature" action survive and move on to the next generation. Eventually, only agents that
increase the temperature survive, and they are deemed the best solution. However, the algorithm
never learns what happens if you decrease the temperature.
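A toy sketch of the RL side of this comparison (the reward scheme and temperature step size are invented for illustration): the agent keeps a running value estimate for each action and, through ε-greedy trial and error, learns that increasing the temperature is the better action while the room is below the target.

import random

target = 23.0
temperature = 15.0
values = {"increase": 0.0, "decrease": 0.0}   # estimated value of each action
counts = {"increase": 0, "decrease": 0}

while temperature < target:
    # epsilon-greedy: mostly exploit, occasionally explore
    if random.random() < 0.2:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)

    delta = 0.5 if action == "increase" else -0.5
    reward = 1.0 if delta > 0 else -1.0   # increasing moves toward 23, decreasing away
    temperature += delta

    # incremental sample-average update of the chosen action's value estimate
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)       # "increase" ends up with a positive value, "decrease" a negative one
print(temperature)  # loop exits once the 23-degree target is reached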
Immediate Reinforcement Learning (RL) is a type of RL where the reward signal is received
immediately after each action. In Immediate RL, the agent learns by interacting with the
environment in a trial-and-error fashion, receiving a reward or penalty immediately after each
action.
In robot control tasks, the agent needs to take actions based on the immediate state of the
environment to achieve a specific objective, such as moving to a target location.
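A minimal sketch of such an immediate-reward setup for a one-dimensional robot moving toward a target position (the positions and reward shaping are made up, and the action choice here is hand-coded rather than learned; the point is only that a reward arrives right after every single action):

# Hypothetical 1-D robot control task with an immediate reward after each action.
target_position = 10
position = 0

for step in range(12):
    action = +1 if position < target_position else 0   # simple hand-coded controller
    old_distance = abs(target_position - position)
    position += action
    new_distance = abs(target_position - position)
    # Immediate reward: received right after the action, based on progress made.
    reward = old_distance - new_distance
    print(f"step {step}: action={action}, position={position}, reward={reward}")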
Agent: The entity that interacts with the environment, learns, and takes actions
Actions: The decisions taken by the agent in response to the environment
Environment: The surrounding where the agent interacts and receives feedback (reward)
Rewards: The feedback signal received by the agent indicating how well it did in the
environment
Policies: A mapping of states to actions that guide the agent's behavior
15. You have a bank credit dataset and want to decide whether to approve an
applicant's loan based on their profile. Which learning technique will be
used?
Supervised Learning. The credit dataset provides historical applicant profiles together with their
known outcomes (approved or rejected), so the task is to learn a mapping from profile features
to an approve/reject label, which is a classification problem, i.e., supervised learning.
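A rough sketch of how this could look with a standard supervised-learning library (the feature names and data are entirely made up, and scikit-learn is assumed to be available):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up historical data: each row is [income, credit_score, existing_debt],
# and each label is 1 (loan approved) or 0 (loan rejected).
X = np.array([
    [55000, 720, 5000],
    [32000, 610, 15000],
    [78000, 690, 2000],
    [24000, 580, 12000],
])
y = np.array([1, 0, 1, 0])

# Fit a classifier on the labeled examples (supervised learning).
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Predict a decision for a new applicant's profile.
new_applicant = np.array([[45000, 650, 8000]])
print(model.predict(new_applicant))  # 1 = approve, 0 = reject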