A12 Spring 2024
3. Which of the following are shortcomings of TD learning that Q-learning resolves?
(a) TD learning cannot provide values for (state, action) pairs, limiting the ability to extract
an optimal policy directly.
(b) TD learning requires knowledge of the reward and transition functions, which is not
always available.
(c) TD learning is computationally expensive and slow compared to Q-learning.
(d) TD learning often suffers from high variance in value estimation, leading to unstable
learning.
(e) TD learning cannot handle environments with continuous state and action spaces effec-
tively.
Sol. (a), (b), (d)
Refer to the lectures.
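For reference, here is a minimal sketch of the two update rules being contrasted, written in Python. The table layouts and the names alpha (step size) and gamma (discount) are illustrative assumptions, not part of the question.

```python
# TD(0) vs. Q-learning updates (standard textbook forms).

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """TD(0): update a state-value table V after observing (s, r, s_next).
    V only scores states, so reading off a policy still needs a model of T and R."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q-learning: update an action-value table Q after observing (s, a, r, s_next).
    Q scores (state, action) pairs, so a greedy policy can be read off directly."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```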
4. Given 100 hypothesis functions, each trained with 10^6 samples, what is the lower bound on
the probability that there does not exist a hypothesis function with error greater than 0.1?
(a) 1 − 200e^(−2·10^4)
(b) 1 − 100e^(10^4)
(c) 1 − 200e^(10^2)
(d) 1 − 200e^(−2·10^2)
Sol. (a)
k = 100
m = 10^6
γ = 0.1
P(∄ h_i s.t. |E(h_i) − Ẽ(h_i)| > 0.1) ≥ 1 − 2·100·e^(−2·(0.1)^2·10^6)
≥ 1 − 200e^(−2·10^4)
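As a quick numerical sanity check of this bound, the expression 1 − 2k·e^(−2γ^2·m) can be evaluated directly; the snippet below is our own illustration (the function name and printout are not part of the assignment).

```python
import math

def hoeffding_union_lower_bound(k, m, gamma):
    """Lower bound on P(no hypothesis deviates from its true error by more than gamma),
    via Hoeffding's inequality combined with a union bound over k hypotheses."""
    return 1 - 2 * k * math.exp(-2 * (gamma ** 2) * m)

# k = 100 hypotheses, m = 10^6 samples, gamma = 0.1: the exponent is -2 * 10^4,
# so the bound is numerically indistinguishable from 1.
print(hoeffding_union_lower_bound(k=100, m=10**6, gamma=0.1))  # 1.0
```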
COMPREHENSION:
For the rest of the questions, we will follow a simple game and see how a Reinforcement
Learning agent can learn to behave optimally in it.
This is our game:
At the start of the game, the agent is on the Start state and can choose to move left or right
at each turn. If it reaches the right end (RE), it wins, and if it reaches the left end (LE), it loses.
Because we love maths so much, instead of saying the agent wins or loses, we will say that the
agent gets a reward of +1 at RE and a reward of -1 at LE. The objective of the agent is then
simply to maximize the reward it obtains!
For each state, we define a variable that will store its value. The value of a state will help
the agent determine how to behave later. First, we will learn this value.
(a) 1
(b) 0.9
(c) 0.81
(d) 0
Sol. (b)
V(X4) = 0.9 × max(V(X3), V(RE))
= 0.9 × max(0, 1)
= 0.9
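The same computation in code, as a minimal sketch. The chain layout LE–X1–X2–Start–X3–X4–RE is inferred from the solutions in this section (the original figure is not reproduced here), and the dictionary names are ours.

```python
# One value update for a single state, using V(s) = 0.9 * max over neighbouring states.
GAMMA = 0.9

# Each non-terminal state maps to its (left neighbour, right neighbour).
NEIGHBOURS = {
    "X1": ("LE", "X2"),
    "X2": ("X1", "Start"),
    "Start": ("X2", "X3"),
    "X3": ("Start", "X4"),
    "X4": ("X3", "RE"),
}

# Terminal values: +1 for winning at RE, -1 for losing at LE; all other values start at 0.
V = {s: 0.0 for s in NEIGHBOURS}
V["RE"], V["LE"] = 1.0, -1.0

def update(state):
    """Set V(state) to the discounted value of its best neighbour."""
    left, right = NEIGHBOURS[state]
    V[state] = GAMMA * max(V[left], V[right])

update("X4")
print(V["X4"])  # 0.9, matching the computation above
```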
(a) -1
(b) -0.9
(c) -0.81
(d) 0
Sol. (d)
V(X1) = 0.9 × max(V(LE), V(X2))
= 0.9 × max(−1, 0)
= 0
8. What is V(X1) after V converges?
(a) 0.54
(b) -0.9
(c) 0.63
(d) 0
Sol. (a)
This is the sequence of changes in V:
V(X4) = 0.9 → V(X3) = 0.81 → V(Start) = 0.72 → V(X2) = 0.63 → V(X1) = 0.54
Final value for X1 is 0.54.
9. The behavior of an agent is called a policy. Formally, a policy is a mapping from states to
actions. In our case, we have two actions: left and right. We will denote the action for our
policy as A.
Clearly, the optimal policy would be to choose action right in every state. Which of the
following can we use to mathematically describe our optimal policy using the learnt V?
For options (c) and (d), T is the transition function defined as T(state, action) = next state.
(more than one option may apply)
(a) A = Left if V(S_L) > V(S_R), Right otherwise
(b) A = Left if V(S_R) > V(S_L), Right otherwise
(c) A = arg max_a V(T(S, a))
(d) A = arg min_a V(T(S, a))
Sol. (a), (c)
(a) always selects Right here, since the right-hand neighbour always has the higher learnt value, and (c) selects the action whose successor state has the highest value, which is again Right in every state.
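Option (c) becomes a one-line policy once T is available. Below is a small sketch; the action list and the callable T are illustrative stand-ins for whatever the game provides.

```python
# Greedy policy over the learnt state values, assuming the transition function T is known.
ACTIONS = ["Left", "Right"]

def policy(state, V, T):
    """Pick the action whose successor state T(state, action) has the highest learnt value."""
    return max(ACTIONS, key=lambda a: V[T(state, a)])
```

For this game it returns Right in every state, which is the optimal policy.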
10. In games like Chess or Ludo, the transition function is known to us. But what about Counter-
Strike, Mortal Kombat, or Super Mario? In games where we do not know T, we can only
query the game simulator with the current state and action, and it returns the next state. This
means we cannot directly take the argmax or argmin of V(T(S, a)). Therefore, learning the value
function V is not sufficient to construct a policy. Which of these could we do to overcome this?
(more than one may apply)
Assume there exists a method to do each option. You have to judge whether doing it solves
the stated problem.
(a) Directly learn the policy.
(b) Learn a different function which stores value for state-action pairs (instead of only state
like V does).
(c) Learn T along with V.
(d) Run a random agent repeatedly till it wins. Use this as the winning policy.
Sol. (a), (b), (c)
(a) - If we learn the policy itself, problem solved.
(b) - Given a function Q(s, a), we can use the policy A = arg max_a Q(S, a) (see the sketch at the end of this solution).
(c) - If we have T and V, we can do what we saw in the previous question.
(d) - If the agent learns a single sequence of actions as its policy, it will fail as soon as any state
it saw along that sequence changes, which can easily happen in a stochastic environment (i.e.,
transitions are probabilistic).
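To make option (b) concrete, here is a small tabular Q-learning sketch that only ever queries a simulator-style step function and never looks at T directly. The simulator reuses the chain layout assumed earlier; the hyperparameters and episode count are illustrative.

```python
import random

# Illustrative simulator for the chain game: it only exposes step(state, action),
# mirroring the "query the game simulator" setting described in the question.
NEIGHBOURS = {"X1": ("LE", "X2"), "X2": ("X1", "Start"), "Start": ("X2", "X3"),
              "X3": ("Start", "X4"), "X4": ("X3", "RE")}
ACTIONS = ["Left", "Right"]

def step(state, action):
    """Return (next_state, reward, done) without ever revealing T to the agent."""
    nxt = NEIGHBOURS[state][0] if action == "Left" else NEIGHBOURS[state][1]
    if nxt == "RE":
        return nxt, 1.0, True
    if nxt == "LE":
        return nxt, -1.0, True
    return nxt, 0.0, False

# Tabular Q-learning: learn values for (state, action) pairs, then act greedily on them.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in NEIGHBOURS for a in ACTIONS}

for _ in range(5000):  # episodes
    s, done = "Start", False
    while not done:
        # Epsilon-greedy action selection from the current Q table.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s_next, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s_next, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next

greedy_policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in NEIGHBOURS}
print(greedy_policy)  # every state should come out preferring "Right"
```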