
Introduction to Machine Learning

- Prof. Balaraman Ravindran | IIT Madras

Problem Solving Session (Week-12)


Shreya Bansal
PMRF PhD Scholar
IIT Ropar
Week-12 Contents

1. Introduction to Theory in Machine Learning


2. VC Dimension
3. Reinforcement Learning
Introduction to Theory in Machine Learning

● In computer science, "theory" often refers to:


○ Problem hardness (how difficult a problem is to solve)
○ Space/time complexity
○ Approximability (how close solutions are to optimal)
● Goal: Apply similar theoretical frameworks to machine
learning.
● Key questions:
○ How hard is it to approximate a solution?
○ How do we measure the hardness of ML problems?
Generalization Error

● Generalization error (ε(h)): Probability that hypothesis h makes a mistake on a new data point (x, y) sampled from distribution D:
ε(h) = P_{(x,y)∼D}[ h(x) ≠ y ]
● Empirical error (ε̂(h)): Error measured on the training data:
ε̂(h) = (1/m) Σ_{i=1..m} 1( h(x_i) ≠ y_i )
● Challenge: Estimate generalization error using empirical
error.
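As a quick illustration, empirical error is just the fraction of training points a hypothesis misclassifies. A minimal sketch (the hypothesis and data below are made up for illustration, not from the slides):

```python
import numpy as np

def empirical_error(h, X, y):
    """Empirical error: fraction of training points misclassified by hypothesis h."""
    predictions = np.array([h(x) for x in X])
    return np.mean(predictions != y)

# Toy 1-D data with labels in {-1, +1} and a simple threshold hypothesis.
X = np.array([0.1, 0.4, 0.35, 0.8])
y = np.array([-1, -1, +1, +1])
h = lambda x: +1 if x > 0.5 else -1
print(empirical_error(h, X, y))  # 0.25: one of the four points is misclassified
```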
Empirical Risk Minimization (ERM)
● Most learning algorithms aim to minimize empirical error over a hypothesis class H:
ĥ = argmin_{h ∈ H} ε̂(h)
● Hypothesis class H: the set of functions the algorithm searches over.
○ Linear classifiers: defined by parameters θ.
○ Neural networks: defined by architecture + weights.
● Practical issue: ERM may not find the true minimizer (e.g., due to
optimization challenges).
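To make the ERM idea concrete, here is a small sketch that does brute-force empirical risk minimization over a tiny hypothesis class of 1-D threshold classifiers (an illustrative class and dataset, not taken from the slides):

```python
import numpy as np

def empirical_error(h, X, y):
    return np.mean(np.array([h(x) for x in X]) != y)

# Hypothesis class H: threshold classifiers h_t(x) = +1 if x > t, else -1.
thresholds = np.linspace(0.0, 1.0, 21)
H = [(t, (lambda x, t=t: +1 if x > t else -1)) for t in thresholds]

# Toy training set.
X = np.array([0.1, 0.3, 0.45, 0.6, 0.9])
y = np.array([-1, -1, +1, +1, +1])

# ERM: pick the hypothesis with the smallest training (empirical) error.
best_t, best_h = min(H, key=lambda pair: empirical_error(pair[1], X, y))
print(best_t, empirical_error(best_h, X, y))  # a zero-training-error threshold
```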
Key Theoretical Tools

● Uniform Convergence
● Probabilistic Guarantee
● Bound for ERM Solution
● Sample Complexity
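The formulas from these slides are not reproduced in this transcript. For reference, a standard form of these results for a finite hypothesis class H (a hedged summary of the usual statements, not necessarily the exact ones shown in the session):

```latex
% Uniform convergence (finite H): with probability at least 1 - \delta,
\[
|\epsilon(h) - \hat{\epsilon}(h)| \le \gamma \quad \text{for all } h \in H,
\qquad \text{provided} \quad m \ge \frac{1}{2\gamma^{2}} \log\frac{2|H|}{\delta}.
\]
% Resulting bound for the ERM solution \hat{h}:
\[
\epsilon(\hat{h}) \le \Big( \min_{h \in H} \epsilon(h) \Big) + 2\gamma .
\]
% For infinite classes, \log|H| is replaced by a term depending on the VC dimension,
% which is why VC dimension (next slides) is the right measure of capacity.
```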
VC Dimension

● VC Dimension (Vapnik-Chervonenkis Dimension) is a


measure of the capacity (or complexity) of a statistical
classification model, specifically in terms of its ability to
shatter datasets.
Key Concepts

● Hypothesis Space
○ The set of all possible functions (or classifiers) that a model can learn. For example, for
linear classifiers in 2D, it is the set of all straight lines that can divide the plane into two classes.
● Shattering
○ A hypothesis space shatters a set of points if it can realize every possible labeling of those points.
○ For n points, there are 2^n possible ways to label them (each point can be +1 or -1).
○ If the model can classify all 2^n labelings correctly using some function in
the hypothesis space, then it shatters those n points.
● VC Dimension
○ The VC dimension of a hypothesis class is the maximum number of points that can be
shattered by the hypothesis class.
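A brute-force way to check shattering for small point sets, assuming the hypothesis class is linear classifiers in 2D. This sketch approximates a hard-margin separator with scikit-learn's SVC and a large C; it is an illustration, not an exact separability test:

```python
from itertools import product
import numpy as np
from sklearn.svm import SVC

def shatters(points):
    """Check whether linear classifiers can realize every labeling of the points."""
    points = np.asarray(points, dtype=float)
    for labels in product([-1, +1], repeat=len(points)):
        if len(set(labels)) == 1:
            continue  # all-(+1) or all-(-1) labelings are trivially realizable
        clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
        clf.fit(points, labels)
        if np.any(clf.predict(points) != labels):
            return False  # some labeling cannot be realized by any line
    return True

# Three points in general position can be shattered by lines in 2D ...
print(shatters([(0, 0), (1, 0), (0, 1)]))          # expected: True
# ... but four points cannot (the XOR labeling of this square fails).
print(shatters([(0, 0), (1, 1), (0, 1), (1, 0)]))  # expected: False
```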
Examples -1
Examples -2
Why Does VC Dimension Matter?
What is PAC Learning?

● PAC (Probably Approximately Correct) learning is a formal


framework that helps us understand when and how well a
machine learning algorithm can learn a concept from data.

● The idea is:


“A concept is PAC-learnable if, with high probability, a
learner can find a hypothesis that is approximately
correct using a reasonable amount of data and
computation.”
Breaking it Down: PAC Terminology

● Let’s define the terms in “Probably Approximately Correct”:

● Probably (1 - δ): The learning algorithm should succeed with high


probability (e.g., 95% of the time).

● Approximately (ε): The hypothesis should be close to the true concept,


i.e., error ≤ ε.

● Correct: Refers to how well the model generalizes on unseen data (not
just training data).
Formal Definition
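The formal statement on this slide is not reproduced in the transcript; a standard phrasing of PAC learnability, in the ε, δ notation defined above (a sketch, not necessarily the exact wording used in the session):

```latex
\[
\begin{aligned}
&\text{A concept class } C \text{ is PAC-learnable if there exist an algorithm } A
  \text{ and a polynomial } p \text{ such that}\\
&\text{for every } \epsilon, \delta \in (0,1), \text{ every target concept } c \in C,
  \text{ and every distribution } D,\\
&\text{given } m \ge p(1/\epsilon,\ 1/\delta) \text{ i.i.d. examples, } A
  \text{ outputs a hypothesis } h \text{ with } \Pr\big[\epsilon(h) \le \epsilon\big] \ge 1-\delta.
\end{aligned}
\]
```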
Relationship with VC Dimension
Examples of PAC Learnable Classes
Intuition: What Makes a Class PAC-Learnable?

● To be PAC-learnable, a class must have:

● Finite VC dimension (limited capacity to overfit)


● Efficient learning algorithm (computable in reasonable time)
● Sufficient data to learn from
What is Reinforcement Learning (RL)?

● Reinforcement Learning is a type of


machine learning where an agent
learns by interacting with its
environment, using trial and error,
and receiving feedback in the form
of rewards or punishments.
● It’s neither supervised nor
unsupervised learning.
Key Comparisons

Supervised Learning: You’re given labeled data - inputs with corresponding outputs (e.g., image → “cat”). The model learns to map inputs to outputs. Examples: classification, regression.

Unsupervised Learning: You’re given unlabeled data, with no clear output. The goal is to find structure or patterns (e.g., clustering, association rules).

Reinforcement Learning: You’re not told the right answer. You interact with an environment, try actions, and observe the outcomes (rewards/punishments). Learning is driven by experience, not direct supervision.
Illustrative Example: Learning to Ride a Bicycle

● Not supervised: No one gives exact
instructions (like press the pedal with 3
pounds of force, or tilt 2 degrees).
● Not unsupervised: You don’t just watch
people ride and figure it out.
● It’s reinforcement: You try, fall, learn not
to fall. You get minimal feedback — like
pain (punishment) or clapping (reward) —
and gradually figure it out.
Trial and Error is Key

● The RL agent tries different actions (explores).


● It learns which actions lead to better outcomes over time.
● The delayed nature of rewards adds complexity. You might get
feedback well after the action that caused it.
Pavlov’s Dog & RL Origins

● RL is inspired by
behavioral psychology —
like Pavlov’s classical
conditioning.
● The modern field was
kickstarted by Sutton &
Barto, whose 1983 work
laid the foundation for
today's algorithms and
techniques.
Games as an RL Metaphor

● In chess, if you're told what


move to play in a given
situation, that’s supervised
learning.
● If you just play, win or lose,
and only get a reward at the
end, that’s reinforcement
learning.
● You must figure out which
moves led to winning, despite
delayed rewards.
Reinforcement Learning (RL)

● RL involves an agent learning through interaction with an environment


(e.g., helicopter, game board, opponent).
● The agent senses the state of the environment and takes actions to
influence it.
● Key challenge: Actions must consider long-term benefits, not just
immediate rewards (e.g., chess strategy).
● Reward signal: Scalar feedback from the environment (e.g., +1 for
winning, -100 for crashing).
● Biological analogy: Rewards/punishments are interpreted from sensory
inputs (e.g., pain as negative feedback).
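The sensing/acting cycle described above is usually written as a simple interaction loop. A generic sketch (the env and agent objects are hypothetical placeholders, not a specific library's API):

```python
def run_episode(env, agent):
    """One episode of agent-environment interaction (env/agent are hypothetical)."""
    state = env.reset()                                   # sense the initial state
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                         # choose an action
        next_state, reward, done = env.step(action)       # environment responds
        agent.learn(state, action, reward, next_state)    # update from the scalar reward
        total_reward += reward
        state = next_state
    return total_reward
```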
Key Comparisons

Supervised Learning: Input → Output (with target labels). Error signal guides learning (e.g., gradient descent).

Unsupervised Learning: Input → Pattern detection (no explicit labels).

Reinforcement Learning: Input → Action → Scalar reward (no target labels). Trial-and-error learning (exploration required).
Temporal Difference (TD) Learning

● Core Idea: Predict future rewards by


updating estimates based on later
predictions.

● Prediction at time t+1 is more accurate


than at t.

● Example: Chess—confidence in winning


increases as the game progresses.

● Update Rule: Adjust earlier predictions


using newer information (e.g., reduce
winning probability from 0.6 to 0.55 if
later evidence suggests lower odds).

● Biological basis: Similar to


dopamine-driven learning in brains.
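A minimal sketch of this idea is the tabular TD(0) update, which nudges an earlier prediction toward the later, presumably more accurate one (the numbers reproduce the slide's 0.6 to 0.55 example; alpha and gamma are illustrative choices):

```python
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.5, gamma=1.0):
    """Tabular TD(0): move V(state) toward the bootstrapped target r + gamma * V(next_state)."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])
    return V

# Earlier estimate of winning probability is 0.6; the later state looks more like 0.5.
V = defaultdict(float, {"s_t": 0.6, "s_t_plus_1": 0.5})
td0_update(V, "s_t", reward=0.0, next_state="s_t_plus_1")
print(round(V["s_t"], 2))  # 0.55: the earlier prediction is revised downward, as on the slide
```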
Tic-Tac-Toe as an RL Problem
● States: Board configurations.

● Actions: Placing X or O in empty cells.

● Rewards:

○ +1 for winning, 0 otherwise (binary).

○ Alternative: +1 (win), -1 (lose), 0 (draw).

● Key Points:

● Learn a value function: Expected reward from each state


(probability of winning).

● Update values via TD learning or end-game feedback.

● Imperfect opponent required for meaningful learning


(perfect play always leads to a draw).
Exploration vs. Exploitation

● Exploitation: Choose actions with the


highest known rewards.
● Exploration: Try suboptimal actions to
discover better strategies.
● Challenge: Balancing exploration
(learning) vs. exploitation (performance).
● Bandit Problems: Simplified RL focusing on
this trade-off (no sequential states).
● Example: In tic-tac-toe, occasionally pick
random moves to avoid repeating
suboptimal paths.
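A common way to balance the two is epsilon-greedy action selection. A minimal sketch (the epsilon value and the value estimates below are illustrative):

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Mostly exploit the best-known action; explore with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(list(action_values))      # explore: random action
    return max(action_values, key=action_values.get)   # exploit: greedy action

# Made-up value estimates for three actions.
estimates = {"a1": 0.2, "a2": 0.5, "a3": 0.1}
print(epsilon_greedy(estimates))  # usually "a2", occasionally a random action
```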
Assignment-11 (Cs-101- 2024) (Week-12)

Source
Question-1

What is the VC dimension of the class of linear classifiers in 2D space?

a) 2
b) 3
c) 4
d) None of the above
Question-1- Correct answer

What is the VC dimension of the class of linear classifiers in 2D space?

a) 2
b) 3
c) 4
d) None of the above

Correct options: (b) Any 3 points in general position can be shattered by a linear decision
boundary in 2D, but no set of 4 points can, so the VC dimension is 3.
Question-2

Which of the following learning algorithms does NOT typically perform


empirical risk minimization?

a) Linear regression
b) Logistic regression
c) Decision trees
d) Support Vector Machines
Question-2 - Explanation

Which of the following learning algorithms does NOT typically perform


empirical risk minimization?

SVMs perform what is called structural risk minimization: in addition to the empirical error, they
also try to minimize the "size" of the solution by minimizing the norm of the weight vector.

This extra constraint gives rise to a different kind of minimization objective.

So SVMs do not perform pure empirical risk minimization; their objective is called structural risk minimization.


Question-2- Correct answer

Which of the following learning algorithms does NOT typically perform


empirical risk minimization?

a) Linear regression
b) Logistic regression
c) Decision trees
d) Support Vector Machines

Correct options: (d).


Question-3
Statement 1: As the size of the hypothesis class increases, the sample complexity for PAC learning
always increases.
Statement 2: A larger hypothesis class has a higher VC dimension. Choose the correct option:

a) Statement 1 is true. Statement 2 is true. Statement 2 is the correct


reason for statement 1
b) Statement 1 is true. Statement 2 is true. Statement 2 is not the correct
reason for statement 1
c) Statement 1 is true. Statement 2 is false
d) Both statements are false
Question-3 - Correct answer
Statement 1: As the size of the hypothesis class increases, the sample complexity for PAC learning always increases.
Statement 2: A larger hypothesis class has a higher VC dimension. Choose the correct option:

a) Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1
b) Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement
1
c) Statement 1 is true. Statement 2 is false
d) Both statements are false

Correct options: (b)


Question-4
When a model’s hypothesis class is too small, how does this affect the model’s
performance in terms of bias and variance?

a) High bias, low variance


b) Low bias, high variance
c) High bias, high variance
d) Low bias, low variance
Question-4 - Correct answer
When a model’s hypothesis class is too small, how does this affect the model’s
performance in terms of bias and variance?

a) High bias, low variance


b) Low bias, high variance
c) High bias, high variance
d) Low bias, low variance

Correct options: (a)


Question-5

Imagine you’re designing a robot that needs to navigate through a maze to


reach a target. Which reward scheme would be most effective in teaching
the robot to find the shortest path?

a) +5 for reaching the target, -1 for hitting a wall
b) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target.
c) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target, +1 for hitting a wall.
d) -5 for reaching the target, +0.1 for every second that passes before the robot reaches the target.
Question-5 - Correct answer
Imagine you’re designing a robot that needs to navigate through a maze to reach a target. Which
reward scheme would be most effective in teaching the robot to find the shortest path?

a) +5 for reaching the target, -1 for hitting a wall
b) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target.
c) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target, +1 for hitting a wall.
d) -5 for reaching the target, +0.1 for every second that passes before the robot reaches the target.

Correct options: (b) The +5 reward for reaching the target encourages goal achievement, while
the -0.1 penalty for each second promotes finding the shortest path. Rewards or penalties for
hitting walls are omitted because the question says nothing about avoiding walls.
Question-6-8
For the rest of the questions, we will follow a simplistic game and see how a
Reinforcement Learning agent can learn to behave optimally in it.
This is our game:

a) At the start of the game, the agent is on the Start state and can choose to move left or right
at each turn.
b) If it reaches the right end (RE), it wins, and if it reaches the left end (LE), it loses.
c) Because we love maths so much, instead of saying the agent wins or loses, we will say that
the agent gets a reward of +1 at RE and a reward of -1 at LE. Then the objective of the agent
is simply to maximize the reward it obtains!
Question-6
For each state, we define a variable that will store its value. The value of the state will help the
agent determine how to behave later. First we will learn this value. Let V be the mapping from
state to its value.
Initially,
V(LE) = -1
V(X1) = V(X2) = V(X3) = V(X4) = V(Start) = 0
V(RE) = +1
For each state S ∈ {X1,X2,X3,X4,Start}, with SL being the state to its immediate left and SR being
the state to its immediate right, repeat: V(S) = 0.9×max(V(SL),V(SR))
Till V converges (does not change for any state).

What is V(X4) after one application of the given formula?

a) 1
b) 0.9
c) 0.81
d) 0
Question-6 - Correct answer

What is V(X4) after one application of the given formula?

a) 1
b) 0.9
c) 0.81
d) 0

V(X4) = 0.9 × max(V(X3), V(RE)) = 0.9 × max(0, +1) = 0.9

Correct options: (b)


Question-7
What is V(X1) after one application of the given formula?

a) -1
b) -0.9
c) -0.81
d) 0
Question-7 - Correct answer

What is V(X1) after one application of the given formula?

a) -1
b) -0.9
c) -0.81
d) 0

V(X1) = 0.9 × max(V(LE), V(X2)) = 0.9 × max(−1, 0) = 0

Correct options: (d)


Question-8
What is V(X1) after V converges?

a) 0.59
b) -0.9
c) 0.63
d) 0
Question-8 - Correct answer

What is V(X1) after V converges?

a) 0.59
b) -0.9
c) 0.63
d) 0

V(X4) = 0.9 → V(X3) = 0.81 → V(Start) = 0.729 → V(X2) = 0.656 → V(X1) = 0.59.
The final value for X1 is 0.59.

Correct options: (a)
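The values in Questions 6-8 can be checked by iterating the given rule V(S) = 0.9 × max(V(SL), V(SR)) until convergence. A small sketch (the chain layout LE - X1 - X2 - Start - X3 - X4 - RE is inferred from the answers above):

```python
# Chain layout: LE - X1 - X2 - Start - X3 - X4 - RE (terminal values stay fixed).
states = ["LE", "X1", "X2", "Start", "X3", "X4", "RE"]
V = {"LE": -1.0, "X1": 0.0, "X2": 0.0, "Start": 0.0, "X3": 0.0, "X4": 0.0, "RE": 1.0}

changed = True
while changed:                              # repeat until V stops changing
    changed = False
    for i, s in enumerate(states):
        if s in ("LE", "RE"):               # skip the terminal states
            continue
        new_value = 0.9 * max(V[states[i - 1]], V[states[i + 1]])
        if abs(new_value - V[s]) > 1e-9:
            V[s], changed = new_value, True

print(round(V["X4"], 2), round(V["X1"], 2))  # 0.9 and 0.59, matching Q6 and Q8
```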


Suggestions and Feedback

Next Session:

Wednesday:
16-Apr-2025
6:00 - 8:00 PM
