
Introduction to Machine Learning

- Prof. Balaraman Ravindran | IIT Madras

Problem Solving Session (Week-12)


Shreya Bansal
PMRF PhD Scholar
IIT Ropar
Week-12 Contents

1. Introduction to Theory in Machine Learning


2. VC Dimension
3. Reinforcement Learning
Introduction to Theory in Machine Learning

● In computer science, "theory" often refers to:


○ Problem hardness (how difficult a problem is to solve)
○ Space/time complexity
○ Approximability (how close solutions are to optimal)
● Goal: Apply similar theoretical frameworks to machine
learning.
● Key questions:
○ How hard is it to approximate a solution?
○ How do we measure the hardness of ML problems?
Generalization Error

● Generalization error (ε(h)): Probability that hypothesis h makes a mistake on a new data point (x, y) sampled from distribution D:
ε(h) = P_{(x,y)∼D}[ h(x) ≠ y ]
● Empirical error (ε̂(h)): Error measured on the training data:
ε̂(h) = (1/m) Σ_{i=1..m} 1( h(x_i) ≠ y_i )
● Challenge: Estimate generalization error using empirical
error.
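As a quick illustration, empirical error is just the fraction of training points a hypothesis misclassifies. A minimal sketch (the hypothesis and data below are made up for illustration, not from the slides):

```python
import numpy as np

def empirical_error(h, X, y):
    """Empirical error: fraction of training points misclassified by hypothesis h."""
    predictions = np.array([h(x) for x in X])
    return np.mean(predictions != y)

# Toy 1-D data with labels in {-1, +1} and a simple threshold hypothesis.
X = np.array([0.1, 0.4, 0.35, 0.8])
y = np.array([-1, -1, +1, +1])
h = lambda x: +1 if x > 0.5 else -1
print(empirical_error(h, X, y))  # 0.25: one of the four points is misclassified
```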
Empirical Risk Minimization (ERM)
● Most learning algorithms aim to minimize empirical error over a hypothesis class H:
ĥ = argmin_{h ∈ H} ε̂(h)
● Hypothesis class H: the set of functions the algorithm searches over.
○ Linear classifiers: defined by parameters θ.
○ Neural networks: defined by architecture + weights.
● Practical issue: ERM may not find the true minimizer (e.g., due to
optimization challenges).
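To make the ERM idea concrete, here is a small sketch that does brute-force empirical risk minimization over a tiny hypothesis class of 1-D threshold classifiers (an illustrative class and dataset, not taken from the slides):

```python
import numpy as np

def empirical_error(h, X, y):
    return np.mean(np.array([h(x) for x in X]) != y)

# Hypothesis class H: threshold classifiers h_t(x) = +1 if x > t, else -1.
thresholds = np.linspace(0.0, 1.0, 21)
H = [(t, (lambda x, t=t: +1 if x > t else -1)) for t in thresholds]

# Toy training set.
X = np.array([0.1, 0.3, 0.45, 0.6, 0.9])
y = np.array([-1, -1, +1, +1, +1])

# ERM: pick the hypothesis with the smallest training (empirical) error.
best_t, best_h = min(H, key=lambda pair: empirical_error(pair[1], X, y))
print(best_t, empirical_error(best_h, X, y))  # a zero-training-error threshold
```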
Key Theoretical Tools

● Uniform Convergence
● Probabilistic Guarantee
● Bound for ERM Solution
● Sample Complexity
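The formulas from these slides are not reproduced in this transcript. For reference, a standard form of these results for a finite hypothesis class H (a hedged summary of the usual statements, not necessarily the exact ones shown in the session):

```latex
% Uniform convergence (finite H): with probability at least 1 - \delta,
\[
|\epsilon(h) - \hat{\epsilon}(h)| \le \gamma \quad \text{for all } h \in H,
\qquad \text{provided} \quad m \ge \frac{1}{2\gamma^{2}} \log\frac{2|H|}{\delta}.
\]
% Resulting bound for the ERM solution \hat{h}:
\[
\epsilon(\hat{h}) \le \Big( \min_{h \in H} \epsilon(h) \Big) + 2\gamma .
\]
% For infinite classes, \log|H| is replaced by a term depending on the VC dimension,
% which is why VC dimension (next slides) is the right measure of capacity.
```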
VC Dimension

● VC Dimension (Vapnik-Chervonenkis Dimension) is a


measure of the capacity (or complexity) of a statistical
classification model, specifically in terms of its ability to
shatter datasets.
Key Concepts

● Hypothesis Space
○ The set of all possible functions (or classifiers) that a model can learn. For example, for
linear classifiers in 2D, it is the set of all straight lines that can divide the plane into two classes.
● Shattering
○ A hypothesis space shatters a set of points if it can realize every possible labeling of those points.
○ For n points, there are 2^n possible ways to label them (each point can be +1 or -1).
○ If the model can classify all 2^n labelings correctly using some function in
the hypothesis space, then it shatters those n points.
● VC Dimension
○ The VC dimension of a hypothesis class is the maximum number of points that can be
shattered by the hypothesis class.
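A brute-force way to check shattering for small point sets, assuming the hypothesis class is linear classifiers in 2D. This sketch approximates a hard-margin separator with scikit-learn's SVC and a large C; it is an illustration, not an exact separability test:

```python
from itertools import product
import numpy as np
from sklearn.svm import SVC

def shatters(points):
    """Check whether linear classifiers can realize every labeling of the points."""
    points = np.asarray(points, dtype=float)
    for labels in product([-1, +1], repeat=len(points)):
        if len(set(labels)) == 1:
            continue  # all-(+1) or all-(-1) labelings are trivially realizable
        clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
        clf.fit(points, labels)
        if np.any(clf.predict(points) != labels):
            return False  # some labeling cannot be realized by any line
    return True

# Three points in general position can be shattered by lines in 2D ...
print(shatters([(0, 0), (1, 0), (0, 1)]))          # expected: True
# ... but four points cannot (the XOR labeling of this square fails).
print(shatters([(0, 0), (1, 1), (0, 1), (1, 0)]))  # expected: False
```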
Examples -1
Examples -2
Why Does VC Dimension Matter?
What is PAC Learning?

● PAC (Probably Approximately Correct) learning is a formal


framework that helps us understand when and how well a
machine learning algorithm can learn a concept from data.

● The idea is:


“A concept is PAC-learnable if, with high probability, a
learner can find a hypothesis that is approximately
correct using a reasonable amount of data and
computation.”
Breaking it Down: PAC Terminology

● Let’s define the terms in “Probably Approximately Correct”:

● Probably (1 - δ): The learning algorithm should succeed with high


probability (e.g., 95% of the time).

● Approximately (ε): The hypothesis should be close to the true concept,


i.e., error ≤ ε.

● Correct: Refers to how well the model generalizes on unseen data (not
just training data).
Formal Definition
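The formal statement on this slide is not reproduced in the transcript; a standard phrasing of PAC learnability, in the ε, δ notation defined above (a sketch, not necessarily the exact wording used in the session):

```latex
\[
\begin{aligned}
&\text{A concept class } C \text{ is PAC-learnable if there exist an algorithm } A
  \text{ and a polynomial } p \text{ such that}\\
&\text{for every } \epsilon, \delta \in (0,1), \text{ every target concept } c \in C,
  \text{ and every distribution } D,\\
&\text{given } m \ge p(1/\epsilon,\ 1/\delta) \text{ i.i.d. examples, } A
  \text{ outputs a hypothesis } h \text{ with } \Pr\big[\epsilon(h) \le \epsilon\big] \ge 1-\delta.
\end{aligned}
\]
```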
Relationship with VC Dimension
Examples of PAC Learnable Classes
Intuition: What Makes a Class PAC-Learnable?

● To be PAC-learnable, a class must have:

● Finite VC dimension (limited capacity to overfit)


● Efficient learning algorithm (computable in reasonable time)
● Sufficient data to learn from
What is Reinforcement Learning (RL)?

● Reinforcement Learning is a type of


machine learning where an agent
learns by interacting with its
environment, using trial and error,
and receiving feedback in the form
of rewards or punishments.
● It’s neither supervised nor
unsupervised learning.
Key Comparisons

Supervised Learning: You’re given labeled data - inputs with corresponding outputs (e.g., image → “cat”). The model learns to map inputs to outputs. Examples: classification, regression.

Unsupervised Learning: You’re given unlabeled data, with no clear output. The goal is to find structure or patterns (e.g., clustering, association rules).

Reinforcement Learning: You’re not told the right answer. You interact with an environment, try actions, and observe the outcomes (rewards/punishments). Learning is driven by experience, not direct supervision.
Illustrative Example: Learning to Ride a Bicycle

● Not supervised: No one gives exact
instructions (like press the pedal with 3
pounds of force, or tilt 2 degrees).
● Not unsupervised: You don’t just watch
people ride and figure it out.
● It’s reinforcement: You try, fall, learn not
to fall. You get minimal feedback — like
pain (punishment) or clapping (reward) —
and gradually figure it out.
Trial and Error is Key

● The RL agent tries different actions (explores).


● It learns which actions lead to better outcomes over time.
● The delayed nature of rewards adds complexity. You might get
feedback well after the action that caused it.
Pavlov’s Dog & RL Origins

● RL is inspired by
behavioral psychology —
like Pavlov’s classical
conditioning.
● The modern field was
kickstarted by Sutton &
Barto, whose 1983 work
laid the foundation for
today's algorithms and
techniques.
Games as an RL Metaphor

● In chess, if you're told what


move to play in a given
situation, that’s supervised
learning.
● If you just play, win or lose,
and only get a reward at the
end, that’s reinforcement
learning.
● You must figure out which
moves led to winning, despite
delayed rewards.
Reinforcement Learning (RL)

● RL involves an agent learning through interaction with an environment


(e.g., helicopter, game board, opponent).
● The agent senses the state of the environment and takes actions to
influence it.
● Key challenge: Actions must consider long-term benefits, not just
immediate rewards (e.g., chess strategy).
● Reward signal: Scalar feedback from the environment (e.g., +1 for
winning, -100 for crashing).
● Biological analogy: Rewards/punishments are interpreted from sensory
inputs (e.g., pain as negative feedback).
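The sensing/acting cycle described above is usually written as a simple interaction loop. A generic sketch (the env and agent objects are hypothetical placeholders, not a specific library's API):

```python
def run_episode(env, agent):
    """One episode of agent-environment interaction (env/agent are hypothetical)."""
    state = env.reset()                                   # sense the initial state
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                         # choose an action
        next_state, reward, done = env.step(action)       # environment responds
        agent.learn(state, action, reward, next_state)    # update from the scalar reward
        total_reward += reward
        state = next_state
    return total_reward
```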
Key Comparisons

Supervised Learning: Input → Output (with target labels). Error signal guides learning (e.g., gradient descent).

Unsupervised Learning: Input → Pattern detection (no explicit labels).

Reinforcement Learning: Input → Action → Scalar reward (no target labels). Trial-and-error learning (exploration required).
Temporal Difference (TD) Learning

● Core Idea: Predict future rewards by


updating estimates based on later
predictions.

● Prediction at time t+1 is more accurate


than at t.

● Example: Chess—confidence in winning


increases as the game progresses.

● Update Rule: Adjust earlier predictions


using newer information (e.g., reduce
winning probability from 0.6 to 0.55 if
later evidence suggests lower odds).

● Biological basis: Similar to


dopamine-driven learning in brains.
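A minimal sketch of this idea is the tabular TD(0) update, which nudges an earlier prediction toward the later, presumably more accurate one (the numbers reproduce the slide's 0.6 to 0.55 example; alpha and gamma are illustrative choices):

```python
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.5, gamma=1.0):
    """Tabular TD(0): move V(state) toward the bootstrapped target r + gamma * V(next_state)."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])
    return V

# Earlier estimate of winning probability is 0.6; the later state looks more like 0.5.
V = defaultdict(float, {"s_t": 0.6, "s_t_plus_1": 0.5})
td0_update(V, "s_t", reward=0.0, next_state="s_t_plus_1")
print(round(V["s_t"], 2))  # 0.55: the earlier prediction is revised downward, as on the slide
```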
Tic-Tac-Toe as an RL Problem
● States: Board configurations.

● Actions: Placing X or O in empty cells.

● Rewards:

○ +1 for winning, 0 otherwise (binary).

○ Alternative: +1 (win), -1 (lose), 0 (draw).

● Key Points:

● Learn a value function: Expected reward from each state


(probability of winning).

● Update values via TD learning or end-game feedback.

● Imperfect opponent required for meaningful learning


(perfect play always leads to a draw).
Exploration vs. Exploitation

● Exploitation: Choose actions with the


highest known rewards.
● Exploration: Try suboptimal actions to
discover better strategies.
● Challenge: Balancing exploration
(learning) vs. exploitation (performance).
● Bandit Problems: Simplified RL focusing on
this trade-off (no sequential states).
● Example: In tic-tac-toe, occasionally pick
random moves to avoid repeating
suboptimal paths.
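A common way to balance the two is epsilon-greedy action selection. A minimal sketch (the epsilon value and the value estimates below are illustrative):

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Mostly exploit the best-known action; explore with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(list(action_values))      # explore: random action
    return max(action_values, key=action_values.get)   # exploit: greedy action

# Made-up value estimates for three actions.
estimates = {"a1": 0.2, "a2": 0.5, "a3": 0.1}
print(epsilon_greedy(estimates))  # usually "a2", occasionally a random action
```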
Assignment-11 (Cs-101- 2024) (Week-12)

Source
Question-1

What is the VC dimension of the class of linear classifiers in 2D space?

a) 2
b) 3
c) 4
d) None of the above
Question-1- Correct answer

What is the VC dimension of the class of linear classifiers in 2D space?

a) 2
b) 3
c) 4
d) None of the above

Correct options: (b) Any 3 points in general position can be shattered by a linear decision
boundary in 2D, but no set of 4 points can, so the VC dimension is 3.
Question-2

Which of the following learning algorithms does NOT typically perform


empirical risk minimization?

a) Linear regression
b) Logistic regression
c) Decision trees
d) Support Vector Machines
Question-2 - Explanation

Which of the following learning algorithms does NOT typically perform


empirical risk minimization?

SVMs perform what is called structural risk minimization: in addition to the empirical error, they
also try to minimize the "size" of the solution by minimizing the norm of the weight vector.

This extra constraint gives rise to a different kind of minimization objective.

So SVMs do not perform pure empirical risk minimization; their objective is called structural risk minimization.


Question-2- Correct answer

Which of the following learning algorithms does NOT typically perform


empirical risk minimization?

a) Linear regression
b) Logistic regression
c) Decision trees
d) Support Vector Machines

Correct options: (d).


Question-3
Statement 1: As the size of the hypothesis class increases, the sample complexity for PAC learning
always increases.
Statement 2: A larger hypothesis class has a higher VC dimension. Choose the correct option:

a) Statement 1 is true. Statement 2 is true. Statement 2 is the correct


reason for statement 1
b) Statement 1 is true. Statement 2 is true. Statement 2 is not the correct
reason for statement 1
c) Statement 1 is true. Statement 2 is false
d) Both statements are false
Question-3 - Correct answer
Statement 1: As the size of the hypothesis class increases, the sample complexity for PAC learning always increases.
Statement 2: A larger hypothesis class has a higher VC dimension. Choose the correct option:

a) Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1
b) Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement
1
c) Statement 1 is true. Statement 2 is false
d) Both statements are false

Correct options: (b)


Question-4
When a model’s hypothesis class is too small, how does this affect the model’s
performance in terms of bias and variance?

a) High bias, low variance


b) Low bias, high variance
c) High bias, high variance
d) Low bias, low variance
Question-4 - Correct answer
When a model’s hypothesis class is too small, how does this affect the model’s
performance in terms of bias and variance?

a) High bias, low variance


b) Low bias, high variance
c) High bias, high variance
d) Low bias, low variance

Correct options: (a)


Question-5

Imagine you’re designing a robot that needs to navigate through a maze to


reach a target. Which reward scheme would be most effective in teaching
the robot to find the shortest path?

a) +5 for reaching the target, -1 for hitting a wall
b) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target.
c) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target, +1 for hitting a wall.
d) -5 for reaching the target, +0.1 for every second that passes before the robot reaches the target.
Question-5 - Correct answer
Imagine you’re designing a robot that needs to navigate through a maze to reach a target. Which
reward scheme would be most effective in teaching the robot to find the shortest path?

a) +5 for reaching the target, -1 for hitting a wall
b) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target.
c) +5 for reaching the target, -0.1 for every second that passes before the robot reaches the target, +1 for hitting a wall.
d) -5 for reaching the target, +0.1 for every second that passes before the robot reaches the target.

Correct options: (b) The +5 reward for reaching the target encourages goal achievement, while
the -0.1 penalty for each second promotes finding the shortest path. Rewards or penalties for
hitting walls are omitted because the question says nothing about avoiding walls.
Question-6-8
For the rest of the questions, we will follow a simplistic game and see how a
Reinforcement Learning agent can learn to behave optimally in it.
This is our game:

a) At the start of the game, the agent is on the Start state and can choose to move left or right
at each turn.
b) If it reaches the right end (RE), it wins, and if it reaches the left end (LE), it loses.
c) Because we love maths so much, instead of saying the agent wins or loses, we will say that
the agent gets a reward of +1 at RE and a reward of -1 at LE. Then the objective of the agent
is simply to maximize the reward it obtains!
Question-6
For each state, we define a variable that will store its value. The value of the state will help the
agent determine how to behave later. First we will learn this value. Let V be the mapping from
state to its value.
Initially,
V(LE) = -1
V(X1) = V(X2) = V(X3) = V(X4) = V(Start) = 0
V(RE) = +1
For each state S ∈ {X1,X2,X3,X4,Start}, with SL being the state to its immediate left and SR being
the state to its immediate right, repeat: V(S) = 0.9×max(V(SL),V(SR))
Till V converges (does not change for any state).

What is V(X4) after one application of the given formula?

a) 1
b) 0.9
c) 0.81
d) 0
Question-6 - Correct answer

What is V(X4) after one application of the given formula?

a) 1
b) 0.9
c) 0.81
d) 0

V(X4) = 0.9 × max(V(X3), V(RE)) = 0.9 × max(0, +1) = 0.9

Correct options: (b)


Question-7
What is V(X1) after one application of the given formula?

a) -1
b) -0.9
c) -0.81
d) 0
Question-7 - Correct answer

What is V(X1) after one application of the given formula?

a) -1
b) -0.9
c) -0.81
d) 0

V(X1) = 0.9 × max(V(LE), V(X2)) = 0.9 × max(−1, 0) = 0

Correct options: (d)


Question-8
What is V(X1) after V converges?

a) 0.59
b) -0.9
c) 0.63
d) 0
Question-8 - Correct answer

What is V(X1) after V converges?

a) 0.59
b) -0.9
c) 0.63
d) 0

V(X4) = 0.9 → V(X3) = 0.81 → V(Start) = 0.729 → V(X2) = 0.656 → V(X1) = 0.59.
The final value for X1 is 0.59.

Correct options: (a)
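The values in Questions 6-8 can be checked by iterating the given rule V(S) = 0.9 × max(V(SL), V(SR)) until convergence. A small sketch (the chain layout LE - X1 - X2 - Start - X3 - X4 - RE is inferred from the answers above):

```python
# Chain layout: LE - X1 - X2 - Start - X3 - X4 - RE (terminal values stay fixed).
states = ["LE", "X1", "X2", "Start", "X3", "X4", "RE"]
V = {"LE": -1.0, "X1": 0.0, "X2": 0.0, "Start": 0.0, "X3": 0.0, "X4": 0.0, "RE": 1.0}

changed = True
while changed:                              # repeat until V stops changing
    changed = False
    for i, s in enumerate(states):
        if s in ("LE", "RE"):               # skip the terminal states
            continue
        new_value = 0.9 * max(V[states[i - 1]], V[states[i + 1]])
        if abs(new_value - V[s]) > 1e-9:
            V[s], changed = new_value, True

print(round(V["X4"], 2), round(V["X1"], 2))  # 0.9 and 0.59, matching Q6 and Q8
```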


Suggestions and Feedback

Next Session:

Wednesday:
16-Apr-2025
6:00 - 8:00 PM
