ITML U1 Overview

The document outlines a machine learning course that introduces students to basic machine learning concepts like supervised and unsupervised learning, and covers specific techniques including regression, support vector machines, decision trees, Bayesian classifiers, clustering, genetic algorithms, and reinforcement learning. The syllabus details 5 units that cover topics such as concept learning, parametric methods, Bayesian learning, decision trees, instance-based learning, and more.


Machine Learning

Dr. N. Kalyani
Professor
Textbooks
• Machine Learning, Tom M. Mitchell, McGraw-Hill, 2013
• Ethem Alpaydin, Introduction to Machine Learning, 2nd Edition, MIT Press, 2010.
Course Objectives
1. To introduce students to the basic concepts of machine learning.
2. To become familiar with regression and support vector machines (SVMs).
3. To study decision trees and Bayesian classifiers.
4. To understand instance-based learning and clustering techniques.
5. To learn genetic algorithms and reinforcement learning.
Syllabus
UNIT 1:
Introduction to Machine Learning: What is Machine Learning, Examples of Machine
Learning Applications, Types of Machine Learning systems-Supervised learning,
Unsupervised learning, Reinforcement learning, Learning a class from examples.
Concept learning and the general to specific ordering – Introduction, A concept
learning task, Concept learning as search, find-S, Version spaces and the candidate
elimination algorithm.
UNIT 2:
Parametric methods: Introduction, Maximum Likelihood Estimation, Evaluating an
Estimator: Bias and Variance, The Bayes' Estimator, Parametric Classification,
Regression.
Support vector machines: Introduction, Optimal hyperplane for linearly separable
patterns, Quadratic optimization for finding the optimal hyperplane, Statistical
properties of the optimal hyperplane, Optimal hyperplane for non-separable patterns,
SVM non-linear regression.
Syllabus
UNIT 3:
Bayesian Learning: Introduction, Bayes theorem, Bayes theorem and concept learning, Bayes
optimal classifier, Naïve Bayes classifier.
Decision Tree Learning: Introduction, Decision tree representation, Appropriate problems for
decision tree learning, The basic decision tree learning algorithm, Hypothesis space search in
decision tree learning, Inductive bias in decision tree learning, Issues in decision tree learning.
UNIT 4:
Instance-based learning: Introduction, KNN learning, Distance weighted NN algorithm,
Remarks on KNN Algorithm.
Unsupervised learning: Introduction, K-means clustering technique, Hierarchical clustering,
Choosing the number of clusters.
UNIT 5:
Reinforcement Learning: Introduction, The learning task, Q learning, Nondeterministic rewards
and actions.
Genetic Algorithm: Motivation, Genetic algorithms, An illustrative example, Hypothesis space
search, Genetic programming.
Outline
• Machine Learning Overview
• Learning Methods
• Types of Machine Learning
Machine Learning Overview
Key elements when specifying a machine learning problem:
• Learner domain
• Performance goal
• Success criteria
• Prior knowledge
• Training information source
• Representation
• Algorithmic approach

Learning Methods
• Rote learning: memorizing; saving knowledge; association-based storage; avoids repeated computation. Issues: organization, generalization, stability.
• Learning by advice: develop sophisticated tools, generating rules from a high-level abstraction. Process: request/enquire, interpret, operationalize, integrate, evaluate.
• Learning by parameter adjustment: define static evaluation functions using the significant attributes that determine the output. Process: initialize, then modify the weights using input samples.
• Learning by analogy: identify the similarities and dissimilarities within data that is not labeled. Types: transformational analogy, derivational analogy.
• Learning by macro-operations: same as rote learning, but avoids expensive recomputation by using macro-operations.
Types of Learning
• Supervised learning
  – Regression: simple linear, multiple linear, polynomial, support vector, ridge, lasso, elastic net, Bayesian, decision tree, random forest
  – Classification: KNN, SVM, kernel SVM, Naïve Bayes, decision tree, random forest
• Semi-supervised learning
• Unsupervised learning
  – Clustering: K-means, hierarchical
  – Dimensionality reduction: Principal Component Analysis, Linear Discriminant Analysis, Kernel Principal Component Analysis
• Reinforcement learning: Q-learning, State-Action-Reward-State-Action (SARSA), Deep Q Network, Markov Decision Process, Deep Deterministic Policy Gradient (DDPG)
Examples of Machine Learning Applications
• Learning associations: basket analysis, recommender systems
• Classification: credit scoring, risk analysis, pattern recognition, speech recognition, medical diagnosis, face recognition, biometric recognition
• Regression: car price prediction, stock price prediction, navigation of a mobile robot
• Unsupervised learning: customer segmentation, density estimation, clustering, image compression, document clustering
• Reinforcement learning: game playing, robot navigation
Concept Learning
• Learning from examples
• Specific-to-general and general-to-specific ordering over hypotheses
• Version spaces
• Find-S and the candidate elimination algorithm
• The need for inductive bias
Some Examples for SmileyFaces
[Slide shows example smiley faces, each described by the attributes Eyes, Nose, Head, Fcolor, and Hair?, with the label Smile?]
Features from Computer View

Eyes Nose Head Fcolor Hair? Smile?


Round Triangle Round Purple Yes Yes
Square Square Square Green Yes No
Square Triangle Round Yellow Yes Yes
Round Triangle Round Green No No
Square Square Round Yellow Yes Yes
Representing Hypotheses
Many possible representations for hypotheses h.
Idea: represent h as a conjunction of constraints on the features, where each constraint can be:
– a specific value (e.g., Nose = Square)
– don't care (e.g., Eyes = ?)
– no value allowed (e.g., Fcolor = Ø)
For example, over the attributes Eyes, Nose, Head, Fcolor, Hair?:
<Round, ?, Round, ?, No>
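
To make the representation concrete, here is a minimal Python sketch (not from the slides): a hypothesis is a tuple of constraints, with "?" standing for don't-care and None standing for the empty constraint Ø, plus a helper that tests whether an example satisfies a hypothesis.

def satisfies(hypothesis, example):
    """True if every constraint in the hypothesis accepts the example's value.
    A "?" constraint accepts anything; a None constraint (the empty set) accepts nothing."""
    return all(c == "?" or c == v for c, v in zip(hypothesis, example))

h = ("Round", "?", "Round", "?", "No")
x = ("Round", "Triangle", "Round", "Purple", "No")
print(satisfies(h, x))   # True: Eyes, Head, and Hair? match; the rest are don't-care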
Prototypical Concept Learning Task
Given:
– Instances X: Faces, each described by the attributes
Eyes, Nose, Head, Fcolor, and Hair?
– Target function c: Smile? : X -> { no, yes }
– Hypotheses H: Conjunctions of literals such as
<?,Square,Square,Yellow,?>
– Training examples D: Positive and negative examples of the target function:
<x1, c(x1)>, <x2, c(x2)>, ..., <xm, c(xm)>

Determine: a hypothesis h in H such that h(x)=c(x) for all x in D.


Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the
target function well over other unobserved examples.

• What are the implications?


• Is this reasonable?
• What (if any) are our alternatives?
• What about concept drift (what if our views/tastes change over
time)?
Instances, Hypotheses, and More-General-Than
[Slide shows the instance space X and hypothesis space H, with h1 and h2 on the specific side and h3 toward the general side; h3 is more general than both h1 and h2]
x1 = <Round, Square, Square, Purple, Yes>     h1 = <Round, ?, Square, ?, ?>
x2 = <Round, Square, Round, Green, Yes>       h2 = <Round, ?, ?, ?, Yes>
                                              h3 = <Round, ?, ?, ?, ?>
Hypothesis Space Search by Find-S
[Slide shows Find-S searching the hypothesis space: starting from the most specific hypothesis h0 and moving toward more general hypotheses h1,2, h3,4, h5 as the examples x1 … x5 are processed]
h0=< >
x1=<Round,Triangle,Round,Purple,Yes> + h1=<Round,Triangle,Round,Purple,Yes>
x2=<Square,Square,Square,Green,Yes> - h2=<Round,Triangle,Round,Purple,Yes>
x3=<Square,Triangle,Round,Yellow,Yes> + h3=<?,Triangle,Round,?,Yes>
x4=<Round,Triangle,Round,Green,No> - h4=<?,Triangle,Round,?,Yes>
x5=<Square,Square,Round,Yellow,Yes> + h5=<?,?,Round,?,Yes>
General-to-Specific Ordering
h1 = <Round, ?, Round, Purple, Yes>
h2 = <?, ?, Round, ?, Yes>
Definition: Let hi and hj be Boolean-valued functions defined over X. Then hi is more general than or equal to hj (written hi ≥g hj) if and only if (∀x ∈ X)[(hj(x) = 1) → (hi(x) = 1)].
hi is (strictly) more general than hj (written hi >g hj) if and only if (hi ≥g hj) ∧ ¬(hj ≥g hi).
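
For conjunctive hypotheses the ≥g relation can be tested purely syntactically. A small sketch, using the tuple representation above ("?" for don't-care, None for Ø):

def more_general_or_equal(hi, hj):
    """Syntactic test of hi >=_g hj for conjunctive hypotheses: every constraint
    of hi must be at least as permissive as the corresponding constraint of hj."""
    for ci, cj in zip(hi, hj):
        if ci == "?":
            continue                  # "?" covers anything hj could accept
        if cj is None or ci == cj:
            continue                  # hj accepts nothing here, or both require the same value
        return False
    return True

h1 = ("Round", "?", "Round", "Purple", "Yes")
h2 = ("?", "?", "Round", "?", "Yes")
print(more_general_or_equal(h2, h1))  # True: h2 is more general than (or equal to) h1
print(more_general_or_equal(h1, h2))  # False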
Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
IF the constraint ai in h is satisfied by x THEN do nothing
ELSE
replace ai in h by next more general constraint satisfied by x

3. Output hypothesis h
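
A runnable sketch of Find-S on the SmileyFaces data, using the tuple representation above (None stands for the most specific constraint Ø):

def find_s(examples, n_attributes):
    """Find-S: start from the most specific hypothesis and minimally generalize
    it on each positive example; negative examples are ignored."""
    h = [None] * n_attributes                # most specific hypothesis <Ø,...,Ø>
    for x, label in examples:
        if label != "Yes":
            continue                         # Find-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value                 # first positive example: copy its values
            elif h[i] != value:
                h[i] = "?"                   # conflicting values: generalize to don't-care
    return tuple(h)

# SmileyFaces training data from the earlier slide (Eyes, Nose, Head, Fcolor, Hair? -> Smile?)
D = [
    (("Round",  "Triangle", "Round",  "Purple", "Yes"), "Yes"),
    (("Square", "Square",   "Square", "Green",  "Yes"), "No"),
    (("Square", "Triangle", "Round",  "Yellow", "Yes"), "Yes"),
    (("Round",  "Triangle", "Round",  "Green",  "No"),  "No"),
    (("Square", "Square",   "Round",  "Yellow", "Yes"), "Yes"),
]
print(find_s(D, 5))   # ('?', '?', 'Round', '?', 'Yes'), matching h5 in the Find-S trace shown earlier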
Training Examples 2: EnjoySport

Sky Temp Humid Wind Water Forecast EnjoySport

Sunny Warm Normal Strong Warm Same Yes


Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Change No
Sunny Warm High Strong Cool Change Yes

What is the general concept?


Prototypical Concept Learning Task (1/2)
• Given:
• Instances X: Possible days, each described by the attributes
Sky, AirTemp, Humidity, Wind, Water, Forecast
• Target function c: EnjoySport : X → {0, 1}
• Hypotheses H: Conjunctions of literals. E.g.
<?, Cold, High, ?, ?, ?>.
• Training examples D: Positive and negative examples of the target
function
< x1, c(x1)>, … <xm, c(xm)>
• Determine: A hypothesis h in H such that h(x) =c(x) for all x in D.
Hypothesis Space Search by Find-S

Representing Hypotheses
Sky Temp Humid Wind Water Forecast EnjoySport
Sunny Warm Normal Strong Warm Same Yes
Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Change No
Sunny Warm High Strong Cool Change Yes

What are the attributes and attribute values?

What is the size of the search space?
What happens when the training examples contradict each other?
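
A worked count can make the search-space question concrete. The sketch below follows Mitchell's EnjoySport example and assumes Sky takes three values (Sunny, Cloudy, Rainy) while the other five attributes take two each; only two Sky values actually appear in the table above, so treat the exact numbers as an assumption.

# Hedged counting sketch (assumes 3 values for Sky, 2 for each remaining attribute)
n_instances = 3 * 2 * 2 * 2 * 2 * 2           # 96 distinct instances
n_syntactic = 5 * 4 * 4 * 4 * 4 * 4           # 5120 syntactically distinct hypotheses:
                                              # each attribute is a value, "?", or Ø
n_semantic  = 1 + 4 * 3 * 3 * 3 * 3 * 3       # 973 semantically distinct hypotheses:
                                              # any hypothesis containing Ø classifies
                                              # every instance as negative
print(n_instances, n_syntactic, n_semantic)   # 96 5120 973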
Complaints / Issues about Find-S
• Cannot tell whether it has learned the concept
• Cannot tell when the training data are inconsistent
• Picks a maximally specific h (why?)
• Depending on H, there might be several!

• How do we fix this?


The List-Then-Eliminate Algorithm

1. Set VersionSpace equal to a list containing every hypothesis in H
2. For each training example <x, c(x)>, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
• But is listing all hypotheses reasonable?
• How many different hypotheses in our simple problem?
– How many not involving “?” terms?
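
For a hypothesis space this small, List-Then-Eliminate can literally be run. A sketch under the tuple representation used above (each attribute is either a specific value or "?"; the all-Ø hypothesis is omitted for brevity):

from itertools import product

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, examples):
    """h is consistent with D iff its prediction agrees with the label on every example."""
    return all(satisfies(h, x) == (label == "Yes") for x, label in examples)

def list_then_eliminate(attribute_values, examples):
    """Enumerate every conjunctive hypothesis and keep those consistent with the data."""
    candidates = product(*[list(values) + ["?"] for values in attribute_values])
    return [h for h in candidates if consistent(h, examples)]

Calling it with the SmileyFaces attribute values and training data lists every consistent hypothesis explicitly, which is exactly what the version space boundaries S and G (next slides) summarize compactly.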
Version Spaces
A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:
Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)
The version space, VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D:
VS_H,D ≡ { h ∈ H | Consistent(h, D) }
Representing Version Spaces
The general boundary, G, of version space VS_H,D is the set of its maximally general members:
G ≡ { g ∈ H | Consistent(g, D) ∧ ¬(∃g′ ∈ H)[(g′ >g g) ∧ Consistent(g′, D)] }
The specific boundary, S, of version space VS_H,D is the set of its maximally specific members:
S ≡ { s ∈ H | Consistent(s, D) ∧ ¬(∃s′ ∈ H)[(s >g s′) ∧ Consistent(s′, D)] }
Every member of the version space lies between these boundaries:
VS_H,D = { h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s) }
where x ≥g y means x is more general than or equal to y.
Example Version Space
Eyes Nose Head Fcolor Hair? Smile?
Round Triangle Round Purple Yes Yes
Square Square Square Green Yes No
Square Triangle Round Yellow Yes Yes
Round Triangle Round Green No No
Square Square Round Yellow Yes Yes

G: { <?,?,Round,?,?>, <?,Triangle,?,?,?> }
   <?,?,Round,?,Yes>   <?,Triangle,Round,?,?>   <?,Triangle,?,?,Yes>
S: { <?,Triangle,Round,?,Yes> }
Candidate Elimination Algorithm (1/2)
G = maximally general hypotheses in H
S = maximally specific hypotheses in H
For each training example d, do:
  If d is a positive example:
    Remove from G any hypothesis that does not include d
    For each hypothesis s in S that does not include d:
      Remove s from S
      Add to S all minimal generalizations h of s such that
        1. h includes d, and
        2. some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
Candidate Elimination Algorithm (2/2)
For each training example d, do (cont.):
  If d is a negative example:
    Remove from S any hypothesis that does include d
    For each hypothesis g in G that does include d:
      Remove g from G
      Add to G all minimal specializations h of g such that
        1. h does not include d, and
        2. some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G

If G or S ever becomes empty, the data are not consistent (with H).
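
A compact Python sketch of the algorithm (conjunctive hypotheses as tuples, "?" for don't-care, None for Ø; the helper names are my own, not from the slides):

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hi, hj):
    return all(ci == "?" or cj is None or ci == cj for ci, cj in zip(hi, hj))

def min_generalization(s, x):
    """The single minimal generalization of s that covers the positive example x."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, attribute_values, x):
    """Minimal specializations of g that exclude the negative example x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in attribute_values[i] if v != x[i]]

def candidate_elimination(examples, attribute_values):
    n = len(attribute_values)
    S = {(None,) * n}                         # most specific boundary <Ø,...,Ø>
    G = {("?",) * n}                          # most general boundary <?,...,?>
    for x, label in examples:
        if label == "Yes":                    # positive example: prune G, generalize S
            G = {g for g in G if satisfies(g, x)}
            S = {min_generalization(s, x) if not satisfies(s, x) else s for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
            S = {s for s in S
                 if not any(t != s and more_general_or_equal(s, t) for t in S)}
        else:                                 # negative example: prune S, specialize G
            S = {s for s in S if not satisfies(s, x)}
            new_G = set()
            for g in G:
                if not satisfies(g, x):
                    new_G.add(g)
                else:
                    new_G.update(h for h in min_specializations(g, attribute_values, x)
                                 if any(more_general_or_equal(h, s) for s in S))
            G = {g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)}
    return S, G

Run on the SmileyFaces data (Eyes, Head ∈ {Round, Square}; Nose ∈ {Triangle, Square}; Fcolor ∈ {Purple, Green, Yellow}; Hair? ∈ {Yes, No}), the intermediate S and G sets it produces match the boundary sets shown in the example trace that follows.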


Example Version Space
Eyes Nose Head Fcolor Hair? Smile?
Round Triangle Round Purple Yes Yes

Square Square Square Green Yes No

Square Triangle Round Yellow Yes Yes

Round Triangle Round Green No No


Square Square Round Yellow Yes Yes
Example Trace
S0: { <Ø,Ø,Ø,Ø,Ø> }            G0, G1: { <?,?,?,?,?> }
x1 = <R,T,R,P,Y> +  →  S1, S2: { <R,T,R,P,Y> }
x2 = <S,S,S,G,Y> −  →  G2: { <R,?,?,?,?>, <?,T,?,?,?>, <?,?,R,?,?>, <?,?,?,P,?> }
x3 = <S,T,R,Y,Y> +  →  S3, S4: { <?,T,R,?,Y> }
x4 = <R,T,R,G,N> −  →  G4: { <?,T,?,?,Y>, <?,?,R,?,Y> }
x5 = <S,S,R,Y,Y> +  →  S5: { <?,?,R,?,Y> }
(Training data as on the previous slide; attribute values are abbreviated by their first letter.)
How Should These Be Classified?
G: { <?,?,Round,?,?> <?,Triangle,?,?,?> }

<?,?,Round,?,Yes> <?,Triangle,Round,?,?> <?,Triangle,?,?,Yes>

S: { <?,Triangle,Round,?,Yes> }

?
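
One way to answer this with only the boundary sets, sketched in Python (it relies on the fact that every version-space member lies between some member of S and some member of G):

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def classify_with_boundaries(x, S, G):
    """Classify x with a partially learned concept: positive if every member of S
    covers x, negative if no member of G covers x, otherwise undecided."""
    if all(satisfies(s, x) for s in S):
        return "Yes"                  # every hypothesis in the version space says Yes
    if not any(satisfies(g, x) for g in G):
        return "No"                   # every hypothesis in the version space says No
    return "don't know"               # the version-space members disagree

S = {("?", "Triangle", "Round", "?", "Yes")}
G = {("?", "?", "Round", "?", "?"), ("?", "Triangle", "?", "?", "?")}
print(classify_with_boundaries(("Square", "Triangle", "Round", "Green", "Yes"), S, G))  # Yes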
Example to find version space

Sky Temp Humid Wind Water Forecast EnjoySport


Sunny Warm Normal Strong Warm Same Yes
Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Change No
Sunny Warm High Strong Cool Change Yes
Example 2 Version Space
Example Trace

What Next Training Example?

Remarks on Version Spaces and Candidate Elimination

1. Will the Candidate-Elimination algorithm converge to the correct hypothesis?
2. What training examples should the learner request next?
3. How can partially learned concepts be used?

a. <Sunny Warm Normal Strong Cool Change>


b. <Rainy Cold Normal Light Warm Same>
c. <Sunny Warm Normal Light Warm Same>
d. <Sunny Cold Normal Strong Warm Same>

Inductive Bias

Biased Hypothesis Space

+ < Round, Triangle, Round, Purple, Yes >


+ < Square, Triangle, Round, Yellow, Yes >

S: < ?, Triangle, Round, ?, Yes >

Why believe we can classify the unseen?


< Square, Triangle, Round, Purple, Yes > ?
An Unbiased Learner

Idea: Choose H that expresses every teachable concept


(i.e., H is the power set of X)
Consider H’ = disjunctions, conjunctions, negations over previous H.
For example:
<?, Triangle, Round, ?, Yes> ∨ <Square, Square, ?, Purple, ?>

What are S, G, in this case?

S: the disjunction of the positive examples

G: the negated disjunction of the negative examples
Inductive Bias
Consider
– concept learning algorithm L
– instances X, target concept c
– training examples Dc={<x,c(x)>}
– let L(xi,Dc) denote the classification assigned to the
instance xi by L after training on data Dc.
Definition:
The inductive bias of L is any minimal set of assertions B
such that for any target concept c and corresponding
training examples Dc
(∀xi ∈ X)[(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
where A ⊢ B means A logically entails B
Inductive Systems and Equivalent Deductive Systems
Inductive system: the training examples and a new instance are fed to the Candidate Elimination algorithm (using hypothesis space H), which outputs a classification of the new instance, or "don't know".
Equivalent deductive system: the training examples, the new instance, and the assertion "H contains the target concept" are fed to a theorem prover, which outputs a classification of the new instance, or "don't know".
Three Learners with Different Biases
1. Rote learner: store examples, classify a new instance iff it matches a previously observed example (don't know otherwise).
2. Version space candidate elimination algorithm.
3. Find-S
Summary Points
1. Concept learning as search through H
2. General-to-specific ordering over H
3. Version space candidate elimination algorithm
4. S and G boundaries characterize learner’s
uncertainty
5. Learner can generate useful queries
6. Inductive learning is possible only if the learner is
biased
7. Inductive learners can be modeled by equivalent
deductive systems
