Machine Learning - 1
Motivation
Recommender systems
e.g. Amazon --> you might like this book
e.g. LinkedIn --> people you might know
Pattern Recognition
Learning to recognize postal codes
Handwriting recognition --> who wrote this cheque?
Types of Learning
In Supervised learning
We are given training examples together with their correct output f(X):
X1 = (big teeth, big eyes, no moustache, small nose) --> f(X1) = person
X2 = (small teeth, small eyes, no moustache, small nose) --> f(X2) = person
and must learn f so we can predict the output for a new example:
X3 = (big teeth, small eyes, moustache) --> f(X3) = ?
In Reinforcement learning
We are not given the correct f(X) for examples such as X = (big teeth, small eyes, ...),
but somehow we are told whether our learned f(X) is right or wrong
Goal: maximize the number of right answers
In Unsupervised learning
The outputs are not given at all:
X1 = (big teeth, big eyes, no moustache, small nose) --> f(X1) not given
X2 = (small teeth, small eyes, no moustache, small nose) --> f(X2) not given
X3 = (big teeth, small eyes, moustache) --> f(X3) = ?
We must find regularities in the examples ourselves
Inductive Learning
Given examples of inputs X and outputs f(X), learn a function that approximates f
X = features of a face (e.g. small nose, big teeth, ...)
f(X) = function to tell if X represents a human face or not
Techniques in ML
Probabilistic Methods
Decision Trees
Genetic algorithms
Neural networks
Today
Guess Who?
Play online
Decision Trees
Example
Data on last year's students, used to predict whether a student will get an A this year
Student     | A last year? | Black hair? | Works hard? | Drinks? | Output f(X): A this year?
X1: Richard | Yes          | Yes         | No          | Yes     | No
X2: Alan    | Yes          | Yes         | Yes         | No      | Yes
X3: Alison  | No           | No          | Yes         | No      | No
X4: Jeff    | No           | Yes         | No          | Yes     | No
X5: Gail    | Yes          | No          | Yes         | Yes     | Yes
X6: Simon   | No           | Yes         | Yes         | Yes     | No
Example
A decision tree consistent with the training data:

A last year?
  no  --> Output = No
  yes --> Works hard?
            yes --> Output = Yes
            no  --> Output = No
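Read as code, the tree is just nested feature tests. A minimal Python sketch (the function name will_get_a and the boolean encoding of the features are assumptions for illustration):

    def will_get_a(a_last_year: bool, works_hard: bool) -> bool:
        """The decision tree above, written as nested feature tests."""
        if not a_last_year:        # A last year? == No
            return False           # Output = No
        if works_hard:             # Works hard? == Yes
            return True            # Output = Yes
        return False               # Output = No

    # Matches every row of the training table:
    assert will_get_a(True, False) is False   # X1: Richard
    assert will_get_a(True, True) is True     # X2: Alan, X5: Gail
    assert will_get_a(False, True) is False   # X3: Alison, X6: Simon
    assert will_get_a(False, False) is False  # X4: Jeff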
Training data: [table of training examples; lost in extraction]
[Figure: searching the space of decision trees. Starting from the empty tree, candidate trees test different features (F1? ... F7?) at their nodes, with each leaf assigning a class.]
ID3
Intuitively
[Figure: comparing candidate root tests. Splitting on "Patrons" separates the examples into mostly-pure branches with respect to the output f(X); splitting on "Type" leaves every branch as mixed as the original set.]
Next Feature
[Figure: after splitting on "Patrons", only one branch needs a further test (e.g. "Hungry?"); starting with "Type" would require further tests in every branch.]
The resulting tree uses 4 tests instead of 9, and 11 branches instead of 21.
Entropy
H(X) = -\sum_{x \in X} p(x) \log_2 p(x)

H(fair coin toss) = -\sum_{x_i \in X} p(x_i) \log_2 p(x_i) = H(1/2, 1/2)
                  = -(1/2 \log_2 1/2 + 1/2 \log_2 1/2) = 1 bit

Notation: H(a, b) = entropy when the
probability of success = a and the
probability of failure = b
Entropy
[Figure: entropy of a sample as the class mix varies, e.g. from 0 to 100 positive examples out of 100]

Conditional entropy after testing feature A (the "remainder"):
H(S | A) = \sum_{v \in values(A)} \frac{|S_v|}{|S|} H(S_v)

gain(A) = H(S) - H(S | A)
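A minimal Python sketch of these formulas (the names entropy, remainder, and gain are mine, and examples are assumed to be dicts mapping feature names to values):

    import math
    from collections import Counter

    def entropy(labels):
        # H(S) = -sum_x p(x) log2 p(x)  (absent classes contribute 0, i.e. 0 log2 0 = 0)
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def remainder(examples, feature, target):
        # H(S|A) = sum over v in values(A) of |S_v|/|S| * H(S_v)
        n = len(examples)
        total = 0.0
        for v in {ex[feature] for ex in examples}:
            sv = [ex[target] for ex in examples if ex[feature] == v]
            total += len(sv) / n * entropy(sv)
        return total

    def gain(examples, feature, target):
        # gain(A) = H(S) - H(S|A)
        return entropy([ex[target] for ex in examples]) - remainder(examples, feature, target)

    print(entropy(["heads", "tails"]))  # fair coin toss: 1.0 bit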
Some Intuition

Size  | Color | Shape  | Output
Big   | Red   | Circle | +
Small | Red   | Circle | +
Small | Red   | Square | -
Big   | Blue  | Circle | -
Computing gain(Color):

H(S) = -(1/2 \log_2 1/2 + 1/2 \log_2 1/2) = 1

Values(Color) = {red, blue}
Color -- red: 2+ 1-   blue: 0+ 1-

For each v of Values(Color), compute |S_v|/|S| x H(S_v):
H(S | Color = red)  = H(2/3, 1/3) = -(2/3 \log_2 2/3 + 1/3 \log_2 1/3) = 0.918
H(S | Color = blue) = H(0, 1) = -(1 \log_2 1) = 0

H(S | Color) = (3/4)(0.918) + (1/4)(0) = 0.6885
gain(Color) = H(S) - H(S | Color) = 1 - 0.6885 = 0.3115
Computing gain(Shape):

Shape -- circle: 2+ 1-   square: 0+ 1-
Note: by definition, \log_2 0 = -\infty, but 0 \log_2 0 is taken to be 0

H(S) = -(1/2 \log_2 1/2 + 1/2 \log_2 1/2) = 1
H(S | Shape) = (3/4)(0.918) + (1/4)(0) = 0.6885
gain(Shape) = H(S) - H(S | Shape) = 1 - 0.6885 = 0.3115
Computing gain(Size):

Size -- big: 1+ 1-   small: 1+ 1-

H(S) = -(1/2 \log_2 1/2 + 1/2 \log_2 1/2) = 1
H(S | Size) = (1/2)(1) + (1/2)(1) = 1
gain(Size) = H(S) - H(S | Size) = 1 - 1 = 0
Summary:
gain(Shape) = 0.3115
gain(Color) = 0.3115
gain(Size) = 0
Size carries no information about the output, so ID3 would pick Color or Shape (a tie) as the root test.
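These gains are easy to check numerically; a small sketch using the two-class entropy H(a, b) (note the slides round H(2/3, 1/3) to 0.918, hence 0.3115 instead of the unrounded 0.3113):

    import math

    def H(*ps):
        # H(a, b, ...) with 0 log2 0 taken as 0
        return -sum(p * math.log2(p) for p in ps if p > 0)

    h_s = H(2/4, 2/4)                  # the set has 2+ and 2- examples: 1 bit
    gain_color = h_s - (3/4 * H(2/3, 1/3) + 1/4 * H(0, 1))
    gain_shape = h_s - (3/4 * H(2/3, 1/3) + 1/4 * H(0, 1))
    gain_size  = h_s - (1/2 * H(1/2, 1/2) + 1/2 * H(1/2, 1/2))
    print(gain_color, gain_shape, gain_size)   # ~0.3113, ~0.3113, 0.0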
Training data: [the 12 restaurant examples; table lost in extraction]
gain(fri) = ...
gain(hun) = ...
gain(pat) = 1 - [ (2/12) H(0/2, 2/2) + (4/12) H(4/4, 0/4) + (6/12) H(2/6, 4/6) ]
          = 1 - [ (2/12)(0) + (4/12)(0) + (6/12)(-(2/6 \log_2 2/6 + 4/6 \log_2 4/6)) ]
          ≈ 0.541 bits
gain(res) = ...
gain(type) = 1 - [ (2/12) H(1/2, 1/2) + (2/12) H(1/2, 1/2) + (4/12) H(2/4, 2/4) + (4/12) H(2/4, 2/4) ] = 0 bits
gain(est) = ...

Patrons has the highest gain, so it is chosen as the root of the tree.
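For completeness, a sketch of the recursive ID3 loop these gain computations feed into (greedy: pick the highest-gain feature, split on it, recurse; helpers are repeated so the snippet stands alone, and the dict-based example format is an assumption):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain(examples, feature, target):
        n = len(examples)
        rem = 0.0
        for v in {ex[feature] for ex in examples}:
            sv = [ex[target] for ex in examples if ex[feature] == v]
            rem += len(sv) / n * entropy(sv)
        return entropy([ex[target] for ex in examples]) - rem

    def id3(examples, features, target):
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1 or not features:
            return Counter(labels).most_common(1)[0][0]  # leaf: (majority) class
        best = max(features, key=lambda f: gain(examples, f, target))
        branches = {}
        for v in {ex[best] for ex in examples}:
            subset = [ex for ex in examples if ex[best] == v]
            branches[v] = id3(subset, [f for f in features if f != best], target)
        return (best, branches)  # internal node: test `best`, one branch per value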
Decision Boundaries of Decision Trees

[Figures: a 2-D feature space (Feature 1 vs. Feature 2) being split step by step. Each internal node tests a single feature against a threshold (t1, t2, t3), so each split adds an axis-parallel boundary, and every leaf corresponds to an axis-aligned rectangle of the feature space.]
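A toy sketch of why the boundaries look like this: every node compares a single feature to a single threshold, so each split draws a horizontal or vertical line (the thresholds and class names below are made up for illustration):

    def classify(f1: float, f2: float) -> str:
        # Each test looks at one feature only, so each split is a
        # horizontal or vertical line in the (f1, f2) plane.
        t1, t2, t3 = 0.5, 0.3, 0.8   # assumed thresholds
        if f2 > t1:
            return "A" if f1 > t2 else "B"
        return "C" if f1 > t3 else "D"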
Today
Evaluation Methodology
Standard methodology:
1. Collect a large set of examples (all with correct classifications)
2. Divide the collection into two disjoint sets: a training set and a test set
3. Apply the learning algorithm to the training set
   DO NOT LOOK AT THE TEST SET!
4. Measure performance on the test set:
   how well does the function learned in step 3 classify the examples in the test set? (i.e., compute accuracy)
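A minimal Python sketch of steps 2-4 (learn is a hypothetical argument: any function mapping a training set to a classifier h(x)):

    import random

    def evaluate(examples, learn, split=0.8, seed=0):
        # examples: list of (x, y) pairs with correct classifications
        data = examples[:]
        random.Random(seed).shuffle(data)
        cut = int(split * len(data))
        train, test = data[:cut], data[cut:]   # two disjoint sets
        h = learn(train)                       # the learner never sees `test`
        return sum(h(x) == y for x, y in test) / len(test)   # accuracy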
Error analysis
A confusion matrix: rows and columns range over the classes C1 ... C6, one axis giving the correct class (the one that should have been assigned); each row sums to 100%.

[Table: 6-class confusion matrix. The diagonal, i.e. the percentage of each class classified correctly, reads C1 99.4, C2 90.2, C3 93.9, C4 95.5, C5 96.0, C6 93.3; the remaining mass in each row (e.g. 0.3, 3.3, 4.1, 1.8, 2.2, 2.5, ...) falls on the confused classes; exact cell positions were lost in extraction.]
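A sketch of how such a matrix is tallied from (correct, assigned) pairs (hypothetical helper; raw counts rather than percentages):

    from collections import defaultdict

    def confusion_matrix(pairs):
        # pairs: iterable of (correct_class, assigned_class)
        m = defaultdict(lambda: defaultdict(int))
        for correct, assigned in pairs:
            m[correct][assigned] += 1
        return m

    m = confusion_matrix([("C1", "C1"), ("C1", "C2"), ("C2", "C2")])
    print(m["C1"]["C2"])  # one C1 example misclassified as C2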
A Learning Curve
[Figure: learning curve, plotting accuracy on the test set against the size of the training set]
Noisy input
Overfitting/underfitting the training data
Noisy Input
The same example can appear twice with different classifications:

Size | Color | Shape  | Output
Big  | Red   | Circle | +
Big  | Red   | Circle | -

Identical features, conflicting outputs: either the data is noisy or the features are insufficient to determine the output, so no consistent hypothesis can fit all the training data.
Overfitting / Underfitting
Overfitting: the hypothesis is too complex and fits noise and coincidences in the training data, so it generalizes poorly. Underfitting: the hypothesis is too simple to capture the real regularities.
Cross-validation
Run k experiments; each time, test on a different 1/k of the data and train on the rest,
then average the results.

For k = 3:
exp1: [ test  | train | train ]
exp2: [ train | test  | train ]
exp3: [ train | train | test  ]
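A minimal sketch of the k experiments (learn is again a hypothetical training-set -> classifier function, and the leftover n mod k examples are simply ignored here):

    def cross_validate(examples, learn, k=3):
        fold = len(examples) // k
        scores = []
        for i in range(k):
            test = examples[i * fold:(i + 1) * fold]                  # this 1/k of the data
            train = examples[:i * fold] + examples[(i + 1) * fold:]   # the rest
            h = learn(train)
            scores.append(sum(h(x) == y for x, y in test) / len(test))
        return sum(scores) / k   # average the k results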
Today
Unsupervised Learning
Examples come without outputs; e.g. for X = (big teeth, small eyes, moustache), f(X) = ?
We must find structure in the data ourselves.
Clustering
[Figure: examples X1 ... X5 described by features a1, a2, a3 with the Output column empty (unlabeled), plotted as points in feature space; nearby points form natural groups]
Clustering
[Figure: unlabeled data fed into a clustering algorithm, which outputs groups of similar examples]
k-means Clustering
The user selects how many clusters they want (the value of k)
Euclidean Distance
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
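The same formula in Python:

    import math

    def euclidean(p, q):
        # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    print(euclidean((0, 0), (3, 4)))  # 5.0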
Example
[Figure: data points in the plane, with k = 3 initial centroids c1, c2, c3 placed at random]
Example
Step 1: partition the data points, assigning each point to its closest centroid.
[Figure: points grouped around centroids c1, c2, c3]
Example
Step 2: recompute each centroid as the mean of the points assigned to it.
[Figure: centroids c1, c2, c3 moved to the centers of their clusters]
Example
Step 3: re-assign the data points to the new closest centroids.
[Figure: updated cluster assignments]
Example
Step 4: repeat until the assignments no longer change.
[Figure: final centroids and clusters]
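Putting steps 1-4 together, a compact pure-Python k-means sketch (points as tuples of numbers; the random initialization and iteration cap are implementation assumptions):

    import math
    import random

    def kmeans(points, k, max_iters=100, seed=0):
        # Step 0: pick k initial centroids at random
        centroids = random.Random(seed).sample(points, k)
        for _ in range(max_iters):
            # Steps 1/3: assign each point to its closest centroid
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
                clusters[nearest].append(p)
            # Step 2: move each centroid to the mean of its cluster
            new_centroids = [
                tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
                if cluster else centroids[j]
                for j, cluster in enumerate(clusters)
            ]
            if new_centroids == centroids:   # Step 4: stop when nothing moves
                break
            centroids = new_centroids
        return centroids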
Notes on k-means
Today