Pattern Recognition 14
Classification in Statistical PR
• A class is a set of objects having some important properties in common.
Possible features for character recognition
Some Terminology
• Classes: a set of m known categories of objects
  (a) might have a known description for each
  (b) might have a set of samples for each
• Reject class: a generic class for objects not in any of the designated known classes
• Classifier: assigns an object to a class based on its features
Discriminant functions
• Functions f(x, K) perform some computation on the feature vector x.
• The knowledge K, obtained from training or programming, is used in the computation.
• A final stage determines the class from the values of the discriminant functions.
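To make this concrete, here is a minimal Python sketch (the linear form of f and the toy weights are my illustrative assumptions, not from the slides): each class has a discriminant function built from its trained knowledge K, and the final stage assigns the class whose function scores highest.

    import numpy as np

    def classify(x, discriminants):
        # Evaluate every class's discriminant f(x, K) and return the
        # index of the class with the highest score.
        scores = [f(x) for f in discriminants]
        return int(np.argmax(scores))

    # Illustrative linear discriminants f(x, K) = w . x + b, where the
    # weights and biases (the knowledge K) would come from training.
    K = [(np.array([1.0, -0.5]), 0.2),    # class 0
         (np.array([-0.3, 0.8]), -0.1)]   # class 1
    discriminants = [lambda x, w=w, b=b: w @ x + b for (w, b) in K]

    print(classify(np.array([0.4, 0.9]), discriminants))  # predicted class index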
Classification using nearest class mean
• Compute the Euclidean distance between the feature vector X and the mean of each class; assign X to the class whose mean is closest.
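A minimal nearest-class-mean classifier in Python (NumPy; the toy training samples are made up for illustration):

    import numpy as np

    def nearest_mean_classify(x, class_means):
        # Assign x to the class whose mean is closest in Euclidean distance.
        dists = [np.linalg.norm(x - mu) for mu in class_means]
        return int(np.argmin(dists))

    # Class means computed from training samples of each class.
    train = {0: np.array([[1.0, 1.0], [1.2, 0.8]]),
             1: np.array([[4.0, 4.2], [3.8, 4.0]])}
    means = [train[c].mean(axis=0) for c in sorted(train)]

    print(nearest_mean_classify(np.array([3.9, 4.1]), means))  # -> 1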
Nearest mean might yield poor results with complex structure
• Class 2 has two modes; where is its mean?
Scaling coordinates by std dev
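A sketch of one common version of this scaling (an assumption about what the slide's figure showed): each coordinate is divided by that feature's standard deviation over the training set, so features with large ranges do not dominate the distance.

    import numpy as np

    def scaled_distance(x, mu, sigma):
        # Euclidean distance after dividing each coordinate by its
        # per-feature standard deviation (sigma) from the training set.
        return np.linalg.norm((x - mu) / sigma)

    X_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
    sigma = X_train.std(axis=0)     # per-feature standard deviation
    mu = X_train.mean(axis=0)       # e.g., a class mean
    print(scaled_distance(np.array([2.0, 250.0]), mu, sigma))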
Nearest Mean
• What’s good about the nearest mean approach? It is simple and fast: classification requires only one distance computation per class.
Nearest Neighbor Classification
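A minimal 1-nearest-neighbor sketch in Python (toy data assumed for illustration): the query takes the label of its closest training sample.

    import numpy as np

    def nn_classify(x, X_train, y_train):
        # Label the query with the class of its closest training sample.
        dists = np.linalg.norm(X_train - x, axis=1)
        return y_train[int(np.argmin(dists))]

    X_train = np.array([[1.0, 1.0], [1.1, 0.9], [4.0, 4.0], [4.2, 3.9]])
    y_train = np.array([0, 0, 1, 1])
    print(nn_classify(np.array([3.8, 4.1]), X_train, y_train))  # -> 1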
Nearest Neighbor
• Pros: no training phase; decision boundaries can be arbitrarily complex; new labeled samples are easy to add.
• Cons: all training samples must be stored and searched, so classification is slow; the method is sensitive to noisy samples and to feature scaling.
Evaluating Results
• We need a way to measure the performance of any classification task.
• Binary classifier: Face or not Face
  – We can talk about true positives, false positives, true negatives, false negatives
• Multiway classifier: ‘a’ or ‘b’ or ‘c’ .....
  – For each class, what percentage of samples is classified correctly, and what percentage goes to each of the wrong classes
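For the binary case, a sketch of counting the four outcome types (the example labels are made up; 1 = face, 0 = not face):

    def binary_counts(y_true, y_pred):
        # Count true/false positives/negatives for a binary task.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        return tp, fp, tn, fn

    print(binary_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # -> (2, 1, 1, 1)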
Receiver Operating Characteristic (ROC) Curve
• Plots the correct detection rate versus the false alarm rate.
• Generally, false alarms go up with attempts to detect higher percentages of known objects.
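A sketch of how such a curve can be traced (scores and labels are made up): each detection threshold yields one (false alarm rate, detection rate) point, and sweeping the threshold trades one off against the other.

    import numpy as np

    def roc_points(scores, labels):
        # For each candidate threshold, compute (false alarm rate,
        # correct detection rate) on the labeled scores.
        points = []
        for t in sorted(set(scores), reverse=True):
            pred = scores >= t
            tpr = np.mean(pred[labels == 1])   # detection rate
            fpr = np.mean(pred[labels == 0])   # false alarm rate
            points.append((fpr, tpr))
        return points

    scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
    labels = np.array([1,   1,   0,   1,   0,   0])
    print(roc_points(scores, labels))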
(Figure: an ROC curve from our work.)
Confusion matrix shows empirical performance for multiclass problems
• Entry (i, j) counts how often an object of true class i was assigned to class j; correct classifications lie on the diagonal.
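A sketch of filling such a matrix from labeled test results (toy labels assumed): rows index the true class, columns the assigned class.

    import numpy as np

    def confusion_matrix(y_true, y_pred, m):
        # cm[i, j] counts test samples of true class i assigned to class j.
        cm = np.zeros((m, m), dtype=int)
        for t, p in zip(y_true, y_pred):
            cm[t, p] += 1
        return cm

    print(confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 0], m=3))
    # -> [[1 1 0]
    #     [0 1 0]
    #     [1 0 1]]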
Classifiers often used in CV
• Decision tree classifiers
• Artificial neural nets, including convolutional neural nets
• Support vector machines (SVMs)
• The EM algorithm (as a clusterer)
Decision Trees
(Figure: a decision tree for character recognition. The root tests #holes; interior nodes test #strokes, moment of inertia (< t vs. ≥ t), and best axis direction (0, 60, 90); the leaves are the characters -, /, 1, x, w, 0, A, 8, B.)
Decision Tree Characteristics
1. Training: how do you construct one from training data? Entropy-based methods.
2. Strengths: easy to understand.
3. Weaknesses: overtraining (overfitting).
Entropy-Based Automatic Decision Tree Construction
Entropy
• Two-class problem: class I and class II.
• Suppose ½ the set belongs to class I and ½ belongs to class II. Then:
  entropy = -½ log2 ½ - ½ log2 ½
          = (-½)(-1) - (½)(-1)
          = 1
• In general, for class proportions p1, …, pm: entropy = -∑i pi log2 pi.
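The same computation in Python, written for any class distribution (the half-and-half case above is the first test):

    import math

    def entropy(probs):
        # H = -sum p_i log2 p_i, skipping zero-probability classes.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # -> 1.0  (the two-class example above)
    print(entropy([1.0, 0.0]))   # -> 0.0  (a pure set has no entropy)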
Information Content
• The information content I(C;F) of the class variable C with possible values {c1, c2, …, cm} with respect to the feature variable F with possible values {f1, f2, …, fd} is defined by:
  I(C;F) = H(C) - H(C|F)
  where H denotes entropy as defined above, and H(C|F) is the expected entropy of C once the value of F is known.
• Example training set:
  X  Y  Z | C
  1  1  1 | I
  1  1  0 | I
  0  0  1 | II
  1  0  0 | II
Example (cont)
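A sketch of working the table above in Python, using the I(C;F) = H(C) - H(C|F) form from the previous slide: compute H(C), subtract the expected entropy after each feature's split, and keep the feature with the largest value.

    import math
    from collections import Counter

    def H(values):
        # Entropy of a list of class labels.
        n = len(values)
        return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

    def info_content(feature, classes):
        # I(C;F) = H(C) - sum_j P(F = f_j) * H(C | F = f_j)
        n = len(classes)
        cond = 0.0
        for f in set(feature):
            subset = [c for x, c in zip(feature, classes) if x == f]
            cond += (len(subset) / n) * H(subset)
        return H(classes) - cond

    X, Y, Z = [1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 1, 0]
    C = ['I', 'I', 'II', 'II']
    for name, f in [('X', X), ('Y', Y), ('Z', Z)]:
        print(name, round(info_content(f, C), 3))
    # X ~ 0.311, Y = 1.0, Z = 0.0 -> split on Y first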
Using Information Content
• Start with the root of the decision tree and the whole training set.
• Choose the feature F with maximum information content I(C;F), partition the training set by the values of F, and recurse on each subset.
• Stop when the samples at a node all belong to one class (or no features remain); a sketch follows below.
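A sketch of the full recursive construction (ID3-style; the function names and data layout are mine): at each node pick the feature with maximum I(C;F), split, and recurse.

    import math
    from collections import Counter

    def H(labels):
        # Entropy of a list of class labels.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def build_tree(rows, labels, features):
        # rows: list of feature dicts; labels: class label per row.
        if len(set(labels)) == 1 or not features:
            return Counter(labels).most_common(1)[0][0]   # leaf: a class
        def gain(f):
            # I(C;F) for feature f on this node's training subset.
            n = len(rows)
            groups = [[l for r, l in zip(rows, labels) if r[f] == v]
                      for v in {r[f] for r in rows}]
            return H(labels) - sum((len(g) / n) * H(g) for g in groups)
        best = max(features, key=gain)
        tree = {}
        for v in {r[best] for r in rows}:
            sub_rows = [r for r in rows if r[best] == v]
            sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
            tree[(best, v)] = build_tree(sub_rows, sub_labels,
                                         [f for f in features if f != best])
        return tree

    rows = [{'X': 1, 'Y': 1, 'Z': 1}, {'X': 1, 'Y': 1, 'Z': 0},
            {'X': 0, 'Y': 0, 'Z': 1}, {'X': 1, 'Y': 0, 'Z': 0}]
    print(build_tree(rows, ['I', 'I', 'II', 'II'], ['X', 'Y', 'Z']))
    # -> {('Y', 1): 'I', ('Y', 0): 'II'}  (splits on Y only)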
Artificial Neural Nets
Artificial Neural Nets (ANNs) are networks of artificial neuron nodes, each of which computes a simple function.
(Figure: nodes arranged in layers, with inputs on the left feeding forward to outputs on the right.)
Node Functions
(Figure: neuron i receives inputs a1, …, an over weighted connections w(1,i), …, w(n,i) and produces a single output.)
output = g( ∑j aj * w(j,i) )
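A single node in Python (the sigmoid choice of g is an assumption; the slide only requires that g be some simple function):

    import numpy as np

    def neuron(a, w, g=lambda s: 1.0 / (1.0 + np.exp(-s))):
        # output = g( sum_j a_j * w(j, i) ), here with a sigmoid g.
        return g(np.dot(a, w))

    a = np.array([0.5, -1.0, 2.0])     # inputs a_1 .. a_n
    w = np.array([0.4, 0.1, -0.3])     # weights w(j, i) into neuron i
    print(neuron(a, w))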
Convolutional Neural Nets
• CNNs were invented in the 1990s, but they have returned and become very popular in computer vision in the last few years, because they have achieved higher accuracy than competing methods on several object recognition benchmark data sets.
• They are related to “deep learning”.
• They have multiple layers, some of which perform convolutions instead of having full connections.
Simple CNN
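A minimal sketch of such a network (PyTorch is my choice of framework, not the slides'): one convolutional layer with shared local weights, a pooling step, and a fully connected output layer.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleCNN(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            # Convolution: local, shared weights instead of full connections.
            self.conv = nn.Conv2d(1, 8, kernel_size=5)   # 28x28 -> 24x24
            self.fc = nn.Linear(8 * 12 * 12, n_classes)  # after 2x2 pooling

        def forward(self, x):
            x = F.max_pool2d(F.relu(self.conv(x)), 2)    # 24x24 -> 12x12
            return self.fc(x.flatten(1))                 # class scores

    scores = SimpleCNN()(torch.zeros(1, 1, 28, 28))
    print(scores.shape)   # torch.Size([1, 10])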
Support Vector Machines (SVM)
• A kernel ‘trick’: map the data into a higher-dimensional space in which a linear separator may exist.
Maximal Margin
(Figure: samples of two classes, labeled 0 and 1, separated by a hyperplane; the margin is the gap between the hyperplane and the closest samples of each class. The SVM selects the hyperplane that maximizes this margin.)
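A sketch with scikit-learn (my choice of library): a linear SVM fits the maximal-margin hyperplane, and the samples lying on the margin come back as the support vectors.

    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable toy classes (made up for illustration).
    X = np.array([[1, 1], [1, 2], [2, 1], [5, 5], [5, 6], [6, 5]])
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel='linear', C=1e6)   # large C ~ hard margin
    clf.fit(X, y)
    print(clf.support_vectors_)               # points defining the margin
    print(clf.predict([[2, 2], [5.5, 5.5]]))  # -> [0 1]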
Example from AI Text
Application
Kernel Function used in our 3D Computer Vision Work
• k(A,B) = exp(-θAB² / σ²)
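A sketch of a kernel of this form in Python, where I assume θAB is the angle between the feature vectors A and B (the slides do not say what θ measures):

    import numpy as np

    def kernel(A, B, sigma=1.0):
        # k(A, B) = exp(-theta_AB^2 / sigma^2), with theta_AB assumed
        # here to be the angle between the vectors A and B.
        cos = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
        theta = np.arccos(np.clip(cos, -1.0, 1.0))
        return np.exp(-theta**2 / sigma**2)

    print(kernel(np.array([1.0, 0.0]), np.array([1.0, 0.1])))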
We used SVMs for Insect Recognition
EM for Classification
• The EM algorithm was used as a clustering algorithm for image segmentation.
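A sketch of that use with scikit-learn's GaussianMixture (my choice of implementation), which fits a Gaussian mixture by EM and then labels each pixel's feature vector with its most likely component:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Toy "pixel features" (e.g., colors) drawn near two centers.
    rng = np.random.default_rng(0)
    pixels = np.vstack([rng.normal(0.2, 0.05, (100, 3)),
                        rng.normal(0.8, 0.05, (100, 3))])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)  # EM
    segments = gmm.predict(pixels)   # cluster label per pixel
    print(np.bincount(segments))     # roughly [100 100]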
Summary
• There are multiple kinds of classifiers developed in machine learning research.
• We can use, and have used, pretty much all of them in computer vision classification, detection, and recognition tasks.