An Introduction To Pattern Recognition - 2

This document provides an introduction to pattern recognition, including definitions of key concepts like patterns, classes, classification, and clustering. It discusses common pattern recognition applications and the main approaches and phases of pattern recognition problems. It also explores challenges like data variability, feature extraction, decision boundaries, overfitting and improving generalization.

Uploaded by

Michael Iyagha

Introduction to Pattern

Recognition
Chapter 1 (Duda et al.)

CS479/679 Pattern Recognition


Dr. George Bebis

1
What is a Pattern?
• An object or event.
• Represented by a vector x = [x1, x2, …, xD]^T of values corresponding to various features.

(Figures: biometric patterns; hand gesture patterns)

2
What is a Class?
• A collection of “similar” objects.

Female class Male class

3
Pattern Recognition/Classification
• Assign a pattern to one of several known
categories or classes.

Category “A”

Category “B”

4
Classification vs Clustering

Classification (known categories)
(Supervised classification)

Clustering (unknown categories)
(Unsupervised classification)

(Figures: patterns grouped into Category “A” and Category “B”)

5
Pattern Classification Examples
• Loan/credit card applications
– Income, # of dependents, mortgage amount → creditworthiness classification

• Dating services
– Age, hobbies, income → “desirability” classification

• Web documents
– Keyword-based descriptions (e.g., documents containing “football”, “NFL”) → document classification

6
Main Objectives
(1) Separate data belonging to different classes.

(2) Assign new data to the correct class.

Gender Classification

7
Main Approaches
x: input vector (pattern)
ωi: class label (class)

• Generative
– Models the joint probability p(x, ωi)
– Makes predictions by using Bayes’ rule to calculate P(ωi|x)
– Picks the most likely class label ωi

• Discriminative
– Does not model p(x, ωi)
– Estimates P(ωi|x) by “learning” a direct mapping from x to ωi (i.e., estimates the decision boundary)
– Picks the most likely class label ωi
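As a concrete sketch of the generative recipe, the toy example below models two classes with 1-D Gaussian class-conditional densities and applies Bayes’ rule. The feature value, Gaussian parameters, and priors are all invented for illustration and do not come from the slides.

```python
import math

# Hypothetical 1-D setup: two classes w1, w2 modeled generatively with
# Gaussian class-conditional densities p(x|wi) and priors P(wi).
# All numbers below are made up for illustration.

def gaussian_pdf(x, mean, std):
    # p(x|wi) for a 1-D Gaussian with the given mean and standard deviation
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posteriors(x, class_params, priors):
    # Bayes rule: P(wi|x) = p(x|wi) P(wi) / p(x), where p(x) sums over classes
    joint = [gaussian_pdf(x, m, s) * p for (m, s), p in zip(class_params, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

class_params = [(5.0, 1.0), (8.0, 1.5)]  # assumed (mean, std) for w1, w2
priors = [0.5, 0.5]                      # assumed P(w1), P(w2)

post = posteriors(6.0, class_params, priors)
label = max(range(len(post)), key=lambda i: post[i])  # pick the most likely class
```

A discriminative method would instead estimate P(ωi|x), or the decision boundary itself, directly from labeled examples, without ever modeling p(x, ωi).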
8
How do we model p(x, ωi)?
• Typically, using a statistical model.
– A probability density function (pdf), e.g., a Gaussian.

(Figure: male and female pdfs for gender classification)

9
Applications

10
Handwriting Recognition

11
License Plate Recognition

12
Biometric Recognition

13
Face Detection/Recognition

Detection

Matching

Recognition

Face Gallery

14
Fingerprint Classification
Important step for speeding up identification

15
Autonomous Systems
Obstacle detection and avoidance
Object recognition

16
Medical Applications
Skin Cancer Detection Breast Cancer Detection

17
Wildfire Monitoring
(using aerial or satellite images)

18
Other Applications

• Recommendation systems
– e.g., Amazon, Netflix
• Spam filters
• Malicious website detection
• Loan/Credit Card Applications

19
Main PR Phases
(Figure: training phase and testing phase)

20
Complexity of PR – An Example
camera

Problem: Sort
incoming fish on a
conveyor belt.

Assumption: Two kinds of fish:
(1) sea bass
(2) salmon

21
Sensors
• Sensing
– Use some kind of sensor (e.g., camera, weight scale) for data capture.
– The PR system’s overall performance depends on the bandwidth, resolution, sensitivity, and distortion of the sensor being used.

22
Preprocessing
A critical step for reliable feature extraction!

Examples:

• Noise removal

• Image enhancement

• Separate touching or occluding fish

• Extract the boundary of each fish

23
Training/Test data
• How do we know that we have collected a
sufficiently large and representative set of
examples for training and testing the
system?
Training Set ?

Test Set ?

24
Data Variability
• Intra-class variability

The letter “T” in different typefaces

• Inter-class variability

Letters/Numbers that look similar

• Need to collect a large number of examples, but also a good set of features!
25
Feature Extraction
• How should we choose a good set of features?
– Discriminative features

– Invariant features (e.g., invariant to geometric transformations such as translation, rotation, and scale)

• Are there ways to automatically learn which features are best?
26
Feature Extraction
• Assume that sea bass is
generally longer than salmon.
• Use length as a feature.
• Decide between sea bass and
salmon by applying a threshold
on length.

• How should we choose the threshold?
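The length-threshold rule above amounts to a one-line classifier. A minimal sketch; the threshold value and units are made up, and choosing the threshold well is exactly the open question:

```python
# Toy single-feature classifier for the fish example: threshold on length.
# The threshold (12.0, in arbitrary units) is an invented value for
# illustration; in practice it would be chosen from training data,
# e.g., to minimize classification error.
def classify_by_length(length, threshold=12.0):
    return "sea bass" if length > threshold else "salmon"
```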

27
Feature Extraction (cont’d)
Histogram of “length”

threshold l*

• Even though sea bass is longer than salmon on average, this is not always the case (i.e., the distributions overlap).

28
Feature Extraction (cont’d)
• Maybe consider a different feature: lightness
Histogram of “lightness”

threshold x*

• It seems easier to choose a threshold, but we still cannot make a perfect decision.

29
Multiple Features
• To improve classification accuracy, we might need to use more than one feature.
– Single features might not yield good performance.
– Combinations of features might yield better performance.

x = [x1, x2]^T, where x1: lightness and x2: width

30
Multiple Features (cont’d)
• Does adding more features always help?
– It might be difficult and computationally
expensive to extract more features.
– Correlated features might not improve
performance (i.e., redundancy).
– Adding too many features can, paradoxically,
lead to a worsening of performance (i.e.,
“curse” of dimensionality).

31
Curse of Dimensionality
• The amount of training data required grows exponentially with the number of features.
– Divide each of the input features into M intervals/cells, so that the value of a feature can be specified approximately by saying in which interval it lies.

– The total number of cells will be M^D (D: # of features).

– Assuming uniform sampling, each cell should contain at least one data point, i.e., the number of training examples grows exponentially with D.
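The cell-counting argument is easy to check directly. This tiny sketch just evaluates M^D for a few (arbitrarily chosen) values:

```python
# Each of D features is divided into M intervals, giving M**D cells in total.
# If each cell must contain at least one training point, the required number
# of examples grows exponentially with the number of features D.
def cells_needed(M, D):
    return M ** D

# With M = 10 intervals per feature:
#   D = 1  ->             10 cells
#   D = 3  ->          1,000 cells
#   D = 10 -> 10,000,000,000 cells
```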
32
Missing Features
• Certain features might be missing (e.g., due to
occlusion).

• How should we train the classifier with missing features?

• How should the classifier make the best decision with missing features?

33
Decision Boundary
• A decision boundary is typically found by
minimizing an error function (e.g., classification
error) using a set of training data.

How should we find an optimal decision boundary?


34
Decision Boundary (cont’d)
• In general, we can get perfect classification results on the training set by choosing a complex model (more parameters) instead of a simpler model (fewer parameters).
• Should we prefer a simpler model or a complex one?

35
(Figures: simpler model vs. complex model)
Overfitting
• Complex models are tuned to the training data rather than to the characteristics of the true model (i.e., memorization or overfitting).
• Overfitting our data implies poor generalization!

36
(Figures: simpler model vs. complex model)
Generalization
• Generalization is defined as the ability of a classifier to
produce correct results on novel patterns (i.e., not in the
training set).
• How could generalization performance be improved?
– More training data (i.e., better model estimation).
– Simpler models (i.e., less model parameters).

(Figures: simpler model vs. complex model)

37
Understanding model complexity:
function approximation
• Approximate a function from a set of samples:
o Green curve is the true function
o 10 sample points are shown by the blue circles
(assuming some “noise”)

38
Understanding model complexity:
function approximation (cont’d)
Polynomial curve fitting: polynomials having various
orders (i.e., complexity/parameters), shown as red
curves, fitted to the set of 10 sample points.

overfitting

39
Understanding model complexity:
function approximation (cont’d)
• Using more data can improve model estimation!

Polynomial curve fitting: 9th-order polynomials fitted to 15 and 100 sample points.
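The polynomial curve-fitting illustration is easy to reproduce. The sketch below assumes the usual setup of noisy samples from a sine curve; sample sizes, noise level, and random seed are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_samples(n):
    # Noisy samples of an assumed true function, sin(2*pi*x)
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

# A 9th-order polynomial has 10 coefficients, so on 10 points it can
# (nearly) interpolate the training data -- tiny training error, but it
# oscillates wildly between the points: overfitting.
x10, y10 = make_samples(10)
coeffs10 = np.polyfit(x10, y10, 9)
train_err10 = np.mean((np.polyval(coeffs10, x10) - y10) ** 2)

# The same 9th-order model fitted to 100 points is much better constrained
# and tracks the underlying curve instead of the noise.
x100, y100 = make_samples(100)
coeffs100 = np.polyfit(x100, y100, 9)
```

(np.polyfit may emit a RankWarning for the 10-point fit; that is itself a symptom of the model being barely constrained by the data.)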

40
Cost of misclassifications
• There are two possible classification errors in
the fish classification example:
(1) Deciding the fish was a sea bass when it was a
salmon.
(2) Deciding the fish was a salmon when it was a sea
bass.

• Are both errors equally important ?

41
Cost of misclassifications (cont’d)

• Let us assume that:
– Customers who buy salmon will object vigorously if
they see sea bass in their cans.
– Customers who buy sea bass will not be unhappy if
they occasionally see some expensive salmon in
their cans.

• How does this knowledge affect our decision?
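One standard way to encode this asymmetry (sketched here with invented numbers) is a loss matrix: decide the label that minimizes expected loss rather than the one with the highest posterior.

```python
# loss[(decision, truth)]: canning sea bass as "salmon" upsets customers,
# so it costs much more than the reverse error. All numbers are made up.
loss = {
    ("salmon", "salmon"): 0.0,   ("salmon", "sea bass"): 10.0,
    ("sea bass", "salmon"): 1.0, ("sea bass", "sea bass"): 0.0,
}
posterior = {"salmon": 0.6, "sea bass": 0.4}  # assumed P(class | x)

def expected_loss(decision):
    # Expected loss of a decision, averaged over the possible true classes
    return sum(loss[(decision, truth)] * p for truth, p in posterior.items())

# Even though "salmon" is the more probable class, the high cost of
# mislabeling sea bass pushes the minimum-expected-loss decision the other way.
best = min(["salmon", "sea bass"], key=expected_loss)
```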

42
Improve Classification Performance
using Ensembles of Classifiers

• Performance can be
improved using a
"pool" of classifiers.

• How should we build and combine different classifiers?
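One very simple way to combine a pool of classifiers (one of many possible schemes) is majority voting; the sketch below assumes each classifier simply returns a label.

```python
from collections import Counter

# Combine the decisions of several classifiers by majority vote.
def majority_vote(predictions):
    # Counter.most_common(1) returns the single most frequent label
    return Counter(predictions).most_common(1)[0][0]

# e.g., three hypothetical classifiers disagree on one fish:
votes = ["salmon", "sea bass", "salmon"]
decision = majority_vote(votes)
```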

43
Improve Classification Performance
through Post-processing

• Consider the problem of character recognition.

How m ch info mation are


y u mi sing?

• Exploit context to improve classification accuracy!

44
Computational Complexity
• How does an algorithm scale with:
• Number of features
• Number of training data
• Number of classes

• Need to consider tradeoffs between computational complexity and performance.

45
Would it be possible to build a
“general purpose” PR system?

• It is very difficult to design a system capable of performing a variety of classification tasks.
– Different problems require different features.
– Different features might yield different solutions.
– Different tradeoffs exist for different problems.

46
