

Classification techniques
for
Hand-Written Digit Recognition

Venkat Raghavan N. S., Saneej B. C., and Karteek Popuri

Department of Chemical and Materials Engineering


University of Alberta, Canada.

Thursday, June 1, 2006 CPC group Seminar


Introduction

Objective: to recognise images of handwritten digits using classification methods for multivariate data.

• Optical Character Recognition (OCR): predict the label of each image using the classification function learned from training data.
• OCR is essentially a classification task on multivariate data:
  - Pixel values → variables
  - Each type of character → a class


Handwritten Digit data

• 16 x 16 (= 256 pixel) grey-scale images of the digits 0-9, with pixel values x_ij
  - Each image is a vector X_i = [x_i1, x_i2, ..., x_i256]
  - Each label y_i ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
• 9298 labelled samples
  - Training set: ~1000 images
  - Test set: the remaining images
  - Randomly selected from the full database
• Basic idea: correctly identify the digit given an image

[Figure: a sample 16 x 16 digit image.]


Dimension reduction - PCA

• PCA was done on the mean-centred images
• The eigenvectors of the 256 x 256 covariance matrix Σ are called the eigendigits (256-dimensional)
• The larger an eigenvalue, the more important is that eigendigit
• The ith PC of an image X is y_i = e_i' X

[Figures: the average digit image, and the leading eigendigits shown as 16 x 16 images.]
PCA (continued)

• Based on the eigenvalues, the first 64 PCs were found to be significant
• Variance captured: ~92.74%
• Any image can be represented by its PCs: Y = [y_1 y_2 ... y_64]
• Reduced data matrix with 64 variables: Y is a 1000 x 64 matrix
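The PCA pipeline above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: `images` is a random stand-in for the real digit matrix, and the 64-PC cut-off follows the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real 16x16 digit data: n samples of 256 pixel values.
images = rng.random((100, 256))

# 1. Mean-centre the images (subtract the "average digit").
mean_digit = images.mean(axis=0)
centred = images - mean_digit

# 2. Eigendecomposition of the 256x256 covariance matrix Sigma.
sigma = np.cov(centred, rowvar=False)          # 256 x 256
eigvals, eigvecs = np.linalg.eigh(sigma)       # ascending order
order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. Keep the first 64 eigendigits and project: y_i = e_i' x.
k = 64
Y = centred @ eigvecs[:, :k]                   # n x 64 reduced data matrix

# Fraction of total variance captured by the first k PCs.
captured = eigvals[:k].sum() / eigvals.sum()
```

On the real data the slide reports that these 64 components capture about 92.74% of the variance.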
Interpreting the PCs as Image Features

• The eigenvectors are a rotation of the original axes to more meaningful directions. The PCs are the projections of the data onto these new axes.
• Image reconstruction:
  - The original image can be reconstructed by projecting the PCs back onto the old axes.
  - Using only the most significant PCs gives a reconstructed image that is close to the original.
  - These features can be used for further investigations, e.g. classification!
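The reconstruction idea can be sketched as follows (an illustrative numpy example on synthetic stand-in data, not the authors' code): projecting onto all 256 eigendigits recovers the image exactly, while the leading 64 give only an approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
images = rng.random((300, 256))                # stand-in for 16x16 digits
mean_digit = images.mean(axis=0)
centred = images - mean_digit

# Eigendigits from the covariance matrix, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

def reconstruct(x, k):
    """Project an image onto the first k eigendigits and back onto the old axes."""
    E = eigvecs[:, :k]
    return mean_digit + E @ (E.T @ (x - mean_digit))

x = images[0]
# All 256 PCs reconstruct the image exactly; 64 PCs only approximately.
err_64  = np.linalg.norm(x - reconstruct(x, 64))
err_256 = np.linalg.norm(x - reconstruct(x, 256))
```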


Normality test on PCs

[Figure: Q-Q plots of sample data versus the standard normal for principal components 1, 3, 5, 10, 20, 30, 40, 50 and 60.]
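The Q-Q comparison behind these plots can be sketched with the standard library alone. This is an illustrative example on a synthetic sample standing in for one PC's scores, not the authors' code: empirical quantiles of the standardised sample are paired with standard-normal quantiles.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(42)
nd = NormalDist()

# Stand-in sample playing the role of one principal component's scores.
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
n = len(sample)

# Empirical quantiles: the sorted, standardised sample values.
m, s = mean(sample), stdev(sample)
empirical = sorted((x - m) / s for x in sample)

# Theoretical quantiles of the standard normal at positions (i + 0.5) / n.
theoretical = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]

# For normally distributed data the Q-Q points fall near the line y = x.
max_dev = max(abs(e - t) for e, t in zip(empirical, theoretical))
```

Plotting `empirical` against `theoretical` reproduces one panel of the figure.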


Classification

• Principal components are used as the features of the images
• LDA assumes multivariate normality of the feature groups and a common covariance matrix
• The Fisher discriminant procedure assumes only a common covariance matrix


Classification (contd.)

• Equal cost of misclassification is assumed
• Misclassification error rates:
  - APER, based on the training data
  - AER, on the validation data
  (both averaged over several random samplings of training and validation data from the full data set)
• Error rates using different numbers of PCs were compared


Performing LDA

• The prior probability of each class was taken as the frequency of that class in the data
• Equality of the covariance matrices:
  - A strong assumption
  - Error rates were used to check the validity of the assumption
  - S_pooled was used as the common covariance matrix


LDA Results

• APER

  No of PCs | 256 | 150 | 64
  APER %    | 1.8 | 4   | 6.4

• AER

  No of PCs | 256   | 150   | 64
  AER %     | 13.63 | 10.91 | 10.087

• APER underestimates the AER
• Using 64 PCs is better than using 150/256 PCs!
  - The PCs with lower eigenvalues tend to capture the noise in the data.


Fisher Discriminants

• Uses equal prior probabilities and equal covariances
• The number of discriminants r can be at most 9 (r <= 9)
• When all discriminants are used (r = 9), Fisher is equivalent to LDA (verified by the error rates)
• Error rates with different r were compared
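A minimal numpy sketch of the Fisher discriminant construction, on toy three-class data rather than the digit PCs (all names illustrative): the discriminant directions are the leading eigenvectors of Sw^-1 Sb, and at most g - 1 of them carry information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-class data standing in for the PC features of three digit classes.
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 50)

classes = np.unique(y)
grand_mean = X.mean(axis=0)

# Within-class and between-class scatter matrices.
Sw = sum(sum(np.outer(d, d) for d in X[y == c] - X[y == c].mean(axis=0))
         for c in classes)
Sb = sum(np.sum(y == c) * np.outer(X[y == c].mean(axis=0) - grand_mean,
                                   X[y == c].mean(axis=0) - grand_mean)
         for c in classes)

# Fisher discriminants: leading eigenvectors of Sw^-1 Sb.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
order = np.argsort(eigvals.real)[::-1]
r = 2                                  # at most g - 1 = 2 discriminants here
W = eigvecs.real[:, order[:r]]

Z = X @ W                              # data in discriminant coordinates
```

With 10 digit classes the same construction yields at most 9 discriminants, matching the slide's bound r <= 9.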


Fisher Discriminant Results

r = 2 discriminants

  No of PCs | 256 | 150  | 64
  APER %    | 32  | 34.5 | 37.4
  AER %     | 45  | 42   | 40

• Both the AER and the APER are very high


Fisher Discriminant Results

r = 7 discriminants

  No of PCs | 256  | 150  | 64
  APER %    | 3.2  | 4.8  | 7.9
  AER %     | 14.1 | 12.4 | 10.8

• Considerable improvement in AER and APER
• Performance is close to LDA
• Using 64 PCs is better
Fisher Discriminant Results

r = 9 (all) discriminants

  No of PCs | 256   | 150   | 64
  APER %    | 1.6   | 4.3   | 6.4
  AER %     | 13.21 | 10.55 | 9.86

• No significant performance gain over r = 7
• Error rates are ~ those of LDA (as expected!)
Nearest Neighbour Classifier

• Finds the nearest neighbour in the training set to the test image and assigns its label to the test image
• No assumption about the distribution of the data
• Euclidean distance is used to find the nearest neighbour

[Figure: a test point between two clusters is assigned to class 2, the class of its nearest neighbour.]


K-Nearest Neighbour Classifier (KNN)

• Compute the k nearest neighbours and assign the class by majority vote

[Figure: with k = 3, a test point is assigned to class 1 (2 votes) over class 2 (1 vote).]
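The voting rule can be sketched as follows (illustrative toy data, not the digit features):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Tiny illustration: two clusters standing in for two digit classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

label = knn_predict(X_train, y_train, np.array([0.3, 0.3]), k=3)
```

With k = 1 this reduces to the plain nearest-neighbour classifier of the previous slide.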


1-NN Classification Results

  No of PCs | 256  | 150  | 64
  AER %     | 7.09 | 7.01 | 6.45

• Test error rates improve compared to LDA and Fisher
• Using 64 PCs gives better results
• Using higher k does not improve the recognition rate


Misclassification in NN

Confusion matrix (rows: actual digit, columns: recognised as):

      0     1    2    3    4    5    6    7    8    9
0  1376     0    4    2    0    5   12    2    0    0
1     0  1113    1    0    1    0    2    0    2    0
2    22     9  728   17    4    4    6   16   18    2
3     4     0    4  690    2   26    0    4    6    3
4     3    15    9    0  687    0    7    2    4   32
5     9     3   12   37    5  517   32    0   23    9
6    10     3    5    0    3    2  714    0    3    2
7     0     6    1    0   19    0    0  657    1   20
8     8    11    1   26    7    7    8    5  547   13
9     6     1    2    0   23    0    0   32    0  664

• Euclidean distances between transformed images of the same class can be very high
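A confusion matrix of this form can be tallied as follows (a minimal sketch with made-up labels, not the slide's actual predictions):

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes=10):
    """cm[i, j] = number of samples of actual class i recognised as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

# Made-up labels: one 0 recognised as 6, one 9 as 4, one 4 as 9.
actual    = np.array([0, 0, 1, 9, 9, 4])
predicted = np.array([0, 6, 1, 9, 4, 9])
cm = confusion_matrix(actual, predicted)

# Off-diagonal entries are misclassifications; the error rate is their share.
error_rate = 1.0 - np.trace(cm) / cm.sum()
```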
Issues in NN

• Expensive: to determine the nearest neighbour of a test image, the distance to all N training examples must be computed
• Storage requirements: all training data must be stored


Euclidean-NN method inefficient

• Storing all possible instances (positions, sizes, angles, thicknesses, writing styles, ...) is impractical


Euclidean distance metric fails

[Figure: a pattern to be classified, shown next to prototype A and prototype B.]

• Prototype B seems more similar than prototype A according to Euclidean distance
• The digit "9" is misclassified as "4"
• A possible solution is to use a distance metric invariant to irrelevant transformations
Effect of a Transformation

[Figure: in pixel space, a transformation s(X, α) traces a curve through the image X; for small α it is approximated by X + α·T along the tangent direction T.]

The set of all transformed versions of an image X is

  S_X = { y | there exists α for which y = s(X, α) }


Tangent Distance

[Figure: two manifolds S_P and S_E in pixel space. The Euclidean distance is measured between the points P and E; the tangent distance is the distance between S_P and S_E.]


Images in tangent plane

[Figure: for each transformation (rotation, scaling, thickness, X translation, diagonal deformation, axis deformation, Y translation), the images P + α·T for α = -2, -1, 0, 1, 2.]


Implementation

• The vectors tangent to the manifold S_X form the hyperplane T_X tangent to S_X
• The tangent distance D(E, P) is found by minimising the distance between T_E and T_P
• The images are smoothed with a Gaussian of σ = 1
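The smoothing step might be implemented as a separable Gaussian convolution. The slides give only σ = 1, so this numpy sketch is an assumption about the implementation; it blurs an impulse image to show the effect:

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=3):
    """1-D Gaussian kernel, normalised to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(img, sigma=1.0):
    """Separable Gaussian smoothing: convolve each row, then each column."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

# A single bright pixel spreads into a small Gaussian blob.
img = np.zeros((16, 16))
img[8, 8] = 1.0
blurred = smooth(img, sigma=1.0)
```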


Implementation (contd.)

The equations of T_P and T_E are given by

  E'(α_E) = E + L_E α_E        P'(α_P) = P + L_P α_P

where

  L_X = [ ∂s(X, α)/∂α_1 , ... , ∂s(X, α)/∂α_m ]  evaluated at α = 0

and the tangent distance is

  D(E, P) = min over α_E, α_P of || E'(α_E) - P'(α_P) ||²


Implementation (contd.)

Setting the derivatives to zero:

  ∂D(E, P)/∂α_E = 2 (E'(α_E) - P'(α_P))' L_E = 0

  ∂D(E, P)/∂α_P = 2 (P'(α_P) - E'(α_E))' L_P = 0

Solving for α_P and α_E, we can calculate D(E, P), the tangent distance between the two patterns E and P.
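A minimal sketch of the tangent-distance computation. The real implementation uses the seven transformations listed earlier and Gaussian-smoothed images; this illustration uses a single horizontal-translation tangent built by finite differences, and solves the zero-gradient equations as one least-squares problem. Because P here is exactly a one-pixel shift of E, its tangent distance is numerically zero while its Euclidean distance is not.

```python
import numpy as np

def x_translation_tangent(img):
    """Finite-difference tangent vector for a one-pixel horizontal shift."""
    shifted = np.roll(img.reshape(16, 16), 1, axis=1)
    return shifted.reshape(-1) - img

def tangent_distance(E, P):
    """Two-sided tangent distance with a single transformation parameter."""
    LE = x_translation_tangent(E)[:, None]   # columns = tangent vectors
    LP = x_translation_tangent(P)[:, None]
    # Minimise ||(E + LE aE) - (P + LP aP)||^2: a least-squares problem in
    # the stacked parameters [aE; aP], equivalent to the zero-gradient system.
    L = np.hstack([LE, -LP])
    a, *_ = np.linalg.lstsq(L, P - E, rcond=None)
    residual = (E - P) + L @ a
    return np.linalg.norm(residual)

rng = np.random.default_rng(0)
E = rng.random(256)
P = np.roll(E.reshape(16, 16), 1, axis=1).reshape(-1)  # E shifted by 1 pixel

d_euclid  = np.linalg.norm(E - P)
d_tangent = tangent_distance(E, P)   # <= the Euclidean distance by construction
```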


Tangent Distance method Results

• USPS data set: 1000 training examples and 7000 test examples
• The misclassification error rate using 3-NN is 3.26%
• The time taken is 9967.94 s


References

• T. Hastie, R. Tibshirani and J. Friedman, "The Elements of Statistical Learning: Data Mining, Inference and Prediction".
• R. A. Johnson and D. W. Wichern, "Applied Multivariate Statistical Analysis".
• http://www.robots.ox.ac.uk/~dclaus/
• P. Y. Simard and Y. A. Le Cun, "Transformation Invariance in Pattern Recognition: Tangent Distance and Tangent Propagation".
