Statistical Methods in Artificial Intelligence CSE471 - Monsoon 2015: Lecture 02
Avinash Sharma
CVIT, IIIT Hyderabad
Course Content
Introduction
Linear Classification
Neural Networks
Probability Densities
Bayesian Classifiers
Dimensionality Reduction
Support Vector Machines
Kernel Methods
Clustering Techniques
Decision Tree/Graphical Models
Reference Material
Books
Pattern Classification by Duda, Hart & Stork
The Elements of Statistical Learning by Hastie, Tibshirani and Friedman
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
Prerequisites
Basics of Linear Algebra, Probability Theory and Statistics.
Programming in Matlab and C/C++.
Course Website
http://courses.iiit.ac.in
Expected Outcome
This course will enable students to understand pattern recognition techniques in detail.
We will ensure that both theoretical and practical aspects are learnt together.
The project deliverables are expected to be working systems tied to a practical application.
NN Classifier
A lazy learning approach in which each data sample is represented as a point in Euclidean space.
Each new test sample is assigned the label of the closest training sample under the Euclidean distance metric.
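A minimal C++ sketch of the 1-NN rule, assuming dense `double` feature vectors and integer class labels. The `Sample` struct and the function names are illustrative, not taken from the course material. It compares squared distances, which select the same neighbor as the Euclidean distance because the square root is monotonic.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One training example: a feature vector and its class label (hypothetical layout).
struct Sample {
    std::vector<double> x;
    int label;
};

// Squared Euclidean distance; the square root is skipped because it does
// not change which training point is closest.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// 1-NN: return the label of the closest training sample.
// "Training" is only storing the samples (lazy learning).
int classify1NN(const std::vector<Sample>& train, const std::vector<double>& query) {
    assert(!train.empty());
    double best = std::numeric_limits<double>::infinity();
    int bestLabel = train.front().label;
    for (const Sample& s : train) {
        const double d = squaredDistance(s.x, query);
        if (d < best) {
            best = d;
            bestLabel = s.label;
        }
    }
    return bestLabel;
}
```

Every query scans the full training set, which is why the cost is paid at classification time rather than at training time.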
[Figure: three test samples (Test #1, Test #2, Test #3) among labelled training points]
Practical Aspects
K should be chosen empirically, and preferably odd to avoid ties (in two-class problems).
KNN can handle both discrete-valued and continuous-valued target functions.
Weighted contributions from different neighbors can be used to compute the final label.
Distance-based weighting can be used to give higher importance to closer data points.
Performance of NN classification typically degrades when the data are high-dimensional.
This can be mitigated by assigning feature weights inside the Euclidean distance computation.
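The weighting ideas above can be combined in one C++ sketch. It assumes per-feature weights w_i inside the squared Euclidean distance and neighbor votes weighted by 1/d²; inverse squared distance is one common choice, not the only one, and all names here are hypothetical. For continuous-valued targets the vote would be replaced by a weighted average of the neighbors' target values (not shown).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Sample {
    std::vector<double> x;
    int label;
};

// Feature-weighted squared Euclidean distance: sum_i w_i * (a_i - b_i)^2.
// w_i = 0 ignores feature i; w_i = 1 for all i gives the plain metric.
double weightedSqDist(const std::vector<double>& a, const std::vector<double>& b,
                      const std::vector<double>& w) {
    assert(a.size() == b.size() && a.size() == w.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += w[i] * d * d;
    }
    return sum;
}

// K-NN with distance-weighted voting: each of the k nearest neighbors votes
// for its label with weight 1 / (d^2 + eps), so closer points count more.
int classifyKNN(const std::vector<Sample>& train, const std::vector<double>& query,
                std::size_t k, const std::vector<double>& featureWeights) {
    assert(k >= 1 && k <= train.size());
    // (squared distance, training index) for every sample.
    std::vector<std::pair<double, std::size_t>> dist;
    dist.reserve(train.size());
    for (std::size_t i = 0; i < train.size(); ++i)
        dist.emplace_back(weightedSqDist(train[i].x, query, featureWeights), i);
    // Move the k smallest distances to the front, in sorted order.
    std::partial_sort(dist.begin(), dist.begin() + static_cast<std::ptrdiff_t>(k), dist.end());

    const double eps = 1e-12;  // avoids division by zero for exact matches
    std::map<int, double> votes;
    for (std::size_t j = 0; j < k; ++j)
        votes[train[dist[j].second].label] += 1.0 / (dist[j].first + eps);

    // Label with the largest accumulated vote.
    return std::max_element(votes.begin(), votes.end(),
                            [](const auto& a, const auto& b) { return a.second < b.second; })
        ->first;
}
```

With all feature weights equal to 1 and unweighted votes this reduces to plain majority-vote KNN. Distance weighting can change the outcome: two very close neighbors of one class can outvote three distant neighbors of another even when k = 5.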
Effect of K on decision boundaries
[Figure: decision boundaries for 1 NN, 5 NN and 20 NN]
KNN Classifier
Advantages:
Can learn complex target functions
Training is very fast
No loss of information (all training samples are retained)
Disadvantages:
Classification cost for new instances can be very high
Most of the computation takes place at classification time
Classifier Evaluation
Cross-validation is an important means of evaluating classifiers. Common cross-validation techniques are:
Random Subsampling
K-fold Cross-Validation
Leave-one-out Cross-Validation
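The fold bookkeeping behind K-fold cross-validation can be sketched in C++ as follows. It assumes samples are addressed by index and that a caller-supplied callback (hypothetical here) trains on one index set and returns how many samples of another index set it labels correctly. Leave-one-out is the special case K = N; random subsampling would instead draw a fresh random train/test split in each round.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Split indices 0..n-1 into k folds of nearly equal size after a seeded shuffle.
// In round i, fold i is the test set and the other k-1 folds form the training set.
std::vector<std::vector<std::size_t>> makeFolds(std::size_t n, std::size_t k, unsigned seed) {
    assert(k >= 2 && k <= n);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);

    std::vector<std::vector<std::size_t>> folds(k);
    for (std::size_t i = 0; i < n; ++i)
        folds[i % k].push_back(idx[i]);  // round-robin keeps fold sizes within 1
    return folds;
}

// Mean per-fold accuracy over k rounds. predictCorrect(trainIdx, testIdx) is a
// hypothetical callback: train on trainIdx, return the number of correct labels on testIdx.
template <typename Eval>
double crossValidate(std::size_t n, std::size_t k, unsigned seed, Eval predictCorrect) {
    const auto folds = makeFolds(n, k, seed);
    double total = 0.0;
    for (std::size_t f = 0; f < k; ++f) {
        std::vector<std::size_t> trainIdx;
        for (std::size_t g = 0; g < k; ++g)
            if (g != f) trainIdx.insert(trainIdx.end(), folds[g].begin(), folds[g].end());
        total += static_cast<double>(predictCorrect(trainIdx, folds[f])) / folds[f].size();
    }
    return total / k;
}
```

The permutation produced by `std::shuffle` is implementation-defined, so the same seed may give different folds with different standard libraries; only the partition property (every index in exactly one fold) is guaranteed.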
Assignment
Iris dataset or another standard dataset
KNN classifier
Matlab/C++ pipeline
K-fold cross-validation
A 2-3 page report on the setup and experimental evaluation
Equation of a Plane
Vector Operations
Transpose
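A small C++ sketch of the vector operations that the linear discriminant and perceptron slides rely on: transpose, inner product xᵀy, and Euclidean norm. Storing matrices as nested `std::vector` in row-major order is an assumption made for brevity; the names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: m[r][c]

// Transpose: (A^T)[c][r] = A[r][c]. A column vector stored as an n x 1
// matrix becomes a 1 x n row vector.
Mat transpose(const Mat& a) {
    if (a.empty()) return {};
    Mat t(a[0].size(), Vec(a.size()));
    for (std::size_t r = 0; r < a.size(); ++r) {
        assert(a[r].size() == a[0].size());  // all rows must have equal length
        for (std::size_t c = 0; c < a[r].size(); ++c) t[c][r] = a[r][c];
    }
    return t;
}

// Inner product x^T y = sum_i x_i * y_i.
double dot(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// Euclidean norm ||x|| = sqrt(x^T x).
double euclideanNorm(const Vec& x) { return std::sqrt(dot(x, x)); }
```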
Equation of a Plane
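In one standard notation (the symbols w and w_0 are assumed here), a plane, or hyperplane in d dimensions, is the set of points satisfying a single linear equation. The weight vector w is normal to it, and the signed distance of a point from it follows by dividing by the norm of w:

```latex
\[
  \mathcal{H} = \{\, \mathbf{x} \in \mathbb{R}^d : \mathbf{w}^{\top}\mathbf{x} + w_0 = 0 \,\}
\]
% w is normal to H: for any x_1, x_2 on H, w^T (x_1 - x_2) = 0.
\[
  r(\mathbf{x}) = \frac{\mathbf{w}^{\top}\mathbf{x} + w_0}{\lVert \mathbf{w} \rVert},
  \qquad
  r(\mathbf{0}) = \frac{w_0}{\lVert \mathbf{w} \rVert}
\]
```

For example, with d = 2, w = (1, 1)ᵀ and w_0 = -1 the plane is the line x_1 + x_2 = 1, and the point (1, 1) lies on its positive side at distance r = (2 - 1)/√2 = 1/√2.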
Linear Discriminant Functions
Assumes a 2-class classification setup
[Figure: samples from Class A and Class B]
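For the two-class setup, a linear discriminant can be sketched as g(x) = wᵀx + w_0, with the decision boundary at g(x) = 0. Assigning class A to the positive side, and sending points exactly on the boundary to class B, are assumptions of this C++ sketch rather than rules stated on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Label { A, B };

// Linear discriminant g(x) = w^T x + w0.
double discriminant(const std::vector<double>& w, double w0, const std::vector<double>& x) {
    assert(w.size() == x.size());
    double g = w0;
    for (std::size_t i = 0; i < w.size(); ++i) g += w[i] * x[i];
    return g;
}

// Two-class rule (assumed convention): g(x) > 0 -> class A, otherwise class B.
// Points exactly on the boundary (g == 0) go to B here.
Label decide(const std::vector<double>& w, double w0, const std::vector<double>& x) {
    return discriminant(w, w0, x) > 0.0 ? Label::A : Label::B;
}
```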
The perceptron
Perceptron Decision Boundary
Perceptron Summary
[Figure: the decision surface, with its positive side and negative side]
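A minimal C++ sketch of perceptron training, assuming labels y ∈ {+1, -1} with +1 on the positive side of the surface, a learning rate eta, and the fixed-increment single-sample update w ← w + eta·y·x, w_0 ← w_0 + eta·y applied whenever y·g(x) ≤ 0. This is the textbook rule, not necessarily the exact variant used in the lecture; the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Perceptron {
    std::vector<double> w;  // weight vector, normal to the decision surface
    double w0 = 0.0;        // bias term

    // g(x) = w^T x + w0; its sign tells which side of the surface x lies on.
    double g(const std::vector<double>& x) const {
        assert(x.size() == w.size());
        double s = w0;
        for (std::size_t i = 0; i < w.size(); ++i) s += w[i] * x[i];
        return s;
    }
    int predict(const std::vector<double>& x) const { return g(x) > 0.0 ? 1 : -1; }
};

// Labels y[i] must be +1 or -1.
Perceptron trainPerceptron(const std::vector<std::vector<double>>& X,
                           const std::vector<int>& y, double eta = 1.0,
                           std::size_t maxEpochs = 1000) {
    assert(!X.empty() && X.size() == y.size());
    Perceptron p;
    p.w.assign(X[0].size(), 0.0);
    for (std::size_t epoch = 0; epoch < maxEpochs; ++epoch) {
        bool mistake = false;
        for (std::size_t i = 0; i < X.size(); ++i) {
            // Misclassified or on the surface: y_i * g(x_i) <= 0.
            if (y[i] * p.g(X[i]) <= 0.0) {
                for (std::size_t j = 0; j < p.w.size(); ++j) p.w[j] += eta * y[i] * X[i][j];
                p.w0 += eta * y[i];
                mistake = true;
            }
        }
        if (!mistake) break;  // every sample is on its correct side
    }
    return p;
}
```

Training stops after the first epoch with no mistakes. For linearly separable data this happens after finitely many updates; for non-separable data the loop only ends at `maxEpochs`.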