
MAT3601

Introduction to data analysis

Supervised classification
Bayes classifier and discriminant analysis

1/39
Machine Learning

2/39
Outline

Introduction to supervised learning

Bayes and Plug-in classifiers

Naive Bayes

Discriminant analysis (linear and quadratic)

3/39
Supervised Learning

Supervised Learning Framework


+ Input measurement X ∈ 𝒳 (often 𝒳 ⊂ R^d), output measurement Y ∈ 𝒴.
+ The joint distribution of (X, Y) is unknown.
+ Y ∈ {−1, 1} (classification) or Y ∈ R^m (regression).
+ A predictor is a measurable function in F = {f : 𝒳 → 𝒴}.

Training data
+ Dn = {(X1 , Y1 ), . . . , (Xn , Yn )} i.i.d. with the same distribution as (X, Y ).

Goal
+ Construct a good predictor f̂n from the training data.
+ Need to specify the meaning of good.

4/39
Loss and Probabilistic Framework

Loss function
+ ℓ(Y, f(X)) measures the goodness of the prediction of Y by f(X).
+ Prediction loss: ℓ(Y, f(X)) = 1{Y ≠ f(X)}.
+ Quadratic loss: ℓ(Y, f(X)) = ‖Y − f(X)‖².

Risk function
+ Risk measured as the average loss:

R(f) = E[ℓ(Y, f(X))].

+ Prediction loss: E[ℓ(Y, f(X))] = P(Y ≠ f(X)).

+ Quadratic loss: E[ℓ(Y, f(X))] = E[‖Y − f(X)‖²].

+ Beware: as f̂n depends on Dn, R(f̂n) is a random variable!

5/39
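As a quick illustration of these two risks, here is a minimal R sketch (simulated data and a hand-picked predictor, both invented for the example) that computes their empirical counterparts on a sample:

    # Empirical risks of a fixed predictor f on a simulated sample.
    set.seed(1)
    n <- 200
    x <- rnorm(n)
    y <- ifelse(x + rnorm(n) > 0, 1, -1)   # labels in {-1, 1}

    f <- function(x) ifelse(x > 0, 1, -1)  # a hand-picked predictor

    risk_01 <- mean(y != f(x))             # empirical prediction (0-1) risk
    risk_l2 <- mean((y - f(x))^2)          # empirical quadratic risk
    c(risk_01 = risk_01, risk_l2 = risk_l2)

These averages estimate R(f) for a fixed f; the point of the remark above is that once f is itself fitted on Dn, its risk becomes random.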
A robot that learns

A robot endowed with a set of sensors and an online learning algorithm.

+ Task: play football.


+ Performance: score.
+ Experience: current environment and outcome, past games...

6/39
Object recognition in an image

+ Task: say if an object is present or not in the image.


+ Performance: number of errors.
+ Experience: set of previously seen labeled images.

7/39
Number

+ Task: read a ZIP code from an envelope.

+ Performance: number of reading errors.
+ Prediction problem with X: image and Y: corresponding number.

8/39
Applications in biology

+ Task: protein interaction network prediction.


+ Goal: predict (unknown) interactions between proteins.
+ Prediction problem with X: pair of proteins and Y: existence or not of an
interaction.

9/39
Detection

+ Goal: detect the position of faces in an image.


+ X: mask in the image and Y: presence or not of a face.

10/39
Classification

Setting
+ Historical data about individuals i = 1, . . . , n.
+ Features vector Xi ∈ Rd for each individual i.
+ For each i, the individual belongs to a group (Yi = 1) or not (Yi = −1).
+ Yi ∈ {−1, 1} is the label of i.

Aim
+ Given a new X (with no corresponding label), predict a label in {−1, 1}.
+ Use data Dn = {(x1 , y1 ), . . . , (xn , yn )} to construct a classifier.

11/39
Classification

Geometrically

Learn a boundary to separate two “groups” of points.

12/39
Classification

...many ways to separate points!

13/39
Supervised learning methods

Support Vector Machine

Linear Discriminant Analysis

Logistic Regression

Trees/ Random Forests

Kernel methods

Neural Networks

Many more...

14/39
Outline

Introduction to supervised learning

Bayes and Plug-in classifiers

Naive Bayes

Discriminant analysis (linear and quadratic)

15/39
Best Solution

The best solution f* (which is independent of Dn) is

f* = argmin_{f ∈ F} R(f) = argmin_{f ∈ F} E[ℓ(Y, f(X))].

Bayes Predictor (explicit solution)


+ Binary classification with 0–1 loss:

f*(X) = +1 if P(Y = 1|X) > P(Y = −1|X), i.e. if P(Y = 1|X) > 1/2,
        −1 otherwise.

+ Regression with the quadratic loss:

f*(X) = E[Y|X].

The explicit solution requires knowing the conditional law of Y given X (P(Y = 1|X) or E[Y|X])...

16/39
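A short justification of the classification case (a standard pointwise argument, not reproduced from the slides, written as a LaTeX display):

    \[
    R(f) = \mathbb{E}\big[\mathbb{P}(Y \neq f(X) \mid X)\big],
    \qquad
    \mathbb{P}(Y \neq f(X) \mid X) =
    \begin{cases}
    1 - \mathbb{P}(Y = 1 \mid X), & f(X) = +1, \\
    \mathbb{P}(Y = 1 \mid X), & f(X) = -1,
    \end{cases}
    \]

so the conditional error is minimized pointwise by predicting +1 exactly when P(Y = 1|X) > 1/2, which is the Bayes classifier f* above.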
Plugin Classifier

+ In many cases, the conditional law of Y given X is not known... or relies on
parameters to be estimated.
+ An empirical surrogate of the Bayes classifier is obtained from a (possibly
nonparametric) estimator η̂n(x) of

η(x) = P(Y = 1|X = x)

built using the training dataset.
+ This surrogate is then plugged into the Bayes classifier.

Plugin Bayes Classifier

+ Binary classification with 0–1 loss:

f̂n(X) = +1 if η̂n(X) > 1/2,
         −1 otherwise.

17/39
Plugin Classifier

Input: a data set Dn.

Learn the distribution of Y given X (using the data set) and plug this estimate
into the Bayes classifier.

Output: a classifier f̂n : R^d → {−1, 1},

f̂n(X) = +1 if η̂n(X) > 1/2,
         −1 otherwise.

+ Can we certify that the plug-in classifier is good?

18/39
Classification Risk Analysis

The misclassification error satisfies (see exercises):

0 ≤ P(f̂n(X) ≠ Y) − L* ≤ 2 E[|η(X) − η̂n(X)|²]^(1/2),

where
L* = P(f*(X) ≠ Y)

and η̂n(x) is an empirical estimate, based on the training dataset, of

η(x) = P(Y = 1|X = x).

19/39
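A sketch of the argument behind this bound (a standard one, presumably what the exercises ask for; η̂n is treated as fixed, i.e. we condition on Dn):

    \[
    \mathbb{P}(\hat f_n(X) \neq Y \mid X) - \mathbb{P}(f^*(X) \neq Y \mid X)
    = |2\eta(X) - 1|\,\mathbf{1}\{\hat f_n(X) \neq f^*(X)\}
    \le 2\,|\hat\eta_n(X) - \eta(X)|,
    \]

since f̂n(X) ≠ f*(X) forces η̂n(X) and η(X) to lie on opposite sides of 1/2. Taking expectations and applying the Cauchy–Schwarz (or Jensen) inequality gives the stated bound with the square root of E[|η(X) − η̂n(X)|²].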
How to estimate the conditional law of Y ?

Fully parametric modeling.


Estimate the law of (X, Y ) and use the Bayes formula to deduce an estimate
of the conditional law of Y : LDA/QDA, Naive Bayes...

Parametric conditional modeling.


Estimate the conditional law of Y by a parametric law: linear regression,
logistic regression, Feed Forward Neural Networks...

Nonparametric conditional modeling.


Estimate the conditional law of Y by a nonparametric estimator: kernel
methods, nearest neighbors...

20/39
Fully Generative Modeling

If the law of (X, Y) is known, everything can be easy!


Bayes formula
With a slight abuse of notation, if the law of X has a density g with respect to
a reference measure,

P(Y = k|X) = gk(X) P(Y = k) / g(X),

where gk is the density of the distribution of X given {Y = k}.

Generative Modeling
Propose a model for (X, Y ).
Plug the conditional law of Y given X in the Bayes classifier.

Remark: this requires modeling the joint law of (X, Y) rather than only the
conditional law of Y.
Great flexibility in the model design, but it may lead to complex computations.

21/39
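In the two-class setting used in these slides, the formula specializes to

    \[
    \mathbb{P}(Y = 1 \mid X = x)
    = \frac{\pi_1\, g_1(x)}{\pi_1\, g_1(x) + \pi_{-1}\, g_{-1}(x)},
    \qquad \pi_k = \mathbb{P}(Y = k),
    \]

so estimating the class proportions πk and the class-conditional densities gk is enough to plug into the Bayes classifier; this is exactly what Naive Bayes and discriminant analysis do below.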
Outline

Introduction to supervised learning

Bayes and Plug-in classifiers

Naive Bayes

Discriminant analysis (linear and quadratic)

22/39
Naive Bayes

Naive Bayes
Classical algorithm using a crude model for P(X|Y):
+ Feature independence assumption:

P(X|Y) = ∏_{i=1}^d P(X^(i)|Y).

+ Simple featurewise model: binomial if binary, multinomial if finite, and
Gaussian if continuous.

If all features are continuous, the law of X given Y is Gaussian with a diagonal
covariance matrix!

Very simple learning even in very high dimension!

23/39
Gaussian Naive Bayes

+ Feature independence assumption:

P(X|Y) = ∏_{j=1}^d P(X^(j)|Y).

For k ∈ {−1, 1}, P(Y = k) = πk and the conditional density of X^(j) given
{Y = k} is

gk(x^(j)) = (2π σ_{j,k}²)^(−1/2) exp( −(x^(j) − μ_{j,k})² / (2 σ_{j,k}²) ).

The conditional density of X given {Y = k} is then

gk(x) = (det(2πΣk))^(−1/2) exp( −(x − μk)^T Σk^(−1) (x − μk) / 2 ),

where Σk = diag(σ_{1,k}², . . . , σ_{d,k}²) and μk = (μ_{1,k}, . . . , μ_{d,k})^T.

24/39
Gaussian Naive Bayes

In a two-class problem, the optimal classifier is (see linear discriminant
analysis below):

f* : X ↦ 2·1{P(Y = 1|X) > P(Y = −1|X)} − 1.

+ When the parameters are unknown, they may be replaced by their maximum
likelihood estimates. This yields, for k ∈ {−1, 1},

π̂k = (1/n) ∑_{i=1}^n 1{Yi = k},

μ̂k = ( ∑_{i=1}^n 1{Yi = k} )^(−1) ∑_{i=1}^n 1{Yi = k} Xi,

Σ̂k = diag( ( ∑_{i=1}^n 1{Yi = k} )^(−1) ∑_{i=1}^n 1{Yi = k} (Xi − μ̂k)(Xi − μ̂k)^T ).

25/39
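A minimal R sketch of these estimators and of the resulting classifier (simulated two-class data; note that var() uses the unbiased n_k − 1 denominator rather than the maximum-likelihood n_k, which is immaterial for the illustration):

    # Gaussian Naive Bayes "by hand": per-class proportions, featurewise means and
    # variances, then classification via the log-posterior.
    set.seed(7)
    n <- 300
    y <- sample(c(-1, 1), n, replace = TRUE)
    X <- cbind(rnorm(n, mean = ifelse(y == 1, 1, -1), sd = 1),
               rnorm(n, mean = ifelse(y == 1, 2,  0), sd = 2))

    params <- lapply(c(-1, 1), function(k) {
      Xk <- X[y == k, , drop = FALSE]
      list(pi  = mean(y == k),        # estimated class proportion
           mu  = colMeans(Xk),        # featurewise means
           var = apply(Xk, 2, var))   # featurewise variances (diagonal Sigma_k)
    })
    names(params) <- c("-1", "1")

    # log( pi_k * prod_j g_k(x_j) ): independence turns the product into a sum.
    log_post <- function(x, p) log(p$pi) + sum(dnorm(x, p$mu, sqrt(p$var), log = TRUE))
    classify <- function(x) if (log_post(x, params[["1"]]) > log_post(x, params[["-1"]])) 1 else -1

    y_hat <- apply(X, 1, classify)
    mean(y_hat != y)   # training error of the hand-rolled classifier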
Gaussian Naive Bayes

[Slides 26–29: figures illustrating Gaussian Naive Bayes fits and decision boundaries on example data.]

26–29/39
Kernel density estimate based Naive Bayes

[Slide 30: figure illustrating Naive Bayes with kernel density estimates of the featurewise densities.]

30/39
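The same models are available off the shelf through the naivebayes package listed at the end of these slides; a short sketch assuming its usual formula interface (usekernel = TRUE replaces the Gaussian featurewise model by the kernel density estimate of slide 30):

    # Gaussian and kernel-density Naive Bayes via the naivebayes package.
    library(naivebayes)

    set.seed(7)
    n  <- 300
    df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
    df$y <- factor(ifelse(df$x1 + 0.5 * df$x2 + rnorm(n) > 0, 1, -1))

    nb_gauss  <- naive_bayes(y ~ x1 + x2, data = df)                    # Gaussian featurewise model
    nb_kernel <- naive_bayes(y ~ x1 + x2, data = df, usekernel = TRUE)  # KDE featurewise model

    table(predicted = predict(nb_gauss, newdata = df[, c("x1", "x2")]), truth = df$y)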
Outline

Introduction to supervised learning

Bayes and Plug-in classifiers

Naive Bayes

Discriminant analysis (linear and quadratic)

31/39
Discriminant Analysis

Discriminant Analysis (Gaussian model)


The conditional densities are modeled as multivariate normal. For each class k,
conditionally on {Y = k},

X ∼ N(μk, Σk).

Discriminant functions:

gk(X) = ln P(X|Y = k) + ln P(Y = k).

In a two-class problem, the optimal classifier is (see exercises):

f* : x ↦ 2·1{g1(x) > g−1(x)} − 1.

QDA (a different Σk in each class) and LDA (Σk = Σ for all k).


Remark: this model can be false, but the methodology remains valid!

32/39
Discriminant Analysis

Estimation
In practice, μk, Σk and πk := P(Y = k) have to be estimated.

+ Estimated proportions: π̂k = nk/n = (1/n) ∑_{i=1}^n 1{Yi = k}.

+ Maximum likelihood estimates μ̂k and Σ̂k (explicit formulas).

The DA classifier then becomes

f̂n(X) = +1 if ĝ1(X) ≥ ĝ−1(X),
         −1 otherwise.

If Σ−1 = Σ1 = Σ, then the decision boundary is an affine hyperplane.

33/39
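Why the boundary is affine when Σ−1 = Σ1 = Σ: in the difference of the discriminant functions the quadratic terms cancel (a short check using the Gaussian densities above):

    \[
    g_1(x) - g_{-1}(x)
    = x^\top \Sigma^{-1}(\mu_1 - \mu_{-1})
    - \tfrac{1}{2}\,\mu_1^\top \Sigma^{-1}\mu_1
    + \tfrac{1}{2}\,\mu_{-1}^\top \Sigma^{-1}\mu_{-1}
    + \ln\frac{\pi_1}{\pi_{-1}},
    \]

which is affine in x, so {x : g1(x) = g−1(x)} is a hyperplane. With class-specific covariances (QDA) the terms −x^T Σk^(−1) x / 2 do not cancel and the boundary is quadratic in x.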
The log-likelihood of the observations is given by

log Pθ(X_{1:n}, Y_{1:n}) = ∑_{i=1}^n log Pθ(Xi, Yi)

= −(nd/2) log(2π) − (n/2) log det(Σ)
  + ( ∑_{i=1}^n 1{Yi = 1} ) log π1 + ( ∑_{i=1}^n 1{Yi = −1} ) log(1 − π1)
  − (1/2) ∑_{i=1}^n 1{Yi = 1} (Xi − μ1)^T Σ^(−1) (Xi − μ1)
  − (1/2) ∑_{i=1}^n 1{Yi = −1} (Xi − μ−1)^T Σ^(−1) (Xi − μ−1).

This yields, for k ∈ {−1, 1},

π̂k = (1/n) ∑_{i=1}^n 1{Yi = k},

μ̂k = ( ∑_{i=1}^n 1{Yi = k} )^(−1) ∑_{i=1}^n 1{Yi = k} Xi,

Σ̂ = (1/n) ∑_{i=1}^n (Xi − μ̂_{Yi})(Xi − μ̂_{Yi})^T.

It remains to plug these estimates into the classification boundary.


34/39
Example: LDA

[Slides 35–36: figures illustrating LDA on example data.]

35–36/39
Example: QDA

[Slides 37–38: figures illustrating QDA on example data.]

37–38/39
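Plots like these can be reproduced with the MASS functions listed on the next slide; here is a minimal sketch (iris restricted to two species, used only as a readily available dataset):

    # LDA and QDA with MASS on a two-class subset of iris (illustration only).
    library(MASS)

    two_class <- droplevels(subset(iris, Species != "setosa"))

    lda_fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = two_class)  # common Sigma
    qda_fit <- qda(Species ~ Sepal.Length + Sepal.Width, data = two_class)  # class-specific Sigma_k

    # Compare the training errors of the two decision rules.
    mean(predict(lda_fit)$class != two_class$Species)
    mean(predict(qda_fit)$class != two_class$Species)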
Packages in R

Function svm in package e1071.

Functions lda and qda in package MASS.

Function naive_bayes in package naivebayes.

39/39
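All three follow the same formula-plus-data calling pattern; a brief sketch with default parameters (iris used only as a placeholder dataset):

    library(e1071)
    library(MASS)
    library(naivebayes)

    svm_fit <- svm(Species ~ ., data = iris)          # support vector machine
    lda_fit <- lda(Species ~ ., data = iris)          # linear discriminant analysis
    nb_fit  <- naive_bayes(Species ~ ., data = iris)  # naive Bayes

    mean(predict(svm_fit, iris) != iris$Species)      # training error of the SVM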
