
Exercise 4 – Classification II

Introduction to Machine Learning

Hint: Useful libraries

R

# You may need the following packages for this exercise sheet:

library(mlr3)
library(mlr3learners)
library(ggplot2)
library(mlbench)
library(mlr3viz)

Python

# Consider the following libraries for this exercise sheet:

# general
import numpy as np
import pandas as pd
from scipy.stats import norm
# plotting
import matplotlib.pyplot as plt
import seaborn as sns
# sklearn
from sklearn.naive_bayes import CategoricalNB # Naive Bayes classifier for categorical features
from sklearn.naive_bayes import GaussianNB # Naive Bayes classifier for normally distributed features
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import LabelEncoder

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support

Exercise 1: Naive Bayes

Learning goals

Compute Naive Bayes predictions by hand

You are given the following table with the target variable Banana:

ID  Color   Form    Origin    Banana
 1  yellow  oblong  imported  yes
 2  yellow  round   domestic  no
 3  yellow  oblong  imported  no
 4  brown   oblong  imported  yes
 5  brown   round   domestic  no
 6  green   round   imported  yes
 7  green   oblong  domestic  no
 8  red     round   imported  no

We want to use a Naive Bayes classifier to predict whether a new fruit is a Banana or not.
Estimate the posterior probability $\hat{\pi}(\mathbf{x}_*)$ for a new observation $\mathbf{x}_* = (\text{yellow}, \text{round}, \text{imported})$. How would you classify the object?
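If you want to cross-check your hand computation afterwards, a minimal sketch with scikit-learn's CategoricalNB could look as follows (setting alpha near 0 disables the Laplace smoothing that sklearn applies by default, so the probabilities match raw relative frequencies):

# Cross-check of the hand computation (sketch, not the official solution)
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# data from the table above
df = pd.DataFrame({
    "Color":  ["yellow", "yellow", "yellow", "brown", "brown", "green", "green", "red"],
    "Form":   ["oblong", "round", "oblong", "oblong", "round", "round", "oblong", "round"],
    "Origin": ["imported", "domestic", "imported", "imported", "domestic", "imported", "domestic", "imported"],
    "Banana": ["yes", "no", "no", "yes", "no", "yes", "no", "no"],
})

# CategoricalNB expects integer-encoded categories
enc = OrdinalEncoder()
X = enc.fit_transform(df[["Color", "Form", "Origin"]])

# alpha close to 0 switches off Laplace smoothing to match the hand computation
nb = CategoricalNB(alpha=1e-10).fit(X, df["Banana"])

x_new = enc.transform([["yellow", "round", "imported"]])
print(nb.classes_)             # order of the probability columns below
print(nb.predict_proba(x_new)) # posterior probabilities
print(nb.predict(x_new))       # predicted class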

Assume you have an additional feature Length that measures the length in cm. Describe in
1-2 sentences how you would handle this numeric feature with Naive Bayes.
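A common approach is to model the numeric feature with one univariate normal density per class (this is what GaussianNB does) and use that density value as the likelihood factor in the Naive Bayes product. A tiny sketch of the mechanics; the Length values below are invented purely for illustration:

# Handling a numeric feature via a per-class Gaussian (hypothetical values)
import numpy as np
from scipy.stats import norm

length_yes = np.array([17.0, 18.5, 16.2])  # made-up lengths of the "yes" rows

# fit a class-conditional Gaussian to the numeric feature ...
mu_hat = length_yes.mean()
sd_hat = length_yes.std(ddof=1)

# ... and plug its density at the new observation into the Naive Bayes
# product in place of a categorical relative frequency
likelihood_term = norm.pdf(17.5, loc=mu_hat, scale=sd_hat)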

Exercise 2: Discriminant analysis

Learning goals

1) Set up discriminant analysis by hand
2) Make predictions with discriminant analysis
3) Discuss differences between LDA and QDA

[Figure: scatter plot of the data, feature x on the horizontal axis (roughly 0 to 8) against target y on the vertical axis (roughly 2.0 to 4.0).]
The plot above shows $\mathcal{D} = \left((\mathbf{x}^{(1)}, y^{(1)}), \ldots, (\mathbf{x}^{(n)}, y^{(n)})\right)$, a data set with $n = 200$ observations of a continuous target variable $y$ and a continuous, one-dimensional feature variable $x$. In the following, we aim to predict $y$ with a machine learning model that takes $x$ as input.

To prepare the data for classification, we categorize the target variable $y$ into three classes and call the transformed target variable $z$, as follows:

$$
z^{(i)} = \begin{cases}
1, & y^{(i)} \in (-\infty, 2.5] \\
2, & y^{(i)} \in (2.5, 3.5] \\
3, & y^{(i)} \in (3.5, \infty)
\end{cases}
$$
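In Python, this categorization can be done, for instance, with pandas.cut (a sketch; the y values are hypothetical placeholders):

# Categorize y into the three right-closed intervals from above
import numpy as np
import pandas as pd

y = np.array([2.1, 2.7, 3.9, 3.2])  # hypothetical target values
z = pd.cut(y, bins=[-np.inf, 2.5, 3.5, np.inf], labels=[1, 2, 3])
print(z)  # -> [1, 2, 3, 2]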

Now we can apply quadratic discriminant analysis (QDA):
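As a brief reminder (standard QDA setup, stated here for convenience): each class-conditional density is modeled as a univariate Gaussian with class-specific mean and variance, and classification follows Bayes' rule,

$$
p(x \mid z = k) = \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right), \qquad \hat{\pi}_k(x) \propto \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right),
$$

where $\pi_k$ denotes the prior probability of class $k$. LDA additionally assumes a shared variance $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$.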

i) Estimate the class means $\mu_k = \mathbb{E}(x \mid z = k)$ for each of the three classes $k \in \{1, 2, 3\}$ visually from the plot. Do not overcomplicate this; a rough estimate is sufficient here.

ii) Make a plot that visualizes the different estimated densities per class (see the sketch after this list).

iii) How would your plot from ii) change if we used linear discriminant analysis (LDA) instead of QDA? Explain your answer.

iv) Why is QDA preferable to LDA for this data?

v) Given are two new observations $x_{*1} = -10$ and $x_{*2} = 7$. Assuming roughly equal class sizes, state the QDA prediction for each and explain how you arrive at it.
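For task ii), a minimal plotting sketch could look as follows. The data here are simulated stand-ins, since the data set behind the plot is not reproduced on this sheet:

# Per-class density estimates as used by QDA (simulated stand-in data)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(1.5, 0.6, 70),
                    rng.normal(4.0, 1.0, 70),
                    rng.normal(7.0, 0.7, 60)])
z = np.repeat([1, 2, 3], [70, 70, 60])

grid = np.linspace(x.min() - 1, x.max() + 1, 400)
for k in (1, 2, 3):
    x_k = x[z == k]
    mu_k, sd_k = x_k.mean(), x_k.std(ddof=1)  # class-specific fit, as in QDA
    plt.plot(grid, norm.pdf(grid, mu_k, sd_k), label=f"class {k}")
plt.xlabel("x")
plt.ylabel("estimated density")
plt.legend()
plt.show()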

Exercise 3: Decision boundaries for classification learners

Learning goals

Get a feeling for decision boundaries produced by LDA/QDA/NB

We will now visualize how well different learners classify the three-class mlbench::mlbench.cassini
data set.

• Generate 1000 points from cassini using R, or import cassini_data.csv in Python.
• Then, perturb the x.2 dimension with Gaussian noise (mean 0, standard deviation 0.5), and consider the classifiers already introduced in the lecture:
  – LDA (Linear Discriminant Analysis),
  – QDA (Quadratic Discriminant Analysis), and
  – Naive Bayes.

Plot the learners’ decision boundaries. Can you spot differences in separation ability?
(Note that logistic regression cannot handle more than two classes and is therefore not listed
here.)
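For the Python variant, a sketch could look as follows, assuming cassini_data.csv contains the columns x.1, x.2, and classes (the names mlbench uses in R):

# Decision boundaries of LDA, QDA, and Gaussian Naive Bayes on cassini
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.naive_bayes import GaussianNB
from sklearn.inspection import DecisionBoundaryDisplay

df = pd.read_csv("cassini_data.csv")  # assumed column names, see above
X = df[["x.1", "x.2"]].to_numpy()
y = df["classes"].to_numpy()

# perturb the second dimension with Gaussian noise (mean 0, sd 0.5)
rng = np.random.default_rng(0)
X[:, 1] += rng.normal(0.0, 0.5, size=len(X))

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, clf) in zip(axes, [("LDA", LDA()), ("QDA", QDA()), ("Naive Bayes", GaussianNB())]):
    clf.fit(X, y)
    DecisionBoundaryDisplay.from_estimator(clf, X, response_method="predict", alpha=0.4, ax=ax)
    ax.scatter(X[:, 0], X[:, 1], c=pd.factorize(y)[0], s=8)
    ax.set_title(name)
plt.show()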
