
Statistical Machine Learning, 1RT700: Exercises Lesson 3 - Classification (January 29, 2019)

Classification

3.1 k-nearest neighbors (k-NN)


The table below is the training data set with n = 6 observations of a 3-dimensional quantitative input x = [x1 x2 x3]^T and one qualitative output y (the color green or red).

 i |  x1  x2  x3  |   y
 1 |   0   3   0  | Red
 2 |   2   0   0  | Red
 3 |   0   1   3  | Red
 4 |   0   1   2  | Green
 5 |  -1   0   1  | Green
 6 |   1   1   1  | Red

(a) Compute the Euclidean distance between each observation in the training data and the test point x⋆ = [0 0 0]^T.
(b) What is our prediction for the test point x⋆, if we use k-NN with k = 1?
(c) What is our prediction for the test point x⋆, if we use k-NN with k = 3?

3.2 Logistic regression


Suppose we collect data from a group of students in a Machine learning class, with variables x1 = hours studied, x2 = grade point average, and y = a binary output indicating whether the student received grade 5 (y = 1) or not (y = 0). We learn a logistic regression model

p(y = 1 | x) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}    (3.1)

with estimated parameters β̂0 = −6, β̂1 = 0.05 and β̂2 = 1.


(a) Estimate the probability, according to the logistic regression model, that a student who studies for 40 h and has a grade point average of 3.5 gets a 5 in the Machine learning class.
(b) According to the logistic regression model, how many hours would the student in part (a) need to study to have a 50% chance of getting a 5 in the class?


3.3 Difference between LDA and QDA


We now examine the differences between LDA and QDA. The Bayes decision boundary is the decision boundary of the
Bayes classifier, which is the ‘optimal’ classifier (Section 3.4 in the lecture notes).

(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? What
do we expect on the test set?
(b) If the Bayes decision boundary is nonlinear, do we expect LDA or QDA to perform better on the training set?
What do we expect on the test set?
(c) In general, as the sample size n increases, do we expect the test error rate of QDA relative to LDA to increase,
decrease or be unchanged? Why?
(d) True or false: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a smaller
test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary.
Justify your answer.

3.4 Bayes’ classifier


Suppose you work at a clinic in a field mission and want to predict whether a patient has a particular (potentially deadly) disease or not. You have a limited supply of an effective drug, which, however, has severe side effects. Due to unfortunate circumstances, the only diagnostic tool you have access to is a clinical thermometer, with which you can measure the body temperature of the patient. From previous studies made on the disease, you know the following:

• the distribution of body temperatures in infected patients is approximately Gaussian with mean 38.5 °C and standard deviation 1 °C.
• the distribution of body temperatures in patients not infected by the disease (either healthy or infected by other diseases) is approximately Gaussian with mean 37.5 °C and standard deviation √0.5 °C.
• the prevalence of the disease is 5% (i.e., 5% of the population is infected).
The body temperatures of three patients are as follows: patient A 38.5 ◦ C, patient B 39.2 ◦ C, and patient C 40.1 ◦ C.
(a) What is the probability that patients A, B, and C, respectively, are infected by the disease?
Hint: Use Bayes' theorem

p(y | x) = \frac{p(x | y)\, p(y)}{\sum_{k=1}^{K} p(x | k)\, p(k)},

where p(y) is the prior probability of class y, and p(x | y) is the probability density of x for an observation from class y.
(b) Which prediction should you make for each patient, in order to make on average as few misclassifications as
possible? (Hint: Bayes’ classifier)
(c) Argue why a performance metric other than standard accuracy ('on average as few misclassifications as possible') should be considered for this problem. How would that affect your decisions in (b)?
(d) For most applications, Bayes’ classifier cannot be used. Why is it possible to use Bayes’ classifier for this problem?

3.5 Error rates


Suppose that we take a data set, divide it into equally-sized training and test sets, and then try out two different
classification procedures. First we use logistic regression and get an error rate of 20% on the training data and 30%
on the test data. Next we use 1-nearest neighbors (i.e. k-NN with k = 1) and get an average error rate (averaged over
both test and training data sets) of 18%. Based on these results, which method should we prefer to use for classification
of new observations? Why?


3.6 Quadratic Discriminant Analysis


Consider a classification problem with input x ∈ R^p and output y ∈ {1, . . . , K}. Consider also Bayes' classifier

\hat{y} = \arg\max_{k \in \{1,\dots,K\}} p(y = k | x), \quad \text{where} \quad p(y | x) = \frac{p(x | y)\, p(y)}{\sum_{k=1}^{K} p(x | k)\, p(k)}.    (3.2)

In Quadratic Discriminant Analysis (QDA) we assume that p(x | y) is a multivariate Gaussian density with mean µk and covariance Σk,

p(x | y = k) = \mathcal{N}(x | \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} e^{-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)},    (3.3a)
p(y = k) = \pi_k,    (3.3b)

(each class with its own µk, Σk and πk).

(a) Show that under the assumptions in (3.3), Bayes' classifier becomes

\hat{y} = \arg\max_{k \in \{1,\dots,K\}} \delta_k(x), \quad \text{where} \quad \delta_k(x) = -\tfrac{1}{2} x^T \Sigma_k^{-1} x + x^T \Sigma_k^{-1} \mu_k - \tfrac{1}{2} \mu_k^T \Sigma_k^{-1} \mu_k - \tfrac{1}{2} \log|\Sigma_k| + \log \pi_k.    (3.4)

This is QDA, and δk(x) is called the discriminant function.

Hint: In lecture 4, an equivalent derivation was made for LDA, assuming that Σk = Σ is the same for all k. Look at that derivation and extend it to QDA by relaxing this assumption.
(b) Consider two classes k and l. Show that the decision boundary between these two classes is given by a quadratic function.

3.7 Curse of dimensionality


For a large number of inputs p, some methods, such as the nonparametric k-NN, may perform badly due to the high dimensionality p of the input space. The problem is that the notion of 'near' or 'close' depends strongly on the number of dimensions p; this is commonly referred to as the curse of dimensionality. To investigate this, we will now consider an alternative version of the k-NN method, which uses all neighbors within a fixed hypercube (instead of the k nearest) for making the decision.

(a) Suppose that p = 1, and that the inputs x are uniformly distributed on [0, 1]. We decide to consider all observations
with an input within a ±0.05 interval (as an alternative to using the k nearest observed inputs in the k-NN method)
when making predictions. We now want to predict a test observation with input X = 0.3. On average, what fraction
of all training observations will be used in making the prediction?
(b) Now consider the corresponding situation for p = 2: The inputs are uniformly distributed on [0, 1] × [0, 1], and for
making predictions we use all training observations within ±0.05 in each dimension. On average, what fraction of
all training observations will we use when making a prediction for a test observation with input x⋆ = [0.3 0.6]^T?
(c) In general, what fraction of all training observations will be used in predictions if there are p dimensions? As before,
all inputs are uniformly distributed on [0, 1]p and for prediction we consider the training observations within ±0.05
for each dimension. You may ignore the boundary effects if the test input is within 0.05 from the borders 0 or 1.
(d) Based on your answers to (a)-(c), argue why the prediction performance of k-NN might deteriorate for large p.
(e) If the inputs are distributed as in (c), and we want to make predictions using 10% of the training data inputs, what side length should a symmetric hypercube have in order to cover, on average, 10% of the inputs?


Solutions

3.1 (a) The Euclidean distances are given in the rightmost column below:

 i |  x1  x2  x3  |   y   | distance ‖x − x⋆‖
 1 |   0   3   0  | Red   | 3
 2 |   2   0   0  | Red   | 2
 3 |   0   1   3  | Red   | √10 ≈ 3.2
 4 |   0   1   2  | Green | √5 ≈ 2.2
 5 |  -1   0   1  | Green | √2 ≈ 1.4
 6 |   1   1   1  | Red   | √3 ≈ 1.7

(b) With k-NN and k = 1, the single closest observation, i.e., observation 5, determines the prediction. Thus, the prediction is green.
(c) The 3 closest observations are observations 5, 6 and 2. Two of them are red and one is green, so the prediction is red.
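
These computations can be checked with a short script; the following is a minimal sketch (not part of the original solution), assuming NumPy and a plain majority vote among the k nearest neighbors.

```python
import numpy as np
from collections import Counter

# Training data from Exercise 3.1: inputs X and labels y.
X = np.array([[0, 3, 0],
              [2, 0, 0],
              [0, 1, 3],
              [0, 1, 2],
              [-1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array(["Red", "Red", "Red", "Green", "Green", "Red"])
x_star = np.zeros(3)

# (a) Euclidean distances to the test point x*.
dist = np.linalg.norm(X - x_star, axis=1)
print(np.round(dist, 2))           # [3.   2.   3.16 2.24 1.41 1.73]

# (b), (c) k-NN prediction: majority vote among the k nearest neighbors.
def knn_predict(k):
    nearest = np.argsort(dist)[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

print(knn_predict(1))              # Green
print(knn_predict(3))              # Red
```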

3.2 (a) With the parameters β̂0 = −6, β̂1 = 0.05 and β̂2 = 1, the probability of getting a 5 is

p(y = 1 | x) = \frac{e^{\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2}}{1 + e^{\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2}}    (3.5)
             = \frac{e^{-6 + 0.05 x_1 + x_2}}{1 + e^{-6 + 0.05 x_1 + x_2}}.    (3.6)

Now, with x1 = 40 and x2 = 3.5,

p(y = 1 | x) = \frac{e^{-6 + 0.05 \cdot 40 + 1 \cdot 3.5}}{1 + e^{-6 + 0.05 \cdot 40 + 1 \cdot 3.5}}    (3.7)
             = \frac{e^{-0.5}}{1 + e^{-0.5}}    (3.8)
             = \frac{1}{1 + e^{0.5}} \approx 38\%.    (3.9)

(b) Set p(y = 1 | x) = 0.5 and x2 = 3.5. This gives

0.5 = \frac{e^{-6 + 0.05 x_1 + 3.5}}{1 + e^{-6 + 0.05 x_1 + 3.5}}    (3.10)
    = \frac{1}{e^{2.5 - 0.05 x_1} + 1}  ⇒    (3.11)
0.5\,(1 + e^{2.5 - 0.05 x_1}) = 1  ⇒    (3.12)
e^{2.5 - 0.05 x_1} = \frac{1}{0.5} - 1 = 1  ⇒    (3.13)
2.5 - 0.05 x_1 = \log(1) = 0  ⇒    (3.14)
x_1 = \frac{2.5}{0.05} = 50 h.    (3.15)
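
As a quick sanity check (not part of the original solution), a minimal Python sketch of these two calculations, assuming the fitted parameters above:

```python
import numpy as np

# Fitted logistic regression parameters from Exercise 3.2.
b0, b1, b2 = -6.0, 0.05, 1.0

def p_grade5(x1, x2):
    """Probability of y = 1 under the logistic regression model (3.1)."""
    z = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + np.exp(-z))

# (a) 40 hours of study, grade point average 3.5.
print(p_grade5(40, 3.5))            # approximately 0.378

# (b) p = 0.5 means the exponent is zero: b0 + b1*x1 + b2*3.5 = 0.
print((0 - b0 - b2 * 3.5) / b1)     # 50.0 hours
```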


3.3 (a) We can always expect QDA to perform better than LDA on the training set because it is more flexible and is
capable of fitting the training data better. If the Bayes decision boundary is linear, we expect LDA to perform
better on test data because it does not overfit.
(b) If the Bayes decision boundary is nonlinear, we expect QDA to be able to perform better also on the test set.
(c) In general, we expect the test error rate of QDA relative to LDA to improve as the sample size n increases, since QDA is more flexible and can therefore come closer to the Bayes decision boundary. (However, for small n, QDA may overfit the training data.)
(d) False. With few data points n, QDA is likely to overfit, yielding a higher test error rate than LDA, whereas LDA does not suffer from this extra variance when the Bayes decision boundary is indeed linear.
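
A small simulation can illustrate (c) and (d). The sketch below (added here, not from the original text) draws two Gaussian classes with a common covariance, so the Bayes decision boundary is linear, and compares LDA and QDA test errors as the training set grows; it assumes scikit-learn is available.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)

def draw(n, p=2):
    """Two classes with equal covariance: the Bayes boundary is linear."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, p)) + y[:, None]   # class 1 is shifted by 1 in each dimension
    return X, y

X_test, y_test = draw(5000)
for n in (20, 100, 1000):
    X_train, y_train = draw(n)
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    qda = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
    print(n, 1 - lda.score(X_test, y_test), 1 - qda.score(X_test, y_test))
# QDA's extra variance typically hurts for small n and becomes negligible as n grows.
```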

3.4 (a) We have the output y describing the patient status as {infected, healthy}, and the input x being the body temper-
ature:

• p(x | y = infected) = N(x | 38.5, 1)
• p(x | y = healthy) = N(x | 37.5, 0.5)
• p(infected) = 0.05, and hence p(healthy) = 0.95,

where N(x | µ, σ²) denotes the Gaussian density with mean µ and variance σ².

Inserting these expressions and the patients' temperatures into Bayes' theorem, we get
p(patient A is infected) = p(y = infected | x = 38.5) ≈ 0.09,
p(patient B is infected) = p(y = infected | x = 39.2) ≈ 0.34,
p(patient C is infected) = p(y = infected | x = 40.1) ≈ 0.90.
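
These posterior probabilities can be reproduced with a few lines of code; the sketch below (not part of the original solution) evaluates Bayes' theorem with the two Gaussian class densities, parameterized by their variances as above.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

prior_infected, prior_healthy = 0.05, 0.95

def p_infected(temp):
    """Posterior p(y = infected | x = temp) via Bayes' theorem."""
    num = gauss_pdf(temp, 38.5, 1.0) * prior_infected       # infected: mean 38.5, variance 1
    den = num + gauss_pdf(temp, 37.5, 0.5) * prior_healthy  # healthy: mean 37.5, variance 0.5
    return num / den

for name, temp in [("A", 38.5), ("B", 39.2), ("C", 40.1)]:
    print(name, round(p_infected(temp), 2))   # A 0.09, B 0.34, C 0.9
```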

(b) Bayes’ classifier (predicting the most likely class) minimizes the average number of misclassifications. For this
problem, it gives the following predictions: patient A healthy, patient B healthy, patient C infected.
(c) Since the disease is potentially deadly, there is an asymmetry in the problem. The consequences of falsely classifying an infected patient as healthy are probably worse than those of falsely classifying a healthy patient as infected (despite the side effects of the drug). A classifier designed with this asymmetry in mind would probably also predict patient B, and perhaps also patient A, as infected.
A useful tool for such a design could be the confusion matrix.
(d) What is special in this problem is that we are not given training data; instead we have access to the class-conditional distributions p(x | y) and the prior p(y), so Bayes' classifier can be evaluated exactly.

3.5 Logistic regression has a training error rate of Ptraining = 20% and a test error rate of Ptest = 30%. k-NN (k = 1) has an average error rate (Ptraining + Ptest)/2 = 18%.
However, for k-NN with k = 1 the training error rate is Ptraining = 0%, because every training observation is its own nearest neighbor. So k-NN must have a test error rate of Ptest = 2 · 18% − 0% = 36%. We would therefore choose logistic regression because of its lower test error rate of 30%.

3.6 (a) Since the denominator in (3.2) does not depend on k, we get

\hat{y} = \arg\max_{k \in \{1,\dots,K\}} p(y = k | x) = \arg\max_{k \in \{1,\dots,K\}} p(x | k)\, p(k) = \arg\max_{k \in \{1,\dots,K\}} \log\big(p(x | k)\, p(k)\big).

Further, we see that

\log\big(p(x | k)\, p(k)\big) = \log p(k) + \log p(x | k)
  = -\tfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1}(x - \mu_k) - \tfrac{1}{2}\log|\Sigma_k| + \log \pi_k + \underbrace{\text{const.}}_{\text{independent of } k}
  = \underbrace{-\tfrac{1}{2} x^T \Sigma_k^{-1} x + x^T \Sigma_k^{-1}\mu_k - \tfrac{1}{2}\mu_k^T \Sigma_k^{-1}\mu_k - \tfrac{1}{2}\log|\Sigma_k| + \log \pi_k}_{=\delta_k(x)} + \underbrace{\text{const.}}_{\text{independent of } k},


where the classification problem can be written as

\hat{y} = \arg\max_{k \in \{1,\dots,K\}} \delta_k(x).    (3.16)

(b) Compare two classes y = k and y = l. The decision boundary between these two classes is given by

p(y = k | x) = p(y = l | x)  ⇒  δk(x) − δl(x) = 0,

i.e., where the predicted probabilities for the two classes are equally high. This gives

\delta_k(x) - \delta_l(x) = -\tfrac{1}{2} x^T \Sigma_k^{-1} x + x^T \Sigma_k^{-1}\mu_k - \tfrac{1}{2}\mu_k^T \Sigma_k^{-1}\mu_k - \tfrac{1}{2}\log|\Sigma_k| + \log \pi_k
  - \Big(-\tfrac{1}{2} x^T \Sigma_l^{-1} x + x^T \Sigma_l^{-1}\mu_l - \tfrac{1}{2}\mu_l^T \Sigma_l^{-1}\mu_l - \tfrac{1}{2}\log|\Sigma_l| + \log \pi_l\Big)
  = -\tfrac{1}{2} x^T (\Sigma_k^{-1} - \Sigma_l^{-1}) x + x^T(\Sigma_k^{-1}\mu_k - \Sigma_l^{-1}\mu_l)
    - \tfrac{1}{2}\mu_k^T \Sigma_k^{-1}\mu_k - \tfrac{1}{2}\log|\Sigma_k| + \log \pi_k + \tfrac{1}{2}\mu_l^T \Sigma_l^{-1}\mu_l + \tfrac{1}{2}\log|\Sigma_l| - \log \pi_l,

which is a quadratic function of x as long as Σk ≠ Σl.
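
To make the discriminant function (3.4) concrete, here is a minimal sketch (added, not from the original text) that evaluates δk(x) for each class and picks the maximizer; the class parameters below are made-up numbers for illustration only.

```python
import numpy as np

def qda_discriminant(x, mu, Sigma, pi):
    """QDA discriminant delta_k(x) from (3.4) for one class."""
    S_inv = np.linalg.inv(Sigma)
    return (-0.5 * x @ S_inv @ x
            + x @ S_inv @ mu
            - 0.5 * mu @ S_inv @ mu
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(pi))

# Hypothetical two-class example in p = 2 dimensions: (mu_k, Sigma_k, pi_k).
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.6),
    (np.array([2.0, 1.0]), np.array([[2.0, 0.3], [0.3, 0.5]]), 0.4),
]

x = np.array([1.0, 0.5])
deltas = [qda_discriminant(x, mu, Sigma, pi) for mu, Sigma, pi in params]
print(deltas, "predicted class:", int(np.argmax(deltas)) + 1)   # classes numbered 1, ..., K
```

Since the x-dependent part of δk(x) − δl(x) contains the term −½ xᵀ(Σk⁻¹ − Σl⁻¹)x, the boundary is quadratic whenever the class covariances differ; with a common Σ this term cancels and the linear LDA boundary is recovered.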

3.7 (a) All training observations with inputs in the interval [0.25, 0.35] will be used for making the prediction. Since the inputs are uniformly distributed on [0, 1], on average 10% of them will fall in [0.25, 0.35] and hence be used for the prediction.
(b) In this case, we will use all training observations with inputs in the square [0.25, 0.35] × [0.55, 0.65]. The square covers 1% of [0, 1] × [0, 1], and hence, on average, only 1% of the training observations will be used in the prediction.
(c) The probability that an input lies inside the hypercube with side 0.1 in each dimension is 0.1 per dimension, and thus 0.1^p for all p dimensions.
(d) If p is large, the nearest neighbor to a test input might still be quite far away from it. When p = 1, as in (a), about 10% of the training data can be expected to lie within ±0.05, so we can expect to find an observation 'similar' to the test case among the training data, yielding a hopefully useful prediction. If instead p = 10 in (c), only a fraction 0.1^10 = 10^-10 of the training data can be expected to lie within ±0.05 in every dimension around the test input. It is then much less likely that the training data contains an observation 'similar' to the test case, and the prediction performance may therefore deteriorate.
(e)
• For p = 1, the side needs to be 0.1.
• For p = 2, the side needs to be 0.1^(1/2) ≈ 0.316.
• For p = 3, the side needs to be 0.1^(1/3) ≈ 0.464.
• ...
• For general p, the side needs to be 0.1^(1/p).

Thus, if the number of inputs is high, say p = 100, the side of the cube needs to be 0.1^(1/100) ≈ 0.977. This means that if we want to use, on average, 10% of the training data for a prediction, we need to include almost the entire range of each input dimension, because of the large number of dimensions.
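
The side lengths in (e) are simply 0.1^(1/p); a one-line check (added here, not part of the original solution):

```python
# Side length of a hypercube that covers, on average, 10% of uniformly distributed inputs in p dimensions.
for p in (1, 2, 3, 10, 100):
    print(p, round(0.1 ** (1 / p), 3))   # 0.1, 0.316, 0.464, 0.794, 0.977
```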
