
CSE 575: Statistical Machine Learning Assignment #1

Instructor: Prof. Jingrui He


Out: Jan 19, 2018; Due: Feb 8, 2018
Submit electronically, using the submission link on Blackboard for Assignment #1, a file named
yourFirstName-yourLastName.pdf containing your solution to this assignment (a .doc
or .docx file is also acceptable, but .pdf is preferred).

1 Bayes Classifier [15 points]

Suppose that in your coin flip experiment, you observed a set of α_H heads and α_T tails. Let θ
denote the probability of observing heads, whose prior distribution follows Beta(β_H, β_T), where
β_H and β_T are two positive parameters. Prove that the posterior distribution P(θ|D) (where D
denotes the observed coin flips) follows Beta(β_H + α_H, β_T + α_T). What is the mean of P(θ|D)?
What is the MAP estimator θ̂_MAP of θ?
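
As a quick numerical sanity check (not a substitute for the proof), the sketch below compares the claimed Beta posterior against a grid-based posterior computed directly from the prior and the likelihood; the counts and hyperparameters are made up for illustration:

```python
import numpy as np
from scipy import stats

# Illustrative numbers only: alpha_* are hypothetical observed counts,
# beta_* are hypothetical prior hyperparameters.
alpha_H, alpha_T = 7, 3
beta_H, beta_T = 2, 2

# The posterior claimed in the problem statement.
claimed = stats.beta(beta_H + alpha_H, beta_T + alpha_T)

# Numerical posterior: prior pdf times the coin-flip likelihood, normalized on a grid.
theta = np.linspace(1e-6, 1 - 1e-6, 100_000)
unnorm = stats.beta(beta_H, beta_T).pdf(theta) * theta**alpha_H * (1 - theta)**alpha_T
numeric = unnorm / (unnorm.sum() * (theta[1] - theta[0]))

print(np.max(np.abs(numeric - claimed.pdf(theta))))  # ~0, up to grid error
print("posterior mean:", claimed.mean())             # compare with your derived mean
print("numerical MAP:", theta[np.argmax(numeric)])   # compare with your derived MAP
```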

2 Parameter Estimation [15 points]

For this question, assume that x_1, ..., x_N ∈ ℝ are i.i.d. samples drawn from the same underlying
distribution. Assume that the underlying distribution is Gaussian N(µ, σ²).

1. (5 points) Let µ̂_MLE denote the MLE estimator of µ. Please prove that µ̂_MLE is unbiased.
Hint: The bias of an estimator of the parameter µ is defined to be the difference between
the expected value of the estimator and µ.

2. (10 points) If the true value of µ is unknown, then the MLE estimator of σ² is as follows:

σ̂²_MLE = (1/N) ∑_{i=1}^{N} (x_i − µ̂_MLE)²

Please prove that σ̂²_MLE is biased.
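
A small simulation (illustrative constants only) makes both claims easy to see before proving them: the sample mean centers on µ, while the MLE of σ² lands systematically below the true value.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, N, trials = 3.0, 4.0, 10, 200_000  # arbitrary illustrative values

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))
mu_mle = x.mean(axis=1)                               # one MLE of mu per trial
var_mle = ((x - mu_mle[:, None]) ** 2).mean(axis=1)   # one MLE of sigma^2 per trial

print("average mu_MLE:    ", mu_mle.mean())   # close to mu = 3.0
print("average sigma2_MLE:", var_mle.mean())  # systematically below sigma2 = 4.0
```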

3 Naïve Bayes Classifier [20 points]

Given the training data set shown in Figure 1, we train a Naïve Bayes classifier with it. Each row
refers to a person, showing the categorical features (age, income, etc.) and the class label (whether
he/she buys a computer).

1. (5 points) How many independent parameters would there be for the Naïve Bayes classifier
trained with this data? What are they? Justify your answers.

2. (10 points) Using standard MLE, what are the estimated values for these parameters?

3. (5 points) Given a new person with features x = (youth, medium, yes, fair), what is P(y =
yes|x)? Would the Naïve Bayes classifier predict y = yes or y = no for this person?

Figure 1: Training Data for Naïve Bayes Classifier
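
Since Figure 1's table is not reproduced in this text, the sketch below uses made-up rows purely to show the counting mechanics of MLE for a categorical Naïve Bayes model; it is not the assignment's data or answer.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL rows -- Figure 1's actual table is not reproduced here.
# Each row: (age, income, student, credit_rating, buys_computer).
rows = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "medium", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "yes"),
]

N = len(rows)
prior = {y: c / N for y, c in Counter(r[-1] for r in rows).items()}  # MLE of P(y)

# MLE of each conditional: P(x_j = v | y) = count(x_j = v and y) / count(y).
counts = defaultdict(lambda: defaultdict(Counter))
for *features, y in rows:
    for j, v in enumerate(features):
        counts[y][j][v] += 1

def cond_prob(j, v, y):
    return counts[y][j][v] / sum(counts[y][j].values())

def posterior(x):
    # P(y | x) is proportional to P(y) * prod_j P(x_j | y); normalize over labels.
    scores = {y: prior[y] * math.prod(cond_prob(j, v, y) for j, v in enumerate(x))
              for y in prior}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

print(posterior(("youth", "medium", "yes", "fair")))
```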

4 Logistic Regression [20 points]

Suppose we have two positive examples x_1 = (1, 0) and x_2 = (0, −1) and two negative examples
x_3 = (0, 1) and x_4 = (−1, 0). Apply the standard gradient ascent method to train a logistic
regression classifier (without any regularization terms). Initialize the weight vector with two
different values, setting the bias component to zero in both, i.e., w⁰_0 = 0 (e.g., w⁰ = (0, 0, 0)ᵀ
and w⁰ = (0, 1, 0)ᵀ). Would the final weight vector w* be the same for the two different initial
values? What are the values? Please explain your answer. You may assume the learning rate to be
a positive real constant η.
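
To build intuition (this does not replace the required explanation), here is a minimal sketch that runs gradient ascent from both suggested initializations and prints the weight vectors and their norms at increasing iteration counts; whether the two runs settle on the same finite vector is exactly what the question asks you to reason about.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The four examples, each with a constant bias feature x_0 = 1 prepended.
X = np.array([[1, 1, 0], [1, 0, -1],    # positive examples x1, x2
              [1, 0, 1], [1, -1, 0]])   # negative examples x3, x4
y = np.array([1, 1, 0, 0])

eta = 0.5  # any positive constant; chosen arbitrarily for illustration

for w0 in (np.zeros(3), np.array([0.0, 1.0, 0.0])):  # the two suggested inits
    w = w0.copy()
    for t in range(1, 20_001):
        w += eta * X.T @ (y - sigmoid(X @ w))  # ascent step on the log-likelihood
        if t in (100, 1_000, 20_000):
            print(t, w.round(3), "norm:", round(float(np.linalg.norm(w)), 3))
```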

5 Naïve Bayes Classifier and Logistic Regression [30 points]

1. (5 points) Gaussian Naïve Bayes and Logistic Regression. Suppose a logistic regression
model and a Gaussian Naïve Bayes classifier are trained for a binary classification task f :
X → Y, where X = ⟨X_1, ..., X_d⟩ ∈ ℝᵈ is a vector of real-valued features and Y = {0, 1} is
the binary label. After training, we get the weight vector w = ⟨w_0, w_1, ..., w_d⟩ for the logistic
regression model.
Recall that in Gaussian Naïve Bayes, each feature X_i (i = 1, ..., d) is assumed to be
conditionally independent of the others given the label Y, so that P(X_i|Y = k) = N(µ_ik, σ_ik)
(k = 0, 1; i = 1, ..., d). We assume that the marginal distribution of the class labels P(Y)
follows Bernoulli(θ, 1 − θ) (P(Y = 1) = θ, P(Y = 0) = 1 − θ).

– How many independent parameters are there in this Gaussian Naïve Bayes classifier?
What are they?
– Can we translate w into the parameters of an equivalent Gaussian Naïve Bayes classifier
without any extra assumption? If so, justify your answer. Otherwise, please specify
what extra assumption(s) you need to complete the translation and explain why.
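
One standard starting point, added here only for orientation: Bayes' rule puts the Naïve Bayes posterior in a sigmoid-like form, which can then be compared term by term with the logistic model P(Y = 1|X) = 1/(1 + exp(−w_0 − ∑_i w_i X_i)); whether the log-ratio terms are linear in X_i is the crux of the second question.

```latex
P(Y=1 \mid X)
  = \frac{P(Y=1)\, P(X \mid Y=1)}
         {P(Y=1)\, P(X \mid Y=1) + P(Y=0)\, P(X \mid Y=0)}
  = \frac{1}{1 + \exp\left( \ln\frac{1-\theta}{\theta}
        + \sum_{i=1}^{d} \ln\frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)} \right)}
```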

2. (25 points) Implementation of Gaussian Naïve Bayes and Logistic Regression. Compare
the two approaches on the bank note authentication dataset, which can be downloaded from
http://archive.ics.uci.edu/ml/datasets/banknote+authentication. A complete description of the
dataset can also be found on this webpage. In short, for each row the first four columns are
the feature values and the last column is the class label (0 or 1). You will observe learning
curves similar to those Dr. He mentioned in class. Implement a Gaussian Naïve Bayes
classifier (recall the conditional independence assumption mentioned before) and a logistic
regression classifier. Please write your own code from scratch and do NOT use existing
functions or packages which provide the Naïve Bayes Classifier/Logistic Regression
class or fit/predict functions (e.g., sklearn). You may, however, use basic linear
algebra/probability functions (e.g., numpy.sqrt(), numpy.random.normal()). For the Naïve
Bayes classifier, assume that P(x_i|y) ∼ N(µ_i,k, σ_i,k), where x_i is a feature in the bank note
data and y is the class label. Use three-fold cross-validation to split the data and train/test
your models.
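
A minimal sketch of both classifiers is given below; the names, learning rate, and iteration count are illustrative choices, not requirements of the assignment.

```python
import numpy as np

def fit_gnb(X, y):
    # Per-class prior plus per-class, per-feature mean and std (all MLE).
    params = {}
    for k in (0, 1):
        Xk = X[y == k]
        params[k] = (len(Xk) / len(X), Xk.mean(axis=0), Xk.std(axis=0))
    return params

def predict_gnb(params, X):
    # Compare per-class log-posteriors: log prior + sum of Gaussian log-densities.
    scores = []
    for k in (0, 1):
        prior, mu, sd = params[k]
        logp = -0.5 * np.log(2 * np.pi * sd**2) - (X - mu) ** 2 / (2 * sd**2)
        scores.append(np.log(prior) + logp.sum(axis=1))
    return (scores[1] > scores[0]).astype(int)

def fit_logreg(X, y, eta=0.01, steps=5_000):
    # Batch gradient ascent on the conditional log-likelihood, no regularization.
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += eta * Xb.T @ (y - p) / len(X)
    return w

def predict_logreg(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (Xb @ w > 0).astype(int)
```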

– (5 points) For each algorithm: briefly describe how you implement it by giving the
pseudocode. The pseudocode must include equations for estimating the model parameters
and for classifying a new example. Remember, this should not be a printout of your
code, but a high-level outline description. Include the pseudocode in your pdf file
(or .doc/.docx file). Submit the actual code as a single zip file named
yourFirstName-yourLastName.zip IN ADDITION TO the pdf file (or .doc/.docx file).
– (10 points) Plot a learning curve: the accuracy vs. the size of the training set. Plot 6
points for the curve, using [.01 .02 .05 .1 .625 1] RANDOM fractions of your training
set and testing on the full test set each time. Average your results over 5 runs for
each random fraction (e.g., 0.05) of the training set. Plot both the Naïve Bayes and
logistic regression learning curves on the same figure. For logistic regression, do not
use any regularization term.
– (10 points) Show the power of the generative model: use your trained Naïve Bayes
classifier (with the complete training set) to generate 400 examples from class y = 1.
Report the mean and variance of the generated examples (for each fold, over 1 run) and
compare them with those of the training examples with y = 1. Try to explain what you
observe in this comparison. A generation sketch appears after this list.
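
Because the class-conditional model is fully generative, sampling is direct. A minimal sketch, assuming `params = fit_gnb(X_train, y_train)` from the earlier sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_from_gnb(params, k=1, n=400):
    # Sample each feature independently from its learned class-conditional
    # Gaussian -- precisely the independence assumption Naive Bayes makes.
    _, mu, sd = params[k]
    return rng.normal(mu, sd, size=(n, len(mu)))

# Assuming X_train, y_train, and params from the earlier sketch:
# samples = generate_from_gnb(params, k=1, n=400)
# print(samples.mean(axis=0), samples.var(axis=0))   # generated statistics
# print(X_train[y_train == 1].mean(axis=0),
#       X_train[y_train == 1].var(axis=0))           # training-set statistics
```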
