Programming Exercise 2: Logistic Regression
Machine Learning
May 3, 2012
Introduction
In this exercise, you will implement logistic regression and apply it to two
different datasets. Before starting on the programming exercise, we strongly
recommend watching the video lectures and completing the review questions
for the associated topics.
To get started with the exercise, you will need to download the starter
code and unzip its contents to the directory where you wish to complete
the exercise. If needed, use the cd command in Octave to change to this
directory before starting this exercise.
You can also find instructions for installing Octave on the Octave Installation page on the course website.
Throughout the exercise, you will be using the scripts ex2.m and ex2_reg.m.
These scripts set up the dataset for the problems and make calls to functions
that you will write. You do not need to modify either of them. You are only
required to modify functions in other files, by following the instructions in
this assignment.
1 Logistic Regression
In this part of the exercise, you will build a logistic regression model to
predict whether a student gets admitted into a university.
Suppose that you are the administrator of a university department and
you want to determine each applicants chance of admission based on their
results on two exams. You have historical data from previous applicants
that you can use as a training set for logistic regression. For each training
example, you have the applicants scores on two exams and the admissions
decision.
Your task is to build a classification model that estimates an applicants
probability of admission based the scores from those two exams. This outline
and the framework code in ex2.m will guide you through the exercise.
Note: Octave is a free alternative to MATLAB. For the programming exercises, you are free to use either Octave or MATLAB.
1.1 Visualizing the data
[Figure 1: Scatter plot of training data. Axes: Exam 1 score vs. Exam 2 score.]
1.2 Implementation

1.2.1 Warmup exercise: sigmoid function
Before you start with the actual cost function, recall that the logistic regression hypothesis is defined as:

$$h_\theta(x) = g(\theta^T x),$$

where function $g$ is the sigmoid function. The sigmoid function is defined as:

$$g(z) = \frac{1}{1 + e^{-z}}.$$
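This function is submitted as sigmoid.m (see the grading table at the end of this document). As a minimal sketch, a vectorized implementation that works for scalars, vectors, and matrices might look like this:

    function g = sigmoid(z)
    %SIGMOID Compute the sigmoid of z.
    %   Works element-wise, so z may be a scalar, a vector, or a matrix.
    g = 1 ./ (1 + exp(-z));
    end

For large positive values of z the sigmoid should be close to 1, for large negative values it should be close to 0, and sigmoid(0) should be exactly 0.5.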
1.2.2 Cost function and gradient

Now you will implement the cost function and gradient for logistic regression. Complete the code in costFunction.m to return the cost and gradient.
Recall that the cost function in logistic regression is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log(h_\theta(x^{(i)})) - (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right],$$

and the gradient of the cost is a vector of the same length as $\theta$ where the $j$-th element (for $j = 0, 1, \ldots, n$) is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Note that while this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression have different definitions of $h_\theta(x)$.
Once you are done, ex2.m will call your costFunction using the initial parameters of θ. You should see that the cost is about 0.693.
You should now submit the cost function and gradient for logistic regression. Make two submissions: one for the cost function and one for the
gradient.
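For reference, a vectorized sketch of costFunction.m is shown below. It assumes that X already contains the column of ones for the intercept term; the variable names are otherwise illustrative.

    function [J, grad] = costFunction(theta, X, y)
    %COSTFUNCTION Compute cost and gradient for logistic regression.
    %   Assumes X already includes the intercept column of ones.
    m = length(y);              % number of training examples
    h = sigmoid(X * theta);     % hypothesis evaluated on all examples
    J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
    grad = (1 / m) * (X' * (h - y));
    end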
1.2.3 Learning parameters using fminunc
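The code snippet discussed in this section refers to the fminunc call made by ex2.m; it is not reproduced in this text, but a sketch of that kind of call (with initial_theta assumed to be a zero vector of the appropriate size) is:

    % Set options for fminunc
    options = optimset('GradObj', 'on', 'MaxIter', 400);

    % Run fminunc to obtain the optimal theta and the corresponding cost
    [theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);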
In this code snippet, we first defined the options to be used with fminunc.
Specifically, we set the GradObj option to on, which tells fminunc that our
function returns both the cost and the gradient. This allows fminunc to
use the gradient when minimizing the function. Furthermore, we set the
MaxIter option to 400, so that fminunc will run for at most 400 steps before
it terminates.
To specify the actual function we are minimizing, we use a short-hand for specifying anonymous functions: @(t) ( costFunction(t, X, y) ). This creates a function with argument t that calls your costFunction. This allows us to wrap the costFunction for use with fminunc.
If you have completed the costFunction correctly, fminunc will converge on the right optimization parameters and return the final values of the cost and θ. Notice that by using fminunc, you did not have to write any loops
yourself, or set a learning rate like you did for gradient descent. This is all
done by fminunc: you only needed to provide a function calculating the cost
and the gradient.
Once fminunc completes, ex2.m will call your costFunction function using the optimal parameters of θ. You should see that the cost is about 0.203.
This final θ value will then be used to plot the decision boundary on the training data, resulting in a figure similar to Figure 2. We also encourage you to look at the code in plotDecisionBoundary.m to see how to plot such a boundary using the θ values.
1.2.4 Evaluating logistic regression
After learning the parameters, you can use the model to predict whether a
particular student will be admitted. For a student with an Exam 1 score
of 45 and an Exam 2 score of 85, you should expect to see an admission
probability of 0.776.
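As a sketch, this probability can be computed directly from the learned parameters (assuming theta holds the values returned by fminunc):

    % Probability of admission for a student with scores 45 and 85
    prob = sigmoid([1 45 85] * theta);   % expect a value of about 0.776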
Another way to evaluate the quality of the parameters we have found is to see how well the learned model predicts on our training set. In this part, your task is to complete the code in predict.m. The predict function should produce "1" or "0" predictions for a dataset given a learned parameter vector θ.
[Figure 2: Training data with decision boundary. Axes: Exam 1 score vs. Exam 2 score; legend: Admitted, Not admitted.]
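A minimal sketch of predict.m, assuming X includes the intercept column of ones, might be:

    function p = predict(theta, X)
    %PREDICT Predict whether the label is 0 or 1 for each example in X,
    %   using a threshold of 0.5 on the logistic regression hypothesis.
    p = double(sigmoid(X * theta) >= 0.5);
    end

The training accuracy can then be estimated by comparing these predictions with the labels, for example with mean(double(p == y)) * 100.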
2 Regularized logistic regression

In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly.
Suppose you are the product manager of the factory and you have the
test results for some microchips on two different tests. From these two tests,
you would like to determine whether the microchips should be accepted or
rejected. To help you make the decision, you have a dataset of test results
on past microchips, from which you can build a logistic regression model.
You will use another script, ex2_reg.m, to complete this portion of the exercise.
2.1 Visualizing the data
[Figure 3: Plot of training data. Axes: Microchip Test 1 vs. Microchip Test 2.]
2.2 Feature mapping
One way to fit the data better is to create more features from each data
point. In the provided function mapFeature.m, we will map the features into
all polynomial terms of x1 and x2 up to the sixth power.
$$\mathrm{mapFeature}(x) = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1^3 \\ \vdots \\ x_1 x_2^5 \\ x_2^6 \end{bmatrix}$$
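A sketch of how such a mapping can be implemented is shown below; the provided mapFeature.m may differ in details, and the column ordering here is only one possible convention.

    function out = mapFeature(X1, X2)
    %MAPFEATURE Map two input features to all polynomial terms of
    %   x1 and x2 up to the sixth power. The first column is all ones.
    degree = 6;
    out = ones(size(X1(:, 1)));
    for i = 1:degree
        for j = 0:i
            out(:, end + 1) = (X1 .^ (i - j)) .* (X2 .^ j);
        end
    end
    end

As a result of this mapping, each two-feature example is transformed into a 28-dimensional feature vector (the intercept term plus 27 polynomial terms).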
2.3 Cost function and gradient
Now you will implement code to compute the cost function and gradient for
regularized logistic regression. Complete the code in costFunctionReg.m to
return the cost and gradient.
Recall that the regularized cost function in logistic regression is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log(h_\theta(x^{(i)})) - (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2.$$
Note that you should not regularize the parameter θ₀. In Octave, recall that indexing starts from 1; hence, you should not be regularizing the theta(1) parameter (which corresponds to θ₀) in the code. The gradient of the cost function is a vector where the j-th element is defined as follows:
$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \left( \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 1$$
Once you are done, ex2_reg.m will call your costFunctionReg function using the initial value of θ (initialized to all zeros). You should see that the cost is about 0.693.
You should now submit the cost function and gradient for regularized logistic regression. Make two submissions, one for the cost function and one
for the gradient.
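A vectorized sketch of costFunctionReg.m is shown below; as above, it assumes X already contains the intercept column of ones, and it builds on the unregularized version by excluding theta(1) from the penalty.

    function [J, grad] = costFunctionReg(theta, X, y, lambda)
    %COSTFUNCTIONREG Compute cost and gradient for regularized logistic
    %   regression. The intercept parameter theta(1) is not regularized.
    m = length(y);
    h = sigmoid(X * theta);
    theta_reg = [0; theta(2:end)];   % copy of theta with theta(1) zeroed out
    J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
        + (lambda / (2 * m)) * (theta_reg' * theta_reg);
    grad = (1 / m) * (X' * (h - y)) + (lambda / m) * theta_reg;
    end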
2.3.1 Learning parameters using fminunc
Similar to the previous parts, you will use fminunc to learn the optimal parameters θ. If you have completed the cost and gradient for regularized logistic regression (costFunctionReg.m) correctly, you should be able to step through the next part of ex2_reg.m to learn the parameters θ using fminunc.
2.4 Plotting the decision boundary
To help you visualize the model learned by this classifier, we have provided the function plotDecisionBoundary.m, which plots the (non-linear) decision boundary that separates the positive and negative examples. In plotDecisionBoundary.m, we plot the non-linear decision boundary by computing the classifier's predictions on an evenly spaced grid and then drawing a contour plot of where the predictions change from y = 0 to y = 1.
After learning the parameters θ, the next step in ex2_reg.m will plot a decision boundary similar to Figure 4.
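For illustration, the grid-and-contour idea can be sketched as follows. This assumes mapFeature and the learned theta are in scope, and the grid range of -1 to 1.5 is only an illustrative choice; the actual plotDecisionBoundary.m may differ in details.

    % Evaluate theta' * mapFeature(u, v) over an evenly spaced grid and
    % draw the contour where the value crosses zero (the decision boundary)
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);
    z = zeros(length(u), length(v));
    for i = 1:length(u)
        for j = 1:length(v)
            z(i, j) = mapFeature(u(i), v(j)) * theta;
        end
    end
    contour(u, v, z', [0, 0], 'LineWidth', 2)   % transpose z before plotting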
2.5 Optional (ungraded) exercises
In this part of the exercise, you will get to try out different regularization parameters λ for the dataset to understand how regularization prevents overfitting.
Notice the changes in the decision boundary as you vary λ. With a small λ, you should find that the classifier gets almost every training example correct, but draws a very complicated boundary, thus overfitting the data (Figure 5). This is not a good decision boundary: for example, it predicts that a point at x = (−0.25, 1.5) is accepted (y = 1), which seems to be an incorrect decision given the training set.
With a larger λ, you should see a plot that shows a simpler decision boundary which still separates the positives and negatives fairly well. However, if λ is set to too high a value, you will not get a good fit and the decision boundary will not follow the data so well, thus underfitting the data (Figure 6).
You do not need to submit any solutions for these optional (ungraded)
exercises.
[Figure 4: Training data with decision boundary (λ = 1). Axes: Microchip Test 1 vs. Microchip Test 2; legend: y = 1, y = 0, decision boundary.]
[Figure 5: No regularization (overfitting), λ = 0. Axes: Microchip Test 1 vs. Microchip Test 2; legend: y = 1, y = 0, decision boundary.]
[Figure 6: Too much regularization (underfitting), large λ. Axes: Microchip Test 1 vs. Microchip Test 2; legend: y = 1, y = 0, decision boundary.]
Submitted File       Points
sigmoid.m            5 points   (sigmoid function)
costFunction.m       30 points  (cost for logistic regression)
costFunction.m       30 points  (gradient for logistic regression)
predict.m            5 points   (predict function)
costFunctionReg.m    15 points  (regularized cost)
costFunctionReg.m    15 points  (regularized gradient)
Total                100 points
You are allowed to submit your solutions multiple times, and we will take
only the highest score into consideration. To prevent rapid-fire guessing, the
system enforces a minimum of 5 minutes between submissions.
All parts of this programming exercise are due Saturday, May 19th at
23:59:59 PDT.