Lecture- 43
Logistic Regression
And the sigmoidal function that we used is given here, and notice
that its argument is your hyperplane equation. In n dimensions this quantity
is a scalar, because X has n elements and β1 has n
elements, so it expands to β0 + β11 x1 + β12 x2 and
so on up to β1n xn.
So, this is a scalar, and then we saw that if this quantity is a very
large negative number, the probability is 0, and if this quantity is a
very large positive number, the probability is 1. And the probability
transitions through 0.5; remember, I said you always have to look at it
from one class's viewpoint.
So, let us say you want class 1 to be the high-probability case and class
0 to be the low-probability case; then, as we described before, you can
convert this into a binary output by using a threshold. If you use a
threshold of 0.5, because probabilities go from 0 to 1, you will notice
that p(X) becomes exactly 0.5 when β0 + β1ᵀX = 0. This is because p(X)
then equals e^0 / (1 + e^0), which is equal to 1/2.
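This relation can be sketched in a few lines of code (a minimal Python illustration; the coefficient values β0 = -3, β1 = (1, 2) are made up for the example, not taken from the lecture's data):

```python
import math

def sigmoid_prob(beta0, beta1, x):
    """p(X) = e^z / (1 + e^z) with z = beta0 + beta1 . x,
    computed in the equivalent, numerically stable form 1 / (1 + e^(-z))."""
    z = beta0 + sum(b * xi for b, xi in zip(beta1, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: beta0 = -3, beta1 = (1, 2).
# The point x = (1, 1) lies on the hyperplane beta0 + beta1 . x = 0,
# so its probability is exactly 0.5.
print(sigmoid_prob(-3.0, [1.0, 2.0], [1.0, 1.0]))    # 0.5
# Far on the positive side of the hyperplane the probability
# approaches 1; far on the negative side it approaches 0.
print(sigmoid_prob(-3.0, [1.0, 2.0], [10.0, 10.0]))  # close to 1
```

The same expression works in any number of dimensions, since the hyperplane term is always a scalar.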
Also notice another interesting thing: this equation is then the
equation of the hyperplane. So, if I had data like this and data like this,
and I draw this line, then any point on this line is a point where the
probability equals 0.5. That basically says that any point on this line in
this 2-D case, or on the hyperplane in the n-dimensional case, will have an
equal probability of belonging to either class 0 or class 1, which makes
sense given what we are trying to do. So, this model is what is called a
logit model.
So, a line will separate this. In a typical case, these kinds of
classification problems are actually called supervised classification
problems. We call this a supervised classification problem because all of
this data is labeled. So, I already know that all of this data is coming
from class 0 and all of this data is coming from class 1.
So, for example, the attributes could be the time of day at which the
transaction was made, whether the credit card was used at the place where
the person lives, and many other attributes. So, let us say there are many
such attributes, and you have lots of records for normal use of the credit
card and some records for fraudulent use of the credit card.
Then you could build a classifier which, given a new set of attributes,
that is, a new transaction that is being initiated, could identify how
likely it is that this transaction is fraudulent. So that is one other way
of thinking about the same problem. Nonetheless, as far as this example is
concerned, what we need to do is fill this column with zeros and ones. If I
fill a row with 0, that means the data belongs to class 0, and if I fill it
with 1, then it belongs to class 1, and so on.
So, this is what we are trying to do; for these points we do not know what the classes are.
And as we see here there are 3 decision variables, because this was
a 2-dimensional problem: one coefficient for each dimension and then
one constant. Now, once you have these, you take your expression for
p(X), which is the sigmoid written before. So, this is the sigmoidal
function that we have been talking about. Then, whenever you get a test
data point, let us say 1 3, you plug it into this sigmoidal function and
you get a probability. Let us say the first data point, when you plug it
in, gives a probability like this.
So, if you use a threshold of 0.5, then what we are going to say is that
anything less than 0.5 belongs to class 0 and anything greater than 0.5
belongs to class 1. So, you will notice that this is class 0, class 1,
class 1, class 0, class 0, class 1, class 0, class 0, class 0.
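The thresholding step can be sketched like this (a minimal Python sketch; the probability values are made-up placeholders for the nine test points, chosen only so that the labels come out in the order just described):

```python
def classify(probs, threshold=0.5):
    """Map each predicted probability to class 1 if it exceeds
    the threshold, and to class 0 otherwise."""
    return [1 if p > threshold else 0 for p in probs]

# Hypothetical probabilities for the nine test points.
probs = [0.12, 0.91, 0.77, 0.30, 0.05, 0.64, 0.48, 0.22, 0.41]
print(classify(probs))  # [0, 1, 1, 0, 0, 1, 0, 0, 0]
```

Changing the threshold away from 0.5 trades off the two kinds of misclassification, which is why it is worth treating it as a tunable choice rather than a fixed constant.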
So, as I mentioned in the previous slide, what we wanted was to fill
this column, and each row then tells you which class that particular
sample belongs to. So, now, what we have done is classify these test
cases, which the classifier did not see while you were identifying these
parameters.
So, typically, if you have lots of data with class labels already given,
one of the good things one should do is split it into training data and
test data. The reason for splitting into training and test data is the
following: in this case, if you look at it, we built a classifier based on
some data and then tested it on some other data, but we have no way of
knowing whether those results are right or wrong.
So, we just have to take the results as they are. Ideally, you would
like to use some portion of the data to build the classifier and retain
some portion for testing, and the reason for retaining it is that the
labels for that portion are already known.
So, if I give just this portion of the data to the classifier, the
classifier will come up with some classification, and that can be
compared with the already established labels for those data points. So,
to verify how good your classifier is, it is always a good idea to split
the data into training and test sets. What proportion of data you use for
training, what proportion you use for testing, and so on are things to
think about.
Also, there are many different ways of doing this validation, as one
would call it, with test data. There are techniques such as k-fold
cross-validation and so on. So, there are many ways of splitting the data
into train and test sets and then verifying how good your classifier is.
Nonetheless, the most important idea to remember is that one should
always look at the data and partition it into training and testing sets
so that you get results that are consistent.
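A simple hold-out split of the kind described above can be sketched as follows (a minimal Python sketch; the 25% test fraction, the fixed seed, and the toy records are arbitrary choices for the illustration):

```python
import random

def train_test_split(records, labels, test_fraction=0.25, seed=0):
    """Shuffle the indices of the labeled data and hold out a
    fraction of it, together with its known labels, for testing."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test = [(records[i], labels[i]) for i in idx[:n_test]]
    train = [(records[i], labels[i]) for i in idx[n_test:]]
    return train, test

# 100 labeled samples -> 75 to fit the classifier, 25 held out,
# whose known labels can later be compared with the predictions.
records = [[float(i), float(i) % 7.0] for i in range(100)]
labels = [i % 2 for i in range(100)]
train, test = train_test_split(records, labels)
print(len(train), len(test))  # 75 25
```

k-fold cross-validation generalizes this idea: the data is split into k parts, and each part takes a turn as the held-out test set while the rest are used for training.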
(Refer Slide Time: 10:23)
So, if one were to draw again the points that we used in this
exercise: these are all the class 1 data points, these are the class 0
data points, this is the hyperplane that the logistic regression model
figured out, and these are the test points that we tried with this
classifier. So, you can see that in this case everything seems to be
working well; but, as I said before, you can look at results like this
quite easily only in 2 dimensions.
So, with this, the portion on logistic regression comes to an end. What
we are going to do next is show you an example case study where logistic
regression is used for a solution. However, before we do this case study,
since all the case studies on classification and clustering will involve
looking at the output from the R code, I am going to take a typical
output from the R code, in which several results will show up. These are
called the performance measures of a classifier. I am going to describe
what these performance measures are and how you should interpret them
once you use a particular technique for any case study.
Thank you for listening to this lecture and I will see you in the next lecture.