
Data Science for Engineers

Prof. Ragunathan Rengasamy


Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture- 43
Logistic Regression

(Refer Slide Time: 00:13)

We will continue our lecture on Logistic Regression that we introduced in the last lecture. And if you recall from the last lecture, we modeled the probability as a sigmoidal function.

And the sigmoidal function that we used is given here: p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)). Notice that the exponent is your hyperplane equation. And in n dimensions this quantity β0 + β1X is a scalar, because you have n elements in X and n elements in β1, and this becomes something like β0 + β11 x1 + β12 x2 and so on, up to β1n xn.
So, this is a scalar, and then we saw that if this quantity is a very large negative number, then the probability is 0, and if this quantity is a very large positive number, the probability is 1. And for the transition of the probability at 0.5, remember I said you have to always look at it from one class's viewpoint.

So, let us say you want class 1 to have high probability and class 0 to be the low probability case; then you need a threshold, as we described before, so that you can convert this into a binary output. So, if you were to use a threshold of 0.5, because probabilities go from 0 to 1, then you notice that this p(X) becomes 0.5 exactly when β0 + β1X = 0. This is because p(X) then equals e^0 / (1 + e^0), which is equal to 1/2.

Also notice another interesting thing: this equation is then the equation of the hyperplane. So, if I had data from one class here and data from the other class there, and if I draw this line, any point on this line is a probability-equal-to-0.5 point. That basically says that any point on this line in this 2-D case, or on the hyperplane in the n-dimensional case, will have an equal probability of belonging to either class 0 or class 1, which makes sense from what we are trying to do. So, this model is what is called a logit model.
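To make this concrete, here is a minimal sketch in R (R is the language used for the case studies in this course; the coefficient values below are invented for illustration and are not from the lecture):

# Sigmoidal function: p(X) = exp(z) / (1 + exp(z)), where z is the hyperplane expression
p_of_x <- function(x1, x2, b0, b1, b2) {
  z <- b0 + b1 * x1 + b2 * x2    # beta0 + beta11*x1 + beta12*x2, a scalar
  exp(z) / (1 + exp(z))
}

# Hypothetical coefficients: the line x1 + x2 - 5 = 0 is the decision boundary
p_of_x(2, 3, -5, 1, 1)    # point on the line: probability exactly 0.5
p_of_x(5, 5, -5, 1, 1)    # far on the positive side: probability close to 1
p_of_x(0, 0, -5, 1, 1)    # far on the negative side: probability close to 0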

(Refer Slide Time: 02:52)

Let us take a very simple example to understand this. So, let us assume that we are given data. So, here we have data for class 0 and data for class 1, and then clearly this is a 2-dimensional problem. So, the hyperplane is going to be a line.

So, a line will separate this. And these kinds of classification problems are actually called supervised classification problems. We call this a supervised classification problem because all of this data is labeled. So, I already know that all of this data is coming from class 0 and all of this data is coming from class 1.

So, in other words, I am being supervised in terms of what I should call class 0 and what I should call class 1. So, in these kinds of problems, typically you have this, and then you are given new data, which is called the test data, and then the question is what class this test data belongs to. So, it is either class 0 or class 1, as far as we are concerned in this example.
Just keep in mind where we would use problems like this. Remember, at the beginning of this course I talked about fraud detection and so on, where you could have lots of records of, let us say, fraudulent credit card use, and all of those instances of fraudulent credit card use you could describe by certain attributes.

So, for example, the time of day at which the credit card transaction was done, whether the credit card use was done at the place where the person lives, and many other attributes. So, let us say many such attributes are there, and you have lots of records for normal use of the credit card and some records for fraudulent use of the credit card.

Then you could build a classifier which, given a new set of attributes, that is, a new transaction that is being initiated, could identify how likely it is that this transaction is fraudulent. So that is one other way of thinking about the same problem. So, nonetheless, as far as this example is concerned, what we need to do is fill this column with zeros and ones. If I fill a row with 0, then that means this data belongs to class 0, and if I fill it with 1, then let us say this belongs to class 1, and so on.

So, this is what we are trying to do; we do not know yet what the classes are.

(Refer Slide Time: 05:34)

So, just to see this, it is a very simple problem: we have plotted the same data that was shown in the last table. And you would notice that if you wanted a classifier, something like this would do. So, this problem is linearly separable. So, you could come up with a line that does it. So, let us see what happens if we use logistic regression to solve this problem.
(Refer Slide Time: 06:04)

So, if you did a logistic regression solution, then in this case it turns out that the parameter values are these. And how did we get these parameter values? These parameter values are obtained through the optimization formulation, where one is maximizing the log likelihood with β0, β11 and β12 as decision variables.

And as we see here, there are 3 decision variables, because this was a 2-dimensional problem. So, one coefficient for each dimension and then one constant. Now once you have this, then what you do is, you have your expression for p(X), which is, as written before, the sigmoid. So, this is the sigmoidal function that we have been talking about. Then whenever you get a test data point, let us say (1, 3), you plug this into this sigmoidal function and you get a probability. Let us say the first data point, when you plug it in, gives you a probability like this.

So, if you use a threshold of 0.5, then what we are going to say is anything less than 0.5 is going to belong to class 0 and anything greater than 0.5 is going to belong to class 1. So, you will notice that this is class 0, class 1, class 1, class 0, class 0, class 1, class 0, class 0, class 0. So, as I mentioned in the previous slide, what we wanted was to fill this column, and if you go across a row, it says which class that particular sample belongs to. So, now, what we have done is we have classified these test cases, which the classifier did not see while you were identifying these parameters.

So, the process of identifying these parameters is what is usually called training in machine learning algorithms. So, you are training the classifier to be able to solve test cases later. And the data that you use while these parameters are being identified is called the training data, and this is called the test data that you are testing the classifier with.
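As a minimal sketch of this training-and-testing workflow in R (the data frame and its column names are invented for illustration; glm with a binomial family is one standard way of fitting a logistic regression, though the lecture itself only shows slides):

# Hypothetical labeled (supervised) training data for classes 0 and 1
set.seed(1)
train <- data.frame(
  x1 = c(rnorm(10, mean = 1), rnorm(10, mean = 4)),
  x2 = c(rnorm(10, mean = 1), rnorm(10, mean = 4)),
  y  = rep(c(0, 1), each = 10)
)

# Training: identify beta0, beta11, beta12 by maximizing the log likelihood
model <- glm(y ~ x1 + x2, data = train, family = binomial)
coef(model)    # the three decision variables

# Test data that the classifier did not see during training
test <- data.frame(x1 = c(1, 5), x2 = c(3, 4))
p <- predict(model, newdata = test, type = "response")    # p(X) from the sigmoid

# Threshold at 0.5: below 0.5 -> class 0, above 0.5 -> class 1
ifelse(p > 0.5, 1, 0)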

So, typically, if you have lots of data with class labels already given, one of the good things that one should do is split this into training data and test data. And the reason for splitting this into training and test data is the following. In this case, if you look at it, we built a classifier based on some data and then we tested it on some other data, but we have no way of knowing whether these results are right or wrong.

So, we just have to take the results as they are. So, ideally what you would like to do is use some portion of the data to build a classifier, and then you want to retain some portion of the data for testing, and the reason for retaining this is that the labels are already known in this portion.

So, if I just give this portion of the data to the classifier, the classifier will come up with some classification. Now that can be compared with the already established labels for those data points. So, for verifying how good your classifier is, it is always a good idea to split this into training and testing data. What proportion of data you use for training, what proportion you use for testing, and so on are things to think about.

Also, there are many different ways of doing this validation, as one would call it, with test data. There are techniques such as k-fold cross validation and so on. So, there are many ways of splitting the data into train and test and then verifying how good your classifier is. Nonetheless, the most important idea to remember is that one should always look at the data and partition it into training and testing so that you get results that are consistent.
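As a sketch of the simplest version of this idea, reusing the hypothetical train data frame from the earlier sketch, a random hold-out split in base R (the 80/20 proportion is an assumption; k-fold cross validation is a more thorough variant):

# Partition labeled data into training and test portions (80/20 assumed)
set.seed(2)
n   <- nrow(train)
idx <- sample(seq_len(n), size = round(0.8 * n))
train_set <- train[idx, ]     # used to identify the parameters
test_set  <- train[-idx, ]    # held back; its known labels verify the classifier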
(Refer Slide Time: 10:23)

So, if one were to draw these points again that we used in this exercise: these are all class 1 data points, these are class 0 data points, this is your hyperplane that the logistic regression model figured out, and these are the test points that we tried with this classifier. So, you can see that in this case everything seems to be working well, but as I said before, you can look at results like this in 2 dimensions quite easily.

However, when there are multiple dimensions, it is very difficult to visualize where the data point lies and so on. Nonetheless, it gives you an idea of what logistic regression is doing. It is actually doing a linear classification here; however, based on the distance, in some sense, from this hyperplane, we also assign a probability for the data being in a particular class.
Now, there is one more idea that we want to talk about in logistic regression. This idea is what is called regularization. The idea here is the following. Notice the objective function that we used in the general logistic regression, which is what we called the log likelihood objective function.
(Refer Slide Time: 11:42)

Here θ again refers to the constants in the hyperplane, or the decision variables, and this is the form of the equation that we saw in the previous lecture, and in the beginning of this lecture also, I believe. Now, if you have n variables in your problem, or n features or n attributes, then the number of decision variables that you are identifying is n + 1: one coefficient for each variable and one constant. If this n becomes very large, when there are a large number of variables present, then what happens is these logistic regression models can overfit, because there are so many parameters that you could tend to overfit the data.

So, to prevent this, what we want to do is somehow say that though you have these n + 1 decision variables to use, one would want these decision variables to be used sparingly. So, whenever you use a coefficient for a variable in the classification problem, then we want to ensure that you get the maximum value for using that variable in the classification problem. So, in other words, let us say there are 2 variables and I have β0 + β11 x1 + β12 x2. Then for this classifier I am using both variables x1 and x2 as being important. What one would like to do is make sure that I use these only if they really contribute to the solution, or to the efficacy of the solution.
So, one might say that for every term that you use, you should get something in return; or, in other words, if you use a term and get nothing in return, I want to penalize this term. So, I want to penalize these coefficients. This is what is typically called regularization.
(Refer Slide Time: 14:23)

So, regularization avoids building complex models, or it helps in building non-complex models, so that your overfitting effects can be reduced. So, how do we penalize this? Notice that what we are trying to do is minimize the negative of the log likelihood.
So, what we do here is add another term to the objective: λ is called the regularization parameter, and this h(θ) is some regularization function. So, what we want is, when I choose the values of θ to be very large, I want this function to be large, so that the penalty is more; or whenever I choose a variable, right away a penalty kicks in.

And this penalty should be offset by the improvement I have in this term of the objective function. So, that is the basic idea behind regularization. Now this function could be of many types. If you use this function to be θᵀθ, then this is called L2 type regularization. So, in the previous example, with θ = (β0, β11, β12)ᵀ, this will turn out to be θᵀθ = (β0, β11, β12)(β0, β11, β12)ᵀ.

So, in this case h(θ) = β0² + β11² + β12². Now there are other types of regularization that you can use. This is what is called the L2 type or L2 norm; you can also use something called an L1 type or an L1 norm. And the larger the value of the coefficient λ multiplying this term, the greater the regularization strength; that is, you are penalizing the use of variables a lot more. And one general rule is that regularization helps the model work better with test data, because you avoid overfitting on the training data. So, that is in general something that one can keep in mind as one does these kinds of problems.
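As a minimal sketch of this penalized formulation in R (the data are synthetic and the use of optim is my illustration, not the lecture's code), we minimize the negative log likelihood plus λ θᵀθ, the L2 penalty:

# Synthetic 2-variable data, for illustration only
set.seed(3)
X <- cbind(x1 = rnorm(100), x2 = rnorm(100))
y <- rbinom(100, 1, plogis(1 + 2 * X[, 1]))

# Objective: negative log likelihood plus lambda * theta' theta
penalized_nll <- function(theta, X, y, lambda) {
  z   <- theta[1] + X %*% theta[-1]        # beta0 + beta11*x1 + beta12*x2
  nll <- -sum(y * z - log(1 + exp(z)))     # negative log likelihood of the logit model
  nll + lambda * sum(theta^2)              # h(theta) = theta' theta, the L2 penalty
}

# Larger lambda gives a stronger penalty and shrinks the coefficients toward zero
optim(par = c(0, 0, 0), fn = penalized_nll, X = X, y = y, lambda = 0.1)$par

In practice one would usually reach for a package such as glmnet, which implements both the L1 and L2 penalties for logistic regression and lets you choose λ by cross validation.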

So, with this, the portion on logistic regression comes to an end. What we are going to do next is show you an example case study where logistic regression is used for a solution. However, before we do this case study, since all the case studies on classification and clustering will involve looking at the output from the R code, I am going to take a typical output from the R code, and there are several results that will show up. These are called performance measures of a classifier. I am going to describe what these performance measures are and how you should interpret them once you use a particular technique for any case study.

So, in the next lecture we will talk about these performance measures, and then following that will be the lecture on a case study using logistic regression.

Thank you for listening to this lecture and I will see you in the next lecture.
