ML-Unit I - Logistic Regression
Applying OLS on categorical data
● Consider the new dataset: by adding a new data point (x, y) = (300, 1) to our training set, we get a new fitted line y2 = w2x + b2.
● Due to this, we now have two misclassifications.
● For example, at votes = 17 the fitted line predicts product quality ≈ 0.9.
Step 2:
Pass the value of z to the logistic function:
fw,b(x) = g(w · x + b) = g(z) = 1 / (1 + e^(-z))
Logistic Regression: output interpretation
● To get an output between 0 and 1, we pass z through the sigmoid (logistic) function g(z) = 1 / (1 + e^(-z)).
● The output fw,b(x) is interpreted as the probability that y = 1 given x.
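A minimal sketch of this computation (NumPy assumed; variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """P(y = 1 | x) for logistic regression with weights w and bias b."""
    z = np.dot(w, x) + b
    return sigmoid(z)

# Example with w = [1, 1], b = -3 (the values used later in these slides):
print(predict_proba(np.array([2.0, 2.0]), np.array([1.0, 1.0]), -3.0))  # ~0.73
```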
Decision Boundary: for 1D data
● The decision boundary is where z = w1x1 + b = 0 (here x1 = votes).
● Predict y = 1 when fw,b(x) ≥ 0.5, i.e., when z ≥ 0; predict y = 0 otherwise.
Decision Boundary: for 2D data
● The logistic regression hypothesis for 2-D data is as follows:
fw,b(x) = g(w1x1 + w2x2 + b)
● Let's find the decision boundary for this. Let's consider w1 = 1, w2 = 1, and b = -3.
● As we saw, the decision boundary in logistic regression is at z = 0:
z = x1 + x2 - 3 = 0, i.e., x1 + x2 = 3
● Points with x1 + x2 ≥ 3 are classified as y = 1; points with x1 + x2 < 3 as y = 0.
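A small sketch of this worked example (NumPy assumed; w1 = 1, w2 = 1, b = -3 are the slide's values):

```python
import numpy as np

w = np.array([1.0, 1.0])   # w1 = 1, w2 = 1
b = -3.0

def predict(x):
    """Predict 1 when z = w.x + b >= 0, i.e. when x1 + x2 >= 3."""
    z = np.dot(w, x) + b
    return int(z >= 0)

print(predict(np.array([1.0, 1.0])))  # 0, since 1 + 1 = 2 < 3
print(predict(np.array([2.0, 2.0])))  # 1, since 2 + 2 = 4 >= 3
```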
Loss function for Logistic Regression
Let's define the loss function for Logistic Regression as:
L(fw,b(x(i)), y(i)) = -log(fw,b(x(i)))      if y(i) = 1
L(fw,b(x(i)), y(i)) = -log(1 - fw,b(x(i)))  if y(i) = 0
Case 1: Let's plot the loss for y(i) = 1. Here f is the output of logistic regression and 0 < f < 1, so only the portion of the -log(f) curve with 0 < f < 1 is relevant: the loss is 0 when f = 1 and grows without bound as f → 0.
Case 2: For y(i) = 0, the loss -log(1 - f) is 0 when f = 0 and grows without bound as f → 1.
With this choice of loss, the overall cost function is convex.
Cost function for Logistic Regression
Therefore, the cost function for Logistic Regression is:
J(w, b) = (1/m) Σ(i=1..m) L(fw,b(x(i)), y(i))
where L is the loss defined above. The overall cost function is convex.
Simplified Loss Function for Logistic Regression
The loss function for Logistic Regression can be written as a single expression:
L(fw,b(x(i)), y(i)) = -y(i) log(fw,b(x(i))) - (1 - y(i)) log(1 - fw,b(x(i)))
● Substituting y(i) = 1 in the above loss function, we get: -log(fw,b(x(i)))
● Substituting y(i) = 0 in the above loss function, we get: -log(1 - fw,b(x(i)))
Simplified Cost Function for Logistic Regression
● Simplified version of the loss function for logistic regression:
L(fw,b(x(i)), y(i)) = -y(i) log(fw,b(x(i))) - (1 - y(i)) log(1 - fw,b(x(i)))
● The simplified cost function derived from the above loss function is:
J(w, b) = -(1/m) Σ(i=1..m) [ y(i) log(fw,b(x(i))) + (1 - y(i)) log(1 - fw,b(x(i))) ]
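A minimal NumPy sketch of this cost (names are illustrative; a small epsilon guards the log at 0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b, eps=1e-12):
    """Mean cross-entropy cost J(w, b) for logistic regression.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1}.
    """
    f = sigmoid(X @ w + b)        # predicted P(y = 1 | x) for every example
    f = np.clip(f, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))
```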
Gradient descent for training Logistic Regression
● Find the parameters w, b such that we get the minimum cost. Then, given a new x, output fw,b(x).
● In order to minimize the simplified cost function above, we use the following gradient update rule (simultaneous update):
wj := wj - α (1/m) Σ(i=1..m) (fw,b(x(i)) - y(i)) xj(i)
b := b - α (1/m) Σ(i=1..m) (fw,b(x(i)) - y(i))
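A compact training-loop sketch under these update rules (NumPy assumed; the learning rate and iteration count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression.

    X: (m, n) features, y: (m,) labels in {0, 1}.
    Returns the learned weights w and bias b.
    """
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        err = sigmoid(X @ w + b) - y   # (f(x) - y) for every example
        w -= alpha * (X.T @ err) / m   # simultaneous update of all wj
        b -= alpha * err.mean()
    return w, b
```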
Addressing Overfitting: Training with more examples
● Collecting more training examples is often the simplest way to reduce overfitting.
Addressing Overfitting: Feature tuning
● Many times we are in a situation where the model has many irrelevant or redundant input features.
● We can manually remove a few irrelevant features from the input features to improve generalization.
● One way to do this is to examine how each feature fits into the model (the correlation between the feature and the target).
○ This is quite similar to debugging code line by line.
○ If a feature cannot explain much of the target, we can identify and remove it.
● We can even use a few feature selection heuristics for a good starting point.
Addressing Overfitting: Early stopping
● While the model is training, we can measure how well it performs after each iteration.
● We keep training as long as the iterations improve the model's performance on held-out data.
● After this point, the model overfits the training data, as generalization weakens with each further iteration.
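A minimal early-stopping sketch of this idea (illustrative names; `train_step` and `validation_loss` are assumed placeholders for your own training and evaluation routines):

```python
def fit_with_early_stopping(model, train_step, validation_loss,
                            patience=5, max_iters=1000):
    """Stop training once validation loss has not improved for `patience` iterations."""
    best_loss, best_state, waited = float("inf"), None, 0
    for _ in range(max_iters):
        train_step(model)              # one gradient-descent iteration
        loss = validation_loss(model)  # performance on held-out data
        if loss < best_loss:
            best_loss, best_state, waited = loss, model.copy(), 0
        else:
            waited += 1
            if waited >= patience:     # generalization stopped improving
                break
    return best_state
```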
Addressing Overfitting: Cross validation
● One of the most powerful techniques to avoid or prevent overfitting is cross-validation.
● The idea behind this is to use the initial training data to generate mini train-test splits, and then use these splits to tune your model.
● In standard k-fold cross-validation, the data is partitioned into k subsets, also known as folds.
● After this, the algorithm is trained iteratively on k-1 folds while the remaining fold, also known as the holdout fold, is used as the test set.
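A short sketch using scikit-learn's k-fold utilities (the dataset and k = 5 are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# 5-fold CV: train on 4 folds, test on the holdout fold, five times over
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```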
Addressing Overfitting: Regularization
● Regularization is done:
○ by penalizing the model in proportion to the magnitude of the parameters wj,
○ which ensures small values of these parameters and hence prevents overfitting,
○ by keeping each feature's contribution small, which reduces the model's variance (though too strong a penalty can introduce bias).
Implementing Regularization
● Intuition: compare a model that just fits the data with one that overfits it.
[Figure: side-by-side plots of a "just fit" model and an "overfit" model]
● In general, we may have 100 features, and it is hard to find which ones are the most important and which ones to penalize.
● Regularization therefore penalizes all features, reducing the effect of every wj, which makes the model less likely to overfit.
Cost function with Regularization
● So the main idea in regularization is to maintain small values for the parameters w1, w2, ⋯, wn (the bias b is usually not penalized), which keeps the hypothesis simple and less prone to overfitting.
● Mathematically, regularization is achieved by modifying the cost function as follows:
J(w, b) = (1/2m) Σ(i=1..m) (fw,b(x(i)) - y(i))² + (λ/2m) Σ(j=1..n) wj²
● The first term fits the data by minimizing the MSE; the second term keeps wj small to control overfitting.
Cost function with Regularization
● Looked at closely, the regularization term means that if the value of wj increases, the cost to be minimized during gradient descent increases as well.
● Gradient descent is therefore driven toward small parameter values, as intended, which prevents overfitting.
Cost function with Regularization
● Case I: If λ = 0, we are not using the regularization term at all, so the model can still overfit.
● Case II: If λ = 10^10, we are placing a very heavy weight on the regularization term.
○ The only way to minimize the cost is to choose all the values of wj very close to 0.
○ Therefore f(x) ≈ b, and the model underfits.
Regularized linear regression
● Mathematically, regularization is achieved by modifying the cost function as shown above.
● To minimize the cost function, we use the following gradient descent update rule (simultaneous update):
wj := wj - α [ (1/m) Σ(i=1..m) (fw,b(x(i)) - y(i)) xj(i) + (λ/m) wj ]
b := b - α (1/m) Σ(i=1..m) (fw,b(x(i)) - y(i))
● Rearranged, the update is wj := wj(1 - αλ/m) - α(1/m) Σ (fw,b(x(i)) - y(i)) xj(i). In the term (1 - αλ/m), the factor is slightly less than 1, so every update shrinks wj a little before applying the usual gradient step.
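A sketch of one regularized update step (NumPy assumed; mirrors the rule above, with the bias left unpenalized):

```python
import numpy as np

def regularized_step(X, y, w, b, alpha=0.1, lam=1.0):
    """One gradient-descent step for regularized linear regression."""
    m = X.shape[0]
    err = X @ w + b - y                 # (f(x) - y) for every example
    w_new = w - alpha * ((X.T @ err) / m + (lam / m) * w)  # shrink + gradient
    b_new = b - alpha * err.mean()      # bias is not regularized
    return w_new, b_new
```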
Multiclass Classification
Multiclass Classification: One-vs-Rest
● For each class k, build a logistic regression classifier that estimates the probability P(Y=k|x) that the observation belongs to that class; for three classes this gives P(Y=0|x), P(Y=1|x), and P(Y=2|x).
● For each data point, predict the class with the highest probability, as sketched below.
● Consider the following dataset: [figure: the dataset and the three one-vs-rest classifiers]
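A brief sketch with scikit-learn's one-vs-rest wrapper (the dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes: 0, 1, 2

# One binary logistic regression per class; predict the most probable class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(ovr.predict_proba(X[:1]))    # [P(Y=0|x), P(Y=1|x), P(Y=2|x)]
print(ovr.predict(X[:1]))
```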
Multinomial Regression
● Assignment for you.
Classification metrics
● There are many classification metrics available:
○ Accuracy
○ Confusion Matrix
○ Precision
○ Recall
○ F1 score
○ AUC
Classification metrics: Accuracy
● Consider the following dataset:
● Based on the predictions given by LR (logistic regression) and DT (decision tree), which classifier is better?
● How do we decide?
Classification metrics: Accuracy
● Accuracy measures the performance of the classifier as the fraction of correct predictions.
● For example, LR:
○ 1st data point: correct
○ 2nd data point: correct
○ 3rd data point: wrong
○ 4th data point: correct
○ 5th data point: wrong
● Accuracy of LR = 3/5 = 0.6 = 60%
Classification metrics: Accuracy
● For example, DT:
○ 1st data point: correct
○ 2nd data point: correct
○ 3rd data point: correct
○ 4th data point: wrong
○ 5th data point: correct
● Accuracy of DT = 4/5 = 0.8 = 80%
Classification metrics: Accuracy
● Even in a multiclass classification problem, accuracy works in the same way.
● Accuracy of LR = 4/5 = 0.8
● Accuracy of DT = 2/5 = 0.4
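A tiny sketch of the computation (the label vectors are illustrative):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

y_true = [1, 0, 1, 1, 0]
print(accuracy(y_true, [1, 0, 0, 1, 1]))  # 3 of 5 correct: 0.6
print(accuracy(y_true, [1, 0, 1, 0, 0]))  # 4 of 5 correct: 0.8
```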
Classification metrics: Accuracy
● How much accuracy is good enough?
○ It depends on the problem.
● Scenario 1: Say we have to predict cancer (Yes/No) based on a chest image.
○ How accurate should the model be?
■ Say your model is 99% accurate.
■ Can you deploy this model?
● No, we can't rely on this model.
● There is a chance that 1 patient out of 100 will be misdiagnosed and could die.
● It's a bad model for this task.
Classification metrics: Accuracy
● How much accuracy is good enough?
○ It depends on the problem.
● Scenario 2: Predict whether your self-driving car should turn left or right.
○ How accurate should the model be?
■ Say your model is 99% accurate.
■ Can you deploy this model?
● No, we can't rely on this model.
● There is a chance that 1 decision out of 100 will be wrong, which could cause an accident.
● It's a bad model for this task.
Classification metrics: Accuracy
● How much accuracy is good enough?
○ It depends on the problem.
● Scenario 3: Predict whether a customer will order food this weekend or not.
○ How accurate should the model be?
■ Say your model is 80% accurate.
■ Can you deploy this model?
● Yes, we can: a wrong prediction here is cheap.
Classification metrics: Accuracy
● The problem with accuracy:
○ The accuracy score gives a single number.
■ It says how good a model is.
■ Or how bad a model is.
○ Say a model is 90% accurate.
■ That also means it is 10% incorrect.
■ But what kind of errors make up that 10%? Accuracy does not explain this.
● For example: the actual label is 0 → the model predicts 1,
or the actual label is 1 → the model predicts 0.
Classification metrics: Confusion Matrix
● The confusion matrix looks like:

                    Predicted: 1            Predicted: 0
Actual: 1           True Positive (TP)      False Negative (FN)
Actual: 0           False Positive (FP)     True Negative (TN)
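A quick sketch with scikit-learn (the label vectors are illustrative):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1]

# Rows = actual class, columns = predicted class; label order [1, 0]
# makes the layout match the slide: [[TP, FN], [FP, TN]]
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
```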
Classification metrics: Confusion Matrix
● Sometimes accuracy is misleading.
● Consider predicting whether a passenger at an airport is a terrorist.
● Say the passengers are distributed as:
○ not terrorist: 9999
○ terrorist: 1
● A model that always predicts "not terrorist" is 99.99% accurate, yet it never catches the one terrorist.
● [Comparison of the confusion matrices of Model A and Model B]
● This is explained by Precision and Recall as follows.
● Confusion matrix of Model A (spam detection):

                    Predicted Spam-A        Predicted Not-spam-A
Actual Spam         100 (TP)                70 (FN)
Actual Not-spam     30 (FP)                 700 (TN)

● Precision = TP / (TP + FP), so Precision_A = 100 / (100 + 30) ≈ 0.77.
● Clearly, Precision_A < Precision_B.
Classification metrics: Confusion Matrix
● Consider two data scientists who develop cancer prediction models, Model A and Model B.
● Recall = TP / (TP + FN):
○ Recall_A = 1000 / (1000 + 200) ≈ 0.83
○ Recall_B = 1000 / (1000 + 500) ≈ 0.67
● Clearly, Recall_B < Recall_A.
Classification metrics: Confusion Matrix
● Sometimes your model is neither precision-focused nor recall-focused; you can say both are equally important.
● Then we use the harmonic mean of precision and recall, which is called the F1-score.
● F1-score: F1 = 2 · (Precision · Recall) / (Precision + Recall)
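A closing sketch computing all three metrics from raw confusion-matrix counts (Model A's spam numbers from above):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Model A from the spam example: TP = 100, FP = 30, FN = 70
print(precision_recall_f1(100, 30, 70))  # (~0.77, ~0.59, ~0.67)
```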