Practical - Logistic Regression

1. Logistic regression is a statistical method used for binary classification problems to predict the probability that an observation falls into one of two categories.
2. It models the probability of an observation being in a particular class based on input features by using the logistic function, which outputs a value between 0 and 1.
3. Logistic regression is commonly used for problems like disease diagnosis, customer churn prediction, and spam detection, with models evaluated using metrics like sensitivity, specificity, precision, and F1-score.

Logistic Regression

Unit 3
• Logistic and Multinomial Regression
• Understanding logistic regression and its use in
binary classification.
• Estimating probabilities using logistic regression.
• Model evaluation metrics: sensitivity, specificity,
precision, F-score.
• Model Performance and Conclusion
• Introduction to ROC curve and AUC.
• Determining the optimal cutoff probability.

Logistic Regression and Multinomial
Logistic Regression
• They are both types of regression analysis
used for binary and multiclass classification,
respectively.
• They are commonly used in machine learning
and statistics to model the relationship
between a set of input features and
categorical target variables.
Logistic Regression

• Logistic Regression is used for binary classification problems where the target variable (dependent) has two classes.
• It models the probability that a given input belongs to
a particular class.
• The logistic function (also known as the sigmoid
function) is used to map the linear combination of
input features to a probability between 0 and 1.
• In logistic regression, the goal is to find the best-fitting
coefficients for the input features that maximize the
likelihood of the observed class labels.
Multinomial Logistic Regression

• Multinomial Logistic Regression is an extension of logistic regression for multiclass classification problems.
• It's used when the target variable has more than two
classes.
• It models the probability of each class relative to a reference class.
• In multinomial logistic regression, a separate binary logistic
regression model is created for each class, using one class
as the reference.
• The model predicts the log-odds of each class, and then the
softmax function is applied to convert the log-odds into
probabilities.
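As a minimal sketch (using the iris dataset purely for illustration; scikit-learn's LogisticRegression applies the multinomial/softmax formulation automatically when the target has more than two classes):

# Multinomial logistic regression sketch (illustrative data)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)        # three classes: 0, 1, 2
clf = LogisticRegression(max_iter=1000)  # multinomial (softmax) for multiclass y
clf.fit(X, y)
print(clf.predict_proba(X[:3]))          # softmax probabilities, one column per class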
In this example
• Age is the independent variable
• Having insurance is the dependent variable
• If we apply simple linear regression here, we can see that younger people tend not to have insurance while older people do
• To predict whether a 47-year-old will buy insurance, we can threshold the fitted value such that:
• Greater than 0.5 = Yes (1)
• Less than 0.5 = No (0)
• And we will get the answer
• But if age is 90, Y is more than 1, and if age is less than 20, Y is negative
• This means the model is not a good fit for this data
• The value of Y should be only 0 or 1, or between 0 and 1
• Hence simple linear regression does not fit this type of data well
• In the Sigmoid function, the value of Y always remains between 0 and 1
• This is also known as the Logistic Function
• It is an ‘S’-shaped curve
• It fits this kind of data very well
• Logistic regression uses Maximum Likelihood Estimation to fit the coefficients, and a cut-off probability is then applied to turn the predicted probability into a 0/1 answer
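As a minimal sketch of what "maximum likelihood" means here (the data and coefficient values below are hypothetical; fitting searches for the coefficients that make this quantity as large as possible):

# Log-likelihood of a logistic regression (hypothetical toy data)
import numpy as np

age = np.array([22, 25, 47, 52, 46, 56, 60])  # hypothetical ages
bought = np.array([0, 0, 1, 1, 1, 1, 1])      # hypothetical 0/1 outcomes

def log_likelihood(b0, b1):
    z = b0 + b1 * age
    p = 1 / (1 + np.exp(-z))                  # sigmoid probabilities
    return np.sum(bought * np.log(p) + (1 - bought) * np.log(1 - p))

# Fitting tries many (b0, b1) pairs and keeps the pair with the highest value:
print(log_likelihood(-5.0, 0.15))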
2. Understanding logistic regression
and its use in binary classification.
Example
• Logistic Regression is a statistical method used
for binary classification, which involves
predicting one of two possible classes or
outcomes.
• It's a fundamental algorithm in machine
learning and statistics.
• Despite its name, logistic regression is
primarily used for classification rather than
regression tasks.
Purpose of Logistic Regression

• Logistic Regression is used to model the probability of a binary outcome based on one or more predictor variables.
• It predicts the probability that an instance belongs
to a particular class (e.g., positive or negative) by
using the logistic function (also known as the
sigmoid function) to map the linear combination
of predictor variables to a value between 0 and 1.
• This probability can then be thresholded to make
a binary classification decision.
Logistic Function (Sigmoid)

• The logistic function is the core of logistic regression. It has an S-shaped curve and is defined as:

• σ(z) = 1 / (1 + e^(−z))

• where z = b0 + b1x1 + b2x2 is the linear combination of predictor variables and their corresponding coefficients.
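A minimal sketch of the sigmoid in Python (NumPy only; the coefficient and feature values are hypothetical):

# Sigmoid (logistic) function sketch
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))    # maps any real z into (0, 1)

b0, b1, b2 = -1.0, 0.5, 0.25       # hypothetical coefficients
x1, x2 = 2.0, 4.0                  # hypothetical feature values
z = b0 + b1 * x1 + b2 * x2         # linear combination of predictors
print(sigmoid(z))                  # a probability between 0 and 1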
Model Training

• During training, logistic regression estimates the coefficients of the predictor variables that best fit the training data.
• The optimization algorithm (e.g., 'lbfgs',
'newton-cg', 'sag', 'saga') finds the coefficients
that maximize the likelihood of the observed
class labels given the input features.
Decision Boundary

• The decision boundary is a threshold that determines the class prediction.
• It is the point where the logistic function
crosses the threshold (usually 0.5).
• Instances with probabilities above the
threshold are classified as one class, and those
below are classified as the other class.
Evaluation

• Logistic Regression models are evaluated using various metrics such as accuracy, precision, recall, F1-score, ROC curve, and AUC-ROC.
• The choice of evaluation metric depends on
the problem's requirements and the class
distribution.
Use Cases

• Logistic Regression is commonly used in various fields, including:
• - Medical diagnosis (e.g., disease prediction)
• - Customer churn prediction
• - Spam detection
• - Credit risk assessment
• - Sentiment analysis
• - Fraud detection
A few examples of classification problems are as follows:
1. A bank would like to classify the customers based on risk such as
low-risk or high-risk customers.
2. E-commerce providers would like to predict whether a customer is
likely to churn or not. It is a loss of revenue for the company if an
existing and valuable customer churns.
3. Health service providers may classify a patient, based on the
diagnostic results, as positive (presence of disease) or negative.
4. The HR department of a firm may want to predict if an applicant
would accept an offer or not.
5. Predicting outcome of a sporting event, for example, whether India
will win the next world cup cricket tournament
6. Sentiments of customers on a product or service may be classified
as positive, negative, neutral, and sarcastic.
7. Based on the image of a plant, one can predict if the plant is
infected with a specific disease or not.
Limitations

• Logistic Regression assumes a linear relationship between the predictor variables and the log-odds of the outcome.
• It may not perform well with complex interactions
between features.
• Additionally, it's sensitive to outliers and can suffer
from multicollinearity.
• In summary, Logistic Regression is a powerful and
interpretable algorithm used for binary classification
tasks. It models the probability of belonging to a class
and is widely applied in practical machine learning
problems.
3. Logistic Regression is not only used for binary
classification, but it also provides estimated
probabilities for each class.
• These estimated probabilities can be interpreted
as the model's confidence in its predictions.
• When logistic regression is used for binary
classification, it models the probability of an
instance belonging to the positive class (usually
labeled as "1").
• The logistic function (sigmoid) transforms the
linear combination of predictor variables into a
probability score between 0 and 1.
The formula for logistic regression is as
follows:

• P(y = 1 | X) = 1 / (1 + e^(−z))

• Where P(y = 1 | X) is the probability of the positive class given the input features X.
• z is the linear combination of predictor variables and
their coefficients.

• To get the predicted probability for a specific instance, you compute z based on the model's coefficients and the input features, and then apply the logistic function.
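As a minimal sketch (with synthetic data), the manual computation can be checked against scikit-learn's predict_proba:

# Manual probability vs. predict_proba (synthetic data)
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # synthetic labels

model = LogisticRegression().fit(X, y)
z = model.intercept_ + X @ model.coef_.ravel()   # linear combination z
manual = 1 / (1 + np.exp(-z))                    # apply the logistic function
print(np.allclose(manual, model.predict_proba(X)[:, 1]))  # True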
4. Model evaluation
• Metrics such as sensitivity, specificity,
precision, and F-score are used to assess the
performance of classification models,
particularly in binary classification problems.
• These metrics provide insights into different
aspects of the model's behaviour and are
often used together to provide a
comprehensive understanding of its
performance.
Confusion Matrix
• A confusion matrix is a table that visualizes and summarizes the performance of a classification algorithm.
• tn (True Negatives): The number of instances that
were correctly predicted as the negative class (0).
• fp (False Positives): The number of instances that
were incorrectly predicted as the positive class (1)
when they actually belong to the negative class.
• fn (False Negatives): The number of instances that
were incorrectly predicted as the negative class
(0) when they actually belong to the positive class.
• tp (True Positives): The number of instances that
were correctly predicted as the positive class (1).
1. Sensitivity (True Positive Rate or
Recall)
• Sensitivity measures the ability of the model
to correctly identify positive instances (true
positives) out of all actual positive instances.
• It is the ratio of true positives to the total
number of actual positives.
• Sensitivity = True Positives / (True Positives + False Negatives)
2. Specificity (True Negative Rate)

• Specificity measures the ability of the model to correctly identify negative instances (true negatives) out of all actual negative instances.
• It is the ratio of true negatives to the total
number of actual negatives.
• Specificity = True Negatives / (True Negatives + False Positives)
• FPR = 1 − Specificity
3. Precision (Positive Predictive Value)

• Precision measures the proportion of correctly predicted positive instances (true positives) out of all instances predicted as positive.
• It is a measure of the model's accuracy when
predicting positive cases.
• Precision = True Positives / (True Positives + False Positives)
4. F-score (F1-score)

• The F-score is the harmonic mean of precision and recall (sensitivity).
• It provides a balanced measure that takes both false
positives and false negatives into account.
• The F1-score is often used when there is a class
imbalance.
• F-score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)
• These metrics are particularly important when the
cost of false positives and false negatives varies or
when you want to understand the trade-off between
different aspects of the model's performance.
• Remember that the choice of which metrics to
focus on depends on the specific problem and
the relative importance of different types of
errors in your application.
5. The Receiver Operating Characteristic (ROC)
curve and the Area Under the Curve (AUC)
• The Receiver Operating Characteristic (ROC)
curve and the Area Under the Curve (AUC) are
widely used tools for evaluating and
visualizing the performance of binary
classification models.
• They provide valuable insights into the
model's ability to distinguish between positive
and negative classes across different threshold
settings.
ROC Curve (Receiver Operating
Characteristic)
• The ROC curve is a graphical representation of the
true positive rate (sensitivity) versus the false positive
rate (1 - specificity) as the classification threshold is
varied.
• It shows how well the model can discriminate
between the positive and negative classes across
different threshold levels.
• Each point on the ROC curve corresponds to a specific
threshold setting.
• The diagonal line (y = x) on the ROC plot represents
random guessing.
• An ideal model would have a ROC curve that hugs the
top-left corner of the plot, indicating high sensitivity
(true positive rate) and low false positive rate across
all threshold values.
AUC (Area Under the Curve)

• The AUC is a scalar value that summarizes the overall performance of a model's ROC curve.
• It quantifies the area under the ROC curve and
ranges between 0 and 1.
• A model with perfect discrimination will have an
AUC of 1, while a model with no discrimination
(random guessing) will have an AUC of 0.5.
• A higher AUC generally indicates better
classification performance, but the interpretation
of AUC can depend on the specific problem and
the trade-offs between sensitivity and specificity.
Interpretation

• AUC = 0.5: Random guessing.
• AUC > 0.5 and < 0.7: Poor to fair discrimination.
• AUC ≥ 0.7 and < 0.8: Acceptable
discrimination.
• AUC ≥ 0.8 and < 0.9: Good discrimination.
• AUC ≥ 0.9: Excellent discrimination.
Advantages of ROC Curve and AUC

1. Threshold Invariance: The ROC curve and AUC provide a comprehensive view of the model's performance across different threshold settings, making them useful when the decision threshold needs to be adjusted.
2. Class Imbalance: They are less affected by class
imbalance compared to metrics like accuracy.
3. Model Comparison: ROC curves and AUC allow for
easy comparison between different models, helping
you choose the best-performing one.
Finding Optimal Classification Cut-off
• While using a logistic regression model, one of the decisions a data scientist has to make is choosing the right classification cut-off probability (Pc).
• The overall accuracy, sensitivity, and
specificity will depend on the chosen cut-off
probability.
The following two methods are used
for selecting the cut-off probability:
1. Youden’s index
2. Cost-based approach
Youden’s Index

• Sensitivity and specificity change when we change the cut-off probability.
• Youden’s index (Youden, 1950) is the classification cut-off probability for which the following function (also known as the J-statistic) is maximized:
• Youden’s Index = J-statistic = max over p [Sensitivity(p) + Specificity(p) − 1] = max(TPR − FPR)
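A minimal sketch of computing this with scikit-learn (the labels and probabilities below are hypothetical; roc_curve returns the candidate cut-offs directly):

# Youden's index: pick the cut-off maximizing TPR - FPR
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                  # hypothetical labels
y_prob = np.array([0.2, 0.4, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7])  # hypothetical probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
j = tpr - fpr                                   # J-statistic at each candidate cut-off
print("Optimal cut-off (Youden):", thresholds[np.argmax(j)])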
Cost-Based Approach

• As the cost of false negatives and false positives is not the same, the optimal classification cut-off probability can also be determined using a cost-based approach, which finds the cut-off where the total cost is minimum. In the cost-based approach, we assign a penalty cost for misclassification of positives and negatives and find the total cost for each cut-off probability.
• Assuming the cost of a false positive is C1 and that of a false negative is C2, the total cost will be
• Total cost = FP × C1 + FN × C2
• The optimal cut-off probability is the one which
minimizes the total penalty cost.
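A minimal sketch of the cost-based search (hypothetical labels, probabilities, and costs; each candidate cut-off is scored by its total penalty):

# Cost-based approach: pick the cut-off minimizing total penalty cost
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                  # hypothetical labels
y_prob = np.array([0.2, 0.4, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7])  # hypothetical probabilities
C1, C2 = 1.0, 5.0                  # hypothetical costs: false positive, false negative

cutoffs = np.arange(0.1, 0.95, 0.05)
costs = []
for c in cutoffs:
    y_pred = (y_prob >= c).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    costs.append(fp * C1 + fn * C2)  # total penalty cost at this cut-off
print("Optimal cut-off (cost-based):", cutoffs[int(np.argmin(costs))])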
Example 1 (LR3)
• The example we consider below is a marketing
scenario in which we try to predict the probability that
a customer will renew his subscription to an online
information service.
• The data correspond to a sample of 60 readers, with
the age category, the average number of page views
per week over the last 10 weeks, and the number of
page views during the last week. These readers were
asked to renew their subscription which is due to
expire in two weeks.
• The goal is to understand why some have
re-subscribed while others have not.
Step 1. Import data
# import data
import pandas as pd

# insert the path of your Excel file
df = pd.read_excel('C:/Users/LENOVO/Desktop/RLA/BMS/Sem 3/Introduction to Business Analytics/Practical/Logistic regression examples/LR3.xlsx')
df.head()
Step 2. Import libraries
#Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_curve, roc_auc_score
import statsmodels.api as sm
Step 3- Preparing the variables
#prepare the X and y
y = df.Renewed
X = df.iloc[:,:-1]
X.head()

(We will specify the y variable from the dataset and segregate all the other variables except y in X; make sure the y variable is the last column, with values only 0 or 1.)
Step 4- Splitting Dataset into Training
and Test Sets
• Before building the model, split the dataset
into 80:20 or (70:30) ratio for creating training
and validation datasets.
• The model will be built using the training set
and tested using test set.
Dividing a dataset into training and testing data is a fundamental
practice in machine learning, including logistic regression, to
evaluate how well a model generalizes to new, unseen data. This
division serves several purposes:

• Model Evaluation: The primary reason for splitting the dataset is to evaluate the performance of the trained model. By testing the model on data it has never seen before (testing data), you get an estimate of how well it will perform on real-world data. This evaluation helps you assess whether the model is overfitting or underfitting.
Preventing Overfitting:
• Overfitting occurs when a model learns to
perform very well on the training data but fails
to generalize to new data.
• By evaluating the model on a separate testing
dataset, you can identify if it's overfitting.
• If the model's performance on the testing data
is significantly worse than on the training
data, it's a sign of overfitting.
Real-world Simulation:
• The testing data simulates real-world
scenarios where the model is presented with
new observations that it hasn't encountered
during training.
• This is a critical aspect of assessing a model's
practical utility.
• The typical practice is to split the dataset into
two parts: a larger portion for training (usually
around 70-80% of the data) and a smaller
portion for testing (the remaining 20-30%).
• The training data is used to fit the model's
parameters, while the testing data is used to
evaluate its performance.
• This approach helps ensure that the
evaluation is unbiased and provides a reliable
estimate of the model's generalization ability.
Code

#Training and test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
• X_train and y_train contain the independent
variables and response variable values for the
training dataset respectively.
• Similarly, X_test and y_test contain the
independent variables and response variable
values for the test dataset, respectively.
Step 5- Fitting the model
#Logistic Regression
model = LogisticRegression()
model.fit(X_train, y_train)

(We will fit the model using LogisticRegression() and pass the training sets X_train and y_train as parameters.)
Step 6- Making predictions with the model
• This line of code uses the trained model to predict the target
variable (binary classification) for the test data.
• It takes the test feature data (X_test) as input and returns
predicted labels for each observation in the test dataset.
• After executing this code, y_pred will contain the predicted
labels for the test data based on the logistic regression
model's predictions.
• For binary classification problems like logistic regression,
predicted labels are usually encoded as 0 or 1, indicating the
predicted class for each observation.
• These predicted labels can then be compared with the actual
labels (y_test) to assess the model's performance, calculate
metrics like accuracy, ROC curves, AUC, and more.
Code
# Make predictions on the test data
y_pred = model.predict(X_test)
y_pred
Step 7- Classification Report &
Confusion Matrix
• In classification, the model performance is often
measured using concepts such as sensitivity,
specificity, precision, and F-score.
• The ability of the model to correctly classify
positives and negatives is called sensitivity (also
known as recall or true positive rate) and
specificity (also known as true negative rate),
respectively.
• The terminologies sensitivity and specificity
originated in medical diagnostics
Code
#Classification report
from sklearn.metrics import
classification_report
print(classification_report(y_test, y_pred))
Output
(classification report screenshots: accuracy, F1 score, specificity, and sensitivity)
Confusion matrix code
#Confusion matrix
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred))
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

Output:
[[4 2]
 [1 5]]
Manual calculation of metrics from the confusion matrix
#Accuracy
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy:", round(accuracy * 100, 1), "%")

#F1_score
f1 = 2 * tp / (2 * tp + fn + fp)
print("F1_score:", round(f1 * 100, 1), "%")

#Specificity
specificity = tn / (tn + fp)
print("Specificity:", round(specificity * 100, 1), "%")

#Sensitivity
sensitivity = tp / (tp + fn)
print("Sensitivity:", round(sensitivity * 100, 1), "%")
Output
Accuracy: 75.0 %
F1_score: 76.9 %
Specificity: 66.7 %
Sensitivity: 83.3 %
Analysis- Accuracy
• An accuracy of 75.0% in a logistic regression
analysis indicates that the model's predictions
are correct for 75% of the observations in the
dataset.
F1 score
• An F1 score of 76.9% in logistic regression signifies a well-balanced model performance in binary classification, considering both precision and recall.
• This score is especially important when dealing with imbalanced datasets, offering a reliable measure of the model's ability to manage false positives and false negatives.
• It indicates that the model is making a good trade-off between minimizing false positives (precision) and false negatives (recall); its value should still be assessed alongside contextual factors and comparative analyses for a comprehensive understanding of the model's efficacy.
Specificity
• A specificity of 66.7% in logistic regression
indicates the model's ability to correctly
identify negative cases among all actual
negatives.
• This metric is crucial for tasks where avoiding
false positives is vital.
• A high specificity suggests the model is adept
at reducing false alarms.
Sensitivity
• A sensitivity of 83.3% in logistic regression
highlights the model's capability to correctly
identify positive cases among all actual positives.
• This metric is vital in scenarios where avoiding
false negatives is critical.
• A high sensitivity indicates the model is proficient
at capturing true positives.
• Nonetheless, striking a balance between
sensitivity and specificity is essential, particularly
when false positives and false negatives have
differing consequences.
Step 8- Display coefficients and
intercept
# Display coefficients and intercept
coef = model.coef_
intercept = model.intercept_
print("Coefficients:", coef)
print("Intercept:", intercept)
Output
Coefficients: [[-0.00820279 0.04094567
0.04561219]]
Intercept: [-1.42380146]
Analysis
• The log-odds that a customer will renew his subscription to the online information service decrease by 0.0082 units if age increases by 1 unit. This means more people will renew the subscription as age decreases.
• The log-odds of renewal increase by 0.0409 units if the average number of page views per week over the last 10 weeks increases by 1 unit. This means more people will renew the subscription as views over the last 10 weeks increase.
• The log-odds of renewal increase by 0.0456 units if the number of page views during the last week increases by 1 unit. This means more people will renew the subscription as views in the last week increase.
• The intercept is the baseline log-odds when all independent variables are zero. In this context, it represents the estimated log-odds of renewal with zero age, no average page views per week over the last 10 weeks, and zero page views during the last week. The intercept is then used in the logistic function to calculate the baseline probability.
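A minimal sketch of turning these log-odds coefficients into odds ratios, which are often easier to interpret (this reuses the model fitted in Step 5):

# Convert coefficients to odds ratios
import numpy as np

odds_ratios = np.exp(model.coef_.ravel())
print("Odds ratios:", odds_ratios)
# e.g., exp(0.0409) ≈ 1.042: one extra average weekly page view multiplies
# the odds of renewal by about 1.042, holding the other variables constant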
Step 9- Obtaining predicted
probabilities
• We use the trained logistic regression model (model) to generate
predicted probabilities for each class (positive and negative) for the test
data (X_test). It computes the probability of each observation belonging to
each class.
• Decision Threshold: By obtaining the predicted probabilities for the
positive class, you can apply a decision threshold (usually 0.5) to convert
these probabilities into binary class predictions. If the predicted
probability is greater than or equal to the threshold, the observation is
classified as the positive class; otherwise, it's classified as the negative
class.
• ROC Curve and AUC: Predicted probabilities are used to create Receiver
Operating Characteristic (ROC) curves and calculate Area Under the Curve
(AUC), which provide insights into the model's performance across various
thresholds.
• In summary, obtaining the predicted probabilities using predict_proba() is crucial for making informed decisions, understanding the model's confidence, tuning threshold-dependent metrics, and assessing the model's overall performance.
Code
y_pred_prob = model.predict_proba(X_test)[:, 1]
y_pred_prob
Output
array([0.82546484, 0.45931361, 0.48723111,
0.52871316, 0.83128135, 0.40167389, 0.5608134 ,
0.99609292, 0.99968847, 0.3439706 , 0.3461162 ,
0.99847422])

Interpretation: Each probability value indicates how likely the corresponding observation is to belong to the positive class. For instance, a probability of 0.825 for an observation implies that the model estimates an 82.5% chance that this observation belongs to the positive class.
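A minimal sketch of applying a custom cut-off to these probabilities instead of the default 0.5 (continuing from y_pred_prob above; the 0.6 value is only illustrative):

# Apply a custom decision threshold to the predicted probabilities
cutoff = 0.6                                      # illustrative; could come from Youden's index
y_pred_custom = (y_pred_prob >= cutoff).astype(int)
print(y_pred_custom)                              # 0/1 predictions at this cut-off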
Step 10- Calculate ROC curve and AUC

• The receiver operating characteristic (ROC) curve can be used to understand the overall performance (worth) of a logistic regression model (and, in general, of classification models) and is used for model selection.
classification models) and used for model selection.
• Given a random pair of positive and negative class
records, ROC gives the proportions of such pairs that
will be correctly classified.
• ROC curve is a plot between sensitivity (true positive
rate) on the vertical axis and 1 – specificity (false
positive rate) on the horizontal axis.
Code
# Calculate ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
roc_auc = roc_auc_score(y_test, y_pred_prob)

# Plot the ROC curve
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
Output
(ROC curve plot; AUC = 0.81)
Analysis
• As a thumb rule, AUC of at least 0.7 is required for
practical application of the model.
• AUC greater than 0.9 implies an outstanding model.
• Caution should be exercised while selecting models
based on AUC, especially when the data is imbalanced
(i.e., dataset which has less than 10% positives).
• In case of imbalanced datasets, the AUC may be very
high (greater than 0.9); however, either sensitivity or
specificity values may be poor.
• For this example, the AUC is 0.81, which implies the
model is fairly good.
Example 2 (spam)
• Spam E-mail
• Data Description: The data consist of 4601 email items, of
which 1813 items were identified as spam.
• Format: This data frame contains the following columns:
• crl.tot: total length of words in capitals
• Dollar: number of occurrences of the $ symbol
• Bang: number of occurrences of the ! symbol
• Money: number of occurrences of the word ‘money’
• N000: number of occurrences of the string ‘000’
• Make: number of occurrences of the word ‘make’
• Yesno: outcome variable, a factor with levels n (not spam) and y (spam)
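A minimal sketch of fitting the model to this dataset (the file name and path are hypothetical; the column names follow the description above):

# Sketch: logistic regression on the spam e-mail data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

spam = pd.read_excel('spam.xlsx')                   # hypothetical path
y = spam['yesno'].map({'y': 1, 'n': 0})             # encode outcome as 1/0
X = spam.drop(columns=['yesno'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))                  # test-set accuracy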
Example 3 (LR1)
• Includes customer information such as:
• Age
• How many days since they first visited the
store website
• No. of items in their cart
• The first column is the dependent variable
that indicates whether the customer
purchased on their latest visit
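Since the dependent variable is the first column here (unlike LR3, where it was the last), the X/y preparation flips; a minimal sketch (hypothetical path):

# Sketch: preparing X and y for the LR1 data
import pandas as pd

df = pd.read_excel('LR1.xlsx')   # hypothetical path
y = df.iloc[:, 0]                # first column: purchased on latest visit (0/1)
X = df.iloc[:, 1:]               # age, days since first visit, items in cart
X.head()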
