Practical - Logistic Regression
Unit 3
• Logistic and Multinomial Regression
• Understanding logistic regression and its use in
binary classification.
• Estimating probabilities using logistic regression.
• Model evaluation metrics: sensitivity, specificity,
precision, F-score.
• Model Performance and Conclusion
• Introduction to ROC curve and AUC.
• Determining the optimal cutoff probability.
Logistic Regression and Multinomial
Logistic Regression
• They are both types of regression analysis
used for binary and multiclass classification,
respectively.
• They are commonly used in machine learning
and statistics to model the relationship
between a set of input features and
categorical target variables.
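As a minimal sketch (using scikit-learn on synthetic data, not the dataset analysed in this practical), the same LogisticRegression estimator covers both cases: with two classes it fits a binary model, and with three or more it fits a multinomial model.

# Minimal sketch: binary vs. multinomial logistic regression (synthetic data)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Binary target (2 classes)
Xb, yb = make_classification(n_samples=200, n_features=3, n_informative=3,
                             n_redundant=0, n_classes=2, random_state=0)
binary_model = LogisticRegression(max_iter=1000).fit(Xb, yb)
print(binary_model.predict_proba(Xb[:2]))   # two columns: P(class 0), P(class 1)

# Multiclass target (3 classes) -> multinomial logistic regression
Xm, ym = make_classification(n_samples=200, n_features=3, n_informative=3,
                             n_redundant=0, n_classes=3, random_state=0)
multi_model = LogisticRegression(max_iter=1000).fit(Xm, ym)
print(multi_model.predict_proba(Xm[:2]))    # three columns, each row sums to 1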
Logistic Regression
• σ(z) = 1 / (1 + e^(−z)), where z = β₀ + β₁x₁ + … + βₖxₖ is the linear
combination of the input features
Output: [plot of the S-shaped sigmoid curve, mapping any real z to a probability in (0, 1)]
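As a quick numeric check (a sketch; the z values below are arbitrary), the sigmoid squashes any real input into the interval (0, 1):

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real z to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(z))   # ≈ [0.018, 0.269, 0.5, 0.731, 0.982]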
Manual calculation of metrics from the
confusion matrix
# tp, tn, fp, fn are the confusion-matrix counts (true positives, true
# negatives, false positives, false negatives); a sketch showing how to
# obtain them follows the output below.

#Accuracy
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy:", round(accuracy * 100, 1), "%")
#F1_score
f1 = 2 * tp / (2 * tp + fn + fp)
print("F1_score:", round(f1 * 100, 1), "%")
#Specificity
specificity = tn / (tn + fp)
print("Specificity:", round(specificity * 100, 1), "%")
#Sensitivity
sensitivity = tp / (tp + fn)
print("Sensitivity:", round(sensitivity * 100, 1), "%")
Output
Accuracy: 75.0 %
F1_score: 76.9 %
Specificity: 66.7 %
Sensitivity: 83.3 %
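The counts tp, tn, fp and fn used above come from the model's confusion matrix. A minimal sketch of how they can be unpacked with scikit-learn (assuming y_test and the fitted model from the earlier steps):

from sklearn.metrics import confusion_matrix

# Predicted class labels at the default 0.5 threshold
y_pred = model.predict(X_test)

# For a binary problem, ravel() flattens the 2x2 matrix as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("tn, fp, fn, tp =", tn, fp, fn, tp)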
Analysis - Accuracy
• An accuracy of 75.0% in a logistic regression
analysis indicates that the model's predictions
are correct for 75% of the observations in the
dataset.
F1 score
• An F1 score of 76.9% signifies well-balanced performance in binary
classification: the F1 score is the harmonic mean of precision and recall,
so a high value requires both to be reasonably high.
• The F1 score is especially important on imbalanced datasets, where
accuracy alone can be misleading, because it reflects how well the model
manages both false positives and false negatives.
• In practice, 76.9% indicates a good trade-off between minimizing false
positives (precision) and minimizing false negatives (recall); the worked
check below shows how the score follows from the two. Its value should
still be assessed alongside contextual factors and comparative analyses
for a complete picture of the model's efficacy.
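As a worked check of the harmonic-mean formula F1 = 2PR / (P + R): the counts tp=5, fn=1, tn=4, fp=2 are an inference, not values stated above, but they reproduce all four reported metrics on the 12 test observations.

# Worked check: F1 as the harmonic mean of precision and recall
# NOTE: tp=5, fn=1, tn=4, fp=2 are inferred counts consistent with the
# reported outputs, not values given in the source.
tp, fn, tn, fp = 5, 1, 4, 2

precision = tp / (tp + fp)                          # 5/7 ≈ 0.714
recall = tp / (tp + fn)                             # 5/6 ≈ 0.833 (sensitivity)
f1 = 2 * precision * recall / (precision + recall)
print("F1_score:", round(f1 * 100, 1), "%")         # 76.9 %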
Specificity
• A specificity of 66.7% indicates that the model correctly identifies
two out of every three actual negative cases; the remaining third are
flagged as false positives.
• This metric is crucial for tasks where avoiding
false positives is vital.
• High specificity means few false alarms, so 66.7% is only moderate
and leaves room for improvement on the negative class.
Sensitivity
• A sensitivity of 83.3% in logistic regression
highlights the model's capability to correctly
identify positive cases among all actual positives.
• This metric is vital in scenarios where avoiding
false negatives is critical.
• A high sensitivity indicates the model is proficient
at capturing true positives.
• Nonetheless, striking a balance between
sensitivity and specificity is essential, particularly
when false positives and false negatives have
differing consequences.
Step 8 - Display coefficients and
intercept

# Display the fitted coefficients (one per feature) and the intercept
coef = model.coef_
intercept = model.intercept_
print("Coefficients:", coef)
print("Intercept:", intercept)
Output
Coefficients: [[-0.00820279  0.04094567  0.04561219]]
Intercept: [-1.42380146]
Analysis
• The log-odds (logit) that a customer will renew their subscription to
the online information service decrease by 0.0082 units for each one-unit
increase in age. In other words, renewal becomes more likely as age
decreases.
• The log-odds of renewal increase by 0.0409 units for each one-unit
increase in the average number of page views per week over the last 10
weeks, so renewal becomes more likely as this average increases.
• The log-odds of renewal increase by 0.0456 units for each additional
page view during the last week, so renewal becomes more likely as recent
views increase.
• The intercept (-1.4238) is the baseline log-odds when all independent
variables are zero: the estimated log-odds of renewal for a customer with
zero age, no average page views per week over the last 10 weeks, and zero
page views during the last week. Passing the intercept through the
logistic function gives the baseline probability (see the sketch below).
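To make these numbers concrete, here is a short sketch (assuming the fitted model above; the example customer's feature values are made up) that converts the coefficients to odds ratios and the intercept to the baseline probability:

import numpy as np

coef = np.array([-0.00820279, 0.04094567, 0.04561219])  # age, avg views/week (10 wks), views last week
intercept = -1.42380146

# exp(coefficient) = odds ratio per one-unit increase in each feature
print("Odds ratios:", np.exp(coef))                      # ≈ [0.992, 1.042, 1.047]

# Baseline probability when all features are zero: sigmoid of the intercept
print("Baseline P(renew):", 1 / (1 + np.exp(-intercept)))  # ≈ 0.194

# Probability for a hypothetical customer: age 40, 10 avg views, 8 views last week
x = np.array([40.0, 10.0, 8.0])
z = intercept + coef @ x
print("P(renew | x):", 1 / (1 + np.exp(-z)))             # ≈ 0.273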
Step 9 - Obtaining predicted
probabilities
• We use the trained logistic regression model (model) to generate
predicted probabilities for each class (positive and negative) for the test
data (X_test). It computes the probability of each observation belonging to
each class.
• Decision Threshold: By obtaining the predicted probabilities for the
positive class, you can apply a decision threshold (usually 0.5) to convert
these probabilities into binary class predictions. If the predicted
probability is greater than or equal to the threshold, the observation is
classified as the positive class; otherwise, it's classified as the negative
class.
• ROC Curve and AUC: Predicted probabilities are used to create Receiver
Operating Characteristic (ROC) curves and calculate Area Under the Curve
(AUC), which provide insights into the model's performance across various
thresholds.
• In summary, obtaining the predicted probabilities with predict_proba is
crucial for making informed decisions, understanding the model's
confidence, tuning threshold-dependent metrics, and assessing the model's
overall performance.
Code
y_pred_prob = model.predict_proba(X_test)[:, 1]  # column 1 = P(positive class)
y_pred_prob
Output
array([0.82546484, 0.45931361, 0.48723111,
0.52871316, 0.83128135, 0.40167389, 0.5608134 ,
0.99609292, 0.99968847, 0.3439706 , 0.3461162 ,
0.99847422])
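Following on from these probabilities, a short sketch (assuming y_test from the earlier train/test split) applies the 0.5 decision threshold and computes the AUC introduced in the next topic:

from sklearn.metrics import roc_auc_score, roc_curve

# Convert probabilities to class labels at the default 0.5 threshold
y_pred = (y_pred_prob >= 0.5).astype(int)
print("Predicted classes:", y_pred)

# ROC curve points and area under the curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
print("AUC:", round(roc_auc_score(y_test, y_pred_prob), 3))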