Logistic Regression - Validating
Classification metrics:
Assume the model is identifying defaulters. In this binary classification, the defaulter class is the
class of interest and is labeled positive (1); the other class is labeled negative (0).
1. True Positives: cases where the actual class is
positive and the prediction is also positive, e.g. a
defaulter (1) predicted as a defaulter (1)
2. True Negatives: cases where the actual class
was non-defaulter (0) and the prediction was also
non-defaulter (0)
3. False Positives: cases where the actual class was
negative (0) but the prediction was defaulter (1)
4. False Negatives: cases where the actual class
was positive (1) but the prediction was non-defaulter (0)
5. The ideal scenario is when all positives are
predicted as positives and all negatives are predicted as
negatives
6. In practice this is rarely the case.
There will usually be some false positives and false negatives
7. We would like to minimize both, but the
problem is that reducing one type of error typically
increases the other
8. The problem lies in the region where the
distributions of the two classes overlap
9. The practical objective is therefore to prioritize one
error type, either false positives or false negatives,
depending on which is costlier
10. Minimize false negatives if predicting a positive case as negative is going to be
more detrimental, e.g. predicting a potential defaulter (positive) as a non-defaulter
(negative)
11. Minimize false positives if predicting a negative case as positive is going to be
more detrimental, e.g. predicting a boss's mail as spam!
12. Accuracy: (TP + TN) / (TP + TN + FP + FN), i.e. correct predictions across all classes
divided by the total number of cases. Rely on this metric only when all classes are roughly
equally represented. It is not reliable when class representation is lopsided, because
algorithms are biased towards the over-represented class
13. Precision: TP / (TP + FP). When we focus on minimizing false negatives, TP will
increase, but FP will also increase along with it. Precision shows at what point the increase
in TP starts hurting (due to the increase in FP)
14. Recall: TP / (TP + FN). When we reduce FN to increase TP, recall shows how much we gain.
Recall and precision tend to oppose each other. We want recall to be as close to 1 as possible
without precision being too bad
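
The definitions above can be checked with a small sketch in plain Python. The labels and scores below are made-up illustration data, not results from any real model, and the helper names (confusion_counts, metrics, predict) are hypothetical. The sketch also assumes that the model outputs a probability-like score and that a case is predicted positive when its score is at or above a chosen threshold; the notes do not state how predictions are cut off, so treat that as an assumption.

```python
def confusion_counts(actual, predicted):
    """Count (TP, TN, FP, FN) for binary labels: 1 = defaulter, 0 = non-defaulter."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1      # defaulter predicted as defaulter
        elif a == 0 and p == 0:
            tn += 1      # non-defaulter predicted as non-defaulter
        elif a == 0 and p == 1:
            fp += 1      # non-defaulter predicted as defaulter
        else:
            fn += 1      # defaulter predicted as non-defaulter
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall); a ratio with a zero denominator is reported as 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def predict(scores, threshold):
    """Assumed decision rule: predict positive (1) when score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Made-up data: 4 defaulters and 6 non-defaulters with overlapping scores.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.25):
    tp, tn, fp, fn = confusion_counts(actual, predict(scores, threshold))
    acc, prec, rec = metrics(tp, tn, fp, fn)
    print(f"threshold={threshold}: TP={tp} TN={tn} FP={fp} FN={fn} "
          f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.25 removes the one false negative (recall rises from 0.75 to 1.00) but adds two false positives (precision falls from 0.75 to about 0.57), which is the trade-off described in points 7, 13 and 14.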