2 Classification

CLASSIFICATION TYPES
Binary Classification: the task of classifying the elements of a given set into two groups on the basis of a classification rule.

Multi-class Classification: the task of classifying the elements of a given set into more than two groups on the basis of a classification rule.
Classification
• Can you separate the red class from the blue class?
Linear Boundary
• Straight line for two dimensions.
• Plane for three dimensions.
• Hyperplane for higher dimensions.
Confusion Matrix and Accuracy

                      Predicted
                  Positive    Negative
Actual Positive    a (TP)      b (FN)
Actual Negative    c (FP)      d (TN)

Accuracy = percentage of correctly classified data points
         = (TP + TN) / (TP + FP + FN + TN)

Sensitivity = a / (a + b) = TP / (TP + FN)
Specificity = d / (c + d) = TN / (TN + FP)

Other error metrics:
• Precision
• Recall
• F score
• ROC curve
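A minimal sketch of these metrics with scikit-learn (the labels below are hypothetical, purely to show the calls):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class
print(confusion_matrix(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1 score: ", f1_score(y_true, y_pred))
```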
Algorithm: ZeroR Method

Disease:   Yes: 9 (0.6)   No: 6 (0.4)

ZeroR ignores all predictors and always predicts the majority class. Here it predicts "Yes" for every patient, giving a baseline accuracy of 9/15 = 0.6.
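A minimal sketch of this baseline, assuming scikit-learn's DummyClassifier with the most_frequent strategy (which behaves like ZeroR; the data is the hypothetical disease count above):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical data: 9 "Yes" and 6 "No" labels, one dummy feature
X = np.zeros((15, 1))
y = np.array(["Yes"] * 9 + ["No"] * 6)

zero_r = DummyClassifier(strategy="most_frequent")
zero_r.fit(X, y)

print(zero_r.predict(X[:3]))   # always the majority class: "Yes"
print(zero_r.score(X, y))      # baseline accuracy: 9 / 15 = 0.6
```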
Classification
• The target variable is discrete. Target: 0/1
(Figure: disease outcome (0/1) plotted against blood pressure)
Can you fit a linear regression model?
Algorithm: Logistic Regression

The Logistic Function
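The logistic (sigmoid) function maps any real-valued score into (0, 1): sigma(z) = 1 / (1 + e^(-z)), which is what lets a linear model output a class probability. A minimal sketch with scikit-learn's LogisticRegression on hypothetical blood-pressure data (echoing the earlier 0/1 example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: blood pressure vs. disease outcome (0/1)
X = np.array([[110], [120], [130], [140], [150], [160], [170], [180]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba applies the logistic function to the linear score w*x + b
print(model.predict_proba([[155]]))  # [P(class 0), P(class 1)]
print(model.predict([[155]]))        # thresholded at 0.5 -> class label
```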
Algorithm: Naive Bayes

Calculate the posterior probability, P(A|B), from P(A), P(B), and P(B|A). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors.
Bayes' theorem:

P(D|A) = P(A|D) * P(D) / P(A)

Applying the naive independence assumption to the disease example (alcohol use, smoking, age):

P(D | Alco & S & Age) ∝ P(Alco|D) * P(S|D) * P(Age|D) * P(D)
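A minimal sketch of this calculation in plain Python; all the probabilities below are made up, purely to illustrate the arithmetic:

```python
# Hypothetical likelihoods for the disease (D) and no-disease (ND) classes
p_alco_given_d, p_smoke_given_d, p_age_given_d = 0.7, 0.6, 0.5
p_alco_given_nd, p_smoke_given_nd, p_age_given_nd = 0.2, 0.3, 0.4
p_d, p_nd = 0.1, 0.9  # hypothetical priors

# Unnormalized posteriors (the numerators of Bayes' theorem)
score_d = p_alco_given_d * p_smoke_given_d * p_age_given_d * p_d
score_nd = p_alco_given_nd * p_smoke_given_nd * p_age_given_nd * p_nd

# Normalize so the two posteriors sum to 1
p_d_given_evidence = score_d / (score_d + score_nd)
print(p_d_given_evidence)  # posterior P(D | Alco & S & Age)
```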
Naive Bayes

PROS
• Very easy and fast
• Can be used for multi-class prediction
• Performs well with categorical features
• If the features really are independent, NB gives superior predictions

CONS
• Features are not independent in most real-life examples
• Has an issue with a category that was not seen in the training data
• Assumes that numerical features follow a normal distribution

P(Y | X) ∝ P(X1 | Y) * P(X2 | Y) * ... * P(Xn | Y) * P(Y)
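A minimal sketch with scikit-learn's GaussianNB, which makes exactly the normal-distribution assumption listed above (the data is hypothetical):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical features: [alcohol units/week, age]; labels: disease (1) or not (0)
X = np.array([[20, 55], [15, 60], [2, 30], [1, 25], [18, 50], [3, 35]])
y = np.array([1, 1, 0, 0, 1, 0])

nb = GaussianNB()
nb.fit(X, y)

print(nb.predict([[10, 45]]))        # predicted class
print(nb.predict_proba([[10, 45]]))  # posterior probability per class
```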
Algorithm: Support Vector Machines (SVM)

An SVM finds the hyperplane in N-dimensional space that separates the data points into different classes in the best way possible.
Decision Boundary

Which one would you select? Why?
Decision Boundary: Maximum Margin Classifier
Decision Boundary

What will you do in this case? How about a new feature Z = X² + Y²?
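A minimal sketch of this idea: points inside a circle are not linearly separable in (X, Y), but adding the feature Z = X² + Y² makes a linear SVM work (the synthetic data here is made up):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Made-up data: class 0 inside a circle of radius 1, class 1 outside it
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

# Add the new feature Z = X^2 + Y^2; the classes become linearly separable
Z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X_lifted = np.hstack([X, Z])

clf = LinearSVC().fit(X_lifted, y)
print(clf.score(X_lifted, y))  # close to 1.0 on this synthetic data
```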
Support Vector Machines

PROS
• Works very well on small, clean datasets
• Works well when there are clear separation margins

CONS
• Large datasets require a lot of training time, and performance eventually degrades
• Can't do a good job on noisy data (overlapping classes)
Algorithm: Decision Trees
Steps in Decision Trees

Step 1: Calculate the entropy of the target variable.
Step 2: Split the dataset on each candidate attribute and calculate the entropy of each resulting branch.
Step 3: Calculate the information gain for each of the above splits.
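A minimal sketch of Steps 1-3 in plain Python (the tiny dataset is hypothetical):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (Step 1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical target (play: yes/no) and one attribute (outlook)
target  = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "rain", "sunny", "sunny", "rain", "rain"]

base = entropy(target)  # Step 1: entropy of the target variable

# Step 2: split on the attribute and compute each branch's entropy
# Step 3: gain = base entropy - weighted entropy of the branches
weighted = 0.0
for value in set(outlook):
    branch = [t for t, o in zip(target, outlook) if o == value]
    weighted += len(branch) / len(target) * entropy(branch)

print("Information gain for outlook:", base - weighted)
```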
Decision Trees

PROS
• Implicitly perform feature selection
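A minimal sketch of that implicit feature selection, using scikit-learn's DecisionTreeClassifier on synthetic data (all parameters here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 5 features, only 2 of which are actually informative
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Features the tree never splits on get importance ~0: implicit selection
print(tree.feature_importances_)
```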
Bias-Variance Tradeoff

BIAS: how well the model fits the data, i.e., the error from overly simple assumptions.
VARIANCE: how much the model's predictions change across different training sets.
Features of Random Forests
• Can handle thousands of input variables without variable deletion.
• Has methods for balancing error in class-population-unbalanced data sets.
• Uses OOB (out-of-bag) samples for error calculation.
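A minimal sketch of the OOB feature with scikit-learn's RandomForestClassifier (synthetic data; parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# oob_score=True evaluates each tree on the bootstrap samples it did not
# see during training, giving a built-in validation estimate
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)

print("OOB accuracy estimate:", forest.oob_score_)
```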