Detailed_Classification_and_Performance_Measures_Notes

Uploaded by kunal b malviya
UNIT-II: Classification Algorithms and Performance Measures

Classification Algorithms and Performance Measures:

--------------------------------------------------

Classification is a supervised learning technique used to predict categorical labels (classes) for data points. Below are the most common classification algorithms, along with their working principles and examples:

1. **Classification Algorithms**:

---------------------------------

a) **Logistic Regression**:

- Logistic regression is used for binary classification problems. It models the probability that a given input belongs to a particular class using the logistic function.

- The decision boundary is determined by the sigmoid function \( \sigma(x) = \frac{1}{1 + e^{-x}} \).

- Example: Predicting whether an email is spam (1) or not spam (0).
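The decision rule above can be sketched in plain Python; the weights and bias here are hypothetical, standing in for values a training procedure would learn:

```python
import math

def sigmoid(x):
    """Logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_spam(features, weights, bias):
    """Classify as spam (1) if the modeled probability exceeds 0.5."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical weights for two features (e.g. counts of suspicious words)
weights, bias = [1.2, -0.7], -0.5
print(predict_spam([3.0, 1.0], weights, bias))  # z = 3.6 - 0.7 - 0.5 = 2.4 -> 1 (spam)
```

Any input with z > 0 maps to a probability above 0.5, so the decision boundary is the hyperplane z = 0.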

b) **Decision Tree Classification**:

- A decision tree builds a flowchart-like structure where each internal node represents a feature, each branch represents a decision, and each leaf represents a class label.

- Trees are built using splitting criteria like Gini Index or Information Gain.

- Example: Predicting whether a customer will buy a product based on age and income level.
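A minimal sketch of the Gini Index mentioned above, computed over the class labels that reach a node (a pure node scores 0, a perfectly mixed binary node scores 0.5):

```python
def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["buy", "buy", "buy"]))       # 0.0 (pure node)
print(gini(["buy", "no", "buy", "no"]))  # 0.5 (50/50 split)
```

A tree-building algorithm would try candidate splits on each feature and keep the one that lowers the weighted impurity of the child nodes the most.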

c) **Neural Network**:

- Neural networks consist of layers of interconnected nodes (neurons), where each connection has an associated weight.

- The network learns by updating weights through backpropagation and gradient descent.

- Example: Image classification tasks, such as recognizing handwritten digits (MNIST dataset).
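A minimal forward pass through such a network can be sketched as follows; the layer weights below are hypothetical, whereas a real network would learn them via backpropagation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, layers):
    """Propagate an input through fully connected layers.
    Each layer is (weights, biases): weights[j] holds the incoming
    weights of neuron j in that layer."""
    for weights, biases in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# Hypothetical 2-2-1 network: two inputs, one hidden layer, one output
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
print(forward([1.0, 2.0], layers))  # single value in (0, 1)
```

Training would compare this output against the true label, then push the error backwards through the layers to adjust each weight.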

d) **K-Nearest Neighbors (K-NN)**:

- K-NN is a simple, instance-based algorithm where a new data point is classified based on the majority class of its K nearest neighbors.

- Distance metrics like Euclidean distance are used to find the neighbors.

- Example: Classifying the type of flower (Iris dataset) based on petal and sepal dimensions.
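The neighbor-voting rule can be sketched in a few lines; the toy training points loosely mimic Iris-style petal measurements and are illustrative only:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3):
    """Majority vote among the k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(train, key=lambda pt: euclidean(pt[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy flower data: (petal length, petal width) -> species
train = [((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"),
         ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor")]
print(knn_classify(train, (1.5, 0.3)))  # "setosa"
```

Note that K-NN does no training at all: every prediction scans the stored examples, which is why it is called instance-based (or "lazy") learning.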

e) **Support Vector Machine (SVM)**:

- SVM finds the hyperplane that maximizes the margin between two classes.

- Kernels like linear, polynomial, or RBF (Radial Basis Function) are used for non-linear decision boundaries.

- Example: Classifying whether a tumor is malignant or benign.
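Once an SVM solver has fit the hyperplane, prediction reduces to checking which side of it a point falls on. A sketch for the linear-kernel case, with hypothetical learned weights:

```python
def svm_decision(x, w, b):
    """Evaluate the decision function w.x + b and report the side
    of the separating hyperplane the point falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "malignant" if score >= 0 else "benign"

# Hypothetical weights a linear-kernel SVM might learn on two features
# (e.g. tumor radius and texture, in normalized units)
w, b = [0.9, 1.1], -5.0
print(svm_decision([4.0, 3.0], w, b))  # score = 3.6 + 3.3 - 5 = 1.9 -> "malignant"
print(svm_decision([1.0, 1.0], w, b))  # score = -3.0 -> "benign"
```

With a non-linear kernel the score would instead be a weighted sum of kernel evaluations against the support vectors, but the sign test is the same.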

f) **Naive Bayes**:

- Naive Bayes is a probabilistic algorithm based on Bayes' theorem and the assumption of feature independence.

- Types:

1. **Gaussian Naive Bayes**: Assumes continuous features follow a normal distribution.

2. **Multinomial Naive Bayes**: Used for text classification where features represent word counts.

3. **Bernoulli Naive Bayes**: Used for binary features.

- Example: Spam email classification using Multinomial Naive Bayes.
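A minimal Multinomial Naive Bayes sketch with Laplace smoothing, trained on toy spam data (the word lists and labels are illustrative, not a real dataset):

```python
import math
from collections import Counter

def train_multinomial_nb(docs):
    """docs: list of (list_of_words, label). Returns class priors and
    Laplace-smoothed per-class word likelihoods."""
    labels = [y for _, y in docs]
    priors = {y: c / len(docs) for y, c in Counter(labels).items()}
    word_counts = {y: Counter() for y in priors}
    vocab = set()
    for words, y in docs:
        word_counts[y].update(words)
        vocab.update(words)
    likelihoods = {}
    for y, counts in word_counts.items():
        total = sum(counts.values())
        likelihoods[y] = {w: (counts[w] + 1) / (total + len(vocab))
                          for w in vocab}
    return priors, likelihoods

def classify(words, priors, likelihoods):
    """Pick the class maximizing log P(class) + sum log P(word|class)."""
    scores = {}
    for y in priors:
        score = math.log(priors[y])
        for w in words:
            if w in likelihoods[y]:
                score += math.log(likelihoods[y][w])
        scores[y] = score
    return max(scores, key=scores.get)

docs = [(["win", "money", "now"], "spam"), (["cheap", "money"], "spam"),
        (["meeting", "tomorrow"], "ham"), (["project", "meeting"], "ham")]
priors, likelihoods = train_multinomial_nb(docs)
print(classify(["money", "win"], priors, likelihoods))  # "spam"
```

Log-probabilities are used instead of raw products so that long documents do not underflow to zero; the "+1" smoothing keeps unseen word counts from zeroing out a class entirely.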


2. **Performance Measures**:

-----------------------------

Performance evaluation is crucial to understanding how well a classification algorithm works. Below are the key metrics:

a) **Confusion Matrix**:

- A confusion matrix is a table that shows the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

- Example for binary classification:

|                     | Predicted Positive | Predicted Negative |
|---------------------|--------------------|--------------------|
| **Actual Positive** | TP                 | FN                 |
| **Actual Negative** | FP                 | TN                 |
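The four counts can be tallied directly from paired lists of actual and predicted labels, e.g.:

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))  # (2, 1, 1, 2)
```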

b) **Classification Accuracy**:

- Accuracy measures the percentage of correctly classified instances out of all instances.

- Formula: \( Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \)

- Example: If 90 out of 100 emails are correctly classified as spam or not spam, accuracy is 90%.

c) **Classification Report**:

- The classification report includes metrics like precision, recall, F1-score, and support for each class.

1. **Precision**: Measures the proportion of true positive predictions among all positive predictions.

- Formula: \( Precision = \frac{TP}{TP + FP} \)

- Example: If 80 emails are classified as spam, but only 70 are truly spam, precision = 70/80 = 87.5%.

2. **Recall (Sensitivity)**: Measures the proportion of true positives correctly identified out of all actual positives.

- Formula: \( Recall = \frac{TP}{TP + FN} \)

- Example: If there are 100 spam emails and the model identifies 80 correctly, recall = 80/100 = 80%.

3. **F1 Score**: The harmonic mean of precision and recall, balancing both metrics.

- Formula: \( F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} \)

- Example: If precision = 87.5% and recall = 80%, F1 = 2 * (0.875 * 0.8) / (0.875 + 0.8) = 83.58%.

4. **Support**: The number of true instances of each class.
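The report metrics follow directly from the TP/FP/FN counts; plugging in the example figures above (70 true positives out of 80 predicted spam, 80 found out of 100 actual spam) gives:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

p, r = precision(70, 10), recall(80, 20)
print(round(p, 4), round(r, 4), round(f1_score(p, r), 4))  # 0.875 0.8 0.8358
```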

Example Confusion Matrix Calculation:

-------------------------------------

- Actual spam emails = 100, actual not spam = 200.

- The model predicted 80 spam emails correctly (TP), misclassified 20 not spam as spam (FP), missed 20 spam emails (FN), and correctly classified 180 not spam (TN).

- Metrics:

- Precision = 80 / (80 + 20) = 80%

- Recall = 80 / (80 + 20) = 80%

- Accuracy = (80 + 180) / (100 + 200) = 86.67%
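The worked example above can be verified in a few lines:

```python
# Counts from the worked example
tp, fp, fn, tn = 80, 20, 20, 180

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Precision = {precision:.2%}")  # Precision = 80.00%
print(f"Recall    = {recall:.2%}")     # Recall    = 80.00%
print(f"Accuracy  = {accuracy:.2%}")   # Accuracy  = 86.67%
```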
