FML - III

The document provides an overview of classification in machine learning, detailing its definition, types (binary, multi-class, multi-label, and imbalanced classification), and steps to build a classification model. It also explains algorithms like k-Nearest Neighbour and Decision Trees, their advantages and disadvantages, and compares Support Vector Machines with k-NN. Additionally, it discusses the importance of feature selection and presents a real-world use case for classification in loan approval prediction.


1. A) Define Classification and list its types with examples.

Definition: Classification is a supervised machine learning technique used to predict a category or class label for given input data. The model learns from already labeled data and uses this learning to predict labels for new data.

Types of Classification with Examples:

1. Binary Classification:

o Only two output classes.

o Example: Classifying emails as Spam or Not Spam.

2. Multi-Class Classification:

o More than two categories.

o Example: Classifying fruits as Apple, Banana, Orange.

3. Multi-Label Classification:

o An instance can belong to more than one class at the same time.

o Example: A movie can be labeled as both Comedy and Romance.

4. Imbalanced Classification:

o When one class has far more examples than the others.

o Example: Fraud detection (fraudulent transactions are far rarer than normal ones).
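To make these label formats concrete, here is a minimal sketch (using NumPy and made-up data) of how the target labels typically look for each type:

```python
import numpy as np

# Binary classification: one label per sample, two possible classes.
y_binary = np.array(["spam", "not spam", "spam"])

# Multi-class classification: one label per sample, more than two classes.
y_multiclass = np.array(["apple", "banana", "orange", "apple"])

# Multi-label classification: each sample may carry several labels at once,
# usually encoded as a binary indicator matrix (columns: comedy, romance, action).
y_multilabel = np.array([
    [1, 1, 0],   # a movie tagged both Comedy and Romance
    [0, 0, 1],   # a movie tagged only Action
])

# Imbalanced classification: one class vastly outnumbers the other,
# e.g. 990 normal transactions vs 10 fraudulent ones.
y_imbalanced = np.array([0] * 990 + [1] * 10)
print(np.bincount(y_imbalanced))   # -> [990  10]
```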

1. B) Explain the steps involved in building a classification model.

1. Data Collection:
Gather the dataset which contains input features and class labels.

2. Data Preprocessing:
Clean the data by handling missing values, converting categories to numbers (encoding), and
scaling values if needed.

3. Splitting Data:
Divide the data into training and testing sets (e.g., 80% train, 20% test).

4. Model Selection:
Choose an algorithm like Decision Tree, SVM, k-NN, etc.

5. Training the Model:
Train the algorithm on the training data to learn patterns.

6. Testing the Model:
Test the trained model using test data to check how well it predicts.

7. Model Evaluation:
Use metrics like accuracy, precision, recall, and F1-score.

8. Tuning and Optimization:
Adjust model parameters to improve performance.
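A minimal end-to-end sketch of these steps, assuming scikit-learn and its built-in Iris dataset (any labeled dataset would do):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1-2. Data collection and preprocessing (Iris is already clean; we just scale).
X, y = load_iris(return_X_y=True)

# 3. Split into training and testing sets (80% train, 20% test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 4-5. Select and train a model.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# 6-7. Test and evaluate with accuracy, precision, recall, and F1-score.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```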

2. Explain the working principle of the k-Nearest Neighbour algorithm with an example.

k-NN (k-Nearest Neighbour) Algorithm:

• It is a lazy learning algorithm: it builds no model during training and defers all computation to prediction time.

• When given a new data point, it compares it to all the data in the training set.

• It selects the ‘k’ closest points (neighbors) and assigns the class that is most frequent among those neighbors.

Steps:

1. Choose a value for k (e.g., k = 3).

2. Calculate the distance (usually Euclidean) from the new point to all existing points.

3. Pick the k nearest neighbors.

4. Count how many belong to each class.

5. Assign the most frequent class to the new point.

Example: Predicting if a person likes tea or coffee based on age and income:

• New person: age 30, income ₹40k.

• Find the 3 closest people in the dataset.

• If 2 like tea and 1 likes coffee → Prediction = Tea.
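A from-scratch sketch of these steps on the tea/coffee example (the training data below is made up, and income is kept in thousands so both features contribute comparably to the distance):

```python
from collections import Counter
import math

# Hypothetical training data: (age, income in thousands of rupees) -> drink.
train = [
    ((25, 30), "tea"), ((28, 45), "tea"), ((35, 38), "tea"),
    ((32, 50), "coffee"), ((45, 80), "coffee"), ((50, 90), "coffee"),
]

def knn_predict(point, train, k=3):
    # 2. Euclidean distance from the new point to every training point.
    dists = sorted((math.dist(point, x), label) for x, label in train)
    # 3-4. Take the k nearest neighbours and count their classes.
    votes = Counter(label for _, label in dists[:k])
    # 5. Return the most frequent class.
    return votes.most_common(1)[0][0]

# New person: age 30, income ₹40k.
print(knn_predict((30, 40), train))   # 2 tea vs 1 coffee among the 3 nearest -> "tea"
```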

3. A) Describe the Decision Tree algorithm.

A Decision Tree is a tree-shaped structure used to make decisions. It splits the data into branches
based on questions or conditions.

Working:

1. It chooses the feature that best splits the data (measured by the Gini index or information gain).

2. It creates a node for that feature.

3. For each possible value, it creates branches.

4. This continues until the tree reaches leaves with class labels.

Example: To predict whether to play:

• Is it sunny?
  o Yes → Is humidity high?
      • Yes → Don’t Play
      • No → Play
  o No → Play
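A minimal sketch of this play/don’t-play example with scikit-learn’s DecisionTreeClassifier (the weather rows are made up and hand-encoded as 0/1; a real dataset would use proper encoders):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical weather data: [sunny, high_humidity] -> play (1) or not (0).
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [0,      1,      1,      1,      0,      1]

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# Print the learned rules; they mirror the example above:
# sunny and high humidity -> don't play, otherwise -> play.
print(export_text(tree, feature_names=["sunny", "high_humidity"]))
print(tree.predict([[1, 0]]))   # sunny, low humidity -> [1] (play)
```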

3. B) Advantages and Disadvantages of Decision Trees (Expanded)

Advantages:

1. Simple to Understand:

o The structure looks like a flowchart with decisions (questions) and outcomes
(answers), so it is easy to explain.

2. No Need for Data Scaling:

o Works without normalizing or scaling the input features.

3. Works for Different Types of Data:

o Handles both categorical (like gender) and numerical (like age) data.

4. Fast and Efficient:

o Especially when the dataset is not too large.

Disadvantages:

1. Overfitting:

o If the tree is too deep, it might memorize the training data and perform badly on
new data.

2. Unstable:

o Small changes in data can lead to a completely different tree.

3. Biased to Dominant Classes:

o If some classes appear more in the data, it may prefer those.

4. Not Always Optimal:

o Splitting is greedy (it picks the locally best feature at each node), so the resulting tree may not be the best one for generalizing.

4. A) Explain Support Vector Machines (SVM) and its key components.

Definition: SVM is a powerful supervised learning algorithm used mainly for binary classification. It
tries to find the best separating line or hyperplane between two classes.

Key Components:

1. Hyperplane:

o A line (in 2D) or a plane (in 3D and higher) that divides the data.

o The best hyperplane separates the classes with the largest margin.

2. Support Vectors:

o Data points that are closest to the hyperplane.

o These points help in deciding the position of the hyperplane.

3. Margin:

o The distance between the support vectors and the hyperplane.

o Larger margin = better generalization = better model.
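A small sketch of these components in scikit-learn, on made-up 2-D data; the support_vectors_ attribute exposes the points that fix the hyperplane, and the margin width follows from the learned weights:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable blobs in 2-D.
X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [5, 5], [6, 5], [5, 6]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# Support vectors: the training points closest to the hyperplane.
print(svm.support_vectors_)

# Hyperplane w·x + b = 0; the margin width is 2 / ||w||.
w, b = svm.coef_[0], svm.intercept_[0]
print("margin width:", 2 / np.linalg.norm(w))
```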

4. B) Kernel Tricks in SVM (Expanded)

Sometimes data points are not linearly separable in the current dimension. In such cases, we use
kernel tricks.

What is a Kernel Trick?

A kernel is a mathematical function that computes similarities between points as if they had been mapped into a higher-dimensional space, where a linear separator (a hyperplane) can be found, without ever computing that mapping explicitly.

Example:

• Suppose the data points form a circle shape (one class inside, the other around it).

• In 2D, we can’t draw a straight line to separate the classes.

• Kernel functions like the RBF (Gaussian) or polynomial kernel can map the data into a higher dimension where a straight line (hyperplane) can separate them.

Types of Kernels:

1. Linear Kernel:

o Used when data is linearly separable.

o Example: Text classification.

2. Polynomial Kernel:

o Adds polynomial features (like x², x³).

o Good for curved boundaries.

3. RBF (Radial Basis Function):

o Good for very complex patterns.

o Used when we don’t know the best shape of the decision boundary.

4. Sigmoid Kernel:

o Uses a function similar to the activation in neural networks.

Benefits:

• Helps SVM work on non-linear data.

• No need to manually transform the data.

• Keeps calculations efficient.
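A minimal sketch of the circle-shaped example using scikit-learn’s make_circles: a linear SVM performs near chance level, while the RBF kernel separates the classes with no manual transformation:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in 2-D.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear = SVC(kernel="linear").fit(X_train, y_train)
rbf = SVC(kernel="rbf").fit(X_train, y_train)

# The RBF kernel implicitly maps the data to a higher dimension,
# where the circles become linearly separable.
print("linear:", linear.score(X_test, y_test))   # roughly chance level
print("rbf:   ", rbf.score(X_test, y_test))      # close to 1.0
```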

5. Compare SVM and k-NN in performance and applications

Feature              SVM                           k-NN
Training Time        Long                          Very short
Prediction Time      Fast                          Slow
Memory Usage         Low (only support vectors)    High (stores all data)
Best For             Text, image classification    Simple datasets, recommendation
Handling Large Data  Good                          Not good
Type of Model        Global model                  Local model
Works Well With      High-dimensional data         Low-dimensional data

6. A) Steps in Classification Learning (Expanded)

1. Problem Definition:

o Understand what you want to classify (e.g., spam detection).

2. Data Collection:

o Gather historical data with input features and known labels.

3. Data Preprocessing:

o Clean data, handle missing values, encode categories, and scale features if needed.

4. Split Data:

o Divide the data into training and test sets (e.g., 70/30 split).

5. Model Selection:

o Choose a suitable algorithm like Decision Tree, SVM, or k-NN based on data and
accuracy needs.

6. Training:

o Use the training data to help the algorithm learn the pattern.

7. Evaluation:

o Use the test data and metrics like accuracy, precision, recall, and F1-score.

8. Improvement (Tuning):

o Try changing algorithm parameters or selecting better features for higher performance (see the tuning sketch after this list).

9. Deployment:

o Use the trained model in a real-world system (like a medical app or email filter).
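For step 8, a common tuning approach is a grid search with cross-validation; a minimal sketch with scikit-learn, tuning k for a k-NN classifier on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several values of k with 5-fold cross-validation and keep the best.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5)
search.fit(X, y)

print("best k:", search.best_params_["n_neighbors"])
print("cross-validated accuracy:", search.best_score_)
```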

6. B) Real-World Use-Case of Classification (Expanded)

Use Case: Loan Approval Prediction

• Problem: Banks need to decide whether to approve a loan for an applicant.

• Input Features: Age, job type, income, credit score, number of dependents.

• Output Label: Approve loan (Yes/No).

How Classification Helps:

• The model learns from past loan data.

• It predicts whether a new applicant is likely to repay the loan.

• Saves time and reduces human error.

• Can improve bank profits by reducing bad loans.
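A heavily simplified sketch of how such a model might be trained (the applicant data below is entirely synthetic; a real system would use historical records, careful encoding, and fairness checks):

```python
from sklearn.ensemble import RandomForestClassifier

# Synthetic applicants: [age, income (k), credit_score, dependents] -> repaid?
X = [
    [35, 60, 720, 1], [50, 90, 780, 2], [28, 45, 690, 0],
    [23, 20, 550, 0], [40, 30, 580, 3], [31, 25, 600, 2],
]
y = [1, 1, 1, 0, 0, 0]   # 1 = repaid, 0 = defaulted

model = RandomForestClassifier(random_state=0).fit(X, y)

# New applicant: age 30, income ₹40k, credit score 650, 1 dependent.
print(model.predict([[30, 40, 650, 1]]))
print(model.predict_proba([[30, 40, 650, 1]]))   # approval confidence
```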

7. How does feature selection affect the performance of classification algorithms? (Expanded)

Explanation:

Features are the input values used to make predictions (like age, income, marks, etc.).
If the right features are chosen, the model will learn better patterns and make more accurate
predictions.

Importance of Feature Selection:

1. Better Accuracy:

o Irrelevant or noisy features can mislead the model.

o Example: Using a person’s “eye color” to predict salary may confuse the model.

2. Faster Computation:

o Fewer features = less time needed to train and predict.


3. Prevents Overfitting:

o Using only important features helps the model generalize better to new data.

4. Simplifies the Model:

o Makes it easier to understand and explain.

Example: For classifying whether a student will pass or fail:

• Good features: Study hours, attendance, past scores.

• Bad features: Favourite food, T-shirt colour.
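A minimal sketch of this effect, assuming scikit-learn: adding pure-noise features hurts k-NN accuracy on the Iris dataset, and univariate selection (SelectKBest) recovers the useful ones:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Append 20 irrelevant noise features (the "eye colour" of this dataset).
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 20))])

knn = KNeighborsClassifier(n_neighbors=5)
print("original:       ", cross_val_score(knn, X, y, cv=5).mean())
print("with noise:     ", cross_val_score(knn, X_noisy, y, cv=5).mean())

# Keep only the 4 features most associated with the label.
X_selected = SelectKBest(f_classif, k=4).fit_transform(X_noisy, y)
print("after selection:", cross_val_score(knn, X_selected, y, cv=5).mean())
```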
