The document discusses various machine learning concepts, focusing on Support Vector Machines (SVM) and Decision Trees. It details different types of kernels used in SVM, the structure and functioning of decision trees, and introduces Bayesian learning methods, including Naïve Bayes classifiers and Bayesian belief networks. Key challenges and advantages of each method are also highlighted, emphasizing the importance of kernel selection, overfitting, and the handling of uncertainty.

Types of Kernels in Support Vector Machines (SVM)

In SVM, kernels play a crucial role in transforming the input data into a higher-dimensional space,
enabling the algorithm to find optimal hyperplanes for classification or regression. Here are some
common types of kernels used in SVM:

1. Linear Kernel - Used when the data is linearly separable.

• Equation: K(x1, x2) = x1 · x2

• Description: The simplest kernel, calculating the dot product between two vectors.

2. Polynomial Kernel - Suitable for non-linear data.

• Equation: K(x1, x2) = (x1 · x2 + c)^d

• The degree of the polynomial determines the flexibility.

3. Radial Basis Function (RBF) Kernel - Most commonly used for non-linear problems.

• Captures the similarity between points based on distance.

• Equation: K(x1, x2) = exp(-γ ||x1 - x2||²)

• The parameter 'γ' controls the width of the RBF kernel.

4. Sigmoid Kernel - Similar to a neural network activation function.

• Equation: K(x1, x2) = tanh(γ x1 · x2 + c)

5. Custom Kernel

• You can define your own kernel function tailored to the specific problem.

Choosing the Right Kernel

The choice of kernel depends on the nature of the data and the complexity of the problem.

• Linearly separable data: Use the linear kernel.

• Non-linear, low-dimensional data: Consider the polynomial kernel.

• Complex, high-dimensional data: The RBF kernel is often a good choice.
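
As a rough illustration of these choices, the sketch below (assuming scikit-learn and a small synthetic two-moons dataset, both illustrative assumptions) fits an SVC with each kernel type and compares test accuracy; on such non-linear data the RBF kernel usually comes out ahead.

# A minimal sketch comparing SVM kernels; dataset and parameter values are illustrative.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    # degree only affects the polynomial kernel; gamma affects poly, rbf and sigmoid
    clf = SVC(kernel=kernel, degree=3, gamma="scale", C=1.0)
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))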


In the context of Support Vector Machines (SVM), a hyperplane is a decision boundary that
separates different classes in the feature space. It is defined mathematically as:

W · X + b = 0

Where:

• W is the weight vector (normal to the hyperplane).

• X is the input vector (data point).

• b is the bias term (offset from the origin).
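
As a minimal sketch (assuming scikit-learn; the tiny dataset below is an illustrative assumption), the weight vector W and bias b can be read off a fitted linear SVM, and the sign of W · X + b tells which side of the hyperplane a new point falls on.

# Recover W and b from a fitted linear SVM and evaluate W·X + b for a new point.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])  # two separable clusters
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
W, b = clf.coef_[0], clf.intercept_[0]

x_new = np.array([3, 3])
score = W @ x_new + b          # sign of the score gives the predicted side
print(W, b, score, clf.predict([x_new]))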

Key Features of a Hyperplane:

1. Dimensionality:

o In a 2D space, the hyperplane is a line.

o In a 3D space, the hyperplane is a plane.

o In higher dimensions, it is a generalization called a "hyperplane."

2. Maximizing Margin:

o SVM aims to find the hyperplane that maximizes the margin between the classes.

o The margin is the distance between the hyperplane and the closest data points from
each class, called support vectors.

3. Linear Separability:

o If the data is linearly separable, the SVM finds a single hyperplane.

o If the data is not linearly separable, kernel functions are used to transform the data
into a higher-dimensional space.

Hyperplane in Different Scenarios:

• For binary classification, the hyperplane separates the two classes.

• For multi-class classification, multiple hyperplanes are used (one-vs-one or one-vs-all strategies).

Properties of SVM (Support Vector Machine):


1. Maximizes the Margin:

o SVM aims to find a hyperplane that maximizes the margin (distance between the
hyperplane and the nearest data points of any class).

2. Uses Support Vectors:

o The decision boundary is determined only by the support vectors, so the model stays
compact: only those points need to be stored to make predictions.
3. Effective in High Dimensions:

o SVM works well with datasets that have a high number of features, as it avoids
overfitting by maximizing the margin.

4. Kernel Trick:

o SVM can handle non-linear relationships by using kernel functions to map data into a
higher-dimensional space.

5. Regularization Parameter (C):

o The parameter C controls the trade-off between achieving a large margin and
minimizing classification error (a small sketch after this list illustrates its effect).

6. Versatile:

o Can be used for classification, regression (SVR), and outlier detection.
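
The following sketch (assuming scikit-learn and a synthetic dataset, both illustrative) shows the margin trade-off controlled by C: smaller values tolerate more margin violations and typically keep more support vectors, while larger values fit the training data more tightly.

# Effect of the regularization parameter C, visible through the number of support vectors.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # smaller C -> wider margin, more support vectors; larger C -> narrower margin
    print(C, clf.n_support_)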

Issues in SVM:
1. Choosing the Right Kernel:

o Selecting an appropriate kernel (linear, polynomial, RBF, etc.) and its parameters is
crucial but can be challenging.

2. Computational Complexity:

o Training time is high for large datasets, especially with non-linear kernels.

3. Sensitivity to Parameters:

o Performance heavily depends on hyperparameters like C (regularization) and γ (kernel parameter).

4. Not Suitable for Noisy Data:

o SVM is sensitive to outliers; noisy data can significantly affect the margin and
hyperplane.

5. Class Imbalance:

o Struggles with datasets where one class has significantly more samples than the
other, as the decision boundary can get skewed.

6. Interpretability:

o The results of an SVM model, especially with non-linear kernels, are less
interpretable compared to simpler models.

7. Scaling of Features:

o SVM requires proper scaling of features to perform optimally, as it is sensitive to the
magnitude of the features (see the scaling and tuning sketch after this list).
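
A hedged sketch addressing issues 3 and 7 together (assuming scikit-learn and its built-in breast-cancer dataset, both illustrative choices): scale the features in a pipeline and search over C and γ with cross-validation rather than guessing them.

# Scale features, then tune C and gamma with a grid search.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, "scale"]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
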
Decision Tree: A decision tree is a supervised learning model used for classification and
regression tasks. It works by splitting data into subsets based on feature values, creating a
tree-like structure of decisions.

Key Components:

1. Root Node: Represents the entire dataset and the starting point for splitting.

2. Internal Nodes: Represent tests on features to split data further.

3. Branches: Represent the outcome of a test (feature value).

4. Leaf Nodes: Represent the final decision or prediction (class or value).

How a Decision Tree Works:

1. Splitting: At each node, the algorithm selects the feature that best splits the dataset.

o Metrics for splitting include:

▪ Gini Index: Measures impurity of a split.

▪ Information Gain: Measures reduction in entropy after the split.

▪ Variance Reduction: Used for regression tasks.

2. Stopping Criterion: Splitting stops when:

▪ All data points in a node belong to one class (pure node).

▪ No features are left for further splitting.

▪ A maximum depth or minimum samples per node is reached.

3. Prediction:

o In classification: Assign the most common class in the leaf node.

o In regression: Assign the mean or median of the target variable in the leaf node.
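
As a small worked example of the splitting metrics above (the class counts are made-up for illustration), the sketch below computes the entropy of a parent node and the information gain of a candidate split into two subsets.

# Entropy and information gain for a candidate split; counts are illustrative only.
import math

def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

parent = ["yes"] * 9 + ["no"] * 5      # 9 positive, 5 negative examples
left   = ["yes"] * 6 + ["no"] * 1      # subset where the feature takes value A
right  = ["yes"] * 3 + ["no"] * 4      # subset where the feature takes value B

weighted = (len(left) / len(parent)) * entropy(left) + \
           (len(right) / len(parent)) * entropy(right)
info_gain = entropy(parent) - weighted
print(round(entropy(parent), 3), round(info_gain, 3))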

Algorithm Steps of Decision Tree


1. Input: Training dataset with features and labels.
2. Start at the Root Node: Calculate the splitting criteria (e.g., Information Gain, Gini Impurity)
for all features.
Choose the feature with the highest information gain (or lowest Gini Impurity).
3. Split the Data: Split the data into subsets based on the chosen feature. Each subset forms a
branch from the node.
4. Repeat: Apply the same process to each subset (node) until one of the stopping criteria is
met (e.g., pure nodes, max depth reached).
5. Output: A decision tree where each leaf node provides the final classification or prediction.
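
A hedged sketch of these steps using a library implementation (assuming scikit-learn and its built-in iris dataset, both illustrative choices): criterion="entropy" corresponds to splitting by information gain, and export_text prints the learned tree so the splits and leaf nodes can be inspected.

# Fit a small decision tree and print its splits.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(tree.score(X_test, y_test))
print(export_text(tree))   # text rendering of the learned splits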

Termination Conditions:

1. All data points in a node belong to a single class.

2. No features are left to split on.

3. A pre-defined tree depth or minimum samples per node is reached.


ID3 Algorithm (Iterative Dichotomiser 3): The ID3 algorithm is one of the earliest and
best-known decision tree learning algorithms, developed by Ross Quinlan. ID3 is used to create
a decision tree from a given dataset.

Working Principle of ID3 Algorithm –

• The goal of ID3 is to construct a decision tree that can classify a set of training examples into
given classes based on the features.
• The ID3 algorithm uses Information Gain based on Entropy as the splitting criterion to
determine the best feature at each node of the tree.

Steps in ID3 Algorithm


1. Start at the Root Node: Begin with the full training dataset at the root.
2. Calculate Entropy and Information Gain for all features.
3. Choose the Best Feature: The feature with the highest Information Gain is selected to split
the data.
4. Split Data: Create branches for each possible value of the feature, and assign subsets of the
training data to these branches.
5. Repeat Recursively: Continue splitting each subset based on the next feature with the
highest information gain until all data points are classified or other stopping criteria are met
(e.g., all samples belong to the same class).
6. Create Leaf Nodes: When all data points are classified, the nodes are turned into leaf nodes,
which represent the final decision.
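
A compact sketch of these steps on categorical data follows; the toy records and feature names are illustrative assumptions, and the implementation is a minimal reading of ID3 rather than a production version.

# Minimal ID3: rows are dicts of feature -> value plus a "label" key.
import math
from collections import Counter

def entropy(rows):
    counts = Counter(r["label"] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, feature):
    total = len(rows)
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r for r in rows if r[feature] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, features):
    labels = [r["label"] for r in rows]
    if len(set(labels)) == 1:                  # pure node -> leaf
        return labels[0]
    if not features:                           # no features left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(rows, f))   # highest information gain
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [f for f in features if f != best])
    return tree

data = [
    {"outlook": "sunny", "windy": "no",  "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "stay"},
    {"outlook": "rain",  "windy": "no",  "label": "play"},
    {"outlook": "rain",  "windy": "yes", "label": "stay"},
]
print(id3(data, ["outlook", "windy"]))   # splits on "windy", the higher-gain feature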

Features of ID3
• Attribute Selection - ID3 uses Information Gain to determine the most informative attribute
at each level.
• Works Well for Categorical Data: ID3 is suited for classification problems, particularly with
categorical data.
Inductive Bias-
Inductive Bias refers to the set of assumptions a machine learning algorithm makes to generalize
from the training data to unseen data.

Inductive Bias in Decision Trees:


The Inductive Bias of a decision tree is:

• Shorter Trees Are Preferred: Decision trees aim to create the shortest possible tree that fits
the data.
• Preference for Features with High Information Gain: The ID3 algorithm selects features
based on their information gain, assuming that features with higher information gain lead to
better classification results.

Importance of Inductive Bias:

• Inductive bias helps decision trees generalize well to unseen data by preventing them from
creating unnecessarily complex models that overfit the training data.

Inductive Bias in Machine Learning Models:

1. Support Vector Machines (SVM):

o Assumes data is separable by a hyperplane or can be made separable with a kernel function.

2. Neural Networks:

o Assumes a smooth mapping between inputs and outputs.

o Architecture (e.g., number of layers, activation functions) defines the bias.

3. Decision Trees:

o Prefers smaller trees with fewer splits (Occam's Razor).

4. k-Nearest Neighbors (k-NN):

o Assumes that data points close to each other have the same label.

Issues in Decision Tree Learning

1. Overfitting:

o Trees grow too complex, capturing noise and reducing generalization.

2. Underfitting:

o Trees are too simple, failing to capture patterns in data.

3. Bias Toward Features with Many Levels:

o Features with many unique levels dominate splits unfairly.

4. Instability:

o Small changes in data can lead to significantly different tree structures.

5. Scalability:

o Computationally expensive for large datasets or high-dimensional data.

6. Lack of Smooth Decision Boundaries:

o Decision trees create step-like boundaries that may not fit complex patterns.

7. Data Imbalance Sensitivity:

o Performance can drop if one class significantly outweighs others.

8. Greedy Algorithm Limitation:

o Greedy splitting may result in suboptimal trees, as it only considers local optima.

9. Difficulty in Handling Missing Data:

o Decision trees struggle to handle missing values effectively without imputation or


specific strategies.

10. Pruning Complexity:

o Pruning decisions can be challenging, and incorrect pruning may reduce tree accuracy (a small pruning sketch follows this list).
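
As a small, hedged sketch of pruning-style controls (assuming scikit-learn and its breast-cancer dataset, both illustrative choices), limiting tree depth or setting the cost-complexity parameter ccp_alpha are two common ways to trade training fit for generalization.

# Compare an unpruned tree with a depth-limited and a cost-complexity-pruned tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for params in [{}, {"max_depth": 3}, {"ccp_alpha": 0.01}]:
    tree = DecisionTreeClassifier(random_state=0, **params)
    score = cross_val_score(tree, X, y, cv=5).mean()
    print(params, round(score, 3))
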
Bayesian Learning - It is a probabilistic framework for machine learning that leverages
Bayes' theorem to update the probability of a hypothesis based on observed evidence or
data. Bayesian learning methods are valuable for dealing with uncertainty and making
predictions that incorporate prior knowledge.
Bayes' Theorem: It is a mathematical formula used to update the probability of an event
happening based on new evidence: P(H|D) = P(D|H) · P(H) / P(D). It tells you how to combine
what you already know (prior knowledge) with new information to make better predictions.
Bayes Optimal Classifier: It is a theoretical model that provides the best possible
prediction for a classification problem. It predicts the class with the highest posterior
probability, considering all possible hypotheses and their probabilities.

Steps in Bayes Optimal Classification:

1. Compute the Posterior Probability of Each Hypothesis:
Use Bayes' theorem to find how probable each hypothesis is given the training data.

2. Combine the Hypotheses:
For each class, weight the prediction of every hypothesis by that hypothesis's posterior probability.

3. Predict the Most Probable Class:
Choose the class C with the maximum posterior probability.
Example:
Scenario: Classify an email as spam or not spam based on words like "win" or "offer."
• Classes: C1 = Spam, C2 = Not Spam
• Hypotheses:
o h1: “Emails with ‘win’ and ‘money’ are spam.”
o h2: “Emails with ‘offer’ are spam.”
Step 1: Compute Prior Probabilities P(C1) and P(C2).
Step 2: Calculate Likelihoods P(D|C1) and P(D|C2).
Step 3: Compute Posterior Probabilities using Bayes’ Theorem.
Step 4: Predict the class with the highest P(C|D).
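
A minimal sketch of these four steps in code, using made-up prior and likelihood values for the spam example (the numbers are illustrative assumptions, not estimates from data):

priors = {"spam": 0.4, "not_spam": 0.6}             # Step 1: P(C1), P(C2)
likelihoods = {"spam": 0.30, "not_spam": 0.05}      # Step 2: P(D|C) for the observed words

# Step 3: posterior P(C|D) = P(D|C) * P(C) / P(D), with P(D) as the normalizer
evidence = sum(likelihoods[c] * priors[c] for c in priors)
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Step 4: predict the class with the highest posterior probability
print(posteriors, max(posteriors, key=posteriors.get))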
Advantages:
1. Optimal Performance: Guarantees the best prediction if probabilities are known.
2. Accounts for Uncertainty: Considers all hypotheses and weighs their probabilities.
Challenges:
1. Computational Complexity: Requires calculating probabilities for all possible
hypotheses.
2. Dependency on Accurate Probabilities: Requires accurate prior and likelihood probabilities,
which are rarely known exactly in practice.
Naïve Bayes Classifier: It is a method used to classify things (like whether an email is
spam or not) based on probabilities. It uses a formula called Bayes' Theorem to calculate
the likelihood of something belonging to a particular category.

How It Works:
1. Look at the Features:
It checks for specific clues (like whether an email has words like "win" or "offer").
2. Assume Independence:
It assumes that each clue works independently (even if that’s not true in real life).
3. Calculate Probabilities:
It calculates the probability of the email being spam or not spam based on the clues.
4. Pick the Most Likely Category:
It predicts the category (spam or not spam) with the highest probability.

Steps in Naïve Bayes Classification:


1. Calculate Prior Probabilities P(C):
Estimate the prior probability for each class C from the training data.
2. Calculate Likelihood P(Xi|C):
For each feature Xi and class C, compute the likelihood.
3. Apply Bayes' Theorem:
Combine prior and likelihood to calculate P(C|X) for each class.
4. Choose the Class with Maximum Posterior Probability:
Predict the class C that maximizes P(C|X).
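
A hedged sketch of these steps using a library implementation (assuming scikit-learn; the tiny email corpus is an illustrative assumption): MultinomialNB estimates the priors P(C) and likelihoods P(Xi|C) from word counts and predicts the class with the highest posterior.

# Naive Bayes spam classification over word-count features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "exclusive offer win", "meeting at noon", "project status update"]
labels = ["spam", "spam", "not_spam", "not_spam"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)     # word-count features Xi

model = MultinomialNB()                  # learns P(C) and P(Xi|C) from the counts
model.fit(X, labels)

test = vectorizer.transform(["win a free offer"])
print(model.predict(test), model.predict_proba(test))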
Bayesian Belief Networks (BBNs): A Bayesian Belief Network, also called a Bayesian Network, is a
type of graphical model that represents a set of variables and their probabilistic dependencies using
a directed acyclic graph (DAG). It is based on Bayesian probability theory.
Key Components:
1. Nodes: Each node represents a variable (e.g., weather, traffic). Variables can be
observable, hidden, or query variables.
2. Edges: Directed edges (arrows) between nodes represent dependencies. For
example, if rain affects traffic, an arrow will go from "Rain" to "Traffic."
3. Conditional Probabilities: Each node has a conditional probability table (CPT) that
quantifies the effect of parent nodes on it.
o Example: The probability of traffic given it is raining
4. Independence:
A node is conditionally independent of its non-descendants, given its parent nodes.
How It Works:
1. Nodes = Things to Know: Each circle (node) in the map represents something you
care about. For example:
o "Rain" (Is it raining?)
o "Traffic" (Is there heavy traffic?)
2. Arrows = Connections: Arrows between nodes show how one thing affects another.
For example:
o If it rains, it might cause traffic, so there's an arrow from "Rain" to "Traffic."
3. Probabilities = How Likely Things Are:
Each connection has a table of probabilities. For example:
o If it rains, the chance of traffic might be 90%.
o If it doesn’t rain, the chance of traffic might only be 30%.
4. Make Predictions: If you know one thing (like "It’s raining"), the network can
calculate how likely other things are (like "There will be traffic").
Example: Imagine you want to predict if you'll be late for work:
• Nodes: "Rain," "Traffic," "Late for Work."
• Connections: Rain → Traffic (Rain can cause traffic).
o Traffic → Late for Work (Traffic can make you late).
If you know it's raining, the network can predict the chances of heavy traffic and whether
you might be late for work.
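
A minimal sketch of this Rain → Traffic → Late-for-Work network in code, with hypothetical conditional probabilities (all numbers are illustrative assumptions), queried by summing over the hidden Traffic variable:

# Tiny Bayesian network: Rain -> Traffic -> Late, with made-up CPT values.
p_rain = 0.3                                          # prior P(Rain)
p_traffic_given_rain = {True: 0.9, False: 0.3}        # P(Traffic | Rain)
p_late_given_traffic = {True: 0.7, False: 0.1}        # P(Late | Traffic)

def p_late(rain):
    # sum over the hidden Traffic variable, conditioning on the observed Rain value
    return sum(
        (p_traffic_given_rain[rain] if traffic else 1 - p_traffic_given_rain[rain])
        * p_late_given_traffic[traffic]
        for traffic in (True, False)
    )

print("P(Late | Rain=True)  =", p_late(True))    # 0.9*0.7 + 0.1*0.1 = 0.64
print("P(Late | Rain=False) =", p_late(False))   # 0.3*0.7 + 0.7*0.1 = 0.28
print("P(Late) =", p_rain * p_late(True) + (1 - p_rain) * p_late(False))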
Advantages:
1. Handles Uncertainty: Reasons with probabilities, so useful conclusions can be drawn even from incomplete or noisy evidence.
2. Intuitive Representation: The graph structure makes the dependencies between variables easy to read and explain.
3. Flexible Querying: Any variable can be queried given evidence about any of the others.
Challenges:
1. Complexity: Computationally expensive for large networks.
2. Requires Probabilities: Needs accurate conditional probabilities for all variables.
