Types of Kernels in Support Vector Machines
In SVM, kernels play a crucial role in transforming the input data into a higher-dimensional space,
enabling the algorithm to find optimal hyperplanes for classification or regression. Here are some
common types of kernels used in SVM:
1. Linear Kernel: The simplest kernel, calculating the dot product between two vectors. Best when the data is linearly separable.
2. Polynomial Kernel: Raises the dot product to a chosen degree, capturing polynomial interactions between features.
3. Radial Basis Function (RBF) Kernel: Most commonly used for non-linear problems; measures similarity based on the distance between points.
4. Sigmoid Kernel: Based on the hyperbolic tangent function, giving behavior similar to a neural network activation.
5. Custom Kernel: You can define your own kernel function tailored to the specific problem.
The choice of kernel depends on the nature of the data and the complexity of the problem.
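As a quick illustration, here is a minimal sketch using scikit-learn's SVC (the dataset below is a made-up toy example, and the prediction point is arbitrary):

import numpy as np
from sklearn.svm import SVC

# Toy dataset: two features, two classes (illustrative only)
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.5]])
y = np.array([0, 0, 1, 1])

# Built-in kernels are selected by name
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.predict([[1.5, 1.5]]))

# A custom kernel is any callable that returns the Gram matrix
def linear_by_hand(A, B):
    return A @ B.T  # here simply the linear kernel, written out

custom = SVC(kernel=linear_by_hand).fit(X, y)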
How SVM Works:
1. Dimensionality:
o The separating hyperplane is defined by the equation w · x + b = 0, where w is the weight vector (normal to the hyperplane), x is the input feature vector, and b is the bias term.
o In an N-dimensional feature space, this hyperplane is an (N-1)-dimensional subspace.
2. Maximizing Margin:
o SVM aims to find the hyperplane that maximizes the margin between the classes.
o The margin is the distance between the hyperplane and the closest data points from each class, called support vectors.
o With the data scaled so that support vectors satisfy |w · x + b| = 1, this margin equals 2 / ||w||, so maximizing the margin amounts to minimizing ||w||.
3. Linear Separability:
o If the data is not linearly separable, kernel functions are used to transform the data
into a higher-dimensional space.
Advantages of SVM:
1. Margin Maximization:
o SVM finds a hyperplane that maximizes the margin (the distance between the hyperplane and the nearest data points of any class).
2. Memory Efficiency:
o The decision boundary is determined only by the support vectors, making SVM computationally efficient for sparse data.
3. Effective in High Dimensions:
o SVM works well with datasets that have a high number of features, as it avoids overfitting by maximizing the margin.
4. Kernel Trick:
o SVM can handle non-linear relationships by using kernel functions to map data into a higher-dimensional space (see the sketch after this list).
5. Regularization Control:
o The parameter C controls the trade-off between achieving a large margin and minimizing classification error.
6. Versatile:
o Can be used for both classification (SVC) and regression (SVR) tasks.
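To make the kernel trick concrete, here is a small numpy sketch computing the RBF kernel (the gamma value and the sample vectors are illustrative assumptions): it measures similarity between two points without ever constructing the higher-dimensional mapping explicitly.

import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    # Equivalent to a dot product in a very high-dimensional feature space,
    # but computed directly from the original vectors.
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
y = np.array([2.0, 3.0])
print(rbf_kernel(x, y))  # near 1 for nearby points, near 0 for distant ones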
Issues in SVM:
1. Choosing the Right Kernel:
o Selecting an appropriate kernel (linear, polynomial, RBF, etc.) and its parameters is
crucial but can be challenging.
2. Computational Complexity:
o Training time is high for large datasets, especially with non-linear kernels.
3. Sensitivity to Parameters:
o Performance depends heavily on hyperparameters such as C and the kernel parameters (e.g., gamma for the RBF kernel), which must be tuned carefully.
4. Sensitivity to Outliers:
o SVM is sensitive to outliers; noisy data can significantly affect the margin and hyperplane.
5. Class Imbalance:
o Struggles with datasets where one class has significantly more samples than the
other, as the decision boundary can get skewed.
6. Interpretability:
o The results of an SVM model, especially with non-linear kernels, are less
interpretable compared to simpler models.
7. Scaling of Features:
o SVM is sensitive to feature scales; features should be standardized before training, or large-valued features will dominate the margin (see the sketch below).
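A minimal sketch of two common mitigations, feature standardization (item 7) and class reweighting (item 5), assuming scikit-learn and a made-up imbalanced dataset:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative imbalanced data: 8 samples of class 0, 2 of class 1,
# with wildly different feature scales
X = np.random.RandomState(0).randn(10, 3) * [1.0, 100.0, 0.01]
y = np.array([0] * 8 + [1] * 2)

# StandardScaler addresses feature scaling (item 7);
# class_weight='balanced' reweights classes to counter imbalance (item 5)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X, y)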
Decision Tree Learning
Key Components:
1. Root Node: Represents the entire dataset and the starting point for splitting.
2. Splitting: At each node, the algorithm selects the feature that best splits the dataset.
3. Prediction:
o In classification: Assign the most common class in the leaf node.
o In regression: Assign the mean or median of the target variable in the leaf node.
Termination Conditions:
• Splitting stops when a node becomes pure (all samples belong to one class), a maximum depth is reached, or too few samples remain to split further.
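These termination conditions map directly onto hyperparameters in a typical implementation; a minimal scikit-learn sketch (the dataset and parameter values are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth and min_samples_split encode the termination conditions above
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, min_samples_split=5)
tree.fit(X, y)
print(tree.predict(X[:5]))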
ID3 Algorithm
• The goal of ID3 is to construct a decision tree that can classify a set of training examples into given classes based on the features.
• The ID3 algorithm uses Information Gain based on Entropy as the splitting criterion to
determine the best feature at each node of the tree.
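Since Information Gain is the core of ID3, here is a short self-contained sketch of how entropy and information gain are computed (pure Python; the "Play"/"Outlook" data is a made-up illustrative example):

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over each class i
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    # Gain(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over values v of feature A
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Illustrative question: does "Outlook" help predict "Play"?
play = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "sunny", "rain", "overcast", "rain"]
print(information_gain(play, outlook))  # 0.667: Outlook is informative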
Features of ID3
• Attribute Selection: ID3 uses Information Gain to determine the most informative attribute at each level.
• Works Well for Categorical Data: ID3 is suited for classification problems, particularly with categorical data.
Inductive Bias - It refers to the set of assumptions a machine learning algorithm makes to generalize from the training data to unseen data.
• Shorter Trees Are Preferred: Decision trees aim to create the shortest possible tree that fits
the data.
• Preference for Features with High Information Gain: The ID3 algorithm selects features based on their information gain, assuming that features with higher information gain lead to better classification results.
• Inductive bias helps decision trees generalize well to unseen data by preventing them from
creating unnecessarily complex models that overfit the training data.
Inductive bias appears in other learning algorithms as well:
1. Linear Regression:
o Assumes a linear relationship between the input features and the output.
2. Neural Networks:
o Assume the target function is smooth, so similar inputs produce similar outputs.
3. Decision Trees:
o Prefer shorter trees, with high-information-gain attributes placed near the root.
4. k-Nearest Neighbors:
o Assumes that data points close to each other have the same label.
Issues in Decision Tree Learning
1. Overfitting:
o Deep trees can fit noise in the training data, memorizing examples instead of learning general patterns.
2. Underfitting:
o Overly shallow or heavily pruned trees may be too simple to capture important patterns.
3. Greedy Splitting:
o Greedy splitting may result in suboptimal trees, as it only considers local optima.
4. Instability:
o Small changes in the training data can produce a very different tree structure.
5. Scalability:
o Training can become slow on very large datasets with many features.
6. Step-Like Decision Boundaries:
o Decision trees create step-like boundaries that may not fit complex patterns.
7. Pruning:
o Pruning decisions can be challenging, and incorrect pruning may reduce tree accuracy (see the sketch below).
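As one illustration of handling overfitting and pruning (items 1 and 7), scikit-learn supports cost-complexity pruning; a minimal sketch (the dataset and the alpha value are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(*load_iris(return_X_y=True), random_state=0)

# ccp_alpha > 0 prunes branches whose complexity outweighs their accuracy gain
unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_train, y_train)

print("unpruned depth:", unpruned.get_depth(), "test acc:", unpruned.score(X_test, y_test))
print("pruned depth:  ", pruned.get_depth(), "test acc:", pruned.score(X_test, y_test))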
Bayesian Learning - It is a probabilistic framework for machine learning that leverages
Bayes' theorem to update the probability of a hypothesis based on observed evidence or
data. Bayesian learning methods are valuable for dealing with uncertainty and making
predictions that incorporate prior knowledge.
Bayes' Theorem: It is a mathematical formula used to update the probability of an event
happening based on new evidence. It tells you how to combine what you already know
(prior knowledge) with new information to make better predictions.
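In symbols, for a hypothesis h and observed data D:
P(h | D) = P(D | h) · P(h) / P(D)
where P(h) is the prior probability of the hypothesis, P(D | h) is the likelihood of the data under the hypothesis, P(D) is the evidence, and P(h | D) is the updated (posterior) probability.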
Bayes Optimal Classifier: It is a theoretical model that provides the best possible
prediction for a classification problem. It predicts the class with the highest posterior
probability, considering all possible hypotheses and their probabilities.
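Concretely, for possible class values v and hypothesis space H, the Bayes optimal classification is the value that maximizes the posterior-weighted vote:
v* = argmax over classes v of Σ over hypotheses h in H of [ P(v | h) · P(h | D) ]
Because it averages over every hypothesis, it is optimal but usually too expensive to compute in practice, which motivates simpler approximations such as Naive Bayes.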
Naive Bayes Classifier: A simplified, practical approximation that assumes the features are conditionally independent given the class. Example: deciding whether an email is spam.
How It Works:
1. Look at the Features:
It checks for specific clues (like whether an email has words like "win" or "offer").
2. Assume Independence:
It assumes that each clue works independently (even if that’s not true in real life).
3. Calculate Probabilities:
It calculates the probability of the email being spam or not spam based on the clues.
4. Pick the Most Likely Category:
It predicts the category (spam or not spam) with the highest probability.
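A tiny end-to-end sketch of these four steps (pure Python; all priors and word probabilities are made-up illustrative values, not estimates from real data):

# Illustrative class priors and per-word likelihoods (assumed values)
prior = {"spam": 0.4, "not_spam": 0.6}
likelihood = {
    "spam":     {"win": 0.30, "offer": 0.25},
    "not_spam": {"win": 0.02, "offer": 0.05},
}

def classify(words):
    scores = {}
    for label in prior:
        # Steps 2-3: multiply the prior by each word's likelihood,
        # treating every clue as independent
        score = prior[label]
        for w in words:
            score *= likelihood[label].get(w, 0.01)  # small default for unseen words
        scores[label] = score
    # Step 4: pick the most likely category
    return max(scores, key=scores.get), scores

email = ["win", "offer"]   # Step 1: the clues found in the email
print(classify(email))     # -> ('spam', ...): 0.4*0.30*0.25 beats 0.6*0.02*0.05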