Unit5_ml

The document discusses Support Vector Machines (SVMs), a supervised learning algorithm effective for classification and regression tasks, particularly in binary classification. It explains the concept of hyperplanes, the use of kernel functions for handling complex data, and the advantages and disadvantages of SVMs. Additionally, it covers types of SVMs, important vocabulary, and introduces Bayesian statistics and computer vision in subsequent lectures.


Lecture No. 37

Today's Agenda:
Detailed discussion on Support Vector Machines
Support Vector Machine (SVM)

A support vector machine (SVM) is a type of supervised learning algorithm used in machine learning to
solve classification and regression tasks; SVMs are particularly good at solving binary classification
problems, which require classifying the elements of a data set into two groups.

The aim of a support vector machine algorithm is to find the best possible line, or decision boundary,
that separates the data points of different data classes. This boundary is called a hyperplane when
working in high-dimensional feature spaces. The idea is to maximize the margin, which is the distance
between the hyperplane and the closest data points of each category, thus making it easy to distinguish
data classes.

SVMs are also useful for analyzing complex data that can't be separated by a simple straight line.
Called nonlinear SVMs, they do this by using a mathematical trick that transforms the data into a higher-
dimensional space, where it is easier to find a boundary.

Support Vector Machines Working

The key idea behind SVMs is to transform the input data into a higher-dimensional feature space. This
transformation makes it easier to find a linear separation or to more effectively classify the data set.

To do this, SVMs use a kernel function. Instead of explicitly calculating the coordinates of the
transformed space, the kernel function enables the SVM to implicitly compute the dot products between
the transformed feature vectors, avoiding the expensive computation of the transformed coordinates
themselves.
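
As a minimal, illustrative sketch of this idea (the two points and the degree-2 polynomial kernel below are arbitrary choices, not something prescribed by the lecture), the kernel value computed in the original space matches the dot product in an explicitly transformed space:

```python
import numpy as np

# Two toy 2-D points (illustrative values only).
x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])

def phi(v):
    # Explicit feature map for a degree-2 homogeneous polynomial kernel on 2-D input:
    # (v1^2, v2^2, sqrt(2)*v1*v2).
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

# Dot product computed in the explicitly transformed 3-D space ...
explicit = np.dot(phi(x), phi(z))

# ... equals the polynomial kernel k(x, z) = (x . z)^2 evaluated in the original 2-D space.
kernel = np.dot(x, z) ** 2

print(explicit, kernel)  # both print 2.25
```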

SVMs can handle both linearly separable and non-linearly separable data. They do this by using
different types of kernel functions, such as the linear kernel, polynomial kernel or radial basis function
(RBF) kernel. These kernels enable SVMs to effectively capture complex relationships and patterns in
the data.

During the training phase, SVMs use a mathematical formulation to find the optimal hyperplane in a
higher-dimensional space, often called the kernel space. This hyperplane is crucial because it maximizes
the margin between data points of different classes, while minimizing the classification errors.
The kernel function plays a critical role in SVMs, as it makes it possible to map the data from the
original feature space to the kernel space. The choice of kernel function can have a significant impact on
the performance of the SVM algorithm; choosing the best kernel function for a particular problem
depends on the characteristics of the data.

Some of the most popular kernel functions for SVMs are the following:

 Linear kernel. This is the simplest kernel function; it leaves the data in its original feature space
and is appropriate when the data is already linearly separable.

 Polynomial kernel. This kernel function is more powerful than the linear kernel; it implicitly maps
the data to a higher-dimensional space in which data that is not linearly separable in the original
space can become linearly separable.

 RBF kernel. This is the most popular kernel function for SVMs, and it is effective for a wide range
of classification problems.

 Sigmoid kernel. This kernel function is similar to the RBF kernel, but it has a different shape that
can be useful for some classification problems.

The choice of kernel function for an SVM algorithm is a tradeoff between accuracy and complexity. The
more powerful kernel functions, such as the RBF kernel, can achieve higher accuracy than the simpler
kernel functions, but they also require more data and computation time to train the SVM algorithm. But
this is becoming less of an issue due to technological advances.

Once trained, SVMs can classify new, unseen data points by determining which side of the decision
boundary they fall on. The output of the SVM is the class label associated with the side of the decision
boundary.
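
A minimal training-and-prediction sketch in Python, assuming scikit-learn is available; the toy dataset and the hyperparameter values are illustrative assumptions, not settings from the lecture:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy binary classification data (illustrative only).
X, y = datasets.make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF-kernel SVM; kernel choice and hyperparameters are assumptions for this sketch.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

# New, unseen points are assigned the class label of the side of the
# decision boundary they fall on.
print(clf.predict(X_test[:5]))
print("test accuracy:", clf.score(X_test, y_test))
```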

Types of support vector machines

Support vector machines have different types and variants that provide specific functionalities and
address specific problem scenarios. Here are two types of SVMs and their significance:

1. Linear SVM. Linear SVMs use a linear kernel to create a straight-line decision boundary that
separates different classes. They are effective when the data is linearly separable or when a linear
approximation is sufficient. Linear SVMs are computationally efficient and have good
interpretability, as the decision boundary is a hyperplane in the input feature space.

2. Nonlinear SVM. Nonlinear SVMs address scenarios where the data cannot be separated by a
straight line in the input feature space. They achieve this by using kernel functions that implicitly
map the data into a higher-dimensional feature space, where a linear decision boundary can be
found. Popular kernel functions used in this type of SVM include the polynomial kernel, Gaussian
(RBF) kernel and sigmoid kernel. Nonlinear SVMs can capture complex patterns and achieve higher
classification accuracy when compared to linear SVMs.
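
The contrast between the two types can be sketched on a toy dataset that is not linearly separable (scikit-learn assumed; the gamma value is an arbitrary choice):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=300, noise=0.15, random_state=42)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # gamma value is an assumption

print("linear kernel accuracy:", linear_svm.score(X, y))  # typically noticeably lower
print("RBF kernel accuracy:", rbf_svm.score(X, y))        # typically close to 1.0
```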

Advantages of SVMs

SVMs are powerful machine learning algorithms that have the following advantages:

 Effective in high-dimensional spaces. High-dimensional data refers to data in which the number of
features is larger than the number of observations, i.e., data points. SVMs perform well even when
the number of features is larger than the number of samples. They can handle high-dimensional data
efficiently, making them suitable for applications with a large number of features.

 Resistant to overfitting. SVMs are less prone to overfitting compared to other algorithms, like
decision trees -- overfitting is where a model performs extremely well on the training data but
becomes too specific to that data and can't generalize to new data. SVMs' use of the margin
maximization principle helps in generalizing well to unseen data.

 Versatile. SVMs can be applied to both classification and regression problems. They support
different kernel functions, enabling flexibility in capturing complex relationships in the data. This
versatility makes SVMs applicable to a wide range of tasks.

 Effective in cases of limited data. SVMs can work well even when the training data set is small.
The use of support vectors ensures that only a subset of data points influences the decision
boundary, which can be beneficial when data is limited.

 Ability to handle nonlinear data. SVMs can implicitly handle non-linearly separable data by using
kernel functions. The kernel trick enables SVMs to transform the input space into a higher-
dimensional feature space, making it possible to find linear decision boundaries.

Disadvantages of SVMs

While support vector machines are popular for the reasons listed above, they also come with some
limitations and potential issues:

 Computationally intensive. SVMs can be computationally expensive, especially when dealing with
large data sets. The training time and memory requirements increase significantly with the number
of training samples.

 Sensitive to parameter tuning. SVMs have parameters such as the regularization parameter and the
choice of kernel function. The performance of SVMs can be sensitive to these parameter settings.
Improper tuning can lead to suboptimal results or longer training times.
 Lack of probabilistic outputs. SVMs provide binary classification outputs and do not directly
estimate class probabilities. Additional techniques, such as Platt scaling or cross-validation, are
needed to obtain probability estimates (see the sketch after this list).

 Difficulty in interpreting complex models. SVMs can create complex decision boundaries,
especially when using nonlinear kernels. This complexity may make it challenging to interpret the
model and understand the underlying patterns in the data.

 Scalability issues. SVMs may face scalability issues when applied to extremely large data sets.
Training an SVM on millions of samples can become impractical due to memory and computational
constraints.
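
As referenced in the probabilistic-outputs point above, here is a minimal sketch of obtaining probability estimates from an SVM via Platt scaling, assuming scikit-learn (the toy data is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # toy data

# probability=True enables Platt scaling (fitted via internal cross-validation),
# which makes predict_proba available at the cost of slower training.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

print(clf.predict(X[:3]))        # hard class labels
print(clf.predict_proba(X[:3]))  # calibrated class probabilities
```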

Important support vector machine vocabulary

C parameter

The C parameter is the primary regularization parameter in SVMs. It controls the tradeoff between
maximizing the margin and minimizing the misclassification of training data. A smaller C tolerates more
misclassification in exchange for a wider margin, while a larger C penalizes misclassification more
heavily, yielding a narrower margin that fits the training data more closely.
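
A minimal sketch of how different C values can be compared, assuming scikit-learn; the values 0.01 and 100 are arbitrary illustrations:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, flip_y=0.1, random_state=0)  # noisy toy data

# Small C: stronger regularization, wider margin, more misclassification tolerated.
soft = SVC(kernel="linear", C=0.01).fit(X, y)
# Large C: weaker regularization, narrower margin, misclassification penalized heavily.
hard = SVC(kernel="linear", C=100.0).fit(X, y)

# Softer margins generally keep more points as support vectors.
print("support vectors (C=0.01):", soft.n_support_.sum())
print("support vectors (C=100): ", hard.n_support_.sum())
```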

Classification

Classification is about sorting things into different groups or categories based on their characteristics,
akin to putting things into labeled boxes. Sorting emails into spam or nonspam categories is an example.

Decision boundary

A decision boundary is an imaginary line or boundary that separates different groups or categories in a
data set, placing data points into different regions. For instance, an email decision boundary might classify
an email with over 10 exclamation marks as "spam" and an email with under 10 marks as "not spam."

Grid search

A grid search is a technique used to find the optimal values of hyperparameters in SVMs. It involves
systematically searching through a predefined set of hyperparameters and evaluating the performance of
the model.
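
A minimal grid-search sketch using scikit-learn's GridSearchCV; the candidate parameter values are illustrative assumptions, not recommended settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # toy data

# Predefined set of hyperparameter candidates (assumed values).
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
    "kernel": ["rbf", "linear"],
}

# Each combination is evaluated with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV score:  ", search.best_score_)
```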

Hyperplane
In n-dimensional space -- that is, a space with many dimensions -- a hyperplane is defined as an (n-1)-
dimensional subspace, a flat surface that has one less dimension than the space itself. In a two-
dimensional space, the hyperplane is therefore one-dimensional: a line.

Kernel function

A kernel function is a mathematical function used in the kernel trick to compute the inner product
between two data points in the transformed feature space. Common kernel functions include linear,
polynomial, Gaussian (RBF) and sigmoid.

Kernel trick

The kernel trick is a technique that lets an SVM behave as if the data had been transformed into a
higher-dimensional space, where a linear decision boundary can be found, without explicitly performing
that transformation. It thereby avoids the computational complexity of explicitly mapping the data to a
higher dimension.

Margin

The margin is the distance between the decision boundary and the support vectors. An SVM aims to
maximize this margin to improve generalization and reduce overfitting.

One-vs-All

One-vs-All, or OvA, is a technique for multiclass classification using SVMs. It trains a binary SVM
classifier for each class, treating it as the positive class and all other classes as the negative class.

One-vs-One

One-vs-One, or OvO, is a technique for multiclass classification using SVMs. It trains a binary SVM
classifier for each pair of classes and combines predictions to determine the final class.
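
A minimal sketch of both strategies using scikit-learn's multiclass wrappers on a three-class toy dataset (the linear kernel is an arbitrary choice):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3-class toy dataset

# One-vs-All: one binary SVM per class (3 classifiers for 3 classes).
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

# One-vs-One: one binary SVM per pair of classes (3 pairs for 3 classes).
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ova.estimators_), "OvA classifiers")
print(len(ovo.estimators_), "OvO classifiers")
```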

Regression

Regression is predicting or estimating a numerical value based on other known information. It's similar
to making an educated guess based on given patterns or trends. Predicting the price of a house based on
its size, location and other features is an example.

Regularization
Regularization is a technique used to prevent overfitting in SVMs. Regularization introduces a penalty
term in the objective function, encouraging the algorithm to find a simpler decision boundary rather than
fitting the training data perfectly.

Support vector

A support vector is a data point or node lying closest to the decision boundary or hyperplane. These
points play a vital role in defining the decision boundary and the margin of separation.

Support vector regression

Support vector regression (SVR) is a variant of SVM used for regression tasks. SVR aims to find an
optimal hyperplane that predicts continuous values, while maintaining a margin of tolerance.
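
A minimal SVR sketch, assuming scikit-learn; the toy data and the C and epsilon values are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: a noisy sine wave (illustrative only).
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon defines the margin of tolerance: errors smaller than epsilon are not penalized.
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1)
reg.fit(X, y)

print(reg.predict([[1.5], [3.0]]))  # continuous-valued predictions
```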
Lecture No. 38

Today's Agenda:
Detailed discussion on Basics of Bayesian Statistics

Bayesian statistics mostly involves conditional probability, which is the probability of an event
A given an event B, and it can be calculated using Bayes' rule. The concept of conditional probability is
widely used in medical testing, in which false positives and false negatives may occur. A false positive
can be defined as a positive outcome on a medical test when the patient does not actually have the
disease they are being tested for. In other words, it’s the probability of testing positive given no disease.
Similarly, a false negative can be defined as a negative outcome on a medical test when the patient does
have the disease. In other words, testing negative given disease. Both indicators are critical for any
medical decisions.

To apply Bayes' rule, we set up a prior and then calculate posterior probabilities based on that prior and
the likelihood. In other words, the prior probabilities are updated through an iterative process of
data collection.

Bayesian Machine Learning is an approach that combines Bayesian statistics and machine learning to
make predictions and inferences while explicitly accounting for uncertainty. It leverages Bayes’ theorem
to update prior beliefs or probabilities based on observed data, enabling the estimation of posterior
probabilities and making more informed decisions. Bayesian inference is grounded in Bayes' theorem and
can be applied to a wide range of real-world prediction problems.

The fundamental formula behind Bayesian Machine Learning is Bayes’ theorem:


P(H|D) = (P(D|H) * P(H)) / P(D)
Where:
P(H|D) is the posterior probability of hypothesis H given the observed data D.

P(D|H) is the likelihood of the data D given the hypothesis H.

P(H) is the prior probability of hypothesis H.

P(D) is the probability of the observed data D.

In Bayesian Machine Learning, this formula is used to update prior beliefs (P(H)) based on new
evidence (P(D|H)) and calculate the posterior probabilities (P(H|D)). The following is an example of
spam filtering to illustrate Bayesian Machine Learning in action:
Let’s say we have a classification problem of distinguishing (filtering) between spam and non-spam
emails. We start with prior beliefs about the probability of an email being spam, which is represented as
P(spam) and P(non-spam). Based on a labeled dataset, we can calculate the likelihood of observing
certain words or features in spam emails (P(Word|spam)) and non-spam emails (P(Word|non-spam)).

Now, given a new email with certain words, we want to determine whether it is spam or non-spam.
Using Bayes’ theorem, we can update our prior beliefs to calculate the posterior probabilities:

P(spam|Word) = (P(Word|spam) * P(spam)) / P(Word)

P(non-spam|Word) = (P(Word|non-spam) * P(non-spam)) / P(Word)
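
A minimal numeric sketch of this update; all probabilities below are made-up illustrative values, not estimates from a real email corpus:

```python
# Assumed prior beliefs and likelihoods (illustrative values only).
p_spam = 0.3                  # P(spam)
p_not_spam = 0.7              # P(non-spam)
p_word_given_spam = 0.6       # P(Word|spam), e.g. "free" appears in 60% of spam
p_word_given_not_spam = 0.05  # P(Word|non-spam)

# P(Word) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_not_spam * p_not_spam

# Bayes' theorem: update the priors into posteriors.
p_spam_given_word = p_word_given_spam * p_spam / p_word
p_not_spam_given_word = p_word_given_not_spam * p_not_spam / p_word

print(round(p_spam_given_word, 3))      # ~0.837 -> classify as spam
print(round(p_not_spam_given_word, 3))  # ~0.163
```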


Lecture No. 39

Today's Agenda:
Detailed discussion on Basics of Computer Vision
Computer vision is a scientific field of study that is concerned with enabling computers to automatically extract
useful information from digital images and videos. Its goal is to teach computers to gain high-level understanding
of visual data for interpretation and decision making.

Some key focus areas of computer vision include:

 Image classification – Identifying what objects are present in an image, such as cats, dogs, cars
etc. It involves labeling image datasets and training classification models.
 Object detection – Detecting instances of objects in images and localizing them with bounding
boxes. Models are trained to detect the presence and location of multiple object classes.
 Image segmentation – Partitioning images into multiple coherent regions or objects. This allows
separating foreground from background.
 Activity recognition – Understanding motions and behaviors from video sequences. This may
involve connecting a sequence of poses to identify actions.
 Scene reconstruction – Reconstructing 3D environments from 2D images via processing multiple
images with overlapping views. Helps recreate real-world scenes digitally.

Computer Vision vs. Machine Learning

 Focus: Computer vision processes and analyzes visual data such as images and videos, while machine
learning applies algorithms to all kinds of structured and unstructured data.
 Goals: Computer vision aims at high-level image understanding and replicating human vision; machine
learning aims at making predictions by finding statistical patterns and relationships.
 Typical tasks: Computer vision covers image classification, object detection and segmentation; machine
learning covers classification, regression, clustering and reinforcement learning.
 Training data: Computer vision requires labeled datasets of images/videos; machine learning can work
with labeled and unlabeled data.
 Models used: Computer vision mainly uses convolutional neural networks; machine learning uses SVMs,
linear/logistic regression, neural nets, decision trees, etc.
 Outputs: Computer vision produces bounding boxes, masks and 3D reconstructions; machine learning
produces predictions, recommended actions and data clusters.
 Compute needs: Computer vision needs high graphics processing power using GPUs; machine learning can
run on standard compute resources.
 Applications: Computer vision powers facial recognition, medical imaging, robots and autonomous
vehicles; machine learning powers predictive analytics, chatbots, recommendation systems and fraud
detection.

Lecture No. 40

Today's Agenda:
Detailed discussion on ImageNet
ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000
categories. The images were collected from the web and labeled by human labelers using Amazon’s
Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge,
an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has
been held. ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories. In
all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images.

On ImageNet, it is customary to report two error rates: top-1 and top-5, where the top-5 error rate is the
fraction of test images for which the correct label is not among the five labels considered most probable
by the model. ImageNet consists of variable-resolution images, whereas typical CNN models require a
constant input dimensionality, so images are usually rescaled to a fixed size before training.
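
A minimal sketch of how the top-5 error rate can be computed from a model's class scores; the score matrix and labels below are illustrative, not real ILSVRC outputs:

```python
import numpy as np

# Scores for 4 test images over 6 classes (illustrative values).
scores = np.random.RandomState(0).rand(4, 6)
true_labels = np.array([2, 0, 5, 1])

# Indices of the 5 highest-scoring classes for each image.
top5 = np.argsort(scores, axis=1)[:, -5:]

# An image counts as an error if its correct label is not among those 5.
top5_error = np.mean([label not in row for label, row in zip(true_labels, top5)])
print("top-5 error rate:", top5_error)
```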

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

The general challenge tasks for most years are as follows:

 Image classification: Predict the classes of objects present in an image.

 Single-object localization: Image classification + draw a bounding box around one example of each
object present.

 Object detection: Image classification + draw a bounding box around each object present.

[Figure: Summary of the improvement on ILSVRC tasks over the first five years of the competition. Taken
from "ImageNet Large Scale Visual Recognition Challenge," 2015.]
Deep Learning Milestones From ILSVRC

The pace of improvement in the first five years of the ILSVRC was dramatic, perhaps even shocking to
the field of computer vision. Success has primarily been achieved by large (deep) convolutional neural
networks (CNNs) on graphical processing unit (GPU) hardware, which sparked an interest in deep
learning that extended beyond the field out into the mainstream.

ILSVRC-2012

AlexNet (SuperVision)

ZFNet (Clarifai)

VGG

Karen Simonyan and Andrew Zisserman from the Oxford Visual Geometry Group (VGG) achieved top
results for image classification and localization with their VGG model.
Their approach is described in their 2015 paper titled "Very Deep Convolutional Networks for Large-
Scale Image Recognition."
The Visual Geometry Group designed VGG-16, which has 13 convolutional and 3 fully-connected layers,
carrying over the ReLU tradition from AlexNet. The network stacks more layers onto AlexNet and uses
smaller filters (3×3 convolutions and 2×2 pooling). It has about 138M parameters and takes up about
500MB of storage space. They also designed a deeper variant, VGG-19.
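
As a rough check of the parameter count, a VGG-16 architecture can be instantiated and its parameters counted; this assumes a recent version of PyTorch and torchvision is installed:

```python
import torchvision.models as models

# weights=None builds the VGG-16 architecture without downloading pretrained weights
# (supported in recent torchvision versions).
vgg16 = models.vgg16(weights=None)

n_params = sum(p.numel() for p in vgg16.parameters())
print(f"VGG-16 parameters: {n_params / 1e6:.0f}M")  # roughly 138M
```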
