
Machine Learning (AM20APC603)

UNIT - III

Prepared by-

Srishyla K
Asst. Professor
Dept. of CSE(AI&ML)
Instance-based learning
• Instance-based learning (also known as memory-based learning or lazy learning) involves
memorizing training data in order to make predictions about future data points.
• This approach doesn’t require any prior knowledge or assumptions about the data, which makes
it easy to implement and understand.
• However, it can be computationally expensive since all of the training data needs to be stored in
memory before making a prediction.
• Additionally, this approach doesn’t generalize well to unseen data because its predictions are based on memorized examples rather than a learned model.
• At prediction time, the system uses a similarity measure to compare new cases with the stored data. K-nearest neighbors (KNN) is an algorithm that belongs to the instance-based learning class of algorithms.
• KNN is a non-parametric algorithm because it does not assume any specific form or underlying structure in the data. Instead, it relies on a measure of similarity between each pair of data points. Generally speaking, this measure is based on either Euclidean distance or Manhattan distance, as sketched below.
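A minimal sketch of these two distance measures, assuming NumPy is available (the vectors and names are placeholders for illustration):

import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance between two feature vectors
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def manhattan_distance(a, b):
    # Sum of absolute per-feature differences (city-block distance)
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)))

x_new = [1.0, 2.0]
x_stored = [4.0, 6.0]
print(euclidean_distance(x_new, x_stored))  # 5.0
print(manhattan_distance(x_new, x_stored))  # 7.0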
Advantages of Instance-Based Learning
1. No need for model creation: Instance-based learning doesn’t require creating a model, which can
be an advantage if you don’t have the expertise to create the model.
2. Can handle small datasets: Instance-based learning can handle small datasets because it doesn’t
require a large dataset to create a model.
3. More flexibility: Instance-based learning can be more flexible than model-based learning because
the machine stores all instances of data and can use this data to make predictions.
Disadvantages of Instance-Based Learning
1. Slower predictions: Instance-based learning is typically slower than model-based learning
because the machine has to compare the new data to all instances of data in order to make a
prediction.
2. Less accurate predictions: Instance-based learning can often make less accurate predictions than
model-based learning because it doesn’t have a mathematical model to generalize from.
3. Limited understanding of data: Instance-based learning doesn’t provide as much insight into the
relationships between input and output variables as model-based learning does.

Some of the instance-based learning algorithms are:

1. K-Nearest Neighbor (KNN)
2. Self-Organizing Map (SOM)
3. Learning Vector Quantization (LVQ)
Model-Based Learning
Model-based learning involves creating a mathematical model that can predict outcomes based on
input data. The model is trained on a large dataset and then used to make predictions on new data.
The model can be thought of as a set of rules that the machine uses to make predictions.
In model-based learning, the training data is used to create a model that can be generalized to new
data. The model is typically created using statistical algorithms such as linear regression, logistic
regression, decision trees, and neural networks. These algorithms use the training data to create a
mathematical model that can be used to predict outcomes.
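As an illustration, a minimal sketch of model-based learning with scikit-learn (the synthetic dataset and the choice of logistic regression are assumptions made only for illustration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn the "rules" once
print(model.score(X_test, y_test))                               # reuse them on new data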
Advantages and Disadvantages of Model-Based Learning
Advantages of Model-Based Learning
1. Faster predictions: Model-based learning is typically faster than instance-based learning because the
model is already created and can be used to make predictions quickly.
2. More accurate predictions: Model-based learning can often make more accurate predictions than
instance-based learning because the model is trained on a large dataset and can generalize to new
data.
3. Better understanding of data: Model-based learning allows you to gain a better understanding of the
relationships between input and output variables. This can help identify which variables are most
important in making predictions.

Disadvantages of Model-Based Learning

1. Requires a large dataset: Model-based learning requires a large dataset to train the model. This can be a disadvantage if you have a small dataset.
2. Requires expert knowledge: Model-based learning requires expert knowledge of statistical algorithms and mathematical modeling. This can be a disadvantage if you don’t have the expertise to create the model.
k-NN (k-Nearest Neighbour)
K-Nearest Neighbors (KNN) is a versatile supervised machine learning algorithm used for both
classification and regression tasks. Its fundamental concept revolves around the idea that similar
things are close to each other in a feature space. KNN operates on the principle of lazy learning,
where it does not explicitly learn a model during the training phase but stores the entire training
dataset.
During the prediction phase, when faced with a new data point, KNN identifies the ‘k’ nearest
neighbors to that point based on a chosen distance metric, commonly the Euclidean distance. For
classification tasks, KNN assigns the majority class among the neighbors to the new data point. In
regression tasks, it calculates the average of the target values of the ‘k’ nearest neighbors.
During the training phase, the KNN algorithm memorizes the entire dataset. When presented with
new data, it categorizes it into a class that closely resembles the characteristics of the new data as
shown in Figure 1 below.
kNN working
Step 1: Choose the Number of Neighbors (K)
Start by deciding how many neighbors (data points from your dataset) you want to consider when
making predictions. This is your ‘K’ value.
Step 2: Calculate Euclidean Distance
Find the distance between your new data point and the chosen number of neighbors. Imagine it as
measuring the straight-line distance between two points.
Step 3: Identify Nearest Neighbors
Pick the ‘K’ neighbors with the smallest calculated distances. These are the closest points to your
new data.
Step 4: Count Data Points in Each Category
Among these neighbors, count how many belong to each category. For instance, count how many
are in Category A and how many are in Category B.
Step 5: Assign to the Majority Category
Assign your new data point to the category that has the most neighbors. If most of them are in
Category A, your new point goes into Category A.
• Let’s picture this with an example: If we choose ‘K’ to be 5, calculate distances, and find that 3
neighbors are in Category A and 2 are in Category B, our new data point is likely in Category A.
• Choosing ‘K’ is crucial. It represents the number of neighbors considered. KNN is a lazy learning algorithm, meaning it builds no explicit model during training and defers all distance computation to prediction time.
• As seen in our example, changing ‘K’ changes predictions. With K=3, we might predict Category B, while with K=7, it could be Category A. So, picking the right ‘K’ is a big deal in making KNN work well, as the sketch below illustrates.
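The five steps above can be sketched with scikit-learn's KNeighborsClassifier (the toy dataset and K=5 are assumptions for illustration):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)   # Step 1: choose K
knn.fit(X_train, y_train)                   # "training" = storing the data
print(knn.predict(X_test[:3]))              # Steps 2-5: distances, nearest neighbors, majority vote
print(knn.score(X_test, y_test))            # overall accuracy on held-out data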
kNN Algorithm
• Let an arbitrary instance x be described by the feature vector (a1(x), a2(x), …, an(x)), where ar(x) denotes the value of the rth attribute of instance x.
• Then the distance between two instances xi and xj is defined to be d(xi, xj), where

d(xi, xj) = sqrt( (a1(xi) − a1(xj))² + (a2(xi) − a2(xj))² + … + (an(xi) − an(xj))² )

• In nearest-neighbor learning the target function may be either discrete-valued or real-valued.
• Let us first consider learning discrete-valued target functions of the form f : Rⁿ → V, where V is the finite set {v1, …, vs}.


kNN contd.
• The k-Nearest Neighbor algorithm for approximating a discrete-valued target function is given below:
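A minimal from-scratch sketch of this discrete-valued kNN rule (the helper name and toy data are illustrative): return the most common f-value among the k stored training examples closest to the query xq.

from collections import Counter
import numpy as np

def knn_predict_discrete(X_train, f_train, xq, k=3):
    # Estimate f(xq) as the majority f-value of the k nearest neighbours
    distances = np.sqrt(((np.asarray(X_train) - np.asarray(xq)) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return Counter(np.asarray(f_train)[nearest].tolist()).most_common(1)[0][0]

X_train = [[1, 1], [2, 2], [8, 8], [9, 9]]
f_train = ['A', 'A', 'B', 'B']
print(knn_predict_discrete(X_train, f_train, [1.5, 1.5], k=3))  # 'A'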

• The value 𝑓̂(xq) returned by this algorithm as its estimate of f(xq) is just the most common value of f among the k training examples nearest to xq.
• If k = 1, then the 1-Nearest Neighbor algorithm assigns to 𝑓̂(xq) the value f(xi), where xi is the training instance nearest to xq. For larger values of k, the algorithm assigns the most common value among the k nearest training examples.
kNN contd.
Example for discrete & real-valued target function
Distance-Weighted Nearest Neighbor Algorithm
The refinement to the k-Nearest Neighbor algorithm is to weight the contribution of each of the k neighbors according to its distance to the query point xq, giving greater weight to closer neighbors. For example, in the k-Nearest Neighbor algorithm, which approximates discrete-valued target functions, we might weight the vote of each neighbor according to the inverse square of its distance from xq, i.e. with weight wi = 1 / d(xq, xi)².
Distance-Weighted Nearest Neighbor Algorithm for approximating a discrete-valued target function
Distance-Weighted Nearest Neighbor Algorithm contd.
• Distance-Weighted Nearest Neighbor Algorithm for approximating a real-valued target function
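An illustrative sketch of distance-weighted kNN for a real-valued target, where each neighbor's contribution is weighted by the inverse square of its distance (all names and data below are placeholders):

import numpy as np

def weighted_knn_regress(X_train, f_train, xq, k=3, eps=1e-12):
    X_train, f_train, xq = map(np.asarray, (X_train, f_train, xq))
    d = np.sqrt(((X_train - xq) ** 2).sum(axis=1))   # distance to every stored point
    nearest = np.argsort(d)[:k]                      # indices of the k closest points
    w = 1.0 / (d[nearest] ** 2 + eps)                # inverse-square weights
    return np.sum(w * f_train[nearest]) / np.sum(w)  # weighted average of target values

X_train = [[1], [2], [3], [10]]
f_train = [1.0, 2.0, 3.0, 10.0]
print(weighted_knn_regress(X_train, f_train, [2.5], k=3))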
Example for weighted discrete & real-valued target function
Solved Examples of kNN
What is the Curse of Dimensionality?
• The Curse of Dimensionality refers to various phenomena that arise when dealing with high-
dimensional data.
• As the number of features or dimensions increases, the volume of the feature space grows
exponentially, leading to sparsity in the data distribution.
• This sparsity can result in several challenges such as increased computational complexity,
overfitting, and deteriorating performance of certain algorithms

How does Dimensionality affect KNN Performance?


The impact of dimensionality on the performance of KNN (K-Nearest Neighbors) is a well-known
issue in machine learning. Here’s a breakdown of how dimensionality affects KNN performance:
1. Increased Sparsity: As the number of dimensions increases, the volume of the space grows
exponentially. Consequently, the available data becomes sparser, meaning that data points are
spread farther apart from each other. This sparsity can lead to difficulties in finding meaningful
nearest neighbors, as there may be fewer neighboring points within a given distance.
How does Dimensionality affect KNN Performance?

2. Equal Distances: In high-dimensional spaces, the concept of distance becomes less meaningful.
As the number of dimensions increases, the distance between any two points tends to become
more uniform, or equidistant. This phenomenon occurs because the influence of any single
dimension diminishes as the number of dimensions grows, leading to points being distributed
more uniformly across the space.
3. Degraded Performance: KNN relies on the assumption that nearby points in the feature space are
likely to have similar labels. However, in high-dimensional spaces, this assumption may no longer
hold true due to the increased sparsity and equalization of distances. As a result, KNN may
struggle to accurately classify data points, leading to degraded performance.
4. Increased Computational Complexity: With higher dimensionality, the computational cost of KNN
increases significantly. The algorithm needs to compute distances in a high-dimensional space,
which involves more calculations. This can make the KNN algorithm slower and less efficient,
especially when dealing with large datasets.
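The "equal distances" effect in points 2 and 3 above can be seen in a small experiment (random data, purely illustrative): the relative gap between the nearest and farthest stored point shrinks as dimensionality grows.

import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    X = rng.random((500, dim))          # 500 stored points
    q = rng.random(dim)                 # one query point
    d = np.sqrt(((X - q) ** 2).sum(axis=1))
    contrast = (d.max() - d.min()) / d.min()   # relative spread of distances
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")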
BAYES THEOREM
Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior
probability, the probabilities of observing various data given the hypothesis, and the observed
data itself. Bayes theorem is one of the most popular machine learning concepts: it helps to calculate the probability of one event occurring, under uncertain knowledge, given that another event has already occurred.
The theorem can be mathematically expressed as:

P(X|Y) = P(Y|X) * P(X) / P(Y)

Here, X is the hypothesis and Y is the observed evidence.
P(X|Y) is called the posterior. It is the updated probability of the hypothesis after considering the evidence.
P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
P(X) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(Y) is called the marginal probability. It is the probability of the evidence under any consideration.
Hence, Bayes Theorem can be written as:
posterior = likelihood * prior / evidence
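A small numeric illustration of this relationship (the disease-testing numbers below are made up purely for illustration):

p_disease = 0.01                 # prior P(X): 1% of people have the disease
p_pos_given_disease = 0.95       # likelihood P(Y|X): test sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# marginal probability of the evidence P(Y): a positive test result
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# posterior P(X|Y): probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ~0.161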
Naive Bayes Classifier Algorithm
The Naive Bayes classifier algorithm is a machine learning technique used for classification tasks. It
is based on Bayes’ theorem and assumes that features are conditionally independent of each other
given the class label. The algorithm calculates the probability of a data point belonging to each class
and assigns it to the class with the highest probability.

Why is it called Naïve Bayes?


The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be described as:
• Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem
Python exercise on Naive Bayes Classifier Algorithm

• https://www.javatpoint.com/machine-learning-naive-bayes-classifier
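Independently of that exercise, a minimal scikit-learn sketch of the classifier (the iris dataset is chosen here only as a convenient built-in example):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)     # estimates P(class) and P(feature | class)
y_pred = nb.predict(X_test)                 # picks the class with the highest posterior
print(accuracy_score(y_test, y_pred))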
Logistic Regression
• Logistic regression is one of the most popular Machine Learning algorithms, which comes under
the Supervised Learning technique. It is used for predicting the categorical dependent variable
using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values that lie between 0 and 1.
• In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic function,
which predicts two maximum values (0 or 1).
Logistic Function (Sigmoid Function):
• The sigmoid function is a mathematical function used to map the predicted
values to probabilities.
• It maps any real value into another value within a range of 0 and 1.
• The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. The S-form curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which decides between 0 and 1: values above the threshold tend to 1, and values below the threshold tend to 0, as the sketch below shows.
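A minimal sketch of the sigmoid mapping and the threshold rule (the synthetic data and the 0.5 threshold are assumptions for illustration):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))         # maps any real value into (0, 1)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.0067, 0.5, 0.9933]

X, y = make_classification(n_samples=300, n_features=4, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X[:5])[:, 1]      # probabilistic outputs between 0 and 1
labels = (proba >= 0.5).astype(int)         # threshold turns probabilities into classes
print(proba, labels)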
Differences between linear and logistic regression
Feature Selection
A feature is an attribute that has an impact on a problem or is useful for the problem, and choosing the important features for the model is known as feature selection. "It is a process of automatically or manually selecting the subset of most appropriate and relevant features to be used in model building."
Role of feature selection:
1. To reduce the dimensionality of feature space.
2. To speed up a learning algorithm.
3. To improve the predictive accuracy of a classification algorithm.
4. To improve the comprehensibility of the learning results.

There are mainly two types of Feature Selection techniques, which are:
• Supervised Feature Selection technique: It considers the target variable and can be used for labelled datasets.
• Unsupervised Feature Selection technique: It ignores the target variable and can be used for unlabeled datasets.
1. Wrapper Methods
In wrapper methodology, selection of features is done by considering it as a search problem, in
which different combinations are made, evaluated, and compared with other combinations. It trains
the algorithm by using the subset of features iteratively.
On the basis of the output of the model, features are added or subtracted, and with this feature set, the model is trained again.
Wrapper Methods
1. Forward selection
Forward Feature Selection is a feature selection technique that iteratively builds a model by adding
one feature at a time, selecting the feature that maximizes model performance.
It starts with an empty set of features and adds the most predictive feature in each iteration until a
stopping criterion is met.
This method is particularly useful when dealing with a large number of features, as it incrementally
builds the model based on the most informative features.
This process involves assessing new features, evaluating combinations of features, and selecting the
optimal subset of features that best contribute to model accuracy.
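A possible sketch of forward selection using scikit-learn's SequentialFeatureSelector (available in recent scikit-learn versions; the estimator and the choice of three features are assumptions):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
selector = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                     n_features_to_select=3,
                                     direction="forward")   # add one feature at a time
selector.fit(X, y)
print(selector.get_support(indices=True))   # indices of the selected features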
Wrapper Methods
2. Backward elimination
This method is also an iterative approach where we initially start with all features and, after each iteration, remove the least significant feature. The stopping criterion is reached when removing a feature no longer improves the performance of the model.

3. Exhaustive selection – This technique is considered as the brute force approach for the
evaluation of feature subsets. It creates all possible subsets and builds a learning algorithm for each
subset and selects the subset whose model’s performance is best.

4. Recursive elimination – This greedy optimization method selects features by recursively considering smaller and smaller sets of features. The estimator is trained on an initial set of features and their importance is obtained using the feature_importances_ attribute. The least important features are then removed from the current set of features until we are left with the required number of features.
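A minimal sketch of recursive elimination with scikit-learn's RFE (the random-forest estimator, which exposes feature_importances_, is an assumption):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=4)
rfe.fit(X, y)                       # repeatedly drops the least important features
print(rfe.support_)                 # boolean mask of the kept features
print(rfe.ranking_)                 # rank 1 = selected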
2. Filter Methods
In Filter Method, features are selected on the basis of statistics measures. This method does not
depend on the learning algorithm and chooses the features as a pre-processing step.
The filter method filters out irrelevant and redundant features from the model by ranking them with different metrics.
The advantage of using filter methods is that it needs low computational time and does not overfit
the data.
2. Filter Methods
1. Information Gain: Information gain determines the reduction in entropy while transforming the
dataset. It can be used as a feature selection technique by calculating the information gain of each
variable with respect to the target variable.
2. Chi-square Test: Chi-square test is a technique to determine the relationship between the
categorical variables. The chi-square value is calculated between each feature and the target
variable, and the desired number of features with the best chi-square value is selected.

3. Fisher’s score: It may be used for continuous features in a classification problem. Fisher's score is calculated as the ratio of between-class variance to within-class variance. A higher Fisher's score implies the feature is more discriminative and valuable for classification. Fisher's score is one of the popular supervised techniques of feature selection. It returns the rank of each variable based on Fisher's criterion in descending order; we can then select the variables with a large Fisher's score.
4. Missing Value Ratio: The missing value ratio can be used to evaluate features against a threshold value. It is computed as the number of missing values in a column divided by the total number of observations. Variables whose missing value ratio exceeds the threshold can be dropped.
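An illustrative filter-method sketch using scikit-learn's SelectKBest with the chi-square score (the iris dataset and k=2 are assumptions; chi2 requires non-negative features):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)            # all features are non-negative, as chi2 requires
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)         # keeps only the 2 highest-scoring features
print(selector.scores_)                      # chi-square score per feature
print(selector.get_support(indices=True))    # indices of the selected features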
Univariate feature selection
• Univariate feature selection is a method used to select the most important features in a dataset.
The idea behind this method is to evaluate each individual feature’s relationship with the target
variable and select the ones that have the strongest correlation. This process is repeated for each
feature and the best ones are selected based on defined criteria, such as the highest correlation
or statistical significance.
• In univariate feature selection, the focus is on individual features and their contribution to the
target variable, rather than considering the relationships between features. This method is simple
and straightforward, but it does not take into account any interactions or dependencies between
features.
• Univariate feature selection is useful when working with a large number of features and the goal
is to reduce the dimensionality of the data and simplify the modeling process. It is also useful for
feature selection in cases where the relationship between the target variable and individual
features is not complex and can be understood through a simple statistical analysis.
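As a simple univariate sketch, each feature can be scored individually against the target with an ANOVA F-test via scikit-learn's f_classif (the dataset is illustrative):

from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

X, y = load_iris(return_X_y=True)
f_scores, p_values = f_classif(X, y)         # one score and p-value per individual feature
for i, (f, p) in enumerate(zip(f_scores, p_values)):
    print(f"feature {i}: F={f:.1f}, p={p:.2e}")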
Multivariate feature selection
Multivariate feature selection refers to the process of selecting subsets of
features from multivariate data (data with multiple variables) to improve the
performance of a machine learning model or to gain insights into the underlying
data structure. Unlike univariate feature selection, which considers each feature
individually, multivariate feature selection methods take into account the
relationships between features.
