
BCS602 | MACHINE LEARNING | SEARCH CREATORS.

Module-3

Chapter – 01 - Similarity-based Learning

Nearest-Neighbor Learning

k-Nearest-Neighbors (k-NN) Learning

Definition:

o k-NN is a non-parametric, similarity-based algorithm used for both classification and regression.
o It predicts the class or value of a test instance based on the ‘K’ nearest neighbors in
the training data.

Working:

o Classification:
 The algorithm determines the class of a test instance by considering
the ‘K’ nearest neighbors and selecting the class with the majority vote.


o Regression:
 The output is the mean of the target variable values of the ‘K’ nearest
neighbors.

Assumption:

o k-NN relies on the assumption that similar objects are closer to each other in the
feature space.

Instance-Based Learning:

o Memory-Based: The algorithm does not build a prediction model ahead of time, but
stores training data for predictions to be made at the time of the test instance.
o Lazy Learning: No model is constructed during training; the learning process
happens only during testing when predictions are required.

Distance Metric:

o The most common distance metric used is Euclidean distance to measure the
closeness of training data instances to the test instance.

Choosing ‘K’:

o The value of ‘K’ determines how many neighbors should be considered for the
prediction. It is typically selected by experimenting with different values of K to find
the optimal one that produces the most accurate predictions.


Classification Process:

o For a discrete target variable (classification): The class of the test instance is
determined by the majority vote of the 'K' nearest neighbors.
o For a continuous target variable (regression): The output is the mean of the output
variable values of the ‘K’ nearest neighbors.

Advantages:

o Simple and intuitive.


o Effective for small to medium-sized datasets.
o Can handle multi-class classification.

Disadvantages:

o Computationally expensive during prediction because it requires calculating distances to all training data instances.
o Performance may degrade with high-dimensional data (curse of dimensionality).
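
A minimal from-scratch sketch of the k-NN procedure described above, assuming numeric features, Euclidean distance, and a small in-memory dataset; the function name knn_predict and the toy data are illustrative, not from the notes:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test instance by majority vote of its k nearest neighbors."""
    # Euclidean distance from the test instance to every stored training instance
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training instances
    nearest = np.argsort(distances)[:k]
    # Majority vote among their class labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two features, two classes
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))  # expected: 0
```

Because the whole training set is scanned for every query, the sketch also makes the "lazy learning" cost visible: nothing is computed at training time, and all distance work happens at prediction time.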

Weighted K-Nearest-Neighbor Algorithm

Overview:

o Weighted k-NN is an extension of the k-NN algorithm.


o It improves upon k-NN by assigning weights to neighbors based on their distance
from the test instance.

Motivation:

o Traditional k-NN assigns equal importance to all the ‘k’ nearest neighbors, which can
lead to poor performance when:
 Neighbors are at varying distances.
 The nearest instances are more relevant than the farther ones.


o Weighted k-NN addresses this by making closer neighbors more influential.

Working Principle:

Weights are inversely proportional to distance:

 Closer neighbors get higher weights, while farther neighbors get lower
weights.

o The final prediction is based on the weighted majority vote (classification) or the
weighted average (regression) of the k nearest neighbors.

Weight Assignment:

o Uniform Weighting: All neighbors are given the same weight (as in standard k-NN).
o Distance-Based Weighting: Weights are computed based on the inverse distance,
giving closer neighbors more influence.

Advantages:

o Addresses the limitations of standard k-NN by considering the relative importance of neighbors.
o Performs better in datasets where closer neighbors are more relevant to the
prediction.

Applications:

o Classification: Predict the class of the test instance by weighted voting of the k
nearest neighbors.


o Regression: Predict the output value by computing the weighted mean of the k
nearest neighbors.

Limitations:

o Computational cost increases as distance calculations and weight assignments are performed for each query.
o Sensitive to the choice of the distance metric (e.g., Euclidean, Manhattan, etc.).
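
A minimal sketch of distance-based weighting under the same assumptions as the plain k-NN sketch above; the inverse-distance weights and the small eps guard against division by zero are illustrative choices, not the textbook's exact scheme:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_test, k=3, eps=1e-8):
    """Classify one test instance by inverse-distance weighted voting."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + eps)   # closer neighbors get larger weights
    # Accumulate the weight of each class label among the k neighbors
    scores = {}
    for label, w in zip(y_train[nearest], weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)
```

For the regression case, the weighted mean np.average(y_train[nearest], weights=weights) would replace the weighted vote.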

Nearest Centroid Classifier

The Nearest Centroid Classifier (also known as the Mean Difference Classifier) is a simple alternative to k-NN classifiers for similarity-based classification.

The idea of this classifier is to assign a test instance to the class whose centroid/mean is closest to that instance.

Algorithm

Inputs: Training dataset T, Distance metric d, Test instance t

Output: Predicted class or category

1. Compute the mean/centroid of each class.

2. Compute the distance between the test instance and mean/centroid of each class
(Euclidean Distance).

3. Predict the class by choosing the class with the smallest distance.
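
The three steps above translate almost directly into code. A minimal sketch assuming numeric features and Euclidean distance; the function name and data layout are illustrative:

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x_test):
    """Assign the test instance to the class whose centroid is closest."""
    best_label, best_dist = None, float("inf")
    for label in np.unique(y_train):
        centroid = X_train[y_train == label].mean(axis=0)   # step 1: class mean
        dist = np.linalg.norm(x_test - centroid)            # step 2: distance to centroid
        if dist < best_dist:                                 # step 3: keep the smallest
            best_label, best_dist = label, dist
    return best_label
```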


Locally Weighted Regression (LWR)

Locally Weighted Regression (LWR) is a non-parametric supervised learning algorithm that performs local regression by combining a regression model with the nearest-neighbors model.

LWR is also referred to as a memory-based method, as it keeps the training data available at prediction time but uses only the training instances that lie locally around the point of interest.


Using the nearest-neighbors algorithm, we find the instances that are closest to a test instance and fit a linear function to those 'K' nearest instances in the local regression model.
The key idea is to approximate, around each query point, the linear function over its 'K' neighbors that minimizes the error, so that the overall prediction is no longer a single straight line but a curve.
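
A minimal sketch of locally weighted linear regression for a single query point. It assumes a Gaussian kernel with bandwidth tau and a weighted least-squares solve; the kernel choice, the bandwidth, and all names are illustrative assumptions rather than the textbook's exact formulation:

```python
import numpy as np

def lwr_predict(X_train, y_train, x_query, tau=1.0):
    """Predict y at x_query by fitting a linear model weighted toward nearby points."""
    # Add a bias column so the local model has an intercept
    Xb = np.column_stack([np.ones(len(X_train)), X_train])
    xq = np.concatenate([[1.0], np.atleast_1d(x_query)])
    # Gaussian kernel: weight decays with distance from the query point
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares: beta = (X^T W X)^{-1} X^T W y
    beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y_train
    return xq @ beta
```

Calling lwr_predict for every query point traces out a curve rather than a single straight line, which is the behaviour described above.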


Chapter – 02

Regression Analysis

Introduction to Regression

Definition:
Regression analysis is a supervised learning technique used to model the relationship
between one or more independent variables (x) and a dependent variable (y).

Objective:
The goal is to predict or forecast the dependent variable (y) based on the independent
variables (x), which are also called explanatory, predictor, or independent variables.

Mathematical Representation:
The relationship is represented by a function of the form y = f(x) + e, where f describes how the dependent variable changes with the independent variables and e is the random error term.

Purpose:

Regression analysis helps to determine how the dependent variable changes when an
independent variable is varied while others remain constant.

It answers key questions such as:

o What is the relationship between variables?


o What is the strength and nature (linear or non-linear) of the relationship?
o What is the relevance and contribution of each variable?


Applications:

 Sales forecasting
 Bond values in portfolio management
 Insurance premiums
 Agricultural yield predictions
 Real estate pricing

Prediction Focus:
Regression is primarily used for predicting continuous or quantitative variables, such as
price, revenue, and other measurable factors.

Introduction to Linear Regression

Definition:
Linear Regression is a fundamental supervised learning algorithm used to model the
relationship between one or more independent variables (predictors) and a dependent
variable (target).

It assumes a linear relationship between the variables.

Objective:
The primary goal of linear regression is to find a linear equation that best fits the data
points. This equation is used to predict the dependent variable based on the values of
the independent variables.

Mathematical Representation:
The relationship is represented as y = a0 + a1x + e, where a0 is the intercept, a1 is the slope (regression coefficient), and e is the error term.

Assumptions:

 Linearity: The relationship between x and y is linear.


 Independence: Observations are independent of each other.


 Homoscedasticity: Constant variance of errors across all levels of x.


 Normality: The residuals (errors) are normally distributed.

Types of Linear Regression:

 Simple Linear Regression: Involves one independent variable.


 Multiple Linear Regression: Involves two or more independent variables.

Applications:

 Predicting house prices based on features like size and location.


 Estimating sales based on advertising expenditure.
 Forecasting stock prices or other financial metrics.
 Modeling growth trends in industries.

Advantages:

 Easy to implement and interpret.


 Efficient for linearly separable data.

Limitations:

 Struggles with non-linear relationships.


 Sensitive to outliers, which can distort predictions.
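
A minimal numeric sketch of fitting y = a0 + a1x by ordinary least squares, using the closed-form estimates for the slope and intercept; the toy data and variable names are illustrative:

```python
import numpy as np

# Fit y = a0 + a1 * x by ordinary least squares (closed-form estimates)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # e.g., advertising spend
y = np.array([2.1, 4.1, 6.2, 7.9, 10.1])     # e.g., observed sales

a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
a0 = y.mean() - a1 * x.mean()                                               # intercept

y_pred = a0 + a1 * 6.0   # predict for a new x value
print(a0, a1, y_pred)
```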


Multiple Linear Regression

A multiple regression model involves multiple predictors (independent variables) and one dependent variable.

This is an extension of the linear regression problem. A basic assumption of multiple linear regression is that the independent variables are not highly correlated, so the multicollinearity problem does not exist.

Also, it is assumed that the residuals are normally distributed.


Definition:
Multiple Linear Regression (MLR) is an extension of simple linear regression, where
multiple independent variables (predictors) are used to model the relationship with a
single dependent variable (target).

Mathematical Representation:
The relationship is represented as y = a0 + a1x1 + a2x2 + ... + anxn + e, where x1, x2, ..., xn are the predictors, a0 is the intercept, a1, ..., an are the regression coefficients, and e is the error term.

Assumptions of Multiple Linear Regression:

No Multicollinearity: The independent variables should not be highly correlated with each other. Multicollinearity can cause issues in estimating the coefficients accurately.

Normality of Residuals: The residuals (errors) should be normally distributed for valid
inference and hypothesis testing.

o Linearity: The relationship between each independent variable and the dependent
variable should be linear.
o Independence of Errors: Observations should be independent of each other.
o Homoscedasticity: The variance of residuals should be constant across all levels of
the independent variables.


Applications:

o Predicting house prices based on multiple features (size, location, number of rooms,
etc.).
o Estimating the sales of a product based on various factors (price, advertising budget,
competition, etc.).
o Modeling health outcomes based on multiple risk factors (age, BMI, physical activity,
etc.).

Advantages:

o Can model the relationship between multiple predictors and a single outcome.
o Provides insights into how different predictors influence the dependent variable.

Limitations:

o If multicollinearity exists (high correlation between predictors), it can affect the stability and interpretability of the model.
o Can be computationally complex with a large number of predictors.
o Sensitive to outliers, which can distort the relationship between variables.
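
A minimal sketch of multiple linear regression solved as a least-squares problem with numpy; the toy house-price data and all names are illustrative:

```python
import numpy as np

# Two predictors (e.g., house size and number of rooms) and one target (price)
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]], dtype=float)
y = np.array([245, 312, 279, 308, 419], dtype=float)

# Add a column of ones for the intercept and solve the least-squares problem
Xb = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(Xb, y, rcond=None)   # [a0, a1, a2]

prediction = np.array([1.0, 2000.0, 4.0]) @ coeffs
print(coeffs, prediction)
```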

Polynomial Regression

Introduction to Polynomial Regression

Definition:
Polynomial Regression is a form of regression analysis that models the relationship
between the independent variable(s) and the dependent variable as a polynomial
function.

It is used when the relationship between variables is non-linear and cannot be effectively
modeled using linear regression.


Purpose:
When the data exhibits a non-linear trend, linear regression may result in large errors.
Polynomial regression overcomes this limitation by fitting a curved line to the data.

Approaches to Handle Non-Linearity:

o Transform the data (for example, a log or square-root transformation) so that a linear model fits the transformed variables.
o Fit a polynomial model of the form y = a0 + a1x + a2x^2 + ... + anx^n, where the degree n controls how flexible the fitted curve is.

Features of Polynomial Regression:

 Captures curved relationships between variables.


 Provides a more flexible model compared to linear regression.

Applications:

 Modeling growth trends in populations or markets.


 Predicting real-world phenomena such as temperature variations, physics
experiments, or chemical reactions.
 Engineering designs involving complex relationships.


Advantages:

 Capable of modeling non-linear relationships without transforming the data.


 Provides a better fit for datasets with curved trends.

Limitations:

 Increasing the polynomial degree can lead to overfitting the training data.
 Sensitive to outliers, which can significantly distort the fitted curve.
 May require careful tuning of the degree n to balance bias and variance.
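
A minimal sketch that fits polynomials of increasing degree to curved data and compares their residual error, illustrating the bias-variance trade-off mentioned above; numpy's polyfit is used as one convenient way to do the fit, and the data are illustrative:

```python
import numpy as np

# Data with a clearly curved (roughly quadratic) trend
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 4.8, 9.3, 16.2, 24.7])

# Fit polynomials of increasing degree; higher degrees fit tighter but risk overfitting
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, deg=degree)                 # least-squares coefficients
    residual = np.sum((np.polyval(coeffs, x) - y) ** 2)   # sum of squared errors
    print(degree, residual)
```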

Logistic Regression

Introduction to Logistic Regression

Definition:
Logistic Regression is a supervised learning algorithm used for classification problems,
particularly binary classification, where the output is a categorical variable with two
possible outcomes (e.g., yes/no, pass/fail, spam/not spam).

Purpose:
Logistic Regression predicts the probability of a categorical outcome and maps the
prediction to a value between 0 and 1. It works well when the dependent variable is
binary.

Applications:

o Email classification: Is the email spam or not?


o Student admission prediction: Should a student be admitted or not based on scores?
o Exam result classification: Will the student pass or fail based on marks?

Core Concept:

o Logistic Regression models the probability of a particular response variable.



o For instance, if the predicted probability of an email being spam is 0.7, there is a 70%
chance the email is spam.

Challenges with Linear Regression for Classification:

o Linear regression can predict values outside the range of 0 to 1, which is unsuitable
for probabilities.
o Logistic Regression overcomes this by using a sigmoid function to map values to the
range [0, 1].

Sigmoid Function:
The sigmoid function (also called the logistic function) is used to map any real number to the range [0, 1]. It is mathematically represented as sigmoid(z) = 1 / (1 + e^(-z)).

Difference between Odds and Probability:

Probability expresses the chance of an event as a value between 0 and 1, whereas odds are the ratio of the probability that the event occurs to the probability that it does not: odds = p / (1 - p).


For example:
If the probability of an event is 0.75, the odds are 0.75 / (1 - 0.75) = 3, i.e., 3 to 1.
Features of Logistic Regression:

 Logistic Regression predicts the probability of a class label.


 It applies a threshold (e.g., 0.5) to determine the class label.
 It is based on the log-odds transformation to linearize the relationship between
variables.

Advantages:

 Simple and efficient for binary classification.


 Works well when the relationship between the dependent and independent
variables is linear (in terms of log-odds).
 Outputs interpretable probabilities.

Limitations:

 Struggles with non-linear decision boundaries (can be addressed with extensions like
polynomial logistic regression).
 Sensitive to outliers in the dataset.
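
A minimal sketch of binary logistic regression trained by gradient descent on the log-loss, using the sigmoid function and a 0.5 threshold as described above; the pass/fail marks data, the learning rate, and all names are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by gradient descent on the log-loss (binary labels 0/1)."""
    Xb = np.column_stack([np.ones(len(X)), X])     # intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                        # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)          # gradient of the log-loss
    return w

# e.g., exam marks -> pass (1) / fail (0)
X = np.array([[35.0], [45.0], [50.0], [55.0], [65.0], [75.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w = fit_logistic((X - X.mean()) / X.std(), y)      # standardize for stable training

p_new = sigmoid(np.array([1.0, (60.0 - X.mean()) / X.std()]) @ w)
print(p_new, int(p_new >= 0.5))                    # probability and class label
```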


Chapter – 03

Decision Tree Learning

Introduction to Decision Tree Learning Model

Overview:

 Decision tree learning is a popular supervised predictive model for classification tasks.
 It performs inductive inference, generalizing from observed examples.
 It can classify both categorical and continuous target variables.
 The model is often used for solving complex classification problems with high
accuracy.

Structure of a Decision Tree:

 Root Node: The topmost node that represents the entire dataset.
 Internal/Decision Nodes: These are nodes that perform tests on input attributes
and split the dataset based on test outcomes.
 Branches: Represent the outcomes of a test condition at a decision node.


 Leaf Nodes/Terminal Nodes: Represent the target labels or output of the decision
process.
 Path: A path from root to leaf node represents a logical rule for classification.

Process of Building a Decision Tree:

Goal: Construct a decision tree from the given training dataset.

Tree Construction:

o Start from the root and recursively find the best attribute for splitting.
o This process continues until the tree reaches leaf nodes that cannot be
further split.
o The tree represents all possible hypotheses about the data.

Output: A fully constructed decision tree that represents the learned model.

Inference or Classification:

Goal: For a given test instance, classify it into the correct target class.

Classification:

o Start at the root node and traverse the tree based on the test conditions for
each attribute.
o Continue evaluating test conditions until reaching a leaf node, which provides
the target class label for the instance.
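
A minimal sketch of this inference step: a hand-written tree stored as nested dictionaries is traversed from the root, following the branch that matches each test outcome, until a leaf label is reached. The weather-style attributes and all names are purely illustrative, not from the notes:

```python
# A hand-written tree in nested-dict form: decision nodes test one attribute,
# branches are the test outcomes, and leaves are class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny":    {"attribute": "humidity",
                     "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain":     {"attribute": "wind",
                     "branches": {"strong": "no", "weak": "yes"}},
    },
}

def classify(node, instance):
    """Traverse from the root, following the branch chosen by each test, until a leaf."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"}))  # yes
```

Each root-to-leaf traversal corresponds to one logical rule of the kind described above.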

Advantages of Decision Trees:

1. Easy to model and interpret.


2. Simple to understand.
3. Can handle both discrete and continuous predictor variables.
4. Can model non-linear relationships between variables.

5. Fast to train.

Disadvantages of Decision Trees:

1. It is difficult to determine how deep the tree should grow and when to stop.
2. Sensitive to errors and missing attribute values in training data.
3. Computational complexity in handling continuous attributes, requiring
discretization.
4. Risk of overfitting with complex trees.
5. Not suitable for classifying multiple output classes.
6. Learning an optimal decision tree is an NP-complete problem.

Decision Tree Induction Algorithms

Several decision tree algorithms are widely used in classification tasks, including ID3,
C4.5, and CART, among others.

These algorithms differ in their splitting criteria, handling of attributes, and robustness
to data characteristics.

Popular Decision Tree Algorithms:

ID3 (Iterative Dichotomizer 3):

o Developed by J.R. Quinlan in 1986.


o Constructs univariate decision trees (splits based on a single attribute).
o Uses Information Gain as the splitting criterion.
o Assumes attributes are discrete or categorical.
o Works well with large datasets but is prone to overfitting on small datasets.
o Cannot handle missing values or continuous attributes directly (requires
discretization).
o No pruning is performed, making it sensitive to outliers.


C4.5:

o An extension of ID3 developed by J.R. Quinlan in 1993.


o Uses Gain Ratio as the splitting criterion, which normalizes Information Gain.
o Can handle both categorical and continuous attributes.
o Handles missing values by estimating the best split based on available data.
o Sensitive to outliers, which can affect the tree construction.

CART (Classification and Regression Trees):

o Developed by Breiman et al. in 1984.


o Can handle categorical and continuous-valued target variables.
o Uses the GINI Index as the splitting criterion for classification tasks.
o Builds binary decision trees (only two splits per node).
o Handles missing values and is robust to outliers.
o Can be used for regression tasks, making it versatile.
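
The splitting criteria named above can be computed in a few lines. A minimal sketch of entropy (the basis of ID3's Information Gain), the GINI index used by CART, and the gain of a candidate split; the tiny label arrays and function names are illustrative:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array (basis of ID3's Information Gain)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """GINI index of a label array (CART's splitting criterion)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, groups):
    """Entropy reduction achieved by splitting `labels` into the given groups."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

y = np.array(["yes", "yes", "no", "no", "yes", "no"])
left, right = y[:3], y[3:]
print(entropy(y), gini(y), information_gain(y, [left, right]))
```

C4.5's Gain Ratio normalizes the information gain by the entropy of the split itself, which penalizes attributes with many distinct values.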

Univariate vs. Multivariate Decision Trees:

Univariate Decision Trees:

o Split based on a single attribute at each decision node.


o Examples: ID3 and C4.5.
o Simple and axis-aligned splits.

Multivariate Decision Trees:

o Consider multiple attributes for splitting at a single decision node.


o Example: CART.
o More complex and better suited for non-linear relationships.


Features of Decision Tree Algorithms

Advantages and Limitations of ID3, C4.5, and CART:


Algorithm
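
As a concrete illustration of decision tree induction, the sketch below builds a small ID3-style tree: at each node it picks the categorical attribute with the highest Information Gain and recurses until the node is pure or no attributes remain. This is an illustrative sketch under those assumptions (no pruning, no continuous attributes), not the textbook's exact listing:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def id3(X, y, attributes):
    """Recursively build a univariate tree on categorical attributes (ID3 style)."""
    # Stop: pure node or no attributes left -> leaf holding the majority class
    if len(np.unique(y)) == 1 or not attributes:
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]

    def gain(a):
        # Information Gain = entropy(parent) - weighted entropy of the child subsets
        vals = np.unique(X[:, a])
        rem = sum((np.sum(X[:, a] == v) / len(y)) * entropy(y[X[:, a] == v]) for v in vals)
        return entropy(y) - rem

    best = max(attributes, key=gain)              # attribute with the highest gain
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for v in np.unique(X[:, best]):
        mask = X[:, best] == v
        node["branches"][v] = id3(X[mask], y[mask], remaining)
    return node

# Tiny illustrative dataset: columns are [outlook, wind], target is play yes/no
X = np.array([["sunny", "weak"], ["sunny", "strong"],
              ["rain", "weak"], ["rain", "strong"],
              ["overcast", "weak"]])
y = np.array(["no", "no", "yes", "no", "yes"])
print(id3(X, y, attributes=[0, 1]))
```

Because no pruning is performed, the sketch shares ID3's weaknesses noted earlier: sensitivity to outliers and a tendency to overfit small datasets.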
