class 3 - classification

The document discusses various data mining techniques, focusing on classification and regression methods such as k-Nearest Neighbors (KNN), Neural Networks, Ensemble Methods, and Linear Regression. It explains the principles behind these methods, including how to classify unknown records, the importance of choosing parameters like 'k' in KNN, and the training process for neural networks. Additionally, it covers advanced topics like boosting, random forests, and regression trees, highlighting their advantages and applications.


Prof. Heitor S Lopes
Prof. Thiago H Silva

Data Mining & Knowledge Discovery

3a - Classification and Regression


KNN
Instance-based classification
Uses specific training records to make predictions, without needing to maintain a model derived from the data

Requires a proximity measure

Example: k-Nearest Neighbors (KNN)

● Uses the k “nearest” points (nearest neighbors) to perform classification
KNN classifier
[Figure: a test record among the training records; choose the k “closest” records]
Basic idea (example):
If it walks like a duck and makes sounds like a duck, then it is probably a duck
KNN classifier
Requires:
● Set of labeled records
● Distance metric
● Value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
● Compute the distance to the other training records
● Identify the k nearest neighbors
● Use the class labels of the nearest neighbors to determine the label of the unknown record (e.g., majority vote)
Definition of nearest neighbor

The k-nearest neighbors of a record x are the points that have the k smallest distances from x
Classification
Compute the distance between two points:

Ex: Euclidean distance, d(p, q) = √( Σi (pi − qi)² )

Determine the class according to the list of nearest neighbors

● Consider the majority of class label votes among the k-nearest neighbors
● Or give a weight to each vote according to the distance (see the sketch below)
○ Weight: w = 1/d²
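Below is a minimal numpy sketch of this procedure (not from the slides): it computes Euclidean distances, takes the k closest records, and counts either plain or 1/d²-weighted votes. The toy dataset, the function name knn_predict, and the small constant guarding against division by zero are illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, weighted=False):
    """Classify x_query by a (possibly distance-weighted) vote of its k nearest neighbors."""
    # Euclidean distance from the query point to every training record
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]              # indices of the k closest records
    votes = {}
    for i in nearest:
        # weighted vote: w = 1 / d^2 (the 1e-12 guard is an illustrative choice)
        w = 1.0 / (dists[i] ** 2 + 1e-12) if weighted else 1.0
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

# toy usage: two "duck" records and two "goose" records
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]])
y = np.array(["duck", "duck", "goose", "goose"])
print(knn_predict(X, y, np.array([1.1, 1.0]), k=3, weighted=True))  # -> "duck"
```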
Classification
Choosing the value of k:
● If k is too small, the classifier is sensitive to noise
● If k is too large, the neighborhood may include points from other classes
Normalization!
Classification
Scale issues
Take scale into account to prevent the distance measurement (e.g., the Euclidean distance) from being dominated by a single attribute
Ex:
A person's height can vary from 1.5 m to 1.8 m
Weight from 2 kg to 150 kg
Income from R$10K to R$1M
(a normalization sketch is given below)
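A minimal sketch of one way to handle the scale issue, min-max normalization; the three records and their values below are purely illustrative.

```python
import numpy as np

# illustrative records: [height (m), weight (kg), income (R$)]
X = np.array([
    [1.55,  60.0,   20_000.0],
    [1.80, 110.0,  950_000.0],
    [1.65,  75.0,   55_000.0],
])

# min-max normalization: rescale every attribute to [0, 1] so that no single
# attribute (here, income) dominates the Euclidean distance
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)
print(X_scaled)
```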
Example: computer purchase
X = (≤ 30, Medium, Yes, Good)
For k = 5
The 5 nearest neighbors are:
● X1 = (≤ 30, High, Yes, Good) Class = No
● X2 = (≤ 30, Medium, No, Good) Class = No
● X3 = (≤ 30, Low, Yes, Good) Class = Yes
● X4 = (> 40, Medium, Yes, Good) Class = Yes
● X5 = (≤ 30, Medium, Yes, Excellent) Class = Yes
Therefore, X is classified as class = Yes (majority vote: 3 Yes against 2 No)
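The majority-vote step of this example can be written in a few lines of Python; the records mirror the slide, while the string encoding of the attribute values is an illustrative assumption (the slides do not specify the distance used for the categorical attributes).

```python
# 5 nearest neighbors of X = (<=30, Medium, Yes, Good) and their classes (from the slide)
neighbors = [
    (("<=30", "High",   "Yes", "Good"),      "No"),
    (("<=30", "Medium", "No",  "Good"),      "No"),
    (("<=30", "Low",    "Yes", "Good"),      "Yes"),
    ((">40",  "Medium", "Yes", "Good"),      "Yes"),
    (("<=30", "Medium", "Yes", "Excellent"), "Yes"),
]

# majority vote over the class labels of the k = 5 neighbors
votes = {}
for _record, label in neighbors:
    votes[label] = votes.get(label, 0) + 1
print(max(votes, key=votes.get))  # -> "Yes" (3 votes against 2)
```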
KNN - other points
k-NN classifiers are “lazy” as they do not explicitly build models

Classifying unknown records is relatively expensive


Neural Networks

Based on the inputs (X1 to X3), predict the output Y


AND logical function
Truth table: the neuron designed to solve the problem must:
- Use x1 and x2 as inputs
- Weight the inputs with synaptic weights
- Perform a summation
- Apply an activation function to produce an output y that must be equal to the desired output
Various types of activation functions
AND logical function
For the example we define the sign (step) function:
● It produces output 1 when the induced field is > 0 and -1 otherwise
The neuron processing rule can be defined as:
● The signal that enters the neuron is v0 = Σi wi·xi

- xi is the i-th input value and wi its synaptic weight

- the neuron output is y0 = f(v0)
AND logical function (learning)
Initialize weights (w0, w1, …, wn)
Repeat
● For each training instance (xi, yi):
● Compute f(w, xi)
● Update weights: wj ← wj + λ · (yi − f(w, xi)) · xij
  where (yi − f(w, xi)) is the error and λ is the learning rate
Until the stop condition is reached
(a sketch of this loop is given below)
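A small numpy sketch of this learning loop for the AND function, assuming the sign (step) activation and a -1/+1 encoding of inputs and targets; the learning rate and epoch limit are illustrative choices.

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])   # inputs x1, x2
y = np.array([-1, -1, -1, 1])                        # AND output in -1/+1 encoding

w = np.zeros(2)      # synaptic weights w1, w2
b = 0.0              # bias (w0)
lr = 0.1             # learning rate (lambda)

for epoch in range(20):                      # stop condition: fixed number of passes
    errors = 0
    for xi, yi in zip(X, y):
        v = np.dot(w, xi) + b                # induced local field
        y_out = 1 if v > 0 else -1           # sign (step) activation
        if y_out != yi:                      # update only when there is an error
            w += lr * (yi - y_out) * xi
            b += lr * (yi - y_out)
            errors += 1
    if errors == 0:                          # converged: all instances classified correctly
        break

print(w, b)                                  # weights of a separating line for AND
```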


AND logical function (learning) - Sigmoid (f)

Example with another activation function, the sigmoid:


XOR logical function

Not linearly separable

Linearly separable means the classes can be separated by a line (or a hyperplane); for XOR this is not possible
XOR logical function
We need to combine more than one neuron
(this allows combining lines)
XOR logical function

The problem on the left is a simplified (easier to analyze) version of the problem on the right
Multilayer neural network
● Possibility of combining neurons into a multilayer network
● Makes it possible to obtain more complex structures
● Can be useful for solving tasks involving non-linear decision surfaces

Multilayer neural network (XOR logical function)

There is no way to calculate the error of the hidden-layer neurons directly, as the desired response does not exist for such neurons.
Multilayer neural network

Backpropagation: the error calculated at the output is propagated back through the network, and the weights are adjusted slightly.
The process is repeated for all inputs and outputs until the error is small or another stopping condition is imposed.
After this process the network is considered trained (a small XOR sketch follows).
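A minimal sketch of a multilayer network trained on XOR with scikit-learn's MLPClassifier; the hidden-layer size, activation, solver and random seed are illustrative choices, and a different seed may be needed for convergence.

```python
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR: not linearly separable

# one hidden layer of 4 neurons is enough to combine the two separating lines
clf = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))                 # expected: [0, 1, 1, 0]
```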
Advantages and disadvantages

Disadvantages:
● Time-consuming training phase; backpropagation iterations can occur hundreds of thousands of times
● Poor interpretability
● Many empirically determined parameters

Advantages:
● High noise tolerance
● Results tend to be good
Ensemble methods

● Build a set of classifiers from the training data

● Predict the class of test records by combining the predictions made by the multiple classifiers
Overview
Types of ensemble methods

● Manipulate data distribution


○ Example: bagging, boosting

● Manipulate input features


○ Example: random forests

● …
Bagging

● Sampling with replacement

● Build a classifier on each bootstrap (a sample taken with repetition)


Bagging - example

● Consider a 1-dimensional dataset

Classifier:
Decision rule: x <= k versus x > k
The split point k is chosen based on entropy
(the split point is calculated from these values; the lower the entropy, the better)

Bagging - example
● Assume the test set is the same as the original data
● Use majority vote to determine the predicted class
(a bagging sketch is given below)
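A sketch of this bagging example with scikit-learn, assuming an illustrative 1-dimensional dataset; each base classifier is a depth-1 tree (a stump, x <= k versus x > k) built on a bootstrap sample of the training data.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# illustrative 1-D data: no single stump can separate the classes perfectly
X = np.array([[0.1], [0.2], [0.3], [0.4], [0.5],
              [0.6], [0.7], [0.8], [0.9], [1.0]])
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

stump = DecisionTreeClassifier(max_depth=1)          # decision rule x <= k vs x > k
bag = BaggingClassifier(estimator=stump,             # `base_estimator` in older scikit-learn
                        n_estimators=10,
                        bootstrap=True,              # sample with replacement
                        random_state=0)
bag.fit(X, y)
print(bag.predict(X))                                # majority vote of the 10 stumps
```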
Boosting
● Iterative procedure to adaptively change the distribution of
training data, focusing more on misclassified records

○ Initially, all N records are given equal weights


○ Unlike bagging, the weights can change at the end of
each boosting round
Boosting
● Incorrectly classified records will have their weights increased
● Correctly classified records will have their weights decreased

- Example 4 is difficult to classify: its weight is increased, so it is more likely to be picked again in subsequent rounds (increasing the selection of example 4)
(a boosting sketch is given below)
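A sketch of boosting with scikit-learn's AdaBoostClassifier on the same illustrative 1-D data; internally, records misclassified in one round receive larger weights, so later stumps concentrate on them.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.1], [0.2], [0.3], [0.4], [0.5],
              [0.6], [0.7], [0.8], [0.9], [1.0]])
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

boost = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),  # `base_estimator` in older versions
                           n_estimators=10, random_state=0)
boost.fit(X, y)                    # each round re-weights the training records
print(boost.predict(X))            # weighted vote of the 10 boosted stumps
```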
Random Forests
● Generate a sample with repetition (bootstrap)

● Create decision trees – at each split, consider only a subset of features

Random Forests
● RF are robust to overfitting
● Computationally fast, even with a large number of features (because of the local restriction to “p” features)

● “p” is a parameter
● The literature suggests p = sqrt(d) or p = log2(d) + 1, where d is the number of dimensions
(a sketch is given below)
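A sketch of a random forest with scikit-learn, assuming a synthetic dataset; max_features="sqrt" corresponds to the p = sqrt(d) suggestion ("log2" is the other one), and all other values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic data with 16 features, only to make the example runnable
X, y = make_classification(n_samples=200, n_features=16, random_state=0)

rf = RandomForestClassifier(n_estimators=100,
                            max_features="sqrt",   # p = sqrt(d); "log2" is the other suggestion
                            bootstrap=True,        # sample with repetition
                            random_state=0)
rf.fit(X, y)
print(rf.score(X, y))
```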
Gradient boosting
Like bagging and boosting, gradient boosting is a methodology
applied on top of another learning algorithm.

Examples:

● XGBoost
● LightGBM
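A sketch with scikit-learn's GradientBoostingClassifier on a synthetic dataset; XGBoost and LightGBM expose a similar fit/predict interface but are not shown here, and the hyperparameter values below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# synthetic data, only to make the example runnable
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# gradient boosting: each new tree is fitted to correct the errors of the ensemble so far
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=0)
gb.fit(X, y)
print(gb.score(X, y))
```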
Linear Regression (LR)
It is a simple supervised learning strategy
Assume that the dependence of Y on X1, X2, ..., Xp is linear

(The true function is generally not linear, but the linear model is a useful approximation)


Linear Regression
What is a good model?
How do we estimate the model parameters?
Simple LR with a single predictor
Assuming the model

Y = β0 + β1·X + ε

where β0 is the point where the line crosses the Y-axis, β1 is the slope of the line (the coefficients or parameters), and ε is an error term

Given some estimates b0 and b1 for the coefficients, we make predictions with

ŷ = b0 + b1·x   (the Y prediction)
Estimating parameters with least squares
If ŷi = b0 + b1·xi, then the error in the estimate for xi is:

ei = yi − ŷi

We define the Sum of Squared Errors: SSE = Σi ei² = Σi (yi − ŷi)²

Estimating parameters with least squares
The least squares approach chooses the b0 and b1 that minimize the SSE

The best regression parameters (which lead to the smallest error variance) are:

b1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
b0 = ȳ − b1·x̄

where x̄ and ȳ are the sample means
(a numpy sketch is given below)
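A numpy sketch of these least-squares estimates, with illustrative data loosely inspired by the query execution-time example; it also computes R², the coefficient of determination defined in the following slides.

```python
import numpy as np

# illustrative data: x = number of words in a query, y = execution time
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0.8, 1.4, 2.1, 2.5, 3.2, 3.9])

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # intercept

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)      # sum of squared errors
sst = np.sum((y - y_bar) ** 2)      # total sum of squares (no-regression baseline)
r2 = 1 - sse / sst                  # coefficient of determination
print(b0, b1, r2)
```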


Example - advertising

A linear approach captures the essence of the relationship, despite the “deficiency” at the beginning

Example – estimating the parameters
Execution time of a query for multiple words:
Overall model accuracy
Without regression, the best estimate of y is the sample mean ȳ

Regression provides a better estimate, but there are still errors

Overall model accuracy
Regression quality is measured by the coefficient of determination:

R² = 1 − SSE/SST,  where SST = Σi (yi − ȳ)²

The higher the R² value, the better the regression (data fit)
Overall model accuracy
Previous example: R² = 0.98
Example - advertising

Separate models, one fitted for each predictor
Multiple linear regression
Models with more than one predictor variable (independent variable)

Each predictor variable has a linear relationship with the response variable

Multiple linear regression
Our model:

Y = β0 + β1·X1 + β2·X2 + … + βp·Xp + ε

We interpret βj as the average effect on Y of a one-unit increase in Xj, holding all other predictors fixed.

In the advertising example:

sales = β0 + β1·TV + β2·radio + β3·newspaper + ε

Multiple linear regression
The fitted model is a hyperplane: it can still be drawn for two independent variables, but is hard to draw for 3+
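A sketch of multiple linear regression with statsmodels (listed in the references), assuming an advertising-style DataFrame with columns TV, radio, newspaper and sales; the data generated below is synthetic, only to make the example runnable.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# synthetic advertising-style data (illustrative, not the real dataset)
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "radio": rng.uniform(0, 50, n),
    "newspaper": rng.uniform(0, 100, n),
})
df["sales"] = 3 + 0.045 * df.TV + 0.19 * df.radio + rng.normal(0, 1, n)

# fit sales = b0 + b1*TV + b2*radio + b3*newspaper
model = smf.ols("sales ~ TV + radio + newspaper", data=df).fit()
print(model.summary())    # coefficients, p-values, R^2 and adjusted R^2
```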
Example result - advertising
The intercept is the sales value when all investments are 0

Indication that when radio is placed in the model, newspaper is not necessary: it is not significant in the presence of TV and radio in the model
Evaluation
Improved with respect to a single variable

Adjusted R²: takes into account the fact that more variables could inflate the R²
Regression Tree

Linear regression may not be a good model

Regression Tree
Leaves represent a numeric value.

In classification trees the leaves are true/false or other discrete classes.
Regression Tree

There are several strategies to prevent overfitting in regression trees. One of them is to limit the minimum number of instances per leaf by a threshold, e.g. 20.
Regression Tree
If there are more features:

- Calculate the best split threshold for each feature.

- Among the candidates, choose the one with the lowest SSE.
(a scikit-learn sketch is given below)
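A sketch of a regression tree with scikit-learn's DecisionTreeRegressor; min_samples_leaf=20 illustrates the minimum-instances threshold, and the sinusoidal data is an illustrative non-linear relationship that a single line would fit poorly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# illustrative non-linear data
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# each split is chosen to minimize the squared error (SSE) within the nodes;
# min_samples_leaf limits the minimum number of instances per leaf (e.g. 20)
tree = DecisionTreeRegressor(min_samples_leaf=20, random_state=0)
tree.fit(X, y)
print(tree.predict([[2.5], [7.5]]))    # leaf means as numeric predictions
```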


References
Official statsmodels library documentation: https://www.statsmodels.org/
Official scikit-learn library documentation: https://www.scikit-learn.org/
Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (complementary material).
Jain, R. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling.
Tan, P. N., Steinbach, M., & Kumar, V. (2018). Introduction to Data Mining. Pearson Education.
