class 3 - classification

The document discusses various data mining techniques, focusing on classification and regression methods such as k-Nearest Neighbors (KNN), Neural Networks, Ensemble Methods, and Linear Regression. It explains the principles behind these methods, including how to classify unknown records, the importance of choosing parameters like 'k' in KNN, and the training process for neural networks. Additionally, it covers advanced topics like boosting, random forests, and regression trees, highlighting their advantages and applications.


Prof. Heitor S Lopes
Prof. Thiago H Silva

Data Mining & Knowledge Discovery

3a - Classification and Regression


KNN
Instance-based classification
Uses specific training records to make predictions, without needing to maintain a model derived from the data

Requires a proximity measure

Example: k-Nearest Neighbors (KNN)

● Uses the k “nearest” points (nearest neighbors) to perform classification
KNN classifier
[Figure: a test record among the training records; choose the k “closest” records]
Basic idea (example):
If it walks like a duck and makes sounds like a duck, then it is probably a duck
KNN classifier
Requires:
● Set of labeled records
● Distance metric
● Value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
● Compute the distance to the other training records
● Identify the k nearest neighbors
● Use the class labels of the nearest neighbors to determine the label of the unknown record (e.g., majority vote)
Definition of nearest neighbor

The k-nearest neighbors of a record x are the points that have the k smallest distances from x
Classification
Compute the distance between two points:

Ex: Euclidean distance, d(p, q) = √( Σi (pi − qi)² )

Determine the class according to the list of nearest neighbors

● Consider the majority of class label votes among the k-nearest neighbors
● Or give a weight to each vote according to the distance (see the sketch below)
○ Weight: w = 1/d²
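Below is a minimal numpy sketch of this procedure (not from the slides): it computes Euclidean distances, takes the k closest records, and counts either plain or 1/d²-weighted votes. The toy dataset, the function name knn_predict, and the small constant guarding against division by zero are illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, weighted=False):
    """Classify x_query by a (possibly distance-weighted) vote of its k nearest neighbors."""
    # Euclidean distance from the query point to every training record
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]              # indices of the k closest records
    votes = {}
    for i in nearest:
        # weighted vote: w = 1 / d^2 (the 1e-12 guard is an illustrative choice)
        w = 1.0 / (dists[i] ** 2 + 1e-12) if weighted else 1.0
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

# toy usage: two "duck" records and two "goose" records
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]])
y = np.array(["duck", "duck", "goose", "goose"])
print(knn_predict(X, y, np.array([1.1, 1.0]), k=3, weighted=True))  # -> "duck"
```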
Classification
Choosing the value of k:
● If k is too small, the classifier is sensitive to noise
● If k is too large, the neighborhood may include points from other classes
Normalization!
Classification
Scale issues
Take scale into account to prevent the distance measurement (e.g., the Euclidean distance) from being dominated by a single attribute
Ex:
A person's height can vary from 1.5 m to 1.8 m
Weight from 2 kg to 150 kg
Income from R$10K to R$1M
(a normalization sketch is given below)
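A minimal sketch of one way to handle the scale issue, min-max normalization; the three records and their values below are purely illustrative.

```python
import numpy as np

# illustrative records: [height (m), weight (kg), income (R$)]
X = np.array([
    [1.55,  60.0,   20_000.0],
    [1.80, 110.0,  950_000.0],
    [1.65,  75.0,   55_000.0],
])

# min-max normalization: rescale every attribute to [0, 1] so that no single
# attribute (here, income) dominates the Euclidean distance
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)
print(X_scaled)
```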
Example: computer purchase
X = (≤ 30, Medium, Yes, Good)
For k = 5
The 5 nearest neighbors are:
● X1 = (≤ 30, High, Yes, Good) Class = No
● X2 = (≤ 30, Medium, No, Good) Class = No
● X3 = (≤ 30, Low, Yes, Good) Class = Yes
● X4 = (> 40, Medium, Yes, Good) Class = Yes
● X5 = (≤ 30, Medium, Yes, Excellent) Class = Yes
Therefore, X is classified as class = Yes (majority vote: 3 Yes against 2 No)
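The majority-vote step of this example can be written in a few lines of Python; the records mirror the slide, while the string encoding of the attribute values is an illustrative assumption (the slides do not specify the distance used for the categorical attributes).

```python
# 5 nearest neighbors of X = (<=30, Medium, Yes, Good) and their classes (from the slide)
neighbors = [
    (("<=30", "High",   "Yes", "Good"),      "No"),
    (("<=30", "Medium", "No",  "Good"),      "No"),
    (("<=30", "Low",    "Yes", "Good"),      "Yes"),
    ((">40",  "Medium", "Yes", "Good"),      "Yes"),
    (("<=30", "Medium", "Yes", "Excellent"), "Yes"),
]

# majority vote over the class labels of the k = 5 neighbors
votes = {}
for _record, label in neighbors:
    votes[label] = votes.get(label, 0) + 1
print(max(votes, key=votes.get))  # -> "Yes" (3 votes against 2)
```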
KNN - other points
k-NN classifiers are “lazy” as they do not explicitly build models

Classifying unknown records is relatively expensive


Neural Networks

Based on the inputs (X1 to X3), predict the output Y


AND logical function
Truth table: the neuron designed to solve the problem must:
- Use x1 and x2 as inputs
- Weight the inputs with synaptic weights
- Perform a summation
- Apply an activation function to produce an output y that must be equal to the desired output
Various types of activation functions
AND logical function
For the example we define the sign (step) function:
● It produces output 1 when the induced field is > 0 and -1 otherwise
The neuron processing rule can be defined as:
● The signal that enters the neuron is v0 = Σi wi·xi

- xi is the i-th input value and wi its synaptic weight

- the neuron output is y0 = f(v0)
AND logical function (learning)
Initialize weights (w0, w1, …, wn)
Repeat
● For each training instance (xi, yi):
● Compute f(w, xi)
● Update weights: wj ← wj + λ · (yi − f(w, xi)) · xij
  where (yi − f(w, xi)) is the error and λ is the learning rate
Until the stop condition is reached
(a sketch of this loop is given below)
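A small numpy sketch of this learning loop for the AND function, assuming the sign (step) activation and a -1/+1 encoding of inputs and targets; the learning rate and epoch limit are illustrative choices.

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])   # inputs x1, x2
y = np.array([-1, -1, -1, 1])                        # AND output in -1/+1 encoding

w = np.zeros(2)      # synaptic weights w1, w2
b = 0.0              # bias (w0)
lr = 0.1             # learning rate (lambda)

for epoch in range(20):                      # stop condition: fixed number of passes
    errors = 0
    for xi, yi in zip(X, y):
        v = np.dot(w, xi) + b                # induced local field
        y_out = 1 if v > 0 else -1           # sign (step) activation
        if y_out != yi:                      # update only when there is an error
            w += lr * (yi - y_out) * xi
            b += lr * (yi - y_out)
            errors += 1
    if errors == 0:                          # converged: all instances classified correctly
        break

print(w, b)                                  # weights of a separating line for AND
```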


AND logical function (learning) - Sigmoid (f)

Example with another activation function, the sigmoid:


XOR logical function

Not linearly separable

Linearly separable means the classes can be separated by a line (or a hyperplane); for XOR this is not possible
XOR logical function
We need to combine more than one neuron
(this allows combining lines)
XOR logical function

The problem on the left is a simplified (easier to analyze) version of the problem on the right
Multilayer neural network
● Possibility of combining neurons into a multilayer network
● Makes it possible to obtain more complex structures
● Can be useful for solving tasks involving non-linear decision surfaces

Multilayer neural network (XOR logical function)

There is no way to calculate the error of the hidden-layer neurons directly, as the desired response does not exist for such neurons.
Multilayer neural network

Backpropagation: the error calculated at the output is propagated back through the network, and the weights are adjusted slightly.
The process is repeated for all inputs and outputs until the error is small or another stopping condition is imposed.
After this process the network is considered trained (a small XOR sketch follows).
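A minimal sketch of a multilayer network trained on XOR with scikit-learn's MLPClassifier; the hidden-layer size, activation, solver and random seed are illustrative choices, and a different seed may be needed for convergence.

```python
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR: not linearly separable

# one hidden layer of 4 neurons is enough to combine the two separating lines
clf = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))                 # expected: [0, 1, 1, 0]
```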
Advantages and disadvantages

Disadvantages:
● Time-consuming training phase; backpropagation iterations can occur hundreds of thousands of times
● Poor interpretability
● Many empirically determined parameters

Advantages:
● High noise tolerance
● Results tend to be good
Ensemble methods

● Build a set of classifiers from the training data

● Predict the class of test records by combining the predictions made by the multiple classifiers
Overview
Types of ensemble methods

● Manipulate data distribution


○ Example: bagging, boosting

● Manipulate input features


○ Example: random forests

● …
Bagging

● Sampling with replacement

● Build a classifier on each bootstrap (a sample taken with repetition)


Bagging - example

● Consider a 1-dimensional dataset

Classifier:
Decision rule: x <= k versus x > k
The split point k is chosen based on entropy
(the split point is calculated from these values; the lower the entropy, the better)

Bagging - example
● Assume the test set is the same as the original data
● Use majority vote to determine the predicted class
(a bagging sketch is given below)
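A sketch of this bagging example with scikit-learn, assuming an illustrative 1-dimensional dataset; each base classifier is a depth-1 tree (a stump, x <= k versus x > k) built on a bootstrap sample of the training data.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# illustrative 1-D data: no single stump can separate the classes perfectly
X = np.array([[0.1], [0.2], [0.3], [0.4], [0.5],
              [0.6], [0.7], [0.8], [0.9], [1.0]])
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

stump = DecisionTreeClassifier(max_depth=1)          # decision rule x <= k vs x > k
bag = BaggingClassifier(estimator=stump,             # `base_estimator` in older scikit-learn
                        n_estimators=10,
                        bootstrap=True,              # sample with replacement
                        random_state=0)
bag.fit(X, y)
print(bag.predict(X))                                # majority vote of the 10 stumps
```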
Boosting
● Iterative procedure to adaptively change the distribution of
training data, focusing more on misclassified records

○ Initially, all N records are given equal weights


○ Unlike bagging, the weights can change at the end of
each boosting round
Boosting
● Incorrectly classified records will have their weights increased
● Correctly classified records will have their weights decreased

- Example 4 is difficult to classify: its weight is increased, so it is more likely to be picked again in subsequent rounds (increasing the selection of example 4)
(a boosting sketch is given below)
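A sketch of boosting with scikit-learn's AdaBoostClassifier on the same illustrative 1-D data; internally, records misclassified in one round receive larger weights, so later stumps concentrate on them.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.1], [0.2], [0.3], [0.4], [0.5],
              [0.6], [0.7], [0.8], [0.9], [1.0]])
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

boost = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),  # `base_estimator` in older versions
                           n_estimators=10, random_state=0)
boost.fit(X, y)                    # each round re-weights the training records
print(boost.predict(X))            # weighted vote of the 10 boosted stumps
```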
Random Forests
● Generate a sample with repetition (bootstrap)

● Create decision trees – at each split, consider only a subset of features

Random Forests
● RF are robust to overfitting
● Computationally fast, even with a large number of features (because of the local restriction to “p” features)

● “p” is a parameter
● The literature suggests p = sqrt(d) or p = log2(d) + 1, where d is the number of dimensions
(a sketch is given below)
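A sketch of a random forest with scikit-learn, assuming a synthetic dataset; max_features="sqrt" corresponds to the p = sqrt(d) suggestion ("log2" is the other one), and all other values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic data with 16 features, only to make the example runnable
X, y = make_classification(n_samples=200, n_features=16, random_state=0)

rf = RandomForestClassifier(n_estimators=100,
                            max_features="sqrt",   # p = sqrt(d); "log2" is the other suggestion
                            bootstrap=True,        # sample with repetition
                            random_state=0)
rf.fit(X, y)
print(rf.score(X, y))
```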
Gradient boosting
Like bagging and boosting, gradient boosting is a methodology
applied on top of another learning algorithm.

Examples:

● XGBoost
● LightGBM
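A sketch with scikit-learn's GradientBoostingClassifier on a synthetic dataset; XGBoost and LightGBM expose a similar fit/predict interface but are not shown here, and the hyperparameter values below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# synthetic data, only to make the example runnable
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# gradient boosting: each new tree is fitted to correct the errors of the ensemble so far
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=0)
gb.fit(X, y)
print(gb.score(X, y))
```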
Linear Regression (LR)
It is a simple supervised learning strategy
Assume that the dependence of Y on X1, X2, ..., Xp is linear

(The true function is generally not linear, but the linear model is a useful approximation)


Linear Regression
What is a good model?
How do we estimate the model parameters?
Simple LR with a single predictor
Assuming the model

Y = β0 + β1·X + ε

where β0 is the point where the line crosses the Y-axis, β1 is the slope of the line (the coefficients or parameters), and ε is an error term

Given some estimates b0 and b1 for the coefficients, we make predictions with

ŷ = b0 + b1·x   (the Y prediction)
Estimating parameters with least squares
If ŷi = b0 + b1·xi, then the error in the estimate for xi is:

ei = yi − ŷi

We define the Sum of Squared Errors: SSE = Σi ei² = Σi (yi − ŷi)²

Estimating parameters with least squares
The least squares approach chooses the b0 and b1 that minimize the SSE

The best regression parameters (which lead to the smallest error variance) are:

b1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
b0 = ȳ − b1·x̄

where x̄ and ȳ are the sample means
(a numpy sketch is given below)
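A numpy sketch of these least-squares estimates, with illustrative data loosely inspired by the query execution-time example; it also computes R², the coefficient of determination defined in the following slides.

```python
import numpy as np

# illustrative data: x = number of words in a query, y = execution time
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0.8, 1.4, 2.1, 2.5, 3.2, 3.9])

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # intercept

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)      # sum of squared errors
sst = np.sum((y - y_bar) ** 2)      # total sum of squares (no-regression baseline)
r2 = 1 - sse / sst                  # coefficient of determination
print(b0, b1, r2)
```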


Example - advertising

A linear approach captures the essence of the relationship, despite the “deficiency” at the beginning

Example – estimating the parameters
Execution time of a query for multiple words:
Overall model accuracy
Without regression, the best estimate of y is the sample mean ȳ

Regression provides a better estimate, but there are still errors

Overall model accuracy
Regression quality is measured by the coefficient of determination:

R² = 1 − SSE/SST,  where SST = Σi (yi − ȳ)²

The higher the R² value, the better the regression (data fit)
Overall model accuracy
Previous example: R² = 0.98
Example - advertising

Separate models, one fitted for each predictor
Multiple linear regression
Models with more than one predictor variable (independent variable)

Each predictor variable has a linear relationship with the response variable

Multiple linear regression
Our model:

Y = β0 + β1·X1 + β2·X2 + … + βp·Xp + ε

We interpret βj as the average effect on Y of a one-unit increase in Xj, holding all other predictors fixed.

In the advertising example:

sales = β0 + β1·TV + β2·radio + β3·newspaper + ε

Multiple linear regression
The fitted model is a hyperplane: it can still be drawn for two independent variables, but is hard to draw for 3+
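A sketch of multiple linear regression with statsmodels (listed in the references), assuming an advertising-style DataFrame with columns TV, radio, newspaper and sales; the data generated below is synthetic, only to make the example runnable.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# synthetic advertising-style data (illustrative, not the real dataset)
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "radio": rng.uniform(0, 50, n),
    "newspaper": rng.uniform(0, 100, n),
})
df["sales"] = 3 + 0.045 * df.TV + 0.19 * df.radio + rng.normal(0, 1, n)

# fit sales = b0 + b1*TV + b2*radio + b3*newspaper
model = smf.ols("sales ~ TV + radio + newspaper", data=df).fit()
print(model.summary())    # coefficients, p-values, R^2 and adjusted R^2
```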
Example result - advertising
The intercept is the sales value when all investments are 0

Indication that when radio is placed in the model, newspaper is not necessary: it is not significant in the presence of TV and radio in the model
Evaluation
Improved with respect to a single variable

Adjusted R²: takes into account the fact that more variables could inflate the R²
Regression Tree

Linear regression may not be a good model

Regression Tree
Leaves represent a numeric value.

In classification trees the leaves are true/false or other discrete classes.
Regression Tree

There are several strategies to prevent overfitting in regression trees. One of them is to limit the minimum number of instances per leaf by a threshold, e.g. 20.
Regression Tree
If there are more features:

- Calculate the best split threshold for each feature.

- Among the candidates, choose the one with the lowest SSE.
(a scikit-learn sketch is given below)
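A sketch of a regression tree with scikit-learn's DecisionTreeRegressor; min_samples_leaf=20 illustrates the minimum-instances threshold, and the sinusoidal data is an illustrative non-linear relationship that a single line would fit poorly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# illustrative non-linear data
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# each split is chosen to minimize the squared error (SSE) within the nodes;
# min_samples_leaf limits the minimum number of instances per leaf (e.g. 20)
tree = DecisionTreeRegressor(min_samples_leaf=20, random_state=0)
tree.fit(X, y)
print(tree.predict([[2.5], [7.5]]))    # leaf means as numeric predictions
```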


References
Official statsmodels library documentation: https://www.statsmodels.org/
Official scikit-learn library documentation: https://www.scikit-learn.org/
Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (complementary material).
Jain, R. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling.
Tan, P. N., Steinbach, M., & Kumar, V. (2018). Introduction to Data Mining. Pearson Education.
