Class 3 - Classification
Heitor S Lopes
Prof. Thiago H Silva
Choose k of the “closest” records.
Basic idea (example): if it walks like a duck and sounds like a duck, then it is probably a duck.
A k-NN classifier requires:
● Set of labeled records
● Distance metric
● Value of k, the number of nearest
neighbors to retrieve
To classify an unknown record:
● Compute the distance to other
training records
● Identify k nearest neighbors
● Use the class labels of the nearest
neighbors to determine the label of
the unknown record (e.g., majority
vote)
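A minimal sketch of this procedure, assuming numeric attributes and Euclidean distance (function and variable names are illustrative, not from the slides):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # Compute the distance from x to every training record
    distances = np.linalg.norm(X_train - x, axis=1)
    # Identify the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' class labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]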
Definition of nearest neighbor
Example
For k = 5, the 5 nearest neighbors are:
● X1 = (≤ 30, High, Yes, Good), Class = No
● X2 = (≤ 30, Medium, No, Good), Class = No
● X3 = (≤ 30, Low, Yes, Good), Class = Yes
● X4 = (> 40, Medium, Yes, Good), Class = Yes
● X5 = (≤ 30, Medium, Yes, Excellent), Class = Yes
Therefore, by majority vote (3 Yes vs. 2 No), X is classified as class = Yes
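As a quick check of the vote, a minimal snippet using only the five class labels listed above:

from collections import Counter

neighbor_labels = ["No", "No", "Yes", "Yes", "Yes"]  # classes of X1..X5
print(Counter(neighbor_labels).most_common(1)[0][0])  # prints "Yes"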
k-NN - other points
k-NN classifiers are “lazy”: they do not explicitly build a model, so all computation is deferred until a record must be classified.
Neural network training (backpropagation):
● The error is calculated at the output
● The error is propagated backward through the network
● The weights are adjusted slightly
The process is repeated for all inputs and outputs until the error is small or another stopping condition is met. After this process, the network is considered trained.
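A minimal sketch of one such weight update, assuming a single linear neuron with squared error (all names are illustrative, not the slides' notation):

import numpy as np

def train_step(w, x, y_true, lr=0.01):
    y_pred = np.dot(w, x)      # forward pass
    error = y_pred - y_true    # calculated error at the output
    grad = error * x           # error propagated to each weight
    return w - lr * grad       # weights adjusted slightly

# Repeating train_step over all inputs/outputs until the error is small
# corresponds to the training loop described above.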
Advantages and disadvantages
● …
Bagging
● Sampling with replacement (bootstrap)
Base classifier: a decision stump with decision rule x ≤ k versus x > k, where the split point k is chosen based on entropy
Bagging - example
● Assume the test set is the same as the original data
● Use majority vote across the bagged classifiers to determine the predicted class
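A minimal bagging sketch under these assumptions: scikit-learn's DecisionTreeClassifier with max_depth=1 and the entropy criterion stands in for the decision stump, and all names are illustrative.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, seed=0):
    # Train each stump on a bootstrap sample (sampling with replacement)
    rng = np.random.default_rng(seed)
    models = []
    n = len(X)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)
        stump = DecisionTreeClassifier(max_depth=1, criterion="entropy")
        models.append(stump.fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    # Majority vote over the bagged classifiers (x is a single record)
    votes = Counter(m.predict(x.reshape(1, -1))[0] for m in models)
    return votes.most_common(1)[0][0]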
Boosting
● Iterative procedure to adaptively change the distribution of
training data, focusing more on misclassified records
● “p” is a parameter
● The literature suggests p = sqrt(d) or p = log2(d) + 1, where d is the number of dimensions
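A minimal sketch of the reweighting idea, assuming AdaBoost-style updates with decision stumps and labels in {-1, +1} (all names are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, n_rounds=10):
    n = len(X)
    w = np.full(n, 1.0 / n)              # initial uniform distribution over records
    models, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()         # weighted error of this round
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)   # focus more weight on misclassified records
        w /= w.sum()                     # renormalize the distribution
        models.append(stump)
        alphas.append(alpha)
    return models, alphas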
Gradient boosting
Like bagging and boosting, gradient boosting is a methodology applied on top of another learning algorithm: each new model is fit to the residual errors of the ensemble built so far.
Examples:
● XGBoost
● LightGBM
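A minimal sketch of the residual-fitting idea for regression, assuming shallow trees as base learners (names are illustrative; libraries such as XGBoost and LightGBM add many refinements on top of this):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gboost_fit(X, y, n_rounds=100, lr=0.1):
    pred = np.full(len(y), y.mean())     # start from the mean prediction
    models = []
    for _ in range(n_rounds):
        residual = y - pred              # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += lr * tree.predict(X)     # take a small step toward y
        models.append(tree)
    return models, y.mean()

def gboost_predict(models, base, X, lr=0.1):
    # lr must match the value used in gboost_fit
    return base + lr * sum(m.predict(X) for m in models)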
Linear Regression (LR)
It is a simple supervised learning strategy
Assume that the dependence of Y on X1, X2, ..., Xp is linear. For a single predictor X, the model is
Y = β0 + β1X + ε
where β0 is the point where the line crosses the Y-axis, β1 is the slope of the line (the coefficients or parameters), and ε is an error term.
Y prediction: ŷ = β̂0 + β̂1x
Estimating parameters with least squares
If ŷi = β̂0 + β̂1xi is the prediction for the i-th observation, then the error (residual) in the estimate for xi is ei = yi − ŷi.
Least squares chooses the coefficients that minimize the residual sum of squares RSS = e1² + e2² + ... + en², where
β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²  and  β̂0 = ȳ − β̂1x̄
The higher the R² value, the better the regression fits the data.
Overall model accuracy
Regression quality is measured by the coefficient of determination:
R² = 1 − RSS/TSS, where TSS = Σ(yi − ȳ)² is the total sum of squares.
Previous example: R² = 0.98
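A minimal numpy sketch of these estimates for the simple model above (variable names are illustrative):

import numpy as np

def simple_ols(x, y):
    # Least-squares estimates and R² for y = β0 + β1·x + ε
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)             # ei = yi - ŷi
    rss = np.sum(resid ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return b0, b1, 1 - rss / tss          # R² = 1 - RSS/TSS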
Example - advertising
Separate models (one simple regression per predictor)
Multiple linear regression
Models with more than one predictor (independent) variable:
Y = β0 + β1X1 + β2X2 + ... + βpXp + ε
Example result - advertising
The intercept β0 is the sales value when all advertising investments are 0.
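A minimal sketch of fitting such a multiple regression, assuming a numeric feature matrix; scikit-learn's LinearRegression is one standard choice, and the numbers below are toy advertising-style values, not the course's dataset:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: rows = markets, columns = media budgets (e.g., TV, radio, newspaper)
X = np.array([[230.1, 37.8, 69.2],
              [ 44.5, 39.3, 45.1],
              [ 17.2, 45.9, 69.3]])
y = np.array([22.1, 10.4, 9.3])           # sales

model = LinearRegression().fit(X, y)
print(model.intercept_)   # β0: predicted sales when all investments are 0
print(model.coef_)        # one β per predictor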