Week 5 - Instance-Based Learning & PCA
Summer 2024
Today’s Agenda
• Recommendation System
• Instance Based Learning
• k Nearest Neighbors (kNN)
• Principal Component Analysis (PCA)
• Labs – PCA
• HW3 Walkthrough - Video (coming tomorrow)
Recommendation Systems
A common architecture for recommendation systems:
Credit: https://developers.google.com/machine-learning/recommendation/overview/types
Case Study (University Track & Field Team)
Fundamentals (K-NN)
Fundamentals (Feature Space)
K Nearest Neighbours (K-NN) – Example
K Nearest Neighbours (K-NN) – Exercise 1
Fundamentals (Distance Metrics)
One of the best-known metrics is Euclidean distance, which computes the length of the straight line between two points. The Euclidean distance between two instances a and b in an m-dimensional feature space is defined as:
$$\mathrm{Euclidean}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{m} (a_i - b_i)^2}$$
Fundamentals (Distance Metrics)
Example
• The Euclidean distance between instances d12 (SPEED = 5.00, AGILITY = 2.5) and d5 (SPEED = 2.75, AGILITY = 7.5) is:
$$\mathrm{Euclidean}(d_{12}, d_5) = \sqrt{(5.00 - 2.75)^2 + (2.5 - 7.5)^2} = \sqrt{30.0625} \approx 5.48$$
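A quick way to verify this calculation in Python (a minimal sketch; the SPEED and AGILITY values are taken from the example above):

```python
import math

# SPEED and AGILITY values for the two instances in the example
d12 = (5.00, 2.5)
d5 = (2.75, 7.5)

# Euclidean distance: square root of the sum of squared feature differences
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(d12, d5)))
print(round(dist, 2))  # 5.48
```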
K Nearest Neighbours (K-NN) – Exercise 1
K Nearest Neighbours (K-NN) – Exercise 2
Question: Should we select an athlete with the following profile this year?
Is the athlete “Draft” or “Non-draft”?
Using 1-NN?
Using 3-NN?
Using 5-NN?
Handling Noisy Data
Fundamentals (Distance Metrics)
Figure: (a) the Voronoi tessellation of the feature space for the dataset; (b) the decision boundary created by aggregating the neighboring Voronoi regions that belong to the same target level.
Fundamentals (Decision Trees)
K-NN Algorithm (Summary)
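The slide above summarizes the K-NN procedure; below is a minimal from-scratch sketch of those steps (compute distances, keep the K closest training instances, majority vote). The toy SPEED/AGILITY dataset is purely illustrative, not the actual draft data:

```python
import math
from collections import Counter

def knn_predict(query, training_data, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    training_data: list of (feature_vector, label) pairs.
    """
    # 1. Compute the Euclidean distance from the query to every training instance
    distances = [
        (math.dist(query, features), label)
        for features, label in training_data
    ]
    # 2. Sort by distance and keep the k closest instances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Majority vote over the labels of the k neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Illustrative (made-up) SPEED/AGILITY training instances
train = [((5.0, 2.5), "Non-draft"), ((2.75, 7.5), "Draft"), ((4.0, 6.0), "Draft")]
print(knn_predict((3.5, 6.5), train, k=1))
print(knn_predict((3.5, 6.5), train, k=3))
```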
How do we tune K in k-NN?
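One common approach is to try a range of candidate K values with cross-validation and keep the value with the best validation score. A hedged sketch using scikit-learn (the dataset here is just a placeholder):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Try odd values of K to reduce the chance of ties; score by 5-fold CV accuracy
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the K with the best cross-validated accuracy
print(search.best_score_)   # its mean cross-validated accuracy
```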
Dealing with a tie (draw) situation
In a two-class problem, a tie is only possible when K is an even number.
Quiz Time!
Data Normalization
Data Normalization
Figure: A dataset listing the salary and age information for customers and whether or not they purchased a pension plan.
Data Normalization
This odd prediction is caused by the features taking different ranges of values, which is equivalent to the features having different variances.
We can adjust for this using normalization; the equation for range normalization is:
$$a'_i = \frac{a_i - \min(a)}{\max(a) - \min(a)}$$
which rescales each feature to the range [0, 1] (it can be shifted and scaled to any target range [low, high]).
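A small sketch of range (min-max) normalization applied to salary and age features; the values below are illustrative, not the dataset from the figure:

```python
import numpy as np

def range_normalize(column, low=0.0, high=1.0):
    """Rescale a feature column to the interval [low, high]."""
    col = np.asarray(column, dtype=float)
    return low + (col - col.min()) * (high - low) / (col.max() - col.min())

salary = [55000, 68000, 46000, 80000, 62000]   # illustrative values
age = [39, 54, 23, 45, 30]

print(range_normalize(salary))  # both features now lie in [0, 1]
print(range_normalize(age))
```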
Predicting Continuous Targets
This time, instead of majority voting, we use a (distance-weighted) average of the targets of the top K nearest neighbours; a common choice of weight is the reciprocal of the squared distance:
$$\hat{y}(q) = \frac{\sum_{i=1}^{K} w_i \, t_i}{\sum_{i=1}^{K} w_i}, \qquad w_i = \frac{1}{d(q, x_i)^2}$$
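A minimal sketch of distance-weighted k-NN regression following the formula above (weights are the reciprocal of the squared distance; the whisky-style data below is illustrative):

```python
import math

def knn_regress(query, training_data, k=3):
    """Predict a continuous target as the distance-weighted average of the
    targets of the k nearest neighbours."""
    # Keep the k training instances closest to the query
    nearest = sorted(
        (math.dist(query, features), target)
        for features, target in training_data
    )[:k]
    # Weight each neighbour by 1 / d^2 (closer neighbours count more)
    weights = [1.0 / (d ** 2 + 1e-12) for d, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)

# Illustrative (feature_vector, target) pairs, e.g. (age, strength) -> price
train = [((12, 40.0), 55.0), ((10, 43.0), 48.0), ((18, 46.0), 95.0)]
print(knn_regress((12, 43.0), train, k=3))
```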
Whisky Dataset
K-NN Demos
Best Demos
• http://vision.stanford.edu/teaching/cs231n-demos/knn/
• http://sleepyheads.jp/apps/knn/knn.html
Pros and Cons of K-NN
Advantages of K-NN:
1. K-NN is intuitive and simple: the algorithm is very easy to understand and equally easy to implement. To classify a new data point, K-NN reads through the whole dataset to find its K nearest neighbours.
2. K-NN has no assumptions: K-NN is a non-parametric algorithm, which means there are no assumptions about the data that must be met before it can be applied. Parametric models such as linear regression have many assumptions the data must satisfy, which is not the case with K-NN.
3. No training step: K-NN does not explicitly build a model; it simply tags a new data entry based on the historical data. The new entry is assigned the majority class among its nearest neighbours.
4. It constantly evolves: since K-NN is an instance-based (memory-based) learner, the classifier adapts immediately as new training data is collected, allowing the algorithm to respond quickly to changes in the input during real-time use.
5. Very easy to apply to multi-class problems: most classification algorithms are easy to implement for binary problems and need extra effort for multi-class problems, whereas K-NN handles multiple classes without any extra effort.
6. Can be used for both classification and regression: one of the biggest advantages of K-NN is that it works for both classification and regression problems.
7. One hyperparameter: K-NN may take some time to select its single hyperparameter K, but after that there is little else to tune.
8. A variety of distance criteria to choose from: K-NN gives the user the flexibility to choose a distance metric when building the model, such as Euclidean, Hamming, Manhattan, or Minkowski distance.
Pros and Cons of K-NN
Disadvantages of K-NN:
1. K-NN is a slow algorithm: K-NN may be very easy to implement, but as the dataset grows, the efficiency (speed) of the algorithm declines very quickly.
2. Curse of dimensionality: K-NN works well with a small number of input variables, but as the number of variables grows, the algorithm struggles to predict the output for a new data point.
3. K-NN needs homogeneous features: if you build K-NN using a common distance such as Euclidean or Manhattan distance, the features must share the same scale, since absolute differences are weighted equally; i.e., a given distance in feature 1 must mean the same as in feature 2.
4. Optimal number of neighbours: one of the biggest issues with K-NN is choosing the optimal number of neighbours to consider when classifying a new data entry.
5. Imbalanced data causes problems: K-NN does not perform well on imbalanced data. If we consider two classes, A and B, and the majority of the training data is labelled A, the model will give a lot of preference to A, which may result in the less common class B being wrongly classified.
6. Outlier sensitivity: K-NN is very sensitive to outliers, as it simply chooses the neighbours based on distance criteria.
Brain Break – 5 min
Dimensionality Reduction
• Motivation
  • Data compression
  • Data visualization
• Principal component analysis
  • Intuition
  • Formulation
  • Algorithm
  • Reconstruction
  • Choosing the number of principal components
  • Applying PCA
A beginner-friendly tutorial on PCA with the math behind it:
http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/pca_tutorial.pdf
Principal Component Analysis
Figure: data with its two eigenvector directions (Eigenvector 1 and Eigenvector 2); $h^2 = a^2 + b^2$.
Dot Product of Two Vectors
Intuitively, the dot product is a measure of how much two vectors are aligned.
So, if we have two vectors, u and v, the dot product gives the length of v along u scaled by the length of u; when u is a unit vector, this is exactly the projection of v onto u.
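A quick numerical illustration with NumPy (the vectors are chosen arbitrarily):

```python
import numpy as np

u = np.array([3.0, 0.0])                # reference direction
v = np.array([2.0, 2.0])

dot = np.dot(u, v)                      # 6.0: large when the vectors are aligned
proj_length = dot / np.linalg.norm(u)   # length of v's projection onto u: 2.0
print(dot, proj_length)
```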
Eigenvectors & Eigenvalues
In linear algebra, an eigenvector of a linear transformation is a nonzero vector that changes at most
by a scalar factor when that linear transformation is applied to it.
The corresponding eigenvalue, often denoted by 𝜆, is the factor by which the eigenvector is scaled.
• Lastly, all the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular, i.e., at right angles to each other, no matter how many dimensions you have.
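A small NumPy check of these definitions on a symmetric 2×2 matrix (the values are arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                      # symmetric, covariance-like matrix

eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigh is for symmetric matrices
v = eigenvectors[:, 0]                          # first eigenvector (a column)

# A @ v only rescales v by its eigenvalue; the direction does not change
print(np.allclose(A @ v, eigenvalues[0] * v))   # True

# Eigenvectors of a symmetric matrix are perpendicular to each other
print(np.isclose(np.dot(eigenvectors[:, 0], eigenvectors[:, 1]), 0.0))  # True
```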
Data Compression
• Reduce data from 2D to 1D: replace each point $x^{(i)} = (x_1, x_2)$ by a single number $z^{(i)}$ along the direction $z_1$.
$$x^{(1)} \in \mathbb{R}^2 \rightarrow z^{(1)} \in \mathbb{R}$$
$$x^{(2)} \in \mathbb{R}^2 \rightarrow z^{(2)} \in \mathbb{R}$$
$$\vdots$$
$$x^{(m)} \in \mathbb{R}^2 \rightarrow z^{(m)} \in \mathbb{R}$$
Data Compression
• Reduce data from 3D to 2D
$$x^{(i)} \in \mathbb{R}^3 \rightarrow z^{(i)} = (z_1, z_2) \in \mathbb{R}^2$$
(Figure: 3-D points with axes $x_1, x_2, x_3$ projected onto a 2-D plane with axes $z_1, z_2$.)
Data pre-processing
If different features are on different scales, scale the features so they have a comparable range of values (subtract the mean $\mu_j$ and divide by a scale $s_j$ such as the standard deviation or range):
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$$
Principal Component Analysis Algorithm
Figure: the original data in the $(x_1, x_2)$ feature space and the transformed data after projection onto the principal components.
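A minimal NumPy sketch of the PCA steps (mean-normalize, compute the covariance matrix, eigen-decompose, project onto the top-k components, and optionally reconstruct); the function and variable names are illustrative:

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project the rows of X (m samples x n features) onto the top-k principal components."""
    # 1. Mean-normalize each feature
    mu = X.mean(axis=0)
    X_centered = X - mu

    # 2. Covariance matrix of the centered data
    Sigma = (X_centered.T @ X_centered) / X.shape[0]

    # 3. Eigen-decomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]        # sort descending by variance
    U_reduce = eigvecs[:, order[:k]]         # top-k principal directions

    # 4. Project the data: z = x @ U_reduce
    Z = X_centered @ U_reduce

    # 5. (Reconstruction) x_approx = z @ U_reduce.T + mu
    X_approx = Z @ U_reduce.T + mu
    return Z, X_approx

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # illustrative 3-D data
Z, X_approx = pca_fit_transform(X, k=2)
print(Z.shape, X_approx.shape)               # (100, 2) (100, 3)
```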
How do we choose k (the number of principal components)?
• Average squared projection error: $\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\|^2$
• Total variation in the data: $\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} \right\|^2$
where $x_{\mathrm{approx}}^{(1)}, x_{\mathrm{approx}}^{(2)}, \ldots, x_{\mathrm{approx}}^{(m)}$ are the reconstructions of the original points from their projections.
• Check if
$$\frac{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\|^2}{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01 \; ?$$
i.e., choose the smallest k for which 99% of the variance is retained.
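In practice this check is usually done with the explained variance ratio; a hedged sketch with scikit-learn (the dataset is only a placeholder):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)      # placeholder dataset

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k such that at least 99% of the variance is retained
# (equivalent to the <= 0.01 check on the projection-error ratio above)
k = int(np.searchsorted(cumulative, 0.99) + 1)
print(k)
```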
"
Application of PCA
• Compression
• Reduce memory/disk needed to store data
• Speed up learning algorithm
• Visualization (k=2, k=3)
Figure: the original image and its reconstructions from the 60 most important eigenvectors.
http://en.wikipedia.org/wiki/Discrete_cosine_transform
Coming up Next…
• Week 6 – Decision Trees & Ensemble Methods
• Homework #3 due June 14 (@ 7pm Pacific Time)