
CS6140: Machine Learning

k-Nearest Neighbors (k-NN) & Principal Component Analysis (PCA)

Dr. Ryan Rad

Summer 2024
Today’s Agenda

• Recommendation Systems
• Instance-Based Learning
• k-Nearest Neighbors (k-NN)
• Principal Component Analysis (PCA)
• Labs – PCA
• HW3 Walkthrough – Video (coming tomorrow)

2
Recommendation Systems

4
A common architecture for recommendation systems:

Credit: https://developers.google.com/machine-learning/recommendation/overview/types

Content-based filtering
• Definition: Uses similarity between items to recommend items similar to what the user likes.
• Example: If user A watches two cute cat videos, then the system can recommend cute animal videos to that user.

Collaborative filtering
• Definition: Uses similarities between queries and items simultaneously to provide recommendations.
• Example: If user A is similar to user B, and user B likes video 1, then the system can recommend video 1 to user A (even if user A hasn't seen any videos similar to video 1).

Interested to learn more?
• https://developers.google.com/machine-learning/recommendation
• https://www.datacamp.com/tutorial/recommender-systems-python

5
Case-Study
(University Track & Field Team)

Every year Northeastern University recruits some additional talented athletes to join the team.

• How can k-NN support our team selection process?

6
Fundamentals (K-NN)

The fundamentals of similarity-based learning are:

• Feature space
• Similarity metrics

7
Fundamentals (Feature Space)

Figure: A feature space plot of the speed and agility ratings for 20 college athletes, labelled with the decision of whether they were drafted or not. The triangles represent 'Non-draft' instances and the crosses represent 'Draft' instances.

8
Fundamentals (Feature Space)

A feature space is an abstract n-dimensional space.

• In a feature space, each descriptive feature corresponds to an axis.
• Each instance in the dataset is mapped to a point in the feature space.

9
K Nearest Neighbours (K-NN)

Example

• Should we draft an athlete with the following profile:

Query: SPEED = 6.75, AGILITY= 3


Using 1-NN?

10
K Nearest Neighbours (K-NN) – Exercise 1

Question: Should we select an athlete with the following profile this year?

Query: SPEED = 6.75, AGILITY= 3


Using 1-NN?

11
Fundamentals (Distance Metrics)

One of the best-known metrics is Euclidean distance, which computes the length of the straight line between two points. The Euclidean distance between two instances a and b in an m-dimensional feature space is defined as:
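For reference, the standard formula (summing over the m descriptive features) is:

$$\mathrm{Euclidean}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{m} \left(a_i - b_i\right)^2}$$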

12
Fundamentals (Distance Metrics)

Example
• The Euclidean distance between instances d12 (SPEED = 5.00, AGILITY = 2.50) and d5 (SPEED = 2.75, AGILITY = 7.50) is:

Euclidean(⟨5.00, 2.50⟩, ⟨2.75, 7.50⟩) = √((5.00 − 2.75)² + (2.50 − 7.50)²) = √30.0625 = 5.4829
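A quick way to verify this number, e.g. with NumPy (a minimal sketch, not part of the original slides):

```python
import numpy as np

d12 = np.array([5.00, 2.50])   # SPEED, AGILITY for instance d12
d5 = np.array([2.75, 7.50])    # SPEED, AGILITY for instance d5

# Euclidean distance is the L2 norm of the difference vector
print(np.linalg.norm(d12 - d5))  # 5.4829...
```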

13
K Nearest Neighbours (K-NN) – Exercise 1

Query: SPEED = 6.75, AGILITY = 3

Is this "Draft" or "Non-draft"? DRAFT!

• Nearest "Draft" (plus) instance: SPEED = 7, AGILITY = 4
  Distance: √((7 − 6.75)² + (4 − 3)²) = √1.0625 ≈ 1.03
• Nearest "Non-draft" (triangle) instance: SPEED = 5, AGILITY = 2.5
  Distance: √((5 − 6.75)² + (2.5 − 3)²) = √3.3125 ≈ 1.82

The "Draft" instance is closer, so 1-NN predicts DRAFT.

14
K Nearest Neighbours (K-NN) – Exercise 2
Question: Should we select an athlete with the following profile this year?

Query: SPEED = 8.5, AGILITY = 9

Is this "Draft" or "Non-draft"?

Using 1-NN?

Using 3-NN?
Using 5-NN?
15
Handling Noisy Data

Figure: Is the instance at the top right of the diagram really noise?

16
Handling Noisy Data

Figure: The decision boundary using majority classification of the nearest 3 neighbors.

17
Handling Noisy Data

Figure: The decision boundary using majority classification of the nearest 5 neighbors.

18
Fundamentals (Distance Metrics)

(a) The Voronoi tessellation of the feature space for the dataset.
(b) The decision boundary created by aggregating the neighboring Voronoi regions that belong to the same target level.
19
Fundamentals (Distance Metrics)

One of the great things about nearest neighbour algorithms is that we can add in new data to update the model very easily.

21
Fundamentals (Nearest Neighbour Algorithm)

The Nearest Neighbour Algorithm

• Require: a set of training instances
• Require: a query to be classified
1: Iterate across the instances in memory and find the instance that is the shortest distance from the query position in the feature space.
2: Make a prediction for the query equal to the value of the target feature of that nearest neighbor.
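A minimal Python sketch of this 1-NN procedure (the function name and data layout are illustrative, not from the slides):

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, query):
    """Return the target value of the training instance closest to the query.

    X_train: (n_instances, n_features) array of descriptive features
    y_train: length-n_instances array of target values
    query:   length-n_features array
    """
    # Step 1: find the training instance at the shortest Euclidean distance
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argmin(distances)
    # Step 2: predict the target feature of that nearest neighbor
    return y_train[nearest]

# Example using the two candidate neighbors from the earlier exercise
X_train = np.array([[7.0, 4.0], [5.0, 2.5]])   # SPEED, AGILITY
y_train = np.array(["Draft", "Non-draft"])
print(nearest_neighbor_predict(X_train, y_train, np.array([6.75, 3.0])))  # "Draft"
```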

23
K- NN Algorithm
(summary)

Photo via kdnuggets.com

24
How to tune the K in k-NN ?

What happens if we select a very small value for K?

What happens if we select a very large value for K?

25
Dealing with a tie (draw) situation
Only possible when K is an even number (assuming a binary classification task)

26
Dealing with a tie (draw) situation
Only possible when K is an even number (assuming a binary classification task)

In a distance-weighted k-nearest-neighbor algorithm, the contribution of each neighbor to the classification decision is weighted by the reciprocal of the squared distance between the neighbor d and the query q:

w(q, d) = 1 / dist(q, d)²

Figure: The weighted KNN decision boundary.
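A small sketch of distance-weighted voting, assuming the k nearest neighbors are given as (distance, label) pairs (the helper name is illustrative, not from the slides):

```python
from collections import defaultdict

def weighted_knn_vote(neighbors):
    """neighbors: list of (distance, label) pairs for the k nearest neighbors.

    Each neighbor votes for its label with weight 1 / distance**2, so closer
    neighbors count more and ties are effectively broken.
    """
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist ** 2 + 1e-12)  # epsilon guards against dist == 0
    return max(votes, key=votes.get)

# Two "Draft" neighbors far away are outvoted by one very close "Non-draft" neighbor
print(weighted_knn_vote([(2.0, "Draft"), (2.5, "Draft"), (0.5, "Non-draft")]))  # Non-draft
```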

27
Quiz Time!
Data Normalization

29
Data Normalization

Figure: A dataset listing the salary and age information for customers and whether or not they purchased a pension plan.

The marketing department wants to decide whether or not they should contact a customer with the following profile:
⟨SALARY = 56,000, AGE = 35⟩ ?

30
Data Normalization


31
Data Normalization

This odd prediction is caused by the features taking different ranges of values, which is equivalent to the features having different variances.

We can adjust for this using normalization; the equation for range normalization is:
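The standard range (min–max) normalization rescales each feature value into a target interval [low, high]:

$$a_i' = \frac{a_i - \min(a)}{\max(a) - \min(a)} \times (high - low) + low$$

With low = 0 and high = 1, this reduces to $(a_i - \min(a)) / (\max(a) - \min(a))$.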

32
Data Normalization


33
Data Normalization

Normalizing the data is an important thing to do for almost all machine learning algorithms, not just nearest neighbor!
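A hedged sketch of how this is often done in practice with scikit-learn (the slides do not prescribe a particular library; the values below are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy SALARY / AGE data on very different scales (illustrative values)
X_train = np.array([[46000.0, 39.0], [62000.0, 56.0], [55000.0, 44.0]])
X_query = np.array([[56000.0, 35.0]])

# Fit the scaler on the training data only, then apply the same transformation
# to the query so both live in the same [0, 1] range per feature.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_query_scaled = scaler.transform(X_query)
print(X_train_scaled)
print(X_query_scaled)
```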

34
Predicting Continuous Targets

This time, instead of majority voting, we use the (weighted) average of the target values of the k nearest neighbors:

Return the average value in the neighborhood:
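In the standard unweighted and distance-weighted forms (with t_i the target value of the i-th neighbor):

$$\mathbb{M}_k(q) = \frac{1}{k}\sum_{i=1}^{k} t_i
\qquad \text{or} \qquad
\mathbb{M}_k(q) = \frac{\sum_{i=1}^{k} t_i / dist(q, d_i)^2}{\sum_{i=1}^{k} 1 / dist(q, d_i)^2}$$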

35
Whisky
Dataset

36
Predicting Continuous Targets

Figure: A dataset of whiskeys listing the age (in years), the rating (between 1 and 5, with 5 being the best), and the bottle price of each whiskey.

37
Predicting Continuous Targets

Figure: The whiskey dataset after the descriptive features have been normalized.

38
Predicting Continuous Targets

Figure: The AGE and RATING feature space for the whiskey dataset. The location of the query instance is indicated by the ? symbol. The circle plotted with a dashed line demarcates the border of the neighborhood around the query when k = 3. The three nearest neighbors to the query are labelled with their ID values.

39
Predicting Continuous Targets

The model will return a price prediction that is the average price of the three neighbors:

(200.00 + 250.00 + 55.00) / 3 = 168.33

40
Predicting Continuous Targets

Table: The calculations for the weighted k-nearest neighbor prediction.

(411.64 + 5987.53 + 4494.38) / (7.4844 + 29.9376 + 17.9775) ≈ 197
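A quick check of this weighted average, using the per-neighbor prices and weights (weight = 1 / dist(q, d)²) implied by the numbers above; the pairing below is reconstructed, not given explicitly on the slide:

```python
# (price, weight) pairs for the three nearest whiskeys, where weight = 1 / dist(q, d)**2
neighbors = [(55.00, 7.4844), (200.00, 29.9376), (250.00, 17.9775)]

numerator = sum(price * weight for price, weight in neighbors)    # ≈ 10893.5
denominator = sum(weight for _, weight in neighbors)              # ≈ 55.40
print(numerator / denominator)                                    # ≈ 196.6 (≈ 197)
```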

41
K-NN Demos

Best Demos
• http://vision.stanford.edu/teaching/cs231n-demos/knn/
• http://sleepyheads.jp/apps/knn/knn.html

42
Pros and Cons of K-NN
Advantages of K-NN:

1. K-NN is intuitive and simple: the algorithm is very easy to understand and equally easy to implement. To classify a new data point, K-NN reads through the whole dataset to find its K nearest neighbors.
2. K-NN makes no assumptions: K-NN is a non-parametric algorithm, which means there are no distributional assumptions to be met before it can be applied. Parametric models like linear regression have many assumptions the data must satisfy, which is not the case with K-NN.
3. No training step: K-NN does not explicitly build a model; it simply tags a new data entry based on the historical data, assigning it the majority class among its nearest neighbors.
4. It constantly evolves: because it is an instance-based (memory-based) learner, the classifier adapts immediately as new training data is collected, which allows the algorithm to respond quickly to changes in the input during real-time use.
5. Very easy to use for multi-class problems: many classifiers are easy to implement for binary problems but need extra effort for multi-class ones, whereas K-NN handles multiple classes without any extra effort.
6. Can be used for both classification and regression: one of the biggest advantages of K-NN is that it supports both classification and regression problems.
7. One hyperparameter: choosing K may take some time, but after that the rest of the method follows from it.
8. A variety of distance criteria to choose from: K-NN gives the user the flexibility to choose a distance measure while building the model, e.g., Euclidean, Hamming, Manhattan, or Minkowski distance.
Pros and Cons of K-NN

Disadvantages of K-NN:
1. K-NN is a slow algorithm: K-NN may be very easy to implement, but as the dataset grows, the efficiency and speed of the algorithm decline quickly.
2. Curse of dimensionality: K-NN works well with a small number of input variables, but as the number of variables grows, the algorithm struggles to predict the output for new data points.
3. K-NN needs homogeneous features: if you build K-NN using a common distance such as Euclidean or Manhattan, it is essential that the features share the same scale, since absolute differences in features are weighted equally, i.e., a given difference in feature 1 must mean the same as in feature 2.
4. Optimal number of neighbors: one of the biggest issues with K-NN is choosing the optimal number of neighbors to consider when classifying a new data entry.
5. Imbalanced data causes problems: K-NN does not perform well on imbalanced data. If we consider two classes, A and B, and the majority of the training data is labeled A, then the model will give a lot of preference to A, which may result in the less common class B being wrongly classified.
6. Outlier sensitivity: K-NN is very sensitive to outliers, since it chooses neighbors purely on the basis of distance.
Brain Break – 5 min
Dimensionality Reduction

• Motivation
• Data compression
• Data visualization
• Principal component analysis
• Intuition
• Formulation
• Algorithm
• Reconstruction
• Choosing the number of principal components
• Applying PCA
A beginner-friendly tutorial on PCA with the math behind it:
http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/pca_tutorial.pdf
Principal Component Analysis

Figure: data plotted with Eigenvector 1 and Eigenvector 2 overlaid (h² = a² + b²).
Dot Product of Two Vectors

Intuitively, the dot product is a measure of how much two vectors are aligned.

So, if we have two vectors u and v, and u has unit length, then the dot product between them gives the length of v along u, i.e., the projection of v onto u.
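In symbols (standard definitions, not specific to these slides):

$$\mathbf{u}\cdot\mathbf{v} = \sum_{i} u_i v_i = \lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert\cos\theta,
\qquad
\text{projection of } \mathbf{v} \text{ onto } \mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert}$$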
Eigenvectors & Eigenvalues

In linear algebra, an eigenvector of a linear transformation is a nonzero vector that changes at most
by a scalar factor when that linear transformation is applied to it.

The corresponding eigenvalue, often denoted by 𝜆, is the factor by which the eigenvector is scaled.
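Equivalently, for a square matrix A, an eigenvector v and its eigenvalue λ satisfy:

$$A\mathbf{v} = \lambda\mathbf{v}, \qquad \mathbf{v} \neq \mathbf{0}$$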
Week 6 Quiz – Q4

Which of the following figures correspond to possible values that PCA may return for the first principal component (the first eigenvector)? Select ALL that apply.

(Figure: four candidate directions, shown in panels A, B, C, and D.)

Two acceptable answers:
• Only A (partial credit)
• Both A and C
Eigenvectors & Eigenvalues

• For a square matrix A (n × n), there are at most n linearly independent eigenvectors.
• For a 3 × 3 matrix, there are up to 3 eigenvectors.
• For a symmetric matrix (such as the covariance matrices PCA works with), the eigenvectors are mutually perpendicular, i.e., at right angles to each other, no matter how many dimensions you have.
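A small NumPy sketch illustrating this for a symmetric covariance matrix (the data below is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 instances, 3 features
X = X - X.mean(axis=0)                   # mean-center before computing covariance

cov = (X.T @ X) / X.shape[0]             # 3 x 3 covariance matrix (symmetric)

# eigh is specialized for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

print(eigenvalues)                       # 3 eigenvalues
print(eigenvectors.T @ eigenvectors)     # ≈ identity: the eigenvectors are orthonormal
```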
Data Compression

• Reduces the required time and storage space


• Removing multi-collinearity improves the interpretation of the parameters of the machine learning model.

Reduce data from 2D to 1D by projecting each instance onto a single new axis z₁:

$$x^{(1)} \in \mathbb{R}^2 \rightarrow z^{(1)} \in \mathbb{R}, \quad
x^{(2)} \in \mathbb{R}^2 \rightarrow z^{(2)} \in \mathbb{R}, \quad \ldots, \quad
x^{(m)} \in \mathbb{R}^2 \rightarrow z^{(m)} \in \mathbb{R}$$
Data Compression
• Reduce data from 3D to 2D

(Figure: 3D data points (x₁, x₂, x₃) projected onto a 2D plane with new axes z₁ and z₂.)
Data pre-processing

• Training set: $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$

• Preprocessing (feature scaling / mean normalization):

$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$$

Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$.

If different features are on different scales, scale the features to have a comparable range of values:

$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$$
Principal Component Analysis Algorithm

Goal: Reduce data from n-dimensions to k-dimensions

• Step 1: Compute the "covariance matrix":

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T}$$

• Step 2: Compute the "eigenvectors" of the covariance matrix.
• Principal components: $u^{(1)}, u^{(2)}, \ldots, u^{(k)} \in \mathbb{R}^{n}$
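A minimal NumPy sketch of these two steps (this mirrors the slide's recipe; in practice np.linalg.svd or sklearn.decomposition.PCA is usually preferred for numerical stability):

```python
import numpy as np

def pca_fit(X, k):
    """Return the top-k principal components (columns of U_reduce).

    X is an (m, n) data matrix that has already been mean-normalized.
    """
    m = X.shape[0]
    Sigma = (X.T @ X) / m                      # Step 1: n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # Step 2: eigenvectors of Sigma
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    return eigvecs[:, order[:k]]               # n x k matrix U_reduce

# Example: reduce toy 3-D data to k = 2 dimensions
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X = X - X.mean(axis=0)                         # mean normalization
U_reduce = pca_fit(X, k=2)
print(U_reduce.shape)                          # (3, 2)
```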
Reconstruction from compressed representation

• Compression: $z^{(i)} = U_{reduce}^{T}\, x^{(i)}$
• Reconstruction: $x_{approx}^{(i)} = U_{reduce}\, z^{(i)}$
• Dimensions: $x_{approx}^{(i)} \in \mathbb{R}^{n}$, $U_{reduce} \in \mathbb{R}^{n \times k}$, $z^{(i)} \in \mathbb{R}^{k}$

(Figure: the original data in the (x₁, x₂) plane and the transformed/reconstructed data.)
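Continuing the NumPy sketch above (reusing the mean-centered X and U_reduce from the previous snippet):

```python
# Compression: project each instance onto the k principal components
Z = X @ U_reduce                  # (m, k) compressed representation

# Reconstruction: map back to the original n-dimensional space
X_approx = Z @ U_reduce.T         # (m, n) approximation of X

# Average squared projection error (used below to choose k)
print(np.mean(np.sum((X - X_approx) ** 2, axis=1)))
```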
How do we choose k (the number of principal components)?

• Average squared projection error: $\frac{1}{m}\sum_{i=1}^{m} \left\lVert x^{(i)} - x_{approx}^{(i)} \right\rVert^{2}$

• Total variation in the data: $\frac{1}{m}\sum_{i=1}^{m} \left\lVert x^{(i)} \right\rVert^{2}$

• Typically, choose k to be the smallest value so that

$$\frac{\frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - x_{approx}^{(i)}\right\rVert^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)}\right\rVert^{2}} \leq 0.01 \quad (1\%)$$

"99% of variance is retained"
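In practice, scikit-learn offers a shortcut for this criterion (a sketch, using the library rather than the manual procedure below): passing a float to n_components asks for the smallest k that retains that fraction of the variance, and explained_variance_ratio_ exposes the per-component fractions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # correlated toy data

# Smallest k such that at least 99% of the variance is retained
pca = PCA(n_components=0.99)
Z = pca.fit_transform(X)

print(pca.n_components_)                           # the chosen k
print(np.cumsum(pca.explained_variance_ratio_))    # cumulative variance retained
```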


How do we choose k (the number of principal components)?

• Try PCA with k = 1, 2, ⋯
• Compute $U_{reduce}$, $z^{(1)}, z^{(2)}, \ldots, z^{(m)}$, and $x_{approx}^{(1)}, x_{approx}^{(2)}, \ldots, x_{approx}^{(m)}$
• Check if

$$\frac{\frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - x_{approx}^{(i)}\right\rVert^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)}\right\rVert^{2}} \leq 0.01\ ?$$
Application of PCA

• Compression
• Reduce memory/disk needed to store data
• Speed up learning algorithm
• Visualization (k=2, k=3)

• Bad use of PCA


• Reduce the number of features -> less likely to overfit?
• Use regularization instead.
Variance ≠ Predictive Power

• High Variance, Low Predictive Power:


• Favorite Color in Customer Demographics: Imagine you're building a model to
predict customer churn (when a customer stops using your service). While
favorite color has a lot of variance (people have many different favorites), it likely
has no bearing on whether someone cancels.
• Low Variance, High Predictive Power:
• Purchase History (Same Product in Last Month): This feature might have low
variance (many customers might not have bought the same product recently), but
it's a strong indicator of potential future purchase.
Application: Image compression

Original Image

• Divide the original 372×492 image into patches:
  • Each patch is an instance that contains 12×12 pixels on a grid
  • View each patch as a 144-D vector
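A sketch of this patch-based compression pipeline (the patch-extraction details and image source are assumptions, not from the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

def compress_patches(image, patch=12, k=16):
    """Split a grayscale image into non-overlapping patch x patch blocks,
    project each block (a 144-D vector for 12x12) onto k principal
    components, and reconstruct the image from the compressed codes."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch            # crop so patches tile exactly
    blocks = (image[:h, :w]
              .reshape(h // patch, patch, w // patch, patch)
              .swapaxes(1, 2)
              .reshape(-1, patch * patch))         # (n_patches, patch*patch)

    pca = PCA(n_components=k)
    codes = pca.fit_transform(blocks)              # compressed: (n_patches, k)
    approx = pca.inverse_transform(codes)          # reconstructed patches

    return (approx.reshape(h // patch, w // patch, patch, patch)
                  .swapaxes(1, 2)
                  .reshape(h, w))

# Usage (assuming `img` is a 2-D grayscale NumPy array):
# img_16 = compress_patches(img, patch=12, k=16)
```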
PCA compression: 144D → 60D
PCA compression: 144D → 16D
16 most important eigenvectors
60 most important eigenvectors

Looks like the discrete cosine bases used in JPEG!


2D Discrete Cosine Basis

http://en.wikipedia.org/wiki/Discrete_cosine_transform
Coming up Next…

• Week 6 – Decision Trees & Ensemble Methods
• Homework #3 due June 14 (@ 7pm Pacific Time)
• Team Formation due June 14
• Course Survey due June 14

I'll introduce course projects in Week 6.
Lab Session:
Principal-Component-Analysis
Questions?
