

which is a pairwise distance metric that computes the distance between two data points x[a]
and x[b] over the m input features.

Figure 5: Illustration of k NN for a 3-class problem with k=5.

2.4.2 Regression

The general concept of k NN for regression is the same as for classification: first, we find
the k nearest neighbors in the dataset; second, we make a prediction based on the labels
of the k nearest neighbors. However, in regression, the target function is a real- instead of
discrete-valued function,
f : \mathbb{R}^m \to \mathbb{R}.    (9)
A common approach for computing the continuous target is to compute the mean or average
target value over the k nearest neighbors,
h(x^{[t]}) = \frac{1}{k} \sum_{i=1}^{k} f(x^{[i]}).    (10)

As an alternative to averaging the target values of the k nearest neighbors to predict the
label of a query point, it is also not uncommon to use the median instead.
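To make this concrete, here is a minimal NumPy sketch of kNN regression with both aggregation options (an illustration only; the function name knn_regress, the Euclidean distance, and the toy data are assumptions, not part of the original notes):

import numpy as np

def knn_regress(X_train, y_train, x_query, k=3, aggregate="mean"):
    # Euclidean distances between the query point and all n training points
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Predict the continuous target as the mean (Eq. 10) or the median
    if aggregate == "mean":
        return y_train[nearest].mean()
    return np.median(y_train[nearest])

# Toy usage example
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([1.1, 1.9, 3.2, 9.8])
print(knn_regress(X_train, y_train, np.array([2.5]), k=3))
print(knn_regress(X_train, y_train, np.array([2.5]), k=3, aggregate="median"))

Using the median instead of the mean makes the prediction more robust to outlier target values among the neighbors.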

2.5 Curse of Dimensionality

The kNN algorithm is particularly susceptible to the curse of dimensionality.^7 In machine learning, the curse of dimensionality refers to scenarios with a fixed number of training examples but an increasing number of dimensions and an increasing range of feature values in each dimension of a high-dimensional feature space.
In kNN, an increasing number of dimensions becomes increasingly problematic because the more dimensions we add, the larger the volume in the hyperspace needs to be to capture a fixed number of neighbors. As this volume grows larger and larger, the "neighbors" become less and less "similar" to the query point, since they are now all relatively distant from the query point when all of the different dimensions are taken into account in the pairwise distance computation.

^7 David L. Donoho et al. "High-dimensional data analysis: The curses and blessings of dimensionality". In: AMS Math Challenges Lecture 1.2000 (2000), p. 32.
For example, consider a single dimension with unit length (range [0, 1]). Now, if we consider 100 training examples that are uniformly distributed, we expect one training example to be located at every 0.01 units along the [0, 1] interval or axis. So, to consider the three nearest neighbors of a query point, we expect to cover 3/100 of the feature axis. However, if we add a second dimension, the expected edge length required to include the same amount of data (3 neighbors) now increases to 0.03^{1/2} (we now have a unit square). In other words, instead of requiring 0.03 × 100% = 3% of the space to include 3 neighbors in 1D, we now need to consider 0.03^{1/2} × 100% ≈ 17.3% of the 2D space to cover the same number of data points; the density decreases with the number of dimensions. In 10 dimensions, that's now 0.03^{1/10} ≈ 70.4% of the hypervolume we need to consider to include three neighbors on average. You can see that in high dimensions we need to take a large portion of the hypervolume into consideration (assuming a fixed number of training examples) to find the k nearest neighbors, and these so-called "neighbors" may then not be particularly "close" to the query point anymore.
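A quick numerical check of these edge lengths (an illustrative snippet, not part of the original notes):

# Edge length of the hypercube needed to enclose a fraction p of uniformly
# distributed data in d dimensions: p ** (1 / d)
p = 0.03  # fraction of the training data we want to capture (3 neighbors out of 100)
for d in (1, 2, 10):
    print(f"d = {d:2d}: edge length = {p ** (1 / d):.3f}")
# d =  1: edge length = 0.030
# d =  2: edge length = 0.173
# d = 10: edge length = 0.704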

2.6 Computational Complexity and the Big-O Notation

The Big-O notation is used in both mathematics and computer science to study the asymptotic behavior of functions, i.e., their asymptotic upper bounds. In the context of algorithms in computer science, the Big-O notation is most commonly used to measure the time complexity or runtime of an algorithm for the worst-case scenario. (Often, it is also used to measure memory requirements.)
Since Big-O notation and complexity theory are, in general, areas of research in computer science, we will not go into too much detail in this course. However, you should at least be familiar with the basic concepts, since they are an essential component of the study of machine learning algorithms.

f(n)        Name
1           Constant
log n       Logarithmic
n           Linear
n log n     Log-linear
n^2         Quadratic
n^3         Cubic
n^c         Higher-level polynomial
2^n         Exponential
Figure 6: An illustration of the growth rates of the common functions O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3), and O(2^n).

Note that in “Big O” analysis, we only consider the most dominant term, as the other terms
and constants become insignificant asymptotically. For example, consider the function
f(x) = 14x^2 - 10x + 25.    (11)

The worst-case complexity of this function is O(x^2), since x^2 is the dominant term.
Next, consider the example

f(x) = (2x + 8) \log_2(x^2 + 9).    (12)

In "Big O" notation, that is O(x log x). Note that we do not need to distinguish between different bases of the logarithm, e.g., log_10 or log_2, since a change of base is just a scalar factor, given the conversion

\log_2(x) = \log_{10}(x) / \log_{10}(2),    (13)

where 1/\log_{10}(2) is just a scaling factor.
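As a quick numerical sanity check of Eq. (13) (an illustrative snippet, not from the original notes):

import math

x = 1000.0
# log2(x) equals log10(x) divided by the constant log10(2)
print(math.log2(x))                    # approximately 9.9658
print(math.log10(x) / math.log10(2))   # approximately 9.9658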
Lastly, consider this naive example of implementing matrix multiplication in Python:

A = [[1, 2, 3],
     [2, 3, 4]]

B = [[5, 8],
     [6, 9],
     [7, 10]]

def matrixmultiply(A, B):
    # Result matrix C has len(A) rows and len(B[0]) columns
    C = [[0 for col in range(len(B[0]))]
         for row in range(len(A))]
    for row_a in range(len(A)):
        for col_b in range(len(B[0])):
            for col_a in range(len(A[0])):
                C[row_a][col_b] += \
                    A[row_a][col_a] * B[col_a][col_b]
    return C

matrixmultiply(A, B)

Result:

[[38, 56],
[56, 83]]

Due to the three nested for-loops, the runtime complexity of this function is O(n^3).

2.6.1 Big O of k NN

For the brute-force neighbor search of the kNN algorithm, we have a time complexity of O(n × m), where n is the number of training examples and m is the number of dimensions in the training set. For simplicity, assuming n ≫ m, the complexity of the brute-force nearest neighbor search is O(n). In the next section, we will briefly go over a few strategies for improving the runtime of the kNN model.
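For intuition, a brute-force distance computation for a single query point might look like the following sketch (illustrative code; the function name and the toy data are assumptions). The outer loop over the n training examples and the inner loop over the m features are what give the O(n × m) cost:

def squared_euclidean_distances(X_train, x_query):
    # One pass over all n training examples (outer loop) and all m features
    # (inner loop): O(n * m) work for a single query point.
    distances = []
    for x_i in X_train:                      # n iterations
        d = 0.0
        for a, b in zip(x_i, x_query):       # m iterations
            d += (a - b) ** 2
        distances.append(d)
    return distances

X_train = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(squared_euclidean_distances(X_train, [2.0, 2.0]))  # [1.0, 5.0, 25.0]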

2.7 Improving Computational Performance

2.7.1 Naive k NN Algorithm in Pseudocode

Below are two naive approaches (Variant A and Variant B) for finding the k nearest neighbors of a query point x^{[q]}; a Python sketch of Variant A is given after the pseudocode.
Variant A

D_k := {}
while |D_k| < k:

• closest distance := ∞
• for i = 1, ..., n, ∀i ∉ D_k:
  – current distance := d(x^{[i]}, x^{[q]})
  – if current distance < closest distance:
    ∗ closest distance := current distance
    ∗ closest point := x^{[i]}
• add closest point to D_k

Variant B

D_k := D
while |D_k| > k:

• largest distance := 0
• for i = 1, ..., n, ∀i ∈ D_k:
  – current distance := d(x^{[i]}, x^{[q]})
  – if current distance > largest distance:
    ∗ largest distance := current distance
    ∗ farthest point := x^{[i]}
• remove farthest point from D_k
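Here is the promised minimal Python sketch of Variant A (illustrative code; the Euclidean distance and the helper names euclidean and knn_variant_a are assumptions, not part of the original pseudocode):

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_variant_a(X_train, x_query, k):
    # D_k collects the indices of the k nearest neighbors found so far
    D_k = set()
    while len(D_k) < k:
        closest_distance, closest_index = float("inf"), None
        # Scan all points not yet selected (i not in D_k)
        for i, x_i in enumerate(X_train):
            if i in D_k:
                continue
            current_distance = euclidean(x_i, x_query)
            if current_distance < closest_distance:
                closest_distance, closest_index = current_distance, i
        D_k.add(closest_index)
    return D_k

X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
print(knn_variant_a(X_train, [0.0, 0.1], k=2))  # {0, 3}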

Using a Priority Queue

Both Variant A and Variant B are expensive algorithms, O(k × n) and O((n − k) × n), respectively. However, with a simple trick, we can improve the nearest neighbor search to O(n log(k)). For instance, we could implement a priority queue using a heap data structure.^8
We initialize the heap with k arbitrary points from the training dataset based on their distances to the query point. Then, as we iterate through the dataset to find the k nearest neighbors of the query point, at each step we compare the current point with the points and distances stored in the heap. If the point with the largest stored distance in the heap is farther away from the query point than the current point under consideration, we remove that farthest point from the heap and insert the current point. Once we have finished one iteration over the training dataset, we have a set of the k nearest neighbors.
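Below is a minimal sketch of this heap-based search using Python's heapq module (illustrative code; the function name knn_heap and the toy data are assumptions). Since heapq implements a min-heap, we store negated distances so that the root of the heap always corresponds to the farthest of the current k candidates; each push or replace costs O(log(k)), giving O(n log(k)) overall:

import heapq
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_heap(X_train, x_query, k):
    # Max-heap of size k via negated distances: heap[0] is the farthest candidate
    heap = []
    for i, x_i in enumerate(X_train):
        d = euclidean(x_i, x_query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))       # fill the heap with the first k points
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))    # replace the current farthest candidate
    return [i for _, i in heap]                 # indices of the k nearest neighbors

X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
print(sorted(knn_heap(X_train, [0.0, 0.1], k=2)))  # [0, 3]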

2.7.2 Data Structures

Different data structures have been developed to improve the computational performance of kNN during prediction. In particular, the idea is to be smarter about identifying the k nearest neighbors. Instead of comparing each training example in the training set to a given query point, approaches have been developed to partition the search space more efficiently. The details of these data structures are beyond the scope of this lecture since they require some background in computer science and data structures, but interested students are encouraged to read the literature referenced in this section.
Bucketing
The simplest approach is "bucketing".^9 Here, we divide the search space into identical, similarly-sized cells (or buckets) that resemble a grid (picture a grid over a 2-dimensional hyperspace or plane).
KD-Tree
A KD-Tree,^10 which stands for k-dimensional search tree, is a generalization of binary search trees. KD-Tree data structures have a time complexity of O(log(n)) on average (but O(n) in the worst case) and work well in relatively low dimensions. KD-Trees partition the search space perpendicular to the feature axes in a Cartesian coordinate system. However, with a large number of features, KD-Trees become increasingly inefficient, and alternative data structures, such as Ball-Trees, should be considered.^11
Ball-Tree
In contrast to the KD-Tree approach, the Ball-Tree^12 partitioning algorithms are based on the construction of hyperspheres instead of cubes. While Ball-Tree algorithms are generally more expensive to run than KD-Trees, they address some of the shortcomings of the KD-Tree approach and are more efficient in higher dimensions.
^8 A heap is a specialized tree-based data structure that makes lookups of the smallest (or largest) element more efficient. You are not expected to know how heaps work in the exam, but you are encouraged to learn more about this data structure. A good overview is provided on Wikipedia, with links to primary sources: https://en.wikipedia.org/wiki/Heap_%28data_structure%29
^9 Ronald L. Rivest. "On the Optimality of Elia's Algorithm for Performing Best-Match Searches." In: IFIP Congress. 1974, pp. 678–681.
^10 Jon Louis Bentley. "Multidimensional binary search trees used for associative searching". In: Communications of the ACM 18.9 (1975), pp. 509–517.
^11 Note that software implementations such as the KNeighborsClassifier in the scikit-learn library have an algorithm='auto' default setting that chooses the most appropriate data structure automatically.
^12 Stephen M. Omohundro. Five Balltree Construction Algorithms. International Computer Science Institute, Berkeley, 1989.

Note that these data structures and space-partitioning algorithms each come with their own set of hyperparameters (e.g., the leaf size, or settings related to the leaf size). Detailed discussions of these data structures are beyond the scope of this class.
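As a brief usage illustration (assuming the scikit-learn library is installed; the synthetic data and parameter values are arbitrary), the data structure used for the neighbor search can be selected via the algorithm parameter of KNeighborsClassifier mentioned in the footnote above:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                 # 100 synthetic training points in 3 dimensions
y = (X[:, 0] > 0.5).astype(int)      # arbitrary binary labels

# 'auto', 'brute', 'kd_tree', or 'ball_tree'; leaf_size is a tuning hyperparameter
knn = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree', leaf_size=30)
knn.fit(X, y)
print(knn.predict(X[:3]))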

2.7.3 Dimensionality Reduction

Next, to help reduce the effect of the curse of dimensionality, dimensionality reduction strategies are also useful for speeding up the nearest neighbor search by making the computation of the pairwise distances "cheaper" (a brief sketch follows the list below). There are two approaches to dimensionality reduction:

• Feature Selection (e.g., Sequential Forward Selection)

• Feature Extraction (e.g., Principal Component Analysis)

We will cover both feature selection and feature extraction as separate topics later in this
course.
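As a rough sketch of the feature-extraction route (assuming scikit-learn; the number of components and the synthetic data are arbitrary illustrative choices), PCA can be chained with a kNN classifier in a Pipeline so that the distance computations operate on fewer dimensions:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.rand(200, 50)                     # 200 synthetic points in 50 dimensions
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Project onto 5 principal components before the (now cheaper) distance computations
model = Pipeline([
    ("pca", PCA(n_components=5)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X, y)
print(model.predict(X[:5]))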

2.7.4 Faster Distance Metric/Heuristic

kNN is compatible with any pairwise distance metric. However, the choice of the distance metric affects the runtime performance of the algorithm. For instance, computing the Mahalanobis distance is much more expensive than calculating the more straightforward Euclidean distance.
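To illustrate the difference in cost (a small sketch assuming SciPy is available; the synthetic data are arbitrary), note that the Mahalanobis distance additionally requires estimating and inverting a covariance matrix, and each evaluation involves a matrix-vector product:

import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

rng = np.random.RandomState(0)
X = rng.rand(500, 10)          # synthetic training data
u, v = X[0], X[1]

# Euclidean distance: a single pass over the m features
print(euclidean(u, v))

# Mahalanobis distance: needs the inverse covariance matrix of the data,
# and each evaluation involves a matrix-vector product (more expensive)
VI = np.linalg.inv(np.cov(X, rowvar=False))
print(mahalanobis(u, v, VI))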

2.7.5 “Pruning”

There are different kinds of "pruning" approaches that we could use to speed up the kNN algorithm, for example, editing and prototype selection.
Editing
In edited k NN, we permanently remove data points that do not affect the decision boundary.
For example, consider a single data point (aka “outlier”) surrounded by many data points
from a different class. If we perform a k NN prediction, this single data point will not
influence the class label prediction in plurality voting; hence, we can safely remove it.

Figure 7: Illustration of k NN editing, where we can remove points from the training set that do
not influence the predictions. For example, consider a 3-NN model. On the left, the two points
enclosed in dashed lines would not affect the decision boundary as “outliers.” Similarly, points of
the “right” class that are very far away from the decision boundary, as shown in the right subpanel,
do not influence the decision boundary and hence could be removed for efficiency concerning data
storage or the number of distance computations.

Prototypes
Another strategy (somewhat related to KMeans, a clustering algorithm that we will cover towards the end of this course) is to replace selected data points by prototypes that summarize multiple data points in dense regions.

2.7.6 Parallelizing k NN

kNN is one of those algorithms that are very easy to parallelize, and there are many different ways to do so. For instance, we could use distributed approaches like map-reduce and place subsets of the training dataset on different machines for the distance computations. Further, the distance computations themselves can be carried out using parallel computations on multiple processors via CPUs or GPUs.
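As one possible illustration (a sketch using Python's multiprocessing module; the chunking scheme and function names are assumptions, not from the notes), the per-chunk distance computations for a query point can be farmed out to separate worker processes:

import math
from multiprocessing import Pool

def chunk_distances(args):
    # Compute distances from the query point to one chunk of the training data
    chunk, x_query = args
    return [math.dist(x_i, x_query) for x_i in chunk]

if __name__ == "__main__":
    X_train = [[float(i), float(i) + 1.0] for i in range(1000)]
    x_query = [0.0, 0.0]
    n_workers = 4
    size = len(X_train) // n_workers
    chunks = [X_train[i:i + size] for i in range(0, len(X_train), size)]

    with Pool(n_workers) as pool:
        results = pool.map(chunk_distances, [(c, x_query) for c in chunks])

    # Flatten the per-chunk results back into a single distance list
    distances = [d for part in results for d in part]
    print(len(distances), min(distances))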
2.8 Distance measures

There are many distance metrics or measures we can use to select k nearest neighbors. There
is no “best” distance measure, and the choice is highly context- or problem-dependent.

Figure 8: The phrase "nearest" is ambiguous and depends on the distance metric we use. (The figure contrasts points at a Euclidean distance of 1 with points at a Manhattan distance of 1 from a query point.)

For continuous features, probably the most common distance metric is the Euclidean distance.