
Why is KNN a poor choice for a spam filter?

What is KNN?

 KNN is a very simple algorithm used to solve classification
problems. KNN stands for K-Nearest Neighbors; k is the number
of nearest neighbors whose labels are used to classify a new
point. A minimal sketch in Python follows.
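
A minimal sketch of KNN classification, assuming scikit-learn is
available; the tiny word-count features and labels below are invented
purely for illustration:

    # Minimal KNN sketch (illustrative toy data, not from the slides).
    from sklearn.neighbors import KNeighborsClassifier

    # Toy features per email: [count of "free", count of "meeting"]
    X_train = [[8, 0], [6, 1], [7, 0],   # spam-like
               [0, 5], [1, 4], [0, 6]]   # non-spam-like
    y_train = ["spam", "spam", "spam", "ham", "ham", "ham"]

    # k is the number of nearest neighbors that vote on each prediction
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    print(knn.predict([[7, 1]]))  # lands in the spam cluster -> ['spam']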
Why is KNN a poor choice as a spam filter?
 KNN classifiers are good whenever there is a really
meaningful distance metric. In the spam case, a KNN
classifier will label as spam whatever is "close" to
known spam, where "close" is defined by your distance
metric (which, for raw text, will likely be poor).
Therefore, KNN classifiers are only going to filter
spam that is really similar to spam you already know
about; they won't really generalize properly.
Also, you have to train on non-spam examples too,
and KNN will suffer from the same problem: it will
only confidently say something is non-spam if it is
written very similarly to a non-spam email that KNN
was trained on. A contrived sketch of this failure
mode follows.
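
A contrived sketch of the generalization failure, assuming
scikit-learn; the four training messages and the reworded spam are
made up for illustration:

    # Bag-of-words KNN on a tiny invented corpus. A spam message that
    # avoids the known spam vocabulary maps to the zero vector, which
    # is nearest to the shortest ham message, so KNN says "ham".
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    train = ["win free money now", "free money claim prize",
             "meeting agenda attached", "lunch tomorrow at noon"]
    labels = ["spam", "spam", "ham", "ham"]

    vec = CountVectorizer()
    X = vec.fit_transform(train).toarray()
    knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)

    new_spam = "exclusive offer unlock reward today"  # no seen words
    x = vec.transform([new_spam]).toarray()
    print(knn.predict(x))  # -> ['ham']: "far" from every known spam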
 Limitations of KNN as a spam filter

1. Doesn't work well with a large dataset:
Since KNN is a distance-based algorithm, the cost of
calculating the distance between a new point and every
existing point is very high, which degrades the
performance of the algorithm (see the sketch after
limitation 2).
2. Doesn't work well with a high number of
dimensions:
For the same reason as above: in a higher-dimensional
space, calculating each distance becomes more
expensive. Distances also become less discriminative
as dimensions are added (the curse of dimensionality),
so the "nearest" neighbors become less meaningful.
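
A rough sketch of the per-query cost, assuming NumPy; the sizes n and
d are arbitrary illustrative numbers:

    # Brute-force KNN must compute the distance from the query to all
    # n stored points, so one query costs on the order of n * d
    # operations before we even select the k smallest distances.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100_000, 100          # illustrative: n emails, d features
    X_train = rng.random((n, d))
    query = rng.random(d)

    dists = np.linalg.norm(X_train - query, axis=1)  # ~n * d work
    k = 5
    nearest = np.argpartition(dists, k)[:k]  # indices of the k closest
    print(nearest)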
 [Figure: distribution of the e-mails data set]
 3. Sensitive to outliers and missing values:
KNN is sensitive to outliers and missing values,
so we first need to impute the missing values and
get rid of the outliers before applying the KNN
algorithm, for example as sketched below.
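
A minimal preprocessing sketch, assuming scikit-learn and NumPy; the
data, the mean-imputation strategy, and the z-score threshold of 3
are all illustrative choices:

    # Impute missing values, then drop rows that look like outliers.
    import numpy as np
    from sklearn.impute import SimpleImputer

    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(20, 2))
    X[3, 0] = np.nan            # a missing value
    X[7] = [25.0, -30.0]        # an extreme outlier

    # 1. Replace missing values with the column mean
    X = SimpleImputer(strategy="mean").fit_transform(X)

    # 2. Drop rows whose z-score exceeds 3 in any column (a common
    #    heuristic for removing gross outliers)
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    X_clean = X[(z < 3).all(axis=1)]
    print(X_clean.shape)        # (19, 2): the outlier row is gone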
 4. Needs feature scaling: We need to apply
feature scaling (standardization or
normalization) before running the KNN
algorithm on any dataset. If we don't, features
with large numeric ranges dominate the distance
and KNN may generate wrong predictions, as
sketched below.
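
A sketch of why scaling matters, assuming scikit-learn's
StandardScaler; the feature ranges are invented for illustration:

    # Two features on wildly different scales: message length in
    # characters (thousands) and number of links (single digits).
    # Unscaled, Euclidean distance is dominated by message length.
    from sklearn.preprocessing import StandardScaler

    X = [[5000, 1],
         [4800, 9],
         [ 300, 0]]

    X_scaled = StandardScaler().fit_transform(X)
    print(X_scaled)  # each column now has mean 0 and unit variance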
 5. Predictions depend on the value of k:
For different values of k, the prediction
for the same data point may vary, so
accuracy may be poor.
 For example, with the given data, if k = 3
the query point belongs to class B,
 but if k = 7, the same point belongs to
class A.
 So, for different values of k, predictions
may vary, as the sketch below demonstrates.
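
A small sketch of this effect, with a made-up 2-D dataset arranged so
the predicted class flips between k = 3 and k = 7, mirroring the
slide's example:

    # The 3 nearest neighbors of the origin are class B, but 4 of the
    # 7 nearest are class A, so the majority vote flips with k.
    from sklearn.neighbors import KNeighborsClassifier

    X = [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0],              # class B
         [0.2, 0.0], [0.0, 0.2], [-0.2, 0.0], [0.0, -0.2]] # class A
    y = ["B", "B", "B", "A", "A", "A", "A"]

    for k in (3, 7):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        print(k, knn.predict([[0.0, 0.0]]))  # k=3 -> ['B'], k=7 -> ['A']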
 Failure cases of KNN
CASE 1
In this case, the data is grouped in
clusters, but the query point lies far
away from the actual grouping. In such
a case, we can still use the k nearest
neighbors to assign a class; however, it
doesn't make much sense, because the
query point (the yellow point in the
figure) is really far from the data
points, so we can't be very sure about
its classification. (A distance-based
sanity check for this case is sketched
after Case 2.)
CASE 2
In this case, the data is randomly
spread, so no useful information can
be obtained from it. In such a scenario,
when we are given a query point (the
yellow point), the KNN algorithm will
still find the k nearest neighbors, but
since the data points are jumbled
together, the accuracy of the prediction
is questionable.
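
A hedged sketch of the Case 1 safeguard, assuming scikit-learn; the
clusters, the query point, and the distance threshold of 3.0 are all
invented for illustration:

    # Inspect the neighbor distances and refuse to classify when the
    # query is far from every training point.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)),   # class A near (0, 0)
                   rng.normal(5, 0.3, (20, 2))])  # class B near (5, 5)
    y = ["A"] * 20 + ["B"] * 20

    knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

    query = np.array([[20.0, -15.0]])             # far from both clusters
    dist, _ = knn.kneighbors(query)

    # Arbitrary rule of thumb: distrust a prediction whose nearest
    # neighbor is far away relative to the data's scale.
    if dist.min() > 3.0:
        print("query is far from all training data; prediction unreliable")
    else:
        print(knn.predict(query))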
