Lec-6 Spam-1

The document discusses spam detection using machine learning techniques. Specifically, it describes using a naïve Bayes classifier to classify emails as either spam or ham (not spam) based on the words they contain. The naïve Bayes approach works by calculating the probability that an email belongs to each class based on the occurrence of individual words. The approach is claimed to reach 97% accuracy and to offer advantages such as being self-learning, considering the full message, and being language-independent. A worked example of how the probabilities are calculated is provided.

Uploaded by

Adish garg


SPAM Detection

SPAM
• Originating from the name of Hormel's canned meat, "spam" now also refers to junk e-mail or irrelevant postings to a newsgroup or bulletin board.
• The unsolicited e-mail messages you receive about refinancing
your home, reversing aging, and losing those extra pounds are
all considered to be spam.
• Spamming other people is definitely not cool and is one of the
most notorious violations of Internet etiquette (or
"netiquette").
• So if you ever get the urge to let thousands of people know
about that hot new guaranteed way to make money on the
Internet, please reconsider.
One Solution to Spam Detection
• Machine Learning
– Learn spam versus good/ham

• Naïve Bayes

Advantages of Bayesian Method
• The Bayesian approach is self-adapting: it keeps learning from new spam.
• The Bayesian method takes the whole message into account.
• The Bayesian method is easy to use and very accurate (claimed accuracy: 97%).
• The Bayesian approach is multilingual (language-independent).
• It reduces the number of false positives.
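The "self-adapting" point follows from how a naïve Bayes filter stores what it has learned: per-class document counts and per-word counts. Learning from a newly labelled spam only increments a few counters, with no retraining from scratch. The sketch below assumes a Bernoulli (word present/absent) model like the one in the worked example later in this lecture; the class `CountingFilter` and its methods are hypothetical names chosen for illustration.

```python
from collections import Counter


class CountingFilter:
    """Hypothetical count store for a Bernoulli naive Bayes spam filter."""

    def __init__(self):
        self.doc_counts = Counter()  # class label -> number of emails seen
        # class label -> word -> number of emails of that class containing the word
        self.word_counts = {"spam": Counter(), "ham": Counter()}

    def learn(self, text, label):
        """Update the counts with one labelled email (label must be "spam" or "ham")."""
        self.doc_counts[label] += 1
        # set() so a word repeated inside one email is counted once (presence/absence).
        self.word_counts[label].update(set(text.split()))

    def p_word(self, word, label):
        """Unsmoothed P(word present | label); assumes at least one email of that label was seen."""
        return self.word_counts[label][word] / self.doc_counts[label]


f = CountingFilter()
f.learn("send us your password", "spam")
f.learn("review us", "spam")
print(f.p_word("review", "spam"))  # 0.5: "review" is in 1 of 2 spams
f.learn("send your password", "spam")  # a new spam arrives
print(f.p_word("review", "spam"))  # now 1 of 3 spams, about 0.333
```

Because the model is nothing more than these counts, each new labelled email changes the estimated probabilities immediately.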

A Spam Filter
• Naïve Bayes spam filter
• Data:
– Collection of emails, labeled spam or ham
– Note: someone has to hand-label all this data!
– Split into training and testing sets
• Classifiers
– Learn on the training set
– Test it on new emails

Example emails:

Dear Sir.
First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …

TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT.
99 MILLION EMAIL ADDRESSES FOR ONLY $99

Ok, Iknow this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.
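The data bullets above (hand-labelled emails split into a training set and a testing set) can be sketched as follows. This is an illustration only, assuming the labelled emails are held as `(text, label)` pairs; the helper `split_train_test`, its fixed seed, and its default test fraction are hypothetical choices, not something the slides specify. The six documents from the discrete example are reused as toy data.

```python
import random


def split_train_test(labelled, test_fraction=0.25, seed=0):
    """Shuffle (text, label) pairs and split them into (training, testing) lists.

    The fixed seed makes the split reproducible; test_fraction is an
    illustrative default, not a value given in the lecture.
    """
    items = list(labelled)  # copy, so the caller's list is not reordered
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


# Toy data: the six labelled documents from the discrete example.
emails = [
    ("send us your password", "spam"),
    ("send us your review", "ham"),
    ("review password", "ham"),
    ("review us", "spam"),
    ("send your password", "spam"),
    ("send us your account", "spam"),
]
train, test = split_train_test(emails, test_fraction=1 / 3)
print(len(train), len(test))  # 4 2
```

A classifier is then fitted on `train` only and evaluated on `test`, which it has never seen; with just six documents the split is purely illustrative.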
Later in time

Coming before or earlier

Discrete example
Separate spam from valid email, attributes=words
• D1: “send us your password” spam
• D2: “send us your review” ham
• D3: “review password” ham
• D4: “review us” spam
• D5: “send your password” spam
• D6: “send us your account” spam
Construct Vocabulary

Word       P(word | spam)   P(word | ham)
password   2/4              1/2
review     1/4              2/2
send       3/4              1/2
us         3/4              1/2
your       3/4              1/2
account    1/4              0/2
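Each entry in the table is the fraction of documents of a class that contain the word (a Bernoulli, presence/absence model). A minimal sketch that recomputes the table with `fractions.Fraction`, assuming D3 is "review password", the version that agrees with the table (with "review your password", P(your | ham) would be 2/2 rather than 1/2). The function name `word_probs` is hypothetical.

```python
from fractions import Fraction

# Training documents D1..D6 (D3 as "review password", matching the table).
docs = [
    ("send us your password", "spam"),
    ("send us your review", "ham"),
    ("review password", "ham"),
    ("review us", "spam"),
    ("send your password", "spam"),
    ("send us your account", "spam"),
]
vocab = ["password", "review", "send", "us", "your", "account"]


def word_probs(docs, vocab, label):
    """P(word present | label): the fraction of `label` documents that contain each word."""
    class_docs = [set(text.split()) for text, lab in docs if lab == label]
    return {w: Fraction(sum(w in d for d in class_docs), len(class_docs)) for w in vocab}


p_spam = word_probs(docs, vocab, "spam")
p_ham = word_probs(docs, vocab, "ham")
print(f"{'word':<10}{'spam':<8}ham")
for w in vocab:
    print(f"{w:<10}{str(p_spam[w]):<8}{str(p_ham[w])}")
```

`Fraction` reduces each entry (2/4 prints as 1/2, 2/2 as 1, 0/2 as 0), so the printed values equal the slide's entries without being textually identical.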

Separate spam from valid email, attributes=words
P(spam) = 4/6
P(ham) = 2/6
• D1: “send us your password” spam
• D2: “send us your review” ham
• D3: “review password” ham
• D4: “review us” spam
• D5: “send your password” spam
• D6: “send us your account” spam
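The priors are simply the class frequencies in the training set: 4 of the 6 documents are spam and 2 are ham. As a small sketch, with the labels copied from D1 to D6:

```python
from collections import Counter
from fractions import Fraction

labels = ["spam", "ham", "ham", "spam", "spam", "spam"]  # labels of D1..D6
counts = Counter(labels)
priors = {label: Fraction(n, len(labels)) for label, n in counts.items()}
print(priors["spam"], priors["ham"])  # 2/3 1/3, i.e. 4/6 and 2/6 reduced
```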
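Naïve Bayes
• Want P(spam | words)
• Use Bayes' rule:
$$P(\text{spam} \mid \text{words}) = \frac{P(\text{words} \mid \text{spam})\,P(\text{spam})}{P(\text{words})}$$
$$P(\text{words}) = P(\text{words} \mid \text{spam})\,P(\text{spam}) + P(\text{words} \mid \text{ham})\,P(\text{ham})$$
• Assume independence: the probability of each word is independent of the others, given the class
$$P(\text{words} \mid \text{spam}) = P(\text{word}_1 \mid \text{spam})\,P(\text{word}_2 \mid \text{spam}) \cdots P(\text{word}_n \mid \text{spam})$$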
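Bayes' rule and the independence assumption combine into a short routine: multiply the per-word probabilities to get each class likelihood, weight by the prior, and divide by the sum over both classes, which is P(words). A minimal sketch; the function names and the toy probabilities are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def likelihood(word_probs):
    """P(words | class) under the naive independence assumption:
    the product of the individual P(word_i | class) values."""
    return math.prod(word_probs)


def posterior(word_probs_by_class, priors):
    """Bayes' rule, with P(words) expanded as a sum over the classes."""
    joint = {c: likelihood(ps) * priors[c] for c, ps in word_probs_by_class.items()}
    evidence = sum(joint.values())  # P(words) = sum over c of P(words | c) P(c)
    return {c: j / evidence for c, j in joint.items()}


# Hypothetical per-word probabilities for a two-word email (not from the lecture).
result = posterior({"spam": [0.8, 0.6], "ham": [0.1, 0.3]},
                   {"spam": 0.5, "ham": 0.5})
print(result)  # spam about 0.94, ham about 0.06
```

Multiplying many small probabilities can underflow on long emails; practical filters often add logarithms instead, which leaves the ranking of the classes unchanged. That is a common implementation choice, not something this slide states.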


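New email: “review us now”
P(spam) = 4/6
P(ham) = 2/6

Presence vector over the vocabulary (password, review, send, us, your, account): (0, 1, 0, 1, 0, 0). "now" is not in the vocabulary, so it is ignored.

$$P(\text{review us} \mid \text{spam}) = P(0,1,0,1,0,0 \mid \text{spam}) = \left(1-\tfrac{2}{4}\right)\left(\tfrac{1}{4}\right)\left(1-\tfrac{3}{4}\right)\left(\tfrac{3}{4}\right)\left(1-\tfrac{3}{4}\right)\left(1-\tfrac{1}{4}\right) = \tfrac{9}{2048} \approx 0.0044$$

$$P(\text{review us} \mid \text{ham}) = P(0,1,0,1,0,0 \mid \text{ham}) = \left(1-\tfrac{1}{2}\right)\left(\tfrac{2}{2}\right)\left(1-\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(1-\tfrac{1}{2}\right)\left(1-\tfrac{0}{2}\right) = \tfrac{1}{16} = 0.0625$$

$$P(\text{ham} \mid \text{words}) = \frac{P(\text{words} \mid \text{ham})\,P(\text{ham})}{P(\text{words})}$$

$$P(\text{ham} \mid \text{review us}) = \frac{0.0625 \cdot \tfrac{2}{6}}{0.0625 \cdot \tfrac{2}{6} + 0.0044 \cdot \tfrac{4}{6}} \approx 0.88$$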
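The arithmetic can be checked with exact fractions. A minimal sketch, assuming the Bernoulli (presence/absence) model the slide uses: every vocabulary word contributes P(w | c) if it appears in the email and 1 − P(w | c) if it does not, and "now", which is outside the vocabulary, is ignored. The probability table is copied from the vocabulary slide; the function name `bernoulli_likelihood` is hypothetical.

```python
from fractions import Fraction as F

# P(word present | class), copied from the vocabulary table.
vocab = ["password", "review", "send", "us", "your", "account"]
p = {
    "spam": dict(zip(vocab, [F(2, 4), F(1, 4), F(3, 4), F(3, 4), F(3, 4), F(1, 4)])),
    "ham": dict(zip(vocab, [F(1, 2), F(2, 2), F(1, 2), F(1, 2), F(1, 2), F(0, 2)])),
}
prior = {"spam": F(4, 6), "ham": F(2, 6)}


def bernoulli_likelihood(words, probs):
    """Product over the whole vocabulary: P(w | c) if w is present, 1 - P(w | c) if absent.
    Words outside the vocabulary (here "now") never enter the product."""
    result = F(1)
    for w, pw in probs.items():
        result *= pw if w in words else 1 - pw
    return result


email = set("review us now".split())
lik = {c: bernoulli_likelihood(email, p[c]) for c in p}
evidence = sum(lik[c] * prior[c] for c in p)  # P(words)
post_ham = lik["ham"] * prior["ham"] / evidence

print(lik["spam"], float(lik["spam"]))  # 9/2048 0.00439453125
print(lik["ham"])                       # 1/16
print(post_ham, float(post_ham))        # 64/73, about 0.877
```

Since P(ham | “review us now”) = 64/73 > 1/2, the filter labels this email ham.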

Is it correct?
