Email Spam Filtering Using Naive Bayes Classifier
9 minute read | June 14, 2021

Written by:
Sakshi Gupta

Naive Bayes is a probabilistic algorithm based on Bayes' Theorem
and is widely used for email spam filtering. If you have an email
account, you have probably seen emails sorted into different buckets
and automatically marked as important, spam, promotions, and so on.
Isn't it wonderful to see machines doing this work for you?
More often than not, the labels added by the system are right. So
does this mean our email software reads through every message and
knows what you, as a user, would have done with it? In a sense, yes:
the software does process the text of every message, and algorithms
such as the Naive Bayes classifier apply Bayes' Theorem to that text
to predict the label you would most likely have chosen.

In this article, we will briefly review the Naive Bayes algorithm
before we get our hands dirty and analyse a real email dataset in
Python. This blog is the second in a series on the Naive Bayes
algorithm. You can read part 1 in the introduction to Bayes Theorem
& Naive Bayes Algorithm blog.


The Naive Bayes Classifier Formula
One of the simplest yet most powerful classifier algorithms, Naive
Bayes is based on Bayes' Theorem with an assumption of independence
among predictors. Given a hypothesis A and evidence B, Bayes'
Theorem states that the relationship between the probability of the
hypothesis before getting the evidence, P(A), and the probability of
the hypothesis after getting the evidence, P(A|B), is:

P(A|B) = P(B|A) × P(A) / P(B)

Source: Wikipedia

Here:

A, B = events
P(A|B) = probability of A given that B is true
P(B|A) = probability of B given that A is true
P(A), P(B) = the probabilities of A and B on their own, without
conditioning on the other event

This theorem, as explained in one of our previous articles, is mainly
used for classification in data analytics. Naive Bayes plays an
important role in the spam detection of emails.
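To make the formula concrete, here is a minimal worked sketch in Python. The numbers are hypothetical and chosen only for illustration (they are not taken from the dataset used later): we assume 40% of all mail is spam, and that the word "offer" appears in 30% of spam emails and 2% of ham emails. P(B) is not given directly, so the sketch expands it with the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) using Bayes' Theorem.

    The evidence P(B) is expanded as
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers, for illustration only
p_spam = 0.40             # P(A): prior probability that an email is spam
p_word_given_spam = 0.30  # P(B|A): "offer" appears in a spam email
p_word_given_ham = 0.02   # P(B|not A): "offer" appears in a ham email

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(round(p_spam_given_word, 3))  # 0.909
```

Under these assumed numbers, seeing the word raises the probability of spam from 0.40 to roughly 0.91, which is the kind of update a spam filter performs for every word in a message.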


Detecting Email Spam

Modern spam filtering software continuously struggles to categorise
emails correctly, and unwanted spam and promotional mail is the
hardest of all to handle. Spam filtering algorithms must be updated
continuously, since there is an ongoing battle between spam filtering
software and anonymous senders of spam and promotional mail. Naive
Bayes is a common foundation for text filtering on email platforms
such as Gmail, Yahoo Mail and Hotmail.

Other classifier algorithms, such as Support Vector Machines or
neural networks, can also get the job done. Before we begin, here is
the dataset for you to download (a zipped file):
Email Spam Filtering Using Naive Bayes Algorithm


For convenience, we have already split the data into train & test files.
Let’s get into it:

import pandas as pd
# read training data & test data
df_train = pd.read_csv("training.csv")
df_test = pd.read_csv("test.csv")

It is good practice to review a few rows of each dataset. Here we
draw five random rows from each:

df_train.sample(5)
df_test.sample(5)

Your output for the train dataset may look something like this:

      type  email
1779  Ham   Into thereis tapping said that scarce whose…
1646  Ham   Then many take the ghastly and rapping gaun…
534   Spam  Did parting are dear where fountain save ca…
288   Spam  His heart sea he care he sad day there anot…
1768  Ham   With ease explore. See whose swung door and…

And the output for test dataset would look something like this:

    type  email
58  Ham   Sitting ghastly me peering more into in the…
80  Spam  A favour what whilome within childe of chil…
56  Spam  From who agen to sacred breast unto will co…
20  Ham   Of to gently flown shrieked ashore such sad…
94  Spam  A charms his of childe him. Lowly one was b…

As you can see, each CSV file has two columns. The type column
records whether the email is marked as Spam or Ham, and the email
column contains the body (main text) of the email. Both the train
and test datasets have the same format.
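The format described above can be checked before any modelling starts. The following is a small sketch using only the standard library; it assumes the CSV header is exactly `type,email` and that the only labels are `Spam` and `Ham`, as the samples suggest. The function name `check_format` and the in-memory rows are made up for illustration.

```python
import csv
import io
from collections import Counter

EXPECTED_COLUMNS = ["type", "email"]  # assumed header, based on the samples
EXPECTED_LABELS = {"Spam", "Ham"}     # assumed label set


def check_format(csv_file):
    """Validate the header and labels, then return the count per label."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    counts = Counter()
    for row in reader:
        if row["type"] not in EXPECTED_LABELS:
            raise ValueError(f"unexpected label: {row['type']!r}")
        counts[row["type"]] += 1
    return counts


# Tiny in-memory stand-in for training.csv (made-up rows)
sample = io.StringIO(
    "type,email\n"
    "Spam,win a prize now\n"
    "Ham,see you at lunch\n"
    "Ham,meeting moved to noon\n"
)
label_counts = check_format(sample)
print(dict(label_counts))  # {'Spam': 1, 'Ham': 2}
```

With the real files, the same function could be called as `check_format(open("training.csv", newline=""))` and again for `test.csv` to confirm that both share the format.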

Ensuring data consistency is of utmost importance in any data
analytics problem. As a first step, let's compute some descriptive
statistics on our training data.

df_train.describe(include = 'all')

        type  email
count   2000  2000
unique  2     2000
top     Spam  Along childe love and the but womans a the …
freq    1000  1

The output shows 2000 records, with two unique values of type and
2000 unique emails. Let's look at the type column in more detail.

df_train.groupby('type').describe()

      email
      count  unique  top                                           freq
type
Ham   1000   1000    Broken if still art within lordly or the it…  1
Spam  1000   1000    Along childe love and the but womans a the …  1

As we can see, our training data has an equal number (1000 each) of
Spam and Ham emails, and there are no duplicates in the email
column. Let's sanitise our data now.

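# email_pre is the preprocessing helper module used in this walkthrough;
# it must be importable from your working directory.
import email_pre as ep
from gensim.models.phrases import Phrases

def do_process(row):
    # Uses the module-level `bigram` phrase model, which is loaded
    # in the Bigram Training step further down.
    global bigram
    temp = ep.preprocess_text(row.email, [ep.lowercase,
                                          ep.remove_html,
                                          ep.remove_esc_chars,
                                          ep.remove_urls,
                                          ep.remove_numbers,
                                          ep.remove_punct,
                                          ep.lemmatize,
                                          ep.keyword_tokenize])

    if not isinstance(temp, str):
        print(temp)

    # Merge detected word pairs into single tokens, then rebuild the string
    return ' '.join(bigram[temp.split(" ")])

def phrases_train(sen_list, min_=3):

    if len(sen_list) <= 10:
        print("too small to train! ")
        return

    if isinstance(sen_list, list):
        try:
            # A saved phrase model exists: extend it with the new sentences
            bigram = Phrases.load("email_EN_bigrams_spam")
            bigram.add_vocab(sen_list)
            bigram.save("email_EN_bigrams_spam")
            print("retrain!")

        except Exception as ex:
            # No saved model could be loaded: train one from scratch
            print("first ")
            bigram = Phrases(sen_list, min_count=min_, threshold=2)
            bigram.save("email_EN_bigrams_spam")
            print(ex)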

Phrase Model train (we can run this once & save it)

# Preprocess every training email and split it into a list of tokens
train_email_list = [ep.preprocess_text(mail, [ep.lowercase,
                                              ep.remove_html,
                                              ep.remove_esc_chars,
                                              ep.remove_urls,
                                              ep.remove_numbers,
                                              ep.remove_punct,
                                              ep.lemmatize,
                                              ep.keyword_tokenize]).split(" ")
                    for mail in df_train.email.values]

print("after pre_process :")
print(" ")
print(len(train_email_list))
# .loc replaces the removed pandas .ix indexer; show email 22 before and after
print(df_train.loc[22].email, ">>" * 80, train_email_list[22])

Here is the output after an initial pre_processing:

2000
Him ah he more things long from mine for. Unto feel they seek
other adieu crime dote. Adversity pangs low. Soon light now time
amiss to gild be at but knew of yet bidding he thence made. Will
care true and to lyres and and in one this charms hall ancient
departed from. Bacchanals to none lay charms in the his most his
perchance the in and the uses woe deadly. Save nor to for that that
unto he. Thy in thy. Might parasites harold of unto sing at that in for
soils within rake knew but. If he shamed breast heralds grace once
dares and carnal finds muse none peace like way loved. If long
favour or flaunting did me with later will. Not calm labyrinth tear
basked little. It talethis calm woe sight time. Rake and to hall. Land
the a him uncouth for monks partings fall there below true sighed
strength. Nor nor had spoiled condemned glee dome monks him
few of sore from aisle shun virtues. Bidding loathed aisle a and if
that to it chill shades isle the control at. So knew with one will wight
nor feud time sought flatterers earth. Relief a would break at he if
break not scape.The will heartless sacred visit few. The was
from near long grief. His caught from flaunting sacred care fame
said are such and in but a.

['ah', 'things', 'long', 'mine', 'unto', 'feel',
'seek', 'adieu', 'crime', 'dote', 'adversity', 'pangs', 'low', 'soon', 'light', 'time',
'amiss', 'gild', 'know', 'yet', 'bid', 'thence', 'make', 'care', 'true', 'lyres', 'one',
'charm', 'hall', 'ancient', 'depart', 'bacchanals', 'none', 'lay', 'charm',
'perchance', 'use', 'woe', 'deadly', 'save', 'unto', 'thy', 'thy', 'might',
'parasites', 'harold', 'unto', 'sing', 'soil', 'within', 'rake', 'know', 'sham',
'breast', 'herald', 'grace', 'dare', 'carnal', 'find', 'muse', 'none', 'peace', 'like',
'way', 'love', 'long', 'favour', 'flaunt', 'later', 'calm', 'labyrinth', 'tear', 'bask',
'little', 'talethis', 'calm', 'woe', 'sight', 'time', 'rake', 'hall', 'land', 'uncouth',
'monks', 'part', 'fall', 'true', 'sigh', 'strength', 'spoil', 'condemn', 'glee',
'dome', 'monks', 'sore', 'aisle', 'shun', 'virtues', 'bid', 'loathe', 'aisle', 'chill',
'shade', 'isle', 'control', 'know', 'one', 'wight', 'feud', 'time', 'seek', 'flatterers',
'earth', 'relief', 'would', 'break', 'break', 'scapethe', 'heartless', 'sacred',
'visit', 'near', 'long', 'grief', 'catch', 'flaunt', 'sacred', 'care', 'fame', 'say']
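The `email_pre` helpers are not shown in this article, so the sketch below is only a rough, standard-library approximation of a few of the steps (lowercasing and removing HTML, URLs, numbers and punctuation). The function name `simple_preprocess` is hypothetical, and the sketch does not lemmatize or filter keywords, so its output will differ from the token list above.

```python
import re
import string


def simple_preprocess(text):
    """Lowercase, strip HTML tags, URLs, digits and punctuation, then tokenize."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                 # remove HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    # Delete punctuation characters without inserting a space
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


tokens = simple_preprocess("<p>Visit http://example.com NOW, win 100 prizes!</p>")
print(tokens)  # ['visit', 'now', 'win', 'prizes']
```

Because punctuation is deleted rather than replaced by a space, two words separated only by a full stop are fused into one token, which is the same effect that produces 'scapethe' in the output above.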

df_train["class"] = df_train.type.replace(["Spam","Ham"],[0,1])
df_test["class"] = df_test.type.replace(["Spam","Ham"],[0,1])

Bigram Training

phrases_train(train_email_list,min_=3)
bigram = Phrases.load("email_EN_bigrams_spam")
len(bigram.vocab)

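If a saved phrase model already exists, the call above extends it
and prints retrain! instead of training from scratch. Here is the
resulting vocabulary size: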

159158

print len(dict((key,value) for key, value in bigram.vocab.iteritems

You may get this as the output:

4974

df_train["clean_email"] = df_train.apply(do_process,axis=1) df_test

# df_train.head()
print "phrase found train:",df_train[df_train['clean_email'].str.co
print "phrase found test:",df_test[df_test['clean_email'].str.conta

Output

phrase found train: (371, 3)

phrase found test: (7, 3)
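To illustrate what the phrase step does, here is a simplified, dependency-free sketch: it counts adjacent word pairs and joins the frequent ones with an underscore. This is not gensim's implementation; `Phrases` also applies a scoring threshold, which this sketch leaves out, and the function names and sample sentences are invented for the example.

```python
from collections import Counter


def find_bigrams(sentences, min_count=3):
    """Return the adjacent word pairs seen at least `min_count` times."""
    pair_counts = Counter()
    for tokens in sentences:
        pair_counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in pair_counts.items() if n >= min_count}


def merge_bigrams(tokens, bigrams):
    """Join each detected pair into a single token such as 'free_prize'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip the second word of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Made-up training sentences; ("free", "prize") occurs three times
sentences = [
    ["claim", "free", "prize", "today"],
    ["free", "prize", "inside"],
    ["your", "free", "prize", "awaits"],
    ["free", "lunch", "today"],
]
bigrams = find_bigrams(sentences, min_count=3)
print(merge_bigrams(["get", "free", "prize", "now"], bigrams))
# ['get', 'free_prize', 'now']
```

Treating a frequent pair as one token lets the classifier learn a separate weight for the phrase, instead of relying only on the two words individually.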

Let’s start training for Spam Detection now:

df_train.head()

Output

   type  email                                         clean_email                                      class
0  Spam  But could then once pomp to nor that glee g…  could pomp glee glorious deign vex time childe…  0
1  Spam  His honeyed and land vile are so and native…  honey land vile native ah ah like flash gild b…  0
2  Spam  Tear womans his was by had tis her eremites…  tear womans tis eremites present dear know pro…  0
3  Spam  The that and land. Cell shun blazon passion…  land cell shun blazon passion uncouth paphian …  0
4  Spam  Sing aught through partings things was sacr…  sing aught part things sacred know passion pro…  0


Next, we proceed with the Naive Bayes part of the algorithm:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Word counts -> TF-IDF weighting -> multinomial Naive Bayes
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])

text_clf.fit(df_train.clean_email, df_train["class"])
predicted = text_clf.predict(df_test.clean_email)

from sklearn import metrics
array = metrics.confusion_matrix(df_test["class"], predicted)

import seaborn as sn
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic: keep this line only when running inside a notebook
%matplotlib inline

# Rows are the true labels, columns are the predicted labels
df_cm = pd.DataFrame(array, ["Spam","Ham"], ["Spam","Ham"])
sn.set(font_scale=1.4)  # for label size
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16})  # font size

Running the code above displays the confusion matrix as a heatmap.
Next, we print the classification report:

print(metrics.classification_report(df_test["class"], predicted,
                                    target_names=["Spam","Ham"]))

The metrics look something like this:

             precision  recall  f1-score  support

Spam         1.00       1.00    1.00      43
Ham          1.00       1.00    1.00      57

avg / total  1.00       1.00    1.00      100

To assess the model, we feed the test data into the trained model
and compare its predictions with the true labels. As the output
above shows, the model correctly identifies all 43 spam emails and,
in the same way, all 57 ham emails.
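For readers who want to see where the precision, recall and f1-score columns come from, here is a minimal sketch that computes them from raw counts, treating Spam as the positive class. The first call uses the counts implied by the report above; the second uses hypothetical counts, included only to show how the metrics react to errors.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# From the report: all 43 spam emails caught, no ham flagged as spam
perfect = precision_recall_f1(tp=43, fp=0, fn=0)
print(perfect)  # (1.0, 1.0, 1.0)

# Hypothetical contrast: 40 spam caught, 3 missed, 2 ham wrongly flagged
imperfect = precision_recall_f1(tp=40, fp=2, fn=3)
print(tuple(round(x, 3) for x in imperfect))
```

Precision falls when ham is wrongly flagged as spam, while recall falls when spam slips through; F1 is the harmonic mean of the two.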

Our application of Bayes' Theorem in a Naive Bayes classifier works
successfully on this dataset. A 100% success rate is unusual for a
model; we were able to achieve it here because the training and
testing datasets are small. In general, the model needs to be
trained on sufficient data in order to deliver accurate results.
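To show how a Naive Bayes classifier turns Bayes' Theorem into a prediction, here is a compact from-scratch sketch with Laplace smoothing. It is an illustration, not scikit-learn's implementation: it works on raw word counts rather than the TF-IDF features used in the pipeline above, and the function names and training sentences are made up.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect class frequencies, per-class word counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(tokens, class_counts, word_counts, vocab, alpha=1.0):
    """Return the class with the highest log-posterior score."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior, log P(class)
        for w in tokens:
            if w not in vocab:
                continue  # skip words never seen during training
            # Laplace-smoothed likelihood P(word | class)
            p = (word_counts[label][w] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(p)  # independence assumption: sum the logs
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data
docs = [["win", "free", "prize"], ["free", "offer", "now"],
        ["meeting", "at", "noon"], ["lunch", "at", "noon"]]
labels = ["Spam", "Spam", "Ham", "Ham"]
model = train_nb(docs, labels)

print(predict_nb(["free", "prize", "now"], *model))     # Spam
print(predict_nb(["meeting", "at", "lunch"], *model))   # Ham
```

The "naive" part is the inner loop: each word contributes its own likelihood independently of the others, which keeps training and prediction fast even for large vocabularies.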


About Sakshi Gupta

Sakshi is a Managing Editor at Springboard. She is
a technology enthusiast who loves to read and
write about emerging tech. She is a content
marketer with experience in the Indian and US
markets.
