Slide Format
CLASSIFICATION OF EMAILS
USING DATA MINING AND
MACHINE LEARNING ALGORITHMS
TEAMS
SL NO | NAME | USN
01 | ADITI | 01JST21IS004
03 | RAKSHITH A H | 01JST21IS040
AGENDA
INTRODUCTION
Email is one of the most popular and frequently used means of communication because of its
worldwide accessibility, relatively fast message transfer, and low sending cost.
Email messages are sent from software programs and web browsers, collectively
referred to as email ‘clients’. Individual messages are routed through multiple servers
before they reach the recipient’s email server. In general, email messages are
classified as “Ham” or “Spam”. Ham messages are the intended, legitimate
messages in a mailbox, whereas Spam messages are junk: unsolicited bulk or
commercial messages.
An email may be considered spam when it shows signs such as bad grammar,
distorted images, distorted symbols or logos, bad links, tempting offers, and time-based
subscriptions that pressure users to subscribe immediately. Phishing is a
dangerous cybercrime that targets individuals and tricks them into
clicking on links or subscribing, in order to steal data such as login credentials for social
accounts or, in the worst case, internet banking details. Phishing emails are also
considered spam messages. Spam emails also include:
Spamvertised sites: emails that advertise products and contain URLs that direct to other webpages.
419 scams: spam emails in which users are offered a huge sum of money in return for a small initial payment.
Image spam: emails whose content is displayed in the form of images.
When a large number of spam messages are received, it takes a long time
to separate spam from non-spam email, and the volume of messages may cause the mail server
to crash. To solve the spam problem, there have been several attempts to detect and
filter spam email on the client side. Data mining and ML approaches have been applied to
the problem, including Bayesian classifiers such as Naive Bayes, and the KNN (k-nearest neighbors) algorithm.
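As an illustration of the Bayesian approach mentioned above, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with the Python standard library only. The class name, the tokenizer, and the four training sentences are hypothetical and exist only for this example; they are not taken from the project's dataset or code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Illustrative multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # emails per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every word seen in any training email.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy training data (made up for this sketch).
train_texts = [
    "win a free prize now",
    "limited offer click the link now",
    "meeting agenda for monday",
    "please review the project report",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("click now to win a free offer"))   # spam
print(model.predict("agenda for the project meeting"))  # ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a word that appears in only one class from forcing the other class's probability to zero.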
SL NO | TITLE | AUTHORS | REVIEW
02 | SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES | T. Hamsapriya, D. Karthika Renuka and M. Raja Chakkaravarthi |
04 | EMAIL CLASSIFICATION ANALYSIS USING MACHINE LEARNING TECHNIQUES | |
05 | | |
PROBLEM STATEMENT
Aim:
To develop an effective classification model using data mining and machine learning
techniques for accurately distinguishing between spam and non-spam (ham) emails.
Objectives:
Data Gathering: Collect a variety of labeled emails containing both spam and ham
examples.
Data Cleaning: Remove duplicate emails, handle missing data, and address class
imbalance.
Feature Extraction: Identify and extract relevant features from the email content.
Model Building: Develop and evaluate multiple machine learning models for classifying
spam and ham emails.
Performance Evaluation: Assess the models' performance using metrics like accuracy,
precision, recall, and F1 score.
Model Comparison: Compare the performance of different models to identify the most
effective one.
Optimization: Fine-tune the parameters of the best-performing model to improve its
accuracy and generalization.
Deployment: Implement the chosen model into a real-world email system for automatic
spam classification.
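The Performance Evaluation objective above can be sketched as follows. This is a minimal standard-library example of computing accuracy, precision, recall, and F1 score with "spam" treated as the positive class. The function names and the eight labels are hypothetical and made up for illustration; they are not results from the project.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # ham wrongly flagged as spam
        elif actual == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall, and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 spam and 5 ham emails, with one error of each kind.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Running the same `evaluate` function on the predictions of each candidate model gives a like-for-like basis for the Model Comparison objective. Precision matters in particular for spam filtering, since a false positive means a legitimate email is hidden from the user.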