Artificial Intelligence: Project Proposal On Spam Filtering
Project Proposal
on
Spam Filtering
Motivation
Email has become one of the most important forms of communication. Spam is one of the major
threats posed to email users. In 2013, 69.6% of all email flows were spam.[2] Links in spam
emails may lead users to websites with malware or phishing schemes, which can access and
disrupt the receiver’s computer system. Therefore, an effective spam filtering technology is a
significant contribution to the sustainability of cyberspace and to our society.
Several approaches to spam detection are currently in use. These approaches include
blacklisting, detecting bulk emails, scanning message headers, greylisting, and content-based
filtering:
Blacklisting is a technique that identifies IP addresses that send large amounts of spam.
These IP addresses are added to a Domain Name System-based Blackhole List (DNSBL), and
future email from IP addresses on the list is rejected. However, spammers are circumventing
these lists by using larger numbers of IP addresses.
Detecting bulk emails is another way to filter spam. This method uses the number of
recipients to determine whether an email is spam. However, many legitimate emails are also
sent to large numbers of recipients.
Scanning message headers is a fairly reliable way to detect spam. Programs written by
spammers generate the headers of their emails. Sometimes these headers contain errors that
cause them to violate header formatting standards. Such errors are a sign that the email is
probably spam. However, spammers are learning from their errors and making these mistakes
less often.
Greylisting is a method that involves rejecting the email and sending an error message
back to the sender. Spam programs will ignore this and not resend the email, while
humans are more likely to resend the email. However, this process is annoying to humans
and is not an ideal solution.
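The four techniques above can be illustrated with a minimal Python sketch. It is not a production filter and not the implementation evaluated in this project. The blocklist contents, the recipient threshold, the set of required headers, the retry delay, and the (IP, sender, recipient) key used for greylisting are all assumptions chosen for illustration. A real DNSBL is queried over DNS, which the sketch replaces with a local set.

```python
from email import message_from_string

# All constants below are hypothetical values chosen for illustration.
BLOCKLISTED_IPS = {"203.0.113.7"}   # stand-in for a real DNSBL lookup
MAX_RECIPIENTS = 50                 # assumed threshold for "bulk" mail
REQUIRED_HEADERS = ("From", "Date", "Message-ID")  # assumed minimal set
MIN_RETRY_DELAY = 300               # assumed greylisting delay, in seconds


def is_blacklisted(sender_ip):
    """Blacklisting: flag mail from IP addresses on the list."""
    return sender_ip in BLOCKLISTED_IPS


def is_bulk(recipients):
    """Bulk detection: flag mail addressed to many recipients."""
    return len(recipients) > MAX_RECIPIENTS


def has_malformed_headers(raw_message):
    """Header scanning: flag mail missing any header we assume is required."""
    msg = message_from_string(raw_message)
    return any(msg[name] is None for name in REQUIRED_HEADERS)


class Greylist:
    """Greylisting: reject the first attempt, accept a sufficiently late retry."""

    def __init__(self, min_delay=MIN_RETRY_DELAY):
        self.min_delay = min_delay
        self.first_seen = {}  # (ip, sender, recipient) -> time of first attempt

    def check(self, ip, sender, recipient, now):
        key = (ip, sender, recipient)
        if key not in self.first_seen:
            # First attempt: remember it and reject.
            self.first_seen[key] = now
            return "reject"
        if now - self.first_seen[key] >= self.min_delay:
            return "accept"
        # Retried too quickly: keep rejecting.
        return "reject"
```

In this sketch, a sender that never retries is never accepted, which mirrors the behavior described for spam programs, while a sender that retries after the delay gets through.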
Current spam detection techniques could be paired with content-based spam filtering methods
to increase effectiveness. Content-based methods analyze the content of the email to determine
if the email is spam. The goal of our project was to analyze these methods and determine their
effectiveness as spam filters.
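As a rough illustration of what "analyzing the content of the email" can mean, the sketch below scores a message by the fraction of its words that appear in a small vocabulary of spam-associated terms. This is a deliberately simple stand-in, not one of the methods evaluated in this project. The word list `SPAM_WORDS` and the cutoff `THRESHOLD` are hypothetical values chosen for the example.

```python
import re

# Hypothetical vocabulary and threshold, chosen only for illustration.
SPAM_WORDS = {"free", "winner", "prize", "click", "offer"}
THRESHOLD = 0.2


def spam_score(body):
    """Return the fraction of word tokens in the body found in SPAM_WORDS."""
    tokens = re.findall(r"[a-z']+", body.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in SPAM_WORDS)
    return hits / len(tokens)


def is_spam(body):
    """Classify the body as spam when its score reaches the threshold."""
    return spam_score(body) >= THRESHOLD
```

For example, `is_spam("Click now: free prize")` returns `True` because three of the four words are in the vocabulary, while `is_spam("Meeting moved to Tuesday at noon")` returns `False`. A fixed word list like this is easy for spammers to evade, which is one reason learned content-based filters are of interest.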