Phishing Email Detection Abstract
Abstract:
Phishing email is one of the significant threats in the world today and has
caused tremendous financial losses. Although countermeasures are continually being
updated, their results are not yet satisfactory, and the number of phishing emails
has grown at an alarming rate in recent years. More effective phishing detection
technology is therefore needed to curb the threat. In this paper, we first analyze
the email structure. Then, based on an improved Recurrent Convolutional Neural
Network (RCNN) model with multilevel vectors and an attention mechanism, we propose
a new phishing email detection model that models emails at the email header, the
email body, the character level, and the word level simultaneously. To evaluate the
effectiveness of the model, we use an unbalanced dataset that has realistic ratios
of phishing and legitimate emails. Experimental results show that the filter can
identify phishing emails with high probability while filtering out as few
legitimate emails as possible. This promising result is superior to existing
detection methods and verifies the effectiveness of the model in detecting phishing
emails.
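The two goals stated in the abstract (catching phishing emails with high probability while filtering out few legitimate ones) can be measured with standard confusion-matrix metrics. The following is a minimal sketch, assuming label 1 means phishing and 0 means legitimate; the function names and the toy data are hypothetical and are not results from the paper. It also shows why accuracy alone can mislead on an unbalanced dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, where label 1 = phishing and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return accuracy, recall on phishing, and false positive rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy unbalanced data (hypothetical): 95 legitimate emails, then 5 phishing emails.
y_true = [0] * 95 + [1] * 5

# A useless filter that labels every email as legitimate.
always_legit = [0] * 100

# A hypothetical filter: flags 1 legitimate email, catches 4 of 5 phishing emails.
better = [1] + [0] * 94 + [1, 1, 1, 1, 0]

baseline_scores = evaluate(y_true, always_legit)
better_scores = evaluate(y_true, better)
print(baseline_scores)
print(better_scores)
```

The useless filter still scores high accuracy because legitimate emails dominate, but its recall on phishing emails is zero. Reporting the false positive rate next to accuracy is what makes the "filter out legitimate emails as little as possible" claim checkable.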
Architecture:
EXISTING SYSTEM:
Various techniques for detecting phishing emails are described in the literature.
Over the course of their development, three main types of technical methods have
emerged: blacklist mechanisms, classification algorithms based on machine learning,
and methods based on deep learning. Existing detection methods based on the
blacklist mechanism rely mainly on people identifying and reporting phishing links,
which requires a large amount of manpower and time. Detection methods based on
machine learning classification algorithms require feature engineering to find
representative features manually, which makes them hard to migrate to new
application scenarios. Moreover, current detection methods based on deep learning
are limited to word embeddings in the content representation of the email. These
methods directly transfer natural language processing (NLP) and deep learning
technology and ignore the specificity of phishing email detection, so their results
are not ideal. Given the methods mentioned above and their corresponding problems,
we set out to study phishing email detection systematically based on deep learning.
Specifically, this paper makes the following contributions:
Disadvantages
1. With respect to the particularity of the email text, we analyze the email
structure and mine the text features from four more detailed parts: the email
header, the email body, the word level, and the char level.
2. The RCNN model is improved, and the email is then modelled at multiple levels
using the improved RCNN model. Noise is introduced as little as possible, and
the context information of the email can be better captured.
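The four-part view of an email described above (header and body, each at the char level and the word level) can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: the function names `split_email` and `multilevel_tokens` are hypothetical, and plain whitespace splitting stands in for whatever tokenizer the actual model uses.

```python
from email import message_from_string


def split_email(raw):
    """Split a raw email message into header text and body text."""
    msg = message_from_string(raw)
    header_text = "\n".join(f"{name}: {value}" for name, value in msg.items())
    if msg.is_multipart():
        # Keep only the plain-text parts of a multipart message.
        parts = [part.get_payload() for part in msg.walk()
                 if part.get_content_type() == "text/plain"]
        body_text = "\n".join(parts)
    else:
        body_text = msg.get_payload()
    return header_text, body_text


def multilevel_tokens(raw):
    """Return char-level and word-level token lists for header and body."""
    header, body = split_email(raw)
    return {
        "header_char": list(header),
        "header_word": header.split(),  # assumption: whitespace tokenization
        "body_char": list(body),
        "body_word": body.split(),
    }


# Hypothetical sample message.
raw_email = (
    "From: alice@example.com\n"
    "Subject: Verify your account\n"
    "\n"
    "Click here now\n"
)
views = multilevel_tokens(raw_email)
print(views["header_word"])
print(views["body_word"])
```

Each of the four token lists would then be mapped to vectors and fed to the model as a separate input sequence.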
PROPOSED SYSTEM:
With the emergence of email, the convenience of communication has led to the
problem of massive spam, especially phishing attacks through email. Various
anti-phishing technologies have been proposed to address phishing attacks.
Earlier work studied the effectiveness of phishing blacklists. Blacklists mainly
include sender blacklists and link blacklists. This detection method extracts the
sender’s address and the link addresses in the message and checks whether they are
in the blacklist to decide whether the email is a phishing email. Blacklist updates
are usually reported by users, and whether a reported website is a phishing website
is identified manually. At present, two well-known phishing blacklist sources are
PhishTank and OpenPhish. To some extent, the completeness of the blacklist
determines the effectiveness of blacklist-based phishing email detection. The
current situation is that new threats may not only cause severe damage to
customers’ computers but also aim to steal their money and identity. Among these
threats, phishing is a noteworthy one: it is a criminal activity that uses social
engineering and technology to steal a victim’s identity data and account
information. According to a report from the Anti-Phishing Working Group, phishing
has shown a clear upward trend in recent years, and the harm it causes is
correspondingly serious.
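The blacklist mechanism described in this section (extract the sender's address and the link addresses, then look them up) can be sketched as follows. This is a minimal illustration, not the implementation of any particular product: the function names, the URL pattern, and the blacklist entries are all hypothetical, and a real system would load its entries from a maintained feed.

```python
import re
from email.utils import parseaddr
from urllib.parse import urlparse

# Simplified pattern for http(s) links in plain text (an assumption, not exhaustive).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_link_hosts(body):
    """Return the set of lower-cased host names of links found in the body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # skip malformed URLs
        if host:
            hosts.add(host.lower())
    return hosts


def is_blacklisted(sender_header, body, sender_blacklist, host_blacklist):
    """Return True if the sender address or any linked host is blacklisted."""
    _, address = parseaddr(sender_header)
    if address.lower() in sender_blacklist:
        return True
    return not extract_link_hosts(body).isdisjoint(host_blacklist)


# Hypothetical blacklist entries for illustration only.
SENDER_BLACKLIST = {"support@bad.example"}
HOST_BLACKLIST = {"phish.example"}

flagged_by_sender = is_blacklisted(
    "Bank Support <support@bad.example>", "Hello",
    SENDER_BLACKLIST, HOST_BLACKLIST)
flagged_by_link = is_blacklisted(
    "Alice <alice@good.example>", "Visit http://phish.example/login now",
    SENDER_BLACKLIST, HOST_BLACKLIST)
not_flagged = is_blacklisted(
    "Alice <alice@good.example>", "See https://good.example/docs",
    SENDER_BLACKLIST, HOST_BLACKLIST)
print(flagged_by_sender, flagged_by_link, not_flagged)
```

The sketch makes the limitation noted above concrete: an email from a sender or host that has not yet been reported passes the check, so detection is only as good as the blacklist is complete.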
Advantages
1. Phishing email refers to an attacker using a fake email to trick the recipient
into returning information such as an account password to a designated
recipient.
2. Additionally, it may be used to trick recipients into visiting special web
pages, which are usually disguised as real web pages, such as a bank’s web
page, to convince users to enter sensitive information such as a credit card or
bank card number and password. Although a phishing email attack
seems simple, its harm is immense.
ALGORITHM
R-CNN Algorithms
The R-CNN family (R-CNN, Fast R-CNN, and Faster R-CNN) consists of region-based
convolutional networks for detecting objects in images, that is, for predicting
the bounding boxes present in previously unseen images (new data). It is distinct
from the Recurrent Convolutional Neural Network (RCNN) on which the proposed email
model is based. R-CNN extracts a set of candidate regions from the given image
using selective search and then checks whether any of these boxes contains an
object. The regions are extracted first, a CNN is used to extract features from
each region, and these features are then used to detect objects. Unfortunately,
R-CNN is rather slow because of the multiple steps involved. Fast R-CNN, on the
other hand, passes the entire image to a ConvNet once, and the regions of interest
are taken from the resulting feature map (instead of passing each extracted region
through the network). Also, instead of using three different models as R-CNN does,
it uses a single model that extracts features from the regions, classifies them
into different classes, and returns the bounding boxes. All these steps are done
simultaneously, which makes it faster than R-CNN. Fast R-CNN is, however, still
not fast enough when applied to a large dataset, because it also uses selective
search to extract the regions.
REQUIREMENT ANALYSIS
REQUIREMENT SPECIFICATION
Functional Requirements
A graphical user interface for interaction with the user.
Software Requirements
For developing the application, the following are the Software Requirements:
1. Python
2. Django
3. MySQL
4. mysqlclient
5. WampServer 2.4
Operating Systems:
1. Windows 7
2. Windows XP
3. Windows 8
Hardware Requirements
For developing the application the following are the Hardware Requirements:
Conclusion:
We use a new deep learning model to detect phishing emails. The model
employs an improved RCNN to model the email header and the email body at both
the character level and the word level, so that as little noise as possible is
introduced into the model. In the model, we use the attention mechanism in the
header and the body, making the model pay more attention to the more valuable
information between them. We use an unbalanced dataset closer to the real-world
situation to conduct experiments and evaluate the model, and the model obtains a
promising result. Several experiments are performed to demonstrate the benefits of
the proposed model. For future work, we will focus on how to improve our model for
detecting phishing emails with no email header and only an email body.
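The idea of letting the model weigh the header against the body can be illustrated with a generic dot-product attention step. This is a sketch under assumptions, not the exact attention formulation of the proposed model: the vectors, the `context` vector (which would normally be learned during training), and the function names are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, context):
    """Weight each vector by softmax(dot(vector, context)) and sum them."""
    scores = [sum(v_i * c_i for v_i, c_i in zip(vec, context))
              for vec in vectors]
    weights = softmax(scores)
    combined = [sum(w * vec[i] for w, vec in zip(weights, vectors))
                for i in range(len(context))]
    return weights, combined


# Hypothetical encoded representations of the two email parts.
header_vec = [0.2, 0.1, 0.0]
body_vec = [0.9, 0.4, 0.3]
context = [1.0, 1.0, 1.0]  # assumption: stands in for a learned context vector

weights, email_vec = attend([header_vec, body_vec], context)
print(weights)
print(email_vec)
```

Here the body receives the larger weight because its score against the context vector is higher. For an email with no header, which the future-work sentence mentions, only one vector would be available and the weighting step would have nothing to compare.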