Received 15 May 2024, accepted 29 May 2024, date of publication 3 June 2024, date of current version 11 June 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3409049

An Effective Detection Approach for Phishing URL Using ResMLP

S. REMYA 1, MANU J. PILLAI 2, KAJAL K. NAIR 2, SOMULA RAMA SUBBAREDDY 3, AND YONG YUN CHO 4
1 Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, Kerala 690525, India
2 Department of Computer Science and Engineering, TKM College of Engineering, Kollam, Kerala 691005, India
3 Department of Information Technology, Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad, Telangana 500090, India
4 Department of Information and Communication Engineering, Sunchon National University, Jeollanam-do, Suncheon 57922, South Korea

Corresponding author: Yong Yun Cho ([email protected])

This work was supported in part by the Innovative Human Resource Development for Local Intellectualization Program through the
Institute of Information and Communications Technology Planning and Evaluation (IITP) Grant funded by Korean Government, Ministry
of Science and ICT (MSIT), South Korea, under Grant IITP-2024-2020-0-01489, 50%; and in part by the MSIT through the Information
Technology Research Center (ITRC) Support Program supervised by IITP under Grant RS-2024-00259703, 50%.

ABSTRACT Phishing websites mimic legitimate counterparts and pose a significant threat by stealing
user information through deceptive Uniform Resource Locators (URLs). Traditional blacklists struggle
to identify dynamic URLs, necessitating advanced detection mechanisms. In this study, we propose an
effective approach that uses residual pipelining for phishing URL detection. Our method extracts common
URL features and sentiments and passes them through a residual pipeline comprising convolutional and
inverted residual blocks. The resulting features are then fed into a Multi-Layer Perceptron (MLP) for
classification. We also use domain age analysis to determine how long a URL's domain has existed, and
a lexical study of URL structure further strengthens the method. We evaluate our approach against
traditional algorithms using a Kaggle dataset. The results show superior accuracy, precision, F1 score,
and recall. With an accuracy of 98.29%, this research highlights the importance of innovative techniques
in combating evolving cyber threats. Future research could focus on enhancing the model's robustness
against adversarial attacks and integrating real-time monitoring for proactive defense strategies.

INDEX TERMS Phishing, URL detection, residual pipelining, cybersecurity, classification.

The associate editor coordinating the review of this manuscript and approving it for publication was Tyson Brooks.

© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024

I. INTRODUCTION

Phishing websites are malicious websites that look similar to legitimate websites in terms of their web pages and Uniform Resource Locator (URL) addresses. In URL phishing, a threat actor manipulates internet URLs in a variety of ways to encourage targets to click them. Usually, clicking on these links leads individuals to fraudulent, malware-infected websites that look for sensitive personal data, such as banking account details and passwords. The victims of these connections may suffer severe consequences [1]. Realistically, detecting and recognizing phishing websites is a dynamic and difficult endeavor. Phishing offenses can be conducted through several channels, including email, websites, spyware, SMS, and voice calls. Among the most prominent types of URL phishing attack is one in which a fraudster impersonates a well-known company and sends a bogus email with the message "Your account has been disabled". In response, alarmed users click the link, unknowingly downloading malware onto their computers.

Phishing attempts have risen by 33% annually since 2015 on average. Due to the expansion of the internet and the number of people working from home, phishing
has increased more than twice as much as it did in 2022. In December 2021, the Anti-Phishing Working Group (APWG) [2] reported 316,747 attacks, the highest figure in its history. Bank-related phishing assaults accounted for 23.2% of all phishing attacks in the fourth quarter of 2021, according to OpSec Security, a founding member of APWG.

Anti-phishing is the technique of preventing phishing attacks in which attackers try to get sensitive information through non-repudiation. Attackers' tactics and methods of targeting have advanced significantly as common phishing techniques have become more transparent to the general public. Many businesses have created anti-phishing systems [3] to reduce these hazards; however, these tools are not the last layer of defense. Anti-phishing software is a platform or series of software services that can identify malicious inbound messages that pose as authentic or try to gain trust through social engineering. It also allows users to create whitelists and blacklists for message filtering and takes preventative measures when necessary [4]. However, these are insufficient to battle phishing, since attackers make use of one-time phishing URLs. Machine learning techniques are used to deal with this trick, relying on an integrated classifier that examines the properties of sample URLs [5], [6] to make judgments about new, developing ones [21]. Likewise, deep learning-based methods [8] have been developed that are capable of classifying data more accurately than traditional ML models.

This research work introduces a novel approach that leverages a residual pipelining methodology to enhance the efficacy of traditional detection mechanisms. URL features, along with a few sentimental features, are collected as part of feature extraction and transformed into a matrix. This matrix is then fed into the residual pipeline module, which consists of convolution layers and inverted residual layers. After that, the obtained result is fed into an output block where the actual classification of the URL takes place. In response to the escalating sophistication of phishing attacks, we provide an effective solution using a hybrid feature set. This collection includes different hyperlink information and URL character sequence characteristics, culminating in the creation of the feature vectors necessary for training our anti-phishing model. Ultimately, rigorous testing reveals that the accuracy reaches up to 98.295%, which outperforms the conventional methods.

Our anti-phishing solution is designed to fulfill stringent requirements essential for robust detection and prevention of phishing attacks. It prioritizes high detection efficiency, real-time detection capabilities, target independence, and third-party independence. By minimizing false positives and maximizing true positives, our method ensures timely prediction of phishing attempts while maintaining adaptability to emerging threats without reliance on external services.

Key contributions of our research include:
• The proposal of a phishing detection approach that integrates residual pipelining methodology, offering enhanced performance and resilience against evolving cyber threats.
• Additionally, we introduce a comprehensive feature set comprising novel and existing features, boosting the effectiveness of our detection mechanism.
• Through extensive experimentation and evaluation, we validate the precision and accuracy of our method in identifying legitimate websites while minimizing false positive rates.

To present our research findings logically, the subsequent sections are organized as follows. The section on Literature Review thoroughly examines relevant publications that form the basis of our research and offer insights into current phishing detection techniques and approaches. The methodology section provides comprehensive details about the technical principles and theoretical foundations of our suggested methodology, which are crucial for placing our research in context. The dataset that we used for our experiments is presented in the section Dataset and Experimental Results, along with a detailed analysis of the findings of our research. In the result section, we also provide in-depth analyses of the performance of our proposed approach across various evaluation metrics. Concisely summarising our results, the conclusion section explores future directions for phishing detection research and development.

II. RELATED WORKS

Phishing has long been one of the most popular cyber-attack strategies used by bad actors. The problems posed by phishing websites have been addressed by numerous studies. Many techniques for detecting phishing websites have been proposed, including blacklist-based and heuristic-based techniques. The statistics from the training dataset have a substantial impact on the weights in the heuristic-based approach. Blacklists [9], which are datasets consisting of malicious URLs, are still used by several internet companies. However, a blacklist is unable to forecast outcomes for a new URL that has not yet been added to the list, and attackers are increasingly using one-time URLs to carry out attacks. To address this issue, several approaches have been developed.

Xiao et al. [10] developed CNN-MHSA, a combination of a Convolutional Neural Network (CNN) and multi-head self-attention mechanisms, to detect phishing websites. In this method, feature extraction and weight calculation are performed independently by duplicating the input matrix into two. The self-attention mechanism then aids in identifying whether websites are malicious or benign. This method exhibits strong performance in differentiating phishing websites from authentic ones by utilizing the CNN's capability for spatial feature learning and self-attention for capturing long-range relationships. The improved efficiency and interpretability of CNN-MHSA can be seen in its ability to decouple the weight calculation procedure from feature extraction. It is crucial to recognize that, even with the encouraging results, the complex neural network architecture may need a significant amount of computing power for both
training and inference. The efficacy of the model can also be impacted by factors such as the variety and quality of the training data as well as the dynamic tactics employed by phishing opponents.

For precise phishing detection, Wang et al. [11] established PDRCNN, a method based on Recurrent Convolutional Neural Networks. A two-dimensional tensor representation generated by the PDRCNN is given as an input for classification purposes. By utilizing these attributes, the model can identify temporal and spatial patterns of URL data, which improves the identification of phishing efforts. Despite advantages such as its ability to handle sequential data and adapt to various URL formats, PDRCNN has drawbacks: it requires large labeled datasets, has high computational resource needs, and is vulnerable to overfitting. PDRCNN cannot be successfully used in actual cybersecurity applications until these problems are fixed.

The Phishing Hybrid Feature-Based Classifier (PHFBC), designed by Zuhair et al. [12], combines recursive feature subset selection with ML approaches to produce a comprehensive phishing detection system. With a set of features gathered from phishing and legitimate websites, their objective was to accurately classify phishing. PHFBC incorporates decision tree and Naive Bayes models using a statistical measure known as the Phish Ratio. Though it is an innovative technique, it has limitations such as being susceptible to feature replication, requiring laborious and error-prone manual feature extraction, and having trouble selecting the best features for different phishing scenarios. Furthermore, parameters like representativeness in response to changing phishing strategies could have an impact on how successful PHFBC is. Resolving these issues would improve PHFBC's resilience and suitability for use in actual phishing detection situations.

Ramesh et al. [13] provided a technique for identifying phishing webpages and their target domains through simulation analysis. Utilizing row and column sums, the technique determines target linkages, producing a parasitic matrix that depicts the connection between two sites. However, scalability problems with this method could appear when handling huge datasets or intricate website architecture. Additionally, how well the technique works may depend on the accuracy of the presumed correlations discovered and the consistency of the row and column total computations. The human-generated parasitic matrix may also produce biased or erroneous results, and the system may not be able to adapt to evolving phishing or website design trends.

Cova et al. [14] intended to comprehend the basic design and application of phishing kits to determine the methods of deception used by hackers to hide backdoors they had installed and to educate interested parties about the techniques that phishers normally use to send phished data. Although their research offers insightful information about the strategies and methods used in phishing attempts, there might be some restrictions on it. For example, the availability and variety of phishing kits evaluated, as well as the timeliness and accuracy of the data acquired, could limit the efficacy. Furthermore, the study generalizes the findings to more extensive phishing threats and attack scenarios. Apart from that, it might be difficult to analyze the dynamic nature of phishing attempts and the quick development of phishing strategies. Notwithstanding these possible drawbacks, this investigation adds a great deal to our knowledge of how phishing kits are used and helps develop stronger defences against phishing attacks.

The technique "Antiphishing through Phishing Target Discovery," by Liu et al. [15], aims to detect possible phishing websites through an analysis of their parasitic community structure. Identifying the principal phishing target webpage and exposing "parasitic" connections are the goals of this method, which collects webpages that are either directly or indirectly linked to a certain suspicious webpage. However, there is a chance that this approach may fail, especially in cases when the parasitic community structure is dynamic or complex. Furthermore, the quality and completeness of the web link data used for analysis, as well as the variety of phishing strategies and techniques used by the attackers, could have an impact on the accuracy of the methodology.

CANTINA is a content-based method developed by Zhang et al. [16] that analyses character scores and extracts keywords from website texts to identify phishing websites. Notwithstanding its inventive methodology, CANTINA can encounter various constraints. For example, the use of TF-IDF analysis and character scores alone may miss less obvious signs of phishing activity, like visual cues or contextual components. In addition, the model's efficacy might be restricted by the accuracy and significance of the selected keywords, as well as by the possibility of false positives or false negatives in the Google search results. Furthermore, using Google search rankings as a metric for legitimacy could lead to bias and inaccurate results, especially when phishing websites alter search engine results or genuine websites are not highly ranked.

TABLE 1. Literature review of phishing detection methods.

To create a classification model, Marchal et al. [17] used a feature extraction technique, extracting 212 features and applying Gradient Boosting. Although this methodology is a thorough attempt to capture several aspects of phishing websites, it might run into issues with feature selection and model complexity. The number of features that are extracted may cause problems like overfitting, particularly if some of the features are irrelevant to the purpose of phishing detection. Furthermore, the process of manually extracting features can be time-consuming and may eliminate important details from phishing websites, which could reduce the ability of the model to identify new and developing phishing techniques. Furthermore, considering large-scale deployment scenarios when computational resources are limited, the selection of Gradient Boosting as the classification algorithm
may pose interpretability and scalability issues for the model.

Self-structuring neural networks are the basis of an inventive technique proposed by Mohammad et al. [18] for comprehending phishing websites. For all its advantages, such as a self-organized neural network and a high level of noise tolerance, this method may have a few disadvantages. Neural network complexity can be a disadvantage since it can lead to problems with the interpretability of the models and processing performance. The effectiveness may also be influenced by the training data, given the dynamic and ever-changing nature of phishing attacks. Furthermore, the diverse and representative nature of the training dataset, along with the neural network's capacity to generalize across various phishing scenarios and attack vectors, could have an impact on the model's performance. Addressing these limitations will be crucial for ensuring the practical utility and effectiveness of the proposed approach.

Nguyen et al. [19] developed a single-layer neural network, which computes heuristics and generates weights using the network. This is an effective technique for phishing detection. The single-layer architecture of this approach makes it simple and computationally efficient, but it might have drawbacks. For example, the single-layer neural network's limited ability to identify complex patterns and relationships in the data may limit the efficacy. Additionally, in situations when the underlying data distribution is extremely complex or unpredictable, relying solely on heuristics for feature extraction and weight computation may result in less-than-ideal performance. Moreover, single-layer neural networks' lack of depth and limited representational capacity may limit the capacity to generalize, which could impair their effectiveness in phishing cases that have yet to be discovered or developed.

Zhang and Li [20] introduced the Borderline-SMOTE Deep Belief Network (DBN). Improved detection accuracy and model robustness are two potential advantages of this approach, but it may also have some disadvantages. One example is the representativeness and quality of the training data, which could affect the effectiveness, particularly considering the challenges that imbalanced datasets in phishing detection tasks may pose. Nevertheless, the computational complexity of training deep neural networks like DBN and the additional overhead caused by oversampling techniques like Borderline-SMOTE could threaten the scalability and efficiency of the model, particularly in real-time or resource-constrained contexts. Addressing these limitations will be essential for ensuring the practical viability and effectiveness of the proposed phishing detection method in real-world cybersecurity applications.

Verma and Das [21] proposed online learning with n-grams as a technique for phishing website identification. They divide URLs into n-grams to detect phishing websites. Although this strategy has advantages, like its adaptability to new phishing strategies and its capacity to handle streaming data, it may also have disadvantages. Other variables could impact the outcome, including the choice of n-grams, the level of information in the feature representation, and the effectiveness of online learning algorithms in processing massive amounts of data quickly. The method may also not work as well or last as long if it relies too much on online training, which can lead to problems with model drift and concept drift over time.

Yang et al. [22] suggested an approach to find fake websites through deep learning-based multidimensional features. Because they use deep learning to identify features related to character sequences from URLs, their method makes fast classification possible. Dimensionality reduction, pattern recognition, and one-hot encoding and embedding of URLs are used in this method to try to find complicated patterns that point to phishing activities.

For finding fake websites, Sun et al. [23] proposed a new method using graph neural networks. Unlike traditional feature-based systems, the one created by Sun et al. does an excellent job of capturing the complex relationships between URLs. The network architecture is examined extensively, and subtle patterns linked to phishing are found using a graph neural network. Utilizing the structure of the information found in URLs and the intricate links between them, their method greatly enhances the accuracy of detection. There will be significant advantages over current feature engineering methods if this new method can regularly and accurately spot complex phishing attempts.

FIGURE 1. Architecture of the proposed model.

FIGURE 2. Model overview.

To deal with the problem of not having adequate information, Chen et al. [24] suggested a deep transfer learning system optimized for finding phishing emails. Their method works well with new datasets that do not have a lot of labeled data because it uses models that have already been trained and transfers knowledge from high-quality datasets. This method generalizes well and reaches high recognition accuracy with minimal training data, which is helpful when labeled cases are hard to obtain. Their method combines domain knowledge with transferable information. Therefore, it will be possible to make detection systems that are more flexible and reliable.

Asiri et al. [25] came up with a new way to use deep reinforcement learning to find phishing attempts in real time so that security can be proactive against attempts that change all the time. Because it is always changing and learning from how people use URLs, their system gets better at detection over time. A flexible learning method is used to make the model quickly respond to new phishing threats as they appear. This is done by using real-world data about how users interact with the system. Through a routine of observation, action, and reward, their system becomes a strong defense that can find and stop phishing attempts very well. This approach can start a new era of flexible cybersecurity solutions. These solutions provide high-level security and can adapt quickly to changing cyber threat situations.

These strategies may have problems despite their benefits, such as being able to naturally learn hierarchical representations from raw data and capturing complex correlations between attributes. The efficacy of the model may be affected, for instance, by the quantity of labeled training data as well as the computational resources needed to train deep learning models. Practical application in real-world settings may also be hampered by the interpretability of the learned representations and the approach's scalability to handle large-scale datasets. It is important to tackle these constraints to guarantee the dependability and effectiveness of the proposed phishing detection technique in practical cybersecurity implementations. The summary of the existing state of the art is shown in Table 1.

III. METHODOLOGY

With the increasing threat of phishing attacks, our research aims to create a strong method for spotting phishing URLs. Central to our approach is the integration of an MLP within a residual pipelining framework. This combination of methodologies aims to capitalize on the strengths of each approach, thereby enhancing the efficacy and accuracy of phishing website detection.

A. SYSTEM ARCHITECTURE

The overall architecture of the proposed approach, depicted in Figure 1, is divided into four phases. Features are extracted in Phase 1. Phase 2 performs feature vectorization to create a unique feature vector for every webpage. Phase 3 carries out the machine-learning step. Phase 4 determines whether the provided webpage is phishing or not.

1) FEATURE EXTRACTION

An integral part of our methodology is the feature extraction procedure from URLs, which is a critical step in the detection pipeline. We carefully choose and extract emotive attributes from 25 different URLs to use as input features in our detection model. A few instances of the numerous variables covered by these characteristics are the length of the URL, the host, the directory, the TLD, and the number of special characters such as @, -, ., =, and ?. Moreover, we recognize that affective dimensions are important in determining the legitimacy of URLs and account for them by considering variables such as domain age, domain registration duration, and Google indexing status. Table 2 displays the selected features.
TABLE 2. Features extracted from URLs.
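As a rough illustration of the lexical part of this feature set, the sketch below computes a handful of the properties named in the text (URL length, host length, directory length, TLD, and counts of @, -, ., =, and ?) using only the Python standard library. The function name `extract_lexical_features`, the feature keys, and the naive "last host label" rule for the TLD are assumptions made for this example, not the authors' implementation. Domain age, registration duration, and Google indexing status need external lookups and are left out.

```python
from urllib.parse import urlparse

# Special characters named in the text, mapped to illustrative feature keys.
SPECIAL_CHARS = {
    "@": "count_at",
    "-": "count_hyphen",
    ".": "count_dot",
    "=": "count_equals",
    "?": "count_question",
}


def extract_lexical_features(url):
    """Return a small dictionary of lexical features for one URL (hypothetical helper)."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    path = parsed.path or ""
    # Naive TLD rule (assumption): last dot-separated label of the host.
    # Multi-part suffixes such as "co.uk" are not handled.
    tld = host.rsplit(".", 1)[-1] if "." in host else ""

    features = {
        "url_length": len(url),
        "host_length": len(host),
        "directory_length": len(path),
        "tld": tld,
    }
    for char, key in SPECIAL_CHARS.items():
        features[key] = url.count(char)
    return features


if __name__ == "__main__":
    example = "http://login.example.com/secure/update?id=42"
    for name, value in extract_lexical_features(example).items():
        print(name, value)
```

Each URL yields one dictionary; stacking the numeric values row by row gives the kind of feature matrix that the later phases consume.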

We have carefully chosen these features to capture the nuances of patterns and characteristics present in phishing URLs. As a result, our detection system gains the thorough understanding needed for accurate classification.

Our detection model, which aims to reliably and precisely distinguish between phishing and authentic URLs, is trained using the extracted attributes as its basis. Making use of the wide range of available parameters, our model applies machine-learning techniques to identify minute details and irregularities indicative of phishing activities. Furthermore, by using deep learning techniques, our model is better equipped to detect malicious URLs since it can find intricate patterns and relationships in the data. With this integrated strategy, our research attempts to clear the path for more powerful and efficient defenses against the ubiquitous threat of phishing assaults. Figure 2 displays the model overview.

The feature extractor is designed to include the contextual sentiment score for each URL by using the GloVe and Natural Language Toolkit (NLTK) tools. Each URL in the dataset is preprocessed and tokenized. The preprocessing includes the removal of stop words, trimming, etc. Each token is then passed on to scikit-learn's text vectorizers to get a sentiment score. A raw sequence of symbols cannot be fed directly into a scikit-learn estimator, because most of them expect numerical feature vectors of a fixed size rather than unstructured text documents of different lengths. To handle this, scikit-learn offers tools for the most popular methods of extracting numerical features from text, such as:
• tokenizing: The strings are tokenized, and each potential token is assigned an integer id. The token separators may include whitespace and punctuation marks.
• counting: The number of times each token appears in a document is recorded.
• normalizing: The tokens are normalised and weighted, with diminishing importance given to tokens that appear in most samples.

Here are the definitions for features and samples: The frequency with which each unique token appears (normalised or not) is considered a feature. For a particular document, the vector containing all of those token frequencies is regarded as a multivariate sample.

B. FEATURE VECTORIZATION

Vectorization is the process of converting a set of text documents into numerical feature vectors.
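The tokenizing, counting, and normalizing steps listed above can be sketched in plain Python without any third-party library. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`tokenize`, `build_vocabulary`, `count_matrix`, `tfidf_l2`) are hypothetical, tokens are split on non-alphanumeric characters, and the weighting uses the smoothed idf of Eq. (1) followed by the Euclidean normalization of Eq. (2) from this section.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Lower-case a URL and split it on any run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def build_vocabulary(token_lists):
    """Tokenizing step: give every distinct token an integer id (sorted order)."""
    distinct = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: idx for idx, tok in enumerate(distinct)}


def count_matrix(token_lists, vocab):
    """Counting step: one row per URL, one column per vocabulary token."""
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        row = [0] * len(vocab)
        for tok, idx in vocab.items():
            row[idx] = counts[tok]  # Counter returns 0 for absent tokens
        rows.append(row)
    return rows


def tfidf_l2(counts):
    """Normalizing step: smoothed idf weighting (Eq. 1), then L2 scaling (Eq. 2)."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    weighted_rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        weighted_rows.append([v / norm if norm else 0.0 for v in weighted])
    return weighted_rows


if __name__ == "__main__":
    urls = ["http://example.com/login", "http://example.com/home"]
    token_lists = [tokenize(u) for u in urls]
    vocab = build_vocabulary(token_lists)
    counts = count_matrix(token_lists, vocab)
    for url, row in zip(urls, tfidf_l2(counts)):
        print(url, [round(v, 3) for v in row])
```

In this toy corpus the tokens shared by both URLs ("http", "example", "com") receive a lower weight than the tokens that distinguish them ("login", "home"), which is the effect the normalizing step is meant to produce.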


A matrix can be used to represent a corpus of documents, where each row represents a document and each column represents a token. Tokenization, counting, and normalisation are combined into the "Bag of Words" or "Bag of n-grams" representation. Word occurrences are used to characterise the documents, with no consideration given to the terms' relative positions within the text.

Some terms are prevalent throughout a massive text corpus and therefore carry relatively little meaningful information about a document's real contents. Usually, one applies the TF-IDF transform to re-weight the count features into floating-point values suitable for a classifier. Tf denotes term frequency, while tf-idf denotes term frequency multiplied by inverse document frequency.

For example, TfidfTransformer(norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False) might be used with its default parameters. The term frequency is defined as the number of times a word appears in a particular document. It is multiplied by the idf component, which is calculated as:

$$\mathrm{idf}(x) = \log\frac{1 + n}{1 + \mathrm{df}(x)} + 1 \tag{1}$$

In the document set, df(x) is the number of documents that include word x, and n is the total number of documents in the document set. Subsequently, the Euclidean norm is used to normalize the resulting TF-IDF vectors.

$$u_{\mathrm{norm}} = \frac{u}{\lVert u \rVert_2} = \frac{u}{\sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}} \tag{2}$$

The term weighting method was initially created for information retrieval and is useful for grouping and classifying documents. The calculation of TF-IDF is shown next.

$$\mathrm{idf}(x) = \log\frac{n}{1 + \mathrm{df}(x)} \tag{3}$$

In positional feature extraction, the TF-IDF transformer obtains the weight from the token position on the GloVe dataset, which is defined by the NLTK toolkit and assigned by the scikit-learn transformers. Normalization is needed because the matrix acquired during feature extraction contains floating-point values. The min-max normalization operation rescales a set of data. The smallest value in the original set is mapped to 0, the largest value is mapped to 1, and every other value is assigned a value between these two bounds. The lower bound is denoted by min(y) and the upper bound is denoted by max(y). The normalized value y' can be represented as:

$$y' = \frac{y - \min(y)}{\max(y) - \min(y)} \tag{4}$$

Following normalization, the entire dataset is divided into training and test sets in an 80:20 ratio.

Algorithm 1 outlines the procedure for phishing URL detection. To guarantee data quality, the method begins with preprocessing a dataset D made up of URLs to eliminate null values and duplicates. It then utilizes a feature extraction technique to identify characteristics indicative of phishing activity from every URL in D. Afterwards, a list L to hold the features is created. The software appends to L the features it computes for every URL u in D. After processing each URL, the technique uses the characteristics gathered and stored in L to train a machine learning model M. The output is this trained model M, which can identify URLs as phishing or authentic based on the attributes that have been extracted. The algorithm provides a systematic framework for building a phishing detection model, leveraging machine learning techniques to enhance cybersecurity measures against fraudulent online activities.

Algorithm 1 Phishing URL Detection Algorithm
Require: URL dataset D
Ensure: Phishing detection model M
1: Preprocess dataset D to remove duplicates and null values
2: Extract features from URLs in D using feature extraction algorithm
3: Initialize empty list L
4: for each URL u in D do
5:   Calculate features for URL u
6:   Append features to L
7: end for
8: Train machine learning model M using features in L
9: return Trained model M

IV. RESIDUAL PIPELINE

The residual pipeline enhances the overall system performance and is a crucial component of the entire architecture. Residual blocks were introduced to address the issue of the vanishing gradient. The technique primarily used here is the skip connection, which connects layer activations to subsequent layers by skipping portions of the intermediate layers. An advantage of this type of skip link is that regularisation will bypass any layer that reduces architecture performance.

Figure 3 shows the overview of the residual pipeline block. The input to the residual pipeline includes 27 URL properties, 64 filters, and two classes. It consists of convolutional blocks and seven inverted residual blocks which execute asynchronously. The convolutional block consists of a 3 × 3 convolution layer followed by a batch normalization layer, where the batch size chosen is 32. The ReLU activation function turns the provided input to the necessary output
with the specified range. The output matrix from the
C. PSEUDOCODE convolutional block is fed onto the inverted residual blocks,
The goal of the proposed Phishing URL Detection Algorithm, which conduct different operations including convolution,
presented in Algorithm 1, is to provide a dependable model separable convolution, batch normalization and activation.

79374 VOLUME 12, 2024

S. Remya et al.: Effective Detection Approach for Phishing URL Using ResMLP

FIGURE 3. Residual pipeline.

The convolution operation performs its computation in a single step, while separable convolution operates in two steps. Separable convolution divides the kernel into two smaller kernels: the input is first convolved with the first kernel, and the result is then convolved with the second kernel. It allows for the separation of even the smallest differences in data. The result of the separable convolution is then batch-normalized and activated. After activation, the result is convolved again, followed by batch normalization and activation. Finally, another convolutional block receives the output from each of these inverted residual blocks, and the result is convolved, batch-normalized, and activated. The output obtained after applying the residual pipeline is also a matrix, which is then passed on to an MLP output layer.
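The two-step behaviour of separable convolution described above can be illustrated with a small pure-Python sketch. It assumes the spatially separable reading of the text (a k × k kernel that factors into a k × 1 and a 1 × k kernel); the paper does not state which variant or framework was used, and the kernel values, the toy input, and the function names here are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def separable_conv2d(image, col_kernel, row_kernel):
    """Two-step convolution: a k x 1 column kernel, then a 1 x k row kernel."""
    step1 = conv2d_valid(image, [[v] for v in col_kernel])  # first, smaller kernel
    return conv2d_valid(step1, [row_kernel])                # second, smaller kernel


# Hypothetical 3 x 3 kernel built as the outer product of a column and a row.
col = [1, 2, 1]
row = [1, 0, -1]
full_kernel = [[c * r for r in row] for c in col]

# Toy 5 x 5 integer input, chosen only for illustration.
image = [[(3 * i + 5 * j) % 7 for j in range(5)] for i in range(5)]

one_step = conv2d_valid(image, full_kernel)     # single-step convolution
two_step = separable_conv2d(image, col, row)    # two-step separable convolution
```

For a kernel that factors in this way, both paths give the same 3 × 3 result, while the two-step path uses 3 + 3 multiplications per output position instead of 9. Kernels that do not factor cannot be split exactly like this.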

V. OUTPUT BLOCK
After obtaining the result from the residual block, the output block comes into action. Figure 4 depicts the output block structure. Here, max pooling is performed first; it determines the maximum value in order to gradually shrink the spatial size of the representation. The pooled matrix is then flattened into a single column. Following the process of flattening, a neural network is used to process the massive input data vector for further purposes. The dense layer, which is fully connected to the layer before it, works to change the output's dimension. Typically, a dropout layer is used after a dense layer. Finally, an activation is performed; the softmax activation function is used here because it always returns a value between 0 and 1. As a result, very small or negative values can be mapped close to 0.0 and very large values can be represented as close to 1.0 when given as the weighted total of the input. The result from the output block will be a floating point value. A threshold of 0.5 is set: values below the threshold are placed in the lower class, which equals 0, and the others are placed in the higher class, which equals 1. Class 0 represents the benign URLs and class 1 represents the phishing URLs. A sample output is represented in Table 3.

FIGURE 4. Training and detection phases-output block.

VI. EXPERIMENTAL RESULTS AND DISCUSSIONS
A. EXPERIMENTAL DATA
The underlying experimental data is taken from the Kaggle dataset [26]. URLs and their types are included in the data collection. Type indicates if a URL is phishing or safe. The dataset includes 651,191 URLs and their types; of these, 32,520 are malware URLs, 94,111 are phishing URLs, 96,457 are defacement URLs, and 428,103 are benign URLs. From this, only the benign and phishing URLs are selected for conducting the experiment, which constitutes 522,214 URL samples, among which 94,111 are phishing and 428,103 are benign. The sample dataset is shown in Table 4.

Batch size denotes the maximum number of URLs that our model can handle concurrently, while "epoch" denotes the number of complete training passes over the training set. In this research study, the number of epochs is set to 50. After the completion of each epoch, the loss is monitored; if the same error occurs for all fifty iterations, the execution is stopped, which means the system is not correctly configured. Accuracy is monitored throughout the epochs, and whenever the best accuracy is observed it is saved and the model is trained using this saved


TABLE 3. Result of output block.
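The output block of Section V (max pooling, flattening, a dense layer, softmax, and a 0.5 threshold) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the pooling window, the weights, the biases, and the toy feature map are hypothetical, dropout is omitted because it is only active during training, and the threshold is applied to the softmax probability of class 1 (phishing).

```python
import math


def max_pool_1d(values, size=2):
    """Keep the maximum of each non-overlapping window of `size` values."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def flatten(matrix):
    """Flatten a 2-D nested list into a single vector."""
    return [v for row in matrix for v in row]


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(ws, vector)) + b
            for ws, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax; every output lies between 0 and 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_map, weights, biases, threshold=0.5):
    """Return (label, p_phishing); label 1 means phishing, 0 means benign."""
    pooled = [max_pool_1d(row) for row in feature_map]
    probs = softmax(dense(flatten(pooled), weights, biases))
    p_phishing = probs[1]
    return (1 if p_phishing >= threshold else 0), p_phishing


# Hypothetical values chosen only for illustration.
feature_map = [[0.1, 0.9, 0.3, 0.4],
               [0.7, 0.2, 0.8, 0.6]]
weights = [[0.5, -0.2, 0.1, 0.3],    # output unit for class 0 (benign)
           [-0.4, 0.6, 0.9, 0.5]]    # output unit for class 1 (phishing)
biases = [0.0, 0.0]

label, p_phishing = classify(feature_map, weights, biases)
```

With these made-up weights the phishing probability is slightly above 0.5, so the sample is assigned to class 1. In a trained network the weights would come from the training procedure rather than being set by hand.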

B. IMPLEMENTATION OF DOMAIN AGE ANALYSIS
Adding domain age analysis to our proposed approach is a critical step in improving the robustness and efficiency of our detection model. This feature is implemented by first gathering WHOIS information for every URL in our collection, and then determining the age of the domain that hosts the URL.

For each URL in our dataset, we use WHOIS data to obtain detailed information on domain registration. Gaining access to this data allows us to learn important things about the age and legitimacy of the URL-associated domains. Using the WHOIS information, we can find out the exact date that the domain for each URL was created. For purposes of determining the domain's age, this creation date is considered as the starting point. Next, we find out how old the domain is in days, months, or years by subtracting the date when the domain was created from the current date. The level of detail needed for our study determines the unit in which the interval between the two dates is expressed.

Adding domain age analysis to our detection process significantly improved the detection accuracy of our model. Our model can differentiate between real and malicious URLs by looking at the age of the domain that hosts them. The analysis of domain age has had a significant impact on our results, such as:
• Addition of domain age analysis has led to a big drop in false results. Knowing the difference between real websites and phishing URLs has made our model more accurate and lowered the number of false positives.
• Our method finds and avoids future computer threats by looking at domain ages. Our ability to find more things has improved. To help stop phishing attempts

from newly registered domains, our method improves identification by finding these domains.

We checked our model against four other models that were already made to find fake URLs to make sure our program worked. Our proposed model, which uses domain age analysis, was tested along with a number of other phishing URL detection methods. These models included random forests, gradient-boosting classifiers, neural networks, and rule-based strategies. The comparison used important performance measures like F1 Score, recall, accuracy, and precision. Table 5 shows how our model compares with the current models.

TABLE 4. Sample dataset.

TABLE 5. Comparison of phishing URL detection models.

Based on the comparison table, our model consistently performs more effectively than other methods at finding phishing URLs across all criteria. Our approach improves phishing attack security by improving recall, accuracy, precision, and F1 Score.

C. RESULT ANALYSIS & PERFORMANCE COMPARISON
We analyzed how well this model works using four different measures: accuracy, recall, precision, and F1 Score [27]. These metrics are expressed using the following quantities: True Positive (TrP) for the percentage of correctly identified phishing URLs, True Negative (TrN) for the percentage of legitimate URLs that are recognized as legitimate, False Positive (FaP) for the percentage of legitimate URLs that are mistakenly classified as phishing, and False Negative (FaN) for the percentage of phishing URLs that are mistakenly identified as legitimate.

$$\text{Accuracy} = \frac{TrP + TrN}{TrP + TrN + FaP + FaN} \tag{5}$$

$$\text{Precision} = \frac{TrP}{TrP + FaP} \tag{6}$$

$$\text{Recall} = \frac{TrP}{TrP + FaN} \tag{7}$$

$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$

The Accuracy measure in the metrics above represents the proportion of correct classifications. Recall is the proportion of phishing URLs that we correctly identify out of all phishing URLs. The percentage of phishing URLs that we properly identify out of all the predicted phishing URLs is known as precision. The harmonic mean of recall and precision determines the F1 Score.

A CNN [28], [30] and MHSA [31], [32] combined approach for detecting phishing websites has been selected as the baseline model to assess the efficacy of the proposed phishing URL detection method. After receiving URL strings as input data, CNN-MHSA [33], [34] passes them on to the embedding layer [35], [36], where one-hot encoding is carried out and the resulting matrix's dimension is subsequently reduced. The matrix is then fed to the convolutional neural network for feature extraction [37]. The MHSA is then used to calculate the weight. The baseline and proposed models are trained on the training set and assessed on the testing set. Figure 5 and Figure 6 show a comparison of both models with respect to the observed accuracy, precision, recall, F1 Score, and ROC AUC values. This shows that URL detection using residual pipelining has better performance in terms of all the chosen metrics. We then compare the proposed model with different textual content features. Table 6 presents the performance metrics of various classifiers on different types of textual content features for a classification task. Each classifier is evaluated based on precision, recall, F-score, area under the ROC curve (AUC), and accuracy. LR, XGBoost, Random Forest, Naive Bayes, DNN, and LSTM are among the classifiers. Many textual content features are taken into consideration, including count vectors, word sequence vectors, character sequence vectors, TF-IDF word level, TF-IDF N-gram level, and TF-IDF character level [38].
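As a quick worked check of the metric definitions in Eqs. (5) to (8), the following minimal sketch computes all four measures from confusion-matrix counts. The function name and the counts are hypothetical and are not results from the paper; raw counts are used here, which gives the same ratios as percentages.

```python
def classification_metrics(trp, trn, fap, fan):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = trp + trn + fap + fan
    accuracy = (trp + trn) / total if total else 0.0
    precision = trp / (trp + fap) if (trp + fap) else 0.0
    recall = trp / (trp + fan) if (trp + fan) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 90 phishing URLs caught, 10 missed,
# 880 benign URLs passed, 20 benign URLs wrongly flagged.
metrics = classification_metrics(trp=90, trn=880, fap=20, fan=10)
```

For these counts the accuracy is 0.97 and the recall is 0.90, while the precision is 90/110, which illustrates why accuracy alone is not reported.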


FIGURE 5. ROC curve.

FIGURE 6. Precision-recall curve.

TABLE 6. Classifier performance on textual content features.

TABLE 7. Comparison of phishing URL detection models.

Different textual content features are representations of text data used as input for machine learning algorithms, each capturing distinct aspects of the text. TF-IDF at the word level assesses the significance of individual words in a document relative to a collection, while TF-IDF at the N-gram level considers sequences of words or characters. Character patterns are analysed using the TF-IDF character-level representation, which is helpful for languages with complex morphology. By counting the instances of words in documents, count vectors offer efficiency and simplicity in situations where word frequency is crucial. Because word sequence vectors maintain word order when encoding word sequences, they are essential for applications like text generation [39]. Character sequence vectors are useful for analyzing complex writing systems and identifying misspelled words since character sequences are encoded to represent text. The best representation strategy must be found through experimentation because it depends on several variables, including the properties of the data, task complexity, and algorithm requirements.
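To make the character-level TF-IDF representation concrete, here is a minimal pure-Python sketch that follows the smoothed idf of Eq. (1) and the Euclidean normalization of Eq. (2). It is not the authors' pipeline: the trigram tokenizer, the function names, and the three example URLs are assumptions made only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a string into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(documents, tokenizer):
    """Return (vocabulary, vectors) using smoothed idf and L2 normalization."""
    counts = [Counter(tokenizer(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    n_docs = len(documents)
    # df(x): number of documents that contain term x
    df = {t: sum(1 for c in counts if t in c) for t in vocabulary}
    # Eq. (1): idf(x) = log((1 + n) / (1 + df(x))) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    vectors = []
    for c in counts:
        raw = [c[t] * idf[t] for t in vocabulary]
        # Eq. (2): divide by the Euclidean norm of the vector
        norm = math.sqrt(sum(v * v for v in raw))
        vectors.append([v / norm if norm else 0.0 for v in raw])
    return vocabulary, vectors


# Hypothetical URLs: a legitimate-looking one, a look-alike, and an unrelated one.
urls = ["paypal.com/login", "paypa1.com/login", "example.org/home"]
vocab, vecs = tfidf_vectors(urls, char_ngrams)
```

Because the vectors are unit length, their dot product is a cosine similarity; the look-alike URL shares most trigrams with the first one, whereas the unrelated URL shares none. Passing a word-level tokenizer instead of `char_ngrams` would give the word-level variant.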
From Table 6, it can be observed that the performance varies depending on the classifier and the type of textual content features used. For instance, MLP consistently achieves high precision, recall, F1 Score, AUC, and accuracy across different types of textual content features, indicating its robustness and effectiveness in capturing complex patterns in the data. When it comes to more complex textual content features, such as word sequence vectors and character sequence vectors, alternative classifiers do better than Naive Bayes. Text categorization tasks show how different classifiers work by looking at the textual content parts. Figure 7 offers a visual representation of the comparisons for each group.

FIGURE 7. Comparison of precision, recall, F1 score, and accuracy for various classifiers across different feature representations. Each subplot represents the performance metrics for a specific feature representation category, including TF-IDF word level, TF-IDF N-gram level, TF-IDF character level, count vectors, word sequences vectors, and character sequences vectors.

D. LEXICAL ANALYSIS OF URL STRUCTURE
Our proposed approach to finding phishing URLs involves thoroughly studying the lexical structure of the URL and looking for subtle cues that could indicate a phishing attempt. In this part, we look for patterns in the syntax and meaning of the URL's domain name, path, and query parameters.

Checking the domain name for any misspellings or ambiguities helps us spot scam attempts. Phishing websites often use names that closely resemble those of real domains. For example, attackers may change the spelling or individual characters slightly. Our approach is designed to detect these unusual occurrences and identify potentially hazardous URLs [40].

The query parameters and URL path are the first places we search for signs of phishing attempts. Phishing URLs may use a convoluted path structure or a large number of query parameters to hide their true intent. Our model can analyze user input and detect abnormal activity, such as potential phishing attempts.

Our analysis of URLs is comprehensive, covering both syntactic and semantic aspects. Examining the context and meaning of the URL sections can reveal semantic inconsistencies or conflicts. Malicious URLs may use domain names that do not relate to the content of the webpage or have an unusual combination of path segments and query parameters.

We incorporate lexical analysis of the URL structure into our detection method to enhance our model's understanding of URL properties and their security ramifications. This enhanced analysis allows our model to detect phishing

attempts despite the presence of minute signals that would be missed by previous detection techniques.

The method we propose can consistently differentiate between legitimate and malicious URLs by analyzing them component by component. Our approach is designed to identify suspicious patterns and highlight them, enhancing the accuracy of phishing attempt detection while lowering false positives and negatives. We enhanced the detection system's ability to handle newly emerging cyber threats by incorporating lexical analysis into our model. Our technology is designed to handle even the most advanced phishing techniques, thanks to its continuous learning and discovery of new patterns that may indicate malicious activity. We can conduct experiments with and without this feature to evaluate performance and examine results using lexical analysis of URL structure. A comparison of performance is presented in Table 7.

VII. CONCLUSION AND FUTURE WORK
Phishing website assaults are a serious and growing risk to Internet users, as seen by the rise in incidents in recent times. Daily and hourly, a multitude of users inadvertently engage with phishing URLs, perpetuating the risk of cyber exploitation. Attackers favor phishing because it exploits human vulnerabilities, takes advantage of the innate trust users place in seemingly authentic links, and evades conventional security measures. Although extensive research endeavors have been undertaken to counter these threats, achieving optimal detection accuracy remains an ongoing pursuit.

This research work aims to discern and categorize URLs into either phishing or benign classes. Evaluation metrics encompassing Accuracy, Precision, Recall, and F1 Score underscore the superior performance of the proposed system. Looking ahead, future endeavors may explore the expansion of this work into a multi-class classification framework. Meanwhile, efforts to optimize the residual pipeline, which currently comprises seven inverted residual blocks, will focus on streamlining and reducing the complexity of this architectural component.

REFERENCES
[1] A. Van der Merwe, M. Loock, and M. Dabrowski, "Characteristics and responsibilities involved in a phishing attack," in Proc. Winter Int. Symp. Inf. Commun. Technol., 2005, pp. 249–254.
[2] (4th Quart., 2021). APWG Phishing Activity Trends Report. [Online]. Available: https://docs.apwg.org/reports/apwg/_trends_report_q4_2021.pdf
[3] B. Liang, M. Su, W. You, W. Shi, and G. Yang, "Cracking classifiers for evasion: A case study on the Google's phishing pages filter," in Proc. Int. Conf. World Wide Web (WWW), Montréal, QC, Canada, 2016, pp. 345–356.
[4] Q. Cui, G. V. Jourdan, G. V. Bochmann, R. Couturier, and I. V. Onut, "Tracking phishing attacks over time," in Proc. 26th Int. Conf. World Wide Web (WWW), Perth, WA, Australia, 2017, pp. 667–676.
[5] H. Y. Abutair and A. Belghith, "Using case-based reasoning for phishing detection," Proc. Comput. Sci., vol. 109, pp. 281–288, Jan. 2017.
[6] M. Al-Janabi, E. D. Quincey, and P. Andras, "Using supervised machine learning algorithms to detect suspicious URLs in online social networks," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, Sydney, NSW, Australia, Jul. 2017, pp. 1104–1111.
[7] A. Blum, B. Wardman, T. Solorio, and G. Warner, "Lexical feature based phishing URL detection using online learning," in Proc. 3rd ACM Workshop Artif. Intell. Secur., Chicago, IL, USA, Oct. 2010, pp. 54–60.
[8] A. C. Bahnsen, E. C. Bohorquez, S. Villegas, J. Vargas, and F. A. González, "Classifying phishing URLs using recurrent neural networks," in Proc. APWG Symp. Electron. Crime Res. (eCrime), Scottsdale, AZ, USA, Apr. 2017, pp. 1–8.
[9] R. Aravindhan, R. Shanmugalakshmi, K. Ramya, and C. Selvan, "Certain investigation on web application security: Phishing detection and phishing target discovery," in Proc. 3rd Int. Conf. Adv. Comput. Commun. Syst. (ICACCS), Coimbatore, India, Jan. 2016, pp. 1–10.
[10] X. Xiao, D. Zhang, G. Hu, Y. Jiang, and S. Xia, "CNN–MHSA: A convolutional neural network and multi-head self-attention combined approach for detecting phishing websites," Neural Netw., vol. 125, pp. 303–312, May 2020.
[11] W. Wang, F. Zhang, X. Luo, and S. Zhang, "PDRCNN: Precise phishing detection with recurrent convolutional neural networks," Secur. Commun. Netw., vol. 2019, Oct. 2019, Art. no. 2595794, doi: 10.1155/2019/2595794.
[12] H. Zuhair and A. Selamat, "Phishing hybrid feature-based classifier by using recursive features subset selection and machine learning algorithms," in Proc. 3rd Int. Conf. Reliable Inf. Commun. Technol. (IRICT). Springer, 2018, pp. 267–277.
[13] G. Ramesh, J. Gupta, and P. G. Gamya, "Identification of phishing webpages and its target domains by analyzing the feign relationship," J. Inf. Secur. Appl., vol. 35, pp. 75–84, Aug. 2017, doi: 10.1016/j.jisa.2017.06.001.
[14] M. Cova, C. Kruegel, and G. Vigna, "There is no free phish: An analysis of 'free' and live phishing kits," in Proc. WOOT, Jul. 2008, pp. 1–8.
[15] L. Wenyin, G. Liu, B. Qiu, and X. Quan, "Antiphishing through phishing target discovery," IEEE Internet Comput., vol. 16, no. 2, pp. 52–61, Mar. 2012.
[16] Y. Zhang, J. I. Hong, and L. F. Cranor, "Cantina: A content-based approach to detecting phishing web sites," in Proc. 16th Int. Conf. World Wide Web, Banff, AB, Canada, May 2007, pp. 639–648.
[17] S. Marchal, J. François, R. State, and T. Engel, "PhishStorm: Detecting phishing with streaming analytics," IEEE Trans. Netw. Service Manage., vol. 11, no. 4, pp. 458–471, Dec. 2014.
[18] R. M. Mohammad, F. Thabtah, and L. McCluskey, "Predicting phishing websites based on self-structuring neural network," Neural Comput. Appl., vol. 25, no. 2, pp. 443–458, Aug. 2014.
[19] L. A. Tuan Nguyen, B. L. To, H. K. Nguyen, and M. H. Nguyen, "An efficient approach for phishing detection using single-layer neural network," in Proc. Int. Conf. Adv. Technol. Commun. (ATC), Hanoi, Vietnam, Oct. 2014, pp. 435–440.
[20] J. Zhang and X. Li, "Phishing detection method based on borderline-smote deep belief network," in Security, Privacy, and Anonymity in Computation, Communication, and Storage (SpaCCS) (Lecture Notes in Computer Science), vol. 10658, G. Wang, M. Atiquzzaman, Z. Yan, and K. K. Choo Eds. Cham, Switzerland: Springer, 2017, pp. 45–53.
[21] R. Verma and A. Das, "What's in a URL: Fast feature extraction and malicious URL detection," in Proc. 3rd ACM Int. Workshop Secur. Privacy Anal., Scottsdale, Arizona, USA, Mar. 2017, pp. 55–63.
[22] P. Yang, G. Zhao, and P. Zeng, "Phishing website detection based on multidimensional features driven by deep learning," IEEE Access, vol. 7, pp. 15196–15209, 2019.
[23] H. Sun, Z. Liu, S. Wang, and H. Wang, "Adaptive attention-based graph representation learning to detect phishing accounts on the Ethereum blockchain," IEEE Trans. Netw. Sci. Eng., vol. 11, no. 3, pp. 2963–2975, May 2024, doi: 10.1109/tnse.2024.3355089.
[24] M. W. Shaukat, R. Amin, M. M. A. Muslam, A. H. Alshehri, and J. Xie, "A hybrid approach for alluring ads phishing attack detection using machine learning," Sensors, vol. 23, no. 19, p. 8070, Sep. 2023.
[25] S. Asiri, Y. Xiao, S. Alzahrani, S. Li, and T. Li, "A survey of intelligent detection designs of HTML URL phishing attacks," IEEE Access, vol. 11, pp. 6421–6443, 2023.
[26] M. Sameen, K. Han, and S. O. Hwang, "PhishHaven—An efficient real-time AI phishing URLs detection system," IEEE Access, vol. 8, pp. 83425–83443, 2020.
[27] S. He, B. Li, H. Peng, J. Xin, and E. Zhang, "An effective cost-sensitive XGBoost method for malicious URLs detection in imbalanced dataset," IEEE Access, vol. 9, pp. 93089–93096, 2021.

[28] X. Xiao, Z. Wang, Q. Li, S. Xia, and Y. Jiang, "Back-propagation neural network on Markov chains from system call sequences: A new approach for detecting Android malware with system call sequences," IET Inf. Secur., vol. 11, no. 1, pp. 8–15, Jan. 2017.
[29] D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis, "We.B: The web of short URLs," in Proc. 20th Int. Conf. World Wide Web, Mar. 2011, pp. 715–724.
[30] N. Ketkar and J. Moolayil, "Convolutional neural networks," in Deep Learning with Python: Learn Best Practices of Deep Learning Models With PyTorch, 2021, pp. 197–242.
[31] D. Ciregan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2012, pp. 3642–3649.
[32] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," Sep. 2014, arXiv:1409.0473.
[33] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, "Recurrent models of visual attention," in Proc. 27th Int. Conf. Neural Inf. Process. Syst. (NIPS), Montreal, QC, Canada, 2014, pp. 2204–2212.
[34] M. T. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," Aug. 2015, arXiv:1508.04025.
[35] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. 31st Conf. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 1–11.
[36] T. Berners-Lee, L. Masinter, and M. McCahill, Uniform Resource Locators (URL), document RFC 1738, 1994.
[37] Y. Kim, "Convolutional neural networks for sentence classification," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), Doha, Qatar, 2014, pp. 1746–1751.
[38] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[39] M. J. Pillai, S. Remya, V. Devika, S. Ramasubbareddy, and Y. Cho, "Evasion attacks and defense mechanisms for machine learning-based web phishing classifiers," IEEE Access, vol. 12, pp. 19375–19387, 2024.
[40] S. Remya, M. J. Pillai, C. Arjun, S. Ramasubbareddy, and Y. Cho, "Enhancing security in LLNs using a hybrid trust-based intrusion detection system for RPL," IEEE Access, vol. 12, pp. 58836–58850, 2024, doi: 10.1109/access.2024.3391918.

S. REMYA received the Ph.D. degree in computer science and engineering from Vellore Institute of Technology, Vellore Campus. She is currently an Assistant Professor with the Department of Computer Science and Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri Campus, Kollam, Kerala, India. Her research interests include deep learning, data science, computer vision, security, and smart environments.

MANU J. PILLAI received the Ph.D. degree in computer science and engineering from the National Institute of Technology, Calicut. He is currently an Associate Professor with the Department of Computer Science and Engineering, TKM College of Engineering, Kollam, Kerala, India. His research interests include wireless networks, deep learning, and smart environments.

KAJAL K. NAIR received the B.Tech. degree from Kerala Technological University (KTU) and the M.Tech. degree from the TKM College of Engineering, Kollam, Kerala, where she demonstrated outstanding academic performance. Her research interest includes cybersecurity, with a specific focus on identifying phishing attacks.

SOMULA RAMA SUBBAREDDY received the Ph.D. degree in computer science and engineering from VIT University, Vellore, India, in 2022. He was a Postdoctoral Researcher with the Department of Information and Communication, Sunchon National University, South Korea, in 2024. He is currently an Assistant Professor with the Department of Information Technology, Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad. He has more than 40 publications in reputed journals and conferences. His research interests include mobile cloud computing, the IoT, machine learning, and edge computing.

YONG YUN CHO received the Ph.D. degree in computer engineering from Soongsil University. He is currently a Professor with the Department of Information and Communication Engineering, Sunchon National University. His main research interests include system software, embedded software, and ubiquitous computing.

No ratings yet
Detection of Phishing Websites Using An Efficient Feature
11 pages
Review Paper
No ratings yet
Review Paper
8 pages
DEPHIDES Deep Learning Based Phishing Detection System
No ratings yet
DEPHIDES Deep Learning Based Phishing Detection System
19 pages
valar doc
No ratings yet
valar doc
60 pages
Content Pages CPE
No ratings yet
Content Pages CPE
79 pages
ASRP-116 Camera Ready
No ratings yet
ASRP-116 Camera Ready
13 pages
Detection of Url Based Phishing Attacks Using Machine Learning IJERTV8IS110269
No ratings yet
Detection of Url Based Phishing Attacks Using Machine Learning IJERTV8IS110269
8 pages
Our Paper
No ratings yet
Our Paper
8 pages
Detection of Phishing Websites Using An Efficient Feature-Based Machine Learning Framework
No ratings yet
Detection of Phishing Websites Using An Efficient Feature-Based Machine Learning Framework
23 pages
Hacking Essentials - The Beginner's Guide To Ethical Hacking And Penetration Testing
From Everand
Hacking Essentials - The Beginner's Guide To Ethical Hacking And Penetration Testing
Adidas Wilson
3/5 (1)
Advanced Penetration Testing with Kali Linux: Unlocking industry-oriented VAPT tactics (English Edition)
From Everand
Advanced Penetration Testing with Kali Linux: Unlocking industry-oriented VAPT tactics (English Edition)
Ummed Meel
No ratings yet
YOLOv12 to Its Genesis A Decadal and Comprehensive Review of The You Only Look Once (YOLO) Series
No ratings yet
YOLOv12 to Its Genesis A Decadal and Comprehensive Review of The You Only Look Once (YOLO) Series
29 pages
ANN Unit 1
No ratings yet
ANN Unit 1
77 pages
A Review Paper On License Plate Recognition System
No ratings yet
A Review Paper On License Plate Recognition System
3 pages
Deepfake Video Detection System Using Deep Neural Networks
No ratings yet
Deepfake Video Detection System Using Deep Neural Networks
6 pages
Ai Model Question Paper-3
No ratings yet
Ai Model Question Paper-3
27 pages
Deep Learning
No ratings yet
Deep Learning
23 pages
272 539 1 PB
No ratings yet
272 539 1 PB
5 pages
COVID-19 Image Classification Using VGG-16 & CNN Based On CT Scans
No ratings yet
COVID-19 Image Classification Using VGG-16 & CNN Based On CT Scans
9 pages
Early Detection of Glaucoma Feature Visualization With A Deep Convolutional Network
No ratings yet
Early Detection of Glaucoma Feature Visualization With A Deep Convolutional Network
13 pages
1. Multi-Scale_Transformer_Pyramid_Networks_for_Multivariate_Time_Series_Forecasting
No ratings yet
1. Multi-Scale_Transformer_Pyramid_Networks_for_Multivariate_Time_Series_Forecasting
11 pages
It C Synopsis
No ratings yet
It C Synopsis
11 pages
Face Mask Detection Using Machine Learning Technique
No ratings yet
Face Mask Detection Using Machine Learning Technique
10 pages
Stock Market Analysis With The Usage of Machine Learning and Deep Learning Algorithms
No ratings yet
Stock Market Analysis With The Usage of Machine Learning and Deep Learning Algorithms
9 pages
Good Radar Target Detection CNN 0001581
No ratings yet
Good Radar Target Detection CNN 0001581
5 pages
Analogy Between CNN and RNN Using MNIST Dataset: Prof. Rathi R Assistant Professor Sr. Grade 1
No ratings yet
Analogy Between CNN and RNN Using MNIST Dataset: Prof. Rathi R Assistant Professor Sr. Grade 1
21 pages
Ai & Reasoning
No ratings yet
Ai & Reasoning
12 pages
Fikir Setie Tezera
No ratings yet
Fikir Setie Tezera
68 pages
Advanced Spectral Classifiers For Hyperspectral Images A Review
No ratings yet
Advanced Spectral Classifiers For Hyperspectral Images A Review
25 pages
Data Augmentation On Plant Leaf Disease Image Dataset Using Image Manipulation and Deep Learning Techniques
No ratings yet
Data Augmentation On Plant Leaf Disease Image Dataset Using Image Manipulation and Deep Learning Techniques
6 pages
Optimizing Brain Tumor Identification With Fine - Tuned Pre-Trained CNN Models A Comparative Study of VGG16 and EfficientNetB4
No ratings yet
Optimizing Brain Tumor Identification With Fine - Tuned Pre-Trained CNN Models A Comparative Study of VGG16 and EfficientNetB4
5 pages
CodeChads A
No ratings yet
CodeChads A
4 pages
Full Stack Datasciece & Ai, Generative Ai, LLM Models
No ratings yet
Full Stack Datasciece & Ai, Generative Ai, LLM Models
26 pages
Food Spoilage Detection Using Convolutional Neural Networks and K Means Clustering
No ratings yet
Food Spoilage Detection Using Convolutional Neural Networks and K Means Clustering
7 pages
DL Question Bank Answers
No ratings yet
DL Question Bank Answers
55 pages
Ting 2019
No ratings yet
Ting 2019
28 pages
Machine Learning by Joerg Kienitz
No ratings yet
Machine Learning by Joerg Kienitz
5 pages
16 Comparison of Data Science Algorithms
No ratings yet
16 Comparison of Data Science Algorithms
13 pages
A RISC-V Matrix Multiplier Using Systolic Arrays
No ratings yet
A RISC-V Matrix Multiplier Using Systolic Arrays
41 pages
2022 Ijesdf-75299 PPV
No ratings yet
2022 Ijesdf-75299 PPV
28 pages

INDEX TERMS Phishing, URL detection, residual pipelining, cybersecurity, classification.

I. INTRODUCTION

suffer severe consequences [1]. Realistically, detecting and

VOLUME 12, 2024 79369

TABLE 1. Literature review of phishing detection methods.

TABLE 1. (Continued.) Literature review of phishing detection methods.

FIGURE 1. Architecture of the proposed model.

TABLE 2. Features extracted from URLs.

FIGURE 3. Residual pipeline.

Convolution operation performs computation in just a

TABLE 3. Result of output block.

TABLE 4. Sample dataset.

TABLE 5. Comparison of phishing URL detection models.

The Accuracy measure in the metrics above represents the

FIGURE 5. ROC curve.

FIGURE 6. Precision-recall curve.

TABLE 6. Classifier performance on textual content features.
