Improved Detection of Phishing Websites Using Machine Learning 11-6-2024
Keywords: Website Phishing Detection; Machine Learning; Cybersecurity; Support Vector Machine; Decision Tree; Artificial Neural
Networks
1 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
Email: [email protected], [email protected]
2 Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
Email: [email protected]
3 Department of Information Systems and Technology, Faculty of Graduate studies for Statistical Research, Cairo University, Egypt
International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2024, 12(21s), 4619–4633 | 4619
adapt to the ever-evolving landscape of phishing attacks. This requires leveraging advanced technologies that can learn from past incidents and improve their detection capabilities over time. Machine learning offers a promising approach to address this challenge. By analyzing large datasets of phishing and legitimate websites, machine learning models can identify patterns and features that distinguish malicious sites from safe ones. These models
can then be trained to detect phishing attempts with high accuracy, even as attackers modify their tactics.

In this paper, we propose a comprehensive approach to phishing detection using machine learning techniques. We employ a variety of machine learning models, including Decision Tree, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forest (RF), to assess their effectiveness in identifying phishing websites. Each model has its strengths and weaknesses, and by evaluating them rigorously, we aim to determine the most effective approach for phishing detection. The dataset used for this paper is primarily sourced from PhishTank.org, a widely recognized repository for phishing URLs. This dataset provides a real-world context for our models, ensuring they are trained and tested against a representative sample of phishing threats. To prepare the dataset for analysis, we implemented several preprocessing steps, including artifact removal, normalization, and handling data inconsistencies. These steps were crucial to refining the input data and enhancing the performance of our machine learning models. Our methodology involves splitting the dataset into training and testing subsets, allowing us to evaluate the models' performance on unseen data. We then apply various metrics, such as accuracy, precision, recall, and F1-score, to measure each model's effectiveness in detecting phishing attempts. By comparing these metrics, we can identify the strengths and limitations of each model and determine the best approach for real-world phishing detection.

This paper is organized to provide a comprehensive understanding of our approach to improving phishing detection using advanced machine learning techniques. The structure is designed to guide the reader through the problem background, methodology, results, and conclusions systematically. The Introduction section introduces the problem of phishing attacks, providing context on their significance and the challenges they pose to cybersecurity. It defines the problem, outlines our proposed solution using machine learning models, and explains the organization of the paper. The Literature Review surveys existing methodologies and approaches to phishing detection, emphasizing the advancements and limitations of current techniques. It provides a critical review of previous research, setting the stage for our study by highlighting the need for improved detection methods. The Research Methodology details our methodological approach to phishing detection. It covers the data collection process, describing how we sourced our dataset from PhishTank.org. It also explains the preprocessing steps taken to clean and standardize the data, ensuring it is suitable for machine learning analysis. Additionally, it outlines the development and training of various machine learning models, including Decision Tree, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forest (RF). Finally, this section describes the evaluation metrics used to assess the performance of these models. The Results and Discussion section presents the findings of our study, comparing the performance of different machine learning models. It discusses the implications of these results, highlighting the most effective models for phishing detection and identifying areas for improvement. The Conclusion summarizes the key insights gained from our research, emphasizing the potential of machine learning in enhancing phishing detection. It also outlines directions for future research, suggesting ways to further improve the robustness and accuracy of phishing detection systems. This structured approach ensures that the reader can follow our research journey from identifying the problem to proposing a solution, evaluating its effectiveness, and considering future improvements.

2. BACKGROUND

Phishing attacks, characterized by deceptive practices where attackers impersonate legitimate entities to steal sensitive data, pose serious threats across various digital platforms. These platforms range from emails and social media to malicious websites designed to capture personal and financial information. The ramifications of phishing attacks are extensive, affecting not just individual victims but also large organizations by compromising data integrity, financial security, and overall reputation.

These cyber-threats continue to evolve in complexity, often outpacing the capabilities of traditional cybersecurity measures. Phishing schemes have become increasingly sophisticated, using advanced tactics like spear phishing, whaling, and pharming that require more than basic filters and rule-based detection systems. The dynamic nature of phishing attacks, combined with their ability to adapt and mimic legitimate user interfaces and communication, makes them particularly challenging to detect and mitigate.

2.1. Research Questions

• RQ1: How can machine learning algorithms be optimized to accurately differentiate between phishing and legitimate websites based on URL characteristics and content analysis?
• RQ2: What role do evolving phishing techniques play in the development of machine learning models for phishing website detection?
• RQ3: Can machine learning models be trained to predict the emergence of new phishing websites before they become active threats?
• RQ4: How effective are the selected machine learning techniques in detecting complex phishing websites compared to other traditional cybersecurity methods?
• RQ5: What challenges do machine learning models face in real-time phishing website detection, and how
can these challenges be addressed to improve detection rates?

Research Question 1 (RQ1) probes into the optimization of machine learning algorithms to discern between phishing and legitimate websites effectively. This exploration is critical for pinpointing which algorithms, when applied to URL characteristics and content analysis, yield the highest accuracy in phishing detection. It aims to unearth the balance between algorithm complexity and detection precision, ensuring that the chosen models are both efficient and scalable for practical cybersecurity applications. Research Question 2 (RQ2), in turn, delves into the impact of evolving phishing techniques on the development of these machine learning models. It emphasizes the necessity for adaptive models that can not only recognize current phishing patterns but also learn from emerging threats. This question is pivotal in constructing a dynamic defense mechanism that stays ahead of cybercriminals' continually evolving tactics.

Meanwhile, Research Question 3 (RQ3) expands the horizon by questioning the predictive power of machine learning models against the inception of new phishing sites. It investigates the potential for these models to act not just reactively but proactively, identifying likely phishing threats before they materialize into active attacks. This forward-looking approach could revolutionize phishing defense strategies, shifting from a stance of response to one of anticipation.

Research Question 4 (RQ4) explores the effectiveness of selected machine learning techniques in detecting complex phishing websites and compares these with traditional cybersecurity methods. This inquiry is pivotal in assessing how advanced ML algorithms measure up against conventional security measures in identifying sophisticated phishing threats. The goal is to ascertain if the detailed analysis facilitated by these ML models leads to a significant improvement in detection rates in real-world scenarios, offering a more potent defense against the evolving landscape of phishing attacks.

Lastly, Research Question 5 (RQ5) confronts the practical challenges machine learning models face in real-time phishing detection. It aims to unravel the barriers to implementing these models in live environments, where phishing websites must be identified and neutralized swiftly to prevent harm. Addressing these challenges is vital for enhancing the real-time operational efficiency of phishing detection systems, ensuring they can provide immediate protection against phishing threats as they arise.

Collectively, these research questions forge a comprehensive inquiry into leveraging machine learning for phishing website detection. They address the spectrum from theoretical optimization of algorithms and features to practical challenges of real-world application, laying the groundwork for significant advancements in cybersecurity defenses against phishing.

2.2. Paper Contributions

• Develop a machine learning-based framework that can efficiently and accurately detect phishing websites.
• Evaluate and compare the effectiveness of various machine learning algorithms in identifying phishing activities.
• Enhance the adaptability and responsiveness of phishing detection systems to cope with the continually evolving tactics used by cybercriminals.
• Integrate the proposed machine learning detection system into existing cybersecurity frameworks to improve real-time detection capabilities and reduce the incidence of phishing attacks.

3. LITERATURE REVIEW

The continuous evolution of cyber threats, especially phishing attacks, underscores the urgent need for effective detection methods. Phishing attacks, deceptive in nature, aim to trick users into divulging sensitive information by masquerading as legitimate entities. The surge in such threats has propelled the adoption of machine learning (ML) and deep learning as forefront technologies in identifying and neutralizing these risks. The primary purpose of these technologies is to augment the accuracy and speed of phishing detection, thereby ensuring a more secure digital environment for users [1]. A cornerstone of this research is the use of consistent datasets like PhishTank, renowned for its comprehensive compilation of verified phishing URLs alongside legitimate websites. This dataset enables researchers to benchmark and compare the efficacy of various machine learning models accurately. For instance, A. K. Dutta (2021) leveraged this dataset to explore the potential of Random Forest and Support Vector Machine (SVM) classifiers in phishing detection, achieving a remarkable accuracy of 95.7% with the Random Forest model. This result underscores the model's adeptness at discerning between phishing and legitimate content, benefiting from the ensemble method's inherent capability to minimize variance and bias [1]. Jain A.K. & Gupta B.B. (2018) also tapped into the rich resource of the UCI Machine Learning Repository's Phishing Websites dataset to develop "PHISH-SAFE." Utilizing a Decision Tree classifier, they managed to detect phishing URLs with an accuracy of 92.3%. The simplicity of Decision Trees, combined with their interpretability, makes them invaluable for rapid assessments and decisions in phishing detection scenarios [2]. Exploring further, Purbay M. & Kumar D. (2021)
evaluated the efficacy of SVM against other supervised algorithms like Naïve Bayes and K-Nearest Neighbors (KNN). Their study highlighted SVM's superiority, achieving an accuracy of 93.8%. The model's success stems from its capacity to effectively manage the high-dimensional spaces characteristic of phishing data, thereby enhancing detection [3]. In a different vein, Gandotra E. & Gupta D. (2021) employed Gradient Boosting on the same dataset, attaining an accuracy of 94.5%. This study illuminated the power of boosting techniques in phishing detection by iteratively refining models to correct previous errors, thereby progressively improving accuracy [4]. Hung Le et al. (2017) took a deep learning approach with Convolutional Neural Networks (CNN) in their "URLNet" system. Applied to a dataset combining PhishTank and Alexa's top websites, URLNet achieved an F1-score of 97.2%, highlighting CNNs' ability to autonomously extract complex features from URLs. This capability is critical for learning the intricate patterns embedded in URLs, making CNN a powerful tool in phishing detection. However, its reliance on significant computational resources and a robust training regime is a consideration for its deployment [5].

Integrating lexical features and block-listed domains into phishing detection, Hong J. et al. aimed to refine the detection process, achieving an accuracy of 91%. This integrated approach leveraged machine learning models to enhance traditional block-listing methods, offering a dynamic response to evolving phishing threats. However, this method's effectiveness is less pronounced against completely new or previously unseen phishing sites [6]. J. Kumar et al. (2020) reaffirmed the effectiveness of the Random Forest classifier, achieving 96% accuracy on the UCI dataset. The model's ability to manage large and diverse datasets without significant overfitting is a testament to its utility in phishing detection. It underscores the importance of feature diversity and the classifier's capacity to manage various indicators of phishing [7]. Aljofey A. et al. (2020) explored the use of a character-level convolutional neural network model, reaching an impressive F1 score of 98% on a mix of PhishTank and DMOZ datasets. This approach, particularly potent at the character level, was effective in identifying subtle anomalies in URLs. This achievement highlights the potential of neural networks to detect sophisticated phishing attempts that might elude simpler detection systems [10]. AlEroud A. & Karabatis G. (2020) investigated the application of generative adversarial networks for refining phishing detection, reaching an accuracy of 94%. This novel approach proved that generative models could simulate and learn from adversarial attacks, thus enhancing the resilience of phishing detection systems [11]. The sample sizes and diversity in these studies are pivotal for generalizing the findings. Studies using larger and more varied samples enable the detection of nuanced phishing tactics. This is clear in the works of researchers like Aljofey et al., who, by using mixed datasets, could discern complex phishing behaviors, a crucial step in developing effective countermeasures [10]. The results across these studies consistently highlight that machine learning and deep learning significantly enhance phishing detection. The adaptability of these models to new threats, coupled with their ability to process vast amounts of data, positions them as essential tools in the ongoing fight against cybercrime. However, there is room for further exploration and integration of these methodologies to keep pace with the rapidly evolving landscape of phishing and other cyber threats.

4. RESEARCH METHODOLOGY

The methodology adopted for this paper is designed to explore the efficacy of machine learning (ML) algorithms in detecting phishing websites, which is essential for the advancement of cybersecurity measures. This section elaborates on the systematic approaches used in data collection, feature selection and engineering, model development, and the evaluation frameworks implemented to measure the performance and reliability of the proposed models.

4.1. Data Collection

The dataset used in this study, which is essential for detecting phishing sites, was compiled from two main sources. Initially, much of the data was downloaded from PhishTank.org, a reputable source known for its comprehensive and regularly updated repository of verified phishing URLs. Additionally, to enrich the dataset and ensure a broad representation of phishing characteristics, we combined data from the final dataset used in the study by A. K. Dutta [1], which includes both phishing URLs and legitimate website URLs.

The dataset includes a total of 10,000 instances, evenly divided with 5,000 instances classified as phishing and 5,000 instances classified as non-phishing. This balanced approach allows for a fair comparison between models and helps to prevent any bias that might arise from uneven class distribution. Each instance in the dataset is characterized by features that are critical for distinguishing phishing sites from legitimate ones. Table 1 shows features such as the URL structure, domain attributes, and the use of secure protocols, among others.
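As an illustration of the kinds of features mentioned above, the following sketch derives a few lexical attributes from a raw URL using only the Python standard library. The feature names and the set of special symbols are assumptions chosen for illustration; they are not the exact feature set of Table 1.

```python
from urllib.parse import urlparse

# Hypothetical symbol set; the paper does not list which symbols it counts.
SPECIAL_SYMBOLS = "@-_~%"


def extract_url_features(url: str) -> dict:
    """Return a small dictionary of lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Overall length of the URL string.
        "url_length": len(url),
        # 1 if the scheme is HTTPS, else 0.
        "uses_https": int(parsed.scheme == "https"),
        # Number of dots in the host part (many subdomains or a raw IP).
        "num_dots_in_host": host.count("."),
        # Total occurrences of the assumed special symbols.
        "num_special_symbols": sum(url.count(ch) for ch in SPECIAL_SYMBOLS),
        # 1 if an '@' appears anywhere in the URL, else 0.
        "has_at_symbol": int("@" in url),
    }


if __name__ == "__main__":
    for example in (
        "https://example.com/login",
        "http://secure-login.example.com@203.0.113.7/verify",
    ):
        print(example, extract_url_features(example))
```

Each dictionary can then serve as one row of a feature table. Domain attributes such as registration age would require an external lookup and are left out of this sketch.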
Table 1. Features Selection
4.1.1. Division of the Dataset

To train and evaluate the machine learning models effectively, the dataset was partitioned into two subsets:

• Training Set: 80% of the dataset, or 8,000 instances, was allocated for training the models. This subset includes 4,000 phishing and 4,000 non-phishing instances. The training set is crucial for the models to learn the distinguishing features of phishing and legitimate websites.
• Test Set: The remaining 20%, consisting of 2,000 instances (1,000 phishing and 1,000 non-phishing), formed the test set. This division ensures that the models are evaluated on data they have not seen during training, providing a measure of their generalization capability and accuracy in real-world scenarios.

To ensure our machine learning models are both trained and tested under realistic conditions, we divided the dataset into two segments:

• Training Set: Including 80% of the total instances (8,844 instances), this segment is used to train the models. It includes a mix of phishing and legitimate labels, providing the models with ample examples to learn from and adapt to various
phishing tactics and legitimate behaviors.
• Test Set: The remaining 20% of the data (2,211 instances) forms the test set. This segment is crucial for evaluating the trained models against unseen data, assessing their generalization capabilities, and ensuring they keep high accuracy and reliability when deployed in real-world scenarios.

This structured approach to data collection and division is foundational to enhancing the accuracy and reliability of our phishing detection system. By training our models on a dataset that closely mirrors the complex dynamics of real-world web interactions, we ensure that our system is prepared to effectively combat the ever-evolving landscape of cyber threats.

4.2. Data Preprocessing

Since the dataset encompasses a variety of URL structures, domain information, and textual content, each with its peculiarities, the data preprocessing step in this paper was crucial. To address such variances, a normalization method was employed as described in Eq. (1), transforming the numerical features to a common scale without distorting differences in the ranges of values. This was achieved by standardizing each feature value using the following formula:

$$ X_{\mathrm{std}} = \frac{X - \mu}{\sigma} \qquad (1) $$

Here, $X_{\mathrm{std}}$ represents the standardized value, $X$ is the original value, $\mu$ is the mean of the feature values, and $\sigma$ is the standard deviation of those values. This transformation ensures that each feature contributes equally to the model, thereby improving the learning efficiency and stability of the machine learning algorithms.

Furthermore, to support the analysis and model training, categorical attributes were converted into a numerical format through label encoding and, where necessary, one-hot encoding. This ensured that models could interpret the data correctly without being misled by non-numerical values.

To address variations in categorical data and enhance model interpretability, the categorical features were processed using the approach outlined in Eq. (2). The value transformation for each categorical feature was performed by mapping each unique category to a distinct integer value, normalizing the categorical diversity across the dataset.

$$ \mathrm{Category}_{\mathrm{encoded}} = \operatorname{index}(\mathrm{category}) \qquad (2) $$

Each preprocessing step, from feature standardization to categorical encoding, was designed to improve the dataset's structure, facilitating more accurate and efficient phishing site detection by the machine learning models.

4.3. Model Development and Evaluation

To address the research objectives, multiple ML models were developed and rigorously evaluated. Each model was chosen based on its proven track record in classification tasks, particularly in the domain of cybersecurity.

4.3.1. Machine Learning Models

• Decision Trees are fundamental to the field of machine learning, known for their straightforward and transparent approach to classification and regression tasks. These models operate by creating a tree-like structure where each node represents a feature of the dataset, and branches denote the decision rules leading to different outcomes. The simplicity of Decision Trees lies in their ability to break down complex decision-making processes into a series of simpler, binary choices, making the model's decisions easy to interpret and explain. This characteristic is particularly advantageous in phishing detection, as it allows security analysts to understand and trace the reasoning behind each classification. Moreover, Decision Trees can manage both numerical and categorical data, making them versatile for various types of input features commonly encountered in phishing datasets.

• Support Vector Machines (SVM): Support Vector Machines are powerful, supervised learning models used for classification and regression tasks. SVMs are particularly noted for their ability to create optimal hyperplanes in a multidimensional space that distinctly classify the data points. This capability is crucial in phishing detection, where the distinction between phishing and legitimate websites often lies in subtle and high-dimensional differences in features. SVMs are robust against overfitting, especially in high-dimensional spaces, due to their regularization parameter, which helps maintain the generalizability of the model. Their effectiveness in dealing with non-linear boundaries, thanks to the kernel trick, allows them to adapt to the complex and evolving nature of phishing attacks.

• Neural Networks: Neural Networks represent a more advanced tier of machine learning models, inspired by the neural structure of the human brain. Comprising layers of interconnected nodes or "neurons," these networks can model highly complex, non-linear relationships in data. The depth and flexibility of Neural Networks make them exceptionally suited for phishing detection, where attackers constantly innovate and vary their techniques. The layered architecture allows Neural Networks to learn from a vast amount of data and recognize intricate patterns that simpler models might miss. This capability is
pivotal in identifying sophisticated phishing schemes that employ advanced cloaking, scripting, and social engineering tactics.

• Random Forest Classifier: The Random Forest Classifier extends the concept of Decision Trees into a more powerful ensemble method that combines multiple trees to improve the predictive performance and reduce the risk of overfitting. Each tree in a Random Forest works on a random subset of features and data points, leading to a diverse set of classifiers whose results are aggregated to produce a final decision. This diversity makes Random Forests particularly effective in phishing detection, as they can capture a wide array of indicators of malicious behavior without being overly sensitive to noise and outliers in the data. The ensemble approach also means that Random Forests are less likely to be swayed by deceptive techniques used by phishing attacks, providing a robust defense against a variety of phishing tactics.

4.3.2. Key Metrics for Assessing Machine Learning Models

The models were evaluated using a suite of metrics to assess their predictive accuracy and generalizability:

Accuracy: This essential metric gauges the model's overall correctness across all classes. It is the ratio of correctly predicted instances to the entire set of instances within the dataset, formalized as shown in Eq. (3):

$$ \mathrm{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \qquad (3) $$

Precision: This metric elucidates the model's capability in accurately predicting positive (phishing) instances. It captures the proportion of true positives among all positive predictions, as delineated in Eq. (4):

$$ \mathrm{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \qquad (4) $$

ROC Curves: The Receiver Operating Characteristic (ROC) curves graphically portray the diagnostic ability of binary classifiers, a cornerstone in phishing detection to balance the trade-offs between true positive rates and false positive rates. The Area Under the Curve (AUC) metric provides a measure of the model's discernment between positive and negative classes, as depicted in Eq. (7), where $t$ denotes the false positive rate:

$$ \mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}(t)\,dt \qquad (7) $$

The comprehensive application of these methodologies aims not only to validate the effectiveness of the ML models in distinguishing between phishing and legitimate websites but also to explore their potential integration into broader cybersecurity frameworks, offering advancements in preemptive cyber defense mechanisms.

5. RESULTS

In this study, we aimed to evaluate the effectiveness of various machine learning models in detecting phishing websites using a comprehensive dataset derived from verified sources. The dataset was preprocessed using label encoding to transform categorical features into a format suitable for model input. This preprocessing step was crucial for facilitating the application of machine learning algorithms on the data.

5.1. Analysis Performance of Models

5.1.1. Visualizing the data:

Before applying machine learning techniques, we conducted an initial analysis to understand the distribution of the various features within our dataset. Each feature's histogram was generated to visualize its distinct patterns and the presence of potential outliers. These visualizations, shown in Fig. 1, demonstrate significant differences in the distributions and ranges of features such as URL length, HTTPS domain presence, and the use of special symbols in URLs.
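Features such as URL length and a binary HTTPS flag live on very different scales, which is the motivation for the standardization of Eq. (1) and the integer encoding of Eq. (2). The following is a minimal sketch of both steps in plain Python. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the paper does not say which estimator was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Apply Eq. (1): (X - mu) / sigma to every value of one feature."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def label_encode(categories):
    """Apply Eq. (2): map each unique category to a distinct integer.

    Integers are assigned in order of first appearance.
    Returns the encoded list and the category-to-index mapping.
    """
    index = {}
    encoded = []
    for category in categories:
        if category not in index:
            index[category] = len(index)
        encoded.append(index[category])
    return encoded, index


if __name__ == "__main__":
    url_lengths = [2, 4, 4, 4, 5, 5, 7, 9]  # toy values, not paper data
    print(standardize(url_lengths))
    print(label_encode(["http", "https", "http", "ftp"]))
```

After standardization each feature has zero mean and unit variance, so a long-tailed feature such as URL length no longer dominates a binary indicator simply because of its range.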
Fig. 4 Accuracy and Loss for the ANN Model.

5.1.5. Random Forest Classifier:

The Random Forest Classifier enhances the Decision Tree approach by integrating multiple trees to form an ensemble, which significantly improves the model's accuracy and robustness. By aggregating the results of individual trees, Random Forest reduces the risk of overfitting that single Decision Trees often face. This ensemble method is effective in handling diverse types of data and complex patterns, making it a strong choice for both classification and regression tasks. Table 6 details the performance of the Random Forest model. Fig. 5 shows the performance of the Random Forest model. The training accuracy is perfect at 100%, while the validation accuracy is an impressive 95.75%. This high accuracy on unseen data demonstrates the model's strong generalization capabilities. The training loss is minimal at 0.022, and the validation loss is 0.135.

Table 5. Summarized results for the Random Forest model

Classification         Value
Training Accuracy      100%
Validation Accuracy    95.75%
Training Loss          0.022 (log loss)
Validation Loss        0.135 (log loss)

Model           Training Accuracy   Validation Accuracy   Training Loss      Validation Loss
Decision Tree   100%                96.70%                ~0 (log loss)      1.14 (log loss)
SVM             84.85%              83.80%                0.348 (log loss)   0.369 (log loss)
ANN             88.95%              86.85%                0.257 (log loss)   0.305 (log loss)
Random Forest   100%                95.75%                0.022 (log loss)   0.135 (log loss)

The Confusion Matrix is a pivotal tool in machine learning, essential for evaluating the performance of classification models. It delineates the number of correct and incorrect predictions, enabling the identification of the types of errors a model makes. Our study employed four classification models: Decision Tree, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forest Classifier, each assessed using their respective confusion matrices as shown in Fig. 6.

The Decision Tree model exhibited exceptional performance with 992 true negatives (TN) and 942 true positives (TP), while producing only 20 false positives (FP) and 46 false negatives (FN). This indicates a high accuracy, with minimal misclassification.

Conversely, the SVM model showed a significant number of false negatives (296), although it achieved 984 TN and 692 TP, with 28 FP. This suggests that while SVM is effective in identifying negative samples, it struggles with correctly classifying positive instances.

The ANN model performed reasonably well, recording 973 TN and 770 TP. However, it had 39 FP and 218 FN, indicating a less effective performance than the Decision Tree and Random Forest models, driven mainly by missed phishing sites.

The Random Forest model demonstrated robustness similar to the Decision Tree, achieving 993 TN and 922 TP, along with 19 FP and 66 FN. This underscores its high accuracy and reliability in classification tasks.

Fig. 6 Confusion Matrices for Various Classification Models
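The confusion-matrix counts quoted above are enough to recompute the phishing-class metrics by hand. The sketch below does so in plain Python. Accuracy and precision follow Eqs. (3) and (4); the recall and F1 expressions are the standard textbook definitions, stated here as an assumption because their equations do not appear in this part of the text. The function name is hypothetical.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 for the positive class."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Standard definition (assumed): share of actual positives that are found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Standard definition (assumed): harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts as reported in the text for Fig. 6 (phishing = positive class).
CONFUSION = {
    "Decision Tree": {"tp": 942, "tn": 992, "fp": 20, "fn": 46},
    "SVM": {"tp": 692, "tn": 984, "fp": 28, "fn": 296},
    "ANN": {"tp": 770, "tn": 973, "fp": 39, "fn": 218},
    "Random Forest": {"tp": 922, "tn": 993, "fp": 19, "fn": 66},
}

if __name__ == "__main__":
    for model, counts in CONFUSION.items():
        metrics = classification_metrics(**counts)
        summary = ", ".join(f"{k}={v:.4f}" for k, v in metrics.items())
        print(f"{model}: {summary}")
```

For the Decision Tree counts this gives an accuracy of 0.967 and phishing-class precision, recall, and F1 of roughly 0.979, 0.953, and 0.966. The SVM's 296 false negatives translate into a phishing recall of about 0.70, which is the weakness the text describes.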
reducing false negatives. These insights highlight the efficacy of ensemble methods like Random Forest in achieving optimal classification performance, reinforcing their applicability in tasks requiring high precision and accuracy.

Table 7. Results of Performance Metrics for Decision Tree model
Decision Tree Model
Class | Accuracy | Precision | Recall | F1 Score
Phishing | 0.967 | 0.979 | 0.953 | 0.966
Normal | 0.967 | 0.953 | 0.979 | 0.966
Average | 0.967 | 0.966 | 0.966 | 0.966

SVM Model
Normal | 0.838 | 0.796 | 0.929 | 0.857
Average | 0.838 | 0.861 | 0.815 | 0.834

Table 9. Results of Performance Metrics for ANN model
ANN Model
Class | Accuracy | Precision | Recall | F1 Score
Phishing | 0.8715 | 0.952 | 0.779 | 0.857
Normal | 0.8715 | 0.832 | 0.952 | 0.889
Average | 0.8715 | 0.892 | 0.865 | 0.873

Random Forest Model
Class | Accuracy | Precision | Recall | F1 Score
Phishing | 0.9575 | 0.979 | 0.933 | 0.955
Normal | 0.9575 | 0.940 | 0.979 | 0.959
Average | 0.9575 | 0.959 | 0.956 | 0.957

In this section, we systematically examine the discriminatory capabilities and precision-recall balance of the Decision Tree, SVM, ANN, and Random Forest models in phishing detection. The ROC curves in Fig. 7 collectively display the trade-off between the true positive rate and false positive rate for each model, enabling a comparative analysis of their ability to distinguish between phishing and non-phishing instances across varied thresholds. Similarly, the Precision-Recall curves in Fig. 8 aggregate the models' precision and recall metrics, crucial for assessing performance in our imbalanced dataset context. This integrated approach facilitates a holistic view of the models' strengths and weaknesses, highlighting which models maintain high precision while maximizing recall, and provides a nuanced understanding of their overall effectiveness in differentiating and accurately predicting phishing activities.
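To illustrate how ROC and Precision-Recall curves of the kind discussed in this section are constructed, the sketch below sweeps a decision threshold over predicted scores and records the resulting rates. It is a simplified pure-Python illustration: the function names are hypothetical, and the labels and scores are made-up toy values, not data from this study.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def threshold_counts(y_true: Sequence[int], scores: Sequence[float]) -> Iterator[Tuple[int, int]]:
    """Yield cumulative (tp, fp) as the threshold drops past each distinct score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            yield tp, fp


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    return [(0.0, 0.0)] + [(fp / neg, tp / pos) for tp, fp in threshold_counts(y_true, scores)]


def pr_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Return (recall, precision) pairs, one per distinct threshold."""
    pos = sum(y_true)
    return [(tp / pos, tp / (tp + fp)) for tp, fp in threshold_counts(y_true, scores)]


def auc(points: Sequence[Point]) -> float:
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: 1 = phishing, 0 = legitimate; scores are hypothetical model outputs.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.30, 0.10]

roc = roc_points(y_true, scores)
pr = pr_points(y_true, scores)
print("ROC AUC:", auc(roc))
print("PR points (recall, precision):", pr)
```

A curve that rises quickly towards the top-left corner of the ROC plot, or stays near precision 1.0 as recall grows in the Precision-Recall plot, indicates a model that separates phishing from legitimate sites across many thresholds rather than at a single operating point.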
The results of this study were promising: the Decision Tree model showed the highest validation accuracy at 96.7%, followed by the Random Forest model at 95.75%. These results confirm that both models can detect phishing sites effectively. The ANN model, despite the challenges of overfitting, highlighted the potential of deep learning in this area, suggesting that with further fine-tuning and regularization it could provide stronger detection capabilities. The SVM model's accuracy of 83.8% was the lowest of the four and is not sufficient for reliable detection on its own. Nevertheless, its errors provide important insights into which types of phishing strategies require different or more precise detection methods.
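The overfitting concern mentioned above can be made concrete by comparing training and validation accuracy for each model. The sketch below is only an illustration: the accuracy pairs are the training and validation figures reported in this paper for three of the models (the unlabeled 100% / 95.75% pair is assumed to belong to Random Forest, and no ANN training accuracy is available in this section, so it is omitted), while the function name and the 3-point threshold are hypothetical choices made for the example.

```python
# (training accuracy, validation accuracy) in percent, as reported for three models.
reported = {
    "Decision Tree": (100.0, 96.70),
    "SVM": (84.85, 83.80),
    "Random Forest": (100.0, 95.75),  # assumed attribution, see the note above
}

# Hypothetical cut-off in percentage points, chosen only for illustration.
GAP_THRESHOLD = 3.0


def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Training accuracy minus validation accuracy, in percentage points."""
    return train_acc - val_acc


# Models whose gap exceeds the illustrative threshold.
flagged = [
    model
    for model, (train_acc, val_acc) in reported.items()
    if generalization_gap(train_acc, val_acc) > GAP_THRESHOLD
]

for model, (train_acc, val_acc) in reported.items():
    gap = generalization_gap(train_acc, val_acc)
    note = "possible overfitting" if model in flagged else "small gap"
    print(f"{model}: gap = {gap:.2f} points ({note})")
```

A large gap does not by itself prove that a model generalizes poorly, but it is a cheap signal that regularization, pruning, or more validation data may be worth trying.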
Fig. 9 Precision-Recall Curves

Table 11 presents a comprehensive view of the key metrics used to assess the effectiveness of each model in phishing detection. The Decision Tree model demonstrates strong performance across all metrics, with an F1 score of 0.9662, indicating a good balance between precision and recall. This model also shows the highest accuracy at 0.967, making it highly effective in correctly identifying phishing attempts. The SVM model, while showing high precision (0.9611), struggles with recall (0.7004), leading to the lowest F1 score (0.8103) among the models. This suggests that while it is precise when it marks a site as phishing, it misses a significant number of actual phishing sites, which is a critical consideration in phishing detection.

The ANN model balances performance with an F1 score of 0.8570 and shows a good precision of 0.9518. However, its recall at 0.7794 and accuracy at 0.8715 indicate some missed phishing instances, suggesting room for improvement in model sensitivity. Lastly, the Random Forest model achieves robust overall metrics, with an F1 score of 0.9559 and the highest precision (0.9798). Its recall of 0.9332 and accuracy of 0.9575 make it highly competitive with the Decision Tree model, offering a strong alternative with consistent performance across various evaluation metrics.

Table 11. Summary of Performance Metrics for Each Model
Model | F1 Score | Precision | Recall | Accuracy
Decision Tree | 0.9662 | 0.9792 | 0.9534 | 0.967
SVM | 0.8103 | 0.9611 | 0.7004 | 0.838
ANN | 0.857 | 0.9518 | 0.7794 | 0.8715
Random Forest | 0.9559 | 0.9798 | 0.9332 | 0.9575
Average | 0.8974 | 0.9681 | 0.8416 | 0.909

6. Discussion
The results of this paper are an important step towards improving the detection of phishing sites using advanced machine learning techniques. The analysis of the different models' performance highlights both the effectiveness of the individual models and the challenges they face.

The Decision Tree model showed a high accuracy of 96.7% on the validation set, indicating its strong ability to distinguish between phishing and legitimate sites. However, its 100% training accuracy points to likely overfitting: the model learns the patterns of the training data very closely, which may reduce its ability to generalize to new data.

On the other hand, the Support Vector Machine (SVM) model provided a moderate accuracy of 83.8% on the validation set. This model combined high precision with low recall, which means it rarely flags legitimate sites as phishing but fails to detect a substantial share of phishing sites. This indicates the need to improve the model so that it covers a wider range of sophisticated phishing threats.

The Artificial Neural Network (ANN) achieved 87.15% accuracy on the validation set and showed challenges related to overfitting. While the results indicate the neural network's ability to learn from complex data, the model needs further tuning to reduce the gap between training and validation performance.

The Random Forest model showed excellent performance, with a validation accuracy of 95.75%. This model combines the predictive power of multiple decision trees, reducing the risk of overfitting and improving the accuracy of predictions, making it one of the best models for detecting phishing.

These results are consistent with previous papers that have confirmed the effectiveness of various machine learning models in detecting phishing. For example, in the study by A. K. Dutta (2021), an RNN model with LSTM was used and achieved an accuracy of 95.7%. However, this model requires substantial computational resources, and it may face difficulties in
handling data in real time. In the study of Jain A.K. & Gupta B.B. (2018), a decision tree model was used and achieved 92.3% accuracy; although this model is simple and interpretable, its ability to handle complex data is limited. The Purbay M. & Kumar D. (2021) study used an SVM model and achieved 93.8% accuracy, and although this model is effective in managing high-dimensional space, it may face challenges in handling multidimensional data. Gandotra E. & Gupta D. (2021) used Gradient Boosting and achieved 94.5% accuracy; this technique gradually corrects errors, improving model performance over successive iterations. Hung Le et al. (2017) used a CNN (URLNet) and achieved an F1 score of 97.2%, but this technique requires significant computational resources. Hong J. et al. used machine learning techniques with lexical features and achieved 91% accuracy, but these were less effective in dealing with new or unseen sites. J. Kumar et al. (2020) used a random forest model and achieved 96% accuracy, demonstrating the ability to manage large and diverse data without overfitting. Aljofey A. et al. (2020) used a character-level neural network and achieved an F1 score of 98%, as it was effective in detecting subtle anomalies in URLs. The AlEroud A. & Karabatis G. (2020) study used generative adversarial networks and achieved 94% accuracy, demonstrating that generative models can learn from adversarial attacks to improve detection. In contrast, the current study used several models (Decision Tree, SVM, ANN, and Random Forest): the Decision Tree model achieved an accuracy of 96.7%, the SVM model 83.8%, the ANN model 87.15%, and the Random Forest model 95.75%, as shown in Table 12.

These results demonstrate that each model has its own challenges, such as computational efficiency and overfitting in the ANN model, which calls for regularization techniques and a reduced number of parameters to improve generalization. In addition, computational efficiency is a major challenge, with models such as neural networks and random forest requiring significant computational resources, which may be an impediment in environments with limited resources. This comparison shows that the results of the current study are consistent with previous studies and emphasizes the importance of using advanced machine learning techniques in detecting phishing, with a focus on improving computational efficiency and adapting to new data to achieve better performance in the future.

It should be noted that the nature of phishing data is constantly changing, and the data used may not reflect all current phishing types. Therefore, it is important to use continuously updated datasets and apply continuous learning techniques to ensure that models can adapt to new threats quickly. This study opens doors for future research in several directions. Combining the strengths of multiple models, such as integrating a decision tree with SVM or ANN, can provide a more balanced and effective solution. Lightweight models that maintain high accuracy with computational efficiency can also be developed, using techniques such as model pruning or efficient architectures such as MobileNets.

Increasing data diversity by collaborating with security agencies and companies to obtain more comprehensive and up-to-date datasets reflecting the current phishing landscape is essential. Synthetic data generation techniques can be used to train models in varied and difficult scenarios. These results emphasize the potential of machine learning in improving phishing detection, but also highlight the need for continuous improvement and the adoption of new techniques to keep pace with the evolution of threats. Tackling phishing in real time requires models that can quickly adapt to new threats, making continuous learning or online learning techniques essential in this area.

Table 12. Comparative analysis of phishing detection studies
Study | Methodology | Accuracy | Limitations | Dataset
A. K. Dutta (2021) | RNN with LSTM | 95.70% | Requires significant computational resources; may struggle with real-time data | PhishTank
Jain A.K. & Gupta B.B. (2018) | Decision Tree | 92.30% | Limited complexity, interpretability | UCI Phishing Websites, PhishTank
Purbay M. & Kumar D. (2021) | SVM | 93.80% | High-dimensional space management | UCI Phishing Websites, PhishTank
Gandotra E. & Gupta D. (2021) | Gradient Boosting | 94.50% | Error correction, iterative refinement | UCI Phishing Websites, PhishTank
Hung Le et al. (2017) | CNN (URLNet) | F1: 97.2% | Significant computational resources | PhishTank, Alexa Top
Hong J. et al. | ML with lexical features | 91% | Less effective on new/unseen sites | PhishTank
J. Kumar et al. (2020) | Random Forest | 96% | Managing large, diverse datasets without overfitting | UCI Phishing Websites, PhishTank
Aljofey A. et al. (2020) | Character-level CNN | F1: 98% | Detecting subtle anomalies in URLs | PhishTank, DMOZ
AlEroud A. & Karabatis G. (2020) | Generative Adversarial Networks | 94% | Learning from adversarial attacks to enhance resilience | PhishTank
The proposed model | Decision Tree, SVM, ANN, Random Forest | Decision Tree: 96.7%, SVM: 83.8%, ANN: 87.15%, Random Forest: 95.75% | Model-specific challenges, computational efficiency, overfitting in ANN | PhishTank

7. Limitations and Future Research
Exploring machine learning techniques for detecting phishing sites, as presented in this study, has led to important insights into the strengths and weaknesses of different models. However, it is necessary to acknowledge the inherent limitations of our work and identify potential directions for future studies. Our study deployed a suite of machine learning models, including Decision Tree, SVM, ANN, and Random Forest, to evaluate their effectiveness in detecting phishing sites. While these models showed high accuracy, they also showed some limitations that need to be addressed. The main concern is overfitting, especially with ANN models. The tendency of artificial neural networks to overfit the training data can reduce their generalizability to new, unseen data. This limitation is critical in the context of phishing detection, as attackers are constantly evolving their strategies and models must adapt to new patterns of malicious behavior. Moreover, the computational efficiency of these models poses another challenge. The complexity and depth of models such as ANN and Random Forest can lead to significant computational requirements, especially when processing large datasets or operating in real-time environments. This requirement can hinder the deployment of these models in low-resource environments or applications where fast response time is critical.

Another limitation is the limited heterogeneity of the dataset used in our study. While we use a dataset from PhishTank, the phishing landscape is constantly changing, and the datasets may not be able to capture the full scope of phishing threats. This limitation can affect the models' ability to generalize to all types of phishing attacks, especially those that use new techniques or target specific demographics. Additionally, the study focuses on machine learning models without including deep learning methods such as convolutional neural networks (CNNs) or more advanced recurrent models to explore which techniques may be more effective in detecting phishing. Deep learning models have shown promise in capturing complex patterns in data but were not examined in this study due to their high computational requirements and complexity.

In future research, these limitations should be addressed to increase the robustness and applicability of phishing detection models. One avenue for future work is to explore hybrid models that combine the strengths of different machine learning techniques. For example, ensemble methods that combine decision trees and meta-models, or that use a combination of SVM and ANN, can compensate for weaknesses such as overfitting or computational inefficiency in individual models. Moreover, developing lightweight models that maintain high accuracy with computational efficiency is essential for real-time phishing detection. Techniques such as model pruning, quantization, or using efficient architectures such as mobile networks can be explored to reduce the computational burden without compromising detection performance.

Developing the datasets used in phishing detection research is another important area for future work. Collaboration with cybersecurity agencies and industry partners will facilitate access to more comprehensive and up-to-date datasets that reflect the current phishing landscape. In addition, the use of synthetic data generation techniques such as generative adversarial networks (GANs) can help create different and challenging scenarios for training and testing phishing detection models. Another promising direction is to incorporate user behavior and contextual data into model training and prediction. Understanding user interactions with phishing threats can provide additional insights that improve models' detection capabilities. Techniques such as behavior-based analysis or incorporating contextual features from user environments may lead to a more accurate and personalized phishing detection system. Additionally, meeting the challenge of real-time phishing attacks as new and unknown threats emerge requires models that can learn and adapt in real time. Incremental learning methods, or online learning strategies, where models constantly update their knowledge as new data arrives, are critical to combating these evolving threats.
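As an illustration of the incremental or online learning strategies mentioned above, the sketch below updates a logistic regression classifier one labelled example at a time using stochastic gradient descent. It is a minimal, self-contained illustration rather than the method used in this study: the class name, the three URL features (presence of an IP address, scaled URL length, presence of an "@" symbol), and the toy data stream are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


class OnlineLogisticRegression:
    """Logistic regression trained one example at a time with SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.5) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: Sequence[float]) -> float:
        """Estimated probability that the example is phishing."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x: Sequence[float], y: int) -> None:
        """Update the model with a single labelled example (y is 0 or 1)."""
        error = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        for j, xj in enumerate(x):
            self.weights[j] -= self.learning_rate * error * xj
        self.bias -= self.learning_rate * error


# Hypothetical stream of (features, label) pairs; label 1 = phishing, 0 = legitimate.
# Features: [has_ip_address, url_length_scaled, has_at_symbol]
stream: List[Tuple[List[float], int]] = [
    ([1.0, 0.9, 1.0], 1),
    ([0.0, 0.2, 0.0], 0),
    ([1.0, 0.7, 0.0], 1),
    ([0.0, 0.3, 0.0], 0),
    ([0.0, 0.8, 1.0], 1),
    ([0.0, 0.1, 0.0], 0),
]

model = OnlineLogisticRegression(n_features=3)
for _ in range(200):  # replaying the stream stands in for data arriving over time
    for features, label in stream:
        model.partial_fit(features, label)

p_phishing_like = model.predict_proba([1.0, 0.8, 1.0])
p_legitimate_like = model.predict_proba([0.0, 0.2, 0.0])
print(f"phishing-like URL: {p_phishing_like:.3f}")
print(f"legitimate-like URL: {p_legitimate_like:.3f}")
```

Because each update touches only one example, a model of this kind can keep absorbing newly reported phishing URLs without retraining from scratch, at the cost of being more sensitive to noisy labels than a batch-trained model.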
8. Conclusion
The proliferation of phishing attacks in the digital age presents a formidable challenge, one that demands innovative and effective solutions. This study's exploration of machine learning models to detect phishing websites has contributed significantly to this ongoing battle, demonstrating the potential of these techniques to enhance cybersecurity measures. Through a detailed examination of various models, including Decision Tree, SVM, ANN, and Random Forest, this paper has highlighted both the strengths and weaknesses inherent in each approach, providing a comprehensive understanding of their capabilities in the context of phishing detection. The Decision Tree model emerged as a standout performer in this study, achieving an accuracy rate of 96.7%, indicative of its robustness and reliability in identifying phishing threats. This model's simplicity and interpretability make it an invaluable tool in the cybersecurity arsenal, especially for rapid assessments and modifications in response to evolving threats. The Random Forest model also showed impressive results, with a 95.75% accuracy rate, underscoring the efficacy of ensemble methods in enhancing detection capabilities by leveraging the strengths of multiple decision trees. While the ANN model demonstrated considerable promise with its deep learning capabilities, it also faced challenges related to overfitting. This limitation underscores the need for careful model tuning and regularization to ensure its applicability to a broader range of phishing scenarios. Despite these challenges, the insights gained from the ANN model are instrumental in understanding the complex, non-linear relationships in phishing data, paving the way for future advancements in this area.

The SVM model, although exhibiting a lower accuracy rate of 83.8%, provided crucial insights into the phishing strategies that require more nuanced detection approaches. This finding highlights the importance of a diverse model portfolio to address the multifaceted nature of phishing threats effectively. Incorporating user behavior and contextual data into model training and prediction is another promising avenue for research. Techniques like behavior-based analysis or integrating contextual features from user environments could lead to more accurate and personalized phishing detection systems. This approach could enhance the models' ability to adapt to individual users' unique risk profiles and usage patterns.

Finally, addressing zero-day phishing attacks, where new and unknown threats emerge, requires models that can learn and adapt in real time. Incremental learning approaches or online learning strategies, where models update their knowledge as new data arrives, are crucial in combating these evolving threats.

In conclusion, this study has made significant strides in the use of machine learning for phishing detection. The diverse methodologies employed, the thorough dataset preparation, and the promising results all contribute to the advancement of cybersecurity measures against phishing threats. The insights gained from this paper not only underscore the potential of machine learning in this domain but also highlight the importance of continuous adaptation and improvement in the fight against cyber threats. Future research will be pivotal in enhancing the robustness of phishing detection systems, expanding datasets, and integrating user behavior to develop more effective and personalized solutions for combating phishing attacks.

Acknowledgements
The authors would like to thank the Deanship of Graduate Studies and Scientific Research at Jouf University for funding and supporting this research through the initiative of DGSR, Graduate Students Research Support (GSR) at Jouf University, Saudi Arabia.

References
[1] A. K. Dutta, "Detecting Phishing Websites Using Machine Learning Technique," PLoS ONE, vol. 16, no. 10, e0258361, 2021. [Online]. Available: https://doi.org/10.1371/journal.pone.0258361.
[2] Jain A.K., Gupta B.B. “PHISH-SAFE: URL Features-Based Phishing Detection System Using Machine Learning”, Cyber Security. Advances in Intelligent Systems and Computing, vol. 729, 2018, https://doi.org/10.1007/978-981-10-8536-9_44
[3] Purbay M., Kumar D, “Split Behavior of Supervised Machine Learning Algorithms for Phishing URL Detection”, Lecture Notes in Electrical Engineering, vol. 683, 2021, https://doi.org/10.1007/978-981-15-6840-4_40
[4] Gandotra E., Gupta D, “An Efficient Approach for Phishing Detection using Machine Learning”, Algorithms for Intelligent Systems, Springer, Singapore, 2021, https://doi.org/10.1007/978-981-15-8711-5_12.
[5] Hung Le, Quang Pham, Doyen Sahoo, and Steven C.H. Hoi, “URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection”, Conference’17, Washington, DC, USA, arXiv:1802.03162, July 2017.
[6] Hong J., Kim T., Liu J., Park N., Kim SW, “Phishing URL Detection with Lexical Features and Blacklisted Domains”, Autonomous Secure Cyber Systems. Springer, https://doi.org/10.1007/978-3-030-33432-1_12.
[7] J. Kumar, A. Santhanavijayan, B. Janet, B. Rajendran and B. S. Bindhumadhava, “Phishing Website Classification and Detection Using Machine Learning,”
2020 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 2020, pp. 1–6, doi: 10.1109/ICCCI48352.2020.9104161.
[8] Hassan Y.A. and Abdelfettah B, "Using case-based reasoning for phishing detection", Procedia Computer Science, vol. 109, 2017, pp. 281–288.
[9] Rao RS, Pais AR. Jail-Phish: An improved search engine-based phishing detection system. Computers & Security. 2019 Jun 1; 83:246–67.
[10] Aljofey A, Jiang Q, Qu Q, Huang M, Niyigena JP. An effective phishing detection model based on the character-level convolutional neural network from URL. Electronics. 2020 Sep; 9(9):1514.
[11] AlEroud A, Karabatis G. Bypassing detection of URL-based phishing attacks using generative adversarial deep neural networks. In: Proceedings of the Sixth International Workshop on Security and Privacy Analytics 2020 Mar 16 (pp. 53–60).
[12] R. Verma and N. Dyer, "Detection of Phishing Websites Using a Novel Twofold Ensemble Model," IEEE Access, vol. 7, pp. 114134-114145, 2019.
[13] H. R. Shahriar, M. Zulkernine, and S. M. Farhad, "PhishDef: URL names say it all," IEEE Trans. Netw. Serv. Manag., vol. 17, no. 1, pp. 498-511, Mar. 2020.
[14] B. B. Gupta, A. Tewari, D. Jain, and M. Agrawal, "Fighting against phishing attacks: state of the art and future challenges," Neural Comput. Appl., vol. 31, no. 12, pp. 9143-9169, Dec. 2020.
[15] K. R. Choo, "Cryptocurrency phishing and scams: Attack vectors, impacts, and a way forward," IEEE Access, vol. 8, pp. 67512-67525, 2020.
[16] L. Zhang, S. Tan, and J. Yang, "URLNet: Learning a URL representation with deep learning for malicious URL detection," IEEE Access, vol. 8, pp. 1776-1786, 2020.
[17] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, "Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Dallas, TX, USA, 2019, pp. 1528-1540.
[18] A. D. Nguyen, M. L. Nguyen, and N. G. Nguyen, "Deep learning for deepfakes creation and detection: A survey," IEEE Access, vol. 9, pp. 139877-139907, 2021.
[19] S. M. Al-Rawahi and M. S. Al-Fahdi, "Using machine learning techniques for rising phishing attacks on social networks," in Proc. IEEE Conf. on Application, Information and Network Security (AINS), Muscat, Oman, 2020, pp. 1-6.
[20] A. N. Khan, M. Kiah, S. A. Madani, S. Ali, and M. Shamshirband, "Phishing attacks detection using machine learning and deep learning techniques: A review," Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 6, 2019.
[21] E. Sitnikova, "Phishing in the era of advanced cyber threats," in Cybersecurity Education for Awareness and Compliance, IGI Global, 2019, pp. 28-50.