Across the Spectrum In-Depth Review AI-Based Models for Phishing Detection
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/OJCOMS.2024.3462503
Received XX Month, XXXX; revised XX Month, XXXX; accepted XX Month, XXXX; Date of publication XX Month, XXXX; date of
current version 11 January, 2024.
Digital Object Identifier 10.1109/OJCOMS.2024.011100
ABSTRACT The advancement of the Internet has increased security risks associated with data protection
and online shopping. Several techniques compromise Internet security, including hacking, SQL injection,
phishing attacks, and DNS tunneling. Phishing is particularly significant among these
techniques. In a phishing attack, the attacker creates a fake website that closely resembles a legitimate one
to deceive users into providing sensitive information. These attacks can be detected using both traditional
and modern AI-based models. However, even with state-of-the-art methods, accurately classifying newly
emerged links as phishing or legitimate remains a challenge. This study conducts a comparative analysis
of more than 130 articles published between 2020 and 2024, identifying challenges and gaps in the
literature and comparing the findings of various authors. The novelty of this research lies in providing a
roadmap for researchers, practitioners, and cybersecurity experts to navigate the landscape of machine
learning (ML) and deep learning (DL) models for phishing detection. The study reviews traditional
phishing detection methods, ML and DL models, phishing datasets, and the step-by-step phishing process.
It highlights limitations, research gaps, weaknesses, and potential improvements. Accuracy measures are
used to compare model performance. In conclusion, this research provides a comprehensive survey of
website phishing detection using AI models, offering a new roadmap for future studies.
INDEX TERMS Anomaly Detection, Blocklists, Cyber-Attack Mitigation, Cybersecurity, Deep Learning
(DL), Machine Learning (ML), Phishing Detection, Threat Intelligence, Web Phishing Detection, Whitelists
I. Introduction
for fraudulent activities or sales on the dark web [2]. Under-
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Open Journal of the Communications Society. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/OJCOMS.2024.3462503
Shakeel et al.: Across the Spectrum In-Depth Review AI-Based Models for Phishing Detection
a small fraction would fall for it [4]. Such messages often included poor grammar and other obvious signs of fraud [5]. Today’s phishing techniques are highly targeted and personalized, using even minute details about the victim obtained from social media or other sources [6]. This approach, known as spear phishing, increases the chances of success because the deceit is more convincing [7].
Detection of web phishing involves identifying and mitigating malicious activities before they can spread [8]. Traditional detection mechanisms depend heavily on blacklists, which are databases of known phishing URLs that web browsers and security software block [9]. Blacklists are often of limited effectiveness because they can only protect against threats that have already been identified, which makes them essentially reactive. This limitation has spurred the development of more advanced detection techniques designed to track down new and emerging web phishing threats in real time [10].
Machine learning has become a critical tool in the fight against phishing on the web. By analyzing large volumes of data, machine learning algorithms can identify patterns and anomalies associated with phishing attacks [11]. These algorithms can learn characteristics related to URL structure, domain age, website content, and email metadata to detect phishing attempts with high precision [12]. Thus, machine learning not only enhances detection capabilities but also reduces false positives, thereby avoiding the misidentification of legitimate websites and emails as malicious [13]. However, traditional and AI models often fail to detect newly emerged phishing links. Therefore, systematic literature reviews are essential for researchers to identify study gaps, evaluate the performance of existing models, and discuss current datasets [14].
The significant contributions of this study include the examination of methods to prevent phishing attacks, covering attack types, phishing processes, user behaviors, and prevention measures. It discusses both traditional and modern detection methods, analyzes existing solutions, and addresses their limitations. The key contributions include:
1) Discussing all possible phishing attack modes, techniques, and the effects of attacks.
2) Exploring attack processes, typologies, and anti-phishing solutions.
3) Presenting a comparative analysis of traditional as well as machine learning (ML) and deep learning (DL) based models.
4) Providing taxonomic classification of anti-phishing techniques.
5) Measuring the performance of models using accuracy to evaluate significant models for phishing detection.
6) Finally, presenting promising future research directions.
The remainder of this paper is organized as follows: Section II discusses the literature review. Section III covers the research methodology. Section IV discusses different phishing attack methods along with the step-by-step phishing attack process. Section V presents how phishing works and different phishing techniques. Section VI explains different phishing detection methods based on webpage screenshots, while Section VII describes the different phishing detection datasets and their comparative analysis. Similarly, Section VIII presents different anti-phishing methods with their comparative analysis. Section IX presents model recommendations for phishing detection, highlighting which model is the best fit for phishing detection. Section X highlights open challenges and discussions related to phishing detection and research papers. The last section presents the conclusion.
II. Literature Review
Internet use has become integral to people’s daily lives, making it difficult to envision a world without it. According to the Global Digital Population Survey Report (GDP) [15], published in 2023, approximately 5.3 billion people use the internet worldwide. Of these, 62% use social media. The report in [16] states that 94.6% of these users have accessed the internet through smartphones. This connectivity has revolutionized life, including information exchange, online shopping, communication, and professional tasks. At the beginning of 2020, when the COVID-19 pandemic began, there were significant changes in traditional offline services. These services moved from offline to online platforms, particularly in industries such as catering and retail.
In this digital era, individuals frequently share sensitive data online, such as login credentials, personal information, and credit card details. Unfortunately, cybercriminals exploit various illicit methods to acquire this information and subsequently engage in unauthorized activities on the Internet. Network security concerns have been present since the inception of the Internet, evolving in tandem with its development. In [17], the author argued that the rapid evolution of network attack techniques poses significant challenges to cybersecurity. Cybersecurity issues fall into several categories, classified by attack method and form, including denial-of-service (DoS) attacks, man-in-the-middle (MitM) attacks, SQL injection (SQL-Inj), zero-day (ZD) exploits, DNS tunneling, phishing, and malware. In [18], the author explained that the dynamic landscape of the Internet and its vulnerabilities necessitates ongoing efforts to enhance cybersecurity measures and protect users from potential threats.
In [19], the author explained that phishing attacks require varied skills, including re-engineering, networking, coding, databases, and deep knowledge of protocols and of how information is stolen from them. In [20], the author explained how the attacker designs a phishing page linked to a database, with a web form that looks like the original, and shares a link with the user through social media, an SMS gateway, and email. The shared message contains alarming messages and warning text, including misleading images, to attract and
induce the user to click the link. Phishing attacks have caused economic losses over the last 30 years. In [21], the author discusses the history of phishing, stating that phishing attacks increased dramatically during the COVID-19 pandemic. In that period, governments worldwide issued financial assistance to their citizens and, to disburse funds, started collecting sensitive data, including bank accounts, credit card history, debit card history, and personal details.
Similarly, attackers launched the same kind of campaign to obtain data online from citizens. According to phishing attack statistics published in 2022, approximately 36% of data breaches are caused by phishing attacks, and 83% of citizens in the US experience phishing attacks [22]. This ratio increased from 80% to 345% from 2020 to 2021. Another report published in 2022 by a US-based organization [23] states that the number of phishing attacks doubled in 2022 compared to the pandemic period due to the high success rate. In [23], the author proposed several methods to prevent phishing attacks, including technical staff education and training on daily email responses, SMS, WhatsApp, and social media material sharing. The objective of [24] is to survey recently published anti-phishing methods. Identifying a phishing website is a challenging task during the process of obtaining user information. Before the advent of artificial intelligence, researchers proposed several methods to identify phishing websites, including traditional methods such as whitelisting and blocklisting uniform resource locators (URLs) [25]. URLs on the whitelist are considered legitimate, while all others are considered phishing. Similarly, blocklists contain all shortened URLs, unnecessary strings, long lengths, unstructured formats, and ambiguous domains. Whitelists and blacklists are shared with the general public so that users can avoid visiting such URLs [26].
This approach protects the user from phishing attacks; however, it is not very effective because matching each URL string by string against the list is computationally expensive in a real-time environment. Moreover, this method cannot identify shortened, modified, and long-string phishing URLs [27]. Another legacy method is rule-based phishing detection. In this method, rules are defined for web surfing. This type of detection requires expert knowledge of cybersecurity policies and web filtering: the user must know how to implement rules and analyze whether a URL is phishing or legitimate [28]. After the introduction of ML and DL models, phishing detection and identification became more efficient, but traditional ML models have drawbacks. In these models, feature extraction must be performed manually to identify phishing pages. This means that humans must write the rules stating that if a given string, word, or signature does not validate, the URL is marked as phishing [29].
According to the World Wide Web Consortium (W3C) [30], a URL contains elements such as the protocol, subdomain, domain, port number, database path, query parameters, and the data to be retrieved. Legacy rules such as whitelists and blacklists check the URL’s registered domain: if the domain exists, the system passes the URL as legitimate; otherwise, it blocks it [26]. To authenticate URLs, the system needs to verify information such as the domain expiry and registration dates with a third party. Once the authentication rules are published, attackers learn them and craft URLs that comply with the rules to bypass the system. Thus, legacy methods have not been successful in controlling phishing attacks.
Following the introduction of ML and DL, several models have been used for anti-phishing. The phishing detection mechanism uses labeled data to classify phishing and legitimate websites. Different state-of-the-art ML and DL models have been used for web phishing detection and identification [26], [31], [32], [33], [34], [35], [36], [37], [38]. The fundamental purpose of these models is to identify and classify phishing links correctly. Therefore, the models are differentiated by their accuracy and computation time: models with higher accuracy and lower computation time are considered the best for detection.
Phishing attacks have increased dramatically with the growing number of social media users and online businesses. Cyber risks and threats have increased accordingly and need to be addressed appropriately [39]. The complex nature of hyperlinks makes it difficult for the human eye to distinguish original links from fake ones. Therefore, cybersecurity experts are paying more attention to the detection of counterfeit URLs. Phishers, in turn, adopt advanced techniques after learning the modern ML and DL methods [29].
Several research papers have been published on phishing detection methods. In [40], the authors analyzed different phishing solutions based on different parameters. The authors discussed lists of phishing techniques used on other devices and provided countermeasures against phishing attacks in four major categories: AI-Anti-Phishing models, Classical methods based on different scenarios, and lists-based. The authors concluded that appropriate feature selection yields better results and that the model shows the highest accuracy compared to other AI models. However, the authors did not investigate other ML and DL methods proposed in [41], [42], [43], [44], [45], such as SVM, LSTM, NB, and other modern DL models, which can detect phishing with accuracy rates from 99.00% to 99.62%.
In [46], the authors explained the two main types of phishing attacks: social engineering and the use of malware. The authors also discussed rule-based feature extraction techniques. However, they did not discuss the challenges and limitations associated with feature extraction, which feature extraction technique suits which ML or DL model, or how accurately the required features were extracted, as covered in [36]–[47].
In [48], the authors categorize phishing attacks and their solutions into three main types: URL-based,
content-based, and hybrid approaches. After carefully studying and comparing the proposed methods, they concluded that a hybrid approach is best for the detection and prevention of real-time phishing. However, they did not discuss the challenges associated with implementing the model and dataset described in [49], [50].
In [51], the authors explained different anti-phishing methods along with phishing techniques. They discussed nine different datasets used for phishing detection methods. They also covered 18 different AI-based models and compared their results. They discussed various challenges and limitations of the models, such as precision and over-fitting. However, they did not discuss methods to reduce model over-fitting and improve accuracy, as discussed in [51], [52].
The authors in [53] discussed different email phishing detection techniques, including email spoofing, using modern ML methods and natural language processing (NLP). NLP and ML are used for feature extraction and to detect malicious email content. In [54], the author explained that the analysis is based on URL parts, page contents, and web page code to find whether any tag or part of the code is modified or redirected elsewhere. The performance of all models is then compared to find the best one. In [55], the authors did not clearly explain the feature detection methods. There are many ways to extract features, such as manual selection and applying ML and DL models. However, ML and DL models pose a problem because the analyst has to manually select features that are useful for the current dataset only. If the dataset changes, the feature selection technique fails. Furthermore, the authors did not review modern methods for feature selection.
The author reviews modern AI-based phishing detection models in [56]. The paper divides the detection methods into four categories: ML/DL-based, scenario-based, hybrid, and list-based. However, it does not provide a detailed review of AI models that can be used confidently for phishing detection, and it lacks data processing and feature extraction techniques for phishing detection datasets.
In [57], the authors proposed a novel approach for detecting phishing websites by combining Support Vector Machines (SVMs) with nature-inspired optimization algorithms. SVMs are robust classifiers that aim to find an optimal hyperplane to separate data points into different classes. By integrating these algorithms, the researchers achieved promising results in identifying phishing URLs. The author and colleagues developed an anti-phishing browser that leverages the Random Forest algorithm and a rule-based extraction framework.
In [58], the authors proposed an RF method for the detection of web phishing. The study is based on rule-based detection. The features extracted for modeling are based on the RF model. In [59], the author gives an overview of the different phishing techniques. This study is based on the hybrid approach for phishing detection, and features are extracted using XGBoost and Gradient Boost. The author used the same models for noise removal from the dataset. The author and collaborators explore the interaction of swarm intelligence and deep learning for phishing detection. Swarm intelligence draws inspiration from collective behavior observed in natural systems (e.g., ant colonies, bird flocks). Combining these principles with deep learning yields their I-BBA model [60].
In [61], the authors likewise combine SVMs with nature-inspired optimization algorithms and report promising results in identifying phishing websites. In [62], the study investigates techniques for detecting spoofed websites, which often mimic legitimate sites to deceive users. The authors explore machine learning models, feature engineering, and anomaly detection methods. Their work enhances the accuracy of identifying fraudulent web pages. Their study involves URL feature extraction, behavioral analysis, and model training. By taking lexical and content-based features into account, they contribute to the development of robust detection mechanisms [63]. Another study [64] explores a comprehensive approach to web phishing detection. The authors combine web crawling techniques, cloud infrastructure, and deep learning frameworks. By analyzing web content, network traffic, and behavioral patterns, their model provides robust protection against phishing attacks.
The overview paper [65] critically examines existing methods for detecting phishing sites. The authors discuss zero-day attacks, adversarial evasion, and real-time detection challenges. In this work, they investigate the effectiveness of combining multiple machine-learning models for phishing classification. They achieve improved performance by leveraging ensemble techniques such as stacking or blending. Their study emphasizes the benefits of model fusion in security applications.
In [66], the researchers propose an ensemble model designed explicitly for detecting phishing intrusions from URLs. Their approach combines decision trees, random forests, and gradient boosting. By considering diverse classifiers, they enhance the robustness of their detection system. The research article [67] introduces the DRL-BWO algorithm, optimized by Black Widow Optimization (BWO), for UAV networks. DRL-BWO incorporates an enhanced reinforcement learning-based DBN for the detection of intrusions in UAV networks. The BWO algorithm is applied to the parameter optimization of the DRL approach. It enhances the performance of intrusion detection in UAV networks, securing communication over the UAV.
A. Research Gap in Literature Review
During the literature review, we identified several research gaps that highlight areas where further investigation is
TABLE 1. The research gap review table summarizes the methods employed and the limitations identified in various studies focused on phishing detection. It highlights gaps such as the need for model adaptability, handling modern attack techniques, effective feature selection, dataset diversity, and real-world applicability across different approaches.
No. | Study | Methods | Limitations
3 | Shahrivari et al. [32] | DT, LR, KNN, ANN, RF, Ad | Lack of feature selection/extraction methods, lack of exploration of deep learning techniques, and inadequate real-world scenario applicability.
4 | ALSARIERA et al. [34] | ABET, BET, RoFBET, LBET, ANN, SVM, RF, KNN, SVM, DT | There is a need to explore diverse datasets beyond Mendeley, scalability assessment for larger datasets, hyperparameter impact analysis, and adaptation to evolving threats.
5 | Lokesh & BoreGowda [38] | RF, KNN, DT, L-SVC | Lack of specific feature discussion, detailed algorithm comparison, real-world dataset robustness, and adaptation to emerging phishing techniques.
6 | Butt et al. [41], [42], [68] | LSTM, SVM, NB, ISHO, WARM, Firefly, BAT | Lack of comparative analysis, absence of discussion on URL feature selection, and limited applicability to specific datasets.
7 | Zamir et al. [47] | [RF], [NN], NB, KNN | Lack of integration methods for various data sources, and repetition of existing models from previous research.
8 | Jovanovic et al. [60] | XGBoost, WI, MOFA | In this study, the author discussed only selected features and did not discuss handling blank images and shortened URLs.
9 | KARIM et al. [63] | DNN, RF, NB, Na | Absence of generalization to evolving phishing methods, inadequate feature extraction for large datasets, and limited real-world scenario assessment.
10 | Zieni et al. [64] | list-based, similarity-based, and machine learning-based | Inadequate handling of imbalanced datasets, lack of feature details for classification, and focus on controlled experiments over real-world scenarios.
11 | Shaukat et al. [65] | SVM, RF, MP, XGBoost | Limited to three datasets, lacks a universal feature extraction method, and inadequate analysis of ML and DL models.
12 | Korkmaz et al. [69] | XGBoost, RF, LR, KNN, SVM, DT, NN, NB | Missing exploration of mitigation strategies for detected phishing attacks.
13 | Adebowal et al. [70], [71] | LSTM, CNN, IPDS Classifier | Uncertainty on adaptation to new phishing methods, lack of automated feature selection for large datasets, and inadequate real-world scenario assessment.
14 | Maci et al. [72] | DL, DRL, MDL, ICMDP | Unclear performance with increasing features and large datasets, lack of exploration on additional features’ impact, and absence of real-time scenario evaluation.
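As a rough illustration of the list-based checks and URL elements discussed in this section, the sketch below splits a URL with Python's standard urllib.parse module, looks up its domain in two small in-memory lists, and computes a few lexical features of the kind a classifier might consume. The list contents, the feature names, and the helper names (url_parts, registered_domain, list_verdict, lexical_features) are assumptions made for illustration and are not taken from any surveyed paper. The registered-domain rule is deliberately naive.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lists for illustration only; real systems load curated feeds.
WHITELIST = {"example-bank.com"}
BLACKLIST = {"examp1e-bank.com"}


def url_parts(url):
    """Split a URL into protocol, host, port, path, and query parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname or "",
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),
    }


def registered_domain(host):
    """Naive registered domain: the last two labels of the host.

    Assumption: this ignores multi-label suffixes such as .com.pk.
    """
    return ".".join(host.split(".")[-2:])


def list_verdict(url):
    """Return 'phishing', 'legitimate', or 'unknown' from the two lists."""
    domain = registered_domain(urlsplit(url).hostname or "")
    if domain in BLACKLIST:
        return "phishing"
    if domain in WHITELIST:
        return "legitimate"
    return "unknown"


def lexical_features(url):
    """A few hand-picked lexical URL features (illustrative, not exhaustive)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
    }
```

A URL that appears on neither list comes back as "unknown", which is the case the review identifies as the weakness of list-based methods and the point at which learned models are expected to take over.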
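Because the surveyed studies are compared mainly by accuracy, and false positives are a recurring concern, it may help to see how these measures follow from a confusion matrix. The sketch below is a minimal example using the standard definitions; the label strings and the function names (confusion_counts, metrics) are illustrative assumptions, and "phishing" is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive="phishing"):
    """Count true/false positives and negatives for a binary labeling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # legitimate site flagged as phishing
        elif truth == positive:
            fn += 1  # phishing site missed
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Standard measures; each ratio falls back to 0.0 on an empty denominator."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

On imbalanced datasets, accuracy alone can look high even when many phishing pages are missed, which is one reason to report recall and the false positive rate alongside it.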
institution report, about $4.1 billion was lost due to social media phishing attacks [78].
3) Using SMS (Short Message Service)
Phishing attacks have increased due to changes in the way people communicate. Since the invention of the smartphone, social communication modes have evolved [79]. People enjoy sending SMS messages to their loved ones to stay in touch because it is an inexpensive communication method. Similarly, companies send SMS messages to promote product campaigns to end users [80]. Using the same strategy, phishers send SMS messages containing links that request the recipient to fill in personal information. Sometimes these messages claim to be surveys, while other times they announce the launch of a new product. The user may click the link, believing that they will receive a significant discount, and then fill in all the required fields. Consequently, all sensitive information is transferred to phishers, who use it for illegal activities, leading to financial loss for the user [81].
4) Using Live Messengers
People also communicate with friends and family through live messengers such as Yahoo Messenger, Y-Mail Messenger, and Hotmail Messenger. People share their location, pictures, documents, and sensitive information via these platforms. Phishers often pretend to be company representatives, sending phishing links and requesting recipients to fill in their information to participate in a supposed lucky draw. As a consequence, users may face financial loss and account termination [82].
5) Using Blog Posts and Community Forums
Blog posts and community forums are websites where people share their thoughts and problems, discuss issues, and get feedback from others. They are widely used for sharing information and completing surveys, forms, and other details. Attackers may create fake surveys, pretending to be the forum owner, to obtain members’ details, which are later used for illegal purposes [83].
V. How Web-Phishing works
Phishing attacks differ entirely from hacking or gaining unauthorized access through various means. In phishing attacks, the phisher is typically a technical person with deep knowledge of web development, SQL tools, machine learning, and the creation of fake web pages. Phishing attacks involve several steps, as shown in Figure 4 and outlined below.
1) Develop Strategy: First, phishers develop a strategy to identify and target a specific community.
2) Development of Phishing Website: Once the target is identified, the next step is to develop a website that closely resembles the original.
3) Domain Registration: The developed website is then hosted on a domain using a URL similar to the original one, with minor modifications, for example, https://www.bankalfa.com.pk to https://www.bankalfaa.com.pk.
4) URL Shortening: After the website is hosted, the next step is to shorten the URL to share with the target audience.
5) Forward Phishing URL: Once the URL is shortened, it is forwarded using various means, such as social media, email, SMS, WhatsApp, and other sources.
6) User Click: The phishing link contains attractive details, content, offers, and language that persuade the user to click. Users then fill in the requested information, which is transferred to the phisher’s web server.
7) Collect Required Data: When users click the link, they are prompted to fill in information that is directly saved on the phisher’s web server. Phishers then use this data for other purposes.
8) Illegal Use of Data: Phishers collect data and use it for illegal purposes, resulting in financial loss and compromise of online accounts.
FIGURE 4. The typical steps of a phishing attack. It begins with the attacker sending a phishing email to the target, followed by the target clicking on a phishing link. This action leads them to a fake website where their credentials are collected. Ultimately, the attacker uses these credentials to access private information on the original website.
A. Phishing Attack Damages
A phishing attack is a hacking activity in which the phisher obtains personal information accessed via URL, which may cause the following effects on the user:
1) By gaining unauthorized access, phishers cause financial damage to the end user without consent.
2) It may tarnish the user’s online reputation, leading to reduced business opportunities.
3) Phishing attacks diminish trust in companies, resulting in decreased customer engagement and business activities.
B. Possible Phishing Techniques
Phishing attacks employ various methods to deceive users and obtain sensitive information. Figure 5 shows some of the techniques phishers may use.
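The domain registration step in Section V relies on a URL that differs from the original by only a character or two (bankalfa versus bankalfaa). One simple way to flag such near-duplicates, sketched here under assumptions, is to compute the edit (Levenshtein) distance between the candidate host and a trusted host. The function names and the max_distance threshold are hypothetical choices for this example, and the approach is not attributed to any surveyed paper.

```python
from urllib.parse import urlsplit


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_lookalike(candidate_url, trusted_url, max_distance=2):
    """True if the candidate host is close to, but not equal to, the trusted host.

    Assumption: max_distance=2 is an arbitrary illustrative threshold.
    """
    candidate = urlsplit(candidate_url).hostname or ""
    trusted = urlsplit(trusted_url).hostname or ""
    distance = edit_distance(candidate, trusted)
    return 0 < distance <= max_distance
```

For the example pair from step 3, the two hosts differ by a single inserted character, so the candidate is flagged. A check like this only compares against hosts that are already known to be trusted, and shortened URLs (step 4) would need to be expanded before it can apply.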
FIGURE 5. Phishing Attack Methods diagram categorizes various phishing techniques into four main groups: Social Manipulation, System-Based Methods, Using Mobile Devices, and Other Phishing Methods. Each category lists specific types of attack, highlighting the diversity and complexity of phishing tactics used to compromise security across different platforms.
1) Social Manipulation
Social manipulation [84] involves tricking individuals into sharing personal information without realizing the risk of being hacked. This often involves following a URL to provide information, thinking it is necessary to secure their account. For example, receiving an email that appears to be from a bank asking for account confirmation can prompt a person to click on a link and fill out a form, believing that they are providing accurate information to their financial institution. Phishers exploit this by counting on the user to click on the URLs, leading to unauthorized access.
a: Web Phishing Tricking
Phishers create fake web pages that closely resemble legitimate ones to deceive users. These fake pages often mimic login screens or other interactive elements to trick users into divulging sensitive information such as usernames and passwords.
b: Web Phishing Cloning
This phishing attack involves cloning web pages to resemble the original site. Phishers use online or offline tools to create these fake web pages [85]. They primarily focus on replicating login pages to ensure users believe they are on the original site rather than the phishing page. The cloned site can be detected visually and using browser security features, which may alert users about the phishing page [86]. Therefore, phishers often remove identification tags, codes, divs, and frames from the phishing web page to avoid detection [23].
c: Email Tracking
Another method of obtaining user details is by email. Phishers might send two types of email to end users:
d: Spear Phishing
In this attack, the phisher targets an individual from a reputable organization. The attacker monitors the target’s social media accounts to learn their schedule. Once enough information is gathered, the phisher contacts the target via email, pretending to be a company manager, and requests them to fill out a form for an urgent meeting. The target, believing the request is legitimate, shares sensitive information with the attacker. Using spear phishing methods, several prominent authorities have been attacked, suffering significant financial losses and data breaches [87].
2) System-Based Methods
The following are the main types of system-based phishing attacks:
a: Ransomware Virus
Ransomware is a modern type of malware that deeply affects users, causing financial and sensitive data loss [88]. In this attack, phishers send harmful links containing ransomware. Once the user clicks on the link, the malware is downloaded and installed on the target computer. After installation, all files are encrypted, and a pop-up displays the attacker’s account details for data recovery [89].
b: Trojan Horse Virus
The Trojan horse is another type of malware similar to ransomware, but it functions differently. The attacker sends a Trojan via email, text message, or WhatsApp link to download media or applications [90]. The Trojan then installs on the target device, running in the background without user knowledge [90]. Once installed, it sends sensitive data, including bank details, to the attacker [91].
c: Content Injection
In content injection attacks, also known as cross-site scripting (XSS), attackers exploit website vulnerabilities to add harmful code to web pages [92]. This code, often JavaScript, can perform dangerous actions such as stealing cookies, session tokens, or other sensitive user data stored in the browser [92]. Once inserted, the code can capture what users type in forms or redirect users to fake pages that resemble legitimate websites. According to reports, about 2.0 million websites suffer from content injection attacks [49], making it a prevalent online threat. These attacks often occur due
to poor input validation or lack of sanitization, allowing attackers to inject malicious scripts into trusted sites. To mitigate these risks, web developers should implement security measures such as Content Security Policy (CSP), input validation, output encoding, and regular security audits to detect and fix vulnerabilities before exploitation [93].
d: Keylogger / Screen Logger
A keylogger is an application installed on a system that tracks user keystrokes [94]. This malicious software can be installed physically or remotely on the target system without the user’s knowledge. Once installed, the software tracks keystrokes and website visits, sending the data to the phisher’s email or configured location.
e: Communication Crack Phishing Attack
The communication crack phishing exploit targets vulnerabilities within wireless networks, such as open or public Wi-Fi [95]. The attacker conducts a man-in-the-middle attack using tools like Wireshark or SSLstrip to capture and decrypt traffic between users and the organization’s servers [96]. Users are drawn to a rogue access point that mimics a real network, capturing sensitive information, including login credentials and financial data [97]. Alternatively, the attacker may redirect users to phishing sites for credential harvesting or inject malware during communication [98]. The attacker exploits weaknesses in encryption, such as the KRACK vulnerability in WPA2 protocols, to decrypt and modify network traffic [99]. Mitigating such risks involves implementing robust encryption, regularly updating security protocols, and educating users about the risks of connecting to open Wi-Fi networks and recognizing phishing attempts [100].
3) Using Mobile Devices
Over the decades, mobile device use has become widespread. Today, phones are primarily used for communication, information sharing, and online shopping. Phishers generate phishing links and share them via SMS, email, WhatsApp, and third-party mobile apps [101]. When users click on these links and provide information, their devices are compromised. The attacker gains control over the mobile device, leading to misuse. Several mobile phishing methods are listed below.
a: Smishing
Smishing is a blend of "SMS" and phishing, hence the name. Attackers send SMS messages with malicious links, prompting users to click and fill out forms [102]. For example, if the government announces financial assistance for needy people, phishers might generate messages with phishing links requesting sensitive personal, banking, and other information [103]. Users, thinking the message is from a legitimate agency, share sensitive information without verifying its authenticity, allowing phishers to collect the information.
b: Mali-App
Mali-App stands for malicious applications, referring to apps not verified by Google’s security mechanisms [104]. These third-party apps can harm mobile devices [105]. Phishers gain unauthorized access through these apps and start collecting data and sensitive information.
c: Vishing
Similar to smishing, vishing is a type of phishing attack where attackers use voice manipulation to mimic legitimate voices [106]. The attacker sets up a VoIP platform and uses voice changer software to imitate an authentic voice. For example, an employee of a reputable company may receive a call from someone pretending to be their manager, asking for critical login details. This method is often successful due to the high level of trust involved.
d: Wi-Fi Phishing
Wi-Fi phishing is a modern phishing method involving open Wi-Fi hotspots. While users access the internet through these hotspots, attackers monitor traffic for valuable information. Alternatively, users may be asked to fill out a registration form, which includes personal information stored in the phishing system [49].
4) Other Phishing Methods
a: Compromised Server
In this phishing attack, the phisher hacks a target website’s server and uploads a malicious toolkit. The phisher silently controls the server and hosts a similar web page to divert users, making them believe it is legitimate [107]. By compromising the server, hackers save on hosting costs. A study [108] found that about 76.5% of websites are hosted on compromised servers.
b: Phishing Using Botnets
Botnets [109] are networks of computers connected to perform specific tasks. These computers ensure the smooth operation of applications like VoIP and chat systems. However, phishers may insert malicious applications into one system, sending numerous emails from the organization’s systems. This type of attack is particularly dangerous and difficult to detect.
c: DNS Injection Attack
Phishing attacks that rely on DNS manipulation have become more sophisticated and dangerous. Many fake websites are hosted, and traffic is diverted using DNS injection methods [108]. Once the DNS cache becomes infected, it starts transmitting data to malicious URLs.
5) Social Networking Phishing Methods
Social networking sites are significant targets for phishing attacks. The following methods are used to exploit these platforms.
a: Malicious URL Sharing
Social media has become integral to daily life, especially for selling products and launching campaigns. Attackers design malicious URLs and share them with members, friends, or company contacts, asking for information related to meetings [50]. This information may include system usernames and passwords. The relevant group interacts with the URL, providing the requested information, which is then transferred to the attacker’s account.
b: Masked URLs
In this attack, the phisher shares a URL while pretending to be the admin of a social media group [50]. The URL links to a dummy form that requests sensitive information. Once group members fill out the form, the attacker uses the information for unauthorized access [24]. Attackers may send messages from hacked accounts or even request loans, pretending to be legitimate contacts [110].
c: Forged Profile
Another deceptive method is using a fake profile. The attacker targets a prominent social media figure, monitoring their profile, posts, and comments daily. After gathering enough information, the attacker creates a similar profile, mirroring the content, and adds new friends to deceive others.
VI. Phishing Detection Based on Webpage Screenshot
Screenshot-based phishing detection is a novel approach that utilizes visual analysis techniques to identify fraudulent web pages [111]. In this technique, a screenshot of a webpage is analyzed for visual elements such as logos, text layout, color schemes, and design patterns to determine whether the website is genuine [112]. Machine learning algorithms are employed to compare the screenshot against legitimate and phishing sites within a database, acting as image processors for anomaly detection or identifying potential threats [113]. This technique is particularly useful for detecting phishing attempts that visually replicate a real website, which might otherwise go unnoticed by traditional text-based detection methods [114]. Additionally, the approach can be integrated with other detection techniques, such as URL analysis and text feature extraction, to enhance the overall accuracy and reliability of phishing detection systems [115]. Phishing detection through screenshots offers powerful protection against growing phishing attacks by focusing on the visual content of web pages [115].
VII. Web Phishing Detection Datasets
In web phishing detection, the latest datasets play a vital role in model training and detection. Several datasets are available online for phishing detection, varying in size from small to large. As phishing attacks continue to increase, there is a growing need for advanced models for detection. A well-trained model relies on the latest datasets with a comprehensive set of features. Table 2 provides details of some existing phishing datasets. In this section, we will review existing datasets, their features, and the models used for feature extraction using these datasets, as well as their advantages and drawbacks.
TABLE 2. Table provides a comprehensive list of datasets used in phishing detection research, categorized by their nature (Legit or Phisher), update year, and accessible URLs for further exploration
Dataset Name | Update Year | Type | URL
Phishing-Tank [116], [117] | 2024 | Phisher | https://phishtank.org/developer_info.php
Alexa-dataset [117] | 2022 | Legit | https://www.similarweb.com/website/alexa.com/
Wein-dataset [118] | 2021 | Phisher | https://jeowein.net/
Crawl-dataset [119] | 2021 | Legit | https://commoncrawl.org/
Open-Phish-dataset [119], [120] | 2021 | Phisher | https://openphish.com/
Phishing-army dataset [119], [120] | 2021 | Phisher | https://www.phishing.army/
Kaggle-phishing dataset [121] | 2021 | Legit | https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning
UCI-Dataset [51] | 2022 | Phisher | https://data.world/uci/phishing-websites
Parsed-dataset [122] | 2022 | Legit | https://doi.org/10.7910/dvn/omv
Yahoo-Phishing [123] | 2022 | Legit | https://webscope.sandbox.yahoo.com/
Yandex-phishing [123] | 2022 | Legit | https://Yandex.com/dev/xml/
Phished-dataset [121] | 2022 | Phisher | https://www.medien.ifi.lmu.de/team
• Phisher-Tank Dataset: The Phisher-Tank dataset contains almost 2 million entries from phishing websites that are blocklisted on the internet. Approximately 90% of these websites are offline due to removal from internet sources, while 11,000 remain active as phishing
sites. This dataset is compiled and maintained by the Talos group of companies. The Phisher-Tank also offers an online service where users can paste a web URL to check its legitimacy. If a URL is identified as phishing, it is automatically added to the Phisher-Tank database.
• Alexa Dataset: Alexa provides an online system that analyzes website performance, checking how efficiently a website operates and whether it contains hidden phishing links. This service, launched and controlled by Amazon, helps identify and report phishing sites.
• Wein Phishing Dataset: The Jet-Wein phishing dataset is an open-source system that uses an API to blocklist phishing websites. This dataset contains approximately 15,000 phishing URLs.
• Crawl Dataset: The Common Crawl dataset is generated using a web crawler. Once URLs are crawled, they are tested with a phishing detector to identify phishing URLs. This dataset contains over 10,000 entries.
• OpenPhish Dataset: OpenPhish is a free open-source platform for web analysis and phishing detection. This system includes almost 19 million phishing links.
A. Feature Extraction Methods
1) Dynamic Feature Extraction Method
In phishing detection, accurate model performance relies on extracting relevant features from the data, enabling the model to train effectively and differentiate between phishing and legitimate websites. Researchers [66], [85], [117] have proposed a dynamic feature extraction method based on feature weights. They extracted 17 features from the dataset, categorized into three groups: address-based features, script-based features, and tag-based features. The work in [85] discusses automatic feature extraction from the URL and address bar without using third-party tools. However, the WHOIS database is used for domain name and registration data. Additionally, page scores are extracted from the Google rank database. After extracting the features, they are weighted based on their scores and averages, which are then used to train and test the model.
2) Machine Learning Feature Extraction
In 2015, researchers proposed a new dataset containing over 11,000 instances with 30 features. These features were extracted using machine learning models to improve phishing detection accuracy.
3) Feature Extraction Using NLP (Natural Language Processing)
The integration of NLP in machine learning has enabled researchers to extract features from phishing URLs more effectively. Character-level characteristics are extracted using machine learning models, classified, and used for model training [124]. NLP facilitates feature extraction from datasets at the character level, allowing the model to use the TF-IDF method for feature scoring. Once extracted, these features are used for model training.
4) Recursive Feature Elimination Method
Known as RFE, this method was introduced in [47]. It involves extracting all features and removing weak ones based on a threshold value.
5) Using Principal Component Analysis Method
Proposed in [47], this method begins with preprocessing, followed by the selection of features after removing redundant and unwanted ones. Techniques like median filtering or adaptive thresholding are commonly used. The wavelet packet transform (WPT) is another method that can be applied in this technique [125].
6) Using Information Gain Method
Information Gain is a popular method for feature extraction in phishing datasets. As discussed in [47], this method uses probability functions to identify vital features based on probability scores, selecting features that meet algorithmic criteria and discarding others.
7) Relief Ranking Filter Method
Used in [126], this algorithm extracts features based on a near-neighbor score algorithm. First applied to the UCI dataset, features are scored, compared with near nodes, and selected using the NNS algorithm. This method identified 22 features from the UCI-Phishing dataset.
8) FRS (Fuzzy Rough Set) Feature Extraction Method
This algorithm, related to rough set theory, identifies related data points and compares nodes and classes to discern between them. For example, it compares every feature of a phishing website with another to check legitimacy. Features are extracted based on class matches, represented as 0 or 1 in the original UCI phishing dataset [126], [127].
9) El-Rashidy Feature Extraction Method
First introduced by El-Rashidy in [128], this algorithm operates in two steps. Initially, it extracts features from the dataset and trains a random forest (RF) model. During training, it assesses accuracy and removes features with lower accuracy. After training, it refines features and selects high-accuracy features for further testing. While effective for small datasets, this method is less suitable for large datasets due to computational costs.
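As a rough illustration of the Information Gain method described in this section, the following Python sketch scores a discrete feature by how much it reduces the entropy of the class labels, then keeps the features whose score reaches a threshold. The function names, the toy feature vectors, and the 0.1 threshold are illustrative assumptions and are not taken from [47].

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(features, labels, threshold):
    """Keep the names of features whose information gain reaches the threshold.

    `features` maps a (hypothetical) feature name to its list of values,
    one value per sample, aligned with `labels`.
    """
    return [name for name, values in features.items()
            if information_gain(values, labels) >= threshold]


# Toy data (assumed): 1 = phishing, 0 = legitimate.
labels = [1, 1, 0, 0]
features = {
    "has_ip_in_url": [1, 1, 0, 0],  # separates the two classes perfectly
    "noise": [1, 0, 1, 0],          # carries no information about the class
}
selected = select_features(features, labels, threshold=0.1)
```

On this toy data only `has_ip_in_url` is kept, since it removes all uncertainty about the label while `noise` removes none. Continuous features would need to be discretized before being scored this way.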
B. URL-Based Feature Extraction Method
This method involves extracting URL properties using various techniques. These properties include URL syntax, domain name, registration and expiration dates, website age, hosting server location, IP address, and DNS details. Extracted features help identify whether a URL is legitimate or phishing. Four main categories of URL feature extraction exist:
1) Hypertext-Based Features
These features relate to the website’s source code and include different HTML tags and forms. They are significant in training data for identifying parsed HTML or phishing HTML pages. Several studies focus on these features, including [131], [45], [127], [128], [130], [132], [133], [129], [134], [135]. There are three main types of these features:
a: Text-Based Properties
This property involves analyzing the complete web page code, including the HTML tags. First, tags are extracted, and a frequency table is used to determine how often each tag appears in the code. The analysis checks for any tampering or unusual information outside the normal structure [136], [137], [138]. These features are used for model training. Additionally, form-related tags are identified and used as feature parameters in the training process. If a website contains graphical visuals, properties related to those graphics are also extracted to identify manipulated pages. These features are crucial for phishing detection [129].
5) Character N-grams
Character N-grams are overlapping sequences of N consecutive characters extracted from a URL, where N varies between 1 and 10. For example, the first three bigrams of the URL "example.com" are "ex", "xa", and "am". This representation is much richer than the bag-of-words approach used by researchers in [11], as it captures punctuation, misspellings, and similar details in URLs. Here N represents the length of the character substring. Table 3 provides a comparative overview of different feature extraction methods.
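The character N-gram extraction described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and counting the N-grams into a single frequency table for N from 1 to 10 is an assumed way of turning them into features, not a procedure specified in the reviewed studies.

```python
from collections import Counter


def char_ngrams(url, n):
    """Return the overlapping character N-grams of `url`, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def ngram_counts(url, n_min=1, n_max=10):
    """Count all N-grams of `url` for every N in [n_min, n_max]."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(char_ngrams(url, n))
    return counts


bigrams = char_ngrams("example.com", 2)
first_three = bigrams[:3]  # ['ex', 'xa', 'am'], matching the example in the text
```

Because the sliding window moves one character at a time, a URL of length L yields L - N + 1 N-grams for each N, so punctuation such as "." and deliberate misspellings each contribute their own features.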
TABLE 3. The table categorizes extracted features from various studies into different properties such as lexical, statistical, network, reputation, textual/visual, and traffic. It lists the feature types, the
number of features extracted, and the third-party services used for each category, providing a comprehensive overview of methods and tools applied in URL-based phishing detection research.
VIII. Anti-Phishing Methods
In this section, we will discuss different approaches to anti-phishing. These approaches include classical and state-of-the-art methods, including machine learning (ML) and deep learning (DL). The basic structure of anti-phishing methods is shown in Figure 6.
A. User-Based Anti-Phishing Approaches
1) User-Based Anti-Phishing Technique
a: Educating Anti-Phishing Technique
Humans are often unaware of new threats, which makes education essential to learn about new methods and techniques. Education is crucial to teach people about phishing attacks, how phishing works, and how to identify phishing emails. Organizations worldwide must train their staff about phishing threats. Similarly, all government agencies should educate the public about phishing [147]. By educating employees and the general public, phishing attacks can be controlled.
b: Awareness About Security Warning Anti-Phishing Technique
Most phishing detection methods use browser plugins, which quickly identify suspicious websites and alert users when visiting a potentially dangerous site. Understanding security warning signs is crucial when human intervention is required. If users ignore these warnings, it could lead to negative consequences. Proper training on the recognition of security indicators is essential. Studies have shown that 60% of the users ignore warnings and proceed to phishing URLs without training, while the click-through rate for trained users was 0%.
There are two types of warning: active warnings that prevent users from accessing phishing URLs and passive warnings that display a message while allowing access. Most contemporary web browsers, such as Mozilla Firefox and Google Chrome, use passive warnings. Active warnings are more effective, since many users tend to ignore passive warnings. A study with 60 participants found passive warnings inadequate; 79% noticed active warnings, while only 13% noticed passive warnings [148], [149], [150].
c: Training Using Games Anti-Phishing Technique
Training methods that incorporate games are advantageous because they are convenient and easy to learn, providing a natural setting for teaching. Various developers have created interactive teaching tools to educate users on recognizing phishing attempts.
Before-and-after studies demonstrated the effectiveness of training games. Participants who played these games showed an increased awareness of phishing emails and websites. The training was integrated into users’ daily routines, making it user-friendly. Periodically, instructive notes were sent to users after the program began. Research showed that only 30% of trained users clicked on fake links in emails they learned to recognize. Moreover, findings indicated that interactive training sessions are more successful than warning notifications [148], [149], [150].
A serious game was created to enhance users’ ability to recognize phishing URLs. The game, "Phisher," is a solo, lightweight, intuitive, and narrative-driven game. The scenario begins with the player receiving several messages claiming they won a substantial sum of $600,000 and would be sent to an island if they input bank details. The player must capture the scammer using a boat, a hungry tiger, and no money, returning to the beach to survive. The gameplay progresses as the player answers questions about phishing. Pre- and post-game surveys of the participants showed that the number of correct answers increased from 4-7 to 5-8 after playing the game [148], [149]. The confidence level of accuracy rose from 4.09 to 4.47 (p < 0.05), and the accuracy improved from 0.70 to 0.795 (p = 4.12 × 10^-142). The false negative rate decreased from 0.22 to 0.14 (p = 5.03 × 10^-91), while the false positive rate decreased from 0.34 to 0.25 (p = 7.71 × 10^-76). 25% of the participants played the game more than once, indicating the appeal and participation of the game [148], [149].
d: User Response Anti-Phishing Technique
Research conducted by various academics has investigated why individuals become susceptible to phishing schemes. These studies also evaluate whether users examine the URL, browser toolbar, or other security indicators. Many computer users are targeted due to their ignorance of warning signs and indicators when using toolbars. The survey found that most of the participants had no bank account and were unfamiliar with financial jargon, making them unable to recognize 90% of fraudulent sites, even when they visually resembled legitimate ones. The studies discovered that 23% of the users prefer to avoid verifying URLs. Researchers studied phishing attacks, specifically spear phishing, among 158 volunteers of various age groups. The study found that older women were more susceptible to phishing attacks compared to other demographics. Scarcity was more prevalent among young people, while reciprocation was more prevalent among elderly adults [88].
B. Classical Methods
1) URL-Based Method
a: Blocklist URLs
This method uses anti-phishing tools like Phish-Net, Google Safe Browsing, and PhishTank to generate a list of URLs. It works by matching suspected phishing URLs against the blocklist. If a match is found, the URL is identified as phishing; otherwise, it is considered legitimate. This method is beneficial for quickly identifying known phishing URLs but is not effective for new URLs. Therefore, the blocklist must be updated daily [151].
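The blocklist matching step can be sketched as a set lookup on normalized URLs. The following Python example is a minimal sketch under stated assumptions: the normalization rule (lowercased host plus path, with the scheme, query string, and trailing slash ignored) and the sample entries are hypothetical, and it does not reproduce how Phish-Net, Google Safe Browsing, or PhishTank store or match their lists.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to lowercased host plus path (assumed normalization rule).

    Assumes the URL includes a scheme such as http:// or https://.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path


def build_blocklist(known_phishing_urls):
    """Store known phishing URLs in normalized form for constant-time lookup."""
    return {normalize_url(u) for u in known_phishing_urls}


def classify_url(url, blocklist):
    """Label a URL 'phishing' if it is on the blocklist, else 'legitimate'."""
    return "phishing" if normalize_url(url) in blocklist else "legitimate"


# Hypothetical blocklist entry; the .test domain is reserved for examples.
blocklist = build_blocklist(["http://login.example-bank.test/verify"])

known = classify_url("HTTP://Login.Example-Bank.test/verify/", blocklist)
unseen = classify_url("http://login.example-bank.test/verify2", blocklist)
```

Here `known` is labeled phishing, while `unseen` is labeled legitimate simply because it has not been listed yet, which is the weakness noted above for newly created URLs.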
FIGURE 6. The diagram of classical and AI-based anti-phishing models categorizes various anti-phishing techniques into three main groups:
user-based approaches, classical methods, and AI methods. It outlines specific strategies within each category, ranging from educating users and
employing URL-based methods to advanced machine learning and deep learning models. This comprehensive framework highlights the multi-faceted
approach necessary to effectively combat phishing threats.
b: Whitelist URLs
This list contains only legitimate URLs with no associated phishing sources or code. All URLs that are not identified as phishing are added and managed in this list. When a URL is visited, it is compared with the whitelist to verify its legitimacy. If a URL is identified as phishing, it is marked for the blocklist. However, if it matches the whitelist, it is considered safe. The verification mechanism is often integrated within search engines or browser extensions like the Google toolbar [26].

c: Heuristics
Heuristics involve analyzing URL details such as domain name, domain path, website rank, Alexa ranking, and reputation score. A suspected phishing URL is tested against heuristic criteria. If the URL satisfies criteria such as domain name, path, address, and rank, it is considered legitimate. Otherwise, it is deemed phishing and added to the blocklist. Due to the short lifespan of phishing URLs, they rarely achieve high rankings [152].

Table 4 shows that the blocklist method performs well, but there are some limitations associated with this approach. This method does not identify new phishing URLs, requiring the URL list to be updated daily. Similarly, whitelist and heuristic lists also need to be updated regularly. However, if all these methods are combined and implemented with automated list updates, they would likely provide a more effective and reliable anti-phishing solution than any one of them alone.

TABLE 4. The table provides a comparison between URL-based anti-phishing methods, detailing their limitations, proposed enhancements, and accuracy levels. It offers insights into how integrating these methods with advanced technologies like machine learning and AI can improve efficacy and reduce manual effort in identifying phishing threats.

| Author | Method | Gaps | Solution | Accuracy |
|---|---|---|---|---|
| [153] | Blocklist URLs | Needs daily updates | Combine with other methods like whitelists and heuristics | High for known phishing sites |
| [26] | Whitelist URLs | Requires manual effort to maintain | Combine with automated tools and regular updates | High for whitelisted sites |
| [154] | Heuristics | Requires advanced analysis and expertise | Use machine learning and AI to improve accuracy and reduce false positives/negatives | Varies depending on the implementation |

is more efficient than HTML source code analysis and keyword matching. These detection methods can be enhanced by combining different approaches. If keyword matching, HTML source code analysis, and visual similarity methods are combined, the resulting output will be superior to other classical methods discussed in Table 4.

TABLE 5. The table provides a comparison of different methods for analyzing HTML source code in anti-phishing solutions, highlighting their gaps, proposed solutions, accuracy, and overall suitability for various security needs. It underscores the trade-offs between ease of implementation and the level of security provided, suggesting combined approaches for enhanced protection.
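The combination of blocklist, whitelist, and heuristic checks compared in Table 4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the list contents, the function names `heuristic_score` and `classify_url`, the individual heuristic rules, and the threshold of 2 are hypothetical and do not come from the reviewed studies.

```python
from urllib.parse import urlparse

# Hypothetical example lists; a real deployment would refresh them
# automatically from maintained feeds.
BLOCKLIST = {"login-paypa1.example"}
WHITELIST = {"example.com", "example.org"}


def heuristic_score(url: str) -> int:
    """Count simple warning signs in a URL (illustrative rules only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    score = 0
    if parsed.scheme != "https":
        score += 1  # page is not served over TLS
    if host.replace(".", "").isdigit():
        score += 1  # raw IPv4 address instead of a domain name
    if host.count(".") > 3:
        score += 1  # unusually deep subdomain nesting
    if "@" in url or len(url) > 75:
        score += 1  # common obfuscation tricks
    return score


def classify_url(url: str, threshold: int = 2) -> str:
    """Check the blocklist, then the whitelist, then fall back to heuristics."""
    host = urlparse(url).hostname or ""
    if host in BLOCKLIST:
        return "phishing"
    if host in WHITELIST:
        return "legitimate"
    if heuristic_score(url) >= threshold:
        BLOCKLIST.add(host)  # failed the heuristics: remember it for next time
        return "phishing"
    return "legitimate"


for candidate in ("https://example.com/account",
                  "http://login-paypa1.example/verify",
                  "http://192.168.10.5/secure/login"):
    print(candidate, "->", classify_url(candidate))
```

The lists are checked first because a set lookup is cheap and exact, and the heuristics only run for URLs that neither list knows about. Adding a failed URL to the blocklist mirrors the automated list update that the text recommends.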
Shakeel et al.: Across the Spectrum In-Depth Review AI-Based Models for Phishing Detection
ined. Machine learning and deep learning models are trained using labeled data to classify phishing and legitimate traffic. However, this method may not provide 100% accuracy due to a high false positive rate. While it can help secure the system, more advanced methods are needed for precise traffic analysis [153].

phishing and legitimate URLs. The extension can block JavaScript and provide alerts for any phishing URLs.

2) Anti-Phishing Toolbars
Toolbar-based solutions are presented as browser extensions that must be installed to prevent phishing attacks. When users visit a phishing website, the toolbar assesses the website’s credibility and warns users to avoid fraud.

E. Search Engine-Based Methods
In this method, when a site is visited through a search engine, its page ranking in the search results is considered. The search engine indexes the website based on its lifespan and visit statistics. New websites typically do not rank at the top of search results. Search engine detection is classified into two methods:

1) Logo-Based Technique
This is an older method to detect original URLs using a search engine. It involves extracting the logo of the original website and searching for it to find legitimate URLs.

2) Information Retrieval Technique
In this method, the search engine extracts features of a website, including web page tags. The extracted tags are used in search queries to discover phishing websites. The Google Chrome browser uses a phishing detection extension to analyze website content, including domain names and web page titles.

F. AI Methods
AI plays a vital role in developing anti-phishing models, which perform well using different feature extraction methods to detect phishing websites. Several models have been developed and proposed as standards for phishing detection within the industry. However, due to the high rate of phishing attacks and new phishing techniques, these methods need continuous improvement. AI models are divided into two categories:

$$E(T, X) = \sum_{c \in X} p(c) \cdot E(c) \qquad \text{(Equation-2)}$$

$$IG(T, X) = E(T) - E(T, X) \qquad \text{(Equation-3)}$$

Equations 1, 2, and 3 are central to optimizing Decision Tree models for phishing detection. Entropy (E(s)) measures the impurity of a data set, conditional entropy (E(T, X)) evaluates the entropy after splitting by attribute X, and Information Gain (IG(T, X)) calculates the reduction in entropy due to the split. These metrics help to determine the most informative attributes for splitting nodes in the model.

b: Random Forest Model
The random forest (RF) is a classifier that is used to categorize data into different classes. It is a highly effective model that is often used to solve classification problems [159]. Like a decision tree, this model organizes data into various categories as given in Equation 4. It aggregates the results of its individual trees to predict classes [158], [160].

$$E(s) = \frac{1}{B} \sum_{i=0}^{B} f_i(x_t) \qquad \text{(Equation-4)}$$

c: Support Vector Machine Model
Support Vector Machine (SVM) is a supervised machine learning model used for binary classification problems as given in Equations 5, 6, 7, and 8. SVM handles problems that are not linearly separable by using a kernel function, which eliminates the need to transform the data manually, as the SVM kernel handles this task. Additionally, there is no need to make assumptions about feature extraction or selection, as the SVM kernel manages these aspects. Combined with a nature-inspired optimization algorithm, SVM provides a robust solution for classification problems such as phishing detection and spam identification. It classifies data into two classes using a hyperplane, predicting the class based on the new value’s position relative to the hyperplane [146].
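The hyperplane rule described for SVM can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard linear decision function f(x) = w · x + b rather than the paper's own Equations 5 to 8. The weight vector `W`, the bias `B`, the three URL features, and the function names are hypothetical values chosen for the example, not a trained model. The `rbf_kernel` helper shows one common kernel, the Gaussian radial basis function, as an example of the similarity measure a kernel SVM uses in place of a manual feature transformation.

```python
import math
from typing import Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Signed score w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Sequence[float], x: Sequence[float], b: float) -> str:
    """Label a feature vector by its position relative to the hyperplane."""
    return "phishing" if decision_value(w, x, b) >= 0 else "legitimate"


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian RBF kernel: similarity of x and z in an implicit feature space."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical weights for three assumed features:
# [normalized URL length, host is an IP address (0/1), normalized dot count]
W = [0.8, 1.5, 0.4]
B = -1.0

print(predict(W, [0.2, 0.0, 0.5], B))  # short URL, no IP host
print(predict(W, [0.9, 1.0, 1.0], B))  # long URL with an IP host
```

In a trained SVM the weights and bias come from solving the margin-maximization problem; here they are fixed by hand only to show how a new sample is assigned to a class.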
$$y_t = \mathrm{softmax}(W_{yh} h_t + b_y) \qquad \text{(Equation-23)}$$

$$L(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i) \qquad \text{(Equation-24)}$$

Equation 16: RNN Mathematical Model with Loss Function
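Equations 23 and 24 can be checked numerically with a short, dependency-free Python sketch. The function names, the 2x2 weight matrix, and the hidden state below are illustrative assumptions; only the two formulas themselves (a softmax output layer followed by a cross-entropy loss over the class index i) are taken from the text.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer(W_yh: Sequence[Sequence[float]],
                 h_t: Sequence[float],
                 b_y: Sequence[float]) -> List[float]:
    """Equation 23: y_t = softmax(W_yh h_t + b_y)."""
    logits = [sum(w * h for w, h in zip(row, h_t)) + b
              for row, b in zip(W_yh, b_y)]
    return softmax(logits)


def cross_entropy(y: Sequence[float], y_hat: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Equation 24: L(y, y_hat) = -sum_i y_i log(y_hat_i)."""
    return -sum(yi * math.log(max(pi, eps)) for yi, pi in zip(y, y_hat))


# Hypothetical two-class setup: index 0 = phishing, index 1 = legitimate.
W_yh = [[1.0, 0.0], [0.0, 1.0]]
b_y = [0.0, 0.0]
h_t = [2.0, 0.0]       # assumed hidden state at the last time step
y_true = [1.0, 0.0]    # one-hot label: phishing

y_hat = output_layer(W_yh, h_t, b_y)
loss = cross_entropy(y_true, y_hat)
print(y_hat, loss)
```

The small `eps` guard is an implementation detail added here to avoid taking the logarithm of zero; it is not part of Equation 24.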
TABLE 6. The table provides a comparative analysis of various phishing detection models, detailing their method types, datasets, main challenges, limitations, and model accuracy. It highlights the performance metrics and constraints of each model, showcasing how they fare against different types of phishing datasets and under varying conditions.

| Model | Method Type | Dataset | Main Challenges | Limitations | Accuracy |
|---|---|---|---|---|---|
| Convolutional Autoencoder + DNN [167] | Hybrid Model | Phishing Tank | Lower accuracy than some methods, small dataset | Rule-based features, potentially limited information | 0.8900 |
4) Autonomous System Development: Developing an autonomous system that continuously updates itself with the latest phishing attacks and maintains an up-to-date database of phishing URLs is a significant challenge [18], [26]. Real-time environments that automatically learn and retrain themselves are critical to ensuring user safety [38].
5) User Awareness and Education: Despite technical advancements, many users remain unaware of phishing attacks, including educated individuals who might inadvertently click on malicious URLs [74]. Conducting training sessions and awareness campaigns is crucial to educate people about phishing threats and prevention strategies [147].
6) Dependency on Third-Party Services for Feature Extraction: Most feature extraction methods rely on third-party services to extract specific information such as domain names, DNS registration, and host names. These services are often paid and may not always provide up-to-date information, leading to higher error rates in models [26].
7) Handling Tiny URLs: The literature lacks specific mechanisms for handling tiny URLs, which are difficult to track and verify for phishing content. Educating users about the credibility of URLs, whether from known or unknown sources, is crucial [37], [131]. Designing a system to identify whether tiny URLs are phished or legitimate remains a significant research challenge [130].
8) Limitations of Rule-Based and List-Based Models: Rule-based and list-based models, while effective in identifying phished or legitimate URLs, require frequent updates and can have slow detection speeds, resulting in high response times [154]. Designing systems capable of handling phishing links spread through various devices presents a substantial challenge [133].

XI. Conclusion
In conclusion, datasets with few phishing URLs can adversely affect model performance when tested on larger datasets. Although blacklisting, whitelisting, and rule-based detection methods are effective, they are constrained by their lists or rules. Machine learning models have been introduced in some research, but these are limited to specific features. Even when using NLP-based machine learning models for feature extraction, third-party services are still necessary. A comprehensive and dynamic system capable of handling all types of attack, running across all devices, and updating dynamically with new phishing techniques is still needed. Continued research is necessary to develop benchmark datasets and systems for both offline and real-time detection.

REFERENCES
[1] D. Kalla, F. Samaah, S. Kuraku, and N. Smith, “Phishing detection implementation using databricks and artificial intelligence,” International Journal of Computer Applications, vol. 185, no. 11, pp. 1–11, 2023.
[2] A. Safi and S. Singh, “A systematic literature review on phishing website detection techniques,” Journal of King Saud University-Computer and Information Sciences, vol. 35, no. 2, pp. 590–611, 2023.
[3] B. Naqvi, K. Perova, A. Farooq, I. Makhdoom, S. Oyedeji, and J. Porras, “Mitigation strategies against the phishing attacks: A systematic literature review,” Computers & Security, p. 103387, 2023.
[4] M. K. Pandey, M. K. Singh, S. Pal, and B. Tiwari, “Prediction of phishing websites using machine learning,” Spatial Information Research, vol. 31, no. 2, pp. 157–166, 2023.
[5] C. Cross, “‘I knew it was a scam’: Understanding the triggers for recognizing romance fraud,” Criminology & Public Policy, vol. 22, no. 4, pp. 613–637, 2023.
[6] L. Brotherston, A. Berlin, and W. F. Reyor III, Defensive security handbook. O’Reilly Media, Inc., 2024.
[7] T. Xu, K. Singh, and P. Rajivan, “Personalized persuasion: Quantifying susceptibility to information exploitation in spear-phishing attacks,” Applied Ergonomics, vol. 108, p. 103908, 2023.
[8] M. Nadeem, S. W. Zahra, M. N. Abbasi, A. Arshad, S. Riaz, and W. Ahmed, “Phishing attack, its detections and prevention techniques,” International Journal of Wireless Security and Networks, vol. 1, no. 2, pp. 13–25, 2023.
[9] R. Goenka, M. Chawla, and N. Tiwari, “A comprehensive survey of phishing: Mediums, intended targets, attack and defence techniques and a novel taxonomy,” International Journal of Information Security, vol. 23, no. 2, pp. 819–848, 2024.
[10] I. Ahmad, S. Khan, and S. Iqbal, “Guardians of the vault: unmasking online threats and fortifying e-banking security, a systematic review,” Journal of Financial Crime, 2024.
[11] F. S. Alsubaei, A. A. Almazroi, and N. Ayub, “Enhancing phishing detection: A novel hybrid deep learning framework for cybercrime forensics,” IEEE Access, 2024.
[12] S. Asiri, Y. Xiao, S. Alzahrani, S. Li, and T. Li, “A survey of intelligent detection designs of html url phishing attacks,” IEEE Access, vol. 11, pp. 6421–6443, 2023.
[13] Y. Guo, “A review of machine learning-based zero-day attack detection: Challenges and future directions,” Computer communications, vol. 198, pp. 175–185, 2023.
[14] A. S. Albahri, A. M. Duhaim, M. A. Fadhel, A. Alnoor, N. S. Baqer, L. Alzubaidi, O. S. Albahri, A. H. Alamoodi, J. Bai, A. Salhi et al., “A systematic review of trustworthy and explainable artificial intelligence in healthcare: Assessment of quality, bias risk, and data fusion,” Information Fusion, vol. 96, pp. 156–191, 2023.
[15] S. Kemp, “Global overview report,” Https://Datareportal.Com/Reports/Digital-2022-Global-Overview-Report, 2022.
[16] A.P.W.G., “Apwg phishing trends report 2nd quarter 2022,” Anti-Phishing Working Group (APWG), no. September, 2022.
[17] B. Gontla, P. Gundu, P. Uppalapati, K. Rao, and S. Hussain, “A machine learning approach to identify phishing websites: A comparative study of classification models and ensemble learning techniques,” EAI Endorsed Transactions on Scalable Information Systems, vol. 10, no. 5, 2023.
[18] S. Santos, P. Costa, and A. Rocha, “It/ot convergence in industry 4.0: Risks and analysis of the problems,” in Iberian Conference on Information Systems and Technologies, CISTI, 2023.
[19] N. Do, A. Selamat, O. Krejcar, E. Herrera-Viedma, and H. Fujita, “Deep learning for phishing detection: Taxonomy, current challenges, and future directions,” IEEE Access, vol. 10, 2022.
[20] U. Agarwal, “Blockchain technology for secure supply chain management: A comprehensive review,” IEEE Access, 2022.
[21] S. Zahra, M. Chishti, A. Baba, and F. Wu, “Detecting covid-19 chaos driven phishing/malicious url attacks by a fuzzy logic and data mining based intelligence system,” Egyptian Informatics Journal, vol. 23, no. 2, 2022.
[22] B. Naqvi, K. Perova, A. Farooq, I. Makhdoom, S. Oyedeji, and J. Porras, “Mitigation strategies against the phishing attacks: A systematic literature review,” Computers and Security, vol. 132, 2023.
[23] A. Safi and S. Singh, “A systematic literature review on phishing website detection techniques,” Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 2, 2023.
[24] A. Jain, S. Sahoo, and J. Kaubiyal, “Online social networks security and privacy: comprehensive review and analysis,” Complex and Intelligent Systems, vol. 7, no. 5, 2021.
[25] S. Abad, H. Gholamy, and M. Aslani, “Classification of malicious urls using machine learning,” Sensors, vol. 23, no. 18, 2023-09.
[26] I. Kotenko, “Detection of anomalies and attacks in container systems: An integrated approach based on black and white lists,” in Lecture Notes in Networks and Systems, 2023.
[27] T. Pattewar, C. Mali, S. Kshire, M. Sadarao, J. Salunkhe, and A. Shah, “Malicious short urls detection: A survey,” International Research Journal of Engineering and Technology, 2019.
[28] O. Abiodun, S. A.S, and K. S.O, “Linkcalculator – an efficient link-based phishing detection tool,” Acta Informatica Malaysia, vol. 4, no. 2, 2020.
[29] P. Yang, G. Zhao, and P. Zeng, “Phishing website detection based on multidimensional features driven by deep learning,” IEEE Access, vol. 7, 2019.
[30] L. Tang and Q. Mahmoud, “A survey of machine learning-based solutions for phishing website detection,” Machine Learning and Knowledge Extraction, vol. 3, no. 3, 2021.
[31] S. Anupam and A. Kar, “Phishing website detection using support vector machines and nature-inspired optimization algorithms,” Telecommun Syst, vol. 76, no. 1, pp. 17–32, 2021-01.
[32] V. Shahrivari, M. Darabi, and M. Izadi, “Phishing detection using machine learning techniques,” 2020-09. [Online]. Available: http://arxiv.org/abs/2009.11116
[33] J. Rashid, T. Mahmood, M. Nisar, and T. Nazir, “Phishing detection using machine learning technique,” in Proceedings - 2020 1st International Conference of Smart Systems and Emerging Technologies, SMART-TECH 2020. Institute of Electrical and Electronics Engineers Inc, 2020-11, p. 43–46.
[34] S. Alnemari and M. Alshammari, “Detecting phishing domains using machine learning,” Applied Sciences (Switzerland), vol. 13, no. 8, 2023.
[35] A. Dutta, “Detecting phishing websites using machine learning technique,” PLoS One, vol. 16, no. 10, 2021-10.
[36] E. Gandotra and D. Gupta, “Improving spoofed website detection using machine learning,” Cybern Syst, vol. 52, no. 2, pp. 169–190, 2021.
[37] B. Waseso and N. Setiyanto, “Web phishing classification using combined machine learning methods,” Journal of Computing Theories and Applications, vol. 1, no. 1, pp. 11–18, 2023-08.
[38] G. Lokesh and G. BoreGowda, “Phishing website detection based on effective machine learning approach,” Journal of Cyber Security Technology, vol. 5, no. 1, pp. 1–14, 2021-01.
[39] M. Lei, Y. Xiao, S. Vrbsky, and C. Li, “Virtual password using random linear functions for on-line services, atm machines, and pervasive computing,” Comput Commun, vol. 31, no. 18, 2008.
[40] A. Basit, M. Zafar, X. Liu, A. Javed, Z. Jalil, and K. Kifayat, “A comprehensive survey of ai-enabled phishing attacks detection techniques,” Telecommunication Systems, vol. 76, no. 1, 2021.
[41] U. Butt, R. Amin, H. Aldabbas, S. Mohan, B. Alouffi, and A. Ahmadian, “Cloud-based email phishing attack using machine and deep learning algorithm,” Complex and Intelligent Systems, vol. 9, no. 3, pp. 3043–3070, 2023-06.
[42] M. Sabahno and F. Safara, “Isho: improved spotted hyena optimization algorithm for phishing website detection,” Multimed Tools Appl, vol. 81, no. 24, pp. 34 677–34 696, 2022-10.
[43] A. Odeh, I. Keshta, and E. Abdelfattah, “Phiboost-a novel phishing detection model using adaptive boosting approach,” 2021.
[44] P. Barraclough, G. Fehringer, and J. Woodward, “Intelligent cyber-phishing detection for online,” Comput Secur, vol. 104, 2021-05.
[45] B. Gupta, K. Yadav, I. Razzak, K. Psannis, A. Castiglione, and X. Chang, “A novel approach for phishing urls detection using lexical based machine learning in a real-time environment,” Comput Commun, vol. 175, pp. 47–57, 2021-07.
[46] C. Singh and Meenu, “Phishing website detection based on machine learning: A survey,” in 2020 6th International Conference on Advanced Computing and Communication Systems, ICACCS 2020, 2020.
[47] A. Zamir, “Phishing web site detection using diverse machine learning algorithms,” Electronic Library, vol. 38, no. 1, pp. 65–80, 2020-03.
[48] M. Vijayalakshmi, S. Shalinie, M. Yang, and U. Meenakshi, “Web phishing detection techniques: A survey on the state-of-the-art, taxonomy and future directions,” IET Networks, vol. 9, no. 5, pp. 235–246, 2020-09-01.
[49] A. Jain and B. Gupta, “A survey of phishing attack techniques, defence mechanisms and open research challenges,” Enterprise Information Systems, vol. 16, no. 4, pp. 527–565, 2022.
[50] M. Bhattacharya, S. Roy, S. Chattopadhyay, A. Das, and S. Shetty, “A comprehensive survey on online social networks security and privacy issues: Threats, machine learning-based solutions, and open challenges,” SECURITY AND PRIVACY, vol. 6, no. 1, 2023.
[51] S. Samad, “Analysis of the performance impact of fine-tuned machine learning model for phishing url detection,” Electronics (Switzerland), vol. 12, no. 7, 2023.
[52] M. Almousa, T. Zhang, A. Sarrafzadeh, and M. Anwar, “Phishing website detection: How effective are deep learning-based models and hyperparameter optimization?” SECURITY AND PRIVACY, vol. 5, no. 6, 2022.
[53] S. Salloum, T. Gaber, S. Vadera, and K. Shaalan, “Phishing email detection using natural language processing techniques: A literature survey,” in Procedia CIRP, 2021.
[54] M. Korkmaz, O. Sahingoz, and B. Diri, “Feature selections for the classification of webpages to detect phishing attacks: A survey,” in HORA 2020 - 2nd International Congress on Human-Computer Interaction, Optimization and Robotic Applications, Proceedings, 2020.
[55] M. Hr, A. Mv, S. Prasad, and S. Vinay, “Development of anti-phishing browser based on random forest and rule of extraction framework,” Cybersecurity, vol. 3, no. 1, 2020-12.
[56] P. Kumar, T. Jaya, and V. Rajendran, “Si-bba – a novel phishing website detection based on swarm intelligence with deep learning,” Mater Today Proc, vol. 80, pp. 3129–3139, 2023-01.
[57] L. Abdulrahman, S. Ahmed, Z. Rashid, Y. Jghef, T. Ghazi, and U. Jader, “Web phishing detection using web crawling, cloud infrastructure and deep learning framework,” Journal of Applied Science and Technology Trends, vol. 4, no. 01, pp. 54–71, 2023-03.
[58] P. Kalaharsha and B. Mehtre, “Detecting phishing sites – an overview,” Mar, 2021. [Online]. Available: http://arxiv.org/abs/2103.12739
[59] R. Pravali, S. Raha, Y. Rachana, and D. Kamesh, “Ensemble machine learning model for phishing intrusion detection and classification from urls,” 2023.
[60] L. Jovanovic, “Improving phishing website detection using a hybrid two-level framework for feature selection and xgboost tuning,” Journal of Web Engineering, vol. 22, no. 3, pp. 543–574, 2023-07.
[61] A. Lakshmanarao, P. P. Rao, and M. Krishna, “Phishing website detection using novel machine learning fusion approach,” in Proceedings - International Conference on Artificial Intelligence and Smart Systems, ICAIS 2021. Institute of Electrical and Electronics Engineers Inc, 2021-03, p. 1164–1169.
[62] L. Lakshmi, M. Reddy, C. Santhaiah, and U. Reddy, “Smart phishing detection in web pages using supervised deep learning classification and optimization technique adam,” Wirel Pers Commun, vol. 118, no. 4, pp. 3549–3564, 2021-06.
[63] A. Karim, M. Shahroz, K. Mustofa, S. Belhaouari, and S. Joga, “Phishing detection system through hybrid machine learning based on url,” IEEE Access, vol. 11, pp. 36 805–36 822, 2023.
[64] R. Zieni, L. Massari, and M. Calzarossa, “Phishing or not phishing? a survey on the detection of phishing websites,” IEEE Access, vol. 11, pp. 18 499–18 519, 2023.
[65] M. Shaukat, R. Amin, M. Muslam, A. Alshehri, and J. Xie, “A hybrid approach for alluring ads phishing attack detection using machine learning,” Sensors, vol. 23, no. 19, 2023-10.
[66] L. Yang, J. Zhang, X. Wang, Z. Li, Z. Li, and Y. He, “An improved elm-based and data preprocessing integrated approach for phishing detection considering comprehensive features,” Expert Syst Appl, vol. 165, 2021-03.
[67] V. Praveena, A. Vijayaraj, P. Chinnasamy, I. Ali, R. Alroobaea, S. Y. Alyahyan, and M. A. Raza, “Optimal deep reinforcement learning for intrusion detection in uavs,” Computers, Materials & Continua, vol. 70, no. 2, pp. 2639–2653, 2022.
[68] P. Chinnasamy, K. S. Sathya, B. J. A. Jebamani, A. Nithyasri, and S. Fowjiya, “Deep learning: Algorithms, techniques, and applications – a systematic survey,” in Deep Learning Research Applications for Natural Language Processing. IGI global, 2023, pp. 1–17.
[69] M. Korkmaz, O. Sahingoz, and B. Diri, “Detection of phishing websites by using machine learning-based url analysis,” 2020.
[70] M. Adebowale, K. Lwin, and M. Hossain, “Intelligent phishing detection scheme using deep learning algorithms,” Journal of Enterprise Information Management, vol. 36, no. 3, pp. 747–766, 2023-04.
[71] Y. Alsariera, V. Adeyemo, A. Balogun, and A. Alazzawi, “Ai meta-learners and extra-trees algorithm for the detection of phishing websites,” IEEE Access, vol. 8, pp. 142 532–142 542, 2020.
[72] A. Maci, A. Santorsola, A. Coscia, and A. Iannacone, “Unbalanced web phishing classification through deep reinforcement learning,” Computers, vol. 12, no. 6, 2023-06.
[73] R. Goenka, M. Chawla, and N. Tiwari, “A comprehensive survey of phishing: Mediums, intended targets, attack and defence techniques and a novel taxonomy,” International Journal of Information Security, vol. 23, no. 2, pp. 819–848, 2024.
[74] L. Gallo, D. Gentile, S. Ruggiero, A. Botta, and G. Ventre, “The human factor in phishing: Collecting and analyzing user behavior when reading emails,” Computers & Security, vol. 139, p. 103671, 2024.
[75] R. Paudel and M. N. Al-Ameen, “Priming through persuasion: Towards secure password behavior,” Proceedings of the ACM on Human-Computer Interaction, vol. 8, no. CSCW1, pp. 1–27, 2024.
[76] H. Gururaj, V. Janhavi, and V. Ambika, Social Engineering in Cybersecurity: Threats and Defenses. CRC Press, 2024.
[77] A. Juanna, M. A. S. Monoarfa, R. Podungge, and R. Tantawi, “Identification of trends in business promotion and marketing using video-based content on social media,” Jambura Science of Management, vol. 6, no. 2, pp. 88–103, 2024.
[78] N. Akyeşilmen and A. Alhosban, “Non-technical cyber-attacks and international cybersecurity: The case of social engineering,” Gaziantep University Journal of Social Sciences, vol. 23, no. 1, pp. 342–360, 2024.
[79] D. Senecal, The Reign of Botnets: Defending Against Abuses, Bots and Fraud on the Internet. John Wiley & Sons, 2024.
[80] K. Church and R. De Oliveira, “What’s up with whatsapp? comparing mobile instant messaging behaviors with traditional sms,” in Proceedings of the 15th international conference on Human-computer interaction with mobile devices and services, 2013, pp. 352–361.
[81] Z. Alkhalil, C. Hewage, L. Nawaf, and I. Khan, “Phishing attacks: A recent comprehensive study and a new anatomy,” Frontiers in Computer Science, vol. 3, p. 563060, 2021.
[82] P. Syiemlieh, G. M. Khongsit, U. M. Sharma, and B. Sharma, “Phishing-an analysis on the types, causes, preventive measures and case studies in the current situation,” IOSR J. Comput. Eng, vol. 9, pp. 2278–8727, 2015.
[83] W. Kim, O.-R. Jeong, C. Kim, and J. So, “The dark side of the internet: Attacks, costs and responses,” Information systems, vol. 36, no. 3, pp. 675–705, 2011.
[84] N. Knopf, “Social engineering: How crowdmasters, phreaks, hackers, and trolls created a new form of manipulative communication – robert w,” IEEE Technology and Society Magazine, vol. 42, no. 1, p. 344, 2022.
[85] B. Dooremaal, P. Burda, L. Allodi, and N. Zannone, “Combining text and visual features to improve the identification of cloned webpages for early phishing detection,” in ACM International Conference Proceeding Series, 2021.
[86] I. Tomicic, “Social engineering aspects of email phishing: an overview and taxonomy,” in 2023 46th ICT and Electronics Convention, MIPRO 2023 - Proceedings, 2023.
[87] S. Dadvandipour and A. Ganie, “Analyzing and predicting spear-phishing using machine learning methods,” Multidiszciplináris tudományok, vol. 10, no. 4, 2020.
[88] H. Oz, A. Aris, A. Levi, and A. Uluagac, “A survey on ransomware: Evolution, taxonomy, and defense solutions,” ACM Comput Surv, vol. 54, no. 11s, 2022.
[89] Ekta and U. Bansal, “A review on ransomware attack,” in ICSCCC 2021 - International Conference on Secure Cyber Computing and Communications, 2021.
[90] S. Ullah, T. Ahmad, A. Buriro, N. Zara, and S. Saha, “Trojandetector: A multi-layer hybrid approach for trojan detection in android applications,” Applied Sciences (Switzerland), vol. 12, no. 21, 2022.
[91] I. Riadi, Sunardi, and D. Aprilliansyah, “Analysis of anubis trojan attack on android banking application using mobile security labware,” International Journal of Safety and Security Engineering, vol. 13, no. 1, 2023.
[92] A. Hannousse, S. Yahiouche, and M. C. Nait-Hamoud, “Twenty-two years since revealing cross-site scripting attacks: a systematic mapping and a comprehensive survey,” Computer Science Review, vol. 52, p. 100634, 2024.
[93] F. Kalantari, M. Zaeifi, T. Bao, R. Wang, Y. Shoshitaishvili, and A. Doupé, “Context-auditor: Context-sensitive content injection mitigation,” in ACM International Conference Proceeding Series, 2022.
[94] S. Yadav, A. Mahajan, M. Prasad, and A. Kumar, “Advanced keylogger for ethical hacking,” International Journal of Engineering Applied Sciences and Technology, vol. 5, no. 1, 2020.
[95] K. Hussain, A. R. Rahmatyar, B. Riskhan, M. A. U. Sheikh, and S. R. Sindiramutty, “Threats and vulnerabilities of wireless networks in the internet of things (iot),” in 2024 IEEE 1st Karachi Section Humanitarian Technology Conference (KHI-HTC). IEEE, 2024, pp. 1–8.
[96] I. Despotopoulos, “Wireless local area network security and modern cryptographic protocols: Wep & wpa1/2/3,” 2024.
[97] B. Tsouvalas and N. Nikiforakis, “Knocking on admin’s door: Protecting critical web applications with deception,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2024, pp. 283–306.
[98] D. Senecal, The Reign of Botnets: Defending Against Abuses, Bots and Fraud on the Internet. John Wiley & Sons, 2024.
[99] I. Despotopoulos, “Wireless local area network security and modern cryptographic protocols: Wep & wpa1/2/3,” 2024.
[100] M. A. I. Mallick and R. Nath, “Navigating the cyber security landscape: A comprehensive review of cyber-attacks, emerging trends, and recent developments,” World Scientific News, vol. 190, no. 1, pp. 1–69, 2024.
[101] U. Joseph and M. Jacob, “Real time detection of phishing attacks in edge devices using lstm networks,” in AIP Conference Proceedings, 2022.
[102] R. Ulfath, I. Sarker, M. Chowdhury, and M. Hammoudeh, “Detecting smishing attacks using feature extraction and classification techniques,” in Lecture Notes on Data Engineering and Communications Technologies, 2022, vol. 95.
[103] S. Tang, X. Mi, Y. Li, X. Wang, and K. Chen, “Clues in tweets: Twitter-guided discovery and analysis of sms spam,” in Proceedings of the ACM Conference on Computer and Communications Security, 2022.
[104] R. Mayrhofer, J. Stoep, C. Brubaker, and N. Kralevich, “The android platform security model,” ACM Transactions on Privacy and Security, vol. 24, no. 3, 2021.
[105] M. Suleman, T. Soomro, T. Ghazal, and M. Alshurideh, “Combating against potentially harmful mobile apps,” 2021.
[106] M. Armstrong, K. Jones, and A. Namin, “How perceptions of caller honesty vary during vishing attacks that include highly sensitive or seemingly innocuous requests,” Hum Factors, vol. 65, no. 2, 2023.
[107] “scholar (11)”.
[108] S. Mahdavifar, N. Maleki, A. Lashkari, M. Broda, and A. Razavi, “Classifying malicious domains using dns traffic analysis,” in 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). IEEE, 2021, p. 60–67.
[109] H. Owen, J. Zarrin, and S. Pour, “A survey on botnets, issues, threats, methods, detection and prevention,” Journal of Cybersecurity and Privacy, vol. 2, no. 1, 2022.
[110] H. Kilavo, L. Mselle, R. Rais, and S. Mrutu, “Reverse social engineering to counter social engineering in mobile money theft: A tanzanian context,” Journal of Applied Security Research, vol. 18, no. 3, 2023.
[111] M. Wang, L. Song, L. Li, Y. Zhu, and J. Li, “Phishing webpage detection based on global and local visual similarity,” Expert Systems with Applications, vol. 252, p. 124120, 2024.
[112] A. Brunstein, “Automatic web crawler for malicious websites classification,” Ph.D. dissertation, Politecnico di Torino, 2024.
VOLUME , 25
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Open Journal of the Communications Society. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/OJCOMS.2024.3462503
Shakeel et al.: Across the Spectrum In-Depth Review AI-Based Models for Phishing Detection
[113] D.-J. Liu and J.-H. Lee, “A cnn-based sia screenshot method to [135] I. Kara, M. Ok, and A. Ozaday, “Characteristics of understanding
visually identify phishing websites,” Journal of Network and Systems urls and domain names features: The detection of phishing websites
Management, vol. 32, no. 1, p. 8, 2024. with machine learning methods,” IEEE Access, vol. 10, 2022.
[114] O. Sarker, A. Jayatilaka, S. Haggag, C. Liu, and M. A. Babar, “A [136] F. Sadique, R. Kaul, S. Badsha, and S. Sengupta, “An automated
multi-vocal literature review on challenges and critical success factors framework for real-time phishing url detection,” in 2020 10th Annual
of phishing education, training and awareness,” Journal of Systems Computing and Communication Workshop and Conference, CCWC
and Software, vol. 208, p. 111899, 2024. 2020, 2020.
[115] S. Das Guptta, K. T. Shahriar, H. Alqahtani, D. Alsalman, and I. H. [137] A. Bozkir and M. Aydos, “Logosense: A companion hog based
Sarker, “Modeling hybrid feature-based phishing websites detection logo detection scheme for phishing web page and e-mail brand
using machine learning techniques,” Annals of Data Science, vol. 11, recognition,” Comput Secur, vol. 95, 2020.
no. 1, pp. 217–242, 2024. [138] M. Pandey, M. Singh, S. Pal, and B. Tiwari, “Prediction of phishing
[116] A. Aljammal, S. taamneh, A. Qawasmeh, and H. Salameh, “Machine websites using stacked ensemble method and hybrid features selec-
learning based phishing attacks detection using multiple datasets,” tion method,” SN Comput Sci, vol. 3, no. 6, 2022.
International Journal of Interactive Mobile Technologies, vol. 17, [139] P. Indrasiri, M. Halgamuge, and A. Mohammad, “Robust ensemble
no. 5, 2023. machine learning model for filtering phishing urls: Expandable ran-
[117] M. Sánchez-Paniagua, E. Fidalgo, E. Alegre, and R. Alaiz-Rodrı́guez, dom gradient stacked voting classifier (erg-svc,” IEEE Access, vol. 9,
“Phishing websites detection using a novel multipurpose dataset and 2021.
web technologies features,” Expert Syst Appl, vol. 207, 2022. [140] M. Prince, A. Hasan, and F. Shah, “A new ensemble model for
[118] Q. Li, G. Zhong, C. Xie, and R. Hedjam, “Weak edge identification phishing detection based on hybrid cumulative feature selection,” in
network for ocean front detection,” IEEE Geoscience and Remote ISCAIE 2021 - IEEE 11th Symposium on Computer Applications and
Sensing Letters, vol. 19, 2022. Industrial Electronics, 2021.
[119] L. Xue, “mt5: A massively multilingual pre-trained text-to-text [141] L. Rani, C. Foozy, and S. Mustafa, “Feature selection to enhance
transformer,” in NAACL-HLT 2021 - 2021 Conference of the North phishing website detection based on url using machine learning
American Chapter of the Association for Computational Linguistics: techniques,” Journal of Soft Computing and Data Mining, vol. 4,
Human Language Technologies, Proceedings of the Conference, no. 1, 2023.
2021. [142] J. Moedjahedy, A. Setyanto, F. Alarfaj, and M. Alreshoodi, “Ccrfs:
[120] S. Gopal and C. Poongodi, “Mitigation of phishing url attack in iot Combine correlation features selection for detecting phishing web-
using h-ann with h-ffgwo algorithm,” KSII Transactions on Internet sites using machine learning,” Future Internet, vol. 14, no. 8, 2022.
and Information Systems, vol. 17, no. 7, 2023. [143] Y. Mansour and M. Alenizi, “Enhanced classification method for
[121] H. Alqahtani, “Evolutionary algorithm with deep auto encoder net- phishing emails detection,” Journal of Information Security and
work based website phishing detection and classification,” Applied Cybercrimes Research, vol. 3, no. 1, 2020.
Sciences (Switzerland, vol. 12, no. 15, 2022. [144] A. Alhussan, H. Al-Mahdawi, and A. Kadi, “Spam detection in
[122] K. Apoorva and S. Sangeetha, “Analysis of uniform resource locator connected networks using particle swarm and genetic algorithm
using boosting algorithms for forensic purpose,” Comput Commun, optimization: Youtube as a case study,” International Journal of
vol. 190, 2022. Wireless and Ad Hoc Communication, vol. 6, no. 1, pp. 08–18,, 2023.
[145] A. Ramana, K. Rao, and R. Rao, “Stop-phish: an intelligent phishing
[123] V. Mazzeo, A. Rapisarda, and G. Giuffrida, “Detection of fake news
detection method using feature selection ensemble,” Soc Netw Anal
on covid-19 on web search engines,” Front Phys, vol. 9, 2021.
Min, vol. 11, no. 1, 2021.
[124] A. Aljofey, Q. Jiang, Q. Qu, M. Huang, and J. Niyigena, “An effective
[146] S. Anupam and A. Kar, “Phishing website detection using sup-
phishing detection model based on character level convolutional
port vector machines and nature-inspired optimization algorithms,”
neural network from url,” Electronics (Switzerland, vol. 9, no. 9,
Telecommun Syst, vol. 76, no. 1, pp. 17–32,, 2021-01.
pp. 1–24,, 2020-09.
[147] O. Sarker, A. Jayatilaka, S. Haggag, C. Liu, and M. Babar, “A
[125] E. Gualberto, R. Sousa, T. Brito Vieira, J. Costa, and C. Duque, “The multi-vocal literature review on challenges and critical success
answer is in the text: Multi-stage methods for phishing detection factors of phishing education, training and awareness,” Journal
based on feature engineering,” IEEE Access, vol. 8, 2020. of Systems and Software, vol. 208, pp. 111 899,, 2024. [Online].
[126] S. Shabudin, N. Sani, K. Ariffin, and M. Aliff, “Feature selection for Available: https://doi.org/10.1016/j.jss.2023.111899.
phishing website classification,” International Journal of Advanced [148] M. Grubbs, “Anti-phishing game-based training: An experimental
Computer Science and Applications, vol. 11, no. 4, 2020. analysis of demographic factors,” SSRN Electronic Journal, 2022.
[127] A. Singh and S. Misra, “A comparison of performance of rough [149] J. Brickley, K. Thakur, and A. Kamruzzaman, “A comparative
set theory with machine learning techniques in detecting phishing analysis between technical and non-technical phishing defences,” In-
attack,” in Lecture Notes in Networks and Systems, 2022, vol. 289. ternational Journal of Cyber-Security and Digital Forensics, vol. 10,
[128] M. El-Rashidy, “A smart model for web phishing detection based no. 1, 2021.
on new proposed feature selection technique,” Menoufia Journal of [150] A. Chattopadhyay, C. Maschinot, and L. Nestor, “Mirror on the wall -
Electronic Engineering Research, vol. 30, no. 1, 2021. what are cybersecurity educational games offering overall: A research
[129] A. Thahira and A. John, “Phishing website detection using lgbm study and gap analysis,” in Proceedings - Frontiers in Education
classifier with url-based lexical features,” in Proceedings - 2022 IEEE Conference, FIE, 2021.
Silchar Subsection Conference, SILCON 2022, 2022. [151] S. Bell and P. Komisarczuk, “An analysis of phishing blacklists:
[130] H. Zhao, Z. Chen, and R. Yan, “Malicious domain names detection Google safe browsing, openphish, and phishtank,” in ACM Interna-
algorithm based on statistical features of urls,” in 2022 IEEE 25th tional Conference Proceeding Series, 2020.
International Conference on Computer Supported Cooperative Work [152] S. Chanti and T. Chithralekha, “Classification of anti-phishing solu-
in Design, CSCWD 2022, 2022. tions,” SN Comput Sci, vol. 1, no. 1, 2020.
[131] S. Asiri, Y. Xiao, S. Alzahrani, S. Li, and T. Li, “A survey of [153] M. Korkmaz, E. Kocyigit, O. Sahingoz, and B. Diri, “A hybrid
intelligent detection designs of html url phishing attacks,” IEEE phishing detection system using deep learning-based url and content
Access, vol. 11, pp. 6421–6443,, 2023. analysis,” Elektronika ir Elektrotechnika, vol. 28, no. 5, 2022.
[132] R. Rao, T. Vaishnavi, and A. Pais, “Catchphish: detection of phishing [154] R. Zaimi, M. Hafidi, and M. Lamia, “Survey paper: Taxonomy of
websites by inspecting urls,” J Ambient Intell Humaniz Comput, website anti-phishing solutions,” in 2020 7th International Confer-
vol. 11, no. 2, 2020. ence on Social Network Analysis, Management and Security, SNAMS
[133] F. Kausar, B. Al-Otaibi, A. Al-Qadi, and N. Al-Dossari, “Hybrid 2020, 2020.
client side phishing websites detection approach,” International Jour- [155] T. Suleman, “A survey on web phishing detection techniques: A
nal of Advanced Computer Science and Applications, vol. 5, no. 7, taxonomy-based approach,” LGU International Journal for Electronic
2014. Crime Investigation, pp. 1–12,, 2021.
[134] C. Tan, K. Chiew, K. Yong, S. Sze, J. Abdullah, and Y. Sebastian, [156] Z. Azam, M. Islam, and M. Huda, “Comparative analysis of intru-
“A graph-theoretic approach for the detection of phishing webpages,” sion detection systems and machine learning-based model analysis
Comput Secur, vol. 95, 2020. through decision tree,” IEEE Access, vol. 11, 2023.
26 VOLUME ,
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Open Journal of the Communications Society. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/OJCOMS.2024.3462503
XII. Biography Section

SHAKEEL AHMAD is an eminent professional educationist who has worked as a
Subject Specialist (Computer Science) in the School Education Department,
Punjab, Pakistan, for the last 10 years. Born on September 6, 1986, in Rasool
Pur Tarar, a town within District Hafizabad, Punjab, Pakistan, he holds a
master’s degree in computer science & information technology from the
prestigious University of Education, Division of Science and Technology,
Township Campus, Lahore, Punjab, Pakistan (2012). Prior to his master’s
degree, he completed his graduation in Business and Finance at the University
of Punjab, Lahore, Pakistan, in 2008. He started his professional career as a
Subject Specialist (Computer Science) in 2014 and began working as a Research
Assistant in Machine Learning & Deep Learning in 2015. He also completed a
certification in Computer Application offered by the Govt of Punjab, Pakistan,
in 2008. Shakeel’s expertise spans Research and Development in Machine
Learning and Object Detection Algorithms, Technical Office Management &
Administration, Technical Report writing, SOP writing, Training management,
and Training Policy administration across Primary to Advanced technical
levels, showcasing his exceptional versatility and proficiency.

MUHAMMAD ZAMAN (Member, IEEE) is a committed lecturer at the University of
Lahore and a graduate of COMSATS University Islamabad with a Master of Science
in Computer Science. Driven by an unwavering dedication to the progression of
knowledge, he possesses extensive research acumen in numerous fields,
including but not limited to medical image processing, remote sensing, and
natural language processing. His substantial research pursuits and
contributions demonstrate a profound enthusiasm for artificial intelligence,
machine learning, deep learning, computer vision, and reinforcement learning.
His numerous publications in computer vision have substantiated his
contributions to the field. Furthermore, he has served as a mentor and guide
to a considerable number of MS students as they completed their research
theses and papers, which demonstrates his dedication to cultivating the next
generation of scholars and innovators.

AHMAD SAMI AL-SHAMAYLEH received the master’s degree in Information Systems
from The University of Jordan, Jordan, in 2014, and the Ph.D. degree in
Artificial Intelligence from the University of Malaya, Malaysia, in 2020. He
is currently an Assistant Professor with the Faculty of Information
Technology, Al-Ahliyya Amman University, Jordan. His research interests
include Artificial Intelligence, Human Computer Interaction, IoT, Arabic NLP,
Arabic sign language recognition, language resources production, the design
and evaluation of interactive applications for handicapped people,
multimodality, and software engineering.

Dr. TANZILA KEHKASHAN is a lecturer at the Faculty of Computer Science at the
University of Lahore, Pakistan. She earned her Ph.D. from Universiti Teknologi
Malaysia (UTM) and holds a Master of Science in Computer Science from the
University of Central Punjab, Lahore, Pakistan. In addition to her teaching
role, Dr. Kehkashan is an active member of the Virtual, Visualization, and
Vision Research Group (UTM VicubeLab, Malaysia), where she conducts
cutting-edge research in visual computing, particularly in the areas of
computer vision and natural language processing. Her work has been published
in prestigious journals and conference proceedings, showcasing her commitment
to advancing research in these fields. Dr. Kehkashan also serves as a
supervisor for master’s theses and final-year projects, contributing
significantly to academic mentorship. Her research interests include image and
video analysis, medical imaging, and language modeling. Her dedication to both
research and academic practice reflects her passion for the field of computer
science.

RAHIEL AHMAD is a distinguished professional born on February 4, 1987, in
Rasool Pur Tarar, a town within District Hafizabad, Punjab, Pakistan. He holds
a master’s degree in computer science (specialized in AI & ML) from the
prestigious University of Lahore, Punjab, Pakistan (2021). Prior to his
master’s degree, Rahiel completed his bachelor’s degree in software
engineering at COMSATS University Islamabad. With a rich and diverse
professional background, Engr. Rahiel currently serves as a Training
Coordinator at Avionics Flight, PAFAA, a position he has held since June 2023.
Prior to this, Engr. Rahiel held notable roles as an Assistant Avionics
Maintenance Engineer at Trainer Fleet, Avionics Flight, PAFAA (2010-2014), and
Senior Avionics Maintenance Engineer and Technical Administrator at UAV Fleet,
PAF, Mushaf (2014-2023). Engr. Rahiel’s expertise spans Research and
Development in Machine Learning and Object Detection Algorithms, Technical
Office