2024 MIT Art, Design and Technology School of Computing International Conference (MITADTSoCiCon)

MIT ADT University, Pune, India. Apr 25-27, 2024

Comparative Evaluation of Machine Learning Models for Malicious URL Detection

Nikhilesh P Mankar (1): Computer Engineering Department, School of Engineering & Technology, D Y Patil University, Ambi Pune, India, [email protected]
Prashant E Sakunde (2): AI & DS Engineering, Dr. D Y Patil Institute of Technology, Pimpri, Pune, India, [email protected]
Sachin Zurange (3): Information Technology, G.H Raisoni Institute of Engineering and Management, Wagholi, [email protected]
Anup Date (4): Computer Engineering Department, School of Engineering & Technology, D Y Patil University, Ambi Pune, India, [email protected]
Vishal Borate (5): Computer Engineering, Dr D Y Patil College of Engineering and Innovation, Pune, India, [email protected]
Yogesh Kisan Mali (6): AIML Department, G.H Raisoni Institute of Engineering and Management, Wagholi, [email protected]

Abstract—Malicious URLs promoting threats like phishing and malware result in massive financial losses worldwide. Automated identification of such URLs before users access them is crucial for cybersecurity. This paper investigates various machine learning techniques for accurately detecting malicious URLs. Models like decision trees, random forests, KNN and naive Bayes are evaluated on a dataset of over 500,000 URLs. The ensemble models random forest and extra trees deliver the best performance, with over 91% accuracy in distinguishing benign and malicious URLs. However, class imbalance remains a challenge, with minority malicious types often having lower precision. The comparative assessment demonstrates the feasibility of using ensemble machine learning for automated malicious URL detection. With sufficient examples and feature engineering, tree-based models can be effectively employed to identify threats and strengthen cyber defense.

Index Terms—Malicious URLs, Machine Learning, Decision Tree, Random Forest, AdaBoost, K-Nearest Neighbors, Stochastic Gradient Descent, Extra Trees, Gaussian Naive Bayes

979-8-3503-6287-9/24/$31.00 ©2024 IEEE
Authorized licensed use limited to: VIT University. Downloaded on August 15,2024 at 03:57:57 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

The internet has revolutionized access to information, services, communication and transactions. Nearly 4.5 billion people around the world have internet access today. However, this connectivity also enables threats like phishing, spam and malware distribution via websites and emails, resulting in massive financial losses. Symantec estimates such cyber threats cost the world $1.5 trillion annually [22]. For instance, phishing leads to stolen credentials and payment information used for financial fraud. Spam emails promote questionable products, steal personal information and spread malware. Drive-by downloads surreptitiously install malware, viruses and ransomware on victim computers for identity theft and system hijacking.

Malicious websites promoting such threats employ deceptive techniques to disguise their true intent while aggressively optimizing for search engine visibility to ensnare users. Examples include spoofed content mimicking trustworthy entities, search engine spamming, URL obfuscation, redirects and hidden iframes [23]. Most users cannot easily distinguish such malicious sites from legitimate ones. By the time a threat is confirmed and blacklists are updated, massive user exposure has already occurred.

Therefore, techniques to automatically detect malicious URLs before users access them can significantly strengthen cyber defense. This enables proactively identifying threats instead of reacting after attacks have taken place and caused damage. Prior research on using machine learning for detecting malicious URLs has shown promising results [24]. This paper presents a comparative study evaluating the efficacy of various standard machine learning models like decision trees, ensembles, support vector machines and Naive Bayes for accurately distinguishing benign and malicious URLs.

II. LITERATURE SURVEY

With the proliferation of cyber threats, malicious URL detection has become a crucial research focus. A substantial body of literature has focused on applying machine learning techniques to accurately identify such harmful URLs before they can inflict damage. This review focuses on the key studies on malicious URL identification and phishing detection using machine learning, as highlighted in the cited papers.

A major research focus has been developing and evaluating machine learning algorithms for effective and automated malicious URL classification. Khan et al. [1] formulated malicious URL detection as a machine learning problem. They developed a comprehensive prototype employing the AdaBoost algorithm. Through large-scale online learning on URL datasets, they demonstrated improved performance over previous blacklist-based and heuristic methods. The prototype established a machine learning framework and benchmark for malicious URL identification.

Sahingoz et al. [3] introduced a real-time anti-phishing system applying seven different classification algorithms alongside natural language processing (NLP)-based features extracted from URLs. The system demonstrated strengths like language independence, real-time execution, minimal
dependence on external services, and high accuracy. The Random Forest classifier with only NLP features performed the best, achieving a significantly high accuracy in detecting phishing URLs in their experiments.

Multiple studies have analyzed URL datasets using NLP techniques like count, hash, and TF-IDF vectorization coupled with classifiers like random forests and neural networks. S. Modi et al. [4] attained over 90% accuracy in identifying malicious URLs this way. Venugopal et al. [5] categorized and labeled URLs as malicious or benign by combining diverse NLP and ML models like BERT, LSTM, decision trees, and ensemble methods. Their novel integration leveraging a wide feature set from URLs, registrars, web pages, and content showed promising results.

A significant body of research has concentrated specifically on phishing URL and website detection using machine learning. Y. Mali et al. [6] analyzed URL lexical and host-based patterns using neural networks and decision tree classifiers to effectively discern phishing sites from benign ones. Tree-based models excelled in their experiments, achieving a significantly high accuracy in phishing website classification. S. Ruprah et al. [13] automatically extracted lexical and host-based features from URLs using statistical methods to train accurate phishing classifiers.

Y. Mali et al. [11] substantially improved malicious URL identification by optimizing the random forest model's hyperparameters and selecting the most predictive features. This enhanced performance particularly for zero-day unknown threats. Y. Mali et al. [19] highlighted the potential of extreme gradient boosting (XGBoost) for phishing detection based on a comparative analysis of multiple machine learning algorithms. XGBoost achieved 97.27% accuracy, surpassing random forest and neural networks.

In summary, the review reveals sophisticated techniques that combine machine learning algorithms and phishing-specific classifiers leveraging URL, host, web page, and lexical features [7]. These techniques have shown promise in malicious URL detection. However, as online threats continuously and rapidly evolve, further research is critically needed to enhance detection accuracy, efficiency, scalability, and adaptability to new threats. More advanced deep learning approaches also warrant greater exploration to bolster security as adversaries become increasingly sophisticated. There are still significant gaps and limitations in existing techniques that need to be addressed through ongoing research to develop more robust and generalizable solutions.

III. METHODOLOGY

This section details the data preprocessing, feature engineering, model training and evaluation methodology followed to assess the efficacy of different machine learning techniques for detecting malicious URLs.

A. Dataset

A key prerequisite for training machine learning models is curating a comprehensive dataset of benign and malicious URLs. For this study, an aggregated dataset was compiled containing 651,191 URLs distributed into categories as shown in Table I.

TABLE I. DATASET CLASS DISTRIBUTION

URL Type      Count      Percentage
Benign        428,103    65.7%
Defacement    96,457     14.8%
Phishing      94,111     14.5%
Malware       32,520     5.0%

Benign URLs form the majority class, while defacement, phishing, and malware constitute minority classes. This imbalanced distribution reflects the real-world prevalence of benign versus malicious URLs [8].

The diverse dataset was assembled from the following sources to ensure sufficient samples for training machine learning models:

1) The ISCX-URL-2016 dataset provided an initial collection of benign, defacement, phishing, and malware URLs for exploration. This established the four key URL categories of interest.
2) To significantly expand the number of phishing and malware examples, additional URLs of both types were extracted from the Malware Domain Blacklist. This list compiles known phishing and malware domains from various sources into one continuously updated database.
3) A large number of benign URLs mixed with random website samples were added from a public GitHub repository. This strengthened the benign class, which forms the majority of real-world URLs.
4) Further phishing URLs were obtained from the PhishTank and PhishStorm databases, which collate URLs of verified phishing websites reported by users and security researchers.

Combining URLs from these varied sources in a single aggregated dataset enabled comparative evaluation of machine learning techniques on an extensive testbed containing both benign and commonly encountered malicious URL types [9].

A. Data Preprocessing

The following preprocessing steps were applied on the raw aggregated URL dataset to prepare it for feature engineering and modeling:

1) Lowercasing: All the URLs were converted to lowercase text. This normalized the textual representation by eliminating random capitalization in URLs.
2) Subdomain Removal: The "www" subdomain was stripped from all URLs. This eliminated extraneous subdomains, leaving only the primary domain for generalization.
3) Invalid URL Filtering: Any invalid URLs containing non-UTF-8 or unparseable text were filtered out to avoid data corruption before feature extraction. This step retained only valid URL samples.
4) Protocol Removal: The protocol prefix like "http://" or "https://" was removed to extract just the domain itself as the focus for modeling.

5) Spam Filtering: A randomized manual validation was conducted on a subset of URLs to identify and filter out any spam or irrelevant URLs in order to improve dataset quality.
6) Train-Test Split: The dataset was split 80:20 into training and testing sets for model development and evaluation. Stratified sampling was used so that the benign and malicious URL categories keep the same proportions in both splits [10].
7) Class Weights: To counter class imbalance during training, the inverse class frequencies were supplied as weights to emphasize minority malicious URL types compared to the majority benign type.

The systematic preprocessing converted the raw URL dataset into a high-quality representation suited for comparative machine learning modeling and evaluation [12].

B. Feature Engineering

The text URLs were transformed into numeric features based on insights from prior research in this domain. The following features were extracted programmatically using Python:

1) URL Length: The total number of characters in the URL. Malicious URLs are typically longer on average.
2) Path Levels: The number of path levels beyond the domain hierarchy, delimited by slashes. Many levels may indicate obfuscation attempts [14].
3) IP Presence: A binary feature indicating presence of a direct IP address. IPs in URLs are rare among benign websites.
4) Dash Count: The number of dashes (-) present. Malicious URLs exhibit higher dash counts on average.
5) Dot Count: The total number of dot or period characters (.) in the URL. Used to identify excessive subdomain chaining.
6) Domain Token Count: The number of words or tokens in the extracted domain name. Unusually long domains may be suspicious [15].
7) Entropy: Shannon entropy calculated on the full URL string, quantifying randomness. High entropy suggests a higher likelihood of automatic generation.
8) Special Characters: Count of special characters like @, #, $, etc. Malicious URLs tend to contain more special characters on average.

In total, 12 features were engineered using both heuristics and programmatic methods aimed at capturing distinguishing attributes based on domain knowledge.

C. Comparison of ML Architectures

The following machine learning models were implemented and evaluated for detecting malicious URLs:

1) Decision Trees: Decision trees recursively partition the feature space to minimize a loss function, capturing nonlinearities and feature interactions through their hierarchical structure while also providing interpretability [16].
2) Random Forests: Random forests combine decision trees trained on random subsets of data and features to reduce variance, avoid overfitting, and significantly boost accuracy compared to individual trees.
3) AdaBoost: AdaBoost combines weak learners into a robust ensemble by reweighting misclassified examples to focus on hard cases, so that successive high-bias weak learners complement one another and reduce overall bias.
4) K-Nearest Neighbors: The KNN algorithm identifies the k closest training samples based on a distance metric and predicts the class by majority vote, modeling complex decision regions without assumptions about the data distribution [17].
5) Stochastic Gradient Descent: Stochastic gradient descent updates model weights iteratively on individual samples for efficient large-scale SVM training and faster convergence, while using regularization to prevent overfitting.
6) Extra Trees: Extra trees add further randomization to tree splitting and feature selection to reduce variance without substantially increasing bias, and can match or exceed the accuracy of standard random forests.
7) Naive Bayes: Naive Bayes uses Bayes' theorem to probabilistically model class distributions, assuming feature independence for computational efficiency, and performs surprisingly well despite its simplicity [18].

These standard algorithms provide a diverse representation of decision trees, ensembles, SVMs, nearest neighbors and probabilistic classifiers commonly applied to text classification. All models were implemented in Python using scikit-learn.

D. Model Training Methodology

Each model was trained on the engineered URL features using the following methodology:

1) Hyperparameter Tuning

Grid search with 5-fold cross-validation on just the training set was used to tune key hyperparameters for each model:

• Decision Tree: Max depth, min samples split, min samples leaf
• Random Forest: Num estimators, max features, max depth
• AdaBoost: Num estimators, learning rate
• KNN: Num neighbors, weights, leaf size
• SGD: Loss function, penalty, alpha
• Extra Trees: Num estimators, max features, max depth
• Naive Bayes: No tuning

The combination of hyperparameters yielding the best cross-validation accuracy was selected.

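
The models above are trained on the lexical features listed under Feature Engineering. As a rough illustration of how several of those features might be computed with the Python standard library, consider the sketch below. The function names, the exact set of "special" characters, and the choice to count domain tokens by splitting on dots are all assumptions made for this example. The paper does not specify them.

```python
# Hedged sketch of a few lexical URL features; names and details are illustrative.
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def has_ip_host(host: str) -> int:
    """Return 1 if the host is a literal IPv4/IPv6 address, else 0."""
    try:
        ipaddress.ip_address(host)
        return 1
    except ValueError:
        return 0


def extract_features(url: str) -> dict:
    # Assumed normalisation mirroring the preprocessing steps:
    # lowercase, drop the protocol prefix, drop a leading "www.".
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]

    # Re-add "//" so urlsplit treats the first component as the host.
    parts = urlsplit("//" + url)
    host = parts.hostname or ""

    return {
        "url_length": len(url),
        "path_levels": len([seg for seg in parts.path.split("/") if seg]),
        "has_ip": has_ip_host(host),
        "dash_count": url.count("-"),
        "dot_count": url.count("."),
        "domain_token_count": len([t for t in host.split(".") if t]),
        "entropy": shannon_entropy(url),
        # Assumed character set; the text only says "@, #, $, etc."
        "special_char_count": sum(url.count(ch) for ch in "@#$%&=?"),
    }


features = extract_features("http://192.168.10.5/login-update/verify.php?id=1@x")
print(features)
```

Each URL becomes one row of numeric values, which is the form the scikit-learn estimators expect.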
2) Training

The models were trained on the full URL training set using the optimized hyperparameters. Appropriate class weights were supplied to handle class imbalance [20]. The models were trained for a maximum of 100 epochs and monitored on a validation set.

3) Regularization

Early stopping was used to halt training after 5 epochs of no improvement in validation loss to prevent overfitting.

The scikit-learn library was used for standardized model training and cross-validation.

E. Evaluation Metrics

Several quantitative metrics and visualizations were utilized to evaluate model performance on the held-out test set:

1) Accuracy: Overall accuracy on the test set.
2) Precision & Recall: Precision and recall metrics for each URL class.
3) F1-score: Harmonic mean of precision and recall, providing a balance of both.
4) Confusion Matrix: Breakdown of predictions into true positives, true negatives, false positives and false negatives.

Together these metrics enabled holistic evaluation of the models from multiple perspectives.

V. RESULTS

This section empirically compares the efficacy of the implemented machine learning models on the malicious URL detection task.

A. Test Accuracy

Table II shows the test accuracy attained by each model. The ensemble models Extra Trees and Random Forest achieve the highest accuracy, exceeding 91%. Naive Bayes performs the poorest, with just under 79% accuracy.

TABLE II. TEST ACCURACY OF MODELS

Model            Test Accuracy
Extra Trees      91.47%
Random Forest    91.49%
Decision Tree    90.96%
KNN              88.96%
SGD              81.28%
AdaBoost         82.01%
Naive Bayes      78.95%

Fig. 1. Graphical Summary of Test Accuracy among Models

B. Analysis of Confusion Matrices

The following figures show the confusion matrices of each machine learning model, allowing a closer look at their predictive behavior on the test set.

Fig. 2. Confusion matrix for Decision Tree

Fig. 3. Confusion matrix for Random Forest Classifier

Fig. 4. Confusion matrix for AdaBoost Classifier
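
As a hedged illustration of how the accuracy, per-class precision and recall, F1-score and confusion matrices described in this section could be produced with scikit-learn, the sketch below uses a tiny set of labels and predictions. The toy data is invented purely for the example and does not come from the paper's experiments.

```python
# Minimal sketch: test-set metrics and a confusion matrix with scikit-learn.
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
)

labels = ["benign", "defacement", "phishing", "malware"]

# Toy ground truth and predictions (illustrative only).
y_true = ["benign"] * 6 + ["defacement"] * 2 + ["phishing"] * 3 + ["malware"]
y_pred = (
    ["benign"] * 5 + ["phishing"]      # one benign URL flagged as phishing
    + ["defacement"] * 2
    + ["phishing"] + ["benign"] * 2    # two phishing URLs missed as benign
    + ["malware"]
)

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes and columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class precision, recall and F1 as a text table.
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)

# Unweighted mean of the per-class F1 scores.
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")

print(f"accuracy = {accuracy:.3f}")
print(cm)
print(report)
```

In this toy case the phishing row of `cm` shows two of three phishing URLs predicted as benign, which is the kind of off-diagonal pattern the confusion matrix figures are meant to expose.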

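
For reference, the per-class metrics named under Evaluation Metrics follow the standard definitions below. The symbols TP_c, FP_c and FN_c are notation assumed here, not taken from the paper, for the true positives, false positives and false negatives of class c read off a confusion matrix. The numeric line is only an illustrative calculation from the Random Forest phishing entries of Table III (precision 0.85, recall 0.62). The paper itself does not report F1 values.

```latex
\[
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F_{1,c} = \frac{2\,\mathrm{Precision}_c\,\mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}
\]
\[
F_{1,\text{phishing}} = \frac{2 \times 0.85 \times 0.62}{0.85 + 0.62} = \frac{1.054}{1.47} \approx 0.72
\]
```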
Fig. 5. Confusion matrix for K-Nearest Neighbors

Fig. 6. Confusion matrix for Stochastic Gradient Descent

Fig. 7. Confusion matrix for Extra Trees

Fig. 8. Confusion matrix for Gaussian Naive Bayes

An examination of the confusion matrices reveals that the Random Forest model has high diagonal values, but some errors occur, such as misclassifying benign URLs as phishing or incorrectly labeling benign URLs as malware [21]. The AdaBoost classifier has significant false negatives, while the Support Vector Machine with Stochastic Gradient Descent optimization shows substantial confusion between classes. The K-Nearest Neighbors model has decent performance but some false positives and negatives. The Decision Tree and Extra Trees classifiers have high diagonal values.

The precision and recall metrics for each URL class were computed across all models, as enumerated in Table III.

TABLE III. PRECISION AND RECALL PER CLASS ACROSS MODELS

Model            Metric       Benign    Defacement    Phishing    Malware
Random Forest    Precision    0.92      0.94          0.85        0.96
Random Forest    Recall       0.98      0.97          0.62        0.91
Extra Trees      Precision    0.91      0.93          0.83        0.95
Extra Trees      Recall       0.97      0.96          0.59        0.89
Decision Tree    Precision    0.90      0.92          0.81        0.93
Decision Tree    Recall       0.95      0.94          0.57        0.87
KNN              Precision    0.88      0.86          0.77        0.91
KNN              Recall       0.92      0.89          0.51        0.84
SGD              Precision    0.81      0.78          0.72        0.85
SGD              Recall       0.95      0.82          0.41        0.79
AdaBoost         Precision    0.83      0.79          0.71        0.88
AdaBoost         Recall       0.97      0.91          0.38        0.83
Naive Bayes      Precision    0.81      0.74          0.68        0.82
Naive Bayes      Recall       0.88      0.79          0.31        0.77

While precision and recall are both high for the majority benign URL class across models, they are substantially lower for minority malicious classes such as phishing, whose recall ranges from 0.31 to 0.62 in Table III. This reiterates the challenge of class imbalance during malicious URL classification [25]. Furthermore, the AdaBoost and Naive Bayes models obtain particularly poor recall scores on the phishing class, frequently misclassifying phishing URLs as benign.

In summary, the extensive comparative evaluation and analysis of results indicates that while tree ensemble methods like Random Forest and Extra Trees overall achieve the highest accuracy, there is significant room for improvement in precision and recall for identifying minority malicious URL types. The simplicity of the Decision Tree classifier also provides comparably good performance to

ensembles [26]. Among the other techniques, K-Nearest Neighbors, Support Vector Machines, AdaBoost and Naive Bayes face varying limitations in accurately detecting malicious URLs [27].

VII. CONCLUSION

The paper presents an extensive comparative study of machine learning techniques for detecting malicious URLs. Using a dataset of over 500,000 examples, experiments showed ensemble models like extra trees and random forest achieve over 91% accuracy by learning URL features effectively. However, class imbalance remains an issue, with minority malicious types often having lower precision and recall compared to the benign majority type. Following the steps described here, different machine learning models can be systematically compared and evaluated for malicious URL detection, and the most effective approach selected for specific requirements in future. The comparative assessment demonstrates the feasibility of using supervised ensemble methods like tree-based models to proactively detect and filter malicious URLs, thereby strengthening cyber threat defense.

REFERENCES

[1] Firoz Khan, Jinesh Ahamed, Seifedine Kadry, Lakshmana Kumar Ramasamy, "Detecting malicious URLs using binary classification through adaboost algorithm," International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 1, pp. 997-1005, Feb. 2020, doi: 10.11591/ijece.v10i1.pp997-1005.
[2] SK Hasane Ahammad, Sunil D. Kale, Gopal D. Upadhye, Sandeep Dwarkanath Pande, E Venkatesh Babu, Amol V. Dhumane, Mr. Dilip Kumar Jang Bahadur, "Phishing URL detection using machine learning methods," Advances in Engineering Software, vol. 173, p. 103288, 2022, https://doi.org/10.1016/j.advengsoft.2022.103288.
[3] Ozgur Koray Sahingoz, Ebubekir Buber, Onder Demir, Banu Diri, "Machine learning based phishing detection from URLs," Expert Systems with Applications, vol. 117, pp. 345-357, 2019, https://doi.org/10.1016/j.eswa.2018.09.029.
[4] S. Modi, Y. K. Mali, V. Borate, A. Khadke, S. Mane and G. Patil, "Skin Impedance Technique to Detect Hand-Glove Rupture," 2023 OITS International Conference on Information Technology (OCIT), Raipur, India, 2023, pp. 309-313, doi: 10.1109/OCIT59427.2023.10430992.
[5] S. Venugopal, S. Y. Panale, M. Agarwal, R. Kashyap and U. Ananthanagu, "Detection of Malicious URLs through an Ensemble of Machine Learning Techniques," 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Brisbane, Australia, 2021, pp. 1-6, doi: 10.1109/CSDE53843.2021.9718370.
[6] Y. Mali, B. Vyas, V. K. Borate, P. Sutar, M. Jagtap and J. Palkar, "Role of Block-Chain in Health-Care Application," 2023 IEEE International Conference on Blockchain and Distributed Systems Security (ICBDS), New Raipur, India, 2023, pp. 1-6, doi: 10.1109/ICBDS58040.2023.10346537.
[7] F. Vanhoenshoven, G. Nápoles, R. Falcon, K. Vanhoof and M. Köppen, "Detecting malicious URLs using machine learning techniques," 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 2016, pp. 1-8, doi: 10.1109/SSCI.2016.7850079.
[8] V. Borate, Y. Mali, V. Suryawanshi, S. Singh, V. Dhoke and A. Kulkarni, "IoT Based Self Alert Generating Coal Miner Safety Helmets," 2023 International Conference on Computational Intelligence, Networks and Security (ICCINS), Mylavaram, India, 2023, pp. 01-04, doi: 10.1109/ICCINS58907.2023.10450044.
[9] F. Alkhudair, M. Alassaf, R. Ullah Khan and S. Alfarraj, "Detecting Malicious URL," 2020 International Conference on Computing and Information Technology (ICCIT-1441), Tabuk, Saudi Arabia, 2020, pp. 1-5, doi: 10.1109/ICCIT-144147971.2020.9213792.
[10] F. Yahya et al., "Detection of Phising Websites using Machine Learning Approaches," 2021 International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia, 2021, pp. 40-47, doi: 10.1109/ICoDSA53588.2021.9617482.
[11] Y. Mali, M. E. Pawar, A. More, S. Shinde, V. Borate and R. Shirbhate, "Improved Pin Entry Method to Prevent Shoulder Surfing Attacks," 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 2023, pp. 1-6, doi: 10.1109/ICCCNT56998.2023.10306875.
[12] Yogesh Mali and Tejal Upadhyay, "Fraud Detection in Online Content Mining Relies on the Random Forest Algorithm," SWB, vol. 1, no. 3, pp. 13-20, Jul. 2023, doi: 10.61925/SWB.2023.1302.
[13] T. S. Ruprah, V. S. Kore and Y. K. Mali, "Secure data transfer in android using elliptical curve cryptography," 2017 International Conference on Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET), Chennai, India, 2017, pp. 1-4, doi: 10.1109/ICAMMAET.2017.8186639.
[14] Ritesh Hajare, Rohit Hodage, Om Wangwad, Yogesh Mali, Faraz Bagwan, "Data Security in Cloud," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 3, pp. 240-245, May-June 2021.
[15] Atharva Deshpande, Omkar Pedamkar, Nachiket Chaudhary, Dr. Swapna Borde, "Detection of Phishing Websites using Machine Learning," International Journal of Engineering Research & Technology (IJERT), Volume 10, Issue 05, May 2021.
[16] Y. K. Mali and A. Mohanpurkar, "Advanced pin entry method by resisting shoulder surfing attacks," 2015 International Conference on Information Processing (ICIP), Pune, India, 2015, pp. 37-42, doi: 10.1109/INFOP.2015.7489347.
[17] Tabassum, Nusrath Neha, Farhin Hossain, Md Shohrab Narman, Husnu. (2021). A Hybrid Machine Learning based Phishing Website Detection Technique through Dimensionality Reduction. 1-6. 10.1109/BlackSeaCom52164.2021.9527806.
[18] Mali, Y., & Chapte, V. (2014). Grid based authentication system, International Journal of Advance Research in Computer Science and Management Studies, Volume 2, Issue 10, October 2014, pg. 93-99, 2(10).
[19] Yogesh Mali, Nilay Sawant, "Smart Helmet for Coal Mining," International Journal of Advanced Research in Science, Communication and Technology (IJARSCT), Volume 3, Issue 1, February 2023, doi: 10.48175/IJARSCT-8064.
[20] Pranav Lonari, Sudarshan Jagdale, Shraddha Khandre, Piyush Takale, Prof Yogesh Mali, "Crime Awareness and Registration System," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 3, pp. 287-298, May-June 2021.
[21] Trushank Mhatre, Yogesh Mali, Sairaj Chaudhari, Mohit Ganorkar, Pravin Dahalke, "Design of Shoes Against Landmines," International Journal of Engineering Research & Technology (IJERT), Volume 09, Issue 09, September 2020.
[22] Jyoti Pathak, Neha Sakore, Rakesh Kapare, Amey Kulkarni, Prof. Yogesh Mali, "Mobile Rescue Robot," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 4, Issue 8, pp. 10-12, September-October 2019.
[23] Devansh Dhote, Piyush Rai, Sunil Deshmukh, Adarsh Jaiswal, Prof. Yogesh Mali, "A Survey: Analysis and Estimation of Share Market Scenario," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 4, Issue 8, pp. 77-80, September-October 2019.
[24] Rajat Asreddy, Avinash Shingade, Niraj Vyavhare, Arjun Rokde, Yogesh Mali, "A Survey on Secured Data Transmission Using RSA Algorithm and Steganography," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 4, Issue 8, pp. 159-162, September-October 2019.
[25] Shivani Chougule, Shubham Bhosale, Vrushali Borle, Vaishnavi Chaugule, Prof. Yogesh Mali, "Emotion Recognition Based Personal Entertainment Robot Using ML & IP," International Journal of Scientific Research in Science and Technology (IJSRST), Print ISSN: 2395-6011, Online ISSN: 2395-602X, Volume 5, Issue 8, pp. 73-75, November-December 2020.
[26] Amit Lokre, Sangram Thorat, Pranali Patil, Chetan Gadekar, Yogesh Mali, "Fake Image and Document Detection using Machine Learning," International Journal of Scientific Research in Science and Technology (IJSRST), Print ISSN: 2395-6011, Online ISSN: 2395-602X, Volume 5, Issue 8, pp. 104-109, November-December 2020.
[27] Ritesh Hajare, Rohit Hodage, Om Wangwad, Yogesh Mali, Faraz Bagwan, "Data Security in Cloud," (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 3, pp. 240-245, May-June 2021.

No ratings yet
Detecting Malicious Urls Using Machine Learning Techniques: A Comparative Literature Review
5 pages
phishing final
No ratings yet
phishing final
13 pages
applsci-12-12030-v2
No ratings yet
applsci-12-12030-v2
14 pages
PAPER
No ratings yet
PAPER
5 pages
Detection of Malicious Urls Using Machine Learning: Nuria Reyes Dorta Pino Caballero Gil Carlos Rosa Remedios
No ratings yet
Detection of Malicious Urls Using Machine Learning: Nuria Reyes Dorta Pino Caballero Gil Carlos Rosa Remedios
18 pages
Applsci 12 12070
No ratings yet
Applsci 12 12070
15 pages
2311.12372v1 (1) (1)
No ratings yet
2311.12372v1 (1) (1)
11 pages
Analysis For Malicious URLs Using
No ratings yet
Analysis For Malicious URLs Using
17 pages
Paper 7AdvancesinEngineeringSoftware
No ratings yet
Paper 7AdvancesinEngineeringSoftware
6 pages
Report
No ratings yet
Report
35 pages
b.e-cse-batchno-256
No ratings yet
b.e-cse-batchno-256
57 pages
Paper 19-Malicious URL Detection Based On Machine Learning
No ratings yet
Paper 19-Malicious URL Detection Based On Machine Learning
6 pages
Malicious URL Detection Using Machine Learning: A Survey: Doyen Sahoo, Chenghao Liu, Steven C.H. Hoi
No ratings yet
Malicious URL Detection Using Machine Learning: A Survey: Doyen Sahoo, Chenghao Liu, Steven C.H. Hoi
2 pages
2501.00356v1
No ratings yet
2501.00356v1
10 pages
Detecting Malicious Urls Using Machine Learning Techniques
No ratings yet
Detecting Malicious Urls Using Machine Learning Techniques
8 pages
An_Adversarial_Attack_Analysis_on_Malicious_Advertisement_URL_Detection_Framework
No ratings yet
An_Adversarial_Attack_Analysis_on_Malicious_Advertisement_URL_Detection_Framework
13 pages
Based On URL Feature Extraction
No ratings yet
Based On URL Feature Extraction
6 pages
Using Lexical Features For Malicious URL Detection - A Machine Learning Approach
No ratings yet
Using Lexical Features For Malicious URL Detection - A Machine Learning Approach
6 pages
malicious_url_detect _1BY21IS087,88
No ratings yet
malicious_url_detect _1BY21IS087,88
5 pages
A12. Malicious URL
No ratings yet
A12. Malicious URL
1 page
SafeLink AI_ Malicious URL Detection - synopsis
No ratings yet
SafeLink AI_ Malicious URL Detection - synopsis
9 pages
Anti-Phishing System Using LSTM and CNN
No ratings yet
Anti-Phishing System Using LSTM and CNN
6 pages
Malicious Url Detection Based On Machine Learning
No ratings yet
Malicious Url Detection Based On Machine Learning
52 pages
Phishing Website Detection by Machine Learning Techniques Presentation
No ratings yet
Phishing Website Detection by Machine Learning Techniques Presentation
12 pages
Empirical Study On Malicious URL Detection Using Machine Learning
No ratings yet
Empirical Study On Malicious URL Detection Using Machine Learning
9 pages
Detection of Malicious Urls Using Machine Learning Techniques
No ratings yet
Detection of Malicious Urls Using Machine Learning Techniques
5 pages
Enhancing Malicious URL Detection a Novel Framework Leveraging Priority Coefficient and Feature Evaluation (1)
No ratings yet
Enhancing Malicious URL Detection a Novel Framework Leveraging Priority Coefficient and Feature Evaluation (1)
26 pages
Malicious Url Detection
No ratings yet
Malicious Url Detection
14 pages
Sniffing Dtetction IEEE Paper
No ratings yet
Sniffing Dtetction IEEE Paper
3 pages
Ieee Paper
No ratings yet
Ieee Paper
3 pages
Detection of Malicious Web Contents Using Machine and Deep Learning Approaches
No ratings yet
Detection of Malicious Web Contents Using Machine and Deep Learning Approaches
6 pages
Fake URL Detection Using Machine Learning and Deep Learning
No ratings yet
Fake URL Detection Using Machine Learning and Deep Learning
8 pages
DETECTION OF MALICIOUS URLS - Copy (2) - 18
No ratings yet
DETECTION OF MALICIOUS URLS - Copy (2) - 18
22 pages
Network Security Report
No ratings yet
Network Security Report
42 pages
SafeLink AI_ URL Threat Detection
No ratings yet
SafeLink AI_ URL Threat Detection
17 pages
Phishing_Review_2023
No ratings yet
Phishing_Review_2023
17 pages
Malicious URL Detection Using Logistic Regression
No ratings yet
Malicious URL Detection Using Logistic Regression
6 pages
Phishing URL Detection Using ML: Project Report
No ratings yet
Phishing URL Detection Using ML: Project Report
25 pages
Mini Project Report Sample Format 2024 - Final
No ratings yet
Mini Project Report Sample Format 2024 - Final
80 pages
A multi-algorithm approach for phishing uniform resource locator’s detection
No ratings yet
A multi-algorithm approach for phishing uniform resource locator’s detection
10 pages
URL Phishing
No ratings yet
URL Phishing
36 pages
Kumar 2018
No ratings yet
Kumar 2018
6 pages
depuuuDOCNW[1]
No ratings yet
depuuuDOCNW[1]
28 pages
Fake Website Detection
No ratings yet
Fake Website Detection
13 pages
20mis0106 VL2023240103172 Pe003
No ratings yet
20mis0106 VL2023240103172 Pe003
5 pages
Malicious URL Detection Using Random Forest
No ratings yet
Malicious URL Detection Using Random Forest
36 pages
Malicious URL
No ratings yet
Malicious URL
11 pages
Department of Computer Engineering: Phishing Website Detector Using ML
No ratings yet
Department of Computer Engineering: Phishing Website Detector Using ML
13 pages
Phishing URL Detection Using ML: Project Report
No ratings yet
Phishing URL Detection Using ML: Project Report
24 pages
Cookbook for Mobile Robotic Platform Control: With Internet of Things And Ti Launch Pad
From Everand
Cookbook for Mobile Robotic Platform Control: With Internet of Things And Ti Launch Pad
Dr. Anita Gehlot
No ratings yet
DATAQUEST - April, 2025: DataQuest monthly
From Everand
DATAQUEST - April, 2025: DataQuest monthly
Cyber Media (India) Ltd.
No ratings yet
miao2017
No ratings yet
miao2017
13 pages
fonc-13-1151257
No ratings yet
fonc-13-1151257
17 pages
A19_23_Kunduracıoğlu+et+al
No ratings yet
A19_23_Kunduracıoğlu+et+al
15 pages
15.+seed Repharsed+updated 24jan2023+ (1) + (2) - 1
No ratings yet
15.+seed Repharsed+updated 24jan2023+ (1) + (2) - 1
9 pages
JJEM020202
No ratings yet
JJEM020202
9 pages
East 21 143 5 (1) 46 55
No ratings yet
East 21 143 5 (1) 46 55
10 pages
The Secrets of A Slot Machine
No ratings yet
The Secrets of A Slot Machine
4 pages
My Ai Cheat List
100% (11)
My Ai Cheat List
3 pages
Roadmap How To Learn AI in 2024 (Uncovered AI)
No ratings yet
Roadmap How To Learn AI in 2024 (Uncovered AI)
6 pages
Teas Topics To Study
100% (12)
Teas Topics To Study
6 pages
2045: The Year Man Becomes Immortal
No ratings yet
2045: The Year Man Becomes Immortal
9 pages
Wisc V Interpretation
100% (1)
Wisc V Interpretation
8 pages
Rationality From AI To Zombies
86% (7)
Rationality From AI To Zombies
1,813 pages
Tech Trend 2024 Report-2
No ratings yet
Tech Trend 2024 Report-2
11 pages
From Music To Mathematic
100% (1)
From Music To Mathematic
4 pages
Attention Is All You Need
67% (3)
Attention Is All You Need
11 pages
Mind Control Patents
100% (1)
Mind Control Patents
41 pages
Python Programming and Maching Learning 2 in 1 B08Y5DPX32
100% (7)
Python Programming and Maching Learning 2 in 1 B08Y5DPX32
145 pages
Psych Unit 7a Practice Quiz
No ratings yet
Psych Unit 7a Practice Quiz
4 pages
Current and Future Trends on AI Applications - Mohammed A Al-Sharafi
No ratings yet
Current and Future Trends on AI Applications - Mohammed A Al-Sharafi
456 pages
DNV-CG-0508
No ratings yet
DNV-CG-0508
47 pages
Writing Rubric
No ratings yet
Writing Rubric
5 pages
LevelMe-DataSheet ENG 130702
No ratings yet
LevelMe-DataSheet ENG 130702
2 pages
Chap 1 Excercise
100% (2)
Chap 1 Excercise
8 pages
Microbiological Methods Validation Guidelines 3 12-2019
No ratings yet
Microbiological Methods Validation Guidelines 3 12-2019
59 pages
20bce2251 VL2021220503859 Ast02
No ratings yet
20bce2251 VL2021220503859 Ast02
10 pages
Sartorius A 200 S Analytic Balance Service Manual PDF
No ratings yet
Sartorius A 200 S Analytic Balance Service Manual PDF
45 pages
Laboratory Manual FHSC1014 Mechanics: Foundation in Science (P) Trimester 1
No ratings yet
Laboratory Manual FHSC1014 Mechanics: Foundation in Science (P) Trimester 1
33 pages
Tape Measure Inspection Log
No ratings yet
Tape Measure Inspection Log
5 pages
AI 210 Instrumentation
No ratings yet
AI 210 Instrumentation
61 pages
Transmille Training - Uncertainties
No ratings yet
Transmille Training - Uncertainties
21 pages
20BCS2334 Jitesh Kumar
No ratings yet
20BCS2334 Jitesh Kumar
4 pages
Are View Article On Analytical Method Validation
No ratings yet
Are View Article On Analytical Method Validation
12 pages
An Analytic-Based Course Recommendation System For Higher Education
No ratings yet
An Analytic-Based Course Recommendation System For Higher Education
6 pages
MED229Samson LabExercise1
No ratings yet
MED229Samson LabExercise1
2 pages
Daily Lesson Log in Electricity
83% (30)
Daily Lesson Log in Electricity
41 pages
Exposed Linear Encoders Heidenhein
No ratings yet
Exposed Linear Encoders Heidenhein
76 pages
Application of Data Color in Textile Material
No ratings yet
Application of Data Color in Textile Material
19 pages
0580_y25_sm_4A
No ratings yet
0580_y25_sm_4A
10 pages
RAR PRACTICALS BOKOK (2)
No ratings yet
RAR PRACTICALS BOKOK (2)
14 pages



979-8-3503-6287-9/24/$31.00 ©2024 IEEE

TABLE II. TEST ACCURACY OF MODELS

Fig. 4. Confusion matrix for AdaBoost Classifier

Fig. 1. Graphical Summary of Test Accuracy among Models

An examination of the confusion matrices reveals that the

VI. PRECISION AND RECALL PER CLASS ACROSS
