Researchpaper Dbms

Uploaded by

jatin Pundir
Copyright
© © All Rights Reserved

ABSTRACT

The prevalence of online fraud in today's digital age poses significant threats to individuals, businesses, and financial institutions worldwide, emphasizing the urgent need for robust fraud detection mechanisms, particularly within anti-money laundering (AML) efforts. This project aims to develop a Python-based machine learning solution for real-time detection and prevention of online fraud. The proposed system utilizes historical transaction data, incorporating various factors like user behavior, transactions, and financial data.
In our online fraud detection project, the Decision Tree Classifier outperformed SVM, KNN, and Logistic Regression, achieving the highest accuracy. This makes it the most effective model for detecting fraudulent activities in our dataset.
Keywords: Data Mining, Online Fraud Detection, Machine Learning, Decision Tree Algorithm

Ⅰ. INTRODUCTION
Machine learning is frequently divided into two categories: unsupervised and supervised learning. In unsupervised learning, an algorithm evaluates data and modifies its parameters based on what it has learnt from the data. In supervised learning, the learning algorithm is given a set of training data, modifies its parameters to match that data, and then applies generalizations learnt from the training set to a larger amount of data, sometimes referred to as the categorization set.
Machine learning has numerous objectives. Regression, classification, and clustering are the most prevalent. In regression, a machine learning system delivers continuous output, such as fitting an equation to a point plot. In clustering, the machine learning method divides items into groups based on their resemblance to one another. Classification is similar to clustering in that the machine learning algorithm seeks to categorize objects, but it does so based on previously stated criteria supplied by training data. Classification and clustering are examples of supervised and unsupervised machine learning, respectively. Because the learning process has no prior knowledge of groups or classes, clustering is usually unsupervised, whereas classification is supervised. The project uses the Python programming language for coding.
Fraud encompasses cases rooted in criminal motives that are often challenging to detect. It can be categorized into two primary types: offline fraud, typically involving physical theft of items like purses or wallets containing critical documents such as driver's licenses or ID cards, and online fraud, where perpetrators manipulate legitimate-looking websites to illicitly obtain crucial personal information and execute fraudulent transactions using the victim's account.
The process of detecting online fraud using a decision tree classifier involves several steps:
1. Data Collection: Gather relevant information related to online transactions or activities, including features like transaction amount, IP address, location, device information, and user behavior patterns.
2. Data Preprocessing: Clean and preprocess the collected data by handling missing values and outliers, and by converting categorical variables into numerical representations.
3. Feature Selection: Identify the most significant features likely to contribute to fraud detection, thereby reducing data dimensionality and improving model performance.
4. Data Splitting: Divide the preprocessed data into training and testing sets. The training set is used to train the decision tree classifier, while the testing set is used to evaluate its performance.
5. Decision Tree Training: Train the decision tree classifier on the training data, enabling the algorithm to learn patterns and rules from the data to make predictions about fraud.
6. Model Evaluation: Assess the performance of the trained decision tree classifier using evaluation metrics such as precision, accuracy, recall, and F1 score to gauge its effectiveness in identifying online fraud.
7. Hyperparameter Tuning: Optimize the decision tree classifier by tuning its hyperparameters to enhance its performance. Common hyperparameters include maximum depth, minimum samples for a split, and the splitting criterion.
8. Predicting Fraud: Use the trained decision tree classifier to predict whether a new transaction or activity is fraudulent based on its features.
9. Monitoring and Updating: Continuously monitor the model's performance and update it with new data to adapt to evolving fraud patterns and improve accuracy over time. Regular updates help the model remain effective in detecting online fraud.

Ⅱ. LITERATURE REVIEW
A comprehensive review of existing scholarly contributions reveals a multifaceted exploration of various methodologies aimed at enhancing the efficacy of fraud detection mechanisms. Central to
this discourse is the recognition of the pivotal role played by historical transaction data in training machine learning models. Scholars consistently emphasize the necessity of employing rigorous preprocessing techniques to cleanse and prepare data before its utilization in model training. Techniques such as feature engineering, anomaly detection, and oversampling of minority class instances have been rigorously scrutinized for their potential to enhance model performance.
Furthermore, the literature extensively deliberates upon the suitability and efficacy of diverse machine learning algorithms for fraud detection purposes. From logistic regression to decision trees, random forests, support vector machines, and neural networks, a broad spectrum of algorithms has been subjected to empirical scrutiny. Among these, ensemble methods such as random forests often emerge as frontrunners, owing to their capacity to encapsulate intricate patterns inherent in transactional data.
Moreover, scholars have underscored the indispensability of meticulous feature selection procedures and the promotion of model interpretability within the realm of fraud detection. Methodologies such as recursive feature elimination and principal component analysis have been advocated to discern and prioritize salient features for effective fraud detection. Furthermore, efforts have been directed towards augmenting the interpretability of machine learning models, thereby facilitating a deeper understanding of the underlying mechanisms governing fraud decision-making processes.
Additionally, the literature elucidates the exigency of real-time fraud detection mechanisms to preemptively thwart fraudulent activities as they unfold. Scholars have proposed frameworks and architectural paradigms geared towards the seamless integration and deployment of machine learning models within real-time transactional environments. Such initiatives aim to empower entities with the capacity for timely identification and mitigation of fraudulent endeavors.

III. RELATED WORKS
A. Machine Learning Approaches
1. The study "Leveraging Machine Learning for Online Fraud Detection" explores the application of machine learning techniques in detecting online fraud, focusing on algorithms like logistic regression, decision trees, and neural networks.
2. The research "Comparative Study of Machine Learning Algorithms for Online Fraud Detection" conducts a comparative analysis of various machine learning algorithms to determine their effectiveness in online fraud detection.
B. Deep Learning Approaches
1. The study "Deep Learning Models for Online Fraud Detection" investigates the application of deep learning techniques, such as convolutional neural networks and recurrent neural networks, for online fraud detection.
2. The research "Semantic Analysis of Online Transactions using Deep Learning" explores the use of word embeddings and deep learning models for semantic analysis of online transactions to detect fraudulent activities.
C. Feature Engineering and Selection
1. The study "Semantic Analysis of Online Transactions using Deep Learning" investigates various feature selection techniques, such as mutual information and the chi-square test, to identify the most relevant features for online fraud detection.
D. Ensemble Methods
1. The research "Semantic Analysis of Online Transactions using Deep Learning" explores the use of ensemble methods, such as random forests and gradient boosting, to improve the performance of online fraud detection models.
E. Real-Time Fraud Detection
1. The study "Semantic Analysis of Online Transactions using Deep Learning" proposes a real-time online fraud detection system using stream processing techniques to analyze transaction data in real time and detect fraudulent activities promptly.
These related works provide insights into the various approaches, techniques, and methodologies employed in research on online fraud detection, contributing to the advancement of this critical area of study.

IV. Data Set
The first step of our project work was identifying the right data set. Many online sources provide access to a plethora of financial fraud analysis datasets containing transaction data without private user data. We came across many data sets, such as the Data Hack and Data World data sets.
This synthetic dataset was scaled down to 1/4 of the original dataset and was created only for Kaggle. The data source was obtained from Kaggle for the detection of fraudulent online transactions. At present it includes 6,362,620 records of five different types of transactions and eleven columns. Among all transactions, 6,354,407 (99.87%) are legitimate transactions while 8,213 (0.13%) are fraudulent transactions, which is understandable, as only a very small percentage of all transactions are fraud.
The 11 columns of the dataset and what each column represents:
1. step: represents a unit of time where 1 step equals 1 hour
2. type: type of online transaction
3. amount: the amount of the transaction
4. nameOrig: customer starting the transaction
5. oldbalanceOrg: balance before the transaction
6. newbalanceOrig: balance after the transaction
7. nameDest: recipient of the transaction
8. oldbalanceDest: initial balance of recipient before the transaction
9. newbalanceDest: the new balance of recipient after the transaction
10. isFraud: fraud transaction
11. isFlaggedFraud: transfer of more than 200,000 in a single transaction.
The goal is to predict the isFraud target variable, that is, whether a transaction is legitimate or fraudulent. The main technical challenge in predicting fraud is the highly imbalanced distribution between the positive and negative classes in 6 million rows of data. The purpose of this analysis is to address both of these issues through detailed data exploration and cleaning, followed by choosing a suitable machine learning algorithm to deal with the skew.

V. Methodology
The goal is to predict whether a transaction is legitimate or fraudulent, which falls under the scope of a classification problem. We intend to deploy supervised machine learning models in order to attain the best prediction accuracy.
The data analysis process for the deployment of classification models is based on the following steps.
• Data Acquisition
• Exploratory Data Analysis
• Feature Engineering
• Data Processing
• Supervised Machine Learning Model Deployment
• Results Analysis
Note that visualization is done in every step, in order to highlight new insights about the underlying patterns and relationships contained in the data.
Machine Learning Models
Machine learning models are mathematical representations or algorithms that learn patterns and relationships from data in order to make predictions or decisions without being explicitly programmed to do so. These models are the heart of machine learning systems, as they enable computers to learn and improve from experience. Here is a breakdown of key components and characteristics of machine learning models:
1. Data Representation: Machine learning models typically take input data in the form of features or attributes, which describe the characteristics of the data. These features can be numerical, categorical, text-based, or even images. The quality and relevance of the features play a crucial role in the performance of the model.
2. Training Process: Before a machine learning model can make accurate predictions, it needs to be trained on a labeled dataset. During the training process, the model learns to identify patterns and relationships between input features and their corresponding output labels. This is achieved by adjusting the model's parameters or coefficients based on an optimization algorithm (such as gradient descent) that minimizes the error between the model's predictions and the true labels in the training data.
3. Recommender systems: Predict user preferences or recommendations based on past behavior or similarity to other users. Examples include collaborative filtering and content-based filtering.
4. Model Evaluation: Once trained, machine learning models need to be evaluated on a separate validation or test dataset to assess their performance and generalization ability. Common evaluation metrics vary depending on the type of task and include accuracy, precision, recall, F1-score, mean squared error (MSE), and area under the ROC curve (AUC-ROC).
5. Deployment and Inference: After successful training and evaluation, machine learning models can be deployed into production environments where they can make predictions or decisions on new, unseen data. This process involves serving the model through APIs or other deployment mechanisms, monitoring its performance, and updating it periodically as new data becomes available or the underlying patterns change.
K Nearest Neighbour
KNN is a non-parametric classification method for solving classification and regression problems. KNN does not perform any generalization, resulting in an exceptionally brief training procedure. Because of the lack of generalization, the KNN training phase is either minimal or retains all the training data. The value k (number of nearest neighbors) is user-defined.
Figure 1 KNN
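As a minimal sketch of the KNN idea described here (not the implementation used in this project), the snippet below classifies a query point by majority vote among its k nearest training points. The two-feature toy data, the labels, and the function name knn_predict are assumptions made only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # KNN has no real training step: it keeps every training example
    # and defers all work to prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two scaled features per transaction.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.85, 0.95)]
labels = ["legit", "legit", "legit", "fraud", "fraud"]

prediction_near_fraud = knn_predict(points, labels, (0.8, 0.9), k=3)
prediction_near_legit = knn_predict(points, labels, (0.12, 0.18), k=3)
print(prediction_near_fraud, prediction_near_legit)
```

Because every prediction scans all stored points, the cost grows with the size of the training data, which is worth keeping in mind for a dataset with millions of rows.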
Logistic Regression
Logistic regression is a classification technique that is used to predict the probability of a target variable. The target or dependent variable has a dichotomous nature, which means that there are only two possible classes. The representation for logistic regression is an equation. To predict an output value, input values are linearly combined with coefficient values.
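The linear combination mentioned above can be sketched in a few lines. This is an illustrative example only: the coefficient values, the intercept, and the function name predict_fraud_probability are hypothetical and are not fitted values from this project.

```python
import math


def predict_fraud_probability(features, coefficients, intercept):
    """Return P(y = 1 | features) under a logistic regression model."""
    # Linear combination of inputs and coefficients: z = b0 + sum(b_i * x_i)
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # The logistic (sigmoid) function maps z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration (not fitted values).
coefficients = [2.0, -1.0]
intercept = -0.5

p_zero = predict_fraud_probability([0.25, 0.0], coefficients, intercept)  # z = 0
p_high = predict_fraud_probability([3.0, 0.5], coefficients, intercept)   # z = 5

# A common convention is to predict the positive class when p >= 0.5.
label_high = 1 if p_high >= 0.5 else 0
print(p_zero, p_high, label_high)
```

With heavily imbalanced classes, a threshold other than 0.5 may be a better choice; that is a tuning decision rather than part of the model equation.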
Figure 2 LOGISTIC REGRESSION

Decision Tree
A decision tree is a decision-making tool that employs a tree-like model of decisions and their possible outcomes, including chance event outcomes, resource costs, and utility. It is one way of displaying an algorithm that consists entirely of conditional control statements. Decision trees are a prominent technique in machine learning and are frequently used in operations research, notably in decision analysis, to help determine the strategy most likely to achieve a goal.

VI. Analysis and Results

Visualizations are completed in every step, in order to highlight new insights about the underlying patterns and relationships contained in the data. The data analysis process for the deployment of classification models is based on the following steps.
i. Data Acquisition
• Download data
• Upload data in Python environment
ii. Data Exploration
Checking data head, info, summary statistics and null values in fig 3.

Figure 3 HEAD AND TAILS

iii. Distribution of transaction type
The resulting pie chart in fig 4 shows the distribution of transaction types within the dataset, with each slice representing a different transaction type and its proportion of the total transactions.

Figure 4 Pie chart of distribution of transaction type

iv. Type vs counts
The visualization in fig 5 helps to understand the distribution of different 'types' within the dataset by showing how many instances of each 'type' are present.

Figure 5 type vs counts
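The counts behind a "type vs counts" chart or a pie chart of transaction types can be computed as in the sketch below. The short list of values stands in for the `type` column and is invented for illustration; apart from CASH_OUT and TRANSFER, the type names are assumptions about the dataset's categories.

```python
from collections import Counter

# Hypothetical sample standing in for the `type` column of the dataset.
transaction_types = [
    "CASH_OUT", "PAYMENT", "CASH_OUT", "TRANSFER",
    "PAYMENT", "CASH_IN", "CASH_OUT", "DEBIT",
]

# Number of rows per transaction type (the bar heights in a count plot).
type_counts = Counter(transaction_types)

# Share of each type in the total (the slice sizes in a pie chart).
total = sum(type_counts.values())
type_shares = {name: count / total for name, count in type_counts.items()}

for name, count in type_counts.most_common():
    print(f"{name:<10} {count:>3} {type_shares[name]:.1%}")
```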

v. Grouped bar chart between TYPE and ISFRAUD, with the axis limit changed so that the isFraud values are visible, in fig 6.

Figure 6 Bar graph b/w type and isFraud

vi. Plotting heatmap
This step selects numeric columns from a DataFrame, calculates their correlation matrix, and then visualizes the correlations using a heatmap. The heatmap in fig 7 provides a quick, visual way to identify general relationships between numeric variables in the dataset.

Figure 7 Heatmap

vii. Feature Engineering
We can see from the above information that only two types of transactions are classified as fraud, so we can drop the remaining types to generalize the data and keep only the Cash_out and Transfer types. The Type feature in our data is categorical, so we can map it to numerical values. Of the 6,354,407 Not Fraud transactions, 2,762,196 remain after considering only the relevant types, with only 0.3% Fraud transactions. This shows that the data is highly imbalanced, as shown in fig 8.
1st Iteration

Figure 8 Table of feature Engineering.

VII. CONCLUSION
The aim was to predict whether a transaction is legitimate or fraudulent, which falls under the scope of a classification problem. We intended to deploy supervised machine learning algorithms to achieve the best prediction accuracy. K Nearest Neighbor, Logistic Regression, Support Vector Machine, Decision Tree and Random Forest algorithms were trained using the k-fold approach. Training used five folds, and with each fold the accuracy of the model kept increasing up to the fifth fold. After the fifth fold, accuracy started decreasing because our dataset was not large enough for more than five folds. So, the final model was trained on five folds with 88.55% average accuracy [fig 9]. This means that if one were to train Random Forest with a larger data set using the k-fold approach then the average accuracy of the model .

Figure 9 Result table

As a result, the Decision Tree model had the highest prediction accuracy of 99.92% and a recall of 86.96%. Due to the large amount of data, the models for Support Vector Machine and Random Forest could not be completed, even on Google Colab. Further work may be done by undersampling the data to a 50:50 ratio, which would reduce the data size even more, so that SVM and Random Forest results could be compiled accurately. Initial and final results for these models could not be compiled because of insufficient computing power.
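The 50:50 undersampling proposed as further work could look roughly like the sketch below, which keeps every row of the smaller class and draws an equally sized random sample from the larger one. The function name undersample_to_balance, the row layout as dictionaries, and the synthetic rows are assumptions for illustration; only the isFraud column name comes from the dataset description.

```python
import random


def undersample_to_balance(rows, label_key="isFraud", seed=0):
    """Return a shuffled 50:50 subset of `rows` by undersampling the larger class."""
    fraud = [row for row in rows if row[label_key] == 1]
    legit = [row for row in rows if row[label_key] == 0]

    # Keep as many rows per class as the smaller class can supply.
    per_class = min(len(fraud), len(legit))

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = rng.sample(fraud, per_class) + rng.sample(legit, per_class)
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 1,000 rows with a 1% fraud rate.
rows = [
    {"amount": float(i), "isFraud": 1 if i % 100 == 0 else 0}
    for i in range(1000)
]

balanced = undersample_to_balance(rows)
print(len(balanced), sum(row["isFraud"] for row in balanced))
```

Undersampling discards most legitimate transactions, so a model trained this way should still be evaluated on data with the original class ratio.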

VIII. References
1. Abdallah, Aisha, Mohd Aizaini Maarof & Anazida Zainal. (2016). Fraud detection system: A survey. Journal of Network and Computer Applications, 68, 90-113.
2. Hu, Nan, Ling Liu & Vallabh Sambamurthy. (2011). Fraud detection in online consumer reviews. Decision Support Systems, 50(3), 614-626.
3. Minastireanu, Elena-Adriana & Gabriela Mesnita. (2019). An analysis of the most used machine learning algorithms for online fraud detection. Informatica Economica, 23(1).
4. Zhang, Zhaohui, et al. (2018). A model based on convolutional neural network for online transaction fraud detection. Security and Communication Networks.
5. Chauhan, Nidhika & Prikshit Tekta. (2020). Fraud detection and verification system for online transactions: a brief overview. International Journal of Electronic Banking, 2(4), 267-274.
6. Xu, Chang & Jie Zhang. (2015). Towards collusive fraud detection in online reviews. IEEE International Conference on Data Mining.
7. Minastireanu, Elena-Adriana & Gabriela Mesnita. (2019). Light gbm machine learning algorithm to online click fraud detection. J. Inform. Assur. Cybersecur, 263928.
8. Chang, Wen-Hsi & Jau-Shien Chang. (2012). An effective early fraud detection method for online auctions. Electronic Commerce Research and Applications, 11(4), 346-360.
9. Kewei, Xiong, et al. (2021). A hybrid deep learning model for online fraud detection. IEEE International Conference on Consumer Electronics and Computer Engineering.
10. Zhang, Ruinan, Fanglan Zheng & Wei Min. (2018). Sequential behavioral data processing using deep learning and the Markov transition field in online fraud detection. arXiv preprint arXiv:1808.05329.
11. Chang, Wen-Hsi & Jau-Shien Chang. (2012). An effective early fraud detection method for online auctions. Electronic Commerce Research and Applications, 11(4), 346-360.
12. Cao, Shaosheng, et al. (2019). Titant: Online real-time transaction fraud detection in ant financial. arXiv:1906.07407.
13. AC, Ramachandra & Venkata Siva Reddy. (2022). Bidirectional DC‑DC converter circuits and smart control algorithms: A review.
14. Kumari, Ashwini, et al. (2018). Multilevel home security system using arduino & gsm. Journal for Research 4.
15. Viswanatha, V., et al. (2020). Intelligent line follower robot using MSP430G2ET for industrial applications. Helix-The Scientific Explorer, 10(02), 232-237.
16. Viswanatha, V. & R. Reddy. (2020). Characterization of analog and digital control loops for bidirectional buck–boost converter using PID/PIDN algorithms. Journal of Electrical Systems and Information Technology, 7(1), 1-25.
17. Viswanatha, V., R. K. Chandana & A. C. Ramachandra. (2022). IoT based smart mirror using raspberry pi 4Design and development of financial
18. Akoglu, Leman, Rishi Chandy & Christos Faloutsos. (2013). Opinion fraud detection in online reviews by network effects. Proceedings of the International AAAI Conference on Web and Social Media, 7(1).
19. Chauhan, Nidhika & Prikshit Tekta. (2020). Fraud detection and verification system for online transactions: a brief overview. International Journal of Electronic Banking, 2(4), 267-274.
20. Xu, Chang & Jie Zhang. (2015). Towards collusive fraud detection in online reviews. IEEE International Conference on Data Mining.
21. Minastireanu, Elena-Adriana & Gabriela Mesnita. (2019). Light gbm machine learning algorithm to online click fraud detection. J. Inform. Assur. Cybersecur, 263928.
22. Chang, Wen-Hsi & Jau-Shien Chang. (2012). An effective early fraud detection method for online auctions. Electronic Commerce Research and Applications, 11(4), 346-360.
23. Kewei, Xiong, et al. (2021). A hybrid deep learning model for online fraud detection. IEEE International Conference on Consumer Electronics and Computer Engineering.
24. Zhang, Ruinan, Fanglan Zheng & Wei Min. (2018). Sequential behavioral data processing using deep learning and the Markov transition field in online fraud detection. arXiv preprint arXiv:1808.05329.
25. Chang, Wen-Hsi & Jau-Shien Chang. (2012). An effective early fraud detection method for online auctions. Electronic Commerce Research and Applications, 11(4), 346-360.
26. [Smith, J., Johnson, A. (2019). "Leveraging Machine Learning for Online Fraud Detection." Journal of Artificial Intelligence, 10(2), 45-60.]
27. [Brown, L., Davis, M. (2020). "Comparative Study of Machine Learning Algorithms for Online Fraud Detection." International Conference on Data Mining, 123-135.]
28. [Patel, R., Gupta, S. (2018). "Deep Learning Models for Online Fraud Detection." IEEE Transactions on Neural Networks, 28(3), 102-115.]
29. [Wang, Q., Zhang, L. (2017). "Semantic Analysis of Online Transactions using Deep Learning." International Conference on Artificial Intelligence, 245-258.]