
Credit Card Fraud Detection Using Machine Learning

This document discusses the use of machine learning and data science for credit card fraud detection, emphasizing the importance of identifying fraudulent transactions to protect customers. It outlines various algorithms, including Local Outlier Factor and Isolation Forest, used to model and analyze credit card transaction data to detect anomalies. The study highlights the challenges of fraud detection, such as class imbalance and the need for continuous improvement of algorithms through feedback and additional data.

Credit Card Fraud Detection using Machine Learning and Data Science

AUTHORS: Karthika.K and Monisha.T
DEPARTMENT: II B.Sc Computer Science.
E-mail: [email protected] u.in, stu_MONISHA_T4@cttewc.edu.in

Abstract— It is vital that credit card companies are able to identify fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Such problems can be tackled with Data Science, and its importance, along with Machine Learning, cannot be overstated. This project intends to illustrate the modelling of a data set using machine learning for Credit Card Fraud Detection. The Credit Card Fraud Detection Problem includes modelling past credit card transactions with the data of the ones that turned out to be fraud. This model is then used to recognize whether a new transaction is fraudulent or not. Our objective here is to detect 100% of the fraudulent transactions while minimizing the incorrect fraud classifications. Credit Card Fraud Detection is a typical example of a classification problem. In this process, we have focused on analysing and pre-processing data sets as well as the deployment of multiple anomaly detection algorithms such as Local Outlier Factor and Isolation Forest algorithm on the PCA transformed Credit Card Transaction data.

Keywords— Credit card fraud, applications of machine learning, data science, isolation forest algorithm, local outlier factor, automated fraud detection.

I. INTRODUCTION

'Fraud' in credit card transactions is the unauthorized and unwanted usage of an account by somebody other than the owner of that account. Necessary prevention measures can be taken to stop this abuse, and the behaviour of such fraudulent practices can be studied to reduce it and defend against similar occurrences in the future. In other words, credit card fraud can be defined as a case where someone uses someone else's credit card for personal reasons while the owner and the card issuing authorities are unaware of the fact that the card is being used.

Fraud detection involves monitoring the activities of populations of users in order to estimate, perceive or avoid objectionable behaviour, which comprises fraud, intrusion, and defaulting. This is a very relevant problem that demands the attention of communities such as machine learning and data science, where the solution to this problem can be automated. This problem is especially challenging from the perspective of learning, because it is characterised by various factors such as class imbalance. The number of valid transactions far outnumbers the fraudulent ones. Also, the transaction patterns often change their statistical properties over the course of time.

These are not the only challenges in the implementation of a real-world fraud detection system, however. In real-world examples, the massive stream of payment requests is quickly scanned by automatic tools that determine which transactions to authorize. Machine learning algorithms are employed to analyse all the authorized transactions and report the suspicious ones. These reports are investigated by professionals who contact the cardholders to confirm whether the transaction was genuine or fraudulent. The investigators provide feedback to the automated system, which is used to train and update the algorithm to eventually improve the fraud-detection performance over time.

Fraud detection methods are continuously developed to counter criminals as they adapt their fraudulent strategies. These frauds are classified as:
• Credit Card Frauds: Online and Offline
• Card Theft
• Account Bankruptcy
• Device Intrusion
• Application Fraud
• Counterfeit Card
• Telecommunication Fraud

Some of the currently used approaches to detection of such fraud are:
• Artificial Neural Network
• Fuzzy Logic
• Genetic Algorithm
• Logistic Regression
• Decision tree
• Support Vector Machines
• Bayesian Networks
• Hidden Markov Model
• K-Nearest Neighbour

II. LITERATURE REVIEW

Fraud is an unlawful or criminal deception intended to result in financial or personal benefit. It is a deliberate act that is against the law, rule or policy with an aim to attain unauthorized financial benefit.

Numerous works pertaining to anomaly or fraud detection in this domain have already been published and are available for public usage. A comprehensive survey conducted by Clifton Phua and his associates has revealed that techniques employed in this domain include data mining applications, automated fraud detection and adversarial detection. In another paper, Suman, Research Scholar, GJUS&T at Hisar HCE, presented techniques like supervised and unsupervised learning for credit card fraud detection. Even though these methods and algorithms achieved unexpected success in some areas, they failed to provide a permanent and consistent solution to fraud detection.

A similar research domain was presented by Wen-Fang Yu and Na Wang, where they used outlier mining, outlier detection mining and distance sum algorithms to accurately predict fraudulent transactions in an emulation experiment on the credit card transaction data set of a certain commercial bank. Outlier mining is a field of data mining that is mainly used in financial and internet fields. It deals with detecting objects that are detached from the main system, i.e. the transactions that aren't genuine. They have taken attributes of customers' behaviour and, based on the value of those attributes, they have calculated the distance between the observed value of that attribute and its predetermined value.

Unconventional techniques such as a hybrid data mining/complex network classification algorithm, which is able to perceive illegal instances in an actual card transaction data set and is based on a network reconstruction algorithm that allows creating representations of the deviation of one instance from a reference group, have proven efficient, typically on medium-sized online transactions.

There have also been efforts to progress from a completely new aspect. Attempts have been made to improve the alert-feedback interaction in case of a fraudulent transaction. In case of a fraudulent transaction, the authorised system would be alerted and feedback would be sent to deny the ongoing transaction.

Artificial Genetic Algorithm, one of the approaches that shed new light on this domain, countered fraud from a different direction. It proved accurate in finding the fraudulent transactions and minimizing the number of false alerts. However, it was accompanied by a classification problem involving misclassification costs.

III. METHODOLOGY

The approach that this paper proposes uses the latest machine learning algorithms to detect anomalous activities, called outliers. The basic rough architecture diagram can be represented with the following figure:

When looked at in detail on a larger scale along with real-life elements, the full architecture diagram can be represented as follows:

First, we obtained our dataset from Kaggle, a data analysis website that provides datasets. There are 31 columns in this dataset, 28 of which are named v1-v28 to protect sensitive data. The other columns represent Time, Amount and Class. Time is the time elapsed between the first transaction in the dataset and the given transaction. Amount is the amount of money transacted. Class 0 stands for a valid transaction and 1 for a fraudulent one. We draw different charts to check the data set for inconsistencies and to understand it visually:

This graph shows that the number of fraudulent transactions is much lower than the number of legitimate ones.

This graph shows the times at which transactions were made within two days. It can be seen that the fewest transactions were made during the night and the most during the day.

This graph represents the amount that was transacted. A majority of transactions are relatively small and only a handful of them come close to the maximum transacted amount.
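The class imbalance described above can be checked numerically before any chart is drawn. The following is a minimal sketch, assuming pandas and assuming the label column is called Class as in the dataset description. The helper name class_balance and the tiny hand-made frame are hypothetical stand-ins; with the real data one would load the Kaggle file instead.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, label_col: str = "Class") -> dict:
    """Count valid (0) and fraudulent (1) rows and report the fraud share."""
    counts = df[label_col].value_counts()
    n_valid = int(counts.get(0, 0))
    n_fraud = int(counts.get(1, 0))
    return {
        "valid": n_valid,
        "fraud": n_fraud,
        "fraud_fraction": n_fraud / len(df),
    }


# Made-up rows that only mimic the column layout (Time, V-columns, Amount, Class).
# The values are illustrative and are not taken from the real dataset.
toy = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 5.0, 9.0],
    "V1": [-1.3, 1.1, -0.9, 0.4, 2.7],
    "Amount": [149.62, 2.69, 378.66, 12.50, 999.00],
    "Class": [0, 0, 0, 0, 1],
})

summary = class_balance(toy)
print(summary)
```

On the real data the fraud fraction is expected to be very small, which is why accuracy alone is a weak measure of success for this problem.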

After checking this dataset, we plot a histogram for every column. This is done to get a graphical representation of the dataset, which can be used to verify that there are no missing values in the dataset. This ensures that no missing-value imputation is required and that the machine learning algorithms can process the dataset smoothly.
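Besides the visual check, missing values can be counted directly. The sketch below assumes pandas; the function name missing_value_counts and the small frame with one deliberately missing Amount are hypothetical and serve only to show what the check reports.

```python
import numpy as np
import pandas as pd


def missing_value_counts(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing entries in each column."""
    return df.isnull().sum()


# Illustrative frame with one missing Amount, so the check has something to find.
toy = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0],
    "Amount": [149.62, np.nan, 378.66],
    "Class": [0, 0, 1],
})

missing = missing_value_counts(toy)
needs_imputation = bool(missing.sum() > 0)
print(missing)
print("imputation needed:", needs_imputation)
```

The per-column histograms themselves can be produced with the pandas DataFrame.hist method, which relies on matplotlib being installed.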

After this analysis, we plot a heatmap to get a coloured representation of the data and to study the correlation between our predicting variables and the class variable. This heatmap is shown below:
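The numbers behind such a heatmap are a correlation matrix. The sketch below, assuming pandas, computes it and ranks the predicting variables by the strength of their correlation with the class variable. The function name correlation_with_class and the toy values are hypothetical; V1 is constructed to follow the class and V2 to be unrelated to it.

```python
import pandas as pd


def correlation_with_class(df: pd.DataFrame, label_col: str = "Class") -> pd.Series:
    """Pearson correlation of every feature with the label, strongest first."""
    corr_matrix = df.corr()
    with_label = corr_matrix[label_col].drop(label_col)
    # Order by absolute value but keep the sign of each correlation.
    order = with_label.abs().sort_values(ascending=False).index
    return with_label.reindex(order)


# Made-up data: V1 is large only for the fraud rows, V2 alternates regardless of class.
toy = pd.DataFrame({
    "V1": [0.1, 0.2, 0.0, 0.1, 5.0, 5.2],
    "V2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "Class": [0, 0, 0, 0, 1, 1],
})

ranked = correlation_with_class(toy)
print(ranked)
```

The full matrix returned by df.corr() is what a plotting routine would colour. Using a function such as seaborn's heatmap for that step is an assumption about tooling, since the text does not name the plotting library.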

The data is now formatted and processed. The "Time" and "Amount" columns are standardized and the "Class" column is removed to ensure fairness of evaluation. The data is processed by a set of algorithms from modules. The following module diagram explains how these algorithms work together:

This data is fitted into a model and the following outlier detection modules are applied to it:

• Local Outlier Factor
• Isolation Forest Algorithm
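The preprocessing step described above can be sketched as follows, assuming scikit-learn's StandardScaler is used for the standardization (the text does not say which tool was used). The function name prepare_features and the toy values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_features(df, scale_cols=("Time", "Amount"), label_col="Class"):
    """Split off the label and standardize the chosen columns (zero mean, unit variance)."""
    X = df.drop(columns=[label_col]).copy()
    y = df[label_col].copy()
    cols = list(scale_cols)
    X[cols] = StandardScaler().fit_transform(X[cols])
    return X, y


# Illustrative rows only; the real dataset has V1 to V28 in addition to these columns.
toy = pd.DataFrame({
    "Time": [0.0, 60.0, 120.0, 180.0],
    "V1": [-1.3, 1.1, -0.9, 2.7],
    "Amount": [10.0, 20.0, 30.0, 1000.0],
    "Class": [0, 0, 0, 1],
})

X, y = prepare_features(toy)
print(X)
```

Keeping the label in a separate series means the detectors never see it during fitting, while it stays available for scoring their output afterwards.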

These algorithms are part of sklearn. The ensemble module in the sklearn package contains ensemble-based methods and functions for classification, regression and outlier detection. This free and open source Python library is built on the NumPy, SciPy and Matplotlib modules and provides many simple and efficient tools that can be used for data analysis and machine learning. It has various classification, clustering and regression algorithms and is designed to work with numerical and scientific libraries.
We used the Jupyter Notebook platform to write a Python program that demonstrates the approach proposed in this document. This program can also run in the cloud on the Google Colab platform, which supports all Python notebook files. Detailed explanations of the modules, with pseudocode for their algorithms and output graphs, are given as follows:
A. Local Outlier Factor
It is an unsupervised outlier detection algorithm. "Local Outlier Factor" refers to the anomaly score of each sample. It measures the local deviation of the sample data with respect to its neighbors. More specifically, locality is given by the k-nearest neighbors, whose distance is used to estimate the local density. The pseudocode for this algorithm is as follows:

By comparing the local density of a sample with that of its neighbors, one can identify samples whose density is significantly lower than that of their neighbors. These samples are anomalous and are considered outliers.
Since the dataset is very large, we used only a fraction of it in our tests to reduce processing times. The end result with the fully processed data set is also determined and is given in the results section of this work.
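A minimal sketch of this detector, assuming scikit-learn's LocalOutlierFactor class. The synthetic two-dimensional points, the three planted outliers, n_neighbors=20 and the contamination value are all illustrative assumptions and not settings reported in this work.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in data: a dense cluster of "valid" points plus three
# far-away points planted at the end to play the role of anomalies.
rng = np.random.RandomState(0)
inliers = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = np.array([[6.0, 6.0], [-6.0, 5.5], [7.0, -6.5]])
X = np.vstack([inliers, outliers])

# contamination is the expected share of outliers; here it is known by construction.
lof = LocalOutlierFactor(n_neighbors=20, contamination=len(outliers) / len(X))
labels = lof.fit_predict(X)  # -1 marks an outlier, 1 an inlier

# scikit-learn stores the negated score, so flip the sign: larger means more anomalous.
scores = -lof.negative_outlier_factor_

# Map to the dataset's convention: 1 = flagged as fraud, 0 = valid.
pred_fraud = (labels == -1).astype(int)
print("flagged indices:", np.where(pred_fraud == 1)[0])
```

The planted points sit in a region of much lower density than their nearest neighbours, so they receive the largest scores.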
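B. Isolation Forest Algorithm
The Isolation Forest 'isolates' observations by arbitrarily selecting a feature and then randomly selecting a split value between the maximum and minimum values of the designated feature.
Recursive partitioning can be represented by a tree structure, and the number of splits required to isolate a sample equals the path length from the root node to the terminating node.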
On plotting the results of the Local Outlier Factor algorithm, we get the following figure:

On plotting the results of the Isolation Forest algorithm, we get the following figure:

Random partitioning produces shorter paths for anomalies. When a forest of random trees collectively produces shorter path lengths for certain samples, those samples are highly likely to be anomalies. Once the anomalies are detected, the system can be used to report them to the relevant authorities. For test purposes, we compare the outputs of these algorithms with the class values to check for false positives and to determine their accuracy and precision.
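The comparison against the class values can be sketched with scikit-learn's metrics functions. The labels below are made up so that the arithmetic can be checked by hand, and the helper to_class_labels is a hypothetical name for the step that maps detector output (-1 or 1) to the dataset's convention (1 or 0).

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, precision_score


def to_class_labels(detector_labels):
    """Convert detector output (-1 = outlier, 1 = inlier) to 1 = fraud, 0 = valid."""
    return np.where(np.asarray(detector_labels) == -1, 1, 0)


# Hypothetical ground truth: 95 valid transactions followed by 5 frauds.
y_true = np.array([0] * 95 + [1] * 5)

# Hypothetical predictions: start from the truth, then add 6 false positives
# and miss 2 of the 5 frauds.
y_pred = y_true.copy()
y_pred[[0, 1, 2, 3, 4, 5]] = 1
y_pred[[95, 96]] = 0

n_errors = int((y_pred != y_true).sum())
accuracy = accuracy_score(y_true, y_pred)    # share of all predictions that are correct
precision = precision_score(y_true, y_pred)  # share of flagged transactions that are real fraud

print("errors:", n_errors)
print("accuracy:", accuracy)
print("precision:", precision)
print(classification_report(y_true, y_pred))
```

In this made-up case 92 of 100 predictions are correct, yet only 3 of the 9 flagged transactions are actual fraud. High accuracy combined with low precision is the pattern to expect whenever valid transactions vastly outnumber fraudulent ones.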


IV. IMPLEMENTATION
This idea is difficult to implement in real life because it requires the cooperation of banks, which are unwilling to share information because of market competition, for legal reasons, and to protect the data of their users. So we looked up some reference papers that followed similar approaches and collected results. As stated in one of these reference works:

"This technique was applied to a complete application data set supplied by a German bank in 2006. For reasons of banking confidentiality, only a summary of the results obtained is presented below. After applying this technique, the level 1 list includes only a few cases, but with a high probability of being fraudsters. All of the people mentioned in this list had their cards closed to avoid any risk due to their high-risk profile. The condition is more complex for the other lists. The level 2 list is still restricted enough to be examined on a case-by-case basis. Credit and collections officers considered that half of the cases in this list could be regarded as suspicious fraudulent behaviour. For the last and biggest list, the work is equally heavy. Less than a third of them are suspicious. To maximize time efficiency and reduce overhead charges, one possibility is to include a new element in the query; this element can be the first five digits of the phone numbers, the email address and the password, for example. The new queries can be applied to the level 2 list and the level 3 list."

V. RESULTS
The code returns the number of false positives detected and compares them with the actual values. This is used to calculate the accuracy score and the precision of the algorithms. The proportion of data we used for faster testing is 10% of the entire data set. The complete data set is also used at the end and both results are printed.
These results, together with the classification report for each algorithm, are given in the output as follows, where class 0 means the transaction was determined to be valid and 1 means it was determined to be a fraudulent transaction.

Results when the complete dataset is used:

VI. CONCLUSION
Credit card fraud is undoubtedly an act of criminal dishonesty. This article has listed the most common fraud methods along with their detection methods and reviewed the latest findings in the field. This article also explained in detail how machine learning can be applied to achieve better fraud detection results, along with the algorithm, the pseudocode, the explanation of its implementation, and the test results.
While the algorithm achieves over 99.6% accuracy, its precision stays at only 28% when one-tenth of the data set is considered. However, when the entire dataset is fed into the algorithm, the precision increases to 33%. This high percentage of accuracy is to be expected due to the large imbalance between the number of valid and fraudulent transactions.

Since the entire dataset consists of only two days' transaction records, it is only a fraction of the data that could be made available if this project were used on a commercial scale. Being based on machine learning algorithms, the program will only increase its efficiency over time as more data is put into it.

VII. FUTURE ENHANCEMENTS
Although we couldn't reach the goal of 100% accuracy in fraud detection, we ended up developing a system that, given enough time and data, can come very close to that goal. As with any project of this nature, there is still room for improvement.
The nature of this project allows multiple algorithms to be integrated as modules and their results to be combined to increase the accuracy of the final result. This model can be further improved by adding more algorithms. However, the output of these algorithms must be in the same format as the others. Once this condition is met, the modules can be easily added in the code. This gives the project a high degree of modularity and versatility.
Further possibilities for improvement can be found in the data set. As previously shown, the precision of the algorithms increases as the size of the data set increases. Therefore, more data will certainly make the model more accurate in detecting fraud and reduce the number of false positives. However, this requires official support from the banks themselves.

REFERENCES
[1] "Detection of credit card fraud based on transaction behavior" by John Richard D. Kho and Larry A. Vea, published in Proc. of the 2017 IEEE Region 10 Conference (TENCON), Malaysia, November 5-8, 2017.
[2] Clifton Phua, Vincent Lee, Kate Smith and Ross Gayler, "A Comprehensive Survey of Data Mining-based Fraud Detection Research", published by the School of Business Systems, Faculty of Information Technology, Monash University, Wellington Road, Clayton, Victoria 3800, Australia.
[3] "Survey Paper on Credit Card Fraud Detection" by Suman, Research Scholar, GJUS&T Hisar HCE, Sonepat, published by International Journal of Advanced Research in Computer Engineering and Technology (IJARCET), Volume 3, Issue 3, March 2014.
[4] "Research on credit card fraud detection model based on distance sum" by Wen-Fang Yu and Na Wang, published in 2009 International Joint Conference on Artificial Intelligence.
[5] "Credit card fraud detection through parenclitic network analysis" by Massimiliano Zanin, Miguel Romance, Regino Criado and Santiago Moral, published by Hindawi Complexity, Volume 2018, Article ID 5764370, 9 pages.
[6] "Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy", published by IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 8, August 2018.
[7] "Credit Card Fraud Detection" by Ishu Trivedi, Monika and Mrigya Mridushi, published by the International Journal of Advanced Research in Computer and Communication Technology, Vol. 5, Issue 1, January 2016.
[8] David J. Weston, David J. Hand, M. Adams, Whitrow and Piotr Juszczak, "Plastic card fraud detection using peer group analysis", Springer, 2008.
