Credit Card Fraud Detection and Analysis
Abstract: With the advancement of technology and globalization around the world, the need for a cashless economy is increasing day by day. Credit cards and other electronic payment methods are therefore becoming available to everyone in order to keep pace with the competitive global economy. However, credit card fraud has become an important issue that calls into question the security and reliability of these payment systems. The advent of credit cards and their increasing functionality have not only given people more personal comfort, but have also attracted malicious actors interested in the handsome rewards to be earned through fraud. Credit cards are an attractive target for fraud nowadays, since a large amount of money can be obtained in a very short time without taking many risks. This is because the crime is often discovered only a few weeks after the fact [1]. Immediate fraud detection is therefore necessary for a smoothly functioning economic system. Successfully predicting credit card fraud can save someone months of hard-earned wages.
Index Terms: Credit Card, Fraud Detection, Isolation Forest, Local Outlier Factor, Support Vector Machine
I. INTRODUCTION
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Credit card fraud has risen sharply in recent years. In 2016, credit card fraud reached $4.57 billion, up 34 percent from the year before, and it is still increasing. In order to reduce the risks and increase the reliability of credit card transaction systems, it is very important to introduce better fraud detection methods. For that reason, thorough research on credit card transaction data is required. Classifying transactions as either fraudulent or legitimate might help to stop fraud automatically. Successful fraud detection might substantially reduce the economic losses incurred around the world every year.
II. RELATED WORK
Supervised [2]–[4] and unsupervised [5]–[7] methods have been proposed to detect credit card fraud. Unsupervised methods consist of outlier or anomaly detection techniques, which treat any transaction that does not conform to the majority as fraud. Notably, an unsupervised data-driven model (DDM) in a fraud detection system (FDS) can be configured directly from unlabeled transactions. A well-known method is peer group analysis [8], which groups customers based on their personal data and identifies fraud as transactions that deviate from typical cardholder behavior (see [9]). Typical cardholder behavior has also been modeled using self-organizing maps [10]–[12]. To date, supervised methods are the most popular for detecting fraud; they use labeled transactions to train classifiers. Fraud is detected by classifying the feature vectors of authorized transactions or possibly by analyzing the posterior of the classifier [13].
Various algorithms for classifying credit card transactions have been tested for fraud detection, including neural networks [3], [14], [15], logistic regression [16], association rules [17], support vector machines [18], modified Fisher discriminant analysis [19], and decision trees [20]–[22]. Some studies report that random forests (RF) achieve the best performance [2], [18], [23]–[25]. This is one of the reasons why we use RF in our experiments.
Performance measurement of fraud detection: The typical performance measure for fraud detection problems is the AUC [21], [24], [25]. The AUC can be estimated using the Mann-Whitney statistic [26], and its value can be interpreted as the probability that a classifier ranks a fraudulent transaction higher than a genuine one [27]. Another ranking metric often used in fraud detection is average precision [24], which corresponds to the area under the precision-recall curve. Although these metrics are widely used for detection problems, cost-based metrics are specifically designed for fraud detection purposes. Cost-based metrics [19], [20], [22] use a cost matrix, which associates a cost with each entry in the confusion matrix, to quantify the monetary loss caused by fraud. Elkan [28] pointed out that the cost matrix may be misleading because the minimum/maximum loss may change over time. To avoid this problem, normalized cost [18] or savings [20] can be used to evaluate performance relative to the maximum loss.
We believe that performance indicators should also take into account the availability of investigators, because they must check all alerts generated by the FDS. Since investigators have limited time and can only verify a small number of alerts per day, an effective FDS should provide them with a small number of reliable alerts. This is why we introduced the alert-precision indicators described in Section III.
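To illustrate the Mann-Whitney interpretation of the AUC mentioned in this section, the following minimal Python sketch estimates the AUC as the fraction of (fraud, genuine) pairs in which the fraud receives the higher score, counting ties as one half. The function name auc_mann_whitney and the toy scores and labels are hypothetical; they are not taken from the dataset or from the cited works.

```python
def auc_mann_whitney(scores, labels):
    """Estimate AUC as the share of (fraud, genuine) pairs ranked correctly.

    scores: classifier scores, higher means "more likely fraud".
    labels: 1 for fraud, 0 for genuine (same length as scores).
    """
    fraud = [s for s, y in zip(scores, labels) if y == 1]
    genuine = [s for s, y in zip(scores, labels) if y == 0]
    if not fraud or not genuine:
        raise ValueError("both classes must be present to compute AUC")

    wins = 0.0
    for f in fraud:
        for g in genuine:
            if f > g:
                wins += 1.0   # fraud ranked above genuine
            elif f == g:
                wins += 0.5   # ties count as half a win
    return wins / (len(fraud) * len(genuine))


# Hypothetical toy example: two frauds and four genuine transactions.
scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
auc = auc_mann_whitney(scores, labels)
# 7 of the 8 fraud/genuine pairs are ranked correctly, so the estimate is 0.875.
print(auc)
```

The pairwise loop is quadratic and is only meant to make the definition explicit; on a dataset of realistic size a rank-based computation would be preferable.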
III. OBJECTIVE
• Analyze credit card transaction data of a certain region or institution
• Analyze fraud patterns in user transaction data
• Detect fraud successfully in real time
• Validate machine learning classification algorithms in order to predict fraudulent transactions with better accuracy
IV. SIMULATION TOOLS
• Isolation Forest
• Local Outlier Factor
• Support Vector Machine (SVM)
V. PLATFORM
• Jupyter Notebook: open-source web-based application
• Google Colab: cloud service
VI. METHODOLOGY
Our proposed research work is conducted in step-by-step modules. The steps towards a successful fraud detection technique are as follows:
• Data collection
• Data preparation and analysis
• Model development
A. Data Collection
The proposed research is conducted on a dataset that contains transactions made by credit cards in September 2013 by European cardholders. The dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions in total. The dataset is therefore highly unbalanced: the positive class (frauds) accounts for just 0.172% of all transactions. It contains only numeric input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the dataset does not provide the original features or more background information about the data. There are 28 features that have been obtained using PCA (Principal Component Analysis); the only features that have not been transformed with PCA are 'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' contains the transaction amount. The feature 'Class' is the target variable; it takes the value 1 in case of fraud and 0 otherwise.
B. Data preparation and analysis
The description of the dataset says that all the features except Time and Amount went through a PCA transformation, which is a dimensionality reduction technique. Features need to be scaled before a PCA transformation is applied, so in this case all the PCA-derived features have already been scaled. There is therefore no need to apply any further dimensionality reduction or scaling to them. The dataset has 31 attributes including Time, Amount and Class. There are no missing or null values. Each transaction belongs to one of two classes, normal or fraud. The total number of normal transactions is 284,315, whereas the number of fraud transactions is only 492. The dataset is highly imbalanced, as most of the transactions are non-fraud. If we use this dataframe as the base for our predictive models and analysis, we might get a lot of errors and our algorithms will probably overfit, whereas we want our models to detect patterns that give signs of fraud. From the analysis of the transaction amounts in each of the classes, we can observe that the transaction class distribution is very skewed as well.
Fig. 1. Transaction Class Distribution
Fig. 2. Amount Per Transaction by Class
Fig. 3. Time of Transaction vs Amount by Class
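The class counts reported for this dataset (284,315 normal and 492 fraudulent transactions) can be turned into a quick imbalance check. The minimal Python sketch below rebuilds a label list from those two counts only, computes the share of each class, and shows the accuracy of a trivial baseline that labels every transaction as normal. The helper name class_balance and the synthetic label list are assumptions made for illustration; the real labels would come from the 'Class' column of the dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each class label as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Synthetic labels rebuilt from the counts reported in the paper:
# 0 = normal transaction, 1 = fraud.
labels = [0] * 284315 + [1] * 492

shares = class_balance(labels)
fraud_share = shares[1]

# A model that always predicts "normal" is right on every normal transaction,
# so its accuracy equals the share of the majority class.
majority_baseline_accuracy = max(shares.values())

print(f"fraud share: {fraud_share:.4%}")
print(f"always-normal baseline accuracy: {majority_baseline_accuracy:.4%}")
```

Because the always-normal baseline already reaches about 99.8% accuracy without detecting a single fraud, accuracy alone says little on this dataset, and precision and recall on the fraud class are more informative.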
C. Model Development
Different machine learning algorithms, namely Isolation Forest, Local Outlier Factor, and Support Vector Machine (SVM), are applied to the processed data.
1) Isolation Forest: One of the more recent anomaly detection techniques is the Isolation Forest. The algorithm is based on the fact that anomalies are data points that are few and different. Because of these properties, anomalies are susceptible to a mechanism called isolation. This technique is very useful and is fundamentally different from existing methods. It introduces the use of isolation as a more effective and efficient means of detecting anomalies than the commonly used distance and density measures. Furthermore, the algorithm has a low linear time complexity and a low memory requirement. It builds a well-performing model with a small number of trees using small fixed-size subsamples, regardless of the size of the data set.
2) Local Outlier Factor: The Local Outlier Factor (LOF) algorithm is an unsupervised anomaly detection method that computes the deviation of the local density of a given data point relative to its neighbors. It considers as outliers the samples that have a substantially lower density than their neighbors.
3) Support Vector Machine: The support vector machine is a widely preferred algorithm, as it produces significant accuracy with less computing power. The aim of the support vector machine algorithm is to find a hyperplane in an N-dimensional space, where N is the number of features, that distinctly classifies the data points. To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our goal is to find the plane that has the maximum margin, that is, the maximum distance between the data points of the two classes. Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence. In SVM, we take the output of the linear function: if that output is greater than or equal to 1, we assign the point to one class, and if it is less than or equal to -1, we assign it to the other class. This range of values ([-1, 1]) acts as the margin.
VII. RESULT AND DISCUSSION
1) Isolation Forest made 73 errors, compared with 97 errors for the Local Outlier Factor and 8516 errors for SVM.
2) Isolation Forest reached an accuracy of 99.74%, higher than LOF at 99.65% and SVM at 70.09%.
3) When comparing precision and recall across the three models, the Isolation Forest performed much better, detecting around 27% of fraud cases, versus a detection rate of just 2% for LOF and 0% for SVM.
4) Overall, the Isolation Forest method performed much better at identifying fraud cases, with a detection rate of around 30%.
5) This accuracy can be improved by increasing the sample size or by using deep learning algorithms, although this will require additional computational expense. Complex anomaly detection models can also be used to get better accuracy in detecting more fraudulent cases.
VIII. CONCLUSION
We used three algorithms to detect credit card fraud: the isolation forest, the local outlier factor, and the support vector machine, which were analyzed and applied to the dataset. Of these three algorithms, the isolation forest gives the most accurate outcome, because it can handle highly imbalanced datasets and outliers on its own.
REFERENCES
[1] S. Maes, K. Tuyls, B. Vanschoenwinkel, and B. Manderick, “Credit card
fraud detection using Bayesian and neural networks,” in Proceedings of
the 1st International NAISO Congress on Neuro Fuzzy Technologies, 2002,
pp. 261–270.
[2] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, “Data
mining for credit card fraud: A comparative study,” Decision support
systems, vol. 50, no. 3, pp. 602–613, 2011.
[3] R. Brause, T. Langsdorf, and M. Hepp, “Neural data mining for credit
card fraud detection,” in Proceedings 11th International Conference on
Tools with Artificial Intelligence. IEEE, 1999, pp. 103–106.
[4] P. K. Chan, W. Fan, A. L. Prodromidis, and S. J. Stolfo, “Distributed
data mining in credit card fraud detection,” IEEE Intelligent Systems
and Their Applications, vol. 14, no. 6, pp. 67–74, 1999.
[5] R. J. Bolton, D. J. Hand et al., “Unsupervised profiling methods for
fraud detection,” Credit scoring and credit control VII, pp. 235–255,
2001.
[6] M. Carminati, R. Caron, F. Maggi, I. Epifani, and S. Zanero,
“Banksealer: A decision support system for online banking fraud
analysis and investigation,” Computers & Security, vol. 53, pp. 175–186,
2015.
[7] D. K. Tasoulis, N. M. Adams, and D. J. Hand, “Unsupervised clustering
in streaming data,” in Sixth IEEE International Conference on Data
Mining-Workshops (ICDMW’06). IEEE, 2006, pp. 638–642.
[8] D. J. Weston, D. J. Hand, N. M. Adams, C. Whitrow, and P. Juszczak,
“Plastic card fraud detection using peer group analysis,” Advances in
Data Analysis and Classification, vol. 2, no. 1, pp. 45–62, 2008.
[9] C. Phua, V. Lee, K. Smith, and R. Gayler, “A comprehensive survey
of data mining-based fraud detection research,” arXiv preprint
arXiv:1009.6119, 2010.
[10] D. Olszewski, “Fraud detection using self-organizing map visualizing the
user profiles,” Knowledge-Based Systems, vol. 70, pp. 324–334, 2014.
[11] J. T. Quah and M. Sriganesh, “Real-time credit card fraud detection
using computational intelligence,” Expert systems with applications,
vol. 35, no. 4, pp. 1721–1732, 2008.
[12] V. Zaslavsky and A. Strizhak, “Credit card fraud detection using
self-organizing maps,” Information and Security, vol. 18, p. 48, 2006.
[13] R. J. Bolton, D. J. Hand et al., “Statistical fraud detection: A review,”
Statistical science, vol. 17, no. 3, pp. 235–255, 2002.
[14] E. Aleskerov, B. Freisleben, and B. Rao, “Cardwatch: A neural network
based database mining system for credit card fraud detection,” in
Proceedings of the IEEE/IAFE 1997 computational intelligence for
financial engineering (CIFEr). IEEE, 1997, pp. 220–226.
[15] J. R. Dorronsoro, F. Ginel, C. Sánchez, and C. S. Cruz, “Neural
fraud detection in credit card operations,” IEEE transactions on neural
networks, vol. 8, no. 4, pp. 827–834, 1997.
[16] S. Jha, M. Guillen, and J. C. Westland, “Employing transaction
aggregation strategy to detect credit card fraud,” Expert systems with
applications, vol. 39, no. 16, pp. 12 650–12 657, 2012.
[17] D. Sánchez, M. Vila, L. Cerda, and J.-M. Serrano, “Association rules
applied to credit card fraud detection,” Expert systems with applications,
vol. 36, no. 2, pp. 3630–3640, 2009.
[18] C. Whitrow, D. J. Hand, P. Juszczak, D. Weston, and N. M. Adams,
“Transaction aggregation as a strategy for credit card fraud detection,”
Data mining and knowledge discovery, vol. 18, no. 1, pp. 30–55, 2009.
[19] N. Mahmoudi and E. Duman, “Detecting credit card fraud by modified
Fisher discriminant analysis,” Expert Systems with Applications, vol. 42,
no. 5, pp. 2510–2516, 2015.
[20] A. C. Bahnsen, D. Aouada, and B. Ottersten, “Example-dependent cost-
sensitive decision trees,” Expert Systems with Applications, vol. 42,
no. 19, pp. 6609–6619, 2015.
[21] A. Dal Pozzolo, R. Johnson, O. Caelen, S. Waterschoot, N. V. Chawla,
and G. Bontempi, “Using HDDT to avoid instances propagation in
unbalanced and evolving data streams,” in 2014 International Joint
Conference on Neural Networks (IJCNN). IEEE, 2014, pp. 588–594.
[22] Y. Sahin, S. Bulkan, and E. Duman, “A cost-sensitive decision tree
approach for fraud detection,” Expert Systems with Applications, vol. 40,
no. 15, pp. 5916–5923, 2013.
[23] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi,
“Credit card fraud detection and concept-drift adaptation with delayed
supervised information,” in 2015 international joint conference on
Neural networks (IJCNN). IEEE, 2015, pp. 1–8.
[24] A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, and
G. Bontempi, “Learned lessons in credit card fraud detection from
a practitioner perspective,” Expert systems with applications, vol. 41,
no. 10, pp. 4915–4928, 2014.
[25] V. Van Vlasselaer, C. Bravo, O. Caelen, T. Eliassi-Rad, L. Akoglu,
M. Snoeck, and B. Baesens, “Apate: A novel approach for automated
credit card transaction fraud detection using network-based extensions,”
Decision Support Systems, vol. 75, pp. 38–48, 2015.
[26] H. B. Mann and D. R. Whitney, “On a test of whether one of two
random variables is stochastically larger than the other,” The annals of
mathematical statistics, pp. 50–60, 1947.
[27] D. J. Hand, “Measuring classifier performance: a coherent alternative
to the area under the ROC curve,” Machine learning, vol. 77, no. 1, pp.
103–123, 2009.
[28] C. Elkan, “The foundations of cost-sensitive learning,” in International
joint conference on artificial intelligence, vol. 17, no. 1. Lawrence
Erlbaum Associates Ltd, 2001, pp. 973–978.