
Fraud SVM

This document discusses using machine learning techniques to detect bank fraud. It focuses on using support vector machines (SVMs) to build models of normal and abnormal customer behavior from transaction data. Three main types of bank fraud are examined: credit card fraud, money laundering, and mortgage fraud. The authors propose using both binary and single-class SVMs in combination to better detect fraudulent transactions across different fraud types. The models are tested on bank transaction databases, showing promise for using machine learning to help banks combat fraud.


Proceedings of the International conference on Computing Technology and Information Management, Dubai, UAE, 2014

Automatic Bank Fraud Detection Using Support Vector Machines

Djeffal Abdelhamid (1), Soltani Khaoula (1), Ouassaf Atika (2)

(1) Computer science department, LESIA Laboratory, Biskra University, Algeria
(2) Economic science department, Economic science Laboratory, Biskra University, Algeria

ABSTRACT

With the significant development of communications and computing, bank fraud is growing in both its forms and its amounts. In this paper, we analyze the various forms of fraud to which banks are exposed and the data mining tools that allow its early detection using data already accumulated in a bank. We propose the use of supervised learning methods called support vector machines to build models representing normal and abnormal customer behaviors, and then use these models to check new transactions. We also propose a hybridization of the two SVM methods, binary and single class, to enhance the detection of fraudulent transactions. The results obtained on databases of credit card transactions show the power of these techniques in the fight against banking fraud compared with other techniques in the same field.

KEYWORDS

Support vector machines, Bank fraud detection, single class SVM, Binary SVM.

1 INTRODUCTION

The development of our country is the main concern of our society. This development begins with the modernization of our businesses and administrations by introducing new management, monitoring and analysis techniques based on the use of large amounts of accumulated data. These techniques can help to minimize risk and improve the quality of services offered to customers in order to succeed in a competitive world. Fraud is a very important risk facing financial companies, and banks in particular, and traditional prevention techniques such as PINs, passwords and identification systems have become inadequate and cumbersome in modern banking systems [7]. Fraud in banks may be encountered in several activities. The remote use of credit cards is a very fashionable tool of fraud; it is enough to have some information to make a purchase with someone else's card via the Internet.

Data mining can play a very important role in the fight against these types of fraud. It is a set of techniques for extracting relevant information from large amounts of data to assist in decision-making. The SVM method is used particularly in this context, due to its precision and its variants fitting different learning situations [1,2,10]. In the literature, several techniques have been used for the detection of fraud, including credit card fraud. Among these techniques are neural networks [3], Bayesian networks [6], Markov chains [13], sequence alignment [5], etc. The objective of this work is to provide bankers with an automatic fraud detection system that enables them to detect fraudulent transactions based on machine learning with support vector machines, which have shown their power in several other areas such as face, fingerprint and voice recognition.

Our goal is, therefore, to study the problem of fraud in banks and its resolution by SVM techniques. We present an analysis of the bank fraud problem and its different forms and propose, for each form, the variant of SVMs that can be used for its resolution and the necessary adaptations. We also propose a hybridization of two SVM methods, binary and single class, to enhance the detection of fraudulent transactions. A system summarizing the use of the proposed solutions is designed and built in this work.

The rest of the paper is organized as follows: we first present the various forms of fraud in banks as well as the indices used to discover it, then we discuss the types of data mining solutions that can be used. In the third section, we discuss the use of support vector machines to meet the needs of detection of each form of fraud. The fourth section presents the validation of the proposed solutions by testing them on bank databases. We conclude the article with a conclusion and envisaged perspectives.

2 FORMS AND INDICES OF BANK FRAUD

Fraud in banks has many forms; it can be internal, i.e. committed by employees of the bank itself, or external, committed by clients, persons or institutions foreign to the bank. In this paper we are interested in external fraud, which may exist in three main forms:

2.1 Fraud by credit cards

The remote use of credit cards is a very fashionable fraud tool. It is sufficient to have just some information to make a purchase with someone else's card via the Internet. The detection of credit card fraud is often based on a number of forecast indicators that are generally derived from the transaction information retrieved from the historical database [15]. From this database we calculate indices such as: frequency of use, the remaining unpaid balance of each cycle, the frequency of overdrafts, the maximum number of late days, shopping frequency, average number of consumption, daily transactions, the largest number of transactions in the historical database, etc. These indices or features are extracted for each transaction and are recorded for discovering patterns of fraudulent transactions.

2.2 Money laundering

Money laundering is also a well-known form of fraud. An international fight against this activity is conducted by different states to discover and prosecute the criminal activities that occur. The fight against money laundering in the financial industry is based on the analysis and processing of statements regarding suspicious transactions detected by financial institutions [4]. Generally, only a few suspicious transactions are really money laundering operations, but the number of operations to be analyzed by financial institutions requires a long time. In the literature, artificial intelligence methods are used to improve the ability of financial institutions to process suspect data automatically. However, the search for efficient methods for identifying suspicious transactional behaviors of money laundering remains a very active research field.

Nowadays, it is difficult to determine all the indices and variables characterizing a money laundering operation, because such unofficial activities are generally generated by complex social and economic conditions. The money laundering indices used in the literature include:

• The amount of the transaction (withdrawal/payment): if it exceeds an amount predetermined by the bank and the transaction is not justified, it is suspicious. For example, in Algeria this amount is set at 10,000 Euros,
• Billing: if the customer's profession involves no accounting, as in public works, agriculture, etc., then a transaction with a large amount is considered suspicious,
• The source of the transfer,
• The date of the transaction,
• Type of customer: a transaction with a high amount by an occasional (passing) customer is suspect,
• The change of address,
• The speed of circulation of money in the account,
• The time of the transaction: transactions made at night with a large amount are suspect,
• etc.

2.3 Mortgage Fraud

Each time the bank grants credit to a customer, it holds a mortgage or guarantee to ensure repayment of the credit in case of repayment difficulties by the customer. Several customers present to the bank false mortgages, or mortgages with an overestimated value that does not allow the bank to recover its credit. This form of fraud accounts for a significant portion of banks' losses. The indices used for the detection of this type of fraud are the personal and professional information of customers as well as the presented mortgage.
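
Several of the money laundering indices of Section 2.2 can be written as simple per-transaction checks. The following is a minimal sketch under stated assumptions: the `Transaction` fields, the definition of "night" as 22:00 to 05:59 and the function names are hypothetical; only the 10,000 Euro threshold comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Threshold quoted in the text for Algeria; a bank would configure its own value.
AMOUNT_THRESHOLD_EUR = 10_000

@dataclass
class Transaction:
    amount_eur: float
    timestamp: datetime
    justified: bool            # hypothetical field: supporting documents were provided
    occasional_customer: bool  # hypothetical field: passing customer without regular activity

def is_night(ts: datetime) -> bool:
    # Assumption: "night" means 22:00-05:59; the paper does not define it.
    return ts.hour >= 22 or ts.hour < 6

def suspicion_flags(tx: Transaction) -> List[str]:
    """Return the names of the indices triggered by this transaction."""
    large = tx.amount_eur > AMOUNT_THRESHOLD_EUR
    flags = []
    if large and not tx.justified:
        flags.append("unjustified_large_amount")
    if large and tx.occasional_customer:
        flags.append("large_amount_occasional_customer")
    if large and is_night(tx.timestamp):
        flags.append("large_amount_at_night")
    return flags

tx = Transaction(12_500.0, datetime(2014, 3, 2, 23, 40),
                 justified=False, occasional_customer=False)
print(suspicion_flags(tx))  # ['unjustified_large_amount', 'large_amount_at_night']
```

Flags of this kind could be used directly as alerts or added as extra components of a feature vector; which of the two the authors intend is not stated.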

ISBN: 978-0-9891305-5-4 ©2014 SDIWC 11



3 FRAUD DETECTION SYSTEM

The system we propose for the detection of bank fraud is shown in Figure 1.

Figure 1. Fraud detection system

The system takes the database accumulated in the bank and performs learning to extract a model (functions, rules, ...) representing the characteristics of the data. The model is then used to decide on new transactions: a transaction accepted by the model is executed and then added to the database to improve the model. Transactions rejected by the model (suspicious) pass to a manual check; if they are considered normal, they are executed and then added to the database; otherwise the transaction is rejected.

4 MACHINE LEARNING FOR BANK FRAUD DETECTION

Discovering the models followed by fraudsters through the analysis of their behavior is impossible, firstly because of the complexity of the operation and secondly because of the rapid change and development of the techniques used by fraudsters. In this context, machine learning from examples of fraud discovered by the bank can play a very important role in the fight against banking fraud.

In the literature, the two known forms of machine learning are used: supervised and unsupervised. Supervised learning methods such as association rules [11] and Bayesian networks [3, 9] have been used. These methods assume prior knowledge of the nature of transactions, fraudulent or legitimate; learning in this case consists of building a model that separates the space into two parts according to the available examples, and then classifying new examples based on their membership of one of these two classes. Unsupervised methods such as sequence alignment [12], HMM [5], neural networks [6], [14], etc., require no prior classification of training examples; they are instead based on the detection of strange transactions.

In this work, we propose to reinforce the two types of learning by hybridizing them. Supervised learning is performed to separate fraudulent transactions from legitimate ones, then unsupervised learning is performed on the legitimate transactions only, to detect strange transactions. The hybrid can thus take into account, at the same time, information about fraudulent transactions and information about strange data (Figure 2).

Figure 2. Hybridization of supervised and unsupervised learning for fraud detection


This technique avoids false generalization in the use of binary learning by filtering the positive part with single class learning. In Figure 3, the upper hatched area is excluded by supervised learning while the lower hatched area is excluded by unsupervised learning. It is clear that using binary learning alone leaves the whole lower part, representing half of the space, as legitimate transactions, which creates a false over-generalization, while using single class learning alone can extend the generalization to include fraudulent transactions. Hybridization reduces the space of legitimate transactions and therefore extends the space of fraudulent transactions. This can hinder the rapid development of fraud techniques by fraudsters.

Figure 3. Advantages of hybridization of supervised and unsupervised learning

The proposed system, which uses both types of learning side by side, is structured as in Figure 4.

Figure 4. Proposed system

The operations in the database, with their previous classification as legitimate or fraudulent, are passed to the system. Two independent operations are launched: supervised and unsupervised learning. The first uses all the operations and the second uses only legitimate transactions. Each operation provides a decision model. On arrival of a new transaction, the supervised model tests it; if it is accepted, a second test is performed by the unsupervised model, otherwise it is passed to manual verification. If the unsupervised model accepts the transaction, it is executed; otherwise it is passed to manual verification. After manual inspection, if the transaction is accepted it is executed, otherwise it is rejected. In both cases the transaction and the decision are added to the database to rebuild the model with the new information.
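
The decision flow of Figure 4 can be sketched as a two-stage cascade. This is an illustration under stated assumptions rather than the authors' implementation: the models are plain callables returning `True` when they accept a transaction, the two-component feature layout is hypothetical, and the hand-written thresholds stand in for trained binary and single class SVMs.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], bool]      # returns True when the transaction is accepted
Database = List[Tuple[Features, bool]]  # (transaction, accepted?) pairs

def training_sets(database: Database) -> Tuple[Database, List[Features]]:
    """Supervised learning uses every labelled operation; single class
    learning uses only the operations labelled as accepted (legitimate)."""
    legitimate_only = [x for x, legitimate in database if legitimate]
    return list(database), legitimate_only

def decide(tx: Features, supervised: Model, single_class: Model,
           manual_check: Model, database: Database) -> bool:
    """Execute only if both models accept the transaction; anything
    else is passed to manual verification."""
    if supervised(tx) and single_class(tx):
        accepted = True
    else:
        accepted = manual_check(tx)
    database.append((tx, accepted))  # stored so the models can be rebuilt later
    return accepted

# Hand-written stand-ins for trained models
# (feature 0: amount, feature 1: transactions per day; both hypothetical).
supervised = lambda x: x[0] < 5000
single_class = lambda x: x[1] < 10
manual_check = lambda x: False  # pretend the analyst rejects everything

db: Database = [([120.0, 2.0], True), ([9000.0, 1.0], False)]
print(decide([300.0, 3.0], supervised, single_class, manual_check, db))   # True
print(decide([300.0, 40.0], supervised, single_class, manual_check, db))  # False
```

Because a transaction must pass both tests to be executed automatically, the accepted region is the intersection of the two models' accepted regions, which is the narrowing of the legitimate space described for Figure 3.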
4.1 Support vector machine

The binary SVM solves the problem of separating two classes represented by n examples of m attributes each. Consider the following problem:

$$\{(x_1, y_1), \ldots, (x_n, y_n)\}, \quad x_i \in \mathbb{R}^m, \quad y_i \in \{-1, +1\}$$

where the $x_i$ are the learning examples and the $y_i$ their respective classes. The objective of the SVM method is to find a linear function f (equation 1), called a hyperplane, which separates the two classes:

$$f(x) = (x \cdot w) + b \qquad (1)$$

where x is an example to classify, w is a vector and b is a bias. We must therefore find the widest margin between the two classes, which is equivalent to minimizing $\frac{1}{2}\|w\|^2$. In the case where the training data are not linearly separable, we allow deviations $\xi_i$ of the examples relative to the boundaries of the margin of separation, with a penalty parameter C, and the problem becomes a convex quadratic programming problem:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i \left( (x_i \cdot w) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n \qquad (2)$$

The problem of equation 2 can be solved by introducing Lagrange multipliers $\alpha_i$ in the following dual problem:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C \qquad (3)$$

from which we obtain the following decision function (hyperplane):

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b$$

The function K is called the kernel; it is a symmetric function that satisfies the Mercer conditions [12]. It can represent a transformation of the original space, in which the data may not be linearly separable, to a new space with more dimensions where a linear separator exists. Solving the problem of equation 3 requires optimization methods, especially when the number of samples is high. Among the most used optimization methods is SMO (Sequential Minimal Optimization), where the problem is broken into several sub-problems, each of which must optimize two $\alpha_i$ [8].

This technique is also used for the detection of outliers through a version called single class SVM. We provide the method with a set of examples of the same class; it produces a decision function that is positive for examples resembling the training ones and negative for strange ones.

4.2 Credit card fraud and money laundering

If the bank has a historical database of fraudulent and legitimate transactions, the system given in Figure 3 is used. In cases where the bank has no such historical database and all transactions are considered legitimate, only single class learning is appropriate. In both cases, the construction of the decision model involves three steps:

1. Feature extraction: converts all the transactions of each account into a feature vector (these vectors are used by the training and testing phases). The feature vector contains statistics on customer behavior such as the number of transactions, the amount handled, and the times and dates of transactions per day, week, month and year. This phase concerns credit card fraud and money laundering.

2. Training: builds the decision model.

3. Test and validation: tests and validates the learned model.

These three phases allow a model to be built according to the scheme of Figure 5.
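
The feature extraction step of Section 4.2 can be sketched as follows. The record layout, the particular statistics and the definition of night-time activity (22:00 to 05:59) are assumptions chosen for illustration; the paper names the kinds of statistics used but does not give an exact feature list.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple

# One raw record: (account id, timestamp, amount). This layout is an assumption.
Record = Tuple[str, datetime, float]

def account_features(records: List[Record]) -> Dict[str, List[float]]:
    """Turn the transactions of each account into one fixed-length feature vector."""
    by_account = defaultdict(list)
    for account, ts, amount in records:
        by_account[account].append((ts, amount))

    features = {}
    for account, txs in by_account.items():
        amounts = [amount for _, amount in txs]
        active_days = {ts.date() for ts, _ in txs}
        night = sum(1 for ts, _ in txs if ts.hour >= 22 or ts.hour < 6)
        features[account] = [
            float(len(txs)),              # number of transactions
            sum(amounts),                 # total amount handled
            mean(amounts),                # average amount
            max(amounts),                 # largest single amount
            len(txs) / len(active_days),  # transactions per active day
            night / len(txs),             # share of night-time transactions
        ]
    return features

records = [
    ("A", datetime(2014, 1, 1, 10, 0), 100.0),
    ("A", datetime(2014, 1, 1, 23, 0), 300.0),
    ("A", datetime(2014, 1, 2, 9, 0), 200.0),
    ("B", datetime(2014, 1, 1, 12, 0), 50.0),
]
features = account_features(records)
print(features["B"])  # [1.0, 50.0, 50.0, 50.0, 1.0, 0.0]
```

Every account yields a vector of the same length, which is what the training and testing phases need as input.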

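
To make the role of the kernel in Section 4.1 concrete, the sketch below evaluates the standard kernel SVM decision function, f(x) = sum over i of alpha_i * y_i * K(x_i, x) + b, for a model that is assumed to be already trained. The Gaussian (RBF) kernel and its parameterization by sigma are assumptions (the paper reports a Sigma parameter but not the kernel formula), the mapping of +1 to legitimate is an assumption, and the support vectors, multipliers and bias are hand-picked values, not the result of training.

```python
import math
from typing import Sequence

def rbf_kernel(u: Sequence[float], v: Sequence[float], sigma: float) -> float:
    """Gaussian kernel K(u, v) = exp(-||u - v||^2 / (2 * sigma^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision(x, support_vectors, alphas, labels, b, sigma):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the sign of f(x) gives the class."""
    return sum(a * y * rbf_kernel(sv, x, sigma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

# Toy model with one support vector per class (values chosen by hand).
svs = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [+1, -1]  # assumption: +1 = legitimate, -1 = fraudulent

f = decision([0.1, 0.0], svs, alphas, labels, b=0.0, sigma=0.5)
print("legitimate" if f > 0 else "fraudulent")  # legitimate
```

A single class SVM is applied in the same way: its decision function is positive for examples that resemble the training data and negative for strange ones.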



Figure 5. System for credit card fraud detection

4.3 Mortgage fraud

For this type of fraud, detection is not based on historical transactions, but rather on the information provided by previous customers about their mortgages. Figure 6 shows the construction of the decision model.

Figure 6. System for mortgage fraud detection

5 TESTS AND RESULTS

5.1 Used data

In reality, it is very difficult to obtain real data that describe the behavior of bank customers, due to the confidential nature of the data; however, there are standard databases used in the literature to test fraud detection methods.

In our tests we used three databases of different types, GeneralLedger, PayablesData and RevenueData, corresponding to credit card fraud, money laundering and mortgage fraud respectively. The databases are available from the repository "Fraud Detection with ActiveData for Excel". They contain only legitimate, consistent transactions, which we used to demonstrate the suitability of the single class SVM method for fraud detection.

To test our hybridization proposal, we used the German and Australian credit card databases.
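
Because the three ActiveData databases contain only legitimate transactions, a single class model can be scored by the share of held-out transactions it accepts. The sketch below illustrates such a protocol with the 70%/30% split reported in Section 5.2. The fixed shuffle seed, the synthetic data and the range-based toy model are assumptions standing in for the real databases and for a single class SVM.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def split_70_30(data: List[Vector], seed: int = 0) -> Tuple[List[Vector], List[Vector]]:
    """Shuffle a copy of the data and cut it into 70% training and 30% test."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]

def recognition_rate(accepts: Callable[[Vector], bool], test: List[Vector]) -> float:
    """Fraction of held-out legitimate examples that the model accepts."""
    return sum(1 for x in test if accepts(x)) / len(test)

def fit_range_model(train: List[Vector], slack: float = 0.1) -> Callable[[Vector], bool]:
    """Toy stand-in for single class learning: accept points whose coordinates
    stay within the slightly widened range seen during training."""
    dims = range(len(train[0]))
    lo = [min(x[d] for x in train) for d in dims]
    hi = [max(x[d] for x in train) for d in dims]
    pad = [slack * (h - l) for l, h in zip(lo, hi)]
    return lambda x: all(lo[d] - pad[d] <= x[d] <= hi[d] + pad[d] for d in dims)

# Synthetic "legitimate" feature vectors (amount, transactions per day).
rng = random.Random(1)
data = [[rng.gauss(100.0, 10.0), rng.gauss(5.0, 1.0)] for _ in range(200)]

train, test = split_70_30(data)
model = fit_range_model(train)
print(f"recognition rate: {recognition_rate(model, test):.2%}")
```

With labelled data such as the German and Australian databases, the same split applies, but accuracy would be computed over both classes rather than over accepted legitimate examples only.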


5.2 Results

Results obtained on the banking fraud detection databases are shown in Table 1. The parameters of the single class SVM method are set as follows:

• V = 0.5
• C = 100
• Sigma = 0.1

The validation method used is a split of 70% training and 30% test. Table 1 presents the recognition rates obtained on the test databases. Preliminary results show the power of the SVM method for detecting fraud in banks. Indeed, the authors of [6] used Bayesian networks combined with neural networks, which enabled them to obtain an accuracy of about 70%. The use of hidden Markov models achieved an accuracy of only about 80% according to [13].

Table 1. Accuracy obtained by the single class SVM method

Database     GeneralLedger   PayablesData   RevenueData
Precision    94%             100%           85%

The sequence alignment method used in [5] could only achieve accuracy rates below 80%. Compared to these results, support vector machines can make a considerable improvement to the accuracy of fraud detection methods in banks.

The results obtained by the method we proposed are shown in Table 2.

Table 2. Results obtained by the proposed method

Database     Binary SVM   Single class SVM   Hybrid SVM
Australian   83.56%       54.67%             83.85%
German       72.4%        67.2%              72.4%

The preliminary results of the proposed method show a slight improvement over the binary method on the Australian database (83.85% against 83.56%) and no change on the German database, but these are credit scoring databases and not databases accumulating customer behavior. We intend in the near future, as part of a PNR project in which we participate, to acquire data from a national bank for use in the analysis of our proposition.

6 CONCLUSION

The fight against fraud is a current need for multiple sectors, and banks in particular. It is in this context that we propose a system for detecting bank fraud based on the support vector machine technique, depending on the application in the bank. We studied three cases of fraud in banks: credit card fraud, money laundering and mortgage fraud. We proposed a method based on the hybridization of the single class and binary SVM methods.

The performance of the proposed system has been tested on the benchmarks GeneralLedger, PayablesData, RevenueData, and the Australian and German databases. The precision obtained for the single class SVM method was about 80%, which represents a significant improvement in comparison to similar works. For the proposed method, the improvement was only slight because, given the difficulty of obtaining real databases, it was tested on credit scoring databases. The results can be improved by studying the influence of the various parameters used by the SVM method.

7 REFERENCES

[1] Tareq Allan and Justin Zhan. Towards fraud detection methodologies. In Future Information Technology (FutureTech), 2010 5th International Conference on, pages 1–6. IEEE, 2010.
[2] S Benson Edwin Raj and A Annie Portia. Analysis on credit card fraud detection methods. In Computer, Communication and Electrical Technology (ICCCET), 2011 International Conference on, pages 152–156. IEEE, 2011.
[3] Rüdiger W Brause, T Langsdorf, and Michael Hepp. Neural data mining for credit card fraud detection. In Tools with Artificial Intelligence, 1999. Proceedings. 11th IEEE International Conference on, pages 103–106. IEEE, 1999.
[4] FATF-GAFI.ORG. Financial action task force on money laundering. Rapport 1996-1997 sur les typologies du blanchiment de l'argent, Groupe d'Action Financière (GAFI), Février 1997.
[5] Amlan Kundu, Shamik Sural, and Arun K Majumdar. Two-stage credit card fraud detection using sequence alignment. In Information Systems Security, pages 260–275. Springer, 2006.
[6] Sam Maes, Karl Tuyls, Bram Vanschoenwinkel, and Bernard Manderick. Credit card fraud detection using Bayesian and neural networks. In Proceedings of the 1st International NAISO Congress on Neuro Fuzzy Technologies, 2002.

[7] Md Delwar Hussain Mahdi, Karim Mohammed Rezaul, and Muhammad Azizur Rahman. Credit fraud detection in the banking sector in UK: a focus on e-business. In Digital Society, 2010. ICDS'10. Fourth International Conference on, pages 232–237. IEEE, 2010.
[8] Edgar Osuna, Robert Freund, and Federico Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing [1997] VII. Proceedings of the 1997 IEEE Workshop, pages 276–285. IEEE, 1997.
[9] Suvasini Panigrahi, Amlan Kundu, Shamik Sural, and Arun K Majumdar. Credit card fraud detection: A fusion approach using Dempster–Shafer theory and Bayesian learning. Information Fusion, 10(4):354–363, 2009.
[10] Jon TS Quah and M Sriganesh. Real-time credit card fraud detection using computational intelligence. Expert Systems with Applications, 35(4):1721–1732, 2008.
[11] Daniel Sánchez, M A Vila, L Cerda, and José-María Serrano. Association rules applied to credit card fraud detection. Expert Systems with Applications, 36(2):3630–3640, 2009.
[12] Bernhard Schölkopf and Alexander J Smola. Learning with kernels: support vector machines, regularization, optimization, and beyond. The MIT Press, 2001.
[13] Abhinav Srivastava, Amlan Kundu, Shamik Sural, and Arun K Majumdar. Credit card fraud detection using hidden Markov model. Dependable and Secure Computing, IEEE Transactions on, 5(1):37–48, 2008.
[14] Wen-Fang Yu and Na Wang. Research on credit card fraud detection model based on distance sum. In Artificial Intelligence, 2009. JCAI'09. International Joint Conference on, pages 353–356. IEEE, 2009.
[15] Gao Zengan. Application of cluster-based local outlier factor algorithm in anti-money laundering. In Management and Service Science, 2009. MASS'09. International Conference on, pages 1–4. IEEE, 2009.
