EMSS2014 Final
All content following this page was uploaded by Edgar Alonso Lopez-Rojas on 06 January 2015.
5. Fraud Scenarios in a Bank Payment System

In this section we describe three examples of fraud that can be implemented in BankSim. These fraud scenarios are based on selected cases from the Grant Thornton report Member and Council (2009). As can be seen in section 6., the different scenarios can be implemented [...]

5.1. Theft

This scenario includes cases where the customer loses physical possession of her card and a fraudster impersonates the customer, purchasing goods or services with the stolen card. In terms of the object model used in BankSim, the Theft scenario can be implemented by the following setting: include in the fraudster the behaviour of sensing customer proximity, then execute the theft and later purchase goods from another merchant with the information from the customer. The volume of fraudulent activity can be modelled by changing the specific parameters for the number of thefts, the zip code and the frequency. A "red flag" for detection in this case could be a high number of unusual transactions with high value in a short period.

5.2. Cloned Card/Skimming

This scenario includes cases where the fraudster creates a clone of the card, letting the user keep the original card but without knowledge of the loss of security. In terms of the object model used in BankSim, the cloned card scenario can be implemented by the following setting: include in the fraudster the behaviour of sensing customer proximity, then execute the acquisition or cloning of a card and later purchase goods from another merchant with the information from the customer. An alternative way to implement this scenario is to have a merchant compromised in some way (e.g. by hacking), allowing a fraudster to steal, on a massive scale, the information of all customers that have been served there. The volume of fraudulent activity can be modelled by changing the specific parameters for the number of thefts and merchants affected, the zip code and the frequency of use for purchasing. A "red flag" for detection in this case could be similar to the previous case: a high number of unusual transactions with high value in a short period. Other patterns, such as simultaneous payments in different physical locations, or use of the card far from previously known locations, could also be flagged.

6.1. Overview

6.1.1. Purpose

We aim to produce a simulation that resembles a bank payment system. Our main purpose is to generate a synthetic data set of commercial transactions that can be used for the development and testing of different fraud detection techniques.

If we want to use the real original data for the development of fraud detection methods, it is often difficult to find enough diverse cases of fraud. However, this is not the case in a simulated environment, where fraud can be injected following known patterns of fraud and flagged for easy recognition and evaluation of the performance of the detectors.

6.1.2. Entities, state variables and scales

There are three agents in this simulation: Merchant, Customer and Fraudster.

Merchant: This agent serves the customer with one category of merchandise specified by the original data. It offers products or services according to the statistics obtained from the specific zip code and time (week, day of the week and/or hour). Merchants wait for customers to request products and register the payments.

Customer: This agent's main objective is to satisfy a need for one of the 16 categories and purchase goods or services from merchants. Customers possess a payment method, which in this case is generalised as a credit card.
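The "red flag" described for the Theft and Cloned Card scenarios in section 5 (a high number of high-value transactions in a short period) can be illustrated with a simple sliding-window check. The following is only a minimal sketch and not part of BankSim: the `Payment` record, the `red_flags` function and all three thresholds (`window`, `min_count`, `min_amount`) are hypothetical names and values chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Payment:
    step: int        # simulation step at which the payment happened
    customer: str    # hypothetical customer identifier
    amount: float    # payment amount in Euros


def red_flags(payments, window=3, min_count=3, min_amount=500.0):
    """Return the customers with at least `min_count` payments of at
    least `min_amount` within `window` consecutive steps.

    All thresholds are illustrative assumptions, not BankSim values.
    """
    recent = {}      # customer -> steps of recent high-value payments
    flagged = set()
    for p in sorted(payments, key=lambda p: p.step):
        if p.amount < min_amount:
            continue  # only high-value payments count towards the flag
        steps = recent.setdefault(p.customer, deque())
        steps.append(p.step)
        # Drop payments that have fallen out of the sliding window.
        while steps[0] <= p.step - window:
            steps.popleft()
        if len(steps) >= min_count:
            flagged.add(p.customer)
    return flagged


payments = [
    Payment(1, "C1", 40.0),
    Payment(2, "C1", 25.0),
    Payment(5, "C2", 900.0),
    Payment(5, "C2", 750.0),
    Payment(6, "C2", 1200.0),
    Payment(9, "C1", 600.0),
]
print(red_flags(payments))  # {'C2'}
```

In this toy input only customer C2 accumulates three high-value payments within three steps. A real detector would also need a notion of "unusual" relative to each customer's history, which this sketch deliberately leaves out.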
[...] missing information from the real system can be found [...]

Figure 3: ScatterPlot Payments vs Age/Gender

[...]erating a distribution of categories that resembles the real [...]

[Figure: box plot with axes Amount and Categories, including travel; caption not recovered]

[...]ulated similar average and standard deviation to the ones present in the original data. One thing to notice is that the category auto did not get any transaction during the simulation; this could be because the location of the merchant in the environment was random and was perhaps far enough away to be hidden from customers that wanted to purchase from this category. A box plot of the simulated categories is shown in figure 4. Since the values of travel are bigger than those of the other categories, we decided to draw the box plot omitting this category in figure 5 to improve the visualization of the simulated data.

Figure 5: BoxPlot of a BankSim simulation without category Travel

The simulated fraud behaviour is presented in table 7. The total amount stolen was around 3.8 million Euros, which corresponds to a rather high crime rate of nearly 17% of the total amount of payments. We programmed an aggressive behaviour where few transactions (only 7200, or 1.2% of the total) could defraud 17% of the payments, with an average of 530 Euros per fraud. For the purpose of fraud detection there is a benefit from the occurrence of enough cases of fraud that can help the investigators to gather the evidence needed to prosecute the criminals. In our case we benefit from the abundance of fraud cases because many detection methods need enough data to better train a classifier that can detect the fraud behaviour.

So in summary, our agent model with its programmed micro behaviour produces the same type of overall interaction that we can observe in the original data, and furthermore, this interaction gives rise to the same macro behaviour for the whole zip code as in a real situation.

Since we are running a simulation, we argue that the differences are not significant for our purpose, which is to use this distribution to simulate the normal behaviour of payments, and simultaneously combine this with injected anomalies and known patterns of fraud.

8. CONCLUSIONS

BankSim is a simulation of bank payments with the objective of generating a synthetic transactional data set that can be used for research into fraud detection. The data sets generated with BankSim can aid academia, financial organisations and governmental agencies in testing their fraud detection methods or comparing the performance of different methods under similar conditions, using a common, publicly available and standard synthetic data set for the test.

In section 3. we formulated our research question: How could we model and simulate a bank payment system and generate a realistic and reliable synthetic data set for the purpose of fraud detection?

In section 6. we presented the model for BankSim, which is based on the ODD methodology. In order to better support our claim and answer our research question we analysed the type of data needed to generate and output as a CSV file (see section 7.) and we evaluated and verified our model in section 7.2.

It is important to know how much information from the real data set is contained in the generated synthetic data. First, we do not have access to any specific record of who is purchasing anything, nor of the merchant involved in the transaction. We based our simulation purely on the aggregated statistical measures present in the original data, which give us an approximate description of how the individual agents behave. This means that Bank Inc. can be sure that the privacy of the customers is preserved when using BankSim.

We argue that BankSim is ready to be used as a generator of synthetic data sets of financial payment activity. Data sets generated by BankSim can be used to implement fraud detection scenarios and malicious behaviour scenarios such as stolen or cloned credit cards or unusual simultaneous purchase activity in different physical locations. We will make a stable release of BankSim available to the research community together with standard data sets developed for this article and further research.

For the future we plan several improvements of and additions to the current model. BankSim can be calibrated to improve the results presented in section 7. and to increase the granularity and the coverage of zip codes, which would enrich the synthetic data set and make it even more valuable as a realistic data set for fraud detection.

In order to generate records with malicious behaviour we plan to extend BankSim to also generate malicious activity that can come from the merchants, customers, different fraudsters or combinations of these.

Among the additions we consider are increasing the step granularity and adding more zip codes to the simulation simultaneously. We intend to make BankSim a complete bank system by adding other bank transactions such as deposits, withdrawals and transfers besides the current payments. Unfortunately, for this addition there is a lack of real data that we can use, but hopefully in the future we will find financial institutions interested in our project that are willing to share this data.

REFERENCES

Naoki Abe, Bianca Zadrozny, and John Langford. Outlier detection by active learning. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD 06, page 504, 2006. doi: 10.1145/1150402.1150459.

SJ Alam and Armando Geller. Networks in agent-based social simulation. Agent-based models of geographical systems, pages 77--79, 2012.

R.J. Bolton and D.J. Hand. Statistical fraud detection: A review. Statistical Science, 17(3):235--249, 2002.

Volker Grimm, Uta Berger, Finn Bastiansen, Sigrunn Eliassen, Vincent Ginot, Jarl Giske, John Goss-Custard, Tamara Grand, Simone K. Heinz, Geir Huse, Andreas Huth, Jane U. Jepsen, Christian Jørgensen, Wolf M. Mooij, Birgit Müller, Guy Pe’er, Cyril Piou, Steven F. Railsback, Andrew M. Robbins, Martha M. Robbins, Eva Rossmanith, Nadja Rüger, Espen Strand, Sami Souissi, Richard A. Stillman, Rune Vabø, Ute Visser, and Donald L. DeAngelis. A standard protocol for describing individual-based and agent-based models. Ecological Modelling, 198(1-2):115--126, September 2006. ISSN 03043800. doi: 10.1016/j.ecolmodel.2006.04.023.

P.J. Lin, B. Samadi, and Alan Cipolone. Development of a synthetic data set generator for building and testing information discovery systems. In ITNG 2006., pages 707--712. IEEE, 2006. ISBN 0769524974.

Edgar Alonso Lopez-Rojas and Stefan Axelsson. Money Laundering Detection using Synthetic Data. The 27th workshop of Swedish Artificial Intelligence Society (SAIS), pages 33--40, 2012a.

Edgar Alonso Lopez-Rojas and Stefan Axelsson. Multi Agent Based Simulation (MABS) of Financial Transactions for Anti Money Laundering (AML). The 17th Nordic Conference on Secure IT Systems, pages 25--32, 2012b.

Edgar Alonso Lopez-Rojas, Stefan Axelsson, and Dan Gorton. RetSim: A Shoe Store Agent-Based Simulation for Fraud Detection. The 25th European Modeling and Simulation Symposium, 2013.

S. Luke. MASON: A Multiagent Simulation Environment. Simulation, 81(7):517--527, July 2005. ISSN 0037-5497. doi: 10.1177/0037549705058073.

Dan Magnusson. The costs of implementing the anti-money laundering regulations in Sweden. Journal of Money Laundering Control, 12(2):101--112, 2009. ISSN 1368-5201. doi: 10.1108/13685200910951884.

Associate Member and Advisory Council. Reviving retail Strategies for growth in 2009 Executive summary, 2009. URL http://www.grantthornton.com/staticfiles/GTCom/files/Industries/Consumer&industrialproducts/Whitepapers/Revivingretail_Strategiesforgrowthin2009.pdf.

Paul Ormerod and Bridget Rosewell. Validation and Verification of Agent-Based Models in the Social Sciences. In Flaminio Squazzoni, editor, LNCS, pages 130--140. Springer Berlin / Heidelberg, 2009. ISBN 978-3-642-01108-5.

Clifton Phua, Vincent Lee, Kate Smith, and Ross Gayler. A comprehensive survey of data mining-based fraud detection research. Arxiv preprint arXiv:1009.6119, 2010.

S. F. Railsback, S. L. Lytinen, and S. K. Jackson. Agent-based Simulation Platforms: Review and Development Recommendations. Simulation, 82(9):609--623, September 2006. ISSN 0037-5497. doi: 10.1177/0037549706073695.

AUTHORS BIOGRAPHY
MSc. Edgar A. Lopez-Rojas
Edgar Lopez is a PhD student in Computer Science and his research area is Multi-Agent Based Simulation and Machine Learning techniques with applied Visualization for fraud detection and Anti Money Laundering (AML) in the domains of retail stores, payment systems and financial transactions. He obtained a Bachelor's degree in Computer Science from EAFIT University in Colombia (2004). After that he worked for 5 more years at EAFIT University as a System Analyst and Developer and part-time as a lecturer. He obtained a Master's degree in Computer Science from Linköping University in Sweden in 2011 and a licentiate degree in computer science (a degree halfway between a Master's degree and a PhD) in 2014.