Multi-Model Identifies Fraud At Scale – ArangoDB White Paper
Fraud Questions
Conclusion
Appendix A: Queries
The Significance of Fraud and Graphs
Fraud is an enormous and ever-growing problem impacting all industries and
government services. Global fraud results in over $3.7 trillion in losses annually.
Businesses lose on average 5% of their income to fraud every year. In 2018,
businesses incurred $3.13 in remediation costs for each dollar of fraud [1], spent
on chargebacks, fees, interest and labor.
Traditional fraud detection views data through a straw, focusing on discrete data
points including specific accounts, individuals, devices or IP addresses. However,
today’s sophisticated fraudsters escape detection by forming fraud rings
composed of stolen and synthetic identities and circuitous back channels.
Footnote 1: Multimodal data. Our experience of the world is multimodal: we see objects, hear
sounds, feel textures, smell odors, and taste flavors. Modality refers to the way in which
something happens or is experienced, and a research problem is characterized as multimodal
when it includes multiple such modalities.
Footnote 2: This juxtaposition of multi-model and multi-modal is deliberate; they are orthogonal terms.
Figure 1: Identify fraud patterns in the network of transactions and relationships.
Identifying fraud ring patterns requires very deep (multi-hop) traversals
across the graph. The query for detecting a fraud ring can be written in
six lines of AQL code that is easy to write and maintain, and ArangoDB can
execute these queries with sub-second response times.
Converting from Relational Source to Multi-model Graph
The source of data for fraud detection will likely be a relational database,
for example one with the schema depicted in Figure 3, which describes the
foreign key relationships among the Bank, Branch, Customer, Account, and
Transaction tables.
entities AKA join tables in the relational model can be used as edges in a
graph model as we have done with Transaction.
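The idea that a join table such as Transaction becomes an edge collection can be
sketched in a few lines of Python. This is a minimal illustration, not ArangoDB's
import tooling: the column names (`id`, `customer_id`, `from_account`, `to_account`,
`amount`) and the collection name `account` are assumptions made for the example.
Only the `_key`, `_from`, and `_to` attribute names follow ArangoDB's document and
edge conventions.

```python
def rows_to_graph(account_rows, transaction_rows):
    """Turn relational rows into vertex and edge documents."""
    # Each Account row becomes a vertex document keyed by its primary key.
    vertices = [
        {"_key": str(row["id"]), "customer_id": row["customer_id"]}
        for row in account_rows
    ]
    # Each Transaction row becomes an edge: its two foreign keys turn into
    # the _from and _to references of the edge document.
    edges = [
        {
            "_key": str(row["id"]),
            "_from": f"account/{row['from_account']}",
            "_to": f"account/{row['to_account']}",
            "amount": row["amount"],
        }
        for row in transaction_rows
    ]
    return vertices, edges


# Hypothetical sample rows, standing in for the relational source.
accounts = [{"id": 1, "customer_id": 7}, {"id": 2, "customer_id": 8}]
transactions = [{"id": 100, "from_account": 1, "to_account": 2, "amount": 250.0}]
vertices, edges = rows_to_graph(accounts, transactions)
```

The foreign keys that a relational join would resolve at query time are stored once,
as the endpoints of each edge.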
Fraud Questions
We will describe how to use ArangoDB to answer various questions:
graph. For this example, the query finds long loops of transactions starting
from a suspicious account and looping back to the suspicious account over 5
to 10 transaction hops.
Figure 5 depicts the fraud ring detection query, written in the ArangoDB
Query Language (AQL), being developed and executed in the ArangoDB
administrative panel. Note that this sophisticated query is expressed in six
lines of AQL code and that the compact representation is easy to
understand and maintain. The query results are displayed as a
circuit in the graph visualization and are also available as JSON, so they can
be processed by applications calling this query. Note also that the query is
parameterized by the suspicious account and the number of loops to detect.
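The logic of the ring search can be illustrated outside the database with a
bounded depth-first walk. The Python below is a sketch of the technique only,
not the AQL query from Figure 5. The adjacency-dictionary input, the function
name `find_rings`, and the rule that a ring may not revisit an intermediate
account are assumptions made for the example. The 5 to 10 hop bounds come from
the text.

```python
def find_rings(transfers, suspicious, min_hops=5, max_hops=10):
    """Return transaction paths that leave `suspicious` and return to it."""
    rings = []

    def walk(account, path):
        for target in transfers.get(account, []):
            hops = len(path)  # number of edges used once we step to `target`
            if target == suspicious:
                if min_hops <= hops <= max_hops:
                    rings.append(path + [target])
                continue  # the loop closes here; never walk through the start
            # Keep walking only while hops remain and the account is new.
            if hops < max_hops and target not in path:
                walk(target, path + [target])

    walk(suspicious, [suspicious])
    return rings


# Hypothetical transfers: account -> accounts it sent money to.
transfers = {
    "a": ["b", "s"],
    "b": ["c"],
    "c": ["d"],
    "d": ["e"],
    "e": ["a"],   # closes a 5-hop ring a -> b -> c -> d -> e -> a
    "s": ["a"],   # closes a 2-hop loop, too short to count by default
}
rings = find_rings(transfers, "a")
short_and_long = find_rings(transfers, "a", min_hops=2)
```

With the default bounds only the five-hop ring is reported. Lowering `min_hops`
to 2 also reports the short loop through `s`.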
This is easily accomplished in AQL by adding an outer loop over suspicious
accounts to the fraud ring detector. Even with the extra loop, the query is
written in only six lines of AQL.
The query for finding all fraud loops is depicted in Figure 6.
There are many patterns for finding suspicious accounts that may require
further investigation. Most of these patterns essentially look for
anomalous behavior in order to flag accounts.
Figure 7 depicts a query that finds orphan accounts and reports each
account together with its owner.
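As a rough illustration of the orphan check, the following Python sketch flags
accounts that appear in no transaction at all. It is not the query from
Figure 7. The function name and the shape of the sample data are assumptions,
with `_from` and `_to` used as the edge endpoints in the way ArangoDB edge
documents use them.

```python
def find_orphan_accounts(accounts, transactions):
    """Return (account, owner) pairs for accounts with no transactions."""
    # Collect every account that sent or received at least one transaction.
    active = set()
    for tx in transactions:
        active.add(tx["_from"])
        active.add(tx["_to"])
    # Whatever is left over has no edges and is reported with its owner.
    return [
        (account_id, owner)
        for account_id, owner in accounts.items()
        if account_id not in active
    ]


# Hypothetical sample data: account id -> owner name.
accounts = {"account/1": "Alice", "account/2": "Bob", "account/3": "Carol"}
transactions = [{"_from": "account/1", "_to": "account/2"}]
orphans = find_orphan_accounts(accounts, transactions)
```

Relaxing the test from "no transactions" to "fewer than some small number" would
cover accounts with little activity as well.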
Figure 7: Find Suspicious “Orphan” Accounts
We can also use standard graph algorithms like PageRank to find deeply
coordinated activity by looking for the most influential customers and
accounts.
Figure 8: Find most influential accounts and customers
Top 3 or top 10 queries are often used to focus attention. In this example,
we use an AQL query to find the top 3 most influential customers. The query
reads the PageRank value stored by ArangoDB's PageRank algorithm, sorts the
results in descending order, and limits the output to three. The query and
the results of its execution are depicted in Figure 9.
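The two steps, scoring and then taking the top entries, can be sketched in
Python. The `pagerank` function here is a plain power-iteration stand-in and
not ArangoDB's own implementation. The damping factor of 0.85, the iteration
count, and the toy edge list are assumptions for the example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        # Every node passes its rank on in equal parts along its out-edges.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


def top_n(rank, n=3):
    """Order by PageRank descending and keep the first n entries."""
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical transfers: three accounts pay into "hub", which pays "a".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
rank = pagerank(edges)
top = top_n(rank, 3)
```

In this toy graph `hub` ranks first because it receives from three accounts, and
`a` ranks second because it receives everything `hub` passes on.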
Finding Money Laundering Patterns
ArangoDB can also be used to find more specific patterns, for example, in
money laundering. In money laundering there is a funds
disaggregation/aggregation pattern, where many small transactions (below
some known triggering threshold) are used to split up a large sum of money,
followed by multiple transaction hops across accounts to further avoid
detection, ultimately followed by a number of transactions that aggregate
the funds back to an account.
This fan-out/fan-in pattern can easily be detected using AQL. The query and
results are depicted in Figure 10.
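A simplified version of the fan-out/fan-in check can be sketched in Python.
This is not the query from Figure 10: it assumes a single layer of intermediate
accounts between source and destination, whereas the pattern described above
allows several hops. The function name, the `min_branches` parameter, and the
sample data are assumptions. The $10,000 threshold is the reporting threshold
named in Appendix A.

```python
from collections import defaultdict


def find_fan_out_fan_in(transactions, threshold=10_000, min_branches=3):
    """Find (source, destination, intermediaries) where funds split and re-merge."""
    # Only transactions below the reporting threshold are of interest.
    small = [t for t in transactions if t["amount"] < threshold]
    receivers = defaultdict(set)  # source account -> accounts it paid
    senders = defaultdict(set)    # destination account -> accounts that paid it
    for t in small:
        receivers[t["from"]].add(t["to"])
        senders[t["to"]].add(t["from"])

    patterns = []
    for source, fan_out in receivers.items():
        for destination, fan_in in senders.items():
            if destination == source:
                continue
            # Intermediaries were paid by the source and paid the destination.
            intermediaries = fan_out & fan_in
            if len(intermediaries) >= min_branches:
                patterns.append((source, destination, sorted(intermediaries)))
    return patterns


# Hypothetical transactions: "src" splits funds over m1..m3, which pay "dst".
transactions = (
    [{"from": "src", "to": m, "amount": 9_000} for m in ("m1", "m2", "m3")]
    + [{"from": m, "to": "dst", "amount": 8_900} for m in ("m1", "m2", "m3")]
    + [{"from": "src", "to": "x", "amount": 50_000}]  # above threshold, ignored
)
patterns = find_fan_out_fan_in(transactions)
```

A graph traversal generalizes this by letting the middle section span a variable
number of hops instead of exactly one layer.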
Detecting Fraud At Scale
Real-world financial transactions generate billions of data points and
relationships, which will rapidly overrun the capabilities of a single server.
Providing fraud-detection performance at scale requires the underlying data
systems to be able to scale out data across multiple nodes in a distributed
cluster and to be able to efficiently distribute computation in parallel across
the cluster.
Optimizing the layout of data on the cluster can reduce the inter-node
communication needed to perform queries. ArangoDB uses SmartGraphs to
optimize graph distribution across a cluster, SmartJoins to ensure that
joins do not cross servers, and SatelliteCollections to replicate
metadata across servers so that lookups occur locally on each server.
Figure 11: Bad distribution of graph data causes network hops during query execution
The SmartGraphs feature of ArangoDB allows us to handle this problem in a
smarter way. In fraud detection we might know from past cases that
fraudsters use banks in certain countries or regions to launder their money.
We can use this domain knowledge as a sharding key for our graph data,
allocating all financial transactions performed in such a region to DB
server 1 and distributing other transactions across the other DB servers.
With this approach, data that needs to be grouped together is stored on the
same machine, and the query engines on each DB server can execute our
queries in parallel.
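The effect of choosing a domain-aware sharding key can be illustrated with a
small Python sketch that counts how many transactions cross server boundaries
under two placements. This is a toy model and not how SmartGraphs are
implemented. The region names, server names, and the round-robin baseline are
all assumptions for the example.

```python
def place_by_region(accounts, region_to_server, default_server):
    """Map each account to a server using its region as the sharding key."""
    return {
        account_id: region_to_server.get(region, default_server)
        for account_id, region in accounts.items()
    }


def place_round_robin(accounts, servers):
    """Baseline: spread accounts over servers without looking at the data."""
    return {
        account_id: servers[index % len(servers)]
        for index, account_id in enumerate(accounts)
    }


def cross_server_edges(transactions, placement):
    """Count transactions whose two endpoints live on different servers."""
    return sum(1 for src, dst in transactions if placement[src] != placement[dst])


# Hypothetical data: three accounts in one region transact in a loop.
accounts = {"a1": "region-x", "a2": "region-x", "a3": "region-x", "b1": "region-y"}
transactions = [("a1", "a2"), ("a2", "a3"), ("a3", "a1"), ("b1", "a1")]

by_region = place_by_region(accounts, {"region-x": "dbserver1"}, "dbserver2")
naive = place_round_robin(accounts, ["dbserver1", "dbserver2"])

hops_by_region = cross_server_edges(transactions, by_region)
hops_naive = cross_server_edges(transactions, naive)
```

In this toy data the region-based placement leaves one transaction crossing
servers, compared with three under the round-robin baseline. Each crossing
corresponds to a network hop during traversal.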
Conclusion
This paper points the way to using ArangoDB as part of a fraud detection
solution. We encourage users to experiment with our sample data and
sample queries, to learn how to apply ArangoDB to fraud via experimentation
by adding and modifying the data and queries, and to be inspired and
empowered to apply their knowledge of fraud and use ArangoDB on their own
data to detect fraudulent activity. To get started easily, you can follow the
interactive demo provided on our cloud service ArangoDB Oasis and described
below. Just sign up for ArangoDB Oasis and follow the few steps below.
This White Paper was written by Arthur Keen. For any questions about solving
Fraud Detection cases with ArangoDB, feel free to reach out to
[email protected]
Appendix A: Queries
/*
*/
/*
Find number of Curious loops from a suspicious Account
Hints:
Try suspiciousAccountID = account/10000032
Rerun the query for different number of loops detected
Show the graph and json results
Scroll to bottom of graph results and click "GraphViewer" to see results in Graph Viewer
*/
/*
Find Orphan Account
An orphan account is an account with little or no transactions.
These may be set up in advance of money laundering operations.
This query finds accounts with no transactions
*/
/*
Anti Money Laundering Pattern Detection
Find transaction patterns that contain a disaggregation and re-aggregation of funds pattern
This pattern is characterized by transactions that dis-aggregate funds from a source account to
multiple accounts in amounts that are below a reporting threshold, i.e., below $10,000
followed by a series of small transactions into 1 or more accounts, followed by re-aggregation
of the small transactions into a destination account.
Show the graph and JSON results
Scroll to bottom of graph results and click "GraphViewer" to see results in Graph Viewer
*/