
Decision Support Systems xxx (xxxx) xxx

Contents lists available at ScienceDirect

Decision Support Systems


journal homepage: www.elsevier.com/locate/dss

Data engineering for fraud detection


Bart Baesens a,b, Sebastiaan Höppner c, Tim Verdonck c,d,*

a KU Leuven, Faculty of Economics and Business, Naamsestraat 69, Leuven 3000, Belgium
b University of Southampton, School of Management, Highfield Southampton, SO17 1BJ, United Kingdom
c KU Leuven, Department of Mathematics, Celestijnenlaan 200B, Leuven 3001, Belgium
d University of Antwerp, Department of Mathematics, Middelheimlaan 1, Antwerp 2020, Belgium

ARTICLE INFO

Keywords:
Decision analysis
Payment transactions fraud
Instance engineering
Feature engineering
Cost-based model evaluation

ABSTRACT

Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, the task of detecting suspicious transactions is a binary classification problem and therefore many techniques can be applied. Interpretability is however of utmost importance for the management to have confidence in the model and for designing fraud prevention strategies. Moreover, models that enable the fraud experts to understand the underlying reasons why a case is flagged as suspicious will greatly facilitate their job of investigating the suspicious transactions. Therefore, we propose several data engineering techniques to improve the performance of an analytical model while retaining the interpretability property. Our data engineering process is decomposed into several feature and instance engineering steps. We illustrate the improvement in performance of these data engineering steps for popular analytical models on a real payment transactions data set.

1. Introduction

The Association of Certified Fraud Examiners (ACFE) estimates that a typical organization loses 5% of its revenues to fraud each year. The fifth oversight report on card fraud, which analyses developments in fraud related to card payment schemes (CPSs) in the Single Euro Payments Area (SEPA), was issued in September 2018 by the European Central Bank and covers almost the entire card market. It indicates that the total value of fraudulent transactions conducted using cards issued within SEPA and acquired worldwide amounted to 1.8 billion Euros in 2016, which in relative terms, i.e. as a share of the total value of transactions, amounted to 0.041% in 2016 [21]. These are just a few numbers that indicate the severity of the payment transactions fraud problem. It is also seen that losses due to fraudulent activities keep increasing each year and affect card holders worldwide. Therefore, fraud detection and prevention are more important than ever before, and developing powerful fraud detection systems is of crucial importance to many organizations and firms in order to reduce losses by timely blocking, containing and preventing fraudulent transactions.

The Oxford Dictionary defines fraud as follows: the crime of cheating somebody in order to get money or goods illegally. This definition captures the essence of fraud and covers the many different forms and types of fraud. On the other hand, it does not very precisely describe the nature and characteristics of fraud and as such does not provide much direction for discussing the requirements of a fraud detection system. A more thorough and detailed characterization of the multifaceted phenomenon of fraud is provided by Van Vlasselaer et al. [61]: Fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime which appears in many types of forms. This definition highlights five characteristics that are associated with particular challenges related to developing a fraud detection system.

The first emphasized characteristic and associated challenge concerns the fact that fraud is uncommon. Independent of the exact setting or application, only a small minority of the involved population of cases typically concerns fraud, of which furthermore only a limited number will be known to be fraudulent. This makes it difficult both to detect fraud, since the fraudulent cases are covered by the non-fraudulent ones, and to learn from historical cases to build a powerful fraud detection system, since only few examples are available. This makes it hard for machine learning techniques to extract meaningful patterns from the data.

Fraud is also imperceptibly concealed, since fraudsters try to blend into their environments to remain unnoticed. This relates to the subtlety of fraud: fraudsters try to imitate normal behavior.

* Corresponding author at: University of Antwerp, Department of Mathematics, Middelheimlaan 1, Antwerp 2020, Belgium.
E-mail addresses: [email protected] (B. Baesens), [email protected] (S. Höppner), [email protected] (T. Verdonck).

https://doi.org/10.1016/j.dss.2021.113492
Received 15 July 2020; Received in revised form 25 November 2020; Accepted 7 January 2021
Available online 12 January 2021
0167-9236/© 2021 Published by Elsevier B.V.

Please cite this article as: Bart Baesens, Decision Support Systems, https://doi.org/10.1016/j.dss.2021.113492
Fig. 1. Timeline of transactions of a customer using, for example, a particular payment channel.

Moreover, fraud is well-considered and intentional, and complex fraud structures are carefully planned upfront. Fraudsters can also adapt or refine their tactics whenever needed, for example, due to changing fraud detection mechanisms. Therefore, fraud detection systems need to improve and learn by example.

The traditional approach to fraud detection is expert-driven, which builds on the experience, intuition, and business or domain knowledge of one or more fraud investigators. Such an expert-based rule base or engine is typically hard to build and maintain. A shift is occurring towards data-driven or machine learning based fraud detection methodologies. This shift is triggered by the digitization of almost every aspect of society and daily life, which leads to an abundance of available data. Financial institutions increasingly rely upon data-driven methods for developing powerful fraud detection systems, which are able to automatically detect and block fraudulent transactions. In other words, we need adaptive analytical models to complement experience-based approaches for fighting fraud. A stream of literature has reported upon the adoption of data-driven approaches for developing fraud detection systems [45,47]. These methods significantly improve the efficiency of fraud detection systems and are easier to maintain and more objective. From a machine learning perspective, the task of detecting fraudulent transactions is a binary classification problem.

A natural first step to move from expert-based approaches to data-driven techniques (while still taking into account the experience of the fraud experts) is to consider logistic regression and/or decision trees. These simple analytical models can then be replaced by complex techniques such as random forests and boosting methods, support vector machines, neural networks and deep learning to increase the detection power. Although the latter are definitely powerful analytical techniques, they suffer from a very important drawback which is not desirable from a fraud prevention perspective: they are black box models, which means that they are very complex to interpret. We would also like to note that these complex models do not always significantly outperform simple analytical models such as logistic regression [4,38], and we strongly believe that you should always start with implementing these simple techniques. Many benchmarking studies have illustrated that complex analytical techniques only provide marginal performance gains on structured, tabular data sets as frequently encountered in common classification tasks such as fraud detection, credit scoring and marketing analytics [4,38]. It is our firm belief that in order to improve the performance of any analytical model, we should focus more on the data itself rather than on developing new, complex predictive analytical techniques. This is exactly the aim of data engineering. It can be defined as the clever engineering of data, hereby exploiting the bias of the analytical technique to our benefit, both in terms of accuracy and interpretability at the same time. Oftentimes it will be applied in combination with simple analytical techniques such as linear or logistic regression, so as to maintain the interpretability property which is so often needed in analytical modeling. In our context of fraud analytics, interpretability is of key importance to design smart fraud prevention mechanisms. Data engineering can be decomposed into feature engineering and instance engineering. Feature engineering aims at designing smart features in one of two possible ways: either by transforming existing features using smart transformations, which will allow a simple analytical technique such as linear or logistic regression to boost its performance, or by extracting or creating new meaningful features (a process often called featurization) from different sources (e.g. transactional data, network data, time series data, text data, ...) to achieve better performance. Instance engineering entails the careful selection of instances or observations, again with the aim to improve predictive modeling performance. Put differently, it aims at selecting those observations which positively contribute to the learning of the analytical technique and removing those that have a detrimental impact on it. Obviously, this is not a trivial exercise, and many instance engineering techniques have been developed which we will carefully study and experiment with in this paper. In this paper the focus will be on successful data engineering steps to improve the performance of a fraud detection model. More concretely, we will describe the lessons that we have learnt when complementing expert-based approaches with machine learning or data-driven techniques to combat payment transactions fraud for a large European bank.

This paper is organized as follows. We start with presenting our data engineering process: Section 2 presents feature engineering steps whereas instance engineering is explained in Section 3. In Section 4 popular performance measures in an (imbalanced) classification setting are described. In Section 5, more information about payment transaction fraud and the observed data set is given. This section also illustrates the benefits of the various data engineering steps by showing increased performance on our real data set. Finally, concluding remarks and potential directions for future research are provided in Section 6.

2. Feature engineering

The main objective of machine learning is to extract patterns to turn data into knowledge. Since the beginning of this century, technological advances have drastically changed the size of data sets as well as the speed with which these data must be analyzed. Modern data sets may have a huge number of instances, a very large number of features, or both. In most applications, data sets are compiled by combining data from different sources and databases (containing both structured and unstructured data) where each source of information has its strengths and weaknesses. Before applying any machine learning algorithm, it is therefore necessary to transform these raw data sources into interesting features that better help the predictive models. This essential step, which is often denoted feature engineering, is of utmost importance in the machine learning process. We believe that data scientists should be well aware of the power of feature engineering and that they should share good practices.

An important set of interesting features can be created based on the famous Recency, Frequency, Monetary (RFM) principle. Recency measures how long ago a certain event took place, whereas frequency counts the number of specific events per unit of time. Besides recency features, we also present several other time-related features. Features related to monetary value measure the intensity of a transaction, typically expressed in a currency such as Euros or USD. We also introduce features based on unsupervised anomaly detection and briefly discuss some other advanced feature engineering techniques.

2.1. Frequency features

We explain the idea behind the RFM principle by first deriving frequency features using a transaction aggregation strategy in order to capture a customer's spending behavior. This methodology was first


proposed by Whitrow et al. [62] and has been used by a number of studies [6,10,18,35]. Frequency calculates how many transactions were made during a sliding time window that satisfies predefined conditions, as illustrated in Fig. 1. The first step in creating frequency features consists in aggregating the transactions made during the last given time period (e.g. the last 3 months), first by card or account number, then by payment channel, authentication method, beneficiary country or other, followed by counting the number of transactions. It is important to choose an appropriate time period over which to aggregate a customer's transactions. As time passes, the spending patterns of a customer are not expected to remain constant over the years. For transactions made with debit cards, we propose to use a fixed time frame of 90, 120 or 180 days (~3, 4 or 6 months). Let $\mathcal{D}$ denote a set of $N$ transactions where each transaction is represented by the pair $(x_i, y_i)$ for $i = 1, 2, \ldots, N$. Here $y_i \in \{0, 1\}$ describes the true class of transfer $i$ and $x_i = (x_i^1, x_i^2, \ldots, x_i^p)$ represents the $p$ associated features of transfer $i$. Bahnsen et al. [6] describe the process of creating frequency features as selecting those transactions that were made in the previous $t_p$ days, for each transaction $i$ in the data set $\mathcal{D}$,

$$\mathcal{D}^{\mathrm{freq}}_{t_p,i} = \mathrm{AGG}^{\mathrm{freq}}\left(\mathcal{D}, i, t_p\right) = \left\{ x^{\mathrm{amt}}_j \;\middle|\; \left(x^{\mathrm{id}}_j = x^{\mathrm{id}}_i\right) \text{ and } \left(\mathrm{days}\!\left(x^{\mathrm{time}}_i, x^{\mathrm{time}}_j\right) < t_p\right) \right\}_{j=1}^{N} \tag{1}$$

where $\mathrm{AGG}(\cdot)$ is a function that aggregates transactions of $\mathcal{D}$ into a subset associated with a transaction $i$ with respect to the time frame $t_p$; $x^{\mathrm{time}}_i$ is the timestamp of transaction $i$; $x^{\mathrm{amt}}_i$ is the amount of transaction $i$; $x^{\mathrm{id}}_i$ is the customer or card identification number of transaction $i$; and $\mathrm{days}(t_1, t_2)$ is a function that calculates the number of days between the times $t_1$ and $t_2$. Finally, the frequency feature is calculated as

$$x^{\mathrm{freq}}_i = \left|\mathcal{D}^{\mathrm{freq}}_{t_p,i}\right| \tag{2}$$

where $|\cdot|$ is the cardinality of a set. This aggregation strategy, however, does not take the combination of different features into account. For example, we can aggregate transactions according to certain criteria, such as: transactions made in the last $t_p$ days using the same authentication method (e.g. pin code or fingerprint) and the same payment channel (e.g. online banking or mobile app). For calculating such features, Bahnsen et al. [6] expand (1) as follows

$$\mathcal{D}^{\mathrm{freq2}}_{t_p,i} = \mathrm{AGG}^{\mathrm{freq}}\left(\mathcal{D}, i, t_p, \mathrm{cond1}, \mathrm{cond2}\right) = \left\{ x^{\mathrm{amt}}_j \;\middle|\; \left(x^{\mathrm{id}}_j = x^{\mathrm{id}}_i\right) \text{ and } \left(\mathrm{days}\!\left(x^{\mathrm{time}}_i, x^{\mathrm{time}}_j\right) < t_p\right) \text{ and } \left(x^{\mathrm{cond1}}_j = x^{\mathrm{cond1}}_i\right) \text{ and } \left(x^{\mathrm{cond2}}_j = x^{\mathrm{cond2}}_i\right) \right\}_{j=1}^{N} \tag{3}$$

where cond1 and cond2 could be one of the features of a transaction (e.g. authentication method, payment channel, beneficiary country, etc.). Similarly, the frequency feature is then calculated as

$$x^{\mathrm{freq2}}_i = \left|\mathcal{D}^{\mathrm{freq2}}_{t_p,i}\right| \tag{4}$$

One could also define new features as the ratio of frequency features. For example,

$$x^{\mathrm{ratio}}_i = x^{\mathrm{freq2}}_i \big/ x^{\mathrm{freq}}_i \tag{5}$$

which is always between 0 and 1. Since $x^{\mathrm{ratio}}_i$ is the fraction of transfers for which conditions cond1 and cond2 hold over all transactions in the past $t_p$ days, this feature represents the probability that both conditions cond1 and cond2 are met by the customer.

We show an example to further clarify how the frequency features are calculated. Consider a set of transactions made by a customer between 01/07/2019 and 03/07/2019, as shown in Table 1.

Table 1
Example calculation of frequency features: $x^{\mathrm{freq}}_i$ is the number of transactions in the last 24 h, and $x^{\mathrm{freq2}}_i$ is the number of transactions with the same authentication method and payment channel in the last 24 h.

| TransId | CustId | Timestamp        | Authentication method | Payment channel | $x^{\mathrm{freq}}_i$ | $x^{\mathrm{freq2}}_i$ |
|---------|--------|------------------|-----------------------|-----------------|-----------------------|------------------------|
| 1       | 1      | 01/07/2019 16:51 | pin code              | web             | 0                     | 0                      |
| 2       | 1      | 01/07/2019 19:04 | pin code              | web             | 1                     | 1                      |
| 3       | 1      | 01/07/2019 19:36 | fingerprint           | app             | 2                     | 0                      |
| 4       | 1      | 01/07/2019 23:31 | pin code              | web             | 3                     | 2                      |
| 5       | 1      | 02/07/2019 17:48 | fingerprint           | app             | 3                     | 1                      |
| 6       | 1      | 02/07/2019 22:12 | fingerprint           | app             | 2                     | 1                      |
| 7       | 1      | 02/07/2019 23:34 | fingerprint           | app             | 2                     | 2                      |
| 8       | 1      | 03/07/2019 01:40 | pin code              | app             | 3                     | 0                      |

Then we

Fig. 2. Example of a recency feature derived from the authentication method used by a customer. When the customer makes a transaction, she chooses one of five possible authentication methods, labeled AU01, AU02, …, AU05. If the time between two successive uses of the same authentication method is long, the recency is close to zero; if that time is short, the recency is close to one. If an authentication method is used for the first time, its recency is defined as zero.
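As a concrete illustration, the frequency features of Table 1 can be reproduced with a short script. This is an illustrative sketch in Python (the paper itself provides no code; the function and variable names are ours). All eight transactions belong to the same customer, and the window is interpreted as the strictly earlier transactions within the last 24 hours, which matches the counts in the table:

```python
from datetime import datetime, timedelta

# Transactions from Table 1 (all made by the same customer).
raw = [
    ("01/07/2019 16:51", "pin code", "web"),
    ("01/07/2019 19:04", "pin code", "web"),
    ("01/07/2019 19:36", "fingerprint", "app"),
    ("01/07/2019 23:31", "pin code", "web"),
    ("02/07/2019 17:48", "fingerprint", "app"),
    ("02/07/2019 22:12", "fingerprint", "app"),
    ("02/07/2019 23:34", "fingerprint", "app"),
    ("03/07/2019 01:40", "pin code", "app"),
]
txns = [(datetime.strptime(t, "%d/%m/%Y %H:%M"), au, ch) for t, au, ch in raw]

def freq_features(txns, tp=timedelta(hours=24)):
    """For each transaction, return (x_freq, x_freq2): the number of strictly
    earlier transactions within the window tp (Eq. (2)), and the number that
    additionally share the authentication method and payment channel (Eq. (4))."""
    out = []
    for i, (ti, au_i, ch_i) in enumerate(txns):
        # strictly earlier transactions of the same customer inside the window
        window = [(tj, au, ch) for tj, au, ch in txns[:i] if ti - tj < tp]
        x_freq = len(window)
        x_freq2 = sum(1 for _, au, ch in window if au == au_i and ch == ch_i)
        out.append((x_freq, x_freq2))
    return out

features = freq_features(txns)
```

Running this on the eight transactions yields exactly the $x^{\mathrm{freq}}_i$ and $x^{\mathrm{freq2}}_i$ columns of Table 1; the ratio feature of Eq. (5) would simply divide the two counts where the denominator is nonzero.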


Fig. 3. Recency versus time (in days) for different values of γ.

Fig. 4. (Left) Circular histogram of timestamps of transactions. The dashed line is the estimated periodic mean of the von Mises distribution. (Right) Circular
histogram including the 90% confidence interval (orange area).

estimate the frequency features $x^{\mathrm{freq}}_i$ and $x^{\mathrm{freq2}}_i$ by setting $t_p = 1$ day (~24 h) for ease of calculation.

The frequency features give us specific details about the spending behavior of the customer. For example, if a customer frequently used a particular payment channel in the past $t_p$ days, its frequency is obviously large. However, a zero frequency for a particular payment channel implies that the customer has not used that payment channel in the past $t_p$ days, which indicates anomalous behavior and perhaps fraud. The total number of frequency features can grow quite quickly, as $t_p$ can take several values, and the combination of criteria can be quite large as well. For the experiments we set the different values of $t_p$ to 90, 120 and 180 days. Then we calculate the frequency features using (2) and (4) as well as (5), with the aggregation criteria including payment channel, authentication method, beneficiary country, type of communication, and others.

2.2. Recency features

Although frequency features are powerful in describing a customer's spending behavior, they do not take the aspect of time into account. Recency features are a way to capture this information. Recency measures the time passed since the previous transaction that satisfies predefined conditions. To explain how recency features are defined, we show an example where we create a recency feature derived from the authentication method used by the customer, as illustrated in Fig. 2.

When a customer makes a transfer $x_i$, she chooses a method $x^{\mathrm{AU}}_i$ to authenticate herself. Examples of authentication methods are passwords, pin codes, fingerprints, itsme,¹ iris scans and hardware tokens. For each transaction $i$ in the data set $\mathcal{D}$, we define the recency of the transaction's authentication method as

$$x^{\mathrm{AU,recency}}_i = \exp(-\gamma \cdot \Delta t_i) \quad \text{where} \quad \Delta t_i = \min\left\{ \mathrm{days}\!\left(x^{\mathrm{time}}_i, x^{\mathrm{time}}_j\right) \;\middle|\; x^{\mathrm{id}}_j = x^{\mathrm{id}}_i \text{ and } x^{\mathrm{AU}}_j = x^{\mathrm{AU}}_i \right\}_{j=1}^{N} \tag{6}$$

Here $\Delta t_i$ is the time interval, typically in days, between two consecutive transfers made by the same customer with identification number $x^{\mathrm{id}}_i$ using the same authentication method $x^{\mathrm{AU}}_i$. The parameter $\gamma$ can be chosen such that, for example, the recency is small (e.g. 0.01) when $\Delta t = 180$ days (~6 months), in which case $\gamma = -\log(0.01)/180 = 0.026$. Notice that recency is always a number between 0 and 1. When the time period $\Delta t$ between two consecutive transfers with the same authentication method is small (large), we say that the authentication method has (not) recently been used. In that case the recency for this authentication method is close to one (zero). When an authentication method is used for the first time, we define its recency to be zero. A zero or small recency shows atypical behavior and might indicate fraud. Fig. 3 shows that recency indeed decreases when the time interval becomes larger. The parameter $\gamma$ determines how fast the recency decreases. For larger values of $\gamma$, recency will decrease more quickly with time, and vice versa.
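The recency feature of Eq. (6) is straightforward to compute once the time since the last use of the authentication method is known. A minimal sketch in Python (names ours; the calibration of γ follows the text):

```python
import math

# gamma calibrated as in the text: recency(180 days) = 0.01
GAMMA = -math.log(0.01) / 180  # ~0.026

def recency(dt_days, gamma=GAMMA):
    """Recency of an authentication method, Eq. (6): exp(-gamma * dt),
    where dt is the number of days since the same customer last used
    this method. A first-time use gets recency 0 by convention."""
    if dt_days is None:  # authentication method used for the first time
        return 0.0
    return math.exp(-gamma * dt_days)
```

An immediate reuse gives recency 1, a gap of 180 days gives the calibrated 0.01, and a first use maps to 0, the convention the text uses to flag atypical behavior.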

¹ This is a popular app in Belgium that allows you to safely, easily and reliably confirm your (digital) identity and approve transactions.


Table 2
Example calculation of a binary feature that indicates whether a transaction is made within the confidence interval (with α = 0.9) of the time of the previous transactions.

| TransId | Time             | Periodic mean | Confidence interval | Binary feature |
|---------|------------------|---------------|---------------------|----------------|
| 1       | 01/07/2019 16:51 | –             | –                   | –              |
| 2       | 01/07/2019 19:04 | –             | –                   | –              |
| 3       | 01/07/2019 19:36 | 17:57         | 16:07–19:48         | 1              |
| 4       | 01/07/2019 23:31 | 18:31         | 16:32–20:29         | 0              |
| 5       | 02/07/2019 17:48 | 19:40         | 15:39–23:40         | 1              |
| 6       | 02/07/2019 22:12 | 19:14         | 15:27–23:01         | 1              |
| 7       | 02/07/2019 23:34 | 19:47         | 15:52–23:42         | 1              |
| 8       | 03/07/2019 01:40 | 20:21         | 16:05–00:38         | 0              |

2.3. Other time-related features

It is well-known that time is an important aspect in fraud detection. Besides recency features, other time-related features can be created based on the assumption that certain events, like a customer making transactions, occur at similar moments in time. Having a transaction at 22:00 might be very regular for one person, but very suspicious for another person. Since, for every customer, we know the timestamps of all their transactions in the past, we can use this information to decide whether a new transaction at 22:00 is atypical for a particular customer. For the set of timestamps of transactions made by each customer we can construct a circular histogram, as shown in Fig. 4 (left). Since 00:00 is the same as 24:00, we have to model the time of a transaction as a periodic variable by fitting an appropriate statistical distribution [6]. A popular choice is the von Mises distribution, also known as the periodic normal distribution because it represents a normal distribution wrapped around a circle [25]. The von Mises distribution of a set of timestamps $\mathcal{D}^{\mathrm{time}} = \{t_1, t_2, \ldots, t_N\}$ is defined as

$$\mathcal{D}^{\mathrm{time}} \sim \text{von Mises}(\mu, \kappa) \tag{7}$$

where the parameters $\mu$ and $1/\kappa$ represent the periodic mean and the periodic standard deviation, respectively. These parameters can easily be estimated by most statistical software. We use the function mle.vonmises from the R package circular to compute the maximum likelihood estimates for the parameters of a von Mises distribution.

For each customer we construct a confidence interval for the time of a transaction. First, we select the set of transactions made by the same customer in the last $t_p$ days,

$$\mathcal{D}^{\mathrm{time}}_{t_p,i} = \mathrm{AGG}^{\mathrm{time}}\left(\mathcal{D}, i, t_p\right) = \left\{ x^{\mathrm{time}}_j \;\middle|\; \left(x^{\mathrm{id}}_j = x^{\mathrm{id}}_i\right) \text{ and } \left(\mathrm{days}\!\left(x^{\mathrm{time}}_i, x^{\mathrm{time}}_j\right) < t_p\right) \right\}_{j=1}^{N} \tag{8}$$

Based on this set of selected timestamps, the estimated parameters $\widehat{\mu}$ and $\widehat{\kappa}$ are calculated. Next, a von Mises distribution is fitted on the set of timestamps using these estimates:

$$x^{\mathrm{time}}_i \sim \text{von Mises}\left(\widehat{\mu}\left(\mathcal{D}^{\mathrm{time}}_{t_p,i}\right), \widehat{\kappa}\left(\mathcal{D}^{\mathrm{time}}_{t_p,i}\right)\right) \tag{9}$$

Once the von Mises distribution is fitted on the timestamps of the customer's transactions, we can construct a confidence interval with probability α, e.g. 80%, 90%, 95%. An example is presented in Fig. 4 (right). Using the confidence interval, a binary feature is created: a transaction is flagged as normal or suspicious depending on whether or not the time of the transaction is within the confidence interval. Table 2 shows an example of a binary feature that takes the value of one if the current time of the transaction is within the confidence interval of the time of the previous transactions with a confidence of α = 0.9. Of course, multiple of these binary features can be extracted for different values of α and time period $t_p$. The new feature also helps to get a better understanding of when a customer is expected to make transactions. Note that this feature (just as many others) solely indicates atypical behavior for a customer, which might give an indication of fraud. If a certain transaction is flagged as potentially fraudulent due to this feature, then it is important that this information is also given to the fraud investigators. If they see that the customer is abroad, then that could be the reason for the atypical value of this feature.

Instead of looking at the timestamp of a transaction within a day, we can of course create similar features indicating how atypical it is for a customer to have a payment on a certain day or above a certain amount. Some customers, for example, may only do transactions during the weekend. Adding such features based on customer spending history may bring a significant increase in model performance. Most predictive models also let you easily evaluate which features increased the performance of your model and which are not significant for discriminating frauds from non-frauds.

2.4. Monetary value related features

The last pillar of the RFM principle involves monetary value related features, which focus on the amount that is transferred. Monetary features calculate various statistics, such as the total value, the average, and the standard deviation of the amounts transferred during a sliding time window that satisfies predefined conditions (Fig. 5). The first step in creating monetary features is the same as with frequency features: select those transactions that were made in the last $t_p$ days, as in (1). Next, we can calculate the total amount spent on those transactions,

$$x^{\mathrm{total}}_i = \sum_{j=1}^{N} x^{\mathrm{amt}}_j \, I\!\left(x^{\mathrm{amt}}_j \in \mathcal{D}^{\mathrm{freq}}_{t_p,i}\right) \tag{10}$$

where $I(\cdot)$ is the indicator function. Of course, we can also aggregate transactions according to certain criteria, as in (3), followed by calculating their sum,

$$x^{\mathrm{total2}}_i = \sum_{j=1}^{N} x^{\mathrm{amt}}_j \, I\!\left(x^{\mathrm{amt}}_j \in \mathcal{D}^{\mathrm{freq2}}_{t_p,i}\right) \tag{11}$$

Transferring 500 Euros may be little for one person, but a lot for another person. A monetary feature that calculates the so-called z-score of an amount can indicate whether the amount is atypical for a

Fig. 5. Timeline of amounts transferred by a customer using, for example, a particular payment channel.
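The timing feature of Section 2.3 can be sketched in simplified form. The paper fits a full von Mises distribution with mle.vonmises from the R package circular; the sketch below (Python, names ours) replaces that fit by the circular (periodic) mean and a fixed half-width interval, which conveys the same idea of flagging transactions made at unusual hours:

```python
import math

def circular_mean_hour(hours):
    """Periodic mean of timestamps given as hours in [0, 24): map each time
    to an angle on the circle, average the unit vectors, and map the mean
    direction back to an hour, so 23:00 and 01:00 average to midnight."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    return (math.atan2(s, c) % (2 * math.pi)) * 24 / (2 * math.pi)

def within_interval(hour, center, half_width):
    """Binary feature: 1 if `hour` lies within +/- half_width hours of the
    periodic mean, taking the wrap-around at midnight into account."""
    diff = abs(hour - center) % 24
    return 1 if min(diff, 24 - diff) <= half_width else 0
```

For a customer who transacts around midnight (say at 23:00 and 01:00), the periodic mean is midnight; a 22:00 transaction falls inside a ±2-hour interval while a noon transaction does not. A proper confidence interval would come from the fitted $\widehat{\kappa}$, as in Eq. (9).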

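The monetary aggregates of Eqs. (10) and (11) follow the same pattern as the frequency features: sum the amounts of the customer's transactions that fall in the window and, optionally, satisfy extra criteria as in Eq. (3). A sketch in Python (the dictionary layout, names and data are our own hypothetical choices):

```python
from datetime import datetime, timedelta

def monetary_total(txns, i, tp=timedelta(days=90), conds=()):
    """Total amount spent by the same customer in the last tp days, Eq. (10);
    with `conds` (e.g. same payment channel) it becomes Eq. (11)."""
    ti = txns[i]
    total = 0.0
    for tj in txns:
        if tj["cust"] != ti["cust"]:
            continue                      # other customers are ignored
        dt = ti["time"] - tj["time"]
        if not timedelta(0) < dt < tp:
            continue                      # keep strictly earlier transfers in the window
        if any(tj[c] != ti[c] for c in conds):
            continue                      # extra aggregation criteria, as in Eq. (3)
        total += tj["amt"]
    return total

# Hypothetical three-transfer history for one customer.
txns = [
    {"time": datetime(2019, 7, 1), "cust": 1, "amt": 10.0, "channel": "web"},
    {"time": datetime(2019, 7, 2), "cust": 1, "amt": 20.0, "channel": "app"},
    {"time": datetime(2019, 7, 3), "cust": 1, "amt": 30.0, "channel": "web"},
]
```

Here monetary_total(txns, 2) sums the two earlier transfers (30.0), while restricting to the same payment channel via conds=("channel",) keeps only the first one (10.0).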

Fig. 6. An example of transferred amounts. The last amount of 500 Euros is clearly an outlier compared to the previous amounts. The atypically high amount is not indicated when using traditional estimates such as the sample mean and sample standard deviation. Instead, we have to use robust estimates such as the median and the median absolute deviation (MAD).

particular customer. For a set of amounts $\mathcal{D}^{\mathrm{freq}}_{t_p,i}$, the standardized values or z-scores are defined as

$$z_i = \frac{x^{\mathrm{amt}}_i - \widehat{\mu}_{\mathcal{D}}}{\widehat{\sigma}_{\mathcal{D}}} \tag{12}$$

where $\widehat{\mu}_{\mathcal{D}}$ and $\widehat{\sigma}_{\mathcal{D}}$ are the sample mean and sample standard deviation, respectively,

$$\widehat{\mu}_{\mathcal{D}} = \mathrm{Mean}\left(\mathcal{D}^{\mathrm{freq}}_{t_p,i}\right) \quad \text{and} \quad \widehat{\sigma}_{\mathcal{D}} = \mathrm{Stdev}\left(\mathcal{D}^{\mathrm{freq}}_{t_p,i}\right) \tag{13}$$

As a rule of thumb, an amount is flagged as an outlier if its z-score is larger than 3 in absolute value, $|z_i| > 3$. Now consider the transactions made by a customer, as shown in Fig. 6. The last amount of 500 Euros is clearly an outlier compared to the previous amounts. However, when using the sample mean and sample standard deviation, the z-score of the atypically high amount is only 2.66 and it is therefore not regarded as abnormal.

Instead of computing the z-score using traditional estimates such as the sample mean and sample standard deviation, we propose using robust alternatives such as the median and the median absolute deviation (MAD),

$$z^{r}_i = \frac{x^{\mathrm{amt}}_i - \mu^{r}_{\mathcal{D}}}{\sigma^{r}_{\mathcal{D}}} \tag{14}$$

with

$$\mu^{r}_{\mathcal{D}} = \mathrm{Median}\left(\mathcal{D}^{\mathrm{freq}}_{t_p,i}\right) \quad \text{and} \quad \sigma^{r}_{\mathcal{D}} = \mathrm{MAD}\left(\mathcal{D}^{\mathrm{freq}}_{t_p,i}\right) \tag{15}$$

where

$$\mathrm{MAD}(\{x_1, x_2, \ldots, x_n\}) = 1.4826 \cdot \underset{i=1,\ldots,n}{\mathrm{Median}} \left| x_i - \mathrm{Median}\left(\{x_j\}_{j=1}^{n}\right) \right| \tag{16}$$

The constant scale factor 1.4826 ensures that the MAD is a consistent estimator of the standard deviation $\sigma$, i.e. $E[\mathrm{MAD}(\{X_1, X_2, \ldots, X_n\})] = \sigma$ for $X_j$ distributed as $N(\mu, \sigma^2)$ and large $n$. Using the robust estimates, the z-score of the last amount in Fig. 6 is 5.79, which clearly indicates that the 500 Euros is atypical for this customer.

Remark: transferred amounts are often right-skewed, as shown in Fig. 7 (left). The rule of thumb, i.e. $|z_i| > 3$, implicitly assumes that the z-scores are distributed as $N(\mu, \sigma^2)$. Before standardizing the amounts, a transformation is often applied to them that changes their distribution to one that resembles a normal distribution, or at least a symmetric distribution. One such transformation is the natural logarithm, as shown in Fig. 7 (right).

A popular alternative for computing (robust) z-scores is the boxplot, which is a frequently used graphical tool to analyze a univariate data set [60]. The boxplot marks all observations outside the interval $[Q_1 - 1.5\,\mathrm{IQR};\; Q_3 + 1.5\,\mathrm{IQR}]$ as potential outliers, where $Q_1$, $Q_2$ and $Q_3$ denote respectively the first, second (or median) and third quartile, and $\mathrm{IQR} = Q_3 - Q_1$ equals the interquartile range. It is known that the boxplot typically flags too many points as outlying when the data are skewed, and therefore Hubert and Vandervieren [34] have modified the boxplot interval so that the skewness is sufficiently taken into account.

In practice one often tries to detect outliers using diagnostics starting from a classical or traditional fitting method. Unfortunately, these traditional techniques can be affected by outliers so strongly that the resulting fitted model may not allow us to detect the deviating observations. This is called the masking effect (see e.g. Rousseeuw and Leroy [55]). Additionally, some good data points might even appear to be outliers, which is known as swamping [19]. To avoid these effects, the goal of robust statistics is to find a fit which is close to the fit we would have found without the outliers. We can then automatically identify the outliers by their large 'deviation' (e.g., their distance or residual) from that robust fit. It is not our aim to replace traditional techniques by a robust alternative, but we have illustrated that robust methods can give you extra insights into the data and may improve the reliability and accuracy of your analysis.

2.5. Features based on (unsupervised) anomaly detection techniques

In this section we focus on unsupervised techniques that do not use the target variable (fraudulent or not). Anomaly detection techniques flag anomalies or outliers, which are observations that deviate from the pattern of the majority of the data. These flagged observations indicate
Fig. 7. Histogram and kernel density estimate of amounts (left) and natural logarithm of those amounts (right).
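The masking effect behind Eqs. (12)-(16) is easy to demonstrate numerically. In the sketch below (Python, with a hypothetical amount history in which the 500 plays the role of the outlier in Fig. 6), the classical z-score of the large amount stays below the threshold of 3, while the median/MAD-based score flags it clearly:

```python
import statistics

def classical_zscore(amounts, x):
    """z-score using the sample mean and sample standard deviation, Eqs. (12)-(13)."""
    return (x - statistics.mean(amounts)) / statistics.stdev(amounts)

def robust_zscore(amounts, x):
    """Robust z-score, Eqs. (14)-(16): the median replaces the mean and the
    MAD (with consistency factor 1.4826) replaces the standard deviation."""
    med = statistics.median(amounts)
    mad = 1.4826 * statistics.median([abs(a - med) for a in amounts])
    return (x - med) / mad

amounts = [20, 25, 22, 24, 21, 23, 500]  # hypothetical history; 500 is the outlier
# The outlier inflates the mean and standard deviation so much that its own
# classical z-score stays below 3 (masking), whereas the robust z-score is
# far above 3 and flags the amount.
```

The same comparison could be made with the (adjusted) boxplot fence discussed in the text; the point is that the scale estimate must not be driven by the very observation one is trying to detect.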


atypical behavior and hence may contain crucial information for fraud detection and should be investigated by the fraud expert. As an alternative, we propose to use the outlyingness score or metric of several anomaly detection techniques as features that we add to our data set.

Anomalies in a single dimension (i.e. univariate outliers) can be detected by computing (robust) z-scores (and checking which observations are larger than 3 in absolute value) or by constructing the (adjusted) boxplot (and checking which observations fall outside the boxplot interval or fence). Another tool for univariate anomaly detection that is also popular in fraud detection is the Newcomb-Benford law, which makes predictions about the distribution of the first leading digit of all numbers [7,46]. These techniques can then be applied on each feature in the data set. However, in this way it is only possible to detect anomalies that are atypical in (at least) one dimension or feature of our data set. Since fraudsters succeed very well in blending in with legitimate customers, they are typically not detected by checking each feature separately. It is important to flag those observations that deviate in several dimensions from the main data structure but are not atypical in any single feature. Such multivariate outliers can only be detected in the multidimensional space and require the use of advanced models.

A first tool for this purpose is robust statistics, which first fits the majority of the data and then flags the observations that deviate from this robust fit [54]. For a multivariate n × p data set X, one can calculate the robust Mahalanobis distance (or robust generalized distance) for each observation x_i:

MD(x_i, μ̂, Σ̂) = sqrt( (x_i - μ̂)^T Σ̂^{-1} (x_i - μ̂) ).   (17)

An observation is then flagged as an anomaly if its distance exceeds the cut-off value sqrt(χ²_{p,0.975}), where χ²_{p,0.975} is the 0.975 quantile of the chi-squared distribution with p degrees of freedom. It is of utmost importance that robust estimates of multivariate location and scatter are used in the computation of the distances (to avoid masking and swamping effects). A popular method yielding such estimates is the Minimum Covariance Determinant (MCD) method of Rousseeuw and Driessen [53] or, in case of high-dimensional data, the Minimum Regularized Covariance Determinant (MRCD) estimator of Boudt et al. [11]. Note that various robust alternatives for popular predictive models have also been proposed in the literature. These robust supervised techniques automatically flag anomalies (typically with a convenient graphical tool to visualize them). Therefore it is interesting to also apply robust versions of the predictive models on the data and carefully examine the anomalies flagged by these techniques (for more information see e.g. Maronna et al. [44]; Heritier et al. [33]; Atkinson and Riani [3]). Recently, Rousseeuw et al. [56] also used robust statistics to detect potential fraud cases in time series of imports into the European Union.

Besides robust statistics, many other unsupervised anomaly detection tools from various research fields have been proposed [28]. We briefly introduce and illustrate three popular techniques: the k-nearest neighbors distance [2,14], the local outlier factor (LOF) [13] and isolation forests [40]. The k-nearest neighbors distance for an observation is the average distance to each of its k closest neighbors. This distance measures how isolated an observation is from its neighbors, and hence a large distance typically indicates an anomaly. The LOF score is the average density around the k nearest neighbors divided by the density around the observation itself; anomalies typically have a score above one. An isolation forest is obtained by taking an ensemble of isolation trees which try to isolate each observation as quickly as possible. The final score is the average of the standardized path length (i.e. the number of splits needed to isolate the observation) over all trees. Hence, for all the methods above it holds: the higher the score or metric, the more suspicious the observation.

2.6. Other feature engineering techniques

In this paper, we only study a few feature engineering techniques to illustrate their importance as a key data engineering mechanism. Other powerful feature engineering techniques are the Box-Cox and Yeo-Johnson transformations, which both univariately transform data variables so as to boost the performance of the predictive analytical model. Note that these transformation techniques are sensitive to outliers and will try to move outliers inward at the expense of the normality of the central part of the data. Therefore various robust transformation procedures have been proposed in the literature (see e.g. Carroll and Ruppert [15]; Riani [51]; Marazzi et al. [43]; Raymaekers and Rousseeuw [50]). Feature engineering techniques have also been designed for unstructured data such as text, network data, and multimedia data (e.g., images, audio, videos). For text data, one commonly uses Singular Value Decomposition (SVD) or Natural Language Processing (NLP) as feature engineering techniques. For network data, node2vec and GraphSage [29,30] have proven to be very valuable techniques. Deep learning has been used to learn complex features for multimedia data. As an example, convolutional neural networks can learn key features to describe objects in images. However, an important caveat is that many of these features are black box in nature and thus hard to interpret for business decision makers. Finally, tailored feature engineering techniques have been designed for specific domains, e.g., Item2Vec in Recommender Systems [8].

3. Instance engineering

A major challenge in fraud analytics is the imbalance or skewness of the data, meaning that typically there are plenty of historical examples of non-fraudulent cases, but only a limited number of fraudulent cases. For example, in a credit card fraud setting, typically less than 0.5% of transactions are fraudulent. Such a problem is commonly referred to as the needle-in-a-haystack problem, and might cause an analytical technique to experience difficulties in learning an accurate model. Every classifier faced with a skewed data set typically tends to favor the majority class. In other words, the classifier tends to label all transactions as non-fraudulent since it then already achieves a classification accuracy of more than 99%. Classifiers typically learn better from a more balanced distribution. Two popular ways to accomplish this are undersampling, whereby non-fraudulent transactions in the training set are removed, and oversampling, whereby fraudulent transactions in the training set are replicated.

A practical question concerns the optimal non-fraud/fraud odds that should be targeted by under- or oversampling. This of course depends on the data characteristics and on the quality and type of classifier. Although trial and error is commonly adopted to determine this optimal ratio, a ratio of 90% non-fraudsters versus 10% fraudsters is usually already sufficient for most business applications.

The Synthetic Minority Oversampling technique, or SMOTE, is another interesting approach to deal with skewed class distributions [16]. In SMOTE, the minority class is oversampled by adding synthetic observations. The creation of these artificial fraudsters goes as follows. In Step 1 of SMOTE, for each minority class observation, the k nearest neighbors (of the same class) are determined. Step 2 then randomly selects one of these neighbors and generates a synthetic observation as follows: 1) take the difference between the features of the current minority sample and those of its selected neighbor; 2) multiply this difference by a random number between 0 and 1; and 3) add the obtained result as a new observation to the sample, hereby increasing the frequency of the minority class.

The key idea of these undersampling and oversampling techniques is to adjust the class priors to enable the analytical technique to create a meaningful model that discriminates the fraudsters from the non-fraudsters. By doing so, the class posteriors become biased. This is not a problem if the fraud analyst is interested in ranking the observations in


Fig. 8. Illustration of SMOTE, ADASYN, MWMOTE and ROSE. The blue circles represent the legitimate cases, the black squares are the original fraud cases, and the
red dots are the synthetic fraud cases. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
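The multivariate detectors introduced in Section 2.5 can be turned into extra score features. The sketch below assumes scikit-learn and SciPy are available; the simulated data and all parameter choices are illustrative assumptions, not taken from the paper's data set.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Simulated 2-D feature matrix: a strongly correlated bulk plus two
# multivariate outliers that are unremarkable in each coordinate separately.
X_bulk = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
X_out = np.array([[2.0, -2.0], [-2.0, 2.0]])
X = np.vstack([X_bulk, X_out])

# Robust Mahalanobis distances of Eq. (17), with location and scatter
# estimated by the MCD so that the outliers cannot mask themselves.
mcd = MinCovDet(random_state=0).fit(X)
robust_dist = np.sqrt(mcd.mahalanobis(X))  # mahalanobis() returns squared distances
cutoff = np.sqrt(chi2.ppf(0.975, df=X.shape[1]))
flagged = robust_dist > cutoff

# LOF and isolation-forest scores, oriented so that higher = more suspicious.
lof = LocalOutlierFactor(n_neighbors=20).fit(X)
lof_score = -lof.negative_outlier_factor_
iforest = IsolationForest(random_state=0).fit(X)
iforest_score = -iforest.score_samples(X)

# Append the outlyingness scores as extra feature columns.
X_aug = np.column_stack([X, robust_dist, lof_score, iforest_score])
```

The two planted outliers exceed the chi-squared cut-off even though each of their coordinates lies well within the univariate range of the bulk, which is exactly the multivariate effect discussed above.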

terms of their fraud risks. However, if well-calibrated fraud probabilities are needed, then the posterior probabilities can be adjusted [57].

Since its introduction in 2002, many variants of SMOTE have been proposed in the literature (see e.g. Zhu et al. [63] and Kovács [36] for an overview). In Fig. 8, we visually show the differences between ADASYN [32], MWMOTE [9] and ROSE [41] and show their performance on our data set. We refer to their papers for details. It is clear that there is not one oversampling technique that always yields the best result [1].

4. Measuring performance

Table 3
Confusion matrix of a binary classification task.

                                             Actual legitimate (negative)   Actual fraudulent (positive)
                                             y = 0                          y = 1
Predicted as legitimate (negative), ŷ = 0    True negative (TN)             False negative (FN)
Predicted as fraudulent (positive), ŷ = 1    False positive (FP)            True positive (TP)

The aim of detecting transfer fraud is to identify transactions with a high probability of being fraudulent. From the perspective of machine learning, the task of predicting the fraudulent nature of transactions can be presented as a binary classification problem where observations (i.e. transactions, customers, etc.) belong either to class 0 or to class 1. We follow the convention that the fraudulent observations belong to class 1, whereas the legitimate observations correspond to class 0. We often speak of positive (class 1) and negative (class 0) observations.

Consider again our set D = {(x_i, y_i)}_{i=1}^N of N transactions. In general, a classification algorithm provides a continuous score s_i := s(x_i) ∈ [0, 1] for each transaction i. This score s_i is a function of the observed features x_i of transaction i and represents the fraud propensity of that transaction. Here we assume that legitimate transfers (class 0) have a lower score than fraudulent ones (class 1). The score s_i is then converted to a predicted class ŷ_i ∈ {0, 1} by comparing it with a classification threshold t ∈ [0, 1]. If a transfer's probability of being fraudulent as estimated by the classification model lies above this threshold value, then the transfer is predicted as fraud (s_i > t ⇒ ŷ_i = 1), and otherwise it is classified as legitimate (s_i ≤ t ⇒ ŷ_i = 0).

A classification exercise typically leads to a confusion matrix as shown in Table 3. Based on the confusion matrix, we can compute several performance measures such as Precision, Recall (also called True Positive Rate, Sensitivity or Hit Rate), False Positive Rate, and F1-measure. Each of these measures is calculated for a given confusion matrix that is based on a certain threshold value t ∈ [0, 1].

The receiver operating characteristic (ROC) curve, as shown on the left plot in Fig. 9, is probably the most popular method to analyze the effectiveness of a classifier. The ROC curve is obtained by plotting for each possible threshold value the false positive rate (FPR) on the X-axis and the true positive rate (TPR) on the Y-axis. As a graphical tool the ROC curve visualizes the tradeoff between achieving a high recall (TPR) while maintaining a low false positive rate (FPR), and is often used to find an appropriate decision threshold. Provost et al. [48] argue that ROC curves, as an alternative to accuracy estimation for comparing classifiers, would enable stronger and more general conclusions. For more information about ROC curves we refer to Krzanowski and Hand [37] and Swets [59].
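The thresholding rule s_i > t ⇒ ŷ_i = 1 and the measures derived from the counts in Table 3 can be sketched as follows; the labels and scores below are hypothetical.

```python
import numpy as np

def confusion_metrics(y, scores, t=0.5):
    """Precision, recall, F1 and FPR from the Table 3 counts at threshold t."""
    y = np.asarray(y)
    y_hat = (np.asarray(scores) > t).astype(int)   # s_i > t  =>  predicted fraud
    tp = np.sum((y == 1) & (y_hat == 1))
    fp = np.sum((y == 0) & (y_hat == 1))
    fn = np.sum((y == 1) & (y_hat == 0))
    tn = np.sum((y == 0) & (y_hat == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate / hit rate
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)               # false alarm rate
    return precision, recall, f1, fpr

# Hypothetical labels (1 = fraud) and classifier scores.
y = [0, 0, 0, 0, 1, 1, 0, 1]
s = [0.10, 0.20, 0.70, 0.30, 0.90, 0.40, 0.10, 0.80]
precision, recall, f1, fpr = confusion_metrics(y, s, t=0.5)
```

Sweeping t over [0, 1] and collecting (FPR, TPR) pairs traces out exactly the ROC curve described above.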


Fig. 9. (Left) example of a ROC curve. (Right) example of a Precision-Recall curve. Both curves are based on the same classifier validated on the same data set.
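Both curves in Fig. 9 are usually summarized by the area under them. A minimal sketch, assuming scikit-learn is available; average_precision_score is the customary estimate of the area under the Precision-Recall curve, and the data are hypothetical:

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# Hypothetical labels (1 = fraud, heavily outnumbered) and classifier scores.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
s = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.60, 0.35, 0.90, 0.40]

# AUC: probability that a randomly chosen fraud case receives a higher
# score than a randomly chosen legitimate case.
auc = roc_auc_score(y, s)

# Area under the Precision-Recall curve, more informative under imbalance.
auprc = average_precision_score(y, s)
```

With heavy class imbalance the AUPRC separates models more sharply than the AUC, for the reasons discussed in the text.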

Comparing classifiers based solely on their ROC curves can be challenging. Therefore, the ROC curve is often summarized in a single score, namely the Area Under the ROC Curve (AUC), which varies between 0 and 1 [22,23,39]. In the context of fraud detection, the AUC of a classifier can be interpreted as the probability that a randomly chosen fraud case is assigned a higher score than a randomly chosen legitimate case. Therefore, a higher AUC indicates superior classification performance. A perfect classifier would achieve an AUC of 1 while a random model (i.e. no prediction power) would yield an AUC of 0.5.

When dealing with highly imbalanced data, as is the case with fraud detection, the AUC (and ROC curves) may be too optimistic and the Area under the Precision-Recall Curve (AUPRC) gives a more informative picture of a classifier's performance [20,24,58]. As the name suggests, the Precision-Recall curve (right plot in Fig. 9) plots the precision (Y-axis) against the recall (X-axis) for each possible threshold. The AUPRC is therefore also a value between 0 and 1. Both ROC and PR curves use the recall, but the ROC curve also plots the FPR whereas PR curves focus on precision. In the denominator of the FPR, one sums the number of true negatives and false positives. In highly imbalanced data, the number of negatives (legitimate observations) is much larger than the number of positives (fraudulent observations) and hence the number of true negatives is typically very high compared to the number of false positives. Therefore, a large increase or decrease in the number of false positives will have almost no impact on the FPR in the ROC curves. Precision, on the other hand, compares the number of true positives to the number of false positives and hence copes better with the imbalance between positive and negative observations. Since precision is more sensitive to class imbalance, the area under the Precision-Recall curve (AUPRC) is better suited to highlight differences between models for highly imbalanced data sets.

Despite the many ways to evaluate a classification model's performance, we argue that the true business objective of a fraud detection system is to minimize the financial losses due to fraud. However, the performance measures mentioned so far do not incorporate any costs related to incorrect predictions, such as not detecting a fraudulent transaction. Therefore, they may not be the most appropriate evaluation criteria when evaluating fraud detection models. In fact, the previous performance measures tacitly assume that all misclassification errors carry the same cost, and similarly for the correctly classified transactions. This assumption clearly does not hold in practice because wrongly predicting a fraudulent transaction as legitimate carries a significantly different financial cost than the inverse case. To better align the assessment of data-driven fraud detection systems with the actual objective of decreasing losses due to fraud, we extend the confusion matrix in Table 3 by incorporating costs as proposed in [5]. Let C_i(ŷ|y) be the cost of predicting class ŷ for a transfer i when the true class is y. If ŷ = y then the prediction is correct, while if ŷ ≠ y the prediction is incorrect. In general, the costs can be different for each of the four cells in the confusion matrix and can even be instance-dependent, in other words, specific to each transaction i as indicated in Table 4.

Table 4
Cost matrix where, between square brackets, the related instance-dependent classification costs for transfer fraud are given.

                                               Actual legitimate (negative)      Actual fraudulent (positive)
                                               y_i = 0                           y_i = 1
Predicted as legitimate (negative), ŷ_i = 0    True negative [C_i(0|0) = 0]      False negative [C_i(0|1) = A_i]
Predicted as fraudulent (positive), ŷ_i = 1    False positive [C_i(1|0) = c_f]   True positive [C_i(1|1) = c_f]

Hand et al. [31] proposed a cost matrix where, in the case of a false positive (i.e. incorrectly predicting a transaction as fraudulent), the associated cost is the administrative cost C_i(1|0) = c_f. This fixed cost c_f has to do with investigating the transaction and contacting the card holder. When detecting a fraudulent transfer, the same cost C_i(1|1) = c_f is allocated to a true positive, because in this situation the card owner will still need to be contacted. In other words, the action undertaken by the company towards an individual transaction i comes at a fixed cost c_f ≥ 0, regardless of the nature of the transaction. However, in the case of a false negative, in which a fraudulent transfer is not detected, the cost is defined as the amount C_i(0|1) = A_i of the transaction i. The instance-dependent costs are summarized in Table 4. We argue that the cost matrix in Table 4 is a reasonable assumption. However, one could alter the cost matrix, for example, by using a variable cost for false positives that reflects the level of friction that the card holder experiences.

Using the instance-dependent cost matrix in Table 4, Bahnsen et al. [6] define the cost of using a classifier s(⋅) on the transactions in D as

Cost(s(D)) = Σ_{i=1}^N [ y_i ( ŷ_i C_i(1|1) + (1 - ŷ_i) C_i(0|1) ) + (1 - y_i) ( ŷ_i C_i(1|0) + (1 - ŷ_i) C_i(0|0) ) ]   (18)
           = Σ_{i=1}^N ( y_i (1 - ŷ_i) A_i + ŷ_i c_f ).

In other words, the total cost is the sum of the amounts of the undetected fraudulent transactions (y_i = 1, ŷ_i = 0) plus the administrative costs incurred. The total cost may not always be easy to interpret because there is no reference to which the cost is compared [62]. Therefore Bahnsen et al. [6] proposed the cost savings of a classification algorithm as the cost of using the algorithm compared to using no algorithm at all. The cost of using no algorithm is


Cost_l(D) = min{ Cost(s_0(D)), Cost(s_1(D)) }   (19)

where s_0 refers to a classifier that predicts all the transactions in D as belonging to class 0 (legitimate) and similarly s_1 refers to a classifier that predicts all the transfers in D as belonging to class 1 (fraud). The cost savings is then expressed as the cost improvement of using an algorithm as compared with Cost_l(D),

Savings(s(D)) = ( Cost_l(D) - Cost(s(D)) ) / Cost_l(D).   (20)

In the case of transaction fraud, the cost of not using an algorithm is equal to the sum of the amounts of the fraudulent transactions, Cost_l(D) = Σ_{i=1}^N y_i A_i. The savings are then calculated as

Savings(s(D)) = Σ_{i=1}^N ( y_i ŷ_i A_i - ŷ_i c_f ) / Σ_{i=1}^N y_i A_i.   (21)

In other words, the costs that can be saved by using an algorithm are the sum of the amounts of the detected fraudulent transactions minus the administrative cost incurred in detecting them, divided by the sum of the amounts of the fraudulent transactions.

Besides obtaining the best statistical accuracy or the highest cost savings, there are many other reasons why one model might be preferred above another, such as interpretability, operational efficiency and economical cost.

Interpretability refers to the intelligibility or readability of the analytical model. Models that enable the user to understand the underlying reasons why the model signals a case to be suspicious are called white-box models. Complex, incomprehensible mathematical models are often referred to as black-box models. It might well be, in a fraud detection setting, that black-box models are acceptable, although in most settings some level of understanding and in-fact validation, which is facilitated by interpretability, is required for the management to have confidence and allow the effective implementation of the model. In most situations, the aim of the fraud detection system is to select, out of millions of payments, the transactions that are most suspicious. These top, say 100, most suspicious transactions are then given to the fraud investigators for further examination. When using white-box models, it is straightforward to also give information about why a certain transaction is flagged as being suspicious. This of course facilitates the job of the fraud investigators, so that more suspicious transactions can be examined, for example, in one day. The need for interpretability on the operator side, which advocates for relatively simple models and methods, also has the advantage of simplifying, for the end-user (a bank), the implementation, maintainability and possibility to update/enrich the system over time.

Operational efficiency refers to the response time or the time that is required to evaluate the model, or in other words, the time required to evaluate whether a case is suspicious. It also entails the efforts needed to collect and preprocess the data, evaluate the model, monitor and back-test the model, and re-estimate it when necessary. Operational efficiency can be a key requirement, meaning that the fraud detection system might have only a limited amount of time available to reach a decision and let a transaction pass or not. In other words, huge volumes of data need to be processed in a short time span. For example, in a credit card fraud detection setting, the decision time must typically be less than eight seconds. Such a requirement clearly impacts the design of the operational IT systems, but also the design of the analytical model.

The economical cost refers to the total cost of ownership and return on investment of the analytical fraud model. Although the former can be approximated reasonably well, the latter is more difficult to determine. Fraud analytical models should also be in line and comply with all applicable regulation and legislation with respect to, for example, privacy or the use of cookies in a web browser.

5. Experimental assessment

In Section 5.1 we first describe the observed data set for the experiments. In Section 5.2 we present the experimental design and in Section 5.3 we show the results of the experiments.

5.1. Information about the real data set

We illustrate the proposed techniques on a data set that has been provided to our research group by a large European bank. The data set consists of fraudulent and legitimate transactions made with debit cards between September 2018 and July 2019. Note that the magnitude of the data set illustrated here is much smaller than data sets typically used in fraud prediction and its incidence of fraudulent transactions is also much higher. This is because a kind of white-listing (based on experience-driven business rules) was first applied to the data by the bank to filter out "definitely safe" transactions. The total data set contains 31,763 individual transactions, each with 14 attributes and a fraud label that indicates when a transaction is confirmed as fraudulent. This label was created internally in the bank by fraud investigators, and can be considered as highly accurate. Only 506 transactions in the data set were labeled as fraud, resulting in a fraud ratio of 1.6%.

The initial set of features includes information regarding individual transactions, such as amount, timestamp, payment channel and beneficiary country. Table 5 contains examples of such typical attributes that are available for transactions.

Table 5
Examples of typical features of transactions.

Feature name                    Description
Transaction ID                  Transaction identification number
Timestamp                       Date and time of the transaction
Originator's account number     Identification number of the originator's bank account
Beneficiary's account number    Identification number of the beneficiary's bank account
Beneficiary's name              Name of the beneficiary
Card number                     Identification of the debit card
Payment channel                 Electronic channel (e.g. online banking, mobile app, ...)
Authentication method           e.g. pin code, fingerprint, itsme, ...
Currency                        Original currency (e.g. Euros, USD, ...)
Amount                          Amount of the transaction in Euros
Originator country              Country from which the money is sent
Beneficiary country             Country to which the money is sent
Communication                   Message provided with the transfer
Gender                          Gender of the customer
Age                             Age of the customer
Country                         Customer's country of residence
Language                        Customer's preferred language

5.2. Experimental design

In order to test the performance of machine learning models that only use these 14 initial features, we split the data into a training and a testing set, containing 70% and 30% of the transactions, respectively, stratified according to the fraud label to obtain similar fraud distributions as observed in the original data set. Table 6 summarizes the different data sets.

Table 6
Summary of the data sets.

Set        Transactions    Frauds
Total      31,763          506
Training   22,234          354
Testing    9529            153

For the experiments we use the following popular classification


Table 7
Performance of logistic regression (LR), decision tree (DT) and gradient boosted trees (GBT) on the testing set using (top) the 14 original features, (middle) the RFM and other time-related features, and (bottom) the features based on anomaly detection techniques.
Original features

Precision Recall F1 FPR AUPRC Savings % of fraud amount detected

LR 0.6154 0.3810 0.4706 0.0025 0.4417 0.5117 0.5340


DT 1.0000 0.1905 0.3200 0.0000 0.3050 0.3191 0.3260
GBT 0.7778 0.3333 0.4667 0.0010 0.4632 0.5068 0.5223

Including RFM and other time-related features


Precision Recall F1 FPR AUPRC Savings % of fraud amount detected
LR 0.5625 0.4286 0.4865 0.0035 0.4680 0.5483 0.5757
DT 0.8000 0.3810 0.5161 0.0010 0.4836 0.6635 0.6807
GBT 0.6923 0.4286 0.5294 0.0020 0.6333 0.5979 0.6202

Including features based on anomaly detection techniques


Precision Recall F1 FPR AUPRC Savings % of fraud amount detected
LR 0.7647 0.6190 0.6842 0.0020 0.6975 0.6751 0.7042
DT 0.8125 0.6190 0.7027 0.0015 0.6370 0.6883 0.7158
GBT 0.8750 0.6667 0.7568 0.0010 0.7669 0.7908 0.8183

methods: logistic regression (LR), decision tree (DT), using the CART algorithm [12], and gradient boosted trees (GBT), using the XGBoost algorithm [17]. Logistic regression is often used in the industry because it is fast to compute and easy to understand and interpret. Moreover, logistic regression is often used as a benchmark model to which other classification algorithms are compared. Commonly used decision tree algorithms include CART [12] and C4.5 [49]. The tree-like structure of a decision tree makes it particularly easy to gain insight into its decision process. This is especially useful in a fraud detection setting to understand how fraud is committed and work out corresponding fraud prevention strategies. XGBoost is short for eXtreme Gradient Boosting [17]. It is an efficient and scalable implementation of the gradient boosting framework by Friedman et al. [27] and Friedman [26], but it uses a more regularized model formalization to control over-fitting, which gives it better performance. The name XGBoost refers to the engineering goal to push the limit of computational resources for boosted tree algorithms. The XGBoost algorithm is widely used by data scientists to achieve state-of-the-art results on many machine learning challenges and has been used by a series of competition winning solutions [17]. Note that recent model explaining techniques, such as SHapley Additive exPlanations (SHAP, Lundberg and Lee [42]) and Local Interpretable Model-agnostic Explanations (LIME, Ribeiro et al. [52]), make it possible to provide model interpretability for such black-box methods. These perturbation-based methods estimate the contribution of individual features towards a specific prediction.

The purpose of this paper is to illustrate the benefit of the proposed data engineering techniques for the performance of fraud detection models regardless of the chosen model structure. Therefore, all three classifiers (LR, DT and GBT) are trained on the training set using their default parameters as suggested by their respective authors. The performance of the three classifiers is evaluated on the testing set using Precision, Recall (i.e. hit rate), F1 measure, false positive rate (FPR, i.e. false alarm rate), Area Under the Precision-Recall Curve (AUPRC), Savings, and the fraction of fraudulent amounts that are detected. Hereby a decision threshold of t = 50% is used. For the calculation of the Savings measure, we choose a fixed cost of c_f = 5 Euros.

5.3. Results

Table 7 contains the performance of logistic regression (LR), decision tree (DT) and gradient boosted trees (GBT) on the testing set using the 14 original features (top). When we include RFM features and time features using the von Mises distribution, the performance of all three models improves significantly (middle of Table 7). In particular the Savings, F1 and AUPRC values of the three models have clearly increased. Their overall performance is further enhanced when we add the features that are based on the anomaly detection techniques (bottom of Table 7).
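The cost and savings measures of Eqs. (18), (20) and (21), with the fixed administrative cost c_f = 5 Euros used in the experiments, can be sketched as follows; the amounts, labels and predictions below are hypothetical.

```python
import numpy as np

C_F = 5.0  # fixed administrative cost in Euros, as in the experiments

def total_cost(y, y_hat, amounts, c_f=C_F):
    """Eq. (18): every missed fraud costs its amount A_i, every alert
    (true or false positive) costs the fixed administrative cost c_f."""
    y, y_hat, A = (np.asarray(v, dtype=float) for v in (y, y_hat, amounts))
    return float(np.sum(y * (1 - y_hat) * A + y_hat * c_f))

def savings(y, y_hat, amounts, c_f=C_F):
    """Eqs. (20)-(21): relative cost improvement over predicting everything
    as legitimate, whose cost is the sum of the fraudulent amounts."""
    y, A = np.asarray(y, dtype=float), np.asarray(amounts, dtype=float)
    cost_no_model = float(np.sum(y * A))
    return (cost_no_model - total_cost(y, y_hat, amounts, c_f)) / cost_no_model

# Hypothetical example: the 200 Euro fraud is caught, the 80 Euro fraud is
# missed, and the two alerts (one true, one false) each cost c_f = 5 Euros.
y     = [0, 1, 0, 1, 0]       # 1 = fraud
y_hat = [0, 1, 1, 0, 0]       # model decisions
A     = [50.0, 200.0, 120.0, 80.0, 10.0]
```

Unlike accuracy or AUC, these two quantities weight each error by the money actually at stake, which is the business objective argued for above.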

Table 8
Performance of logistic regression (top), decision tree (middle) and gradient boosted trees (bottom) on the testing set using different over-sampling methods: SMOTE,
ADASYN, MWMOTE and ROSE.
Logistic regression (LR)

Precision Recall F1 FPR AUPRC Savings % of fraud amount detected

Original 0.7647 0.6190 0.6842 0.0020 0.6975 0.6751 0.7042


SMOTE 0.4103 0.7619 0.5333 0.0116 0.6408 0.7647 0.8316
ADASYN 0.4167 0.7143 0.5263 0.0106 0.6924 0.6674 0.7291
MWMOTE 0.4706 0.7619 0.5818 0.0091 0.6388 0.7733 0.8316
ROSE 0.4324 0.7619 0.5517 0.0106 0.6692 0.7681 0.8316

Decision tree (DT)


Precision Recall F1 FPR AUPRC Savings % of fraud amount detected
Original 0.8125 0.6190 0.7027 0.0015 0.6370 0.6883 0.7158
SMOTE 0.5000 0.7619 0.6038 0.0081 0.5118 0.7712 0.8261
ADASYN 0.5667 0.8095 0.6667 0.0066 0.3716 0.7987 0.8501
MWMOTE 0.4545 0.7143 0.5556 0.0091 0.4001 0.6739 0.7305
ROSE 0.6190 0.6190 0.6190 0.0040 0.6565 0.6866 0.7226

Gradient boosted trees (GBT)


Precision Recall F1 FPR AUPRC Savings % of fraud amount detected
Original 0.8750 0.6667 0.7568 0.0010 0.7669 0.7908 0.8183
SMOTE 0.6842 0.6190 0.6500 0.0030 0.7146 0.5941 0.6266
ADASYN 0.8462 0.5238 0.6471 0.0010 0.7763 0.5962 0.6184
MWMOTE 0.7500 0.5714 0.6486 0.0020 0.6931 0.5975 0.6249
ROSE 0.6667 0.0952 0.1667 0.0005 0.4341 0.0430 0.0482
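The over-sampling evaluated in Table 8 follows the two SMOTE steps described in Section 3. Below is a minimal hand-rolled sketch that rebalances toward the 90%/10% ratio used above; the data are hypothetical, and production code would rather rely on a dedicated library such as imbalanced-learn.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Hand-rolled SMOTE following the two steps of Section 3.

    Step 1: for each minority sample, find its k nearest minority neighbors.
    Step 2: repeatedly pick a sample and one of its neighbors, and add the
    sample plus a random fraction (in [0, 1)) of the difference between them.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    X_min = np.asarray(X_min, dtype=float)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]    # Step 1
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))               # a minority sample
        j = rng.choice(neighbors[i])               # one of its k neighbors
        gap = rng.random()
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))  # Step 2
    return np.array(synthetic)

# Hypothetical training set with 360 legitimate and 4 fraud cases; to reach
# the 90%/10% ratio we need 360/9 = 40 frauds, i.e. 36 synthetic ones.
rng = np.random.default_rng(42)
X_fraud = rng.normal(loc=3.0, size=(4, 2))
X_synthetic = smote(X_fraud, n_new=360 // 9 - 4, k=3, rng=rng)
```

Because every synthetic point is an interpolation between two real fraud cases, it lies inside the region spanned by the minority class, which is what distinguishes SMOTE from simple replication.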


Using the original features, the three models are only able to detect [4] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, J. Vanthienen,
Benchmarking state-of-the-art classification algorithms for credit scoring, J. Oper.
around 50% of the fraudulent amounts. By including the features that
Res. Soc. 54 (2003) 627–635.
are created by the various feature engineering methods, the improved [5] B. Baesens, S. Höppner, W. Verbeke, T. Verdonck, Instance-dependent cost-
models can block more than 70% of the stolen money and thus saving sensitive learning for detecting transfer fraud, arXiv (2020) preprint arXiv:
more than 67% of the costs compared to not using any fraud detection 2005.02488.
[6] A.C. Bahnsen, D. Aouada, A. Stojanovic, B. Ottersten, Feature engineering
system. strategies for credit card fraud detection, Expert Syst. Appl. 51 (2016) 134–142.
While the data set is now extended with new features, the imbalance between the fraudulent and legitimate transactions remains. To address this issue we apply the following over-sampling methods to the extended training set: SMOTE, ADASYN, MWMOTE and ROSE, each with the default parameters suggested by their respective authors. We apply these over-sampling techniques such that the new, re-balanced training set contains a ratio of 90% legitimate cases versus 10% fraud cases. In Table 8 we present the results for all three classifiers with each of the over-sampling methods. Notice how the performance varies depending on the chosen over-sampling method. The Savings value of the logistic regression model improves most with MWMOTE, as well as with SMOTE and ROSE. The Savings value of the decision tree, however, only increases with ADASYN and SMOTE. While logistic regression and the decision tree may benefit from over-sampling, the overall performance of the gradient boosted trees decreases. This may be because the boosting algorithm over-fits the classifier on the over-sampled training set, resulting in worse performance on the test set. Depending on the chosen classification method, there is definitely potential in over-sampling the training set with synthetic fraud cases, although no single over-sampling technique always yields the best result.
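To sketch how such synthetic fraud cases can be generated, the snippet below implements the core idea of SMOTE: each synthetic sample is drawn on the line segment between a minority (fraud) case and one of its k nearest minority-class neighbours. This is a minimal NumPy illustration with a hypothetical helper name, not the implementation used in our experiments.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples in SMOTE fashion:
    interpolate between a random minority sample and one of its
    k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours per sample
    base = rng.integers(0, len(X_min), n_new)  # minority samples to expand
    nbr = nn[base, rng.integers(0, k, n_new)]  # one random neighbour per seed
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])
```

To reach the 90/10 ratio used above, one would generate roughly n_legit/9 − n_fraud synthetic fraud cases and append them to the training set; in practice, packages such as smote-variants implement SMOTE, ADASYN, MWMOTE and many related techniques with their authors' default parameters.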
6. Conclusions and future research

In this paper, we extensively researched data engineering in a fraud detection setting. More specifically, we decomposed data engineering into feature engineering and instance engineering. Our motivation for doing so is that, based upon past extensive research, it is our firm belief that the best way to boost the performance of any analytical technique is to smartly engineer the data, instead of overly focusing on the development of new, often highly complex, analytical techniques that are frequently only poorly benchmarked and offer no interpretability at all. We used a payment transactions data set from a large European bank to illustrate the substantial impact of data engineering on the performance of a fraud detection model. We empirically showed that both the feature engineering and instance engineering steps significantly improved the performance of popular analytical models. Moreover, we have illustrated that, with clever engineering of the data, simple analytical techniques such as logistic regression and classification trees yield very good results. Although the focus in this paper is on payment transactions fraud, the discussed techniques are also useful for, or could be extended to, other types of fraud, e.g. in healthcare, insurance or e-commerce.
Acknowledgements

The authors gratefully acknowledge the financial support from the BNP Paribas Fortis Research Chair in Fraud Analytics at KU Leuven and the Internal Funds KU Leuven under grant C16/15/068.


Bart Baesens, Faculty of Economics and Business, KU Leuven, Naamsestraat 69, B-3000 Leuven, Belgium. www.dataminingapps.com. Southampton Business School, University of Southampton, 12 University Road, Highfield, Southampton SO17 1BJ, United Kingdom. Research interests: data mining and analytics, credit scoring, fraud detection, marketing analytics.

Sebastiaan Höppner, Faculty of Science, Department of Mathematics, KU Leuven, Celestijnenlaan 200B, B-3001 Leuven, Belgium. https://www.kuleuven.be/wieiswie/nl/person/00111217. Research interests: robust statistics, fraud detection, high-dimensional data analysis.

Tim Verdonck, Faculty of Science, Department of Mathematics, UAntwerp, Middelheimlaan 1, B-2020 Antwerp, Belgium. https://www.uantwerpen.be/nl/personeel/tim-verdonck/. Faculty of Science, Department of Mathematics, KU Leuven, Celestijnenlaan 200B, B-3001 Leuven, Belgium. https://www.kuleuven.be/wieiswie/nl/person/00071962. Research interests: statistical data science, anomaly and fraud detection, actuarial science.
