Fighting Money Laundering With Statistics and Machine Learning

The document discusses existing methods for detecting money laundering using statistics and machine learning. It proposes a unified terminology for anti-money laundering in banks and reviews selected exemplary methods, discussing advantages of reducing unsupervised and eliminating supervised client risk profiling. Finally, it outlines hardware, software, and other system requirements.

ABSTRACT

Money laundering is a profound global problem. Nonetheless, there is little scientific
literature on statistical and machine learning methods for anti-money laundering. In
this paper, we focus on anti-money laundering in banks and provide an introduction
and review of the literature. We propose a unifying terminology with two central
elements: (i) client risk profiling and (ii) suspicious behavior flagging. We find that
client risk profiling is characterized by diagnostics, i.e., efforts to find and explain
risk factors. On the other hand, suspicious behavior flagging is characterized by
non-disclosed features and hand-crafted risk indices. Finally, we discuss directions for
future research. One major challenge is the need for more public data sets. This may
be addressed by synthetic data generation. Other possible research directions include
semi-supervised and deep learning, interpretability, and fairness of the results.

EXISTING SYSTEM

Badal-Valero et al. [37] combine Benford’s Law and four machine learning models.
Benford’s Law [38] gives an empirical distribution of leading digits. The authors use
it to extract features from financial statements. Specifically, they consider statements
from 335 suppliers to a company on trial for money laundering. Of these, 23
suppliers have been investigated and labeled as colluders. All other (non-
investigated) suppliers are treated as benevolent. The motivating idea is that any
colluders, hiding in the non-investigated group, should be misclassified by the
employed models. These include a logistic regression, feedforward neural network,
decision tree, and random forest. Random forests [39], in particular, combine
multiple decision trees. Every tree uses a random subset of features in every node
split. To address class imbalance, i.e., the unequal distribution of labels, the authors
investigate weighting and synthetic minority oversampling [40]. The former weighs
observations during training, giving higher importance to data from the minority
class. The latter balances the data before training, generating synthetic observations
of the minority class. According to the authors, synthetic minority oversampling
works best. However, this conclusion appears to rest on simulated evaluation
data.
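
As a rough illustration of the two ideas above, the sketch below computes a Benford-style feature and a set of class weights. It is not the authors' implementation: the function names are hypothetical, the deviation measure (mean absolute difference between observed and expected digit shares) is only one possible feature, and the weighting rule n / (k * n_c) is a common heuristic assumed here for illustration.

```python
import math
from collections import Counter

# Benford's Law: the expected share of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(amount):
    """Return the first significant digit of a non-zero amount."""
    # Scientific notation puts the first significant digit in position 0.
    return int(f"{abs(amount):.15e}"[0])


def benford_deviation(amounts):
    """Mean absolute gap between observed and Benford leading-digit shares.

    Hypothetical feature: values near 0 mean the amounts look Benford-like.
    """
    digits = [leading_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9


def balanced_class_weights(labels):
    """Weight each class by n / (k * n_c) so rare classes count more.

    Assumed heuristic, not necessarily the weighting used in the study.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


if __name__ == "__main__":
    # Class sizes taken from the text: 23 labeled colluders among 335 suppliers.
    labels = [1] * 23 + [0] * (335 - 23)
    print(balanced_class_weights(labels))
    print(benford_deviation([2 ** k for k in range(1, 101)]))
```

With these weights, both classes contribute the same total weight during training. Oversampling pursues the same goal by adding synthetic minority observations instead of reweighting existing ones.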

González and Velásquez [41] employ a decision tree, feedforward neural network,
and Bayesian network to model Chilean firms using false invoices. Bayesian
networks [42], in particular, are probabilistic models that represent variable
dependencies via directed acyclic graphs. The authors use data on 582,161 firms,
1,692 of which have been labeled as either fraudulent or non-fraudulent. Features
include information about previous audits and taxes paid. Because most firms are
unlabeled, the authors first use unsupervised learning to characterize high-risk
behavior. To this end, they employ self-organizing maps [43] and neural gas [44].
Both are neural network techniques that build on competitive learning [45] rather
than error correction (i.e., gradient-based optimization). While the methods do
produce clusters with some behavioral patterns, they do not appear useful for false
invoice detection. On the labeled training data, the feedforward neural network
achieves the best performance.
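
To make the contrast between competitive learning and error correction concrete, here is a minimal winner-take-all sketch. It is an assumption-laden toy, not the self-organizing map or neural gas used in the study: the function name, the learning rate, and the explicit `init_units` argument are all hypothetical. Self-organizing maps and neural gas extend this basic rule by also moving units other than the winner.

```python
import random


def competitive_learning(points, init_units, epochs=20, lr=0.1, seed=0):
    """Winner-take-all prototype learning (toy sketch).

    For each sample, units compete on distance and only the closest unit
    moves toward the sample. No loss gradient is propagated.
    """
    rng = random.Random(seed)
    units = [list(u) for u in init_units]
    for _ in range(epochs):
        # Visit the samples in a fresh random order on every pass.
        for p in rng.sample(points, len(points)):
            # Competition: pick the unit with the smallest squared distance.
            winner = min(
                units,
                key=lambda u: sum((a - b) ** 2 for a, b in zip(u, p)),
            )
            # Update: move only the winner a fraction lr toward the sample.
            for i, x in enumerate(p):
                winner[i] += lr * (x - winner[i])
    return units


if __name__ == "__main__":
    cluster_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
    cluster_b = [(10.0, 10.0), (10.2, 9.9), (9.9, 10.1), (10.1, 10.3)]
    units = competitive_learning(
        cluster_a + cluster_b, init_units=[(0.0, 0.0), (10.0, 10.0)]
    )
    print(units)
```

Each unit ends up as a prototype for the samples it wins, which is why such methods yield clusters rather than class predictions.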

Camino et al. [58] flag clients with three outlier detection techniques: an isolation
forest, a one-class support vector machine, and a Gaussian mixture model. Isolation
forests [59] build multiple decision trees using random feature splits. Observations
isolated by comparatively few feature splits (averaged over all trees) are then
considered outliers. One-class support vector machines [60] use a kernel function to
map data into a reproducing kernel Hilbert space. The method then seeks a
maximum-margin hyperplane that separates the data points from the origin. A small
number of observations are allowed to fall on the wrong side of the hyperplane; these
are considered outliers.
Finally, Gaussian mixture models [61] assume that all observations are generated by
a number of Gaussian distributions. Observations in low-density regions are then
considered outliers. The authors combine all three techniques into a single ensemble
method. The method is tested on a data set from an AML software company. This
contains one million transactions with client-level features recording summary
statistics. The authors report positive feedback from the data-supplying company;
otherwise, evaluation is limited.
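
The isolation idea can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the authors' ensemble: it covers only the isolation component, uses the full data set for every tree instead of subsamples, omits the usual path-length normalization, and draws fresh random splits for each point. All names are hypothetical.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=30):
    """Count the random splits needed until `point` is alone in its partition."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary within this partition can be split.
    dims = [
        d for d in range(len(point))
        if min(r[d] for r in data) < max(r[d] for r in data)
    ]
    if not dims:
        return depth
    d = rng.choice(dims)
    split = rng.uniform(min(r[d] for r in data), max(r[d] for r in data))
    # Keep only the rows that land on the same side of the split as the point.
    same_side = [r for r in data if (r[d] < split) == (point[d] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_isolation_depths(data, n_trees=200, seed=0):
    """Average isolation depth per observation; small values suggest outliers."""
    rng = random.Random(seed)
    return [
        sum(isolation_depth(p, data, rng) for _ in range(n_trees)) / n_trees
        for p in data
    ]


if __name__ == "__main__":
    # A tight 7 x 7 grid of "normal" clients plus one distant observation.
    clients = [(x / 10, y / 10) for x in range(7) for y in range(7)]
    clients.append((5.0, 5.0))
    depths = mean_isolation_depths(clients)
    print(depths.index(min(depths)), min(depths))
```

An ensemble in the spirit of the paragraph above would combine such a score with scores from the other two detectors. How the authors combine them is not stated here.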

Sun et al. [62] apply extreme value theory [63] to flag outliers in transaction streams.
The authors start by engineering two features. The first records the number of times
an account has reached a balanced state, i.e., when money transferred into an account
is transferred out again. The second records the number of effective fan-ins
associated with an account, i.e., when money transferred into the account surpasses a
given limit and the account again reaches a balanced state. Next, the Pickands–
Balkema–De Haan theorem [64], [65] is invoked to model (derived) conditional
feature exceedances according to a generalized Pareto distribution. The approach
allows the authors to flag transactions according to a probabilistic limit p (in analogy
to the p-values used to test null hypotheses). The approach is tested on real bank data
with simulated noise and outliers.
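
A peaks-over-threshold calculation of this kind can be sketched as follows. The sketch makes several assumptions that are not in the text: the generalized Pareto parameters are estimated by the method of moments (which requires a shape below 1/2), the threshold u is chosen by the caller, and all function names are hypothetical. The estimation procedure actually used by the authors is not described here.

```python
import math
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (xi, sigma) for excesses over a threshold.

    Uses mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 (1 - 2 xi)).
    """
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def tail_probability(x, u, xi, sigma, exceed_rate):
    """Approximate P(X > x) for x >= u, where exceed_rate estimates P(X > u)."""
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        return exceed_rate * math.exp(-z)  # exponential limit as xi -> 0
    base = 1.0 + xi * z
    if base <= 0.0:
        return 0.0  # beyond the finite upper endpoint when xi < 0
    return exceed_rate * base ** (-1.0 / xi)


def flag_level(p, u, xi, sigma, exceed_rate):
    """Feature value exceeded with probability p (needs 0 < p <= exceed_rate)."""
    r = p / exceed_rate
    if abs(xi) < 1e-12:
        return u - sigma * math.log(r)
    return u + sigma / xi * (r ** (-xi) - 1.0)


if __name__ == "__main__":
    # Deterministic stand-in for a feature stream: exponential quantiles.
    n = 1000
    values = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
    u = sorted(values)[900]
    excesses = [x - u for x in values if x > u]
    xi, sigma = fit_gpd_moments(excesses)
    rate = len(excesses) / n
    print(flag_level(0.001, u, xi, sigma, rate))
```

Observations whose feature value exceeds `flag_level(p, ...)` would be flagged, so p controls the expected share of flagged observations in the same way the probabilistic limit does in the paragraph above.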

Disadvantages

• We find that studies on client risk profiling are characterized by diagnostics,
  i.e., efforts to find and explain risk factors. Specifically, unsupervised methods
  are used to search for new "risky" observations or risk factors. On the other
  hand, supervised methods are used with an explanatory focus.
• We also find that studies employing unsupervised methods generally use
  relatively large data sets. By contrast, studies employing supervised methods
  use small (labeled) data sets.

Proposed System

In this paper, we focus on anti-money laundering (AML) in banks and aim to provide a
technical review that researchers and industry practitioners (statisticians and
machine learning engineers) can use as a guide to the current literature on
statistical and machine learning methods for AML in banks. Furthermore, we aim to
provide a terminology that can facilitate policy discussions, and to provide guidance
on open challenges within the literature. To achieve our aims, we (i) propose a
unified terminology for AML in banks, (ii) review selected exemplary methods, and
(iii) present recent machine learning concepts that may improve AML.

Advantages

• The proposed system reduces the unsupervised client risk profiling problem.
• The proposed system eliminates the supervised client risk profiling problem.

SYSTEM REQUIREMENTS

H/W System Configuration:

➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

SOFTWARE REQUIREMENTS:

• Operating System : Windows 7 Ultimate
• Coding Language : Python
• Front-End : Python
• Back-End : Django ORM
• Designing : HTML, CSS, JavaScript
• Database : MySQL (WAMP Server)
