Aiml
Agenda
• Problem domain
Transactional fraud detection systems are designed to identify and prevent fraudulent activities occurring
within financial transactions. The problem domain encompasses various aspects of detecting, preventing, and
mitigating fraudulent behavior across different types of transactions, such as credit card transactions, online
payments, wire transfers, and more.
Identification of Problem and Detailed Analysis
• Problem Statement
To design and train machine learning models to detect fraudulent credit card transactions as well as fraudulent
bank account transfers.
• Detailed explanation of the purpose and uniqueness of the project
The purpose of this project is to develop robust machine learning models capable of accurately detecting fraudulent
credit card transactions and bank account transfers. By covering both credit card transactions and bank account
transfers in one project, with a dedicated model for each, we aim to provide a holistic solution that can
identify fraudulent activities across multiple channels and strengthen the overall security posture of the
financial ecosystem. The uniqueness of this project lies in its use of diverse data sources and advanced
machine learning techniques to capture the nuanced patterns of fraudulent behavior specific to credit card
transactions and bank transfers.
Literature Review
Sl. No: 2
Title of the work: A Model Based on Convolutional Neural Network for Online Transaction Fraud Detection
Author/s and publication details: Xinxin Zhou, Xiabo Zhang; Hindawi Journals, 2020
Objectives of the work carried out: The CNN model based on feature rearrangement constructed in this paper has excellent experimental performance with good stability. The model needs neither high-dimensional input features nor derivative variables.
Results obtained: Obtained an accuracy of 98.5 after experimenting with different feature sequences.
Gaps identified: For different sequences of features, the model has different effects. More attention is needed to the discovery of sequence characteristics of transactions.
Sl. No: 3
Title of the work: Transaction Fraud Detection Based on Total Order Relation and Behavior Diversity
Author/s and publication details: Lutao Zheng, Guanjun Liu; IEEE, 2019
Objectives of the work carried out: Proposes a method to extract users' behavior profiles (BPs) from their transaction records, which is used to detect transaction fraud in online shopping scenarios. OM overcomes the disadvantage of Markov process models since it characterizes the diversity of user behaviors.
Results obtained: Achieved a training accuracy of 92.6 after training TH-LSTM.
Gaps identified: The model struggles if the data is highly varied in terms of the types of transaction as well as the format of the transaction.

Sl. No: 4
Title of the work: Financial Fraud Detection With Anomaly Feature Detection
Author/s and publication details: Dongxu Huang, Dejun Mu; IEEE, 2018
Objectives of the work carried out: CoDetect can perform fraud detection on a graph-based similarity matrix and a feature matrix simultaneously. It introduces a new way to reveal the nature of financial activities from fraud patterns.
Results obtained: Obtained an accuracy of almost 97 on the outliers using subspace clustering.
Gaps identified: For different sequences of features, the model has different effects. More attention is needed to the discovery of sequence characteristics of transactions.
Objectives
• To train a deep learning model to detect fraudulent transactions made in the case of bank transfers or other
modes of online payments.
• To train another deep learning model to detect fraudulent transactions made in the case of credit card
payments.
• To procure appropriate datasets for training the models. The datasets have been obtained from Kaggle, and a
Convolutional Neural Network model has been selected for our purposes.
Methodology
This research investigates the potential of applying Convolutional Neural Networks (CNNs) in conjunction with an
oversampling technique to improve the accuracy and effectiveness of transactional fraud detection.
Data Acquisition: We will acquire a comprehensive dataset of labeled transaction records, encompassing both
legitimate and fraudulent activities. This data may be obtained from financial institutions, public datasets, or
simulated scenarios.
Data Preprocessing: The raw data will undergo cleaning and preprocessing steps to ensure its consistency and
quality. This may involve handling missing values, formatting inconsistencies, and feature engineering relevant to
the CNN architecture.
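As a rough illustration of the preprocessing step, the sketch below fills missing values with the column median and rescales a numeric column to the range [0, 1]. The use of `None` for missing values, the sample amounts, and the choice of median imputation and min-max scaling are assumptions made for this example; the slides do not state which cleaning steps were applied.

```python
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical transaction amounts with one missing entry.
amounts = [120.0, None, 80.0, 200.0]
clean = min_max_scale(impute_median(amounts))
```

Scaling parameters (here the minimum and maximum) would normally be computed on the training split only and then reused for validation and test data.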
Imbalanced Class Problem: Transactional fraud data often exhibits class imbalance, where the number of fraudulent
transactions is significantly smaller than the number of legitimate ones. This imbalance can hinder the learning
process of CNNs, because a model can reach high accuracy while rarely predicting the fraud class.
Oversampling Technique:
SMOTE (Synthetic Minority Oversampling Technique): This method creates synthetic data points by interpolating
existing data points in the minority class, specifically focusing on features suitable for CNNs (e.g., numerical
transaction amounts, timestamps).
Identify k-Nearest Neighbors: For a selected minority-class instance, SMOTE identifies its k nearest neighbors
within the minority class based on a chosen distance metric (e.g., Euclidean distance).
Randomly Select a Neighbor: A neighbor from the k nearest neighbors is randomly chosen.
Generate Synthetic Data Point: The algorithm creates a new synthetic data point by linearly interpolating between
the selected minority class instance and its chosen neighbor.
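The SMOTE steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not the implementation used in the project: the function name `smote_sample`, the sample points, and the value of `k` are hypothetical. Libraries such as imbalanced-learn provide a tested SMOTE implementation.

```python
import math
import random

def smote_sample(minority, k=5, rng=random):
    """Generate one synthetic point from a list of minority-class feature vectors."""
    # Step 1: pick a minority-class instance at random.
    base = rng.choice(minority)
    # Step 2: find its k nearest minority neighbours (Euclidean distance),
    # excluding the instance itself.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbours = others[:k]
    # Step 3: choose one of those neighbours at random.
    neighbour = rng.choice(neighbours)
    # Step 4: interpolate linearly between the instance and the neighbour.
    gap = rng.random()  # uniform in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]

# Hypothetical minority-class (fraud) samples with two numeric features.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
rng = random.Random(0)
synthetic = [smote_sample(fraud, k=2, rng=rng) for _ in range(3)]
```

Because each synthetic point lies on the line segment between two existing fraud samples, it stays inside the region already occupied by the minority class. Oversampling should be applied to the training split only, so that no synthetic points leak into the evaluation data.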
Dense (16 units, ReLU activation): The first layer takes an input with six features and transforms it into a hidden
layer with 16 neurons. The ReLU (Rectified Linear Unit) activation function introduces non-linearity, allowing the
model to learn complex relationships between features.
Dense (24 units, ReLU activation): The second layer further processes the information from the first layer,
increasing the complexity by introducing 24 neurons with ReLU activation.
Dropout (0.5): This layer randomly drops 50% of the neurons during training, which helps reduce overfitting and
improve the model's generalization to unseen data.
Dense (20 units, ReLU activation): This layer processes the output of the dropout layer with 20 neurons and ReLU
activation.
Dense (24 units, ReLU activation): Another layer with 24 neurons and ReLU activation further refines the learned
features.
Dense (1 unit, sigmoid activation): The final layer has only one neuron and uses the sigmoid activation function,
which outputs a value between 0 and 1, suitable for binary classification tasks.
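To make the layer sizes concrete, the following plain-Python sketch mirrors the stack described above (6 inputs, then 16, 24, 20, 24 and 1 units), counts its trainable parameters, and runs one inference-time forward pass. The random weights, the input vector, and the helper names are illustrative assumptions; a real model would be built and trained with a deep learning framework.

```python
import math
import random

# (inputs, units, activation) for each Dense layer described above.
# The Dropout layer sits between the second and third Dense layers and
# has no weights of its own.
LAYERS = [(6, 16, "relu"), (16, 24, "relu"), (24, 20, "relu"),
          (20, 24, "relu"), (24, 1, "sigmoid")]

def count_parameters(layers):
    """Each Dense layer has inputs * units weights plus one bias per unit."""
    return sum(n_in * n_out + n_out for n_in, n_out, _ in layers)

def init_weights(layers, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out) for n_in, n_out, _ in layers]

def forward(x, layers, weights):
    """Inference-time forward pass; dropout is inactive outside training."""
    for (_, _, act), (w, b) in zip(layers, weights):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if act == "relu":
            x = [max(0.0, v) for v in z]
        else:  # sigmoid
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return x[0]

rng = random.Random(42)
weights = init_weights(LAYERS, rng)
score = forward([0.1, 0.5, 0.0, 1.0, 0.3, 0.7], LAYERS, weights)
total = count_parameters(LAYERS)  # 112 + 408 + 500 + 504 + 25 = 1549
```

With roughly 1,500 parameters the network is small, which is one reason dropout and oversampling matter more here than raw model capacity.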
The model is compiled with the Adam optimizer, which efficiently updates the model weights during training. The
loss function is set to binary cross-entropy, appropriate for binary classification problems. Additionally, the model
tracks accuracy as a metric to evaluate its performance.
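The loss and metric named above can be written out directly. The sketch below computes mean binary cross-entropy and thresholded accuracy for a handful of made-up predictions; the labels, the probabilities, the 0.5 threshold, and the clipping constant `eps` are assumptions chosen for illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

# Hypothetical labels (1 = fraud) and sigmoid outputs.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.8, 0.3]
loss = binary_cross_entropy(labels, probs)
acc = accuracy(labels, probs)
```

In this toy case the last transaction is a fraud scored at 0.3, so it counts as a miss for accuracy and contributes the largest share of the loss.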
Design
Experimental Results & Analysis
1) PaySim Dataset:
PaySim is a synthetic dataset made by the Norwegian University of Science and Technology. It contains
6,362,620 records, of which 8,213 are fraudulent transactions. The dataset includes details such as account
number, transaction amount, account balance of both parties, and a class label (0: not a fraud, 1: fraud).
2) Sparkov Dataset:
This dataset was generated using the Sparkov simulation, which was run for the period 1 Jan 2019 to 31 Dec 2020.
It contains 1,296,675 records, of which 7,506 are fraudulent. The dataset includes details such as
credit card number, amount, transaction time, location of both parties, and a class label (0: not a fraud, 1: fraud).
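The record counts quoted for the two datasets show how severe the class imbalance is. The short sketch below computes the fraud rate for each dataset and the accuracy of a trivial classifier that labels every record as not a fraud. The dictionary keys and function names are illustrative; only the counts come from the text.

```python
# Record counts quoted above for the two datasets.
datasets = {
    "PaySim": {"total": 6_362_620, "fraud": 8_213},
    "Sparkov": {"total": 1_296_675, "fraud": 7_506},
}

def fraud_rate(total, fraud):
    """Share of records labelled as fraud."""
    return fraud / total

def majority_baseline_accuracy(total, fraud):
    """Accuracy of a classifier that labels every record as not a fraud."""
    return (total - fraud) / total

rates = {name: fraud_rate(**d) for name, d in datasets.items()}
baselines = {name: majority_baseline_accuracy(**d) for name, d in datasets.items()}

for name in datasets:
    print(f"{name}: fraud rate {rates[name]:.3%}, "
          f"all-legitimate baseline accuracy {baselines[name]:.3%}")
```

Both baselines exceed 99% accuracy without detecting a single fraud, which is why accuracy should be read alongside metrics such as precision and recall on the fraud class.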
Demonstration of the project
Output of Model 1 and Model 2: values closer to 0 indicate not a fraud, and values closer to 1 indicate fraud.
Conclusion & Future Work
Leveraging oversampling techniques and CNNs holds promise for enhancing transactional fraud detection. While
existing research shows positive results, further exploration is necessary to identify the most effective combinations
and assess their generalizability in real-world scenarios. By addressing class imbalance and utilizing the feature
learning capabilities of CNNs, this approach can contribute to developing more robust and accurate fraud detection
systems.
• Explore the impact of different oversampling techniques (SMOTE, ADASYN) on CNN performance for fraud
detection.
• Investigate the effectiveness of combining oversampling with advanced CNN architectures (e.g., residual
networks).
• Analyze the generalizability of these approaches on real-world imbalanced fraud datasets from financial
institutions.