
Credit Card Fraud Detection Proposal Redone

The document proposes using machine learning algorithms to detect credit card fraud. It discusses the problem of credit card fraud and reviews past approaches. The objectives are to identify fraudulent transactions, review fraud detection techniques, and propose models to accurately detect fraud. Logistic regression, decision trees, random forests and neural networks will be applied and evaluated on a credit card transactions dataset.

Uploaded by

adane

Mekelle University, Ethiopian Institute of Technology

Faculty of Electrical and Computer Engineering


Course title: Independent Project In Computer Engineering
Program: M.Sc in Computer Engineering
Year: II, Semester: I, 2018

Proposal
Proposal title: “Credit Card Fraud Detection Using Machine Learning Algorithms”

Name: Adane Gebretsadik


ID: EITM/PR135773/10

Submitted to: [email protected]

Introduction and motivation


Credit card fraud is a form of theft in which a credit card, or the card's information, is used without authorization as a fraudulent source of funds for payments in an electronic payment system. The purpose of credit card fraud is to obtain money or make payments without the owner's permission. Such illegal use of a card or its information is a criminal deception and is prohibited by law. Advances in technology and software allow users to hide their identity and location when making transactions over the web, which has increased online fraud. To thwart fraudsters, financial institutions must use current, advanced, customized predictive analytics to protect themselves.

Several factors make card fraud research worthwhile. The most obvious advantage of having a proper fraud detection system in place is that it restricts and controls the potential monetary loss due to fraudulent activity. Card issuers suffer huge financial losses from card fraud every year; consequently, large sums of money can be saved if effective fraud detection techniques are applied.

The dataset for this project is a CSV file of transactions from a specific European bank. It contains 31 features, the last of which is the label that classifies each transaction as fraudulent or not. All input variables are numerical and are the result of a PCA transformation, except for the Time and Amount features, which are not transformed.
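As a rough illustration of the dataset layout described above, the sketch below reads such a CSV with Python's standard `csv` module, separates the input features from the label, and reports the class ratio. The label column name `Class` follows the public Kaggle file and is an assumption here, as are the helper names `load_transactions` and `class_ratio`.

```python
import csv
from collections import Counter

# Assumption: the label column is named "Class" (as in the public Kaggle file);
# the proposal itself only says it is the last of the 31 features.
LABEL_COLUMN = "Class"


def load_transactions(path):
    """Read the CSV and return (features, labels) as plain Python lists.

    features: one list of floats per transaction (all non-label columns, in file order)
    labels:   one int per transaction, 1 = fraud, 0 = legitimate
    """
    features, labels = [], []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        input_columns = [c for c in reader.fieldnames if c != LABEL_COLUMN]
        for row in reader:
            features.append([float(row[c]) for c in input_columns])
            labels.append(int(float(row[LABEL_COLUMN])))
    return features, labels


def class_ratio(labels):
    """Return the share of legitimate (0) and fraudulent (1) transactions."""
    counts = Counter(labels)
    total = len(labels)
    return counts[0] / total, counts[1] / total
```

On a highly imbalanced file, the second value returned by `class_ratio` is expected to be very small, which is what motivates the choice of metrics discussed below.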

Problem statement
Credit card fraud is a major concern in today's financial industry.
For this project, a standard CSV dataset is downloaded from the Kaggle website. The dataset is highly imbalanced: the great majority of its transactions are legitimate.

To solve this problem, a classification model is trained to predict a binary outcome: fraudulent or non-fraudulent.

The algorithms to be applied to this problem are logistic regression, decision tree, random forest, k-nearest neighbors (k-NN) and neural network. The performance of each algorithm will be measured and evaluated against a target accuracy of 100% in predicting the probability that a transaction is fraudulent or non-fraudulent, while minimizing incorrect fraud classifications.
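Logistic regression, the first algorithm listed, directly outputs the probability that a transaction is fraudulent. The minimal NumPy sketch below trains it with batch gradient descent on the log-loss; the learning rate, epoch count, 0.5 decision threshold and the function names are illustrative assumptions, and in practice a library implementation such as scikit-learn's `LogisticRegression` would more likely be used.

```python
import numpy as np


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log-loss.

    X: (n_samples, n_features) array; y: (n_samples,) array of 0/1 labels.
    lr and epochs are illustrative defaults, not tuned values.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)       # predicted probability of fraud
        error = p - y                # gradient of the log-loss w.r.t. the score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict_fraud(X, w, b, threshold=0.5):
    """Return 1 (fraud) where the predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)
```

Lowering `threshold` below 0.5 flags more transactions as fraud, which generally raises sensitivity at the cost of precision.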

Fraud identification in this project is therefore measured for each technique with metrics such as sensitivity and the F1-score. The F1-score gives a more balanced result because it is the harmonic mean of precision and recall. Sensitivity (recall on the fraud class) is especially important because we are more interested in identifying fraud than in identifying legitimate customers.
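The metrics named above can be computed from the four confusion-matrix counts, with fraud treated as the positive class. This is a plain-Python sketch; `confusion_counts` and `fraud_metrics` are hypothetical helper names, and accuracy and specificity are included because the project also compares them.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with fraud (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def fraud_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, precision, specificity and F1-score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on fraud
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # F1 is the harmonic mean of precision and recall (sensitivity)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "specificity": specificity, "f1": f1}
```

For example, on a test set of 99 legitimate transactions and 1 fraud, a model that predicts every transaction as legitimate gets an accuracy of 0.99 from `fraud_metrics` but a sensitivity and F1-score of 0.0, which illustrates why sensitivity and F1 are tracked on imbalanced data.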

Literature survey
Past research has shown that each learning algorithm has its own set of assumptions, and that by using multiple algorithms, the strengths of one algorithm can complement the weaknesses of another.
Furthermore, past studies have shown that probability-based models can outperform neural network models.
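One simple way to let several algorithms complement each other is a hard majority vote over their predictions. The sketch below is a generic illustration, not the method of any cited study; the function name and the rule of resolving ties towards fraud are assumptions.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models by hard majority vote.

    predictions_per_model: list of equal-length prediction lists, one per model.
    Ties (possible with an even number of models) go to fraud (1), a
    hypothetical choice that favours sensitivity.
    """
    combined = []
    for votes in zip(*predictions_per_model):  # votes for one transaction
        counts = Counter(votes)
        combined.append(1 if counts[1] >= counts[0] else 0)
    return combined
```

With an odd number of models ties cannot occur, so the tie rule only matters when an even number of classifiers is combined.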
In [1], Ong Shu Yee, Saravanan Sagadevan and Nurul Hashimah Ahamed Hassain Malim proposed a machine learning methodology based on supervised classification, using the Bayesian network classifiers K2, Tree Augmented Naïve Bayes (TAN) and Naïve Bayes, together with logistic and J48 classifiers. After the dataset was pre-processed with normalization and Principal Component Analysis (PCA), all the classifiers achieved more than 95.0% accuracy, an improvement over the results obtained before pre-processing. PCA is used to separate and detect the anomalous transactions.

In [2], Dorronsoro et al. (1997) also proposed a system to detect credit card fraud using a neural network, a model first presented in 1943 by Walter Pitts and Warren S. McCulloch as a data processing unit for classification or prediction problems. Nowadays, artificial neural networks (ANNs) have been successfully applied to business failure prediction, stock price prediction, credit fraud detection and many other areas.

In [3], various modern techniques based on sequence alignment, machine learning, artificial intelligence, genetic programming, data mining and others are described as having evolved, and as still evolving, to detect fraudulent credit card transactions. In addition, [3] presents a survey of the techniques used in credit card fraud detection mechanisms, along with an evaluation of each methodology against certain design criteria.

This project aims to detect credit card fraud in the dataset obtained from Kaggle by applying logistic regression, decision tree and random forest models, evaluating their accuracy, sensitivity, specificity and precision, and comparing and collating the results to identify the best possible model for the credit card fraud detection problem.

General objective

The general objective of this project is to achieve a high level of predictive accuracy in distinguishing fraudulent from non-fraudulent transactions by proposing different techniques.

Specific objectives

The specific objectives of this project are:

➢ To identify the different types of credit card fraud and the characteristics that distinguish fraudulent from non-fraudulent transactions.

➢ To review alternative techniques that have been used in fraud detection.

➢ To find the 10 transactions most similar to any given transaction in the dataset using linear algebra.

➢ To propose different models so as to select the one that detects fraudulent activity with the highest accuracy.

➢ To minimize the risk of credit card fraud.
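The objective of finding the 10 transactions most similar to a given one can be met with basic linear algebra: normalize every transaction vector, and a single matrix-vector product then gives the cosine similarity of all transactions to the query. The NumPy sketch below assumes cosine similarity as the measure (Euclidean distance would be an equally reasonable choice), and `top_k_similar` is a hypothetical name.

```python
import numpy as np


def top_k_similar(X, query_index, k=10):
    """Return indices of the k rows of X most similar to row query_index.

    Uses cosine similarity cos(a, b) = (a . b) / (||a|| * ||b||), an assumed
    choice of similarity measure.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0              # avoid division by zero for all-zero rows
    unit = X / norms[:, None]            # every transaction scaled to unit length
    sims = unit @ unit[query_index]      # cosine similarity to the query
    sims[query_index] = -np.inf          # never return the query itself
    order = np.argsort(-sims, kind="stable")  # most similar first
    return order[:min(k, len(X) - 1)].tolist()
```

Because the whole similarity vector comes from one matrix-vector product, the cost grows linearly with the number of transactions, which keeps the search practical on the full dataset.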

Methodology

• First, an appropriate tool set must be obtained to achieve the technical goals mentioned above. This tool set should include the pre-processor, the intended algorithms and the means to measure their performance.

• Data transformation and data reduction make up the data pre-processing phase, in which the raw data is cleaned and transformed into appropriate forms (for example, by standardization) so that it can be evaluated and fed into the machine learners.

• Next, Principal Component Analysis (PCA) will be employed to detect anomalous transactions. PCA transforms correlated variables into a smaller number of uncorrelated attributes called principal components. The objective of applying it is to reduce the dimensionality of the dataset and to discover new, meaningful underlying attributes. The advantage of PCA is that when the dimensions are reduced by projecting the data onto the leading eigenvectors of its covariance matrix, the loss of information is small, because the discarded components carry little of the data's variance.

• Since the dataset is large, a more manageable subset must be extracted that exhibits a predetermined class distribution: a ratio of roughly 99:1 between legitimate and fraudulent transactions.

• The pre-processor will be run on the resulting dataset, and the data will be split into training and test sets of different sizes. We will then run a series of experiments on these datasets using both supervised and unsupervised machine learning algorithms, computing the performance of each algorithm with the chosen performance measures. The results can then be compared analytically to see how the algorithms perform relative to one another when applied to a non-trivial, real-world problem.
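The pre-processing steps in the methodology (standardization, PCA, extracting a roughly 99:1 subset, and a stratified train/test split) can be sketched in NumPy as follows. All function names, the use of an exact 99:1 ratio and the seeded random generator are assumptions for illustration; PCA is implemented directly as an eigendecomposition of the covariance matrix, in line with the eigenvector description in the methodology.

```python
import numpy as np


def standardize(X):
    """Z-score every column: subtract its mean, divide by its standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std


def pca(X, n_components):
    """Project X onto the leading eigenvectors of its covariance matrix.

    Returns (projected data, fraction of total variance retained).
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order for symmetric matrices
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest variance first
    retained = eigvals[:n_components].sum() / eigvals.sum()
    return Xc @ eigvecs[:, :n_components], retained


def subsample_99_1(y, n_fraud, rng):
    """Return shuffled indices of n_fraud fraud rows plus 99 * n_fraud legitimate rows."""
    fraud = rng.choice(np.flatnonzero(y == 1), size=n_fraud, replace=False)
    legit = rng.choice(np.flatnonzero(y == 0), size=99 * n_fraud, replace=False)
    idx = np.concatenate([fraud, legit])
    rng.shuffle(idx)
    return idx


def stratified_split(y, test_size, rng):
    """Split indices so train and test keep the same fraud/legitimate ratio."""
    train, test = [], []
    for label in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_size * len(idx)))
        test.append(idx[:n_test])
        train.append(idx[n_test:])
    return np.concatenate(train), np.concatenate(test)
```

Stratifying the split matters here: with only about 1% fraud, a purely random split of a small subset could leave very few fraud cases in the test set, making sensitivity estimates unstable. Varying `test_size` gives the training and test sets of different sizes mentioned above.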

Expected outcome
Among the models proposed in this project, the expected outcome is the best-fitting one: the model that detects fraudulent credit card transactions with a high level of accuracy, against a target of 100%, using the techniques mentioned above.
References
[1] Ong Shu Yee, Saravanan Sagadevan and Nurul Hashimah Ahamed Hassain Malim. “Credit Card Fraud Detection Using Machine Learning As Data Mining Technique.” 12 August 2018.
[2] Navanshu Khare and Saad Yunus Sait. “Credit Card Fraud Detection Using Machine Learning Models and Collating Machine Learning Models.” Department of Computer Science and Engineering.
[3] Bénard Jacobus Wiese. “Credit Card Transactions, Fraud Detection, and Machine Learning: Modelling Time with LSTM Recurrent Neural Networks.” August 2017.
