Credit Card Fraud Detection Proposal Redone
Proposal
Proposal title – “Credit Card Fraud Detection Using Machine Learning Algorithms”
Problem statement
Credit card fraud is a major concern in the financial industry.
For this project, a standard CSV dataset is downloaded from the Kaggle website. The dataset is
highly imbalanced: the great majority of its transactions are legitimate.
In this project, classification models are trained to predict a binary outcome for each
transaction: fraudulent or non-fraudulent.
The algorithms to be applied to this problem are logistic regression, decision tree, random
forest, k-nearest neighbors and neural network. The performance of each algorithm will be
evaluated against a target accuracy of 100% in predicting the probability that a transaction is
fraudulent or non-fraudulent, while minimizing incorrect fraud classifications.
Fraud identification in this project is therefore measured for each technique using metrics such
as sensitivity and F1-score. The F1-score gives a more balanced result because it is the harmonic
mean of precision and recall. Sensitivity (another name for recall) matters more here because we
are more interested in identifying fraud than in identifying legitimate customers.
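As a minimal sketch of how these metrics relate to one another, the block below computes accuracy, sensitivity, precision and F1-score from raw predictions. The label encoding (1 for fraud, 0 for legitimate), the function name `fraud_metrics` and the toy labels are assumptions made for illustration and do not come from the Kaggle dataset.

```python
def fraud_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, precision and F1 with fraud (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # frauds caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # frauds missed

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity (recall).
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


# Hypothetical imbalanced sample: 1 fraud among 100 transactions.
y_true = [1] + [0] * 99

# A model that labels everything as legitimate.
baseline = fraud_metrics(y_true, [0] * 100)

# A model that catches the fraud but raises one false alarm.
detector = fraud_metrics(y_true, [1, 1] + [0] * 98)

print(baseline)
print(detector)
```

On this toy sample, the model that never flags fraud and the model that catches it reach the same accuracy, and only sensitivity and F1-score tell them apart. This is why accuracy alone can mislead on an imbalanced dataset.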
Literature survey
Past research has shown that learning algorithms each have their own set of assumptions, and that
by using multiple algorithms the strengths of one algorithm can complement the weaknesses of
another. Furthermore, past studies have shown that probability-based models can outperform neural
network models.
Ong Shu Yee, Saravanan Sagadevan and Nurul Hashimah Ahamed Hassain Malim [1] applied supervised
classification using the Bayesian network classifiers K2, Tree Augmented Naïve Bayes (TAN) and
Naïve Bayes, together with logistic and J48 classifiers. After pre-processing the dataset using
normalization and Principal Component Analysis (PCA), all the classifiers achieved more than
95.0% accuracy, an improvement over the results attained before pre-processing. PCA was used to
separate out the anomalous transactions.
In [2], Dorronsoro et al. (1997) proposed a system to detect credit card fraud using a neural
network, a model first presented in 1943 by Walter Pitts and Warren S. McCulloch as a data
processing unit for classification or prediction problems.
Nowadays, artificial neural networks (ANNs) have been successfully applied to business failure
prediction, stock price prediction, credit fraud detection and many other areas.
According to [3], various modern techniques based on sequence alignment, machine learning,
artificial intelligence, genetic programming, data mining and so on have been developed, and are
still evolving, to detect fraudulent credit card transactions. In addition, a survey of the
various techniques used in credit card fraud detection has been presented, along with an
evaluation of each methodology against certain design criteria.
This project aims to detect credit card fraud in the dataset obtained from Kaggle by applying
logistic regression, decision tree and random forest models, evaluating their accuracy,
sensitivity, specificity and precision, and comparing and collating them to identify the best
possible model for the credit card fraud detection problem.
General objective
The general objective of this project is to achieve a high level of accuracy in predicting whether
transactions are fraudulent or non-fraudulent by proposing different techniques.
Specific objectives
➢ To identify the different types of credit card fraud and the characteristics that distinguish
fraudulent from non-fraudulent transactions.
➢ To find the top 10 most similar transactions for any given transaction in the dataset using
linear algebra.
➢ To propose different models for this project so as to select the model that detects
fraudulent activity with the highest level of accuracy.
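The second objective can be sketched as follows. The proposal does not say which similarity measure will be used, so cosine similarity is an assumption here, as are the function name `top_k_similar` and the synthetic feature matrix standing in for the transaction data.

```python
import numpy as np


def top_k_similar(X, query_index, k=10):
    """Return indices and scores of the k rows of X most cosine-similar to one row.

    Assumes k is smaller than the number of rows in X.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0              # avoid dividing all-zero rows by zero
    unit = X / norms[:, None]            # scale every row to unit length
    sims = unit @ unit[query_index]      # one matrix-vector product gives all cosines
    sims[query_index] = -np.inf          # exclude the query transaction itself
    top = np.argsort(-sims)[:k]          # highest similarity first
    return top, sims[top]


# Synthetic stand-in for the transaction features (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

idx, scores = top_k_similar(X, query_index=0, k=10)
print(idx)
print(scores)
```

Cosine similarity compares the direction of two feature vectors and ignores their magnitude. If the transaction amount should influence similarity, a distance such as the Euclidean norm on standardized features would be a reasonable alternative.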
Methodology
• First, an appropriate tool set needs to be obtained to achieve the technical goals mentioned
above. This tool set should include the pre-processor and the algorithms whose performance
is to be measured.
• Data transformation and data reduction together make up the data pre-processing phase, in
which the raw data is cleaned and transformed into appropriate forms (for example, by
standardization) to be evaluated and fed into the machine learners.
• Then, Principal Component Analysis (PCA) will be employed to detect the anomalous
transactions. PCA transforms correlated variables into a smaller number of uncorrelated
attributes called principal components. The objective of applying the method is to reduce
the dimensionality of the dataset and discover new, meaningful underlying attributes. The
advantage of PCA is that, when the dimensions of the data are reduced using eigenvectors,
little information is lost, provided the retained components account for most of the
variance.
• Since the dataset is large, a more manageable subset has to be extracted that exhibits a
predetermined class distribution: a ratio of roughly 99:1 between legitimate and fraudulent
transactions.
• The pre-processor will be run on the resulting dataset, and the data will be split into
training and test sets of different sizes. We will then run a series of experiments on these
sets using different supervised and unsupervised machine learning algorithms, computing the
performance of each algorithm with the chosen performance measure. These results can then
be compared analytically to see how the algorithms perform relative to each other when
applied to a non-trivial real-world problem.
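The pre-processing steps above can be sketched with NumPy as follows: extract a subset with a 99:1 class ratio, standardize it, and apply PCA through an eigen-decomposition of the covariance matrix. This is an illustration under stated assumptions, not the project's implementation. The function names, the synthetic data, the choice of three components and the order of the steps in the demo are all hypothetical.

```python
import numpy as np


def standardize(X):
    """Rescale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    return (X - mean) / std


def pca(X_std, n_components):
    """Project standardized data onto its leading principal components."""
    cov = np.cov(X_std, rowvar=False)        # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return X_std @ components, explained


def subsample(X, y, legit_per_fraud, rng):
    """Keep every fraud row plus `legit_per_fraud` legitimate rows per fraud."""
    fraud_idx = np.flatnonzero(y == 1)
    legit_idx = np.flatnonzero(y == 0)
    n_legit = min(len(legit_idx), legit_per_fraud * len(fraud_idx))
    chosen = rng.choice(legit_idx, size=n_legit, replace=False)
    keep = rng.permutation(np.concatenate([fraud_idx, chosen]))
    return X[keep], y[keep]


# Synthetic stand-in data: 3 independent features plus 3 noisy linear
# combinations of them, so the 6 columns are strongly correlated.
rng = np.random.default_rng(42)
base = rng.normal(size=(5000, 3))
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.5],
                [0.5, 0.0, 1.0]])
X = np.hstack([base, base @ mix + 0.05 * rng.normal(size=(5000, 3))])
y = np.zeros(5000, dtype=int)
y[rng.choice(5000, size=10, replace=False)] = 1   # 10 hypothetical frauds

X_sub, y_sub = subsample(X, y, legit_per_fraud=99, rng=rng)
Z, explained = pca(standardize(X_sub), n_components=3)
print(X_sub.shape, int(y_sub.sum()), Z.shape, round(float(explained), 3))
```

The value returned as `explained` is the share of total variance kept by the retained components. Checking it is one way to confirm that the information lost by reducing the dimensions is small.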
Expected outcome
The expected outcome is that, from the models proposed in this project, the best model is
identified: one that detects fraudulent credit card transactions with a high level of accuracy,
ideally 100%, using the techniques mentioned.
Reference
[1] Ong Shu Yee, Saravanan Sagadevan and Nurul Hashimah Ahamed Hassain Malim. “Credit Card
Fraud Detection Using Machine Learning As Data Mining Technique”. 12 August 2018.
[2] Navanshu Khare and Saad Yunus Sait, Department of Computer Science and Engineering.
“Credit Card Fraud Detection Using Machine Learning Models and Collating Machine Learning
Models”.
[3] Bénard Jacobus Wiese. “Credit Card Transactions, Fraud Detection, and Machine Learning:
Modelling Time with LSTM Recurrent Neural Networks”. August 2017.