Credit Card Fraud Detection Using Machine Learning
A MINI
PROJECT REPORT
Submitted in partial fulfillment of the requirements
for the award of the Master of Computer Application degree
Submitted by
APARNA THAKUR (LNCCMCA11108)
Under the Guidance of
Dr. Kavita Kanathey
JAN-JUNE, 2023
LNCT UNIVERSITY, BHOPAL
CERTIFICATE
This is to certify that the mini project report entitled “Credit Card Fraud
Detection Using Machine Learning” submitted by Aparna Thakur (LNCCMCA11108)
has been carried out under the guidance of Prof. Ved Lad, Master of Computer
Application, LNCT UNIVERSITY, BHOPAL. The project report is approved in
fulfillment of the Mini Project in “Python” requirement of the 2nd semester of
the Master of Computer Application programme, LNCT UNIVERSITY, BHOPAL (M.P.),
during the academic session JAN-JUNE, 2023.
Guided By
CONTENTS
Introduction
Design/Flowchart/Graph
Code/Implementation
Screenshots
Conclusion
Bibliography
INTRODUCTION
Context
It is important that credit card companies are able to recognize fraudulent
credit card transactions so that customers are not charged for items that they
did not purchase.
Content
The dataset contains only numerical input variables, which are the result of a
PCA transformation. Unfortunately, due to confidentiality issues, we cannot
provide the original features or more background information about the data.
Features V1, V2, ..., V28 are the principal components obtained with PCA; the
only features that have not been transformed with PCA are 'Time' and 'Amount'.
The feature 'Time' contains the seconds elapsed between each transaction and
the first transaction in the dataset. The feature 'Amount' is the transaction
amount; this feature can be used for example-dependent cost-sensitive learning.
The feature 'Class' is the response variable and takes the value 1 in case of
fraud and 0 otherwise.
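For a quick look at how skewed the response variable is, the sketch below (a
minimal example, assuming the dataset is available locally as creditcard.csv)
simply counts the two values of 'Class':

import pandas as pd

# Load the dataset (file name assumed; adjust the path to your local copy)
data = pd.read_csv('creditcard.csv')

# Count genuine (0) versus fraudulent (1) transactions
print(data['Class'].value_counts())

# Fraction of fraudulent transactions in the whole dataset
print(data['Class'].mean())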
Inspiration
Identify fraudulent credit card transactions. Given the class imbalance ratio,
accuracy is best measured using the Area Under the Precision-Recall Curve
(AUPRC), since confusion-matrix accuracy is not meaningful for unbalanced
classification.
Acknowledgements
The dataset has been collected and analysed during a research collaboration of
Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB
(Université Libre de Bruxelles) on big data mining and fraud detection. More
details on current and past projects on related topics are available at
https://www.researchgate.net/project/Fraud-detection-5 and on the page of the
DefeatFraud project.
Model Prediction
Now it is time to start building the model. The algorithms we are going to use
for anomaly detection on this dataset are Isolation Forest, Local Outlier
Factor (LOF) and One-Class Support Vector Machine (SVM).
Typical machine learning methods tend to work better when the patterns they try
to learn are balanced, that is, when roughly equal numbers of good and bad
behaviours are present in the dataset. Since fraudulent transactions are
extremely rare here, we instead treat the problem as anomaly detection.
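To see why class imbalance matters, the minimal sketch below (self-contained,
with the file path again an assumption) shows that a trivial "model" that
labels every transaction as genuine already achieves near-perfect accuracy
while detecting no fraud at all, which is why plain accuracy is a poor guide on
this dataset:

import numpy as np
import pandas as pd

data = pd.read_csv('creditcard.csv')  # assumed local path

# A trivial baseline that labels every transaction as genuine (class 0)
baseline_pred = np.zeros(len(data), dtype=int)

# Its accuracy is very high simply because fraud is so rare,
# yet it catches zero fraudulent transactions
baseline_accuracy = (baseline_pred == data['Class']).mean()
print("Majority-class baseline accuracy:", baseline_accuracy)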
import numpy as np
import pandas as pd
import sklearn
import scipy
import matplotlib.pyplot as plt
from matplotlib import rcParams
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.metrics import accuracy_score, classification_report

# Default figure size and a fixed seed for reproducibility
rcParams['figure.figsize'] = 14, 8
RANDOM_SEED = 42

# Load the dataset and take a first look
data = pd.read_csv('creditcard.csv', sep=',')
data.head()
data.info()
data.isnull().values.any()  # check for missing values
# Split the data by class for exploratory analysis
fraud = data[data['Class'] == 1]
normal = data[data['Class'] == 0]
normal.Amount.describe()

# Compare the transaction amounts of fraudulent and normal transactions
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Amount per transaction by class')
bins = 50
ax1.hist(fraud.Amount, bins=bins)
ax1.set_title('Fraud')
ax2.hist(normal.Amount, bins=bins)
ax2.set_title('Normal')
plt.xlabel('Amount ($)')
plt.ylabel('Number of Transactions')
plt.xlim((0, 20000))
plt.yscale('log')
plt.show()
# Do fraudulent transactions occur more often during a certain time frame?
# Let us find out with a visual representation.
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Time of transaction vs Amount by class')
ax1.scatter(fraud.Time, fraud.Amount)
ax1.set_title('Fraud')
ax2.scatter(normal.Time, normal.Amount)
ax2.set_title('Normal')
plt.xlabel('Time (in Seconds)')
plt.ylabel('Amount')
plt.show()
data.shape

# Work on a sample of the data (the 10% fraction here is an assumed choice,
# made to keep the models tractable) and measure the class imbalance
data1 = data.sample(frac=0.1, random_state=RANDOM_SEED)
Fraud = data1[data1['Class'] == 1]
Valid = data1[data1['Class'] == 0]
outlier_fraction = len(Fraud) / float(len(Valid))
print(outlier_fraction)
print("Fraud Cases : {}".format(len(Fraud)))
print("Valid Cases : {}".format(len(Valid)))
## Correlation
import seaborn as sns

# Get the correlations of the features in the sampled dataset
corrmat = data1.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20, 20))
# Plot the heat map
g = sns.heatmap(data1[top_corr_features].corr(), annot=True, cmap="RdYlGn")
plt.show()
# Build the feature matrix X and target vector Y from the sample
columns = [c for c in data1.columns if c != "Class"]
X = data1[columns]
Y = data1["Class"]
state = np.random.RandomState(RANDOM_SEED)

# The three anomaly detection models to compare
classifiers = {
    "Isolation Forest": IsolationForest(n_estimators=100, max_samples=len(X),
        contamination=outlier_fraction, random_state=state, verbose=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, algorithm='auto',
        leaf_size=30, metric='minkowski', p=2, metric_params=None,
        contamination=outlier_fraction),
    "Support Vector Machine": OneClassSVM(kernel='rbf', degree=3, gamma=0.1,
        nu=0.05, max_iter=-1)
}
type(classifiers)
n_outliers = len(Fraud)

for clf_name, clf in classifiers.items():
    # Fit each model and obtain its outlier predictions
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_prediction = clf.negative_outlier_factor_
    elif clf_name == "Support Vector Machine":
        clf.fit(X)
        y_pred = clf.predict(X)
    else:
        clf.fit(X)
        scores_prediction = clf.decision_function(X)
        y_pred = clf.predict(X)
    # Remap predictions: 0 for valid transactions, 1 for fraud
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    n_errors = (y_pred != Y).sum()
    # Report the number of misclassifications and the classification metrics
    print("{}: {}".format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))
    print(classification_report(Y, y_pred))
SCREENSHOTS
CONCLUSION
Comparing the number of errors, precision and recall of the three models,
Isolation Forest performed much better than LOF: it detected around 27% of the
fraud cases, versus a detection rate of just 2% for LOF and 0% for SVM. The
accuracy could be improved further by increasing the sample size or by using
deep learning algorithms, although at the cost of greater computational
expense. More complex anomaly detection models could also be used to identify
a larger share of the fraudulent cases.
BIBLIOGRAPHY
1. Han J. and Kamber M. (2003): “Data Mining, Concepts and Techniques”,
Academic Press, 2003.
2. Han J., Pei J., and Yin Y. (2000): “Mining Frequent Patterns without Candidate
Generation”. In proceedings of International Conference on Management of Data
(ACM SIGMOD’00), pages 1-12, ACM Press Dallas, TX, United States, May
2000.
3. Hand D., Mannila H. and Smyth P. (2001): “Principle of Data Mining”. MIT
Press, Cambridge, Massachusetts, USA, 2001.