
Credit Card Fraud Detection Using Machine Learning

A MINI PROJECT REPORT
Submitted in partial fulfillment of the requirements
for the award of the Master of Computer Application degree

LNCT UNIVERSITY, BHOPAL (M.P.)

MINI PROJECT REPORT

Submitted by
APARNA THAKUR (LNCCMCA11108)
Under the Guidance of
Dr. Kavita Kanathey

MASTER OF COMPUTER APPLICATION

LNCT UNIVERSITY, BHOPAL

JAN-JUNE, 2023
LNCT UNIVERSITY, BHOPAL

MASTER OF COMPUTER APPLICATION

CERTIFICATE

This is to certify that the mini project report entitled “Credit Card Fraud
Detection Using Machine Learning” submitted by Aparna Thakur
(LNCCMCA11108) has been carried out under the guidance of Prof. Ved Lad,
Master of Computer Application, LNCT UNIVERSITY, BHOPAL. The project
report is approved in fulfillment of the Mini Project ("Python") requirement
of the 2nd semester of the Master of Computer Application programme,
LNCT UNIVERSITY, BHOPAL (M.P.), during the academic session
JAN-JUNE, 2023.

Guided By

Prof. Ved Lad


SOCST, LNCT UNIVERSITY, BHOPAL

CONTENTS
Introduction

Design/Flowchart/Graph

Code/Implementation

Screenshots

Conclusion

INTRODUCTION

Context

It is important that credit card companies are able to recognize fraudulent
credit card transactions so that customers are not charged for items that
they did not purchase.

Content

The dataset contains transactions made by credit cards in September 2013
by European cardholders. It presents transactions that occurred over two
days, with 492 frauds out of 284,807 transactions. The dataset is highly
unbalanced: the positive class (frauds) accounts for 0.172% of all
transactions.

It contains only numerical input variables, which are the result of a PCA
transformation. Unfortunately, due to confidentiality issues, we cannot
provide the original features or more background information about the
data. Features V1, V2, ... V28 are the principal components obtained with
PCA; the only features which have not been transformed with PCA are
'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between
each transaction and the first transaction in the dataset. The feature
'Amount' is the transaction amount; it can be used for example-dependent
cost-sensitive learning. The feature 'Class' is the response variable and
takes value 1 in case of fraud and 0 otherwise.
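
As a quick illustration (not part of the original report), the class imbalance described above can be checked directly with pandas, assuming the Kaggle creditcard.csv file is available in the working directory:

import pandas as pd

# Assumption: the Kaggle 'creditcard.csv' file sits in the current working directory.
data = pd.read_csv('creditcard.csv')
class_share = data['Class'].value_counts(normalize=True) * 100
print(class_share)  # roughly 99.83% for class 0 (normal) and 0.17% for class 1 (fraud)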

Inspiration

Identify fraudulent credit card transactions.

Given the class imbalance ratio, we recommend measuring the accuracy
using the Area Under the Precision-Recall Curve (AUPRC). Confusion-matrix
accuracy is not meaningful for unbalanced classification.
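
A minimal sketch of how AUPRC can be computed with scikit-learn, using made-up toy labels and scores rather than the report's data:

import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Toy example: y_true are ground-truth 'Class' labels (1 = fraud),
# y_scores are model scores where higher means "more likely fraud".
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 0])
y_scores = np.array([0.10, 0.30, 0.20, 0.40, 0.80, 0.20, 0.60, 0.10])

auprc = average_precision_score(y_true, y_scores)  # summarises the PR curve in one number
precision, recall, _ = precision_recall_curve(y_true, y_scores)
print("AUPRC:", auprc)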

Acknowledgements

The dataset has been collected and analysed during a research collaboration
of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of
ULB (Université Libre de Bruxelles) on big data mining and fraud
detection. More details on current and past projects on related topics are
available at https://www.researchgate.net/project/Fraud-detection-5 and on
the page of the DefeatFraud project.

Model Prediction

Now it is time to start building the model. The algorithms we are going to
use for anomaly detection on this dataset are as follows:

Isolation Forest Algorithm:

One of the newer techniques to detect anomalies is called Isolation Forest.
The algorithm is based on the fact that anomalies are data points that are
few and different. As a result of these properties, anomalies are
susceptible to a mechanism called isolation.

This method is highly useful and is fundamentally different from existing
methods. It introduces the use of isolation as a more effective and
efficient means to detect anomalies than the commonly used distance and
density measures. Moreover, the algorithm has low linear time complexity
and a small memory requirement. It builds a well-performing model with a
small number of trees, using small sub-samples of fixed size regardless of
the size of the dataset.

Typical machine learning methods tend to work better when the patterns
they try to learn are balanced, meaning that similar amounts of good and
bad behavior are present in the dataset.

How Isolation Forests Work

The Isolation Forest algorithm isolates observations by randomly selecting
a feature and then randomly selecting a split value between the maximum
and minimum values of the selected feature. The logic is that isolating
anomalous observations is easier because only a few conditions are needed
to separate those cases from the normal observations, whereas isolating
normal observations requires more conditions. Therefore, an anomaly score
can be calculated as the number of conditions required to separate a given
observation.

The algorithm constructs the separation by first building isolation trees,
or random decision trees. The score is then calculated as the path length
needed to isolate the observation.
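
As an illustrative sketch (toy data made up for demonstration, not taken from the report), scikit-learn's IsolationForest exposes this idea directly: score_samples returns a score derived from the average path length, and predict flags the easily isolated points as outliers (-1):

import numpy as np
from sklearn.ensemble import IsolationForest

# Toy data: 200 "normal" points around the origin plus a few far-away anomalies.
rng = np.random.RandomState(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_anomalies = rng.uniform(low=6.0, high=8.0, size=(5, 2))
X_toy = np.vstack([X_normal, X_anomalies])

iso = IsolationForest(n_estimators=100, contamination=0.025, random_state=42)
iso.fit(X_toy)

labels = iso.predict(X_toy)        # +1 = inlier, -1 = outlier
scores = iso.score_samples(X_toy)  # lower score = shorter average path = more anomalous
print("Points flagged as outliers:", int(np.sum(labels == -1)))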

Local Outlier Factor (LOF) Algorithm

The LOF algorithm is an unsupervised outlier detection method which
computes the local density deviation of a given data point with respect to
its neighbors. It considers as outliers the samples that have a
substantially lower density than their neighbors.

The number of neighbors considered (parameter n_neighbors) is typically
chosen (1) greater than the minimum number of objects a cluster has to
contain, so that other objects can be local outliers relative to this cluster,
and (2) smaller than the maximum number of close-by objects that can
potentially be local outliers. In practice, such information is generally not
available, and taking n_neighbors=20 appears to work well in general.
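
A minimal sketch of the same idea with LocalOutlierFactor and n_neighbors=20 (again on made-up toy data); note that in its default outlier-detection mode the estimator only offers fit_predict and negative_outlier_factor_, not a separate predict method:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy data: one dense cluster plus a few sparse points that should look like local outliers.
rng = np.random.RandomState(0)
X_cluster = rng.normal(loc=0.0, scale=0.5, size=(300, 2))
X_sparse = rng.uniform(low=4.0, high=6.0, size=(6, 2))
X_toy = np.vstack([X_cluster, X_sparse])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X_toy)            # +1 = inlier, -1 = outlier
lof_scores = lof.negative_outlier_factor_  # close to -1 for normal points, much lower for outliers
print("Points flagged as local outliers:", int(np.sum(labels == -1)))
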
GRAPH
SOURCE CODE

import numpy as np

import pandas as pd

import sklearn

import scipy

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.metrics import classification_report,accuracy_score

from sklearn.ensemble import IsolationForest

from sklearn.neighbors import LocalOutlierFactor

from sklearn.svm import OneClassSVM

from pylab import rcParams

rcParams['figure.figsize'] = 14, 8

RANDOM_SEED = 42

LABELS = ["Normal", "Fraud"]

data = pd.read_csv('creditcard.csv',sep=',')

data.head()

data.info()

data.isnull().values.any()

count_classes = data['Class'].value_counts(sort=True)


count_classes.plot(kind = 'bar', rot=0)
plt.title("Transaction Class Distribution")
plt.xticks(range(2), LABELS)
plt.xlabel("Class")
plt.ylabel("Frequency")

## Get the fraud and the normal subsets of the data


fraud = data[data['Class']==1]
normal = data[data['Class']==0]
print(fraud.shape,normal.shape)

## We need to analyze more information from the transaction data


# How do the transaction amounts differ between the two classes?
fraud.Amount.describe()

normal.Amount.describe()
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Amount per transaction by class')
bins = 50
ax1.hist(fraud.Amount, bins = bins)
ax1.set_title('Fraud')
ax2.hist(normal.Amount, bins = bins)
ax2.set_title('Normal')
plt.xlabel('Amount ($)')
plt.ylabel('Number of Transactions')
plt.xlim((0, 20000))
plt.yscale('log')
plt.show();

# Do fraudulent transactions occur more often during a certain time frame?
# Let us find out with a visual representation.
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Time of transaction vs Amount by class')
ax1.scatter(fraud.Time, fraud.Amount)
ax1.set_title('Fraud')
ax2.scatter(normal.Time, normal.Amount)
ax2.set_title('Normal')
plt.xlabel('Time (in Seconds)')
plt.ylabel('Amount')
plt.show()

## Take a sample of the data


data1= data.sample(frac = 0.1,random_state=1)
data1.shape

data.shape

#Determine the number of fraud and valid transactions in the dataset


Fraud = data1[data1['Class']==1]
Valid = data1[data1['Class']==0]
outlier_fraction = len(Fraud)/float(len(Valid))

print(outlier_fraction)
print("Fraud Cases : {}".format(len(Fraud)))
print("Valid Cases : {}".format(len(Valid)))

## Correlation

# Get the correlations of each feature in the dataset
corrmat = data1.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20, 20))
# Plot heat map
g = sns.heatmap(data1[top_corr_features].corr(), annot=True, cmap="RdYlGn")

#Create independent and Dependent Features


columns = data1.columns.tolist()
# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]
# Store the variable we are predicting
target = "Class"
# Define a random state
state = np.random.RandomState(42)
X = data1[columns]
Y = data1[target]
X_outliers = state.uniform(low=0, high=1, size=(X.shape[0], X.shape[1]))
# Print the shapes of X & Y
print(X.shape)
print(Y.shape)
## Define the outlier detection methods

classifiers = {
    "Isolation Forest": IsolationForest(n_estimators=100, max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state, verbose=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, algorithm='auto',
                                               leaf_size=30, metric='minkowski',
                                               p=2, metric_params=None,
                                               contamination=outlier_fraction),
    # Note: recent scikit-learn versions of OneClassSVM do not accept a random_state argument
    "Support Vector Machine": OneClassSVM(kernel='rbf', degree=3,
                                          gamma=0.1, nu=0.05,
                                          max_iter=-1)
}

type(classifiers)

n_outliers = len(Fraud)

for i, (clf_name, clf) in enumerate(classifiers.items()):
    # Fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_prediction = clf.negative_outlier_factor_
    elif clf_name == "Support Vector Machine":
        clf.fit(X)
        y_pred = clf.predict(X)
    else:
        clf.fit(X)
        scores_prediction = clf.decision_function(X)
        y_pred = clf.predict(X)

    # Relabel the predictions: 0 for valid transactions, 1 for fraud
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    n_errors = (y_pred != Y).sum()

    # Run classification metrics
    print("{}: {}".format(clf_name, n_errors))
    print("Accuracy Score :")
    print(accuracy_score(Y, y_pred))
    print("Classification Report :")
    print(classification_report(Y, y_pred))
SCREENSHOT
CONCLUSION

 Isolation Forest detected 73 errors, versus 97 errors for Local Outlier Factor
and 8,516 errors for SVM.

 Isolation Forest achieved an accuracy of 99.74%, compared with 99.65% for LOF
and 70.09% for SVM.

 When comparing precision and recall for the three models, Isolation Forest
performed much better than LOF: its detection rate for fraud cases is around
27%, versus a detection rate of just 2% for LOF and 0% for SVM.

 So overall, the Isolation Forest method performed much better at determining
the fraud cases, detecting around 30% of them.

 We can also improve on this accuracy by increasing the sample size or using
deep learning algorithms, although at the cost of computational expense. We can
also use more complex anomaly detection models to get better accuracy in
determining more fraudulent cases.
