VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belagavi - 590 018

PROJECT REPORT ON

Credit Card Fraud Detection Using Machine Learning
Submitted in Partial Fulfillment for the Award of Degree of
Bachelor of Engineering
in
Electronics and Communication Engineering

By
DIVYASHREE S 1RN21EC049
JASWANT R 1RN21EC055
KHUSHI RAJ 1RN21EC064

Under the Guidance of:


Dr. Leena Chandrashekhar
Associate Professor

Department of Electronics and Communication Engineering

RNS INSTITUTE OF TECHNOLOGY


Autonomous Institution Affiliated to VTU
Channasandra, Dr. Vishnuvardhan Road, Bengaluru-560098

2024-25
RNS INSTITUTE OF TECHNOLOGY
Autonomous Institution Affiliated to VTU, Recognized by GOK, Approved by AICTE
(NAAC ‘A+ Grade’ Accredited, NBA Accredited (UG - CSE, ECE, ISE, EIE and EEE))
Channasandra, Dr. Vishnuvardhan Road, Bengaluru - 560 098

Department of Electronics and Communication Engineering

CERTIFICATE
Certified that the Project work entitled “Credit Card Fraud Detection Using
Machine Learning” is carried out by DIVYASHREE S (USN: 1RN21EC049),
JASWANT R (USN: 1RN21EC055) and KHUSHI RAJ (USN: 1RN21EC064)
in partial fulfillment for the award of degree of Bachelor of Engineering in Electronics
and Communication Engineering of Visvesvaraya Technological University,
Belagavi, during the year 2024-2025. It is certified that all corrections and suggestions
indicated during internal assessment have been incorporated in the report. The
project report has been approved as it satisfies the academic requirements in respect
of the mini project work prescribed for the award of degree of Bachelor of Engineering.

Dr. Leena Chandrashekhar, Associate Professor ................................

Dr. Rajini V Honnungar, Head of the Department ................................

Dr. Ramesh Babu H S, Principal ................................

External Viva

Name of the Examiners Signature with date

1 .......................................... ...........................................

2 .......................................... ...........................................
DECLARATION
We hereby declare that the entire work embodied in this project report titled
“Credit Card Fraud Detection Using Machine Learning”, submitted to Visvesvaraya
Technological University, Belagavi, has been carried out in the Department of
Electronics and Communication Engineering, RNS Institute of Technology,
Bengaluru, under the guidance of Dr. Leena Chandrashekhar, Associate Professor.
This report has not been submitted for the award of any Diploma or Degree of
this or any other University.

Name USN Signature

1. DIVYASHREE S 1RN21EC049 .........................

2. JASWANT R 1RN21EC055 .........................

3. KHUSHI RAJ 1RN21EC064 .........................
Acknowledgements
The joy and satisfaction that accompany the successful completion of
any task would be incomplete without thanking those who made it possible.
We are proud to be students of RNS Institute of Technology, the
institution which has shaped us for a better future.

We are grateful to our Founder Chairman, RNS Group of Institutions,
Late Dr. R. N. Shetty, for setting up state-of-the-art facilities in the
Institution. We would like to express our gratitude to our present Chairman,
RNS Group of Institutions, Mr. Satish R Shetty, for providing an
environment conducive to all academic activities.

We sincerely thank our Director, Dr. M K Venkatesha, for being
the source of inspiration in whatever academic work we carry out. We are
grateful to our Principal, Dr. Ramesh Babu H S, who has always supported
and motivated us in our endeavors.

Our heartfelt thanks to our Head, Dr. Rajini V Honnungar, who
has always supported all academic activities in the department.
We express our sincere gratitude to our guide, Dr. Leena Chandrashekhar,
Associate Professor, for her supervision, guidance, continuous support
and motivation, which led to the successful completion of our project.

Finally, we take this opportunity to extend our earnest gratitude and
respect to our parents, the teaching and non-teaching staff of the department,
the library staff, and all our friends who have directly or indirectly
supported us.

Divyashree S

Jaswant R

Khushi Raj

Abstract
Credit card fraud detection addresses a real-world problem: credit card frauds
are increasing drastically compared to earlier times. Criminals use fake identities
and various technologies to trap users and extract money from them, so it is
essential to find a solution to these types of fraud. In this project we designed
a model to detect fraudulent activity in credit card transactions. The system
provides most of the important features required to detect illegal and illicit
transactions. As technology changes constantly, it is becoming difficult to track
the behavior and patterns of criminal transactions. With the advancement of
machine learning, artificial intelligence, and other related information
technologies, it has become possible to automate the process of detecting credit
card fraud. This not only streamlines the task but also reduces the significant
amount of manual labor typically involved.

Initially, we collect a credit card transaction dataset, divide it into training
and testing sets, and classify the transactions using the logistic regression
algorithm. This algorithm allows us to analyze both the larger historical dataset
and current user-provided data, and to improve the accuracy of the results.
Some of the provided attributes are then processed, and data visualization is
used to examine which of them affect fraud detection. The performance of the
technique is gauged on the basis of accuracy, sensitivity, specificity, and
precision. The logistic regression algorithm achieved a best accuracy of
98.6 per cent.

Table of Contents

Abstract
Table of Contents
List of Figures
Acronyms

1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Methodology
1.4 Block diagram
1.5 Applications
1.6 Advantages and Disadvantages
1.7 Organisation of Report

2 Literature Survey

3 Software Requirements
3.1 Software
3.2 Description of Software used

4 Project Design And Architecture
4.1 Dataset
4.2 Data Preprocessing and Feature Designing
4.3 Model Development
4.4 Breakdown of the Code

5 Result Analysis

6 Conclusion and Future scope

References

List of Figures

1.1 Workflow

4.1 Dataset example
4.2 Libraries and loading dataset
4.3 Dataset
4.4 Unbalanced Dataset
4.5 Separating the data for analysis
4.6 Comparing statistical measures
4.7 Concatenating two Data Frames
4.8 Verifying that the dataset is balanced
4.9 Splitting the data into Features and Targets
4.10 Split the data into Training and Testing Data
4.11 Model Training
4.12 Model Evaluation

5.1 Result

Acronyms

ML : Machine Learning

LR : Logistic Regression

AI : Artificial Intelligence

ROC : Receiver Operating Characteristic

AUC : Area Under the Curve

F1S : F1-Score

SMOTE : Synthetic Minority Over-sampling Technique

TPR : True Positive Rate

FPR : False Positive Rate

TN : True Negative

FP : False Positive

FN : False Negative

KCV : K-Fold Cross-Validation (K-Fold CV)

TP : True Positive

EDA : Exploratory Data Analysis

Chapter 1
Introduction

Credit card fraud detection is one of the most critical areas of research in the financial
sector. Fraudulent transactions can result in significant financial losses and damage
the reputation of financial institutions. With the growth in online transactions, detecting
fraud has become more challenging due to the complexity of data patterns.
Machine learning (ML) algorithms are increasingly being used to detect fraudulent
activities by identifying patterns in transaction data and distinguishing between
legitimate and fraudulent transactions. This project focuses on credit card fraud
detection using one such algorithm, logistic regression.

The project aims to develop a model that can predict whether a given credit card
transaction is fraudulent or legitimate based on historical transaction data. Credit
card fraud refers to the unauthorized use of a credit card or its details to make
transactions. Fraud can take several forms, including:

Card-not-present fraud: Fraudsters make transactions without the physical card, often while shopping online.
Card-present fraud: Fraudsters use a stolen physical card to make purchases in person.
Account takeover: A fraudster takes control of an existing cardholder’s account and uses it for unauthorized transactions.

With the rise of online and mobile payments, detecting fraudulent activities in real
time has become paramount to preventing significant financial losses.

Credit card fraud is a pervasive and growing problem in the financial industry. As
more consumers turn to online shopping, digital transactions, and mobile payments,
the financial sector faces an increasing volume of transaction data that must be
carefully monitored to identify fraudulent activity. Fraudulent transactions not only
cause significant monetary losses but can also damage the trust and reputation of
financial institutions, which are essential for maintaining customer relationships and
business stability. For financial institutions, detecting fraud in a timely manner is
crucial, because a delayed response can lead to significant losses for both the
institution and its customers. The complexity of modern fraud schemes, combined with
the ever-increasing number of transactions processed globally, has made manual
detection methods and traditional rule-based systems less effective, and fraudsters
continuously evolve their tactics to bypass security measures.

Credit Card Fraud Detection Using Machine Learning 2024-25
1.1 Motivation
This project is driven by several critical factors that reflect both the increasing
prevalence of fraud and the potential of machine learning to address this challenge
effectively. Credit card fraud is a major global issue, with billions of dollars lost
each year to fraudulent activities. The sophistication of fraudsters continues to grow,
making traditional fraud detection methods less effective. With the rise of online
shopping and digital financial transactions, fraud detection has become even more
important, as fraudsters exploit digital platforms to steal personal and financial
information. Fraudulent activities are becoming increasingly complex, with fraudsters
using advanced techniques such as identity theft, synthetic identity creation, and
account takeovers. Fraudulent transactions are often conducted in real time, so
detecting them as soon as they happen is crucial to minimize losses and protect
consumers. Traditional fraud detection systems are often based on predefined rules
or heuristics, which may not be flexible enough to detect novel or sophisticated
fraudulent behavior.

1.2 Objectives
Data Collection and Preprocessing: Gather relevant datasets that contain credit
card transaction details, including features like transaction amount, merchant,
transaction type, and customer demographics. Preprocess the data by handling missing
values, encoding categorical variables, scaling numerical features, and removing any
noise or outliers.

Feature Engineering: Identify important features that can help detect fraudulent
transactions (e.g., transaction amount, location, time of transaction, frequency of
transactions, etc.). Create new features or transform existing ones to enhance the
model’s ability to distinguish between fraudulent and non-fraudulent transactions.

Data Splitting: Split the dataset into training and testing sets to evaluate the
model’s performance effectively. This allows the model to be trained on a subset of
the data and tested on unseen data.

Model Development using Logistic Regression: Implement a Logistic Regres-


sion model to classify credit card transactions as either fraudulent or non-fraudulent.
Fine-tune hyperparameters such as regularization strength (C) to optimize model
performance.

Model Evaluation: Evaluate the model’s performance using various metrics such
as accuracy, precision, recall, F1-score, and the Area Under the Receiver Operating
Characteristic Curve (AUC-ROC). Address class imbalance issues, as fraud detection
datasets are often skewed, by using techniques like undersampling.
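
All of the metrics listed above can be derived from the four confusion-matrix counts (TP, FP, FN, TN). The sketch below is plain Python with hypothetical counts, not the project's results; in practice a library function such as scikit-learn's `classification_report` would produce the same figures.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common fraud-detection metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of flagged transactions that really were fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity, TPR): share of real frauds that were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (TNR): share of legitimate transactions left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Hypothetical counts for a test set of 200 transactions.
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall and sensitivity are the same quantity, so this one function covers the sensitivity, specificity, and precision figures used to judge the model.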

Model Interpretation: Interpret the model’s coefficients to understand which
features contribute most to the classification of fraud vs. non-fraud transactions. Use
methods like a confusion matrix and feature importance ranking to visualize the model’s
decisions.

Handling Class Imbalance: Explore techniques to deal with the imbalance be-
tween fraudulent and non-fraudulent transactions, such as using SMOTE (Synthetic
Minority Over-sampling Technique), cost-sensitive learning, or balancing the dataset.

Model Validation: Perform cross-validation to assess the model’s robustness and
ensure it generalizes well to unseen data. Compare Logistic Regression with other
classifiers (e.g., Decision Trees, Random Forests, SVM) to assess its performance in
fraud detection.

Deployment Readiness: Prepare the model for deployment, including ensuring
that it is computationally efficient for real-time fraud detection. Optimize the model
for scalability and response time, which is crucial for credit card companies to prevent
fraudulent activities in real time.

Continuous Monitoring and Improvement: Establish procedures to continuously
monitor the model’s performance after deployment, ensuring it adapts to new fraud
patterns over time. Retrain the model periodically using updated transaction data to
maintain high accuracy.

1.3 Methodology
Start by loading the dataset. This is usually a file with detailed information on each
transaction, such as how much was spent, where, and when. Convert the data into a
table format (similar to an Excel sheet) so that it is easy to work with; in Python this
is typically done with a pandas DataFrame, which lets you view and manipulate the
data. Since there are often far fewer fraud cases than normal transactions, the data
needs to be balanced. This can be done by making more copies of the fraud cases so
that the model gets a fair chance to learn what fraud looks like.

Next, split the data into two parts: one part is used to train the model (help it learn),
and the other is used to test it (see how well it learned). Typically a 70-30 or 80-20
split is used, for example 80 per cent of the data for training and 20 per cent for
testing. This gives the model enough data to learn patterns while keeping some unseen
data to evaluate its accuracy.

Give the training data to the logistic regression model, which uses it to learn the
patterns that distinguish fraudulent from non-fraudulent transactions. Once the model
is trained, use it to make predictions on the test data and check how often it gets the
right answer (accuracy); because the test data is unseen, this measures the model’s
generalization ability. A confusion matrix shows further details of the model’s
performance, such as how many fraud cases it caught, how many it missed, and whether
it mislabeled any legitimate transactions as fraud. Finally, if several models are
trained, compare them on different criteria (such as accuracy and how often they find
fraud) to decide which one is best for detecting fraud in your dataset.
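
The balancing and splitting steps described above can be sketched in plain Python as follows. This is a minimal illustration, assuming the data is already a list of feature rows with 0/1 labels; the helper names `stratified_split` and `oversample_minority` are hypothetical, and a library route would be scikit-learn's `train_test_split(..., stratify=y)`. One design choice differs from the order above: the split is done first and fraud rows are duplicated only in the training part, so copies of the same fraud case cannot land in both sets and inflate the test score.

```python
import random


def stratified_split(rows, labels, test_fraction=0.2, seed=42):
    """Hypothetical helper: split so each class keeps about the same share in train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def oversample_minority(rows, labels, minority=1, seed=42):
    """Hypothetical helper: duplicate random minority (fraud) rows until both classes are equal."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    shortfall = (len(labels) - len(minority_idx)) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(max(shortfall, 0))]
    return rows + [rows[i] for i in extra], labels + [labels[i] for i in extra]


# Hypothetical toy data: 95 legitimate (0) and 5 fraudulent (1) transactions.
rows = [[float(i)] for i in range(100)]
labels = [0] * 95 + [1] * 5

X_train, y_train, X_test, y_test = stratified_split(rows, labels)
X_bal, y_bal = oversample_minority(X_train, y_train)  # balance the training part only
print(len(X_train), len(X_test), sum(y_bal), len(y_bal))
```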

1.4 Block diagram

Figure 1.1: Workflow

Credit Card Data: First, we obtain the dataset containing credit card transaction
details. The dataset typically includes features such as transaction amount, user
details, time of transaction, merchant details, and a label indicating whether a
transaction is fraudulent. Next, the dataset is loaded into a pandas DataFrame or
another suitable data structure, and the first few records are inspected to understand
the structure of the data. Example: a publicly available dataset such as the “Credit
Card Fraud Detection” dataset from Kaggle, which contains the transaction time, the
transaction amount, a set of anonymized (PCA-transformed) features, and a fraud label
(1 for fraudulent, 0 for non-fraudulent).

Data Preprocessing: Data preprocessing cleans and prepares the dataset for analysis
and modeling. It involves handling missing values, converting categorical variables
into numerical formats, normalizing or scaling the data, and dealing with class
imbalance. Categorical variables (if any) are converted into numerical values, and the
data is checked for missing points, which are then addressed. For evaluating the
resulting model, the Area Under the Curve (AUC) is a key metric used for
classification models, including logistic regression. It is the area under the Receiver
Operating Characteristic (ROC) curve, which is a graphical representation of the
model’s performance across all possible classification thresholds.
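
As a rough illustration of what AUC measures: it equals the probability that a randomly chosen fraudulent transaction receives a higher predicted score than a randomly chosen legitimate one, with ties counted as one half. The quadratic-time sketch below computes it directly from that definition on small, hypothetical inputs; for real datasets, `sklearn.metrics.roc_auc_score` is the usual choice.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random fraud > score of a random legitimate transaction), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fraudulent and one legitimate example")
    wins = 0.0
    for p in pos:          # compare every fraud score ...
        for n in neg:      # ... with every legitimate score
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fraud) and predicted fraud probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a model that ranks every fraud above every legitimate transaction, independent of any single threshold.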

Data Analysis: The objective of data analysis is to gain insights into the
distribution of the features and the relationships between them. We conduct
exploratory data analysis (EDA) to understand patterns, trends, and correlations in the
data. Next, we produce a statistical summary, using basic statistical measures (mean,
median, variance, etc.) to understand feature distributions. Correlation analysis checks
for correlations between features to identify the most influential variables for
predicting fraud, and the data is visualized to better understand trends and patterns.
SMOTE (Synthetic Minority Over-sampling Technique) is a technique used to address
class imbalance in classification problems, where the number of instances in one class
(the minority class) is significantly lower than the number of instances in the other
class (the majority class). Imbalanced datasets can lead to poor performance of
machine learning models such as logistic regression, because the model tends to be
biased toward the majority class.
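
A minimal sketch of the core SMOTE idea, assuming numeric features stored in plain Python lists: each synthetic sample is placed at a random point on the line segment between a fraud sample and one of its k nearest fraud neighbours. The function name and parameters are illustrative; the imbalanced-learn package provides a complete implementation as `imblearn.over_sampling.SMOTE`. As with any resampling, it should be applied to the training set only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        others = [j for j in range(len(minority)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(base, minority[j]))[:k]
        neighbour = minority[rng.choice(nearest)]
        gap = rng.random()  # random position on the segment from base to neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical fraud rows from a training set (two numeric features each).
fraud_train = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(fraud_train, n_synthetic=6, k=2)
for point in new_points:
    print([round(v, 3) for v in point])
```

Unlike simply copying fraud rows, the synthetic points are new, slightly different examples, which tends to reduce overfitting to the few real fraud cases.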

Train-Test Split: The dataset is split into training and testing subsets to evaluate
the model’s performance. The training set, typically 80 per cent of the data (a 70-30
split is also common), is used to train the logistic regression model; the test set is
used to evaluate the model’s performance and generalization ability. In K-Fold
Cross-Validation (K-Fold CV), the dataset is divided into K equal (or nearly equal)
parts, called “folds.” The model is trained K times, each time using K-1 folds for
training and the remaining fold for testing. This process is repeated for all K folds,
and the results are averaged to provide an overall performance metric.
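
The K-fold procedure described above can be sketched as an index generator plus an averaging loop. This is an illustrative plain-Python version; `score_fold` is a hypothetical callback that would train the model on the training indices and return a metric on the test indices. Folds are assigned in order without shuffling, so real data should be shuffled first, and for imbalanced fraud data a stratified variant (such as scikit-learn's `StratifiedKFold`) keeps the fraud ratio similar in every fold.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx); every sample appears in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(score_fold, n_samples, k=5):
    """Run the hypothetical `score_fold(train_idx, test_idx)` once per fold and average the scores."""
    scores = [score_fold(train_idx, test_idx) for train_idx, test_idx in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, k=3))
print([len(test) for _, test in folds])
```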

Logistic Regression Model: Here, we initialize the logistic regression model and train
it on the training data. Logistic regression predicts the probability that a transaction
is fraudulent based on input features such as transaction amount, time, location, and
other transactional attributes. This probability is then converted into a class label,
fraudulent (1) or non-fraudulent (0), by comparing it with a threshold. During training,
the model learns a coefficient (weight) for each feature. Logistic regression is a widely
used binary classification algorithm because it is simple and interpretable, and it
performs well when the classes are approximately linearly separable.
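
To make the training step concrete, the sketch below fits a logistic regression by batch gradient descent on the average log-loss: the model computes p = sigmoid(w·x + b), and each update moves the weights against the gradient (p - y)·x. It is a plain-Python illustration on hypothetical one-feature data, without the regularization that library implementations such as scikit-learn's `LogisticRegression` apply by default; the learning rate and epoch count are assumptions chosen for this toy example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the average log-loss (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # p - y
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that transaction x is fraudulent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical one-feature data: small values legitimate (0), large values fraudulent (1).
X = [[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
for x in ([0.2], [3.8]):
    p = predict_proba(w, b, x)
    print(x, round(p, 3), "fraud" if p >= 0.5 else "legitimate")
```

The learned weight is the coefficient referred to above; the 0.5 cut-off is the conventional default threshold and can be moved to trade missed frauds against false alarms.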

Evaluation: Finally, the trained model is used to predict the labels (fraud or
non-fraud) for the test set, and its accuracy, precision, recall, and other relevant
metrics are computed. The test set contains unseen data, so this step shows how well
the model generalizes to new, real-world transactions and how effectively it separates
fraudulent from non-fraudulent ones. This evaluation step is essential for validating
the model’s practical utility.
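
The evaluation step can be illustrated by converting predicted fraud probabilities into labels with a threshold and counting the confusion-matrix cells. The labels and probabilities below are hypothetical, and `confusion_counts` is an illustrative helper; the example also shows that lowering the threshold from 0.5 to 0.35 catches one more fraud here, at the general risk of more false alarms.

```python
def confusion_counts(y_true, probabilities, threshold=0.5):
    """Turn predicted fraud probabilities into labels and count (TP, FP, FN, TN)."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, probabilities):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical test labels (1 = fraud) and model probabilities.
y_test = [0, 0, 1, 1, 0, 1, 0, 0]
probs = [0.10, 0.70, 0.90, 0.40, 0.05, 0.65, 0.20, 0.30]

default_counts = confusion_counts(y_test, probs)
lower_counts = confusion_counts(y_test, probs, threshold=0.35)
print("threshold 0.50 -> TP, FP, FN, TN =", default_counts)
print("threshold 0.35 -> TP, FP, FN, TN =", lower_counts)
```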

1.5 Applications
Credit card fraud detection using logistic regression has several important
applications, primarily in the financial services sector but also in related fields that
involve secure transactions and data analysis. The principles behind such fraud
detection models extend to other industries and sectors that handle sensitive or secure
transactions and need data-driven analysis. Below are some key applications of this
project:

Fraud Detection in Financial Transactions: Detect fraudulent credit card trans-
actions as they occur. Financial institutions can identify and block suspicious trans-
actions before they are processed, minimizing potential financial losses and protecting
customers from fraud. Example: A customer makes a transaction in another country,
and the machine learning model flags it as potentially fraudulent based on transac-
tion patterns, geographic location, or unusual spending behavior. The transaction is
blocked or flagged for further verification.

Credit Card Issuers and Banks Fraud Prevention: Banks and credit card issuers
use fraud detection systems to monitor customer accounts continuously for any
fraudulent activities. By using machine learning models, such as logistic regression,
banks can reduce false positives and improve the accuracy of fraud detection, offering
better protection to customers while minimizing disruptions in legitimate transactions.
Example: A customer’s card is used for a high-value transaction, and the model
identifies it as potentially fraudulent, triggering an automatic notification to the
customer for verification.

Enhancing Security of Online Transactions: With the rise of e-commerce, online
transactions are often a target for fraud. Logistic regression models can help identify
and block fraudulent online purchases. E-commerce platforms can integrate fraud
detection into their payment gateways, providing an additional layer of security for
customers and merchants. Example: A customer on an e-commerce site attempts to
make a purchase using a stolen credit card, but the fraud detection model flags the
transaction as suspicious based on inconsistent shopping behavior and prevents the
transaction from completing.

Customer Behavior Analysis for Fraud Prevention: Machine learning models
can analyze customer behavior to identify normal spending patterns and flag deviations
that might indicate fraudulent activity. By learning a customer’s typical transaction
behavior (e.g., purchase frequency, typical transaction amounts), the system can detect
unusual patterns that may signal fraud. Example: A customer usually makes small
transactions but suddenly makes a large purchase; the model flags this as potentially
fraudulent for further verification.

Improving Customer Trust and Satisfaction: By reducing fraud, financial
institutions can build better trust with customers, assuring them that their
transactions are secure. Fraud detection systems support customer satisfaction by
preventing the financial and emotional burden that comes with fraudulent charges.
Example: A customer’s credit card is flagged for potential fraud, but the system allows
them to approve or reject the transaction immediately via a mobile app or SMS alert,
improving the user experience and trust in the service.

Fraud Detection in Peer-to-Peer Payment Systems: Peer-to-peer (P2P) payment
platforms like GPay, PhonePe and others can use fraud detection algorithms to ensure
transactions are legitimate and to prevent fraud. In this way, P2P systems can minimize
the risk of fraud and protect users from scams while maintaining a user-friendly
experience. Example: A user sends money to an unknown account, and the system flags
the transaction based on unusual behavior, such as rapid transfers or large sums of
money.

1.6 Advantages and Disadvantages


A fraud detection project using logistic regression offers several advantages that
contribute to enhancing security, operational efficiency, and customer satisfaction
within the financial services sector. Here are some key benefits:

Efficient Fraud Detection: Logistic regression is a simple and efficient machine
learning algorithm that can classify transactions as either fraudulent or non-fraudulent
based on various features (e.g., transaction amount, merchant, time). It provides
quick and effective identification of fraudulent transactions in real time, minimizing
financial losses and protecting both customers and businesses from fraud.

Interpretability and Transparency: Logistic regression is considered a transparent
and interpretable model, meaning that the results of the model can be easily
understood. The coefficients of the model indicate how each feature (e.g., transaction
amount, time, customer location) impacts the likelihood of fraud, making it easier
for businesses to explain the model’s decisions and build trust with customers and
regulators.
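
For a fitted logistic regression, log(p / (1 - p)) = b + w1·x1 + ... + wn·xn, so increasing feature xj by one unit, with the others held fixed, multiplies the odds of fraud by exp(wj). The sketch below applies this to hypothetical coefficients; the feature names and values are invented for illustration and are not the project's fitted model. Comparing coefficient sizes across features is only meaningful when the features are on comparable scales, for example after standardization.

```python
import math


def odds_ratios(coefficients):
    """Map each coefficient w_j to exp(w_j), the odds multiplier for a one-unit increase in feature j."""
    return {name: math.exp(w) for name, w in coefficients.items()}


# Hypothetical coefficients (names and values are illustrative only).
coefs = {"amount_scaled": 0.7, "night_time": 1.2, "known_merchant": -0.9}
ratios = odds_ratios(coefs)
for name, ratio in ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds x{ratio:.2f} ({direction} the odds of fraud)")
```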

Handling of Imbalanced Data: Credit card fraud datasets often suffer from class
imbalance, where fraudulent transactions are far fewer than legitimate ones. Logistic
regression can handle imbalanced data through techniques like class weighting and
oversampling of the minority class. These techniques improve the detection of fraudulent
transactions by keeping the model from being biased toward the majority class
(non-fraudulent transactions), thus reducing false negatives.
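
One common form of class weighting is the "balanced" heuristic behind scikit-learn's `class_weight="balanced"` option: each class gets the weight n_samples / (n_classes * count(class)), and each example's log-loss term is multiplied by its class weight. The plain-Python sketch below, with a hypothetical 98:2 label split and illustrative helper names, shows how a missed fraud then costs far more than a false alarm made with the same confidence.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """weight(c) = n_samples / (n_classes * count(c)); rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, probabilities, weights):
    """Average log-loss with each example's term scaled by the weight of its true class."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical training labels: 98 legitimate, 2 fraudulent.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)

missed_fraud = weighted_log_loss([1], [0.1], weights)  # fraud given only 10% probability
false_alarm = weighted_log_loss([0], [0.9], weights)   # legitimate given 90% fraud probability
print(round(missed_fraud, 2), round(false_alarm, 2))
```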

Enhanced Customer Experience: By quickly identifying fraudulent transactions
and reducing the number of false positives, logistic regression helps ensure that
legitimate transactions are not blocked or delayed. This leads to a better customer
experience, as customers don’t face unnecessary interruptions in their payments.

While Credit Card Fraud Detection using Logistic Regression offers several advantages,
there are also certain disadvantages to consider, especially when applied to
complex and dynamic environments such as financial transactions. Here are some key
drawbacks of this project:

Limited Model Complexity: Logistic regression is a relatively simple linear model, which may not capture the complexity of relationships in the data, particularly in fraud detection, where patterns can be non-linear and multifaceted. Fraud often involves intricate relationships between features, and unless such interactions are engineered into the input features, logistic regression may fail to capture them, leading to lower accuracy than more flexible models such as Random Forests, Support Vector Machines with non-linear kernels, or Neural Networks.
Example: If fraud behavior involves subtle, non-linear correlations between transaction features, logistic regression might struggle to identify these patterns and miss some fraudulent transactions.
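The limitation can be illustrated with a synthetic XOR-style pattern, a hypothetical case in which a transaction counts as "fraud" only when two features share the same sign. This sketch assumes NumPy and scikit-learn. A plain logistic regression cannot separate such data, while adding degree-2 terms (including the interaction x1*x2) with `PolynomialFeatures` lets the same linear model separate it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear pattern: label 1 when both features share a sign
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain logistic regression: a single linear decision boundary
linear = LogisticRegression().fit(X_train, y_train)

# Same model on degree-2 features (1, x1, x2, x1^2, x1*x2, x2^2);
# the x1*x2 term makes the classes linearly separable
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LogisticRegression(max_iter=1000))
poly.fit(X_train, y_train)

acc_linear = linear.score(X_test, y_test)
acc_poly = poly.score(X_test, y_test)
print(f"linear features:   accuracy = {acc_linear:.3f}")
print(f"degree-2 features: accuracy = {acc_poly:.3f}")
```

Feature engineering of this kind can partly offset the linearity limitation, but the relevant interactions must be known or searched for in advance, whereas tree-based and neural models can learn such interactions without explicit feature engineering.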

Bias Toward Majority Class: In imbalanced datasets, logistic regression models can be biased toward the majority class (non-fraudulent transactions), especially when no additional measures (e.g., class weighting, oversampling) are taken. This can lead to a high number of false negatives (fraudulent transactions incorrectly classified as legitimate) and low recall on actual fraud. Because the class distribution is so skewed, such a model can still report high overall accuracy while detecting very few frauds, so accuracy alone is a misleading measure of its performance.
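The effect of imbalance on evaluation can be seen with a degenerate classifier that labels every transaction as legitimate. This minimal sketch assumes scikit-learn and uses hypothetical labels with 2 frauds in 1,000 transactions: accuracy looks excellent while fraud recall is zero.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 998 legitimate (0) and 2 fraudulent (1) transactions
y_true = [0] * 998 + [1] * 2
# A degenerate "model" that always predicts the majority class
y_pred = [0] * 1000

accuracy = accuracy_score(y_true, y_pred)      # 998 / 1000 correct
fraud_recall = recall_score(y_true, y_pred)    # 0 of 2 frauds detected
print(f"accuracy = {accuracy:.3f}, fraud recall = {fraud_recall:.1f}")
```

This is why recall, precision, and related measures are preferred over raw accuracy for fraud detection.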

Limitations in Multi-Class Classification: Logistic regression is naturally suited to binary classification (fraud vs. non-fraud). If different types of fraud must be distinguished (e.g., card-not-present fraud, account takeover, transaction fraud), it has to be extended through a one-vs-rest or multinomial (softmax) formulation, and because each decision boundary remains linear, it may still struggle to separate these fraud types accurately.
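The multinomial extension can be sketched as follows, assuming scikit-learn: with more than two classes and the default `lbfgs` solver, `LogisticRegression` fits a softmax model directly. The three class labels (legitimate, card-not-present fraud, account takeover) are hypothetical and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical classes: 0 = legitimate, 1 = card-not-present fraud,
# 2 = account takeover (synthetic data, not a real fraud dataset)
X, y = make_classification(n_samples=3000, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# With the default lbfgs solver and more than two classes,
# scikit-learn fits a multinomial (softmax) logistic regression
model = LogisticRegression(max_iter=1000).fit(X, y)

# One probability per class for each transaction; each row sums to 1
proba = model.predict_proba(X[:5])
print(np.round(proba, 3))
```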

Difficulty in Handling Imbalanced Data: Fraud detection datasets are typically imbalanced, with far more legitimate transactions than fraudulent ones. Although logistic regression can be adapted to class imbalance (e.g., through resampling or penalization), it is generally less effective at handling severe imbalance than tree-based or ensemble methods. Under extreme imbalance, logistic regression may struggle to identify fraudulent transactions, resulting in a high false-negative rate, where frauds are mistakenly classified as legitimate transactions.
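Random oversampling, one of the resampling options mentioned above, can be sketched with `sklearn.utils.resample`. This minimal example assumes NumPy and scikit-learn and uses a hypothetical training split with 10 frauds among 1,000 rows; rows of the minority class are drawn with replacement until both classes are the same size.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical training split: 990 legitimate (0) and 10 fraudulent (1) rows
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))
y_train = np.array([0] * 990 + [1] * 10)

X_fraud = X_train[y_train == 1]
X_legit = X_train[y_train == 0]

# Draw fraud rows with replacement until they match the legitimate count
X_fraud_up = resample(X_fraud, replace=True, n_samples=len(X_legit),
                      random_state=0)

X_balanced = np.vstack([X_legit, X_fraud_up])
y_balanced = np.array([0] * len(X_legit) + [1] * len(X_fraud_up))
print(np.bincount(y_balanced))  # class counts after oversampling
```

Resampling should be applied after the train/test split and only to the training data; otherwise copies of the same fraud rows can appear in both sets and inflate the evaluation scores. The imbalanced-learn library offers `RandomOverSampler` and `SMOTE` for the same purpose.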

1.7 Organisation of Report
Chapter 1: Introduction: This chapter introduces the project, discusses its motivation and objectives, gives an overview of the methodology used, and outlines its advantages, disadvantages, and applications.

Chapter 2: Literature Survey: This chapter summarizes the papers used as references, explains the methodology behind each, and describes how they contributed to our project and which ideas we adopted from them.

Chapter 3: System Analysis: This chapter covers the technical details of the project, such as the software requirement specification, the hardware requirement specification, and the high-level design.

Chapter 4: Implementation: This chapter describes the implementation of the project step by step, explains the methodology behind it, and presents the coding guidelines followed during development.

Chapter 5: Discussion of Results: This chapter discusses the outputs and results obtained after completing the project, including the efficiency of the model and the accuracy of its predictions.

Chapter 6: Conclusion and Future Scope: This chapter presents the conclusions of the project and the future work, which aims to address probable loopholes that may appear as conditions change over time.



Chapter 2
Literature Survey

In this chapter, we will summarize the research paper(s) we used as references for our
credit card fraud detection project, explaining the methodology behind the work and
how it directly contributed to the development of our project. This includes how we
adapted ideas from the referenced papers, incorporated them into our approach, and
customized these methodologies to suit our specific project goals.

Fraud Detection Systems (FDS) are automated, machine-learning-based solutions that credit card companies employ to detect fraudulent transactions even before the end user reports them. The goal of such a system is to detect a fraudulent transaction before it is committed to the database and thus prevent the fraud from taking place. An ideal FDS should also minimize false detections, in which a genuine transaction is interrupted, causing inconvenience to the end user. Machine learning algorithms learn a computational model from large amounts of example data from the underlying domain and use it to classify future data from that domain. One class of these algorithms, called Supervised Learning Algorithms, requires the example data to be pre-labeled with their classes.

On the other hand, another class of algorithms uses Unsupervised Learning, in which the data is clustered into groups of similar samples and each group is treated as one class. Many algorithms based on both approaches have been proposed in the literature. An FDS collects large amounts of historical data on which to perform these computations. However, transaction datasets are typically imbalanced, with normal transactions far outnumbering fraudulent ones. The referenced work outlines and evaluates various popular machine learning algorithms with respect to their capability to correctly classify fraudulent transactions in a real-world imbalanced dataset[1].

Multiple supervised and semi-supervised machine learning techniques are used for fraud detection, but the aim is to overcome three main challenges with card fraud datasets: strong class imbalance, the inclusion of both labelled and unlabelled samples, and the need to process a large number of transactions. Different supervised machine learning algorithms, such as Decision Trees, Naive Bayes Classification, Least Squares Regression, Logistic Regression and SVM, are used to detect fraudulent transactions in real-time datasets. Two random forest methods, random-tree-based random forest and CART-based random forest, are used to learn the behavioural features of normal and abnormal transactions. Although random forest obtains good results on small datasets, problems remain with imbalanced data. Future work will focus on solving this problem by improving the random forest algorithm itself.

The performance of Logistic Regression, K-Nearest Neighbour, and Naïve Bayes is analysed on highly skewed credit card fraud data, and research is carried out on meta-classifiers and meta-learning approaches for handling highly imbalanced credit card fraud data. Although supervised learning methods can be used, they may fail to detect fraud in certain cases. A model combining a deep autoencoder and a restricted Boltzmann machine (RBM) reconstructs normal transactions in order to find anomalies that deviate from normal patterns. In addition, a hybrid method is developed that combines AdaBoost and majority voting[2].
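A hard majority-voting ensemble that includes AdaBoost can be sketched with scikit-learn's `VotingClassifier`. This is an illustration under stated assumptions, not the method of [2]: the choice of base learners (AdaBoost, logistic regression, a depth-5 decision tree) and all parameters are hypothetical, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for transaction features (~5% fraud)
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1)

# voting="hard": each base classifier casts one vote per transaction
# and the majority label wins (hypothetical choice of base learners)
ensemble = VotingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(n_estimators=100, random_state=1)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=1)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)
print(f"test accuracy: {ensemble.score(X_test, y_test):.3f}")
```

With an odd number of voters and two classes, hard voting never ties. As in the rest of this chapter, accuracy on such skewed data should be read together with fraud recall.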

It is essential that credit card companies are able to detect fraudulent transactions so that customers are not charged for items they did not purchase. Data science and machine learning can be used to solve this problem, and their importance could not be greater. Financial fraud occurs when someone deprives you of your money or otherwise harms your financial well-being through deception or other illegal means. Billions of dollars' worth of financial fraud is committed every year. According to the Federal Trade Commission (FTC), the number of theft reports has more than doubled in the last two years. One of the major types of financial fraud is credit card fraud.

As the number of online transactions grows, so does the number of credit card frauds. An effective solution is needed to reduce losses due to fraudulent transactions at an early stage, and machine learning algorithms offer an effective way to detect credit card fraud. The referenced paper examines the latest advances and applications in machine learning-based credit card fraud detection, analysing and comparing four machine learning algorithms on the basis of their accuracy. CatBoost was found to perform best, detecting credit card fraud with an accuracy of 99.87%. The dataset for credit card fraud detection was taken from Kaggle[3].

Research on fraud detection using machine learning in credit card problems has received considerable attention. The paper considers popular supervised classification algorithms such as Logistic Regression. It also explores unsupervised methodologies such as K-means clustering and considers hybrid models that combine several models. It describes data pre-processing techniques that enhance model performance with respect to feature selection, data cleaning, and class imbalance. To judge the effectiveness of the approaches considered, it also uses evaluation criteria such as accuracy and precision.

However, challenges remain, including concept drift, limited data availability, and limited interpretability and explainability. Future research opportunities include ensemble methods, deep learning architectures, additional data sources and real-time fraud detection systems. Machine learning can identify many patterns in credit card data to combat fraud; however, explaining the model and keeping it up to date remain problematic because fraud patterns in credit card activity keep changing.

The introduction of credit cards and similar payment methods has led to the broad adoption of online transactions in daily life. A credit card is a plastic card used for purchases and cash withdrawals. This invention made trading faster, thereby improving business and increasing economic activity. Banks introduced cards to give consumers convenient and efficient purchasing without needing immediate cash. Furthermore, by streamlining payments and improving the overall user experience, cards encourage spending, and they help people establish credit histories, which are useful for various financial endeavours, including applying for loans and mortgages. Credit cards also benefit banks by creating new revenue streams through the interest charged on card usage, thereby facilitating the development of a credit-based economy. Credit card fraud remains a global problem: with the growth of online commerce and the increasing use of credit cards, credit card fraud has become more frequent[4].

During the survey, we found many models created by other researchers, showing that considerable effort has gone into solving the credit card fraud problem. The Najdat team based their model on bidirectional long short-term memory (BiLSTM), while other researchers tried different data-splitting ratios, which produced different accuracies. Sahin and Duman used Support Vector Machines (SVM) with RBF, polynomial, sigmoid, and linear kernels.

Among the four models studied in this review, the lowest reported accuracies were 54.86% for KNN and 36.40% for logistic regression, both scored by Awoyemi and his team; for Naïve Bayes, the lowest accuracy was 80.4%, scored by Gupta and his team; and for SVM, the lowest was 94.65%, scored by Jain's team. To compare the four models, the average of the best three accuracies reported for each model was calculated: 98.72% for KNN, 98.11% for logistic regression, 98.85% for Naïve Bayes and 96.16% for Support Vector Machine. By this measure Naïve Bayes has the highest average, but KNN, Naïve Bayes and logistic regression lie within 0.74 percentage points of one another, so logistic regression performs comparably to the best models in the literature reviewed.

A credit card is a card issued to a customer (the cardholder) that enables them, among other things, to buy goods and services within a credit limit or to withdraw cash in advance, typically when they do not have funds available at the time. Credit cards give cardholders the benefit of time, allowing them to postpone payments beyond the due date by rolling them over to the following billing cycle. The payments industry uses a process called credit card fraud detection, which relies on historical data, to determine whether a transaction is fraudulent. Detecting fraudulent credit card transactions can be challenging because it involves identifying the unauthorized use of a credit card by someone who has no authority over the account[5].

Machine learning (ML) algorithms are used to assess all authorized transactions and flag any that appear suspicious. Investigators then contact the cardholders to ask whether each flagged transaction was genuine or fraudulent. In the referenced work, the conclusions are based on the datasets described in its methodology section, and the work closes with a conclusion and suggestions for further research on related topics. Credit card fraud occurs when a fictitious source of funds is created in a transaction using a credit card. Most credit card fraud detection methods are founded on artificial intelligence, meta-learning, and pattern matching.

Credit card misuse can be prevented by taking the required precautions, and studying how fraudulent acts behave makes it possible to minimize them and stop them from recurring in the future[6].


A further study[7] covers the same ground: it stresses the need to reduce losses from credit card fraud at an early stage, compares four machine learning algorithms on the basis of their accuracy using a credit card fraud dataset from Kaggle, and finds that CatBoost performs best, with an accuracy of 99.87%.

Credit card fraud poses a significant threat to the financial sector, resulting in substantial financial losses. This research investigates the application of advanced machine learning techniques to effectively detect fraudulent transactions. By utilizing a publicly available dataset, this study evaluates and compares various algorithms, including Random Forest, Decision Tree, and Artificial Neural Networks (ANN). This research contributes to the advancement of fraud detection technologies by providing valuable insights into the effectiveness of different machine learning algorithms and the critical role of feature engineering. The findings highlight the need for resilient, scalable, and real-time mechanisms to combat evolving fraud strategies.


There are many types of fraud in daily life, and one that is increasingly common is credit card fraud. Wherever people around the globe make credit card transactions, some of those transactions will be fraudulent. To prevent credit card fraud, we must know its patterns and how fraudulent transactions differ from legitimate ones. One paper proposes credit card fraud detection using machine learning on labelled data to differentiate fraudulent from legitimate transactions; the experiment was conducted using supervised machine learning techniques.

A related study analyses the performance of logistic regression on highly skewed credit card fraud data and examines meta-classifiers and meta-learning approaches for handling such data, a deep autoencoder and restricted Boltzmann machine (RBM) model that reconstructs normal transactions to detect anomalies, and a hybrid method combining AdaBoost and majority voting[8].

Financial fraud is an ever-growing menace with far-reaching consequences for the finance industry, corporate organizations, and government. Fraud can be defined as criminal deception with the intent of acquiring financial gain. Growing dependence on internet technology has increased the number of credit card transactions. As credit cards become the most prevalent mode of payment for both online and offline transactions, the credit card fraud rate also accelerates. Credit card fraud can be either inner card fraud or external card fraud. Inner card fraud results from collusion between cardholders and the bank, using a false identity to commit fraud, while external card fraud involves using a stolen credit card to obtain cash through dubious means. Much research has been devoted to detecting external card fraud, which accounts for the majority of credit card frauds. Detecting fraudulent transactions with traditional manual methods is time-consuming and inefficient, and the advent of big data has made manual methods even more impractical. Consequently, financial institutions have turned their attention to recent computational methods for handling the credit card fraud problem.

Financial fraud is an ever-growing menace with far-reaching consequences in the financial industry. Data mining has played an important role in detecting credit card fraud in online transactions. Credit card fraud detection, a data mining problem, is challenging for two major reasons: first, the profiles of normal and fraudulent behaviour change constantly, and second, credit card fraud datasets are highly skewed. The performance of fraud detection in credit card transactions is greatly affected by the sampling approach applied to the dataset, the selection of variables, and the detection technique(s) used. This paper investigates the performance of logistic regression on highly skewed credit card fraud data. The dataset of credit card transactions, sourced from European cardholders, contains 284,807 transactions[9].

Credit card fraud encompasses unauthorised transactions using stolen or compromised credit card information. In research that involved a random sample of consumer complaint filings in the Consumer Financial Protection Bureau (CFPB), nearly one-quarter of complaints, almost 60 percent of complaint reports contain expressions of emotional distress, and many mention financial hardship. As online transactions increase, evolving fraud patterns and increasingly sophisticated methods make fraud a severe crime that causes substantial financial loss. Although this is a significant challenge, machine learning has emerged as an effective weapon against the growing crime of credit card fraud. In Europe, credit card theft increased to 1.33 billion euros in 2012, a 14.8 percent increase from the previous year.

Challenges arise when current fraud detection systems do not sufficiently tackle new and evolving fraud patterns. Credit card fraud is a significant concern that damages citizens, firms, and the economy. Credit card fraud cost an estimated 27.85 billion just last year, a 16.2 percent increase from 23.97 billion in 2017. Most victims of credit card fraud experience severe consequences: they often lose money, experience emotional distress, and find it difficult to regain credit in the future. Credit card fraud can also damage a business's reputation and make it difficult to attract new customers. In Europe, credit card fraud reached €1.33 billion in 2012. Fraudsters continually alter their strategies to evade detection, rendering traditional tools such as expert rules inadequate for effective fraud prevention. Therefore, real-time detection and analysis of patterns and trends through machine learning is necessary to uncover new fraud schemes that standard rule-based systems may not identify.

A hybrid technique of under-sampling and oversampling is applied to the skewed data. The three techniques are applied to both the raw and the preprocessed data, and the work is implemented in Python. The performance of the techniques is evaluated based on accuracy, sensitivity, specificity, precision, Matthews correlation coefficient and balanced classification rate. The results show optimal accuracies of 97.92%, 97.69% and 54.86% for the logistic regression model classifiers[10].
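The six evaluation measures listed above can all be derived from a binary confusion matrix. This minimal sketch assumes scikit-learn and uses hypothetical labels for ten transactions (1 = fraud). The balanced classification rate is taken here as the mean of sensitivity and specificity, which is an assumption about the exact definition used in [10].

```python
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical labels for 10 transactions (1 = fraud)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# For labels {0, 1}, ravel() yields counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)           # recall on the fraud class
specificity = tn / (tn + fp)           # recall on the legitimate class
precision = tp / (tp + fp)             # share of flagged transactions that are fraud
bcr = (sensitivity + specificity) / 2  # balanced classification rate (assumed definition)
mcc = matthews_corrcoef(y_true, y_pred)

print(accuracy, sensitivity, specificity, precision, bcr, mcc)
```

Here accuracy is 0.8 even though half the frauds are missed; sensitivity, balanced classification rate and MCC expose that weakness more clearly.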

Credit card fraud is a wide-ranging term for theft and fraud committed using, or involving, a payment card at the time of payment. The purpose may be to obtain goods without paying or to transfer unauthorized funds from an account. Credit card fraud is also an adjunct to identity theft. According to the United States Federal Trade Commission, the rate of identity theft held stable during the mid-2000s but increased by 21 percent in 2008. However, credit card fraud, the crime most people associate with ID theft, decreased as a percentage of all ID theft complaints. In 2000, out of 13 billion transactions made annually, approximately 10 million, or one in every 1,300 transactions, turned out to be fraudulent. It would be worthwhile to investigate additional information sources for building credit card fraud detection systems, going beyond transactional data such as that discussed above to include user behaviour patterns, geolocation, and device fingerprints.

Such additional data sources could give machine learning models more comprehensive information about user activity and, consequently, improve their accuracy in detecting possible fraud. Another promising area of research and development is real-time credit card fraud detection. Financial institutions should develop models that analyse transactions as quickly as possible, so that fraud-related activity is spotted before it causes substantial monetary losses.

Logistic regression is a classification algorithm used to predict a binary outcome (1/0, Yes/No, True/False) from a given set of independent variables. Dummy variables are used to represent binary or categorical predictors. Logistic regression can be viewed as a special case of linear regression in which the dependent variable is categorical: the log of the odds is used as the dependent variable, and the model predicts the probability of an event by fitting the data to a logistic function[11].
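The model described above can be written as follows, where x_1, ..., x_n are the independent variables, β_0, ..., β_n are the fitted coefficients, and p is the predicted probability of fraud. The decision threshold τ is an assumption added here for illustration: 0.5 is a common default, but fraud systems often tune it to trade false positives against false negatives.

```latex
% Linear predictor and logistic (sigmoid) function
z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
p = P(y = 1 \mid x_1, \ldots, x_n) = \frac{1}{1 + e^{-z}}

% Equivalently, the log of the odds is linear in the inputs
\log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n

% Classification with a threshold \tau (assumed, e.g. \tau = 0.5)
\hat{y} = \begin{cases} 1 & \text{if } p \ge \tau \\ 0 & \text{otherwise} \end{cases}
```

Each e^{β_i} is the odds ratio for feature x_i, which is what makes the coefficients of a logistic regression model directly interpretable.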


Financial fraud has become a very crucial issue in corporate and financial business. It also harms businesses, causes economic instability, and affects people's cost of living. The major types of financial fraud today include credit card fraud, mortgage fraud, money laundering, financial statement fraud, securities and commodities fraud, automobile insurance fraud and healthcare fraud. The paper focuses on credit card fraud and its detection techniques and identifies machine learning algorithms as an effective way to detect it. It examines the latest advances and applications in the field of machine learning-based credit card fraud detection[12].



Chapter 3
Software Requirements

The software requirements outlined in this chapter define the features, functionality, and system constraints needed to develop the fraud detection system. These include real-time data processing, integration with transaction databases, and the ability to train and update machine learning models on historical transaction data. Key aspects of the system include data pre-processing, model training, anomaly detection, decision support, and system performance metrics, so that the solution can balance detection accuracy against response time. The software must also support different fraud detection techniques, including supervised and unsupervised learning models, to adapt to various fraud scenarios.

3.1 Software
The software used is Google Colab, a cloud-based interactive development environment (IDE) that allows users to write and execute Python code in a web-based notebook. Colab runs Python 3 (support for Python 2.7 has been discontinued), making it a versatile tool for a wide range of coding tasks. Google Colab is particularly useful for data analysis, machine learning, and deep learning projects, as it provides free, usage-limited access to computing resources such as GPUs and TPUs. It also integrates with Google Drive, making it easy to store and share notebooks. With its user-friendly interface and collaborative features, Google Colab is a convenient choice for both beginners and advanced developers working on Python projects.

3.2 Description of Software used


3.2.1 Operating System
An operating system (OS) is a crucial software layer that manages a computer's hardware and software resources, enabling users and programs to interact with the machine effectively. It acts as an intermediary between hardware and applications, ensuring that tasks are performed efficiently. One of its primary functions is process management, which involves handling the execution of programs and allocating system resources such as CPU time. Memory management is another essential function: the OS allocates and tracks memory usage so that running programs do not interfere with each other. Fraud detection systems often need to work in real time to monitor transactions as they occur. The OS plays a critical role in managing network communication between systems, such as when data is transferred from point-of-sale (POS) systems to fraud detection servers.

It provides the TCP/IP networking stack on which HTTP requests and the data-streaming protocols used for continuous monitoring run. In fraud detection, APIs are used to send transaction data from one system to another (e.g., from a payment gateway to a fraud detection service), enabling real-time fraud scoring. From managing system resources (CPU, memory, and storage) to securing sensitive data and ensuring smooth real-time communication, the OS is integral to the successful operation of a fraud detection system. It helps ensure that fraud detection models can function efficiently, securely, and reliably in detecting and preventing credit card fraud.
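The API-based scoring flow described above can be sketched as a single Python function that receives a JSON-encoded transaction and returns a JSON response, assuming NumPy and scikit-learn. Everything specific here is hypothetical: the field names (`transaction_id`, `amount`, `hour`), the synthetic labelling rule used to train the toy model, and the 0.5 flagging threshold. In a real deployment the function would sit behind an HTTP endpoint served by a web framework, which is not shown.

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy model on synthetic data; "amount" and "hour" are hypothetical fields
rng = np.random.default_rng(0)
amount = rng.exponential(scale=100.0, size=5000)
hour = rng.integers(0, 24, size=5000)
X_train = np.column_stack([amount, hour])
y_train = (amount > 400).astype(int)  # synthetic rule, not real fraud labels
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

FRAUD_THRESHOLD = 0.5  # hypothetical operating point


def score_transaction(payload: str) -> str:
    """Score one JSON-encoded transaction and return a JSON response."""
    txn = json.loads(payload)
    features = np.array([[float(txn["amount"]), float(txn["hour"])]])
    probability = float(model.predict_proba(features)[0, 1])
    return json.dumps({
        "transaction_id": txn.get("transaction_id"),
        "fraud_probability": round(probability, 4),
        "flagged": probability >= FRAUD_THRESHOLD,
    })


response = score_transaction('{"transaction_id": "T1", "amount": 950.0, "hour": 3}')
print(response)
```

Keeping the model loaded in memory and scoring one transaction per call is what makes this pattern suitable for real-time checks; retraining would happen offline, with the updated model swapped in.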

Specifications:
Processor: Intel Core i7 / AMD Ryzen 7 or higher. A multi-core processor with a high clock speed is crucial for faster computation, especially during model training.
Storage: At least a 256 GB SSD is ideal, but 512 GB or more is better for handling large datasets and multiple models.
Graphics: Intel integrated graphics or a basic GPU.

3.2.2 Python IDE


A Python Integrated Development Environment (IDE) is a software application that
provides comprehensive tools to help developers write, test, and debug Python code
more efficiently. IDEs offer features such as code completion, syntax highlighting,
debugging tools, version control, and integration with libraries and frameworks. Most
Python IDEs include an advanced code editor with syntax highlighting, automatic
indentation, and line numbering, which make it easier to write clean, error-free code.
IDEs also often provide intelligent code completion (also known as autocomplete or
IntelliSense), which suggests possible completions as you type, helping you write
code faster and with fewer errors.

Python IDEs support the whole workflow of building and deploying a credit card fraud
detection system with machine learning: data preprocessing, model building,
debugging, and testing, together with features for collaboration, version control, and
performance optimization. By improving productivity and code quality, they help
make the fraud detection system robust, efficient, and ready for deployment in
real-world environments, whether the project is carried out by one person or by a
team.

Dept of ECE, RNSIT, Bengaluru 21

Within an IDE, data preprocessing scripts can be written and run in Python using
libraries such as pandas, NumPy, and scikit-learn. Data preparation (cleaning,
normalization, handling missing values, etc.) is one of the most time-consuming tasks
in a machine learning project, and an IDE makes it more efficient. IDEs also support
plotting libraries such as Matplotlib, Seaborn, and Plotly, which help visualize
patterns in the data (e.g., the distribution of fraudulent vs. legitimate transactions,
or correlations between features).

3.2.3 Google Colab


Google Colab (short for Colaboratory) is a free, cloud-based platform developed by
Google that allows users to write, execute, and share Python code in a Jupyter
notebook interface. It is popular among data scientists, machine learning
practitioners, researchers, and educators because it requires no local setup, runs
code interactively, and provides access to hardware accelerators such as GPUs and
TPUs. Its cloud-based nature, accelerator support, and collaboration features make it
well suited to machine learning, data analysis, and research, for beginners and
experienced developers alike.

For credit card fraud detection, Google Colab offers a convenient environment in which
to develop, train, and evaluate machine learning models. Free (usage-limited) access
to GPUs/TPUs, preinstalled popular libraries, collaboration features, and the ability
to work with large datasets significantly speed up the development cycle. It is well
suited to experimenting with algorithms, visualizing data, and building end-to-end
fraud detection prototypes in a collaborative way.



Chapter 4
Project Design And Architecture

Designing the architecture for “Credit Card Fraud Detection Using Machine Learning”
involves a systematic approach that integrates multiple components and stages to
detect fraudulent transactions efficiently. The architecture is built to process large
datasets, train a model, and make predictions in real time. The key stages are:

4.1 Dataset
The use of datasets in credit card fraud detection is fundamental to training machine
learning models that can identify fraudulent transactions. These datasets typically
contain transaction records, including features such as transaction amount, merchant
information, time of transaction, geographical location, user ID, and past transaction
history. The most critical aspect of these datasets is their ability to represent both
legitimate and fraudulent transactions, with fraud being a much smaller subset, mak-
ing the data highly imbalanced. This imbalance requires special handling techniques
such as oversampling, undersampling, or generating synthetic data to improve model
performance. The dataset is usually split into training, validation, and test sets to
ensure the model can generalize well to unseen data.

Figure 4.1 shows a preview of the first five rows of the dataset, which has 31 columns.
The table starts with a “Time” column, followed by the numerical features V1, V2, and
so on up to V28, then the “Amount” column and the “Class” label. The values in the V
columns include both positive and negative numbers, many of them fractions, which
indicates that they have already been transformed.

“Class” marks each transaction as legitimate (0) or fraudulent (1), and the remaining
30 columns are the inputs for the classification task. The class counts reported later
in this chapter (284,315 legitimate and 492 fraudulent transactions) match the widely
used public credit card fraud dataset, in which “Time” is the number of seconds
elapsed since the first transaction in the dataset and V1 to V28 are principal
components obtained with a PCA transformation; the original transaction attributes
are withheld for confidentiality. Because these features arrive already transformed,
the dataset is ready for classification or anomaly detection with little additional
preprocessing.

Figure 4.1: Dataset example

4.2 Data Preprocessing and Feature Designing:


Data Cleaning: Clean the data to deal with missing values, outliers, and
irregularities in the dataset. Removing Duplicates: identical transactions may appear
more than once, so duplicate entries need to be identified and removed. Handling
Missing Values: missing data can be handled through imputation (replacing missing
values with the mean, median, or mode) or by dropping rows or columns with excessive
missing data. Dealing with Outliers: some transaction amounts or other values may lie
far outside the typical range. Identifying and handling these outliers (via trimming,
capping, or transformation) is important for model stability.
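The three cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's code: the function name clean_transactions, median imputation, and capping at the 1st/99th percentiles are assumptions chosen for the example, and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def clean_transactions(df: pd.DataFrame, amount_col: str = "Amount",
                       label_col: str = "Class") -> pd.DataFrame:
    """Illustrative cleaning: drop duplicates, impute missing values, cap outliers."""
    # 1. Drop exact duplicate rows (the same transaction recorded twice).
    out = df.drop_duplicates().copy()

    # 2. Impute missing numeric feature values with the column median
    #    (the label column is deliberately left untouched).
    feature_cols = [c for c in out.select_dtypes(include="number").columns
                    if c != label_col]
    out[feature_cols] = out[feature_cols].fillna(out[feature_cols].median())

    # 3. Cap amounts at the 1st and 99th percentiles (winsorizing).
    low, high = out[amount_col].quantile([0.01, 0.99])
    out[amount_col] = out[amount_col].clip(lower=low, upper=high)
    return out


# Made-up rows: one duplicate, one missing amount, one extreme amount.
raw = pd.DataFrame({
    "Time": [0, 0, 1, 2, 3],
    "Amount": [10.0, 10.0, np.nan, 5000.0, 20.0],
    "Class": [0, 0, 0, 1, 0],
})
clean = clean_transactions(raw)
print(clean)
```

Capping is shown as only one option: in fraud data the extreme amounts can themselves be a useful signal, so outliers are worth inspecting before they are trimmed.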

Feature Transformation: Normalize, scale, or encode features as required for machine
learning model input. In credit card fraud detection, several features can be
engineered to identify abnormal behaviors that may indicate fraudulent activity.
Transaction Amount is a key feature, as significant deviations from a user’s average spending
patterns could suggest fraudulent behavior. Similarly, Transaction Frequency within
a specific time period, such as an hour or day, may reveal unusual patterns, especially
if a user makes several high-value transactions in a short time. Merchant Patterns
also play an important role; if a user frequently transacts with certain merchants,
any sudden deviation from these patterns may raise a red flag. The Time of Day
is another critical feature, as transactions occurring during atypical hours—such as
late-night or early-morning transactions—can be suspicious. Geographical Location
is a strong indicator as well; if a user makes a transaction in a location or country
far from their typical locations, it could signal fraud. Lastly, Transaction Velocity,
which measures how quickly a user completes multiple transactions in a short period,
is a useful feature in detecting rapid, potentially fraudulent activity. By analyzing
these features together, machine learning models can effectively detect irregularities
and prevent fraudulent transactions.
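The dataset used in this project (Figure 4.1) has no card, merchant, or location columns, so the behavioral features described above cannot be computed from it directly. The sketch below therefore assumes a hypothetical raw transaction log with columns card_id, timestamp, and amount, and derives four of them: spending relative to the card's average, hour of day, velocity, and frequency within an hour. The column names and the helper add_behavior_features are illustrative only.

```python
import pandas as pd


def add_behavior_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Derive per-card behavioral features from a hypothetical transaction log.

    Assumed columns: card_id, timestamp (datetime64), amount.
    """
    tx = tx.sort_values(["card_id", "timestamp"]).copy()

    # Spending relative to the card's own average amount (1.0 = typical).
    tx["amount_vs_card_mean"] = (
        tx["amount"] / tx.groupby("card_id")["amount"].transform("mean")
    )

    # Hour of day, to expose late-night or early-morning activity.
    tx["hour"] = tx["timestamp"].dt.hour

    # Velocity: seconds since the same card's previous transaction (NaN for the first).
    tx["secs_since_prev"] = (
        tx.groupby("card_id")["timestamp"].diff().dt.total_seconds()
    )

    # Frequency: number of transactions by the same card in the same clock hour.
    hour_bucket = tx["timestamp"].dt.floor("h").rename("hour_bucket")
    tx["tx_same_hour"] = (
        tx.groupby(["card_id", hour_bucket])["amount"].transform("count")
    )
    return tx


# Made-up log: card A has a large 03:05 purchase and two purchases around 10:00.
log = pd.DataFrame({
    "card_id": ["A", "A", "A", "B"],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:20",
        "2024-01-01 03:05", "2024-01-01 12:00",
    ]),
    "amount": [20.0, 40.0, 300.0, 50.0],
})
features = add_behavior_features(log)
print(features)
```

In the output, card A's 03:05 purchase stands out on two features at once: it happens in the early morning and is 2.5 times the card's average amount.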

4.3 Model Development:


Choice of ML Algorithm: several machine learning algorithms could be used; this
project uses Logistic Regression. Logistic Regression is one of the most commonly used machine learning
algorithms for binary classification tasks, such as credit card fraud detection. While it
is relatively simple compared to more complex algorithms, it has several advantages,
especially when dealing with smaller datasets or when interpretability is important.
The goal of using logistic regression in credit card fraud detection is to classify trans-
actions as either fraudulent or legitimate based on a set of input features, such as
transaction amount, time, location, and user behavior. Logistic regression models
the probability that a given transaction is fraudulent (represented by a label of 1)
or legitimate (represented by a label of 0). Unlike linear regression, which predicts a
continuous output, logistic regression produces a probability score between 0 and 1
using the sigmoid function.
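The model described above can be written compactly as follows. The notation is introduced here for illustration: x is the vector of n input features of a transaction, w the learned weights, b the bias (intercept), and p the predicted probability of fraud.

```latex
z = \mathbf{w}^{\top}\mathbf{x} + b = b + \sum_{j=1}^{n} w_j x_j ,
\qquad
p = P(y = 1 \mid \mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}} ,
\qquad
\log\frac{p}{1 - p} = z .
```

Because σ(z) ≥ 0.5 exactly when z ≥ 0, the usual 0.5 threshold corresponds to the sign of the linear score. Each weight w_j states how much a unit change in feature x_j shifts the log-odds of fraud, which is where the model's interpretability comes from.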

Training: Train the model on the preprocessed dataset, partitioned into training and
validation sets. Model training is a critical step in building a reliable fraud detection
system. The process begins by preparing the data, which involves feature
engineering, handling missing values, scaling features, and addressing class
imbalance. Addressing the imbalance is crucial because fraud detection datasets
contain far fewer fraudulent transactions than legitimate ones, and a model trained
on them as they are can learn to favor the majority class. Once the data is prepared,
it is split into training, validation, and test sets so that the model’s ability to
generalize to unseen data can be checked.

Several algorithms can be used for model training, including Logistic Regression,
Decision Trees, Random Forest, Support Vector Machines, and Gradient Boosting. The
selected algorithm is trained on the training set, where it learns patterns in the
features and adjusts its parameters to minimize prediction errors. Hyperparameters
such as the learning rate, regularization strength, or tree depth are tuned using
techniques like grid search. The model is evaluated with metrics such as accuracy,
precision, recall, and the F1 score. On imbalanced data, precision, recall, and F1 are
the more informative ones: on this dataset, a model that labels every transaction as
legitimate would still exceed 99.8% accuracy while catching no fraud at all. Overfitting
is guarded against through cross-validation (to detect it) and regularization (to
reduce it). Finally, the trained and validated model is evaluated on the unseen test
set to assess how accurately it detects fraudulent transactions.
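A minimal sketch of the training workflow described above (feature scaling, a grid search with stratified cross-validation, and a final check on a held-out test set) using scikit-learn. It runs on synthetic imbalanced data from make_classification rather than the project's dataset; the grid of C values, the class_weight option, and F1 as the selection metric are illustrative choices, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for transaction data: about 5% positive (fraud) labels.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

# Hold out a stratified test set that is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling and the classifier in one pipeline, so each CV fold is scaled independently.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],       # inverse regularization strength
    "logisticregression__class_weight": [None, "balanced"],
}
search = GridSearchCV(
    pipe, param_grid, scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(classification_report(y_test, search.predict(X_test), digits=3))
```

Wrapping the scaler and the model in one pipeline means the scaler is refit inside each cross-validation fold, so statistics from the validation folds never leak into training.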

4.4 Breakdown of the Code

Figure 4.2: Libraries and loading dataset

Imports libraries: NumPy and Pandas for numerical computation and data manipulation,
train_test_split for splitting the dataset into training and testing sets,
LogisticRegression for building the logistic regression model, and accuracy_score for
evaluating the model’s performance. NumPy (Numerical Python) is a powerful library
for numerical computing in Python. It provides support for large, multi-dimensional
arrays and matrices, along with a collection of mathematical functions that operate
on them, and its arrays are far more efficient than Python’s built-in lists for large
datasets, making it indispensable for numerical analysis. Splitting the dataset into
training and testing subsets is one of the fundamental steps in machine learning: it
ensures that the model is evaluated on data it did not see during training, giving an
unbiased assessment of its ability to generalize and simulating how it will behave
when deployed on new, real-world data. Cross-validation can also be used to further
refine the evaluation.

Loads a dataset: The dataset (credit_data.csv) is loaded from the path
/content/credit_data.csv into a Pandas DataFrame named credit_card_data. Loading the
CSV (Comma-Separated Values) file into a DataFrame imports the data into a structure
that is easy to manipulate and analyze. Pandas is a popular Python library for data
manipulation and analysis that works with data in tabular form, much like an Excel
sheet or an SQL table. This is the first step in preparing the data for further
processing, analysis, or model training. The file is read with the pd.read_csv()
function, which converts it into a DataFrame: a two-dimensional data structure in
which rows represent data entries (here, transactions) and columns represent
features.
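The loading step can be sketched as below, together with the imports listed in the text. Because /content/credit_data.csv exists only in the project's Colab session, the sketch reads a three-row, in-memory CSV with made-up values and a reduced set of columns (Time, V1, V2, Amount, Class) instead; in Colab, the commented-out pd.read_csv call with the file path would be used.

```python
import io

import numpy as np  # imported as in the text; used for numerical work later
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# In the Colab notebook the file would be read from its path:
# credit_card_data = pd.read_csv('/content/credit_data.csv')

# Stand-in CSV text with made-up values so the sketch runs anywhere.
csv_text = """Time,V1,V2,Amount,Class
0,0.25,-0.40,12.50,0
1,-1.10,0.80,3.00,0
2,1.75,-2.20,420.00,1
"""
credit_card_data = pd.read_csv(io.StringIO(csv_text))

print(credit_card_data.head())   # first rows of the DataFrame
credit_card_data.info()          # column names, non-null counts, and dtypes
```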

Figure 4.3: Dataset

Figure 4.3 displays a summary of the dataset, including the number of entries, the
columns, and their data types, as well as the number of non-null values in each
column. It also shows the distribution of the target variable Class, which labels each
transaction: the dataset contains 284,315 legitimate transactions (label 0) and 492
fraudulent transactions (label 1). Fraud therefore accounts for only about 0.17% of the
284,807 transactions, so the dataset is highly imbalanced and requires special
handling during model training. Methods such as info() and describe() give this kind
of concise summary, which is vital for understanding the dataset’s structure and
characteristics before any model is trained.
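The imbalance can be quantified with value_counts(). In the sketch below, class_summary is a hypothetical helper, and the label column is rebuilt from the counts reported above purely to show the arithmetic.

```python
import pandas as pd


def class_summary(labels: pd.Series) -> pd.DataFrame:
    """Return the count and percentage of each class label."""
    counts = labels.value_counts().sort_index()
    return pd.DataFrame({"count": counts, "percent": 100 * counts / counts.sum()})


# Label column rebuilt from the reported counts (0 = legitimate, 1 = fraudulent).
labels = pd.Series([0] * 284_315 + [1] * 492, name="Class")
summary = class_summary(labels)
print(summary.round(3))
```

With these counts, the fraud share comes out at roughly 0.17% of all transactions.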

Figure 4.4: Unbalanced Dataset

Separating the data for analysis: The code filters the dataset to select the legitimate
transactions (where Class is 0) and stores them in the legit DataFrame, and selects
the fraudulent transactions (where Class is 1) and stores them in the fraud
DataFrame. This is done with Boolean indexing in Pandas, which selects the rows that
satisfy a condition so that each class can be stored in its own DataFrame for further
analysis or model training.
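Boolean indexing for this step can be sketched as follows. The five-row DataFrame uses made-up values but the same column names (Time, Amount, Class) as the project's data.

```python
import pandas as pd

# Toy data with the project's column names; the values are made up.
credit_card_data = pd.DataFrame({
    "Time": [0, 1, 2, 3, 4],
    "Amount": [12.5, 3.0, 250.0, 1.0, 80.0],
    "Class": [0, 0, 1, 0, 1],
})

# The comparison yields a True/False mask; indexing with it keeps matching rows.
legit = credit_card_data[credit_card_data.Class == 0]
fraud = credit_card_data[credit_card_data.Class == 1]

print(legit.shape, fraud.shape)
```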

The reason for separating the legitimate and fraudulent transactions into different
DataFrames is typically to: Analyze each class separately: You might want to per-
form different kinds of analysis, like looking at the distribution of amounts or features
for legitimate vs. fraudulent transactions. Balance the data: Since the dataset is
imbalanced, separating the classes allows you to apply techniques like oversampling
(for the minority class, i.e., fraud) or undersampling (for the majority class, i.e., le-
git) before combining the datasets back for model training. Model training: During
model training, you might prefer to handle the two classes separately. For example,
you may want to focus more on improving the detection of fraudulent transactions
(the minority class), or ensure the model learns from both legitimate and fraudulent
transactions while accounting for their imbalance.
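As an alternative to the under-sampling applied later in this chapter, random oversampling of the minority class can be sketched with sklearn.utils.resample. This is not the technique the project uses, and the toy DataFrames are made up.

```python
import pandas as pd
from sklearn.utils import resample

# Toy classes: six legitimate rows, two fraudulent rows (made-up amounts).
legit = pd.DataFrame({"Amount": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0], "Class": 0})
fraud = pd.DataFrame({"Amount": [500.0, 900.0], "Class": 1})

# Draw fraud rows with replacement until the minority class matches the majority.
fraud_upsampled = resample(fraud, replace=True, n_samples=len(legit), random_state=42)

# Recombine and shuffle the rows.
balanced = pd.concat([legit, fraud_upsampled], axis=0).sample(frac=1, random_state=42)
print(balanced["Class"].value_counts())
```

Oversampling should be applied only to the training split, after train_test_split; otherwise copies of the same fraudulent row can end up in both the training and test sets and inflate the test scores.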

Printing the shapes of the datasets: Printing legit.shape and fraud.shape returns the
number of rows (transactions) and columns in each DataFrame. The output (284315, 31)
indicates that there are 284,315 legitimate transactions, each described by 31
columns, and the output (492, 31) indicates that there are 492 fraudulent transactions
with the same 31 columns. These columns are Time, the anonymized features V1 to
V28, Amount, and the Class label itself; since Class is the target rather than an input,
each transaction has 30 input features. Because both DataFrames share the same
columns, every transaction, whether legitimate or fraudulent, is described by the
same set of features.

Statistical measures of the data: Summary statistics give an overall picture of the
distribution and characteristics of the transaction amounts. The describe() method in
Pandas reports the count, mean, standard deviation, minimum, maximum, and quartiles
(25%, 50%, and 75%) of a column. The count is the number of non-null entries in the
Amount column; for the legitimate transactions (legit) it equals the number of
legitimate transactions, provided there are no missing values.

Figure 4.5: Separating the data for analysis

legit.Amount.describe(): Displays summary statistics (count, mean, standard deviation,
minimum, maximum, and quartiles) for the Amount column of the legitimate
transactions (legit), giving a quick overview of how legitimate transaction amounts are
distributed. The standard deviation measures how widely the amounts are spread
around the mean: the larger it is, the greater the spread. A standard deviation of
250.21, nearly three times the mean legitimate amount of 88.30, shows that legitimate
amounts vary considerably. Since amounts cannot be negative, such a large spread
points to a right-skewed distribution: most transactions are small, but a few very large
transactions pull the mean upward.

fraud.Amount.describe(): Displays the same summary statistics for the Amount column
of the fraudulent transactions (fraud), giving a detailed view of how fraudulent
transaction amounts are distributed. The count is only 492, much smaller than the
284,315 legitimate transactions, which again shows how imbalanced the dataset is.
The mean fraudulent amount is higher than the mean legitimate amount of 88.30,
which suggests that fraudulent transactions tend to involve larger amounts on average
than legitimate ones.
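The two describe() calls discussed above can also be produced side by side with a single groupby, as in the sketch below (toy amounts, made up). Pandas reports the sample standard deviation (ddof=1).

```python
import pandas as pd

# Toy amounts: three legitimate and two fraudulent transactions (made up).
tx = pd.DataFrame({"Amount": [1.0, 5.0, 9.0, 100.0, 300.0],
                   "Class": [0, 0, 0, 1, 1]})

# One row of describe() statistics per class, equivalent to calling
# legit.Amount.describe() and fraud.Amount.describe() separately.
amount_stats = tx.groupby("Class")["Amount"].describe()
print(amount_stats[["count", "mean", "std", "50%"]])
```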

Figure 4.6: Comparing statistical measures

Grouping the data by the Class column (which distinguishes legitimate from fraudulent
transactions) with groupby('Class').mean() calculates the mean of every numerical
feature for each class, making the two easy to compare. Under-Sampling: Here we
build a sample dataset that contains equal numbers of legitimate and fraudulent
transactions. Since there are 492 fraudulent transactions, legit_sample =
legit.sample(n=492) randomly selects 492 legitimate transactions from the legit
DataFrame, balancing the two classes. new_dataset = pd.concat([legit_sample, fraud],
axis=0) then stacks the sampled legitimate transactions (legit_sample) and the
fraudulent transactions (fraud) row-wise into a new DataFrame, new_dataset.
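The under-sampling step can be sketched as follows with toy DataFrames. Two details differ from the code described in the text and are assumptions: the sample size is taken from len(fraud) rather than the hard-coded 492, and random_state is fixed so that the sample is reproducible.

```python
import pandas as pd

# Toy classes: ten legitimate rows and three fraudulent rows (made-up amounts).
legit = pd.DataFrame({"Amount": [float(a) for a in range(1, 11)], "Class": 0})
fraud = pd.DataFrame({"Amount": [250.0, 900.0, 75.0], "Class": 1})

# Keep a random subset of legitimate rows equal in size to the fraud set.
legit_sample = legit.sample(n=len(fraud), random_state=2)

# Stack the two parts row-wise: legitimate rows first, fraudulent rows last.
new_dataset = pd.concat([legit_sample, fraud], axis=0)
print(new_dataset["Class"].value_counts())
```

Under-sampling discards most of the legitimate data (all but 492 of the 284,315 rows in the project's case), which keeps training fast but means the model sees only a small slice of normal behavior.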

new_dataset.head(): Displays the first 5 rows of the newly created new_dataset, and
new_dataset.tail() displays its last 5 rows. Because legit_sample was concatenated
first, head() shows legitimate transactions and tail() shows fraudulent ones, which
confirms that both parts are present. This is a quick and common way to preview data
after modifying or combining datasets. The head() method in Pandas returns the first
5 rows of a DataFrame by default (head(n) returns the first n).

Figure 4.7: Concatenating two Data Frames

It’s a convenient way to quickly inspect the contents of your dataset, especially when
you are working with large datasets and don’t want to print the entire DataFrame.

new_dataset['Class'].value_counts(): Displays the distribution of the Class column in
new_dataset. The output shows 492 fraudulent transactions (Class = 1) and 492
legitimate transactions (Class = 0), confirming that the dataset is now balanced.
new_dataset.groupby('Class').mean(): Groups new_dataset by the Class column
(fraudulent vs. legitimate) and calculates the mean of every numerical feature for
each class. Comparing these means with those computed on the full dataset can show
whether the random sample of legitimate transactions is representative.

Figure 4.8: Verifying that the dataset is balanced

X = new_dataset.drop(columns='Class', axis=1): This drops the Class column from
new_dataset to create the feature set X, which contains every column except Class
and will be used to train the machine learning model. Because the columns= keyword
already tells Pandas that the label is a column name, the axis=1 argument is redundant
here: new_dataset.drop(columns='Class') and new_dataset.drop('Class', axis=1) are
equivalent. The Class column is the target variable of this supervised learning task: it
records whether a transaction is legitimate (0) or fraudulent (1). It is excluded from
the feature set because it is the label the model must predict, not an input to the
prediction.

The resulting DataFrame, stored in the variable X, contains all the feature columns from
new_dataset except Class. Features (X): these are the input variables the machine
learning model uses to make predictions. In this fraud detection task they include the
transaction time, the amount, and the anonymized derived variables V1, V2, V3, and so
on. Target Variable (Class): the Class column (which was just dropped from X) is the
output variable the model aims to predict; it indicates whether a transaction is
legitimate (0) or fraudulent (1).

Y = new_dataset['Class']: Extracts the Class column from new_dataset and assigns it to
the variable Y, the target variable. In a classification problem, the target is the value
the model predicts from the features; here the Class column contains binary values,
0 for legitimate and 1 for fraudulent transactions, so Y holds the label of each
transaction.

Since this is a supervised learning problem, the model learns to map the input
features (X) to the target variable (Y): its goal is to learn the patterns in the features
that correspond to the labels fraudulent and legitimate. X and Y are next split into
training and test sets, and the training portions are used to train the model.
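Separating features and target can be sketched as below, on a tiny made-up new_dataset with a reduced set of columns.

```python
import pandas as pd

# Made-up rows with a reduced column set (Time, V1, Amount, Class).
new_dataset = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0],
    "V1": [0.5, -1.2, 2.3, 0.1],
    "Amount": [10.0, 250.0, 5.0, 99.0],
    "Class": [0, 1, 0, 1],
})

X = new_dataset.drop(columns="Class")   # features: every column except the label
Y = new_dataset["Class"]                # target: 0 = legitimate, 1 = fraudulent
print(X.shape, Y.shape)
```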

Figure 4.9: Splitting the data into Features and Targets

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y,
random_state=2): Splits the features X and target labels Y into training and testing
sets, so that the model is trained on one portion of the data and evaluated on
another. The arguments are:
X: the input features (all columns except Class, which was dropped earlier), i.e., the
information the model uses to make predictions, such as transaction times, amounts,
and the other features.
Y: the target variable (the Class column), containing 0 for legitimate and 1 for
fraudulent transactions; this is the outcome the model tries to predict from X.
train_test_split: a function from sklearn.model_selection that splits the dataset into
a training set, used to fit the model, and a testing set, used to evaluate its
performance.
test_size=0.2: the proportion of the dataset placed in the test set, here 20% for
testing and the remaining 80% for training. An 80/20 split is common, but other ratios
can be used depending on the situation.
stratify=Y: keeps the proportion of fraudulent and legitimate transactions the same in
both sets.
random_state=2: fixes the random shuffling so that the same split is produced every
time the code is run.
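The call can be checked on stand-in arrays with the same dimensions as the balanced data (984 rows, 30 features, 492 rows of each class). The zero-filled feature matrix is a placeholder, not real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders with the balanced dataset's shape: 984 rows, 30 features.
X = np.zeros((984, 30))
Y = np.array([0] * 492 + [1] * 492)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2
)
print(X.shape, X_train.shape, X_test.shape)   # (984, 30) (787, 30) (197, 30)
print(np.bincount(Y_train), np.bincount(Y_test))  # class counts in each split
```

Because stratify=Y keeps the classes proportional, each set is split almost evenly between the two classes; with an odd test size of 197, one class necessarily gets one more row than the other.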

Figure 4.10: Split the data into Training and Testing Data

print(X.shape, X_train.shape, X_test.shape): Prints the shapes of the original feature
set X (984 samples, 30 features), the training set X_train (787 samples, 30 features),
and the test set X_test (197 samples, 30 features). shape is an attribute of a Pandas
DataFrame (or NumPy array) that returns a tuple of the form (number of rows, number
of columns). X.shape = (984, 30) therefore means that, before splitting, the balanced
dataset holds 984 samples (492 legitimate and 492 fraudulent) with 30 features each.
X_train.shape = (787, 30) means that 787 samples with the same 30 features are used
to train the model, and the remaining 197 samples form the test set. The split is not
an exact 80/20 division because 20% of 984 is 196.8; train_test_split rounds the test
set size up to 197, leaving 787 samples for training.

Figure 4.11: Model Training

model = LogisticRegression(): Initializes a Logistic Regression model from scikit-learn,
a popular machine learning library. The model will be used for binary classification
(fraudulent vs. legitimate transactions). Logistic Regression is a statistical method
for binary classification problems, where the goal is to assign each observation to
one of two classes. Despite its name, it is a classification algorithm: it predicts the
probability that an observation belongs to a particular class.

Logistic Regression first computes a linear combination of the input features,
z = w1x1 + w2x2 + ... + wnxn + b, where the weights w and the bias b are learned
during training. Instead of outputting this raw score, it passes z through the sigmoid
function σ(z) = 1 / (1 + e^(-z)), which maps any real number to a value between 0 and
1. The result p is the predicted probability that the transaction is fraudulent (class 1).
A threshold, typically 0.5, turns this probability into a prediction: if p is greater than
or equal to 0.5, the transaction is classified as fraudulent (Class 1); if p is less than
0.5, it is classified as legitimate (Class 0).
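The sigmoid and the 0.5 threshold can be illustrated with NumPy. The helper names sigmoid and predict_class are illustrative; with a fitted scikit-learn model, the probability p for each transaction would come from model.predict_proba(X)[:, 1].

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array of numbers) into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_class(z, threshold=0.5):
    """Label 1 (fraud) when the predicted probability reaches the threshold."""
    return (sigmoid(z) >= threshold).astype(int)


z = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])   # example linear scores
print(np.round(sigmoid(z), 3))               # probabilities rise smoothly with z
print(predict_class(z))                      # [0 0 1 1 1]
print(predict_class(z, threshold=0.9))       # stricter threshold flags fewer transactions
```

Exposing the threshold as a parameter makes the trade-off visible: lowering it flags more transactions as fraud (higher recall, lower precision), and raising it does the opposite.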
model.fit(X_train, Y_train): Trains the logistic regression model on the training data,
where X_train holds the features (the independent variables describing each
transaction, such as time and amount) and Y_train holds the target labels (0 for a
legitimate transaction, 1 for a fraudulent one). This is the step in which the model
"learns" the relationship between the features and the target class, which later
enables it to make predictions on unseen data.

The Logistic Regression model uses this data to learn how the features of a transaction (such as the amount and time) help determine whether it is fraudulent or legitimate. Training is an optimization problem. An algorithm such as gradient descent adjusts the weights to minimize a loss function that measures how far the model's predicted probabilities are from the true labels. For logistic regression the loss is the log-loss (cross-entropy loss), which grows sharply when the model assigns a high probability to the wrong class. The optimizer updates the weights iteratively. Training stops when the decrease in loss between iterations becomes negligible (the model has converged) or when an iteration limit is reached. At that point the model has found the set of weights that minimizes the loss on the given data. scikit-learn's LogisticRegression differs from this description in three respects. It uses the lbfgs solver by default rather than plain gradient descent. It stops after max_iter iterations (100 by default) if it has not converged. It also adds an L2 regularization penalty to the log-loss.
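The sketch below makes the optimization loop concrete. It uses batch gradient descent to minimize the average log-loss on a tiny synthetic dataset. It is illustrative only. scikit-learn uses its own solver (lbfgs by default) and adds L2 regularization, so its weights will differ. The learning rate `lr`, iteration cap `max_iter` and tolerance `tol` below are assumed values.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Average cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(p, eps, 1 - eps)                  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic_gd(X, y, lr=0.1, max_iter=5000, tol=1e-8):
    """Batch gradient descent on the log-loss; returns (w, b, loss_history)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    history = []
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # current predicted probabilities
        history.append(log_loss(y, p))
        grad_w = X.T @ (p - y) / n                # gradient of the loss w.r.t. w
        grad_b = np.mean(p - y)                   # gradient of the loss w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
        # Stop once the loss has effectively stopped decreasing ("converged")
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return w, b, history

# Synthetic, overlapping two-class data (not the credit card dataset)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])
w, b, history = fit_logistic_gd(X, y)
print(f"iterations={len(history)}, initial loss={history[0]:.3f}, final loss={history[-1]:.3f}")
```

With all weights starting at zero, every predicted probability is 0.5, so the first recorded loss is log 2 (about 0.693). The loss then falls as the weights move toward the class-1 direction.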

model = LogisticRegression(): Initializes a Logistic Regression model from the scikit-learn library, to be used for binary classification (fraudulent vs. legitimate transactions). Logistic Regression is a statistical method commonly used for binary classification, meaning it assigns each input to one of two possible outcomes. Here, that outcome is whether a transaction is fraudulent or legitimate. Despite the name "regression", Logistic Regression is a classification algorithm designed for problems where the target variable is binary. Its objective is to estimate the probability that a given input belongs to one of the two classes. At its core, Logistic Regression models the relationship between a set of independent variables (features) and a binary dependent variable (target class). It computes a linear combination of the features, z = w1·x1 + w2·x2 + ... + wn·xn + b, where the w terms are the learned weights and b is the bias. That value determines the likelihood of a particular outcome.
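The linear equation at the core of the model can be inspected directly in scikit-learn. The sketch below fits `LogisticRegression()` on synthetic data from `make_classification`, which stands in for the credit card features as an assumption. It then recomputes z = w·x + b from the learned `coef_` and `intercept_` and checks that passing z through the sigmoid reproduces `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for transaction features (binary target)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

model = LogisticRegression()          # defaults: L2 penalty, C=1.0, lbfgs solver
model.fit(X, y)

w = model.coef_.ravel()               # one weight per feature
b = model.intercept_[0]               # bias term
z = X @ w + b                         # linear combination (log-odds of class 1)
p_manual = 1.0 / (1.0 + np.exp(-z))   # sigmoid of z
p_sklearn = model.predict_proba(X)[:, 1]

print(np.allclose(p_manual, p_sklearn))
```

The value z is also what `model.decision_function(X)` returns, which is why it is called the log-odds of the fraudulent class.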

Dept of ECE, RNSIT, Bengaluru 36


Credit Card Fraud Detection Using Machine Learning 2024-25
model.fit(X_train, Y_train): Trains the logistic regression model on the training data so that it can make predictions about future data. In machine learning, fitting a model means training it on a dataset. Calling .fit() tells the model to learn from the data by adjusting its internal parameters (for logistic regression, the weights and bias) to minimize prediction errors. For the Logistic Regression model, X_train represents the features (input data), i.e., the characteristics of each data point used to make predictions. In fraud detection these might include the transaction amount, the time of the transaction, or the user's location. Y_train represents the target labels (the correct outcomes), which indicate whether each transaction is fraudulent (1) or legitimate (0). These are the values the model learns to predict.
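The sketch below shows the training step on a stratified 80/20 split. The split settings and the synthetic, imbalanced data from `make_classification` are assumptions, not the project's exact configuration. After `fit()`, the learned weights and bias are exposed as `coef_` and `intercept_`, and `n_iter_` reports how many solver iterations the model needed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 5% of samples in class 1 ("fraud")
X, Y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05],
                           random_state=1)

# stratify=Y keeps the fraud ratio the same in the training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)             # learns weights and bias from the training data

print("weights:", model.coef_.ravel())
print("bias:", model.intercept_[0])
print("solver iterations:", model.n_iter_[0])
```

Stratifying matters when fraud is rare. A plain random split of a heavily skewed dataset can leave the test set with very few fraudulent examples to evaluate on.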

Figure 4.12: Model Evaluation

Accuracy on Training Data: X_train_prediction = model.predict(X_train): Uses the trained model to make predictions on the training data (X_train). After training the model with model.fit(), we want to see how well it has learned the relationship between the features (input data) and the target labels (output data). To do this, we predict the target labels on the training set itself. X_train is the feature set from the training set and contains the same columns that were used to train the model. These describe each transaction, such as transaction amount, time, and user data. model.predict(X_train) generates one prediction for each data point in X_train, indicating whether the model believes the transaction is legitimate (0) or fraudulent (1). This performance check runs on data the model has already seen, and it indicates how well the model has learned from the training data.
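The prediction step on the training data is sketched below, with synthetic data again standing in for the transactions. `predict()` returns one 0/1 label per row. For a binary logistic regression it is equivalent to checking whether the decision score z is positive, i.e. whether the predicted fraud probability is above 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_train, Y_train = make_classification(n_samples=800, n_features=6, random_state=3)
model = LogisticRegression().fit(X_train, Y_train)

X_train_prediction = model.predict(X_train)      # array of 0s (legitimate) and 1s (fraud)

# Same labels obtained from the decision score z = w.x + b (z > 0 <=> p > 0.5)
labels_from_score = (model.decision_function(X_train) > 0).astype(int)

print(X_train_prediction[:10])
print(np.array_equal(X_train_prediction, labels_from_score))
```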

Training Accuracy: Comparing the predicted labels (from model.predict(X_train)) with the actual labels (the target variable Y_train) shows how accurate the model is on the training set. This is typically measured with the accuracy score or with other evaluation metrics such as precision, recall, and F1-score. Understanding Overfitting/Underfitting: Evaluating performance on the training set helps diagnose two problems. Overfitting means the model fits the training data so closely that it may not generalize to new data. Underfitting means the model has not learned enough from the training data. Very high accuracy on the training data combined with low accuracy on the test data may indicate overfitting, whereas low accuracy on both sets may indicate underfitting.

training_data_accuracy = accuracy_score(X_train_prediction, Y_train): The accuracy_score() function from sklearn.metrics compares the predicted values (X_train_prediction) with the true values (Y_train) and calculates the accuracy on the training data. Accuracy is the number of correct predictions divided by the total number of predictions. Predicted Values (X_train_prediction): the labels the model produces by applying the learned weights and bias to the features in X_train, classifying each transaction as legitimate (0) or fraudulent (1). True Values (Y_train): the correct labels, indicating whether each training transaction is actually legitimate (0) or fraudulent (1). Calculating Accuracy: accuracy is the proportion of predictions that match the true labels. scikit-learn's documented argument order is accuracy_score(y_true, y_pred). Because accuracy only counts matches, swapping the two arguments as in the line above gives the same result.
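Here is a small worked example of the accuracy calculation, using made-up label vectors. It also shows that the manual ratio of correct predictions matches `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

Y_train = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 0])             # true labels (made up)
X_train_prediction = np.array([0, 0, 1, 1, 0, 0, 1, 0, 0, 0])  # model predictions (made up)

correct = np.sum(X_train_prediction == Y_train)    # 8 matches out of 10
manual_accuracy = correct / len(Y_train)

training_data_accuracy = accuracy_score(Y_train, X_train_prediction)  # (y_true, y_pred)
print(manual_accuracy, training_data_accuracy)      # 0.8 0.8
```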

Model Performance Assessment: Accuracy gives an overall sense of how well the model sorts the data into the correct categories (fraudulent or legitimate). High accuracy suggests that the model makes correct predictions most of the time on the training data. Indicator of Model Fit: High training accuracy may mean that the model has learned the patterns in the data well. It can also signal that the model is overfitting (memorizing the training data) and may not perform well on new, unseen data, especially if accuracy is much lower on the test set. Identifying Overfitting or Underfitting: If accuracy is very high on the training set but low on the test set, the model has probably overfit. It has learned noise or irrelevant details that will not generalize to unseen data. If accuracy is low on both the training and test sets, the model is probably underfitting, meaning it has failed to capture the underlying patterns in the data. Underfitting can be caused by an overly simple model, insufficient features, or improper training.

The training accuracy measures how well the model classifies the training data. In fraud detection, however, fraudulent transactions are far fewer than legitimate ones, and on such imbalanced datasets accuracy alone can be misleading. Class Imbalance Problem: In a highly imbalanced dataset, a model that always predicts "legitimate" (0) can still achieve high accuracy even though it fails to identify any fraudulent transactions (Class 1). For example, if 99% of the data is legitimate, a model that predicts legitimate for every transaction has an accuracy of 99%. Yet it detects no fraud at all, which defeats the purpose of the model. Recall (the fraction of actual fraudulent transactions that are detected) is therefore often more important than accuracy in fraud detection, because undetected fraud (false negatives) can be costly. A model with good recall can be preferable even if it sacrifices some accuracy.
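The pitfall described above is easy to reproduce. The sketch below builds a made-up label vector with 1% fraud and scores a "model" that always predicts legitimate (0). Its accuracy is 99%, while recall on the fraud class is zero. `zero_division=0` is passed because precision is undefined when no transaction is predicted as fraud.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 990 legitimate (0) and 10 fraudulent (1) transactions (made-up labels)
y_true = np.array([0] * 990 + [1] * 10)
y_pred = np.zeros_like(y_true)           # naive model: always predict "legitimate"

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)       # fraction of actual frauds caught
prec = precision_score(y_true, y_pred, zero_division=0)

print(f"accuracy={acc:.2f}, recall={rec:.2f}, precision={prec:.2f}")
# accuracy=0.99, recall=0.00, precision=0.00
```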

Accuracy on Test Data: X_test_prediction = model.predict(X_test): Uses the trained logistic regression model to make predictions on the test data (X_test). After training, the next step is to check how well the model generalizes to unseen data. The predict() method classifies each test sample into one of the two classes, fraudulent (1) or legitimate (0). Test Data (X_test): X_test contains the features (input variables) of the test data, just as X_train contains the features of the training data. The test data is not used during training, so it represents new, unseen transactions. Trained Model (model): The model has already been trained with model.fit(X_train, Y_train) and has learned to classify transactions as legitimate or fraudulent from the patterns in the training data. Prediction (model.predict(X_test)): The trained model uses the input features in X_test to predict a label for each test sample. It returns these labels as an array of 0s (legitimate) and 1s (fraudulent).

test_data_accuracy = accuracy_score(X_test_prediction, Y_test): Compares the predicted values (X_test_prediction) with the actual values (Y_test) from the test dataset and calculates the accuracy on the test data. This step is central to judging how well the model performs on unseen data. As on the training set, accuracy is simply the percentage of correct predictions out of the total number of predictions made.
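The test-set steps are put together in the sketch below. It predicts on held-out data and compares training and test accuracy. A large gap would point to overfitting, while low values on both would suggest underfitting. The synthetic data and split settings are assumptions standing in for the project's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, Y = make_classification(n_samples=1000, n_features=8, random_state=4)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2)

model = LogisticRegression().fit(X_train, Y_train)

X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

X_test_prediction = model.predict(X_test)          # labels for unseen transactions
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

gap = training_data_accuracy - test_data_accuracy  # large positive gap hints at overfitting
print(f"train={training_data_accuracy:.3f} test={test_data_accuracy:.3f} gap={gap:.3f}")
```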

Exploratory Data Analysis (EDA) is a crucial phase in the data science workflow. It involves analyzing and visualizing the data to understand its structure, identify patterns, check assumptions, and detect anomalies before building machine learning models. Logistic Regression is a supervised learning algorithm, and for it EDA primarily focuses on three things. The first is understanding the relationships between the features and the target variable. The second is checking data quality, to ensure there are no issues such as missing values, outliers, or incorrect data types. The third is visualizing the data to understand the distribution of the features and their interaction with the target variable. The goal of EDA is to ensure that the data is prepared and suitable for modeling. Logistic Regression is used for binary classification, where the target variable (Y) has two possible values; in fraud detection these are 0 for legitimate and 1 for fraudulent. Each feature (independent variable) should be checked for its relationship with the target variable (the Class column in fraud detection). Logistic Regression assumes a linear relationship between the features and the log-odds of the target, so it is important to know whether the features behave this way. Plotting each feature against the target variable helps reveal its distribution and correlation with the target.
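A few typical EDA checks for this kind of data can be sketched with pandas. The small DataFrame below is made up, with hypothetical `Time` and `Amount` columns alongside the `Class` label. On the real dataset the same calls would be applied to the loaded data.

```python
import pandas as pd

# Made-up sample: Class 0 = legitimate, 1 = fraudulent
df = pd.DataFrame({
    "Time":   [0, 10, 35, 60, 90, 120, 150, 200],
    "Amount": [12.5, 80.0, 3.2, 250.0, 45.0, 1.0, 999.0, 20.0],
    "Class":  [0, 0, 0, 0, 0, 1, 1, 0],
})

print(df.isnull().sum())                          # data quality: missing values per column
print(df.dtypes)                                  # data quality: column types
counts = df["Class"].value_counts()               # class balance
print(counts)
print(df.groupby("Class")["Amount"].describe())   # feature distribution per class
print(df.corr()["Class"].sort_values())           # linear association with the target
```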

Dept of ECE, RNSIT, Bengaluru 40


Chapter 5
Result Analysis

In this chapter we analyze the results obtained in the project. The main aim of this project is to detect fraudulent credit card transactions, so that customers are not charged for products they did not buy. Fraud detection is performed with multiple ML techniques, and the outcomes of each technique are compared to find the model best suited to detecting fraudulent credit card transactions. Graphs and numbers are provided as well. Previous literature, and the different techniques it uses to distinguish fraud within a dataset, was also explored. The main objective of this project was to find the most suitable machine learning model for credit card fraud detection. The results of such a project are typically assessed by evaluating how effectively the model identifies fraudulent transactions, measured through several key performance metrics.

The result of the project is a trained model that predicts whether a transaction is fraudulent, with good accuracy and recall. It also yields insights into the effectiveness of various machine learning techniques and of approaches to handling imbalanced data. The dataset considered contains 284,807 transactions, of which 492 are fraudulent and the rest legitimate. These numbers show that the dataset is severely skewed: only 0.173 percent of the transactions are fraudulent. Among the 31 columns, Class takes only two values: 1 for a fraudulent transaction and 0 otherwise. The most basic performance statistic is accuracy, the percentage of correctly predicted observations out of all observations. One might assume that the model with the highest accuracy is the best. On a dataset this skewed, however, even a model that labels every transaction as legitimate scores above 99% accuracy. As a result, other metrics must be considered when evaluating the models' performance.

The project applies multiple machine learning (ML) techniques to detect fraudulent credit card transactions. The dataset contains a mixture of legitimate and fraudulent transactions, and the goal is to determine the most effective model for identifying the fraudulent ones. After training, the models' results are compared to identify the best model for fraud detection, using key metrics such as accuracy, recall, precision, and F1-score. Improving the model's ability to detect fraud requires addressing the class imbalance in the dataset. Only 492 of the 284,807 transactions (0.173%) are fraudulent, and the rest are legitimate. This imbalance makes training difficult, because most models become biased toward predicting the majority class (legitimate transactions). The dataset has 31 columns describing the transactions. The most important is the Class column, which indicates whether a transaction is fraudulent (1) or legitimate (0). Such imbalance is typical of fraud detection, where fraudulent cases are rare but critical to identify, and it is one of the most significant challenges in the field. A model that simply predicted every transaction as legitimate would achieve high accuracy (over 99%) yet detect no fraud at all. Therefore, alternative evaluation metrics such as recall and precision are critical when measuring the model's performance.
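The imbalance figures quoted above can be checked with a little arithmetic on the stated counts (284,807 transactions, 492 fraudulent). The last line shows the accuracy a trivial model would score by labeling every transaction as legitimate.

```python
total = 284_807
fraud = 492
legit = total - fraud

fraud_pct = 100 * fraud / total          # share of fraudulent transactions
baseline_accuracy = legit / total        # accuracy of "always predict legitimate"

print(f"legitimate: {legit}")                              # legitimate: 284315
print(f"fraud share: {fraud_pct:.3f}%")                    # fraud share: 0.173%
print(f"all-legitimate accuracy: {baseline_accuracy:.2%}") # all-legitimate accuracy: 99.83%
```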

Figure 5.1: Result

Dept of ECE, RNSIT, Bengaluru 42


Credit Card Fraud Detection Using Machine Learning 2024-25

Dept of ECE, RNSIT, Bengaluru 43


Chapter 6
Conclusion and Future scope

This project successfully implemented a machine learning approach to credit card fraud detection, using a logistic regression model to identify fraudulent transactions through the following steps:

Data Preprocessing: The dataset was cleaned, missing values were handled, and the data was balanced using sampling techniques. Model Building: A logistic regression model was trained on the features (such as transaction amount and time) to predict whether a transaction is fraudulent (Class = 1) or legitimate (Class = 0).

Model Evaluation: The model was evaluated using accuracy and reached around 94 percent on the training set and 93.9 percent on the test set. The small gap between training and test accuracy indicates that the model generalizes well to unseen data, which makes it well suited to detecting fraudulent transactions.

While the logistic regression model provides a solid approach for fraud detection,
it is important to note that the dataset was imbalanced, with a much larger number
of legitimate transactions compared to fraudulent ones. Balancing techniques, such
as oversampling or undersampling, were used to address this issue.
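The exact balancing code is not shown in this chapter. Random undersampling is one common approach consistent with the description, and the sketch below assumes it. A random sample of legitimate rows, equal in number to the fraudulent rows, is drawn and concatenated with all fraud rows. The DataFrame and the name `credit_card_data` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up imbalanced data: 95 legitimate rows, 5 fraudulent rows
rng = np.random.default_rng(0)
credit_card_data = pd.DataFrame({
    "Amount": rng.exponential(50, 100).round(2),
    "Class": [0] * 95 + [1] * 5,
})

legit = credit_card_data[credit_card_data["Class"] == 0]
fraud = credit_card_data[credit_card_data["Class"] == 1]

# Random undersampling: keep as many legitimate rows as there are frauds
legit_sample = legit.sample(n=len(fraud), random_state=1)
new_dataset = pd.concat([legit_sample, fraud], axis=0)

print(new_dataset["Class"].value_counts())
```

Undersampling discards most of the legitimate transactions. Accuracy measured on the balanced data is therefore not directly comparable with accuracy on the original, skewed distribution.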

Model Improvement: The Logistic Regression model provided a solid baseline for detecting fraudulent transactions. Two main strategies can improve its performance further, which matters given the class imbalance and the rarity of fraudulent transactions. Advanced Algorithms: Logistic regression provided good accuracy, but more advanced algorithms could improve performance further, especially for detecting rare fraudulent transactions. Logistic Regression is simple and interpretable, but it may not be the most powerful choice for complex patterns, particularly when fraudulent transactions are rare. Hyperparameter Tuning: Hyperparameter tuning adjusts the model's hyperparameters to enhance its performance. Logistic Regression has several hyperparameters that can be optimized to improve generalization and performance in fraud detection. The most notable is the regularization parameter C, the inverse of the regularization strength. Combining advanced algorithms with thorough hyperparameter tuning can improve the performance of the Logistic Regression model for credit card fraud detection. Techniques such as regularization, class balancing, and ensemble methods can strengthen the model's ability to identify rare fraudulent transactions and improve its robustness to overfitting.
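Tuning the regularization parameter C can be sketched with scikit-learn's `GridSearchCV`. The grid of C values, the choice of recall as the scoring metric, 5-fold stratified cross-validation, and the synthetic imbalanced data are all assumptions for illustration. Smaller C means stronger regularization.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05],
                           random_state=5)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}        # inverse regularization strength
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="recall",                              # favor catching fraud (class 1)
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print("best C:", search.best_params_["C"])
print("best cross-validated recall:", round(search.best_score_, 3))
```

In practice the search should run on the training split only, with the test set kept aside for the final evaluation. Scoring on recall alone can favor models that flag too many transactions, so `"f1"` or `"average_precision"` may be a more balanced choice.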
Handling Class Imbalance: Implementing advanced balancing techniques like
SMOTE (Synthetic Minority Over-sampling Technique) or Cost-sensitive Learning
can enhance the model’s ability to correctly predict fraudulent transactions, which
are underrepresented in the dataset. In the Credit Card Fraud Detection project, the
dataset is highly imbalanced, with the majority of transactions being legitimate (class
0) and a very small proportion being fraudulent (class 1). This class imbalance is a
significant challenge for any machine learning model, as the model tends to be biased
toward predicting the majority class (legitimate transactions) and may underperform
in detecting the minority class (fraudulent transactions).
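Cost-sensitive learning is available in scikit-learn's LogisticRegression through the `class_weight` parameter. With `class_weight="balanced"`, each class is weighted by n_samples / (n_classes * count_of_class), so misclassifying a rare fraud costs far more than misclassifying a legitimate transaction. The sketch computes those weights for a made-up 990/10 split and fits a weighted model on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.utils.class_weight import compute_class_weight

# Weights that "balanced" would assign for 990 legitimate and 10 fraudulent labels
y_demo = np.array([0] * 990 + [1] * 10)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_demo)
print(weights)        # approximately [0.505, 50.0]

# Compare an unweighted and a cost-sensitive model on imbalanced synthetic data
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.98, 0.02],
                           random_state=6)
plain = LogisticRegression(max_iter=1000).fit(X, y)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)

print("recall (plain):   ", recall_score(y, plain.predict(X)))
print("recall (balanced):", recall_score(y, weighted.predict(X)))
```

Recall is measured on the training data here only to keep the sketch short. A held-out test set should be used for a real comparison, together with precision, since weighting the minority class usually increases false positives.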

To address this issue, advanced techniques such as SMOTE (Synthetic Minority Over-sampling Technique) and cost-sensitive learning can be applied. Both aim to improve the model's ability to identify fraudulent transactions while maintaining its overall accuracy. SMOTE handles imbalanced datasets by generating synthetic samples for the minority class (fraudulent transactions) rather than simply duplicating existing ones. Each new sample is created by interpolating between a fraudulent transaction and one of its nearest fraudulent neighbors. This gives the model more examples of the minority class to learn from and improves its predictions for that class. Cost-sensitive learning instead leaves the data unchanged and assigns a higher misclassification cost to the minority class during training. Missing a fraud is then penalized more heavily than flagging a legitimate transaction. By applying these techniques, the model can achieve better recall and precision on fraudulent transactions. This is vital for minimizing financial losses and ensuring a secure transaction environment for customers.
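The core idea of SMOTE can be sketched in NumPy. Pick a minority (fraud) sample, pick one of its k nearest minority neighbors, and create a new point at a random position on the line segment between them. This is a simplified illustration (brute-force neighbor search, no edge-case handling), and the function name `smote_sketch` is hypothetical. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would normally be used, applied to the training split only.

```python
import numpy as np

def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise distances between minority samples (brute force)
    dists = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)                 # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]    # indices of k nearest neighbors
    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                         # random minority sample
        nb = neighbors[a, rng.integers(k)]          # one of its nearest neighbors
        gap = rng.random()                          # position along the segment, in [0, 1)
        synthetic[i] = X_minority[a] + gap * (X_minority[nb] - X_minority[a])
    return synthetic

# Made-up fraud samples with two features
X_fraud = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7]])
X_synthetic = smote_sketch(X_fraud, n_new=10, k=3)
print(X_synthetic.shape)     # (10, 2)
```

Because every synthetic point lies between two real fraud samples, the new data stays inside the region already occupied by the minority class.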
Feature Engineering: More sophisticated feature engineering can extract additional relevant features or transform existing ones. Features describing user behavior and transaction patterns over time can be valuable for identifying fraudulent transactions. In the Credit Card Fraud Detection project, feature engineering plays a crucial role in improving the model's ability to identify fraudulent transactions correctly. Its goal is to create or transform features so that they are more informative for the machine learning model, which matters especially when detecting anomalies such as fraud. Better features can directly improve the performance of the Logistic Regression algorithm. Fraudulent transactions are often characterized by unusual patterns and behaviors that the raw data (e.g., individual transaction amounts) does not directly capture. Feature engineering helps uncover these hidden relationships by generating new features that better represent fraudulent behavior and transaction patterns. Because Logistic Regression is a linear model, better features can make the decision boundary between fraudulent and legitimate transactions clearer and more effective.
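Behavior-based features can be sketched with pandas as shown below. The columns `card_id`, `Time` (in seconds) and `Amount`, and all derived feature names, are hypothetical. The project's dataset may not contain a card identifier, so this illustrates the idea rather than the project's pipeline. The per-card features use only a card's earlier transactions, which avoids leaking future information into the model.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction log: two cards, Time in seconds
df = pd.DataFrame({
    "card_id": ["A", "A", "A", "B", "B", "A"],
    "Time":    [0, 3600, 7200, 100, 5000, 7260],
    "Amount":  [20.0, 25.0, 22.0, 300.0, 310.0, 900.0],
}).sort_values(["card_id", "Time"]).reset_index(drop=True)

# Hour of day (assuming Time counts seconds from midnight of day 1)
df["hour"] = (df["Time"] // 3600) % 24

# Log-transform to reduce the skew of transaction amounts
df["log_amount"] = np.log1p(df["Amount"])

grouped = df.groupby("card_id")
# Seconds since the same card's previous transaction (NaN for its first one)
df["secs_since_prev"] = grouped["Time"].diff()
# Mean of the card's *previous* amounts, then how unusual the current amount is
df["prev_mean_amount"] = grouped["Amount"].transform(lambda s: s.shift().expanding().mean())
df["amount_ratio"] = df["Amount"] / df["prev_mean_amount"]

print(df)
```

In this toy log, card A's last transaction (900.0, one minute after the previous one) stands out with an amount ratio of about 40. That is the kind of signal a linear model cannot derive from the raw amount alone.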

Model Evaluation Metrics: Beyond accuracy, metrics such as precision, recall, F1-score, and the AUC-ROC curve should be considered, especially for imbalanced datasets, to get a better sense of how well the model detects fraud. The primary objective in this project is to correctly identify fraudulent transactions (Class = 1). At the same time, the model should minimize both false positives (legitimate transactions incorrectly classified as fraud) and false negatives (fraudulent transactions incorrectly classified as legitimate). Because only a very small fraction of the transactions are fraudulent, accuracy alone does not give a true picture of the model's effectiveness.

Therefore, this project also relies on precision, recall, F1-score, and the AUC-ROC curve to understand how well the model detects fraudulent transactions. Precision is the fraction of transactions flagged as fraud that really are fraudulent. Recall is the fraction of actual frauds that are caught, and the F1-score is the harmonic mean of the two. AUC-ROC measures how well the predicted probabilities rank fraudulent transactions above legitimate ones across all thresholds. Together, these metrics give a more comprehensive picture of how well the Logistic Regression model detects fraud on an imbalanced dataset. They address the key goals of the project: minimizing fraud detection errors and improving the model's ability to detect fraud without overburdening legitimate customers with false fraud flags.
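The metrics listed above can be computed with `sklearn.metrics`. The labels and fraud scores below are made up so the numbers are easy to verify by hand. With a 0.5 threshold there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Made-up ground truth (1 = fraud) and predicted fraud probabilities
y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.35])
y_pred  = (y_score >= 0.5).astype(int)         # threshold the probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")      # TN=5 FP=1 FN=1 TP=3

print("precision:", precision_score(y_true, y_pred))   # 3 / (3 + 1) = 0.75
print("recall:   ", recall_score(y_true, y_pred))      # 3 / (3 + 1) = 0.75
print("F1-score: ", f1_score(y_true, y_pred))          # 0.75
# AUC uses the scores, not the thresholded labels: 22 of 24 fraud/legit pairs ranked correctly
print("ROC AUC:  ", roc_auc_score(y_true, y_score))    # about 0.917
```

Moving the 0.5 threshold changes precision, recall and F1. The AUC does not change, because it summarizes the ranking over all thresholds.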

Dept of ECE, RNSIT, Bengaluru 46


References

[1] Sangeeta Mittal and Shivani Tyagi, "Performance Evaluation of Machine Learning Algorithms for Credit Card Fraud Detection," 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence).

[2] Vaishnavi Nath Dornadula and S Geetha, "Credit Card Fraud Detection using Machine Learning Algorithms," 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME).

[3] Aditi Singh, Anoushka Singh, Anshul Aggarwal and Anamika Chauhan, "Design and Implementation of Different Machine Learning Algorithms for Credit Card Fraud Detection," 2024 5th International Conference on Smart Electronics and Communication (ICOSEC).

[4] Adesola Gregory Oketola, Ayo Agbeja and Tobi Gbadebo Ogunmefun, "A Review of Credit Card Fraud Detection Using Machine Learning Algorithms," 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing (PCEMS).

[5] Nasser Hussain Mohammed (Kakatiya Institute of Technology and Science, Warangal) and Sai Charan Reddy Maram, "Fraud Detection of Credit Card Using Logistic Regression," International Journal of Computer Science and Network (IJCSN), vol. 1, no. 4, pp. 31-35, 2019, ISSN 2277-5420.

[6] Anagha T S, Asra Fathima, Archana D. Naik, Chirag Goenka, Shridhar B. Devamane and Aneesh R Thimmapurmath, "Credit Card Fraud Detection Using Machine Learning Algorithms," International Journal of Computer Science and Mobile Computing (IJCSMC).

[7] Sangeeta Mittal and Shivani Tyagi, "Performance Evaluation of Machine Learning Algorithms for Credit Card Fraud Detection," Decision Support Systems, vol. 50, no. 3, pp. 602-613, 2019.

[8] Vishnu R. Sonwane, Siddika Zanje, Siddhant Yenpure, Yash Gunjal, Yash Kulkarni and Rohit Yeole, "Advanced Machine Learning Techniques for Credit Card Fraud Detection: A Comprehensive Study," International Journal of Scientific Engineering and Technology, vol. 1, no. 3, pp. 194-198, 2019, ISSN 2277-1581.

[9] Indrani Vejalla, Sai Preethi Battula, Kartheek Kalluri and Hemantha Kumar Kalluri, "Credit Card Fraud Detection Using Machine Learning Techniques," Communications and Mechatronics Engineering (ICECCME).

[10] Fayaz Itoo, Meenakshi and Satwinder Singh, "Comparison and analysis of logistic regression algorithm for credit card fraud detection," Data Science & Engineering (Confluence).

[11] Aditi Singh, Anoushka Singh, Anshul Aggarwal and Anamika Chauhan, "Design and Implementation of Different Machine Learning Algorithms for Credit Card Fraud Detection," 2024 5th International Conference on Smart Electronics and Communication (ICOSEC).

[12] Rimpal R. Popat and Jayesh Chaudhary, "A Survey on Credit Card Fraud Detection Using Machine Learning," International Journal of Computer Science and Mobile Computing (IJCSMC).

Dept of ECE, RNSIT, Bengaluru 48

You might also like