Enoch Project
INTRODUCTION
Credit card use has grown in popularity across numerous areas, including
healthcare, due to the simplicity with which transactions can be made with them.
Credit cards have made internet transactions more convenient and accessible as
society moves toward cashless transactions (Mehbodniya et al., 2021). Fraudulent transactions, on the other hand, result in a large loss of capital every year, a loss that is expected to rise in the coming years. Fraud can be identified manually, with fraud investigators examining each transaction and providing binary feedback on it, or automatically, with algorithms trained on the ways fraudulent transactions have occurred in the past. Health-care
fraud is a severe problem that affects both patients and providers of health-care
services. As a result, fraud detection is critical while doing online transactions.
The technique of examining the behavior of cardholder transactions to determine
whether they are genuine is known as fraud detection. The unauthorized use of a
credit or debit card, or a comparable payment method (ACH, EFT, recurring
charge, etc.) to obtain money or property is known as credit card fraud. Credit and
debit card numbers can be taken via unprotected websites or identity theft schemes.
Many techniques are constantly used to obtain sensitive information about credit card users, such as phishing and trojan viruses. As a result, robust technology for identifying various types of credit card fraud should be available. Standard algorithms such as Naive Bayes, Logistic Regression, K-Nearest Neighbor (KNN), and Random Forest, as well as deep learning techniques such as the sequential Convolutional Neural Network, are employed for detecting credit card fraud (Mehbodniya et al., 2021).
The study of computer algorithms that may learn and improve over time as a result
of their experience and usage of data is known as machine learning (ML) (Mitchell, 1997). It is a type of artificial intelligence. Machine learning algorithms create a model from training data and use it to make predictions or judgments without the need for explicit programming (Samuel, 1959). Machine learning
algorithms are employed in a range of applications where traditional algorithms are
difficult or impossible to design, such as medicine, email filtering, speech
recognition, and computer vision (Hu. J et al, 2020).
Several authors have used machine learning to model and assess publicly available
data; however, according to the comparative analysis, several approaches, including the sequential Convolutional Neural Network, Naive Bayes, Logistic Regression, K-Nearest Neighbor (KNN), and Random Forest, among others, produce superior results but still require further improvement.
However, because the K-Nearest Neighbor algorithm calculates the distance between data points for every training sample, it has a large computing cost. This
could have an impact on the accuracy of the outcome.
To address this problem, this study proposes a machine learning model for financial fraud detection in health care that uses the Recursive Feature Elimination (RFE) feature selection method with three classifiers: K-Nearest Neighbor, Naïve Bayes, and Logistic Regression, to classify the dataset and to analyze and evaluate the performance of the developed model.
1.3 Aim and Objectives
The aim of this project is to detect fraudulent credit card transactions in the health care sector. The objectives of this project are to:
i. Design a financial fraud detection model using an existing dataset from an open-source repository.
ii. Develop a machine learning model using the RFE feature selection method with KNN, Naïve Bayes, and Logistic Regression to create the intended detection model.
iii. Evaluate the implemented algorithms in terms of performance metrics.
iv. Compare this model to other previously developed models.
The significance of this study is to aid in improving fraud detection accuracy and performance. This study is aimed at the victims of financial fraud in the health care sector, such as patients, customers, and health service providers.
This project helps in the further reduction of credit-card-related financial losses caused by financial fraud in health care among victims such as patients, customers, and health care service providers. This study uses machine learning algorithms to detect financial fraud and dimensionality reduction to reduce the number of input variables in the training data.
CHAPTER 2
LITERATURE REVIEW
Medical professionals, organizations, and ancillary health care staff all work in the health services industry, providing medical treatment to those in need. Patients, families, communities, and the broader public all benefit from health services. Emergency, preventive, rehabilitative, long-term, hospital, diagnostic, primary, palliative, and home care services are all covered. These services strive to improve the accessibility, quality, and patient-centeredness of health care. Several different types of care and providers are required to deliver successful health services (Health Services: Definition, Types & Providers, 2018).
When someone defrauds you of money or harms your financial well-being in any
way by deception, fraud, or other unlawful means, it is known as financial fraud.
Identity theft and investment fraud are two examples of how this might be
performed. The bulk of victim compensation schemes do not reimburse money lost
due to deception or fraud. You should investigate your state's victim compensation
laws. Civil justice may be the only legal option for recovering money that has been
misappropriated. Regardless of the type of financial fraud, it is vital to report the
crime as quickly as possible to the appropriate agencies and law enforcement.
When fraudulent charges are discovered, they should be challenged or cancelled as
quickly as possible. Victims should also collect evidence connected to the crime,
such as bank statements, credit reports, and current and prior year tax forms, and
continue to file crucial information throughout the reporting process. (Apoorva,
2022)
Fraud is deception with the intent of obtaining an unlawful advantage for the
offender or depriving a victim of a right. Fraud may take many forms, including
tax fraud, credit card fraud, wire fraud, securities fraud, and bankruptcy fraud. A
single person, a group of people, or an entire corporation can engage in fraudulent
behavior (James & Margaret, 2021).
Fraud detection refers to a set of measures for preventing money or property from
being obtained via deception. Fraud detection is utilized in a wide range of
businesses, such as banking and insurance. Banking fraud includes things like
check forgery and the use of stolen credit cards. Other sorts of fraud include
exaggerating losses or fabricating an accident for the sole purpose of getting a
payout (Gillis, 2021).
With the limitless and ever-increasing number of ways someone may commit fraud, detection can be difficult. A company's ability to identify fraud might be harmed by reorganization, downsizing, transitioning to new information systems, or experiencing a cybersecurity incident. Techniques such as real-time fraud monitoring have been proposed. Fraud should be examined across all financial
transactions, locations, devices utilized, initiated sessions, and authentication
methods.
Algorithms for detecting fraud are present in all modern financial systems. They're
a crucial tool for financial institutions to avoid chargebacks, investigation fees,
government fines, and brand damage. A good preventive and detection system can
benefit companies in numerous ways. It can screen out the great majority of
fraudulent transactions, allowing security personnel to focus on other tasks.
However, not all systems for detecting fraud are the same. Machine learning-
based credit card fraud detection is an interesting new advancement in the field of
detecting payment abnormalities. It enables financial organizations to detect
fraudulent transactions with unprecedented precision. It aids in the reduction of
false positives for legitimate transactions. It accomplishes this while lowering
overall IT costs (Sidelov, 2021).
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
to improve prediction accuracy without being created particularly for it. Machine
learning algorithms anticipate new output values using past data as input (Burns, 2021).
Machine learning is a type of data analysis in which analytical models are created
using artificial intelligence. It's a branch of artificial intelligence based on the
premise that computers can learn from data, recognize patterns, and make
judgments with little or no human input. Machine learning (ML) systems offer a
far more current and efficient method of automating safety procedures. ML algorithms have proven to be extremely successful for a number of big business firms (Sidelov, 2021). Machine learning approaches are commonly divided into three categories:
i. Supervised Learning
ii. Unsupervised Learning
iii. Reinforcement Learning
2.4.1.1 Supervised Learning
Supervised learning is a method of developing artificial intelligence (AI) that
includes training a computer system on input data that has been tagged for a certain
output. When given never-before-seen data, the model is trained until it can find
the underlying patterns and links between the input data and the output labels,
allowing it to offer suitable labeling results (Petersson, 2021).
Common supervised learning algorithms include:
i. Polynomial regression
ii. Random forest
iii. Linear Regression
iv. Logistic regression
v. Decision trees
vi. K-Nearest Neighbor
vii. Naïve Bayes
2.4.1.2 Unsupervised Learning
The use of artificial intelligence (AI) systems to find patterns in data sets that
comprise data points that are neither classified nor labeled is known as
unsupervised learning. As a consequence, the algorithms can categorize, label,
and/or organize the data points in the data sets without the need for outside
assistance. Unsupervised learning, in other words, allows the system to detect
patterns in data sets on its own. Even if no categories are specified, an AI system
will categorize unsorted data based on similarities and differences in unsupervised
learning. Compared to supervised learning systems, unsupervised learning
algorithms can handle more complicated processing tasks. Furthermore,
unsupervised learning is one method of putting AI to the test. (Pratt, 2021).
Filter, wrapper, and embedded techniques are the three types of feature selection methods, depending on how they interact with the classifier.
The Fisher score is one of the most often used supervised feature selection methods. Based on the Fisher score, the method returns the variables ranked in decreasing order; the variables can then be chosen based on the situation.
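As an illustration of score-based ranking, the short sketch below ranks features with scikit-learn's ANOVA F-statistic (f_classif), used here as a stand-in for the Fisher score since both rank features by between-class separation; the file and column names ('dataset_full.csv', 'Class') are assumptions, not taken from the original code.

import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Load the data; the file and label column names are assumed for illustration.
data = pd.read_csv('dataset_full.csv')
X = data.drop(columns=['Class'])
y = data['Class']

# Score every feature and keep the ten highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=10)
selector.fit(X, y)

# Sorting the scores in decreasing order gives the feature ranking.
ranking = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(ranking)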
The K-Nearest Neighbor (KNN) technique classifies items by learning data that is
nearest to the item based on previous and current data comparisons. KNN
determines the distance to the closest neighbor using the Euclidean distance formula, but other algorithms optimize the distance formula by comparing it to other related formulae to achieve the best results. To determine the distance to the nearest neighbor, the Euclidean distance formula in KNN will be compared to the normalized Euclidean, Manhattan, and normalized Manhattan distances (Lubis et al., 2020).
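A minimal sketch of that comparison, assuming scikit-learn; KNeighborsClassifier exposes the distance formula through its metric parameter, so Euclidean and Manhattan distances can be evaluated on the same split. Synthetic data stands in for the credit card dataset.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic, imbalanced stand-in data for illustration only.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the same KNN model with two different distance formulas and compare accuracy.
for metric in ['euclidean', 'manhattan']:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    knn.fit(X_train, y_train)
    print(metric, accuracy_score(y_test, knn.predict(X_test)))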
In that study, many similarity measures for both numerical and binary data were
generated using a mixture of well-known distances, and the efficiency of k-NN for
categorizing such diverse data sets was studied (Ali et al., 2019). The trials used
six different datasets from various domains and two different types of metrics. For
heterogeneous data, the suggested measures beat Euclidean distance, suggesting
that the challenges faced by different data demand unique similarity metrics
adapted to the data characteristics (Ali et al., 2019).
One of the most fundamental probabilistic classifiers is Naive Bayes. Despite the
strong assumption that all characteristics are conditionally independent given the
class, it frequently performs wonderfully in a wide range of real-world settings.
Class probabilities and conditional probabilities are generated using training data
in the learning phase of this known structure classifier, and the values of these
probabilities are subsequently utilized to categorize fresh observations (Taheri & Mammadov, 2013).
Bayesian Networks (BNs) were first introduced by Pearl (1988). They are high-level representations of probability distributions over a set of variables X = X1, X2, ..., Xn. The two stages of BN learning are structure
learning and parameter learning. The former generates a directed acyclic graph
from the collection X. Each node in the graph represents a variable, and each arc
depicts a causal link between two variables, with the arc's orientation representing
the direction of causality. The causal node is referred to as the parent, while the
other node is referred to as the child, when two nodes are connected by an arc. The
set of parents of the node Xi is Pa(Xi), where Xi signifies both the variable (feature) and the related node. Finding the probability distributions, class probabilities, and conditional probabilities associated with each variable given a structure is referred to as parameter learning (Taheri & Mammadov, 2013).
As shown in Figure 2.4, the Naive Bayes classifier assumes that each feature
simply depends on the class. This signifies that the class is the single parent for
each feature. NB is appealing because it has a clear and robust theoretical
foundation that ensures optimal induction given a set of stated assumptions. The
independency assumptions of features with respect to the class are violated in some
real-world problems, which is a flaw. However, it has been demonstrated that NB
is remarkably resistant to such violations. NB is quick, easy to use, and effective because of its straightforward structure. It is also well suited to high-dimensional data because each feature's probability is calculated separately. According to Wu et al. (2008), NB is one of the top ten data mining methods (Taheri & Mammadov, 2013).
Let C stand for the class of observation X. Using the Bayes method to forecast the class of observation X, the highest posterior probability of

p(C \mid X) = \frac{p(C)\, P(X \mid C)}{P(X)}        (eqn 2.2)

should be found.
Using the premise that features X1, X2, ..., Xn are conditionally independent of each other given the class, we derive the NB classifier:

p(C \mid X) = \frac{p(C) \prod_{i=1}^{n} P(X_i \mid C)}{P(X)}        (eqn 2.3)
There are three distinct optimization models to estimate the class probabilities P(C) and the conditional probabilities P(Xi|C), i = 1, ..., n, in eqn 2.3 (Taheri & Mammadov, 2013).
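To make eqn 2.2 and eqn 2.3 concrete, the toy sketch below computes the Naive Bayes posterior for one observation with two binary features; all probability values are invented purely for illustration.

# Toy Naive Bayes posterior following eqn 2.2 / eqn 2.3 (illustrative values only).
prior = {'fraud': 0.02, 'legit': 0.98}            # p(C)
cond = {                                          # P(X_i | C) for two observed feature values
    'fraud': {'x1': 0.70, 'x2': 0.60},
    'legit': {'x1': 0.05, 'x2': 0.10},
}

# Numerator of eqn 2.3: p(C) multiplied by the product of the conditional probabilities.
scores = {c: prior[c] * cond[c]['x1'] * cond[c]['x2'] for c in prior}

# Dividing by P(X), the sum over classes, normalizes the scores into posteriors.
evidence = sum(scores.values())
posterior = {c: s / evidence for c, s in scores.items()}
print(posterior)                                  # the class with the highest posterior is predicted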
Figure 2.4: Structure of the Naive Bayes classifier, in which the class node C is the sole parent of the feature nodes x1, x2, x3, ..., xn.
In 2017, Awoyemi et al. compared the efficacy of several techniques such as Naive
Bayes, KNN, and Logistic Regression, when they looked at severely distorted
credit card fraudulent data. For a total of 284,807 transactions, customers
throughout Europe supplied credit card transaction information. A hybrid approach of undersampling and oversampling was applied to the skewed data, and the three approaches were tested on both the raw and the preprocessed data. Python was used to complete the work. The Naive Bayes, K-Nearest Neighbor, and Logistic Regression classifiers had optimal accuracies of 97.92 percent, 97.69 percent, and 54.86 percent, respectively, according to the data. According to the findings of the comparison, KNN outperforms the Naive Bayes and Logistic Regression approaches.
In 2017, Dal Pozzolo et al. offered three major contributions. First, the authors provided a formalization of the fraud detection problem that appropriately depicts the operating conditions of FDSs that monitor massive amounts of credit card transactions on a regular basis. The authors also demonstrated how to spot fraud by employing the most appropriate evaluation techniques. Second, to address class imbalance, concept drift, and verification delay, the authors designed and tested a novel learning method. Finally, the authors demonstrated the impact of the unequal class distribution and concept drift using a real-world data stream of over 75 million transactions collected over the course of three years. Two types of random forests are used to learn the behavior of regular and anomalous transactions.
In terms of credit card fraud detection, the framework presented by Xuan et al. in 2018 examined the results of several random forest variants against a variety of classification models. These studies used data from a Chinese e-commerce company.
In their study published in 2018, Jurgovsky et al. framed the fraud detection
problem as a sequence classification job and employed long short-term memory
networks to include transactional sequences. Furthermore, the system employs
cutting-edge attribute aggregation algorithms and reports the framework's findings
using standard retrieval metrics. When compared to a benchmark Random Forest
classifier, the LSTM improves identification accuracy on offline transactions
where the cardholder is physically present at merchants. Both sequential and
nonsequential learning systems benefit from manual attribute aggregation
strategies. Following an examination of true positives, it was determined that both
approaches detect different types of fraud, implying that they should be used
jointly.
In a study published in 2019, Varmedja et al. revealed many approaches for
determining if transactions are fraudulent or not. The credit card fraud
identification dataset was used in this research. Because the dataset was severely
imbalanced, the SMOTE method was used to oversample it. Furthermore,
attributes were picked, and the dataset was split into two sections: training data and
test data. The technologies used in the study were Logistic Regression, Random
Forest, Naive Bayes, and Multilayer Perceptron. The research shows that each
system is capable of accurately detecting credit card fraud. Additional anomalies
could be discovered using the developed framework. Systems that use supervised
learning approaches to detect credit card fraud are based on the premise that
fraudulent patterns can be learned from a review of previous transactions.
By merging supervised and unsupervised approaches, Carcillo et al. proposed a
hybrid methodology for enhancing fraud detection accuracy in 2019. Unsupervised
anomaly ratings created at various degrees of granularity are investigated and
assessed using a real, labeled credit card fraud identification dataset. Experimental
data show that the combination is effective and enhances identification accuracy.
In a study published in 2018, Randhawa et al. used machine learning approaches to
detect credit card fraud. To begin, traditional methods are employed. Then, using a
combination of AdaBoost and majority voting, hybrid techniques are used. The
framework's effectiveness is evaluated using a publicly available credit card
dataset. The information is then analyzed using a real-time credit card dataset
obtained from a financial institution. In addition, to test the approaches' resiliency,
distortion is inserted into the data samples. The results of the experiments reveal
that the majority voting method accurately detects instances of credit card fraud.
To identify credit card fraud issues, De Sá et al. introduced the Fraud-BNC
approach in 2018. The Bayesian network classification model is used to underpin
the proposed technique. Fraud-BNC was developed using a dataset from PagSeguro, Brazil's most popular online payment platform, and tested against two
cost-sensitive categorization algorithms. The obtained results were compared to
seven other techniques, and the methodology's cost efficiency and data
classification issue were evaluated.
In 2020, Sailusha et al. developed a credit card fraud detection model. The focus of
this study is on machine learning techniques. The AdaBoost and Random Forest
methodologies were used in this study. To compare the outcomes of the two
methods, the accuracy, precision, recall, and F1-score are used. To create the ROC
curve, the confusion matrix is used. These two techniques were evaluated in terms
of performance criteria like accuracy, precision, recall, and F1-score. The best
methodology for detecting fraud is the one that has the best performance metrics.
Bagga et al. proposed a framework in 2020 to compare the efficacy of various
methodologies on credit card fraud data, including Logistic Regression, Naive
Bayes, Random Forest, KNN, AdaBoost, Multilayer Perceptron, Pipelining, and
Ensemble Learning. The variables used and the method used to detect fraud have
an impact on the effectiveness of fraud detection.
Zhaohui Zhang proposed a Convolutional Neural Network-Based Model for
Detecting Online Transaction Fraud in 2018. It creates an input feature sequencing
layer that allows raw transaction features to be reorganized into multiple
convolutional patterns. When compared to the existing CNN for fraud detection,
the experimental results show that the model achieves excellent fraud detection
performance without derivative features, with precision stabilizing at 91 percent
and recall stabilizing around 94 percent, an increase of 26 percent and 2 percent
respectively.
Srivastava et al. (2008) used the Hidden Markov Model (HMM) to describe the
credit card transaction process. For this analysis, HMM was used as a detector for
fraudulent transactions after being programmed with specific cardholder behavior.
Following the training phase, the incoming credit card transactions were checked
using the model. If HMM did not accept the incoming credit card transaction, it
would be considered a fraud. The main disadvantage of this approach is that HMM
generates a high rate of false alarms in both positive and negative situations.
Halvaiee and Akbari (2014) proposed an Artificial Immune System-based Fraud
Detection Model (AISFDM) for detecting credit card fraudulent behavior. In this
approach, AIS was used as the artificial immune detection mechanism. An
algorithm inspired by the immune system was developed to improve the accuracy
of fraud detection. It does not, however, improve classification accuracy.
Duman and Ozcelik (2011) addressed the issue of detecting fraudulent credit card
transactions. For better classification performance, the authors first introduced a
new classification cost function for fraud detection, and then combined two meta-heuristic algorithms, the genetic algorithm and scatter search.
Krivko (2010) used a data-customized approach to detect plastic card fraud. To
compensate for the shortcomings of the individual methods, the proposed approach
combined supervised and unsupervised methodologies. The proposed method first
tracked changes in transaction behavior over time, and then assigned scores to each
fraudulent transaction based on the assumption of fraud behavior. The rule-based
filters were fed a sequence of transactions with scores greater than a certain
threshold value. The rules were then generated from those transactional records
with the goal of improving the detection's performance. However, it is also critical
to keep the saving information in order to improve detection.
Lei and Ghorbani (2012) proposed an Improved Competitive Learning Network
(ICLN) and a clustering algorithm of the Supervised Improved Competitive
Learning Network (SICLN). To represent the data centers, the ICLN neural network was programmed to use a reward-punishment update rule. SICLN then used the
updated rule and achieved better results during clustering by assigning class labels
to guide the training process. Improving the SICLN's convergence speed
necessitates the use of an efficient method.
Ravisankar et al. (2011) compared data-driven fraud detection approaches based on
past fraudulent behavior and financial ratios. The authors compared methods for
detecting fraud in business financial statements, including Multilayer Feed
Forward Neural Network (MFFNN), Support Vector Machines (SVM), Genetic
Programming (GP), Group Method of Data Handling (GMDH), Logistic
Regression (LR), and Probabilistic Neural Network (PNN). Then, feature selection
methods were used to extract fraudulent transactions from the dataset, and fraud
behavior was found to be effective. GP, which was found to be the best method among them, suffers from marginally lower accuracy in detecting fraudulent financial reporting activities.
Glancy and Yadav (2011) proposed a Computational Fraud Detection Model
(CFDM). CFDM discovered incorrect details in annual filings with the assistance
of the US Securities and Exchange Commission (SEC) using information
presented in a text document for the detection process.
A. Shen et al. (2007) demonstrated the effectiveness of classification models on the credit card fraud detection problem, proposing three classification models: decision tree, neural network, and logistic regression. Among the three models, the neural network and logistic regression outperformed the decision tree.
Y. Sahin and E. Duman (2011) conducted research on credit card fraud detection in which seven classification algorithms were employed. To reduce the
risk of the banks, they used decision trees and SVMs in this study. They propose
that Artificial Neural Networks and Logistic Regression classification models are
more useful in improving fraud detection performance.
To increase the efficiency of identifying financial fraud, Li and Wong (2015)
developed Grammar-based Multi-Objective Genetic Programming with Statistical
Selection Learning (GBMGP-SSL). To adjust the goal values of each solution, this
system employed token competition. To maintain variety, similar objective values
of various definitions were separated.
For credit card fraud detection, Van Vlasselaer et al. (2015) developed Anomaly
Prevention Using Advanced Transaction Exploration (APATE). The proposed
approach combines past transactional trends and customer actions into useful
features that are then matched with incoming transactions. APATE has the
advantage of being able to detect fraudulent transactions in as little as six seconds.
The proposed method, however, was inapplicable to defining a group of fraudulent
behavior.
CHAPTER 3
METHODOLOGY
We used three key stages in this study: dataset, feature selection, and classification. This study uses open-source data from https://data.world/vlad/credit-card-fraud-detection, which is passed through a feature selection method called Recursive Feature Elimination (RFE) to select the most relevant features and reduce dimensionality; the subset obtained is then used for classification with K-Nearest Neighbor, Logistic Regression, and Naïve Bayes. Afterwards, performance measures are used to compare and assess the results. The proposed framework for this investigation is shown in Figure 3.1.
Recursive Feature Elimination
RFE assesses the features by significance and returns the top-n features after
removing the least important features, where n is the user's input.
To use Recursive Feature Elimination, it must be imported from the sklearn.feature_selection library. The major parameters it takes are the estimator used to rank the features, the number of features to select, and the step, which controls how many features are removed at each iteration.
After that, the coefficient associated with each attribute, obtained from the estimator's coef_ or feature_importances_ property, is considered. Those coefficients are
essentially the same as the ones we get after fitting the model to the dataset and
minimizing the residuals. The relevance of these coefficients with the target
variable is shown by their value. The feature with the smallest absolute coefficient
value is deemed the least significant, and so on.
The least important coefficient is then removed from the list of characteristics, and
the model is rebuilt using the remaining features. The step parameter determines
the number of features to be dropped at each iteration. It is preferable to remove
one feature at a time because the coefficient values of other features change when
the model is rebuilt.
With each iteration, it rebuilds the model, removing the least important feature(s) and continuing the process until only the requested number of features remains. After that, it assigns a score to each feature depending on when it was eliminated. The feature that was eliminated first receives the highest rank number, and so on, while the final n features that remain are each given a rank of one (Mittal, 2020).
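A minimal sketch of this procedure with scikit-learn's RFE, assuming a logistic regression estimator and the file and column names used elsewhere in this report ('dataset_full.csv', 'Class'); the selected features appear in support_, and every retained feature receives rank 1 in ranking_.

import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Load the data; the label column name is assumed to be 'Class'.
data = pd.read_csv('dataset_full.csv')
X = data.drop(columns=['Class'])
y = data['Class']

# Recursively eliminate one feature per iteration until nine remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=9,
          step=1)
rfe.fit(X, y)

print('Selected features:', list(X.columns[rfe.support_]))
print('Feature ranking:', dict(zip(X.columns, rfe.ranking_)))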
KNN measures the distance between two data points Xi and Xj with m features using the Euclidean distance:

Dist_{ij} = \sqrt{\sum_{l=1}^{m} (X_{il} - X_{jl})^{2}}        (eqn 3.2)
We run the KNN algorithm numerous times with different values of K to find the K that reduces the number of errors we encounter while preserving the algorithm's ability to make correct predictions when it is given data it has not seen before.
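The K search can be sketched as a simple loop over candidate values, recording the misclassification rate on held-out data and keeping the K with the lowest error; synthetic data stands in for the reduced credit card feature set.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the nine selected features.
X, y = make_classification(n_samples=2000, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Track the test error for each candidate K and keep the best one.
errors = {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    errors[k] = 1 - knn.score(X_test, y_test)

best_k = min(errors, key=errors.get)
print('Best K:', best_k, 'with error rate:', errors[best_k])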
Naïve Bayes is a statistical approach based on Bayesian theory that determines the outcome based on the highest likelihood. Based on the given value, it calculates the likelihood of the unknown value. To anticipate unknown probabilities, logic and prior information can be used. Binary classes and conditional probabilities are the foundations of Naïve Bayes (Mehbodniya et al., 2021).
prob(class_j \mid feature_k) = \frac{prob(feature_k \mid class_j) \times prob(class_j)}{prob(feature_k)}        (eqn 3.4)

prob(feature_1, \ldots, feature_m \mid class_j) = \prod_{k=1}^{m} prob(feature_k \mid class_j)        (eqn 3.5)
Input:
Training dataset T,
F = (f1, f2, f3, ..., fn) // values of the predictor variables in the testing dataset
Output:
A class label for the testing instance
Steps:
1. Read the training dataset T and compute the prior probability of each class.
2. Compute the conditional probability of each predictor value fi given each class.
3. Compute the posterior probability of each class using eqn 3.4 and eqn 3.5.
4. Assign the testing instance to the class with the highest posterior probability.
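In practice these steps correspond closely to scikit-learn's GaussianNB, which estimates the class priors and per-feature conditional distributions during fit and returns the highest-posterior class from predict; the sketch below assumes the file and label column names used elsewhere in this report.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load and split the data (assumed file and column names).
data = pd.read_csv('dataset_full.csv')
X = data.drop(columns=['Class'])
y = data['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# fit() learns class priors and per-feature mean/variance; predict() returns the
# class with the highest posterior probability for each test instance.
nb = GaussianNB().fit(X_train, y_train)
print(nb.predict(X_test.iloc[:5]))
print(nb.predict_proba(X_test.iloc[:5]))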
3.6.2 Specificity
Specificity, also known as the true negative rate, is the number of correct negative predictions divided by the total number of actual negatives, that is, the sum of the true negatives and the false positives. A specificity value ranges from 0 to 1.

specificity = \frac{\sum TrueNegative}{\sum ConditionNegative} = \frac{\sum TrueNegative}{\sum FalsePositive + \sum TrueNegative}
3.6.3 Precision
Precision is the number of correct positive predictions divided by the total number of positive predictions. A precision value ranges from 0 to 1.

precision = \frac{\sum TruePositive}{\sum TruePositive + \sum FalsePositive}
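Both metrics can be read off a confusion matrix; a small sketch with toy labels, assuming 1 marks the fraud (positive) class:

from sklearn.metrics import confusion_matrix

# Toy ground truth and predictions for illustration (1 = fraud).
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# ravel() unpacks the 2x2 matrix as true negatives, false positives,
# false negatives, and true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
print('specificity:', specificity, 'precision:', precision)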
3.7 System Configuration
To carry out the study, a 64-bit HP Pavilion 15 system with 16 GB RAM and an Intel Core i5-10300H 2.5 GHz processor will be used.
CHAPTER 4
The dataset was stored in the Downloads folder on the system and it contains 284,807 instances and 32 attributes. The dataset was loaded using the following code: data = pd.read_csv('dataset_full.csv').
The shape of the data, which has 284,807 instances and 32 attributes, is returned by data.shape. The shape of the dataset is depicted in Figure 4.3.
Datasets are described in the program using data.describe(), a method for calculating statistics from a data frame's numerical values, such as the mean, percentiles, and standard deviation (std). The count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum were calculated for this dataset. The credit card dataset's data description is shown in
Figure 4.4.
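The loading and inspection steps described above can be summarized in a few lines; the file name matches the one quoted earlier, while the 'Class' label column is an assumption about the dataset.

import pandas as pd

data = pd.read_csv('dataset_full.csv')    # dataset stored locally, as noted above
print(data.shape)                         # expected: (284807, 32)
print(data.describe())                    # count, mean, std, min, percentiles, max
print(data['Class'].value_counts())       # class distribution, assuming 'Class' is the label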
This is a data-splitting approach for machine learning that divides data into training, test, and validation sets. Each algorithm divided the data into subgroups for training and testing. The training set was used to fit the model, and the test set was used to conduct the evaluation. In this study, 80% of the data was used for training and 20% for testing. Figure 4.5 depicts the data for training and testing.
Figure 4.5 Diagram showing split dataset.
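A sketch of the 80/20 split, reusing the data frame loaded above; stratification is added here as a common choice for imbalanced fraud data, although the original split settings are not stated.

from sklearn.model_selection import train_test_split

# Separate features and label ('data' and 'Class' as in the previous sketch).
X = data.drop(columns=['Class'])
y = data['Class']

# 80% for training, 20% for testing; stratify keeps the fraud ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)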
The optimal number of selected features offered in this study is 9, and RFE is used to choose the relevant features that are most suitable for performance. The features selected using the Recursive Feature Elimination technique are shown in Figure 4.6.
Figure 4.6 Selecting features from the dataset using RFE
4.4.1 Evaluating RFE for classification
RFE is evaluated for classification in order to assess its accuracy when paired with a decision tree classifier; the model is evaluated and its accuracy is reported. Figure 4.7 shows the evaluation of RFE for classification.
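One common way to run this check, assumed here rather than taken from the original code, is to wrap RFE and the decision tree in a pipeline and report cross-validated accuracy:

from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# RFE selects nine features with a decision tree, then a decision tree classifies them.
pipeline = Pipeline([
    ('rfe', RFE(estimator=DecisionTreeClassifier(), n_features_to_select=9)),
    ('model', DecisionTreeClassifier()),
])

# X and y as prepared in the earlier sketches.
scores = cross_val_score(pipeline, X, y, cv=5, scoring='accuracy')
print('Mean accuracy: %.3f (std %.3f)' % (scores.mean(), scores.std()))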
The Bayesian theorem is utilized to create the Naïve Bayes algorithm, which is used to handle classification problems. Simple and effective classification methods, such as the Naïve Bayes classifier, are critical for quickly constructing machine learning models that can make accurate predictions. The confusion matrix and ROC curve of Naïve Bayes after RFE are shown in Figures 4.13 and 4.14.
Figure 4.13 Confusion Matrix of Naïve Bayes
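A sketch of how the confusion matrix and ROC curve for the Naïve Bayes model can be produced, assuming the train/test split from the earlier sketches:

import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, roc_curve, auc

# Fit Naive Bayes and print its confusion matrix on the test set.
nb = GaussianNB().fit(X_train, y_train)
print(confusion_matrix(y_test, nb.predict(X_test)))

# ROC curve from the predicted probability of the fraud class.
y_score = nb.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score)
plt.plot(fpr, tpr, label='Naive Bayes (AUC = %.3f)' % auc(fpr, tpr))
plt.plot([0, 1], [0, 1], linestyle='--')   # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()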
This type of graph is vital to the study of statistics because it can reveal the degree
of correlation between selected features or variables. Observation and visualization
of relationships between two numeric variables are the primary purposes of scatter
plots. Individual data points are represented by dots in a scatter plot, while the data as a whole is represented by the patterns those dots form. Figure 4.15 shows the scatter plot visualization
in the credit card dataset.
Figure 4.15 Scatter Plot Graph
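A minimal scatter-plot sketch; the two column names below are placeholders for whichever selected numeric features are plotted.

import matplotlib.pyplot as plt

# Plot two numeric features against each other, colored by class ('V1', 'V2' are assumed names).
plt.scatter(data['V1'], data['V2'], c=data['Class'], s=5, alpha=0.5)
plt.xlabel('V1')
plt.ylabel('V2')
plt.title('Scatter plot of selected features')
plt.show()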
Experiments were carried out, and the results are shown in Table 4.1: KNN, Logistic Regression (L1), Logistic Regression (L2), and Naïve Bayes performed with accuracies of 99.9%, 99.9%, 99.9%, and 98.1%, respectively.
Table 4.2 shows the comparison of the result with other works.
5.1 Summary
To detect credit card fraud, this study uses machine learning approaches. Different conventional models are introduced and evaluated, including Logistic Regression, Decision Tree, K-Nearest Neighbor, and Naïve Bayes. Logistic Regression and KNN beat Naïve Bayes in terms of performance, according to the technique used in this study. This might arise because the dataset is insufficient to train on and uncover hidden patterns in order to anticipate future or upcoming data, and because the weights' initialization was extremely random, potentially interfering with the training process. The dataset was imbalanced and high-dimensional, and it was refined by removing less relevant features using Recursive Feature Elimination.
5.2 Conclusion
The credit card dataset is open to the general public. To achieve accuracy, a variety of standard models are trained and evaluated, and the best model using both stored and real-time data is picked. The dataset is used to train and test machine learning classifiers, and their performance is assessed using a variety of credit card fraud indicators. When contrasted with the sequential pattern of previously expected fraud detection data, our research shows that online and offline transactions have distinct characteristics.
5.3 Recommendation