Article in International Journal of Scientific & Technology Research · October 2021

All content following this page was uploaded by Derar Eleyan on 19 October 2021.



INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 10, ISSUE 09, SEPTEMBER 2021 ISSN 2277-8616

A Survey Paper On Credit Card Fraud Detection Techniques
Aisha Mohammad Fayyomi, Derar Eleyan, Amina Eleyan
Abstract: Credit cards are the most widely used electronic payment method because of the increasing volume of daily electronic transactions, which also makes them more vulnerable to fraud. Credit card companies have suffered heavy losses from card fraud, and detecting it is currently their most common problem. These companies are looking for the right technologies and systems to detect and reduce fraudulent credit card transactions. This paper surveys several methods for identifying credit card fraud and compares the advantages and disadvantages of each.

Index Terms: Detection Techniques, Machine Learning, Credit Card, Fraud Detection.
————————————————————

• Aisha Mohammad Fayyomi is currently a graduate student at Technical University-Kadoorie, Tulkarem, Palestine. E-mail: [email protected]
• Department of Applied Computing, Technical University-Kadoorie, Tulkarem, Palestine. E-mail: [email protected]
• Department of Computing and Mathematics, Manchester Metropolitan University, Manchester M15 6BH, United Kingdom. E-mail: [email protected]

1 INTRODUCTION
In recent years, online payment methods have been widely used as a result of the rapid increase in non-cash electronic transactions. Credit cards are one such electronic payment method. A credit card is a thin rectangular piece of plastic or metal issued by a bank or financial services company to a consumer (the cardholder) to facilitate payment to a merchant for goods and services. It is based on the consumer's promise to the card issuer. The card issuer (usually a bank) opens an account, which is usually revolving, and grants the user a line of credit that the user can draw on to make payments. Card-based payments account for approximately 51% of transactions [1], [2], [3].

Despite the advantages of electronic payment, credit card companies are experiencing an increase in card fraud with the advent of many new technologies. Scammers are smart enough to take advantage of loopholes and always try to steal data using techniques such as skimming and phishing. In some cases a website is designed to imitate a legitimate site, and victims enter personal information such as passwords, user names, and credit card details. The fraudster sends out a large number of e-mails (bait) that direct victims to the bogus website. The e-mails appear to come from organizations such as PayPal, banks, AOL, and eBay, and they ask the victim to enter personal information in order to resolve some "issue." The fraudster profits by stealing the victim's identity and then their money [4].

Credit card fraud has caused heavy financial losses. "According to a 2017 US Payments Forum report, criminals have shifted their focus to activities involving CNP transactions as chip card security has improved" [5]. "The estimated financial loss of credit card fraud worldwide in 2018 rose to $24.26 million" [6]. "By 2019, the global fraud losses have accounted for US $27 billion, according to PR Newswire Association LLC" [7]. "Moreover, it is estimated that it will surpass roughly $30 billion by 2020" [8]. Activation procedures have contributed to a reduction in the impact of fraud, and merchants are putting programs in place to help prevent credit card fraud. However, more precautions must be taken to prevent fraud [4]. Machine learning algorithms, which offer high computing power and the ability to handle large datasets, detect fraudulent transactions efficiently and are a promising way to reduce credit card fraud [9], [10].

This paper includes seven sections. Section II summarizes previous studies. Section III describes how the primary studies were systematically selected. Section IV briefly describes several popular credit card fraud detection techniques. Section V compares various fraud detection techniques. Section VI summarizes the results and discussion. Finally, Section VII presents the conclusion and future scope.

1.1 Credit Card Fraud
Fraud, according to the Association of Certified Fraud Examiners, is defined as any wilful or deliberate act of depriving another of property or money through guile, deception, or other unfair means [11]. The unauthorized use of a credit card (CC) or its information without the owner's knowledge is called credit card fraud (CCF). CCF schemes fall into two groups: application fraud and behavioural fraud. In application fraud, fraudsters apply for a new card from a bank or issuing company using false or other people's information. A user can file multiple applications with a single set of details (called duplicate fraud), or different users can file applications with similar details (called identity fraud). Behavioural fraud, in turn, has four main types: stolen/lost card, mail theft, fake card, and 'cardholder not present' fraud. In stolen/lost card fraud, fraudsters steal a credit card or obtain a lost card. In mail theft fraud, a fraudster receives a credit card or personal information sent by a bank in the mail before it reaches the original cardholder. Fake-card fraud and 'cardholder not present' fraud both rely on card details rather than the genuine card: in 'cardholder not present' fraud, remote transactions are carried out using the card details via mail, phone, or internet, while in fake-card fraud, fake cards are created from the card data [12].

72
IJSTR©2021
www.ijstr.org

1.2 Credit Card Fraud Detection
Online services make electronic payments more convenient, seamless, and simple to use; however, we must not overlook the losses associated with electronic commerce. Organizations and banks propose good security solutions to address these issues, but fraudsters' subtle techniques evolve over time. As a result, it is critical to improve detection and prevention techniques [7]. To combat fraud effectively, it is essential to understand the mechanisms by which it is carried out, because the mechanism for identifying credit card fraud depends on the fraud procedure itself [13]. In practice, the transaction details are passed to a verification module, which classifies the transaction as either fraud or non-fraud. If it is classified as fraudulent, it is rejected; otherwise, it is accepted [14].

Fraud detection techniques such as statistical data analysis and artificial intelligence can be used to distinguish between the two classes. AI techniques include data mining, which can classify, group, and segment data in order to search through millions of transactions, find patterns, and detect fraud. Machine learning is a technique for automatically detecting the characteristics of fraud. Fraud is dealt with through both prevention and detection, whose primary goal is to tell legitimate and fraudulent transactions apart and to stop fraudulent activity. Using historical data, the user's pattern and behavior are analysed to determine whether a transaction is fraudulent. When the system fails to prevent fraudulent activities, fraud detection takes over [15].

In supervised fraud detection systems, new transactions are classified as fraudulent or genuine based on the characteristics of known fraudulent and legitimate activities, whereas in unsupervised fraud detection systems, outlier transactions are flagged as potentially fraudulent. Detailed discussions comparing supervised and unsupervised machine learning techniques are available in the literature. A variety of studies have examined methods for card fraud detection, including artificial neural networks (ANN), K-means clustering, and decision trees (DT) [16].

1.3 Fraud types in Card-based transactions
1) Physical card fraud: In most point-of-sale (POS) transactions, the cardholder must physically present the card to the merchant to carry out the transaction. The customer's card can be stolen and misused by fraudsters without the customer's knowledge.
2) Virtual card fraud: In most online shopping transactions there is no need for a physical card; instead, the card number, expiry date, and CVV number are used to perform the transaction. Fraudsters can steal this information and use it to perform fraudulent online transactions [17].

2. LITERATURE REVIEW
Prajal Save et al. [18] proposed a model based on a decision tree and a combination of Luhn's and Hunt's algorithms. Luhn's algorithm is used to decide whether an incoming transaction is fraudulent by validating the credit card number given as input. Address Mismatch and Degree of Outlierness are used to assess the deviation of each incoming transaction from the cardholder's normal profile. In the final step, the general belief is strengthened or weakened using Bayes' theorem, followed by recombination of the calculated probability with the initial belief of fraud using an advanced combination heuristic.

Vimala Devi J. et al. [19] presented and implemented three machine-learning algorithms to detect counterfeit transactions: Support Vector Machine, Random Forest, and Decision Tree. Many measures are used to evaluate the performance of classifiers or predictors, and these metrics are either prevalence-dependent or prevalence-independent. The techniques were applied to credit card fraud detection, and their results were compared.

Popat and Chaudhary [20] presented supervised algorithms. The techniques covered include deep learning, logistic regression, naive Bayes, Support Vector Machine (SVM), neural networks, artificial immune systems, K-nearest neighbour, data mining, decision trees, fuzzy-logic-based systems, and genetic algorithms. Credit card fraud detection algorithms identify transactions that have a high probability of being fraudulent. The study compared machine-learning algorithms for prediction, clustering, and outlier detection.

Shiyang Xuan et al. [21] used the Random Forest classifier to learn the behavioral characteristics of credit card transactions. Two types of random forest were used to train on normal and fraudulent behavior features: a random-tree-based random forest and a CART-based random forest. Performance measures were computed to assess the model's effectiveness.

Dornadula and Geetha S. [5] used the sliding-window method to aggregate transactions into groups, i.e., features were extracted from each window to find the cardholder's behavioral patterns. The features include the maximum amount, the minimum amount, the average amount in the window, and the time elapsed.

Sangeeta Mittal et al. [22] selected popular machine-learning algorithms in the supervised and unsupervised categories to evaluate the underlying problems. A range of supervised learning algorithms, from classical to modern, was considered, including tree-based algorithms, classical and deep neural networks, hybrid algorithms, and Bayesian approaches. A number of popular algorithms in the supervised, ensemble, and unsupervised categories were evaluated on various metrics to assess their effectiveness in detecting credit card fraud. The study concluded that unsupervised algorithms handle dataset skewness better and thus perform well across all metrics, both in absolute terms and in comparison to other techniques.

Deepa and Akila [17] used different algorithms for fraud detection, including an anomaly detection algorithm, K-Nearest Neighbor, Random Forest, K-Means, and Decision Tree. Based on a given scenario, they presented several techniques and predicted the best algorithm for detecting fraudulent transactions. To predict the fraud result, the system used various rules and algorithms to generate a fraud score for each transaction.

Xiaohan Yu et al. [23] proposed a deep neural network algorithm for detecting credit card fraud. The paper describes the neural network approach as well as deep neural network applications, and uses preprocessing methods and focal loss to resolve data skew issues in the dataset.

Siddhant Bagga et al. [24] presented several techniques for determining whether a transaction is real or fraudulent. They evaluated and compared the performance of nine techniques on credit card fraud data, including logistic regression, KNN, RF, quadratic discriminant


analysis, naive Bayes, multilayer perceptron, AdaBoost, ensemble learning, and pipelining, using different parameters and metrics. The ADASYN method was used to balance the dataset. Accuracy, recall, F1 score, balanced classification rate, and Matthews correlation coefficient were used to assess classifier performance, in order to determine which technique is best suited to the problem based on various metrics.

Carrasco and Urban [25] used deep neural networks to test and measure their ability to detect false positives by processing alerts generated by a fraud detection system (FDS). Ten neural network architectures classified a set of alerts triggered by an FDS as either valid alerts, representing real fraud cases, or incorrect alerts, representing false positives. The optimal configuration achieved an alert reduction rate of 35.16 percent while capturing 91.79 percent of fraud cases, and a reduction rate of 41.47 percent while capturing 87.75 percent of fraud cases.

Kibria and Sevkli [26] built a deep learning model using the grid search technique. The model was applied to a credit card data set, and its performance was compared with that of two traditional machine-learning algorithms: logistic regression (LR) and support vector machine (SVM).

Borse, Suhas and Dhotre [27] used the Naive Bayes classifier to predict whether transactions are normal or fraudulent. The accuracy, recall, precision, F1 score, and AUC score of the Naive Bayes classifier were all calculated.

Asha R B et al. [14] proposed a deep learning-based method for detecting fraud in credit card transactions, using machine-learning algorithms such as support vector machine, k-nearest neighbor, and artificial neural network to predict the occurrence of fraud.

3. RESEARCH METHODOLOGY
A systematic literature review (SLR) is a type of methodology that conducts a literature review on a specific topic; here it is applied to fraud detection. A systematic review's primary goal in this context is to identify, evaluate, and interpret the available studies in the literature that address the authors' research questions. A secondary goal is to identify research gaps and opportunities in the area of interest. In this paper, we followed the activities proposed by Kitchenham: review planning, execution, and reporting, in iterations [28].

3.1 Selection of Primary Studies
To identify primary studies for selection, keywords were passed to the search engines. The keywords were chosen to surface research that helps answer the study questions. The only Boolean operators used were AND and OR. The search string was: ("machine-learning" OR "machine learning") AND "fraud detection". The platforms searched were:
- IEEE Xplore Digital Library
- Google Scholar
- Elsevier ScienceDirect
- Web sites
Depending on the search platform, the title, keywords, and abstract were searched. We conducted the searches on March 28, 2021, and went over all of the studies published up to that date. The results of these searches were refined using the criteria described in Section 3.2, resulting in a set of results that could be processed.

3.2 Inclusion and Exclusion Criteria
Studies on modern technological fraud detection, case studies, and commentary on how to improve existing mechanisms by building a hybrid approach could all be considered for inclusion in this SLR. Papers must be written in English. Results from Google Scholar were checked before inclusion, because Google Scholar can return lower-grade papers. This SLR accepts only the most recent version of a study. Table 1 lists the most important inclusion and exclusion criteria.

Table 1
INCLUSION AND EXCLUSION CRITERIA FOR THE PRIMARY STUDIES

Inclusion | Exclusion
Must contain information related to fraud detection and machine learning technologies. | Centering on the social or legal ramifications of fraud.
The paper must include empirical data on credit card fraud as well as the use of machine learning techniques for detection. | Papers on detecting fraud on individuals and public sites.
The paper must have been published in a journal or a conference. | Written in a language other than English.
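The search string from Section 3.1 can be read as a simple Boolean predicate over a paper's title, keywords, or abstract. The following Python sketch is illustrative only: the function name `matches_search_string` and the sample titles are hypothetical and do not come from the reviewed studies, and real search platforms apply their own matching rules.

```python
import re


def matches_search_string(text: str) -> bool:
    """Evaluate ("machine-learning" OR "machine learning") AND "fraud detection".

    Matching is case-insensitive; this is an assumption, since the paper
    does not say how the platforms treat case.
    """
    lowered = text.lower()
    # "machine-learning" OR "machine learning": hyphen or space between words.
    has_ml = re.search(r"machine[- ]learning", lowered) is not None
    has_fraud = "fraud detection" in lowered
    return has_ml and has_fraud


# Hypothetical titles, invented for illustration only.
records = [
    "Machine learning for credit card fraud detection",
    "Fraud detection with rule-based systems",
    "A machine-learning approach to fraud detection in banking",
    "Legal ramifications of card fraud",
]

# Keep only the records that satisfy the search string.
selected = [title for title in records if matches_search_string(title)]
```

Only the first and third hypothetical titles satisfy both halves of the query; the second lacks the machine learning term and the fourth lacks both.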

3.3 Selection Results
The primary keyword searches against the selected platforms yielded 68 studies. After duplicate studies were removed, this was reduced to 52. After screening against the inclusion/exclusion criteria, 45 papers were left to read. The 45 papers were read in their entirety and, after the inclusion/exclusion criteria were applied a second time, 37 papers remained. As a result, this SLR comprises 37 papers in total, as illustrated in Fig. 1.
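As a quick consistency check on the counts reported in Section 3.3, the attrition at each stage can be tallied. This is a minimal sketch; the stage labels are paraphrases chosen for illustration, and only the four counts (68, 52, 45, 37) come from the text.

```python
# Counts reported in Section 3.3 after each stage of the selection.
stages = [
    ("keyword search", 68),
    ("duplicates removed", 52),
    ("first inclusion/exclusion screening", 45),
    ("full read and second screening", 37),
]

# Number of papers dropped on entering each stage after the first.
dropped = {
    stages[i][0]: stages[i - 1][1] - stages[i][1]
    for i in range(1, len(stages))
}

# Fraction of the initial search results that reached the final set.
retention = stages[-1][1] / stages[0][1]
```

Under these counts, 16 duplicates were removed, 7 papers were excluded at the first screening, and 8 more after the full read, so a little over half of the initial results were retained.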


Fig. 1. Paper Attrition during Processing

4. CREDIT CARD FRAUD DETECTION TECHNIQUES

4.1 Decision trees
A decision tree is a supervised learning method: a graphical representation of the possible solutions to a decision based on certain conditions [29]. As shown in Figure 2, it is a tree-structured classifier. It starts with a root node; internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome. A decision tree has decision nodes and leaf nodes, which serve to make decisions and to communicate the results, respectively. A decision tree simply asks a question and then splits into subtrees based on the answer. Although DT can solve classification and regression problems, it is most commonly used for classification. To predict the class of a record, the algorithm starts at the root of the tree. It compares the root attribute with the record's attribute and, based on the comparison, follows the branch to the next node [30].

Fig. 2. General structure working of DT

Steps Working Of Decision Tree
First, start with the root node S, which contains the entire dataset. Second, find the best attribute in the dataset using an Attribute Selection Measure. The tree is split further until the nodes cannot be classified any more; at that point the final node is called a leaf node. Based on the labels, the root node is subdivided into a decision node and one leaf node. In the end, the decision node is divided into two leaves (accepted and declined offers).

4.2 Random Forest
The Random Forest (RF) classifier builds decision trees on subsets of the data and then aggregates their outputs to obtain the predictive power of the full dataset. Rather than relying on a single decision tree, RF takes the prediction from each tree and determines the final output by majority vote. Using a large number of trees in the forest improves accuracy and mitigates the problem of overfitting. RF predicts output with high accuracy, runs efficiently even on large datasets, and can maintain accuracy when a large proportion of the data is missing. It can handle both classification and regression tasks, as well as large datasets with high dimensionality. Training a tree-based Random Forest takes two steps: first, the random forest is generated by combining N trees, and then a prediction is made for each of the trees generated in the first step [31].

Random forest is an ensemble technique. Because it averts overfitting by averaging the results, this approach outperforms single decision trees. Random Forest is an ensemble of diverse trees, similar to Gradient Boosted Trees (GBT), but unlike GBT, RF trees grow in parallel. Random Forests contain many uncorrelated trees, and because the trees are trained independently, the overall model reduces a large amount of variance. Random Forest treats each tree as a separate classifier that has been trained on resampled data. As a result of this divide-and-learn strategy, the model's overall learning ability is increased [10], [32].

Fig. 3. General structure working of the RF

The Random Forest Working Steps
The following steps describe Figure 3. First, choose K data points at random from the training set. Second, construct the decision tree associated with the chosen data points (subsets). Next, select the number N of decision trees you wish to construct. Then repeat steps 1 and 2. Finally, obtain the prediction of each decision tree for the new data points and assign the new data points to the category that receives the most votes. The following scenario clarifies how RF works: assume you have a dataset with a variety of fruit images, and this dataset is given to the RF classifier. Each decision tree is given a portion of the dataset to

deal with. When a new data point occurs, the Random Forest classifier predicts the result based on the majority of the trees' outcomes.

4.3 Logistic Regression
Logistic regression (LR) is an algorithm that can be used for both regression and classification tasks, but it is most commonly used for classification. It predicts a categorical dependent variable from independent variables. Consider two classes and a new data point that must be assigned to one of them. The algorithm computes a probability value between 0 and 1. To do so, logistic regression passes its output through the sigmoid function, also known as the logistic function [33]. LR also does not require the independent variables to be linearly related, nor does it require equal variance within each group, making it a less stringent statistical analysis procedure. As a result, logistic regression has been used to predict the likelihood of fraudulent credit card transactions [34]. The following scenario clarifies how LR works: the target variable for determining whether a tumor is malignant is y = 1 (tumor = malignant), and the x variable could be a measurement of the tumor, such as its size. The logistic function converts the x-values of the dataset's instances into a range of 0 to 1. The tumor is classified as malignant if the probability exceeds 0.5 (indicated by the horizontal line in Fig. 4).

Fig. 4. Example of LR

4.4 Artificial Neural Networks
An artificial neural network (ANN) is a type of machine learning algorithm that uses sophisticated interactions between inputs and outputs to uncover new patterns. It is also a strategy for detecting fraudulent credit card activity. Because of their advanced predictive capabilities, ANNs can improve existing data analysis techniques. As shown in Figure 5, the input layer is the first layer; it receives input information in the form of texts, audio files, image pixels, numbers, and so on. The hidden layers are made up of units that perform various mathematical computations on the input data, recognize patterns, and transform the input into something the output layer can use. The output layer delivers the result obtained from the hidden layers' computations [35], [2].

The inputs (x) obtained from the input layer are multiplied by the weights (w). The weighted sum is formed by adding the multiplied values. After that, an activation function is applied to the weighted sum of the inputs, converting it into a corresponding output. When the network receives an input variable, the weight of that input is assigned at random. The weight of each input indicates its importance in predicting the result; the bias parameter, on the other hand, allows the activation function curve to be fine-tuned to achieve precise results. Once the inputs have been given weights, the product of each input and its weight is calculated. Adding all of these products together gives the weighted sum, which is what the summation function computes.

Fig. 5. General structure working of the ANN

Consider the following example to better understand how an ANN works. You are creating an artificial neural network that categorizes photos into two groups to distinguish between diseased and non-diseased crops: Class A contains photos of healthy leaves, and Class B contains photos of diseased leaves. The process begins with the input being interpreted and transformed so that it can be easily processed. In this case, each leaf image is divided into pixels according to the size of the image; for example, if the image size is 30 * 30, the total number of pixels is 900. The pixels are converted into matrices and fed into the network. Just as neurons in our brains help us create and connect thoughts, an ANN's perceptrons receive inputs and process them by passing them from the input layer to the hidden layer and finally to the output layer. Each input is given an initial random weight as it passes from the input layer to the hidden layer. The inputs are then multiplied by their weights, and the sum is passed as data to the next hidden layer. Each perceptron then applies its activation function, which determines whether or not it is active. A probability is calculated at the output layer to determine whether the data belongs to category A or category B.

4.5 K-Nearest Neighbors
K-nearest neighbors (KNN) is a simple, easy-to-implement supervised machine-learning technique that uses labelled input data to learn a function that gives a suitable output for new, unlabelled data. It can solve both classification and regression problems and is quick and straightforward to apply. In the K-Nearest Neighbor algorithm, the similarity between the new case and the cases that are already categorized is calculated. Once the new case is placed in the category most comparable to the available ones, it

is treated like the remaining cases in that group. In other words, KNN stores all available data and classifies new points according to how similar they are to it. This means that whenever new data appears, it can readily be assigned to a suitable category using KNN. The algorithm is straightforward and easy to put into practice. There is no need to build a model, tune several parameters, or make additional assumptions. The algorithm gets significantly slower as the number of predictors/independent variables increases [36]. Fig. 6 shows the general structure of KNN.
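A minimal Python sketch of this similarity-based classification follows. It assumes Euclidean distance and a simple majority vote among the K nearest labelled cases; the function name `knn_predict`, the two-feature points, and the labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    # Euclidean distance from the query to every stored training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-feature data, invented for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["genuine", "genuine", "genuine", "fraud", "fraud", "fraud"]

prediction_a = knn_predict(points, labels, (1.1, 1.0), k=3)
prediction_b = knn_predict(points, labels, (5.0, 5.0), k=3)
```

No model is fitted in this sketch: every prediction scans the stored training data, which is consistent with the remark that the method slows down as the data grows.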

Fig. 7. General structure working of the K-MC

To explain how K-MC works. Consider the following situation:


If a hospital wishes to establish Care Wards. K-means
Clustering will divide these high-risk locations into clusters and
establish a cluster centre for each cluster, which will be the
location of the Emergency Units. These cluster centres are
each cluster's centroids and are located at a minimum
distance from all of the cluster's points; as a result, the
Fig. 6. General structure and working of K-NN

In the first phase, decide on the number of neighbors (K). Compute the Euclidean distance from the new data point to the existing data points, then select the K nearest neighbors according to that distance. Next, count the number of data points in each category among these K neighbors and assign the new data point to the category with the most neighbors. The model is then complete. Consider the following scenario: we have an image of an animal that could be either a cat or a dog, and we want to identify which one the picture represents. KNN can be used for this identification because it is based on a similarity measure. The KNN model looks for similarities between the new image and the known photos of cats and dogs, and classifies it according to the most similar features.

4.6 K-means Clustering
Because of its simplicity and effectiveness, K-means is one of the most widely used unsupervised learning methods. It allocates data points to groups by calculating the mean distance between data points, and it repeats this process to improve the accuracy of its groupings over time [37]. The K-means procedure shown in the figure below works as follows. Step 1: choose K, the number of clusters. Step 2: choose K points or centroids at random (they need not come from the input dataset). Step 3: assign each data point to its closest centroid, forming the K predefined clusters. Step 4: calculate the variance and reposition the centroid of each cluster. Step 5: repeat the third step, reassigning each data point to the new closest centroid. Step 6: if any reassignment occurred, go to step 4; otherwise, go to FINISH. Step 7: the model is finished.

Emergency units will be located at a minimum distance from all accident-prone places within a cluster.

5. COMPARISON OF VARIOUS MACHINE LEARNING TECHNIQUES

Table 1
COMPARISON OF ADVANTAGES AND DISADVANTAGES OF VARIOUS MACHINE-LEARNING TECHNIQUES

Technique: Decision Tree
Advantages: It is simple to understand and implement. It can be extremely useful for solving decision-related problems. It is highly adaptable, which helps in considering all potential solutions to a problem. It requires little data cleaning.
Disadvantages: The tree has many layers, which makes it complex. It may suffer from overfitting, which the Random Forest algorithm can resolve. The computational complexity of the decision tree may increase.

Technique: Random Forest
Advantages: RF is capable of performing both classification and regression tasks. It can handle large datasets with high dimensionality. It improves the accuracy of the model and prevents the overfitting problem.
Disadvantages: Although RF can be applied to both classification and regression, it is less suitable for regression tasks.

Technique: Logistic Regression
Advantages: It is easy to implement and interpret, and very efficient to train. It makes no assumptions about the distributions of classes in feature space.
Disadvantages: Non-linear problems cannot be solved with logistic regression because it has a linear decision surface.
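The K-NN and K-means procedures described in this paper (choose K, measure Euclidean distances, and vote among the K nearest neighbors; choose K centroids, assign each point to its closest centroid, and reposition the centroids until no point is reassigned) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the surveyed studies: the function names `knn_predict` and `kmeans`, the optional `init` argument, and the two-feature toy transactions are hypothetical and introduced here only for the example. The centroid update uses the mean of each cluster's points, which is the usual K-means choice.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training points by distance to the query and keep the k closest.
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # The category holding the most of the k neighbors wins.
    return votes.most_common(1)[0][0]


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain K-means; returns (centroids, assignment).

    `init` (hypothetical parameter) lets the caller fix the starting
    centroids; otherwise k data points are drawn at random.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no reassignment occurred: finished
        assignment = new_assignment
        # Reposition each centroid at the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Hypothetical toy transactions described by two made-up numeric features.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]

print(knn_predict(points, labels, (7.5, 8.0), k=3))  # fraud
centroids, assignment = kmeans(points, k=2, init=[points[0], points[3]])
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

Fixing the starting centroids through `init` makes the run repeatable. With random starting centroids, different seeds may lead to different clusterings, which is one reason the value of K and the initialization need care in practice.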
Table 1 (continued)

Technique: Artificial Neural Networks
Advantages: Information is stored on the entire network. Ability to work with incomplete knowledge. High accuracy. Distributed memory. Ability to perform machine learning. Parallel processing capability.
Disadvantages: The behavior of the network is unexplained. It is difficult to present the problem to the network. The duration of the network is unknown (high processing time for large neural networks).

Technique: K-Nearest Neighbors
Advantages: It is robust to noisy training data. It is straightforward to implement. Detection speed is good. It can be more effective when the training data is large.
Disadvantages: The value of K always needs to be defined, which can be complex. The computation cost is high because the distance to every training sample must be calculated.

Technique: K-means Clustering
Advantages: Efficient and quick. Iterative technique. Works on categorized digital data.
Disadvantages: Expensive. Requires many iterations. The value of K must be selected by the user. Requires a good understanding of the data.

6. RESULTS AND DISCUSSION
We observe that while each technique has its advantages, it also has disadvantages that affect its effectiveness and its ability to identify and detect fraudulent transactions. One such factor is the unbalanced nature of fraudulent activity (the percentage of fraud compared to the volume of transactions).

7. CONCLUSION AND FUTURE SCOPE
Credit card fraud has become a serious concern worldwide and causes huge financial losses. This has urged credit card companies to invest in creating and developing techniques to detect and reduce fraud. The primary goal of this study is to identify algorithms that are appropriate for, and can be adopted by, credit card companies to identify fraudulent transactions more accurately and at lower time and cost. Different machine learning algorithms are compared, including Logistic Regression, Decision Trees, Random Forest, Artificial Neural Networks, K-Nearest Neighbors, and K-means clustering. Because not all scenarios are the same, a scenario-based approach can be used to determine which algorithm is the best fit for a given scenario. All of the fraud detection techniques discussed in this survey article have advantages and disadvantages. Researchers use different performance measures and algorithms to predict and flag fraudulent transactions. Further studies are encouraged to improve fraud detection by determining weights that suit cost factors, tested accuracy, and detection accuracy. Surveys of this kind will allow researchers to build a more accurate hybrid approach for detecting fraudulent credit card transactions.

ACKNOWLEDGMENT
The authors wish to thank Palestine Technical University-Kadoorie (PTUK) for supporting this research work as part of the PTUK research fund.

REFERENCES
[1] S. H. Projects and W. Lovo, “JMU Scholarly Commons Detecting credit card fraud : An analysis of fraud detection techniques,” 2020.
[2] S. G and J. R. R, “A Study on Credit Card Fraud Detection using Data Mining Techniques,” Int. J. Data Min. Tech. Appl., vol. 7, no. 1, pp. 21–24, 2018, doi: 10.20894/ijdmta.102.007.001.004.
[3] “Credit Card Definition.” https://www.investopedia.com/terms/c/creditcard.asp (accessed Apr. 03, 2021).
[4] K. J. Barker, J. D’Amato, and P. Sheridon, “Credit card fraud: awareness and prevention,” J. Financ. Crime, vol. 15, no. 4, pp. 398–410, 2008, doi: 10.1108/13590790810907236.
[5] V. N. Dornadula and S. Geetha, “Credit Card Fraud Detection using Machine Learning Algorithms,” Procedia Comput. Sci., vol. 165, pp. 631–641, 2019, doi: 10.1016/j.procs.2020.01.057.
[6] A. H. Alhazmi and N. Aljehane, “A Survey of Credit Card Fraud Detection Use Machine Learning,” 2020 Int. Conf. Comput. Inf. Technol. ICCIT 2020, pp. 10–15, 2020, doi: 10.1109/ICCIT-144147971.2020.9213809.
[7] B. Wickramanayake, D. K. Geeganage, C. Ouyang, and Y. Xu, “A survey of online card payment fraud detection using data mining-based methods,” arXiv, 2020.
[8] A. Agarwal, “Survey of Various Techniques used for Credit Card Fraud Detection,” Int. J. Res. Appl. Sci. Eng. Technol., vol. 8, no. 7, pp. 1642–1646, 2020, doi: 10.22214/ijraset.2020.30614.
[9] C. Reviews, “a Comparative Study : Credit Card Fraud,” vol. 7, no. 19, pp. 998–1011, 2020.
[10] R. Sailusha, V. Gnaneswar, R. Ramesh, and G. Ramakoteswara Rao, “Credit Card Fraud Detection Using Machine Learning,” Proc. Int. Conf. Intell. Comput. Control Syst. ICICCS 2020, no. Iciccs, pp. 1264–1270, 2020, doi: 10.1109/ICICCS48265.2020.9121114.
[11] I. Sadgali, N. Sael, and F. Benabbou, “Detection and prevention of credit card fraud: State of art,” MCCSIS 2018 - Multi Conf. Comput. Sci. Inf. Syst. Proc. Int. Conf. Big Data Anal. Data Min. Comput. Intell. 2018, Theory Pract. Mod. Comput. 2018 Connect. Sma, no. March 2019, pp. 129–136, 2018.
[12] R. Goyal and A. K. Manjhvar, “Review on Credit Card Fraud Detection using Data Mining Classification Techniques & Machine Learning Algorithms,” IJRAR- International J. Res. …, vol. 7, no. 1, pp. 972–975, 2020, [Online]. Available: http://www.ijrar.org/papers/IJRAR19K7539.pdf.
[13] M. Kanchana, V. Chadda, and H. Jain, “Credit card fraud detection,” Int. J. Adv. Sci. Technol., vol. 29, no. 6, pp. 2201–2215, 2020, doi: 10.17148/ijarcce.2016.5109.
[14] A. RB and S. K. KR, “Credit Card Fraud Detection Using Artificial Neural Network,” Glob. Transitions Proc., pp. 0–8, 2021, doi: 10.1016/j.gltp.2021.01.006.
[15] R. R. Popat and J. Chaudhary, “A Survey on Credit Card Fraud Detection Using Machine Learning,” Proc. 2nd Int. Conf. Trends Electron. Informatics, ICOEI 2018, vol. 25, no. 01, pp. 1120–1125, 2018, doi: 10.1109/ICOEI.2018.8553963.
[16] O. Adepoju, J. Wosowei, S. Lawte, and H. Jaiman, “Comparative Evaluation of Credit Card Fraud Detection Using Machine Learning Techniques,” 2019 Glob. Conf. Adv. Technol. GCAT 2019, pp. 1–6, 2019, doi: 10.1109/GCAT47503.2019.8978372.
[17] M. Deepa and D. Akila, “Survey Paper for Credit Card Fraud Detection Using Data Mining Techniques,” Int. J. Innov. Res. Appl. Sci. Eng., vol. 3, no. 6, p. 483, 2019, doi: 10.29027/ijirase.v3.i6.2019.483-489.
[18] P. Save, P. Tiwarekar, K. N., and N. Mahyavanshi, “A Novel Idea for Credit Card Fraud Detection using Decision Tree,” Int. J. Comput. Appl., vol. 161, no. 13, pp. 6–9, 2017, doi: 10.5120/ijca2017913413.
[19] J. Vimala Devi and K. S. Kavitha, “Fraud Detection in Credit Card Transactions by using Classification Algorithms,” Int. Conf. Curr. Trends Comput. Electr. Electron. Commun. CTCEEC 2017, pp. 125–131, 2018, doi: 10.1109/CTCEEC.2017.8455091.
[20] R. R. Popat and J. Chaudhary, “A Survey on Credit Card Fraud Detection Using Machine Learning,” Proc. 2nd Int. Conf. Trends Electron. Informatics, ICOEI 2018, no. Icoei, pp. 1120–1125, 2018, doi: 10.1109/ICOEI.2018.8553963.
[21] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang, “Random forest for credit card fraud detection,” ICNSC 2018 - 15th IEEE Int. Conf. Networking, Sens. Control, pp. 1–6, 2018, doi: 10.1109/ICNSC.2018.8361343.
[22] S. Mittal and S. Tyagi, “Performance evaluation of machine learning algorithms for credit card fraud detection,” Proc. 9th Int. Conf. Cloud Comput. Data Sci. Eng. Conflu. 2019, pp. 320–324, 2019, doi: 10.1109/CONFLUENCE.2019.8776925.
[23] X. Yu, X. Li, Y. Dong, and R. Zheng, “A Deep Neural Network Algorithm for Detecting Credit Card Fraud,” Proc. - 2020 Int. Conf. Big Data, Artif. Intell. Internet Things Eng. ICBAIE 2020, pp. 181–183, 2020, doi: 10.1109/ICBAIE49996.2020.00045.
[24] S. Bagga, A. Goyal, N. Gupta, and A. Goyal, “Credit Card Fraud Detection using Pipeling and Ensemble Learning,” Procedia Comput. Sci., vol. 173, pp. 104–112, 2020, doi: 10.1016/j.procs.2020.06.014.
[25] R. San Miguel Carrasco and M.-A. Sicilia-Urban, “Evaluation of Deep Neural Networks for Reduction of Credit Card Fraud Alerts,” IEEE Access, vol. 8, pp. 186421–186432, 2020, doi: 10.1109/access.2020.3026222.
[26] G. Kibria and M. Sevkli, “Application of Deep Learning for Credit Card Approval : A Comparison with Two Machine Learning Techniques,” no. January, pp. 0–5, 2021, doi: 10.18178/ijmlc.2021.11.4.1049.
[27] D. D. Borse, P. S. H. Patil, and S. Dhotre, “Credit Card Fraud Detection Using Naïve Bayes and C4,” vol. 10, no. 1, pp. 423–429, 2021.
[28] P. J. Taylor, T. Dargahi, A. Dehghantanha, R. M. Parizi, and K. K. R. Choo, “A systematic literature review of blockchain cyber security,” Digit. Commun. Networks, vol. 6, no. 2, pp. 147–156, 2020, doi: 10.1016/j.dcan.2019.01.005.
[29] V. Patil and U. Kumar Lilhore, “A Survey on Different Data Mining & Machine Learning Methods for Credit Card Fraud Detection,” Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. © 2018 IJSRCSEIT, vol. 5, no. 10, pp. 320–325, 2018, doi: 10.13140/RG.2.2.22116.73608.
[30] “Machine Learning Decision Tree Classification Algorithm - Javatpoint.” https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm (accessed Apr. 03, 2021).
[31] “Machine Learning Random Forest Algorithm - Javatpoint.” https://www.javatpoint.com/machine-learning-random-forest-algorithm (accessed Apr. 03, 2021).
[32] A. Mishra and C. Ghorpade, “Credit Card Fraud Detection on the Skewed Data Using Various Classification and Ensemble Techniques,” 2018 IEEE Int. Students’ Conf. Electr. Electron. Comput. Sci. SCEECS 2018, pp. 1–5, 2018, doi: 10.1109/SCEECS.2018.8546939.
[33] “Introduction to Logistic Regression | by Ayush Pant | Towards Data Science.” https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148 (accessed Apr. 03, 2021).
[34] S. Venkata Suryanarayana, G. N. Balaji, and G. Venkateswara Rao, “Machine learning approaches for credit card fraud detection,” Int. J. Eng. Technol., vol. 7, no. 2, pp. 917–920, 2018, doi: 10.14419/ijet.v7i2.9356.
[35] “Artificial Neural Networks for Machine Learning - Every aspect you need to know about - DataFlair.” https://data-flair.training/blogs/artificial-neural-networks-for-machine-learning (accessed Apr. 03, 2021).
[36] “K-Nearest Neighbor(KNN) Algorithm for Machine Learning - Javatpoint.” https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning (accessed Apr. 03, 2021).
[37] “K-Means Clustering Algorithm for Machine Learning | by Madison Schott | Capital One Tech | Medium.” https://medium.com/capital-one-tech/k-means-clustering-algorithm-for-machine-learning-d1d7dc5de882 (accessed Apr. 03, 2021).
