
Evaluation of Machine Learning Algorithms for the Detection of Fake Bank Currency

2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence) | DOI: 10.1109/Confluence51648.2021.9377127

Anju Yadav*, SCIT, Manipal University Jaipur, [email protected]
Tarun Jain, SCIT, Manipal University Jaipur, [email protected]
Vivek Kumar Verma, SCIT, Manipal University Jaipur, [email protected]
Vipin Pal, NIT Meghalaya, [email protected]

Abstract— Bank currency is one of the most important assets of our country, and to create discrepancies in money, miscreants introduce fake notes that resemble the original notes into the financial market. During the demonetization period it was seen that a large amount of fake currency was circulating in the market. In general it is very difficult for a human being to distinguish a forged note from a genuine one, despite the various parameters designed for identification, because many features of a forged note are similar to those of the original. Discriminating between fake and genuine bank currency is therefore a challenging task, and an automated system should be available in banks or in ATM machines. To design such an automated system, an efficient algorithm is needed that can predict whether a banknote is genuine or forged, since fake notes are designed with high precision. In this paper six supervised machine learning algorithms are applied to the banknote authentication dataset available in the UCI machine learning repository. We have applied Support Vector Machine, Random Forest, Logistic Regression, Naïve Bayes, Decision Tree and K-Nearest Neighbour with three train test ratios, 80:20, 70:30 and 60:40, and measured their performance on the basis of various quantitative analysis parameters such as Precision, Accuracy, Recall, MCC, F1-Score and others. Some of the SML algorithms give 100% accuracy for a particular train test ratio.

Keywords— Support Vector Machine, Bank currency, Supervised Machine Learning

I. INTRODUCTION

Financial activities are carried out every second by many people, and one of the most important assets of our country is banknotes [3]. Fake notes are introduced to create discrepancies in the financial market, even though they resemble the original notes; basically, they are created illegally to complete various tasks [12]. In 1990 the forgery issue was not of much concern, but since the late 20th century forgery has increased drastically [13]. Technology is now advancing so rapidly that it helps fraudsters generate fake notes whose resemblance to genuine notes makes them very difficult to discriminate [1]. This drives the financial market to its lowest level. To stop this and to conduct smooth transactions, forged bank currency in circulation must be curbed [16]. For a human being it is very difficult to distinguish genuine from forged bank currency. Governments have designed banknotes with features by which a genuine note can be identified [9], but fraudsters are creating fake notes with almost the same features and with such accuracy that it is very difficult to identify a genuine note [5]. So, nowadays banks and ATM machines must have a system that can separate forged notes from genuine ones [12]. To determine the legitimacy of a banknote, artificial intelligence and machine learning (ML) can play a vital role in designing a system that can identify a forged note from genuine bank currency [6, 7, 12].

Nowadays, supervised machine learning (SML) approaches are widely used for classification problems; for medical disease detection they show promising results [2]. Only a few authors have applied SML algorithms to bank currency authentication [6-9, 12]. To identify whether a note is genuine or fake, an automated system has to be developed. Initially, the input is an image of the note, and various image processing techniques are used to extract the features of the note. These features are then given as input to the SML algorithms to predict whether the note is original or fake. From the review it can be seen that not much work has been done in this direction.

Contribution of the paper: First we have visualized the dataset taken from the UCI ML repository using different types of plots and pre-processed the data. Further, the SML algorithms Logistic Regression (LR), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), KNN and SVM are applied to the dataset, which contains features extracted from bank currency, to classify each note as original or not. For the analysis of their results, the SML algorithms are applied with three different train test ratios and their results are compared on the basis of standard evaluation parameters such as MCC, F1-Score, NPV, FDR, accuracy and others.

In Section II a literature review is presented, followed by the description of the dataset in Section III.

Further, the results of the various classification algorithms are analysed on the basis of standard quantitative analysis parameters, and conclusions are drawn in Section IV.

II. LITERATURE REVIEW

In this section, some papers that applied machine learning approaches to classify whether a note is original or not are reviewed. Yeh et al. implemented an SVM based on multiple kernels to reduce the false rate and compared it with a single-kernel SVM for classifying real and forged notes [17, 19]. Hassanpour et al. used a texture-based feature extraction method for recognition, with the Markov chain concept used to model texture; this method is able to recognize the currencies of different countries [5]. To classify whether a note is forged or not, global optimization algorithms have been applied in the Artificial Neural Network (ANN) training phase, and good success in the classification of notes has been observed [8, 14, 15, 11]. Decision tree and MLP algorithms are used to classify bank currency in [7]. Further, multi-classification was done using wavelets for feature extraction in [4]; BPN and SVM machine learning algorithms are used to classify bank currency, and it is found that BPN gives higher accuracy than SVM [6, 16]. In [2], classification of counterfeit currency notes is done using segmentation for feature extraction based on different regions of the note. The same type of study is done in [6], where bank currency features are extracted using image segmentation and these features are then given as input to an SVM to determine note authenticity. A neural network (NN) is applied to Thai bank currency for classification: a scanner is used to collect the note image and convert it into a bitmap for feature extraction, and these data are then given to a BPN for detecting the authenticity of the bank currency [15]. A probabilistic NN method is used for classification of bank currency in [9], and in [10] an LVQ classifier is used for detecting note authenticity; the authors of both papers applied these approaches to a US Dollar dataset. Recognition of euro banknotes has been proposed using a three-layer perceptron that classifies bank currency into a particular class by taking an image of the note as input; the back propagation method is used to train the model, and a radial basis function is further used for validation to discard invalid data [11].

III. DESCRIPTION OF DATA SET AND RESULTS OF MACHINE LEARNING ALGORITHMS FOR PREDICTION OF FORGED AND GENUINE BANK CURRENCY

Description of dataset: The dataset is taken from the UCI machine learning repository to train the models [18]. The features of the data are extracted from images of forged and genuine banknotes. There are 1372 instances in the dataset and 5 attributes, of which 4 are features and one is the target. The dataset is divided into two classes, forged and genuine, and the ratio of the two classes (55:45 percent) is balanced. Two values are present in the target attribute, 0 and 1, where 1 represents a fake note and 0 represents a genuine note.

In this section, the results of the various SML algorithms are discussed in detail. SVM, LR, KNN, DT, RF and NB are applied to the bank currency authentication data to classify whether a note is genuine or forged. To accomplish this task, three train test ratios are considered, 80:20, 60:40 and 70:30, and the above algorithms are applied to test their accuracy and to derive various other quantitative analysis parameters for evaluating the performance of the ML models. The following results are observed after applying the various SML algorithms.

A. SML algorithm descriptions with ROC and learning curves to measure accuracy for train test ratio 80:20

SVM: It is an SML model that classifies data on the basis of pattern recognition. To separate the two classes, a decision boundary is created in the data: the dataset items are plotted in feature space and classification is performed by differentiating the two classes using the hyperplane concept. A kernel function is used to convert non-linearly separable data into linearly separable data; a linear kernel is used for classification when the number of features is small and the number of test cases is large [6]. SVM is applied to the dataset with three different train test ratios (80:20, 60:40 and 70:30) to predict whether the bank currency is forged or genuine. For the 80:20 train test ratio the ROC and learning curves are drawn, and the accuracy of SVM is observed to be around 98%, see Fig. 1.

Fig. 1. SVM ROC and Learning curve for train test ratio 80:20.

LR: It is an SML model that is very commonly used for classification. The performance of the LR model on linearly separable classes is very good and it is easy to implement; it is especially common in industry. In general LR is used for binary classification, as it is a linear model, but using the one-vs-rest (OvR) technique it may be used for multi-class classification [9]. LR is applied to the dataset with three different train test ratios (80:20, 60:40 and 70:30) to predict whether the bank currency is forged or genuine. For the 80:20 train test ratio the ROC and learning curves are drawn, and the accuracy of LR is observed to be around 98%, see Fig. 2.

Fig. 2. LR ROC and Learning curve for train test ratio 80:20.
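As a concrete illustration of this setup, a minimal Python sketch is given below. It loads the UCI banknote authentication data, makes an 80:20 split and fits SVM and LR classifiers. It is only a sketch: scikit-learn, the local file name data_banknote_authentication.txt and the column names are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): SVM and LR on the UCI banknote data, 80:20 split.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Assumed local copy of the UCI "banknote authentication" file; column names are assumed.
cols = ["variance", "skewness", "kurtosis", "entropy", "label"]
df = pd.read_csv("data_banknote_authentication.txt", header=None, names=cols)
X, y = df[cols[:-1]], df["label"]          # label: 1 = fake note, 0 = genuine note

# 80:20 train test ratio; 70:30 and 60:40 follow by changing test_size.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=42, stratify=y)

for name, model in [("SVM", SVC(kernel="linear", probability=True)),
                    ("LR", LogisticRegression(max_iter=1000))]:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])   # area under the ROC curve
    print(f"{name}: accuracy={acc:.4f}, ROC AUC={auc:.4f}")
```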

DT: It is a classification model with a tree-like structure. A DT is developed incrementally by breaking the dataset down into smaller subsets. The resulting tree has two types of nodes, decision nodes and leaf nodes. For example, consider a decision node "Outlook" with branches Rainy, Overcast and Sunny representing values of the tested feature, and a leaf node "Hours Played" that gives the decision on the numerical target value. A DT can handle both numerical and categorical data [8]. DT is applied to the dataset with three different train test ratios (80:20, 60:40 and 70:30) to predict whether the bank currency is forged or genuine. For the 80:20 train test ratio the ROC and learning curves are drawn, and the accuracy of DT is observed to be around 99%, see Fig. 3.

Fig. 3. DT ROC and Learning curve for train test ratio 80:20.

NB: NB is a classifier in which class labels are assigned to problem instances represented as vectors of feature values, where the class labels are drawn from a finite set, see Fig. 4. In general there is no single algorithm for training such classifiers; rather, there is a family of algorithms working on a common principle, namely that NB assumes each feature has a value independent of every other feature for a given class variable. NB works very efficiently, especially for probability-based models, and as a supervised learning model it can be trained very effectively [1]. NB is applied to the dataset with three different train test ratios (80:20, 60:40 and 70:30) to predict whether the bank currency is forged or genuine. For the 80:20 train test ratio the ROC and learning curves are drawn, and the accuracy of NB is observed to be around 84%, the lowest, see Fig. 4.

Fig. 4. NB ROC and Learning curve for train test ratio 80:20.

KNN: It is an SML model that may be used for both classification and regression problems, but in industry it is mainly used for classification. KNN is a lazy algorithm, meaning it learns slowly because the whole dataset is considered for classification, and it is also a non-parametric learning algorithm, as it makes no assumptions about the underlying data. Basically, KNN uses the concept of feature similarity to determine the value of a new data point [9], i.e., the value assigned to the new data point is based on how closely it matches the points of the training set [9]. KNN is applied to the dataset with three different train test ratios (80:20, 60:40 and 70:30) to predict whether the bank currency is forged or genuine. For the 80:20 train test ratio the ROC and learning curves are drawn, and the accuracy of KNN is observed to be around 100%, see Fig. 5(a).

RF: It is a classifier formed by combining various decision trees. Sub-datasets are randomly selected from the original dataset to construct the sub-classifiers, which are then combined to form the RF classifier. Each sub-classifier predicts a class, voting is performed, and the class with the highest number of votes becomes the prediction of the model [5]. RF is applied to the dataset with three different train test ratios (80:20, 60:40 and 70:30) to predict whether the bank currency is forged or genuine. For the 80:20 train test ratio the ROC and learning curves are drawn, and the accuracy of RF is observed to be around 99%, see Fig. 5(b).

Fig. 5. ROC curve a) KNN and b) Random Forest for train test ratio 80:20.
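Extending the previous sketch, the six classifiers can be compared over the three train test ratios in one loop. Again this is only an illustrative sketch: the scikit-learn estimators and their hyperparameters are assumptions, since the paper does not state implementation details; X and y are the feature matrix and labels prepared earlier.

```python
# Sketch: accuracy of the six SML algorithms for the 80:20, 60:40 and 70:30 ratios.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

models = {
    "SVM": SVC(),
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "NB": GaussianNB(),
}

for test_size in (0.20, 0.40, 0.30):                     # 80:20, 60:40, 70:30
    ratio = f"{round((1 - test_size) * 100)}:{round(test_size * 100)}"
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y)
    for name, clf in models.items():
        acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
        print(f"{ratio}  {name}: accuracy={acc:.4f}")
```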

B. Quantitative analysis of algorithms on the basis of various evaluation parameters

In this section the definitions and formulas of the various evaluation parameters used to perform the quantitative analysis are discussed. Three different train test ratios, 80:20, 70:30 and 60:40, are considered for the result analysis of the six popular SML algorithms SVM, LR, NB, DT, RF and KNN.

Evaluation parameters to measure the performance of the SML algorithms: To evaluate the efficiency of the classifiers and to analyse which classification algorithm performs better, quantitative analysis is done on the basis of various evaluation parameters such as Accuracy, Recall, F1-Score, etc. All the evaluation parameters used to determine the efficiency of a classification algorithm are derived from the confusion matrix (CM), shown in Table I.

TABLE I. CONFUSION MATRIX

                       Predicted Negative        Predicted Positive
Actual Negative        True Negative (TN_ff)     False Positive (FP_gf)
Actual Positive        False Negative (FN_fg)    True Positive (TP_gg)

where True Positive (TP_gg) = genuine note classified as genuine note, False Positive (FP_gf) = forged note classified as genuine note, True Negative (TN_ff) = forged note classified as forged note, and False Negative (FN_fg) = genuine note classified as forged note.

The Accuracy (A_c) indicates the state of being correct, i.e., genuine banknotes classified as genuine and forged banknotes classified as forged. The formula to calculate accuracy is:

A_c = (TP_gg + TN_ff) / (TP_gg + TN_ff + FP_gf + FN_fg)

Precision represents the probability that a selected item is relevant, i.e., the fraction of relevant items among the total items selected. The probability that a relevant item is selected is represented by Recall, i.e., the ratio of relevant items selected to relevant items available. The formulas to calculate Precision (P_r) and Recall (R_e) are:

P_r = TP_gg / (FP_gf + TP_gg)
R_e = TP_gg / (FN_fg + TP_gg)

The harmonic mean of P_r and R_e is called the F1-Measure (F1_m); basically it shows the balance between P_r and R_e. F1_m is not affected by class imbalance, whereas A_c may be affected by class imbalance. The formula to calculate F1_m is:

F1_m = 2 * (P_r * R_e) / (P_r + R_e)

Sensitivity, also called the true positive rate (TPR), measures the proportion of correctly identified positive samples. The true negative rate (TNR), also called Specificity, measures the proportion of correctly identified negative samples. The formulas to calculate Sensitivity (S_e) and Specificity (S_p) are:

S_e = TP_gg / P
S_p = TN_ff / N

where P = number of genuine notes and N = number of forged notes.

The false positive rate (FPR) is the proportion of forged notes that are wrongly classified as genuine notes, and the false negative rate (FNR) is the proportion of genuine notes that are wrongly classified as forged notes. The Negative Predictive Value (NPV) gives the fraction of samples classified as forged that are TN_ff, and the False Discovery Rate (FDR) is the proportion of FP_gf among all the samples classified as positive. The formulas to calculate FPR, FNR, NPV and FDR are:

FPR = FP_gf / N
FNR = FN_fg / P
NPV = TN_ff / (TN_ff + FN_fg)
FDR = FP_gf / (FP_gf + TP_gg)

Matthews's correlation coefficient (MCC) can be used even for classes of different sizes and for random or imbalanced data, and it measures the balance of the classification based on the CM. Its value lies between -1 and +1. To determine which classification algorithm is better, MCC is used together with F1_m. The formula to calculate MCC is:

MCC = (TP_gg * TN_ff - FP_gf * FN_fg) / sqrt((TP_gg + FP_gf)(TP_gg + FN_fg)(TN_ff + FP_gf)(TN_ff + FN_fg))

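A small sketch of how these confusion-matrix based parameters might be computed is given below. It follows the conventions of Table I (positive = genuine note, which is label 0 in the dataset) and is an illustration, not the authors' implementation.

```python
# Sketch: evaluation parameters of Section IIIB derived from the confusion-matrix counts.
import math

def evaluation_parameters(y_true, y_pred, genuine=0):
    # Counts named as in Table I (positive = genuine note, negative = forged note).
    tp = sum(t == genuine and p == genuine for t, p in zip(y_true, y_pred))  # TP_gg
    tn = sum(t != genuine and p != genuine for t, p in zip(y_true, y_pred))  # TN_ff
    fp = sum(t != genuine and p == genuine for t, p in zip(y_true, y_pred))  # FP_gf
    fn = sum(t == genuine and p != genuine for t, p in zip(y_true, y_pred))  # FN_fg
    P, N = tp + fn, tn + fp                      # genuine notes, forged notes
    pr, re = tp / (tp + fp), tp / (tp + fn)      # Precision, Recall
    return {
        "Specificity": tn / N, "Sensitivity": tp / P,
        "Accuracy": (tp + tn) / (P + N), "Precision": pr,
        "FPR": fp / N, "FNR": fn / P,
        "NPV": tn / (tn + fn), "FDR": fp / (fp + tp),
        "F1-Score": 2 * pr * re / (pr + re),
        "MCC": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical usage with a fitted classifier from the earlier sketches:
# print(evaluation_parameters(list(y_te), list(models["KNN"].fit(X_tr, y_tr).predict(X_te))))
```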
Analysis of the results of the machine learning algorithms

In this subsection three different train test ratios are considered, 80:20, 60:40 and 70:30. For each train test ratio we have applied the six machine learning algorithms SVM, LR, NB, DT, RF and KNN to the dataset to identify whether a banknote is genuine or forged, and to measure the performance of these algorithms the quantitative analysis parameters discussed in detail in Section IIIB are considered. For the 80:20 train test ratio the accuracy of the KNN model is the highest, i.e., 100 percent, see Table II. In Section IIIB we have already discussed that if the MCC value is near +1 then the model is a perfect model, and from Table II we can see that for KNN the value of MCC is 1 and F1_m is also 1. Naïve Bayes has the lowest accuracy, i.e., 84 percent, see Table II. For the other evaluation parameters and their values see Table II. To visualize all the evaluation parameters and the performance of the six machine learning algorithms see Fig. 6.

TABLE II. EVALUATION PARAMETER VALUES OF SVM, LR, NB, DT, RF AND KNN FOR TRAIN TEST RATIO 80:20.

80:20        SVM    LR     KNN   DT     RF     NB
Specificity  1      0.99   1     0.99   0.99   0.81
Sensitivity  0.97   0.98   1     0.99   0.99   0.86
Accuracy     0.98   0.98   1     0.99   0.99   0.84
Precision    0.98   0.98   1     0.98   0.98   0.83
FPR          0.032  0.024  0     0.018  0.008  0.044
FNR          0      0.01   0     0.01   0.02   0.032
NPV          1      0.99   1     1      0.99   0.83
FDR          0.026  0.019  0     0.022  0.01   0.209
F1-Score     0.99   0.99   1     0.99   0.99   0.69
MCC          0.97   0.97   1     0.99   0.98   0.687

Fig. 6. Histogram graph of the evaluation parameter values of SVM, LR, NB, DT, RF and KNN for train test ratio 80:20.

Next, the second train test ratio, 60:40, is selected and the machine learning algorithms SVM, LR, NB, DT, RF and KNN are applied to the bank currency dataset. To evaluate the performance of these algorithms the Section IIIB evaluation parameters are considered. For the 60:40 train test ratio the highest accuracy is seen for the Decision Tree, i.e., 100 percent, see Table III. Its MCC value is also +1, which shows that the Decision Tree performs better than the other five algorithms. The lowest accuracy is again seen for Naïve Bayes, the same as for the 80:20 train test ratio, see Tables II and III. To visualize the results, a histogram of the evaluation parameters and their values for SVM, LR, NB, DT, RF and KNN is drawn, see Fig. 7.

TABLE III. EVALUATION PARAMETER VALUES OF SVM, LR, NB, DT, RF AND KNN FOR TRAIN TEST RATIO 60:40.

60:40        SVM    LR     KNN     DT    RF    NB
Specificity  0.997  0.995  0.9967  1     0.99  0.8
Sensitivity  0.98   0.98   1       0.99  1     0.858
Accuracy     0.98   0.98   0.9981  1     0.98  0.833
Precision    0.99   0.99   0.9959  1     0.99  0.83
FPR          0.02   0.02   0.004   0.01  0.09  0.099
FNR          0.006  0.003  0.0032  0     0     0.11
NPV          0.99   0.99   1       1     1     0.8
FDR          0.016  0.016  0.004   0.01  0.01  0.18
F1-Score     0.99   0.99   0.9979  0.99  0.99  0.67
MCC          0.97   0.97   0.9963  1     0.98  0.66

Fig. 7. Histogram graph of the evaluation parameter values of SVM, LR, NB, DT, RF and KNN for train test ratio 60:40.

Lastly, the 70:30 train test ratio is taken to measure the accuracy and the performance of the SVM, LR, NB, DT, RF and KNN machine learning algorithms on the bank currency dataset for predicting whether a note is genuine or forged. From Table IV we observe that the KNN algorithm gives the highest accuracy and that its MCC value is near 1, while the lowest accuracy, 86 percent, and the lowest MCC are seen for Naïve Bayes, see Table IV. To visualize the evaluation parameters of SVM, LR, NB, DT, RF and KNN a histogram is drawn, see Fig. 8.
parameters are considered. For train test ratio 60:40 highest

TABLE IV. EVALUATION PARAMETER VALUES OF SVM, LR, NB, DT, RF AND KNN FOR TRAIN TEST RATIO 70:30.

70:30        SVM    LR      KNN     DT     RF    NB
Specificity  0.98   0.99    0.9956  1      0.97  0.831
Sensitivity  0.98   0.985   1       0.993  0.98  0.88
Accuracy     0.98   0.99    0.9975  0.996  0.97  0.861
Precision    0.98   0.99    0.9944  1      0.97  0.89
FPR          0.022  0.016   0.0055  0.01   0.01  0.09
FNR          0.008  0.0042  0.0043  0.01   0.02  0.08
NPV          0.98   0.99    1       1      0.97  0.83
FDR          0.017  0.0127  0.0055  0.01   0.02  0.149
F1-Score     0.99   0.99    0.9972  0.99   0.98  0.7
MCC          0.97   0.98    0.995   0.992  0.96  0.71

Fig. 8. Histogram graph of the evaluation parameter values of SVM, LR, NB, DT, RF and KNN for train test ratio 70:30.

IV. CONCLUSION

In this paper, the SML algorithms SVM, LR, NB, DT, RF and KNN are applied to the banknote authentication dataset taken from the UCI ML repository with three different train test ratios, 80:20, 60:40 and 70:30. The dataset contains 1372 instances and 5 attributes, of which 4 are features and one is the target attribute indicating whether the bank currency is genuine or forged. Initially, we visualized the data by KDE, box plots and pair plots to study the correlation between the features and the target class (see Section III and Figs. 1, 2 and 3). From this it is concluded that all features are important and are related to the target class as well as to the other features, so no features were dropped. We then analysed the performance of the six SML algorithms based on the ROC and learning curves for the 80:20 train test ratio; for this ratio the accuracy of KNN is the highest, i.e., 100%, and NB has the lowest accuracy, i.e., 84% (see Figs. 4, 5, 6, 7 and 8). Further, the performance of the SML algorithms SVM, LR, NB, DT, RF and KNN is analysed on the basis of standard quantitative analysis parameters such as MCC, F1-Score, NPV, FDR, accuracy and others. For the 80:20 and 70:30 train test ratios the accuracy is highest for KNN; its MCC value is near +1, indicating a near-perfect model, and its F1_m is also 1 for both ratios. Naïve Bayes has the lowest accuracy, i.e., 84% for 80:20 and 86% for 70:30, and its MCC is the lowest as well for both train test ratios. For the 60:40 train test ratio the highest accuracy is seen for DT, i.e., 100%, and its MCC value is +1, which shows that the Decision Tree performs better than the other five SML algorithms. The lowest accuracy is again seen for Naïve Bayes, as in the 80:20 train test ratio, see Tables II, III and IV. To visualize the evaluation parameters of SVM, LR, NB, DT, RF and KNN, histograms are also drawn.

REFERENCES
[1] M. Aoba, T. Kikuchi, and Y. Takefuji, "Euro Banknote Recognition System Using a Three-layered Perceptron and RBF Networks", IPSJ Transactions on Mathematical Modeling and its Applications, May 2003.
[2] S. Desai, S. Kabade, A. Bakshi, A. Gunjal, and M. Yeole, "Implementation of Multiple Kernel Support Vector Machine for Automatic Recognition and Classification of Counterfeit Notes", International Journal of Scientific & Engineering Research, October 2014.
[3] C. Gigliarano, S. Figini, and P. Muliere, "Making classifier performance comparisons when ROC curves intersect", Computational Statistics and Data Analysis, vol. 77, pp. 300-312, 2014.
[4] E. Gillich and V. Lohweg, "Banknote Authentication", 2014.
[5] H. Hassanpour and E. Hallajian, "Using Hidden Markov Models for Feature Extraction in Paper Currency Recognition".
[6] Z. Huang, H. Chen, C. J. Hsu, W. H. Chen, and S. Wu, "Credit rating analysis with support vector machines and neural networks: a market comparative study", 2004.
[7] C. Kumar and A. K. Dudyala, "Banknote Authentication using Decision Tree rules and Machine Learning Techniques", International Conference on Advances in Computer Engineering and Applications (ICACEA), 2015.
[8] M. Lee and T. Chang, "Comparison of Support Vector Machine and Back Propagation Neural Network in Evaluating the Enterprise Financial Distress", International Journal of Artificial Intelligence & Applications, vol. 1, no. 3, pp. 31-43, 2010.
[9] C. Nastoulis, A. Leros, and N. Bardis, "Banknote Recognition Based On Probabilistic Neural Network Models", Proceedings of the 10th WSEAS International Conference on SYSTEMS, Vouliagmeni, Athens, Greece, July 10-12, 2006.
[10] S. Omatu, M. Yoshioka, and Y. Kosaka, "Bank currency Classification Using Neural Networks", IEEE, 2007.
[11] A. Patle and D. S. Chouhan, "SVM Kernel Functions for Classification", ICATE 2013.
[12] E. L. Prime and D. H. Solomon, "Australia's plastic banknotes: fighting counterfeit currency", Angewandte Chemie (International ed. in English), vol. 49, no. 22, pp. 3726-3736, May 2010.
[13] A. Roy, B. Halder, and U. Garain, "Authentication of currency notes through printing technique verification", Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP '10), pp. 383-390, 2010.
[14] P. D. Shahare and R. N. Giri, "Comparative Analysis of Artificial Neural Network and Support Vector Machine Classification for Breast Cancer Detection", International Research Journal of Engineering and Technology, Dec. 2015.
[15] F. Takeda, L. Sakoobunthu, and H. Satou, "Thai Banknote Recognition Using Neural Network and Continues Learning by DSP Unit", International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, 2003.
[16] M. Thirunavukkarasu, K. Dinakaran, E. N. Satishkumar, and S. Gnanendra, "Comparison of support vector machine (SVM) and back propagation network (BPN) methods in predicting the protein virulence factors", Journal of Industrial Pollution Control, vol. 33, no. 2, pp. 11-19, 2017.
[17] C.-Y. Yeh, W.-P. Su, and S.-J. Lee, "Employing multiple-kernel support vector machines for counterfeit banknote recognition", Applied Soft Computing, vol. 11, no. 1, pp. 1439-1447, Jan. 2011.
[18] UCI Machine Learning Repository, banknote authentication data set, https://archive.ics.uci.edu/ml/datasets/banknote+authentication.
[19] V. K. Verma, A. Yadav, and T. Jain, "Key Feature Extraction and Machine Learning-Based Automatic Text Summarization", Emerging Technologies in Data Mining and Information Security, Springer, Singapore, 2019, pp. 871-877.
