DataScience - Project (Banknote Authentication) - SHILANJOY BHATTACHARJEE EE

This document is a project report submitted by Shilanjoy Bhattacharjee for a course on Data Science and Data Analytics. The report discusses various machine learning algorithms, including Support Vector Machines, Random Forest Classifier, and K-Nearest Neighbours. It provides an overview of these algorithms and their applications in areas such as text classification, image recognition, protein classification, and more. The document was submitted as part of a course project to analyse a banknote authentication dataset using machine learning techniques.

BANKNOTE AUTHENTICATION

A PROJECT REPORT

Submitted by

SHILANJOY BHATTACHARJEE (EE 25)


Enrolment Id - 12017009005027

Data Science & Data Analytics Lab (CS – 695A)

Project Guide - Prof. Sankhadeep Chatterjee & Prof. Moumita Basu

B. Tech 3rd year


2020

University of Engineering & Management (UEM)


University Area, Plot No. III - B/5, New Town, Action Area - III, Kolkata, West Bengal 700156
CERTIFICATE

Certified that this project report “Banknote Authentication” is the
bona fide work of

SHILANJOY BHATTACHARJEE (EE 25)
Enrolment Id - 12017009005027

of B.Tech, EEE, who carried out the project work under our supervision.

………………………………..

………………………………..

SIGNATURE

Examiner:
ACKNOWLEDGEMENT

The completion of this project could not have been accomplished
without the support of our teachers and guides, Prof. Sankhadeep Chatterjee
and Prof. Moumita Basu. We are thankful to them for allowing us the time to
research and write.

We are also very thankful to our respected teachers for their cooperation and
suggestions regarding the project work.

Last but not least, we are very thankful to our HOD, Prof. Sanjoy Bhadra,
for giving us the opportunity to do such an interesting project.

- Shilanjoy Bhattacharjee
OVERVIEW

Machine Learning is the field of study that gives computers the capability to
learn without being explicitly programmed. ML is one of the most exciting
technologies one could come across.

As the name suggests, it gives the computer the quality that makes it more
similar to humans: the ability to learn.

Machine learning is actively being used today, perhaps in many more places
than one would expect.

The basic premise of machine learning is to build algorithms that can receive
input data and use statistical analysis to predict an output while updating outputs
as new data becomes available.

TYPES OF LEARNING:
1. Supervised Learning
2. Unsupervised Learning
INTRODUCTION
Data mining is concerned with locating hidden relationships present in
business data, enabling organizations to make predictions for later use.

Data mining has emerged as a key business intelligence technology.


The purpose of data mining is to extract implicit, previously unknown, and
potentially useful (or actionable) patterns from data.

Data mining encompasses many modern techniques, including classification
(neural networks, k-nearest neighbours, naive Bayes, and decision trees),
clustering (density-based clustering, k-means, hierarchical clustering), and
association (constraint-based, multilevel, multidimensional, and
one-dimensional association).

Years of experience show that data mining is a process, and its successful
application calls for data pre-processing (cleaning, noise/outlier removal,
dimensionality reduction), post-processing (presentation, interpretability,
summarization), and a sound knowledge of the problem domain.

All traditional algorithms are affected to some degree by the class-imbalance
problem. The correct choice of metric (or combination of metrics) to
evaluate, and ultimately improve, is essential for the success of a data
mining effort in such areas, since most of the time improving one metric
degrades others.
ALGORITHMS

SUPPORT VECTOR MACHINE (SVM):

A support-vector machine constructs a hyperplane or set of hyperplanes in a
high- or infinite-dimensional space, which can be used for classification,
regression, or other tasks such as outlier detection. Intuitively, a good
separation is achieved by the hyperplane that has the largest distance to the
nearest training-data point of any class (the so-called functional margin),
since in general the larger the margin, the lower the generalization error of
the classifier.

Whereas the original problem may be stated in a finite-dimensional space, it
often happens that the sets to discriminate are not linearly separable in
that space. For this reason, it was proposed that the original
finite-dimensional space be mapped into a much higher-dimensional space,
presumably making the separation easier in that space.

To keep the computational load reasonable, the mappings used by SVM schemes
are designed to ensure that dot products of pairs of input data vectors may be
computed easily in terms of the variables in the original space, by defining them
in terms of a kernel function k(x, y) selected to suit the problem.
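As an illustration of such a kernel, the widely used radial basis function (RBF) kernel can be evaluated directly from distances in the original space; the sketch below is a minimal NumPy example with made-up sample points:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).

    It depends on the inputs only through their distance in the original
    space, so no explicit high-dimensional mapping is ever computed.
    """
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([0.0, 0.0])
print(rbf_kernel(x, x))                      # identical points -> 1.0
print(rbf_kernel(x, np.array([3.0, 4.0])))   # distant points -> near 0
```

Note how the kernel value shrinks as the two points move apart, which is exactly the property used below to measure closeness.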

The hyperplanes in the higher-dimensional space are defined as the set of
points whose dot product with a vector in that space is constant, where such
a set of vectors is an orthogonal (and thus minimal) set of vectors that
defines a hyperplane.

The vectors defining the hyperplanes can be chosen to be linear combinations
with parameters a_i of images of feature vectors x_i that occur in the
database. With this choice of hyperplane, the points x in the feature space
that are mapped into the hyperplane are defined by the relation
sum_i a_i k(x_i, x) = constant.

Note that if k(x, y) becomes small as y grows farther from x, each term in
the sum measures the degree of closeness of the test point x to the
corresponding database point. In this way, the sum of kernels above can be
used to measure the relative nearness of each test point to the data points
originating in one or the other of the sets to be discriminated.

Note that the set of points x mapped into any hyperplane can be quite
convoluted as a result, allowing much more complex discrimination between
sets that are not convex at all in the original space.
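A minimal sketch of applying an RBF-kernel SVM with scikit-learn follows. It uses synthetic two-class data as a stand-in for the four banknote features (variance, skewness, kurtosis, and entropy of the wavelet-transformed image), since the dataset file itself is not bundled with this report:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the four banknote features and the genuine/forged label.
X, y = make_classification(n_samples=400, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# RBF kernel k(x, y) = exp(-gamma * ||x - y||^2), as discussed above.
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

Loading the real banknote authentication CSV in place of `make_classification` is a one-line change once the file is available.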
RANDOM FOREST CLASSIFIER:
Decision trees are a popular method for various machine learning tasks. Tree
learning "comes closest to meeting the requirements for serving as an
off-the-shelf procedure for data mining", because it is invariant under
scaling and various other transformations of feature values, is robust to
the inclusion of irrelevant features, and produces inspectable models.

However, single trees are seldom accurate. In particular, trees that are
grown very deep tend to learn highly irregular patterns: they overfit their
training sets, i.e. they have low bias but very high variance.

Random forests are a way of averaging multiple deep decision trees, trained on
different parts of the same training set, with the goal of reducing the variance.

This comes at the expense of a small increase in bias and some loss of
interpretability, but generally greatly boosts the performance of the final
model.
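The bias–variance trade-off above can be sketched by comparing a single fully grown tree with a forest of such trees; again, synthetic stand-in data is used in place of the actual banknote file:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, flip_y=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# One fully grown tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

# Averaging many deep trees, each trained on a bootstrap sample with random
# feature subsets, reduces the variance.
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(
    X_train, y_train)

print("single tree:", tree.score(X_test, y_test))
print("forest:    ", forest.score(X_test, y_test))
```

On noisy data such as this (10% label noise), the forest typically generalizes better than the single tree, though the exact scores depend on the random seed.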
K-NEAREST NEIGHBOUR:

The training examples are vectors in a multidimensional feature space, each
with a class label. The training phase of the algorithm consists only of
storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabelled
vector (a query or test point) is classified by assigning the label which is
most frequent among the k training samples nearest to that query point.

A commonly used distance metric for continuous variables is Euclidean
distance. For discrete variables, such as in text classification, another
metric can be used, such as the overlap metric (or Hamming distance).

In the context of gene expression microarray data, for example, k-NN has also
been employed with correlation coefficients such as Pearson and Spearman.

Often, the classification accuracy of k-NN can be improved significantly if
the distance metric is learned with specialized algorithms such as Large
Margin Nearest Neighbor or Neighborhood Components Analysis.

A drawback of the basic "majority voting" classification occurs when the class
distribution is skewed. That is, examples of a more frequent class tend to
dominate the prediction of the new example, because they tend to be common
among the k nearest neighbors due to their large number.

One way to overcome this problem is to weight the classification, taking into
account the distance from the test point to each of its k nearest neighbors. The
class (or value, in regression problems) of each of the k nearest points is
multiplied by a weight proportional to the inverse of the distance from that point
to the test point.
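The inverse-distance weighting described above can be sketched in a few lines of NumPy; the distances and labels below are made up for illustration:

```python
import numpy as np

# Distances from a test point to its k = 3 nearest neighbours, with labels.
dists = np.array([0.5, 1.0, 2.0])
labels = np.array([1, 0, 0])

weights = 1.0 / dists                                # inverse-distance weights
scores = np.bincount(labels, weights=weights, minlength=2)
pred = int(np.argmax(scores))

# Plain majority voting would pick class 0 (two votes to one), but the single
# very close neighbour outweighs the two farther ones: 2.0 vs 1.5.
print("predicted class:", pred)
```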
Another way to overcome skew is by abstraction in data representation.

For example, in a self-organizing map (SOM), each node is a representative (a
center) of a cluster of similar points, regardless of their density in the
original training data. k-NN can then be applied to the SOM.
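Putting the pieces together, a minimal scikit-learn sketch of k-NN with distance-weighted voting (once more on synthetic stand-in data rather than the actual banknote file) looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=4, n_informative=4,
                           n_redundant=0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2)

# weights="distance" multiplies each neighbour's vote by the inverse of its
# distance to the query point, as described above; the default "uniform"
# gives plain majority voting.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```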
APPLICATION

- SVMs are helpful in text and hypertext categorization, as their application
can significantly reduce the need for labelled training instances in both the
standard inductive and transductive settings. Some methods for shallow
semantic parsing are based on support vector machines.
- Classification of images can also be performed using SVMs. Experimental
results show that SVMs achieve significantly higher search accuracy than
traditional query-refinement schemes after just three to four rounds of
relevance feedback. This is also true for image segmentation systems,
including those using a modified version of SVM that applies the privileged
approach suggested by Vapnik.
- Hand-written characters can be recognized using SVMs.
- The SVM algorithm has been widely applied in the biological and other
sciences. SVMs have been used to classify proteins with up to 90% of the
compounds classified correctly. Permutation tests based on SVM weights have
been suggested as a mechanism for the interpretation of SVM models, and
support-vector machine weights have also been used to interpret SVM models
in the past. Post hoc interpretation of support-vector machine models in
order to identify the features used by a model to make predictions is a
relatively new area of research with special significance in the biological
sciences.
CONCLUSION
After the successful completion of this project, we can conclude that the
Banknote Authentication dataset can be analysed effectively with standard
machine learning techniques.

Here we have used the Support Vector Machine, Random Forest, and K-Nearest
Neighbour classifiers for the analysis of banknote authentication.
