A Case Study On Data Classification Approach Using K-Nearest Neighbor
Abstract— Data mining is the process of obtaining knowledge and information from massive amounts of data, and it is mostly used for data analysis. In data mining, various techniques are used, such as association mining, regression, prediction, classification, and clustering. Classification is described as the process of identifying a collection of models (or functions) that explain and differentiate data classes and ideas, so that the model can be used to detect the classes of unknown objects or patterns whose class designations are not known. Classification is a supervised learning problem: in machine learning, it is the problem of identifying which patterns in a group belong to which class according to their characteristics. Pattern classification can be done using various classifiers. A classifier is a program that takes the feature vector of a pattern or data point as input and assigns it to one of a set of designated classes. Classifiers such as the Artificial Neural Network (ANN), the k-Nearest Neighbor (k-NN) classifier, and the Support Vector Machine (SVM) are used for pattern classification purposes. Focusing on the classification technique of data mining, this research work presents the accuracy of k-NN on three datasets from the UCI machine learning repository. The main goal of this paper is to provide a review of the accuracy of the k-NN classification technique on different datasets in data mining. The k-NN classifier is a simple but efficient approach widely used for classification in research.

Keywords— Classification, SVM, k-NN, Supervised Learning, Normalization.

I. INTRODUCTION

Data mining is nothing but the information repository in which a huge amount of data can be stored in, and retrieved from, databases and data warehouses. It can also be recognized as Knowledge Discovery in Databases (KDD), as it deals with techniques integrated from several disciplines such as machine learning, high-performance computing, database technology, pattern recognition, statistics, neural networks, information retrieval, and data visualization. The major components of a data mining system that make up its architecture are the database and the data warehouse; the server is responsible for fetching the user's data from the knowledge base according to the user's data mining request.

In machine learning, the prediction accuracy of categorization algorithms derived from empirical data (examples) is assessed first. In practice, however, a classifier's interpretability or transparency is frequently crucial. This research investigates the accuracy of the k-nearest neighbor classifier in classifying a dataset. Many remarkable and diverse classification learning algorithms, such as the Support Vector Machine (SVM), k-nearest neighbors, and the Naive Bayes Classifier (NBC), are of major significance in exploratory pattern analysis.

The following are the different types of data mining strategies:

Classification:
In this task, the provided data instance must be classified into one of the target classes that have already been identified. One example is determining whether a consumer in a credit card transaction database should be categorized as a trustworthy customer or a defaulter based on numerous demographic and prior purchase criteria.

Estimation:
An estimation model, like a classification model, is used to determine a value for an unknown output attribute. In contrast to classification, an estimation problem's output attribute is numeric rather than categorical. An example scenario: estimate an individual's pay.

Prediction:
It is difficult to tell the difference between prediction and classification or estimation. The only distinction is that the predictive model predicts a future outcome rather than current behavior. The output attribute can be categorical or numeric.

Association rule mining:
It is the process of extracting interesting hidden rules, known as association rules, from a large transactional data set. For example, the rule (milk, butter → biscuit) specifies that whenever milk and butter are purchased together, the biscuit is also purchased, allowing these goods to be sold together to improve the overall sales of each item.
Clustering:
It is a method of classification where the target classes are not known in advance. For example, given 100 consumers, they must be categorized based on specific similarity criteria, and the classes into which the customers should ultimately be placed are not predetermined.

The classification data mining approach is the one mostly used in this work. The goal of this research is to improve the k-NN approach's performance. The idea of k-NN is to compute the closest neighbors based on various values of k, where k defines how many nearest neighbors should be considered when classifying a value. The general concept of k-NN is to compute a distance metric that measures the distances between data points; the algorithm then attempts to locate the k closest answers in the dataset. Euclidean distance, which describes the distance between two points in space, is one example of such a distance metric. The initial step in k-NN is to calculate all of the distances between a data point and each of the data set's reference points. In the second step these distances are sorted, and the k nearest objects are chosen, from which the third and final stage, categorization, is executed. In other words, from the d-dimensional feature space, k-NN identifies the k nearest (or most similar) points to a data point among N points, where k is the number of neighbors considered from the dataset.
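To make these three steps concrete, the sketch below outlines them in Python with the Euclidean distance. It is only an illustration with hypothetical array names (`train_X`, `train_y`, `query`), not the MATLAB implementation used later in the paper.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # Step 1: compute the distance from the query point to every reference point.
    distances = np.sqrt(((train_X - query) ** 2).sum(axis=1))  # Euclidean distance
    # Step 2: sort the distances and keep the indices of the k nearest objects.
    nearest = np.argsort(distances)[:k]
    # Step 3: categorize the query point by a majority vote among the k neighbors.
    return Counter(train_y[nearest]).most_common(1)[0][0]
```

With an odd k, ties between the two classes cannot occur in a two-class problem, which is consistent with the odd values k = 3, 5, 7 used in the experiments later.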
One of the most essential data mining techniques is classification. It entails applying a model learned from the dataset's data points to generate predictions about the class label of fresh data/observations. The process of developing a collection of models that explain and identify data classes, so that the model can be used to predict the classes of objects or patterns whose class labels are unknown, is known as classification. Detecting spam in email communications based on the message header and content, detecting cancer based on MRI scan findings, and categorising galaxies based on their location and morphologies are just a few examples. The problem of classification is learning a target function, as shown in Fig. 1, that translates each attribute set x to one of the predetermined class labels y. The target function is also called the classification model.

Fig. 1. Classification model for a task (input: attribute set (x) → classification model → output: class label (y))

A. Motivation
The motivation of this research is to find the accuracy score of the k-NN classifier. It was discovered that a lot of work has already been done in different areas of data mining on different datasets when identifying the project topic in the domain of data mining. The classification challenge on various datasets has also drawn the attention of the data mining community in the recent decade, which can be attributed to the popularity of the nearest neighbor classifier and its variations, such as the k-Nearest Neighbor classifier (k-NN).

B. Objective
The majority of classification experiments in the field of data mining have been carried out on various datasets. This paper's objective is to provide a classification technique that takes less time to classify, using three distinct datasets and different k values, without degrading the performance of the classifier. Its main goal is an analytic study of data mining technology.

C. Organization
The following is a breakdown of the paper's structure: the purpose and goal of this study are included in Section I, which provides a brief introduction to the issue. The related work is described in Section II. Section III explains the suggested framework using a block diagram. The approach is presented in Section IV. Section V describes the data sets utilized in this study, and Section VI gives the implementation details and experimental result analysis. Finally, Section VII wraps up the study and discusses the proposal's future scope.

II. RELATED WORK
This part provides a summary of the literature review, which includes reviews of technical papers on the k-Nearest Neighbor classification approach as applied to diverse applications. It also provides an overview of current data mining research being conducted on diverse applications. The following findings and limitations were discovered from several papers:

In this work [1], the author focuses on the k-Nearest Neighbor approach for classification. In terms of the parameter k, the researcher tried this method with a variety of distances and classification criteria (majority, consensus, and random). The results indicated that the k-NN technique may be used with both the Euclidean and the Manhattan distance. These distances are useful for categorization and performance, but they take time. As a result, they develop two types of distance that yield the best outcomes (98.70 and 98.70, respectively).

The WBC dataset experiment [2] shows that combining Multilayer Perceptron (MLP) and J48 classifiers with Principal Component Analysis (PCA) feature selection beats the other classifiers. On the other hand, the Wisconsin Diagnostic Breast Cancer (WDBC) dataset revealed that using a single classifier, such as Sequential Minimal Optimization (SMO), or a fusion of SMO and MLP or SMO and IBK, is preferable to using several classifiers. Finally, the MLP, J48, SMO, and IBK fusion fared better than the other classifiers on the Wisconsin Prognostic Breast Cancer (WPBC) dataset.
To solve the issues of poor efficiency and reliance on k, the authors of paper [3] picked a few representatives from the training dataset, together with some additional information, to represent the whole training dataset. They utilized an optimal but varied k chosen by the dataset itself to eliminate the reliance on k, without user interaction, in the selection of each representative. The limitation discovered from this study is that the researchers need to focus on how to enhance the classification accuracy of marginal data that falls beyond the typical areas. In the WPBC dataset, the fusion of MLP, J48, SMO, and IBK outperformed the other classifiers.

In paper [4], the researcher noticed and focused on the choice of k values, and ultimately the experimental results show that the proposed approach consistently beats other classifiers across a wide range of k, and its efficacy has been shown with good performance.

In the work on pattern recognition in [5], the k-NN classifier is one of the most often used neighborhood classifiers. However, it has significant drawbacks, including high computation complexity, complete reliance on the training set, and no weight variation across classes. To address this, the study proposes a unique approach for improving k-NN classification performance using Genetic Algorithms (GA).

In paper [6], rather than computing the similarities between all of the training and test samples and then selecting k neighbors for classification, GA picks just k neighbors at each iteration, calculates the similarities, classifies the test samples using these neighbours, and determines the accuracy. The computation complexity of k-NN was decreased in this case. The performance of the Gk-NN classifier is compared against conventional k-NN, CART, and SVM using five distinct medical datasets from the UCI data collection. The trials and findings have shown that the suggested technique not only decreases the complexity of k-NN but also enhances its classification accuracy.

In general, k-NN computes a distance metric between points: the system searches the database for the k closest replies to the provided query (i.e., the query point). Euclidean distance, Manhattan distance, and other distance measures are examples. The supervised learning approach known as the k-nearest neighbour method (k-NN) has been used in a number of applications; the Euclidean distance is the most commonly used.

III. PROPOSED FRAMEWORK
This research work focuses on a data classification concept using k-NN. The model is presented in Fig. 2. After normalizing all the data of the datasets to values between 0 and 1, the k-NN classification technique is applied, in which the distance between each row and the others is calculated and the k nearest rows are considered as per the k value. In the next step, the predicted class label of each row of the dataset is obtained. Then, using the actual class labels and the predicted class labels, a confusion matrix is built [9]. From the confusion matrix, the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values are calculated according to the formula. Finally, a comparison between the accuracies of the datasets is made. As the experimental data set, several two-class datasets such as BCW, Pima, and Zoo are used to determine how the performance differs based on the data. Here the data are collected in various sizes and types, and both nominal and numerical data are utilized to assess the outcomes. The complete implementation and analysis of the accuracy of the datasets using the k-Nearest Neighbor algorithm follows the model below, using the MATLAB tool [10, 11].

Fig. 2. Proposed model (Input dataset → Identify the class label → Normalize the data → Apply the k-NN algorithm → Use the confusion matrix → Calculate the accuracy → Comparison of accuracy)
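The confusion-matrix step of the framework can be sketched as below for a two-class problem. The labels 1 (positive) and 0 (negative) and the function name are assumptions for illustration, and the accuracy formula is the standard (TP + TN) / (TP + TN + FP + FN), not code from the paper's MATLAB implementation.

```python
def confusion_and_accuracy(actual, predicted):
    # Count the four confusion-matrix cells, assuming binary labels 1/0.
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return (tp, tn, fp, fn), accuracy
```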
IV. APPROACH

A. Normalization
Values of some attributes in a data set can be of a higher numeric range whereas others may be of a smaller range [15]. To apply some classification algorithms, such as neural networks and their variants or distance-based measures, the values of all the attributes are required to lie within a small range [16]. For example, input values for a neural network or k-NN can be -1, 0, or +1.

i. Min-Max Normalization
In instances where we know the range of values in our input data, we utilise min-max normalisation. This approach is used when a neural network is the learning machine, or when a naïve Bayesian classifier requires all features to lie in the range 0 to 1 [17]. The minimum and maximum of the target range are set to 0 and 1, respectively, in this work. It is given by the formula:

v′ = (v − minA) / (maxA − minA)   (1)

Where:
v is the original value of an instance of attribute A.
v′ is the new (normalized) value.
minA is the minimum value of the attribute in the original dataset.
maxA is the maximum value of the attribute in the original dataset.
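As a small illustration of Eq. (1), the following Python sketch applies min-max normalization column-wise to a NumPy array (one column per attribute). It assumes every attribute satisfies maxA > minA and is not the MATLAB routine used in the paper.

```python
import numpy as np

def min_max_normalize(X):
    # Eq. (1): v' = (v - minA) / (maxA - minA), applied per attribute (column).
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)  # every value ends up in [0, 1]
```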
ii. Z-Score Normalization
This normalization technique is based on the mean and standard deviation of a particular attribute A in the dataset [18]. It is therefore otherwise referred to as standard-deviation normalization or zero-mean normalization. First, the mean and standard deviation are calculated in the usual way, and then the formula follows:

z = (x − μA) / σA   (2)

Where:
x is the original value of an instance of attribute A.
z is the transformed value.
μA is the mean of attribute A.
σA is the standard deviation of attribute A.

iii. Decimal Scaling Normalization
Here each value of an attribute is divided by a power of ten:

v′ = v / 10^j   (3)

Where:
j is the smallest integer such that max(|v′|) < 1.
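A similar sketch for Eqs. (2) and (3), again per attribute and only for illustration: it assumes every attribute has a nonzero standard deviation and a nonzero maximum absolute value, and it is not the paper's MATLAB code.

```python
import numpy as np

def z_score_normalize(X):
    # Eq. (2): z = (x - mean_A) / std_A, applied per attribute (column).
    return (X - X.mean(axis=0)) / X.std(axis=0)

def decimal_scaling_normalize(X):
    # Eq. (3): v' = v / 10^j, with j the smallest integer giving max(|v'|) < 1.
    j = np.floor(np.log10(np.abs(X).max(axis=0))) + 1
    return X / (10.0 ** j)
```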
B. k-Nearest Neighbor Classification
Data points may be classified based on their distance from points in a training dataset, which is a basic yet effective method of classification. To calculate the distance, a variety of measures may be used, which are discussed next.

Distance Metrics
Given an mx-by-n data matrix X, represented by mx (1-by-n) row vectors x1, x2, ..., xmx, and an my-by-n data matrix Y, represented by my (1-by-n) row vectors y1, y2, ..., ymy, the distance between two row vectors (instances) x and y can be defined in the following ways, where the sums and the maximum run over the components i = 1, ..., m.

TABLE I. APPROACHES TO DEFINING THE DISTANCE BETWEEN INSTANCES (x AND y)

Minkowski: D(x, y) = (∑ |xi − yi|^r)^(1/r)
Manhattan: D(x, y) = ∑ |xi − yi|
Chebychev: D(x, y) = max_i |xi − yi|
Euclidean: D(x, y) = (∑ |xi − yi|^2)^(1/2)
Canberra: D(x, y) = ∑ |xi − yi| / (|xi| + |yi|)
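The metrics of Table I can be written directly as short functions. The Python sketch below is illustrative only (x and y are equal-length NumPy arrays, and the Canberra form assumes no coordinate pair is simultaneously zero); it is not the MATLAB implementation used in the experiments.

```python
import numpy as np

def minkowski(x, y, r):
    # (sum |xi - yi|^r)^(1/r); r = 1 gives Manhattan, r = 2 gives Euclidean.
    return (np.abs(x - y) ** r).sum() ** (1.0 / r)

def manhattan(x, y):
    return np.abs(x - y).sum()

def chebychev(x, y):
    return np.abs(x - y).max()

def euclidean(x, y):
    return np.sqrt(((x - y) ** 2).sum())

def canberra(x, y):
    return (np.abs(x - y) / (np.abs(x) + np.abs(y))).sum()
```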
V. DATA SET USED
A data set is often the contents of a single database table or statistical data matrix, with each table column representing a distinct variable and each row representing a specific member of the data set in question. A dataset is made up of a data matrix with m rows (representing the items) and k columns (corresponding to the measurements). The columns are commonly referred to as features, but they can also have a different background, as shown in Table II. A dataset is an enhanced version of a data matrix in this case: it has a size of m × k and may be used with the various MATLAB matrix operations.
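As a sketch of this m × k layout, the fragment below loads a dataset into an array whose rows are the items and whose columns are the features; the file name and the position of the class-label column are hypothetical.

```python
import numpy as np

# Hypothetical comma-separated file: k feature columns followed by one class-label column.
data = np.loadtxt("dataset.csv", delimiter=",")
X = data[:, :-1]   # m-by-k matrix of attribute values
y = data[:, -1]    # class label of each of the m rows
print(X.shape)     # (m, k)
```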
A. BCW (Breast Cancer Wisconsin) Data Set
The Wisconsin Breast Cancer datasets from the UCI Machine Learning Repository were used to distinguish malignant (cancerous) from benign (non-cancerous) samples. The data were provided by W. Nick Street and Olvi L. Mangasarian, Computer Sciences Dept., University of Wisconsin, Madison. Table II below contains a summary of all the datasets [19]. Each dataset contains a collection of numerical characteristics or attributes as well as a certain categorization of patterns.

TABLE II. DESCRIPTION OF BCW, PIMA AND ZOO DATASETS

VI. IMPLEMENTATION DETAILS & EXPERIMENTAL RESULTS
The experimental data collection includes several two-class datasets, namely BCW, Pima, and Zoo; the performance may or may not vary depending on the data. The data collected are of various sizes and sorts, and both nominal and numerical data are utilized to evaluate the results. The correctness of the datasets is assessed using the k-Nearest Neighbor technique, and the MATLAB tool is used for the implementation. In this method, the k-Nearest Neighbor model is used with the three datasets to investigate how the technique aids in the prediction of unknown class values, and the accuracy of the prediction is found using the confusion matrix [20].
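The experiment can be outlined as below: normalize the data, predict each row against all the remaining rows (a leave-one-out style pass, since no separate training and testing split is used), and report the accuracy for k = 3, 5, and 7. The sketch reuses the hypothetical `min_max_normalize` and `knn_predict` helpers from the earlier fragments and the arrays `X` and `y` loaded above; it is an outline of the procedure, not the paper's MATLAB implementation.

```python
import numpy as np

def evaluate_knn(X, y, k):
    Xn = min_max_normalize(X)          # scale every attribute into [0, 1]
    correct = 0
    for i in range(len(Xn)):
        # Classify row i using every other row as the reference set.
        train_X = np.delete(Xn, i, axis=0)
        train_y = np.delete(y, i, axis=0)
        if knn_predict(train_X, train_y, Xn[i], k) == y[i]:
            correct += 1
    return correct / len(Xn)           # accuracy as a fraction

for k in (3, 5, 7):
    print(f"k={k}: accuracy={evaluate_knn(X, y, k):.4f}")
```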
TABLE III. PERFORMANCE OF k-NN WITH DIFFERENT k VALUES (IN PERCENTAGE)
DATA SETS   K=3   K=5   K=7

As shown in Table III, it was observed that as the k value varies, the accuracy rate of k-NN also varies accordingly. On the BCW dataset, when the k value increases, the accuracy decreases. Likewise, on the Pima dataset, in this k-NN classifier without separate training and testing instances, the accuracy decreased as the k value increased. On the Zoo dataset, with different k values the accuracy of the classifier first increased, then decreased, and so on.

The accuracy rate of k-NN first increases and then falls as the k value grows. This is because bigger values of k minimize the influence of noise on the classification, but make class borders less clear. When the three datasets were compared with different k values, it was discovered that BCW has a better k-NN classification accuracy rate than the other two. Its maximum accuracy is obtained quickly while k is still a small number (k = 5) and then gradually lowers, whereas the highest accuracy is reached more slowly over the different k values for Pima and Zoo. This is most likely because the BCW feature vectors are denser in the multidimensional space than those of the other two.

VII. CONCLUSION & FUTURE WORK
In data mining and pattern recognition, the k-NN classifier is one of the most often used neighborhood classifiers. However, it has certain drawbacks, such as high computation complexity, complete reliance on the training set, and no weight variation across classes. The focus of this method is on the accuracy of various k selections to increase classification performance. For different odd values of k, the accuracy on each dataset initially increases, then falls, and then increases again. The outcome of the implementation is that k-NN is a very good classifier, and as the size of the data set grows larger it produces good results. Future work can use different datasets with some feature selection techniques to increase the performance of the classifier.
REFERENCES
[1] N. Suguna and K. Thanushkodi, "An Improved k-Nearest Neighbor Classification Using Genetic Algorithm," IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 4, No. 2, (2018).
[2] Gouda I. Salama, M. B. Abdelhalim, and Magdy Abd-elghany Zeid, "Experimental Comparison of Classifiers for Breast Cancer Diagnosis," IEEE, 978-1-4673-2961 (2016).
[3] Gongde Guo, Hui Wang, David Bell, Yaxin Bi, and Kieran Greer, "KNN Model-Based Approach in Classification," Springer (2012).
[4] Gou, J., Du, L., Zhang, Y., and Xiong, T., "A New Distance-weighted k-nearest Neighbor Classifier," Journal of Information and Computational Science, 9(6), pp. 1429-1436 (2012).
[5] S. C. Bagui, S. Bagui, K. Pal, "Breast Cancer Detection using Nearest Neighbor Classification Rules," Pattern Recognition 36, pp. 25-34 (2003).
[6] Hamid Parvin, Hoseinali Alizadeh, Behrouz Minati, "A Modification on K-Nearest Neighbor Classifier," Global Journal of Computer Science and Technology, Vol. 10, Issue 14 (Ver. 1.0), November (2010).
[7] Agrawal, R., Imielinski, T., Swami, A., "Database Mining: A Performance Perspective," IEEE Transactions on Knowledge and Data Engineering, pp. 914-925, December 1993.
[8] Nitin Bhatia, Vandana, "Survey of Nearest Neighbor Techniques," International Journal of Computer Science and Information Security, Vol. 8, No. 2, (2010).
[9] Angeline Christobel Y., Sivaprakasam, "An Empirical Comparison of Data Mining Classification Methods," International Journal of Computer Information Systems, Vol. 3, No. 2, (2011).
[10] V. Suresh Babu and P. Viswanath, "Weighted k-nearest leader classifier for large data sets," in PReMI, pp. 17-24 (2007).
[11] Aman Kataria, M. D. Singh, "A Review of Data Classification Using K-Nearest Neighbour Algorithm," International Journal of Emerging Technology and Advanced Engineering, Volume 3, Issue 6, June (2013).
[12] J. Tripathy, R. Dash, B. K. Pattanayak and B. Mohnty, "Automated Phrase Mining Using Post: The Best Approach," 2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology (ODICON), 2021, pp. 1-6, doi: 10.1109/ODICON50556.2021.9429014.
[13] Panda, Smruti Rekha, and Jogeswar Tripathy, "Odia offline typewritten character recognition using template matching with unicode mapping," 2015 International Symposium on Advanced Computing and Communication (ISACC), pp. 109-115, IEEE, 2015.
[14] D. Mohapatra, J. Tripathy, K. K. Mohanty and D. S. K. Nayak, "Interpretation of Optimized Hyper Parameters in Associative Rule Learning using Eclat and Apriori," 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), 2021, pp. 879-882, doi: 10.1109/ICCMC51019.2021.9418049.
[15] Mohapatra D., Tripathy J., Patra T.K., "Rice Disease Detection and Monitoring Using CNN and Naive Bayes Classification," in: Borah S., Pradhan R., Dey N., Gupta P. (eds), Soft Computing Techniques and Applications, Advances in Intelligent Systems and Computing, vol. 1248, Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-7394-1_2.
[16] Anil K. Jain, Robert P. W. Duin, Jianchang Mao, "Statistical Pattern Recognition: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), pp. 4-37 (2000).
[17] E. Acuna, C. Rodriguez, "The treatment of missing values and its effect in the classifier accuracy," in: D. Banks, L. House, F. R. McMorris, P. Arabie, W. Gaul (Eds.), Classification, Clustering and Data Mining Applications, Springer, Berlin, pp. 639-648 (2004).
[18] Vijaya, P., Murty, M. N., Subramanian, D. K., "Leaders-subleaders: An efficient hierarchical clustering algorithm for large data sets," Pattern Recognition Letters 25, pp. 505-513 (2004).
[19] A. Frank, A. Asuncion, UCI Machine Learning Repository, http://www.archive.ics.uci.edu/ml (2011).
[20] https://archive.ics.uci.edu/ml/datasets.html.