
BENGAL COLLEGE OF ENGINEERING AND TECHNOLOGY

A REPORT ON
Comparative Analysis of K-Nearest Neighbor and
Bayesian Networks as Learning Mechanisms

Submitted by
Himanshu Kumar

University roll no.: 12500221075

Subject: Pattern Recognition

Dept. of Information Technology, 3rd Year

Under the guidance of
Mrs. Aparna
(IT Department)
ACKNOWLEDGMENT

At the very outset of this report, I would like to extend my sincere and heartfelt gratitude to everyone who helped me in this endeavour. Without their active guidance, help, and cooperation, I would not have made headway in this project. I am deeply indebted to my faculty guide, Mrs. Aparna, for her conscientious guidance, encouragement, and valuable support in completing this report in its present form.
TABLE OF CONTENTS

 ABSTRACT

 INTRODUCTION

 METHODOLOGY

 DISCUSSION

 CONCLUSION

 APPLICATION

 BIBLIOGRAPHY
ABSTRACT

This report presents a comprehensive comparison of two prominent machine learning algorithms: K-Nearest Neighbor (KNN) and Bayesian Networks. Both methods are widely used for classification and prediction tasks in various fields. The report highlights the fundamental principles, strengths, weaknesses, and applications of KNN and Bayesian Networks, providing insight into their suitability for different scenarios.

To conclude, the report also highlights the future scope of machine learning algorithms and artificial intelligence, and their roles in automation and holistic development, in technological as well as humanitarian aspects, before closing with conclusions derived from this research.
INTRODUCTION

Machine learning plays a pivotal role in data analysis, pattern recognition, and decision-making processes. KNN and Bayesian Networks represent two distinct approaches to learning from data. KNN is an instance-based learning algorithm that relies on the proximity of data points, while Bayesian Networks are probabilistic graphical models that capture dependencies among variables.

K-Nearest Neighbor (KNN):

Principles: KNN is a simple and intuitive algorithm that classifies data points based on the majority class of their k nearest neighbors. The algorithm's decision is heavily influenced by the choice of distance metric and the value of k.
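As a minimal illustration of this principle (a sketch, not the report's own implementation; the toy data and function name are assumptions), the following Python snippet classifies a new point by majority vote among its k nearest neighbors:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data (assumed for illustration): two 2-D classes, 'A' and 'B'
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 7], [6, 7]])
y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))  # -> 'A'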
METHODOLOGY

Working of the KNN algorithm

The ‘K’ in K-NN refers to the count of neighbors of the new data point. Deciding a suitable value for K is the foremost step in this algorithm. For better accuracy, it is imperative to choose an appropriate value of K, a process called parameter tuning. A very low value of K, like 1 or 2, can lead to noisy results, whereas a very high value can create confusion at times, depending on the data set [10].

There is no fixed value for K; however, one of the standard values that K often assumes is 5, i.e., for the majority voting process, the 5 neighbors closest to the new data point are considered. To avoid confusion between two classes of the data set, an odd value of K is generally preferred. Another, formula-based, choice of K is:

K = √n    (1)

where n is the overall count of data points.
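A small sketch of this rule of thumb (assuming the √n heuristic reconstructed in equation (1) above), which also nudges K to an odd value as the text recommends:

import math

def choose_k(n):
    """Rule-of-thumb K = sqrt(n), rounded and forced odd to avoid ties."""
    k = max(1, round(math.sqrt(n)))
    return k if k % 2 == 1 else k + 1

print(choose_k(100))  # -> 11 (sqrt(100) = 10, bumped to the nearest odd value)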

Euclidean distance is calculated as shown in Fig. 3; for two points (x1, y1) and (x2, y2), it is d = √((x2 − x1)² + (y2 − y1)²).

Upon calculating the Euclidean distances of all the points from the new data point, one should observe the category to which the majority of the nearest neighbors belong (say, at K = 5) and assign that class to the data point awaiting classification. As in Fig. 4, it can be concluded that the point goes to class A, since 3 of its nearest neighbors (the majority) belong to that category [10].
Fig. 3. Calculation of Euclidean Distance b/w two points [13].
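A one-line check of the distance formula from Fig. 3 (a sketch with example points assumed for illustration):

import math

def euclidean(p, q):
    """Distance between two 2-D points, as in Fig. 3."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

print(euclidean((1, 2), (4, 6)))  # -> 5.0 (a 3-4-5 triangle)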

2.2. Comparison of logistic regression, Naive Bayes and KNN machine learning algorithms for credit card fraud detection — recent application

2.2.1. Background of the recent work

Credit cards are a widely adopted method of payment these days owing to the relentless advancement of internet technology. At the same time, banking scams are heard of far more commonly than before, and they have indelibly affected many segments of the population, be it individuals or institutions. With every advanced security feature, the
DISCUSSION

Decision tree with a continuous target variable

A decision tree can also have a continuous variable as its target. Example: whether a person can repay a loan or not. If the bank does not have income details, which is a significant variable in this case, a decision tree could be built to predict a person's monthly income on the basis of various factors such as assets, living standard, occupation, etc. Here the values being predicted are continuous in nature.
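A minimal sketch of such a regression tree in Python, assuming scikit-learn is available; the feature values and column meanings are invented purely for illustration:

from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: [assets_score, living_standard_score, occupation_code]
X = [[3, 2, 1], [8, 7, 2], [5, 5, 2], [9, 8, 3], [2, 1, 1], [7, 6, 3]]
y = [25000, 90000, 55000, 120000, 18000, 80000]  # monthly income (assumed units)

# A shallow tree keeps the example interpretable and limits overfitting
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)

print(model.predict([[6, 5, 2]]))  # predicted monthly income for a new applicant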

5.2. Decision tree terminologies

 Root Node: The initial node of the decision tree, from which the entire data set starts getting divided into various possible homogeneous subsets.

 Leaf Node: The final outcome node, beyond which no further segregation of the tree is possible.

 Splitting: The process of dividing a node into sub-nodes according to the given constraints.

 Sub-Tree: A branch produced by splitting part of the hierarchy.

 Pruning: The elimination of superfluous branches of the decision tree in order to get optimal results.

 Parent and Child Node: The node from which sub-nodes are split is called the parent node; the resulting sub-nodes are called child nodes [40].
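To make pruning concrete, here is a small sketch using scikit-learn's cost-complexity pruning (an assumed library choice, not one named by the report); larger ccp_alpha values prune away more branches:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

# Pruning removes superfluous branches, so the pruned tree has fewer nodes
print(full.tree_.node_count, pruned.tree_.node_count)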
5.3. Attribute selection measures

An attribute selection measure (ASM) governs the selection of the optimum attribute for the root node as well as for the sub-nodes. The two major measures used for ASM are information gain and the Gini index.

5.3.2. Gini index

The Gini index measures the impurity (or, conversely, the purity) of a node during the creation of a decision tree. When taking a splitting decision, the decision tree algorithm prefers attributes with a small Gini index over attributes possessing a larger one. The Gini index can be calculated using the expression:

Gini index = 1 − Σj (Pj)²    (4)

where Pj is the proportion of samples belonging to class j in the node.
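A tiny sketch of formula (4) in Python (the class labels are illustrative):

from collections import Counter

def gini_index(labels):
    """Gini index = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_index(['A', 'A', 'A', 'B']))  # -> 0.375, a somewhat impure node
print(gini_index(['A', 'A', 'A', 'A']))  # -> 0.0, a pure node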

Steps for making a decision tree

The root node, say X, which contains the entire data set, is the starting point of the tree.

Using an ASM, look for the best matching attribute in the data set.

Split X into subsets containing the possible values of that best attribute.

Develop the decision tree nodes using the ideal attribute.

Recursively keep developing new decision tree nodes from the resulting subsets until no further splitting is possible; the final nodes are the leaf nodes. A compact sketch of these steps is given below.
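As referenced above, a compact sketch of these steps using scikit-learn (an assumed library choice; the standard iris data set is used purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion='gini' selects attributes by the Gini index, as in Section 5.3.2
tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(export_text(tree))           # the learned splits, from root to leaves
print(tree.score(X_test, y_test))  # held-out accuracy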
APPLICATION

 Image and speech recognition.
 Anomaly detection.
 Recommender systems.
1. Bayesian Networks

1.1 Principles: Bayesian Networks model the probabilistic relationships between variables using a directed acyclic graph (DAG). Conditional dependencies and independencies are represented explicitly, making them suitable for reasoning under uncertainty. A small worked sketch of this factorization follows the applications list below.

1.2 Strengths:
 Effective handling of uncertainty and incomplete data.
 Explicit modeling of variable dependencies.
 Facilitates intuitive representation and interpretation.

1.3 Weaknesses:
 Complexity increases with the number of variables.
 Dependency on accurate prior probabilities.

1.4 Applications:
 Medical diagnosis.
 Fraud detection.
 Natural language processing.
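As mentioned under 1.1, here is a small worked sketch of the DAG factorization P(A, B, C) = P(A) · P(B|A) · P(C|B) in plain Python; the three binary variables and all probability values are assumptions chosen only to illustrate the chain-rule decomposition encoded by a network's structure:

# A hypothetical three-node chain: Rain -> Sprinkler -> WetGrass.
# All numbers below are made up for illustration.
p_rain = {True: 0.2, False: 0.8}                      # P(Rain)
p_sprinkler = {True: {True: 0.01, False: 0.99},       # P(Sprinkler | Rain)
               False: {True: 0.40, False: 0.60}}
p_wet = {True: {True: 0.90, False: 0.10},             # P(WetGrass | Sprinkler)
         False: {True: 0.05, False: 0.95}}

def joint(rain, sprinkler, wet):
    """Chain-rule factorization given by the DAG Rain -> Sprinkler -> WetGrass."""
    return p_rain[rain] * p_sprinkler[rain][sprinkler] * p_wet[sprinkler][wet]

# Marginal P(WetGrass = True) by summing the joint over the hidden variables
p_wet_true = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(round(p_wet_true, 4))  # -> 0.3237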
2. Comparative Analysis

2.1 Performance Metrics:
 Accuracy, precision, recall.
 Robustness to noise and outliers.
 Computational efficiency.
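A minimal sketch of how such metrics can be computed, assuming scikit-learn's metrics module; the ground truth and the two models' predictions are made up for illustration:

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels and predictions from the two classifiers being compared
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_knn  = [1, 0, 1, 0, 0, 1, 1, 0]
y_bn   = [1, 0, 1, 1, 0, 0, 0, 0]

for name, y_pred in [("KNN", y_knn), ("Bayesian Network", y_bn)]:
    print(name,
          "accuracy:", accuracy_score(y_true, y_pred),
          "precision:", precision_score(y_true, y_pred),
          "recall:", recall_score(y_true, y_pred))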
CONCLUSION

In conclusion, both KNN and Bayesian Networks offer unique advantages and challenges. The choice between them depends on the nature of the data, the problem at hand, and computational considerations.

KNN excels in simplicity and adaptability, while Bayesian Networks provide a principled approach to modeling uncertainties and dependencies. Ultimately, the selection should be based on the specific requirements of the task and the characteristics of the dataset.

Future Directions: Further research can explore hybrid approaches that combine the strengths of KNN and Bayesian Networks, leveraging the simplicity of KNN for local decisions and the probabilistic modeling capabilities of Bayesian Networks for capturing global dependencies. Additionally, advancements in handling large-scale datasets and optimization techniques can contribute to the scalability of both algorithms.
BIBLIOGRAPHY

 Class notes
 www.google.com
 https://www.javatpoint.com/machine-learning
 https://images.app.goo.gl/eLBR6gBjRGnSyJ7S9
