Machine Learning
Name – Sayan Das
Roll no – 14400121013
I. INTRODUCTION
Machine Learning (ML) is a branch of artificial
intelligence (AI) that allows computers to learn and make
decisions without being explicitly programmed. It's all about
using data to train models that can predict or classify new
data. Today, we’ll focus on two popular machine learning
algorithms: K-Nearest Neighbors (KNN) and Decision Trees.
KNN makes a prediction for a new point as follows:
Identify the K nearest points (neighbors) to the new point.
1) For Classification: Assign the class held by the majority of the K nearest neighbors.
2) For Regression: Take the average value of the K nearest neighbors to predict the value for the new point.

For example, suppose the labeled data includes a point like this:

Height  Weight  Age  Class
5.8     70      25   Healthy

And now, there's a new person with:
Height: 5.7
Weight: 72
Age: 26

We want to classify if this new person is "Healthy" or "Unhealthy" based on their height, weight, and age using KNN (we'll use K=3 for this example).

The Euclidean distance between the new person and the point above is:
d = √((0.1)² + (−2)² + (−1)²) = √(0.01 + 4 + 1) = √5.01 ≈ 2.24
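To make these steps concrete, here is a minimal Python sketch of the same idea: it computes the Euclidean distance from the new person to every labeled point and takes a majority vote among the K = 3 nearest neighbors. The extra rows in the toy dataset and the helper names (euclidean, knn_classify) are my own illustrative assumptions, not part of the example above.

import math
from collections import Counter

# Hypothetical labeled data: (height, weight, age) -> class.
# Only the first row comes from the example; the rest are made up for illustration.
data = [
    ((5.8, 70, 25), "Healthy"),
    ((5.5, 85, 30), "Unhealthy"),
    ((5.9, 68, 22), "Healthy"),
    ((5.6, 90, 40), "Unhealthy"),
]

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, data, k=3):
    # Keep the k labeled points closest to the query...
    neighbors = sorted(data, key=lambda item: euclidean(query, item[0]))[:k]
    # ...and let them vote; the most common class among them wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

new_person = (5.7, 72, 26)
print(euclidean(new_person, data[0][0]))    # ~2.24, matching the calculation above
print(knn_classify(new_person, data, k=3))  # "Healthy" for this toy data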
IV. WHEN DO WE USE KNN ALGORITHM?
We can use KNN when:
Data is labeled.
B. Recommender systems
Suggesting products or content based on user preferences and past behavior.

C. Pattern recognition
Identifying patterns in data sets, such as finding similar customers in a database.
Fig. A decision tree for the concept "Play Badminton".

X. DECISION TREE - HOW IT WORKS
A. Step-by-Step Process
Start with all the data and consider all the features (like weather, temperature).
Pick the best feature to split the data based on how much information it gives you (more on this below).

Step 1: Calculate the Entropy of the dataset
Entropy measures how mixed the class labels are:
H(S) = −∑ pᵢ log₂ pᵢ
Probability of Yes = 1/3
Probability of No = 2/3
H(S) = −((1/3) log₂(1/3) + (2/3) log₂(2/3))
H(S) = −(0.333 × (−1.585) + 0.667 × (−0.585)) = 0.918

Step 2: Calculate Information Gain for "Outlook"
We'll calculate the information gain for the Outlook feature.
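As a quick illustration of Steps 1 and 2, here is a minimal Python sketch that computes the entropy of a label set and the information gain of splitting on a feature. The three-row "Play Badminton" table is a hypothetical stand-in chosen only to match the 1/3 Yes and 2/3 No split above, and the function names are my own.

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum of p_i * log2(p_i) over the class proportions in S.
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    # Gain = H(S) minus the weighted average entropy of the subsets
    # produced by splitting on the chosen feature.
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature_index], []).append(label)
    remainder = sum((len(subset) / total) * entropy(subset)
                    for subset in subsets.values())
    return entropy(labels) - remainder

# Hypothetical rows with a single feature, Outlook, and Yes/No labels.
rows = [("Sunny",), ("Rainy",), ("Rainy",)]
labels = ["Yes", "No", "No"]

print(round(entropy(labels), 3))                    # 0.918, as computed above
print(round(information_gain(rows, labels, 0), 3))  # gain from splitting on Outlook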
XIV. DECISION TREE - ADVANTAGES & DISADVANTAGES
A. Advantages
Easy to visualize and interpret.
Can handle both numerical and categorical data.
Doesn't need much data preparation.

B. Disadvantages
Can easily become too complex and overfit the data (making it perform worse on new data).
Sensitive to small changes in the data.

XV. KNN VS. DECISION TREES
A. KNN
1) Type: Instance-based (it remembers data).
2) Learning Type: Lazy learning (no training).
3) Cons: Slow with large data.

B. Decision Trees
1) Type: Model-based (it creates a model to predict).
2) Learning Type: Needs training.
3) Cons: Prone to overfitting.
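To show this difference in practice, here is a small sketch using scikit-learn; the library choice, the toy health data, and the parameter settings (K = 3, a default tree) are my own assumptions for illustration.

# Comparing the two classifiers on the same tiny, made-up dataset.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = [[5.8, 70, 25], [5.5, 85, 30], [5.9, 68, 22], [5.6, 90, 40]]  # height, weight, age
y = ["Healthy", "Unhealthy", "Healthy", "Unhealthy"]

# KNN is lazy: fit() essentially stores the training data, and the
# real work (distance search) happens at predict() time.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# The decision tree is model-based: fit() builds the tree up front,
# so predict() only has to walk it.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

new_person = [[5.7, 72, 26]]
print(knn.predict(new_person))   # e.g. ['Healthy']
print(tree.predict(new_person))  # e.g. ['Healthy']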
ACKNOWLEDGMENT
I extend my heartfelt thanks to my mentors and educators, Suman Halder sir, who provided guidance and insights throughout the research and writing process. Their valuable input has greatly enriched the content and structure of this report.

I also want to acknowledge the resources, textbooks, and academic materials that have served as essential references, allowing me to delve into the subject matter and present accurate information.

REFERENCES
[1] J. R. Quinlan, "Induction of Decision Trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
[2] T. Cover and P. Hart, "Nearest Neighbor Pattern Classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, 1967.
[3] "K-Nearest Neighbors (KNN) Classification with R Tutorial," DataCamp. [Online]. Available: https://www.datacamp.com/tutorial/k-nearest-neighbors-knn-classification-with-r-tutorial. [Accessed: Sep. 13, 2024].
[4] "Decision Tree Tutorial & Notes," HackerEarth. [Online]. Available: https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/ml-decision-tree/tutorial/. [Accessed: Sep. 13, 2024].