
Classification: kNN & NB

REFERENCES
kNN Classifier:
Book: Machine Learning with Python for Everyone (Chapter 3)
NB Classifier:
Book: Machine Learning with Python for Everyone (Chapter 3)
Naive Bayes Classifier in Machine Learning (enjoyalgorithms.com)
Bayes Theorem - Statement, Proof, Formula, Derivation & Examples (byjus.com)
Classification Tasks
• Depending on no. of outcomes
• Binary Classification (two-class classification)
• {Yes, No}; {Red, Black}; {True, False}
• {-1, +1}; {0, 1}
• Multiclass Classification
• {Cruiser, Destroyer, Frigate, Mine Sweeper, Aircraft Carrier…}
• Depending on steps involved
• Direct outcome in one step
• K Nearest Neighbours
• Two step process
• (1) build a model of how likely the outcomes are and
• (2) pick the most likely outcome
• Naïve Bayes
Sample (& Simple) Classification Dataset
• IRIS Dataset
• Included with sklearn
• Fisher’s Dataset
• Sir Ronald Fisher, mid-20th-century statistician
• First academic paper on classification (Fisher, 1936)
• Edgar Anderson
• Gathered the data!
• Contents
• Each Row: describes one iris flower, in terms of the length and width of that flower’s sepals
and petals
• Rows: Examples / samples
• Final Column: Particular species of that iris: setosa, versicolor, or virginica
• Features / Attributes / IVs (independent variables; initial columns) and Target / Label / DV (dependent variable; final column)
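As a quick sketch of this layout (using sklearn's built-in load_iris loader; the printed shapes and names are the dataset's standard contents):

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)      # (150, 4): 150 examples (rows), 4 features (initial columns)
print(iris.feature_names)   # sepal length/width and petal length/width, in cm
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica'] (the final column's labels)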
Training and Test (Data) Sets
• Generalization
• Performance on novel data (general knowledge)
• Evaluation Schemes
• in-sample evaluation or training error
• out-of-sample or test error evaluation
• sklearn’s train_test_split
• training data
• portion of the data that we will use to study and build up our understanding
• testing data
• portion of the data that we will use to test ourselves
• Split randomly
Training and Test (Data) Sets

Python variable     Symbol        Phrase
iris                D_all         (total) dataset
iris.data           D_ftrs        train and test features
iris.target         D_tgt         train and test targets
iris_train_ftrs     D_train       training features
iris_test_ftrs      D_test        testing features
iris_train_tgt      D_traintgt    training target
iris_test_tgt       D_testtgt     testing target
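A minimal sketch of producing exactly these variables with sklearn's train_test_split (the 25% test fraction and the fixed random_state are assumptions for illustration, not values from the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
(iris_train_ftrs, iris_test_ftrs,
 iris_train_tgt,  iris_test_tgt) = train_test_split(iris.data, iris.target,
                                                    test_size=0.25,    # hold out 25% for testing (assumed ratio)
                                                    random_state=42)   # make the random split repeatable
print(iris_train_ftrs.shape, iris_test_ftrs.shape)   # (112, 4) (38, 4)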
Evaluation
• Accuracy
• If the answer is true and we predicted true, then we get a point!
• If the answer is false and we predicted true, we don’t get a point!!
• Formula: (#correct answers / #questions)
• sklearn’s train_test_split
• sklearn’s metrics.accuracy_score
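For example, a tiny sketch of the accuracy formula with made-up answer vectors (metrics.accuracy_score is the real sklearn helper; the data is invented):

from sklearn import metrics

actual    = [True, False, True, True]
predicted = [True, True,  True, False]   # 2 of the 4 answers match
print(metrics.accuracy_score(actual, predicted))   # 0.5 = #correct answers / #questions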
k Nearest Neighbours Classifier
• Simple Classifier
• Single step to make predictions from labelled dataset
• Method
• Find a way to describe the similarity of two different examples.
• When you need to make a prediction on a new, unknown example, simply take the
value from the most similar known example
• Consider more than just the single most similar example:
• Describe similarity between pairs of examples.
• Pick several of the most-similar examples.
• Combine those picks to get a single answer.
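That three-part recipe can be sketched in a few lines of plain Python (a hypothetical from-scratch version, not code from the referenced book):

import math
from collections import Counter

def knn_predict(train_ftrs, train_tgts, new_example, k=3):
    # Describe similarity: Euclidean distance from the new example to every known one
    dists = [math.dist(known, new_example) for known in train_ftrs]
    # Pick the k most-similar (smallest-distance) known examples
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Combine those picks into a single answer by majority vote
    votes = Counter(train_tgts[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict([[0, 0], [1, 1], [5, 5]], ['a', 'a', 'b'], [0.5, 0.5]))   # 'a'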
k Nearest Neighbours Classifier
• Similarity
• A distance between pairs of examples
• similarity = distance(example_one, example_two)
• Similar things are close - a small distance apart
• Dissimilar things are far away - a large distance apart
• Distance Metrics
• Euclidean Distance
• treat the two examples as points in space
• Hamming Distance
• when examples consist of simple Yes/No or True/False (Boolean) features, we can compare two examples by counting the number of features that differ
• Minkowski Distance etc…
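A short sketch of the two metrics named above, on made-up feature vectors:

import math

a = [5.1, 3.5, 1.4, 0.2]
b = [4.9, 3.0, 1.4, 0.2]
print(math.dist(a, b))   # Euclidean: straight-line distance between two points in space

x = [True, False, True]
y = [True, True,  False]
print(sum(xi != yi for xi, yi in zip(x, y)))   # Hamming: number of differing Boolean features -> 2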
k Nearest Neighbours Classifier
• k in the k-NN and Answer Combination
• 1 / 3 / 5 / 10 / 20
• Voting method to classify
• Noise problem
• Tie problem
• {cat, dog, dog, zebra, cat}
• Statistic (mean / median) to regress
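The tie problem from the list above is easy to see with a majority vote over the example labels (a small sketch using collections.Counter):

from collections import Counter

votes = Counter(['cat', 'dog', 'dog', 'zebra', 'cat'])
print(votes.most_common())   # [('cat', 2), ('dog', 2), ('zebra', 1)]: cat and dog tie at 2 votes each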
k Nearest Neighbours Classifier
• We want to use 3-NN - three nearest neighbors - as our model
• We want that model to capture the relationship between the iris
training features and the iris training targets
• We want to use that model to predict - on previously unseen test
examples - the iris target species.
• Finally, we want to evaluate the quality of those predictions, using
accuracy, by comparing predictions against reality. We don’t peek at
these known answers, but we use them as an answer key for the test.
k Nearest Neighbours Classifier
• sklearn’s terminology
• An estimator is fit on some data and then used to predict on some data.
• We fit the estimator on training data and then use the fit-estimator to predict
on the test data.
• In other words:
• Create a 3-NN model,
• Fit that model on the training data,
• Use that model to predict on the test data, and
• Evaluate those predictions using accuracy
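Those four steps map directly onto sklearn's API (KNeighborsClassifier, fit, predict, and metrics.accuracy_score are the real names; the split ratio and random_state are assumed for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

iris = load_iris()
(iris_train_ftrs, iris_test_ftrs,
 iris_train_tgt,  iris_test_tgt) = train_test_split(iris.data, iris.target,
                                                    test_size=0.25, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)        # create a 3-NN model
fit = knn.fit(iris_train_ftrs, iris_train_tgt)   # fit that model on the training data
preds = fit.predict(iris_test_ftrs)              # use the fit estimator to predict on the test data
print(metrics.accuracy_score(iris_test_tgt, preds))   # evaluate those predictions using accuracy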
k Nearest Neighbours Classifier
• Hyperparameters
• 3 in our 3-nearest-neighbors is not something that we adjust by training
• If we want a 5-NN machine, we have to build a completely different model
• 3 is a hyperparameter
• Hyperparameters are not trained or manipulated by the learning method they
help define
• Hyperparameters are chosen and fixed before learning begins; the learning process never gets a chance to adjust them
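In sklearn terms, the hyperparameter is fixed in the constructor before fit ever sees any data; a different k literally means a different model object (a small sketch):

from sklearn.neighbors import KNeighborsClassifier

knn3 = KNeighborsClassifier(n_neighbors=3)   # k is set here, before any learning happens
knn5 = KNeighborsClassifier(n_neighbors=5)   # a 5-NN machine is a completely different model
# fit() adjusts the model's learned state but never changes n_neighbors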
Naïve Bayes Classifier
• Example: Football Play with single feature

• Aim: to build an ML model that receives the humidity feature value and predicts whether the play will happen or not
• Given that Humidity is Normal, let's find the chances of the play:
• p(Play = Yes | Humidity = Normal)
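The slide's data table isn't reproduced here, so the counts below are assumed (they follow the classic 14-day play-weather example) purely to show the mechanics of Bayes' theorem, p(Yes | Normal) = p(Normal | Yes) * p(Yes) / p(Normal):

# Hypothetical counts (assumed, not taken from the slides)
n_days = 14                 # total observed days
n_yes = 9                   # days on which play happened
n_normal = 7                # days with Normal humidity
n_normal_and_yes = 6        # Normal-humidity days among the Yes days

p_yes = n_yes / n_days                         # p(Play = Yes)
p_normal = n_normal / n_days                   # p(Humidity = Normal)
p_normal_given_yes = n_normal_and_yes / n_yes  # p(Normal | Yes)

# Bayes' theorem:
p_yes_given_normal = p_normal_given_yes * p_yes / p_normal
print(p_yes_given_normal)   # 6/7 ≈ 0.857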