
Classification: Machine Learning Basics and kNN

Wachemo University
School of Computing and Informatics
Department of Software Engineering
Ms. Senedu G/mariam (2023)
Outline

❖ A brief overview of ML
❖ Key tasks in ML
❖ Why we need ML
❖ K-nearest neighbors algorithm
❖ kNN Classification
❖ kNN Regression
❖ Some Issues in kNN
❖ Decision Tree
❖ Naïve Bayes
Machine Learning

❖ With machine learning we can gain insight from a dataset.
❖ We ask the computer to make sense of the data; this is what we mean by learning.
❖ Machine learning is the process of turning data into information and knowledge.
❖ ML lies at the intersection of computer science, engineering, and statistics, and often appears in other disciplines.
What is Machine Learning?

❖ It is a tool that can be applied to many problems.
❖ Any field that needs to interpret and act on data can benefit from ML techniques.
❖ There are many problems where the solution isn't deterministic; that is, we don't know enough about the problem, or we don't have enough computing power to model it properly.
Traditional vs. ML Systems

❖ In ML, once the system is provided with the right data and algorithms, it can "fish for itself".
Traditional vs. ML Systems

❖ A key aspect of ML that makes it particularly appealing in terms of business value is that it does not require as much explicit programming in advance.
Sensors and the Data Deluge

❖ We have a tremendous amount of human-created data from the WWW, but recently more non-human sources of data have been coming online.
❖ Sensors connected to the web.
❖ About 20% of non-video internet traffic is generated by sensors.
❖ Data is collected from mobile phones (three-axis accelerometers, temperature sensors, and GPS receivers).
❖ The two trends of mobile computing and sensor-generated data mean that we'll be getting more and more data in the future.
Key Terminology

❖ Weight, wingspan, webbed feet, and back color are features or attributes.
❖ An instance (example, observation) is made up of features.
❖ Species is the target variable (response, outcome, output, etc.).
❖ Attributes can be numeric, binary, or nominal.
Key Terminology

❖ To train the ML algorithm we need to feed it quality data, known as a training set.
❖ In the above example, each training example (instance) has four features and one target variable.
❖ In a training set the target variable is known.
❖ The machine learns by finding some relationship between the features and the target variable.
❖ In a classification problem the target variables are called classes, and there is assumed to be a finite number of classes.
Key Terminology Cont…

❖ To test machine learning algorithms, a separate dataset called a test set is used.
❖ The target variable for each example from the test set isn't given to the program.
❖ The program (model) decides which class each example belongs to.
❖ The predicted value is then compared with the actual target variable (see the sketch below).
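A minimal sketch of this train/predict/compare loop; the data, variable names, and the scikit-learn classifier are illustrative assumptions, not part of the slides:

```python
# Train on a training set, predict on a held-out test set,
# then compare predictions with the known target values.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1.0, 0.9], [0.9, 1.1], [5.0, 5.2], [5.1, 4.8]])
y_train = np.array([0, 0, 1, 1])           # target variable is known for training
X_test  = np.array([[1.1, 1.0], [4.9, 5.0]])
y_test  = np.array([0, 1])                 # withheld from the model, used only for scoring

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = model.predict(X_test)             # the model assigns a class to each test example
print("accuracy:", np.mean(y_pred == y_test))
```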
Key Tasks of Machine Learning

❖ In classification, our job is to predict which class an instance of data falls into.
❖ Regression is the prediction of a numeric value.
❖ Classification and regression are examples of supervised learning.
❖ This set of problems is known as supervised because we're telling the algorithm what to predict.
Key Tasks of Machine Learning

❖ The opposite of supervised learning is a set of tasks known as unsupervised learning.
❖ In unsupervised learning, there's no label or target value given for the data. Grouping similar items together is known as clustering.
❖ In unsupervised learning, we may also want to find statistical values that describe the data. This is known as density estimation.
❖ Another task of unsupervised learning is reducing the data from many features to a smaller number so that we can visualize it properly (dimensionality reduction).
Key Tasks of Machine Learning

❖ A number of common algorithms are used to perform classification, regression, clustering, and density estimation tasks.
❖ Balancing generalization and memorization (overfitting) is a common problem for many ML algorithms.
❖ Regularization techniques are used to reduce overfitting.
Key Tasks of Machine Learning

❖ There are two fundamental causes of prediction error: a model's bias and its variance.
❖ A model with high variance overfits the training data, while a model with high bias underfits the training data.
❖ High bias, low variance
❖ Low bias, high variance
❖ High bias, high variance
❖ Low bias, low variance
❖ The predictive power of many ML algorithms improves as the amount of training data increases.
❖ The quality of the data is also important.
❖ Ideally, a model will have both low bias and low variance, but efforts to reduce one will frequently increase the other. This is known as the bias-variance trade-off.
Model Bias vs. Variance

• Model bias refers to the presence of systematic errors in a model that can cause it to consistently make incorrect predictions. These errors can arise from many sources, including:
  – the selection of the training data,
  – the choice of features used to build the model, or
  – the algorithm used to train the model.
• Variance refers to how the model changes when different portions of the training data set are used. Simply stated, variance is the variability in the model's predictions: how much the learned function can change depending on the given data set. Variance comes from highly complex models with a large number of features.
Common Measurements of Performance

❖ Common measurements of performance include:
❖ Accuracy (ACC) = (TP + TN) / (TP + TN + FP + FN)
❖ Precision (P) = TP / (TP + FP)
❖ Recall (R) = TP / (TP + FN)
• A true positive (TP) is an outcome where the model correctly predicts the positive class.
• A true negative (TN) is an outcome where the model correctly predicts the negative class.
• A false positive (FP) is an outcome where the model incorrectly predicts the positive class.
• A false negative (FN) is an outcome where the model incorrectly predicts the negative class.
Accuracy (ACC)

❖ Accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:

Accuracy = (number of correct predictions) / (total number of predictions)

For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example: Let's try calculating accuracy for a model that classified 100 tumors as malignant (the positive class) or benign (the negative class), with TP = 1, TN = 90, FP = 1, and FN = 8 (the counts consistent with the precision and recall worked out on the following slides):

Accuracy = (1 + 90) / (1 + 90 + 1 + 8) = 91/100 = 0.91
Precision (P)

❖ Precision attempts to answer the question: What proportion of positive identifications was actually correct?

Precision = TP / (TP + FP)

Example: For the same model that classified 100 tumors as malignant (the positive class) or benign (the negative class), with TP = 1 and FP = 1:

Precision = 1 / (1 + 1) = 0.5

Our model has a precision of 0.5; in other words, when it predicts a tumor is malignant, it is correct 50% of the time.
Recall (R)

❖ Recall attempts to answer the question: What proportion of actual positives was identified correctly?

Recall = TP / (TP + FN)

Note: A model that produces no false negatives has a recall of 1.0.

Example: For the same model that classified 100 tumors as malignant (the positive class) or benign (the negative class), with TP = 1 and FN = 8:

Recall = 1 / (1 + 8) = 0.11

Our model has a recall of 0.11; in other words, it correctly identifies 11% of all malignant tumors (a short computation of all three metrics follows).
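The three metrics can be checked with a few lines of code; a minimal sketch using the tumor counts from the example above (TP = 1, TN = 90, FP = 1, FN = 8):

```python
# Compute accuracy, precision, and recall from confusion-matrix counts.
# Counts are taken from the tumor example above.
tp, tn, fp, fn = 1, 90, 1, 8

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 91 / 100 = 0.91
precision = tp / (tp + fp)                    # 1 / 2    = 0.5
recall    = tp / (tp + fn)                    # 1 / 9    ≈ 0.11

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```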
How to Choose the Right Algorithm

❖ First, you need to consider your goal.
❖ If you're trying to predict or forecast a target value, then you need to look into supervised learning.
❖ If not, then unsupervised learning is the place you want to be.
❖ If you've chosen supervised learning, what's your target value?
❖ A discrete value (y/n, 1/2/3, red/yellow/black): classification
❖ A continuous range of values (0.00 to 100.00, etc.): regression
How to Choose the Right Algorithm

❖ Spend some time getting to know the data; the better we know it, the more successful an application we can build.
❖ Things to know about the data include (a quick inspection sketch follows this list):
❖ Are the features nominal or continuous?
❖ Are there missing values in the features?
❖ If there are missing values, why are they missing?
❖ Are there outliers in the data? etc.
❖ All of these facts about your data can help you narrow the algorithm selection process.
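A minimal sketch of such an inspection, assuming the data has been loaded into a hypothetical pandas DataFrame df from a hypothetical file:

```python
# Quick data inspection with pandas; the file name and DataFrame are assumptions.
import pandas as pd

df = pd.read_csv("data.csv")        # hypothetical file name

print(df.dtypes)                    # nominal (object) vs. continuous (float/int) features?
print(df.isna().sum())              # how many missing values per feature?
print(df.describe())                # min/max/quartiles hint at outliers
```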
How to Choose the Right Algorithm

❖ Finding the best algorithm is an iterative process of trial and error.
❖ Steps in developing a machine learning application (a short end-to-end sketch follows this list):
❖ Collect data: scrape a website, an RSS feed, an API, etc.
❖ Prepare the input data: make sure the data is in a usable format.
❖ Analyze the input data: look at the data.
❖ Understand the data.
❖ Train the algorithm: this is where the ML takes place (does not apply to unsupervised learning).
❖ Test the algorithm: if results are poor, go back to step 4 and retrain.
❖ Use it: implement the ML application.
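A compact end-to-end sketch of these steps; scikit-learn's bundled iris data stands in for the collect/prepare stages, and the dataset and classifier choices are illustrative assumptions:

```python
# Steps 1-3: "collect", prepare, and analyze data (bundled iris dataset stands in here).
# Steps 4-6: train, test, and use the model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)                           # train the algorithm
print("test accuracy:", model.score(X_test, y_test))  # test it; if poor, go back and retrain
print("prediction:", model.predict(X_test[:1]))       # use it on new input
```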
Problem Solving Framework

❖ Problem solving framework for an ML application:
❖ Business issue understanding
❖ Data understanding
❖ Data preparation
❖ Analysis / modeling
❖ Validation
❖ Presentation / visualization
Machine Learning Systems and Data

❖ In AI (ML), instead of writing a program by hand for each specific task, we collect lots of examples that specify the correct output for a given input.
❖ The most important factor in ML is not the algorithm or the software systems.
❖ The quality of the data is the soul of an ML system.
Machine Learning Systems and Data

❖ Invalid training data:
❖ Garbage in, garbage out.
❖ An invalid dataset leads to invalid results.
❖ This is not to say that the training data needs to be perfect.
❖ Out of a million examples, some inaccurate labels are acceptable.
❖ The quality of the data is the soul of an ML system.
Machine Learning Systems and Data

❖ "Garbage" can be several things:
❖ Wrong labels (Dog labeled Cat, Cat labeled Dog)
❖ Inaccurate and missing values
❖ A biased dataset, etc.
❖ Handling missing data (a small imputation sketch follows):
❖ If only a small portion of rows and columns is affected: discard them.
❖ Data imputation (time-series data): carry the last valid value forward.
❖ Substitute with the mean or median.
❖ Predict the missing values from the available data.
❖ A missing value can have a meaning of its own ("missing" as a category).
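A minimal sketch of these imputation options with pandas; the DataFrame and its columns are illustrative assumptions:

```python
# Common missing-data strategies from the list above; `df` is an assumed toy DataFrame.
import pandas as pd
import numpy as np

df = pd.DataFrame({"age": [25, np.nan, 40, 31], "salary": [8000, 12000, np.nan, 15000]})

dropped = df.dropna()                            # discard rows with missing values
ffilled = df.ffill()                             # time series: carry last valid value forward
means   = df.fillna(df.mean(numeric_only=True))  # substitute with the column mean
flagged = df["salary"].isna()                    # keep "missingness" itself as a feature
```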
Machine Learning Systems and Data

❖ Having a clean dataset is not always enough.
❖ Features with large magnitudes can dominate features with small magnitudes during training.
❖ Example: age [0–100] vs. salary [6,000–20,000]: apply scaling or standardization (see the sketch after this slide).
❖ Data imbalance (example class counts below):

No.  Class  Number
1    Cat    5000
2    Dog    5000
3    Tiger  150
4    Cow    25

❖ Options: leave it as it is.
❖ Undersampling (if all classes are equally important): [5000 → 25]
❖ Oversampling (if all classes are equally important): [25 → 5000]
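A minimal sketch of min-max scaling and standardization on the age/salary example; the values are illustrative:

```python
# Rescale features so large-magnitude columns (salary) don't dominate small ones (age).
import numpy as np

X = np.array([[25.0, 8000.0],
              [40.0, 12000.0],
              [31.0, 15000.0],
              [60.0, 6500.0]])   # columns: age, salary (illustrative values)

# Min-max scaling to [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization to zero mean, unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax.round(2))
print(X_std.round(2))
```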
Challenges in Machine Learning

❖ It requires considerable data and compute power.
❖ It requires knowledgeable data science specialists or teams.
❖ It adds complexity to the organization's data integration strategy (a data-driven culture is needed).
❖ Learning AI (ML) algorithms is challenging without an advanced math background.
❖ The context of data often changes (private data vs. public data).
❖ Algorithmic bias, privacy, and ethical concerns may be overlooked.
Stages of the ML Process

❖ The first key step in preparing to explore and exploit AI (ML) is to understand the basic stages involved.
Stages of the ML Process

❖ Machine learning tasks and subtasks:
Data Collection and Preparation

❖ Data collection is the process of gathering and measuring information from countless different sources.
❖ Data is being generated at an unprecedented rate. This data can be:
❖ Numeric (temperature, loan amount, customer retention rate),
❖ Categorical (gender, color, highest degree earned), or
❖ Even free text (think doctor's notes or opinion surveys).
❖ In order to use the data we collect to develop practical solutions, it must be collected and stored in a way that makes sense for the business problem at hand.
Data Collection and Preparation

❖ During AI development, we always rely on data.
❖ From training, tuning, and model selection to testing, we use three different data sets: the training set, the validation set, and the testing set.
❖ The validation set is used to select and tune the final ML model.
❖ The test data set is used to evaluate how well your algorithm was trained with the training data set.
Data Collection and Preparation

❖ Test sets typically represent 20% or 30% of the data (see also cross-validation, sketched below).
❖ The test set consists of input data grouped together with verified correct outputs, generally obtained by human verification.
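A minimal sketch of a 60/20/20 train/validation/test split and of k-fold cross-validation with scikit-learn; the split fractions, dataset, and classifier are illustrative assumptions:

```python
# Three-way split (train/validation/test) plus a cross-validation alternative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 60% train, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))   # used to tune/select the model
print("test accuracy:", model.score(X_test, y_test))       # final, untouched evaluation

# Alternative: 5-fold cross-validation over the whole dataset
print("cv accuracy:", cross_val_score(KNeighborsClassifier(5), X, y, cv=5).mean())
```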
Classifying with k-Nearest Neighbors (kNN)
K-Nearest Neighbors (kNN)

❖ It is easy to grasp (understand and implement) and very effective (a powerful tool).
❖ The model for kNN is the entire training dataset.
❖ Pros: high accuracy, insensitive to outliers, no assumptions about the data.
❖ Cons: computationally expensive, requires a lot of memory.
❖ Works with: numeric values, nominal values (classification and regression).
K-Nearest Neighbors (kNN)

❖ We have an existing set of example data (the training set).
❖ We know what class each piece of the data should fall into.
❖ When we're given a new piece of data without a label, we compare it to the existing data, every piece of existing data.
❖ We then take the most similar pieces of data (the nearest neighbors) and look at their labels.
K-Nearest Neighbors (kNN)

❖ We look at the top k most similar pieces of data from our known dataset (usually k < 20).
❖ k is often set to an odd number to prevent ties.
❖ Lastly, we take a majority vote from the k most similar pieces of data, and the majority class is the class we assign to the data we were asked to classify (a minimal sketch follows).
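A minimal sketch of the algorithm as just described, written with plain numpy; the toy data and the function name knn_classify are illustrative assumptions:

```python
# k-nearest neighbors classification: distance to all points, pick k nearest, majority vote.
from collections import Counter
import numpy as np

def knn_classify(query, X_train, y_train, k=3):
    # Euclidean distance from the query to every training instance
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]             # indices of the k closest instances
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]           # majority label

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_classify(np.array([1.1, 0.9]), X_train, y_train, k=3))  # -> "A"
```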
K-Nearest Neighbors (kNN)

❖ kNN and other non-parametric models can be useful when training data is abundant and you have little prior knowledge about the relationship between the response and explanatory variables.
❖ kNN makes only one assumption: instances that are near each other are likely to have similar values of the response variable.
❖ A model that makes assumptions about the relationship can be useful if training data is scarce or if you already know about the relationship.
kNN Classification

❖ Classifying movies into romance or action movies.
❖ The features are the number of kisses and kicks in each movie.
❖ Now, you find a movie you haven't seen yet and want to know whether it's a romance movie or an action movie.
❖ To determine this, we'll use the kNN algorithm.
kNN Classification

❖ We find the movie in question and see how many kicks and kisses it has.

[Figure: classifying movies by plotting the number of kicks and kisses in each movie]
kNN Classification

[Table: movies with the number of kicks and kisses, along with their class]
kNN Classification

❖ We don't know what type of movie the unknown movie is.
❖ First, we calculate its distance to all the other movies.

[Table: distance between each movie and the unknown movie]
kNN Classification

❖ We use the Euclidean distance, where the distance between two vectors xA and xB with two features is:

d = sqrt((xA1 − xB1)² + (xA2 − xB2)²)
Distances

❖ Distances are used to measure similarity.
❖ There are many ways to measure the distance between two instances: Euclidean distance, Manhattan distance, Mahalanobis distance, Hamming distance, and Minkowski distance (implementations are sketched below).
❖ Manhattan distance: |X1 − X2| + |Y1 − Y2|
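A minimal sketch of these distance functions in textbook form (Mahalanobis is omitted since it needs a covariance matrix; the implementations are standard definitions, not taken from the slides):

```python
# Common distance functions between two feature vectors a and b.
import numpy as np

def euclidean(a, b):
    return np.sqrt(((a - b) ** 2).sum())        # Minkowski with p = 2

def manhattan(a, b):
    return np.abs(a - b).sum()                  # Minkowski with p = 1

def minkowski(a, b, p):
    return (np.abs(a - b) ** p).sum() ** (1 / p)

def hamming(a, b):
    return (a != b).sum()                       # count of differing positions

a, b = np.array([1.0, 6.0]), np.array([4.0, 2.0])
print(euclidean(a, b), manhattan(a, b), minkowski(a, b, 3))  # 5.0, 7.0, ...
```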
kNN Classification

❖ Let's assume k = 3.
❖ Then the three closest movies are He's Not Really into Dudes, Beautiful Woman, and California Man.
❖ Because all three movies are romances, we predict that the mystery movie is a romance movie (majority vote; see the code below).
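Using the knn_classify sketch from the earlier slide on this example; the kick/kiss counts and the action-movie titles below are illustrative stand-ins for the table lost from the slide, chosen so that the same three romances come out nearest:

```python
# Movie example with knn_classify from the earlier sketch; counts are illustrative.
import numpy as np

X_train = np.array([[3, 104],    # California Man             (kicks, kisses)
                    [2, 100],    # He's Not Really into Dudes
                    [1, 81],     # Beautiful Woman
                    [101, 10],   # action movie (illustrative)
                    [99, 5],     # action movie (illustrative)
                    [98, 2]])    # action movie (illustrative)
y_train = np.array(["romance"] * 3 + ["action"] * 3)

print(knn_classify(np.array([18, 90]), X_train, y_train, k=3))  # -> "romance"
```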
General Approach to kNN

❖ General approach to kNN:
❖ Collect: any method.
❖ Prepare: numeric values are needed for a distance calculation.
❖ Analyze: any method (e.g., plotting).
❖ Train: does not apply to the kNN algorithm.
❖ Test: calculate the error rate.
❖ Use: the application takes some input data and outputs structured numeric values.
K-Nearest Neighbors (kNN)

❖ kNN is an instance-based learning algorithm.
❖ Non-instance supervised learning: the training pairs <x, y>1 … <x, y>n are used to fit a function such as F(x) = wx + b.
❖ Instance-based supervised learning: the training pairs <x, y>1 … <x, y>n are stored in a database, and prediction is a lookup: F(x) = lookup(x).
K-Nearest Neighbors (kNN)

❖ Advantages:
❖ It remembers (the training data is kept).
❖ Fast to train (no learning time).
❖ Simple and straightforward.
❖ Disadvantages:
❖ No generalization.
❖ Overfitting (sensitive to noise).
❖ Computationally expensive for large datasets.
K-Nearest Neighbors (kNN)

❖ Given:
❖ Training data D = {(xi, yi)}
❖ Distance metric d(q, x): domain knowledge is important.
❖ Number of neighbors k: domain knowledge is important.
❖ Query point q.
❖ NN(q) = { i : d(q, xi) is among the k smallest distances }
❖ Return:
❖ Classification: a vote of the yi.
❖ Regression: the mean of the yi.
kNN Regression Problem

❖ The similarity measure depends on the type of the data:
❖ Real-valued data: Euclidean distance.
❖ Categorical or binary data: Hamming distance (the p-norm with p = 0).

Regression data:

X1, X2    y
1, 6      7
2, 4      8
3, 7      16
6, 8      44
7, 1      50
8, 4      68

Query: q = (4, 2), y = ???

d()          k      Average
Euclidean    1-NN   _______
             3-NN   _______
Manhattan    1-NN   _______
             3-NN   _______
kNN Regression Problem

X1, X2    y     ED
1, 6      7     25
2, 4      8     8
3, 7      16    26
6, 8      44    40
7, 1      50    10
8, 4      68    20

Query: q = (4, 2)

d()          k      Average
Euclidean    1-NN   8
             3-NN   42

ED = (X1i − q1)² + (X2i − q2)²  (the squared Euclidean distance; the square root is omitted, which does not change the ranking of neighbors)

1-NN: the nearest point is (2, 4) with ED = 8, so y = 8. 3-NN: the three nearest points have y = 8, 50, 68, so y = (8 + 50 + 68) / 3 = 42.
kNN Regression Problem

X1, X2    y     MD
1, 6      7     7
2, 4      8     4
3, 7      16    6
6, 8      44    8
7, 1      50    4
8, 4      68    6

Query: q = (4, 2)

d()          k      Average
Manhattan    1-NN   29
             3-NN   35.5

MD = |X1i − q1| + |X2i − q2|

1-NN: two points tie at MD = 4 (y = 8 and y = 50), and the tied neighbors are averaged: (8 + 50) / 2 = 29. 3-NN: after the two points at MD = 4, two more tie at MD = 6 (y = 16 and y = 68); averaging all four gives (8 + 50 + 16 + 68) / 4 = 35.5. (A code sketch follows.)
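A minimal sketch of kNN regression that reproduces the Euclidean answers above; it uses a simple argsort, so the tie-averaging rule used for the Manhattan answers would need extra handling:

```python
# kNN regression: average the y-values of the k nearest training points.
import numpy as np

X = np.array([[1, 6], [2, 4], [3, 7], [6, 8], [7, 1], [8, 4]], dtype=float)
y = np.array([7, 8, 16, 44, 50, 68], dtype=float)
q = np.array([4, 2], dtype=float)

def knn_regress(q, X, y, k):
    d2 = ((X - q) ** 2).sum(axis=1)     # squared Euclidean distance (ranking unchanged)
    nearest = np.argsort(d2)[:k]
    return y[nearest].mean()

print(knn_regress(q, X, y, k=1))  # 8.0
print(knn_regress(q, X, y, k=3))  # 42.0
```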
K-Nearest Neighbors Bias

❖ Preference bias: our belief about what makes a good hypothesis.
❖ Locality: near points are similar (distance function / domain).
❖ Smoothness: averaging.
❖ All features matter equally.
❖ Best practices for data preparation:
❖ Rescale data: normalizing the data to the range [0, 1] is a good idea.
❖ Address missing data: exclude or impute the missing values.
❖ Lower dimensionality: kNN is suited to lower-dimensional data.
kNN and the Curse of Dimensionality

❖ As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially.
❖ Exponential means "bad": O(2^d).
Some Other Issues

❖ What is needed to select a kNN model?
❖ How to measure closeness of neighbors.
❖ The correct value for k.
❖ d(x, q): Euclidean, Manhattan, weighted, etc.
❖ The choice of the distance function matters.
❖ The value of k:
❖ k = n: the prediction is simply the average of all the data (the query is effectively ignored).
❖ k = n with a weighted average: closer points count more [locally weighted regression; sketched below].
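A minimal sketch of the distance-weighted variant (k = n with inverse-distance weights; the weighting scheme is one common choice, assumed here rather than specified on the slide):

```python
# Distance-weighted kNN regression: every point votes, closer points count more.
import numpy as np

def weighted_knn_regress(q, X, y, eps=1e-9):
    d = np.sqrt(((X - q) ** 2).sum(axis=1))
    w = 1.0 / (d + eps)                 # inverse-distance weights (eps avoids div by zero)
    return (w * y).sum() / w.sum()      # weighted average over ALL training points

X = np.array([[1, 6], [2, 4], [3, 7], [6, 8], [7, 1], [8, 4]], dtype=float)
y = np.array([7, 8, 16, 44, 50, 68], dtype=float)
print(weighted_knn_regress(np.array([4.0, 2.0]), X, y))
```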
Summary

❖ kNN is an example of instance-based learning.
❖ The algorithm has to carry around the full dataset; for large datasets, this implies a large amount of storage.
❖ The distance must be calculated for every piece of data in the database, and this can be cumbersome.
❖ kNN doesn't give you any idea of the underlying structure of the data.
❖ kNN is an example of lazy learning, which is the opposite of eager learning.
❖ kNN can handle both classification and regression.
Question & Answer

