Co-2 ML 2019

CO-II

• Supervised Learning
 Nearest Neighbour
 Naive Bayes
 Logistic Regression
 Support Vector Machines
 Neural Networks
 Decision Trees
• Unsupervised & Semi-Supervised Learning
 Clustering (K-means, GMMs)
 Factor Analysis (PCA, LDA)
• Learning Theory
 Bias and Variance
 Probably Approximately Correct (PAC) Learning

1
Supervised Learning

• The model is “trained” on a pre-defined set of “training examples”, which then facilitate its
ability to reach an accurate conclusion when given new data.

• Supervised algorithms learn from labelled training data. The algorithms are “supervised”
because we know what the correct answer is.

• For example, if the algorithm receives a set of images labelled as apples or oranges, it can
first guess the object in the image, then use the label to check whether its guess is correct.

• It is called supervised learning because the process of an algorithm learning from the training
dataset can be thought of as a teacher supervising the learning process. We know the correct
answers, the algorithm iteratively makes predictions on the training data and is corrected by
the teacher. Learning stops when the algorithm achieves an acceptable level of performance.

2
K-Nearest Neighbour
• The most basic instance-based method is the k-nearest neighbour
algorithm.
• This algorithm assumes all instances correspond to points in the n-
dimensional space ℜn.
• The nearest neighbours of an instance are defined in terms of the
standard Euclidean distance. More precisely, let an arbitrary instance x
be described by the feature vector

⟨ a1(x), a2(x), …, an(x) ⟩

where ar(x) denotes the value of the rth attribute of instance x. Then the
distance between two instances xi and xj is defined to be d(xi, xj), where

d(xi, xj) = √( Σ r=1..n ( ar(xi) − ar(xj) )² )

3
K-Nearest Neighbour: a set of positive and negative training examples is shown on the left,
along with a query instance xq to be classified. The 1-Nearest Neighbour algorithm
classifies xq as positive, whereas 5-Nearest Neighbour classifies it as negative. On the right
is the decision surface induced by the 1-Nearest Neighbour algorithm for a typical set of
training examples: the convex polygon surrounding each training example indicates the
region of instance space closest to that point (i.e., the instances for which the 1-Nearest
Neighbour algorithm will assign the classification belonging to that training example).

4
• The intuition behind the KNN algorithm is one of the simplest of all the supervised
machine learning algorithms.

• It simply calculates the distance of a new data point to all other training data points.

• The distance can be measured with any metric, e.g. Euclidean or Manhattan distance.

• It then selects the K-nearest data points, where K can be any integer. Finally it
assigns the data point to the class to which the majority of the K data points belong.

5
• Euclidean distance computes the root of the summed squared differences between the
coordinates of a pair of objects. Mathematically, it can be represented as

d(x, y) = √( Σ i ( xi − yi )² )

• Manhattan distance computes the sum of the absolute differences between the
coordinates of a pair of objects. Mathematically, it can be represented as

d(x, y) = Σ i | xi − yi |
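
For concreteness, a minimal sketch of both distance metrics (plain Python, not from the original slides):

```python
import math

def euclidean_distance(p, q):
    # Root of the summed squared differences between coordinates.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan_distance(p, q):
    # Sum of the absolute differences between coordinates.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0
print(manhattan_distance((1, 2), (4, 6)))  # 7
```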

6
• Your task is to classify a new data point 'X' into the "Blue" class or the "Red" class. The
coordinates of the data point are x = 45 and y = 50. Suppose the value of K is 3.
The KNN algorithm starts by calculating the distance of point X from all the points. It
then finds the 3 points with the least distance to point X (shown encircled in the figure).
• The final step of the KNN algorithm is to assign the new point to the class to which the
majority of the three nearest points belong. In the figure, two of the three nearest points
belong to the class "Red" while one belongs to the class "Blue". Therefore the new data
point is classified as "Red".

7
Pros
• It is extremely easy to implement.
• It is a lazy learning algorithm and therefore requires no training prior to making real-time
predictions. This makes the KNN algorithm much faster than algorithms that require a
training phase, e.g. SVM, linear regression, etc.
• Since the algorithm requires no training before making predictions, new data can be added
seamlessly.
• There are only two parameters required to implement KNN: the value of K and the
distance function (e.g. Euclidean or Manhattan).

Cons
• The KNN algorithm doesn't work well with high-dimensional data, because with a large
number of dimensions it becomes difficult for the algorithm to compute meaningful
distances.
• The KNN algorithm has a high prediction cost for large datasets, because the distance
between the new point and every existing point must be calculated at prediction time.
• Finally, the KNN algorithm doesn't work well with categorical features, since it is difficult
to define a distance between dimensions with categorical values.
8
Applications of KNN Algorithm

• KNN is a simple yet powerful classification algorithm. It requires no
training for making predictions, which is typically one of the most
difficult parts of a machine learning algorithm.
• The KNN algorithm has been widely used to find document
similarity and in pattern recognition.
• It has also been employed for developing recommender systems and
for dimensionality reduction and pre-processing steps for computer
vision, particularly face recognition tasks.

9
Naive Bayes

• It is a classification technique based on Bayes' theorem with an
assumption of independence among predictors.
• In simple terms, a Naive Bayes classifier assumes that the presence of
a particular feature in a class is unrelated to the presence of any other
feature.
• A Naive Bayes model is easy to build and particularly useful for very
large data sets. Along with its simplicity, Naive Bayes can
outperform even highly sophisticated classification methods.

10
Bayes' theorem provides a way of calculating the posterior probability P(Y|X) from P(Y), P(X) and P(X|Y):

P(Y|X) = P(X|Y) · P(Y) / P(X)

13
• For example, a fruit may be considered to be an apple if it is red, round, and about 3
inches in diameter. Even if these features depend on each other or upon the existence of
the other features, all of these properties independently contribute to the probability that
this fruit is an apple, and that is why it is known as 'Naive'.

14
Let's understand it using an example. Below we have a training data set of weather and the
corresponding target variable 'Play' (suggesting the possibility of playing). Now, we need to
classify whether players will play or not based on the weather condition. Let's follow the
steps below to perform it.
Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities, e.g. Overcast probability =
0.29 and probability of playing = 0.64.
Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for each
class. The class with the highest posterior probability is the outcome of the prediction.

15
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using the method of posterior probability discussed above.
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher posterior probability,
so the prediction is "Yes".

Naive Bayes uses a similar method to predict the probability of each class based on
various attributes. This algorithm is mostly used in text classification and in problems
having multiple classes.
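
A quick check of this calculation in Python; the counts are inferred from the probabilities quoted above (14 days in total, 5 of them Sunny, 9 with Play = Yes, 3 Sunny days with Play = Yes):

```python
# Counts inferred from the probabilities quoted above.
total = 14
sunny = 5          # P(Sunny) = 5/14
yes = 9            # P(Yes) = 9/14
sunny_and_yes = 3  # P(Sunny | Yes) = 3/9

p_sunny = sunny / total
p_yes = yes / total
p_sunny_given_yes = sunny_and_yes / yes

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
```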

16
Pros
• It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
• When the assumption of independence holds, a Naive Bayes classifier performs better compared to
other models like logistic regression, and you need less training data.
• It performs well with categorical input variables compared to numerical variable(s). For
numerical variables, a normal distribution is assumed (bell curve, which is a strong assumption).

Cons
• If a categorical variable has a category (in the test data set) which was not observed in the training data
set, then the model will assign it a zero probability and will be unable to make a prediction. This
is often known as "Zero Frequency". To solve this, we can use a smoothing technique; one
of the simplest smoothing techniques is Laplace estimation.
• On the other side, Naive Bayes is also known to be a bad estimator, so the probability outputs
from predict_proba are not to be taken too seriously.
• Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it
is almost impossible to get a set of predictors which are completely independent.

17
Applications of Naive Bayes Algorithms

• Real-time Prediction: Naive Bayes is an eager learning classifier and it is very
fast. Thus, it can be used for making predictions in real time.
• Multi-class Prediction: This algorithm is also well known for its multi-class
prediction capability. Here we can predict the probability of multiple classes of
the target variable.
• Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes
classifiers are mostly used in text classification (due to better results in multi-class
problems and the independence assumption) and have a higher success rate
compared to other algorithms.
• They are widely used in spam filtering (identifying spam e-mail) and sentiment
analysis (in social media analysis, to identify positive and negative customer
sentiment).
• Recommendation Systems: A Naive Bayes classifier and collaborative filtering
together build a recommendation system that uses machine learning and data
mining techniques to filter unseen information and predict whether a user would
like a given resource or not.
18
Logistic Regression
"Regression analysis is a predictive modelling technique. It estimates the relationship
between a dependent (target) variable and an independent variable (predictor)."

• Logistic regression is a classification algorithm used to assign observations to a discrete set of classes.

• Unlike linear regression, which outputs continuous numeric values, logistic regression transforms its
output using the logistic sigmoid function to return a probability value, which can then be mapped to
two or more discrete classes.

19
Comparison of linear & logistic regression

Example: given data on time spent studying and exam scores, linear
regression and logistic regression can predict different things:
• Linear regression could help us predict the student's test score on a scale of
0 - 100. Linear regression predictions are continuous (numbers in a range).
• Logistic regression could help us predict whether the student passed or
failed. Logistic regression predictions are discrete (only specific values or
categories are allowed). We can also view the probability scores underlying the
model's classifications.

20
Types of logistic regression

•Binary (Pass/Fail)
•Multi (Cats, Dogs, Sheep)
•Ordinal (Low, Medium, High)

21
Comparison of linear & logistic regression

22
Sigmoid Activation

In order to map predicted values to probabilities, we use the sigmoid function.
The function maps any real value into another value between 0 and 1.
In machine learning, we use the sigmoid to map predictions to probabilities.

S(z) = 1 / (1 + e^(-z))

S(z) = output between 0 and 1 (probability estimate)
z = input to the function (your algorithm's prediction, e.g. mx + b)
e = base of the natural logarithm
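
A minimal sketch of the sigmoid in Python, where z stands for the linear prediction (e.g. m*x + b):

```python
import math

def sigmoid(z):
    # Squashes any real-valued input into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # 0.5
print(sigmoid(4))   # ~0.982
print(sigmoid(-4))  # ~0.018
```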

23
Decision Boundary

Our current prediction function returns a probability score between 0 and 1. In order
to map this to a discrete class (true/false, cat/dog), we select a threshold value or
tipping point above which we will classify values into class 1 and below which we
classify values into class 2.

p ≥ 0.5 → class = 1
p < 0.5 → class = 0

For example, if our threshold was .5 and our prediction function returned .7, we
would classify this observation as positive. If our prediction was .2 we would classify
the observation as negative. For logistic regression with multiple classes we could
select the class with the highest predicted probability.

24
Binary Logistic Regression

Say we’re given data on student exam results and our goal is to predict
whether a student will pass or fail based on number of hours slept and
hours spent studying. We have two features (hours slept, hours studied)
and two classes: passed (1) and failed (0).
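
A hedged sketch of this setup using scikit-learn (assumed available); the hours-slept/hours-studied values below are made up for illustration:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: [hours slept, hours studied] -> passed (1) / failed (0).
X = [[8, 1], [6, 2], [5, 5], [7, 4], [4, 1], [9, 6], [5, 0], [8, 5]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

model = LogisticRegression()
model.fit(X, y)

# Probability of passing for a student who slept 6 hours and studied 3 hours,
# followed by the 0.5-threshold class decision from the previous slide.
prob = model.predict_proba([[6, 3]])[0][1]
print(prob, 1 if prob >= 0.5 else 0)
```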

25
Support Vector Machines
• A support vector machine allows you to classify data that’s linearly
separable.
• If it isn’t linearly separable, use the kernel trick to make it work.
• However, for text classification it’s better to just stick to a linear
kernel.
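
A hedged sketch of a linear-kernel text classifier with scikit-learn (assumed available); the tiny corpus and labels are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: label 1 = sports, 0 = politics.
texts = [
    "the team won the football match",
    "parliament passed the new budget",
    "a thrilling cricket game last night",
    "the election results were announced",
]
labels = [1, 0, 1, 0]

# Linear kernel (LinearSVC), as suggested above for text classification.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["who won the match"]))
```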

26
Advantages

• Compared to newer algorithms like neural networks, SVMs have two
main advantages: higher speed and better performance with a limited
number of samples (in the thousands).

• This makes the algorithm very suitable for text classification
problems, where it is common to have access to a dataset of at most a
couple of thousand tagged samples.

28
Neural Networks

29
Neural Networks

• In 1943, Warren S. McCulloch, a neuroscientist, and Walter Pitts, a
logician, developed the first conceptual model of an artificial neural
network. In their paper, "A Logical Calculus of the Ideas Immanent in
Nervous Activity," they describe the concept of a neuron: a single cell
living in a network of cells that receives inputs, processes those inputs,
and generates an output.

30
Perceptron

• A perceptron has just 2 layers of nodes (input nodes and output nodes).
It is often called a single-layer network on account of having 1 layer of
links between input and output.

• The training of the perceptron consists of feeding it multiple training
samples and calculating the output for each of them. After each
sample, the weights w are adjusted so as to minimize the
output error, defined as the difference between the desired (target) and
the actual outputs. There are other error functions, like the mean
squared error, but the basic principle of training remains the same.
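
A minimal sketch of this training rule on a hypothetical linearly separable task (logical AND); the learning rate and epoch count are illustrative choices:

```python
def train_perceptron(samples, epochs=10, lr=0.1):
    # samples: list of (input_vector, target) pairs with targets in {0, 1}.
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            output = 1 if activation >= 0 else 0
            error = target - output  # desired minus actual output
            # Adjust weights in the direction that reduces the output error.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Hypothetical linearly separable task: logical AND of two inputs.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
print(w, b)
```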

31
• The single-perceptron approach to deep learning has one major
drawback: it can only learn linearly separable functions.

• To address this problem, we need a multilayer perceptron,
also known as a feedforward neural network: in effect, we compose a
number of these perceptrons together to create a more powerful
mechanism for learning.

32
Neural Networks

33
Activation Functions

40
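
The activation-function plots on the following slides are not reproduced here; as a stand-in, a hedged sketch of three commonly used activation functions:

```python
import math

def sigmoid(z):
    # Outputs in (0, 1); historically common but prone to vanishing gradients.
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Zero-centred variant of the sigmoid; outputs in (-1, 1).
    return math.tanh(z)

def relu(z):
    # Rectified Linear Unit: cheap to compute and widely used in deep networks.
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```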
The Problem with Large Networks
A neural network can have more than one hidden layer: in that case, the higher layers are "building" new
abstractions on top of the previous layers. And, as we mentioned before, you can often learn better in practice
with larger networks.
However, increasing the number of hidden layers leads to two known issues:

• Vanishing gradients: as we add more and more hidden layers, backpropagation
becomes less and less useful in passing information to the lower layers. In effect, as
information is passed back, the gradients begin to vanish and become small relative to the
weights of the network.

• Overfitting: perhaps the central problem in machine learning. Briefly, overfitting
describes the phenomenon of fitting the training data too closely, maybe with hypotheses
that are too complex. In such a case, your learner ends up fitting the training data really well,
but will perform much more poorly on real examples.

48
Decision Trees
• Decision tree learning uses a decision tree (as a predictive model) to go
from observations about an item (represented in the branches) to
conclusions about the item's target value (represented in the leaves).

• It is one of the predictive modelling approaches used in statistics, data
mining and machine learning.

• Tree models where the target variable can take a discrete set of values
are called classification trees; in these tree structures, leaves represent
class labels and branches represent conjunctions of features that lead to
those class labels.

• Decision trees where the target variable can take continuous values
(typically real numbers) are called regression trees.

49
Decision trees used in machine learning are of two main types:

• Classification tree analysis is when the predicted outcome is the (discrete)
class to which the data belongs.
• Regression tree analysis is when the predicted outcome can be considered a
real number (e.g. the price of a house, or a patient's length of stay in a
hospital).
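
A hedged sketch of both tree types with scikit-learn (assumed available); the data sets are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: discrete target (hypothetical pass/fail data).
X_cls = [[8, 1], [6, 2], [5, 5], [7, 4], [4, 1], [9, 6]]
y_cls = [0, 0, 1, 1, 0, 1]
clf = DecisionTreeClassifier(max_depth=3).fit(X_cls, y_cls)
print(clf.predict([[6, 3]]))

# Regression tree: continuous target (hypothetical house sizes vs. prices).
X_reg = [[50], [80], [120], [160], [200]]
y_reg = [100.0, 150.0, 210.0, 280.0, 340.0]
reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
print(reg.predict([[140]]))
```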

50
A tree showing survival of
passengers on the Titanic ("sibsp"
is the number of spouses or
siblings aboard). The figures
under the leaves show the
probability of survival and the
percentage of observations in the
leaf. Summarizing: Your chances
of survival were good if you were
(i) a female or (ii) a male younger
than 9.5 years with less than 2.5
siblings.

51
Decision trees have various advantages:
• Simple to understand and interpret.
• Able to handle both numerical and categorical data. Other techniques are usually specialized in analysing
datasets that have only one type of variable. (For example, relation rules can be used only with nominal
variables, while neural networks can be used only with numerical variables or categoricals converted to 0-1 values.)
• Require little data preparation. Other techniques often require data normalization. Since trees can handle
qualitative predictors, there is no need to create dummy variables.
• Use a white-box model. If a given situation is observable in a model, the explanation for the condition is easily
expressed in Boolean logic. It is possible to validate a model using statistical tests.
• A non-statistical approach that makes no assumptions about the training data or prediction residuals; e.g., no
distributional, independence, or constant-variance assumptions.
• Perform well with large datasets.
• Mirror human decision making more closely than other approaches.
• Robust against collinearity, particularly with boosting.
• Built-in feature selection: additional irrelevant features will be used less, so they can be removed on subsequent
runs.
• Decision trees can approximate any Boolean function, e.g. XOR.

52
Limitations
• Trees can be very non-robust. A small change in the training data can
result in a large change in the tree and consequently the final
predictions.
• The problem of learning an optimal decision tree is known to be NP-
complete under several aspects of optimality and even for simple
concepts.
• Decision-tree learners can create over-complex trees that do not
generalize well from the training data (overfitting). Mechanisms such
as pruning are necessary to avoid this problem.
• For data including categorical variables with different numbers of
levels, information gain in decision trees is biased in favor of
attributes with more levels.

53
Learning Theory: Bias and Variance
• In statistics and machine learning, the bias–variance tradeoff is the property of a set of
predictive models whereby models with a lower bias in parameter estimation have a
higher variance of the parameter estimates across samples, and vice versa.

• The bias–variance dilemma or problem is the conflict in trying to simultaneously
minimize these two sources of error that prevent supervised learning algorithms from
generalizing beyond their training set.

54
• The bias is an error from erroneous assumptions in the learning algorithm.
High bias can cause an algorithm to miss the relevant relations between
features and target outputs (underfitting).

• The variance is an error from sensitivity to small fluctuations in the training
set. High variance can cause an algorithm to model the random noise in the
training data, rather than the intended outputs (overfitting).

55
Bias

• Models with low bias are usually more complex (e.g.
higher-order regression polynomials), enabling them to
represent the training set more accurately. In the process,
however, they may also represent a large noise component
in the training set, making their predictions less accurate
despite their added complexity. In contrast, models with
higher bias tend to be relatively simple (low-order or even
linear regression polynomials) but may produce lower-variance
predictions when applied beyond the training set.

56
Variance

• High-variance learning methods may be able to represent
their training set well but are at risk of overfitting to noisy
or unrepresentative training data. In contrast, algorithms
with low variance typically produce simpler models that
don't tend to overfit but may underfit their training data,
failing to capture important regularities.

57
Approaches to reduce the bias–variance trade-off problem

• Dimensionality reduction and feature selection can decrease variance by simplifying models. Similarly,
a larger training set tends to decrease variance. Adding features (predictors) tends to decrease bias, at
the expense of introducing additional variance. Learning algorithms typically have some tunable
parameters that control bias and variance; for example:
• Linear and generalized linear models can be regularized to decrease their variance at the cost of
increasing their bias.
• In artificial neural networks, the variance increases and the bias decreases as the number of hidden
units increases.
• In k-nearest neighbour models, a high value of k leads to high bias and low variance (see the sketch
after this list).
• In instance-based learning, regularization can be achieved by varying the mixture of prototypes and
exemplars.
• In decision trees, the depth of the tree determines the variance. Decision trees are commonly pruned to
control variance.
• One way of resolving the trade-off is to use mixture models and ensemble learning.
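
As a hedged illustration of the k-nearest-neighbour point above (small k: low bias, high variance; large k: high bias, low variance), using scikit-learn and made-up noisy data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Made-up noisy two-class data.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] + 0.5 * rng.randn(200) > 0).astype(int)

for k in (1, 5, 25, 75):
    model = KNeighborsClassifier(n_neighbors=k)
    train_acc = model.fit(X, y).score(X, y)
    cv_acc = cross_val_score(model, X, y, cv=5).mean()
    # Small k: near-perfect training accuracy but more sensitivity to the data (variance).
    # Large k: smoother, higher-bias decision boundary.
    print(k, round(train_acc, 2), round(cv_acc, 2))
```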

58
Probably Approximately Correct (PAC)
Learning

59
Computational learning theory

Intersection of AI, statistics, and computational theory.

Introduces Probably Approximately Correct (PAC) learning, which concerns efficient
learning.

For our learning procedures we would like to prove that:
With high probability an (efficient) learning algorithm will find a hypothesis that
is approximately identical to the hidden target concept.

Note the double "hedging": probably and approximately.

Why do we need both levels of uncertainty (in general)?
Probably Approximately Correct Learning

Underlying principle:

Seriously wrong hypotheses can be found out almost certainly
(with high probability) using a "small" number of examples.

– Any hypothesis that is consistent with a sufficiently large
set of training examples is unlikely to be seriously wrong: it
must be probably approximately correct.

– Any (efficient) algorithm that returns hypotheses that are
PAC is called a PAC-learning algorithm.
Probably Approximately Correct Learning

How many examples are needed to guarantee correctness?

– Sample complexity (the number of examples needed to "guarantee" correctness)
grows with the size of the hypothesis space.

– Stationarity assumption: the training set and test sets are drawn
from the same distribution.
Notations
– X: set of all possible examples
– D: distribution from which examples are drawn
– H: set of all possible hypotheses
– N: the number of examples in the training set
– f: the true function to be learned

Assume: the true function f is in H.

Error of a hypothesis h with respect to f: the probability that h differs from f on a
randomly picked example:

error(h) = P(h(x) ≠ f(x) | x drawn from D)

This is exactly what we are trying to measure with our test set.
Approximately Correct

A hypothesis h is approximately correct if:

error(h) ≤ ε,

where ε is a given threshold (a small constant).

Goal:
Show that after seeing a small (polynomial) number of examples N, with
high probability, all consistent hypotheses will be approximately correct.

I.e., the chance of a "bad" hypothesis (high error but consistent with the examples) is
small (i.e., less than δ).
Approximately Correct

Approximately correct hypotheses lie inside the ε-ball around f;
those hypotheses that are seriously wrong (h_bad ∈ H_bad) are outside the ε-ball:

error(h_bad) = P(h_bad(x) ≠ f(x) | x drawn from D) > ε

Thus the probability that h_bad (a seriously wrong hypothesis) disagrees with one
example is at least ε (definition of error).
Thus the probability that h_bad (a seriously wrong hypothesis) agrees with one
example is no more than (1 − ε).

So for N examples, P(h_bad agrees with N examples) ≤ (1 − ε)^N.
Approximately Correct Hypothesis

The probability that H_bad contains at least one consistent hypothesis is
bounded by the sum of the individual probabilities:

P(H_bad contains a consistent hypothesis, agreeing with all the examples)
≤ |H_bad| (1 − ε)^N ≤ |H| (1 − ε)^N

(using the fact that h_bad agrees with one example with probability no more than (1 − ε)).
P(H_bad contains a consistent hypothesis) ≤ |H_bad| (1 − ε)^N ≤ |H| (1 − ε)^N

Goal: bound the probability of learning a bad hypothesis below some small number δ:

|H| (1 − ε)^N ≤ δ

Since (1 − ε) ≤ e^(−ε), this is satisfied if

N ≥ (1/ε) (ln(1/δ) + ln |H|)

This N is the sample complexity: the number of examples needed to guarantee a PAC-learnable
function class. If the learning algorithm returns a hypothesis that is consistent with this many
examples, then with probability at least (1 − δ) the learning algorithm has an error of at most ε,
and the hypothesis is Probably Approximately Correct. The more accuracy (smaller ε) and the
more certainty (smaller δ) one wants, the more examples one needs.

Probably Approximately Correct hypothesis h:
– The probability of a small error (error(h) ≤ ε) is greater than or equal to a given threshold 1 − δ.
– A bound on the number of examples (sample complexity) needed to guarantee PAC is polynomial.
– There is an efficient learning algorithm.

Theoretical results apply to fairly simple learning models (e.g., decision list learning)
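
A minimal sketch of the sample-complexity bound N ≥ (1/ε)(ln(1/δ) + ln |H|), evaluated for the Boolean-function hypothesis space discussed below (|H| = 2^(2^n)):

```python
import math

def sample_complexity(epsilon, delta, ln_hypothesis_space_size):
    # N >= (1/epsilon) * (ln(1/delta) + ln|H|); ln|H| is passed in directly
    # so that huge hypothesis spaces do not overflow.
    return (1.0 / epsilon) * (math.log(1.0 / delta) + ln_hypothesis_space_size)

# All Boolean functions on n attributes: |H| = 2^(2^n), so ln|H| = (2^n) * ln 2.
for n in (5, 10, 20):
    ln_H = (2 ** n) * math.log(2)
    n_examples = sample_complexity(epsilon=0.1, delta=0.05, ln_hypothesis_space_size=ln_H)
    print(n, round(n_examples))  # grows roughly as 2^n
```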

PAC Learning

Two steps:

1. Sample complexity: a polynomial number of examples suffices to specify a good consistent
hypothesis (error(h) ≤ ε) with high probability (1 − δ).

2. Computational complexity: there is an efficient algorithm for learning a consistent hypothesis
from the small sample.

Let's be more specific with examples.
Example: Boolean Functions

Consider H, the set of all Boolean functions on n attributes: |H| = 2^(2^n)

N ≥ (1/ε) (ln(1/δ) + ln |H|) = O(2^n)

So the sample complexity grows as 2^n
(the same as the number of all possible examples):
not PAC-learnable!
So any learning algorithm will do no better than a lookup table
if it merely returns a hypothesis that is consistent with all known
examples!

Intuitively, what does this say about H?
About learning in general?
Coping With Learning Complexity

1. Force the learning algorithm to look for the smallest consistent hypothesis.

We considered this for decision tree learning; it is often worst-case
intractable, though.

2. Restrict the size of the hypothesis space.

e.g., decision lists: a restricted form of Boolean functions.
Hypotheses correspond to a series of tests, each of which is a
conjunction of literals.

Good news: only a polynomial number of examples
is required to guarantee PAC learning of k-DL functions,
and there are efficient algorithms for learning k-DL.
