Lec1 PerceptronPocket Recap

Here is how the figure would look for a perceptron:
- There would be a straight decision boundary line instead of a curve, since the perceptron can only learn linearly separable data
- The data points would lie on either side of the decision boundary line
- Data points of one class would be on one side of the line, and data points of the other class on the other side
- The perceptron would make predictions by checking on which side of the decision boundary line a new data point falls
- For data points exactly on the line, the perceptron's prediction could be determined by the sign convention for the linear combination of features
In summary, the figure would show linearly separable data with a straight decision boundary line.


ML – Recap

Perceptron
10th Oct 2022
We infer a rule through instances…….

This is a typical “Prediction” Problem


Training samples : Tuple (Pattern, class label)
(Figure: training samples plotted as two groups, Class 1 and Class 2)

We infer a rule through instances…….

What about this ????

This is a typical “Classification” Problem
Test sample : Only pattern is given
We need to complete the Tuple (Pattern, ??class label??)
Extending the problem
Training samples : Tuple (Pattern, class label)
(Figure: training images labelled CAR and PLANE; test images labelled ??)

Test sample : Only pattern is given
We need to complete the Tuple (Pattern, ??class label??)


Essence of Learning
(From Prof. Mostafa’s slides)

• There has to be a pattern


• Pattern may/may not be captured in a mathematical
expression
• We should have data on it
Question
(From Prof. Mostafa’s slides)
From Data -----> To Features

Class 1 : Sample 1, Sample 2, Sample 3
Class 2 : Sample 4, Sample 5, Sample 6

Let’s define the Feature as : Number of black boxes
(Plot: number of black boxes — 1 or 2 — against training sample number 1–6)
Feature space
• Feature extractor : Mapping from Data to Feature Space
• Feature Extractor : Data -> (Feature1,Feature2,..)

• What is the advantage ?


• In this case, if the image is of size 10 X 100, we need 1000
pixels to represent every Data point
• But using Feature Extraction, each Data point is now
represented by a smaller set of numbers (Reals/Integers)
• Features should get us closer towards discovering inherent
patterns
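
A minimal sketch of such a feature extractor, assuming the data point is a binary NumPy array with black boxes encoded as 1s (the encoding and function name are illustrative, not from the slides):

```python
import numpy as np

def extract_features(image):
    """Map a raw binary image (e.g. 10 x 100 = 1000 pixels) to a much smaller
    feature vector -- here a single number, the count of 'black' pixels."""
    # Assumption: black boxes are encoded as 1s, background as 0s.
    return np.array([int(np.sum(image == 1))])

image = np.zeros((10, 100), dtype=int)   # a 1000-pixel data point
image[0, :2] = 1                         # two "black boxes"
print(extract_features(image))           # -> [2]
```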
Feature space
Extract “good” Features from the Given Data
Each Sample instance is mapped to a point in the
Feature-space
Sample Instance = {Feature1, Feature2, Feature3,….}

For an unseen test sample, what do we do ?


Compute the number of black boxes:
If 1, then Class 2
If 2, then Class 1
(Plot: number of black boxes — 1 or 2 — against training sample number 1–6)
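
A tiny sketch of this decision rule (the function name is illustrative; the rule is exactly the one read off the plot above):

```python
def classify(num_black_boxes):
    """Decision rule read off the training plot above."""
    return "Class 1" if num_black_boxes == 2 else "Class 2"

print(classify(2))   # Class 1
print(classify(1))   # Class 2
```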
Data separability in Feature-space
(Figure: two scatter plots in the Feature1–Feature2 plane — one linearly separable, one non-linearly separable)

Which kind of data is easier to work with ??


Basic Learning Premise

Training samples are drawn Independently of each other

All Training samples come from the same underlying process

Training samples are all Independent, Identically Distributed (IID) samples
Learning scenarios
1. Supervised Learning

2. Unsupervised Learning

3. Any more ???


Supervised Learning - Scenario
Linear Models

From the Hypothesis set, we are choosing only those that are Linear !!
Linear Models

Examples of Linear Models :


Linear Regression, Logistic Regression, Perceptron,
SVM, LDA….

When do you choose which model ?


What are the differences between them ?
Linear Regression
(Real-valued output)
Linear Regression - Objective
Cost function

Please Remember :
Training Error = In-sample error
Testing Error = Out-of-sample error
Cost function contribution

Notice : All data points that are not on the line contribute to the cost
function
Cost function - Rewriting

Notice : All data points x1, x2…..xN contribute to the cost function
Vector calculus - Hints

Given that w is a vector and U is a matrix (or vector) of compatible size:

(1) d/dw (U^T w) = U
(2) d/dw (w^T U w) = 2Uw   (for symmetric U; in general (U + U^T)w)
Error measure : J = ||y - Xw||^2

Expanding the norm ||y - Xw||^2 :

J = (y - Xw)^T (y - Xw)
  = y^T y - (Xw)^T y - y^T Xw + (Xw)^T (Xw)
  = y^T y - 2 y^T X w + w^T X^T X w

Setting dJ/dw = 0 :

-2 X^T y + 2 X^T X w = 0
X^T X w = X^T y
w = (X^T X)^{-1} X^T y

Hence w = X† y, where X† = (X^T X)^{-1} X^T is the pseudo-inverse of X.
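
A minimal NumPy sketch of this closed-form solution on synthetic data (the data and variable names are illustrative); `np.linalg.pinv` and `np.linalg.lstsq` are numerically safer than forming (X^T X)^{-1} explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # N = 100 samples, d = 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

# Closed form: w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically safer equivalents
w_pinv  = np.linalg.pinv(X) @ y                  # explicit pseudo-inverse
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares solver

print(w, w_pinv, w_lstsq)                 # all three agree (up to round-off)
```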


Pseudo-inverse Dimensions

Think : What if “d” was a large number ?!?


Cost function

• Can we think of any other way of quantifying error ?

• How about the 1-norm of the deviation ? (compared in the sketch after this list)

• How will we know when to choose the 1-norm and when to choose the 2-norm ?
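
A quick sketch contrasting the two error measures on one residual vector (the numbers are illustrative): the squared 2-norm is dominated by a single outlier, which is one common reason to prefer the 1-norm when the data contains outliers:

```python
import numpy as np

residuals = np.array([0.1, -0.2, 0.1, 5.0])   # one large outlier

l1_cost = np.sum(np.abs(residuals))    # 1-norm of the deviation: 5.4
l2_cost = np.sum(residuals ** 2)       # squared 2-norm: 25.06, dominated by the outlier
print(l1_cost, l2_cost)
```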
Logistic Regression
Linear Models – Contd…

Note : The logistic function behaves “linearly” in a small interval around 0; at the tails it saturates, behaving like a hard threshold (signum-like) function.
Logistic (Sigmoid) function

θ(s) = 1 / (1 + e^{-s}),   where s = w^T x is the linear score
Logistic (Sigmoid) function
• Maps real line to [0,1]
• Can be used to model posterior probability i.e P(C | x)
• Final goal: Feature as input and output as posterior probability,
using the sigmoid model
θ(w^T x) = 1 / (1 + e^{-w^T x})

Step 1: Take linear combination of features (similar to Linear Regression)


Step 2: Apply sigmoid function
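
A minimal sketch of these two steps, assuming w already contains the learnt weights (the numerical values are illustrative):

```python
import numpy as np

def sigmoid(s):
    """theta(s) = 1 / (1 + e^{-s}); maps the real line into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def posterior(w, x):
    """Step 1: linear combination s = w^T x.  Step 2: apply the sigmoid.
    The output is read as P(C = 1 | x)."""
    return sigmoid(w @ x)

w = np.array([0.5, -1.0, 2.0])   # illustrative learnt weights
x = np.array([1.0, 0.3, 0.7])    # illustrative feature vector
print(posterior(w, x))            # a value in (0, 1), here about 0.83
```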
Logistic Regression
θ(s) = 1 / (1 + e^{-s}),   where s = w^T x

What happens when -----

Risk score is high ?? s is large and positive; θ(s) -> 1
Risk score is low ?? s is large and negative; θ(s) -> 0
MLE

For Linear Regression we got a closed-form solution !!

For the Logistic Regression likelihood, a closed-form solution is not possible.

Way ahead : Iterative solution


Gradient Descent
Initialize w as w(t)

Differentiate Error wrt w :
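
A minimal gradient-descent sketch, assuming the error being differentiated is the negative log-likelihood (cross-entropy) with labels in {0, 1}; the learning rate, iteration count, and toy data are illustrative, not prescribed by the slides:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Gradient descent on the negative log-likelihood (cross-entropy),
    assuming labels y in {0, 1}.  Gradient: X^T (theta(Xw) - y) / N."""
    n, d = X.shape
    w = np.zeros(d)                            # initialize the weights
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ w) - y) / n  # derivative of the error w.r.t. w
        w -= lr * grad                         # step against the gradient
    return w

# Illustrative toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print(fit_logistic(X, y))
```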


Logistic Regression

• Classifier
• Discriminative model (vs Generative model)
• Parameters – feature weights
• Estimation – ML estimation
• Gradient Descent (Iterative Method)

Summary : Maximum likelihood estimation of “w”, assuming that the observed training set was generated by a binomial model
Other Linear Models..
Same old Example
Perceptron – Why?

• Guaranteed to converge to a separating boundary if the data is linearly separable

• Links to Support Vector Machines (SVM) and Neural Networks (NN)
Perceptron
Linearly separable data
Perceptron -Wiki
• Invented in 1957 by Frank Rosenblatt

• In 1969, in their famous book “Perceptrons”, Minsky & Papert showed that it is impossible for a (single-layer) perceptron to learn the XOR function (see the sketch below).
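
A small sketch illustrating this: running the PLA update on the four XOR points never reaches zero error, since no line separates them (at best 3 of the 4 points can be classified correctly):

```python
import numpy as np

# The four XOR points and their labels in {-1, +1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

Xb = np.hstack([np.ones((4, 1)), X])       # prepend a bias coordinate
w = np.zeros(3)
best = 0                                   # best number of points ever classified correctly
for _ in range(1000):                      # PLA keeps cycling on XOR
    preds = np.sign(Xb @ w)
    wrong = np.where(preds != y)[0]
    best = max(best, 4 - len(wrong))
    if len(wrong) == 0:                    # never happens for XOR
        break
    w += y[wrong[0]] * Xb[wrong[0]]        # standard PLA update on one mistake

print(best)   # at most 3 out of 4 -- no line separates XOR
```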
To Do

• Perceptron on MNIST Data
• Take 2 classes at a time and check if 2 confusing classes such as “1” and “7” can be classified using the Perceptron
• Is the number of iterations taken dependent on the initialization ?
Perceptron - MNIST
Feature Space
PLA - Output
Pocket - Version
PLA Vs Pocket
To Do

• Perceptron on MNIST Data
• Think of a suitable feature set. Take 2 classes at a time and check if 2 confusing classes such as “1” and “7” can be classified using the Perceptron (see the sketch after this list)
• What is the cost function we are minimizing in the Perceptron ?
• Apply Gradient Descent to this cost function
• Prove : Convergence of the Perceptron
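
One possible setup for the MNIST exercise, a sketch using scikit-learn's Perceptron on raw pixels (a designed feature set, as the slide suggests, would replace the raw pixels); the dataset loader and hyperparameters here are illustrative choices, not prescribed by the slides:

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Load MNIST and keep only the two "confusing" classes "1" and "7"
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target                # y holds string labels "0".."9"
mask = (y == "1") | (y == "7")
X, y = X[mask] / 255.0, y[mask]                # raw pixels as features, scaled to [0, 1]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = Perceptron(max_iter=100, tol=None, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```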
PLA aims to do..
• We have blue points and gray points (two classes)

• Assume we start with the dotted line, x1 + x2 - 0.5 = 0; that line classifies two points wrongly, one blue point and one gray point
PLA
The perceptron learns a line which separates the points correctly:
1.42 x1 + 0.51 x2 - 0.5 = 0

This line has zero training error.
PLA
Cost function

Gradient Descent

Iteration in PLA
Weight correction (blue to black)
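
A minimal sketch of the PLA weight correction together with a pocket copy (keep the best weights seen so far); the toy data, starting weights, and iteration cap are illustrative:

```python
import numpy as np

def pla_pocket(X, y, w0, max_iters=1000):
    """Perceptron learning with a 'pocket': keep the best weights seen so far.
    X already has a leading column of 1s; labels y are in {-1, +1}."""
    w = w0.copy()
    pocket_w, pocket_errs = w.copy(), np.inf
    for _ in range(max_iters):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if len(wrong) < pocket_errs:          # better than the pocket copy? keep it
            pocket_w, pocket_errs = w.copy(), len(wrong)
        if len(wrong) == 0:                   # zero training error: PLA has converged
            break
        i = wrong[0]                          # any misclassified point will do
        w = w + y[i] * X[i]                   # PLA weight correction
    return pocket_w

# Toy separable data in the spirit of the slide's figure
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(50, 2))
labels = np.where(pts[:, 0] + pts[:, 1] - 0.9 > 0, 1, -1)
Xb = np.hstack([np.ones((50, 1)), pts])
w0 = np.array([-0.5, 1.0, 1.0])               # the slide's starting line x1 + x2 - 0.5 = 0
print(pla_pocket(Xb, labels, w0))             # final (pocket) weights
```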
Perceptron inference
• Given the straight line we have learnt, say x1 - x2 = 0, the inference is as follows :
• If x1 - x2 > 0, then RED class
• If x1 - x2 < 0, then BLUE class
(Figure: the x1–x2 plane split by the line x1 - x2 = 0, with the region x1 - x2 > 0 labelled RED and x1 - x2 < 0 labelled BLUE)
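
A tiny sketch of this inference rule for the learnt line x1 - x2 = 0 (the tie-break for points exactly on the line is an assumption):

```python
def perceptron_predict(x1, x2):
    """Inference with the learnt line x1 - x2 = 0."""
    s = x1 - x2
    if s > 0:
        return "RED"
    if s < 0:
        return "BLUE"
    return "RED"  # assumed tie-break for points exactly on the line

print(perceptron_predict(2.0, 1.0))   # RED
print(perceptron_predict(1.0, 3.0))   # BLUE
```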
Linear Models

How does this figure look for perceptron ?
