
lec22-ML III

The document includes announcements for a project due date and a guest lecture on large model development. It covers topics related to logistic regression and neural networks, detailing the perceptron model, learning rules for binary and multiclass perceptrons, and introduces logistic regression as a probabilistic approach. Additionally, it discusses the properties, challenges, and improvements of perceptrons, as well as the application of deep neural networks for classification tasks.


Announcements

§ Project 4 due today (Thursday, Nov 14) at 11:59pm PT

§ Catherine Olsson (Anthropic) giving guest lecture next Tuesday
  (Nov 19) on large model development and interpretability
  § Come in person and ask questions!
CS 188: Artificial Intelligence
Logistic Regression and Neural Networks

[These slides were created by Dan Klein, Pieter Abbeel, Anca Dragan, Sergey Levine. All CS188 materials are at http://ai.berkeley.edu.]
Last Time: Perceptron

§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation:  activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
§ If the activation is:
  § Positive, output +1
  § Negative, output -1

[Diagram: inputs f1, f2, f3 weighted by w1, w2, w3, summed (Σ), then thresholded (>0?)]
Last Time: Perceptron
§ Originated from computationally modeling neurons
[Image: biological neuron alongside the perceptron diagram above]
Binary Decision Rule
§ In the space of feature vectors
  § Examples are points
  § Any weight vector is a hyperplane
  § One side corresponds to Y = +1
  § Other corresponds to Y = -1

Example weights:  BIAS : -3,  free : 4,  money : 2,  ...
[Plot: axes "free" and "money"; the weight vector's hyperplane separates +1 = SPAM from -1 = HAM]
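As a quick concrete check of this decision rule, here is a minimal Python sketch using the example weights from the slide (BIAS: -3, free: 4, money: 2); the feature counts for the two emails are made up for illustration.

```python
# Binary perceptron decision rule with the slide's example weights.
weights = {"BIAS": -3.0, "free": 4.0, "money": 2.0}

def classify(features):
    """Return +1 (SPAM) if the activation w . f(x) is positive, else -1 (HAM)."""
    activation = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return +1 if activation > 0 else -1

print(classify({"BIAS": 1, "free": 2, "money": 2}))  # -3 + 4*2 + 2*2 = 9 > 0  -> +1 (SPAM)
print(classify({"BIAS": 1}))                         # -3 < 0                  -> -1 (HAM)
```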
Learning: Binary Perceptron
§ Start with weights w = 0
§ For each training instance f(x), y*:
  § Classify with current weights:  y = +1 if w · f(x) > 0, else y = -1
  § If correct (i.e., y = y*): no change!
  § If wrong: adjust the weight vector by adding or subtracting the feature
    vector. Subtract if y* is -1:  w ← w + y* · f(x)
Learning: Binary Perceptron
§ Start with weights w = 0
§ For each training instance f(x), y*:
  § Classify with current weights
  § If correct (i.e., y = y*): no change!
  § If wrong: adjust the weight vector by adding or subtracting the feature
    vector. Subtract if y* is -1.

  Before update:  score = w · f
  After update:   score = (w + y* · f) · f  =  w · f  +  y* · (f · f)

Since f · f ≥ 0, the update always moves the score in the direction of the correct label y*.
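A minimal sketch of the binary perceptron learning rule above, in Python; the function names and the tiny [BIAS, free, money] data set are illustrative, not from the slides.

```python
def predict(w, f):
    """Classify with current weights: +1 if the activation is positive, else -1."""
    activation = sum(wi * fi for wi, fi in zip(w, f))
    return +1 if activation > 0 else -1

def train_perceptron(data, num_features, passes=10):
    """data: list of (f(x), y*) pairs with y* in {+1, -1}."""
    w = [0.0] * num_features                   # start with weights w = 0
    for _ in range(passes):
        for f, y_star in data:
            if predict(w, f) != y_star:        # if wrong: w <- w + y* . f(x)
                w = [wi + y_star * fi for wi, fi in zip(w, f)]
    return w

data = [([1, 2, 1], +1), ([1, 0, 0], -1)]      # [BIAS, free, money] toy emails
print(train_perceptron(data, num_features=3))  # weights that separate the toy data
```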
Multiclass Decision Rule

§ If we have multiple classes:
  § A weight vector for each class:  w_y
  § Score (activation) of a class y:  w_y · f(x)
  § Prediction: highest score wins,  y = argmax_y  w_y · f(x)

Binary = multiclass where the negative class has weight zero


Learning: Multiclass Perceptron

§ Start with all weights = 0

§ Pick up training examples f(x), y* one by one
  § Predict with current weights:  y = argmax_y  w_y · f(x)
  § If correct: no change!
  § If wrong: lower the score of the wrong answer, raise the score of the right answer:
      w_y  ← w_y  − f(x)     (wrong predicted class)
      w_y* ← w_y* + f(x)     (true class)
Learning: Multiclass Perceptron
§ Start with all weights = 0
§ Pick up training examples f(x), y* one by one
  § Predict with current weights
  § If correct: no change!
  § If wrong: lower score of wrong answer, raise score of right answer

  Score of wrong class y:   before:  w_y · f      after:  (w_y − f) · f   =  w_y · f   −  f · f
  Score of right class y*:  before:  w_y* · f     after:  (w_y* + f) · f  =  w_y* · f  +  f · f
Example: Multiclass Perceptron
Features: [BIAS, win, game, vote, the]

  Iteration 0:  x: "win the vote"       f(x): [1 1 0 1 1]   y*: politics
  Iteration 1:  x: "win the election"   f(x): [1 1 0 0 1]   y*: politics
  Iteration 2:  x: "win the game"       f(x): [1 1 1 0 1]   y*: sports

Weight vectors (initial value, then after iterations 0, 1, 2):

           w_sports           w_politics         w_(other class)
  BIAS     1   0   0   1      0   1   1   0      0   0   0   0
  win      0  -1  -1   0      0   1   1   0      0   0   0   0
  game     0   0   0   1      0   0   0  -1      0   0   0   0
  vote     0  -1  -1  -1      0   1   1   1      0   0   0   0
  the      0  -1  -1   0      0   1   1   0      0   0   0   0

  w · f(x) at iterations 0, 1, 2 (using the weights before each update):
           1  -2  -2           0   3   3          0   0   0
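The trace above can be reproduced with a short multiclass perceptron sketch; the third class is never named on the slide, so "other" below is just a placeholder.

```python
# Multiclass perceptron run on the slide's three examples
# (features: [BIAS, win, game, vote, the]; the sports BIAS weight starts at 1).
examples = [
    ([1, 1, 0, 1, 1], "politics"),   # "win the vote"
    ([1, 1, 0, 0, 1], "politics"),   # "win the election"
    ([1, 1, 1, 0, 1], "sports"),     # "win the game"
]
weights = {
    "sports":   [1, 0, 0, 0, 0],
    "politics": [0, 0, 0, 0, 0],
    "other":    [0, 0, 0, 0, 0],
}

def score(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

for f, y_star in examples:
    y_pred = max(weights, key=lambda y: score(weights[y], f))   # highest score wins
    if y_pred != y_star:
        weights[y_pred] = [wi - fi for wi, fi in zip(weights[y_pred], f)]  # lower wrong class
        weights[y_star] = [wi + fi for wi, fi in zip(weights[y_star], f)]  # raise right class

print(weights)  # matches the last column of the table: sports [1,0,1,-1,0], politics [0,0,-1,1,0]
```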
Properties of Perceptrons
§ Separability: true if some parameters get the training set perfectly correct

§ Convergence: if the training set is separable, the perceptron will eventually converge (binary case)

§ Mistake Bound: the maximum number of mistakes (binary case) is related to the margin or degree of separability:

      # of mistakes during training  <  (# of features) / (width of margin)²

[Figures: a separable training set and a non-separable training set]
Problems with the Perceptron

§ Noise: if the data isn't separable, weights might thrash
  § Averaging weight vectors over time can help (averaged perceptron); see the sketch below

§ Mediocre generalization: finds a "barely" separating solution

§ Overtraining: test / held-out accuracy usually rises, then falls
  § Overtraining is a kind of overfitting
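As a hedged sketch of the averaged perceptron mentioned above: train exactly like the binary perceptron, but return the average of all intermediate weight vectors rather than the final one. The function name and toy data are illustrative, not from the slides.

```python
def train_averaged_perceptron(data, num_features, passes=10):
    """data: list of (f(x), y*) pairs with y* in {+1, -1}."""
    w = [0.0] * num_features
    total = [0.0] * num_features              # running sum of weight vectors
    count = 0
    for _ in range(passes):
        for f, y_star in data:
            activation = sum(wi * fi for wi, fi in zip(w, f))
            if (+1 if activation > 0 else -1) != y_star:
                w = [wi + y_star * fi for wi, fi in zip(w, f)]
            total = [ti + wi for ti, wi in zip(total, w)]
            count += 1
    return [ti / count for ti in total]       # averaging dampens the thrashing

data = [([1, 2, 1], +1), ([1, 0, 0], -1)]
print(train_averaged_perceptron(data, num_features=3))
```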
Improving the Perceptron
Non-Separable Case: Deterministic Decision
Even the best linear boundary makes at least one mistake
Non-Separable Case: Probabilistic Decision
[Figure: the same non-separable data with probability bands 0.9 | 0.1, 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7, 0.1 | 0.9 instead of a hard boundary]
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z = w · f(x) is very positive → want probability of + going to 1
§ If z = w · f(x) is very negative → want probability of + going to 0

[Diagram: weight vector w and the regions z > 0, z = 0, z < 0]
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z = w · f(x) is very positive → want probability of + going to 1
§ If z = w · f(x) is very negative → want probability of + going to 0

§ Sigmoid function:

    φ(z) = 1 / (1 + e^(−z)) = e^z / (e^z + 1)
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z = w · f(x) is very positive → want probability of + going to 1
§ If z = w · f(x) is very negative → want probability of + going to 0

§ Sigmoid function:  φ(z) = 1 / (1 + e^(−z))

    P(y = +1 | x; w) = 1 / (1 + e^(−w · f(x)))
    P(y = −1 | x; w) = 1 − 1 / (1 + e^(−w · f(x)))

= Logistic Regression
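A minimal Python sketch of this probabilistic decision; the numerically stable branch in sigmoid() and the example weights are assumptions for illustration.

```python
import math

def sigmoid(z):
    """phi(z) = 1 / (1 + e^(-z)), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (ez + 1.0)

def prob_positive(w, f):
    """P(y = +1 | x; w) = phi(w . f(x))."""
    z = sum(wi * fi for wi, fi in zip(w, f))
    return sigmoid(z)

w = [-3.0, 4.0, 2.0]               # [BIAS, free, money] example weights
f = [1.0, 2.0, 2.0]                # a "free money"-heavy email: z = 9
print(prob_positive(w, f))         # close to 1: very likely spam
print(1.0 - prob_positive(w, f))   # P(y = -1 | x; w)
```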
A 1D Example
P(red | x)

[Plot: P(red | x) as a function of f(x), ranging from "definitely blue" through "not sure" to "definitely red"]

    P(red | x; w) = φ( w · f(x) ) = 1 / (1 + e^(−w · f(x)))
A 1D Example: varying w
P(red | x)

[Plot: sigmoid curves of P(red | x) vs f(x) for w = 1, w = 10, and w = ∞; larger w gives a sharper transition]

    P(red | x; w) = φ( w · f(x) ) = 1 / (1 + e^(−w · f(x)))
Best w?
§ Recall maximum likelihood estimation: Choose the w value that
maximizes the probability of the observed (training) data
Separable Case: Deterministic Decision – Many Options
Separable Case: Probabilistic Decision – Clear Preference

[Figures: for separable data, many boundaries are equally good under the deterministic decision, but the probabilistic decision (bands 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7) clearly prefers one]
Multiclass Logistic Regression
Multiclass Logistic Regression
§ Recall Perceptron:
  § A weight vector for each class:  w_y
  § Score (activation) of a class y:  z_y = w_y · f(x)
  § Prediction: highest score wins

§ How to make the scores into probabilities?

    z_1, z_2, z_3  →  e^(z_1) / (e^(z_1) + e^(z_2) + e^(z_3)),  e^(z_2) / (e^(z_1) + e^(z_2) + e^(z_3)),  e^(z_3) / (e^(z_1) + e^(z_2) + e^(z_3))

    original activations                              softmax activations

§ In general:  softmax(z_1, ..., z_k) = [ e^(z_1) / Σ_i e^(z_i),  ...,  e^(z_k) / Σ_i e^(z_i) ]
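A minimal softmax sketch in Python; subtracting the maximum activation first is a standard numerical-stability trick and an addition here, not something from the slide.

```python
import math

def softmax(zs):
    """Turn activations z_1..z_k into probabilities that sum to 1."""
    m = max(zs)                              # shifting by a constant leaves the
    exps = [math.exp(z - m) for z in zs]     # ratios e^(z_i) / sum_j e^(z_j) unchanged
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, -2.0, -2.0]))        # e.g. scores from the perceptron example
print(sum(softmax([1.0, -2.0, -2.0])))   # 1.0
```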
Multiclass Logistic Regression
§ Recall Perceptron:
  § A weight vector for each class:  w_y
  § Score (activation) of a class y:  z_y = w_y · f(x)
  § Prediction: highest score wins

§ How to make the scores into probabilities?

    P(y | x; w) = e^(w_y · f(x)) / Σ_y' e^(w_y' · f(x))

= Multi-Class Logistic Regression


Best w?
§ Maximum likelihood estimation:

    max_w  ll(w)  =  max_w  Σ_i  log P(y^(i) | x^(i); w)

    with:  P(y^(i) | x^(i); w) = e^(w_{y^(i)} · f(x^(i))) / Σ_y e^(w_y · f(x^(i)))

= Multi-Class Logistic Regression
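A minimal sketch of evaluating this objective; the toy weights, class names, and data are illustrative assumptions, and a real learner would maximize this quantity over w.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(weights, data):
    """ll(w) = sum_i log P(y_i | x_i; w); weights: dict class -> weight vector."""
    classes = list(weights)
    ll = 0.0
    for f, y_star in data:
        zs = [sum(wi * fi for wi, fi in zip(weights[c], f)) for c in classes]
        probs = softmax(zs)                           # P(y | x; w) for every class
        ll += math.log(probs[classes.index(y_star)])  # log-probability of the true label
    return ll

weights = {"sports": [1.0, 0.0], "politics": [0.0, 1.0], "tech": [0.0, 0.0]}
data = [([1.0, 0.0], "sports"), ([0.0, 1.0], "politics")]
print(log_likelihood(weights, data))   # larger (closer to 0) means a better fit
```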


Logistic Regression for 3-way classification

[Diagram: features f_1, f_2, f_3, ..., f_d feed activations z_1, z_2, z_3, which go through a softmax]

    z_y = w_y · f = Σ_i w_{i,y} · f_i
Logistic Regression for 3-way classification

[Diagram: the same network, with the weights w_{1,1}, w_{2,1}, w_{3,1}, ..., w_{d,1} into activation z_1 labeled]

    z_y = w_y · f = Σ_i w_{i,y} · f_i
Logistic Regression for 3-way classification

[Diagram: raw inputs x_1, x_2, x_3, ..., x_d pass through feature extraction code to give f_1, ..., f_d, which feed the activations z_1, z_2, z_3 and the softmax]
Deep Neural Network for 3-way classification
[Diagram: inputs x_1, ..., x_d pass through hidden layers 1, 2, ..., L, then activations z_1, z_2, z_3 and a softmax]
Deep Neural Network for 3-way classification
Hidden unit 1 in layer 1
[Diagram: inputs x_1, ..., x_d connect to hidden unit h_1^(1) in layer 1; the remaining layers, activations z_1, z_2, z_3, and softmax follow]
Deep Neural Network for 3-way classification
Hidden unit 1 in layer 1
[Diagram: weights w_{1,1}^(1), w_{2,1}^(1), ..., w_{d,1}^(1) connect inputs x_1, ..., x_d to hidden unit h_1^(1)]

    h_1^(1) = φ( w_1^(1) · x ) = φ( Σ_i w_{i,1}^(1) · x_i )

    φ = activation function
Deep Neural Network for 3-way classification

[Diagram: inputs x_1, ..., x_d feed every layer-1 hidden unit h_1^(1), h_2^(1), h_3^(1), ...]
Deep Neural Network for 3-way classification
Hidden unit 1 in layer 2
[Diagram: the layer-1 activations h_1^(1), h_2^(1), ... connect to hidden unit h_1^(2) in layer 2]
Deep Neural Network for 3-way classification
Hidden unit 1 in layer 2
[Diagram: weights w_{1,1}^(2), w_{2,1}^(2), ... connect the layer-1 activations to hidden unit h_1^(2)]

    h_1^(2) = φ( w_1^(2) · h^(1) ) = φ( Σ_i w_{i,1}^(2) · h_i^(1) )

    φ = activation function
Deep Neural Network for 3-way classification

[Diagram: the layer-1 activations h^(1) feed every layer-2 hidden unit h_1^(2), h_2^(2), ...]
Deep Neural Network for 3-way classification

[Diagram: inputs x_1, ..., x_d, hidden layers h^(1), h^(2), ..., h^(L), then activations z_1, z_2, z_3 and a softmax]

    h_j^(l) = φ( Σ_i w_{i,j}^(l) · h_i^(l-1) )

  • Neural network with L layers
  • h^(l): activations at layer l
  • w^(l): weights taking activations from layer l-1 to layer l
    φ = activation function
Deep Neural Network for 3-way classification
[Diagram: the same network, with weight matrices W^(1), W^(2), ... between layers]

    h^(l) = φ( h^(l-1) × W^(l) )

    φ = activation function
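A minimal forward-pass sketch of h^(l) = φ(h^(l-1) × W^(l)) for a 3-way classifier, using NumPy; the layer sizes, random weights, and the choice of sigmoid for φ are illustrative assumptions, not values from the slides.

```python
import numpy as np

def phi(z):
    """Activation function, applied elementwise (sigmoid here)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

def forward(x, hidden_Ws, W_out):
    h = x
    for W in hidden_Ws:          # h^(l) = phi(h^(l-1) @ W^(l))
        h = phi(h @ W)
    z = h @ W_out                # final activations z_1, z_2, z_3
    return softmax(z)            # class probabilities

rng = np.random.default_rng(0)
d, hidden, num_classes = 5, 4, 3
hidden_Ws = [rng.normal(size=(d, hidden)), rng.normal(size=(hidden, hidden))]  # W^(1), W^(2)
W_out = rng.normal(size=(hidden, num_classes))

x = rng.normal(size=d)               # one input x_1..x_d
print(forward(x, hidden_Ws, W_out))  # three probabilities summing to 1
```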
Deep Neural Network for 3-way classification
[Diagram: the same deep network with weight matrices W^(1), W^(2), ...]

• Sometimes also called Multi-Layer Perceptron (MLP) or Feed-Forward Network (FFN)
• It is a component of larger Transformer models*

    * "Attention Is All You Need", Vaswani et al., 2017
Common Activation Functions φ

[Figure: common choices of activation function φ; source: MIT 6.S191 introtodeeplearning.com]
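The referenced figure is not reproduced in this text version; as a hedged sketch, these are three activation functions commonly shown in such charts (the exact set on the original slide may differ).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # squashes to (0, 1)

def tanh(z):
    return math.tanh(z)                 # squashes to (-1, 1)

def relu(z):
    return max(0.0, z)                  # 0 for negative z, identity otherwise

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```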


Deep Neural Network Training
Training the deep neural network is just like logistic regression:
just w tends to be a much, much larger vector

How do we maximize functions?

    max_w  ll(w)  =  max_w  Σ_i  log P(y^(i) | x^(i); w)

In general, we cannot always take the derivative and set it to 0.
Use numerical optimization!
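One common numerical approach is gradient ascent on ll(w); this sketch does it for binary logistic regression with labels in {0, 1}. The learning rate, step count, and toy data are illustrative assumptions, and the gradient formula is the standard one for this model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def gradient_ascent(data, num_features, learning_rate=0.1, steps=1000):
    """Maximize ll(w) = sum_i log P(y_i | x_i; w) for binary logistic regression."""
    w = [0.0] * num_features
    for _ in range(steps):
        grad = [0.0] * num_features
        for f, y_star in data:                # y_star is 1 for +, 0 for -
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))   # P(y = +1 | x; w)
            for i, fi in enumerate(f):
                grad[i] += (y_star - p) * fi                    # d ll / d w_i
        w = [wi + learning_rate * gi for wi, gi in zip(w, grad)]
    return w

data = [([1.0, 2.0, 1.0], 1), ([1.0, 0.0, 0.0], 0)]   # [BIAS, free, money] toy emails
print(gradient_ascent(data, num_features=3))
```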


Hill Climbing
Recall from CSPs lecture: simple, general idea
Start wherever
Repeat: move to the best neighboring state
If no neighbors better than current, quit

What's particularly tricky when hill-climbing for multiclass logistic regression?
• Optimization over a continuous space
• Infinitely many neighbors!
• How to do this efficiently?
Next Time: Optimization and more Neural Networks!
Naïve Bayes vs Logistic Regression
                           Naïve Bayes                              Logistic Regression

Model                      Joint over all features and label:       Conditional:
                           P(Y, F_1, F_2, ...)                      P(y | f_1, f_2, ...; w)

Predicted class            Inference in a Bayes net:                Directly output label:
probabilities              P(Y | f) ∝ P(Y) P(f_1 | Y) ...           P(y = +1 | f; w) = 1 / (1 + e^(−w · f))

Features                   Discrete                                 Discrete or Continuous

Parameters                 Entries of probability tables            Weight vector w
                           P(Y) and P(F_i | Y)

Learning                   Counting occurrences of events           Iterative numerical optimization
