Neural Networks (Part 1)
Matt Gormley
Lecture 12
Feb. 24, 2020
Reminders
Homework 4: Logistic Regression
Out: Wed, Feb. 19
Due: Fri, Feb. 28 at 11:59pm
Today's In-Class Poll: http://p12.mlcourse.org
Swapped lecture/recitation:
Lecture 14: Fri, Feb. 28
Recitation HW5: Mon, Mar. 02
Q&A
NEURAL NETWORKS
Background: A Recipe for Machine Learning

1. Given training data: labeled examples, e.g. images labeled "Face", "Face", "Not a face"
2. Choose each of these:
   Decision function: $\hat{y} = f_\theta(x)$
   Loss function: $\ell(\hat{y}, y)$. Examples: mean squared error, cross entropy
Background: A Recipe for Machine Learning (continued)

3. Define goal: find the parameters that minimize the total loss on the training data,
   $\theta^* = \arg\min_\theta \sum_{i=1}^{N} \ell(f_\theta(x^{(i)}), y^{(i)})$
Background: Gradients

4. Train with SGD: take small steps opposite the gradient.

Backpropagation can compute this gradient! And it's a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently!
Background: A Recipe for Machine Learning

Goals for Today's Lecture:
1. Explore a new class of decision functions (Neural Networks)
2. Consider variants of this recipe for training
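To make the recipe concrete, here is a minimal sketch in Python/NumPy, assuming (for illustration only, not as the lecture's specific choices) a linear decision function and squared-error loss; all names and the toy dataset are illustrative.

```python
import numpy as np

def f(theta, x):
    # 2. decision function (here: linear)
    return theta @ x

def grad(theta, x, y):
    # gradient of the squared-error loss 0.5*(f(x) - y)^2 w.r.t. theta
    return (f(theta, x) - y) * x

# 1. given training data (a toy dataset)
data = [(np.array([1.0, 2.0]), 1.0),
        (np.array([2.0, 0.5]), 0.0)]

# 3. goal: the theta minimizing the total loss; 4. train with SGD
theta, lr = np.zeros(2), 0.1
for epoch in range(100):
    for x, y in data:
        theta -= lr * grad(theta, x, y)  # small step opposite the gradient
```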
Decision Functions: Linear Regression

[Figure: a network view of linear regression, with inputs x_1, ..., x_M feeding a single output unit that computes a weighted sum]
Decision Functions: Logistic Regression

[Figure: the same network view, with the output unit passing the weighted sum through the logistic (sigmoid) function]
Decision Functions: Perceptron

[Figure: the same network view, with the output unit thresholding the weighted sum]
Decision Functions: Neural Network

[Figure: a network with a layer of inputs, a hidden layer, and an output unit; each hidden unit and the output unit applies a nonlinearity to a weighted sum of its inputs]
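As a concrete sketch, the pictured one-hidden-layer network can be computed as below; the parameter names (alpha for input-to-hidden weights, beta for hidden-to-output weights) are one common convention, and the random inputs are illustrative.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, alpha, b, beta, c):
    """One-hidden-layer forward pass.
    x: (M,) inputs; alpha: (D, M) and b: (D,) hidden-layer parameters;
    beta: (D,) and scalar c: output-layer parameters."""
    z = sigmoid(alpha @ x + b)   # hidden layer: D sigmoid units
    return sigmoid(beta @ z + c) # output unit: sigmoid of a weighted sum

rng = np.random.default_rng(0)
M, D = 3, 4
y = forward(rng.normal(size=M), rng.normal(size=(D, M)),
            rng.normal(size=D), rng.normal(size=D), 0.0)
```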
Neural Network Model

[Figure: a network whose independent variables (Age, Gender, Stage) feed a hidden layer, which feeds the dependent variable (Prediction)]
© Eric Xing @ CMU, 2006-2011
"Combined logistic models"

[Figure: the same network; each hidden unit, viewed on its own, is a logistic regression on Age, Gender, and Stage]
© Eric Xing @ CMU, 2006-2011
So is a neural network just a set of combined logistic models? Not really: there is no target for the hidden units, so they cannot each be trained as a separate logistic regression.

[Figure: the same network; the hidden units have no labels of their own]
© Eric Xing @ CMU, 2006-2011
From Biological to Artificial
The motivation for Artificial Neural Networks comes from biology…
Decision Functions: Logistic Regression

[Figure: the network view of logistic regression, repeated]
Decision Functions: Logistic Regression: In-Class Example

[Figure: three training points in the (x_1, x_2) plane with labels y = 1, 1, 0]
Neural Networks

Chalkboard:
- 1D example: from linear regression to logistic regression
- 1D example: from logistic regression to a neural network
Decision Functions: Neural Network for Classification

[Figure: a network with inputs, one hidden layer, and a single sigmoid output unit producing the probability of the positive class]
Neural Network Parameters

Question: Suppose you are training a one-hidden-layer neural network with sigmoid activations for binary classification. True or False: there is a unique set of parameters that maximizes the likelihood of the dataset above.

Answer:
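One fact worth keeping in mind when reasoning about this question: a hidden layer has a permutation symmetry. The quick numerical check below (an illustrative sketch, not course code) swaps two hidden units and compares the outputs.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, alpha, b, beta, c):
    z = sigmoid(alpha @ x + b)    # hidden layer
    return sigmoid(beta @ z + c)  # output unit

rng = np.random.default_rng(1)
M, D = 3, 4
x = rng.normal(size=M)
alpha, b = rng.normal(size=(D, M)), rng.normal(size=D)
beta, c = rng.normal(size=D), 0.5

# Swap hidden units 0 and 1 by permuting the rows of alpha and the
# entries of b and beta consistently: the network computes exactly
# the same function, so the likelihood is unchanged.
perm = np.array([1, 0, 2, 3])
print(np.isclose(forward(x, alpha, b, beta, c),
                 forward(x, alpha[perm], b[perm], beta[perm], c)))  # True
```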
ARCHITECTURES
Neural Networks

Chalkboard:
- Example: neural network with 2 hidden layers
- Example: feed-forward neural network (matrix form)
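A sketch of the matrix form for a network with two hidden layers, assuming sigmoid activations throughout; the weight/bias names and shapes are illustrative, not the chalkboard's exact notation.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W1, b1, W2, b2, W3, b3):
    """x: (M,); W1: (D1, M); W2: (D2, D1); W3: (K, D2)."""
    z1 = sigmoid(W1 @ x + b1)     # hidden layer 1
    z2 = sigmoid(W2 @ z1 + b2)    # hidden layer 2
    return sigmoid(W3 @ z2 + b3)  # output layer
```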
Neural Network Architectures

Even for a basic neural network, there are many design decisions to make (the sketch after this list makes them concrete):
1. # of hidden layers (depth)
2. # of units per hidden layer (width)
3. Type of activation function (nonlinearity)
4. Form of objective function
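The sketch below treats these decisions as parameters: the list of widths fixes depth and width, the activation is a swappable function, and the output is left linear to be paired with whichever objective is chosen. All names are illustrative, not a standard API.

```python
import numpy as np

def init_mlp(widths, rng):
    """widths = [M, D1, ..., Dk, K]: input size, k hidden widths
    (decisions 1 and 2: depth and width), output size."""
    return [(0.1 * rng.normal(size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(x, layers, activation=np.tanh):
    for W, b in layers[:-1]:
        x = activation(W @ x + b)  # decision 3: the nonlinearity
    W, b = layers[-1]
    return W @ x + b               # decision 4: the objective determines
                                   # how this linear output is used

layers = init_mlp([5, 8, 8, 1], np.random.default_rng(0))  # depth 2, width 8
y = forward(np.ones(5), layers)
```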
Building a Neural Net

[Figure: a network with M inputs and D hidden units, with D = M]
Building a Neural Net

[Figure: the same network with D < M hidden units]
What method(s) is this setting similar to?
Building a Neural Net

[Figure: the same network with D > M hidden units]
What method(s) is this setting similar to?
Deeper Networks

[Figure sequence: networks of increasing depth, from one hidden layer (Input → Hidden Layer 1 → Output) up to three hidden layers (Input → Hidden Layers 1-3 → Output)]
Deeper Networks

We don't know the "right" levels of abstraction, so let the model figure it out!
Different Levels of Abstraction

Face recognition: a deep network can build up increasingly higher levels of abstraction: lines, parts, regions.
(Example from Honglak Lee, NIPS 2010)
Different Levels of Abstraction

[Figure: the same deep network (Input → Hidden Layers 1-3 → Output), with the hidden layers corresponding to the levels of abstraction above]
(Example from Honglak Lee, NIPS 2010)
Activation Functions

Neural network with sigmoid activation functions:
[Figure: a one-hidden-layer network in which each hidden unit and the output apply the sigmoid]
Activation Functions

Neural network with arbitrary nonlinear activation functions:
[Figure: the same network with the sigmoids replaced by an arbitrary nonlinearity]
Activation Functions

Sigmoid / logistic function: $\sigma(u) = \frac{1}{1 + e^{-u}}$

So far, we've assumed that the activation function (nonlinearity) is always the sigmoid function…
Activation Functions

A new change: modifying the nonlinearity. The logistic function is not widely used in modern ANNs.

Alternate 1: tanh, $\tanh(u) = \frac{e^u - e^{-u}}{e^u + e^{-u}}$, which is sigmoid-shaped but squashes its input to $(-1, 1)$ rather than $(0, 1)$.

[Figure: learning curves comparing sigmoid vs. tanh networks (depth 4)]
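For reference, the two nonlinearities are related by a standard identity, tanh(u) = 2*sigmoid(2u) - 1, i.e. tanh is a rescaled, recentered logistic. A small numerical check (an illustrative sketch, not course code):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# tanh(u) = 2*sigmoid(2u) - 1: same S-shape, range (-1, 1) instead of (0, 1)
u = np.linspace(-3.0, 3.0, 7)
print(np.allclose(np.tanh(u), 2.0 * sigmoid(2.0 * u) - 1.0))  # True
```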
Decision Functions: Neural Network for Regression

[Figure: a one-hidden-layer network whose output y is a weighted sum of the hidden units, with no output nonlinearity]
Objective Functions for NNs

1. Quadratic loss:
   the same objective as linear regression, i.e. mean squared error
2. Cross entropy:
   the same objective as logistic regression, i.e. negative log likelihood;
   this requires probabilities, so we add an additional "softmax" layer at the end of our network

Both objectives are sketched in code below.
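Minimal sketches of the two objectives for a single training example; the names are illustrative, and the cross-entropy version assumes the network's outputs have already been turned into probabilities (e.g. by the softmax on the next slide).

```python
import numpy as np

def quadratic_loss(y_hat, y):
    # squared-error objective, as in linear regression
    return 0.5 * np.sum((y_hat - y) ** 2)

def cross_entropy(y_hat, y):
    # negative log likelihood, as in logistic regression;
    # y: one-hot label vector, y_hat: vector of predicted probabilities
    return -np.sum(y * np.log(y_hat))

print(quadratic_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))
print(cross_entropy(np.array([0.9, 0.1]), np.array([1.0, 0.0])))  # -log(0.9)
```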
Objective Functions for NNs

Cross entropy vs. quadratic loss:
[Figure: training curves comparing the two objectives]
Multi-Class Output

Softmax: $y_k = \frac{\exp(u_k)}{\sum_{l=1}^{K} \exp(u_l)}$

[Figure: a network whose K output units are passed through the softmax]
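A sketch of the softmax layer itself: exponentiate the K output scores and normalize so they form a probability distribution. Subtracting the max before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - np.max(u))  # shift for numerical stability
    return e / np.sum(e)       # normalize so the outputs sum to 1

print(softmax(np.array([1.0, 2.0, 3.0])))  # approx. [0.090, 0.245, 0.665]
```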
Neural Network Errors

Question A: For which of the datasets below does there exist a one-hidden-layer neural network that achieves zero classification error? Select all that apply.

Question B: For which of the datasets below does there exist a one-hidden-layer neural network for regression that achieves nearly zero MSE? Select all that apply.

[Figure: four candidate datasets, A) through D), for each question]
DECISION BOUNDARY EXAMPLES
[Figure: overview of two datasets, Example #1 (Diagonal Band) and Example #2 (One Pocket)]
Example #1: Diagonal Band

[Figure sequence: the diagonal-band dataset, followed by learned decision boundaries and the responses of the individual hidden units as hidden units are added]
Example #2: One Pocket

[Figure sequence: the one-pocket dataset, followed by learned decision boundaries and the responses of the individual hidden units]
Example #3: Four Gaussians

[Figure sequence: the four-Gaussians dataset, followed by learned decision boundaries and the responses of the individual hidden units]
Example #4: Two Pockets

[Figure sequence: the two-pockets dataset, followed by learned decision boundaries and the responses of the individual hidden units]
Neural Networks Objectives

You should be able to…
- Explain the biological motivations for a neural network
- Combine simpler models (e.g. linear regression, binary logistic regression, multinomial logistic regression) as components to build up feed-forward neural network architectures
- Explain the reasons why a neural network can model nonlinear decision boundaries for classification
- Compare and contrast feature engineering with learning features
- Identify (some of) the options available when designing the architecture of a neural network
- Implement a feed-forward neural network