Neural Networks Part-1


10-601 Introduction to Machine Learning

Machine Learning Department


School of Computer Science
Carnegie Mellon University

Neural Networks

Matt Gormley
Lecture 12
Feb. 24, 2020

1
Reminders
Homework 4: Logistic Regression
Out: Wed, Feb. 19
Due: Fri, Feb. 28 at 11:59pm
Today's In-Class Poll
http://p12.mlcourse.org
Swapped lecture/recitation:
Lecture 14: Fri, Feb. 28
Recitation HW5: Mon, Mar. 02

2
Q&A

3
NEURAL NETWORKS

4
Background: A Recipe for Machine Learning

1. Given training data (examples labeled Face, Face, Not a face)

2. Choose each of these:

Decision function
Examples: Linear regression, Logistic regression, Neural Network

Loss function
Examples: Mean squared error, Cross Entropy

5
Background: A Recipe for Machine Learning

1. Given training data
2. Choose each of these: a decision function and a loss function
3. Define goal
4. Train with SGD (take small steps opposite the gradient)

6
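(Aside, not from the slides: a minimal SGD sketch in Python. The helper name `gradient` is hypothetical; it stands in for whatever routine returns the gradient of the loss on one training example.)

    import numpy as np

    def sgd(gradient, theta0, data, lr=0.1, epochs=10):
        # `gradient(theta, x, y)` is assumed to return dLoss/dtheta for one example.
        theta = np.array(theta0, dtype=float)
        for _ in range(epochs):
            np.random.shuffle(data)                            # visit examples in random order
            for x, y in data:
                theta = theta - lr * gradient(theta, x, y)     # small step opposite the gradient
        return theta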
Background: A Recipe for Machine Learning (Gradients)

1. Given training data
2. Choose each of these: a decision function and a loss function
3. Define goal
4. Train with SGD (take small steps opposite the gradient)

Backpropagation can compute this gradient!
And it's a special case of a more general algorithm called reverse-mode
automatic differentiation that can compute the gradient of any
differentiable function efficiently!

7
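(Aside, not from the slides: a toy Python sketch of what reverse-mode differentiation does for one sigmoid unit with squared error. All names and the choice of loss are mine, for illustration only.)

    import numpy as np

    def forward_backward(w, b, x, y):
        # Forward pass: save intermediates so the backward pass can reuse them.
        z = w * x + b                       # linear score
        p = 1.0 / (1.0 + np.exp(-z))        # sigmoid activation
        loss = 0.5 * (p - y) ** 2           # squared error (toy choice)

        # Backward (reverse-mode) pass: apply the chain rule from loss back to w and b.
        dloss_dp = p - y
        dp_dz = p * (1.0 - p)
        dloss_dz = dloss_dp * dp_dz
        dloss_dw = dloss_dz * x
        dloss_db = dloss_dz
        return loss, dloss_dw, dloss_db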
Background: A Recipe for Machine Learning
Goals for Today's Lecture
1. Explore a new class of decision functions (Neural Networks)
2. Consider variants of this recipe for training

(Recipe, as before: 1. Given training data; 2. Choose a decision function
and a loss function; 3. Define goal; 4. Train with SGD, taking small steps
opposite the gradient.)

8
Decision Functions: Linear Regression

Output

Input …
9
Decision Functions: Logistic Regression

Output

Input …
10
Decision Functions: Perceptron

Output

Input …
13
Decision Functions: Neural Network

Output

Hidden Layer …

Input …
14
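(Aside, not from the slides: the three simpler decision functions above, written as small NumPy functions under the usual parameterization w, b. A one-hidden-layer network version appears after the chalkboard example below.)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def linear_regression(w, b, x):
        return np.dot(w, x) + b                    # real-valued output

    def logistic_regression(w, b, x):
        return sigmoid(np.dot(w, x) + b)           # probability of the positive class

    def perceptron(w, b, x):
        return 1 if np.dot(w, x) + b >= 0 else 0   # hard threshold instead of sigmoid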
Neural Network Model

Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
15
“Combined logistic models”

Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
16
Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
17
Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
18
Not really, no target for hidden units...

Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
19
From Biological to Artificial
The motivation for Artificial Neural Networks comes from biology…

Biological "Model"
Neuron: an excitable cell
Synapse: connection between neurons
A neuron sends an electrochemical pulse along its synapses when a
sufficient voltage change occurs
Biological Neural Network: collection of neurons along some pathway
through the brain

Artificial Model
Neuron: node in a directed acyclic graph (DAG)
Weight: multiplier on each edge
Activation Function: nonlinear thresholding function, which allows a
neuron to "fire" when the input value is sufficiently high
Artificial Neural Network: collection of neurons into a DAG, which
define some differentiable function

Biological "Computation"
Neuron switching time: ~0.001 sec
Number of neurons: ~10^10
Connections per neuron: ~10^4 to 10^5
Scene recognition time: ~0.1 sec

Artificial Computation
Many neuron-like threshold switching units
Many weighted interconnections among units
Highly parallel, distributed processes
21
Slide adapted from Eric Xing
Neural Networks
Chalkboard
Example: Neural Network w/1 Hidden Layer

22
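(The chalkboard example itself is not reproduced here; a minimal sketch of a one-hidden-layer network for binary classification, with sigmoid hidden units, a sigmoid output, and my own variable names, looks like this.)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def one_hidden_layer_net(x, W1, b1, w2, b2):
        # x: (M,) input; W1: (D, M); b1: (D,); w2: (D,); b2: scalar
        z = sigmoid(W1 @ x + b1)     # hidden layer activations, shape (D,)
        y = sigmoid(w2 @ z + b2)     # output probability
        return y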
Decision Functions: Logistic Regression

Output

Face Face Not a face

Input …
23
Decision Functions: Logistic Regression

In-Class Example
Output y (example labels: 1, 1, 0)
Inputs x1, x2, …
24
Neural Networks

Chalkboard
1D Example from linear regression to logistic
regression
1D Example from logistic regression to a neural
network

25
Decision Functions: Logistic Regression

Output

Face Face Not a face

Input …
26
Decision Functions: Logistic Regression

In-Class Example
Output y (example labels: 1, 1, 0)
Inputs x1, x2, …
27
Decision Functions: Neural Network
Neural Network for Classification

Output


Hidden Layer


Input

28
Neural Network Parameters
Question:
Suppose you are training a one hidden layer neural network with sigmoid
activations for binary classification.
True or False: There is a unique set of parameters that maximize the
likelihood of the dataset above.
Answer:

29
ARCHITECTURES

30
Neural Networks
Chalkboard
Example: Neural Network w/2 Hidden Layers
Example: Feed Forward Neural Network
(matrix form)

31
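(The chalkboard derivation is not reproduced here; a sketch of the matrix form for a feed-forward network with any number of hidden layers, under assumed shapes and my own names, is:)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def feed_forward(X, layers):
        # X: (N, M) batch of inputs; layers: list of (W, b) pairs, one per layer.
        # Each layer computes sigmoid(A @ W.T + b); the final A is the output.
        A = X
        for W, b in layers:
            A = sigmoid(A @ W.T + b)
        return A

    # Example shapes for 2 hidden layers of widths D1, D2 and a single output:
    #   layers = [(W1, b1), (W2, b2), (W3, b3)]
    #   W1: (D1, M), W2: (D2, D1), W3: (1, D2)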
Neural Network Architectures
Even for a basic Neural Network, there are
many design decisions to make:
1. # of hidden layers (depth)
2. # of units per hidden layer (width)
3. Type of activation function (nonlinearity)
4. Form of objective function

32
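(As an illustration of those four decisions, not something from the lecture, they could be written down as a small configuration; every value here is a made-up example.)

    # Hypothetical architecture specification covering the four design decisions.
    config = {
        "hidden_layers": 2,             # 1. depth
        "units_per_layer": [128, 64],   # 2. width of each hidden layer
        "activation": "tanh",           # 3. nonlinearity (sigmoid, tanh, ReLU, ...)
        "objective": "cross_entropy",   # 4. form of the objective function
    }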
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

Hidden Layer …
D=M

Input …
35
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

Hidden Layer …
D=M

Input …
36
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

What method(s) is
this setting similar to?

Hidden Layer …
D<M

Input …
37
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

Hidden Layer …
D>M

What method(s) is
this setting similar to?

Input …
38
Deeper Networks

Q: How many layers should we use?

Output


Hidden Layer 1


Input

39
Deeper Networks

Q: How many layers should we use?

Output


Hidden Layer 2


Hidden Layer 1


Input

40
Deeper Networks

Q: How many layers should we use?


Output


Hidden Layer 3


Hidden Layer 2


Hidden Layer 1


Input

41
Deeper Networks

Q: How many layers should we use?

Theoretical answer:
A neural network with 1 hidden layer is a universal function approximator.
Cybenko (1989): For any continuous function g(x) and any ε > 0, there
exists a 1-hidden-layer neural net h(x) s.t. |h(x) - g(x)| < ε for all x,
assuming sigmoid activation functions.

Empirical answer:
Before 2006: "Deep networks (e.g. 3 or more hidden layers) are too hard
to train"
After 2006: "Deep networks are easier to train than shallow networks
(e.g. 2 or fewer layers) for many problems"

Big caveat: You need to know and use the right tricks.
42
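(Restated slightly more formally, as my paraphrase; Cybenko's statement is for continuous functions on a compact domain, with sigmoid activations:)

    \text{For any continuous } g \text{ on a compact set } K \text{ and any } \varepsilon > 0,
    \text{ there exist } D,\ \alpha_d,\ w_d,\ b_d \text{ such that }
    h(x) = \sum_{d=1}^{D} \alpha_d \, \sigma(w_d^{\top} x + b_d)
    \text{ satisfies } |h(x) - g(x)| < \varepsilon \text{ for all } x \in K.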
Different Levels of Abstraction

We don’t know
the “right”
levels of
abstraction
So let the model
figure it out!

46
Example from Honglak Lee (NIPS 2010)
Different Levels of Abstraction

Face Recognition:
Deep Network
can build up
increasingly
higher levels of
abstraction
Lines, parts,
regions

47
Example from Honglak Lee (NIPS 2010)
Different Levels of Abstraction

Output


Hidden Layer 3


Hidden Layer 2


Hidden Layer 1


Input

48
Example from Honglak Lee (NIPS 2010)
Activation Functions
Neural Network with sigmoid
activation functions

Output


Hidden Layer


Input

49
Activation Functions
Neural Network with arbitrary
nonlinear activation functions

Output


Hidden Layer


Input

50
Activation Functions
Sigmoid / Logistic Function: logistic(u) = 1 / (1 + e^(-u))

So far, we've assumed that the activation function (nonlinearity) is
always the sigmoid function…

51
Activation Functions
A new change: modifying the nonlinearity
The logistic is not widely used in modern ANNs

Alternate 1: tanh

Like logistic function but shifted to range [-1, +1]

Slide from William Cohen


AISTATS 2010: sigmoid vs. tanh activations (depth-4 network)
Figure from Glorot & Bengio (2010)


Activation Functions
A new change: modifying the nonlinearity
ReLU is often used in vision tasks

Alternate 2: rectified linear unit (ReLU)

Linear with a cutoff at zero
(Implementation: clip the gradient when you pass zero)

Slide from William Cohen


Activation Functions
A new change: modifying the nonlinearity
ReLU is often used in vision tasks

Alternate 2: rectified linear unit (ReLU)
Soft version ("softplus"): log(exp(x) + 1)

Doesn't saturate (at one end)
Sparsifies outputs
Helps with vanishing gradient

Slide from William Cohen
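(A quick NumPy sketch of the activation functions from these slides:)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))    # squashes to (0, 1)

    def tanh(u):
        return np.tanh(u)                  # like sigmoid but in (-1, +1)

    def relu(u):
        return np.maximum(0.0, u)          # linear with a cutoff at zero

    def softplus(u):
        return np.log1p(np.exp(u))         # "soft" ReLU: log(exp(u) + 1)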


Decision Functions: Neural Network
Neural Network for Classification

Output


Hidden Layer


Input

56
Decision Functions: Neural Network
Neural Network for Regression

Output

Hidden Layer
… y


Input

57
Objective Functions for NNs
1. Quadratic Loss:
the same objective as Linear Regression
i.e. mean squared error
2. Cross Entropy:
the same objective as Logistic Regression
i.e. negative log likelihood
This requires probabilities, so we add an additional
“softmax” layer at the end of our network

58
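(A minimal sketch of the two objectives for a single example; the binary form of cross entropy is shown, and the function names are mine.)

    import numpy as np

    def quadratic_loss(y_hat, y):
        # Mean squared error over the outputs (same objective as linear regression).
        return 0.5 * np.sum((np.asarray(y_hat) - np.asarray(y)) ** 2)

    def cross_entropy_loss(p_hat, y, eps=1e-12):
        # Negative log likelihood for a binary label y in {0, 1} and a predicted
        # probability p_hat (same objective as logistic regression).
        p_hat = np.clip(p_hat, eps, 1.0 - eps)   # avoid log(0)
        return -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))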
Objective Functions for NNs
Cross entropy vs. Quadratic loss

Figure from Glorot & Bengio (2010)


Multi Class Output

Output …

Hidden Layer …

Input …
60
Multi Class Output
Softmax:


Output


Hidden Layer


Input

61
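(A numerically stable softmax sketch for the multi-class output layer:)

    import numpy as np

    def softmax(scores):
        # Map a length-K vector of real-valued scores to a probability distribution.
        shifted = scores - np.max(scores)   # subtract the max for numerical stability
        exps = np.exp(shifted)
        return exps / np.sum(exps)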
Neural Network Errors
Question A: For which of the datasets below does there exist a one hidden
layer neural network that achieves zero classification error? Select all
that apply. (Datasets A, B, C, D shown as figures.)

Question B: For which of the datasets below does there exist a one hidden
layer neural network for regression that achieves nearly zero MSE? Select
all that apply. (Datasets A, B, C, D shown as figures.)

62
DECISION BOUNDARY EXAMPLES

63
Example #1: Diagonal Band
Example #2: One Pocket
Example #3: Four Gaussians
Example #4: Two Pockets

64
Example #1: Diagonal Band
(Slides 65-71: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Example #2: One Pocket
(Slides 72-79: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Example #3: Four Gaussians
(Slides 80-86: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Example #4: Two Pockets
(Slides 87-95: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Neural Networks Objectives
You should be able to…
Explain the biological motivations for a neural network
Combine simpler models (e.g. linear regression, binary
logistic regression, multinomial logistic regression) as
components to build up feed forward neural network
architectures
Explain the reasons why a neural network can model
nonlinear decision boundaries for classification
Compare and contrast feature engineering with learning
features
Identify (some of) the options available when designing
the architecture of a neural network
Implement a feed forward neural network

96
