Neural Networks Part-1


10-601 Introduction to Machine Learning

Machine Learning Department


School of Computer Science
Carnegie Mellon University

Neural Networks

Matt Gormley
Lecture 12
Feb. 24, 2020

1
Reminders
Homework 4: Logistic Regression
Out: Wed, Feb. 19
Due: Fri, Feb. 28 at 11:59pm
Today's In-Class Poll
http://p12.mlcourse.org
Swapped lecture/recitation:
Lecture 14: Fri, Feb. 28
Recitation HW5: Mon, Mar. 02

2
Q&A

3
NEURAL NETWORKS

4
Background: A Recipe for Machine Learning

1. Given training data (examples labeled Face, Face, Not a face)

2. Choose each of these:

Decision function
Examples: Linear regression, Logistic regression, Neural Network

Loss function
Examples: Mean squared error, Cross Entropy

5
Background: A Recipe for Machine Learning

1. Given training data
2. Choose each of these: a decision function and a loss function
3. Define goal
4. Train with SGD (take small steps opposite the gradient)

6
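(Aside, not from the slides: a minimal SGD sketch in Python. The helper name `gradient` is hypothetical; it stands in for whatever routine returns the gradient of the loss on one training example.)

    import numpy as np

    def sgd(gradient, theta0, data, lr=0.1, epochs=10):
        # `gradient(theta, x, y)` is assumed to return dLoss/dtheta for one example.
        theta = np.array(theta0, dtype=float)
        for _ in range(epochs):
            np.random.shuffle(data)                            # visit examples in random order
            for x, y in data:
                theta = theta - lr * gradient(theta, x, y)     # small step opposite the gradient
        return theta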
Background: A Recipe for Machine Learning (Gradients)

1. Given training data
2. Choose each of these: a decision function and a loss function
3. Define goal
4. Train with SGD (take small steps opposite the gradient)

Backpropagation can compute this gradient!
And it's a special case of a more general algorithm called reverse-mode
automatic differentiation that can compute the gradient of any
differentiable function efficiently!

7
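(Aside, not from the slides: a toy Python sketch of what reverse-mode differentiation does for one sigmoid unit with squared error. All names and the choice of loss are mine, for illustration only.)

    import numpy as np

    def forward_backward(w, b, x, y):
        # Forward pass: save intermediates so the backward pass can reuse them.
        z = w * x + b                       # linear score
        p = 1.0 / (1.0 + np.exp(-z))        # sigmoid activation
        loss = 0.5 * (p - y) ** 2           # squared error (toy choice)

        # Backward (reverse-mode) pass: apply the chain rule from loss back to w and b.
        dloss_dp = p - y
        dp_dz = p * (1.0 - p)
        dloss_dz = dloss_dp * dp_dz
        dloss_dw = dloss_dz * x
        dloss_db = dloss_dz
        return loss, dloss_dw, dloss_db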
Background: A Recipe for Machine Learning
Goals for Today's Lecture
1. Explore a new class of decision functions (Neural Networks)
2. Consider variants of this recipe for training

(Recipe, as before: 1. Given training data; 2. Choose a decision function
and a loss function; 3. Define goal; 4. Train with SGD, taking small steps
opposite the gradient.)

8
Decision Functions: Linear Regression

Output

Input …
9
Decision Functions: Logistic Regression

Output

Input …
10
Decision Functions: Perceptron

Output

Input …
13
Decision Functions: Neural Network

Output

Hidden Layer …

Input …
14
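(Aside, not from the slides: the three simpler decision functions above, written as small NumPy functions under the usual parameterization w, b. A one-hidden-layer network version appears after the chalkboard example below.)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def linear_regression(w, b, x):
        return np.dot(w, x) + b                    # real-valued output

    def logistic_regression(w, b, x):
        return sigmoid(np.dot(w, x) + b)           # probability of the positive class

    def perceptron(w, b, x):
        return 1 if np.dot(w, x) + b >= 0 else 0   # hard threshold instead of sigmoid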
Neural Network Model

Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
15
“Combined logistic models”

Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
16
Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
17
Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
18
Not really, no target for hidden units...

Independent variables: Age, Gender, Stage
Dependent variable: Prediction

© Eric Xing @ CMU, 2006-2011
19
From Biological to Artificial
The motivation for Artificial Neural Networks comes from biology…

Biological "Model"
Neuron: an excitable cell
Synapse: connection between neurons
A neuron sends an electrochemical pulse along its synapses when a
sufficient voltage change occurs
Biological Neural Network: collection of neurons along some pathway
through the brain

Artificial Model
Neuron: node in a directed acyclic graph (DAG)
Weight: multiplier on each edge
Activation Function: nonlinear thresholding function, which allows a
neuron to "fire" when the input value is sufficiently high
Artificial Neural Network: collection of neurons into a DAG, which
define some differentiable function

Biological "Computation"
Neuron switching time: ~0.001 sec
Number of neurons: ~10^10
Connections per neuron: ~10^4 to 10^5
Scene recognition time: ~0.1 sec

Artificial Computation
Many neuron-like threshold switching units
Many weighted interconnections among units
Highly parallel, distributed processes
21
Slide adapted from Eric Xing
Neural Networks
Chalkboard
Example: Neural Network w/1 Hidden Layer

22
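(The chalkboard example itself is not reproduced here; a minimal sketch of a one-hidden-layer network for binary classification, with sigmoid hidden units, a sigmoid output, and my own variable names, looks like this.)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def one_hidden_layer_net(x, W1, b1, w2, b2):
        # x: (M,) input; W1: (D, M); b1: (D,); w2: (D,); b2: scalar
        z = sigmoid(W1 @ x + b1)     # hidden layer activations, shape (D,)
        y = sigmoid(w2 @ z + b2)     # output probability
        return y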
Decision Functions: Logistic Regression

Output

Face Face Not a face

Input …
23
Decision Functions: Logistic Regression

In-Class Example
Output y (example labels: 1, 1, 0)
Inputs x1, x2, …
24
Neural Networks

Chalkboard
1D Example from linear regression to logistic
regression
1D Example from logistic regression to a neural
network

25
Decision Functions: Logistic Regression

Output

Face Face Not a face

Input …
26
Decision Functions: Logistic Regression

In-Class Example
Output y (example labels: 1, 1, 0)
Inputs x1, x2, …
27
Decision Functions: Neural Network
Neural Network for Classification

Output


Hidden Layer


Input

28
Neural Network Parameters
Question:
Suppose you are training a one hidden layer neural network with sigmoid
activations for binary classification.
True or False: There is a unique set of parameters that maximize the
likelihood of the dataset above.
Answer:

29
ARCHITECTURES

30
Neural Networks
Chalkboard
Example: Neural Network w/2 Hidden Layers
Example: Feed Forward Neural Network
(matrix form)

31
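(The chalkboard derivation is not reproduced here; a sketch of the matrix form for a feed-forward network with any number of hidden layers, under assumed shapes and my own names, is:)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def feed_forward(X, layers):
        # X: (N, M) batch of inputs; layers: list of (W, b) pairs, one per layer.
        # Each layer computes sigmoid(A @ W.T + b); the final A is the output.
        A = X
        for W, b in layers:
            A = sigmoid(A @ W.T + b)
        return A

    # Example shapes for 2 hidden layers of widths D1, D2 and a single output:
    #   layers = [(W1, b1), (W2, b2), (W3, b3)]
    #   W1: (D1, M), W2: (D2, D1), W3: (1, D2)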
Neural Network Architectures
Even for a basic Neural Network, there are
many design decisions to make:
1. # of hidden layers (depth)
2. # of units per hidden layer (width)
3. Type of activation function (nonlinearity)
4. Form of objective function

32
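(As an illustration of those four decisions, not something from the lecture, they could be written down as a small configuration; every value here is a made-up example.)

    # Hypothetical architecture specification covering the four design decisions.
    config = {
        "hidden_layers": 2,             # 1. depth
        "units_per_layer": [128, 64],   # 2. width of each hidden layer
        "activation": "tanh",           # 3. nonlinearity (sigmoid, tanh, ReLU, ...)
        "objective": "cross_entropy",   # 4. form of the objective function
    }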
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

Hidden Layer …
D=M

Input …
35
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

Hidden Layer …
D=M

Input …
36
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

What method(s) is
this setting similar to?

Hidden Layer …
D<M

Input …
37
Building a Neural Net

Q: How many hidden units, D, should we use?


Output

Hidden Layer …
D>M

What method(s) is
this setting similar to?

Input …
38
Deeper Networks

Q: How many layers should we use?

Output


Hidden Layer 1


Input

39
Deeper Networks

Q: How many layers should we use?

Output


Hidden Layer 2


Hidden Layer 1


Input

40
Deeper Networks

Q: How many layers should we use?


Output


Hidden Layer 3


Hidden Layer 2


Hidden Layer 1


Input

41
Deeper Networks

Q: How many layers should we use?

Theoretical answer:
A neural network with 1 hidden layer is a universal function approximator.
Cybenko (1989): For any continuous function g(x) and any ε > 0, there
exists a 1-hidden-layer neural net h(x) s.t. |h(x) - g(x)| < ε for all x,
assuming sigmoid activation functions.

Empirical answer:
Before 2006: "Deep networks (e.g. 3 or more hidden layers) are too hard
to train"
After 2006: "Deep networks are easier to train than shallow networks
(e.g. 2 or fewer layers) for many problems"

Big caveat: You need to know and use the right tricks.
42
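(Restated slightly more formally, as my paraphrase; Cybenko's statement is for continuous functions on a compact domain, with sigmoid activations:)

    \text{For any continuous } g \text{ on a compact set } K \text{ and any } \varepsilon > 0,
    \text{ there exist } D,\ \alpha_d,\ w_d,\ b_d \text{ such that }
    h(x) = \sum_{d=1}^{D} \alpha_d \, \sigma(w_d^{\top} x + b_d)
    \text{ satisfies } |h(x) - g(x)| < \varepsilon \text{ for all } x \in K.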
Different Levels of Abstraction

We don’t know
the “right”
levels of
abstraction
So let the model
figure it out!

46
Example from Honglak Lee (NIPS 2010)
Different Levels of Abstraction

Face Recognition:
Deep Network
can build up
increasingly
higher levels of
abstraction
Lines, parts,
regions

47
Example from Honglak Lee (NIPS 2010)
Different Levels of Abstraction

Output


Hidden Layer 3


Hidden Layer 2


Hidden Layer 1


Input

48
Example from Honglak Lee (NIPS 2010)
Activation Functions
Neural Network with sigmoid
activation functions

Output


Hidden Layer


Input

49
Activation Functions
Neural Network with arbitrary
nonlinear activation functions

Output


Hidden Layer


Input

50
Activation Functions
Sigmoid / Logistic Function: logistic(u) = 1 / (1 + e^(-u))

So far, we've assumed that the activation function (nonlinearity) is
always the sigmoid function…

51
Activation Functions
A new change: modifying the nonlinearity
The logistic is not widely used in modern ANNs

Alternate 1: tanh

Like logistic function but shifted to range [-1, +1]

Slide from William Cohen


AISTATS 2010: sigmoid vs. tanh activations (depth-4 network)
Figure from Glorot & Bengio (2010)


Activation Functions
A new change: modifying the nonlinearity
ReLU is often used in vision tasks

Alternate 2: rectified linear unit (ReLU)

Linear with a cutoff at zero
(Implementation: clip the gradient when you pass zero)

Slide from William Cohen


Activation Functions
A new change: modifying the nonlinearity
ReLU is often used in vision tasks

Alternate 2: rectified linear unit (ReLU)
Soft version ("softplus"): log(exp(x) + 1)

Doesn't saturate (at one end)
Sparsifies outputs
Helps with vanishing gradient

Slide from William Cohen
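(A quick NumPy sketch of the activation functions from these slides:)

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))    # squashes to (0, 1)

    def tanh(u):
        return np.tanh(u)                  # like sigmoid but in (-1, +1)

    def relu(u):
        return np.maximum(0.0, u)          # linear with a cutoff at zero

    def softplus(u):
        return np.log1p(np.exp(u))         # "soft" ReLU: log(exp(u) + 1)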


Decision Functions: Neural Network
Neural Network for Classification

Output


Hidden Layer


Input

56
Decision Functions: Neural Network
Neural Network for Regression

Output

Hidden Layer
… y


Input

57
Objective Functions for NNs
1. Quadratic Loss:
the same objective as Linear Regression
i.e. mean squared error
2. Cross Entropy:
the same objective as Logistic Regression
i.e. negative log likelihood
This requires probabilities, so we add an additional
“softmax” layer at the end of our network

58
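(A minimal sketch of the two objectives for a single example; the binary form of cross entropy is shown, and the function names are mine.)

    import numpy as np

    def quadratic_loss(y_hat, y):
        # Mean squared error over the outputs (same objective as linear regression).
        return 0.5 * np.sum((np.asarray(y_hat) - np.asarray(y)) ** 2)

    def cross_entropy_loss(p_hat, y, eps=1e-12):
        # Negative log likelihood for a binary label y in {0, 1} and a predicted
        # probability p_hat (same objective as logistic regression).
        p_hat = np.clip(p_hat, eps, 1.0 - eps)   # avoid log(0)
        return -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))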
Objective Functions for NNs
Cross entropy vs. Quadratic loss

Figure from Glorot & Bengio (2010)


Multi Class Output

Output …

Hidden Layer …

Input …
60
Multi Class Output
Softmax:


Output


Hidden Layer


Input

61
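(A numerically stable softmax sketch for the multi-class output layer:)

    import numpy as np

    def softmax(scores):
        # Map a length-K vector of real-valued scores to a probability distribution.
        shifted = scores - np.max(scores)   # subtract the max for numerical stability
        exps = np.exp(shifted)
        return exps / np.sum(exps)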
Neural Network Errors
Question A: For which of the datasets below does there exist a one hidden
layer neural network that achieves zero classification error? Select all
that apply. (Datasets A, B, C, D shown as figures.)

Question B: For which of the datasets below does there exist a one hidden
layer neural network for regression that achieves nearly zero MSE? Select
all that apply. (Datasets A, B, C, D shown as figures.)

62
DECISION BOUNDARY EXAMPLES

63
Example #1: Diagonal Band
Example #2: One Pocket
Example #3: Four Gaussians
Example #4: Two Pockets

64
Example #1: Diagonal Band
(Slides 65-71: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Example #2: One Pocket
(Slides 72-79: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Example #3: Four Gaussians
(Slides 80-86: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Example #4: Two Pockets
(Slides 87-95: figures only, showing the dataset, the learned decision
boundary, and the individual hidden-unit activations.)
Neural Networks Objectives
You should be able to…
Explain the biological motivations for a neural network
Combine simpler models (e.g. linear regression, binary
logistic regression, multinomial logistic regression) as
components to build up feed forward neural network
architectures
Explain the reasons why a neural network can model
nonlinear decision boundaries for classification
Compare and contrast feature engineering with learning
features
Identify (some of) the options available when designing
the architecture of a neural network
Implement a feed forward neural network

96
