
Deep Feedforward Networks: Overview
Sargur N. Srihari
[email protected]


Topics in DFF Networks


1. Overview
2. Example: Learning XOR
3. Hidden Units
4. Architecture Design
5. Backpropagation and Other Differentiation
6. Historical Notes


Sub-topics in Overview of DFF


1. Goal of a Feed-Forward Network
2. Feedforward vs Recurrent Networks
3. Function Approximation as Goal
4. Extending Linear Models (SVM)
5. Example of XOR


Goal of a feedforward network


• Feedforward nets are the quintessential deep learning models
• Deep feedforward networks are also called
  – Feedforward neural networks or
  – Multilayer Perceptrons (MLPs)
• Their goal is to approximate some function f *
  – E.g., a classifier y = f *(x) maps an input x to a category y
  – A feedforward network defines a mapping y = f (x; θ)
    • and learns the values of the parameters θ that result in the best function approximation

Feedforward network for MNIST

MNIST 28x28 images

Source: https://towardsdatascience.com/probability-and-statistics-explained-in-the-context-of-deep-learning-ed1509b2eb3f

Another view: a network with 2 hidden layers

https://www.easy-tensorflow.com/tf-tutorials/neural-networks/two-layer-neural-network


Flow of Information
• Models are called feedforward because: y = f (x)
  – To evaluate f (x), information flows one way: from x, through the intermediate computations defining f, to the output y
• There are no feedback connections
  – No outputs of the model are fed back into itself


Feedforward Net: US Election


• US Presidential Election: y = f (x)
• Output: y = {y1, y2}
  – the electoral college votes for each candidate
• Input: x = {x1,..,x50}
  – the vote vectors cast for the 2 candidates in the 50 states
• W converts votes to electoral votes
  – E.g., winner-takes-all or proportionate
• h is defined for each state
  – h is the electoral college, as shown in the map
  – Each state has a fixed number of electors
• w maps the 50 states to the 2 outputs
  – Simple addition

Importance of Feedforward Networks


• They are extremely important to ML practice
• Form basis for many commercial applications
1. CNNs are a special kind of feedforward network
  • They are used for recognizing objects from photos
2. They are conceptual stepping stones to RNNs
  • RNNs power many NLP applications


Feedforward vs. Recurrent

• When feedforward neural networks are extended to include feedback connections, they are called Recurrent Neural Networks (RNNs)

(Figures: an RNN, the unrolled RNN, and an RNN with a learning component)


Feedforward Neural Network Structures

• They are called networks because they are composed of many different functions
• The model is associated with a directed acyclic graph describing how the functions are composed
  – E.g., functions f (1), f (2), f (3) connected in a chain to form f (x) = f (3)[ f (2)[ f (1)(x)]]
    • f (1) is called the first layer of the network (its output is a vector)
    • f (2) is called the second layer, etc.
• These chain structures are the most commonly used structures of neural networks
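
As a concrete illustration of this chain structure, here is a minimal NumPy sketch (not the author's code); the layer sizes and the tanh/identity choices are illustrative assumptions.

```python
import numpy as np

# A depth-3 chain f(x) = f3(f2(f1(x))).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first layer:  3 inputs -> 4 units
W2 = rng.normal(size=(3, 4))   # second layer: 4 units  -> 3 units
W3 = rng.normal(size=(1, 3))   # output layer: 3 units  -> 1 output

f1 = lambda x: np.tanh(W1 @ x)
f2 = lambda h: np.tanh(W2 @ h)
f3 = lambda h: W3 @ h

def f(x):
    return f3(f2(f1(x)))       # the chain has depth 3

print(f(np.ones(3)))
```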

Definition of Depth
• Overall length of the chain is the depth of the
model
– Ex: the composite function f (x) = f (3)[ f (2)[ f (1)(x)]] has a depth of 3
• The name deep learning arises from this
terminology
• Final layer of a feedforward network, ex f (3), is
called the output layer


Training the Network


• In network training we drive f (x) to match f* (x)
• Training data provides us with noisy,
approximate examples of f* (x) evaluated at
different training points
• Each example x is accompanied by a label y ≈ f *(x)
• Training examples specify directly what the
output layer must do at each point x
– It must produce a value that is close to y


Definition of Hidden Layer


• Behavior of other layers is not directly specified
by the data
• The learning algorithm must decide how to use those layers to produce a value that is close to y
• Training data does not say what individual
layers should do
• Since the desired output for these layers is not
shown, they are called hidden layers


A net with depth 2: one hidden layer

K outputs y1,..yK for a given input x


Hidden layer consists of M units

$$y_k(\mathbf{x},\mathbf{w}) \;=\; \sigma\!\left(\sum_{j=1}^{M} w_{kj}^{(2)}\, h\!\left(\sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}\right) + w_{k0}^{(2)}\right)$$

f (x)= f (2) [ f (1)(x)]


f (1) is a vector of M dimensions and
f (2) is a vector of K dimensions

fm(1) = zm = h(xTw(1)),  m = 1,..,M

fk(2) = σ(zTw(2)),  k = 1,..,K
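
A minimal NumPy sketch of this depth-2 network follows; the sizes D, M, K and the choice of tanh for the hidden activation h are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, M, K = 5, 3, 2                      # assumed sizes: inputs, hidden units, outputs
rng = np.random.default_rng(0)
W1 = rng.normal(size=(M, D + 1))       # rows w_j^(1); last column holds the biases w_j0
W2 = rng.normal(size=(K, M + 1))       # rows w_k^(2); last column holds the biases w_k0

def forward(x):
    z = np.tanh(W1 @ np.append(x, 1.0))        # f^(1): hidden layer of M units
    return sigmoid(W2 @ np.append(z, 1.0))     # f^(2): K outputs y_1,..,y_K

print(forward(np.ones(D)))             # K values in (0, 1)
```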


Feedforward net with depth 2


• Recognition of printed characters (OCR)
f (x)= f (2) [ f (1)(x)]
– Hidden layer f (1) compares raw pixel inputs to
component patterns


Width of Model
• Each hidden layer is typically vector-valued
• Dimensionality of hidden layer vector is width of
the model


Units of a model
• Each element of the vector is viewed as a neuron
  – Instead of thinking of the layer as a vector-to-vector function, we regard its elements as units that act in parallel
• Each unit receives inputs from many other units
and computes its own activation value


Depth versus Width


• Going deeper makes the network more expressive
  – It can capture variations of the data better
  – It yields expressiveness more efficiently than width
• The tradeoff for more expressiveness is an increased tendency to overfit
  – You will need more data or additional regularization
  • The network should be as deep as the training data allows
  – But you can only determine a suitable depth by experiment
• Also, computation increases with the number of layers.

Very Deep CNNs


CNNs with depth 11 to 19
Depth increases from left (A) to right (E)
as more layers are added
(the added layers are shown in bold)

Convolutional layer parameters denoted


“conv (receptive field size) –(no. of channels)”

ReLU activation not shown for brevity
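
To make the "conv(receptive field size)–(no. of channels)" notation concrete, here is a PyTorch-style sketch of a VGG-like stack (assuming PyTorch is available); the channel counts are illustrative, not the exact configuration from the figure.

```python
import torch.nn as nn

# "conv3-64" reads as: 3x3 receptive field, 64 output channels.
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   nn.ReLU(),   # conv3-64
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  nn.ReLU(),   # conv3-64
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),   # conv3-128
    nn.MaxPool2d(2),
)
```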


Why are they neural networks?


• These networks are loosely inspired by
neuroscience
• Each unit resembles a neuron
– Receives input from many other units
– Computes its own activation value
• Choice of functions f (i)(x):
– Loosely guided by neuroscientific observations
about biological neurons
• Modern neural networks are guided by many
mathematical and engineering disciplines
• They do not perfectly model the brain

Function Approximation is the Goal


• Think of feedforward networks as function
approximation machines
– Designed to achieve statistical generalization
• Occasionally draw insights from what we know
about the brain
– Rather than as models of brain function


Understanding Feedforward Nets


• Begin with linear networks and understand their
limitations
• Linear models such as logistic regression and
linear regression can be fit reliably and
efficiently using either
– Closed-form solution
– Convex optimization
• Limitation: capacity is restricted to linear functions, so the model cannot capture interactions between input variables


Extending Linear Models


• To represent non-linear functions of x
  – apply a linear model to a transformed input ϕ(x)
    • where ϕ is non-linear
  – Equivalently, the kernel trick of the SVM obtains nonlinearity
SVM Kernel trick

• Many ML algorithms can be rewritten as dot products between examples:
  f (x) = wTx + b can be written as b + Σi αi xTx(i)
  where x(i) is a training example and α is a vector of coefficients
  – This allows us to replace x with a feature function ϕ(x), and the dot product with a function k(x, x(i)) = ϕ(x)·ϕ(x(i)) called a kernel
    • The · operator represents an inner product analogous to ϕ(x)Tϕ(x(i))
    • For some feature spaces we may not literally use an inner product
      – In continuous spaces, an inner product based on integration
  – Gaussian kernel
    • Consider k(u,v) = exp(−||u−v||²/2σ²)
      – Expanding the square, ||u−v||² = uTu + vTv − 2uTv,
      – gives k(u,v) = exp(−uTu/2σ²) exp(uTv/σ²) exp(−vTv/2σ²)
    • Validity follows from kernel construction rules
SVM Prediction

• Use linear regression on the Lagrangian to determine the weights αi
• We can make predictions using
– f (x)= b + Σiαi k(x,x(i))
– Function is nonlinear wrt x but relationship between
ϕ(x) and f (x) is linear
– Also the relationship between α and f (x) is linear
– We can think of ϕ as providing a set of features
• describing x or providing a new representation for x
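
A small NumPy sketch of this kernelized predictor; the Gaussian kernel matches the previous slide, but the training points, α, and b below are made-up illustration values rather than the result of actually solving the SVM problem.

```python
import numpy as np

def gaussian_kernel(u, v, sigma=1.0):
    # k(u, v) = exp(-||u - v||^2 / (2 sigma^2))
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
alpha   = np.array([-1., 1., 1., -1.])     # assumed coefficients for illustration
b       = 0.0

def f(x):
    # f(x) = b + sum_i alpha_i k(x, x^(i)) -- nonlinear in x, linear in alpha
    return b + sum(a * gaussian_kernel(x, xi) for a, xi in zip(alpha, X_train))

print(f(np.array([0., 1.])))
```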

Disadvantages of Kernel Methods


• Cost of decision function evaluation: linear in m
– Because the ith example contributes term αi k(x, x(i))
to the decision function
– Can mitigate this by learning an α with mostly zeros
• Classification requires evaluating the kernel function only
for training examples that have a nonzero αi
• These are known as support vectors
• Cost of training: high with large data sets
• Generic kernels struggle to generalize well
– Neural net outperformed RBF-SVM on MNIST
• Also, how do we choose the mapping ϕ?

Options for choosing mapping ϕ


1. Generic feature function ϕ (x)
– Radial basis function
2. Manually engineer ϕ
– Feature engineering
3. Principle of Deep Learning: Learn ϕ


Option 1 to choose the mapping ϕ


• Generic feature function ϕ(x)
  – The infinite-dimensional ϕ that is implicitly used by kernel machines based on the RBF
    • RBF: N(x; x(i), σ²I) centered at x(i)
      – x(i): obtained from k-means clustering
      – σ: the mean distance between each unit j and its closest neighbor
  – If ϕ(x) is of high enough dimension, we have enough capacity to fit the training set
    • Generalization to the test set remains poor
    • Generic feature mappings are based only on the principle of smoothness
      – They do not include enough prior information to solve advanced problems
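
A short sketch of option 1: a fixed RBF feature map followed by a linear model. The centers would normally come from k-means and σ from the mean-distance rule; here they are hand-picked stand-ins, and the weights are arbitrary illustration values.

```python
import numpy as np

def rbf_features(x, centers, sigma):
    # phi_i(x) = exp(-||x - x^(i)||^2 / (2 sigma^2)), one feature per center
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

centers = np.array([[0., 0.], [1., 1.], [0.5, 0.]])   # stand-ins for k-means centers
sigma   = 0.7                                         # stand-in for the mean-distance rule
w       = np.array([0.2, -0.4, 1.0])                  # linear model applied to phi(x)

x = np.array([0.9, 0.1])
print(w @ rbf_features(x, centers, sigma))
```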

Option 2 to choose the mapping ϕ


• Manually engineer ϕ
• This was the dominant approach until the arrival of deep learning
• Requires decades of effort
– e.g., speech recognition, computer vision
• Little transfer between domains


Option 3 to choose the mapping ϕ


• Strategy of Deep learning: Learn ϕ
• Model is y=f (x; θ,w) = ϕ(x; θ)T w
– θ used to learn ϕ from broad class of functions
– Parameters w map from ϕ (x) to output
– Defines a feedforward network in which ϕ defines a hidden layer
• Unlike the other two approaches (fixed basis functions, manual engineering), this one gives up on the convexity of the training problem
  – But its benefits outweigh the harms

Extend Linear Methods to Learn ϕ


K outputs y1,..,yK for a given input x; the hidden layer consists of M units ϕ1,..,ϕM (with ϕ0 as the bias unit), input-to-hidden weights θji, and hidden-to-output weights wkj

$$y_k(\mathbf{x};\boldsymbol{\theta},\mathbf{w}) \;=\; \sum_{j=1}^{M} w_{kj}\,\phi_j\!\left(\sum_{i=1}^{D}\theta_{ji}\,x_i + \theta_{j0}\right) + w_{k0}$$

yk = fk(x; θ,w) = ϕ(x; θ)Tw

This can be viewed as a generalization of linear models:
• a nonlinear function fk with M+1 parameters wk = (wk0,..,wkM)
• M basis functions ϕj, j = 1,..,M, each with D parameters θj = (θj1,..,θjD)
• Both wk and θj are learnt from data

Approaches to Learning ϕ
• Parameterize the basis functions as ϕ(x;θ)
– Use optimization to find θ that corresponds to a
good representation
• Approach can capture benefit of first approach
(fixed basis functions) by being highly generic
– By using a broad family for ϕ(x;θ)
• Can also capture benefits of second approach
– Human practitioners design families of ϕ(x;θ) that
will perform well
– Need only find the right function family rather than precisely the right function

Importance of Learning ϕ
• Learning ϕ is discussed beyond this first introduction to feedforward networks
  – It is a recurring theme throughout deep learning, applicable to all kinds of models
• Feedforward networks are the application of this principle to learning deterministic mappings from x to y without feedback
• The same principle is applicable to
  – learning stochastic mappings
  – functions with feedback
  – learning probability distributions over a single vector

Plan of Discussion: Feedforward Networks


1. A simple example: learning XOR
2. Design decisions for a feedforward network
– Many are the same as for designing a linear model
  • Basics of gradient descent
    – Choosing the optimizer, the cost function, and the form of the output units
– Some are unique
  • The concept of a hidden layer
    – Makes it necessary to have activation functions
  • The architecture of the network
    – How many layers, how they are connected to each other, and how many units in each layer
  • Learning requires gradients of complicated functions
    – Backpropagation and modern generalizations

1. Ex: XOR problem


• XOR: an operation on binary variables x1 and x2
– When exactly one of the values equals 1 it returns 1; otherwise it returns 0
– Target function is y=f *(x) that we want to learn
• Our model is y =f ([x1, x2] ; θ) which we learn, i.e., adapt
parameters θ to make it similar to f *
• Not concerned with statistical generalization
– Perform correctly on four training points:
• X={[0,0]T, [0,1]T,[1,0]T, [1,1]T}
– Challenge is to fit the training set
• We want f ([0,0]T; θ) = f ([1,1]T; θ) = 0
• f ([0,1]T; θ) = f ([1,0]T; θ) = 1

ML for XOR: linear model doesn’t fit


• Treat it as regression with an MSE loss function
$$J(\theta) \;=\; \frac{1}{4}\sum_{x\in X}\bigl(f^{*}(x)-f(x;\theta)\bigr)^{2} \;=\; \frac{1}{4}\sum_{n=1}^{4}\bigl(f^{*}(x_n)-f(x_n;\theta)\bigr)^{2}$$
  – MSE is usually not used for binary data, but here the math is simple
  – An alternative is the cross-entropy
$$J(\theta) = -\ln p(t\mid\theta) = -\sum_{n=1}^{N}\bigl\{t_n \ln y_n + (1-t_n)\ln(1-y_n)\bigr\},\qquad y_n = \sigma(\theta^{T}x_n)$$
• We must choose the form of the model
  – Consider a linear model with θ = {w, b} where f (x; w,b) = xTw + b
  – Minimize
$$J(\theta) \;=\; \frac{1}{4}\sum_{n=1}^{4}\bigl(t_n - x_n^{T}w - b\bigr)^{2}$$
    to get a closed-form solution
  • Differentiating wrt w and b gives w = 0 and b = ½
  – Then the linear model f (x; w,b) = ½ simply outputs 0.5 everywhere
  – Why does this happen?
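
The closed-form result can be checked numerically with a quick least-squares fit (a sketch, using NumPy's generic solver rather than deriving the normal equations by hand):

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 0.])

A = np.hstack([X, np.ones((4, 1))])            # last column multiplies the bias b
theta, *_ = np.linalg.lstsq(A, t, rcond=None)  # minimizes sum (t_n - x_n^T w - b)^2
w, b = theta[:2], theta[2]

print(w, b)        # w is (approximately) [0, 0] and b is 0.5
print(A @ theta)   # the model outputs 0.5 at every one of the four points
```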

Linear model cannot solve XOR


• The bold numbers in the figure are the values the system must output
• When x1 = 0, the output has to increase with x2
• When x1 = 1, the output has to decrease with x2

• A linear model f (x; w,b) = x1w1 + x2w2 + b has to assign a single weight to x2, so it cannot solve this problem
• A better solution:
– use a model to learn a different representation
• in which a linear model is able to represent the solution
– We use a simple feedforward network
• one hidden layer containing two hidden units


Feedforward Network for XOR

• Introduce a simple feedforward network
  – with one hidden layer containing two units
• Same network drawn in two different
styles
– Matrix W describes mapping from x to h
– Vector w describes mapping from h to y
– Intercept parameters b are omitted

Functions computed by Network


• Layer 1 (hidden layer): vector of hidden
units h computed by function f (1)(x; W,c)
– c are bias variables
• Layer 2 (output layer) computes
f (2)(h; w,b)
– w are linear regression weights
– Output is linear regression applied to h
rather than to x
• The complete model is
  f (x; W,c,w,b) = f (2)( f (1)(x))

Linear vs Nonlinear functions


• If we choose both f (1) and f (2) to be linear, the total function will still be linear
  – Suppose f (1)(x) = WTx and f (2)(h) = hTw
  – Then we could represent this function as f (x) = xTw′ where w′ = Ww
• Since linear is insufficient, we must use a nonlinear function to describe the features
  – We use the strategy of neural networks:
  – a nonlinear activation function h = g(WTx + c)
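
The collapse of two linear layers into one can be verified directly (a sketch with arbitrary random weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))     # f1(x) = W^T x maps 2 inputs to 3 hidden values
w = rng.normal(size=3)          # f2(h) = h^T w maps the 3 hidden values to 1 output
x = rng.normal(size=2)

composite = (W.T @ x) @ w       # f2(f1(x))
single    = x @ (W @ w)         # the same function with w' = W w

print(np.allclose(composite, single))   # True: the composition is still linear in x
```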

Activation Function
• In linear regression we used a vector of weights w and a scalar bias b,
  f (x; w,b) = xTw + b,
  – to describe an affine transformation from an input vector to an output scalar
• Now we describe an affine transformation from a vector x to a vector h, so an entire vector of bias parameters is needed
• The activation function g is typically chosen to be applied element-wise: hi = g(xTW:,i + ci)
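
In NumPy terms (a sketch, using ReLU from the next slide as the example g, and the small W, c that appear later in the XOR solution):

```python
import numpy as np

g = lambda z: np.maximum(0.0, z)   # applied element-wise

W = np.array([[1., 1.],
              [1., 1.]])
c = np.array([0., -1.])
x = np.array([1., 0.])

h = g(x @ W + c)                   # h_i = g(x^T W[:, i] + c_i) for every i at once
print(h)                           # [1. 0.]
```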


Default Activation Function


• Activation: g(z) = max{0, z}
  – Applying this to the output of a linear transformation yields a nonlinear transformation
  – However, the function remains close to linear
    • Piecewise linear, with two pieces
    • It therefore preserves the properties that make linear models easy to optimize with gradient-based methods
    • It also preserves many of the properties that make linear models generalize well

A principle of CS: build complicated systems from minimal components. A Turing machine's memory needs only 0 and 1 states. We can build a universal function approximator from ReLUs.

Specifying the Network using ReLU


• Activation: g(z)=max{0,z}
• We can now specify the complete network as
f (x; W,c,w,b)=f (2)(f (1)(x))=wT max {0,WTx+c}+b

We can now specify the XOR Solution

f (x; W,c,w,b) = wT max{0, WTx + c} + b

• Let
$$W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},\qquad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix},\qquad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix},\qquad b = 0$$
• Now walk through how the model processes a batch of inputs
• The design matrix X of all four points, and the first step XW:
$$X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix},\qquad XW = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}$$
• Adding c (broadcast to each row):
$$XW + c^{T} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}$$
  – In this space all points lie along a line with slope 1, which cannot be implemented by a linear model
• Compute h using ReLU:
$$\max\{0,\; XW + c^{T}\} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}$$
  – This has changed the relationship among the examples: they no longer lie on a single line, and a linear model suffices
• Finish by multiplying by w:
$$f(X) = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
• The network has obtained the correct answer for all 4 examples
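
The walkthrough above can be reproduced in a few lines of NumPy (values exactly as specified on this slide):

```python
import numpy as np

W = np.array([[1., 1.],
              [1., 1.]])
c = np.array([0., -1.])
w = np.array([1., -2.])
b = 0.0

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
H = np.maximum(0.0, X @ W + c)   # hidden representation: one row of h per example
print(H @ w + b)                 # [0. 1. 1. 0.] -- the XOR of each input row
```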


Learned representation for XOR


• The two points that must have output 1 have been collapsed into a single point
  – The points x = [0,1]T and x = [1,0]T have both been mapped to h = [1,0]T
• In the original x space: when x1 = 0 the output has to increase with x2, and when x1 = 1 the output has to decrease with x2
• In the learned h space: when h1 = 0 the output is 0, when h1 = 1 the output is 1, and when h1 = 2 the output is 0
  – The solution can now be described by a linear model in h: for fixed h2, the output increases linearly with h1

About the XOR example


• We simply specified the solution
– Then showed that it achieves zero error
• In real situations there might be billions of
parameters and billions of training examples
– So one cannot simply guess the solution
• Instead gradient descent optimization can find
parameters that produce very little error
– The solution described is at the global minimum
• Gradient descent could converge to this solution
• Convergence depends on initial values
• Would not always find easily understood integer solutions
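
For completeness, here is a sketch of finding such parameters by gradient descent on the MSE loss with hand-coded gradients. As the slide notes, whether it reaches near-zero error (or the tidy integer solution above) depends on the initialization and learning rate; these hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 0.])

W = rng.normal(size=(2, 2)); c = np.zeros(2)      # hidden layer parameters
w = rng.normal(size=2);      b = 0.0              # output layer parameters
lr = 0.05

for step in range(5000):
    Z = X @ W + c                    # pre-activations
    H = np.maximum(0.0, Z)           # ReLU hidden units
    y = H @ w + b                    # network outputs
    err = (y - t) / len(X)           # gradient of the (halved) MSE wrt y

    grad_w = H.T @ err
    grad_b = err.sum()
    dZ = np.outer(err, w) * (Z > 0)  # backpropagate through the ReLU
    grad_W = X.T @ dZ
    grad_c = dZ.sum(axis=0)

    w -= lr * grad_w; b -= lr * grad_b
    W -= lr * grad_W; c -= lr * grad_c

print(np.round(np.maximum(0.0, X @ W + c) @ w + b, 3))   # ideally close to [0 1 1 0]
```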
