6.1 Deep Feedforward Networks
[Figure omitted; source: https://towardsdatascience.com/probability-and-statistics-explained-in-the-context-of-deep-learning-ed1509b2eb3f]
[Figure omitted: two-layer neural network; source: https://www.easy-tensorflow.com/tf-tutorials/neural-networks/two-layer-neural-network]
Flow of Information
• Models are called feedforward because, to evaluate y = f(x), information flows one way: from x, through the intermediate computations defining f, to the output y
• There are no feedback connections
– No outputs of the model are fed back into itself
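To make the one-way flow concrete, here is a minimal sketch (not from the slides); the function and values are illustrative:

```python
# Evaluating a feedforward model is a single one-way pass:
# x flows through the intermediate computations defining f to the output y.
def f(x):
    h = 2.0 * x + 1.0       # intermediate computation
    y = max(0.0, h)         # output
    return y                # y is never fed back into h or x

print(f(-3.0), f(2.0))      # -> 0.0 5.0
```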
Definition of Depth
• The overall length of the chain is the depth of the model
– Ex: the composite function f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x))) has depth 3
• The name deep learning arises from this terminology
• The final layer of a feedforward network, e.g., f^{(3)}, is called the output layer
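As an illustration, a depth-3 chain in NumPy (a sketch; the layer sizes and random weights are placeholders, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for a chain f(x) = f3(f2(f1(x))) of depth 3.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # f(1): R^3 -> R^4
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # f(2): R^4 -> R^4
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)   # f(3): the output layer

def relu(z):
    return np.maximum(0.0, z)

def f(x):
    h1 = relu(W1 @ x + b1)      # f(1)(x)
    h2 = relu(W2 @ h1 + b2)     # f(2)(f(1)(x))
    return W3 @ h2 + b3         # f(3)(f(2)(f(1)(x)))

y = f(np.array([1.0, -2.0, 0.5]))
```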
$$y_k(\mathbf{x},\mathbf{w}) = \sigma\!\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\!\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)$$
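A direct transcription of this formula as code (a sketch; tanh is used as an illustrative choice for the hidden activation h):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def two_layer_net(x, W1, W2, h=np.tanh):
    # W1 is (M, D+1): row j holds (w_j0^(1), w_j1^(1), ..., w_jD^(1)).
    # W2 is (K, M+1): row k holds (w_k0^(2), w_k1^(2), ..., w_kM^(2)).
    a = W1 @ np.append(1.0, x)               # hidden pre-activations; biases via x_0 = 1
    z = h(a)                                 # hidden-unit outputs
    return sigmoid(W2 @ np.append(1.0, z))   # y_k = sigma(sum_j w_kj^(2) z_j + w_k0^(2))

# Example with D=2 inputs, M=3 hidden units, K=1 output (random weights).
rng = np.random.default_rng(1)
y = two_layer_net(np.array([0.5, -1.0]), rng.normal(size=(3, 3)), rng.normal(size=(1, 4)))
```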
Width of Model
• Each hidden layer is typically vector-valued
• The dimensionality of the hidden-layer vector is the width of the model
Units of a Model
• Each element of the vector is viewed as a neuron
– Instead of thinking of the layer as a single vector-to-vector function, its elements are regarded as units acting in parallel (see the sketch below)
• Each unit receives inputs from many other units and computes its own activation value
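A small sketch of the two views (the weights and inputs are illustrative); note that M = 3 here is also the width of the layer:

```python
import numpy as np

W = np.array([[1.0, -1.0],
              [0.5,  2.0],
              [-2.0, 1.0]])          # M=3 units (the layer's width), D=2 inputs
b = np.zeros(3)
x = np.array([0.3, -0.7])

def relu(z):
    return np.maximum(0.0, z)

# View 1: the layer as a single vector-to-vector function.
h_vector = relu(W @ x + b)

# View 2: M units in parallel, each computing its own activation.
h_units = np.array([relu(W[j] @ x + b[j]) for j in range(3)])

assert np.allclose(h_vector, h_units)
```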
$$y_k(\mathbf{x};\, \boldsymbol{\theta}, \mathbf{w}) = \sum_{j=1}^{M} w_{kj} \, \phi_j\!\left( \sum_{i=1}^{D} \theta_{ji} x_i + \theta_{j0} \right) + w_{k0}$$

Equivalently, with a constant basis function ϕ_0 ≡ 1 absorbing the bias:

$$y_k = f_k(\mathbf{x};\, \boldsymbol{\theta}, \mathbf{w}) = \boldsymbol{\phi}(\mathbf{x};\, \boldsymbol{\theta})^T \mathbf{w}_k$$

This can be viewed as a generalization of linear models:
• A nonlinear function f_k with M+1 parameters w_k = (w_k0, ..., w_kM)
• M basis functions ϕ_j, j = 1, ..., M, each with D+1 parameters θ_j = (θ_j0, ..., θ_jD)
• Both w_k and θ_j are learned from data
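A sketch of this model in NumPy (tanh is an illustrative choice for the basis-function nonlinearity; ϕ_0 ≡ 1 carries the bias w_k0):

```python
import numpy as np

def phi(x, Theta):
    # Theta is (M, D+1): row j holds (theta_j0, theta_j1, ..., theta_jD).
    a = Theta @ np.append(1.0, x)          # sum_i theta_ji x_i + theta_j0
    return np.append(1.0, np.tanh(a))      # (phi_0, phi_1, ..., phi_M)

def y(x, Theta, Wout):
    # y_k(x; theta, w) = sum_j w_kj phi_j(x; theta) + w_k0 = phi(x; theta)^T w_k
    # Wout is (K, M+1): row k holds (w_k0, w_k1, ..., w_kM).
    return Wout @ phi(x, Theta)
```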
Approaches to Learning ϕ
• Parameterize the basis functions as ϕ(x;θ)
– Use optimization to find the θ that corresponds to a good representation (see the sketch after this list)
• This approach can capture the benefit of the first approach (fixed, generic basis functions) by being highly generic
– By using a broad family for ϕ(x;θ)
• It can also capture the benefits of the second approach (hand-designed basis functions)
– Human practitioners design families of ϕ(x;θ) that they expect to perform well
– They need only find the right function family rather than the precise right function
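A minimal sketch of learning θ and w jointly by gradient descent on squared error, assuming tanh basis functions without biases; the toy data and all names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # toy inputs, D=2
t = np.sin(X[:, 0]) + X[:, 1] ** 2         # toy targets

M, lr = 8, 0.05
Theta = rng.normal(scale=0.5, size=(M, 2)) # basis-function parameters theta_j
w = rng.normal(scale=0.5, size=M)          # output weights w_j

for _ in range(2000):
    Phi = np.tanh(X @ Theta.T)             # learned representation phi(x; theta)
    err = Phi @ w - t
    # Gradients of mean squared error wrt w and Theta (chain rule).
    grad_w = Phi.T @ err / len(X)
    grad_Theta = ((err[:, None] * w) * (1 - Phi ** 2)).T @ X / len(X)
    w -= lr * grad_w
    Theta -= lr * grad_Theta
```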
Importance of Learning ϕ
• Learning ϕ is discussed beyond this first introduction to feedforward networks
– It is a recurring theme throughout deep learning, applicable to all kinds of models
• Feedforward networks are the application of this principle to learning deterministic mappings from x to y without feedback
• The principle is equally applicable to
– learning stochastic mappings
– functions with feedback
– learning probability distributions over a single vector
• MSE loss: $J(\theta) = \frac{1}{4}\sum_{x\in X} \left(f^*(x) - f(x;\theta)\right)^2$
– Usually not used for binary data
– But the math is simple
• An alternative is cross-entropy:
$$J(\theta) = -\ln p(\mathbf{t}\,|\,\theta) = -\sum_{n=1}^{N} \left\{ t_n \ln y_n + (1-t_n)\ln(1-y_n) \right\}, \qquad y_n = \sigma(\theta^T \mathbf{x}_n)$$
• Minimize $J(w,b) = \frac{1}{4}\sum_{n=1}^{4} \left(t_n - \mathbf{x}_n^T \mathbf{w} - b\right)^2$
• Differentiate wrt w and b to obtain w = 0 and b = ½
– Then the linear model f(x;w,b) = ½ simply outputs 0.5 everywhere
– Why does this happen?
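This can be checked numerically; a minimal sketch solving the least-squares problem in closed form on the four XOR points:

```python
import numpy as np

# The four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])

# Append a column of ones so the bias b is fit jointly with w.
A = np.hstack([X, np.ones((4, 1))])

# Least-squares minimizer of (1/4) sum_n (t_n - x_n^T w - b)^2.
theta, *_ = np.linalg.lstsq(A, t, rcond=None)
w, b = theta[:2], theta[2]
print(w, b)        # -> [0. 0.] 0.5
print(A @ theta)   # -> [0.5 0.5 0.5 0.5]: the model outputs 0.5 everywhere
```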
Activation Function
• In linear regression we used a vector of weights w and a scalar bias b:
$$f(\mathbf{x};\, \mathbf{w}, b) = \mathbf{x}^T \mathbf{w} + b$$
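In the textbook this deck follows (Goodfellow et al., Ch. 6), the linear model is extended by applying a nonlinear activation function to an affine transform; a minimal sketch with ReLU, the usual default (values are illustrative):

```python
import numpy as np

def linear(x, w, b):
    return x @ w + b                 # f(x; w, b) = x^T w + b

def relu(z):
    return np.maximum(0.0, z)        # g(z) = max(0, z)

# A hidden unit applies the activation to the affine output: h = g(x^T w + b).
x, w, b = np.array([1.0, -2.0]), np.array([0.5, 0.3]), 0.1
h = relu(linear(x, w, b))            # -> max(0, 0.5 - 0.6 + 0.1) = 0.0
```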
• Finish by multiplying by w:
$$f(\mathbf{x}) = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
• The network has obtained the correct answer for all 4 examples
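The full forward pass can be verified numerically; the weights below are the standard XOR solution from Goodfellow et al. (Ch. 6):

```python
import numpy as np

# f(x) = w^T max(0, W^T x + c) + b
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])
b = 0.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
H = np.maximum(0.0, X @ W + c)   # hidden layer with ReLU
f = H @ w + b                    # finish by multiplying by w
print(f)                         # -> [0. 1. 1. 0.]: correct on all 4 examples
```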