Chapter 2 - 3 Deep Neural Network
W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.
• 2-layer NN
• 1 hidden layer

[Figure: a 2-layer network. Input layer: $x_1, x_2, x_3$, with $a^{[0]} = X$. Hidden layer (layer 1): four units $a_1^{[1]}, a_2^{[1]}, a_3^{[1]}, a_4^{[1]}$. Output layer (layer 2): $\hat{y} = a^{[2]}$.]

$a^{[1]} = \begin{bmatrix} a_1^{[1]} \\ a_2^{[1]} \\ a_3^{[1]} \\ a_4^{[1]} \end{bmatrix}$
[Figure: a single neuron. Inputs $x_1, x_2, x_3$ feed the computation $z$, then $\sigma(z)$ produces $a = \hat{y}$.]

$z = w^T x + b$
$a = \sigma(z)$
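As a sketch, the single-neuron computation above can be written in NumPy; the weight, input, and bias values below are hypothetical toy numbers, not from the slides:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(w, x, b):
    """Single-neuron forward pass: z = w^T x + b, then a = sigma(z)."""
    z = np.dot(w.T, x) + b
    return sigmoid(z)

# Hypothetical toy values for a 3-feature input (x1, x2, x3 as in the figure).
w = np.array([[0.1], [0.2], [-0.3]])   # shape (3, 1)
x = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1)
b = 0.5
a = neuron_forward(w, x, b)            # a (1, 1) value in (0, 1)
```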
[Figure: plots of the four activation functions against $z$.]

Activation   Formula                                    Derivative
sigmoid      $a = \frac{1}{1 + e^{-z}}$                 $a(1 - a)$
tanh         $a = \frac{e^z - e^{-z}}{e^z + e^{-z}}$    $1 - a^2$
ReLU         $a = \max(0, z)$                           $0$ if $z < 0$; $1$ if $z \ge 0$
Leaky ReLU   $a = \max(0.01z, z)$                       $0.01$ if $z < 0$; $1$ if $z \ge 0$
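The four activations and their derivatives can be sketched in NumPy as below; the function names and the check point z0 are my own choices, not from the lecture:

```python
import numpy as np

# The four activations from the table. Derivatives are written in terms of
# the activation value a where the table expresses them that way.
def sigmoid(z):        return 1.0 / (1.0 + np.exp(-z))
def d_sigmoid(a):      return a * (1.0 - a)            # a(1 - a)
def tanh_act(z):       return np.tanh(z)
def d_tanh(a):         return 1.0 - a ** 2             # 1 - a^2
def relu(z):           return np.maximum(0.0, z)
def d_relu(z):         return np.where(z >= 0, 1.0, 0.0)
def leaky_relu(z):     return np.maximum(0.01 * z, z)
def d_leaky_relu(z):   return np.where(z >= 0, 1.0, 0.01)

# Numerical check of the sigmoid derivative at a hypothetical point z0.
z0, eps = 0.3, 1e-6
numeric = (sigmoid(z0 + eps) - sigmoid(z0 - eps)) / (2 * eps)
analytic = d_sigmoid(sigmoid(z0))      # matches numeric to high precision
```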
Minhhuy Le, ICSLab, Phenikaa Uni. 9
Previous Lecture Overview: Vectorizing across multiple examples

Stack the $m$ training examples as columns: $X = [x^{(1)}\; x^{(2)}\; \dots\; x^{(m)}]$.

Loop version:
for i = 1 to m:
    $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
    $a^{[1](i)} = \sigma(z^{[1](i)})$
    $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$
    $a^{[2](i)} = \sigma(z^{[2](i)})$
Vectorized version:
    $Z^{[1]} = W^{[1]} X + b^{[1]}$
    $A^{[1]} = \sigma(Z^{[1]})$, where $A^{[1]} = [a^{[1](1)}\; a^{[1](2)}\; \dots\; a^{[1](m)}]$
    $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$
    $A^{[2]} = \sigma(Z^{[2]})$
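A minimal sketch contrasting the loop and the vectorized forward pass; all layer sizes, the random seed, and the weight values below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, 4 hidden units, 1 output, m = 5 examples.
rng = np.random.default_rng(0)
n_x, n_1, n_2, m = 3, 4, 1, 5
X = rng.standard_normal((n_x, m))
W1, b1 = rng.standard_normal((n_1, n_x)), rng.standard_normal((n_1, 1))
W2, b2 = rng.standard_normal((n_2, n_1)), rng.standard_normal((n_2, 1))

# Loop version: process one column x^(i) at a time.
A2_loop = np.zeros((n_2, m))
for i in range(m):
    x_i = X[:, i:i + 1]                       # keep the column 2-D
    a1 = sigmoid(W1 @ x_i + b1)
    A2_loop[:, i:i + 1] = sigmoid(W2 @ a1 + b2)

# Vectorized version: all m columns at once; b broadcasts across columns.
A1 = sigmoid(W1 @ X + b1)
A2 = sigmoid(W2 @ A1 + b2)                    # identical to A2_loop
```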
Backward propagation (single example, 2-layer network):
    $dz^{[2]} = a^{[2]} - y$
    $dW^{[2]} = dz^{[2]} a^{[1]T}$
    $db^{[2]} = dz^{[2]}$
    $dz^{[1]} = W^{[2]T} dz^{[2]} \ast g^{[1]\prime}(z^{[1]})$
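These equations can be sketched as follows, assuming sigmoid activations in both layers (so $g^{[1]\prime}$ becomes $a(1-a)$) and hypothetical random values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single-example forward pass: 3 inputs, 4 hidden units.
rng = np.random.default_rng(1)
x = rng.standard_normal((3, 1))
y = 1.0
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)

# Backward pass: the four equations above, in order.
dz2 = a2 - y                           # dz^[2] = a^[2] - y
dW2 = dz2 @ a1.T                       # dW^[2] = dz^[2] a^[1]T
db2 = dz2                              # db^[2] = dz^[2]
dz1 = (W2.T @ dz2) * (a1 * (1 - a1))   # g^[1]' for sigmoid is a(1 - a)
```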
Previous Lecture Overview: Vectorizing Gradient Descent

[Figure: three networks of increasing depth: a 1-layer network ($x_1, x_2, x_3 \to \hat{y}$), a 2-layer network ($x_1, x_2, x_3 \to \hat{y}$), and a 5-layer network ($x_1, x_2 \to \hat{y}$).]

Notation (for the 5-layer example):
• #Layers = $L$ = 5
• $n^{[l]}$ = #units in layer $l$
• $n^{[1]} = 3$, $n^{[2]} = 5$, $n^{[3]} = 4$, $n^{[4]} = 2$, $n^{[5]} = n^{[L]} = 1$
• $n^{[0]} = n_x = 2$
• $a^{[l]} = g^{[l]}(z^{[l]})$
• $W^{[l]}$ = weights for computing $z^{[l]}$
• $b^{[l]}$ = bias for computing $z^{[l]}$
• $a^{[0]} = X$ (input)
• $a^{[L]} = \hat{y}$ (prediction output)
Matrix Dimensions

Quantity                                    Shape
$Z^{[l]}, A^{[l]}, dZ^{[l]}, dA^{[l]}$      $(n^{[l]}, m)$
$W^{[l]}, dW^{[l]}$                         $(n^{[l]}, n^{[l-1]})$
$b^{[l]}, db^{[l]}$                         $(n^{[l]}, 1)$, broadcast over the $m$ columns
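A quick shape check of the table, using the 5-layer example's layer sizes and a hypothetical batch size m = 7 (tanh is used here just as a placeholder activation):

```python
import numpy as np

# Layer sizes from the 5-layer example: n^[0] = n_x = 2, ..., n^[5] = 1.
n = [2, 3, 5, 4, 2, 1]
m = 7                                           # hypothetical batch size
rng = np.random.default_rng(2)

A = rng.standard_normal((n[0], m))              # A^[0] = X, shape (n^[0], m)
for l in range(1, len(n)):
    W = rng.standard_normal((n[l], n[l - 1]))   # W^[l]: (n^[l], n^[l-1])
    b = np.zeros((n[l], 1))                     # b^[l]: (n^[l], 1), broadcasts over m
    Z = W @ A + b                               # Z^[l]: (n^[l], m)
    A = np.tanh(Z)                              # A^[l]: (n^[l], m)
```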
• Input: $a^{[l-1]}$
• Forward Propagation:
    • For a single example: $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$,  $a^{[l]} = g^{[l]}(z^{[l]})$
    • Vectorized version: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$,  $A^{[l]} = g^{[l]}(Z^{[l]})$
• Output: $a^{[l]}$
• Cache: $z^{[l]}$, $W^{[l]}$, $b^{[l]}$
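The per-layer step above can be sketched as a small function that returns both the activation and the cache; the names and sizes are illustrative, not the course's assignment API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(A_prev, W, b, g=sigmoid):
    """One forward-propagation step for layer l.

    Input is A_prev = A^[l-1]; output is A^[l] = g(Z^[l]); the cache keeps
    the quantities the backward pass will need (A^[l-1], W^[l], b^[l], Z^[l]).
    """
    Z = W @ A_prev + b
    A = g(Z)
    cache = (A_prev, W, b, Z)
    return A, cache

# Hypothetical layer: 4 units fed by 3 inputs, batch of 5 examples.
rng = np.random.default_rng(3)
A_prev = rng.standard_normal((3, 5))
W, b = rng.standard_normal((4, 3)), np.zeros((4, 1))
A, cache = layer_forward(A_prev, W, b)
```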
Forward Propagation:
    $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$
    $A^{[2]} = g^{[2]}(Z^{[2]})$
    …
    $A^{[L]} = g^{[L]}(Z^{[L]}) = \hat{Y}$

Backward Propagation:
    $dZ^{[L]} = A^{[L]} - Y$
    $dW^{[L]} = \frac{1}{m} dZ^{[L]} A^{[L-1]T}$
    db^[L] = (1/m) np.sum(dZ^[L], axis=1, keepdims=True)
    $dZ^{[L-1]} = W^{[L]T} dZ^{[L]} \ast g^{[L-1]\prime}(Z^{[L-1]})$
    …
    $dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast g^{[1]\prime}(Z^{[1]})$
    $dW^{[1]} = \frac{1}{m} dZ^{[1]} A^{[0]T}$
    db^[1] = (1/m) np.sum(dZ^[1], axis=1, keepdims=True)
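Putting forward and backward propagation together, a minimal sketch of the full pass (ReLU hidden layers, sigmoid output; all sizes and data below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical L-layer setup: n^[0..L] sizes, L = 3, batch of m = 6 examples.
rng = np.random.default_rng(4)
n = [2, 4, 3, 1]
m = 6
L = len(n) - 1
X = rng.standard_normal((n[0], m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W = [None] + [rng.standard_normal((n[l], n[l - 1])) * 0.5 for l in range(1, L + 1)]
b = [None] + [np.zeros((n[l], 1)) for l in range(1, L + 1)]

# Forward propagation: Z^[l] = W^[l] A^[l-1] + b^[l], A^[l] = g^[l](Z^[l]).
A = [X] + [None] * L
Z = [None] * (L + 1)
for l in range(1, L + 1):
    Z[l] = W[l] @ A[l - 1] + b[l]
    A[l] = sigmoid(Z[l]) if l == L else relu(Z[l])

# Backward propagation, starting from dZ^[L] = A^[L] - Y.
dZ = A[L] - Y
dW = [None] * (L + 1)
db = [None] * (L + 1)
for l in range(L, 0, -1):
    dW[l] = (dZ @ A[l - 1].T) / m
    db[l] = np.sum(dZ, axis=1, keepdims=True) / m
    if l > 1:
        # ReLU derivative g'(Z^[l-1]): 1 where z >= 0, else 0.
        dZ = (W[l].T @ dZ) * (Z[l - 1] >= 0)
```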
On Assignments
Coding time !!!