Part 1.1: Neural Network and Training Algorithm
Instructor:
Assoc. Prof. Dr. Truong Ngoc Son
Chapter 1
Introduction to Neural Networks
How is a neuron modelled?
[Figure: model of a single neuron — inputs $x_1, \dots, x_n$ with synaptic weights $w_1, \dots, w_n$, bias $b$, activation function $f$, output $y$]
$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$
Activation functions
Sigmoid function: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
Tanh function: $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
ReLU function: $f(x) = \max(0, x)$
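The three functions translate directly into NumPy; a minimal sketch (the code and function names are ours, not from the slides):

import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # (e^x - e^{-x}) / (e^x + e^{-x}); equivalent to np.tanh(x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # max(0, x), element-wise
    return np.maximum(0.0, x)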
Neural network
[Figure: a neural network with an input layer, a hidden layer ($h_1, h_2, h_3$), and an output layer]
Artificial neural network
[Figure: two-layer network — inputs $x_1, \dots, x_n$ (input layer), hidden neurons $h_1, \dots, h_m$ (hidden layer), outputs $y_1, \dots, y_k$ (output layer), with weights $W$ and biases $b$]
Hidden layer (for the $j$-th hidden neuron):
$o_j = x_1 W_{j,1} + \dots + x_n W_{j,n} + b_{1,j}, \qquad h_j = f(o_j)$
Output layer (for the $k$-th output neuron):
$a_k = h_1 W_{k,1} + \dots + h_m W_{k,m} + b_{2,k}, \qquad y_k = f(a_k)$
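A minimal NumPy sketch of this two-layer forward pass (our own illustration; all sizes and names are assumptions):

import numpy as np

n, m, k = 4, 3, 2                          # assumed layer sizes
x = np.random.rand(n)                      # input vector
W1 = np.random.uniform(-0.1, 0.1, (m, n))  # hidden-layer weights
b1 = np.zeros(m)                           # hidden-layer biases b_{1,j}
W2 = np.random.uniform(-0.1, 0.1, (k, m))  # output-layer weights
b2 = np.zeros(k)                           # output-layer biases b_{2,k}

f = lambda z: 1.0 / (1.0 + np.exp(-z))     # sigmoid activation

h = f(W1 @ x + b1)   # hidden layer: h_j = f(o_j)
y = f(W2 @ h + b2)   # output layer: y_k = f(a_k)
print(y)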
TRAINING A NEURAL NETWORK
[Figure: supervised training loop — the network maps an input to an output, the output is compared with the desired output to compute an error, and the error drives the weight update]
Training an artificial neural network
Unsupervised learning
[Figure: unsupervised learning — input data is mapped to an output result without desired-output labels]
Simple neural network: understanding how a neural network learns
[Figure: a single neuron (input, weight $w$, activation $f$, output $y$) whose output is compared against a desired output]
Training a network
Training a neural network is the process of using an optimization algorithm to find the set of weights that best maps inputs to outputs.
$W^{*} = \arg\min_{W} \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left(f(x^{(i)}, W),\, y^{(i)}\right)$
For a function $f(x, y, z)$, the gradient collects the partial derivatives:
$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k}$
Stepping against the gradient decreases $f$:
$(x, y, z) \leftarrow \left(x - \frac{\partial f}{\partial x},\; y - \frac{\partial f}{\partial y},\; z - \frac{\partial f}{\partial z}\right)$
Training a network – Optimization of the loss
Gradient Descent
Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function $f$ that minimize a cost function (loss). This is done by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient.
$x_0 = (x_0, y_0, z_0)$
$x_{n+1} = x_n - \eta \nabla f(x_n)$
$(x, y, z) \leftarrow \left(x - \eta \frac{\partial f}{\partial x},\; y - \eta \frac{\partial f}{\partial y},\; z - \eta \frac{\partial f}{\partial z}\right)$
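A tiny sketch of this update rule (our own example, using $f(x, y, z) = x^2 + y^2 + z^2$, whose gradient is $(2x, 2y, 2z)$):

import numpy as np

def grad_f(p):
    # gradient of f(x, y, z) = x^2 + y^2 + z^2
    return 2.0 * p

eta = 0.1                        # learning rate
p = np.array([1.0, -2.0, 0.5])   # starting point x_0
for step in range(100):
    p = p - eta * grad_f(p)      # x_{n+1} = x_n - eta * grad f(x_n)
print(p)                         # approaches the minimum at (0, 0, 0)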
Optimization of the loss with gradient descent
Example: Linear regression
Model: $y = mx + b$
$\text{loss} = \sum_{i=1}^{n} \left(y_i - o_i\right)^2$, where $o_i$ is the predicted output and $y_i$ the desired output
Gradient-descent updates: $m \leftarrow m + \Delta m$, $b \leftarrow b + \Delta b$, with $\Delta m$ and $\Delta b$ derived from the gradient of the loss
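A compact NumPy sketch of this example (our own code; the synthetic data, learning rate, and epoch count are assumptions):

import numpy as np

# synthetic data scattered around the line y = 2x + 1
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.05 * np.random.randn(50)

m, b = 0.0, 0.0   # initial parameters
eta = 0.1         # learning rate
for epoch in range(1000):
    o = m * x + b                     # predicted output
    dm = np.mean(2.0 * (y - o) * x)   # delta_m from the gradient of the squared loss
    db = np.mean(2.0 * (y - o))       # delta_b
    m = m + eta * dm                  # m = m + delta_m
    b = b + eta * db                  # b = b + delta_b
print(m, b)  # should approach 2 and 1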
Optimization of the loss with gradient descent
Assignment 01: Logistic Regression
Apply the same gradient-descent training loop ($m \leftarrow m + \Delta m$, $b \leftarrow b + \Delta b$, minimizing the error between predicted and desired outputs) to a logistic-regression model.
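A possible starting point for the assignment (a hedged sketch of ours; it assumes a sigmoid over $mx + b$ and the same squared-error loss as above, rather than the usual cross-entropy):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic binary data: desired output 1 when x > 0.5
x = np.linspace(0.0, 1.0, 50)
y = (x > 0.5).astype(float)

m, b = 0.0, 0.0
eta = 0.5
for epoch in range(5000):
    o = sigmoid(m * x + b)              # predicted output
    d = (y - o) * o * (1.0 - o)         # error times sigmoid derivative
    m = m + eta * np.mean(2.0 * d * x)  # m = m + delta_m
    b = b + eta * np.mean(2.0 * d)      # b = b + delta_b
print(m, b)  # m grows positive; the boundary -b/m moves toward 0.5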
Training Neural networks – Optimization method
[Figure: single-layer network with 784 inputs $x_1, \dots, x_{784}$ and 10 sigmoid outputs $o_1, \dots, o_{10}$, with weights $W_{1,1}, \dots, W_{10,784}$]
LOSS OPTIMIZATION WITH GRADIENT DESCENT
Mathematical modeling of Training Process
Example
[Figure: the same single-layer network — 784 inputs, 10 sigmoid outputs]
Mathematical modeling of Training Process
Desired outputs, labels
[Figure: the 784-input, 10-output network with binary desired outputs (0 or 1) attached to each output neuron]
Mathematical modeling of Training Process
Desired outputs, labels
[Figure: the network again, with a different set of binary desired outputs at the output neurons]
Mathematical modeling of Training Process
Randomly initialize the weights $W$.
[Figure: with randomly initialized weights, the network outputs (e.g. 0.5, 0.6, 0.9) do not yet match the desired labels]
Mathematical modeling of Training Process
$W^{*} = \arg\min_{W} (\text{Loss})$
[Figure: after optimizing the loss, the outputs (e.g. 0.9, 0.2, 0.1) move toward the desired labels]
Mathematical modeling of Training Process
[Figure: the error decreases over the course of the training process]
Mathematical modeling of Training Process
For simplicity, let $b = 0$.
[Figure: the 784-input, 10-output network used in the derivation]
Formulate the output. For the $j$-th output:
$a_j = \sum_{i=1}^{784} x_i w_{j,i}, \qquad o_j = \sigma(a_j) = \frac{1}{1 + e^{-a_j}}$
Formulate the loss, averaged over the 10 training samples (indexed by $t$):
$L = \frac{1}{10} \sum_{t=1}^{10} \sum_{j=1}^{10} \left(y_j^t - o_j^t\right)^2$
and for the $j$-th output alone:
$L = \frac{1}{10} \sum_{t=1}^{10} \left(y_j^t - o_j^t\right)^2$
Gradient descent:
$w_{j,i} \leftarrow w_{j,i} - \eta \frac{\partial L}{\partial w_{j,i}}$
$\frac{\partial L}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left(y_j^t - o_j^t\right) \frac{\partial o_j^t}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left(y_j^t - o_j^t\right) \frac{\partial o_j^t}{\partial a_j^t} \frac{\partial a_j^t}{\partial w_{j,i}}$
With the sigmoid derivative $\frac{\partial o_j^t}{\partial a_j^t} = o_j^t \left(1 - o_j^t\right)$ and $\frac{\partial a_j^t}{\partial w_{j,i}} = x_i^t$:
$\frac{\partial L}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left(y_j^t - o_j^t\right) o_j^t \left(1 - o_j^t\right) x_i^t$
Gradient descent
$\frac{\partial L}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left(y_j^t - o_j^t\right) o_j^t \left(1 - o_j^t\right) x_i^t$
60,000 training samples, 10,000 testing samples.
Weight update:
$w_{j,i} \leftarrow w_{j,i} + \Delta w_{j,i}, \qquad \Delta w_{j,i} = \eta \frac{2}{10} \sum_{t=1}^{10} \left(y_j^t - o_j^t\right) o_j^t \left(1 - o_j^t\right) x_i^t$
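To double-check the derived gradient, one can compare it with a finite-difference estimate (our own verification sketch, using random data in place of real digits):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, x, y):
    # L = (1/10) * sum over samples and outputs of (y - o)^2
    o = sigmoid(x @ W.T)
    return np.sum((y - o) ** 2) / x.shape[0]

rng = np.random.default_rng(0)
x = rng.random((10, 784))              # 10 samples, 784 inputs
y = np.eye(10)                         # one-hot desired outputs
W = rng.uniform(-0.1, 0.1, (10, 784))

# analytic gradient for a single weight w_{j,i}
j, i = 3, 5
o = sigmoid(x @ W.T)
analytic = -(2.0 / 10.0) * np.sum((y[:, j] - o[:, j]) * o[:, j] * (1.0 - o[:, j]) * x[:, i])

# finite-difference estimate of dL/dw_{j,i}
eps = 1e-6
Wp = W.copy()
Wp[j, i] += eps
Wm = W.copy()
Wm[j, i] -= eps
numeric = (loss(Wp, x, y) - loss(Wm, x, y)) / (2.0 * eps)
print(analytic, numeric)  # the two values should agree closely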
Neuron’s output
Stack the weights of the 10 neurons $n_1, \dots, n_{10}$ row-wise into a matrix $W$ (row $j$ holds $W_{j,1}, W_{j,2}, \dots, W_{j,784}$) and the inputs into a row vector $X = [x_1\; x_2\; \dots\; x_{784}]$. The matrix product
$a = X W^{T}$
computes every $a_j = \sum_{i=1}^{784} x_i w_{j,i}$ at once, and applying the sigmoid element-wise,
$o = \sigma(a) = \frac{1}{1 + e^{-a}}$
gives every $o_j = \sigma(a_j) = \frac{1}{1 + e^{-a_j}}$.
Neuron’s output – batch of neurons
The per-neuron expressions
$a_j = \sum_i x_i w_{j,i}, \qquad o_j = \sigma(a_j) = \frac{1}{1 + e^{-a_j}}$
are evaluated for the whole batch of neurons at once by
$a = X W^{T}, \qquad o = \sigma(a) = \frac{1}{1 + e^{-a}}$
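A quick check (ours) that the per-neuron loop and the matrix product agree:

import numpy as np

rng = np.random.default_rng(1)
X = rng.random((10, 784))              # a batch of 10 input samples
W = rng.uniform(-0.1, 0.1, (10, 784))  # one row of weights per neuron

# loop form for the first sample: a_j = sum_i x_i * w_{j,i}
a_loop = np.array([np.sum(X[0] * W[j]) for j in range(10)])

# matrix form for the whole batch: a = X W^T
a_mat = X @ W.T

print(np.allclose(a_loop, a_mat[0]))  # True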
Gradient calculation in matrix form
Collect the per-sample error terms into $d = (y - o) \odot o \odot (1 - o)$. Then the per-weight update
$\Delta w_{j,i} = \eta \frac{2}{10} \sum_{t=1}^{10} \left(y_j^t - o_j^t\right) o_j^t \left(1 - o_j^t\right) x_i^t$
is computed for all weights at once by
$\Delta W = \eta \frac{2}{10}\, d^{T} X$
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# x: batch of 10 flattened digit images, shape (10, 784); y: one-hot labels, shape (10, 10)
W = np.random.uniform(-0.1, 0.1, (10, 784))
o = sigmoid(np.matmul(x, W.transpose()))  # matrix multiplication, shape (10, 10)
print('output of first neuron with 10 digits ', o[:, 0])
plt.bar(range(len(o)), o[:, 0])
plt.show()
Training
This is just a simple example to intuitively understand how to translate the math into Python code:
$d = (y - o) \odot o \odot (1 - o), \qquad \Delta W = \eta \frac{2}{10}\, d^{T} X$

# training process
n = 0.05  # learning rate (eta)
num_epoch = 10
for epoch in range(num_epoch):
    o = sigmoid(np.matmul(x, W.transpose()))
    loss = np.power(o - y, 2).mean()
    # calculate the update for all weights in the matrix: dW = d^T X
    dW = np.transpose((y - o) * o * (1 - o)) @ x
    # update (the constant factor 2/10 is absorbed into the learning rate)
    W = W + n * dW
    print(loss)
o = sigmoid(np.matmul(x, W.transpose()))
print('output of the first neuron with 10 input digits ', o[:, 0])
plt.bar(range(len(o)), o[:, 0])
plt.show()
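After training, a natural check (our own addition, not from the slides) is classification accuracy: take the most active output neuron as the predicted digit. Here x_test and y_test are assumed names for the test images and their one-hot labels:

# x_test: test images, shape (N, 784); y_test: one-hot labels, shape (N, 10) -- assumed names
o_test = sigmoid(np.matmul(x_test, W.transpose()))
pred = np.argmax(o_test, axis=1)   # predicted digit per sample
true = np.argmax(y_test, axis=1)   # actual digit per sample
accuracy = np.mean(pred == true)
print('test accuracy:', accuracy)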