Part 1.1: Neural Network and Training Algorithm

1. The document introduces artificial neural networks and how they are modeled and trained.
2. A neural network consists of interconnected nodes called neurons that are organized in layers. The input layer receives information from the external environment, the output layer delivers the results, and hidden layers connect the input and output.
3. Neural networks are trained through supervised learning, using gradient-descent optimization to minimize a loss function by iteratively updating the weights of the connections between neurons. This allows the network to learn from examples in the training data.


AI - FOUNDATION AND APPLICATION

Instructor:
Assoc. Prof. Dr. Truong Ngoc Son
Chapter 1
Introduction to Neural Networks
How is a neuron modelled?

[Figure: a single neuron with inputs x1, ..., xn, synaptic weights w1, ..., wn, bias b, and activation function f producing the output y = f(w1·x1 + ... + wn·xn + b).]
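To make the model concrete, here is a minimal sketch of a single neuron in NumPy (the input values, weights, and bias below are illustrative, not from the slides):

import numpy as np

def neuron(x, w, b, f):
    # y = f(w1*x1 + ... + wn*xn + b)
    return f(np.dot(w, x) + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x = np.array([0.5, -0.2, 0.1])   # inputs x1..xn (illustrative values)
w = np.array([0.4, 0.7, -1.2])   # synaptic weights w1..wn
b = 0.1                          # bias
y = neuron(x, w, b, sigmoid)
print(y)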
Training a network – Optimization method
Activation functions
Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
Tanh: $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
ReLU: $f(x) = \max(x, 0)$
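These three functions translate directly into NumPy; a small sketch for reference:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))  # equivalent to np.tanh(x)

def relu(x):
    return np.maximum(x, 0)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))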
Neural network

[Figure: a network with an input layer, a hidden layer of neurons h1, h2, h3, and an output layer.]

Artificial neural network

[Figure: a two-layer network with inputs x1 ... xn, hidden neurons h1 ... hm, and outputs y1 ... yk; the first layer has weights W_{j,i} and biases b_{1,j}, the second layer has weights W_{k,j} and biases b_{2,k}.]

Hidden layer:
$o_1 = x_1 W_{1,1} + \dots + x_n W_{1,n} + b_{1,1}, \qquad h_1 = f(o_1)$
$o_2 = x_1 W_{2,1} + \dots + x_n W_{2,n} + b_{1,2}, \qquad h_2 = f(o_2)$

Output layer:
$a_1 = h_1 W_{1,1} + \dots + h_m W_{1,m} + b_{2,1}, \qquad y_1 = f(a_1)$
$a_k = h_1 W_{k,1} + \dots + h_m W_{k,m} + b_{2,k}, \qquad y_k = f(a_k)$
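As a hedged sketch of this forward pass, the following NumPy code uses assumed sizes n = 4 inputs, m = 3 hidden neurons, and k = 2 outputs; the variable names mirror the slide's notation but are otherwise illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, m, k = 4, 3, 2                            # illustrative layer sizes
x  = np.random.rand(n)                       # input vector x1..xn
W1 = np.random.uniform(-0.1, 0.1, (m, n))    # hidden-layer weights W[j,i]
b1 = np.zeros(m)                             # hidden-layer biases b_{1,j}
W2 = np.random.uniform(-0.1, 0.1, (k, m))    # output-layer weights W[k,j]
b2 = np.zeros(k)                             # output-layer biases b_{2,k}

o = W1 @ x + b1        # o_j = sum_i x_i W_{j,i} + b_{1,j}
h = sigmoid(o)         # h_j = f(o_j)
a = W2 @ h + b2        # a_k = sum_j h_j W_{k,j} + b_{2,k}
y = sigmoid(a)         # y_k = f(a_k)
print(y)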
TRAINING NEURAL NETWORK

Supervised learning vs. unsupervised learning


Training an artificial neural network
Supervised learning

[Figure: supervised learning loop. An input is fed through the network to produce an output; the output is compared with the desired output to compute an error, and the error drives the weight update.]
Training an artificial neural network
Unsupervised learning

[Figure: unsupervised learning. Input data is fed through the network to produce an output result, with no desired outputs provided.]
Simple neural network: understanding how a neural network learns

[Figure: the single-neuron model again, with inputs x1, ..., xn, weights w1, ..., wn, bias b, activation f, and output y; the example targets are labeled (-1) and (+1).]
Quantifying the loss
The loss of a network measures the cost incurred by incorrect predictions.

MSE (mean squared error): $L = \frac{1}{n}\sum_{i=1}^{n} (y_i - o_i)^2$

Cross-entropy loss: $L = -\sum_{i} y_i \log(o_i)$

[Figure: the input is fed through the network; the error between the output and the desired output is what the loss quantifies.]
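Both losses are one-liners in NumPy; in this sketch, o is a predicted output vector and y the desired one-hot output (illustrative values):

import numpy as np

o = np.array([0.7, 0.2, 0.1])   # predicted outputs (illustrative values)
y = np.array([1.0, 0.0, 0.0])   # desired one-hot output

mse = np.mean((y - o) ** 2)                     # mean squared error
cross_entropy = -np.sum(y * np.log(o + 1e-12))  # small epsilon avoids log(0)
print(mse, cross_entropy)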
Training a network
Training a neural network is a process of using an optimization algorithm to find
a set of weights to best map inputs to outputs.

In other words, training is a way to minimize the loss:

$W^{*} = \underset{W}{\arg\min}\; \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\big(f(x^{(i)}, W),\, y^{(i)}\big)$

Looks hard? Don't worry, we will dive into the details later.


GRADIENT DESCENT
Training Neural Networks – Optimization of the loss
What is a gradient?

For a function $f(x, y, z)$, the gradient is the vector of its partial derivatives:

$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k}$

Stepping against the gradient moves a point to $\left( x - \frac{\partial f}{\partial x},\; y - \frac{\partial f}{\partial y},\; z - \frac{\partial f}{\partial z} \right)$.
Training a network – Optimization of the loss
Gradient Descent
Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function f that minimize a cost function (the loss). It does this by iteratively moving in the direction of steepest descent, defined by the negative of the gradient:

Starting from $x_0 = (x_0, y_0, z_0)$:

$x_{n+1} = x_n - \eta\, \nabla f(x_n)$

$(x, y, z) \leftarrow \left( x - \eta \frac{\partial f}{\partial x},\; y - \eta \frac{\partial f}{\partial y},\; z - \eta \frac{\partial f}{\partial z} \right)$
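A minimal sketch of this update rule on an assumed toy function, f(x, y) = x² + y², whose gradient is known in closed form:

import numpy as np

def grad_f(p):                 # gradient of f(x, y) = x^2 + y^2 is (2x, 2y)
    return 2 * p

p = np.array([3.0, -4.0])      # starting point x0
eta = 0.1                      # learning rate
for step in range(50):
    p = p - eta * grad_f(p)    # x_{n+1} = x_n - eta * grad f(x_n)
print(p)                       # approaches the minimum at (0, 0)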
Optimization of the loss with gradient descent
Example: Linear Regression

Model: $y = mx + b$

[Figure: data points x1, x2, ...; the error is the gap between the desired output and the predicted output.]

$\text{loss} = \sum_{i=1}^{n} (o_i - y_i)^2$

Update the parameters in the direction that decreases the loss:
$m \leftarrow m + \Delta m$
$b \leftarrow b + \Delta b$
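A sketch of this linear-regression example in NumPy, on assumed synthetic data around y = 2x + 1; the Δm and Δb come from differentiating the squared-error loss (the mean is used instead of the sum so the learning rate stays stable):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, 50)   # synthetic data around y = 2x + 1

m, b, eta = 0.0, 0.0, 0.1
for epoch in range(500):
    o = m * x + b                        # predicted output
    # d(loss)/dm = -2 * mean((y - o) * x), d(loss)/db = -2 * mean(y - o)
    dm = 2 * np.mean((y - o) * x)
    db = 2 * np.mean(y - o)
    m, b = m + eta * dm, b + eta * db    # m = m + Δm, b = b + Δb
print(m, b)                              # approaches (2.0, 1.0)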
Optimization of the loss with gradient descent
Assignment 01: Logistic Regression

Model: $y = mx + b$

$\text{loss} = \sum_{i=1}^{n} (o_i - y_i)^2$

$m \leftarrow m + \Delta m, \qquad b \leftarrow b + \Delta b$

[Figure: same setup as the linear-regression example above.]
Training Neural Networks – Optimization method

[Figure: a single-layer network for MNIST with inputs x1 ... x784, weights W_{1,1} ... W_{10,784}, and ten outputs o1 ... o10, each produced by an activation f.]
LOSS OPTIMIZATION WITH GRADIENT DESCENT
Mathematical modeling of Training Process
Example

[Figure: the same single-layer network, with inputs x1 ... x784 and outputs o1 ... o10.]
Mathematical modeling of Training Process

Desired outputs, labels

[Figure: the network shown with one-hot desired output vectors; for each input digit, the target output of the matching neuron is 1 and all other targets are 0.]
Mathematical modeling of Training Process
Randomly initialize the weights W

[Figure: with random initial weights, the outputs (e.g. 0.5, 0.6, 0.9) are far from the one-hot desired outputs.]
Mathematical modeling of Training Process
W = ArgMin(Loss)

[Figure: after minimizing the loss, the outputs (e.g. 0.9, 0.2, 0.1) move close to the one-hot desired outputs.]
Mathematical modeling of Training Process
Training process

[Figure: training loop. The input is fed forward to produce the predictive output o (e.g. 0.5, 0.6, 0.9), which is compared with the desired output y (1, 0, 0); the resulting error drives the update of the weights w.]
Mathematical modeling of Training Process
For simplicity, let b = 0.

Forward pass (for the j-th output, over 784 inputs):

$a_j = \sum_{i=1}^{784} x_i w_{j,i}, \qquad o_j = \sigma(a_j) = \frac{1}{1 + e^{-a_j}}$

Loss for the j-th output, averaged over the 10 training samples (the superscript t indexes samples):

$L = \frac{1}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right)^2$

Gradient with respect to a weight, by the chain rule:

$\frac{\partial L}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right) \frac{\partial o_j^t}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right) \frac{\partial o_j^t}{\partial a_j^t} \frac{\partial a_j^t}{\partial w_{j,i}}$

Since $o_j^t = \sigma(a_j^t) = \frac{1}{1 + e^{-a_j^t}}$, we have $\frac{\partial o_j^t}{\partial a_j^t} = o_j^t (1 - o_j^t)$ and $\frac{\partial a_j^t}{\partial w_{j,i}} = x_i^t$, so

$\frac{\partial L}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right) o_j^t (1 - o_j^t)\, x_i^t$

Gradient descent update:

$w_{j,i} \leftarrow w_{j,i} - \eta \frac{\partial L}{\partial w_{j,i}} = w_{j,i} + \Delta w_{j,i}, \qquad \Delta w_{j,i} = \eta\, \frac{2}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right) o_j^t (1 - o_j^t)\, x_i^t$
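To check the derivation, one can compare the closed-form gradient against a numerical finite-difference estimate. This sketch uses assumed small sizes (10 samples, 3 inputs, 2 neurons) and averages the loss over all output entries, as the training code later does:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, X, Y):
    O = sigmoid(X @ W.T)          # outputs for all samples
    return np.mean((Y - O) ** 2)  # squared error averaged over every output entry

rng = np.random.default_rng(1)
X = rng.random((10, 3))           # 10 samples, 3 inputs (small assumed sizes)
Y = rng.random((10, 2))           # desired outputs for 2 neurons
W = rng.uniform(-0.1, 0.1, (2, 3))

# closed form: dL/dw_ji = -(2/N) * sum_t (y - o) o (1 - o) x_i, with N = Y.size
O = sigmoid(X @ W.T)
grad = -(2.0 / Y.size) * ((Y - O) * O * (1 - O)).T @ X

# numerical estimate for one weight via finite differences
eps = 1e-6
Wp = W.copy()
Wp[0, 0] += eps
numerical = (loss(Wp, X, Y) - loss(W, X, Y)) / eps
print(grad[0, 0], numerical)      # the two values should agree closely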
PYTHON CODE
Translating mathematics into code
MNIST Dataset

[Figure: the single-layer network with inputs x1 ... x784 and outputs o1 ... o10.]

60,000 training samples, 10,000 testing samples.

Gradient descent:

$\frac{\partial L}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right) o_j^t (1 - o_j^t)\, x_i^t$

$w_{j,i} \leftarrow w_{j,i} + \Delta w_{j,i}, \qquad \Delta w_{j,i} = \eta\, \frac{2}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right) o_j^t (1 - o_j^t)\, x_i^t$
Neuron’s output
[Figure: the weight matrix W stacks one row per neuron, row j = (W_{j,1}, ..., W_{j,784}) for j = 1 ... 10; the input is a row vector X = (x_1, ..., x_784). Multiplying X by W^T produces all pre-activations at once.]

$a = X W^{T}, \qquad a_j = \sum_{i=1}^{784} x_i w_{j,i}$

$o = \sigma(a) = \frac{1}{1 + e^{-a}}, \qquad o_j = \sigma(a_j) = \frac{1}{1 + e^{-a_j}}$
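A quick sketch (with assumed random values) confirming that the loop form $a_j = \sum_i x_i w_{j,i}$ and the matrix form $a = XW^T$ agree:

import numpy as np

rng = np.random.default_rng(2)
x = rng.random(784)                     # one flattened input image
W = rng.uniform(-0.1, 0.1, (10, 784))   # one row of weights per neuron

a_loop = np.array([sum(x[i] * W[j, i] for i in range(784)) for j in range(10)])
a_mat = x @ W.T                         # a = X W^T
print(np.allclose(a_loop, a_mat))       # True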
Neuron's output – batch of inputs

[Figure: for a batch, X stacks the samples as rows, X = [x^{(1)}; x^{(2)}; ...; x^{(10)}] with x^{(t)} = (x_1^{(t)}, ..., x_{784}^{(t)}); a = XW^T is then a matrix whose entry a_j^{(t)} is the pre-activation of neuron j for sample t, and o = σ(a) applies the sigmoid element-wise, giving o_j^{(t)}.]

$a = X W^{T}, \qquad a_j = \sum_{i=1}^{784} x_i w_{j,i}$

$o = \sigma(a) = \frac{1}{1 + e^{-a}}, \qquad o_j = \sigma(a_j) = \frac{1}{1 + e^{-a_j}}$
Gradient calculating

$\frac{\partial L}{\partial w_{j,i}} = -\frac{2}{10} \sum_{t=1}^{10} \left( y_j^t - o_j^t \right) o_j^t (1 - o_j^t)\, x_i^t$

In matrix form, define (element-wise product):

$d = (y - o)\, o\, (1 - o)$

with one row per sample and one column per neuron. The sum over samples then becomes a matrix product:

$\Delta W = \eta\, \frac{2}{10}\, d^{T} X, \qquad w_{j,i} \leftarrow w_{j,i} + \Delta w_{j,i}$

[Figure: the product d^T X combines the per-sample columns of d^T with the rows of X, yielding the 10×784 update matrix ΔW = [ΔW_{j,i}].]
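The same loop-versus-matrix check applies to the update. This sketch (assumed random data) verifies that d^T X reproduces the per-weight sum over samples:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
X = rng.random((10, 784))               # batch of 10 flattened images
Y = np.eye(10)                          # one-hot desired outputs
W = rng.uniform(-0.1, 0.1, (10, 784))
O = sigmoid(X @ W.T)                    # outputs, one row per sample

d = (Y - O) * O * (1 - O)               # element-wise product, rows = samples
dW_mat = d.T @ X                        # matrix form of the sum over samples

# per-weight check for neuron j = 0, input i = 0
dw_loop = sum(d[t, 0] * X[t, 0] for t in range(10))
print(np.isclose(dW_mat[0, 0], dw_loop))   # True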
Load data set, pick out 10 images
$d = (y - o)\, o\, (1 - o) \quad \text{(element-wise product)}, \qquad \Delta W = \eta\, \frac{2}{10}\, d^{T} X$
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

print('load data from MNIST')
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
dig = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])  # indices of one sample each of the digits 0-9
x = x_train[dig, :, :]
y = np.eye(10, 10)                  # one-hot desired outputs for digits 0-9
plt.subplot(121)
plt.imshow(x[0])
plt.subplot(122)
plt.imshow(x[1])
x = np.reshape(x, (-1, 784)) / 255  # flatten the 28x28 images and scale to [0, 1]
Define parameters and functions
def sigmoid(x):
    return 1. / (1. + np.exp(-x))

W = np.random.uniform(-0.1, 0.1, (10, 784))  # random initial weights, one row per neuron
o = sigmoid(np.matmul(x, W.transpose()))     # matrix multiplication: o = sigma(X W^T)
print('output of first neuron with 10 digits ', o[:,0])
fig = plt.figure()
plt.bar([i for i, _ in enumerate(o)], o[:,0])
plt.show()
Training
This is just a simple example to intuitively understand how to translate the math into Python code:

$d = (y - o)\, o\, (1 - o), \qquad \Delta W = \eta\, \frac{2}{10}\, d^{T} X$

#training process
n = 0.05            # learning rate (the constant factor 2/10 is absorbed into it)
num_epoch = 10
for epoch in range(num_epoch):
    o = sigmoid(np.matmul(x, W.transpose()))   # forward pass
    loss = np.power(o - y, 2).mean()           # mean squared error
    # calculate the update for all weights in the matrix: dW = d^T X
    dW = np.transpose((y - o) * o * (1 - o)) @ x
    # update
    W = W + n * dW
    print(loss)

o = sigmoid(np.matmul(x, W.transpose()))
print('output of the first neuron with 10 input digits ', o[:,0])

fig = plt.figure()
plt.bar([i for i, _ in enumerate(o)], o[:,0])
plt.show()
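After training, one might also inspect the predictions with argmax; this sketch assumes the x, y, W, and sigmoid defined above and reports accuracy on the 10 training images:

o = sigmoid(np.matmul(x, W.transpose()))
pred = np.argmax(o, axis=1)   # index of the largest output for each image
true = np.argmax(y, axis=1)   # index of the 1 in each one-hot label
print('predictions:', pred)
print('accuracy on the 10 training images:', np.mean(pred == true))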
