
Lecture 04 (6hrs) Neural Network and Deep Learning

The document is a lecture on neural networks and deep learning. It discusses gradient descent algorithms, backpropagation algorithms for feedforward neural networks, convolutional neural networks, and deep learning. Specifically, it defines gradient and directional derivative, describes gradient descent algorithms and their geometric meaning, and compares gradient descent to Newton's method. It provides examples and illustrations of gradient and gradient descent.

Neural Network and Deep Learning

Xizhao WANG
Big Data Institute
College of Computer Science
Shenzhen University

March 2021

Outline

1. Gradient Descent Algorithm
2. BP Algorithm for Feed-Forward Neural Network Model
3. Convolutional Neural Network
4. Deep Learning

Machine Learning Lecture – Xizhao Wang Lecture 03: Neural Network and Deep Learning
1. Definition of Gradient
2. Gradient Descent Algorithm (GDA)
3. Difference between GDA and Newton's Method
4. An example

Gradient Descent Algorithm


Definition:

Directional derivative (taking a function of three variables as an example):

Suppose function f is defined in a neighborhood of point P0 (x0, y0, z0), l is a ray from point P0, P (x, y, z) is a point on l contained in that neighborhood, and ρ represents the distance between P and P0.

If  lim (f(P) − f(P0)) / ρ = lim Δf / ρ

exists when ρ → 0, we call this limit the directional derivative of f at P0 along the direction of l.

Generally speaking, the directional derivative is the rate of change of a function in a specified direction.



The geometric meaning of directional derivatives:

Suppose z = f (x, y) is a surface equation and M (x, y, z) is a point on the curved surface. An intersecting curve is formed by the curved surface and the vertical plane which goes through M along the direction of l. Let θ be the angle between l and the tangent of the intersecting curve at M.

Then  ∂f/∂l = tan θ.


Computation of the directional derivative:

If function f is differentiable at P0 (x0, y0, z0), then the directional derivative of f at P0 along any direction l exists, and its expression is:

    ∂f/∂l = (∂f/∂x) cos α + (∂f/∂y) cos β + (∂f/∂z) cos γ,

where cos α, cos β, cos γ are the direction cosines of l.
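The formula above can be checked numerically. A small sketch (the function f(x, y, z) = x·y + z² and the point are illustrative choices, not from the lecture):

```python
import math

# Illustrative function: f(x, y, z) = x*y + z^2.
def f(x, y, z):
    return x * y + z * z

# Its analytic partial derivatives: (fx, fy, fz) = (y, x, 2z).
def grad_f(x, y, z):
    return (y, x, 2 * z)

# Directional derivative via the formula
#   df/dl = fx*cos(a) + fy*cos(b) + fz*cos(g).
def directional_derivative(p0, direction):
    fx, fy, fz = grad_f(*p0)
    ca, cb, cg = direction
    return fx * ca + fy * cb + fz * cg

# Finite-difference version of the definition:
#   (f(P0 + rho*l) - f(P0)) / rho  for small rho.
def numeric_directional_derivative(p0, direction, rho=1e-6):
    x0, y0, z0 = p0
    ca, cb, cg = direction
    return (f(x0 + rho * ca, y0 + rho * cb, z0 + rho * cg) - f(x0, y0, z0)) / rho

p0 = (1.0, 2.0, 3.0)
l = (1 / math.sqrt(3),) * 3   # unit vector: all three direction cosines 1/sqrt(3)
print(directional_derivative(p0, l))           # analytic value
print(numeric_directional_derivative(p0, l))   # agrees to several digits
```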



The gradient of a scalar function f (x1, x2, ∙∙∙, xn) is denoted as

    ∇f (X) = [∂f/∂x1, ∂f/∂x2, ∙∙∙, ∂f/∂xn]^T.

In the three-dimensional Cartesian coordinate system with a Euclidean metric, the gradient, if it exists, is given by:

    ∇f = (∂f/∂x) i + (∂f/∂y) j + (∂f/∂z) k,

where i, j, k are the standard unit vectors in the directions of the coordinates, respectively. For example, the gradient of the function f (x, y, z) = 2x + 3y² − sin(z) is ∇f = 2 i + 6y j − cos(z) k.
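The slide's example gradient can be verified with finite differences; the helper numeric_grad below is a hypothetical utility, not part of the lecture:

```python
import math

# The slide's example function: f(x, y, z) = 2x + 3y^2 - sin(z).
def f(x, y, z):
    return 2 * x + 3 * y ** 2 - math.sin(z)

# Its analytic gradient from the slide: (2, 6y, -cos z).
def grad_f(x, y, z):
    return (2.0, 6.0 * y, -math.cos(z))

# Central finite differences, one coordinate at a time.
def numeric_grad(g, p, h=1e-6):
    out = []
    for i in range(len(p)):
        pp, pm = list(p), list(p)
        pp[i] += h
        pm[i] -= h
        out.append((g(*pp) - g(*pm)) / (2 * h))
    return tuple(out)

p = (1.0, 0.5, 0.3)
print(grad_f(*p))         # analytic gradient
print(numeric_grad(f, p)) # numeric gradient, matching to several digits
```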



Geometric Meaning

The gradient specifies the direction that produces the steepest increase in the function. The negative of the gradient therefore gives the direction of steepest decrease.

In the above two images, the values of the function are represented in black and white, black representing higher values, and the corresponding gradient is represented by blue arrows.



Geometric Meaning

The gradient of the function f (x, y) = −(cos²x + cos²y)² is depicted as a projected vector field on the bottom plane.



For the 2-dimensional case:

Gradient: Suppose z = f (x, y) has first-order continuous partial derivatives on a region D. Then for each P (x, y) ∈ D there exists a vector

    (∂f/∂x, ∂f/∂y) = f_x(x, y) i + f_y(x, y) j,

called the gradient of z = f (x, y) at P (x, y) and written grad f (x, y) or ∇f (x, y), i.e.,

    grad f (x, y) = ∇f (x, y) = (∂f/∂x) i + (∂f/∂y) j.

Along the gradient direction, the function changes most quickly.



Suppose e = [cos α, cos β] is a unit vector in the direction of l. Then

    ∂f/∂l = (∂f/∂x) cos α + (∂f/∂y) cos β
          = (∂f/∂x, ∂f/∂y) ∙ (cos α, cos β)
          = grad f (x, y) ∙ e
          = |grad f (x, y)| |e| cos⟨grad f (x, y), e⟩.

When cos⟨grad f (x, y), e⟩ = 1, the directional derivative ∂f/∂l attains its maximum value, which equals the norm of the gradient, i.e.,

    |grad f (x, y)| = sqrt( (∂f/∂x)² + (∂f/∂y)² ).

Thus when the variables change along the gradient direction, the rate of change of the function attains its maximum value, which is the norm of the gradient.
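This fact can be illustrated numerically: scanning unit directions for a sample function (chosen here purely for illustration) shows that the largest directional derivative occurs in the gradient direction and equals the gradient norm:

```python
import math

# Illustrative function: f(x, y) = x^2 + 3y, with grad f = (2x, 3).
def grad(x, y):
    return (2 * x, 3.0)

# Directional derivative along the unit vector (cos(angle), sin(angle)).
def directional_derivative(x, y, angle):
    gx, gy = grad(x, y)
    return gx * math.cos(angle) + gy * math.sin(angle)

x0, y0 = 1.0, 1.0
# Scan 3600 unit directions; the maximum should occur in the gradient
# direction and be (approximately) the norm of the gradient.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=lambda a: directional_derivative(x0, y0, a))
gx, gy = grad(x0, y0)
norm = math.hypot(gx, gy)
print(directional_derivative(x0, y0, best), norm)  # nearly equal
```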

When the gradient is generalized to n-dimensional space, it can be represented as:

    ∇f (X) = [∂f/∂x1, ∂f/∂x2, ∙∙∙, ∂f/∂xn]^T.

Along the gradient direction, the function changes most quickly.



[Figure: a gradient descent path from an initial point down to the minimum value]

The gradient descent algorithm may converge to a local optimum; the global optimum is guaranteed when the loss function is convex.


Notes on the Gradient Descent Algorithm parameters

1. The magnitude of the gradient, epsilon (ε), is one of the termination conditions.

2. Another termination condition is the number of iterations (time control).

3. The learning rate, alpha (α), controls the "walking step": too small a value leads to slow convergence (low efficiency), while too large a value results in oscillation (non-convergence). Its appropriate value depends on the specific function to be minimized.
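A minimal sketch of the algorithm with these three parameters (the function, names, and values are illustrative assumptions, not the lecture's):

```python
# Gradient descent with the two termination conditions from the note:
# a gradient-magnitude threshold epsilon and an iteration cap.
def gradient_descent(grad, x0, alpha=0.1, eps=1e-8, max_iters=10000):
    x = x0
    for it in range(max_iters):      # termination condition 2: iteration cap
        g = grad(x)
        if abs(g) < eps:             # termination condition 1: |gradient| < epsilon
            return x, it
        x -= alpha * g               # one "walking step" against the gradient
    return x, max_iters

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min, iters = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min, iters)
```

With alpha too large for this function (e.g. alpha = 1.1), the iterates satisfy x − 3 → −1.2 (x − 3) and oscillate divergently, which is the non-convergence the note warns about.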


1. Definition of gradient
2. Gradient descent algorithm (GDA)
3. Difference between GDA and Newton's method
4. An example



Suppose the objective function f(x) has second-order continuous partial derivatives, and xk is an approximation of its minimum point. The second-order Taylor polynomial approximation of f(x) near xk is:

    f(x) ≈ f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T H(xk)(x − xk).

Its gradient is

    ∇f(x) ≈ ∇f(xk) + H(xk)(x − xk).

The minimum point of the approximate function satisfies ∇f(x) = 0, then

    x = xk − H(xk)^{-1} ∇f(xk),

where H(xk) is the Hessian matrix of f(x) at point xk.

In the minimizing process of f(x), −H(xk)^{-1} ∇f(xk) is considered as the searching direction.

Machine Learning Lecture – Xizhao Wang Lecture 03: Neural Network and Deep Learning
Gradient Descent Algorithm 1. Definition of Gradient
BP Algorithm for Feed-Forward Neural Network Model 2. Gradient Descent Algorithm (GDA)
Convolutional Neural Network 3. Difference between GDA and Newton's Method
Deep Learning 4. An example

Gradient Descent Algorithm


The minimizing process of Newton's method can be represented as the iteration:

    x(k+1) = xk − H(xk)^{-1} ∇f(xk).


In optimization, Newton's method is applied to the derivative f′ of a twice-differentiable function f to find the roots of the derivative (solutions to f′(x) = 0), also known as the stationary points of f.

In the one-dimensional problem, Newton's method attempts to construct a sequence xn from an initial guess x0 that converges towards some value x* satisfying f′(x*) = 0. This x* is a stationary point of f.

The second-order Taylor expansion fT(x) of f around xn is:

    fT(xn + Δx) = f(xn) + f′(xn) Δx + (1/2) f″(xn) Δx².



We want to find Δx such that xn + Δx is a stationary point. We seek to solve the equation that sets the derivative of this last expression with respect to Δx equal to zero:

    0 = d/d(Δx) [ f(xn) + f′(xn) Δx + (1/2) f″(xn) Δx² ] = f′(xn) + f″(xn) Δx.

For the value Δx = −f′(xn) / f″(xn), which is the solution of this equation, it can be hoped that xn+1 = xn + Δx = xn − f′(xn) / f″(xn) will be closer to a stationary point x*. Provided that f is a twice-differentiable function and other technical conditions are satisfied, the sequence x1, x2, ∙∙∙ will converge to a point x* satisfying f′(x*) = 0.

The above iterative scheme can be generalized to several dimensions by replacing the derivative with the gradient, ∇f(x), and the reciprocal of the second derivative with the inverse of the Hessian matrix, H_f(x). One obtains the iterative scheme

    x(n+1) = xn − [H_f(xn)]^{-1} ∇f(xn),  n ≥ 0.



Comparison of GDA and Newton's Method

A comparison of gradient descent (green) and Newton's method (red) for minimizing a function (with small step sizes).

Newton's method uses curvature information to take a more direct route.


1. Definition of gradient
2. Gradient descent algorithm (GDA)
3. Difference between GDA and Newton’s method
4. An example



Minimize: f(x) = x².
Step 1: compute the gradient, ∇ = 2x.
Step 2: move x along the negative direction of the gradient, i.e., x ← x − γ∇, where γ is the learning rate.
Step 3: loop Step 2 until the difference of f(x) between two adjacent iterations is small enough, which indicates that f(x) has attained its local minimum value.
Step 4: output x, which is the optimal solution.



Example

Minimize f(x) = x² by using the Gradient Descent Algorithm.

The initial value of x is 2, and the step length is 0.1.

After 49 iterations, the minimum value 1.273147e-09 of the function is obtained, and the corresponding x value is 3.568119e-05.
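This run is reproducible. Assuming the unstated stopping threshold is 1e-9 on the change of f(x) (an assumption; the slide does not give it), a direct implementation yields the reported numbers:

```python
# Reproducing the run: f(x) = x^2, x0 = 2, step length gamma = 0.1.
x = 2.0
gamma = 0.1
iters = 0
prev_f = x * x
while True:
    x -= gamma * 2 * x               # x <- x - gamma * grad, with grad = 2x
    iters += 1
    if abs(prev_f - x * x) < 1e-9:   # f(x) barely changed between iterations: stop
        break
    prev_f = x * x
print(iters, x * x, x)   # 49 iterations, f ~ 1.2731e-09, x ~ 3.5681e-05
```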



Example

Minimize f(x) = x² by using Newton's Method.

The initial value of x is 2.

After 15 iterations, the minimum value 3.7253e-09 of the function is obtained, and the corresponding x value is 6.1035e-05.


Gradient Descent Algorithm

The End.

1. Brief Introduction
2. Feedforward NN
3. BP Algorithm
4. Notes on BP
5. An Application
6. Questions



BP Algorithm for Feed-Forward Neural Network Model



• Rumelhart and McClelland proposed the BP (Back-Propagation) algorithm for feed-forward neural networks.

  [Photos: David Rumelhart, J. McClelland]

• BP algorithm – key idea
  – Use the error of the output layer to estimate the error of its previous layer; generally, use the error of layer n to estimate the error of layer n−1.



An intuitive understanding of a feed-forward neural network

A feed-forward NN is a smooth function which can be used to approximate an input-output system (a black box).

What is the specific form of the function in the box?


• A Perceptron



• A Perceptron can be used to represent many Boolean functions, such as AND, OR, NAND, and NOR.

A Perceptron cannot be used to represent Boolean functions that are not linearly separable, such as XOR.
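A sketch with hand-chosen, hypothetical weights (w1, w2) and bias b, plus a brute-force search showing that no such weights exist for XOR:

```python
import itertools

# A single perceptron: outputs 1 when w1*x1 + w2*x2 + b > 0, else 0.
def perceptron(w1, w2, b):
    return lambda x1, x2: 1 if w1 * x1 + w2 * x2 + b > 0 else 0

AND = perceptron(1, 1, -1.5)   # fires only when both inputs are 1
OR = perceptron(1, 1, -0.5)    # fires when at least one input is 1

inputs = list(itertools.product([0, 1], repeat=2))   # (0,0), (0,1), (1,0), (1,1)
print([AND(*p) for p in inputs])   # [0, 0, 0, 1]
print([OR(*p) for p in inputs])    # [0, 1, 1, 1]

# Brute-force search over a small weight grid: no perceptron computes XOR,
# because XOR is not linearly separable.
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [v / 2 for v in range(-8, 9)]
found = any(
    all(perceptron(w1, w2, b)(*p) == t for p, t in xor.items())
    for w1 in grid for w2 in grid for b in grid
)
print(found)   # False
```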



• Sigmoid threshold unit

The Sigmoid unit computes its output as o = σ(w ∙ x), where σ(y) = 1 / (1 + e^(−y)).

It is easy to check that dσ(y)/dy = σ(y)(1 − σ(y)).
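The derivative identity can be checked directly (the sample points are arbitrary):

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_prime(y):
    # The identity from the slide: d(sigma)/dy = sigma(y) * (1 - sigma(y))
    s = sigmoid(y)
    return s * (1.0 - s)

# Numeric check of the identity at a few points via central differences.
h = 1e-6
for y in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(y + h) - sigmoid(y - h)) / (2 * h)
    print(y, sigmoid_prime(y), numeric)
```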



• Sigmoid function picture

The closed-form least squares solution is x = (A^T A)^{-1} A^T b.

Iteration methods approach the optimal solution gradually through an updating step repeated each iteration.

Gradient descent, which belongs to the iteration methods, is applicable to least squares problems.

The Gauss-Newton method is a commonly used iterative approach to solving nonlinear least squares problems.

Levenberg-Marquardt is another iterative method for solving nonlinear least squares problems.
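A sketch comparing the closed-form solution with an iterative gradient descent solution on a small random system (NumPy assumed available; the sizes, seed, and step length are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))   # hypothetical overdetermined system
b = rng.normal(size=20)

# Closed form via the normal equations: x = (A^T A)^{-1} A^T b.
x_closed = np.linalg.solve(A.T @ A, A.T @ b)

# Gradient descent on f(x) = ||Ax - b||^2, whose gradient is 2 A^T (Ax - b).
x = np.zeros(3)
alpha = 0.01
for _ in range(5000):
    x -= alpha * 2 * A.T @ (A @ x - b)

print(x_closed)
print(x)   # the iterative solution agrees closely with the closed form
```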


The error is a function of the weights, and its minimum exists. The BP algorithm uses the gradient descent technique to find this minimum by gradually updating the weights.

It is easy to know that each weight should be updated along the negative gradient of the error with respect to that weight.

The remaining task is to derive a convenient expression for this gradient.



In summary:
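A sketch of what such a summary typically contains, in the standard textbook formulation of BP for a sigmoid network with learning rate η (the notation here is an assumption, not taken from this lecture):

```latex
% delta for each output unit k, with target t_k and output o_k:
\delta_k = o_k (1 - o_k)(t_k - o_k)
% delta for each hidden unit h, back-propagated from the output deltas:
\delta_h = o_h (1 - o_h) \sum_k w_{kh} \delta_k
% update for the weight from node i to node j, with input x_{ji}:
w_{ji} \leftarrow w_{ji} + \eta \, \delta_j \, x_{ji}
```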


1. Brief introduction
2. Feedforward NN
3. BP algorithm
4. Notes on BP
5. An application
6. Questions



• Learning process:
  – Stimulated by input samples, the connection weights update gradually, such that the network outputs approach the expected outputs step by step.

• Learning essence:
  – Dynamically updating the connection weights.

• Learning rule:
  – The rule by which the connection weights are updated (i.e., what rule is followed).



• Learning type: supervised
• Key idea:
  – The output error (in a suitable form) is back-propagated to the input layer via the hidden layer(s): the error is assigned to all units (nodes) in the layers, and the weight for each node is updated accordingly.
• Features:
  – Signal is forward-propagated
  – Error is back-propagated



• Forward propagation:
  – Input sample → input layer → every hidden layer → output layer
• Judge whether to go to back-propagation:
  – If the difference between the actual and expected outputs (in the output layer) is bigger than a threshold
• Back-propagation:
  – Represent the errors of each layer and update the weight for each node
• Stop if the output error is under a predefined threshold or the number of iterations attains the predefined maximum.
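The procedure above can be sketched end-to-end on a tiny example. This is a from-scratch illustration, not the lecture's code: a 2-2-1 sigmoid network trained on XOR with per-sample updates; the architecture, seed, and learning rate are arbitrary choices, and BP can stall on plateaus for XOR, so results may vary:

```python
import math
import random

random.seed(1)
rand = lambda: random.uniform(-1, 1)
W1 = [[rand() for _ in range(2)] for _ in range(2)]   # hidden-layer weights
b1 = [rand(), rand()]
W2 = [rand(), rand()]                                  # output-layer weights
b2 = rand()
sig = lambda y: 1 / (1 + math.exp(-y))

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
eta = 0.5
for _ in range(20000):
    for (x1, x2), t in data:
        # forward propagation: input -> hidden layer -> output layer
        h = [sig(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(2)]
        o = sig(W2[0] * h[0] + W2[1] * h[1] + b2)
        # back-propagation: output delta first, then hidden deltas
        do = o * (1 - o) * (t - o)
        dh = [h[j] * (1 - h[j]) * W2[j] * do for j in range(2)]
        # weight updates for each node
        W2 = [W2[j] + eta * do * h[j] for j in range(2)]
        b2 += eta * do
        for j in range(2):
            W1[j][0] += eta * dh[j] * x1
            W1[j][1] += eta * dh[j] * x2
            b1[j] += eta * dh[j]

def predict(x1, x2):
    h = [sig(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(2)]
    return sig(W2[0] * h[0] + W2[1] * h[1] + b2)

print([round(predict(x1, x2)) for (x1, x2), _ in data])
```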


Related concepts of gradient descent

1. Learning rate: in the process of gradient descent, the function decreases along the negative direction of the gradient. The learning rate determines the descent magnitude for each iteration step.

2. Feature: the inputs of the algorithm, which are used to describe the samples.

3. Hypothesis function: in supervised learning, it aims to fit the learning samples.

4. Loss function: it measures the effectiveness of the hypothesis function; generally it is computed as the square of the difference between the outputs and the predicted fitting values.
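As a toy illustration of these concepts (the loss L(w) = (w − 3)² and the learning rate here are made up for this sketch): gradient descent repeatedly steps along the negative gradient, scaled by the learning rate.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Move w against the gradient; the learning rate lr sets the step size."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# loss L(w) = (w - 3)^2 has gradient dL/dw = 2 * (w - 3)
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
# w_min is very close to the minimizer w = 3
```

With a too-large learning rate (e.g. lr = 1.1 for this loss) the iteration overshoots and diverges, which is why the choice of learning rate matters.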

Standard Gradient Descent: as described in the Gradient Descent Algorithm, the calculation of the gradient is based on all the training samples ⟨x, t⟩.

Stochastic Gradient Descent: whereas the gradient descent training rule presented in the Gradient Descent Algorithm computes the weight updates (to each wi) after summing over all the training examples, the idea behind stochastic gradient descent is to approximate this gradient descent search by updating the weights incrementally, following the calculation of the error for each individual example ⟨x, t⟩.

Batch Gradient Descent: the gradient is based on a batch of the training samples.


Remarks

The key differences between standard gradient descent and stochastic gradient descent are:

• In standard gradient descent, the error is summed over all examples ⟨x, t⟩ before updating the weights wi, whereas in stochastic gradient descent the weights are updated upon examining each training example.

• Summing over multiple examples in standard gradient descent requires more computation per weight update step. On the other hand, because it uses the true gradient, standard gradient descent is often used with a larger step size per weight update than stochastic gradient descent.

• In cases where there are multiple local minima with respect to the error E(w), stochastic gradient descent can sometimes avoid falling into these local minima because it uses the various ∇Ed(w) rather than ∇E(w) to guide its search.
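For a single linear unit trained with the delta rule, the two update schemes can be sketched as follows (an illustrative sketch; the data set and learning rate are made up):

```python
def sgd_epoch(w, data, lr):
    """Stochastic gradient descent: update w after each example (x, t)."""
    for x, t in data:
        w = w + lr * (t - w * x) * x
    return w

def batch_epoch(w, data, lr):
    """Standard (batch) gradient descent: sum the gradient contribution
    over all examples before making a single weight update."""
    w = w + lr * sum((t - w * x) * x for x, t in data)
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # t = 2 * x, so the best w is 2
w_sgd = w_batch = 0.0
for _ in range(40):
    w_sgd = sgd_epoch(w_sgd, data, lr=0.05)
    w_batch = batch_epoch(w_batch, data, lr=0.05)
# both w_sgd and w_batch approach 2
```

On this noiseless data both schemes converge to the same weight; they differ in how many updates are made per pass over the data.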

1. Brief introduction
2. Feedforward NN
3. BP algorithm
4. Notes on BP
5. An application
6. Questions

• A 3-layer feed-forward neural network: Neural
network learning to steer an autonomous vehicle

Questions:

1. If the features are not numerical but symbolic — that is, the input-output system has symbolic inputs and a real-valued output — how do you think BP can be used to train the approximator?
2. In comparison with the real-valued case, how is its performance?
3. In your own opinion, how should the step size in the Gradient Descent Algorithm be selected empirically?

Feedforward NN and
BP Algorithm

The End.

Convolutional Neural Network

1. Convolution definition
2. Convolution layer
3. Pooling layer
4. Fully connected layer
5. Example

A. Convolution and deconvolution are mathematical methods of integral transformation and have been widely applied in many fields.
B. Convolution has long produced good results for problems in well-test interpretation; as for deconvolution, it was only recently that Schroeter, Hollaender, Gringarten and others solved the stability problem of its computation, which quickly drew broad attention to the deconvolution method in the well-testing community.
C. Some experts regard the application of deconvolution as another major leap in the history of well-test interpretation methods. They predict that, with the adoption of new testing tools and techniques and a closer integration with research results from other disciplines, the role and importance of well testing in reservoir characterization will keep growing.

Convolution is closely related to the Fourier transform. Using the property that the product of the Fourier transforms of two functions equals the Fourier transform of their convolution, the treatment of many problems in Fourier analysis can be simplified.

The function f*g obtained by convolution is generally smoother than both f and g. In particular, when g is a smooth function with compact support and f is locally integrable, their convolution f*g is also a smooth function. Using this property, for any integrable function f one can easily construct a sequence of smooth functions fs approximating f; this method is called smoothing or regularization of a function.

The notion of convolution can also be extended to sequences, measures, and generalized functions.


Convolution is the result of multiplying two variables over some range and summing the products. If the convolution variables are sequences x(n) and h(n), then the result of the convolution is:

    y(n) = x(n) * h(n) = Σ_i x(i) h(n − i)

where the asterisk * denotes convolution. At time index n = 0, the sequence h(−i) is h(i) with its time index i reversed; this reversal flips h(i) by 180 degrees about the vertical axis, which is why this multiply-then-sum computation is called the convolution sum, or simply convolution. In addition, n is the amount by which h(−i) is shifted, and different n give different convolution results.

If the convolution variables are functions x(t) and h(t), the computation of the convolution becomes:

    y(t) = x(t) * h(t) = ∫ x(p) h(t − p) dp

where p is the integration variable (integration is also a form of summation), t is the amount by which the function h(−p) is shifted, and the asterisk * denotes convolution.
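The convolution sum above can be computed directly; a small sketch (the example sequences are made up):

```python
def conv_sum(x, h):
    """Convolution sum y(n) = sum_i x(i) * h(n - i): reverse h,
    shift it by n, multiply element-wise with x, and add up."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for i in range(len(x)):
            if 0 <= n - i < len(h):
                y[n] += x[i] * h[n - i]
    return y

print(conv_sum([1, 2, 3], [1, 1]))   # [1, 3, 5, 3]
```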
The connection of convolutional layer.

The connection of pooling layer.


Convolution in Neural Network

The filters play the role of feature detectors, e.g. for edge detection:

Vertical edge detection:      Horizontal edge detection:
1 0 -1                         1  1  1
1 0 -1                         0  0  0
1 0 -1                        -1 -1 -1

The weight values can be other numbers; what we need to do is to train the weights and the bias. Different kinds of filters extract different features.



Convolution in Neural Network
An example of edge detection

The picture is from Andrew Ng.

The filters can become more intricate as they start incorporating information from an increasingly larger spatial extent.
The computation of convolution in a neural network:

Input (6 x 6):           Filter (3 x 3):      Output (4 x 4):
10 10 10  0  0  0         1 0 -1              0 30 30 0
10 10 10  0  0  0         1 0 -1              0 30 30 0
10 10 10  0  0  0    *    1 0 -1        =     0 30 30 0
10 10 10  0  0  0                             0 30 30 0
10 10 10  0  0  0
10 10 10  0  0  0

For the top-left window:

10 10 10        1 0 -1
10 10 10   *    1 0 -1   =   0
10 10 10        1 0 -1

Then slide the local receptive field across the entire input image.
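This computation can be reproduced in a few lines. Note that, as is conventional in CNNs, the filter is slid over the image without flipping (strictly speaking, cross-correlation). A sketch using the 6 x 6 example above:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution as used in CNNs: slide the filter over the
    image, multiply each local receptive field element-wise, and sum."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(cols)]
            for i in range(rows)]

image = [[10, 10, 10, 0, 0, 0]] * 6      # the 6 x 6 input above
vertical_edge = [[1, 0, -1]] * 3         # the 3 x 3 vertical-edge filter
print(conv2d(image, vertical_edge))
# every row of the 4 x 4 output is [0, 30, 30, 0]
```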

Convolution in Neural Network

In the 6 x 6 example above, a higher output value means the local patch matches the feature better.

Convolutions on an RGB image

Each filter (W0, W1, …) spans all three RGB channels; convolving with it produces one feature map (Feature 0, Feature 1, …). The number of filters gives the depth of the output.

Why convolutions? Parameter sharing.

Pooling layers --- Shrinking the image stack

Pooling:
1. Pick a window size (usually 2 or 3).
2. Pick a stride (usually 2).
3. Walk your window across your filtered images.
4. From each window, take the maximum value.
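The four steps above can be sketched directly (the 4 x 4 feature map here is a made-up example):

```python
def max_pool(image, size=2, stride=2):
    """Max pooling: walk a window across the image and keep the
    maximum value of each window."""
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [[max(image[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]

feature_map = [[1, 3, 2, 1],
               [2, 9, 1, 1],
               [2, 3, 2, 3],
               [5, 6, 1, 2]]
print(max_pool(feature_map))   # [[9, 2], [6, 3]]
```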

Pooling layers --- Shrinking the image stack

Average pooling: calculate the average value of each window.

Input (4 x 4):       Output (2 x 2):
1 3 2 1
2 9 1 1              3.75  1.25
2 3 2 3              4     2
5 6 1 2

Pooling removes redundant information from the convolutional layer:
• By having less spatial information you gain computation performance.
• Less spatial information also means fewer parameters, so less chance to over-fit.
• You get some translation invariance.
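The average-pooling computation can be sketched the same way, using the 4 x 4 example above (window size 2, stride 2):

```python
def avg_pool(image, size=2, stride=2):
    """Average pooling: replace each window by its mean value."""
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size)) / (size * size)
             for j in range(cols)]
            for i in range(rows)]

feature_map = [[1, 3, 2, 1],
               [2, 9, 1, 1],
               [2, 3, 2, 3],
               [5, 6, 1, 2]]
print(avg_pool(feature_map))   # [[3.75, 1.25], [4.0, 2.0]]
```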

Fully connected layer

The convolutional layers help extract certain features from the image; the fully connected layer is then able to generalize from these features to the output space.

All the layers are put together; then the CNN looks like…

For example: say whether a picture is of an X or an O. The input is a two-dimensional array of pixels; the CNN outputs X or O.
What the computer sees:

-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1


Features match pieces of the image (feature detectors):

Diagonal \:       X crossing:      Diagonal /:
 1 -1 -1           1 -1  1         -1 -1  1
-1  1 -1          -1  1 -1         -1  1 -1
-1 -1  1           1 -1  1          1 -1 -1


Filtering: the math behind the match


Convolution layer --- One image becomes a stack of filtered images: a stack extracted by three filters, the number of filters giving the depth.

ReLU layer: a stack of images becomes a stack of images with no negative values.
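A sketch of the ReLU operation on a made-up feature map:

```python
def relu(feature_map):
    """ReLU: negative values become 0; non-negative values pass through."""
    return [[max(0, v) for v in row] for row in feature_map]

print(relu([[3, -1, -3], [-3, 1, 0]]))   # [[3, 0, 0], [0, 1, 0]]
```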

Pooling layer ---A stack of images becomes a stack of smaller images

Max pooling

Layers get stacked

The output of one layer becomes the input of the next. Layers can be repeated several (or many) times.

Fully connected layer

Every value gets a vote; the vote depends on how strongly a value predicts X or O.

Summary: putting it all together

A set of pixels becomes a set of votes (the classifier).
Convolutional Layer: another story, based on filters

Each filter is a 3 x 3 x channel tensor (channel = 3 for a colorful image, channel = 1 for black and white). Each filter detects a small pattern (3 x 3 x channel).
Convolutional Layer

Consider channel = 1 (a black-and-white image):

Image (6 x 6):       Filter 1:        Filter 2:
1 0 0 0 0 1           1 -1 -1         -1  1 -1
0 1 0 0 1 0          -1  1 -1         -1  1 -1
0 0 1 1 0 0          -1 -1  1         -1  1 -1
1 0 0 0 1 0
0 1 0 0 1 0          ……
0 0 1 0 1 0

(The values in the filters are unknown parameters.)
Convolutional Layer, Filter 1, stride = 1:

Image (6 x 6):       Filter 1:        Result (4 x 4):
1 0 0 0 0 1           1 -1 -1          3 -1 -3 -1
0 1 0 0 1 0          -1  1 -1         -3  1  0 -3
0 0 1 1 0 0          -1 -1  1         -3 -3  0  1
1 0 0 0 1 0                            3 -2 -2 -1
0 1 0 0 1 0
0 0 1 0 1 0
Convolutional Layer, Filter 2, stride = 1. Do the same process for every filter:

Filter 2:        Feature map (4 x 4):
-1  1 -1         -1 -1 -1 -1
-1  1 -1         -1 -1 -2  1
-1  1 -1         -1 -1 -2  1
                 -1  0 -4  3
Multiple Convolutional Layers

Each filter produces one 4 x 4 feature map. With 64 convolution filters, the output of the layer is an "image" with 64 channels; in the next convolutional layer, each filter therefore has shape 3 x 3 x 64, and the convolution is repeated on this 64-channel "image".
Comparison of Two Stories

A filter is a 3 x 3 x channel tensor, and the patch of the image it is currently applied to is its receptive field (bias is ignored in this slide).
The neurons with different receptive fields share the parameters (each neuron also has a bias). Each filter convolves over the input image.


Convolutional Layer

Neuron-version story: each neuron only considers a receptive field; the neurons with different receptive fields share the parameters.

Filter-version story: there are a set of filters detecting small patterns; each filter convolves over the input image.

They are the same story.
Observation 3

• Subsampling the pixels will not change the object: a subsampled image of a bird is still a bird.
Pooling – Max Pooling

Feature map of Filter 1:     Feature map of Filter 2:
 3 -1 -3 -1                  -1 -1 -1 -1
-3  1  0 -3                  -1 -1 -2  1
-3 -3  0  1                  -1 -1 -2  1
 3 -2 -2 -1                  -1  0 -4  3
Convolutional Layers + Pooling

After 2 x 2 max pooling, each 4 x 4 feature map becomes 2 x 2:

Filter 1:     Filter 2:
3 0           -1 1
3 1            0 3

Convolution followed by pooling can be repeated; the result is again an "image", with one channel per filter (e.g. 64 channels for 64 filters).
The whole CNN

image → Convolution → Pooling → Convolution → Pooling (repeated) → Flatten → Fully Connected Layers → softmax → cat / dog / ……
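The feature-extraction part of this pipeline can be sketched end to end (a self-contained toy sketch, not the lecture's network: one convolution layer with the two 3 x 3 filters from the example, ReLU, one 2 x 2 max pooling, then flattening; the fully connected layers and softmax are omitted):

```python
def conv2d(img, k):
    """Valid convolution (CNN-style cross-correlation) of a 2-D image."""
    kh, kw = len(k), len(k[0])
    return [[sum(img[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def relu(m):
    return [[max(0, v) for v in row] for row in m]

def max_pool(m, s=2):
    return [[max(m[i * s + a][j * s + b] for a in range(s) for b in range(s))
             for j in range(len(m[0]) // s)]
            for i in range(len(m) // s)]

def flatten(maps):
    """Concatenate all pooled feature maps into one vector for the FC layers."""
    return [v for m in maps for row in m for v in row]

image = [[1, 0, 0, 0, 0, 1],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 1, 1, 0, 0],
         [1, 0, 0, 0, 1, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 1, 0, 1, 0]]
filters = [[[1, -1, -1], [-1, 1, -1], [-1, -1, 1]],
           [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]]

features = flatten([max_pool(relu(conv2d(image, f))) for f in filters])
# 2 filters -> 2 pooled 2 x 2 maps -> a feature vector of length 8
```

The resulting vector would be fed to the fully connected layers, whose softmax output gives the class votes.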
Application: Playing Go

The board is a 19 x 19 matrix treated as an image (black: 1, white: -1, none: 0; Alpha Go actually uses 48 channels), and the network outputs the next move, one of the 19 x 19 positions (19 x 19 classes). A fully-connected network can be used, but a CNN performs much better.
Why CNN for Go playing?

• Some patterns are much smaller than the whole image (Alpha Go uses 5 x 5 filters for the first layer).
• The same patterns appear in different regions.
Why CNN for Go playing?

• Subsampling the pixels will not change the object: but how can this be explained for Go? In fact, Alpha Go does not use pooling……


More Applications

Speech: https://dl.acm.org/doi/10.1109/TASLP.2014.2339736
Natural Language Processing: https://www.aclweb.org/anthology/S15-2079/
Convolutional Neural
Networks

The End.

Deep Learning

1. Introduction
2. What is Deep Learning
3. Partial connections
4. Initial weights
5. Biological & Theoretical Justification
6. Looking Forward

Winter of Neural Network

• Non-convex
• Need a lot of tricks to play with
• Hard to do theoretical analysis

What's wrong with back-propagation

1. It requires labeled training data:
   • Almost all data is unlabeled.
2. The learning time does not scale well:
   • It is very slow in networks with multiple hidden layers.
3. It can get stuck in poor local optima:
   • These are often quite good, but for deep nets they are far from optimal.

The Paradigm of Deep Learning

Neural networks are coming back!

Race on ImageNet (Top 5 Hit Rate)

72%, 2010

74%, 2011

85%, 2012

Answer from Geoff Hinton, 2012.10


The Architecture

• Max-pooling layers follow the first, second, and fifth convolutional layers.
• The number of neurons in each layer is given by 253440, 186624, 64896, 64896, 43264, 4096, 4096, 1000.

Revolution on Speech Recognition, NLP…

Deep Learning in Industry

• First successful deep learning models for speech recognition, by MSR in 2009
• Now deployed in MS products, e.g. Xbox

Deep Learning in Industry

"Google Brain" Project
• Led by Google fellow Jeff Dean
• Published two papers: ICML 2012, NIPS 2012
• Company-wide large-scale deep learning infrastructure
• Big success on images, speech, NLP
What is Deep Learning (DL)

The training of a feed-forward neural network possessing some of the main features ①, ②, ③, ④ (i.e., the process of determining the connection weights from data) is called Deep Learning.

5. Further thinking about DL

Fundamental structures, 2006–2019 timeline: Deep CNN (AlexNet), Recurrent NN (GRU), Multi-scale fusion (Inception), ResNet, DenseNet, randomly connected networks.

Basic training strategy

• The basic training strategy is the BP (back-propagation) algorithm, an old optimization technique based on gradient descent, with some special features:
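As a refresher on the underlying mechanics, one gradient-descent update moves the weights against the gradient of the loss. A minimal sketch (the quadratic loss, learning rate, and iteration count are illustrative assumptions, not part of the lecture):

```python
def gradient_descent_step(w, grad, lr=0.1):
    """One gradient-descent update: move against the gradient."""
    return w - lr * grad

# Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, 2.0 * (w - 3.0), lr=0.1)
# w approaches the minimizer w* = 3
```

BP applies exactly this update, with the gradient computed layer by layer via the chain rule.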

Deep Learning

1. Introduction
2. What is Deep Learning
3. Partial connections
4. Initial weights
5. Biological & Theoretical Justification
6. Looking Forward

Partial Connections - Convolutional Layer

The connections of the convolutional layer.

The connections of the pooling layer.
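To make the "partial connection" concrete: a convolutional unit looks only at a local patch of the input and shares its weights across positions. A minimal sketch of a valid 2-D convolution (really cross-correlation, as CNN layers compute); the example image and kernel are illustrative assumptions:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: each output unit connects
    only to a local patch of the input (partial connection)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0]])   # horizontal difference filter
result = conv2d(image, edge_kernel)     # output shape (4, 3)
```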

Partial Connections - Pooling Layers

Pooling layers shrink the image stack.

Pooling:
1. Pick a window size (usually 2 or 3).
2. Pick a stride (usually 2).
3. Walk your window across your filtered images.
4. From each window, take the maximum value.
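The four steps above can be sketched directly (window 2, stride 2; the feature map values are an illustrative example):

```python
import numpy as np

def max_pool(feature_map, window=2, stride=2):
    """Max pooling: slide a window across the map with the given
    stride and keep the maximum value of each patch."""
    h, w = feature_map.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i*stride:i*stride+window,
                                j*stride:j*stride+window]
            out[i, j] = patch.max()
    return out

fm = np.array([[1., 3., 2., 0.],
               [4., 6., 1., 2.],
               [0., 1., 9., 8.],
               [2., 3., 7., 5.]])
pooled = max_pool(fm)   # the 4x4 map shrinks to 2x2
```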

Skip Connections / Cyclic Connections
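The slide's details are in its figure; as a sketch, a skip (residual) connection adds the input back to a block's output, y = F(x) + x, so gradients have a direct path around the transformation F. The linear-plus-ReLU form of F here is an assumption for illustration:

```python
import numpy as np

def residual_block(x, weight):
    """Skip connection: output = F(x) + x, so gradients can flow
    around the transformation F (the idea behind ResNet)."""
    fx = np.maximum(0.0, weight @ x)   # F(x): a linear map + ReLU
    return fx + x                      # add the identity shortcut

x = np.array([1.0, -2.0, 0.5])
w = np.zeros((3, 3))                   # even when F(x) = 0 ...
y = residual_block(x, w)               # ... the input passes through unchanged
```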


Partial Connections – Full Connection Layer

Full connection layer
The convolutional layers extract features from the image; the fully connected layer then generalizes from these features to the output space.

Deep Learning

1. Introduction
2. What is Deep Learning
3. Partial connections
4. Initial weights
5. Biological & Theoretical Justification
6. Looking Forward

Initial Weights - Auto-Encoder Neural Network


Initial Weights - Auto-Encoder Neural Network

[Diagram: Input → Encoder → Code → Decoder → Prediction, with the reconstruction Error measured between input and prediction]
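The Encoder → Code → Decoder → Error pipeline in the diagram can be sketched as a single forward pass (layer sizes, sigmoid activations, and the squared-error loss are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Encoder and decoder weights (small random init; sizes are illustrative).
W_enc = rng.normal(scale=0.1, size=(2, 4))   # 4 inputs -> 2-unit code
W_dec = rng.normal(scale=0.1, size=(4, 2))   # 2-unit code -> 4 outputs

x = np.array([0.2, 0.8, 0.5, 0.1])   # Input
code = sigmoid(W_enc @ x)            # Encoder -> Code
x_hat = sigmoid(W_dec @ code)        # Decoder -> Prediction
error = np.mean((x - x_hat) ** 2)    # reconstruction Error to minimize
```

Training adjusts W_enc and W_dec to drive this reconstruction error down.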

Sparse Auto-Encoder

[Diagram: Input → Encoder → Code → Decoder → Prediction; the objective combines the reconstruction Error with a Sparsity Penalty on the code]
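One common realization of the sparsity penalty (an assumption; the slide does not fix its form) is the KL divergence between a small target activation rho and each hidden unit's average activation rho_hat:

```python
import numpy as np

def kl_sparsity_penalty(avg_activation, rho=0.05):
    """KL-divergence sparsity penalty, a common choice for sparse
    auto-encoders: penalize hidden units whose average activation
    rho_hat deviates from a small target rho."""
    rho_hat = np.clip(avg_activation, 1e-8, 1 - 1e-8)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

quiet = kl_sparsity_penalty(np.array([0.05, 0.05]))   # at target: no penalty
busy = kl_sparsity_penalty(np.array([0.5, 0.6]))      # too active: penalized
```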

Initial Weights – Stacked Auto-Encoders

Stacked Auto-Encoders

Initial Weights – Sparse Coding

Initial Weights – Restricted Boltzmann Machine

[Diagram: visible variables v connected to hidden variables h through weights W]

The energy of the joint configuration (with visible biases b and hidden biases a):
E(v, h) = − Σ_{i,j} W_ij v_i h_j − Σ_i b_i v_i − Σ_j a_j h_j

Probability of the joint configuration is given by the Boltzmann distribution:
P(v, h) = exp(−E(v, h)) / Z, where Z = Σ_{v,h} exp(−E(v, h))
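The energy and the (unnormalized) Boltzmann probability can be computed directly; the toy configuration and weights below are illustrative assumptions:

```python
import numpy as np

def rbm_energy(v, h, W, b, a):
    """E(v, h) = -v^T W h - b^T v - a^T h."""
    return -(v @ W @ h) - (b @ v) - (a @ h)

def unnormalized_prob(v, h, W, b, a):
    """Numerator of the Boltzmann distribution, exp(-E(v, h));
    dividing by the partition function Z would give P(v, h)."""
    return np.exp(-rbm_energy(v, h, W, b, a))

v = np.array([1.0, 0.0])          # visible configuration
h = np.array([1.0])               # hidden configuration
W = np.array([[0.5], [-0.3]])     # 2 visible units, 1 hidden unit
b = np.zeros(2)                   # visible biases
a = np.zeros(1)                   # hidden biases
e = rbm_energy(v, h, W, b, a)     # = -(1 * 0.5) = -0.5 here
```

Lower-energy configurations receive exponentially higher probability.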

Initial Weights – Restricted Boltzmann Machine

[Diagram: visible variables v, hidden variables h, weights W]

Restricted: no interactions between hidden variables.

Inferring the distribution over the hidden variables is easy, because it factorizes:
P(h_j = 1 | v) = σ(Σ_i W_ij v_i + a_j)

Similarly:
P(v_i = 1 | h) = σ(Σ_j W_ij h_j + b_i)

Initial Weights – Model Parameter Learning

Maximize the (penalized) log-likelihood objective:
L(W) = Σ_n log P(v^(n))

Derivative of the log-likelihood:
∂ log P(v) / ∂W_ij = ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model

Initial Weights – Contrastive Divergence
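The slide's derivation is in its figure; as a hedged sketch, contrastive divergence (CD-1) approximates the intractable model expectation ⟨v_i h_j⟩_model with a single Gibbs sampling step. This assumes a binary RBM with the sigmoid conditionals given earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_gradient(v0, W, b, a):
    """CD-1 estimate of the RBM log-likelihood gradient for W:
    <v h>_data - <v h>_model, with the model term approximated
    by one Gibbs sampling step."""
    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visibles, re-infer the hiddens.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + a)
    return np.outer(v0, ph0) - np.outer(v1, ph1)

v0 = np.array([1.0, 0.0, 1.0])              # one binary training example
W = rng.normal(scale=0.1, size=(3, 2))      # 3 visible x 2 hidden weights
dW = cd1_gradient(v0, W, b=np.zeros(3), a=np.zeros(2))
# Training would then apply: W += learning_rate * dW
```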

Initial Weights – RBM & Auto-Encoders

Auto-encoders:
• also involve activation and reconstruction
• but have an explicit f(x)
• do not necessarily enforce sparsity
• but if sparsity is put on the code a, often get improved results [e.g. sparse RBM, Lee et al. NIPS08]

[Diagram: x → encoding f(x) → code a → decoding g(x) → x′]

Initial Weights – DBNs for MNIST Classification

After layer-by-layer unsupervised pre-training, discriminative fine-tuning by back-propagation achieves an error rate of 1.2% on MNIST. SVMs get 1.4% and randomly initialized back-propagation gets 1.6%.

Initial Weights – Deep Auto-encoders for Unsupervised Feature Learning

Initial Weights – Recap of Deep Learning Tutorial

1. Building blocks:
• RBMs, Auto-encoder Neural Nets, Sparse Coding
2. Go deeper: layer-wise feature learning:
• Layer-by-layer unsupervised training
• Layer-by-layer supervised training
3. Fine-tuning via back-propagation:
• If the data are big enough, direct fine-tuning is enough.
4. Sparsity on hidden layers is often useful.

Deep Learning

1. Introduction
2. What is Deep Learning
3. Partial connections
4. Initial weights
5. Biological & Theoretical Justification
6. Looking Forward

Why Hierarchy?

Biological & Theoretical Justification

Theoretical:
"…well-known depth-breadth tradeoff in circuits design [Hastad 1987]. This suggests many functions can be much more efficiently represented with deeper architectures…" [Bengio & LeCun 2007]

Biological:
The visual cortex is hierarchical (Hubel-Wiesel model).

Sparse DBN: Training on face images

Biological & Theoretical Justification

Feature hierarchy learned from face images: pixels → edges → object parts → object models.

Deep Learning

1. Introduction
2. What is Deep Learning
3. Partial connections
4. Initial weights
5. Biological & Theoretical Justification
6. Looking Forward

Why does it work so well?

Looking Forward

Plan:
• Propose explanatory hypotheses
• Observe the effects of pre-training
• Infer its role & level of agreement with our hypotheses

Regularization hypothesis:
• The unsupervised component constrains the network to model P(x)
• Representations good for P(x) are good for P(y|x)

Optimization hypothesis:
• Unsupervised initialization lands near a better local minimum of P(y|x)
• It reaches a lower local minimum not achievable by random initialization

Open Questions …

Looking Forward

1. Is there a depth that is mostly sufficient for the computations necessary to approach human-level performance on AI tasks?
2. Why is gradient-based training of deep neural networks from random initialization often unsuccessful?
3. Are there other efficiently trainable deep architectures besides Deep Belief Networks, Stacked Auto-encoders, and deep Boltzmann Machines?
4. Why is unsupervised pre-training important? …

Deep Learning

The End