Deep Learning Overview
Nguyen Quang Uy
4/15/16
Outline
This presentation provides an introduction to machine learning and deep learning.
Concept of Machine Learning
Deep learning
[Figure: The machine learning workflow. a) Training phase: data → feature extraction/selection → learning → learnt model. b) Testing/Deploying: new data → feature extraction/selection → learnt model → decision.]
Datasets
Often in the form of tables, e.g. the Iris flower dataset:

Sepal length (cm) | Sepal width (cm) | Petal length (cm) | Petal width (cm) | Class
5.1               | 3.5              | 1.4               | 0.2              | Iris-setosa
7.0               | 3.2              | 4.7               | 1.4              | Iris-versicolor
6.3               | 3.3              | 6.0               | 2.5              | Iris-virginica
...               | ...              | ...               | ...              | ...
Classification Problem

Training set:
F1  | F2  | F3  | F4  | Class
5.1 | 3.5 | 1.4 | 0.2 | 1
7.0 | 3.2 | 4.7 | 1.4 | 0
6.3 | 3.3 | 6.0 | 2.5 | 1
2.1 | 1.3 | 6.2 | 5.7 | 1
1.0 | 1.8 | 2.5 | 2.6 | 0

New samples to classify:
F1  | F2  | F3  | F4  | Class
2.2 | 1.8 | 3.7 | 4.6 | ?
3.5 | 2.9 | 4.8 | 5.2 | ?
K-nearest neighbors (KNN):
1. Calculate the distance between the new sample and all examples in the training set.
2. Select the K nearest examples to the new sample in the training set.
3. Assign the new sample to the most common class among its K nearest neighbors (a sketch of these steps follows).
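A minimal sketch of these three steps in plain Python (squared Euclidean distance is assumed, matching the worked example on the following slides; the function name is illustrative):

```python
from collections import Counter

def knn_classify(train_X, train_y, new_x, k=3):
    # 1. Distance from the new sample to every training example
    #    (squared Euclidean; assumed, as the slides do not fix a metric).
    dists = [sum((a - b) ** 2 for a, b in zip(x, new_x)) for x in train_X]
    # 2. Indices of the K nearest training examples.
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # 3. Majority vote among the K nearest neighbors.
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Worked example from the next slides: X1 = 3, X2 = 7, K = 3 -> "Good".
X = [(7, 7), (7, 4), (3, 4), (1, 4)]
y = ["Bad", "Bad", "Good", "Good"]
print(knn_classify(X, y, (3, 7), k=3))
```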
X1 (Acid durability, seconds) | X2 (Strength, kg/m²) | Y = Classification
7                             | 7                    | Bad
7                             | 4                    | Bad
3                             | 4                    | Good
1                             | 4                    | Good

Classify a new paper tissue with X1 = 3 and X2 = 7, using K = 3.
X1 (Acid durability, seconds) | X2 (Strength, kg/m²) | Squared distance to the new sample | Y = Classification
7                             | 7                    | (7-3)² + (7-7)² = 16               | Bad
7                             | 4                    | (7-3)² + (4-7)² = 25               | Bad
3                             | 4                    | (3-3)² + (4-7)² = 9                | Good
1                             | 4                    | (1-3)² + (4-7)² = 13               | Good

Classify the new paper tissue with X1 = 3 and X2 = 7, using K = 3.
Since K = 3, the three closest samples (distances 9, 13 and 16) contain two Good and one Bad, so the new sample is classified as Good.
Performance Measure
Accuracy = (number of correctly classified examples) / (total number of examples)
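As a simple illustration (a sketch; the helper name is ours, not the slides'):

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8
```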
Robotics on Mars
Outline
This presentation provides an introduction to machine learning and deep learning.
Concept of Machine Learning
Deep learning
Activation function
The activation function defines the output of a node given a set of inputs.
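For instance, the logistic sigmoid used in the numerical example below can be sketched as:

```python
import math

def sigmoid(net):
    # Logistic sigmoid: maps any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-net))
```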
Numerical Example
net_h1 = 0.15*0.05 + 0.2*0.1 + 0.35 = 0.3775
out_h1 = 1/(1 + e^(-net_h1)) = 1/(1 + e^(-0.3775)) ≈ 0.593
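The same arithmetic, checked in Python (weights, inputs and bias are the values above):

```python
import math

w1, w2, b = 0.15, 0.20, 0.35   # weights and bias from the example
i1, i2 = 0.05, 0.10            # inputs from the example

net_h1 = w1 * i1 + w2 * i2 + b            # 0.3775
out_h1 = 1.0 / (1.0 + math.exp(-net_h1))  # sigmoid -> ~0.593
print(net_h1, out_h1)
```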
Multilayer Perceptron
Training MLP
Cost function
The cost function is the objective function for the model; we want to find the parameters that optimize it.
One popular cost function for neural networks is the cross-entropy cost function:

J(θ) = -[y log h_θ(x) + (1 - y) log(1 - h_θ(x))]

where y is the target value for input x, and h_θ(x) is the output of the model given x.
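In code, this cost for a single training example might look like (a sketch):

```python
import math

def cross_entropy(y, h):
    # J(theta) = -[y*log(h) + (1-y)*log(1-h)] for one example,
    # where h is the model output h_theta(x) in (0, 1).
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

print(cross_entropy(1, 0.9))  # small cost: confident, correct prediction
print(cross_entropy(1, 0.1))  # large cost: confident, wrong prediction
```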
Parameter Estimation
We select the parameters θ such that
J(θ) = -[y log h_θ(x) + (1 - y) log(1 - h_θ(x))]
is minimal.
Gradient Descent
1. Compute a search direction p_k = ∇J(x_k)
2. Update x_{k+1} = x_k − α p_k
3. Check for convergence (stopping criteria), e.g. dJ/dx = 0
4. k = k + 1; repeat steps 1 to 4 (a sketch of this loop follows).
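A minimal version of this loop in Python (grad_J, alpha and the 1-D setting are assumptions for illustration):

```python
def gradient_descent(x, grad_J, alpha=0.1, tol=1e-6, max_iter=1000):
    for k in range(max_iter):
        p = grad_J(x)        # 1. search direction p_k = grad J(x_k)
        x = x - alpha * p    # 2. update x_{k+1} = x_k - alpha * p_k
        if abs(p) < tol:     # 3. stop when the gradient is (near) zero
            break            # 4. otherwise k = k + 1 and repeat
    return x

# Example: minimize J(x) = x**2 with gradient 2x; converges to x = 0.
print(gradient_descent(5.0, lambda x: 2 * x))
```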
Stopping criteria?
Initializing the MLP
For biases: initialize to 0 (a common choice).
For weights: sample uniformly from [-a, a], where

a = √6 / √(H_k + H_{k-1})

and H_k is the number of units in layer k.
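A sketch of this scheme with numpy (assuming the formula above is the normalized uniform initialization; layer sizes are illustrative):

```python
import numpy as np

def init_layer(h_prev, h_next):
    # Weights: uniform in [-a, a] with a = sqrt(6) / sqrt(H_k + H_{k-1}).
    a = np.sqrt(6.0) / np.sqrt(h_prev + h_next)
    W = np.random.uniform(-a, a, size=(h_prev, h_next))
    b = np.zeros(h_next)  # biases start at zero
    return W, b

W, b = init_layer(1000, 500)
print(W.shape, W.min(), W.max())
```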
How do we pick α?
The step sizes α_t should satisfy

∑_{t=1}^{∞} α_t = ∞   and   ∑_{t=1}^{∞} α_t² < ∞
How do we pick α?
Decreasing strategies, e.g.

α_t = α_0 / (1 + μt)

where α_0 is the initial step size and μ controls the decay.
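One such schedule in Python (the α_0 and μ values are illustrative):

```python
def step_size(t, alpha0=0.5, mu=0.01):
    # alpha_t = alpha_0 / (1 + mu * t): its sum diverges while the sum of
    # its squares converges, satisfying the conditions two slides back.
    return alpha0 / (1.0 + mu * t)

print([round(step_size(t), 4) for t in (1, 10, 100, 1000)])
```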
Outline
This presentation provides an introduction to machine learning and deep learning.
Concept of Machine Learning
Deep learning
Leading researchers
LeCun at Facebook
Successful applications
A fully connected network with 1000 inputs, two hidden layers of 500 nodes each, and 10 outputs has 1000×500 + 500×500 + 500×10 = 755,000 weight parameters.
Computationally expensive.
Gradients decay quickly through the layers (the vanishing gradient problem).
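Checking the weight count (layer sizes from the slide):

```python
# Weights between consecutive fully connected layers.
sizes = [1000, 500, 500, 10]
n_weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
print(n_weights)  # 755000 (excluding biases)
```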
What is the novelty of DL?
1. What exactly is deep learning?
2. Why is it generally better than other methods on
image, speech and certain other types of data?
Network structure
Training algorithms
Numpy
Theano
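A tiny example of what Theano provides, symbolic expressions compiled to callable functions (a sketch; assumes Theano is installed):

```python
import numpy as np
import theano
import theano.tensor as T

# Symbolic sigmoid of a dot product, compiled into a callable function.
x = T.dvector('x')
w = T.dvector('w')
out = T.nnet.sigmoid(T.dot(x, w))
f = theano.function([x, w], out)

print(f(np.array([0.05, 0.10]), np.array([0.15, 0.20])))
```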
Thank you!