Notes_ML_02_Slides_RNN_ANN
Figure: biological neuron (dendrites, axon).

The perceptron output:

OUT = f(net) = +1 if net > 0, −1 if net ≤ 0,   with net = Σj wij · Inj

f is the activation function.
Perceptron: example
• Two inputs: x, y
• Two weights: w1, w2
• Bias input: w0 (on a constant input of 1)

out = f(w0 + w1·x + w2·y)

The equation w0 + w1·x + w2·y = 0 defines a line in the (x, y) plane:
• where w0 + w1·x + w2·y > 0 the output is +1
• where w0 + w1·x + w2·y < 0 the output is −1

Figure: the perceptron diagram and its separating line in the (x, y) plane.
Representational power of Perceptrons
A perceptron can represent any linearly separable function:
it represents a hyperplane decision surface in the n-dimensional space of instances.
• 1 neuron: 2 classes (+1 and −1)
• 2 neurons: 4 classes (+1+1, +1−1, −1+1, −1−1)
• N neurons: 2^N classes

Figure: two class distributions A and B, one linearly separable (OK) and one that is not (not OK).
Exercise Table 1
Figure 1 – Perceptron.
One answer: w0 = 1, w1 = −2, w2 = 2
(0 represents NL0, 1 represents NL1)
The Perceptron training rule (Delta Rule)

wi ← wi + Δwi   with   Δwi = η (t − o) xi
The Perceptron training algorithm
• Initialize the weight vector w
• For each training sample i, with (xi, s(xi)) = (xi, ti):
  – Use the current w to calculate s'(xi) = oi
  – If |ti − oi| > ε then update: w ← w + η·(ti − oi)·xi
• Stop when |ti − oi| ≤ ε for all samples (xi, ti)
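A minimal NumPy sketch of this training loop; the OR dataset, learning rate, tolerance, and epoch limit are illustrative assumptions, not part of the slides.

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, eps=0.0, max_epochs=100):
    """Perceptron training rule: w <- w + eta * (t_i - o_i) * x_i.
    X has one sample per row (a constant bias input of 1 is prepended);
    t holds targets in {-1, +1}."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # bias input for w0
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x_i, t_i in zip(X, t):
            o_i = 1.0 if w @ x_i > 0 else -1.0     # out = f(net), threshold at 0
            if abs(t_i - o_i) > eps:               # misclassified: update
                w += eta * (t_i - o_i) * x_i
                updated = True
        if not updated:                            # |t_i - o_i| <= eps for all samples
            break
    return w

# Example: learn the (linearly separable) OR function with targets in {-1, +1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, 1, 1, 1], dtype=float)
print(perceptron_train(X, t))   # one separating hyperplane (w0, w1, w2)
```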
• One intermediate (hidden) layer is sufficient to approximate any continuous function.
• Two intermediate layers are sufficient to approximate any mathematical function.
MLP: Multi-layer Perceptrons

Example: in the (X1, X2) plane, classification requires 3 straight lines (A, B, C) that
create 7 compartments and two decision regions: one region formed by some of the
compartments and one formed by the others.

Solution: an MLP with inputs x1 and x2, a hidden layer of three units (A, B, C, one per
line), and an output unit that maps the hidden codes (−1−1−1), (−1−1+1), …, (+1+1+1)
to Class 1 (out = +1) or Class 2 (out = −1).

Figure: the three separating lines and the corresponding two-layer network.
Example of an MLP
• x1, x2: binary inputs.
• w1 = w2 = w3 = w4 = w5 = 1 and w6 = −2.
• f1(x·w) = 1 if the activation level ≥ 0.5, 0 otherwise.
• f2(x·w) = 1 if the activation level ≥ 1.5, 0 otherwise.

Truth table:
x1 x2 | Out
 0  0 |  0
 0  1 |  1
 1  0 |  1
 1  1 |  0

Figure: the network, with two hidden units (thresholds 0.5 and 1.5) feeding an output
unit (threshold 0.5); the weight from the 1.5-threshold unit to the output is −2.

We got a network for XOR!
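A small check of this network in Python, assuming the wiring suggested by the figure (two hidden units with thresholds 0.5 and 1.5, an output unit with threshold 0.5, and weight −2 from the 1.5-threshold unit to the output):

```python
def step(net, theta):
    """f(x·w) = 1 if the activation level >= theta, 0 otherwise."""
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    h1 = step(1 * x1 + 1 * x2, 0.5)     # OR-like hidden unit  (threshold 0.5)
    h2 = step(1 * x1 + 1 * x2, 1.5)     # AND-like hidden unit (threshold 1.5)
    return step(1 * h1 - 2 * h2, 0.5)   # output unit: w5 = 1, w6 = -2, threshold 0.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # prints the XOR truth table
```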
Activation functions

Threshold (step) function:
Out = +1 if net > θ
Out = −1 if net < θ

Hyperbolic tangent function:
Out = tanh(λ·net)

MLP / Sigmoid
σ(net) = 1 / (1 + e^(−net))   or   σ(net) = tanh(net)

Interesting property: simple derivatives.
σ(x) = 1 / (1 + e^(−x))  ⇒  σ'(x) = σ(x)·(1 − σ(x))
σ(x) = tanh(x)           ⇒  σ'(x) = 1 − σ²(x)

Figure: step, logistic, and hyperbolic tangent activation functions.
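A quick numerical check of these derivative identities (a sketch; the test points and step size are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
h = 1e-6

# central-difference derivative vs. the closed-form expressions
num_sig = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.allclose(num_sig, sigmoid(x) * (1 - sigmoid(x))))   # True

num_tanh = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
print(np.allclose(num_tanh, 1 - np.tanh(x) ** 2))            # True
```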
MLP training
• Key idea: use gradient descent to search the hypothesis space of possible weight
vectors for the weights that best fit the training examples.
• Learn the wi that minimize the squared error:

E[w] = (1/2) Σd∈D (td − od)²

where D is the training data.
Gradient Descent

E[w] = (1/2) Σd∈D (td − od)²

Gradient:
∇E[w] = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]

Rule:
Δw = −η ∇E[w],   i.e.   Δwi = −η ∂E/∂wi   (η positive)

because we want to move the weight vector in the direction in which the error E decreases.
Gradient Descent (one layer)

∂E/∂wi = ∂/∂wi [ (1/2) Σd (td − od)² ]
       = (1/2) Σd ∂/∂wi (td − od)²
       = (1/2) Σd 2 (td − od) ∂/∂wi (td − od)
       = Σd (td − od) ∂/∂wi (td − σ(w·xd))          (σ is the logistic function)
       = Σd (td − od) · (− ∂/∂wi σ(w·xd))
       = −Σd (td − od) · σ(w·xd) (1 − σ(w·xd)) · xi,d
       = −Σd (td − od) · od (1 − od) · xi,d
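A sketch of batch gradient descent for a single sigmoid unit using the gradient just derived; the toy AND-like dataset, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gradient_step(w, X, t, eta=0.5):
    """One batch step of Δw = −η ∂E/∂w for a single sigmoid unit, using
    ∂E/∂wi = −Σ_d (t_d − o_d) · o_d (1 − o_d) · x_{i,d}."""
    o = sigmoid(X @ w)                        # outputs o_d for every sample
    grad = -(X.T @ ((t - o) * o * (1 - o)))   # ∂E/∂w
    return w - eta * grad                     # Δw = −η ∇E[w]

# Toy data: an AND-like target; the first column of X is the bias input
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.0, 1.0])
w = np.zeros(3)
for _ in range(5000):
    w = gradient_step(w, X, t)
print(sigmoid(X @ w))   # outputs move toward [0, 0, 0, 1]
```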
Gradient Descent: multiple outputs

E[w] = (1/2) Σd∈D Σk∈outputs (tkd − okd)²
The Backpropagation algorithm
• A feed-forward network with nin inputs, one hidden layer of nhid sigmoidal units, and
nout sigmoidal output units, trained on several samples <x, t>.
• x is the input, o the net output, t the target output, W the weights.
http://www.trapexit.org/images/b/ba/Animate_ANN.gif
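A compact sketch of one backpropagation step for such a network (sigmoid hidden and output units, squared error); the bias handling and the per-sample stochastic update are implementation choices, not prescribed by the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hid, W_out, eta=0.3):
    """One stochastic-gradient step for a nin -> nhid -> nout sigmoid network
    minimizing (1/2) Σ_k (t_k − o_k)².  Both weight matrices have a bias column."""
    x = np.append(x, 1.0)                     # bias input for the hidden layer
    h = sigmoid(W_hid @ x)                    # hidden activations
    hb = np.append(h, 1.0)                    # bias input for the output layer
    o = sigmoid(W_out @ hb)                   # network outputs

    delta_out = (t - o) * o * (1 - o)                         # output error terms
    delta_hid = h * (1 - h) * (W_out[:, :-1].T @ delta_out)   # backpropagated error terms

    W_out += eta * np.outer(delta_out, hb)    # Δw = η · δ · activation
    W_hid += eta * np.outer(delta_hid, x)
    return W_hid, W_out, o
```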
Example:
initial weights in (−0.1, +0.1); η = 0.3

Input      Output
10000000 → 10000000
01000000 → 01000000
00100000 → 00100000
00010000 → 00010000
00001000 → 00001000
00000100 → 00000100
00000010 → 00000010
00000001 → 00000001
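Using the sigmoid and backprop_step sketch above on this 8→3→8 identity task; the random seed and the number of epochs are assumptions:

```python
import numpy as np   # reuses sigmoid() and backprop_step() from the sketch above

rng = np.random.default_rng(0)
X = np.eye(8)                                  # the 8 one-hot input/target patterns
W_hid = rng.uniform(-0.1, 0.1, size=(3, 9))    # initial weights in (-0.1, +0.1), plus a bias column
W_out = rng.uniform(-0.1, 0.1, size=(8, 4))
for _ in range(5000):                          # epoch count is an assumption
    for x in X:
        W_hid, W_out, o = backprop_step(x, x, W_hid, W_out, eta=0.3)

# hidden-layer code for the first input after training
print(sigmoid(W_hid @ np.append(X[0], 1.0)))
```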
Learning the representation (inner layer)
Figure: the 3-unit hidden code learned for the 8→3→8 identity task.

Representations matter!
Figure: the same data plotted in Cartesian coordinates (x, y) and in polar coordinates
(r, θ); choosing the right representation can make a problem much easier.
Figure: how different AI approaches relate:
• Rule-based systems: hand-designed program → output.
• Classic machine learning: hand-designed features → mapping from features → output.
• Representation learning: learned features → mapping from features → output.
• Deep learning: simple features → additional layers of more abstract features →
  mapping from features → output.
Deep Learning
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Historical Trends: Growing Datasets

Historical Trends: Growing Connections per Neuron

Figure 1.12 (ILSVRC classification error rate, 2010–2015): since deep networks reached
the scale necessary to compete in the ImageNet Large Scale Visual Recognition
Challenge, they have consistently won the competition every year, and yielded lower
and lower error rates each time. Data from Russakovsky et al.
Deep Learning and GPUs
• What is the relation between DL and GPUs?
Problems with DNN (Deep MLP)
• Overfitting:
  – The more layers you have, the more degrees of freedom you have.
  – DNNs model rare dependencies in the training data.
• Diffusion of gradient: the error attenuates as it propagates back to the early layers.
  – Early layers never learn!
https://en.wikipedia.org/wiki/Deep_learning
Main Deep Learning Architectures
• Deep Belief Networks / Autoencoders
  – Greedy layer-wise pretraining, by Hinton et al., 2006.
• Deep Convolutional Neural Networks
  – LeNet, by LeCun et al., 1998.
• Deep Recurrent Networks
  – Long Short-Term Memory, by Hochreiter & Schmidhuber, 1997.
AUTOENCODERS
Greedy layer-wise pretraining, by Hinton et al., 2006.
Autoencoders
• An autoencoder is trained, with an absolutely standard weight-adjustment algorithm,
to reproduce the input.
• By making this happen with (many) fewer hidden units than inputs, this forces the
'hidden layer' units to become good feature detectors.
https://www.macs.hw.ac.uk/~dwcorne/Teaching/introdl.ppt
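A minimal autoencoder sketch in PyTorch on the 8→3→8 identity patterns; the optimizer, learning rate, and epoch count are assumptions, and the 3-unit bottleneck is what forces the hidden layer to learn a compact feature code:

```python
import torch
from torch import nn

# Tiny autoencoder: 8 inputs, 3 hidden units, 8 outputs
encoder = nn.Sequential(nn.Linear(8, 3), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(3, 8), nn.Sigmoid())
model = nn.Sequential(encoder, decoder)

X = torch.eye(8)                                   # targets are the inputs themselves
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)     # minimize reconstruction error
    loss.backward()
    opt.step()

print(encoder(X).detach())                         # learned hidden codes (feature detectors)
```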
Representation learning (hidden layer)
Figure: pretraining scheme. An autoencoder learns the second (hidden) layer;
supervised learning is used in the last layer.
CNN
• We know it is good to learn a small model.
• From this fully connected model, do we really need all the edges?
• Can some of these be shared?
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Consider learning an image:
• Some patterns are much smaller than the whole image (e.g. a "beak" detector).
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Same pattern appears in different places:
• They can be compressed! What about training a lot of such "small" detectors, where
each detector must "move around"? (e.g. an "upper-left beak" detector and a
"middle beak" detector)
https://cs.uwaterloo.ca/~mli/cs898-2017.html
A convolutional layer
A CNN is a neural network with some convolutional layers (and some other layers).
A convolutional layer has a number of filters that perform the convolution operation.

Figure: a filter acting as a beak detector.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Convolution
The filters are the network parameters to be learned. Each filter detects a small
pattern (3 × 3).

Filter 1:          Filter 2:
 1 -1 -1           -1  1 -1
-1  1 -1           -1  1 -1
-1 -1  1           -1  1 -1

6 × 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Convolution (Filter 1, stride = 1)
Slide the 3 × 3 filter over the 6 × 6 image and take the dot product with each patch;
the first two positions of the top row give 3 and −1.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Convolution (Filter 1, stride = 2)
With stride 2 the filter moves two pixels at a time; the first row of results is 3 and −3.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Convolution (Filter 1, stride = 1)
The complete 4 × 4 output for Filter 1:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
https://cs.uwaterloo.ca/~mli/cs898-2017.html
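A NumPy sketch that reproduces these feature maps; note that the CNN "convolution" here is the dot-product (cross-correlation) form used in the slides:

```python
import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

def conv2d(img, filt, stride=1):
    """Dot product of the filter with each patch (the CNN 'convolution')."""
    k = filt.shape[0]
    out_size = (img.shape[0] - k) // stride + 1
    out = np.empty((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * filt)
    return out

print(conv2d(image, filter1))             # 4 x 4 feature map; first row: 3 -1 -3 -1
print(conv2d(image, filter1, stride=2))   # stride 2: 2 x 2 map; first row: 3 -3
```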
Convolution (Filter 2, stride = 1)
Repeat this for each filter. Filter 2 gives another 4 × 4 feature map:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
Together the two filters produce two 4 × 4 images, forming a 2 × 4 × 4 feature map.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Color image: RGB, 3 channels
For a color image, each filter is a 3 × 3 × 3 cube (one 3 × 3 slice per channel), and the
image itself is a stack of three 6 × 6 channels.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Convolution as a sparsely connected layer
Flatten the 6 × 6 image into 36 inputs (1, 2, 3, …). The first output of Filter 1
(value 3) is a neuron connected only to the 9 inputs covered by the filter, not fully
connected to all 36 inputs: fewer parameters!
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Shared weights
The second output neuron (value −1) is connected to a different set of 9 inputs, but it
uses the same 9 weights as the first one (shared weights): even fewer parameters.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
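Quick arithmetic for this toy example, comparing parameter counts (biases ignored):

```python
# Parameter counts for the 6x6 image -> 4x4 output layer above (biases ignored):
fully_connected   = 36 * 16   # every output connected to every input
locally_connected = 16 * 9    # each output connected to only 9 inputs
shared_weights    = 9         # one 3x3 filter reused at all 16 positions
print(fully_connected, locally_connected, shared_weights)   # 576 144 9
```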
The whole CNN
image → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many
times) → Flattened → Fully Connected Feedforward network → cat, dog, ……
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Max Pooling
The two 4 × 4 feature maps produced by Filter 1 and Filter 2:
 3 -1 -3 -1        -1 -1 -1 -1
-3  1  0 -3        -1 -1 -2  1
-3 -3  0  1        -1 -1 -2  1
 3 -2 -2 -1        -1  0 -4  3
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Why Pooling
• Subsampling pixels does not change the object: a subsampled bird is still a bird.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
Max Pooling
Convolution followed by 2 × 2 max pooling turns the 6 × 6 image into a new but
smaller image; each filter gives one channel:
Filter 1:  3 0      Filter 2:  -1 1
           3 1                  0 3
The result is a 2 × 2 image with 2 channels.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
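A NumPy sketch of 2 × 2 max pooling applied to the two feature maps computed earlier, reproducing the 2 × 2 × 2 result:

```python
import numpy as np

def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2 x 2 block."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap1 = np.array([[ 3,-1,-3,-1],
                  [-3, 1, 0,-3],
                  [-3,-3, 0, 1],
                  [ 3,-2,-2,-1]])
fmap2 = np.array([[-1,-1,-1,-1],
                  [-1,-1,-2, 1],
                  [-1,-1,-2, 1],
                  [-1, 0,-4, 3]])
print(max_pool_2x2(fmap1))   # [[3 0] [3 1]]
print(max_pool_2x2(fmap2))   # [[-1 1] [0 3]]
```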
The whole CNN
After one round of Convolution + Max Pooling, the 6 × 6 image has become the
smaller 2 × 2 × 2 image above, which can again go through Convolution and Max
Pooling.
https://cs.uwaterloo.ca/~mli/cs898-2017.html
The whole CNN
Convolution → Max Pooling → … → the final feature maps (here 2 × 2 × 2) are
flattened into a vector and fed to a fully connected feedforward network, which
outputs the class (cat, dog, ……).
https://cs.uwaterloo.ca/~mli/cs898-2017.html
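A PyTorch sketch of this pipeline for the toy 6 × 6 single-channel image: two 3 × 3 filters, 2 × 2 max pooling, flattening, and a fully connected layer. The ReLU and the two output classes ("cat"/"dog") are illustrative choices, not part of the slide.

```python
import torch
from torch import nn

# 1-channel 6x6 image -> 2 filters (3x3) -> 2x4x4 feature maps -> 2x2 max pooling
# -> 2x2x2 -> flatten (8 values) -> fully connected layer with 2 class scores.
cnn = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3),  # convolution
    nn.ReLU(),
    nn.MaxPool2d(2),                                          # max pooling
    nn.Flatten(),                                             # flattened
    nn.Linear(2 * 2 * 2, 2),                                  # fully connected feedforward
)

x = torch.rand(1, 1, 6, 6)        # a batch with one 6x6 grayscale image
print(cnn(x).shape)               # torch.Size([1, 2]): one score per class
```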
CNNs
http://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning
LeNet-5
LeCun, Bottou, Bengio & Haffner, 1998

AlexNet
Krizhevsky, Sutskever & Hinton, 2012

VGG
Simonyan & Zisserman, 2014
https://arxiv.org/abs/1409.1556

DeepFace
Taigman, Yang, Ranzato & Wolf, 2014
http://research.google.com/pubs/pub43022.html
Conclusion
• CNN: a special-purpose net, just for images or problems with strong grid-like local
spatial/temporal correlation.
• Once trained on one problem, the same net (often fine-tuned) can be used for a new,
similar problem: a general creator of vision features.
• An autoencoder can be used to find initial parameters.
• Lots of hand-crafting and tuning to find the right recipe of receptive fields, layer
interconnections, etc.
  – Many more hyperparameters than standard nets, and even than other deep
    networks, since the structures of CNNs are more handcrafted.
  – CNNs are getting wider and deeper with speed-up techniques (e.g. GPUs, ReLU),
    and there is lots of current research, excitement, and success.
Fully Supervised Deep Learning
• Much recent success in doing fully supervised deep learning, with extensions that
diminish the effect of early learning difficulties (unstable gradient, etc.)
• Patience (now that we know it may be worth it), faster computers, and use of GPUs
• More efficient activation functions (e.g. ReLUs), in terms of both computation and
avoiding f'(net) saturation
Open problems
• A more scientific approach is needed, not just building better systems…
  – Geoff Hinton, Yoshua Bengio & Yann LeCun, NIPS 2015