lecture_4 (5)
Mehdi Zakroum
1. Introduction
Image analysis: a swan has certain characteristics that can be used to
help determine whether a swan is present or not, such as its long neck, its
white color, etc.
The features are still present in the above image, but it is more difficult for
us to pick out these characteristic features.
▶ An MLP trained to detect suricates in images of a fixed size will not
work on images of a different size, because the suricate would not
activate the same neurons, so the network’s output may be very
different!
▶ Therefore, we need to find a way to make the ANN work with inputs
of variable size.
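The limitation above can be illustrated with a minimal NumPy sketch. It is not taken from the lecture: the layer sizes, the random weights, and the helper names `dense_forward` and `slide_filter` are assumptions chosen for illustration. A dense layer has one weight per input pixel, so it rejects an image of a different size, whereas a small filter slid over the image accepts any size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense (MLP) layer sized for flattened 8x8 images: one weight per pixel.
W = rng.normal(size=(4, 8 * 8))

def dense_forward(image):
    """Apply the dense layer to a flattened image."""
    return W @ image.reshape(-1)

def slide_filter(image, kernel):
    """Slide a small filter over the image; works for any image size."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

small = rng.normal(size=(8, 8))
large = rng.normal(size=(12, 12))
kernel = rng.normal(size=(3, 3))

out_small = dense_forward(small)  # fine: 64 inputs expected, 64 given

try:
    dense_forward(large)          # 144 inputs given, 64 expected
    dense_accepts_large = True
except ValueError:
    dense_accepts_large = False

print(dense_accepts_large)                # False
print(slide_filter(small, kernel).shape)  # (6, 6)
print(slide_filter(large, kernel).shape)  # (10, 10)
```

The same 3x3 filter is reused at every position, so its 9 weights do not depend on the image size.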
Limitations of MLP
▶ 1-dimensional convolution:
$$s(n) = (x \star k)(n) = \sum_{i=-\infty}^{+\infty} x(n - i) \times k(i)$$
▶ 2-dimensional convolution:
$$S(m, n) = (X \star K)(m, n) = \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} X(m - i, n - j) \times K(i, j)$$
▶ 1-dimensional convolution with a finite kernel (indices from −I to I):
$$s(n) = (x \star k)(n) = \sum_{i=-I}^{I} x(n - i) \times k(i)$$
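The finite 1-dimensional sum can be sketched directly in NumPy. This is an illustrative sketch, not the lecture's code: the function name `conv1d_same` is hypothetical, the kernel is assumed to have odd length 2I + 1 with `k[i + I]` storing k(i), and values of x outside the signal are assumed to be zero.

```python
import numpy as np

def conv1d_same(x, k):
    """s(n) = sum_{i=-I..I} x(n - i) * k(i), with x treated as 0 outside its range."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(k, dtype=float)
    I = len(k) // 2  # kernel indices run from -I to I (odd length assumed)
    s = np.zeros(len(x))
    for n in range(len(x)):
        for i in range(-I, I + 1):
            if 0 <= n - i < len(x):
                s[n] += x[n - i] * k[i + I]
    return s

x = [1.0, 2.0, 3.0, 4.0]
k = [1.0, 0.0, -1.0]  # k(-1) = 1, k(0) = 0, k(1) = -1

print(conv1d_same(x, k))                # [ 2.  2.  2. -3.]
print(np.convolve(x, k, mode="same"))   # same values from NumPy's built-in
```

With this kernel, s(n) = x(n + 1) − x(n − 1), which is why the interior values equal 2 for a signal that grows by 1 per step.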
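▶ 2-dimensional convolution with a finite kernel (indices from −I to I and −J to J):
$$S(m, n) = (X \star K)(m, n) = \sum_{i=-I}^{I} \sum_{j=-J}^{J} X(m - i, n - j) \times K(i, j)$$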
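The finite 2-dimensional sum can be written as four nested loops. The sketch below is an illustration under stated assumptions: the function name `conv2d_same` is hypothetical, the kernel has odd height and width with `K[i + I, j + J]` storing K(i, j), and X is treated as zero outside its borders.

```python
import numpy as np

def conv2d_same(X, K):
    """S(m, n) = sum_i sum_j X(m - i, n - j) * K(i, j), zero outside X."""
    X = np.asarray(X, dtype=float)
    K = np.asarray(K, dtype=float)
    I, J = K.shape[0] // 2, K.shape[1] // 2  # kernel indices: -I..I, -J..J
    H, W = X.shape
    S = np.zeros((H, W))
    for m in range(H):
        for n in range(W):
            for i in range(-I, I + 1):
                for j in range(-J, J + 1):
                    if 0 <= m - i < H and 0 <= n - j < W:
                        S[m, n] += X[m - i, n - j] * K[i + I, j + J]
    return S

X = np.arange(9.0).reshape(3, 3)

# Kernel with K(0, 0) = 1: the output equals the input.
K_id = np.zeros((3, 3))
K_id[1, 1] = 1.0

# Kernel with K(0, 1) = 1: S(m, n) = X(m, n - 1), a shift by one column.
K_shift = np.zeros((3, 3))
K_shift[1, 2] = 1.0

print(conv2d_same(X, K_id))
print(conv2d_same(X, K_shift))
```

The shift example makes the index convention visible: because the sum uses X(m − i, n − j), a weight placed to the right of the kernel center moves the image content to the right.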
Example of 2D Convolution
Figure 1: Image processing. The Sobel Gx filter is used for edge detection (in
the horizontal direction).
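A small sketch of what the Sobel Gx filter does, using a synthetic image rather than the one in the figure. The kernel values follow one common convention for Gx (sign conventions vary between references), and the helper name `convolve2d_valid` and the 5x6 test image are assumptions for illustration. The kernel is flipped before sliding so that the result matches the convolution definition given earlier, and no padding is used.

```python
import numpy as np

# One common convention for the Sobel Gx kernel (assumed here).
SOBEL_GX = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)

def convolve2d_valid(image, kernel):
    """2D convolution without padding: flip the kernel, then slide it."""
    flipped = kernel[::-1, ::-1]
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

# Synthetic image: dark left half, bright right half (one vertical edge).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

edges = convolve2d_valid(image, SOBEL_GX)
print(edges)
# Each of the 3 output rows is [0, -4, -4, 0]:
# zero in flat regions, large magnitude where intensity changes horizontally.
```

The sign of the response depends on whether the kernel is flipped (convolution) or not (cross-correlation); the magnitude is what marks the edge.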
Example of 2D convolutions
Example of Padding
Convolution Layer
▶ The convolution stride specifies how much we move the filter window
at each step.
▶ In the previous examples, the stride was equal to 1, i.e. the filter
convolves around the input volume by shifting one unit at a time.
▶ In practice, the stride may be different from 1.
$$O = \frac{H - K + 2P}{S} + 1$$
▶ O: output height/length
▶ H : input height/length
▶ K : the filter size
▶ P : the padding
▶ S : the stride.
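The output-size formula can be checked with a few numbers. This is a minimal sketch: the function name `conv_output_size` is hypothetical, and rounding down when (H − K + 2P) is not divisible by S is an assumption (the formula above does not say how to handle that case).

```python
def conv_output_size(H, K, P=0, S=1):
    """O = (H - K + 2P) / S + 1, rounding the division down (assumption)."""
    return (H - K + 2 * P) // S + 1

print(conv_output_size(5, 3))          # 3: no padding, stride 1
print(conv_output_size(5, 3, P=1))     # 5: padding 1 preserves the size
print(conv_output_size(7, 3, S=2))     # 3: stride 2 roughly halves the size
print(conv_output_size(32, 5, P=2))    # 32: padding (K - 1) / 2 preserves the size
```

With stride 1, choosing P = (K − 1) / 2 for an odd filter size keeps the output the same size as the input, as the second and fourth calls show.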
Non-linearity
Pooling
Pooling Hyperparameters:
AlexNet (2012): one of the first deep CNNs to achieve considerable accuracy
on the 2012 ILSVRC challenge, with a top-5 accuracy of 84.7% compared to
73.8% for the second-best entry.
ILSVRC: ImageNet Large Scale Visual Recognition Challenge