Convolutional Neural Networks (CNNs)
We have seen how neural networks can be used to build robust models on numerical data.
Now, suppose we flatten a 6 x 6 x 3 image into a vector of 108 values and feed it to a feed-forward neural network whose first hidden layer has 10 units. Since each node of the hidden layer is connected to every input node, the total number of parameters (weights + biases) is 108 x 10 + 10 = 1090. So we need 1090 parameters for just one layer, and real images are usually much larger than 6 x 6 x 3, typically around 224 x 224 x 3. In such cases we end up with a huge number of parameters to train, which makes the model computationally expensive and hurts its performance.
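As a quick sanity check, here is a minimal sketch of this parameter count, assuming PyTorch (the article itself does not use any particular framework):

```python
import torch.nn as nn

# Fully connected layer on a flattened 6 x 6 x 3 image (108 inputs, 10 hidden units).
fc = nn.Linear(in_features=6 * 6 * 3, out_features=10)

n_params = sum(p.numel() for p in fc.parameters())
print(n_params)  # 108 * 10 weights + 10 biases = 1090
```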
Convolution Operation
The first step of a CNN is to detect features such as edges and shapes. This is done by applying a convolution operation to the image using filters, which are responsible for extracting these features from the image.
Suppose we convolve a 6 x 6 image with a 3 x 3 filter. After the convolution, we get a 4 x 4 output. The first element of this 4 x 4 matrix is calculated as follows.
So, we take the first 3 x 3 patch from the 6 x 6 matrix and multiply it element-wise with the filter. The first element of the 4 x 4 output is the sum of these element-wise products, i.e., 3*1 + 0*0 + 1*(-1) + 1*1 + 5*0 + 8*(-1) + 2*1 + 7*0 + 2*(-1) = -5.
Similarly, we convolve over the entire image and get a 4 x 4 matrix.
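A minimal NumPy sketch of this operation (assuming a single-channel 6 x 6 image and the 3 x 3 vertical-edge filter shown in the next section) might look like this:

```python
import numpy as np

# Toy convolution: slide an f x f filter over an n x n image and take the
# sum of the element-wise product at each position.
def convolve2d(image, kernel):
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.random.randint(0, 10, size=(6, 6))   # a 6 x 6 single-channel image
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])                 # a 3 x 3 filter

print(convolve2d(image, kernel).shape)          # (4, 4)
```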
Filters
Consider the following 3 x 3 filter:
1 0 -1
1 0 -1
1 0 -1
This filter is responsible for detecting vertical edges. Let's see how it works. If the filter slides over a region of the image with similar pixels, the result of the convolution is zero, because the positives and the negatives cancel each other out. However, if the filter slides over a region that contains a vertical edge, the pixels on the left and on the right have different values, so the result of the convolution is non-zero and an edge is detected there.
1 1 1
0 0 0
-1 -1 -1
The above filter is responsible for detecting horizontal edges. It works the same way: if the pixels at the top and at the bottom of the region the filter slides over have different values, the result of the convolution is non-zero, whereas regions with uniform pixels give zero. A small sketch of the vertical-edge case is shown below.
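Here is a short sketch of the vertical-edge filter in action, using a hypothetical toy image whose left half is bright and right half is dark, and SciPy's correlate2d, which performs the same product-and-sum operation described above:

```python
import numpy as np
from scipy.signal import correlate2d

# Toy image with a vertical edge down the middle: bright (10) on the left,
# dark (0) on the right.
image = np.array([[10, 10, 10, 0, 0, 0]] * 6)

vertical_filter = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])

out = correlate2d(image, vertical_filter, mode='valid')
print(out)
# Positions covering the edge respond strongly (30); uniform regions give 0.
```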
Padding
When we convolve an n x n image with an f x f filter without any padding, the output shrinks:
Input size: n x n
Filter size: f x f
Output size: (n - f + 1) x (n - f + 1)
This causes two issues:
1. Every time we apply a convolution, the image shrinks, so after several layers we are left with a very small output.
2. Pixels that are present in the corner of the image are used only a few times during convolution when compared to the other pixels. This leads to a loss of information near the borders.
To avoid these issues, we can add a border around the input image. This border is called padding. If we apply a padding of 1, the input becomes an 8 x 8 matrix (instead of a 6 x 6 matrix). Applying a 3 x 3 convolution to this padded input results in a 6 x 6 matrix, which is the original size of the image.
With a padding of p, the dimensions become:
Input size: n x n
Padding size: p
Filter size: f x f
Output size: (n + 2p - f + 1) x (n + 2p - f + 1)
There are two common choices of padding:
1. Valid: No padding is applied (p = 0), so the output size is (n - f + 1) x (n - f + 1).
2. Same: Here, we apply padding so that the output size is the same as the input size, i.e., n + 2p - f + 1 = n, which gives p = (f - 1)/2.
We now know how to use padded convolution. This way we don’t lose a lot
of information and the image does not shrink either.
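As an illustration, a short NumPy sketch (the 6 x 6 image and 3 x 3 filter sizes are taken from the earlier example) shows how "same" padding preserves the spatial size:

```python
import numpy as np

n, f = 6, 3
p = (f - 1) // 2                   # same padding: p = 1 for a 3 x 3 filter

image = np.random.randint(0, 10, size=(n, n))
padded = np.pad(image, pad_width=p, mode='constant', constant_values=0)

print(padded.shape)                # (8, 8)
print(n + 2 * p - f + 1)           # 6: the output keeps the original size
```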
Strided Convolutions
Stride is the number of pixels by which the filter shifts over the input matrix. Suppose we choose a stride of 2. Then, while convolving over the image, we move the filter two pixels at a time, both horizontally and vertically. The dimensions for stride s are:
Input size: n x n
Padding size: p
Stride: s
Filter size: f x f
Output size: [(n + 2p - f)/s + 1] x [(n + 2p - f)/s + 1], where we take the floor if the value is not an integer
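This formula can be checked with a small helper; the example values n = 7, f = 3, p = 0, s = 2 below are just an illustration:

```python
def conv_output_size(n, f, p=0, s=1):
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

print(conv_output_size(n=7, f=3, p=0, s=2))  # 3
print(conv_output_size(n=6, f=3, p=1, s=1))  # 6 (same padding, stride 1)
```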
Pooling Layers
Pooling layers are generally used to reduce the spatial size of the representation and to speed up computation.
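For example, a minimal PyTorch sketch (max pooling is one common choice, and the input size is just an assumed example) shows how a 2 x 2 pooling layer halves the spatial dimensions:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)   # 2 x 2 max pooling

x = torch.randn(1, 3, 224, 224)                # (batch, channels, height, width)
print(pool(x).shape)                           # torch.Size([1, 3, 112, 112])
```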
Convolution Layer
We have seen how the convolution operation works. Now, we will see how
convolutional layers operate in a Neural Network setting.
After the image goes through the convolutional layers, the resulting feature maps are flattened into a single vector, and this flattened layer acts as the input for the fully connected layers.
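A minimal PyTorch sketch of this conv, flatten, fully connected structure (the layer sizes are illustrative assumptions, not taken from this article) could look like this:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                               # pooling layer
    nn.Flatten(),                                                        # flatten the feature maps
    nn.Linear(8 * 3 * 3, 10),                                            # fully connected layer
)

x = torch.randn(1, 3, 6, 6)   # a 6 x 6 x 3 image, as in the earlier example
print(model(x).shape)         # torch.Size([1, 10])
```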