Convolutional Neural Networks

The document provides an overview of Convolutional Neural Networks (CNNs), explaining their structure and functionality in image processing and classification. It details the convolution operation, the role of filters in feature detection, and the importance of padding and strided convolutions to manage image size and data loss. Additionally, it describes the pooling layers and the transition from convolutional layers to fully connected layers for classification tasks.

CNNs

We have understood neural networks and how they can be used to build
robust models using numerical data.

Let's assume we have an image with height = 6, width = 6, and the
number of channels = 3 (known as RGB, where each channel (red,
green, and blue) stores the intensity of that color as an integer between 0
and 255). So, there are 6 x 6 x 3 = 108 input values (36 pixels, each with
3 channel values). Each value will be a node in the input layer.

Now, consider that the first hidden layer of the neural network has
10 units. Since each node is connected to every node of the subsequent
layer in a feed-forward neural network, the total number of parameters
(weights + biases) is 108 x 10 + 10 = 1090. So, we need 1090 parameters for
only one layer, and images are generally much larger than 6 x 6 x 3,
usually 224 x 224 x 3. In such cases, the number of trainable parameters
explodes, which makes training computationally expensive and makes the
model prone to overfitting.
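The arithmetic above can be checked with a short sketch (the helper name `dense_params` is ours, not from the original text):

```python
def dense_params(inputs, units):
    """Weights + biases for one fully connected layer."""
    return inputs * units + units

# 6 x 6 x 3 image flattened into 108 inputs, first hidden layer of 10 units:
print(dense_params(6 * 6 * 3, 10))      # 1090

# A more realistic 224 x 224 x 3 image makes the count explode:
print(dense_params(224 * 224 * 3, 10))  # 1505290
```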

To deal with such problems, we have a special type of neural network
called the Convolutional Neural Network (CNN), widely used in image
processing and image classification. It takes the pixels of an image as
input and generates the desired output.

Convolutional Operation

Let’s understand how a convolution operation works.

The first step of a CNN is to detect features like edges and shapes,
which is done by applying a convolution operation to the image
using filters (filters are responsible for extracting features from the
image).

Let’s understand this using an example. Consider a greyscale image of
size 6 x 6 x 1 (the number of channels is 1 for greyscale images)
represented as a matrix, where each entry represents a pixel intensity.

We can convolve this 6 x 6 matrix with a 3 x 3 filter.

After the convolution, we will get a 4 x 4 image. To calculate the first
element of the 4 x 4 matrix, we take the first 3 x 3 patch of the 6 x 6
matrix and multiply it element-wise with the filter. The first element of
the 4 x 4 matrix is the sum of these element-wise products, i.e.,
3*1 + 0*0 + 1*(-1) + 1*1 + 5*0 + 8*(-1) + 2*1 + 7*0 + 2*(-1) = -5.

Similarly, we will convolve over the entire image and get a 4 X 4 matrix.
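The sliding computation described above can be sketched as a small NumPy function (a minimal illustration; `convolve2d_valid` is our own helper name, and it performs the un-flipped correlation CNNs actually use; the example image is chosen so its top-left patch matches the worked example above):

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """'Valid' convolution as used in CNNs (no kernel flip)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            # Sum of the element-wise product of the patch and the filter.
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.array([
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
])
kernel = np.array([[1, 0, -1]] * 3)  # vertical edge detector

out = convolve2d_valid(image, kernel)
print(out.shape)   # (4, 4)
print(out[0, 0])   # -5.0, matching the hand computation above
```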

In 3D convolution, the total number of parameters for 10 filters is 3 x 3 x 3
(size of each filter) x 10 (filters) + 10 (biases) = 280. This is far
fewer parameters than in a fully connected ANN.

Note: The size of the filter is 3 x 3 x 3 because we are assuming a colored
image, so there is one 3 x 3 slice for each channel: R, G, and B.

Filters

Filters are responsible for locating objects in an image by detecting
changes in the pixel intensities of the image.

Generally, we have an edge detector that detects edges in an image. For
example:

1 0 -1
1 0 -1
1 0 -1

This filter is responsible for detecting vertical edges. Let's see how it
works. If the filter slides over a region of the image with similar
pixel values, the result of the convolution is zero, because the positives
and the negatives cancel each other out. However, if the filter slides over
a region with a vertical edge, there are differently colored pixels on the
left and right, so the result of the convolution is non-zero, detecting an
edge there.

1 1 1

0 0 0

-1 -1 -1

The above filter is responsible for detecting horizontal edges. If there
are differently colored pixels above and below the region the filter is
sliding over, the result of the convolution is non-zero, whereas regions
with uniform pixels give zero.
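The cancellation argument can be demonstrated with two hand-picked 3 x 3 regions (an illustrative sketch; the sample pixel values are ours):

```python
import numpy as np

vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])  # vertical edge detector

uniform = np.full((3, 3), 7)       # region of similar pixels
edge = np.array([[10, 10, 0],
                 [10, 10, 0],
                 [10, 10, 0]])     # bright on the left, dark on the right

# Uniform region: positives and negatives cancel.
print(np.sum(uniform * vertical))  # 0
# Vertical edge: left and right columns differ, so the sum is non-zero.
print(np.sum(edge * vertical))     # 30
```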

Padding

We have seen that convolving an input of 6 x 6 dimensions with a 3 x 3
filter results in a 4 x 4 matrix. We can generalize this and say that if the
input is n x n and the filter size is f x f, then the output size will be
(n-f+1) x (n-f+1):

 Input size: n x n

 Filter size: f x f

 Output size: (n-f+1) x (n-f+1)

But there are certain disadvantages of a convolutional filter:

1. When we apply a convolutional filter, the size of the image reduces.

2. Pixels at the corners and edges of the image take part in far fewer
convolutions than the central pixels, so information near the borders
is under-represented.

To avoid these issues, we can add a border around the input image. This
border is called padding. If we apply a padding of 1, it means that the
input will be an 8 X 8 matrix (instead of a 6 x 6 matrix). Applying a
convolution of 3 x 3 on the padded input will result in a 6 x 6 matrix,
which is the original shape of the image.

 Input size: n x n

 Padding size: p

 Filter size: f x f

 Output size: (n+2p-f+1) x (n+2p-f+1)

There are two common choices for padding:

1. Valid: No padding. If we are using valid padding, the output will be
(n-f+1) x (n-f+1).

2. Same: Here, we apply just enough padding that the output size is the
same as the input size, i.e., n+2p-f+1 = n, so p = (f-1)/2.

We now know how to use padded convolution. This way we don’t lose a lot
of information and the image does not shrink either.
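Padding is straightforward to apply in practice; one way is NumPy's `np.pad`, which surrounds the array with zeros by default (a minimal sketch, not tied to any particular framework):

```python
import numpy as np

image = np.arange(36).reshape(6, 6)  # a 6 x 6 input
padded = np.pad(image, pad_width=1)  # padding p = 1, zero-filled border

print(padded.shape)  # (8, 8)
# Convolving this 8 x 8 padded input with a 3 x 3 filter gives
# 8 - 3 + 1 = 6, so "same" padding with p = (f-1)/2 = 1 preserves
# the original 6 x 6 shape.
```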

Strided Convolutions

Stride is the number of pixels the filter shifts over the input matrix at
each step. Suppose we choose a stride of 2. Then, while convolving over
the image, we take two-pixel steps, both in the horizontal and vertical
directions. The dimensions for stride s will be:

 Input size: n x n

 Padding size: p

 Stride: s

 Filter size: f x f

 Output size: [⌊(n+2p-f)/s⌋+1] x [⌊(n+2p-f)/s⌋+1], where ⌊·⌋ is the
floor, needed when the stride does not divide (n+2p-f) evenly

Stride helps to reduce the size of the image.
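The general output-size formula covers all the cases above, which a short helper makes easy to check (the function name `conv_output_size` is our own):

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))            # 4, valid convolution
print(conv_output_size(6, 3, p=1))       # 6, "same" padding
print(conv_output_size(7, 3, p=1, s=2))  # 4, strided convolution
```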

Pooling Layers

Pooling layers are generally used to reduce the size of the input image
and to increase the speed of computation.

Consider a 4 x 4 matrix. Applying max-pooling with a 2 x 2 window on this
matrix will result in a 2 x 2 output.

For every 2 x 2 box, we take the maximum value. Here, we have applied a
filter of size 2 and a stride of 2; these are the hyperparameters of the
pooling layer. Apart from max-pooling, we can also apply average-pooling,
where, instead of taking the maximum of the numbers, we take their
average.
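Max-pooling with a 2 x 2 window and stride 2 can be sketched as follows (an illustrative helper; the sample matrix and the name `max_pool` are ours):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max-pooling with a square window; size and stride are the hyperparameters."""
    n = (x.shape[0] - size) // stride + 1
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Maximum value inside each window.
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [3, 1, 4, 5]])
print(max_pool(x))  # 2 x 2 output: [[6, 5], [7, 9]]
```

Swapping `.max()` for `.mean()` turns this into average-pooling.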

Convolution Layer

We have seen how the convolution operation works. Now, we will see how
convolutional layers operate in a Neural Network setting.

In artificial neural networks, each layer has multiple neurons. Similarly,
in CNNs, we have multiple filters in each layer. We specify the shape of
these filters and other parameters such as stride and padding. The output
of the convolution operation with each filter is a 2-D matrix, as discussed
above.

Since we have multiple convolutional layers, each layer receives its
input channels from the output of the previous layer. For the first layer,
the three input channels are the red, green, and blue pixel values of the
image. The output channels of each layer (since each filter produces one
output channel) act as the inputs for the next convolutional layer.

At the start of the fully connected section of our architecture, we perform
a flatten operation that converts these feature maps into a 1-D data format,
which makes them suitable for the feed-forward layers lying ahead.

Fully Connected Layer


We can extract different features of the image using combinations of
convolutional layers and the other techniques mentioned above, but
convolutional layers alone cannot perform classification or regression. For
this, we add a fully connected layer. However, after applying the various
filters and layers, the output is a matrix, so we have to flatten that
matrix into a vector to feed it into the fully connected layer.

After the image goes through the convolutional layers, the resulting
matrix is flattened, and the flattened vector acts as the input for the
fully connected layers.
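The flatten step itself is a single reshape (a minimal sketch; the feature-map shape 4 x 4 x 10 is a hypothetical output of a last convolutional/pooling layer):

```python
import numpy as np

# Hypothetical output of the last convolutional/pooling layer:
# ten 4 x 4 feature maps.
feature_maps = np.random.rand(4, 4, 10)

# Flatten into a 1-D vector for the fully connected layers.
flattened = feature_maps.reshape(-1)
print(flattened.shape)  # (160,)
```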
