
Introduction to Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are specialized deep learning architectures designed for processing grid-like data, particularly in computer vision tasks such as image recognition and classification. They consist of multiple layers including convolutional layers for feature extraction, pooling layers for dimensionality reduction, and fully connected layers for final predictions. CNNs have evolved to handle complex tasks and achieve state-of-the-art results across various applications, leveraging backpropagation and gradient descent for learning optimal filters.

Uploaded by

mentorsahila
Copyright
© All Rights Reserved

1. Introduction to Convolutional Neural Networks


A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture commonly
used in computer vision, the field of Artificial Intelligence that enables a computer to understand and
interpret images and other visual data. A regular neural network has three types of layers:

• Input Layer: This is the layer through which we give input to our model. The number of neurons in this
layer is equal to the total number of features in our data (the number of pixels in the case of an image).
• Hidden Layer: The output of the input layer is then fed into the hidden layer. There can be many
hidden layers, depending on the model and the size of the data. Each hidden layer can have a different
number of neurons, often (though not necessarily) greater than the number of features. The output of
each layer is computed by multiplying the output of the previous layer by the learnable weights of that
layer, adding the learnable biases, and then applying an activation function, which makes the network
nonlinear.
• Output Layer: The output of the last hidden layer is then fed into a function such as sigmoid or
softmax, which converts the raw score of each class into a probability for that class.

Feeding the data into the model and computing the output of each layer as described above is called the
feed-forward pass. We then calculate the error using an error (loss) function; common choices are
cross-entropy and squared error. The error function measures how well the network is performing. After
that, we propagate the error backward through the model by calculating the derivatives of the loss with
respect to the weights and biases. This step is called backpropagation, and the derivatives are used to
update the parameters so that the loss is minimized.
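
The feed-forward pass, the error function, and one backpropagation update can be sketched in a few lines
of plain Python. This is only an illustrative sketch: the layer sizes, the weight values, the ReLU
activation, the learning rate, and all function names are assumptions chosen for the example, and only the
output layer is updated to keep it short.

```python
import math

def dense(x, weights, biases):
    """One layer: multiply the input by the weight rows and add the biases."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    """Activation function that makes the network nonlinear."""
    return [max(0.0, a) for a in v]

def softmax(v):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(v)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Error for a single example whose true class index is `target`."""
    return -math.log(probs[target])

# Hypothetical toy network: 3 input features -> 4 hidden neurons -> 2 classes.
x = [0.5, -1.0, 2.0]
W1 = [[0.1, 0.2, -0.1], [0.0, -0.3, 0.2], [0.4, 0.1, 0.0], [-0.2, 0.1, 0.3]]
b1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[0.3, -0.1, 0.2, 0.1], [-0.2, 0.4, 0.0, 0.3]]
b2 = [0.0, 0.0]
target = 1

# Feed forward.
hidden = relu(dense(x, W1, b1))
probs = softmax(dense(hidden, W2, b2))
loss = cross_entropy(probs, target)

# Backpropagation for the output layer only: with softmax and cross-entropy,
# the derivative of the loss with respect to each raw score is (prob - one_hot).
grad_scores = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
lr = 0.1  # assumed learning rate
W2 = [[w - lr * g * h for w, h in zip(row, hidden)]
      for row, g in zip(W2, grad_scores)]
b2 = [b - lr * g for b, g in zip(b2, grad_scores)]

new_loss = cross_entropy(softmax(dense(hidden, W2, b2)), target)
print(loss, new_loss)  # the loss after the update should be smaller
```

A full implementation would also propagate the derivatives back into W1 and b1 and repeat the update over
many examples.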

1.1 Convolutional Neural Network


A Convolutional Neural Network (CNN) is an extension of the artificial neural network (ANN) that
is predominantly used to extract features from grid-like data, for example visual datasets
such as images or videos, where spatial patterns play an important role.

CNNs were first developed and deployed in the 1980s. At the time, they were mainly used to recognize
handwritten digits, for example to read zip codes and pin codes on mail. Like most AI models, a CNN
requires a large amount of data to train. This was one of the biggest problems CNNs faced at the time,
and it is a main reason their early use was largely limited to the postal industry. Yann LeCun is widely
credited with introducing the modern convolutional neural network.

CNNs are a specialized kind of neural network architecture designed to process data with a grid-like
topology. This makes them particularly well suited to spatial and temporal data, such as images and
videos, in which adjacent elements are highly correlated.

CNNs are similar to other neural networks, but they add a series of convolutional layers. A convolutional
layer performs a mathematical operation called convolution on the input data: a small filter is slid
across the input and, at each position, the filter values are multiplied element by element with the
underlying patch and summed. The convolution operation helps to preserve the spatial relationship between
pixels by learning image features from small squares of input data. The picture below represents a
typical CNN architecture.

Fig. 1 Typical CNN architecture


The following are definitions of different layers shown in the above architecture:

• Convolutional layers

Convolutional layers operate by sliding a set of ‘filters’ or ‘kernels’ across the input data. Each filter
learns to detect a specific feature or pattern, such as edges or corners in the early layers, or more
complex shapes in the deeper layers. As a filter moves across the image, it produces a feature map whose
values indicate where, and how strongly, that feature was found. The output of the convolutional layer is
the set of these feature maps, one per filter. Convolutional layers can be stacked to create
more complex models, which can learn more intricate features from images. Simply put,
convolutional layers are responsible for extracting features, such as edges, corners, textures, or more
complex patterns, from the input images.
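
The sliding-filter operation described above can be sketched in plain Python. This is a minimal
illustration, assuming a single-channel image, a stride of 1, and no padding; the function name `conv2d`,
the toy image, and the hand-written vertical-edge filter are all hypothetical. Like most deep learning
libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Toy 4x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)
# [0, 2, 0]
# [0, 2, 0]
# [0, 2, 0]
```

The feature map is largest in the middle column, which is exactly where the edge sits in the image. In a
trained CNN the filter values are learned rather than written by hand.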

• Pooling layers

Pooling layers follow the convolutional layers and are used to reduce the spatial dimensions of their
input, making it easier to process and requiring less memory. In the context of images, “spatial
dimensions” refers to the width and height of the image: an image is a grid of pixels arranged in rows and
columns. Pooling layers have no weights of their own, but by shrinking the feature maps they reduce the
number of parameters needed in the layers that follow. This helps to combat over-fitting and speeds up
training. Max pooling in particular reduces computational cost, owing to the smaller feature maps, and
makes the model more tolerant of small translations. Without pooling, the network would be less able to
recognize features irrespective of small shifts (and, to a lesser extent, slight rotations). This would
make the model less robust to variations in object positioning within the image, possibly affecting
accuracy.

There are two main types of pooling: max pooling and average pooling. Max pooling takes the maximum
value within each pooling window of the feature map. For example, if the pooling window size is 2×2, it
picks the highest value in each 2×2 region. Max pooling therefore keeps the most prominent feature
response within the pooling window. Average pooling calculates the average of all values within the
pooling window, giving a smoother feature representation.
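
Both pooling types can be sketched with one small function. This is an illustrative sketch, assuming
non-overlapping windows (stride equal to the window size); the function name `pool2d`, its parameters, and
the toy feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample `feature_map` with non-overlapping size x size windows."""
    if mode == "max":
        combine = max
    else:
        combine = lambda window: sum(window) / len(window)
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[combine([feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)])
             for j in cols]
            for i in rows]

fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="avg")
print(max_pooled)  # [[6, 2], [2, 9]]
print(avg_pooled)  # [[3.5, 1.0], [1.0, 6.0]]

# Tolerance to small shifts: moving a strong response by one pixel inside the
# same window does not change the max-pooled output.
original = [[5, 0], [0, 0]]
shifted = [[0, 5], [0, 0]]
print(pool2d(original) == pool2d(shifted))  # True
```

The 4×4 map shrinks to 2×2 in both cases. The shift tolerance only holds while the response stays inside
the same pooling window, which is why it applies to small shifts rather than arbitrary ones.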

• Fully connected layers

Fully connected layers are one of the most basic types of layers in a convolutional neural network (CNN).
As the name suggests, each neuron in a fully connected layer is connected to every neuron in
the previous layer. Fully connected layers are typically used towards the end of a CNN, where the goal is
to take the features learned by the convolutional and pooling layers and use them to make predictions,
such as assigning a label to the input. For example, if we were using a CNN to classify images of
animals, the final fully connected layer might take the features learned by the previous layers and use
them to classify an image as containing a dog, cat, bird, etc.

The multi-dimensional output of the last convolutional or pooling layer is first flattened into a
one-dimensional vector and then passed to the fully connected layers. This allows the network to combine
and integrate all the extracted features across the entire image, rather than considering only localized
features, and so helps it capture the global context of the image. The fully connected layers are
responsible for mapping the integrated features to the desired output, such as class labels in
classification tasks. They act as the final decision-making part of the network, determining what the
extracted features mean in the context of the specific problem (e.g., recognizing a cat or a dog).
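
The flatten-then-classify step can be sketched as follows. The feature maps, weights, biases, and labels
are made-up values for illustration only; in a real network the weights are learned during training.

```python
def flatten(feature_maps):
    """Turn a stack of 2D feature maps into one 1D feature vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]

def fully_connected(x, weights, biases):
    """Every output neuron sees every input value (one weight per input)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

# Hypothetical output of the last pooling layer: 2 feature maps of size 2x2.
feature_maps = [
    [[6, 2], [2, 9]],
    [[0, 1], [3, 0]],
]

labels = ["cat", "dog", "bird"]

# One row of 8 weights per class (assumed values, not trained).
weights = [
    [0.1, 0.0, 0.0, 0.2, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.3, 0.1, 0.0, 0.2, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.1],
]
biases = [0.0, 0.0, 0.0]

features = flatten(feature_maps)           # 2 * 2 * 2 = 8 values
scores = fully_connected(features, weights, biases)
predicted = labels[scores.index(max(scores))]
print(features)   # [6, 2, 2, 9, 0, 1, 3, 0]
print(predicted)  # cat
```

In practice the scores would usually be passed through softmax to obtain class probabilities; picking the
largest score gives the same predicted label.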

Stacking a convolutional layer followed by a max-pooling layer, and then repeating similar pairs, creates
a hierarchy of features. The first layer detects simple patterns, and subsequent layers build on those to
detect more complex patterns.

CNNs are often used for image recognition and classification tasks. For example, CNNs can be used to
identify objects in an image or to classify an image as showing a cat or a dog. CNNs can also be used for
more complex tasks, such as generating descriptions of an image or identifying the points of interest in
an image. Beyond image data, CNNs can also handle sequential data such as time series, audio, or even
text, although other types of networks like Recurrent Neural Networks (RNNs) or transformers are often
preferred for these scenarios. CNNs are a powerful tool for deep learning, and they have been used to
achieve state-of-the-art results in many different applications.

1.2 CNN architecture


A Convolutional Neural Network consists of multiple layers: an input layer, convolutional layers,
pooling layers, and fully connected layers.

Fig. 2 Simple CNN architecture


The convolutional layer applies filters to the input image to extract features, the pooling layer
downsamples the feature maps to reduce computation, and the fully connected layer makes the final
prediction. The network learns the optimal filters through backpropagation and gradient descent, as
detailed in Fig. 3.

Fig. 3 Functions of CNN Layers
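
How a filter can be learned by gradient descent can be sketched on a toy problem. This is an illustrative
sketch, not a full CNN training loop: the image, the target feature map, the squared-error loss, the
learning rate, the number of steps, and all function names are assumptions chosen for the example. The
filter starts at zero and is repeatedly nudged against the derivative of the loss.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def loss_and_grad(image, kernel, target):
    """Squared-error loss and its derivative with respect to each filter value."""
    out = conv2d(image, kernel)
    err = [[o - t for o, t in zip(out_row, target_row)]
           for out_row, target_row in zip(out, target)]
    loss = 0.5 * sum(e * e for row in err for e in row)
    # Each filter value touches every output position, so its derivative sums
    # the error at each position times the pixel it was multiplied with.
    grad = [[sum(err[i][j] * image[i + a][j + b]
                 for i in range(len(err)) for j in range(len(err[0])))
             for b in range(len(kernel[0]))]
            for a in range(len(kernel))]
    return loss, grad

# Toy image with a vertical edge, and the feature map we want the filter to produce.
image = [[0, 0, 1, 1]] * 4
target = [[0, 2, 0]] * 3

kernel = [[0.0, 0.0], [0.0, 0.0]]  # start from an uninformative filter
lr = 0.05                          # assumed learning rate
losses = []
for step in range(200):
    loss, grad = loss_and_grad(image, kernel, target)
    losses.append(loss)
    kernel = [[k - lr * g for k, g in zip(k_row, g_row)]
              for k_row, g_row in zip(kernel, grad)]

print(losses[0], losses[-1])  # the loss falls from 6.0 to nearly zero
print([[round(k, 3) for k in row] for row in kernel])
```

In this toy setup the learned values approach the vertical-edge filter [[-1, 1], [-1, 1]]. A real CNN
applies the same idea to many filters at once, with the error signal arriving from the final loss through
backpropagation.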
