
International Journal of Computer Applications (0975 – 8887)

Volume 182 – No. 21, October 2018

Some Studies on Convolution Neural Network

Goutam Sarker, PhD


Associate Professor,
Computer Science and Engineering Department,
NIT Durgapur, INDIA

ABSTRACT
Two major tools for implementing any artificial intelligence and machine learning system are Symbolic AI and Artificial Neural Network AI. The Artificial Neural Network (ANN) has made tremendous improvements in the versatile area of Machine Learning (ML). An ANN is an assembly of a huge number of weighted, interconnected artificial neurons, originally invented with the inspiration of biological neurons. All these models perform much better than earlier models implemented with symbolic AI. One revolutionary change in ANN is the Convolutional Neural Network (CNN). These structures are mainly suitable for complex pattern recognition tasks within images for the purpose of computer vision.

General Terms
Artificial Intelligence, Artificial Neural Network, Machine Learning, Pattern Recognition

Keywords
Convolution Neural Network, Deep Learning, Optical Character Recognition, Machine Transcription, Machine Translation, Accuracy

1. INTRODUCTION
To extract different patterns within various images for the specific task of image recognition, the latest state-of-the-art technique is to design and develop a Convolutional Neural Network (CNN) [1,2,3]. In this method, the specific features of images are extracted and coded in that special type of neural network, with learned or programmed filters or kernels. These filters or kernels are specific to particular images. This makes the CNN [4,5,6,14] a far better concept and methodology for complex pattern and image recognition tasks, providing so far the best effectiveness and efficiency.

In our previous works on Face Detection and Localization [15,16,17,21,26] we used a probabilistic framework and a supervised learning method. In [18,27] an unsupervised learning model for both Face Identification and Localization was implemented. A modified RBF Network with basis function learning by Optimal Clustering was utilized for Face Identification and Localization in [19,20,28,29,30,31]. A modified RBFN with basis function learning based on a new concept of Heuristic Based Clustering was used in [22,24,25], while a Competitive Learning Network using Malsburg Learning was employed for Face Detection and Localization in [23].

In all those previous works we used the classical ANN to solve general pattern recognition problems such as Face Detection, Identification and Localization, and person authentication with biometrics. But the conventional ANN also has some limitations.

The classical ANN has the severe limitation that the time complexity of computation for the purpose of pattern recognition in image processing is too large to afford.

Let us take the example of learning and recognizing the MNIST benchmark handwritten digit dataset. Each digit of the dataset has a dimensionality of 28*28. If we employ a conventional classical neural network, any neuron of the first hidden layer will have 28*28*1 = 784 weighted links falling onto it. (Here we have considered a conversion of the MNIST data set to just black and white.) This number, 784, is moderate and affordable, because it does not demand an extremely large network to accommodate it.

But if our input image is a coloured one of size 128*128, any neuron in the first hidden layer will have 128*128*3 = 12288 weighted links landing onto it. (The pixel count is multiplied by 1 for a black and white image, and by 3 for a coloured image.) This number is much larger than in the previous black and white case, and every neuron in the first hidden layer would have 12288 weighted links. Accordingly, a classical neural network for recognizing coloured normalized MNIST digits would have to be abnormally large. Apparently, to cope with this situation we may think of increasing the size of the network (by simply increasing the number of hidden layers and the number of neurons per hidden layer). But there are two major problems in doing so:

1.) The first problem is that, if the network size is extremely large, it demands enormously huge computational power, as well as large learning and recognition times, to make the system learn and recognize the input data set.

2.) The second problem is overfitting, which severely reduces the power of generalisation. Overfitting occurs when the network size is large: the memorization power of the network undesirably increases, because the storage capacity of the network is large and an extremely large number of weights has to be learned, and thereby the capability of generalization drops. This degrades the performance of the system so far as image recognition capability is concerned. Thus increasing the network size cannot be a solution for handling large coloured images.

2. THE FUNDAMENTALS OF CONVOLUTION
Let us consider the problem of finding the position of an aircraft with a LASER sensor system. The output of the system is the precise position of the aircraft p(t) at time t.


Consider that the above system is a very noisy one. We want to obtain a more accurate, noise-free position of the same aircraft. Then we have to take an average of several such measurements of the position p(t). The more recent a measurement is, the more likely it is to be close to the accurate one. Accordingly, a weight must be attached to each measurement, with larger weights assigned to more recent measurements.

So let us present a new function P that will give an accurate estimate of the position of the aircraft, where

P(t) = ∫ p(a) w(t − a) da

This technique of finding the most accurate position of the aircraft is, in general, the basic concept of the operation of convolution. We can also denote the operation of convolution through another, equivalent symbolic notation:

P(t) = (p*w)(t)

With respect to the convolutional neural network, the first argument, i.e. the function 'p', is referred to as the input, while the second argument, i.e. the weighting function 'w', is called the kernel or filter. The output is called the feature map or activation map. For the output to be a meaningful weighted average, 'w' should be a valid probability density function.

Fig. 1 A 2D convolution without kernel flipping – the sliding kernel always lies entirely within the image, making the convolution a valid one

Fig. 1 shows a 2D convolution without kernel flipping. Here the kernel or filter moves left to right and top to bottom, one pixel at a time, and at each position produces the dot product of the local region of the input and the kernel. The moving or sliding kernel always lies within the image boundary.

Kernel Non Flipping – The kernel, while sliding on the input, always lies entirely within the input region boundary.

Kernel Flipping – The kernel, while sliding on the input, moves beyond the input region boundary.

3. MOTIVATIONS FOR CONVOLUTIONAL NEURAL NETWORK (CNN)
The architecture of a Convolutional Neural Network (CNN) [6,7,8] is in general comprised of a series of convolutional layers, rectified linear unit (ReLU) layers and pooling layers. In a convolutional layer, some filters or kernels are employed to perform the operation of convolution.

Fig. 2 In case of CNN, the input output connections are always sparse

Classical or conventional ANNs are much improved and empowered with three major concepts or ideas in the Convolution Neural Network. These are the main features of any CNN [9,10], as mentioned below:
1. Sparse Connectivity.
2. Sharing of Parameters.
3. Equivariant Representation.

3.1 Sparse Connectivity (or Sparse Interactions or Sparse Weights)
In an ordinary neural network, if any layer has m inputs and n outputs, the computation of the output requires a matrix multiplication involving m*n parameters. The time complexity of this matrix multiplication in Big O notation is O(m*n) per input instance. On the contrary, if only k inputs are connected to each of the n outputs (i.e. each output is connected to k inputs), this is a sparse input output connection, and it reduces the above time complexity to O(k*n). If k is substantially smaller than m, then there is a drastic reduction of time complexity per input instance. Sparse connectivity is illustrated in Fig. 2 and Fig. 3. Note that in both figures the value of n = 1.
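The weighted-average view of convolution from Section 2, and the sparse interactions of Section 3.1, can both be seen in a few lines of NumPy. This is only an illustrative sketch: the measurement values and the weighting kernel below are invented, with the kernel weighting recent measurements more heavily and summing to 1 (a valid probability distribution, as required above).

```python
import numpy as np

# Discrete analogue of P(t) = sum_a p(a) * w(t - a): smoothing noisy
# position measurements with a small weighting kernel.
p = np.array([10.0, 10.4, 9.8, 10.2, 10.1, 9.9])  # noisy measurements p(t)
w = np.array([0.5, 0.3, 0.2])                      # weights, most recent first

# 'valid' mode keeps the kernel entirely inside the signal (no flipping
# past the boundary, in the terminology of Section 2).
P = np.convolve(p, w, mode="valid")
print(P)  # smoothed position estimates

# Sparse connectivity: each of the len(P) outputs touches only k = 3
# inputs, costing O(k*n) multiplications instead of O(m*n) for a dense
# layer with the same number of outputs.
assert len(P) == len(p) - len(w) + 1
```

Each output element depends on only k = 3 of the m = 6 inputs, which is exactly the sparse input–output connection of Fig. 2 and Fig. 3.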


Fig. 3. An example of sparse input output connection

Fig. 4 Variation of output for the same input in case of a sparse connectivity and a full connectivity

In Fig. 4, one input X3 is active, and all the output units in S that are affected by this active input unit are highlighted. The upper diagram shows the output S formed by convolution with a kernel of width 3, where only 3 output units are affected by X3, since the connections are sparse. The lower diagram shows the output S formed by matrix multiplication, where all the inputs are connected to all the output neurons; thus all the output neurons are affected by the input neuron X3.

3.2 Sharing of Parameters or Weights
In a classical or conventional neural network, to compute the output of any layer, each weight of the weight matrix is used only once: the weight is multiplied by the corresponding input vector component and is never reused for the same input. Unlike this, in a Convolutional Neural Network, each component of the kernel, and indeed the entire kernel itself, is moved along the input activation and thereby used at each and every position of the input (some extreme boundary pixels of the input might be discarded according to design considerations). In some designs the kernel may move a few pixels beyond the input boundary, producing 'kernel flipping'; in other designs the kernel movement is restricted to within the input region, producing 'kernel non flipping'. The sharing of parameters or weights has the advantage that, rather than learning a separate parameter or weight set for different locations of the input, the system learns only one set in the form of the kernel or filter.

3.3 Equivariant Representation
If the overall output corresponding to an input remains the same when the sequence of operations applied to the input is interchanged, this property is termed equivariance. Mathematically, a function 'f' is equivariant to a function 'g' if f(g(x)) = g(f(x)). For the convolution operation, if 'f' is a function that convolves the input and 'g' is another function that translates the input (i.e. shifts the input from one position to another), then the convolution function 'f' is equivariant to 'g'.

4. BROAD ARCHITECTURE OF CNN
In general, a CNN has three different types of layers, namely the convolution layer, the pooling layer and the fully connected layer. The convolution layer is comprised of an input volume, a kernel or filter, and an output volume. The input and output volumes, with their height, width and depth, are indicated in Fig. 5. The convolution and pooling layers may be repeated a number of times to get the desired size of activation map. When these are combined together, a complete CNN structure is formed. The complete overall architecture of a CNN is indicated in Fig. 6.

Fig. 5 The height, width and depth of the input and output layer

Fig. 6. The overall architecture of CNN

The image pixel intensities are captured by the input layer.
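The equivariance property f(g(x)) = g(f(x)) of Section 3.3 can be checked numerically. The sketch below is illustrative only: it uses a circular shift and a circular 1-D convolution (our choice, so that both sides of the identity are defined on the same domain), with invented signal and kernel values.

```python
import numpy as np

def conv(x, w):
    # Circular 1-D convolution: out[i] = sum_j x[(i - j) mod n] * w[j]
    n = len(x)
    return np.array([sum(x[(i - j) % n] * w[j] for j in range(len(w)))
                     for i in range(n)])

def translate(x, s=1):
    return np.roll(x, s)  # g: shift the input by s positions

x = np.array([1.0, 4.0, 2.0, 7.0, 3.0, 5.0])
w = np.array([0.25, 0.5, 0.25])

lhs = conv(translate(x), w)   # f(g(x)): convolve the shifted input
rhs = translate(conv(x, w))   # g(f(x)): shift the convolved input
assert np.allclose(lhs, rhs)  # convolution is equivariant to translation
```

Translating the input and then convolving gives the same result as convolving and then translating, which is the sense in which the convolution layer's feature map simply "moves with" the input.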


1. Convolution Layer: Local regions of the input volume are connected to the output neurons for the purpose of convolution. The convolution layer computes the output of the neurons connected to the local regions of the input volume. Mathematically, this is done by a scalar product of the weights of the kernel and the local region of the input volume to which it is connected.

2. Rectified Linear Unit (ReLU): The ReLU layer applies an elementwise activation function, g(z) = max{0, z}, to each and every element of the output volume produced by the previous layer.

3. Pooling Layer: Sub-sampling along the spatial dimensionality of the given input (produced as an activation map at the output volume, or as an output of the ReLU) further reduces the number of parameters within the activation map. This sub-sampling is called pooling.

In general, max pooling with a pooling kernel dimensionality of 2*2 is applied to the activation map produced by the convolution layer. The number of pixels through which the pooling kernel is shifted along the spatial dimensionality (formally called the kernel stride) is usually set to 2 for a kernel dimensionality of 2*2, so that there is no overlapping. With this, evidently, the activation map produced by the convolution layer is scaled down to 25% (because 2*2 = 4 pixels are represented by 1 pixel at the output).

One variation of pooling is Overlapped Pooling. This occurs when the kernel size is greater than the stride. For example, if the stride is set to 2 and the kernel size is 3*3, the pooling is overlapped.

Pooling inevitably distorts the image; a pooling kernel size above 3*3 therefore distorts the input image heavily and degrades the desirable performance of the CNN.

The advantage of pooling is that it reduces the size of the image and the number of parameters. Pooling also makes the representation more or less invariant to small translations of the overall input image. We will see this later in Section 5.

4. Fully Connected Layer: Here neurons are connected only between two adjacent layers, with no connections between neurons within the same layer – unlike a Recurrent Neural Network (RNN).

Hyper Parameters: Depth, Stride and Zero Padding: We can optimize the convolutional layer output with three hyper parameters, namely depth, stride and zero padding. They are detailed below:

1) Depth: The number of neurons within a layer of the output volume connected to the same region of the input is called the depth of the output volume produced during convolution. The lesser the value of this hyper parameter, the lesser the number of neurons and parameters in the network. But lowering the value of this hyper parameter also degrades the overall performance of the CNN so far as pattern recognition capability is concerned. The hyper parameter depth is indicated in both the upper and lower parts of Fig. 7, where, for the same input volume depth, the upper part has less depth and the lower part has more depth in the output volume.

Fig. 7 Depth of the input and output volume

2) Stride: This is the number of pixels through which the kernel (filter) is shifted around the spatial dimensionality of the input. If the stride is set to 1, then a heavily overlapped receptive field is obtained, and extremely large activations and large output spatial dimensions are yielded. On the contrary, if the stride is set to a number greater than 1, then the amount of overlapping is comparatively smaller, and an output of lower activations and lower spatial dimensions is produced.

3) Zero Padding: To control the dimensionality of the output volume (such that the output volume height and width exactly match, or are greater or less than, those of the input volume)


we need zero padding. With zero padding, the outer region of the input volume is surrounded by a few layers of zeros.
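The combined effect of kernel size, stride and zero padding on the output spatial size can be captured in one small helper. This is a sketch (the function name is ours); following the convention used later in this paper, z counts the total number of zero rows/columns added along a dimension, whereas many other texts count padding per side, giving m + 2z − r + 1.

```python
def conv_output_size(m, r, z=0, s=1):
    """Spatial output size of a convolution.

    m: input width, r: kernel width, s: stride,
    z: total number of zero rows/columns of padding added along the
       dimension (the counting convention used in this paper's examples).
    """
    return (m + z - r) // s + 1

print(conv_output_size(28, 3))       # 26: first layer of the MNIST example
print(conv_output_size(6, 3))        # 4: the 6*6 image of Section 6
print(conv_output_size(6, 3, z=1))   # 5: the zero-padding example of Section 6
print(conv_output_size(6, 3, z=2))   # 6: z = r-1 layers keep output = input
```

With z = r − 1 total layers of padding and stride 1 the output size equals the input size, as the text states in Section 6.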

Fig. 8 Input vector, pooled vector, kernel and destination pixel

Fig. 8 shows a specimen input vector, pooled vector and kernel or filter. A local region of size 3*3 of the input is highlighted and produces a pooled vector. The kernel produces a dot product and creates an activation in the destination pixel. The kernel, moved by one stride, lands on the next local region and produces another pooled vector; similarly, the activation of the next destination pixel is created.

In case the stride = 1, if the input volume size is m*n and the kernel size is r*r, then the output volume size is [m-(r-1)]*[n-(r-1)]. Thus, for example, if we want to keep the output volume size equal to the input volume size m*n with a kernel size of r*r, we have to put (r-1) layers of zero padding surrounding the input (the stride, of course, being equal to one). Similarly, we can make the output volume size larger than the input volume size by suitably increasing the amount of zero padding. Thus, by controlling the amount of zero padding, we can make the output volume size (height and width) greater than, equal to, or less than the input volume size.

To extract the different desirable features of the input image, a number of learned or programmed filters or kernels are employed. A corresponding activation map is generated by each kernel, and those activation maps are accumulated or stacked together. This produces the complete output volume of the convolution layer. For example, if there are n kernels or filters, then there are n activation maps, which together produce the complete output volume. If it is a coloured image, then there are n*3 kernels or filters, and thereby 3 channels per activation map, thus n*3 activation maps.

5. ADVANTAGES OF POOLING
Pooling introduces:
1. Translation Invariance.
2. Rotation Invariance.
3. Scaling Invariance.

Translation Invariance: If the input image is slightly translated to a new position, it is still perfectly detected, thanks to the pooling stage of the CNN. Fig. 9 gives a simple example.

Fig. 9 Max pooling introduces translation invariance

In the upper part of Fig. 9, the bottom row is the output of the nonlinear convolution stage, and the result of (max) pooling is indicated by the upper row. Here we have considered a stride of unity. Since the pooling kernel size is 3 and the stride is 1 (which is less than 3), this is an example of overlapped pooling.

In the lower part of Fig. 9, the input has been translated towards the right by one pixel. The values of all the pixels of the bottom row change due to this shifting, but only half of the values of the activations of the top row change. The reason is that max pooling units accumulate the maximum value in a neighbourhood, irrespective of the exact location of that maximum value.

Rotation Invariance: If the input image is slightly rotated by some angle, it is still perfectly detected, thanks to the pooling stage of the CNN. Fig. 10 is a simple example.

Fig. 10 Rotation invariance through max pooling

In Fig. 10, the CNN has learned three kernels or filters. These kernels correspond to three Bs – the first one is rotated anticlockwise by an angle of 45 deg., the second one is


vertically upward, and the third one is rotated clockwise by an angle of 45 deg. All these kernels are supposed to detect a handwritten "B". When a 45 deg. anticlockwise rotated "B" is placed as the input image, a large activation is produced in the activation map corresponding to Filter 1. Similarly, when a 45 deg. clockwise rotated "B" is placed as the input image, a large activation is produced in the activation map corresponding to Filter 3. In the first part of the figure, the pooled activation corresponding to Filter 1 would be maximum, while those of the other two would be minimum. In the second part of the figure, the pooled activation corresponding to Filter 3 would be maximum, while those of the other two would be minimum. But when these three pooled activations are integrated together, the integrator indicates a large activation irrespective of which pooled activation created the large response. Thus the integrator of the pooled outputs has a large activation for any of the above orientations of "B" given as input, and thereby all those differently oriented "B"s are successfully detected as "B".

Scaling Invariance: Through convolution, with or without suitable zero padding, and pooling, the input image is suitably scaled up or down as desired.

In conclusion, the major advantage of convolution is to overcome the effects of rotation, translation and scaling for the purpose of efficient, effective and successful image recognition.

6. AN EXAMPLE OF CONVOLUTION AND MAX POOLING
In Fig. 11 we present one 6*6 input image and two kernels or filters, namely Filter 1 and Filter 2, which are expected to grab or extract two different features likely to be present in various locations of the given input image.

Here Filter 1 represents one particular feature, while Filter 2 represents another particular feature, both of which are expected to be present in different locations of the input image. The size of the pattern or feature detected by each of them is 3*3.

Fig. 11 One sample 6*6 Input Image

Filter 1 produces an activation map or feature map in the output volume as shown in Fig. 12:

Fig. 12 First feature map corresponding to first filter

Fig. 13 Second feature map corresponding to second filter

In case the stride is equal to one, the size of the activation or feature map in the output volume = {m-(r-1)} * {m-(r-1)} = {6-(3-1)} * {6-(3-1)} = 4*4, because the input image size is m*m = 6*6 and the kernel size is r*r = 3*3. Note that there is no zero padding in the input.

If there is a zero padding of z layers in the original input (counting the total number of zero layers added along each dimension), then the size of the activation or feature map in the output volume is {(m+z) - (r-1)} per dimension. For example, if the zero padding is z = 1 and the kernel size is 3*3, then for the same input image of size 6*6 the activation map size in the output volume = {(6+1) - (3-1)} = 5*5. Here we have assumed that the stride is equal to one, as before.

The output volume is a stack of the different activation maps produced by the different kernels. One activation map or feature map describes the different locations in the input where the particular feature represented by a filter or kernel is present. When we repeat the process for each filter, we get a series of activation maps which together produce the complete output volume.

7. COLOUR IMAGE
In all previous examples we considered a monochrome image. There, a particular feature in the input image is described by one and only one filter; thus one filter is sufficient to extract that feature.

On the contrary, in a colour RGB image, a particular feature in the input is constituted by a combination of three differently coloured pixels, namely Red, Green and Blue. Thus, to extract a particular feature in the coloured input image, three different filters (corresponding to red, green and blue) describing that feature are needed. This is shown in Fig. 14.
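The size arithmetic of Section 6 can be reproduced with a small sketch. The image and filter values below are invented for illustration (the actual contents of Fig. 11 and the filters are not reproduced in this text); what matters is the shape bookkeeping: a valid 3*3 convolution of a 6*6 input gives a 4*4 feature map, and non-overlapping 2*2 max pooling then gives 2*2.

```python
import numpy as np

# Illustrative 6*6 binary image and a 3*3 "plus-shaped" filter
# (invented values, standing in for Fig. 11 and one of the filters).
img = np.array([[0, 1, 0, 0, 1, 0],
                [1, 1, 1, 0, 1, 1],
                [0, 1, 0, 0, 0, 1],
                [0, 0, 0, 1, 0, 0],
                [1, 0, 1, 1, 1, 0],
                [0, 1, 0, 0, 1, 1]], dtype=float)
ker = np.array([[0, 1, 0],
                [1, 1, 1],
                [0, 1, 0]], dtype=float)

def conv_valid(x, k):
    """Valid convolution without kernel flipping (stride 1):
    the kernel always stays entirely inside the image."""
    r = k.shape[0]
    out = np.empty((x.shape[0] - r + 1, x.shape[1] - r + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + r, j:j + r] * k)  # dot product
    return out

def max_pool(x, p=2):
    """Non-overlapping p*p max pooling (stride = p)."""
    h, w = x.shape[0] // p, x.shape[1] // p
    return x[:h * p, :w * p].reshape(h, p, w, p).max(axis=(1, 3))

fmap = conv_valid(img, ker)  # 4*4 activation map: {6-(3-1)} = 4 per side
pooled = max_pool(fmap)      # 2*2 after 2*2 max pooling with stride 2
print(fmap.shape, pooled.shape)
```

The activation is largest where the image region matches the filter pattern, which is exactly the feature-detection behaviour described for Filters 1 and 2.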


Fig. 14 Colour image feature/activation map corresponding to two filters

8. MAX POOLING
The idea of max pooling is to extract the maximum activation out of the pooling region indicated by the pooling kernel. There are other varieties of pooling apart from max pooling. Another common pooling is known as L2 pooling: here we do not take the maximum activation of, say, a 2*2 region, but instead the square root of the sum of squares of the activations within the 2*2 region. Both max pooling and L2 pooling are widely used. Sometimes other types of pooling, such as average pooling, may also be used.

Let us consider that we are using the two filters of Fig. 15 as indicated below:

Fig. 15 Two filters used for creation of two activation maps

The corresponding feature maps or activation maps in the output volume for the two filters are shown below in Fig. 16.

Fig. 16 Two feature maps corresponding to two filters given as input to Max Pooling unit

Now suppose we perform a max pooling with a pooling kernel of size 2*2. The stride of the pooling kernel is equal to two (and not one), so that there is no overlapping during the pooling process. The max pooling output is indicated in Fig. 17.

Fig. 17 Max pooling output corresponding to two feature maps

Pooling is also sometimes called sub-sampling. Pooling usually compresses the original image, or makes the original image size smaller, but it cannot change one image into another.

In the above example, two 4*4 images (actually activation or feature maps, which are the input to the pooling stage) are compressed into two 2*2 images due to the effect of pooling. So the compression ratio of the pooling is (4*4) / (2*2) = 4.

In general, if the activation map size is m*n and the pooling kernel size is p*q, assuming there is no overlapping during pooling, the m*n activations are reduced to (m/p)*(n/q), so the compression ratio is given by the following formula:

C = (m*n) / ((m/p)*(n/q)) = p*q

Max pooling is a way to detect whether a particular feature is present anywhere within a region of the image. It simply discards the exact positional information regarding the feature: once a feature has been discovered in the image, its exact location is secondary, and only its rough location with respect to other features matters. The major advantage is that pooled features are always fewer in

number, and the number of parameters needed at subsequent layers is thereby substantially reduced. Fig. 18 indicates a particular input convolved with two filters and max pooled, producing two activation maps.

Fig. 18 Each filter is a channel and creates a separate activation (feature) map and a new smaller image through pooling

9. REPEATED CONVOLUTION AND MAX POOLING
Convolution and max pooling may be repeated a number of times to get the desired size of the reduced image. The result has to be flattened and finally applied as input to the fully connected network to detect the output class. This is shown below in Fig. 19.

Fig. 19 A series of convolution and max pooling

The final pooling output is a stack of activations corresponding to the different filters. These, when serialized or flattened, form a one-dimensional vector. This is the input to the fully connected network, as shown below in Fig. 20.

Fig. 20 An example of max pooling and corresponding flattening

Fig. 21 Convolution Neural Network (CNN) instructions for Convolution and Pooling in Keras

Fig. 22 Convolution Neural Network (CNN) Architecture in Keras

The network has an input image of 28*28 neurons; these are the pixel intensities of an MNIST data set image. A convolution layer with 25 kernels, each of size 3*3 (this is also the size of the local receptive field), follows. This results in 25*26*26 hidden feature neurons. The next step is to perform max pooling with a pooling kernel of size 2*2 and a stride of 2 (no overlapping) across all the 25 feature or activation maps. The result is a layer of 25*13*13 hidden feature neurons. The process of convolution and max pooling is repeated once again, but this time with 50 kernels of the same size as before, i.e. 3*3. With this, the convolution layer yields a layer of 50*11*11 hidden feature neurons, while the max pooling layer yields a layer of 50*5*5 hidden feature neurons. This is then flattened at the next layer.

The final layer of connections is a fully connected layer. This layer connects every neuron of the flattened max pooled layer to every one of the 10 output neurons (assuming there are 10 categories for classification). This is indicated in Fig. 21 and Fig. 22.
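The layer sizes in this walk-through can be checked mechanically. The sketch below is plain Python (the actual Keras calls are in Fig. 21, which is not reproduced here); it tracks only the spatial size through the two convolution/pooling stages.

```python
def conv(size, kernel):
    # valid convolution, stride 1: output = size - kernel + 1
    return size - kernel + 1

def pool(size, kernel=2):
    # non-overlapping max pooling with stride = kernel size
    return size // kernel

s = 28             # MNIST input: 28*28 pixel intensities
s = conv(s, 3)     # 25 kernels of 3*3   -> 25 maps of 26*26
s = pool(s)        # 2*2 pooling, stride 2 -> 25 maps of 13*13
s = conv(s, 3)     # 50 kernels of 3*3   -> 50 maps of 11*11
s = pool(s)        # 2*2 pooling, stride 2 -> 50 maps of 5*5
print(50 * s * s)  # flattened neurons feeding the 10 output neurons
```

The flattened layer thus has 50*5*5 = 1250 neurons, each fully connected to the 10 output neurons.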


The prime advantage of Sharing of Weights is that it heavily [8] Szarvas, M., Yoshizawa, A., Yamamoto,M., Ogata,
reduces the number of parameters involved in a convolution J.:Pedestrain detection with convolutional neural
neural network. If the kernel size is 3*3 as before, for each networks. In: Intelligent Vehicles Symposium, 2005.
feature map we need 9 = 3*3 shared weights. If there are 25 Proceedings. IEEE. Pp. 224-229. IEEE(2005).
feature maps, during first convolution we need 25*3*3 = 225
parameters. Similarly during second convolution we need [9] Szegedy, C., Toshev, A., Erhan, D. : Deep neural
50*3*3 = 450 parameters, since during that convolution we networks for object detection. In: Advances in Neural
had 50 feature maps. Thus altogether we need 675 parameters. Information Processing Systems. Pp. 2553-2561 (2013).
These many parameters are needed up to the first hidden layer [10] Tivive, F.H.C., Bouzerdoum, A. : A new class of
neuron of fully connected network. On the contrary if we use convolutional neural networks (siconnets) and their
only a fully connected network without convolution, with applications of face detection. In: Neural Networks,
28*28=784 input neurons, and assuming a modest 30 hidden 2003. Proceedings of the International Joint Conference
layer neurons, we require 784*30 = 23520 parameters even on. Vol. 3, pp 2157-2162. IEEE(2003).
for only the first hidden layer neuron. Thus, approximately 35
times more number of parameters are needed in fully [11] Zeiler, M.D., Fergus, R.:Stochastic pooling for
connected layer, in comparison to that of convolutional layer. regularization of deep convolutional neural networks.
arXiv preprint arXiv: 1301.3557(2013).

10. CONCLUSION
The present paper is an introduction to the Convolutional Neural Network (CNN), a revolutionary concept in Artificial Neural Networks (ANN). Starting with the preliminary concepts and motivations behind CNNs, the paper broadly discusses the general architecture of a CNN, its different layers and components, and its major advantages over the classical ANN. It also details some specific CNN architectures. The author expects that beginners in CNN will find this paper most helpful.

11. ACKNOWLEDGEMENTS
The author would like to thank his M.Tech project student Ms. Swagata Ghosh, Computer Science and Engineering Department, NIT Durgapur, for drawing all the figures in this paper.

12. REFERENCES
[1] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016).

[2] Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Convolutional neural network committees for handwritten character classification. In: Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp. 1135-1139. IEEE (2011).

[3] Farabet, C., Martini, B., Akselrod, P., Talay, S., LeCun, Y., Culurciello, E.: Hardware accelerated convolutional neural networks for synthetic vision systems. In: Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pp. 257-260. IEEE (2010).

[4] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large scale video classification with convolutional neural networks. In: Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 1725-1732. IEEE (2014).

[6] Nebauer, C.: Evaluation of convolutional neural networks for visual recognition. IEEE Transactions on Neural Networks 9(4), 685-696 (1998).

[7] Simard, P.Y., Steinkraus, D., Platt, J.C.: Best practices for convolutional neural networks applied to visual document analysis. In: Document Analysis and Recognition (ICDAR), 2003 International Conference on, p. 958. IEEE (2003).

[11] Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint arXiv:1301.3557 (2013).

[12] Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014, pp. 818-833. Springer (2014).

[13] G. Sarker (2000), A Learning Expert System for Image Recognition, Journal of The Institution of Engineers (I), Computer Engineering Division, Vol. 81, 6-15.

[14] Mehak and Tarun Gulati (2017), Detection of Digital Forgery Image using Different Techniques, International Journal of Engineering Trends and Technology (IJETT), Volume 46, Number 8, April 2017.

[15] G. Sarker (2010), A Probabilistic Framework of Face Detection, International Journal of Computer, Information Technology and Engineering (IJCITAE), 4(1), 19-25.

[16] G. Sarker (2011), A Multilayer Network for Face Detection and Localization, International Journal of Computer, Information Technology and Engineering (IJCITAE), 5(2), 35-39.

[17] G. Sarker (2012), A Back Propagation Network for Face Identification and Localization, International Journal of Computer, Information Technology and Engineering (IJCITAE), 6(1), 1-7.

[18] G. Sarker (2012), An Unsupervised Learning Network for Face Identification and Localization, International Journal of Computer, Information Technology and Engineering (IJCITAE), 6(2), 83-89.

[19] G. Sarker and K. Roy (2013), A Modified RBF Network with Optimal Clustering for Face Identification and Localization, International Journal of Advanced Computational Engineering and Networking (ISSN: 2320-2106), 1(3), 30-35.

[20] G. Sarker and K. Roy (2013), An RBF Network with Optimal Clustering for Face Identification, Engineering Science International Research Journal (ISSN: 2300-4338), 1(1), ISBN: 978-93-81583-90-6, 70-74.

[21] G. Sarker (2013), An Optimal Back Propagation Network for Face Identification and Localization, International Journal of Computers and Applications (IJCA), ACTA Press, Canada, 35(2), DOI: 10.2316/Journal.202.2013.2.202-3388.

[22] G. Sarker and S. Sharma (2014), A Heuristic Based RBFN for Location and Rotation Invariant Clear and Occluded Face Identification, International Journal of Computer Information Technology and Engineering (IJCITAE), Serials Publications, 8(2), 109-118.

[23] G. Sarker (2014), A Competitive Learning Network for Face Detection and Localization, International Journal of Computer Information Technology and Engineering (IJCITAE), Serials Publications, 8(2), 119-123.

[24] G. Sarker (2002), A Semantic Concept Model Approach for Pattern Classification and Recognition, 28th Annual Convention and Exhibition IEEE-ACE 2002, December 20-21, 2002, Science City, Kolkata, 271-274.

[25] G. Sarker (2005), A Heuristic Based Hybrid Clustering for Natural Classification in Data Mining, 20th Indian Engineering Congress, organized by The Institution of Engineers (India), December 15-18, 2005, Kolkata, INDIA, paper no. 4.

[26] G. Sarker (2011), A Back Propagation Network for Face Identification and Localization, 2011 International Conference on Recent Trends in Information Systems (ReTIS-2011), Dec. 21-23, 2011, Kolkata, DOI: 10.1109/ReTIS.2011.6146834, pp. 24-29.

[27] G. Sarker (2012), An Unsupervised Learning Network for Face Identification and Localization, 2012 International Conference on Communications, Devices and Intelligent Systems (CODIS), Dec. 28-29, 2012, Kolkata, DOI: 10.1109/CODIS.2012.6422282, pp. 652-655.

[28] G. Sarker and K. Roy (2013), An RBF Network with Optimal Clustering for Face Identification, International Conference on Information & Engineering Science 2013 (ICIES-2013), Feb. 21-23, 2013, organized by IMRF, Vijayawada, Andhra Pradesh, pp. 70-74.

[29] G. Sarker and K. Roy (2013), A Modified RBF Network with Optimal Clustering for Face Identification and Localization, International Conference on Information & Engineering Science 2013 (ICIES-2013), Feb. 21-23, 2013, organized by IMRF, Vijayawada, Andhra Pradesh, pp. 32-37.

[30] K. Roy and G. Sarker (2013), A Location Invariant Face Identification and Localization with Modified RBF Network, International Conference on Computer and Systems (ICCS-2013), 21-22 September, 2013, Bardhaman, pp. 23-28.

[31] G. Sarker and S. Sharma (2014), A Heuristic Based RBFN for Location and Rotation Invariant Clear and Occluded Face Identification, International Conference on Advances in Computer Engineering and Applications (ICACEA-2014, with IJCA), pp. 30-36.