Some Studies On Convolution Neural Network
International Journal of Computer Applications (0975 – 8887)
Volume 182 – No. 21, October 2018
Consider the above system to be a very noisy one. We want to obtain a more accurate, noise-free position of the same aircraft. Then we have to average several such measurements of the position p(t). The more recent a measurement is, the more likely it is to be close to the accurate one. Accordingly, a weight must be attached to each measurement, with larger weights assigned to the more recent measurements.

So let us present a new function P that will give an accurate estimate of the position of the aircraft, where

P(t) = (p*w)(t)

With respect to a convolutional neural network, the first argument, i.e. the function 'p', is referred to as the input, while the second argument, i.e. the weight function 'w', is called the kernel or filter. The output is called the feature map or activation map. The 'w' must be a valid probability density function; otherwise we will not be able to perform the operation of convolution.

3. MOTIVATIONS FOR CONVOLUTIONAL NEURAL NETWORK (CNN)
The architecture of a Convolutional Neural Network (CNN) [6,7,8] in general comprises a series of convolutional layers, rectified linear unit (ReLU) layers and pooling layers. In a convolutional layer, filters or kernels are employed to perform the operation of convolution.

Fig. 2 In case of CNN, the input-output connections are always sparse.

Classical or conventional ANNs are much improved and empowered by three major concepts or ideas of the Convolutional Neural Network. These are the main features of any CNN [9,10], as mentioned below:
1. Sparse Connectivity.
2. Sharing of Parameters.
3. Equivariant Representation.
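The weighted-average estimate P(t) = (p*w)(t) introduced above can be sketched as a discrete convolution in a few lines of Python. The measurement values and the weights below are made up purely for illustration; the weights are normalized so that w forms a valid probability distribution:

```python
import numpy as np

# Hypothetical noisy position measurements p(t) at discrete times,
# most recent measurement last.
p = np.array([10.0, 10.4, 9.8, 10.1, 10.3, 9.9, 10.2])

# Recency weights w: larger weight on more recent measurements,
# normalized to sum to 1 (a valid probability distribution).
w = np.array([1.0, 2.0, 4.0])
w = w / w.sum()

# Discrete convolution P(t) = (p * w)(t): a weighted moving average.
# np.convolve flips the kernel, so w is reversed here to keep the
# largest weight on the most recent sample of each window.
P = np.convolve(p, w[::-1], mode='valid')

print(P)
```

Each output value is a recency-weighted average of three consecutive measurements, which is exactly the smoothing effect the text describes.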
1. Convolution Layer: Local regions of the input volume are connected to the output neurons for the purpose of convolution. The convolution layer computes the output of the neurons connected to local regions of the input volume. Mathematically, this is done by the scalar product of the weights of the kernel and the local region of the input volume to which it is connected.

2. Rectified Linear Unit (ReLU): The ReLU layer applies an activation function (e.g. the sigmoid, or g(z) = max{0, z}) to each and every element of the output volume produced by the previous layer.

3. Pooling Layer: Sub-sampling along the spatial dimensionality of the given input (produced as an activation map at the output volume, or as an output of the ReLU) further reduces the number of parameters within the activation map. This sub-sampling is called pooling.
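The convolution and ReLU layers just described can be sketched in plain NumPy. This is a minimal single-channel illustration (stride 1, no padding; as is conventional in CNNs, the kernel is applied without flipping, i.e. as a cross-correlation); the 6*6 input and 3*3 filter are made-up values:

```python
import numpy as np

def conv2d(image, kernel):
    """Output of neurons connected to local regions: the scalar
    product of the kernel weights with each local region of the
    input volume (stride 1, no padding)."""
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i + k, j:j + k]     # local receptive field
            out[i, j] = np.sum(region * kernel)  # scalar product
    return out

def relu(x):
    """Element-wise g(z) = max{0, z}."""
    return np.maximum(0, x)

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6*6 input volume
kernel = np.array([[-1., 0., 1.]] * 3)            # toy 3*3 filter
fmap = relu(conv2d(image, kernel))                # 4*4 activation map
print(fmap.shape)
```

A 6*6 input convolved with a 3*3 kernel yields a 4*4 activation map, to which the ReLU is then applied element-wise.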
We need zero padding. With this, the outer region of the input volume is surrounded by a few layers of zeros.
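As a quick illustration, NumPy's pad routine can add such layers of zeros around a toy input volume (the 4*4 array here is a placeholder):

```python
import numpy as np

x = np.ones((4, 4))   # toy 4*4 input volume (single channel)

# Surround the input with one layer of zeros on every side.
padded = np.pad(x, pad_width=1, mode='constant', constant_values=0)

print(padded.shape)   # one layer of padding: 4*4 becomes 6*6
```

With one layer of padding, a 3*3 kernel applied without padding would produce a 4*4 output again, so the spatial size of the volume is preserved.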
Fig. 11 One sample 6*6 Input Image

Filter 1 produces an activation map or feature map in the output volume as shown in Fig. 12:

7. COLOUR IMAGE
In all previous examples we considered a monochrome image. There, a particular feature in the input image is described by one and only one filter; thus one filter is sufficient to extract that feature.

On the contrary, in a colour RGB image a particular feature in the input is constituted by a combination of three differently coloured pixels, namely Red, Green and Blue. Thus, to extract a particular feature in the coloured input image, three different filters (corresponding to red, green and blue) describing that feature are needed. This is shown in Fig. 14.
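This channel-wise scheme can be sketched as follows. The image and filter values are random placeholders; each of the three filters convolves its own colour channel, and the three results are summed into one feature map for that feature:

```python
import numpy as np

def conv2d(channel, kernel):
    """Scalar product of a 2-D kernel with each local region of one channel."""
    H, W = channel.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))    # toy RGB input: height * width * (R, G, B)
filters = rng.random((3, 3, 3))  # one 3*3 filter per colour channel

# The three per-channel convolutions (red, green and blue filters
# describing the same feature) are summed into a single activation map.
fmap = sum(conv2d(image[:, :, c], filters[:, :, c]) for c in range(3))
print(fmap.shape)
```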
Fig. 14 Colour image feature/activation map corresponding to two filters

8. MAX POOLING
The idea of max pooling is to extract the maximum activation out of the pooling region indicated by the pooling kernel. There are other varieties of pooling apart from max pooling. Another common variety is known as L2 pooling: here we do not take the maximum activation of, say, a 2*2 region; instead we take the square root of the sum of the squares of the activations within the 2*2 region. Both max pooling and L2 pooling are widely used. Sometimes other types of pooling, such as average pooling, may be used.

Let us consider that we are using the two filters of Fig. 15 as indicated below:

C = m*n / p*q

Max pooling is a way to detect whether a particular feature is present anywhere within a region of the image. It simply discards the exact positional information regarding the feature. Once a feature has been discovered in the image, its exact location is secondary; only its rough location with respect to other features matters. The major advantage is that the pooled features are always fewer in number.

Now suppose we perform a max pooling with a pooling kernel of size 2*2. The stride of the pooling kernel is equal to two (and not one), so that there is no overlapping during the pooling process. The max pooling output is indicated in Fig. 17 below.
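A minimal sketch of 2*2, stride-2 max pooling on a made-up 4*4 activation map, together with L2 pooling of the same regions for comparison:

```python
import numpy as np

# A toy 4*4 activation map (values made up for illustration).
act = np.array([[1., 3., 2., 0.],
                [4., 2., 1., 5.],
                [0., 1., 7., 2.],
                [3., 2., 4., 6.]])

# 2*2 max pooling with stride 2: split into non-overlapping 2*2
# regions and keep only the maximum activation of each region.
regions = act.reshape(2, 2, 2, 2)          # blocks of 2*2
max_pooled = regions.max(axis=(1, 3))

# L2 pooling of the same regions: the square root of the sum of
# the squares of the activations within each 2*2 region.
l2_pooled = np.sqrt((regions ** 2).sum(axis=(1, 3)))

print(max_pooled)
# [[4. 5.]
#  [3. 7.]]
```

Note that the 4*4 map shrinks to 2*2: each pooled value records that the feature occurred somewhere in its region, but not exactly where.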
Fig. 18 Each filter is a channel and creates a separate activation (feature) map and a new, smaller image through pooling

9. THE COMPLETE NETWORK
Convolution and max pooling may be repeated a number of times to get the desired size of the reduced image. This has to be flattened and finally applied as input to the fully connected network to detect the output class. This is shown below in Fig. 19.

Fig. 19 A series of convolution and max pooling

The final pooling output is a stack of activations corresponding to the different filters. These, when serialized or flattened, form a one-dimensional matrix. This is the input to the fully connected network, as shown below in Fig. 20.

Fig. 20 An example of max pooling and corresponding flattening

Fig. 21 Convolution Neural Network (CNN) instructions for Convolution and Pooling in Keras

The network has an input image of size 28*28 neurons. These are the pixel intensities of the MNIST data set image. A convolution layer with 25 kernels, each of size 3*3 (this is also the size of the local receptive field), follows this. This results in 25*26*26 hidden feature neurons. The next step is to perform max pooling with a pooling kernel of size 2*2 and a stride of 2 (no overlapping) across all the 25 feature or activation maps. The result is a layer of 25*13*13 hidden feature neurons. The process of convolution and max pooling is repeated once again, but this time with 50 kernels of the same size as before, i.e. 3*3. With this, the convolution layer yields a layer of 50*11*11 hidden feature neurons, while the max pooling layer yields a layer of 50*5*5 hidden feature neurons. This is then flattened at the next layer.

The final layer of connection is a fully connected layer. So this layer connects every neuron of the flattened max-pooled layer to every one of the 10 output neurons (assuming there are 10 categories for classification). This is indicated in Fig. 21 and Fig. 22.
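The layer sizes quoted above can be verified with a few lines of arithmetic, assuming 'valid' convolutions (no padding, stride 1) and non-overlapping pooling with floor division:

```python
def conv_out(n, k):
    """Width after a 'valid' convolution: no padding, stride 1."""
    return n - k + 1

def pool_out(n, p):
    """Width after non-overlapping p*p max pooling (stride p)."""
    return n // p

n = conv_out(28, 3)       # 25 kernels of 3*3 on 28*28  -> 25*26*26
n = pool_out(n, 2)        # 2*2 max pool, stride 2      -> 25*13*13
n = conv_out(n, 3)        # 50 kernels of 3*3           -> 50*11*11
n = pool_out(n, 2)        # 2*2 max pool, stride 2      -> 50*5*5
flattened = 50 * n * n    # inputs to the fully connected layer
print(n, flattened)       # 5 1250
```

The flattened 50*5*5 = 1250 neurons are then fully connected to the 10 output neurons.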
The prime advantage of sharing of weights is that it heavily reduces the number of parameters involved in a convolutional neural network. If the kernel size is 3*3 as before, for each feature map we need 9 = 3*3 shared weights. If there are 25 feature maps, during the first convolution we need 25*3*3 = 225 parameters. Similarly, during the second convolution we need 50*3*3 = 450 parameters, since that convolution has 50 feature maps. Thus altogether we need 675 parameters up to the first hidden layer neuron of the fully connected network. On the contrary, if we use only a fully connected network without convolution, with 28*28 = 784 input neurons and assuming a modest 30 hidden layer neurons, we require 784*30 = 23520 parameters for the first hidden layer alone. Thus, approximately 35 times more parameters are needed in the fully connected layer, in comparison to the convolutional layers.
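The parameter counts above can be checked directly:

```python
kernel_weights = 3 * 3                 # 9 shared weights per feature map
first_conv = 25 * kernel_weights       # 25 feature maps -> 225 parameters
second_conv = 50 * kernel_weights      # 50 feature maps -> 450 parameters
conv_total = first_conv + second_conv  # 675 parameters altogether

fully_connected = 28 * 28 * 30         # 784 inputs * 30 hidden neurons
print(conv_total, fully_connected)     # 675 23520
print(fully_connected // conv_total)   # 34, i.e. roughly 35 times more
```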
10. CONCLUSION
The present paper is an introduction to the Convolutional Neural Network (CNN), a revolutionary and dramatic concept in Artificial Neural Networks (ANN). Starting with the preliminary concepts and motivations of CNN, the paper broadly discusses the general architecture of any CNN, the different layers and components of a CNN, and the major advantages of CNN over the classical ANN. It also details some specific CNN architectures. The author expects that beginners in CNN will find this paper most helpful.

11. ACKNOWLEDGEMENTS
The author would like to thank his M.Tech Project Student Ms. Swagata Ghosh, Computer Science and Engineering Department, NIT Durgapur, for drawing all the figures in the paper.

12. REFERENCES
[1] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016).
[2] Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Convolutional neural network committees for handwritten character classification. In: Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp. 1135-1139. IEEE (2011).
[3] Farabet, C., Martini, B., Akselrod, P., Talay, S., LeCun, Y., Culurciello, E.: Hardware accelerated convolutional neural networks for synthetic vision systems. In: Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pp. 257-260. IEEE (2010).
[4] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large scale video classification with convolutional neural networks. In: Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 1725-1732. IEEE (2014).
[6] Nebauer, C.: Evaluation of convolutional neural networks for visual recognition. Neural Networks, IEEE Transactions on 9(4), 685-696 (1998).
[7] Simard, P.Y., Steinkraus, D., Platt, J.C.: Best practices for convolutional neural networks applied to visual document analysis. In: Document Analysis and Recognition (ICDAR), 2003 International Conference on, p. 958. IEEE (2003).
[8] Szarvas, M., Yoshizawa, A., Yamamoto, M., Ogata, J.: Pedestrian detection with convolutional neural networks. In: Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, pp. 224-229. IEEE (2005).
[9] Szegedy, C., Toshev, A., Erhan, D.: Deep neural networks for object detection. In: Advances in Neural Information Processing Systems, pp. 2553-2561 (2013).
[10] Tivive, F.H.C., Bouzerdoum, A.: A new class of convolutional neural networks (SICoNNets) and their applications of face detection. In: Neural Networks, 2003. Proceedings of the International Joint Conference on, vol. 3, pp. 2157-2162. IEEE (2003).
[11] Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint arXiv:1301.3557 (2013).
[12] Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014, pp. 818-833. Springer (2014).
[13] Sarker, G. (2000), A Learning Expert System for Image Recognition, Journal of The Institution of Engineers (India), Computer Engineering Division, Vol. 81, 6-15.
[14] Mehak and Gulati, T., Detection of Digital Forgery Image using Different Techniques, International Journal of Engineering Trends and Technology (IJETT), Volume 46, Number 8, April 2017.
[15] Sarker, G. (2010), A Probabilistic Framework of Face Detection, International Journal of Computer, Information Technology and Engineering (IJCITAE), 4(1), 19-25.
[16] Sarker, G. (2011), A Multilayer Network for Face Detection and Localization, International Journal of Computer, Information Technology and Engineering (IJCITAE), 5(2), 35-39.
[17] Sarker, G. (2012), A Back Propagation Network for Face Identification and Localization, International Journal of Computer, Information Technology and Engineering (IJCITAE), 6(1), 1-7.
[18] Sarker, G. (2012), An Unsupervised Learning Network for Face Identification and Localization, International Journal of Computer, Information Technology and Engineering (IJCITAE), 6(2), 83-89.
[19] Sarker, G. and Roy, K. (2013), A Modified RBF Network With Optimal Clustering For Face Identification and Localization, International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, 1(3), 30-35.
[20] Sarker, G. and Roy, K. (2013), An RBF Network with Optimal Clustering for Face Identification, Engineering Science International Research Journal, ISSN: 2300-4338, 1(1), ISBN: 978-93-81583-90-6, 70-74.
[21] Sarker, G. (2013), An Optimal Back Propagation Network for Face Identification and Localization, International Journal of Computers and Applications (IJCA), ACTA Press, Canada, 35(2), DOI 10.2316/Journal.202.2013.2.202-3388.
[22] Sarker, G. and Sharma, S. (2014), A Heuristic Based RBFN for Location and Rotation Invariant Clear and Occluded Face Identification, International Journal of Computer Information Technology and Engineering (IJCITAE), Serials Publications, 8(2), 109-118.
[23] Sarker, G. (2014), A Competitive Learning Network for Face Detection and Localization, International Journal of Computer Information Technology and Engineering (IJCITAE), Serials Publications, 8(2), 119-123.
[24] Sarker, G. (2002), A Semantic Concept Model Approach for Pattern Classification and Recognition, 28th Annual Convention and Exhibition IEEE – ACE 2002, December 20-21, 2002, Science City, Kolkata, 271-274.
[25] Sarker, G. (2005), A Heuristic Based Hybrid Clustering for Natural Classification in Data Mining, 20th Indian Engineering Congress, organized by The Institution of Engineers (India), December 15-18, 2005, Kolkata, India, paper no. 4.
[26] Sarker, G. (2011), A Back Propagation Network for Face Identification and Localization, 2011 International Conference on Recent Trends in Information Systems (ReTIS-2011), Dec. 21-23, Kolkata, DOI: 10.1109/ReTIS.2011.6146834, pp. 24-29.
[27] Sarker, G. (2012), An Unsupervised Learning Network for Face Identification and Localization, 2012 International Conference on Communications, Devices and Intelligent Systems (CODIS), Dec. 28-29, 2012, Kolkata, DOI: 10.1109/CODIS.2012.6422282, pp. 652-655.
[28] Sarker, G. and Roy, K. (2013), An RBF Network with Optimal Clustering for Face Identification, International Conference on Information & Engineering Science – 2013 (ICIES-2013), Feb. 21-23, 2013, organized by IMRF, Vijayawada, Andhra Pradesh, pp. 70-74.
[29] Sarker, G. and Roy, K. (2013), A Modified RBF Network with Optimal Clustering for Face Identification and Localization, International Conference on Information & Engineering Science – 2013 (ICIES-2013), Feb. 21-23, 2013, organized by IMRF, Vijayawada, Andhra Pradesh, pp. 32-37.
[30] Roy, K. and Sarker, G. (2013), A Location Invariant Face Identification and Localization with Modified RBF Network, International Conference on Computer and Systems ICCS-2013, 21-22 September, 2013, pp. 23-28, Bardhaman.
[31] Sarker, G. and Sharma, S. (2014), A Heuristic Based RBFN for Location and Rotation Invariant Clear and Occluded Face Identification, International Conference on Advances in Computer Engineering and Applications, ICACEA-2014 (with IJCA), pp. 30-36.

IJCATM : www.ijcaonline.org