Handwritten Digit Recognition of MNIST Dataset Using Deep Learning State-Of-The-Art Artificial Neural Network (ANN) and Convolutional Neural Network (CNN)
Drishti Beohar
Computer Science Department
Maulana Azad National Institute of Technology,
Bhopal, India

Akhtar Rasool
Computer Science Department
Maulana Azad National Institute of Technology,
Bhopal, India
… Segmentation. After segmentation, feature extraction is performed on the data, followed by the final stage, i.e. classification. OCR follows a pipeline design: the success of each stage relies on the success of the previous stage. As technology advances, we need machines to perform ever more demanding tasks. There are various uses of computer vision, such as …

… recognition [3]. In this paper, we prepared both an Artificial Neural Network model and a Convolutional Neural Network model to recognize handwritten digits from 0 to 9. A node in a neural network can be understood as a neuron in the brain. Every node is connected to other nodes through weights (which are basically the edges between the nodes).
Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on September 16,2022 at 10:38:08 UTC from IEEE Xplore. Restrictions apply.
The last output of the system is compared with the target output, and the weights are then changed according to the loss function, which indicates whether the system has predicted correctly [7]. For this, and for correctness, a neural network has different layers. A fully connected neural network contains several layers, namely the input, hidden, and output layers. Suppose we have features $x_1, x_2, x_3, \ldots, x_n$. The edges from one node of the network to another carry weights, which play the most important role in both forward and backward propagation. In forward propagation, two operations take place in a hidden node as the features and weights are passed to it: the sum of the products of the features and weights is computed, and an activation function is then applied to that sum. A very deep neural network therefore has many weight and bias parameters. In backward propagation, the weights from the previous epoch are updated so as to reduce the loss value, using the update rule in Eq. (1), where $\eta$ is the learning rate:

$$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial\,\text{Loss}}{\partial w_{\text{old}}} \qquad (1)$$

In a fully connected neural network, the nodes in each layer are connected to the nodes in the layers preceding and succeeding them [9].

Fig. 2. Forward and Back propagation

Whenever we have a deep neural network, i.e. a network with a large number of layers, we also have a large number of weight and bias parameters, which can lead to overfitting the dataset or a particular subset of the data. In a multilevel neural network, underfitting is unlikely, because the many layers can fit the training data very closely; instead, high variance becomes a problem as the number of layers increases. Regularization (L1 or L2) or a Dropout layer can be applied to reduce overfitting. In a random forest, multiple decision trees are created, and every decision tree is grown to its full depth, which also leads to overfitting. Similarly to the decision trees in a random forest, a neural network can use only a subset of the features; this acts as regularization and improves the accuracy of the whole model.

Fig. 3. Graph of the derivative $\partial\,\text{Loss}/\partial w_{\text{old}}$ for the vanishing and exploding gradient problem
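To make the forward pass and the update rule in Eq. (1) concrete, the following is a minimal NumPy sketch of one training step for a single fully connected softmax layer with a categorical cross-entropy loss. It is an illustration under assumptions, not the authors' code: the toy sizes (4 features, 3 classes), the random data, and the learning rate `eta = 0.1` are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract each row's maximum before exponentiating, for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y_onehot):
    # Mean categorical cross-entropy over the batch (small constant avoids log(0)).
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))

def train_step(W, b, X, y_onehot, eta=0.1):
    """One forward pass, one backward pass, and one update with Eq. (1)."""
    # Forward propagation: weighted sum of the features, then the activation.
    probs = softmax(X @ W + b)
    loss = cross_entropy(probs, y_onehot)
    # Backward propagation: for softmax + cross-entropy, dLoss/dz = (probs - y) / batch.
    dz = (probs - y_onehot) / X.shape[0]
    grad_W = X.T @ dz
    grad_b = dz.sum(axis=0)
    # Eq. (1): w_new = w_old - eta * dLoss/dw_old (the bias is updated the same way).
    return W - eta * grad_W, b - eta * grad_b, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))            # 32 toy samples with features x1..x4
labels = rng.integers(0, 3, size=32)    # 3 toy classes
Y = np.eye(3)[labels]                   # one-hot targets
W, b = np.zeros((4, 3)), np.zeros(3)

W, b, loss_before = train_step(W, b, X, Y)   # loss at the initial (zero) weights
_, _, loss_after = train_step(W, b, X, Y)    # loss after one update
print(loss_before, loss_after)
```

Because the weights start at zero, every class initially gets probability 1/3, so the first loss is ln 3; a single update with Eq. (1) lowers it.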
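Dropout, mentioned above as a way to reduce overfitting, deactivates a random subset of neurons during training, with the size of the subset set by the dropout ratio (20% in the experiments reported later). The NumPy sketch below assumes the common "inverted dropout" variant, which rescales the surviving activations by 1/(1 - p) during training so that nothing needs to change at test time; the function name `dropout` and the toy activations are hypothetical.

```python
import numpy as np

def dropout(activations, p=0.2, training=True, rng=None):
    """Inverted dropout: zero out a random fraction p of the units.

    The surviving units are scaled by 1 / (1 - p) so that the expected
    activation is unchanged; at inference time the input is returned as-is.
    """
    if not training or p == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    # Each unit is kept independently with probability 1 - p.
    keep_mask = rng.random(activations.shape) >= p
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(42)
h = np.ones((1000, 784))                      # hypothetical hidden-layer activations
h_train = dropout(h, p=0.2, rng=rng)
print("fraction dropped:", np.mean(h_train == 0))   # close to 0.2
print("mean activation:", h_train.mean())           # close to 1.0
h_test = dropout(h, p=0.2, training=False)          # unchanged at test time
```

Rescaling during training rather than at test time is a design choice; the alternative (scaling activations by 1 - p at inference) gives the same expected values.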
In the neural network, we select a subset of the features from the input layer and a subset of the hidden neurons; the neurons that are not selected are deactivated [11]. The size of the subset is determined by the dropout ratio. The Convolutional Neural Network (CNN) plays a major role in image classification, object detection, and many other tasks such as data augmentation. In a CNN, the input data is a matrix whose cells hold values from 0 to 255, with either one or three channels for grayscale and RGB images, respectively [12].

Fig. 5. Operations on an image using a CNN model

Filters are applied to the image, and the output of each such operation is again a matrix. The image passes through a pipeline of operations: convolution layers with filters, pooling, a fully connected layer, and finally the softmax function. The figure below shows the complete architecture of a convolutional neural network that processes an input image and classifies it based on its values [13].

III. SCRIPTS AND DATASETS

Fig. 6. MNIST Dataset

The MNIST dataset is well suited to the handwritten digit classification problem, and it is a well-established dataset widely used by students and researchers. It contains 60,000 training images in 10 classes (0-9), which is a large number in itself. Each image in the MNIST dataset is 28 pixels high and 28 pixels wide, so each image is a 784-dimensional vector. The MNIST dataset is easily available on the internet. Each image is grayscale, with pixel values in the range 0-255 indicating how bright or dark that pixel is. The MNIST dataset was derived from datasets of the National Institute of Standards and Technology (NIST). To estimate the performance of a model, we split the data into a training set and a testing set. The model's performance on the training and testing sets can then be plotted as learning curves, giving insight into how well the model is learning the problem.

IV. THE RECOGNITION MODEL

Optical character recognition (OCR) is a recognition system with several stages. Each stage plays a very important role in the model. The stages are pipelined one after the other; Fig. 7 shows the stages of the recognition model.

A. Image Acquisition

Image acquisition is the first stage of every recognition model. In this stage, the images are gathered, filtered, and cleaned before any preprocessing is done on them.

B. Pre-Processing

Preprocessing is a vital operation on the image. The major operations carried out in preprocessing are image cleaning, to reduce the noise in the image, and garbage removal. The image is also optimized in this stage by filling voids or holes and straightening curved lines, and different algorithms are applied for skew correction. The output of this stage is a binary image, produced by binarization and texture filtering.

C. Segmentation

Segmentation is the decomposition of an image into sub-images. There are three types of segmentation: line, word, and character segmentation. When the input image contains multiple lines, breaking it into single lines is line segmentation. When the input image is a single line containing multiple words, breaking it into those words is word segmentation. Similarly, in character segmentation, the words are segmented into characters.

D. Feature Extraction

Feature extraction is a very important stage in the recognition model. It is a form of dimensionality reduction, in which the input data is converted into a simpler form that is easier to operate on. This stage is especially valuable for large datasets like MNIST, because it optimizes the whole recognition process. It removes redundant data while retaining the original information in the dataset. In image processing, the feature extraction stage helps in edge detection and many other operations. Without it, classifying the image is more complex and time-consuming. Principal component analysis (PCA) and image pixel vectors are among the techniques used for feature extraction.

E. Classification

Classification is the decision-making stage of the image recognition pipeline. Its input is the output of the feature extraction stage. Many classifiers are available today, such as logistic regression, random forest, K-nearest neighbors (KNN), support vector machine (SVM), Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and many more. For image classification, deep neural network classifiers, i.e. the Artificial Neural Network and the Convolutional Neural Network, give great results. The MNIST dataset is huge, and classifiers like the Artificial Neural Network (ANN) and
Convolutional Neural Network (CNN) achieve high accuracy when trained on 80% of the dataset and tested on the remaining 20%.

B. Convolutional Neural Network
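The convolution, pooling, fully connected, and softmax pipeline described earlier can be sketched in NumPy for a single 28×28 grayscale image. This is a forward pass only, built on assumptions that are not from the paper: eight random 3×3 filters, ReLU after the convolution, 2×2 max pooling, and random fully connected weights. Like most CNN libraries, it computes cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d(image, kernels):
    # Valid cross-correlation: image (H, W), kernels (K, kh, kw) -> (K, H-kh+1, W-kw+1).
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            out[:, i, j] = np.sum(kernels * patch, axis=(1, 2))
    return out

def max_pool(feature_maps, size=2):
    # Non-overlapping size x size max pooling over each feature map.
    k, h, w = feature_maps.shape
    h2, w2 = h // size, w // size
    trimmed = feature_maps[:, :h2 * size, :w2 * size]
    return trimmed.reshape(k, h2, size, w2, size).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28)) / 255.0   # stand-in for one MNIST digit
kernels = rng.normal(scale=0.1, size=(8, 3, 3))        # 8 hypothetical 3x3 filters

maps = np.maximum(conv2d(image, kernels), 0.0)   # convolution + ReLU -> (8, 26, 26)
pooled = max_pool(maps)                          # 2x2 max pooling   -> (8, 13, 13)
flat = pooled.reshape(-1)                        # flatten            -> (1352,)
W_fc = rng.normal(scale=0.01, size=(flat.size, 10))
probs = softmax(flat @ W_fc)                     # fully connected + softmax over digits 0-9
print(maps.shape, pooled.shape, probs.shape, probs.argmax())
```

The explicit loops keep the convolution readable; a trained CNN would learn `kernels` and `W_fc` by backpropagation instead of drawing them at random.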
… neural network. The last layer is densely connected and is called the fully connected layer.

VI. RESULTS

A. Implementation using Artificial Neural Network:

Digit recognition on the MNIST dataset covers the digits 0-9, which act as the classes in classification. The PyCharm IDE (Integrated Development Environment) is a customizable, cross-platform IDE with built-in developer tools. PyCharm is used with the latest stable release of Python 3.7. As discussed above, the recognition model has five stages; the data acquisition stage is already implemented, since MNIST is a very reliable dataset. In the image preprocessing phase for the Artificial Neural Network, all the images are made uniform to reduce the complexity of the dataset. The data is loaded with the Python library NumPy, the fundamental package for scientific computing in Python. As mentioned earlier, the Artificial Neural Network model has three layers: input, hidden, and output. The input to each layer is the output of the previous one. In the neural network, the size of the input image equals the number of neurons in the input layer; as stated in the dataset description, the images are 28x28, i.e. 784 pixels. The output of every layer is calculated with an activation function, which in our model is ReLU. The number of neurons in the hidden layer is kept the same as in the input layer. MNIST has 10 classes (0-9), so the output layer consists of 10 neurons, one per class.

The number of epochs is set to 10, with a batch size of 200, for the 60,000 training images. The loss is computed with categorical cross-entropy, a logarithmic loss function, and optimization is done with ADAM (Adaptive Moment Estimation), which modifies the weights and biases during backpropagation. The baseline error achieved was 1.31%.

B. Implementation using Convolutional Neural Network (CNN or ConvNet):

As in the ANN implementation, the digits 0-9 of the MNIST dataset act as the classes, PyCharm is used with Python 3.7, and the data acquisition stage of the five-stage recognition model is already covered by the reliable MNIST dataset. The Convolutional Neural Network is not as simple to train as the Artificial Neural Network. Whereas the Artificial Neural Network had as many neurons in the input layer as there are pixels in the image (i.e. the image size), here the network takes a 2-D matrix as input.
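The difference between the two input formats can be illustrated with NumPy, which the paper uses for loading the data. This is a sketch under stated assumptions: the helper name `prepare_inputs`, scaling pixels from 0-255 to [0, 1], and one-hot encoding the labels (the form a categorical cross-entropy loss expects) are not specified in the text, and random arrays stand in for the real 28×28 MNIST images.

```python
import numpy as np

def prepare_inputs(images, labels, num_classes=10):
    """Reshape uint8 MNIST-style images for an ANN and a CNN and one-hot the labels."""
    x = images.astype("float32") / 255.0              # scale pixel values 0-255 to [0, 1]
    n, h, w = x.shape
    x_ann = x.reshape(n, h * w)                       # ANN/MLP: one input neuron per pixel
    x_cnn = x.reshape(n, h, w, 1)                     # CNN: keep the 2-D matrix plus 1 grayscale channel
    y = np.eye(num_classes, dtype="float32")[labels]  # one-hot targets for the 10 digit classes
    return x_ann, x_cnn, y

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(500, 28, 28)).astype(np.uint8)  # stand-in for MNIST images
labels = rng.integers(0, 10, size=500)                              # stand-in digit labels 0-9

x_ann, x_cnn, y = prepare_inputs(images, labels)
print(x_ann.shape, x_cnn.shape, y.shape)
```

Both arrays share the same pixel data; only the shape differs, which is what lets the CNN exploit the spatial arrangement of the pixels that the flattened ANN input discards.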
… ratio. Here we used a dropout ratio of 20% (the value of the dropout ratio p), i.e. 20% of the neurons in the layer are selected at random and deactivated to reduce overfitting (regularization is another way to avoid overfitting). The learning rate is kept small, as this helps the model converge toward the global minimum. The training dataset is structured as a 3-dimensional array of instances, image width, and image height. For a multi-layer perceptron model, the images must be reduced to a vector of pixels; in this case, each 28×28 image becomes 784 pixel input values.

… all the more; however, performance is lower. In the convolutional neural network, the first two layers, i.e. the convolution layer and the max-pooling layer, process the image, and they are followed by the densely connected layers of an artificial neural network; there are fewer learnable parameters, yet performance is better. The model is then fit and evaluated. It is fit over 10 epochs, with the weights updated every 200 images. The test data is used as the validation dataset. A verbose value of 2 is used, which prints one line per training epoch.
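The training configuration reported for the ANN (784 input neurons, a hidden layer of the same size, 10 output neurons, 10 epochs, batch size 200, 60,000 training images) can be checked with a little arithmetic. The sketch assumes fully connected layers with one bias per neuron and one weight update per mini-batch; dropout adds no trainable parameters. The helper names are hypothetical.

```python
def dense_params(layer_sizes):
    # Each fully connected layer has (inputs x outputs) weights plus one bias per output.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def updates(num_images, batch_size, epochs):
    # One weight update per mini-batch; ceiling division counts a final partial batch.
    per_epoch = -(-num_images // batch_size)
    return per_epoch, per_epoch * epochs

ann_layers = [784, 784, 10]          # input = hidden = 784 neurons, 10 output classes
print(dense_params(ann_layers))      # 623290
print(updates(60000, 200, 10))       # (300, 3000)
```

Almost all of the roughly 623,000 parameters sit in the 784×784 hidden layer, which is one reason dropout is applied there.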
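The ADAM optimizer used during backpropagation keeps running estimates of the mean and the uncentered variance of each gradient and scales every step by them. The NumPy sketch below implements the standard bias-corrected Adam update; the class name `Adam`, the default hyperparameters (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8, the usual published defaults), the larger learning rate of 0.1 used for the demo, and the toy quadratic loss are assumptions, since the paper does not state them.

```python
import numpy as np

class Adam:
    """Minimal Adam (Adaptive Moment Estimation) optimizer for one parameter array."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None   # first-moment (mean) estimate of the gradient
        self.v = None   # second-moment (uncentered variance) estimate
        self.t = 0      # step counter used for bias correction

    def step(self, w, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(w), np.zeros_like(w)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias-corrected moments
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy problem: minimize Loss(w) = sum((w - 3)^2), whose gradient is 2 * (w - 3).
w = np.zeros(2)
opt = Adam(lr=0.1)
for _ in range(500):
    w = opt.step(w, 2 * (w - 3.0))
print(w)
```

On the toy loss the parameters approach the minimum at w = 3; in a real network, `grad` would come from backpropagation, with one optimizer state per weight and bias array.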
… me this opportunity to do this research paper. His crucial help made it possible to accomplish the objective. I would also like to thank the authors of the reference materials cited in the reference section for their commendable research.

REFERENCES

[1] M. Nagu, N. V. Shankar, and K. Annapurna, "A novel method for Handwritten Digit Recognition with Neural Networks," 2011.
[2] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, et al., "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems, 1990, pp. 396-404.
[3] A. Ashworth, Q. Vuong, B. Rossion, M. Tarr, Q. Vuong, M. Tarr, et al., "Object Recognition in Man, Monkey, and Machine," Visual Cognition, vol. 5, pp. 365-366, 2017.
[4] J. Janai, F. Güney, A. Behl, and A. Geiger, "Computer Vision for Autonomous Vehicles: Problems, Datasets, and State-of-the-Art," arXiv preprint arXiv:1704.05519, 2017.
[5] K. Islam and R. Raj, "Real-Time (Vision-Based) Road Sign Recognition Using an Artificial Neural Network," Sensors, vol. 17, p. 853, 2017.
[6] D. Arpit, Y. Zhou, B. Kota, and V. Govindaraju, "Normalization propagation: A parametric technique for removing internal covariate shift in deep networks," in International Conference on Machine Learning, 2016.
[7] I. Patel, V. Jagtap, and O. Kale, "A Survey on Feature Extraction Methods for Handwritten Digits Recognition," International Journal of Computer Applications, vol. 107, 2014.
[8] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.
[9] Md. Abu Bakr Siddique, Mohammad Mahmudur Rahman Khan, Rezoana Bente Arif, and Zahidun Ashrafi, "Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Neural Network Algorithm," 2018.
[10] Rezoana Bente Arif, Md. Abu Bakr Siddique, Mohammad Mahmudur Rahman Khan, and Mahjabin Rahman O., "Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Convolutional Neural Network," 2019.
[11] Li Chen, Song Wang, Wei Fan, Jun Sun, and Satoshi Naoi, "Beyond human recognition: A CNN-based framework for handwritten character recognition," 2020.
[12] Dhanya Sudarsan and Shelbi Joseph, "A Novel Approach for Handwriting Recognition in Malayalam Manuscripts using Contour Detection and Convolutional Neural Nets," 2020.
[13] Y. A. Nanehkaran, D. Zhang, S. Salimi, et al., "Analysis and comparison of machine learning classifiers and deep neural networks techniques for recognition of Farsi handwritten digits," J. Supercomput., 2020.
[14] S. Oktaviani, C. A. Sari, E. Hari Rachmawanto, and D. R. Ignatius Moses Setiadi, "Optical Character Recognition for Hangul Character using Artificial Neural Network," 2020 International Seminar on Application for Technology of Information and Communication (iSemantic), Semarang, Indonesia, 2020.
[15] R. Sharma, B. Kaushik, and N. Gandhi, "Character Recognition using Machine Learning and Deep Learning - A Survey," 2020 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 2020.
[16] P. Gupta, S. Deshmukh, S. Pandey, K. Tonge, V. Urkunde, and S. Kide, "Convolutional Neural Network-based Handwritten Devanagari Character Recognition," 2020 International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE), Bengaluru, 2020.
[17] P. Dhande and R. Kharat, "Recognition of cursive English handwritten characters," 2017 International Conference on Trends in Electronics and Informatics (ICEI), pp. 199-203, 2017.
[18] Shalini Puri and Satya Prakash Singh, "An efficient Devanagari character classification in printed and handwritten documents using SVM," Procedia Computer Science, vol. 152, pp. 111-121, 2019.
[19] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85-117, 2015.