Digit Recognition Using Convolutional Neural Networks
Abstract
In this research, we explore handwritten digit recognition using a Convolutional Neural
Network (CNN). Digit recognition on MNIST-style data is a standard benchmark for image
classification and has many practical applications, including Optical Character Recognition
(OCR). We develop a CNN model and evaluate its performance on the Digit Recognizer dataset
from Kaggle, a variant of MNIST. The model achieves an accuracy of 99.15% on the validation
set. Through this paper, we aim to demonstrate the effectiveness of CNNs in image-based tasks
and discuss potential improvements for real-world applications.
Introduction
Handwritten digit recognition is a classic machine learning problem, often used as an entry point
to image classification tasks. Its widespread use in applications like postal code recognition,
check processing, and digital form interpretation underscores its importance. The goal is to
correctly identify digits (0–9) from grayscale images, a task complicated by the variability in
human handwriting.
In this study, we utilize the Digit Recognizer dataset from Kaggle, a variant of the MNIST
dataset, to develop a classification model. Convolutional Neural Networks (CNNs) have become
the go-to method for image classification tasks due to their ability to learn spatial hierarchies and
patterns in images. Our objective is to train a CNN model and evaluate its performance on
unseen data, exploring its strengths and weaknesses.
Related Work
The MNIST dataset has been a foundational benchmark for digit recognition research. LeCun et al.
(1998) introduced LeNet-5, an early and influential convolutional network that achieved
excellent performance on this task.
With the advent of deep learning frameworks such as TensorFlow and Keras, CNNs have
become the standard architecture for image-based tasks. Modern approaches have extended
beyond basic CNNs, incorporating techniques such as dropout, batch normalization, and data
augmentation to improve accuracy. In recent competitions, models based on deep CNNs have
achieved near-human performance on MNIST.
Dataset and Preprocessing
The dataset used in this research is sourced from Kaggle's "Digit Recognizer" competition,
which consists of a training set of 42,000 labeled images and a test set of 28,000 unlabeled
images. Each image is 28x28 pixels, representing grayscale intensity values between 0 and 255.
Preprocessing Steps:
Normalization: Each pixel value was scaled to the range [0, 1] by dividing by 255. Keeping
the inputs in a small, consistent range helps stabilize and speed up gradient-based training.
Reshaping: The images were reshaped into 28x28 matrices with a single channel (grayscale),
resulting in an input shape of (28, 28, 1).
One-Hot Encoding: The digit labels (0-9) were transformed into one-hot encoded vectors for
use in categorical classification.
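The three preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the helper name `preprocess` is hypothetical, and the random array stands in for rows of Kaggle's `train.csv`, assumed here to hold the label in the first column followed by 784 pixel values.

```python
import numpy as np

def preprocess(raw, num_classes=10):
    """Turn rows of [label, 784 pixel values] into inputs and one-hot targets.

    `preprocess` is an illustrative helper, not a function from the paper.
    """
    labels = raw[:, 0].astype(np.int64)
    pixels = raw[:, 1:].astype(np.float32)

    # Normalization: scale intensities from [0, 255] to [0, 1].
    pixels /= 255.0

    # Reshaping: one 28x28 single-channel image per row.
    images = pixels.reshape(-1, 28, 28, 1)

    # One-hot encoding: row i has a 1 in column labels[i], zeros elsewhere.
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    return images, one_hot

# Synthetic stand-in for a few rows of the training CSV (assumed layout).
rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=(5, 1))
fake_pixels = rng.integers(0, 256, size=(5, 784))
raw = np.hstack([fake_labels, fake_pixels])

X, y = preprocess(raw)
print(X.shape, y.shape)
```

For five input rows, `X` has shape (5, 28, 28, 1) and `y` has shape (5, 10), matching the input shape and label encoding described above.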
Methodology
The CNN architecture consists of the following layers:
Convolutional Layers: Two convolutional layers with ReLU activation and 32 and 64 filters,
respectively, using a kernel size of (3,3). These layers extract spatial features from the images.
Max-Pooling Layers: Pooling layers with a (2,2) window are applied after each convolution to
reduce the spatial dimensions of the feature maps.
Flattening Layer: The output of the last pooling layer is flattened into a vector.
Fully Connected (Dense) Layer: A dense layer with 128 neurons and ReLU activation is
applied.
Output Layer: A dense layer with 10 neurons and softmax activation is used to output the
probability distribution for each class (digit 0-9).
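To make the layer stack concrete, the feature-map sizes and parameter counts can be worked out by hand. The sketch below does that arithmetic in plain Python. It assumes unpadded ("valid") convolutions with stride 1 and pooling with stride equal to the window; the paper does not state these settings, so the numbers hold only under that assumption. The helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel, stride=1):
    # Output width of an unpadded ("valid") convolution (assumed setting).
    return (size - kernel) // stride + 1

def pool_out(size, window):
    # Output width of non-overlapping pooling (stride equal to the window).
    return size // window

side, channels = 28, 1   # input image: 28x28, one grayscale channel
total_params = 0
shapes = []

for filters in (32, 64):
    # Each filter has 3*3*channels weights plus one bias.
    total_params += (3 * 3 * channels + 1) * filters
    side = conv_out(side, 3)
    shapes.append(("conv", side, side, filters))
    side = pool_out(side, 2)
    shapes.append(("pool", side, side, filters))
    channels = filters

flat = side * side * channels            # length of the flattened vector
total_params += (flat + 1) * 128         # dense hidden layer (weights + biases)
total_params += (128 + 1) * 10           # softmax output layer

for shape in shapes:
    print(shape)
print("flattened length:", flat)
print("trainable parameters:", total_params)
```

Under these assumptions the feature maps shrink from 28x28 to 26x26, 13x13, 11x11, and 5x5, the flattened vector has 1,600 entries, and the network has 225,034 trainable parameters, most of them in the 128-neuron dense layer. With zero-padded convolutions the sizes and counts would differ.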
The model was compiled using the Adam optimizer, and the categorical cross-entropy loss
function was chosen for multiclass classification. The dataset was split into training (80%) and
validation (20%) sets. The model was trained for 10 epochs with a batch size of 128.
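The split sizes, the number of batches per epoch, and the loss can be illustrated with a short NumPy sketch. The helper names `train_val_split` and `categorical_cross_entropy` are hypothetical, and the random shuffle before splitting is an assumption, since the paper does not say how the 20% validation subset was selected.

```python
import math
import numpy as np

def train_val_split(n, val_fraction=0.2, seed=0):
    # Shuffle the indices, then hold out the first `val_fraction` of them.
    # Shuffling is an assumption; the paper only gives the 80/20 ratio.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(round(n * val_fraction))
    return idx[n_val:], idx[:n_val]

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    # Mean over samples of -sum_k y_true[k] * log(y_prob[k]).
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

train_idx, val_idx = train_val_split(42_000)
steps_per_epoch = math.ceil(len(train_idx) / 128)  # batches of size 128
print(len(train_idx), len(val_idx), steps_per_epoch)

# Toy three-class example: one-hot targets and predicted probabilities.
y_true = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
y_prob = np.array([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]])
loss = categorical_cross_entropy(y_true, y_prob)
print(round(loss, 4))
```

With 42,000 labeled images, the 80/20 split gives 33,600 training and 8,400 validation images, and a batch size of 128 gives 263 weight updates per epoch, the last batch being half full.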
Experiments and Results
The model was trained on 33,600 images and validated on 8,400 images. After 10 epochs, the
model achieved a training accuracy of 99.28% and a validation accuracy of 99.15%.
Confusion Matrix
A confusion matrix was generated to observe the model's performance in classifying individual
digits. While the model performed well across all classes, the most common misclassifications
occurred between digits that share similar shapes (e.g., 3 and 8, 7 and 9).
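A confusion matrix of this kind can be built from true and predicted labels in a few lines of NumPy. The sketch below is illustrative only: the helper names `confusion_matrix` and `most_confused_pairs` are hypothetical, and the toy label arrays are made up for the example rather than taken from the model's actual predictions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    # cm[t, p] counts samples whose true digit is t and predicted digit is p.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def most_confused_pairs(cm, top_k=3):
    # Ignore the diagonal (correct predictions) and rank the remaining cells.
    off = cm.copy()
    np.fill_diagonal(off, 0)
    order = np.argsort(off, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(order, off.shape)
    return [(int(t), int(p), int(off[t, p]))
            for t, p in zip(rows, cols) if off[t, p] > 0]

# Toy labels for illustration; these are not the paper's results.
y_true = np.array([3, 3, 3, 8, 8, 7, 7, 9, 0, 1])
y_pred = np.array([3, 8, 8, 8, 3, 7, 9, 9, 0, 1])

cm = confusion_matrix(y_true, y_pred)
pairs = most_confused_pairs(cm)
for true_digit, predicted_digit, count in pairs:
    print(f"true {true_digit} predicted as {predicted_digit}: {count}")
```

Each returned tuple is (true digit, predicted digit, count), so the largest off-diagonal cells point directly at the digit pairs the model confuses most often.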
Discussion
Our results confirm that CNNs are highly effective for handwritten digit recognition. The
model achieved high accuracy with minimal preprocessing and a compact architecture. The
confusion matrix shows that the remaining errors are concentrated in visually similar digits,
such as 3 and 8 or 7 and 9. Future work could explore data augmentation techniques and more
complex architectures, such as Residual Networks (ResNets), to further improve performance.
Conclusion
This paper presents a CNN-based approach to handwritten digit recognition using the Digit
Recognizer dataset. The model achieved a validation accuracy of 99.15%, highlighting the power
of CNNs in image classification tasks. Potential future work includes experimenting with more
complex architectures and deploying the model in real-world applications, such as automated
postal code recognition or check processing systems.
References
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to
document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep
convolutional neural networks. Advances in Neural Information Processing Systems, 25.