Machine Learning
1.INTRODUCTION
2.WHY DIGIT RECOGNITION
3.DATA DESCRIPTION
4.LITERATURE SURVEY
5.OBJECTIVE OF PROJECT
6.PROBLEM STATEMENT
7.METHODOLOGY
8.IMPLEMENTATION
9.RESULT
10.CONCLUSION
11.REFERENCES
INTRODUCTION
• The problem of handwriting recognition is to interpret intelligible handwritten input automatically. It is of great
interest to the pattern recognition research community because it applies to many fields, enabling more
convenient input devices and more efficient data organization and processing. As one of the fundamental problems
in designing practical recognition systems, the recognition of handwritten digits is an active research field.
Immediate applications of digit recognition techniques include postal mail sorting, automatic address
reading and mail routing, and bank check processing.
• A major problem in handwriting recognition is the huge variability and distortion of patterns. Elastic models based
on local observations and dynamic programming, such as hidden Markov models (HMMs), do not absorb this variability
efficiently: their view of the pattern is local, they cannot cope with length variability, and they are very
sensitive to distortions.
• An SVM is then used to estimate global correlations and classify the pattern. The Support Vector Machine (SVM) is an
alternative to neural networks (NN). In handwriting recognition, SVM gives a better recognition result.
WHY DIGIT RECOGNITION?
• For some brief background on handwritten digit processing with machine learning, let’s note some
interesting features of being able to process this kind of data. Handwritten digits are a common part of
everyday life. One of the first uses that comes to mind is zip codes.
• A zip code consists of 5 digits (sometimes more, depending on whether the trailing digits are included) and is one of
the most important parts of a letter for it to be delivered to the correct location. Many years ago, the
postman would read the zip code manually for delivery. However, this type of work is now automated
using optical character recognition (OCR), similar to the type of solution we’ll be implementing in this
project.
OBJECTIVE OF PROJECT
• 1. The input is grayscale images of isolated digits taken from the MNIST database.
• 2. The classifier is trained on a training dataset. Training data is used by the learning algorithm, usually in a
supervised learning model, to increase accuracy.
• 3. The label (answer) is provided for each row in the dataset, so the algorithm can learn which data corresponds to which
handwritten digit.
• 4. However, in order to really know how well the program is doing, we need to run it on data that it’s never seen before.
That’s where the cross validation set comes in.
• 5. We’ll split the training set in half. The first half will remain as the training data. The second half will serve as the cross
validation data. We’ll provide the training portion to the learning algorithm, along with the answers.
• 6. After training has completed, we’ll run the algorithm again on our cross validation data to see just how accurate the
solution really is. Since we have the digit labels (answers) for both the training and cross validation sets, we can calculate
an accuracy percentage.
• 7. Using the above technique, we can compare different learning algorithms and find the best algorithm for the MNIST dataset.
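The procedure in steps 5 to 7 can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the names split_in_half, accuracy_percent, compare and MajorityClassifier are hypothetical, and the baseline classifier exists only to make the sketch runnable.

```python
from collections import Counter


def split_in_half(rows, labels):
    """First half stays as training data, second half becomes validation data.

    Assumption: the rows are not sorted by label, otherwise shuffle first.
    """
    mid = len(rows) // 2
    return (rows[:mid], labels[:mid]), (rows[mid:], labels[mid:])


def accuracy_percent(predicted, actual):
    """Percentage of predictions that match the known labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most common training label."""

    def fit(self, rows, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.label_ for _ in rows]


def compare(classifiers, rows, labels):
    """Train each classifier on the first half and score it on the second half."""
    (train_x, train_y), (val_x, val_y) = split_in_half(rows, labels)
    results = {}
    for name, clf in classifiers.items():
        clf.fit(train_x, train_y)
        results[name] = accuracy_percent(clf.predict(val_x), val_y)
    return results
```

Any object with fit and predict methods can be passed to compare, so the classifiers discussed in this project could be scored side by side on the same held-out half.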
CLASSIFIER USED IN PROJECT FOR DIGIT RECOGNITION
• 1.GAUSSIAN NAÏVE BAYES-Naive Bayes methods are a set of supervised learning algorithms based on
applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features
given the class label. In scikit-learn, GaussianNB implements the Gaussian Naive Bayes algorithm for classification,
in which the likelihood of each feature is assumed to be Gaussian.
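A minimal sketch of training GaussianNB follows. As an assumption made only so that the example runs without a download, it uses scikit-learn's small bundled 8 x 8 digits dataset as a stand-in for MNIST. The half-and-half split mirrors the one described in the objectives, and the random_state value is arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: the bundled 8x8 digits (64 features) stand in for MNIST (784 features).
X, y = load_digits(return_X_y=True)

# Half of the samples for training, half held out for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

nb = GaussianNB()
nb.fit(X_train, y_train)

# score() returns the fraction of held-out digits classified correctly.
nb_accuracy = nb.score(X_val, y_val)
print(f"GaussianNB validation accuracy: {nb_accuracy:.3f}")
```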
• 2.K-NEAREST NEIGHBORS-The k-nearest-neighbor (k-NN) classifier uses the training patterns as prototypes. The
classification accuracy is influenced by the number of nearest neighbors k. We thus try different k (k = 1, 3, 5, 7, 9)
and obtain the test error rate for each classifier. The training error rate is obtained by 10-fold cross-validation.
As shown in the figure, the highest accuracy is mostly given by k = 3. We hence use the 3-NN classifier.
• Syntax to implement k-NN in scikit-learn (xtrain holds the training pixel rows, xlabel their digit labels):
• >>> from sklearn.neighbors import KNeighborsClassifier
• >>> neigh = KNeighborsClassifier(n_neighbors=3)
• >>> clf = neigh.fit(xtrain, xlabel)
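The choice of k can be checked with 10-fold cross-validation. The sketch below is illustrative only: it assumes scikit-learn's bundled 8 x 8 digits dataset as a stand-in for MNIST, so the accuracies it prints are not the project's results, and the variable names cv_accuracy and best_k are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled 8x8 digits stand in for MNIST.
X, y = load_digits(return_X_y=True)

cv_accuracy = {}
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    # cv=10 runs 10-fold cross-validation and returns one accuracy per fold.
    scores = cross_val_score(knn, X, y, cv=10)
    cv_accuracy[k] = scores.mean()

# Pick the k with the highest mean cross-validated accuracy.
best_k = max(cv_accuracy, key=cv_accuracy.get)
for k, acc in cv_accuracy.items():
    print(f"k={k}: mean accuracy {acc:.3f}")
print("best k:", best_k)
```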
CLASSIFIER USED IN PROJECT
• 3.SUPPORT VECTOR MACHINE-Support vector machines (SVMs) are a set of supervised learning methods used for
classification, regression and outlier detection.
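A minimal sketch of an SVM classifier in scikit-learn follows. The slides do not state which kernel or hyperparameters were used, so the RBF kernel, C=1.0 and gamma="scale" below are assumptions, as is the use of the bundled 8 x 8 digits dataset in place of MNIST.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: the bundled 8x8 digits stand in for MNIST.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16; scale to [0, 1]

# Half of the samples for training, half held out for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Assumed settings: RBF kernel with scikit-learn's default C and gamma.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)

# score() returns the fraction of held-out digits classified correctly.
svm_accuracy = svm.score(X_val, y_val)
print(f"SVM validation accuracy: {svm_accuracy:.3f}")
```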
DATA DESCRIPTION
• The MNIST database contains 60,000 training images (and 10,000 test images) of handwritten digits from 0 to 9.
Each digit is size-normalized and centered in a gray-level image of 28 x 28 pixels, giving 784 pixels in total
as the features.
TRAINING DATASET
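• Example rows from train.csv (first column: digit label; remaining columns: pixel values, only the first six shown):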
8 0 0 0 0 0 0
9 0 0 0 0 0 0
1 0 0 0 0 0 0
3 0 0 0 0 0 0
3 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
0 0 0 0 0 0 0
7 0 0 0 0 0 0
5 0 0 0 0 0 0
8 0 0 0 0 0 0
6 0 0 0 0 0 0
2 0 0 0 0 0 0
0 0 0 0 0 0 0
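Each row can be turned back into a label and a 28 x 28 image. The sketch below assumes a comma-separated file with the label first and the 784 pixel values stored row by row; the function name parse_row is hypothetical. It reads a synthetic one-row file held in memory, and a real file may start with a header line that has to be skipped first.

```python
import csv
import io

IMAGE_SIDE = 28  # MNIST images are 28 x 28 pixels


def parse_row(row):
    """Split one CSV row into (label, image), where image is a 28 x 28 list of lists."""
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    if len(pixels) != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError(
            f"expected {IMAGE_SIDE * IMAGE_SIDE} pixels, got {len(pixels)}"
        )
    # Assumption: pixels are stored row by row (row-major order).
    image = [
        pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)
    ]
    return label, image


# Synthetic example: label 8 followed by 784 zero-valued pixels.
sample = io.StringIO(",".join(["8"] + ["0"] * 784) + "\n")
label, image = parse_row(next(csv.reader(sample)))
print(label, len(image), len(image[0]))  # prints: 8 28 28
```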
TESTING DATA
CONCLUSION
• 2.SVM is the state of the art method for handwriting recognition which can provide very good
• 3.In this project we learnt how SVM can be applied to digit recognition. We have seen that, with a
proper set of training data, good image processing techniques and oriented features, SVM can
achieve a high accuracy of 99% in digit recognition.