Data Entry Through OCR - A Case Study of Digitizing Examination Marks from Paper Marksheets

This document discusses the challenges of digitizing handwritten examination marks in Bangladesh, highlighting the inefficiencies and errors associated with manual transcription. It proposes the development of an OCR-based system specifically designed to recognize handwritten Bangla numerals, utilizing CNN models for improved accuracy. The methodology includes evaluating existing OCR tools, training on a comprehensive dataset, and implementing image processing techniques to enhance digit recognition performance.

Uploaded by

voccubd

Background

In government institutions across Bangladesh, particularly during recruitment
processes, marks are still routinely recorded by hand. Each recruitment cycle
involves viva voce examinations: thousands, sometimes even lakhs, of applicants
may apply, and thousands appear before interview boards. These boards typically
consist of multiple evaluators, each of whom assigns marks in handwritten Bangla
numerals on paper mark sheets. The handwritten scores are then manually
transcribed into digital formats for record-keeping and further evaluation. This
manual process consumes substantial administrative time and effort and carries a
significant risk of human error, particularly under high workloads and tight
deadlines.

While Optical Character Recognition (OCR) technologies have achieved maturity
in recognizing printed and Latin-script text (Smith, 2007), the recognition of
handwritten Bangla numerals remains a relatively under-explored and challenging
domain. Widely used OCR engines such as Tesseract perform well on printed text
but have not demonstrated high accuracy on handwritten Bangla numerals. In
contrast, specialized CNN models can achieve impressive accuracy in recognizing
single-digit handwritten Bangla numbers, even in noisy images. Given that viva
marks are typically one- or two-digit numbers, there is scope for developing a
CNN-based OCR system specifically optimized for recognizing two-digit
handwritten Bangla numerals from scanned interview mark sheets. Such an
automated digitization process would minimize transcription errors and reduce
the administrative burden in recruitment workflows. This approach has the
potential to significantly enhance efficiency and accuracy in government
recruitment and other examination procedures.

Word count: 254

Objective

The principal objective of this project is to create an OCR-based system capable
of extracting data from images of handwritten mark sheets. The specific objectives
are:

● To evaluate the efficacy of open-source OCR tools in extracting marks from
Bangla handwritten mark sheets.

● To read multi-digit Bangla numbers using image processing and single-digit
Bangla number detector models.

● To compare different methods for drawing bounding boxes around each digit
in multi-digit numbers.

Word count: 69

Expected Outcome

The expected outcomes of this project are:

● A functional OCR-based system specifically designed for recognizing
handwritten Bangla digits from scanned mark sheets.

● Accurate recognition of multi-digit Bangla numbers, especially two-digit
marks, using image segmentation and digit classification models.

● A comparative evaluation of multiple bounding box strategies to improve
digit segmentation accuracy.

● High performance and reliability when tested on real-world handwritten
mark sheet samples, reflecting practical usage scenarios.
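
One possible way to carry out the comparative evaluation of bounding box
strategies is to score each strategy's boxes against hand-annotated boxes using
intersection over union (IoU). The sketch below is only an illustration under
that assumption: the proposal does not name a metric, and `iou`, `mean_iou`, and
the half-open box convention are hypothetical choices made for this example.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max); the max edges are exclusive (assumed convention).
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def mean_iou(predicted: List[Box], annotated: List[Box]) -> float:
    """Pair boxes left to right and average their IoU.

    A missing or extra box contributes 0, so over- and under-segmentation
    both lower the score. Two empty lists are treated as a perfect match.
    """
    pred = sorted(predicted, key=lambda box: box[0])
    truth = sorted(annotated, key=lambda box: box[0])
    count = max(len(pred), len(truth))
    if count == 0:
        return 1.0
    return sum(iou(p, t) for p, t in zip(pred, truth)) / count
```

Averaging `mean_iou` over a set of annotated mark images would give one number
per strategy, which could be read alongside the end-to-end recognition accuracy.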

Word count: 67

Methodology

● Extract the image slices that contain the numbers to be read.
● Evaluate existing OCR systems on these cropped number images.
● Preliminary results showed that these systems did not perform adequately.

Our research addresses the challenge of recognizing handwritten Bangla digits
from examination papers through a structured approach. The methodology
comprises several sequential steps:

First, we evaluated existing OCR systems (Tesseract, EasyOCR) on cropped
number images from sample examination papers. Their performance in these
preliminary tests was inadequate, primarily because of the high variability in
individual handwriting styles and the absence of robust, annotated datasets for
multi-digit handwritten Bangla numerals.
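
One way to make this evaluation concrete is to score each engine by exact-match
accuracy on the cropped mark images. The sketch below is an illustration, not
the project's code: `parse_mark`, `exact_match_accuracy`, and the `ocr_fn`
callable are hypothetical names, and `ocr_fn` stands in for whatever wrapper is
written around Tesseract or EasyOCR to return the recognized text for one image.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

BANGLA_DIGITS = "০১২৩৪৫৬৭৮৯"

# Map every Bangla digit, and every ASCII digit, to its ASCII form.
_TO_ASCII = {bangla: str(value) for value, bangla in enumerate(BANGLA_DIGITS)}
_TO_ASCII.update({str(value): str(value) for value in range(10)})


def parse_mark(text: str) -> Optional[int]:
    """Turn raw OCR output into an integer mark.

    Non-digit characters (spaces, newlines, stray symbols) are ignored, which
    is a lenient choice. Returns None when the text contains no digit at all.
    """
    digits = "".join(_TO_ASCII[ch] for ch in text if ch in _TO_ASCII)
    return int(digits) if digits else None


def exact_match_accuracy(
    samples: Iterable[Tuple[Any, int]],
    ocr_fn: Callable[[Any], str],
) -> float:
    """Fraction of (image, true_mark) pairs whose parsed OCR output is correct."""
    pairs = list(samples)
    if not pairs:
        return 0.0
    correct = sum(1 for image, truth in pairs if parse_mark(ocr_fn(image)) == truth)
    return correct / len(pairs)
```

Passing the same labelled samples with a different `ocr_fn` for each engine
would yield directly comparable scores.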

Next, we selected the NumtaDB dataset for its comprehensive representation of
handwriting styles, with 70,000+ annotated samples from diverse demographics.
For robust model training, we considered this an advantage over alternatives
such as CMATERdb. NumtaDB contains only single-digit samples, however, while
our target application requires recognizing multi-digit (predominantly
two-digit) numbers from examination papers.

To bridge this gap, we developed a two-phase approach: first training a robust
single-digit classifier, then implementing a segmentation pipeline to handle
multi-digit numbers. In the preprocessing stage, we extract individual digits
from each multi-digit number using blurring, binarization, and contour
detection. The resulting bounding boxes are sorted left to right so that the
digits are read in the correct order.
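
The segmentation steps can be sketched in dependency-free Python as follows.
This is a minimal illustration under stated assumptions, not the project's
implementation: a 3x3 mean filter stands in for the blur, a fixed threshold of
128 for the binarization, and 8-connected component labelling for contour
detection (an image library such as OpenCV would normally provide these). All
function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows: 0 = black ink, 255 = white paper
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels


def box_blur(img: Image) -> Image:
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) // len(window)
    return out


def binarize(img: Image, threshold: int = 128) -> List[List[bool]]:
    """True marks ink, i.e. pixels darker than the (assumed) fixed threshold."""
    return [[pixel < threshold for pixel in row] for row in img]


def digit_boxes(mask: List[List[bool]], min_pixels: int = 2) -> List[Box]:
    """Bounding box of each 8-connected ink blob, sorted left to right."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects one connected blob.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if len(xs) >= min_pixels:  # drop isolated specks
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    # Sort by left edge so the digits come out in reading order.
    return sorted(boxes, key=lambda box: box[0])


def segment_digits(img: Image) -> List[Box]:
    """Blur, binarize, then locate the digit boxes from left to right."""
    return digit_boxes(binarize(box_blur(img)))
```

Each returned box would then be cropped, resized to the classifier's input
size, and classified, with the predicted digits concatenated in box order. Note
that two touching digits form a single blob here and would need extra handling.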

We chose a Convolutional Neural Network (CNN) architecture because it learns
spatial features directly from pixel data, without manual feature engineering.
Our CNN model consists of convolutional layers, max pooling layers, and dense
layers, and is trained on the preprocessed images.
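
To show what the convolution and max pooling stages compute, the sketch below
runs one hand-written forward pass on a tiny image. It is illustrative only:
the proposal does not specify layer counts or kernel sizes, the 2x2 kernel here
is fixed by hand rather than learned, and a real model would be built and
trained in a deep learning framework. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' cross-correlation (what CNN layers call convolution), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map: Matrix) -> Matrix:
    """Replace negative responses with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool2x2(feature_map: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]


# A 5x5 image whose two left columns are 0 and three right columns are 1.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# A kernel that responds where intensity rises from left to right.
edge_kernel = [[-1.0, 1.0],
               [-1.0, 1.0]]

features = relu(conv2d_valid(image, edge_kernel))  # 4x4 feature map
pooled = max_pool2x2(features)                     # 2x2 summary of the map
```

The feature map is non-zero only along the vertical edge, and pooling keeps
that response while halving the resolution. In a trained CNN the kernel values
are learned, and the pooled maps are flattened and passed to the dense layers.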

Preliminary results are encouraging: the model improves markedly on generic
OCR solutions when tested on examination papers with variable writing styles
and potentially touching digits.

Word count: 239
