Mohamed Salman Ismail Gadit - Thesis PDF
DEGREE OF BACHELOR OF ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
Abstract
This project deals with the use of Optical Character Recognition (OCR) technology for the effective translation of business cards that are photo-captured under normal environmental noise. The scope of this project covers improving the current OCR engine, removing environmental noise with effective pre-processing, streamlining usable information after conversion with effective post-processing, and putting all these features together in a user-friendly mobile application for the end user.
The report starts with a detailed timeline for the project, listing the various short-term and long-term goals on the road towards completion, followed by a literature review that highlights the need for such a technology, the quantitative and qualitative improvements over competitor apps, and the justification behind the choice of mobile platform. The principles and guidelines to be adhered to have been highlighted along with the relevant work that has been accomplished. The review concludes with a summary of the improvements currently possible over competing technologies.
The next part highlights contributions accomplished during the course of project development. Short-term and long-term goals have been discussed in detail. The final vision for the product, consisting of both technical and directional goals, has been discussed and justified. The next few chapters highlight individual contributions to the main application, along with the challenges faced and the solutions to overcome them. The results of the application's performance as compared to competitor benchmarks have been documented and explained. Various development and design bottlenecks observed have been listed, and improvements have been suggested accordingly for future iterations. The report concludes with a section on future prospects for the app.
Acknowledgements
This Final Year Project could not have been taken on and completed in its current state without the help and contribution of the following people, who have from time to time provided direction, critical advice, viable alternatives and praise. I'm truly grateful for their assistance and contribution towards the culmination of this final year project.
- Prof Ko has played a pivotal role in the way this project has worked out its course. Right from the beginning, he gave us the imaginative freedom to take the resources that were given to us and apply them to any imaginable extent. Through each stage of assessment, he gave us his critical opinions and advice on the way we were crafting this mobile application and implementing the innovations mentioned in this paper. While giving us that freedom, Prof Ko also helped us keep our ideas grounded in practicality by reminding us about constraints and suggesting ways to tackle them. His supervision of the project is truly appreciated.
My team-mates - Arnab Ghosh, Aravindh Ravishankar, Varun Ganesh - Without these gentlemen, this project would not stand anything like its present-day iteration. Right from day one, my team-mates and I have worked together as a cohesive unit to come up with a wide array of viable, and sometimes extensive, visions for the way this project was going to turn out. Bouncing ideas off each other, we were able to distill the best possible option given our practical constraints of time and resources. My team-mates have also helped to motivate me during periods when my results weren't as promising and my progress was slow. It has been my pleasure to work with these gentlemen, realizing a vision that we shared right from the start into the app that has resulted from the countless hours we have worked on it.
National University of Singapore - My gratitude to the university for giving me the opportunity to work on
this project.
My examiner, Professor Lawrence Wong - While working towards a larger goal, it is often possible to lose focus on the smaller issues at hand. I value Prof Wong's critical advice during CA2, where he pointed out the aspects of the project that needed closer attention. His recommendations helped me take a closer look at the project and strengthen some aspects of the presentation as well.
Contents

Abstract
Acknowledgements
List of Figures
List of Tables
List of Abbreviations

1 Introduction
1.1
1.2
1.3
1.4 Project goal

2 Literature Review
2.1
2.2 Tesseract
2.3 Tesseract Shortcomings
2.4 Cube Libraries
2.5 Android
2.6
2.6.1 AForge.NET
2.6.2 OpenCV
2.6.3 MATLAB
2.6.4 Comparison of Libraries
2.7 Current Alternatives
2.7.1 Google Goggles
2.7.2 OCR Test
2.7.3 ABBYY Business Card Reader

3 Project Decision
3.1 Understanding the Tesseract OCR Engine
3.2
3.3 Moving to Android
3.4 Application Workflow

4 Image Processing
4.1 Brightness
4.2 Smoothing Filters
4.2.1 Homogeneous Filters
4.2.2 Gaussian Filters
4.2.3 Median Filter
4.2.4 Bilateral Filter
4.3 Combining Filters
4.4 ScopeMate

5 Image Segmentation
5.1 Background Colour Segmentation
5.1.1
5.1.2 Alternatives Explored
5.1.3
5.2 Text Segmentation
5.2.1
5.2.2 Alternatives Explored
5.2.3
5.3 Clustering
5.3.1 Objectives of Clustering
5.3.2 Future Improvements
5.3.3 Clustering Algorithm
5.4 Region of Interest

6
6.1
6.2
6.3 Multi-threading
6.4

7 Segmentation Results
7.1
7.1.1 Results of Algorithm
7.1.2 Future Improvements
7.2 Text Segmentation
7.2.1 Results of Algorithm
7.2.2 Future Improvements

8 Performance Results
8.1
8.2

9 App Results
9.1 NUS Cards
9.1.1 Clear Card
9.1.2 Unclear Cards
9.2 External Cards
9.2.1
9.2.2
9.2.3
9.2.4
9.2.5 Colored Background
9.3 Summary of Results

10 Conclusion
List of Figures

1 Original Image
15 1D Gaussian Kernel
40 Region of Interest
45 Single-threaded model
46 Multi-threaded model
49 Effect of CUBE libraries: (a) Original Text (b) Only Tesseract (c) CUBE and Tesseract together
List of Tables

1 Relationship between image size and character accuracy
List of Abbreviations

ABBYY - A Russian software company, headquartered in Moscow, that provides optical character recognition, document capture and language software for both PC and mobile devices
BMP - Bitmap Image File Format
GUI - Graphical User Interface
HP - Hewlett-Packard
ICR - Intelligent Character Recognition
JNI - Java Native Interface
NCut - Normalized Cut
NDK - Native Development Kit
OCR - Optical Character Recognition
OS - Operating System
OSD - Orientation and Script Detection
ROI - Region of Interest
SDK - Software Development Kit
TIFF - Tagged Image File Format
UTF-8 - Universal Character Set Transformation Format - 8-bit
WPF - Windows Presentation Foundation
Part I.
Literature Review
1
Introduction
It is the 21st century and the human species is becoming increasingly dependent on the devices that it has built. Transistor count is still following the upward trend of Moore's law, and the processing power of devices has never been better [1].
Human-machine interaction has conventionally been thought of as a one-way input system in which the human understands the machine and gives it explicit commands to function. However, with the growing role of human-machine interaction and the increasing processing power of new-age devices, the need for devices to understand the human world too is becoming ever more important. Optical Character Recognition (OCR) technology tries to bridge this gap by giving devices the power to understand characters and languages from the human world.
OCR technology was first used by libraries for historic newspaper digitization projects in the early 1990s. An initial experiment at the British Library with the Burney collection and a cooperative project in Australia (the ACDP) with the Ferguson collection were both considered unsuccessful, largely due to the difficulties with OCR technology and historic newspapers [2]. Fast-forwarding two decades, OCR software accuracy has improved drastically, and it is currently used in a wide array of applications such as data entry, number plate recognition, assisting the visually impaired, importing information from business cards, etc. [3].
1.1
As already mentioned, OCR software had initially been developed to digitize newspapers and library books. The images of documents that were digitized by this process were obtained using commercial scanners, where the light intensity distribution was uniform and the image consisted of standard black font styles printed on a white background. However, once the number of people who exchanged business cards increased, there was an increasing need to introduce OCR for business cards too. Chip Cutter, editor at LinkedIn, writes that although the use of paper has drastically decreased in the current day and age, the convention of swapping business cards at the end of a conversation still prevails, even among the tech-savvy attendees of TED 2013 (the Technology, Entertainment and Design global conference) [4].
He continues to explain in his article why people might still prefer swapping business cards instead of sharing contacts on their mobile phones, and puts forward four points to support his argument: business cards are easy to use, quick to share, have a small learning curve and showcase the owner's creativity. However, one of the main problems with swapping business cards is maintainability and searching. Business cards tend to wear out with time and are very difficult to search through in times of need [5]. Thus, the problem here is not people wanting to move away from the use of conventional business cards, but rather the need for a solution to maintain and manage all of their business cards over time. This signals a need to digitize business cards using OCR applications in order to make them more maintainable and to make searching through them easier. Some of the challenges in using OCR applications on business cards, as compared to their earlier use in reading documents, include: the light distribution is non-uniform because of the environment in which the photo of the card is captured, and the font, colours and arrangement of letters do not follow a standard pattern across business cards.
1.2
Many OCR business card readers have been developed as standalone software solutions on the PC (Personal Computer) [6]. Recently, students at the National University of Singapore, B.K. Ng and Jackson Yeow, have also developed an OCR system named SCORE and a business card reader, B.SCORE, on the PC platform [7]. B.SCORE was a follow-up to the initial SCORE platform, which was coded in C# using the Windows Visual Basic architecture.

However, in OCR applications on the PC, the images have to be uploaded manually or captured directly using the PC's webcam. The quality of OCR depends on the quality of the image captured [8]. The image quality of a standard desktop webcam is only around 1.3 megapixels, whereas an average smartphone camera offers a higher range of image quality, from 5 megapixels upward. Therefore the accuracy of OCR on images captured using smartphones would be better than on those captured using webcams.
Worldwide smartphone users have already topped 1 billion, and this trend is only set to increase in the near future [9]. Moreover, since most contacts are maintained on the phone, capturing a photo using a camera, applying OCR on a PC, and then transferring the recognized data back to the phone might be cumbersome compared to carrying out this entire process chain on the smartphone directly. This would also give the user the ability to digitize business cards anywhere, as opposed to only in the presence of a personal computer.
1.3
Most of the current mobile OCR applications need internet access in order to perform character recognition. A detailed comparison of the feature sets available in current mobile OCR business card readers has been performed under the literature review.

However, this constrains users to only use their mobile OCR applications in the presence of an active internet connection, thereby limiting the mobile aspect of the OCR application itself. Expecting users to be connected to the internet at all times when they might exchange business cards might be unrealistic. However, with the increasing processing power in today's smartphones [10], the ability to realize the entire OCR process using just the mobile phone's processor can be a reality. Therefore, a robust, efficient, offline and mobile OCR business card application would be the solution to realizing the business card digitizing needs of the current-day smartphone user.
1.4
Project goal
Therefore, the goal of this project is to develop a robust, efficient and offline mobile application that uses Optical Character Recognition to automatically digitize business cards into maintainable and searchable mobile phone contacts.

The team has planned to call this application Scope. In this project, the team's goal is to achieve a minimum of 90% accuracy for NUS cards, and 75% accuracy for any external cards.
2
Literature Review
Optical Character Recognition, abbreviated to OCR, is the mechanical or electronic conversion of scanned
images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data
entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of
printed records [11].
It is a common method of digitizing printed texts so that they can be electronically searched, stored more
compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech and
text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
The OCR technology was developed in the 1920s and remains an area of interest and concentrated research
to date. Systems for recognizing machine-printed text originated in the late 1950s and there has been widespread
use of OCR on desktop computers since the early 1990s [12].
OCR technology enables users to liberate large amounts of information held captive in hard-copy form. Once converted to electronic form, information can be edited and extracted easily according to users' needs.
2.1
As OCR technology has been more and more widely applied in paper-intensive industries, it faces increasingly complex image environments in the real world: for example, complicated backgrounds, degraded images, heavy noise, paper skew, picture distortion, low resolution, interference from grids and lines, and text images consisting of special fonts, symbols and glossary words. All of these factors affect the stability of OCR products' recognition accuracy.

In recent years, the major OCR technology providers began to develop dedicated OCR systems, each for special types of images. They combine various optimization methods related to the special image, such as business rules, standard expressions, glossaries or dictionaries, and the rich information contained in colour images, to improve the recognition accuracy.
Such a strategy of customizing OCR technology is called Application-Oriented OCR or Customized OCR [13], widely used in the fields of business-card OCR, invoice OCR, screenshot OCR, ID card OCR, driver-license OCR, auto plant OCR, and so on. For the purposes of this project, the application, Scope, will make use of an open-source OCR engine, called Tesseract, that can be ported onto a mobile phone.
2.2
Tesseract
Tesseract is a free, open-source optical character recognition engine which can be used across various operating
systems. The engine was originally developed as proprietary software at Hewlett-Packard between 1985 and
1995, but it was then released as open source in 2005 by Hewlett-Packard and University of Nevada, Las Vegas
[14]. Tesseract development has been sponsored by Google since 2006[15].
The Tesseract algorithm is illustrated in Figure 2 [16]. A grayscale or colour image will be loaded into
the engine and processed. The program takes .tiff (TIFF) and .bmp (BMP) files but plug-ins can be installed
to allow processing of other image extensions. As there is no rectification capability, the input image should
ideally be a flat image from a scanner.
In the adaptive thresholding process, the engine performs the reduction of a grayscale image to a binary image. The algorithm assumes that there are foreground (black) pixels and background (white) pixels, and it calculates the optimal threshold separating the two pixel classes so that the spread (variance) within each class is minimal.
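To make this kind of global thresholding step concrete, a grayscale image can be binarised with Otsu's method using OpenCV's Java bindings as shown below. This is only a hedged sketch of the general technique, not Tesseract's internal implementation.

import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

public class Binarizer {
    // Reduce a grayscale image to black/white, letting Otsu's method pick the threshold
    public static Mat toBinary(Mat gray) {
        Mat binary = new Mat();
        // The threshold value 0 is ignored when THRESH_OTSU is set; Otsu computes it instead
        Imgproc.threshold(gray, binary, 0, 255,
                Imgproc.THRESH_BINARY + Imgproc.THRESH_OTSU);
        return binary;
    }
}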
Following that, Tesseract searches through the image, identifies foreground pixels and marks them as potential characters. Lines of text are found by analysing the image spaces adjacent to the potential characters. For each line, the baselines are found and Tesseract examines them to find the appropriate height across the line. Characters that lie outside this appropriate height, or that are not of uniform width, are reclassified to be processed in an alternate manner [17].
After finding all of the possible characters in the document, Tesseract does word recognition word by word, on a line-by-line basis. Words are then passed through a contextual and syntactical analyser, which produces an editable .txt file in the tesseract folder. The tesseract folder is where all the source code is located and where the main engine is run. In addition, Tesseract is also able to undergo training in order to recognize special characters beyond the standard alphabet. The Scope application would need Tesseract to recognize numbers and certain symbols such as @ and +.
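As one hedged illustration of how such a requirement can be expressed, the tess-two Android wrapper (used later in this project) lets the caller restrict the characters the engine may return; the exact whitelist below is an assumption chosen only for illustration.

import com.googlecode.tesseract.android.TessBaseAPI;

public class OcrConfig {
    // Limit recognition to letters, digits and the symbols a business card needs
    public static void restrictCharacters(TessBaseAPI tess) {
        tess.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST,
                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@+.-()/ ");
    }
}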
2.3
Tesseract Shortcomings
Despite the numerous contributions from many developers over the years, Tesseract's performance suffers from many shortcomings and constraints. OCR accuracy falls drastically if the processed image has a coloured background, and the design and layout of name cards and websites affect the precision adversely.
Another challenge faced by Tesseract is the text size in images. The Tesseract Frequently Asked Questions (FAQ) page [18] states that the noise-reduction mechanisms can and will hinder the processing of small text sizes. In order to achieve notable results, text should typically be around 20 pixels tall, and any text under 8 pixels will be recognized as noise and filtered out.
Table 1 illustrates the relationship between image size (in pixels) and character accuracy.
Essentially, Tesseract is a raw, skeleton OCR engine with the core feature of text recognition. It does not come with any GUI, performs no page layout analysis, does no output formatting and lacks additional features.
Table 1: Image size versus character accuracy

Image size (pixels)    Character accuracy (%)
255 x 285              4.61
384 x 429              98.12
1024 x 1087            99.49
2048 x 2289            99.15
2.4
Cube Libraries
The current OCR detection using Tesseract 3.02 simply translates the image to text, but does not take into account the relationships behind the identified letters and word formations in order to provide an intelligent result. The Cube libraries, when used along with Tesseract, help improve the contrast between the words and the background in images and boost the performance of OCR recognition.
The key features of the CUBE libraries include:
Performing adaptive thresholding prior to OCR, to improve text contrast.
Windowed segmentation to improve word recognition, by recognizing smaller pieces of the image first and stitching them together later.
Comparing translated junk data against a dictionary database and the most frequently used words in a particular language, so as to retrieve data lost due to noise.
A comparison of results using Tesseract alone and using Cube together with Tesseract, without any pre-processing, is illustrated below.
2.5
Android
Android is a Linux-based operating system designed primarily for touchscreen mobile devices such as smartphones and tablet computers. It was initially developed by Android, Inc., which Google financially backed and later purchased in 2005 [19]. Android was unveiled in 2007 along with the founding of the Open Handset Alliance: a consortium of hardware, software, and telecommunication companies devoted to advancing open standards for mobile devices. The first Android-powered phone was sold in October 2008 [20].
Android is open source and Google releases the code under the Apache License. This open source code and permissive licensing allow the software to be freely modified and distributed by device manufacturers, wireless carriers and enthusiast developers. Additionally, Android has a large community of developers writing applications (apps) that extend the functionality of devices, written primarily in a customized version of the Java programming language. In October 2012, there were approximately 700,000 apps available for Android, and the estimated number of applications downloaded from Google Play, Android's primary app store, was 25 billion [21].
These factors have allowed Android to become the world's most widely used smartphone platform and the software of choice for technology companies that require a low-cost, customizable, lightweight operating system for high-tech devices without developing one from scratch [22]. As a result, despite being primarily designed for phones and tablets, it has seen additional applications on televisions, games consoles and other electronics. Android's open nature has further encouraged a large community of developers and enthusiasts to use the open source code as a foundation for community-driven projects, which add new features for advanced users or bring Android to devices which were officially released running other operating systems [23].
The open source nature of the Android platform, and thereby the culture of the community, makes Android a great choice for use with the Tesseract OCR Engine. Using Android's Java Native Interface (JNI) and Native Development Kit (NDK), the C++ Tesseract code can be built as a native library and exposed to the managed Java code that Android recognises; this makes the port a manageable task and allows the full functionality of Tesseract to be explored with the immense added advantage of the Android mobile operating system. In addition, development and deployment of applications on Android, and consequently on the Google Play store, is a smooth process. Thus, taking these various factors into consideration, the author decided to specialise his project, Scope, for the Android platform.
2.6
2.6.1
AForge.NET
AForge.NET is a C# framework designed for developers and researchers in the fields of Image Processing, Computer Vision, and Artificial Intelligence. AForge.Imaging, which is the biggest library of the framework, contains different image processing routines aimed at image enhancement and at the processing required by various computer vision tasks.
The library consists of a wide array of filters to perform various colour correction, convolution, binarization
and thresholding operations. In addition to these, AForge also offers methods to perform edge detection and
feature extraction with Hough Transform analysis[24].
The functions are extensively documented and online help in the form of user forums and sample code
snippets is readily available. The libraries are constantly updated with new versions being frequently released.
2.6.2
OpenCV
OpenCV (Open Source Computer Vision Library) is an open source C/C++ library for image processing and
computer vision developed by Intel. It is a library of programming functions mainly aimed at real time image
processing. It is free for both commercial and non-commercial use[25].
OpenCV was originally written in C but now has a full C++ interface and all new development is in C++.
There is also a full Python interface to the library. Recently the OpenCV4Android SDK was developed and
released to enable using OpenCV functionality in Android applications.
OpenCV offers a comprehensive collection of image processing capabilities that surpasses that offered by AForge. OpenCV, in addition to supporting various image filters, transformations and thresholding mechanisms, presents us with the ability to identify, compare and manipulate histograms in order to perform intelligent and automated processing of images.
The most important feature of OpenCV is that it allows complex matrix operations to be performed on images. This gives developers dealing with image processing a greater understanding of, and finer control over, what is being done to the image data.
2.6.3
MATLAB
The Image Processing Toolbox that is included with MathWorks' MATLAB provides a comprehensive set of reference-standard algorithms, functions, and applications for image processing, analysis, visualization, and algorithm development.
Operations that can be performed include image enhancement, image de-blurring, feature detection, noise reduction, image segmentation, geometric transformations, and image registration. Many toolbox functions are multithreaded to take advantage of multicore and multiprocessor computers [26].
MATLAB is a high-level scripting language, meaning that it takes care of lower-level programming issues such as declaring variables and performing memory management without the user having to worry about them. This essentially makes MATLAB an easier language to become familiar with quickly and allows the user to piece together a small amount of programming code to prototype an image processing operation [27].
2.6.4
Comparison of Libraries
AForge and MATLAB are more generic processing libraries that cater to a variety of requirements whereas
OpenCV was built with the main focus on image manipulation. Hence its code is highly optimised for this
purpose. It provides basic data structures for matrix operations and image processing and offers more extensive
functions when compared to the other two libraries.
Since AForge and MATLAB are built on higher-level languages (C# and Java respectively), which are in turn built on C, memory management and other lower-level programming issues are taken care of automatically. However, it also means that the processor is kept busier interpreting the higher-level language, turning it into lower-level C code and finally executing that code [28].
OpenCV, however, is essentially a library of functions written in C, which means it is closer to providing machine-level code for the computer to execute. Ultimately, more of the computer's processing cycles go into image processing and fewer into interpreting. As a result, programs written with OpenCV run much faster than similar programs written in MATLAB or AForge.
Moreover, OpenCV is available for use on multiple mobile platforms such as iOS [29], Android and Windows. This is a crucial factor in choosing the library, as the development of the Scope application is intended to be based on the Android platform. MATLAB does not at present provide an SDK for Android development, whereas AForge would require an external framework such as Mono for it to be ported onto Android [30].
Thus, given all these issues, it was decided that the Scope application would incorporate the OpenCV library for its image processing operations, primarily due to its speed and efficiency of processing and its cross-platform compatibility, which allows for easier expansion of the application in the future.
2.7
Current Alternatives
Such an implementation of OCR technology isn't the first to show up on the Android market, and listed below are the closest competitors to the Scope application.
2.7.1
Google Goggles
Google dominates the Android market in general, and it's no different when it comes to OCR applications.
Google Goggles [31] is a visual search app that allows users to take pictures of items that they want to obtain more information about. Among the various types of input that Goggles accepts, business cards are one. The application captures the text areas in the image of the business card and sends them to Google's OCR engine in the cloud. The parsed information is then pushed back to the phone, which recognises the information as a contact and shows the user relevant contextual menus.
One of the main drawbacks of the app is that it requires an active internet connection to be able to carry out any OCR processing. This means that the results churned out by the application are subject to the delays and lags faced in sending the photo, processing the photo, compiling the results and pushing out the results over the user's cellular network service.
2.7.2
OCR Test
OCR Test is an experimental app that attempts to harness the power of Tesseract and use it for OCR on Android [32]. This app runs the Tesseract engine on the user's device, without uploading images to a server, and is suitable for recognizing individual words or short phrases of text. The app also offers translation of the recognised text, powered by Google/Bing Translate.
The default single-shot capture runs OCR on a snapshot image that is captured when the user clicks the shutter button, like a regular photo. The application also offers a continuous preview option while taking a picture, in which it shows a dynamic, real-time display of what the device is recognizing right beside the camera viewfinder. An on-screen resizable viewfinder box allows the user to focus on one word or phrase at a time, and the recognised text is displayed in the top right corner of the window. The continuous preview mode works best on a fast device.
While the application is a decent attempt at using the Tesseract engine to carry out the processing, its accuracy is hampered by Tesseract's limitations (see Tesseract Shortcomings), making the operation of this app a hit-or-miss scenario, where the best results are produced only when the ideal image conditions are met.
2.7.3
ABBYY Business Card Reader
By far, the most competitive solution in the Android marketplace currently is the ABBYY Business Card Reader [33]. Developed by the Russian company ABBYY, the application's built-in optical character recognition allows the user to quickly receive precise results. The application also supports a database of 20 different languages, which is used to translate business cards in various languages to English. Although the application works using its built-in OCR engine, connecting to the network is required to authorise licenses and actually use the app. This adds a slight inconvenience for users who aren't connected through their phones [34].
Added functionalities, like searching for more information on social networks, also depend on network connectivity. The application, however, fails to recognise words accurately in a number of scenarios, for instance if the background is black and the text is white. Special symbols (@, #, $, etc.) are another issue. The application circumvents this problem by highlighting the characters it is unsure of and providing alternatives for the user to choose from.
Part II.
Technical Details
3
Project Decision
3.1
Understanding the Tesseract OCR Engine
The most important task to start with was to understand the heart of this entire project, the OCR engine developed by HP and now by Google Labs: Tesseract. It has been voted one of the 13 best OCR engines in existence and is considered one of the most accurate free-software OCR engines currently available.
The most recent change is that Tesseract can now recognize 60 languages, is fully UTF-8 capable, and is fully trainable. This makes Tesseract an extremely powerful free tool to use in this project.
Tesseract 3.02 is the latest version of the engine, with a whole new set of supported languages and revolutionary features that allow it to recognize text from any angle. With this, we feel it's a good starting point to make this project as accurate as possible.
In Java, the Tesseract engine is encapsulated by the TessBaseAPI class. Below I have outlined some of the public methods and their functionalities, so as to understand the public scope of the Tesseract engine in Java.
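As a hedged sketch of how this class is typically driven (using the tess-two wrapper; the data path and language-pack location are assumptions, not the exact code used in Scope):

import com.googlecode.tesseract.android.TessBaseAPI;
import android.graphics.Bitmap;

public class OcrHelper {
    // dataPath must contain a "tessdata" folder with eng.traineddata inside it
    public static String recognize(String dataPath, Bitmap cardImage) {
        TessBaseAPI tess = new TessBaseAPI();
        tess.init(dataPath, "eng");        // load the English language data
        tess.setImage(cardImage);          // hand the captured bitmap to the engine
        String text = tess.getUTF8Text();  // run recognition and fetch the result
        tess.end();                        // release the native resources
        return text;
    }
}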
3.2
The first step was to try building the Tesseract engine on Windows and replicating simple code that could allow images to be converted into text. The team planned to port it into a complete Windows system, including Windows Phone and Windows 8. When working on the WPF (Windows Presentation Foundation) application, the results were good: the author used the .NET wrapper for Tesseract, created a simple project and managed to convert clear text into letters. However, there were memory issues at times, which indicated errors with the wrapper that was being used.
The next step was trying to port this to Windows Phone. After building several libraries, the author discovered a fundamental flaw: Windows Phone does not accept code in C++ or C, only C#. Since Tesseract is written completely in C++ and C, it could never be run on the current version of Windows Phone (7.5). However, Microsoft has promised C and C++ support in Windows Phone 8, which was released to the public on 29th October 2012.
Therefore, the team would have to move to a platform that accepted native code and allowed the Tesseract engine to be used.
3.3
Moving to Android
Moving to iOS was out of the question: there was no way the team could get it to work with all the restrictions Apple Inc. has over the platform. In addition, the prerequisite of having a Mac to code for iOS made this decision all the easier. Thus the resultant move was to port to the Android platform.
The biggest challenge here was getting used to the Java programming language, which Android uses. In addition, since Tesseract is written in C++, the team would have to use one additional layer to expose the C++ code to managed Java code. This layer is the JNI, and the native code is built using the Android NDK.
As seen from the figure above, the NDK compiles the C++ code into a native library, which is then accessed from Java through the JNI. This was essential for Tesseract. To make this work, the author also had to use Apache Ant, a Java build tool that can combine the JNI layer and the Java code and build a final application compatible with Android and other Java projects.
Once this was ready, there was a working Tesseract engine library on Android. The remaining task was simply to make it work in the final application.
3.4
Application Workflow
Each team member was required to research and create relevant algorithms to accomplish their respective sections. The author of this thesis took on the following tasks:
Creation of layouts and managing integration within the Android App
Background segmentation
Text segmentation
Segment clustering
Image and memory management in app
Integrating Tesseract with Android and optimizing performance
The rest of this report logically analyses and covers these various tasks in due detail. The final section of
the report also covers the results of this app after the entire workflow has been implemented and run together
for images.
4
Image Processing
The first task for the project was the introduction of image processing filters to the system. In a system where the accuracy of the result depends highly on the quality of the image and how clean it is, it is important to perform some pre-processing on the image before passing it into the Tesseract engine. To do this in a flexible and comprehensive manner, a set of image processing libraries had to be compiled together in a fashion simple enough to be accessed by the system and, later, by the automation algorithm. This task was undertaken by the author and a fellow team-mate, and the tasks were divided equally between them. This section describes the filters implemented by the author and the rationale behind implementing each one of the filters mentioned.
4.1
Brightness
A general image processing operator is a function that takes one or more input images and produces an output image. In this kind of image processing transform, each output pixel's value depends only on the corresponding input pixel value (plus, potentially, some globally collected information or parameters) [36].
One commonly used point process is addition with a constant:

g(i, j) = f(i, j) + \beta    (1)

In equation (1), β is the bias parameter, which controls the brightness of the image; g is the output image matrix, f is the input image matrix, and i and j refer to the row and column number respectively.
Brightness adjustment is a basic functionality of any image processing class, and a well-brightened image helps enhance the results; the author therefore chose this as one of the implementations.
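A minimal sketch of equation (1) using OpenCV's Java bindings is given below; the use of convertTo and the parameter values are an assumption for illustration, not the exact code used in Scope.

import org.opencv.core.Mat;

public class BrightnessFilter {
    // beta > 0 brightens the image, beta < 0 darkens it (equation 1)
    public static Mat apply(Mat input, double beta) {
        Mat output = new Mat();
        // alpha = 1 keeps the contrast unchanged; beta is the additive bias
        input.convertTo(output, -1, 1.0, beta);
        return output;
    }
}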
4.2
Smoothing Filters
Smoothing, also called blurring, is a simple and frequently used image processing operation. There are many reasons for smoothing. For Scope, the main purpose of smoothing is to reduce noise in a picture and thereby ensure smoothness across it. This naturally leads to better OCR results, due to the reduced variation between neighbouring pixels in the image.
To perform a smoothing operation, we apply a filter to our image. The most common type of filter is linear, in which an output pixel's value is determined as a weighted sum of input pixel values:

g(i, j) = \sum_{k,l} f(i + k, j + l) \, h(k, l)    (2)

In equation (2), h(k, l) is the filter kernel, which is nothing more than the coefficients of the filter, and i and j are the row and column indices respectively.
4.2.1
Homogeneous Filters
This filter is the simplest of all. Each output pixel is the mean of its kernel neighbours (all of them contribute with equal weights) [37]. This results in a simple matrix kernel, which looks like:

K = \frac{1}{\mathrm{width} \cdot \mathrm{height}}
    \begin{bmatrix}
    1 & 1 & \cdots & 1 \\
    1 & 1 & \cdots & 1 \\
    \vdots & \vdots & \ddots & \vdots \\
    1 & 1 & \cdots & 1
    \end{bmatrix}    (3)
4.2.2
Gaussian Filters
Gaussian filtering is done by convolving each point in the input array with a Gaussian kernel and then summing them all to produce the output array. To understand what a Gaussian kernel is like, we can imagine a 1-D image, where the kernel takes the shape of a Gaussian curve. This is referenced in figure 15.
The weight of each neighbour decreases as the spatial distance between it and the centre pixel increases [38]. The concept is the same for a Gaussian kernel in 2-D, representing a 2-D curve that rises towards the centre of the 2-D space. The equation for calculating a Gaussian kernel therefore follows the normal distribution, equation (4).

G_0(x, y) = A e^{-\left( \frac{(x - \mu_x)^2}{2\sigma_x^2} + \frac{(y - \mu_y)^2}{2\sigma_y^2} \right)}    (4)

In equation (4), μ is the mean and σ² is the variance, for each of the variables x and y. An applied Gaussian filter is shown in figure 16.
4.2.3
Median Filter
The median filter runs through each element of the signal (in this case the image) and replaces each pixel with
the median of its neighbouring pixels (located in a square neighbourhood around the evaluated pixel). Median
filtering is very widely used in digital image processing because, under certain conditions, it preserves edges
while removing noise [39]. The result of a Median filter is shown in figure 17.
4.2.4
Bilateral Filter
Sometimes filters not only dissolve the noise, but also smooth away the edges. To avoid this (to a certain extent at least), we can use a bilateral filter. In an analogous way to the Gaussian filter, the bilateral filter also considers the neighbouring pixels with weights assigned to each of them. These weights have two components, the first of which is the same weighting used by the Gaussian filter. The second component takes into account the difference in intensity between the neighbouring pixels and the evaluated one [40].
The basic idea underlying bilateral filtering is to do in the range of an image what traditional filters do in its domain. Two pixels can be close to one another, that is, occupy nearby spatial locations, or they can be similar to one another, that is, have nearby values, possibly in a perceptually meaningful fashion. The filter replaces the pixel value at x with an average of similar and nearby pixel values. In smooth regions, pixel values in a small neighbourhood are similar to each other, and the bilateral filter acts essentially as a standard domain filter, averaging away the small, weakly correlated differences between pixel values caused by noise. The result of a bilateral filter is shown in figure 18.
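The four smoothing filters discussed in this section correspond to standard OpenCV operations; the sketch below uses OpenCV's Java bindings, and the kernel sizes and sigma values are illustrative assumptions rather than the values used in Scope.

import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public class SmoothingFilters {
    public static Mat homogeneous(Mat src) {
        Mat dst = new Mat();
        Imgproc.blur(src, dst, new Size(5, 5));            // equal-weight box kernel, equation (3)
        return dst;
    }

    public static Mat gaussian(Mat src) {
        Mat dst = new Mat();
        Imgproc.GaussianBlur(src, dst, new Size(5, 5), 0); // weights follow the 2-D Gaussian of equation (4)
        return dst;
    }

    public static Mat median(Mat src) {
        Mat dst = new Mat();
        Imgproc.medianBlur(src, dst, 5);                   // replaces each pixel with the neighbourhood median
        return dst;
    }

    public static Mat bilateral(Mat src) {
        Mat dst = new Mat();
        Imgproc.bilateralFilter(src, dst, 9, 75, 75);      // combines spatial and intensity weights
        return dst;
    }
}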
4.3
Combining Filters
Concurrently, while the author was in the process of developing the filters mentioned in the previous section, a fellow team-mate was implementing the second half of the image processing filters for the team's self-created image processing library. The other functions implemented are listed as follows:
Contrast
Greyscale
Histogram Equaliser
Morphing filters
Pyramids
Thresholding Filter
The team agreed that this large collection of filters should serve their initial tests well enough. There was
now a need to find a way to test all functions independently and together.
4.4
ScopeMate
ScopeMate was a separate app developed for the purpose of testing the functionality of Scope. It was developed so that the image processing filters could be tested dynamically and immediately and the OCR results retrieved. The author was solely responsible for creating this test app and for combining the image processing library with the Tesseract engine. In addition, the order in which the filters are applied also affects the final result, so ScopeMate was given internal logic to record the order in which the tester applies the filters. This order and the associated values are recorded and applied to an image, showing the visual result as well as the results from the testing process.
ScopeMate was to be used primarily by the team's tester (a fellow team-mate), and the results calibrated with it are explained in that team-mate's report.
5
Image Segmentation
Image segmentation is the process of subdividing a digital image into multiple segments or sub-images. The
goal of segmentation is to streamline the representation of an image into something that is more meaningful and
easier to analyse. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in
images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such
that pixels with the same label share certain visual characteristics [41].
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of
contours extracted from the image. Each of the pixels in a region is similar with respect to some characteristic
or calculated property, such as colour, intensity, or the presence of some common trait[42].
In this project, the author performed two types of image segmentation:
Background Colour Segmentation
Text Segmentation
The relevance, objectives, implementation and results of the above methods of segmentation are given in the sections that follow. It must be noted that, due to the device's resource constraints, no existing algorithms were used to perform these operations; it was a complete research effort from the ground up. The workflow for the image segmentation used in the app is given in figure 20.
5.1
Background Colour Segmentation
Background colour segmentation is the process of segmenting an image based on the trait of background colour. In business cards, there can be many cases of varying background colours, and it is important to segment by background colour before performing text segmentation. Usually each background region contains a certain number of text segments, contained within that background colour. In this section, the author will justify the necessity for background segmentation, describe the algorithms involved, provide a thorough analysis of the results and explore future uses of and improvements to the idea.
5.1.1
The Tesseract OCR Engine learns an entire image before it performs the OCR. For this purpose, the more concentrated the image is, in terms of several features, the better the recognition process by the OCR. One purpose of background segmentation is to provide this contextual focus to the application. Using background segmentation, the card is divided by background colour, and this provides a more concentrated scope of learning for the OCR engine.
The second objective of background colour segmentation is to separate the business card into its differently coloured backgrounds, because the adaptive thresholding mechanism used for cards with dark backgrounds is different from that used for light-coloured backgrounds. The preliminary identification of the different parts of the card is performed by the background colour segmentation algorithm. By splitting the card into its different background colours, the strategy is to create a stronger, more customised adaptive thresholding for each background colour, and thereby create more accurate results.
5.1.2
Alternatives Explored
Histogram Based: Histogram-based background segmentation involves the utilisation of an image histogram to detect the primary intensities of the image. By means of contour analysis, combined with the results of the histogram, the user is able to make a reasonable guess as to the nature of the backgrounds in the card. Text is usually a very small representation on the histogram, so by eliminating all histogram peaks that fall below a set threshold value, the program is able to identify the types of backgrounds and, by using contour analysis and image masking, accurately point out the backgrounds of the image. The problem with using a mere histogram-based approach is that it fails to work for any backgrounds with gradients [43]. Gradients occupy a large number of peaks in the histogram even though they may technically come from a single background. To overcome this, this method has to be combined with Canny edge detection.
Canny Edge Detection: The Canny edge detector produces as output an image showing the positions of tracked intensity discontinuities. The Canny algorithm performs various operations. Firstly, since the image is susceptible to noise present in the image data, a filter is used in which the raw image is convolved with a Gaussian filter. The result is a slightly blurred version of the original which is not affected by a single noisy pixel to any significant degree. The edge detection operator then returns a value for the first derivative in the horizontal (Gx) and vertical (Gy) directions. From this, the edge gradient magnitude and direction can be found. The direction is rounded to one of four angles (0, 45, 90 and 135 degrees) representing vertical, horizontal and diagonal directions. Given estimates of the image gradients, a search is then carried out to determine if the gradient magnitude assumes a local maximum in the gradient direction, a process known as non-maximal suppression [44]. The tracking process exhibits hysteresis controlled by two thresholds, T1 and T2, with T1 greater than T2. Tracking can only begin at a point on a ridge higher than T1. Tracking then continues in both directions out from that point until the height of the ridge falls below T2. This hysteresis helps to ensure that noisy edges are not broken up into multiple edge fragments. Thus the Canny detector is great for identifying squares with gradient shading in the image. By using this Canny detector, we are able to find edges of the card with similar gradients. This helps split the image into various edges. By applying histogram analysis to each bounded contour, we are able to identify that contour's primary background colour and apply our masking technique as before to accurately identify the different segments in the card. By this method, the combination of Canny detection and histograms yields a more accurate result. Unfortunately, this is a more expensive and time-consuming process, primarily because of the addition of Canny detection. As the mobile phone is a limited-resource environment, usage of this technique must be carefully justified. The concepts for both the above algorithms were developed by the author of this thesis.
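To make the edge-detection step concrete, the following sketch (using OpenCV's Java bindings) shows one way Canny edges and their contours could be extracted from a grayscaled card image; the blur kernel and the hysteresis thresholds are illustrative assumptions rather than the project's actual parameters.

import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

public class EdgeSegmentation {
    // gray: a single-channel (grayscale) image of the card
    public static List<MatOfPoint> findEdgeContours(Mat gray) {
        Mat blurred = new Mat();
        // Suppress single-pixel noise before edge detection
        Imgproc.GaussianBlur(gray, blurred, new Size(5, 5), 0);
        Mat edges = new Mat();
        // Hysteresis thresholds: tracking starts above 150 and stops below 50
        Imgproc.Canny(blurred, edges, 50, 150);
        // Each closed edge region becomes a candidate background segment
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(edges, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
        return contours;
    }
}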
Background Removal: The third option for this section would be the removal of the background altogether. This is done by detecting the backgrounds using the methods above and applying a reverse mask, so as to preserve the foreground and hide the background. Though this would work quite well, it turns out to be a problem for cards with darker backgrounds, as the text left behind is usually white or close to white, and it would involve yet another process to fill in the text itself. Again, even though this is possible, the expensive nature of the entire process must be considered in a limited-resource environment like the Android OS for smartphones, and this leads one to believe that this method may not be suitable.
5.1.3
The author decided to choose the histogram-based background segmentation. The rationale behind this was that the phone has limited resources, and this was still the pre-processing stage of the application. There were still adaptive thresholding and text segmentation to go before the OCR engine was activated, and all the following processes would need more memory and processing power, especially the Tesseract OCR engine. In light of this, the background segmentation approach that consumed the least resources would be the most ideal. In addition, the author hypothesized that there aren't very many business cards with gradient backgrounds. Professional organisations usually do not use linear gradients in business cards, as it gives off an air of unprofessionalism, and with this assumption, using the histogram-based background segmentation method makes the most sense. A workflow for this algorithm is given in figure 21.
To highlight the algorithm, the author uses the example of the NUS card given in figure 22.
The first step in the process is grayscaling. With the help of our custom-built library on top of OpenCV, this is made very easy and results in a greyed-out version of figure 22. The next step is harder, requiring a histogram analysis. To calculate the histogram, we use the OpenCV library again.
Histograms are collected counts of data organized into a set of predefined bins. Since it is known that the range of intensity values in this case is 256 values, we can segment our range into subparts (called bins).
Figure 24: Image masking applied to histogram bin from 112 to 127
Figure 25: Image masking applied to histogram bin from 240 to 255
After the masking, we have removed all the unnecessary parts of the image and converted them to black. The largest
remaining white area will now be the desired background. To identify this, a contour analysis will provide a set of contours, and the largest contour will be the large white background area visually recognisable after the masking. The contour analysis for both images is given in figures 26 and 27. With this, it is clearly evident that the masking has definitely helped identify the correct backgrounds of the card.
Figure 26: Contour analysis applied to histogram bin from 112 to 127
Figure 27: Contour analysis applied to histogram bin from 240 to 255
With this, the correct background colours and segments have been identified. These can now be separated into sub-images using OpenCV's Region of Interest functionality. More about this is given in the Region of Interest section below. The results and testing sections of the algorithm have been covered in Part III of this thesis.
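A compact sketch of this per-bin masking and contour step, using OpenCV's Java bindings, is given below; the helper names and the use of inRange for the bin mask are assumptions made for illustration and do not reproduce the exact Scope implementation.

import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

public class BackgroundSegmenter {
    // Returns the bounding rectangle of the largest region whose grey level
    // falls inside one histogram bin of the grayscaled card image.
    public static Rect largestRegionInBin(Mat gray, int binStart, int binEnd) {
        // Keep only pixels whose intensity lies in [binStart, binEnd]; the rest become black
        Mat mask = new Mat();
        Core.inRange(gray, new Scalar(binStart), new Scalar(binEnd), mask);
        // The largest white contour in the mask is taken as that bin's background segment
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(mask, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
        Rect best = null;
        double bestArea = 0;
        for (MatOfPoint c : contours) {
            Rect r = Imgproc.boundingRect(c);
            if (r.area() > bestArea) { bestArea = r.area(); best = r; }
        }
        return best;
    }

    // The identified segment can then be cut out as a Region of Interest
    public static Mat extractRoi(Mat image, Rect region) {
        return new Mat(image, region); // an OpenCV ROI is a view into the parent image
    }
}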
5.2
Text Segmentation
Text segmentation is the second kind of image segmentation that has been performed in this project. Text
segmentation is the method of locating, identifying, and separating the different closely located text areas in an
image. Each segment is then separately analysed by the OCR engine for maximal results.
5.2.1
The main objective of text segmentation is to provide a simple, concentrated view of just text for the OCR
engine to analyse. It identifies the different parts of the cards containing text and, using the clustering algorithm
explained in the next section, compiles together the different segments in a visually understandable format. As
the OCR engine learns the image before processing results, a smaller area with just text has a higher chance
of success. Text segmentation also acts as an excellent follow-up to adaptive thresholding. For some cards, adaptive thresholding could leave unwarranted pixels around the thresholded image. Text segmentation boxes in a very concentrated area, leaving these dirty pixels out of the final OCR input and therefore rendering a more accurate result.
5.2.2
Alternatives Explored
Erosion-Clustering: The erosion-clustering method of text segmentation applies erosion-based pre-processing techniques to the image to blur together similar neighbouring regions. The idea is to blur all the text areas together closely enough that they can be identified clearly in a condition-based contour analysis. This approach is inexpensive and works quite well. However, too many dirty pixels can cause a mistaken assessment of the image, and the text segments identified could be wrong. Erosion-clustering was developed by the author of this thesis.
Spectral Clustering: Spectral clustering treats the image like a spectrum, identifying different regions of the image by representing it as a matrix and using vector theory. By constructing a weighted graph and an affinity matrix, the spectral clustering method computes the diagonal of the matrix and combines it with the eigenvectors of the same to create a new spectral vector, which is then used to partition the image [45]. Though this method is powerful, it is still fairly fresh and has not yet seen wide popularity in recent years. Spectral clustering was developed by students at the Harbin Institute of Technology in China.
Normalized Cut Segmentation: There are also segmentation methods that use pure thresholding, analysing what threshold level is required for text segmentation. NCut, or Normalized Cut segmentation, determines a normalised threshold value for text in the image by treating image segmentation as a graph partitioning problem; the resulting threshold can then be used, much like Otsu's threshold, over an image to separate the text [46]. The problem is that this does not work very well for images with coloured text and coloured backgrounds, and since most business cards are in colour, this alternative could be a problem for this project.
5.2.3
The chosen method for performing text segmentation was erosion-clustering, which was developed by the author
of the thesis. Besides the innovative appeal, erosion-clustering is also inexpensive in this project as it uses many
readily available OpenCV methods to enhance the image well enough to produce a reasonable result for text
segmentation. The workflow used in this text segmentation method by the author is given in figure 28.
The first step is to grayscale the image. This is done using OpenCV's grayscaling function, which provides a greyed-out version of the image in figure 29. Following that, a median blur is applied over the entire card to blur out any dirt or dust particles present in the image. The median blur has been explained in the image processing section above. The image after the median blur is given in figure 30.
Next, a general adaptive threshold is applied over the whole image, so that the whites of the image are inverted against the outlines of the text. It also further helps remove any dirt pixels from the image. The image now looks as in figure 31.
Following this, the card is run through a strong dilate function. The dilate function applies a strong blur over the entire image so that the text is no longer visually readable. As the goal of the whole process is to obtain blurred lumps of what was text, this step is the most useful in the process so far.
The next step is to apply a relatively weak erosion with a large kernel. The reason for doing this is to re-create spaces in images that have been over-dilated by the previous step. It does not affect images that have dilated correctly, as this one has; thus the result in figure 33 is not very different from the previous image. For some other cards, it plays a bigger role and can be the differentiating factor for perfect text segmentation.
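The chain of steps described so far can be sketched as follows; the kernel sizes and threshold parameters are illustrative placeholders rather than the exact values tuned in the app.
// Illustrative sketch. Uses org.opencv.core.* and org.opencv.imgproc.Imgproc.
Mat gray = new Mat();
Imgproc.cvtColor(cardMat, gray, Imgproc.COLOR_RGBA2GRAY);          // 1. grayscale

Mat blurred = new Mat();
Imgproc.medianBlur(gray, blurred, 5);                              // 2. median blur removes dust specks

Mat bin = new Mat();
Imgproc.adaptiveThreshold(blurred, bin, 255,                       // 3. adaptive threshold, inverted so
        Imgproc.ADAPTIVE_THRESH_MEAN_C,                            //    text outlines become white
        Imgproc.THRESH_BINARY_INV, 15, 4);

Mat strongKernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(15, 15));
Imgproc.dilate(bin, bin, strongKernel);                            // 4. strong dilate: text smears into lumps

Mat largeKernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(21, 21));
Imgproc.erode(bin, bin, largeKernel);                              // 5. single erosion pass with a large
                                                                   //    kernel re-opens over-dilated areas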
Following this step, the image is run through a contour analysis. All contours found are bounded by rectangles. This generates many rough contours that overlap with one another, as seen in figure 34.
The next step is to clean the image. A cleansing pass is applied throughout the image to remove contours that may span the entire card and contours that may just be dust particles. This is done by dynamically checking the largest contours available and creating thresholds for them against the total area of the image. By applying this cleaner and also removing any contours nested within other contours, the image generated is as shown in figure 35.
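Continuing the sketch above, the contour and cleaning step can be illustrated as follows; the area thresholds stand in for the dynamically derived limits described in this section.
// Illustrative sketch. Uses org.opencv.core.*, org.opencv.imgproc.Imgproc and java.util lists.
List<MatOfPoint> contours = new ArrayList<>();
Imgproc.findContours(bin, contours, new Mat(),
        Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);

double cardArea = cardMat.cols() * cardMat.rows();
List<Rect> boxes = new ArrayList<>();
for (MatOfPoint contour : contours) {
    Rect box = Imgproc.boundingRect(contour);             // bound every contour by a rectangle
    double ratio = box.area() / cardArea;
    if (ratio > 0.9 || ratio < 0.0005) {
        continue;                                         // drop near-whole-card and dust-sized boxes
    }
    boxes.add(box);
}

List<Rect> segments = new ArrayList<>();                  // remove contours nested inside other contours
for (Rect candidate : boxes) {
    boolean nested = false;
    for (Rect other : boxes) {
        if (other != candidate && other.contains(candidate.tl()) && other.contains(candidate.br())) {
            nested = true;
            break;
        }
    }
    if (!nested) {
        segments.add(candidate);
    }
}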
The text segments have now been identified successfully, and the text segmentation process is done. However, while the card visually contains 6 segments, figure 35 clearly shows more than 6 identified segments. The next section of this report deals with the algorithm used to cluster all the neighbouring text segments together.
5.3
Clustering
Clustering is the process of radially collecting information about the closest objects and merging them together. In this project, clustering is directly relevant to text segmentation: by using the clustering algorithm developed by the author of this thesis, the text segmentation process is completed and reaches the desired results.
5.3.1
Objectives of Clustering
The clustering algorithm comes into play in the scenario shown in figure 36. Clustering behaves as a feedback loop: if any segments overlap, they are sent through another pass of the clustering algorithm to be merged together. This is discussed further in the future improvements section below.
5.3.2
Future Improvements
A future improvement to the clustering algorithm could be to use a recursive check. This would mean that, instead of a feedback loop, a single recursive pass checks for predicted overlaps and clusters the relevant portions together.
5.3.3
Clustering Algorithm
The clustering algorithm works on a distance-check mechanism and populates a list of queues according to which cluster each rectangle belongs to. For every corner of every rectangle in the image, the distances to the other rectangles are computed and compared. If a distance is shorter than a specified minimum distance, the clustering algorithm adds the rectangle to the relevant clustering queue. The flowchart given in figure 38 describes the workings of this algorithm.
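A minimal sketch of this corner-distance clustering is given below. For simplicity it merges a cluster into a single bounding rectangle instead of maintaining the queue structure described above, and the 5% of the card diagonal used as the minimum distance follows the value quoted in the results chapter.
// Illustrative sketch. Uses org.opencv.core.Rect, org.opencv.core.Point and java.util lists.
static Point[] corners(Rect r) {
    return new Point[] { r.tl(), new Point(r.x + r.width, r.y),
                         new Point(r.x, r.y + r.height), r.br() };
}

static double cornerDistance(Rect a, Rect b) {
    double best = Double.MAX_VALUE;                       // shortest corner-to-corner distance
    for (Point p : corners(a)) {
        for (Point q : corners(b)) {
            best = Math.min(best, Math.hypot(p.x - q.x, p.y - q.y));
        }
    }
    return best;
}

static List<Rect> cluster(List<Rect> segments, double minDist) {
    List<Rect> clusters = new ArrayList<>(segments);
    boolean merged = true;
    while (merged) {                                      // feedback loop: repeat until nothing merges
        merged = false;
        outer:
        for (int i = 0; i < clusters.size(); i++) {
            for (int j = i + 1; j < clusters.size(); j++) {
                if (cornerDistance(clusters.get(i), clusters.get(j)) < minDist) {
                    Rect a = clusters.get(i);
                    Rect b = clusters.remove(j);          // merge j into i as one bounding rectangle
                    int x = Math.min(a.x, b.x);
                    int y = Math.min(a.y, b.y);
                    int x2 = Math.max(a.x + a.width, b.x + b.width);
                    int y2 = Math.max(a.y + a.height, b.y + b.height);
                    clusters.set(i, new Rect(x, y, x2 - x, y2 - y));
                    merged = true;
                    break outer;
                }
            }
        }
    }
    return clusters;
}

// Example usage: minimum distance taken as 5% of the card diagonal.
// double minDist = 0.05 * Math.hypot(cardMat.cols(), cardMat.rows());
// List<Rect> textSegments = cluster(segments, minDist);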
When clustering is applied to the card shown in figure 35, the text segmentation can be completed successfully. After applying the clustering algorithm, the result is as in figure 39: the 6 segments that were predicted visually are obtained and can now be separated into sub-images using the Region of Interest.
5.4
Region of Interest
A region of interest (ROI) is a sub-matrix extracted from within a larger matrix. In segmentation this is extremely important, as the card is being split into segments. Almost all OpenCV functions support working with an ROI and operate only on the selected image area, which helps to speed up the algorithms. Thus, if only a specific area is needed, it can be extracted and worked on without affecting the whole image.
To use ROI in Android with OpenCV, the following simple code snippet can be used.
// Create a region of interest and save it as a separate bitmap.
// performCrop() is a helper in the app, assumed here to wrap Mat.submat() to
// extract the sub-matrix defined by (x, y, width, height).
Mat cropped = performCrop(x, y, width, height, sourceImageMat);
destImage = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
Utils.matToBitmap(cropped.clone(), destImage);
6
6.1
There are a number of reasons why loading bitmaps in Android applications is tricky. Some of them are:
- Mobile devices typically have constrained system resources. Android devices can have as little as 16MB of memory available to a single application.
- Bitmaps take up a lot of memory, especially for rich images like photographs. If the bitmap configuration used is ARGB_8888 (the default from Android 2.3 onward), then loading a 2592 x 1936 photograph into memory takes about 19MB (2592 * 1936 * 4 bytes), immediately exhausting the per-app limit on some devices.
- Android app UIs frequently require several bitmaps to be loaded at once. Components such as ListView, GridView and ViewPager commonly include multiple bitmaps on-screen at once, with many more potentially off-screen, ready to show at the flick of a finger.
Since memory is very limited, a lower-resolution version should ideally be loaded into memory [47]. The lower-resolution version should match the size of the UI component that displays it. An image with a higher resolution provides no visible benefit, yet still takes up precious memory and incurs additional performance overhead due to extra on-the-fly scaling. If memory is not managed correctly, the virtual machine heap is overloaded and the following dreaded message crashes the entire app:
java.lang.OutOfMemoryError: bitmap size exceeds VM budget
To handle this on the fly across the varying types of Android phones, the author has implemented a component called the Bitmap Handler. In creating the handler, the factors considered were:
- The estimated memory usage of loading the full image into memory.
- The amount of memory the app is willing to commit to loading this image, given its other memory requirements.
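A minimal sketch of the kind of down-sampling decision such a handler can make is shown below, following the standard BitmapFactory.Options pattern from the Android documentation [47]; the method name and the simple fitting rule are illustrative, not the app's actual Bitmap Handler API.
// Illustrative sketch. Uses android.graphics.Bitmap and android.graphics.BitmapFactory.
public static Bitmap decodeSampledBitmap(String path, int reqWidth, int reqHeight) {
    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inJustDecodeBounds = true;               // read only the dimensions, allocate no pixels
    BitmapFactory.decodeFile(path, options);

    int inSampleSize = 1;                            // power-of-two factor by which to shrink the image
    while (options.outWidth / (inSampleSize * 2) >= reqWidth
            && options.outHeight / (inSampleSize * 2) >= reqHeight) {
        inSampleSize *= 2;
    }

    options.inSampleSize = inSampleSize;
    options.inJustDecodeBounds = false;              // now decode the reduced-resolution bitmap
    return BitmapFactory.decodeFile(path, options);
}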
6.2
To integrate Tesseract into Android, a Java-based fork of the Tesseract OCR Engine created by Robert Theis on GitHub was used. It is based upon the tesseract-android-tools project, which is referenced from the original Tesseract website. The project is written with the Android Native Development Kit (NDK) and exposes a Java-based API; by compiling it with the local NDK, it is converted into an Android-compatible library. As the author is a Windows user, a Linux-based environment was needed to run the build scripts. To do this, the author installed Cygwin and used it as a Linux layer to carry out the necessary functionality. Cygwin is a collection of tools which provide a Linux look-and-feel environment for Windows and acts as a Linux API layer providing substantial Linux API functionality [48]. By installing the gcc-core, gcc-g++, make and swig packages into Cygwin, the author obtained an environment in which the first, native stage of the build could run. Once this completed, the C++ side of the project was fully built; the process took a little over 50 minutes.
The next step was to integrate the compiled C++ side with the Java API and produce a Java library. To do this, the author used a tool called Apache Ant. Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications; Ant supplies a number of built-in tasks allowing one to compile, assemble, test and run Java applications [49]. Ant can also be used effectively to build non-Java applications, for instance C or C++ applications. By running Ant over the compiled project, the author was able to convert it into an Android library. From this point, it was as simple as importing it as a reference into the Scope app and marking it as a library. Figure 44 shows all the libraries for Scope correctly referenced after compilation. As shown, Scope is compatible only with Android OS version 4.0 (Ice Cream Sandwich) and upwards, comfortably meeting the minimum requirement for Tesseract on Android, which is Android 2.2.
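Once the library is referenced, recognising a single image is a short call sequence. The sketch below uses the tess-two TessBaseAPI, where dataPath is assumed to point at the folder containing the tessdata directory and segmentBitmap is one pre-processed segment.
// Illustrative sketch. Uses com.googlecode.tesseract.android.TessBaseAPI from the tess-two library.
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(dataPath, "eng");            // dataPath must contain tessdata/eng.traineddata
baseApi.setImage(segmentBitmap);          // one adaptively thresholded segment
String recognisedText = baseApi.getUTF8Text();
baseApi.end();                            // release native resources when done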
6.3
Multi-threading
Scope is architected as a workflow: each step has to be completed before the next, and its output acts as the input for the next process. In this way, it follows what is referred to as a chain model. When the final data reaches the OCR Engine, it is in the form of several separate, adaptively thresholded images. As the OCR Engine can be instantiated only once, careful architecture is needed to use resources to the maximum whilst getting the fastest possible results. Assume there are 6 resulting segments after all the pre-processing techniques have been completed. These 6 segments can be passed into the OCR Engine in two different ways. The author's first implementation used a linear chain of asynchronous tasks, as represented in figure 45.
In this method, called the single-threading model, the segments pass through the OCR Engine one at a time. For 6 segments, the total time came to 152 seconds before the text could be sent to the contacts parser! This was far too inefficient for the app, and the author therefore decided to introduce multithreading. The multi-threaded model is represented in figure 46.
In the multi-threaded model, all the segments are processed in parallel. The instantiation reference object checks the phone's resources at that moment and determines how many threads can be spawned to run an instance of Tesseract. By instantiating Tesseract outside the multithreaded model [50], the engine only has to be instantiated once, and clones of the reference are passed to each thread, saving the considerable time otherwise spent constructing the engine. In addition, the maximum possible number of segments are processed in parallel, shortening the overall time. With the multi-threaded model, the time for the same card came down to 87 seconds, a performance improvement of 42.7%. This is a tremendous improvement; although the process is still slow, it is constrained by the fact that the app runs entirely offline, and in that environment 87 seconds is a very reasonable time for the engine to produce results. Therefore, the model currently used in the app is the multithreaded one, based on the optimisation it provides.
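A minimal sketch of the parallel dispatch is given below; recognizeSegment() is a placeholder for the per-thread call on a clone of the shared engine reference, and availableThreads would in practice come from the resource check described above.
// Illustrative sketch. Uses java.util.concurrent, java.util lists, android.graphics.Bitmap and android.util.Log.
ExecutorService pool = Executors.newFixedThreadPool(availableThreads);
List<Future<String>> pending = new ArrayList<>();
for (final Bitmap segment : segmentImages) {
    pending.add(pool.submit(new Callable<String>() {
        @Override
        public String call() {
            return recognizeSegment(segment);     // each segment is OCR'd on its own thread
        }
    }));
}

StringBuilder cardText = new StringBuilder();
try {
    for (Future<String> result : pending) {
        cardText.append(result.get()).append('\n');   // wait for every segment before parsing contacts
    }
} catch (InterruptedException | ExecutionException e) {
    Log.e("Scope", "OCR worker failed", e);
}
pool.shutdown();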
6.4
As discussed in the literature review, the current OCR detection using Tesseract 3.02 simply translates the image to text, but does not take into account the relationships between identified letters and word formations in order to provide an intelligent result. The English CUBE libraries consist of seven different files [51]:
- eng.cube.bigrams: helps automatically correct the identification of the most commonly found bigrams in detected text.
- eng.cube.fold: auto-formats the detected document into sections, lists, and paragraphs.
- eng.cube.lm: trains Tesseract to identify special characters along with alphabets and numbers.
- eng.cube.params: defines a collated list of global OCR parameters, such as maximum word aspect ratio and maximum segments per character, for faster OCR.
- eng.cube.size: automatically grids large images into smaller portions for faster OCR.
- eng.cube.word-freq: uses word-frequency data from the identified text to correct similar words that continue to appear in the image.
- eng.osd.traineddata: automatically corrects the orientation of the image if it is not top-down, which increases usability for the application's users, especially when they take pictures with their phones.
These libraries are, however, too big to be loaded every time an OCR request comes up. As a solution, the CUBE libraries are loaded asynchronously along with the splash screen, right before the application begins. This takes place on a background thread so that the user does not notice the lag when using the application. The libraries are stored permanently in a small space on the memory card, so that there is no delay in loading them after first-time use.
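A sketch of how such background loading can be done is shown below; it assumes the code runs inside an Activity, and the destination path and first-run check are illustrative rather than the app's exact file layout.
// Illustrative sketch. Uses java.io streams, android.util.Log and the Activity's asset manager.
new Thread(new Runnable() {
    @Override
    public void run() {
        String[] cubeFiles = { "eng.cube.bigrams", "eng.cube.fold", "eng.cube.lm",
                               "eng.cube.params", "eng.cube.size", "eng.cube.word-freq",
                               "eng.osd.traineddata" };
        File tessdata = new File(getExternalFilesDir(null), "tessdata");
        tessdata.mkdirs();
        for (String name : cubeFiles) {
            File target = new File(tessdata, name);
            if (target.exists()) {
                continue;                              // already copied on a previous run
            }
            InputStream in = null;
            OutputStream out = null;
            try {
                in = getAssets().open(name);           // the files are shipped inside the APK assets
                out = new FileOutputStream(target);
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } catch (IOException e) {
                Log.e("Scope", "Failed to copy " + name, e);
            } finally {
                try {
                    if (in != null) in.close();
                    if (out != null) out.close();
                } catch (IOException ignored) {
                }
            }
        }
    }
}).start();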
Figure 49: Effect of CUBE libraries: (a) Original Text (b) Only Tesseract (c) CUBE and Tesseract together
The seven CUBE files implemented in this project provide effective post-processing and segmentation capabilities that give a substantial boost to Tesseract's performance. One example of CUBE's improvement over stand-alone Tesseract is illustrated in figure 49.
Part III.
Results
Segmentation Results
7.1
Background Segmentation
7.1.1
Results of Algorithm
A perfectly working example of background segmentation is given in figure 50. This image is perfectly segmented into its two predominant colours by the background segmentation algorithm. As seen in the result on the right, the split into the correct background segments is indicated by the coloured boxes, which represent contours around the segments.
In figure 51, a boundary case is presented. This is defined as a boundary case because it is a very special type of card that requires a non-rectangular segmentation cut. Since this is not possible using OpenCV, a part of the card, the part with the name, is common to both segments. This results in that segment being sent into the OCR engine twice, so a check should be done on the smart parser side.
The third test case of background segmentation is a failing case. This very special case fails because the colour of the logo in the second segment falls into the same bin as the background colour of the first segment. Thus, when finding contours, the logo also gets included in the final contour analysis and skews the result. This has been marked as an issue to fix and is discussed in the improvements section.
7.1.2
Future Improvements
The failing case described above arises when colours from different segments fall into the same histogram bin and start mixing, and the future iteration of this algorithm should focus on this case.
7.2
Text Segmentation
7.2.1
Results of Algorithm
The text segmentation algorithm, its usage and its purpose have been discussed in Part II of this document. This section explores some of the results obtained by the text segmentation algorithm; failing cases are displayed and discussed as well. The first example is that in figure 53. This card is an example of successful text segmentation and clustering. Visually, there are 3 segments on the card, and this translates perfectly: the image on the right shows the three identified segments correctly encapsulated within the specified areas. This successful test case should also yield better results than the same image sent without text segmentation. To check the effect of text segmentation, the author sent the same card through the system with and without text segmentation and analysed the results, shown in the following table.
Accuracy (%)
Without text segmentation: 65.2
With text segmentation: 82.8
This shows that text segmentation plays a big part in obtaining accurate results in the system.
The boundary case for text segmentation is explained by figure 54. In this instance, the text segmenter wrongly takes a larger segment than it should (as the two segments are close by), and this overlaps with an existing segment. Due to this, the feedback loop kicks in and performs clustering, which leads to the image at the bottom of figure 54. This essentially returns the original card in a slightly smaller area. The text segmentation has therefore not been effective enough and has not worked perfectly, but it has not failed either.
The failing case example is the card in figure 55. In this instance, no segments have been recognised correctly, and the feedback loop does not clean up the overlaps as it usually would. This usually happens with cards that have a big logo right in the middle. In this instance there are many overlaps, but none of them meets the minimum clustering distance requirement, which is 5% of the diagonal of the card. The improvements section discusses what could be done to fix this behaviour.
Future Improvements
Even though text segmentation now works very well, a few fixes still need to be addressed to make it even better. The first is the issue discussed in figure 37: at times, the clustering portion of the algorithm does not do a sound job and gives a result like that in figure 54. To avoid this, the text segmentation needs to be able to judge whether it is holding too much in a segment and perform a recursive segment validation test. Performing this check recursively ensures the algorithm never over-compensates its segments, keeping the clustering at the right level. The algorithm also needs to move from purely rectangular segments to more compound-shaped objects, so as to segment just the right area.
The second improvement addresses the failing case in figure 55. That case occurs because the minimum clustering distance is too small for the image to eventually cluster and yield a result. To fix this, the text segmentation algorithm should analyse the segments and dynamically derive a minimum distance from the average of the smallest distances between corners of the segments. This way, the clustering distance is kept dynamic and result accuracy will improve drastically.
Performance Results
The first set of test results focuses on the accuracy-speed trade-off in the application. As the entire app is offline, loading higher-resolution images is more expensive and time-consuming, but the results are significantly better. The card used for this test is shown in figure 56.
8.1
Time taken for each process and in total, per capture resolution:
Low Resolution: 8483, 433, 1935, 90, 64337 (total 75278)
Medium Resolution: 26684, 2191, 13380, 834, 80862 (total 123951)
High Resolution: 69033, 3881, 52258, 1864, 108009 (total 235045)
As expected, the time taken to complete all the processes in the app increases with resolution. These results are demonstrated graphically in figure 57. However, there is a very significant advantage to increasing the resolution, which is discussed in the next section.
8.2
Accuracy results vary significantly too. As can be seen below, the actual text of each result is shown together with a percentage accuracy. The percentage accuracy is computed by matching every letter against the expected letter and taking the success ratio.
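One way to compute such a letter-level success ratio is sketched below; this illustrates the idea rather than the team's exact matching routine.
// Illustrative sketch: fraction of expected characters matched at the same position.
static double letterAccuracy(String expected, String actual) {
    if (expected.isEmpty()) {
        return 0.0;
    }
    int matches = 0;
    int comparable = Math.min(expected.length(), actual.length());
    for (int i = 0; i < comparable; i++) {
        if (expected.charAt(i) == actual.charAt(i)) {
            matches++;
        }
    }
    return 100.0 * matches / expected.length();        // percentage relative to the expected text
}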
Expected Result
Cammie TAN
Senior Manager (NUS Career Centre)
E-mail: [email protected]
Website: www.nus.edu.sg/osa/career

Low Resolution Result
Accuracy: 35.27%

Medium Resolution Result
Cammie TAN
Senior Manager tNUS Ci"rser Centre)
Accuracy: 77.80%

High Resolution Result
Cammy TAN
Senior Manager (NUS Career Centre)
Accuracy: 94.7%
From this, we are able to see that resolution strongly affects accuracy. The significance of the difference is shown in the graph in figure 58.
Even though high resolution is slow, the author's primary focus in this project is accuracy, and therefore all card results will be shown in high resolution, which means processing times are quite slow. This is due to limited processing power and the fact that the application runs completely offline with in-built libraries.
For the rest of the results section, everything is assumed to be in high resolution.
App Results
To demonstrate the accuracy and results of the app as a whole, various types of NUS and external cards with different features have been tested and their accuracy results given. The percentage accuracy of each result has been calculated, and justification is provided where a result is not good enough. All the cards in this section are assumed to be at the highest resolution written by the app's customised bitmap handler.
Given below are descriptions of a few of the terms used in the testing process:
- Clear indicates a fairly new card, with clear text and little dirt or crushing.
- Unclear indicates an old card, with lots of dust, dirt and crushed edges.
- Light indicates a card whose background colour is close to white.
- Dark indicates a card with a darker background colour.
9.1
NUS Cards
9.1.1
Clear Card
Expected Result:
Cammie TAN
E-mail: [email protected]
Website: www.nus.edu.sg/osa/career

Actual Result:
Cammy TAN
E-mail: [email protected]
Website: vvwvv.nus.edusg/osa/career

Accuracy: 94.7%
9.1.2
Unclear Cards
Expected Result:
Shona Gillies
Assistant Manager
E-mail: [email protected]
Website: www.nus.edu.sg/iro

Actual Result:
Shone 1Gli(Lll,,lE5
Assistant Manager
E-mail: [email protected]
Website: www.nus.edu.sg/iro

Accuracy: 93.7%
The name section of this card has not yielded a good result. This is because the top part of the card is visibly quite crushed. In such instances, retrieving the data is not possible even with good pre-processing if the card has a physical anomaly like the one shown in this example.
9.2
External Cards
9.2.1
Figure 61: Image used for External Card with Light Background
Expected Result:
www.philips.com

Actual Result:
www.philipsxom

Accuracy: 96.1%
9.2.2
Figure 62: Image used for External Unclear Card with Light Background
Expected Result:
www.solarworld.sg

Actual Result:
www.solarworkisg
hassan gafrar@solarworldsg

Accuracy: 81.6%
This card is an example of a dirty card, as it looks like it has been stored away in a possibly dusty place for a long time. It also has very small fonts, making it harder for Tesseract to read. However, with the pre-processing methods and segmentation algorithms combined, the result meets the team's minimum accuracy of 75% and renders reasonably good results.
9.2.3
Figure 63: Image used for External Clear Card with Dark Background
Expected Result:
E-Gaming Solution
Video Streaming
www.titansoft.com.sg

Actual Result:
E-Gaming Solution
Video Streaming
www.titansoft.com.sg

Accuracy: 80.8%
Tesseract does not work well with dark-coloured backgrounds; usually, such a card would yield a result of under 20%. However, the workflow provided by the team ensures that the pre-processing is customised for darker backgrounds, and the result therefore meets the minimum accuracy of 75% set by the team for external cards.
9.2.4
Figure 64: Image used for External Unclear Card with Dark Background
Expected Result:
CHRISTOPHER CAI
[email protected]
[email protected]
www.replaid.com
facebook.com/chriscai
REPLAID

Actual Result:
(CHlRyE)ir((i))F)Tifiytii c)) -
-65 6. 00
thrts@ ,, srtiloC9 q
www.replaid.com
facebook.corn/r: l,
REPLAID

Accuracy: 46.5%
This is an example of a failing case for the application. The card has a dark background, folded edges and light reflecting off its surface. This makes the analysis process very difficult and therefore renders a poor result, with only 46.5% accuracy. The user is required to take a good picture if good results are to be expected from the app.
9.2.5
Colored Background
Figure 65: Image used for External Card with Mixed Background
Expected Result:
www.unisim.edu.sg

Actual Result:
bheemal6Punisim.edu.sg
wwwoumiisiimt,edluusg

Accuracy: 84.7%
This is an example of a card with both light and dark backgrounds. Here, all the components in the process come into play together to separate the sections and render reasonable results. The card meets the threshold set by the team for a minimum result accuracy of 75% and therefore satisfies the goals set by the team.
9.3
Summary of Results
The team tested 10 NUS cards and 25 external cards of different types. From this testing, the author and the team realised that the results of the app depend on three primary factors:
1. Adequate lighting - the cards need good lighting to yield reasonable results, so the user needs to ensure that there is enough light before taking a picture. Shadows are not much of an issue, but avoiding them leads to better results.
2. Physical condition of the card - if the card is damaged at the corners, this can prove to be a problem, as seen in figure 60. Even though the rest of that card was read well, the name did not yield good results due to the physical damage to the card itself.
3. Dust - if there is a lot of dust and dirt on the card, it can interfere with the letters. Even though the team implements measures to remove random dust particles, text that has merged with dust is unidentifiable, so users should avoid dusty cards as much as possible.
With these results, the team has achieved a test accuracy of 92.26% for NUS cards and 77.34% for external cards.
10
Conclusion
At the heart of the Scope application lies innovation: finding creative answers to challenging problems. Fuelled by sound imagination, ideas generated from scratch are fortified by extensive research in the respective fields, consequently transforming them into novel solutions. These solutions, when put in place, help to utilise the available resources optimally, given their limitations, to generate the best possible results. This is seen across the entirety of the application and its different processes. The edge detection and auto-rotation extract and supply the most important part of the input image. The background segmentation analyses colour variations and separates the card into different sections so that the cleaning filters and pixel fillers can cater to them specifically and perform precise image correction. The OCR Engine is then made to work with these processed segments, generating text that is processed by the Parser, which allows for the leeway of error that is expected. Thus, in essence, Scope is a system of components that follow a logical flow of functionality while effectively complementing each other to produce a product that is robust and adaptive; the in-depth analysis of results and performance is testament to this. The application produces accurate results when subjected to a host of day-to-day scenarios, which shows that it is a product that is more than functional in its current state. The suggestions given for improving specific aspects indicate that Scope undeniably has potential for further growth in future iterations. These improvements will help the application overcome certain limitations, extend its reach into an ever-growing market, and realise its eventual goal of being among the best, and the most unique, in its category.
References
[1] M. Kanellos. (2003). Moore's Law to roll on for another decade. CNET [online]. http://news.cnet.com/2100-1001-984051.html.
[2] R. Holley, How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs, D-Lib Magazine, volume 15, no. 3/4, National Library of Australia, 2009.
[3] CVision Tech. (1998). Applications of OCR [online]. http://www.cvisiontech.com/reference/general-information/ocr-applications.html.
[4] C. Cutter. (2013). TED 2013: Here, the Business Card Is Not Dead [online]. http://www.linkedin.com/today/post/article/20130308001134-13780238-ted-2013-the-business-card-is-not-dead.
[5] R. Bennet. (2012). How Business Cards Survive in the Age of LinkedIn [online]. http://www.businessweek.com/articles/2012-02-16/how-business-cards-survive-in-the-age-of-linkedin.
[6] Tech Tip Org. Convert Images to Text with Free OCR Applications [online]. http://www.techtip.org/convert-image-to-text-with-ocr/.
[7] B. K. Ng, Screen Capture Optical Recognition Engine SCORE, National University of Singapore, 2010.
[8] W.S. Lian, Heuristic Based OCR Post, University of North Carolina Chapel Hill, 2009.
[9] S. Dover. (2012). Study: Number of smartphone users tops 1 billion [online]. http://www.cbsnews.com/8301-205_162-57534583/study-number-of-smartphone-users-tops-1-billion/.
[10] PassMark Software. (2013). Android Benchmarks [online]. http://www.androidbenchmark.net/cpumark_chart.html.
[11] H.F. Schantz, The History of OCR: Optical Character Recognition, Recognition Technologies Users
Association, 2012.
[12] M. Mann, Reading Machine Spells Out Loud, Popular Science, 1949.
[13] R. Ahmad. (2012). Optical Character Recognition (OCR) [online]. http://rosalindaahmad.blogspot.sg/2012/04/optical-character-recognition-ocr.html.
[14] A. Kay. (2007). Tesseract: An Open-Source Optical Character Recognition Engine [online]. http://www.linuxjournal.com/article/9676.
[15] L. Vincent. (2006). Announcing Tesseract OCR [online]. http://google-code-updates.blogspot.sg/2006/08/announcing-tesseract-ocr.html.
[16] D. Wolski. (2012). Toolbox: OCR with Tesseract OCR [online]. http://www.heise.de/open/artikel/Toolbox-Texterkennung-mit-Tesseract-OCR-1674881.html.
[17] S. Bhaskar et al. (n.d.). Implementing Optical Character Recognition on the Android Operating System for Business Cards [online].
B. Elgin. (2005). Google Buys Android for Its Mobile Arsenal [online]. http://www.webcitation.org/5wk7sIvVb.
[20] Open Handset Alliance. (2007). Industry Leaders Announce Open Platform for Mobile Devices [online]. http://www.openhandsetalliance.com/press_110507.html.
[21] J. Rosenberg. (2012). Google Play hits 25 billion downloads [online]. http://officialandroid.blogspot.ca/2012/09/google-play-hits-25-billion-downloads.html.
[22] Canalys. (2011). Google's Android becomes the world's leading smart phone platform [online]. http://www.canalys.com/newsroom/google%E2%80%99s-android-becomes-world%E2%80%99s-leading-smart-phone-platform.
[23] A. Russakovskii. (2012). Custom ROMs For Android Explained - Here Is Why You Want Them [online]. http://www.androidpolice.com/2010/05/01/custom-roms-for-android-explained-and-why-you-want-them/.
[24] AForge.NET. (n.d.). Retrieved 01 12, 2012, from http://www.aforgenet.com/framework/features/.
[25] OpenCV. Image Processing [online]. http://docs.opencv.org/modules/imgproc/doc/imgproc.html.
[26] MathWorks. Image Processing Toolbox [online]. http://www.mathworks.com/products/image/.
[27] Fixational. (2012). OpenCV vs MATLAB [online]. http://blog.fixational.com/post/19177752599/opencv-vs-matlab.
[28] U. Sinha. (2012). Why OpenCV? [online]. http://blog.fixational.com/post/19177752599/opencv-vs-matlab.
[29] A. Curylo. (2012). OpenCV for iOS OFFICIAL [online]. http://www.alexcurylo.com/blog/2012/07/11/opencv-for-ios-official/.
[30] R. Paul. (2011). Mono for Android framework lets C# developers tame the Droid [online].
http://arstechnica.com/gadgets/2011/04/mono-for-android-framework-lets-c-developers-tame-the-droid/.
[31] Google Inc. (2013). Google Goggles [online]. http://www.google.com/mobile/goggles/#text.
[32] R. Theis. (2013). OCR Test [online]. Google Play Store. https://play.google.com/store/apps/details?id=edu.sfsu.cs.orange.ocr&
[33] ABBYY. (n.d.). ABBYY Business Card Reader [online]. http://www.abbyy.com/products/bcr/.
[34] J. Richardson. (2012). ABBYY Business Card Reader and ABBYY CardHolder scan business cards with your iPhone [online]. http://www.iphonejd.com/iphone_jd/2012/02/review-abbyy-cardholder.html.
[35] M. Gargenta. (2009). Using NDK to Call code from Android Apps [online]. http://www.aishack.in/2010/02/why-opencv/l.
[36] OpenCV. Changing the contrast and brightness of an image! [online].
[47] Loading Large Bitmaps Efficiently [online]. http://developer.android.com/training/displaying-bitmaps/load-bitmap.html.
[48] Cygwin [online]. http://www.cygwin.com/.
[49] S. Loughran et al, Ant in Action, 2nd Ed, July 12, 2007.
[50] P. Hyde, Java thread programming, Sams Pub., Indianapolis, Ind, 1999.