
Recognition of Multi-Oriented, Multi-Sized, and Curved Text

Yao-Yi Chiang
University of Southern California, Information Sciences Institute and Spatial Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292, USA
Email: [email protected]

Craig A. Knoblock
University of Southern California, Department of Computer Science and Information Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292, USA
Email: [email protected]

Abstract—Text recognition is difficult from documents that contain multi-oriented, curved text lines of various character sizes. This is because layout analysis techniques, which most optical character recognition (OCR) approaches rely on, do not work well on unstructured documents with non-homogeneous text. Previous work on recognizing non-homogeneous text typically handles specific cases, such as horizontal and/or straight text lines and single-sized characters. In this paper, we present a general text recognition technique to handle non-homogeneous text by exploiting dynamic character grouping criteria based on the character sizes and maximum desired string curvature. This technique can be easily integrated with classic OCR approaches to recognize non-homogeneous text. In our experiments, we compared our approach to a commercial OCR product using a variety of raster maps that contain multi-oriented, curved and straight text labels of multi-sized characters. Our evaluation showed that our approach produced accurate text recognition results and outperformed the commercial product at both the word and character level accuracy.

I. INTRODUCTION

Text recognition, or optical character recognition (OCR), is an active area in both academic research and commercial software development. Effective text recognition techniques are widely used, such as for indexing and retrieval of document images and understanding of text in pictorial images or videos.

In classic text recognition systems, including most commercial OCR products, the first step is "zoning," which analyzes the layout of an input image for locating and ordering the text blocks (i.e., zones). Next, each of the identified text blocks containing homogeneous text lines of the same orientation is processed for text recognition. However, this zoning approach cannot handle documents that do not have homogeneous text lines, such as artistic documents, pictorial images with text, raster maps, and engineering drawings. For example, Figure 1 shows an example map that contains multi-oriented text lines of multi-sized characters and no zones of homogeneous text lines exist.

Figure 1. Multi-oriented and multi-sized characters in a raster map from Rand McNally maps

To process documents with non-homogeneous text, one approach is to recognize individual characters separately [1, 4, 9], such as utilizing rotation invariant features of specific character sets for character recognition [4]. However, this approach requires specific training work and hence cannot be easily integrated with the classic, well-developed OCR techniques that process homogeneous text. Moreover, recognizing individual characters separately fails to take advantage of word context, such as utilizing a dictionary to help recognize grouped characters that represent meaningful words.

Instead of recognizing individual characters separately, previous work on extracting text lines from non-homogeneous text for text recognition typically handles specific cases, such as specific language scripts [8], straight text lines [5, 10], and multi-oriented but similar-sized characters [5, 6]. In our previous work [3], we presented a text recognition approach that locates individual multi-oriented text labels in raster maps and detects the label orientations to then leverage the horizontal text recognition capability of commercial OCR software. Our previous work requires manually specified character spacing for identifying individual text labels and does not consider multi-sized characters.

In this paper, we build on our previous work [3] and present a text recognition technique to dynamically group characters from non-homogeneous text into text strings based on the character sizes and maximum desired string curvature. The hypothesis is that characters in a text string are similar in size and are spatially closer than the characters in two separated strings. Our text recognition technique does not require training for specific fonts and can be easily integrated with a commercial OCR product for processing documents that contain non-homogeneous text.

II. RELATED WORK

Text recognition from documents that contain non-homogeneous text, such as from raster maps [7], is a difficult task, and hence much of the previous research only works on specific cases.
Fletcher and Kasturi [5] utilize the Hough transformation to group characters and identify text strings. Since the Hough transformation detects straight lines, their method cannot be applied on curved strings. Moreover, their work does not handle multi-sized characters.

Goto and Aso [6] present a text recognition technique to handle multi-oriented and curved text strings, which can have touching characters. Their technique first divides the input document into columns of equal sizes and then detects connected components within each column for further dividing the columns into blocks. Then the connected components in each block are expanded in various orientations to compute the local linearity for extracting text strings. This block-based approach works on touching characters but requires characters of similar sizes.

Velázquez and Levachkine [13] and Pal et al. [8] present text recognition techniques to handle characters in various font sizes, font types, and orientations. Their techniques are based on detecting straight string baselines for identifying individual text strings. These techniques cannot work on curved strings.

Pouderoux et al. [10] present a text recognition technique for raster maps. They identify text strings in a map by analyzing the geometry properties of individual connected components in the map and then rotate the identified strings horizontally for OCR. Roy et al. [11] detect text lines from multi-oriented, straight or curved strings. Their algorithm handles curved strings by applying a fixed threshold on the connecting angle between the centers of three nearby characters. Their orientation detection method only allows a string to be classified into one of four directions. In both [10, 11], their methods are based on the assumption that the string curvature can be accurately estimated from the line segments connecting each character center in a string. However, this assumption does not hold when the string characters have very different heights or widths. In contrast, we present a robust technique to estimate the curvature and orientation of a text string, and our technique is independent from the character size.

III. OVERVIEW OF OUR TEXT RECOGNITION APPROACH

Given a document image, there are three major steps in our approach for text recognition. First, we extract the text pixels from the input document. For an input image, the user provides example text areas where each text area is a rectangle that contains a horizontal string. The user can rotate the rectangle to select a text string that is not horizontally placed in the image. Since each rectangle contains a horizontal string, we exploit the fact that the text pixels are horizontally near each other to identify the colors that represent text in the image and use the identified colors to extract the text pixels [2]. Second, we dynamically group the extracted text pixels into text strings, which is the main focus of this paper. Third, with the identified text strings, we employ our previous work [3] to detect the orientation of each string and rotate the strings to the horizontal direction for text recognition using a commercial OCR product.

This paper focuses on the second step of string identification, which is described in the next section. The details of the other steps are described in our previous work [2, 3].

IV. IDENTIFYING INDIVIDUAL TEXT STRINGS

Once we extract the text pixels, we have a binary image where each connected component (CC) in the foreground is a single character or a part of a character, such as the top dot of the 'i'. To group the CCs into strings, we present the conditional dilation algorithm (CDA), and Figure 2 shows the pseudo-code of the CDA.

The CDA performs multiple iterations to expand and connect the CCs and then uses the connectivity of the expanded CCs to identify individual text strings. As shown in the ConditionalDilation function in Figure 2, before the first CDA iteration, the CDA sets every CC as expandable. Next, in an iteration, the CDA tests a set of conditions on every background pixel (the TestConditions sub-function) to determine if the pixel is a valid expansion pixel: a background pixel that can be converted to the foreground for expanding a CC. After an iteration, the CDA evaluates each expanded CC (the CountExpandableCC sub-function) to determine whether the CC can be further expanded in the next iteration and stops when there is no expandable CC. We describe the test conditions to determine an expansion pixel and an expandable CC in the remainder of this section.

Character Connectivity Condition: An expansion pixel needs to connect to at least one and at most two characters. This is because the maximum number of neighboring characters that any character in a text string can have is two.
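As an illustration (the function name below is ours, not from the paper), the connectivity test on a candidate expansion pixel only needs the labels of the foreground connected components touching that pixel:

    def passes_connectivity_test(neighbor_labels):
        # labels of the expanded CCs touching the candidate pixel; 0 denotes background
        touching = {label for label in neighbor_labels if label != 0}
        return 1 <= len(touching) <= 2  # at least one and at most two characters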
Character Size Condition: If an expansion pixel connects to two characters, the sizes of the two characters must be similar. For a character, A, and its bounding box, Abx, the size of A is defined as:

    Size = Max(Abx.Height, Abx.Width)    (1)

For the characters connected by expansion pixels, the size ratio between the characters must be smaller than a pre-defined parameter (the max_size_ratio parameter). For two characters, A and B, with bounding boxes Abx and Bbx, their size ratio is defined as:

    SizeRatio = Max(Size(A), Size(B)) / Min(Size(A), Size(B))    (2)

This character size condition guarantees that every character in an identified text string has a similar size. We use the size ratio equal to two because some letters, such as the English letters 'l' and 'e', do not necessarily have the exact same size, even when the same font is used.
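In code, Equations (1) and (2) and the size test reduce to a few lines. This sketch assumes bounding boxes given as (height, width) pairs and uses the size ratio of two mentioned above; the function names are ours, purely for illustration:

    def char_size(bbox_height, bbox_width):
        return max(bbox_height, bbox_width)                # Equation (1)

    def passes_size_test(bbox_a, bbox_b, max_size_ratio=2.0):
        size_a = char_size(*bbox_a)
        size_b = char_size(*bbox_b)
        ratio = max(size_a, size_b) / min(size_a, size_b)  # Equation (2)
        return ratio < max_size_ratio                      # must be smaller than max_size_ratio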
    // The number of processed iterations
    IterationCounter = 0;
    // The number of expandable connected components
    Expandable_CC_Counter;
    // CDA parameters
    double max_size_ratio, max_distance_ratio, max_curvature_ratio;

    MainFunction void ConditionalDilation(int[,] image)
        FOR EACH connected component CC in image
            CC.expandable = TRUE;
        DO{
            TestConditions(image);
            CountExpandableCC(image);
            IterationCounter = IterationCounter + 1;
        } WHILE(Expandable_CC_Counter > 0)
    EndMainFunction

    SubFunction void TestConditions(int[,] image)
        FOR EACH background pixel BG in image
            IF(PassConnectivityTest(BG) && PassSizeTest(BG) &&
               PassExpandabilityTest(BG) && PassStringCurvatureTest(BG))
                Set BG to Foreground;
    EndSubFunction

    SubFunction void CountExpandableCC(int[,] image)
        Expandable_CC_Counter = 0;  // reset the count for this iteration
        FOR EACH expanded connected component ECC in image
            IF(HasConnectedToTwoECCs(ECC) ||
               IterationCounter > max_distance_ratio * ECC.char_size)
                ECC.expandable = FALSE;
            ELSE
                Expandable_CC_Counter = Expandable_CC_Counter + 1;
    EndSubFunction

Figure 2. The pseudo-code for the conditional dilation algorithm (CDA)
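To make the control flow above concrete, the following is a minimal Python sketch of the conditional dilation loop; it is not the authors' implementation. It assumes a binary numpy image, uses scipy.ndimage for connected-component labeling, applies only the connectivity, size, and expandability (distance) conditions described in this section, and omits the string curvature test for brevity:

    import numpy as np
    from scipy import ndimage

    def bounding_box_size(mask):
        # Equation (1): the larger bounding-box dimension of a connected component
        ys, xs = np.nonzero(mask)
        return max(ys.max() - ys.min() + 1, xs.max() - xs.min() + 1)

    def conditional_dilation(binary_image, max_size_ratio=2.0, max_distance_ratio=0.2):
        labels, n = ndimage.label(binary_image > 0)          # original CCs (characters)
        sizes = {i: bounding_box_size(labels == i) for i in range(1, n + 1)}
        expandable = {i: True for i in range(1, n + 1)}
        grown = labels.copy()
        iteration = 0
        while any(expandable.values()):
            iteration += 1
            new = grown.copy()
            for y, x in zip(*np.nonzero(grown == 0)):        # every background pixel
                window = grown[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
                neighbors = [int(l) for l in np.unique(window) if l > 0]
                if not 1 <= len(neighbors) <= 2:             # connectivity condition
                    continue
                if len(neighbors) == 2:
                    a, b = sizes[neighbors[0]], sizes[neighbors[1]]
                    if max(a, b) / min(a, b) >= max_size_ratio:  # size condition
                        continue
                if not any(expandable[l] for l in neighbors):    # expandability condition
                    continue
                new[y, x] = neighbors[0]                     # mark the expansion pixel
            grown = new
            for label in range(1, n + 1):                    # distance cap per CC
                if iteration > max_distance_ratio * sizes[label]:
                    expandable[label] = False
        strings, _ = ndimage.label(grown > 0)                # one CC per identified string
        return strings

Relabeling the grown foreground at the end yields one connected component per identified text string, which corresponds to the CDA output described later in this section.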
Character Expandability Condition: An expansion pixel needs to connect to at least one expandable CC, and the expandability of a CC is determined as follows: before the first CDA iteration, every CC is expandable. After each iteration, the CDA checks the connectivity of each expanded CC, and if the expanded CC has already connected to two other CCs, the CC is not expandable.

Next, for the remaining expanded CCs (i.e., the ones with connectivity less than two), the CDA determines the expandability of each CC by comparing the number of iterations that have been done and the original size of each CC before any expansion. This is to control the longest distance between any two characters that the CDA can connect so that the characters in two separated strings will not be connected. For example, in our experiments, we empirically set the longest distance between two characters to 1/5 of the character size (the max_distance_ratio parameter). As a result, for a character of size equal to 20 pixels, the character will not be expandable after 4 iterations, which means this character can only find a connecting neighbor within the distance of 4 pixels plus 1/5 of the size of a neighboring CC.
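A quick way to see the effect of the max_distance_ratio parameter (the helper below is ours, purely for illustration) is to compute how many iterations a CC of a given size stays expandable:

    def max_expandable_iterations(char_size, max_distance_ratio=0.2):
        # expandable while IterationCounter <= max_distance_ratio * char_size
        return int(max_distance_ratio * char_size)

    print(max_expandable_iterations(20))  # 4, matching the 20-pixel example above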
String Curvature Condition: If an expansion pixel connects two CCs and at least one of the two CCs has a connected neighbor (i.e., together as a string with at least three characters), the curvature of the set of CCs should be less than the maximum desired curvature. This condition allows the CDA to identify curved strings and guarantees that the characters of the text strings in different orientations will not be connected. However, determining the string curvature without knowing how the characters are aligned is unreliable. For example, considering the text string "Wellington", if we link the mass centers or bounding-box centers of each character to represent the string curvature, the line segments linking any two neighboring characters can have very different orientations since the characters have various heights, such as the links between "We" and the one between "el".

To accurately estimate the curvature of a string, the CDA first establishes a curvature baseline for the string. For example, the left image in Figure 3(a) shows an example string, and the right image shows the rearranged string as if the example string is straight and in the horizontal direction. The CDA generates the rearranged string by first aligning each of the characters vertically and rearranging the characters' positions in the horizontal direction so that the characters are not overlapped. The dashed line in the right image shows the curvature baseline of "dale". This curvature baseline contains two connecting angles: the ones between "dal" and "ale".

With the curvature baseline, the CDA determines the string curvature by comparing the connecting angles in the original string to the ones in the curvature baseline. For example, Figure 3(c) shows that θ1 is similar to θ1' and θ2 is similar to θ2', and hence the CDA considers the string "dale" as a straight string (i.e., every original connecting angle is similar to its corresponding one). Figure 3(d) shows an example where θ1 is very different from θ1', and hence the CDA considers the string "AvRi" as a curved string.

The CDA uses a curvature parameter to control the maximum desired curvature of a text string (the max_curvature_ratio parameter). If the difference between one connecting angle of a string and the corresponding angle in the string's curvature baseline is larger than the curvature parameter, the string violates the string curvature condition. For example, with the curvature parameter set to 30% from the curvature baseline, any string with curvature within 138° (180° divided by 130%) to 234° (180° multiplied by 130%) will be preserved.
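The comparison against the curvature baseline can be sketched as follows, assuming the connecting angles (in degrees) have already been measured at each interior character of the original string and of its straightened baseline; with the default 30% parameter and a baseline angle of 180°, the accepted range is roughly 138° to 234°, as in the text. The function name is ours:

    def violates_curvature_condition(original_angles, baseline_angles, max_curvature_ratio=0.3):
        for angle, baseline in zip(original_angles, baseline_angles):
            lower = baseline / (1 + max_curvature_ratio)  # e.g. 180 / 1.3 ≈ 138°
            upper = baseline * (1 + max_curvature_ratio)  # e.g. 180 * 1.3 = 234°
            if not lower <= angle <= upper:
                return True                               # too far from the baseline
        return False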
The CDA Output: After the CDA stops when there is no expansion pixel, each connected component of the expansion results is an identified text string. For example, in Figure 4, the set of color blobs are the expansion results (each color represents a connected component) and the black pixels overlapped with a color blob belong to an identified string. In Figure 4, the CDA does not group small CCs correctly, such as the dot on top of the character 'i'. This is because these small CCs violate the character size condition. The OCR system will recover these missing small parts in the character recognition step, which is more robust than adopting special rules for handling small CCs in the CDA.
Figure 3. Testing the string curvature condition. (a) The original string (left) and curvature baseline (right) of "dale". (b) The original string (left) and curvature baseline (right) of "AvRi". (c) θ1/θ2 is similar to θ1'/θ2'. (d) θ1 is very different from θ1'.

Figure 4. The CDA output

V. EXPERIMENTS

We have implemented the techniques described in this paper in our map processing system called Strabo. To evaluate our technique, we tested Strabo on 15 maps from 10 sources, including 3 scanned maps and 12 computer-generated maps (directly generated from vector data).¹ These maps contain non-homogeneous text of numeric characters and the English alphabet. Table I shows the information of the test maps and their abbreviations used in this section. Figure 5 shows one example area in a test map.

¹ The information for obtaining the test maps can be found at: http://www.isi.edu/integration/data/maps/prj_map_extract_data.html

Figure 5. A portion of the GIZI map

Table I. Test maps for experiment

    Map Source (abbr.)                 Map Type             # Char/Word
    International Travel Maps (ITM)    Scanned              1358/242
    Gecko Maps (GECKO)                 Scanned              874/153
    Gizi Map (GIZI)                    Scanned              831/165
    Rand McNally (RM)                  Computer Generated   1154/266
    UN Afghanistan (UNAfg)             Computer Generated   1607/309
    Google Maps (Google)               Computer Generated   401/106
    Live Maps (Live)                   Computer Generated   233/64
    OpenStreetMap (OSM)                Computer Generated   162/42
    MapQuest Maps (MapQuest)           Computer Generated   238/62
    Yahoo Maps (Yahoo)                 Computer Generated   214/54

We utilized Strabo together with a commercial OCR product called ABBYY FineReader 10 to recognize the text labels in the test maps. For comparison, ABBYY FineReader 10 was also tested alone without Strabo. For evaluating the recognized text labels, we report the precision and recall at both the character and word levels.
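The paper does not spell out its matching rule, so the following word-level scoring sketch is only illustrative; it counts matches as the multiset intersection of recognized and ground-truth words (character-level scores can be computed the same way over individual characters):

    from collections import Counter

    def precision_recall(recognized_words, ground_truth_words):
        matches = sum((Counter(recognized_words) & Counter(ground_truth_words)).values())
        precision = matches / len(recognized_words) if recognized_words else 0.0
        recall = matches / len(ground_truth_words) if ground_truth_words else 0.0
        return precision, recall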
Table II shows the numeric results of our experiments. Strabo produced higher numbers compared to using only ABBYY FineReader 10 in all metrics, especially the recall. ABBYY FineReader 10 did not do well on identifying text regions from the test maps because of the multi-oriented text strings in the maps. ABBYY FineReader 10 alone could only recognize the strings that are in the horizontal or vertical directions. Moreover, ABBYY FineReader 10 could not detect any text region from the Google, OSM, MapQuest, and Yahoo maps, and hence the precision and recall are 0 at both the character and word levels.

Overall Strabo achieved accurate text recognition results at both the character and word levels. This is because the CDA successfully grouped the multi-oriented and multi-sized characters into individual text strings for OCR. Moreover, the CDA correctly identified curved strings that have their curvature within the desired curvature ratio (30%), such as the example shown in Figure 6.

The errors in Strabo's results came from several aspects: (i) The poor image quality of the test maps, especially scanned maps, could result in poor quality of text pixels, such as broken characters or the existence of non-text objects in the extracted text pixels. (ii) The CDA might not correctly identify strings with significant wide character spacing. For example, Figure 7 shows that the string "Hindu Kush" in the UNAfg map was not identified correctly. (iii) The CDA might group characters with non-text objects. If there exist non-text objects in the CDA input and a non-text object was close to one end of a string and has a similar size as the ending character, the CDA would connect the end character to the non-text object. A connected-component filter can be used to post-process the extracted text pixels for removing this type of error. However, the connected-component filter would need careful parameter settings and might also remove characters.
Table II. Text recognition results (P. is precision and R. is recall)

    Source     System   Ch. P.   Ch. R.   Wd. P.   Wd. R.
    ITM        Strabo   93.6%    93.3%    83.3%    82.6%
               ABBYY    86.4%    45.6%    57.5%    33%
    GECKO      Strabo   93.4%    86.3%    83.1%    77.1%
               ABBYY    77.8%    41%      66.2%    37.2%
    GIZI       Strabo   95.1%    77.3%    82%      63.6%
               ABBYY    71.3%    16%      51.4%    10.9%
    RM         Strabo   93.4%    94%      87.9%    84.9%
               ABBYY    71.8%    10.4%    23.5%    3%
    UNAfg      Strabo   91.5%    88%      82.3%    80.2%
               ABBYY    65.6%    56%      34.8%    36.5%
    Google     Strabo   97.3%    91.7%    89.2%    85.8%
               ABBYY    0%       0%       0%       0%
    Live       Strabo   94.7%    93.5%    75.3%    76.5%
               ABBYY    51.8%    47.6%    47.8%    53.1%
    OSM        Strabo   95.4%    77.7%    74.3%    69%
               ABBYY    0%       0%       0%       0%
    MapQuest   Strabo   91.3%    84%      81%      75.8%
               ABBYY    0%       0%       0%       0%
    Yahoo      Strabo   69.7%    63.5%    43.1%    40.7%
               ABBYY    0%       0%       0%       0%
    Avg.       Strabo   92.7%    87.9%    82%      77.5%
    Avg.       ABBYY    71.9%    30%      46.1%    20.6%

Figure 6. An identified curved string with its rotated image containing the horizontal string for OCR
Figure 7. Wide character spacing
VI. DISCUSSION AND FUTURE WORK

We presented a general text recognition technique for processing documents that contain non-homogeneous text lines. This technique handles multi-oriented, curved and straight text lines of multi-sized characters and requires only three parameter settings. We show that our technique can be easily integrated with a commercial OCR product to support text recognition from documents for which classic layout analysis techniques do not work. In the future, we plan to test this text recognition technique on non-English scripts. We also plan to broaden the coverage of our technique to handle documents with mostly touching characters, such as by incorporating a character segmentation method [12].

ACKNOWLEDGMENT

This research is based upon work supported in part by the University of Southern California under the Viterbi School of Engineering Doctoral Fellowship.

REFERENCES

[1] Adam, S., Ogier, J., Cariou, C., Mullot, R., Labiche, J., and Gardes, J. (2000). Symbol and character recognition: application to engineering drawings. IJDAR, 3(2):89–101.
[2] Chiang, Y.-Y. (2010). Harvesting Geographic Features from Heterogeneous Raster Maps. Ph.D. Dissertation, University of Southern California.
[3] Chiang, Y.-Y. and Knoblock, C. A. (2010). An approach for recognizing text labels in raster maps. In Proceedings of the 20th ICPR, pages 3199–3202.
[4] Deseilligny, M. P., Mena, H. L., and Stamon, G. (1995). Character string recognition on maps, a rotation-invariant recognition method. Pattern Recognition Letters, 16(12):1297–1310.
[5] Fletcher, L. A. and Kasturi, R. (1988). A robust algorithm for text string separation from mixed text/graphics images. IEEE TPAMI, 10(6):910–918.
[6] Goto, H. and Aso, H. (1998). Extracting curved text lines using local linearity of the text line. IJDAR, 2(2–3):111–119.
[7] Nagy, G., Samal, A., Seth, S., Fisher, T., Guthmann, E., Kalafala, K., Li, L., Sivasubramaniam, S., and Xu, Y. (1997). Reading street names from maps - technical challenges. In GIS/LIS Conference, pages 89–97.
[8] Pal, U., Sinha, S., and Chaudhuri, B. B. (2003). Multi-oriented English text line identification. In Proceedings of the 13th Scandinavian Conference on Image Analysis, pages 1146–1153.
[9] Pezeshk, A. and Tutwiler, R. (2010). Extended character defect model for recognition of text from maps. In Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, pages 85–88.
[10] Pouderoux, J., Gonzato, J. C., Pereira, A., and Guitton, P. (2007). Toponym recognition in scanned color topographic maps. In Proceedings of the 9th ICDAR, volume 1, pages 531–535.
[11] Roy, P. P., Pal, U., Lladós, J., and Kimura, F. (2008). Multi-oriented English text line extraction using background and foreground information. In IAPR International Workshop on DAS, pages 315–322.
[12] Roy, P. P., Pal, U., Lladós, J., and Delalandre, M. (2009). Multi-oriented and multi-sized touching character segmentation using dynamic programming. In Proceedings of the 10th ICDAR, pages 11–15.
[13] Velázquez, A. and Levachkine, S. (2004). Text/graphics separation and recognition in raster-scanned color cartographic maps. In GREC, volume 3088 of LNCS, pages 63–74.
