
Preprocessing Techniques for High Quality Text Extraction from Text Images

Alan Koshy, Niranj Balakumar MJ, Prof. Shyna A, Prof. Ansamma John
Department of Computer Science, TKM College of Engineering, Kollam, India

Abstract: In this age of digitization, there is a growing need to preserve physical copies of documents such as historical text. It is important in digitization to capture every aspect of the document, which is often infeasible due to challenges such as fading, creases, and shadows. Various approaches have been put forth to improve text extraction by means of preprocessing. This paper analyses the effect of applying some general preprocessing methods such as thresholding, morphology, and blurring, and the resulting enhancement of quality in the output obtained. Experimental results show that preprocessing improves the visual and structural quality of the document to a certain extent.

Keywords- Image Quality Assessment, Preprocessing, Digitization, Blurring, Thresholding, Morphology.

I. INTRODUCTION

Text extraction from images has been a major issue in the field of computer vision. Text in images provides valuable information that can serve as the basis for a variety of applications. However, the accuracy of text extraction is noticeably influenced by the quality of the images. When considering the digitization of old historical documents or creating digital copies of handwritten notes, the quality of an image remains a major factor affecting the accuracy of text extraction. As a result, the commercial applications of this technology have been limited. Preprocessing techniques are usually layered on top of other functions to further enhance the text extraction quality. In this paper, we explore the effect of basic preprocessing techniques on image quality and subsequently their effect on text extraction.

To discern the ability of a preprocessing technique, its effectiveness should be analyzed. This can be achieved by performing quality analysis on both the original image and the preprocessed image. Image Quality Assessment (IQA) has a major role in the field of digital image processing. It is highly preferred in the assessment of various image-based operations. IQA helps in evaluating various preprocessing algorithms, and thus helps to choose the algorithms to be implemented in a sequence.

The remaining portion of this paper is organized as follows. Section II deals with the various preprocessing methods being tested and Section III describes the IQA approaches used for analysis. Section IV illustrates the experimental evaluation and the results. Section V concludes the paper with directions for future work.

II. PREPROCESSING METHODS

Image preprocessing algorithms can be used for a wide variety of sophisticated image processing applications. Parker [1] and Gonzalez and Woods [2] provide an understanding of the nature of these algorithms that can be adopted to speed up the development of the image processing being used. In various text extraction methods, either generalized or specialized preprocessing mechanisms are used. We focus on general methods for preprocessing such as thresholding, morphology, and blurring.

A. Thresholding

Thresholding techniques attempt to binarize a grayscale image based on pixel intensity. Thresholding provides a simple way to segment an image into foreground and background regions. A parameter called the intensity threshold determines the output produced. Depending on whether the pixel intensity is greater than or less than the threshold value, the pixel is replaced by either a white or a black pixel. Three main thresholding methods are tested in this paper: simple thresholding, adaptive thresholding, and Otsu’s method.

Simple thresholding performs a basic scan through the pixels of the image, checking the intensity of each pixel. An intensity threshold has to be specified each time the process is performed. In the edge-based text detection method of Grover, Arora, and Mitra [3], thresholding is used to remove weak edges from the edge-detected image.

Adaptive thresholding computes the threshold separately for each small area of the input image. This allows adaptive thresholding methods to overcome the limitations regarding lighting conditions that simple thresholding methods suffer from. Simple thresholding utilizes a fixed threshold which causes it to be incapable of
properly processing images with variations in illumination such as shadows, whereas adaptive thresholding can handle these variations with ease. In the paper by Shi, Setlur, and Govindaraju [4] on extracting handwritten Arabic text, adaptive thresholding is applied to get a binarized image, which gives a rough estimate of the text line locations.

Otsu’s method, developed by N. Otsu [5], performs image thresholding by automatic clustering and assumes a bimodal histogram. It separates the pixels much better than the previous methods but fails to create a consistent result across the entire image. It exceeds the noise reduction capabilities of both adaptive and simple thresholding but is limited by image size and illumination variances.

B. Morphology

In morphology, the image is examined using a small template known as a structuring element. A structuring element such as a kernel is used as a reference to compare with the corresponding pixels at all potential locations in the image. These operations are suited for use on binary images. A kernel must be specified for performing morphological operations, and it influences the nature of the results. The four morphological approaches tested in this paper are erosion, dilation, opening, and closing.

Erosion attempts to erode away the boundaries of the foreground object. Based on the kernel’s size, all pixels next to the boundary are discarded, so the foreground object’s size decreases; that is, the white region of the image shrinks. The translation of the set A by the point x is defined in set notation as:

$$(A)_x = \{\, c \mid c = a + x,\ a \in A \,\} \tag{1}$$

The erosion of an image I by structuring element S can be defined as:

$$I \ominus S = \{\, x \mid (S)_x \subseteq I \,\} \tag{2}$$

Dilation, as opposed to erosion, increases the white region in an image. It is often used to increase object size. In the work of Nagabhushan and Nirmala [6], dilation is used to enhance an edge-detected image for connected component analysis, and Audithan and Chandrasekaran [7] used dilation to connect the text edges in each detail component. Dilation of an image I by structuring element S can be defined by:

$$I \oplus S = \bigcup_{s \in S} (I)_s \tag{3}$$

Opening and closing are derived from the previous two techniques: opening is erosion followed by dilation, and closing is dilation followed by erosion. Opening is useful for removing small white noise points while largely preserving object size. Closing helps in joining broken parts of an object and in removing the small dark points on the object surface. Morphological operations should be selected depending on the condition of the objects (text) in the image.

C. Blurring

Blurring removes high-frequency content in the image, such as noise. The three blurring methods tested in this paper are the average filter, the Gaussian filter, and the median filter.

In the average filter, the image is convolved with a normalized box filter that takes the average of all the pixels under a specified kernel area and replaces the central element with that average.

The Gaussian filter uses a Gaussian kernel to perform blurring. It is able to remove Gaussian noise. The formula of a Gaussian function in two dimensions is:

$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{4}$$

Here x is the distance along the horizontal axis, y is the distance along the vertical axis, and σ denotes the standard deviation of the Gaussian distribution.

The median filter takes the median of all the pixels under the specified area, after which the central element is replaced by the median value. It eliminates noise while maintaining the edges. Salt-and-pepper noise is eliminated using this method. In the paper by Seeri, Giraddi, and Prashant [8] on Kannada text extraction, noise found in the image is removed using a median filter, and in the work of Kawano, Orii, Maeda, and Ikoma [9], the median filter functions as a background estimator for the removal of noise.

III. TESTING METHODS USED

Since most of the devices are operated by people, quality issues are inevitable. The accuracy of text extraction is heavily influenced by the quality of the image, and this quality can be improved through preprocessing. The quality of the preprocessed images can be determined by IQA, which supports both subjective and objective evaluations, the former being the way in which humans view image quality and the latter being based on computational models and algorithms that can analyze the image quality. In this work, we use a few full-reference methods to determine the improvement induced by the preprocessor. Full-reference methods are a type of objective IQA in which an image is compared with a reference image. The methods used are Pixel-Based Visual Information Fidelity (VIFP), Mean Squared Error (MSE), Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), Multi-Scale Structural Similarity Index (MS-SSIM), and Universal Image Quality Index (UQI) [10].
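As an illustration of the full-reference comparison used in this work, the following minimal Python sketch binarizes a tiny synthetic image with a fixed threshold and scores the result against the original using MSE and PSNR. It is not the authors' implementation: the function names, the 2x2 test image, the threshold of 127, and the 8-bit peak value of 255 are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def simple_threshold(gray: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Fixed-threshold binarization: pixels above the threshold become white (255)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


def mse(reference: np.ndarray, test: np.ndarray) -> float:
    """Mean squared error between two images of the same shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(reference, test)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / error))


# Hypothetical 2x2 grayscale image standing in for a scanned document (assumption).
original = np.array([[0, 255], [100, 200]], dtype=np.uint8)
binarized = simple_threshold(original)

print(binarized.tolist())        # [[0, 255], [0, 255]]
print(mse(original, binarized))  # 3256.25
print(round(psnr(original, binarized), 2))
```

A lower MSE and a higher PSNR both mean that the preprocessed image stays closer to the original; the same two functions could be applied to the output of any other preprocessing step.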
MSE and PSNR measure the difference between two images, and the result reflects the strength of the error between the images. From the work of Wang and Bovik [11], it can be noted that PSNR is useful when comparing images with different dynamic ranges. The disadvantage of MSE is that it does not represent human-perceived image quality. Wang and Bovik proposed UQI [10], which splits the comparison between original and distorted images into three components: luminance, structure, and contrast. Wang et al. proposed SSIM because UQI does not correlate well with subjective assessment. Wang, Bovik, Sheikh, and Simoncelli [12] describe the basic version of SSIM, in which structural information is handled similarly to UQI. SSIM outperforms MSE and UQI but still has flaws. It does not perform well in cases of translated, rotated, and scaled images, even when the quality of an image and its corresponding reference image are the same. The initial steps of MS-SSIM are similar to SSIM; the same steps are then repeated at various scalings of the original image. Compared to SSIM, MS-SSIM performs more computation and produces better results. VIFP is another image quality assessment method, presented by Sheikh and Bovik [13]. VIFP is a full-reference IQA index which is based on natural scene statistics and involves concepts of the Human Visual System (HVS) for image information extraction.

IV. EXPERIMENTAL EVALUATION

Fig. 1. a) Sample image b) Otsu’s thresholding

The preprocessing methods discussed in this paper are applicable to most images containing text, such as documents and receipts. In this work, we have applied various preprocessing techniques on a data set containing around 1000 receipts of various categories. IQA results are obtained by comparing the original and the corresponding preprocessed image. Fig. 1a shows an image of a receipt present in the data set we have used for this work. Table I shows the IQA results. In all cases, a higher value for PSNR is accompanied by a lower value for MSE, which indicates better quality. For UQI, SSIM, MS-SSIM, and VIFP, a value of 1 means the image and its reference are exactly the same. If the value is close to 0 or 0 itself, then there is very little or no structural similarity between the two images. From the table, it is clear that the preprocessing method referred to as opening morphology yields the result that is most similar to the original image. Adaptive thresholding turns out to be the least performing method. The VIFP metric, however, is not consistent with the other metrics: based on the VIFP values, simple thresholding provides the least similar image. From a human perspective, VIFP proves to be more accurate.

TABLE I. IQA RESULTS

Preprocessing Method     UQI     SSIM    PSNR   MSE      VIFP    MS-SSIM
Simple Thresholding      0.9779  0.7764  18.40  939.53   0.0934  0.7080
Adaptive Thresholding    0.9705  0.6935  15.58  1796.17  0.2156  0.6950
Otsu’s Thresholding      0.9715  0.7946  17.48  1159.13  0.1963  0.8004
Dilation Morphology      0.9939  0.8608  23.45  293.85   0.2851  0.8618
Erosion Morphology       0.9898  0.8959  22.47  367.71   0.3567  0.8974
Opening Morphology       0.9990  0.9571  32.46  36.86    0.5410  0.9563
Closing Morphology       0.9978  0.9233  27.47  116.36   0.4245  0.9231
Average Blurring         0.9976  0.8904  26.74  137.66   0.3006  0.8899
Gaussian Blurring        0.9987  0.9398  29.88  66.84    0.4463  0.9391
Median Blurring          0.9980  0.9036  27.44  117.15   0.3428  0.9033

From the experiments, it is observed that thresholding methods reduce background noise while slightly enhancing the text. When comparing simple and adaptive thresholding, the former had better noise reduction while the latter had better text enhancement. Otsu’s thresholding (Fig. 1b) performs better than the other thresholding methods with regard to both text enhancement and noise reduction.

We performed four morphology methods, of which opening morphology performs better than the others by enhancing the text quality, although it may also enhance background noise. It is also observed that the high-frequency noise encountered was reduced by the blurring operations. Gaussian blurring showed the best results for
Gaussian noise, and median blurring best handled salt-and-pepper noise.

V. CONCLUSION

In this paper, various generalized preprocessing techniques were performed on a dataset, and IQA was used for analyzing the subsequent results. We found that among the techniques tested, opening morphology manages to keep the essential structure of the image intact while enhancing the core aspects of the image. While the other processes perform differently depending on their function, each of them has managed to create noticeable changes in noise reduction, object enhancement, or binarization. Future work could examine the quality improvement in images when multiple preprocessing techniques are performed in sequence.

REFERENCES

[1] James R. Parker, “Algorithms for Image Processing and Computer Vision”, John Wiley and Sons, 1997.
[2] Rafael C. Gonzalez, Richard E. Woods, “Digital Image Processing”, Second Edition, 2002.
[3] S. Grover, K. Arora, and S. Mitra, “Text Extraction from Document Images using Edge Information”, in Annual IEEE India Conference (INDICON), Vol. 1-4, IEEE, Gujarat, India (2009).
[4] Z. Shi, S. Setlur, V. Govindaraju, “A Steerable Directional Local Profile Technique for Extraction of Handwritten Arabic Text Lines”, 10th International Conference on Document Analysis and Recognition, Barcelona, July 26-29 (2009) 176-180.
[5] N. Otsu, “A Threshold Selection Method from Gray-Level Histograms”, IEEE Trans. Syst. Man Cybern. 9, 62-66 (1979).
[6] P. Nagabhushan, S. Nirmala, “Text Extraction in Complex Color Document Images for Enhanced Readability”, Intelligent Information Management, 2 (2010) 120-133.
[7] S. Audithan, RM. Chandrasekaran, “Document Text Extraction from Document Images using Haar Discrete Wavelet Transform”, European Journal of Scientific Research, 36 (2009) 502-512.
[8] S.V. Seeri, S. Giraddi, Prashant B.M, “A Novel Approach for Kannada Text Extraction”, Proceedings of the International Conference on Pattern Recognition, Informatics and Medical Engineering, Tamil Nadu, Mar. 21-23 (2012) 444-448.
[9] H. Kawano, H. Orii, H. Maeda, N. Ikoma, “Text Extraction from Degraded Document Image Independent of Character Color Based on MAP-MRF Approach”, IEEE, Jeju Island, Aug. 20-24 (2009) 165-168.
[10] Zhou Wang and A.C. Bovik, “A universal image quality index”, IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81-84, March 2002.
[11] Wang, Z. and Bovik, A., “Mean squared error: Love it or leave it? A new look at signal fidelity measures”, IEEE Signal Processing Magazine, 26, 98-117, 2009.
[12] Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E., “Image quality assessment: From error visibility to structural similarity”, IEEE Transactions on Image Processing, 13, 600-612, 2004.
[13] Sheikh, H. and Bovik, A., “Image information and visual quality”, IEEE Transactions on Image Processing, 15, 430-444, 2006.
