Abstract: In this age of digitization, there is a growing need to preserve physical copies of documents such as historical texts. It is important in digitization to capture every aspect of the document, which can be infeasible due to challenges such as fading, creases, and shadows. Various approaches have been put forth to improve text extraction by means of preprocessing. This paper analyses the effect of applying some general preprocessing methods, namely Thresholding, Morphology, and Blurring, and the resulting enhancement of quality in the output obtained. Experimental results show that preprocessing improves the visual and structural quality of the document to a certain extent.

Keywords: Image Quality Assessment, Preprocessing, Digitization, Blurring, Thresholding, Morphology.

I. INTRODUCTION

Text extraction from images has been a major issue in the field of computer vision. Text in images provides valuable information that can serve as the basis for a variety of applications. However, the accuracy of text extraction is noticeably influenced by the quality of the images. When digitizing old historical documents or creating digital copies of handwritten notes, the quality of an image remains a major factor affecting the accuracy of text extraction. As a result, the commercial applications of this technology have been limited. Preprocessing techniques are usually layered on top of other functions to further enhance the text extraction quality. In this paper, we explore the effect of basic preprocessing techniques on image quality and subsequently their effect on text extraction.

To discern the ability of a preprocessing technique, its effectiveness should be analyzed. This can be achieved by performing quality analysis on both the original image and the preprocessed image. Image Quality Assessment (IQA) has a major role in the field of digital image processing. It is highly preferred in the assessment of various image-based operations. IQA helps in evaluating various preprocessing algorithms and thus helps to choose the algorithms to be implemented in a sequence.

The remainder of this paper is organized as follows. Section II describes the preprocessing methods tested, and Section III describes the IQA approaches used for analysis. Section IV presents the experimental evaluation and the results. Section V concludes the paper with directions for future work.

II. PREPROCESSING METHODS

Image preprocessing algorithms can be used for a wide variety of sophisticated image processing applications. Parker [1] and Gonzalez and Woods [2] provide an understanding of the nature of these algorithms that can be adopted to speed up the development of image processing. In various text extraction methods, either generalized or specialized preprocessing mechanisms are used. We focus on general methods for preprocessing, namely Thresholding, Morphology, and Blurring.

A. Thresholding

Thresholding techniques attempt to binarize a grayscale image based on pixel intensity. Thresholding provides a simple way to segment an image into foreground and background regions. A parameter called the intensity threshold determines the output produced. Depending on whether the pixel intensity is greater than or less than the threshold value, the pixel is replaced by either a white or a black pixel. Three main thresholding methods are tested in this paper: Simple Thresholding, Adaptive Thresholding, and Otsu’s Method.

Simple thresholding performs a basic scan through the pixels of the image, checking the intensity of each pixel. An intensity threshold has to be specified each time the process is performed. In the edge-based text detection method of Grover, Arora, and Mitra [3], thresholding is used to remove weak edges from the edge-detected image.

Adaptive thresholding utilizes an algorithm to perform thresholding for a small area of the input image. This allows adaptive thresholding methods to overcome the limitations regarding lighting conditions that simple thresholding methods suffer from. Simple thresholding utilizes a fixed threshold, which makes it incapable of
properly processing images with variations in illumination such as shadows, whereas adaptive thresholding can handle these variations with ease. In the paper by Shi, Setlur, and Govindaraju [4] on extracting handwritten Arabic text, adaptive thresholding is applied to obtain a binarized image that gives a rough estimate of the text line location.

Otsu’s method, developed by N. Otsu [5], assumes a bimodal histogram and performs image thresholding based on automatic clustering. It separates the pixels much better than the previous methods but fails to create a consistent result across the entire image. It exceeds the noise reduction capabilities of both adaptive and simple thresholding but is limited by image size and illumination variance.

B. Morphology

In Morphology, the image is examined using a small template known as a structuring element. A structuring element, such as a kernel, is used as a reference to compare with the corresponding pixels at all potential locations in the image. These operations are suited for use on binary images. A kernel must be specified for performing morphological operations, and it influences the nature of the results. The four morphological approaches tested in this paper are erosion, dilation, opening, and closing.

Erosion attempts to erode away the boundaries of the foreground object. Depending on the kernel’s size, all pixels next to the boundary are discarded, so the foreground object’s size decreases; that is, the white region of the image shrinks. The translation of the set A by the point x is defined in set notation as:

\[ (A)_x = \{\, c \mid c = a + x,\ a \in A \,\} \tag{1} \]

The erosion of an image I by structuring element S can be defined as:

\[ I \ominus S = \{\, x \mid (S)_x \subseteq I \,\} \tag{2} \]

Dilation, as opposed to erosion, increases the white region in an image. It is often used to increase object size. In the work of Nagabhushan and Nirmala [6], dilation is used to enhance an edge-detected image for connected component analysis, and Audithan and Chandrasekaran [7] used dilation to connect the text edges in each detail component. Dilation of an image I by structuring element S can be defined as:

\[ I \oplus S = \bigcup_{x \in S} (I)_x \tag{3} \]

Opening and Closing Morphology are derived from the previous two techniques: Opening is Erosion followed by Dilation, and Closing is Dilation followed by Erosion. Opening is useful in increasing object size as well as joining broken parts of an object. Closing helps in removing the small white points on the object surface. Morphological operations should be selected depending on the condition of the objects (text) in the image.

C. Blurring

Blurring removes high-frequency content in the image, such as noise. The three blurring methods tested in this paper are the average filter, the Gaussian filter, and the median filter.

In the average filter, the image is convolved with a normalized box filter that takes the average of all pixels under a specified kernel area and replaces the central element with that average.

The Gaussian filter uses a Gaussian kernel to perform blurring. It is able to remove Gaussian noise. The formula of a Gaussian function in two dimensions is:

\[ G(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \tag{4} \]

Here x is the distance from the origin along the horizontal axis, y is the distance from the origin along the vertical axis, and σ denotes the standard deviation of the Gaussian distribution.

The median filter takes the median of all pixels under the specified area, after which the central element is replaced by the median value. It eliminates noise while maintaining the edges. Salt-and-pepper noise is eliminated using this method. In the paper by Seeri, Giraddi, and Prasanth [8] on Kannada text extraction, noise found in the image is removed using a median filter, and in the work of Kawano, Orii, Maeda, and Ikoma [9], the median filter functions as a background estimator for the removal of noise.

III. TESTING METHODS USED

Since most devices are operated by people, quality issues are inevitable. The accuracy of text extraction is heavily influenced by the quality of the image, and image quality can be improved through preprocessing. The quality of the preprocessed images can be determined by IQA, which supports both subjective and objective evaluations. The former is the way in which humans view image quality, and the latter is based on computational models and algorithms that can analyze the image quality. In this work, we use a few full-reference methods to determine the improvement induced by the preprocessor. Full-reference methods are a type of objective IQA in which the image under test is compared with a reference image. The methods used are Pixel-Based Visual Information Fidelity (VIFP), Mean Squared Error (MSE), Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), Multi-Scale Structural Similarity Index (MS-SSIM), and Universal Image Quality Index (UQI) [10].
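
As a point of reference for the two simplest full-reference metrics used in this paper, MSE and PSNR can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names `mse`, `psnr_from_mse`, and `psnr`, the nested-list image representation, and the peak value of 255 (8-bit grayscale) are all assumptions. The assumed peak is at least consistent with the Simple Thresholding row of Table I, where an MSE of 939.53 corresponds to a PSNR of about 18.40 dB.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized grayscale images.

    Images are nested lists of pixel intensities (an assumed representation).
    """
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("images must have the same dimensions")
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr_from_mse(error, peak=255.0):
    """PSNR in dB for a given MSE; infinite when the images are identical."""
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two images, assuming the given peak intensity."""
    return psnr_from_mse(mse(reference, test), peak)


# Tiny hypothetical 2x2 example: one pixel differs by 10 intensity levels.
original = [[0, 255], [255, 0]]
processed = [[0, 255], [245, 0]]
print(mse(original, processed))   # 25.0
print(psnr(original, processed))
```

A lower MSE always yields a higher PSNR for a fixed peak, so the two metrics rank a set of preprocessed images identically; PSNR only rescales the error logarithmically.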
MSE and PSNR measure the difference between two images, and the result indicates the strength of the error between the images. From the work of Wang and Bovik [11], it can be noted that PSNR is useful when comparing images with different dynamic ranges. The disadvantage of MSE is that it does not represent human-perceived image quality. Wang and Bovik proposed UQI [10], which splits the comparison between the original and distorted images into three components: luminance, structure, and contrast. Wang et al. proposed SSIM because UQI does not correlate well with subjective assessment. Wang, Bovik, Sheikh, and Simoncelli [12] describe the basic version of SSIM, in which the structural information is similar to that of UQI. SSIM outperforms MSE and UQI but still has flaws. It does not perform well for translated, rotated, or scaled images, even when the quality of an image and that of its corresponding reference image are the same. The initial steps of MS-SSIM are similar to those of SSIM; the same steps are then repeated at various scales of the original image. Compared to SSIM, MS-SSIM performs more computation and produces better results. VIFP is another image quality assessment method, presented by Sheikh and Bovik [13]. VIFP is a full-reference IQA index that is based on natural scene statistics and involves concepts of the Human Visual System (HVS) for image information extraction.

IV. EXPERIMENTAL EVALUATION

[...] around 1000 receipts of various categories. IQA results are obtained by comparing the original and the corresponding preprocessed image. Fig. 1a shows an image of a receipt present in the data set we have used for this work. Table I shows the IQA results. For every method, a higher PSNR goes together with a lower MSE, which indicates better quality. For UQI, SSIM, MS-SSIM, and VIFP, a value of 1 means that the image and its reference are exactly the same. If the value is 0 or close to 0, then there is little or no structural similarity between the two images. From the table, it is clear that the preprocessing method referred to as opening morphology yields the best result on every metric, producing the image most similar to the original. Adaptive thresholding turns out to be the worst-performing method. However, the IQA metric VIFP does not give consistent results: based on the VIFP values, simple thresholding provides the least similar image. From a human perspective, VIFP proves to be more accurate.

TABLE I. IQA RESULTS

Preprocessing Method     UQI      SSIM     PSNR (dB)  MSE       VIFP     MS-SSIM
Simple Thresholding      0.9779   0.7764   18.40      939.53    0.0934   0.7080
Adaptive Thresholding    0.9705   0.6935   15.58      1796.17   0.2156   0.6950
Otsu’s Thresholding      0.9715   0.7946   17.48      1159.13   0.1963   0.8004
Dilation Morphology      0.9939   0.8608   23.45      293.85    0.2851   0.8618
Erosion Morphology       0.9898   0.8959   22.47      367.71    0.3567   0.8974
Opening Morphology       0.9990   0.9571   32.46      36.86     0.5410   0.9563
Closing Morphology       0.9978   0.9233   27.47      116.36    0.4245   0.9231
Average Blurring         0.9976   0.8904   26.74      137.66    0.3006   0.8899
Gaussian Blurring        0.9987   0.9398   29.88      66.84     0.4463   0.9391
Median Blurring          0.9980   0.9036   27.44      117.15    0.3428   0.9033
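
The ranking observations drawn from Table I can be checked mechanically. The following sketch transcribes the table into a Python dictionary and orders the methods from best to worst for each metric. The names `TABLE_I`, `METRICS`, `LOWER_IS_BETTER`, and `rank` are illustrative only and are not part of the original work; the only modeling assumption is that MSE is the one metric for which a lower value is better.

```python
# Values transcribed from Table I, in the order
# (UQI, SSIM, PSNR, MSE, VIFP, MS-SSIM) for each preprocessing method.
TABLE_I = {
    "Simple Thresholding":   (0.9779, 0.7764, 18.40, 939.53, 0.0934, 0.7080),
    "Adaptive Thresholding": (0.9705, 0.6935, 15.58, 1796.17, 0.2156, 0.6950),
    "Otsu's Thresholding":   (0.9715, 0.7946, 17.48, 1159.13, 0.1963, 0.8004),
    "Dilation Morphology":   (0.9939, 0.8608, 23.45, 293.85, 0.2851, 0.8618),
    "Erosion Morphology":    (0.9898, 0.8959, 22.47, 367.71, 0.3567, 0.8974),
    "Opening Morphology":    (0.9990, 0.9571, 32.46, 36.86, 0.5410, 0.9563),
    "Closing Morphology":    (0.9978, 0.9233, 27.47, 116.36, 0.4245, 0.9231),
    "Average Blurring":      (0.9976, 0.8904, 26.74, 137.66, 0.3006, 0.8899),
    "Gaussian Blurring":     (0.9987, 0.9398, 29.88, 66.84, 0.4463, 0.9391),
    "Median Blurring":       (0.9980, 0.9036, 27.44, 117.15, 0.3428, 0.9033),
}

METRICS = ("UQI", "SSIM", "PSNR", "MSE", "VIFP", "MS-SSIM")
LOWER_IS_BETTER = {"MSE"}  # every other metric improves as it increases


def rank(metric):
    """Return the method names ordered from best to worst for one metric."""
    idx = METRICS.index(metric)
    higher_is_better = metric not in LOWER_IS_BETTER
    return sorted(
        TABLE_I,
        key=lambda name: TABLE_I[name][idx],
        reverse=higher_is_better,
    )


for metric in METRICS:
    order = rank(metric)
    print(f"{metric:8s} best: {order[0]:22s} worst: {order[-1]}")
```

Run on the transcribed values, the ordering places opening morphology first for every metric, and it shows VIFP as the only metric whose lowest-ranked method is simple thresholding rather than adaptive thresholding.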