
Handwritten Document Image Binarization An Adaptive K-Means Based Approach

The document discusses an adaptive K-means based approach for binarization of degraded handwritten document images. It provides background on document image binarization techniques and related work. The proposed methodology uses K-means clustering for adaptive thresholding in 5 main steps. Experimental results on a public dataset are compared to top-performing methods to demonstrate the effectiveness of the approach.


2017 IEEE Calcutta Conference (CALCON)

Handwritten Document Image Binarization: An Adaptive K-Means Based Approach
Prithwish Jana, Soulib Ghosh, Suman Kumar Bera and Ram Sarkar
Department of Computer Science and Engineering
Jadavpur University, Kolkata, India
[email protected], [email protected], [email protected], [email protected]

978-1-5386-3745-6/17/$31.00 ©2017 IEEE

Abstract—Degraded historical document images pose many challenges for optical character recognition and word spotting, even after traditional binarization techniques have been applied. In this paper, we propose a K-means based clustering technique for adaptive binarization of degraded document images. For validation, we use the recent dataset of the handwritten counterpart of the Document Image Binarization Contest (H-DIBCO'16), comprising highly degraded handwritten document images, and report detailed results for each image. The experimental results are compared with the three top-ranked methods in the contest and with other prominent techniques in the literature. The proposed method outperforms the top-ranked contest entries on two of the four evaluation measures and remains in close competition on the other two, which supports its effectiveness.

Keywords— document image binarization, background estimation, global and adaptive thresholding, K-means, H-DIBCO.

I. INTRODUCTION

Document image binarization (DIB) classifies the pixels of a degraded input image into two classes based on gray-level intensity: foreground and background. The foreground contains the document text, represented by black (low-intensity) pixels, whereas the background is represented by white (high-intensity) pixels. DIB is an important preprocessing stage in the recognition and analysis of document images, and it supports downstream processing such as information retrieval and optical character recognition (OCR). Binarization also facilitates subsequent tasks such as detection of lines, slant, slope and skew, and word estimation. Other application areas include restoration of historical documents and signature verification. DIB remains a major research area because of the challenges posed by noise, complex backgrounds, uneven illumination and faint foreground, often caused by smear, stains, bleed-through, blur, ageing, ink seepage, etc.

The Handwritten Document Image Binarization Contest (H-DIBCO) [1] is an established and popular forum in the field of DIB. Since 2009, DIBCO and H-DIBCO have provided publicly available benchmark datasets, not only to the contest participants but also to the research community for future work in the field. This paper is organized in five sections. Section II presents a brief literature survey of binarization methods. Section III presents the proposed methodology in five major steps. Section IV reports the experimental results and verification. Finally, Section V presents the conclusions and future work.

II. RELATED WORK

The drive to improve the processing of degraded documents has led to extensive research on DIB and to the publication of a large number of binarization algorithms. The most commonly adopted techniques estimate a threshold for each pixel using a global or a local method. In global thresholding, a single threshold value is determined and applied to the whole image to separate the pixels into foreground and background. The classical clustering-based technique of Otsu [2] minimizes the intra-class variance, computed from the mean and standard deviation of a bimodal histogram. It exhaustively searches for the global threshold that minimizes the weighted sum of the class variances. Other well-known methods include the deterministic moment-preserving approach of Tsai [3], the entropy measure derived from the gray-level distribution by Johannsen and Bille [4], and the histogram entropy of Kapur et al. [5]. The main drawback of global thresholding is that it cannot handle complex backgrounds, faint foreground, bleed-through, etc. Moreover, global thresholding is not adaptive, although its computational cost is comparatively low.

In local or adaptive thresholding, no single threshold is used for the entire image. Instead, the properties of a pixel and its neighbors in a sub-image determine the threshold. Niblack [6] slides a rectangular window over the gray-level image to determine a pixel-wise threshold that adapts to the local mean and standard deviation. Gatos et al. [7] use a low-pass Wiener filter, estimate the foreground regions and interpolate the background intensities of neighboring pixels to compute a background surface. The computed background surface is combined with the original image, followed by a post-processing stage after image up-sampling for improved quality with retained stroke connectivity. Adaptive thresholding handles complex backgrounds and faint foreground to an extent, but it is computationally expensive and largely driven by manual parameter tuning.

A background separation method using efficient background surface computation is presented by Gatos et al. [8]. Moghaddam and Cheriet [9] adopt wavelet shrinkage or a time-stepping technique and use a reverse diffusion process for documents in which both sides are degraded. Drira et al. [10]
suggest applying a local PDE-based anisotropic diffusion filter to diminish the underlying corner-rounding problem and to help reinforce character discontinuities. Ramirez-Ortegon et al. [11] introduce the idea of transition pixels, characterized by an extreme transition metric computed from pixel-intensity differences in a small sub-window neighborhood, for use in adaptive gray-level threshold computation. Hu et al. [12] adopt region-based segmentation to extract hieroglyph strokes from images of degraded ancient Maya codices while preserving delicate local details. Its inherent drawback is a performance bottleneck due to its sequential nature and fine segmentation. Moran [13] adopts a learning technique to binarize using hashing-based approximate nearest-neighbor search, at the cost of voluminous training data and storage. Shen et al. [14] formulate multiclass classification as an optimization problem using binary hash codes, solving a sub-problem efficiently with either a linear program or a binary quadratic program (BQP). DIB has been a popular research topic for over three decades and continues to produce interesting results that address various limitations, which adds impetus to our motivation.

III. THE PROPOSED METHODOLOGY

The proposed method consists of five major steps that culminate in effective binarization of degraded document images. First, the average stroke-width of the candidate foreground regions is computed. The stroke-width at a pixel is defined as the thickness of the curves and lines that make up a character, measured between two opposite contour pixels along some direction, generally horizontal, vertical or diagonal. In our work, the horizontal stroke-width is considered. This value is used to select the size of the structuring element for estimating the background. Finally, the background-separated image is partitioned into uniform blocks and, depending on a certain condition, each block is binarized separately with the aid of K-means clustering.

A. Calculation of Stroke-Width in Text Regions

The average stroke-width ($SW_{avg}$) of the characters in a document image indicates whether the image contains mostly thick or mostly thin text strokes. This value is used in later parts of our work to automate the selection of the size of the structuring element in morphological operations. First, edge detection is performed on the document image using the Sobel edge detection algorithm as presented by Vincent and Folorunso [15]. Sobel's method is preferred because its use of an average filter smooths random high-intensity point noise, and edges appear thick, bright and enhanced.

The edge-detected image is henceforth denoted $I_{edge}(r, c)$. Initially, all the connected components of $I_{edge}(r, c)$ are detected. The small connected components are eliminated [16], as they correspond to isolated noise pixels in the original image and should not be counted in the computation of the stroke-width. For each large connected component, the height and width of its smallest bounding rectangle are computed. Each row of the bounding box is scanned from left to right. In each row, the horizontal distance ($d$) between two special kinds of pixels is measured. The first is an edge pixel whose immediate right neighbor is a non-edge pixel, and the second is an edge pixel whose immediate left neighbor is a non-edge pixel. This distance $d$ is taken as a single stroke-width. Finally, a histogram ($H$) of the stroke-widths ($d_i$) versus their frequencies of occurrence ($f_i$) is formed for $I_{edge}$. The average of the stroke-widths corresponding to the three highest frequencies of $H$ is taken as the average stroke-width ($SW_{avg}$). A histogram showing the stroke-width frequency distribution for Image 3 of the H-DIBCO'16 [1] dataset is provided in Fig. 1.

Fig. 1. Histogram of frequency distribution of stroke-width.

B. Background Estimation

Binarization algorithms like Otsu's [2] give the best outcome when the intensity histogram of an image has at most two sharp peaks. In general, however, such algorithms fail to handle documents with degradations like bleed-through, uneven background, background ink stains and non-uniform illumination, as well as those degraded by image artifacts. Background separation and image normalization help give such images a bimodal histogram, which makes the subsequent binarization much easier. The proposed method employs a fast and simple process for estimating an approximate background surface, $B(r, c)$, from the original grayscale image, $I(r, c)$. Initially, a structuring element $S(r, c)$ [16] of size $w \times w$ is formed with each of its elements equal to 1, as shown in Eq. (1), where $w = 2 \times SW_{avg} + 1$.

$$S(r, c) = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}_{w \times w} \qquad (1)$$

The structuring element defines the neighborhood of the pixel of interest, located at its well-defined center pixel (because it is odd-dimensional). Grayscale erosion [17] of an image finds the minimum intensity among the points in the neighborhood of the center pixel, where the neighborhood is defined by the structuring element. It is thus similar to a local-minimum operator. Consequently, this process removes all the small components and shrinks the larger components. The grayscale erosion of $I(r, c)$ by $S(r, c)$ is given by Eq. (2), where $D$ represents the domain of $S(r, c)$, and $D = [-SW_{avg}, +SW_{avg}]$.

$$E(r, c) = (I \ominus S)(r, c) = \min_{\forall i, j \in D} \{ I(r + i, c + j) - S(i, j) \} \qquad (2)$$

Grayscale dilation [17] of an image finds the maximum intensity among the points in the neighborhood of the center pixel, and returns this maximum value within the moving window of
$S(r, c)$. Thus, it is similar to a local-maximum operator. Consequently, this process probes and expands the shapes contained in its input image. The grayscale dilation of $I(r, c)$ by $S(r, c)$ is given by Eq. (3), where $D$ represents the domain of $S(r, c)$, and $D = [-SW_{avg}, +SW_{avg}]$.

$$(I \oplus S)(r, c) = \max_{\forall i, j \in D} \{ I(r + i, c + j) + S(i, j) \} \qquad (3)$$

The proposed method estimates $B(r, c)$ from $I(r, c)$ by successive morphological closing and opening operations [18] on $I(r, c)$. This removes undesirable regions without significantly affecting the remaining structures of the image. It is thus beneficial for recovering structures that are not completely destroyed by erosion or dilation, which effectively yields the approximate illumination of $I(r, c)$, i.e. $B(r, c)$.

The morphological closing of a grayscale image by a structuring element is dilation followed by erosion, using the same structuring element for both operations. Conversely, the morphological opening of a grayscale image by a structuring element is erosion followed by dilation with the same structuring element. In the proposed method, $I(r, c)$ is first passed through a morphological closing operation by $S(r, c)$. This removes small (dark) objects from the image while preserving the shape and size of bigger objects. The resulting image is then subjected to a morphological opening operation by the same structuring element $S(r, c)$. This operation fills almost all the gaps in the closed image. The image obtained by performing these two processes successively is $B(r, c)$. Fig. 2 shows three original images and their corresponding background images.

Fig. 2. Original versus corresponding background images: (a)-(c) original images and (d)-(f) background images of (a)-(c), respectively.

C. Normalization of Image

The variation in appearance induced by changes in illumination levels across the image has been a formidable problem for binarization, so it is necessary to remove this variation. After the approximate background of a document image with uneven background and non-uniform illumination has been estimated, the next step is to remove the background image, $B(r, c)$, from the original grayscale image, $I(r, c)$, by subtraction. This gives all the background pixels a nearly uniform intensity and thus yields a bimodal histogram. The lighting of the picture is consequently balanced, and the image becomes easier to binarize with considerable success. The original grayscale image is subtracted [16] from the background image obtained in the background estimation stage, so that the illumination variations in $I(r, c)$ are practically eliminated and an image $N(r, c)$ is obtained, as in Eq. (4).

$$N(r, c) = B(r, c) - I(r, c), \quad \forall r \in R, \; c \in C \qquad (4)$$

This results in an image in which the 0-side represents background and the 255-side represents foreground. So, the final background-separated image (with the 0-side representing foreground and the 255-side representing background), $BS(r, c)$, is obtained by complementing $N(r, c)$ as in Eq. (5), where $R$ and $C$ represent the height and width of the image, respectively.

$$BS(r, c) = 255 - N(r, c), \quad \forall r \in R, \; c \in C \qquad (5)$$

Fig. 3 shows the results of normalization performed on the original images given in Fig. 2.

Fig. 3. Original images of Fig. 2 in their normalized forms.

D. Quadtree-based partitioning

After background separation and image normalization, the image is free of local illumination variations, such as shadows and highlights, as well as global illumination variations. The image is thus enhanced before an adaptive thresholding algorithm is applied. The proposed method uses a quadtree-based algorithm [19] for image partitioning and thereby combines the strengths of global (high speed) and local (high accuracy) binarization methods. The task is to divide the image into a number of blocks of equal size, which are thereafter binarized individually. The proposed method takes advantage of the fact that partitioning the image minimizes the within-class variance, thus simplifying the subsequent binarization steps.

Fig. 4. Partitioning the normalized image into blocks.

Initially, the background-separated image, $BS(r, c)$, is partitioned into four rectangular sub-images by splitting both the height and the width into two equal halves. Each subdivision is again divided into four equal partitions in the same manner, by slicing at its half-height and half-width points. The quadtree partitioning is stopped at level two because beyond this the sub-images become too small and may contain only background or only foreground, which makes binarization error-prone. This partitioning is simple and very fast, and it also reduces the significant variations of gray levels within each block. Thus the process of binarization, in each of these blocks individually, is
streamlined to a large extent. Fig. 4 shows the effect of partitioning the normalized image into blocks.

E. Binarization by clustering in the individual blocks

Finally, sixteen blocks are obtained from the partitioning of the background-separated image. In this stage, we employ a multivariate K-means clustering algorithm [20] to segment each block, provided the block passes a test; otherwise, all pixel intensities of the block are set to 255 (white). For each block, the range (rng), i.e. the difference between the maximum (max) and minimum (min) intensity, the mean (avg) and the standard deviation (std) of all the pixel intensities are first measured, and the condition in Eq. (6) is evaluated. The condition $f(block_i) = 1$ indicates that $block_i$ has a very high mean and a low standard deviation, so it is reasonably concluded to be an all-background block.

$$f(block_i) = \begin{cases} 1, & \{ avg \ge (min + \frac{rng}{\alpha}) \} \text{ and } \{ std \le \beta \} \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

Accordingly, for the best binarization result, the proposed method refrains from applying the clustering algorithm to such a block, and all pixel intensities in $block_i$ are set to 255 (white). We found empirically that $\alpha$ and $\beta$ give the best results when they are in the ranges $[1.3, 1.5]$ and $[3, 5]$, respectively. Otherwise, the block is binarized through the K-means clustering algorithm. The proposed method uses the following variables for the multivariate K-means clustering in a 3×3 window:

• Intensity of each central pixel
• Mean of intensities of the 8-connected neighboring pixels of each central pixel
• Standard deviation of intensities of the 8-connected neighboring pixels of each central pixel

Values outside the non-overlapping $block_i$ bounds are computed by mirror-reflecting the intensities across the $block_i$ boundary.

Clustering algorithms are based on an index of similarity or dissimilarity between each pair of data points. K-means is an unsupervised learning algorithm for clustering, used here to decide whether each pixel is a text pixel or a non-text pixel. Its aim is to divide a sub-image into a number of clusters by identifying regions that are homogeneous in some set of local image attributes. The partition is such that samples within a cluster are highly similar (minimizing the within-cluster variation) and samples belonging to distinct clusters are highly dissimilar (maximizing the difference between clusters). Initially, the number of clusters (k) is defined. It is 2 in the present scenario, since the proposed method aims to divide the sub-image into two classes, viz. foreground (represented as black) and background (represented as white). Then the centers of the k clusters are randomly selected and the distance between each pixel and each cluster center is measured. The Euclidean distance is used for this comparison. Each pixel is assigned to the cluster whose center is nearest. After the centroids are re-estimated, each pixel is compared with the k centroids and reassigned to the nearest updated center. This process is repeated until the centers converge. Finally, the pixels of each $block_i$ (where $f(block_i)$ equals zero) are segmented into k clusters (k considered as 2 here), distinctly categorized into background and foreground.

IV. EXPERIMENTAL RESULTS AND OBSERVATION

For the test results, we have used the most recent dataset of H-DIBCO (H-DIBCO'16 [1]), the handwritten images of the DIBCO series. The contest on this publicly available dataset involved 12 diverse competing methods from nine different research groups. All the images and their respective ground truths are taken from the competition's site [21]. H-DIBCO is a recognized and prevalent forum in the DIB field. These datasets pose a challenging task, with handwritten characters accompanied by embedded noise and distortions that make them barely readable in their as-is form.

TABLE I. COMPARATIVE RESULTS IN APPLICABLE MEASURES

            Measures based on H-DIBCO'16 dataset
Method      F-Measure        F_ps                  PSNR             DRD
3rd         88.47 ± 4.45     91.71 ± 4.38          18.29 ± 3.35     3.93 ± 1.37
2nd         88.72 ± 4.68     91.84 ± 4.24 *        18.45 ± 3.41     3.86 ± 1.57 *
1st         87.61 ± 6.99     91.28 ± 8.36          18.11 ± 4.27     5.21 ± 5.28
Otsu        86.61 ± 7.26     88.67 ± 7.99          17.80 ± 4.51     5.56 ± 4.44
Sauvola     82.52 ± 9.65     86.85 ± 8.56          16.42 ± 2.87     7.49 ± 3.97
Proposed    89.08 ± 5.31 *   90.26 ± 6.15 (4th)    18.48 ± 4.22 *   4.474 ± 3.32 (3rd)

TABLE II. RESULTS OF H-DIBCO'16 IMAGES IN TERMS OF DIFFERENT MEASURING PARAMETERS

            H-DIBCO'16 Images
Measures    1      2      3      4      5      6      7      8      9      10
F-Measure   93.10  85.83  95.96  88.63  96.83  88.19  88.27  79.32  90.28  84.44
F_ps        93.64  87.96  95.84  95.12  96.58  91.06  94.74  79.31  86.90  81.47
PSNR        20.05  22.40  23.94  19.31  23.52  18.23  16.55  11.58  16.28  12.91
DRD         4.69   4.57   1.58   4.16   1.13   5.70   3.02   12.92  2.23   4.69

Two foremost objectives of any performance assessment are its usefulness in practice and its efficiency compared with other methods. Table I provides the results of the three best contending techniques in the H-DIBCO contest of 2016. Values marked with an asterisk (*) denote the first rank among all the competitors, as well as Otsu's and Sauvola's methods, in each corresponding column. For evaluation, an ensemble of measures has been used. In the H-DIBCO'16 contest, the evaluation measures F-Measure (FM), pseudo-FMeasure ($F_{ps}$), Peak Signal to Noise Ratio (PSNR) and Distance Reciprocal Distortion (DRD) metric are used [1]. Even without any post-processing step after binarization, the proposed sequence of stages beats the best competing techniques of H-DIBCO 2016 on two measures and stands in close competition with the top three on the other two. The outcomes of the proposed technique are listed alongside the contest results and compared with them. It is evident from Table I that the proposed method outperforms all the participants of H-DIBCO'16 in FM and PSNR, while it secures the third and fourth positions in DRD and $F_{ps}$, respectively. In Table II, we present the detailed results using the DIBCO-provided
evaluator applied on each image of H-DIBCO'16. For some images, the results are weaker because specific challenges need to be dealt with separately. Several original images from the H-DIBCO'16 dataset and their binarization results are shown in Fig. 5.

Fig. 5. (a) A few selected dataset images from H-DIBCO and (b) binarization results obtained using the proposed method.

V. CONCLUSION AND FUTURE WORK

We have used a multivariate K-means clustering algorithm after partitioning the background-separated, normalized image. This effectively binarizes degraded document images, making them ready for better image retrieval and downstream processing. The contribution lies in the fact that competitive results are obtained without any complex blocking or post-processing. Moreover, the method employs a simple clustering algorithm, K-means, which is not even required in all the blocks. On the dataset provided by H-DIBCO 2016, the proposed method surpasses the best contest result on two measures and is close to the top three results of the competition on the other two. However, scope for further improvement remains in recovering translucent (faint) foreground text from noisy backgrounds, which leaves room for our future work. In this regard, we may apply a hybrid approach of global and adaptive thresholding techniques as a tradeoff that could deal with specific challenges.

REFERENCES

[1] I. Pratikakis, K. Zagoris, G. Barlas and B. Gatos, "ICFHR2016 Handwritten Document Image Binarization Contest (H-DIBCO 2016)," 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, 2016, 619-623.
[2] N. Otsu, "A threshold selection method from gray-level histograms," Automatica, vol. 20, no. 1, 1975, 62-66.
[3] W. Tsai, "Moment-preserving thresholding: a new approach," Computer Vision, Graphics and Image Processing, 29 (3), 1985, 377-393.
[4] G. Johannsen and J. Bille, "A threshold selection method using information measures," 6th Int. Conf. on Pattern Recognition, Munich, Germany, 1982, 140-143.
[5] J. N. Kapur, P. K. Sahoo, A. K. C. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram," Computer Vision, Graphics, and Image Processing, vol. 29, no. 3, 1985, 273-285.
[6] W. Niblack, "An Introduction to Digital Image Processing," Strandberg Publ. Co., Birkeroed, Denmark, 1985.
[7] B. Gatos, I. Pratikakis, S. J. Perantonis, "Adaptive degraded document image binarization," Pattern Recogn. 39 (3), 2006, 317-327.
[8] B. Gatos, I. Pratikakis, S. J. Perantonis, "An Adaptive Binarization Technique for Low Quality Historical Documents," in S. Marinai, A. R. Dengel (eds), Document Analysis Systems VI, DAS 2004, Lecture Notes in Computer Science, vol. 3163, Springer, Berlin, Heidelberg, 2004.
[9] R. F. Moghaddam, M. Cheriet, "A variational approach to degraded document enhancement," IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (8), 2010, 1347-1361.
[10] F. Drira, F. Lebourgeois, H. Emptoz, "A new PDE-based approach for singularity-preserving regularization: application to degraded characters restoration," Int. J. Doc. Anal. Recognit. 15 (3), 2012, 183-212.
[11] M. Ramirez-Ortegon, E. Tapia, L. Ramirez-Ramirez, R. Rojas, E. Cuevas, "Transition pixel: a concept for binarization based on edge detection and gray-intensity histograms," Pattern Recogn. 43, 2010, 1233-1243.
[12] R. Hu, J. Odobez and D. Gatica-Perez, "Extracting Maya Glyphs from Degraded Ancient Documents via Image Segmentation," J. Comput. Cult. Herit. 10 (2), Article 10, 2017, 23 pages.
[13] S. Moran, "Learning to Project and Binarise for Hashing Based Approximate Nearest Neighbour Search," Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16), ACM, New York, NY, USA, 2016, 897-900.
[14] F. Shen, Y. Mu, Y. Yang, W. Liu, L. Liu, J. Song, and H. T. Shen, "Classification by Retrieval: Binarizing Data and Classifiers," Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17), ACM, New York, NY, USA, 2017, 595-604.
[15] O. R. Vincent, O. Folorunso, "A Descriptive Algorithm for Sobel Image Edge Detection," Proceedings of Informing Science & IT Education Conference (InSITE), 2009, 97-107.
[16] S. Mandal, S. Das, A. Agarwal and B. Chanda, "Binarization of degraded handwritten documents based on morphological contrast intensification," 2015 Third International Conference on Image Information Processing (ICIIP), Waknaghat, 2015, 73-78.
[17] I. Bloch, "Duality vs. adjunction for fuzzy mathematical morphology and general form of fuzzy erosions and dilations," Fuzzy Sets and Systems 160 (13), 2009, 1858-1867.
[18] M. Pesaresi and J. A. Benediktsson, "A new approach for the morphological segmentation of high-resolution satellite imagery," IEEE Transactions on Geoscience and Remote Sensing 39 (2), 2001, 309-320.
[19] Z. F. Muhsin, A. Rehman, A. Altameem, T. Saba, and M. Uddin, "Improved quadtree image segmentation approach to region information," The Imaging Science Journal 62 (1), 2014, 56-62.
[20] S. Na, L. Xumin, and G. Yong, "Research on k-means clustering algorithm: An improved k-means clustering algorithm," Third IEEE International Symposium on Intelligent Information Technology and Security Informatics (IITSI), 2010, 63-67.
[21] H-DIBCO 2016: ICFHR 2016 Handwritten Document Image Binarization Contest, URL: http://vc.ee.duth.gr/h-dibco2016/, last accessed: July, 2017.
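
As an illustration of the stroke-width estimation in Section III-A, the row-scanning and histogram steps might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the edge map is assumed to be already available as a binary grid (1 for edge pixels) with small components removed, whole image rows are scanned instead of per-component bounding boxes, the image border is treated as non-edge, and the names `row_stroke_widths` and `average_stroke_width` are hypothetical.

```python
from collections import Counter


def row_stroke_widths(row):
    """Horizontal distances between the end of one edge run and the start of the next."""
    widths = []
    last_run_end = None  # column of the latest edge pixel whose right neighbour is non-edge
    n = len(row)
    for c in range(n):
        if not row[c]:
            continue
        # Assumption: positions outside the row count as non-edge pixels.
        left_is_non_edge = c == 0 or not row[c - 1]
        right_is_non_edge = c == n - 1 or not row[c + 1]
        if left_is_non_edge and last_run_end is not None:
            widths.append(c - last_run_end)  # distance d, one stroke-width sample
        if right_is_non_edge:
            last_run_end = c
    return widths


def average_stroke_width(edge_map, top=3):
    """Mean of the `top` most frequent stroke-widths in the histogram H."""
    hist = Counter()
    for row in edge_map:
        hist.update(row_stroke_widths(row))
    if not hist:
        return 0.0
    most_frequent = hist.most_common(top)
    return sum(d for d, _ in most_frequent) / len(most_frequent)


# Toy edge map (hypothetical data): widths 4, 4, 3 and 5 are observed.
edge_map = [
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 1],
]
sw_avg = average_stroke_width(edge_map)
```

In the full pipeline the resulting value would be rounded to an integer before it is used as the half-width of the structuring element.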

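
The background estimation and normalization of Sections III-B and III-C (Eqs. (1) to (5)) can be sketched in plain Python as below. The sketch assumes a flat square window, so erosion and dilation reduce to a local minimum and maximum; the constant offsets $\mp S(i, j)$ of Eqs. (2) and (3) cancel within a closing or an opening and are therefore omitted. Truncating the window at the image border, clamping the result to [0, 255] and all function names are assumptions made for illustration.

```python
def _window_filter(img, half, reduce_fn):
    """Apply min or max over a (2*half+1) x (2*half+1) window, truncated at the border."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            out[r][c] = reduce_fn(
                img[i][j] for i in range(r0, r1) for j in range(c0, c1)
            )
    return out


def erode(img, half):
    return _window_filter(img, half, min)  # local-minimum operator


def dilate(img, half):
    return _window_filter(img, half, max)  # local-maximum operator


def estimate_background(img, sw_avg):
    """Closing followed by opening with a w x w window, w = 2 * sw_avg + 1."""
    closed = erode(dilate(img, sw_avg), sw_avg)   # closing: dilation, then erosion
    return dilate(erode(closed, sw_avg), sw_avg)  # opening: erosion, then dilation


def normalize(img, background):
    """Eq. (4) then Eq. (5): BS = 255 - (B - I), clamped to the 8-bit range."""
    return [
        [max(0, min(255, 255 - (b - p))) for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(img, background)
    ]


# Toy image (hypothetical data): gray background with a one-pixel dark stroke.
image = [[200, 200, 200, 50, 200, 200, 200] for _ in range(5)]
background = estimate_background(image, sw_avg=1)
separated = normalize(image, background)
```

On this toy image the closing wipes out the thin stroke, so the estimated background is flat and the normalized image has a white background with the stroke kept darker.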

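
The level-two quadtree partitioning of Section III-D and the all-background test of Eq. (6) might look as follows in Python. This is a hedged sketch: the default values `alpha=1.4` and `beta=4.0` are simply picked from inside the ranges reported in the paper, the use of the population standard deviation is an assumption (the paper does not say which estimator is used), and the function names are hypothetical.

```python
import statistics


def quadtree_blocks(rows, cols, levels=2):
    """Return (r0, r1, c0, c1) bounds of the 4**levels blocks of a rows x cols image."""
    blocks = [(0, rows, 0, cols)]
    for _ in range(levels):
        next_blocks = []
        for r0, r1, c0, c1 in blocks:
            rm, cm = (r0 + r1) // 2, (c0 + c1) // 2  # half-height and half-width points
            next_blocks += [
                (r0, rm, c0, cm), (r0, rm, cm, c1),
                (rm, r1, c0, cm), (rm, r1, cm, c1),
            ]
        blocks = next_blocks
    return blocks


def is_all_background(pixels, alpha=1.4, beta=4.0):
    """Eq. (6): True when avg >= min + rng / alpha and std <= beta."""
    lo, hi = min(pixels), max(pixels)
    avg = statistics.mean(pixels)
    std = statistics.pstdev(pixels)  # assumption: population standard deviation
    return avg >= lo + (hi - lo) / alpha and std <= beta


blocks = quadtree_blocks(8, 8)           # sixteen 2 x 2 blocks
mostly_white = [255] * 15 + [240]        # hypothetical block: bright, low spread
text_block = [255] * 8 + [30] * 8        # hypothetical block: half ink, half paper
```

Blocks for which `is_all_background` returns True would be set to 255 directly, and only the remaining blocks would be passed on to the clustering step.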
