
2021 IEEE 9th International Conference on Information, Communication and Networks (ICICN) | DOI: 10.1109/ICICN52636.2021.9674011

Non-uniform Illumination Document Image Binarization Using K-Means Clustering Algorithm

Xingxin Yang and Yi Wan
School of Information Science and Engineering, Lanzhou University, Lanzhou, China
[email protected], [email protected]

Abstract—A good binarization result is of great help to subsequent document image analysis and optical character recognition (OCR). However, non-uniform illumination document image binarization is a very challenging task due to the high variation between the document background and foreground. This paper describes a new K-Means clustering based algorithm for non-uniform illumination document image binarization. In the proposed technique, we first obtain a combined edge map by taking the intersection of Canny's edge map and the local image contrast map. We then divide the document image into small blocks and classify each block as text or non-text using the proposed algorithm. Finally, each text block is binarized using its K-Means clustering centroids. The proposed technique has been evaluated on nine non-uniform illumination document images extracted from the DIBCO datasets and on one scene light-reflection document image. Experimental results show that the proposed method achieves competitive performance against six state-of-the-art binarization algorithms.

Keywords—Document image binarization, K-Means clustering, non-uniform illumination.

I. INTRODUCTION

Document image binarization is the preprocessing stage of document image analysis and recognition and has been widely researched for decades [1-7, 16-18]. It aims at segmenting a document image into foreground (text) and background (non-text). The benefit of binarization lies in removing a complex background while retaining the original text information: after binarization, the image intensity changes from the 0-255 range in three RGB channels to 0 or 255 in a single channel. As illustrated in Fig. 1, document image binarization remains a challenging task due to aging, light reflection, and artifacts. Because of the high variation between the document background and foreground, it is hard to select a proper threshold for such document images. Fortunately, the document image binarization contest (DIBCO [13]), held annually since 2009, has concentrated efforts on these issues.

[Fig. 1. Examples of non-uniform illumination document images: (a) DIBCO2011 PR8, (b) scene document image, (c) DIBCO2017 17.]

Many related methods have been proposed in the past decades. In our view, they can be classified into four categories: global thresholding, sliding-window based local thresholding, combined methodologies, and convolutional neural networks (CNNs) [18]. Otsu's method [2] is the classical global threshold: it scans each intensity level to find the optimal threshold that maximizes the inter-class variance. It performs well when the image histogram has an obvious bimodal pattern. Local thresholding makes use of features within a sliding window of a certain size to choose a threshold for each pixel; a suitable window size is essential for removing small noisy pixels and fitting the text stroke width. For example, Sauvola's [3] and Niblack's [4] thresholds are calculated from local means and standard deviations. The Markov Random Field (MRF) model has attracted more attention for document image binarization in recent years because of Howe's related work [7]. Specifically, the energy function is constructed from the Laplacian operator and an edge detector, and it is minimized by solving Boykov's max-flow/min-cut problem [14]. The MRF model achieves very high scores in the ICDAR series challenges, but it fails on the scene document image we captured. CNNs have achieved great success in computer vision and, naturally, very high scores [18] in the DIBCO series challenges. A CNN treats binarization as a pixel-wise classification problem: it takes the document image as input, and the output feature map corresponds to the

probability that each pixel belongs to the foreground. To fine-tune the pre-trained model, the target DIBCO challenge is used as test data and the remaining ones as training data. The resulting drawback of CNN approaches is that a model trained for one DIBCO challenge is not directly suitable for another.
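To make the contrast between the global and local families concrete, the following Python sketch (our illustration, not code from [2] or [4]) binarizes the same grayscale page both ways; the file name, the window size of 25, and k = -0.2 are assumed values in the range commonly used for Niblack-style thresholds.

import cv2
import numpy as np

def otsu_binarize(gray):
    # Global threshold: one cut point maximizing inter-class variance [2].
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

def niblack_binarize(gray, window=25, k=-0.2):
    # Local threshold: T = mean + k * std over a sliding window [4].
    g = gray.astype(np.float64)
    mean = cv2.boxFilter(g, -1, (window, window))
    sq_mean = cv2.boxFilter(g * g, -1, (window, window))
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0))
    return np.where(g > mean + k * std, 255, 0).astype(np.uint8)

gray = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("otsu.png", otsu_binarize(gray))
cv2.imwrite("niblack.png", niblack_binarize(gray))

On inputs like Fig. 1, the single global threshold tends to lose either the shadowed or the reflective half of the page, which is exactly the failure mode that motivates the block-wise method of Section II.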
II. PROPOSED METHOD

This section presents the proposed document binarization method. Given an input document image Ic, a combined edge map is first obtained following Su's principle [1], which combines the Canny edge map [12] with the local image contrast. The document image is then divided into small blocks, and each block is classified as text or non-text using the proposed Algorithm 1. Thirdly, each block is binarized using its K-Means clustering centroids. Finally, Ic is converted into a white-background, black-text (foreground) pattern.

A. Edge Detection

[Fig. 2. Different edge detection maps: (a) original document image, (b) Canny's edge detection map, (c) local contrast map [17], (d) combined edge map [1].]

Canny's edge detector is very useful for detecting text stroke edges, but it often also detects non-text strokes (shown in Fig. 2). Hence, a combined edge map is applied to deal with this problem. Su's local contrast [1] is formulated as follows:

    Ca(i, j) = αC(i, j) + (1 − α)(Imax(i, j) − Imin(i, j))/255    (1)

Centered on pixel (i, j), Imax(i, j) and Imin(i, j) refer to the maximum and minimum intensity within a local window of size 3 × 3. C(i, j) is derived from Su's previous work [17] and can be formulated as:

    C(i, j) = (Imax(i, j) − Imin(i, j))/(Imax(i, j) + Imin(i, j) + ε)    (2)

ε is an infinitesimally small positive number, which we set to 0.0001. α is the trade-off factor between the first part (local contrast) and the second part (local gradient); it is formulated as α = (Std/128)^γ, where Std denotes the standard deviation of the image intensity. According to Su's study, γ should be set around 1 to obtain a more robust binarization result; here we set it to 0.3. Otsu's global threshold method [2] is then used to select the stroke edges from the contrast map. Finally, the intersection of the Canny edge map and the selected stroke edges is taken as the final text stroke edge map.
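The edge-detection stage can be sketched as below; this is our reading of (1) and (2), with assumed Canny thresholds (50, 150) and a min-max normalization before Otsu that the paper does not spell out.

import cv2
import numpy as np

def combined_edge_map(gray_u8, gamma=0.3, eps=1e-4):
    gray = gray_u8.astype(np.float64)
    kernel = np.ones((3, 3), np.uint8)
    i_max = cv2.dilate(gray, kernel)   # local maximum in a 3x3 window
    i_min = cv2.erode(gray, kernel)    # local minimum in a 3x3 window
    c = (i_max - i_min) / (i_max + i_min + eps)                # equation (2)
    alpha = (gray.std() / 128.0) ** gamma                      # trade-off factor
    ca = alpha * c + (1 - alpha) * (i_max - i_min) / 255.0     # equation (1)
    # Otsu's method on the contrast map selects candidate stroke edges.
    ca_u8 = cv2.normalize(ca, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, stroke = cv2.threshold(ca_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Intersection with the Canny edge map keeps text stroke edges only.
    canny = cv2.Canny(gray_u8, 50, 150)
    return cv2.bitwise_and(stroke, canny)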
B. Our Binarization Process

[Fig. 3. Document image binarization process: divide the document image into small blocks first, then binarize it block by block.]

K-Means is a popular clustering algorithm [15]. Since a document image contains only two categories, text (foreground) and background (non-text), we just need to divide the pixels into two clusters. The binarization process is stated as follows.

Given a document image I(H, W) (H and W denote the height and width of the image, respectively), we obtain its corresponding edge map E(H, W) using Su's method as described in the previous subsection. The next step is dividing the document image and the edge map into small blocks; block size selection is discussed in Section III. As shown in Fig. 3, some blocks contain text while others do not. The initial block Irc is defined as the one that contains the most stroke edge pixels, and for that reason it is considered a text block. As illustrated in Fig. 3, centered at block Irc (block 1), all the blocks are binarized outward. Each block is binarized using the K-Means algorithm with two clustering centroids CTij0 and CTij1; pixels belonging to the lower centroid are labeled 0, the others are labeled 1. The two clustering centroids, combined with the number of edge pixels within the current block, are very useful for distinguishing text from non-text blocks. In detail, the current block is considered a text block when it satisfies inequality group (3):

    (CTij1 − CTij0)/(CTij1 + CTij0) > β1 (ICij1 − ICij0)/(ICij1 + ICij0)
    (CTij1 − CTij0)/(CTij1 + CTij0) > β2                                    (3)
    Eij ≥ 1

where ICij denotes the mean clustering centroids of the adjacent text blocks, more specifically of the adjacent text blocks with the smallest Euclidean distance to the current block, and Eij denotes the number of stroke edge pixels in the current block. Assuming the illumination changes gradually, the first inequality of (3) requires the relative centroid difference of the current block to be larger than β1 times that of the adjacent text blocks. The second inequality requires the relative difference of the clustering centroids to be larger than a constant value β2. The last inequality requires a text block to contain at least one stroke edge pixel. Parameters β1 and β2 are discussed in Section III.
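A minimal sketch of the block test, assuming grayscale blocks and the parameter values reported in Section III; the helper names below are ours.

import numpy as np
from sklearn.cluster import KMeans

def block_centroids(block):
    # Two-cluster K-Means over the pixel intensities of one block; returns
    # the sorted centroids (CT0 <= CT1) and a 0/1 label map for the block.
    km = KMeans(n_clusters=2, n_init=10).fit(block.reshape(-1, 1).astype(float))
    order = np.argsort(km.cluster_centers_.ravel())
    ct0, ct1 = km.cluster_centers_.ravel()[order]
    labels = (km.labels_ == order[1]).astype(np.uint8).reshape(block.shape)
    return ct0, ct1, labels

def is_text_block(ct0, ct1, ic0, ic1, edge_pixels, beta1=0.43, beta2=0.038):
    # Inequality group (3): ic0/ic1 are the mean centroids of the nearest
    # adjacent text blocks, edge_pixels the stroke edge count of this block.
    rel = (ct1 - ct0) / (ct1 + ct0)
    rel_adj = (ic1 - ic0) / (ic1 + ic0)
    return rel > beta1 * rel_adj and rel > beta2 and edge_pixels >= 1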

Algorithm 1: Document image binarization process
Input: Document image I(H, W) and corresponding edge map E(H, W)
Output: The binarization result of I(H, W)
1 Divide I(H, W) and E(H, W) into small blocks, Iij and Eij respectively;
2 Choose the block Irc that contains the most stroke edge pixels as the initial text block;
3 Calculate the two clustering centroids CTij0 and CTij1, then label the pixels in the block (pixels belonging to the smaller clustering centroid are labeled 0, the others 1);
4 Centered on block Irc, expand outward; the initial clustering centroids are the mean centroid values of the adjacent text blocks;
5 The current block is considered a text block when it meets (3); binarize the current block using (4):
6     Iij = Labelij if (3) is met, otherwise 255;    (4)
7 Repeat steps 3, 4, 5 until all the blocks are binarized;
As shown in Fig. 3, block 1 is chosen as the initial text block under the principle of containing the most stroke edge pixels. The next step is to binarize block 2; its adjacent text block is block 1, and according to (3) it should be a non-text block. Note that the adjacent text block of block 3 is block 1 only, because block 2 has already been classified as non-text. Blocks 3-9 are binarized clockwise, and then the next ring is binarized clockwise. The whole binarization process is presented in Algorithm 1.
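The traversal itself might look like the following sketch; classify_block and binarize_block stand in for inequality (3) and the K-Means labeling above, and the simple ring scan only approximates the clockwise order of Fig. 3.

import numpy as np

def ring_order(seed, n_rows, n_cols):
    # Yield block coordinates ring by ring around the seed block.
    r0, c0 = seed
    max_ring = max(r0, n_rows - 1 - r0, c0, n_cols - 1 - c0)
    for ring in range(1, max_ring + 1):
        for r in range(r0 - ring, r0 + ring + 1):
            for c in range(c0 - ring, c0 + ring + 1):
                on_ring = max(abs(r - r0), abs(c - c0)) == ring
                if on_ring and 0 <= r < n_rows and 0 <= c < n_cols:
                    yield r, c

def binarize_blocks(img_blocks, edge_blocks, classify_block, binarize_block):
    # img_blocks/edge_blocks: 2-D lists of tiles. The seed is the block with
    # the most stroke edge pixels (step 2 of Algorithm 1).
    n_rows, n_cols = len(img_blocks), len(img_blocks[0])
    counts = np.array([[np.count_nonzero(e) for e in row] for row in edge_blocks])
    seed = np.unravel_index(np.argmax(counts), counts.shape)
    result = {seed: binarize_block(img_blocks[seed[0]][seed[1]])}
    for r, c in ring_order(seed, n_rows, n_cols):
        tile = img_blocks[r][c]
        if classify_block(r, c, result):   # inequality group (3)
            result[(r, c)] = binarize_block(tile)
        else:                              # equation (4): pure background
            result[(r, c)] = np.full(tile.shape, 255, dtype=np.uint8)
    return result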
C. Convert Labels

As discussed in the previous subsection, pixels belonging to the lower clustering centroid are labeled 0, which means they are considered text, while the others are considered background. This is true in most cases, because we prefer dark text on a light background. But documents often contain different kinds of degradation, such as stains, ink bleed-through, and artifacts. Some documents contain a light background and dark text while others contain a dark background and light text; some documents are even a combination of the two. As illustrated in Fig. 4, pixels are mislabeled under such circumstances: the black text is darker than the red background, while the yellow text is brighter than the red background. We correct these mislabeled blocks by Algorithm 2; parameter θ is discussed in Section III.

Algorithm 2: Convert labels
Input: The output of Algorithm 1, I(H, W), divided into the same blocks as in Algorithm 1
Output: Final binarization result
1 foreach block in I(H, W) do
2     Calculate the mean intensity value Mij of the current block;
3     If (5) is met: Labelij = 255 − Labelij;
4 end

    |CTij0 − Mij| > θ|CTij1 − Mij|    (5)

Here Mij denotes the mean pixel value of the adjacent non-text blocks, and CTij are the clustering centroids of the current block: CTij0 is the mean pixel value of the lower cluster and CTij1 that of the higher cluster. Inequality (5) derives from comparing the clustering centroids of the current block with the mean pixel value of the adjacent non-text blocks. Considering that the background of adjacent blocks does not change heavily, the background clustering centroid should be similar to the mean intensity of the surrounding non-text areas. As a result, the absolute difference between CTij1 and Mij should be larger than the absolute difference between CTij0 and Mij; parameter θ is the trade-off factor.

[Fig. 4. (a) Original document image cropped from Fig. 3; (b) binarization result using Algorithm 1; (c) binarization result after converting labels.]

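Read literally, Algorithm 2 combined with inequality (5) reduces to the small check below (our sketch; m is the reference mean intensity Mij, and θ = 1.5 as selected in Section III).

import numpy as np

def convert_labels(block, ct0, ct1, m, theta=1.5):
    # block: the 0/255 output of Algorithm 1 for one block. If the lower
    # centroid CT0 sits farther from the reference mean m than theta times
    # the distance of the higher centroid CT1, flip foreground/background.
    if abs(ct0 - m) > theta * abs(ct1 - m):
        return 255 - block
    return block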
[Fig. 5. (a) Original document image selected from DIBCO2011; the others are the binarization results using (b) Otsu's, (c) Niblack's, (d) Sauvola's, (e) Azad's, (f) Xiong's, (g) Howe's, and (h) our algorithm, respectively.]

TABLE I. Experimental results on nine pictures extracted from the DIBCO datasets. P denotes Precision, R denotes Recall; P∗, R∗, and F-Measure∗ denote the pseudo counterparts [8].

Methods       P      R      F-Measure   P∗     R∗     F-Measure∗
Otsu [2]      95.00  80.39  86.80       94.00  91.41  92.37
Niblack [4]   26.13  75.35  36.02       26.06  92.57  38.20
Sauvola [3]   97.74  72.91  82.78       97.40  87.49  91.59
Azad [5]      91.52  55.80  65.34       90.81  59.97  67.98
Xiong [6]     94.34  94.37  94.25       93.15  96.85  94.85
Howe [7]      96.41  92.94  94.52       95.55  95.27  95.20
Ours          95.96  83.62  89.20       94.98  94.40  94.48
[Fig. 6. Pseudo F-Measure [8] on Fig. 1(a) for different values of (a) β1 and (b) β2.]
III. EXPERIMENTS AND DISCUSSION

Two experiments are designed to demonstrate the effectiveness of the proposed method. Before that, we evaluate the performance of the proposed method on one document image (DIBCO2011 PR8) to select the optimal values of the aforementioned parameters. We then compare the proposed method with six other state-of-the-art methods [2]-[7] on nine pictures extracted from the DIBCO datasets. Finally, we test and compare the proposed method against the others on one scene light-reflection document image.

A. Parameter Selection

As described in the previous section, the block size is selected as 20 × 15. We mainly follow two principles in block size selection. Firstly, the block size is restricted by the stroke width: it should be larger than the largest stroke width in the document image. Secondly, in consideration of the label conversion described in Section II-C, the block size should be smaller than the text line spacing. What's more, our experiments show that a larger block size can accelerate the binarization process.

β2 is a global threshold, meaning the relative centroid difference should be higher than a certain value; we take Fig. 1(a) as an example to find the best β2. In this setting, β1 is fixed at 0. As shown in Fig. 6(b), the method achieves better pseudo F-Measure scores when β2 ranges from 0.038 to 0.048. To find the best β1, β2 is then fixed at 0.038; the test results are shown in Fig. 6(a). As a result, β1 is set to 0.43 and β2 to 0.038 for all our experimental degraded images. For the same reason, θ is selected as 1.5 in our experiments.
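The selection procedure just described can be scripted as a simple two-stage sweep; binarize and pseudo_f_measure are assumed callables (the latter implementing the metric of [8]), and the sweep ranges are our guesses around the reported optima.

import numpy as np

def tune_betas(binarize, image, ground_truth, pseudo_f_measure):
    # Sweep beta2 with beta1 fixed at 0, then sweep beta1 with the best
    # beta2 fixed, mirroring the selection procedure on DIBCO2011 PR8.
    best_b2 = max(np.arange(0.01, 0.10, 0.002),
                  key=lambda b2: pseudo_f_measure(
                      binarize(image, beta1=0.0, beta2=b2), ground_truth))
    best_b1 = max(np.arange(0.0, 1.0, 0.01),
                  key=lambda b1: pseudo_f_measure(
                      binarize(image, beta1=b1, beta2=best_b2), ground_truth))
    return best_b1, best_b2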
B. Experiments on DIBCO Datasets

There are only nine non-uniform illumination document images in the DIBCO datasets, and our proposed algorithm is compared with six other state-of-the-art methods on these nine images. The Otsu, Niblack, and Sauvola algorithms are classical binarization algorithms; Azad's method is based on a Bi-Directional ConvLSTM U-Net and was evaluated on the 2016 DIBCO dataset [10] (the experiments can be found in [18]); Howe's is the winning method of the 2011 DIBCO competition [9]; Xiong's is the winning method of the 2018 H-DIBCO competition [11]. Experimental results are shown in Table I. We evaluate our algorithm and the other methods in terms of Precision, Recall, F-Measure, pseudo Precision, pseudo Recall, and pseudo F-Measure [8], the last being a compromise between pseudo-Precision and pseudo-Recall.

As shown in Table I, our proposed method achieves competitive scores in P, P∗, R∗, and F-Measure∗ but falls behind in R and F-Measure. P∗, R∗, and F-Measure∗ are the measurement methods proposed by Ntirogiannis [8]. The usual ground truth is a white-and-black pattern, but Ntirogiannis uses a weighted ground truth that attributes to each text pixel its distance from the text contour. Compared with the F-Measure, the pseudo F-Measure makes use of the contour of the ground truth to preserve the completeness of single words, which is more essential for OCR.

C. Experiments on Scene Document Image

We also evaluate ours and the other aforementioned state-of-the-art methods on a scene document image. As we can see in Fig. 7, this document image is degraded by non-uniform illumination and light reflection. What's more, compared with the background, the image contains both lighter and darker foreground. The block size is selected as 20 × 15. The experiment is shown in Fig. 7. Our algorithm clearly achieves the best result on this image. Because all the text regions in the DIBCO datasets are darker than the adjacent background, algorithms [6] and [7] do not take lighter text into consideration; as a result, their binarization results contain both bright and dark text.

[Fig. 7. (a) Original document image; the others are the binarization results using (b) Otsu's, (c) Niblack's, (d) Sauvola's, (e) Azad's, (f) Xiong's, (g) Howe's, and (h) our algorithm, respectively.]

IV. CONCLUSION

In this paper, we present a new document image binarization method based on the K-Means clustering algorithm, in which text and non-text blocks are separated by the difference of their clustering centroids. Our proposed algorithm achieves competitive scores on nine non-uniform illumination document images from the DIBCO datasets and significantly outperforms other state-of-the-art document image binarization methods on the captured scene light-reflection document image. However, other degradations, such as page stains and ink bleed-through, remain challenging for our algorithm.

REFERENCES

[1] B. Su, S. Lu and C. L. Tan, "Robust Document Image Binarization Technique for Degraded Document Images," IEEE Trans. Image Processing, vol. 22, no. 4, pp. 1408-1417, Apr. 2013, doi: 10.1109/TIP.2012.2231089.
[2] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, Jan. 1979.
[3] J. Sauvola and M. Pietikainen, "Adaptive document image binarization," Pattern Recognition, vol. 33, no. 2, pp. 225-236, 2000.
[4] W. Niblack, An Introduction to Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1986, pp. 115-116.
[5] R. Azad, M. Asadi-Aghbolaghi, M. Fathy and S. Escalera, "Bi-Directional ConvLSTM U-Net with Densely Connected Convolutions," in Proc. IEEE/CVF Int. Conf. Computer Vision Workshops, Seoul, Korea (South), 2019, pp. 406-415.
[6] W. Xiong, X. Jia, J. Xu, Z. Xiong, M. Liu and J. Wang, "Historical document image binarization using background estimation and energy minimization," in Proc. Int. Conf. Pattern Recognition, Beijing, China, 2018, pp. 3716-3721.
[7] N. R. Howe, "A Laplacian Energy for Document Binarization," in Proc. Int. Conf. Document Analysis and Recognition, Beijing, China, 2011, pp. 6-10.
[8] K. Ntirogiannis, B. Gatos and I. Pratikakis, "Performance Evaluation Methodology for Historical Document Image Binarization," IEEE Trans. Image Processing, vol. 22, no. 2, pp. 595-609, Feb. 2013, doi: 10.1109/TIP.2012.2219550.
[9] I. Pratikakis, B. Gatos and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in Proc. 11th Int. Conf. Document Analysis and Recognition (ICDAR), 2011, pp. 1506-1510.
[10] I. Pratikakis, K. Zagoris, G. Barlas and B. Gatos, "ICFHR 2016 handwritten document image binarization contest (H-DIBCO 2016)," in Proc. 15th Int. Conf. Frontiers in Handwriting Recognition (ICFHR), 2016, pp. 619-623.
[11] I. Pratikakis, K. Zagoris, P. Kaddas and B. Gatos, "ICFHR 2018 Competition on Handwritten Document Image Binarization (H-DIBCO 2018)," in Proc. 16th Int. Conf. Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, 2018, pp. 489-493.
[12] J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, Nov. 1986, doi: 10.1109/TPAMI.1986.4767851.
[13] B. Gatos, K. Ntirogiannis and I. Pratikakis, "ICDAR 2009 Document Image Binarization Contest (DIBCO 2009)," in Proc. 10th Int. Conf. Document Analysis and Recognition, Barcelona, 2009, pp. 1375-1382.
[14] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124-1137, Sept. 2004, doi: 10.1109/TPAMI.2004.60.
[15] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman and A. Y. Wu, "An efficient k-means clustering algorithm: analysis and implementation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, July 2002, doi: 10.1109/TPAMI.2002.1017616.
[16] P. Jana, S. Ghosh, S. K. Bera and R. Sarkar, "Handwritten document image binarization: An adaptive K-means based approach," in Proc. IEEE Calcutta Conference (CALCON), Kolkata, 2017, pp. 226-230.
[17] B. Su, S. Lu and C. L. Tan, "Binarization of historical document images using the local maximum and minimum," in Proc. 9th IAPR Int. Workshop on Document Analysis Systems, Jun. 2010, pp. 159-166.
[18] https://github.com/rezazad68/BCDUnet_DIBCO

