Keywords: SIFT, Euclidean distance, classification, k-nearest neighbor, Bag of Visual Words.
1. Introduction
Recognition is a central problem of learning visual categories and classifying new instances into those categories. Vision tasks rely almost entirely on the capability to identify objects, scenes, and categories. Visual recognition has many applications that touch several areas of artificial intelligence and information retrieval, e.g. content-based image retrieval, data mining, and object identification for mobile robots [1].
Content-based image retrieval (CBIR) makes it possible to search for and classify images. Images can be analyzed based on their features (such as color, texture, shape, or edges). Keypoints are salient image patches that contain rich local information about an image, and they can be detected automatically using various detectors [2].
Local features have been widely used; the most well-known local feature detection and description approaches are Speeded Up Robust Features (SURF) and the Scale Invariant Feature Transform (SIFT). To find images similar to a query image, all image feature descriptors must be compared using some distance measure. The Bag of Words (BoW) method has gained popularity. In the BoW method, image features are clustered and histograms (counts of feature occurrences) are created from the feature descriptors. All the obtained histograms of descriptors must then be compared [3].
The Bag of Visual Words (BoVW) model in computer vision represents an image as visual words. The concept of BoVW is taken from the idea of Bag of Words (BoW) in text document processing; these techniques for text classification are therefore readily applicable to the problem of image classification [4].
The remainder of this paper is organized as follows: Section 2 reviews existing work on image classification and Bag of Words. Section 3 presents the concept of the Scale Invariant Feature Transform method. Section 4 presents the general concept of clustering and the k-means clustering algorithm. Section 5 presents the Euclidean distance metric used in the comparison of corresponding features. Section 6 presents the K-Nearest Neighbor classification algorithm. Section 7 presents the Bag of Visual Words approach. Section 8 presents image classification based on the Bag of Visual Words algorithm. Section 9 presents the images of interest and the experimental results obtained when running the algorithm on unlabeled images. Section 10 presents the conclusions of this work.

2. Related Work
Various surveys of image classification using BoVW can be found in the literature; below are some of the works most related to this paper:
1. In 2007, Jun Yang, Yu-Gang Jiang, Alexander Hauptmann, and Chong-Wah Ngo used text categorization steps to create different representations of visual words and studied their impact on classification performance on the TRECVID and
Abdul Amir Abdullah Karim
PASCAL collections. The empirical study gives a basis for representing visual words in a way that is likely to yield high classification performance [2].
2. In 2012, Mingyuan Jiu, Christian Wolf, Christophe Garcia, and Atilla Baskurt offered a novel method for learning a supervised codebook and optimizing the bag-of-words approach. The proposed approach preserves or even improves the discriminative power of an unsupervised codebook while reducing the size of the learned codebook. Codebook learning and the recognition process are integrated so that the cluster centers are updated through back-propagated errors: one variant is based on classical error backpropagation. The drawback of a gradient descent algorithm applied to a nonlinear system is that it is difficult to learn a set of optimal parameters; the algorithm mostly converges to local minima and sometimes even diverges. The other variant is based on a cluster reassignment algorithm, which adjusts the cluster centers indirectly by rearranging the cluster labels of all the feature vectors. It needs more iterations to converge to a better solution [5].
3. In 2015, Marcin Korytkowski, Rafał Scherer, Paweł Staszewski, and Piotr Woldan presented a method to classify and retrieve visual words using a novel relational database architecture. This work created a special database indexing algorithm that significantly speeds up answering visual query-by-example SQL queries in relational databases. The proposed method was tested on three classes of visual objects, divided into learning and testing examples; the testing set consists of 15% of the images in the whole dataset. Local keypoints were generated before the learning procedure for all images using the SIFT algorithm. All simulations were performed on a hyper virtual machine [3].

3. Scale Invariant Feature Transform (SIFT)
SIFT is a local feature detection and description algorithm that is able to provide stable points for image matching. SIFT is a popular algorithm for detecting important points that are invariant to image translation, image rotation, image scaling, and variations in image lighting. SIFT is a patented algorithm with a dense processing cost that makes it slow [6].
SIFT is composed of four main stages: (a) scale-space extrema detection, (b) keypoint localization, (c) orientation assignment, and (d) keypoint description. The first step defines the locations and scales of the interest points using the extrema of the scale space of the DoG (Difference of Gaussians) functions with various values of σ. Different scales of the image are created using different values of σ in the Gaussian function (the σ of consecutive scales is separated by a constant factor k); consecutive images are then subtracted to create the DoG pyramid. The DoG is used instead of the Laplacian of Gaussian to increase processing speed. After that, the Gaussian image is downsampled by 2 and a DoG is created for the downsampled image. The Gaussian function is shown in equation (1) and the DoG in equation (2) [7][8].

G(x, y, σ) = (1 / (2πσ²)) exp[−(x² + y²) / (2σ²)] ............... (1)

Where
G(x, y, σ) represents a Gaussian with changing scale,
σ represents the scale variable of the consecutive scale spaces,
x represents the horizontal coordinate in the Gaussian window,
y represents the vertical coordinate in the Gaussian window,
π ≈ 3.14.

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) ............... (2)

Where
* represents the convolution operation,
k represents the scaling factor,
G(x, y, σ) represents the changing-scale Gaussian function,
I(x, y) represents the input image,
D(x, y, σ) represents the Difference of Gaussians at k times the scale,
x represents the horizontal coordinate in the image I(x, y) with its corresponding horizontal coordinate in the Gaussian window G(x, y, σ),
y represents the vertical coordinate in the image I(x, y) with its corresponding vertical coordinate in the Gaussian window G(x, y, σ).
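As a rough illustration of equations (1) and (2), the sketch below builds one octave of a DoG pyramid. SciPy's `gaussian_filter` stands in for an explicit convolution with the kernel of equation (1), and the parameter values (σ = 1.6, k = √2, five scales per octave) are illustrative assumptions rather than values fixed by the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, k=np.sqrt(2), scales=5):
    """One octave of the DoG pyramid.

    Blur the image at increasing scales sigma, k*sigma, k^2*sigma, ...
    (equation (1)), then subtract consecutive blurred images to get
    D(x, y, sigma) = (G(x, y, k*sigma) - G(x, y, sigma)) * I  (equation (2)).
    """
    blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
               for i in range(scales)]
    return [blurred[i + 1] - blurred[i] for i in range(scales - 1)]

def next_octave(image):
    """Downsample by 2 before building the next octave, as described above."""
    return image[::2, ::2]
```

Keypoint candidates would then be taken at the local extrema of these DoG images across both space and scale.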
Al-Nahrain Journal of Science Vol.21 (4), December, 2018, pp.76-82
unlabeled image by relating the unlabeled image's features to the labeled features, depending on a distance function (equation (4)) or a similarity measure. In k-nearest neighbor, a test sample is allocated to the class that occurs most frequently among its k nearest training samples. If two or more such classes exist, then the test sample is assigned to the class with the minimum average distance to it [14].

7. Bag of Visual Words (BoVW)
An image has keypoints or local features, identified as prominent image regions with rich local information (such as color or texture); these features can be detected using different detection and description methods. The detected features are then split into a number of clusters using the k-means clustering algorithm, where each cluster contains features with similar descriptors, and each keypoint is encoded by the index of the cluster to which it belongs; this is called the vector quantization (VQ) technique [2].
VQ can be considered a generalization of scalar quantization to the quantization of a vector. The VQ encoder encodes a given set of k-dimensional data vectors with a much smaller subset. The subset C is called a codebook and its elements Ci are called codewords, codevectors, reproducing vectors, prototypes, or design samples. The most commonly used vector quantizers are based on the nearest neighbor, called Voronoi or nearest-neighbor vector quantizers [13].
Each cluster is represented by a visual word that captures the specific local pattern shared by the keypoints in that cluster, so a visual word vocabulary describes all types of local patterns in the images. An image can then be represented as a bag of visual words, or in other words, as a visual-word vector containing the number (weight) of each visual word in the image (i.e., the number of keypoints in the corresponding cluster), which in a classification task can be used as a feature vector [2].
The BoVW approach in general creates supervised classifiers that depend on visual words taken from labeled images to predict the label of a new image. The clustering method therefore creates a visual word vocabulary to describe the different local patterns in the images. The number of clusters defines the size of the visual vocabulary, which can vary from hundreds to more than tens of thousands. By mapping its keypoints to visual words, each image can be represented as a "bag of visual words" [2].
The BoVW of a new unlabeled image is calculated in a similar way: local features are extracted from the image and described, these descriptors are projected onto the dictionary calculated previously from the training set, and a histogram of the occurrences of each visual word of the dictionary is computed [5].

8. Proposed Algorithm
The proposed algorithm for image classification using bag of visual words can be described by two main algorithms, a training algorithm and a testing algorithm, as follows:

Training Algorithm
Input (collection of images)
Output (k clusters, k visual words)
Step 1: Collect a set of images for each class of interest (in this paper the experimental classes of interest are Car, Motorbike, and Ship).
Step 2: Apply BoVW to the collected images. BoVW consists of three main steps:
1. Extract keypoints from the images using the SIFT feature detection and description algorithm.
2. Create a descriptor for each extracted keypoint.
3. Cluster the features using the k-means clustering algorithm (create the visual vocabulary by vector quantization of the descriptor space) and save the resulting "visual words".

Testing Algorithm
Input (k visual words)
Output (labeled image)
Step 1: Open a new unlabeled image.
Step 2: Extract and describe the features of the unlabeled image using SIFT.
Step 3: Extract the visual words (centroids) for the testing image.
Step 4: Calculate the nearest neighbor using the Euclidean distance between the visual words of the tested image and the visual words of the training images.
Step 5: Make the decision: compare the extracted features of the unlabeled image with the visual words extracted in the training stage.
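The training and testing algorithms above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes SIFT descriptors have already been extracted for each image (e.g. with OpenCV, not shown), and substitutes a plain NumPy Lloyd's k-means for a library clustering routine.

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means over feature descriptors; the k cluster
    centers returned play the role of the "visual words"."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k,
                                     replace=False)].astype(float)
    for _ in range(iters):
        # assign each descriptor to its nearest center (Euclidean distance)
        labels = np.linalg.norm(descriptors[:, None] - centers[None],
                                axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bovw_histogram(descriptors, words):
    """Map each descriptor of one image to its nearest visual word and
    count the occurrences (the bag-of-visual-words vector)."""
    labels = np.linalg.norm(descriptors[:, None] - words[None],
                            axis=2).argmin(axis=1)
    hist = np.bincount(labels, minlength=len(words)).astype(float)
    return hist / hist.sum()   # normalize so image size does not matter

def classify(test_hist, train_hists, train_labels):
    """Steps 4-5: the training histogram at minimum Euclidean distance
    from the tested image's histogram gives the predicted class."""
    dists = [np.linalg.norm(test_hist - h) for h in train_hists]
    return train_labels[int(np.argmin(dists))]
```

In practice the training histograms would be computed once per labeled image and stored; classification of a new image then needs only one feature extraction, one projection onto the vocabulary, and a nearest-neighbor search.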
distance with each cluster; the minimum distance determines the class of the tested image.

Table (2)
K-Nearest Neighbor.
Original image | Car class | Ship class | Motorbike class
Car            | 14        | 154       | 154
Car            | 30        | 154       | 17
Car            | 18        | 30        | 154
Car            | 10        | 16        | 19
Car            | 154       | 574       | 518
Ship           | 21        | 15        | 24
Ship           | 24        | 11        | 51
Ship           | 17        | 9         | 18
Ship           | 18        | 30        | 37
Ship           | 15        | 18        | 23
Motorbike      | 8         | 44        | 7
Motorbike      | 23        | 154       | 17
Motorbike      | 14        | 16        | 6
Motorbike      | 9         | 16        | 9
Motorbike      | 16        | 21        | 17
Motorbike      | 14        | 17        | 11

The values of sensitivity and specificity are limited between 0 and 1; they are ratios of correct classification, and high values of sensitivity and specificity indicate good method performance.

10. Conclusion
The bag of visual words (BoVW) technique is an efficient image representation for the classification task. In this paper there are two main stages: the first is the training stage and the second is the testing stage, and each stage has a number of steps. In general, the first stage creates a visual vocabulary from the training images. The information extracted in the first stage is used to classify new unlabeled images based on the bag of features created by applying the supervised BoVW approach to the set of training images. This approach gives very good results even though a small number of images is used in the training process.

References
[1] Kristen G., "Visual Object Recognition", thesis, 2010.
[2] Jun Y., Yu-Gang J., Alexander H., Chong-Wah N., "Evaluating Bag-of-Visual-Words Representations in Scene Classification", Proceedings of the International Workshop on Multimedia Information Retrieval, vol. 2, pp. 197-206, 2007.
[3] Marcin K., Rafał S., Paweł S., Piotr W., "Bag-of-Features Image Indexing and Classification in Microsoft SQL Server Relational Database", IEEE, 46, 746-751, 2015.
[4] Pornntiwa P., Emmanuel O., Olarik S., Lambert S., Marco W., "Comparing Local Descriptors and Bags of Visual Words to Deep Convolutional Neural Networks for Plant Recognition", 6th International Conference on Pattern Recognition Applications and Methods, 1, 886-893, 2017.
[5] Mingyuan J., Christian W., Christophe G., Atilla B., "Supervised Learning and Codebook Optimization for Bag-of-Words Models", Springer Science Business Media, 4, 409-419, 2012.
[6] Yi H., Guohua D., Yuanyuan W., Ling W., Jinsheng Y., Xiqi L., Yudong Z., "Optimization of SIFT algorithm for fast-image feature extraction in line-scanning ophthalmoscope", Optik Journal, 152, 21-28, 2017.
[7] El-gayar M., Soliman H., Meky N., "A comparative study of image low level feature extraction algorithms", Egyptian Informatics Journal, 14, 175-181, 2013.
[8] Panchal P., Panchal S., Shah S., "A Comparison of SIFT and SURF", International Journal of Innovative Research in Computer and Communication Engineering, 1, 323-327, 2013.
[9] Jian W., Zhiming C., Victor S., Pengpeng Z., Dongliang S., Shengrong G., "A Comparative Study of SIFT and its Variants", Measurement Science Review, 13, 122-131, 2013.
[10] Pedro J., "Contribution to the completeness and complementarity of Local Image Features", thesis, 2013.
[11] Soumyadeep G., Tejas I., Rohit K., Richa S., Mayank V., "Feature and Keypoint Selection for Visible to Near-infrared Face Matching", International Conference on