Spatial Pyramid Matching for Scene Category Recognition
Abstract
In this work we present a method for scene category recognition based on matching approximate global correspondences, as proposed by Lazebnik et al. [1]. The method partitions an image into increasingly fine sub-regions and computes histograms of local features over those sub-regions. The resulting spatial pyramid is an extension of the orderless bag-of-features representation of an image. The approach shows a significant improvement on scene categorization tasks and is one of the best-performing methods for object recognition on the Caltech-101 database. Our work focuses on applying this approach to the newer Caltech-256 [8] dataset, which is generally considered more challenging than the Caltech-101 [3].
1. Introduction
Our task in this project is to find the category of an image. One of the dominant approaches to this task is the bag-of-features method, which represents an image as an orderless collection of local features. However, because this method and its relatives build a histogram of features computed at the dominant interest points, they throw away all information about the spatial layout of those features. As a result they cannot capture the shape of an object or segment it from the background. It would therefore be better if we could use the spatial information to build a structural object description. This is not a simple task in the presence of occlusion, clutter, or viewpoint changes. There has been considerable work towards building robust structural object descriptors. Some of it involves generative part models [3], [4], which impose relationships between the positions of the detected parts. Another approach considered suitable for this task finds pairwise relations between neighboring local features. However, these methods are either computationally expensive or have yielded inconclusive results.
2. Previous work
In computer vision, histograms are widely used for image description. Koenderink and Van Doorn [?] replaced local image structure with local histograms, essentially discarding the precise location of individual image elements; in this sense histogram images are locally orderless images. For each region of interest (ROI), defined as a Gaussian aperture with a given location and scale, they compute histograms of features over that ROI. The spatial pyramid approach can be considered an alternative way of creating histogram images in which, instead of Gaussian apertures, a fixed hierarchy of rectangular windows is used.
3. Spatial Pyramid Matching
Let H_X^l and H_Y^l denote the histograms of two feature sets X and Y at resolution level l, so that H_X^l(i) is the number of points from X that fall into the i-th of the D cells of the grid at that level. The number of matches at level l is given by the histogram intersection function

I(H_X^l, H_Y^l) = \sum_{i=1}^{D} \min(H_X^l(i), H_Y^l(i))    (1)

Abbreviating I(H_X^l, H_Y^l) as I^l, the pyramid match kernel weights the matches found at each level so that matches found at finer resolutions count more:

\kappa^L(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} (I^l - I^{l+1}) = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l    (2)
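For example, with L = 2 the weights of levels 0, 1, and 2 are 1/4, 1/4, and 1/2 respectively, so matches localized in the finest 4×4 grid contribute the most.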
Both the histogram intersection and the pyramid match kernel are Mercer kernels [5]. This means that evaluating the kernel corresponds to taking an inner product in an implicitly defined feature space, so the kernel can be plugged directly into an SVM for classification.
The kernel is computed separately for each of the M channels (one channel per visual word type), and the channel kernels are summed to give the final kernel

K^L(X, Y) = \sum_{m=1}^{M} \kappa^L(X_m, Y_m)    (3)

where X_m and Y_m denote the features of channel m in the two images.
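To make the classification step concrete, the following sketch trains an SVM on a precomputed histogram-intersection kernel using scikit-learn. It is only a minimal illustration, not the authors' implementation: the input arrays are placeholders, and each row is assumed to already hold the concatenated, weighted pyramid histogram of one image (the construction of that vector is sketched below).

import numpy as np
from sklearn.svm import SVC

def intersection_kernel(A, B):
    # Histogram intersection between every row of A and every row of B:
    # K[i, j] = sum_k min(A[i, k], B[j, k]).
    K = np.zeros((A.shape[0], B.shape[0]))
    for i, a in enumerate(A):
        K[i] = np.minimum(a, B).sum(axis=1)
    return K

# Placeholder data: one pyramid histogram vector per image and a class label each.
X_train = np.random.rand(40, 4200)
y_train = np.random.randint(0, 5, 40)
X_test = np.random.rand(10, 4200)

svm = SVC(kernel="precomputed", C=1.0)
svm.fit(intersection_kernel(X_train, X_train), y_train)
predictions = svm.predict(intersection_kernel(X_test, X_train))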
Since we are clustering the features into a visual vocabulary, this method reduces to the standard bag-of-features approach when L = 0. Another important point is that the final kernel can be implemented as a single histogram intersection of long vectors formed by concatenating the appropriately weighted histograms of all channels at all resolutions (Fig. 1). This is possible because the pyramid match kernel is a weighted sum of histogram intersections and c·min(a, b) = min(ca, cb) for positive numbers. For L levels and M channels, the resulting vector has dimensionality M \sum_{l=0}^{L} 4^l = \frac{M}{3}(4^{L+1} - 1). With M = 400 channels and L = 3 levels this amounts to a 34,000-dimensional histogram intersection, but since these histograms are sparse, the computation remains efficient.
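The sketch below illustrates this "long vector" construction; it is our own illustration rather than code from [1], and the inputs (pixel coordinates xs and ys, visual-word labels words, vocabulary size M) are hypothetical names for quantities produced by the feature-extraction stage described in the next section.

import numpy as np

def pyramid_histogram(xs, ys, words, img_w, img_h, M, L=2):
    # Concatenated, weighted spatial pyramid histogram of a single image.
    parts = []
    for level in range(L + 1):
        cells = 2 ** level                      # the grid is cells x cells at this level
        # Level weights from eq. (2): 1/2^L for level 0, 1/2^(L - l + 1) otherwise.
        weight = 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)
        hist = np.zeros((cells, cells, M))
        col = np.minimum((np.asarray(xs) * cells // img_w).astype(int), cells - 1)
        row = np.minimum((np.asarray(ys) * cells // img_h).astype(int), cells - 1)
        for r, c, m in zip(row, col, words):
            hist[r, c, m] += 1                  # count features per cell and channel
        parts.append(weight * hist.ravel())
    return np.concatenate(parts)                # length M * (4^(L+1) - 1) / 3

Because c·min(a, b) = min(ca, cb), a single histogram intersection of two such vectors equals the weighted sum of per-level intersections in eq. (2).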
Figure 1. Example of constructing a three-level pyramid. The image has three feature types, indicated by circles, diamonds, and
crosses. At the top, the image is subdivided into three different
levels of resolution. Next, for each level of resolution and each
channel, features that fall in each spatial bin are counted. Finally,
each spatial histogram is weighted according to eq. (2)
4. Feature Extraction
The authors use two kinds of features in their experiments. The first kind, dubbed weak features, are oriented edge points, i.e. points whose gradient magnitude in a given direction exceeds a minimum threshold. To create features similar to Torralba's "gist" features [7], the authors extract edge points at two scales and eight orientations, for a total of M = 16 channels.
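A minimal sketch of how such oriented-edge channels could be computed is given below; it is our own illustration, and the smoothing scales and threshold value are assumptions rather than values from [1].

import cv2
import numpy as np

def weak_feature_channels(gray, scales=(1.0, 2.0), n_orient=8, thresh=20.0):
    # For each of two scales and eight orientations, mark the pixels whose
    # directional gradient magnitude exceeds a threshold: M = 16 binary channels.
    channels = []
    for sigma in scales:
        blurred = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma)
        gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1)
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            directional = gx * np.cos(theta) + gy * np.sin(theta)
            channels.append((np.abs(directional) > thresh).astype(np.uint8))
    return channels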
For better discriminative power, they also use what they dub strong features, which are SIFT descriptors of 16×16 pixel patches computed over a grid with a spacing of 8 pixels. The authors propose using a dense regular grid instead of only interest points, since the comparative evaluation of Fei-Fei and Perona [3] shows that dense features work better for scene classification: they also capture uniform regions such as the sky or calm water.
After this, the authors perform k-means clustering of a random subset of patches from the training set to form a visual vocabulary; typical vocabulary sizes in their experiments are M = 200 and M = 400. Since the results reported for the weak features were not very good, in our work we decided to use only the strong features, i.e. the SIFT descriptors.
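The sketch below shows one possible implementation of this stage (dense SIFT on a regular grid followed by k-means over a random subset of descriptors). It is our own illustration using OpenCV's SIFT and scikit-learn's KMeans rather than the authors' code; only the patch size, grid spacing, and vocabulary size follow the numbers quoted above.

import cv2
import numpy as np
from sklearn.cluster import KMeans

def dense_sift(gray, step=8, size=16):
    # SIFT descriptors on a dense grid: one 16x16 patch every 8 pixels.
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), float(size))
                 for y in range(size // 2, gray.shape[0] - size // 2, step)
                 for x in range(size // 2, gray.shape[1] - size // 2, step)]
    keypoints, descriptors = sift.compute(gray, keypoints)
    positions = np.array([kp.pt for kp in keypoints])
    return positions, descriptors

def build_vocabulary(descriptors_per_image, M=200, sample_size=100000, seed=0):
    # k-means over a random subset of all training descriptors.
    stacked = np.vstack(descriptors_per_image)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(stacked), size=min(sample_size, len(stacked)), replace=False)
    return KMeans(n_clusters=M, n_init=4, random_state=seed).fit(stacked[idx])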
5. Authors' Experiments
The authors report results for pyramid levels L = 0 (1×1 grid), 1 (2×2), 2 (4×4), and 3 (8×8). With the weak features, the scene classification rates (%) they report are:

L    Single-level    Pyramid
0    15.5 ± 0.9      -
1    31.4 ± 1.2      32.8 ± 1.3
2    47.2 ± 1.1      49.3 ± 1.4
3    52.2 ± 0.8      54.0 ± 1.1

For the Bikes and People classes they report:

Class     L = 0          L = 2
Bikes     82.4 ± 2.0     86.3 ± 2.5
People    79.5 ± 2.3     82.3 ± 3.1
6. Our Experiments
We tested our implementation on the Caltech-256 [8], which is considered to be harder than the Caltech-101 on which the authors had already tested. This dataset is much bigger (30,608 images) and has many more images per category than the Caltech-101. The Caltech-256 also has more clutter and occlusion, and its images are not left-right aligned, so it does not exhibit the rotation artifacts present in the Caltech-101 (artifacts which inflated recognition rates because they provide stable cues).
For our experiments we use only the strong features described above. We take SIFT descriptors densely over the image, one 16×16 pixel patch at a time, moving the patch by 8 pixels. Like the authors, we take the descriptors over a dense grid and not just at points of interest.
We also resize all the images to 320×320 to eliminate the need for histogram normalization (normalization becomes necessary when histograms are collected over different numbers of patches for images of different sizes). After generating the descriptors, we create a feature space by randomly selecting descriptors from patches of the images; random selection is used because clustering all the descriptors would be prohibitively costly. We then run a clustering algorithm (k-means) on this feature space to obtain a set of visual words, which forms our visual vocabulary. For our experiments we used vocabularies of M = 300 and M = 500 visual words, the larger one in order to check the benefit of having more visual words. After creating the visual words, we assign each feature to the visual word it is closest to, i.e. it receives the label of the cluster it belongs to.
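A minimal sketch of this encoding step is shown below; it is our own illustration, reusing the hypothetical dense_sift and pyramid_histogram helpers from the earlier sketches and a fitted k-means model vocabulary.

import cv2

def encode_image(gray, vocabulary, L=2):
    # Resize, quantize the dense SIFT descriptors, and build the pyramid vector.
    gray = cv2.resize(gray, (320, 320))            # fixed image size, as described above
    positions, descriptors = dense_sift(gray)      # dense 16x16 patches, 8-pixel step
    words = vocabulary.predict(descriptors)        # nearest visual word for each feature
    return pyramid_histogram(positions[:, 0], positions[:, 1], words,
                             img_w=320, img_h=320, M=vocabulary.n_clusters, L=L)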
Our results for the different setups we tried are summarized below.

                                   Level 1        Level 2        Level 3
Images classified correctly        21 / 150       26 / 150       27 / 150
Percentage classified correctly    14%            17.33%         18%
Classes detected                   17 / 50        17 / 50        18 / 50

                                   Level 1        Level 2        Level 3
Images classified correctly        9 / 50         9 / 50         8 / 50
Percentage classified correctly    18%            18%            16%
Classes detected                   9 / 50         9 / 50         8 / 50

                                   Level 1        Level 2        Level 3
Images classified correctly        8 / 15         8 / 15         7 / 15
Percentage classified correctly    53%            53%            46.67%
Classes detected                   4 / 5          3 / 5          3 / 5
100% detection                     Class 3        Classes 4,5    Class 4
No detection                       Classes 2,3    Class 2        Classes 4,5

Table 7. Results for the following setup: Number of Classes: 5, Training Images per Class: 20, Testing Images per Class: 3, Number of Visual Words: 500.

                                   Level 1        Level 2        Level 3
Images classified correctly        10 / 15        7 / 15         6 / 15
Percentage classified correctly    66.67%         46.67%         40%
Classes detected                   4 / 5          4 / 5          3 / 5
100% detection                     Classes 1,2,3  Class 3        None
No detection                       Class 4        Class 4        Classes 4,5
7. Conclusion
The method discussed above is a holistic approach to image categorization. Despite its simplicity it has shown good results compared to methods that construct a structural model for object recognition, and it even outperforms orderless image representation schemes, which is not a trivial accomplishment. It highlights the power of global scene statistics and provides useful discriminative information for categorization.
Figure 2. Training set: the training set used to train the SVM for the baseball bat, baseball glove, and backpack categories.
Figure 3. Test set: the backpack class was always recognized well, while the most difficult test images were the baseball bats (no recognition) and baseball gloves (some low recognition).
References
[1] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
[2] G. Griffin, A. Holub, and P. Perona. The Caltech-256. Caltech Technical Report.
[3] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In IEEE CVPR Workshop on Generative-Model Based Vision, 2004.
[4] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. CVPR, volume 2, pages 264-271, 2003.
[5] K. Grauman and T. Darrell. Pyramid match kernels: Discriminative classification with sets of image features. In Proc. ICCV, 2005.