
Content Based Image Retrieval

Thanks to John Tait


Outline
– Introduction
• Why image retrieval is hard
• How images are represented
• Current approaches
– Indexing and Retrieving Images
• Navigational approaches
• Relevance Feedback
• Automatic Keywording
– Advanced Topics, Futures and Conclusion
• Video and music retrieval
• Towards practical systems
• Conclusions and Feedback
2
Scope
 General Digital Still Photographic Image
Retrieval
– Generally colour
 Some different issues arise
– Narrower domains
• E.g. medical images, especially where a particular part of the body
and/or a specific disorder is suspected
– Video
– Image Understanding - object recognition
3
Thanks to
 John Tait
 Chih-Fong Tsai
 Sharon McDonald
 Ken McGarry
 Simon Farrand
 And members of the University of Sunderland Information Retrieval Group


4
Introduction
Why is Image Retrieval Hard ?
 What is the topic of this image?
 What are the right keywords to index this image?
 What words would you use to retrieve this image?
 The Semantic Gap

6
Problems with Image Retrieval
 A picture is worth a thousand words
 The meaning of an image is highly
individual and subjective

7
How similar are these two images?

8
How Images are represented
10
11
Compression
• In practice images are stored as compressed
rasters
– JPEG
– MPEG
• Cf. vector formats …
• Not relevant to retrieval

12
Image Processing for Retrieval
• Representing the Images
– Segmentation
– Low Level Features
• Colour
• Texture
• Shape

13
Image Features
• Information about colour, texture or
shape extracted from an image is
known as an image feature
– Also called low-level features
• Red, sandy
– As opposed to high level features or concepts
• Beaches, mountains, happy, serene, George Bush

14
Image Segmentation
• Do we consider the whole image or just part
of it?
– Whole image - global features
– Parts of image - local features

15
Global features
• Averages across whole image
 Tends to lose the distinction between foreground
and background
 Poorly reflects human understanding of images
 Computationally simple
 A number of successful systems have been built
using global image features including
Sunderland’s CHROMA

16
Local Features
• Segment images into parts
• Two sorts:
– Tile Based
– Region based

17
Regioning and Tiling Schemes

[Figure: example segmentation schemes: (a) 5 tiles, (b) 9 tiles, (c) 5 regions, (d) 9 regions]

18
Tiling
 Break the image down into simple geometric
shapes (a minimal tiling sketch follows below)
 Similar problems to global features
 Plus the danger of breaking up significant objects
 Computationally simple
 Some schemes seem to work well in practice

19
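A minimal sketch of tile-based segmentation, assuming the image is held as a NumPy array; the 3x3 grid is purely illustrative rather than one of the specific schemes shown above.

```python
import numpy as np

def tile_image(image, rows=3, cols=3):
    """Split an H x W (x C) image array into a rows x cols grid of tiles."""
    h, w = image.shape[:2]
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append(image[r * h // rows:(r + 1) * h // rows,
                               c * w // cols:(c + 1) * w // cols])
    return tiles  # features are then extracted per tile rather than globally
```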
Regioning
• Break Image down into visually coherent
areas
 Can identify meaningful areas and
objects
 Computationally intensive
 Unreliable

20
Colour
• Produce a colour signature for
region/whole image
• Typically done using colour correlograms
or colour histograms

21
Colour Histograms
Identify a number of buckets in which to sort
the available colours (e.g. red, green and blue,
or up to ten or so colours).
Allocate each pixel in the image to a bucket
and count the number of pixels in each bucket.
Use the resulting counts (bucket id plus count,
normalised for image size and resolution) as
the index key (signature) for each image.
A minimal sketch of this follows below.

22
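A minimal sketch of the bucket-and-count procedure above, assuming an RGB image in a NumPy array; the particular bucket colours are illustrative, not a standard palette.

```python
import numpy as np

# Illustrative bucket centres in RGB space (any small palette would do)
BUCKETS = {
    "red":    (200, 30, 30),
    "green":  (30, 160, 60),
    "blue":   (40, 60, 200),
    "yellow": (220, 210, 40),
    "black":  (10, 10, 10),
    "white":  (245, 245, 245),
}

def colour_histogram(image):
    """Assign every pixel to its nearest bucket and return normalised counts."""
    pixels = image.reshape(-1, 3).astype(float)               # N x 3
    centres = np.array(list(BUCKETS.values()), dtype=float)   # K x 3
    dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                            # bucket id per pixel
    counts = np.bincount(nearest, minlength=len(BUCKETS))
    # normalising by the pixel count makes images of different sizes comparable
    return counts / counts.sum()
```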
Global Colour Histogram
[Chart: example global colour histogram showing pixel counts per colour bucket (red, orange, ...)]

23
Other Colour Issues
• Many Colour Models
– RGB (red green blue)
– HSV (Hue Saturation Value)
– Lab, etc. etc.
• Problem is getting something like human
vision
– Individual differences
24
Texture
• Produce a mathematical characterisation of
a repeating pattern in the image
– Smooth
– Sandy
– Grainy
– Stripey

25
26
27
Texture

• Reduces an area/region to a small (~15) set
of numbers which can be used as a
signature for that region (sketched below)
• Proven to work well in practice
• Hard for people to understand

28
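A minimal sketch of reducing a region to a short numeric texture signature; simple grey-level and gradient statistics stand in here for whatever texture measures a real system would use.

```python
import numpy as np

def texture_signature(region):
    """Summarise a greyscale region as a small vector of texture statistics."""
    g = region.astype(float)
    dx = np.diff(g, axis=1)       # horizontal intensity changes
    dy = np.diff(g, axis=0)       # vertical intensity changes
    return np.array([
        g.mean(),                 # overall brightness
        g.std(),                  # contrast
        np.abs(dx).mean(),        # horizontal "busyness"
        np.abs(dy).mean(),        # vertical "busyness"
    ])
```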
Shape
• Straying into the realms of object
recognition
• Difficult and Less Commonly used

29
Ducks again
• All objects have
closed boundaries
• Shape interacts in a
rather vicious way
with segmentation
• Find the duck shapes

30
31
Summary of Image
Representation
• Pixels and Raster
• Image Segmentation
– Tiles
– Regions
• Low-level Image Features
– Colour
– Texture
– Shape
32
Indexing and Retrieving
Images
Overview of Section 2
 Quick Reprise on IR
 Navigational Approaches
 Relevance Feedback
 Automatic Keyword Annotation

34
Reprise on Key Interactive IR
ideas
 Index Time vs Query Time Processing
 Query Time
 Must be fast enough to be interactive
 Index (Crawl) Time
 Can be slow(ish)
 There to support retrieval

35
An Index
 A data structure which stores data in a suitably
abstracted and compressed form in order to
facilitate rapid processing by an application

36
Indexing Process

 Index all the images in the database
 Store the indexes in a suitable form
 Searching is then done on the indexes (sketched below)

37
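A minimal sketch of the index-time / query-time split, assuming each image is reduced to a fixed-length feature vector (for example the colour histogram above); the brute-force scan is illustrative only, since a real system would use a faster index structure.

```python
import numpy as np

def build_index(images, extract):
    """Index time (can be slow): compute and store a signature for every image."""
    return {image_id: extract(image) for image_id, image in images.items()}

def search(index, query_signature, k=10):
    """Query time (must be fast): rank stored signatures by distance to the query."""
    scored = sorted((np.linalg.norm(sig - query_signature), image_id)
                    for image_id, sig in index.items())
    return [image_id for _, image_id in scored[:k]]
```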
Navigational Approaches
to Image Retrieval
Essential Idea
 Lay out images in a virtual space in an
arrangement which will make some sense to
the user
 Project this onto the screen in a
comprehensible form
 Allow the user to navigate around this projected
space (scrolling, zooming in and out)

39
Notes
 Typically colour is used
 Texture has proved difficult for people to
understand
 Shape is possibly the same, and also a user interface issue -
most people can't draw!
 Alternatives include time (Canon’s Time
Tunnel) and recently location (GPS Cameras)
 Need some means of knowing where you are

40
Observation

 It appears people can take in and will inspect


many more images than texts when searching

41
CHROMA
 Developed at Sunderland:
 mainly by Ting-Sheng Lai, now of the National Palace
Museum, Taipei, Taiwan
 Structure Navigation System
 Thumbnail Viewer
 Similarity Searching
 Sketch Tool

42
The CHROMA System
 General Photographic Images
 Global Colour is the Primary Indexing Key
 Images organised in a hierarchical
classification using 10 colour descriptors and
colour histograms

43
Access System

44
The Navigation Tool

45
Technical Issues
 Fairly Easy to arrange image signatures so
they support rapid browsing in this space

46
Relevance Feedback
More Like this
Relevance Feedback
 Well established technique in text retrieval
 Experimental results have always shown it to work
well in practice
 Unfortunately experience with search engines
has shown that it is difficult to get real
searchers to adopt it - too much interaction

48
Essential Idea
 User performs an initial query
 Selects some relevant results
 System then extracts terms from these to
augment the initial query
 Re-queries (a minimal sketch follows below)

49
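A minimal sketch of feedback on image feature vectors, in the spirit of Rocchio feedback from text retrieval rather than the mechanism of any particular image system; the weights are illustrative.

```python
import numpy as np

def refine_query(query_vec, relevant_vecs, alpha=0.5, beta=0.5):
    """Move the query signature towards the centroid of the images marked relevant."""
    centroid = np.mean(relevant_vecs, axis=0)
    return alpha * np.asarray(query_vec) + beta * centroid

# The refined vector is then passed back to the same search routine,
# pulling in images whose signatures resemble the examples the user liked.
```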
Many Variants
 Pseudo
 Just assume high ranked documents are relevant
 Ask users about terms to use
 Include negative evidence
 Etc. etc.

50
Query-by-Image-Example

51
Why useful in Image Retrieval?
1. Provides a bridge between the user's
understanding of images and the low-level
features (colour, texture etc.) with which the
system is actually operating
2. Is relatively easy to interface to

52
Image Retrieval Process
[Diagram: query-by-example retrieval of a ducks image via low-level features such as green colour, water texture and leaf texture]

53
Observations
 Most image searchers prefer to use key words
to formulate initial queries
 Eakins et al, Enser et al
 First generation systems all operated using low
level features only
 Colour, texture, shape etc.
 Smeulders et al

54
Ideal Image Retrieval Process

[Diagram: the ideal retrieval process links an information need, a keyword query, thumbnail browsing and "more like this" feedback]

55
Image Retrieval as Text Retrieval

What we really want to do is recast the image
retrieval problem as a text retrieval problem

56
Three Ways to go
 Manually Assign Keywords to each image
 Use text associated with the images (captions,
web pages)
 Analyse the image content to automatically
assign keywords

57
Manual Keywording
 Expensive
 Can only really be justified for high value
collections – advertising
 Unreliable
 Do the indexers and searchers see the images in
the same way?
 Feasible

58
Associated Text
 Cheap
 Powerful
 Famous names/incidents
 Tends to be “one dimensional”
 Does not reflect the content rich nature of images
 Currently Operational - Google

59
Possible Sources
of Associated text
 Filenames
 Anchor Text
 Web Page Text around the anchor/where the
image is embedded

60
Automatic Keyword Assignment

A form of Content Based Image Retrieval

 Cheap (ish)
 Predictable (if not always “right”)
 No operational System Demonstrated
 Although considerable progress has been made
recently

61
Basic Approach
 Learn a mapping from the low-level image
features to the words or concepts (sketched below)

62
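A minimal sketch of learning the feature-to-keyword mapping as one binary classifier per keyword, using scikit-learn SVMs; this is a generic illustration of the idea, not the pipeline of any specific system discussed here.

```python
import numpy as np
from sklearn.svm import SVC

def train_keyword_models(features, keyword_sets, vocabulary):
    """One binary SVM per keyword: does this image's signature support the word?
    Assumes every keyword has both positive and negative training examples."""
    models = {}
    for word in vocabulary:
        labels = np.array([word in kws for kws in keyword_sets])
        models[word] = SVC(probability=True).fit(features, labels)
    return models

def annotate(models, feature_vec, threshold=0.5):
    """Assign every keyword whose classifier is confident enough."""
    return [word for word, model in models.items()
            if model.predict_proba([feature_vec])[0, 1] >= threshold]
```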
Two Routes
1. Translate the image into a piece of text
 Forsyth and others
 Manmatha and others
2. Find the category of images to which a
keyword applies
 Tsai and Tait (SIGIR 2005)

63
Second Session Summary
 Separating Index Time and Retrieval Time
Operations
 “First generation CBIR”
 Navigation (by colour etc.)
 Relevance Feedback
 Keyword based Retrieval
 Manual Indexing
 Associated Text
 Automatic Keywording
64
Advanced Topics, Futures and
Conclusions
Outline
 Video and Music Retrieval
 Towards Practical Systems
 Conclusions and Feedback

66
Video and Music
Retrieval
Video Retrieval
• All current Systems are based on one
or more of:
– Narrow domain - news, sport
– Use automatic speech recognition to do
speech to text on the soundtrack
– Do key frame extraction and then treat the
problem as still image retrieval

68
Missing Opportunities in Video
Retrieval
• Using deltas - frame-to-frame
differences - to segment the image into
foreground/background, players, pitch,
crowd etc. (sketched below)
• Trying to relate image data to
language/text data

69
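A minimal sketch of using frame-to-frame differences to flag moving foreground pixels, assuming greyscale frames held as NumPy arrays; the threshold is illustrative.

```python
import numpy as np

def motion_mask(prev_frame, frame, threshold=25):
    """Mark pixels whose intensity changed noticeably between consecutive frames."""
    delta = np.abs(frame.astype(int) - prev_frame.astype(int))
    return delta > threshold   # True where something moved (foreground candidates)
```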
Music Retrieval
• Distinctive and Hard Problem
– What makes one piece of music similar to
another?
• Features
– Melody
– Artist
– Genre ?

70
Towards Practical Systems
Ideal Image Retrieval Process
[Diagram: the ideal retrieval process links an information need, a keyword query, thumbnail browsing and "more like this" feedback]

72
Requirements
 > 5000 keyword vocabulary
 > 5% accuracy of keyword assignment for all keywords
 > 5% precision in response to single keyword queries

The Semantic Gap Bridged!

73
CLAIRE
 Example State of the Art Semantic CBIR System
 Colour and Texture Features
 Simple Tiling Scheme
 Two-Stage Learning Machine (a rough sketch follows below)
 SVM/SVM and SVM/k-NN
 Colour to 10 basic colours
 Texture to one texture term per category

74
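A rough sketch of the two-stage idea only (per-tile colour/texture terms pooled into an image-level profile, then a category decision), not the CLAIRE implementation itself; the two classifier objects are assumed to have been trained elsewhere.

```python
import numpy as np

def two_stage_predict(tile_features, stage1, stage2, n_terms):
    """Stage 1 labels each tile with a colour/texture term; stage 2 maps the
    pooled term histogram to an image-level keyword/category."""
    terms = stage1.predict(tile_features)             # one term id per tile
    profile = np.bincount(terms, minlength=n_terms)   # image-level term histogram
    return stage2.predict([profile])[0]
```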
Tiling Scheme

75
Architecture of Claire

[Diagram: CLAIRE architecture: image segmentation, colour and texture data extractors, colour and texture classifiers, and annotation of each image with known keywords/classes]

76
Training/Test Collection
 Randomly Selected from Corel
 Training Set
 30 images per category
 Test Collection
 20 images per category

77
SVM/SVM Keywording with 100+50
Categories
[Chart: keyword assignment accuracy for concrete classes, abstract classes and the baseline, for vocabularies of 10 to 100 categories]

78
Example Keywords
 Concrete: Beaches, Dogs, Mountain, Orchids, Owls, Rodeo, Tulips, Women
 Abstract: Architecture, City, Christmas, Industry, Sacred, Sunsets, Tropical, Yuletide

79
SVM vs kNN
[Chart: keyword assignment accuracy of SVM vs k-NN second-stage classifiers for concrete and abstract classes against the baseline, for 10 to 150 categories]

80
Reduction in Unreachable Classes
[Chart: number of missing (never-assigned) categories for SVM and k-NN, concrete and abstract classes, for 10 to 150 categories]

81
Labelling Areas of Feature Space

[Diagram: areas of feature space labelled with concepts such as mountain, tree and sea]

82
Overlap in Feature Space

83
Keywording 200+200 Categories
SVM/1-NN
[Chart: keyword assignment accuracy for concrete and abstract keywords against the baseline, with an exponential trend line for abstract keywords, for 10 to 200 categories]

84
Discussion
 Results still promising: 5.6% of images have at least one
relevant keyword assigned
 Still useful - but only for a vocabulary of 400 words!
 See demo at http://osiris.sunderland.ac.uk/~da2wli/system/silk1/
 High proportion of categories are never assigned

85
Segmentation

Are the results dependent on the specific
tiling/regioning scheme used?

86
Regioning

[Figure: segmentation schemes: (a) 5 tiles, (b) 9 tiles, (c) 5 regions, (d) 9 regions]

87
Effectiveness Comparison
[Charts: accuracy of five tiles vs five regions with a 1-NN data extractor, plotted against the number of concrete classes (top) and abstract classes (bottom), from 10 to 200 classes]

88
Next Steps
 More categories
 Integration into complete systems
 Systematic Comparison with Generative approach
pioneered by Forsyth and others

89
Other Promising Examples
 Jeon, Manmatha and others
 High number of categories - results difficult to interpret
 Carneiro and Vasconcelos
 Also problems with missing concepts
 Srikanth et al
 Possibly leading results in terms of precision and
vocabulary scale

90
Conclusions
 Image Indexing and Retrieval is Hard
 Effective Image Retrieval needs a cheap and
predictable way of relating words and images
 Adaptive and Machine Learning approaches offer
one way forward with much promise

91
Feedback

Comments and Questions


Selected Bibliography
 Early Systems
The following leads into all the major trends in systems based on colour,
texture and shape
 A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain “Content-based Image Retrieval: the end
of the early years” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349-
1380, 2000.
 CHROMA
 Sharon McDonald and John Tait “Search Strategies in Content-Based Image Retrieval” Proceedings
of the 26th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR
2003), Toronto, July, 2003. pp 80-87. ISBN 1-58113-646-3
 Sharon McDonald, Ting-Sheng Lai and John Tait, “Evaluating a Content Based Image Retrieval
System” Proceedings of the 24th ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR 2001), New Orleans, September 2001. W.B. Croft, D.J. Harper, D.H.
Kraft, and J. Zobel (Eds). ISBN 1-58113-331-6 pp 232-240.
 Translation Based Approaches
 P. Duygulu, K. Barnard, N. de Freitas and D. Forsyth “Object Recognition as Machine Translation:
Learning a Lexicon for a Fixed Image Vocabulary” European Conference on Computer Vision, 2002.
 K. Barnard, P. Duygulu, N. de Freitas and D. Forsyth “Matching Words and Pictures” Journal of
Machine Learning Research 3: 1107-1135, 2003.
Very recent new paper on this is:
 P. Virga, P. Duygulu “Systematic Evaluation of Machine Translation Methods for Image and Video
Annotation” Images and Video Retrieval, Proceedings of CIVR 2005, Singapore, Springer, 2005.

94
 Cross-media Relevance Models etc
 J. Jeon, V. Lavrenko, R. Manmatha “Automatic Image Annotation and Retrieval using Cross-Media
Relevance Models” Proceedings of the 26th ACM SIGIR Conference on Research and Development
in Information Retrieval (SIGIR 2003), Toronto, July, 2003. pp 119-126
See also recent unpublished papers on
http://ciir.cs.umass.edu/~manmatha/mmpapers.html
 More recent stuff
 G. Carneiro and N. Vasconcelos “A Database Centric View of Semantic Image Annotation and Retrieval”
Proceedings of the 28th ACM SIGIR Conference on Research and Development in Information
Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005
 M. Srikanth, J. Varner, M. Bowden, D. Moldovan “Exploiting Ontologies for Automatic Image
Annotation” Proceedings of the 28th ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005
See also the SIGIR workshop proceedings
http://mmir.doc.ic.ac.uk/mmir2005

95
