© Stephane.Marchand-Maillet@unige.ch – University of Geneva – Multimedia Retrieval – September 2015 - 1
Multimedia Information Retrieval
Stephane Marchand-Maillet
Viper group
University of Geneva
Switzerland
http://viper.unige.ch
• What is your background?
– Computer Science
– Scientific
– Humanities
• This lecture introduces intuition from
examples
– More technical details in recommended reading
Quick get-to-know
• To introduce (recall, rehearse) some key
principles and techniques of IR
• To show how these are adapted to Multimedia
• To look at the Multimedia context
• To see how to best face it
Objectives of the lecture
Outline
• Motivation and context
• Preliminary material on IR
• Audio-visual IR
• Complements
• Evaluation
Note: Several illustrations from within these slides have been borrowed from the Web, including Wikipedia or teaching
material. Please do not reproduce without permission from the respective authors. When in doubt, don't.
Data Production
• Growth of Data
– 1,250 billion GB (about 1,200 EB, i.e. ≈ 1.2 ZB) of data
generated in 2010.
– Data generation growing at a rate of 58%
per year
• Baraniuk, R., “More is Less: Signal Processing and the
Data Deluge”, Science, V331, 2011.
1 exabyte (EB) = 1,073,741,824 gigabytes
[Figure: Data Generation Growth, data size (EB) per year, 2010 to 2014]
Data sources: Internet, Scientific, Industry
http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html
http://www.ritholtz.com/blog/2011/12/financial-industry-interconnectedness/
By Sverre Jarp, By Felix Schürmann
© Copyright attached
A digital world
[From http://1000memories.com/blog/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox]
[Picture from: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html]
Data communication
© Copyright attached
User “productivity”
[Picture from: http://www.go-gulf.com/wp-content/themes/go-gulf/blog/online-time.jpg - Feb 2012]
© Copyright attached
Motivation
• Decision making requires informed
choices
• Data production ever easier
• The information is often
overwhelming
– « Big Data » trend
• The information is often not easy to
manage and access
We need to bring a structure to the raw data
• Document (data) representation
• Similarity measurements
• Further analysis: data mining, information retrieval, learning
© Copyright attached
Components of Data Quality
• Validity
• Integrity
• Completeness
• Consistency
• Relevance
• Accuracy
and…
• Timeliness
• Interpretability
• Reliability
• …
→ Applies to all media
• Searching for multimedia
– Is it like searching text?
• Information retrieval model
• Representing documents
– BoW and TF*IDF
• Indexing model
– Inverted files
– Hashing
Preliminary
Searching for multimedia
Basic solution:
• Attach text to multimedia and search for text
– Tags, keywords
– Description
“Multimedia Annotation”
– Domain of “Knowledge Management”
– Requires the definition of Ontologies, Taxonomies,…
Human in the loop
– Not scalable, not feasible
[Picture from: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html]
Per minute rate (some years ago!)
© Copyright attached
• From the sensing device
– Device type, model
– Sensing conditions (flash/no flash)
– Author (device owner?)
– …
• From the environment
– Time, date, location
– Environment conditions (inside/outside, weather)
– Spoken languages
– …
• More? From the content?
– Later in this lecture
Note: Can never be exhaustive
Auto-annotation
• “Wasp on a yellow flower”
• Specific (Latin) name of the flower? Of the insect?
• What exactly is the insect doing?
• Weather looks nice: “countryside scene”?
• The picture may be arty:
“2015 1st prize college photo competition winner”
Inference cannot resolve everything
• “The wasp that scared my daughter during our holiday in
Gruyere”
Annotation
• Exif: standard picture annotation container
• ID3-tag: standard audio information
• NTFS: Windows-based metadata exchange
• JPSearch, MPEG-7: Search-oriented description
standards
• Every format will have a text header to insert
“comments” (PNG, GIF, JPG, TIFF,…)
→ Complementarity text-based / content-based
• Another category is recommender-based
– “social” annotation
Annotating multimedia
MPEG-7 Still Image Description
IBM MPEG-7 Annotation Tool (video/visual stream)
[http://www.research.ibm.com/VideoAnnEx/]
• Given a query (information need)
• Given a repository (information source)
→ Infer the most relevant documents from the repository as an
answer to the query
The most used query type is Query-by-Example
“I want something like this”
• Descriptive query
• Free text (Google-like) query
• Example document(s)
• Relevance is similarity
• Similarity is transferred to “closeness”
More like this…
Information Retrieval paradigm
• IR tells us that we do not (really) need a
complete, absolute understanding of documents
to respond to queries; we (just) need to be able to
compare 2 documents
• We merely need appropriate distance
measurements
– To infer similarity
– Based on document features (space)
– and a notion of vicinity (distance)
→ Everything is about inspecting neighborhoods
– Provided the distance is semantically relevant
Information Retrieval
Information management process
Raw documents
Representation space
(visualisation)
Document features
User interaction
Feature extraction
“Appropriate”
mapping
“Decision”
process
Query
Distance-based search
Representation space
(visualisation)
Query
Relevance list
(sorted by distance
to the query)
Example: text
Text documents
Feature extraction
“Appropriate”
mapping
User interaction
“Decision”
process
“Word” occurrences
Query
Also...
• Any type of media: webpage, audio, video,
data,...
• Objects, based on their characteristics
• People in social networks
• Concepts: processes, states, etc.
Anything for which “characteristics” may be
measured
This is what this lecture is about
• Raw data (the documents) carries information
+ Computers essentially perform additions
We need to represent the data somehow to
provide the computer with information that is
as faithful as possible
• The representation is an opportunity for us to
transfer some prior (domain) knowledge as
design assumptions
If this (data modelling) step is flawed, the computer
will work with random information
Representation spaces (intuition)
From documents to features
• Features help characterize 1 document (summary)
• Features help compare 2 documents (similarity)
• Features help structure a collection (visualization)
Comparing two documents (occurrences)
2 1 1 2 1 0 0
1 2 0 1 1 1 2
1 2 0 1 1 1 2
1 1 0 1 1 0 0
“Terms” (Vocabulary)
“Documents”(Repository)
Comparing two documents (presence)
Y Y Y Y Y N N
Y Y N N Y Y Y
Y Y N N Y Y Y
Y Y N Y Y N N
Similarity: Y+Y=1 Distance: Y+N=1
Comparing two documents (presence)
Y Y Y Y Y N N
Y Y N Y Y Y Y
1 1 0 1 1 0 0
Similarity = 4
(Distance=3)
Comparing two documents
6 (0)
3 (4) 3 (4)
4 (1)
3 (2) 3 (2)
S(D)
Comparing two documents
S(D)
D1 Y Y Y Y Y N N 1(6)
D2 Y Y N N Y Y Y 2(4)
D3 Y Y N N Y Y Y 2(4)
D4 Y Y N Y Y N N 0 (7)
Q N N Y N N Y Y
Note: The query is seen as a document (Query-by-example)
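The S(D) column above can be reproduced with a short sketch. It assumes Y/N is encoded as 1/0 and that the query is compared with each document term by term; the function names are illustrative only.

```python
# Presence vectors (Y=1, N=0) copied from the table above.
docs = {
    "D1": [1, 1, 1, 1, 1, 0, 0],
    "D2": [1, 1, 0, 0, 1, 1, 1],
    "D3": [1, 1, 0, 0, 1, 1, 1],
    "D4": [1, 1, 0, 1, 1, 0, 0],
}
query = [0, 0, 1, 0, 0, 1, 1]


def similarity(a, b):
    """Number of terms present in both vectors (Y,Y pairs)."""
    return sum(x & y for x, y in zip(a, b))


def distance(a, b):
    """Number of terms present in exactly one vector (Y,N or N,Y pairs)."""
    return sum(x ^ y for x, y in zip(a, b))


# S(D) pair for every document against the query.
scores = {name: (similarity(vec, query), distance(vec, query))
          for name, vec in docs.items()}

# Rank by decreasing similarity; sorted() is stable, so ties keep table order.
ranking = sorted(scores, key=lambda name: -scores[name][0])

print(scores)   # D1: (1, 6), D2: (2, 4), D3: (2, 4), D4: (0, 7)
print(ranking)  # ['D2', 'D3', 'D1', 'D4']
```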
Query-based ranking
• Since similarity is computed only on (Y,Y) pairs,
only the query terms count when computing the similarity
(i.e. terms not in the query are ignored)
Every document having no common term
with the query scores 0
Ranking for Q: D2 2(4), D3 2(4), D1 1(6), D4 0(7)
Computing the ranks
Naïve solution (fill the array):
• For every document, for every term, score 1 if
the term is both in the query and the
document
→ N (documents) * M (terms) comparisons
Smarter solution
• For every term of the query (few), find
documents having this term and work from
there (the rest will score 0)
Information indexing
Goal:
to organize the data so as to
facilitate its access (read or write)
• Fast access for computation
• Fast access with selection criteria
• …
Baseline: exhaustive search, sequential scan (unordered list)
Strategy: enable “direct access”
Simple example:
• Address book: sorted list, with first letter shortcut
– Smith → “S” → search 19th sublist (ignore 25 other sublists)
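The address book idea can be sketched as follows. The names and helper functions are made up for illustration; the point is only that the first letter gives direct access to one short sublist instead of a scan of the whole list.

```python
from collections import defaultdict

# Hypothetical entries, grouped into one sorted sublist per first letter.
names = ["Smith", "Adams", "Stone", "Baker", "Sanders"]
address_book = defaultdict(list)
for name in sorted(names):
    address_book[name[0].upper()].append(name)


def lookup(name):
    """Search only the sublist selected by the first letter."""
    sublist = address_book.get(name[0].upper(), [])
    return name in sublist


def sublist_number(name):
    """1-based position of the sublist in the alphabet (A=1, ..., Z=26)."""
    return ord(name[0].upper()) - ord("A") + 1


print(sublist_number("Smith"))  # 19
print(lookup("Smith"))          # True
```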
Indexing structures
…+
M-tree
Tries
Suffix array
Suffix Tree
Inverted files
LSH…
Illustration: Wikipedia
Term Documents
D1,D2,D3,D4
D1,D2,D3,D4
D1
D1,D4
D1,D2,D3,D4
D2,D3
D2,D3
“find documents having this term”
Arrange documents by the terms they contain
→ Number of comparisons ~ number of query terms << N*M
Inverted file
(D1),(D2,D3),(D2,D3)
D1, D2, D2, D3, D3
D4 is never considered !
(we know already that its score is 0)
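A minimal sketch of the inverted file for the four example documents. The term labels t1 to t7 are placeholders (the slide does not name the terms), and the query is assumed to contain the third, sixth and seventh terms, as in the presence table.

```python
from collections import Counter, defaultdict

# Terms present in each document (t1..t7 are placeholder labels).
docs = {
    "D1": {"t1", "t2", "t3", "t4", "t5"},
    "D2": {"t1", "t2", "t5", "t6", "t7"},
    "D3": {"t1", "t2", "t5", "t6", "t7"},
    "D4": {"t1", "t2", "t4", "t5"},
}

# Inverted file: for each term, the list of documents containing it.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    for term in docs[doc_id]:
        inverted[term].append(doc_id)


def score(query_terms):
    """Visit only the posting lists of the query terms."""
    scores = Counter()
    for term in query_terms:
        for doc_id in inverted.get(term, []):
            scores[doc_id] += 1
    return scores


scores = score(["t3", "t6", "t7"])
print(dict(scores))  # D1 scores 1, D2 and D3 score 2; D4 is never visited
```

Documents that share no term with the query never appear in `scores`, which is the saving over filling the full N*M array.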
Principles and lessons
• We need to be able to compare 2 documents
• Comparison is based on document features
• Similarity may be based on feature occurrence
• Only occurring features count in building
similarity
→ Feature occurrence allows the use of
inverted files, so that search becomes independent of the size of
the repository
Feature occurrence loses a lot of structure in the
document
In practice (text IR)
• Document features are normalised word
(stem) occurrences (Bag of Words model)
– Vocabulary ~ 10’000 stems
– Billions of documents
• Term weighting scheme (TF*IDF) accounts for
feature statistics (better than Y/N occurrence)
• Similarity is based on cosine distance
• Inverted files include term statistics
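As an illustration of the cosine measure, here is a sketch applied to the occurrence vectors of the earlier four-document example. Plain counts are used instead of TF*IDF weights to keep it short; that simplification is an assumption of this sketch.

```python
import math

# Occurrence vectors from the "Comparing two documents (occurrences)" slide.
D1 = [2, 1, 1, 2, 1, 0, 0]
D2 = [1, 2, 0, 1, 1, 1, 2]
D3 = [1, 2, 0, 1, 1, 1, 2]
D4 = [1, 1, 0, 1, 1, 0, 0]


def cosine(a, b):
    """Cosine of the angle between two non-zero term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine(D2, D3))  # identical documents: cosine close to 1
print(cosine(D1, D4))  # about 0.90
print(cosine(D1, D2))  # about 0.61
```

Because the vectors are divided by their norms, a long document is not favored merely for having more occurrences.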
BoW : Assumptions
• A text is reduced to the base vocabulary it uses
• Example:
Signals from a particle collider near Geneva suggested in
September that scientists might have sighted a long-sought
particle thought to be the source of mass itself.
• Becomes:
a (x2), be, collide, from, geneva, have, in, itself, long-sought,
mass, may, near, of,
particle (x2), scientist, see, september, signal, source,
suggest, that, the, think, to
The ultimate goal is to define similarity as:
Two documents are similar if
they share the same vocabulary
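A rough sketch of the bag-of-words reduction for the example sentence. It only lowercases and counts tokens; the stemming shown on the slide (e.g. "suggested" to "suggest") is not reproduced here and would need a stemmer or lemmatizer on top.

```python
import re
from collections import Counter

text = ("Signals from a particle collider near Geneva suggested in September "
        "that scientists might have sighted a long-sought particle thought "
        "to be the source of mass itself.")


def bag_of_words(text):
    """Lowercase, keep alphabetic tokens (hyphens allowed), count occurrences."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(tokens)


bow = bag_of_words(text)
print(bow["particle"], bow["a"], bow["long-sought"])  # 2 2 1
```

Word order, sentence structure and negations are gone after this step; only the vocabulary and its counts remain.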
BoW: Limitations
• The structure is lost
– Title, sections, paragraphs, sentences, negations
• The representation is incomplete…
– If all words appear once only and there is no extra information such as
• Grammatical type
• Priors on terms (recent news, unused terms,…)
– And writing well is often about using a rich vocabulary
• Use of synonyms
• …or ambiguous
– If a word appears many times but because of polysemy
– Eg: mouse
• A part on mouse as a pet
• A part on computer mouse
• A part on Mickey mouse
– Does not make the text about “mouse” but maybe just about “kid
amusements”
… still works very well in practice
A document is represented by the occurrences of the
terms of the vocabulary
• Earlier example: Y/N occurrence
→ The number of occurrences does not matter
• Counting the term occurrence
Favors longer documents (more occurrences)
Document length normalization (frequency)
• Not all frequent terms are relevant
Balance between representation (TF) and discriminative
power (IDF)
How good is the term in my query for this document?
A few more details
• Term frequency (TF)
– Percentage of space taken by each term in the document:
“how much this term represents the document”
High TF → frequent term in the document → potential
representative term (keyword)
• Inverse document frequency (IDF)
– Inverse of the percentage of space taken by each term in
the collection:
“how much this term discriminates documents”
High DF → low IDF → term frequent in all documents → not
discriminant (all documents are relevant)
E.g.: the, at, and, she, he, her… + thematic (patient, disease,…)
→ Good terms are terms with high TF and high IDF
TF*IDF model: intuition
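The intuition can be made concrete with one common variant of the weighting, assumed here for illustration: TF as the share of the document taken by the term, and IDF as the logarithm of (number of documents / number of documents containing the term). The tiny corpus is invented.

```python
import math
from collections import Counter

# Toy corpus (invented), already tokenized.
corpus = {
    "d1": "the patient has the disease".split(),
    "d2": "the patient is at home".split(),
    "d3": "the wasp is on the flower".split(),
}


def tf(term, tokens):
    """Share of the document taken by the term."""
    return Counter(tokens)[term] / len(tokens)


def idf(term, corpus):
    """log(N / df); 0 for a term that occurs in every document."""
    df = sum(1 for tokens in corpus.values() if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0


def tfidf(term, doc_id, corpus):
    return tf(term, corpus[doc_id]) * idf(term, corpus)


print(tfidf("the", "d1", corpus))      # 0.0: frequent everywhere, not discriminant
print(tfidf("wasp", "d3", corpus))     # highest of the three: rare in the collection
print(tfidf("patient", "d1", corpus))  # lower: shared by two documents
```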
TF*IDF term weight
 | High TF (frequent in document): potential keyword | Low TF (infrequent in document)
High IDF (infrequent in collection) | “Keyword”: good representative (verb, noun, ...) | Low representation (automatically ignored)
Low IDF (frequent in collection) | “Stop word”: “structural”, does not carry meaning (the, at, and, she, …); removed from vocabulary / ignored | Rare term (automatically ignored)
When computing the difference
between documents, terms are
weighted by their TF*IDF factor
Inverted file construction
[Figure: pipeline Docs → extraction → sorting → compaction → factorisation → misc operations (e.g. posting file compression)]
• Extraction: list of (Term, Doc ID) pairs, e.g. (“blah”, N), (“new”, 1), (“the”, 1), (“tool”, N-1), …
• Sorting: the pairs are put in order
• Compaction: repeated pairs are merged into (Term, Doc ID, Freq) entries
• Factorisation: the result is split into
– Dictionary (collection information, « idf »): Term, Tot. F., e.g. “blah” 3, “new” 1, “the” 1, “tool” 2
– Postings (document information, « tf »): Doc ID, Freq
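Hashing / fingerprinting (intuition only)
[Figure: each symbol is mapped to a number from 1 to 7 and the key of a value is the sum of its symbols, e.g. 1+1+2+3+4+4+5 = 20 and 1+2+2+4+5+6+7+7 = 34. The hash table lists (Key, Value) pairs; hashing the query to its key gives immediate access to the matching value.]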
• Associate a key (hash, fingerprint) to each value
• Insert the (key, value) into a hash table
Immediate retrieval of the query
Difficult constraint (Perceptual hashing) : The key does
not vary if the value changes a bit (eg due to noise)
Hash function
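A toy version of the idea, under the assumption that values are strings of symbols a to g coded 1 to 7 and that the key is simply the sum of the codes, as in the illustration. This is not a perceptual hash and it collides easily; it only shows the (key, value) mechanics.

```python
# Hypothetical symbol codes: a=1, b=2, ..., g=7.
SYMBOL_CODE = {symbol: code for code, symbol in enumerate("abcdefg", start=1)}


def key(value):
    """Toy hash function: sum of the symbol codes."""
    return sum(SYMBOL_CODE[symbol] for symbol in value)


hash_table = {}


def insert(value):
    """Store the value under its key (colliding values share a bucket)."""
    hash_table.setdefault(key(value), []).append(value)


def lookup(query):
    """Direct access: compute the key of the query and read its bucket."""
    return hash_table.get(key(query), [])


insert("aabcdde")  # 1+1+2+3+4+4+5 = 20
insert("bcdefg")   # 2+3+4+5+6+7 = 27

print(key("aabcdde"))    # 20
print(lookup("bcdefg"))  # ['bcdefg']
```

Any reordering of the same symbols yields the same key here, which hints at why designing a good hash function is the hard part.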
What is multimedia?
Basically our digital life
• Text
– Plain text (any language)
– Structured text (XML-like, code,…)
• Visual
– Images (Photo)
– Sketches (drawing, map,..)
• Audio
– Music
– Speech
– Sound
• Misc
– 3D object
– Video
– … (software, playlist,…)
+ Any combination
– PDF, Web pages and alike
Multimedia IR: Use Cases
• Image
– Find look-alike pictures, recognize landmark
• Video
– Find specific actions
• Music
– Find inspirational music, recognize music extract
• Medical
– Find similar cases
• Patent
– Find similar proposals, plagiarism
→ Not always doable with text
Volume of multimedia data
• 2 billion people connected to the Internet
– Mostly via mobile devices (phones)
• 1.8 billion active mobile phones
• 180 million active web servers
Sources:
R. Baeza-Yates – RussIR 2010
http://news.netcraft.com/
Volume of multimedia data
• Web scale:
– Google:
• Early 2004: 4.3 billion indexed pages
• Early 2005: 8 billion indexed pages
• Today: XX billion indexed pages?
– http://www.archive.org
• Growth: 20 Tb/month
• Multimedia collection:
– Facebook: 140 billion photos
≈ 180 years of video at 25 fps
– YouTube (2008): 83.4 million videos
• Curated collections:
– AOL Video library
• 410 million views in October 2011
– Institut National de l'Audiovisuel (INA-France):
• 700’000h audio (radio)
• 400’000h video (television)
• 2 million documents
The new wave
[From http://1000memories.com/blog/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox]
Example: Information « loss »
• Annotation
– “Wasp on a yellow flower”
• Text search:
– « insect feeding »
This document will be missed
• The user must know the limitations of the
annotation
• The goal is to represent the document
– Exhaustively: no specific interpretation context
– Automatically: no human intervention
• Image formation model
• Image features
• Indexing visual content
Visual Information Retrieval
Visual content has specific types
• Photo
• Medical image, satellite image
• Document: fax, scanned text…
• Drawing: sketch, blueprint, cartoon,…
• 3D, shape
Visual content
Example: Images
Images
Feature extraction
“Appropriate”
mapping
“Decision”
process
Search
Photo collage
Filtering
Color histogram
User interaction
Query
Pixels
RGB Sensor
RGB Color model
Any color = combination of RGB values
Image = 3 Arrays (RGB) of values between 0 and 255
Pixel = RGB values (color) at a position in the image
Image processing / Analysis
RGB Histograms
Gray scale (intensity) Edge image
Low-level image representation
Global Color Histogram data in color spaces:
Global Texture data from Gabor filter banks:
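A global color histogram can be sketched in a few lines. The assumptions here: an image is a non-empty list of (R, G, B) pixels with values from 0 to 255, each channel is cut into 4 bins (64 colors in total), and two histograms are compared with histogram intersection, which is one possible choice among many.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized global RGB histogram of a non-empty list of (r, g, b) pixels."""
    step = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel ** 2
                 + (g // step) * bins_per_channel
                 + (b // step))
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: overlap between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two tiny synthetic "images": mostly red and mostly blue.
mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
mostly_blue = [(10, 10, 250)] * 3 + [(250, 10, 10)]

h_red = color_histogram(mostly_red)
h_blue = color_histogram(mostly_blue)
print(histogram_intersection(h_red, h_red))   # 1.0
print(histogram_intersection(h_red, h_blue))  # 0.5
```

The histogram ignores where the colors are in the image, much like the bag of words ignores word order.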
Corner detection / keypoints
Local features: SIFT [Lowe 1999] and SURF [Bay et al. 2006]
From data to information
Interpreting the content
• Data is physically measured
• Information is interpreted semantically
Semantic gap
• Fusion is a way to reduce the semantic gap
• Examples
– Looking at a movie w/o audio
– Listening to a story w/o visual
– Voice conversation vs video conversation
– Disambiguation (e.g. “jaguar”)
From data to information
Semantic gap:
“Discrepancy between the level of analysis of a computer and the level of
perception of the same data by a human user.”
Level | Text | Visual
Interpretation | Story | “Action”
Semantic segmentation | Paragraph | Scene
Group | Sentence | Object
Physical segmentation | Word | Homogeneous region
Physical inspection | Character | Color, texture
[Figure axes: level of interpretation, relevance for indexing, computer precision; the semantic gap is marked on the figure]
Semantic gap
• Exploited by Captchas
Region-based image understanding
Images are compared w.r.t the regions
(hopefully objects/concepts) they contain
Object detection, e.g. face detection
Object detection / recognition
• Train the machine to recognize specific groups
of patterns
Faces
[Viola, Jones, 2001]
Objects
→ Automated image tagging → Text search
Naming faces : Image to text
Specific characteristics of a face are extracted
(eg eigenfaces)
Deep Learning
[http://www.slideshare.net/NVIDIA/visual-computing-the-road-ahead-an-nvidia-ces-2015-presentation-deck]
Bag of Visual Words
• Once basic visual features are extracted,
– Eg: the description of the surrounding of each corner point
(eg SIFT)
• Basic features are grouped
– Eg corner of an object
• And “quantized”
– Many examples are gathered into one instance
• To create a “visual dictionary” of visual words
• An image is a composition (occurrences) of these visual
words (“terms”)
→ We can use the machinery developed for text
- Inverted files, etc
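The quantization step can be sketched as a nearest-centroid assignment. The assumptions: local descriptors are short numeric tuples (real SIFT descriptors have 128 dimensions), and the visual dictionary (`codebook`) has already been learned, for instance by clustering many descriptors; the values below are invented.

```python
from collections import Counter

# Hypothetical visual dictionary: one centroid per visual word.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def nearest_word(descriptor, codebook):
    """Index of the closest centroid (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def bag_of_visual_words(descriptors, codebook):
    """Occurrence vector of visual words for one image."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts[i] for i in range(len(codebook))]


# Invented local descriptors extracted from one image.
image_descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
print(bag_of_visual_words(image_descriptors, codebook))  # [1, 2, 1]
```

The resulting vector has the same form as a row of term occurrences, so TF*IDF weighting and inverted files apply unchanged.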
Comparing two documents (occurrences)
2 1 1 2 1 0 0
1 2 0 1 1 1 2
1 2 0 1 1 1 2
1 1 0 1 1 0 0
“Terms” (Vocabulary)
“Documents”(Repository)
Hyperbrowser [S. Craver, 1999]
Digital shoebox
Collection mining
• Modeling sound
• Audio information
– Speech, music, sound
• Searching for audio
Audio IR
Example: Audio
Audio
Feature extraction
“Appropriate”
mapping
“Decision”
process
Playlist
Filtering
User interaction
Query
Sound characteristics
Sound corresponds to waves perturbing the environment
Wave characteristics
– Frequency/Period
1 Hertz (Hz) = 1 vibration/second
Related notion of pitch
– Intensity
• Human ear has logarithmic scale
Decibel (dB) scale
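0 dB = $10^{-12}$ W/m$^2$ (threshold of hearing)
10 dB = $10^{-11}$ W/m$^2$
…
x*10 dB = $10^{-(12-x)}$ W/m$^2$
$I = \frac{1}{\text{Area}} \cdot \frac{\text{Energy}}{\text{Time}} = \frac{\text{Power}}{\text{Area}}$, in Watt/m$^2$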
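The decibel scale above amounts to a base-10 logarithm relative to the threshold of hearing. A small sketch, with function names chosen for illustration:

```python
import math

I0 = 1e-12  # threshold of hearing in W/m^2 (0 dB)


def intensity_to_db(intensity):
    """Sound level in dB for an intensity given in W/m^2."""
    return 10 * math.log10(intensity / I0)


def db_to_intensity(level_db):
    """Intensity in W/m^2 for a sound level given in dB."""
    return I0 * 10 ** (level_db / 10)


print(intensity_to_db(1e-12))  # 0.0
print(round(intensity_to_db(1e-11), 6))
print(db_to_intensity(60))
```

Each step of 10 dB multiplies the intensity by 10, which matches the logarithmic response of the human ear mentioned above.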
Sound perception
Airwave collection
Mapping to
mechanical waves
Mapping to
compressional waves
Mapping to
electrical signal
0 dB = $10^{-12}$ W/m$^2$: threshold of hearing
160 dB = $10^{4}$ W/m$^2$: instant perforation of eardrum
Sound perception
[Figure: hearing ranges along a frequency axis f (Hz) from 0 to 200k, divided into infrasound, audible sound and ultrasound]
Human (20-20k), Dog (50-45k), Cat (50-85k), Bats (50-85k), Bats (-120k), Dolphins (-200k), Elephant (5-)
Audio and Sound
[Figure: sounds plotted by frequency f (Hz) and intensity I (dB), with regions labelled Non-audible, Audible sounds, Speech and Painful. Partly adapted from [Schäuble, 1997]]
Example sounds shown along the intensity axis:
• Quiet library, soft whisper
• Quiet office, living room
• Light traffic at a distance, gentle breeze
• Conversation, sewing machine
• Busy traffic, noisy restaurant
• Subway, heavy city traffic, alarm clock at 2 feet, factory noise
• Truck traffic, noisy home appliances, shop tools, lawnmower
• Chain saw, pneumatic drill
• Rock concert in front of speakers, thunderclap
• (140) Gunshot blast, jet plane
• (180) Rocket launching pad
Audio information
• Speech
– Male, female speech
Signal-based speech characterisation
– Automatic speech recognition (ASR)
ASR (text) retrieval
• Music
Genre recognition (Jazz, classic,…)
Excerpt retrieval
Query by humming
…
• Sound
– Car, water, people, animals,…
– Often useful in relation to visual (video)
Audio information
[Figure: audio sources arranged by origin (natural to man-made) and information content (low to high): animal sounds, wind, water, speech, music, urban noise, machines and engines]
Audio features
• Low-level-audio representation
– Spectrogram
– MFCC
• Mid-level representations
– MIDI strings
– LPC
• High-level representation
– Automatic speech recognition (ASR)
Audio information indexing
[Figure: audio is split into speech, music and sound. Labels for speech: ASR, word spotting, translation, transcripts, speaker info. Labels for music: piece/genre recognition, score, MIDI. Label for sound: classification. Other labels: signal-based, visual, A/V processing.]
Speech
Speech production
model
Speech perception
model
Note: In vision, mostly only
perception models
Speech
Speech production
Human apparatus
Speech production scenarios
1. Voiced sound (vocal tract)
2. Fricative sound (‘S’)
3. Plosive sound (‘P’)
Speech production modeling
Mechanical model
Human apparatus
Multistage process
1. Build up
pressure
2. Release
3. Return to first
state
Speech signal modeling
Given w(t) as a speech signal,
$|S(w)|^2 = |H(w)|^2 \, |E(w)|^2$
where S(w) is the STFT of w.
The model assumes H(w) is the
vocal tract transfer function
and E(w) is the excitation
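One consequence of this product model, stated here as a general remark rather than as part of the slide: taking logarithms turns the product into a sum, so the vocal tract and excitation contributions add up in the log-spectrum. With the same symbols:

```latex
\log |S(w)|^2 = \log |H(w)|^2 + \log |E(w)|^2
```

This additive form is what cepstrum-style features (such as the MFCC listed among the audio features) build on when separating the slowly varying vocal tract envelope from the excitation.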
Note: MP3 coding
Advanced coding model
MP3 = MPEG 1 Audio layer 3
See previous
slides
Sony music browser
Islands of music [http://www.oefai.at/~elias/music]
Organisation of music archives
• Based on Kohonen self-organising maps (SOM)
ThemeFinder [http://www.themefinder.org/]
• Design a nice perceptual hash function for
audio (based on perceptual features)
• Slice an audio document (music piece) into
small chunks (large enough to be “unique”)
• For each chunk compute a key through the
hash function and store as value the music
piece ID (title)
Music query by example
Retrieval by noisy chunk
Audio fingerprinting
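The three steps can be sketched on toy data. Everything below is an assumption made for illustration: a "signal" is a list of feature values, a chunk is 4 values, the excerpt is aligned with chunk boundaries, and the "perceptual" key is a coarse quantization so that small noise usually leaves the key unchanged. Real systems use far more robust features and handle arbitrary offsets.

```python
from collections import Counter, defaultdict

CHUNK = 4  # number of feature values per chunk (toy setting)


def perceptual_key(chunk, step=10):
    """Coarse quantization: small perturbations usually give the same key."""
    return tuple(round(value / step) for value in chunk)


def chunks(signal):
    """Non-overlapping chunks of CHUNK values."""
    return [signal[i:i + CHUNK] for i in range(0, len(signal) - CHUNK + 1, CHUNK)]


index = defaultdict(set)  # key -> IDs of the pieces containing that chunk


def add_track(track_id, signal):
    for chunk in chunks(signal):
        index[perceptual_key(chunk)].add(track_id)


def identify(noisy_excerpt):
    """Each chunk of the excerpt votes for the pieces stored under its key."""
    votes = Counter()
    for chunk in chunks(noisy_excerpt):
        for track_id in index.get(perceptual_key(chunk), ()):
            votes[track_id] += 1
    return votes.most_common(1)[0][0] if votes else None


add_track("song_a", [10, 50, 20, 80, 30, 60, 90, 40])
add_track("song_b", [70, 20, 60, 10, 50, 90, 30, 80])

# Noisy copy of the second chunk of song_a.
print(identify([31, 59, 92, 41]))  # song_a
```

A value that falls close to a quantization boundary can still change the key, which is one reason why a good perceptual hash is hard to design.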
• Video type
– Movies
– Amateur souvenir film
– Lectures
– Youtube (short, static)
• Combining individual media (fusion)
– each brings some information…
• Going further: social multimedia…
Multimedia IR
© Stephane.Marchand-Maillet@unige.ch – University of Geneva – Multimedia Retrieval – September 2015 - 94
From data to information
Interpreting the content
• Data is physically measured
• Information is interpreted semantically
Semantic gap
• Fusion is a way to reduce the semantic gap
• Examples
– Looking at a movie w/o audio
– Listening to a story w/o visual
– Voice conversation vs video conversation
– Disambiguation (eg „jaguar“)
Multimodal video features
Over temporal modalities:
• Segmentation (boundary) cues
• Categorization (region) cues
[Illustration: the audio, visual and text streams provide multimodal boundary-based cues and semantic region-based cues, for example a word relatedness measure and visual syntactic homogeneity]
Adding context to interpretation
Video Story Segmentation:
Using fusion between the visual and audio
modalities, high-level segmentation may be
achieved
[Illustration: each “shot” is summarised by a keyframe and its audio transcript]
Main goals of fusion
Multimodal information fusion aims at interpreting
jointly multiple sources of information
representing the same underlying „concept“
The main goal is the extraction of information
By fusing information, one aims at:
• Being more accurate in the discovery of the
„concept“
– Each individual stream may be incomplete
• Being more robust in the discovery of the
„concept“
– Each individual stream may be distorted (eg, noisy)
Multimodal fusion
• From many sources of information and
context, how can we best “interpret”
the data?
[Illustration: data → representation → interpretation. Visual representation: color, texture, shape, face, local features. Textual representation: vector space model over annotations such as Port; Naval and River; For Transportation; Structure; Architecture; Vegetable Garden; Landscape (Farmland - Countryside); Landscape; Landscape (Seascape); Panorama; View of the City; View]
Levels of fusion
How to organise fusion from classical
„unimodal“ interpretation?
[Illustration: raw data (unique source) → feature extraction → recognizer, supported by „knowledge“ → output]
Levels of fusion
[Illustration: raw data from multiple sources → recognizer → decision → one unique output]
Early fusion strategy
• Acts over features
• All modalities are „concatenated into one“
• Only one decision is taken over the concatenated input
[Illustration: raw data from multiple sources → fusion → recognizer → decision → one unique output]
Example: multimodal medical image analysis
• Sources: modalities (X-ray, PET,…)
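A minimal sketch of early fusion, assuming each modality has already been turned into a numeric feature vector. The feature values, the labels and the nearest-centroid recognizer are invented for illustration (the slides do not prescribe a recognizer), and the names `early_fusion`, `train_centroids` and `decide` are hypothetical.

```python
import math


def early_fusion(*modalities):
    """Concatenate the feature vectors of all modalities into one vector."""
    return [value for features in modalities for value in features]


def train_centroids(samples):
    """samples: list of (fused_vector, label). Returns one mean vector per label."""
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def decide(centroids, vector):
    """Single decision: label of the closest centroid in the fused space."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Toy training set: (X-ray features, PET features) -> label, all values made up.
training = [
    (early_fusion([0.1, 0.2], [0.9]), "healthy"),
    (early_fusion([0.2, 0.1], [0.8]), "healthy"),
    (early_fusion([0.8, 0.9], [0.2]), "lesion"),
    (early_fusion([0.9, 0.7], [0.1]), "lesion"),
]
centroids = train_centroids(training)
print(decide(centroids, early_fusion([0.7, 0.8], [0.3])))  # → lesion
```

Only one recognizer sees the data here, so correlations between modalities are available to it, at the price of a higher-dimensional input.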
Intermediate fusion strategy
• Each source acts as an input for a specific decision
• All recognizers are coupled in some way to form a meta-recognizer
• One decision is taken
[Illustration: raw data from multiple sources → one recognizer per source → fusion in a meta-recognizer → decision → one unique output]
Example: Video analysis
• Source: A/V streams
Late fusion strategy
• Each source is processed individually by a specific recognizer
• Multiple independent initial decisions are taken, possibly associated with
confidence scores
• A final decision is taken based on this output
[Illustration: raw data from multiple sources → one recognizer and one decision per source → fusion → final decision → one unique output]
Example: Object recognition
• Source: image regions
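A minimal sketch of late fusion, assuming each source has already produced its own decision as a set of per-label confidence scores. Combining them by a weighted average is only one possible rule (voting or a product rule are alternatives); the function name `late_fusion`, the region names and the scores are hypothetical.

```python
def late_fusion(decisions, weights=None):
    """decisions: {source: {label: confidence}}. Returns (final label, fused scores)."""
    weights = weights or {source: 1.0 for source in decisions}
    total = sum(weights[source] for source in decisions)
    fused = {}
    for source, scores in decisions.items():
        for label, confidence in scores.items():
            fused[label] = fused.get(label, 0.0) + weights[source] * confidence / total
    return max(fused, key=fused.get), fused


# One independent decision per image region (made-up confidence scores).
decisions = {
    "region_1": {"car": 0.6, "bus": 0.4},
    "region_2": {"car": 0.3, "bus": 0.7},
    "region_3": {"car": 0.8, "bus": 0.2},
}
label, fused = late_fusion(decisions)
print(label)  # → car

# Trusting region_2 more than the others changes the final decision.
label_weighted, _ = late_fusion(decisions, {"region_1": 1, "region_2": 5, "region_3": 1})
print(label_weighted)  # → bus
```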
Prototype-based fusion
[Illustration: the query and each document x are mapped to a prototype-based representation π(x) built from x^(SIFT) and prototypes p_2, p_7, …]
Concept “Mona Lisa”
Multimedia is complex data
1503-1506 by L. Da Vinci
EN: Mona Lisa
IT: La Gioconda
FR: La Joconde
Visible in Le Louvre, Paris
48° 51′ 34″ N 2° 19′ 54″ E
Type: Oil on poplar
Dimensions: 77 cm × 53 cm (30 in × 21 in)
Photo by Mr. X
Date, context
Reason of trip
Variations from artists
Embedding into prints
Chocolate piece
URLs about Mona Lisa
Books about Mona Lisa
Videos about Mona Lisa
Audio about Mona Lisa
“official picture”
Texts about Mona Lisa
“Factual” data
“Faithful” data “Related” data
Networked multimodal data
Data
Tags
Users
Features
Networked multimodal data
Data
Tags
Users
• Curse of dimensionality
– Sparsity
– Dimension reduction/visualisation
• Big data
– Indexing
– Visual Analytics
Complements
Statistics tell us that the more data we have, the
better our estimates get
Simulation: exponential distribution
[Plots: samples of size n=10, n=100, n=1’000, n=3’000 and n=10’000, with the estimated mean and standard deviation]
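A small sketch of such a simulation, assuming an exponential distribution of rate 1 (so that the true mean and the true standard deviation are both 1). The rate, the seed and the function name `estimate` are arbitrary choices, not taken from the slides.

```python
import random
import statistics


def estimate(n, rate=1.0, seed=42):
    """Draw n exponential samples and return (sample mean, sample std)."""
    rng = random.Random(seed)
    sample = [rng.expovariate(rate) for _ in range(n)]
    return statistics.fmean(sample), statistics.stdev(sample)


# The estimates should settle around 1.0 as n grows.
for n in (10, 100, 1_000, 3_000, 10_000):
    mean, std = estimate(n)
    print(f"n={n:>6}  mean={mean:.3f}  std={std:.3f}")
```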
Imagine I want to know the age spread of a population
• I create 10 intervals (0-10, 10-20,…) and I want to know the proportion of
people in each interval
• To have a reliable estimate I want 100 persons per interval on average
→ With 10*100=1’000 persons, I get my estimates
Now, imagine I want to estimate height against age (I add one dimension
to my measure)
• I create 10 height intervals and ask, “how many people of height X
are aged Y?”
• I have 10*10=100 such (X,Y) pairs
• To get 100 persons for each pair on average I need 100*100 = 10’000
persons
And so on…
→ The size of the data needed for a reliable estimate grows exponentially
with the dimension (huge!!!)
→ In practice our data size is limited
→ Our only choice to preserve reliability is to reduce the dimension
High dimensionality
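The arithmetic of this example can be written down directly. This is only a restatement of the counting argument, with 10 intervals per dimension and 100 persons per cell as in the example; the function name `samples_needed` is hypothetical.

```python
def samples_needed(dimensions, bins_per_dimension=10, per_cell=100):
    """Persons needed to get `per_cell` persons on average in every cell."""
    return per_cell * bins_per_dimension ** dimensions


for d in (1, 2, 3, 10):
    print(d, samples_needed(d))
# 1 dimension → 1000, 2 dimensions → 10000, 10 dimensions → 1000000000000
```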
Dimension reduction: principle
• Given a set of data in an M-dimensional space, we
seek an equivalent representation of lower
dimension
– For better statistics
– For visualization (2D, 3D,..)
• Dimension reduction induces a loss. What to
sacrifice? What to preserve?
– Preserve local: neighbourhood, distances
– Preserve global: distribution of data, variance
– Sacrifice local: noise
– Sacrifice global: large distances
– Map linearly
– Unfold data
Some example techniques:
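• SFC (space-filling curves): preserve neighbourhoods
• PCA (principal component analysis): preserve global linear structures
• MDS (multidimensional scaling): preserve linear neighbourhoods
• IsoMap: unfold neighbourhoods
• SNE (stochastic neighbour embedding) family: unfold statistically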
Dimension reduction
Non-recursive Space Filling Curves
Z-Scan Curve Snake Scan Curve
[Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]
Recursive Space Filling Curves
Hilbert Curve
Gray Code Curve
Peano Curve
[Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]
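As an illustration of a recursive space-filling curve, here is a sketch of the widely used iterative conversion from a position d along a 2D Hilbert curve to grid coordinates (x, y). The function name `hilbert_d2xy` and the choice of a 2**order by 2**order grid are assumptions made for this example.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate / flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


path = [hilbert_d2xy(2, d) for d in range(16)]
print(path)  # starts (0, 0), (1, 0), (1, 1), (0, 1), (0, 2), ...
```

Consecutive positions on the curve are always neighbouring cells of the grid, which is the neighbourhood-preserving property that makes such curves usable for mapping 2D data to 1D.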
3D SFC
Z Curve Peano Curve Hilbert Curve
N-dimensional algorithm:
A.R. Butz (April 1971). "Alternative algorithm for Hilbert’s space filling curve.".
IEEE Trans. On Computers, 20: 424–42.
[Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]
PCA
[Illustration Wikipedia]
• A local neighbourhood graph (eg 5-NN graph) is built to create
a topology and ensure continuity
• Distances are replaced by geodesics (paths on the
neighbourhood graph)
• MDS is applied on this interdistance matrix (eg with m=2)
IsoMap (non Euclidean)
[Illustration from http://isomap.stanford.edu]
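A sketch of the first two IsoMap steps (neighbourhood graph, then geodesic distances), under assumptions: a 2-NN graph on points sampled along a half circle, and Floyd-Warshall for the shortest paths. The final MDS step is left out, and the names `knn_graph` and `geodesic_distances` are hypothetical.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a dense matrix of edge lengths."""
    n = len(points)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = math.dist(points[i], points[j])
    return graph


def geodesic_distances(graph):
    """All-pairs shortest paths along the graph (Floyd-Warshall)."""
    n = len(graph)
    dist = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist


# 11 points on a half circle of radius 1.
points = [(math.cos(i * math.pi / 10), math.sin(i * math.pi / 10)) for i in range(11)]
geo = geodesic_distances(knn_graph(points, k=2))

print(math.dist(points[0], points[10]))  # straight-line distance, about 2
print(geo[0][10])                        # path along the arc, close to pi
```

The geodesic between the two end points follows the arc instead of cutting through it, which is what lets the subsequent MDS step "unfold" the curve.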
• MNIST dataset
t-SNE example
[Illustration from L. van der Maaten’s website]
Traces of our everyday activities can be:
• Captured, exchanged (production,
communication)
• Aggregated, Stored
• Filtered, Mined (Processing)
The “V”’s of Big Data:
• Volume, Variety, Velocity (technical)
• and hopefully... Value
Raw data is almost worthless; the added value lies in
juicing the data into information (and knowledge)
Big Data
Large-scale: massive data volume, too large to
perform a sequential scan of even a small portion
Aim:
From a query, “filter” by (fast) approximate
search a tiny (fixed-size) portion of the data as
candidate answers, and then perform (slow) exact
search in that tiny sample
Method:
Assuming the features group the documents
coherently, approximately identify the vicinity of
the query and search within that neighborhood
Tools for Large Scale indexing
Approximate search
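[Illustration: the fast approximate filtering border encloses the exact search zone; within it, the actual (slow) border encloses the final results; documents outside the filtering border are never considered]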
Idea: reference-based indexing
“If I am close to you, I see the same thing as you”
I am close to Geneva, Montreux and Evian, anything also close
to these places is close to me
• Fix a few reference documents (landmarks)
• For each document, measure the distance to each
reference document
– E.g. define the set of its 5 closest landmarks
• For a query
– Find its 5 closest landmarks
– Find all documents whose 5 closest landmarks are the same
– Compare the query to these with the actual (slow) distance
Fast approximate filtering
• Fix a few reference documents (landmarks)
→ Can be done offline
• For each document, measure the distance to each
reference document
– E.g. define the set of its 5 closest landmarks
→ Also done offline
• For a query
– Find its 5 closest landmarks
→ Fast since there are few landmarks
– Find all documents whose 5 closest landmarks are the
same
→ Can be made fast (e.g. with an inverted file)
– Compare the query to these with the actual (slow) distance
→ Fast since the sample is tiny
Fast approximate filtering
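The steps above can be sketched as follows, with loud assumptions: documents are 2D points, the distance is Euclidean, the 2 closest landmarks are used instead of 5 to keep the toy example small, and the inverted file is a plain dictionary keyed by the set of closest landmarks. All names and coordinates are invented for illustration.

```python
import math
from collections import defaultdict


def closest_landmarks(point, landmarks, k):
    """Set of indices of the k landmarks closest to the point."""
    order = sorted(range(len(landmarks)),
                   key=lambda i: math.dist(point, landmarks[i]))
    return frozenset(order[:k])


def build_inverted_file(documents, landmarks, k):
    """Offline step: group document IDs by their set of k closest landmarks."""
    inverted = defaultdict(list)
    for doc_id, point in documents.items():
        inverted[closest_landmarks(point, landmarks, k)].append(doc_id)
    return inverted


def search(query, documents, landmarks, inverted, k):
    """Fast filtering by landmark set, then exact (slow) ranking of the candidates."""
    candidates = inverted.get(closest_landmarks(query, landmarks, k), [])
    return sorted(candidates, key=lambda doc_id: math.dist(query, documents[doc_id]))


landmarks = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
documents = {"a": (1, 1), "b": (2, 1), "c": (9, 9), "d": (9, 1)}
inverted = build_inverted_file(documents, landmarks, k=2)

# Only "a" and "b" share the query's closest landmarks; "c" and "d" are never compared.
print(search((1.2, 1.0), documents, landmarks, inverted, k=2))  # → ['a', 'b']
```

Requiring exactly the same landmark set is the strictest possible filter; it is fast but may miss true neighbours, which is why the search is only approximate.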
Permutation-based Indexing
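$D = \{x_1, \dots, x_N\}$: N objects
$R = \{r_1, \dots, r_n\} \subset D$: n references
Each $x_i$ is identified by an ordered list
$L(x_i, R) = (r_{i_1}, \dots, r_{i_n})$ such that
$d(x_i, r_{i_j}) \le d(x_i, r_{i_{j+1}}) \quad \forall j = 1, \dots, n-1$
Example with n=5:
$L(x_1, R) = (r_1, r_2, r_3, r_4, r_5)$, $L(x_2, R) = (r_1, r_2, r_3, r_4, r_5)$, $L(x_3, R) = (r_5, r_3, r_2, r_4, r_1)$
[Illustration: objects $x_1, \dots, x_5$ and references $r_1, \dots, r_5$ in a space with axes x, y, z]
$d(q, x_i) \overset{\text{rank}}{\approx} d_{SFD}(q, x_i) = \sum_{j=1}^{n} \left| L(q, R)_{|r_j} - L(x_i, R)_{|r_j} \right|$
where $L(x, R)_{|r_j}$ is the position of $r_j$ in $L(x, R)$ and SFD is the Spearman footrule distance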
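A minimal sketch of permutation-based comparison, assuming that SFD denotes the Spearman footrule distance, i.e. the sum over all references of the difference between their positions in the two ordered lists. The names `ordered_list` and `footrule`, and the 1D toy references, are hypothetical.

```python
def ordered_list(x, references, distance):
    """L(x, R): reference IDs sorted by increasing distance to x."""
    return tuple(sorted(references, key=lambda r: distance(x, references[r])))


def footrule(list_q, list_x):
    """Spearman footrule: summed displacement of each reference between two lists."""
    position_x = {r: rank for rank, r in enumerate(list_x)}
    return sum(abs(rank - position_x[r]) for rank, r in enumerate(list_q))


# The three ordered lists of the n=5 example.
L1 = ("r1", "r2", "r3", "r4", "r5")
L2 = ("r1", "r2", "r3", "r4", "r5")
L3 = ("r5", "r3", "r2", "r4", "r1")
print(footrule(L1, L2))  # → 0  (x1 and x2 see the references in the same order)
print(footrule(L1, L3))  # → 10 (x3 sees them in a very different order)

# Building an ordered list from actual distances (1D toy references).
references = {"r1": 0.0, "r2": 1.0, "r3": 2.0}
print(ordered_list(1.9, references, lambda a, b: abs(a - b)))  # → ('r3', 'r2', 'r1')
```

Only the ordered lists are compared at query time, so the original (possibly expensive) distance is evaluated against the n references only, not against the N objects.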
• Assess system performance on standard
challenges
– Data collection
– Annotation / ground-truth collection
– Query formulation
– Performance measures (precision, recall,…)
Yearly event / related to scientific conference
In general, participants are academic teams
Evaluation
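As a reminder of the two performance measures named above, here is a small sketch using their standard set-based definitions; the function name `precision_recall` and the document IDs are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The system returns 4 documents; the ground truth lists 3 relevant ones.
precision, recall = precision_recall(["a", "b", "c", "d"], {"a", "c", "e"})
print(precision, recall)  # precision 0.5 (2 of 4), recall 2/3 (2 of 3)
```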
ImageCLEF
http://www.imageclef.org/
CLEF = Cross-Language Evaluation Forum
ImageCLEF Wikipedia
Images + text (Overall: 237,434)
• English only: 70,127
• German only: 50,291
• French only: 28,461
• English and German: 26,880
• English and French: 20,747
• German and French: 9,646
• English, German and French: 22,899
• Language undetermined: 8,144
• No textual annotation: 239
TRECVid
The goal of the conference series is to encourage research in
information retrieval by providing a large test collection, uniform
scoring procedures, and a forum for organizations interested in
comparing their results.
http://trecvid.nist.gov/
MIREX
http://www.music-ir.org
• The Music Information Retrieval Evaluation
eXchange (MIREX) is an annual evaluation
campaign for Music Information Retrieval
(MIR) algorithms, coupled to the International
Society (and Conference) for Music
Information Retrieval (ISMIR)
Other
• 3D Retrieval
– SHREC: Shape Retrieval Contest
– http://www.aimatshape.net/event/SHREC
• XML retrieval
– INEX: INitiative for the Evaluation of XML retrieval
– https://inex.mmci.uni-saarland.de/
• Many dedicated corpora for specific tasks
– Generic (Image-Net)
– Medical (ImageCLEF)
– Satellite imagery
– …
Conclusions
• Multimedia Information Retrieval extends classical IR
– Word occurrence model is preserved
• Text IR is not always sufficient / feasible
– Completeness
– Scalability
• Multimedia IR requires accurate interpretation of multimodal
data
• Fusion is required for reaching such an accurate level
• Multimedia IR still faces many challenges
– Accuracy, interactivity, scalability
Challenges
• Speed and accuracy of response are the major
challenges
• IR tends to be mobile
– Low computational power, connection, memory,…
• Large scale
– Big data
• Partial search
– Searching for a part of a document
• Semantic search
– Searching with high-level description
• Baeza-Yates R. and Ribeiro-Neto B. (1999): Modern
Information Retrieval. Addison-Wesley.
http://people.ischool.berkeley.edu/~hearst/irbook
• Baeza-Yates R. and Ribeiro-Neto B. (2011): Modern
Information Retrieval. Addison-Wesley, Second
Edition.
http://www.mir2ed.org/
Recommended reading
Thank you!
Questions?
Ad

More Related Content

What's hot (20)

Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
kavitha muneeshwaran
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
ankur bhalla
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
Mustafa Sherazi
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
Selman Bozkır
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
neelamoberoi1030
 
web mining
web miningweb mining
web mining
Arpit Verma
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
rathnaarul
 
Introduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data MiningIntroduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data Mining
AarshDhokai
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
Valerii Klymchuk
 
Web data mining
Web data miningWeb data mining
Web data mining
Institute of Technology Telkom
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
nimmyjans4
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
Azad public school
 
Text mining
Text miningText mining
Text mining
Malik Imran
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
Gajanand Sharma
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
Amr Abd El Latief
 
Clustering paradigms and Partitioning Algorithms
Clustering paradigms and Partitioning AlgorithmsClustering paradigms and Partitioning Algorithms
Clustering paradigms and Partitioning Algorithms
Umang MIshra
 
Text MIning
Text MIningText MIning
Text MIning
Prakhyath Rai
 
Social Impacts & Trends of Data Mining
Social Impacts & Trends of Data MiningSocial Impacts & Trends of Data Mining
Social Impacts & Trends of Data Mining
SushilDhakal4
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
kavitha muneeshwaran
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
ankur bhalla
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
Mustafa Sherazi
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
Selman Bozkır
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
neelamoberoi1030
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
rathnaarul
 
Introduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data MiningIntroduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data Mining
AarshDhokai
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
Valerii Klymchuk
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
nimmyjans4
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
Amr Abd El Latief
 
Clustering paradigms and Partitioning Algorithms
Clustering paradigms and Partitioning AlgorithmsClustering paradigms and Partitioning Algorithms
Clustering paradigms and Partitioning Algorithms
Umang MIshra
 
Social Impacts & Trends of Data Mining
Social Impacts & Trends of Data MiningSocial Impacts & Trends of Data Mining
Social Impacts & Trends of Data Mining
SushilDhakal4
 

Viewers also liked (20)

Multimedia content based retrieval slideshare.ppt
Multimedia content based retrieval slideshare.pptMultimedia content based retrieval slideshare.ppt
Multimedia content based retrieval slideshare.ppt
govintech1
 
Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...
webhostingguy
 
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Jonathon Hare
 
multimedia chapter1
multimedia chapter1multimedia chapter1
multimedia chapter1
nes
 
Multimedia
MultimediaMultimedia
Multimedia
Shivam Tuteja
 
Iaetsd enhancement of face retrival desigend for
Iaetsd enhancement of face retrival desigend forIaetsd enhancement of face retrival desigend for
Iaetsd enhancement of face retrival desigend for
Iaetsd Iaetsd
 
Principles of Multimedia
Principles of MultimediaPrinciples of Multimedia
Principles of Multimedia
dborcoman
 
Lecture05
Lecture05Lecture05
Lecture05
mavillard
 
Lecture01
Lecture01Lecture01
Lecture01
mavillard
 
Lecture06
Lecture06Lecture06
Lecture06
mavillard
 
Multimedia project life cycle - wsolvt
Multimedia project life cycle - wsolvtMultimedia project life cycle - wsolvt
Multimedia project life cycle - wsolvt
Wsolvt
 
Multimedia Development and Evaluation
Multimedia Development and EvaluationMultimedia Development and Evaluation
Multimedia Development and Evaluation
Mechelle Tumanda
 
Achieving interoperability between CARARE schema for monuments and sites and ...
Achieving interoperability between CARARE schema for monuments and sites and ...Achieving interoperability between CARARE schema for monuments and sites and ...
Achieving interoperability between CARARE schema for monuments and sites and ...
Valentine Charles
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
Edge AI and Vision Alliance
 
Content Based Image and Video Retrieval Algorithm
Content Based Image and Video Retrieval AlgorithmContent Based Image and Video Retrieval Algorithm
Content Based Image and Video Retrieval Algorithm
Akshit Bum
 
Principles Of Multimedia
Principles Of MultimediaPrinciples Of Multimedia
Principles Of Multimedia
dborcoman
 
Calgary Multimedia Company
Calgary Multimedia CompanyCalgary Multimedia Company
Calgary Multimedia Company
brandsmultimedia
 
Chapter 8 - Multimedia Storage and Retrieval
Chapter 8 - Multimedia Storage and RetrievalChapter 8 - Multimedia Storage and Retrieval
Chapter 8 - Multimedia Storage and Retrieval
Pratik Pradhan
 
2015.12.17 kg bim
2015.12.17 kg bim2015.12.17 kg bim
2015.12.17 kg bim
Konstantinos Gkoumas
 
The Art and Science of Image Retrieval
The Art and Science of Image RetrievalThe Art and Science of Image Retrieval
The Art and Science of Image Retrieval
Jonathon Hare
 
Multimedia content based retrieval slideshare.ppt
Multimedia content based retrieval slideshare.pptMultimedia content based retrieval slideshare.ppt
Multimedia content based retrieval slideshare.ppt
govintech1
 
Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...
webhostingguy
 
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Jonathon Hare
 
multimedia chapter1
multimedia chapter1multimedia chapter1
multimedia chapter1
nes
 
Iaetsd enhancement of face retrival desigend for
Iaetsd enhancement of face retrival desigend forIaetsd enhancement of face retrival desigend for
Iaetsd enhancement of face retrival desigend for
Iaetsd Iaetsd
 
Principles of Multimedia
Principles of MultimediaPrinciples of Multimedia
Principles of Multimedia
dborcoman
 
Multimedia project life cycle - wsolvt
Multimedia project life cycle - wsolvtMultimedia project life cycle - wsolvt
Multimedia project life cycle - wsolvt
Wsolvt
 
Multimedia Development and Evaluation
Multimedia Development and EvaluationMultimedia Development and Evaluation
Multimedia Development and Evaluation
Mechelle Tumanda
 
Achieving interoperability between CARARE schema for monuments and sites and ...
Achieving interoperability between CARARE schema for monuments and sites and ...Achieving interoperability between CARARE schema for monuments and sites and ...
Achieving interoperability between CARARE schema for monuments and sites and ...
Valentine Charles
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
Multimedia Information Retrieval

  • 1. Multimedia Information Retrieval
      Stephane Marchand-Maillet, Viper group, University of Geneva, Switzerland
      http://viper.unige.ch
      © Stephane.Marchand-Maillet@unige.ch – University of Geneva – Multimedia Retrieval – September 2015
  • 2. Quick get-to-know
      • What is your background?
          – Computer Science
          – Scientific
          – Humanities
      • This lecture introduces intuition from examples
          – More technical details in recommended reading
  • 3. Objectives of the lecture
      • To introduce (recall, rehearse) some key principles and techniques of IR
      • To show how these are adapted to Multimedia
      • To look at the Multimedia context
      • To see how to best face it
  • 4. Outline
      • Motivation and context
      • Preliminary material on IR
      • Audio-visual IR
      • Complements
      • Evaluation
      Note: Several illustrations from within these slides have been borrowed from the Web, including Wikipedia or teaching material. Please do not reproduce without permission from the respective authors. When in doubt, don't.
  • 5. Data Production
      • Growth of Data
          – 1,250B GB (≈1.2 ZB, i.e. roughly 1,200 EB) of data generated in 2010.
          – Data generation growing at a rate of 58% per year
          – Source: Baraniuk, R., “More is Less: Signal Processing and the Data Deluge”, Science, V331, 2011.
      • 1 exabyte (EB) = 1,073,741,824 gigabytes
      [Chart: Data Generation Growth; data size (EB) per year, 2010 to 2014, axis from 0 to 10,000 EB]
      [Pictures: Internet, Scientific and Industry data; from http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html and http://www.ritholtz.com/blog/2011/12/financial-industry-interconnectedness/; by Sverre Jarp, by Felix Schürmann] © Copyright attached
  • 6. A digital world
      [From http://1000memories.com/blog/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox]
  • 7. Data communication
      [Picture from: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html] © Copyright attached
  • 8. User “productivity”
      [Picture from: http://www.go-gulf.com/wp-content/themes/go-gulf/blog/online-time.jpg - Feb 2012] © Copyright attached
  • 9. Motivation
      • Decision making requires informed choices
      • Data production is ever easier
      • The information is often overwhelming
          – « Big Data » trend
      • The information is often not easy to manage and access
      → We need to bring a structure to the raw data
      • Document (data) representation
      • Similarity measurements
      • Further analysis: data mining, information retrieval, learning
      © Copyright attached
  • 10. Components of Data Quality
      • Validity
      • Integrity
      • Completeness
      • Consistency
      • Relevance
      • Accuracy
      and…
      • Timeliness
      • Interpretability
      • Reliability
      • …
      → Applies to all media
  • 11. Preliminary
      • Searching for multimedia
          – Is it like searching text?
      • Information retrieval model
      • Representing documents
          – BoW and TF*IDF
      • Indexing model
          – Inverted files
          – Hashing
  • 12. Searching for multimedia
      Basic solution:
      • Attach text to multimedia and search for text
          – Tags, keywords
          – Description
      “Multimedia Annotation”
          – Domain of “Knowledge Management”
          – Requires the definition of Ontologies, Taxonomies,…
      Human in the loop
          – Not scalable, not feasible
  • 13. Per minute rate (some years ago!)
      [Picture from: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html] © Copyright attached
  • 14. Auto-annotation
      • From the sensing device
          – Device type, model
          – Sensing conditions (flash/no flash)
          – Author (device owner?)
          – …
      • From the environment
          – Time, date, location
          – Environment conditions (inside/outside, weather)
          – Spoken languages
          – …
      • More? From the content?
          – Later in this lecture
      Note: Can never be exhaustive
  • 15. Annotation
      • “Wasp on a yellow flower”
      • Specific (Latin) name of the flower? Of the insect?
      • What exactly is the insect doing?
      • Weather looks nice: “countryside scene”?
      • The picture may be arty: “2015 1st prize college photo competition winner”
      Inference cannot resolve everything
      • “The wasp who scared my daughter during our holiday in Gruyere”
  • 16. Annotating multimedia
      • Exif: standard picture annotation container
      • ID3-tag: standard audio information
      • NTFS: Windows-based metadata exchange
      • JPSearch, MPEG-7: Search-oriented description standards
      • Every format will have a text header to insert “comments” (PNG, GIF, JPG, TIFF,…)
      → Complementarity text-based / content-based
      • Other category is recommender-based
          – “social” annotation
  • 17. MPEG-7 Still Image Description
  • 18. IBM MPEG-7 Annotation Tool (video/visual stream)
      [http://www.research.ibm.com/VideoAnnEx/]
  • 19. Information Retrieval paradigm
      • Given a query (information need)
      • Given a repository (information source)
      → Infer the most relevant documents from the repository as an answer to the query
      The most used query type is Query-by-Example: “I want something like this”
      • Descriptive query
      • Free text (Google-like) query
      • Example document(s)
      • Relevance is similarity
      • Similarity is transferred to “closeness”
      More like this…
  • 20. Information Retrieval
      • IR tells us that we do not (really) need a complete, absolute understanding of documents to respond to queries; we (just) need to be able to compare two documents
      • We merely need appropriate distance measurements
          – To infer similarity
          – Based on document features (space)
          – and a notion of vicinity (distance)
      → Everything is about inspecting neighborhoods
          – Provided the distance is semantically relevant
  • 21. Information management process
      [Diagram. Stages: Raw documents, Document features, Representation space (visualisation), User interaction. Steps between stages: Feature extraction, “Appropriate” mapping, “Decision” process. Input: Query]
  • 22. Distance-based search
      [Diagram: the Query is placed in the Representation space (visualisation); the result is a Relevance list, sorted by distance to the query]
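Distance-based search can be sketched in a few lines. The example below is only an illustration: the two-dimensional feature vectors, the document identifiers and the choice of Euclidean distance are assumptions made for the sketch, not something the slides prescribe.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(query, collection):
    """Return (doc_id, distance) pairs sorted by increasing distance to the query."""
    scored = [(doc_id, euclidean(query, vec)) for doc_id, vec in collection.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical documents already mapped to a 2-D representation space.
collection = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
ranking = rank_by_distance((0.9, 0.0), collection)
print([doc_id for doc_id, _ in ranking])
```

The relevance list is simply the collection sorted by distance to the query; any other distance could be swapped in, provided it is semantically relevant for the features used.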
  • 23. Example: text
      [Diagram. Stages: Text documents, “Word” occurrences, User interaction. Steps between stages: Feature extraction, “Appropriate” mapping, “Decision” process. Input: Query]
  • 24. Also...
      • Any type of media: webpage, audio, video, data,...
      • Objects, based on their characteristics
      • People in social networks
      • Concepts: processes, states, ...
      Etc
      Anything for which “characteristics” may be measured
      This is what this lecture is about
  • 25. Representation spaces (intuition)
      • Raw data (the documents) carries information
      + Computers essentially perform additions
      → We need to represent the data so as to provide the computer with information that is as faithful as possible
      • The representation is an opportunity for us to transfer some prior (domain) knowledge as design assumptions
      If this (data modelling) step is flawed, the computer will work with random information
  • 26. From documents to features
      • Features help characterize one document (summary)
      • Features help compare two documents (similarity)
      • Features help structure a collection (visualization)
  • 27. Comparing two documents (occurrences)
      Rows: “Documents” (Repository); columns: “Terms” (Vocabulary)
          2 1 1 2 1 0 0
          1 2 0 1 1 1 2
          1 2 0 1 1 1 2
          1 1 0 1 1 0 0
  • 28. Comparing two documents (presence)
          Y Y Y Y Y N N
          Y Y N N Y Y Y
          Y Y N N Y Y Y
          Y Y N Y Y N N
      Similarity: Y+Y=1
      Distance: Y+N=1
  • 29. Comparing two documents (presence)
          Y Y Y Y Y N N
          Y Y N Y Y Y Y
          1 1 0 1 1 0 0
      Similarity = 4 (Distance=3)
  • 30. Comparing two documents
      Pairwise scores, written S(D), i.e. similarity (distance): 6 (0), 3 (4), 3 (4), 4 (1), 3 (2), 3 (2)
  • 31. Comparing two documents
                               S(D)
          D1   Y Y Y Y Y N N   1 (6)
          D2   Y Y N N Y Y Y   2 (4)
          D3   Y Y N N Y Y Y   2 (4)
          D4   Y Y N Y Y N N   0 (7)
          Q    N N Y N N Y Y
      Note: The query is seen as a document (Query-by-example)
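The S(D) scores of this example can be reproduced with a short sketch. The presence patterns are copied from the slide; the function names are hypothetical. Similarity counts the (Y,Y) pairs and distance counts the positions where the two patterns differ.

```python
def to_bits(pattern):
    """Turn a string such as 'YYNNYYY' into a list of booleans."""
    return [flag == "Y" for flag in pattern]

def similarity(a, b):
    """Number of terms present in both documents (Y+Y=1)."""
    return sum(x and y for x, y in zip(a, b))

def distance(a, b):
    """Number of terms present in exactly one of the two documents (Y+N=1)."""
    return sum(x != y for x, y in zip(a, b))

docs = {"D1": "YYYYYNN", "D2": "YYNNYYY", "D3": "YYNNYYY", "D4": "YYNYYNN"}
query = to_bits("NNYNNYY")

scores = {
    name: (similarity(to_bits(pattern), query), distance(to_bits(pattern), query))
    for name, pattern in docs.items()
}
print(scores)  # each value is a (similarity, distance) pair
```

Treating the query as one more presence pattern is what makes this Query-by-example: the same two functions compare document to document and query to document.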
  • 32. Query-based ranking
      • Since similarity is computed only on (Y,Y) pairs, only the terms of the query count for computing the similarity
      → Every document having no common term with the query scores 0 (ie terms not in the query are ignored)
      Ranking for Q, as S(D): D2 2(4), D3 2(4), D1 1(6), D4 0(7)
  • 33. Computing the ranks
      Naïve solution (fill the array):
      • For every document, for every term, score 1 if the term is both in the query and the document
      → N (documents) * M (terms) comparisons
      Smarter solution:
      • For every term of the query (few), find documents having this term and work from there (the rest will score 0)
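The cost of the naïve solution can be made visible by counting comparisons. This is a sketch under the assumption that documents and query are stored as Y/N strings over the same vocabulary (here the four documents and the query of the earlier example); `naive_scores` is a hypothetical helper name.

```python
# Presence patterns over a 7-term vocabulary, taken from the earlier example.
DOCS = {"D1": "YYYYYNN", "D2": "YYNNYYY", "D3": "YYNNYYY", "D4": "YYNYYNN"}
QUERY = "NNYNNYY"

def naive_scores(docs, query):
    """Fill the whole documents x terms array: one comparison per cell."""
    scores = {}
    comparisons = 0
    for name, pattern in docs.items():
        score = 0
        for doc_flag, query_flag in zip(pattern, query):
            comparisons += 1
            if doc_flag == "Y" and query_flag == "Y":
                score += 1
        scores[name] = score
    return scores, comparisons

scores, comparisons = naive_scores(DOCS, QUERY)
print(scores, comparisons)
```

With N = 4 documents and M = 7 terms the loop performs N * M = 28 comparisons, most of them on terms that are not in the query at all.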
  • 34. Information indexing
      Goal: to organize the data so as to facilitate its access (read or write)
      • Fast access for computation
      • Fast access with selection criteria
      • …
      Baseline: exhaustive search, sequential scan (unordered list)
      Strategy: enable “direct access”
      Simple example:
      • Address book: sorted list, with first letter shortcut
          – Smith → “S” → search 19th sublist (ignore 25 other sublists)
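The address-book example can be sketched as follows. The names and the helper functions are made up for illustration, and the binary search inside each sublist (via the standard `bisect` module) is an added assumption; the slide only mentions the first-letter shortcut.

```python
import bisect
from collections import defaultdict

def build_address_book(names):
    """Group names by first letter; each sublist is kept sorted."""
    book = defaultdict(list)
    for name in names:
        bisect.insort(book[name[0].upper()], name)
    return book

def lookup(book, name):
    """Jump directly to the sublist of the first letter, then search only there."""
    sublist = book.get(name[0].upper(), [])
    pos = bisect.bisect_left(sublist, name)
    return pos < len(sublist) and sublist[pos] == name

book = build_address_book(["Smith", "Jones", "Stone", "Adams"])
print(lookup(book, "Smith"), lookup(book, "Snow"))
print(ord("S") - ord("A") + 1)  # position of the "S" sublist in the alphabet
```

The first letter acts as the index key: it gives direct access to one sublist, and the other 25 are never scanned.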
  • 35. Indexing structures
      M-tree, Tries, Suffix array, Suffix tree, Inverted files, LSH, …
      Illustration: Wikipedia
  • 36. Inverted file
      “find documents having this term”: arrange documents by the terms they contain
      Term → Documents (one row per term of the vocabulary):
          term 1: D1, D2, D3, D4
          term 2: D1, D2, D3, D4
          term 3: D1
          term 4: D1, D4
          term 5: D1, D2, D3, D4
          term 6: D2, D3
          term 7: D2, D3
      → Number of comparisons ~ number of query terms << N*M
      The query terms select the lists (D1), (D2,D3), (D2,D3), i.e. D1, D2, D2, D3, D3
      D4 is never considered! (we already know that its score is 0)
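A minimal sketch of the inverted file for this example follows. Terms are numbered 1 to 7 (an assumed labelling, since the slide shows them as pictures), each document is given as the set of terms it contains, and the query holds terms 3, 6 and 7.

```python
from collections import defaultdict

# Documents described by the terms they contain (same data as the example).
docs = {
    "D1": {1, 2, 3, 4, 5},
    "D2": {1, 2, 5, 6, 7},
    "D3": {1, 2, 5, 6, 7},
    "D4": {1, 2, 4, 5},
}

def build_inverted_file(docs):
    """Map each term to the list of documents that contain it."""
    index = defaultdict(list)
    for name in sorted(docs):
        for term in sorted(docs[name]):
            index[term].append(name)
    return dict(index)

def score(index, query_terms):
    """Visit only the posting lists of the query terms."""
    scores = defaultdict(int)
    for term in query_terms:
        for name in index.get(term, []):
            scores[name] += 1
    return dict(scores)

index = build_inverted_file(docs)
scores = score(index, [3, 6, 7])
print(scores)
```

Only three posting lists are read, and D4 never appears in the result: its score of 0 is implied by its absence.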
  • 37. Principles and lessons
      • We need to be able to compare two documents
      • Comparison is based on document features
      • Similarity may be based on feature occurrence
      • Only occurring features count in building similarity
      → Feature occurrence allows for the use of inverted files, making the search cost independent of the size of the repository
      Feature occurrence loses a lot of structure in the document
  • 38. In practice (text IR)
      • Document features are normalised word (stem) occurrences (Bag of Words model)
          – Vocabulary ~ 10’000 stems
          – Billions of documents
      • Term weighting scheme (TF*IDF) accounts for feature statistics (better than Y/N occurrence)
      • Similarity is based on cosine distance
      • Inverted files include term statistics
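As a rough sketch of the cosine measure mentioned above: documents are assumed to be given as weight vectors over the same vocabulary, and "cosine distance" is taken here in its common form of 1 minus the cosine similarity. The function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-weight vectors (0.0 if one is empty)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)

print(cosine_similarity([1, 2], [2, 4]))  # same direction, different lengths
print(cosine_similarity([1, 0], [0, 1]))  # no term in common
```

Because only the angle matters, a document and a longer document with the same term proportions are considered identical, which is one reason the measure suits texts of very different lengths.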
  • 39. BoW: Assumptions
      • A text is reduced to the base vocabulary it uses
      • Example: “Signals from a particle collider near Geneva suggested in September that scientists might have sighted a long-sought particle thought to be the source of mass itself.”
      • Becomes: A (x2), be, collide, from, geneva, have, in, itself, long-sought, mass, may, near, of, particle (x2), scientist, see, september, signal, source, suggest, that, the, think, to
      The ultimate goal is to define similarity as: two documents are similar if they share the same vocabulary
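The reduction of a text to its vocabulary can be sketched with the standard library alone. This is a simplification: it lower-cases and counts word forms but performs no stemming, so "signals" or "sighted" are kept as they are instead of being normalised to "signal" or "see" as on the slide. The function names are hypothetical.

```python
import re
from collections import Counter

TEXT = ("Signals from a particle collider near Geneva suggested in September "
        "that scientists might have sighted a long-sought particle thought to "
        "be the source of mass itself.")

def bag_of_words(text):
    """Count lower-cased word forms; hyphenated words stay in one piece."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(tokens)

def shared_vocabulary(bag_a, bag_b):
    """Terms used by both texts, the basis of the BoW notion of similarity."""
    return set(bag_a) & set(bag_b)

bag = bag_of_words(TEXT)
print(bag["particle"], bag["a"], bag["long-sought"])
print(sorted(shared_vocabulary(bag, bag_of_words("A particle of mass"))))
```

Word order, sentence structure and punctuation are all discarded; only the counts per term survive.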
  • 40. BoW: Limitations
      • The structure is lost
          – Title, sections, paragraphs, sentences, negations
      • The representation is incomplete…
          – If all words appear once only and there is no extra information such as
              • Grammatical type
              • Priors on terms (recent news, unused terms,…)
          – And writing well is often about using a rich vocabulary
              • Use of synonyms
      • …or ambiguous
          – If a word appears many times but because of polysemy
          – Eg: mouse
              • A part on mouse as a pet
              • A part on computer mouse
              • A part on Mickey Mouse
          – Does not make the text about “mouse” but maybe just about “kid amusements”
      … still works very well in practice
  • 41. A bit more details
      A document is represented by the occurrences of the terms of the vocabulary
      • Earlier example: Y/N occurrence
          → Number of occurrences does not matter
      • Counting the term occurrences
          → Favors longer documents (more occurrences)
          → Document length normalization (frequency)
      • Not all frequent terms are relevant
          → Balance between representation (TF) and discriminative power (IDF)
      How good is the term in my query for this document?
  • 42. TF*IDF model: intuition
      • Term frequency (TF)
          – Percentage of space taken by each term in the document: “how much this term represents the document”
          – High TF → frequent term in the document → potential representative term (keyword)
      • Inverse document frequency (IDF)
          – Inverse of the percentage of space taken by each term in the collection: “how much this term discriminates documents”
          – High DF → low IDF → term frequent in all documents → not discriminant (all documents are relevant)
          – Eg: the, at, and, she, he, her… + thematic (patient, disease,…)
      → Good terms are terms with high TF and high IDF
  • 43. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 43 TF*IDF term weight • High TF (frequent in document, potential keyword) and high IDF (infrequent in collection): “Keyword”, good representative (verb, noun,…) • Low TF (infrequent in document) and high IDF: low representation (automatically ignored) • High TF and low IDF (frequent in collection): “Stop word”, “structural”, does not carry meaning (the, at, and, she,…; removed from the vocabulary / ignored) • Low TF and low IDF: rare term (automatically ignored) • When computing the difference between documents, terms are weighted by their TF*IDF factor
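The TF*IDF weighting described on the last slides can be sketched in a few lines. This is a minimal illustration and not the lecture's own formula: it assumes TF is the count of a term divided by the document length, and IDF is the common `log(N / df)` variant, where `df` is the number of documents containing the term. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # TF (share of the document) times IDF (assumed log(N / df) variant).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy collection.
docs = [
    "the particle collider near geneva".split(),
    "the mouse is a pet".split(),
    "the computer mouse".split(),
]
weights = tf_idf(docs)
```

With this variant, a term present in every document (here "the") receives weight zero, which matches the stop-word intuition, while a term confined to one document ("geneva") is weighted more than one shared by several ("mouse").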
  • 44. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 44 Inverted file construction [Diagram: Docs → extraction → sorting → compaction → factorisation. Extraction yields (Term, Doc ID) pairs in document order; sorting orders them by term; compaction merges repeated pairs and adds a frequency (Freq) column; factorisation splits the result into a dictionary (Term, Tot. F.: collection information, « idf ») and postings (Doc ID, Freq: document information, « tf »). Misc operations, eg: posting file compression. Example totals: “blah” 3, “the” 1, “tool” 2, “new” 1.]
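The four stages of the construction (extraction, sorting, compaction, factorisation) can be mimicked in memory as follows. This is a sketch under simplifying assumptions: the whole collection fits in RAM, documents are already tokenised, and the function name `build_inverted_file` and the three toy documents are hypothetical. The totals reproduce those of the slide ("blah" 3, "tool" 2, "the" 1, "new" 1).

```python
from collections import defaultdict

def build_inverted_file(docs):
    """docs: {doc_id: list of tokens}. Returns (dictionary, postings)."""
    # Extraction: one (term, doc_id) pair per token occurrence.
    pairs = [(term, doc_id) for doc_id, tokens in docs.items() for term in tokens]
    # Sorting: bring identical terms (then identical documents) together.
    pairs.sort()
    # Compaction: merge repeated (term, doc_id) pairs into a frequency.
    postings = defaultdict(dict)
    for term, doc_id in pairs:
        postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    # Factorisation: the dictionary keeps per-term totals (collection information),
    # the postings keep per-document frequencies (document information).
    dictionary = {term: sum(freqs.values()) for term, freqs in postings.items()}
    return dictionary, dict(postings)

# Hypothetical toy collection.
docs = {1: ["the", "new", "tool", "blah"], 2: ["tool"], 3: ["blah", "blah"]}
dictionary, postings = build_inverted_file(docs)
```

Answering a one-term query then amounts to a single lookup in `postings`, instead of a scan over all documents.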
  • 45. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 45 Hashing / fingerprinting (intuition only) • Associate a key (hash, fingerprint) to each value • Insert the (key, value) pair into a hash table → immediate retrieval of the query • Difficult constraint (perceptual hashing): the key does not vary if the value changes a bit (eg due to noise) [Figure: a hash function assigns each of seven symbols a number (1 to 7) and sums these numbers over an item to obtain its key, eg 1+1+2+3+4+4+5 = 20 and 1+2+2+4+5+6+7+7 = 33; keys address the entries of the hash table, so a query with key 33 gets immediate access]
  • 46. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 46 What is multimedia? Basically our digital life • Text – Plain text (any language) – Structured text (XML-like, code,…) • Visual – Images (Photo) – Sketches (drawing, map,..) • Audio – Music – Speech – Sound • Misc – 3D object – Video – … (software, playlist,…) + Any combination – PDF, Web pages and alike
  • 47. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 47 Multimedia IR: Use Cases • Image – Find look-alike pictures, recognize landmark • Video – Find specific actions • Music – Find inspirational music, recognize music extract • Medical – Find similar cases • Patent – Find similar proposals, plagiarism → Not always doable with text
  • 48. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 48 Volume of multimedia data • 2 billion persons connected to the internet – Mostly via mobile devices (phones) • 1.8 billion active mobile phones • 180 million active web servers Sources: R. Baeza-Yates – RussIR 2010 http://news.netcraft.com/
  • 49. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 49 Volume of multimedia data • Web scale: – Google: • Early 2004: 4.3 billion indexed pages • Early 2005: 8 billion indexed pages • Today: XX billion indexed pages? – http://www.archive.org • Growth: 20 Tb/month • Multimedia collection: – Facebook: 140 billion photos → 180 years at 25 fps – YouTube (2008): 83.4 million videos • Curated collections: – AOL Video library • 410 million views in October 2011 – Institut National de l'Audiovisuel (INA-France): • 700’000h audio (radio) • 400’000h video (television) • 2 million documents
  • 50. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 50 The new wave [From http://1000memories.com/blog/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox]
  • 51. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 51 Example: Information « loss » • Annotation – « Wasp on a yellow flower » • Text search: – « insect feeding » → This document will be missed • The user must know the limitations of the annotation • The goal is to represent the document – Exhaustively: no specific interpretation context – Automatically: no human intervention
  • 52. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 52 • Image formation model • Image features • Indexing visual content Visual Information Retrieval
  • 53. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 53 Visual content has specific types • Photo • Medical image, satellite image • Document: fax, scanned text… • Drawing: sketch, blueprint, cartoon,… • 3D, shape Visual content
  • 54. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 54 Example: Images Images Feature extraction “Appropriate” mapping “Decision” process Search Photo collage Filtering Color histogram User interaction Query
  • 55. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 55 Pixels RGB Sensor RGB Color model Any color = combination of RGB values Image = 3 Arrays (RGB) of values between 0 and 255 Pixel = RGB values (color) at a position in the image
  • 56. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 56 Image processing / Analysis RGB Histograms Gray scale (intensity) Edge image
  • 57. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 57 Low-level image representation Global Color Histogram data in color spaces: Global Texture data from Gabor filter banks:
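A global color histogram, as named on this slide, can be sketched directly from the pixel model (RGB values between 0 and 255). The code below is an illustration only: the bin count, the function names `color_histogram` and `histogram_intersection`, and the use of histogram intersection as a similarity are assumptions, not choices stated in the lecture. It expects a non-empty pixel list and a bin count that divides 256.

```python
def color_histogram(pixels, bins_per_channel=4):
    """pixels: non-empty list of (r, g, b) tuples with values in 0..255.
    Returns a normalized joint RGB histogram with bins_per_channel**3 entries."""
    step = 256 // bins_per_channel  # width of one bin on each channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[index] += 1
    return [count / len(pixels) for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical "images", given as flat lists of pixels.
red = color_histogram([(255, 0, 0)] * 10)
reddish = color_histogram([(250, 10, 5)] * 10)
blue = color_histogram([(0, 0, 255)] * 10)
```

Because the histogram discards pixel positions, two images with the same colors in different layouts get the same representation, which is the visual counterpart of the bag-of-words limitation on structure.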
  • 58. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 58 Corner detection / keypoints
  • 59. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 59 Local features: SIFT[Lowe 1999] and SURF[Bay et al 2006]
  • 60. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 60 From data to information Interpreting the content • Data is physically measured • Information is interpreted semantically Semantic gap • Fusion is a way to reduce the semantic gap • Examples – Looking at a movie w/o audio – Listening to a story w/o visual – Voice conversation vs video conversation – Disambiguation (eg „jaguar“)
  • 61. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 61 From data to information Semantic gap: “Discrepancy between the level of analysis of a computer and the level of perception of the same data by a human user.” • Levels (Level: Text / Visual) – Interpretation: Story / “Action” – Semantic segmentation: Paragraph / Scene – Group: Sentence / Object – Physical segmentation: Word / Homogeneous region – Physical inspection: Character / Color, texture [Figure axes: level of interpretation, relevance for indexing, computer precision; marker: semantic gap]
  • 62. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 62 Semantic gap • Exploited by Captchas
  • 63. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 63 Region-based image understanding Images are compared w.r.t the regions (hopefully objects/concepts) they contain Object detection eg, Face detection
  • 64. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 64 Object detection / recognition • Train the machine to recognize specific groups of patterns Faces [Viola, Jones, 2001] Objects  Automated image tagging  Text search
  • 65. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 65 Naming faces : Image to text Specific characteristics of a face are extracted (eg eigenfaces)
  • 66. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 66 Deep Learning [http://www.slideshare.net/NVIDIA/visual-computing-the-road-ahead-an-nvidia-ces-2015-presentation-deck]
  • 67. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 67 Bag of Visual Words • Once basic visual features are extracted, – Eg: the description of the surrounding of each corner point (eg SIFT) • Basic features are grouped – Eg corner of an object • And quantized – Many examples are gathered into one instance • To create a “visual dictionary” of visual words • An image is a composition (occurrences) of these visual words (“terms”) → We can reuse the machinery developed for text – Inverted files, etc
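The quantization step can be sketched as a nearest-neighbour assignment: each local descriptor is replaced by the index of its closest visual word, and the image becomes a vector of word counts. This sketch assumes the visual dictionary already exists (in practice it is typically learned by clustering, eg k-means, which is not shown); the names `nearest_word` and `bag_of_visual_words` and the 2-D toy descriptors are hypothetical (real SIFT descriptors have 128 dimensions).

```python
import math

def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest (Euclidean distance) to the descriptor."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(descriptor, vocabulary[k]))

def bag_of_visual_words(descriptors, vocabulary):
    """Occurrence vector: how many descriptors fall on each visual word."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    return counts

# Hypothetical 2-D vocabulary and image descriptors.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
image_descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0)]
occurrences = bag_of_visual_words(image_descriptors, vocabulary)
```

The resulting occurrence vector plays the role of the term counts of a text document, so TF*IDF weighting and inverted files apply unchanged.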
  • 68. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 68 Comparing two documents (occurrences) 2 1 1 2 1 0 0 1 2 0 1 1 1 2 1 2 0 1 1 1 2 1 1 0 1 1 0 0 “Terms” (Vocabulary) “Documents”(Repository)
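Once documents are rows of term occurrences, comparing two documents reduces to comparing two vectors. The cosine similarity below is one common choice and is an assumption here, since the slide does not name a specific measure; the example vectors are hypothetical and are not read off the slide's matrix.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two occurrence vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical occurrence vectors over a 4-term vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [4, 2, 0, 0]   # same vocabulary usage, twice as long
doc_c = [0, 0, 3, 1]   # disjoint vocabulary
```

Cosine similarity ignores vector length, so `doc_a` and `doc_b` are judged identical in content, which is one way to implement the document length normalization discussed earlier.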
  • 69. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 69 Hyperbrowser [S. Craver, 1999]
  • 70. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 70 Hyperbrowser [S. Craver, 1999]
  • 71. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 71 Digital shoebox
  • 72. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 72 Digital shoebox
  • 73. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 73 Collection mining
  • 74. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 74 • Modeling sound • Audio information – Speech, music, sound • Searching for audio Audio IR
  • 75. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 75 Example: Audio Audio Feature extraction “Appropriate” mapping “Decision” process Playlist Filtering User interaction Query
  • 76. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 76 Sound characteristics • Sound corresponds to waves perturbing the environment • Wave characteristics – Frequency/Period: 1 Hertz (Hz) = 1 vibration/second; related notion of pitch – Intensity: $I = \frac{1}{\text{Area}} \cdot \frac{\text{Energy}}{\text{Time}} = \frac{\text{Power}}{\text{Area}}$, in $\mathrm{W/m^2}$ • Human ear has a logarithmic scale → Decibel (dB) scale: $0\,\mathrm{dB} = 10^{-12}\,\mathrm{W/m^2}$ (threshold of hearing), $10\,\mathrm{dB} = 10^{-11}\,\mathrm{W/m^2}$, …, $x \cdot 10\,\mathrm{dB} = 10^{-(12-x)}\,\mathrm{W/m^2}$
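The decibel scale above is a base-10 logarithm of the intensity relative to the threshold of hearing. A small conversion sketch, with hypothetical function names, makes the rule "ten times the intensity adds 10 dB" concrete:

```python
import math

I0 = 1e-12  # threshold of hearing in W/m^2, the 0 dB reference of the slide

def intensity_to_db(intensity):
    """Sound level in dB for an intensity given in W/m^2."""
    return 10 * math.log10(intensity / I0)

def db_to_intensity(level_db):
    """Intensity in W/m^2 for a sound level given in dB."""
    return I0 * 10 ** (level_db / 10)
```

For instance, `intensity_to_db(1e-11)` evaluates to 10 dB up to floating-point rounding, in line with the second point of the scale.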
  • 77. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 77 Sound perception Airwave collection → Mapping to mechanical waves → Mapping to compressional waves → Mapping to electrical signal • $0\,\mathrm{dB} = 10^{-12}\,\mathrm{W/m^2}$: Threshold of hearing • $160\,\mathrm{dB} = 10^{4}\,\mathrm{W/m^2}$: Instant perforation of eardrum
  • 78. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 78 Sound perception [Figure: hearing ranges on a logarithmic frequency axis f (Hz), from 0 to 200k, divided into infrasound, audible sound, and ultrasound: Human (20-20k), Dog (50-45k), Cat (50-85k), Bats (50-85k, -120k), Dolphins (-200k), Elephant (5-)]
  • 79. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 79 Audio and Sound [Figure, partly adapted from [Schäuble,1997]: intensity I (dB, 0 to 120) against frequency f (Hz, 20 to 20k), delimiting the non-audible zone, the audible sounds, the speech region, and the painful zone. Example sounds by increasing intensity: quiet library, soft whisper; quiet office, living room; light traffic at a distance, gentle breeze; conversation, sewing machine; busy traffic, noisy restaurant; subway, heavy city traffic, alarm clock at 2 feet, factory noise; truck traffic, noisy home appliances, shop tools, lawnmower; chain saw, pneumatic drill; rock concert in front of speakers, thunderclap. Above the scale: gunshot blast, jet plane (140); rocket launching pad (180)]
  • 80. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 80 Audio information • Speech – Male, female speech Signal-based speech characterisation – Automatic speech recognition (ASR) ASR (text) retrieval • Music Genre recognition (Jazz, classic,…) Excerpt retrieval Query by humming … • Sound – Car, water, people, animals,… – Often useful in relation to visual (video)
  • 81. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 81 Audio information high low natural man-made Informationcontent Origin Animal sounds Wind Water Speech Music Urban noise Machines, engines
  • 82. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 82 Audio features • Low-level-audio representation – Spectrogram – MFCC • Mid-level representations – MIDI strings – LPC • High-level representation – Automatic speech recognition (ASR)
  • 83. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 83 Audio information indexing [Diagram: audio is divided into speech, music, and sound. Processing boxes: signal-based indexing, ASR, word spotting, translation (shown twice), piece/genre recognition, classification, visual A/V processing. Outputs: transcripts, speaker info; score, MIDI]
  • 84. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 84 Speech Speech production model Speech perception model Note: In vision, mostly only perception models Speech
  • 85. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 85 Speech production Human apparatus Speech production scenarios 1. Voiced sound (vocal tract) 2. Fricative sound (‘S’) 3. Plosive sound (‘P’)
  • 86. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 86 Speech production modeling Mechanical model Human apparatus Multistage process 1. Build up pressure 2. Release 3. Return to first state
  • 87. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 87 Speech signal modeling Given w(t) as a speech signal, the model is $|S(w)|^2 = |H(w)|^2\,|E(w)|^2$, where S(w) is the STFT of w. The model assumes that H(w) is the vocal tract transfer function and E(w) is the excitation
  • 88. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 88 Note: MP3 coding Advanced coding model MP3 = MPEG 1 Audio layer 3 See previous slides
  • 89. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 89 Sony music browser
  • 90. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 90 Islands of music [http://www.oefai.at/~elias/music] Organisation of music archives • Based on Kohonen self-organising maps (SOM)
  • 91. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 91 ThemeFinder [http://www.themefinder.org/]
  • 92. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 92 • Design a nice perceptual hash function for audio (based on perceptual features) • Slice an audio document (music piece) into small chunks (large enough to be “unique”) • For each chunk compute a key through the hash function and store as value the music piece ID (title) Music query by example Retrieval by noisy chunk Audio fingerprinting
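The three steps above (hash function, slicing into chunks, storing the piece ID under each key) can be illustrated with a toy index. This is a deliberately naive sketch, not a real fingerprinting scheme: the "perceptual" key is assumed to be a coarse quantization of raw sample values, whereas actual systems derive keys from perceptual features such as spectral peaks. All names (`chunk_key`, `build_index`, `query`) and the signals are hypothetical.

```python
def chunk_key(chunk, step=10):
    """Toy perceptual key: coarse quantization, so small noise keeps the same key."""
    return tuple(round(value / step) for value in chunk)

def build_index(pieces, chunk_size=4):
    """pieces: {title: list of samples}. Maps each chunk key to the titles containing it."""
    index = {}
    for title, signal in pieces.items():
        for start in range(0, len(signal) - chunk_size + 1, chunk_size):
            key = chunk_key(signal[start:start + chunk_size])
            index.setdefault(key, set()).add(title)
    return index

def query(index, noisy_chunk):
    """Titles whose index contains the key of the (possibly noisy) query chunk."""
    return index.get(chunk_key(noisy_chunk), set())

# Hypothetical "music pieces".
pieces = {
    "song A": [0, 50, 100, 50, 0, -50, -100, -50],
    "song B": [100, 100, 0, 0, 100, 100, 0, 0],
}
index = build_index(pieces)
matches = query(index, [2, 48, 101, 52])  # noisy copy of the first chunk of "song A"
```

Two limitations of the sketch are worth noting: a sample lying close to a quantization boundary can still change key under noise, and the query chunk must be aligned with the chunk boundaries used at indexing time.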
  • 93. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 93 • Video type – Movies – Amateur souvenir film – Lectures – Youtube (short, static) • Combining individual media (fusion) – each brings some information… • Going further: social multimedia… Multimedia IR
  • 94. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 94 From data to information Interpreting the content • Data is physically measured • Information is interpreted semantically Semantic gap • Fusion is a way to reduce the semantic gap • Examples – Looking at a movie w/o audio – Listening to a story w/o visual – Voice conversation vs video conversation – Disambiguation (eg „jaguar“)
  • 95. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 95 Multimodal video features Over temporal modalities: • Segmentation (boundary) cues • Categorization (region) cues Word relatedness measure Visual syntactic homogeneity Multimodal boundary-based cues Semantic region- based cues Audio Visual Text
  • 96. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 96 Adding context to interpretation Video Story Segmentation: Using fusion between the visual and audio modalities, high-level segmentation may be achieved “shot”  keyframe Audio transcript
  • 97. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 97 Main goals of fusion Multimodal information fusion aims at interpreting jointly multiple sources of information representing the same underlying „concept“ The main goal is the extraction of information By fusing information, one aims at: • Being more accurate in the discovery of the „concept“ – Each individual stream may be incomplete • Being more robust in the discovery of the „concept“ – Each individual stream may be distorted (eg, noisy)
  • 98. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 98 Multimodal fusion • From many sources of information and context, how do we best “interpret” the data? [Figure: data → representation (color, texture, shape, face, local features, vector space model) → interpretation, eg: Port;;Naval and River;;For Transportation;;Structure;;Architecture;;Vegetable Garden;;Landscape (Farmland - Countryside);;Landscape;;Landscape (Seascape);;Landscape;;Panorama;;View of the City;;View]
  • 99. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 99 Levels of fusion How to organise fusion from classical „unimodal“ interpretation? Feature extraction RecognizerRaw data „Knowledge“ Output Unique source
  • 100. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 100 Levels of fusion Raw data Output Raw data Multiple sources One unique output Recognizer Decision
  • 101. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 101 Early fusion strategy • Acts over features • All modalities are „concatenated into one“ • Only one decision is taken over the concatenated input Raw data Output Raw data Multiple Sources Fusion One unique output Recognizer Decision Example: multimodal medical image analysis • Sources: modalities (XRay, PET,…)
  • 102. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 102 Intermediate fusion strategy • Each source acts as an input for a specific decision • All recognizers are coupled in some way to form a meta- recognizer • One decision is taken Fusion Raw data Output Raw data Multiple sources One unique output Recognizer Meta- Recognizer Recognizer Decision Example: Video analysis • Source: A/V streams
  • 103. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 103 Late fusion strategy • Each source is processed individually by a specific recognizer • Multiple independent initial decisions are taken, possibly associated with confidence scores • A final decision is taken based on this output [Diagram: multiple sources (raw data) → one recognizer per source → one decision per recognizer → fusion → final decision, one unique output] Example: Object recognition • Source: image regions
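Late fusion can be sketched as a vote over the individual decisions. The rule below, which sums the confidence scores per label and keeps the best total, is only one possible fusion rule and is an assumption of this example; the function name `late_fusion` and the labels are hypothetical.

```python
def late_fusion(decisions):
    """decisions: list of (label, confidence) pairs, one per recognizer.
    Returns the label with the highest summed confidence."""
    totals = {}
    for label, confidence in decisions:
        totals[label] = totals.get(label, 0.0) + confidence
    return max(totals, key=totals.get)

# Hypothetical decisions from three recognizers (eg three image regions).
final = late_fusion([("car", 0.6), ("bus", 0.9), ("car", 0.5)])
```

Here two moderately confident "car" decisions outweigh a single confident "bus" decision, which illustrates the robustness argument for fusion: one distorted stream does not dictate the outcome.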
  • 104. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 104 Prototype-based fusion      ,,,, SIFT)( xxx p72π query document x      ,,7,2, SIFT)( xxx pπ
  • 105. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 105 Concept “Mona Lisa” Multimedia is complex data 1503-1506 by L. Da Vinci EN: Mona Lisa IT: La Gioconda FR: La Joconde Visible in Le Louvre, Paris 48° 51′ 34″ N 2° 19′ 54″ E Type: Oil on poplar Dimensions: 77 cm × 53 cm (30 in × 21 in) Photo by Mr. X Date, context Reason of trip Variations from artists Embedding into prints Chocolate piece URLs about Mona Lisa Books about Mona Lisa Videos about Mona Lisa Audio about Mona Lisa “official picture” Texts about Mona Lisa “Factual” data “Faithful” data “Related” data
  • 106. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 106 Networked multimodal data Data Tags Users
  • 107. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 107 Features Networked multimodal data Data Tags Users
  • 108. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 108 • Curse of dimensionality – Sparsity – Dimension reduction/visualisation • Big data – Indexing – Visual Analytics Complements
  • 109. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 109 Statistics tell us that the more data we have, the better we get Simulation: exponential distribution n=10 n=100 n=1’000 n=3’000 n=10’000 Mean Standard deviation
  • 110. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 110 High dimensionality Imagine I want to know the age spread of a population • I create 10 intervals (0-10, 10-20,…) and I want to know the proportion of people in each interval • To have a reliable estimate I want 100 persons per interval on average → With 10*100=1000 persons, I get my estimates Now, imagine I want to estimate the height against age (I add one dimension to my measure) • I create 10 height intervals and ask, “how many people of height X are aged Y?” • I have 10*10=100 such (X,Y) pairs • To get 100 persons for each pair on average I need 100*100 = 10’000 persons And so on… → The size of the data needed for a reliable estimate grows exponentially with the dimension (huge!!!) → In practice our data size is limited → Our only choice to preserve reliability is to reduce the dimension
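The arithmetic of this slide generalizes to any number of dimensions: with 10 intervals per dimension and 100 persons per cell, the required sample size is 100 × 10^d. A one-function sketch (the name and default values simply restate the slide's numbers):

```python
def samples_needed(dimensions, intervals_per_dimension=10, persons_per_cell=100):
    """Population size needed to average `persons_per_cell` in every cell of the grid."""
    return persons_per_cell * intervals_per_dimension ** dimensions

# Age only, age x height, and so on.
growth = [samples_needed(d) for d in range(1, 7)]
```

One dimension needs 1'000 persons and two need 10'000, as on the slide; by six dimensions the requirement reaches 100 million, while typical visual descriptors have tens to hundreds of dimensions.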
  • 111. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 111 Dimension reduction: principle • Given a set of data in a M-dimensional space, we seek an equivalent representation of lower dimension – For better statistics – For visualization (2D, 3D,..) • Dimension reduction induces a loss. What to sacrifice? What to preserve? – Preserve local: neighbourhood, distances – Preserve global: distribution of data, variance – Sacrifice local: noise – Sacrifice global: large distances – Map linearly – Unfold data
  • 112. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 112 Some example techniques: • SFC: preserve neighbourhoods • PCA: preserve global linear structures • MDS: preserve linear neighbourhoods • IsoMAP: Unfold neighbourhoods • SNE family: unfold statistically Dimension reduction
  • 113. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 113 Non-recursive Space Filling Curves Z-Scan Curve Snake Scan Curve [Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]
  • 114. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 114 Recursive Space Filling Curves Hilbert Curve Gray Code Curve Peano Curve [Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]
  • 115. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 115 3D SFC Z Curve Peano Curve Hilbert Curve N-dimensional algorithm: A.R. Butz (April 1971). "Alternative algorithm for Hilbert’s space filling curve.". IEEE Trans. On Computers, 20: 424–42. [Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]
  • 116. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 116 PCA [Illustration Wikipedia]
  • 117. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 117 • A local neighbourhood graph (eg 5-NN graph) is built to create a topology and ensure continuity • Distances are replaced by geodesics (paths on the neighbourhood graph) • MDS is applied on this interdistance matrix (eg with m=2) IsoMap (non Euclidean) [Illustration from http://isomap.stanford. edu]
  • 118. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 118 • MNIST dataset t-SNE example [Illustration from L. van der Maaten’s website]
  • 119. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 119 Traces of our everyday activities can be: • Captured, exchanged (production, communication) • Aggregated, Stored • Filtered, Mined (Processing) The “V”’s of Big Data: • Volume, Variety, Velocity (technical) • and hopefully... Value Raw data is almost worthless, the added value is in juicing the data into information (and knowledge) Big Data
  • 121. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 121 Tools for Large Scale indexing • Large-scale: massive data volume, too large to perform a sequential scan, even of a small portion • Aim: from a query, “filter” by (fast) approximate search a tiny (fixed-size) portion of the data as candidate answers, and then perform (slow) exact search in that tiny sample • Method: assuming the features group the documents coherently, approximately identify the vicinity of the query and search within that neighborhood
  • 122. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 122 Approximate search [Figure labels: fast approximate filtering border (exact search zone); actual slow border; final results; never considered]
  • 123. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 123 Idea: reference-based indexing “If I am close to you, I see the same thing as you” I am close to Geneva, Montreux and Evian, anything also close to these places is close to me • Fix few reference documents (landmarks) • For each document, measure the distance to each reference document – Eg: define the set of the 5 closest landmarks • For a query – Find its 5 closest landmarks – Find all documents whose 5 closest landmarks are the same – Actually (slow distance) compare to these Fast approximate filtering
  • 124. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 124 • Fix few reference documents (landmarks) Can be done offline • For each document, measure the distance to each reference document – Eg: define the set of the 5 closest landmarks Also done offline • For a query – Find its 5 closest landmarks Fast since few landmarks – Find all documents whose 5 closest landmarks are the same Can be made fast (eg Inverted File) – Actually (slow distance) compare to these  Fast since tiny sample Fast approximate filtering
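The offline and online steps listed above can be sketched as follows. The sketch makes several assumptions that the slides leave open: documents are points in the plane, the distance is Euclidean, the set of closest landmarks is stored as an unordered set, and a plain dictionary stands in for the inverted file. The slide uses the 5 closest landmarks; the example uses `k=2` over four landmarks to stay readable. All names and coordinates are hypothetical.

```python
import math

def closest_landmarks(point, landmarks, k):
    """Set of indices of the k landmarks closest to the point."""
    ranked = sorted(range(len(landmarks)), key=lambda j: math.dist(point, landmarks[j]))
    return frozenset(ranked[:k])

def build_index(documents, landmarks, k):
    """Offline: group document IDs by their set of k closest landmarks."""
    index = {}
    for doc_id, point in documents.items():
        index.setdefault(closest_landmarks(point, landmarks, k), []).append(doc_id)
    return index

def search(query, documents, landmarks, index, k):
    """Online: fast filtering by landmark set, then exact ranking of the candidates."""
    candidates = index.get(closest_landmarks(query, landmarks, k), [])
    return sorted(candidates, key=lambda doc_id: math.dist(query, documents[doc_id]))

# Hypothetical landmarks and documents.
landmarks = [(0, 0), (10, 0), (0, 10), (10, 10)]
documents = {"a": (1, 2), "b": (2, 3), "c": (9, 8)}
index = build_index(documents, landmarks, k=2)
results = search((1.2, 2.1), documents, landmarks, index, k=2)
```

Document "c" shares no landmark set with the query and is never compared to it, which is the intended saving; the price is that a true neighbour whose landmark set differs slightly from the query's would be missed, so practical systems relax the exact-match condition.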
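  • 125. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 125 Permutation-based Indexing • $D = \{x_1, \dots, x_N\}$, N objects; $R = \{r_1, \dots, r_n\} \subset D$, n references • Each $x_i$ is identified by an ordered list $L(x_i, R) = (r_{i1}, \dots, r_{in})$ such that $d(x_i, r_{ij}) \le d(x_i, r_{i(j+1)}) \;\forall j = 1, \dots, n-1$ • Example with n=5: $L(x_1, R) = (r_1, r_2, r_3, r_4, r_5)$, $L(x_2, R) = (r_1, r_2, r_3, r_4, r_5)$, $L(x_3, R) = (r_5, r_3, r_2, r_4, r_1)$ • The distance is approximated by comparing the ordered lists: $d(q, x_i) \approx d_{SFD}(q, x_i) = \sum_{r_j \in R} \left| \mathrm{rank}(r_j, L(q, R)) - \mathrm{rank}(r_j, L(x_i, R)) \right|$, where $\mathrm{rank}(r_j, L)$ is the position of $r_j$ in the list $L$ [Figure: objects $x_1, \dots, x_5$ and references $r_1, \dots, r_5$ in a 3D space with axes x, y, z]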
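The comparison of ordered lists can be sketched with the slide's own example. The code assumes that the "SFD" in the formula stands for the Spearman footrule distance, that is, the sum over all references of the absolute difference between their positions in the two lists; the function name `footrule` is hypothetical.

```python
def footrule(list_a, list_b):
    """Spearman footrule distance between two orderings of the same references."""
    position_in_b = {ref: pos for pos, ref in enumerate(list_b)}
    return sum(abs(pos - position_in_b[ref]) for pos, ref in enumerate(list_a))

# Ordered lists L(x1, R), L(x2, R), L(x3, R) from the slide.
L_x1 = ["r1", "r2", "r3", "r4", "r5"]
L_x2 = ["r1", "r2", "r3", "r4", "r5"]
L_x3 = ["r5", "r3", "r2", "r4", "r1"]
```

Objects x1 and x2 see the references in the same order and are therefore at footrule distance 0, while x1 and x3 disagree on four of the five positions; only the lists are needed for this comparison, not the original feature vectors.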
  • 126. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 126 Evaluation • Assess system performance on standard challenges – Data collection – Annotation / ground-truth collection – Query formulation – Performance measures (precision, recall,…) • Yearly event / related to scientific conference • In general, participants are academic teams
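The two performance measures named above compare the list returned by a system with the ground truth: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal set-based sketch (the function name and the document IDs are hypothetical; ranked measures used in actual campaigns are not covered):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: 4 documents returned, 3 relevant in the ground truth, 2 in common.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 6])
```

In this example half of the returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3).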
  • 127. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 127 ImageCLEF http://www.imageclef.org/ CLEF = Cross-Language Evaluation Forum
  • 128. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 128 ImageCLEF Wikipedia Images + text (Overall: 237,434) • English only: 70,127 • German only: 50,291 • French only: 28,461 • English and German: 26,880 • English and French: 20,747 • German and French: 9,646 • English, German and French: 22,899 • Language undetermined: 8,144 • No textual annotation: 239
  • 129. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 129 TRECVid The goal of the conference series is to encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. http://trecvid.nist.gov/
  • 130. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 130 MIREX http://www.music-ir.org • The Music Information Retrieval Evaluation eXchange (MIREX) is an annual evaluation campaign for Music Information Retrieval (MIR) algorithms, coupled to the International Society (and Conference) for Music Information Retrieval (ISMIR)
  • 131. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 131 Other • 3D Retrieval – SHREC: Shape Retrieval Contest – http://www.aimatshape.net/event/SHREC • XML retrieval – INitiative for the Evaluation of XML retrieval – https://inex.mmci.uni-saarland.de/ • Many dedicated corpora for specific tasks – Generic (Image-Net) – Medical (ImageCLEF) – Satellite imagery – …
  • 132. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 132 Conclusions • Multimedia Information Retrieval extends classical IR – Word occurrence model is preserved • Text IR is not always sufficient / feasible – Completeness – Scalability • Multimedia IR requires accurate interpretation of multimodal data • Fusion is required for reaching such an accurate level • Multimedia IR still faces many challenges – Accuracy, interactivity, scalability
  • 133. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 133 Challenges • Speed and accuracy of response are the major challenges • IR tends to be mobile – Low computational power, connection, memory,… • Large scale – Big data • Partial search – Searching for a part of a document • Semantic search – Searching with high-level description
  • 134. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 134 • Baeza-Yates R. and Ribeiro-Neto B. (1999): Modern Information Retrieval. Addison-Wesley. http://people.ischool.berkeley.edu/~hearst/irbook • Baeza-Yates R. and Ribeiro-Neto B. (2011): Modern Information Retrieval. Addison-Wesley, Second Edition http://www.mir2ed.org/ Recommended reading
  • 135. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 135 Thank you! Questions?
  • 136. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 136 References Big Data and Large-scale data – Mohamed, H., & Marchand-Maillet, S. (2015). Scalable Indexing for Big Data Processing. Chapman & Hall. – Marchand-Maillet, S., & Hofreiter, B. (2014). Big Data Management and Analysis for Business Informatics. Enterprise Modelling and Information Systems Architectures (EMISA), 9. – M. von Wyl, H. Mohamed, E. Bruno, S. Marchand-Maillet, “A parallel cross-modal search engine over large-scale multimedia collections with interactive relevance feedback” in ICMR 2011 - ACM International Conference on Multimedia Retrieval. – H. Mohamed, M. von Wyl, E. Bruno and S. Marchand-Maillet, “Learning-based interactive retrieval in large-scale multimedia collections” in AMR 2011 - 9th International Workshop on Adaptive Multimedia Retrieval. – von Wyl, M., Hofreiter, B., & Marchand-Maillet, S. (2012). Serendipitous Exploration of Large-scale Product Catalogs. In 14th IEEE International Conference on Commerce and Enterprise Computing (CEC 2012), Hangzhou, CN. More at http://viper.unige.ch/publications
  • 137. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 137 References Large-scale Indexing – Mohamed, H., & Marchand-Maillet, S. (2015). Quantized Ranking for Permutation-Based Indexing. Information Systems. – Mohamed, H., Osipyan, H., & Marchand-Maillet, S. (2014). Multi-Core (CPU and GPU) For Permutation-Based Indexing. In Proceedings of the 7th International Conference on Similarity Search and Applications (SISAP2014), Los Cabos, Mexico. – H. Mohamed and S. Marchand-Maillet “Parallel Approaches to Permutation-Based Indexing using Inverted Files” in SISAP 2012 - 5th International Conference on Similarity Search and Applications. – H. Mohamed and S. Marchand-Maillet “Distributed Media indexing based on MPI and MapReduce” in CBMI 2012 - 10th Workshop on Content-Based Multimedia Indexing. – H. Mohamed and S. Marchand-Maillet “Enhancing MapReduce using MPI and an optimized data exchange policy”, P2S2 2012 - Fifth International Workshop on Parallel Programming Models and Systems Software for High-End Computing. – Mohamed, H., & Marchand-Maillet, S. (2014). Distributed media indexing based on MPI and MapReduce. Multimedia Tools and Applications, 69(2). – Mohamed, H., & Marchand-Maillet, S. (2013). Permutation-Based Pruning for Approximate K-NN Search. In DEXA, Prague, CZ. More at http://viper.unige.ch/publications
  • 138. © [email protected] University of Geneva – Multimedia Retrieval – September 2015 - 138 References Large data analysis – Manifold learning – Sun, K., Morrison, D., Bruno, E., & Marchand-Maillet, S. (2013). Learning Representative Nodes in Social Networks. In 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Gold Coast, AU. – Sun, K., Bruno, E., & Marchand-Maillet, S. (2012). Unsupervised Skeleton Learning for Manifold Denoising and Outlier Detection. In International Conference on Pattern Recognition (ICPR'2012), Tsukuba, JP. – Sun, K., & Marchand-Maillet, S. (2014). An Information Geometry of Statistical Manifold Learning. In Proceedings of the International Conference on Machine Learning (ICML 2014), Beijing, China. – Wang, J., Sun, K., Sha, F., Marchand-Maillet, S., & Kalousis, A. (2014). Two-Stage Metric Learning. In Proceedings of the International Conference on Machine Learning (ICML 2014), Beijing, China. – Sun, K., Bruno, E., & Marchand-Maillet, S. (2012). Stochastic Unfolding. In IEEE Machine Learning for Signal Processing Workshop (MLSP'2012), Santander, Spain. More at http://viper.unige.ch/publications