FOUNDATIONS OF INFORMATION RETRIEVAL
Lynda Tamine-Lechani
lynda.lechani@irit.fr
https://www.irit.fr/~Lynda.Tamine-Lechani/
• Course description
Study the theory, design, and implementation of information retrieval systems from the perspectives of:
- information representation: focus on texts
- theoretical information retrieval models: focus on language models and learning-based models
- performance evaluation: focus on system-centred evaluation
• Learning objectives
- Index and represent textual information;
- Recall and discuss well-known information retrieval models;
- Design, implement, and evaluate the performance of information retrieval systems using the retrieval algorithms and models discussed in class.
© L. Tamine-Lechani
• Organization
o 12 h of lectures and 6 h of tutorials: Lynda Tamine-Lechani
o 10 h of hands-on work: Jesus-Lovon Melgajero, José G. Moréno and Lynda Tamine-Lechani
• Prerequisites
o Python programming
o Basics of probability and statistics
• Course material
o Copies of the lecture slides are posted on the MOODLE site
o References to books and readings are provided
• Grading
o 1st session
- Hands-on work with the techniques discussed in class: 30% of the final score
- Final written exam in class: 70% of the final score
o 2nd session
- Final written exam in class: 100% of the final score
• Schedule
Lecture   Topic
1         Course introduction; text indexing, vector semantics
2         Static embeddings, contextual embeddings
3         Information retrieval (IR) models: query reformulation, learning to rank
4         Tutorial 1: text indexing and representation
5         Neural models for IR
6         PageRank, performance evaluation
7         Tutorial 2: information retrieval techniques and models
8         Question answering systems and chatbots
9         Tutorial 3: performance evaluation
Books
- Information Retrieval: Algorithms and Heuristics. David A. Grossman, Ophir Frieder. Kluwer Academic Publishers, 1998.
- Modern Information Retrieval. R. Baeza-Yates, B. Ribeiro-Neto. ACM Press / Addison-Wesley, 1999.
- Recherche d'information : applications, modèles et algorithmes. M.-R. Amini and É. Gaussier. Eyrolles, 2012.
- Search Engines: Information Retrieval in Practice. B. Croft, D. Metzler, T. Strohman. Pearson, 2010.
Information Retrieval (IR): definitions
Calvin Mooers, 1951:
"Information retrieval (IR) is the name for the process or method whereby a prospective user of information is able to convert his need for information into an actual list of citations to documents in storage containing information useful to him. [...] Information retrieval is crucial to documentation and organization of knowledge." (Mooers, 1951, p. 25)
Salton, 1980:
"Information retrieval systems are designed to help analyze and describe the items stored in a file, to organize them and search among them, and finally to retrieve them in response to a user's query. Designing and using a retrieval system involves four major activities: information analysis, information organization and search, query formulation, and information retrieval and dissemination."
Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing.
Do these definitions refer to... well-known search engines?
...Yes, but they also cover:
- Search in digital libraries
- Search in company corpora
- Search in specialized corpora (health-, legal-, biology-related resources)
- Search for a location
- Search for answers
- Recommending items
- Summarizing reviews
- ...
• Wide variety of search systems and interaction environments
o Web search engines
o Conversational agents
o E-commerce: Amazon, Airbnb, ...
o Media recommendation: Netflix, Spotify, ...
• ...and different forms of user-system interactions, e.g.:
o From search to conversation
o Search and navigation on maps
o Cross-device search
o Heatmaps on the SERP (search engine results page)
o ...with voice only!
Focus in this lecture
(Web) search systems that select, from a corpus of text documents, those that are relevant to a user's information need, which the user expresses as a query.
[Figure: the user's information need is expressed as a query; the system selects documents from the corpus and returns them as its answer to the query]
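As a toy illustration of this selection step, the sketch below scans a small corpus and keeps every document that shares at least one word with the query. The corpus, the `tokenize` and `select` names, and the lowercase word splitting are assumptions made for the example; it only shows the idea of selection, not how real search systems index or rank documents.

```python
import re

# Hypothetical toy corpus: document id -> raw text
CORPUS = {
    "d1": "Information retrieval selects relevant documents",
    "d2": "Databases answer exact queries over structured data",
    "d3": "Search engines retrieve web documents for a query",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of its alphanumeric words."""
    return set(re.findall(r"\w+", text.lower()))


def select(query: str, corpus: dict[str, str]) -> list[str]:
    """Return the ids of documents sharing at least one word with the query.

    Naive linear scan: every document in the corpus is compared with the query.
    """
    query_terms = tokenize(query)
    return [doc_id for doc_id, text in corpus.items()
            if query_terms & tokenize(text)]


print(select("relevant documents", CORPUS))  # ['d1', 'd3']
```

Scanning every document is fine for three texts but not for a Web-scale corpus, which is why real systems build an index beforehand.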
Basic notions: Document
• Document: the information unit being searched, which may be:
- a whole document
- a paragraph
- a sentence
- a structural unit (section, chapter, ...)
• Different views of a document, illustrated on a sample document:
- Structure: sections "1. Introduction: Information retrieval..." and "2. Basics: The notion of query..."
- Metadata: Date: 15/01/2013; Author: Albert; Language: French; ...
- Content: "This course introduces the basics of information retrieval"
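The three views of a document and the choice of information unit can be captured in a small data structure. The sketch below is one possible, hypothetical model: `Document`, `sections`, `metadata` and `units` are illustrative names, the section titles double as the structure view, and paragraphs are assumed to be separated by blank lines. Sentence-level units would need a proper sentence splitter and are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document model combining the structure, metadata and content views."""
    doc_id: str
    sections: dict[str, str]                                # structure: section title -> text
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. date, author, language

    @property
    def content(self) -> str:
        """Content view: the full text, i.e. all section texts joined."""
        return "\n\n".join(self.sections.values())

    def units(self, level: str) -> list[str]:
        """Return the information units at the requested granularity."""
        if level == "document":
            return [self.content]
        if level == "section":
            return list(self.sections.values())
        if level == "paragraph":
            return [p.strip() for text in self.sections.values()
                    for p in text.split("\n\n") if p.strip()]
        raise ValueError(f"unknown granularity: {level}")


doc = Document(
    doc_id="d1",
    sections={
        "1. Introduction": "Information retrieval...",
        "2. Basics": "The notion of query...\n\n"
                     "This course introduces the basics of information retrieval.",
    },
    metadata={"date": "15/01/2013", "author": "Albert", "language": "French"},
)
for level in ("document", "section", "paragraph"):
    print(level, len(doc.units(level)))  # document 1, section 2, paragraph 3
```

Keeping the granularity as a parameter lets the same document be indexed as one unit or as several smaller units without changing its representation.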
Basic notions: Document
• Different media
- Text (monomedia)
- Image
- Multimedia
- Video
Basic notions: Document
• Different forms
- Document
- Blog
- Tweet
- News
- Presentation
- E-mail
- ...
Basic notions: information need, query
• What the user seeks: an information need
• How the user expresses the information need: a query
In this course, a query is a list of keywords.
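Since a query is treated here as a list of keywords, a free-text query first has to be reduced to such a list. A minimal sketch, assuming lowercase word splitting and a tiny hand-written stop-word list; the `STOPWORDS` set and the `query_to_keywords` name are illustrative assumptions, not the course's indexing pipeline.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real systems use larger,
# language-specific lists
STOPWORDS = {"a", "an", "the", "in", "of", "for", "is", "are", "what", "to"}


def query_to_keywords(query: str) -> list[str]:
    """Turn a free-text query into an ordered list of keywords without duplicates."""
    words = re.findall(r"\w+", query.lower())
    keywords: list[str] = []
    for w in words:
        if w not in STOPWORDS and w not in keywords:
            keywords.append(w)
    return keywords


print(query_to_keywords("What are the symptoms of the Omicron variant?"))
# ['symptoms', 'omicron', 'variant']
```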
Basic notions: Relevance
• A key concept in information retrieval
A document is relevant if it matches the information need. There are numerous types of relevance:
o Topical relevance (aboutness): the document covers the query topic
o Situational relevance: the document matches the user's situation (e.g., task, location, ...)
o Cognitive relevance: the document fits the user's knowledge state
o ...
and numerous relevance criteria:
- Novelty
- Freshness
- Language
- Specificity
- Trust
- ...
The main focus of this course is topical relevance: it is useful and comparatively "easy" to define and measure, but it does not cover everything related to relevance.
What makes information retrieval challenging?
Image credit: © NIST (TREC)
What makes information retrieval challenging?
• Deluge of information
o Large-scale information
o Often only a small fraction of the information is relevant and/or useful for a query
o Information is noisy
o Information is not always trustworthy
o Heterogeneous information forms and sources
o ...
Information is everywhere
Source: infographic. Increasing volumes of information are available from an increasing number of sources: social applications, mobile devices, sensors, ...
[Figure: timeline of information sources: 1972 ARPANET; 1990 WWW; 1994 e-commerce; 1995 web directories; 1998 web search; 1999 blogs; 2001 wikis; 2003 social networks]
Focus on Web 3.0: the digital world today
• 1st place: platforms for publishing/sharing texts (mostly), newsletters, podcasts, videos, photos
o Wikipedia, Blogger, Google Podcasts, YouTube, Flickr, TripAdvisor, ...
• 2nd place: messaging platforms
o Facebook, Messenger, Telegram, ...
• 3rd place: conversation platforms
o Quora, StackExchange, Reddit, Facebook Groups, Google Groups, ...
• 4th place: collaboration platforms
o Facebook Workplace, TeamWork, Chatter, ...
Image credit: https://fredcavazza.net/2021/05/06/panorama-des-medias-sociaux-2021/
Some statistics 2020-2021: information and users
• In 2020, Google processed more than 7 billion queries every day, 15% of which had never been submitted before (new queries)
• The number of social media users worldwide is estimated at 2.77 billion, up from 2.46 billion in 2017
• 51%, or more than 240 billion dollars, of all advertising money spent worldwide in 2019 was forecast to go to digital media
• Online sales were expected to reach 3.45 trillion dollars in 2020
• 47.3% of the world's population was expected to shop online in 2020
[Figure: statistics on the usage of information access systems, 2014-2020. Source: https://datastudio.google.com/embed/reporting/1sImC_rjeWqNXdgQt5MtmrQMbH44qFjtA/page/1fzh]
[Figure: users and information shared live, 2021. Image credit: https://www.internetlivestats.com/]
What makes information retrieval challenging?
• Information needs are ambiguous
o Queries are generally short and ambiguous
o The mapping between queries and intents is many-to-many (M-N)
1 query → N intents: "roi lion" ("lion king")
M queries → 1 intent:
- "Master UPS artificial intelligence"
- "Université Paul Sabatier AI"
- "AI training Toulouse"
- "Matsre IAFA"
- ...
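The many-to-many relation between queries and intents can be stored as an intent-to-queries table and then inverted to see which intents a given query may express. The intent labels and query variants below are hypothetical, built loosely from the examples above; in particular, attaching "roi lion" to both a film intent and an animal intent, and the extra variants "lion king movie" and "lion habitat", are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical intent -> query-variants table (labels are illustrative assumptions)
INTENT_QUERIES = {
    "film_the_lion_king": ["roi lion", "lion king movie"],
    "animal_lion": ["roi lion", "lion habitat"],
    "ai_master_toulouse": ["master ups artificial intelligence",
                           "universite paul sabatier ai",
                           "ai training toulouse",
                           "matsre iafa"],
}


def invert(intent_queries: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build the reverse mapping: query -> set of intents it may express."""
    query_intents: defaultdict[str, set[str]] = defaultdict(set)
    for intent, queries in intent_queries.items():
        for q in queries:
            query_intents[q].add(intent)
    return dict(query_intents)


QUERY_INTENTS = invert(INTENT_QUERIES)
print(sorted(QUERY_INTENTS["roi lion"]))          # one query, several intents
print(len(INTENT_QUERIES["ai_master_toulouse"]))  # one intent, several queries
```

Here the first line prints `['animal_lion', 'film_the_lion_king']` and the second prints `4`: a system only sees the query side, so it has to guess which intent is meant.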
What makes information retrieval challenging?
• Relevance is subjective
o User-dependent
o Situation-dependent
o Topicality is often the threshold criterion for relevance
• Relevance faces a vocabulary mismatch between queries and documents
o Matching as word overlap: is it really semantic overlap?
Q: "most jurisdictions exercise a high degree of regulation over banks" [financial institution]
D1: "I was robbed when I withdrew the money from the bank" [building]
D2: "fish lined the bank of the stream" [the land alongside or sloping down to a river or lake]
o Matching is not exact: only rough matching between queries and documents is possible
Q: "Presidential Elections in France"
D1: "Election campaign is running"
[relevant, but missing 'presidential' and 'France']
D2: "Macron, the President of France, is attending COP21"
[irrelevant, yet matching 'France' and 'President']
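The "Presidential Elections in France" example can be checked with a naive word-overlap score. This sketch assumes exact matching of lowercased words with no stemming and no synonyms (chosen deliberately to expose the problem); `terms` and `overlap` are illustrative names. The relevant D1 scores 0 because "election" does not match "elections", while the irrelevant D2 scores 1 through "France"; "President" does not even match "Presidential".

```python
import re


def terms(text: str) -> set[str]:
    """Lowercased set of words, without any stemming or synonym handling."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(query: str, doc: str) -> int:
    """Number of distinct query words that also occur in the document."""
    return len(terms(query) & terms(doc))


q = "Presidential Elections in France"
d1 = "Election campaign is running"                          # relevant
d2 = "Macron, the President of France is attending COP21"    # not relevant

print(overlap(q, d1), overlap(q, d2))  # 0 1
```

Word overlap therefore ranks the irrelevant document above the relevant one, which motivates normalizing words and, beyond that, representations that capture meaning rather than surface form.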
What makes information retrieval challenging?
• Queries and documents vary in length
o Models must handle variable-length input
o Relevant documents also contain irrelevant content
Q: "Omicron variant symptoms"
D: "The Omicron variant has already reached several patients in France after first appearing in South Africa. While it seems more transmissible, it does not appear to be more virulent. But what are its symptoms?
On 26 November, the World Health Organization (WHO) classified the Omicron variant, newly emerged in South Africa, as a "variant of concern" because of how quickly it spreads. Since then, many cases have begun to emerge around the world, including a few in France.
But as for how dangerous it is or what its symptoms are, there is still great uncertainty. So, what do we know?
Based on the situations in South Africa and the United Kingdom, the WHO indicated in a technical brief that the Omicron variant seems to spread faster than Delta.
However, unlike Delta, its symptoms appear to be less severe.
No loss of taste or smell
Interviewed by the BBC, Dr Angelique Coetzee, chair of the South African Medical Association, who was one of the first to encounter Omicron, said that the symptoms she had observed seemed less specific than those of the original disease. "It started with a male patient aged around 33," she explained in the interview. "He said he had been extremely tired for the past few days and complained of body aches and a mild headache." But the man had not lost his sense of taste or smell; he had a "scratchy throat", rather than a sore throat and a cough as with previous variants.
She also said that the other patients examined the same day "had the same mild symptoms"."
Source (in French): https://www.leprogres.fr/magazine-sante/2021/12/13/variant-omicron-quels-sont-les-premiers-symptomes-detectes
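One common way to cope with long documents that mix relevant and irrelevant content is to score fixed-size passages and keep the best one, which also shows where the relevant part lies. The sketch below is a toy illustration under assumed parameters (windows of 8 words, a stride of 4, a plain word-overlap score) on a shortened, hypothetical English paraphrase of an article like the one above; `passages` and `best_passage` are illustrative names, not a model from the course.

```python
import re


def words(text: str) -> list[str]:
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def passages(tokens: list[str], size: int = 8, stride: int = 4) -> list[list[str]]:
    """Overlapping windows of `size` tokens starting every `stride` tokens.

    A final window is added when needed so that the end of the text is covered.
    """
    if len(tokens) <= size:
        return [tokens]
    starts = list(range(0, len(tokens) - size + 1, stride))
    if starts[-1] + size < len(tokens):
        starts.append(len(tokens) - size)
    return [tokens[i:i + size] for i in starts]


def best_passage(query: str, text: str, size: int = 8, stride: int = 4) -> tuple[int, str]:
    """Score each passage by query-word overlap; return the best score and passage."""
    q = set(words(query))
    scored = [(len(q & set(p)), " ".join(p)) for p in passages(words(text), size, stride)]
    return max(scored, key=lambda sp: sp[0])  # first passage with the highest score


doc = ("the agency classified the new variant as a variant of concern because it spreads quickly "
       "many cases then emerged around the world including a few in france "
       "doctors say the omicron variant symptoms are mild fatigue aches and a scratchy throat")

score, passage = best_passage("Omicron variant symptoms", doc)
print(score, "|", passage)  # 3 | say the omicron variant symptoms are mild fatigue
```

The window size and stride trade off precision of localization against the risk of splitting related words across windows; they are arbitrary here.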
What makes information retrieval similar to, or different from, data retrieval (databases)?
Information retrieval vs. data retrieval
- Information unit: information vs. data (attribute-value pairs)
- Query: vague expression of an information need vs. precise expression
- Query language: natural language vs. formal language
- Query-information matching: approximate vs. exact
- Selected information: information relevant to the query vs. all the data that satisfy the query
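To make the exact vs. approximate contrast concrete, the sketch below runs a data-retrieval style filter (every attribute must equal the requested value) and an information-retrieval style ranking (titles scored by the fraction of query words they contain) over the same toy records. The records, the function names and the scoring formula are illustrative assumptions.

```python
import re

# Hypothetical records described by attribute-value pairs
RECORDS = [
    {"title": "Omicron variant symptoms", "year": 2021, "lang": "fr"},
    {"title": "Presidential elections in France", "year": 2022, "lang": "fr"},
    {"title": "Fish along the river bank", "year": 2020, "lang": "en"},
]


def data_retrieval(records: list[dict], **conditions) -> list[dict]:
    """Exact matching: return ALL records whose attributes equal every condition."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


def information_retrieval(records: list[dict], query: str, k: int = 2) -> list[tuple[float, str]]:
    """Approximate matching: rank titles by the fraction of query words they contain."""
    q = set(re.findall(r"\w+", query.lower()))
    scored = []
    for r in records:
        title_terms = set(re.findall(r"\w+", r["title"].lower()))
        score = len(q & title_terms) / len(q)
        if score > 0:
            scored.append((score, r["title"]))
    scored.sort(key=lambda st: st[0], reverse=True)  # stable sort: ties keep input order
    return scored[:k]


print(data_retrieval(RECORDS, lang="fr", year=2021))
print(information_retrieval(RECORDS, "symptoms of the omicron variant in France"))
```

The filter returns all and only the records that satisfy the conditions, whereas the ranking also returns a partial match ("Presidential elections in France" shares only "in" and "France" with the query): that is the expected behavior of approximate matching, but also a source of noise.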
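The basic process of information retrieval
[Figure: documents are indexed into document representations; the user's information need is expressed as a query; the query is matched against the document representations to produce the selected documents; feedback on the selected documents is used to refine the query]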
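The indexing and matching steps of this process can be sketched with an inverted index. In this minimal sketch (an assumption, not the course's reference implementation), the same hypothetical `analyze` function turns documents into index terms and the query into query terms, documents are scored by summing the frequencies of the query terms they contain, and the feedback step is left out.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "in", "is", "to", "and"}  # hypothetical, tiny list


def analyze(text: str) -> list[str]:
    """Shared text analysis for documents and queries: lowercase, tokenize, drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Indexing: inverted index mapping term -> {doc_id: term frequency}."""
    index: defaultdict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def search(index: dict[str, dict[str, int]], query: str) -> list[tuple[str, int]]:
    """Matching: sum the frequencies of query terms per document, best first."""
    scores: defaultdict[str, int] = defaultdict(int)
    for term in analyze(query):                           # expression: query -> terms
        for doc_id, tf in index.get(term, {}).items():    # only docs containing the term
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda ds: (-ds[1], ds[0]))


DOCS = {
    "d1": "Information retrieval is the retrieval of relevant documents",
    "d2": "A query expresses an information need",
    "d3": "Databases retrieve data with exact queries",
}
index = build_index(DOCS)
print(search(index, "retrieval of information"))  # [('d1', 3), ('d2', 1)]
```

Note that d3 is not retrieved: "retrieve" and "retrieval" are different index terms, the vocabulary-mismatch problem discussed earlier.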
• Lecture structure
o Introduction
o Chapter 1: Text indexing and representation
"How to transform raw texts into machine-processable representations?"
Keywords: indexing, words, documents, representation learning of texts
o Chapter 2: Information retrieval (IR) models
"How to score the relevance of a document as an answer to a user's query?"
Keywords: relevance status value, retrieval model
o Chapter 3: Performance evaluation of an IR system
"How to measure the performance of an information retrieval system?"
Keywords: evaluation metrics, test collections
o Chapter 4: From question-answering systems to chatbots
"How to interact with systems while searching for information?"
Keywords: conversation, turn, clarification
Ad

More Related Content

Similar to Introduction to irs notes easy way learning (20)

L yuan alt c 3
L yuan alt c 3L yuan alt c 3
L yuan alt c 3
cetisli
 
ESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking Session
Erik Mannens
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive Datasets
Marina Santini
 
Information Management
Information ManagementInformation Management
Information Management
Nadeem Raza
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
openminted_eu
 
Rise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsRise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen Rombouts
Library_Connect
 
Synthesising JISC Institutional Innovation
Synthesising JISC Institutional InnovationSynthesising JISC Institutional Innovation
Synthesising JISC Institutional Innovation
George Roberts
 
Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?
PEDAGOGY.IR
 
Competitive & Saleable E-Content for Philippine Libraries
Competitive & Saleable E-Content for Philippine LibrariesCompetitive & Saleable E-Content for Philippine Libraries
Competitive & Saleable E-Content for Philippine Libraries
Philippine Association of Academic/Research Librarians
 
A distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaA distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL India
Enno Meijers
 
FORCE11: Creating a data and tools ecosystem
FORCE11:  Creating a data and tools ecosystemFORCE11:  Creating a data and tools ecosystem
FORCE11: Creating a data and tools ecosystem
Maryann Martone
 
Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)
UKOLN, University of Bath
 
Universities Without Borders
Universities Without BordersUniversities Without Borders
Universities Without Borders
University of West Florida
 
Tacit knowledge sharing in virtual teams: is it even possible?
Tacit knowledge sharing in virtual teams:is it even possible?Tacit knowledge sharing in virtual teams:is it even possible?
Tacit knowledge sharing in virtual teams: is it even possible?
Amanda Lam
 
De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012
Anna De Liddo
 
Semi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termSemi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-term
PERICLES_FP7
 
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning AnalyticsEurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
SpeakApps Project
 
ICTConcepts.ppt
ICTConcepts.pptICTConcepts.ppt
ICTConcepts.ppt
AhmedOthman511332
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Prateek Singh
 
A presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxA presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptx
ROHITSHARMA779690
 
L yuan alt c 3
L yuan alt c 3L yuan alt c 3
L yuan alt c 3
cetisli
 
ESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking Session
Erik Mannens
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive Datasets
Marina Santini
 
Information Management
Information ManagementInformation Management
Information Management
Nadeem Raza
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
openminted_eu
 
Rise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsRise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen Rombouts
Library_Connect
 
Synthesising JISC Institutional Innovation
Synthesising JISC Institutional InnovationSynthesising JISC Institutional Innovation
Synthesising JISC Institutional Innovation
George Roberts
 
Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?
PEDAGOGY.IR
 
A distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaA distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL India
Enno Meijers
 
FORCE11: Creating a data and tools ecosystem
FORCE11:  Creating a data and tools ecosystemFORCE11:  Creating a data and tools ecosystem
FORCE11: Creating a data and tools ecosystem
Maryann Martone
 
Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)
UKOLN, University of Bath
 
Tacit knowledge sharing in virtual teams: is it even possible?
Tacit knowledge sharing in virtual teams:is it even possible?Tacit knowledge sharing in virtual teams:is it even possible?
Tacit knowledge sharing in virtual teams: is it even possible?
Amanda Lam
 
De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012
Anna De Liddo
 
Semi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termSemi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-term
PERICLES_FP7
 
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning AnalyticsEurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
SpeakApps Project
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Prateek Singh
 
A presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxA presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptx
ROHITSHARMA779690
 

Recently uploaded (20)

How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Ad

Introduction to irs notes easy way learning

  • 1. FOUNDATIONS OF INFORMATION RETRIEVAL Lynda Tamine-Lechani [email protected] https://www.irit.fr/~Lynda.Tamine-Lechani/
  • 2. FOUNDATIONS OF INFORMATION RETRIEVAL 2 •Course description Study the theory, design, and implementation of information retrieval systems from the perspectives of: ü information representation: focus on texts ü theoretical information retrieval model: focus on language model and learning- based models ü Performance evaluation: focus on system-centred evaluation •Learning objectives ü Index and represent textual information; ü Recall and discuss well-known information retrieval models; ü Design, implement and evaluate the performance of information retrieval systems using retrieval algorithms and models discussed in class. © L. Tamine-Lechani
  • 3. FOUNDATIONS OF INFORMATION RETRIEVAL 3 •Organization o 12H course, 6H tutorial: Lynda Tamine-Lechani o 10H hands-on work: Jesus-Lovon Melgajero, José G. Moréno and Lynda Tamine- Lechani •Prerequisites o Python programming o Basics in probability and statistics •Course material o Copies of the lecture slides are posted on the MOODLE site o Book and readings references are provided •Grading o 1st session üHands-on experience with techniques discussed in class: assignment of 30% of the final score üFinal written exam in class: assignment of 70% of the final score o 2nd session üFinal written exam in class: assignment of 100% of the final score © L. Tamine-Lechani
  • 4. Schedule, by lecture: (1) Course introduction; text indexing, vector semantics. (2) Static embeddings, contextual embeddings. (3) Information retrieval (IR) models: query reformulation, learning to rank. (4) Tutorial 1: text indexing and representation. (5) Neural models for IR. (6) PageRank, performance evaluation. (7) Tutorial 2: information retrieval techniques and models. (8) Question answering systems and chatbots. (9) Tutorial 3: performance evaluation.
  • 5. Books: - D. A. Grossman and O. Frieder, Information Retrieval: Algorithms and Heuristics, Kluwer Academic Publishers, 1998. - R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, ACM Press / Addison-Wesley, 1999. - M.-R. Amini and E. Gaussier, Recherche d'information : applications, modèles et algorithmes (Information retrieval: applications, models and algorithms), Eyrolles, 2012. - W. B. Croft, D. Metzler and T. Strohman, Search Engines: Information Retrieval in Practice, Pearson, 2010.
  • 6. Information retrieval (IR): definitions. Calvin Mooers, 1951: "Information retrieval (IR) is the name for the process or method whereby a prospective user of information is able to convert his need for information into an actual list of citations to documents in storage containing information useful to him. [...] Information retrieval is crucial to documentation and organization of knowledge" (Mooers, 1951, p. 25). Salton, 1980: "Information retrieval systems are designed to help analyze and describe the items stored in a file, to organize them and search among them, and finally to retrieve them in response to a user's query. Designing and using a retrieval system involves four major activities: information analysis, information organization and search, query formulation, and information retrieval and dissemination." A general definition: "Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing."
  • 7. Do these definitions refer only to well-known search engines? Yes, but they also refer to: - search in digital libraries; - search in company corpora; - search in specialized corpora (health, legal, biology-related resources); - search for a location; - search for answers; - item recommendation; - review summarization; - ...
  • 8. From search to conversation. There is a wide variety of search systems and interaction environments: - Web search engines; - conversational agents; - e-commerce: Amazon, Airbnb, ...; - media recommendation: Netflix, Spotify, ... ... and different forms of user-system interaction: searching and navigating on maps, cross-device search, heatmaps on the SERP (search engine results page), ... and even voice-only interaction!
  • 9. Focus in this lecture: (Web) search systems that select, from a corpus of text documents, those that are relevant to an information need expressed by the user as a query. [Figure: information need, query, corpus of documents, selection, system's answer to the query]
  • 10. Basic notions: document. A document is the information unit being searched: a whole document, a paragraph, a phrase, or a structural unit (section, chapter, ...). A document can be seen from different views: content, metadata and structure. [Figure: an example document with its structure ("1. Introduction: Information retrieval.... 2. Basics: The notion of query…"), its metadata (Date: 15/01/2013; Author: Albert; Language: French) and its content ("This course introduces the basics of information retrieval")]
  • 11. Basic notions: document (continued). Documents come in different media: text (monomedia), image, video, multimedia.
  • 13. Basic notions: information need, query. - What the user is looking for: an information need. - How the user expresses that information need: a query. In this course, a query is a list of keywords.
  • 14. Basic notions: relevance. Relevance is a key concept in information retrieval: a document is relevant if it matches the information need. There are numerous types of relevance: - topical relevance (aboutness): the document covers the query topic; - situational relevance: the document matches the user's situation (e.g., task, location, ...); - cognitive relevance: the document fits the user's knowledge state; - ... and numerous relevance criteria: novelty, freshness, language, specificity, trust, ... The main focus of this course is topical relevance: it is useful and relatively "easy" to define and measure, but it does not cover everything related to relevance.
  • 15. What makes information retrieval challenging? [Figure © NIST (TREC)]
  • 16. What makes information retrieval challenging? A deluge of information: - large-scale information; - often only a small fraction of the information is relevant and/or useful for a query; - information is noisy; - information is not always trustworthy; - heterogeneous information forms and sources; - ...
  • 17. Information is everywhere. Increasing volumes of information are available from an increasing number of sources: social applications, mobile devices, sensors, ... [Infographic timeline: 1972 ARPANET; 1990 WWW; 1994 e-commerce; 1995 Web directories; 1998 search; 1999 blogs; 2001 wikis; 2003 social networks]
  • 18. Focus on Web 3.0: the digital world today. - 1st place: platforms for publishing/sharing texts (mostly), newsletters, podcasts, videos and photos: Wikipedia, Blogger, Google Podcasts, YouTube, Flickr, TripAdvisor, ...; - 2nd place: messaging platforms: Facebook, Messenger, Telegram, ...; - 3rd place: conversation platforms: Quora, StackExchange, Reddit, Facebook groups, Google Groups, ...; - 4th place: collaboration platforms: Facebook Workplace, Teamwork, Chatter, ... Image credit: https://fredcavazza.net/2021/05/06/panorama-des-medias-sociaux-2021/
  • 19. Some statistics, 2020-2021: information and users. - In 2020, Google processed more than 7 billion queries every day, 15% of which had never been submitted before (new queries). - The number of social media users worldwide is estimated at 2.77 billion (2.46 billion in 2017). - 51%, or more than 240 billion dollars, of all advertising money spent worldwide in 2019 was expected to go to digital media. - Online sales were expected to reach 3.45 trillion dollars in 2020. - 47.3% of the world population was expected to shop online in 2020. [Charts: statistics on the usage of information access systems, 2014-2020 (source: https://datastudio.google.com/embed/reporting/1sImC_rjeWqNXdgQt5MtmrQMbH44qFjtA/page/1fzh); users and information shared live, 2021 (image credit: https://www.internetlivestats.com/)]
  • 20. What makes information retrieval challenging? Information needs are ambiguous: - queries are generally short and ambiguous; - the matching between queries and intents is M-to-N. One query → N intents: e.g., "Roi lion" (French for "lion king"). M queries → one intent: e.g., "Master UPS Intelligence artificielle", "Université paul Sabatier IA", "Formation IA Toulouse", "Matsre IAFA", ... (French queries that share one intent: finding the artificial intelligence master's programme at Université Paul Sabatier, Toulouse).
  • 21. What makes information retrieval challenging? Relevance is subjective: - user-dependent; - situation-dependent; - topicality is often the threshold condition for relevance. Relevance also faces vocabulary mismatch between queries and documents: - Matching as word overlap: is it really semantic overlap? Q: "most jurisdictions exercise a high degree of regulation over banks" [financial institution]; D1: "I was robbed when I withdrew the money from the bank" [building]; D2: "fish lined the bank of the stream" [the land alongside or sloping down to a river or lake]. - Matching is not exact, only a rough match between queries and documents: Q: "Presidential Elections in France"; D1: "Election campaign is running" [relevant, but missing 'presidential' and 'France']; D2: "Macron, the President of France is attending COP21" [irrelevant, yet matching 'France' and 'President'].
  • 22. What makes information retrieval challenging? Queries and documents vary in length: - models must handle variable-length input; - relevant documents also contain irrelevant content. Q: "variant Omicron symptomes" (Omicron variant symptoms). D: "The Omicron variant has already reached several patients in France after emerging in South Africa. Although it seems more transmissible, it is reportedly not more virulent. But what are its symptoms? On 26 November, the World Health Organization (WHO) described the Omicron variant, newly emerged in South Africa, as 'of concern' because of how quickly it spreads. Since then, many cases have started to emerge around the world, including a few in France. But regarding how dangerous it is or what its symptoms are, there is great uncertainty. So, what do we know? Based on the situations in South Africa and the United Kingdom, the WHO stated in a technical brief that the Omicron variant seems to spread faster than Delta. However, unlike Delta, its symptoms are reportedly less severe. No loss of taste or smell. Interviewed by the BBC, Dr Angelique Coetzee, chair of the South African Medical Association, who was one of the first to encounter Omicron, said that the symptoms she observed seem less specific than those of the original disease. 'It started with a male patient aged about 33,' she explained in the interview. 'He said he had been extremely tired over the past few days and complained of body aches and mild headaches.' But the man had not lost his sense of taste or smell; he had a 'scratchy throat', rather than a sore throat and a cough as with previous variants. She also said that the other patients examined the same day 'had the same mild symptoms'."
Source: https://www.leprogres.fr/magazine-sante/2021/12/13/variant-omicron-quels-sont-les-premiers-symptomes-detectes
  • 23. What makes information retrieval similar to, or different from, data retrieval (databases)? Information retrieval vs. data retrieval: - information unit: information vs. data (attribute-value pairs); - query: vague expression of an information need vs. precise expression of the data sought; - query language: natural language vs. formal language; - query-information matching: approximate vs. exact; - selected information: the information relevant to the query vs. all the data that satisfies the query.
  • 24. The basic process of information retrieval. [Diagram: documents are turned into document representations by indexing; the information need is turned into a query by expression; matching the query against the document representations yields the selected documents; feedback closes the loop]
  • 25. Lecture structure: - Introduction; - Chapter 1: Text indexing and representation. "How to transform raw texts into machine-processable representations?" Keywords: indexing, words, documents, representation learning of texts; - Chapter 2: Information retrieval (IR) models. "How to score the relevance of a document as an answer to a user's query?" Keywords: relevance status value, retrieval model; - Chapter 3: Performance evaluation of an IR system. "How to measure the performance of an information retrieval system?" Keywords: evaluation metrics, test collections; - Chapter 4: From question-answering systems to chatbots. "How to interact with systems while searching for information?" Keywords: conversation, turn, clarification.
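Slide 21 asks whether word overlap really captures semantic overlap. A minimal sketch, assuming a naive matcher that lowercases the text, splits it on non-alphanumeric characters and counts the distinct words shared with the query (no stemming; `tokenize` and `overlap` are hypothetical names, not part of the course material), scores the slide's two election documents as follows:

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, document: str) -> int:
    """Count the distinct query words that also occur in the document."""
    return len(tokenize(query) & tokenize(document))


query = "Presidential Elections in France"
d1 = "Election campaign is running"                         # judged relevant on slide 21
d2 = "Macron, the President of France is attending COP21"  # judged irrelevant on slide 21

print(overlap(query, d1))  # 0: "election" does not match "elections"
print(overlap(query, d2))  # 1: only "france" matches exactly
```

Exact word overlap ranks the irrelevant D2 above the relevant D1. Without stemming, "President" does not even match "Presidential", so D2's score comes from "france" alone.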
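The contrast drawn on slide 23 between exact matching (data retrieval) and approximate matching (information retrieval) can be illustrated with a small sketch. The three documents and the query are hypothetical, and the IR-side score is simply the number of shared terms, a deliberately crude stand-in for the retrieval models covered in Chapter 2.

```python
import re


def terms(text):
    """Return the lowercased word tokens of a text as a set."""
    return set(re.findall(r"\w+", text.lower()))


# Hypothetical toy collection.
docs = {
    "d1": "Omicron variant symptoms: fatigue and mild headaches",
    "d2": "Symptoms of the Delta variant",
    "d3": "Presidential elections in France",
}
query = "omicron symptoms fatigue cough"

# Data-retrieval style: exact match, a document qualifies only if it
# contains every query term (like a conjunctive WHERE clause).
exact = [d for d, text in docs.items() if terms(query) <= terms(text)]

# IR style: approximate match, every document gets a score (here, the
# number of shared terms) and the non-zero ones are ranked best first.
scores = {d: len(terms(query) & terms(text)) for d, text in docs.items()}
ranked = sorted((d for d in scores if scores[d] > 0), key=lambda d: -scores[d])

print("exact:", exact)    # exact: []
print("ranked:", ranked)  # ranked: ['d1', 'd2']
```

The exact filter returns nothing because no document contains "cough", while the approximate ranking still surfaces d1 as the best partial match.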
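The indexing and matching steps of the basic process on slide 24 can be sketched with an inverted index. This is a minimal sketch under simplifying assumptions: tokenization is lowercase word splitting only (no stemming or stop-word removal), documents are scored by the number of distinct query terms they contain, and the feedback loop is not modelled. The corpus and the function names (`tokenize`, `build_index`, `match`) are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(corpus):
    """Indexing: map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """Matching: score documents by the number of distinct query terms they
    contain, and return (doc_id, score) pairs, best score first."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        # .get avoids inserting unknown query terms into the defaultdict.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


corpus = {
    "d1": "This course introduces the basics of information retrieval",
    "d2": "Web search engines retrieve documents for a query",
    "d3": "Media recommendation on Netflix and Spotify",
}
index = build_index(corpus)
print(match(index, "information retrieval query"))  # [('d1', 2), ('d2', 1)]
```

Only the documents listed under at least one query term are ever scored, which is why an inverted index, rather than a scan of every document, is the usual basis for the indexing step.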