ETHIOPIAN POLICE UNIVERSITY
DEPARTMENT OF INFORMATION TECHNOLOGY AND CYBER SECURITY
Information Storage and Retrieval
Chapter One: Introduction to Information Storage and Retrieval
Chapter Discussion Points
• IR and IR systems
• Data versus information retrieval
• IR and the retrieval process
• Basic structure of an IR system
Chapter Objectives
• At the end of this chapter, you should have a comprehensive understanding of:
– Information retrieval
– The differences between data retrieval and information retrieval
– The details of the retrieval process
– The fundamental structure of IR systems
Brainstorming
• Consider the Google search engine as a use case and discuss:
– How does Google decide which websites to show when you search for something? What do you think makes a website more likely to appear at the top?
– What do you think happens when you type a word into Google? Can you describe the steps from your search to the results you see?
– What kinds of problems do you think Google might face when trying to find and show the right information from millions of websites?
Brainstorming
• How does Google decide which websites to show when you search for something? What do you think makes a website more likely to appear at the top?
– Google uses ranking algorithms to order websites.
– Factors that determine the rank include relevance to the search term, the quality of the content, the number of other sites linking to the page, and how often it is updated.
– Websites that provide valuable, trustworthy information are often ranked higher.
• What do you think happens when you type a word into Google? Can you describe the steps from your search to the results you see?
– It quickly searches its massive index of web pages.
– It looks for pages that match your query, ranks them by relevance, and then displays a list of results on the search results page.
– This process happens within seconds!
Brainstorming
• What kinds of problems do you think Google might face when trying to find and show the right information from millions of websites?
– Google faces challenges in providing comprehensive search results for languages that lack extensive online content or digital resources.
Introduction
• Nowadays, enormous amounts of data are generated continuously from various sources such as social media platforms, sensors, and more.
– Data has little value if we cannot access and search through it effectively, which would be extremely challenging without information retrieval systems.
• Information retrieval (IR) is the process of finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
• Information retrieval deals with the representation, storage, and organization of, and access to, information items.
– The organization of and access to information items should give users easy access to the information in which they are interested.
General Goal of IR Systems
• To help users find useful information based on their information needs (with minimum effort) despite:
– The increasing complexity of information
– The changing needs of users
Typical IR Task
[Figure: a query string and a document corpus go into the IR system, which returns ranked documents (1. Doc1, 2. Doc2, 3. Doc3, ...)]
• Given:
– A corpus of textual natural-language documents.
– A user query in the form of a textual string.
• Find:
– A ranked set of documents that are relevant to the query.
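The task above can be illustrated with a minimal sketch. It assumes a toy relevance score (the number of query-term occurrences in each document); the function names `tokenize` and `rank` and the sample corpus are hypothetical and are not taken from the slides. Real IR systems use far more refined scoring.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def rank(query, corpus):
    """Return (doc_id, score) pairs sorted by descending score.

    The score is the number of query-term occurrences in the document,
    a toy relevance measure assumed here for illustration only.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # documents with no matching term are not retrieved
            scored.append((doc_id, score))
    # Sort by score (high to low), then by doc_id for a deterministic order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical three-document corpus.
corpus = {
    "Doc1": "information retrieval finds relevant documents",
    "Doc2": "databases store structured data",
    "Doc3": "retrieval of information from text documents and more documents",
}
ranked = rank("information documents", corpus)
print(ranked)  # [('Doc3', 3), ('Doc1', 2)]
```

Doc2 shares no term with the query, so it does not appear in the ranked list at all.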
Data versus Information Retrieval
• The emphasis of IR is on the retrieval of information rather than the retrieval of data.
• Data retrieval
– Consists mainly of determining which documents contain the set of keywords in the user query (which is often not enough to satisfy the user's information need)
– Aims at retrieving all objects that satisfy well-defined semantics
– A single erroneous object among a thousand retrieved objects implies failure
• Information retrieval
– Is concerned with retrieving information about a subject or topic rather than retrieving data that satisfies a given query
– Semantics are frequently loose: the retrieved objects might be inaccurate
– Small errors are tolerated
Data versus information retrieval(cont’d…)
• An example of a data retrieval system is a relational database.

Criteria        | Data retrieval               | Information retrieval
Data            | Structured data              | Free, unstructured text
Result          | Exact matches                | Partial/approximate matches
Accessibility   | Knowledgeable users          | Non-expert users
Sensitivity     | A single error means failure | Small errors go unnoticed
Query language  | Artificial (e.g., SQL)       | Natural language
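The contrast in the table can be sketched in a few lines. The example below uses Python's standard `sqlite3` module for the data retrieval side and a toy word-overlap score for the information retrieval side; the `books` table, its titles, and the `overlap` function are assumptions made for illustration.

```python
import sqlite3

# Hypothetical in-memory table of book titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO books (id, title) VALUES (?, ?)",
    [
        (1, "Modern Information Retrieval"),
        (2, "Database Systems"),
        (3, "Introduction to Retrieval of Information"),
    ],
)

query = "Information Retrieval"

# Data retrieval: the condition is exact, so a row either satisfies it or not.
exact = conn.execute("SELECT id FROM books WHERE title = ?", (query,)).fetchall()

def overlap(query_text, title):
    """Toy relevance score: number of distinct words shared with the title."""
    return len(set(query_text.lower().split()) & set(title.lower().split()))

# Information retrieval: score every title by partial match, then rank.
rows = conn.execute("SELECT id, title FROM books").fetchall()
scored = [(overlap(query, title), doc_id) for doc_id, title in rows]
approx = sorted((pair for pair in scored if pair[0] > 0),
                key=lambda pair: (-pair[0], pair[1]))
approx_ids = [doc_id for _, doc_id in approx]
conn.close()

print(exact)       # [] because no title equals the query string exactly
print(approx_ids)  # [1, 3] because both titles share two words with the query
```

The exact query returns nothing, while the approximate ranking still surfaces both titles that are plausibly about the topic.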
Examples of IR Systems
• Document retrieval systems:
– Store entire documents.
– Usually retrieve a stored document by its title or by keywords associated with the document.
• Reference retrieval systems:
– Store references to documents rather than the documents themselves.
– Usually provide the titles of relevant documents and, frequently, their physical locations.
– Are extremely effective in libraries.
Examples of IR Systems(cont’d…)
• Cross-language information retrieval: designed to retrieve information in one language based on queries formulated in another language.
– Accepts queries in the user's preferred language.
– Translates the query into the target language of the document collection.
– Searches the documents for matches to the translated query.
– Ranks the retrieved documents by relevance, considering factors like keyword matching and context.
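The four steps above can be sketched as follows. This is a toy illustration that assumes word-by-word translation through a tiny hypothetical Spanish-to-English dictionary; practical cross-language systems rely on much richer translation resources, and the names `DICTIONARY`, `translate_query`, and `search` are invented for this example.

```python
# Hypothetical bilingual dictionary (Spanish -> English), for illustration only.
DICTIONARY = {"gato": "cat", "negro": "black", "perro": "dog"}

def translate_query(query):
    """Translate word by word; words missing from the dictionary are kept as-is."""
    return [DICTIONARY.get(word, word) for word in query.lower().split()]

def search(query, documents):
    """Rank documents by how many translated query terms they contain."""
    terms = set(translate_query(query))
    scores = {}
    for doc_id, text in documents.items():
        score = len(terms & set(text.lower().split()))
        if score:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical English document collection.
documents = {
    "d1": "the black cat sleeps",
    "d2": "a dog barks at the cat",
    "d3": "stock markets fell today",
}
results = search("gato negro", documents)
print(results)  # ['d1', 'd2']
```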
Examples of IR Systems(cont’d…)
• Question-answering IR system: designed to provide specific answers to user queries instead of just returning a list of documents.
– Processing: analyzes the query to identify key concepts and intent.
– Retrieval: searches a structured or unstructured data source to find relevant information.
– Ranking: orders the retrieved documents by their relevance to the question, using algorithms that assess factors like keyword matching, context, and semantic meaning.
– Answer extraction: extracts potential answers from the ranked documents, focusing on sentences or phrases that directly respond to the query.
– Response generation: formats the final answer to ensure clarity and conciseness.
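The stages above can be sketched as a small pipeline. This is only a keyword-overlap illustration under simplifying assumptions (a hand-written stopword list, sentences split on periods); the function names and sample passages are hypothetical, and real question-answering systems use considerably more sophisticated language analysis.

```python
STOPWORDS = {"what", "is", "the", "of", "a", "an", "in", "who", "where"}

def process(question):
    """Processing: keep only the content words of the question."""
    words = question.lower().strip("?").split()
    return {word for word in words if word not in STOPWORDS}

def retrieve_and_rank(key_terms, passages):
    """Retrieval and ranking: order matching passages by key-term overlap."""
    def score(passage):
        return len(key_terms & set(passage.lower().replace(".", "").split()))
    hits = [passage for passage in passages if score(passage) > 0]
    return sorted(hits, key=score, reverse=True)

def extract_answer(key_terms, ranked):
    """Answer extraction: pick the best-matching sentence of the top passage."""
    if not ranked:
        return None
    sentences = [s.strip() for s in ranked[0].split(".") if s.strip()]
    return max(sentences, key=lambda s: len(key_terms & set(s.lower().split())))

def respond(answer):
    """Response generation: format the final answer."""
    return "No answer found." if answer is None else answer + "."

# Hypothetical passage collection.
passages = [
    "Nairobi is the capital of Kenya.",
    "Addis Ababa is the capital of Ethiopia. It hosts the African Union headquarters.",
]
terms = process("What is the capital of Ethiopia?")
answer = respond(extract_answer(terms, retrieve_and_rank(terms, passages)))
print(answer)  # Addis Ababa is the capital of Ethiopia.
```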
Examples of IR Systems(cont’d…)
• Image retrieval: designed to search and retrieve images from a database or the internet based on specific queries, often using visual content or metadata.
– Text-based image retrieval: relies on metadata (titles, descriptions, tags) associated with images. It searches for images that match the keywords or phrases provided by the user.
– Content-based image retrieval (CBIR): analyzes the visual content of images to find matches. It uses features such as color, texture, and shape extracted from the images.
– Retrieval process: index both visual features and associated metadata; compare the user's input (text or visual) against the indexed images; rank the retrieved images by relevance to the query, considering both visual similarity and textual metadata matches.
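As a rough illustration of content-based retrieval with a color feature, the sketch below represents each image as a list of RGB pixels, indexes a coarse color histogram per image, and ranks images by histogram intersection with the query image. The choice of histogram intersection, the file names, and the pixel data are all assumptions made for the example; texture, shape, and metadata matching are left out.

```python
def color_histogram(pixels, bins=2):
    """Normalized histogram over coarse RGB bins (bins per channel).

    Assumes a non-empty list of (r, g, b) tuples with values in 0..255.
    """
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

RED, BLUE = (250, 10, 10), (10, 10, 250)

# Indexing step: store one visual feature vector per (hypothetical) image.
index = {
    "sunset.png": color_histogram([RED] * 3 + [BLUE]),
    "ocean.png": color_histogram([BLUE] * 4),
}

# Query by example: an all-red image.
query_hist = color_histogram([RED] * 4)
ranked = sorted(index,
                key=lambda name: intersection(query_hist, index[name]),
                reverse=True)
print(ranked)  # ['sunset.png', 'ocean.png']
```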
What makes IR hard?
• Query evaluation (or retrieval process)
– To what extent does a document correspond to a query?
– Simple word matching is a very brittle approach, since one word can have several different meanings.
• System evaluation
– How good is a system?
– Are the retrieved documents relevant? (precision)
– Are all the relevant documents retrieved? (recall)
• Intelligent IR:
– Taking into account the meaning of the words used.
– Taking into account the order of words in the query.
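The two system evaluation questions on this slide correspond to the standard measures precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved). A minimal sketch, with hypothetical document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# The system returned 4 documents; 5 documents are actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5", "d6", "d7"])
print(p, r)  # 0.5 0.4
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).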
IR and the retrieval process
IR and the retrieval process(cont’d…)
• It is necessary to define the text database before any of the
retrieval processes are initiated.
• This is usually done by the manager of the database and includes
specifying the following:
– The documents to be used
– The operations to be performed on the text
– The text model to be used (the text structure and what
elements can be retrieved)
• The text operations transform the original documents and the
information needs and generate a logical view of them
IR and the retrieval process(cont’d…)
• Once the logical view of the documents is defined, the database
module builds an index of the text
– An index is a critical data structure
– It allows fast searching over large volumes of data
• Different index structures might be used, but the most popular one
is the inverted file.
• Given that the document database is indexed, the retrieval process
can be initiated.
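An inverted file maps each term to the set of documents that contain it, so a query only has to look up its own terms instead of scanning every document. The sketch below is a minimal in-memory version with whitespace tokenization; the function names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Return the sorted ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical document collection.
documents = {
    1: "police university information technology",
    2: "information storage and retrieval",
    3: "information retrieval systems",
}
index = build_inverted_index(documents)
print(search_all(index, "information retrieval"))  # [2, 3]
print(search_all(index, "retrieval police"))       # []
```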
IR and the retrieval process(cont’d…)
• The user first specifies a user need via the user interface, which is
then parsed and transformed by the same text operations applied
to the documents.
• Next, query operations are applied before the actual query,
which provides a system representation of the user need, is
generated.
• The query is then processed to obtain the retrieved documents
(searching).
• Before the retrieved documents are sent to the user, they are
ranked according to their likelihood of relevance.
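The key point of this slide, that the query goes through the same text operations as the documents, can be sketched as below. The specific operations (lowercasing, punctuation stripping, stopword removal, and a very crude plural stripper) are assumptions chosen for illustration, as are the function names and sample documents.

```python
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "for"}

def text_operations(text):
    """Toy text operations producing the logical view of a text."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")
        if not word or word in STOPWORDS:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]  # very crude stemming: drop a trailing "s"
        terms.append(word)
    return terms

def retrieve(query, documents):
    """Search and rank: the query passes through the same text operations."""
    query_terms = set(text_operations(query))
    results = []
    for doc_id, text in documents.items():
        view = text_operations(text)
        score = sum(1 for term in view if term in query_terms)
        if score:
            results.append((doc_id, score))
    # Rank by score (high to low), then by id for a deterministic order.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical documents.
documents = {
    "d1": "Indexing of documents.",
    "d2": "The police reports are stored in archives.",
    "d3": "Ranking documents for a query; documents are scored.",
}
print(text_operations("the ranking of documents"))     # ['ranking', 'document']
print(retrieve("the ranking of documents", documents))  # [('d3', 3), ('d1', 1)]
```

Because "documents" in the query and "documents." in the text are both reduced to "document", they match even though the surface strings differ.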
IR and the retrieval process(cont’d…)
• The user then examines the set of ranked documents in search
of useful information. The user has two choices:
– reformulate the query and run it on the entire collection, or
– reformulate the query and run it on the result set
• At this point, the user might locate a subset of the documents seen as
definitely of interest and initiate a user feedback cycle.
• In such a cycle, the system uses the documents selected by the
user to change the query formulation.
• The modified query is assumed to be a better representation of the real
user need than the previous one.
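One simple way to picture the feedback cycle is query expansion: terms that occur often in the documents the user selected are added to the query. The sketch below is an assumption chosen for illustration, not necessarily the reformulation method the slides have in mind; the function name `expand_query` and the sample data are hypothetical, and stopword filtering is omitted for brevity.

```python
from collections import Counter

def expand_query(query_terms, selected_docs, extra_terms=2):
    """Add the most frequent new terms from the user-selected documents."""
    counts = Counter()
    for text in selected_docs:
        counts.update(term for term in text.lower().split()
                      if term not in query_terms)
    # Most frequent first; alphabetical order breaks ties deterministically.
    candidates = sorted(counts, key=lambda term: (-counts[term], term))
    return list(query_terms) + candidates[:extra_terms]

# The user searched for "jaguar" and marked two car-related results as relevant.
new_query = expand_query(
    ["jaguar"],
    ["jaguar car engine", "jaguar car dealer"],
    extra_terms=1,
)
print(new_query)  # ['jaguar', 'car']
```

The expanded query now carries the sense of the word that the user actually meant, which is the intuition behind treating the modified query as a better representation of the user need.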
Basic Structure of an IR System
• An Information Retrieval System serves as a bridge between the world of
authors and the world of readers/users.
• An IR system typically consists of three main subsystems:
– Document representation
– Representation of users' requirements (queries)
– The algorithms used to match user requirements (queries) with document representations
• We are IT professionals: nothing should be a black box for us. We need to open it and see inside.
Pros and cons of IR System
• Pros
– Fast answers: IR systems are fast and efficient at finding and returning the information needed from huge amounts of data.
– 24/7 availability: retrieval systems never take breaks. They are always active and ready to retrieve information whenever we require it, day or night.
• Cons
– Garbage in, garbage out: meaningful results depend heavily on the accuracy and cleanliness of the data provided.
– Overreliance on keywords: if search terms don't match exactly, crucial information may be missed.
– Information overload risk: the system may retrieve too much information.
Thank you!
Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieval ch1

  • 1. ETHIOPIAN POLICE UNIVERSITY DEPARTMENT OF INFORMATION TECHNOLOGY AND CYBER SECURITY Information Storage and Retrieval Chapter One: Introduction to Information Storage and Retrieval
  • 2. Chapters Point of Discussions • IR and IR systems • Data versus information retrieval • IR and the retrieval process • Basic structure of an IR system
  • 3. Chapters objectives • At the end of this chapter you should have a comprehensive understanding of: • Information Retrieval • The differences between data and information retrieval • The details of the retrieval process and • The fundamental structure of IR systems.
  • 4. Brainstorming • Consider Google search engine as use case and discuss:  How does Google decide which websites to show when you search for something? • What do you think makes a website more likely to appear at the top?  What do you think happens when you type a word into Google? • Can you describe the steps from your search to the results you see?  What kinds of problems do you think Google might face when trying to find and show the right information from millions of websites?
  • 5. Brainstorming • How does Google decide which websites to show when you search for something? What do you think makes a website more likely to appear at the top?  Google uses a system called algorithms to rank websites.  Relevance to the search term, the quality of its content, the number of other sites linking to it, and how often it is updated are factors to determine the rank.  Websites that provide valuable, trustworthy information are often ranked higher. • What do you think happens when you type a word into Google? Can you describe the steps from your search to the results you see?  It quickly searches its massive index of web pages.  It looks for pages that match your query, ranks them based on relevance, and then displays a list of results on the search results page.  This process happens just in seconds!
  • 6. Brainstorming • What kinds of problems do you think Google might face when trying to find and show the right information from millions of websites?  Google face challenges to provide comprehensive search results for languages those lack extensive online content or digital resources.
  • 7. Introduction • Nowadays, enormous amounts of data are being generated continuously from various sources such as social media platforms, sensors and more.  Data lacks value, if we can't access and search through it effectively, which would be extremely challenging without information retrieval systems. • Information retrieval (IR) is the process of finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from large collections (usually stored on computers). • Information retrieval deals with representation, storage, organization of, and access to information items.  The organization and access of information items should provide the user with easy access to the information in which he/she is interested.
  • 8. General Goal of IR Systems • To help users find useful information based on their information needs (with a minimum effort) despite  Increasing complexity of Information  Changing needs of user
  • 9. Typical IR Task IR System Query String Document corpus Ranked Documents 1. Doc1 2. Doc2 3. Doc3 . . Given:  A corpus of textual natural- language documents.  A user query in the form of a textual string. Find:  A ranked set of documents that are relevant to the query
  • 10. Data versus Information Retrieval • Emphasis of IR is on the retrieval of information, rather than on the retrieval of data.  Data retrieval  Consists mainly of determining which documents contain a set of keywords in the user query (which is not enough to satisfy the user information need)  Aims at retrieving all objects that satisfy well defined semantics  a single erroneous object among a thousand retrieved objects implies failure  Information retrieval  Is concerned with retrieving information about a subject or topic than retrieving data which satisfies a given query  semantics is frequently loose: the retrieved objects might be inaccurate  small errors are tolerated
  • 11. Data versus information retrieval(cont’d…) • Example of data retrieval system is a relational database Criteria Data retrieval Information retrieval Data Structured data Free text, unstructured Result Exact matches Partial/Approximate matches Accessibility Knowledgeable users Non-expert humans Sensitivity Single error, total failure Small errors are unnoticed Query language SQL(artificial) Natural
  • 12. Examples of IR Systems • Document-retrieval systems:  Store entire documents  Usually retrieve stored document by title or by key words associated with the document. • Reference retrieval systems:  Store references to documents rather than the documents themselves.  Usually provide the titles of relevant documents and frequently their physical locations.  Extremely effective in libraries
  • 13. Examples of IR Systems (cont’d…) • Cross-language information retrieval: designed to retrieve information in one language based on queries formulated in another language.  Accepts queries in the user's preferred language.  Translates the query into the target language of the document collection.  Searches the documents for matches to the translated query.  Ranks the retrieved documents by relevance, considering factors such as keyword matching and context.
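The four steps above can be sketched as follows. This is a toy illustration, not how a production cross-language system works: the word-by-word dictionary `TOY_DICTIONARY` (Spanish to English), the function names, and the sample documents are assumptions made for this example, and real systems rely on far richer translation resources.

```python
# Hypothetical toy bilingual dictionary (Spanish -> English).
TOY_DICTIONARY = {"gato": "cat", "negro": "black", "perro": "dog"}

def translate_query(query, dictionary):
    """Translate word by word; unknown words are kept unchanged."""
    return [dictionary.get(word, word) for word in query.lower().split()]

def cross_language_search(query, docs, dictionary):
    """Translate the query, match it against the documents, rank by overlap."""
    terms = set(translate_query(query, dictionary))
    results = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            results.append((doc_id, overlap))
    return sorted(results, key=lambda r: (-r[1], r[0]))

# Hypothetical English document collection.
docs = {
    "d1": "the black cat sleeps",
    "d2": "a dog barks",
    "d3": "a cat plays",
}
results = cross_language_search("gato negro", docs, TOY_DICTIONARY)
print(results)
```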
  • 14. Examples of IR Systems (cont’d…) • Question-answering IR system: designed to provide specific answers to user queries instead of just returning a list of documents.  Query processing: analyzes the query to identify key concepts and intent.  Retrieval: searches a structured or unstructured data source to find relevant information.  Ranking: ranks the retrieved documents by their relevance to the question, using algorithms that assess factors such as keyword matching, context, and semantic meaning.  Answer extraction: extracts potential answers from the ranked documents, focusing on sentences or phrases that directly respond to the query.  Response generation: formats the final answer to ensure clarity and conciseness.
  • 15. Examples of IR Systems (cont’d…) • Image retrieval: designed to search and retrieve images from a database or the internet based on specific queries, often using visual content or metadata.  Text-based image retrieval: relies on metadata (titles, descriptions, tags) associated with images.  Searches for images that match the keywords or phrases provided by the user.  Content-based image retrieval (CBIR): analyzes the visual content of images to find matches.  Uses features such as color, texture, and shape extracted from the images.  Retrieval process:  Index both the visual features and the associated metadata.  Compare the user’s input (text or visual) against the indexed images.  Rank the retrieved images by relevance to the query, considering both visual similarity and textual metadata matches.
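As one illustration of the color feature mentioned for CBIR, the sketch below builds a coarse color histogram for each image and compares histograms by intersection. It assumes an image is simply a non-empty list of `(r, g, b)` tuples with values from 0 to 255; the function names, the bin count, and the tiny sample images are assumptions for this example, and texture and shape features are not covered.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Return a normalized color histogram for a non-empty list of RGB pixels."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map the quantized (r, g, b) triple to a single bin index.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical four-pixel images.
red_image = [(250, 10, 10)] * 4
mostly_red = [(240, 20, 20)] * 3 + [(10, 10, 240)]
blue_image = [(10, 10, 250)] * 4

query_hist = color_histogram(red_image)
sim_mostly_red = histogram_intersection(query_hist, color_histogram(mostly_red))
sim_blue = histogram_intersection(query_hist, color_histogram(blue_image))
print(sim_mostly_red, sim_blue)
```

Ranking the candidate images by this similarity would place `mostly_red` ahead of `blue_image` for the red query image.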
  • 16. What makes IR hard? • Query evaluation (or retrieval process) – To what extent does a document correspond to a query? – Simple word matching is an unreliable approach, because one word can have several different meanings. • System evaluation – How good is a system? – Are the retrieved documents relevant? (precision) – Are all the relevant documents retrieved? (recall) • Intelligent IR: – Takes into account the meaning of the words used. – Takes into account the order of the words in the query.
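The two system-evaluation questions correspond to the standard measures precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that were retrieved). A minimal sketch, with a hypothetical function name and made-up document ids:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# The system returned four documents; three documents are truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).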
  • 17. IR and the retrieval process
  • 18. IR and the retrieval process (cont’d…) • It is necessary to define the text database before any of the retrieval processes are initiated. • This is usually done by the manager of the database and includes specifying the following: – The documents to be used – The operations to be performed on the text – The text model to be used (the text structure and which elements can be retrieved) • The text operations transform the original documents and the information needs and generate a logical view of them.
  • 19. IR and the retrieval process(cont’d…) • Once the logical view of the documents is defined, the database module builds an index of the text – An index is a critical data structure – It allows fast searching over large volumes of data • Different index structures might be used, but the most popular one is the inverted file. • Given that the document database is indexed, the retrieval process can be initiated.
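An inverted file maps each term to the set of documents that contain it, so a query only touches the entries for its own terms instead of scanning every document. The sketch below is a minimal in-memory version; the function names, the whitespace tokenization, and the sample documents are assumptions for this example, and real indexes also store positions and frequencies.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical document collection.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_inverted_index(docs)
print(search_all(index, "home july"))
```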
  • 20. IR and the retrieval process (cont’d…) • The user first specifies an information need via the user interface; it is then parsed and transformed by the same text operations that were applied to the documents. • Next, query operations are applied to generate the actual query, which provides a system representation of the user need. • The query is then processed to obtain the retrieved documents (searching). • Before the retrieved documents are sent to the user, they are ranked according to their likelihood of relevance.
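The key point of this slide, that the query passes through the same text operations as the documents before searching and ranking, can be sketched as below. The stopword list, the function names, and the scoring by matching-term count are assumptions for this example; the slides do not prescribe particular text operations.

```python
import re

# Hypothetical, very small stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "and", "is"}

def text_operations(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def retrieve(query, docs):
    """Apply the same text operations to query and documents, then rank."""
    query_terms = set(text_operations(query))
    results = []
    for doc_id, text in docs.items():
        doc_terms = text_operations(text)
        score = sum(1 for t in doc_terms if t in query_terms)
        if score:
            results.append((doc_id, score))
    # Highest score first, standing in for "likelihood of relevance".
    return sorted(results, key=lambda r: (-r[1], r[0]))

# Hypothetical documents.
docs = {
    "d1": "The Index of the Library.",
    "d2": "Indexing is fast; the index, the INDEX!",
    "d3": "Cooking for beginners",
}
results = retrieve("the index", docs)
print(results)
```

Because both sides are normalized the same way, "INDEX!" in a document matches "index" in the query, while "the" contributes nothing to the score.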
  • 21. IR and the retrieval process (cont’d…) • The user then examines the set of ranked documents in search of useful information. The user has two choices: – reformulate the query and run it on the entire collection, or – reformulate the query and run it on the result set. • At this point, s/he might identify a subset of the documents as definitely of interest and initiate a user feedback cycle. • In such a cycle, the system uses the documents selected by the user to change the query formulation. • The modified query is assumed to be a better representation of the real user need than the previous one.
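One simple way a feedback cycle could change the query formulation is to add the most frequent terms from the documents the user selected. The sketch below illustrates that idea only; the function `expand_query`, its `extra_terms` parameter, and the sample data are hypothetical, and the slides do not say which feedback method a given system uses.

```python
from collections import Counter

def expand_query(query_terms, selected_docs, extra_terms=2):
    """Append the most frequent new terms from user-selected documents."""
    counts = Counter()
    for text in selected_docs:
        counts.update(t for t in text.lower().split() if t not in query_terms)
    # Most frequent first; ties are broken alphabetically.
    candidates = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in candidates[:extra_terms]]

# The user searched for "jaguar" and marked two animal-related results.
selected = ["jaguar cat habitat", "jaguar cat diet"]
new_query = expand_query(["jaguar"], selected, extra_terms=1)
print(new_query)
```

The expanded query now carries "cat", which steers the next search toward the animal sense of the ambiguous word.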
  • 22. Basic Structure of an IR System • An information retrieval system serves as a bridge between the world of authors and the world of readers/users. • An IR system typically consists of three main subsystems:  Document representation  Representation of users' requirements (queries)  The algorithms used to match user requirements (queries) with document representations. • We are IT professionals: nothing should be a black box for us; we need to open it and see inside.
  • 23. Pros and cons of IR Systems • Pros – Fast answers: fast and efficient at finding and returning the needed information from huge amounts of data. – 24/7 availability: retrieval systems never take breaks; they are always active, ready to retrieve information whenever we require it, day or night. • Cons – Garbage in, garbage out: the quality of the results depends heavily on the accuracy and cleanliness of the data provided. – Overreliance on keywords: if search terms do not match exactly, crucial information may be missed. – Information overload risk: too much information may be retrieved.