
02 - Lect2 Biomedical IR

Uploaded by

Mahmoud Nasser
Copyright
© All Rights Reserved

Biomedical Information Retrieval
Search Engine Architecture
Lecture 2

Dr. Ebtsam AbdelHakam


[email protected]
Minia University
Introduction
Retrieving relevant information from biomedical text data is a new and challenging area of research. Thousands of articles are added to the biomedical literature each year, and this large collection of publications offers an excellent opportunity for discovering hidden biomedical knowledge by applying information retrieval (IR) and natural language processing (NLP) technologies.

Biomedical text processing differs from processing other kinds of text. It requires special handling because biomedical text contains complex medical terminology. Identifying and normalizing medical entities is itself a research problem, and the relationships among medical entities affect the performance of any system built on this text.
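To make the normalization problem concrete, here is a minimal sketch of dictionary-based entity normalization: different surface forms of the same condition are mapped to one concept identifier using greedy longest-match lookup. The lexicon and the concept IDs (`CONCEPT_MI`, `CONCEPT_HTN`) are hypothetical; real systems rely on large curated vocabularies and must also resolve ambiguity (for example, "MI" can mean other things), which this toy ignores.

```python
import re

# Hypothetical lexicon: surface form -> made-up concept ID.
LEXICON = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "mi": "CONCEPT_MI",
    "high blood pressure": "CONCEPT_HTN",
    "hypertension": "CONCEPT_HTN",
}
# Longest phrase length (in words) to try at each position.
MAX_LEN = max(len(phrase.split()) for phrase in LEXICON)


def normalize_entities(text: str) -> list[tuple[str, str]]:
    """Return (matched phrase, concept ID) pairs using greedy longest-match lookup."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                found.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here; move to the next token
    return found


print(normalize_entities("Patient with a heart attack and hypertension"))
# [('heart attack', 'CONCEPT_MI'), ('hypertension', 'CONCEPT_HTN')]
```

Because "heart attack" and "myocardial infarction" normalize to the same ID, an index built on concept IDs instead of raw words would match either phrasing.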

The medical field has various types of queries: short questions, medical case reports, medical case narratives, verbose medical queries, community questions, semi-structured queries, etc. This diversity in medical data demands special attention from IR and NLP.
Requirements of Designing a Search Engine

 The two primary requirements of a search engine are:

 • Effectiveness (quality): We want to be able to retrieve the most relevant set of documents possible for a query.

 • Efficiency (speed): We want to process queries from users as quickly as possible.
Designing a Search Engine

Search engine design balances two factors:

‣ Effectiveness – accuracy of results, presentation of results, absence of spam, good ad selection.

‣ Efficiency / Performance – response time, concurrency, disaster mitigation, security issues.

These factors deeply impact the architecture of these systems. Often the engineering solutions feed back into research (NoSQL, MapReduce, etc.).
Search Engine Basic Building Blocks

 Search engine components support two major functions:

 1- The indexing process: builds the structures that enable searching.
 The index (inverted index) is an efficient data structure that represents the documents of a corpus and allows fast searching of the corpus documents using that indexed information.

 2- The query process: uses those structures (the index) and a person’s query to produce a ranked list of documents.
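The indexing and query processes can be illustrated with a minimal sketch of an inverted index: each term maps to a postings list recording which documents contain it and how often. This is a toy under simplifying assumptions (lowercasing, splitting on non-alphanumeric characters, in-memory dictionaries, and Boolean AND matching instead of ranking); the function names are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(corpus: dict[int, str]) -> dict[str, dict[int, int]]:
    """Indexing process: map each term to its postings {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def docs_containing_all(index: dict[str, dict[int, int]], terms: list[str]) -> set[int]:
    """Query process (Boolean AND): intersect the postings of every query term."""
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


corpus = {
    1: "Aspirin reduces the risk of heart attack.",
    2: "Heart failure and hypertension.",
    3: "Aspirin dosage for children.",
}
index = build_inverted_index(corpus)
print(index["aspirin"])                                  # {1: 1, 3: 1}
print(docs_containing_all(index, ["aspirin", "heart"]))  # {1}
```

Only the postings of the query terms are touched at query time, which is why the index, rather than a scan of every document, determines how fast searching can be.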
Query process

1. User interaction
It supports the creation and refinement of the user’s query and displays the results.

2. Ranking
It uses the query and the indexes to create a ranked list of documents.

3. Evaluation
It monitors and measures effectiveness and efficiency. It is done mostly offline.
Query Process
(User Interaction)
The user interaction component provides the interface between the person doing the searching and the search engine.

Its four tasks are:

1- Accepting the user’s query, written in a defined query language, and transforming it into index terms.

2- Query transformation: the user interface parses user queries and converts the search terms into a form that is acceptable as input to the query engine, i.e., into index terms that appear in the index vocabulary.
User Interaction

User Interaction Component

3- Spell checking, query suggestion, and query refinement.
‣ Query expansion adds terms related to the query terms (e.g., synonyms, related entities).
‣ Relevance feedback runs an initial query, then uses documents from its results (marked relevant by the user, or simply the top-ranked documents in pseudo-relevance feedback) to expand the query for a second run.

4- Take the ranked list of documents from the search engine and
organize it into the results shown to the user.
‣ Displays the top-ranked results
‣ Generates snippets to show how queries match documents
‣ Highlights important words and passages
‣ Retrieves query-relevant advertising.
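Tasks 2 and 4 above can be sketched as follows: a query is lowercased, tokenized, and filtered down to terms that appear in the index vocabulary, and a result snippet is built as a short word window around the first match, with query terms highlighted. The function names, the bracket highlighting, and the fixed-width window are assumptions for illustration; more elaborate snippet generators select among several candidate passages rather than taking the first match.

```python
import re


def _normalize(word: str) -> str:
    """Lowercase a word and strip punctuation so it can be compared with index terms."""
    return re.sub(r"[^a-z0-9]", "", word.lower())


def transform_query(query: str, vocabulary: set[str]) -> list[str]:
    """Task 2: turn raw query text into index terms found in the index vocabulary."""
    return [t for t in re.findall(r"[a-z0-9]+", query.lower()) if t in vocabulary]


def make_snippet(text: str, query_terms: list[str], width: int = 8) -> str:
    """Task 4: take a window of `width` words around the first query match
    and highlight matching words by wrapping them in [brackets]."""
    words = text.split()
    terms = set(query_terms)
    hits = [i for i, w in enumerate(words) if _normalize(w) in terms]
    start = max(0, hits[0] - width // 2) if hits else 0
    window = words[start:start + width]
    return " ".join(f"[{w}]" if _normalize(w) in terms else w for w in window)


vocabulary = {"aspirin", "heart", "attack", "risk"}
terms = transform_query("Does ASPIRIN lower heart-attack risk?", vocabulary)
print(terms)  # ['aspirin', 'heart', 'attack', 'risk']
doc = "Daily low-dose aspirin may reduce the risk of a first heart attack in some adults."
print(make_snippet(doc, terms, width=6))  # Daily low-dose [aspirin] may reduce the
```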
What is Query Expansion?

 Query expansion is the process by which a search engine adds search terms to a user’s weighted search.
 The goal is to improve precision and/or recall.

 Example: user query: “car”; expanded query: “car cars automobile automobiles auto”, etc.
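The “car” example can be reproduced with a thesaurus-based expansion sketch. It assumes a hand-built table of related terms (the entries, including the biomedical `tumor` row, are hypothetical) and keeps the search weighted by giving the user’s own terms weight 1.0 and added terms a lower weight of 0.5; that value is an arbitrary choice for illustration, not a recommended setting.

```python
# Hypothetical thesaurus for illustration; a real engine might build one from a
# curated vocabulary or mine related terms from query logs.
THESAURUS = {
    "car": ["cars", "automobile", "automobiles", "auto"],
    "tumor": ["tumour", "neoplasm"],
}


def expand_query(query: str, thesaurus: dict[str, list[str]] = THESAURUS,
                 expansion_weight: float = 0.5) -> dict[str, float]:
    """Return term -> weight: original terms get 1.0, added related terms get a lower weight."""
    weights: dict[str, float] = {}
    for term in query.lower().split():
        weights[term] = 1.0
    for term in list(weights):  # iterate over the original terms only
        for related in thesaurus.get(term, []):
            # setdefault never downgrades a term the user typed themselves.
            weights.setdefault(related, expansion_weight)
    return weights


print(expand_query("car"))
# {'car': 1.0, 'cars': 0.5, 'automobile': 0.5, 'automobiles': 0.5, 'auto': 0.5}
```

Down-weighting the added terms is one way to gain recall from synonyms without letting them outweigh what the user actually asked for.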
Query Process
(Ranking)
Ranking Component

 The ranking component is the core of the search engine.

• It takes the transformed query from the user interaction component and generates a ranked list of documents using scores based on a retrieval model.

• Ranking must be both efficient, since many queries may need to be processed in a short time, and effective, since the quality of the ranking determines whether the search engine accomplishes the goal of finding relevant information.

 The efficiency of ranking depends on the indexes.

 The effectiveness depends on the retrieval model.


Ranking
Document scoring

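‣ A score is assigned to each candidate document based on how well it matches the query; the highest-scoring documents are those most likely to be relevant.

‣ Core component of a search engine, and often the most closely guarded secret.

‣ Many, many approaches and variations have been developed.

‣ The basic form is the dot product of query term weights and corresponding document weights:

$$\text{score}(Q, D) = \sum_{i} q_i \cdot d_i$$

where $q_i$ is the weight of the $i$-th term in the query and $d_i$ is the weight of the same term in the document.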
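Dot-product scoring can be sketched as follows. Each query and document is represented as a dictionary of term weights; how those weights are computed (raw counts, tf-idf, or something else) is the retrieval model’s job and is simply assumed here, and the example weights are made up.

```python
def dot_product_score(query_weights: dict[str, float],
                      doc_weights: dict[str, float]) -> float:
    """Sum of (query term weight * document term weight) over the terms they share."""
    return sum((w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights), 0.0)


def rank(query_weights: dict[str, float],
         docs: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Score every document and sort by descending score (ties broken by doc id)."""
    scored = [(doc_id, dot_product_score(query_weights, w)) for doc_id, w in docs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


query = {"aspirin": 1.0, "heart": 0.5}
docs = {
    "d1": {"aspirin": 0.8, "heart": 0.4},
    "d2": {"heart": 0.9},
    "d3": {"insulin": 0.7},
}
for doc_id, score in rank(query, docs):
    print(doc_id, round(score, 2))
```

Terms absent from either side contribute nothing, so in practice only documents in the postings of at least one query term need to be scored.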
Query Process
(Evaluation)
Evaluation component
 The task of the evaluation component is to measure and monitor effectiveness and efficiency.

• An important part of that is to record and analyze user behavior using log data.

 The results of evaluation are used to tune and improve the ranking component.

• Most of the evaluation component is not part of the online search engine, apart from logging user and system data.

 Evaluation is primarily an offline activity, but it is a critical part of any search application.
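As a rough sketch of what the evaluation component might compute offline, the code below measures effectiveness with precision and recall at a cutoff k (using relevance judgments that are assumed to be available) and efficiency as mean query latency. These are standard measures chosen for illustration; the lecture does not say which metrics a given engine uses, and the document IDs are made up.

```python
import time


def precision_recall_at_k(ranked: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Effectiveness: fraction of the top k that is relevant (precision) and
    fraction of all relevant documents found in the top k (recall)."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_latency_ms(search_fn, queries: list[str]) -> float:
    """Efficiency: average wall-clock time per query, in milliseconds."""
    if not queries:
        return 0.0
    start = time.perf_counter()
    for q in queries:
        search_fn(q)
    return (time.perf_counter() - start) * 1000 / len(queries)


ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d5"}
print(precision_recall_at_k(ranked, relevant, 4))  # (0.5, 0.6666666666666666)
```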
Evaluation component
• Logging

‣ Logging user interaction is an essential tool for measuring performance.

‣ Query logs and clickthrough data are used for query suggestion, spell checking, query caching, ranking, advertising search, …

• Logging: query logs of users’ interactions with the search engine are collected and are of paramount importance.

• They can improve the search experience, speed up results, store the results of common queries, and identify sources of new revenue.
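The idea of storing the results of common queries can be illustrated with a small result cache. It assumes a least-recently-used (LRU) eviction policy, which approximates "keep the common queries" by keeping recently repeated ones; a real engine might instead choose what to cache from query-log frequencies. The class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class QueryResultCache:
    """Keeps results for recently issued queries, evicting the least recently used."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._store = OrderedDict()  # normalized query -> list of doc ids
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, query: str, search_fn: Callable[[str], list[str]]) -> list[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        results = search_fn(key)  # fall back to the (expensive) ranking step
        self._store[key] = results
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used query
        return results


def fake_search(q: str) -> list[str]:
    """Stand-in for the real ranking call."""
    return [f"{q}-doc"]


cache = QueryResultCache(capacity=2)
cache.get_or_compute("Aspirin", fake_search)
cache.get_or_compute("aspirin ", fake_search)  # served from the cache
print(cache.hits, cache.misses)  # 1 1
```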
Evaluation component

 Pages that are clicked or ignored might be logged, both to improve the overall quality of the search engine and to detect patterns in user activity (i.e., data mining).

 Query logs can be used for a variety of other purposes, including:
1. Keeping track of a history of user queries.
2. Generating spell-checking logs (instead of running the spellchecker every time).
3. Recording the time spent on a query or a particular document.
4. Query suggestion, spell checking, query caching, ranking, and advertising search, all of which draw on query logs and clickthrough data.
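Several of the uses listed above (query history, query suggestion, click analysis) can be sketched with an in-memory query log. This is a toy assumption: a real engine writes logs to persistent storage and mines them offline, and the suggestion rule here (most frequent past queries sharing the typed prefix) is just one simple choice. The class and method names are hypothetical.

```python
from collections import Counter
from typing import Optional


class QueryLog:
    """Minimal in-memory log of (query, clicked document) events."""

    def __init__(self) -> None:
        self.entries = []  # list of (normalized query, clicked doc id or None)

    def record(self, query: str, clicked_doc: Optional[str] = None) -> None:
        """Append one event; None means the results were shown but nothing was clicked."""
        self.entries.append((" ".join(query.lower().split()), clicked_doc))

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        """Query suggestion: the k most frequent logged queries starting with the prefix."""
        prefix = prefix.lower()
        counts = Counter(q for q, _ in self.entries if q.startswith(prefix))
        return [q for q, _ in counts.most_common(k)]

    def click_counts(self) -> Counter:
        """Clickthrough data: how often each document was clicked."""
        return Counter(doc for _, doc in self.entries if doc is not None)


log = QueryLog()
log.record("diabetes symptoms", "doc7")
log.record("Diabetes  symptoms")  # same query after normalization, no click
log.record("diabetes type 2", "doc3")
log.record("dialysis")
print(log.suggest("diab", k=1))  # ['diabetes symptoms']
print(log.click_counts())
```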
