Learning Guide Unit 1 _ Home

Unit 1 of the course introduces key concepts in Information Retrieval (IR), including the Boolean and Ranked Retrieval Models, inverted indexes, and techniques like tokenization and stemming. The learning objectives focus on understanding these models and their constructs, as well as improving retrieval efficiency. Students are expected to engage in reading assignments, discussions, and self-assessments to reinforce their understanding of IR principles.

11/18/24, 9:27 PM Learning Guide Unit 1 | Home

Overview

Unit 1: Introduction to IR, Boolean Retrieval, and Terms and Postings (Chapters 1 & 2)

Topics:

Introduction to Information Retrieval
Boolean Retrieval Model
Inverted Index
Ranked Retrieval Model
Tokenization
Stemming and Lemmatization
Skip pointers
Biword indexes
Positional indexes

Learning Objectives:

By the end of this Unit, you will be able to:

1. Identify the characteristics of the Boolean Retrieval Model
2. Describe the basic constructs of information retrieval systems including:
Inverted Index
Dictionary
Postings list
3. Identify the characteristics of the Ranked Retrieval Model and be able to compare and contrast with the Boolean Retrieval Model
4. Describe the process of document tokenization
5. Define and be able to implement methods to reduce term vocabulary including:
Stop words
Token Normalization
Capitalization and Case folding
6. Define and be able to implement methods of Stemming and lemmatization including:
Porter stemmer
Lemmatizer
7. Implement methods to improve the efficiency and performance of processing postings lists, including:
Skip lists
Biword indexes
Phrase Indexes
Positional indexes

Tasks:

Read the Learning Guide and Reading Assignments
Participate in the Discussion Assignment (post, comment, and rate in the Discussion Forum)
Make entries to the Learning Journal
Take the Self-Quiz

https://my.uopeople.edu/mod/book/tool/print/index.php?id=443814 4/10

Introduction

This course explores the key theories of information retrieval and puts them into practice: you will build a
complete information retrieval (IR) system in a series of four development projects. Information retrieval has its beginnings in a paper
by Vannevar Bush (Bush, 1945), in which Bush describes a system capable of storing and retrieving large amounts of
information. Lesk (1995) describes information retrieval as a discipline that ‘grew up’ as a function of library science. Archiving and
searching library information was an important application of information retrieval techniques. The growth of the internet
and the introduction of the World Wide Web in the 1990s significantly broadened the role and application of information retrieval techniques. Google
became a technology leader by applying IR techniques to index and search the World Wide Web. One objective
of this course is to develop an understanding of the underlying theory of IR and the skills necessary to apply IR techniques.

The basic objective in information retrieval is to find specific information within a corpus through the use of a query. A corpus
is a collection of information, usually in the form of documents, although other forms of media are becoming increasingly commonplace.
Imagine that you have a collection of Shakespeare’s plays and want to find just those that include ‘Caesar’ as a subject. One way to
accomplish this is to scan each work for the word ‘Caesar’.
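The scanning approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name scan_for_term, the toy corpus, and the sample sentences are all made up for this example and are not taken from the course text or from Shakespeare.

```python
import re

def scan_for_term(corpus, term):
    """Return the titles of documents that contain `term` as a whole word.

    Every document is read in full for every query, so the cost grows
    with the total size of the corpus.
    """
    needle = term.lower()
    matches = []
    for title, text in corpus.items():
        # Lowercase the text and split it into alphabetic words.
        words = re.findall(r"[a-z]+", text.lower())
        if needle in words:
            matches.append(title)
    return matches

# Hypothetical toy corpus; the sentences are invented placeholders.
corpus = {
    "Julius Caesar": "Caesar shall go forth today",
    "Hamlet": "To be or not to be",
    "Antony and Cleopatra": "Antony speaks of Caesar and of Egypt",
}

found = scan_for_term(corpus, "Caesar")
print(found)  # ['Julius Caesar', 'Antony and Cleopatra']
```

Scanning works for a handful of plays, but because each query rereads every document it becomes slow on a large corpus.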

In our Unit 1 reading assignment, we will begin to explore information retrieval (IR). The first concept that we are introduced to is the
Boolean Retrieval model. The term Boolean refers to a simple two-state system: on/off, true/false, and, of course, present/not present.
The Boolean retrieval model is based upon the presence or absence of the search term. It is a very basic
model that does not rank results but simply returns every document that meets the terms of the search.

One of the key topics introduced in Unit 1 is the inverted index. The inverted index, which is also called the
postings file, is a data structure that maps the words extracted from a document or set of documents to the documents that contain
them and typically also records how often each word appears.

The purpose of this structure is that it allows specific terms to be quickly searched to determine which documents contain the words
(search terms). Although the inverted index structure can support the Boolean Retrieval Model, it also enables other models such as the
Ranked Retrieval Model.
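A minimal sketch of such a structure is shown here, assuming a very simple tokenizer (lowercase alphabetic words only). The function name build_inverted_index and the three sample documents are hypothetical and chosen only for illustration; a real system would add the normalization steps covered in Chapter 2.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a dictionary of {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumed tokenizer: lowercase, alphabetic words only.
        for token in re.findall(r"[a-z]+", text.lower()):
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return dict(index)

# Hypothetical toy documents keyed by document ID.
docs = {
    1: "Caesar praised Brutus",
    2: "Brutus killed Caesar Caesar fell",
    3: "Hamlet mourned",
}

index = build_inverted_index(docs)

# The postings for a term are the IDs of the documents that contain it.
print(sorted(index["caesar"]))  # [1, 2]
print(index["caesar"][2])       # 2 (the term appears twice in document 2)
```

Looking up a term is now a single dictionary access instead of a scan over every document, which is what makes the index useful for both Boolean and ranked retrieval.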

The Ranked Retrieval Model differs from the Boolean model in that users issue free text queries rather than using the precise language
of the Boolean model. In the Boolean model, a query follows a strict Boolean format built from
operators: AND, in which both terms must be present for a document to be returned; OR, in which either term can be
present; and NOT, in which the term must not be present.
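The three operators can be sketched as operations on sorted postings lists. The example below is illustrative only: the postings and document IDs are invented, and the helper names intersect, union, and complement are assumptions made for this sketch. AND is written as a linear merge of two sorted lists, while OR and NOT use Python sets for brevity.

```python
def intersect(p1, p2):
    """AND: walk two sorted postings lists in step and keep common doc IDs."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def union(p1, p2):
    """OR: doc IDs that appear in either postings list."""
    return sorted(set(p1) | set(p2))

def complement(p, all_doc_ids):
    """NOT: doc IDs in the collection that are absent from the postings list."""
    return sorted(set(all_doc_ids) - set(p))

# Hypothetical collection of four documents and invented postings lists.
all_doc_ids = [1, 2, 3, 4]
postings = {
    "brutus": [1, 2, 4],
    "caesar": [1, 2, 3],
    "calpurnia": [2],
}

both = intersect(postings["brutus"], postings["caesar"])
either = union(postings["brutus"], postings["calpurnia"])
without = intersect(both, complement(postings["calpurnia"], all_doc_ids))

print(both)     # [1, 2]     brutus AND caesar
print(either)   # [1, 2, 4]  brutus OR calpurnia
print(without)  # [1]        brutus AND caesar AND NOT calpurnia
```

Each result is an unordered set of matching documents: a document either satisfies the query or it does not, and no document is ranked above another.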

In the ranked retrieval model, queries are free text, and relevance is estimated by techniques such as the vector space model and learned
weights, which are used to order the results.
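As a rough illustration of the vector space idea, the sketch below weights terms by tf-idf and scores each document by its cosine similarity to the query. This is one common weighting scheme picked for the example, not necessarily the one used later in the course; the function names tokenize and rank and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Assumed tokenizer: lowercase alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by cosine similarity, best first."""
    n = len(docs)
    doc_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for counts in doc_counts.values():
        df.update(counts.keys())

    def weights(counts):
        # tf-idf weight; terms that never occur in the collection are dropped.
        return {t: tf * math.log(n / df[t]) for t, tf in counts.items() if df.get(t)}

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = weights(Counter(tokenize(query)))
    scores = {}
    for doc_id, counts in doc_counts.items():
        d_vec = weights(counts)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        denom = norm(q_vec) * norm(d_vec)
        scores[doc_id] = dot / denom if denom else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy documents.
docs = {
    1: "Caesar praised Brutus",
    2: "Brutus killed Caesar",
    3: "Hamlet mourned his father",
}

ranking = rank("Brutus killed", docs)
print([doc_id for doc_id, _ in ranking])  # [2, 1, 3]
```

Unlike a Boolean AND query for the same two words, which would return only document 2, the ranked model returns every document in order of estimated relevance, so document 1 still appears with a lower score.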

In this first unit we are introduced to a number of concepts that may be quite new to you, including tokenization, stemming, biword
indexes, and positional indexes. Make sure that you spend some time understanding these concepts. As a reminder, each unit contains
a self-quiz. The self-quiz is not graded and carries no points; however, it is designed as a learning tool and is important to use in
conjunction with the reading assignment. You should begin each unit by completing the reading assignment, reviewing the unit
overview, and then taking the self-quiz. Every time you answer a question incorrectly, you should immediately go back and review the
relevant sections in the reading assignment or overview to ensure your understanding of the subject matter. This iterative process will
aid in your learning and help you to prepare for the mid-term and final exams.

References

Bush, V. (1945). As We May Think. Atlantic Monthly, 176(1), 101-108. Retrieved June 10, 2011.

Lesk, M. (1995). The Seven Ages of Information Retrieval. UDT Occasional Paper #5. Retrieved June 10, 2011,
from http://archive.ifla.org/VI/5/op/udtop5/udtop5.htm


Reading Assignment

Manning, C.D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval (Online ed.). Cambridge, UK: Cambridge
University Press. Available at http://nlp.stanford.edu/IR-book/information-retrieval-book.html

Chapter 1: Boolean retrieval
Chapter 2: The term vocabulary and postings lists

Key Terms for Unit 1

Boolean Retrieval
Document
Corpus
Inverted Index
Posting
Intersection
Ranked Retrieval
Term Frequency
Tokenization
Document unit
Stop words
Normalization
Stemming
Lemmatization
Skip pointer
Biword index
Positional index


Discussion Assignment

In unit one, we are introduced to the concept of the inverted index as a fundamental technology in information retrieval systems. The
inverted index essentially is an index of words known as terms extracted from the document corpus that can be searched to find
documents with the content that the user is looking for. Our text also introduces two extensions to the concept of the inverted index,
the biword index and the positional index.

For your discussion assignment:

Select either the biword index or the positional index
Provide a description of the index that you selected. As part of your description, make sure that you describe why and how it is
different from the inverted index.
Describe both where and when the index would be used
Describe the advantage the index has over the inverted index

You must post your initial response before being able to review other students’ responses. Once you have made your first response,
you will be able to reply to other students’ posts. You are expected to make a minimum of 3 responses to your fellow students’ posts.

Peer-Assessment Criteria
*In addition to the criteria already posted in the Discussion Forum

Did the posting describe either the biword index or the positional index?
Did the description explain how the index is different from the inverted index?
Did the posting describe under what circumstances the index would be used?
Did the posting describe the advantage that the index has over the inverted index?


Learning Journal

Your learning journal entry must be a reflective statement that considers the following questions:

Describe what you did. This does not mean that you copy and paste from what you have posted or the assignments you have
prepared. You need to describe what you did and how you did it.
Describe your reactions to what you did
Describe any feedback you received or any specific interactions you had. Discuss how they were helpful
Describe your feelings and attitudes
Describe what you learned

Another set of questions to consider in your learning journal statement includes:

What surprised me or caused me to wonder?
What happened that felt particularly challenging? Why was it challenging to me?
What skills and knowledge do I recognize that I am gaining?
What am I realizing about myself as a learner?
In what ways am I able to apply the ideas and concepts gained to my own experience?

Your Learning Journal must be a minimum of 500 words.


Self-Quiz

The Self-Quiz gives you an opportunity to self-assess your knowledge of what you have learned so far.

The results of the Self-Quiz do not count towards your final grade, but the quiz is an important part of the University’s learning process
and it is expected that you will take it to ensure understanding of the materials presented. Reviewing and analyzing your results will help
you perform better on future Graded Quizzes and the Final Exam.

Please access the Self-Quiz on the main course homepage; it will be listed inside the Unit.
