Learning Guide Unit 1 _ Home
Overview
Unit 1: Introduction to IR, Boolean Retrieval, and Terms and Postings (Chapters 1 & 2)
Topics:
Learning Objectives:
Tasks:
https://my.uopeople.edu/mod/book/tool/print/index.php?id=443814 4/10
11/18/24, 9:27 PM Learning Guide Unit 1 | Home
Introduction
This course explores the key theories of information retrieval and puts these theories into practice: you will build a
complete information retrieval (IR) system in a series of four development projects. Information retrieval has its beginnings in an
article published by Vannevar Bush in 1945 (Bush, 1945), in which Bush describes a system capable of storing and retrieving large
amounts of information. Lesk (1995) describes information retrieval as a discipline that ‘grew up’ as a function of library science.
Archiving and searching library information was an important early application of information retrieval techniques. The introduction
of the internet and the World Wide Web in the 1990s significantly broadened the role and application of these techniques. Google
has become a technology leader by applying IR techniques to index and search the World Wide Web. One objective
of this course is to develop an understanding of the underlying theory of IR and the skills necessary to apply IR techniques.
The basic objective in information retrieval is to find specific information within a corpus through the use of a query. A corpus
is a collection of information, usually in the form of documents, although other forms of media are becoming increasingly commonplace.
Imagine that you have a collection of Shakespeare’s plays and want to find just those that include ‘Caesar’ as a subject. One way to
accomplish this is to scan each work for the word ‘Caesar’.
In our Unit 1 reading assignment, we will begin to explore information retrieval (IR). The first concept that we are introduced to is the
Boolean Retrieval model. The term Boolean refers to simple two-state logic: on/off, true/false, and of course present/not present.
The Boolean retrieval method is based upon the presence or absence of the search term in a document. It is a very basic
model that does not rank results but simply returns every document that meets the terms of the search.
One of the key topics introduced in Unit 1 is the concept of an inverted index. The inverted index, which is sometimes also called the
postings file, is a data structure that maps the words extracted from a document or set of documents to the documents that contain
them, and it typically also records how frequently each word appears.
The purpose of this structure is to allow specific terms to be looked up quickly to determine which documents contain the words
(search terms). Although the inverted index structure can support the Boolean Retrieval Model, it also enables other models such as the
Ranked Retrieval Model.
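As a rough illustration of the inverted index described above, the following Python sketch builds one for a tiny collection. The
three documents, the tokenize helper, and the function names are assumptions made up for this example; they are not taken from the
course text, and a real index would need more careful tokenization and storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: number of times the term occurs there}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


# Hypothetical three-document corpus, invented for illustration.
docs = {
    1: "Brutus killed Caesar",
    2: "Caesar praised Brutus and Caesar praised Rome",
    3: "Calpurnia warned Caesar",
}

index = build_inverted_index(docs)
print(sorted(index["caesar"].items()))  # [(1, 1), (2, 2), (3, 1)]
print(sorted(index["brutus"]))          # [1, 2]
```

Looking up a term is a single dictionary access, so no document has to be scanned at query time.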
The Ranked Retrieval Model differs from the Boolean model in that users make use of free text queries rather than the precise language
of the Boolean model. In the Boolean model we issue a query in a strict Boolean language format built from three operators:
AND, in which both terms must be present in order to return a document; OR, in which either term can be
present to return the document; and NOT, in which the term CANNOT be present in order to return the document.
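The three operators above can be sketched over postings lists of sorted document IDs. This is a minimal sketch under stated
assumptions: the postings data and the function names (intersect, union, complement) are hypothetical and chosen only for
illustration. AND is written as a step-by-step merge of two sorted lists, while OR and NOT use Python sets for brevity.

```python
def intersect(p1, p2):
    """AND: walk two sorted postings lists in step, keeping shared doc IDs."""
    result, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def union(p1, p2):
    """OR: doc IDs that appear in either postings list."""
    return sorted(set(p1) | set(p2))


def complement(p, all_doc_ids):
    """NOT: doc IDs in the collection that are absent from the postings list."""
    excluded = set(p)
    return [d for d in all_doc_ids if d not in excluded]


# Hypothetical postings lists (sorted doc IDs), invented for illustration.
postings = {
    "brutus": [1, 2, 4],
    "caesar": [1, 2, 3, 5],
    "calpurnia": [3],
}
all_doc_ids = [1, 2, 3, 4, 5]

brutus_and_caesar = intersect(postings["brutus"], postings["caesar"])
brutus_or_calpurnia = union(postings["brutus"], postings["calpurnia"])
caesar_not_brutus = intersect(
    postings["caesar"], complement(postings["brutus"], all_doc_ids)
)

print(brutus_and_caesar)    # [1, 2]
print(brutus_or_calpurnia)  # [1, 2, 3, 4]
print(caesar_not_brutus)    # [3, 5]
```

Note that every result is an unordered yes/no match set: a document either satisfies the query or it does not.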
In the ranked retrieval model, queries are free text, and the relevance of each document to the query is estimated by techniques
such as the vector space model or learned weights, so that results can be returned in order of estimated relevance.
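To make the contrast with Boolean retrieval concrete, here is one possible sketch of ranked retrieval in the spirit of the vector
space model. It assumes a simple tf-idf weighting with cosine similarity, which is only one of several possible scoring schemes;
the documents, the query, and the rank function are hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by tf-idf cosine similarity."""
    doc_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for counts in doc_counts.values() for term in counts)
    idf = {term: math.log(n_docs / df_t) for term, df_t in df.items()}

    def vector(counts):
        # Terms unknown to the collection are ignored.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        if norm_u == 0 or norm_v == 0:
            return 0.0
        return dot / (norm_u * norm_v)

    q_vec = vector(Counter(tokenize(query)))
    scores = {d: cosine(q_vec, vector(c)) for d, c in doc_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical corpus and free text query, invented for illustration.
docs = {
    1: "Brutus killed Caesar",
    2: "Caesar praised Brutus and Caesar praised Rome",
    3: "Calpurnia warned Caesar",
}

ranking = rank("brutus rome", docs)
print([doc_id for doc_id, _ in ranking])  # [2, 1, 3]
```

Document 2 ranks first because it contains both query terms, document 1 matches only one of them, and document 3 matches neither.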
In this first unit we are introduced to a number of concepts that may be quite new for you, including tokenization, stemming, biword
indexes, and positional indexes. Make sure that you spend some time understanding these concepts. As a reminder, each unit contains
a self-quiz. This self-quiz does not receive a grade and has no points; however, it is designed as a learning tool and is important to use in
conjunction with the reading assignment. You should begin each unit by completing the reading assignment, reviewing the unit
overview, and then taking the self-quiz. Every time you answer a question incorrectly, you should immediately go back and review the
relevant sections in the reading assignment or overview to ensure your understanding of the subject matter. This iterative process will
aid in your learning and help you to prepare for the mid-term and final exams.
References
Bush, V. (1945). As We May Think. The Atlantic Monthly, 176(1), 101-108. Retrieved June 10, 2011.
Lesk, M. (1995). The Seven Ages of Information Retrieval. UDT Occasional Paper #5. Retrieved June 10, 2011,
from http://archive.ifla.org/VI/5/op/udtop5/udtop5.htm
Reading Assignment
Manning, C. D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval (Online ed.). Cambridge, England: Cambridge
University Press. Available at http://nlp.stanford.edu/IR-book/information-retrieval-book.html
Boolean Retrieval
Document
Corpus
Inverted Index
Posting
Intersection
Ranked Retrieval
Term Frequency
Tokenization
Document unit
Stop words
Normalization
Stemming
Lemmatization
Skip pointer
Biword index
Positional index
Discussion Assignment
In Unit 1, we are introduced to the concept of the inverted index as a fundamental technology in information retrieval systems. The
inverted index is essentially an index of words, known as terms, extracted from the document corpus; it can be searched to find
documents with the content that the user is looking for. Our text also introduces two extensions to the concept of the inverted index:
the biword index and the positional index.
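For orientation only, the two extensions named above can be sketched as data structures. This is a minimal sketch, assuming a
positional index that stores word positions per document and a biword index keyed on pairs of consecutive words; the sample
documents and all function names are hypothetical, and the sketch is not a substitute for the discussion in the text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_positional_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index


def build_biword_index(docs):
    """Map each pair of consecutive terms to the doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            index[(first, second)].add(doc_id)
    return index


def phrase_match(index, first, second):
    """Docs in which `second` occurs immediately after `first`."""
    matches = []
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    for doc_id in sorted(first_postings.keys() & second_postings.keys()):
        following = set(second_postings[doc_id])
        if any(p + 1 in following for p in first_postings[doc_id]):
            matches.append(doc_id)
    return matches


# Hypothetical two-document corpus, invented for illustration.
docs = {
    1: "information retrieval systems index documents",
    2: "retrieval of information from documents",
}

positional = build_positional_index(docs)
biword = build_biword_index(docs)

print(phrase_match(positional, "information", "retrieval"))  # [1]
print(sorted(biword[("information", "retrieval")]))          # [1]
```

Both documents contain the two words, but only document 1 contains them as an adjacent phrase, which a plain inverted index
cannot distinguish.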
You must post your initial response before being able to review other students’ responses. Once you have made your first response,
you will be able to reply to other students’ posts. You are expected to make a minimum of 3 responses to your fellow students’ posts.
Peer-Assessment Criteria
*In addition to the criteria already posted in the Discussion Forum
Did the posting describe either the biword index or positional index?
Did the description explain how the index is different from the inverted index?
Did the posting describe under what circumstances the index would be used?
Did the posting describe the advantage that the index has over the inverted index?
Learning Journal
Your learning journal entry must be a reflective statement that addresses the following points:
Describe what you did. This does not mean that you copy and paste from what you have posted or the assignments you have
prepared. You need to describe what you did and how you did it.
Describe your reactions to what you did.
Describe any feedback you received or any specific interactions you had. Discuss how they were helpful.
Describe your feelings and attitudes.
Describe what you learned.
Self-Quiz
The Self-Quiz gives you an opportunity to self-assess your knowledge of what you have learned so far.
The results of the Self-Quiz do not count towards your final grade, but the quiz is an important part of the University’s learning process
and it is expected that you will take it to ensure understanding of the materials presented. Reviewing and analyzing your results will help
you perform better on future Graded Quizzes and the Final Exam.
Please access the Self-Quiz on the main course homepage; it will be listed inside the Unit.