ISR Chap...4
Indexing Basics
1
Text Collections and IR
• Large collections of documents from various sources: news
articles, research papers, books, digital libraries, Web pages,
etc.
Sample Statistics of Text Collections
• Dialog:
–claims to have more than 15 terabytes of data in >600 Databases, >
800 million unique records
• LEXIS/NEXIS:
–claims 7 terabytes, 1.7 billion documents, 1.5 million subscribers,
11,400 databases; >200,000 searches per day
• Web Search Engines:
–Google claims to index over 1.5 billion pages.
–How many search engines are available these days?
2
Designing an IR System
Our focus during IR system design is on:
• Improving the effectiveness of the system
–Effectiveness of the system is measured in terms of precision, recall, …
–Stemming, stop-words, weighting schemes, matching algorithms
• Improving the efficiency of the system. The concern here is
–storage space usage, access time, …
–Compression, data/file structures, space-time tradeoffs
• The two subsystems of an IR system:
–Searching and
–Indexing
3
Indexing Subsystem
[Figure: indexing pipeline]
documents → Assign document identifier → document + ID
document text → Tokenize → tokens
tokens → Stop list → non-stoplist tokens
non-stoplist tokens → Stemming & Normalize → stemmed terms
stemmed terms → Term weighting → terms with weights
terms with weights → Index
4
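The pipeline above can be sketched in Python. This is a minimal sketch: the stop list is a hypothetical sample, and a toy suffix-stripping function stands in for a real stemmer such as Porter's; term weights here are just raw term frequencies.

```python
# A minimal sketch of the indexing pipeline (assumed stop list,
# toy suffix-stripping "stemmer"; real systems use e.g. Porter stemming).
import re
from collections import Counter

STOP_LIST = {"the", "a", "an", "of", "to", "and", "is", "in"}  # assumed stop words

def tokenize(text):
    # Split the document text into lowercase word tokens.
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    # Toy normalization: strip a few common English suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: len(token) - len(suffix)]
    return token

def index_document(doc_id, text):
    # tokens -> non-stoplist tokens -> stemmed terms -> term weights (raw TF here)
    tokens = tokenize(text)
    terms = [stem(t) for t in tokens if t not in STOP_LIST]
    return doc_id, Counter(terms)

doc_id, weights = index_document(1, "Indexing speeds up searching of indexed documents")
print(doc_id, dict(weights))
```

Each stage mirrors one box of the diagram; swapping in a real stop list and stemmer changes only the two helper functions.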
Searching Subsystem
[Figure: searching pipeline]
query → parse query → query tokens
query tokens → Stop list → non-stoplist tokens
non-stoplist tokens → Stemming & Normalize → stemmed terms
stemmed terms → Query term weighting → query terms
query terms + index terms (from the Index) → Similarity Measure → relevant document set
relevant document set → Ranking → ranked document set
5
Basic assertion
Indexing and searching:
– you cannot search what was not first indexed in some manner or other
– indexing of documents or objects is done in order to make them searchable
• there are many ways to do indexing
– to index, one needs an indexing language
• there are many indexing languages
• even taking every word in a document is an indexing language
Knowing searching is knowing indexing
6
Implementation Issues
•Storage of text:
–The need for text compression: to reduce storage space
•Indexing text
–Organizing indexes
–Storage of indexes
•Accessing text
7
Text Compression
• Text compression is about finding ways to represent the text in fewer bits or bytes
• Advantages:
–saves storage space
–speeds up document transmission
–takes less time to search the compressed text
• Disadvantage:
–the time required to encode and decode the text
• Common compression methods
–Static methods: require statistical information about the frequency of occurrence of symbols in the document
E.g. Huffman coding
•Estimate probabilities of symbols, code one symbol at a time, assigning shorter codes to high-probability symbols
–Adaptive methods: construct the dictionary in the course of compression
E.g. Lempel-Ziv compression (LZ):
•Replace words or symbols with a pointer to dictionary entries
8
Huffman coding
•Developed in the 1950s by David Huffman; widely used for text compression and message transmission
•The problem: given a set of n symbols and their weights (or frequencies), construct a tree structure (a binary tree for binary codes) with the objective of reducing memory space & decoding time per symbol
•Huffman coding is constructed based on the frequency of occurrence of letters in text documents
[Figure: Huffman code tree with leaves D1, D2, D3, D4]
Code of:
D1 = 000
D2 = 001
D3 = 01
D4 = 1
9
How to construct Huffman coding
Step 1: Create a forest of trees, one for each symbol t1, t2, … tn
Step 2: Sort the forest of trees in order of falling probabilities of symbol occurrence
Step 3: WHILE more than one tree exists DO
–Merge the two trees t1 and t2 with the least probabilities p1 and p2
–Label their root with the sum p1 + p2
–Associate binary codes: 1 with the right branch and 0 with the left branch
Step 4: Create a unique code word for each symbol by traversing the tree from the root to the leaf
–Concatenate all 0s and 1s encountered during the traversal
• The resulting tree has a probability of 1 in its root and the symbols in its leaf nodes.
10
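The merge procedure above can be sketched with a min-heap; the probabilities used are those of the 7-symbol example on the next slide. The integer counter in each heap entry is only a tie-breaker (an implementation detail, not part of the algorithm in the slides).

```python
# A sketch of Huffman code construction using a min-heap.
import heapq

def huffman_codes(freqs):
    # Steps 1-2: one single-node "tree" per symbol, ordered by probability.
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    # Step 3: repeatedly merge the two least-probable trees.
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Prepend 0 for the left branch, 1 for the right branch.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]  # Step 4: one code word per symbol

freqs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
codes = huffman_codes(freqs)
# Code lengths reflect probabilities: frequent symbols get shorter codes.
for sym in sorted(codes, key=lambda s: (len(codes[s]), s)):
    print(sym, codes[sym])
```

The exact codes depend on tie-breaking among equal probabilities, but any valid run yields a prefix-free code with the same expected length.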
Example
• Consider the 7-symbol alphabet given in the following table to construct the Huffman coding.

Symbol  Probability
a       0.05
b       0.05
c       0.1
d       0.2
e       0.3
f       0.2
g       0.1

• The Huffman encoding algorithm picks, at each step, the two symbols (or subtrees) with the smallest frequencies to combine.
11
Huffman code tree
[Figure: Huffman code tree for the 7-symbol example; the root has probability 1 and splits into a subtree of probability 0.4 (containing d and f) and a subtree of probability 0.6 (containing e, and a subtree of probability 0.3 over g, c, a, b)]
13
Ziv-Lempel compression
•The problem with Huffman coding is that it requires knowledge about the data before encoding takes place.
–Huffman coding requires the frequencies of symbol occurrence before code words are assigned to symbols
•Ziv-Lempel compression
–Does not rely on previous knowledge about the data
–Rather, it builds this knowledge in the course of data transmission/data storage
–The Ziv-Lempel algorithm (called LZ) uses a table of code words created during data transmission;
•each time, it replaces strings of characters with a reference to a previous occurrence of the string.
14
Lempel-Ziv Compression Algorithm
• The multi-symbol patterns are of the form: C0C1 . . . Cn-1Cn. The prefix of a pattern consists of all the pattern symbols except the last: C0C1 . . . Cn-1
15
Example: LZ Compression
Encode (i.e., compress) the following strings using the LZ algorithm:
1. Mississippi
2. ABBCBCABABCAABCAAB
3. SATATASACITASA.
18
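One concrete variant of the dictionary-building idea above is LZ78, sketched below. The `(prefix index, next character)` output format is one common choice among the LZ family, not the only one; the second exercise string is used as the example.

```python
# A minimal LZ78-style sketch: the dictionary of phrases is built during
# encoding, and each phrase is emitted as a pair
# (index of its longest previously seen prefix, next character).
def lz78_encode(text):
    dictionary = {"": 0}          # phrase -> index; index 0 is the empty prefix
    output, phrase = [], ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending the current phrase
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                    # flush a trailing phrase, if any
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

def lz78_decode(pairs):
    # Rebuild the same dictionary while decoding.
    phrases = [""]
    out = []
    for idx, ch in pairs:
        phrase = phrases[idx] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

encoded = lz78_encode("ABBCBCABABCAABCAAB")
assert lz78_decode(encoded) == "ABBCBCABABCAABCAAB"
print(encoded)
```

Note how the decoder needs no prior statistics: it reconstructs the dictionary as it reads the pairs, which is exactly the adaptive property contrasted with Huffman coding above.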
Indexing
Some concepts about indexing
• Until recently, indexing was accomplished by creating bibliographic citations in a structured file that references the original text.
• Bibliographic citation: a formal description of a source of information (such as a book, article, or document) that typically includes details like the author's name, title, publication date, and publisher. It serves as a reference to the original work.
• Automatic indexing is the capability of a system to automatically determine the index terms to be assigned to an item.
19
Cont…
•An index file of a document collection is a file consisting of a list of index terms and a link to one or more documents that contain each index term
–A good index file maps each keyword Ki to the set of documents Di that contain the keyword
20
Sequential File
•The sequential file is the most primitive file structure.
–It has neither a vocabulary nor linking pointers.
• The records are generally arranged serially, one after another, but in lexicographic order on the value of some key field.
–a particular attribute is chosen as the primary key, whose value determines the order of the records.
–when the first key fails to discriminate among records, a second key is chosen to give an order.
21
Sequential File
• To access records, search serially:
– starting at the first record, read and investigate all the succeeding records until the required record is found or the end of the file is reached.
22
Sequential File
• Its disadvantages:
– difficult to update. Index must be rebuilt if a new
term is added.
– Inserting a new record may require moving a large
proportion of the file;
– random access is extremely slow.
25
Inverted file
•Having information about the vocabulary (list of terms):
–When a system knows the vocabulary, it can quickly match user queries to the appropriate documents that contain relevant terms
–speeds searching for relevant documents
•Having information about the location of each term within the document helps for:
–user interface design: highlighting the location of search terms
•Having information about frequency is used for:
–calculating term weights (like TF, TF*IDF, …)
–optimizing query processing
26
Inverted File
Documents are organized by the terms/words they contain. This is called an index file. Text operations are performed before building the index.

Word   Tot Freq  Document  Term Freq  Location
Act    3         2         1          66
                 19        1          213
                 29        1          45
bus    4         3         1          94
                 19        2          7, 212
                 22        1          56
Pen    1         5         1          43
total  3         11        2          3, 70
                 34        1          40

Location: the character or word position within the document (e.g., the 5th word, the 12th character).
27
Construction of Inverted file
An inverted index consists of two files: a vocabulary file and a postings file
•A vocabulary file (word list):
–stores all of the distinct terms (keywords) that appear in any of the documents (in lexicographical order), and
–for each word, a pointer to the postings file
•The record kept for each term j in the word list contains the following:
–term j
–frequency of the term in each document
–number of documents in which term j occurs (nj)
–total frequency of term j
–pointer to the inverted (postings) list for term j 28
Postings File (Inverted List)
• For each distinct term in the vocabulary, the postings file stores a list of pointers to the documents that contain that term.
• Each element in an inverted list is called a posting, i.e., the occurrence of a term in a document
• Each list consists of one or many individual postings
Advantage of dividing the inverted file:
• Keeping a pointer in the vocabulary to the list in the postings file allows:
–the vocabulary to be kept in memory at search time, even for a large text collection, and
–the postings file to be kept on disk for access to documents
29
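The vocabulary/postings construction described above can be sketched as follows. The two tiny document texts are hypothetical, chosen to reuse terms from the slides' tables; each vocabulary record keeps the document count n_j and total frequency, while the postings map documents to term frequencies.

```python
# A sketch of building the vocabulary and postings files.
# Document texts are hypothetical; terms are lowercased words only.
import re
from collections import defaultdict

docs = {
    1: "the bus arrived and the bus left",
    2: "an act of the parliament",
}

def build_inverted_index(docs):
    # postings[term] maps doc_id -> term frequency in that document
    postings = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            postings[term][doc_id] += 1
    # vocabulary record per term: (document count n_j, total frequency)
    vocabulary = {
        term: (len(plist), sum(plist.values()))
        for term, plist in sorted(postings.items())
    }
    return vocabulary, postings

vocabulary, postings = build_inverted_index(docs)
print(vocabulary["bus"])        # (1, 2): in 1 document, total frequency 2
print(dict(postings["the"]))    # {1: 2, 2: 1}
```

In a real system the `vocabulary` dict is the in-memory word list and `postings` lives on disk; here both are in memory for clarity.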
Organization of Index File
The index file is organized as a vocabulary (word list) whose entries point to inverted lists in the postings file, which in turn point to the documents.

Vocabulary (word list):
Term   No of Doc  Tot freq  Pointer to posting
Act    3          3         → inverted list
Bus    3          4         → inverted list
pen    1          1         → inverted list
total  2          3         → inverted list
30
Example:
• Given a collection of documents, they are parsed to extract words, and these are saved with the document ID.

Doc 1: I did enact Julius Caesar: I was killed i' the Capitol; Brutus killed me.

Doc 2: So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious. 31
Sorting the Vocabulary
• After all documents have been parsed, the inverted file is sorted by terms
– the inverted index may record term locations within each document during parsing

Term-Doc pairs in parse order:
I 1, did 1, enact 1, julius 1, caesar 1, I 1, was 1, killed 1, I 1, the 1, capitol 1, brutus 1, killed 1, me 1, so 2, let 2, it 2, be 2, with 2, caesar 2, the 2, noble 2, brutus 2, hath 2, told 2, you 2, caesar 2, was 2, ambitious 2

Sorted by term (then by document):
ambitious 2, be 2, brutus 1, brutus 2, capitol 1, caesar 1, caesar 2, caesar 2, did 1, enact 1, hath 2, I 1, I 1, I 1, it 2, julius 1, killed 1, killed 1, let 2, me 1, noble 2, so 2, the 1, the 2, told 2, was 1, was 2, with 2, you 2
32
Remove stop-words, apply stemming
& compute term frequency
•Multiple term entries in a single document are merged and frequency information added
•Counting the number of occurrences of terms in the collection helps to compute TF

Term      Doc #  TF
ambition  2      1
brutus    1      1
brutus    2      1
capitol   1      1
caesar    1      1
caesar    2      2
enact     1      1
julius    1      1
kill      1      2
noble     2      1
33
Vocabulary and postings file
The file is commonly split into a dictionary and a postings file; each vocabulary entry keeps a pointer into the postings.

Vocabulary (term → pointer)    Postings (Doc #, TF)
ambition →                     (2, 1)
brutus →                       (1, 1), (2, 1)
capitol →                      (1, 1)
caesar →                       (1, 1), (2, 2)
enact →                        (1, 1)
julius →                       (1, 1)
kill →                         (1, 2)
noble →                        (2, 1)
Inverted index storage
•Separation of the inverted file into a vocabulary and a postings file is a good idea.
–Vocabulary: for searching purposes we need only the word list. This allows the vocabulary to be kept in memory at search time, since the space required for the vocabulary is small.
• Example: from 1,000,000,000 documents, there may be only 1,000,000 distinct words.
–The postings file requires much more space.
• For each word appearing in the text, we keep statistical information related to its occurrence in documents.
• Each posting's pointer to a document requires extra space.
35
Suffix Tree
36
Suffix trie
• What is a suffix? A suffix is a substring that occurs at the end of a given string.
– Each position in the text is considered a text suffix
– If txt = t1t2...ti...tn is a string, then Ti = ti ti+1...tn is the suffix of txt that starts at position i
• Example: txt = mississippi txt = GOOGOL
T1 = mississippi; T1 = GOOGOL
T2 = ississippi; T2 = OOGOL
T3 = ssissippi; T3 = OGOL
T4 = sissippi; T4 = GOL
T5 = issippi; T5 = OL
T6 = ssippi; T6 = L
T7 = sippi;
T8 = ippi;
T9 = ppi;
T10 = pi;
T11 = i; 37
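The suffix lists above can be generated directly; note that the slides' 1-based T_i corresponds to the zero-based slice `txt[i-1:]` in Python.

```python
# Generate all suffixes T_1 ... T_n of a string, longest first.
def suffixes(txt):
    return [txt[i:] for i in range(len(txt))]

print(suffixes("GOOGOL"))
# ['GOOGOL', 'OOGOL', 'OGOL', 'GOL', 'OL', 'L']
```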
Suffix trie
A suffix trie is an ordinary trie whose input strings are all the suffixes of the text.
–Principle: the idea behind the suffix trie is to assign to each symbol in a text an index corresponding to its position in the text (i.e., the first symbol has index 1, the last symbol has index n, the number of symbols in the text).
–To build the suffix trie we use these indices instead of the actual objects.
The structure has several advantages:
–It requires less storage space.
Since the suffix trie stores suffixes, common prefixes among these suffixes are shared in the structure. For example, in "banana" the suffixes "anana", "ana", and "a" share the prefix "a".
–We do not have to store the same object twice (no duplicates).
For instance, in the word "banana", the string "ana" occurs more than once but is only stored once in the trie.
38
Suffix Trie
Construct a suffix trie for the following string: GOOGOL
We begin by giving a position to every suffix in the text, from left to right, according to each character's position in the string.
TEXT:     G O O G O L $
POSITION: 1 2 3 4 5 6 7
Build a suffix trie for all n suffixes of the text.
Note: the resulting tree has n leaves and height n.
This structure is particularly useful for any application requiring prefix-based ("starts with") pattern matching.
39
Suffix tree
• A suffix tree is a member of the trie family. It is a trie of all the proper suffixes of S
–The suffix tree is created by compacting unary nodes (nodes with only one child) of the suffix trie.
• We store pointers rather than words in the leaves.
–It is also possible to replace the string on every edge by a pair (a, b), where a & b are the beginning and end indices of the string, i.e.:
(3,7) for OGOL$
(1,2) for GO
(7,7) for $
40
Example: Suffix tree
Let s = abab; a suffix tree of s is a compressed trie of all suffixes of s$ = abab$:
{ $, b$, ab$, bab$, abab$ }
We label each leaf with the starting position of the corresponding suffix.
[Figure: suffix tree of abab$ with leaves labeled 1 (abab$), 2 (bab$), 3 (ab$), 4 (b$), 5 ($)]
41
Generalized suffix tree
Given a set of strings S, a generalized suffix tree of S is a compressed trie of all suffixes of every s ∈ S
To make the suffixes prefix-free we add a special character, $, at the end of each s.
To associate each suffix with a unique string in S, add a different special symbol to each s
– this avoids overlap and helps in distinguishing the end of one suffix from another
Build a suffix tree for the string s1$s2#, where '$' and '#' are special terminators for s1 and s2.
Ex.: Let s1 = abab & s2 = aab; the suffix sets are
{ $, b$, ab$, bab$, abab$ } for s1 and { #, b#, ab#, aab# } for s2
[Figure: generalized suffix tree for abab$ and aab#, with leaves labeled by string and starting position]
42
Search in suffix tree
• Searching for all instances of a substring S in a suffix tree is easy, since any substring of S is the prefix of some suffix.
• Pseudo-code for searching in a suffix tree:
–Start at the root
–Go down the tree, taking the corresponding branch at each step
–If S corresponds to a node, then return all leaves in its sub-tree
–If a NIL pointer is encountered before reaching the end of S, then S is not in the tree
Example (for the text GOOGOL$):
• If S = "GO" we take the GO path and return the suffixes: GOOGOL$, GOL$.
• If S = "OR" we take the O path and then hit a NIL pointer, so "OR" is not in the tree.
43
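The search procedure above can be sketched on a naive, uncompressed suffix trie built in O(n²) time; the text GOOGOL$ is the slides' example. The `"$pos"` marker key is a hypothetical convention of this sketch for storing each suffix's 1-based start position at its leaf.

```python
# A sketch of suffix-trie search: dicts as trie nodes, built naively.
def build_suffix_trie(text):
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})   # create child if missing
        node["$pos"] = i + 1                 # 1-based start of this suffix
    return root

def find(trie, pattern):
    # Go down the trie one character at a time; a missing child is the
    # "NIL pointer" case, so the pattern does not occur in the text.
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    # Pattern matched a path: collect all suffix positions below this node.
    positions, stack = [], [node]
    while stack:
        n = stack.pop()
        for key, child in n.items():
            if key == "$pos":
                positions.append(child)
            else:
                stack.append(child)
    return sorted(positions)

trie = build_suffix_trie("GOOGOL$")
print(find(trie, "GO"))   # [1, 4]: suffixes GOOGOL$ and GOL$
print(find(trie, "OR"))   # []: hit a NIL pointer
```

A real suffix tree would compact unary chains and store (a, b) edge index pairs as described above; the search logic is the same.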