Outline and Reading: Tries 4/1/2003 9:02 AM
Standard tries (11.3.1)
Compressed tries (11.3.2)
Suffix tries (11.3.3)
Huffman encoding tries (11.4.1)
Preprocessing Strings
Preprocessing the pattern speeds up pattern matching queries
After preprocessing the pattern, the KMP algorithm performs pattern matching in time proportional to the text size
If the text is large, immutable, and searched often (e.g., the works of Shakespeare), we may want to preprocess the text instead of the pattern
A trie is a compact data structure for representing a set of strings, such as all the words in a text
A trie supports pattern matching queries in time proportional to the pattern size
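A minimal sketch of a standard trie follows, to make the "time proportional to the pattern size" claim concrete: a lookup visits one node per pattern character, however many strings are stored. The class and method names (`TrieNode`, `Trie`, `contains`, `has_prefix`) and the word list are illustrative assumptions, not taken from the slides.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child node
        self.is_word = False  # True if a stored string ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        """Add word, creating one node per character that is not yet present."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s):
        """Follow s from the root; return the node reached, or None."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None


# Example word set (an assumption chosen for illustration).
trie = Trie(["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"])
```

Both `contains` and `has_prefix` perform at most one dictionary lookup per character of the query, so their cost depends on the query length and not on the number of stored strings.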
[Figure: word matching with a trie. The sample text, indexed 0 to 88, is
"see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!"
Each leaf of the trie stores the starting indices of its word in the text: bid 47, 58; bull 30; buy 36; hear 69; see 0, 24; sell 12]
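Word matching with a trie can be sketched as below: every word of the text is inserted into a trie, and the node where a word ends records the indices at which that word starts. The sample text is reconstructed from the figure's letters and index ruler; the helper names (`build_word_index`, `occurrences`) and the `"$"` key used to hold the position list are assumptions made for this sketch.

```python
import re

TEXT = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")


def build_word_index(text):
    """Build a trie (nested dicts) of the words in text.

    The node where a word ends stores, under the key "$",
    the list of indices at which that word starts.
    """
    root = {}
    for m in re.finditer(r"[a-z]+", text):
        node = root
        for ch in m.group():
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(m.start())
    return root


def occurrences(root, word):
    """Return the starting indices of word, or [] if it does not occur."""
    node = root
    for ch in word:
        if ch not in node:
            return []
        node = node[ch]
    return node.get("$", [])


index = build_word_index(TEXT)
```

A query such as `occurrences(index, "see")` walks three nodes and returns the stored list, so the text itself is never rescanned.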
www.jntuworld.com
Compressed Trie
A compressed trie has internal nodes of degree at least two
It is obtained from a standard trie by compressing chains of redundant nodes
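The compression step can be sketched as follows: wherever a node has exactly one child, the two edge labels are concatenated and the node disappears. This sketch represents a trie as nested dictionaries and assumes that no stored word is a prefix of another (true for the example set), so no end-of-word markers are needed; `build_trie` and `compress` are hypothetical helper names.

```python
def build_trie(words):
    """Standard trie as nested dicts: one single-character edge per node."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a copy of the trie in which every chain of
    single-child nodes is merged into one multi-character edge."""
    out = {}
    for label, child in node.items():
        # Absorb redundant nodes: keep extending the label while the
        # node below has exactly one outgoing edge.
        while len(child) == 1:
            (next_label, grandchild), = child.items()
            label += next_label
            child = grandchild
        out[label] = compress(child)
    return out


words = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compressed = compress(build_trie(words))
```

After compression every internal node below the root has at least two children, so the number of nodes is proportional to the number of strings and not to their total length.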
Compact Representation
[Figure: compact representation of the compressed trie for the array S below. Each node stores a triplet (i, j, k) standing for the substring S[i][j..k] instead of the substring itself]
S[0] = see    S[1] = bear   S[2] = sell   S[3] = stock   S[4] = bull
S[5] = buy    S[6] = bid    S[7] = hear   S[8] = bell    S[9] = stop
Node triplets:
(1, 0, 0) (7, 0, 3) (0, 0, 0)
(1, 1, 1) (6, 1, 2) (4, 1, 1) (0, 1, 1) (3, 1, 2)
(1, 2, 3) (8, 2, 3) (4, 2, 3) (5, 2, 2) (0, 2, 2) (2, 2, 3) (3, 3, 4) (9, 3, 3)
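The triplet encoding can be sketched as below: a node holds three integers (i, j, k) and the edge label is recovered on demand as S[i][j..k]. The array S and the triplets are read off the figure; the nested-dictionary layout and the names `label`, `COMPACT`, and `contains` are assumptions for illustration.

```python
S = ["see", "bear", "sell", "stock", "bull",
     "buy", "bid", "hear", "bell", "stop"]


def label(i, j, k):
    """Substring S[i][j..k] (inclusive bounds) denoted by the triplet."""
    return S[i][j:k + 1]


# Compact trie: each key is a triplet, each value is the subtree below it.
COMPACT = {
    (1, 0, 0): {                                    # "b"
        (1, 1, 1): {(1, 2, 3): {}, (8, 2, 3): {}},  # "e" -> "ar", "ll"
        (6, 1, 2): {},                              # "id"
        (4, 1, 1): {(4, 2, 3): {}, (5, 2, 2): {}},  # "u" -> "ll", "y"
    },
    (7, 0, 3): {},                                  # "hear"
    (0, 0, 0): {                                    # "s"
        (0, 1, 1): {(0, 2, 2): {}, (2, 2, 3): {}},  # "e" -> "e", "ll"
        (3, 1, 2): {(3, 3, 4): {}, (9, 3, 3): {}},  # "to" -> "ck", "p"
    },
}


def contains(word, node=COMPACT):
    """True if word is spelled by a root-to-leaf path of the compact trie."""
    if not node:                       # reached a leaf
        return word == ""
    for (i, j, k), child in node.items():
        edge = label(i, j, k)
        if word.startswith(edge):      # sibling edges start with distinct characters
            return contains(word[len(edge):], child)
    return False
```

Each node costs constant space regardless of the length of its label, which is what makes this representation attractive as an auxiliary index over an existing array of strings.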
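Suffix Trie
[Figure: suffix trie of the string minimize (characters indexed 0 to 7), i.e. the compressed trie of all its suffixes. Edge labels at the root: e, i, mi, nimize, ze; below i: mize, nimize, ze; below mi: nimize, ze]
[Figure: compact representation of the same suffix trie, with each label replaced by an index pair (i, j) denoting characters i..j of the string. At the root: (7, 7), (1, 1), (0, 1), (2, 7), (6, 7); below (1, 1): (4, 7), (2, 7), (6, 7); below (0, 1): (2, 7), (6, 7)]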
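A small sketch of the suffix-trie idea for the string minimize: inserting every suffix into a trie makes every substring a path from the root, so a pattern can be tested in time proportional to its length. For brevity this sketch builds the uncompressed trie, which can take quadratic space; the compressed, index-pair form shown in the figure is what keeps the structure linear. The names `build_suffix_trie`, `is_substring`, and `pair` are assumptions.

```python
X = "minimize"


def build_suffix_trie(x):
    """Uncompressed trie (nested dicts) of all suffixes of x."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(root, pattern):
    """True if pattern labels a path starting at the root."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def pair(i, j):
    """Edge label denoted by the index pair (i, j): characters i..j of X."""
    return X[i:j + 1]


suffix_trie = build_suffix_trie(X)
```

For instance, `pair(2, 7)` recovers the label nimize and `pair(0, 1)` recovers mi, matching the edge labels of the compressed suffix trie.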
Example
X = abracadabra
T1 encodes X into 29 bits
T2 encodes X into 24 bits
[Figure: encoding tries T1 and T2 for X, and a sample prefix code with code-words 00 (a), 010 (b), 011 (c), 10 (d), 11 (e)]
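To illustrate why a prefix code can be decoded without separators, here is a minimal sketch using the code-words a = 00, b = 010, c = 011, d = 10, e = 11 as read from the figure. Because no code-word is a prefix of another, the decoder can emit a character as soon as the bits read so far match a code-word. The function names and the sample strings are assumptions for illustration.

```python
CODE = {"a": "00", "b": "010", "c": "011", "d": "10", "e": "11"}


def encode(text, code=CODE):
    """Concatenate the code-words of the characters of text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code=CODE):
    """Decode a bit string produced by encode with the same prefix code."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:      # prefix property: the first match is the only one
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code-word")
    return "".join(out)
```

The length of `encode(text)` is the sum of the code-word lengths of its characters, which is the quantity an encoding trie is chosen to keep small.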
Huffman's Algorithm
Given a string X, Huffman's algorithm constructs a prefix code that minimizes the size of the encoding of X
It runs in time O(n + d log d), where n is the size of X and d is the number of distinct characters of X
A heap-based priority queue is used as an auxiliary structure
Example
X = abracadabra
Frequencies: a 5, b 2, c 1, d 1, r 2
Algorithm HuffmanEncoding(X)
  Input string X of size n
  Output optimal encoding trie for X
  C ← distinctCharacters(X)
  computeFrequencies(C, X)
  Q ← new empty heap
  for all c ∈ C
    T ← new single-node tree storing c
    Q.insert(getFrequency(c), T)
  while Q.size() > 1
    f1 ← Q.minKey()
    T1 ← Q.removeMin()
    f2 ← Q.minKey()
    T2 ← Q.removeMin()
    T ← join(T1, T2)
    Q.insert(f1 + f2, T)
  return Q.removeMin()
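The pseudocode can be sketched in Python as below, using the standard `heapq` module as the heap-based priority queue. This is one possible rendering under stated assumptions, not the slides' own implementation: X is assumed non-empty, a tree is a single character (leaf) or a `(left, right)` tuple, and a running counter breaks ties so the heap never has to compare two trees. The names `huffman_trie` and `code_words` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_trie(x):
    """Return an optimal encoding trie for the non-empty string x.

    A leaf is a single character; an internal node is a (left, right) tuple.
    """
    freq = Counter(x)                     # distinct characters with frequencies
    tie = count()                         # tie-breaker for equal frequencies
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # the two lightest trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))  # join them
    return heap[0][2]


def code_words(tree, prefix=""):
    """Map each character to its code-word: 0 for left edges, 1 for right."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}      # a lone leaf still needs one bit
    left, right = tree
    table = code_words(left, prefix + "0")
    table.update(code_words(right, prefix + "1"))
    return table


X = "abracadabra"
table = code_words(huffman_trie(X))
encoded_bits = sum(len(table[ch]) for ch in X)
```

For X = abracadabra the resulting code uses 23 bits, compared with the 29 and 24 bits quoted for T1 and T2. The exact code-words depend on how ties between equal frequencies are broken, but the total length does not.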
[Figure: construction of the Huffman tree for X. Starting from the leaves a 5, b 2, c 1, d 1, r 2, subtrees of weight 2 (c, d), 4 (b, r), 6, and 11 are formed]