CCCS314 - DAA - 22!23!3rd 05 Space and Time Tradeoffs - Modified
Main source: A. Levitin, Introduction to the Design and Analysis of Algorithms, 3rd edition
Space-for-time Tradeoffs
Consider, as an example, the problem of computing values of a function at many points in its domain. If it is
time that is at a premium, we can precompute the function’s values and store them in a table.
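A minimal sketch of the precomputation idea, assuming a hypothetical workload that needs sin(x) only at whole-degree angles. The names `SIN_TABLE` and `fast_sin_degrees` are illustrative and not from the source.

```python
import math

# Assumption for illustration: sin is needed only at whole degrees 0..359.
# The table costs 360 stored floats (space) and is built once.
SIN_TABLE = [math.sin(math.radians(d)) for d in range(360)]


def fast_sin_degrees(d: int) -> float:
    """Return sin(d degrees) by table lookup instead of recomputation."""
    return SIN_TABLE[d % 360]
```

Each later call is a single array access, which is the time saved in exchange for the stored table.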
String Matching (Searching)
Several string matching (searching) algorithms are based on the idea of input enhancement: they preprocess the input pattern to get additional information that speeds up matching (searching)
Below, m is the pattern length and n is the text length
The Knuth-Morris-Pratt (KMP) algorithm preprocesses the input pattern left to right; O(m+n) time in the worst case
The Boyer-Moore algorithm preprocesses the input pattern right to left and stores the information in two tables; O(m+n) time in the worst case
Remember that brute-force string matching is in O(mn) in the worst case
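For comparison with the preprocessing-based algorithms, here is a minimal sketch of brute-force matching. The function name `brute_force_match` and the convention of returning -1 on failure are assumptions made for illustration.

```python
def brute_force_match(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    for i in range(n - m + 1):          # try every alignment of the pattern
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                      # compare left to right
        if j == m:
            return i                    # all m characters matched
    return -1                           # no alignment matched
```

Each of the up to n-m+1 alignments may cost up to m comparisons, which is where the O(mn) worst case comes from.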
Horspool’s Algorithm
Align the pattern with the beginning of the text. Starting with the rightmost character of the pattern and moving right to left, compare the corresponding pairs of characters in the pattern and text
Case 2: If the checked character matches the rightmost pattern character but no other pattern character, shift the pattern by its entire length
Case 4: If the checked character matches the rightmost pattern character and some other pattern characters, shift the pattern to align the next rightmost occurrence of that character in the pattern with the checked character
In general, shift the pattern by its entire length m if the checked character is not among the first m-1 characters of the pattern. Otherwise, shift the pattern by the distance from the rightmost occurrence of the checked character among the first m-1 characters of the pattern to the rightmost pattern character
Horspool’s Algorithm, Contd.
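Since the shift depends only on how the checked character matches the pattern characters, we use the idea of input enhancement to speed up determining how far to shift the pattern when a mismatch occurs
Create a shift table indexed by all possible text (and pattern) characters
Preprocess the pattern to compute, for each shift table entry t(c), the pattern shift size (used when a mismatch occurs) using the formula
$$t(c) = \begin{cases} m, & \text{if } c \text{ is not among the first } m-1 \text{ characters of the pattern (Cases 1 \& 2)} \\ \text{the distance from the rightmost } c \text{ among the first } m-1 \text{ characters of the pattern to its last character}, & \text{otherwise (Cases 3 \& 4)} \end{cases}$$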
Horspool’s Algorithm, Contd.
Horspool’s algorithm
1. Create a shift table indexed by all possible text (and pattern) characters
2. Preprocess the pattern to compute for each shift table entry the pattern shift size
(when a mismatch occurs)
3. Align the pattern at beginning of the text
4. Starting with the rightmost character of the pattern and moving right to left, compare
the corresponding pairs of characters in the pattern and text until either all pattern
characters are matched (then stop) or a mismatch occurs
5. When a mismatch occurs, shift the pattern to the right along the text according to
the shift table’s entry for the text’s character c aligned with the last character in the
pattern
6. Repeat until either a matching substring is found, or the pattern reaches beyond the
last character of the text
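The six steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `horspool_search`, the use of a dictionary for the shift table (with characters missing from it defaulting to a shift of m), and the -1 return value on failure are all assumptions.

```python
def horspool_search(pattern: str, text: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Steps 1-2: shift table built from the first m-1 pattern characters.
    # Later (more rightward) occurrences overwrite earlier ones.
    shift = {}
    for j in range(m - 1):
        shift[pattern[j]] = m - 1 - j
    # Step 3: i is the text index aligned with the pattern's last character.
    i = m - 1
    while i < n:                         # Step 6: stop when past the text
        k = 0                            # number of characters matched so far
        # Step 4: compare right to left.
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1             # full match found
        # Step 5: shift by the entry for the text character under the
        # pattern's last character; characters not in the table shift by m.
        i += shift.get(text[i], m)
    return -1
```

A call such as `horspool_search("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP")` returns the starting index of the match, the same position that `str.find` reports.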
Horspool’s Algorithm, Contd.
Otherwise (on a mismatch), move the rightmost pattern character position to the right by the shift table entry indexed by the text character at this position
When the rightmost pattern character position moves beyond the end of the text, there is no match
Example 1
The entire length of the pattern BAOBAB is 6, so the shift size is 6 for all characters that are not among the first m-1 characters of the pattern
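The remaining entries for BAOBAB can be computed with a short sketch of the shift-table preprocessing. The names `shift_table` and `shift_for` are hypothetical, and storing only the characters that occur among the first m-1 pattern characters (with every other character defaulting to m) is an implementation choice made here for brevity.

```python
def shift_table(pattern: str) -> dict[str, int]:
    """Shift sizes for characters among the first m-1 pattern characters."""
    m = len(pattern)
    table = {}
    for j in range(m - 1):
        # Distance from position j to the last pattern position; a later
        # occurrence of the same character overwrites an earlier one.
        table[pattern[j]] = m - 1 - j
    return table


def shift_for(table: dict[str, int], c: str, m: int) -> int:
    """Shift for text character c; characters not in the table shift by m."""
    return table.get(c, m)


baobab = shift_table("BAOBAB")  # {'B': 2, 'A': 1, 'O': 3}
```

So for BAOBAB the shifts are 1 for A, 2 for B, 3 for O, and 6 for every other character.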
Dictionary, Contd.
Implementations
Use a linked list
Use a balanced binary search tree
Use a Direct Access Table (DAT)
Key becomes address of the element
Extremely efficient
Impractical when the number of possible keys is extremely large, or when it far exceeds the number of actually stored keys
https://cis300.cs.ksu.edu/dictionaries/linked-list-impl/; http://see-programming.blogspot.com/2013/05/implement-dictionary-using-binary.html; https://web.stanford.edu/class/archive/cs/cs161/cs161.1168/lecture9.pdf
Hashing
A very efficient method for implementing a dictionary
based on the transform-and-conquer (representation change) and space-for-time tradeoff (prestructuring) ideas
works by mapping n keys into a hash table of manageable size m
Hashing, Contd.
Generally, a hash function should
be easy to compute
distribute keys about evenly throughout the hash table
Some hash functions
mod, i.e., the key mod the table size
truncation, e.g., keep the 3 rightmost digits
squaring, e.g., square the key and truncate
radix conversion, e.g., treat 1234 as a base-11 number and truncate if necessary
folding, e.g., split the key into parts 123|456|789, add them, and take the sum mod the table size
Important applications
symbol tables
databases (extendible hashing)
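The hash functions listed above can be sketched as follows for non-negative integer keys. All function names and default parameters are assumptions for illustration. In particular, "square and truncate" is read here as keeping the rightmost digits of the square, and "truncate if necessary" in radix conversion is read as reducing the converted value mod the table size m.

```python
def mod_hash(key: int, m: int) -> int:
    """mod: key mod table size."""
    return key % m


def truncation_hash(key: int, digits: int = 3) -> int:
    """Truncation: keep the `digits` rightmost decimal digits."""
    return key % 10 ** digits


def squaring_hash(key: int, digits: int = 3) -> int:
    """Squaring: square the key, then keep the rightmost digits."""
    return (key * key) % 10 ** digits


def radix_hash(key: int, m: int, base: int = 11) -> int:
    """Radix conversion: read the decimal digits as a base-`base` number."""
    value = 0
    for d in str(key):
        value = value * base + int(d)
    return value % m


def folding_hash(key: int, m: int, part_len: int = 3) -> int:
    """Folding: split the digits into parts, add them, take mod m."""
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts) % m
```

For example, 1234 read in base 11 is 1·11³ + 2·11² + 3·11 + 4 = 1610, and folding 123|456|789 gives 123 + 456 + 789 = 1368 before the final mod.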
https://www.geeksforgeeks.org/folding-method-in-hashing/; http://estudies4you.blogspot.com/2017/09/symbol-table-organizing-using-hashing.html; https://www.geeksforgeeks.org/extendible-hashing-dynamic-approach-to-dbms/
Hashing, Contd.
Hash function: h(key) = key mod 10
DAT versus hashing
Direct Access Table
Address   Record
…
5336663   “Sara”
…
5661116   “Ross”
…
Hash Table
Address   Record
0
1
2
3         5336663 “Sara”
4
5
6         5661116 “Ross”
7
8
9
Hashing, Contd.
Open hashing
each cell is the header of a linked list of all keys hashed to it
Closed hashing
one key per cell
in case of a collision, another cell is found by a technique such as
linear probing: use the next free cell (bucket)
quadratic probing: use the next free cell (bucket) at a distance of 1, 4, 9, 16, … positions from the original cell
double hashing: use a second hash function to compute the increment
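The three collision resolution techniques differ only in which cells they examine after the home cell h. A minimal sketch of the probe sequences, assuming a table of size m; the function names are hypothetical, and `step` stands for the value of a second hash function that the source does not specify.

```python
def linear_probes(h: int, m: int) -> list[int]:
    """Cells examined by linear probing: h, h+1, h+2, ... (mod m)."""
    return [(h + i) % m for i in range(m)]


def quadratic_probes(h: int, m: int) -> list[int]:
    """Cells examined by quadratic probing: h, h+1, h+4, h+9, ... (mod m)."""
    return [(h + i * i) % m for i in range(m)]


def double_hash_probes(h: int, step: int, m: int) -> list[int]:
    """Cells examined by double hashing: h, h+step, h+2*step, ... (mod m).

    `step` is assumed to come from a second hash function of the key.
    """
    return [(h + i * step) % m for i in range(m)]
```

In each case the key goes into the first free cell of the sequence. Note that the quadratic and double hashing sequences may revisit cells or skip some, depending on m and `step`.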
Open Hashing (Separate Chaining)
Keys are stored in linked lists outside a hash table whose elements serve as the lists’ headers
Example
Keys: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
h(K) = sum of K’s letters’ positions in the alphabet MOD 13
e.g., h(FOOL) = (6+15+15+12) MOD 13 = 9
Key    A   FOOL   AND   HIS   MONEY   ARE   SOON   PARTED
h(K)   1   9      6     10    7       11    11     12
[Figure: hash table with cells 0 to 12; cell 11 heads the list ARE → SOON]
e.g., search for KID: h(KID) = (11+9+4) MOD 13 = 11, so only the list at cell 11 (ARE, SOON) is searched, unsuccessfully
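The example above can be reproduced with a minimal separate-chaining sketch. The class name `ChainedHashTable` and the helper `letter_hash` are hypothetical, and Python lists stand in for the linked lists as a simplifying assumption.

```python
def letter_hash(word: str, m: int = 13) -> int:
    """Sum of the letters' positions in the alphabet (A=1, ..., Z=26) mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


class ChainedHashTable:
    def __init__(self, m: int = 13) -> None:
        # One chain per cell; a Python list stands in for a linked list.
        self.cells = [[] for _ in range(m)]

    def insert(self, key: str) -> None:
        chain = self.cells[letter_hash(key, len(self.cells))]
        if key not in chain:
            chain.append(key)

    def search(self, key: str) -> bool:
        # Only the chain at cell h(key) is scanned.
        return key in self.cells[letter_hash(key, len(self.cells))]


table = ChainedHashTable()
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    table.insert(word)
```

Searching for KID scans only the chain at cell 11, which holds ARE and SOON, and fails after two comparisons.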
Open Hashing (Separate Chaining), Contd.
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10
Cell   List
0      0
1      81 → 1
2
3
4      64 → 4
5      25
6      36 → 16
7
8
9      49 → 9
Open Hashing (Separate Chaining), Contd.
If the hash function distributes n keys evenly among the m cells of the hash table
the average length of a linked list will be the load factor α = n/m, which is very important for the efficiency of hashing (typically kept small, ideally about 1)
the average numbers of pointers inspected (probes) in successful searches, S, and unsuccessful searches, U, turn out to be
$$S \approx 1 + \frac{\alpha}{2} \quad \text{and} \quad U = \alpha$$
Advantage
reduction in average linked list size by a factor of m
still works if n > m
Disadvantage
requires the implementation of a second data structure (a linked list using pointers)
Closed Hashing (Open Addressing)
Keys are stored inside a hash table, one key per cell
and in case of collision, finds another cell by a technique such as
linear probing,
quadratic probing, or
double hashing
Linear probing
use the next free cell after the one where the collision occurs, wrapping around to the beginning of the table if the end is reached
Closed Hashing (Open Addressing), Contd.
Example
Insert items with keys into an empty hash table using linear probing, with h(K) = sum of K’s letters’ positions in the alphabet MOD 13
[Figure: hash table contents after inserting A, then FOOL, then AND, then HIS; for SOON, cell h(SOON) is tried first, then h(SOON)+1]
Closed Hashing (Open Addressing), Contd.
Example
Insert items with keys 89, 18, 49, 58, 9 into an empty hash table using linear probing, with h(K) = K % 10
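This example can be traced with a minimal linear-probing sketch. The function name `linear_probing_insert` is hypothetical, and `None` is assumed to mark an empty cell.

```python
def linear_probing_insert(table: list, key: int) -> int:
    """Insert key using h(K) = K % m and linear probing; return its cell."""
    m = len(table)
    start = key % m
    for i in range(m):
        cell = (start + i) % m          # wrap around at the end of the table
        if table[cell] is None:         # first free cell wins
            table[cell] = key
            return cell
    raise OverflowError("hash table is full")


table = [None] * 10
cells = [linear_probing_insert(table, k) for k in (89, 18, 49, 58, 9)]
# cells == [9, 8, 0, 1, 2]
```

Keys 89 and 18 go to their home cells 9 and 8. Key 49 collides at cell 9 and wraps around to cell 0, 58 probes cells 8, 9, 0 and lands in cell 1, and 9 probes cells 9, 0, 1 and lands in cell 2.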
Closed Hashing (Open Addressing), Contd.
Advantage
avoids pointers
Disadvantages
does not work if n > m
deletions are not straightforward (if we simply delete the key ARE from the final state of the hash table in the example, we will be unable to find the key SOON afterward)
the average number of times the algorithm must access the hash table (probes to find/insert/delete a key) in successful and unsuccessful searches depends on the load factor α = n/m (hash table density) and on the collision resolution strategy
For linear probing
$$S \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right) \quad \text{and} \quad U \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$$
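The deletion problem noted above can be demonstrated with a small sketch. It assumes the key list A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED from the separate chaining example, and all names (`letter_hash`, `insert`, `search`, `EMPTY`, `DELETED`) are hypothetical. The `DELETED` marker illustrates lazy deletion, one common remedy, and is not something the slides prescribe.

```python
EMPTY = None
DELETED = object()   # marker for a lazily deleted cell (illustrative remedy)


def letter_hash(word: str, m: int = 13) -> int:
    """Sum of the letters' positions in the alphabet mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word) % m


def insert(table: list, key: str) -> None:
    m = len(table)
    start = letter_hash(key, m)
    for i in range(m):
        cell = (start + i) % m
        if table[cell] is EMPTY:
            table[cell] = key
            return
    raise OverflowError("hash table is full")


def search(table: list, key: str) -> bool:
    m = len(table)
    start = letter_hash(key, m)
    for i in range(m):
        cell = (start + i) % m
        if table[cell] is EMPTY:
            return False               # an empty cell ends the probe sequence
        if table[cell] == key:
            return True
    return False


table = [EMPTY] * 13
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    insert(table, word)

are_cell = table.index("ARE")
found_before = search(table, "SOON")   # SOON sits right after ARE
table[are_cell] = EMPTY                # naive deletion
found_after_naive = search(table, "SOON")
table[are_cell] = DELETED              # lazy deletion instead
found_after_lazy = search(table, "SOON")
```

With naive deletion, the search for SOON starts at the now-empty home cell and stops immediately. With the marker in place, the search keeps probing past the deleted cell and finds SOON again.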
Closed Hashing (Open Addressing), Contd.
As the hash table gets closer to being full (α approaches 1), number of probes
in linear probing increases dramatically
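To put numbers on this growth, the sketch below evaluates the standard approximations for linear probing, S ≈ ½(1 + 1/(1−α)) for successful searches and U ≈ ½(1 + 1/(1−α)²) for unsuccessful ones. These formulas are taken here as an assumption from the main source (Levitin), and the function name `linear_probing_probes` is hypothetical.

```python
def linear_probing_probes(alpha: float) -> tuple[float, float]:
    """Approximate average probes (successful, unsuccessful) at load alpha."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    successful = 0.5 * (1 + 1 / (1 - alpha))
    unsuccessful = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return successful, unsuccessful


for alpha in (0.5, 0.75, 0.9):
    s, u = linear_probing_probes(alpha)
    print(f"alpha = {alpha:.2f}: S = {s:.1f}, U = {u:.1f}")
```

Under these approximations, going from a half-full table to a 90% full one raises the cost of an unsuccessful search from about 2.5 probes to about 50.5.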
Questions?