CCCS314 - DAA - 22!23!3rd 05 Space and Time Tradeoffs - Modified

Topic 5

Space and Time Tradeoffs


Prof. Hanan Elazhary

Main source: A. Levitin, Introduction to the Design and Analysis of Algorithms, 3rd edition
Space-for-time Tradeoffs
Consider, as an example, the problem of computing values of a function at many points in its domain. If it is
time that is at a premium, we can precompute the function’s values and store them in a table.

Two varieties of space-for-time algorithms
 input enhancement — preprocess the input (or part of it) and store the additional information obtained (extra space) to accelerate solving the problem
  counting sort
  string matching (searching), e.g., Horspool's algorithm
 prestructuring — preprocess the input (extra space) to make accessing its elements faster and/or more flexible
  hashing
  indexing (e.g., B-trees)
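The precomputation idea above can be sketched in Python; the function, domain, and names below are illustrative assumptions, not from the source:

```python
# A sketch of trading space for time: precompute a function's values once
# and answer later queries by table lookup (illustrative example).
import math

DOMAIN = range(100)                               # assumed domain of interest
SQRT_TABLE = {x: math.isqrt(x) for x in DOMAIN}   # extra space up front...

def fast_isqrt(x):
    return SQRT_TABLE[x]                          # ...buys O(1) query time
```

The table costs O(|domain|) extra space, but each subsequent query avoids recomputing the function.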
String Matching
(Searching)
String Matching
Several string matching (searching) algorithms are based on the idea of input enhancement: preprocessing an input pattern to obtain useful additional information that speeds up matching (searching)
 The Knuth-Morris-Pratt (KMP) algorithm preprocesses an input pattern left to right; O(m+n) time in the worst case
 The Boyer-Moore algorithm preprocesses an input pattern right to left and stores the information in two tables; O(m+n) time in the worst case
 Horspool's algorithm is a simplified version of the Boyer-Moore algorithm that uses only one table

Remember that brute-force string matching is in O(mn) in the worst case
Horspool’s Algorithm
Align the pattern at the beginning of the text and, starting with the rightmost character of the pattern and moving right to left, compare the corresponding pairs of characters in the pattern and text

If a mismatch occurs, determine how far to shift the pattern to the right by checking the text character aligned with the rightmost pattern character
Horspool’s Algorithm, Contd.
Case 1: If the checked character does not match any pattern character, shift the pattern by its entire length

Case 2: If the checked character matches the rightmost pattern character, but not any other pattern character, shift the pattern by its entire length

In both cases, we can shift the pattern by its entire length m because the checked character is not among the first m-1 characters of the pattern
Horspool’s Algorithm, Contd.
Case 3: If the checked character occurs in the pattern but not in the rightmost position, shift the pattern to align its rightmost occurrence with the checked character

Case 4: If the checked character matches the rightmost pattern character and also occurs elsewhere in the pattern, shift the pattern to align the next rightmost occurrence with the checked character

In both cases, we shift the pattern by the distance between the rightmost occurrence of the checked character among the first m-1 characters of the pattern and the rightmost pattern character
Horspool’s Algorithm, Contd.
Noticing that the shift depends on the match between the checked character and the pattern characters, we use the idea of input enhancement to speed up the process of determining how much to shift the pattern when a mismatch occurs
 Create a shift table indexed by all possible text (and pattern) characters
 Preprocess the pattern to compute for each shift table entry the pattern shift size (when a mismatch occurs) using the formula

 t(c) = m, if c is not among the first m-1 characters of the pattern (Cases 1 & 2)
 t(c) = the distance from the rightmost occurrence of c among the first m-1 characters of the pattern to the last pattern position (Cases 3 & 4)
Horspool’s Algorithm, Contd.
Horspool’s algorithm
1. Create a shift table indexed by all possible text (and pattern) characters
2. Preprocess the pattern to compute for each shift table entry the pattern shift size
(when a mismatch occurs)
3. Align the pattern at the beginning of the text
4. Starting with the rightmost character of the pattern and moving right to left, compare the corresponding pairs of characters in the pattern and text until either all pattern characters are matched (then stop) or a mismatch occurs
5. When a mismatch occurs, shift the pattern to the right along the text according to the shift table's entry for the text character c aligned with the last character of the pattern
6. Repeat until either a matching substring is found or the pattern reaches beyond the last character of the text
Horspool’s Algorithm, Contd.

The default shift size is the pattern size m


distance between last pattern character (m-1)
and one of its first m-1 characters
For each of the first m-1 characters of the pattern, modify the
corresponding table entry

10
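The table construction above can be sketched in Python (the function name is my own, not from the source):

```python
# A sketch of Horspool's shift-table construction.
def shift_table(pattern):
    m = len(pattern)
    table = {}
    # For each of the first m-1 pattern characters, store the distance from
    # its rightmost occurrence to the last position m-1 (Cases 3 & 4).
    # Later occurrences overwrite earlier ones, keeping the rightmost.
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    # Characters absent from the table take the default shift m (Cases 1 & 2),
    # e.g., via table.get(c, m) at lookup time.
    return table
```

For the pattern BAOBAB this yields A: 1, B: 2, O: 3, matching Example 1 below; every other character gets the default shift 6.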
Horspool’s Algorithm, Contd.
 Start with the rightmost pattern character aligned at position m-1 of the text
 While the rightmost pattern character position does not reach beyond the text:
  While the number of matched characters k is smaller than the pattern length and there is a match, increment k with each match and use it to move right to left along the pattern and text
  If all pattern characters are matched, return the leftmost position of the pattern in the text
  Otherwise (mismatch), shift the rightmost pattern character position by the table entry indexed by the text character at that position
 When the rightmost pattern character position reaches beyond the text, report that there is no match
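The steps above can be sketched as a complete Python function (names are illustrative):

```python
# A sketch of Horspool's matching loop, shift table included.
def horspool(pattern, text):
    m, n = len(pattern), len(text)
    # Shift table: distance from the rightmost occurrence among the first
    # m-1 pattern characters to position m-1; default shift is m.
    table = {pattern[j]: m - 1 - j for j in range(m - 1)}
    i = m - 1                       # text index aligned with the
    while i < n:                    # rightmost pattern character
        k = 0                       # number of matched characters
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1                  # move right to left while matching
        if k == m:
            return i - m + 1        # leftmost position of the match
        i += table.get(text[i], m)  # shift by the table entry for text[i]
    return -1                       # pattern reached beyond the text
```

Note that the shift is always indexed by the text character aligned with the last pattern position, even after a partial match; that is what distinguishes Horspool's rule from Boyer-Moore's.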
Example 1
The entire length of the pattern BAOBAB is m = 6, so the shift size is 6 for all characters that are not among the first m-1 characters of the pattern

The shift size for the other characters is the distance from the rightmost occurrence of the character among the first m-1 characters of the pattern to the rightmost pattern character:
 A: 1
 B: 2
 O: 3
Example 2
The entire length of the pattern BARBER is m = 6, so the shift size is 6 for all characters that are not among the first m-1 characters of the pattern

The shift size for the other characters is the distance from the rightmost occurrence of the character among the first m-1 characters of the pattern to the rightmost pattern character:
 E: 1
 B: 2
 R: 3
 A: 4
Hashing
Dictionary
A dictionary is a data structure that stores key-value (or key-element) pairs and supports the operations
 search (lookup)
 insert
 delete

When a key is passed to a dictionary, it returns the corresponding element
 e.g., given a phone number, return the caller's name
Dictionary, Contd.
Implementations
 Use a linked list
 Use a balanced binary search tree
 Use a Direct Access Table (DAT)
  The key becomes the address of the element
  Extremely efficient
  Impractical when the number of possible keys is large (extremely large table size), or when it far exceeds the number of actually stored keys

https://cis300.cs.ksu.edu/dictionaries/linked-list-impl/; http://see-programming.blogspot.com/2013/05/implement-dictionary-using-binary.html; https://web.stanford.edu/class/archive/cs/cs161/cs161.1168/lecture9.pdf
Hashing
A very efficient method for implementing a dictionary
 based on transform-and-conquer representation-change and space-for-time tradeoff (prestructuring) ideas
 by mapping n keys into a hash table of manageable size m

The hash function h maps each key to an integer between 0 and m-1, which is the hash address of a location in the hash table
 h: K → {0, 1, …, m-1}
Hashing, Contd.
Generally, a hash function should
 be easy to compute
 distribute keys about evenly throughout the hash table
Some hash functions
 mod, e.g., key mod table size
 truncation, e.g., keep the 3 rightmost digits
 squaring, e.g., square the key, then truncate
 radix conversion, e.g., treat 1234 as a base-11 number, truncating if necessary
 folding, e.g., split 123456789 into 123|456|789, add the parts, and take mod of the table size
Important applications
 symbol tables
 databases (extendible hashing)
https://www.geeksforgeeks.org/folding-method-in-hashing/; http://estudies4you.blogspot.com/2017/09/symbol-table-organizing-using-hashing.html; https://www.geeksforgeeks.org/extendible-hashing-dynamic-approach-to-dbms/
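The hash functions listed above can be sketched in Python; the exact truncation widths and group sizes are my own illustrative choices:

```python
# Sketches of common hash functions (illustrative parameter choices).
def mod_hash(key, m):
    return key % m                      # mod: remainder by table size

def truncation_hash(key):
    return key % 1000                   # keep the 3 rightmost digits

def squaring_hash(key):
    return (key * key) % 1000           # square, then truncate

def folding_hash(key, m):
    # Split the decimal digits into groups of 3, add, then mod m.
    digits = str(key)
    parts = [int(digits[i:i + 3]) for i in range(0, len(digits), 3)]
    return sum(parts) % m
```

For example, folding 123456789 gives (123 + 456 + 789) mod m.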
Hashing, Contd.
DAT versus hashing, with hash function h(key) = key mod 10

 Direct Access Table (address = key):
  Address 5336663: "Sara"
  Address 5661116: "Ross"
  (all other addresses, up to the largest possible key, are wasted)

 Hash Table of size m = 10:
  Address 3: 5336663 "Sara"
  Address 6: 5661116 "Ross"
  (addresses 0-2, 4-5, 7-9 are empty)

Non-numeric keys can be converted to numeric values, for example by adding the ASCII values of all characters of the key
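The conversion just described can be sketched as follows (function names are my own):

```python
# A sketch of hashing a non-numeric key by adding the ASCII codes
# of its characters, then applying an ordinary numeric hash.
def key_to_number(key):
    return sum(ord(c) for c in key)     # e.g., "AB" -> 65 + 66 = 131

def string_hash(key, m):
    return key_to_number(key) % m       # then hash as usual, e.g., with mod
```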
Hashing, Contd.
Collisions problem
 If h(K1) = h(K2) for two distinct keys K1 and K2, there is a collision
 Good hash functions result in fewer collisions, but some collisions should be expected

Two principal hashing schemes handle collisions differently
 Open hashing
 Closed hashing
Hashing, Contd.
Open hashing
 each cell is the header of a linked list of all keys hashed to it
Closed hashing
 one key per cell
 in case of collision, find another cell by a technique such as
  linear probing: use the next free cell (bucket)
  quadratic probing: use the next free cell (bucket) distant by 1, 4, 9, 16, … positions
  double hashing: use a second hash function to compute the increment
Open Hashing (Separate Chaining)
Keys are stored in linked lists outside a hash table whose elements serve as the lists' headers
Example
 Insert the keys A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
 h(K) = sum of K's letters' positions in the alphabet MOD 13
 e.g., h(FOOL) = (6+15+15+12) MOD 13 = 9

 Key   A  FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
 h(K)  1  9     6    10   7      11   11    12

 Cell 1: A, cell 6: AND, cell 7: MONEY, cell 9: FOOL, cell 10: HIS, cell 11: ARE → SOON, cell 12: PARTED; cells 0, 2-5, and 8 are empty

 e.g., a search for KID computes h(KID) = (11+9+4) MOD 13 = 11 and inspects only the list ARE → SOON
Open Hashing (Separate Chaining), Contd.
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10

 0: 0
 1: 81 → 1
 2:
 3:
 4: 64 → 4
 5: 25
 6: 36 → 16
 7:
 8:
 9: 49 → 9
Open Hashing (Separate Chaining), Contd.
If hash function evenly distributes n keys among m cells of the hash table
average length of linked list (load factor) will be α = n/m and is very important
for the efficiency of hashing (typically kept small, ideally about 1)
the average number of pointers inspected (probes) in successful searches S,
and unsuccessful searches, U, turns out to be

Advantage
reduction in average linked list size by a factor of m
still works if n > m
Disadvantage
requires the implementation of a second data structure (a linked list using pointers)
24
Closed Hashing (Open Addressing)
Keys are stored inside the hash table itself, one key per cell
 in case of collision, another cell is found by a technique such as
  linear probing,
  quadratic probing, or
  double hashing

Linear probing
 use the next free cell, wrapping around to the beginning of the table if necessary
Closed Hashing (Open Addressing), Contd.
Example
 Insert the keys A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED into an empty hash table of size 13 using linear probing, with h(K) = sum of K's letters' positions in the alphabet MOD 13
 A, FOOL, AND, HIS, MONEY, and ARE go directly to their hash addresses 1, 9, 6, 10, 7, and 11
 SOON: try h(SOON) = 11 (occupied by ARE), then h(SOON)+1 = 12 (free)
 PARTED: try h(PARTED) = 12 (occupied by SOON), then h(PARTED)+1 → wrap around to cell 0 (free)
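Linear probing can be sketched in Python (names are my own; the insert assumes the table is not full, i.e., n < m):

```python
# A sketch of closed hashing (open addressing) with linear probing.
def linear_probe_insert(table, key, h):
    m = len(table)
    i = h(key) % m
    while table[i] is not None:        # cell occupied: probe the next one,
        i = (i + 1) % m                # wrapping around if necessary
    table[i] = key                     # assumes the table is not full

def linear_probe_search(table, key, h):
    m = len(table)
    i = h(key) % m
    while table[i] is not None:        # stop at the first empty cell
        if table[i] == key:
            return i
        i = (i + 1) % m
    return -1                          # unsuccessful search
```

Running this on the keys 89, 18, 49, 58, 9 with h(K) = K % 10 reproduces the table-of-size-10 example on the next slide.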
Closed Hashing (Open Addressing), Contd.
Example
 Insert the keys 89, 18, 49, 58, 9 into an empty hash table of size 10 using linear probing, with h(K) = K % 10
 89 → cell 9; 18 → cell 8; 49 → cell 9 is occupied, wrap around to cell 0; 58 → cells 8, 9, 0 are occupied, so cell 1; 9 → cells 9, 0, 1 are occupied, so cell 2
Closed Hashing (Open Addressing), Contd.
Advantage
 avoids pointers

Disadvantages
 does not work if n > m
 deletions are not straightforward (if we delete the key ARE from the hash table above, a later search for SOON will stop at the emptied cell and fail to find it)
 the average number of times the algorithm must access the hash table (probes to find/insert/delete a key) in successful and unsuccessful searches depends on the load factor α = n/m (hash table density) and the collision resolution strategy
For linear probing:
 S ≈ ½(1 + 1/(1−α)) and U ≈ ½(1 + 1/(1−α)²)
Closed Hashing (Open Addressing), Contd.
As the hash table gets closer to being full (as α approaches 1), the number of probes in linear probing increases dramatically
Questions?
