11-Hash-Tables-II
Hash Tables
Chaining: in case of a collision, create a linked list of the elements with the same hash value
[Figure: collision resolution by chaining. Slots h(k1), h(k2), h(k3) each point to a linked list of nodes with fields key, article, and next; the keys k1, ..., k6 are distributed over these lists.]
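A minimal sketch of chaining, assuming natural-number keys and using division hashing (`key % m`) as a stand-in hash function. The class name `ChainedHashTable` and its methods are illustrative, and Python lists play the role of the linked lists with `next` pointers shown in the figure.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining (illustrative sketch)."""

    def __init__(self, m=8):
        self.m = m                            # number of slots
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        # Assumption: keys are natural numbers; division hashing is a stand-in.
        return key % self.m

    def insert(self, key, article):
        chain = self.slots[self._h(key)]
        for pos, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[pos] = (key, article)
                return
        chain.append((key, article))          # otherwise extend the chain

    def search(self, key):
        for k, article in self.slots[self._h(key)]:
            if k == key:
                return article
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, a) for k, a in self.slots[idx] if k != key]


table = ChainedHashTable(m=4)
table.insert(1, "a")
table.insert(5, "b")      # 5 % 4 == 1, so key 5 collides with key 1
print(table.search(5))    # b
```

Both keys end up in the chain of slot 1, and a search walks only that chain.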
Algorithms – Hash Tables II 11-3
Note that there are two types of hash functions with completely different
requirements:
- hash functions to support data structures
- cryptographic hash functions
Assumption:
All keys are natural numbers
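Multiplication method: fix a constant $A$ with $0 < A < 1$
Then $h(k) = \lfloor m \, (kA \bmod 1) \rfloor$
$m = 2^p$ is a convenient value
If the size of a computer word is $w$ bits, choose $A$ to be a fraction of the form $s/2^w$
for an integer $s$ with $0 < s < 2^w$
To compute $h(k)$, multiply $k$ by $s = A \cdot 2^w$
The result is a $2w$-bit value $r_1 2^w + r_0$
Then $h(k)$ is the $p$ most significant bits of $r_0$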
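The word-level computation can be sketched as follows. The function name `mult_hash` is illustrative, and the defaults are assumptions: a 32-bit word and $s = \lfloor 2^{32} (\sqrt{5} - 1)/2 \rfloor = 2654435769$, a commonly suggested constant rather than anything required by the method.

```python
def mult_hash(k, p, w=32, s=2654435769):
    """Multiplication-method hash of k into the range 0 .. 2**p - 1.

    Assumptions: k fits in one w-bit word, 0 < s < 2**w, and p <= w.
    """
    product = k * s                   # 2w-bit value r1 * 2**w + r0
    r0 = product & ((1 << w) - 1)     # low-order word r0
    return r0 >> (w - p)              # p most significant bits of r0


print(mult_hash(123456, 14))          # 67
```

Here $m = 2^{14} = 16384$, and the key 123456 lands in slot 67.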
Universal Hashing
To bring hashing even closer to simple uniform hashing, a natural idea is to
choose the hash function at random as well, independently of the keys being
hashed
We use a universal collection of hash functions
A collection $\mathcal{H}$ of hash functions is called universal if, for each pair of
distinct keys $k$ and $l$, the number of hash functions $h \in \mathcal{H}$ such
that $h(k) = h(l)$ is no more than $|\mathcal{H}|/m$
Corollary
Using universal hashing and collision resolution by chaining in a table
with $m$ slots, it takes expected time $\Theta(n)$ to handle any
sequence of $n$ table operations containing $O(m)$ insertions.
Theorem
The class $\mathcal{H}_{p,m}$ of hash functions is universal
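The definition of the class is not reproduced above, so the sketch below assumes the standard textbook construction: for a prime $p$ larger than every key, $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$ with $a \in \{1, \dots, p-1\}$ and $b \in \{0, \dots, p-1\}$. Under that assumption the code picks a random member of the class and, for small $p$ and $m$, counts exhaustively how many functions collide on each pair of keys. All function names are illustrative.

```python
import random


def make_hash(a, b, p, m):
    """h_{a,b}(k) = ((a*k + b) mod p) mod m  (assumed definition of the class)."""
    return lambda k: ((a * k + b) % p) % m


def random_hash(p, m, rng=random):
    """Pick one function of the class uniformly at random."""
    a = rng.randrange(1, p)    # a in {1, ..., p-1}
    b = rng.randrange(0, p)    # b in {0, ..., p-1}
    return make_hash(a, b, p, m)


def colliding_functions(k, l, p, m):
    """Count the functions in the class that send k and l to the same slot."""
    count = 0
    for a in range(1, p):
        for b in range(p):
            h = make_hash(a, b, p, m)
            if h(k) == h(l):
                count += 1
    return count


p, m = 17, 6                   # p prime, keys drawn from 0 .. p-1
size = p * (p - 1)             # number of functions in the class
worst = max(colliding_functions(k, l, p, m)
            for k in range(p) for l in range(k + 1, p))
print(worst, "<=", size / m)   # worst pair stays within the universal bound
```

The exhaustive check is only feasible for tiny parameters; in practice one would call `random_hash` once when the table is created.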
Open Addressing
A serious drawback of chaining: it uses a lot of pointers
The idea:
Keep all the lists inside the hash table
Instead of using pointers, compute the location of the next element
Probe Sequence
The hash function depends on two arguments, the key and the probe number, and generates a probe
sequence
Formally:
$h : U \times \{0, 1, \dots, m-1\} \to \{0, 1, \dots, m-1\}$
Probe sequence
$h(k, 0), h(k, 1), \dots, h(k, m-1)$
We want this sequence to be a permutation of $0, 1, \dots, m-1$, so that
every slot in the hash table can be occupied.
Clearly we cannot store more elements than the number of slots in the
table
Thus the load factor $\alpha$ does not exceed 1
Insertion
Hash-Insert(T, k)
  set i := 0
  repeat
    set j := h(k, i)
    if T[j] = Nil then do
      set T[j] := k
      return j
    else set i := i + 1
  until i = m
  error "hash table overflow"
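A runnable Python counterpart of the insertion procedure, as a sketch. The table is a list in which `None` marks an empty slot, and the probe function `h(key, i)` is passed in as a parameter. `hash_search` is an addition not shown on the slide: it follows the same probe sequence and stops at the first empty slot. The `linear` probe function is only a stand-in for the demonstration.

```python
def hash_insert(table, key, h):
    """Insert key into an open-address table; return the slot index used."""
    m = len(table)
    for i in range(m):
        j = h(key, i)                 # i-th slot of the probe sequence
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def hash_search(table, key, h):
    """Return the slot holding key, or None if the key is absent."""
    m = len(table)
    for i in range(m):
        j = h(key, i)
        if table[j] is None:          # an empty slot ends the probe sequence
            return None
        if table[j] == key:
            return j
    return None


m = 7
linear = lambda k, i: (k % m + i) % m    # stand-in probe function (assumption)
T = [None] * m
print([hash_insert(T, k, linear) for k in (10, 3, 17)])   # [3, 4, 5]
print(hash_search(T, 17, linear))                          # 5
```

All three keys start at slot 3, so each insertion probes one slot further than the previous one. Deletion is left out: simply emptying a slot would cut probe sequences short, so it needs extra care.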
Probing: Linear
To generate a probe sequence we use an ordinary hash function,
called the auxiliary hash function
$h' : U \to \{0, 1, \dots, m-1\}$
Linear probing:
$h(k, i) = (h'(k) + i) \bmod m$
Thus we start searching from slot $h'(k)$, then check $h'(k) + 1$, etc., wrapping around to slot 0 after slot $m - 1$
Drawbacks:
- Primary clustering: long runs of occupied slots build up,
making the average search time too long
- Since $h(k, 0) = h(k', 0)$ implies $h(k, i) = h(k', i)$ for all $i$,
there are very few different probe sequences ($m$, to be precise)
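Both drawbacks can be seen on a small table. The sketch below assumes $h'(k) = k \bmod m$ as the auxiliary hash function, which is only a stand-in; the helper name `linear_probe` is illustrative.

```python
def linear_probe(h_aux, m):
    """Build h(k, i) = (h'(k) + i) mod m from an auxiliary hash function h'."""
    return lambda k, i: (h_aux(k) + i) % m


m = 8
h = linear_probe(lambda k: k % m, m)   # h'(k) = k mod m (assumed stand-in)

# The probe sequence is determined entirely by the starting slot h'(k).
print([h(5, i) for i in range(m)])     # [5, 6, 7, 0, 1, 2, 3, 4]
print([h(13, i) for i in range(m)])    # same sequence, since 13 mod 8 == 5

# Primary clustering: keys starting in or next to a run extend that run.
table = [None] * m
for key in (5, 13, 6, 21, 7):          # 5 keys in 8 slots, so a free slot exists
    i = 0
    while table[h(key, i)] is not None:
        i += 1
    table[h(key, i)] = key
print(table)                           # [21, 7, None, None, None, 5, 13, 6]
```

The five keys form a single run of occupied slots 5, 6, 7, 0, 1, and any new key hashing into that run must walk to its end.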
Probing: Quadratic
Quadratic probing:
$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
No primary clustering
Drawbacks:
- Possible values of $c_1$, $c_2$, and $m$ are very restricted
- Secondary clustering, a milder form of clustering
- Only a few different probe sequences
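To illustrate why the constants are restricted, the sketch below compares two choices on a table with $m = 8$ slots. The helper `probe_sequence` and the two offset functions are illustrative; $c_1 = c_2 = 1/2$ with $m$ a power of 2 is one known combination that visits every slot, while $c_1 = c_2 = 1$ with the same $m$ does not.

```python
def probe_sequence(start, m, offset):
    """Slots (start + offset(i)) mod m for i = 0 .. m-1."""
    return [(start + offset(i)) % m for i in range(m)]


tri = lambda i: i * (i + 1) // 2    # c1 = c2 = 1/2, always an integer
sq = lambda i: i + i * i            # c1 = c2 = 1

good = probe_sequence(3, 8, tri)
bad = probe_sequence(3, 8, sq)
print(good)                         # [3, 4, 6, 1, 5, 2, 0, 7]
print(len(set(bad)))                # 4
```

The first sequence is a permutation of the slots; the second reaches only 4 of the 8 slots, so an insertion could fail while the table is half empty.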
Theorem
Given an open-address hash table with load factor $\alpha = n/m < 1$,
the expected number of probes in an unsuccessful search is at most
$\frac{1}{1 - \alpha}$
assuming uniform hashing
Theorem
Given an open-address hash table with load factor $\alpha = n/m < 1$,
the expected number of probes in a successful search is at most
$\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$
assuming uniform hashing
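To get a feel for the two bounds, they can be evaluated for a half-full and a 90%-full table. The function names below are illustrative.

```python
import math


def unsuccessful_bound(alpha):
    """Upper bound on expected probes in an unsuccessful search."""
    return 1 / (1 - alpha)


def successful_bound(alpha):
    """Upper bound on expected probes in a successful search."""
    return (1 / alpha) * math.log(1 / (1 - alpha))


for alpha in (0.5, 0.9):
    print(alpha,
          round(unsuccessful_bound(alpha), 3),
          round(successful_bound(alpha), 3))
# 0.5 2.0 1.386
# 0.9 10.0 2.558
```

Going from a half-full to a 90%-full table raises the unsuccessful-search bound from 2 to 10 probes, while the successful-search bound grows much more slowly.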
Homework
Suggest how to organize a direct-access table in which not all keys are
different. All operations must run in $O(1)$ time
Show that if $|U| > nm$ ($U$ denotes the set of all possible keys), there
is a subset of $U$ of size $n$ consisting of keys that all hash to the
same slot, so that the worst-case searching time for hashing with
chaining is $\Theta(n)$