
Hash Tables II

Data Structures and Algorithms


Andrei Bulatov
Algorithms – Hash Tables II 11-2

Hash Tables
In case of a collision, create a list of the elements with the same hash value (chaining)

[Figure: keys k1, …, k6 are hashed to table slots such as h(k1), h(k2), h(k3); each occupied slot points to a linked list of records with fields key, article, next]
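A minimal Python sketch of chaining, for illustration only. It assumes natural-number keys, uses the division method $h(k) = k \bmod m$ as a stand-in hash function, and lets Python lists play the role of the linked lists in the figure. Records are `(key, article)` pairs, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(m)]

    def _h(self, k):
        # Stand-in hash function: division method.
        return k % self.m

    def insert(self, k, article):
        # Add the record (key, article) to the chain of slot h(k).
        self.slots[self._h(k)].append((k, article))

    def search(self, k):
        # Only the chain of slot h(k) has to be scanned.
        for key, article in self.slots[self._h(k)]:
            if key == k:
                return article
        return None

    def delete(self, k):
        # Remove the first record with key k from its chain, if present.
        chain = self.slots[self._h(k)]
        for idx, (key, _) in enumerate(chain):
            if key == k:
                del chain[idx]
                return True
        return False


t = ChainedHashTable(5)
t.insert(3, "a")
t.insert(8, "b")    # 8 mod 5 = 3: collides with key 3
print(t.slots[3])   # [(3, 'a'), (8, 'b')]
print(t.search(8))  # b
```

Both keys land in the same slot and simply share its chain; a search for 8 scans only that chain.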

Good Hash Functions


Good hash functions are those that are as close to simple uniform
hashing as possible
This is difficult to achieve, since we usually do not know the distribution of the keys

Note that there are two types of hash functions with very different
requirements:
- hash functions to support data structures
- cryptographic hash functions

Assumption:
All keys are natural numbers
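One common way to satisfy this assumption for string keys is to read an ASCII string as a number in radix 128. A small Python illustration follows; the radix and the function name are assumptions for the example, not part of the slides.

```python
def string_to_natural(s):
    """Interpret an ASCII string as a natural number written in radix 128."""
    k = 0
    for ch in s:
        code = ord(ch)
        if code >= 128:
            raise ValueError("only ASCII characters are supported")
        k = k * 128 + code  # shift earlier digits left, append the next digit
    return k


print(string_to_natural("pt"))  # 112 * 128 + 116 = 14452
```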

The Division Method


Choose $m$
Then $h(k) = k \bmod m$

Should be careful with some values of $m$
Say, no powers of 2, or powers of 10, or …
(if $m = 2^p$, then $h(k)$ is just the $p$ lowest-order bits of $k$)
Primes are a good choice, as long as they are not close to a power of 2
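A quick Python illustration of why powers of 2 are risky. The two keys and the prime 97 are arbitrary choices for the example.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


k1, k2 = 0b1010_0011, 0b1111_0011  # 163 and 243: same 4 lowest bits
# m = 2**4: only the 4 lowest-order bits of k matter, so both keys collide.
print(division_hash(k1, 16), division_hash(k2, 16))  # 3 3
# m = 97 (a prime between 64 and 128): the higher bits matter too.
print(division_hash(k1, 97), division_hash(k2, 97))  # 66 49
```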



The Multiplication Method


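Choose $m$
Choose $A$ with $0 < A < 1$

$kA \bmod 1$ denotes the fractional part of $kA$, that is $kA - \lfloor kA \rfloor$

Then $h(k) = \lfloor m\,(kA \bmod 1) \rfloor$
$m = 2^p$ is a convenient value
If the size of a computer word is $w$ (and each key fits into one word), choose $A$ to be a fraction of the form
$s/2^w$ for an integer $s$ with $0 < s < 2^w$
To compute $h(k)$, multiply $k$ by $s = A \cdot 2^w$
The result is a $2w$-bit value $r_1 2^w + r_0$
Then $h(k)$ is the $p$ most significant bits of $r_0$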
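A minimal Python sketch of the word-based computation, assuming $w = 32$ and $s = 2654435769$, which is roughly $\frac{\sqrt{5}-1}{2} \cdot 2^{32}$ (the value of $A$ suggested by Knuth). The function name and defaults are illustrative. For $k = 123456$ and $p = 14$ the product is $76300 \cdot 2^{32} + 17612864$, and the top 14 bits of $r_0 = 17612864$ give 67.

```python
def multiplication_hash(k, p, w=32, s=2654435769):
    """Multiplication method with m = 2**p and A = s / 2**w.

    Assumes 0 <= k < 2**w and 0 < p <= w. The default s (roughly
    ((5 ** 0.5 - 1) / 2) * 2**32) follows Knuth's suggested A.
    """
    product = k * s                  # the 2w-bit value r1 * 2**w + r0
    r0 = product & ((1 << w) - 1)    # keep the low-order word r0
    return r0 >> (w - p)             # the p most significant bits of r0


print(multiplication_hash(123456, 14))  # 67
```

Because only a multiplication, a mask and a shift are needed, this avoids the division of the previous method.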

Universal Hashing
To get hashing even closer to simple uniform hashing, a natural idea is to
choose the hash function at random as well, independently of the keys being
hashed
We use a universal collection of hash functions
A collection $\mathcal{H}$ of hash functions into $\{0, \dots, m-1\}$ is called universal if, for each pair of
distinct keys $k$ and $l$, the number of hash functions $h \in \mathcal{H}$ such
that $h(k) = h(l)$ is no more than $|\mathcal{H}|/m$
(so for $h$ chosen at random from $\mathcal{H}$, the chance that $h(k) = h(l)$ is at most $1/m$)

To construct a hash table we first select $h \in \mathcal{H}$ (randomly!), and then
use it

Universal Hashing (cntd)


Lemma
Suppose a hash function $h$ is chosen at random from a universal
collection and is used to hash $n$ keys into a table of size $m$.
If key $k$ is not in the table, then the expected length $E[n_{h(k)}]$
of the list that $k$ hashes to is at most $\alpha = n/m$.
If $k$ is in the table, then the expected length $E[n_{h(k)}]$ of the list
containing $k$ is at most $1 + \alpha$

Corollary
Using universal hashing and collision resolution by chaining in a table
with $m$ slots, it takes expected time $\Theta(n)$ to handle any
sequence of $n$ Insert, Search and Delete operations containing $O(m)$ Insert operations.

Constructing a Universal Hashing Collection


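Choose a prime $p$ such that all possible keys are in the range
$\{0, \dots, p-1\}$
Let $\mathbb{Z}_p = \{0, \dots, p-1\}$ and $\mathbb{Z}_p^* = \{1, \dots, p-1\}$
For $a \in \mathbb{Z}_p^*$ and $b \in \mathbb{Z}_p$ let
$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$
and

$\mathcal{H}_{p,m} = \{h_{a,b} : a \in \mathbb{Z}_p^*, b \in \mathbb{Z}_p\}$

Theorem
The class $\mathcal{H}_{p,m}$ of hash functions is universal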
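A minimal Python sketch of drawing $h_{a,b}$ from $\mathcal{H}_{p,m}$, assuming $p$ is prime and all keys lie in $\{0, \dots, p-1\}$. The function names are hypothetical. The brute-force helper counts, for one pair of keys, how many of the $p(p-1)$ functions make them collide, so the universality bound $|\mathcal{H}|/m$ can be checked on a tiny instance ($p = 13$, $m = 4$).

```python
import random


def make_universal_hash(p, m, rng=random):
    """Draw h_{a,b}(k) = ((a*k + b) mod p) mod m uniformly from H_{p,m}.

    Assumes p is prime and every key lies in {0, ..., p - 1}.
    """
    a = rng.randint(1, p - 1)  # a in Z_p^*
    b = rng.randint(0, p - 1)  # b in Z_p

    def h(k):
        return ((a * k + b) % p) % m

    return h


def count_collisions(k, l, p, m):
    """Brute force: number of (a, b) with h_{a,b}(k) == h_{a,b}(l)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * k + b) % p) % m == ((a * l + b) % p) % m
    )


h = make_universal_hash(p=101, m=10)
print([h(k) for k in (3, 17, 42)])  # depends on the random a and b
# Universality check for p = 13, m = 4: |H| / m = 13 * 12 / 4 = 39
print(max(count_collisions(k, l, 13, 4)
          for k in range(13) for l in range(13) if k != l) <= 39)  # True
```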

Open Addressing
A serious drawback of chaining: it uses a lot of pointers
The idea:
Keep all the lists inside the hash table
Instead of using pointers, compute the location of the next element

To insert into or search the hash table, we successively examine, or probe, a
sequence of entries of the table

This sequence depends on the key being
searched for or inserted

Probe Sequence
The hash function depends on 2 arguments (the key and the probe number) and generates a probe
sequence
Formally:
$h: U \times \{0, 1, \dots, m-1\} \to \{0, 1, \dots, m-1\}$
Probe sequence
$\langle h(k, 0), h(k, 1), \dots, h(k, m-1) \rangle$
We want this sequence to be a permutation of $\langle 0, 1, \dots, m-1 \rangle$, so that
every slot in the hash table can be occupied.

Clearly we cannot store more elements than the number of slots in the
table
Thus the load factor $\alpha = n/m$ does not exceed 1

Insertion
Hash-Insert(T, k)
set i := 0
repeat
    set j := h(k, i)
    if T[j] = Nil then do
        set T[j] := k
        return j
    else set i := i + 1
until i = m
error “hash table overflow”
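A direct Python transcription of Hash-Insert, as a sketch. `None` plays the role of Nil, the probe function `h(k, i)` is passed in, and the linear-probing probe function with $m = 7$ in the example is a hypothetical choice.

```python
def hash_insert(T, k, h):
    """Open-addressing insert following Hash-Insert.

    T is a list of m slots (None plays the role of Nil) and h(k, i) is the
    probe function. Returns the slot index where k was stored.
    """
    m = len(T)
    for i in range(m):          # i = 0, 1, ..., m - 1
        j = h(k, i)
        if T[j] is None:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


# Hypothetical probe function for the example: linear probing with h'(k) = k mod 7.
M = 7


def probe(k, i):
    return (k % M + i) % M


T = [None] * M
print(hash_insert(T, 10, probe))  # 3
print(hash_insert(T, 17, probe))  # 17 mod 7 = 3 is occupied, so 4
```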

Search and Deletion


Hash-Search(T, k)
set i := 0
repeat
    set j := h(k, i)
    if T[j] = k then return j
    set i := i + 1
until T[j] = Nil or i = m
return Nil
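The same search in Python, as a sketch mirroring Hash-Search. It assumes `None` for Nil. The example table is hypothetical: the result of inserting 10, 17 and 7 with linear probing and $m = 7$.

```python
def hash_search(T, k, h):
    """Open-addressing search following Hash-Search; returns a slot index or None."""
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] == k:
            return j
        if T[j] is None:  # an empty slot ends the probe sequence: k is absent
            return None
    return None


# Hypothetical example: linear probing with h'(k) = k mod 7,
# after inserting 10, 17 (which probed past slot 3) and 7.
M = 7


def probe(k, i):
    return (k % M + i) % M


T = [7, None, None, 10, 17, None, None]
print(hash_search(T, 17, probe))  # 4
print(hash_search(T, 24, probe))  # None: probes slots 3, 4, then empty slot 5
```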

Deletion is difficult: simply setting the slot to Nil could cut off the probe sequences of other keys,
and in general it is not possible to shift all elements of a sequence back, since some of them may belong to different sequences
We can write ‘Deleted’ instead of actually deleting
Or, better, use chaining

Probing: Linear
To generate a probe sequence we use an ordinary hash function,
called an auxiliary hash function
$h': U \to \{0, 1, \dots, m-1\}$
Linear probing:
$h(k, i) = (h'(k) + i) \bmod m$
Thus we start searching from slot $h'(k)$, then check $h'(k) + 1$, etc., wrapping around from slot $m-1$ to slot 0

Drawbacks:
- Primary clustering: long sequences of occupied slots build up,
making the average search time too long
- Since $h(k, 0) = h(k', 0)$ implies $h(k, i) = h(k', i)$ for all $i$,
there are very few different probe sequences ($m$ to be precise)
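A small Python illustration of both drawbacks' root cause. With linear probing the first probe determines the whole sequence, so only $m$ sequences exist. The auxiliary hash $h'(k) = k \bmod 8$ is a hypothetical choice.

```python
def linear_probe_sequence(k, m, h_aux):
    """Probe sequence <h(k,0), ..., h(k,m-1)> for h(k,i) = (h'(k) + i) mod m."""
    return [(h_aux(k) + i) % m for i in range(m)]


M = 8


def h_aux(k):
    return k % M  # hypothetical auxiliary hash function


print(linear_probe_sequence(3, M, h_aux))   # [3, 4, 5, 6, 7, 0, 1, 2]
print(linear_probe_sequence(11, M, h_aux))  # same list: 11 mod 8 = 3
# The first probe fixes the whole sequence, so there are only m of them:
print(len({tuple(linear_probe_sequence(k, M, h_aux)) for k in range(100)}))  # 8
```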

Probing: Quadratic
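Quadratic probing:
$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$

where $h'$ is an auxiliary hash function, and $c_1$ and $c_2 \neq 0$ are constants

No primary clustering

Drawbacks:
- Possible values of $c_1$, $c_2$, and $m$ are very restricted (the probe sequence must still cover the whole table)
- Secondary clustering, a milder form of clustering: $h(k, 0) = h(k', 0)$ still implies identical probe sequences
- Only few different probe sequences ($m$ again)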
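A Python illustration of how restricted the parameters are, assuming the choice $c_1 = c_2 = 1/2$. For this choice the probe offsets are the triangular numbers $i(i+1)/2$, which are known to cover every slot when $m$ is a power of 2. The same choice fails for $m = 10$. The function name is hypothetical, and `start` stands for $h'(k)$.

```python
def quadratic_probe_sequence(start, m):
    """h(k,i) = (h'(k) + i/2 + i^2/2) mod m, i.e. c1 = c2 = 1/2, with start = h'(k).

    (i + i*i) is always even, so integer division is exact.
    """
    return [(start + (i + i * i) // 2) % m for i in range(m)]


print(quadratic_probe_sequence(0, 8))   # [0, 1, 3, 6, 2, 7, 5, 4]: every slot
print(quadratic_probe_sequence(0, 10))  # [0, 1, 3, 6, 0, 5, 1, 8, 6, 5]: slots repeat
```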

Probing: Double Hashing


Double hashing uses two auxiliary hash functions
$h(k, i) = (h'(k) + i\,h''(k)) \bmod m$
where $h'$ and $h''$ are auxiliary hash functions
Thus the sequence depends on the values of two hash functions
It is unlikely to produce any kind of clustering
Also, if $h'$ and $h''$ are selected properly, we get $\Theta(m^2)$ different
probe sequences (one for each pair $(h'(k), h''(k))$)

Probing: Double Hashing (cntd)


Choice of $h'$ and $h''$:
$h''(k)$ should be relatively prime to $m$ to make sure we search the
entire table
Say, $m$ is a power of 2, and $h''(k)$ is always odd
Or $m$ is prime, and $0 < h''(k) < m$ for all $k$, for example
$h'(k) = k \bmod m$
$h''(k) = 1 + (k \bmod m')$, where $m' = m - 1$
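A Python sketch of the prime-$m$ choice above, with $m = 13$ and $m' = 12$ picked only for illustration. Keys 14 and 1 share the first probe but step through the table differently. Counting distinct sequences over keys $0, \dots, 155$ gives $m(m-1) = 156$, in line with the $\Theta(m^2)$ claim.

```python
def double_hash_probe_sequence(k, m, m_prime):
    """h(k,i) = (h'(k) + i*h''(k)) mod m with h'(k) = k mod m, h''(k) = 1 + (k mod m')."""
    h1 = k % m
    h2 = 1 + (k % m_prime)  # 1 <= h2 <= m', so h2 is never 0
    return [(h1 + i * h2) % m for i in range(m)]


m, m_prime = 13, 12  # m prime, m' = m - 1
print(double_hash_probe_sequence(14, m, m_prime))  # [1, 4, 7, 10, 0, 3, 6, 9, 12, 2, 5, 8, 11]
print(double_hash_probe_sequence(1, m, m_prime))   # also starts at 1, but steps by 2
# Distinct probe sequences over keys 0..155: m * (m - 1) = 156
print(len({tuple(double_hash_probe_sequence(k, m, m_prime)) for k in range(156)}))  # 156
```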

Open Addressing Analysis

Theorem
Given an open-address hash table with load factor $\alpha = n/m < 1$,
the expected number of probes in an unsuccessful search is at most
$\frac{1}{1-\alpha}$,
assuming uniform hashing

Theorem
Given an open-address hash table with load factor $\alpha = n/m < 1$,
the expected number of probes in a successful search is at most
$\frac{1}{\alpha} \ln \frac{1}{1-\alpha}$,
assuming uniform hashing
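To get a feel for these bounds, here is a small Python calculation that evaluates them for a half-full and a 90%-full table. The function names are hypothetical.

```python
import math


def unsuccessful_bound(alpha):
    """Upper bound 1 / (1 - alpha) on expected probes, unsuccessful search."""
    return 1 / (1 - alpha)


def successful_bound(alpha):
    """Upper bound (1/alpha) * ln(1 / (1 - alpha)) on expected probes, successful search."""
    return (1 / alpha) * math.log(1 / (1 - alpha))


for alpha in (0.5, 0.9):
    print(alpha, round(unsuccessful_bound(alpha), 3), round(successful_bound(alpha), 3))
# 0.5 2.0 1.386
# 0.9 10.0 2.558
```

So an unsuccessful search degrades quickly as the table fills (at most 2 probes at $\alpha = 0.5$, at most 10 at $\alpha = 0.9$). The bound for a successful search grows much more slowly.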

Homework

Suggest how to organize a direct access table in which not all keys are
distinct. All operations must run in $O(1)$ time

Show that if $|U| > nm$ ($U$ denotes the set of all possible keys), there
is a subset of $U$ of size $n$ consisting of keys that all hash to the
same slot, so that the worst-case searching time for hashing with
chaining is $\Theta(n)$

Write pseudocode for Hash-Delete in the case of open addressing, and
modify Hash-Insert to handle deleted elements.
