Hash Tables

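Figure: a bucket array with cells 0 to 4; cell 1 stores 025-612-0001, cell 2 stores 981-101-0002, cell 4 stores 451-229-0004, and cells 0 and 3 are empty.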

© 2010 Goodrich, Tamassia Hash Tables 1


Recall the Map ADT
• find(k): if the map M has an entry with key k, return its associated value; else, return null
• put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, replace the value associated with k by v and return the old value
• erase(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
• size(), empty(): return the number of entries in M; return whether M is empty
• entrySet(): return a list of the entries in M
• keySet(): return a list of the keys in M
• values(): return a list of the values in M
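A minimal Python sketch of the Map ADT, backed by an unsorted list of [key, value] pairs, is shown below. The class name ListMap is hypothetical, None stands in for null, and the method names simply mirror the ADT operations listed above.

```python
from typing import Any, Optional


class ListMap:
    """Unsorted list-based map (hypothetical sketch); None plays the role of null."""

    def __init__(self) -> None:
        self._entries: list[list[Any]] = []  # each entry is a [key, value] pair

    def _index(self, k: Any) -> int:
        # Linear scan for key k; -1 if it is absent
        for i, (key, _) in enumerate(self._entries):
            if key == k:
                return i
        return -1

    def find(self, k: Any) -> Optional[Any]:
        i = self._index(k)
        return None if i < 0 else self._entries[i][1]

    def put(self, k: Any, v: Any) -> Optional[Any]:
        i = self._index(k)
        if i < 0:
            self._entries.append([k, v])
            return None  # k was not in the map
        old = self._entries[i][1]
        self._entries[i][1] = v  # replace the old value
        return old

    def erase(self, k: Any) -> Optional[Any]:
        i = self._index(k)
        if i < 0:
            return None
        return self._entries.pop(i)[1]

    def size(self) -> int:
        return len(self._entries)

    def empty(self) -> bool:
        return not self._entries

    def entrySet(self) -> list[tuple[Any, Any]]:
        return [(k, v) for k, v in self._entries]

    def keySet(self) -> list[Any]:
        return [k for k, _ in self._entries]

    def values(self) -> list[Any]:
        return [v for _, v in self._entries]
```

Every operation scans the whole list, so each takes O(n) time; a hash table aims to bring this down to O(1) expected time.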



Hash Functions and Hash Tables
• A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]
• Example: h(x) = x mod N is a hash function for integer keys
• The integer h(x) is called the hash value of key x
• A hash table for a given key type consists of
  - a hash function h
  - a bucket array (called table) of size N
• When implementing a map with a hash table, the goal is to store the entry (k, o) at index i = h(k)
Hash Tables
• Bucket array
  - An array A of size N, where each cell of A is thought of as a "bucket", i.e., a collection of key-value pairs
  - N defines the capacity of the array
  - An entry e with key k is inserted into the bucket A[k]
Hash Tables
• Bucket array
  - If keys are unique integers in the range [0, N − 1], then each bucket holds at most one entry
  - Thus, searches, insertions, and removals in the bucket array take O(1) time
  - This sounds like a great achievement, but it has two drawbacks:
    - The space used is proportional to N; if N is much larger than the number n of entries actually present in the map, space is wasted
    - The keys are required to be integers in the range [0, N − 1], which is often not the case
  - Because of these two drawbacks, we use the bucket array in conjunction with a "good" mapping from the keys to the integers in the range [0, N − 1]
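The bucket array with unique integer keys in [0, N − 1] can be sketched in Python as a direct-address table. DirectAddressTable is a hypothetical name, and raising KeyError for an out-of-range key is an assumed design choice. Each operation indexes A[k] directly, which is where the O(1) bound comes from, while the list of size N makes the space drawback visible.

```python
from typing import Any, Optional


class DirectAddressTable:
    """Bucket array for unique integer keys in [0, N-1]; the entry with key k lives in A[k]."""

    def __init__(self, N: int) -> None:
        self.A: list[Optional[Any]] = [None] * N  # space proportional to N, not to n

    def _check(self, k: int) -> None:
        # Keys outside [0, N-1] have no bucket (assumed: report with KeyError)
        if not 0 <= k < len(self.A):
            raise KeyError(f"key {k} outside [0, {len(self.A) - 1}]")

    def find(self, k: int) -> Optional[Any]:
        self._check(k)
        return self.A[k]  # O(1): direct index, no search

    def put(self, k: int, v: Any) -> Optional[Any]:
        self._check(k)
        old, self.A[k] = self.A[k], v
        return old

    def erase(self, k: int) -> Optional[Any]:
        self._check(k)
        old, self.A[k] = self.A[k], None
        return old
```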
Example
• We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer
• Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x
Figure: cells 0 to 4 and 9997 to 9999 of the array; cell 1 stores 025-612-0001, cell 2 stores 981-101-0002, cell 4 stores 451-229-0004, cell 9998 stores 200-751-9998, and cells 0, 3, 9997 and 9999 are empty.
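A quick Python check of this example, as a sketch: the SSN is turned into an integer by removing the hyphens (an assumption about how keys are stored), and "last four digits" is computed as x mod 10,000. Only the SSNs shown in the figure are stored; the Name field is omitted.

```python
N = 10_000  # array size used in the example


def h(ssn: str) -> int:
    """Hash function from the example: the last four digits of the SSN."""
    x = int(ssn.replace("-", ""))  # treat the SSN as a plain integer
    return x % N                    # x mod 10,000 keeps exactly the last four digits


A: list = [None] * N
for ssn in ["025-612-0001", "981-101-0002", "451-229-0004", "200-751-9998"]:
    A[h(ssn)] = ssn  # only the key is stored here; the Name field is omitted
```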



Hash Function
• A hash function is usually specified as the composition of two functions:
• Hash code: h1: keys → integers
  - Maps the key k to an integer
  - Need not be in the range [0, N − 1], and may even be negative
  - The hash code for a key k should be the same as the hash code for any key that is equal to k
  - Should avoid collisions as much as possible
• Compression function: h2: integers → [0, N − 1]
  - Maps the hash code to an integer within the range of indices [0, N − 1] of a bucket array
• The hash code is applied first, and the compression function is applied next on the result, i.e., h(x) = h2(h1(x))
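A small sketch of the composition h(x) = h2(h1(x)), assuming integer keys whose hash code is the key itself and a division-style compression with an assumed table size N = 13. It also shows why a negative hash code is harmless once compression maps it back into [0, N − 1]: Python's % already returns a nonnegative result for positive N, whereas in C++ the result of % can be negative and needs an adjustment.

```python
N = 13  # number of buckets (illustrative assumption)


def h1(key: int) -> int:
    """Hash code: for an integer key, the key itself (it may be negative)."""
    return key


def h2(y: int) -> int:
    """Compression: map any integer into [0, N-1]."""
    return y % N  # Python's % with a positive N is never negative


def h(key: int) -> int:
    return h2(h1(key))  # hash code first, then compression
```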
Hash Codes
• Memory address:
  - We reinterpret the memory address of the key object as an integer
  - Good in general, except for numeric and string keys
• Integer cast:
  - We reinterpret the bits of the key as an integer
  - Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in C++)
• Component sum:
  - We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
  - Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in C++)
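The integer-cast and component-sum hash codes can be imitated in Python with the standard struct module, which exposes the raw bits of floating-point values. This sketch assumes the IEEE 754 single- and double-precision layouts that struct's "f" and "d" formats use, and 32-bit components; the function names are hypothetical.

```python
import struct


def float_cast(x: float) -> int:
    """Integer cast: reinterpret the 32 bits of a single-precision float as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def double_component_sum(x: float) -> int:
    """Component sum: split a 64-bit double into two 32-bit halves and add them, ignoring overflow."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]  # the 64 bits as an integer
    high, low = bits >> 32, bits & 0xFFFFFFFF
    return (high + low) & 0xFFFFFFFF  # keep 32 bits, i.e. discard any carry
```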
Hash Codes (cont.)
• Polynomial accumulation:
  - We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0\, a_1 \ldots a_{n-1}$
  - We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$ at a fixed value z, ignoring overflows
  - Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
• Polynomial p(z) can be evaluated in O(n) time using Horner's rule:
  - The following polynomials are successively computed, each from the previous one in O(1) time:
    $p_0(z) = a_{n-1}$
    $p_i(z) = a_{n-i-1} + z\, p_{i-1}(z) \quad (i = 1, 2, \ldots, n-1)$
  - We have $p(z) = p_{n-1}(z)$
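The polynomial hash code can be sketched in Python with Horner's rule exactly as written for this slide, assuming each component a_i is the character code of the i-th character and z = 33. Masking with 0xFFFFFFFF is an assumed way to model "ignoring overflows" in a 32-bit unsigned integer; poly_hash is a hypothetical name.

```python
def poly_hash(s: str, z: int = 33) -> int:
    """Polynomial hash code p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1}, with a_i = ord(s[i]).

    Horner's rule runs from a_{n-1} down to a_0; masking to 32 bits models
    ignoring overflows in a 32-bit unsigned integer.
    """
    p = 0
    for ch in reversed(s):                  # a_{n-1}, a_{n-2}, ..., a_0
        p = (ord(ch) + z * p) & 0xFFFFFFFF  # p_i = a_{n-i-1} + z * p_{i-1}
    return p
```

For "ab" this computes 97 + 33 · 98 = 3331, while "ba" gives a different value, since the position of each component matters.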



Compression Functions
• Division:
  - h2(y) = y mod N
  - The size N of the hash table is usually chosen to be a prime
  - The reason has to do with number theory and is beyond the scope of this course
• Multiply, Add and Divide (MAD):
  - h2(y) = (ay + b) mod N
  - a and b are nonnegative integers such that a mod N ≠ 0
  - Otherwise, every integer would map to the same value, b mod N
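Both compression functions are one-liners in Python; the sketch below adds the a mod N ≠ 0 check for MAD and shows what goes wrong without it. The values a = 13, b = 5, N = 13 in the final check are arbitrary illustrations.

```python
def division(y: int, N: int) -> int:
    """Division method: h2(y) = y mod N, with N usually a prime."""
    return y % N


def mad(y: int, a: int, b: int, N: int) -> int:
    """Multiply, Add and Divide: h2(y) = (a*y + b) mod N, for nonnegative a, b with a mod N != 0."""
    if a < 0 or b < 0 or a % N == 0:
        raise ValueError("need nonnegative a, b and a mod N != 0")
    return (a * y + b) % N


# Why a mod N != 0 matters: with a = 13 and N = 13, (a*y + b) mod N equals b mod N for every y
assert {(13 * y + 5) % 13 for y in range(100)} == {5}
```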
Collision Handling
• Collisions occur when different elements are mapped to the same cell
• Separate chaining: let each cell in the table point to a linked list of entries that map there
• Separate chaining is simple, but requires additional memory outside the table
Figure: a table with cells 0 to 4; cell 1 holds 025-612-0001, and cell 4 holds a list containing 451-229-0004 and 981-101-0004, which collide.
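To make the collision concrete, here is a Python sketch that reuses the last-four-digits hash from the SSN example (hyphens removed, an assumption) together with separate chaining. A Python list stands in for each cell's linked list.

```python
N = 10_000


def h(ssn: str) -> int:
    return int(ssn.replace("-", "")) % N  # last four digits, as in the SSN example


# Separate chaining: each cell holds a list ("chain") of the entries that hash there
A: list[list[str]] = [[] for _ in range(N)]
for ssn in ["025-612-0001", "451-229-0004", "981-101-0004"]:
    A[h(ssn)].append(ssn)  # 451-229-0004 and 981-101-0004 both land in cell 4
```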
Map with Separate Chaining
Delegate operations to a list-based map at each cell:

Algorithm find(k):
    return A[h(k)].find(k)

Algorithm put(k, v):
    t = A[h(k)].put(k, v)
    if t = null then {k is a new key}
        n = n + 1
    return t

Algorithm erase(k):
    t = A[h(k)].erase(k)
    if t ≠ null then {k was found}
        n = n − 1
    return t
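The three algorithms above translate into a Python sketch like the following. It assumes Python's built-in hash() as the hash code and mod N as the compression function; the class name ChainHashMap and the default N = 13 are hypothetical. Each bucket is a plain list of [key, value] pairs acting as the list-based map. Testing for the key directly, rather than checking whether the returned value is null, also keeps n correct when a stored value happens to be None.

```python
from typing import Any, Hashable, Optional


class ChainHashMap:
    """Separate chaining: each bucket is a small list-based map of [key, value] pairs."""

    def __init__(self, N: int = 13) -> None:
        self.A: list[list[list[Any]]] = [[] for _ in range(N)]
        self.n = 0  # number of entries in the whole map

    def _h(self, k: Hashable) -> int:
        return hash(k) % len(self.A)  # built-in hash() as hash code, mod N as compression

    def find(self, k: Hashable) -> Optional[Any]:
        for key, value in self.A[self._h(k)]:
            if key == k:
                return value
        return None

    def put(self, k: Hashable, v: Any) -> Optional[Any]:
        bucket = self.A[self._h(k)]
        for entry in bucket:
            if entry[0] == k:
                old, entry[1] = entry[1], v
                return old  # existing key: n is unchanged
        bucket.append([k, v])
        self.n += 1  # k is a new key
        return None

    def erase(self, k: Hashable) -> Optional[Any]:
        bucket = self.A[self._h(k)]
        for i, (key, value) in enumerate(bucket):
            if key == k:
                del bucket[i]
                self.n -= 1  # k was found
                return value
        return None
```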
Linear Probing
• Open addressing: the colliding item is placed in a different cell of the table
• Linear probing: handles collisions by placing the colliding item in the next (circularly) available table cell
• Each table cell inspected is referred to as a "probe"
• Colliding items lump together, so future collisions cause longer sequences of probes
• Example:
  - h(x) = x mod 13
  - Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
  - Resulting table (- marks an empty cell):

index:   0   1   2   3   4   5   6   7   8   9  10  11  12
key:     -   -  41   -   -  18  44  59  32  22  31  73   -
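The linear-probing example can be replayed with a short Python sketch. insert_linear is a hypothetical helper that stores only keys and raises OverflowError when all N cells are occupied (an assumed choice).

```python
N = 13


def h(x: int) -> int:
    return x % N


def insert_linear(A: list, k: int) -> int:
    """Place k in the first free cell at or after h(k), wrapping around; return its index."""
    i = h(k)
    for _ in range(N):
        if A[i] is None:
            A[i] = k
            return i
        i = (i + 1) % N  # probe the next cell circularly
    raise OverflowError("hash table is full")


A: list = [None] * N
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    insert_linear(A, k)
```

After the eight insertions, A holds 41 in cell 2 and 18, 44, 59, 32, 22, 31, 73 in cells 5 through 11; key 31, for instance, hashes to 5 and needs six probes.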



Search with Linear Probing
• Consider a hash table A that uses linear probing
• find(k)
  - We start at cell h(k)
  - We probe consecutive locations until one of the following occurs:
    - An item with key k is found, or
    - An empty cell is found, or
    - N cells have been unsuccessfully probed

Algorithm find(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅
            return null
        else if c.key() = k
            return c.value()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return null
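A direct Python transcription of Algorithm find(k), as a sketch, assuming each occupied cell holds a (key, value) tuple, an empty cell holds None, and h(k) = k mod N.

```python
from typing import Any, Optional


def find(A: list, k: int) -> Optional[Any]:
    """Linear-probing search following Algorithm find(k)."""
    N = len(A)
    i = k % N          # start at cell h(k), assuming h(k) = k mod N
    p = 0              # number of cells probed so far
    while p < N:       # the "repeat ... until p = N" loop
        c = A[i]
        if c is None:  # empty cell: k is not in the table
            return None
        if c[0] == k:
            return c[1]
        i = (i + 1) % N
        p += 1
    return None        # N cells probed without success
```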
Updates with Linear Probing
• To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
• erase(k)
  - We search for an entry with key k
  - If such an entry (k, o) is found, we replace it with the special item AVAILABLE and we return element o
  - Else, we return null
• put(k, o)
  - We throw an exception if the table is full
  - We start at cell h(k)
  - We probe consecutive cells until one of the following occurs:
    - A cell i is found that is either empty or stores AVAILABLE, or
    - N cells have been unsuccessfully probed
  - We store (k, o) in cell i
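The sketch below adds the AVAILABLE marker to a linear-probing table in Python. Following the search rule, AVAILABLE cells are not treated as empty, so find and erase probe past them; otherwise an entry stored after a deleted one would become unreachable. As an assumption beyond the slide, put first checks whether k is already present so that it replaces the old value, as the Map ADT requires, before reusing the first empty-or-AVAILABLE cell. LinearProbingMap and h(k) = k mod N are hypothetical choices.

```python
from typing import Any, Iterator, Optional

AVAILABLE = object()  # sentinel that marks a deleted cell


class LinearProbingMap:
    def __init__(self, N: int = 13) -> None:
        self.N = N
        self.A: list = [None] * N

    def _probe(self, k: int) -> Iterator[int]:
        """Yield cells h(k), h(k)+1, ... (mod N), stopping after N probes."""
        i = k % self.N
        for _ in range(self.N):
            yield i
            i = (i + 1) % self.N

    def find(self, k: int) -> Optional[Any]:
        for i in self._probe(k):
            c = self.A[i]
            if c is None:
                return None  # an empty cell ends the search
            if c is not AVAILABLE and c[0] == k:
                return c[1]  # AVAILABLE cells are probed past, not treated as empty
        return None

    def erase(self, k: int) -> Optional[Any]:
        for i in self._probe(k):
            c = self.A[i]
            if c is None:
                return None
            if c is not AVAILABLE and c[0] == k:
                self.A[i] = AVAILABLE  # replace the entry with AVAILABLE
                return c[1]
        return None

    def put(self, k: int, v: Any) -> Optional[Any]:
        target = None  # first empty-or-AVAILABLE cell seen
        for i in self._probe(k):
            c = self.A[i]
            if c is None or c is AVAILABLE:
                if target is None:
                    target = i
                if c is None:
                    break  # k cannot appear beyond an empty cell
            elif c[0] == k:
                old, self.A[i] = c[1], (k, v)
                return old  # existing key: replace its value
        if target is None:
            raise OverflowError("hash table is full")
        self.A[target] = (k, v)
        return None
```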
Double Hashing
• Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + j·d(k)) mod N for j = 0, 1, …, N − 1, where i = h(k)
• The secondary hash function d(k) cannot have zero values
• The table size N must be a prime to allow probing of all the cells
• Common choice of compression function for the secondary hash function: d2(k) = q − k mod q, where
  - q < N
  - q is a prime
• The possible values for d2(k) are 1, 2, …, q
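A Python sketch of the double-hashing probe sequence, assuming h(k) = k mod N and d(k) = q − k mod q with N = 13 and q = 7 (the values the slides use in their double-hashing example). Because N is prime and 1 ≤ d(k) ≤ q < N, the step d(k) shares no factor with N, so the N probes visit every cell exactly once.

```python
N, q = 13, 7  # prime table size N and prime q < N


def h(k: int) -> int:
    return k % N


def d(k: int) -> int:
    """Secondary hash d2(k) = q - (k mod q): always in 1..q, never 0."""
    return q - k % q


def probe_sequence(k: int) -> list[int]:
    """The cells (h(k) + j*d(k)) mod N for j = 0, 1, ..., N-1."""
    return [(h(k) + j * d(k)) % N for j in range(N)]
```

With a composite size such as N = 12 and a step of 4, the same formula would cycle through only cells 0, 4 and 8, which is why N must be prime.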
Example of Double Hashing
• Consider a hash table storing integer keys that handles collisions with double hashing
  - N = 13
  - h(k) = k mod 13
  - d(k) = 7 − k mod 7
  - Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

k     h(k)  d(k)  Probes
18    5     3     5
41    2     1     2
22    9     6     9
44    5     5     5, 10
59    7     4     7
32    6     3     6
31    5     4     5, 9, 0
73    8     4     8

Resulting table (- marks an empty cell):
index:   0   1   2   3   4   5   6   7   8   9  10  11  12
key:    31   -  41   -   -  18  32  59  73  22  44   -   -
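The double-hashing example can be reproduced with the Python sketch below, which records the probe sequence of each insertion. insert_double is a hypothetical helper using h(k) = k mod 13 and d(k) = 7 − k mod 7, as in the example.

```python
N, q = 13, 7


def insert_double(A: list, k: int) -> list[int]:
    """Insert k by double hashing; return the cells probed, in order."""
    h, d = k % N, q - k % q
    probes = []
    for j in range(N):
        i = (h + j * d) % N
        probes.append(i)
        if A[i] is None:  # first available cell in the series
            A[i] = k
            return probes
    raise OverflowError("no free cell found")


A: list = [None] * N
probes = {k: insert_double(A, k) for k in [18, 41, 22, 44, 59, 32, 31, 73]}
```

Here probes[44] is [5, 10] and probes[31] is [5, 9, 0], matching the Probes column of the example.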
Performance of Hashing
• In the worst case, searches, insertions and removals on a hash table take O(n) time
• The worst case occurs when all the keys inserted into the map collide
• The load factor α = n/N affects the performance of a hash table
• Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1/(1 − α)
• The expected running time of all the dictionary ADT operations in a hash table is O(1)
• In practice, hashing is very fast provided the load factor is not close to 100%
• Applications of hash tables:
  - small databases
  - compilers
  - browser caches