MIT6 006F11 Lec08 PDF
Lecture 8: Hashing I
Lecture Overview
• Dictionaries and Python
• Motivation
• Prehashing
• Hashing
• Chaining
Dictionary Problem
Abstract Data Type (ADT): maintain a set of items, each with a key, subject to insert, delete, and search-by-key operations.
We assume items have distinct keys (or that inserting new one clobbers old).
Balanced BSTs solve this in O(lg n) time per operation (and also support inexact searches such as next-largest).
Goal: O(1) time per operation.
Python Dictionaries:
Items are (key, value) pairs, e.g. d = {'algorithms': 5, 'cool': 42}
Python set is really dict where items are keys (no values)
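As a minimal sketch of how the dictionary operations map onto Python's built-in dict and set: the example reuses the dict d from above, and the extra keys ('hashing', 'bst') and variable names are made up for illustration.

```python
# Python's built-in dict used as a dictionary ADT.
d = {'algorithms': 5, 'cool': 42}

d['hashing'] = 8          # insert a new (key, value) item
d['cool'] = 43            # inserting an existing key overwrites the old value
found = d['algorithms']   # search by key (raises KeyError if the key is absent)
missing = d.get('bst')    # search that returns None instead of raising
del d['hashing']          # delete by key

# A set stores keys only, with no associated values.
s = {'algorithms', 'cool'}
has_cool = 'cool' in s    # membership test is a search by key
```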
Lecture 8 Hashing I 6.006 Fall 2011
Motivation
Dictionaries are perhaps the most popular data structure in CS
• built into most modern programming languages (Python, Perl, Ruby, JavaScript,
Java, C++, C#, . . . )
[Figure: array with slots 0, 1, 2, ..., each slot storing a key and its item]
Problems:
2. large key range ⇒ large space, e.g. one key of $2^{256}$ is bad news.
2 Solutions:
• Object’s key should not change while in table (else cannot find it anymore)
Solution to 2: hashing (verb from French ‘hache’ = hatchet, & Old High German ‘happja’ = scythe)
• Reduce universe U of all keys (say, integers) down to reasonable size m for table
[Figure: hash function h maps keys k1, k2, k3, k4 from the universe U into slots 0, 1, ..., m-1 of table T, e.g. h(k1) = 1]
1. Chaining: TODAY
Chaining
Linked list of colliding elements in each slot of table
[Figure: chaining in a hash table; keys k1, k4, k2 from U collide, h(k1) = h(k2) = h(k4), and are stored in one linked list, while k3 sits in another slot]
• Worst case: all n keys hash to same slot ⇒ Θ(n) per operation
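Chaining can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the class name ChainedHashTable and its method names are assumptions, Python lists stand in for the linked lists, and the built-in hash() followed by mod m stands in for the hash function.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a chain of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m                              # number of slots
        self.slots = [[] for _ in range(m)]     # one (initially empty) chain per slot

    def _chain(self, key):
        # Built-in hash() maps the key to an integer; mod m picks the slot.
        return self.slots[hash(key) % self.m]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                        # existing key: overwrite old item
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):           # scan only this slot's chain
            if k == key:
                return v
        return None                             # key not present

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```

With m = 4, the integer keys 1, 5, and 9 all land in slot 1, so a search for any of them walks the same chain; pushing all n keys into one slot in this way is exactly the Θ(n) worst case.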
Performance
This implies that the expected running time for search is Θ(1+α), where α = n/m is the load factor (n keys stored in m slots). The 1 comes from applying the hash function and random access to the slot, whereas the α comes from searching the list. This is O(1) if α = O(1), i.e., m = Ω(n).
Hash Functions
We cover three methods to achieve the above performance:
Division Method:
h(k) = k mod m
This is practical when m is prime but not too close to a power of 2 or 10 (otherwise h depends only on the low bits/digits of k).
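A small sketch of the division method, showing why a power-of-2 table size is a poor choice. The function name division_hash, the sample keys, and the table sizes 16 and 23 are arbitrary choices for illustration.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m, for a non-negative integer key k."""
    return k % m

# These keys differ only above their low 4 bits.
keys = [0x10, 0x20, 0x30, 0x40]

# With m = 16 (a power of 2), h(k) is just the low 4 bits of k,
# so all four keys collide in slot 0.
pow2_slots = [division_hash(k, 16) for k in keys]     # [0, 0, 0, 0]

# With a prime m such as 23, the same keys land in distinct slots.
prime_slots = [division_hash(k, 23) for k in keys]    # [16, 9, 2, 18]
```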
Multiplication Method:
$h(k) = [(a \cdot k) \bmod 2^w] \gg (w - r)$
where a is random, k is w bits, and $m = 2^r$.
This is practical when a is odd, $2^{w-1} < a < 2^w$, and a is not too close to $2^{w-1}$ or $2^w$.
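The multiplication method translates directly into shifts and masks. The sketch below assumes a word size of w = 32 and a table of m = 2^10 slots; the constant A is one fixed odd value in the allowed range, used here only so the example is repeatable (the method as described draws a at random).

```python
def multiplication_hash(k, a, w, r):
    """h(k) = [(a * k) mod 2^w] >> (w - r), giving a slot in range(2^r)."""
    low_w_bits = (a * k) % (1 << w)    # keep only the low w bits of the product
    return low_w_bits >> (w - r)       # the top r of those bits are the slot

W = 32                 # assumed word size in bits
R = 10                 # table size m = 2^R = 1024
A = 2654435769         # odd, with 2^(W-1) < A < 2^W; fixed here for illustration

slot = multiplication_hash(123456, A, W, R)
```

Because the result is the top r bits of a w-bit value, it always lies in {0, ..., m − 1} with no extra mod m needed.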
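[Figure: multiplication method; the w-bit key k is multiplied by a (a sum of shifted copies of k), and r bits of the result form h(k)]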
Universal Hashing
[6.046; CLRS 11.3.3]
For example: $h(k) = [(ak + b) \bmod p] \bmod m$, where a and b are random $\in \{0, 1, \ldots, p - 1\}$, and p is a large prime ($> |U|$).
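Drawing one function from this family might look like the following sketch. The helper name make_universal_hash, the prime P = 2^31 − 1, and the table size M = 100 are assumptions for illustration; the code presumes every key is an integer smaller than P.

```python
import random

def make_universal_hash(p, m, rng=random):
    """Draw h(k) = ((a*k + b) mod p) mod m with a, b random in {0, ..., p-1}."""
    a = rng.randrange(p)
    b = rng.randrange(p)

    def h(k):
        return ((a * k + b) % p) % m

    return h

P = 2_147_483_647      # the prime 2^31 - 1; assumes all keys are smaller than P
M = 100                # number of table slots

# Seeded generator only so that the sketch is repeatable.
h = make_universal_hash(P, M, random.Random(0))
slots = [h(k) for k in (10, 20, 30)]
```

The key design point is that a and b are chosen once, when the table is built, and independently of the keys, so no fixed set of keys is bad for every choice of h.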
This implies that for worst-case keys $k_1 \neq k_2$ (with the probability taken over the random choice of a, b defining h):
\[
\Pr_{a,b}\{\text{event } X_{k_1 k_2}\} = \Pr_{a,b}\{h(k_1) = h(k_2)\} = \frac{1}{m}
\]
This lemma is not proved here.
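This implies that:
\[
\begin{aligned}
E_{a,b}[\#\text{ collisions with } k_1] &= E\Big[\sum_{k_2} X_{k_1 k_2}\Big] \\
&= \sum_{k_2} E[X_{k_1 k_2}] \\
&= \sum_{k_2} \underbrace{\Pr\{X_{k_1 k_2} = 1\}}_{1/m} \\
&= \frac{n}{m} = \alpha
\end{aligned}
\]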
This is just as good as above!
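For small parameters the expectation can be checked exactly by enumerating every (a, b) pair. The sketch below is a sanity check under assumed values (p = 101, m = 10, and an arbitrary set of five stored keys), not a proof; the function name collision_probability is hypothetical.

```python
from fractions import Fraction

def collision_probability(k1, k2, p, m):
    """Exact probability, over all (a, b) in {0..p-1}^2, that h(k1) == h(k2)."""
    hits = sum(
        1
        for a in range(p)
        for b in range(p)
        if ((a * k1 + b) % p) % m == ((a * k2 + b) % p) % m
    )
    return Fraction(hits, p * p)

P, M = 101, 10                      # small prime and table size, for illustration
keys = [3, 7, 12, 45, 78]           # n = 5 stored keys
k1 = 99                             # the key whose collisions are counted

# Linearity of expectation: sum the pairwise collision probabilities.
expected = sum(collision_probability(k1, k2, P, M) for k2 in keys)
alpha = Fraction(len(keys), M)      # load factor n / m
```

For these values each pairwise probability is 1021/10201, very close to 1/m = 1/10, so the expected number of collisions is close to α = 1/2. The small excess comes from the draw a = 0, which sends every key to the same slot.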
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.