Hashing PDF
Overview
Hash function
Implementations, Analysis, Applications
Cpt S 223. School of EECS, WSU
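[Figure: the key (john) is passed to the hash function, which returns the hash index h(john); that index selects one of the TableSize slots of the table, and the slot holds the value. The figure also asks "How to determine ?"]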
Hash Table
Hash table (implemented as a vector)
Element: a data record consisting of a hash key and a value
Insert: T[h(john)] = <john,25000>
Delete: T[h(john)] = NULL
Search
Hash function
Table size
Hash Function
Collisions cannot be avoided, but their chances can be reduced by using a good hash function
Potential problems:
Anagrams will map to the same index: h(abcd) == h(dbac)
Short strings may not use the whole table: Strlen(S) * 255 < TableSize
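The anagram problem is easy to reproduce. The sketch below assumes the approach under discussion sums the character codes of the key and reduces the sum modulo the table size; the name `additiveHash` and the table size used in the usage note are illustrative and not taken from the slides.

```cpp
#include <cstddef>
#include <string>

// Hypothetical additive hash (an assumption for illustration):
// sum the character codes, then reduce modulo the table size.
std::size_t additiveHash(const std::string& key, std::size_t tableSize) {
    std::size_t sum = 0;
    for (unsigned char c : key) {
        sum += c;  // the position of each character is ignored
    }
    return sum % tableSize;
}
```

With a table size of 10007, `additiveHash("abcd", 10007)` and `additiveHash("dbac", 10007)` return the same index, because both strings contain the same characters.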
Approach 2
Potential problems:
Assumes first 3 characters randomly distributed
Approach 3
Use all N characters of string as an N-digit base-K number
$h(S) = \left( \sum_{i=0}^{L-1} S[L-i-1] \cdot 37^{i} \right) \bmod \text{TableSize}$, where L is the length of S
Problems:
Limit L for long strings
Larger runtime
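The polynomial in Approach 3 can be evaluated with Horner's rule, one multiplication per character. The sketch below is one possible way to do it, not necessarily the slides' code: the name `polyHash` is hypothetical, and reducing modulo the table size at every step is a choice made here to avoid overflow.

```cpp
#include <cstddef>
#include <string>

// Horner's rule evaluation of sum_{i=0}^{L-1} S[L-i-1] * 37^i mod tableSize.
// Reducing after every step keeps the intermediate value small; the result
// equals the full sum taken modulo tableSize.
std::size_t polyHash(const std::string& key, std::size_t tableSize) {
    std::size_t hash = 0;
    for (unsigned char c : key) {
        hash = (37 * hash + c) % tableSize;
    }
    return hash;
}
```

Unlike a hash that only sums character codes, this one depends on character order, so `polyHash("abcd", 10007)` and `polyHash("dbac", 10007)` differ.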
Techniques to Deal with Collisions
Chaining
Open addressing
Double hashing
Etc.
Resolving Collisions
==> collision!
Chaining
Open addressing: store colliding keys elsewhere in the table
Chaining
Collision resolution technique #1
h(k) = k mod 10
Insert first 10 perfect squares
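The perfect-squares example can be reproduced with a vector of linked lists. This is a minimal sketch: it assumes the "first 10 perfect squares" are 0^2 through 9^2, and the function name `chainPerfectSquares` is illustrative.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Build a chained table from the squares 0^2 .. 9^2 (an assumed reading of
// "first 10 perfect squares") using h(k) = k mod tableSize.
std::vector<std::list<int>> chainPerfectSquares(std::size_t tableSize = 10) {
    std::vector<std::list<int>> table(tableSize);
    for (int i = 0; i < 10; ++i) {
        int key = i * i;
        // Colliding keys share a slot; new keys go to the front of the chain.
        table[static_cast<std::size_t>(key) % tableSize].push_front(key);
    }
    return table;
}
```

With a table size of 10, slots 1, 4, 6, and 9 each hold two keys, slots 0 and 5 hold one, and the remaining slots stay empty.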
Implementation of Chaining
Hash Table
Vector of linked lists (this is the main hashtable)
Current #elements in the hashtable
Hash functions for integers and string keys
This is the hashtable's current capacity (aka table size)
Duplicate check
Each of these operations takes time linear in the length of the list at the hashed index location
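The slide code for the chaining implementation did not survive extraction. The sketch below shows one way such a table could look; the class name `ChainedHashTable`, the default size, and the use of `std::hash` are assumptions made here, not the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

// Separate chaining: a vector of linked lists plus a count of stored elements.
template <typename Key>
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t size = 101) : lists_(size), currentSize_(0) {}

    bool contains(const Key& x) const {
        const std::list<Key>& chain = lists_[indexOf(x)];
        return std::find(chain.begin(), chain.end(), x) != chain.end();
    }

    // Returns false and changes nothing if x is already present.
    bool insert(const Key& x) {
        std::list<Key>& chain = lists_[indexOf(x)];
        if (std::find(chain.begin(), chain.end(), x) != chain.end()) {
            return false;  // duplicate check
        }
        chain.push_back(x);
        ++currentSize_;
        return true;
    }

    bool remove(const Key& x) {
        std::list<Key>& chain = lists_[indexOf(x)];
        typename std::list<Key>::iterator it = std::find(chain.begin(), chain.end(), x);
        if (it == chain.end()) {
            return false;
        }
        chain.erase(it);
        --currentSize_;
        return true;
    }

    std::size_t size() const { return currentSize_; }

private:
    std::size_t indexOf(const Key& x) const {
        return std::hash<Key>()(x) % lists_.size();
    }

    std::vector<std::list<Key>> lists_;  // the main hash table
    std::size_t currentSize_;            // current number of elements
};
```

Each of `contains`, `insert`, and `remove` walks only the one list selected by the hash index, which is why its cost is linear in that list's length.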
Hash function to handle Employee object type
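The Employee code itself is not recoverable from the slide. As a hedged illustration, a user-defined type can be hashed by delegating to the hash of one of its fields; the `Employee` fields and the `EmployeeHash` functor below are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical Employee record; the fields are assumptions for illustration.
struct Employee {
    std::string name;
    double salary;

    // Two employees are treated as equal when their names match.
    bool operator==(const Employee& rhs) const { return name == rhs.name; }
};

// Hash an Employee by its name, so that equal employees hash equally.
struct EmployeeHash {
    std::size_t operator()(const Employee& e) const {
        return std::hash<std::string>()(e.name);
    }
};
```

The important design point is that the hash and the equality test agree: any two objects that compare equal must produce the same hash value.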
Collision Resolution by Chaining: Analysis
N = number of elements in T (current size)
M = size of T (table size)
λ = N/M (load factor)
Potential disadvantages of Chaining
Linked lists could get long
Especially when N approaches M
Longer linked lists could negatively impact performance
More memory because of pointers
Absolute worst-case (even if N << M):
All N elements in one linked list!
Typically the result of a bad hash function
Open Addressing
Collision resolution technique #2
Collision Resolution by Open Addressing
An in-place approach
Disadvantages
Linear Probing
Linear probing: ith probe index = 0th probe index + i
E.g., f(i) = i
h_i(x) = (h(x) + i) mod TableSize
[Figure: the 0th, 1st, and 2nd probes find occupied slots; x is populated at the 3rd probe]
Continue until an empty slot is found
#failed probes is a measure of performance
Linear Probing
ith probe index = 0th probe index + i
h_0(89) = (h(89)+f(0)) mod 10 = 9
h_0(18) = (h(18)+f(0)) mod 10 = 8
h_0(49) = (h(49)+f(0)) mod 10 = 9 (X)
h_1(49) = (h(49)+f(1)) mod 10 = (h(49)+1) mod 10 = 0
#unsuccessful probes: 7 total
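The linear probing example can be traced in code. The sketch below assumes h(x) = x mod TableSize and assumes that the keys after 89, 18, and 49 are 58 and 69, as in the common textbook version of this example; the function name and the use of -1 as an empty marker are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Insert keys with linear probing, h_i(x) = (x mod m + i) mod m, and return
// the total number of failed probes (probes that hit an occupied slot).
// Assumes non-negative keys, fewer keys than slots, and -1 as "empty".
int linearProbeInsertAll(const std::vector<int>& keys, std::vector<int>& table) {
    const std::size_t m = table.size();
    int failedProbes = 0;
    for (int key : keys) {
        std::size_t pos = static_cast<std::size_t>(key) % m;
        while (table[pos] != -1) {  // occupied: try the next slot
            ++failedProbes;
            pos = (pos + 1) % m;
        }
        table[pos] = key;
    }
    return failedProbes;
}
```

For a table of 10 empty slots and the keys 89, 18, 49, 58, 69, the failed probes per key are 0, 0, 1, 3, 3, which matches the total of 7.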
Primary clustering
Expected number of probes for insertion or unsuccessful search:
$\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right)$
Expected number of probes for successful search:
$\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
Example (λ = 0.5)
Insert / unsuccessful search: 2.5 probes
Successful search: 1.5 probes
Example (λ = 0.9)
Insert / unsuccessful search: 50.5 probes
Successful search: 5.5 probes
[Figure: # probes versus load factor for linear probing and random probing. U - unsuccessful search, S - successful search, I - insert. The plot marks good and bad regions.]
Quadratic Probing
Quadratic probing: probe sequence +0, +1, +4, +9, +16, ...
[Figure: successive probes (0th, 1st, 2nd, 3rd) step over occupied slots]
Quadratic Probing
Example:
h_0(58) = (h(58)+f(0)) mod 10 = 8 (X)
h_1(58) = (h(58)+f(1)) mod 10 = 9 (X)
h_2(58) = (h(58)+f(2)) mod 10 = 2
[Figure: probe offsets used for each inserted key, from +0^2, +1^2, +2^2]
#unsuccessful probes: 5 total
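The quadratic probing example can be traced the same way as the linear one. The sketch assumes f(i) = i^2, h(x) = x mod TableSize, and the key sequence 89, 18, 49, 58, 69 (only 58 appears explicitly in the surviving text; the others are taken from the common textbook version of this example). The function name and the -1 empty marker are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Insert keys with quadratic probing, h_i(x) = (x mod m + i*i) mod m, and
// return the total number of failed probes. Assumes non-negative keys and
// -1 as "empty". Termination is only guaranteed under conditions such as a
// prime table size that is at most half full; this small demo stays within
// a range where every key finds a slot.
int quadraticProbeInsertAll(const std::vector<int>& keys, std::vector<int>& table) {
    const std::size_t m = table.size();
    int failedProbes = 0;
    for (int key : keys) {
        const std::size_t home = static_cast<std::size_t>(key) % m;
        std::size_t i = 0;
        std::size_t pos = home;
        while (table[pos] != -1) {  // occupied: jump by the next square
            ++failedProbes;
            ++i;
            pos = (home + i * i) % m;
        }
        table[pos] = key;
    }
    return failedProbes;
}
```

For a table of 10 empty slots the failed probes per key are 0, 0, 1, 2, 2, giving the total of 5, with 58 landing in slot 2.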
Difficult to analyze
Theorem 5.1
Quadratic Probing
Deletion
Emptying slots can break the probe sequence and could cause find to stop prematurely
Lazy deletion
Differentiate between empty and deleted slots
When finding, skip and continue beyond deleted slots
Quadratic Probing: Implementation
Lazy deletion
Ensure table size is prime
Find
Skip DELETED; no duplicates
Quadratic probe sequence (really)
Insert
No duplicates
Remove
No deallocation needed
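The implementation code on these slides was lost in extraction. The following is a minimal sketch of quadratic probing with lazy deletion, written under stated assumptions: non-negative `int` keys, a prime table size, no rehashing, and few enough insertions that the table stays at most half full. The class and member names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

class QuadraticProbingTable {
    enum State { EMPTY, ACTIVE, DELETED };
    struct Slot {
        int key;
        State state;
    };

public:
    // primeSize is assumed to be prime; it is not checked here.
    explicit QuadraticProbingTable(std::size_t primeSize = 11)
        : slots_(primeSize, Slot{0, EMPTY}) {}

    bool contains(int x) const { return slots_[findPos(x)].state == ACTIVE; }

    bool insert(int x) {
        const std::size_t pos = findPos(x);
        if (slots_[pos].state == ACTIVE) {
            return false;  // no duplicates
        }
        slots_[pos] = Slot{x, ACTIVE};
        return true;
    }

    bool remove(int x) {
        const std::size_t pos = findPos(x);
        if (slots_[pos].state != ACTIVE) {
            return false;
        }
        slots_[pos].state = DELETED;  // lazy deletion: no deallocation needed
        return true;
    }

private:
    // Probe h, h+1, h+4, h+9, ... until an EMPTY slot or the key is found.
    // Slots that hold a different key are skipped even when DELETED, so an
    // earlier removal cannot cut a search short.
    std::size_t findPos(int x) const {
        std::size_t offset = 1;
        std::size_t pos = static_cast<std::size_t>(x) % slots_.size();
        while (slots_[pos].state != EMPTY && slots_[pos].key != x) {
            pos = (pos + offset) % slots_.size();  // i^2 - (i-1)^2 = 2i - 1
            offset += 2;
        }
        return pos;
    }

    std::vector<Slot> slots_;
};
```

Adding the odd numbers 1, 3, 5, ... to the position is what makes the total offset from the home slot equal to 1, 4, 9, ..., so the probe sequence really is quadratic.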
Previous example with R=7
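The double hashing slides are mostly lost, so the following is a sketch under assumptions rather than a reconstruction: it uses the commonly taught second hash h2(x) = R - (x mod R) and the key sequence 89, 18, 49, 58, 69 from the common textbook example, neither of which is stated in the surviving text. The function name and the -1 empty marker are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Double hashing: h_i(x) = (x mod m + i * h2(x)) mod m, with the assumed
// second hash h2(x) = R - (x mod R), which is never zero. Returns the total
// number of failed probes. Assumes non-negative keys and -1 as "empty".
// Termination is only guaranteed when the step and m share no common factor
// (for example, when m is prime); this small demo happens to terminate.
int doubleHashInsertAll(const std::vector<int>& keys, std::vector<int>& table, int R) {
    const std::size_t m = table.size();
    int failedProbes = 0;
    for (int key : keys) {
        std::size_t pos = static_cast<std::size_t>(key) % m;
        const std::size_t step = static_cast<std::size_t>(R - key % R);
        while (table[pos] != -1) {  // occupied: advance by the key's own step
            ++failedProbes;
            pos = (pos + step) % m;
        }
        table[pos] = key;
    }
    return failedProbes;
}
```

With R = 7 and a table of 10 slots, 49 moves by 7 to slot 6, 58 moves by 5 to slot 3, and 69 moves by 1 to slot 0, so each colliding key needs only one extra probe.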
[Figure: slots visited on the 0th, 1st, 2nd, and 3rd try under linear probing and under double hashing*]
*(determined by a second hash function)
Rehashing
Rehashing Example
h(x) = x mod 7, λ = 0.57
Insert 23: λ = 0.71
Rehashing
h(x) = x mod 17, λ = 0.29
Rehashing Analysis
Must have been N/2 insertions since last rehash
Amortizing the O(N) cost over the N/2 prior insertions yields only constant additional time per insertion
Rehashing Implementation
When to rehash
Rehashing for
Quadratic Probing
hash_set
hash_map
// hash_set template arguments: Key, Hash fn (eqstr is the key equality test)
int main()
{
  hash_set<const char*, hash<const char*>, eqstr> Set;
  Set.insert("kiwi");
  lookup(Set, "kiwi");  // eqstr and lookup() are assumed to be defined elsewhere
}
// hash_map template arguments: Key, Data, Hash fn, Key equality test
int main()
{
  hash_map<const char*, int, hash<const char*>, eqstr> months;
  // operator[] is internally treated like insert (or overwrite if key already present)
  months["january"] = 31;
  months["february"] = 28;
  months["december"] = 31;
  cout << "january -> " << months["january"] << endl;
}
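`hash_set` and `hash_map` are pre-standard STL extensions. As a swapped-in alternative rather than the slides' own code, the same months example can be sketched with the standard C++11 container `std::unordered_map`, using `std::string` keys so that no custom hash or equality functor is needed. The function name `makeMonths` and the final overwrite are illustrative additions.

```cpp
#include <string>
#include <unordered_map>

// Build a small month-length table with the standard hash map.
std::unordered_map<std::string, int> makeMonths() {
    std::unordered_map<std::string, int> months;
    months["january"] = 31;   // operator[] inserts the key if it is absent
    months["february"] = 28;
    months["december"] = 31;
    months["february"] = 29;  // and overwrites the value if the key is present
    return months;
}
```

After the call the map holds three entries, and the second assignment to "february" has replaced 28 with 29 instead of adding a fourth entry.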
But:
Collisions require disk accesses
Rehashing requires a lot of disk accesses
Solution: Extendible Hashing
Summary