Hashing: CSE225: Data Structures and Algorithms
Searching
• Consider the problem of searching an array for a given
value
• If the array is not sorted, the search requires O(n) time
• If the value isn’t there, we need to search all n elements
• If the value is there, we search n/2 elements on average
• If the array is sorted, we can do a binary search
• A binary search requires O(log n) time
• About equally fast whether the element is found or not
• It doesn’t seem like we could do much better
• How about an O(1), that is, constant time search?
• We can do it if the array is organized in a particular way
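The two searches described above can be sketched as follows. This is a minimal illustration in Python; the function names `linear_search` and `binary_search` are chosen here for the example and do not come from the slides.

```python
def linear_search(values, target):
    """Scan every element in order: O(n) comparisons in the worst case."""
    for index, value in enumerate(values):
        if value == target:
            return index
    return -1  # target is not present, all n elements were examined


def binary_search(sorted_values, target):
    """Halve the search range at each step: O(log n), requires sorted input."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1  # range became empty, target is not present
```

Both functions return the index of the target, or -1 if it is absent. Neither reaches constant time, which is the gap hashing aims to close.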
Hashing
• Suppose we were to come up with a “magic function”
that, given a value to search for, would tell us exactly
where in the array to look
• If it’s in that location, it’s in the array
• If it’s not in that location, it’s not in the array
• This function would have no other purpose
• This function is called a hash function because it “makes
hash” of its inputs
Hashing
• Each item has a key
• A hash function is a function that:
• When applied to a key, returns a number
• When applied to equal keys, returns the same number for each
• When applied to unequal keys, is very unlikely to return the
same number for each
• Hash functions turn out to be very important for
searching, that is, looking things up fast
• This is their story....
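As a rough illustration of the properties listed above, here is one possible hash function for string keys. The name `simple_hash`, the multiplier 31, and the 32-bit range are assumptions made for this sketch; the slides do not prescribe a particular function.

```python
def simple_hash(key):
    """Illustrative polynomial hash over the characters of a string key."""
    h = 0
    for ch in key:
        # Mix in each character; keep the result within 32 bits.
        h = (h * 31 + ord(ch)) % 2**32
    return h


# Equal keys always produce the same number.
same = simple_hash("alice") == simple_hash("alice")

# Unequal keys usually produce different numbers, but this is only
# likely, never guaranteed.
different = simple_hash("alice") != simple_hash("alicf")
```

The function is deterministic, so equal keys must hash equally. The "very unlikely to collide" property depends on how well the function spreads keys over its output range.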
Hashing
• Hash table using direct addressing:
• Use the key itself for indexing
• The record with key i is stored at the ith index of the array
What if we used the agent’s cell phone number as the key? We would need an array
of 10^11 elements just to store 7 records.
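Direct addressing can be sketched as below. The class name `DirectAddressTable` and its methods are hypothetical names for this example, and the universe is kept small on purpose, since allocating one slot per possible phone number is exactly the cost the slide warns about.

```python
class DirectAddressTable:
    """Stores the record with key i at index i of an array."""

    def __init__(self, universe_size):
        # One slot for every possible key, whether it is used or not.
        self.slots = [None] * universe_size

    def _check(self, key):
        if not 0 <= key < len(self.slots):
            raise KeyError(key)

    def insert(self, key, record):
        self._check(key)
        self.slots[key] = record

    def search(self, key):
        self._check(key)
        return self.slots[key]  # None means no record with this key

    def delete(self, key):
        self._check(key)
        self.slots[key] = None
```

Every operation is a single array access, so it runs in O(1) time. The memory cost is proportional to the size of the key universe, not to the number of records stored.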
Hashing
• Solution: Modify the hash function as (Key % 100), where 100 is the array size
• Different keys can now map to the same index; this is called a collision
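[Figure: a hash function h maps the actual keys K = {k1, k2, k3, k4, k5}, a subset of the universe of keys U, to slots 0 through m-1 of the hash table; k2 and k5 collide because h(k2) = h(k5)]
• h : U → {0, 1, ..., m - 1}
• Hash table size: m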
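The modulo hash function and the collision it can cause might look like this in code. The sample keys below are hypothetical values chosen so that two of them share their last two digits; `TABLE_SIZE` plays the role of m.

```python
TABLE_SIZE = 100  # m, the hash table size


def h(key):
    """Map a key from the universe U to a slot in {0, 1, ..., m - 1}."""
    return key % TABLE_SIZE


# Hypothetical keys for illustration only.
keys = [54802, 12303, 77003]

# Group the keys by the slot they hash to.
buckets = {}
for key in keys:
    buckets.setdefault(h(key), []).append(key)

# Any slot that receives more than one key is a collision.
collisions = {slot: ks for slot, ks in buckets.items() if len(ks) > 1}
```

Here 12303 and 77003 both end in 03, so both land in slot 3. The table needs only 100 slots, but it must now have a rule for resolving collisions.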
Linear Probing
• Insert 77003
• Collision occurs
• Insert 77003 at the next available spot (treat the array in a
circular way)
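A minimal sketch of insertion and search with linear probing follows, assuming the (Key % size) hash function from earlier. The class name `LinearProbingTable` is hypothetical, and deletion is left out because it needs extra bookkeeping that the slides do not cover here.

```python
class LinearProbingTable:
    """Open-addressing hash table that resolves collisions by linear probing."""

    def __init__(self, size=100):
        self.size = size
        self.slots = [None] * size

    def _home(self, key):
        return key % self.size

    def insert(self, key):
        """Place key in the first free slot at or after its home slot."""
        start = self._home(key)
        for step in range(self.size):
            index = (start + step) % self.size  # wrap around circularly
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise OverflowError("hash table is full")

    def search(self, key):
        """Follow the same probe sequence; an empty slot means 'not found'."""
        start = self._home(key)
        for step in range(self.size):
            index = (start + step) % self.size
            if self.slots[index] is None:
                return -1
            if self.slots[index] == key:
                return index
        return -1
```

For example, if a hypothetical key 12303 already occupies slot 3, then inserting 77003 collides at slot 3 and ends up in slot 4. A key whose home slot is 99 and finds it occupied wraps around to slot 0.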
Issues with Linear Probing
• Leads to the problem of clustering: elements tend to cluster
in dense intervals in the array, so later insertions and searches
must probe past longer runs of occupied slots.
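The effect of clustering can be made visible by counting how many slots each insertion has to examine. This is a small sketch with hypothetical keys; `insert_with_probe_count` is a name invented for the example.

```python
def insert_with_probe_count(slots, key):
    """Insert key by linear probing and return the number of slots examined."""
    size = len(slots)
    start = key % size
    for step in range(size):
        index = (start + step) % size
        if slots[index] is None:
            slots[index] = key
            return step + 1
    raise OverflowError("hash table is full")


slots = [None] * 100

# Hypothetical keys: 3, 103 and 203 share home slot 3,
# while 4 and 5 have their own home slots.
probe_counts = [insert_with_probe_count(slots, k) for k in [3, 103, 203, 4, 5]]
# probe_counts is [1, 2, 3, 3, 3]
```

Keys 4 and 5 do not share a hash value with any other key, yet each needs three probes because the cluster started at slot 3 has already spread over their home slots. Each such insertion also extends the cluster, which makes the next collision more likely.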