Hash Tables: Dr. Dibakar Saha
❑ In a simple array implementation, given a key k, we find the element whose key is k by just looking in the kth position of the array.
❑ If we have fewer locations than possible keys, a simple array implementation is not enough.
❑ A hash table (or hash map) is a data structure that stores keys and their associated values.
❑ A hash table uses a hash function to map each key to a slot of the table, where the key and its associated value are stored.
❑ We use a hash table when the number of keys actually stored is small
relative to the number of possible keys.
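A minimal sketch of the simple array (direct-address) idea from the first bullet, assuming non-negative integer keys smaller than the array size. The class name DirectAddressTable is hypothetical and only used for illustration.

```python
class DirectAddressTable:
    """Hypothetical helper: an array indexed directly by the key."""

    def __init__(self, size):
        self.slots = [None] * size  # one slot for every possible key 0..size-1

    def insert(self, key, value):
        self.slots[key] = value  # the key itself is the index

    def search(self, key):
        return self.slots[key]  # O(1): look in the kth position


table = DirectAddressTable(10)
table.insert(7, "seven")
print(table.search(7))  # seven
print(table.search(3))  # None
```

The array needs one slot per possible key, which is exactly what breaks down when there are far more possible keys than locations; a hash table instead maps each key to one of M slots.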
Hash Function
❑ The hash function is used to transform the key into an index. Ideally, the hash function should map each possible key to a unique slot index, but this is difficult to achieve in practice.
❑Given a collection of elements, a hash function that maps each item into a unique
slot is referred to as a perfect hash function.
❑If we know the elements and the collection will never change, then it is possible to
construct a perfect hash function.
Example
Key values: 3, 2, 9, 6, 11, 13, 7, 12
We can use the key as the index of the hash table: with a hash table of size M = 15 (slots 0 to 14), each key k is stored in slot k.
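The example can be checked with a short sketch that uses the key itself as the index (an identity hash) in a table of size M = 15. Because the eight keys are distinct and all smaller than 15, no two keys share a slot, so this identity function is a perfect hash function for this fixed collection.

```python
M = 15
keys = [3, 2, 9, 6, 11, 13, 7, 12]

def h(key):
    # Identity hash: the key is used directly as the slot index
    return key

table = [None] * M
for k in keys:
    table[h(k)] = k

print(table)
# [None, None, 2, 3, None, None, 6, 7, None, 9, None, 11, 12, 13, None]
```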
Example
Key values 3, 2, 9, 6, 11, 13, 7, 12 are stored at their own indices in a hash table of size M = 15, so slots 2, 3, 6, 7, 9, 11, 12 and 13 are occupied.
Now, what if a new key value 20 appears? Slot 20 does not exist in a table of size 15 (size issue). So, what do we do?
Or what if the same key re-appears, for example 6? Slot 6 is already occupied.
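The two problems can be made concrete with a small sketch that keeps using the key as the index. The helper try_insert is hypothetical; it reports the problem as a string instead of raising an error, and it assumes non-negative integer keys.

```python
M = 15
table = [None] * M
for k in [3, 2, 9, 6, 11, 13, 7, 12]:
    table[k] = k  # identity hash: key used as the slot index

def try_insert(key):
    """Hypothetical helper: store key at slot key, or report why it cannot."""
    if key >= M:
        return f"size issue: slot {key} does not exist (M = {M})"
    if table[key] is not None:
        return f"collision: slot {key} already occupied"
    table[key] = key
    return f"stored {key} in slot {key}"

print(try_insert(20))  # size issue: slot 20 does not exist (M = 15)
print(try_insert(6))   # collision: slot 6 already occupied
print(try_insert(4))   # stored 4 in slot 4
```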
Hash Function
❑There is no systematic way to construct a perfect hash function.
❑One way to always have a perfect hash function is to increase the size of the hash
table so that each possible value in the element range can be accommodated.
❑Although this is practical for small numbers of elements, it is not feasible when the
number of possible elements is large.
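To see why enlarging the table to cover the whole key range stops being feasible, here is the arithmetic for one assumed case: integer keys that may take any 32-bit value, and 8 bytes per slot. Both figures are assumptions chosen for illustration.

```python
# Direct indexing of every possible 32-bit integer key (illustrative arithmetic)
possible_keys = 2 ** 32
bytes_per_slot = 8  # assumed: one 64-bit reference per slot
total_bytes = possible_keys * bytes_per_slot

print(total_bytes / 2 ** 30, "GiB")  # 32.0 GiB
```

A table that large is needed even if only a handful of keys are ever stored, which is why hashing into a much smaller table of size M is used instead.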
Characteristics of Good Hash Functions
❑ Minimize collisions.
❑ The load factor of a non-empty hash table is the number of items stored in the table divided by the size of the table.
❑ It is the parameter used to decide when to rehash, i.e., expand the table and reinsert the existing entries.
❑ It measures how full the table is: even with a hash function that distributes keys uniformly, a higher load factor makes collisions more likely.
❑ There are a number of collision resolution techniques, and the most popular are direct chaining and
open addressing.
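A minimal sketch of the load factor and the rehash decision. The threshold MAX_LOAD = 0.75 is an assumption for illustration; the slides do not fix a value.

```python
def load_factor(num_items, table_size):
    # load factor = number of items stored / size of the table
    return num_items / table_size

MAX_LOAD = 0.75  # assumed threshold, not specified in the slides

def needs_rehash(num_items, table_size, max_load=MAX_LOAD):
    # Expand the table and reinsert the entries once it becomes too full
    return load_factor(num_items, table_size) > max_load

# Eight keys stored in a table of size M = 10
print(load_factor(8, 10))   # 0.8
print(needs_rehash(8, 10))  # True
```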
Collision resolution: Open Hashing (direct chaining) and Closed Hashing (open addressing)
❑ In direct chaining, when two or more records hash to the same location, these records are linked into a singly linked list called a chain.
Chaining Example
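Key values: 3, 2, 9, 6, 11, 13, 7, 12
Hash function: h(ki) = 2ki + 3, table size M = 10, location = h(ki) % M

Key   h(ki) = 2ki + 3   h(ki) % M     Location
3     2*3+3 = 9         9 % 10 = 9    9
2     2*2+3 = 7         7 % 10 = 7    7
9     2*9+3 = 21        21 % 10 = 1   1
6     2*6+3 = 15        15 % 10 = 5   5
11    2*11+3 = 25       25 % 10 = 5   5
13    2*13+3 = 29       29 % 10 = 9   9
7     2*7+3 = 17        17 % 10 = 7   7
12    2*12+3 = 27       27 % 10 = 7   7

Resulting chains (slots 0, 2, 3, 4, 6 and 8 stay empty):
1: 9
5: 6 -> 11
7: 2 -> 7 -> 12
9: 3 -> 13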
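A minimal sketch of direct chaining that reproduces this example with h(k) = (2k + 3) % M and M = 10. The class and method names (Node, ChainedHashTable, home_slot, chain) are hypothetical. New keys are appended at the tail so each chain keeps the insertion order shown above; inserting at the head instead is also common and avoids walking the chain.

```python
class Node:
    """One node of a singly linked chain."""
    def __init__(self, key):
        self.key = key
        self.next = None


class ChainedHashTable:
    def __init__(self, size):
        self.size = size
        self.heads = [None] * size  # head of the chain for each slot

    def home_slot(self, key):
        # h(k) = 2k + 3, reduced modulo the table size M
        return (2 * key + 3) % self.size

    def insert(self, key):
        slot = self.home_slot(key)
        new = Node(key)
        if self.heads[slot] is None:
            self.heads[slot] = new
            return
        node = self.heads[slot]
        while node.next is not None:  # walk to the tail of the chain
            node = node.next
        node.next = new

    def search(self, key):
        node = self.heads[self.home_slot(key)]
        while node is not None:  # only this one chain has to be scanned
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, slot):
        out, node = [], self.heads[slot]
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


table = ChainedHashTable(10)
for k in [3, 2, 9, 6, 11, 13, 7, 12]:
    table.insert(k)

for slot in range(table.size):
    if table.heads[slot] is not None:
        print(slot, table.chain(slot))
# 1 [9]
# 5 [6, 11]
# 7 [2, 7, 12]
# 9 [3, 13]
```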
Open Addressing
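In open addressing, all keys are stored in the hash table itself. This approach is also known as closed hashing. Collisions are resolved by probing: alternative slots are examined in a fixed sequence until an empty one is found.
Linear Probing
rehash(key) = (n + 1) % M, where n is the index of the slot just probed and M is the table size; the search wraps around from the last slot to slot 0.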
❑ One of the problems with linear probing is that table items tend to cluster together in the
hash table.
❑ This means that the table contains groups of consecutively occupied locations, called clusters. Clusters can get close to one another and merge into a larger cluster.
❑ Thus, one part of the table might be quite dense, even though another part has relatively few items.
❑ Clustering causes long probe searches and therefore decreases the overall efficiency.
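A minimal linear probing sketch using rehash(key) = (n + 1) % M. For comparison it reuses the keys and hash function of the chaining example, h(k) = (2k + 3) % M with M = 10. The function names are hypothetical, and deletion is not handled (it would need special "deleted" markers so searches do not stop early).

```python
M = 10

def h(key):
    # Same hash as the chaining example: h(k) = (2k + 3) % M
    return (2 * key + 3) % M

def insert(table, key):
    """Linear probing: try n, n+1, n+2, ... (mod M). Returns the number of probes."""
    n = h(key)
    for probes in range(1, M + 1):
        if table[n] is None:
            table[n] = key
            return probes
        n = (n + 1) % M  # rehash(key) = (n + 1) % M
    raise RuntimeError("hash table is full")

def search(table, key):
    n = h(key)
    for _ in range(M):
        if table[n] is None:
            return None  # an empty slot ends the probe sequence
        if table[n] == key:
            return n
        n = (n + 1) % M
    return None

table = [None] * M
for k in [3, 2, 9, 6, 11, 13, 7, 12]:
    print(k, "probes:", insert(table, k))
print(table)  # [13, 9, 12, None, None, 6, 11, 2, 7, 3]
```

The occupied slots 5 to 9 and 0 to 2 form one cluster (it wraps around the end of the table), so key 12 needs 6 probes even though the table still has free slots: this is the clustering effect described above.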
Quadratic Probing
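The interval between successive probes grows linearly with the probe number k (offsets 1, 4, 9, ... give gaps of 1, 3, 5, ...), so the probed indices are described by a quadratic function.
rehash(key) = (n + k^2) % M, where n = h(key) is the home slot of the key and k = 1, 2, 3, ... is the probe number.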
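A minimal quadratic probing sketch using (n + k^2) % M. It assumes M = 11 and a primary hash h(k) = k % M; the keys 22, 33, 44 and 55 are hypothetical, chosen because they all hash to slot 0.

```python
M = 11  # assumed table size (prime)

def h(key):
    # Assumed primary hash for illustration: h(k) = k % M
    return key % M

def insert(table, key):
    """Quadratic probing: try (n + k^2) % M for k = 0, 1, 2, ... Returns the slot used."""
    n = h(key)  # home slot; stays fixed for the whole probe sequence
    for k in range(M):
        slot = (n + k * k) % M
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found on the quadratic probe sequence")

table = [None] * M
for key in [22, 33, 44, 55]:  # hypothetical keys that all hash to slot 0
    print(key, "->", insert(table, key))
# 22 -> 0
# 33 -> 1
# 44 -> 4
# 55 -> 9
```

Colliding keys jump 1, 4, 9, ... slots away from their home slot instead of filling the neighbouring slots, which breaks up the clusters that linear probing builds. Keys with the same home slot still follow the same sequence, and the probe sequence may miss free slots; with a prime M and a load factor of at most 1/2, a free slot is always found.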
Example: Let us assume that the table size is 11 (0..10)
Double Hashing
❑ The increments for the probing sequence are computed by using a second hash function.
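A minimal double hashing sketch, in which the probe increment comes from a second hash function: slot = (h1(key) + i * h2(key)) % M. It assumes M = 11, h1(k) = k % M and h2(k) = 7 - (k % 7); these choices and the keys are hypothetical. h2 never returns 0, so the probe always moves, and because M is prime every step size visits all slots.

```python
M = 11  # assumed prime table size

def h1(key):
    # Assumed primary hash: the home slot
    return key % M

def h2(key):
    # Assumed second hash: step size in 1..7, never 0
    return 7 - (key % 7)

def insert(table, key):
    """Double hashing: try (h1 + i * h2) % M for i = 0, 1, 2, ... Returns the slot used."""
    home, step = h1(key), h2(key)
    for i in range(M):
        slot = (home + i * step) % M
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

table = [None] * M
for key in [22, 33, 44, 55]:  # hypothetical keys that all share home slot 0
    print(key, "->", insert(table, key))
# 22 -> 0
# 33 -> 2
# 44 -> 5
# 55 -> 1
```

Unlike quadratic probing, keys that share a home slot get different step sizes (6, 2, 5 and 1 here), so they follow different probe sequences.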