Hashing: CSE225: Data Structures and Algorithms

This document discusses hashing techniques for improving search time in arrays. It introduces the concept of a hash function that maps keys to array indices, allowing constant-time lookups. However, collisions may occur when different keys map to the same index. Linear probing and chaining are two approaches for dealing with collisions that are discussed. Linear probing resolves collisions by searching the next array index, but can lead to clustering. Chaining instead stores collided items in a linked list at each index to avoid clustering issues. Rehashing functions are also introduced to map keys to new indices on collisions.

Lecture 17

Hashing
CSE225: Data Structures and Algorithms
Searching
• Consider the problem of searching an array for a given
value
• If the array is not sorted, the search requires O(n) time
• If the value isn’t there, we need to search all n elements
• If the value is there, we search n/2 elements on average
• If the array is sorted, we can do a binary search
• A binary search requires O(log n) time
• About equally fast whether the element is found or not
• It doesn’t seem like we could do much better
• How about an O(1), that is, constant time search?
• We can do it if the array is organized in a particular way
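The two baseline searches above can be sketched in Python as follows. The function names are illustrative, not from the lecture; the binary search leans on the standard `bisect` module and assumes the input is already sorted.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Scan elements one by one: O(n) comparisons in the worst case."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1  # not found after checking all n elements


def binary_search(sorted_items, target):
    """Halve the search range each step: O(log n); input must be sorted."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1  # not found
```

Both return the index of the target, or -1 if it is absent.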
Hashing
• Suppose we were to come up with a “magic function”
that, given a value to search for, would tell us exactly
where in the array to look
• If it’s in that location, it’s in the array
• If it’s not in that location, it’s not in the array
• This function would have no other purpose
• This function is called a hash function because it “makes
hash” of its inputs
Hashing
• Each item has a key
• A hash function is a function that:
• When applied to a key, returns a number
• When applied to equal keys, returns the same number for each
• When applied to unequal keys, is very unlikely to return the
same number for each
• Hash functions turn out to be very important for
searching, that is, looking things up fast
• This is their story....
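As a rough illustration of the properties listed above, here is a minimal sketch of a hash function for string keys. The name `simple_hash`, the multiplier 31, and the 32-bit range are assumptions chosen for the example; the lecture does not prescribe a particular function.

```python
def simple_hash(key: str) -> int:
    """Polynomial hash of a string (illustrative; not from the lecture).

    Equal keys always give the same number; unequal keys usually
    give different numbers, though collisions remain possible.
    """
    value = 0
    for ch in key:
        # Mix in each character; keep the result within 32 bits.
        value = (value * 31 + ord(ch)) % 2**32
    return value
```

For instance, `simple_hash("ab")` and `simple_hash("ba")` differ even though the two keys contain the same characters, because character position affects the result.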
Hashing
• Hash table using direct addressing:
• Use the key itself for indexing
• The record with key i is stored at the ith index of the array

index   ID (key)   Secret Agent
0       0          Inferno
1       1          Reaper
2       2          Vanguard
3       3          Wolfhound
4       4          Firefly
5       5          Panzer Division
6       6          Iron Horse

What if we used the agent’s cell phone number as key? We would need an array
of 10^11 elements just to store 7 records.
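Direct addressing can be sketched in a few lines of Python. The `lookup` helper is a hypothetical name, and the slot count assumes an 11-digit phone number, so that every possible number needs its own slot.

```python
# Direct addressing: the key itself is the array index.
agents = ["Inferno", "Reaper", "Vanguard", "Wolfhound",
          "Firefly", "Panzer Division", "Iron Horse"]


def lookup(table, key):
    """Return the record stored at index `key`, or None if no such slot."""
    if 0 <= key < len(table):
        return table[key]
    return None


# With an 11-digit phone number as the key, the array would need one
# slot per possible number, even though only 7 records are stored.
slots_needed = 10 ** 11
```

The lookup is a single index operation, which is what makes it O(1); the cost is the size of the array when keys are drawn from a large range.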
Hashing
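• Solution: Modify the hash function as (Key % 100), where 100 is the array size

[Figure: a hash function h maps keys from the universe of keys U to slots 0 through m-1 of the hash table. The actual keys K = {k1, k2, k3, k4, k5} are a subset of U. Keys k2 and k5 collide: h(k2) = h(k5).]

h : U → {0, 1, . . . , m - 1}
hash table size: m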
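A small Python sketch of the (Key % 100) hash function, with a helper that reports which keys collide. `TABLE_SIZE` and `find_collisions` are illustrative names, not from the lecture.

```python
TABLE_SIZE = 100  # the array size m used in Key % 100


def h(key: int) -> int:
    """Map a key to a slot in the range 0 .. TABLE_SIZE - 1."""
    return key % TABLE_SIZE


def find_collisions(keys):
    """Group keys by slot and keep only slots that hold more than one key."""
    slots = {}
    for key in keys:
        slots.setdefault(h(key), []).append(key)
    return {slot: group for slot, group in slots.items() if len(group) > 1}
```

For example, 45300 and 40000 both map to slot 0, so `find_collisions([45300, 20006, 40000])` reports them as a colliding pair.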
Linear Probing
• Insert 77003
• Collision occurs
• Insert 77003 at the next available spot (treat the array in a
circular way)
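The insertion step can be sketched as below, assuming integer keys, the Key % size hash function, and `None` as the marker for an empty slot. `insert_linear` is a hypothetical name.

```python
EMPTY = None  # marker for a slot that has never been used


def insert_linear(table, key):
    """Insert `key` using linear probing; return the slot it landed in."""
    size = len(table)
    start = key % size
    for step in range(size):
        slot = (start + step) % size  # wrap around: circular array
        if table[slot] is EMPTY:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")
```

If slot 3 of a 100-element table is already occupied (say by a hypothetical key 50003), inserting 77003 collides at slot 3 and lands in slot 4.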
Issues with Linear Probing
• Leads to the problem of clustering: elements tend to cluster
in dense intervals in the array.
• The search efficiency problem remains.
• Deletion becomes trickier.
Issues with Linear Probing
• Records may also be deleted from a hash table.
• But the location must not be left as an ordinary "empty spot" since
that could interfere with searches.
• The location must be marked in some special way so that a search
can tell that the spot used to have something in it.
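One common way to mark a deleted location is a sentinel value, often called a tombstone. The sketch below assumes integer keys that are unique, the Key % size hash function, and linear probing; `DELETED` and the function names are illustrative.

```python
EMPTY = None        # slot has never held a key
DELETED = object()  # slot used to hold a key (tombstone)


def insert(table, key):
    """Place `key` in the first EMPTY or DELETED slot of its probe sequence."""
    size = len(table)
    for step in range(size):
        slot = (key % size + step) % size
        if table[slot] is EMPTY or table[slot] is DELETED:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def search(table, key):
    """Return the slot holding `key`, or -1. Stops only at a true EMPTY slot."""
    size = len(table)
    for step in range(size):
        slot = (key % size + step) % size
        entry = table[slot]
        if entry is EMPTY:
            return -1  # key cannot be further along the probe sequence
        if entry is not DELETED and entry == key:
            return slot
    return -1


def delete(table, key):
    """Replace `key` with a tombstone so later searches keep probing past it."""
    slot = search(table, key)
    if slot != -1:
        table[slot] = DELETED
    return slot
```

If 13, 23, and 33 are inserted into a 10-slot table, they occupy slots 3, 4, and 5. After deleting 23, a search for 33 still succeeds because slot 4 is marked `DELETED` rather than `EMPTY`.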
Rehashing
• Resolving a collision by computing a new hash location from a hash
function that manipulates the original location rather than the
element's key
(HashValue + constant) % array-size
• Quadratic probing: Resolving a hash collision by using the rehashing formula
(HashValue ± I^2) % array-size, where I is the number of times that the rehash
function has been applied
• Random probing: Resolving a hash collision by generating pseudo-random
hash values in successive applications of the rehash function

Caution: Constant and array size must be relatively prime.
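The rehash formulas and the caution above can be illustrated with a short Python sketch. The function names are hypothetical, only the `+` variant of quadratic probing is shown, and HashValue is taken to be the original hash location, which is an assumption about how the formula is meant to be read.

```python
def linear_rehash(location, constant, size):
    """(HashValue + constant) % array-size."""
    return (location + constant) % size


def quadratic_probe(home, attempt, size):
    """(HashValue + I^2) % array-size, where `attempt` is I."""
    return (home + attempt ** 2) % size


def slots_visited(start, constant, size):
    """Follow linear_rehash from `start` until a slot repeats."""
    seen = []
    location = start
    while location not in seen:
        seen.append(location)
        location = linear_rehash(location, constant, size)
    return seen
```

With array size 10 and constant 3 (relatively prime), `slots_visited(0, 3, 10)` reaches all 10 slots. With constant 4, which shares the factor 2 with 10, it only ever reaches slots 0, 4, 8, 2, and 6, so half the table can never be used to resolve the collision.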


Buckets and Chaining
• Bucket: A collection of elements associated with a particular hash
location
Buckets and Chaining
• Chain: A linked list of elements that share the same hash location
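A minimal sketch of chaining, assuming integer keys and the Key % size hash function. For brevity each chain is a Python list rather than a hand-written linked list; the class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table where each slot holds a chain of colliding keys."""

    def __init__(self, size=100):
        # One independent, initially empty chain per hash location.
        self.buckets = [[] for _ in range(size)]

    def _slot(self, key):
        return key % len(self.buckets)

    def insert(self, key):
        chain = self.buckets[self._slot(key)]
        if key not in chain:  # ignore duplicates
            chain.append(key)

    def contains(self, key):
        # Only the chain at the key's hash location is searched.
        return key in self.buckets[self._slot(key)]

    def remove(self, key):
        chain = self.buckets[self._slot(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False
```

Deletion needs no special marker here: removing a key from its chain does not affect the other keys in the table.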
Comparison Between Linear Probing and
Chaining
• Insertion order: 45300, 20006, 50002, 40000, 25001, 13000, 65905,
30001, 95000 (search for 30001)
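The comparison can be reproduced with a short simulation. This sketch assumes a table of 100 slots with the (Key % 100) hash function and counts how many stored keys are examined when searching for 30001; the function names are illustrative, and the linear-probing search assumes the table is not full.

```python
KEYS = [45300, 20006, 50002, 40000, 25001, 13000, 65905, 30001, 95000]
SIZE = 100  # assumed table size, matching Key % 100


def build_linear(keys, size):
    """Insert keys in order using linear probing (table must not fill up)."""
    table = [None] * size
    for key in keys:
        slot = key % size
        while table[slot] is not None:
            slot = (slot + 1) % size
        table[slot] = key
    return table


def probes_linear(table, key):
    """Count slots examined until `key` or an empty slot is reached."""
    slot = key % len(table)
    count = 1
    while table[slot] is not None and table[slot] != key:
        slot = (slot + 1) % len(table)
        count += 1
    return count


def build_chained(keys, size):
    """Insert keys in order, appending each to the chain at its slot."""
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[key % size].append(key)
    return buckets


def probes_chained(buckets, key):
    """Count chain entries examined until `key` is found (or chain ends)."""
    chain = buckets[key % len(buckets)]
    for count, stored in enumerate(chain, start=1):
        if stored == key:
            return count
    return len(chain)
```

Under these assumptions, linear probing places 30001 in slot 7 and a search examines 7 slots (1 through 7), while chaining finds it as the second entry in the chain at slot 1.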
