Hashing and Hash Tables
Session 07-08
Learning Outcomes
At the end of this session, students will be able to:
LO 1: Explain the concept of data structures and their usage in
Computer Science
LO 2: Illustrate any learned data structure and its usage in
applications
LO 3: Apply data structures using C
Outline
1. Hashing
2. Hash Table
3. Hash Function
4. Collision
Hashing
• Hashing is a technique for storing and retrieving keys rapidly.
• In hashing, a string of characters is transformed into a usually shorter value or key that represents the original string.
• Hashing is used to index and retrieve items in a database because it is faster to find an item using the shorter hashed key than using the original value.
• Hashing can also be defined as the concept of distributing keys in an array, called a hash table, using a predetermined function, called a hash function.
Hash Table
• A hash table is a table (array) in which we store the original strings. The index of the table is the hashed key, while the value is the original string.
• The size of the hash table is usually several orders of magnitude smaller than the total number of possible strings, so several strings might have the same hashed key.
Example
Consider this example.
This is the hash table of those strings. We only consider the first character of each string.

h[ ]   Value
0      atan
1
2      char
3      define
4      exp
5      float
6
…
25

atan is stored in h[0] because a is 0.
char is stored in h[2] because c is 2.
define is stored in h[3] because d is 3.
and so on.
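
The first-character rule above can be sketched in C as follows. This is a minimal sketch, not the slides' own code: the name first_char_hash, the FC_TABLE_SIZE macro, and the choice to return -1 for keys that do not start with a letter are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

#define FC_TABLE_SIZE 26  /* one slot per letter a..z, as in the example */

/* Hash a string by its first character: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25.
   Returns -1 for NULL, an empty string, or a non-letter first character
   (an assumption; the slides do not say how such keys are handled). */
int first_char_hash(const char *s) {
    if (s == NULL || !isalpha((unsigned char)s[0])) {
        return -1;
    }
    return tolower((unsigned char)s[0]) - 'a';
}
```

With this sketch, first_char_hash("atan") gives 0 and first_char_hash("define") gives 3, matching the table.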
Hash Function
• There are many ways to hash a string into a key. The
following are some of the important methods for
constructing hash functions.
• Mid-square
• Division (most common)
• Folding
• Digit Extraction
• Rotating Hash
Mid-square
• Square the string/identifier, then use an appropriate number of bits from the middle of the square as the hash key.
• If the key is a string, it is first converted to a number.
• Steps:
1. Square the value of the key (k²).
2. Extract the middle r bits of the result obtained in Step 1.
Function: h(k) = s
k = key
s = the hash key obtained by selecting r bits from k²
Mid Square Example
• Here the entire key participates in generating the address so that there is a
better chance that different addresses are generated even for keys close to
each other.
• For example,
Key     Squared value   Middle part
3121    9740641         406
3122    9746884         468
3123    9753129         531
• In practice it is more efficient to choose a power of 2 for the size of the table
and extract the middle part of the bit representation of the square of a key.
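• If the table size in this example is chosen as 1024 (2^10), the hash key is the middle 10 bits of the square. The binary representation of the square of 3121 is 1001010-0101000010-1100001, so the middle part is 0101000010 (322 in decimal).
• The middle part can be easily extracted using a mask and a shift operation.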
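
Both variants of the example can be sketched in C. This is a minimal sketch under stated assumptions: the function names are hypothetical, the decimal version only makes sense when the square has exactly 7 digits, and the bit version treats the square as a 24-bit value as in the example, so the low 7 bits are shifted away and 10 bits are kept.

```c
#include <stdint.h>

/* Decimal variant from the table: middle three digits of a 7-digit square.
   Assumes key * key has exactly 7 digits. */
unsigned mid_square_decimal(uint32_t key) {
    uint64_t sq = (uint64_t)key * key;      /* 64-bit to avoid overflow */
    return (unsigned)((sq / 100) % 1000);   /* drop 2 low digits, keep 3 */
}

/* Bit variant for a table of size 1024 (r = 10 bits), treating the square
   as a 24-bit value: shift away the low 7 bits, then mask 10 bits. */
unsigned mid_square_bits(uint32_t key) {
    uint64_t sq = (uint64_t)key * key;
    return (unsigned)((sq >> 7) & 0x3FFu);
}
```

For key 3121, the decimal variant gives 406 and the bit variant gives 322, the value of the middle bits 0101000010.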
Division
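• Convert the string/identifier to a number and take the remainder after dividing it by the table size, using the modulus operator: h(x) = x mod M, where M is the table size.
• It is the simplest method of hashing an integer x.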
Division Example
• Suppose the table is to store strings. A very simple hash function would
be to add up ASCII values of all the characters in the string and take
modulo of table size, say 97.
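
The hash function just described can be sketched in C as follows. The names division_hash and DIV_TABLE_SIZE are assumptions for illustration; the table size 97 comes from the example.

```c
#define DIV_TABLE_SIZE 97  /* table size from the example */

/* Add up the character codes of the string and take the sum modulo
   the table size. Characters are read as unsigned char so that the
   sum never goes negative. */
unsigned division_hash(const char *s) {
    unsigned long sum = 0;
    for (; *s != '\0'; s++) {
        sum += (unsigned char)*s;
    }
    return (unsigned)(sum % DIV_TABLE_SIZE);
}
```

For example, "abc" has ASCII sum 97 + 98 + 99 = 294, and 294 mod 97 = 3. Note that any two strings with the same characters in a different order hash to the same slot with this function.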
Digit Extraction
• Predefined digits of the given number are taken as the hash address.
• Example:
– Consider x = 14,568
– If we extract the 1st, 3rd, and 5th digits, we get a hash code of 158.
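
A minimal C sketch of digit extraction follows. The function name, the 1-based positions counted from the leftmost digit, and the -1 return value for a position outside the number are assumptions made for illustration.

```c
#include <stdio.h>

/* Build a hash code from selected decimal digits of x.
   positions[] holds 1-based digit positions, counted from the left.
   Returns -1 if a position lies outside the number (an assumption). */
long digit_extract(unsigned long x, const int *positions, int count) {
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", x);  /* digit count */
    long code = 0;
    for (int i = 0; i < count; i++) {
        int p = positions[i];
        if (p < 1 || p > len) {
            return -1;
        }
        code = code * 10 + (digits[p - 1] - '0');  /* append chosen digit */
    }
    return code;
}
```

Calling digit_extract(14568, positions, 3) with positions {1, 3, 5} gives 158, as in the example.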
Rotating Hash
• Use any hash method (such as the division or mid-square method).
• After getting the hash code/address from the hash method, rotate it.
• Rotation is performed by shifting the digits to get a new hash address.
• Example:
– Given hash address = 20021
– Rotation result: 12002 (the last digit is moved to the front)
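
The rotation in the example can be sketched in C as a one-digit right rotation of a decimal number. The function name is hypothetical, and rotating by exactly one digit is an assumption taken from the example.

```c
/* Rotate the decimal digits of x one place to the right:
   the last digit moves to the front (20021 -> 12002). */
unsigned long rotate_right_digit(unsigned long x) {
    unsigned long pow10 = 1;  /* becomes 10^(n-1) for an n-digit x */
    for (unsigned long t = x; t >= 10; t /= 10) {
        pow10 *= 10;
    }
    return (x % 10) * pow10 + x / 10;
}
```

If the last digit is 0, the rotated value has a leading zero and therefore one digit fewer (1230 becomes 123).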
Collision
• What happens if we want to store these strings (define, float, exp, char, atan, ceil, floor, acos) using the previous hash function (use the first character of each string)?
• Several strings have the same hash key: float and floor (hash key: 5), char and ceil (hash key: 2).
Collision
• There are two general ways to handle collisions:
• Linear Probing
Search for the next empty slot and put the string there.
• Chaining
Put the string in the slot as part of a chained list (linked list).
Linear Probing
This is the hash table of these strings:
define, float, exp, char, atan, ceil, floor, acos.

h[ ]   Value
0      atan
1      acos
2      char
3      define
4      exp
5      float
6      ceil
7      floor
…

Note that ceil is stored in h[6], acos is stored in h[1] and floor is stored in h[7].
When we want to store “ceil”, there is already “char” stored in h[2], so we search for the next empty slot, which is h[6].
Linear Probing
• Linear probing has a bad search complexity if there are many collisions.
• The “Step” column in the table below shows how many loops/steps are needed to find each string.
• Suppose we want to find ceil: we compute the hash key and get 2. But ceil is not there, so we iterate until we find ceil.

h[ ]   Value    Step
0      atan     1
1      acos     2
2      char     1
3      define   1
4      exp      1
5      float    1
6      ceil     5
7      floor    3
…
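
The search and its step count can be sketched in C as follows. This is a sketch under stated assumptions: the names lp_hash, lp_find, LP_TABLE_SIZE and LP_MAX_LEN are hypothetical, keys start with a lowercase letter, empty slots hold the empty string, and nothing is ever deleted from the table (so an empty slot ends the search).

```c
#include <string.h>

#define LP_TABLE_SIZE 26
#define LP_MAX_LEN 16

/* Hypothetical first-character hash; non-lowercase keys map to slot 0. */
static int lp_hash(const char *s) {
    return (s[0] >= 'a' && s[0] <= 'z') ? s[0] - 'a' : 0;
}

/* Returns the slot holding item, or -1 if it is absent.
   *steps receives the number of slots examined. */
int lp_find(char h[LP_TABLE_SIZE][LP_MAX_LEN], const char *item, int *steps) {
    int start = lp_hash(item);
    int i = start;
    *steps = 0;
    do {
        (*steps)++;
        if (h[i][0] == '\0') {
            return -1;                    /* empty slot: item not stored */
        }
        if (strcmp(h[i], item) == 0) {
            return i;                     /* found */
        }
        i = (i + 1) % LP_TABLE_SIZE;      /* probe next slot, wrapping */
    } while (i != start);
    return -1;                            /* scanned the whole table */
}
```

On the table above, looking up ceil starts at h[2] and examines h[2] to h[6], which is the 5 steps shown in the Step column.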
Linear Probing
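#include <string.h>

/* Assumes TABLE_SIZE and MAX_LEN are defined, hash() returns an index in
   0..TABLE_SIZE-1, every empty slot holds the empty string "", and item
   is shorter than MAX_LEN characters. */
void linear_probing(const char *item, char h[TABLE_SIZE][MAX_LEN]) {
    int hash_key = hash(item);
    int i = hash_key;
    while (strlen(h[i]) != 0) {           // slot is occupied
        if (strcmp(h[i], item) == 0) {
            return;                       // DUPLICATE ENTRY
        }
        i = (i + 1) % TABLE_SIZE;         // probe the next slot, wrapping around
        if (i == hash_key) {
            return;                       // TABLE IS FULL
        }
    }
    strcpy(h[i], item);                   // store in the first empty slot
}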
Chaining
In chaining, we store each string in a chain (linked list).

h[ ]   Value
0      atan -> acos
1      NULL
…
7      NULL
Chaining
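#include <stdlib.h>
#include <string.h>

struct node { const char *item; struct node *next; };

/* Assumes hash() returns an index in 0..TABLE_SIZE-1 and every h[i] starts as NULL. */
void chaining(const char *item, struct node *h[]) {
    int hash_key = hash(item);
    struct node *trail = NULL, *lead = h[hash_key];
    while (lead) {                            // walk to the end of the chain
        if (strcmp(lead->item, item) == 0) {
            return;                           // DUPLICATE ENTRY
        }
        trail = lead;
        lead = lead->next;
    }
    struct node *p = malloc(sizeof *p);       // allocate a new node
    if (p == NULL) return;                    // out of memory
    p->item = item;                           // stores the pointer, not a copy
    p->next = NULL;
    if (trail == NULL) h[hash_key] = p;       // chain was empty
    else trail->next = p;                     // append at the tail
}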
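
Looking up a string in a chained table follows the same walk along the chain. The sketch below is an illustration only: the ch_ names are hypothetical, and the first-character hash is assumed so that it matches the earlier example.

```c
#include <stddef.h>
#include <string.h>

#define CH_TABLE_SIZE 26

struct ch_node {
    const char *item;
    struct ch_node *next;
};

/* Hypothetical first-character hash; non-lowercase keys map to slot 0. */
static int ch_hash(const char *s) {
    return (s[0] >= 'a' && s[0] <= 'z') ? s[0] - 'a' : 0;
}

/* Returns the node holding item, or NULL if it is not in the table. */
struct ch_node *ch_find(struct ch_node *h[], const char *item) {
    for (struct ch_node *p = h[ch_hash(item)]; p != NULL; p = p->next) {
        if (strcmp(p->item, item) == 0) {
            return p;
        }
    }
    return NULL;
}
```

Only the chain in one slot is searched, so a lookup for acos walks h[0] (atan, then acos) and never touches the other slots.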