
Hashing and Hash Tables

This document provides an overview of hashing and hash tables. It discusses key concepts like hashing, hash tables, hash functions, and collision handling. Specifically, it covers: - Hashing is a technique to map keys to values in an array using a hash function for fast retrieval. - A hash table stores key-value pairs, with the index determined by the hashed key. Collisions occur when different keys hash to the same value. - Common hash functions discussed include mid-square, division, folding, digit extraction, and rotating hash. - Collisions are handled using linear probing, where the next empty slot is used, or chaining, where a linked list is used to store

Uploaded by

Andre Laurent

Course : Data Structures

Effective Period : February 2019

Hashing & Hash Tables

Session 07-08

Learning Outcomes
At the end of this session, students will be able to:
• LO 1: Explain the concept of data structures and their usage in Computer Science
• LO 2: Illustrate any learned data structure and its usage in applications
• LO 3: Apply data structures using C

Outline
1. Hashing
2. Hash Table
3. Hash Function
4. Collision

Hashing
• Hashing is a technique for storing and retrieving keys rapidly.
• In hashing, a string of characters is transformed into a value or key,
usually shorter than the original, that represents the original string.
• Hashing is used to index and retrieve items in a database
because it is faster to find an item using the shorter hashed key than
to find it using the original value.
• Hashing can also be defined as distributing keys in
an array called a hash table using a predetermined function
called a hash function.

Hash Table
• A hash table is a table (array) in which we store the original strings.
The index into the table is the hashed key, and the value is the
original string.
• The size of a hash table is usually several orders of magnitude
smaller than the total number of possible strings, so several strings
might have the same hashed key.
• For example, there are 26^7 (8,031,810,176) strings of length 7
that consist of lowercase letters only.

Example
Consider this example.

Suppose we want to store 5 strings: define, float, exp, char, atan
in a hash table of size 26. The hash function we will use is
“transform the first character of each string into a number between 0 and 25”
(a will be 0, b will be 1, c will be 2, …, z will be 25).

Example
This is the hash table of those strings.

h[ ]   Value
0      atan
1
2      char
3      define
4      exp
5      float
6
…
25

atan is stored in h[0] because a is 0.
char is stored in h[2] because c is 2.
define is stored in h[3] because d is 3.
and so on.

We only consider the first character of each string.
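
The first-character hash function used in this example could be written in C roughly as follows. This is a minimal sketch: the name hash_first_char, the TABLE_SIZE macro, the folding of uppercase letters to lowercase, and the -1 error return are assumptions made for illustration, not part of the slides. It also assumes the letters a to z have consecutive character codes, as in ASCII.

```c
#include <ctype.h>
#include <stddef.h>

#define TABLE_SIZE 26  /* one slot per letter, as in the example */

/* Maps a string to an index 0..25 using only its first character
   (a -> 0, b -> 1, ..., z -> 25). Uppercase letters are folded to
   lowercase. Returns -1 for NULL, empty, or non-letter input; this
   error convention is an assumption of the sketch. */
int hash_first_char(const char *s) {
    if (s == NULL) {
        return -1;
    }
    int c = tolower((unsigned char)s[0]);
    if (c < 'a' || c > 'z') {
        return -1;
    }
    return c - 'a';
}
```

For the five strings above, the function returns 0 for atan, 2 for char, 3 for define, 4 for exp and 5 for float.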

Hash Function
• There are many ways to hash a string into a key. The
following are some of the important methods for
constructing hash functions.
• Mid-square
• Division (most common)
• Folding
• Digit Extraction
• Rotating Hash

Mid-square
• Square the string/identifier and then use an appropriate
number of bits from the middle of the square to obtain the hash-key.
• If the key is a string, it is first converted to a number.
• Steps:
1. Square the value of the key (k^2).
2. Extract the middle r bits of the result obtained in Step 1.

Function: h(k) = s
k = key
s = the hash key obtained by selecting r bits from k^2

Mid Square Example
• Here the entire key participates in generating the address, so there is a
better chance that different addresses are generated even for keys close to
each other.
• For example,

Key     Squared value   Middle part
3121    9740641         406
3122    9746884         468
3123    9753129         531

• In practice it is more efficient to choose a power of 2 for the size of the table
and extract the middle part of the bit representation of the square of a key.
• If the table size in this example is chosen as 1024, the binary representation
of the square of 3121 is 1001010-0101000010-1100001, and the middle 10 bits
(0101000010) form the hash key.
• The middle part can be easily extracted using a mask and a shift operation.
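
The mask-and-shift extraction can be sketched in C as below. The function name mid_square and the parameters total_bits and r are illustrative assumptions. The caller is assumed to know how many bits the squared key occupies (24 in the example with key 3121) and to pass r no larger than total_bits.

```c
#include <stdint.h>

/* Mid-square hash for a table with 2^r slots.
   total_bits: number of bits of k*k taken into account (assumed known
   by the caller); r: number of middle bits to keep (r <= total_bits). */
uint32_t mid_square(uint32_t k, unsigned total_bits, unsigned r) {
    uint64_t sq = (uint64_t)k * k;             /* step 1: square the key */
    unsigned shift = (total_bits - r) / 2;     /* low bits to discard */
    uint64_t mask = ((uint64_t)1 << r) - 1;    /* r one-bits */
    return (uint32_t)((sq >> shift) & mask);   /* step 2: keep the middle r bits */
}
```

With k = 3121, total_bits = 24 and r = 10, the shift is 7 bits and the result is 322, which is 0101000010 in binary.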

Division
• Divide the string/identifier using the modulus operator.
• It is the simplest method of hashing an integer z.

Function: h(z) = z mod M
z = key
M = the value used to divide the key, usually a prime
number, the table size, or the size of memory used.

Division Example
• Suppose the table is to store strings. A very simple hash function would
be to add up the ASCII values of all the characters in the string and take
the sum modulo the table size, say 97.

“COBB” would be stored at location
(64+3 + 64+15 + 64+2 + 64+2) % 97 = 278 % 97 = 84
“HIKE” would be stored at location
(64+8 + 64+9 + 64+11 + 64+5) % 97 = 289 % 97 = 95
“PPQQ” would be stored at location
(64+16 + 64+16 + 64+17 + 64+17) % 97 = 322 % 97 = 31
“ABCD” would be stored at location
(64+1 + 64+2 + 64+3 + 64+4) % 97 = 266 % 97 = 72
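
The character-sum scheme described above can be sketched in C as follows. The function name division_hash and its parameters are illustrative, not from the slides, and the sketch assumes the platform uses ASCII character codes.

```c
#include <stddef.h>

/* Adds up the character codes of s and reduces the sum modulo m.
   m is the table size (97 in the example) and must be non-zero. */
unsigned division_hash(const char *s, unsigned m) {
    unsigned sum = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        sum += (unsigned char)s[i];   /* e.g. 'A' contributes 65 in ASCII */
    }
    return sum % m;
}
```

Calling the function with each of the four strings and m = 97 is a quick way to check the worked locations by machine.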

Hash Function: Folding
• The folding method works in two steps:
– Divide the key value into a number of parts, where each part has the
same number of digits except the last part, which may have fewer digits
than the other parts.
– Add the individual parts, that is, obtain the sum part1 + part2 + ... +
part n. The hash value is produced by ignoring the last carry, if any.
• Example:
– Given a hash table with 100 locations, calculate the hash value for keys 5678
and 34567.

Key          5678                         34567
Parts        56 and 78                    34, 56 and 7
Sum          134                          97
Hash value   34 (ignore the last carry)   97
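
A possible C version of the two folding steps is shown below. It is only a sketch: the name folding_hash and the part_digits parameter are assumptions, the key is split from its leftmost digit, and "ignoring the last carry" is implemented as keeping the lowest part_digits digits of the sum.

```c
#include <stdio.h>

/* Folding hash: split the decimal digits of key into groups of
   part_digits digits (the last group may be shorter), add the groups,
   and drop any carry beyond part_digits digits. */
unsigned folding_hash(unsigned long key, unsigned part_digits) {
    if (part_digits == 0) {
        return 0;                               /* nothing sensible to fold */
    }
    char digits[32];
    int n = snprintf(digits, sizeof digits, "%lu", key);
    unsigned long modulus = 1;
    for (unsigned i = 0; i < part_digits; i++) {
        modulus *= 10;                          /* 10^part_digits */
    }
    unsigned long sum = 0;
    for (int i = 0; i < n; i += (int)part_digits) {
        unsigned long part = 0;                 /* value of the current group */
        for (int j = i; j < n && j < i + (int)part_digits; j++) {
            part = part * 10 + (unsigned long)(digits[j] - '0');
        }
        sum += part;
    }
    return (unsigned)(sum % modulus);           /* ignore the carry */
}
```

With part_digits = 2, key 5678 gives 56 + 78 = 134 and a hash value of 34, and key 34567 gives 34 + 56 + 7 = 97.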

Digit Extraction
• Predefined digits of the given number are taken
as the hash address.
• Example:
– Consider x = 14,568
– If we extract the 1st, 3rd, and 5th digits, we get a
hash code of 158.
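
Digit extraction can be sketched in C as below. The function name digit_extraction and the convention of passing 1-based positions counted from the leftmost digit are assumptions chosen to match the example.

```c
#include <stdio.h>

/* Builds a hash code from selected decimal digits of x.
   positions[] holds 1-based digit positions counted from the left;
   positions that fall outside the number are skipped. */
unsigned digit_extraction(unsigned long x, const int positions[], int count) {
    char digits[32];
    int n = snprintf(digits, sizeof digits, "%lu", x);
    unsigned code = 0;
    for (int i = 0; i < count; i++) {
        int p = positions[i];
        if (p >= 1 && p <= n) {
            code = code * 10 + (unsigned)(digits[p - 1] - '0');
        }
    }
    return code;
}
```

For x = 14568 and positions {1, 3, 5}, the extracted digits are 1, 5 and 8, so the function returns 158.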

Rotating Hash
• Use any hash method (such as the division or mid-square
method).
• After getting the hash code/address from the hash
method, rotate it.
• Rotation is performed by shifting the digits to get a
new hash address.
• Example:
– Given hash address = 20021
– Rotation result: 12002 (the last digit is moved to the front)
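
The rotation in the example moves the last decimal digit to the front. A minimal C sketch of that single step follows; the name rotate_right_digit is an assumption, and rotating by one digit to the right is only one possible choice of rotation.

```c
/* Rotates the decimal digits of addr one place to the right:
   the last digit becomes the first. If the rotated value would start
   with 0, that leading zero is lost (120 becomes 12). */
unsigned long rotate_right_digit(unsigned long addr) {
    unsigned long last = addr % 10;      /* digit that moves to the front */
    unsigned long rest = addr / 10;      /* remaining digits */
    unsigned long place = 1;             /* 10^(number of digits in rest) */
    for (unsigned long t = rest; t > 0; t /= 10) {
        place *= 10;
    }
    return last * place + rest;
}
```

For the hash address 20021 the function returns 12002.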

Collision
• What happens if we want to store these strings using the
previous hash function (use the first character of each
string)?

• define, float, exp, char, atan, ceil, acos, floor.

• Several strings have the same hash-key:
float and floor (hash-key: 5), char and ceil (hash-key: 2), and atan and acos (hash-key: 0).

• This is called a collision. How can we handle it?

Collision
• There are two general ways to handle collisions:

• Linear Probing
Search for the next empty slot and put the string there.

• Chaining
Put the string in its slot as part of a chained list (linked list).

Linear Probing
This is the hash table of these strings:
define, float, exp, char, atan, ceil, floor, acos.

h[ ]   Value
0      atan
1      acos
2      char
3      define
4      exp
5      float
6      ceil
7      floor

Note that ceil is stored in h[6], acos is stored in h[1] and floor is stored in h[7].

When we want to store “ceil”, there is already “char” stored in h[2], so we
search for the next empty slot, which is h[6].

Linear Probing
• Linear probing has poor search complexity if there
are many collisions.
• The Step column in the table below shows how many loop iterations (steps) are
needed to find each string.
• Suppose we want to find ceil: we compute the hash
key and get 2. But ceil is not there, so we iterate
until we find ceil.

h[ ]   Value    Step
0      atan     1
1      acos     2
2      char     1
3      define   1
4      exp      1
5      float    1
6      ceil     5
7      floor    3
…
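
The step counts can be reproduced with a small search routine. The sketch below is an illustration under stated assumptions: the name probe_steps is hypothetical, empty slots are represented by NULL pointers, and the hash key is the first letter of a lowercase string.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 26

/* Returns how many slots are examined before item is found,
   or 0 if item is not in the table. Empty slots are NULL. */
int probe_steps(const char *item, const char *h[]) {
    int start = item[0] - 'a';          /* hash key: first letter, a -> 0 */
    int i = start;
    int steps = 0;
    do {
        steps++;
        if (h[i] == NULL) {
            return 0;                   /* reached an empty slot: not stored */
        }
        if (strcmp(h[i], item) == 0) {
            return steps;               /* found after this many probes */
        }
        i = (i + 1) % TABLE_SIZE;       /* move to the next slot, wrapping */
    } while (i != start);
    return 0;                           /* every slot visited, not found */
}
```

With slots 0 to 7 holding atan, acos, char, define, exp, float, ceil and floor, searching for ceil examines slots 2 to 6 and returns 5, while searching for floor returns 3.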

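Linear Probing
/* Assumes h[] holds TABLE_SIZE string pointers, with "" marking an empty slot,
   and hash() returns an index in 0..TABLE_SIZE-1. Needs <string.h>. */
void linear_probing(const char *item, const char *h[]) {
    int hash_key = hash(item);
    int i = hash_key;
    while (strlen(h[i]) != 0) {            /* slot is occupied */
        if (strcmp(h[i], item) == 0) {
            return;                        /* DUPLICATE ENTRY */
        }
        i = (i + 1) % TABLE_SIZE;          /* try the next slot, wrapping around */
        if (i == hash_key) {
            return;                        /* TABLE IS FULL */
        }
    }
    h[i] = item;                           /* first empty slot found */
}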

Chaining
In chaining, we store each string in a chain (linked list).

So if there is a collision, we only need to iterate over that chain.

h[ ]   Value
0      atan → acos
1      NULL
2      char → ceil
3      define
4      exp
5      float → floor
6      NULL
7      NULL
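
Looking up a string then only needs a walk along one chain. The following C sketch shows such a lookup. The struct node layout and the name chain_find are assumptions for illustration, and the caller is expected to supply the hash key.

```c
#include <stddef.h>
#include <string.h>

/* One element of a chain; the layout is an assumption of this sketch. */
struct node {
    const char *item;
    struct node *next;
};

/* Returns the node that holds item in the chain h[key],
   or NULL if the chain does not contain it. */
struct node *chain_find(const char *item, struct node *h[], int key) {
    for (struct node *p = h[key]; p != NULL; p = p->next) {
        if (strcmp(p->item, item) == 0) {
            return p;
        }
    }
    return NULL;
}
```

In the table above, finding acos visits only the two nodes in chain 0, regardless of how many strings are stored in the other chains.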

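Chaining
/* Assumes struct node { const char *item; struct node *next; }, h[] is an
   array of chain heads initialised to NULL, and hash() returns a valid index.
   Needs <stdlib.h> and <string.h>. */
void chaining(const char *item, struct node *h[]) {
    int hash_key = hash(item);
    struct node *trail = NULL, *lead = h[hash_key];
    while (lead) {                               /* walk to the end of the chain */
        if (strcmp(lead->item, item) == 0) {
            return;                              /* DUPLICATE ENTRY */
        }
        trail = lead;
        lead = lead->next;
    }
    struct node *p = malloc(sizeof *p);          /* allocate the new node */
    if (p == NULL) return;                       /* out of memory */
    p->item = item;
    p->next = NULL;
    if (trail == NULL) h[hash_key] = p;          /* chain was empty */
    else trail->next = p;                        /* append after the last node */
}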

