Unit-6c DBMS - Hashing

Uploaded by

rohan goud
Hashing

Hashing is a technique in which we compute the location of the desired record so that it can be retrieved in a single access or comparison.

The search time of every searching technique depends on the number of comparisons it makes.

A linear search of an array A with N elements requires N comparisons in the worst case.


Advantages
• To increase efficiency, i.e., to reduce the searching time, we need to avoid unnecessary comparisons.
Operations of Hashing
• Insertion of Records
• Searching of Records
• Deletion of Records
Let L be the set of memory locations (addresses) at which records are stored, where each record is associated with a key k.
If we can compute the memory address of a record from its key, then the desired record can be retrieved in a single access.
For notational and coding convenience, we assume that the keys k and the addresses in L are (decimal) integers.
The location is therefore selected by applying to the key k a function H, which is called a hash function or hashing function.
Hash Collision
Unfortunately, such a function H may not yield distinct values (indices or hash addresses) for distinct keys;
it is possible that two different keys k1 and k2 will yield the same hash address.
This situation is called a hash collision.
Hash Function

The basic idea of a hash function is the transformation of the key into the corresponding location in the hash table.
A hash function H can be defined as a function that takes a key as input and transforms it into a hash table index.
Hash functions are of two types:
1. Distribution-independent functions
2. Distribution-dependent functions
Distribution-independent hash functions:
1. Division method
2. Mid-square method
3. Folding method
1. Division Method
TABLE is an array (a database file) in which the employee details are stored.
• Choose a number m that is larger than the number of keys, i.e., m is greater than the total number of records in TABLE.
• The number m is usually chosen to be a prime number to minimize collisions.
The hash function H is defined by
H(k) = k (mod m)
where H(k) is the hash address (or index of the array) and k (mod m) means the remainder when k is divided by m.
Example
Let a company have 90 employees, and let 00, 01, 02, ..., 99 be the 100 two-digit memory addresses (indices or hash addresses) available to store the records. The employee code is used as the key.
Choose m so that it is greater than 90. Suppose m = 93. Then for the following employee codes (keys k):
H(2103) = 2103 (mod 93) = 57
H(6147) = 6147 (mod 93) = 9
H(3750) = 3750 (mod 93) = 30
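The arithmetic in this example can be checked with a short sketch. The function name `division_hash` is chosen here for illustration and does not come from the slides.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: the hash address is the remainder of key divided by m."""
    return key % m


M = 93  # the value of m used in the example above

for key in (2103, 6147, 3750):
    print(f"H({key}) = {division_hash(key, M)}")
```

The three printed addresses are 57, 9, and 30, matching the values worked out by hand.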
2. Mid-Square Method

The key k is squared. Then the hash function H is defined by
H(k) = l
where l is obtained by deleting digits from both ends of k², so that only the middle digits remain. The same digit positions of k² must be used for all of the keys.
For example, consider the following keys in the table and their hash indices:
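A minimal sketch of the mid-square method follows. The keys, the padding of the square to eight digits, and the choice of the fourth and fifth digits are all assumptions made here for illustration; they are not taken from the table in the slides.

```python
def mid_square_hash(key: int, width: int = 8, start: int = 3, length: int = 2) -> int:
    """Square the key and keep a fixed group of middle digits.

    The square is left-padded with zeros to `width` digits so that the same
    digit positions are used for every key (assumed layout: 4-digit keys,
    2-digit hash addresses).
    """
    squared = str(key * key).zfill(width)
    return int(squared[start:start + length])


# Illustrative keys only
for key in (3205, 7148, 2345):
    print(key, key * key, mid_square_hash(key))
```

Under these assumptions, 3205² = 10272025 gives 72, 7148² = 51093904 gives 93, and 2345² = 5499025 (padded to 05499025) gives 99.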
3. Folding Method

The key k is partitioned into a number of parts k1, k2, ..., kr.
The parts have the same number of digits as the required hash address, except possibly for the last part.
Then the parts are added together, ignoring the last carry:
H(k) = k1 + k2 + ... + kr
Here we are dealing with a hash table with indices from 00 to 99, i.e., a two-digit hash table.
So we divide the key k into parts of two digits each.
H(7148) = 71 + 48 = 119; eliminating the leading carry (i.e., 1) gives H(7148) = 19.
If the second part is reversed before adding, H(7148) = 71 + 84 = 155; eliminating the leading carry (i.e., 1) gives H(7148) = 55.
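The folding steps can be sketched as below. The `reverse_even_parts` option is an assumption added here to cover the commonly taught variant in which every second part is reversed before adding; the function name is also illustrative.

```python
def folding_hash(key: int, part_digits: int = 2, reverse_even_parts: bool = False) -> int:
    """Split the key into parts of `part_digits` digits, add them, drop the carry."""
    digits = str(key)
    parts = [digits[i:i + part_digits] for i in range(0, len(digits), part_digits)]
    if reverse_even_parts:
        # Reverse the 2nd, 4th, ... parts (assumed variant, not plain folding)
        parts = [p[::-1] if idx % 2 == 1 else p for idx, p in enumerate(parts)]
    total = sum(int(p) for p in parts)
    # Keeping only the last `part_digits` digits discards the leading carry
    return total % (10 ** part_digits)


print(folding_hash(7148))                           # plain folding: 71 + 48
print(folding_hash(7148, reverse_even_parts=True))  # reversed variant: 71 + 84
```

For the key 7148, plain folding yields 19 and the reversed variant yields 55.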
Hash Collision
• It is possible that two non-identical keys k1 and k2 are hashed into the same hash address.
This situation is called a hash collision.
Let us consider a hash table having 10 locations, as shown in Fig.
The division method is used to hash the key:
H(k) = k (mod m)

If we want to insert a new record with key 500, then H(500) = 500 (mod 10) = 0.
Location 0 in the table is already filled (i.e., not empty). Thus a collision has occurred.
Collisions are almost impossible to avoid, but their effect can be minimized considerably by using any one of the following three techniques:
1. Open addressing
2. Chaining
3. Bucket addressing
1. Open Addressing
• In the open addressing method, when a key collides with another key, the collision is resolved by probing the cells to find the nearest empty space.
• Suppose a record R with key k has hash address H(k) = h. In linear probing we search the locations h + i (where i = 0, 1, 2, ..., taken modulo m so that the search wraps around the table) for a free space, i.e., h, h + 1, h + 2, h + 3, ...
• The main disadvantage of linear probing is that a substantial amount of time may be needed to find a free cell by searching the table sequentially.
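A minimal sketch of insertion with linear probing, assuming the division method H(k) = k (mod m) and a table of 10 locations. The keys and the function name are chosen here for illustration; empty cells are represented by `None`.

```python
from typing import List, Optional


def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key at the first free cell among h, h + 1, h + 2, ... (mod m)."""
    m = len(table)
    h = key % m  # division-method hash address
    for i in range(m):
        slot = (h + i) % m  # wrap around the end of the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table: List[Optional[int]] = [None] * 10
slots = [linear_probe_insert(table, k) for k in (500, 210, 31, 320)]
print(slots)
```

Here 500 and 210 both hash to 0, so 210 moves to location 1; 31 then finds location 1 occupied and moves to 2; 320 has to step past three filled cells to reach location 3, which shows how probe sequences grow as keys cluster.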
QUADRATIC PROBING
Suppose a record R with key k has the hash address H(k) = h. Then, instead of searching the locations with addresses h, h + 1, h + 2, ..., h + i, ..., we search for a free hash address among h, h + 1, h + 4, h + 9, h + 16, ..., h + i², ... .
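A minimal sketch of quadratic probing, assuming H(k) = k (mod m), a table of 11 locations, and illustrative keys that all hash to 0. Note that a quadratic probe sequence does not necessarily visit every cell, so this sketch gives up after m probes even if the table is not full.

```python
from typing import List, Optional


def quadratic_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key at the first free cell among h, h + 1, h + 4, h + 9, ... (mod m)."""
    m = len(table)
    h = key % m
    for i in range(m):
        slot = (h + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free cell found along the probe sequence")


table: List[Optional[int]] = [None] * 11
slots = [quadratic_probe_insert(table, k) for k in (22, 33, 44)]
print(slots)
```

The three keys land in locations 0, 1, and 4, so colliding keys are spread further apart than with linear probing.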
DOUBLE HASHING
A second hash function H1 is used to resolve the collision.
Suppose a record R with key k has the hash addresses H(k) = h and H1(k) = h1, where h1 is not equal to m.
Then we search the locations with addresses
h, h + h1, h + 2h1, h + 3h1, ..., h + i·h1, ... (where i = 0, 1, 2, ...).
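A minimal sketch of double hashing. The second hash function used here, H1(k) = 1 + (k mod (m - 1)), is an assumption chosen so that the step h1 is never 0; the slides do not specify H1. The table size 11 (a prime) and the keys are also illustrative.

```python
from typing import List, Optional


def double_hash_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key at the first free cell among h, h + h1, h + 2*h1, ... (mod m)."""
    m = len(table)
    h = key % m              # primary hash H(k)
    h1 = 1 + key % (m - 1)   # assumed secondary hash H1(k), always in 1..m-1
    for i in range(m):
        slot = (h + i * h1) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table: List[Optional[int]] = [None] * 11
slots = [double_hash_insert(table, k) for k in (22, 33, 44)]
print(slots)
```

All three keys hash to 0, but their step sizes differ (h1 = 4 for 33 and h1 = 5 for 44), so they end up in locations 0, 4, and 5. With m prime and h1 between 1 and m - 1, the probe sequence visits every location.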
2. Chaining
• In the chaining technique, the entries in the hash table are dynamically allocated and entered into a linked list associated with each hash address.
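A minimal sketch of chaining, assuming H(k) = k (mod m). For brevity each chain is a Python list standing in for the linked list described above; the class name, method names, and sample records are illustrative.

```python
from typing import Any, List, Optional, Tuple


class ChainedHashTable:
    """Hash table in which each address holds a chain of (key, record) pairs."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.chains: List[List[Tuple[int, Any]]] = [[] for _ in range(m)]

    def insert(self, key: int, record: Any) -> None:
        # Colliding keys are simply appended to the same chain
        self.chains[key % self.m].append((key, record))

    def search(self, key: int) -> Optional[Any]:
        for k, record in self.chains[key % self.m]:
            if k == key:
                return record
        return None


t = ChainedHashTable(10)
t.insert(500, "record A")
t.insert(210, "record B")  # collides with 500 at address 0
print(t.search(210), len(t.chains[0]))
```

Both keys hash to address 0, yet each can still be found by walking the chain stored there.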
3. Bucket Addressing
• Another solution to the hash collision problem is to store colliding elements in the same position in the table by introducing a bucket with each hash address.
• A bucket is a block of memory space that is large enough to store multiple items.
• If a bucket is full, the colliding item can be stored in a new bucket that is linked to the previous bucket.
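A minimal sketch of bucket addressing, assuming H(k) = k (mod m) and a fixed bucket capacity. The link from a full bucket to its overflow bucket is modelled here as the next entry in a per-address list; the class name, capacity, and keys are illustrative.

```python
from typing import List


class BucketHashTable:
    """Each hash address owns a primary bucket plus linked overflow buckets."""

    def __init__(self, m: int, bucket_size: int) -> None:
        self.m = m
        self.bucket_size = bucket_size
        # buckets[a] is the list of buckets for address a; index 0 is the primary
        self.buckets: List[List[List[int]]] = [[[]] for _ in range(m)]

    def insert(self, key: int) -> None:
        chain = self.buckets[key % self.m]
        if len(chain[-1]) >= self.bucket_size:
            chain.append([])  # last bucket is full: link a new overflow bucket
        chain[-1].append(key)

    def search(self, key: int) -> bool:
        return any(key in bucket for bucket in self.buckets[key % self.m])


t = BucketHashTable(m=10, bucket_size=2)
for key in (500, 210, 320):
    t.insert(key)
print(t.buckets[0])
```

With a capacity of two, keys 500 and 210 share the primary bucket at address 0 and 320 goes into a linked overflow bucket.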
