Hashing
Hashing is a technique in which we compute the location of a desired record so that it can be retrieved in a single access (or comparison).
With other searching techniques, the search time depends on the number of comparisons: in the worst case, N comparisons are required to search an array A with N elements.
Advantages
To increase efficiency, i.e., to reduce the search time, we need to avoid unnecessary comparisons.

Operations of Hashing
• Insertion of records
• Searching of records
• Deletion of records

Let L be the set of memory locations in which the records are stored, each record being associated with a key k. If we can locate the memory address of a record from its key, the desired record can be retrieved in a single access. For notational and coding convenience, we assume that the keys k and the addresses in L are (decimal) integers. The location is therefore selected by applying a function, called the hash function (or hashing function), to the key k.

Hash Collision
Unfortunately, such a function H may not yield distinct values (indices or addresses) for different keys: two different keys $k_1$ and $k_2$ may yield the same hash address. This situation is called a hash collision.

Hash Function
The basic idea of a hash function is to transform a key into the corresponding location in the hash table. A hash function H can be defined as a function that takes a key as input and transforms it into a hash table index.

Hash functions are of two types:
1. Distribution-independent functions
2. Distribution-dependent functions

Distribution-independent hash functions:
1. Division method
2. Mid-square method
3. Folding method

1. Division Method
Let TABLE be an array (or database file) in which the employee records are stored. Choose a number m larger than the number of keys, i.e., greater than the total number of records in TABLE. The number m is usually chosen to be a prime number to minimize collisions. The hash function H is defined by

$H(k) = k \bmod m$

where $H(k)$ is the hash address (the index into the array) and $k \bmod m$ is the remainder when k is divided by m.

Example
Suppose a company has 90 employees, and 100 two-digit memory addresses (indices or hash addresses) 00, 01, 02, ..., 99 are available to store the records. The employee code is used as the key. Choose m greater than 90, say m = 93. Then for the following employee codes (keys k):

$H(2103) = 2103 \bmod 93 = 57$
$H(6147) = 6147 \bmod 93 = 9$
$H(3750) = 3750 \bmod 93 = 30$

2. Mid-Square Method
The key k is squared. The hash function H is then defined by

$H(k) = l$

where l is obtained by deleting digits from both ends of $k^2$. The same digit positions of $k^2$ (and hence the same number of digits) must be used for all of the keys. For example, consider the following keys in the table and their hash indices:

3. Folding Method
The key k is partitioned into a number of parts $k_1, k_2, \ldots, k_r$, where each part, except possibly the last, has the same number of digits as the required hash address. The parts are then added together, ignoring the final carry:

$H(k) = k_1 + k_2 + \cdots + k_r$

Here we are dealing with a hash table with indices from 00 to 99, i.e., a two-digit hash table, so we divide the key into parts of two digits each. For example, $H(7148) = 71 + 48 = 119$; ignoring the leading carry (the 1), $H(7148) = 19$. (In a common variant, the even-numbered parts are reversed before adding: $71 + 84 = 155$, giving $H(7148) = 55$.)

Hash Collision
• It is possible that two non-identical keys $k_1$ and $k_2$ are hashed into the same hash address. This situation is called a hash collision.
Let us consider a hash table having 10 locations, as shown in Fig. The division method is used to hash the key:

$H(k) = k \bmod m$, with m = 10.
If we want to insert a new record with key 500, then $H(500) = 500 \bmod 10 = 0$. Location 0 in the table is already filled (i.e., not empty), so a collision occurs. Collisions are almost impossible to avoid, but they can be handled by one of the following three techniques:
1. Open addressing
2. Chaining
3. Bucket addressing

1. Open Addressing
• In the open addressing method, when a key collides with another key, the collision is resolved by probing the cells of the table to find the nearest empty one.

LINEAR PROBING
• Suppose a record R with key k has hash address $H(k) = h$. Then we search the locations $(h + i) \bmod m$ for $i = 0, 1, 2, \ldots, m - 1$, i.e., the addresses h, h + 1, h + 2, h + 3, ..., wrapping around to the start of the table, until a free location is found.
• The main disadvantage of linear probing is that a substantial amount of time may be needed to find a free cell, because the table is searched sequentially and occupied cells tend to form long runs (clusters).

QUADRATIC PROBING
Suppose a record R with key k has hash address $H(k) = h$. Instead of searching the locations with addresses h, h + 1, h + 2, ..., h + i, ..., we search for a free hash address among h, h + 1, h + 4, h + 9, h + 16, ..., $h + i^2$, ... (all taken mod m).

DOUBLE HASHING
A second hash function $H_1$ is used to resolve the collision. Suppose a record R with key k has hash addresses $H(k) = h$ and $H_1(k) = h_1$, where $h_1$ is not equal to m (nor 0), since otherwise every probe would return to address h. Then we search the locations with addresses h, $h + h_1$, $h + 2h_1$, $h + 3h_1$, ..., $h + i \cdot h_1$, ... (mod m), for $i = 0, 1, 2, \ldots$.

2. Chaining
• In the chaining technique, the entries in the hash table are dynamically allocated and inserted into a linked list associated with each hash address.

3. Bucket Addressing
• Another solution to the hash collision problem is to store colliding elements in the same position in the table by introducing a bucket for each hash address.
• A bucket is a block of memory large enough to store multiple items. If a bucket is full, the colliding item can be stored in a new bucket that is linked to the previous one.
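The division, mid-square and folding methods described earlier can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; it assumes non-negative integer keys and a two-digit address space (00 to 99). The digit positions used by the hypothetical `mid_square_hash` (delete three digits on the right, keep the next two) and the `reverse_even` flag of `fold_hash` are illustrative choices, not taken from the notes.

```python
# Minimal sketch of the three distribution-independent hash functions.
# Assumptions: non-negative integer keys, two-digit addresses 00..99.


def division_hash(k: int, m: int) -> int:
    """Division method: H(k) = k mod m."""
    return k % m


def mid_square_hash(k: int, skip: int = 3, width: int = 2) -> int:
    """Mid-square method: square k, delete `skip` digits on the right,
    keep the next `width` digits (the modulo deletes the left end).
    The same positions are used for every key (an assumed choice)."""
    return (k * k // 10 ** skip) % 10 ** width


def fold_hash(k: int, width: int = 2, reverse_even: bool = False) -> int:
    """Folding method: cut k into `width`-digit parts from the left
    (the last part may be shorter), add them, and drop the carry."""
    digits = str(k)
    parts = [digits[i:i + width] for i in range(0, len(digits), width)]
    total = 0
    for index, part in enumerate(parts):
        # Illustrative variant: reverse the even-numbered parts k2, k4, ...
        if reverse_even and index % 2 == 1:
            part = part[::-1]
        total += int(part)
    return total % 10 ** width  # ignore any carry beyond `width` digits


if __name__ == "__main__":
    for key in (2103, 6147, 3750):
        print(key, division_hash(key, 93))  # 2103 57, 6147 9, 3750 30
    print(mid_square_hash(3205))  # 3205**2 = 10272025 -> 72
    print(fold_hash(7148))  # 71 + 48 = 119 -> 19
    print(fold_hash(7148, reverse_even=True))  # 71 + 84 = 155 -> 55
```

With m = 93 the division method reproduces the addresses 57, 9 and 30 from the employee example, and folding 7148 gives 19 (or 55 with the reversed-part variant).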
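Open addressing with the linear, quadratic and double-hashing probe sequences described above can be sketched as below. The sketch assumes integer keys, `None` for an empty cell and $H(k) = k \bmod m$; the second hash function `1 + k % (m - 1)` used for double hashing is a hypothetical choice made only so that $h_1$ is never 0, since the notes do not specify $H_1$.

```python
# Sketch of open addressing with three probe sequences.
# Assumptions: integer keys, None marks an empty cell, H(k) = k mod m,
# and H1(k) = 1 + (k mod (m - 1)) is a hypothetical second hash function.
from typing import List, Optional


def probe(k: int, i: int, m: int, method: str) -> int:
    """Return the i-th address examined for key k (i = 0, 1, 2, ...)."""
    h = k % m                       # home address H(k) = h
    if method == "linear":
        return (h + i) % m          # h, h+1, h+2, ...
    if method == "quadratic":
        return (h + i * i) % m      # h, h+1, h+4, h+9, ...
    if method == "double":
        h1 = 1 + k % (m - 1)        # hypothetical H1(k), never 0
        return (h + i * h1) % m     # h, h+h1, h+2*h1, ...
    raise ValueError(f"unknown probing method: {method}")


def insert(table: List[Optional[int]], k: int, method: str = "linear") -> int:
    """Store k in the first free cell of its probe sequence; return its index."""
    m = len(table)
    for i in range(m):
        slot = probe(k, i, m, method)
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("no free cell found along the probe sequence")


def search(table: List[Optional[int]], k: int, method: str = "linear") -> int:
    """Return the index holding k, or -1; an empty cell ends the search."""
    m = len(table)
    for i in range(m):
        slot = probe(k, i, m, method)
        if table[slot] is None:
            return -1
        if table[slot] == k:
            return slot
    return -1
```

For example, with a 10-cell table whose location 0 already holds a key, inserting 500 by linear probing stores it in location 1. Quadratic probing, and double hashing when $h_1$ shares a factor with m, may not visit every cell, which is why `insert` can fail before the table is full.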
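Chaining can be sketched with an explicit singly linked list per hash address. The class and method names are hypothetical, keys are assumed to be integers, and the division method $H(k) = k \bmod m$ is assumed for the hash address; new nodes are prepended to their chain, which is one common design choice.

```python
# Sketch of chaining: each hash address heads a linked list of keys.
# Assumptions: integer keys, H(k) = k mod m, hypothetical class names.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    next: "Optional[Node]" = None


class ChainedHashTable:
    def __init__(self, m: int) -> None:
        self.m = m
        self.heads: List[Optional[Node]] = [None] * m  # one chain per address

    def _h(self, k: int) -> int:
        return k % self.m  # division-method hash address

    def insert(self, k: int) -> None:
        if not self.search(k):
            addr = self._h(k)
            # Dynamically allocate a node and prepend it to the chain.
            self.heads[addr] = Node(k, self.heads[addr])

    def search(self, k: int) -> bool:
        node = self.heads[self._h(k)]
        while node is not None:
            if node.key == k:
                return True
            node = node.next
        return False

    def delete(self, k: int) -> bool:
        addr = self._h(k)
        prev, node = None, self.heads[addr]
        while node is not None:
            if node.key == k:
                if prev is None:
                    self.heads[addr] = node.next  # unlink the head node
                else:
                    prev.next = node.next         # unlink an inner node
                return True
            prev, node = node, node.next
        return False

    def chain(self, addr: int) -> List[int]:
        """Keys stored at one hash address, front of the chain first."""
        out, node = [], self.heads[addr]
        while node is not None:
            out.append(node.key)
            node = node.next
        return out
```

Colliding keys such as 10 and 500 (both hashing to address 0 when m = 10) simply share one chain, so no probing is needed.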
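Bucket addressing can be sketched as a fixed-capacity bucket per hash address, with an overflow bucket linked on when a bucket fills up. The bucket capacity of 2, the class names and the use of $H(k) = k \bmod m$ are assumptions for illustration only.

```python
# Sketch of bucket addressing with linked overflow buckets.
# Assumptions: integer keys, H(k) = k mod m, bucket capacity 2 (illustrative).
from typing import List, Optional


class Bucket:
    """A block of memory that can hold up to `capacity` keys."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: List[int] = []
        self.next: Optional["Bucket"] = None  # link to an overflow bucket


class BucketHashTable:
    def __init__(self, m: int, capacity: int = 2) -> None:
        self.m = m
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(m)]  # one per address

    def insert(self, k: int) -> None:
        bucket = self.buckets[k % self.m]
        # Walk past full buckets, linking a new bucket at the end if needed.
        while len(bucket.items) == bucket.capacity:
            if bucket.next is None:
                bucket.next = Bucket(self.capacity)
            bucket = bucket.next
        bucket.items.append(k)

    def search(self, k: int) -> bool:
        bucket: Optional[Bucket] = self.buckets[k % self.m]
        while bucket is not None:
            if k in bucket.items:
                return True
            bucket = bucket.next
        return False
```

Inserting 10, 500 and 20 into a 10-address table fills the bucket at address 0 with 10 and 500 and places 20 in a newly linked overflow bucket.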