Hash

Hashing is a technique that maps data to specific locations in a hash table using a hash function for efficient storage and retrieval. It has components like keys, hash functions, and hash tables, and is used in various applications such as database indexing and password storage. Collision resolution techniques include open hashing and closed hashing, each with its own advantages and disadvantages.

Uploaded by

student09hub
Copyright
© All Rights Reserved

Hashing

Definition: Hashing is a fundamental technique in data structures and algorithms that facilitates
efficient data storage and retrieval by mapping data (keys) to specific locations (indices) in a data
structure called a hash table. This process is achieved through the use of a hash function, which
computes an index into an array of buckets or slots, from which the desired value can be found.

Situations Where Hashing Is Not Used

 When data must stay sorted alongside search, insert, and delete. A self-balancing BST is used
in these cases.

 When strings are keys and operations such as prefix search are needed alongside search, insert,
and delete. A trie is used in these cases.

 When operations such as floor and ceiling are needed alongside search, insert, and/or delete. A
self-balancing BST is used in these cases.

Components of Hashing

There are three main components of hashing:

1. Key: The input to the hash function. A key can be any value, such as a string or an integer,
and the hash function uses it to determine the index or location where an item is stored.

2. Hash Function: Receives the input key and returns the index of an element in an array called
a hash table. The index is known as the hash index.

3. Hash Table: A hash table is typically an array of lists. It stores values corresponding to the
keys in an associative manner: each value is placed at the index computed from its key. Distinct
keys can map to the same index, which is why collision resolution is needed.

How does Hashing work?

Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table.

 Step 1: A hash function (some mathematical formula) is used to calculate the hash value, which
acts as the index in the data structure where the value will be stored.

 Step 2: So, let’s assign

o “a” = 1,

o “b” = 2, and so on for all alphabetical characters.

 Step 3: The numerical value of each string is then the sum of the values of its characters:

 “ab” = 1 + 2 = 3,

 “cd” = 3 + 4 = 7,

 “efg” = 5 + 6 + 7 = 18

 Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function
used here is the sum of the characters in the key mod the table size. We can compute the location
of the string in the array by taking sum(string) mod 7.

 Step 5: So we will then store

o “ab” in 3 mod 7 = 3,

o “cd” in 7 mod 7 = 0, and

o “efg” in 18 mod 7 = 4.
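
The five steps above can be sketched in Python. This is a minimal illustration rather than a production hash function; the name letter_sum_hash is hypothetical, and it assumes keys contain only the letters a to z.

```python
def letter_sum_hash(key: str, table_size: int) -> int:
    """Sum the alphabet positions of the letters (a=1, b=2, ...) and reduce mod table_size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in key.lower())
    return total % table_size


TABLE_SIZE = 7
table = [None] * TABLE_SIZE

# Place each string at the index computed by the hash function.
for word in ["ab", "cd", "efg"]:
    table[letter_sum_hash(word, TABLE_SIZE)] = word

print(table)  # ['cd', None, None, 'ab', 'efg', None, None]
```

Because the function only sums letter values, strings with the same letters in a different order (for example “ab” and “ba”) land on the same index, which is one reason real tables need collision handling.
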

Properties of a Good Hash Function

 Deterministic: The same input should always produce the same output.

 Efficiently Computable: The function should compute the hash value quickly.

 Uniform Distribution: It should distribute the keys uniformly across the hash table to
minimize collisions.

 Minimize Collisions: Different keys should ideally hash to different indices.

Collision Resolution Techniques

Collisions occur when two or more keys hash to the same index in a hash table, so every hash table
needs a collision resolution strategy. The two primary strategies are open hashing (also known as
separate chaining) and closed hashing (also known as open addressing).

1. Open Hashing (Separate Chaining):

In open hashing, each slot in the hash table contains a reference to a collection (commonly a linked list)
of all keys that hash to the same index. When a collision occurs, the new key is added to the collection
at that slot.

Example:

Consider a hash table with 10 slots and a hash function h(k) = k % 10. Inserting keys 12, 22, and 32
would result in:

h(12) = 2

h(22) = 2

h(32) = 2

All three keys hash to index 2. Using separate chaining, index 2 would contain a linked list with
elements [12, 22, 32].
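
A minimal sketch of separate chaining for the example above. The class name ChainedHashTable is hypothetical, and for brevity each bucket is a Python list rather than a hand-written linked list; the collision behaviour is the same.

```python
class ChainedHashTable:
    """Hash table of integer keys using separate chaining with h(k) = k % size."""

    def __init__(self, size: int = 10):
        self.size = size
        # One independent (initially empty) chain per slot.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._index(key)]
        if key not in bucket:      # avoid storing duplicates
            bucket.append(key)     # a collision simply extends the chain

    def contains(self, key: int) -> bool:
        return key in self.buckets[self._index(key)]

    def remove(self, key: int) -> bool:
        bucket = self.buckets[self._index(key)]
        if key in bucket:
            bucket.remove(key)
            return True
        return False


chained = ChainedHashTable(10)
for k in (12, 22, 32):
    chained.insert(k)

print(chained.buckets[2])  # [12, 22, 32]
```

Lookup cost is proportional to the length of the chain at the key's index, so the table stays fast only while chains stay short.
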

Advantages:

 Simplifies collision resolution by allowing multiple elements at each index.

 The table can hold more elements than it has slots, and performance degrades gradually as
chains grow rather than failing when the table fills.

Disadvantages:

 Requires additional memory for pointers in the linked lists.

 Performance may degrade if many keys hash to the same index, leading to long chains.

2. Closed Hashing (Open Addressing):

In closed hashing, all keys are stored within the hash table itself. When a collision occurs, the algorithm
probes the table to find the next available slot. Common probing methods include:

Linear Probing: Sequentially checks the following slots, wrapping around at the end of the table,
until an empty one is found.

Example:

If slot h(k) = 2 is occupied, check (h(k) + 1) mod m, (h(k) + 2) mod m, etc., where m is the table
size.

Quadratic Probing: Probes slots at offsets of 1², 2², 3², etc., from the original hash index.

Example:

If slot h(k) = 2 is occupied, check (h(k) + 1²) mod m, (h(k) + 2²) mod m, etc.

Double Hashing: Uses a second hash function to determine the probe step distance.

Example:

If slot h1(k) = 2 is occupied, compute the step size h2(k) and check (h1(k) + 1·h2(k)) mod m,
(h1(k) + 2·h2(k)) mod m, etc. The step h2(k) must never be 0.
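
The three probe sequences can be compared side by side with a short sketch. Here m is the table size and i is the probe attempt (0 for the home slot); the function names and the step value h2 = 3 are assumptions chosen only for illustration.

```python
def linear_probe(h: int, i: int, m: int) -> int:
    """i-th slot tried by linear probing from home slot h."""
    return (h + i) % m


def quadratic_probe(h: int, i: int, m: int) -> int:
    """i-th slot tried by quadratic probing from home slot h."""
    return (h + i * i) % m


def double_hash_probe(h1: int, h2: int, i: int, m: int) -> int:
    """i-th slot tried by double hashing; h2 is the (nonzero) step size."""
    return (h1 + i * h2) % m


m = 10
print([linear_probe(2, i, m) for i in range(4)])          # [2, 3, 4, 5]
print([quadratic_probe(2, i, m) for i in range(4)])       # [2, 3, 6, 1]
print([double_hash_probe(2, 3, i, m) for i in range(4)])  # [2, 5, 8, 1]
```

Every index is reduced mod m so the sequence wraps around instead of running off the end of the table.
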

Advantages:

 Avoids the need for additional data structures like linked lists.
 Can be more cache-friendly due to contiguous memory usage.

Disadvantages:

 Performance degrades as the load factor increases, because occupied slots cluster together and
probe sequences grow longer.


 Deletion of keys can be complex, often requiring special markers to indicate deleted slots.
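
To illustrate why deletion needs special markers, here is a minimal sketch of closed hashing with linear probing. The class name LinearProbingTable and the sentinels EMPTY and DELETED are hypothetical; a real implementation would also resize the table as it fills.

```python
EMPTY = object()    # slot has never held a key
DELETED = object()  # tombstone: slot held a key that was later removed


class LinearProbingTable:
    """Integer keys stored directly in the table, h(k) = k % size, linear probing."""

    def __init__(self, size: int = 10):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key: int):
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def contains(self, key: int) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:          # a never-used slot ends the search
                return False
            if slot is not DELETED and slot == key:
                return True            # tombstones are skipped, not treated as empty
        return False

    def insert(self, key: int) -> bool:
        if self.contains(key):
            return True
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key  # tombstones can be reused
                return True
        return False                   # table is full

    def remove(self, key: int) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.slots[idx] = DELETED
                return True
        return False


probing = LinearProbingTable(10)
for k in (12, 22, 32):
    probing.insert(k)          # stored at indices 2, 3, 4
probing.remove(22)             # index 3 becomes a tombstone
print(probing.contains(32))    # True
```

If remove reset the slot to EMPTY instead, the search for 32 would stop at index 3 and wrongly report the key as missing.
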

Choosing between open and closed hashing depends on factors like memory availability,
expected load factor, and performance requirements. Understanding these methods is crucial
for designing efficient hash tables and ensuring optimal data retrieval performance.

Applications of Hashing

 Database Indexing: Quickly locate data without searching every row.


 Caches: Implement associative arrays for fast data retrieval.

 Sets: Implement sets that can check for membership efficiently.

 Password Storage: Store hashed passwords to enhance security.
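
As a hedged illustration of the password storage application, the sketch below uses Python's standard hashlib.pbkdf2_hmac. The salt, the iteration count, and the function names hash_password and verify_password are assumptions added for the example; the notes above say only that passwords are stored hashed.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # illustrative work factor (an assumption, not a recommendation)


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password itself."""
    salt = os.urandom(16)  # random per-password salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

A deliberately slow, salted function is used here because the fast hash functions suited to hash tables make password guessing cheap.
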

Advantages of Hashing

 Fast Data Retrieval: Search, insert, and delete take constant time, O(1), on average. The worst
case is O(n) when many keys collide.

 Efficient Memory Usage: Stores only the entries actually inserted rather than reserving a slot
for every possible key.

Disadvantages of Hashing

 Collisions: Handling collisions can complicate implementation and affect performance.

 Fixed Size: A basic hash table has a fixed number of slots. A table that is too small suffers
many collisions, one that is too large wastes memory, and growing it requires rehashing every key
into a new table.

 Not Ordered: Data is not stored in a sorted order, making range queries inefficient.
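
The fixed-size limitation is commonly worked around by moving every key into a larger table once the load factor (stored keys divided by slots) gets high. The sketch below assumes a chained table of integer keys with h(k) = k % size; the helper names load_factor and rehash are hypothetical.

```python
def load_factor(buckets) -> float:
    """Number of stored keys divided by number of slots."""
    return sum(len(bucket) for bucket in buckets) / len(buckets)


def rehash(buckets, new_size: int):
    """Build a table of new_size slots and re-insert every key with the new modulus."""
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for key in bucket:
            new_buckets[key % new_size].append(key)
    return new_buckets


small = [[] for _ in range(4)]
for k in (1, 5, 9, 13):
    small[k % 4].append(k)     # all four keys collide at index 1

print(load_factor(small))      # 1.0
bigger = rehash(small, 8)
print(bigger)                  # [[], [1, 9], [], [], [], [5, 13], [], []]
print(load_factor(bigger))     # 0.5
```

Rehashing touches every stored key, so it costs O(n) each time it runs.
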

Hash Function

A hash function takes an input (or 'key') and returns an integer, which is typically used as an
index in a hash table. The goal is to distribute the keys uniformly across the hash table to
minimize collisions. Below are some common types of hash functions, each explained with examples:

1. Division (Modulo) Method:

o Description: This method computes the hash value by taking the remainder of the
division of the key by the size of the hash table.

o Formula: hash(key) = key % table_size

o Example: If the key is 1234 and the table size is 10, the hash value would be 1234 %
10 = 4.

2. Multiplication Method:

o Description: This method involves multiplying the key by a constant fractional value
(A), extracting the fractional part of the result, and then multiplying it by the table size
to get the hash value.

o Formula: hash(key) = floor(table_size * ((key * A) % 1)), where 0 < A < 1

o Example: If the key is 1234, table size is 10, and A is 0.618, then 1234 * 0.618 = 762.612,
the fractional part is 0.612, and the hash value is floor(10 * 0.612) = 6.

3. Mid-Square Method:

o Description: This method squares the key and then extracts a portion of the resulting
digits to use as the hash value.

o Example: If the key is 56, squaring it gives 3136. Extracting the middle two digits (13)
could serve as the hash value.

4. Folding Method:

o Description: This method divides the key into equal parts, adds these parts together,
and then applies a modulo operation with the table size to get the hash value.

o Example: If the key is 987654 and the table size is 100, dividing the key into two parts
(987 and 654), summing them gives 1641. Then, 1641 % 100 = 41, so the hash value
is 41.

5. Digit Extraction Method:

o Description: This method selects specific digits from the key to form the hash value.

o Example: If the key is 7654321, extracting the 2nd, 4th, and 6th digits gives 642, which
can be used as the hash value.

6. Radix Transformation Method:

o Description: This method changes the base of the key to another number system (radix)
and then applies a hash function.

o Example: Converting a decimal key 255 to binary gives 11111111. Applying a hash
function to this binary representation yields the hash value.

7. Pseudo-Random Method:

o Description: This method uses a pseudo-random number generator to produce a hash
value based on the key.

o Example: Using a seed value equal to the key in a pseudo-random number generator
to produce a hash value.

8. Cryptographic Hash Functions:


o Description: These functions are designed for security purposes, producing a fixed-size
hash value from input data, making it computationally infeasible to recover the input from
the hash or to find two inputs with the same hash.

o Example: SHA-256 produces a 256-bit hash value from any input data.
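
Several of the methods above can be checked with a short Python sketch that reproduces the worked examples. The function names are hypothetical, the digit-based methods assume non-negative integer keys, and the middle-digit rule in mid_square_hash is one reasonable choice among several.

```python
import hashlib
import math


def division_hash(key: int, table_size: int) -> int:
    return key % table_size


def multiplication_hash(key: int, table_size: int, a: float = 0.618) -> int:
    # Take the fractional part of key * a, then scale it to the table size.
    return math.floor(table_size * ((key * a) % 1))


def mid_square_hash(key: int, digits: int = 2) -> int:
    squared = str(key * key)
    start = (len(squared) - digits) // 2   # centre the extracted window
    return int(squared[start:start + digits])


def folding_hash(key: int, part_len: int, table_size: int) -> int:
    text = str(key)
    parts = [int(text[i:i + part_len]) for i in range(0, len(text), part_len)]
    return sum(parts) % table_size


def digit_extraction_hash(key: int, positions) -> int:
    text = str(key)
    # Positions are 1-based, counted from the left.
    return int("".join(text[p - 1] for p in positions))


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


print(division_hash(1234, 10))                     # 4
print(multiplication_hash(1234, 10))               # 6
print(mid_square_hash(56))                         # 13
print(folding_hash(987654, 3, 100))                # 41
print(digit_extraction_hash(7654321, (2, 4, 6)))   # 642
print(len(sha256_hex(b"hello")) * 4)               # 256 (64 hex digits, 4 bits each)
```

The SHA-256 digest has the same length for any input, which is the fixed-size property described in item 8.
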

Each of these hash functions has its own use cases and is chosen based on factors like the nature of the
keys, the required distribution uniformity, and specific application requirements. Understanding these
functions helps in designing efficient hash tables and ensuring optimal performance in data retrieval
operations.
