DSA G5 Hashing Handouts
Group 5
Hashing
Definition
- refers to the process of transforming a given key into another value. It maps data to a specific
index in a hash table using a hash function, which enables fast retrieval of information based on its
key. The value obtained from the hash function is called the hash code (or hash value).
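To make the key → hash code → index idea concrete, here is a minimal Python sketch. The polynomial string hash with base 31 is an illustrative assumption (it is only one of many possible hash functions), and `hash_code` and `to_index` are hypothetical names. A fixed function is used instead of Python's built-in `hash()` because Python randomizes string hashes per process.

```python
def hash_code(key: str, base: int = 31) -> int:
    """Hypothetical polynomial hash code: treats the characters as digits in base `base`."""
    code = 0
    for ch in key:
        code = code * base + ord(ch)  # shift earlier characters and add this one
    return code


def to_index(key: str, table_size: int) -> int:
    """Map a key's hash code to a slot index in the range 0..table_size-1."""
    return hash_code(key) % table_size


print(hash_code("cat"))     # 98262
print(to_index("cat", 10))  # 2
```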
Hash Tables
A hash table is a data structure used to insert, look up, and remove key-value pairs quickly. It
operates on the hashing concept: a hash function translates each key into an index in an array.
Ideally each key gets its own index, but different keys can map to the same one (a collision).
It stores key-value pairs and uses a hash function to map each key to a specific location, or "bucket,"
in memory. Hash tables are widely used in programming for their efficiency in performing quick
insertions, deletions, and lookups.
The index serves as the storage location for the matching value. In simple words, a hash table maps
keys to values.
A hash table's load factor is the number of stored elements divided by the size of the table. If the
load factor is high, the table is crowded, which causes more collisions and longer search times.
Good performance can be maintained by using a hash function that spreads keys evenly and by
resizing the table to keep the load factor low.
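A minimal sketch of load-factor-driven resizing, assuming integer keys hashed with `key % size`, a maximum load factor of 0.75, and doubling the table when that limit is exceeded. The threshold, the doubling policy, and the helper names (`load_factor`, `rehash`, `MAX_LOAD`) are illustrative assumptions, not fixed rules.

```python
MAX_LOAD = 0.75  # assumed threshold; real libraries pick their own values


def load_factor(num_elements: int, table_size: int) -> float:
    """Load factor = stored elements / number of slots."""
    return num_elements / table_size


def rehash(buckets, new_size):
    """Move every (key, value) pair into a new bucket array of new_size slots.
    Assumes integer keys hashed with key % size."""
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for key, value in bucket:
            new_buckets[key % new_size].append((key, value))
    return new_buckets


buckets = [[] for _ in range(4)]
count = 0
for key in [3, 7, 11, 15]:
    buckets[key % len(buckets)].append((key, str(key)))
    count += 1
    if load_factor(count, len(buckets)) > MAX_LOAD:
        # Table is too crowded: double its size and redistribute the keys.
        buckets = rehash(buckets, 2 * len(buckets))

print(len(buckets), load_factor(count, len(buckets)))  # 8 0.5
```

Rehashing is needed because each key's index depends on the table size, so every key must be recomputed when the size changes.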
Keys - a unique identifier for values. Keys are passed through a hash function to determine where to
store or retrieve the corresponding value.
Values - the data associated with each key, which can be of any data type.
Hash Function - an algorithm that takes a key as input and produces the index in the table where
the value will be stored. A good hash function distributes keys uniformly to minimize collisions.
Buckets - the individual slots in the hash table where key-value pairs are stored. Each bucket is
indexed by the hash function's output.
When a key-value pair is added, the key is hashed using a hash function, producing an index where
the value is stored.
For lookups, the hash table hashes the key to find the index where the value is stored.
If the hash function distributes keys well, the hash table achieves O(1) (constant-time) average
complexity for insertion, deletion, and retrieval; in the worst case, when many keys collide, these
operations can degrade to O(n).
Hash Functions
Hash functions are a fundamental concept in computer science and play a crucial role in applications
such as data storage, retrieval, and cryptography. They are primarily used in hash tables, which are
essential for efficient data management.
A hash function takes an input (or 'message') and returns a fixed-size output, such as an integer or a
string of bytes. This output is called the hash code or hash value.
The main purpose of a hash function is to efficiently map data of arbitrary size to fixed-size values,
which are often used as indexes in hash tables.
Division Method
It is one of the simplest hashing techniques.
The key is divided by a divisor (often a prime number, typically the table size), and the remainder is
used as the hash value. This method is easy to implement, but a poorly chosen divisor can lead to
clustering, where many keys map to the same few hash values.
Formula: Hash Value = key mod divisor
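A minimal Python sketch of the division method; the divisor 11 (a prime) and the sample keys are arbitrary choices for illustration.

```python
def division_hash(key: int, divisor: int) -> int:
    """Division method: the remainder of key / divisor is the hash value."""
    return key % divisor


for key in [25, 36, 49, 112]:
    print(key, division_hash(key, 11))
# 25 3
# 36 3   (collides with 25)
# 49 5
# 112 2
```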
Multiplication Method
It multiplies the key by a constant A between 0 and 1, keeps only the fractional part of the product,
multiplies that fraction by the table size, and rounds down to get the hash value.
This method tends to distribute keys evenly across the hash table, which reduces clustering, and it is
less sensitive to the choice of table size than the division method.
Formula: $\text{Hash Value} = \lfloor m \times (k \times A \bmod 1) \rfloor$
where:
m is the size of the hash table,
k is the key,
A is a constant between 0 and 1 (Knuth suggests A = (√5 − 1)/2 ≈ 0.6180339887, which is related to
the golden ratio), and k × A mod 1 is the fractional part of k × A.
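The formula can be sketched in Python as follows, assuming Knuth's constant for A and a table of size m = 100; `multiplication_hash` is a hypothetical name.

```python
import math

# Assumed constant: Knuth's suggestion (sqrt(5) - 1) / 2 ≈ 0.6180339887
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k: int, m: int, a: float = A) -> int:
    """Hash Value = floor(m * (k * A mod 1))."""
    fractional_part = (k * a) % 1  # keep only the fractional part of k * A
    return math.floor(m * fractional_part)


print(multiplication_hash(10, 100))   # 18  (10 * A ≈ 6.1803, fraction ≈ 0.1803)
print(multiplication_hash(123, 100))  # 1   (123 * A ≈ 76.0182, fraction ≈ 0.0182)
```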
Mid-Square Method
In the Mid-Square Method, the key is squared, and the middle portion of the resulting number is
taken as the hash value. Because the middle digits of the square depend on every digit of the key,
this method tends to produce a diverse range of hash values and fewer collisions.
Steps:
i. Square the key.
ii. Extract the middle digits of the squared result as the hash value.
For example: if the key is 123, then 123² = 15129. Taking the middle three digits gives a hash value
of 512.
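The two steps above can be sketched in Python like this. Taking three middle digits is an assumption matching the example, and when the square has an even number of extra digits this sketch leans toward the left, which is one of several possible conventions.

```python
def mid_square_hash(key: int, digits: int = 3) -> int:
    """Square the key and return its middle `digits` digits as the hash value."""
    squared = str(key * key)
    if len(squared) <= digits:
        return int(squared)  # too short to trim: use the whole square
    start = (len(squared) - digits) // 2  # leans left when the split is uneven
    return int(squared[start:start + digits])


print(mid_square_hash(123))  # 123² = 15129 -> 512
```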
Folding Method
The Folding Method splits the key into parts with the same number of digits (the last part may be
shorter) and then adds those parts together to obtain the hash value. This method is useful when keys
are large numbers, like identification numbers.
Steps:
i. Split the key into several parts.
ii. Add those parts together to get the hash value.
For example: if the key is 123456, split it into 123 and 456. Summing these parts gives a hash value
of 123 + 456 = 579.
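A minimal Python sketch of the folding steps, assuming three-digit parts. The optional `table_size` parameter is an assumed extra step (reducing the sum modulo the table size so it fits as an index), not part of the method as described above.

```python
from typing import Optional


def fold_hash(key: int, part_size: int = 3, table_size: Optional[int] = None) -> int:
    """Split the key's digits into parts of part_size digits and add them up."""
    digits = str(key)
    # The last part may be shorter than part_size.
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    total = sum(parts)
    # Assumed optional step: reduce the sum to a valid table index.
    return total % table_size if table_size is not None else total


print(fold_hash(123456))                  # 123 + 456 = 579
print(fold_hash(12345678))                # 123 + 456 + 78 = 657
print(fold_hash(123456, table_size=100))  # 579 mod 100 = 79
```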
Cryptographic Hash Functions
These are designed to be secure and are used in cryptography, for tasks like digital signatures,
password storage, and data integrity checks. Examples include MD5, SHA-1, and SHA-256; MD5 and
SHA-1 are no longer considered resistant to collision attacks, so SHA-256 is preferred for new uses.
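As an illustration of a data integrity check, here is a small sketch using SHA-256 from Python's standard `hashlib` module; the messages and the `sha256_hex` helper name are made up for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"transfer 100 to account 42"
checksum = sha256_hex(original)  # stored or sent alongside the data

received = b"transfer 900 to account 42"  # a tampered copy
print(sha256_hex(original) == checksum)  # True: data unchanged
print(sha256_hex(received) == checksum)  # False: any change alters the digest
```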
Collision Resolution Techniques
What is Collision?
A hash function maps keys from a very large space (such as big integers or strings) to a small range
of table indices, so two different keys can produce the same index. The situation where a newly
inserted key maps to an already occupied slot in the hash table is called a collision, and it must be
handled using a collision resolution technique. The two main techniques are:
Separate Chaining
Open Addressing
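The sketch below shows how collisions arise, assuming a 10-slot table and the division method (`key % 10`). It only detects collisions; the colliding key is not stored, which is exactly the gap that the techniques above fill.

```python
TABLE_SIZE = 10  # assumed table size


def slot(key: int) -> int:
    """Division-method hash used only for this illustration."""
    return key % TABLE_SIZE


occupied = {}    # slot index -> key already stored there
collisions = []  # (new key, slot, key already in that slot)
for key in [42, 17, 52, 7]:
    index = slot(key)
    if index in occupied:
        # A real table would now apply separate chaining or open addressing;
        # this sketch only records the collision and drops the new key.
        collisions.append((key, index, occupied[index]))
    else:
        occupied[index] = key

print(collisions)  # [(52, 2, 42), (7, 7, 17)]
```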
SEPARATE CHAINING
The idea behind separate chaining is to make each slot of the hash table array point to a linked list,
called a chain. When multiple elements hash to the same slot index, they are all stored in that slot's
singly linked list (its chain).
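A minimal separate-chaining sketch in Python. For brevity it uses Python lists as the chains instead of hand-written singly linked lists, and it assumes integer keys hashed with `key % size`; the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a chain of (key, value) pairs."""

    def __init__(self, size: int = 7):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one (initially empty) chain per slot

    def _index(self, key: int) -> int:
        return key % self.size  # assumed division-method hash

    def insert(self, key: int, value) -> None:
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))  # otherwise add it to the end of the chain

    def search(self, key: int):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None  # key not found

    def delete(self, key: int) -> bool:
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


t = ChainedHashTable(7)
for key in [50, 700, 76, 85, 92, 73, 101]:
    t.insert(key, str(key))
print([k for k, _ in t.buckets[1]])  # [50, 85, 92]: three keys share slot 1
```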
Performance of Chaining
Performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed
to any slot of the table (simple uniform hashing).
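m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Load factor α = n/m (the expected number of keys in each chain)
Expected time to search or delete = O(1 + α), which is O(1) when α is bounded by a constant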
OPEN ADDRESSING
- is a collision handling technique used in hash tables. Instead of using linked lists (as in separate
chaining), Open Addressing keeps all elements within the hash table itself, so each slot holds at most
one key-value pair. When a collision occurs, it probes for another empty slot within the table in which
to store the new key.
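A minimal open-addressing sketch using linear probing, assuming integer keys, `key % size` as the hash, and `None` as the marker for an empty slot. Deletion is deliberately omitted: removing a key from an open-addressing table needs a special "deleted" marker so that later searches do not stop early.

```python
class LinearProbingTable:
    """Open addressing with linear probing; every key lives in the table itself."""

    def __init__(self, size: int = 7):
        self.size = size
        self.keys = [None] * size    # None marks an empty slot
        self.values = [None] * size

    def insert(self, key: int, value) -> None:
        start = key % self.size  # assumed division-method hash
        for step in range(self.size):
            i = (start + step) % self.size  # try the next slot, wrapping around
            if self.keys[i] is None or self.keys[i] == key:
                self.keys[i] = key
                self.values[i] = value
                return
        raise RuntimeError("hash table is full")

    def search(self, key: int):
        start = key % self.size
        for step in range(self.size):
            i = (start + step) % self.size
            if self.keys[i] is None:
                return None  # reached an empty slot: the key is absent
            if self.keys[i] == key:
                return self.values[i]
        return None  # probed every slot without finding the key


t = LinearProbingTable(7)
for key in [50, 700, 76, 85, 92, 73, 101]:
    t.insert(key, str(key))
print(t.keys)  # [700, 50, 85, 92, 73, 101, 76]
```

Here 85 and 92 both hash to slot 1 and are pushed to slots 2 and 3, after which 73 and 101 are pushed further along: this run of filled neighbouring slots is the clustering that linear probing suffers from.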
Linear Probing: Searches sequentially from the point of collision until an empty slot is found.
Quadratic Probing: Uses a quadratic function to space out the search intervals, reducing the
clustering seen with linear probing.
Double Hashing: Uses a second hash function to calculate the probe interval, so keys that collide at
the same slot usually follow different probe sequences.
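The three probing strategies can be compared by printing the slots each one visits. This sketch assumes a table of size 11, a first hash h1(k) = k mod 11, and a second hash h2(k) = 7 − (k mod 7), a common textbook choice that never returns 0; all of these are illustrative assumptions.

```python
M = 11  # assumed table size (prime)
R = 7   # assumed prime smaller than M, used by the second hash function


def h1(key: int) -> int:
    return key % M


def h2(key: int) -> int:
    return R - (key % R)  # always between 1 and R, so probing always moves


def probe_sequence(key: int, method: str, length: int = 5) -> list:
    """First `length` slots visited for `key` under the given probing method."""
    seq = []
    for i in range(length):
        if method == "linear":
            seq.append((h1(key) + i) % M)
        elif method == "quadratic":
            seq.append((h1(key) + i * i) % M)
        elif method == "double":
            seq.append((h1(key) + i * h2(key)) % M)
        else:
            raise ValueError(f"unknown method: {method}")
    return seq


# 24 and 35 both hash to slot 2.
for key in (24, 35):
    print(key, probe_sequence(key, "linear"), probe_sequence(key, "quadratic"),
          probe_sequence(key, "double"))
# 24 [2, 3, 4, 5, 6] [2, 3, 6, 0, 7] [2, 6, 10, 3, 7]
# 35 [2, 3, 4, 5, 6] [2, 3, 6, 0, 7] [2, 9, 5, 1, 8]
```

Keys with the same home slot share the same linear and quadratic probe sequences, while double hashing gives them different sequences.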
Load Factor α = n/m (where n is the number of keys and m is the number of slots). Because every key
occupies its own slot in open addressing, α can never exceed 1.
Separate Chaining vs. Open Addressing
Feature | Separate Chaining | Open Addressing
Table Capacity | Never fills up (can always add more elements to a chain) | Table can become full
Usage Scenario | Suitable when the number and frequency of insertions/deletions are unknown | Better when the number of keys and their frequency are known
Cache Performance | Poorer (due to linked list traversal) | Better (data stays within the table)
Memory Efficiency | Uses extra space for links | No extra links, but requires sufficient slots
Space Utilization | Some slots might be unused | Every slot can be used, even if no key hashes to it directly
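Open Addressing has better cache performance and resolves collisions by probing (linear, quadratic, or
double hashing); quadratic probing and double hashing reduce the clustering that linear probing
causes. However, it is limited by the table's capacity, and its performance degrades quickly as the
load factor approaches 1.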
Applications of Hashing
Database Indexing: Quickly retrieves data using key-value pairs.
Caches: Stores frequently accessed data for fast retrieval.
Dictionaries in Programming Languages: Implements associative arrays for key-value data storage.
Password Verification: Stores hashed passwords for secure authentication.
Memory Management: Tracks memory allocation and deallocation for efficient usage.
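For the password verification use case, a minimal sketch with Python's standard library: a random salt plus the PBKDF2-HMAC-SHA256 key-derivation function, and a constant-time comparison. The iteration count and the helper names `hash_password` and `verify_password` are assumptions for illustration; real systems should follow current security guidance for these parameters.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the plain password."""
    salt = os.urandom(16)  # random salt so equal passwords get different hashes
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("s3cret")
print(verify_password("s3cret", salt, stored))  # True
print(verify_password("guess", salt, stored))   # False
```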
Leader
Carlowe Deala
Members
Jenevive Sanchez
Joan Grace Patalinghug
Keisha Soler
Nathaniel Piraman
Ivann Jade Martel
John Marnell Asutilla