
Hashing

Hashing refers to generating a fixed-size output from a variable-size input using hash functions, in order to determine a storage location. Hashing enables constant-time storage and retrieval on average by mapping keys to array indices via hash functions. Good hash functions distribute keys uniformly and minimize collisions.

Uploaded by

Hajra Arshad Abbasi 4374-FBAS/BSCS/F21


Hashing refers to the process of generating a fixed-size output from an input of variable size using mathematical formulas known as hash functions. This technique determines an index or location for the storage of an item in a data structure.

Need for Hash data structure

Every day, the amount of data on the internet grows many times over, and storing it efficiently is a constant struggle. In day-to-day programming the amount of data may not be that large, but it still needs to be stored, accessed, and processed easily and efficiently. A very common data structure used for this purpose is the array.
Now the question arises: if arrays already exist, why do we need a new data structure? The answer lies in the word "efficiency". Although storing an element in an array takes O(1) time, searching for it takes O(n) time in an unsorted array, or O(log n) time with binary search if the array is kept sorted. This may seem small, but for a large data set it becomes a real cost, which makes the array inefficient for frequent lookups.
So we are looking for a data structure that can store data and search it in constant time, i.e. in O(1) time. This is where the hash data structure comes into play. With a hash table it is possible to store data and retrieve it in constant time on average (the worst case can be slower when many keys collide).

Components of Hashing
There are three main components of hashing:
1. Key: A key can be any value, such as a string or an integer, that is fed as input to the hash function, which determines an index or location for storing the item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an element in an array called a hash table. This index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a special function called a hash function. It stores data in an associative manner in an array, where each data value is placed at the index computed from its key.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table.
Our main objective is to search or update the values stored in the table quickly, in O(1) time, and we are not concerned about the ordering of the strings in the table. So each string in the set can act as the key, and the string itself will also act as the value. But how do we decide where to store the value corresponding to a key?
Step 1: A hash function (a mathematical formula) is used to calculate the hash value, which acts as the index in the data structure where the value will be stored.
Step 2: Assign a numerical value to every letter of the alphabet: “a” = 1, “b” = 2, and so on.
Step 3: The numerical value of a string is then the sum of the values of its characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7,
“efg” = 5 + 6 + 7 = 18
Step 4: Now assume that we have a table of size 7 to store these strings. The hash function used here is the sum of the characters in the key mod the table size, so we compute the location of each string in the array as sum(string) mod 7.
Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.

Mapping key with indices of array

The above technique enables us to calculate the location of a given string by using a
simple hash function and rapidly find the value that is stored in that location. Therefore
the idea of hashing seems like a great way to store (key, value) pairs of the data in a
table.
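The walkthrough above can be sketched in a few lines of Python. This is a minimal illustration, assuming lowercase ASCII keys; the function name `letter_sum_hash` is hypothetical, and collisions are deliberately not handled yet.

```python
def letter_sum_hash(key: str, table_size: int) -> int:
    """Sum of letter values ('a' = 1, 'b' = 2, ...) modulo the table size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in key)
    return total % table_size


TABLE_SIZE = 7
table = [None] * TABLE_SIZE

# Store each string at the index computed by the hash function.
for s in ["ab", "cd", "efg"]:
    table[letter_sum_hash(s, TABLE_SIZE)] = s

print(table)  # ['cd', None, None, 'ab', 'efg', None, None]

# Lookup: recompute the index and inspect that single slot directly.
index = letter_sum_hash("efg", TABLE_SIZE)
print(index, table[index])  # 4 efg
```

The lookup never scans the table: it recomputes the hash and goes straight to one slot, which is where the constant-time behaviour comes from.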

What is a Hash function?

A hash function creates a mapping between a key and the location of its value; it is a mathematical formula applied to the key. The result of the hash function is referred to as a hash value or hash. The hash value is a representation of the original string of characters, but it is usually smaller than the original.
For example, consider an array as a map where the key is the index and the value is the element at that index. For an array A, if index i is treated as the key, we can find the value simply by looking at A[i].

Types of Hash functions:

There are many hash functions that work with numeric or alphanumeric keys. This section discusses four of them:
1. Division Method
2. Mid Square Method
3. Digit Folding Method
4. Multiplication Method

1. Division Method:
This is the simplest method of generating a hash value. The hash function divides the key k by M and uses the remainder.
Formula:
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
M is best chosen to be a prime number, since that tends to distribute the keys more uniformly. The hash value depends only on the remainder of the division.

Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
=0
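A minimal Python sketch of the division method, reproducing the two worked examples above; `division_hash` is a hypothetical name used only for illustration.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod M."""
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```

In Python, `%` with a positive M always returns a value in the range 0 to M - 1, even for negative keys; in languages such as C or Java the remainder of a negative key can be negative and would need adjusting before it is used as an index.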
Pros:
1. This method works with any table size M, although some values of M give better distributions than others.
2. The division method is very fast, since it requires only a single division operation.
Cons:
1. It can perform poorly for some key patterns, since consecutive keys map to consecutive hash values in the hash table.
2. Extra care is needed when choosing the value of M.
2. Mid Square Method:
The mid-square method is a good hashing method. It involves two steps to compute the hash value:
1. Square the value of the key k, i.e. compute k².
2. Extract the middle r digits of k² as the hash value.

Formula:
h(k) = middle r digits of (k × k)
Here,
k is the key value.
The value of r is chosen based on the size of the table.

Example:
Suppose the hash table has 100 memory locations. So r = 2 because two digits are
required to map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60
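A small Python sketch of the mid-square method. The text does not say which digits to take when the middle cannot be centred exactly, or what to do when the square has r or fewer digits; this sketch assumes the slice is shifted one digit toward the left in the first case and that the whole square is used in the second. The name `mid_square_hash` is hypothetical.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Mid-square method: square k and keep the middle r digits."""
    squared = str(k * k)
    if len(squared) <= r:
        # Assumption: too few digits for a middle slice, so use the whole square.
        return int(squared)
    # Assumption: when the split is uneven, the extra digit is left on the right.
    start = (len(squared) - r) // 2
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))    # 60 (middle two digits of 3600)
print(mid_square_hash(1234, 2))  # 22 (1234 * 1234 = 1522756)
```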
Pros:
1. The performance of this method is good as most or all digits of the key value
contribute to the result. This is because all digits in the key contribute to generating
the middle digits of the squared result.
2. The result is not dominated by the distribution of the top digit or bottom digit of the
original key value.
Cons:
1. The size of the key is a limitation of this method: squaring a large key doubles its number of digits, which can overflow fixed-size integer types.
2. Collisions can still occur, although this method tries to reduce them.

3. Digit Folding Method:

This method involves two steps:
1. Divide the key value k into a number of parts k1, k2, k3, ..., kn, where each part has the same number of digits except for the last part, which can have fewer digits than the other parts.
2. Add the individual parts. The hash value is obtained by ignoring the final carry, if any.
Formula:
k = k1, k2, k3, k4, ..., kn
s = k1 + k2 + k3 + k4 + ... + kn
h(k) = s (ignoring the final carry)
Here,
s is obtained by adding the parts of the key k.
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
Note:
The number of digits in each part depends on the size of the hash table. For example, if the size of the hash table is 100, then each part must have two digits, except for the last part, which can have fewer digits.
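A Python sketch of digit folding. It assumes the key is split from the left, as in the example, and interprets "ignoring the final carry" as keeping only the last `part_len` digits of the sum (i.e. taking the sum mod 10^part_len); `fold_hash` and `part_len` are hypothetical names.

```python
def fold_hash(k: int, part_len: int = 2) -> int:
    """Digit folding: split k into part_len-digit pieces, add them,
    and drop any carry beyond part_len digits."""
    digits = str(k)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    total = sum(parts)
    # Ignoring the final carry keeps only the last part_len digits.
    return total % (10 ** part_len)


print(fold_hash(12345))  # parts 12, 34, 5 -> 51
print(fold_hash(9999))   # 99 + 99 = 198, carry dropped -> 98
```

With `part_len = 2` every result fits in a table of 100 slots, which matches the note above.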

4. Multiplication Method
This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value with A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in step
4.

Formula:
h(k) = floor(M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
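The first function below reproduces the worked example with floating-point arithmetic. The second is a sketch of a common fixed-point variant for a table size M = 2^p, assuming a 32-bit word and Knuth's suggested constant A = (√5 − 1)/2; both function names are hypothetical. It illustrates why power-of-two table sizes are convenient: the index is just the top p bits of the low word of k·s.

```python
import math


def multiplication_hash(k: int, a: float, m: int) -> int:
    """Multiplication method: h(k) = floor(M * (k*A mod 1))."""
    fractional = (k * a) % 1.0  # fractional part of k*A
    return math.floor(m * fractional)


def multiplication_hash_pow2(k: int, p: int, w: int = 32) -> int:
    """Fixed-point variant for M = 2**p (assumes a w-bit word, 1 <= p <= w)."""
    a = (math.sqrt(5) - 1) / 2       # A ~= 0.618..., Knuth's suggested constant
    s = int(a * 2 ** w)              # A approximated as s / 2**w
    frac_bits = (k * s) % (2 ** w)   # low w bits = fractional part of k*A, scaled by 2**w
    return frac_bits >> (w - p)      # top p bits = floor(M * fractional part)


print(multiplication_hash(12345, 0.357840, 100))  # 53
```

In the fixed-point version no floating-point fraction is extracted at run time: a multiplication, a mask (mod 2^w) and a shift are enough.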
Pros:
The advantage of the multiplication method is that it works with any value of A between 0 and 1, although some values tend to give better results than others.
Cons:
The method is generally used with a table size M that is a power of two; in that case the index can be computed very quickly with integer multiplication and bit shifts, while for other table sizes this shortcut is not available. The quality of the distribution also depends on the choice of A.

Properties of a Good hash function

A hash function that maps every item into its own unique slot is known as a perfect hash function. We can construct a perfect hash function if we know the items in advance and the collection will never change, but for an arbitrary collection of items there is no systematic way to construct one. Fortunately, we still gain efficiency even if the hash function is not perfect. One way to achieve a perfect hash function is to make the hash table large enough that every possible key value can be accommodated, so that each item has a unique slot. Although this approach is feasible for a small number of possible keys, it is not practical when the number of possibilities is large.
So when we construct our own hash function, there are several things we must be careful about.
A good hash function should have the following properties:
1. It is efficiently computable.
2. It distributes the keys uniformly (each table position is equally likely for each key).
3. It minimizes collisions.
4. It is used with a table whose load factor (the number of items in the table divided by the size of the table) is kept low.
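To make "uniform distribution" and "load factor" concrete, the sketch below counts how many keys the division method places in each slot. The key set (ten multiples of 10) is a contrived assumption chosen to expose a bad table size; `bucket_sizes` is a hypothetical helper.

```python
from collections import Counter


def bucket_sizes(keys, m):
    """Count how many keys the division method places in each of m slots."""
    counts = Counter(k % m for k in keys)
    return [counts.get(slot, 0) for slot in range(m)]


keys = list(range(10, 101, 10))  # 10, 20, ..., 100 (all multiples of 10)

for m in (10, 11):
    sizes = bucket_sizes(keys, m)
    load_factor = len(keys) / m
    print(f"M={m}: largest bucket = {max(sizes)}, load factor = {load_factor:.2f}")

# M=10: largest bucket = 10, load factor = 1.00
# M=11: largest bucket = 1, load factor = 0.91
```

The load factor is almost the same in both cases, but with M = 10 every key lands in slot 0, while the prime M = 11 spreads the same keys over ten different slots.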

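Complexity of calculating hash value using the hash function

Time complexity: O(n), where n is the length of the key (for example, the number of characters summed in the string example above)
Space complexity: O(1)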

Problem with Hashing

If we consider the example above, the hash function we used is the sum of the letters. If we examine this hash function closely, the problem is easy to see: different strings can produce the same hash value.
For example, “ab” and “ba” have the same hash value (3), and “cd” and “be” also generate the same hash value (7). This is known as a collision, and it creates problems when searching, inserting, deleting, and updating values.
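A short Python check that groups keys by their letter-sum hash and reports every hash value shared by more than one key. The key list is taken from the examples in the text; `letter_sum` is a hypothetical helper that assumes lowercase ASCII keys.

```python
from collections import defaultdict


def letter_sum(key: str) -> int:
    """Sum of letter values with 'a' = 1, 'b' = 2, ... (lowercase keys assumed)."""
    return sum(ord(ch) - ord("a") + 1 for ch in key)


groups = defaultdict(list)
for key in ["ab", "ba", "cd", "be", "efg"]:
    groups[letter_sum(key)].append(key)

# Any hash value shared by more than one key is a collision.
collisions = {h: ks for h, ks in groups.items() if len(ks) > 1}
print(collisions)  # {3: ['ab', 'ba'], 7: ['cd', 'be']}
```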
What is collision?
The hashing process generates a small number from a large key, so two different keys can produce the same value. When a newly inserted key maps to a slot that is already occupied, a collision occurs, and it must be handled using a collision handling technique.

Collision in Hashing
