
Data Structures and Algorithms

Lecture 13 – Hash Tables

Sayed Faheem Qadry


Department of Computer Science
The Institute of Finance Management
Storing data
 So far we have learned about arrays, linked lists and trees.
 These approaches can perform quite differently on the particular tasks we expect to carry out on the items, such as insertion, deletion and searching.
 There is no single best way of storing data in general; the right choice depends on the particular application.
Now let's look at another way to store data:
 We want to put each item in an easily determined location, so that we never need to search for it and have no ordering to maintain when inserting or deleting items.
 This gives impressive performance as far as time is concerned.
 BUT the disadvantages are the need for more memory, more complicated algorithms and a harder implementation.
The Table abstract data type
The specification of the table abstract data type is as follows:
 A table can be used to store objects
 Every record stored in the table is identified by a unique key
Methods or procedures:
 Boolean IsEmpty()
 Boolean IsFull()
 void Insert(Record)
 Record Retrieve(key)
 void Update(Record)
 void Delete(key)
 void Traverse()
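
A minimal Java sketch of this ADT; the interface name, the generic Record type and the use of integer keys are illustrative assumptions, not part of the lecture:

    interface Table<Record> {
        boolean isEmpty();              // true if the table holds no records
        boolean isFull();               // true if no further records can be inserted
        void insert(Record record);     // add a record under its unique key
        Record retrieve(int key);       // fetch the record stored under the given key
        void update(Record record);     // replace the stored record that has the same key
        void delete(int key);           // remove the record stored under the given key
        void traverse();                // visit every record, e.g. in key order
    }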
Implementations of the table data structure
 Implementation via sorted arrays: deletion is a challenge, since elements have to be shifted to close the gap.
 Implementation via binary search trees: the best option is a self-balancing binary search tree, which is complicated to implement.
 Implementation via hash tables: this is better than the above alternatives, but uses more memory.
Hash Tables
 Given a key, there is a way of jumping straight to the entry for that key.
 So there is no need to search at all.
 Assume that we have an array data to hold our entries.
 If we had a function h(k) that maps each key k to the index (an integer) where the associated entry will be stored, then we could just look up data[h(k)] to find the entry with key k.
 If the possible keys were just small numbers, say in the range 0 to 99, then we could use an array a[100] and let h(k) simply be k itself (the index into the array), making access to the values straightforward.
 BUT what if the keys are huge, say 14-digit NIDA ID numbers?
 Then we use a non-trivial function h, the so-called hash function, to map the space of possible keys onto the set of indices of our array.
 Particular attention should be paid to choosing the hash function h in such a way that collisions, i.e. different keys mapped to the same index, are unlikely to occur.
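
As a rough sketch (the table size, class and method names here are assumptions, not from the slides), a primary hash function for a large numeric key such as a 14-digit ID can simply reduce the key modulo the table size:

    class SimpleHash {
        static final int TABLE_SIZE = 101;     // a prime table size helps spread the keys

        // Map a non-negative key into the index range 0 .. TABLE_SIZE - 1.
        static int h(long key) {
            return (int) (key % TABLE_SIZE);
        }

        public static void main(String[] args) {
            long nidaId = 19990123456789L;     // hypothetical 14-digit ID number
            System.out.println(h(nidaId));     // index where this record would be stored
        }
    }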
Collision likelihoods and load factors for hash tables
The von Mises birthday paradox
 For 23 people's birthdays among 365 calendar days, the probability of a collision is greater than 50%.
 It may be surprising that p(22) = 0.476 and p(23) = 0.507, which means that as soon as there are more than 22 people in a group, it is more likely than not that two of them share a birthday.
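
In formula form (the standard birthday-problem calculation; the slides only quote the resulting values), the probability that at least two of n people share a birthday is

    p(n) = 1 − (365 × 364 × … × (365 − n + 1)) / 365^n

which gives p(22) ≈ 0.476 and p(23) ≈ 0.507.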
Implications for hash tables
 If 23 random locations in a table of size 365 have more than a 50% chance of overlapping, it seems
inevitable that collisions will occur in any hash table that does not waste an enormous amount of memory.
 And collisions will be even more likely if the hash function does not distribute the items randomly
throughout the table.
The load factor of a hash table
 Suppose we have a hash table of size m, and it currently has n entries.
 Then we call λ = n/m the load factor of the hash table.
 Therefore, to minimize collisions, it is prudent to keep the load factor low. Fifty percent is an often quoted
good maximum figure, while beyond an eighty percent load the performance deteriorates considerably.
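
As a quick worked example (the numbers match the 8-slot table on the following slides):

    λ = n / m = 4 / 8 = 0.5

i.e. a 50% load, which is exactly at the often-quoted maximum.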
A simple Hash Table in operation
 Assume we have the following table of key/value pairs, where the key is a name and the value is the age of the individual:

    Key      Value
    Paul     29
    Jane     35
    Chacha   18
    Alex     30

 We want to build a hash table (a dictionary in this case) such that ideally the operations insert(), search(), delete() and update() will be O(1).
 Say my hash table is an array of 8 elements, and I decide to locate each name by the distance of its first letter from the letter 'A'.
 Since I have only 8 spaces in my hash table, I can only store indices 0 – 7.
 But if, for example, the first letter is 'P' for Paul, I get 'P' – 'A' = 15, and I cannot store anything at index 15 in my array.
 Therefore I have to use a modulo operation; modular arithmetic more generally is widely used when constructing good hash functions.
 That is, 15 mod 8 = 7 (the remainder after dividing by 8). The result 7 is called the hash code, and (x – 'A') mod 8 is called the hash function; a short code sketch follows below.
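
A small code sketch of this hash function (class and method names are illustrative assumptions):

    class NameHash {
        static final int TABLE_SIZE = 8;

        // Distance of the (upper-cased) first letter from 'A', reduced modulo the table size.
        static int hash(String name) {
            char first = Character.toUpperCase(name.charAt(0));
            return (first - 'A') % TABLE_SIZE;
        }

        public static void main(String[] args) {
            System.out.println(hash("Paul"));    // 'P' - 'A' = 15, 15 mod 8 = 7
            System.out.println(hash("Jane"));    // 'J' - 'A' = 9,  9 mod 8 = 1
            System.out.println(hash("Chacha"));  // 'C' - 'A' = 2
            System.out.println(hash("Alex"));    // 'A' - 'A' = 0
        }
    }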
A simple Hash Table in operation (cont)

    Key      Value   Index
    Paul     29      7
    Jane     35      1
    Chacha   18      2
    Alex     30      0

    Index:  0        1        2          3   4   5   6   7
    Entry:  Alex 30  Jane 35  Chacha 18                  Paul 29

 What if we want to add the following names Amina, Chakubanga, Jamila, Peter?
Strategies for dealing with collisions
Buckets
 One obvious option is to reserve a two-dimensional array from the start.
 The disadvantage of this approach is that it has to reserve quite a bit more space than will be
eventually required, since it must take into account the likely maximal number of collisions.
 Also, when searching for a particular key, it is necessary to scan the entire column associated with its expected position, at least until an empty slot is reached (linear search). Alternatively, each column can be kept sorted so that binary search can be used.

    Index 0: Alex 30, Amina 23
    Index 1: Jane 35, Jamila 37
    Index 2: Chacha 18, Chakubanga 25
    Index 7: Paul 29, Peter 50
    (Indices 3 – 6 remain empty.)
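
A minimal sketch of the bucket strategy, assuming the first-letter hash function from the earlier slide; the sizes and the Entry record are illustrative assumptions:

    class BucketTable {
        record Entry(String key, int value) {}

        static final int TABLE_SIZE = 8;
        static final int BUCKET_DEPTH = 4;             // likely maximal number of collisions per index

        Entry[][] data = new Entry[TABLE_SIZE][BUCKET_DEPTH];

        int hash(String key) {
            return (Character.toUpperCase(key.charAt(0)) - 'A') % TABLE_SIZE;
        }

        void insert(String key, int value) {
            int index = hash(key);
            for (int i = 0; i < BUCKET_DEPTH; i++) {
                if (data[index][i] == null) {          // first empty slot in this column
                    data[index][i] = new Entry(key, value);
                    return;
                }
            }
            throw new IllegalStateException("bucket " + index + " is full");
        }
    }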
Strategies for dealing with collisions (cont)
Direct chaining
 Use linked lists instead of the full array.
 This approach does not reserve any space that will not be taken up, but it has the disadvantage that in order to find a particular item, a list has to be traversed.
 However, the hashing step still speeds up retrieval considerably.
 With a good hash function and a low load factor, the expected complexity of insertion, retrieval and deletion remains constant, i.e. O(1); see the sketch after the diagram below.
 For traversal in key order, we first need to sort the keys, which can be done in O(n log2 n).
 Hence, this method is better than the previous one.
    Index 0: Alex 30 → Amina 23
    Index 1: Jane 35 → Jamila 37
    Index 2: Chacha 18 → Chakubanga 25
    Index 7: Paul 29 → Peter 50
    (Indices 3 – 6 hold empty lists.)
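
A minimal sketch of direct chaining using the standard library's linked lists; the class names, the Entry record and the table size are illustrative assumptions:

    import java.util.LinkedList;

    class ChainedTable {
        record Entry(String key, int value) {}

        static final int TABLE_SIZE = 8;
        @SuppressWarnings("unchecked")
        LinkedList<Entry>[] slots = new LinkedList[TABLE_SIZE];

        int hash(String key) {
            return (Character.toUpperCase(key.charAt(0)) - 'A') % TABLE_SIZE;
        }

        void insert(String key, int value) {
            int index = hash(key);
            if (slots[index] == null) slots[index] = new LinkedList<>();
            slots[index].add(new Entry(key, value));   // a collision simply extends this list
        }

        Integer retrieve(String key) {
            LinkedList<Entry> chain = slots[hash(key)];
            if (chain == null) return null;
            for (Entry e : chain)                      // only this short chain is traversed
                if (e.key().equals(key)) return e.value();
            return null;
        }
    }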
Strategies for dealing with collisions (cont)
Open addressing
 involves finding another open location for any entry which cannot be placed where its
hash function points.
 We refer to that position as a key’s primary position (so in the earlier example, Alex
and Amina have the same primary position).
 The easiest strategy for achieving this is to search for open locations by simply
decreasing the index considered by one until we find an empty space.
 If this reaches the beginning of the array, i.e. index 0, we start again at the end. This
process is called linear probing.
 A better approach is to search for an empty location using a secondary hash function.
 This process is called double hashing.
Open addressing
Linear probing
 On a collision, insert into the nearest empty slot to the left of the primary position, by reducing the index one step at a time.
 A hash table that uses open addressing should have at least one empty slot at any time, and should be declared full when only one empty location is left.
 While the load factor is kept low, the time for search stays close to constant, i.e. O(1).
 However, linear probing creates clusters of occupied slots around popular hash codes. These blocks, or clusters, keep growing, not only when we hit the same primary location repeatedly, but also when we hit anything that is part of an existing cluster; the latter effect is called secondary clustering.
 Note that searching for keys is also adversely affected by these clustering effects.
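
A minimal sketch of linear probing as described above (step one slot to the left and wrap around past index 0); names and sizes are illustrative assumptions:

    class LinearProbingTable {
        record Entry(String key, int value) {}

        static final int TABLE_SIZE = 8;
        Entry[] data = new Entry[TABLE_SIZE];
        int count = 0;

        int hash(String key) {
            return (Character.toUpperCase(key.charAt(0)) - 'A') % TABLE_SIZE;
        }

        void insert(String key, int value) {
            if (count >= TABLE_SIZE - 1)               // always keep at least one empty slot
                throw new IllegalStateException("table full");
            int index = hash(key);                     // the key's primary position
            while (data[index] != null)                // probe to the left, wrapping past index 0
                index = (index - 1 + TABLE_SIZE) % TABLE_SIZE;
            data[index] = new Entry(key, value);
            count++;
        }
    }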
Open addressing (cont)
Double hashing
 The obvious way to avoid the clustering problems of linear probing is to do something
slightly more sophisticated than trying every position to the left until we find an empty
one.
 We apply a secondary hash function to tell us how many slots to jump to look for an
empty slot if a key’s primary position has been filled already.
 Say we choose a secondary hash function such as ((x – a) / 6) mod 6; if it evaluates to 3 for a given key, then we look for an empty slot every 3rd location.
 Suppose you want to insert (Peter, 50): its primary position, index 7, is already occupied by Paul, so counting 3 slots to the left places it at index 4, as shown below.

    Index:  0        1        2          3   4         5   6   7
            Alex 30  Jane 35  Chacha 18      Peter 50          Paul 29

    (Counting 3 slots to the left of Paul's slot: index 6 is the 1st, 5 the 2nd and 4 the 3rd, so Peter is placed at index 4.)
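
A minimal sketch of double hashing; the secondary hash used here, 1 + (x mod (m − 1)), is a common textbook choice and an assumption of this sketch, so it does not reproduce the slide's exact step values:

    class DoubleHashingTable {
        record Entry(String key, int value) {}

        static final int TABLE_SIZE = 8;
        Entry[] data = new Entry[TABLE_SIZE];

        int primaryHash(String key) {
            return (Character.toUpperCase(key.charAt(0)) - 'A') % TABLE_SIZE;
        }

        // Secondary hash: decides how many slots to jump; never returns 0.
        int secondaryHash(String key) {
            int x = Character.toUpperCase(key.charAt(0)) - 'A';
            return 1 + (x % (TABLE_SIZE - 1));
        }

        void insert(String key, int value) {
            int index = primaryHash(key);
            int step = secondaryHash(key);             // used only when the primary position is taken
            for (int probes = 0; probes < TABLE_SIZE; probes++) {
                if (data[index] == null) {
                    data[index] = new Entry(key, value);
                    return;
                }
                index = (index - step + TABLE_SIZE) % TABLE_SIZE;  // jump 'step' slots to the left, wrapping
            }
            // With a prime table size (see the next slide) every slot would be reachable.
            throw new IllegalStateException("no free slot reachable with this step");
        }
    }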


Choosing good hash functions
For primary hash functions
 make sure that it spreads the space of possible keys onto the set of hash table indices as evenly
as possible, so that few collisions occur.
 it is advantageous if any potential clusters in the space of possible keys are broken up, so that similar keys do not produce a continuous run of occupied locations.
For secondary hash functions
 different keys with the same primary position give different results when the secondary hash
function is applied.
 one has to be careful to ensure that the secondary hash function cannot result in a number which
has a common divisor with the size of the hash table.
For example, if the hash table has size 10, and we get a secondary hash function which gives 2 (or
4, 6 or 8) as a result, then only half of the locations will be checked, which might result in failure
(an endless loop, for example) while the table is still half empty. Even for large hash tables, this can
still be a problem if the secondary hash keys can be similarly large.
A simple remedy for this is to always make the size of the hash table a prime number.
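
A small illustration of the divisor problem described above (the numbers are hypothetical): with a table of size 10 and a secondary-hash step of 2, probing from index 0 only ever visits the even slots:

    class ProbeCycleDemo {
        public static void main(String[] args) {
            int tableSize = 10, step = 2, index = 0;
            for (int i = 0; i < 10; i++) {
                System.out.print(index + " ");         // prints 0 8 6 4 2 0 8 6 4 2
                index = (index - step + tableSize) % tableSize;
            }
            // The odd indices are never checked, so insertion can fail (or loop forever)
            // while the table is still half empty; a prime table size avoids this.
        }
    }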
Applications of Hash tables
 Password Verification
Cryptographic hash functions are very commonly used in password verification. Let's understand this using an example:
When you use any online website that requires a user login, you enter your e-mail and password to authenticate that the account you are trying to use belongs to you. When the password is entered, a hash of the password is computed and sent to the server for verification. The passwords stored on the server are actually the computed hash values of the original passwords, not the passwords themselves. This is done so that the original password is not exposed even if the traffic between client and server is sniffed.
 File System
Hashing is used to link a file name to the location of the file. When you interact with a file system as a user, you see the file name and perhaps the path to the file. But to store the correspondence between the file name and path and the physical location of that file on the disk, the system uses a map, and that map is usually implemented as a hash table.
