L5 HashTables


Hashed Indexes
At the end of this lecture students should be able to:
• Describe static hashed tables and indexes, and how to
handle collisions
• Describe dynamic hashed tables, how database operations
are carried out on them, and the advantages offered
• Explain the main properties of hash functions, and multi-attribute hashing techniques
Hashed Indexing and Collision
Handling
• Associative Tables
• (Dynamic) Hashed Tables
• Hash Functions
• Collisions and How to Handle Them
Introduction: Hashing
• Many applications require a dynamic set that supports only the
dictionary operations INSERT, SEARCH, and DELETE. For example,
a compiler for a computer language maintains a symbol table, in
which the keys of elements are arbitrary character strings that
correspond to identifiers in the language.
• A hash table is an effective data structure for implementing
dictionaries.
• Although searching for an element in a hash table can take as long as
searching for an element in a linked list (O(n) time in the worst case),
in practice hashing performs extremely well.
• Under reasonable assumptions, the expected time to search for an
element in a hash table is O(1).
• The bottom line is that hashing is an extremely effective and practical
technique: the basic dictionary operations require only O(1) time on
the average.
• “Perfect hashing” can support searches in O(1) worst-case time when
the set of keys being stored is static (that is, when the set of keys never
changes once stored).
Associative Tables
• Consider an array of integers:
[figure: an array of integers, with the value 91 stored in slot 3 and 76 in slot 4]
• It is easy enough to find the value at slot 3 (that is, 91). How do we find
the slot with value 76?
– We could search in the array.
– But it is much faster to use associative lookup.
• Hash functions provide association by mapping from the stored value
(for example 76) to the location where it is stored (for example slot 4).
Hash Tables
• Basic idea:
– Have a hash table H of T slots, addressed from 0 to T−1.
– Have a hash function h, that is, any function that maps keys
(integers, or strings, or ...) into integers in the range 0 to T − 1.
– Store record Rk at slot v = h(k).
• Ideally the access cost is O(1), regardless of data volume. That is, no
matter how much data we have, the lookup cost is the same.
Hash Tables
• Example: Consider a hash table of 8 slots, addressed 0 to 7.
• Suppose the hash function is h(k) = k mod 8.
Then h(76) = 4 and h(91) = 3.

• Why use hash tables for databases? As each database record has a key,
hash tables can provide very fast access:
– Hash the key to get a slot number.
– Store the record in the slot. Alternatively, store a pointer to the
record in the slot.
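The example above can be sketched in a few lines of Python (my choice of language; the slides give no code). Collisions are ignored here; they are the subject of the following slides.

```python
# Sketch of the slides' example: an 8-slot hash table with h(k) = k mod 8.
T = 8                      # number of slots, addressed 0 to T-1
table = [None] * T         # one slot per possible hash value

def h(k):
    """Map a key to a slot number in the range 0 to T-1."""
    return k % T

# Store records (represented here by just their keys) at their hash slots:
for key in (76, 91):
    table[h(key)] = key    # h(76) = 4 and h(91) = 3, as in the slides
```

Looking up a key then costs a single hash computation and a single array access, regardless of how many records are stored.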
Hash Tables in Memory
• The number of slots T is fixed. Each slot is a structure (similar to a C
language struct) that holds a key and maybe some other information
(for example, the rest of the record).
• To insert a record Rk with key k into hash table H compute its hash
value v = h(k) and set table entry H[v] to contain Rk.
• Problems:
– Collisions, when two records hash to the same slot.
– Static size, when the hash table is not of the appropriate capacity
(too big or too small).
Collisions
• A “collision” happens when more than one key maps to the same array
index (slot).
• A good hash function is a bit like a random number generator. Given a
regular pattern of input, the output appears chaotic. A good hash
function for integers is
h(k) = ( ( p0 × k + p1 ) MOD p2 ) MOD T
where p0, p1, and p2 are large primes. (T does not have to be prime.)
• Aside: Not only does this look like a random number generator, but it
is a good random number generator. To get 16-bit pseudo-random
numbers, use:
unsigned R0 = seed
Rn+1 = ( ( Rn + p0 ) × p1 ) MOD p2
return Rn+1 MOD 2^16
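Both formulas can be sketched directly in Python (my choice of language; the primes p0, p1, and p2 below are the ones quoted on the next slide):

```python
# The slide's integer hash function and the related pseudo-random generator.
p0, p1, p2 = 438_439, 34_723_753, 376_307   # large primes from the next slide

def h(k, T):
    """h(k) = ((p0 * k + p1) MOD p2) MOD T."""
    return ((p0 * k + p1) % p2) % T

def rand16(seed, steps=1):
    """16-bit pseudo-random numbers via R(n+1) = ((R(n) + p0) * p1) MOD p2."""
    r = seed
    for _ in range(steps):
        r = ((r + p0) * p1) % p2
    return r % 2**16       # keep only the low 16 bits
```

Note that T does not have to be prime; only the three constants need to be.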
Collisions

• Consider a hash table of size T = 10 and suppose p0 = 438,439, p1 =
34,723,753, and p2 = 376,307.
• h(k) = ( ( 438439 × k + 34723753 ) MOD 376307 ) MOD 10
[figure: h(k) for ten sample keys k]
• There are two 2's and two 7's but no 1 and no 9.
• We say that the hash function is uniform, because all 10 possible hash
values are equally likely. But that does not mean that hashing 10 numbers k
will give ten different addresses h(k). Some of the h(k) values will be the
same; that is, they will collide.
Handling collisions
• There are three standard methods for in-memory collision resolution.
1. Linear probing
2. Double hashing
3. Chaining
Linear probing
• If H[v] is occupied, try H[(v+1) MOD T], H[(v+2) MOD T], ... until a
free slot is found; that is, look in the slot next door, then the slot next to
that, and so on.
• More generally: try H[(v + r) MOD T], H[(v + 2r) MOD T] ... where r
and T have no common divisor greater than 1.
• Linear probing leads to clustering of records; once the table is nearly
full, search degenerates to O(T).
• On search, the same procedure is followed, with a check at each stage
to see if the current slot is either empty or has a matching k value.
• At most T records can be inserted.
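The procedure above can be sketched as follows, using step r = 1 (a minimal sketch in Python; the function names are mine, not the lecture's):

```python
# Linear probing: on collision, scan forward (wrapping around) for a free slot.
def lp_insert(table, key, h):
    T = len(table)
    v = h(key)
    for i in range(T):
        slot = (v + i) % T
        if table[slot] is None:    # free slot found
            table[slot] = key
            return slot
    raise RuntimeError("table full: at most T records can be inserted")

# Search follows the same probe sequence, stopping at an empty slot or a match.
def lp_search(table, key, h):
    T = len(table)
    v = h(key)
    for i in range(T):
        slot = (v + i) % T
        if table[slot] is None:    # empty slot: the key cannot be present
            return None
        if table[slot] == key:
            return slot
    return None                    # probed every slot without a match
```

With the earlier example, inserting 12 after 76 (both hash to 4 under k mod 8) places 12 in slot 5, illustrating how colliding records cluster next to each other.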
Double Hashing
• Set v = h(k) as before.
• If H[v] is occupied, use a secondary hash function h′ to compute v′ = h′(k).
• Try H[(v + v′) MOD T], H[(v + 2v′) MOD T], ...
• Although records no longer cluster, search still degenerates to O(T),
but not as quickly.
• There is also an absolute problem of overflow — at most T records
can be inserted.
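A hedged sketch of the probe sequence above (names are mine); note that the step v′ must share no divisor greater than 1 with T, or some slots are never probed:

```python
# Double hashing: the probe step is itself derived from the key, so records
# that collide at h(k) follow different probe sequences and do not cluster.
def dh_insert(table, key, h, h2):
    T = len(table)
    v, step = h(key), h2(key)      # h2(key) must be nonzero and coprime with T
    for i in range(T):
        slot = (v + i * step) % T  # i = 0 tries H[v] itself
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full: at most T records can be inserted")
```

For a power-of-two T, any odd step is coprime with T, so a secondary function such as h2(k) = 2 × (k mod 4) + 1 (my example) is safe for T = 8.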
Chaining
• The hash table is an array of pointers, not an array of data.
• Each pointer in the hash table points to a linked list of records that
hash to that location.
• On insertion, a record Rk with hash value v = h(k) is added to the
linked list for slot H(v).
• On search, the list must be traversed to find the matching k value.
• A hash table of T slots can now be used to index n > T records, but if n
>> T then access may be slow.
• Access costs degrade gracefully: the expected search cost grows with the
average chain length n/T, rather than with n itself.
• Extra space is required for pointers, as the table contains an array of
pointers not required in the other schemes.
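A sketch of chaining, with Python lists standing in for the linked lists of records (names are mine):

```python
# Chaining: the table is an array of chains; colliding records share a chain.
def ch_insert(table, key, h):
    table[h(key)].append(key)      # add the record to its slot's chain

def ch_search(table, key, h):
    chain = table[h(key)]
    for pos, k in enumerate(chain):
        if k == key:
            return pos             # position of the record within its chain
    return None                    # traversed the whole chain without a match
```

Unlike the probing schemes, insertion never fails: the chains simply grow, which is why n > T records can be indexed.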
Chaining
• Aside: If some keys are more commonly searched for than others, it is
efficient to move the most-recently accessed record to the front of the
list. The likelihood is, then, that the searched-for record will be at the
start of its list. A chained hash table with move-to-front on access in
the chains is almost always the most efficient in-memory data
structure, unless it is essential that the data be kept sorted.
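The move-to-front heuristic can be sketched as a small variant of chained search (assuming the chained table above; the name is mine):

```python
# Move-to-front on access: a found record is promoted to the head of its
# chain, so frequently searched keys tend to sit near the front.
def mtf_search(table, key, h):
    chain = table[h(key)]
    for pos, k in enumerate(chain):
        if k == key:
            chain.insert(0, chain.pop(pos))   # promote the record
            return True
    return False
```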
Cost of Collisions
• What is the cost of these collision schemes? That is, how many slots
(or records) need to be visited to find the record being searched for?
• To estimate this cost, we need to know the probability of a collision,
p(collision). In each of the three schemes, the expected number of
accesses is always at least 1 + p(collision).
• For linear probing and double hashing, the costs are much
greater once the table is more than (say) 2/3 full — but as these
methods are hard to analyse we will ignore them!
• The probability of collision in a chained hash table can be worked out
as follows.
• Assume that the hash addresses generated by the hash function are
uniformly-distributed random numbers in the interval 0 to T – 1.
Cost of Collisions
• Suppose that a total of n records are hashed. The probability that
exactly m records hash to a particular slot is given by the binomial
distribution:

p(m) = C(n, m) × (1/T)^m × (1 − 1/T)^(n−m)

• When T is large and λ = n/T is bounded (that is, not too big, as in our
practical application), the above distribution can be simplified using
the Poisson approximation to the binomial distribution:

p(λ, m) = ( λ^m × e^−λ ) ÷ m!

where λ is the mean number of records per slot.
Example:
• Suppose T = 100,000 and n = 80,000 so that λ = 0.80 records per slot.
• Then p(λ, 2) = p(0.8, 2) = ( 0.8² × e^−0.8 ) ÷ ( 2 × 1 ) = 0.14378
• Thus 0.14378 × 100,000 = 14,378 slots get exactly 2 records each. In
each case, one of these records must be an overflow.

• Note that a uniform distribution does not mean that all slots get
roughly the same number of hits. In this case, 45% of the hash table is
empty.
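The worked example can be checked numerically with a direct sketch of the Poisson formula above (Python is my choice of language):

```python
import math

# Poisson approximation: p(lam, m) = lam**m * e**(-lam) / m!
def p(lam, m):
    return lam**m * math.exp(-lam) / math.factorial(m)

# With T = 100,000 and n = 80,000, lam = 0.8:
two = p(0.8, 2)    # fraction of slots holding exactly 2 records, ~0.1438
empty = p(0.8, 0)  # fraction of empty slots, ~0.4493 (the "45% empty" above)
```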
Overflows
• The number of overflow records is: 1 for each bucket with 2 records; 2
for each bucket with 3 records; 3 for each bucket with 4 records ...
That is, the number of overflow records is

Σ over m ≥ 2 of ( m − 1 ) × T × p(λ, m) = n − T × ( 1 − e^−λ )

which for T = 100,000 and n = 80,000 comes to about 24,800.

• That is, 24,800 ÷ 80,000 or 31% of records can't be stored in the slot
given by their hash value. If T = n = 100,000, this fraction rises to
about 37%.
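The overflow figures above can also be checked numerically. Summing (m − 1) over all slot occupancies m ≥ 2 under the Poisson model telescopes to n − T(1 − e^−λ) (this closed form is my addition; the slides quote only the totals):

```python
import math

# Expected number of overflow records: sum over m >= 2 of (m-1) * T * p(lam, m),
# which simplifies to n - T * (1 - e**(-lam)) with lam = n/T.
def overflows(n, T):
    lam = n / T
    return n - T * (1 - math.exp(-lam))

frac = overflows(80_000, 100_000) / 80_000   # ~0.31: about 31% of records overflow
```

With n = T the overflow fraction is 1 − (1 − e^−1) = e^−1 ≈ 0.368, the "about 37%" quoted above.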
