Hash-Based Indexes
Chapter 10
Introduction
As for any index, there are 3 alternatives for data entries k*:
(1) the data record with key value k,
(2) <k, rid of data record with search key value k>,
(3) <k, list of rids of data records with search key k>.
The choice is orthogonal to the indexing technique.
Hash-based indexes are best for equality selections. Cannot support range searches. Static and dynamic hashing techniques exist; trade-offs similar to ISAM vs. B+ trees.
Static Hashing
# primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed. h(k) mod M = bucket to which data entry with key k belongs. (M = # of buckets)
[Figure: static hashing — h(key) mod N selects one of N primary bucket pages (0 to N-1); each primary page may have a chain of overflow pages.]
Buckets contain data entries. Hash fn works on search key field of record r. Must distribute values over range 0 ... M-1.
h(key) = (a * key + b) usually works well. a and b are constants; lots known about how to tune h.
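A minimal Python sketch of the bucket computation, assuming integer keys; M, a, and b below are illustrative values, not ones from the text:

M = 8          # number of primary bucket pages, fixed when the file is created
a, b = 31, 7   # constants of the hash function h(key) = a*key + b

def bucket_of(key):
    """Primary bucket to which the data entry with this key belongs."""
    return (a * key + b) % M   # h(key) mod M

print(bucket_of(25))   # some bucket in 0..M-1; full buckets grow overflow chains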
Extendible Hashing
Situation: Bucket (primary page) becomes full. Why not re-organize file by doubling # of buckets?
Reading and writing all pages is expensive!
Idea: Use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
Directory much smaller than file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
Trick lies in how the hash function is adjusted!
Example

[Figure: extendible hashing example — GLOBAL DEPTH 2; directory entries 00, 01, 10, 11 point to Bucket A (4* 12* 32* 16*), Bucket B (1* 5* 21* 13*), Bucket C (10*), and Bucket D (15* 7* 19*), each with LOCAL DEPTH 2.]

Directory is an array of size 4. To find the bucket for r, take the last `global depth' # of bits of h(r); we denote r by h(r). If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
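A minimal Python sketch of this lookup, assuming for illustration that h(r) is the value r itself and using the bucket contents from the figure:

global_depth = 2
directory = [                      # indexed by the last `global_depth` bits of h(r)
    ["4*", "12*", "32*", "16*"],   # 00 -> Bucket A
    ["1*", "5*", "21*", "13*"],    # 01 -> Bucket B
    ["10*"],                       # 10 -> Bucket C
    ["15*", "7*", "19*"],          # 11 -> Bucket D
]

def find_bucket(h_r):
    idx = h_r & ((1 << global_depth) - 1)   # last `global_depth` bits
    return directory[idx]

print(find_bucket(5))   # 5 = binary 101, last 2 bits = 01 -> Bucket B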
If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing global depth with local depth for the split bucket.)
[Figure: inserting data entry 20* (h(r) = 20) causes Bucket A to split — GLOBAL DEPTH becomes 3 and the directory doubles to entries 000-111. Bucket A (32* 16*) and its split image A2 (4* 12* 20*) have LOCAL DEPTH 3; Buckets B (1* 5* 21* 13*), C (10*), and D (15* 7* 19*) keep LOCAL DEPTH 2.]
Points to Note
20 = binary 10100. Last 2 bits (00) tell us r belongs in A or A2. Last 3 bits needed to tell which.
Global depth of directory: Max # of bits needed to tell which bucket an entry belongs to. Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
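As noted earlier, whether splitting a bucket forces the directory to double can be decided by comparing these two depths; a tiny Python sketch of that check (the depth values passed in are illustrative):

def split_requires_doubling(local_depth, global_depth):
    # If local depth < global depth, several directory slots already point to
    # this bucket, and the split only redirects half of them. If local depth
    # equals global depth, only one slot points here, so the directory doubles.
    return local_depth == global_depth

print(split_requires_doubling(2, 2))   # True: e.g. splitting Bucket A above
print(split_requires_doubling(2, 3))   # False: splitting B, C, or D after doubling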
Directory Doubling
Why use least significant bits in directory? Allows for doubling via copying!
[Figure: directory doubling for h(r) = 6 = binary 110, comparing a directory indexed by the least significant bits with one indexed by the most significant bits; with least significant bits, the doubled directory is obtained by copying the old directory.]
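A tiny Python sketch of the copying argument (bucket names and the split shown are illustrative):

old_directory = ["A", "B", "C", "D"]            # indexed by the last 2 bits of h(r)

# With least significant bits, an entry keeps its old last bits after doubling,
# so the new directory is simply two copies of the old one...
new_directory = old_directory + old_directory   # now indexed by the last 3 bits

# ...and only the slot for the bucket that actually split is fixed up afterwards,
# e.g. if the bucket holding 6* (binary 110) splits, slot 110 points to its split image.
new_directory[0b110] = "C2"
print(new_directory)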
If directory fits in memory, equality search answered with one disk access; else two.
100 MB file, 100 bytes/record, 4 KB pages: the file contains 1,000,000 records (as data entries) and, at 40 entries per page, 25,000 primary pages, hence 25,000 directory elements; chances are high that the directory will fit in memory. Directory grows in spurts and, if the distribution of hash values is skewed, the directory can grow large. Multiple entries with the same hash value cause problems!
Delete: If removal of a data entry makes a bucket empty, it can be merged with its `split image'. If each directory element points to the same bucket as its split image, we can halve the directory.
Linear Hashing
This is another dynamic hashing scheme, an alternative to Extendible Hashing. LH handles the problem of long overflow chains without using a directory, and handles duplicates. Idea: Use a family of hash functions h0, h1, h2, ...
h_i(key) = h(key) mod (2^i * N); N = initial # of buckets.
h is some hash function (its range is not 0 to N-1).
If N = 2^d0, for some d0, h_i consists of applying h and looking at the last d_i bits, where d_i = d0 + i.
h_{i+1} doubles the range of h_i (similar to directory doubling).
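A minimal Python sketch of this family of functions (N below is an illustrative power of two, and Python's built-in hash stands in for the underlying h):

N = 4   # initial number of buckets, N = 2**d0 with d0 = 2

def h(key):
    return hash(key)             # some hash function with a large range

def h_i(i, key):
    return h(key) % (2**i * N)   # h_i(key) = h(key) mod (2^i * N)

# h_i amounts to looking at the last d0 + i bits of h(key);
# h_{i+1} has twice the range of h_i.
print(h_i(0, 46), h_i(1, 46))    # 2 and 6 (small ints hash to themselves in CPython)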
Directory avoided in LH by using overflow pages, and choosing bucket to split round-robin.
Splitting proceeds in `rounds'. A round ends when all N_R initial (for round R) buckets have been split. Buckets 0 to Next-1 have been split; Next to N_R are yet to be split. The current round number is Level.
Search: To find the bucket for data entry r, compute h_Level(r):
If h_Level(r) is in the range `Next to N_R', r belongs there.
Else, r could belong to bucket h_Level(r) or to bucket h_Level(r) + N_R; must apply h_{Level+1}(r) to find out.
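A minimal Python sketch of this search rule (the file state — Level, Next, N — is illustrative, and hash stands in for h):

N = 4                      # initial number of buckets
Level, Next = 0, 1         # e.g. one bucket already split in the current round

def h_level(level, key):
    return hash(key) % (N * 2**level)

def bucket_for(key):
    b = h_level(Level, key)
    if b >= Next:                      # in range `Next to N_R': not yet split
        return b
    # Bucket b was already split this round: the entry is in b or in its
    # split image b + N_R; h_{Level+1} decides which.
    return h_level(Level + 1, key)

print(bucket_for(5), bucket_for(4))    # 1 4: 5 stays in bucket 1; 4 maps to the split image of bucket 0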
Overview of LH File
[Figure: overview of an LH file in the middle of a round — buckets that existed at the beginning of this round (this is the range of h_Level), the Next bucket to be split, and the `split image' buckets created (through splitting of other buckets) in this round.]
Insert: Find the bucket by applying h_Level / h_{Level+1}. If the bucket to insert into is full: add an overflow page and insert the data entry, and (maybe) split the Next bucket and increment Next.
Since buckets are split round-robin, long overflow chains don't develop! Doubling of the directory in Extendible Hashing is similar; switching of hash functions is implicit in how the # of bits examined is increased.
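A compact Python sketch of this insert/split loop (the capacity, the split trigger, and the use of one list per bucket are simplifications for illustration):

N, CAP = 4, 4              # initial buckets and primary-page capacity (illustrative)
Level, Next = 0, 0
buckets = [[] for _ in range(N)]       # bucket = primary page + overflow chain

def h_level(level, key):
    return hash(key) % (N * 2**level)

def insert(key):
    global Level, Next
    b = h_level(Level, key)
    if b < Next:                       # already split this round: use h_{Level+1}
        b = h_level(Level + 1, key)
    buckets[b].append(key)
    if len(buckets[b]) > CAP:          # overflow: split the Next bucket (round-robin)
        buckets.append([])             # allocate split image of bucket `Next`
        old, buckets[Next] = buckets[Next], []
        for k in old:                  # redistribute its entries with h_{Level+1}
            buckets[h_level(Level + 1, k)].append(k)
        Next += 1
        if Next == N * 2**Level:       # all N_R buckets split: start a new round
            Level, Next = Level + 1, 0

for k in range(20):
    insert(k)
print(Level, Next, len(buckets))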
[Figure: example of a Linear Hashing file — the hash-value columns on the left are for illustration only; the column on the right shows the actual contents of the linear hashed file.]
LH Described as a Variant of EH
Begin with an EH index where directory has N elements. Use overflow pages, split buckets round-robin. First split is at bucket 0. (Imagine directory being doubled at this point.) But elements <1,N+1>, <2,N+2>, ... are the same. So, need only create directory element N, which differs from 0, now.
When bucket 1 splits, create directory element N+1, etc.
So, the directory can double gradually. Also, primary bucket pages are created in order. If they are allocated in sequence too (so that finding the i-th is easy), we actually don't need a directory! Voilà, LH.
Summary
Hash-based indexes: best for equality searches, cannot support range searches. Static Hashing can lead to long overflow chains. Extendible Hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. (Duplicates may require overflow pages.)
Directory to keep track of buckets, doubles periodically. Can get large with skewed data; additional I/O if this does not fit in main memory.
Summary (Contd.)
Linear Hashing avoids directory by splitting buckets round-robin, and using overflow pages.
Overflow pages are not likely to be long. Duplicates are handled easily. Space utilization could be lower than Extendible Hashing, since splits are not concentrated on `dense' data areas. Can tune the criterion for triggering splits to trade off slightly longer chains for better space utilization.
For hash-based indexes, a skewed data distribution is one in which the hash values of data entries are not uniformly distributed!