
CREC. Dept. of CSE Page 87

UNIT – 4
Representing Data Elements & Index Structures

Data on External Storage:

Disks: Can retrieve a random page at a fixed cost.

– But reading several consecutive pages is much cheaper than reading them in random order.

Tapes: Can only read pages in sequence.

– Cheaper than disks; used for archival storage.

File organization and Indexing:

File organization: Method of arranging a file of records on external storage.

– Record id (rid) is sufficient to physically locate a record.

– Indexes are data structures that allow us to find the record ids of records with given values in the index search key fields.

Architecture: The buffer manager stages pages from external storage to the main memory buffer pool. File and index layers make calls to the buffer manager.

Primary and secondary Indexes:

Primary vs. secondary: If the search key contains the primary key, the index is called a primary index; otherwise it is a secondary index.

Unique index: Search key contains a candidate key.

Clustered and unclustered:

Clustered vs. unclustered: If the order of data records is the same as, or ‘close to’, the order of data entries, then the index is called clustered; otherwise it is unclustered.

– Alternative 1 (the data entries are the data records themselves; see "Alternatives for Data Entry k* in Index") implies clustered; in practice, clustered also implies Alternative 1 (since sorted files are rare).

– A file can be clustered on at most one search key.

– Cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!


Clustered vs. Unclustered Index

– Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file.

– To build a clustered index, first sort the Heap file (with some free space on each page for future inserts).

– Overflow pages may be needed for inserts. (Thus, the order of data records is ‘close to’, but not identical to, the sort order.)
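The cost gap between the two cases can be estimated with a rough sketch in Python. The function names are hypothetical, index-traversal cost is ignored, and the unclustered figure assumes the worst case in which every matching record sits on a different data page.

```python
import math

def clustered_range_ios(matches: int, records_per_page: int) -> int:
    """Matching records are stored contiguously, so about matches / R pages are read."""
    return math.ceil(matches / records_per_page)

def unclustered_range_ios(matches: int) -> int:
    """Assumed worst case: each matching record costs one separate page I/O."""
    return matches

# 1000 matching records, 100 records per page (illustrative numbers).
clustered = clustered_range_ios(1000, 100)
unclustered = unclustered_range_ios(1000)
```

With these illustrative numbers the clustered index touches 10 data pages, while the unclustered index may touch up to 1000.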

Index Data Structures:

An index on a file speeds up selections on the search key fields for the index.

– Any subset of the fields of a relation can be the search key for an index on the relation.

– Search key is not the same as key (a minimal set of fields that uniquely identifies a record in a relation).

– An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.

– Given data entry k*, we can find the record with key k in at most one disk I/O.

(Details soon …)

B+ Tree Indexes

Example B+ Tree (figure not reproduced)

1. Find 28*? 29*? All > 15* and < 30*

2. Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes.

– And the change sometimes bubbles up the tree.


Hash-Based Indexing:

– Good for equality selections.

– Index is a collection of buckets. Bucket = primary page plus zero or more overflow pages. Buckets contain data entries.

– Hashing function h: h(r) = bucket in which (data entry for) record r belongs. h looks at the search key fields of r.

– No need for “index entries” in this scheme.
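The bucket structure described above can be sketched as a toy static hash index in Python. The class name `HashIndex`, the page capacity, and the choice of `hash(key) % num_buckets` as h are assumptions made for illustration; pages are modelled as in-memory lists rather than disk blocks.

```python
class HashIndex:
    """Toy static hash index: each bucket is a primary page plus overflow pages."""

    def __init__(self, num_buckets: int, page_capacity: int):
        self.num_buckets = num_buckets
        self.page_capacity = page_capacity
        # buckets[i] is a list of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _h(self, key) -> int:
        # Assumed hash function: maps a search key to a bucket number.
        return hash(key) % self.num_buckets

    def insert(self, key, rid) -> None:
        pages = self.buckets[self._h(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])            # last page is full: chain an overflow page
        pages[-1].append((key, rid))    # store the data entry <k, rid>

    def search(self, key) -> list:
        # Equality search: only the pages of one bucket are examined.
        pages = self.buckets[self._h(key)]
        return [rid for page in pages for (k, rid) in page if k == key]

idx = HashIndex(num_buckets=4, page_capacity=2)
for k in (1, 5, 9):                     # all three keys hash to bucket 1
    idx.insert(k, f"r{k}")
```

The lookup goes straight from the key to its bucket, which is why no separate index entries are needed; range queries get no such help, because neighbouring keys land in unrelated buckets.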

Alternatives for Data Entry k* in Index

In a data entry k* we can store one of three alternatives:

1. Data record with key value k, or

2. <k, rid of data record with search key value k>, or

3. <k, list of rids of data records with search key k>

The choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k.
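The three alternatives can be written down as small Python types. This is only an illustrative sketch: the class names and the (page number, slot number) layout of a rid are assumptions, not something the notes specify.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rid = Tuple[int, int]  # assumed rid layout: (page number, slot number)

@dataclass
class Alt1Entry:
    """Alternative 1: the data entry is the data record itself."""
    record: dict

@dataclass
class Alt2Entry:
    """Alternative 2: <k, rid>, one entry per matching record."""
    key: int
    rid: Rid

@dataclass
class Alt3Entry:
    """Alternative 3: <k, list of rids>, one entry per distinct key value."""
    key: int
    rids: List[Rid]

# Two records with key 30, stored at (page 4, slot 1) and (page 9, slot 0).
alt2 = [Alt2Entry(30, (4, 1)), Alt2Entry(30, (9, 0))]
alt3 = [Alt3Entry(30, [(4, 1), (9, 0)])]
```

With duplicates, Alternative 2 repeats the key once per record, while Alternative 3 stores the key once at the price of variable-length entries.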

Tree Based Indexing:

– Examples of indexing techniques: B+ trees, hash-based structures.

– Typically, an index contains auxiliary information that directs searches to the desired data entries.

Alternative 1:

– If this is used, the index structure is a file organization for data records (instead of a Heap file or sorted file).

– At most one index on a given collection of data records can use Alternative 1. (Otherwise, data records are duplicated, leading to redundant storage and potential inconsistency.)

– If data records are very large, the number of pages containing data entries is high. This implies that the size of the auxiliary information in the index is also large, typically.


Cost Model for Our Analysis

We ignore CPU costs, for simplicity:

– B: The number of data pages

– R: Number of records per page

– D: (Average) time to read or write disk page

– Measuring number of page I/O’s ignores gains of pre-fetching a sequence of pages; thus,
even I/O cost is only approximated.

– Average-case analysis; based on several simplistic assumptions.
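As a rough illustration of how B and D are used, the sketch below estimates three common operations in Python. The formulas are the usual textbook simplifications and are assumptions here (exactly one matching record for the heap-file search, binary search over pages for the sorted file); the function names are hypothetical.

```python
import math

def scan_cost(B: int, D: float) -> float:
    """Scanning a file reads every data page once: B page I/Os."""
    return B * D

def heap_equality_cost(B: int, D: float) -> float:
    """Equality search on a heap file with one match: half the pages on average."""
    return 0.5 * B * D

def sorted_equality_cost(B: int, D: float) -> float:
    """Equality search on a file sorted on the search key: binary search over pages."""
    return D * math.log2(B)

# Illustrative numbers: 1024 data pages, D = 1 time unit per page I/O.
costs = (scan_cost(1024, 1.0),
         heap_equality_cost(1024, 1.0),
         sorted_equality_cost(1024, 1.0))
```

R does not appear in these three formulas because CPU cost per record is ignored; it matters once the number of matching records has to be converted into a number of pages.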

Choice of Indexes

1. What indexes should we create?

– Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?

2. For each index, what kind of an index should it be?

– Clustered? Hash/tree?

One approach: Consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.

– Obviously, this implies that we must understand how a DBMS evaluates queries and creates
query evaluation plans!

– For now, we discuss simple 1-table queries.

Before creating an index, must also consider the impact on updates in the workload!

– Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too.


Index Selection Guidelines

Attributes in WHERE clause are candidates for index keys.

– Exact match condition suggests hash index.

– Range query suggests tree index.

Clustering is especially useful for range queries; can also help on equality queries if there are
many duplicates.

Multi-attribute search keys should be considered when a WHERE clause contains several
conditions.

– Order of attributes is important for range queries.

– Such indexes can sometimes enable index-only strategies for important queries.
For index-only strategies, clustering is not important!
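Why attribute order matters can be seen with a small Python sketch. It models the data entries of a hypothetical composite index on (age, sal) as a sorted list of tuples; the attribute names and values are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Data entries of a hypothetical index on (age, sal), kept in key order.
entries = sorted([(25, 3000), (30, 1000), (30, 2500), (30, 4000), (41, 2000)])

def age_range(lo: int, hi: int) -> list:
    """Range on the leading attribute: the matches form one contiguous run."""
    start = bisect_left(entries, (lo,))
    end = bisect_right(entries, (hi, float("inf")))
    return entries[start:end]

def sal_range(lo: int, hi: int) -> list:
    """Range on the trailing attribute only: every entry has to be examined."""
    return [e for e in entries if lo <= e[1] <= hi]

by_age = age_range(30, 30)
by_sal = sal_range(2000, 3000)
```

Both functions answer from the entries alone, without fetching data records, which is the idea behind an index-only strategy; only the first one can exploit the sort order.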

B+ Tree:

B+ Tree: Most widely used index. Insert/delete at log_F N cost; keeps the tree height-balanced (F = fanout, N = # leaf pages). Minimum 50% occupancy (except for root). Each node contains m entries, where d <= m <= 2d. The parameter d is called the order of the tree. Supports equality and range searches efficiently.

Example B+ Tree (figure not reproduced)

1. Search begins at root, and key comparisons direct it to a leaf (as in ISAM).

2. Search for 5*, 15*, all data entries >= 24* ...
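The root-to-leaf search can be sketched in Python as follows. The `Node` class and the concrete tree are assumptions made for illustration (the figure is not available), and the sketch uses the convention that a key equal to a separator is found in the subtree to its right.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, entries=None):
        self.keys = keys            # separators (index node) or entry keys (leaf)
        self.children = children    # child nodes; None for a leaf
        self.entries = entries      # data entries; None for an index node

def find_leaf(node, key):
    """Follow key comparisons from the root down to the leaf that may hold key."""
    while node.children is not None:
        # Keys smaller than keys[0] go to children[0]; keys >= keys[i] go right.
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    leaf = find_leaf(root, key)
    return [e for k, e in zip(leaf.keys, leaf.entries) if k == key]

def leaf(keys):
    return Node(keys, entries=[f"{k}*" for k in keys])

# Hypothetical two-level tree of order d = 2.
root = Node([13, 17, 24, 30],
            children=[leaf([2, 3, 5, 7]), leaf([14, 16]), leaf([19, 20, 22]),
                      leaf([24, 27, 29]), leaf([33, 34, 38, 39])])
```

A range search such as "all data entries >= 24*" would start with `find_leaf(root, 24)` and then walk the following leaves in order; the sibling pointers needed for that walk are left out of this sketch.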

B+ Trees in Practice

Typical order: 100. Typical fill-factor: 67%.

– average fanout = 133

Typical capacities:

– Height 4: 133^4 = 312,900,721 records

– Height 3: 133^3 = 2,352,637 records

Can often hold top levels in buffer pool:


– Level 1 = 1 page = 8 KBytes

– Level 2 = 133 pages = about 1 MByte

– Level 3 = 17,689 pages = about 133 MBytes
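The capacity and level-size figures follow from powers of the fanout, as this short Python check shows. The function names are hypothetical; the fanout of 133 and the 8 KB page size are the values quoted in the notes.

```python
FANOUT = 133            # average fanout quoted in the notes
PAGE_BYTES = 8 * 1024   # 8 KB pages, as in the notes

def capacity(height: int, fanout: int = FANOUT) -> int:
    """Number of records reachable through a tree of the given height."""
    return fanout ** height

def pages_at_level(level: int, fanout: int = FANOUT) -> int:
    """Level 1 is the root (1 page); each level has fanout times more pages."""
    return fanout ** (level - 1)

for level in (1, 2, 3):
    pages = pages_at_level(level)
    print(level, pages, round(pages * PAGE_BYTES / 2**20, 1), "MB")
```

Computed exactly, 133^4 is 312,900,721, and level 3 at 8 KB per page comes to roughly 138 MB; the sizes quoted in the notes are rounded.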

Inserting a Data Entry into a B+ Tree

Find correct leaf L.

Put data entry onto L.

– If L has enough space, done!

– Else, must split L (into L and a new node L2)

• Redistribute entries evenly, copy up middle key.

• Insert index entry pointing to L2 into parent of L.

This can happen recursively.

– To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)

– Splits “grow” the tree; a root split increases its height.

– Tree growth: gets wider or one level taller at top.
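The difference between the two kinds of split can be sketched in Python on key lists alone. This is a simplified illustration under stated assumptions: child pointers and data entries are not tracked, the function names are hypothetical, and the example keys are made up.

```python
def split_leaf(keys):
    """Split an overfull leaf. The middle key is COPIED up: it also stays in
    the new right leaf, because every data entry must remain in some leaf."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right[0], right

def split_index(keys):
    """Split an overfull index node. The middle key is PUSHED up: it moves to
    the parent and appears in neither half, since it only directs searches."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# Order d = 2, so a node holds at most 4 keys; a fifth key forces a split.
leaf_left, copied_up, leaf_right = split_leaf([2, 3, 5, 7, 8])
idx_left, pushed_up, idx_right = split_index([5, 13, 17, 24, 30])
```

In both cases each half keeps at least d = 2 keys, so the minimum-occupancy rule still holds after the split.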

Inserting 8* into Example B+ Tree (figure not reproduced)

– Observe how minimum occupancy is guaranteed in both leaf and index page splits.

– Note the difference between copy-up and push-up; be sure you understand the reasons for this.

Example B+ Tree After Inserting 8* (figure not reproduced)

Deleting a Data Entry from a B+ Tree

Start at root, find leaf L where entry belongs.

Remove the entry.


– If L is at least half-full, done!

– If L has only d-1 entries,

• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

• If re-distribution fails, merge L and sibling.

If a merge occurred, must delete the entry (pointing to L or sibling) from the parent of L. Merge could propagate to root, decreasing height.
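The underflow handling at the leaf level can be sketched in Python. This is a simplified illustration under stated assumptions: leaves are plain key lists, the sibling is the right neighbour under the same parent, and redistribution borrows a single entry rather than evening out the two leaves. The function name and example keys are hypothetical.

```python
def fix_underflow(leaf, sibling, d):
    """Handle a leaf left with d-1 entries after a delete.

    Returns (action, leaf, sibling, new_separator)."""
    if len(sibling) > d:
        # Redistribute: borrow the sibling's smallest entry; the sibling's
        # new smallest key is copied up as the separator in the parent.
        leaf = leaf + [sibling[0]]
        sibling = sibling[1:]
        return "redistribute", leaf, sibling, sibling[0]
    # The sibling has only d entries, so borrowing would underflow it: merge.
    # The parent's entry for the emptied node must then be deleted.
    return "merge", leaf + sibling, [], None

# Order d = 2: a leaf needs at least 2 entries.
after_borrow = fix_underflow([22], [24, 27, 29], d=2)
after_merge = fix_underflow([22], [27, 29], d=2)
```

After a merge the parent loses an entry, so the same check has to be repeated one level up, which is how a merge can propagate to the root.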

Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...

– Deleting 19* is easy.

– Deleting 20* is done with re-distribution. Notice how the middle key is copied up.

... And Then Deleting 24*

– Must merge.

– Observe ‘toss’ of index entry (on right), and ‘pull down’ of index entry (below).

Hash Based Indexing:

Bucket: A hash file stores data in buckets. A bucket is the unit of storage; it typically corresponds to one complete disk block, which in turn can store one or more records.

Hash Function: A hash function h is a mapping from the set of all search keys K to the addresses where the actual records are placed. In other words, it is a function from search keys to bucket addresses.
