File Organizations and Indexing: R&G Chapter 8

File organizations include heap files, sorted files, and clustered files with indexes. Heap files are suitable for full scans but poor for searches. Sorted files support efficient searches through binary search, and clustered files support them through tree-based index structures such as B+ trees. Indexes speed up selections by maintaining a data structure that maps search-key values to data entries (records or record identifiers). The structure can be hash-based or tree-based, depending on which kinds of queries must be supported. Indexes are classified as clustered or unclustered depending on whether the data records are stored in (approximately) the same order as the index's data entries.

Uploaded by

raw.junk


File Organizations and Indexing

R&G Chapter 8

"If you don't find it in the index, look very carefully through the entire catalogue."
-- Sears, Roebuck, and Co., Consumer's Guide, 1897

Context (DBMS layers, top to bottom)

Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management

Alternative File Organizations

Many alternatives exist, each good in some situations and not so good in others:
Heap files: suitable when the typical access is a file scan retrieving all records.
Sorted files: best when records must be retrieved in search-key order, or when only a range of records is needed.
Clustered files (with indexes): covered later in this chapter.

Cost Model for Analysis

We ignore CPU costs, for simplicity:
B: the number of data blocks
R: the number of records per block
D: the (average) time to read or write a disk block
Measuring the number of block I/Os ignores the gains of prefetching and sequential access; thus, even I/O cost is only loosely approximated.
Average-case analysis, based on several simplistic assumptions.
Good enough to show the overall trends.

Some Assumptions in the Analysis

Single-record insert and delete.
Equality selection: exactly one match (what if there are more or fewer?).
Heap files:
Insert always appends to the end of the file.
Sorted files:
Files are compacted after deletions.
Selections are on the search key.


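Cost of Operations

B: the number of data pages
R: the number of records per page
D: the (average) time to read or write a disk page
#match pg: the number of pages containing matching records

Operation        | Heap File | Sorted File
Scan all records | BD        | BD
Equality search  | 0.5BD     | (log2 B) * D
Range search     | BD        | [(log2 B) + #match pg] * D
Insert           | 2D        | ((log2 B) + B) * D
Delete           | 0.5BD + D | ((log2 B) + B) * D

Sorted-file insert and delete must shift the records after the affected position: on average this means reading and writing half of the file (R, W 0.5), i.e., B page I/Os on top of the log2 B search.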
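The heap-file and sorted-file cost formulas can be evaluated for concrete numbers. Below is a minimal Python sketch under the same simplified model (page I/Os only, exactly one match for an equality search); the function name `heap_sorted_costs` and the example values are hypothetical, chosen only for illustration.

```python
import math

def heap_sorted_costs(B: int, D: float, match_pages: int) -> dict:
    """Estimated cost of each operation for a heap file and a sorted file.

    B: number of data pages; D: average time to read or write one page;
    match_pages: number of pages holding matches for a range search.
    Only page I/Os are counted; CPU cost is ignored, as in the cost model.
    """
    search = math.log2(B) * D  # binary search on the sorted file
    heap = {
        "scan": B * D,
        "equality": 0.5 * B * D,    # on average, scan half the file
        "range": B * D,             # no order, so scan the whole file
        "insert": 2 * D,            # read the last page, write it back
        "delete": 0.5 * B * D + D,  # find the record, write its page
    }
    sorted_file = {
        "scan": B * D,
        "equality": search,
        "range": search + match_pages * D,
        "insert": search + B * D,   # read and write half the file (0.5B each)
        "delete": search + B * D,   # shift later records back: same cost
    }
    return {"heap": heap, "sorted": sorted_file}

costs = heap_sorted_costs(B=1024, D=1.0, match_pages=5)
```

With B = 1024 and D = 1, an equality search costs 512 I/Os on the heap file but only 10 on the sorted file, while a single insert costs 2 versus 1034.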

Indexes
Sometimes we want to retrieve records by specifying the values in one or more fields, e.g.:
Find all students in the CS department
Find all students with a gpa > 3
An index on a file is a disk-based data structure that speeds up selections on the search key fields for the index.
Any subset of the fields of a relation can be the search key for an index on the relation.
Search key is not the same as key (e.g., it doesn't have to be a unique ID).
A file can have multiple (different) indexes.
E.g., a file sorted by age, with one index on salary and another on name.
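To make "search key is not the same as key" concrete, here is a minimal sketch of two indexes on one file, each mapping a search-key value to the rids that carry it. The student records, field names, and `build_index` helper are hypothetical; a rid is modeled as a (page, slot) pair, and an in-memory Python dict stands in for the disk-based structure, so this sketch only answers equality lookups.

```python
from collections import defaultdict

# Hypothetical student records keyed by rid = (page number, slot number).
students = {
    (0, 0): {"name": "ann", "dept": "CS", "gpa": 3.4},
    (0, 1): {"name": "ben", "dept": "EE", "gpa": 2.9},
    (1, 0): {"name": "cho", "dept": "CS", "gpa": 3.8},
}

def build_index(records, field):
    """Map each search-key value to the list of rids that carry it.

    The search key need not be unique, so every value maps to a list.
    """
    index = defaultdict(list)
    for rid, rec in records.items():
        index[rec[field]].append(rid)
    return dict(index)

# Several indexes on the same file, each with a different search key.
dept_index = build_index(students, "dept")
name_index = build_index(students, "name")
```

"Find all students in the CS department" then becomes `dept_index.get("CS", [])`, which returns `[(0, 0), (1, 0)]` without scanning the file.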

Index Breakdown
Index data structure
Tree-based, hash-based, other
What can the index speed up, and how much?
Associating index entries with records
Primary vs. secondary indexes, handling duplicates
What kind of information is the index actually storing?
Clustered vs. unclustered indexes
Single-part vs. multi-part keys
Multi-part key = composite index

Data Structures
What kinds of selections do they support?
Selections of the form field <op> constant
Equality selections (op is =)
Range selections (op is one of <, >, <=, >=, BETWEEN)
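The difference between the two kinds of selection can be sketched in a few lines. In this minimal example (the gpa entries are hypothetical), a Python dict plays the role of a hash-based structure and a sorted list searched with the standard `bisect` module stands in for a tree; neither is a disk-based structure. The hash answers `gpa = 3.4` with one probe but has no notion of "greater than", while the sorted entries let a range selection find its starting point and then read forward.

```python
import bisect

# Hypothetical (gpa, rid) data entries; a rid is modeled as (page, slot).
entries = [(3.4, (0, 0)), (2.9, (0, 1)), (3.8, (1, 0)), (3.1, (1, 1))]

# Hash-style structure: supports "field = constant" lookups only.
hash_index = {}
for key, rid in entries:
    hash_index.setdefault(key, []).append(rid)

# Stand-in for a tree: data entries kept sorted on the search key.
sorted_entries = sorted(entries)
sorted_keys = [key for key, _ in sorted_entries]

def equality(key):
    """Equality selection: one hash probe."""
    return hash_index.get(key, [])

def range_greater_than(low):
    """Range selection key > low: locate the start, then read forward."""
    start = bisect.bisect_right(sorted_keys, low)
    return [rid for _, rid in sorted_entries[start:]]
```

For example, `range_greater_than(3)` ("gpa > 3") returns `[(1, 1), (0, 0), (1, 0)]`, in gpa order.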

Hash-based structures (how to grow/shrink?)
The key problem on disk is handling growth
Extendible/Linear Hashing (Chap 11)
Tree-based structures
Why not a binary tree? Estimate log2(1M) * D, i.e., about 20 I/Os per lookup
B-Tree, B+-Tree (Chap 10)
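The binary-tree estimate can be checked by counting how many levels a tree needs to reach one million keys. This sketch assumes each level costs roughly one page I/O and uses a fanout of 100 purely as an illustrative value for a wide, page-sized node; the helper name `levels_needed` is hypothetical. Integer arithmetic avoids floating-point rounding in the logarithm.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest h with fanout**h >= n_keys (integer math, no float log)."""
    h, reach = 0, 1
    while reach < n_keys:
        reach *= fanout
        h += 1
    return h

binary_ios = levels_needed(1_000_000, 2)   # 20 levels, about 20 * D
wide_ios = levels_needed(1_000_000, 100)   # 3 levels with an assumed fanout of 100
```

The gap between 20 and 3 random I/Os per lookup is the main reason disk-based trees use wide nodes rather than binary ones.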

Wide World of Index Structures
2-dimensional ranges (east of Berkeley and west of Truckee and north of Fresno and south of Eureka)
Or distances (within 2 miles of Soda Hall)
Or n-dimensional
One common n-dimensional index: R-tree
Supported in Oracle and Informix
See http://gist.cs.berkeley.edu for research on this topic
Nearest neighbor (closest BMW dealer)
Ranking queries (10 best Berkeley Thai restaurants on price and atmosphere)
These are hard to support!
Regular expression matches
Suffix trees
XML path matches
DataGuide, 1-Index

Primary vs. Secondary Index
Primary index: the search key must contain a real key, usually the primary key
e.g., social security #, ISBN, etc.
No duplicate support
Store the record in the index?
Secondary index
e.g., eye color, year of birth, etc.
Duplicate support required
Use the RID or the primary key to refer to the record?

Alternatives for Data Entry k* in Index
Three alternatives:
(1) Actual data record (primary index only)
(2) <k, rid of matching data record>
(3) <k, list of rids of matching data records>
Choice is orthogonal to the indexing technique.
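The three shapes of a data entry k* can be written down as small record types. This is a sketch only: the class names, the integer key type, and the (page, slot) rid layout are assumptions for illustration, and `to_alt3` is a hypothetical helper showing that Alternative 3 is just Alternative 2 with entries for the same key grouped together.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rid = Tuple[int, int]  # (page number, slot number): an assumed rid layout

@dataclass
class Alt1Entry:      # Alternative 1: the data entry is the record itself
    key: int
    record: dict

@dataclass
class Alt2Entry:      # Alternative 2: <k, rid of matching data record>
    key: int
    rid: Rid

@dataclass
class Alt3Entry:      # Alternative 3: <k, list of rids of matching records>
    key: int
    rids: List[Rid]

def to_alt3(alt2_entries: List[Alt2Entry]) -> List[Alt3Entry]:
    """Group Alternative-2 entries by key into Alternative-3 entries."""
    out: List[Alt3Entry] = []
    for e in sorted(alt2_entries, key=lambda e: (e.key, e.rid)):
        if out and out[-1].key == e.key:
            out[-1].rids.append(e.rid)   # same key: extend the rid list
        else:
            out.append(Alt3Entry(e.key, [e.rid]))
    return out
```

Nothing here depends on whether the entries live in a tree or a hash structure, which is what "orthogonal to the indexing technique" means.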

Alternatives for Data Entries (Contd.)
Alternative 1: actual data record
Use the index structure as the file structure
Saves pointer lookups for primary index searches
Adds a primary index lookup for secondary index access!
Index nodes have all the issues of record management
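The extra lookup for secondary access under Alternative 1 can be sketched as follows. Assumptions: dicts stand in for both trees, and the records, field names, and `find_by_eye` helper are hypothetical. The secondary index stores primary keys rather than physical rids, since records stored inside an Alternative 1 index can move (for example, when nodes split), so every secondary match costs one more primary-index search.

```python
# Hypothetical Alternative-1 primary index: records live inside the index,
# keyed by primary key (a dict stands in for the tree).
primary = {
    101: {"id": 101, "eye": "brown", "born": 1990},
    102: {"id": 102, "eye": "blue", "born": 1985},
    103: {"id": 103, "eye": "brown", "born": 1985},
}

# Secondary index on eye color: duplicates allowed, values are primary keys.
secondary_eye = {}
for pk, rec in primary.items():
    secondary_eye.setdefault(rec["eye"], []).append(pk)

def find_by_eye(color):
    """Secondary lookup, then one primary-index lookup per match."""
    return [primary[pk] for pk in secondary_eye.get(color, [])]
```

`find_by_eye("brown")` returns the records with ids 101 and 103, each reached through a second search of the primary index.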

Alternatives for Data Entries (Contd.)
Alternative 2: <k, rid of matching data record>
and Alternative 3: <k, list of rids of matching data records>
If a heap file is used (no Alternative 1 indexes), then a physical rid can be used instead of the primary key to refer to records.
Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length.
Even worse, for large rid lists the data entry would have to span multiple blocks! (How many?)
Typical solution: add the primary key or rid to the end of secondary keys, and use Alternative 2!
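Both points can be illustrated with a short sketch. The byte sizes (4-byte key, 8-byte rid, 4096-byte page) are hypothetical values picked only to make "(How many?)" concrete; under them, an Alternative 3 entry with 1000 rids needs 8004 bytes and spans 2 pages. The second half shows the typical fix: appending the rid to the key makes every (key, rid) entry unique and fixed-size, and all rids for one key still sit in a contiguous run.

```python
import bisect

KEY_BYTES, RID_BYTES = 4, 8   # hypothetical sizes, for illustration only

def alt2_entry_size() -> int:
    return KEY_BYTES + RID_BYTES              # fixed size

def alt3_entry_size(n_rids: int) -> int:
    return KEY_BYTES + n_rids * RID_BYTES     # grows with the duplicates

def pages_spanned(n_rids: int, page_bytes: int = 4096) -> int:
    """Pages a single Alternative-3 entry occupies (ceiling division)."""
    return -(-alt3_entry_size(n_rids) // page_bytes)

# Typical fix: make the secondary key unique by appending the rid,
# then keep plain Alternative-2 entries sorted on (key, rid).
entries = []
for key, rid in [(12, (1, 0)), (11, (0, 1)), (12, (0, 0))]:
    bisect.insort(entries, (key, rid))

def lookup(key):
    """All rids for key: read the contiguous run of (key, ...) entries."""
    i = bisect.bisect_left(entries, (key,))   # (key,) sorts before (key, rid)
    out = []
    while i < len(entries) and entries[i][0] == key:
        out.append(entries[i][1])
        i += 1
    return out
```

`lookup(12)` returns `[(0, 0), (1, 0)]`: duplicates are handled without any variable-length entry.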

Index Classification
Clustered vs. unclustered: if the order of data records is the same as, or "close to", the order of index data entries, the index is called clustered.
A file can be clustered on at most one search key.
The cost of range scans through an index varies greatly depending on whether the index is clustered or not!
Alternative 1-light
Alternative 1 implies clustered, but not vice versa.
Use physical RID in secondary index (why is this good?)

Clustered vs. Unclustered Index
Suppose that Alternative (2) is used for data entries, and that the data records are stored in a heap file.
To build a clustered index, first sort the heap file (with some free space on each block for future inserts).
Overflow blocks may be needed for inserts. (Thus, the order of data records is "close to", but not identical to, the sort order.)
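The build step "sort the heap file, leaving free space on each block" can be sketched as packing sorted records into partly full pages. The 2/3 fill factor is an assumption borrowed from the later remark that clustered pages tend to be only 2/3 full; `pack_sorted` and the 600-record example are hypothetical. With 30 records per page, each page receives 20 records, so the file grows from 20 to 30 pages, which is where the 1.5B term in the clustered-file costs comes from.

```python
def pack_sorted(records, records_per_page, fill_num=2, fill_den=3):
    """Sort records, then pack them into pages that are only partly full.

    Each page receives records_per_page * fill_num // fill_den records
    (at least one), leaving the rest of the page free for future inserts.
    """
    per_page = max(1, records_per_page * fill_num // fill_den)
    ordered = sorted(records)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]

heap_file = list(range(600))[::-1]   # hypothetical unsorted records
pages = pack_sorted(heap_file, records_per_page=30)
```

Integer arithmetic for the fill factor keeps the per-page count exact instead of relying on floating-point multiplication by 2/3.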

[Figure: CLUSTERED vs. UNCLUSTERED index. Index entries direct the search for data entries (index file), which point to data records (data file).]
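Why range scans cost so much more through an unclustered index can be shown by counting page reads. This sketch assumes a deliberately pessimistic buffer that holds only the current page, and uses a hypothetical layout of 6 matching records, 3 per page; `page_ios` is an illustrative helper. Following the data entries of a clustered index visits each page once, while the unclustered index, whose entries are in key order but point to scattered records, can pay one I/O per matching record.

```python
def page_ios(rids):
    """Count page reads when fetching records in rid order, assuming a
    buffer that keeps only the most recently read page."""
    ios, current = 0, None
    for page, _slot in rids:
        if page != current:
            ios += 1
            current = page
    return ios

# Clustered: matching records are stored in the same order as the entries.
clustered_rids = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
# Unclustered: the same records, reached in key order, alternate between pages.
unclustered_rids = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Here the clustered scan needs 2 page reads and the unclustered scan needs 6; with real data the unclustered count approaches the number of matching records.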

Unclustered vs. Clustered Indexes
What are the tradeoffs?
Clustered pros
Efficient for range searches
May be able to do some types of compression
Possible locality benefits (related data?)
Others?
Clustered cons
Expensive to maintain (either on the fly, or sloppily with periodic reorganization)
Pages tend to be only 2/3 full!

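Cost of Operations

B: the number of data pages
R: the number of records per page
D: the (average) time to read or write a disk page
F: the fanout of the tree index on the clustered file
#match pg: the number of pages containing matching records

Operation        | Heap File | Sorted File                | Clustered File
Scan all records | BD        | BD                         | 1.5BD
Equality search  | 0.5BD     | (log2 B) * D               | (logF 1.5B) * D
Range search     | BD        | [(log2 B) + #match pg] * D | [(logF 1.5B) + #match pg] * D
Insert           | 2D        | ((log2 B) + B) * D         | ((logF 1.5B) + 1) * D
Delete           | 0.5BD + D | ((log2 B) + B) * D         | ((logF 1.5B) + 1) * D

The clustered file occupies about 1.5B pages because its pages tend to be only 2/3 full. Sorted-file insert and delete read and write half of the file on average (R, W 0.5).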
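The clustered-file column can be evaluated the same way. A minimal sketch, assuming the table's model: about 1.5B pages because pages are 2/3 full, a tree search costing logF(1.5B) page I/Os, and one extra write for insert or delete. The helper name `clustered_costs` and the example values (B = 10,000, F = 100) are hypothetical. The search cost is left fractional, exactly as the formula is written, rather than rounded up to whole levels.

```python
import math

def clustered_costs(B: int, D: float, F: int, match_pages: int) -> dict:
    """Clustered-file costs: about 1.5B pages, tree index with fanout F."""
    pages = 1.5 * B                     # pages are only ~2/3 full
    search = math.log(pages, F) * D     # descend the tree: logF(1.5B) I/Os
    return {
        "scan": pages * D,
        "equality": search,
        "range": search + match_pages * D,
        "insert": search + D,           # find the page, write it back
        "delete": search + D,
    }

costs = clustered_costs(B=10_000, D=1.0, F=100, match_pages=10)
```

With these values an equality search costs about 2.1 I/Os, compared with about 13.3 (log2 10,000) for binary search on a sorted file, while a full scan pays for the extra half of the pages (15,000 instead of 10,000).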

Composite Search Keys

Search on a combination of fields.
Equality query: every field value is equal to a constant value. E.g., w.r.t. an <age, sal> index:
age = 20 and sal = 75
Range query: some field value is not a constant. E.g.:
age > 20; or age = 20 and sal > 10
Data entries in the index are sorted by search key to support range queries.
Lexicographic order
Like the dictionary, but on fields, not letters!
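Python tuples already compare lexicographically, field by field, which makes them a convenient stand-in for composite keys. In this sketch the <age, sal> data entries and rid labels are hypothetical, and ages are assumed to be integers so that (age + 1,) marks the end of an age group. Both example range queries from above turn into one contiguous slice of the sorted entries.

```python
import bisect

# Hypothetical data entries on a composite <age, sal> key: ((age, sal), rid).
entries = sorted([((20, 75), "r1"), ((20, 5), "r2"), ((21, 10), "r3"),
                  ((19, 90), "r4"), ((20, 40), "r5")])
keys = [k for k, _ in entries]

def age_eq_sal_gt(age, sal):
    """age = age AND sal > sal: a contiguous run in lexicographic order."""
    lo = bisect.bisect_right(keys, (age, sal))
    hi = bisect.bisect_left(keys, (age + 1,))   # first key with a larger age
    return [rid for _, rid in entries[lo:hi]]

def age_gt(age):
    """age > age: everything from the first key with a larger age onward."""
    lo = bisect.bisect_left(keys, (age + 1,))
    return [rid for _, rid in entries[lo:]]
```

`age_eq_sal_gt(20, 10)` returns `["r5", "r1"]` (sal 40 and 75), and `age_gt(20)` returns `["r3"]`. A condition on sal alone, by contrast, would not map to one contiguous run of an <age, sal> index.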

Examples of composite key indexes using lexicographic order

Data records sorted by name:
name | age | sal
bob  | 12  | 10
cal  | 11  | 80
joe  | 12  | 20
sue  | 13  | 75

Data entries in index sorted by <age, sal>: 11,80  12,10  12,20  13,75
Data entries in index sorted by <age>: 11  12  12  13
Data entries in index sorted by <sal, age>: 10,12  20,12  75,13  80,11
Data entries sorted by <sal>: 10  20  75  80
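The four orderings in the example follow directly from sorting the same records on different key tuples. This sketch reproduces them; as in the example, the data entries show only key values, whereas a real index would pair each entry with a rid or the record itself.

```python
# Records from the example: (name, age, sal), stored in name order.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

age_sal = sorted((age, sal) for _, age, sal in records)  # <age, sal> index
sal_age = sorted((sal, age) for _, age, sal in records)  # <sal, age> index
age_only = sorted(age for _, age, _ in records)          # <age> index
sal_only = sorted(sal for _, _, sal in records)          # <sal> index
```

Note that <age, sal> and <sal, age> contain the same pairs in different orders, so each supports a different set of range queries.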

Summary
The file layer manages access to records in pages.
Record and page formats depend on fixed- vs. variable-length records.
Free space management is an important issue.
The slotted page format supports variable-length records and allows records to move on a page.
Many alternative file organizations exist, each appropriate in some situation.
If selection queries are frequent, sorting the file or building an index is important.
Hash-based indexes are only good for equality search.
Sorted files and tree-based indexes are best for range search; they are also good for equality search. (Files are rarely kept sorted in practice; a B+ tree index is better.)
An index is a collection of data entries plus a way to quickly find entries with given key values.

Summary (Contd.)
Data entries in an index can be actual data records, <key, rid> pairs, or <key, rid-list> pairs.
The choice is orthogonal to the indexing structure (i.e., tree, hash, etc.).
There are usually several indexes on a given file of data records, each with a different search key.
Indexes can be classified as clustered vs. unclustered.
The differences have important consequences for utility/performance.
Catalog relations store information about relations, indexes, and views.
