IT3031-L06-Indexing

The document discusses file organization and indexing in database systems, detailing types of file organizations such as heap, sequential, and hashed files, along with their advantages and disadvantages. It also covers the characteristics of indexes, including clustered vs. unclustered, dense vs. sparse, and primary vs. secondary indexes, as well as the B+ tree structure for efficient data retrieval. Additionally, it touches on hashing techniques for equality selections and the implementation of static hashing.

Database Systems and Data-Driven Applications

File Organization and Indexes


This Lecture…
 File Organization

 Indexes
Files of Records

 A page or block is the natural unit when doing I/O, but higher levels of the DBMS operate on records, and files of records.
 FILE: A collection of pages, each containing a collection of records. Must support:
   insert/delete/modify record
   read a particular record (specified using record id)
   scan all records (possibly with some conditions on the records to be retrieved)
File Organization

 Three types
 Heap File Organization
 Sequential File Organization
 Hashing File Organization
Alternative File Organizations

Many alternatives exist, each ideal for some situations and not so good in others:
 Heap files: Suitable when typical access is a file scan retrieving all records.
   Search (Equality/Range): needs to scan the file
   Insert: at the end of the file
   Delete: search for the record, then delete it
 Sorted files: Best if records must be retrieved in some order, or only a `range' of records is needed.
   Search (Equality/Range): efficient (binary search)
   Insert: find the position, insert & move records
   Delete: search for the record, delete & move records
Alternative File Organizations… (contd.)

 Hashed files: Good for equality selections.
   File is a collection of buckets. Bucket = primary page plus zero or more overflow pages.
   Hashing function h: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.
Alternative File Organizations… (contd.)

 Hashed files:
   Search (Equality): good if the selection is on the search key; otherwise scan the table
   Search (Range): needs to scan the file
   Insert: hash to find the primary bucket, then insert
   Delete: hash to find the primary bucket if the search key is given, else scan the file; then delete the record
Unordered (Heap) Files
 Simplest file structure; contains records in no particular order.
 As the file grows and shrinks, disk pages are allocated and de-allocated.
 To support record-level operations, we must:
   keep track of the pages in a file
   keep track of free space on pages
   keep track of the records on a page
 There are many alternatives for keeping track of this.
Heap File Implemented as a List

[Figure: a header page anchors two linked lists of data pages — one list of full pages and one list of pages with free space.]

 The header page id and heap file name must be stored someplace.
 Each page contains 2 `pointers' plus data.
Heap File Implemented as a List… (contd.)
 Disadvantages…
   Need to scan many pages to find a page with enough free space
Heap File Using a Page Directory

[Figure: a directory whose entries point to data pages 1 … N.]

 The entry for a page can include the number of free bytes on the page.
 The directory is itself a collection of pages — much smaller than a linked list of all heap file pages! A minimal sketch follows.
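As a rough illustration of the idea (the names and page size below are assumptions, not from the lecture), a page directory can be modeled as a small map from page id to free-space count that inserts consult instead of scanning every data page:

# Minimal sketch (hypothetical names): a heap file whose directory
# tracks free bytes per page, so inserts avoid scanning all data pages.
PAGE_SIZE = 8192

class HeapFile:
    def __init__(self):
        self.pages = {}        # page id -> list of records
        self.directory = {}    # page id -> free bytes on that page
        self.next_pid = 0

    def insert(self, record: bytes):
        # Consult the (small) directory instead of the data pages.
        for pid, free in self.directory.items():
            if free >= len(record):
                self.pages[pid].append(record)
                self.directory[pid] = free - len(record)
                return pid
        # No page has room: allocate a new page and register it.
        pid, self.next_pid = self.next_pid, self.next_pid + 1
        self.pages[pid] = [record]
        self.directory[pid] = PAGE_SIZE - len(record)
        return pid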
Example: Library Catalog / Book Index

Indexes

 An index on a file speeds up selections on the search key fields for the index.
 Any subset of the fields of a relation can be the search key for an index on the relation.
 Search key is not the same as key (a minimal set of fields that uniquely identify a record in a relation).
Characteristics
 Indexes provide fast access
 Indexes take space
 Need to be careful to create only useful indexes
 May slow down certain inserts/updates/deletes (indexes must be maintained)
[Note: explain on board]

Alternatives for Data Entry k* in Index
 An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.

 Three alternatives (sketched after this list):
 1. Data record with key value k (Alt. 1)
 2. <k, rid of data record with search key value k> (Alt. 2)
 3. <k, list of rids of data records with search key k> (Alt. 3)
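As a loose illustration (the record, rid, and key values below are made up), the three alternatives can be pictured as the value an index stores under a key:

# Sketch of the three data-entry alternatives for a key k = 25.
# A rid (record id) is taken here to be a (page id, slot) pair — an assumption.

record = ("Ashby", 25, 3000)         # Alt. 1: the data record itself
entry_alt1 = record

rid = (4, 2)                         # hypothetical rid: page 4, slot 2
entry_alt2 = (25, rid)               # Alt. 2: <k, rid of matching record>

entry_alt3 = (25, [(4, 2), (7, 0)])  # Alt. 3: <k, list of rids>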
Terminology
 File of records containing index entries = index file
 There are several organization techniques for building index files = access methods
Properties of Indexes…

 Clustered vs. Unclustered Index

[Figure: in a clustered index, the index entries direct the search to data entries whose order matches (or is close to) the order of the data records; in an unclustered index, the data entries in the index file point to data records stored in unrelated order in the data file.]

 Can have at most one clustered index per table
 Cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!
Properties… (contd.)

 Dense vs. Sparse: If there is at least one data entry per search key value (in some data record), then the index is dense.
   Alt. 1 always leads to a dense index.
   Every sparse index is clustered!
   Sparse indexes are smaller; however, some useful optimizations are based on dense indexes.

[Figure: a data file with records Ashby, 25, 3000; Basu, 33, 4003; Bristow, 30, 2007; Cass, 50, 5004; Daniels, 22, 6003; Jones, 40, 6003; Smith, 44, 3000; Tracy, 44, 5004. A sparse index on Name holds entries Ashby, Cass, Smith (one per page); a dense index on Age holds entries 22, 25, 30, 33, 40, 44, 44, 50 (one per record).]
Properties… (contd.)
 Primary vs. secondary: If the search key contains the primary key, then it is called a primary index.
 Unique index: Search key contains a candidate key.
Properties… (contd.)

 Composite Search Keys: Search on a combination of fields.
   Equality query: Every field value is equal to a constant value. E.g., w.r.t. a <sal,age> index: age=20 and sal=75
   Range query: Some field value is not a constant. E.g.: age=20; or age=20 and sal>10
 Data entries in the index are sorted by the search key, in lexicographic order, to support range queries — see the sketch after the figure.

[Figure: data records (bob, 12, 10; cal, 11, 80; joe, 12, 20; sue, 13, 75) sorted by name, alongside composite-key indexes — data entries sorted by <age,sal>: 11,80; 12,10; 12,20; 13,75 — by <sal,age>: 10,12; 20,12; 75,13; 80,11 — by <age>: 11; 12; 12; 13 — and by <sal>: 10; 20; 75; 80.]
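A small sketch (using the figure's toy data) of how a composite index kept in lexicographic order answers equality and range queries via binary search:

import bisect

# Data entries for a composite <age, sal> index, in lexicographic order.
entries = [(11, 80), (12, 10), (12, 20), (13, 75)]

# Equality query (age=12 and sal=20): binary search for the exact pair.
i = bisect.bisect_left(entries, (12, 20))
match = entries[i] if i < len(entries) and entries[i] == (12, 20) else None

# Range query (age=12, sal unconstrained): the matches form one
# contiguous run, found with two binary searches.
lo = bisect.bisect_left(entries, (12, float("-inf")))
hi = bisect.bisect_left(entries, (13, float("-inf")))
print(entries[lo:hi])   # [(12, 10), (12, 20)]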
Indexes in SQL…
 Index is not a part of SQL-92
 However, all major DBMSs provide facilities for index creation:
   CREATE INDEX…
   DROP INDEX…
 SQL Server 2005 supports indexes (clustered and non-clustered)
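A minimal, runnable example of the two statements above, driven through Python's built-in sqlite3 module (the table and index names are invented for illustration):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")

# CREATE INDEX… — an (unclustered) index with search key gpa
con.execute("CREATE INDEX idx_students_gpa ON students (gpa)")

# The optimizer may now use the index for selections on gpa.
con.execute("SELECT name FROM students WHERE gpa > 3.0")

# DROP INDEX…
con.execute("DROP INDEX idx_students_gpa")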
Range Searches
 ``Find all students with gpa > 3.0''
   If data is in a sorted file, do binary search to find the first such student, then scan to find the others.
   Cost of binary search on the data file can be quite high.
 Simple idea: Create an `index' file.

[Figure: an index file with entries k1, k2, …, kN, each pointing to a page (Page 1 … Page N) of the data file.]

 Can do binary search on the (smaller) index file!

B+ Tree: The Most Widely Used Index
 Insert/delete at log_F N cost; the tree is kept height-balanced. (F = fanout, N = # leaf pages)
 Minimum 50% occupancy (except for root). Each node (except root) contains d <= m <= 2d entries. The parameter d is called the order of the tree.
 Supports equality and range-searches efficiently.

[Figure: index entries (direct search) at the upper levels; data entries (the "sequence set") at the leaf level.]
B+ Trees in Practice
 Typical order: 100. Typical fill-factor: 67%.
   average fanout = 133
 Typical capacities:
   Height 4: 133^4 = 312,900,721 records
   Height 3: 133^3 = 2,352,637 records
 Can often hold top levels in buffer pool:
   Level 1 = 1 page = 8 KB
   Level 2 = 133 pages = 1 MB
   Level 3 = 17,689 pages = 133 MB
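A quick arithmetic check of the capacities above:

fanout = 133          # order 100 -> up to 200 entries, at 67% fill ≈ 133
for height in (3, 4):
    # A tree of this height addresses fanout**height leaf-level entries.
    print(height, fanout ** height)
# 3 2352637
# 4 312900721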
B+ Tree…
 Search begins at the root, and key comparisons direct it to a leaf
 Each node has search keys (Ki) and pointers (Pi)
 Pi points to a sub-tree in which all key values K are such that Ki ≤ K < Ki+1
Search
func tree_search(nodepointer, search key value K) returns nodepointer
// Searches tree for entry
if *nodepointer is a leaf, return nodepointer;
else,
  if K < K1 then return tree_search(P0, K);
  else,
    if K ≥ Km then return tree_search(Pm, K)   // m = # entries
    else,
      find i such that Ki ≤ K < Ki+1;
      return tree_search(Pi, K)
    end if
  end if
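A runnable rendering of this pseudocode in Python (the node layout — keys [K1..Km] plus child pointers [P0..Pm] — is assumed):

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # [K1, ..., Km], sorted
        self.children = children  # [P0, ..., Pm]; None for a leaf

def tree_search(node, K):
    """Return the leaf whose key range covers search key K."""
    if node.children is None:                    # *nodepointer is a leaf
        return node
    if K < node.keys[0]:                         # K < K1 -> follow P0
        return tree_search(node.children[0], K)
    if K >= node.keys[-1]:                       # K >= Km -> follow Pm
        return tree_search(node.children[-1], K)
    # Find i such that Ki <= K < Ki+1, then follow Pi.
    i = max(j + 1 for j, k in enumerate(node.keys) if k <= K)
    return tree_search(node.children[i], K)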
Example B+ Tree…

 Search for 5*, 15*, and all data entries >= 24*

[Figure: root with keys 13, 17, 24, 30; leaf pages 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]

 Based on the search for 15*, we know it is not in the tree
Inserting a Data Entry into a B+ Tree
 Find correct leaf L.
 Put data entry onto L.
   If L has enough space, done!
   Else, must split L (into L and a new node L2)
     Redistribute entries evenly, copy up the middle key.
     Insert index entry pointing to L2 into the parent of L.
 This can happen recursively
   To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits — see the sketch below.)
 Splits “grow” the tree; a root split increases its height.
   Tree growth: gets wider, or one level taller at the top.
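A small sketch of the copy-up vs. push-up distinction (nodes are modeled as plain sorted key lists — an assumption for illustration):

def split_leaf(keys):
    # Leaf split: the middle key is COPIED up — it also stays in the
    # new right leaf, since leaves hold the actual data entries.
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]             # right[0] goes to the parent too

def split_index(keys):
    # Index split: the middle key is PUSHED up — it moves to the parent
    # and appears only once in the tree.
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

print(split_leaf([2, 3, 5, 7, 8]))        # ([2, 3], [5, 7, 8], 5)
print(split_index([5, 13, 17, 24, 30]))   # ([5, 13], [24, 30], 17)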
Inserting 8* into Example B+ Tree

 Observe how minimum occupancy is guaranteed in both leaf and index page splits.
 Note the difference between copy-up and push-up; be sure you understand the reasons for this.

[Figure (leaf split): entry 5 is copied up into the parent node and continues to appear in the leaf; the full leaf splits into 2* 3* and 5* 7* 8*.]
[Figure (index split): entry 17 is pushed up and appears only once in the index — contrast this with a leaf split; the index node splits into 5 13 and 24 30.]
Example B+ Tree After Inserting 8*

[Figure: root 17; internal nodes 5 13 and 24 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]

 Notice that the root was split, leading to an increase in height.
 In this example, we could avoid the split by redistributing entries; however, this is usually not done in practice.
Deleting a Data Entry from a B+ Tree
 Start at the root, find leaf L where the entry belongs.
 Remove the entry.
   If L is at least half-full, done!
   If L has only d-1 entries,
     Try to re-distribute, borrowing from a sibling (adjacent node with the same parent as L).
     If re-distribution fails, merge L and the sibling.
 If a merge occurred, must delete the entry (pointing to L or sibling) from the parent of L.
 Merge could propagate to the root, decreasing height.
Example Tree After (Inserting 8*, Then) Deleting 19* and 20* …

[Figure: root 17; internal nodes 5 13 and 27 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*.]

 Deleting 19* is easy.
 Deleting 20* is done with re-distribution. Notice how the middle key (27) is copied up.
... And Then Deleting 24*

 Must merge.
 Observe the `toss' of the index entry 27 when the leaves merge into 22* 27* 29* (figure on the right), and the `pull down' of the index entry 17 when the merge propagates to the root (figure below).

[Figure (right): subtree 30 over leaves 22* 27* 29* | 33* 34* 38* 39*.]
[Figure (below): final tree — root 5 13 17 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*.]
Duplicates in B+ Trees…
 We have ignored duplicates so far…
 Alternatives…
   Overflow leaf pages
   Duplicate values in the leaf pages
   Make key values unique (by adding rowids)
     Preferred approach in many DBMSs
Hashing

 Hash-based indexes are best for equality selections.
 Cannot support range searches.
 Static and dynamic hashing techniques exist
Static Hashing
 # primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
 h(key) mod N = bucket to which the data entry with key k belongs. (N = # of buckets)

[Figure: h(key) mod N maps a key to one of the primary bucket pages 0 … N-1; each primary page may chain to overflow pages.]
Static Hashing… (contd.)

 Buckets contain data entries.
 Hash fn works on the search key field of record r. Must distribute values over range 0...N-1.
   h(key) = (a * key + b) usually works well — a minimal sketch follows.
   a and b are constants; lots is known about how to tune h.
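A compact sketch of static hashing with in-bucket chaining (the constants, bucket count, and rid format are arbitrary choices, not from the lecture):

N = 4                 # number of primary buckets (fixed in static hashing)
a, b = 31, 7          # constants for h(key) = a*key + b
buckets = [[] for _ in range(N)]   # each list stands in for a primary
                                   # page plus its overflow chain

def h(key):
    return a * key + b

def insert(key, rid):
    buckets[h(key) % N].append((key, rid))   # h(key) mod N picks the bucket

def search_eq(key):
    # An equality search touches only one bucket (plus overflow, if any).
    return [e for e in buckets[h(key) % N] if e[0] == key]

insert(25, (4, 2)); insert(29, (4, 3))
print(search_eq(25))   # [(25, (4, 2))]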
Static Hashing… (contd.)
Problems…
 Insertion: long overflow chains can develop and degrade performance.
 Deletion may waste space
 Extendible and Linear Hashing: dynamic techniques to fix this problem.
Extendible Hashing

 Situation: Bucket (primary page) becomes full. Why not re-organize the file by doubling the # of buckets?
   Reading and writing all pages is expensive!
 Idea: Use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
   Directory much smaller than file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
   Trick lies in how the hash function is adjusted!
Example

[Figure: a directory of size 4 (global depth 2), entries 00/01/10/11, pointing to data pages — Bucket A (local depth 2): 4* 12* 32* 16*; Bucket B (local depth 2): 1* 5* 21* 13*; Bucket C (local depth 2): 10*; Bucket D (local depth 2): 15* 7* 19*.]

 Directory is an array of size 4.
 To find the bucket for r, take the last `global depth' # of bits of h(r); we denote r by h(r).
   If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
 Insert: If the bucket is full, split it (allocate a new page, re-distribute entries). If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing the global depth with the local depth of the split bucket.)
Insert h(r)=20 (Causes Doubling)

[Figure (before): global depth 2, directory 00/01/10/11; Bucket A (4* 12* 32* 16*) overflows when 20* arrives.]
[Figure (after): global depth 3, directory 000 … 111; Bucket A (local depth 3): 32* 16*; Bucket A2, the `split image' of Bucket A (local depth 3): 4* 12* 20*; Buckets B, C, D unchanged at local depth 2.]
Points to Note

 20 = binary 10100. The last 2 bits (00) tell us r belongs in A or A2. The last 3 bits are needed to tell which.
   Global depth of directory: max # of bits needed to tell which bucket an entry belongs to.
   Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
 Not all splits double the directory size
   Example: Insert 9*
Points to Note (contd.)
 When does a bucket split cause directory doubling?
   Before the insert, local depth of the bucket = global depth. The insert causes local depth to become > global depth; the directory is doubled by copying it over and `fixing' the pointer to the split image page. (Use of least significant bits enables efficient doubling via copying of the directory!) A sketch of this split logic follows.
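A compact sketch of the split-and-maybe-double logic just described (bucket capacity, integer keys standing in for hash values, and all names are assumptions):

class ExtendibleHash:
    CAP = 4                                   # entries per bucket (assumed)

    def __init__(self):
        self.global_depth = 2
        self.dir = [{"depth": 2, "items": []} for _ in range(4)]

    def _slot(self, key):
        return key & ((1 << self.global_depth) - 1)  # last global_depth bits

    def insert(self, key):
        b = self.dir[self._slot(key)]
        if len(b["items"]) < self.CAP:
            b["items"].append(key)
            return
        if b["depth"] == self.global_depth:   # local == global: must double
            self.dir = self.dir + self.dir    # double by copying it over
            self.global_depth += 1
        # Split only the overflowed bucket into itself and its split image.
        b["depth"] += 1
        image = {"depth": b["depth"], "items": []}
        pending, b["items"] = b["items"] + [key], []
        mask = 1 << (b["depth"] - 1)          # the newly significant bit
        for k in pending:
            (image if k & mask else b)["items"].append(k)
        # Fix the directory pointers whose new bit selects the split image.
        for i in range(len(self.dir)):
            if self.dir[i] is b and i & mask:
                self.dir[i] = image
        # (A fuller version would re-check for overflow and split again.)

Replaying the slides' example reproduces the doubling: after inserting 4, 12, 32, 16 into bucket 00, inserting 20 raises the global depth to 3, leaving 32 and 16 in Bucket A (slot 000) and 4, 12, 20 in its split image A2 (slot 100).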
Comments on Extendible Hashing
 If the directory fits in memory, an equality search is answered with one disk access; else two.
   A 100 MB file with 100-byte records and 4 KB pages contains 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that the directory will fit in memory.
 Directory grows in spurts and, if the distribution of hash values is skewed, the directory can grow large.
 Multiple entries with the same hash value cause problems!
Comments on Extendible Hashing (contd.)
 Delete: If removal of a data entry makes a bucket empty, it can be merged with its `split image'. If each directory element points to the same bucket as its split image, the directory can be halved.
Summary
 File Organizations
 Indexes
 B+ Tree
 Hashing (Extendible)