Indexing: By: Arnold Mesa

The document discusses different types of indices used to access data in files or databases. There are two main types: ordered indices, which sort values, and hash indices, which distribute values uniformly across buckets using a hash function. Within ordered indices, there are dense and sparse variants. Dense indices have an index record for every value, while sparse indices only index some values. The document also covers performance factors like access time and space overhead and how indices can be made more efficient for large datasets using additional sparse indices or tree structures like B+ trees.

Uploaded by

Rohit Singh
Copyright
© All Rights Reserved

Indexing

By: Arnold Mesa

Indexing
You can think of an index to a file like a catalogue to a library.

There are two kinds...

Ordered Indices - a sorted ordering of the values.
Hash Indices - a uniform distribution of values across a range of buckets. The distribution is based on a hash function.
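To make the contrast concrete, here is a minimal sketch of both kinds of index over a small list of keys. The bucket count, the choice of `zlib.crc32` as the hash function, and all function names are assumptions made for illustration; the slides do not prescribe any of them.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count, chosen only for illustration


def bucket_of(key: str) -> int:
    # crc32 is a stand-in hash function; it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def build_hash_index(keys):
    # Hash index: each key's position is filed under the bucket its hash selects.
    buckets = defaultdict(list)
    for position, key in enumerate(keys):
        buckets[bucket_of(key)].append(position)
    return buckets


def build_ordered_index(keys):
    # Ordered index: (key, position) pairs kept in sorted key order.
    return sorted((key, position) for position, key in enumerate(keys))


def hash_lookup(buckets, keys, wanted):
    # Only the one bucket chosen by the hash function has to be inspected.
    return [p for p in buckets.get(bucket_of(wanted), []) if keys[p] == wanted]
```

An ordered index supports range scans because neighbouring keys sit next to each other, while the hash index only answers exact-match lookups. How evenly the buckets fill depends on the hash function and the keys.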

Key Concepts
Access Types - types of access that are supported efficiently
Access Time - time it takes to access a particular data item
Insertion Time - time it takes to insert a data item
Deletion Time - time it takes to delete a data item
Space Overhead - additional space occupied by an index structure

There are two kinds of ordered indices

Dense Index - An index record appears for every search-key value in the file. The index record contains the search-key value and a pointer to the first data record with that value. The rest of the records with the same search-key value are stored sequentially after the first record.

Sparse Index - An index record appears for only some of the search-key values, so there are fewer index records. Each index record contains a search-key value and a pointer to the first data record with that value, as with the dense index.

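Dense Index

Index records (one per search-key value, each pointing to the first data record with that value):
Hotel Sofitel
Hilton
Westin
Marriot
The Ritz

Data records:
234   Hotel Sofitel   A-212
321   Hilton          B-321
389   Hilton          C-002
396   Hilton          A-322
112   Westin          C-034
253   Westin          B-219
501   Marriot         B-069
532   Marriot         C-304
221   The Ritz        A-007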

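Sparse Index

Index records (only some search-key values, each pointing to the first data record with that value):
Hotel Sofitel
Westin
The Ritz

Data records:
234   Hotel Sofitel   A-212
321   Hilton          B-321
389   Hilton          C-002
396   Hilton          A-322
112   Westin          C-034
253   Westin          B-219
501   Marriot         B-069
532   Marriot         C-304
221   The Ritz        A-007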
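The two figures can be rebuilt with a short sketch. The records are copied from the slides; the tuple layout, the function names, and the `stride` parameter are assumptions for illustration. The slides do not say how the sparse entries were chosen, so taking every second dense entry is just one rule that happens to give the same three entries.

```python
# Data file from the slides, in the order the slides store it.
RECORDS = [
    (234, "Hotel Sofitel", "A-212"),
    (321, "Hilton", "B-321"),
    (389, "Hilton", "C-002"),
    (396, "Hilton", "A-322"),
    (112, "Westin", "C-034"),
    (253, "Westin", "B-219"),
    (501, "Marriot", "B-069"),
    (532, "Marriot", "C-304"),
    (221, "The Ritz", "A-007"),
]


def build_dense_index(records):
    """One entry per distinct search-key value, pointing at its first record."""
    index = []
    for position, (_, hotel, _) in enumerate(records):
        # Records with equal keys are adjacent, so a new key starts a new entry.
        if not index or index[-1][0] != hotel:
            index.append((hotel, position))
    return index


def build_sparse_index(records, stride=2):
    """Keep every `stride`-th dense entry, always starting with the first."""
    return build_dense_index(records)[::stride]
```

With this data the dense index has five entries and the sparse index has three, so the sparse index takes less space but needs a sequential scan to reach keys it does not list.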

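Suppose we want to find the Marriot #532...

Marriot has no entry of its own in the sparse index. In the file's ordering it falls after Westin and before The Ritz, so the search follows the Westin pointer and scans the data records sequentially until it reaches Marriot 532.

Sparse index records:
Hotel Sofitel
Westin
The Ritz

Data records:
234   Hotel Sofitel   A-212
321   Hilton          B-321
389   Hilton          C-002
396   Hilton          A-322
112   Westin          C-034
253   Westin          B-219
501   Marriot         B-069
532   Marriot         C-304
221   The Ritz        A-007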
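The lookup for Marriot #532 can be sketched as follows. The slide's file is not in alphabetical order, so this sketch assumes the file's own sequence of hotel names defines the key order (`KEY_ORDER`). That ordering, the function name, and the tuple layout are illustrative assumptions rather than anything stated in the slides.

```python
RECORDS = [
    (234, "Hotel Sofitel", "A-212"),
    (321, "Hilton", "B-321"),
    (389, "Hilton", "C-002"),
    (396, "Hilton", "A-322"),
    (112, "Westin", "C-034"),
    (253, "Westin", "B-219"),
    (501, "Marriot", "B-069"),
    (532, "Marriot", "C-304"),
    (221, "The Ritz", "A-007"),
]

# Sparse index from the slide: (search key, position of first record).
SPARSE_INDEX = [("Hotel Sofitel", 0), ("Westin", 4), ("The Ritz", 8)]

# Assumed key order: the order in which hotel names appear in the file.
KEY_ORDER = {
    name: rank
    for rank, name in enumerate(
        ["Hotel Sofitel", "Hilton", "Westin", "Marriot", "The Ritz"]
    )
}


def sparse_lookup(hotel, number):
    target = KEY_ORDER[hotel]
    # Step 1: take the last index entry whose key does not come after the target.
    start = 0
    for key, position in SPARSE_INDEX:
        if KEY_ORDER[key] > target:
            break
        start = position
    # Step 2: scan the file sequentially from that record.
    for position in range(start, len(RECORDS)):
        rec_number, rec_hotel, _ = RECORDS[position]
        if KEY_ORDER[rec_hotel] > target:
            break  # passed every record with the wanted key
        if rec_hotel == hotel and rec_number == number:
            return position, RECORDS[position]
    return None
```

For `sparse_lookup("Marriot", 532)` the scan starts at the first Westin record and stops at the fourth record it reads.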

Efficiency Issues
Even if we use a sparse index, the index itself may become too large for efficient processing.

If an index is small enough to be kept in main memory, the search time is low.

If the index is so large that it must be kept on disk, a search may require several disk block reads.
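A rough calculation shows why "several disk block reads" can add up. The sketch below assumes the index is a sorted file searched by binary search, one probe per block. The entry counts and block capacity in the usage note are made-up numbers, not figures from the slides.

```python
def index_blocks(num_entries: int, entries_per_block: int) -> int:
    # Number of disk blocks needed to hold the index (rounded up).
    return (num_entries + entries_per_block - 1) // entries_per_block


def worst_case_binary_search_reads(num_blocks: int) -> int:
    # Binary search over n sorted blocks probes at most floor(log2(n)) + 1
    # of them; int.bit_length() returns exactly that for n >= 1.
    return num_blocks.bit_length() if num_blocks > 0 else 0
```

Under these assumptions, an index of 1,000,000 entries at 100 entries per block occupies 10,000 blocks, and a binary search may read up to 14 of them. If the whole index fits in memory, those reads disappear.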

How to deal ...

With a large index, we should construct a sparse index on the primary index.

Sparse index on the primary index:
Hotel Sofitel
Marriot
Marriot

Primary index:
Hotel Sofitel
Hilton
Westin
Marriot
The Ritz

Data records:
234   Hotel Sofitel   A-212
321   Hilton          B-321
389   Hilton          C-002
396   Hilton          A-322
112   Westin          C-034
253   Westin          B-219
501   Marriot         B-069
532   Marriot         C-304
221   The Ritz        A-007
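The idea of a sparse index on top of the primary index can be sketched with plain sorted integer keys, which keeps the comparison logic simple. The `fanout` value, the function names, and the sample keys in the usage note are assumptions for illustration; the outer entries are computed by the sketch and are not the ones drawn on the slide.

```python
from bisect import bisect_right


def build_levels(sorted_keys, fanout=4):
    """Inner index: every key with its position, split into blocks of `fanout`.
    Outer index: the first key of each inner block."""
    inner = [(key, pos) for pos, key in enumerate(sorted_keys)]
    blocks = [inner[i:i + fanout] for i in range(0, len(inner), fanout)]
    outer = [block[0][0] for block in blocks]
    return outer, blocks


def lookup(outer, blocks, key):
    # Step 1: the small outer index picks the one inner block that could hold the key.
    b = bisect_right(outer, key) - 1
    if b < 0:
        return None  # key is smaller than every indexed key
    # Step 2: only that inner block is searched.
    for k, pos in blocks[b]:
        if k == key:
            return pos
    return None
```

For the keys 10, 20, ..., 100 with `fanout=4`, the outer index holds just 10, 50 and 90, and a lookup touches one inner block instead of the whole inner index. Repeating the same step on the outer index whenever it grows too large gives more levels.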

Is this looking familiar?

Remember B+ trees
A B+ tree is said to be of order m, a number of the designer's choosing.
Each internal node other than the root has between ⌈m/2⌉ and m children.
All data is stored at the leaf level.
All leaves are at the same depth.
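The structural rules can be checked mechanically. The sketch below assumes the common textbook convention that order m is the maximum number of children, with internal nodes other than the root holding at least ⌈m/2⌉ children and the root at least 2. The `Node` class and function names are hypothetical, and key ordering inside nodes is not checked.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means leaf


def leaf_depths(node, depth=0):
    # Collect the set of depths at which leaves occur.
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def check_shape(root, m):
    # Rule: all leaves are at the same depth.
    if len(leaf_depths(root)) != 1:
        return False
    # Rule (assumed convention): child counts stay within [ceil(m/2), m];
    # the root, if it is not a leaf, may have as few as 2 children.
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        if node.children:
            low = 2 if is_root else ceil(m / 2)
            if not (low <= len(node.children) <= m):
                return False
            stack.extend((child, False) for child in node.children)
    return True
```

A root with three leaf children passes for m = 4 and fails for m = 2, and any tree whose leaves sit at different depths fails regardless of m.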

Example?
