CS 345: Topics in Data Warehousing: Thursday, October 21, 2004

The document summarizes topics related to indexes in databases including: B-tree and hash indexes; clustered vs non-clustered indexes; covering indexes that allow index-only query plans; multi-column indexes; and using indexes to improve performance of queries in data warehouses.

Review of Tuesday’s Class
• Database System Architecture
– Memory management
– Secondary storage (disk)
– Query planning process
• Joins
– Nested Loop Join
– Merge Join
– Hash Join
• Grouping
– Sort vs. Hash
Outline of Today’s Class
• Indexes
– B-Tree and Hash Indexes
– Clustered vs. Non-Clustered
– Covering Indexes
• Using Indexes in Query Plans
• Bitmap Indexes
– Index intersection plans
– Bitmap compression
Indexes
• Provide efficient access to relevant records
– Based on values of particular attribute(s)
• Same idea as index in back of a book
• “fact tables 16, 17, 49”
– Information about fact tables on pages 16, 17, and 49
– No information about fact tables on other pages
– Without an index, we’d have to look through the whole book page by page
Typical Index Structure
• Indexes organized based on some search key
– Column (or set of columns) whose values are used to access the index
– Organization can be sorting or hashing
• Index is built for some relation
– One index entry per record in the relation
• Index consists of <Value, RID> pairs
– Value = value of the search key for this record
– RID = record identifier
• Tells the DBMS where the record is stored
• Usually (page number, offset in page)
Sorted Index
• Index entries usually much smaller than records
– Record has many attributes besides search key
• Build search tree on top of index entries
– Allows particular value to be located quickly

[Figure: search tree node with keys 2 and 5 on top of the sorted index entries 2, 4, 4, 5, 7, 8]
B-Tree Index
• By far the most common type of index
• Sorted index with search tree
• Good for point queries and range queries
– Point query: A = 5
– Range query: A BETWEEN 5 AND 10
• Search tree nodes are page-sized
– Contain <Value, Pointer> pairs
– Each Pointer is to a node of the level below
• Trade-off in choosing index page sizes
– Larger pages → fewer search tree levels → fewer page reads
– Larger pages → each page read takes longer
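A sorted index can be sketched in a few lines of Python. This is only an illustration, not how a DBMS stores a B-Tree: the search tree is replaced by binary search from the standard `bisect` module, the RID is assumed to be a list position rather than (page number, offset), and the `records` data is hypothetical (its key values match the 2, 4, 4, 5, 7, 8 entries used in the figures).

```python
from bisect import bisect_left, bisect_right

# Hypothetical relation: each record is (A, payload); its RID is its list position.
records = [(7, "g"), (2, "a"), (5, "d"), (4, "b"), (8, "h"), (4, "c")]

# Sorted index: <Value, RID> pairs ordered by the search key A.
index = sorted((a, rid) for rid, (a, _) in enumerate(records))
keys = [value for value, _ in index]

def point_query(value):
    """Return the RIDs of records with A = value."""
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    return [rid for _, rid in index[lo:hi]]

def range_query(low, high):
    """Return the RIDs of records with A BETWEEN low AND high (inclusive)."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [rid for _, rid in index[lo:hi]]

print(point_query(4))
print(range_query(5, 10))
```

Both queries touch only a contiguous slice of the index, which is why a sorted organization serves point and range queries alike.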
Hash Indexes
• Useful for point queries
– Slightly better performance than B-Trees
– Not useful for range queries
• Less widely supported than B-Trees
Alternate B-Tree Organization
• Many records with the same search key cause redundancy
– <Stanford,RID1>, <Stanford,RID2>, <Stanford,RID3>, <Stanford,RID4>
• Can store RID-lists instead
– <Stanford, (RID1,RID2,RID3,RID4)>
– Each value occurs once in the index
– Index entry is <Value,RID-list> instead of <Value,RID>
– Saves space when search key has many repeated values
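The switch from <Value,RID> pairs to <Value,RID-list> entries amounts to grouping by value. A minimal sketch, assuming integer RIDs and a hypothetical second value ("Berkeley") added only to show that distinct values get separate entries:

```python
from collections import defaultdict

# <Value, RID> pairs; "Berkeley" and the RID numbers are made up for illustration.
entries = [("Stanford", 1), ("Stanford", 2), ("Berkeley", 5),
           ("Stanford", 3), ("Stanford", 4)]

# Group into <Value, RID-list> entries: each value is stored once.
rid_lists = defaultdict(list)
for value, rid in entries:
    rid_lists[value].append(rid)

for value in sorted(rid_lists):
    print(value, rid_lists[value])
```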
Clustered Indexes
• An index is clustered (or “clustering”) if records in the relation are organized based on index search key
• Clustered indexes are good because:
– Records satisfying a range query are packed onto a small number of consecutive pages
• In unclustered indexes, by contrast:
– Records satisfying a range query are spread across a large number of random pages
– Commingled with other records that do not satisfy the query
• Only one clustered index allowed per relation
– A relation can’t be simultaneously sorted by 2 different attributes
– (Unless there are multiple copies of the relation)
Clustered vs. Unclustered
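[Figure, clustered panel: search tree (2, 5) over index entries 2, 4, 4, 5, 7, 8; data pages hold (2, 4, 4) and (5, 7, 8), so matching records are fetched with sequential reads]
[Figure, unclustered panel: same search tree and index entries; data pages hold (4, 7, 5) and (2, 4, 8), so matching records are fetched with random reads]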
Comparing Access Plans
• Consider query “SELECT * FROM R WHERE A=5”
• Three query plans:
– Scan relation R
• Sequential read of all pages in R
• Regardless of how many tuples have A=5
– Use clustered index on A
• Sequential read of relevant pages in R
• Num. relevant pages = (# of tuples with A=5) / (# of tuples per page)
• Plus overhead of accessing index pages
– Use unclustered index on A
• Random read of relevant pages in R
• Num. relevant pages ≈ (# of tuples with A=5), since each matching tuple may sit on a different page
– Less if A is highly correlated with sort order of relation
• Plus overhead of accessing index pages
Comparing Access Plans
• Clustered index is always best
– Unless all tuples are being returned (then use scan)
– But clustered index may not be available
• Unclustered index beats scan when fraction of tuples returned is small
– Depends on these factors:
• % of tuples being returned
• Cost ratio of random I/O vs. sequential I/O
• # of tuples per page
– Rule of thumb: if the query returns >10% of rows, a scan is almost certainly faster
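The page-count formulas for the three plans can be turned into a rough cost model. The sketch below is a back-of-the-envelope illustration under stated assumptions: a random read is taken to cost 10 times a sequential read (a hypothetical ratio), index page overhead is ignored, and the table size and tuples-per-page figures are invented.

```python
import math

def scan_cost(total_pages, seq_cost=1.0):
    """Sequential read of every page in R, regardless of how many tuples match."""
    return total_pages * seq_cost

def clustered_cost(matching, tuples_per_page, seq_cost=1.0):
    """Sequential read of the consecutive pages that hold the matching tuples."""
    return math.ceil(matching / tuples_per_page) * seq_cost

def unclustered_cost(matching, random_cost=10.0):
    """Worst case: one random page read per matching tuple."""
    return matching * random_cost

def break_even_fraction(tuples_per_page, random_cost=10.0, seq_cost=1.0):
    """Fraction of tuples returned at which unclustered index cost equals scan cost."""
    return seq_cost / (random_cost * tuples_per_page)

# Hypothetical table: 1,000,000 tuples, 100 tuples per page.
total_tuples, tuples_per_page = 1_000_000, 100
total_pages = total_tuples // tuples_per_page

for matching in (500, 100_000):  # 0.05% and 10% of the tuples
    print(matching,
          scan_cost(total_pages),
          clustered_cost(matching, tuples_per_page),
          unclustered_cost(matching))

print(break_even_fraction(tuples_per_page))
```

Under these particular assumptions the unclustered index stops paying off well below 10% of rows; the exact crossover moves with the three factors listed above.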
Covering Indexes
• Example using index in a book:
– “What does this book say about fact tables?”
• Look up “fact tables” in the index
• Turn to each page that is listed
• Read that page and see what it says
– “Which of these topics are discussed in this book: fact tables, bridge tables, B-trees?”
• Look up the three topics in the index
• See how many of them appear
• Don’t need to read any of the actual book
Covering Indexes
• Sometimes an index has all the data you need
– Allows index-only query plan
– Not necessary to access the actual tuples
– Such an index is called a covering index
• SELECT COUNT(*) FROM R WHERE A=5
– Use index on A
– Count number of <5,RID> entries
– No need to look up records referenced by RIDs
• An index is a “thin” copy of a relation
– Not all columns from the relation are included
– The index is sorted in a particular way
Multi-Column Indexes
• Multi-column indexes are very useful in data warehousing
– We say such an index has a composite key
• Example: B-Tree index on (A,B)
– Search key is (A,B) combination
– Index entries sorted by A value
– Entries with same A value are sorted by B value
– Called a lexicographic sort
• SELECT SUM(B) FROM R WHERE A=5
– Our (A,B) index covers this query!
• Coverage vs. size trade-off
– More attributes in search key → index covers more queries
– More attributes in search key → index takes up more disk space
Fact and Dimension Indexes
• Dimension table index
– Narrow version of table with only frequently-queried attributes
– Always include dimension key!
– Improve performance on large dimension tables
• Fact table index
– Narrow version of fact table that omits certain dimensions / measures
– Useful for queries that exclusively reference indexed dimensions / measures
Order of Composite Key
• Index on (A,B) ≠ Index on (B,A)
– Can efficiently search based on leading terms
– No efficient search for trailing terms
• SELECT SUM(B) FROM R WHERE A=5
– Index on (A,B) is sorted by A
• Search for records where A=5
• Scan only the relevant portion of the index
– Index on (B,A) is sorted by B
• Records with A=5 are scattered throughout index
• Need to scan the entire index
• Or else do one search for each distinct value of B
– Oracle’s “index skip scans”
– Index on (A,B) is better for this query
– Either index is much faster than accessing relation!
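The effect of key order can be seen by answering SELECT SUM(B) FROM R WHERE A=5 from each composite index without touching R. This is a sketch only: the index is a sorted Python list rather than a B-Tree, the contents of `R` are hypothetical, and the second return value simply counts how many index entries each plan examines.

```python
import math
from bisect import bisect_left, bisect_right

# Hypothetical relation R(A, B, C).
R = [(5, 10, "x"), (3, 7, "y"), (5, 2, "z"), (8, 10, "w"), (5, 7, "v")]

# Composite indexes hold only the key columns, in lexicographic order.
index_ab = sorted((a, b) for a, b, _ in R)
index_ba = sorted((b, a) for a, b, _ in R)

def sum_b_using_ab(a_value):
    """Search on the leading column A, then read only the matching slice."""
    lo = bisect_left(index_ab, (a_value,))            # first entry with A >= a_value
    hi = bisect_right(index_ab, (a_value, math.inf))  # just past the last A == a_value
    entries = index_ab[lo:hi]
    return sum(b for _, b in entries), len(entries)

def sum_b_using_ba(a_value):
    """A is the trailing column, so every index entry must be examined."""
    total = sum(b for b, a in index_ba if a == a_value)
    return total, len(index_ba)

print(sum_b_using_ab(5))
print(sum_b_using_ba(5))
```

Both plans are index-only, since B is available in the index entries; they differ only in how much of the index is read.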
Index Summary
• Indexes are useful in two ways:
– Indexes allow efficient search on some attributes due to the way they are organized
– Index-only plans use small indexes in place of large relations
• For OLAP queries, the second use is generally more important
– Search via non-covering, non-clustered index leads to random I/O
– Analysis queries typically aggregate lots of tuples
– Doing one random I/O per tuple can be costly
Example
• Sales(Date, Store, Product, Promotion, TransactionId, Quantity, DollarAmt)
– Index on (Date, Store, Quantity, DollarAmt)
– Index on (Date, Promotion, Product, Quantity, DollarAmt)
– Index on (Product, Date, Store, Quantity, DollarAmt)
• Store
– Index on (Name, District, StoreKey)
• Product
– Index on (Name, Brand, Dept, ProductKey)
– Index on (Brand, Dept, ProductKey)
Example Query
SELECT Brand, SUM(DollarAmt)
FROM Sales, Product, Store
WHERE Sales.ProductKey = Product.ProductKey
AND Sales.StoreKey = Store.StoreKey
AND Store.Name = 'Crystal Springs Safeway'
GROUP BY Brand
• Non-key attributes needed from each table:
– Product: Brand
– Sales: DollarAmt
– Store: Name
Selecting Indexes
• Sales(Date, Store, Product, Promotion, TransactionId, Quantity, DollarAmt)
– Index on (Date, Store, Quantity, DollarAmt): lacks Product
– Index on (Date, Promotion, Product, Quantity, DollarAmt): lacks Store
– Index on (Product, Date, Store, Quantity, DollarAmt)
• Store
– Index on (Name, District, StoreKey)
• Product
– Index on (Name, Brand, Dept, ProductKey): wider than needed
– Index on (Brand, Dept, ProductKey)
Query Plan
• Search Store(Name, District, StoreKey) index for Name=‘Crystal Springs Safeway’
• Nested Loop Join
– Outer = Sales(Product,Date,Store,Quantity,DollarAmt) index
– Inner = Qualifying Store index entries
– Output preserves sort order of Sales index
• Sort Product(Brand,Dept,ProductKey) index entries by ProductKey
• Merge Join
– Result of Nested Loop Join (already sorted by ProductKey)
– Product(Brand,Dept,ProductKey)
• Hash resulting tuples on Brand (for GROUP BY)
– Compute SUM(DollarAmt) for each Brand
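The plan steps above can be walked through on toy data. Everything in this sketch is hypothetical: the index contents, key values, dates, and the second store name are invented, the indexes are plain sorted lists, and the Store index search is a simple filter standing in for a B-Tree lookup. It only shows how the operators chain together using index entries alone.

```python
from collections import defaultdict

# Hypothetical index contents, each sorted on its composite key.
store_index = sorted([            # (Name, District, StoreKey)
    ("Crystal Springs Safeway", "Peninsula", 1),
    ("Menlo Park Safeway", "Peninsula", 2),
])
sales_index = sorted([            # (ProductKey, Date, StoreKey, Quantity, DollarAmt)
    (10, "2004-10-01", 1, 2, 6.0),
    (10, "2004-10-02", 2, 1, 3.0),
    (11, "2004-10-01", 1, 1, 4.5),
    (12, "2004-10-03", 1, 3, 9.0),
])
product_index = sorted([          # (Brand, Dept, ProductKey)
    ("Acme", "Grocery", 10), ("Acme", "Grocery", 12), ("Bolt", "Dairy", 11),
])

# Step 1: search the Store index for the store name.
store_keys = [key for name, _, key in store_index
              if name == "Crystal Springs Safeway"]

# Step 2: nested loop join, outer = Sales index, inner = qualifying Store entries.
# The output keeps the Sales index order, i.e. it is sorted by ProductKey.
joined = [(pk, amt) for pk, _, sk, _, amt in sales_index
          for key in store_keys if sk == key]

# Step 3: sort Product index entries by ProductKey.
products = sorted((pk, brand) for brand, _, pk in product_index)

# Step 4: merge join on ProductKey (unique on the Product side).
merged, i, j = [], 0, 0
while i < len(joined) and j < len(products):
    if joined[i][0] < products[j][0]:
        i += 1
    elif joined[i][0] > products[j][0]:
        j += 1
    else:
        merged.append((products[j][1], joined[i][1]))
        i += 1  # advance only the Sales side; more sales may share this product

# Step 5: hash on Brand and compute SUM(DollarAmt) per group.
totals = defaultdict(float)
for brand, amt in merged:
    totals[brand] += amt

print(dict(totals))
```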
Index Intersection
• Suppose we have table R(A,B,C,D,E)
– B-Tree index on A
– B-Tree index on B
– No multi-column indexes
• SELECT COUNT(*) FROM R WHERE A=5 AND B < 10
• Use an index intersection plan
– Search A index for A=5
• Index entries have <A,RID>
• Think of the index as a 2-column table with schema I1(A,RID)
– Search B index for B<10
• Index entries have <B,RID>
• Think of the index as a 2-column table with schema I2(B,RID)
– Join qualifying index entries on I1.RID = I2.RID
Index Intersection
• Index intersection works well for conjunction of multiple, moderately selective filters
– SELECT SUM(C) FROM R WHERE A=5 AND B<10
– 5% of rows have A=5
– 5% of rows have B<10
– 5% * 5% = 0.25% of rows have A=5 AND B<10 (assuming A and B are independent)
– Retrieving rows matching A index alone, or B index alone, would be slow
– Only a few rows match both indexes
• Intersect indexes and retrieve rows that match both
– Overhead of joining indexes often small relative to cost of retrieving matching records from relation
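An index intersection plan for SELECT SUM(C) FROM R WHERE A=5 AND B<10 can be sketched as a join of the two indexes on RID. The data below is hypothetical, RIDs are assumed to be list positions, and the join is done as a set intersection, which is one of several ways a DBMS might join the qualifying entries.

```python
# Hypothetical relation R(A, B, C); the RID is the list position.
R = [(5, 3, 100), (5, 20, 200), (1, 4, 300), (5, 9, 400), (2, 50, 500)]

# Single-column indexes, viewed as two-column tables I1(A, RID) and I2(B, RID).
index_a = sorted((a, rid) for rid, (a, _, _) in enumerate(R))
index_b = sorted((b, rid) for rid, (_, b, _) in enumerate(R))

# Search each index separately for its own predicate.
rids_a = {rid for a, rid in index_a if a == 5}
rids_b = {rid for b, rid in index_b if b < 10}

# Join on I1.RID = I2.RID, then fetch only the rows that satisfy both filters.
matching = sorted(rids_a & rids_b)
total = sum(R[rid][2] for rid in matching)

print(matching, total)
```

Only the rows in `matching` are read from the relation; the rows that satisfy just one predicate are never fetched.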
Bitmap Indexes
• Earlier idea: use RID-lists in place of RIDs
– Save space when attribute values repeat
• Bitmap indexes take this one step further
– Use Bitmap in place of RID-list
– Each RID in the entire relation is represented by 1 bit
• 1 = RID is present in RID-list
• 0 = RID is absent from RID-list
– Bitmaps are usually compressed
• E.g., using run-length encoding
Bitmap Index Example
ID  Name   Sex
1   Fred   M
2   Jill   F
3   Joe    M
4   Fran   F
5   Ellen  F
6   Kate   F
7   Matt   M
8   Bob    M
• Bitmap index looks like this:
– <M,10100011>
– <F,01011100>
Why Bitmap Indexes?
• Index intersection plans with bitmap indexes are fast
– Just perform bitwise AND!
– Index intersection with B-Trees requires a join
• SELECT COUNT(*) FROM R WHERE A=5 AND B < 10
– Bitmap index on A
– Bitmap index on B
– OR together bitmaps for B values that are < 10
– AND the result with the bitmap for A=5
– Can be computed very quickly
• Assuming not too many distinct B values that are < 10
• Save space for low-cardinality attributes
– As compared to a B-Tree or Hash index
– Particularly if compression is used
• Most useful for attributes with low or medium cardinality
– Not good for something like LastName
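The bitwise plan for SELECT COUNT(*) FROM R WHERE A=5 AND B < 10 can be sketched with Python integers standing in for bitmaps. The rows are hypothetical, bit i of each integer represents row i (so the bit order is the reverse of the left-to-right strings shown earlier), and no compression is applied.

```python
# Hypothetical rows of R, showing only columns (A, B).
rows = [(5, 3), (5, 12), (1, 7), (5, 9), (2, 3), (5, 10)]

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 when row i has that value."""
    bitmaps = {}
    for i, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

bitmap_a = build_bitmap_index([a for a, _ in rows])
bitmap_b = build_bitmap_index([b for _, b in rows])

# OR together the bitmaps for every B value that is < 10.
b_lt_10 = 0
for value, bits in bitmap_b.items():
    if value < 10:
        b_lt_10 |= bits

# AND the result with the bitmap for A = 5, then count the 1 bits.
result = bitmap_a.get(5, 0) & b_lt_10
count = bin(result).count("1")

print(bin(result), count)
```

The OR loop runs once per distinct B value below 10, which is why the plan is fast only when there are not too many such values.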
Compressing Bitmaps
• Consider a bitmap index on an attribute with 20 distinct values
• Each row has 1 value for that attribute
• 20 different bitmaps
– ith bit is set to 1 in one bitmap
– ith bit is set to 0 in the other 19 bitmaps
• Bitmaps consist mostly of zeros (95% of bits are zero)
– Good opportunity for compression
• Compression via run length encoding
– Just record number of zeros between adjacent ones
– 00000001000010000000000001100000
– Store this as “7,4,12,0,5”
• Compression Pros and Cons
– Reduce storage space → reduce number of I/Os required
– Need to compress/uncompress → increase CPU work required
– Each compression scheme negotiates this trade-off differently
– Operate directly on compressed bitmap → improved performance
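The run-length scheme from the example can be sketched as follows. This assumes bitmaps are given as strings of "0" and "1" and that the final number records the zeros after the last one, which matches the "7,4,12,0,5" encoding above; production schemes differ in detail.

```python
def rle_encode(bits):
    """Record the number of zeros before each 1, plus the trailing run of zeros."""
    runs, zeros = [], 0
    for bit in bits:
        if bit == "0":
            zeros += 1
        else:
            runs.append(zeros)
            zeros = 0
    runs.append(zeros)  # zeros after the last 1 (possibly none)
    return runs

def rle_decode(runs):
    """Rebuild the bit string: runs of zeros separated by single 1 bits."""
    return "1".join("0" * n for n in runs)

bitmap = "00000001000010000000000001100000"
encoded = rle_encode(bitmap)
print(encoded)
print(rle_decode(encoded) == bitmap)
```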
