Chapter 3 Topic - 4
DATA WAREHOUSING AND DATA MINING
P. Venkateshwarlu
Faculty of Computer Sci
TOPIC: Data Warehouse Implementation
INTRODUCTION
Data warehouses contain huge volumes of data.
OLAP servers demand that decision support queries be answered on the order of seconds.
Therefore, it is crucial for data warehouse systems to support efficient implementation with
› highly efficient cube computation techniques
› access methods
› query processing techniques.
Efficient Computation of Data Cubes
Computation of Data Cubes ..
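For a cube with the three dimensions city, item, and year (and no concept hierarchies), the total number of cuboids is $2^3 = 8$:
› {(city, item, year),
› (city, item), (city, year), (item, year),
› (city), (item), (year),
› ()}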
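As a quick check of the count, the sketch below enumerates every cuboid of the (city, item, year) cube as a subset of the dimension set, assuming no concept hierarchies (each dimension is either kept or aggregated to “all”). The function name `all_cuboids` is illustrative, not taken from the notes.

```python
from itertools import combinations

def all_cuboids(dimensions):
    """Return every cuboid (group-by) of a cube without concept hierarchies.

    Each cuboid is a subset of the dimensions: the full tuple is the base
    cuboid and the empty tuple () is the apex cuboid.
    """
    cuboids = []
    # Walk from the base cuboid (all dimensions) down to the apex cuboid.
    for k in range(len(dimensions), -1, -1):
        cuboids.extend(combinations(dimensions, k))
    return cuboids

cuboids = all_cuboids(("city", "item", "year"))
for c in cuboids:
    print(c)
print(len(cuboids))  # 8
```

The base cuboid is printed first and the apex cuboid () last; with n dimensions the list always has 2^n entries.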
Lattice of a cube
Computation of Cubes
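On-line analytical processing may need to access different cuboids for different queries.
For an n-dimensional cube in which dimension i has $L_i$ hierarchy levels (not counting the virtual top level “all”), the total number of cuboids is
$$T = \prod_{i=1}^{n} (L_i + 1)$$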
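The cuboid-count formula T = ∏(L_i + 1), where L_i is the number of levels of dimension i excluding “all”, can be evaluated directly. In this minimal sketch the argument name `levels` is hypothetical. The ten-dimension, four-level case is a made-up illustration of how quickly T grows, not a figure from the notes.

```python
from math import prod

def total_cuboids(levels):
    """Total number of cuboids T = prod(L_i + 1).

    levels[i] is L_i, the number of hierarchy levels of dimension i,
    excluding the virtual top level "all" (hence the + 1).
    """
    return prod(L + 1 for L in levels)

print(total_cuboids([1, 1, 1]))  # three flat dimensions: 2**3 = 8
print(total_cuboids([4] * 10))   # hypothetical 10 dims, 4 levels each: 9765625
```

With every L_i = 1 the formula reduces to 2^n, which matches the count for the (city, item, year) cube.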
Materialization
No materialization
› Do not precompute any of the “nonbase” cuboids
› Leads to expensive computation during data analysis
Full materialization
› Precompute all cuboids
› Requires a huge amount of memory
Partial materialization (selected computation)
› Only some of the possible cuboids are precomputed
› Which cuboids should we precompute and which not?
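To make full materialization concrete, here is a naive sketch that assumes a tiny made-up fact table with a SUM(sales) measure. Every cuboid is computed directly from the base rows, with one scan per cuboid. This is deliberately simple and is not an efficient cube computation method. It only shows what “precompute all cuboids” produces and why storage grows with the number of cuboids.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, item, year, sales). Illustrative data only.
FACTS = [
    ("Vancouver", "TV", 2000, 10),
    ("Vancouver", "PC", 2000, 5),
    ("Toronto", "TV", 2001, 7),
    ("Toronto", "TV", 2000, 3),
]
DIMS = ("city", "item", "year")

def full_cube(facts, dims):
    """Full materialization: precompute SUM(sales) for every cuboid.

    Returns {cuboid (tuple of dimension names): {cell key: total sales}}.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for positions in combinations(range(len(dims)), k):
            cells = defaultdict(int)
            for row in facts:                      # one scan per cuboid
                key = tuple(row[i] for i in positions)
                cells[key] += row[-1]              # last field is the measure
            cube[tuple(dims[i] for i in positions)] = dict(cells)
    return cube

cube = full_cube(FACTS, DIMS)
print(len(cube))          # 8 cuboids
print(cube[()])           # apex cuboid: {(): 25}
print(cube[("city",)])    # {('Vancouver',): 15, ('Toronto',): 10}
```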
Partial materialization: Selection of cuboids
Take into account:
› the queries, their frequencies, and their access costs
› workload characteristics, costs of incremental updates, and storage requirements
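One possible way to turn these criteria into a selection procedure is a greedy heuristic under a storage budget. The sketch below is an illustration, not a method prescribed by these notes. It makes several assumptions:
› the cuboid sizes and query frequencies are made up
› a query costs as many cells as the smallest materialized cuboid that contains all of its dimensions (a linear cost model)
› there are no concept hierarchies
› update costs are ignored
At each step the heuristic adds the cuboid that gives the largest reduction in workload cost per cell of storage. The names `SIZES`, `WORKLOAD`, and `select_cuboids` are hypothetical.

```python
# Hypothetical cuboid sizes (number of cells) for the city/item/year cube,
# assuming 1,000 cities, 10,000 items and 10 years.
SIZES = {
    frozenset({"city", "item", "year"}): 6_000_000,   # base cuboid
    frozenset({"city", "item"}): 600_000,
    frozenset({"city", "year"}): 8_000,
    frozenset({"item", "year"}): 90_000,
    frozenset({"city"}): 1_000,
    frozenset({"item"}): 10_000,
    frozenset({"year"}): 10,
    frozenset(): 1,                                    # apex cuboid
}
BASE = frozenset({"city", "item", "year"})

# Hypothetical workload: (dimensions a query groups by, query frequency).
WORKLOAD = [
    (frozenset({"city", "year"}), 50),
    (frozenset({"item"}), 30),
    (frozenset({"year"}), 20),
]

def workload_cost(materialized):
    """Linear cost model: each query scans the smallest materialized cuboid
    whose dimensions include all of the query's dimensions."""
    return sum(
        freq * min(SIZES[c] for c in materialized if dims <= c)
        for dims, freq in WORKLOAD
    )

def select_cuboids(budget):
    """Greedily add the cuboid with the best cost reduction per cell of
    storage until nothing useful fits. The base cuboid is always kept and
    is not charged against the budget."""
    chosen, used = {BASE}, 0
    while True:
        current = workload_cost(chosen)
        best, best_ratio = None, 0.0
        for cuboid, size in SIZES.items():
            if cuboid in chosen or used + size > budget:
                continue
            ratio = (current - workload_cost(chosen | {cuboid})) / size
            if ratio > best_ratio:
                best, best_ratio = cuboid, ratio
        if best is None:          # no remaining cuboid fits or helps
            return chosen
        chosen.add(best)
        used += SIZES[best]

print(sorted(sorted(c) for c in select_cuboids(100_000)))
# [['city', 'item', 'year'], ['city', 'year'], ['item'], ['year']]
```

With a budget of 100,000 cells, (year), (city, year), and (item) are added in that order. The larger (city, item) and (item, year) cuboids are left out. A real selection would also weigh incremental update costs, which this sketch ignores.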
Dimension hierarchies
› time: day < month < quarter < year
› item: item_name < brand < type
Example query: group by {brand, province_or_state} with the selection year = 2000
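This example query can be answered from any materialized cuboid that meets three conditions:
› item is kept at brand level or finer
› location is kept at province_or_state level or finer
› time is kept at year level or finer, so the selection year = 2000 can still be applied
The minimal sketch below performs that check and then picks the smallest qualifying cuboid. The time and item hierarchies come from the notes. The location hierarchy, the cuboid names A to D, and their sizes are assumptions made for illustration.

```python
# Concept hierarchies, finest level first. time and item follow the notes;
# the location hierarchy is an assumption made for this example.
HIERARCHIES = {
    "time": ["day", "month", "quarter", "year"],
    "item": ["item_name", "brand", "type"],
    "location": ["street", "city", "province_or_state", "country"],
}

def can_answer(cuboid, query):
    """True if every dimension the query groups by or filters on is kept in
    the cuboid at the same or a finer level of its hierarchy."""
    for dim, level in query.items():
        if dim not in cuboid:                  # already rolled up to "all"
            return False
        order = HIERARCHIES[dim]
        if order.index(cuboid[dim]) > order.index(level):   # too coarse
            return False
    return True

# Hypothetical materialized cuboids with made-up sizes (number of cells).
MATERIALIZED = {
    "A": ({"item": "item_name", "location": "city", "time": "year"}, 800_000),
    "B": ({"item": "brand", "location": "country", "time": "year"}, 2_000),
    "C": ({"item": "brand", "location": "province_or_state", "time": "quarter"}, 60_000),
    "D": ({"item": "item_name", "location": "province_or_state"}, 300_000),
}

# {brand, province_or_state} with year = 2000: the selection on year
# needs the time dimension at year level or finer.
QUERY = {"item": "brand", "location": "province_or_state", "time": "year"}

def best_cuboid(materialized, query):
    """Name of the smallest cuboid that can answer the query, or None
    (meaning: fall back to the base cuboid)."""
    candidates = [(size, name) for name, (cub, size) in materialized.items()
                  if can_answer(cub, query)]
    return min(candidates)[1] if candidates else None

print(best_cuboid(MATERIALIZED, QUERY))  # C
```

Cuboid B is ruled out because country is coarser than province_or_state. Cuboid D is ruled out because it has no time dimension to filter on. Both A and C qualify, and C is cheaper to scan.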
Base table
Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer
Index on Region
RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0
Index on Type
RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1
Bitmap Index
Allows quick search in data cubes
Advantageous compared to hash and tree indices
Useful for low-cardinality domains because comparison, join, and aggregation operations are reduced to bit arithmetic
› (Reduced processing time!)
Significant reduction in space and I/O, since a string of characters can be represented by a single bit
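The sketch below uses the customer table from the bitmap index example (C1 to C5 with Region and Type). It builds one bit vector per distinct value and answers selections with bitwise operators. It assumes Python integers as bit vectors, with bit 0 standing for the first record; real systems use packed, often compressed, bitmaps. The helper names are hypothetical.

```python
# Customer table from the bitmap index example: (customer, region, type).
ROWS = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]

def build_bitmap_index(rows, column):
    """One bit vector per distinct value of `column`; bit r is set when
    record r has that value (bit 0 = first record)."""
    index = {}
    for r, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << r)
    return index

def records(bits, rows):
    """Customer ids of the records whose bits are set."""
    return [rows[r][0] for r in range(len(rows)) if (bits >> r) & 1]

region = build_bitmap_index(ROWS, 1)
cust_type = build_bitmap_index(ROWS, 2)

# Region = 'Asia' AND Type = 'Dealer' is a single bitwise AND.
print(records(region["Asia"] & cust_type["Dealer"], ROWS))    # ['C3']
# Region = 'Europe' OR Region = 'America' is a bitwise OR.
print(records(region["Europe"] | region["America"], ROWS))    # ['C2', 'C4', 'C5']
# COUNT(*) WHERE Type = 'Retail' is a count of set bits.
print(bin(cust_type["Retail"]).count("1"))                    # 2
```

Each value's bit vector corresponds to one column of the index tables above. For example, the Asia column 1 0 1 0 0 becomes the integer 0b00101.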
Join indexing method
Motivation
Only calculate “interesting” cells, i.e., cells whose aggregate value is above a certain threshold
Avoid explosive growth of the cube
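The following is a simplified sketch of computing only the interesting cells. It assumes the threshold is a minimum count (`min_sup`) and uses made-up rows. It relies on the fact that a cell's count can only decrease as more dimensions are fixed, so a below-threshold partition is never refined further. This is one possible approach, not the algorithm from these notes, and the names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows over (city, item, year); each row counts once.
ROWS = [
    ("Vancouver", "TV", 2000),
    ("Vancouver", "TV", 2001),
    ("Vancouver", "PC", 2000),
    ("Toronto", "TV", 2000),
    ("Toronto", "PC", 2001),
]
ALL = "*"   # marks a dimension aggregated to "all"

def iceberg_cube(rows, n_dims, min_sup):
    """Return {cell: count} for every cell whose count is >= min_sup.

    Dimensions are fixed in increasing index order, so each cell is
    produced once. A partition already below min_sup is dropped, because
    fixing more dimensions can only shrink its count."""
    result = {}

    def expand(part, cell, start):
        result[tuple(cell)] = len(part)
        for d in range(start, n_dims):
            groups = defaultdict(list)
            for row in part:
                groups[row[d]].append(row)
            for value, sub in groups.items():
                if len(sub) >= min_sup:        # prune below-threshold cells
                    cell[d] = value
                    expand(sub, cell, d + 1)
                    cell[d] = ALL

    if len(rows) >= min_sup:
        expand(rows, [ALL] * n_dims, 0)
    return result

print(len(iceberg_cube(ROWS, 3, 2)))  # 10 cells with count >= 2
print(len(iceberg_cube(ROWS, 3, 1)))  # 24 non-empty cells in the full cube
```

Even on five rows, the threshold keeps fewer than half of the non-empty cells. The saving grows with the number of dimensions.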