
DATA WAREHOUSING AND DATA MINING

P. Venkateshwarlu
Faculty of Computer Science
[email protected]

TOPIC: Data Warehouse Implementation

INTRODUCTION

• Data warehouses contain huge volumes of data.
• OLAP servers demand that decision support queries be answered in the order of seconds.
• Therefore, it is crucial for data warehouse systems to support:
  › highly efficient cube computation techniques
  › access methods
  › query processing techniques

Efficient Computation of Data Cubes

• The core of multidimensional data analysis is the efficient computation of aggregations across many sets of dimensions.
• The compute cube operator aggregates over all subsets of the dimensions.

Computation of Data Cubes ..

• Suppose you would like to create a data cube for AllElectronics that contains the following:
  › item, city, year, and sales_in_euro
• It should answer the following queries (see the sketch below):
  › Compute the sum of sales, grouping by item and city
  › Compute the sum of sales, grouping by item
  › Compute the sum of sales, grouping by city
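
A minimal sketch of these three group-bys over a toy in-memory fact table, in Python; the sample rows and the helper name group_sum are illustrative, not part of the AllElectronics schema:

    from collections import defaultdict

    # Toy fact table: (item, city, year, sales_in_euro); values are made up.
    facts = [
        ("TV",    "Madrid", 2000, 400),
        ("TV",    "Berlin", 2000, 250),
        ("Phone", "Madrid", 2001, 120),
        ("Phone", "Berlin", 2001, 300),
    ]

    def group_sum(rows, key_fields):
        """Sum sales_in_euro grouped by the given subset of dimensions."""
        index = {"item": 0, "city": 1, "year": 2}
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[index[f]] for f in key_fields)
            totals[key] += row[3]
        return dict(totals)

    print(group_sum(facts, ("item", "city")))  # sum of sales by item and city
    print(group_sum(facts, ("item",)))         # sum of sales by item
    print(group_sum(facts, ("city",)))         # sum of sales by city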

Computation of Data Cubes ..

• The total number of cuboids is 2^3 = 8:
  › {(city, item, year),
  › (city, item), (city, year), (item, year),
  › (city), (item), (year),
  › ()}

• In the apex cuboid, (), the dimensions are not grouped at all
  › These group-bys form a lattice of cuboids for the data cube (enumerated in the sketch below)
  › The base cuboid contains all three dimensions
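
A minimal sketch that enumerates the cuboids of a cube as the power set of its dimensions; purely illustrative:

    from itertools import combinations

    dimensions = ("city", "item", "year")

    # Every subset of the dimensions is one cuboid: the full set is the
    # base cuboid and the empty tuple () is the apex cuboid.
    cuboids = [
        combo
        for r in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, r)
    ]

    for c in cuboids:
        print(c)        # 2**3 = 8 cuboids in total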

Lattice of a cube

[Figure: the lattice of the eight cuboids for the (city, item, year) cube, from the apex cuboid () down to the base cuboid (city, item, year)]
Computation of Cubes

• On-line analytical processing may need to access different cuboids for different queries
• Therefore, compute some cuboids in advance
  › Precomputation leads to fast response times
  › Most products support precomputation to some degree
Computation of Cubes

• Storage space may explode...
  › If there are no hierarchies, the total number of cuboids for an n-dimensional cube is 2^n
• But....
  › Many dimensions have hierarchies, for example time:
    ◦ day < week < month < quarter < year
• For an n-dimensional data cube, where L_i is the number of levels of dimension i excluding the top level "all" (for time, L_time = 5), the total number of cuboids that can be generated is

    T = ∏_{i=1}^{n} (L_i + 1)
Selected computation

• It is unrealistic to precompute and materialize (store) all cuboids that can be generated
• Partial materialization
  › Only some of the possible cuboids are generated
Materialization

• No materialization
  › Do not precompute any of the "nonbase" cuboids
    ◦ Leads to expensive computation during data analysis
• Full materialization
  › Precompute all cuboids
    ◦ Requires huge amounts of memory space....
• Partial materialization
  › Which cuboids should we precompute and which not?
Partial materialization - Selection of cuboids

• Take into account:
  › the queries, their frequencies, and their access costs
  › workload characteristics, the cost of incremental updates, and storage requirements
• This is the broad context of physical database design, including the generation and selection of indices
Heuristic approaches for cuboid selection

• Materialize the set of cuboids on which other frequently referenced cuboids are based
Advantage of materialized cuboids

• It is important to take advantage of materialized cuboids during query processing:
  › How to use available index structures on the materialized cuboids
  › How to transform the OLAP operations in a query onto the selected cuboids
Selection of operations

• Determine which operations should be performed on the available cuboids:
  › This involves transforming any selection, projection, roll-up, and drill-down operations in the query into corresponding SQL and/or OLAP operations
  › Determine to which materialized cuboid(s) the relevant operations should be applied
  › This requires identifying all materialized cuboids that may potentially be used to answer the query
Example

• Suppose that we define a data cube for AllElectronics of the form
    sales[time, item, location]: sum(sales_in_euro)

• Dimension hierarchies:
  › time: day < month < quarter < year
  › item: item_name < brand < type
  › location: city < province_or_state < country
Query

• {brand, province_or_state} with year = 2000

• Four materialized cuboids are available:
  1) {year, item_name, city}
  2) {year, brand, country}
  3) {year, brand, province_or_state}
  4) {item_name, province_or_state} where year = 2000

• Which should be selected to process the query?


Granularity

• Finer-granularity data cannot be generated from coarser-granularity data

• Cuboid 2 cannot be used, since country is a more general concept than province_or_state

• Cuboids 1, 3, and 4 can be used:
  › They have the same set, or a superset, of the dimensions in the query
  › The selection clause in the query can imply the selection in the cuboid
  › The abstraction levels for the item and location dimensions in these cuboids are at a finer level than brand and province_or_state (a feasibility check is sketched below)
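
A minimal sketch of this feasibility check, assuming hierarchies stored as finest-to-coarsest lists; the helper names (finer_or_equal, can_answer) are illustrative, not from the source:

    # Hierarchies ordered from finest to coarsest level.
    HIERARCHIES = {
        "item":     ["item_name", "brand", "type"],
        "location": ["city", "province_or_state", "country"],
    }

    def finer_or_equal(level_a, level_b, dimension):
        """True if level_a is the same as or finer than level_b."""
        order = HIERARCHIES[dimension]
        return order.index(level_a) <= order.index(level_b)

    def can_answer(cuboid, query):
        """A cuboid can answer the query if it carries every queried
        dimension at the same or a finer abstraction level."""
        return all(
            dim in cuboid and finer_or_equal(cuboid[dim], level, dim)
            for dim, level in query.items()
        )

    query = {"item": "brand", "location": "province_or_state"}
    cuboid1 = {"time": "year", "item": "item_name", "location": "city"}
    cuboid2 = {"time": "year", "item": "brand", "location": "country"}

    print(can_answer(cuboid1, query))  # True: item_name and city are finer
    print(can_answer(cuboid2, query))  # False: country is coarser
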
How would the costs of each cuboid compare?

• Cuboid 1 would cost the most, since both item_name and city are at a lower level than the brand and province_or_state requested by the query

• If there are not many year values associated with the items in the cube, and there are several item names for each brand, then cuboid 3 will be smaller than cuboid 4 and thus the better choice

• However, if efficient indices are available for cuboid 4 (e.g., bitmap indexes), cuboid 4 may be the better choice
Indexing OLAP Data: Bitmap Index

• An index on a particular column
• Each value in the column has a bit vector: bit operations are fast
• The length of each bit vector: # of records in the base table
• The i-th bit is set if the i-th row of the base table has that value for the indexed column
Base table              Index on Region                 Index on Type

Cust  Region   Type     RecID  Asia  Europe  America    RecID  Retail  Dealer
C1    Asia     Retail   1      1     0       0          1      1       0
C2    Europe   Dealer   2      0     1       0          2      0       1
C3    Asia     Dealer   3      1     0       0          3      0       1
C4    America  Retail   4      0     0       1          4      1       0
C5    Europe   Dealer   5      0     1       0          5      0       1

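A minimal sketch of bitmap indexing, using Python integers as bit vectors so that a two-column query reduces to one bitwise AND; the rows mirror the base table above:

    from collections import defaultdict

    rows = [
        ("C1", "Asia",    "Retail"),
        ("C2", "Europe",  "Dealer"),
        ("C3", "Asia",    "Dealer"),
        ("C4", "America", "Retail"),
        ("C5", "Europe",  "Dealer"),
    ]

    def build_bitmap(rows, col):
        """One bit vector per distinct value; bit i is set iff row i has it."""
        bitmaps = defaultdict(int)
        for i, row in enumerate(rows):
            bitmaps[row[col]] |= 1 << i
        return bitmaps

    region = build_bitmap(rows, col=1)
    rtype = build_bitmap(rows, col=2)

    # Region = 'Europe' AND Type = 'Dealer' is just bitmap arithmetic.
    hits = region["Europe"] & rtype["Dealer"]
    print([rows[i][0] for i in range(len(rows)) if hits >> i & 1])  # ['C2', 'C5']
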
Bitmap Index

• Allows quick search in data cubes
• Advantageous compared to hash and tree indices
• Useful for low-cardinality domains, because comparison, join, and aggregation operations are reduced to bitmap arithmetic
  › (Reduced processing time!)
• Significant reduction in space and I/O, since a string of characters can be represented by a single bit
Join indexing method

• The join indexing method gained popularity from its use in relational database query processing
• Traditional indexing maps the value in a given column to a list of rows having that value
• In contrast, if two relations R(RID, A) and S(B, SID) join on attributes A and B, then the join index record contains the pair (RID, SID) from the R and S relations
• Join index records can identify joinable tuples without performing costly join operations
Indexing OLAP Data: Join Indices

• In data warehouses, a join index relates the values of the dimensions of a star schema to rows in the fact table
  › E.g., for a fact table Sales and two dimensions, city and product:
    ◦ A join index on city maintains, for each distinct city, a list of R-IDs of the tuples recording the sales in that city
  › Join indices can span multiple dimensions (a single-dimension sketch follows below)
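
A minimal sketch of a join index as a mapping from dimension values to fact-table row IDs; the table contents and names are illustrative:

    from collections import defaultdict

    # Toy fact table: R-ID -> (city, product, sales_in_euro); values made up.
    sales = {
        "T57":  ("Madrid", "TV",    400),
        "T238": ("Madrid", "Phone", 120),
        "T884": ("Berlin", "TV",    250),
    }

    # Join index on city: each distinct city -> list of fact-table R-IDs.
    join_index_city = defaultdict(list)
    for rid, (city, _product, _amount) in sales.items():
        join_index_city[city].append(rid)

    # Joinable tuples for 'Madrid' are found without a costly join.
    print(join_index_city["Madrid"])  # ['T57', 'T238']
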
Multiway Array Aggregation

• Sometimes we need to precompute all of the cuboids for a given data cube
  › (full materialization)
• Cuboids can be stored on secondary storage and accessed when necessary
• Methods must take into account the limited amount of main memory and time
• Different techniques are used for ROLAP and MOLAP
ROLAP cube computation

• Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and group related tuples
• Grouping is performed on some subaggregates as a partial grouping step
  › Speeds up computation
• An aggregate may be computed from previously computed aggregates, rather than from the base fact table (see the sketch below)
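
A minimal sketch of computing one aggregate from a previously computed, smaller aggregate instead of rescanning the base fact table; the data is illustrative:

    from collections import defaultdict

    # (city, item) cuboid, assumed already computed from the base fact table.
    city_item = {
        ("Madrid", "TV"): 400, ("Madrid", "Phone"): 120,
        ("Berlin", "TV"): 250, ("Berlin", "Phone"): 300,
    }

    # The (city) cuboid rolls up from (city, item); the much larger
    # base fact table is never touched.
    city = defaultdict(int)
    for (c, _item), total in city_item.items():
        city[c] += total

    print(dict(city))  # {'Madrid': 520, 'Berlin': 550}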
MOLAP and cube computation

• MOLAP cannot perform the value-based reordering, because it uses direct array addressing
  › Partition arrays into chunks (a small subcube which fits in memory)
  › Use compressed sparse array addressing for empty array cells
• Compute aggregates in a "multiway" fashion by visiting cube cells in an order that minimizes the number of times each cell must be visited, reducing memory access and storage costs (sketched below)
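
A minimal sketch of chunked multiway aggregation over a small dense 3-D array, assuming NumPy is available; the dimension sizes and chunk size are illustrative, and a real MOLAP engine would stream chunks from disk:

    import numpy as np

    A, B, C = 4, 4, 4                        # dimension sizes (toy example)
    cube = np.random.default_rng(0).integers(0, 10, (A, B, C))

    chunk = 2                                # chunk edge length
    ab = np.zeros((A, B), dtype=int)         # aggregate over C
    ac = np.zeros((A, C), dtype=int)         # aggregate over B
    bc = np.zeros((B, C), dtype=int)         # aggregate over A

    # Visit the cube one memory-sized chunk at a time; each cell is read
    # once and contributes to all three 2-D cuboids simultaneously.
    for i in range(0, A, chunk):
        for j in range(0, B, chunk):
            for k in range(0, C, chunk):
                part = cube[i:i+chunk, j:j+chunk, k:k+chunk]
                ab[i:i+chunk, j:j+chunk] += part.sum(axis=2)
                ac[i:i+chunk, k:k+chunk] += part.sum(axis=1)
                bc[j:j+chunk, k:k+chunk] += part.sum(axis=0)

    assert (ab == cube.sum(axis=2)).all()    # matches a direct full scan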
Iceberg Cube

• Compute only the cuboid cells whose count (or other aggregate) satisfies a condition such as
    HAVING COUNT(*) >= minsup

• Motivation
  › Only calculate "interesting" cells: data above a certain threshold
  › Avoid explosive growth of the cube

• Suppose 100 dimensions and only 1 base cell. How many aggregate cells are there if count >= 1? What about count >= 2?
  › With count >= 1, the single base cell generates 2^100 - 1 aggregate cells (each dimension either keeps its value or is rolled up to "all"), each with count 1; with count >= 2, none of them qualify, so the iceberg condition prunes them all (see the sketch below)
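
A minimal sketch of iceberg-style pruning with a count threshold: compute every group-by of a tiny base table and keep only the cells whose count clears minsup; the data and names are illustrative:

    from collections import Counter
    from itertools import combinations

    # Toy base table over three dimensions; rows are made up.
    rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1")]
    dims = (0, 1, 2)
    minsup = 2

    iceberg = {}
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            # Count each cell of this cuboid ('*' = dimension aggregated out).
            counts = Counter(
                tuple(row[d] if d in subset else "*" for d in dims)
                for row in rows
            )
            # Keep only the cells that satisfy the iceberg condition.
            for cell, n in counts.items():
                if n >= minsup:
                    iceberg[cell] = n

    print(iceberg)   # e.g. ('a1', '*', '*'): 3 and ('a1', 'b1', '*'): 2 survive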
THANK YOU

CHAPTER 3 (TOPIC – IV) CONCLUDED
