Data Warehouse Concepts
, HCMC, VN
Tel: +84 8 37221223, Fax: +84 8 38960640
DATA WAREHOUSE CONCEPTS (DAWH430784)
DAWH430784 1/16/2025 1
OUTLINE
HCMC UNIVERSITY OF TECHNOLOGY AND EDUCATION
Multidimensional Model
OLAP Operations
Motivation for DWH
Data Warehouse
DWH
[Figure: knowledge discovery process. Stages shown, from source to result: Databases, Data Cleaning and Data Integration, Task-relevant Data, Data Mining, Pattern Evaluation.]
DWH in Business Intelligence
[Figure: layered diagram of business intelligence. The potential to support business decisions increases toward the top. Layers shown include Data Exploration (Statistical Summary, Querying, and Reporting) and, at the top, Decision Making (End User).]
OLTP vs. OLAP
Transaction processing (OLTP)
• Primary data from transactions
• Daily operations and short-term decisions
Business intelligence processing (OLAP)
• Transformed secondary data
• Medium and long-term decisions
OLTP vs. OLAP
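[Figure: ER diagram of an operational (OLTP) order-entry database.]
Entity types:
• Employee (EmpNo, EmpFirstName, EmpLastName, ...)
• Customer (CustNo, CustFirstName, CustLastName, ...)
• Order (OrdNo, OrdDate, ...)
• Product (ProdNo, ProdName, ProdQOH, ...)
Relationships:
• Employee Takes Order
• Customer Places Order
• Order Contains Product (with attribute Qty)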
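The contrast between the two kinds of processing can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and a cut-down version of the order-entry schema from the diagram. The table names OrderTbl and OrdLine, the helper place_order, and all sample rows are assumptions made for illustration only; they are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustNo INTEGER PRIMARY KEY, CustLastName TEXT);
CREATE TABLE Product  (ProdNo INTEGER PRIMARY KEY, ProdName TEXT, ProdQOH INTEGER);
CREATE TABLE OrderTbl (OrdNo INTEGER PRIMARY KEY, OrdDate TEXT, CustNo INTEGER);
CREATE TABLE OrdLine  (OrdNo INTEGER, ProdNo INTEGER, Qty INTEGER);
""")
# Made-up sample rows.
conn.execute("INSERT INTO Customer VALUES (1, 'Nguyen')")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(10, "Book", 50), (20, "DVD", 30)])

def place_order(ord_no, ord_date, cust_no, lines):
    """OLTP style: one short transaction that touches a handful of rows."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO OrderTbl VALUES (?, ?, ?)",
                     (ord_no, ord_date, cust_no))
        for prod_no, qty in lines:
            conn.execute("INSERT INTO OrdLine VALUES (?, ?, ?)",
                         (ord_no, prod_no, qty))
            conn.execute("UPDATE Product SET ProdQOH = ProdQOH - ? WHERE ProdNo = ?",
                         (qty, prod_no))

place_order(100, "2025-01-16", 1, [(10, 2), (20, 1)])
place_order(101, "2025-02-03", 1, [(10, 3)])

# OLAP style: read many rows and summarize them for analysis.
monthly = conn.execute("""
    SELECT substr(o.OrdDate, 1, 7) AS month, p.ProdName, SUM(l.Qty)
    FROM OrdLine l
    JOIN OrderTbl o ON o.OrdNo = l.OrdNo
    JOIN Product p ON p.ProdNo = l.ProdNo
    GROUP BY month, p.ProdName
    ORDER BY month, p.ProdName
""").fetchall()

book_qoh = conn.execute("SELECT ProdQOH FROM Product WHERE ProdNo = 10").fetchone()[0]
```

In practice the analytical query would usually run against a separate warehouse holding transformed data rather than against the operational tables; both run on one in-memory database here only to keep the sketch short.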
Multidimensional Model
[Figure: three-dimensional data cube. Dimensions: Time (Quarter), Product (Category), and Store (City). Each cell holds a measure value.]

Front face of the cube (Store = Paris):
Time (Quarter)  books  games  CDs  DVDs
Q1                 21     10   18    35
Q2                 27     14   11    30
Q3                 26     12   35    32
Q4                 14     20   47    31

Top face (Time = Q1), visible row:
Store (City)    books  games  CDs  DVDs
Nice               12     20   24    33
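One simple way to think about such a cube is as a mapping from dimension coordinates to measure values. The sketch below stores only the cells legible in the figure (the Paris face and the Q1 row for Nice); the names cube and cell are illustrative choices, and a real OLAP server would use a far more compact storage scheme.

```python
CATEGORIES = ["books", "games", "CDs", "DVDs"]

# Measure values per quarter for Store = Paris, in CATEGORIES order.
PARIS = {
    "Q1": [21, 10, 18, 35],
    "Q2": [27, 14, 11, 30],
    "Q3": [26, 12, 35, 32],
    "Q4": [14, 20, 47, 31],
}

# The cube maps (city, quarter, category) coordinates to a measure value.
cube = {}
for quarter, values in PARIS.items():
    for category, value in zip(CATEGORIES, values):
        cube[("Paris", quarter, category)] = value
for category, value in zip(CATEGORIES, [12, 20, 24, 33]):
    cube[("Nice", "Q1", category)] = value

def cell(city, quarter, category):
    """Return the measure at the given coordinates, or None if not stored."""
    return cube.get((city, quarter, category))
```

For example, cell("Paris", "Q3", "CDs") looks up one cell by giving one member per dimension.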
Hierarchies
[Figure: example hierarchies of the Product, Time, and Customer dimensions.]
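A hierarchy can be sketched as a child-to-parent mapping between two levels of a dimension. The levels used here (Month to Quarter for Time, City to Country for Store) are taken from the roll-up and drill-down examples later in the deck; they are assumptions as far as this slide is concerned, since its figure is not available, and the helper children is hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Time hierarchy: Month -> Quarter (three consecutive months per quarter).
MONTH_TO_QUARTER = {month: f"Q{i // 3 + 1}" for i, month in enumerate(MONTHS)}

# Store hierarchy: City -> Country.
CITY_TO_COUNTRY = {"Paris": "France", "Nice": "France",
                   "Rome": "Italy", "Milan": "Italy"}

def children(parent, child_to_parent):
    """List the lower-level members that belong to a higher-level member."""
    return sorted(c for c, p in child_to_parent.items() if p == parent)
```

Moving from a child to its parent is the lookup a roll-up needs; listing the children of a parent is the lookup a drill-down needs.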
Measure Aggregation and Summarizability
Measure Classification
OLAP Operations: Roll-up
Roll-up to the Country level
[Figure: the original cube with Store at the City level (left) and the rolled-up cube with Store at the Country level (right).]

Visible cells for Time = Q1, before the roll-up:
Store (City)     books  games  CDs  DVDs
Paris               21     10   18    35
Nice                12     20   24    33
Rome                33     25   23    25
Milan               24     18   28    14

Visible cells for Time = Q1, after the roll-up:
Store (Country)  books  games  CDs  DVDs
France              33     30   42    68
Italy               57     43   51    39
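The roll-up in the figure can be reproduced with a few lines of code, as a minimal sketch: each city's Q1 values are added into the total of its country. The function name roll_up and the data layout are illustrative assumptions; the numbers are the Q1 cells visible in the figure, with SUM assumed as the aggregation function.

```python
from collections import defaultdict

CATEGORIES = ["books", "games", "CDs", "DVDs"]

# Q1 measure values per city, in CATEGORIES order.
Q1_BY_CITY = {
    "Paris": [21, 10, 18, 35],
    "Nice":  [12, 20, 24, 33],
    "Rome":  [33, 25, 23, 25],
    "Milan": [24, 18, 28, 14],
}
CITY_TO_COUNTRY = {"Paris": "France", "Nice": "France",
                   "Rome": "Italy", "Milan": "Italy"}

def roll_up(by_city, city_to_country):
    """Aggregate city-level values to the country level by summing."""
    totals = defaultdict(lambda: [0] * len(CATEGORIES))
    for city, values in by_city.items():
        country = city_to_country[city]
        totals[country] = [a + b for a, b in zip(totals[country], values)]
    return dict(totals)

q1_by_country = roll_up(Q1_BY_CITY, CITY_TO_COUNTRY)
```

The result matches the rolled-up cube in the figure: France holds 33, 30, 42, 68 and Italy holds 57, 43, 51, 39.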
OLAP Operations: Drill-down
Drill-down to the Month level
[Figure: the original cube with Time at the Quarter level (left) and the drilled-down cube with Time at the Month level (right).]

Visible cells for Store = Paris, after the drill-down:
Time (Month)  books  games  CDs  DVDs
Jan               7      2    6    13
Feb               8      4    8    12
Mar               6      4    4    10
...             ...    ...  ...   ...
Dec               4      4   16     7

The three months of a quarter add up to the quarter-level cell, e.g. Q1 for Paris: 21, 10, 18, 35.
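Drill-down moves to a finer level, so it needs the detailed data rather than the aggregated cells. The sketch below keeps the month-level values visible in the figure for Paris, selects the months of one quarter, and checks that they add back up to the quarter-level cell. The function names and data layout are assumptions made for illustration.

```python
# Month-level values for Store = Paris, in the order books, games, CDs, DVDs.
# Only the months legible in the figure are included.
PARIS_BY_MONTH = {
    "Jan": [7, 2, 6, 13],
    "Feb": [8, 4, 8, 12],
    "Mar": [6, 4, 4, 10],
    "Dec": [4, 4, 16, 7],
}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Dec": "Q4"}

def drill_down(quarter, by_month, month_to_quarter):
    """Return the month-level rows that make up the given quarter."""
    return {m: v for m, v in by_month.items() if month_to_quarter[m] == quarter}

def total(rows):
    """Sum rows column by column (the inverse direction, i.e. a roll-up)."""
    return [sum(column) for column in zip(*rows)]

q1_months = drill_down("Q1", PARIS_BY_MONTH, MONTH_TO_QUARTER)
q1_total = total(q1_months.values())
```

Here q1_total equals the Q1 row of the Paris face in the original cube, which is the consistency one expects between the two levels.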
OLAP Operations: Pivot or Rotate
[Figure: the original cube (left) and the pivoted cube (right). The pivot rotates the axes: Time (Quarter) moves to the horizontal axis, Store (City) to the vertical axis, and Product (Category) to the depth axis.]

Front face after the pivot (Product = books):
Store (City)        Q1   Q2   Q3   Q4
Paris               21   27   26   14
Nice                12   14   11   13
Rome                33   28   35   32
Milan               24   23   25   18

Top face after the pivot (Store = Paris):
Product (Category)  Q1   Q2   Q3   Q4
books               21   27   26   14
games               10   14   12   20
CDs                 18   11   35   47
DVDs                35   30   32   31
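On a two-dimensional face of the cube, a pivot amounts to swapping rows and columns. The sketch below does this for the Paris face, turning a quarter-by-category table into a category-by-quarter table. The function name pivot and the dictionary layout are illustrative assumptions; rotating a full three-dimensional cube would permute all three axes in the same spirit.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
CATEGORIES = ["books", "games", "CDs", "DVDs"]

# Store = Paris: one row per quarter, values in CATEGORIES order.
PARIS_BY_QUARTER = {
    "Q1": [21, 10, 18, 35],
    "Q2": [27, 14, 11, 30],
    "Q3": [26, 12, 35, 32],
    "Q4": [14, 20, 47, 31],
}

def pivot(table, row_keys, col_keys):
    """Swap the two axes: each former column becomes a row."""
    return {col: [table[row][j] for row in row_keys]
            for j, col in enumerate(col_keys)}

# One row per category, values in QUARTERS order.
paris_by_category = pivot(PARIS_BY_QUARTER, QUARTERS, CATEGORIES)
```

The values do not change, only their arrangement: the DVDs row of the result is 35, 30, 32, 31, as on the pivoted cube in the figure.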
OLAP Operations: Slice
[Figure: the original cube (left) and the result of the slice (right). Fixing the Store dimension to a single city (Paris) leaves a two-dimensional cube over Time (Quarter) and Product (Category).]

Result of the slice (Store = Paris):
Time (Quarter)  books  games  CDs  DVDs
Q1                 21     10   18    35
Q2                 27     14   11    30
Q3                 26     12   35    32
Q4                 14     20   47    31
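A slice fixes one dimension to a single member and drops that dimension from the result. The following sketch, under the assumption that the cube is stored as a dictionary keyed by (city, quarter, category), slices on the city Paris. Only cells legible in the figure are loaded, and the name slice_by_city is hypothetical.

```python
CATEGORIES = ["books", "games", "CDs", "DVDs"]

# (city, quarter) -> values in CATEGORIES order, as legible in the figure.
ROWS = {
    ("Paris", "Q1"): [21, 10, 18, 35],
    ("Paris", "Q2"): [27, 14, 11, 30],
    ("Paris", "Q3"): [26, 12, 35, 32],
    ("Paris", "Q4"): [14, 20, 47, 31],
    ("Nice", "Q1"):  [12, 20, 24, 33],
}
cube = {(city, quarter, category): value
        for (city, quarter), values in ROWS.items()
        for category, value in zip(CATEGORIES, values)}

def slice_by_city(cube, city):
    """Keep the cells of one city and drop the Store dimension from the key."""
    return {(quarter, category): value
            for (c, quarter, category), value in cube.items() if c == city}

paris = slice_by_city(cube, "Paris")
```

The result has two dimensions instead of three, so a cell is addressed as paris[("Q2", "games")].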
OLAP Operations: Dice
Dice on Store.Country = 'France' and (Time.Quarter = 'Q1' or Time.Quarter = 'Q2')
[Figure: the original cube (left) and the diced cube (right). The result keeps only the French cities (Paris, Nice) and the quarters Q1 and Q2; all product categories remain.]

Visible cells of the result:
Store (City)  Time (Quarter)  books  games  CDs  DVDs
Paris         Q1                 21     10   18    35
Paris         Q2                 27     14   11    30
Nice          Q1                 12     20   24    33
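Unlike a slice, a dice keeps all dimensions and selects a sub-cube through a condition on several of them. The sketch below applies the condition from the slide (country France, quarter Q1 or Q2) to a dictionary-based cube. It loads only cells legible in the figures, so the Nice cells for Q2 are absent; the function name dice and the data layout are assumptions.

```python
CATEGORIES = ["books", "games", "CDs", "DVDs"]
CITY_TO_COUNTRY = {"Paris": "France", "Nice": "France",
                   "Rome": "Italy", "Milan": "Italy"}

# (city, quarter) -> values in CATEGORIES order, as legible in the figures.
ROWS = {
    ("Paris", "Q1"): [21, 10, 18, 35],
    ("Paris", "Q2"): [27, 14, 11, 30],
    ("Paris", "Q3"): [26, 12, 35, 32],
    ("Paris", "Q4"): [14, 20, 47, 31],
    ("Nice", "Q1"):  [12, 20, 24, 33],
    ("Rome", "Q1"):  [33, 25, 23, 25],
    ("Milan", "Q1"): [24, 18, 28, 14],
}
cube = {(city, quarter, category): value
        for (city, quarter), values in ROWS.items()
        for category, value in zip(CATEGORIES, values)}

def dice(cube, country, quarters):
    """Keep cells whose city lies in the country and whose quarter is selected."""
    return {key: value for key, value in cube.items()
            if CITY_TO_COUNTRY[key[0]] == country and key[1] in quarters}

sub_cube = dice(cube, "France", {"Q1", "Q2"})
```

The keys of sub_cube still have three components, which reflects that a dice restricts the members of each dimension but removes none of the dimensions.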
OLAP Operations – Summary
Data Warehouse Architectures: Top-down vs. Bottom-up
Top-down
• Enterprise data warehouse
• Higher integration levels
• Logically centralized
• Larger project scope
Bottom-up
• Independent data marts
• Lower integration levels
• Logically decentralized
• Smaller project scope
Bottom-up Architecture
[Figure: bottom-up architecture. Operational databases and an external data source feed a transformation process, which loads independent data marts (the data mart tier). The data marts serve the user departments.]
Top-Down Architecture
[Figure: top-down architecture. Components shown: operational database and external data source; extraction process; staging area; transformation process; data warehouse holding detailed and summarized data, with the EDM; data mart.]
General Architecture
[Figure: general data warehouse architecture. Operational databases and external sources feed the ETL process, which loads the enterprise data warehouse and the data marts. An OLAP server sits on top of them and serves the reporting tools, OLAP tools, statistical tools, and data mining tools.]
General Architecture
Data sources
• Operational databases
• Other internal or external sources of information (e.g., files)
Back-end tier
• Extraction-Transformation-Loading (ETL) tools for manipulating data from sources
• Data staging area: intermediate database where the manipulation is done
OLAP tier
• OLAP server: supports multidimensional data and operations
Front-end tier: deals with data analysis and visualization
• Composed of OLAP tools, reporting tools, statistical tools, data-mining tools, …
Extraction-Transformation-Loading
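The three ETL steps can be illustrated with a very small pipeline built from Python's standard library. This is only a sketch under stated assumptions: the source is a hypothetical CSV export with made-up rows, the cleaning rules are arbitrary examples, and an in-memory SQLite table named sales stands in for the warehouse. Real ETL tools add scheduling, logging, incremental loads, and much richer transformations.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational source (made-up rows).
RAW = """city,quarter,category,amount
 paris ,Q1,books,21
Nice,q1,books,12
Rome,Q1,books,
"""

def extract(text):
    """Extraction: read the raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: normalize values and reject records with no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount.isdigit():
            continue  # incomplete record, left out of the load
        clean.append((row["city"].strip().title(),
                      row["quarter"].strip().upper(),
                      row["category"].strip().lower(),
                      int(amount)))
    return clean

def load(conn, rows):
    """Loading: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(city TEXT, quarter TEXT, category TEXT, amount INTEGER)")
    with conn:  # commit the batch as one transaction
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

staged = transform(extract(RAW))  # the list plays the role of a staging area
warehouse = sqlite3.connect(":memory:")
load(warehouse, staged)
loaded = warehouse.execute(
    "SELECT city, quarter, category, amount FROM sales ORDER BY city").fetchall()
```

Keeping the three steps as separate functions mirrors the architecture described earlier: the intermediate result of transform corresponds to the data staging area, and only cleaned rows reach the warehouse table.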