06 Data Warehouse Design and Analytics
Overview
Data Warehousing
Online Analytical Processing
Overview
• Build predictive models and use the models for decision making
• E.g., use past history of sales (by season) to predict future sales
Data Integration From Multiple Sources
Many database applications require data from multiple databases
• Schema integration
A wrapper for a data source is a view that translates data from the local schema to a global schema.
Wrappers must also translate updates on the global schema into updates on the local schema.
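A minimal sketch of a wrapper, using Python's built-in sqlite3 module. The table emp_local, the view employee_global, and all column names and rows are hypothetical. The view maps the local schema to the global one, and an INSTEAD OF trigger maps an update on the global schema back to the local table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Local schema of one data source (hypothetical names; salary kept in cents).
CREATE TABLE emp_local (eno INTEGER PRIMARY KEY, ename TEXT, sal_cents INTEGER);
INSERT INTO emp_local VALUES (1, 'Rahim', 5000000), (2, 'Karim', 4200000);

-- Wrapper: a view that presents the local data in the global schema.
CREATE VIEW employee_global AS
SELECT eno AS employee_id, ename AS name, sal_cents / 100.0 AS salary
FROM emp_local;

-- Translate an update on the global schema into an update on the local schema.
CREATE TRIGGER employee_global_update
INSTEAD OF UPDATE ON employee_global
BEGIN
    UPDATE emp_local
    SET ename = NEW.name,
        sal_cents = CAST(ROUND(NEW.salary * 100) AS INTEGER)
    WHERE eno = OLD.employee_id;
END;
""")

# An update phrased against the global schema...
conn.execute("UPDATE employee_global SET salary = 55000.5 WHERE employee_id = 1")

# ...is applied to the local table by the trigger.
local_row = conn.execute(
    "SELECT eno, ename, sal_cents FROM emp_local WHERE eno = 1"
).fetchone()
global_rows = conn.execute(
    "SELECT employee_id, name, salary FROM employee_global ORDER BY employee_id"
).fetchall()
print(local_row)
print(global_rows)
```

A real wrapper would usually sit in front of a separate DBMS; a single in-memory database is used here only to keep the sketch self-contained.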
Databases that support a common schema and queries, but not updates, are referred to as mediator systems.
Data Warehouse Concepts
A data warehouse is an alternative to data integration
Migrates data to a common schema, avoiding run-time overhead
Cost of translating schema/data to a common warehouse schema can be significant
ETL is a process in data warehousing that stands for Extract (E), Transform (T), and Load (L).
It is a process in which an ETL tool extracts the data from various data source systems, transforms it in the staging area, and finally loads it into the data warehouse system.
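The three ETL steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real ETL tool: the two source systems, their formats, and the sample rows are invented, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

def extract():
    """Extract: pull raw rows from two hypothetical source systems."""
    store_a = [("2024-01-05", "shirt", "3", "12.50")]           # all-text export
    store_b = [{"day": "05/01/2024", "item": "Shirt ", "qty": 2,
                "unit_price": 12.5}]                            # different layout
    return store_a, store_b

def transform(store_a, store_b):
    """Transform: bring both sources to one schema in a staging list."""
    staged = []
    for date, item, qty, price in store_a:
        staged.append((date, item.strip().lower(), int(qty), float(price)))
    for row in store_b:
        day, month, year = row["day"].split("/")                # DD/MM/YYYY
        staged.append((f"{year}-{month}-{day}", row["item"].strip().lower(),
                       int(row["qty"]), float(row["unit_price"])))
    return staged

def load(conn, staged):
    """Load: write the staged rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(date TEXT, item_name TEXT, quantity INTEGER, price REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", staged)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
staged_rows = transform(*extract())
load(warehouse, staged_rows)
totals = warehouse.execute(
    "SELECT item_name, SUM(quantity) FROM sales GROUP BY item_name"
).fetchall()
print(totals)
```

After the transform step both sources agree on date format and item spelling, so the warehouse query can aggregate them together.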
Once gathered, the data are stored for a long time, permitting access to
historical data.
Multidimensional Data and Warehouse Schemas
Designing Star Schema
A fact table sales would have dimension attributes item_id, store_id, customer_id, and date, and measure attributes number and price.
The attribute store_id is a foreign key into a dimension table store, which has other attributes such as store location (city, state, country).
The item_id attribute of the sales table would be a foreign key into a dimension table item_info, which would contain information such as the name of the item, the category to which the item belongs, and other item details such as color and size.
The customer_id attribute would be a foreign key into a customer table containing attributes such as name and address of the customer.
We can also view the date attribute as a foreign key into a date_info table giving the month, quarter, and year of each date.
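The star schema described above could be declared roughly as follows. This is a sketch using Python's sqlite3 module; the column types and the sample rows are assumptions made for illustration, while the table and attribute names follow the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables
CREATE TABLE store     (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
CREATE TABLE item_info (item_id INTEGER PRIMARY KEY, item_name TEXT, category TEXT,
                        color TEXT, size TEXT);
CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE date_info (date TEXT PRIMARY KEY, month INTEGER, quarter INTEGER, year INTEGER);

-- Fact table: dimension attributes are foreign keys, number and price are measures
CREATE TABLE sales (
    item_id     INTEGER REFERENCES item_info(item_id),
    store_id    INTEGER REFERENCES store(store_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    date        TEXT    REFERENCES date_info(date),
    number      INTEGER,
    price       REAL
);

-- Made-up sample rows
INSERT INTO store VALUES (1, 'Dhaka', 'Dhaka', 'Bangladesh'), (2, 'Kolkata', 'West Bengal', 'India');
INSERT INTO item_info VALUES (10, 'shirt', 'clothing', 'white', 'M'),
                             (11, 'skirt', 'clothing', 'dark', 'S');
INSERT INTO customer VALUES (100, 'A. Rahman', 'Dhaka');
INSERT INTO date_info VALUES ('2024-01-05', 1, 1, 2024), ('2024-04-10', 4, 2, 2024);
INSERT INTO sales VALUES (10, 1, 100, '2024-01-05', 3, 12.5),
                         (11, 1, 100, '2024-01-05', 1, 20.0),
                         (10, 2, 100, '2024-04-10', 2, 12.5);
""")

# A typical star-join query: aggregate a measure, grouped by dimension attributes.
result = conn.execute("""
    SELECT s.country, d.quarter, SUM(f.number) AS items_sold
    FROM sales f
    JOIN store s     ON f.store_id = s.store_id
    JOIN date_info d ON f.date = d.date
    GROUP BY s.country, d.quarter
    ORDER BY s.country, d.quarter
""").fetchall()
print(result)
```

The fact table stays narrow (keys plus measures), and descriptive attributes live once in each dimension table.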
Design a star schema for the National Board of Revenue.
Measure attributes: tax amount, income
Dimensions:
• Taxpayer
• Tax collector
• Time
• Source
Privacy-Preserved National Clinical Data Warehouse Architecture
Multidimensional Data Model
Data Modeling and Analysis
Data Analysis and OLAP
Example sales relation
Cross Tabulation of sales by item_name and color
How can you find the cross-tab of sales?
Write SQL to find the cross-tab.
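One possible answer uses conditional aggregation: one SUM(CASE ...) per color, plus a UNION ALL branch for the totals row. The sketch below runs the query with Python's sqlite3 module. It assumes the sales relation has a quantity attribute and that the colors are dark, pastel, and white; the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item_name TEXT, color TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("skirt", "dark", 8), ("skirt", "pastel", 35), ("skirt", "white", 10),
    ("dress", "dark", 20), ("dress", "pastel", 10), ("dress", "white", 5),
])

crosstab_sql = """
SELECT item_name,
       SUM(CASE WHEN color = 'dark'   THEN quantity ELSE 0 END) AS dark,
       SUM(CASE WHEN color = 'pastel' THEN quantity ELSE 0 END) AS pastel,
       SUM(CASE WHEN color = 'white'  THEN quantity ELSE 0 END) AS white,
       SUM(quantity) AS total
FROM sales
GROUP BY item_name
UNION ALL
SELECT 'total',
       SUM(CASE WHEN color = 'dark'   THEN quantity ELSE 0 END),
       SUM(CASE WHEN color = 'pastel' THEN quantity ELSE 0 END),
       SUM(CASE WHEN color = 'white'  THEN quantity ELSE 0 END),
       SUM(quantity)
FROM sales
"""

# Map each row label to its (dark, pastel, white, total) cells.
crosstab = {row[0]: row[1:] for row in conn.execute(crosstab_sql)}
for label, cells in crosstab.items():
    print(label, cells)
```

The color list is fixed in the query text, so a new color in the data requires a new column in the query.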
Data Cube
Hierarchies on Dimensions
How can you prepare DSS reports based on a hierarchy?
Report on date (year, quarter, month, and date-wise report):
R1 = select year, quarter, month, d.date, sum(quantity) as tot_d
     from sales s, date_info d
     where s.date = d.date
     group by year, quarter, month, d.date
Report on month (year, quarter, and month-wise report):
R2 = select year, quarter, month, sum(tot_d) as tot_m
     from R1
     group by year, quarter, month
Report on quarter (year and quarter-wise report):
R3 = select year, quarter, sum(tot_m) as tot_q
     from R2
     group by year, quarter
Report on date, when sales itself stores the year, quarter, and month attributes (no join with date_info needed):
R1 = select year, quarter, month, date, sum(quantity) as tot_d from sales
     group by year, quarter, month, date
R2 = select year, quarter, month, sum(tot_d) as tot_m from R1
     group by year, quarter, month
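The chained reports R1, R2, and R3 can be tried out as views, each one aggregating the level below it. The sketch uses Python's sqlite3 module and the join with date_info; the column types and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_info (date TEXT PRIMARY KEY, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE sales (date TEXT, quantity INTEGER);

-- Made-up sample rows
INSERT INTO date_info VALUES ('2024-01-05', 1, 1, 2024), ('2024-01-20', 1, 1, 2024),
                             ('2024-02-03', 2, 1, 2024), ('2024-04-10', 4, 2, 2024);
INSERT INTO sales VALUES ('2024-01-05', 3), ('2024-01-05', 2), ('2024-01-20', 4),
                         ('2024-02-03', 1), ('2024-04-10', 6);

-- R1: date-wise totals
CREATE VIEW R1 AS
SELECT d.year, d.quarter, d.month, d.date, SUM(s.quantity) AS tot_d
FROM sales s JOIN date_info d ON s.date = d.date
GROUP BY d.year, d.quarter, d.month, d.date;

-- R2: month-wise totals, computed from R1 instead of rescanning sales
CREATE VIEW R2 AS
SELECT year, quarter, month, SUM(tot_d) AS tot_m
FROM R1
GROUP BY year, quarter, month;

-- R3: quarter-wise totals, computed from R2
CREATE VIEW R3 AS
SELECT year, quarter, SUM(tot_m) AS tot_q
FROM R2
GROUP BY year, quarter;
""")

r1 = conn.execute("SELECT * FROM R1 ORDER BY date").fetchall()
r2 = conn.execute("SELECT * FROM R2 ORDER BY year, quarter, month").fetchall()
r3 = conn.execute("SELECT * FROM R3 ORDER BY year, quarter").fetchall()
print(r1, r2, r3, sep="\n")
```

Building each level from the previous one works because SUM is distributive: a sum of partial sums equals the sum over the underlying rows.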
Data Warehouse Queries
Rollup
SELECT ...
FROM table
GROUP BY ROLLUP (c1, c2)
Grouping sets produced:
(c1, c2)
(c1)
()
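The three grouping sets listed above can be reproduced without ROLLUP support by running one GROUP BY per set and combining the results with UNION ALL. The sketch below does this with Python's sqlite3 module (SQLite has no ROLLUP). The helper rollup_query, the sales table, and its rows are hypothetical, and the helper interpolates identifiers directly, so it should only be given trusted names.

```python
import sqlite3

def rollup_query(table, columns, measure):
    """Build a UNION ALL query that mimics GROUP BY ROLLUP(columns)."""
    parts = []
    # The grouping sets are the prefixes of `columns`, longest first:
    # (c1, c2), (c1), ()
    for k in range(len(columns), -1, -1):
        kept = columns[:k]
        # Columns that are rolled up are reported as NULL, meaning "all".
        select_cols = [c if c in kept else f"NULL AS {c}" for c in columns]
        group_by = f" GROUP BY {', '.join(kept)}" if kept else ""
        parts.append(
            f"SELECT {', '.join(select_cols)}, SUM({measure}) AS total "
            f"FROM {table}{group_by}"
        )
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item_name TEXT, color TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("skirt", "dark", 8), ("skirt", "white", 10), ("dress", "dark", 20),
])

sql = rollup_query("sales", ["item_name", "color"], "quantity")
rollup_rows = set(conn.execute(sql).fetchall())
for row in sorted(rollup_rows, key=str):
    print(row)
```

Because NULL marks a rolled-up column, real NULL values in the data would be indistinguishable from subtotal rows in this emulation.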
CUBE
SELECT ...
FROM table_name
GROUP BY CUBE (...)
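CUBE over n columns aggregates over every subset of those columns, which gives 2^n grouping sets. The sketch below enumerates the subsets with itertools.combinations and emulates CUBE in SQLite (which lacks it) with one GROUP BY per subset. The helpers cube_grouping_sets and cube_query, the sales table, and its rows are hypothetical; identifiers are interpolated directly, so only trusted names should be passed.

```python
import sqlite3
from itertools import combinations

def cube_grouping_sets(columns):
    """Return every subset of `columns`, largest first (2**n sets in total)."""
    sets = []
    for r in range(len(columns), -1, -1):
        sets.extend(combinations(columns, r))
    return sets

def cube_query(table, columns, measure):
    """Build a UNION ALL query that mimics GROUP BY CUBE(columns)."""
    parts = []
    for kept in cube_grouping_sets(columns):
        # Columns left out of the grouping set are reported as NULL ("all").
        select_cols = [c if c in kept else f"NULL AS {c}" for c in columns]
        group_by = f" GROUP BY {', '.join(kept)}" if kept else ""
        parts.append(
            f"SELECT {', '.join(select_cols)}, SUM({measure}) AS total "
            f"FROM {table}{group_by}"
        )
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item_name TEXT, color TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("skirt", "dark", 8), ("skirt", "white", 10), ("dress", "dark", 20),
])

grouping_sets = cube_grouping_sets(["c1", "c2"])
cube_rows = set(conn.execute(cube_query("sales", ["item_name", "color"], "quantity")))
print(grouping_sets)
for row in sorted(cube_rows, key=str):
    print(row)
```

Compared with ROLLUP, CUBE adds the grouping sets that are not prefixes of the column list, here the per-color totals.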
PIVOT Table
To create a pivot table, you first need to have a table with your source data.
Let’s say you have a table called “sales” with columns for the date, product,
and sales amount.
PIVOT is not supported by every DBMS, and its syntax varies between the systems that do support it. In Oracle, for example, you would use a statement such as:
SELECT *
FROM sales
PIVOT (
SUM(sales_amount)
FOR product IN ('Product A', 'Product B', 'Product C')
);
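Where PIVOT is unavailable, the same result can be approximated with conditional aggregation. The sketch below builds such a query for SQLite from a list of products, using Python's sqlite3 module. The helper pivot_query and the sample rows are hypothetical; the table and column names follow the sales table described above.

```python
import sqlite3

def pivot_query(products):
    """Build a query with one SUM(CASE ...) column per product value."""
    cases = []
    for p in products:
        alias = '"' + p.replace('"', '""') + '"'   # quote the column alias
        cases.append(
            f"SUM(CASE WHEN product = ? THEN sales_amount END) AS {alias}"
        )
    # Like PIVOT, group by the remaining column (date).
    return ("SELECT date, " + ", ".join(cases) +
            " FROM sales GROUP BY date ORDER BY date")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, product TEXT, sales_amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-01", "Product A", 100.0),
    ("2024-01-01", "Product B", 50.0),
    ("2024-01-01", "Product A", 25.0),
    ("2024-01-02", "Product C", 70.0),
])

products = ["Product A", "Product B", "Product C"]
cursor = conn.execute(pivot_query(products), products)
pivot_columns = [col[0] for col in cursor.description]
pivot_rows = cursor.fetchall()
print(pivot_columns)
print(pivot_rows)
```

Product values are passed as query parameters, while the aliases are quoted identifiers. A product with no sales on a given date shows up as NULL (None in Python).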