Lecture 13
Data Warehouse Schema
If one is not careful, the number of summary tables grows very large
as the number of dimensions increases.
Consider the example discussed earlier, with the following two
dimensions on the fact table:
Time: Day, Week, Month, Quarter, Year, All Days
Product: Item, Sub-Category, Category, All Products
ROLAP and Space Requirement
EXAMPLE: ROLAP and Space Requirement
A naïve implementation requires a summary table for every combination of
aggregation levels across the dimensions.
…
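As a rough illustration of how quickly the combinations add up, the sketch below counts the level combinations for the two dimensions listed earlier. Treating the (Day, Item) pair as the detail-level fact data rather than a summary is an assumption made here, not something the slides state.

```python
from itertools import product

# Aggregation levels copied from the two dimensions in the example.
time_levels = ["Day", "Week", "Month", "Quarter", "Year", "All Days"]
product_levels = ["Item", "Sub-Category", "Category", "All Products"]

# A naive design has one table per (time level, product level) pair.
combinations = list(product(time_levels, product_levels))

# Assumption: (Day, Item) is the detail data itself, so it is not a summary.
summary_tables = [c for c in combinations if c != ("Day", "Item")]

print(len(combinations))    # 6 * 4 = 24 level combinations
print(len(summary_tables))  # 23 summary tables besides the detail data
```

Each additional dimension multiplies the count by its number of levels, which is where the growth comes from.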
Maintenance.
Non-standard hierarchy of dimensions.
Non-standard conventions.
Explosion of storage space requirement.
ROLAP Issues
Summary tables are more of a maintenance issue (as with MOLAP)
than a storage issue.
Notice that summary tables get much smaller as dimensions get less
detailed (e.g., year vs. day).
In most environments, plan for ROLAP summaries to take about twice
the size of the unsummarized data.
Assuming "to-date" summaries, every detail record received into the
warehouse must be aggregated into EVERY summary table.
ROLAP Issues: Maintenance
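The maintenance cost described above can be sketched as follows. The summary tables, level names, and the `load_detail` helper are hypothetical; the point is only that one incoming detail record touches every "to-date" summary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical "to-date" summary tables, one per (time level, product level).
summaries = {
    ("month", "item"): defaultdict(float),
    ("month", "category"): defaultdict(float),
    ("year", "category"): defaultdict(float),
    ("year", "all"): defaultdict(float),
}

def time_key(level, day):
    """Map a date to its key at the given time level."""
    return f"{day.year}-{day.month:02d}" if level == "month" else str(day.year)

def product_key(level, item, category):
    """Map a product to its key at the given product level."""
    return {"item": item, "category": category, "all": "ALL"}[level]

def load_detail(day, item, category, amount):
    """Apply one incoming detail record to EVERY summary table."""
    for (t_level, p_level), table in summaries.items():
        key = (time_key(t_level, day), product_key(p_level, item, category))
        table[key] += amount

load_detail(date(2024, 1, 5), "AC-100", "split-AC", 400.0)
load_detail(date(2024, 2, 9), "RF-200", "refrigerator", 900.0)

print(summaries[("year", "all")][("2024", "ALL")])  # 1300.0
```

With four summary tables, each load performs four updates; with the full set of level combinations the work per record grows accordingly.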
Dimensions are NOT always simple hierarchies
Dimensions can be more than simple hierarchies such as item, subcategory,
category, etc.
The product dimension might also branch off by trade style that crosses simple
hierarchy boundaries, such as:
Looking at sales of air conditioners across manufacturer boundaries, such as
COY1, COY2, COY3, etc.
Looking at sales of all "green colored" items, which even cross product categories
(washing machine, refrigerator, split-AC, etc.).
Looking at a combination of both.
ROLAP Issues: Hierarchies
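A small sketch of why such queries do not fit a single roll-up path: the rows, attribute names, and `total_sales` helper below are made up for illustration. Color and manufacturer cut across the item/category hierarchy, so each needs its own grouping.

```python
# Hypothetical product rows; color and manufacturer are not levels of the
# item -> category hierarchy, they cut across it.
items = [
    {"item": "AC-100", "category": "split-AC", "manufacturer": "COY1", "color": "white", "sales": 400.0},
    {"item": "AC-200", "category": "split-AC", "manufacturer": "COY2", "color": "green", "sales": 350.0},
    {"item": "RF-300", "category": "refrigerator", "manufacturer": "COY1", "color": "green", "sales": 900.0},
    {"item": "WM-400", "category": "washing machine", "manufacturer": "COY3", "color": "green", "sales": 600.0},
]

def total_sales(rows, **conditions):
    """Sum sales over the rows matching every attribute=value condition."""
    return sum(
        row["sales"]
        for row in rows
        if all(row[attr] == value for attr, value in conditions.items())
    )

print(total_sales(items, category="split-AC"))              # across manufacturers: 750.0
print(total_sales(items, color="green"))                    # across categories: 1850.0
print(total_sales(items, color="green", manufacturer="COY1"))  # combination: 900.0
```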
Conventions are NOT absolute
Example: What is a calendar year? What is a week?
Calendar:
01 Jan. to 31 Dec. or
01 Jul. to 30 Jun. or
01 Sep. to 31 Aug.
Week:
Mon. to Sat. or Thu. to Wed.
ROLAP Issues: Convention
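To make the point concrete, here is a minimal sketch in which the year and week boundaries are parameters rather than fixed rules. The function names are hypothetical, and labeling a year by the calendar year in which it starts is an assumption (some organizations label it by the year in which it ends).

```python
from datetime import date, timedelta

def reporting_year(day, start_month=1):
    """Label the reporting year containing `day`.

    Assumption: the year is named after the calendar year in which it starts.
    """
    return day.year if day.month >= start_month else day.year - 1

def week_start(day, first_weekday=0):
    """Return the first day of the week containing `day`.

    first_weekday uses date.weekday() numbering: 0 = Monday ... 6 = Sunday.
    """
    offset = (day.weekday() - first_weekday) % 7
    return day - timedelta(days=offset)

d = date(2024, 8, 14)  # a Wednesday

print(reporting_year(d, start_month=1))  # Jan-Dec year: 2024
print(reporting_year(d, start_month=7))  # Jul-Jun year: 2024
print(reporting_year(d, start_month=9))  # Sep-Aug year: 2023
print(week_start(d, first_weekday=0))    # week starting Monday: 2024-08-12
print(week_start(d, first_weekday=3))    # week starting Thursday: 2024-08-08
```

The same sale therefore lands in different year and week buckets depending on the convention, which is why summaries built under one convention cannot simply be reused under another.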
Coarser granularity correspondingly decreases potential
cardinality.
Aggregating whatever can be aggregated.
Throwing away the detail data after aggregation.
ROLAP Issues: Aggregation Pitfalls
Many ROLAP products have developed means to reduce the
number of summary tables by:
Building summaries on-the-fly as required by end-user applications.
Enhancing performance on common queries at coarser granularities.
Providing smart tools to assist DBAs in selecting the "best"
aggregations to build, i.e., the trade-off between speed and space.
How to Reduce Summary Tables
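One way to picture "building summaries on-the-fly" is to answer a coarser query from a finer summary that is already stored, instead of keeping a separate table for it. The sketch below assumes an additive measure and a hypothetical (month, category) summary; it is not taken from any particular ROLAP product.

```python
from collections import defaultdict

# Hypothetical stored summary at (month, category) granularity.
monthly_by_category = {
    ("2024-01", "Appliances"): 120.0,
    ("2024-02", "Appliances"): 80.0,
    ("2024-04", "Appliances"): 50.0,
    ("2024-01", "Furniture"): 40.0,
}

def quarter_of(month_key):
    """Convert a 'YYYY-MM' key to a 'YYYY-Qn' key."""
    year, month = month_key.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"

def rollup_to_quarter(monthly):
    """Derive the (quarter, category) summary from the monthly one."""
    quarterly = defaultdict(float)
    for (month_key, category), amount in monthly.items():
        quarterly[(quarter_of(month_key), category)] += amount
    return dict(quarterly)

quarterly_by_category = rollup_to_quarter(monthly_by_category)
print(quarterly_by_category[("2024-Q1", "Appliances")])  # 200.0
```

The quarter-level table never has to be stored or maintained; the cost moves to query time, which is the speed versus space trade-off.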
Performance vs Space Trade-Off
[Figure: % Gain (ticks 20 to 100) plotted against MB (ticks 2 to 8). Annotations: "Aggregation answers most queries" and "Aggregation answers few queries".]
A star schema is a relational database schema for representing
multidimensional data.
It is the simplest form of data warehouse schema, containing one
or more dimension tables and fact tables.
People usually want to see some form of aggregated data
Called measures
Usually numeric and additive
Example: Sales $, Number of customers
Just tracking measures, however, is not enough
People want to see data using a "by" condition
Called dimensions
Example: Time, Product, Geography, etc.
Star Schema
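The relationship between a measure and its "by" conditions can be sketched with a few made-up sales rows. The field names and the `measure_by` helper are hypothetical; the measure is additive, so it can be summed along any dimension.

```python
from collections import defaultdict

# Hypothetical fact rows: one additive measure plus three dimension values.
sales = [
    {"month": "2024-01", "product": "Refrigerator", "region": "North", "sales_amount": 900.0},
    {"month": "2024-01", "product": "Split-AC", "region": "South", "sales_amount": 400.0},
    {"month": "2024-02", "product": "Refrigerator", "region": "South", "sales_amount": 850.0},
]

def measure_by(rows, dimension, measure="sales_amount"):
    """Total an additive measure 'by' the given dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

print(measure_by(sales, "product"))  # sales by product
print(measure_by(sales, "region"))   # sales by geography
print(measure_by(sales, "month"))    # sales by time
```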
Steps Involved
Identify business processes for analysis (e.g., Sales)
Identify the measures
Identify the dimensions for facts
List the columns for each dimension
Identify the lowest level of granularity in a fact table
Aspects of Star Schema
Every dimension will have a primary key (usually surrogate)
Dimensions do not have parent tables
Hierarchies for the dimensions are stored in the same table
Star Schema
Simplified 3NF
[Diagram: simplified 3NF schema; tables are linked by one-to-many (1:M) relationships.]
Table labels shown: zip_x_SMSA, zip_x_adi, year, quarter, week, date_x_store_x_weather, sale_detail, item_x_category, item_x_mfctr, category_x_dept.
Column groups shown: ZONE, REGION; ZIP, ZONE; ZIP, SMSA; ZIP, ADI; QTR, YR; WEEK, QTR; DATE, WEEK; STORE #, ADDRESS, ZIP, ...; RECEIPT #, STORE #, DATE, ...; DATE, WEATHER, STORE #; RECEIPT #, ITEM #, ..., $; ITEM #, CATEGORY; ITEM #, MFCTR; DEPT, CATEGORY.
ADI: Area of Dominant Influence
SMSA: Standard Metropolitan Statistical Area
Simplified Star Schema
[Diagram: one fact table with a 1:M relationship from each dimension table.]
Fact Table: ITEM#, STORE#, DATE, RECEIPT#, ..., $
Product Dimension Table: ITEM#, CATEGORY, DEPT, MFCTR, ...
Geography Dimension Table: STORE#, ADDRESS, ZIP, ADI, SMSA, ZONE, REGION
Calendar Dimension Table: DATE, WEEK, QUARTER, YEAR, ...
Store x Date Dimensional Table: STORE#, DATE, WEATHER
A vastly simplified model; it may even summarize out receipt #.
ADI: Area of Dominant Influence
SMSA: Standard Metropolitan Statistical Area
Example
Star Schema
A vastly simplified physical data model!
Collapse dimensional hierarchies into a single table for each
dimension, and create a single fact table from the header and
detail records:
Fewer tables.
Fewer joins to get results.
Merely a methodology for deploying the pre-join
denormalization discussed earlier.
A Simplified Star Schema
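The "fewer joins" claim can be tried out with a small in-memory SQLite sketch. Table and column names loosely follow the simplified star schema diagram but are otherwise assumptions, as is all the sample data. A "sales by department by quarter" query needs only one join per dimension used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each dimension hierarchy is collapsed into a single table.
CREATE TABLE product_dim (item_no INTEGER PRIMARY KEY, category TEXT, dept TEXT, mfctr TEXT);
CREATE TABLE calendar_dim (sale_date TEXT PRIMARY KEY, week INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE geography_dim (store_no INTEGER PRIMARY KEY, address TEXT, zip TEXT, zone TEXT, region TEXT);
-- Single fact table built from receipt header and detail records.
CREATE TABLE sales_fact (
    item_no INTEGER REFERENCES product_dim(item_no),
    store_no INTEGER REFERENCES geography_dim(store_no),
    sale_date TEXT REFERENCES calendar_dim(sale_date),
    receipt_no INTEGER,
    amount REAL
);
""")

conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Refrigerator", "Appliances", "COY1"),
    (2, "Split-AC", "Appliances", "COY2"),
    (3, "Sofa", "Furniture", "COY3"),
])
conn.executemany("INSERT INTO calendar_dim VALUES (?, ?, ?, ?)", [
    ("2024-01-05", 1, "2024-Q1", 2024),
    ("2024-04-10", 15, "2024-Q2", 2024),
])
conn.execute("INSERT INTO geography_dim VALUES (10, '1 Main St', '00000', 'Z1', 'North')")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "2024-01-05", 1001, 900.0),
    (2, 10, "2024-01-05", 1001, 500.0),
    (3, 10, "2024-04-10", 1002, 300.0),
    (1, 10, "2024-04-10", 1003, 950.0),
])

# Two joins: fact -> product dimension, fact -> calendar dimension.
rows = conn.execute("""
    SELECT p.dept, c.quarter, SUM(f.amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.item_no = f.item_no
    JOIN calendar_dim AS c ON c.sale_date = f.sale_date
    GROUP BY p.dept, c.quarter
    ORDER BY p.dept, c.quarter
""").fetchall()

for row in rows:
    print(row)
```

In the 3NF layout the same question would have to walk separate item-to-category, category-to-department, date-to-week, and week-to-quarter tables before reaching the department and quarter columns.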
The target is to get the best of both worlds.
HOLAP (Hybrid OLAP) allows pre-built MOLAP cubes to co-exist
alongside relational OLAP (ROLAP) structures.
How much to pre-build?
HOLAP
DOLAP
[Diagram: the cube resides on the remote server; a subset of the cube is
transferred to the local machine/server.]
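The DOLAP idea of shipping only part of the cube can be sketched as below. The cube contents, the key layout, and the `extract_subcube` helper are hypothetical; real products differ in how the subset is chosen and transferred.

```python
# Hypothetical server-side cube: (year, category, region) -> sales total.
server_cube = {
    ("2023", "split-AC", "North"): 100.0,
    ("2023", "split-AC", "South"): 150.0,
    ("2024", "split-AC", "North"): 120.0,
    ("2024", "refrigerator", "North"): 300.0,
    ("2024", "refrigerator", "South"): 280.0,
}

def extract_subcube(cube, region):
    """Select the cells for one region, i.e. the subset sent to the client."""
    return {key: value for key, value in cube.items() if key[2] == region}

# Only the "North" cells are transferred to the local machine.
local_cube = extract_subcube(server_cube, "North")

def local_total(cube, year):
    """Answer a query locally, without contacting the server."""
    return sum(value for (y, _category, _region), value in cube.items() if y == year)

print(len(local_cube))                  # 3 of the 5 cells
print(local_total(local_cube, "2024"))  # 420.0
```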
OLAP Implementation Techniques
Summary

Cs437 lecture 13
