Aggregate Join Indices & Dimensional Models Delivering Extraordinary Performance
Jose M. Borja, Jborja@Menard-inc.com

Theory vs. Practice
"In theory, there is no difference between theory and practice. In practice there is." (Yogi Berra)
We are here today to help bridge the gap between theory and practice, and to share real-life experience using Aggregate Join Indices and Dimensional Models to deliver extraordinary performance.

Background (or who is this guy)
- 20 years working with Relational Databases
- 14 years developing Data Architectures and Physical Database Design work
- 5 years practicing Data Administration
- 10 years of ICASE tool work and Data Model driven development
- 6 years of Teradata DW practice
  - Teradata DW Administrator and Data Architect
  - Teradata DBA
  - SQL Script Writer (ETL and Dimensional Models)
  - General Teradata Handyman: Performance Tuning, DBS Controls, TDQM, PS, TDWM, Troubleshooting, Workload Management, Capacity Planning, etc.

What's the Challenge?
Design a Data Warehouse to meet these goals:
- Faithfully implement the Enterprise DW Data Model
- Ad Hoc Reporting
- Data Mining
- Business Intelligence (BI tools, on-the-fly reports)
- Provide operational application support
- Provide tactical query and operational support
And do all of that with fast response times and tight SLAs.

What is the proposed solution?
Maintain two separate Data Models:
- 3NF Data Model
  - Keeps the data in line with the Enterprise DW Data Model
  - Accessible and easy to query
  - Available to applications
  - Contains all the legacy data at the lowest granular level
- Dimensional Model
  - Star Schemas
  - Supports BI efforts and limited applications
  - Building block for targeted mini data marts (one AMP)
  - Easy to use
  - Places data closer to the point of use (fast access)

Common misconceptions about this approach
- Waste of space and processing handling two models
- Handling data twice
- More money for a bigger machine to host two models
Is that really true? What would the 3NF model need to get the job done?
- An assortment of Secondary Indexes
  - Requires storage and CPU to maintain
- Lots of CPU cycles to join tables and create aggregates
  - Limits the number of concurrent queries (long run times)
  - May dictate the need to get more machine
- More complex SQL to navigate the 3NF model

How can I do it in Teradata?
- 3NF Model
  - Keep it faithful to the EDWDM with good PI choices
  - Keep the number of Secondary Indexes small (or near none!)
  - Ad Hoc queries can afford slower response times
  - Most of the big tables will be available in the DM!
- Dimensional Model
  - Build Fact Tables to supply the measures across grains
  - Build single-table aggregate Join Indexes on a Fact Table
    - Handle different levels of dimensional granularity
    - Calculate the data once, use it many times
    - "Automagic" maintenance by Teradata (Yes!)
    - Reusability of AJIs by the optimizer (bonus!)
    - AJIs made available as views for direct query access

Example #1
Task: Compare This Year vs. Last Year sales by Product, corporate wide
3NF Model
- Volume of data will be very large (detail level, approximately 2B rows)
- Number of tables may equal the number of joins
- May be cumbersome to write quickly as an Ad Hoc script
- Aggregate is at corporate granularity (lots of rows qualify!)

Example #1 - SQL for 3NF Model

Last Year sales:

    SELECT product_id,
           SUM((sold_qty * price_amt) - discount_amt - coupon_amt) AS LY_Sales_Amt
    FROM   Sale s, Sale_Line sl
    WHERE  saledate BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
      AND  s.Store_Nbr = sl.Store_Nbr
      AND  s.Transaction_Nbr = sl.Transaction_Nbr
    GROUP BY product_id

This Year sales:

    SELECT product_id,
           SUM((sold_qty * price_amt) - discount_amt - coupon_amt) AS TY_Sales_Amt
    FROM   Sale s, Sale_Line sl
    WHERE  saledate BETWEEN DATE '2006-01-01' AND CURRENT_DATE
      AND  s.Store_Nbr = sl.Store_Nbr
      AND  s.Transaction_Nbr = sl.Transaction_Nbr
    GROUP BY product_id

The two result sets are combined on product_id with a FULL OUTER JOIN.

Example #1 - Dimensional Model
Task: Compare This Year vs. Last Year sales by Product, corporate wide
Dimensional Model
- Volume of data is smaller
- Measures are taken at the intersection of grains (aggregates)
- Fact table eliminates most joins to 3NF tables

Example #1 - SQL for Dimensional Model

Last Year sales:

    SELECT product_id,
           SUM(net_sale_amt) AS LY_Sales
    FROM   store_product_daily_sale
    WHERE  the_date BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
    GROUP BY product_id

This Year sales:

    SELECT product_id,
           SUM(net_sale_amt) AS TY_Sales
    FROM   store_product_daily_sale
    WHERE  the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
    GROUP BY product_id

The two result sets are combined on product_id with a FULL OUTER JOIN.
The fact table is 1/3 the size of the 3NF Sale_Line table and eliminates the join between Sale and Sale_Line.

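The slide shows the two queries side by side with FULL OUTER JOIN between them. One way to write that as a single statement is sketched below, under the assumption that each query becomes a derived table and the join key is product_id. The aliases ly and ty and the COALESCE on the key are illustrative choices, not taken from the slides.

```sql
-- Sketch: LY vs. TY sales by product in one statement.
-- Each derived table is one of the two queries from the slide;
-- the FULL OUTER JOIN keeps products that sold in only one of the periods.
SELECT COALESCE(ly.product_id, ty.product_id) AS product_id,
       ly.LY_Sales,
       ty.TY_Sales
FROM  (SELECT product_id,
              SUM(net_sale_amt) AS LY_Sales
       FROM   store_product_daily_sale
       WHERE  the_date BETWEEN DATE '2005-01-01'
                           AND CURRENT_DATE - INTERVAL '1' YEAR
       GROUP BY product_id) AS ly
FULL OUTER JOIN
      (SELECT product_id,
              SUM(net_sale_amt) AS TY_Sales
       FROM   store_product_daily_sale
       WHERE  the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
       GROUP BY product_id) AS ty
  ON  ly.product_id = ty.product_id
ORDER BY 1;
```

A product with sales in only one period comes back with NULL in the other column, which is the reason for using a full outer join rather than an inner join here.
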
Add Aggregate Join Indices to boost performance
A view is added to the Dimensional model to represent a single-table aggregate Join Index at the Corporate level. The AJI removes the Store grain and yields a higher-level aggregate with fewer rows.

Example #1 - SQL for Dimensional Model using the Join Index View

Last Year sales:

    SELECT product_id,
           SUM(net_sale_amt) AS LY_Sales
    FROM   ji_product_daily_salev
    WHERE  the_date BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
    GROUP BY product_id

This Year sales:

    SELECT product_id,
           SUM(net_sale_amt) AS TY_Sales
    FROM   ji_product_daily_salev
    WHERE  the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
    GROUP BY product_id

The two result sets are combined on product_id with a FULL OUTER JOIN.
The Join Index is 1/30 the size of the 3NF Sale_Line table.

A more robust Fact table has more possibilities
Bringing additional dimensions into the Fact table adds Join Indexes at different levels of aggregation granularity to the mix.

 
Store & Subclass at 3 levels of Time granularity
Product at Daily Level and Store at Daily level
Subclass at 5 levels of Time granularity
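
A subclass-level index at a coarser time grain might be defined along the following lines. This is a sketch only: it assumes month is one of the time grains in the diagram, that product_subclass_id is carried on the fact table (as the later slide's column list suggests), and the index name ji_subclass_monthly_sale and the aliases sale_year and sale_month are hypothetical.

```sql
-- Sketch: aggregate join index at Subclass x Month granularity.
-- Index name and column aliases are hypothetical, not from the slides.
CREATE JOIN INDEX ji_subclass_monthly_sale AS
SELECT product_subclass_id,
       EXTRACT(YEAR  FROM the_date) AS sale_year,
       EXTRACT(MONTH FROM the_date) AS sale_month,
       SUM(net_sale_amt)            AS net_sale_amt
FROM   store_product_daily_sale
GROUP BY product_subclass_id, sale_year, sale_month
PRIMARY INDEX (product_subclass_id);
```

Because the index drops both the Store and Product grains and collapses days into months, it should hold far fewer rows than the daily fact table, which is the effect the diagrams illustrate.
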
Use the view to gain access to the Join Index

    CREATE JOIN INDEX JI_PRODUCT_DAILY_SALEv AS
    SELECT product_id, the_date, product_subclass_id, Supplier_id,
           SUM(net_sale_amt) AS net_sale_amt
           . . . . . . .
    FROM   STORE_PRODUCT_DAILY_SALE
    GROUP BY product_id, the_date, product_subclass_id, Supplier_id
    PRIMARY INDEX (product_id, the_date);

    REPLACE VIEW JI_PRODUCT_DAILY_SALEv AS
    SELECT product_id, the_date, product_subclass_id, Supplier_id,
           SUM(net_sale_amt) AS net_sale_amt
           . . . . . . . .
    FROM   STORE_PRODUCT_DAILY_SALE
    GROUP BY product_id, the_date, product_subclass_id, Supplier_id;

    SELECT product_id, the_date, net_sale_amt
    FROM   JI_PRODUCT_DAILY_SALEv
    WHERE  product_id = 198273648;

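Whether a given query is actually answered from the Join Index can be checked with EXPLAIN. The statements below are a minimal sketch that reuses the view and table names from the slides; the date range is only an example.

```sql
-- Sketch: ask the optimizer for its plan without running the query.
-- If the aggregate join index is chosen, the plan text names the
-- join index instead of the base fact table.
EXPLAIN
SELECT product_id, the_date, net_sale_amt
FROM   JI_PRODUCT_DAILY_SALEv
WHERE  product_id = 198273648;

-- The same check against the fact table itself, to see whether the
-- optimizer rewrites a plain aggregate query to use the join index.
EXPLAIN
SELECT product_id,
       SUM(net_sale_amt) AS net_sale_amt
FROM   STORE_PRODUCT_DAILY_SALE
WHERE  the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
GROUP BY product_id;
```
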
Chart: CPU consumption for the LY vs. TY Sales Example (values shown: 12%, 2%)
Chart: Disk I/O usage for the LY vs. TY Sales Example (values shown: 21%, 7%)
Chart: Elapsed time for the LY vs. TY Sales Example (values shown: 10%, 3%)
LY vs. TY for 1 Product Corporate Wide
LY vs. TY for all Product Categories Corporate Wide

Conclusions
- Teradata technology makes it possible to sustain a 3NF model and a Dimensional Model in a single system and enjoy the benefits of both worlds.
- Teradata technology makes it easy to make the Dimensional model available at different levels of granularity using Join Indexes. Sweet performance with low resource usage and auto-magic maintenance!
- The expense of maintaining a dozen Join Indexes on a single Fact table is paid back by avoiding just one substantial report run against the 3NF model. The Join Indexes are maintained at night, when the DW has less usage, and the benefits are harvested during the day by the users.
- The number of Secondary Indexes can be kept very low in the 3NF model, since the Dimensional Model provides most of the necessary access to large volumes of data. Most access to the 3NF model can be limited to PI queries for application support, tactical queries, or reports that can afford table scans.

Tips on Join Indexes
- Keep each Join Index limited to one table. Maintenance is too high on Join Indexes with two or more tables: if any one of the tables is maintained, the Join Index may need to be maintained as well.
- Do not drop and recreate Join Indexes for maintenance. It is not necessary, and recreating them can be (very, very, very) costly.
- Store the Join Index definitions in macros for reuse and for storage in the data dictionary.
- Create a view to provide "direct" access to the Join Index.
- Create a dummy Join Index on any table to prevent accidental DROPs. Seeing the "cannot drop table" message is a life saver!
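
Two of these tips, sketched in SQL. The macro name m_create_ji_product_daily_sale and the guard index name ji_guard_sale_line are hypothetical, the Join Index definition omits the columns elided on the earlier slide, and the guard index columns are taken from the join condition in the 3NF example.

```sql
-- Sketch: keep the join index definition in a macro so it is stored in
-- the data dictionary and can be re-executed. A macro may contain a DDL
-- statement only when it is the single statement in the macro.
CREATE MACRO m_create_ji_product_daily_sale AS (
  CREATE JOIN INDEX JI_PRODUCT_DAILY_SALEv AS
  SELECT product_id, the_date, product_subclass_id, Supplier_id,
         SUM(net_sale_amt) AS net_sale_amt
  FROM   STORE_PRODUCT_DAILY_SALE
  GROUP BY product_id, the_date, product_subclass_id, Supplier_id
  PRIMARY INDEX (product_id, the_date);
);

-- Run the stored definition when the index has to be created.
EXEC m_create_ji_product_daily_sale;

-- Sketch: a "dummy" single-table join index used only as a guard.
-- While it exists, DROP TABLE on Sale_Line is rejected.
-- Adding a restrictive WHERE clause (a sparse join index) is one way
-- to keep such a guard small.
CREATE JOIN INDEX ji_guard_sale_line AS
SELECT Store_Nbr, Transaction_Nbr
FROM   Sale_Line
PRIMARY INDEX (Store_Nbr, Transaction_Nbr);
```

The guard index still has to be maintained on every load of the table, so a narrow column list keeps that overhead low.
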
Teradata Aggregate Join Indices And Dimensional Models

  • 1. Aggregate Join Indices & Dimensional models delivering extraordinary performance Jose M. Borja – [email protected]
  • 2. Theory vs. Practice “ In theory, there is no difference between theory and practice. In practice there is….” Yogi Berra The reason we are here today is to help bridge the gap between theory and practice and to share with you real life experiences on using Aggregate Join Indices and Dimensional Models to deliver extraordinary performance
  • 3. Background (or who is this guy) 20 years working with Relational Databases 14 years developing Data Architectures and Physical Database Design work 5 years practicing Data Administration 10 years of ICASE tool work and Data Model driven development 6 years of Teradata DW practice Teradata DW Administrator and Data Architect Teradata DBA SQL Script Writer (ETL and Dimensional Models) General Teradata Handyman: Performance Tuning, DBS Controls, TDQM, PS, TDWM, Troubleshooting, Performance Tuning, Workload Management, Capacity Planning, etc.
  • 4. What’s the Challenge? Design a Data Warehouse to meet these goals: Faithfully implements the Enterprise DW Data Model Ad Hoc Reporting Data Mining Business Intelligence (BI tools, on the fly reports) Provide operational application support Provide tactical query and operational support And do all of that with fast response times and tight SLAs
  • 5. What is the proposed solution Maintain two separate Data Models: 3NF Data Model Keep the data in line with the Enterprise DW Data Model Accessible and easy to query Available to applications Contain all the legacy data at the lowest granular level Dimensional Model Star Schemas Support BI efforts and limited applications Building block for targeted mini data marts (one AMP) Easy to use Place data closer to the point of use (fast access)
  • 6. Common misconceptions about this approach? Waste of space and processing handling two models Handling data twice More money for a bigger machine to host two models Is that really true? What would the 3NF need to get the job done? An assortment of Secondary Indexes Requires storage and CPU to maintain Lots of CPU cycles to join tables and create aggregates Limits number of concurrent queries (long run times) May dictate the need to get more machine? More complex SQL to navigate 3NF model
  • 7. How can I do it in Teradata 3NF Model Keep it faithful to the EDWDM with good PI choices Keep number of Secondary Indexes small (or near none!) Ad Hoc queries can afford slower response times Most of the big tables will be available in the DM! Dimensional Model Build Fact Tables to supply the measures across grains Build single table aggregate Join Indexes on a Fact Table Handle different levels of dimensional granularity Calculate the data once, use it many times “ Automagic” maintenance by Teradata (Yes!) Reusability of AJIs by optimizer (bonus!) AJIs made available as views for direct query access
  • 8. Example #1 Task: Compare This Year vs. Last Year sales by Product corporate wide 3NF Model Volume of data will be very large (detail level, approximately 2B rows) Number of tables may equal Number of Joins May be cumbersome for an Ad Hoc script to write quickly Aggregate is at corporate granularity (lots of rows qualify!)
  • 9. Example #1 - SQL for 3NF Model Select product_id, sum(sold_qty * price_amt) – discount_amt – coupon_amt) as LY_Sales_Amt From Sale s, Sale_Line sl Where saledate between 2005-01-01 and date – interval ‘1’ year and s.Store_Nbr = sl.Store_Nbr and s.Transaction_Nbr = sl.Transaction_Nbr Group By product_id Select product_id, sum(sold_qty * price_amt) – discount_amt – coupon_amt) as TY_Sales_Amt From Sale s, Sale_Line sl Where saledate between 2006-01-01 and date and s.Store_Nbr = sl.Store_Nbr and s.Transaction_Nbr = sl.Transaction_Nbr Group by product_id FULL OUTER JOIN
  • 10. Example #1 – Dimensional Model Task: Compare This Year vs. Last Year sales by Product corporate wide Dimensional Model Volume of data is smaller Measures taken at the intersection of grains (aggregates) Fact table eliminates most joins to 3NF tables
  • 11. Example #1 - SQL for Dimensional Model
      Last year:
        SELECT product_id, SUM(net_sale_amt) AS LY_Sales
        FROM store_product_daily_sale
        WHERE the_date BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
        GROUP BY product_id
      This year:
        SELECT product_id, SUM(net_sale_amt) AS TY_Sales
        FROM store_product_daily_sale
        WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
        GROUP BY product_id
      The two result sets are combined with a FULL OUTER JOIN.
      The fact table is 1/3 the size of the 3NF Sale_Line table and eliminates the table join between Sale and Sale_Line.
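The slide shows the two halves of the comparison separately. A minimal sketch of how they might be stitched into one statement follows; it assumes the FULL OUTER JOIN is on product_id (the slide does not state the join key), and the derived-table aliases ly and ty are illustrative names, not from the deck.

```sql
-- Sketch only: each year's totals are a derived table; joining on
-- product_id is an assumption, since the slide omits the ON clause.
SELECT COALESCE(ly.product_id, ty.product_id) AS product_id,
       ly.LY_Sales,
       ty.TY_Sales
FROM (
    SELECT product_id, SUM(net_sale_amt) AS LY_Sales
    FROM store_product_daily_sale
    WHERE the_date BETWEEN DATE '2005-01-01'
                       AND CURRENT_DATE - INTERVAL '1' YEAR
    GROUP BY product_id
) AS ly
FULL OUTER JOIN (
    SELECT product_id, SUM(net_sale_amt) AS TY_Sales
    FROM store_product_daily_sale
    WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
    GROUP BY product_id
) AS ty
ON ly.product_id = ty.product_id;
```

The FULL OUTER JOIN keeps products that sold in only one of the two periods; COALESCE picks whichever side supplied the product_id.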
  • 12. Add Aggregate Join Indices to boost performance. A view is added in the Dimensional model to represent a single-table aggregate Join Index at the Corporate level. The AJI removes the Store grain and yields a higher-level aggregate with fewer rows.
  • 13. Example #1 - SQL for Dimensional Model using the Join Index View
      Last year:
        SELECT product_id, SUM(net_sale_amt) AS LY_Sales
        FROM ji_product_daily_salev
        WHERE the_date BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
        GROUP BY product_id
      This year:
        SELECT product_id, SUM(net_sale_amt) AS TY_Sales
        FROM ji_product_daily_salev
        WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
        GROUP BY product_id
      The two result sets are combined with a FULL OUTER JOIN.
      The Join Index is 1/30 the size of the 3NF Sale_Line table.
  • 14. A more robust Fact table has more possibilities. Bring additional dimensions into the mix of Join Indexes to yield different levels of aggregation granularity.
  • 15.  
  • 16. Store & Subclass at 3 levels of Time granularity
  • 17. Product at Daily Level and Store at Daily level
  • 18. Subclass at 5 levels of Time granularity
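As a rough illustration of one such grain, the sketch below defines an aggregate join index at the Subclass and day level on the fact table from Example #1. The index name ji_subclass_daily_sale is hypothetical, and the choice of primary index columns is an assumption; the deck does not show the DDL for these subclass-level indexes.

```sql
-- Hypothetical AJI at Subclass x Day grain; the name and the
-- PRIMARY INDEX choice are assumptions for illustration.
CREATE JOIN INDEX ji_subclass_daily_sale AS
SELECT product_subclass_id,
       the_date,
       SUM(net_sale_amt) AS net_sale_amt
FROM store_product_daily_sale
GROUP BY product_subclass_id, the_date
PRIMARY INDEX (product_subclass_id, the_date);
```

Dropping product_id and the store from the GROUP BY is what collapses the fact table to the coarser grain; each additional grain in the slides would be a separate index of this shape.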
  • 19. Use the view to gain access to the Join Index. The view repeats the Join Index's SELECT against the fact table, so the optimizer can answer queries on the view from the Join Index.
      CREATE JOIN INDEX JI_PRODUCT_DAILY_SALE AS
        SELECT product_id, the_date, product_subclass_id, Supplier_id,
               SUM(net_sale_amt) AS net_sale_amt . . . . . . .
        FROM STORE_PRODUCT_DAILY_SALE
        GROUP BY product_id, the_date, product_subclass_id, Supplier_id
      PRIMARY INDEX (product_id, the_date);
      REPLACE VIEW JI_PRODUCT_DAILY_SALEv AS
        SELECT product_id, the_date, product_subclass_id, Supplier_id,
               SUM(net_sale_amt) AS net_sale_amt . . . . . . . .
        FROM STORE_PRODUCT_DAILY_SALE
        GROUP BY product_id, the_date, product_subclass_id, Supplier_id;
      SELECT product_id, the_date, net_sale_amt
        FROM JI_PRODUCT_DAILY_SALEv
        WHERE product_id = 198273648;
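One way to check whether the optimizer actually picks up the join index is to prefix a query with EXPLAIN and read the plan. The sketch below is an assumed example query against the view from this slide; whether the plan names the join index or the base fact table depends on the system's statistics and costing, so no plan output is shown here.

```sql
-- Sketch: inspect the plan for a query on the view. If the AJI is used,
-- the plan text should reference the join index rather than
-- STORE_PRODUCT_DAILY_SALE.
EXPLAIN
SELECT product_id, SUM(net_sale_amt) AS TY_Sales
FROM JI_PRODUCT_DAILY_SALEv
WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
GROUP BY product_id;
```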
  • 20. CPU consumption for the LY vs. TY Sales Example (chart; labeled values: 12% and 2%)
  • 21. Disk I/O Usage for the LY vs. TY Sales Example (chart; labeled values: 21% and 7%)
  • 22. Elapsed Time for the LY vs. TY Sales Example (chart; labeled values: 10% and 3%)
  • 23. LY vs. TY for 1 Product Corporate Wide
  • 24. LY vs. TY for all Product Categories Corporate Wide
  • 25. Conclusions: Teradata technology makes it possible to sustain a 3NF and a Dimensional Model in a single system and enjoy the benefits of both worlds.
  • 26. Conclusions: Teradata technology makes it easy to make the Dimensional model available at different levels of granularity using Join Indexes. Sweet performance with low resource usage and auto-magic maintenance!
  • 27. Conclusions: The expense of maintaining a dozen Join Indexes on a single Fact table is paid back by avoiding just one substantial report run against the 3NF model. The Join Indexes are maintained at night, when the DW has less usage, and the benefits are harvested during the day by the users.
  • 28. Conclusions: The number of Secondary Indexes can be kept very low in the 3NF model, since the Dimensional Model provides most of the necessary access to large volumes of data. Most access to the 3NF can be limited to PI queries for application support, tactical queries, or reports that can afford table scans.
  • 29. Tips on Join Indexes: Keep Join Indexes limited to only one table; maintenance is too high on Join Indexes with two or more tables, because whenever one of the tables is maintained the Join Index may need to be maintained also. Do not drop and recreate Join Indexes for maintenance; it is not necessary and can be (very, very, very) costly. Store the Join Index definitions in macros for reuse and storage in the data dictionary. Create a view to provide “direct” access to the Join Index. Create a dummy Join Index on any table to prevent accidental DROPs; seeing the "cannot drop table" message is a life saver!
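Two of these tips can be sketched in SQL. The example below is an illustration under assumptions, not the author's scripts: the macro name create_ji_product_daily_sale, the index names, and the columns chosen for the guard index are all hypothetical. It relies on Teradata allowing a DDL statement inside a macro when it is the only statement in the macro.

```sql
-- Tip: keep the join index DDL in a macro so the definition lives in
-- the data dictionary and can be replayed. Names are hypothetical.
CREATE MACRO create_ji_product_daily_sale AS (
    CREATE JOIN INDEX ji_product_daily_sale AS
    SELECT product_id, the_date, SUM(net_sale_amt) AS net_sale_amt
    FROM store_product_daily_sale
    GROUP BY product_id, the_date
    PRIMARY INDEX (product_id, the_date);
);

-- Replay the stored definition when the index has to be built.
EXEC create_ji_product_daily_sale;

-- Tip: a "dummy" join index acts as a guard, because a table cannot be
-- dropped while a join index is defined on it. Columns are assumed.
CREATE JOIN INDEX ji_guard_sale AS
SELECT Store_Nbr, Transaction_Nbr
FROM Sale
PRIMARY INDEX (Store_Nbr, Transaction_Nbr);
```

A guard index is still a real index that takes space and is maintained on every load, so keeping it as narrow as possible seems the sensible design choice.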