Worst Practices in Data Warehouse Design
Kent Graziano 
Data Warrior LLC 
Twitter @KentGraziano
Agenda 
▪ My Bio
▪ My Book
▪ Survey
▪ Backstory
▪ What’s wrong with this picture?
▪ The fallacy of the unconstrained data warehouse
▪ Moral of the Story
© Data Warrior LLC
My Bio 
▪ Kent Graziano
● Oracle ACE Director (BI/DW)
● Data Architecture and Data Warehouse Specialist
● 30+ years in IT
● 20+ years of Oracle-related work
● 15+ years of data warehousing experience
● Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/)
● Co-Author of
  ● The Business of Data Vault Modeling
  ● The Data Model Resource Book (1st Edition)
● Past-President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group
Most recent book: 
http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/
Survey 
▪ Who are you?
● Data Modeler or Architect
● Project Managers
● IT Managers
● DBA
● Developer
▪ Experience
● Data Warehousing?
  ● Less than 1 yr?
  ● 1-5 yrs?
  ● Over 5 years?
The Backstory 
▪ Metrics data mart
▪ Outsourced
▪ POC worked great
● 500 records loaded!
▪ Real world: 100K++ rows
● 1st run – DBA cancelled after 8 hours
● Filled up 665GB temp space
▪ Something wrong?
Next step 
▪ DBA says
● Too many parallel sessions
● Too many partitions on fact table
● Load includes
  ● Select *
  ● Select distinct
▪ Me
● Reverse engineer the tables first
● Look at the design
● Yikes!
My email to management 
“In general, the designs of both the source star schema and the target reporting table do not conform to best practices from either an Oracle tuning or data warehouse design perspective.”
“My only conclusion is that the folks who did the design were not well versed or experienced in designing high performance, high volume data warehouse databases on Oracle.”
“Some of the omissions are so basic as it is hard to comprehend how this could have been considered a completed system.”
What’s wrong with this picture? 
● All optional columns
● The measure is optional!
● Even metadata!
● Extra Varchar columns
● No PK
● No UK
● No FKs
● No Indexes!
So what? 
▪ Works fine for 500 rows
● Full table scans
▪ No clues for the optimizer
▪ No clues for customer!
● Design intent?
● Data profile?
▪ No PK/UK – could get duplicates in load
▪ No FK – could be missing dimension keys
▪ Lazy design!
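To make the duplicate and missing-key risks concrete, here is a minimal sketch of what declared constraints catch at load time. It uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table and column names (date_dim, metric_fact, metric_code, metric_value) are hypothetical, not taken from the system described in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

# Hypothetical star schema: every column required, PK and FK declared.
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,
    cal_date TEXT NOT NULL UNIQUE
);
CREATE TABLE metric_fact (
    date_key     INTEGER NOT NULL REFERENCES date_dim (date_key),
    metric_code  TEXT    NOT NULL,
    metric_value REAL    NOT NULL,
    PRIMARY KEY (date_key, metric_code)
);
INSERT INTO date_dim VALUES (20140101, '2014-01-01');
INSERT INTO metric_fact VALUES (20140101, 'UPTIME', 99.5);
""")

def try_insert(row):
    """Insert one fact row; return 'ok' or the constraint error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO metric_fact VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

duplicate = try_insert((20140101, "UPTIME", 98.0))      # same key as the existing row
orphan = try_insert((20991231, "UPTIME", 97.0))         # date_key missing from date_dim
null_measure = try_insert((20140101, "LATENCY", None))  # measure left empty
loaded = conn.execute("SELECT COUNT(*) FROM metric_fact").fetchone()[0]

print(duplicate, orphan, null_measure, loaded, sep="\n")
```

All three bad rows are rejected and only the original row remains. Without the constraints, the same three inserts would succeed silently.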
What’s wrong with this picture? 
● All optional columns
● Even the PK and metadata!
● No UK
● PK on an optional column?
So what? 
▪ No clue on business key
▪ SCD Type 1 or 2?
▪ There is a CRC Key and CRC Attr
● But which date is the Type 2 date?
▪ Again no clues in the indexes or NOT NULL
▪ Have to look at data to see if DW_REC_CREATED_DT and DW_REC_UPDATED_DT are different
▪ Can’t discern the intent
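Looking at the data to guess the intent might be sketched as follows. This is an illustration only, using Python's sqlite3 module: DW_REC_CREATED_DT and DW_REC_UPDATED_DT come from the slide, while the table name customer_dim, the business key customer_id, and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (          -- hypothetical dimension, no constraints
    customer_key      INTEGER,
    customer_id       TEXT,          -- assumed business key
    DW_REC_CREATED_DT TEXT,
    DW_REC_UPDATED_DT TEXT
);
INSERT INTO customer_dim VALUES
    (1, 'C100', '2014-01-01', '2014-01-01'),
    (2, 'C100', '2014-03-15', '2014-03-15'),
    (3, 'C200', '2014-01-01', '2014-02-10');
""")

# Business keys with more than one row hint that history is kept as new rows (Type 2).
multi_version_keys = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT customer_id FROM customer_dim
        GROUP BY customer_id HAVING COUNT(*) > 1
    )
""").fetchone()[0]

# Rows whose updated date differs from the created date hint at in-place overwrites (Type 1).
overwritten_rows = conn.execute("""
    SELECT COUNT(*) FROM customer_dim
    WHERE DW_REC_UPDATED_DT <> DW_REC_CREATED_DT
""").fetchone()[0]

print(multi_version_keys, overwritten_rows)
```

Profiling like this only produces hints. A declared unique key on the business key plus an effective date would state the intent directly.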
How about the Date Dimension? 
● All optional columns
● Assume 1st column is PK?
● No PK
● No UK
● No Indexes
More examples 
▪ Let’s look into the data model…
Other Stuff 
▪ Untested partitioning scheme
● Target report table partitioning and sub-partitioning are non-standard – not on a date field
● Pre-created 200 list-based partitions
● But the domain only had 37 values!
▪ Did not use partition-aware loading approach
▪ No indexes on partitions or sub-partitions
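The mismatch between pre-created partitions and the real domain is the kind of thing a quick comparison would expose before go-live. The sketch below is plain Python with invented partition names. Only the counts (200 partitions, 37 values) come from the slide.

```python
# Hypothetical inventory: names are made up, only the counts come from the slide.
precreated_partitions = [f"P{n:03d}" for n in range(1, 201)]  # 200 list partitions
observed_values = [f"P{n:03d}" for n in range(1, 38)]         # 37 distinct domain values

# Partitions that will never receive a row.
unused = sorted(set(precreated_partitions) - set(observed_values))

# Domain values with no partition to land in (would fail or fall into a default).
unmapped = sorted(set(observed_values) - set(precreated_partitions))

print(len(unused), len(unmapped))
```

In a real review the two lists would come from the data dictionary and from a distinct-value query on the partition key rather than from generated names.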
Load approach 
▪ Uses a “select *” from source in a view
▪ UPPER function in predicate
● Not needed
● Prevents index usage
▪ Degree of parallelism hardcoded into view
▪ Dummy columns coded into view
▪ No documentation on why
▪ NEVER TESTED with real data!
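The effect of wrapping an indexed column in a function can be seen in a query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, so the plan text is SQLite's and not Oracle's, but the principle is the same: an ordinary index on a column is not used for a predicate on UPPER(column). The table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_metric (            -- hypothetical source table
    region_code  TEXT NOT NULL,
    metric_value REAL NOT NULL
);
CREATE INDEX src_metric_region_ix ON src_metric (region_code);
""")

def plan(sql):
    """Return the detail text of EXPLAIN QUERY PLAN for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

# Predicate wrapped in UPPER(): the index on region_code cannot be used for a lookup.
wrapped_plan = plan(
    "SELECT region_code, metric_value FROM src_metric "
    "WHERE UPPER(region_code) = 'WEST'"
)

# Same filter on the bare column: the optimizer can search the index.
plain_plan = plan(
    "SELECT region_code, metric_value FROM src_metric "
    "WHERE region_code = 'WEST'"
)

print(wrapped_plan)
print(plain_plan)
```

If the data is already stored in a single case, dropping the function is the simplest fix. Where case-insensitive matching is really required, an index on the expression is the usual alternative.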
The Fallacy of the Unconstrained Data Warehouse
▪ Rationale
● Fast to load – no constraints
● All the validation is in the code
▪ Reality
● May be fast to load, but slow to query
● Not tuned for extract!
● Code may not have been QA’d well
● No model to tell the programmers the rules
  ● What columns are required?
  ● What are the FKs to check?
  ● What defines a duplicate row?
▪ Cost
● Slow query response
● Bad data loaded
● Few clues to help tune
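When a table has no constraints, the three questions above have to be answered after the fact by profiling what was loaded. A minimal sketch of such checks follows, using Python's sqlite3 module with hypothetical tables (date_dim, metric_fact) and an assumed rule that date_key plus metric_code identifies a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical unconstrained tables: nothing stops bad rows from loading.
CREATE TABLE date_dim (date_key INTEGER, cal_date TEXT);
CREATE TABLE metric_fact (date_key INTEGER, metric_code TEXT, metric_value REAL);
INSERT INTO date_dim VALUES (20140101, '2014-01-01');
INSERT INTO metric_fact VALUES
    (20140101, 'UPTIME', 99.5),
    (20140101, 'UPTIME', 99.5),     -- duplicate row
    (20991231, 'UPTIME', 97.0),     -- no matching dimension row
    (20140101, 'LATENCY', NULL);    -- missing measure
""")

def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]

# What columns are required? Count rows with an empty measure.
null_measures = count("SELECT COUNT(*) FROM metric_fact WHERE metric_value IS NULL")

# What are the FKs to check? Count fact rows with no dimension row.
orphan_rows = count("""
    SELECT COUNT(*) FROM metric_fact f
    LEFT JOIN date_dim d ON d.date_key = f.date_key
    WHERE d.date_key IS NULL
""")

# What defines a duplicate row? Assumed here: same date_key and metric_code.
duplicate_groups = count("""
    SELECT COUNT(*) FROM (
        SELECT date_key, metric_code FROM metric_fact
        GROUP BY date_key, metric_code HAVING COUNT(*) > 1
    )
""")

print(null_measures, orphan_rows, duplicate_groups)
```

Each query encodes a rule that a NOT NULL, foreign key, or unique constraint would otherwise document and enforce in the model itself.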
Moral of the story? 
▪ Be careful who you outsource to
▪ Have someone independent do touch point reviews of design
● Costs extra, but we have spent MONTHS fixing this
▪ Insist on documentation
▪ Insist on knowledge transfer with internal DBA
▪ Require load testing with performance criteria
Trust but Verify!
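A load test with a performance criterion can be as simple as loading a realistic row count and comparing the elapsed time to an agreed limit. The sketch below assumes a hypothetical metric_fact table in SQLite (via Python's sqlite3 module) and an invented 60-second threshold. The 100,000-row volume echoes the real-world figure from the backstory.

```python
import sqlite3
import time

ROW_COUNT = 100_000   # realistic volume, not the 500-row POC
MAX_SECONDS = 60.0    # hypothetical acceptance threshold

def run_load_test(row_count, max_seconds):
    """Load row_count synthetic rows and report count, timing, and pass/fail."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE metric_fact (
            date_key     INTEGER NOT NULL,
            metric_code  TEXT    NOT NULL,
            metric_value REAL    NOT NULL,
            PRIMARY KEY (date_key, metric_code)
        )
    """)
    # Synthetic rows: ten metric codes per surrogate date key, so keys stay unique.
    rows = ((20140101 + i // 10, f"M{i % 10}", float(i)) for i in range(row_count))

    start = time.perf_counter()
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO metric_fact VALUES (?, ?, ?)", rows)
    elapsed = time.perf_counter() - start

    loaded = conn.execute("SELECT COUNT(*) FROM metric_fact").fetchone()[0]
    return {
        "loaded": loaded,
        "elapsed": elapsed,
        "passed": loaded == row_count and elapsed <= max_seconds,
    }

result = run_load_test(ROW_COUNT, MAX_SECONDS)
print(result)
```

The useful part is writing the volume and the time limit into the contract, so that "worked great on 500 records" cannot count as done.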
SUBMIT YOUR ABSTRACTS TODAY! 
Kscope15.com
Contact Information 
Kent Graziano 
The Oracle Data Warrior 
Data Warrior LLC 
Kent.graziano@att.net 
On Twitter @KentGraziano 
Visit my blog at 
http://kentgraziano.com
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 

Worst Practices in Data Warehouse Design

  • 1. Worst Practices in Data Warehouse Design Kent Graziano Data Warrior LLC Twitter @KentGraziano
  • 2. Agenda ➢ My Bio ➢ My Book ➢ Survey ➢ Backstory ➢ What’s wrong with this picture? ➢ The fallacy of the unconstrained data warehouse ➢ Moral of the Story
  • 3. My Bio ➢ Kent Graziano ● Oracle ACE Director (BI/DW) ● Data Architecture and Data Warehouse Specialist ● 30+ years in IT ● 20+ years of Oracle-related work ● 15+ years of data warehousing experience ● Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/) ● Co-Author of ● The Business of Data Vault Modeling ● The Data Model Resource Book (1st Edition) ● Past-President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group
  • 4. Most recent book: http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/
  • 5. Survey ➢ Who are you? ● Data Modeler or Architect ● Project Managers ● IT Managers ● DBA ● Developer ➢ Experience ● Data Warehousing? ● Less than 1 yr? ● 1-5 yrs? ● Over 5 years?
  • 6. The Backstory ➢ Metrics data mart ➢ Outsourced ➢ POC worked great ● 500 records loaded! ➢ Real world: 100K ++ rows ● 1st run – DBA cancelled after 8 hours ● Filled up 665GB temp space ➢ Something wrong?
  • 7. Next step ➢ DBA says ● Too many parallel sessions ● Too many partitions on fact table ● Load includes ● Select * ● Select distinct ➢ Me ● Reverse engineer the tables first ● Look at the design ● Yikes!
  • 8. My email to management “In general, the designs of both the source star schema and the target reporting table do not conform to best practices from either an Oracle tuning or data warehouse design perspective.” “My only conclusion is that the folks who did the design were not well versed or experienced in designing high performance, high volume data warehouse databases on Oracle.” “Some of the omissions are so basic as it is hard to comprehend how this could have been considered a completed system.”
  • 9. What’s wrong with this picture? ● All optional columns ● The measure is optional! ● Even meta data! ● Extra Varchar columns ● No PK ● No UK ● No FKs ● No Indexes!
  • 10. So what? ➢ Works fine for 500 rows ● Full table scans ➢ No clues for the optimizer ➢ No clues for customer! ● Design intent? ● Data profile? ➢ No PK/UK – could get duplicates in load ➢ No FK – could be missing dimension keys ➢ Lazy design!
  • 11. What’s wrong with this picture? ● All optional columns ● Even the PK and meta data! ● No UK ● PK on an optional column?
  • 12. So what? ➢ No clue on business key ➢ SCD Type 1 or 2? ➢ There is a CRC Key and CRC Attr ● But which date is the Type 2 date? ➢ Again no clues in the indexes or NOT NULL ➢ Have to look at data to see if DW_REC_CREATED_DT and DW_REC_UPDATED_DT are different ➢ Can’t discern the intent
  • 13. How about the Date Dimension? ● All optional columns ● Assume 1st column is PK? ● No PK ● No UK ● No Indexes
  • 14. More examples ➢ Let’s look into the data model….
  • 15. Other Stuff ➢ Untested partitioning scheme ● Target report table partitioning and sub-partition is non-standard – not on date field ● Pre-created 200 list-based partitions ● But the domain only had 37 values! ➢ Did not use partition-aware loading approach ➢ No indexes on partitions or sub partition
  • 16. Load approach ➢ Uses a “select *” from source in a view ➢ UPPER function in predicate ● Not needed ● Cancels index usage ➢ Degree of parallelism hardcoded into view ➢ Dummy columns coded into view ➢ No documentation on why ➢ NEVER TESTED with real data!
  • 17. The Fallacy of the Unconstrained Data Warehouse ➢ Rationale ● Fast to load – no constraints ● All the validation is in the code ➢ Reality ● May be fast load, but slow query ● Not tuned for extract! ● Code may not have been QA’d well ● No model to tell the programmers the rules ● What columns are required? ● What are the FKs to check? ● What defines a duplicate row? ➢ Cost ● Slow query response ● Bad data loaded ● Few clues to help tune
  • 18. Moral of the story? ➢ Be careful who you outsource to ➢ Have someone independent do touch point reviews of design ● Costs extra, but we have spent MONTHS fixing this ➢ Insist on documentation ➢ Insist on knowledge transfer with internal DBA ➢ Require load testing with performance criteria Trust but Verify!
  • 20. SUBMIT YOUR ABSTRACTS TODAY! Kscope15.com
  • 21. Contact Information Kent Graziano The Oracle Data Warrior Data Warrior LLC [email protected] On Twitter @KentGraziano Visit my blog at http://kentgraziano.com
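
Slides 9 through 13 and slide 17 argue that a fact table with no NOT NULL, PK, or FK constraints will accept duplicates, orphaned dimension keys, and missing measures without complaint. The sketch below illustrates that point only; it is not the schema from the talk. It uses SQLite through Python's standard `sqlite3` module rather than Oracle, and the table names (`dim_date`, `fact_loose`, `fact_strict`), the column names, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    cal_date TEXT NOT NULL UNIQUE
);

-- Hypothetical unconstrained fact table: every column optional, no keys.
CREATE TABLE fact_loose (
    date_key     INTEGER,
    metric_code  TEXT,
    metric_value REAL
);

-- Same columns, with the rules declared in the model.
CREATE TABLE fact_strict (
    date_key     INTEGER NOT NULL REFERENCES dim_date (date_key),
    metric_code  TEXT    NOT NULL,
    metric_value REAL    NOT NULL,
    PRIMARY KEY (date_key, metric_code)
);

INSERT INTO dim_date VALUES (20140101, '2014-01-01');
""")

# Made-up sample rows: one good row and three kinds of bad data.
rows = [
    (20140101, "CPU", 0.75),  # valid
    (20140101, "CPU", 0.75),  # duplicate of the first row
    (20149999, "CPU", 0.50),  # date_key not present in dim_date
    (20140101, "MEM", None),  # measure is missing
]


def load(table, rows):
    """Insert rows one at a time; return (rows loaded, rejected rows with reasons)."""
    loaded, rejected = 0, []
    for row in rows:
        try:
            # Table name comes from this script only, never from user input.
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError as exc:
            rejected.append((row, str(exc)))
    return loaded, rejected


loose_loaded, loose_rejected = load("fact_loose", rows)
strict_loaded, strict_rejected = load("fact_strict", rows)

print("fact_loose :", loose_loaded, "loaded,", len(loose_rejected), "rejected")
print("fact_strict:", strict_loaded, "loaded,", len(strict_rejected), "rejected")
for row, reason in strict_rejected:
    print("  rejected", row, "->", reason)
```

The unconstrained table takes every row, so the bad data only shows up later in reports. The constrained table rejects each bad row and names the violated rule, which is also the kind of "clue" about design intent that the slides say was missing.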
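
Slide 16 notes that an unnecessary UPPER function in a predicate "cancels index usage". The sketch below shows the same effect in SQLite through Python's standard `sqlite3` module, as a stand-in for Oracle. The table `metric_fact`, the index `ix_metric_fact_region`, and the data are hypothetical. Oracle's optimizer and plan output differ from SQLite's, but the general behavior is comparable: wrapping an indexed column in a function usually prevents use of an ordinary index on that column unless a matching function-based index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metric_fact (
    metric_id    INTEGER PRIMARY KEY,
    region_code  TEXT NOT NULL,
    metric_value REAL NOT NULL
);
CREATE INDEX ix_metric_fact_region ON metric_fact (region_code);
""")

# Made-up data: 1000 rows spread over 37 region codes, all stored in upper case.
conn.executemany(
    "INSERT INTO metric_fact (region_code, metric_value) VALUES (?, ?)",
    [(f"R{i % 37:02d}", float(i)) for i in range(1000)],
)


def plan(sql):
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Function applied to the indexed column: the plain index cannot be used.
wrapped_plan = plan("SELECT * FROM metric_fact WHERE UPPER(region_code) = 'R05'")

# Column compared directly: the index on region_code is usable.
plain_plan = plan("SELECT * FROM metric_fact WHERE region_code = 'R05'")

print("with UPPER():   ", wrapped_plan)
print("without UPPER():", plain_plan)
```

If the stored values are already in a known case, as the slide implies, dropping the function from the predicate is the simpler fix. Where case-insensitive matching is genuinely required, an index on the expression is the usual alternative.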