Data Warehouse Design and Best Practices
About me
• Project Manager @
• 12 years professional experience
• .NET Web Development MCPD
• SQL Server 2012 (MCSA)
• Business Interests
  • Web Development, SOA, Integration
  • Security & Performance Optimization
  • Horizon2020, Open BIM, GIS, Mapping
• Contact me
  • ivelin.andreev@icb.bg
  • www.linkedin.com/in/ivelin
  • www.slideshare.net/ivoandreev
About me
• Senior Developer @
• .NET Web Development MCPD
• Business Interests
  • Web Development, WCF, Integration
  • SQL Server: Query Optimization and Tuning
  • Data Warehousing
• Contact me
  • georgi.mishev@icb.bg
  • www.linkedin.com/in/georgimishev
Agenda
• Why Data Warehouse
• Main DW Architectures
• Dimensional Modeling
• Patterns & Practices
• DW Maintenance
• ETL Process
• SSIS Demo
Lots of Data Everywhere
• Can’t find data?
  • Data scattered over the network
• Can’t get data?
  • Need an expert to get the data
• Can’t understand data?
  • Data poorly documented
• Can’t use data found?
  • Data needs to be transformed
Data Warehouse?
Def: A central repository where data is organized, cleansed, and stored in a standardized format.
• Integrated
  • Heterogeneous sources
  • Data cleansing and conversion ($, €, 元)
• Focus on subject
  • e.g. Customer, Sale, Product
• Time variant
  • Timestamp every key
  • Historical data (10+ years)
Different Problems - Different Solutions
                OLTP Database                 Data Warehouse
Users           Customer                      Knowledge worker
Design          Normalized, data integrity    Denormalized
Function        Daily operation               Decision making
Data            Current, detailed             Historical, aggregated
Usage           Real time                     Ad hoc
Access          Short R/W transactions        Complex R/O queries
Data accessed   Comparatively lower           Large amounts
# Records       x100                          x1’000’000
# Users         x1’000                        x10
DB size         x10 GB                        x100 GB to TB
Different DW Architectures
B. Inmon Model
Top-Down Approach
• Warehouse (3NF)
• Data Mart & OLAP (MD)
http://sqlschoolgr.files.wordpress.com/2012/03/clip_image003_thumb.png?w=640&h=368
R. Kimball Model
Bottom-Up Approach
• Data Marts (3NF or MD)
• Warehouse & OLAP (MD)
http://sqlschoolgr.files.wordpress.com/2012/03/clip_image005_thumb.png?w=640&h=369
Data Vault (by Dan Linstedt)
• Hubs
  • List of unique business keys
• Links
  • Unique relationships between keys
• Satellites
  • Hub and Link details and history
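A minimal sketch of how one flat source feed could be split into the three Data Vault structures. The class names, fields and sample orders are hypothetical; a real Data Vault would typically also carry hash keys and a record source, which are left out here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Hub:            # one row per unique business key
    business_key: str

@dataclass(frozen=True)
class Link:           # one row per unique relationship between keys
    customer_key: str
    product_key: str

@dataclass(frozen=True)
class Satellite:      # descriptive details, versioned by load date
    parent_key: str
    load_date: date
    attributes: tuple

def load_orders(orders):
    """Split flat source rows into hubs, links and satellites."""
    hubs, links, satellites = set(), set(), set()
    for customer, product, city, loaded in orders:
        hubs.update({Hub(customer), Hub(product)})
        links.add(Link(customer, product))
        satellites.add(Satellite(customer, loaded, (("city", city),)))
    return hubs, links, satellites

# Hypothetical source rows: (customer, product, customer city, load date)
orders = [
    ("C-1", "P-1", "Sofia", date(2014, 1, 1)),
    ("C-1", "P-2", "Sofia", date(2014, 1, 1)),
    ("C-1", "P-1", "Varna", date(2014, 2, 1)),
]
hubs, links, satellites = load_orders(orders)
```

The repeated order for C-1 and P-1 adds no new hub or link; only the satellite gains a second version, which is where the history lives.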
It does not matter which camp you belong to…
as long as you understand why!
Making Your Choice
• Kimball (MD)
  + Start small, scale big
  + Faster ROI
  + Analytical tools
  - Low reusability
• Inmon (3NF)
  + Structured
  + Easy to maintain
  + Easier data mining
  - Time-consuming to build
• Data Vault (backend data warehouse)
  + Multiple sources; full history; incremental build
  - Up-front work; long-term payoff; many joins
Dimensional modeling as de-facto standard
Dimensions
Def: The object of BI interest
• Keys
  • Surrogate key
  • Business key
• Hierarchical attributes
  • Analysis and Drill Down
• Member properties
  • Presentation labels
• Auditing information (not for end users)
Slowly Changing Dimensions
Def: Scheme for recording changes over time
• Type 1: Overwrite
• Type 2: Multiple records
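The two types can be contrasted with a small sketch. The row layout (surrogate_key, business_key, valid_from, valid_to) and the customer data are assumptions for illustration, not taken from the slides.

```python
from datetime import date

def apply_type1(dim_rows, business_key, changes):
    """Type 1: overwrite attributes in place; no history is kept."""
    for row in dim_rows:
        if row["business_key"] == business_key:
            row.update(changes)
    return dim_rows

def apply_type2(dim_rows, business_key, changes, effective):
    """Type 2: close the current row and append a new version."""
    current = next(r for r in dim_rows
                   if r["business_key"] == business_key and r["valid_to"] is None)
    current["valid_to"] = effective
    new_row = {**current, **changes,
               "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
               "valid_from": effective, "valid_to": None}
    dim_rows.append(new_row)
    return dim_rows

# Hypothetical customer dimension with a single current row.
customers = [{"surrogate_key": 1, "business_key": "C-100", "city": "Sofia",
              "valid_from": date(2010, 1, 1), "valid_to": None}]

type1 = apply_type1([dict(r) for r in customers], "C-100", {"city": "Plovdiv"})
type2 = apply_type2([dict(r) for r in customers], "C-100", {"city": "Plovdiv"},
                    date(2014, 6, 1))
```

After Type 1 there is still one row and the old city is gone; after Type 2 there are two rows with different surrogate keys, so older facts keep pointing at the Sofia version.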
Facts
Def: Measurement of a business process
• Keys
  • FKs to all dimension tables (in the star)
  • PK: composite (usually) or surrogate
• Measures
  • Numeric columns that are of interest to the business
  • Additive, non-additive, semi-additive
• Factless facts
• Auditing information (optional)
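A short sketch of the three kinds of measures, using made-up daily rows per store. The column choice (revenue, cost, stock on hand) is an assumption picked to show one measure of each kind.

```python
from collections import defaultdict

# Hypothetical daily fact rows: (date_key, store, revenue, cost, stock_on_hand)
rows = [
    (20140101, "S1", 200.0, 120.0, 10),
    (20140101, "S2", 100.0, 80.0, 5),
    (20140102, "S1", 300.0, 150.0, 7),
    (20140102, "S2", 100.0, 90.0, 4),
]

# Additive: revenue and cost can be summed across every dimension.
total_revenue = sum(r[2] for r in rows)
total_cost = sum(r[3] for r in rows)

# Semi-additive: stock can be summed across stores, but not across days.
stock_by_day = defaultdict(int)
for date_key, _store, _revenue, _cost, stock in rows:
    stock_by_day[date_key] += stock
closing_stock = stock_by_day[max(stock_by_day)]

# Non-additive: a ratio is recomputed from additive parts, never summed.
margin_pct = (total_revenue - total_cost) / total_revenue
```

Summing stock over both days would give 26 units, which no store ever held; taking the last day's total gives the closing stock instead.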
Practices and Design Patterns
Data Warehouse Pitfalls
• Admit it is not as simple as it seems
  • You need education
• Find what is of business value
  • Rather than focusing on performance
• Expect to spend a lot of time on Extract-Transform-Load
  • Homogenize data from different sources
  • Find (and resolve) problems in source systems
Prepare your Sources
• Data integrity
  • Avoid redundancy
• Data quality
  • Master data source
  • Data validation
• Auditing
  • CreatedDate / CreatedBy
  • ChangedDate / ChangedBy
• Nightly jobs
Dimension Design
• Business key with non-clustered index
  • Include date (if dimension has history)
• Surrogate key
  • Smallest integer type that fits
  • Clustered index
• FK constraints
  • Do not enforce (WITH NOCHECK)
  • Document the relation
  • Faster load
• Data validation
  • Task for the source system
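As a rough illustration of the business key versus surrogate key split, the sketch below assigns compact integer surrogate keys at load time and uses them in fact rows. The function name and the SKU values are hypothetical.

```python
def build_key_map(business_keys, start=1):
    """Assign consecutive integer surrogate keys in first-seen order."""
    key_map = {}
    for bk in business_keys:
        if bk not in key_map:
            key_map[bk] = start + len(key_map)
    return key_map

# Hypothetical business keys arriving from the source system.
source_products = ["SKU-9", "SKU-2", "SKU-9", "SKU-5"]
product_keys = build_key_map(source_products)

# Fact rows store the small surrogate key, not the business key.
fact_rows = [(product_keys[bk], qty) for bk, qty in [("SKU-2", 3), ("SKU-9", 1)]]
```

Keeping the surrogate key small and sequential is what makes it a good fit for the clustered index, while lookups by business key go through the non-clustered index.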
Conformed Dimensions
Def: Dimensions that have the same meaning and content when referenced from multiple fact tables
• Date Dimension
  • Best candidate for partitioning
• Granularity
  • Do not store every hour when reporting daily
• Avoid surrogate keys
  • Saves lookups and joins
  • Integer representing the date (yyyyMMdd, or days since 1/1/1900)
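A minimal sketch of the yyyyMMdd integer key and a daily-grain date dimension built from it. The function names and the attribute set (year, month, ISO weekday) are assumptions.

```python
from datetime import date, timedelta

def date_key(d):
    """Encode a date as a yyyyMMdd integer, e.g. 2014-03-09 -> 20140309."""
    return d.year * 10000 + d.month * 100 + d.day

def key_to_date(key):
    """Decode a yyyyMMdd integer back to a date."""
    return date(key // 10000, key // 100 % 100, key % 100)

def date_dimension(start, end):
    """Yield one row per day (daily grain) from start to end inclusive."""
    d = start
    while d <= end:
        yield {"date_key": date_key(d), "year": d.year,
               "month": d.month, "iso_weekday": d.isoweekday()}
        d += timedelta(days=1)

rows = list(date_dimension(date(2014, 2, 27), date(2014, 3, 1)))
```

Because the key is derived directly from the date, the fact load can compute it without a lookup against the dimension, and the key still sorts chronologically.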
Pre-join Hierarchies
• Recursive relationships
• Fast drill and report
• Pre-computed aggregations
Hierarchy Bridge
• For each dimension row
  • 1 association with self
  • 1 row for each subordinate
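The bridge rule above (one self row plus one row per subordinate) could be sketched as follows. The parent-child input format and the sample organisation are hypothetical, and the sketch assumes the hierarchy has no cycles.

```python
def build_bridge(parent_of):
    """Return (ancestor, descendant, depth) rows for a parent-child hierarchy.

    Every member gets one row pointing at itself (depth 0) and appears
    once under each of its superiors at any level above it.
    """
    rows = []
    for member in parent_of:
        ancestor, depth = member, 0
        while ancestor is not None:
            rows.append((ancestor, member, depth))
            ancestor = parent_of.get(ancestor)
            depth += 1
    return sorted(rows)

# Hypothetical organisation: child -> parent (None marks the root).
parent_of = {"CEO": None, "CFO": "CEO", "CTO": "CEO", "Dev": "CTO"}
bridge = build_bridge(parent_of)
subordinates_of_ceo = [d for a, d, depth in bridge if a == "CEO" and depth > 0]
```

With the bridge in place, a drill-down from any member becomes a single join on the ancestor column instead of a recursive query.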
Determine the Facts
The center of a Star schema
• Identify subject areas
• Identify key business events
• Identify dimensions
  • Start from OLTP logical model
• Identify historical requirements
• Identify attributes
The Grain
Def: The level of detail of a fact table
• What is the business objective?
  • Fine grain: behaviour and frequency analysis
  • Coarse grain: overall and trend analysis
• Aggregates
  • DO NOT summarize prematurely
  • DO NOT mix detail and summary
  • DO use “summary tables”
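A small sketch of the summary-table advice, run against an in-memory SQLite database purely for illustration. The table and column names are assumptions; the detail fact keeps its fine grain and the monthly aggregate lives in a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (date_key INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES
        (20140105, 1, 10.0), (20140120, 1, 15.0),
        (20140120, 2, 7.5),  (20140203, 1, 20.0);

    -- The summary lives in its own table; detail rows stay untouched.
    CREATE TABLE agg_sales_month AS
        SELECT date_key / 100 AS month_key, product_id, SUM(amount) AS amount
        FROM fact_sales
        GROUP BY date_key / 100, product_id;
""")

monthly = conn.execute(
    "SELECT month_key, product_id, amount FROM agg_sales_month "
    "ORDER BY month_key, product_id").fetchall()
detail_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Trend reports can read the small monthly table, while behaviour analysis still has every detail row available.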
C-3PO is fluent in 6M forms of communication.
What about your customers?
Multinational DW
• What parts need translation?
• Where to store various language versions?
• How to support future languages?
• Dimensions
  • Add a language attribute
  • Include text data in the dimension
• Problem 1: The dimension key?
  • Replicate the PK for every language
    Fact.DimId = Dim.Id AND Dim.Lang = [Lang]
• Problem 2: Storage = [Dim] x [Lang]
  • Sub-dimension with language attributes

TxtId  Attr1  Attr2  LangId
1      large  Yes    En
2      small  No     En
1      stor   Ja     No
2      liten  Nei    No
3      …      …      …
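The sub-dimension approach can be sketched with the sample rows from the table above, again using in-memory SQLite for illustration only. The table names (dim_size, dim_size_text, fact_order) and the fact quantities are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_size (id INTEGER PRIMARY KEY, txt_id INTEGER);
    CREATE TABLE dim_size_text (txt_id INTEGER, attr1 TEXT, attr2 TEXT,
                                lang_id TEXT, PRIMARY KEY (txt_id, lang_id));
    CREATE TABLE fact_order (dim_id INTEGER, qty INTEGER);

    INSERT INTO dim_size VALUES (10, 1), (20, 2);
    INSERT INTO dim_size_text VALUES
        (1, 'large', 'Yes', 'En'), (2, 'small', 'No',  'En'),
        (1, 'stor',  'Ja',  'No'), (2, 'liten', 'Nei', 'No');
    INSERT INTO fact_order VALUES (10, 3), (20, 4), (10, 2);
""")

def qty_by_size(lang):
    """Total quantity per size label in the requested language."""
    return conn.execute("""
        SELECT t.attr1, SUM(f.qty)
        FROM fact_order f
        JOIN dim_size d      ON f.dim_id = d.id
        JOIN dim_size_text t ON t.txt_id = d.txt_id AND t.lang_id = ?
        GROUP BY t.attr1 ORDER BY t.attr1
    """, (lang,)).fetchall()

english = qty_by_size("En")
norwegian = qty_by_size("No")
```

The fact table and the dimension key stay language-neutral; adding a language means adding rows to the text table only.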
Data warehouse maintenance
How Large is “Large” 
Is big really big?
Partitioning
• Why
  • Faster index maintenance
  • Faster load
  • Faster queries
• When
  • Tables 10GB+
• How
  • Do not partition dimension tables
  • Partition by date (most analyses are time-based)
  • Eliminate partitions (WHERE [PartitionKey]=…)
  • Avoid split and merge of existing partitions
    • Can cause inefficient log generation
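Partition elimination can be illustrated with a toy in-memory model. This is not how SQL Server implements partitioning; the class and its monthly partition key are assumptions that only show why a filter on the partition key lets whole partitions be skipped.

```python
from collections import defaultdict

class PartitionedFact:
    """Toy fact table partitioned by month of a yyyyMMdd date key."""

    def __init__(self):
        self.partitions = defaultdict(list)   # month_key -> rows

    def load(self, rows):
        for date_key, amount in rows:
            self.partitions[date_key // 100].append((date_key, amount))

    def total(self, from_key, to_key):
        """Sum amounts in [from_key, to_key], skipping irrelevant partitions."""
        scanned, total = 0, 0.0
        for month_key, rows in self.partitions.items():
            if month_key < from_key // 100 or month_key > to_key // 100:
                continue                      # partition eliminated
            scanned += 1
            total += sum(a for k, a in rows if from_key <= k <= to_key)
        return total, scanned

fact = PartitionedFact()
fact.load([(20140115, 10.0), (20140210, 20.0), (20140215, 5.0), (20140301, 7.0)])
feb_total, partitions_scanned = fact.total(20140201, 20140228)
```

Only the February partition is read for the February query; the January and March rows are never touched.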
Columnstore Index
• Non-clustered in SQL Server 2012
• Clustered in SQL Server 2014
• Pros
  • Better data compression
  • High performance on table scans
• Clustered CSI limitations (SQL Server 2014)
  • No other indexes allowed
  • Little advantage on seek operations
  • No XML, computed columns or replication
Extract-Transform-Load
• Extract data from OLTP
• Data transformations
• Data loads
• DW maintenance
Efficient Load Process
• Use simple recovery model during data load
• Staging
  • Avoid indexing
  • Populate in parallel
• Maintain DW
  • Disable indexes on load
  • Rebuild manually after load
  • Automatic statistics updates slow down SQL Server
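The load pattern above (no index during the load, rebuild afterwards) can be sketched against in-memory SQLite. SQLite has no way to disable an index, so dropping and recreating it stands in for SQL Server's ALTER INDEX ... DISABLE and ALTER INDEX ... REBUILD; the table, index name and generated rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")
conn.execute("CREATE INDEX ix_fact_date ON fact_sales (date_key)")

def bulk_load(rows):
    """Drop the index, load in one transaction, then rebuild the index."""
    conn.execute("DROP INDEX IF EXISTS ix_fact_date")
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX ix_fact_date ON fact_sales (date_key)")

bulk_load((20140101 + i % 28, float(i)) for i in range(10_000))

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
index_names = [r[1] for r in conn.execute("PRAGMA index_list('fact_sales')")]
```

The index is built once over the finished data instead of being maintained row by row during the insert.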
To SSIS, or not to SSIS?
Pros
• Little to no coding
• Extensive support for various data sources
• Parallel execution of migration tasks
• Better organization of the ETL process
Cons
• Another way of thinking
• Hidden options
• A T-SQL developer would often be much faster
• Auto-generated flows need optimization
• Sometimes it simply does not work (e.g. Sort by GUID)
Takeaways
• Books
  • The Data Warehouse Toolkit (3rd ed), Ralph Kimball
  • Implementing DW with Microsoft SQL Server 2012
  • Data Warehousing Fundamentals, Paulraj Ponniah
• Articles
  • Best Practices in Data Warehouse (Hanover Research Council)
  • http://www.kimballgroup.com/category/design-tips/
  • http://sqlmag.com/business-intelligence
• Resources
  • http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/
  • http://www.databaseanswers.org/data_models/index.htm
Data Warehouse Design and Best Practices
Ad

More Related Content

What's hot (20)

Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemas
Eric Matthews
 
Dimensional Modelling
Dimensional ModellingDimensional Modelling
Dimensional Modelling
Prithwis Mukerjee
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
chennakesava44
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
Denodo
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
DATAVERSITY
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
Ishan Bhawantha Hewanayake
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Khalid Salama
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical Approaches
DATAVERSITY
 
Postgre sql vs oracle
Postgre sql vs oraclePostgre sql vs oracle
Postgre sql vs oracle
Jacques Kostic
 
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
DATAVERSITY
 
Data Sharing with Snowflake
Data Sharing with SnowflakeData Sharing with Snowflake
Data Sharing with Snowflake
Snowflake Computing
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
DATAVERSITY
 
Data modeling for the business
Data modeling for the businessData modeling for the business
Data modeling for the business
Christopher Bradley
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemas
Eric Matthews
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
chennakesava44
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
Denodo
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
DATAVERSITY
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Khalid Salama
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical Approaches
DATAVERSITY
 
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
Lessons in Data Modeling: Why a Data Model is an Important Part of Your Data ...
DATAVERSITY
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
DATAVERSITY
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 

Similar to Data Warehouse Design and Best Practices (20)

SQL Server Integration Services and Analysis Services
SQL Server Integration Services and Analysis ServicesSQL Server Integration Services and Analysis Services
SQL Server Integration Services and Analysis Services
Mohan Arumugam
 
Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
Empowered Holdings, LLC
 
Business Intelligence with SQL Server
Business Intelligence with SQL ServerBusiness Intelligence with SQL Server
Business Intelligence with SQL Server
Peter Gfader
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
Siwawong Wuttipongprasert
 
CV | Sham Sunder | Data | Database | Business Intelligence | .Net
CV | Sham Sunder | Data | Database | Business Intelligence | .NetCV | Sham Sunder | Data | Database | Business Intelligence | .Net
CV | Sham Sunder | Data | Database | Business Intelligence | .Net
Sham Sunder
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
NEWYORKSYS-IT SOLUTIONS
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Mark Tabladillo
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Elena Lopez
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
Jeff Smoley
 
AnalysisServices
AnalysisServicesAnalysisServices
AnalysisServices
webuploader
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Grega Kespret
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
Deepali Raut
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
Mark Kromer
 
Arquitectura de Datos en Azure
Arquitectura de Datos en AzureArquitectura de Datos en Azure
Arquitectura de Datos en Azure
Elena Lopez
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Kent Graziano
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
Marco Parenzan
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligence
Ahsan Kabir
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
Prithwis Mukerjee
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Databricks
 
SQL Server Integration Services and Analysis Services
SQL Server Integration Services and Analysis ServicesSQL Server Integration Services and Analysis Services
SQL Server Integration Services and Analysis Services
Mohan Arumugam
 
Business Intelligence with SQL Server
Business Intelligence with SQL ServerBusiness Intelligence with SQL Server
Business Intelligence with SQL Server
Peter Gfader
 
CV | Sham Sunder | Data | Database | Business Intelligence | .Net
CV | Sham Sunder | Data | Database | Business Intelligence | .NetCV | Sham Sunder | Data | Database | Business Intelligence | .Net
CV | Sham Sunder | Data | Database | Business Intelligence | .Net
Sham Sunder
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
NEWYORKSYS-IT SOLUTIONS
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Mark Tabladillo
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Elena Lopez
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
Jeff Smoley
 
AnalysisServices
AnalysisServicesAnalysisServices
AnalysisServices
webuploader
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Grega Kespret
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
Deepali Raut
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
Mark Kromer
 
Arquitectura de Datos en Azure
Arquitectura de Datos en AzureArquitectura de Datos en Azure
Arquitectura de Datos en Azure
Elena Lopez
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Kent Graziano
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
Marco Parenzan
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligence
Ahsan Kabir
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
Prithwis Mukerjee
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Databricks
 
Ad

More from Ivo Andreev (20)

LLM-based Multi-Agent Systems to Replace Traditional Software
LLM-based Multi-Agent Systems to Replace Traditional SoftwareLLM-based Multi-Agent Systems to Replace Traditional Software
LLM-based Multi-Agent Systems to Replace Traditional Software
Ivo Andreev
 
LLM Security - Smart to protect, but too smart to be protected
LLM Security - Smart to protect, but too smart to be protectedLLM Security - Smart to protect, but too smart to be protected
LLM Security - Smart to protect, but too smart to be protected
Ivo Andreev
 
What are Phi Small Language Models Capable of
What are Phi Small Language Models Capable ofWhat are Phi Small Language Models Capable of
What are Phi Small Language Models Capable of
Ivo Andreev
 
Autonomous Control AI Training from Data
Autonomous Control AI Training from DataAutonomous Control AI Training from Data
Autonomous Control AI Training from Data
Ivo Andreev
 
Autonomous Systems for Optimization and Control
Autonomous Systems for Optimization and ControlAutonomous Systems for Optimization and Control
Autonomous Systems for Optimization and Control
Ivo Andreev
 
Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
Ivo Andreev
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
Ivo Andreev
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
Ivo Andreev
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
Ivo Andreev
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
Ivo Andreev
 
OpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and MisconceptionsOpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and Misconceptions
Ivo Andreev
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
Ivo Andreev
 
Collecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn DataCollecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn Data
Ivo Andreev
 
Collecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure OrbitalCollecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure Orbital
Ivo Andreev
 
Language Studio and Custom Models
Language Studio and Custom ModelsLanguage Studio and Custom Models
Language Studio and Custom Models
Ivo Andreev
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT Scenarios
Ivo Andreev
 
Forecasting time series powerful and simple
Forecasting time series powerful and simpleForecasting time series powerful and simple
Forecasting time series powerful and simple
Ivo Andreev
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project Bonsai
Ivo Andreev
 
Azure security guidelines for developers
Azure security guidelines for developers Azure security guidelines for developers
Azure security guidelines for developers
Ivo Andreev
 
Autonomous Machines with Project Bonsai
Autonomous Machines with Project BonsaiAutonomous Machines with Project Bonsai
Autonomous Machines with Project Bonsai
Ivo Andreev
 
LLM-based Multi-Agent Systems to Replace Traditional Software
LLM-based Multi-Agent Systems to Replace Traditional SoftwareLLM-based Multi-Agent Systems to Replace Traditional Software
LLM-based Multi-Agent Systems to Replace Traditional Software
Ivo Andreev
 
LLM Security - Smart to protect, but too smart to be protected
LLM Security - Smart to protect, but too smart to be protectedLLM Security - Smart to protect, but too smart to be protected
LLM Security - Smart to protect, but too smart to be protected
Ivo Andreev
 
What are Phi Small Language Models Capable of
What are Phi Small Language Models Capable ofWhat are Phi Small Language Models Capable of
What are Phi Small Language Models Capable of
Ivo Andreev
 
Autonomous Control AI Training from Data
Autonomous Control AI Training from DataAutonomous Control AI Training from Data
Autonomous Control AI Training from Data
Ivo Andreev
 
Autonomous Systems for Optimization and Control
Autonomous Systems for Optimization and ControlAutonomous Systems for Optimization and Control
Autonomous Systems for Optimization and Control
Ivo Andreev
 
Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
Ivo Andreev
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
Ivo Andreev
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
Ivo Andreev
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
Ivo Andreev
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
Ivo Andreev
 
OpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and MisconceptionsOpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and Misconceptions
Ivo Andreev
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
Ivo Andreev
 
Collecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn DataCollecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn Data
Ivo Andreev
 
Collecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure OrbitalCollecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure Orbital
Ivo Andreev
 
Language Studio and Custom Models
Language Studio and Custom ModelsLanguage Studio and Custom Models
Language Studio and Custom Models
Ivo Andreev
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT Scenarios
Ivo Andreev
 
Forecasting time series powerful and simple
Forecasting time series powerful and simpleForecasting time series powerful and simple
Forecasting time series powerful and simple
Ivo Andreev
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project Bonsai
Ivo Andreev
 
Azure security guidelines for developers
Azure security guidelines for developers Azure security guidelines for developers
Azure security guidelines for developers
Ivo Andreev
 
Autonomous Machines with Project Bonsai
Autonomous Machines with Project BonsaiAutonomous Machines with Project Bonsai
Autonomous Machines with Project Bonsai
Ivo Andreev
 

Data Warehouse Design and Best Practices

  • 1. Data Warehouse Design Best Practices
  • 2. About me
      Project Manager @
      - 12 years professional experience
      - .NET Web Development MCPD
      - SQL Server 2012 (MCSA)
      Business interests
      - Web Development, SOA, Integration
      - Security & Performance Optimization
      - Horizon2020, Open BIM, GIS, Mapping
      Contact me
      - ivelin.andreev@icb.bg
      - www.linkedin.com/in/ivelin
      - www.slideshare.net/ivoandreev
  • 3. About me
      Senior Developer @
      - .NET Web Development MCPD
      Business interests
      - Web Development, WCF, Integration
      - SQL Server: Query Optimization and Tuning
      - Data Warehousing
      Contact me
      - georgi.mishev@icb.bg
      - www.linkedin.com/in/georgimishev
  • 5. Agenda
      - Why Data Warehouse
      - Main DW Architectures
      - Dimensional Modeling
      - Patterns & Practices
      - DW Maintenance
      - ETL Process
      - SSIS Demo
  • 6. Lots of Data Everywhere
      - Can’t find data? Data is scattered over the network.
      - Can’t get data? You need an expert to get the data.
      - Can’t understand data? Data is poorly documented.
      - Can’t use the data you found? Data needs to be transformed.
  • 7. Data Warehouse?
      Def: Central repository where data is organized, cleansed and kept in a standardized format.
      - Integrated
          - Heterogeneous sources
          - Data cleaning and conversion ($, €, 元)
      - Focus on subject
          - e.g. Customer, Sale, Product
      - Time variant
          - Timestamp every key
          - Historical data (10+ years)
  • 8. Different Problems - Different Solutions

                      OLTP Database                 Data Warehouse
      Users           Customer                      Knowledge worker
      Design          Normalized, data integrity    Denormalized
      Function        Daily operation               Decision making
      Data            Current, detailed             Historical, aggregated
      Usage           Real time                     Ad hoc
      Access          Short R/W transactions        Complex R/O queries
      Data accessed   Comparatively lower           Large amounts
      # Records       x100                          x1’000’000
      # Users         x1’000                        x10
      DB size         x10 GB                        x100 GB to TB
  • 10. B. Inmon Model
      Top-Down Approach
      - Warehouse (3NF)
      - Data Mart OLAP (MD)
      Image: http://sqlschoolgr.files.wordpress.com/2012/03/clip_image003_thumb.png?w=640h=368
  • 11. R. Kimball Model
      Bottom-Up Approach
      - Data Marts (3NF or MD)
      - Warehouse OLAP (MD)
      Image: http://sqlschoolgr.files.wordpress.com/2012/03/clip_image005_thumb.png?w=640h=369
  • 12. Data Vault (by Dan Linstedt)
      - Hubs: list of unique business keys
      - Links: unique relationships between keys
      - Satellites: Hub and Link details and history
  • 13. It is irrelevant which camp you belong to… as long as you understand why!
  • 14. Making Your Choice
      - Kimball (MD)
          + Start small, scale big
          + Faster ROI
          + Analytical tools
          - Low reusability
      - Inmon (3NF)
          + Structured
          + Easy to maintain
          + Easier data mining
          - Time-consuming to build
      - Data Vault (backend data warehouse)
          + Multiple sources; full history; incremental build
          - Up-front work; long-term payoff; many joins
  • 15. Dimensional modeling as the de facto standard
  • 16. Dimensions
      Def: The object of BI interest
      - Keys
          - Surrogate key
          - Business key
      - Hierarchical attributes
          - Analysis and drill down
      - Member properties
          - Presentation labels
      - Auditing information (not for end users)
  • 17. Slowly Changing Dimensions
      Def: Scheme for recording changes over time
      - Type 1: Overwrite
      - Type 2: Multiple records
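The difference between the two types on slide 17 can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the `CustomerRow` class, its `valid_from` / `valid_to` columns and the sample customer are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    # Hypothetical dimension row; column names are assumptions.
    surrogate_key: int
    business_key: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type1(rows: List[CustomerRow], business_key: str, new_city: str) -> None:
    """Type 1: overwrite the attribute in place, so history is lost."""
    for row in rows:
        if row.business_key == business_key:
            row.city = new_city


def apply_type2(rows: List[CustomerRow], business_key: str,
                new_city: str, change_date: date) -> None:
    """Type 2: close the current row and append a new version."""
    current = next(
        (r for r in rows if r.business_key == business_key and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return  # nothing changed, keep the current version
        current.valid_to = change_date
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(CustomerRow(next_key, business_key, new_city, change_date))


dim = [CustomerRow(1, "C-100", "Sofia", date(2010, 1, 1))]
apply_type2(dim, "C-100", "Plovdiv", date(2014, 6, 1))
```

After the call, the dimension holds two rows for the same business key: the closed "Sofia" version and the open "Plovdiv" version with a new surrogate key. Facts loaded before the change keep pointing at the old surrogate key, which is what preserves history.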
  • 18. Facts
      Def: Measurement of a business process
      - Keys
          - FK from all dimension tables (in the star)
          - PK: composite (usually) or surrogate
      - Measures
          - Numeric columns that are of interest to the business
          - Additive, non-additive, semi-additive
      - Factless facts
      - Auditing information (optional)
  • 20. Data Warehouse Pitfalls
      - Admit it is not as it seems to be
          - You need education
      - Find what is of business value
          - Rather than focusing on performance
      - Spend a lot of time in Extract-Transform-Load
          - Homogenize data from different sources
          - Find (and resolve) problems in source systems
  • 21. Prepare your Sources
      - Data integrity
          - Avoid redundancy
      - Data quality
          - Master data source
          - Data validation
      - Auditing
          - CreatedDate / CreatedBy
          - ChangedDate / ChangedBy
      - Nightly jobs
  • 22. Dimension Design
      - Business key with non-clustered index
          - Include date (if the dimension has history)
      - Surrogate key
          - The smallest possible integer
          - Clustered index
      - FK constraints
          - Do not enforce (WITH NOCHECK)
          - Document the relation
          - Faster load
      - Data validation
          - Task for the source system
  • 23. Conformed Dimensions
      Def: Having the same meaning and content when referred to from multiple fact tables
      - Date dimension
          - Best candidate for partitioning
      - Granularity
          - Do not store every hour when reporting daily
      - Avoid surrogate keys
          - Saves lookups and joins
          - Integer representing the date (yyyyMMdd, days after 1/1/1900)
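Both integer encodings mentioned on slide 23 can be derived from the date itself, which is why no lookup against the date dimension is needed at load time. The helper names below are made up for this sketch, and the 1900-01-01 epoch follows the slide's "days after 1/1/1900".

```python
from datetime import date, timedelta

EPOCH = date(1900, 1, 1)  # epoch taken from the slide's "days after 1/1/1900"


def to_yyyymmdd(d: date) -> int:
    """Encode a date as a human-readable integer key, e.g. 20141129."""
    return d.year * 10000 + d.month * 100 + d.day


def from_yyyymmdd(key: int) -> date:
    """Decode a yyyyMMdd integer key back into a date."""
    return date(key // 10000, key // 100 % 100, key % 100)


def to_day_offset(d: date) -> int:
    """Encode a date as the number of days after 1900-01-01."""
    return (d - EPOCH).days


def from_day_offset(offset: int) -> date:
    """Decode a day offset back into a date."""
    return EPOCH + timedelta(days=offset)


key = to_yyyymmdd(date(2014, 11, 29))
```

The yyyyMMdd form stays readable in ad hoc queries and sorts chronologically; the day-offset form is denser and makes date arithmetic a plain subtraction. Which one fits better depends on how the keys are used downstream.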
  • 24. Pre-join Hierarchies
      - Recursive relationships
      - Fast drill and report
      - Pre-computed aggregations
      - Hierarchy bridge, for each dimension row:
          - 1 association with self
          - 1 row for each subordinate
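The hierarchy bridge rule on slide 24 (one self-association per row plus one row per subordinate) can be illustrated by flattening a parent-child relationship. This is a sketch under assumptions: the `build_bridge` function, the `(ancestor, descendant, depth)` row layout and the sample employee keys are hypothetical, and the input is assumed to contain no cycles.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_bridge(parent_of: Dict[int, Optional[int]]) -> List[Tuple[int, int, int]]:
    """Flatten a child -> parent mapping into (ancestor, descendant, depth) rows.

    Assumes the hierarchy is acyclic; a cycle would make the walk loop forever.
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    bridge = []
    for ancestor in parent_of:
        # Depth 0 is the row's association with itself.
        stack = [(ancestor, 0)]
        while stack:
            node, depth = stack.pop()
            bridge.append((ancestor, node, depth))
            stack.extend((c, depth + 1) for c in children.get(node, []))
    return sorted(bridge)


# Hypothetical org chart: 1 manages 2 and 3, and 2 manages 4.
employees = {1: None, 2: 1, 3: 1, 4: 2}
bridge = build_bridge(employees)
```

With the bridge in place, "all sales under manager 1" becomes a plain join on `ancestor = 1` instead of a recursive query, which is the fast drill and report benefit the slide refers to. The cost is extra rows: eight bridge rows for four employees in this example.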
  • 25. Determine the Facts
      The center of a star schema
      - Identify subject areas
      - Identify key business events
      - Identify dimensions
      - Start from the OLTP logical model
      - Identify historical requirements
      - Identify attributes
  • 26. The Grain
      Def: The level of detail of a fact table
      - What is the business objective?
          - Fine grain: behaviour and frequency analysis
          - Coarse grain: overall and trend analysis
      - Aggregates
          - DO NOT summarize prematurely
          - DO NOT mix detail and summary
          - DO use “summary tables”
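The three rules on slide 26 can be shown with a small runnable sketch. It uses Python's built-in `sqlite3` purely so the example is self-contained; the deck targets SQL Server, and the table names `fact_sales` and `agg_sales_daily` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail fact table: one row per product per day (fine grain).
CREATE TABLE fact_sales (
    date_key    INTEGER,
    product_key INTEGER,
    quantity    INTEGER,
    amount      REAL
);
-- Separate summary table: one row per day (coarse grain).
CREATE TABLE agg_sales_daily (
    date_key INTEGER PRIMARY KEY,
    quantity INTEGER,
    amount   REAL
);
""")

conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [
        (20141101, 1, 2, 20.0),
        (20141101, 2, 1, 5.0),
        (20141102, 1, 3, 30.0),
    ],
)

# The summary is derived from the detail and stored apart from it,
# so detail and summary rows never share a table.
conn.execute("""
INSERT INTO agg_sales_daily
SELECT date_key, SUM(quantity), SUM(amount)
FROM fact_sales
GROUP BY date_key
""")

daily = conn.execute(
    "SELECT date_key, quantity, amount FROM agg_sales_daily ORDER BY date_key"
).fetchall()
```

Because the detail rows are kept, the daily summary can be dropped and rebuilt at a different grain later. If only the summary had been loaded, product-level questions could no longer be answered.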
  • 27. C3-PO is fluent in 6M forms of communication. What about your customers?
  • 28. Multinational DW
      - What parts need translation?
      - Where to store the various language versions?
      - How to support future languages?
      - Dimensions
          - Add a language attribute
          - Include text data in the dimension
      - Problem 1: The dimension key?
          - Replicate the PK for every language
          - Fact.DimId = Dim.Id AND Dim.Lang=[Lang]
      - Problem 2: Storage = [Dim] x [Lang]
          - Sub-dimension with language attributes

      TxtId   Attr1   Attr2   LangId
      1       large   Yes     En
      2       small   No      En
      1       stor    Ja      No
      2       liten   Nei     No
      3       …       …       …
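The join pattern from slide 28 (`Fact.DimId = Dim.Id AND Dim.Lang=[Lang]`) can be tried against the slide's own sample rows. This is a hedged sketch: it runs on Python's built-in `sqlite3` rather than SQL Server, and the table names `dim_size_text` and `fact_order`, the fact rows and the `orders_in` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Language sub-dimension: the key is replicated for every language.
CREATE TABLE dim_size_text (
    txt_id  INTEGER,
    attr1   TEXT,
    attr2   TEXT,
    lang_id TEXT,
    PRIMARY KEY (txt_id, lang_id)
);
-- Hypothetical fact table referencing the language-neutral key.
CREATE TABLE fact_order (
    order_id INTEGER PRIMARY KEY,
    txt_id   INTEGER,
    amount   REAL
);
""")

# Text rows copied from the slide's table.
conn.executemany(
    "INSERT INTO dim_size_text VALUES (?, ?, ?, ?)",
    [
        (1, "large", "Yes", "En"),
        (2, "small", "No", "En"),
        (1, "stor", "Ja", "No"),
        (2, "liten", "Nei", "No"),
    ],
)
conn.executemany(
    "INSERT INTO fact_order VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 2, 40.0)],
)


def orders_in(lang: str):
    """Return orders with labels in the requested language."""
    return conn.execute(
        """
        SELECT f.order_id, d.attr1, f.amount
        FROM fact_order AS f
        JOIN dim_size_text AS d
          ON f.txt_id = d.txt_id AND d.lang_id = ?
        ORDER BY f.order_id
        """,
        (lang,),
    ).fetchall()


norwegian = orders_in("No")
```

The fact table stores only the language-neutral key, so adding a language means inserting new text rows rather than touching facts. The language filter belongs in the join condition: without it every fact row would be returned once per language.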
  • 30. How Large is “Large” Is big really big?
  • 31. Partitioning
      - Why
          - Faster index maintenance
          - Faster load
          - Faster queries
      - When
          - Tables of 10 GB+
      - How
          - Do not partition dimension tables
          - Partition by date (most analyses are time-based)
          - Eliminate partitions (WHERE [PartitionKey]=…)
          - Avoid split and merge of existing partitions
              - Can cause inefficient log generation
  • 32. Columnstore Index
      - Non-clustered in SQL 2012
      - Clustered in SQL 2014
      - Pros
          - Better data compression
          - High performance on table scans
      - Clustered CSI limitations
          - No other indexes allowed
          - Little advantage on seek operations
          - No XML, computed columns or replication
  • 33. Extract-Transform-Load
      - Extract data from OLTP
      - Data transformations
      - Data loads
      - DW maintenance
  • 34. Efficient Load Process
      - Use the simple recovery model during data load
      - Staging
          - Avoid indexing
          - Populate in parallel
      - Maintain DW
          - Disable indexes on load
          - Rebuild manually after load
          - Automatic stats updates slow down SQL Server
  • 35. To SSIS, or not to SSIS?
      - Pros
          - Minimal to no coding
          - Extensive support for various data sources
          - Parallel execution of migration tasks
          - Better organization of the ETL process
      - Cons
          - Another way of thinking
          - Hidden options
          - A T-SQL developer would do it much faster
          - Auto-generated flows need optimization
          - Sometimes simply does not work (e.g. Sort by GUID)
  • 37. Takeaways
      - Books
          - The Data Warehouse Toolkit (3rd ed), Ralph Kimball
          - Implementing DW with Microsoft SQL Server 2012
          - Data Warehousing Fundamentals, Paulraj Ponniah
      - Articles
          - Best Practices in Data Warehouse (Hanover Research Council)
          - http://www.kimballgroup.com/category/design-tips/
          - http://sqlmag.com/business-intelligence
      - Resources
          - http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/
          - http://www.databaseanswers.org/data_models/index.htm