
Data warehouse design for data-driven enterprises
Builders erect houses from blueprints, with architecture constrained by physical limitations.
When someone moves in, they decide how the inside of their home should look, and their
preferences determine interior decor and design. Data warehouses are similar: their architecture
is standardized, but their design is adapted to business and user needs.

Data warehouse architecture is inherent to the main hosting platform or service selected by an
organization. It’s a built-in, static infrastructure, and hosts all the specific tools and processes
that eventually make up the warehouse.

Data warehouse design, however, should be customized with an understanding of an organization’s stakeholders and strategic requirements. Short-sighted design choices limit the potential and performance of enterprise data warehouses.

What is data warehouse architecture?


Before discussing data warehouse design, it’s important to understand the foundational role that
architecture plays in both on-premises and cloud data warehouses.

Data warehouses contain historical, current, and critical enterprise data. They form the storage
and processing platform underlying reporting, dashboards, business intelligence, and analytics.

Traditional data warehouse architecture


Originally, data warehouses ran on on-premises hardware, and were architected in three distinct
tiers:
 The bottom tier of traditional data warehouse architecture is the core relational database system, and contains all data ingestion logic and ETL processes. The ETL processes connect to data sources and extract data to local staging databases, where it’s transformed and then forwarded to production servers (a minimal sketch of this flow follows the list).

 In the middle tier, an online analytical processing (OLAP) server powers reporting and
analytic logic. At this tier data architects may further transform the data, aggregate it, or enrich it
before running business intelligence processes.

 The top tier is the front end, the user-facing layer. It contains web interfaces for
stakeholders to access and query reports or analytical results, as well as visualization
and business intelligence tools for end users running ad-hoc analysis.
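To make the bottom tier concrete, here is a minimal, self-contained sketch of the extract-stage-transform-load flow, using Python’s built-in sqlite3 module as a stand-in for the source, staging, and production databases. All table and column names are hypothetical.

```python
import sqlite3

# Stand-ins for the three databases involved; all names are hypothetical.
source = sqlite3.connect(":memory:")      # operational source system
staging = sqlite3.connect(":memory:")     # local staging database
production = sqlite3.connect(":memory:")  # production warehouse server

# Seed the source so the sketch is runnable end to end.
source.execute("CREATE TABLE orders (order_id INT, amount REAL, order_date TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 99.5, "2024-01-02"), (2, -10.0, "2024-01-03")])

# Extract: pull raw rows from the data source into staging.
rows = source.execute("SELECT order_id, amount, order_date FROM orders").fetchall()
staging.execute("CREATE TABLE stg_orders (order_id INT, amount REAL, order_date TEXT)")
staging.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)

# Transform in staging (here, a single cleansing rule: drop negative amounts).
clean = staging.execute(
    "SELECT order_id, amount, order_date FROM stg_orders WHERE amount >= 0").fetchall()

# Load: forward the transformed data to the production server.
production.execute("CREATE TABLE fact_orders (order_id INT, amount REAL, order_date TEXT)")
production.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean)
production.commit()
```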
Drawbacks of traditional architecture
A major issue associated with on-premises data warehouses is the cost of deployment and
management. Businesses must purchase server hardware, set aside space to house it, and devote
IT staff to set it up and administer it.

More important, however, is the fact that on-premises hardware is more difficult to scale than cloud-based storage and computing. Managers and decision-makers must get approval for new hardware, assign budgets, and wait for shipment, and then engineers and IT specialists must install and set up both hardware and software. After installation, operational systems may not be used to capacity, which means money wasted on resources that aren’t contributing value.

This type of mission-critical infrastructure requires a high level of expenditure, attention, and employee specialization when deployed on-premises. Modern cloud services don’t have these problems, because both resources and pricing scale with usage.

The modern cloud data warehouse


Cloud data warehouses are more adaptable, performant, and powerful than in-house systems.
Businesses can save on staffing and can put their IT staff to better use, because their
infrastructure is managed by dedicated specialists.

Cloud data warehouses feature column-oriented databases, where the unit of storage is a single attribute: all records’ values for one attribute are stored together. Columnar storage does not change how customers organize or represent data, but allows for faster access and processing, as the sketch below illustrates.
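As a toy illustration (not any vendor’s actual storage engine), here is the same set of records laid out row-wise and column-wise in plain Python, showing why a single-attribute aggregate touches less data in a columnar layout:

```python
# Toy illustration only: how the same records look in row-oriented
# versus column-oriented layouts (not any vendor's storage engine).
records = [
    {"id": 1, "region": "EU", "revenue": 120.0},
    {"id": 2, "region": "US", "revenue": 340.0},
    {"id": 3, "region": "EU", "revenue": 75.0},
]

# Row-oriented: each record's attributes are stored together.
row_store = [(r["id"], r["region"], r["revenue"]) for r in records]

# Column-oriented: all values of one attribute are stored together,
# so an aggregate over a single column touches only that column.
column_store = {
    "id": [r["id"] for r in records],
    "region": [r["region"] for r in records],
    "revenue": [r["revenue"] for r in records],
}

total_revenue = sum(column_store["revenue"])  # scans one column, not whole rows
print(total_revenue)
```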

Cloud data warehouses also offer automatic, near-real-time scalability and greater system
reliability and uptime than on-premises hardware, and transparent billing, which allows
enterprises to pay only for what they use.

Because cloud data warehouses don’t rely on the rigid structures and data modeling concepts
inherent in traditional systems, they have diverse architectures.

 Amazon Redshift’s approach is akin to infrastructure-as-a-service (IaaS) or platform-as-a-service (PaaS). Redshift is highly scalable, provisioning clusters of nodes to customers as their storage and computing needs evolve. Each node has individual CPU, RAM, and storage space, facilitating the massively parallel processing (MPP) needed for any big data application, especially the data warehouse. Customers are responsible for some capacity planning and must provision compute and storage nodes on the platform.

 The Google BigQuery approach is more like software-as-a-service (SaaS) that allows interactive analysis of big data. It can be used alongside Google Cloud Storage and technologies such as MapReduce. BigQuery differentiates itself with a serverless architecture, which means users cannot see details of resource allocation, as computational and storage provisioning happens continuously and dynamically (a minimal query sketch follows this list).

 Snowflake’s automatically managed storage layer can contain structured or semistructured data, such as nested JSON objects. The compute layer is composed of clusters, each of which can access all data but work independently and concurrently, enabling automatic scaling, distribution, and rebalancing. Snowflake is a data warehouse-as-a-service and operates across multiple clouds, including AWS, Microsoft Azure, and Google Cloud.

 Microsoft Azure SQL Data Warehouse is an elastic, large-scale data warehouse PaaS that
leverages the broad ecosystem of SQL Server. Like other cloud storage and computing
platforms, it uses a distributed, MPP architecture and columnar data store. It gathers data from
databases and SaaS platforms into one powerful, fully-managed centralized repository.
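As an example of the serverless model, here is a minimal sketch of querying BigQuery from Python. It assumes the google-cloud-bigquery client library is installed and that credentials and a default project are configured; the dataset and table names are hypothetical.

```python
# Minimal sketch, assuming the google-cloud-bigquery package is installed
# and application credentials plus a default project are configured.
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical dataset and table; BigQuery provisions compute serverlessly,
# so no cluster or node sizing appears anywhere in the client code.
query = """
    SELECT region, SUM(revenue) AS total_revenue
    FROM `my_project.sales.orders`
    GROUP BY region
"""

for row in client.query(query).result():
    print(row["region"], row["total_revenue"])
```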

Data warehousing schemas


Data warehouses are relational databases, and they are associated with traditional schemas,
which are the ways in which records are described and organized.

 A snowflake schema arranges tables and their connections so that a representative entity relationship diagram (ERD) resembles a snowflake. A centralized fact table connects to many dimension tables, which themselves connect to more dimension tables, and so on. Data is normalized. (A DDL sketch contrasting both schemas follows this section.)

Snowflake schema
 The simpler star schema is a special case of the snowflake schema. Only one level of
dimension tables is connected to the central fact table, resulting in ERDs with star shapes. These
dimension tables are denormalized, containing all attributes and information associated with the
particular type of record they hold.

Star schema
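The contrast is easiest to see in table definitions. Below is a sketch using Python’s sqlite3 with a hypothetical sales example: in the star variant, the product dimension carries its category directly (denormalized), while in the snowflake variant, the category is normalized into its own dimension table.

```python
import sqlite3

# Hypothetical sales example contrasting the two schemas.
db = sqlite3.connect(":memory:")

db.executescript("""
-- Star schema: one fact table joined directly to denormalized dimensions.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT          -- denormalized: category lives on the dimension
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product_star(product_id),
    amount REAL
);

-- Snowflake schema: dimensions are normalized into further dimension tables.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
""")
```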

Anticipating major design flaws


A complex system like a modern data warehouse includes many dependencies. Errors may propagate throughout the system, making them difficult to rectify.

 Organizations should strive to be future-proof. Design choices based exclusively on immediate needs may cause problems later.

 Data warehouse design is a collaborative process that should include all key stakeholders.
Leaving out end users during planning means less engagement. The same applies when leaving
design entirely up to the IT department. High-level managers and decision-makers should
provide the overall business strategy.
 Data quality should be a priority. Strong data governance practices ensure clean data and
encourage adherence to rules and regulations.

 Subject matter experts should lead the data modeling process. This guidance ensures that
the data pipeline will be robust, consistently organized, and documented.

 Businesses should design for optimized query performance, pulling only relevant data, using efficient data structures, and tuning systems often. OLAP cube design in particular is critical: it allows super-fast and intuitive analysis of data along the multiple dimensions of a business problem (see the sketch after this list).
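The following small sketch uses pandas as a stand-in for a real OLAP server, aggregating a made-up revenue measure along two dimensions the way a cube slice would:

```python
import pandas as pd

# Made-up sales data; pandas stands in for a real OLAP server here.
sales = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [120.0, 75.0, 340.0, 410.0],
})

# Cube-style aggregation: one measure (revenue) along two dimensions.
cube = sales.pivot_table(values="revenue", index="region",
                         columns="quarter", aggfunc="sum")
print(cube)
```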

7 steps to robust data warehouse design


After selecting a data warehouse, an organization can focus on specific design considerations.
Here are seven steps that help ensure a robust data warehouse design:

1. User needs: A good data warehouse design should be based on business and user needs.
Therefore, the first step in the design procedure is to gather requirements, to ensure that the data
warehouse will be integrated with existing business processes and be compatible with long-term
strategy. Enterprises must determine the purpose of their data warehouse, any technical
requirements, which stakeholders will benefit from the system, and which questions will be
answered with improved reporting, business intelligence (BI), and analytics.

2. Physical environment: Enterprises that opt for on-premises architecture must set up the
physical environment, including all the servers necessary to power ETL processes, storage, and
analytic operations. Enterprises can skip this step if they choose a cloud data warehouse.

3. Data modeling: Next comes data modeling, which is perhaps the most important planning
step. The data modeling process should result in detailed, reusable documentation of a data
warehouse’s implementation. Modelers assess the structure of data in sources, decide how to
represent these sources in the data warehouse, and specify OLAP requirements, including level
of processing granularity, important aggregations and measures, and high-level dimensions or
context.
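As a sketch of the kind of reusable documentation this step might produce, here is a lightweight, machine-readable model specification for one hypothetical fact table; every name and field below is invented for illustration:

```python
# Hypothetical example of capturing modeling decisions as reusable,
# machine-readable documentation for one fact table.
order_items_model = {
    "fact_table": "fact_order_items",
    "grain": "one row per order line item",   # level of processing granularity
    "measures": ["quantity", "unit_price", "line_total"],
    "dimensions": ["date", "product", "customer", "store"],
    "aggregations": ["daily revenue by store", "monthly units by product"],
    "source_systems": ["orders_db.order_items"],
}
```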

4. ETL/ELT: The next step is the selection of an ETL/ELT solution. ETL transforms data prior to the loading stage. When businesses used costly in-house analytics systems, it made sense to do as much prep work as possible, including transformations, prior to loading data into the warehouse. However, ELT is a better approach when the destination is a cloud data warehouse. Organizations can transform their raw data at any time, when and as necessary for their use case, rather than as a fixed step in the data pipeline.
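Here is a minimal ELT sketch using Python’s sqlite3 as a stand-in for a cloud warehouse (and assuming a SQLite build with the JSON1 functions, which ships with recent Python versions): raw records are loaded untransformed, and the transformation runs later as SQL inside the warehouse. All names are hypothetical.

```python
import json
import sqlite3

# ELT sketch: land raw data first, transform inside the warehouse later.
warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

# Load: raw records go in untransformed (here, as JSON text).
warehouse.execute("CREATE TABLE raw_events (payload TEXT)")
events = [{"user": "a", "amount": 10.0}, {"user": "b", "amount": -5.0}]
warehouse.executemany("INSERT INTO raw_events VALUES (?)",
                      [(json.dumps(e),) for e in events])

# Transform: run when and as needed, as SQL against the raw table,
# rather than as a fixed step in the pipeline.
warehouse.execute("""
    CREATE TABLE clean_events AS
    SELECT json_extract(payload, '$.user')   AS user,
           json_extract(payload, '$.amount') AS amount
    FROM raw_events
    WHERE json_extract(payload, '$.amount') >= 0
""")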
5. Semantic layer: Next up is designing the data warehouse’s semantic layer. Based on
previously documented data models, the OLAP server is implemented to support the analytical
queries of individual users and to empower BI systems. This step determines the core analytical
processing capabilities of the data warehouse, so data engineers should carefully consider time-
to-analysis and latency requirements.
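One lightweight way to picture a semantic layer is a governed mapping from business metric names to vetted queries, so every user and BI tool shares one definition. A hypothetical sketch follows; the metric names, SQL, and table are all invented:

```python
# Hypothetical sketch of a thin semantic layer: business-facing metric
# names mapped to governed SQL, so BI tools and users share definitions.
METRICS = {
    "total_revenue": "SELECT SUM(amount) FROM fact_orders",
    "order_count":   "SELECT COUNT(*) FROM fact_orders",
}

def run_metric(connection, name):
    """Resolve a business metric name to its governed query and run it."""
    (value,) = connection.execute(METRICS[name]).fetchone()
    return value
```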

6. Reporting layer: With analytic infrastructure built and implemented, an organization can design a reporting layer. An administrator designates groups and end users, describes and enforces permissible access, and implements reporting interfaces or delivery methods.
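Designating groups and access often comes down to database grants. Below is a hypothetical sketch in PostgreSQL-style SQL, held in a Python list for scripting; the role, schema, and user names are made up:

```python
# Hypothetical access-control setup for the reporting layer, expressed as
# PostgreSQL-style SQL; role, schema, and user names are made up.
REPORTING_GRANTS = [
    "CREATE ROLE reporting_readers",
    "GRANT USAGE ON SCHEMA marts TO reporting_readers",
    "GRANT SELECT ON ALL TABLES IN SCHEMA marts TO reporting_readers",
    "GRANT reporting_readers TO analyst_alice",  # add an end user to the group
]
```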

7. Test and tune: All that remains is to test and tune the completed data warehouse and data
pipeline. Businesses should assess data ingestion and ETL/ELT systems, tweak query engine
configurations for performance, and validate final reports. This is a continuous process requiring
dedicated testing environments and ongoing engagement.
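Ongoing testing can start small, for example with automated checks comparing row counts across pipeline stages. A sketch follows, reusing the hypothetical table names from the earlier examples; the drop-ratio threshold is arbitrary:

```python
def validate_load(staging, warehouse, max_drop_ratio=0.05):
    """Run after each load: the fact table should retain most staged rows."""
    staged = staging.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
    loaded = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    assert loaded > 0, "fact_orders is empty after load"
    assert loaded >= staged * (1 - max_drop_ratio), (
        f"too many rows dropped: staged {staged}, loaded {loaded}")
```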
