02 - Intro To Data Warehousing
Creating Multidimensional Models
Software we will use in this course
• Microsoft SQL Server Express 2019
• Data Engine
• Integration Services
• SQL Server Management Studio
• Free download from Microsoft
• Tableau
• Free 1-year license: https://www.tableau.com/academic/students
• We will only use it in class, so you don’t need to download it if you don’t want to.
• Python 3.7
• https://www.anaconda.com/distribution/#download-section
Anaconda Download
OLTP Source Systems
• OLTP System Characteristics
• Processes real-time transactions of a business
• Contains data structures optimized for entries and edits
• Provides limited decision support capabilities
• OLTP Examples
• Order tracking
• Customer service
• Service-based sales
• Point-of-sales
• Banking functions
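The sketch below shows what "processes real-time transactions" means in practice. It uses Python's built-in sqlite3 module as a stand-in for SQL Server. The `inventory` and `orders` tables and the `process_sale` function are hypothetical illustrations, not the course schema. A sale is recorded and stock is decremented inside one transaction, so either both changes are saved or neither is.

```python
import sqlite3

def make_oltp_db():
    """Create a tiny, hypothetical normalized OLTP schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE inventory (product_id INTEGER PRIMARY KEY,
                                qty_on_hand INTEGER NOT NULL CHECK (qty_on_hand >= 0));
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY AUTOINCREMENT,
                             product_id INTEGER NOT NULL REFERENCES inventory(product_id),
                             quantity INTEGER NOT NULL);
        INSERT INTO inventory VALUES (1, 10), (2, 3);
    """)
    return conn

def process_sale(conn, product_id, quantity):
    """Record one sale and decrement stock atomically (a single OLTP transaction)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
                         (product_id, quantity))
            conn.execute("UPDATE inventory SET qty_on_hand = qty_on_hand - ? "
                         "WHERE product_id = ?", (quantity, product_id))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock; the order insert was rolled back too.
        return False

conn = make_oltp_db()
print(process_sale(conn, 1, 4))  # True: stock for product 1 drops from 10 to 6
print(process_sale(conn, 2, 5))  # False: only 3 on hand, so nothing is saved
```

Each call touches only a row or two and commits right away. This is the entry-and-edit workload that OLTP structures are optimized for.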
Data Warehouse Characteristics
• Provides Data for Business Analysis Processes
• Integrates Data from Heterogeneous Source Systems
• Combines Validated Source Data
• Organizes Data into Non-Volatile, Subject-Specific Groups
• Stores Data in Structures that Are Optimized for Extraction and Querying
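Data Warehouse System Components
• Data Sources
• Data Staging Area
• Data Warehouse
• Data Marts
• User Data Access
(Diagram: data flows from the sources through the staging area into the data warehouse and data marts, then out to users. The flow is grouped into two stages, Data Input and Data Access.)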
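The flow from sources through staging into the warehouse can be sketched in Python with only the standard library. Everything here is a hypothetical illustration:

- two made-up sources in different formats (CSV text and a list of dicts);
- a staging step where rows are cleaned and validated;
- an in-memory SQLite table standing in for the warehouse.

In this course the same job would be done with SQL Server Integration Services.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a CSV export from one system.
CSV_SOURCE = """customer_id,name,state
C1,Alfreds,CA
C2,  Bravo Foods ,or
C3,,NY
"""

# Source 2 (hypothetical): records from a second system with different field names.
API_SOURCE = [{"CustID": "C4", "CustName": "Delta Mart", "St": "ma"}]

def extract():
    """Pull rows from each source and map them to one common layout."""
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        yield {"customer_id": row["customer_id"], "name": row["name"], "state": row["state"]}
    for rec in API_SOURCE:
        yield {"customer_id": rec["CustID"], "name": rec["CustName"], "state": rec["St"]}

def stage(rows):
    """Staging area: clean each row and keep only rows that pass validation."""
    staged, rejected = [], []
    for row in rows:
        clean = {
            "customer_id": row["customer_id"].strip(),
            "name": row["name"].strip(),
            "state": row["state"].strip().upper(),
        }
        if clean["name"] and len(clean["state"]) == 2:
            staged.append(clean)
        else:
            rejected.append(row)
    return staged, rejected

def load(conn, staged):
    """Load validated rows into the warehouse, assigning surrogate keys."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer_dim "
                 "(customer_key INTEGER PRIMARY KEY, customer_id TEXT, name TEXT, state TEXT)")
    conn.executemany("INSERT INTO customer_dim (customer_id, name, state) "
                     "VALUES (:customer_id, :name, :state)", staged)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
staged, rejected = stage(extract())
load(warehouse, staged)
print(warehouse.execute("SELECT customer_id, name, state FROM customer_dim "
                        "ORDER BY customer_key").fetchall())
print(len(rejected))  # 1: C3 has no name, so it never reaches the warehouse
```

Validation happens in staging, not in the warehouse. This is what lets the warehouse hold only validated data combined from heterogeneous sources.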
OLAP Databases
• Optimized Schema for Fast User Queries
• Robust Calculation Engine for Numeric Analysis
• Conceptual, Intuitive Data Model
• Multidimensional View of Data
• Drill down and drill up
• Pivot views of data
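Drilling and pivoting re-aggregate the same facts at different levels. The minimal pure-Python sketch below assumes a made-up list of sales facts keyed by year, quarter, and product. An OLAP engine such as Analysis Services does this for you, usually from precomputed aggregates.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, product, amount)
FACTS = [
    (2019, "Q1", "Widget", 100), (2019, "Q2", "Widget", 150),
    (2019, "Q1", "Gadget", 80), (2020, "Q1", "Widget", 120),
    (2020, "Q2", "Gadget", 90), (2020, "Q2", "Widget", 60),
]
LEVELS = ("year", "quarter", "product")

def aggregate(facts, by):
    """Sum the amount measure grouped by the named levels."""
    idx = [LEVELS.index(level) for level in by]
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[i] for i in idx)] += fact[3]
    return dict(totals)

def pivot(facts, rows, cols):
    """Cross-tab view: one level on the rows, another on the columns."""
    table = defaultdict(dict)
    for (r, c), amount in aggregate(facts, (rows, cols)).items():
        table[r][c] = amount
    return dict(table)

print(aggregate(FACTS, ("year",)))            # drill up: {(2019,): 330, (2020,): 270}
print(aggregate(FACTS, ("year", "quarter")))  # drill down to quarters within each year
print(pivot(FACTS, "product", "year"))        # {'Widget': {2019: 250, 2020: 180}, 'Gadget': {2019: 80, 2020: 90}}
```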
Common OLAP Applications
• Executive Information Systems
  • Performance measures
  • Exception reporting
• Financial Applications
  • Reporting
  • Planning
  • Analysis
• Sales/Marketing Applications
  • Booking/billing
  • Product analysis
  • Customer analysis
• Operations Applications
  • Manufacturing
  • Customer service
  • Product cost
Relational Data Marts and OLAP Cubes

                  Relational Data Mart                          OLAP Cube
Data Storage      Relational data structure                     N-dimensional data structure
Data Content      Detailed and summarized data                  Summarized data
Data Sources      Relational and non-relational sources         Relational and non-relational sources
Data Retrieval    Fast performance for data extract queries     Faster performance for data extract queries
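The cube answers queries faster because it stores summarized data for combinations of dimensions ahead of time. A query then becomes a lookup instead of a scan of detailed rows. The minimal sketch below uses made-up facts and enumerates every subset of dimensions with `itertools.combinations`, similar in spirit to SQL's `GROUP BY CUBE`. Real OLAP engines choose which aggregations to materialize rather than always storing all of them.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "year")

# Hypothetical detailed fact rows, as they might sit in a relational data mart.
FACTS = [
    {"region": "West", "product": "Widget", "year": 2019, "amount": 100},
    {"region": "West", "product": "Gadget", "year": 2020, "amount": 40},
    {"region": "East", "product": "Widget", "year": 2019, "amount": 70},
    {"region": "East", "product": "Widget", "year": 2020, "amount": 30},
]

def build_cube(facts):
    """Precompute the summed measure for every subset of dimensions (2**3 group-bys)."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            totals = defaultdict(int)
            for fact in facts:
                totals[tuple(fact[d] for d in dims)] += fact["amount"]
            cube[dims] = dict(totals)
    return cube

def query(cube, **filters):
    """Answer a query by looking up a precomputed cell; no fact rows are scanned."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return cube[dims].get(tuple(filters[d] for d in dims), 0)

cube = build_cube(FACTS)
print(query(cube))                               # 240 (grand total)
print(query(cube, region="East"))                # 100
print(query(cube, product="Widget", year=2019))  # 170
```

The tradeoff is storage and load time: every new fact must be folded into many precomputed cells.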
OLAP in SQL Server 2017
• Microsoft Is One of Several OLAP Vendors
• Analysis Services Is Bundled with Certain Versions of Microsoft SQL Server
• Analysis Services Includes
• OLAP engine
• Data mining technology
• R and Python support
Understanding Data Warehouse Design
• The Star Schema
• Fact Table Components
• Dimension Table Characteristics
• The Snowflake Schema
The Star Schema
(Diagram: the Sales_Fact fact table in the center, linked by its key columns to five dimension tables.)
• Sales_Fact (fact table): TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Sales Amount, Unit Sales, ...
• Employee_Dim (a dimension table): EmployeeKey, EmployeeID, ...
• Time_Dim: TimeKey, TheDate, ...
• Product_Dim: ProductKey, ProductID, ...
• Customer_Dim: CustomerKey, CustomerID, ...
• Shipper_Dim: ShipperKey, ShipperID, ...
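The star schema can be written as DDL and queried with a star join. This sketch makes several simplifying assumptions:

- It uses SQLite through Python's sqlite3 module rather than SQL Server.
- It keeps only two of the five dimensions (Time_Dim and Product_Dim) for brevity.
- It spells the measures SalesAmount and UnitSales without spaces.
- It invents the columns TheYear and ProductName and all sample rows.

The remaining table and key names follow the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: a surrogate key plus descriptive attributes.
    CREATE TABLE Time_Dim (TimeKey INTEGER PRIMARY KEY, TheDate TEXT, TheYear INTEGER);
    CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT, ProductName TEXT);

    -- Fact table: a foreign key to each dimension plus numeric measures.
    CREATE TABLE Sales_Fact (
        TimeKey INTEGER REFERENCES Time_Dim(TimeKey),
        ProductKey INTEGER REFERENCES Product_Dim(ProductKey),
        SalesAmount REAL,
        UnitSales INTEGER
    );

    INSERT INTO Time_Dim VALUES (1, '2000-01-01', 2000), (2, '2001-01-01', 2001);
    INSERT INTO Product_Dim VALUES (10, 'P-10', 'Widget'), (20, 'P-20', 'Gadget');
    INSERT INTO Sales_Fact VALUES (1, 10, 50.0, 5), (1, 20, 30.0, 2),
                                  (2, 10, 70.0, 7), (2, 10, 10.0, 1);
""")

# Star join: the fact table joins directly to each dimension, then groups by attributes.
STAR_QUERY = """
    SELECT t.TheYear, p.ProductName, SUM(f.SalesAmount), SUM(f.UnitSales)
    FROM Sales_Fact AS f
    JOIN Time_Dim AS t ON f.TimeKey = t.TimeKey
    JOIN Product_Dim AS p ON f.ProductKey = p.ProductKey
    GROUP BY t.TheYear, p.ProductName
    ORDER BY t.TheYear, p.ProductName
"""
for row in conn.execute(STAR_QUERY):
    print(row)
```

Every dimension is exactly one join away from the fact table. This is what keeps star-schema queries simple and fast.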
Fact Table Components
(Diagram: the sales_fact table, with its foreign keys pointing to sample rows in the dimension tables.)
• sales_fact table
  • Foreign keys: customer_key, product_key, time_key
  • Measures: quantity_sales, amount_sales
• Dimension tables
  • customer_dim sample row: 201, ALFI, Alfreds
  • time_dim sample row: 134, 1/1/2000
• The grain of the sales_fact table is defined by the lowest level of detail stored in each dimension.
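The grain determines how detailed source rows are rolled up before they become fact rows. As a hypothetical illustration, assume time_dim stores one row per day. Several order lines for the same customer, product, and day then collapse into one sales_fact row, with the measures quantity_sales and amount_sales summed. The order lines and the date-to-time_key lookup are made up. Only customer_key 201 and time_key 134 (1/1/2000) come from the sample rows above.

```python
from collections import defaultdict

# Hypothetical source order lines: (customer_key, product_key, order_date, quantity, amount)
ORDER_LINES = [
    (201, 7, "2000-01-01", 2, 20.0),
    (201, 7, "2000-01-01", 1, 10.0),  # same customer, product, and day as the line above
    (201, 8, "2000-01-01", 5, 12.5),
    (202, 7, "2000-01-02", 3, 30.0),
]

# Hypothetical lookup from a calendar date to its time_dim surrogate key (day grain).
TIME_KEYS = {"2000-01-01": 134, "2000-01-02": 135}

def to_fact_rows(lines):
    """Roll order lines up to the sales_fact grain: one row per customer, product, and day."""
    facts = defaultdict(lambda: [0, 0.0])
    for customer_key, product_key, order_date, quantity, amount in lines:
        key = (customer_key, product_key, TIME_KEYS[order_date])  # the foreign keys
        facts[key][0] += quantity  # quantity_sales measure
        facts[key][1] += amount    # amount_sales measure
    return {key: tuple(measures) for key, measures in facts.items()}

for (customer_key, product_key, time_key), (qty, amt) in to_fact_rows(ORDER_LINES).items():
    print(customer_key, product_key, time_key, qty, amt)
```

If time_dim stored hours instead of days, the same order lines would produce more, finer-grained fact rows.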
Dimension Table Characteristics
(Diagram: the same geographic hierarchy shown as a tree and as dimension tables.)
Hierarchy
West: CA, OR
East: MA, NY

REGION
West
East

STATE   REGION
CA      West
OR      West
MA      East
NY      East
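The STATE/REGION table stores the hierarchy by repeating the parent value on every child row. The short sketch below uses the rows from the slide. It rebuilds the tree view from those flat rows, and it rolls a measure up from state to region through the REGION column. The sales figures are made up for illustration.

```python
from collections import defaultdict

# The STATE/REGION dimension rows from the slide.
STATE_DIM = [("CA", "West"), ("OR", "West"), ("MA", "East"), ("NY", "East")]

# Hypothetical sales by state, used only to demonstrate a roll-up.
SALES_BY_STATE = {"CA": 500, "OR": 120, "MA": 300, "NY": 450}

def build_tree(rows):
    """Group each state under its region, preserving row order."""
    tree = defaultdict(list)
    for state, region in rows:
        tree[region].append(state)
    return dict(tree)

def roll_up(rows, measure):
    """Sum a state-level measure to the region level via the REGION column."""
    totals = defaultdict(int)
    for state, region in rows:
        totals[region] += measure.get(state, 0)
    return dict(totals)

print(build_tree(STATE_DIM))               # {'West': ['CA', 'OR'], 'East': ['MA', 'NY']}
print(roll_up(STATE_DIM, SALES_BY_STATE))  # {'West': 620, 'East': 750}
```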
Dimension Fundamentals
Dimension Family Relationships
(Diagram: the hierarchy below.)
USA
  North West
    Oregon
    Washington
  South West
    California
• USA is the parent of North West and South West
• North West and South West are children of USA
• North West and California are descendants of USA
• North West and USA are ancestors of Washington
• North West and South West are siblings
• Oregon and California are cousins
• All are dimension members
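The family terms can all be defined precisely from a single parent lookup. The minimal sketch below assumes each member is stored with a reference to its parent (a parent-child hierarchy) and that member names are unique. The hierarchy is the one on the slide.

```python
# Child -> parent for the slide's hierarchy; USA is the root (it has no parent).
PARENT = {
    "North West": "USA", "South West": "USA",
    "Oregon": "North West", "Washington": "North West",
    "California": "South West",
}

def children(member):
    """Members whose parent is the given member."""
    return [m for m, p in PARENT.items() if p == member]

def ancestors(member):
    """Parent, grandparent, and so on up to the root."""
    result = []
    while member in PARENT:
        member = PARENT[member]
        result.append(member)
    return result

def descendants(member):
    """Children, grandchildren, and so on (depth-first)."""
    result = []
    for child in children(member):
        result.append(child)
        result.extend(descendants(child))
    return result

def are_siblings(a, b):
    """Different members that share the same parent."""
    return a != b and a in PARENT and PARENT.get(a) == PARENT.get(b)

def are_cousins(a, b):
    """Members whose parents are siblings."""
    return a in PARENT and b in PARENT and are_siblings(PARENT[a], PARENT[b])

print(children("USA"))                           # ['North West', 'South West']
print(ancestors("Washington"))                   # ['North West', 'USA']
print(descendants("USA"))                        # ['North West', 'Oregon', 'Washington', 'South West', 'California']
print(are_siblings("North West", "South West"))  # True
print(are_cousins("Oregon", "California"))       # True
```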
Lab
Lab Scenario
• Entities
  • Vendor
  • Customer
  • Product
• OLTP Use Cases
  • Order products from Vendors
  • Process Customer Sales
• Reporting Use Cases
  • Vendor Report
  • Product Report
  • Inventory Velocity Report
  • Average Inventory on Hand Report
• Activities
  • Create relational design
  • Create multidimensional design