02 - Intro To Data Warehousing

Uploaded by

eric8y

Intro to Data Warehousing
Creating Multidimensional Models
Software we will use in this course
• Microsoft SQL Server Express 2019
• Data Engine
• Integration Services
• SQL Server Management Studio
• Free download from Microsoft

• Tableau
  • Free 1-year license: https://www.tableau.com/academic/students
  • We will only use it in class, so you don’t need to download it if you don’t want to.
• Python 3.7
  • https://www.anaconda.com/distribution/#download-section

Anaconda Download

OLTP Source Systems
• OLTP System Characteristics
  • Processes real-time transactions of a business
  • Contains data structures optimized for entries and edits
  • Provides limited decision support capabilities
• OLTP Examples
  • Order tracking
  • Customer service
  • Point-of-sale
  • Service-based sales
  • Banking functions
Data Warehouse Characteristics
• Provides Data for Business Analysis Processes
• Integrates Data from Heterogeneous Source Systems
• Combines Validated Source Data
• Organizes Data into Non-Volatile, Subject-Specific Groups
• Stores Data in Structures that Are Optimized for Extraction and Querying
Data Warehouse System Components
[Diagram: Data Sources, Staging Area, Data Warehouse, Data Marts, and User Data Access, shown as a flow with two labeled phases, Data Input and Data Access.]
OLAP Databases
• Optimized Schema for Fast User Queries
• Robust Calculation Engine for Numeric Analysis
• Conceptual, Intuitive Data Model
• Multidimensional View of Data
  • Drill down and drill up
  • Pivot views of data
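Drill up, drill down, and pivot can be sketched in a few lines of plain Python. This is only an illustration of the ideas, not how an OLAP engine is implemented; the fact rows, the `LEVELS` mapping, and the `rollup` and `pivot` helpers are all made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, state, year, sales amount).
facts = [
    ("West", "CA", 2019, 100),
    ("West", "OR", 2019, 40),
    ("East", "NY", 2019, 70),
    ("West", "CA", 2020, 120),
    ("East", "MA", 2020, 30),
]

# Position of each attribute inside a fact row.
LEVELS = {"region": 0, "state": 1, "year": 2}


def rollup(rows, *levels):
    """Sum the sales measure, grouped by the named attributes."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[LEVELS[name]] for name in levels)
        totals[key] += row[3]
    return dict(totals)


def pivot(rows, row_level, col_level):
    """Cross-tabulate the measure: row members down, column members across."""
    table = defaultdict(dict)
    for (r, c), total in rollup(rows, row_level, col_level).items():
        table[r][c] = total
    return dict(table)


by_region = rollup(facts, "region")          # drilled up: one total per region
by_state = rollup(facts, "region", "state")  # drilled down: one total per state
region_by_year = pivot(facts, "region", "year")

print(by_region)
print(by_state)
print(region_by_year)
```

Drilling down adds a lower level to the grouping key, drilling up removes it, and a pivot moves one attribute from the rows to the columns.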
Common OLAP Applications
• Executive Information Systems
  • Performance measures
  • Exception reporting
• Sales/Marketing Applications
  • Booking/billing
  • Product analysis
  • Customer analysis
• Financial Applications
  • Reporting
  • Planning
  • Analysis
• Operations Applications
  • Manufacturing
  • Customer service
  • Product cost
Relational Data Marts and OLAP Cubes
                 Relational Data Mart                                OLAP Cube
Data Storage     Relational data structure                           N-dimensional data structure
Data Content     Detailed and summarized data                        Summarized data
Data Sources     Relational and non-relational sources               Relational and non-relational sources
Data Retrieval   Fast performance for data extract queries           Faster performance for data extract queries
OLAP in SQL Server 2017
• Microsoft Is One of Several OLAP Vendors
• Analysis Services Is Bundled with Certain Versions of Microsoft SQL Server
• Analysis Services Includes
  • OLAP engine
  • Data mining technology
  • R and Python support
Understanding Data Warehouse Design
• The Star Schema
• Fact Table Components
• Dimension Table Characteristics
• The Snowflake Schema
The Star Schema
[Diagram: the fact table Sales_Fact sits at the center and joins to five surrounding dimension tables through their keys.]
• Sales_Fact (fact table): TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Sales Amount, Unit Sales, ...
• Employee_Dim (dimension table): EmployeeKey, EmployeeID, ...
• Time_Dim (dimension table): TimeKey, TheDate, ...
• Product_Dim (dimension table): ProductKey, ProductID, ...
• Shipper_Dim (dimension table): ShipperKey, ShipperID, ...
• Customer_Dim (dimension table): CustomerKey, CustomerID, ...
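The star schema on this slide can be sketched with Python's built-in sqlite3 module, used here in place of SQL Server so the example runs anywhere. The table and key names come from the slide; the column types, the `SalesAmount` and `UnitSales` spellings, and every data row are assumptions made for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Dimension tables: a surrogate key plus the source system's business ID.
CREATE TABLE Time_Dim     (TimeKey     INTEGER PRIMARY KEY, TheDate    TEXT);
CREATE TABLE Employee_Dim (EmployeeKey INTEGER PRIMARY KEY, EmployeeID TEXT);
CREATE TABLE Product_Dim  (ProductKey  INTEGER PRIMARY KEY, ProductID  TEXT);
CREATE TABLE Customer_Dim (CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT);
CREATE TABLE Shipper_Dim  (ShipperKey  INTEGER PRIMARY KEY, ShipperID  TEXT);

-- Fact table: one foreign key per dimension, followed by the measures.
CREATE TABLE Sales_Fact (
    TimeKey     INTEGER REFERENCES Time_Dim(TimeKey),
    EmployeeKey INTEGER REFERENCES Employee_Dim(EmployeeKey),
    ProductKey  INTEGER REFERENCES Product_Dim(ProductKey),
    CustomerKey INTEGER REFERENCES Customer_Dim(CustomerKey),
    ShipperKey  INTEGER REFERENCES Shipper_Dim(ShipperKey),
    SalesAmount REAL,
    UnitSales   INTEGER
);
""")

# Made-up rows; only the time and product dimensions are populated here.
conn.execute("INSERT INTO Time_Dim VALUES (1, '2000-01-01')")
conn.execute("INSERT INTO Product_Dim VALUES (1, 'P-1'), (2, 'P-2')")
conn.executemany(
    "INSERT INTO Sales_Fact VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, None, 1, None, None, 100.0, 4),
        (1, None, 1, None, None, 50.0, 2),
        (1, None, 2, None, None, 30.0, 1),
    ],
)

# A typical star-schema query: join the fact table to a dimension, then aggregate.
sales_by_product = conn.execute("""
    SELECT p.ProductID, SUM(f.SalesAmount), SUM(f.UnitSales)
    FROM Sales_Fact AS f
    JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductID
    ORDER BY p.ProductID
""").fetchall()
print(sales_by_product)
```

Every analytical query follows the same shape: start from the fact table, join out one step to whichever dimensions supply the labels, and aggregate the measures.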
Fact Table Components
[Diagram: one sales_fact row linked to the matching row in each dimension table.]
• sales_fact table
  • Foreign keys: customer_key, product_key, time_key
  • Measures: quantity_sales, amount_sales
  • Sample row: 201, 25, 134, 400, 10,789
• Dimension tables
  • customer_dim: 201, ALFI, Alfreds
  • product_dim: 25, 123, Chai
  • time_dim: 134, 1/1/2000
• The grain of the sales_fact table is defined by the lowest level of detail stored in each dimension
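To make the sample row concrete, the sketch below loads it into SQLite and resolves each foreign key against its dimension. The fact table column names and all values come from the slide; the dimension attribute names (`customer_id`, `customer_name`, `product_id`, `product_name`, `the_date`) are guesses, since the slide shows only the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension attribute names are assumed; the slide shows only sample values.
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY,
                           customer_id TEXT, customer_name TEXT);
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY,
                           product_id INTEGER, product_name TEXT);
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, the_date TEXT);

-- Fact table: three foreign keys, then two measures.
CREATE TABLE sales_fact (
    customer_key   INTEGER REFERENCES customer_dim(customer_key),
    product_key    INTEGER REFERENCES product_dim(product_key),
    time_key       INTEGER REFERENCES time_dim(time_key),
    quantity_sales INTEGER,
    amount_sales   REAL
);

INSERT INTO customer_dim VALUES (201, 'ALFI', 'Alfreds');
INSERT INTO product_dim  VALUES (25, 123, 'Chai');
INSERT INTO time_dim     VALUES (134, '1/1/2000');
INSERT INTO sales_fact   VALUES (201, 25, 134, 400, 10789);
""")

# Replace each key in the fact row with a readable attribute from its dimension.
row = conn.execute("""
    SELECT c.customer_name, p.product_name, t.the_date,
           f.quantity_sales, f.amount_sales
    FROM sales_fact AS f
    JOIN customer_dim AS c ON c.customer_key = f.customer_key
    JOIN product_dim  AS p ON p.product_key  = f.product_key
    JOIN time_dim     AS t ON t.time_key     = f.time_key
""").fetchone()
print(row)
```

Read this way, the fact row describes one customer buying one product on one day. That combination is the grain: sales_fact cannot hold anything finer than the lowest level each dimension stores.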
Dimension Table Characteristics
• Describes Business Entities
• Contains Attributes That Provide Context to Numeric Data
• Presents Data Organized into Hierarchies
The Snowflake Schema
• Defines Hierarchies by Using Multiple Dimension Tables
• Is More Normalized than a Single Table Dimension
• Is Supported within Analysis Services
• Is Generally a REALLY bad idea because the extra joins between dimension tables hurt query performance
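The difference between the two designs shows up in the queries. The sketch below builds the same product hierarchy twice in SQLite: once snowflaked across two tables, once flattened into a single star dimension. All table names, column names, and rows are hypothetical, and a dataset this small says nothing about real timings; the point is only the extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the product hierarchy is split across two dimension tables.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_snow (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER REFERENCES category_dim(category_key));

-- Star: the same hierarchy flattened into one dimension table.
CREATE TABLE product_star (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           category_name TEXT);

CREATE TABLE sales_fact (product_key INTEGER, amount_sales REAL);

INSERT INTO category_dim VALUES (1, 'Beverages'), (2, 'Condiments');
INSERT INTO product_snow VALUES (10, 'Chai', 1), (11, 'Chang', 1),
                                (12, 'Aniseed Syrup', 2);
INSERT INTO product_star
    SELECT p.product_key, p.product_name, c.category_name
    FROM product_snow AS p JOIN category_dim AS c USING (category_key);
INSERT INTO sales_fact VALUES (10, 100.0), (11, 50.0), (12, 25.0);
""")

# Snowflake query: two joins to reach the category name.
snowflake_totals = conn.execute("""
    SELECT c.category_name, SUM(f.amount_sales)
    FROM sales_fact AS f
    JOIN product_snow AS p ON p.product_key = f.product_key
    JOIN category_dim AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

# Star query: one join, because the category lives on the product row.
star_totals = conn.execute("""
    SELECT p.category_name, SUM(f.amount_sales)
    FROM sales_fact AS f
    JOIN product_star AS p ON p.product_key = f.product_key
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

assert snowflake_totals == star_totals  # same answer, fewer joins in the star
print(star_totals)
```

Each additional hierarchy level in a snowflake adds another join to every query that groups by that level, which is where the performance cost comes from.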
OLAP Dimensions vs. Relational Dimensions
OLAP (one hierarchy)
REGION
  West
    CA
    OR
  East
    MA
    NY
Relational (two tables)
REGION
West
East

STATE   REGION
CA      West
OR      West
MA      East
NY      East
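The two shapes hold the same information. As a rough illustration, the STATE rows from the relational side can be folded into the nested hierarchy shown on the OLAP side; the variable names here are made up, and a plain dictionary stands in for whatever structure an OLAP engine actually uses.

```python
# Relational form: one row per state, with its parent region as a column.
state_rows = [("CA", "West"), ("OR", "West"), ("MA", "East"), ("NY", "East")]

# OLAP form: a single hierarchy, each region holding its list of states.
hierarchy = {}
for state, region in state_rows:
    hierarchy.setdefault(region, []).append(state)

# Print the hierarchy as an indented outline, parents above children.
for region, states in hierarchy.items():
    print(region)
    for state in states:
        print("  " + state)
```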
Dimension Fundamentals
Dimension Family Relationships
USA
  North West
    Oregon
    Washington
  South West
    California
• USA is the parent of North West and South West
• North West and South West are children of USA
• North West and California are descendants of USA
• North West and USA are ancestors of Washington
• North West and South West are siblings
• Oregon and California are cousins
• All are dimension members
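The family terms can be checked mechanically against the hierarchy on the slide. The sketch below stores child-to-parent links in a dictionary; the helper functions are hypothetical, and `cousins` uses a simplified definition (different parents that are themselves siblings), which is an assumption rather than the exact rule any particular OLAP product applies.

```python
# Child -> parent links for the hierarchy on the slide. USA has no parent.
PARENT = {
    "North West": "USA",
    "South West": "USA",
    "Oregon": "North West",
    "Washington": "North West",
    "California": "South West",
}


def children(member):
    """Members whose direct parent is `member`, in alphabetical order."""
    return sorted(c for c, p in PARENT.items() if p == member)


def ancestors(member):
    """Every member above `member`, nearest first."""
    found = []
    while member in PARENT:
        member = PARENT[member]
        found.append(member)
    return found


def descendants(member):
    """Every member below `member`, at any depth."""
    found = []
    for child in children(member):
        found.append(child)
        found.extend(descendants(child))
    return found


def siblings(a, b):
    """Two different members that share the same parent."""
    return a != b and PARENT.get(a) is not None and PARENT.get(a) == PARENT.get(b)


def cousins(a, b):
    """Simplified: the members have different parents, and those parents are siblings."""
    pa, pb = PARENT.get(a), PARENT.get(b)
    return pa is not None and pb is not None and siblings(pa, pb)


print(children("USA"))
print(ancestors("Washington"))
print(descendants("USA"))
print(siblings("North West", "South West"))
print(cousins("Oregon", "California"))
```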
Lab
Lab Scenario
• Entities
  • Vendor
  • Customer
  • Product
• OLTP Use Cases
  • Order products from Vendors
  • Process Customer Sales
• Reporting Use Cases
  • Vendor Report
  • Product Report
  • Inventory Velocity Report
  • Average Inventory on Hand Report
• Activities
  • Create relational design
  • Create multidimensional design
