Lecture 3:Introduction to Dimensional Modelling.pptx

Lecture 3
Introduction to Dimensional Modeling

01/27/2025
Topics to cover
• Understanding key concepts in dimensional
modeling
• Importance of dimensional modeling
• Dimensional modeling vs ER modeling
• Types of dimensional models
• A DWH as a Multidimensional Model
IS 301 2

Data Warehouse Design
• Designing the data warehouse is a key issue in
the DWH process.
• Although a DWH can be designed by entity
relationship modeling, many DWH experts,
including Kimball et al., prefer dimensional
modeling.

Dimensional Modeling
Dimensional modeling is the process and outcome of designing
logical database schemas created to support OLAP and data warehousing
solutions.
It is especially useful for summarizing and rearranging the data and
presenting views of the data to support data analysis.
• Dimension: A category of information, for example, the Time dimension.
• Attribute: A unique level within a dimension, for example, Month is an
attribute in the Time dimension.
• Hierarchy: The specification of levels that represents the relationship between
different attributes within a dimension. For example, one possible hierarchy in
the Time dimension is Year → Quarter → Month → Day.
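As a minimal sketch of the terms dimension, attribute, and hierarchy, the Python snippet below derives the attributes of one Time dimension member from a calendar date and walks them in hierarchy order. The helper name `time_attributes` and the dictionary layout are illustrative assumptions, not part of the lecture.

```python
from datetime import date

# Hierarchy of the Time dimension, from the most summarized level to the most detailed.
TIME_HIERARCHY = ["year", "quarter", "month", "day"]

def time_attributes(d: date) -> dict:
    """Return the attributes of one Time dimension member (hypothetical helper)."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, and so on
        "month": d.month,
        "day": d.day,
    }

member = time_attributes(date(2025, 1, 27))
# Walk the hierarchy Year -> Quarter -> Month -> Day for this member.
path = [member[level] for level in TIME_HIERARCHY]
print(path)
```

Each key of the dictionary is an attribute, and the ordered list `TIME_HIERARCHY` is the hierarchy that relates them.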

Issues to note:
1. Dimensions and hierarchies are represented by
dimension tables.
2. Attributes are the non-key columns in the
dimension tables.
3. Fact tables connect to one or more dimension
tables, but fact tables do not have direct relationships
to one another.

Dimensions and the attributes in their hierarchy (figure):
• Time: Year → Quarter → Month
• Locations: Country → District → Village
Measured facts: annual sales amounts per village

Some examples:
– The analysis of a product's sales to a customer during the last six
months has three dimensions: customer, product, and time.
– The analysis of a product's sales to a customer in a district
during the last six months has four dimensions: customer, product,
region, and time.
• As a rule, time is always one of the
dimensions.

Data Granularity
• The grain defines the level of detail of a single record
in the fact table.
• The more detail there is in the fact table, the higher
its granularity and vice versa.

Data Granularity example
• A fact table contains three metrics (Unit Price, Units Sold and Total Sale
Amount).
– The Time dimension consists of four hierarchical elements (Year,
Quarter, Month and Day).
– The Organization dimension consists of three hierarchical elements
(Region, District and Store).
– The Product dimension consists of two hierarchical elements
(Product Family and SKU(Stock Keeping Unit)).
• The highest granularity at which we can store Sales metrics is
Day/Store/SKU (i.e., the lowest level in each dimensional hierarchy).
• Conversely, the lowest granularity to which we can aggregate Sales metrics
in this data mart is Year/Region/Product Family (i.e., the highest
level in each dimensional hierarchy).
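The two grains in this example can be sketched in Python as follows. The sample rows and the function name `roll_up` are made up for illustration. Unit Price is left out of the aggregation because a price cannot be summed meaningfully.

```python
from collections import defaultdict

# Fact rows at the finest grain: one row per Day / Store / SKU (made-up sample data).
# Columns: year, quarter, month, day, region, district, store, family, sku, units_sold, total_sale
facts = [
    (2025, 1, 1, 27, "North", "D1", "S1", "Drinks", "SKU-1", 2, 10.0),
    (2025, 1, 1, 28, "North", "D1", "S1", "Drinks", "SKU-2", 1, 4.0),
    (2025, 1, 2, 3, "North", "D2", "S2", "Snacks", "SKU-3", 5, 15.0),
    (2025, 2, 4, 9, "South", "D3", "S3", "Drinks", "SKU-1", 3, 15.0),
]

def roll_up(rows):
    """Aggregate to the coarsest grain: Year / Region / Product Family."""
    totals = defaultdict(lambda: [0, 0.0])
    for year, _q, _m, _d, region, _dist, _store, family, _sku, units, amount in rows:
        totals[(year, region, family)][0] += units
        totals[(year, region, family)][1] += amount
    return {key: tuple(value) for key, value in totals.items()}

summary = roll_up(facts)
for key, (units, amount) in sorted(summary.items()):
    print(key, units, amount)
```

Storing the data at Day/Store/SKU keeps every intermediate grain (for example Month/District/SKU) available, because a coarser grain can always be computed from a finer one but not the reverse.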

Benefits of using Dimensional Modeling
1. Logical model is easy to understand
• Provides a predictable and standard framework for end user apps. Report
writers, query tools, and user interfaces can all make strong assumptions about the
dimensional model to make the user interfaces more understandable.
• Model can be done (mostly) independent of expected queries since it withstands
unexpected changes in user behavior
• Handle changes easily – such as adding new dimensional attributes since there is
no need to reload data and no need to reprogram query tools
2. Optimized for performance
• High performance “browsing” across the attributes
• Strategy for handling aggregates, i.e., summary records that are logically redundant
with base data already in the data warehouse but are used to enhance query
performance.
• OLAP engines can make processing more efficient
3. Historical tracking of information
– Strategies for handling changing dimensions
– Fact design allows high-volume snapshots and transaction tracking
Dimension Tables

Dimensional Model vs ER model
• The key to understanding the relationship
between DM and ER is that a single ER diagram
breaks down into multiple DM diagrams, or ‘stars’.
• Think of a large ER diagram as representing every
possible business process within an application.
The ER diagram may have Sales Calls, Order
Entries, Shipment Invoices, Customer Payments,
and Product Returns, all on the same diagram.

• To create the individual ‘stars’ that exist within an
application:
– Look for many-to-many relationships in the ER model
containing numeric and additive facts and designate
them as fact tables.
– Alternatively, look for ‘events’ or ‘transactions’ – these
may also be facts
– De-normalize all of the remaining tables into flat tables
with single-part keys that connect directly to the fact
tables. These tables become the dimension tables.
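The de-normalization step can be sketched as follows. This is a minimal illustration that assumes a normalized chain product → product type → product group; the table contents and the function name `build_product_dimension` are hypothetical.

```python
# Normalized ER-style tables (made-up sample rows): product -> product_type -> product_group.
product_group = {1: "Electronics"}
product_type = {10: {"name": "Television", "group_id": 1}}
product = {100: {"name": "TV-42", "type_id": 10}}

def build_product_dimension():
    """Flatten the three normalized tables into one dimension table with a single-part key."""
    rows = []
    for product_key, p in product.items():
        t = product_type[p["type_id"]]
        rows.append({
            "product_key": product_key,  # single-part key referenced by the fact table
            "product_name": p["name"],
            "product_type": t["name"],
            "product_group": product_group[t["group_id"]],
        })
    return rows

product_dimension = build_product_dimension()
print(product_dimension)
```

The flat row repeats the type and group names for every product, which is the redundancy a dimension table accepts in exchange for a single join to the fact table.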

[Figure: the ER diagram broken down into separate stars: Shipments, Returns, Sales Contact, Orders, Payments]
ERD versus DM
[Figure: an ERD with the entities Order, Customer, Order-line, Product, Product-type, Product-group, and Region, shown beside a dimensional model with the tables Order_fact, Product_dimension, Customer_dimension, and Time_dimension]

• ER models are not appropriate for Data
Warehouses. ER modeling does not really
model a business; rather, it models the micro
relationships among data elements.
• ER models are wildly variable in structure. As
such, it is extremely difficult to optimize query
performance.

ER Modeling vs Dimensional modeling
Relational DM | Dimensional DM
1. Data is stored in an RDBMS. | 1. Data is stored in an RDBMS or in multidimensional databases.
2. Tables are units of storage. | 2. Cubes are units of storage.
3. Data is normalized and used for OLTP. Optimized for OLTP processing. | 3. Data is denormalized and used in data warehouses and data marts. Optimized for OLAP.
4. Several tables and chains of relationships among them. | 4. Few tables; fact tables are connected to dimension tables.
5. Volatile (several updates). | 5. Non-volatile.
6. User is usually constrained by an application that understands the data design. Users are typically operations staff. | 6. The simpler data design makes it easier for users to analyze data in any way they choose. Users are typically analysts, company strategists, or even executives.

ER Modeling vs Dimensional modeling (continued)
Relational DM | Dimensional DM
7. SQL is used to manipulate data. | 7. MDX is used to manipulate data.
8. Detailed level of transactional data. | 8. Summary of bulky transactional data (aggregates and measures) used in business decisions.
9. Normal reports. | 9. User-friendly, interactive, drag-and-drop multidimensional OLAP reports.
10. Typical data design used for business transaction systems. | 10. Data design used for analysis systems.
11. Goal: reduce every piece of information to its simplest form, such as a debit transaction, a customer record, or an address. | 11. Goal: break up information into 'Facts' (things a company measures) and 'Dimensions' (how we measure them: by time, region, or customer).
12. Suited for concurrent handling of many small transactions by many users. Only a limited amount of data history is normally kept. | 12. Suited for reading or analyzing large amounts of data by a modest number of users. Many years of data history may be kept.

What is a Dimensional Model?
• A dimensional model is a star schema that contains
two types of tables: fact tables and dimension tables.
1. Fact table (quantitative) – the primary table
in a dimensional model, where the numerical performance
measurements of the business are stored, i.e., attributes that are
numeric and additive. Example: quantity sold, dollar sales
amount.
2. Dimension table (descriptive) – a table that contains the
textual descriptors of the business. Example: product and
brand descriptions.

Fact Table
• A fact table is a table that contains the measures of interest. For example, sales
amount would be such a measure.
• This measure is stored in the fact table with the appropriate granularity. For
example, it can be sales amount by store by day. In this case, the fact table would
contain three columns: a date column, a store column, and a sales amount
column.
• Numeric
• Additive – across dimensions. Hundreds and thousands of records are fetched
from the database, and the useful thing to do with so many records is to add them
up.
• Primary keys of dimensions (surrogate keys) become foreign keys in the fact
table.
• Consider rolling summaries
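The "sales amount by store by day" fact table can be sketched with Python's built-in `sqlite3` module. The table names (`date_dim`, `store_dim`, `sales_fact`), column names, and sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim  (date_key  INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT);
-- Grain: one row per store per day. The dimension primary keys appear as foreign keys.
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    sales_amount REAL
);
INSERT INTO date_dim  VALUES (1, '2025-01-27'), (2, '2025-01-28');
INSERT INTO store_dim VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO sales_fact VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0);
""")

# Additive measure: summing across the date dimension gives a total per store.
per_store = conn.execute("""
    SELECT s.store_name, SUM(f.sales_amount)
    FROM sales_fact f JOIN store_dim s ON s.store_key = f.store_key
    GROUP BY s.store_name ORDER BY s.store_name
""").fetchall()
print(per_store)
```

Because `sales_amount` is additive, the same fact rows can be summed along any dimension without changing the schema.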

Rolling summary
[Figure: rolling summary in which Day 1 to Day 7 roll up into Week 1 to Week 4, weeks into Month 1 to Month 12, and months into Qrt 1 to Qrt 4]

Dimensional Table
• A dimensional table provides the detailed information about the
attributes. For example, the dimensional table for the Quarter
attribute would include a list of all of the quarters available in the
data warehouse.
• Each row (each quarter) may have several fields: one for the
unique ID that identifies the quarter, and one or more additional
fields that specify how that particular quarter is represented on a
report (for example, the first quarter of 2001 may be represented as
"Q1 2001" or "2001 Q1").

Types of dimensional models
There are three basic types of dimensional models:
1. Star model
2. Snowflake model
3. Fact constellation model

Star model
• A single object (the fact table) sits in the middle and is radially
connected to other surrounding objects (dimension tables), like
a star.
• Each dimension is represented as a single table. The primary key
in each dimension table is related to a foreign key in the fact
table.

Example of Star Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_street, country
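A minimal sketch of this star schema using Python's built-in `sqlite3` module follows. The column names come from the slide; the `_dim` and `_fact` table suffixes, the column types, and all sample rows are assumptions. The derived measure `avg_sales` is left out of the stored table here and would be computed at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                           month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                           type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           province_or_street TEXT, country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 27, 'Monday', 1, 'Q1', 2025), (2, 14, 'Monday', 4, 'Q2', 2025);
INSERT INTO item_dim VALUES (1, 'TV-42', 'Acme', 'TV', 'wholesale'),
                            (2, 'PC-1', 'Acme', 'PC', 'wholesale');
INSERT INTO branch_dim VALUES (1, 'Main', 'retail');
INSERT INTO location_dim VALUES (1, '1 Main St', 'Kampala', 'Central', 'Uganda'),
                                (2, '2 High St', 'Nairobi', 'Nairobi', 'Kenya');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 2, 800.0), (1, 2, 1, 1, 1, 500.0),
                              (2, 1, 1, 2, 3, 1200.0);
""")

# Every dimension is exactly one join away from the fact table.
by_quarter_country = conn.execute("""
    SELECT t.quarter, l.country, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t     ON t.time_key = f.time_key
    JOIN location_dim l ON l.location_key = f.location_key
    GROUP BY t.quarter, l.country
    ORDER BY t.quarter, l.country
""").fetchall()
print(by_quarter_country)
```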

Star Schema with Sample Data
[Figure: star schema with sample data; only the key values 001, 002, and 003 survive from the image]

Relationship of a Star Schema model to a
Report
• A report answers the questions what, when, by whom, and to
whom.
– Results are obtained by joining one or more
dimension tables with the fact table.
• Example
– The Marketing Dept wants to know the quantity
and order amount of PCs sold to
married customers by sales
persons in the Makerere region in the month of
March.

Relationship of a Star Schema model to a
Report
Order Facts Table: Product Key (FK), Time Key (FK), Customer Key (FK), Sales Person Key (FK), Order Shillings, Cost Shillings, Margin Shillings, Quantity
Product Dimension Table: Product Key (PK), Product Name, Product Code, Product Line, Brand
Sales Person Dimension Table: Sales Person Key (PK), Sales Person Name, Territory Name, Region Name
Customer Dimension Table: Customer Key (PK), Customer Name, Customer Code, Marital Status, Address, Town
Time Dimension Table: Time Key (PK), Date, Month, Quarter, Year
Report filters: Product Name = PCs, Month = March, Marital Status = Married, Region Name = Makerere
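The Marketing Dept report can be sketched as a single query that joins the fact table to its four dimension tables and filters on one attribute from each. The snippet uses Python's built-in `sqlite3` module; the snake_case table and column names are adapted from the slide, only the columns needed by the report are created, and every data row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim      (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE time_dim         (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE customer_dim     (customer_key INTEGER PRIMARY KEY, customer_name TEXT,
                               marital_status TEXT);
CREATE TABLE sales_person_dim (sales_person_key INTEGER PRIMARY KEY, sales_person_name TEXT,
                               region_name TEXT);
CREATE TABLE order_facts (
    product_key INTEGER, time_key INTEGER, customer_key INTEGER, sales_person_key INTEGER,
    order_shillings REAL, quantity INTEGER
);
INSERT INTO product_dim VALUES (1, 'PCs'), (2, 'Printers');
INSERT INTO time_dim VALUES (1, 'March', 2025), (2, 'April', 2025);
INSERT INTO customer_dim VALUES (1, 'Customer A', 'Married'), (2, 'Customer B', 'Single');
INSERT INTO sales_person_dim VALUES (1, 'Seller X', 'Makerere'), (2, 'Seller Y', 'Other');
INSERT INTO order_facts VALUES
    (1, 1, 1, 1, 3000000.0, 2),   -- matches all four filters
    (1, 1, 1, 1, 1500000.0, 1),   -- matches all four filters
    (1, 1, 2, 1, 1500000.0, 1),   -- customer is not married
    (1, 2, 1, 1, 1500000.0, 1),   -- wrong month
    (2, 1, 1, 1,  400000.0, 1),   -- wrong product
    (1, 1, 1, 2, 1500000.0, 1);   -- wrong region
""")

report = conn.execute("""
    SELECT SUM(f.quantity), SUM(f.order_shillings)
    FROM order_facts f
    JOIN product_dim p      ON p.product_key = f.product_key
    JOIN time_dim t         ON t.time_key = f.time_key
    JOIN customer_dim c     ON c.customer_key = f.customer_key
    JOIN sales_person_dim s ON s.sales_person_key = f.sales_person_key
    WHERE p.product_name = 'PCs' AND t.month = 'March'
      AND c.marital_status = 'Married' AND s.region_name = 'Makerere'
""").fetchone()
print(report)
```

The filters live on dimension attributes, while the numbers being summed live in the fact table, which is the division of labour the slide describes.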

Benefits of a Star model
1. Easy to understand
2. Easy to define hierarchies
3. Reduces number of physical joins
4. Low maintenance
5. Very simple metadata

Snowflake model
A refinement of the star schema in which some dimensional
hierarchies are normalized into a set of smaller dimension
tables, forming a shape similar to a snowflake.

Example of Snowflake Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, province_or_street, country
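The practical effect of snowflaking the location dimension can be sketched as below, using Python's built-in `sqlite3` module. The `_dim` and `_fact` suffixes and the sample rows are assumptions, and only the location branch of the schema is reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim     (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact   (location_key INTEGER REFERENCES location_dim(location_key),
                           dollars_sold REAL);
INSERT INTO city_dim VALUES (1, 'Kampala', 'Uganda'), (2, 'Nairobi', 'Kenya');
INSERT INTO location_dim VALUES (1, '1 Main St', 1), (2, '2 Hill Rd', 1), (3, '3 Park Ave', 2);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 40.0), (3, 60.0);
""")

# Two joins are needed to reach the city attribute: fact -> location -> city.
by_city = conn.execute("""
    SELECT c.city, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN location_dim l ON l.location_key = f.location_key
    JOIN city_dim c     ON c.city_key = l.city_key
    GROUP BY c.city ORDER BY c.city
""").fetchall()
print(by_city)
```

Each city name is stored once in `city_dim` instead of once per street, at the cost of an extra join compared with the star version of the same query.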

Example of a snowflake schema: a student
attendance DWH

Benefits of snowflaking
1. Used in OLAP multidimensional database modeling tools that use
dimensional data marts as a data source.
2. Appropriate for use where a dimension table occupies a significant
proportion of the database as a result of querying very sparse dimensions or
dimensions with many attributes
3. Easier to use in cases where there is need to add a multidimensional view to
an existing transactional database for decision making.
4. A snowflake schema can sometimes reflect the way in which users think
about data.
5. Useful in conventional multidimensional reporting tools where queries
cannot be represented by simple star schema.

Fact Constellation model
A fact constellation model is a dimensional model that consists of
multiple fact tables joined together through dimensions.
Multiple fact tables share dimension tables. The model can be viewed as a collection
of stars and is therefore called a galaxy schema or fact constellation.
Because a dimension table connects to more than one fact table,
we refer to such dimension tables as "conformed" between
the two dimensional models.

Example of Fact Constellation
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table
  Keys: time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_street, country
  shipper: shipper_key, shipper_name, location_key, shipper_type
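A minimal sketch of two fact tables sharing one conformed dimension follows, using Python's built-in `sqlite3` module. Only the item dimension and one measure per fact table are reproduced; the `_dim` and `_fact` suffixes and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim      (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact    (item_key INTEGER REFERENCES item_dim(item_key), units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item_dim(item_key), units_shipped INTEGER);
INSERT INTO item_dim VALUES (1, 'TV'), (2, 'PC');
INSERT INTO sales_fact VALUES (1, 5), (1, 3), (2, 4);
INSERT INTO shipping_fact VALUES (1, 10), (2, 2), (2, 2);
""")

# Aggregate each fact table separately, then combine the results on the shared dimension.
# Joining the two fact tables to each other directly would multiply rows.
combined = conn.execute("""
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item_dim i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) sh ON sh.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(combined)
```

The fact tables never reference each other, which matches the earlier point that fact tables relate only through dimension tables.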

Common uses
• These schemas are most commonly found in dimensional DWHs and
data marts where speed of data retrieval is more important than the
efficiency of data manipulations.
• The decision whether to employ a star schema or a snowflake schema or
a fact constellation schema should consider the relative strengths of the
database platform in question and the query tool to be employed.
– Star schemas should be favored with query tools that largely expose users to
the underlying table structures, and in environments where most queries
are simpler in nature.
– Snowflake schemas are often better with more sophisticated query tools
that isolate users from the raw table structures and for environments
having numerous queries with complex criteria.

Schema Keys
• Dimension Business Key
– Column or columns that identify a unique instance of the business
record (not necessarily a unique record in the dimension table)
– Used in the ETL process to tie fact records with dimension members
• Dimension Record Surrogate Keys
– Defines the dimension’s primary key
– Relates to the fact table foreign key field
– Numeric data type, typically integer (2,4,8 byte)
• Foreign Keys
– Each Dimensional Table has a one-to-many relationship with the
central fact table
– The PK of each Dimension Table must be a Foreign Key in the Fact
Table
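How the business key and the surrogate key work together during ETL can be sketched in plain Python. The names `load_customer`, `load_fact`, `key_lookup`, and the sample business keys are hypothetical and chosen only for illustration.

```python
from itertools import count

# Dimension rows keyed by surrogate key; the business key comes from the source system.
customer_dim = {}      # surrogate_key -> {"business_key": ..., "name": ...}
key_lookup = {}        # business_key -> surrogate_key
next_key = count(1)    # integer surrogate key generator

def load_customer(business_key, name):
    """Insert a dimension member if it is new and return its surrogate key."""
    if business_key not in key_lookup:
        surrogate = next(next_key)
        customer_dim[surrogate] = {"business_key": business_key, "name": name}
        key_lookup[business_key] = surrogate
    return key_lookup[business_key]

def load_fact(source_row):
    """Replace the business key in a source record with the dimension's surrogate key."""
    return {"customer_key": key_lookup[source_row["customer_id"]],
            "amount": source_row["amount"]}

load_customer("CUST-A17", "Customer A")
load_customer("CUST-B02", "Customer B")
fact = load_fact({"customer_id": "CUST-B02", "amount": 250.0})
print(fact)
```

The fact row stores only the small integer key, while the business key stays in the dimension table as an ordinary column.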

Why use surrogate Keys
• Data tables in various source systems may use different keys for the same entity.
Legacy systems that provide historical data might have used a different numbering
system than a current online transaction processing system. A surrogate key
uniquely identifies each entity in the dimension table regardless of its source key. A
separate field can be used to contain the key used in the source system.
• Systems developed independently in company divisions may not use the same
keys, or they may use keys that conflict with data in the systems of other
divisions. This situation may not cause problems when each division independently
reports summary data, but it cannot be permitted in the data warehouse where
data is consolidated.
• Keys may change or be reused in the source data systems. This situation is usually
less likely than others, but some systems have been known to reuse keys belonging
to obsolete data. However, the key may still be in use in historical data in the data
warehouse, and the same key cannot be used to identify different entities.

Why use surrogate Keys
• Changes in organizational structures may move keys in the hierarchy. This can be a
common situation.
– For example, if a salesperson is transferred from one region to another, the company
may prefer to track two things: sales data for the salesperson with the person's original
region for data prior to the transfer date, and sales data for the salesperson in the
person's new region after the transfer date. To represent this organization of data, the
salesperson's record must exist in two places in the sales force dimension table, which is
not possible if the salesperson's company employee identification number is used as
the primary key for the dimension table. A surrogate key allows the same salesperson to
participate in different locations in the dimension hierarchy.
– In this case, the salesperson will be represented twice in the dimension table with two
different surrogate keys. These surrogate keys are used to join the salesperson's records
to the sets of facts appropriate to the various locations in the hierarchy occupied by the
salesperson.
– The employee's identification number should be carried in a separate column in the
table so information about the employee can be reviewed or summarized regardless of
the number of times the employee's record appears in the dimension table.
Dimensions that exhibit this type of change are called slowly changing
dimensions.
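The salesperson transfer can be sketched in plain Python as follows. The `valid_from` and `valid_to` columns are an assumption added so the load step can pick the right row; the slide only requires two rows with different surrogate keys and a separate employee ID column. All values are made up.

```python
# Sales force dimension: one row per version of a salesperson (made-up sample data).
sales_force_dim = [
    {"sales_person_key": 1, "employee_id": "E100", "region": "East",
     "valid_from": "2024-01-01", "valid_to": "2024-06-30"},
    {"sales_person_key": 2, "employee_id": "E100", "region": "West",
     "valid_from": "2024-07-01", "valid_to": "9999-12-31"},
]

def surrogate_key_for(employee_id, sale_date):
    """Pick the dimension row that was valid on the date of the sale (ISO date strings)."""
    for row in sales_force_dim:
        if row["employee_id"] == employee_id and row["valid_from"] <= sale_date <= row["valid_to"]:
            return row["sales_person_key"]
    raise KeyError((employee_id, sale_date))

# Facts carry the surrogate key, so each sale stays attached to the region in force at the time.
sales = [("E100", "2024-03-15", 100.0), ("E100", "2024-09-01", 60.0)]
facts = [(surrogate_key_for(emp, day), amount) for emp, day, amount in sales]

region_of = {row["sales_person_key"]: row["region"] for row in sales_force_dim}
by_region = {}
for key, amount in facts:
    by_region[region_of[key]] = by_region.get(region_of[key], 0.0) + amount

# The employee ID column still allows a total per person across both rows.
keys_for_e100 = {r["sales_person_key"] for r in sales_force_dim if r["employee_id"] == "E100"}
total_e100 = sum(amount for key, amount in facts if key in keys_for_e100)
print(by_region, total_e100)
```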

Multi-dimensional Databases
• A data warehouse is based on a multidimensional
data model
• A multidimensional database (MDDB) reflects
the multi-dimensional characteristics of data
internally
• Stores the data in a giant hypercube
• Makes it easy for the computer to locate any item
of interest.

Multi-dimensional Data
Sales volume as a function of product,
month, and region
Dimensions: Product, Location, Time
Hierarchical summarization paths:
• Product: Industry → Category → Product
• Location: Region → Country → City → Office
• Time: Year → Quarter → Month or Week → Day

Data Cube/Hypercube
• Hypercubes summarise data into dimensions
• Multidimensional hypercubes enable managers to analyse
values at the intersection of these dimensions
[Figure: data cube with the dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); one highlighted cell holds the total annual sales of TVs in U.S.A.]
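The "sum" cells of such a cube can be sketched in plain Python. The cell values below are made up, and `build_cube` is a hypothetical helper that precomputes every combination of detailed and summarized dimensions.

```python
from collections import defaultdict

# Base cells: sales keyed by (quarter, product, country); made-up sample values.
cells = {
    ("1Qtr", "TV", "U.S.A"): 10, ("2Qtr", "TV", "U.S.A"): 20,
    ("3Qtr", "TV", "U.S.A"): 30, ("4Qtr", "TV", "U.S.A"): 40,
    ("1Qtr", "PC", "U.S.A"): 5,  ("1Qtr", "TV", "Canada"): 7,
}

def build_cube(base):
    """Add 'sum' cells for every combination of summarized dimensions."""
    cube = defaultdict(int)
    for (quarter, product, country), value in base.items():
        for q in (quarter, "sum"):
            for p in (product, "sum"):
                for c in (country, "sum"):
                    cube[(q, p, c)] += value
    return dict(cube)

cube = build_cube(cells)
# Total annual sales of TVs in U.S.A.: summarize over Date, keep Product and Country fixed.
tv_usa_annual = cube[("sum", "TV", "U.S.A")]
print(tv_usa_annual)
```

Each base cell contributes to eight cube cells (detailed or summarized on each of three dimensions), which is why lookups at any intersection need no further aggregation.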

What are Data Cubes
• Data Cubes are multidimensional data
resources
• Data Cubes allow you to look at complex data
in a simple format
• Data Cubes allow you to analyze specific data

Parts of the Data Cube
Filters: Ability to look at only one
portion of a dimension. Example –
GSFC instead of All Centers.
Measures: Method of
counting. How the
dimension is being counted.
Dimensions: The category that is being counted.
Example – GSFC Employees, Employee Type
etc.

How to use a Data Cube
• Data Cubes look like a simple spreadsheet
• Just choose what data you want in your columns and
your rows and the totals self-populate
– In this example, Centers are in the columns and
some dates are in the rows

How to choose your Columns
• To choose a column, simply click on the pull-down
menu and choose your data
• The column pull-down menu is located just to the
right of the column header labels

How to choose your Rows
• To choose a row, simply click on the pull-down menu
and choose your data
• The row pull-down menu is located
just below the row header labels

Filters
• After choosing the columns and rows, you may
choose what segment of the population you want to
view
• Use drop-down boxes to determine what data you
want to focus on

Measures
• You may also choose how the data is counted
or measured
– The most common measure is headcount
– Other measures include:
• Average Age
• Average Years of Federal Service
• Average Annual Salary
• Aggregate Salary

Measures
• Use the drop-down box to choose a measure

View Data As . . .
• You may also choose to view the data as a number
or as a percentage
• Just use the View Data As drop down box
