Lecture 4: Dimensional Modelling
Chapters 10 and 11
02/17/24 2
Data Warehouse Design
Designing the data warehouse is a key issue in the
DWH process.
Dimensional Model vs ER model
ER models are not appropriate for Data Warehouses.
ER modeling does not really model a business; rather,
it models the micro relationships among data
elements.
What is a Dimensional Model?
A dimensional model is a star schema that
contains two types of tables, fact tables and
dimension tables.
1. Fact table (quantitative): the primary table in a dimensional
model, where the numerical performance measurements of the business
are stored, i.e. its attributes are numeric and additive.
Example: quantity sold, dollar sales amount.
• A dimensional table provides detailed information about
the attributes. For example, the dimensional table for the
Quarter attribute would include a list of all of the quarters
available in the data warehouse.
• Each row (each quarter) may have several fields: one for the
unique ID that identifies the quarter, and one or more
additional fields that specify how that particular quarter is
represented on a report (for example, the first quarter of 2001 may
be represented as "Q1 2001" or "2001 Q1").
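As a rough illustration of the description above, the following Python sketch builds a small quarter dimension in an in-memory SQLite database. The table name quarter_dim and all column names are assumptions made for this example; they do not come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE quarter_dim (
        quarter_key  INTEGER PRIMARY KEY,  -- unique ID that identifies the quarter
        year         INTEGER NOT NULL,
        quarter_no   INTEGER NOT NULL,
        label_short  TEXT NOT NULL,        -- report label such as "Q1 2001"
        label_sorted TEXT NOT NULL         -- report label such as "2001 Q1"
    )
    """
)

# One row per quarter available in the (hypothetical) warehouse.
quarters = [(year, q) for year in (2001, 2002) for q in (1, 2, 3, 4)]
rows = [
    (key, year, q, f"Q{q} {year}", f"{year} Q{q}")
    for key, (year, q) in enumerate(quarters, start=1)
]
conn.executemany("INSERT INTO quarter_dim VALUES (?, ?, ?, ?, ?)", rows)

first_quarter = conn.execute(
    "SELECT label_short, label_sorted FROM quarter_dim WHERE quarter_key = 1"
).fetchone()
print(first_quarter)
```

A report can pick whichever label column suits its layout, while fact rows only need to carry quarter_key.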
Dimensional modeling is the process and outcome of designing
logical database schemas created to support OLAP and Data
Warehousing solutions.
Issues to note:
1. Dimensions and hierarchies are represented by
dimensional tables.
2. Attributes are the non-key columns in the
dimensional tables.
3. Fact tables connect to one or more dimensional
tables, but fact tables do not have direct
relationships to one another.
Dimensional Modeling
Dimensions and the attributes in their hierarchy:
Time: Year, Quarter, Month
Locations: Country, District, Village
Dimensional Modeling
Data Granularity example
Suppose a fact table contains three metrics (Unit Price, Units Sold
and Total Sale Amount).
The Time dimension consists of four hierarchical elements (Year,
Quarter, Month and Day).
The Organization dimension consists of three hierarchical
elements (Region, District and Store).
The Product dimension consists of two hierarchical elements
(Product Family and SKU (Stock Keeping Unit)).
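To make the idea of granularity concrete, here is a minimal Python sketch that stores facts at the finest grain in the example above (one row per Day, Store and SKU) and rolls them up to coarser levels. The sample rows, the Sale tuple and the roll_up helper are all hypothetical. The sketch also assumes that Total Sale Amount equals Unit Price times Units Sold, and it does not sum Unit Price, because a price is not an additive measure.

```python
from collections import defaultdict, namedtuple

# Hypothetical fact row at the finest grain: one row per day, store and SKU.
Sale = namedtuple(
    "Sale",
    "year quarter month day region district store family sku unit_price units_sold",
)

fact_rows = [
    Sale(2024, 1, 1, 5, "Region 1", "District A", "Store 1", "Laptops", "LP-100", 900.0, 2),
    Sale(2024, 1, 1, 5, "Region 1", "District A", "Store 2", "Laptops", "LP-100", 900.0, 1),
    Sale(2024, 1, 1, 9, "Region 1", "District B", "Store 3", "Laptops", "LP-200", 1200.0, 3),
    Sale(2024, 1, 2, 3, "Region 1", "District A", "Store 1", "Phones", "PH-10", 300.0, 4),
]


def roll_up(rows, *levels):
    """Sum the additive measures for each combination of the given levels."""
    totals = defaultdict(lambda: {"units_sold": 0, "total_sale_amount": 0.0})
    for row in rows:
        key = tuple(getattr(row, level) for level in levels)
        totals[key]["units_sold"] += row.units_sold
        totals[key]["total_sale_amount"] += row.unit_price * row.units_sold
    return dict(totals)


# Coarser grain: Month x Region x Product Family.
by_month_region_family = roll_up(fact_rows, "year", "month", "region", "family")
# Coarsest grain on the Time dimension only.
by_year = roll_up(fact_rows, "year")

print(by_month_region_family)
print(by_year)
```

Storing the finest grain keeps every roll-up possible; storing only a coarser grain would save rows but make day-level or store-level questions unanswerable.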
Dimensional Model vs ER model
To create the individual ‘stars’ that exist within an
application:
Look for many-to-many relationships in the ER model
containing numeric and additive facts and designate
them as fact tables.
Alternatively, look for ‘events’ or ‘transactions’; these
may also be facts.
De-normalize all of the remaining tables into flat
tables with single-part keys that connect directly to the
fact tables. These tables become the dimension tables.
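The conversion steps above can be sketched in Python with an in-memory SQLite database. Every table and column name here (category, product, order_line, product_dim) is a hypothetical example, not part of the lecture. The order_line table plays the role of the many-to-many relationship carrying numeric, additive facts, and product_dim is the flat table obtained by de-normalizing product and category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized (ER-style) source tables.
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES category(category_id));
    -- Many-to-many relationship between orders and products that carries
    -- numeric, additive facts: a candidate fact table.
    CREATE TABLE order_line (
        order_id INTEGER, product_id INTEGER, quantity INTEGER, amount REAL);

    INSERT INTO category VALUES (1, 'Computers'), (2, 'Phones');
    INSERT INTO product VALUES (10, 'Laptop', 1), (11, 'Desktop', 1), (20, 'Handset', 2);
    INSERT INTO order_line VALUES
        (100, 10, 2, 1800.0), (100, 20, 1, 300.0), (101, 11, 1, 700.0);

    -- De-normalize product and category into one flat dimension table
    -- with a single-part key.
    CREATE TABLE product_dim AS
        SELECT p.product_id AS product_key, p.product_name, c.category_name
        FROM product p
        JOIN category c ON c.category_id = p.category_id;
    """
)

# The fact table now reaches every product attribute with a single join.
sales_by_category = conn.execute(
    """
    SELECT d.category_name, SUM(f.quantity), SUM(f.amount)
    FROM order_line f
    JOIN product_dim d ON d.product_key = f.product_id
    GROUP BY d.category_name
    ORDER BY d.category_name
    """
).fetchall()
print(sales_by_category)
```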
[Figure labels: Shipments, Returns, Orders, Sales Contact, Payments]
ERD versus DM
[Figure labels: Product, Customer, Order, Order-line, Order_fact, Time_dimension]
1. Logical model is easy to understand
• Provides a predictable and standard framework for end-user applications. Report
writers, query tools, and user interfaces can all make strong assumptions about the
dimensional model, which makes the user interfaces more understandable.
• The model can be designed (mostly) independently of expected queries, since it
withstands unexpected changes in user behavior.
• Handles changes easily, such as adding new dimensional attributes, since
there is no need to reload data or reprogram query tools.
2. Optimized for performance
• High-performance “browsing” across the attributes
• Provides a strategy for handling aggregates, i.e. summary records that are logically
redundant with base data already in the data warehouse but are used to
enhance query performance.
• OLAP engines can make processing more efficient
3. Historical tracking of information
• Strategies for handling changing dimensions
• Fact design allows high-volume snapshots and transaction tracking
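The point about aggregates can be illustrated with a short Python sketch using SQLite. The tables sales_fact and sales_month_agg and their columns are assumptions for this example. The summary table is logically redundant with the base fact table, yet a monthly report can read it instead of scanning every base row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical base fact table at daily grain.
    CREATE TABLE sales_fact (day TEXT, month TEXT, store TEXT, amount REAL);
    INSERT INTO sales_fact VALUES
        ('2024-01-05', '2024-01', 'S1', 100.0),
        ('2024-01-20', '2024-01', 'S1', 50.0),
        ('2024-01-07', '2024-01', 'S2', 20.0),
        ('2024-02-02', '2024-02', 'S1', 75.0);

    -- Aggregate (summary) table: redundant with the base data,
    -- kept only to speed up month-level queries.
    CREATE TABLE sales_month_agg AS
        SELECT month, store, SUM(amount) AS amount, COUNT(*) AS row_count
        FROM sales_fact
        GROUP BY month, store;
    """
)

monthly_sql = "SELECT month, SUM(amount) FROM {table} GROUP BY month ORDER BY month"
from_base = conn.execute(monthly_sql.format(table="sales_fact")).fetchall()
from_agg = conn.execute(monthly_sql.format(table="sales_month_agg")).fetchall()
print(from_base == from_agg)
```

Both queries return the same monthly totals; the aggregate simply reads fewer rows. In a real warehouse the summary table would need refreshing whenever new base rows are loaded.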
ER Modeling vs Dimensional modeling
(Relational data model compared with dimensional data model)
1. Relational: Data is stored in an RDBMS.
   Dimensional: Data is stored in an RDBMS or in multidimensional databases.
2. Relational: Tables are units of storage.
   Dimensional: Cubes are units of storage.
3. Relational: Data is normalized and used for OLTP. Optimized for OLTP processing.
   Dimensional: Data is denormalized and used in data warehouses and data marts. Optimized for OLAP.
4. Relational: Several tables and chains of relationships among them.
   Dimensional: Few tables; fact tables are connected to dimensional tables.
5. Relational: Volatile (several updates).
   Dimensional: Non-volatile.
6. Relational: The user is usually constrained by an application that understands the
   data design. Users are typically operations staff.
   Dimensional: The simpler data design makes it easier for users to analyze data in any
   way they choose. Users are typically analysts, company strategists, or even executives.
ER Modeling vs Dimensional modeling (continued)
7. Relational: SQL is used to manipulate data.
   Dimensional: MDX is used to manipulate data.
8. Relational: Detailed level of transactional data.
   Dimensional: Summary of bulky transactional data (aggregates and measures) used
   in business decisions.
9. Relational: Normal reports.
   Dimensional: User-friendly, interactive, drag-and-drop multidimensional OLAP reports.
10. Relational: Typical data design used for business transaction systems.
    Dimensional: Data design used for analysis systems.
11. Relational: The goal is to reduce every piece of information to its simplest form:
    a debit transaction, a customer record, an address.
    Dimensional: The goal is to break up information into ‘Facts’ (things a company
    measures) and ‘Dimensions’ (how we measure them: by time, region, or customer).
12. Relational: Suited for concurrent handling of many small transactions by many
    users. Only a limited amount of data history is normally kept.
    Dimensional: Suited for reading or analyzing large amounts of data by a modest
    number of users. Many years of data history may be kept.
There are three basic types of dimensional models:
1. Star model
2. Snowflake model
3. Fact constellation model
• In a star model, a single object (the fact table) sits in the middle and is
radially connected to the surrounding objects
(dimension tables), like a star.
Example of Star Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_street, country
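The star schema in this example can be sketched as SQLite tables from Python. The column names follow the figure; the _dim and _fact table-name suffixes, the column types and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
        month INTEGER, quarter INTEGER, year INTEGER);
    CREATE TABLE item_dim (
        item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
        type TEXT, supplier_type TEXT);
    CREATE TABLE branch_dim (
        branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
    CREATE TABLE location_dim (
        location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
        province_or_street TEXT, country TEXT);
    -- The fact table holds one foreign key per dimension plus the measures.
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim(time_key),
        item_key INTEGER REFERENCES item_dim(item_key),
        branch_key INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        units_sold INTEGER, dollars_sold REAL, avg_sales REAL);

    -- Made-up sample rows.
    INSERT INTO time_dim VALUES
        (1, 15, 'Monday', 1, 1, 2024), (2, 10, 'Wednesday', 4, 2, 2024);
    INSERT INTO item_dim VALUES
        (1, 'Laptop', 'Brand A', 'computer', 'wholesale'),
        (2, 'Phone', 'Brand B', 'handset', 'retail');
    INSERT INTO branch_dim VALUES (1, 'Main', 'flagship');
    INSERT INTO location_dim VALUES (1, '1 First St', 'City X', 'Province Y', 'Country Z');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 1, 2, 1800.0, 900.0),
        (1, 2, 1, 1, 3, 900.0, 300.0),
        (2, 1, 1, 1, 1, 900.0, 900.0);
    """
)

# Each dimension is exactly one join away from the fact table.
sales_by_quarter_and_brand = conn.execute(
    """
    SELECT t.quarter, i.brand, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales_fact s
    JOIN time_dim t ON t.time_key = s.time_key
    JOIN item_dim i ON i.item_key = s.item_key
    GROUP BY t.quarter, i.brand
    ORDER BY t.quarter, i.brand
    """
).fetchall()
print(sales_by_quarter_and_brand)
```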
Star Schema with Sample Data
Relationship of a Star Schema model
to a Report
Questions answered: what, when, by whom, and to
whom.
Results are obtained by combining (joining) one or more
dimension tables with the fact table.
Example
The Marketing Dept wants to know the quantity and
order amount of PCs sold to customers who are
married by salespersons in the Makerere region
in the month of March.
Relationship of a Star Schema model
to a Report
Product Dimension Table: Product Key (PK), Product Name, Product Code,
Product Line, Brand
Customer Dimension Table: Customer Key (PK), Customer Name, Customer Code,
Marital Status, Address, Town
Fact table measures: Order Shillings, Cost Shillings
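The Marketing Dept question from the example can be sketched as a star join in SQLite from Python. The product and customer columns follow the figure. The salesperson_dim and time_dim tables, the fact table name order_fact, its quantity column and all sample rows are assumptions, since the source does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY, product_name TEXT, product_code TEXT,
        product_line TEXT, brand TEXT);
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY, customer_name TEXT, customer_code TEXT,
        marital_status TEXT, address TEXT, town TEXT);
    -- Assumed dimensions: not visible in the lecture's figure.
    CREATE TABLE salesperson_dim (
        salesperson_key INTEGER PRIMARY KEY, salesperson_name TEXT, region TEXT);
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE order_fact (
        product_key INTEGER, customer_key INTEGER, salesperson_key INTEGER,
        time_key INTEGER, quantity INTEGER,
        order_shillings INTEGER, cost_shillings INTEGER);

    -- Made-up sample rows.
    INSERT INTO product_dim VALUES
        (1, 'Desktop PC', 'P-001', 'PC', 'Brand A'),
        (2, 'Printer', 'P-002', 'Peripherals', 'Brand B');
    INSERT INTO customer_dim VALUES
        (1, 'Customer One', 'C-001', 'Married', 'Address 1', 'Town 1'),
        (2, 'Customer Two', 'C-002', 'Single', 'Address 2', 'Town 2');
    INSERT INTO salesperson_dim VALUES (1, 'Rep One', 'Makerere'), (2, 'Rep Two', 'Other');
    INSERT INTO time_dim VALUES (1, 'March', 2024), (2, 'April', 2024);
    INSERT INTO order_fact VALUES
        (1, 1, 1, 1, 2, 4000000, 3000000),  -- matches every filter
        (1, 1, 1, 1, 1, 2000000, 1500000),  -- matches every filter
        (2, 1, 1, 1, 5, 500000, 300000),    -- not a PC
        (1, 2, 1, 1, 1, 2000000, 1500000),  -- customer not married
        (1, 1, 2, 1, 1, 2000000, 1500000),  -- other region
        (1, 1, 1, 2, 1, 2000000, 1500000);  -- April, not March
    """
)

# what (product), to whom (customer), by whom (salesperson), when (time)
report = conn.execute(
    """
    SELECT SUM(f.quantity), SUM(f.order_shillings)
    FROM order_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN customer_dim c ON c.customer_key = f.customer_key
    JOIN salesperson_dim s ON s.salesperson_key = f.salesperson_key
    JOIN time_dim t ON t.time_key = f.time_key
    WHERE p.product_line = ? AND c.marital_status = ?
      AND s.region = ? AND t.month = ?
    """,
    ("PC", "Married", "Makerere", "March"),
).fetchone()
print(report)
```

Each filter in the WHERE clause touches a different dimension table, and the measures come only from the fact table.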
1. Easy to understand
2. Easy to define hierarchies
3. Reduces number of physical joins
4. Low maintenance
5. Very simple metadata
A refinement of the star schema where some dimensional
hierarchy is normalized into a set of smaller
dimension tables, forming a shape similar to
a snowflake.
Example of Snowflake Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, province_or_street, country
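A minimal Python sketch of the snowflaked location hierarchy follows, again using SQLite. Only the location and city tables from the example are modelled; the _dim and _fact suffixes, the trimmed-down fact table and the sample rows are assumptions. The sketch shows that a question about country now needs two joins (fact to location, location to city) instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- City attributes are normalized out of the location dimension.
    CREATE TABLE city_dim (
        city_key INTEGER PRIMARY KEY, city TEXT,
        province_or_street TEXT, country TEXT);
    CREATE TABLE location_dim (
        location_key INTEGER PRIMARY KEY, street TEXT,
        city_key INTEGER REFERENCES city_dim(city_key));
    -- Fact table reduced to the columns needed for this sketch.
    CREATE TABLE sales_fact (
        location_key INTEGER REFERENCES location_dim(location_key),
        units_sold INTEGER, dollars_sold REAL);

    -- Made-up sample rows.
    INSERT INTO city_dim VALUES
        (1, 'City A', 'Province A', 'Country X'),
        (2, 'City B', 'Province B', 'Country Y');
    INSERT INTO location_dim VALUES
        (1, '1 First St', 1), (2, '2 Second St', 1), (3, '3 Third St', 2);
    INSERT INTO sales_fact VALUES (1, 2, 100.0), (2, 1, 50.0), (3, 4, 200.0);
    """
)

# Two joins are needed to reach the country attribute.
dollars_by_country = conn.execute(
    """
    SELECT c.country, SUM(s.dollars_sold)
    FROM sales_fact s
    JOIN location_dim l ON l.location_key = s.location_key
    JOIN city_dim c ON c.city_key = l.city_key
    GROUP BY c.country
    ORDER BY c.country
    """
).fetchall()
print(dollars_by_country)
```

City A's attributes are stored once even though two locations refer to it, which is where the storage saving comes from.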
Example of a snowflake schema: a student
attendance DWH
Benefits of Snowflaking
It is possible to save on storage space
Normalized structures are easier to update
and maintain than un-normalized ones
It is appropriate for use where a dimension
table occupies a significant proportion of
the database as a result of dimensions with
many attributes
Disadvantages of Snowflaking
Schema less intuitive and end-users are put
off by the complexity
Ability to browse through the contents is
difficult
Degraded query performance because of
additional joins
A fact constellation model is a dimensional model that
consists of multiple fact tables, joined together through
dimensions.
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key
Shipping Fact Table: time_key, item_key, shipper_key, fromlocation
Shared dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
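A fact constellation can be sketched in Python with SQLite as two fact tables that share a dimension. The key columns follow the example; the measure units_shipped, the _dim and _fact suffixes and the sample rows are assumptions. Because fact tables have no direct relationship to one another, each one is summarized separately and the results are combined through the shared item dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared dimension.
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
    -- Two fact tables that both reference the shared dimension.
    CREATE TABLE sales_fact (time_key INTEGER, item_key INTEGER, units_sold INTEGER);
    CREATE TABLE shipping_fact (
        time_key INTEGER, item_key INTEGER, shipper_key INTEGER,
        units_shipped INTEGER);  -- assumed measure

    -- Made-up sample rows.
    INSERT INTO item_dim VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO sales_fact VALUES (1, 1, 2), (1, 2, 3), (1, 1, 1);
    INSERT INTO shipping_fact VALUES (1, 1, 7, 5), (1, 2, 7, 4), (1, 2, 8, 2);
    """
)

# Summarize each fact table on its own, then join the summaries
# through the shared item dimension.
sold_vs_shipped = conn.execute(
    """
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item_dim i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY item_key) s
      ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) sh
      ON sh.item_key = i.item_key
    ORDER BY i.item_name
    """
).fetchall()
print(sold_vs_shipped)
```

Joining the two fact tables row by row would multiply rows and inflate the sums, which is why the sketch aggregates first.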
Schema Keys
Dimension Business Key
Column or columns that identify a unique instance of the
business record (not necessarily a primary key in the dimension
table)
Dimension Record Surrogate Keys
Defines the dimension’s primary key
Relates to the fact table foreign key field
Numeric data type, typically integer
Foreign Keys
Each Dimensional Table has a one-to-many relationship with
the central fact table
The PK of each Dimension Table must be a Foreign Key in the
Fact Table
Why use surrogate Keys
Data tables in various source systems may use different
presentations of keys for the same entity. Legacy systems
that provide historical data might have used a different
numbering system than a current online transaction processing
system. A surrogate key uniquely identifies each entity in the
dimension table regardless of its source key. A separate field
can be used to contain the key used in the source system.
Why use surrogate keys
Keys may change or be reused in the source
data systems. This situation is usually less likely
than others, but some systems have been known
to reuse keys belonging to obsolete data.
However, the key may still be in use in historical
data in the data warehouse, and the same key
cannot be used to identify different entities.
Why use surrogate Keys
Changes in organizational structures may move keys in
the hierarchy. This can be a common situation.
For example, if a salesperson is transferred from one region
to another, the company may prefer to track two things:
sales data for the salesperson with the person's original
region for data prior to the transfer date, and sales data for
the salesperson in the person's new region after the
transfer date. To represent this organization of data, the
salesperson's record must exist in two places in the sales
force dimension table, which is not possible if the
salesperson's company employee identification number is
used as the primary key for the dimension table. A
surrogate key allows the same salesperson to participate in
different locations in the dimension hierarchy.
Why use surrogate keys
In this case, the salesperson will be represented twice in
the dimension table with two different surrogate keys.
These surrogate keys are used to join the salesperson's
records to the sets of facts appropriate to the various
locations in the hierarchy occupied by the salesperson.
The employee's identification number should be carried
in a separate column in the table so information about
the employee can be reviewed or summarized
regardless of the number of times the employee's
record appears in the dimension table.
Dimensions that exhibit this type of change are
called slowly changing dimensions.
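The transferred-salesperson case can be sketched in Python with SQLite. The table and column names (salesperson_dim, sales_fact, employee_id and so on), the regions and the amounts are all hypothetical. The same employee appears twice in the dimension with two surrogate keys, and the employee identification number is carried in a separate column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE salesperson_dim (
        salesperson_key INTEGER PRIMARY KEY,  -- surrogate key
        employee_id     TEXT NOT NULL,        -- business key from the source system
        name            TEXT NOT NULL,
        region          TEXT NOT NULL);
    CREATE TABLE sales_fact (
        salesperson_key INTEGER NOT NULL
            REFERENCES salesperson_dim(salesperson_key),
        sale_date TEXT NOT NULL,
        amount    REAL NOT NULL);

    -- One employee, two dimension rows: before and after the transfer.
    INSERT INTO salesperson_dim VALUES
        (1, 'E042', 'A. Seller', 'East'),
        (2, 'E042', 'A. Seller', 'West');
    -- Facts point at the row that was current when the sale happened.
    INSERT INTO sales_fact VALUES
        (1, '2024-01-10', 100.0),
        (1, '2024-02-05', 150.0),
        (2, '2024-03-12', 200.0);
    """
)

# Sales stay attributed to the region the salesperson belonged to at the time.
sales_by_region = conn.execute(
    """
    SELECT d.region, SUM(f.amount)
    FROM sales_fact f
    JOIN salesperson_dim d ON d.salesperson_key = f.salesperson_key
    GROUP BY d.region
    ORDER BY d.region
    """
).fetchall()

# The business key still allows a per-employee total across both rows.
sales_by_employee = conn.execute(
    """
    SELECT d.employee_id, SUM(f.amount)
    FROM sales_fact f
    JOIN salesperson_dim d ON d.salesperson_key = f.salesperson_key
    GROUP BY d.employee_id
    """
).fetchall()
print(sales_by_region, sales_by_employee)
```

Had employee_id been the primary key, the second dimension row could not exist and the sales from before the transfer would be reported under the new region.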