Lab1 Dimensional Modeling
Overview
This lab introduces the dimensional modeling process. Upon completing this lab activity you will learn:
Lab Requirements
To complete this lab you will need the following:
Access to the Northwind Database on Microsoft SQL Server 2012. This should be available
through your iSchool vSphere login.
You should connect to your SQL Server database before starting this lab.
The High-Level and Detailed dimensional modeling Excel Workbooks, available in the same place
where you got this lab.
Microsoft Excel 2007 or higher for editing the worksheets
Grading
This lab may be handed in as part of a problem set.
IST722 Data Warehousing Lab1
Michael A. Fudge, Jr. The Dimensional Modeling Process
NOTE: You can view this database diagram online. It’s under the Database Diagrams section of your
Northwind database and is accessible through SQL Server Management Studio.
Sales Reporting. Senior management would like to be able to track sales by customer, employee,
product, and supplier, with the goal of establishing which products are the top sellers, which
employees place the most orders, and who are the best suppliers.
Order Fulfillment and Delivery. There is a need to analyze the order fulfillment process to see if
the time between when the order is placed and when it is shipped can be improved.
Product Inventory Analysis. Management requires a means to track inventory, On Order, and
Re-Order levels of products by supplier or category. Inventory levels should be snapshotted daily
and recorded into the warehouse for analysis.
Sales Coverage Analysis. An analysis of the employees and the sales territories they cover.
As part of the business requirements, the following Enterprise Bus Matrix was created.
Business Process      Dimensions: Order Date | Shipped Date | Customers | Employees | Shippers | Products | Suppliers | Territory
Sales Reporting       X X X X X X
Order Fulfillment     X X X X X
Inventory Analysis    X X X
Sales Coverage        X X
At this point you might be wondering: what does order detail look like, and how do we know it is what we
need? This is where data profiling comes into play.
Let’s take a look.
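A minimal profiling sketch, assuming the standard Northwind schema in which [Order Details] has the columns OrderID, ProductID, UnitPrice, Quantity, and Discount. The first query previews the rows. The second checks whether "one row per product per order" is a plausible grain by comparing the total row count with the number of distinct (OrderID, ProductID) pairs.

```sql
-- Preview a handful of order detail rows.
SELECT TOP 10 *
FROM [Order Details];

-- Grain check: if both counts match, each row is one product on one order.
SELECT
    COUNT(*) AS TotalRows,
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT OrderID, ProductID
           FROM [Order Details]) AS pairs) AS DistinctOrderProductPairs
FROM [Order Details];
```

If the two counts differ, the table holds more than one row per product per order and the grain needs another look.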
NOTE: In real life you won’t strike gold so easily. You’ll have to look at several tables before you can get
a clear picture of your fact table grain.
For example, if you review the database diagram earlier in this lab, you’ll see that the Order Details
table connects directly to the Products table via a foreign key in a many-to-one relationship. Because a
product appears on multiple orders, Product is a good candidate for a dimension. Once again we can verify
that this dimension works for us and “rolls up” a couple of our known facts by writing some SQL.
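One way such a roll-up query might look, assuming the standard Northwind Products and [Order Details] columns. The column aliases TotalQuantity and TotalAmount are illustrative names, not ones defined by the lab.

```sql
-- Roll up two candidate facts (quantity and amount) by the Product dimension.
SELECT
    p.ProductID,
    p.ProductName,
    SUM(od.Quantity)                AS TotalQuantity,
    SUM(od.Quantity * od.UnitPrice) AS TotalAmount
FROM [Order Details] AS od
    JOIN Products AS p
        ON p.ProductID = od.ProductID
GROUP BY p.ProductID, p.ProductName
ORDER BY p.ProductName;
```

If the query returns one sensible total per product, the facts aggregate cleanly along this dimension.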
Important Tip: You should always exercise caution when profiling live systems. Executing SQL queries
against production data is usually not a wise decision, as you may impact performance negatively. It is
important to seek the advice of a Database Administrator before embarking on your data profiling
adventure!
Once you’ve identified a useful dimension, it’s time to add it to our Detailed Bus Matrix like so:
Important Tip: It’s important to recognize that dimensional modeling is not a formal process that can be
automated. There’s a lot of art that goes with the science. (To quote the pirate’s code, these are more
like guidelines than hard and fast rules.)
One important thing to recognize is that not all facts appear among your source data. Some of the facts you’ll
need are derived by doing a “little math” on some of the source data values. We include the facts we
want in the Detailed Bus Matrix but explain how they are derived in the Attributes and Metrics
worksheet. For now, we’ll add the following facts to our Detailed Bus Matrix and complete it.
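As an illustration of the “little math” behind derived facts, the sketch below computes three values from the Northwind [Order Details] columns. The fact names ExtendedPriceAmount, DiscountAmount, and SoldAmount are assumptions chosen for this example. Use the names from your own Detailed Bus Matrix.

```sql
-- Derived facts at the order-line grain.
-- Discount is stored as a fraction (for example 0.05 for 5 percent).
SELECT
    od.OrderID,
    od.ProductID,
    od.Quantity,
    od.Quantity * od.UnitPrice                     AS ExtendedPriceAmount,
    od.Quantity * od.UnitPrice * od.Discount       AS DiscountAmount,
    od.Quantity * od.UnitPrice * (1 - od.Discount) AS SoldAmount
FROM [Order Details] AS od;
```

Each expression on the right is the kind of derivation you would record in the description column of the Attributes and Metrics worksheet.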
Completing the Attributes and Metrics worksheet is self-explanatory and therefore I will leave it as an
exercise for you. As you complete this part, keep the following in mind:
Start with the dimensions you’ve identified in your Detailed Bus Matrix.
You can profile for useful dimensional attributes with a SQL query like this:
select * from [table_name]
Don’t forget to explore any hierarchies among your dimensions, as discussed in the previous
section.
Time dimensions are fairly standard. You only need to be detailed about any unique definitions
in your time dimensions.
If your fact is semi-additive, make note in the description.
If your fact is derived, be sure to explain how it is derived in the description.
When you’re done, save your worksheet before moving on to the next step.
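Beyond a plain select *, a couple of summary queries can help with the attribute and hierarchy guidelines above. This is a sketch against the Northwind Products and Categories tables. The aliases are illustrative.

```sql
-- Attribute profiling: distinct values and NULLs for candidate attributes.
SELECT
    COUNT(*)                    AS TotalRows,
    COUNT(DISTINCT ProductName) AS DistinctProductNames,
    COUNT(DISTINCT CategoryID)  AS DistinctCategories,
    SUM(CASE WHEN CategoryID IS NULL THEN 1 ELSE 0 END) AS NullCategories
FROM Products;

-- Hierarchy profiling: how products roll up to categories.
SELECT
    c.CategoryName,
    COUNT(*) AS ProductCount
FROM Products AS p
    JOIN Categories AS c
        ON c.CategoryID = p.CategoryID
GROUP BY c.CategoryName
ORDER BY c.CategoryName;
```

Because Products carries a single CategoryID column, each product belongs to at most one category, which is what makes Category a usable hierarchy level above Product.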
1. Create a formal table design, including tables, keys, data types, and indexes so we can create
tables and indexes required for our star schemas (ROLAP).
2. Identify data sources of our dimensional model so that we can architect and implement the ETL
process in a future phase.
DO THIS: You should start by opening the Excel Workbook and reading the section titled How to use
this tool under the Home worksheet, and then read the ReadMe tab. This will give you an overview of
how to use this workbook.
Getting Started
First let’s set up the workbook.
DO THIS: Click on the Home worksheet, and complete the fields as follows:
Database: NorthwindDW
Description: The Northwind Traders Data Warehouse
Gen FK’s?: Y
Schema For Views: (leave blank)
To complete the design we will need to refer to the Attributes & Measures from the high level design. A
screen shot has been included for reference.
The process you’ll follow to design a dimension or fact table is outlined in 5 Steps:
NOTE: Included in this detailed design are techniques for dealing with type-2 SCDs and an audit
dimension (everything from row 22 and higher in the screenshot). Both of these techniques are covered
in the ETL chapters of our course. For now, we can leave these in our design. We’ll revisit them later.
The columns you’ll need to complete in this step for each attribute are:
Datatype, Size, Precision – the SQL Server data type (including size and precision, where
appropriate) of the attribute. A good rule of thumb is to check the source data type for
reference. Note that data types vary from DBMS to DBMS. A SQL Server data type
reference can be found at http://msdn.microsoft.com/en-us/library/ms187752.aspx.
Key? – Should be blank if not a key or labeled PK = primary key, PK ID = primary key (with
surrogate), or FK = foreign key.
FK To – When you label an attribute as FK, you need to include a dimension table and its primary
key as the referencing column.
NULL? – Whether or not the attribute permits null values. This should only be permitted in very
rare circumstances. The better design decision is to provide a default value in place of NULL.
Default Value – A value which should be stored in the event there is no value.
Source System – List the source system for the attribute. Derived implies the attribute is
calculated.
Source Schema – If the attribute comes from a specific schema, list it here.
Source Table – State the table the attribute comes from on the source system.
Source Field Name – The column or columns which supply the attribute. If the column is a
calculation, specify that here (ex. OrderQty*Price).
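To see how the workbook columns above translate into a table design, here is a rough sketch of the kind of DDL they describe. The table names DimProduct and FactSales, the column names, and the default value are all hypothetical. The data types mirror the Northwind source columns.

```sql
-- Hypothetical dimension table; names, sizes, and defaults are illustrative.
CREATE TABLE DimProduct (
    ProductKey   int IDENTITY(1,1) NOT NULL,           -- Key?: PK ID (surrogate key)
    ProductID    int          NOT NULL,                -- source: Products.ProductID
    ProductName  nvarchar(40) NOT NULL,                -- source: Products.ProductName
    CategoryName nvarchar(15) NOT NULL DEFAULT 'Unknown', -- default in place of NULL
    CONSTRAINT PK_DimProduct PRIMARY KEY (ProductKey)
);

-- Hypothetical fact table referencing the dimension.
CREATE TABLE FactSales (
    ProductKey          int      NOT NULL,  -- Key?: FK, FK To: DimProduct.ProductKey
    OrderID             int      NOT NULL,  -- source: [Order Details].OrderID
    Quantity            smallint NOT NULL,  -- source: [Order Details].Quantity
    ExtendedPriceAmount money    NOT NULL,  -- derived: Quantity * UnitPrice
    CONSTRAINT PK_FactSales PRIMARY KEY (ProductKey, OrderID),
    CONSTRAINT FK_FactSales_DimProduct
        FOREIGN KEY (ProductKey) REFERENCES DimProduct (ProductKey)
);
```

Each inline comment corresponds to one of the worksheet columns (Key?, FK To, Default Value, Source Table, Source Field Name), so the worksheet row and the column definition can be checked against each other.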
Next Steps
Finish your detailed design by repeating steps 1-5 for your other dimensions and fact table for the sales
reporting process. You’ll probably have to do a bit of profiling to complete the process. Ask questions if
you have them, or if working outside of class time, log your issues to the issues worksheet.
In this part, you will repeat the process outlined in part 2 of the lab for the order fulfillment business
process. Here’s a set of instructions and guidelines for you to follow:
c. In both cases make sure to be complete in your documentation process, including the
target table design for your star schema and information regarding your data source so
that you have what is required for the ETL design.
d. Any issues you encounter, such as not knowing how to source your data, should be
placed in the issues list (in the other workbook).
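When profiling for the order fulfillment process, a query along these lines may be a useful starting point. It assumes the standard Northwind Orders columns OrderDate and ShippedDate. DaysToShip is an illustrative name for the lag between the two dates.

```sql
-- Days between order placement and shipment for each order.
-- DaysToShip is NULL for orders that have not shipped yet.
SELECT
    o.OrderID,
    o.OrderDate,
    o.ShippedDate,
    DATEDIFF(day, o.OrderDate, o.ShippedDate) AS DaysToShip
FROM Orders AS o
ORDER BY DaysToShip DESC;
```

The NULL rows are worth noting in your issues list, since the design has to decide how unshipped orders are represented.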