Mid 2 - Solution

The document discusses dimensions and measures for a data warehouse schema based on supplier, product, and calendar dimensions from an ERD and spreadsheet. It identifies the dimensions as supplier, calendar, and product with hierarchies. Measures come from purchase transactions and related tables. The grain is individual product purchases. A star schema is proposed with fact and dimension tables.

1. The dimensions in the problem are reasonably clear. Supplier, calendar, and product are dimensions.
Supplier and product come from the ERD and the sample spreadsheet. The calendar dimension is a
standard data warehouse dimension. Calendar is a hierarchical dimension. Phone and email can be
parsed to be hierarchical as part of the supplier dimension.
• Supplier
o SuppNo: ERD only
o SuppName (Supplier table) | Supp (spreadsheet)
o SuppPhone: ERD only; hierarchical (country code → area code → prefix → line)
o SuppEmail: ERD only; hierarchical (top level domain → second level domain → local part)
• Calendar
o Date columns in the ERD (ProdNextShipDate, PurchDelDate, and PurchDate) and
spreadsheet (PurchDate); hierarchical (year → month → day)
• Product:
o ProdNo: ERD only
o ProdName (ERD) | ProdDesc (spreadsheet)
o ProdCode: spreadsheet only
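
As a minimal sketch of how SuppEmail might be parsed into the hierarchy levels listed above, assume simple addresses of the form local@second-level.tld. The function name split_email and the sample address are hypothetical, and addresses with subdomains or multi-part top-level domains would need more care.

```python
def split_email(email: str) -> dict:
    """Split an email address into three hierarchy levels, coarsest first."""
    local_part, _, domain = email.rpartition("@")
    # Assumption: the last label is the top-level domain and the label
    # before it is the second-level domain; any subdomains are ignored.
    labels = domain.split(".")
    if not local_part or len(labels) < 2:
        raise ValueError(f"unexpected email format: {email!r}")
    return {
        "top_level_domain": labels[-1],
        "second_level_domain": labels[-2],
        "local_part": local_part,
    }


# Hypothetical supplier address used only for illustration.
levels = split_email("sales@acme.com")
print(levels)
```

SuppPhone could be handled the same way once a fixed phone format is agreed on, which the problem does not specify.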

2. The measures mostly come from the PurchLine table and the Supply Purchases spreadsheet. Measures
from related tables are also important because they can be associated with the measures from the
PurchLine table and the Supply Purchases spreadsheet.
• PLQty (PurchLine table) | Qty (spreadsheet); additive measure
• Amount of purchase: derived additive measure from the spreadsheet
• PLUnitCost (PurchLine table) | Unit Price (spreadsheet); snapshot measure
• ProdQOH (Product table) | Stock (spreadsheet): semi-additive measure. It can be summed across
products, but adding quantities of different products is not useful. It is usually averaged across
time periods.
• SuppDisc (Supplier table): supplier discount; snapshot measure
• ProdPrice: product price; snapshot measure indicating the resale price of the product when
the purchase occurs
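
The difference between the measure types above can be illustrated with a small sketch. All rows and values here are hypothetical and are not taken from the problem data. Quantity and the derived amount are summed, while quantity on hand is averaged across time instead of summed.

```python
from statistics import mean

# Hypothetical purchase lines: (product, quantity, unit cost).
purchase_lines = [("P1", 10, 2.50), ("P1", 4, 2.75), ("P2", 6, 1.00)]

# Derived additive measure: amount = quantity * unit cost.
amounts = [qty * cost for _, qty, cost in purchase_lines]

# Additive measures can be summed across any dimension.
total_qty = sum(qty for _, qty, _ in purchase_lines)
total_amount = sum(amounts)

# Hypothetical quantity-on-hand snapshots for one product on three dates.
qoh_snapshots = [120, 100, 110]

# Semi-additive measure: average across time periods instead of summing.
avg_qoh = mean(qoh_snapshots)

print(total_qty, total_amount, avg_qoh)
```

Unit cost is deliberately not summed here, since a snapshot measure records a value at the time of the purchase.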

3. The most detailed grain is the combination of individual supplier, individual product, and date.
• 1,100 products: sum of product rows and unique products in a spreadsheet
• 120 suppliers: sum of supplier rows and unique suppliers in a spreadsheet
• Days per year: 365
• 512,000 purchases of individual products: sum of PurchLine rows and spreadsheet rows (one
year)
• Fact table size is determined from the sum of the rows in the PurchLine table and the spreadsheet.
Thus, the fact table has 512,000 rows per year, one per individual product purchase.
• Sparsity estimate:
o 1 – (fact table size / product of dimension sizes)
o 1 – (512,000 / (1,100 * 120 * 365)) = 1 – (512,000 / 48,180,000) = 0.98937
o The data cube has mostly missing cells, with slightly more than 1% of cells containing nonzero
values.
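
The sparsity arithmetic above can be checked with a few lines of Python. The variable names are illustrative, and the figures are the ones given in the list.

```python
# Dimension sizes and fact table size from the estimate above.
products = 1_100
suppliers = 120
days = 365
fact_rows = 512_000

# Number of cells in the full data cube (product of dimension sizes).
cube_cells = products * suppliers * days

# Density is the share of cells with a fact row; sparsity is the rest.
density = fact_rows / cube_cells
sparsity = 1 - density

print(f"cells={cube_cells:,} density={density:.5f} sparsity={sparsity:.5f}")
```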

4. The star schema should support the dimensions and measures specified in problems 1 and 2. There
are two relationships between the Calendar and InvFact tables to record both the purchase and
delivery dates. Product type is a new derived column indicating the data source (merchandise for
resale or supply for internal usage). ProdNextShipDate was dropped in the data warehouse design.
The problem did not indicate a clear usage for it in the data warehouse. It could be added as another
relationship from Calendar to InvFact if the date were useful for business intelligence reasoning. The
relationship would be incomplete for the spreadsheet data source.

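Star schema (tables and relationships from the diagram):
• Supplier: SuppNo, SuppName, SuppPhone, SuppEmail, SuppDisc
• Calendar: CalId, CalDay, CalMonth, CalYear
• Product: ProdNo, ProdName, ProdType
• InvFact: InvFactNo, InvFactQty, IFUnitCost, IFQOH, IFProdPrice, IFSuppDisc
• Relationships to InvFact:
o Supplies: from Supplier
o PurchDate: from Calendar
o DelDate: from Calendar
o ProdOf: from Product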
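
One possible rendering of the star schema above as table definitions is sketched below with Python's built-in sqlite3 module. The column data types, the foreign key column names PurchCalId and DelCalId, and the nullability choices are assumptions. The diagram gives only the table names, column names, and relationship names.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (
    SuppNo INTEGER PRIMARY KEY, SuppName TEXT, SuppPhone TEXT,
    SuppEmail TEXT, SuppDisc REAL
);
CREATE TABLE Calendar (
    CalId INTEGER PRIMARY KEY, CalDay INTEGER, CalMonth INTEGER,
    CalYear INTEGER
);
CREATE TABLE Product (
    ProdNo INTEGER PRIMARY KEY, ProdName TEXT, ProdType TEXT
);
-- Fact table: one row per individual product purchase.
CREATE TABLE InvFact (
    InvFactNo INTEGER PRIMARY KEY,
    SuppNo INTEGER NOT NULL REFERENCES Supplier(SuppNo),     -- Supplies
    ProdNo INTEGER NOT NULL REFERENCES Product(ProdNo),      -- ProdOf
    PurchCalId INTEGER NOT NULL REFERENCES Calendar(CalId),  -- PurchDate
    DelCalId INTEGER REFERENCES Calendar(CalId),             -- DelDate
    InvFactQty INTEGER, IFUnitCost REAL, IFQOH INTEGER,
    IFProdPrice REAL, IFSuppDisc REAL
);
""")

# List the created tables and the fact table's foreign keys.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
fact_fks = conn.execute("PRAGMA foreign_key_list(InvFact)").fetchall()
print(tables, len(fact_fks))
```

The two foreign keys to Calendar correspond to the two relationships that record the purchase and delivery dates.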
