Schema
Schema
Schema is a logical description of the entire database. It includes the name and description of
records of all record types including all associated data-items and aggregates. Much like a
database, a data warehouse also requires to maintain a schema. A database uses relational
model, while a data warehouse uses Star, Snowflake, and Fact Constellation schema. In this
chapter, we will discuss the schemas used in a data warehouse.
Star Schema
There is a fact table at the center. It contains the keys to each of four dimensions.
The fact table also contains the attributes, namely dollars sold and units sold.
Note − Each dimension has only one dimension table and each table holds a set of attributes.
For example, the location dimension table contains the attribute set {location_key, street,
city, province_or_state,country}. This constraint may cause data redundancy. For example,
"Vancouver" and "Victoria" both the cities are in the Canadian province of British Columbia.
The entries for such cities may cause data redundancy along the attributes province_or_state
and country.
Snowflake Schema
Now the item dimension table contains the attributes item_key, item_name, type,
brand, and supplier-key.
The supplier key is linked to the supplier dimension table. The supplier dimension
table contains the attributes supplier_key and supplier_type.
Note − Due to normalization in the Snowflake schema, the redundancy is reduced and
therefore, it becomes easy to maintain and the save storage space.
A fact constellation has multiple fact tables. It is also known as galaxy schema.
The following diagram shows two fact tables, namely sales and shipping.
The sales fact table is same as that in the star schema.
The shipping fact table has the five dimensions, namely item_key, time_key,
shipper_key, from_location, to_location.
The shipping fact table also contains two measures, namely dollars sold and units
sold.
It is also possible to share dimension tables between fact tables. For example, time,
item, and location dimension tables are shared between the sales and shipping fact
table.
Schema Definition
Multidimensional schema is defined using Data Mining Query Language (DMQL). The two
primitives, cube definition and dimension definition, can be used for defining the data
warehouses and data marts.