
Star and Snowflake Schemas: What Is A Star Schema?

- A star schema stores dimension data in a single table for each dimension, while a snowflake schema uses multiple tables to store each dimension's hierarchical levels.
- Snowflake schemas are preferable for large customer dimensions, complex financial products, and calendars with unique periods/holidays.
- Star schemas are simpler and involve fewer joins, but snowflake schemas reduce data redundancy. Kimball recommends star schemas for most cases.
Copyright
© Attribution Non-Commercial (BY-NC)

Star and Snowflake Schemas

In a relational implementation, the dimensional design is mapped to a set of relational tables. You can implement the design using either of two methods: a star schema or a snowflake schema.

What Is a Star Schema?


A star schema model can be depicted as a simple star: a central table contains fact data and multiple tables radiate out from it, connected by the primary and foreign keys of the database. In a star schema implementation, Warehouse Builder stores the dimension data in a single table or view for all the dimension levels. For example, if you implement the Product dimension using a star schema, Warehouse Builder uses a single table to implement all the levels in the dimension, as shown in the screenshot. The attributes in all the levels are mapped to different columns in a single table called PRODUCT.
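The following minimal sketch makes the single-table idea concrete. It uses Python's built-in sqlite3 module rather than Warehouse Builder. The level names (product, subcategory, category), the column names, and the sample rows are hypothetical. The point is only that every level of the Product dimension is a column of one PRODUCT table, so rolling up to a higher level needs no join.

```python
import sqlite3

def build_star_product(conn: sqlite3.Connection) -> None:
    """Create a star-style Product dimension: every level lives in one table."""
    conn.execute("""
        CREATE TABLE PRODUCT (
            product_id   INTEGER PRIMARY KEY,  -- key that fact tables would reference
            product_name TEXT NOT NULL,        -- lowest level (hypothetical)
            subcategory  TEXT NOT NULL,        -- middle level, repeated for each product
            category     TEXT NOT NULL         -- top level, repeated for each product
        )
    """)
    conn.executemany(
        "INSERT INTO PRODUCT VALUES (?, ?, ?, ?)",
        [
            (1, "Dark Chocolate Bar", "Chocolate", "Confectionery"),
            (2, "Milk Chocolate Bar", "Chocolate", "Confectionery"),
            (3, "Lemon Drops", "Hard Candy", "Confectionery"),
            (4, "Cola 2L", "Soft Drinks", "Beverages"),
        ],
    )

conn = sqlite3.connect(":memory:")
build_star_product(conn)

# Rolling up to the category level is a plain GROUP BY on a column: no join needed.
rows = conn.execute(
    "SELECT category, COUNT(*) FROM PRODUCT GROUP BY category ORDER BY category"
).fetchall()
print(rows)  # [('Beverages', 1), ('Confectionery', 3)]
```

The price of this simplicity is redundancy: "Confectionery" is stored once per product rather than once overall.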

What Is a Snowflake Schema?


The snowflake schema represents a dimensional model which is also composed of a central fact table and a set of constituent dimension tables which are further normalized into sub-dimension tables. In a snowflake schema implementation, Warehouse Builder uses more than one table or view to store the dimension data. Separate database tables or views store data pertaining to each level in the dimension.

The screenshot displays the snowflake implementation of the Product dimension. Each level in the dimension is mapped to a different table.
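For contrast, here is a minimal sqlite3 sketch of a snowflake-style Product dimension, again with hypothetical levels, table names, and data. Each level gets its own table, and each child level holds a foreign key to its parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per level of a hypothetical Product hierarchy.
    CREATE TABLE PRODUCT_CATEGORY (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    CREATE TABLE PRODUCT_SUBCATEGORY (
        subcategory_id   INTEGER PRIMARY KEY,
        subcategory_name TEXT NOT NULL,
        category_id      INTEGER NOT NULL REFERENCES PRODUCT_CATEGORY(category_id)
    );
    CREATE TABLE PRODUCT (
        product_id     INTEGER PRIMARY KEY,
        product_name   TEXT NOT NULL,
        subcategory_id INTEGER NOT NULL REFERENCES PRODUCT_SUBCATEGORY(subcategory_id)
    );

    INSERT INTO PRODUCT_CATEGORY VALUES (1, 'Confectionery'), (2, 'Beverages');
    INSERT INTO PRODUCT_SUBCATEGORY VALUES
        (10, 'Chocolate', 1), (11, 'Hard Candy', 1), (20, 'Soft Drinks', 2);
    INSERT INTO PRODUCT VALUES
        (100, 'Dark Chocolate Bar', 10), (101, 'Milk Chocolate Bar', 10),
        (102, 'Lemon Drops', 11), (200, 'Cola 2L', 20);
""")

# Each category name is stored once, but reaching it from a product takes two joins.
rows = conn.execute("""
    SELECT c.category_name, COUNT(*)
    FROM PRODUCT p
    JOIN PRODUCT_SUBCATEGORY s ON s.subcategory_id = p.subcategory_id
    JOIN PRODUCT_CATEGORY c    ON c.category_id    = s.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(rows)  # [('Beverages', 1), ('Confectionery', 3)]
```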

When Do You Use a Snowflake Schema Implementation?


Ralph Kimball, the data warehousing guru, proposes three cases where a snowflake implementation is not only acceptable but is also the key to a successful design:

- Large customer dimensions where, for example, 80 percent of the fact table measurements involve anonymous visitors about whom you collect little detail, and 20 percent involve reliably registered customers about whom you collect much detailed data by tracking many dimensions.
- Financial product dimensions for banks, brokerage houses, and insurance companies, because each of the individual products has a host of special attributes not shared by other products.
- Multienterprise calendar dimensions, because each organization has idiosyncratic fiscal periods, seasons, and holidays.

Ralph Kimball recommends star schemas as the better solution in most other cases. Although a normalized snowflake reduces redundancy, it requires more joins. Kimball usually advises against exposing end users to a physical snowflake design, because it almost always compromises understandability and performance.

Understanding Star and Snowflake Schemas




Dimensions Review
This article discusses how to design your model using the star or snowflake schemas. Dimensions are the objects or business entities in your database, and each one is a group of related attributes about that object. It's important to distinguish between dimension tables and the dimensions themselves. When we create our unified dimensional model, we model dimension tables from which we ultimately create dimensions, so don't confuse the two. When we review the approach to modeling the data, you'll see how dimension tables and dimensions differ. In the dimensions article, we also talked about measures. Measures are the data values that you want to report on. They are numeric, usually additive, and stored in fact tables. A good way to identify your dimensions and fact tables the first time is to listen to how people describe the reports they'd like to see. For example, a user might say they would like a report that shows sales by date and product line. In this example, the sales data would be a measure stored in your fact table, and date and product would be your dimension tables.

Modeling Our Database: Star or Snowflake

We can follow two design approaches: the star schema and the snowflake schema. The star schema is the simpler of the two, and it has a few basic rules. First, each dimension is represented by one dimension table, so there is a one-to-one match between the dimension tables you're modeling and the actual dimensions that will be built into your OLAP cube. Second, each dimension table is related, or linked, to a fact table, so every dimension table in your star schema model will be linked to a fact table.

In the above diagram, two dimension tables, DimProduct and DimLocation, are linked to a fact table. The fact table is in the middle of the diagram. Each dimension table is linked directly to the fact table and represents one dimension. The naming convention you see is widely used in dimensional modeling: dimension tables are prefixed with Dim and fact tables are prefixed with Fact.

Star Schema

Let's expand on the diagram a little so you can see how it resembles a star, which is where the star schema got its name. Notice also that the dimension tables relate to the fact table through primary keys and foreign keys. For example, the DimProduct dimension at the top left has a primary key of ProductID, which appears as a foreign key in the FactSales table in the middle. The FactSales table is therefore just a collection of foreign keys to all the dimensions in the star schema, plus the measures you want to report on, as the diagram shows. The fact table in your dimensional model (in this case FactSales) will usually hold far more rows than the dimension tables, because it holds all of the detail you will be reporting on. Take some time to study this star schema, and pay particular attention to the relationships between each of the dimension tables and the fact table.

If you look a little longer at the FactSales table, you can see SalesOrderNumber, which is the primary key of the fact table itself and is not related to any dimension. Then come the four foreign keys, each matching the primary key of one of the four related dimensions: DimProduct by ProductID, DimLocation by LocationID, DimPromotion by PromotionID, and DimTime by TimeID. The remaining fields, SaleAmmount, TaxAmmount, and UnitPrice, are the measures, and each of these measures can be viewed by any of those dimensions.
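The FactSales layout described above can be sketched in sqlite3 as follows. The key and measure column names (including the spellings SaleAmmount and TaxAmmount) come from the text. The descriptive dimension columns, such as ProductName and CalendarYear, and all the data are hypothetical. The closing query is a star query: every dimension joins directly to the fact table, and no dimension joins to another dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the fact-to-dimension links
conn.executescript("""
    CREATE TABLE DimProduct   (ProductID   INTEGER PRIMARY KEY, ProductName   TEXT);
    CREATE TABLE DimLocation  (LocationID  INTEGER PRIMARY KEY, City          TEXT);
    CREATE TABLE DimPromotion (PromotionID INTEGER PRIMARY KEY, PromotionName TEXT);
    CREATE TABLE DimTime      (TimeID      INTEGER PRIMARY KEY, CalendarYear  INTEGER);

    -- The fact table: its own key, one foreign key per dimension, then the measures.
    CREATE TABLE FactSales (
        SalesOrderNumber INTEGER PRIMARY KEY,
        ProductID   INTEGER NOT NULL REFERENCES DimProduct(ProductID),
        LocationID  INTEGER NOT NULL REFERENCES DimLocation(LocationID),
        PromotionID INTEGER NOT NULL REFERENCES DimPromotion(PromotionID),
        TimeID      INTEGER NOT NULL REFERENCES DimTime(TimeID),
        SaleAmmount REAL,
        TaxAmmount  REAL,
        UnitPrice   REAL
    );

    INSERT INTO DimProduct   VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO DimLocation  VALUES (1, 'Seattle'), (2, 'Boston');
    INSERT INTO DimPromotion VALUES (1, 'None'), (2, 'Spring Sale');
    INSERT INTO DimTime      VALUES (1, 2023), (2, 2024);

    INSERT INTO FactSales VALUES
        (1001, 1, 1, 1, 1, 100.0, 10.0, 25.0),
        (1002, 2, 1, 2, 2, 200.0, 20.0, 50.0),
        (1003, 1, 2, 2, 2, 150.0, 15.0, 25.0);
""")

# Star query: view the SaleAmmount measure by two dimensions at once.
rows = conn.execute("""
    SELECT p.ProductName, t.CalendarYear, SUM(f.SaleAmmount)
    FROM FactSales f
    JOIN DimProduct p ON p.ProductID = f.ProductID
    JOIN DimTime    t ON t.TimeID    = f.TimeID
    GROUP BY p.ProductName, t.CalendarYear
    ORDER BY p.ProductName, t.CalendarYear
""").fetchall()
print(rows)  # [('Gadget', 2024, 200.0), ('Widget', 2023, 100.0), ('Widget', 2024, 150.0)]
```

With foreign keys enabled, a fact row that points at a nonexistent dimension member is rejected, which keeps every measure reachable from every dimension.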

Snowflake Schema

Next we have the snowflake schema, which varies slightly from the star schema. In the snowflake schema, a dimension is represented by more than one dimension table. In other words, it takes multiple dimension tables to define a dimension. Not all dimension tables are linked to the fact table, because some dimension tables are related only to other dimension tables.

In the diagram, three dimension tables together represent one dimension: the Product dimension. With the star schema approach, all the information in these three dimension tables would be in a single dimension table. This is where the snowflake schema really differs in approach from the star. Also note that, because the dimension is represented by more than one dimension table, not all the dimension tables are related to the fact table. In the next diagram, a few more tables have been added to give a better picture of what the schema will look like. The name snowflake comes from the idea that the diagram somewhat resembles a snowflake. At any rate, these are the key differences between the snowflake and star schemas.
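One way to confirm that not every table in a snowflake joins to the fact table is to ask the database which tables each table references. This sqlite3 sketch uses hypothetical table names (DimProductSubcategory, DimProductCategory) for two of the three tables that make up the Product dimension. It reads the foreign keys back with SQLite's `PRAGMA foreign_key_list`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Three tables that together represent one Product dimension.
    CREATE TABLE DimProductCategory (
        ProductCategoryID INTEGER PRIMARY KEY,
        CategoryName      TEXT
    );
    CREATE TABLE DimProductSubcategory (
        ProductSubcategoryID INTEGER PRIMARY KEY,
        SubcategoryName      TEXT,
        ProductCategoryID    INTEGER REFERENCES DimProductCategory(ProductCategoryID)
    );
    CREATE TABLE DimProduct (
        ProductID            INTEGER PRIMARY KEY,
        ProductName          TEXT,
        ProductSubcategoryID INTEGER REFERENCES DimProductSubcategory(ProductSubcategoryID)
    );
    -- Only the lowest-level product table is referenced by the fact table.
    CREATE TABLE FactSales (
        SalesOrderNumber INTEGER PRIMARY KEY,
        ProductID        INTEGER REFERENCES DimProduct(ProductID),
        SaleAmmount      REAL
    );
""")

def referenced_tables(conn: sqlite3.Connection, table: str) -> list:
    """Return the sorted names of the tables that `table` points to via foreign keys."""
    # Each row of PRAGMA foreign_key_list describes one FK column; index 2 is the parent table.
    return sorted({row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")})

print(referenced_tables(conn, "FactSales"))              # ['DimProduct']
print(referenced_tables(conn, "DimProduct"))             # ['DimProductSubcategory']
print(referenced_tables(conn, "DimProductSubcategory"))  # ['DimProductCategory']
```

The subcategory and category tables are reached only through other dimension tables, never directly from FactSales.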

In my mind, which way you choose to go has a lot to do with personal preference, and there's an ongoing debate about which of these is better. Honestly, both have advantages and disadvantages. In my experience, the star schema is used the most and is preferred by many because it's simple to work with. Really, though, the choice is up to you, and both will work in Analysis Services.

When do you use a star schema and when to use a snowflake schema?

Question: Could you please let me know when to use a star schema and when to use a snowflake schema? What are the major differences between them?

Chuck Kelley's Answer: My personal opinion is to use the star by default, but if the product you are using for the business community prefers a snowflake, then I would snowflake it. The major difference between snowflake and star is that a snowflake will have multiple tables for a dimension and a star will have a single table. For example, your company structure might be

Corporate > Region > Department > Store

In a star schema, you would collapse those into a single "store" dimension. In a snowflake, you would keep them apart with the store connecting to the fact.
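Chuck Kelley's example can be sketched in plain Python. The snowflake keeps each level in its own lookup keyed by id, and collapsing it into a single "store" dimension means walking each store up to its ancestors. The dictionaries, the `StoreDimRow` class, the `collapse_to_star` function, and the sample names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoreDimRow:
    """One row of a star-style Store dimension: every hierarchy level is a column."""
    store_id: int
    store: str
    department: str
    region: str
    corporate: str

def collapse_to_star(corporates, regions, departments, stores):
    """Flatten a snowflaked Corporate > Region > Department > Store hierarchy.

    `regions`, `departments`, and `stores` map an id to (name, parent_id);
    `corporates` maps an id to a name because it is the top level.
    """
    rows = []
    for store_id, (store_name, dept_id) in sorted(stores.items()):
        dept_name, region_id = departments[dept_id]  # step up to the department
        region_name, corp_id = regions[region_id]     # then to the region
        rows.append(StoreDimRow(store_id, store_name, dept_name,
                                region_name, corporates[corp_id]))
    return rows

# Hypothetical snowflake-form data: one lookup per level, each pointing to its parent.
corporates = {1: "Acme Corp"}
regions = {10: ("West", 1), 11: ("East", 1)}
departments = {100: ("Grocery", 10), 101: ("Grocery", 11)}
stores = {1000: ("Store 12", 100), 1001: ("Store 15", 100), 1002: ("Store 40", 101)}

star_rows = collapse_to_star(corporates, regions, departments, stores)
for row in star_rows:
    print(row)
```

In the flattened rows, "Acme Corp" and "Grocery" repeat on every store. That redundancy is what the star accepts in exchange for needing no joins inside the dimension.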

Joe Oates' Answer: First of all, some definitions are in order. In a star schema, dimensions that reflect a hierarchy are flattened into a single table. For example, a star schema Geography dimension would have columns like country, state/province, city, and postal code. In the source system, this hierarchy would probably be normalized into multiple tables with one-to-many relationships.

A snowflake schema does not flatten a hierarchy dimension into a single table. It would, instead, have two or more tables with a one-to-many relationship. This is a more normalized structure. For example, one table may have state/province and country columns and a second table would have city and postal code. The table with city and postal code would have a many-to-one relationship to the table with the state/province columns.
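Joe Oates' geography example can be sketched in sqlite3 with hypothetical table names (GeoState, GeoCity, DimGeography) and sample rows. The snowflaked tables hold the hierarchy in normalized form, and a single join flattens them into a star-style Geography dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked geography: the "one" side holds state/province and country ...
    CREATE TABLE GeoState (
        state_id       INTEGER PRIMARY KEY,
        state_province TEXT NOT NULL,
        country        TEXT NOT NULL
    );
    -- ... and the "many" side holds city and postal code, pointing back to its state.
    CREATE TABLE GeoCity (
        city_id     INTEGER PRIMARY KEY,
        city        TEXT NOT NULL,
        postal_code TEXT NOT NULL,
        state_id    INTEGER NOT NULL REFERENCES GeoState(state_id)
    );
    INSERT INTO GeoState VALUES (1, 'Ontario', 'Canada'), (2, 'Texas', 'USA');
    INSERT INTO GeoCity VALUES
        (1, 'Toronto', 'M5H', 1),
        (2, 'Ottawa',  'K1P', 1),
        (3, 'Austin',  '73301', 2);

    -- Star-style flattening: one wide table, with state and country repeated per city.
    CREATE TABLE DimGeography AS
    SELECT c.city_id AS geography_id, s.country, s.state_province, c.city, c.postal_code
    FROM GeoCity c
    JOIN GeoState s ON s.state_id = c.state_id;
""")

geo_rows = conn.execute("SELECT * FROM DimGeography ORDER BY geography_id").fetchall()
for row in geo_rows:
    print(row)
```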

There are some good reasons for snowflaking dimension tables. One example is a company that has many types of products. Some products have a few attributes; others have many, many. The products are very different from each other. The thing to do here is to create a core Product dimension that has common attributes for all the products, such as product type, manufacturer, brand, product group, etc. Create a separate sub-dimension table for each distinct group of products where each group shares common attributes. The subproduct tables must contain a foreign key to the core Product dimension table.

One of the criticisms of snowflake dimensions is that some multidimensional front-end presentation tools have difficulty generating a query on a snowflake dimension. However, you can create a view for each combination of the core product/subproduct dimension tables and give the view a suitably descriptive name (Frozen Food Product, Hardware Product, etc.). These tools will then have no problem.
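The core/subproduct design and the per-combination views described in the last two paragraphs might look like this in sqlite3. The view names follow the text's examples. The table and column names and the data are hypothetical. Each subproduct table reuses the core ProductKey as both its primary key and its foreign key, which is one common way to express the link, though not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Core Product dimension: attributes shared by every product.
    CREATE TABLE DimProduct (
        ProductKey   INTEGER PRIMARY KEY,
        ProductType  TEXT NOT NULL,
        Manufacturer TEXT,
        Brand        TEXT,
        ProductGroup TEXT
    );
    -- Sub-dimension tables, one per group of similar products; each carries the core key.
    CREATE TABLE DimProductFrozenFood (
        ProductKey    INTEGER PRIMARY KEY REFERENCES DimProduct(ProductKey),
        StorageTempC  REAL,
        ShelfLifeDays INTEGER
    );
    CREATE TABLE DimProductHardware (
        ProductKey INTEGER PRIMARY KEY REFERENCES DimProduct(ProductKey),
        Material   TEXT,
        WeightKg   REAL
    );

    -- One view per core/subproduct combination, so front-end tools see one flat table.
    CREATE VIEW FrozenFoodProduct AS
    SELECT p.*, f.StorageTempC, f.ShelfLifeDays
    FROM DimProduct p JOIN DimProductFrozenFood f ON f.ProductKey = p.ProductKey;

    CREATE VIEW HardwareProduct AS
    SELECT p.*, h.Material, h.WeightKg
    FROM DimProduct p JOIN DimProductHardware h ON h.ProductKey = p.ProductKey;

    INSERT INTO DimProduct VALUES
        (1, 'Frozen Food', 'Acme Foods', 'Polar', 'Grocery'),
        (2, 'Hardware',    'Acme Tools', 'Forge', 'Home Improvement');
    INSERT INTO DimProductFrozenFood VALUES (1, -18.0, 365);
    INSERT INTO DimProductHardware   VALUES (2, 'Steel', 0.45);
""")

print(conn.execute("SELECT * FROM FrozenFoodProduct").fetchall())
# [(1, 'Frozen Food', 'Acme Foods', 'Polar', 'Grocery', -18.0, 365)]
```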

Types of Views in OBIEE


View Name: Description

- Compound Layout: Use the compound layout view to assemble different views for display on a dashboard. On the Criteria tab, you can click the following button to access the compound layout view.
- Title: Use the title view to add a title, a subtitle, a logo, a link to a custom online help page, and timestamps to the results.
- Table: Use the table view to show results in a standard table. Users can navigate and drill down in the results. You can add totals, customize headings, and change the formula or aggregation rule for a column. You can also control the appearance of a column and its contents, and specify formatting to apply only if the contents of the column meet certain conditions. On the Criteria tab, you can click the following button to access the table view.
- Chart: Use the chart view to drag and drop columns to a layout chart. You can customize the title, legend location, axis titles, and data labels. You can customize the size and scale of the chart, and control colors using a style sheet. Oracle BI Answers supports a variety of standard chart types, including bar charts, column charts, line charts, area charts, pie charts, and scatter charts. Custom chart subtypes include two- and three-dimensional, absolute, clustered, stacked, combination, and custom. On the Criteria tab, you can click the following button to access the chart view.
- Pivot Table: Use the pivot table view to take row, column, and section headings and swap them around to obtain different perspectives. You can drag and drop headings to pivot results, preview them, and apply the settings. Users can navigate through pivot tables and drill down into information. Users can create complex pivot tables that show aggregate and nonrelated totals next to the pivoted data, allowing for flexible analysis. For an interactive result set, elements can be placed in pages, allowing users to choose elements. On the Criteria tab, you can click the following button to access the pivot table view.
- Gauge: Use the gauge view to show results as gauges, such as dial, bar, and bulb-style gauges.
- Filters: Use the filters view to show the filters in effect for a request. Filters allow you to constrain a request to obtain results that answer a particular question.
- Column Selector: Use the column selector view to permit users to dynamically change which columns appear in results. This allows users to analyze data along several dimensions. By changing the facts, users can dynamically alter the content of the results.
- View Selector: Use the view selector view to select a specific view of the results from among the saved views. When placed on a dashboard, the view selector appears as a drop-down list from which users can make a selection.
- Legend: Use the legend view to document the meaning of special formatting used in results, such as the meaning of custom colors applied to gauges.
- Funnel Chart: Use the funnel chart view to show a three-dimensional chart that represents target and actual values using volume, level, and color. It is useful for depicting target values that decline over time, such as a sales pipeline.
- Narrative: Use the narrative view to show the results as one or more paragraphs of text. You can type in a sentence with placeholders for each column in the results, and specify how rows should be separated.
- Ticker: Use the ticker view to show the results of the request as a ticker or marquee, similar in style to the stock tickers that run across many financial and news sites on the Internet. You can control what information is presented and how it scrolls across the page.
- Static Text: Use the static text view to include static text in the results. You can use HTML to add banners, tickers, ActiveX objects, Java applets, links, instructions, descriptions, graphics, and so on, in the results.
- No Results: The no results view allows you to specify explanatory text to appear if the request does not return any results.
- Logical SQL: Use the logical SQL view to show the SQL generated for the request. This view is useful for trainers and Oracle BI administrators, and is usually not included in results for typical users. You cannot modify this view, except to delete it.
- Create Segment: The create segment view is for users of Oracle's Siebel Marketing Version 7.7 (or higher) operational application. Use it to display a Create Segment link in the results. Users can click this link to create a segment in their Oracle Siebel Marketing operational application, based on the results data.
- Create Target List: The create target list view is for users of Oracle's Siebel Life Sciences operational application integrated with Oracle's Siebel Life Sciences Analytics applications. Use it to create a Create Target List link in the results. Users can click this link to create a target list, based on the results data, in their Oracle Siebel operational application.

Physical Schemas in Relational Data Sources

There are four types of physical schemas (models):
Star Schemas. A star schema is a set of dimensional schemas (stars) that each have a single fact table with foreign key join relationships to several dimension tables. When you map a star to the business model, you first map the physical fact columns to one or more logical fact tables. Then, for each physical dimension table that joins to the physical fact table for that star, you map the physical dimension columns to the appropriate conformed logical dimension tables.

Snowflake Schemas. A snowflake schema is similar to a star schema, except that each dimension is made up of multiple tables joined together. As with star schemas, you first map the physical fact columns to one or more logical tables. Then, for each dimension, you map the snowflaked physical dimension tables to a single logical table. You can do this either by having multiple logical table sources or by using a single logical table source with joins.

Normalized Schemas. Normalized schemas distribute data entities into multiple tables to minimize data storage redundancy and optimize data updates. Before mapping a normalized schema to the business model, you need to understand how its distributed structure maps to facts and dimensions.

After analyzing the structure, you pick a table that has fact columns and then map the physical fact columns to one or more logical fact tables. Then, for each dimension associated with that set of physical fact columns, you map the distributed physical tables containing dimensional columns to a single logical table. As with snowflake schemas, you can do this either by having multiple logical table sources or by using a single logical table source with joins. Mapping normalized schemas is an iterative process: you first map a certain set of facts, then the associated dimensions, and then you move on to the next set of facts. Note that when a single physical table has both fact and dimension columns, you may need to create a physical alias table to handle the multiple roles played by that table.

Fully Denormalized Schemas. This type of dimensional schema combines the facts and dimensions as columns in one table (or flat file), and it is mapped differently from other types of schemas. When you map a fully denormalized schema to the star-shaped business model, you map the physical fact columns from the single physical fact table to multiple logical fact tables in the business model. Then, you map the physical dimension columns to the appropriate conformed logical dimension tables.

Star Vs Snowflake schema


Star Schemas

The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables, and the points of the star are the dimension tables, as shown in the figure.

Fact Tables

A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated.

Dimension Tables

A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Dimension tables are generally small compared to the fact table.

As an example, assume this schema belongs to a retail chain (like Wal-Mart or Carrefour). The fact is revenue (money). The way you want to view that data is called a dimension.

In the above figure, you can see that the fact is revenue and that there are many dimensions from which to view the same data. You may want to look at revenue based on time (what was the revenue last quarter?), or you may want to look at revenue for a certain product (what was the revenue for chocolates?), and so on. In all these cases the fact is the same; only the dimension changes with the requirement.
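The retail example can be sketched with a tiny sqlite3 star schema. The same revenue fact is summed once by quarter and once by product, and only the dimension in the join changes. The table names, the helper function `revenue_by`, and the figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimTime    (time_id    INTEGER PRIMARY KEY, quarter      TEXT);
    CREATE TABLE DimProduct (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE FactRevenue (
        time_id    INTEGER REFERENCES DimTime(time_id),
        product_id INTEGER REFERENCES DimProduct(product_id),
        revenue    REAL
    );
    INSERT INTO DimTime    VALUES (1, '2024-Q1'), (2, '2024-Q2');
    INSERT INTO DimProduct VALUES (1, 'Chocolates'), (2, 'Soap');
    INSERT INTO FactRevenue VALUES (1, 1, 500.0), (1, 2, 300.0), (2, 1, 700.0), (2, 2, 200.0);
""")

def revenue_by(conn, dim_table, key, label):
    """Sum the same fact (revenue) grouped by one attribute of the chosen dimension."""
    # Identifiers are interpolated into the SQL, so pass only trusted table/column names.
    sql = (f"SELECT d.{label}, SUM(f.revenue) FROM FactRevenue f "
           f"JOIN {dim_table} d ON d.{key} = f.{key} "
           f"GROUP BY d.{label} ORDER BY d.{label}")
    return conn.execute(sql).fetchall()

print(revenue_by(conn, "DimTime", "time_id", "quarter"))
# [('2024-Q1', 800.0), ('2024-Q2', 900.0)]
print(revenue_by(conn, "DimProduct", "product_id", "product_name"))
# [('Chocolates', 1200.0), ('Soap', 500.0)]
```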

Note: In an ideal star schema, all the hierarchies of a dimension are handled within a single table.

Star Query

A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans for them.

Snowflake Schema

The snowflake schema is a variation of the star schema used in a data warehouse. The snowflake schema (sometimes called the snowflake join schema) is more complex than the star schema because the tables that describe the dimensions are normalized.

Drawbacks of "snowflaking"

- In a data warehouse, the fact table, in which data values (and their associated indexes) are stored, typically accounts for 90% or more of the storage requirements, so the storage saved by normalizing the dimension tables is normally insignificant.
- Normalization of the dimension tables ("snowflaking") can impair the performance of a data warehouse. Whereas conventional databases can be tuned to match the regular pattern of usage, such patterns rarely exist in a data warehouse. Snowflaking will increase the time taken to perform a query, and a design goal of many data warehouse projects is to minimize these response times.

Benefits of "snowflaking"

- If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.

- A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalised. A snowflake schema will hence be easier to implement.
- A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organisation of the database.

- Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools such as Cognos PowerPlay to form such a query, especially if provision for these forms of query wasn't anticipated when the data warehouse was first designed. In practice, many data warehouses will normalize some dimensions and not others, and hence use a combination of snowflake and classic star schema.

Basics of Data Model


Relational vs. Dimensional Model:

1) The dimensional modeling approach improves query performance for summary reports, thereby enabling a faster business decision-making process.
2) Dimensional modeling is easier to understand and navigate because it represents business subject areas in an intuitive style.
3) Dimensional data modeling is used for storing and reporting summarized data. For example, sales data could be collected on a daily basis and then aggregated to the week level, the week data could be aggregated to the month level, and so on.
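The daily roll-up in point 3 can be sketched in plain Python with the standard datetime module. The sales figures are hypothetical. Defining weeks as ISO calendar weeks is an assumption; other week definitions are also common.

```python
from collections import defaultdict
from datetime import date, timedelta

def roll_up(daily, period_of):
    """Aggregate {date: amount} into {period_key: total}, using period_of to pick each period."""
    totals = defaultdict(float)
    for day, amount in daily.items():
        totals[period_of(day)] += amount
    return dict(sorted(totals.items()))

def iso_week(d):
    """Period key for ISO calendar weeks: (ISO year, ISO week number)."""
    year, week, _ = d.isocalendar()
    return (year, week)

def month(d):
    """Period key for calendar months: (year, month)."""
    return (d.year, d.month)

# Hypothetical daily sales: 10.0 per day from Monday 2024-01-29 to Sunday 2024-02-04.
daily_sales = {date(2024, 1, 29) + timedelta(days=i): 10.0 for i in range(7)}

weekly = roll_up(daily_sales, iso_week)
monthly = roll_up(daily_sales, month)
print(weekly)   # {(2024, 5): 70.0}
print(monthly)  # {(2024, 1): 30.0, (2024, 2): 40.0}
```

The sample week straddles January and February. Because a week does not always fall inside one month, this sketch builds the monthly totals from the daily rows rather than from the weekly totals.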

Star Schema:
1) A star schema is a relational database schema for representing multidimensional data.
2) It is the simplest form of data warehouse schema and contains one or more dimension tables and fact tables.
3) The entity-relationship diagram between the dimensions and the fact table resembles a star, where one fact table is connected to multiple dimensions.
4) The center of the star schema consists of a large fact table, which points toward the dimension tables.
5) The advantages of a star schema are easy slicing of the data, improved performance, and an easily understood data structure.

Snowflake Schema:
1) A snowflake schema is a star schema structure normalized through the use of outrigger tables; that is, dimension table hierarchies are broken into simpler tables.
2) Hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME, respectively) and shown separately.
3) The snowflake schema approach increases the number of joins, which can cause poor performance when retrieving data.
4) Normalizing the dimensions makes the schema more difficult for users to understand.
5) Dimension tables are normally much smaller than fact tables, so storage space is rarely a big enough issue to warrant snowflaking.

Important aspects of star and snowflake schemas:
1) In a star schema, every dimension has a primary key.
2) In a star schema, a dimension table does not have any parent table.
3) In a snowflake schema, a dimension table has one or more parent tables.
4) In a star schema, the hierarchies for a dimension are stored in the dimension table itself.
5) In a snowflake schema, by contrast, hierarchies are broken into separate tables. These hierarchies help users drill down from the topmost level of a hierarchy to the lowest.

Model Implementation:
1) In the data modeling life cycle, the last phase is the implementation of the physical model. During the physical design process, you convert the data gathered during the logical design phase into a description of the physical database structure. Physical design decisions are mainly driven by query performance and database maintenance considerations.
2) During the physical design process, you translate the expected schemas into actual database structures. At this time, you have to map:
   a) Entities to tables
   b) Relationships to foreign key constraints
   c) Attributes to columns
   d) Primary unique identifiers to primary key constraints
   e) Unique identifiers to unique key constraints

3) Once you have converted your logical design to a physical design, you will need to create or use some or all of the following features while implementing your data warehouse solution: partitions, bitmap indexes, materialized views, analytic SQL functions, parallel execution, etc.
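The mapping in point 2 can be illustrated with a small sqlite3 sketch. The Customer and CustomerOrder entities, their attributes, and the helper `violates` are hypothetical, and SQLite stands in for whatever database hosts the warehouse. The sketch shows each logical element landing in its physical counterpart and the constraints rejecting bad rows. Partitions, bitmap indexes, and materialized views from point 3 are not available in SQLite, so they are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Entity "Customer" -> table; its attributes -> columns.
    CREATE TABLE Customer (
        customer_id INTEGER PRIMARY KEY,   -- primary unique identifier -> PRIMARY KEY
        email       TEXT NOT NULL UNIQUE,  -- other unique identifier   -> UNIQUE constraint
        full_name   TEXT NOT NULL
    );
    -- Entity "Order" -> table; relationship "Customer places Order" -> FOREIGN KEY.
    CREATE TABLE CustomerOrder (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
        order_date  TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO Customer VALUES (1, 'ann@example.com', 'Ann Lee')")

def violates(sql: str) -> bool:
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(violates("INSERT INTO Customer VALUES (2, 'ann@example.com', 'Another Ann')"))  # True (UNIQUE)
print(violates("INSERT INTO CustomerOrder VALUES (10, 99, '2024-01-01')"))            # True (FOREIGN KEY)
print(violates("INSERT INTO CustomerOrder VALUES (11, 1, '2024-01-01')"))             # False
```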
