The document discusses dimensional modeling concepts used in data warehouse design. Dimensional modeling organizes data into facts and dimensions. Facts are measures that are analyzed, while dimensions provide context for the facts. The dimensional model uses star and snowflake schemas to store data in denormalized tables optimized for querying. Key aspects covered include fact and dimension tables, slowly changing dimensions, and handling many-to-many and recursive relationships.
A Data Warehouse (DW) is a collection of integrated, detailed, historical data gathered from different sources, designed to support management decision making. Many approaches exist for designing a data warehouse, in both the conceptual and the logical design phases. Conceptual design approaches include the dimensional fact model, the multidimensional E/R model, the starER model, and the object-oriented multidimensional model; logical design approaches include the flat schema, star schema, fact constellation schema, galaxy schema, and snowflake schema. This paper focuses on comparing Dimensional Modelling and E-R modelling in the data warehouse. Dimensional Modelling (DM) is the most popular technique in data warehousing: in DM, a model of tables and relations is used to optimize decision-support query performance in relational databases. Conventional E-R models, by contrast, are used to remove redundancy from the data model, facilitate retrieval of individual records having certain critical identifiers, and optimize On-line Transaction Processing (OLTP) performance.
2. Outline
• What is a Multi-dimensional Database
• What is a data-warehouse
• Review ER-Diagrams
• Problems with ER for OLAP Purposes
3. Outline
• What is Dimensional Modeling?
• Star Schemas (Facts and Dimensions)
• Star Schema vs. ER Diagram
• SQL Comparison
4. Outline (continued)
• Strengths of Dimensional Modeling
• Myths of Dimensional Modeling
• Designing the Data warehouse
• Keys
• References
5. What is an MDDB?
An MDDB is a specialized data storage facility that
stores summarized data for fast and easy access. Users
can quickly view large amounts of data as a value at
any cross-section of business dimensions. A business
dimension can be any logical view of the data -- time,
geography, or product, for example. Once an MDDB is
created, it can be copied or transported to any
platform. In addition, regardless of where the MDDB
resides, it is accessible to requesting applications on
any supported platform anywhere on the network,
including the Web.
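The idea of reading a value at any cross-section of business dimensions can be sketched without any MDDB product at all. The snippet below (all sample data invented for illustration) keys summarized sales by a tuple of dimension values and sums over whichever dimensions are left unfixed:

```python
# Minimal in-memory "cube": summarized sales keyed by (time, geography, product).
# The data values are hypothetical, for illustration only.
from collections import defaultdict

rows = [
    ("2024-Q1", "EMEA", "Widget", 120.0),
    ("2024-Q1", "EMEA", "Gadget", 80.0),
    ("2024-Q1", "APAC", "Widget", 50.0),
    ("2024-Q2", "EMEA", "Widget", 140.0),
]

cube = defaultdict(float)
for time, geo, product, amount in rows:
    cube[(time, geo, product)] += amount

def slice_sum(time=None, geo=None, product=None):
    """Sum the measure over every cell matching the fixed dimension values."""
    return sum(
        v for (t, g, p), v in cube.items()
        if (time is None or t == time)
        and (geo is None or g == geo)
        and (product is None or p == product)
    )

print(slice_sum(geo="EMEA"))                        # 340.0
print(slice_sum(time="2024-Q1", product="Widget"))  # 170.0
```

Fixing two dimensions and leaving one free is exactly the "cross-section" view the slide describes; a real MDDB precomputes and stores these summaries rather than scanning cells.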
6. MDDB (continued)
An MDDB can be implemented either on a proprietary
MDDB product or as a dimensional model on an
RDBMS. The latter is more common. For our
purposes we will use Oracle 8i, a relational
database. Proprietary MDDB products include Oracle's
Express, Arbor Essbase, Microsoft's SQL Server OLAP
component, etc.
7. What is a data warehouse?
Data warehouses began in the 1970s out of the need of many
companies to combine the data of their various operational systems
into a useful and consistent form for analysis.
Data warehouses are used to provide data to Decision Support
Systems (DSS). Many data warehouses also work with OLAP
(Online Analytical Processing) servers and clients.
Data warehouses are updated only in batch, not by transactions.
They are optimized for SQL SELECTs. This optimization includes
de-normalization.
8. DW (continued)
Inmon’s Four Characteristics of a Data Warehouse :
1. Subject-Oriented: DW’s answer a question, they don’t just
store data.
2. Integrated: DW’s provide a unified view of the company’s
data.
3. Nonvolatile: DW’s are read-only for analytical purposes, de-
normalization is ok.
4. Time: DW-Data is time sensitive. Analyze the past to
predict the future.
10. Review of ER Modeling
Entity-relationship modeling is a logical design technique
that seeks to eliminate data redundancy and maintain the
integrity of the database. It does this by highly normalizing
the data. The more you normalize, the more entities and
relationships you wind up with.
This is necessary in an online transaction processing
(OLTP) system because inserts, deletes, and updates
against de-normalized data require additional transactions
to keep all the redundant data in sync. This is both highly
inefficient and prone to errors.
The ER Model is the best model for OLTP.
11. The Problem with ER Diagrams
ER Diagrams are a spider web of all entities and their
relationship to other entities throughout the database
schema. Unrelated relationships clutter the view of what
you really want to get at.
ER Diagrams are too complex for most end users to
understand, and because of all the joins required to get any
meaningful data for analysis, they are highly inefficient.
This makes them unsuitable for data-warehouses, which need
intuitive, high-performance retrieval of data.
13. What is Dimensional Modeling?
Dimensional modeling is a logical design technique often
used for data-warehouses. It seeks to present the data in a
standard framework that is intuitive and allows for
high-performance access.
Dimensional modeling provides the best results for both
ease of use and high performance.
14. It uses the relational model with a few restrictions:
Every dimensional model is composed of one table with a
multi-part key, called the fact table, and a set of smaller
tables called dimension tables. Each dimension has a
single-part primary key that corresponds exactly to one of the
components of the multi-part key in the fact table. This
creates a structure that looks a lot like a star, hence the
term “Star Schema”.
Interestingly, early (late 60’s) implementations of relational
databases looked a lot like Star Schemas. They pre-dated
ER Diagrams.
16. What is a Fact Table?
A fact table is composed of two or more foreign keys that
together form its multi-part primary key, and it usually also
contains numeric data. Because it always contains at least
two foreign keys, it always represents a many-to-many (M-M)
relationship.
17. What is a Dimension?
Dimension tables on the other hand have a primary key and
only textual data or non-text data that is used for textual
purposes. This data is used for descriptive purposes only.
Each dimension is considered an equal entry point into the fact
table. The textual attributes that describe things are organized
within the dimensions. For example in a retail database we
would have product, store, customer, promotion, and time
dimensions.
Whether or not to combine related dimensions into one
dimension is usually up to intuition. Remember, however,
that the guiding principles of dimensional modeling are 1.
intuitive design and 2. performance.
18. Dimensions (continued)
Because Dimensions are the entry point into the facts that the
user is looking for they should be very descriptive and
intuitive to the user. Here are some rules:
• Verbose (full words)
• Descriptive
• Complete (no missing values)
• Quality assured (no misspellings, impossible values, obsolete or
orphaned values, or cosmetically different versions of the same
attribute)
• Indexed (perhaps B-tree or bitmap)
• Documented in metadata that explains the origin and interpretation
of each attribute.
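The fact/dimension split described above can be sketched concretely. The following is a minimal, hypothetical retail star schema (table and column names are illustrative, not from the text) built with Python's sqlite3: the fact table's multi-part primary key is composed of foreign keys into the dimension tables, and any dimension serves as an equal entry point into the facts.

```python
import sqlite3

# Hypothetical star schema: one fact table, two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,     -- single-part surrogate key
    description TEXT NOT NULL,           -- verbose, descriptive attributes
    category    TEXT NOT NULL
);
CREATE TABLE date_dim (
    date_key    INTEGER PRIMARY KEY,
    full_date   TEXT NOT NULL,
    month_name  TEXT NOT NULL
);
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product_dim,
    date_key    INTEGER NOT NULL REFERENCES date_dim,
    quantity    INTEGER NOT NULL,        -- numeric, additive facts
    amount      REAL NOT NULL,
    PRIMARY KEY (product_key, date_key)  -- multi-part key of foreign keys
);
""")
con.execute("INSERT INTO product_dim VALUES (1, 'Blue Widget', 'Widgets')")
con.execute("INSERT INTO date_dim VALUES (20240101, '2024-01-01', 'January')")
con.execute("INSERT INTO sales_fact VALUES (1, 20240101, 5, 49.95)")

# Slice the facts through a dimension attribute.
row = con.execute("""
    SELECT p.description, SUM(f.quantity), SUM(f.amount)
    FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY p.description
""").fetchone()
print(row)  # ('Blue Widget', 5, 49.95)
```

Note how the descriptive text lives only in the dimensions, while the fact table holds nothing but keys and additive numbers.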
19. SQL Comparison
Dimensional Model:
SELECT description, SUM(quoted_price), SUM(quantity),
       SUM(unit_price), SUM(total_comm)
FROM order_fact f JOIN part_dimension pd ON f.part_nr = pd.part_nr
GROUP BY description;
ER-Model:
SELECT description, SUM(quoted_price), SUM(quantity),
       SUM(unit_price), SUM(total_comm)
FROM order o JOIN order_detail od ON o.order_nr = od.order_nr
JOIN part p ON p.part_nr = od.part_nr
JOIN customer c ON o.customer_nr = c.customer_nr
JOIN slsrep s ON s.slsrep_nr = c.slsrep_nr
GROUP BY description;
Notice that the dimensional model joins only two tables, while the ER model
joins all five tables in the ER Diagram. This is very typical of highly
normalized ER models. Imagine a typical normalized database with hundreds
of tables.
20. Rules about Facts and Dimensions:
The basic tenet of dimensional modeling: “If you want to be
able to slice your data along a particular attribute, you simply
need to make the attribute appear in a dimension table.”
Facts and their corresponding dimensions must be of the
same granularity. Meaning if the fact table holds numeric
data for days, then the dimensions must have
attributes that describe daily data.
An attribute can live in one and only one dimension, whereas
a fact can be repeated in multiple fact tables.
If a dimension appears to have more than one location, it is
probably playing multiple roles and needs a slightly different
textual description.
21. Rules (continued)
There is not necessarily a one-to-one relation between
source data and dimensional data; in fact, usually one
source will create multiple dimensions, or multiple sources
will create one dimension.
Every fact should have a default aggregation. Even if that
aggregation is No Aggregation.
22. ER to Dimensional Models
1. Separate each entity into the business process that it
represents.
2. Create fact tables by selecting M-M relationships that
contain numeric and additive non-key facts. Fact tables may
be at a detail level or an aggregated level depending on
business needs.
3. Create Dimensions by de-normalizing all the remaining
tables into flat tables with atomic keys that connect directly
to the fact tables.
Kimball: 146/147
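Step 3 above, flattening normalized ER tables into one dimension, can be sketched as follows. The example reuses the customer/slsrep tables from the SQL comparison slide, with hypothetical columns and data, and pre-joins them into a flat customer dimension.

```python
import sqlite3

# De-normalizing two normalized ER tables (customer -> slsrep)
# into one flat dimension table. Column names and data are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE slsrep   (slsrep_nr INTEGER PRIMARY KEY, rep_name TEXT);
CREATE TABLE customer (customer_nr INTEGER PRIMARY KEY,
                       cust_name TEXT,
                       slsrep_nr INTEGER REFERENCES slsrep);
INSERT INTO slsrep   VALUES (10, 'Pat Jones');
INSERT INTO customer VALUES (1, 'Acme Inc.', 10);

-- The dimension is the pre-joined, flat version of both tables,
-- keyed so it can connect directly to a fact table.
CREATE TABLE customer_dim AS
SELECT c.customer_nr, c.cust_name, s.rep_name
FROM customer c JOIN slsrep s ON c.slsrep_nr = s.slsrep_nr;
""")
row = con.execute("SELECT * FROM customer_dim").fetchone()
print(row)  # (1, 'Acme Inc.', 'Pat Jones')
```

The redundancy this introduces is acceptable because, per Inmon's characteristics, the warehouse is nonvolatile: no transactions will ever need to keep the copies in sync.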
23. Strengths of Dimensional Modeling
The Dimensional model is:
1. Predictable. Query tools can make strong assumptions about it.
2. Dynamic.
3. Extends Gracefully by adding rows or columns.
4. Standardized approach to modeling business events.
5. Growing number of software applications to support it.
Kimball: 147 to 149
24. Myths about Dimensional Modeling
1. Dimensional Models are non-dynamic: Only when you pre-
aggregate. Kept in its detail form it is just as dynamic as ER.
2. Dimensional Models are too complex: Just the opposite.
3. Snowflaking is an alternative to Dimensional Modeling:
Snowflaking is an extension to the Star Schema. It adds sub-
dimensions to dimensions and therefore looks like a snow-
flake. It decreases the “simplicity” of the star-schema and
should be avoided.
Kimball: 150/151
25. Designing the Data warehouse
There are two approaches to building the data-warehouse. The
first is the top-down approach. In this approach an entire
organization wide data-warehouse is built and then smaller data-
marts use it as a source.
The second approach, which is much more feasible, is the bottom-
up approach. In this approach individual data-marts are built
using conformed dimensions and a standardized architecture
across the enterprise.
26. Design Success factors
1. Create a surrounding architecture that defines the scope and
implementation of the complete data warehouse
2. Oversee the construction of each piece of the complete data
warehouse.
Kimball in chapter five refers to a design called the data-
warehouse bus architecture.
Kimball: 155
27. Drilling
There are two types of drilling:
1. Drill down: Which simply means give me more detail, or a
lower level of granularity. For example, show sales figures for
each county instead of for each state.
2. Drill up: Which simply means give me less detail, or a higher
level of granularity. For example, show sales figures for
each state instead of each county.
Most reporting/OLAP tools these days have this capability.
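In SQL terms, drilling is just changing the set of dimension attributes in the GROUP BY clause. A small sketch with hypothetical state/county sales data:

```python
import sqlite3

# Hypothetical detail-level sales data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (state TEXT, county TEXT, amount REAL);
INSERT INTO sales VALUES
    ('OH', 'Franklin',  100.0),
    ('OH', 'Cuyahoga',  200.0),
    ('PA', 'Allegheny',  50.0);
""")
# Drill up: less detail -- totals per state.
by_state = con.execute(
    "SELECT state, SUM(amount) FROM sales GROUP BY state ORDER BY state"
).fetchall()
# Drill down: more detail -- totals per county within each state.
by_county = con.execute(
    "SELECT state, county, SUM(amount) FROM sales "
    "GROUP BY state, county ORDER BY state, county"
).fetchall()
print(by_state)  # [('OH', 300.0), ('PA', 50.0)]
```

This is why the fact table grain should be as fine as possible: you can always drill up from detail, but you can never drill down below the stored grain.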
28. Special Types of Dimensions
1. Time dimension: Should be nation neutral. 176
2. Person dimension: Very atomic, for example separate
fields for all parts of name and address. 178
3. Small Static (slowly changing) Dimensions.
4. Small Dynamic (rapidly changing) Dimensions.
5. Large Static (slowly changing) Dimensions.
6. Large Dynamic (rapidly changing) Dimensions.
7. Degenerate Dimensions: Dimensions without Attributes.
8. Miscellaneous Dimensions: Miscellaneous data that
doesn’t fit anywhere else, but that you want to keep.
29. Keys
It is best to use only artificial keys assigned
by the data-warehouse; don’t use original
production keys. Also avoid smart keys.
Smart keys are keys that also encode
attribute values.
30. Designing the Fact Table
Kimball defines a four step process.
1. Choose the data mart
2. Choose the fact table grain: Should be as granular as possible.
3. Choose the dimensions: Usually determined by the fact table.
4. Choose the facts of interest to you.
Kimball: 194
31. A data-mart is essentially a coordinated set of fact tables, all with
similar structures. Kimball, 200
32. Granularity
Detail granularity has several advantages over aggregate granularity:
1. More dynamic
2. Required for data-mining
3. Allows for behavior analysis (207/208)
Aggregates offer increased performance when details are not needed.
The best of both worlds can be achieved using something called a snapshot.
In Oracle this is achieved using a Materialized View.
Transactions and snapshots are the yin and yang of data-warehousing.
Kimball: 211
33. REFERENCES:
The Data Warehouse Lifecycle Toolkit
Authors: Ralph Kimball, Laura Reeves, Margy Ross, and
Warren Thornthwaite
Publisher: Wiley.
ISBN: 0-471-25547-5
Pay particular attention to chapters 5, 6, and 7.