Module 1.2: Data Warehousing Fundamentals.pptx
NiramayKolalle
This presentation provides a comprehensive introduction to Data Warehousing, covering key concepts such as Dimensional Modeling, Data Warehouse Schemas, and Information Package Diagrams. It differentiates between Entity-Relationship (ER) modeling and Dimensional Modeling, emphasizing their applications in transactional systems and analytical processing, respectively.
Key Topics Covered:
1. ER Modeling vs. Dimensional Modeling
ER Modeling: Used in traditional relational databases to normalize data and reduce redundancy.
Dimensional Modeling: Optimized for data warehousing, focusing on query performance and analytical reporting.
Fact tables store measurable business metrics, while dimension tables provide contextual attributes (e.g., time, location, product details).
2. Elements of a Dimensional Data Model
Fact Tables: Contain quantitative data such as sales, revenue, and order counts.
Dimension Tables: Store descriptive data like customer information, time periods, and product categories.
Attributes: Characteristics that define dimensions, such as region, product type, and customer segment.
3. Data Warehouse Schemas
Three primary types of schemas used in Data Warehousing:
Star Schema – A central fact table connected to multiple dimension tables. Simple and efficient for query execution.
Snowflake Schema – A more structured variation where dimension tables are normalized, reducing redundancy but increasing complexity.
Fact Constellation Schema (Galaxy Schema) – Multiple fact tables share common dimension tables, suitable for complex analytical applications.
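A star schema can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names (fact_sales, dim_product, dim_store), the columns, and the sample rows are assumptions made for the example and do not come from the presentation.

```python
import sqlite3

# In-memory database holding one fact table and two dimension tables.
# All table names, columns, and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        sales_amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Chair', 'Furniture'), (2, 'Lamp', 'Lighting');
    INSERT INTO dim_store   VALUES (1, 'Pune', 'West'), (2, 'Delhi', 'North');
    INSERT INTO fact_sales  VALUES (1, 1, 100.0), (2, 1, 40.0), (1, 2, 60.0);
""")

def sales_by_category(conn):
    """Join the fact table to one dimension and total the measure per category."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)

print(sales_by_category(conn))
```

Each analytical query needs only one join per dimension used. A snowflake version would split category into its own table and add a second join.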
4. Information Package Diagrams
These diagrams define the structure and relationships of data within a data warehouse. They help organizations:
Define key business metrics (e.g., revenue, order count).
Establish data granularity (e.g., daily, weekly, or monthly sales).
Identify aggregation methods for reports.
5. Factless Fact Tables
These tables store event-based data but contain no measurable facts.
Used for tracking events such as student attendance, hospital facility usage, and promotional campaigns.
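A factless fact table can be illustrated with the student attendance case. The sketch below assumes hypothetical table names (fact_attendance, dim_student, dim_date) and sample rows. The fact table holds only foreign keys, and analysis is done by counting rows.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, day  TEXT);
    -- Factless fact table: only dimension keys, no measure column.
    CREATE TABLE fact_attendance (student_key INTEGER, date_key INTEGER);
    INSERT INTO dim_student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO dim_date    VALUES (1, '2025-01-27'), (2, '2025-01-28');
    INSERT INTO fact_attendance VALUES (1, 1), (2, 1), (1, 2);
""")

def attendance_per_day(conn):
    """Count attendance events per day; each row records that an event occurred."""
    return conn.execute("""
        SELECT d.day, COUNT(*)
        FROM fact_attendance a
        JOIN dim_date d ON d.date_key = a.date_key
        GROUP BY d.day
        ORDER BY d.day
    """).fetchall()

print(attendance_per_day(conn))
```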
6. Case Studies and Exercises
Designing a star schema for a sales network with different regions, zones, and cities.
Converting the star schema to a snowflake schema by normalizing dimension tables.
Designing a data warehouse for a furniture company, including product categories, customer demographics, and sales analytics.
Conclusion:
This presentation provides a deep dive into data warehouse design, explaining schemas, dimensional modeling, and real-world applications. By understanding these concepts, organizations can improve data storage, analysis, and business intelligence capabilities.
2. 01/27/2025
Topics to cover
• Understanding key concepts in dimensional modeling
• Importance of dimensional modeling
• Dimensional modeling vs ER modeling
• Types of dimensional models
• A DWH as a Multidimensional Model
IS 301 2
3. Data Warehouse Design
Designing the data warehouse is a key step in the DWH process.
Although a DWH can be designed using entity-relationship modeling, many DWH experts, including Kimball et al., prefer dimensional modeling.
4. Dimensional Modeling
Dimensional modeling is the process and outcome of designing logical database schemas to support OLAP and data warehousing solutions.
It is especially useful for summarizing and rearranging the data and for presenting views of the data that support analysis.
Dimension: A category of information, for example, the Time dimension.
Attribute: A unique level within a dimension, for example, Month is an attribute in the Time dimension.
Hierarchy: The specification of levels that represents the relationships between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
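The Year → Quarter → Month → Day hierarchy can be derived from a single calendar date. The following is a minimal sketch. The function name time_hierarchy and the dictionary keys are assumptions for illustration, not part of the lecture.

```python
from datetime import date

def time_hierarchy(d):
    """Return the attributes of a hypothetical Time dimension row, top level first."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
    return {"year": d.year, "quarter": f"Q{quarter}", "month": d.month, "day": d.day}

print(time_hierarchy(date(2025, 1, 27)))
```

Storing every level as a column of the dimension row lets a query group by any level of the hierarchy without recomputing it.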
5. Dimensional Modeling
Issues to note:
1. Dimensions and hierarchies are represented by dimension tables.
2. Attributes are the non-key columns in the dimension tables.
3. Fact tables connect to one or more dimension tables, but fact tables do not have direct relationships to one another.
6. Dimensional Modeling
Dimensions and their attributes, listed in hierarchy order:
Time: Year → Quarterly → Monthly
Locations: Country → District → Village
Measured Facts: annual sales amounts per village
Lecture 3-6
7. Dimensional Modeling
Some examples:
– The analysis of a product's sales to a customer during the last six months has three dimensions: customer, product, and time.
– The analysis of a product's sales to a customer in a district during the last six months has four dimensions: customer, product, region, and time.
• As a rule, time is always one of the dimensions.
8. Dimensional Modeling
Data Granularity
• The grain defines the level of detail of a single record in the fact table.
• The more detail there is in the fact table, the higher its granularity, and vice versa.
9. Dimensional Modeling
Data Granularity example
• A fact table contains three metrics (Unit Price, Units Sold, and Total Sale Amount).
– The Time dimension consists of four hierarchical elements (Year, Quarter, Month, and Day).
– The Organization dimension consists of three hierarchical elements (Region, District, and Store).
– The Product dimension consists of two hierarchical elements (Product Family and SKU (Stock Keeping Unit)).
• The highest granularity at which we can store Sales metrics is Day/Store/SKU (i.e., the lowest level in each dimensional hierarchy).
• Conversely, the lowest granularity to which we can aggregate Sales metrics in this data mart is Year/Region/Product Family (i.e., the highest level in each dimensional hierarchy).
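The move from the finest grain (Day/Store/SKU) to the coarsest (Year/Region/Product Family) can be sketched as a roll-up in plain Python. The store, SKU, and sales values below are invented sample data, and the lookup tables stand in for the Organization and Product dimension hierarchies. Only the additive metrics (Units Sold and Total Sale Amount) are summed, since summing Unit Price would not be meaningful.

```python
from collections import defaultdict

# Hypothetical lookups from the lowest hierarchy level to the highest.
STORE_TO_REGION = {"S1": "North", "S2": "North", "S3": "South"}
SKU_TO_FAMILY = {"SKU-1": "Chairs", "SKU-2": "Chairs", "SKU-3": "Tables"}

# Fact rows at the finest grain: (day, store, sku, units_sold, total_sale_amount)
FACTS = [
    ("2024-03-01", "S1", "SKU-1", 2, 200.0),
    ("2024-03-01", "S2", "SKU-2", 1, 150.0),
    ("2024-07-15", "S3", "SKU-3", 3, 900.0),
    ("2025-01-10", "S1", "SKU-1", 1, 100.0),
]

def roll_up(facts):
    """Aggregate Day/Store/SKU rows to Year/Region/Product Family totals."""
    totals = defaultdict(lambda: [0, 0.0])
    for day, store, sku, units, amount in facts:
        key = (day[:4], STORE_TO_REGION[store], SKU_TO_FAMILY[sku])
        totals[key][0] += units
        totals[key][1] += amount
    return {key: tuple(value) for key, value in totals.items()}

for key, value in sorted(roll_up(FACTS).items()):
    print(key, value)
```

Detail stored at the finest grain can always be rolled up this way, but a table stored only at Year/Region/Product Family could not be drilled back down to days, stores, or SKUs.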
10. Benefits of using Dimensional Modeling
1. Logical model is easy to understand
• Provides a predictable and standard framework for end-user apps. Report writers, query tools, and user interfaces can all make strong assumptions about the dimensional model to make the user interfaces more understandable.
• The model can be built (mostly) independently of expected queries, since it withstands unexpected changes in user behavior.
• Handles changes easily, such as adding new dimensional attributes, since there is no need to reload data or reprogram query tools.
2. Optimized for performance
• High-performance “browsing” across the attributes
• A strategy for handling aggregates, i.e., summary records that are logically redundant with base data already in the data warehouse but are used to enhance query performance.
• OLAP engines can make processing more efficient
3. Historical tracking of information
– Strategies for handling changing dimensions
– Fact design allows high-volume snapshots and transaction tracking
11. Dimensional Model vs ER model
• The key to understanding the relationship between DM and ER is that a single ER diagram breaks down into multiple DM diagrams, or ‘stars’.
• Think of a large ER diagram as representing every possible business process within an application. The ER diagram may have Sales Calls, Order Entries, Shipment Invoices, Customer Payments, and Product Returns, all on the same diagram.
12. Dimensional Model vs ER model
• To create the individual ‘stars’ that exist within an application:
– Look for many-to-many relationships in the ER model containing numeric and additive facts and designate them as fact tables.
– Alternatively, look for ‘events’ or ‘transactions’; these may also be facts.
– De-normalize all of the remaining tables into flat tables with single-part keys that connect directly to the fact tables. These tables become the dimension tables.
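The de-normalization step can be sketched as follows. The example assumes a hypothetical normalized chain Product → Product-type → Product-group held in plain dictionaries and flattens it into one dimension row per product with a single-part key. All identifiers and sample values are invented for illustration.

```python
# Hypothetical normalized ER tables, keyed by their primary keys.
PRODUCT_GROUP = {10: "Furniture"}
PRODUCT_TYPE = {100: {"name": "Chair", "group_id": 10}}
PRODUCT = {
    1: {"name": "Oak chair", "type_id": 100},
    2: {"name": "Pine chair", "type_id": 100},
}

def build_product_dimension():
    """Flatten the three normalized tables into one flat dimension table."""
    dimension = []
    for product_id, product in sorted(PRODUCT.items()):
        product_type = PRODUCT_TYPE[product["type_id"]]
        dimension.append({
            "product_key": product_id,  # single-part key referenced by the fact table
            "product_name": product["name"],
            "product_type": product_type["name"],
            "product_group": PRODUCT_GROUP[product_type["group_id"]],
        })
    return dimension

for row in build_product_dimension():
    print(row)
```

The type and group names are repeated on every product row. That redundancy is the price paid for reaching any product attribute from the fact table with a single join.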
13. Dimensional Model vs ER model
[Figure labels: Shipments, Returns, Sales Contact, Orders, Payments]
14. ERD versus DM
[Figure labels]
ERD entities: Order, Customer, Order-line, Product, Product-type, Product-group, Region
DM tables: Order_fact, Product_dimension, Customer_dimension, Time_dimension
15. Dimensional Model vs ER model
• ER models are not appropriate for Data Warehouses. ER modeling does not really model a business; rather, it models the micro relationships among data elements.
• ER models are wildly variable in structure. As such, it is extremely difficult to optimize query performance.
16. ER Modeling vs Dimensional modeling
Relational DM
1. Data is stored in an RDBMS
2. Tables are units of storage
3. Data is normalized and used for OLTP. Optimized for OLTP processing
4. Several tables and chains of relationships among them
5. Volatile (several updates)
6. User is usually constrained by an application that understands the data design. Users are typically operations staff.
Dimensional DM
1. Data is stored in an RDBMS or multidimensional databases
2. Cubes are units of storage
3. Data is denormalized and used in data warehouses and data marts. Optimized for OLAP
4. Few tables; fact tables are connected to dimension tables
5. Non-volatile
6. The simpler data design makes it easier for users to analyze data in any way they choose. Users are typically analysts, company strategists, or even executives.
17. 01/27/2025 IS 301
ER Modeling vs Dimensional modeling
Relational DM
7. SQL is used to manipulate data
8. Detailed level of transactional data
9. Normal reports
10. Typical data design used for business transaction systems
11. Goal – reduce every piece of information to its simplest form – a debit transaction, a customer record, an address.
12. Suited for concurrent handling of many small transactions by many users. Only a limited amount of data history is normally kept.
Dimensional DM
7. MDX is used to manipulate data
8. Summary of bulky transactional data (aggregates and measures) used in business decisions
9. User-friendly, interactive, drag-and-drop multidimensional OLAP reports
10. Data design used for analysis systems
11. Goal – break up information into ‘Facts’ – things a company measures – and ‘Dimensions’ – how we measure them: by time, region, or customer
12. Suited for reading or analyzing large amounts of data by a modest number of users. Many years of data history may be kept.
18. 01/27/2025 IS 301 18
What is a Dimensional Model?
• A dimensional model is a star schema that contains two types of tables: fact tables and dimension tables.
1. Fact table (quantitative) – the primary table in a dimensional model, where the numerical performance measurements of the business are stored, i.e. attributes that are numeric and additive. Example: quantity sold, dollar sales amount.
2. Dimension table (descriptive) – a table that contains the textual descriptors of the business. Example: product and brand descriptions.
19. 01/27/2025 IS 301 19
Fact Table
• A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure.
• This measure is stored in the fact table at the appropriate granularity, for example sales amount by store by day. In this case, the fact table would contain three columns: a date column, a store column, and a sales amount column.
• Facts are numeric.
• Facts are additive across dimensions. Queries typically fetch hundreds or thousands of records from the database, and the most useful thing to do with so many records is to add them up.
• Primary keys of dimensions (surrogate keys) become foreign keys in the fact table.
• Consider rolling summaries
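A small Python sketch can make the ideas of grain and additivity concrete. It assumes the "sales amount by store by day" grain described for the fact table; the store names, dates, and amounts are made up for illustration.

```python
from collections import defaultdict

# Illustrative fact rows at the grain "sales amount by store by day":
# (date, store, sales_amount). All values are hypothetical.
sales_fact = [
    ("2025-01-01", "store_A", 120.0),
    ("2025-01-01", "store_B", 80.0),
    ("2025-01-02", "store_A", 100.0),
    ("2025-01-02", "store_B", 50.0),
]


def total_by(rows, column_index):
    """Add up the sales amount, grouped by the column at column_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column_index]] += row[2]
    return dict(totals)


sales_by_day = total_by(sales_fact, 0)      # summed across stores
sales_by_store = total_by(sales_fact, 1)    # summed across days
grand_total = sum(row[2] for row in sales_fact)
```

Because the measure is additive, the same rows can be summed along either dimension and both groupings add up to the same grand total.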
20. 01/27/2025 IS 301 20
Rolling summary
[Figure: time periods at successive levels of summary: Day 1 to Day 7, Week 1 to Week 4, Month 1 to Month 12, Qrt 1 to Qrt 4]
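One possible reading of a rolling summary is that daily facts are pre-aggregated to coarser time periods. The sketch below assumes that reading; the function names and the daily amounts are hypothetical, and weeks are left out to keep the example short.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales amounts.
daily_sales = {
    date(2025, 1, 1): 10.0,
    date(2025, 1, 2): 20.0,
    date(2025, 2, 3): 5.0,
    date(2025, 4, 1): 7.0,
}


def quarter_of(d):
    """Return the calendar quarter (1 to 4) of a date."""
    return (d.month - 1) // 3 + 1


def roll_up(daily, level):
    """Summarize daily amounts to the given level: 'month', 'quarter' or 'year'."""
    totals = defaultdict(float)
    for d, amount in daily.items():
        if level == "month":
            key = (d.year, d.month)
        elif level == "quarter":
            key = (d.year, quarter_of(d))
        elif level == "year":
            key = d.year
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += amount
    return dict(totals)


monthly = roll_up(daily_sales, "month")
quarterly = roll_up(daily_sales, "quarter")
yearly = roll_up(daily_sales, "year")
```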
21. 01/27/2025 IS 301 21
Dimensional Table
• A dimensional table provides detailed information about the attributes. For example, the dimensional table for the Quarter attribute would include a list of all of the quarters available in the data warehouse.
• Each row (each quarter) may have several fields: one for the unique ID that identifies the quarter, and one or more additional fields that specify how that particular quarter is represented on a report (for example, the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
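As an illustration of the Quarter example, the following sketch generates such a table in Python. The field names (`quarter_key`, `label_short`, `label_sortable`) are assumptions chosen for this example, not names from the course material.

```python
def build_quarter_dimension(first_year, last_year):
    """Return one row per quarter with a unique ID and two report labels."""
    rows = []
    key = 1
    for year in range(first_year, last_year + 1):
        for quarter in range(1, 5):
            rows.append({
                "quarter_key": key,                       # unique ID of the quarter
                "year": year,
                "quarter": quarter,
                "label_short": f"Q{quarter} {year}",      # e.g. "Q1 2001"
                "label_sortable": f"{year} Q{quarter}",   # e.g. "2001 Q1"
            })
            key += 1
    return rows


quarter_dimension = build_quarter_dimension(2001, 2002)
```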
22. 01/27/2025 IS 301 22
Types of dimensional models
There are three basic types of dimensional models:
1. Star model
2. Snowflake model
3. Fact constellation model
23. 01/27/2025 IS 301 23
Star model
• A single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension tables) like a star.
• Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.
25. 01/27/2025 IS 301 25
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
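The star schema example could be declared as follows. This is a sketch using Python's built-in sqlite3 module and an in-memory database; the `_dim` and `_fact` table-name suffixes and the column types are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
    month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    province_or_state TEXT, country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    avg_sales    REAL);
""")

# List the tables and the foreign keys that form the points of the star.
star_tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
fact_foreign_keys = conn.execute("PRAGMA foreign_key_list(sales_fact)").fetchall()
```

Every dimension table is one join away from the fact table, and no dimension table references another.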
26. 01/27/2025 IS 301 26
Star Schema with Sample Data
27. 01/27/2025 IS 301 27
Relationship of a Star Schema model to a Report
• A report answers questions of what, when, by whom, and to whom.
– Results are obtained by joining one or more dimension tables with the fact table.
• Example
– The Marketing Dept wants to know the quantity and order amount of PCs sold to married customers by salespersons in the Makerere region in the month of March.
28. 01/27/2025 IS 301 28
Relationship of a Star Schema model to a Report
Order Facts Table: Product Key (FK), Time Key (FK), Customer Key (FK), Sales Person Key (FK), Order Shillings, Cost Shillings, Margin Shillings, Quantity
Product Dimension Table: Product Key (PK), Product Name, Product Code, Product Line, Brand
Time Dimension Table: Time Key (PK), Date, Month, Quarter, Year
Customer Dimension Table: Customer Key (PK), Customer Name, Customer Code, Marital Status, Address, Town
Sales Person Dimension Table: Sales Person Key (PK), Sales Person Name, Territory Name, Region Name
Report filters: Product Name = PCs, Month = March, Marital Status = Married, Region Name = Makerere
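The Marketing Dept report can be expressed as one query that joins the fact table to the four dimension tables and filters on their attributes. The sketch below uses Python's sqlite3 module; the `_dim` table names, the trimmed column lists, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT, marital_status TEXT);
CREATE TABLE sales_person_dim (sales_person_key INTEGER PRIMARY KEY, sales_person_name TEXT, region_name TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, date TEXT, month TEXT, year INTEGER);
CREATE TABLE order_facts (
    product_key INTEGER, time_key INTEGER, customer_key INTEGER,
    sales_person_key INTEGER, order_shillings REAL, quantity INTEGER);

-- Hypothetical sample rows.
INSERT INTO product_dim VALUES (1, 'PCs'), (2, 'Printers');
INSERT INTO customer_dim VALUES (1, 'Customer A', 'Married'), (2, 'Customer B', 'Single');
INSERT INTO sales_person_dim VALUES (1, 'Rep A', 'Makerere'), (2, 'Rep B', 'Other');
INSERT INTO time_dim VALUES (1, '2024-03-05', 'March', 2024), (2, '2024-04-02', 'April', 2024);
INSERT INTO order_facts VALUES
    (1, 1, 1, 1, 3000000, 2),  -- matches all four filters
    (1, 1, 1, 1, 1500000, 1),  -- matches all four filters
    (1, 1, 2, 1, 1500000, 1),  -- customer is single
    (2, 1, 1, 1,  400000, 1),  -- product is not PCs
    (1, 2, 1, 1, 1500000, 1),  -- month is April
    (1, 1, 1, 2, 1500000, 1);  -- region is not Makerere
""")

query = """
SELECT SUM(f.quantity), SUM(f.order_shillings)
FROM order_facts AS f
JOIN product_dim      AS p ON p.product_key = f.product_key
JOIN time_dim         AS t ON t.time_key = f.time_key
JOIN customer_dim     AS c ON c.customer_key = f.customer_key
JOIN sales_person_dim AS s ON s.sales_person_key = f.sales_person_key
WHERE p.product_name = 'PCs'
  AND t.month = 'March'
  AND c.marital_status = 'Married'
  AND s.region_name = 'Makerere'
"""
total_quantity, total_order_shillings = conn.execute(query).fetchone()
```

Only the first two fact rows satisfy all four filters, so the totals are the sums over those two rows.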
29. 01/27/2025 IS 301 29
Benefits of a Star model
1. Easy to understand
2. Easy to define hierarchies
3. Reduces number of physical joins
4. Low maintenance
5. Very simple metadata
30. 01/27/2025 IS 301 30
Snowflake model
A refinement of the star schema in which some dimensional hierarchies are normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
31. 01/27/2025 IS 301 31
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_state, country
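The practical effect of snowflaking is an extra join. In the sketch below, written with Python's sqlite3 module, the location dimension keeps only a `city_key`, so a query by country has to pass through the city table. Table-name suffixes and sample rows are hypothetical, and the fact table is cut down to one key and one measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim (
    city_key INTEGER PRIMARY KEY, city TEXT, province_or_state TEXT, country TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT,
    city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL);

-- Hypothetical sample rows.
INSERT INTO city_dim VALUES (1, 'City X', 'State X', 'Country A'),
                            (2, 'City Y', 'State Y', 'Country B');
INSERT INTO location_dim VALUES (1, '1 Main St', 1), (2, '2 High St', 1), (3, '3 Low St', 2);
INSERT INTO sales_fact VALUES (1, 100), (2, 50), (3, 25);
""")

# Fact -> location -> city: two joins where a star schema would need one.
sales_by_country = conn.execute("""
SELECT c.country, SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN location_dim AS l ON l.location_key = f.location_key
JOIN city_dim     AS c ON c.city_key = l.city_key
GROUP BY c.country
ORDER BY c.country
""").fetchall()
```

The city attributes are stored once per city instead of once per street address, at the cost of the longer join path.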
32. 01/27/2025 IS 301 32
Example of a snowflake schema: a student attendance DWH
33. 01/27/2025 IS 301 33
Benefits of snowflaking
1. Used in OLAP multidimensional database modeling tools that use
dimensional data marts as a data source.
2. Appropriate for use where a dimension table occupies a significant
proportion of the database as a result of querying very sparse dimensions or
dimensions with many attributes
3. Easier to use in cases where there is need to add a multidimensional view to
an existing transactional database for decision making.
4. A snowflake schema can sometimes reflect the way in which users think
about data.
5. Useful in conventional multidimensional reporting tools where queries
cannot be represented by simple star schema.
34. 01/27/2025 IS 301 34
Fact Constellation model
A fact constellation model is a dimensional model that consists of multiple fact tables, joined together through dimensions.
Because multiple fact tables share dimension tables, the model can be viewed as a collection of stars, and is therefore called a galaxy schema or fact constellation.
When a dimension table connects to more than one fact table, the dimension table is said to be "conformed" between the two dimensional models.
35. 01/27/2025 IS 301 35
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
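A shared dimension lets measures from two fact tables be compared side by side. The sketch below, using Python's sqlite3 module, aggregates each fact table separately and then joins the two results on the shared item dimension, so that rows from one fact table are not multiplied by rows from the other. The reduced column lists and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim(item_key), units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item_dim(item_key), units_shipped INTEGER);

-- Hypothetical sample rows.
INSERT INTO item_dim VALUES (1, 'Item A'), (2, 'Item B');
INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO shipping_fact VALUES (1, 12), (2, 3), (2, 4);
""")

# Aggregate each fact table to the item level first, then join on the shared dimension.
sold_vs_shipped = conn.execute("""
SELECT i.item_name, s.units_sold, sh.units_shipped
FROM item_dim AS i
JOIN (SELECT item_key, SUM(units_sold) AS units_sold
      FROM sales_fact GROUP BY item_key) AS s ON s.item_key = i.item_key
JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
      FROM shipping_fact GROUP BY item_key) AS sh ON sh.item_key = i.item_key
ORDER BY i.item_name
""").fetchall()
```

Joining the two fact tables directly on `item_key` would pair every sales row with every shipping row for the same item and inflate both sums.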
36. 01/27/2025 IS 301 36
Common uses
• These schemas are most commonly found in dimensional DWHs and
data marts where speed of data retrieval is more important than the
efficiency of data manipulations.
• The decision whether to employ a star schema or a snowflake schema or
a fact constellation schema should consider the relative strengths of the
database platform in question and the query tool to be employed.
– Star schema should be favored with query tools that largely expose users to
the underlying table structures, and in environments where most queries
are simpler in nature.
– Snowflake schemas are often better with more sophisticated query tools
that isolate users from the raw table structures and for environments
having numerous queries with complex criteria.
37. 01/27/2025 IS 301 37
Schema Keys
• Dimension Business Key
– Column or columns that identify a unique instance of the business
record (not necessarily a unique record in the dimension table)
– Used in the ETL process to tie fact records with dimension members
• Dimension Record Surrogate Keys
– Defines the dimension’s primary key
– Relates to the fact table foreign key field
– Numeric data type, typically integer (2, 4, or 8 bytes)
• Foreign Keys
– Each Dimensional Table has a one-to-many relationship with the
central fact table
– The PK of each Dimension Table must be a Foreign Key in the Fact
Table
38. 01/27/2025 IS 301 38
Why use surrogate Keys
• Data tables in various source systems may use different keys for the same entity.
Legacy systems that provide historical data might have used a different numbering
system than a current online transaction processing system. A surrogate key
uniquely identifies each entity in the dimension table regardless of its source key. A
separate field can be used to contain the key used in the source system.
• Systems developed independently in company divisions may not use the same
keys, or they may use keys that conflict with data in the systems of other
divisions. This situation may not cause problems when each division independently
reports summary data, but it cannot be permitted in the data warehouse where
data is consolidated.
• Keys may change or be reused in the source data systems. This situation is usually
less likely than others, but some systems have been known to reuse keys belonging
to obsolete data. However, the key may still be in use in historical data in the data
warehouse, and the same key cannot be used to identify different entities.
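The idea of assigning warehouse-owned keys can be sketched as a small lookup that maps each (source system, source key) pair to its own surrogate key. The class name, the source system names, and the key values below are hypothetical.

```python
class SurrogateKeyGenerator:
    """Hand out one integer surrogate key per (source system, source key) pair."""

    def __init__(self):
        self._next_key = 1
        self._lookup = {}  # (source_system, source_key) -> surrogate key

    def key_for(self, source_system, source_key):
        business_key = (source_system, source_key)
        if business_key not in self._lookup:
            self._lookup[business_key] = self._next_key
            self._next_key += 1
        return self._lookup[business_key]


keys = SurrogateKeyGenerator()
legacy_customer = keys.key_for("legacy", "C-100")
oltp_customer = keys.key_for("oltp", "C-100")    # same source key, different system
legacy_again = keys.key_for("legacy", "C-100")   # already known, key is reused
```

The two divisions' conflicting "C-100" keys end up as two distinct dimension rows, while the source key can still be stored in a separate column.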
39. 01/27/2025 IS 301 39
Why use surrogate Keys
• Changes in organizational structures may move keys in the hierarchy. This can be a
common situation.
– For example, if a salesperson is transferred from one region to another, the company
may prefer to track two things: sales data for the salesperson with the person's original
region for data prior to the transfer date, and sales data for the salesperson in the
person's new region after the transfer date. To represent this organization of data, the
salesperson's record must exist in two places in the sales force dimension table, which is
not possible if the salesperson's company employee identification number is used as
the primary key for the dimension table. A surrogate key allows the same salesperson to
participate in different locations in the dimension hierarchy.
– In this case, the salesperson will be represented twice in the dimension table with two
different surrogate keys. These surrogate keys are used to join the salesperson's records
to the sets of facts appropriate to the various locations in the hierarchy occupied by the
salesperson.
– The employee's identification number should be carried in a separate column in the
table so information about the employee can be reviewed or summarized regardless of
the number of times the employee's record appears in the dimension table.
Dimensions that exhibit this type of change are called slowly changing
dimensions.
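The salesperson transfer can be sketched in Python as follows. The approach shown, adding a new dimension row with a new surrogate key for each change, is commonly called a Type 2 slowly changing dimension. The `valid_from` and `valid_to` columns, the function names, and the sample values are assumptions added for this example.

```python
from datetime import date

sales_force_dim = []  # one row per version of a salesperson


def add_version(employee_id, name, region, valid_from):
    """Close the current version, if any, and append a new one with a new surrogate key."""
    for row in sales_force_dim:
        if row["employee_id"] == employee_id and row["valid_to"] is None:
            row["valid_to"] = valid_from
    surrogate_key = len(sales_force_dim) + 1
    sales_force_dim.append({
        "sales_person_key": surrogate_key,  # surrogate key used by the fact table
        "employee_id": employee_id,         # company ID, kept in a separate column
        "name": name,
        "region": region,
        "valid_from": valid_from,
        "valid_to": None,                   # None marks the current version
    })
    return surrogate_key


def key_on(employee_id, on_date):
    """Return the surrogate key that applies to a fact dated on_date."""
    for row in sales_force_dim:
        starts_before = row["valid_from"] <= on_date
        still_open = row["valid_to"] is None or on_date < row["valid_to"]
        if row["employee_id"] == employee_id and starts_before and still_open:
            return row["sales_person_key"]
    raise LookupError(f"no version of {employee_id} on {on_date}")


original_key = add_version("E42", "Salesperson A", "Region 1", date(2024, 1, 1))
transfer_key = add_version("E42", "Salesperson A", "Region 2", date(2024, 7, 1))
```

Facts dated before the transfer join to the first row, later facts join to the second, and the shared `employee_id` still allows all of the salesperson's facts to be summarized together.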
40. 01/27/2025 IS 301 40
Multi-dimensional Databases
• A data warehouse is based on a multidimensional
data model
• A multidimensional database (MDDB) reflects
the multi-dimensional characteristics of data
internally
• Stores the data in a giant hypercube
• Makes it easy for the computer to locate any item of interest.
41. 01/27/2025 IS 301 41
Multi-dimensional Data
[Figure: cube of sales volume as a function of product, month, and region, with axes Product, Region, and Month]
Dimensions: Product, Location, Time
Hierarchical summarization paths:
Product: Industry > Category > Product
Location: Region > Country > City > Office
Time: Year > Quarter > Month or Week > Day
42. 01/27/2025 IS 301 42
Data Cube/Hypercube
• Hypercubes summarise data into dimensions
• Multidimensional Hypercubes enable managers to analyse
values at the intersection of these dimensions
[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); one cell is labeled as the total annual sales of TVs in U.S.A.]
What are Data Cubes
• Data Cubes are multidimensional data
resources
• Data Cubes allow you to look at complex data
in a simple format
• Data Cubes allow you to analyze specific data
44. 01/27/2025 IS 301 44
Parts of the Data Cube
Filters: Ability to look at only one portion of a dimension. Example – GSFC instead of All Centers.
Measures: Method of counting. How the dimension is being counted.
Dimensions: The category that is being counted. Example – GSFC Employees, Employee Type etc.
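The three parts can be mimicked with a small pivot function in Python: the dimension chooses the grouping, the measure chooses how each group is counted, and the filter restricts which records are included. The employee records, the "Other" center, and the salary values are hypothetical.

```python
from collections import defaultdict

# Hypothetical employee records.
employees = [
    {"center": "GSFC", "employee_type": "Full-time", "salary": 100},
    {"center": "GSFC", "employee_type": "Part-time", "salary": 50},
    {"center": "GSFC", "employee_type": "Full-time", "salary": 120},
    {"center": "Other", "employee_type": "Full-time", "salary": 90},
]


def pivot(records, dimension, measure, keep=lambda record: True):
    """Group records by a dimension, apply a measure, optionally filter first."""
    groups = defaultdict(list)
    for record in records:
        if keep(record):
            groups[record[dimension]].append(record)
    return {value: measure(group) for value, group in groups.items()}


def headcount(group):
    return len(group)


def average_salary(group):
    return sum(record["salary"] for record in group) / len(group)


# Filter: GSFC only. Dimension: employee type. Measure: headcount.
gsfc_headcount = pivot(employees, "employee_type", headcount,
                       keep=lambda record: record["center"] == "GSFC")

# No filter. Dimension: center. Measure: average salary.
salary_by_center = pivot(employees, "center", average_salary)
```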
45. How to use a Data Cube
• Data Cubes look like a simple spreadsheet
• Just choose what data you want in your columns and your rows and the totals self-populate
– In this example, Centers are in the columns and some dates are in the rows
Lecture 3-46
IS 301
46. How to choose your Columns
• To choose a column, simply click on the pull-down
menu and choose your data
• The column pull-down menu is located just to the
right of the column header labels
47. How to choose your Rows
• To choose a row, simply click on the pull-down menu
and choose your data
• The row pull-down menu is located just below the row header labels
48. Filters
• After choosing the columns and rows, you may
choose what segment of the population you want to
view
• Use drop-down boxes to determine what data you
want to focus on
49. 01/27/2025 IS 301 50
Measures
• You may also choose how the data is counted
or measured
– The most common measure is headcount
– Other measures include:
• Average Age
• Average Years of Federal Service
• Average Annual Salary
• Aggregate Salary
50. Measures
• Use the drop-down box to choose a measure
51. View Data As . . .
• You may also choose to view the data as a number
or as a percentage
• Just use the View Data As drop down box
Editor's Notes
#34: Let's look at an example: Assume our data warehouse keeps store sales data, and the different dimensions are time, store, product, and customer. In this case, the figure on the left represents our star schema. The lines between two tables indicate that there is a primary key / foreign key relationship between the two tables. Note that different dimensions are not related to one another.