Data Warehouse Definition

Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even further back. This contrasts with a transactional system, where often only the most recent data is kept. For example, a transactional system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

Ralph Kimball provided a more concise definition of a data warehouse:

A data warehouse is a copy of transaction data specifically structured for query and analysis.

This is a functional view of a data warehouse. Unlike Inmon, Kimball did not address how the data warehouse is built; rather, he focused on its functionality.

Data Warehouse Architecture

Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens. In view of this, it is more reasonable to present the different layers of a data warehouse architecture than to discuss the specifics of any one system.

In general, all data warehouse systems have the following layers:
Data Source Layer
Data Extraction Layer
Staging Area
ETL Layer
Data Storage Layer
Data Logic Layer
Data Presentation Layer
Metadata Layer
System Operations Layer

The picture below shows the relationships among the different components of the data warehouse architecture:

Each component is discussed individually below:

Data Source Layer
This represents the different data sources that feed data into the data warehouse. A data source can be in any format: a plain text file, a relational database, another type of database, or an Excel file can all act as a data source.
Many different types of data can be a data source:
Operations -- such as sales data, HR data, product data, inventory data, marketing data, systems data.
Web server logs with user browsing data.
Internal market research data.
Third-party data, such as census data, demographics data, or survey data.
All these data sources together form the Data Source Layer.

Data Extraction Layer
Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, but major data transformation is unlikely at this stage.

Staging Area
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes subsequent data processing and integration easier.

ETL Layer
This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens.

Data Storage Layer
This is where the transformed and cleansed data sits. Based on scope and functionality, three types of entities can be found here: data warehouse, data mart, and operational data store (ODS). Any given system may have one, two, or all three of these.

Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but they do affect what the report looks like.

Data Presentation Layer
This refers to the information that reaches the users. This can take the form of a tabular or graphical report in a browser, an emailed report that is automatically generated and sent every day, or an alert that warns users of exceptions, among others.

Metadata Layer
This is where information about the data stored in the data warehouse system is kept. A logical data model is an example of something in the metadata layer.

System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history.

Data Warehousing Concepts

Dimensional Data Model: The dimensional data model is commonly used in data warehousing systems. This section describes this modeling technique and the two common schema types, star schema and snowflake schema.
Slowly Changing Dimension: This is a common issue facing data warehousing practitioners. This section explains the problem and describes the three ways of handling it, with examples.
Conceptual Data Model: What a conceptual data model is, its features, and an example of this type of data model.
Logical Data Model: What a logical data model is, its features, and an example of this type of data model.
Physical Data Model: What a physical data model is, its features, and an example of this type of data model.
Conceptual, Logical, and Physical Data Model: Different levels of abstraction for a data model. This section compares and contrasts the three different types of data models.
Data Integrity: What data integrity is and how it is enforced in data warehousing.
What is OLAP: Definition of OLAP.
MOLAP, ROLAP, and HOLAP: What are these different types of OLAP technology? This section discusses how they differ from one another, and the advantages and disadvantages of each.
Bill Inmon vs. Ralph Kimball: These two data warehousing heavyweights have different views of the relationship between the data warehouse and the data mart.

Dimensional Data Model

The dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would be stored differently in a dimensional model than in a 3rd normal form model.

To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:
Dimension: A category of information. For example, the time dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year -> Quarter -> Month -> Day.
Fact Table: A table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity, for example sales amount by store by day. In this case, the fact table would contain three columns: a date column, a store column, and a sales amount column.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields: one for the unique ID that identifies the quarter, and one or more additional fields that specify how that particular quarter is represented on a report (for example, the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").

A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
Whether one uses a star or a snowflake largely depends on personal preference and business needs. Personally, I am partial to snowflakes when there is a business case to analyze the information at that particular level.

Snowflake Schema

The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimension table, whereas in a snowflake schema, that dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

Sample snowflake schema

For example, consider a Time Dimension that consists of 2 different hierarchies:
1. Year -> Month -> Day
2. Week -> Day
We will have 4 lookup tables in a snowflake schema: a lookup table for year, a lookup table for month, a lookup table for week, and a lookup table for day. Year is connected to Month, which is then connected to Day. Week is only connected to Day. A sample snowflake schema illustrating the above relationships in the Time Dimension is shown to the right.
The main advantage of the snowflake schema is a possible improvement in query performance from lower disk storage requirements and joins against smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables.
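The Time Dimension example above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names (`lookup_year`, `lookup_month`, `lookup_week`, `lookup_day`), the column names, the key formats, and the sample row are all hypothetical and do not come from the original text, which names only the four levels and how they connect.

```python
import sqlite3

# In-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_year  (year_id  INTEGER PRIMARY KEY, year_desc  TEXT);
-- Month points up to Year (hierarchy 1: Year -> Month -> Day).
CREATE TABLE lookup_month (month_id INTEGER PRIMARY KEY, month_desc TEXT,
                           year_id  INTEGER REFERENCES lookup_year(year_id));
-- Week stands alone at the top of hierarchy 2: Week -> Day.
CREATE TABLE lookup_week  (week_id  INTEGER PRIMARY KEY, week_desc  TEXT);
-- Day is the lowest level and points up to both Month and Week.
CREATE TABLE lookup_day   (day_id   INTEGER PRIMARY KEY, day_desc   TEXT,
                           month_id INTEGER REFERENCES lookup_month(month_id),
                           week_id  INTEGER REFERENCES lookup_week(week_id));
""")

# One illustrative day with its parents in both hierarchies.
conn.execute("INSERT INTO lookup_year  VALUES (2003, '2003')")
conn.execute("INSERT INTO lookup_month VALUES (200301, 'Jan 2003', 2003)")
conn.execute("INSERT INTO lookup_week  VALUES (200303, '2003 W03')")
conn.execute(
    "INSERT INTO lookup_day VALUES (20030115, '15-JAN-2003', 200301, 200303)"
)


def roll_up(day_id):
    """Return (year, month, week, day) descriptions for one day key."""
    return conn.execute(
        """
        SELECT y.year_desc, m.month_desc, w.week_desc, d.day_desc
        FROM lookup_day d
        JOIN lookup_month m ON m.month_id = d.month_id
        JOIN lookup_year  y ON y.year_id  = m.year_id
        JOIN lookup_week  w ON w.week_id  = d.week_id
        WHERE d.day_id = ?
        """,
        (day_id,),
    ).fetchone()


print(roll_up(20030115))
```

Reaching the year from a day takes two joins here (Day to Month, Month to Year), which is the price of normalizing each level into its own lookup table. In a star schema the same attributes would sit in a single time dimension table.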
Star schema

In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.

Sample star schema

All measures in the fact table are related to all the dimensions that the fact table is related to. In other words, they all have the same level of granularity.
A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Let's look at an example: Assume our data warehouse keeps store sales data, and the different dimensions are time, store, product, and customer. In this case, the figure on the left represents our star schema. The lines between two tables indicate that there is a primary key / foreign key relationship between the two tables. Note that different dimensions are not related to one another.

Slowly Changing Dimensions

The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, this applies to cases where the attribute for a record varies over time. We give an example below:
Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in the customer lookup table has the following record:

Customer Key | Name | State
1001 | Christina | Illinois

At a later date, in January 2003, she moved to Los Angeles, California. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:

Type 1: The new record replaces the original record. No trace of the old record exists.
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:

Customer Key | Name | State
1001 | Christina | Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key | Name | State
1001 | Christina | California

Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.

Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:

Customer Key | Name | State
1001 | Christina | Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the table:

Customer Key | Name | State
1001 | Christina | Illinois
1005 | Christina | California

Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

Type 3: The original record is modified to reflect the change.
We next take a look at each of the scenarios and what the data model and the data look like for each of them. Finally, we compare and contrast the three alternatives.
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:

Customer Key | Name | State
1001 | Christina | Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
Customer Key
Name
Original State
Current State
Effective Date
After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key | Name | Original State | Current State | Effective Date
1001 | Christina | Illinois | California | 15-JAN-2003

Advantages:
- This does not increase the size of the table, since new information is updated.
- This allows us to keep some part of history.
Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.
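
The three approaches can be compared side by side in a short Python sketch that uses plain dictionaries in place of a real dimension table. The function names, the dictionary keys, and the starting shape of the Type 3 row (current state equal to original state, no effective date yet) are assumptions made for illustration. The customer values, the new key 1005, and the dates come from the example above.

```python
from copy import deepcopy

# The original customer row from the running example.
ORIGINAL = [{"customer_key": 1001, "name": "Christina", "state": "Illinois"}]


def scd_type1(rows, key, new_state):
    """Type 1: overwrite the attribute in place; no history is kept."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_key"] == key:
            row["state"] = new_state
    return rows


def scd_type2(rows, key, new_key, new_state):
    """Type 2: keep the old row and append a new row with its own key."""
    rows = deepcopy(rows)
    old = next(r for r in rows if r["customer_key"] == key)
    rows.append({**old, "customer_key": new_key, "state": new_state})
    return rows


def to_type3(row):
    """Reshape a row into Type 3 columns (starting values are an assumption)."""
    return {
        "customer_key": row["customer_key"],
        "name": row["name"],
        "original_state": row["state"],
        "current_state": row["state"],
        "effective_date": None,
    }


def scd_type3(row, new_state, effective_date):
    """Type 3: update only the current value and its effective date."""
    return {**row, "current_state": new_state, "effective_date": effective_date}


type1 = scd_type1(ORIGINAL, 1001, "California")
type2 = scd_type2(ORIGINAL, 1001, 1005, "California")
type3 = scd_type3(to_type3(ORIGINAL[0]), "California", "15-JAN-2003")
# A second move overwrites California, as the Type 3 disadvantage describes.
type3_again = scd_type3(type3, "Texas", "15-DEC-2003")

print(type1)
print(type2)
print(type3)
print(type3_again)
```

After the second move, the Type 3 row holds only Illinois (original) and Texas (current), while repeating the Type 2 step would simply add a third row.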
Usage:
 
GDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptxGDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptx
azeenhodekar
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
Exploring-Substances-Acidic-Basic-and-Neutral.pdf
Exploring-Substances-Acidic-Basic-and-Neutral.pdfExploring-Substances-Acidic-Basic-and-Neutral.pdf
Exploring-Substances-Acidic-Basic-and-Neutral.pdf
Sandeep Swamy
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Library Association of Ireland
 
Niamh Lucey, Mary Dunne. Health Sciences Libraries Group (LAI). Lighting the ...
Niamh Lucey, Mary Dunne. Health Sciences Libraries Group (LAI). Lighting the ...Niamh Lucey, Mary Dunne. Health Sciences Libraries Group (LAI). Lighting the ...
Niamh Lucey, Mary Dunne. Health Sciences Libraries Group (LAI). Lighting the ...
Library Association of Ireland
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
Sinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_NameSinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_Name
keshanf79
 
Geography Sem II Unit 1C Correlation of Geography with other school subjects
Geography Sem II Unit 1C Correlation of Geography with other school subjectsGeography Sem II Unit 1C Correlation of Geography with other school subjects
Geography Sem II Unit 1C Correlation of Geography with other school subjects
ProfDrShaikhImran
 
Anti-Depressants pharmacology 1slide.pptx
Anti-Depressants pharmacology 1slide.pptxAnti-Depressants pharmacology 1slide.pptx
Anti-Depressants pharmacology 1slide.pptx
Mayuri Chavan
 
P-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 finalP-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 final
bs22n2s
 
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Library Association of Ireland
 
Unit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdfUnit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdf
KanchanPatil34
 
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACYUNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
DR.PRISCILLA MARY J
 
GDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptxGDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptx
azeenhodekar
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
Exploring-Substances-Acidic-Basic-and-Neutral.pdf
Exploring-Substances-Acidic-Basic-and-Neutral.pdfExploring-Substances-Acidic-Basic-and-Neutral.pdf
Exploring-Substances-Acidic-Basic-and-Neutral.pdf
Sandeep Swamy
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Library Association of Ireland
 

Dimensional data model

  • 1. Data Warehouse Definition
Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
Ralph Kimball provided a more concise definition of a data warehouse:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
This is a functional view of a data warehouse. Unlike Inmon, Kimball did not address how the data warehouse is built; rather, he focused on the functionality of a data warehouse.
Data Warehouse Architecture
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture than to discuss the specifics of any one system.
In general, all data warehouse systems have the following layers:
Data Source Layer
  • 9. System Operations Layer
The picture below shows the relationships among the different components of the data warehouse architecture:
Each component is discussed individually below:
Data Source Layer
This represents the different data sources that feed data into the data warehouse. The data source can be of any format: a plain text file, a relational database, another type of database, an Excel file, and so on can all act as a data source.
Many different types of data can be a data source:
Operations data, such as sales data, HR data, product data, inventory data, marketing data, and systems data.
  • 10. Web server logs with user browsing data.
  • 12. Third-party data, such as census data, demographics data, or survey data.
All these data sources together form the Data Source Layer.
Data Extraction Layer
Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, but major data transformation is unlikely at this stage.
Staging Area
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes subsequent data processing / integration easier.
ETL Layer
This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens.
Data Storage Layer
This is where the transformed and cleansed data sits. Based on scope and functionality, 3 types of entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.
Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but do affect what the report looks like.
Data Presentation Layer
This refers to the information that reaches the users. This can be in the form of a tabular / graphical report in a browser, an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, among others.
Metadata Layer
This is where information about the data stored in the data warehouse system is stored. A logical data model would be an example of something that's in the metadata layer.
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history.
Data Warehousing Concepts
Dimensional Data Model: The dimensional data model is commonly used in data warehousing systems. This section describes this modeling technique, and the two common schema types, star schema and snowflake schema.
Slowly Changing Dimension: This is a common issue facing data warehousing practitioners. This section explains the problem, and describes the three ways of handling this problem with examples.
Conceptual Data Model: What is a conceptual data model, its features, and an example of this type of data model.
Logical Data Model: What is a logical data model, its features, and an example of this type of data model.
Physical Data Model: What is a physical data model, its features, and an example of this type of data model.
Conceptual, Logical, and Physical Data Model: Different levels of abstraction for a data model. This section compares and contrasts the three different types of data models.
Data Integrity: What is data integrity and how it is enforced in data warehousing.
What is OLAP: Definition of OLAP.
MOLAP, ROLAP, and HOLAP: What are these different types of OLAP technology? This section discusses how they differ from one another, and the advantages and disadvantages of each.
Bill Inmon vs. Ralph Kimball: These two data warehousing heavyweights have different views of the relationship between the data warehouse and the data mart.
Dimensional Data Model
The dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.
To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:
Dimension: A category of information. For example, the time dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year -> Quarter -> Month -> Day.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: a date column, a store column, and a sales amount column.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields: one for the unique ID that identifies the quarter, and one or more additional fields that specify how that particular quarter is represented on a report (for example, the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
Whether one uses a star or a snowflake largely depends on personal preference and business needs. Personally, I am partial to snowflakes, when there is a business case to analyze the information at that particular level.
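The fact table and lookup table definitions above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (sales_fact, date_lookup, store_lookup), the column names, and the sample rows are all hypothetical and do not come from the original material.

```python
import sqlite3

# In-memory database; every table, column, and row here is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_lookup (
    date_key INTEGER PRIMARY KEY,   -- unique ID for the day
    day      TEXT,
    quarter  TEXT                   -- how the quarter is shown on a report
);
CREATE TABLE store_lookup (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT
);
-- Fact table at "sales amount by store by day" granularity.
-- Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_lookup(date_key),
    store_key    INTEGER REFERENCES store_lookup(store_key),
    sales_amount REAL
);
""")

conn.executemany("INSERT INTO date_lookup VALUES (?, ?, ?)", [
    (1, "2001-01-15", "Q1 2001"),
    (2, "2001-04-02", "Q2 2001"),
])
conn.executemany("INSERT INTO store_lookup VALUES (?, ?)", [
    (10, "Downtown"),
    (20, "Airport"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    (1, 10, 100.0),
    (1, 20, 50.0),
    (2, 10, 75.0),
])


def sales_by_quarter(connection):
    """Roll the day-level measure up to the Quarter attribute via the lookup table."""
    return connection.execute("""
        SELECT d.quarter, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN date_lookup AS d ON f.date_key = d.date_key
        GROUP BY d.quarter
        ORDER BY d.quarter
    """).fetchall()


quarterly = sales_by_quarter(conn)
print(quarterly)
```

The fact table holds only keys and the measure; everything descriptive about a day or a store lives in its lookup table, which is why the roll-up needs a join.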
Snowflake Schema
The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.
Sample snowflake schema
For example, consider a Time Dimension that consists of 2 different hierarchies:
1. Year -> Month -> Day
2. Week -> Day
We will have 4 lookup tables in a snowflake schema: a lookup table for year, a lookup table for month, a lookup table for week, and a lookup table for day. Year is connected to Month, which is then connected to Day. Week is only connected to Day. A sample snowflake schema illustrating the above relationships in the Time Dimension is shown to the right.
The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables.
Star schema
In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.
Sample star schema
All measures in the fact table are related to all the dimensions that fact table is related to. In other words, they all have the same level of granularity.
A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Let's look at an example: Assume our data warehouse keeps store sales data, and the different dimensions are time, store, product, and customer.
In this case, the figure on the left represents our star schema. The lines between two tables indicate that there is a primary key / foreign key relationship between the two tables. Note that different dimensions are not related to one another.
Slowly Changing Dimensions
The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, this applies to cases where the attribute for a record varies over time. We give an example below:
Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in the customer lookup table has the following record:
Customer Key | Name      | State
1001         | Christina | Illinois
Later, in January 2003, she moved to Los Angeles, California. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:
Type 1: The new record replaces the original record. No trace of the old record exists.
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:
Customer Key | Name      | State
1001         | Christina | Illinois
After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:
Customer Key | Name      | State
1001         | Christina | California
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.
Usage:
About 50% of the time.
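As a minimal sketch of the Type 1 overwrite described above, the customer lookup table can be modeled as a Python dictionary keyed by customer key. The dictionary layout and the function name apply_type1 are assumptions made for illustration only.

```python
# Hypothetical in-memory stand-in for the customer lookup table.
customers = {1001: {"name": "Christina", "state": "Illinois"}}


def apply_type1(table, customer_key, new_state):
    """Type 1: overwrite the attribute in place, so no history is kept."""
    table[customer_key]["state"] = new_state


apply_type1(customers, 1001, "California")
print(customers)
```

After the call, nothing in the table records that the state was ever Illinois, which is exactly the trade-off listed under the disadvantages.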
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.
Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key | Name      | State
1001         | Christina | Illinois
After Christina moved from Illinois to California, we add the new information as a new row into the table:
Customer Key | Name      | State
1001         | Christina | Illinois
1005         | Christina | California
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
Type 3: The original record is modified to reflect the change.
We next take a look at each of the scenarios and at how the data model and the data look for each of them. Finally, we compare and contrast the three alternatives.
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
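The Type 2 and Type 3 approaches described above can be contrasted in a short Python sketch. The row layout, the function names apply_type2 and apply_type3, and the snake_case field names (original_state, current_state, effective_date) are assumptions chosen to mirror the columns named in the text; the new key is passed in by the caller rather than generated.

```python
def apply_type2(rows, old_key, new_key, new_state):
    """Type 2: append a new row with its own primary key; the old row is untouched."""
    old = next(r for r in rows if r["customer_key"] == old_key)
    rows.append({"customer_key": new_key, "name": old["name"], "state": new_state})


def apply_type3(row, new_state, effective_date):
    """Type 3: keep the original value and overwrite only the current value."""
    row["current_state"] = new_state
    row["effective_date"] = effective_date


# Type 2: the customer ends up with two rows, one per state.
type2_rows = [{"customer_key": 1001, "name": "Christina", "state": "Illinois"}]
apply_type2(type2_rows, old_key=1001, new_key=1005, new_state="California")

# Type 3: a single row carries both the original and the current state.
type3_row = {
    "customer_key": 1001,
    "name": "Christina",
    "original_state": "Illinois",
    "current_state": "Illinois",
    "effective_date": None,
}
apply_type3(type3_row, "California", "15-JAN-2003")

print(type2_rows)
print(type3_row)
```

Type 2 grows the table by one row per change, while Type 3 keeps the row count fixed; a second call to apply_type3 would overwrite current_state again, so only the original and the latest values survive.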
In our example, recall we originally have the following table:
Customer Key | Name      | State
1001         | Christina | Illinois
To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
Customer Key
Name
Original State
Current State
Effective Date
After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):
Customer Key | Name      | Original State | Current State | Effective Date
1001         | Christina | Illinois       | California    | 15-JAN-2003
Advantages:
  • 13. - This does not increase the size of the table, since new information is updated.
  • 14. - This allows us to keep some part of history.
  • 16. - Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.
  • 18. Type 3 is rarely used in actual practice.
  • 19. When to use Type 3:
  • 20. Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
Conceptual Data Model
A conceptual data model identifies the highest-level relationships between the different entities. Features of a conceptual data model include:
Includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.
The figure below is an example of a conceptual data model.
Conceptual Data Model
From the figure above, we can see that the only information shown via the conceptual data model is the entities that describe the data and the relationships between those entities. No other information is shown through the conceptual data model.
Logical Data Model
A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model include:
Includes all entities and relationships among them.
All attributes for each entity are specified.
The primary key for each entity is specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.
The steps for designing the logical data model are as follows:
Specify primary keys for all entities.
Find the relationships between different entities.
Find all attributes for each entity.
Resolve many-to-many relationships.
Normalization.
The figure below is an example of a logical data model.
Logical Data Model
Comparing the logical data model shown above with the conceptual data model diagram, we see the main differences between the two:
In a logical data model, primary keys are present, whereas in a conceptual data model, no primary key is present.
In a logical data model, all attributes are specified within an entity. No attributes are specified in a conceptual data model.
Relationships between entities are specified using primary keys and foreign keys in a logical data model. In a conceptual data model, the relationships are simply stated, not specified, so we simply know that two entities are related, but we do not specify what attributes are used for this relationship.
Physical Data Model
A physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. Features of a physical data model include:
Specification of all tables and columns.
Foreign keys are used to identify relationships between tables.
Denormalization may occur based on user requirements.
Physical considerations may cause the physical data model to be quite different from the logical data model.
The physical data model will be different for different RDBMS. For example, the data type for a column may be different between MySQL and SQL Server.
The steps for physical data model design are as follows:
Convert entities into tables.
Convert relationships into foreign keys.
Convert attributes into columns.
Modify the physical data model based on physical constraints / requirements.
The figure below is an example of a physical data model.
Physical Data Model
Comparing the physical data model shown above with the logical data model diagram, we see the main differences between the two:
Entity names are now table names.
Attributes are now column names.
Data type for each column is specified.
Data types can be different depending on the actual database being used.
Conceptual, Logical, and Physical Data Models
The three levels of data modeling, conceptual data model, logical data model, and physical data model, were discussed in prior sections. Here we compare these three types of data models. The table below compares the different features:
Feature              | Conceptual | Logical | Physical
Entity Names         | ✓          | ✓       |
Entity Relationships | ✓          | ✓       |
Attributes           |            | ✓       |
Primary Keys         |            | ✓       | ✓
Foreign Keys         |            | ✓       | ✓
Table Names          |            |         | ✓
Column Names         |            |         | ✓
Column Data Types    |            |         | ✓
Below we show the conceptual, logical, and physical versions of a single data model.
Conceptual Model Design
Logical Model Design
Physical Model Design
We can see that the complexity increases from conceptual to logical to physical. This is why we always first start with the conceptual data model (so we understand at a high level what the different entities in our data are and how they relate to one another), then move on to the logical data model (so we understand the details of our data without worrying about how they will actually be implemented), and finally the physical data model (so we know exactly how to implement our data model in the database of choice). In a data warehousing project, sometimes the conceptual data model and the logical data model are considered as a single deliverable.
Data Integrity
Data integrity refers to the validity of data, meaning data is consistent and correct. In the data warehousing field, we frequently hear the term "Garbage In, Garbage Out." If there is no data integrity in the data warehouse, any resulting report and analysis will not be useful.
In a data warehouse or a data mart, there are three areas where data integrity needs to be enforced:
Database level
We can enforce data integrity at the database level.
Common ways of enforcing data integrity include:
Referential integrity
The relationship between the primary key of one table and the foreign key of another table must always be maintained. For example, a primary key cannot be deleted if there is still a foreign key that refers to this primary key.
Primary key / Unique constraint
Primary keys and the UNIQUE constraint are used to make sure every row in a table can be uniquely identified.
Not NULL vs NULL-able
Columns identified as NOT NULL may not contain a NULL value.
Valid Values
Only allowed values are permitted in the database. For example, if a column can only have positive integers, a value of '-1' cannot be allowed.
ETL process
For each step of the ETL process, data integrity checks should be put in place to ensure that source data is the same as the data in the destination. The most common checks include record counts and record sums.
Access level
We need to ensure that data is not altered by any unauthorized means either during the ETL process or in the data warehouse. To do this, there need to be safeguards against unauthorized access to data (including physical access to the servers), as well as logging of all data access history. Data integrity can only be ensured if there is no unauthorized access to the data.
What Is OLAP
OLAP stands for On-Line Analytical Processing. The first attempt to define OLAP was by Dr. Codd, who proposed 12 rules for OLAP. Later, it was discovered that this particular white paper was sponsored by one of the OLAP tool vendors, thus causing it to lose objectivity. The OLAP Report has proposed the FASMI test, Fast Analysis of Shared Multidimensional Information. For a more detailed description of both Dr. Codd's rules and the FASMI test, please visit The OLAP Report.
For people on the business side, the key feature out of the above list is "Multidimensional." In other words, the ability to analyze metrics in different dimensions such as time, geography, gender, product, etc. For example, sales for the company are up. What region is most responsible for this increase? Which store in this region is most responsible for the increase? What particular product category or categories contributed the most to the increase? Answering these types of questions in order means that you are performing an OLAP analysis.
Depending on the underlying technology used, OLAP can be broadly divided into two different camps: MOLAP and ROLAP. A discussion of the different OLAP types can be found in the MOLAP, ROLAP, and HOLAP section.
MOLAP, ROLAP, and HOLAP
In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.
MOLAP
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats.
Advantages:
Excellent performance: MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations.
Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.
Disadvantages:
Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself.
<br />Requires additional investment: Cube technology is often proprietary and does not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed. <br />ROLAP <br />This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement. <br />Advantages: <br />Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount. <br />Can leverage functionalities inherent in the relational database: Relational databases often already come with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities. <br />Disadvantages: <br />Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large. <br />Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building complex functions into the tool out of the box and by allowing users to define their own functions. <br />HOLAP <br />HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. 
When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data. <br />
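The three approaches can be contrasted in a small sketch. This is only an illustration under stated assumptions, not how any particular OLAP product works: the `sales` table, its columns, the sample figures, and the helper names `build_db`, `build_cube`, `rolap_slice`, and `drill_through` are all hypothetical, and an in-memory SQLite database and a plain Python dictionary stand in for the relational database and the cube.

```python
import sqlite3

# Hypothetical fact rows: (region, store, category, amount). Made-up figures.
SALES = [
    ("East", "Store 1", "Shoes", 120.0),
    ("East", "Store 1", "Hats", 30.0),
    ("East", "Store 2", "Shoes", 80.0),
    ("West", "Store 3", "Shoes", 50.0),
    ("West", "Store 3", "Hats", 20.0),
]

DIMENSIONS = {"region", "store", "category"}


def build_db():
    """Load the sample rows into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, store TEXT, category TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)
    return conn


def build_cube(conn):
    """MOLAP-style: pre-compute summary totals once, when the 'cube' is built."""
    rows = conn.execute(
        "SELECT region, category, SUM(amount) FROM sales GROUP BY region, category"
    )
    return {(region, category): total for region, category, total in rows}


def rolap_slice(conn, **filters):
    """ROLAP-style: every slice or dice adds one condition to the WHERE clause."""
    unknown = set(filters) - DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    # Column names come from the whitelist above; values are bound parameters.
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = f"SELECT COALESCE(SUM(amount), 0) FROM sales WHERE {where}"
    return conn.execute(sql, tuple(filters.values())).fetchone()[0]


def drill_through(conn, region, category):
    """HOLAP-style: go from a summary cell back to the underlying detail rows."""
    return conn.execute(
        "SELECT store, amount FROM sales "
        "WHERE region = ? AND category = ? ORDER BY store",
        (region, category),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    cube = build_cube(conn)
    print(cube[("East", "Shoes")])                            # summary lookup
    print(rolap_slice(conn, region="East", store="Store 1"))  # query at request time
    print(drill_through(conn, "East", "Shoes"))               # detail behind a cell
```

The cube lookup is a dictionary read because the totals were computed when the cube was built, while `rolap_slice` runs a new SQL query for every question; that is the performance and flexibility trade-off described above.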