Chapter 9 & 10 - Data Warehouse
Course outline
The Need for Data Analysis
Business Intelligence
Business Intelligence Architecture
Decision Support Data
Data Warehouse
Online Analytical Processing
Star Schema
Author: Muhammad Hamiz Mohd Radzi Edited By: Zuhri Arafah Zulkifli
Objectives
At the end of this lesson, you should be able to:
Describe the need for data analysis
Describe business intelligence and its steps
Explain the architectural components of business intelligence
Explain the tools used in business intelligence
Explain the difference between operational and decision support data
Describe the contrasting characteristics of operational and decision support data
Explain the decision support database requirements
Introduction
Data are a crucial raw material in this information age, and data
storage and management have become the focus of database
design and implementation.
The data warehouse extracts or obtains its data from operational databases
as well as from external sources, providing a more comprehensive data
pool.
Business Intelligence
Business intelligence (BI) is a term used to describe a comprehensive,
cohesive, and integrated set of tools and processes used to capture,
collect, integrate, store, and analyze data with the purpose of
generating and presenting information used to support business
decision making.
Enhances the user’s ability to efficiently comprehend the meaning of the data
Techniques:
Pie charts and bar charts
Line graphs
Scatter plots
Gantt charts
Heat maps
Monitoring and alerting
Advanced reporting
Advanced data analytics
Integrating architecture
Personal analytics
Data Warehouse
A central data repository where data from operational databases and other
sources are integrated, cleaned, and standardized to support decision
making.
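The integrate, clean, and standardize steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source lists, their field names, and the helper functions `standardize` and `load_warehouse` are hypothetical and do not come from any particular warehouse product.

```python
# Hypothetical rows from an operational database (all values stored as text).
operational_rows = [
    {"cust_id": "101", "city": " london ", "amount": "250.00"},
    {"cust_id": "102", "city": "New York", "amount": "120.50"},
]
# Hypothetical rows from an external source with different field names.
external_rows = [
    {"customer": 103, "location": "PARIS", "total": 80},
    {"customer": 101, "location": "London", "total": None},  # incomplete record
]

def standardize(cust_id, city, amount):
    """Return one record in a single, common warehouse format."""
    return {
        "cust_id": int(cust_id),
        "city": city.strip().title(),
        "amount": float(amount),
    }

def load_warehouse(operational, external):
    """Integrate both sources, dropping records that fail a simple cleaning rule."""
    warehouse = []
    for row in operational:
        warehouse.append(standardize(row["cust_id"], row["city"], row["amount"]))
    for row in external:
        if row["total"] is None:  # cleaning: skip records with a missing amount
            continue
        warehouse.append(standardize(row["customer"], row["location"], row["total"]))
    return warehouse

warehouse = load_warehouse(operational_rows, external_rows)
for record in warehouse:
    print(record)
```

After loading, every record has the same field names and types regardless of which source it came from.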
[Figure: Data warehouse architectures; recoverable labels include data marts, three-tier, and bottom-up]
Two-tier architecture
An architecture for a data warehouse in which user departments directly use the
data warehouse rather than data marts.
Operational data are transformed and then transferred to a data warehouse.
A separate layer of servers may be used to support the complex activities of the
transformation process.
To assist with the transformation process, an enterprise data model (EDM) is
created.
EDM:
i. Describes the structure of the data warehouse
ii. Contains metadata for data transformation
iii. Contains details about cleaning and integrating data sources.
Three-tier Architecture
An architecture for a data warehouse in which user departments access data
marts rather than the data warehouse.
To provide users with faster access while isolating them from data needed by
other user groups, smaller data warehouses called data marts are often used.
Data marts:
i. Are a subset or view of a data warehouse.
ii. Are typically built at a department or functional level.
iii. Act as the interface between end users and the corporate data warehouse.
iv. Store a subset of the data warehouse.
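One way to picture a data mart as a view of the warehouse is the sketch below, which uses Python's built-in sqlite3 module. The table `warehouse_sales`, the view `marketing_mart`, and the sample rows are assumptions made for illustration; a real data mart is usually a separate, larger store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical corporate warehouse table holding data for all departments.
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "Widget", 100.0),
        ("finance", "Gadget", 250.0),
        ("marketing", "Gadget", 75.0),
    ],
)

# The data mart: a department-level view exposing only the marketing subset.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)  # [('Gadget', 75.0), ('Widget', 100.0)]
```

Marketing users query `marketing_mart` only, so they are isolated from the finance rows held in the warehouse table.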
Bottom-up architecture
An architecture for a data warehouse in which data marts are built for user
departments.
Data are modeled one entity at a time and stored in separate data marts.
Over time, new data are synthesized, cleaned, and merged into existing data
marts or built into new data marts.
Data marts may eventually evolve into a data warehouse.
Characteristics
• Multidimensional data analysis techniques
• Advanced database support
• Easy-to-use end-user interfaces
Augmenting functions
• Advanced data presentation functions
• Advanced data aggregation, consolidation, and classification functions
• Advanced computational functions
• Advanced data-modeling functions
Access to many different kinds of DBMSs, flat files, and internal and external data
sources
Access to aggregated data warehouse data and to the detail data found in operational
databases
Ability to map end-user requests to appropriate data source and to proper data access
language
Advanced OLAP features are more useful when access is kept simple
OLAP Architecture
• Sparsity: Measures the density of the data held in the data cube
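As a rough illustration of the sparsity measure, the sketch below stores a small cube as a dictionary keyed by (product, customer, day) and computes the share of cells that actually hold a value. The dimension members and cell values are made up for the example.

```python
from itertools import product

# Hypothetical dimension members.
products = ["P1", "P2", "P3"]
customers = [101, 102, 103]
days = ["Mon", "Tue"]

# Only populated cells are stored; every other cell is treated as empty.
cube = {
    ("P1", 101, "Mon"): 5,
    ("P2", 103, "Tue"): 2,
    ("P3", 102, "Mon"): 7,
}

total_cells = len(products) * len(customers) * len(days)
populated = sum(1 for cell in product(products, customers, days) if cell in cube)
density = populated / total_cells

print(f"{populated} of {total_cells} cells populated, density = {density:.2f}")
```

Here only 3 of the 18 possible cells hold data, so the cube is sparse.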
Multidimensional representation
MULTIDIMENSIONAL TERMINOLOGY
• Dimension: subject label for a row or column
DIMENSIONS
• Hierarchies: members can have sub-members, e.g., a Location dimension may
have the hierarchy country > state > city.
• Hierarchies can be used to drill down from a higher level to a lower level of
detail, or to roll up in the reverse direction.
MEASURES
• Numeric values that support operations such as simple arithmetic and statistical calculations.
SLICE OPERATOR
• Focus on a subset of dimensions
DICE OPERATOR
• Focus on a subset of member values
• Replace dimension with a subset of values
• Dice operation often follows a slice operation
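The slice and dice operators can be sketched on a small cube held in a Python dictionary. In this sketch, slicing fixes one dimension to a single member so that fewer dimensions remain, and dicing keeps every dimension but restricts it to a subset of member values. The cube contents and the function names `slice_cube` and `dice_cube` are hypothetical.

```python
# Hypothetical cube keyed by (product, city, quarter).
cube = {
    ("Widget", "London", "Q1"): 10,
    ("Widget", "Paris", "Q1"): 4,
    ("Widget", "London", "Q2"): 6,
    ("Gadget", "London", "Q1"): 3,
    ("Gadget", "Paris", "Q2"): 8,
}

def slice_cube(cube, quarter):
    """Fix the quarter dimension to one member, leaving (product, city)."""
    return {(p, c): v for (p, c, q), v in cube.items() if q == quarter}

def dice_cube(cube, products, cities):
    """Keep all three dimensions but only the chosen member values."""
    return {k: v for k, v in cube.items() if k[0] in products and k[1] in cities}

q1_slice = slice_cube(cube, "Q1")
london_widgets = dice_cube(cube, {"Widget"}, {"London"})

print(q1_slice)
print(london_widgets)
```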
DRILL-DOWN
• Add detail to a dimension.
ROLL-UP/DRILL-UP
• Remove detail from a dimension.
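Roll-up and drill-down along a hierarchy can be sketched as follows, assuming a two-level Location hierarchy (city within country). The cities, countries, and sales figures are invented for the example, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: each city belongs to one country.
city_to_country = {"London": "UK", "Manchester": "UK", "Paris": "France"}
sales_by_city = {"London": 120, "Manchester": 80, "Paris": 60}

def roll_up(sales_by_city, city_to_country):
    """Remove detail: aggregate city-level sales to the country level."""
    totals = defaultdict(int)
    for city, amount in sales_by_city.items():
        totals[city_to_country[city]] += amount
    return dict(totals)

def drill_down(country, sales_by_city, city_to_country):
    """Add detail: show the city-level figures behind one country total."""
    return {
        city: amount
        for city, amount in sales_by_city.items()
        if city_to_country[city] == country
    }

sales_by_country = roll_up(sales_by_city, city_to_country)
uk_detail = drill_down("UK", sales_by_city, city_to_country)

print(sales_by_country)  # {'UK': 200, 'France': 60}
print(uk_detail)         # {'London': 120, 'Manchester': 80}
```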
PIVOT
• Rearrange dimensions so that the data cube can be presented in a visually
appealing order.
• Most typically used on data cubes of more than two dimensions.
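A minimal sketch of a pivot is shown below on a two-dimensional table for brevity: the product and quarter dimensions swap places, while every cell value stays the same. The table contents and the function name `pivot` are assumptions for the example.

```python
# Hypothetical table: rows are products, columns are quarters.
table = {
    "Widget": {"Q1": 10, "Q2": 6},
    "Gadget": {"Q1": 3, "Q2": 8},
}

def pivot(table):
    """Swap the row and column dimensions without changing any cell value."""
    pivoted = {}
    for row_label, cells in table.items():
        for col_label, value in cells.items():
            pivoted.setdefault(col_label, {})[row_label] = value
    return pivoted

by_quarter = pivot(table)
print(by_quarter)  # {'Q1': {'Widget': 10, 'Gadget': 3}, 'Q2': {'Widget': 6, 'Gadget': 8}}
```

Pivoting twice returns the original arrangement, which confirms that only the presentation changes.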
We have a multidimensional data model with the fact table Sales and the dimension
tables Customers, Products, and Salespeople. The sales table below represents sales
data for 1st January 2010:
[Tables SALES, PRODUCT, and SALESPEOPLE: contents not recovered]

CUSTOMER
Custid  Name      Address
101     Kamal     London
102     Mokthar   New York
103     Bukhairi  Paris
Draw a 3D picture of a data cube. Assume that all values that are missing
from the Sales table are 0.
Star Schema
Data-modeling technique
Dimensions
Attributes
Attribute hierarchy
Many-to-one (M:1) relationship between fact table and each dimension table
• Partitioning: Splits tables into subsets of rows or columns and places them
close to customer location
• Periodicity: Provides information about the time span of the data stored in
the table
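A star schema can be sketched with Python's built-in sqlite3 module: one fact table in the center, with a many-to-one relationship to each dimension table. The table and column names (`fact_sales`, `dim_product`, `dim_customer`), the products, and the amounts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product or customer.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- Fact table: many sales rows reference one row in each dimension (M:1).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    cust_id    INTEGER REFERENCES dim_customer(cust_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_customer VALUES (101, 'Kamal', 'London'), (102, 'Mokthar', 'New York');
INSERT INTO fact_sales VALUES (1, 101, 50.0), (2, 101, 30.0), (1, 102, 20.0);
""")

# Total sales per city: a single join from the fact table to one dimension.
sales_by_city = conn.execute("""
    SELECT c.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON f.cust_id = c.cust_id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(sales_by_city)  # [('London', 80.0), ('New York', 20.0)]
```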
CONSTELLATION SCHEMA
Data modeling representation of multidimensional database
A constellation schema contains multiple fact tables in the center, each related
to dimension tables.
Typically, the fact tables share some dimension tables.
Because multiple fact tables share dimension tables, the schema can be viewed as
a collection of stars, and is therefore also called a galaxy schema or fact constellation.
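The idea of fact tables sharing a dimension can be sketched with sqlite3. Here two hypothetical fact tables, `fact_sales` and `fact_shipments`, both reference the same `dim_product` dimension; all names and figures are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension table.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);

-- Two fact tables, each related to the same dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);
CREATE TABLE fact_shipments (
    product_id    INTEGER REFERENCES dim_product(product_id),
    units_shipped INTEGER
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO fact_shipments VALUES (1, 12), (2, 7), (2, 3);
""")

# Compare both facts per product through the shared dimension.
comparison = conn.execute("""
    SELECT p.name,
           (SELECT SUM(units_sold) FROM fact_sales s
             WHERE s.product_id = p.product_id),
           (SELECT SUM(units_shipped) FROM fact_shipments h
             WHERE h.product_id = p.product_id)
    FROM dim_product p
    ORDER BY p.name
""").fetchall()
print(comparison)  # [('Gadget', 7, 10), ('Widget', 15, 12)]
```

Each fact table is summed separately before the results are lined up by product, which avoids double counting that a direct join of the two fact tables would cause.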
SNOWFLAKE SCHEMA
Data modeling representation of multidimensional database
A snowflake schema has multiple levels of dimension tables related to one or
more fact tables.
Consider the snowflake schema instead of the star schema for small dimension
tables that are not in 3NF.
However, the snowflake structure can reduce the effectiveness of browsing,
since more joins will be needed
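The extra join cost can be seen in a small sqlite3 sketch. The product dimension is normalized so that category data moves to its own table, `dim_category`, and a query by category now needs two joins instead of one. All table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Second-level dimension table split out of the product dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);

-- First-level dimension table now references the category table.
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);

CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);

INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product VALUES (10, 'Widget', 1), (11, 'Gadget', 1), (12, 'App', 2);
INSERT INTO fact_sales VALUES (10, 50.0), (11, 30.0), (12, 15.0), (12, 5.0);
""")

# Sales by category: fact -> product -> category, i.e. two joins.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p  ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(sales_by_category)  # [('Hardware', 80.0), ('Software', 20.0)]
```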
Data Analytics
• Encompasses a wide range of mathematical, statistical, and modeling
techniques to extract knowledge from data
Subset of BI functionality
• Classification of tools
Explanatory analytics: Focuses on discovering and explaining data
characteristics and relationships based on existing data
Predictive analytics: Focuses on predicting future outcomes with a high
degree of accuracy
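The two classes of tools can be contrasted with a toy example: the explanatory step summarizes a relationship in existing data, and the predictive step uses the fitted relationship to estimate a future value. The monthly sales figures are made up, and a least-squares line is only one simple modeling choice among many.

```python
# Hypothetical existing data: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

mean_x = sum(months) / len(months)
mean_y = sum(sales) / len(sales)

# Explanatory: describe how sales relate to time in the data already held.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x
print(f"Sales grew by about {slope:.1f} units per month")

# Predictive: use the fitted line to estimate a month not yet observed.
forecast_month_6 = intercept + slope * 6
print(f"Forecast for month 6: {forecast_month_6:.1f}")
```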
Data Mining
• Analyzing massive amounts of data to:
Uncover hidden trends, patterns, and relationships
Form computer models to simulate and explain the findings
Use the models to support business decision making
Data-Mining Phases
Predictive Analytics
• Employs mathematical and statistical algorithms, neural networks, artificial
intelligence, and other advanced modeling tools
References
Database Systems: A Practical Approach to Design, Implementation, and
Management, Thomas Connolly and Carolyn Begg, 5th Edition, 2010, Pearson.