Unit B Data Warehousing
IS1231 | Evaluation of Business Performance
LEARNING OBJECTIVES
Explain the basic definitions and
concepts of data warehouses;
01 Data Warehouse
02 Data Warehouse Architectures
03 Data Access: Extraction, Transformation, Loading
01 DATA WAREHOUSE
OLTP VS OLAP
OLTP (Online Transaction Processing)
OLTP systems are designed for managing and processing day-to-day
operational data, such as customer orders, inventory management, and
financial transactions. They support real-time, high-volume, and
transactional operations.
OLAP (Online Analytical Processing)
OLAP systems are designed for complex, multidimensional analysis of
historical data. They are used to extract insights, generate reports, and
support decision-making processes.
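The contrast can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production setup: the orders table, its columns, and the sample rows are assumptions made up for the example. The OLTP part runs short transactions that touch individual rows; the OLAP part runs one read-only query that aggregates history.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
)

# OLTP-style work: a short transaction that inserts and updates single rows.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO orders (customer, order_date, amount) VALUES (?, ?, ?)",
        [
            ("Alice", "2023-01-15", 120.0),
            ("Bob", "2023-01-20", 80.0),
            ("Alice", "2023-02-03", 50.0),
        ],
    )
    # Correct one order, as an operational system would.
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (90.0, 2))

# OLAP-style work: a read-only query that summarizes history by month.
monthly_totals = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY substr(order_date, 1, 7) ORDER BY month"
).fetchall()

print(monthly_totals)
```

In a real deployment the two workloads would usually run on separate systems; they share one in-memory database here only to keep the sketch short.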
DECISION SUPPORT SYSTEM
Until the mid-1980s, enterprise databases stored only operational data:
the data created by business operations involved in daily management
processes, such as purchase management, sales management, and invoicing.
However, every enterprise must also have quick, comprehensive access to
the information required by decision-making processes.
DATA WAREHOUSING
It is a collection of methods, techniques, and tools used to support
knowledge workers to conduct data analyses that help with performing
decision-making processes and improving information resources.
DATA WAREHOUSE
It is a collection of data that supports decision-making processes.
DATA WAREHOUSE FEATURES
SUBJECT-ORIENTED
Data warehouses are organized around specific business subjects or areas
of interest, such as sales, finance, or customer data, rather than being
structured around operational processes.
INTEGRATED
Data from various sources across the organization are extracted,
transformed, and loaded (ETL) into the data warehouse, ensuring
consistency and uniformity in data format and structure.
TIME-VARIANT
Data warehouses store historical data, allowing users to analyze trends
and changes over time. This historical perspective is crucial for business
analysis and forecasting.
NON-VOLATILE
Once data is loaded into the data warehouse, it is not frequently updated
or changed. This ensures data stability for reporting and analysis
purposes.
02 DATA WAREHOUSE ARCHITECTURES
DATA WAREHOUSE ARCHITECTURE PROPERTIES
SEPARATION
Analytical and transactional processing should be kept apart as much as
possible.
SCALABILITY
EXTENSIBILITY
SECURITY
ADMINISTERABILITY
Data warehouse management should not be overly difficult.
SINGLE-LAYER ARCHITECTURE
[Figure: single-layer architecture. Labels: source layer, operational data, data warehouse, middleware, analysis, reporting tools, OLAP tools.]
A single-layer architecture is not frequently used in practice.
Its goal is to minimize the amount of data stored; to reach this goal, it
removes data redundancies.
Data warehouses in this architecture are virtual because the only layer
physically available is the source layer. This means that a data
warehouse is implemented as a multidimensional view of operational data
created by specific middleware or an intermediate processing layer.
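A minimal sketch of a virtual data warehouse, using Python's built-in sqlite3 module. The sales table, the view name, and the sample rows are assumptions invented for illustration. The "warehouse" is only a SQL view over the operational table, so no data is copied and every analytical query reads the operational data directly.

```python
import sqlite3

# Hypothetical operational table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, region TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("widget", "north", "2023-01-05", 100.0),
        ("widget", "south", "2023-01-07", 40.0),
        ("gadget", "north", "2023-02-11", 70.0),
    ],
)

# The virtual warehouse: a view with two dimensions (product, region).
# Nothing is materialized; the view is evaluated against the source table.
conn.execute(
    "CREATE VIEW sales_by_product_region AS "
    "SELECT product, region, SUM(amount) AS total "
    "FROM sales GROUP BY product, region"
)


def totals():
    """Query the view, which recomputes from operational data each time."""
    return conn.execute(
        "SELECT product, region, total FROM sales_by_product_region "
        "ORDER BY product, region"
    ).fetchall()


before = totals()
# A new operational row is visible in the view immediately.
conn.execute("INSERT INTO sales VALUES ('widget', 'north', '2023-03-01', 10.0)")
after = totals()
```

The sketch also shows the trade-off: because analysis runs against the operational table itself, analytical and transactional processing are not separated.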
TWO-LAYER ARCHITECTURE
[Figure: two-layer architecture. Labels: data warehouse layer (data warehouse, meta-data, data marts); analysis (reporting tools, OLAP tools, data mining tools, what-if analysis tools).]
It is typically called a two-layer architecture to highlight the separation
between physically available sources and data warehouses.
It actually consists of four successive data flow stages: source layer,
data staging, data warehouse layer, and analysis.
THREE-LAYER ARCHITECTURE
In this architecture, the third layer is the reconciled data layer, or
operational data store. This layer materializes operational data
obtained after integrating and cleansing source data. As a result, those
data are integrated, consistent, correct, current, and detailed.
The main advantage of the reconciled data layer is that it creates a
common reference data model for the whole enterprise. At the same time, it
sharply separates the problems of source data extraction and integration
from those of data warehouse population.
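As a rough sketch of what integrating and cleansing into a common reference model can look like, the Python below maps two source systems onto one record layout. The source systems (a CRM and a billing system), their field names, and the rule that the first source wins on conflicts are all assumptions made up for this example.

```python
from datetime import date, datetime

# Two hypothetical sources with different field names and date formats.
crm_rows = [
    {"cust_id": "C1", "name": " alice smith ", "signup": "2023-01-15"},
]
billing_rows = [
    {"customer": "c1", "full_name": "Alice Smith", "since": "15/01/2023"},
    {"customer": "C2", "full_name": "BOB JONES", "since": "03/02/2023"},
]


def clean_name(raw):
    """Cleansing: collapse whitespace and normalize capitalization."""
    return " ".join(raw.split()).title()


def from_crm(row):
    """Map a CRM row onto the common reference model."""
    return {
        "customer_id": row["cust_id"].strip().upper(),
        "name": clean_name(row["name"]),
        "signup_date": date.fromisoformat(row["signup"]),
    }


def from_billing(row):
    """Map a billing row onto the same model (day/month/year dates)."""
    return {
        "customer_id": row["customer"].strip().upper(),
        "name": clean_name(row["full_name"]),
        "signup_date": datetime.strptime(row["since"], "%d/%m/%Y").date(),
    }


def reconcile(crm, billing):
    """Integration: one record per customer; the first source seen wins."""
    merged = {}
    for record in [from_crm(r) for r in crm] + [from_billing(r) for r in billing]:
        merged.setdefault(record["customer_id"], record)
    return [merged[key] for key in sorted(merged)]


reconciled = reconcile(crm_rows, billing_rows)
```

Once every source is mapped onto the same layout, populating the data warehouse only has to read the reconciled records and no longer needs to know about the individual sources.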
03 DATA ACCESS: EXTRACTION, TRANSFORMATION, LOADING
DATA STAGING
EXTRACTION
[Figure: extraction. Labels: operational & external data, reconciled data.]
DATA STAGING
TRANSFORMATION
When populating a data warehouse, normalization is replaced by
denormalization, because data warehouse data are typically denormalized,
and aggregation is needed to summarize the data properly.
[Figure: transformation. Labels: transformation, reconciled data.]
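The two operations named above, denormalization and aggregation, can be sketched in plain Python. The products lookup, the order lines, and all field names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical normalized sources: order lines reference products by id.
products = {
    1: {"name": "widget", "category": "hardware"},
    2: {"name": "gadget", "category": "hardware"},
    3: {"name": "manual", "category": "books"},
}
order_lines = [
    {"product_id": 1, "month": "2023-01", "quantity": 2, "unit_price": 10.0},
    {"product_id": 2, "month": "2023-01", "quantity": 1, "unit_price": 25.0},
    {"product_id": 3, "month": "2023-01", "quantity": 4, "unit_price": 5.0},
    {"product_id": 1, "month": "2023-02", "quantity": 3, "unit_price": 10.0},
]


def denormalize(lines, product_table):
    """Copy product attributes into each line so no join is needed later."""
    flat = []
    for line in lines:
        product = product_table[line["product_id"]]
        flat.append(
            {
                "month": line["month"],
                "product": product["name"],
                "category": product["category"],
                "revenue": line["quantity"] * line["unit_price"],
            }
        )
    return flat


def aggregate(flat_rows):
    """Sum revenue per (month, category)."""
    totals = defaultdict(float)
    for row in flat_rows:
        totals[(row["month"], row["category"])] += row["revenue"]
    return dict(totals)


flat_rows = denormalize(order_lines, products)
revenue_by_month_category = aggregate(flat_rows)
```

The flat rows repeat the product name and category on every line. That redundancy is deliberate: it trades storage for simpler and faster analytical queries.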
DATA STAGING
LOADING
Loading into the data warehouse is the last step of data staging.
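A minimal loading sketch with Python's built-in sqlite3 module. The sales_fact table, its columns, and the batches are assumptions invented for the example. Each batch of already-transformed rows is appended in a single transaction, and existing rows are never rewritten, in keeping with the non-volatile nature of warehouse data.

```python
import sqlite3


def load(conn, rows):
    """Append already-transformed rows to the fact table in one transaction."""
    with conn:  # commits if every insert succeeds, rolls back otherwise
        conn.executemany(
            "INSERT INTO sales_fact (month, category, revenue) VALUES (?, ?, ?)",
            rows,
        )


# Hypothetical warehouse table, created once before the first load.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (month TEXT, category TEXT, revenue REAL)"
)

# First load, then a later incremental load that only adds new rows.
load(warehouse, [("2023-01", "hardware", 45.0), ("2023-01", "books", 20.0)])
load(warehouse, [("2023-02", "hardware", 30.0)])

row_count = warehouse.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
total_revenue = warehouse.execute("SELECT SUM(revenue) FROM sales_fact").fetchone()[0]
```

Appending new batches is only one possible loading strategy, chosen here for brevity; a complete rewrite of the warehouse contents would replace the second call with a delete followed by a full reload.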