Unit B Data Warehousing

Uploaded by boneojohnlexter3
Copyright: © All Rights Reserved
DATA WAREHOUSING
IS1231 | Evaluation of Business Performance

LEARNING OBJECTIVES
• Explain the basic definitions and concepts of data warehouses;
• Discuss the data warehousing architectures;
• Describe the processes used in developing and managing data warehouses;
• Define data warehousing operations; and
• Compare data integration and the extraction, transformation, and load (ETL) processes.

TABLE OF CONTENTS
01 Data Warehouse
02 Data Warehouse Architectures
03 Data Access: Extraction, Transformation, Loading

01 DATA WAREHOUSE

OLTP VS OLAP

OLTP (Online Transaction Processing)
OLTP systems are designed for managing and processing day-to-day operational data, such as customer orders, inventory management, and financial transactions. They support real-time, high-volume, and transactional operations.

OLAP (Online Analytical Processing)
OLAP systems are designed for complex, multidimensional analysis of historical data. They are used to extract insights, generate reports, and support decision-making processes.
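
The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `orders` table, its columns, and the sample rows are assumptions, not part of the course material. The OLTP-style statements touch one row at a time, while the OLAP-style query aggregates the accumulated history along two dimensions.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "region TEXT, order_date TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each recording a single transaction.
new_orders = [
    ("Ana", "North", "2023-01-15", 120.0),
    ("Ben", "South", "2023-01-20", 80.0),
    ("Ana", "North", "2023-02-03", 200.0),
    ("Cara", "South", "2023-02-11", 50.0),
]
with conn:
    conn.executemany(
        "INSERT INTO orders (customer, region, order_date, amount) "
        "VALUES (?, ?, ?, ?)",
        new_orders,
    )

# OLTP-style read: look up one order by its key.
single_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (1,)
).fetchone()

# OLAP-style read: summarize history by region and month.
sales_by_region_month = conn.execute(
    "SELECT region, substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders "
    "GROUP BY region, substr(order_date, 1, 7) "
    "ORDER BY region, month"
).fetchall()

print(single_order)
print(sales_by_region_month)
```

In practice the two workloads usually run on separate systems; a single in-memory database is used here only to keep the sketch short.
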
DECISION SUPPORT SYSTEM
• Until the mid-1980s, enterprise databases stored only operational data. These are the data created by business operations involved in daily management processes, such as purchase management, sales management, and invoicing. However, every business or enterprise must have quick, comprehensive access to the information required by decision-making processes.
• Decision Support System – It is a set of expandable, interactive IT techniques and tools designed for processing and analyzing data and for supporting managers in decision making. To do this, the system matches the individual resources of managers with computer resources to improve the quality of the decisions made.
DATA WAREHOUSING AND DATA WAREHOUSE

DATA WAREHOUSING
It is a collection of methods, techniques, and tools that help knowledge workers conduct data analyses, which in turn support decision-making processes and improve information resources.

DATA WAREHOUSE
It is a collection of data that supports decision-making processes.
DATA WAREHOUSE FEATURES

SUBJECT-ORIENTED
Data warehouses are organized around specific business subjects or areas of interest, such as sales, finance, or customer data, rather than being structured around operational processes.

INTEGRATED
Data from various sources across the organization are extracted, transformed, and loaded (ETL) into the data warehouse, ensuring consistency and uniformity in data format and structure.

TIME-VARIANT
Data warehouses store historical data, allowing users to analyze trends and changes over time. This historical perspective is crucial for business analysis and forecasting.

NON-VOLATILE
Once data is loaded into the data warehouse, it is not frequently updated or changed. This ensures data stability for reporting and analysis purposes.
02 DATA WAREHOUSE ARCHITECTURES

DATA WAREHOUSE ARCHITECTURE PROPERTIES

SEPARATION
Analytical and transactional processing should be kept apart as much as possible.

SCALABILITY
Hardware and software architectures should be easy to upgrade as the data volume to be managed and processed, and the number of user requirements to be met, progressively increase.

EXTENSIBILITY
The architecture should be able to host new applications and technologies without redesigning the whole system.

SECURITY
Monitoring accesses is essential because of the strategic data stored in data warehouses.

ADMINISTERABILITY
Data warehouse management should not be overly difficult.
SINGLE-LAYER ARCHITECTURE
[Figure: single-layer architecture. Source layer: operational data. Data warehouse layer: middleware. Analysis: reporting tools, OLAP tools.]
• A single-layer architecture is not frequently used in practice.
• Its goal is to minimize the amount of data stored; to reach this goal, it removes data redundancies.
• Data warehouses in this architecture are virtual because the only layer physically available is the source layer. This means that a data warehouse is implemented as a multidimensional view of operational data created by specific middleware or an intermediate processing layer.
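
One way to picture a virtual data warehouse is a database view: nothing is copied, and every query is answered from the operational table itself. The sketch below uses Python's `sqlite3` module, with a view standing in for the middleware. The `sales` table, the `sales_cube` view, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source layer: the only data that is physically stored.
conn.execute(
    "CREATE TABLE sales (product TEXT, store TEXT, sale_date TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("pen", "A", "2023-01-05", 10),
        ("pen", "A", "2023-01-20", 5),
        ("ink", "B", "2023-02-01", 3),
    ],
)

# Stand-in for the middleware: a view that exposes the operational data
# by product, store, and month without storing a second copy.
conn.execute(
    "CREATE VIEW sales_cube AS "
    "SELECT product, store, substr(sale_date, 1, 7) AS month, "
    "SUM(qty) AS total_qty "
    "FROM sales GROUP BY product, store, substr(sale_date, 1, 7)"
)


def read_cube():
    """Query the virtual warehouse; the result is computed on demand."""
    return conn.execute(
        "SELECT product, store, month, total_qty FROM sales_cube "
        "ORDER BY product, store, month"
    ).fetchall()


before = read_cube()
# A new operational transaction is visible in the view immediately.
conn.execute("INSERT INTO sales VALUES ('pen', 'A', '2023-01-25', 2)")
after = read_cube()

print(before)
print(after)
```

Because analytical queries run directly against operational data here, this sketch also shows why the approach conflicts with keeping analytical and transactional processing apart.
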
TWO-LAYER ARCHITECTURE
[Figure: two-layer architecture. Source layer: operational data, external data. Data staging: ETL tools. Data warehouse layer: data warehouse, meta-data, data marts. Analysis: reporting tools, OLAP tools, data mining, what-if analysis tools.]
• It is typically called a two-layer architecture to highlight the separation between physically available sources and data warehouses.
• It actually consists of four consecutive data flow stages.
• Source Layer – A data warehouse system uses heterogeneous sources of data. That data is originally stored in corporate relational databases or legacy databases, or it may come from information systems outside the corporate walls.
• Data Staging – The data stored in sources should be extracted, cleansed to remove inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one common schema. The so-called Extraction, Transformation, and Loading (ETL) tools can merge heterogeneous schemata and extract, transform, cleanse, validate, filter, and load source data into a data warehouse.
• Data Warehouse Layer – Information is stored in one logically centralized repository, the data warehouse. The data warehouse can be accessed directly, but it can also be used as a source for creating data marts, which partially replicate data warehouse contents and are designed for specific enterprise departments. Meta-data repositories store information on sources, access procedures, data staging, users, data mart schemata, and so on.
• Analysis – In this layer, integrated data is efficiently and flexibly accessed to issue reports, dynamically analyze information, and simulate hypothetical business scenarios. Technologically speaking, it should feature aggregate data navigators, complex query optimizers, and user-friendly GUIs.
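
The four stages can be traced end to end in a few lines of plain Python. This is a minimal sketch under stated assumptions: the two sources, their field names, and the cleansing rule (dropping records with a missing amount rather than filling the gap) are all hypothetical simplifications.

```python
# Source layer: two hypothetical sources with different schemas.
crm_rows = [
    {"cust_name": "Ana", "dept": "sales", "amount_usd": "120.50"},
    {"cust_name": "Ben", "dept": "finance", "amount_usd": None},
]
legacy_rows = [("CARA", "SALES", 80), ("ana", "Finance", 40)]


def stage(crm, legacy):
    """Data staging: cleanse both sources and merge them into one schema."""
    staged = []
    for row in crm:
        if row["amount_usd"] is None:
            continue  # simplified cleansing: skip records with a missing amount
        staged.append({
            "customer": row["cust_name"].title(),
            "department": row["dept"].lower(),
            "amount": float(row["amount_usd"]),
        })
    for name, dept, amount in legacy:
        staged.append({
            "customer": name.title(),
            "department": dept.lower(),
            "amount": float(amount),
        })
    return staged


# Data warehouse layer: one logically centralized repository.
warehouse = stage(crm_rows, legacy_rows)

# Data mart: a partial replica designed for one department.
sales_mart = [row for row in warehouse if row["department"] == "sales"]

# Analysis: a simple report computed on the integrated data.
sales_total = sum(row["amount"] for row in sales_mart)

print(warehouse)
print(sales_total)
```
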
THREE-LAYER ARCHITECTURE
[Figure: three-layer architecture. Source layer: operational data, external data. Data staging: ETL tools. Reconciled layer: meta-data. Loading: ETL tools. Data warehouse layer: data warehouse, data marts. Analysis: reporting tools, OLAP tools, data mining, what-if analysis tools.]
• In this architecture, the third layer is the reconciled data layer, or operational data store. This layer materializes operational data obtained after integrating and cleansing source data. As a result, those data are integrated, consistent, correct, current, and detailed.
• The main advantage of the reconciled data layer is that it creates a common reference data model for a whole enterprise. At the same time, it sharply separates the problems of source data extraction and integration from those of data warehouse population.
03 DATA ACCESS: EXTRACTION, TRANSFORMATION, LOADING

DATA STAGING
EXTRACTION
[Figure: operational and external data feeding the extraction step]
• This is where relevant data is obtained from sources.
• Static Extraction – used when a data warehouse needs populating for the first time. It looks like a snapshot of operational data.
• Incremental Extraction – used to update data warehouses regularly. It captures the changes applied to source data since the latest extraction.
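
The difference between the two extraction modes can be sketched as follows. The sketch assumes that every source row carries an `updated_at` timestamp, which is only one possible way to detect changes; the row layout and function names are hypothetical.

```python
from datetime import datetime

# Hypothetical operational source; each row records when it last changed.
source = [
    {"id": 1, "value": "a", "updated_at": datetime(2023, 1, 1)},
    {"id": 2, "value": "b", "updated_at": datetime(2023, 1, 5)},
]


def static_extraction(rows):
    """Return a full snapshot of the source (copies, not references)."""
    return [dict(row) for row in rows]


def incremental_extraction(rows, last_extracted_at):
    """Return only the rows changed since the latest extraction."""
    return [dict(row) for row in rows if row["updated_at"] > last_extracted_at]


# First-time population: take a snapshot and remember when it was taken.
snapshot = static_extraction(source)
last_run = datetime(2023, 1, 10)

# Later, the operational system updates one row and adds another.
source[0]["value"] = "a2"
source[0]["updated_at"] = datetime(2023, 1, 12)
source.append({"id": 3, "value": "c", "updated_at": datetime(2023, 1, 15)})

# Regular update: pick up only what changed after the last run.
delta = incremental_extraction(source, last_run)

print(len(snapshot), [row["id"] for row in delta])
```

A timestamp comparison like this does not notice rows deleted from the source; handling deletions would need a different change-detection mechanism.
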
TRANSFORMATION
[Figure: transformation step producing reconciled data]
• This is the core of the reconciliation phase. It converts data from its operational source format into a specific data warehouse format.

The following points must be rectified in this phase:
• Loose texts may hide valuable information – "BigDeal LtD" does not explicitly show that this is a Limited Partnership company
• Different formats can be used for individual data – a date can be saved as a string or as three integers
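
Both problems can be illustrated with a short Python sketch. The helper names, the ISO date format, and the list of recognized suffixes are assumptions made for this example; real source data would need rules tailored to it.

```python
from datetime import date, datetime


def to_date(value):
    """Accept a date saved as an ISO string or as a (year, month, day) triple."""
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d").date()
    year, month, day = value
    return date(year, month, day)


# Hypothetical set of legal-form suffixes that may be hidden in a name.
KNOWN_SUFFIXES = {"ltd", "inc", "llc"}


def split_company_name(raw):
    """Split a free-text company name into (name, legal-form code or None)."""
    words = raw.split()
    suffix = words[-1].rstrip(".").lower() if words else ""
    if suffix in KNOWN_SUFFIXES:
        return " ".join(words[:-1]), suffix.upper()
    return raw, None


# The same date stored in two formats becomes one uniform value.
same_day = to_date("2023-03-07") == to_date((2023, 3, 7))

# The legal form buried in the text becomes an explicit field.
name, legal_form = split_company_name("BigDeal LtD")

print(same_day, name, legal_form)
```
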
The following are the main transformation processes aimed at populating the reconciled data layer:
• Conversion and Normalization, which operate on both storage formats and units of measure to make data uniform
• Matching, which associates equivalent fields in different sources
• Selection, which reduces the number of source fields and records
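
The three processes can be sketched together in a few lines of Python. The two sources, the field names, and the choice of kilograms as the common unit are assumptions for illustration. Selection is shown here on fields only; dropping unneeded records would work the same way.

```python
# Two hypothetical sources that describe the same kind of record.
source_a = [{"cust": "Ana", "weight_lb": 10.0, "fax": "n/a"}]
source_b = [{"customer_name": "Ben", "weight_kg": 2.0, "fax": "555"}]

# Matching: equivalent source fields map to one reconciled field name.
# Selection: fields missing from this map (such as "fax") are dropped.
FIELD_MAP = {
    "cust": "customer",
    "customer_name": "customer",
    "weight_lb": "weight_lb",
    "weight_kg": "weight_kg",
}

LB_TO_KG = 0.45359237  # one pound expressed in kilograms


def reconcile(row):
    """Apply matching, selection, and unit conversion to one source row."""
    out = {}
    for field, value in row.items():
        target = FIELD_MAP.get(field)
        if target is None:
            continue  # selection: this field is not needed downstream
        out[target] = value
    # Conversion: express every weight in kilograms.
    if "weight_lb" in out:
        out["weight_kg"] = round(out.pop("weight_lb") * LB_TO_KG, 3)
    return out


reconciled = [reconcile(row) for row in source_a + source_b]
print(reconciled)
```
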
When populating a data warehouse, normalization is replaced by denormalization because data warehouse data are typically denormalized, and you need aggregation to sum up data properly.
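
A minimal sketch of both steps, using hypothetical product and sales data: the normalized sales rows only reference products by key, the denormalized rows carry the product attributes themselves, and aggregation then sums amounts per category.

```python
from collections import defaultdict

# Normalized reconciled data: sales reference products by key (hypothetical).
products = {
    1: {"name": "pen", "category": "stationery"},
    2: {"name": "ink", "category": "stationery"},
    3: {"name": "mug", "category": "kitchen"},
}
sales = [(1, 10.0), (2, 4.0), (3, 7.5), (1, 5.0)]  # (product_id, amount)

# Denormalization: copy the product attributes into every sales row.
flat_sales = [
    {
        "product": products[pid]["name"],
        "category": products[pid]["category"],
        "amount": amount,
    }
    for pid, amount in sales
]

# Aggregation: sum the amounts for each category.
totals = defaultdict(float)
for row in flat_sales:
    totals[row["category"]] += row["amount"]
totals_by_category = dict(totals)

print(totals_by_category)
```
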
LOADING
[Figure: loading step feeding the data warehouse]
• Loading is the last step in getting data into a data warehouse.

Loading can be carried out in two ways:
• Refresh – Data warehouse data is completely rewritten. This means that older data is replaced. Refresh is normally used in combination with static extraction to initially populate a data warehouse.
• Update – Only those changes applied to source data are added to the data warehouse. Update is typically carried out without deleting or modifying preexisting data. This technique is used in combination with incremental extraction.
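
The two loading modes can be contrasted with a small sketch in which a Python list stands in for the warehouse. The function names and the row layout are assumptions for illustration only.

```python
def refresh(warehouse, snapshot):
    """Refresh: rewrite the warehouse completely from a full snapshot."""
    warehouse.clear()
    warehouse.extend(snapshot)


def update(warehouse, changes):
    """Update: append the changes; preexisting rows are left untouched."""
    warehouse.extend(changes)


warehouse = [{"id": 0, "value": "stale"}]

# Initial population, paired with static extraction: old content is replaced.
refresh(warehouse, [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}])
after_refresh = list(warehouse)

# Regular load, paired with incremental extraction: history is preserved,
# so both versions of row 1 remain in the warehouse.
update(warehouse, [{"id": 1, "value": "a2"}, {"id": 3, "value": "c"}])

print(after_refresh)
print(warehouse)
```

Keeping both versions of row 1 is what makes the stored data time-variant and non-volatile, as described under the data warehouse features.
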
THE END
