Building the Data Warehouse - Chapter 03
By Inmon
Chapter 3: The Data Warehouse and Design
Prepared By: Song Nguyen
Date: 05/09/2022
3.0 Introduction
There are two major components to
building a data warehouse:
The design of the interface from
operational systems.
The design of the data warehouse
itself.
3.1 Beginning with Operational Data
environment
Ongoing changes to the data warehouse
p.m. on January 5
3.3.2 The Midlevel Data Model (ct)
3.3.3 The Physical Data Model
A unit of time
sourcing systems
Transformation encoding rules and
data types
Loading to new environment
3.8 Complexity of Transformation
and Integration (ct)
• The selection of data from the operational
environment may be very complex.
• Operational input keys usually must be
restructured and converted before they are
written out to the data warehouse.
• Nonkey data is reformatted as it passes from
the operational environment to the data
warehouse environment.
As a simple example, input data about a date is read
as YYYY/MM/DD and is written to the output file as
DD/MM/YYYY. (Reformatting of operational data
before it is ready to go into a data warehouse often
becomes much more complex than this simple
example.)
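The date example above can be sketched in Python. This is a minimal illustration only; the function name reformat_date is an assumption made for the example, not something from the book.

```python
from datetime import datetime


def reformat_date(operational_value: str) -> str:
    """Convert an operational date (YYYY/MM/DD) to the warehouse layout (DD/MM/YYYY)."""
    # strptime also validates the input, so an impossible date raises ValueError
    parsed = datetime.strptime(operational_value, "%Y/%m/%d")
    return parsed.strftime("%d/%m/%Y")


print(reformat_date("2022/09/05"))  # prints 05/09/2022
```

Parsing into a real date before writing it out means malformed input is rejected at the interface rather than loaded into the warehouse.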
• Data is cleansed as it passes from the operational
environment to the data warehouse environment.
• Multiple input sources of data exist and must be merged
as they pass into the data warehouse.
• When there are multiple input files, key resolution must
be done before the files can be merged.
• With multiple input files, the sequence of the files may
not be the same or even compatible.
• Multiple outputs may result. Data may be produced at
different levels of summarization by the same data
warehouse creation program.
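Key resolution and merging, as listed above, might look like the following sketch. Everything specific here is hypothetical: source A is assumed to write keys like "CUST-0042", source B is assumed to use plain integers, and the field names are invented for the example.

```python
from collections import defaultdict


def resolve_key_a(raw: str) -> int:
    # Hypothetical source A format: "CUST-0042" -> 42
    return int(raw.split("-")[1])


def resolve_key_b(raw: int) -> int:
    # Hypothetical source B already uses the numeric key
    return raw


def merge_sources(source_a, source_b):
    """Resolve keys to one common form, then merge attributes per key."""
    merged = defaultdict(dict)
    for raw_key, attrs in source_a:
        merged[resolve_key_a(raw_key)].update(attrs)
    for raw_key, attrs in source_b:
        merged[resolve_key_b(raw_key)].update(attrs)
    # Sort by resolved key so the result does not depend on input sequence
    return dict(sorted(merged.items()))


source_a = [("CUST-0042", {"name": "Ada"}), ("CUST-0007", {"name": "Bo"})]
source_b = [(7, {"balance": 10.0}), (42, {"balance": 250.0})]
merged = merge_sources(source_a, source_b)
print(merged)
```

The two inputs arrive in different orders on purpose: keys are resolved first, and only then are the records matched.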
• Default values must be supplied.
• The efficiency of selection of input data
for extraction often becomes a real
issue.
• Summarization of data is often required.
• Tracking the renaming of data elements
as they are moved from the operational
environment to the data warehouse.
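Supplying default values and tracking renamed data elements can be sketched together. The mapping and default tables below (RENAME_MAP, DEFAULTS) and all field names are assumptions made for illustration.

```python
# Hypothetical mapping from operational element names to warehouse names.
# Keeping it as data gives a record of every rename.
RENAME_MAP = {"cust_no": "customer_id", "bal": "balance", "rgn": "region"}

# Hypothetical defaults applied when the operational source has no value.
DEFAULTS = {"region": "UNKNOWN", "balance": 0.0}


def to_warehouse(record: dict) -> dict:
    """Rename elements, then fill missing or empty ones with defaults."""
    out = {RENAME_MAP.get(name, name): value for name, value in record.items()}
    for field_name, default in DEFAULTS.items():
        if out.get(field_name) is None:
            out[field_name] = default
    return out


print(to_warehouse({"cust_no": 1, "bal": None}))
```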
The input record type conversion
• Fixed-length records
• Variable-length records
• Occurs depending on
• Occurs clause
Understand the semantic (logical) data
relationships of the old systems
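A record with a repeating group whose length depends on a counter (the idea behind COBOL's OCCURS DEPENDING ON) can be converted into fixed-shape rows. The layout below is invented for the example: a 6-character account id, a 2-digit counter, then that many 5-digit amounts.

```python
def parse_variable_record(line: str) -> list[tuple[str, int, int]]:
    """Split one variable-length record into (account, sequence, amount) rows."""
    account_id = line[0:6]   # fixed-length part of the record
    count = int(line[6:8])   # counter that controls the repeating group
    expected_length = 8 + count * 5
    if len(line) != expected_length:
        raise ValueError(
            f"expected {expected_length} characters, got {len(line)}"
        )
    rows = []
    for seq in range(count):
        start = 8 + seq * 5
        rows.append((account_id, seq + 1, int(line[start:start + 5])))
    return rows


record = "ACC001" + "03" + "00100" + "00250" + "00075"
for row in parse_variable_record(record):
    print(row)
```

Checking the record length against the counter catches truncated or misaligned input before any rows are written.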
• Data format conversion must be done.
EBCDIC to ASCII (or vice versa) must
be spelled out.
• Massive volumes of input must be
accounted for.
• The design of the data warehouse must
conform to a corporate data model.
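For character data, the EBCDIC to ASCII conversion can be sketched with Python's built-in codecs. The choice of code page cp037 is an assumption; the right code page depends on the source system, and binary or packed numeric fields would need separate handling.

```python
def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> bytes:
    """Decode EBCDIC bytes and re-encode them as ASCII."""
    text = raw.decode(codepage)
    # encode() raises UnicodeEncodeError if a character has no ASCII equivalent
    return text.encode("ascii")


ebcdic_bytes = "HELLO 123".encode("cp037")  # simulate a mainframe text field
print(ebcdic_to_ascii(ebcdic_bytes))
```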
• The data warehouse reflects the historical need
for information, while the operational
environment focuses on the immediate,
current need for information.
• The data warehouse addresses the
informational needs of the corporation, while
the operational environment addresses the up-
to-the-second clerical needs of the corporation.
• Transmission of the newly created output file
that will go into the data warehouse must be
accounted for.
3.9 Triggering the Data
Warehouse Record
The basic business interaction that
populates the data warehouse is called an
event-snapshot interaction.
3.9.2 Components of the
Snapshot
The snapshot placed in the data warehouse
normally contains several components.
The unit of time that marks the occurrence of the
event.
The key that identifies the snapshot.
insurance policy.
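A snapshot record could be modeled as below. This is a sketch under assumptions: the field names are invented, the split into primary and secondary data follows the record structure listed in the Summary, and the insurance event fields (policy_id, premium, agent) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Snapshot:
    event_time: datetime   # unit of time marking when the event occurred
    key: str               # key that identifies the snapshot
    primary_data: dict     # data directly related to the key
    secondary_data: dict = field(default_factory=dict)  # incidental data


def snapshot_from_event(event: dict) -> Snapshot:
    """Build a warehouse snapshot from one (hypothetical) operational event."""
    return Snapshot(
        event_time=event["occurred_at"],
        key=event["policy_id"],
        primary_data={"premium": event["premium"]},
        secondary_data={"agent": event.get("agent")},
    )


event = {
    "occurred_at": datetime(2022, 1, 5, 17, 30),
    "policy_id": "P-1001",
    "premium": 420.0,
    "agent": "A-17",
}
print(snapshot_from_event(event))
```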
3.10 Profile Records (sample)
The aggregation of operational data into a single data
warehouse record may take many forms, including the
following:
Values taken from operational data can be summarized.
Units of operational data can be tallied, where the total
number of units is captured.
Units of data can be processed to find the highest, lowest,
average, and so forth.
First and last occurrences of data can be trapped.
Data of certain types, falling within the boundaries of
several parameters, can be measured.
Data that is effective as of some moment in time can be
trapped.
The oldest and the youngest data can be trapped.
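Several of the aggregation forms listed above can be sketched in one function that folds detailed records into a single profile record. The Call type and every field name are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Call:
    customer_id: str
    day: date
    minutes: float


def build_profile(customer_id: str, calls: list[Call]) -> dict:
    """Aggregate one customer's detailed call records into a profile record."""
    own = sorted(
        (c for c in calls if c.customer_id == customer_id),
        key=lambda c: c.day,
    )
    if not own:
        raise ValueError(f"no calls for customer {customer_id}")
    minutes = [c.minutes for c in own]
    return {
        "customer_id": customer_id,
        "total_minutes": sum(minutes),                # values summarized
        "call_count": len(minutes),                   # units tallied
        "longest_call": max(minutes),                 # highest
        "shortest_call": min(minutes),                # lowest
        "average_call": sum(minutes) / len(minutes),  # average
        "first_call_day": own[0].day,                 # first occurrence
        "last_call_day": own[-1].day,                 # last occurrence
    }


calls = [
    Call("C1", date(2022, 1, 9), 30.0),
    Call("C1", date(2022, 1, 2), 10.0),
    Call("C2", date(2022, 1, 3), 5.0),
    Call("C1", date(2022, 1, 20), 20.0),
]
print(build_profile("C1", calls))
```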
3.11 Managing Volume
3.12 Creating Multiple Profile
Records
Individual call records can be used to
create:
• A customer profile record
• A district traffic profile record
• A line analysis profile record, and so forth.
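The same detailed call records can feed several profile records by grouping on different fields. A minimal sketch, with hypothetical field names (customer, district, line, minutes):

```python
from collections import defaultdict


def profile_by(call_records: list[dict], key_field: str) -> dict:
    """Tally calls and minutes per distinct value of key_field."""
    totals = defaultdict(lambda: {"calls": 0, "minutes": 0})
    for call in call_records:
        entry = totals[call[key_field]]
        entry["calls"] += 1
        entry["minutes"] += call["minutes"]
    return dict(totals)


call_records = [
    {"customer": "C1", "district": "North", "line": "L1", "minutes": 10},
    {"customer": "C1", "district": "South", "line": "L2", "minutes": 5},
    {"customer": "C2", "district": "North", "line": "L1", "minutes": 7},
]

customer_profiles = profile_by(call_records, "customer")
district_profiles = profile_by(call_records, "district")
line_profiles = profile_by(call_records, "line")
print(customer_profiles)
```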
3.13 Going from the Data Warehouse
to the Operational Environment
3.14 Direct Operational Access
of Data Warehouse Data
3.14 Direct Operational Access of Data
Warehouse Data (Issues)
Data Latency (data from one source
may not be ready for loading)
Data Volume (sizing)
flatfiles, etc)
Different format or encoding rules
3.15 Indirect Access of Data
Warehouse Data (solution)
One of the most effective uses of the
data warehouse is the indirect access
of data warehouse data by the
operational environment
3.15.1 An Airline Commission Calculation
System (Operational example)
The customer requests a ticket and the travel agent wants to know:
Is there a seat available?
The airline clerk must enter and complete several transactions:
Are there any seats
are involved?
Can the connections be
made?
What is the cost of the
ticket?
What is the commission?
3.15.2 A Retail Personalization
System
The retail sales representative could find out some other information about the customer:
The last type of purchase made
The market segment
Market analysis/segmenting
While engaging the customer in conversation, the sales representative may initiate:
“I see it’s been since February that we last heard from you.”
“How was that blue
3.15.3 Credit Scoring
based on (Demographics data)
The background check relies on the data
warehouse. In truth, the check is an eclectic one,
in which many aspects of the customer are
investigated, such as the following:
Past payback history
Home/property ownership
Financial management
Net worth
Gross income
Gross expenses
Other intangibles
The analysis program is run
periodically and produces a
prequalified file for use in the
operational environment. In addition
to other data, the prequalified file
includes the following:
Customer identification
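The indirect-access pattern described here (a periodic analysis writes a small file, and the operational system only reads that file) can be sketched as follows. The qualification rule, the threshold, and every field other than the customer identification are assumptions invented for the example, not the book's criteria.

```python
def build_prequalified_file(customers: list[dict], min_net_worth: float = 50_000) -> dict:
    """Periodic batch step: analyze warehouse data, emit one entry per customer."""
    prequalified = {}
    for customer in customers:
        # Hypothetical rule: no missed payments and net worth above a threshold
        approved = (
            customer["missed_payments"] == 0
            and customer["net_worth"] >= min_net_worth
        )
        prequalified[customer["customer_id"]] = {
            "customer_id": customer["customer_id"],
            "prequalified": approved,
        }
    return prequalified


def operational_lookup(prequalified_file: dict, customer_id: str) -> bool:
    """Online step: a quick lookup that never touches the warehouse itself."""
    entry = prequalified_file.get(customer_id)
    return bool(entry and entry["prequalified"])


warehouse_customers = [
    {"customer_id": "C1", "missed_payments": 0, "net_worth": 80_000},
    {"customer_id": "C2", "missed_payments": 2, "net_worth": 120_000},
]
prequalified_file = build_prequalified_file(warehouse_customers)
print(operational_lookup(prequalified_file, "C1"))
```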
processing requirements.
It fits very nicely with the data model.
3.17 Star Joins (ct)
3.18 Supporting the ODS
In general, there are four classes of ODS:
Class I—In a class I ODS, updates of data from the
operational environment to the ODS are synchronous.
Class II— In a class II ODS, the updates between the
operational environment and the ODS occur within a two-
to-three-hour time frame.
Class III—In a class III ODS, the synchronization of
updates between the operational environment and the ODS
occurs overnight.
Class IV—In a class IV ODS, updates into the ODS from
the data warehouse are unscheduled. Figure 3-56 shows
this support.
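The four classes above differ in where updates come from and how quickly they arrive. One way to keep that distinction explicit is a small enumeration; the type and property names here are assumptions made for the sketch.

```python
from enum import Enum


class OdsClass(Enum):
    # Each value is (source of updates, timing of updates)
    CLASS_I = ("operational environment", "synchronous")
    CLASS_II = ("operational environment", "within two to three hours")
    CLASS_III = ("operational environment", "overnight")
    CLASS_IV = ("data warehouse", "unscheduled")

    @property
    def source(self) -> str:
        return self.value[0]

    @property
    def timing(self) -> str:
        return self.value[1]


for ods_class in OdsClass:
    print(f"{ods_class.name}: fed from the {ods_class.source}, {ods_class.timing}")
```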
The customer has been active for several years. The
analysis of the transactions in the data
warehouse is used to produce the following
profile information about a single customer:
Customer name and ID
Customer volume—high/low
Customer profitability—high/low
frequent/very infrequent
Customer likes/dislikes (fast cars, single malt
scotch)
3.19 Requirements and the
Zachman Framework
Summary
Design of data warehouse
• Corporate Data model
• Operational data model
• Iterative approach, since requirements are not known a priori
• Different SDLC approach
Data warehouse construction considerations
• Data Volume (large size)
• Data Latency (late arrival of data set)
• Requires transformation and an understanding of legacy systems
Data Models (granularities)
• Low level
• Mid Level
• High Level
Structure of typical record in data warehouse
• Time stamp, a surrogate key, direct data, secondary data
Reference tables must be managed in a time-variant
manner
Data Latency – wrinkles of time
Data Transformation is complex
• Different architectures
• Different technologies
• Different encoding rules and complex logics
Creation of a data warehouse record is triggered by
an event (activity)
A profile record is a composite representation of
data (historical activities)
Star join is a preferred database design
technique