Overview of Data Warehouse

A data warehouse is a subject-oriented, integrated collection of non-volatile data from multiple sources used to support management decisions. It provides a historical, integrated view of data that is accessed by users for analysis. A data warehouse contains data integrated from operational systems over time for querying and analysis rather than real-time transactions. It aims to enable better business intelligence and faster decision making.

Uploaded by

Asis Mahalik

OVERVIEW OF DATA WAREHOUSE

What is a Data Warehouse ?

A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.
- WH Inmon

• "Can I see the credit report from Accounts, Sales from Marketing and the open order report from Order Entry for this customer?"
• Data from multiple sources is integrated for a subject.
• Identical queries will give the same results at different times.
• Supports analysis requiring historical data.
• Data is stored for a historical period. Data is populated in the data warehouse on a daily/weekly basis depending upon the requirement.

WH Inmon - Regarded As Father Of Data Warehousing


Subject-Oriented - Characteristics of a Data Warehouse

Operational: Leads, Prospects, Quotes, Orders
Data Warehouse: Customers, Products, Regions, Time

Focus is on Subject Areas rather than Applications


Integrated - Characteristics of a Data Warehouse

Operational:
Appl A - m,f
Appl B - 1,0
Appl C - male,female
Data Warehouse: m,f

Operational:
Appl A - balance dec fixed (13,2)
Appl B - balance pic 9(9)V99
Appl C - balance pic S9(7)V99 comp-3
Data Warehouse: balance dec fixed (13,2)

Operational:
Appl A - bal-on-hand
Appl B - current-balance
Appl C - cash-on-hand
Data Warehouse: Current balance

Operational:
Appl A - date (julian)
Appl B - date (yymmdd)
Appl C - date (absolute)
Data Warehouse: date (julian)

Integrated View Is The Essence Of A Data Warehouse
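
The integration step on this slide can be sketched in Python. This is only an illustration under stated assumptions: the record layout, the `integrate` function, and the reading of Appl B's `1`/`0` as male/female are hypothetical; the slide only lists the differing encodings and field names.

```python
from decimal import Decimal

# Assumed per-application gender encodings (Appl A/B/C as on the slide).
# Treating 1 as male and 0 as female for Appl B is an assumption.
GENDER_MAP = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},
    "C": {"male": "m", "female": "f"},
}

# Each application names the balance field differently.
BALANCE_FIELD = {
    "A": "bal-on-hand",
    "B": "current-balance",
    "C": "cash-on-hand",
}


def integrate(app, record):
    """Map one source record to a single warehouse convention."""
    gender_code = str(record["gender"]).lower()
    raw_balance = record[BALANCE_FIELD[app]]
    return {
        "gender": GENDER_MAP[app][gender_code],
        # Two decimal places, mirroring the fixed (13,2) target format.
        "current_balance": Decimal(str(raw_balance)).quantize(Decimal("0.01")),
    }


rows = [
    integrate("A", {"gender": "m", "bal-on-hand": "120.5"}),
    integrate("B", {"gender": 1, "current-balance": 120.5}),
    integrate("C", {"gender": "Male", "cash-on-hand": "120.50"}),
]
```

All three source records end up with the same field names and the same encoding, which is the point of the integrated view.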


Non-volatile - Characteristics of a Data Warehouse

Operational: insert, delete, replace, change
Data Warehouse: load, read only access
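
A minimal sketch of the load / read only access pattern, assuming a toy in-memory table. The `WarehouseTable` class and its method names are hypothetical, not taken from the slides.

```python
class WarehouseTable:
    """Toy append-only table: rows arrive via load() and are then only read."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Store immutable copies so later changes to the source dicts
        # cannot alter data that has already been loaded.
        self._rows.extend(tuple(row.items()) for row in batch)

    def query(self, predicate):
        # Read-only access: callers get fresh dicts, never the stored rows.
        return [dict(row) for row in self._rows if predicate(dict(row))]


table = WarehouseTable()
source_row = {"customer": "C1", "balance": 100}
table.load([source_row])

source_row["balance"] = 999  # the operational record changes afterwards

first = table.query(lambda r: r["customer"] == "C1")
second = table.query(lambda r: r["customer"] == "C1")
```

There is deliberately no update or delete method, so the same query returns the same rows each time until the next load.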



Time Variant - Characteristics of a Data Warehouse

Operational: Current Value data
• time horizon : 60-90 days
• key may not have element of time

Data Warehouse: Snapshot data
• time horizon : 5-10 years
• key has an element of time
• data warehouse stores historical data

Data Warehouse Typically Spans Across Time
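
The difference between a key without and with an element of time can be sketched as follows. The account id, dates, and the `record_balance` helper are hypothetical and serve only to illustrate the idea.

```python
from datetime import date

operational = {}  # key: account_id -> current value only
warehouse = {}    # key: (account_id, snapshot_date) -> value as of that date


def record_balance(account_id, balance, as_of):
    operational[account_id] = balance          # overwrites the old value
    warehouse[(account_id, as_of)] = balance   # adds a new snapshot


record_balance("ACC-1", 100, date(2020, 1, 31))
record_balance("ACC-1", 250, date(2020, 2, 29))

# History is recoverable only from the store whose key carries time.
history = sorted(
    (snap_date, value)
    for (account, snap_date), value in warehouse.items()
    if account == "ACC-1"
)
```

The operational store holds one current value per account, while the warehouse keeps one row per account per snapshot date.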


Alternate Definitions

A collection of integrated, subject oriented databases designed to support the DSS function, where each unit of data is relevant to some moment of time
- Imhoff

Data Warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End user oriented data access and reporting tools let users get at the data for decision support
- Babcock
Evolution of Data Warehousing

1960 - 1985 : MIS Era
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
Focus on Reporting

1985 - 1990 : Querying Era
• Ad hoc, unstructured access to corporate data
• SQL as interface not scalable
• Cannot handle complex analysis
Focus on Online Querying

1990 - 20xx : Analysis Era
• Trend Analysis
• What If ?
• Moving Averages
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery
Focus on Online Analysis
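
As one concrete instance of the analysis listed above, a simple moving average over historical values might look like this. The `moving_average` function and the sample figures are illustrative assumptions, not data from the slides.

```python
def moving_average(values, window):
    """Return the simple moving average for each full window of values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


monthly_sales = [10, 20, 30, 40, 50]  # made-up historical figures
trend = moving_average(monthly_sales, 3)
```

This kind of calculation needs several periods of history, which is why it belongs to the analysis era rather than to day-to-day transaction processing.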
Need for Data Warehousing
• Better business intelligence for end-users
• Reduction in time to locate, access, and analyze information
• Consolidation of disparate information sources
• Strategic advantage over competitors
• Faster time-to-market for products and services
• Replacement of older, less-responsive decision support systems
• Reduction in demand on IS to generate reports
OLTP Vs Warehouse

Operational System | Data Warehouse
Transaction Processing | Query Processing
Time Sensitive | History Oriented
Operator View | Managerial View
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Relatively smaller database | Large database size
Many concurrent users | Relatively few concurrent users
Volatile Data | Non Volatile Data
Stores all data | Stores relevant data
Not Flexible | Flexible

Capacity Planning
[Figure: Processing Power plotted against Time of day]
Processing Load Peaks During the Beginning and End of Day
Examples Of Some Applications
• Target Marketing
• Market Segmentation
• Budgeting
• Credit Rating Agencies
• Financial Reporting and Consolidation
• Market Basket Analysis - POS Analysis
• Churn Analysis
• Profitability Management
• Event tracking
[Figure labels: Manufacturers, Retailers, Customers]
Do we need a separate database ?
• OLTP and data warehousing require two very differently configured systems
• Isolation of Production System from Business Intelligence System
• Significant and highly variable resource demands of the data warehouse
• Cost of disk space no longer a concern
• Production systems not designed for query processing
Data Marts
• Enterprise wide data warehousing projects have a very long cycle time
• Getting consensus between multiple parties may also be difficult
• Departments may not be satisfied with the priority accorded to them
• Sometimes individual departmental needs may be strong enough to warrant a local implementation
• Application/database distribution is also an important factor
Data Marts
• Subject or Application Oriented Business View of Warehouse
• Quick Solution to a specific Business Problem
• Finance, Manufacturing, Sales etc.
• Smaller amount of data used for Analytic Processing

A Logical Subset of The Complete Data Warehouse
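
One way to picture a data mart as a logical subset is a view over a warehouse table. The sketch below uses Python's built-in `sqlite3` module; the table name, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("Sales", "North", 100.0),
        ("Sales", "South", 150.0),
        ("Finance", "North", 80.0),
    ],
)

# The "mart" exposes only one department's data, already summarized.
conn.execute(
    """
    CREATE VIEW sales_mart AS
    SELECT region, SUM(amount) AS total
    FROM warehouse_facts
    WHERE department = 'Sales'
    GROUP BY region
    """
)

sales_rows = conn.execute(
    "SELECT region, total FROM sales_mart ORDER BY region"
).fetchall()
```

A real data mart is often a separate physical database; the view is used here only because it makes the subset relationship easy to see.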


Data Warehouses or Data Marts
• Companies that want a quick solution to a specific business problem are better served by a standalone data mart.
• Some companies opt to build a warehouse incrementally, data mart by data mart.
• For companies interested in changing their corporate cultures or integrating separate departments, an enterprise wide approach makes sense.
Data Warehouse and Data Mart

Aspect | Data Warehouse | Data Marts
Scope | Application Neutral; Centralized, Shared; Cross LOB/enterprise | Specific Application Requirement; LOB, department; Business Process Oriented
Data Perspective | Historical Detailed data; Some summary | Detailed (some history); Summarized
Subjects | Multiple subject areas | Single Partial subject; Multiple partial subjects
Data Warehouse and Data Mart

Aspect | Data Warehouse | Data Marts
Data Sources | Many; Operational/External Data | Few; Operational, external data
Implementation Time Frame | 9-18 months for first stage; Multiple stage implementation | 4-12 months
Characteristics | Flexible, extensible; Durable/Strategic; Data orientation | Restrictive, non extensible; Short life/tactical; Project Orientation
Warehouse or Mart First ?

Data Warehouse First | Data Mart first
Expensive | Relatively cheap
Large development cycle | Delivered in < 6 months
Change management is difficult | Easy to manage change
Difficult to obtain continuous corporate support | Can lead to independent and incompatible marts
Technical challenges in building large databases | Cleansing, transformation, modeling techniques may be incompatible
OLTP Systems Vs Data Warehouse

Remember
Between OLTP and Data Warehouse systems
• users are different
• data content is different
• data structures are different
• hardware is different
Understanding The Differences Is The Key
Operational Data Store - Definition

[Figure: sources A and B, ODS, Data Warehouse; labels: Operational, DSS]

A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data

• "Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer"
• Data from multiple sources is integrated for a subject
• Identical queries may give different results at different times.
• Supports analysis requiring current data
• Data stored only for current period. Old data is either archived or moved to Data Warehouse
Operational Data Store
• The ODS applies only to the world of operational systems.
• The ODS contains current valued and near current valued data.
• The ODS contains almost exclusively all detail data
• The ODS requires a full function, update, record oriented environment.
Operational Data Store
• Functions of an ODS
  • Converts Data,
  • Decides Which Data of Multiple Sources Is the Best,
  • Summarizes Data,
  • Decodes/encodes Data,
  • Alters the Key Structures,
  • Alters the Physical Structures,
  • Reformats Data,
  • Internally Represents Data,
  • Recalculates Data.
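
The function "decides which data of multiple sources is the best" can be illustrated with a small sketch. The selection rule used here (prefer the most recently updated non-empty value) is an assumption chosen for the example; the slides do not say how the best source is chosen, and the `pick_best` function and sample records are hypothetical.

```python
from datetime import datetime


def pick_best(candidates):
    """Return the most recently updated non-empty value, or None.

    The "most recent non-empty value wins" rule is an assumed policy.
    """
    usable = [c for c in candidates if c["value"] not in (None, "")]
    if not usable:
        return None
    return max(usable, key=lambda c: c["updated"])["value"]


# The same customer phone number as seen by three source systems.
phone_candidates = [
    {"source": "A", "value": "555-0100", "updated": datetime(2020, 1, 5)},
    {"source": "B", "value": "", "updated": datetime(2020, 3, 1)},
    {"source": "C", "value": "555-0199", "updated": datetime(2020, 2, 10)},
]

best_phone = pick_best(phone_candidates)
```

Source B is the newest but holds no value, so the rule falls back to source C.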
Different kinds of Information Needs
• Current: Is this medicine available in stock
• Recent: What are the tests this patient has completed so far
• Historical: Has the incidence of Tuberculosis increased in last 5 years in Southern region
OLTP Vs ODS Vs DWH

Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating Personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data Structure | Detailed | Detailed and lightly summarized | Detailed and Summarized
Data organization | Functional | Subject-oriented | Subject-oriented
Type of Data | Homogeneous | Homogeneous | Vast Supply of very heterogeneous data
OLTP Vs ODS Vs DWH

Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; Unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data update | Field by field | Field by field | Controlled batch
Database size | Moderate | Moderate | Large to very large
Development Methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Philosophy | Support day-to-day operation | Support day-to-day decisions & operational | Support managing the enterprise
