Data Warehousing
2
Problem: Heterogeneous Information Sources
“Heterogeneities are everywhere”
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
(Figure: heterogeneous sources, including personal databases, digital libraries, scientific databases, and the World Wide Web)
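To make the three heterogeneities concrete, here is a minimal Python sketch. It assumes two hypothetical sources (`SOURCE_A`, `SOURCE_B`) that describe the same customer with different field names, date formats, and letter case. Each source gets its own illustrative mapping function (`from_a`, `from_b`) onto one common representation; only after that mapping can duplicates be grouped and inconsistencies detected.

```python
from datetime import date, datetime

# Hypothetical records for the same customer, held by two sources that use
# different field names, date formats, and letter case.
SOURCE_A = [{"cust_id": "17", "name": "ANN SMITH", "dob": "1980-03-14", "city": "London"}]
SOURCE_B = [{"id": 17, "full_name": "Ann Smith", "birth": "14/03/1980", "town": "Leeds"}]


def from_a(rec: dict) -> dict:
    """Map a source-A record onto the common representation."""
    return {
        "id": int(rec["cust_id"]),
        "name": rec["name"].title(),
        "dob": date.fromisoformat(rec["dob"]),
        "city": rec["city"],
    }


def from_b(rec: dict) -> dict:
    """Map a source-B record onto the same common representation."""
    return {
        "id": rec["id"],
        "name": rec["full_name"].title(),
        "dob": datetime.strptime(rec["birth"], "%d/%m/%Y").date(),
        "city": rec["town"],
    }


def find_conflicts(records: list) -> dict:
    """Group records by id (duplicates) and report fields whose values disagree."""
    groups: dict = {}
    for r in records:
        groups.setdefault(r["id"], []).append(r)
    conflicts = {}
    for rid, group in groups.items():
        values = {f: {r[f] for r in group} for f in group[0]}
        bad = {f: vals for f, vals in values.items() if len(vals) > 1}
        if bad:
            conflicts[rid] = bad
    return conflicts


normalized = [from_a(r) for r in SOURCE_A] + [from_b(r) for r in SOURCE_B]
print(find_conflicts(normalized))  # the duplicate customer 17 disagrees only on "city"
```

Which city is correct cannot be decided mechanically in this sketch; resolving such conflicts is the cleansing and merging work that integration has to do.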
3
Problem: Data Management in Large Enterprises
• Vertical fragmentation of informational systems (vertical stove pipes)
• Result of application (user)-driven development of operational systems
(Figure: separate stove pipes for Sales, Administration, Finance, Manufacturing, ..., each with its own systems, e.g. Sales Planning, Stock Management, Suppliers, Debt Management, Num. Control, Inventory)
4
Goal: Unified Access to Data
Integration System
• Collects and combines information
• Provides an integrated view and a uniform user interface
• Supports sharing
(Figure: an integration system in front of the World Wide Web, digital libraries, scientific databases, and personal databases)
5
Why a Warehouse?
• Two approaches:
  – Query-driven (lazy)
  – Warehouse (eager)
6
The Traditional Research Approach
(Figure: clients query an integration system, which uses metadata and reaches each source through its own wrapper)
• Query-driven (lazy, on-demand)
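The query-driven pattern can be sketched as a mediator that stores nothing and forwards each client query through a wrapper to every source. This is a minimal in-memory Python sketch; `Wrapper`, `Mediator`, and the sample rows are hypothetical stand-ins for remote sources, and the `calls` counter exists only to show how often the sources are contacted.

```python
from typing import Callable, List

Record = dict
Predicate = Callable[[Record], bool]


class Wrapper:
    """Hypothetical wrapper: exposes one source through a common query call."""

    def __init__(self, name: str, rows: List[Record]):
        self.name = name
        self.rows = rows  # stands in for the remote source's data
        self.calls = 0    # how many times this source has been contacted

    def query(self, pred: Predicate) -> List[Record]:
        self.calls += 1
        return [r for r in self.rows if pred(r)]


class Mediator:
    """Query-driven (lazy) integration: nothing is stored centrally;
    every client query is forwarded to every wrapper at query time."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, pred: Predicate) -> List[Record]:
        results: List[Record] = []
        for w in self.wrappers:
            results.extend(w.query(pred))
        return results


sales = Wrapper("sales", [{"item": "tea", "qty": 3}])
stock = Wrapper("stock", [{"item": "tea", "qty": 40}, {"item": "jam", "qty": 5}])
m = Mediator([sales, stock])

m.query(lambda r: r["item"] == "tea")
m.query(lambda r: r["item"] == "tea")
print(sales.calls, stock.calls)  # prints "2 2": each source is hit once per client query

stock.rows.append({"item": "tea", "qty": 1})        # a source changes...
print(len(m.query(lambda r: r["item"] == "tea")))  # ...and the next query sees it: prints 3
```

Every client query costs one round trip per source, which is where the delay and source-load problems come from; in exchange, answers always reflect the current state of the sources.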
7
Disadvantages of Query-Driven Approach
• Delay in query processing
• Slow or unavailable information sources
• Complex filtering and integration
• Inefficient and potentially expensive for frequent queries
• Competes with local processing at sources
• Hasn’t caught on in industry
8
The Warehousing Approach
(Figure: an extractor/monitor at each source feeds an integration system, which uses metadata and loads the data warehouse; clients query the warehouse)
• Information integrated in advance
• Stored in the warehouse for direct querying and analysis
9
Advantages of Warehousing Approach
• High query performance
  – But not necessarily most current information
• Doesn’t interfere with local processing at sources
  – Complex queries at warehouse
  – OLTP at information sources
• Information copied at warehouse
  – Can modify, annotate, summarize, restructure, etc.
  – Can store historical information
  – Security, no auditing
• Has caught on in industry
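The eager alternative, and the freshness trade-off listed above, can be sketched by integrating at load time and answering queries only from the stored copy. A minimal in-memory Python sketch; `Source`, `Warehouse`, and the sample rows are hypothetical, and the `reads` counter only shows how rarely the sources are touched.

```python
class Source:
    """Hypothetical operational source; `reads` counts how often it is accessed."""

    def __init__(self, rows: list):
        self.rows = list(rows)
        self.reads = 0

    def extract(self) -> list:
        self.reads += 1
        return [dict(r) for r in self.rows]  # copies, so later source changes don't leak in


class Warehouse:
    """Eager integration: data are extracted and integrated at load time,
    and client queries run only against the stored copy."""

    def __init__(self, sources: list):
        self.sources = sources
        self.table: list = []

    def load(self) -> None:
        # Extract from every source and store the combined result.
        self.table = [r for s in self.sources for r in s.extract()]

    def query(self, pred) -> list:
        return [r for r in self.table if pred(r)]  # no source is contacted


sales = Source([{"item": "tea", "qty": 3}])
stock = Source([{"item": "tea", "qty": 40}])
wh = Warehouse([sales, stock])
wh.load()

for _ in range(3):
    wh.query(lambda r: r["item"] == "tea")
print(sales.reads, stock.reads)  # prints "1 1": three queries, one extraction

stock.rows.append({"item": "tea", "qty": 1})
print(len(wh.query(lambda r: r["item"] == "tea")))  # prints 2: stale until the next load()
```

Calling `load()` again brings the copy up to date; how often to reload trades freshness against load on the sources.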
10
Not Either-Or Decision
• Query-driven approach still better for
  – Rapidly changing information
  – Rapidly changing information sources
  – Truly vast amounts of data from large numbers of sources
  – Clients with unpredictable needs
11
What is a Data Warehouse?
A Practitioner’s Viewpoint
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant
12
What is a Data Warehouse?
An Alternative Viewpoint
“A DW is a
  – subject-oriented,
  – integrated,
  – time-varying,
  – non-volatile
collection of data that is used primarily in organizational decision making.”
-- W.H. Inmon, Building the Data Warehouse, 1992
13
A Data Warehouse is...
• Stored collection of diverse data
  – A solution to the data integration problem
  – Single repository of information
• Subject-oriented
  – Organized by subject, not by application
  – Used for analysis, data mining, etc.
• Optimized differently from a transaction-oriented DB
• User interface aimed at executives
14
… Cont’d
• Large volume of data (GB, TB)
• Non-volatile
  – Historical
  – Time attributes are important
• Updates infrequent
• May be append-only
• Examples
  – All transactions ever at Sainsbury’s
  – Complete client histories at an insurance firm
  – LSE financial information and portfolios
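Non-volatile, append-only storage with important time attributes can be illustrated with a small versioned store: each change is appended together with the date from which it is valid, and queries ask for the value "as of" a given date. A minimal Python sketch; the `History` class and the insurance-cover values are hypothetical.

```python
from bisect import bisect_right
from datetime import date


class History:
    """Append-only store: every change becomes a new (valid_from, value) version.
    Nothing is updated in place, so any past state can still be queried."""

    def __init__(self):
        self._versions: dict = {}

    def append(self, key: str, valid_from: date, value) -> None:
        versions = self._versions.setdefault(key, [])
        if versions and valid_from <= versions[-1][0]:
            raise ValueError("versions must be appended in increasing date order")
        versions.append((valid_from, value))

    def as_of(self, key: str, when: date):
        """Return the value that was current on `when`, or None if there was none yet."""
        versions = self._versions.get(key, [])
        i = bisect_right([d for d, _ in versions], when)
        return versions[i - 1][1] if i else None


h = History()
h.append("client-42", date(2019, 1, 1), "basic cover")
h.append("client-42", date(2021, 6, 1), "premium cover")
print(h.as_of("client-42", date(2020, 5, 1)))  # prints "basic cover"
print(h.as_of("client-42", date(2022, 1, 1)))  # prints "premium cover"
```

Because versions are only appended, a complete client history is simply the list of versions for that key.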
15
Generic Warehouse Architecture
(Figure: extractor/monitors → integrator → warehouse → clients, annotated with Metadata and the phases Design Phase, Loading, Maintenance, ..., Optimization, Query & Analysis)
16
Data Warehouse Architectures: Conceptual View
• Single-layer
  – Every data element is stored once only
  – Virtual warehouse
• Two-layer
  – Real-time + derived data
  – Most commonly used approach in industry today
(Figure: single-layer, where operational and informational systems both use the same “real-time data”; two-layer, where operational systems use real-time data and informational systems use derived data)
17
Three-layer Architecture: Conceptual View
• Transformation of real-time data to derived data really requires two steps
(Figure: real-time data used by operational systems → reconciled data, the physical implementation of the data warehouse → derived data at the view level, serving informational systems’ “particular informational needs”)
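The two steps can be sketched as two functions: one turns real-time data from several operational systems into reconciled data (one schema, cleaned keys, one currency), and the other computes derived data for one particular informational need. A minimal Python sketch; the order records, field names, and the fixed exchange rate `EUR_PER_GBP` are all hypothetical.

```python
from collections import defaultdict

# Real-time (operational) data from two hypothetical systems with different schemas.
orders_eu = [{"cust": " ann ", "amount_eur": 10.0}, {"cust": "BOB", "amount_eur": 5.0}]
orders_uk = [{"customer": "Ann", "amount_gbp": 8.0}]

EUR_PER_GBP = 1.25  # assumed fixed rate, for illustration only


def reconcile(eu: list, uk: list) -> list:
    """Step 1: real-time -> reconciled data (one schema, cleaned keys, one currency)."""
    rows = [{"cust": r["cust"].strip().title(), "eur": r["amount_eur"]} for r in eu]
    rows += [{"cust": r["customer"].strip().title(), "eur": r["amount_gbp"] * EUR_PER_GBP}
             for r in uk]
    return rows


def derive(reconciled: list) -> dict:
    """Step 2: reconciled -> derived data for one need (revenue per customer)."""
    totals: dict = defaultdict(float)
    for r in reconciled:
        totals[r["cust"]] += r["eur"]
    return dict(totals)


reconciled = reconcile(orders_eu, orders_uk)
print(derive(reconciled))  # prints {'Ann': 20.0, 'Bob': 5.0}
```

Keeping the reconciled layer separate means further derived views can be computed from the same cleaned data without repeating the reconciliation.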
18
Data Warehousing: Two Distinct Issues
(1) How to get information into the warehouse: “data warehousing”
(2) What to do with data once it’s in the warehouse: “warehouse DBMS”
• Both rich research areas
• Industry has focused on (2)
19
Issues in Data Warehousing
• Warehouse Design
• Extraction
  – Wrappers, monitors (change detectors)
• Integration
  – Cleansing & merging
• Warehousing specification & Maintenance
• Optimizations
• Miscellaneous (e.g., evolution)
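A monitor (change detector) has to tell the warehouse what changed at a source. When a source offers no change log, one simple technique is to compare two successive snapshots keyed by primary key. A minimal Python sketch of that snapshot comparison; the function name and sample data are hypothetical, and it is only one of several possible detection strategies.

```python
def snapshot_diff(old: dict, new: dict) -> tuple:
    """Compare two snapshots {primary_key: row} and classify the differences."""
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return inserted, deleted, updated


yesterday = {1: ("tea", 3), 2: ("jam", 5), 3: ("milk", 2)}
today = {1: ("tea", 3), 2: ("jam", 7), 4: ("bread", 1)}

inserted, deleted, updated = snapshot_diff(yesterday, today)
print(inserted)  # {4: ('bread', 1)}
print(deleted)   # {3: ('milk', 2)}
print(updated)   # {2: (('jam', 5), ('jam', 7))}
```

Only these deltas then need to be integrated into the warehouse, which can be much cheaper than reloading everything when few rows change; the cost is that the monitor must keep the previous snapshot.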
20
OLTP vs. OLAP
• OLTP: Online Transaction Processing
  – Describes processing at operational sites
• OLAP: Online Analytical Processing
  – Describes processing at the warehouse
21
Warehouse is a Specialized DB
Standard DB (OLTP)
• Mostly updates
• Many small transactions
• MB to GB of data
• Current snapshot
• Index/hash on primary key
• Raw data
• Thousands of users (e.g., clerical users)
Warehouse (OLAP)
• Mostly reads
• Queries are long and complex
• GB to TB of data
• History
• Lots of scans
• Summarized, reconciled data
• Hundreds of users (e.g., decision-makers, analysts)
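The contrast in access patterns can be shown in a few lines: an OLTP-style transaction uses a hash index on the primary key to touch a single row, while an OLAP-style query scans the whole table and summarizes it. A minimal in-memory Python sketch; the `rows` table and the function names are hypothetical, and a real DBMS would use its own indexes and query engine.

```python
# Hypothetical sales table: (order_id, region, amount), deterministic for repeatability.
rows = [(i, "north" if i % 2 == 0 else "south", i % 100 + 1) for i in range(10_000)]

# OLTP style: a hash index on the primary key, so a transaction touches one row.
by_pk = {row[0]: row for row in rows}


def lookup(order_id: int) -> tuple:
    return by_pk[order_id]


# OLAP style: a read-only scan over every row, producing a summary.
def revenue_by_region(table) -> dict:
    totals: dict = {}
    for _, region, amount in table:
        totals[region] = totals.get(region, 0) + amount
    return totals


print(lookup(42))               # prints (42, 'north', 43)
print(revenue_by_region(rows))  # prints {'north': 250000, 'south': 255000}
```

The point lookup costs about the same regardless of table size, while the summary must read all 10,000 rows, which is why warehouses are tuned for scans rather than for primary-key access.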
