DW Source Integration, Tools, and Architecture
• Source Integration
• DW architecture
Aalborg University 2007 - DWML course 3 Aalborg University 2007 - DWML course 4
What Templates to Choose?
Overview
Using DW Metadata in the Enterprise Analyst: “Why can’t I answer question X?”
Schema Integration
• Producing one global schema (one-shot or incremental)
• Pre-integration
  Analyzing and annotating source schemata
  Semantic enrichment of schema, often in richer data model
• Schema comparison
  Determine correlations/conflicts among schema concepts
  Heterogeneity conflicts: different source data models
  Naming conflicts: homonyms and synonyms
  Semantic conflicts: different abstraction levels
  Structural conflicts: different constructs
• Schema conforming
  Conform/align schemas to make them compatible
  Typically semi-automatic process
• Schema merging and restructuring
  “Superimpose” conformed schemas
  Quality: completeness, correctness, minimality, understandability

Virtual Data Integration
• Only data definition is integrated
  Data only in sources, queries on views, queries shipped to sources
  Not suited for DW?
• Carnot
  Individual schemata mapped onto rich GCL ontology (first-order logic)
  Articulation axioms specify mappings, queries mapped to GCL
• SIMS
  Creates common class-based domain model to describe sources
  Sources are dynamically chosen and integrated at query time
  Query reformulation, access planning, optimization, execution
• Information Manifold
  Relational world view + information source description + correspondences
  Metamodel enriched using description logic/Datalog rules
  Datalog queries, optimized by choosing “minimal” sources
• TSIMMIS
  Wrappers wrap sources using semi-structured OEM model
  Mediator performs its own integration, no global integration (global as view)
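The idea behind virtual data integration (data stays in the sources, queries are shipped to them) can be illustrated with a minimal sketch. It is not the design of Carnot, SIMS, Information Manifold, or TSIMMIS; the two sources, the wrapper functions, and the global schema with `customer` and `city` are all hypothetical.

```python
# Two hypothetical sources with different native representations.
SOURCE_A = [{"cust": "Ann", "city": "Aalborg"}, {"cust": "Bo", "city": "Copenhagen"}]
SOURCE_B = [("Cy", "Aalborg"), ("Di", "Odense")]


def wrap_a(city):
    """Wrapper for source A: evaluates the query at the source and
    returns rows in the (assumed) global schema."""
    return [{"customer": r["cust"], "city": r["city"]}
            for r in SOURCE_A if r["city"] == city]


def wrap_b(city):
    """Wrapper for source B: same query, different native format."""
    return [{"customer": name, "city": c}
            for name, c in SOURCE_B if c == city]


def customers_in(city, wrappers=(wrap_a, wrap_b)):
    """Mediator: ships the query to every wrapper and unions the answers.
    It stores no data of its own."""
    results = []
    for wrapper in wrappers:
        results.extend(wrapper(city))
    return results


aalborg_customers = customers_in("Aalborg")
```

Every call to `customers_in` touches the sources again, which is one reason the slides ask whether this approach suits a DW.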
DWQ Source Integration Architecture
[Figure: architecture diagram, annotated "Note explicit mappings!"]

DWQ Source Integration Methodology
• Source-driven integration
  Enterprise and source model construction
  Source model integration (into the domain model)
  Source and DW schema specification (+ mappings)
  Data integration and reconciliation
  Quality analysis steps in all phases above
• Client-driven integration
  New client query considered
  Reasoning determines whether query can be answered by materialized views already in DW
    ◆ Query containment reasoning
  If DW not sufficient, materialize new concepts in domain model?
  Otherwise, new sources must be added using source-driven integration
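The client-driven step, deciding whether a new query can be answered from views already materialized in the DW, can be sketched in a heavily simplified form. This is not DWQ's query containment reasoning: it assumes every view holds a SUM over the same fact data and is described only by its group-by attributes, so a view can answer a query exactly when the query's grouping attributes are a subset of the view's. The view names and attributes are hypothetical.

```python
from typing import Optional

# Hypothetical materialized views: name -> attributes the view is grouped by.
MATERIALIZED_VIEWS = {
    "sales_by_city_year": frozenset({"city", "year"}),
    "sales_by_product_city_year": frozenset({"product", "city", "year"}),
}


def answering_view(query_group_by: set) -> Optional[str]:
    """Return the coarsest materialized view that can answer the query,
    or None if the DW is not sufficient."""
    candidates = [
        (len(attrs), name)
        for name, attrs in MATERIALIZED_VIEWS.items()
        if query_group_by <= attrs  # simplified containment test
    ]
    return min(candidates)[1] if candidates else None


by_year = answering_view({"year"})
by_product = answering_view({"product"})
by_customer = answering_view({"customer"})
```

A result of `None` corresponds to the last two sub-items above: either new concepts are materialized, or new sources are added through source-driven integration.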
• Source Integration
• DW architecture

[Figure: project lifecycle diagram. Boxes: Project Planning; Business Requirements Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; Data Staging Design & Development; End-User Application Specification; End-User Application Development; Deployment; Maintenance and Growth; Project Management]
Technical DW Architecture
[Figure: architecture diagram. Labels: Existing databases and systems (OLTP), Appl., DB, ETL, DW, DM, New databases and systems (OLAP), Clients, OLAP, Data mining. Question posed: How to organize DW and DMs?]

Central DW
• All data in one, central DW
• All client queries directly on the central DW
• Pros
  Simplicity
• Data stored in separate data marts, aimed at special departments
• Logical DW (i.e., virtual)
• Data marts contain detail data
• Pros
• Cons
  More complex
[Figure: sources, DW, and data marts for Finance, Mrktng, and Distr., accessed by clients]

• Central DW is materialized
• Data is distributed to data marts in one or more tiers
• Only aggregated data in cube tiers
  redundancy+distribution
  Most complex
  Hard to manage
[Figure: sources, DW, and tiers of cubes; example cubes show Milk and Bread figures for Aalborg and Copenhagen in 2000 and 2001]
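The statement that cube tiers hold only aggregated data can be made concrete with a small sketch that rolls detail rows from a central DW up to a coarser cube. The rows and amounts are made up for illustration and are not taken from the slide figure; only the city and product names echo its labels.

```python
from collections import defaultdict

# Hypothetical detail rows in the central DW:
# (city, product, year, month, amount)
detail_rows = [
    ("Aalborg", "Milk", 2000, 1, 30),
    ("Aalborg", "Milk", 2000, 2, 26),
    ("Aalborg", "Bread", 2001, 5, 45),
    ("Copenhagen", "Milk", 2001, 3, 67),
    ("Copenhagen", "Milk", 2001, 4, 60),
]


def build_cube_tier(rows):
    """Aggregate away the month level: the tier keeps one cell
    per (city, product, year) and no detail rows."""
    cube = defaultdict(int)
    for city, product, year, _month, amount in rows:
        cube[(city, product, year)] += amount
    return dict(cube)


cube_tier = build_cube_tier(detail_rows)
```

The tier is smaller than the detail data but is a redundant copy, so it has to be rebuilt or updated whenever the central DW changes.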
Coordination w. Development Strategy
• Different development strategies pose different demands to the architecture elements
• Example: Kimball Dimensional Modeling
  Centralized design of (conforming) dimensions
  First, design of a single-source data mart
  Later, design of multi-source data marts
  Integration of existing data marts into new data marts
  The DW is just the union of the marts it is composed of
  Entails top-down (“Bus Architecture”) and bottom-up elements …
• Consequences
  No initial design of DW, from which data marts are extracted
  Data is extracted directly from sources to data marts
  Allows distribution of data marts and computation on them

Operational Data Store (ODS)
[Figure: architecture diagram. Labels: Existing databases and systems (OLTP), Appl., DB, ETL, ODS, DW, DM, New DB, OLAP, Data mining, Visualization]
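For the Kimball example under "Coordination w. Development Strategy", the role of centrally designed, shared dimensions can be sketched as follows. Two hypothetical data marts are loaded independently but use the same dimension keys, so their facts can be combined without a separately designed DW. All names and numbers are illustrative assumptions.

```python
# Dimension designed once, centrally, and shared by every mart.
product_dim = {1: "Milk", 2: "Bread"}

# Two hypothetical data marts, each loaded directly from its own source,
# both keyed by the shared product_dim keys.
sales_mart = {1: 123, 2: 57}      # product key -> units sold
inventory_mart = {1: 40, 2: 12}   # product key -> units in stock


def drill_across(dim, *marts):
    """Combine facts from several marts through the shared dimension.
    A mart lacking a key contributes None for that member."""
    return {
        label: tuple(mart.get(key) for mart in marts)
        for key, label in dim.items()
    }


report = drill_across(product_dim, sales_mart, inventory_mart)
```

If the marts used different product keys, a mapping step would be needed first; agreeing on the dimension up front is the top-down element of the strategy.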
MS Analysis Services
• Cheap
• Easy to use
• (R/M/H)OLAP technology
  Data placement as desired
• Intelligent pre-aggregation
• Server and client parts
  Reporting Services a separate tool
• Built-in data mining
  Decision trees
  Clustering
• MS OLE DB for OLAP interface

IBM DB2 OLAP Server
• “Light” version of Hyperion Essbase (OLAP market leader)
  Extra product “on top of” DB2
• (R/M/H)OLAP
  Data in DB2 or in multidimensional structures
• Interfaces
  Hyperion Essbase API
  OLE DB for OLAP (promised)
• DB2 can also handle aggregates
  Automatic summary tables
  Used by DB2 optimizer
  Automatic maintenance by DB2
MS vs. IBM vs. Oracle
• All
  Good scalability
  Good analysis facilities
  Flexible storage (MOLAP, ROLAP, HOLAP)
  Incremental update
  Many client tools
• MS Analysis Server
  Built-in mining + good integration with MS SQL Server
• DB2 OLAP Server
  Good integration with DB2
• Oracle
  Best RDBMS/MOLAP integration of the three
• All three products are good
  Dependent on the other choices + existing technical architecture

Virtual vs. Physical Cubes
• Virtual cubes
  Logical cube specification directly on source data
  ROLAP implementation without aggregates
  + flexible, design can be changed quickly
  - performance, constant load on source DB
• Physical cubes
  Data for cube extracted and stored on OLAP server
  Several implementation choices possible
  + good performance, only source DB load at creation/update
  - harder to change design
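The trade-off between virtual and physical cubes can be illustrated with a toy model that counts how often the source database is read. The classes below are hypothetical stand-ins, not the API of any OLAP product: the virtual cube aggregates from the source on every query, while the physical cube reads the source only when it is built or refreshed.

```python
from collections import defaultdict


class SourceDB:
    """Stand-in for the source database; counts how often it is scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return list(self.rows)


def aggregate(rows):
    """Sum amounts per product."""
    totals = defaultdict(int)
    for product, amount in rows:
        totals[product] += amount
    return dict(totals)


class VirtualCube:
    """Only a logical specification: each query aggregates the source data."""

    def __init__(self, source):
        self.source = source

    def total(self, product):
        return aggregate(self.source.scan()).get(product, 0)


class PhysicalCube:
    """Aggregates are extracted once and stored; the source is read on refresh only."""

    def __init__(self, source):
        self.source = source
        self.refresh()

    def refresh(self):
        self.totals = aggregate(self.source.scan())

    def total(self, product):
        return self.totals.get(product, 0)


rows = [("Milk", 56), ("Milk", 67), ("Bread", 45)]
virtual_source, physical_source = SourceDB(rows), SourceDB(rows)
virtual, physical = VirtualCube(virtual_source), PhysicalCube(physical_source)
answers = [(virtual.total("Milk"), physical.total("Milk")) for _ in range(3)]
```

After three queries both cubes give the same answers, but `virtual_source.scans` has grown with every query while `physical_source.scans` stays at one. The physical cube pays for this by needing a `refresh` whenever the source data or the cube design changes.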
Choosing Client Tools
• Many OLAP clients on the market, e.g.,
  Hyperion, Targit, Oracle
  MS Reporting Services
• Client and server communicate via an API
• MS OLE DB for OLAP
  De facto standard
  Supported by almost all client tools
• Hyperion Essbase API
  Supported by many client tools
• Some criteria
  Functionality (web distribution, analysis, reporting, …)
  Support
  Price

Architecture Alternatives - Conclusion
• Architecture alternatives, their pros and cons
• No simple general choices
• Choices dependent on the concrete situation
  Look at books
  Look at requirements specs
  Look at the latest products
  Think about prototyping