DW Source Integration, Tools, and Architecture

This document provides an overview of data warehouse front end tools, source integration, and data warehouse architecture. It discusses end user applications for canned reports and templates. It also covers topics like views on data warehouse metadata, schema integration, and virtual data integration. The document is from a course on data warehousing and contains slides on various concepts, tools, and practices related to building and managing a data warehouse.


DW Source Integration, Tools, and Architecture

Original slides were written by Torben Bach Pedersen

Aalborg University 2007 - DWML course

Overview

• DW Front End Tools
• Source Integration
• DW architecture

End User Applications (EUA)

• The business impact of the DW!
• Canned reports
  ◦ End user application templates
  ◦ Provide answers to common questions
  ◦ Can be used as (quality-assured) building blocks for other reports
• Two extremes
  ◦ Ad hoc strategic analysis, power users, DIY query tools
  ◦ Fixed operational analysis, report consumers, operational reporting
• EUA fills the gap
  ◦ “Tactical” analysis, push-button knowledge workers

EUA Concepts

• Templates
  ◦ Layout/structure + parameters
  ◦ Compare sales per product in <area> for <period1> and <period2>
• Parameters - chosen at run-time
  ◦ Come from any level of the given dimension – drill-down possible
  ◦ Time (All time, 2002, 2002 4Q, 2002 Dec, 2002 Dec 1)
  ◦ Many different
• Identify report candidates
  ◦ Produce a list of candidates
• Consolidate candidate list
  ◦ Categorize candidates by data elements
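
The template idea above (fixed layout and structure, parameters chosen at run time) can be sketched as a parameterized query. This is only an illustration, not part of the original slides: the `sales` table, its columns, the sample rows, and the function name `compare_sales` are all assumptions, and years stand in for the <period1>/<period2> parameters.

```python
import sqlite3

# Hypothetical fact table: one row per (product, area, year) with a sales amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, area TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Milk", "Aalborg", 2001, 100.0),
        ("Milk", "Aalborg", 2002, 120.0),
        ("Bread", "Aalborg", 2001, 80.0),
        ("Bread", "Aalborg", 2002, 70.0),
        ("Milk", "Copenhagen", 2002, 300.0),
    ],
)

# The template fixes layout and structure; only the three parameters vary.
TEMPLATE = """
    SELECT product,
           SUM(CASE WHEN year = :period1 THEN amount ELSE 0 END) AS sales_p1,
           SUM(CASE WHEN year = :period2 THEN amount ELSE 0 END) AS sales_p2
    FROM sales
    WHERE area = :area AND year IN (:period1, :period2)
    GROUP BY product
    ORDER BY product
"""

def compare_sales(area, period1, period2):
    """Run the template with parameters chosen at run time."""
    params = {"area": area, "period1": period1, "period2": period2}
    return conn.execute(TEMPLATE, params).fetchall()

report = compare_sales("Aalborg", 2001, 2002)
print(report)
```

Because the parameters are bound rather than pasted into the SQL text, the same quality-assured template can serve as a building block for other reports.
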

What Templates to Choose?

• “Analytical Cycle” Steps (repeats)
  1) How’s business? – current performance
  2) What are the trends? – performance over time
  3) What’s unusual? – quick identification of exceptions (+/-)
  4) What is driving the exceptions? – find causes for exceptions
  5) What if …? – play around with parameters and see effect
  6) Make a business decision – small as well as big decisions
  7) Implement the decision – feed analysis results into op. systems
• Prioritize template list
  ◦ Rank or group templates – implement 15 most important at first

Overview

• DW Front End Tools
• Source Integration
• DW architecture

Data Integration Research Projects

• Focus on source integration and update propagation
• Wrapper: convert source data into a standard format
• Information Manifold
  ◦ Sources: databases, SGML docs, unstructured files,…
  ◦ Relational integration data model
• TSIMMIS
  ◦ Wrapper/mediator
  ◦ Semi-structured OEM integration data model
• Squirrel
  ◦ Powerful “integration mediator”
• WHIPS
  ◦ Wrapper/mediator
  ◦ Relational integration data model

Views on DW Metadata

• Most DW projects: DW architecture as a “stepwise flow” of information from source to analyst
• No conceptual domain model used for integration
  ◦ Some questions cannot be answered
• DWQ project: extended metamodel to capture all relevant aspects

Using DW Metadata in the Enterprise

[Figure with steps labeled (1)-(5)]

• Analyst wants to analyze data
  ◦ Gather data from operational departments through OLTP
  ◦ Question travels through (1)-(5)
• Traditional DW (previous slide) only describes step (3)-(4)
  ◦ Cannot solve problems like “why can’t I answer quest. X?”
• Conceptual relationships between enterprise model, operational models + DW must be captured
  ◦ Everything is a view on the enterprise model! (“local as view”) – unlike previous slide

Analyst: “Why can’t I answer question X?”

• Possible reasons
  ◦ Certain measures not included in fact table
  ◦ Granularity of facts too coarse
  ◦ Particular dimensions not in DW
  ◦ Descriptive attributes missing from dimensions
  ◦ Meaning of attributes/measures deviate from the analyst’s expectation
  ◦ ……
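
The first four possible reasons listed above can be checked mechanically once the DW schema is described as metadata. The following is a minimal sketch under assumptions: the `DW` dictionary, its layout, and the function `why_not` are invented for illustration and are not the DWQ metamodel. The fifth reason (meaning deviating from the analyst's expectation) needs conceptual metadata and is not covered here.

```python
# Hypothetical DW metadata. Each hierarchy is ordered from finest to coarsest
# level; "finest" is the most detailed level actually stored in the fact table.
DW = {
    "measures": {"sales_amount"},
    "dimensions": {
        "time": {
            "hierarchy": ["day", "month", "quarter", "year"],
            "finest": "month",
            "attributes": {"holiday_flag"},
        },
        "product": {
            "hierarchy": ["product", "category"],
            "finest": "product",
            "attributes": {"brand"},
        },
    },
}

def why_not(measure, needed):
    """Return the reasons a question cannot be answered (empty list: it can).

    `needed` maps a dimension name to (level, set of descriptive attributes).
    Levels are assumed to be members of the dimension's hierarchy.
    """
    reasons = []
    if measure not in DW["measures"]:
        reasons.append(f"measure '{measure}' not in fact table")
    for dim, (level, attrs) in needed.items():
        meta = DW["dimensions"].get(dim)
        if meta is None:
            reasons.append(f"dimension '{dim}' not in DW")
            continue
        hierarchy = meta["hierarchy"]
        if hierarchy.index(level) < hierarchy.index(meta["finest"]):
            reasons.append(f"granularity of '{dim}' too coarse for level '{level}'")
        for attr in sorted(attrs - meta["attributes"]):
            reasons.append(f"attribute '{attr}' missing from dimension '{dim}'")
    return reasons

diagnosis = why_not(
    "profit",
    {"time": ("day", {"holiday_flag"}), "store": ("city", set())},
)
for reason in diagnosis:
    print(reason)
```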

DWQ Metadata

• Three metadata perspectives must be captured
  ◦ Conceptual (enterprise)
  ◦ Logical (data model)
  ◦ Physical (data flow)
• Framework instantiated by conceptual, logical, and physical information models
• DW quality heavily depends on DW processes rather than schemas
• A process meta model is needed to capture process definitions, and the relationships to DW quality

Source integration practice

• Focus on information integration in databases (schema and data)
• Two main approaches
  ◦ Constructing integrated enterprise model
  ◦ Focus on mappings between sources and DW
• Tools for DW management
  ◦ Schema integration
  ◦ Metadata management
  ◦ Based on modeling tools
• Tools for data integration
  ◦ Mapping specification
  ◦ ETL tools – like last lecture

Schema Integration

• Producing one global schema (one-shot or incremental)
• Pre-integration
  ◦ Analyzing and annotating source schemata
  ◦ Semantic enrichment of schema, often in richer data model
• Schema comparison
  ◦ Determine correlations/conflicts among schema concepts
  ◦ Heterogeneity conflicts – different source data models
  ◦ Naming conflicts – homonyms and synonyms
  ◦ Semantic conflicts – different abstraction levels
  ◦ Structural – different constructs
• Schema conforming
  ◦ Conform/align schemas to make them compatible
  ◦ Typically semi-automatic process
• Schema merging and restructuring
  ◦ “Superimpose” conformed schemas
  ◦ Quality: completeness, correctness, minimality, understandability

Virtual Data Integration

• Only data definition is integrated
  ◦ Data only in sources, queries on views, queries shipped to sources
  ◦ Not suited for DW?
• Carnot
  ◦ Individual schemata mapped onto rich GCL ontology (first-order logic)
  ◦ Articulation axioms specify mappings, queries mapped to GCL
• SIMS
  ◦ Creates common class-based domain model to describe sources
  ◦ Sources are dynamically chosen and integrated at query time
  ◦ Query reformulation, access planning, optimization, execution
• Information Manifold
  ◦ Relational world view + information source description + correspondences
  ◦ Metamodel enriched using description logic/Datalog rules
  ◦ Datalog queries, optimized by choosing “minimal” sources
• TSIMMIS
  ◦ Wrappers wrap sources using semi-structured OEM model
  ◦ Mediator performs its own integration – no global integration (global as view)
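
The wrapper/mediator pattern behind virtual integration can be sketched in a few lines: data stays in the sources, each wrapper translates a query on the global view into its source's own terms, and the mediator combines the answers. This is a toy illustration in the "global as view" style, not the actual TSIMMIS or Information Manifold machinery; the two sources, their schemas, and all function names are assumptions.

```python
# Two hypothetical sources with different local schemas; data stays in the sources.
CRM_ROWS = [
    {"cust_name": "Ann", "town": "Aalborg"},
    {"cust_name": "Bo", "town": "Copenhagen"},
]
BILLING_ROWS = [("Cy", "Aalborg"), ("Di", "Odense")]

def crm_wrapper(city):
    """Translate a selection on the global schema into the CRM source's terms."""
    return [
        {"name": row["cust_name"], "city": row["town"]}
        for row in CRM_ROWS
        if row["town"] == city
    ]

def billing_wrapper(city):
    """Same selection, evaluated against the tuple-based billing source."""
    return [{"name": name, "city": c} for (name, c) in BILLING_ROWS if c == city]

WRAPPERS = [crm_wrapper, billing_wrapper]

def customers_in(city):
    """Mediator: the global view customer(name, city) is the union of the sources."""
    result = []
    for wrapper in WRAPPERS:
        # The query is shipped to the source and evaluated there at query time.
        result.extend(wrapper(city))
    return result

aalborg_customers = customers_in("Aalborg")
print(aalborg_customers)
```

Nothing is stored centrally, which is why every query puts load on the sources and why the slide questions whether the approach suits a DW.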

Materialized Data Integration

• Views on source data are materialized in integrated DB
• Squirrel
  ◦ Integration mediators incrementally maintain materialized views
  ◦ Cooperation of sources required
• WHIPS
  ◦ Relational SPJ + aggregation views specified in view tree
  ◦ View manager computes view and handles updates
  ◦ Integrator ensures view maintainability
  ◦ Global query processor queries sources using wrappers/mediators
• In combination with virtual integration?

DWQ Source Integration

• Current DW tools cannot fully support DW quality
  ◦ No support for validation of interschema assertions and other specified relationships, i.e., the DW design process
• Conceptual perspective
  ◦ Domain model = enterprise model + source models
  ◦ Consolidated and reconciled description of important concepts
    ◆ Not all enterprise data captured (at first, incremental approach)
  ◦ Logic-based formalism allows reasoning over metadata
  ◦ Intermodel assertions capture interdependencies
• Logical perspective
  ◦ Source schemata + DW schema in logical data model (relational)
  ◦ Defined as queries over the corresponding conceptual component
• Physical perspective
  ◦ The actual data stores
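
Incremental maintenance of a materialized view, as attributed to Squirrel and WHIPS on the Materialized Data Integration slide, means applying only the changes reported by a source instead of recomputing the view. The sketch below assumes a single aggregation view and a source that reports every insert and delete (the "cooperation of sources" requirement); the class and method names are hypothetical.

```python
from collections import defaultdict

class SalesByProductView:
    """Materialized view: SELECT product, SUM(amount) ... GROUP BY product."""

    def __init__(self, source_rows):
        self.totals = defaultdict(float)
        for product, amount in source_rows:  # initial full computation
            self.totals[product] += amount

    def on_insert(self, product, amount):
        """Apply one inserted source row as a delta."""
        self.totals[product] += amount

    def on_delete(self, product, amount):
        """Apply one deleted source row as a delta."""
        self.totals[product] -= amount

view = SalesByProductView([("Milk", 10.0), ("Bread", 5.0), ("Milk", 2.5)])
view.on_insert("Bread", 1.5)   # source reports a new row
view.on_delete("Milk", 10.0)   # source reports a removed row
print(dict(view.totals))
```

SUM is maintainable from deltas alone; an aggregate such as MAX would need access to the remaining source rows after a delete, which is one reason view maintainability has to be checked.
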

DWQ Source Integration Architecture

[Figure: DWQ source integration architecture. Note explicit mappings!]

DWQ Source Integration Methodology

• Source-driven integration
  ◦ Enterprise and source model construction
  ◦ Source model integration (into the domain model)
  ◦ Source and DW schema specification (+ mappings)
  ◦ Data integration and reconciliation
  ◦ Quality analysis steps in all phases above
• Client-driven integration
  ◦ New client query considered
  ◦ Reasoning determines whether query can be answered by materialized views already in DW
    ◆ Query containment reasoning
  ◦ If DW not sufficient, materialize new concepts in domain model?
  ◦ Otherwise, new sources must be added using source-driven integr.

Overview

• DW Front End Tools
• Source Integration
• DW architecture

Lifecycle Overview

[Figure: lifecycle diagram with the boxes Project Planning; Business Requirements Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; Data Staging Design & Development; End-User Application Specification; End-User Application Development; Deployment; Maintenance and Growth; Project Management]

Technical DW Architecture

[Figure: technical DW architecture. Labels: Existing databases and systems (OLTP) with Appl./DB pairs; ETL; Data Warehouse (DW); Data Marts (DM); New databases and systems (OLAP); Clients (OLAP, Data mining, Visualization). Callout: How to organize DW and DMs?]

Central DW

• All data in one, central DW
• All client queries directly on the central DW
• Pros
  ◦ Simplicity
  ◦ Easy to manage
• Cons
  ◦ Bad performance due to no redundancy/workload distribution

[Figure: Source, Source; Central DW; Clients]

Federated DW

• Data stored in separate data marts, aimed at special departments
• Logical DW (i.e., virtual)
• Data marts contain detail data
• Pros
  ◦ Performance due to distribution
• Cons
  ◦ More complex

[Figure: Source, Source; Finance mart, Mrktng mart, Distr. mart; Logical DW; Clients]

Tiered Architecture

• Central DW is materialized
• Data is distributed to data marts in one or more tiers
• Only aggregated data in cube tiers
• Data is aggregated/reduced as it moves through tiers
• Pros
  ◦ Best performance due to redundancy+distribution
• Cons
  ◦ Most complex
  ◦ Hard to manage

[Figure: Central DW feeding tiers of cubes; the example cubes have the axes Milk/Bread, Aalborg/Copenhagen, and 2000/2001]
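
The statement that data is aggregated/reduced as it moves through tiers can be illustrated with two roll-up steps, each tier being built from the one before it. The detail rows and the functions `tier1` and `tier2` are assumptions made for this sketch; only the product and city names echo the cube figure.

```python
from collections import Counter

# Hypothetical detail facts in the central DW: (city, product, year, month, units).
DETAIL = [
    ("Aalborg", "Milk", 2000, 1, 30),
    ("Aalborg", "Milk", 2000, 2, 26),
    ("Aalborg", "Bread", 2001, 1, 45),
    ("Copenhagen", "Milk", 2001, 3, 67),
    ("Copenhagen", "Bread", 2000, 5, 57),
    ("Copenhagen", "Milk", 2000, 4, 10),
]

def tier1(detail):
    """First cube tier: drop the month, keep (city, product, year)."""
    cube = Counter()
    for city, product, year, _month, units in detail:
        cube[(city, product, year)] += units
    return cube

def tier2(cube):
    """Second tier, built from tier 1 rather than from the detail data."""
    smaller = Counter()
    for (_city, product, year), units in cube.items():
        smaller[(product, year)] += units
    return smaller

t1 = tier1(DETAIL)
t2 = tier2(t1)
print(len(DETAIL), len(t1), len(t2))  # each tier holds fewer cells
```

Each tier stores redundant, pre-aggregated copies of the same facts, which is where both the performance gain and the management burden come from.
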

Coordination w. Development Strategy

• Different development strategies place different demands on the architecture elements
• Example: Kimball Dimensional Modeling
  ◦ Centralized design of (conforming) dimensions
  ◦ First, design of a single-source data mart
  ◦ Later, design of multi-source data marts
  ◦ Integration of existing data marts into new data marts
  ◦ The DW is just the union of the marts it is composed of
  ◦ Entails top-down (“Bus Architecture”) and bottom-up elements
• Consequences
  ◦ No initial design of DW, from which data marts are extracted
  ◦ Data is extracted directly from sources to data marts
  ◦ Allows distribution of data marts and computation on them

Operational Data Store (ODS)

[Figure: the technical DW architecture with an ODS added. Labels: Existing databases and systems (OLTP), Appl., DB, ETL, ODS, New DB, DW, DM, OLAP, Data mining, Visualization]

Operational Data Store I

a “subject oriented, integrated, volatile, current valued data store containing only corporate detailed data” (Inmon et al.)

• A database which integrates and accumulates operational data in a subject-oriented structure
• Not dimensional, but ordinary relational
• An extra level between operational systems and dimensional structures
• Two benefits sought
  ◦ Integration of operational systems
  ◦ Basis for data warehouse

Operational Data Store II

• ODS - pros
  ◦ More modeling choices
    ◆ The dimensional “straitjacket” can force sub-optimal design decisions hiding the true semantics of data
    ◆ No need to choose a granularity, and no need to exclude data
    ◆ In summary, no need to make design decisions that cannot be changed subsequently
  ◦ This means extra flexibility
• ODS – cons
  ◦ Not feasible to do analysis directly on ODS
  ◦ Extra complexity
• Separate ODS unnecessary, DW = ODS (Kimball et al.)

MS Analysis Services

• Cheap
• Easy to use
• (R/M/H)OLAP technology
  ◦ Data placement as desired
• Intelligent pre-aggregation
• Server and client parts
  ◦ Reporting Services a separate tool
• Built-in data mining
  ◦ Decision trees
  ◦ Clustering
• MS OLE DB for OLAP interface

IBM DB2 OLAP Server

• “Light” version of Hyperion Essbase (OLAP market leader)
  ◦ Extra product “on top of” DB2
• (R/M/H)OLAP
  ◦ Data in DB2 or in multidimensional structures
• Interfaces
  ◦ Hyperion Essbase API
  ◦ OLE DB for OLAP (promised)
• DB2 can also handle aggregates
  ◦ Automatic summary tables
  ◦ Used by DB2 optimizer
  ◦ Automatic maintenance by DB2

Oracle 10g BI

• Based on Express OLAP product
  ◦ On the market since 1970!
• (R/M/H)OLAP
  ◦ Flexible data placement
  ◦ Integrates ROLAP strategy and Express OLAP
• Total integration with Oracle 10g RDBMS
  ◦ Storage, security, management,…
  ◦ Best integration compared to MS and IBM
• Add-on data mining (10g Data Mining)
  ◦ Associations, classification, prediction, clustering

Architecture Alternatives

• Cubes are smart
  ◦ Intuitive model
  ◦ Better overview
  ◦ Better suited for data analysis
• But logical cubes suffice
  ◦ Implementation hidden from user
• Architecture alternatives
  ◦ MS, IBM, Oracle
  ◦ Virtual cubes, physical cubes
  ◦ ROLAP, MOLAP
  ◦ Separate relational DW, cubes directly on source data
  ◦ Client tools
  ◦ 3 * 2^3 = 24 different possibilities (without clients) – fewer in reality
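
The count of 24 on the Architecture Alternatives slide comes from three vendors combined with three two-way choices. A short enumeration makes the arithmetic concrete. The filter at the end is an assumption added for illustration: it applies one constraint taken from the Virtual vs. Physical Cubes slide (virtual cubes are implemented as ROLAP) to show how the realistic number shrinks.

```python
from itertools import product

vendors = ["MS", "IBM", "Oracle"]
cube_kind = ["virtual cubes", "physical cubes"]
storage = ["ROLAP", "MOLAP"]
dw_placement = ["separate relational DW", "cubes directly on source data"]

# Every combination of one vendor and one option per two-way choice.
alternatives = list(product(vendors, cube_kind, storage, dw_placement))
print(len(alternatives))  # 3 * 2 * 2 * 2 = 24

# Assumed constraint: a virtual cube cannot use MOLAP storage.
feasible = [
    alt for alt in alternatives
    if not (alt[1] == "virtual cubes" and alt[2] == "MOLAP")
]
print(len(feasible))  # 18
```
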

MS vs. IBM vs. Oracle

• All
  ◦ Good scalability
  ◦ Good analysis facilities
  ◦ Flexible storage (MOLAP, ROLAP, HOLAP)
  ◦ Incremental update
  ◦ Many client tools
• MS Analysis Server
  ◦ Built-in mining + good integration with MS SQL Server
• DB2 OLAP Server
  ◦ Good integration with DB2
• Oracle
  ◦ Best RDBMS/MOLAP integration of the three
• All three products are good
  ◦ Dependent on the other choices + existing technical architecture

Virtual vs. Physical Cubes

• Virtual cubes
  ◦ Logical cube specification directly on source data
  ◦ ROLAP implementation without aggregates
  ◦ + flexible, design can be changed quickly
  ◦ - performance, constant load on source DB
• Physical cubes
  ◦ Data for cube extracted and stored on OLAP server
  ◦ Several implementation choices possible
  ◦ + good performance, only source DB load at creation/update
  ◦ - harder to change design
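
The load trade-off between virtual and physical cubes can be made visible by counting how often the source is read. This is a deliberately small sketch: `SourceDB`, `VirtualCube`, `PhysicalCube`, and the sample rows are all hypothetical stand-ins, not the behaviour of any particular OLAP product.

```python
class SourceDB:
    """Stand-in for an operational source; counts how often it is scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return list(self.rows)

def total_by_product(rows):
    """Aggregate (product, amount) rows into a dict of totals."""
    totals = {}
    for product, amount in rows:
        totals[product] = totals.get(product, 0) + amount
    return totals

class VirtualCube:
    """Logical cube only: every query is evaluated on the source."""

    def __init__(self, source):
        self.source = source

    def query(self):
        return total_by_product(self.source.scan())

class PhysicalCube:
    """Data is extracted and stored; queries read the stored cells."""

    def __init__(self, source):
        self.source = source
        self.refresh()

    def refresh(self):
        self.cells = total_by_product(self.source.scan())

    def query(self):
        return dict(self.cells)

rows = [("Milk", 3), ("Bread", 2), ("Milk", 4)]
virtual_source, physical_source = SourceDB(rows), SourceDB(rows)
virtual, physical = VirtualCube(virtual_source), PhysicalCube(physical_source)
for _ in range(3):
    virtual_answer = virtual.query()
    physical_answer = physical.query()
print(virtual_source.scans, physical_source.scans)  # 3 scans versus 1
```

Both cubes return the same answer; the physical cube pays for its single extraction by going stale until `refresh()` is called again.
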

MOLAP vs. ROLAP

• MOLAP
  ◦ Data in specialized data structure, optimized for OLAP
  ◦ + best performance, least space consumption
  ◦ - changing design requires rebuilding, scalability at detail level?, detail data stored several times
• ROLAP
  ◦ Data in RDBMS
  ◦ + more flexible change of design, scalable for detail data
  ◦ - not as good performance, larger space consumption
• HOLAP
  ◦ Detail data in RDBMS (can be source DB)
  ◦ Aggregates in multidimensional structure
  ◦ + good performance for higher-level queries, detail data only stored once
  ◦ - handling design changes, operational complexity

Separate Data Warehouse?

• Separate DW
  ◦ Integration of source data in DW
  ◦ Cubes built from DW
  ◦ Sometimes the only solution
  ◦ + better integration and cleansing, less load on existing servers
  ◦ - larger complexity, design changes, updating DW
• Cubes directly on source data
  ◦ Cubes built directly from source data
  ◦ Cannot handle all cases
  ◦ + less complexity, easier to change design, no update of DW
  ◦ - cannot handle all forms of integration and cleansing, more load on operational servers

Choosing Client Tools

• Many OLAP clients on the market, e.g.,
  ◦ Hyperion, Targit, Oracle
  ◦ MS Reporting Services
• Client and server communicate via an API
• MS OLE DB for OLAP
  ◦ De facto standard
  ◦ Supported by almost all client tools
• Hyperion Essbase API
  ◦ Supported by many client tools
• Some criteria
  ◦ Functionality (web distribution, analysis, reporting, …)
  ◦ Support
  ◦ Price

Architecture Alternatives - Conclusion

• Architecture alternatives, their pros and cons
• No simple general choices
• Choices dependent on the concrete situation
  ◦ Look at books
  ◦ Look at requirements specs
  ◦ Look at the latest products
  ◦ Think about prototyping

Summary

• DW Front End Tools
• Source Integration
• DW architecture

Mini Project

• New subtask
  ◦ Build a few reports in Reporting Services to answer important business questions you proposed in part (1a)
  ◦ Discuss the architecture of your DW system
  ◦ Discuss source integration in your system
• MS Reporting Services Tutorial
  ◦ http://msdn2.microsoft.com/en-us/library/ms170246.aspx
