Data Warehouse PDF
02
Data Warehouse - Basic Concepts
DW 2012/2013
Notice
- Author
  - João Moura Pires ([email protected])
- This material can be freely used for personal or academic purposes without prior authorization from the author, provided that this notice is kept with it.
- For commercial purposes, the use of any part of this material requires prior authorization from the author.
DW Basic Concepts - 2
Bibliography
- Many examples are extracted and adapted from:
  - [Imhoff, 2003] - Mastering Data Warehouse Design: Relational and Dimensional Techniques, Wiley.
  - [Kimball, 2002] - The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition), Ralph Kimball and Margy Ross, Wiley.
Table of Contents
- Corporate Information Factory
[Imhoff , 2003]
Data acquisition is a set of processes and programs that extracts data for the data warehouse and operational data store from the operational systems. The data acquisition programs perform the cleansing as well as the integration of the data and transformation into an enterprise format. This enterprise format reflects an integrated set of enterprise business rules that usually causes the data acquisition layer to be the most complex component in the CIF. In addition to programs that transform and clean up data, the data acquisition layer also includes audit and control processes and programs to ensure the integrity of the data as it enters the data warehouse or operational data store.
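As a rough illustration of the cleansing, integration and audit steps described above, the following Python sketch maps records from two hypothetical operational systems into one enterprise format. The source layouts, the field names and the rejection rule are assumptions made for this example; they do not come from [Imhoff, 2003].

```python
# Hypothetical records from two operational systems with different layouts.
billing_rows = [{"cust": " ACME ", "amount": "100.50"}, {"cust": "", "amount": "7"}]
crm_rows = [{"customer_name": "acme", "revenue": 20}]

def to_enterprise_format(name, amount):
    """Cleanse one record; return None when it fails the (assumed) rule."""
    name = name.strip().upper()
    if not name:  # assumed business rule: a customer name is mandatory
        return None
    return {"customer": name, "amount": float(amount)}

def acquire(billing, crm):
    """Integrate both sources and keep simple audit counters."""
    audit = {"read": 0, "loaded": 0, "rejected": 0}
    loaded = []
    candidates = [(r["cust"], r["amount"]) for r in billing]
    candidates += [(r["customer_name"], r["revenue"]) for r in crm]
    for name, amount in candidates:
        audit["read"] += 1
        row = to_enterprise_format(name, amount)
        if row is None:
            audit["rejected"] += 1
        else:
            loaded.append(row)
            audit["loaded"] += 1
    return loaded, audit

rows, audit = acquire(billing_rows, crm_rows)
```

The audit counters stand in for the audit and control processes: every record read is either loaded or rejected, so the totals can be reconciled after each run.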
[Imhoff , 2003]
Data delivery is the process that moves data from the data warehouse into data and oper marts. Like the data acquisition layer, it manipulates the data as it moves it. In the case of data delivery, however, the origin is the data warehouse or ODS, which already contains high-quality, integrated data that conforms to the enterprise business rules.
[Imhoff , 2003]
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data used in strategic decision making [Inmon, 1980].
Characteristics of the operational data store (ODS) [Imhoff, 2003]:
- It is subject oriented, like a data warehouse.
- Its data is fully integrated, like a data warehouse.
- Its data is current.
  - The ODS has minimal history and shows the state of the entity as close to real time as feasible.
- Its data is volatile or updatable.
- Its data is almost entirely detailed, with a small amount of dynamic aggregation.
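The contrast between a volatile, current ODS and a non-volatile, time-variant data warehouse can be sketched as follows. This is a minimal illustration only: the record layout, the `apply_change` helper and the dates are hypothetical.

```python
from datetime import date

ods = {}        # current state only: key -> latest record (volatile)
warehouse = []  # append-only history: one row per change (non-volatile)

def apply_change(customer_id, city, day):
    """Record one change in both stores."""
    ods[customer_id] = {"city": city}  # the ODS overwrites in place
    warehouse.append(                  # the warehouse keeps every version
        {"customer_id": customer_id, "city": city, "valid_from": day}
    )

apply_change(1, "Lisbon", date(2012, 9, 1))
apply_change(1, "Porto", date(2013, 1, 15))
```

After both calls the ODS holds only the latest city for customer 1, while the warehouse list still contains both versions with their dates.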
[Imhoff , 2003]
The data in each data mart is usually tailored for a particular capability or function, such as product profitability analysis, KPI analyses, customer demographic analyses, and so on.
Administrative metadata describes the operation of the CIF, including audit trails, performance metrics, data quality metrics, and other statistical metadata.
[Imhoff , 2003]
Information feedback is the sharing mechanism that allows intelligence and knowledge gathered through the usage of the Corporate Information Factory to be shared with other data stores, as appropriate.
[Imhoff , 2003]
The toolbox is the collection of reusable components (for example, analytical reports) that business users can share, in order to leverage work and analysis performed by others in the enterprise. In the workbench, metadata, data, and analysis tools are organized around the business functions and tasks that support business users in their jobs.
[Figure: Standard ER approach + Structure Changes + Historical Data]
Multidimensional Cube
A business sells several products through several stores and needs to measure its performance over time.
[Figure: hyper-cube with axes Products and Stores. Each cell holds measures (Dollar Sales, Unit Sales, ...) concerning a product, a day and a store.]
Multidimensional Cube
[Figure: cube with axes Time (days), Products and Stores; further labels: day, Week, Month, Region N. Measures concern a product, a day and a store.]
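One way to picture the cube is as a mapping from (product, day, store) coordinates to measures. The Python sketch below uses hypothetical cell values and assumes that days can be grouped into months by summing additive measures; the `roll_up_to_month` helper is illustrative and not part of the original slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cells: (product, day, store) -> (dollar_sales, unit_sales)
cube = {
    ("p1", date(1996, 1, 10), "s1"): (10.0, 2),
    ("p1", date(1996, 1, 25), "s1"): (5.0, 1),
    ("p1", date(1996, 2, 3), "s2"): (8.0, 4),
}

def roll_up_to_month(cells):
    """Sum the measures of all days that fall in the same (year, month)."""
    totals = defaultdict(lambda: [0.0, 0])
    for (product, day, store), (dollars, units) in cells.items():
        key = (product, (day.year, day.month), store)
        totals[key][0] += dollars
        totals[key][1] += units
    return {key: tuple(values) for key, values in totals.items()}

monthly = roll_up_to_month(cube)
```

The two January cells for product `p1` in store `s1` collapse into a single monthly cell, while the February cell is carried over unchanged.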
Slice: a subset of the multidimensional data. A slice is defined by selecting specific values for the attributes of the dimensions.
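Slice examples over the measure $f(l, p, t)$, where $l$ is a store, $p$ a product and $t$ a time value:
- $l = l_2,\ t = t_1,\ \forall p$: $f(l, p, t)$
- $l \in \{l_2, l_3, l_5\},\ t = t_1,\ \forall p$: $f(l, p, t)$
- $\forall l,\ t = t_1,\ p \in \text{BrandX}$: $f(l, p, t)$
- $l \in \text{Region 1},\ t = t_1,\ p \in \text{BrandY}$: $f(l, p, t)$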
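Selecting a slice amounts to filtering the cells of $f(l, p, t)$ by sets of allowed coordinate values. The following Python sketch shows the idea on a tiny, hypothetical cube; the cell values, the `brand_of` mapping and the `slice_cube` helper are assumptions made for the example.

```python
# Hypothetical cube: f[(l, p, t)] = sales value for store l, product p, time t.
f = {
    ("l2", "p1", "t1"): 4, ("l2", "p2", "t1"): 6,
    ("l3", "p1", "t1"): 1, ("l2", "p1", "t2"): 9,
}
brand_of = {"p1": "BrandX", "p2": "BrandY"}  # assumed product -> brand mapping

def slice_cube(cube, stores=None, times=None, products=None):
    """Keep the cells whose coordinates fall in the given sets (None = all)."""
    return {
        (l, p, t): value
        for (l, p, t), value in cube.items()
        if (stores is None or l in stores)
        and (times is None or t in times)
        and (products is None or p in products)
    }

# One store, one time value, all products.
one_store = slice_cube(f, stores={"l2"}, times={"t1"})

# All stores, one time value, only the products of one brand.
brand_x = slice_cube(
    f, times={"t1"},
    products={p for p, brand in brand_of.items() if brand == "BrandX"},
)
```

Passing `None` for a dimension plays the role of "for all" in the slice definitions, while a set plays the role of a membership constraint.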
Multidimensional Cube
- A data modeling approach with the purpose of addressing the following aspects:
  - The resulting data models should be understandable by the analytical users:
    - Simple.
    - Using terms from the domain and appropriate for data analysis.
  - Provides a framework for efficient querying.
  - Provides the basis for generic software development, where users can navigate large data sets in an intuitive way.
Star schema
- Fact table
  - Large, central table. It is the only table with multiple joins connecting it to the other tables.
Asymmetric Model
- Time (dimension): time_key, day_of_week, month, quarter, year
- Product (dimension)
- Store (dimension): store_key, name, address, type
- Sales (fact table): time_key, product_key, store_key, value, units, cost
Basics of Multidimensional Modeling
Fact Tables
- Numerical measures of the process.
  - Continuous values (or represented as continuous values).
  - Additive (can be correctly added across any dimension).
  - Semi-additive (can be correctly added across some dimensions but not across others).
  - Non-additive (cannot be added, but some other aggregation operators are allowed).
- The goal is to summarize the information presented in fact tables.
- The granularity of a fact table is defined by the subset of dimensions that index it.
  - Ex: sales per day, store and product.
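The three kinds of measures can be illustrated with a few hypothetical fact rows. In the sketch below, units sold is treated as additive, a stock level as semi-additive and a unit price as non-additive; the stock and price columns are assumptions for the example and are not part of the Sales table shown in these slides.

```python
# Hypothetical daily facts for one store:
# (day, product, units_sold, stock_level, unit_price)
facts = [
    ("mon", "p1", 3, 10, 2.0),
    ("mon", "p2", 1, 5, 4.0),
    ("tue", "p1", 2, 8, 2.0),
    ("tue", "p2", 4, 1, 4.0),
]

# Additive: units sold can be summed across every dimension.
total_units = sum(units for _, _, units, _, _ in facts)

# Semi-additive: stock levels can be summed across products for one day,
# but summing them across days would count the same items twice.
stock_tuesday = sum(stock for day, _, _, stock, _ in facts if day == "tue")

# Across time a different operator is used, here the average of daily totals.
days = sorted({day for day, *_ in facts})
daily_stock = [sum(s for d, _, _, s, _ in facts if d == day) for day in days]
average_stock = sum(daily_stock) / len(daily_stock)

# Non-additive: unit prices cannot be summed. Revenue is derived per row
# first and then added, because price * units is additive.
revenue = sum(units * price for _, _, units, _, price in facts)
```

Choosing the aggregation operator per measure and per dimension is what keeps summaries of the fact table meaningful.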
Dimension Tables
- Tables with simple primary keys that are related to the fact tables.
- The most interesting attributes are the ones with textual descriptions.
  - They are used to define constraints over the data that will be analyzed.
  - They are used to group the aggregations made over the fact table measures. They become the column headers of the result.
Typical result
- Data for the first quarter, for all stores, by brand
select p.brand, sum(f.value), sum(f.units)
from sales f, product p, time t
where f.product_key = p.product_key   -- join fact to Product
  and f.time_key = t.time_key         -- join fact to Time
  and t.quarter = 'Q1 1996'           -- constraint on the Time dimension
group by p.brand                      -- grouping
order by p.brand                      -- sorting
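To try the query end to end, the sketch below loads a tiny star schema into an in-memory SQLite database through Python's standard `sqlite3` module. The rows are hypothetical, only the columns the query needs are created, and the quarter constraint is placed on the Time dimension as a quoted string, because `quarter` is an attribute of Time rather than of the fact table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table time    (time_key integer primary key, quarter text);
    create table product (product_key integer primary key, brand text);
    create table sales   (time_key integer, product_key integer,
                          store_key integer, value real, units integer);
    -- Hypothetical sample rows.
    insert into time    values (1, 'Q1 1996'), (2, 'Q2 1996');
    insert into product values (10, 'Alcatel'), (11, 'Nokia');
    insert into sales   values (1, 10, 100, 50.0, 5), (1, 11, 100, 30.0, 3),
                               (1, 11, 101, 20.0, 2), (2, 10, 100, 99.0, 9);
""")

rows = con.execute("""
    select p.brand, sum(f.value), sum(f.units)
    from sales f, product p, time t
    where f.product_key = p.product_key
      and f.time_key = t.time_key
      and t.quarter = 'Q1 1996'
    group by p.brand
    order by p.brand
""").fetchall()
con.close()
```

Each result row has the brand as its header column followed by the aggregated measures, and the Q2 sale is excluded by the constraint on the Time dimension.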
- All the candidate keys are combined (Cartesian product) to obtain the keys to be searched in the fact table.
- All the hits on the fact table are grouped and aggregated.
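This evaluation strategy can be sketched in a few lines of Python. The first step, applying the constraints to each dimension to obtain its candidate keys, is assumed here as the natural precursor of the two steps listed; the data and names are hypothetical.

```python
from collections import defaultdict
from itertools import product as cartesian

# Hypothetical dimension rows and fact rows keyed by (time_key, product_key).
time_dim = {1: "Q1 1996", 2: "Q2 1996"}     # time_key -> quarter
product_dim = {10: "Alcatel", 11: "Nokia"}  # product_key -> brand
fact = {(1, 10): 50.0, (1, 11): 30.0, (2, 10): 99.0}  # key -> sales value

# 1. Apply the constraints on each dimension to get its candidate keys.
time_keys = [key for key, quarter in time_dim.items() if quarter == "Q1 1996"]
product_keys = list(product_dim)  # no constraint on Product

# 2. Combine the candidate keys (Cartesian product) and probe the fact table.
hits = [
    (key, fact[key])
    for key in cartesian(time_keys, product_keys)
    if key in fact
]

# 3. Group the hits by brand and aggregate the measure.
by_brand = defaultdict(float)
for (_, product_key), value in hits:
    by_brand[product_dim[product_key]] += value
result = dict(by_brand)
```

Because the dimension tables are small compared with the fact table, resolving the constraints on them first keeps the number of fact-table probes low.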
[Figure: attribute Name; constraint: Alcatel, Nokia; distinct values: Alcatel, Ericson, Coca-Cola, Nokia, Motorola, Nestle, Mobile phone, Television, ...; further values: Easy, ..., 3610, ...]
- What you should know:
  - Understand the Corporate Information Factory (CIF): the different roles of the DW, the ODS and the data marts (especially the OLAP data marts), and the fundamental role of the feedback of knowledge and information gathered in DSS systems into the architecture (operational systems and the DW).
  - Understand the fundamental differences between OLTP and the analytical activities carried out on the DW or on the data marts: data, access, users ...