Data Warehousing, OLAP, and Data Mining
Introduction
Data, data, data everywhere! Information: that's another story! Especially the right information at the right time! The goal of data warehousing is to make the right information available at the right time. Data warehousing combines a data store (e.g., a database of some sort) with a process for bringing together disparate data from throughout an organization for decision-support purposes.
Different Goal
Aggregation, summarization, and exploration of historical data to help management make informed decisions
Product | Branch | Time | Price
Coke (0.5 gallon) | Convoy Street | 2006-03-01 09:00:01 | $1.00
Pepsi (0.5 gallon) | UTC | 2006-03-01 09:00:01 | $1.03
Coke (1 gallon) | UTC | 2006-03-01 09:00:02 | $1.50
Altoids | Costa Verde | 2006-03-01 09:01:33 | $0.30
... | ... | ... | ...
Find the total sales for each product and month.
Find the percentage change in the total monthly sales for each product.
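The two queries can be sketched in plain Python. This is a minimal illustration, not how a warehouse would execute them: the first four records come from the sample table, while the April records are hypothetical, added only so that a month-over-month change exists. The helper names `monthly_totals` and `pct_change` are illustrative.

```python
from collections import defaultdict

# (product, branch, timestamp, price). First four rows: from the sample table.
# April rows: HYPOTHETICAL, added only so a monthly change can be computed.
sales = [
    ("Coke (0.5 gallon)", "Convoy Street", "2006-03-01 09:00:01", 1.00),
    ("Pepsi (0.5 gallon)", "UTC", "2006-03-01 09:00:01", 1.03),
    ("Coke (1 gallon)", "UTC", "2006-03-01 09:00:02", 1.50),
    ("Altoids", "Costa Verde", "2006-03-01 09:01:33", 0.30),
    ("Coke (0.5 gallon)", "UTC", "2006-04-02 10:15:00", 1.00),  # hypothetical
    ("Coke (0.5 gallon)", "UTC", "2006-04-02 10:16:00", 1.00),  # hypothetical
]

def monthly_totals(rows):
    """Query 1: total sales for each (product, month)."""
    totals = defaultdict(float)
    for product, _branch, timestamp, price in rows:
        month = timestamp[:7]  # "YYYY-MM"
        totals[(product, month)] += price
    return dict(totals)

def pct_change(totals):
    """Query 2: percentage change in a product's monthly total.
    Simplification: compares consecutive months *that have sales*."""
    by_product = defaultdict(list)
    for (product, month), total in totals.items():
        by_product[product].append((month, total))
    changes = {}
    for product, series in by_product.items():
        series.sort()  # chronological, since months are "YYYY-MM" strings
        for (_m0, t0), (m1, t1) in zip(series, series[1:]):
            changes[(product, m1)] = 100.0 * (t1 - t0) / t0
    return changes

totals = monthly_totals(sales)
print(totals[("Coke (0.5 gallon)", "2006-04")])  # 2.0
print(pct_change(totals))  # {('Coke (0.5 gallon)', '2006-04'): 100.0}
```

Products with sales in only one month simply produce no percentage change.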
Different Requirements
Aspect | OLTP (On-Line Transaction Processing) | OLAP (On-Line Analytical Processing)
Tasks | Day-to-day operation | High-level decision support
Size of database | Gigabytes | Terabytes
Time span | Recent, up-to-date | Spanning months / years
Size of working set | Tens of records, accessed through primary keys | Consolidated data from multiple databases
Workload | Structured / repetitive | Ad-hoc, exploratory queries
Performance | Transaction throughput | Query latency
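To make the working-set and workload rows concrete, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and its rows are hypothetical, and SQLite is only a stand-in: neither workload at real scale would run on an in-memory database.

```python
import sqlite3

# In-memory database standing in for an operational store.
# Table, columns, and rows are HYPOTHETICAL, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT,"
    " order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "2005-11-03", 20.0),
        (2, "bob", "2005-12-14", 35.0),
        (3, "alice", "2006-01-09", 15.0),
        (4, "carol", "2006-01-21", 40.0),
    ],
)

# OLTP-style access: touch one record through its primary key,
# as a day-to-day operation would.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (25.0, 1))
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style access: an ad-hoc query that scans history spanning
# several months and aggregates it by month.
monthly = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount)"
    " FROM orders GROUP BY month ORDER BY month"
).fetchall()

print(one_order)  # ('alice', 25.0)
print(monthly)    # [('2005-11', 25.0), ('2005-12', 35.0), ('2006-01', 55.0)]
```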
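Data Warehouse
Figure: data from the enterprise database (customers, orders, transactions, vendors, etc.) is copied, organized, and summarized into the data warehouse, which in turn supports data mining. Data miners come in two kinds: farmers (they know what they are looking for) and explorers (unpredictable).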
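A minimal sketch of the copy, organize, summarize flow, assuming a hypothetical operational schema (`customers`, `orders`) and a hypothetical warehouse table `sales_summary`, each held in its own in-memory SQLite database. Real pipelines (commonly called ETL, extract-transform-load) also clean and reconcile data; this only shows the shape.

```python
import sqlite3

# HYPOTHETICAL operational source database.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, amount REAL);
INSERT INTO customers VALUES (1, 'alice', 'North'), (2, 'bob', 'South');
INSERT INTO orders VALUES (1, 1, '2006-03-01', 10.0), (2, 1, '2006-03-15', 5.0),
                          (3, 2, '2006-03-02', 7.5), (4, 2, '2006-04-01', 2.5);
""")

# HYPOTHETICAL warehouse with one summary table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_summary (region TEXT, month TEXT, total REAL,"
    " PRIMARY KEY (region, month))"
)

# Copy: extract order rows joined with their customer's region.
rows = source.execute(
    "SELECT c.region, o.order_date, o.amount"
    " FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()

# Organize and summarize: bucket by (region, month) and total the amounts.
summary = {}
for region, order_date, amount in rows:
    key = (region, order_date[:7])
    summary[key] = summary.get(key, 0.0) + amount

# Load the summarized rows into the warehouse.
warehouse.executemany(
    "INSERT INTO sales_summary VALUES (?, ?, ?)",
    [(r, m, t) for (r, m), t in sorted(summary.items())],
)
print(warehouse.execute(
    "SELECT * FROM sales_summary ORDER BY region, month").fetchall())
# [('North', '2006-03', 15.0), ('South', '2006-03', 7.5), ('South', '2006-04', 2.5)]
```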
OLAP Overview
Interactive, exploratory analysis of multidimensional data to discover patterns
Figure: a data cube with the dimensions gender, age, and accidents.
OLAP Architecture
Server Options
Single processor
Symmetric multiprocessor (SMP)
Massively parallel processor (MPP)
ZOLAP
Speak with your chemist (normally only prescribed for death march victims)
Data is represented in the form of a cube.
OLAP goes beyond SQL in its analysis capabilities.
Key feature of OLAP: relevant multidimensional views, such as products, time, and geography.
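One way to picture multidimensional views is a sparse cube keyed by (product, time, geography). The sketch below uses made-up figures; `slice_cube` and `roll_up` are illustrative helpers named after the common OLAP operations slice and roll-up, not any particular OLAP server's API.

```python
from collections import defaultdict

# Sparse cube: (product, quarter, region) -> sales. All values HYPOTHETICAL.
cube = {
    ("Coke", "2006-Q1", "North"): 120,
    ("Coke", "2006-Q1", "South"): 80,
    ("Coke", "2006-Q2", "North"): 150,
    ("Pepsi", "2006-Q1", "North"): 90,
    ("Pepsi", "2006-Q2", "South"): 60,
}
DIMS = ("product", "time", "region")

def slice_cube(cube, dim, value):
    """Fix one dimension to a single value and drop it, leaving a 2-D view."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def roll_up(cube, keep):
    """Sum over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

north = slice_cube(cube, "region", "North")
print(north)  # {('Coke', '2006-Q1'): 120, ('Coke', '2006-Q2'): 150, ('Pepsi', '2006-Q1'): 90}
print(roll_up(cube, ["product"]))  # {('Coke',): 350, ('Pepsi',): 150}
```

A dictionary keyed by coordinate tuples stores only the cells that exist, which suits cubes where most combinations of dimension values have no data.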
OLAP Cube - 1
OLAP Cube - 2
OLAP Cube - 3
Star Structure (quite common)
Fact table: Facts (Product, Region, Time, Channel; measures Revenue, Expenses, Units)
Dimension tables: Product (Model, Type, Color); Region (Nation, District, Dealer); Time; Channel
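The star structure can be written down as relational tables. This sketch uses SQLite via Python's `sqlite3`; the attribute names follow the slide, while the surrogate keys (`product_id` and so on), the `_dim` table suffix, the year/quarter columns of the time dimension, and every data row are hypothetical. The query joins the fact table to two dimensions to total revenue and units by nation and year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: attributes from the slide, keys HYPOTHETICAL.
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, model TEXT, type TEXT, color TEXT);
CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, nation TEXT, district TEXT, dealer TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE channel_dim (channel_id INTEGER PRIMARY KEY, name TEXT);

-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE facts (
    product_id INTEGER REFERENCES product_dim,
    region_id INTEGER REFERENCES region_dim,
    time_id INTEGER REFERENCES time_dim,
    channel_id INTEGER REFERENCES channel_dim,
    revenue REAL, expenses REAL, units INTEGER
);

-- HYPOTHETICAL sample rows.
INSERT INTO product_dim VALUES (1, 'Sedan', 'Car', 'Red'), (2, 'Pickup', 'Truck', 'Blue');
INSERT INTO region_dim VALUES (1, 'USA', 'West', 'Dealer A'),
                              (2, 'USA', 'East', 'Dealer B'),
                              (3, 'Canada', 'Ontario', 'Dealer C');
INSERT INTO time_dim VALUES (1, 2006, 1), (2, 2006, 2);
INSERT INTO channel_dim VALUES (1, 'Retail'), (2, 'Online');
INSERT INTO facts VALUES (1, 1, 1, 1, 100.0, 60.0, 2),
                         (2, 2, 1, 2, 250.0, 200.0, 5),
                         (1, 3, 2, 1, 80.0, 50.0, 1);
""")

# Star join: the fact table joined to the region and time dimensions.
result = conn.execute("""
    SELECT r.nation, t.year, SUM(f.revenue), SUM(f.units)
    FROM facts f
    JOIN region_dim r ON f.region_id = r.region_id
    JOIN time_dim t ON f.time_id = t.time_id
    GROUP BY r.nation, t.year
    ORDER BY r.nation, t.year
""").fetchall()
print(result)  # [('Canada', 2006, 80.0, 1), ('USA', 2006, 350.0, 7)]
```

Every analytical query follows the same pattern: filter and group on dimension attributes, aggregate the measures in the central fact table.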
Figure: a data cube with a Date dimension (quarters such as 3Qtr and 4Qtr, plus a sum) and a Country dimension.
OLAP Cube - 5
Three-Dimensional Cube Display
Figure: a three-dimensional cube display with Page = Region: North, Rows = Year (1996, 1997, Total), and Columns = Sales, with a Total.
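A cube display of this kind is essentially a pivot: one dimension is fixed on the page, one runs down the rows, one runs across the columns, and totals sit on the margins. The sketch assumes a hypothetical cube keyed by (region, year, product), with products as the column dimension and made-up sales figures; `pivot` is an illustrative helper.

```python
from collections import defaultdict

# HYPOTHETICAL cells: (region, year, product) -> sales.
cells = {
    ("North", 1996, "Desks"): 10, ("North", 1996, "Chairs"): 4,
    ("North", 1997, "Desks"): 12, ("North", 1997, "Chairs"): 6,
    ("South", 1996, "Desks"): 7,
}

def pivot(cells, page_value):
    """Fix the page dimension (region), put years on rows and products
    on columns, and accumulate a Total row and a Total column."""
    grid = defaultdict(int)
    for (region, year, product), sales in cells.items():
        if region != page_value:
            continue  # only cells on the selected page are displayed
        for r in (year, "Total"):
            for c in (product, "Total"):
                grid[(r, c)] += sales
    return dict(grid)

view = pivot(cells, "North")
print(view[(1996, "Total")], view[("Total", "Desks")], view[("Total", "Total")])  # 14 22 32
```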
OLAP Cube - 6
Six-Dimensional Cube
Dimension | Example
Brand | Mt. Airy
Store | Atlanta
Customer segment | Business
Product group | Desks
Period | January
Variable | Units sold
Drill Down
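Drill down moves from a summarized level of a dimension to a more detailed one, for example from year to quarter to month. A minimal sketch with hypothetical monthly figures; the helper names are illustrative, and the quarter mapping assumes calendar quarters.

```python
from collections import defaultdict

# HYPOTHETICAL monthly sales: (year, month) -> units.
monthly = {
    (2006, 1): 5, (2006, 2): 7, (2006, 3): 4,
    (2006, 4): 6, (2006, 5): 8, (2006, 6): 2,
}

def quarter_of(month):
    """Calendar quarter (1-4) for a month number (1-12)."""
    return (month - 1) // 3 + 1

def by_year(data):
    """Most summarized level: one total per year."""
    out = defaultdict(int)
    for (year, _month), units in data.items():
        out[year] += units
    return dict(out)

def drill_to_quarters(data, year):
    """Drill down one level: split a year's total into quarters."""
    out = defaultdict(int)
    for (y, month), units in data.items():
        if y == year:
            out[quarter_of(month)] += units
    return dict(out)

def drill_to_months(data, year, quarter):
    """Drill down again: the months that make up one quarter."""
    return {m: u for (y, m), u in data.items()
            if y == year and quarter_of(m) == quarter}

print(by_year(monthly))                   # {2006: 32}
print(drill_to_quarters(monthly, 2006))   # {1: 16, 2: 16}
print(drill_to_months(monthly, 2006, 2))  # {4: 6, 5: 8, 6: 2}
```

Rolling up is the reverse path: each level's totals are sums of the level below.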
OLAP Examples
http://perso.wanadoo.fr/bernard.lupin/english/example.htm