DW Concepts
Agenda
• OLTP Vs OLAP
• Modeling Techniques
• User Profile
• Top-down approach
• Bottom-up approach
Traditional OLTP systems
• OLTP systems are highly structured sets of information that support the ongoing, day-to-day operations of an organization.
OLTP (Contd…)
What is OLAP?
Why not OLTP for OLAP?
In other words...
Data warehouse
• A data warehouse is a copy of the enterprise's operational data, suitably modified to support the needs of analytical processes and stored outside the operational database.
• According to Bill Inmon, known as the father of data warehousing, a data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.
OLAP Vs OLTP
• Data warehouse database
– Designed for analysis of business measures by categories and attributes
– Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table
– Loaded with consistent, valid data; requires no real-time validation
– Supports few concurrent users relative to OLTP
• OLTP database
– Designed for real-time business operations
– Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table
– Optimized for validation of incoming data during transactions; uses validation data tables
– Supports thousands of concurrent users
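The contrast above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `orders` table and the functions `place_order`, `get_order`, and `sales_by_region` are hypothetical, and a real warehouse would hold the analytical copy in a separate database rather than querying the OLTP table itself.

```python
import sqlite3

# Hypothetical single-table schema, used only to contrast the two access patterns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)


def place_order(conn, order_id, region, product, amount):
    """OLTP-style unit of work: validate the incoming data, then insert one row."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)", (order_id, region, product, amount)
        )


def get_order(conn, order_id):
    """OLTP-style read: fetch a single row by primary key."""
    return conn.execute(
        "SELECT order_id, region, product, amount FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def sales_by_region(conn):
    """OLAP-style read: scan many rows and aggregate a measure by a category."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


sample = [("East", "A", 100.0), ("West", "A", 50.0), ("East", "B", 25.0)]
for order_id, (region, product, amount) in enumerate(sample, start=1):
    place_order(conn, order_id, region, product, amount)

print(get_order(conn, 2))     # (2, 'West', 'A', 50.0)
print(sales_by_region(conn))  # [('East', 125.0), ('West', 50.0)]
```

The first two functions touch one row each and validate data inside a transaction; the third reads every row and returns summarized results, which is the workload described for the data warehouse database.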
Data warehouse architecture
Figure: operational DBs feed the data warehouse through extract, transform, load, refresh, etc.; the warehouse and its data marts serve query/reporting, OLAP (e.g., ROLAP), and data mining.
D/W Architecture Goals
Characteristics of a D/W
• Are based on a dimensional model
• Contain historical data
• Include both detailed and summarized data
• Consolidate disparate data from multiple sources while
retaining consistency
• Focus on a single subject, such as sales, inventory, or
finance
User Profile
• Statisticians (2%)
• Knowledge workers (15%)
• Information Consumers (83%)
Steps in implementing a D/W
Identify and gather requirements
• Identify the Sponsor
• Meet the Business Users
• Meet the data experts
• Communicate with users often and thoroughly
Identify the Business Areas
• For a telecom D/W:
– Customer Behavior
– Corporate Customer
– Customer Service
– Accounts
– Settlements
– Partner
– Supplier
– Competitor
– Marketing
Sources and Targets
• Sources
– Telephone call detail records
– Customer service, such as ordering service and disconnecting lines
– Customer payment processing
• Targets
– Studies of minutes of call use by customer
group
– Segmentation of customers by minutes of
call use
– Product bundling analysis
– Customer Payment analysis
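A minimal sketch of the first two targets, assuming call detail records have already been extracted as `(customer_id, customer_group, minutes)` tuples. The record layout, the sample values, and the usage thresholds are hypothetical.

```python
from collections import defaultdict

# Hypothetical call detail records: (customer_id, customer_group, minutes_of_use).
cdrs = [
    ("C1", "residential", 12.5),
    ("C1", "residential", 30.0),
    ("C2", "business", 240.0),
    ("C3", "residential", 5.0),
    ("C2", "business", 60.0),
]


def minutes_by_group(records):
    """Total minutes of call use per customer group."""
    totals = defaultdict(float)
    for _customer, group, minutes in records:
        totals[group] += minutes
    return dict(totals)


def segment_by_usage(records, thresholds=((100.0, "heavy"), (20.0, "medium"))):
    """Label each customer by total minutes of call use.

    thresholds must be sorted from highest to lowest; anything below the
    last threshold is labeled "light". The cut-offs are illustrative only.
    """
    per_customer = defaultdict(float)
    for customer, _group, minutes in records:
        per_customer[customer] += minutes
    return {
        customer: next((label for limit, label in thresholds if total >= limit), "light")
        for customer, total in per_customer.items()
    }


print(minutes_by_group(cdrs))  # {'residential': 47.5, 'business': 300.0}
print(segment_by_usage(cdrs))  # {'C1': 'medium', 'C2': 'heavy', 'C3': 'light'}
```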
Design the dimensional model
• Identify the dimensions
• Should match business needs
• Identify the grain of the detail
• Decide on
– Star Schema
– Snowflake Schema
– Star-flake Schema
Star Schema
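A star schema can be sketched in SQLite from Python. The table names (`fact_sales`, `dim_date`, `dim_product`, `dim_store`), the yyyymmdd date keys, and the sample rows are illustrative assumptions; the point is the shape: one fact table holding measures and foreign keys, surrounded by denormalized dimension tables that each join to it directly.

```python
import sqlite3

# Hypothetical retail star schema: one central fact table keyed by the
# keys of three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, region TEXT);

    -- Grain: one row per product, per store, per day.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        units       INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_date    VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_store   VALUES (1, 'North'), (2, 'South');
    INSERT INTO fact_sales  VALUES
        (20240101, 1, 1, 10, 100.0),
        (20240101, 2, 2,  3,  45.0),
        (20240201, 1, 2,  5,  50.0);
    """
)

# Star join: each dimension joins directly to the fact table (one hop);
# dimension attributes drive the grouping, fact measures are aggregated.
STAR_QUERY = """
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_date    AS d ON f.date_key = d.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
"""
rows = conn.execute(STAR_QUERY).fetchall()
print(rows)  # [('Books', 1, 45.0), ('Hardware', 1, 100.0), ('Hardware', 2, 50.0)]
```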
Snowflake Schema
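In a snowflake schema, a dimension is normalized into further tables. The sketch below (hypothetical names and data) moves the product category out of `dim_product` into its own `dim_category` table, so the category roll-up needs one extra join.

```python
import sqlite3

# Hypothetical snowflaked product dimension: category is normalized out.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        name         TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        revenue     REAL
    );

    INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Books');
    INSERT INTO dim_product  VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'Manual', 2);
    INSERT INTO fact_sales   VALUES (1, 100.0), (2, 60.0), (3, 45.0), (1, 20.0);
    """
)

# Fact -> product -> category: the roll-up walks the normalized hierarchy.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product  AS p ON f.product_key = p.product_key
    JOIN dim_category AS c ON p.category_key = c.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
"""
rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(rows)  # [('Books', 45.0), ('Hardware', 180.0)]
```

The snowflaked form stores each category name once, at the cost of an additional join in every query that groups by category.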
Design Considerations for Dimension Tables
• Levels of hierarchy
• Surrogate Key
• Star or Snowflake
• Date and Time
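Because dates are the most common way to slice facts, a date dimension is usually generated in advance rather than derived at query time. A minimal sketch, assuming a yyyymmdd integer as the date key (some designs prefer a plain sequence number as the surrogate key); the function name and column set are illustrative, not a standard.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one hypothetical date-dimension row per calendar day, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append(
            {
                # yyyymmdd integer key: an assumption; a sequence number also works
                "date_key": day.year * 10000 + day.month * 100 + day.day,
                "full_date": day.isoformat(),
                "year": day.year,
                "quarter": (day.month - 1) // 3 + 1,
                "month": day.month,
                "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
                "is_weekend": day.weekday() >= 5,
            }
        )
        day += timedelta(days=1)
    return rows


for row in build_date_dimension(date(2024, 3, 29), date(2024, 4, 1)):
    print(row["date_key"], row["quarter"], row["day_of_week"], row["is_weekend"])
```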
Slowly Changing Dimensions
• Type 1: Overwrite the dimension record.
• Type 2: Add a new dimension record.
• Type 3: Create new fields in the dimension record.
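The three slowly changing dimension types can be sketched on an in-memory list of rows. The column names (`customer_key` as the surrogate key, `customer_id` as the source system's natural key, `start_date`, `end_date`, `is_current`, `previous_city`) and helper functions are hypothetical, and `max(...) + 1` stands in for a proper key sequence.

```python
from datetime import date

# Hypothetical customer dimension. customer_key is the warehouse surrogate key;
# customer_id is the natural key from the source system.


def scd_type1(dim, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def scd_type2(dim, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(
        row for row in dim if row["customer_id"] == customer_id and row["is_current"]
    )
    current["is_current"] = False
    current["end_date"] = change_date
    new_key = max(row["customer_key"] for row in dim) + 1  # stand-in for a key sequence
    dim.append(
        {
            "customer_key": new_key,
            "customer_id": customer_id,
            "city": new_city,
            "start_date": change_date,
            "end_date": None,
            "is_current": True,
        }
    )
    return new_key


def scd_type3(row, new_city):
    """Type 3: keep the prior value in an extra column (one level of history)."""
    row["previous_city"] = row["city"]
    row["city"] = new_city


def new_dimension():
    return [
        {"customer_key": 1, "customer_id": "C1", "city": "Pune",
         "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}
    ]


dim = new_dimension()
scd_type2(dim, "C1", "Mumbai", date(2024, 1, 1))
for row in dim:
    print(row["customer_key"], row["city"], row["is_current"])
# 1 Pune False
# 2 Mumbai True
```

Type 2 preserves full history because older fact rows keep pointing at surrogate key 1 while new facts use key 2; Type 1 and Type 3 keep a single row per customer.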
Rapidly Changing Dimensions
• Breaking off the offending dimension attributes
• Factless facts
• Conformed dimensions
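One common way to break off rapidly changing attributes, sketched here under assumptions, is a mini-dimension: the volatile attributes are banded and stored in a small separate dimension whose key is recorded on each fact row. The band boundaries, the class `ProfileMiniDimension`, and all field names below are hypothetical.

```python
# Hypothetical: a customer's age and income change often. Rather than adding a
# Type 2 row to the (large) customer dimension on every change, the volatile
# attributes are banded and moved to a small "profile" mini-dimension whose key
# is stored on each fact row.


def age_band(age):
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return "50+"


def income_band(income):
    if income < 50_000:
        return "low"
    if income < 100_000:
        return "medium"
    return "high"


class ProfileMiniDimension:
    """Assigns one surrogate key per distinct (age band, income band) combination."""

    def __init__(self):
        self.rows = {}  # (age_band, income_band) -> profile_key

    def key_for(self, age, income):
        combo = (age_band(age), income_band(income))
        if combo not in self.rows:
            self.rows[combo] = len(self.rows) + 1
        return self.rows[combo]


profiles = ProfileMiniDimension()
fact_rows = [
    # customer_key stays fixed; the changing profile is captured per fact row.
    {"customer_key": 1, "profile_key": profiles.key_for(28, 40_000), "amount": 10.0},
    {"customer_key": 1, "profile_key": profiles.key_for(31, 40_000), "amount": 12.0},
    {"customer_key": 2, "profile_key": profiles.key_for(29, 45_000), "amount": 8.0},
]
print([row["profile_key"] for row in fact_rows])  # [1, 2, 1]
print(profiles.rows)  # {('<30', 'low'): 1, ('30-49', 'low'): 2}
```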
Fact tables
• Multiple Fact tables
• Additive measures
• Non-additive/semi-additive measures
• Calculated Measures
• Granularity
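The difference between additive, semi-additive, and non-additive measures shows up when aggregating. In this sketch (hypothetical data and function names), an account balance is semi-additive: it can be summed across accounts for one day but not across days, where the latest value is taken instead. A margin percentage is non-additive, so it is recomputed from its additive components.

```python
from collections import defaultdict

# Hypothetical daily balance snapshots: (day, account, balance).
snapshots = [
    ("2024-01-01", "A", 100.0), ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0), ("2024-01-02", "B", 30.0),
]


def total_balance_by_day(rows):
    """Semi-additive: summing balances across accounts for one day is meaningful."""
    totals = defaultdict(float)
    for day, _account, balance in rows:
        totals[day] += balance
    return dict(totals)


def closing_balance_by_account(rows):
    """Across time, take the latest snapshot instead of summing the balances."""
    latest = {}
    for _day, account, balance in sorted(rows):  # ISO dates sort chronologically
        latest[account] = balance
    return latest


def margin_pct(sales):
    """Non-additive: recompute the ratio from additive (revenue, cost) components."""
    revenue = sum(rev for rev, _cost in sales)
    cost = sum(cost for _rev, cost in sales)
    return 100.0 * (revenue - cost) / revenue


print(total_balance_by_day(snapshots))            # {'2024-01-01': 150.0, '2024-01-02': 150.0}
print(closing_balance_by_account(snapshots))      # {'A': 120.0, 'B': 30.0}
print(margin_pct([(100.0, 60.0), (50.0, 45.0)]))  # 30.0
```

For the two sales rows the per-row margins are 40% and 10%; summing them gives 50% and averaging gives 25%, while the correct overall margin is 30%.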
ETL
Extraction
• Push strategy
• Pull strategy
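A minimal sketch of the two extraction strategies, assuming the source table carries a last-modified timestamp. In the pull strategy the warehouse queries the source for rows newer than a stored watermark; in the push strategy the source sends changes to a staging area. The table, column, and function names are hypothetical.

```python
import sqlite3

# Hypothetical source table with a last-modified timestamp (ISO-8601 text,
# which sorts chronologically as plain strings).
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT);
    INSERT INTO customers VALUES
        (1, 'Asha', '2024-01-01T10:00'),
        (2, 'Ben',  '2024-01-02T09:00'),
        (3, 'Chen', '2024-01-03T08:00');
    """
)


def pull_changes(conn, watermark):
    """Pull: the warehouse asks the source for rows changed after the last watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


def push_change(staging_area, row):
    """Push: the source system hands each change to a staging area as it occurs."""
    staging_area.append(row)


changes, watermark = pull_changes(source, "2024-01-01T12:00")
print(changes)    # [(2, 'Ben', '2024-01-02T09:00'), (3, 'Chen', '2024-01-03T08:00')]
print(watermark)  # 2024-01-03T08:00

staging = []
push_change(staging, (4, "Dana", "2024-01-04T07:00"))
```

A strict `>` comparison on a timestamp can miss rows that share the watermark value but are committed later; production tools often rely on change data capture or transaction logs instead.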
Transformation
• Transformation involves applying complex filters, removing inconsistencies between data from different sources, applying conditional transforms, and performing complex calculations to create derived data. Data cleansing can be an important part of the transformation process.
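These transformation steps can be sketched for two hypothetical source extracts that encode the same facts differently (gender codes, date formats, and amounts in cents versus decimal text). All field names, code mappings, and the cleansing rule (drop non-positive amounts) are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical extracts from two source systems.
billing_rows = [
    {"cust": "C1", "gender": "M", "signup": "03/15/2023", "amount_cents": 1250},
]
crm_rows = [
    {"customer_id": "c2", "sex": "female", "signup_date": "2023-04-01", "amount": "7.50"},
    {"customer_id": "c3", "sex": "x", "signup_date": "2023-05-01", "amount": "0"},
]

# Map each source's codes onto one warehouse convention; unknown codes become "U".
GENDER_MAP = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}


def from_billing(row):
    return {
        "customer_id": row["cust"].upper(),
        "gender": GENDER_MAP.get(row["gender"].strip().upper(), "U"),
        "signup_date": datetime.strptime(row["signup"], "%m/%d/%Y").date(),
        "amount": row["amount_cents"] / 100,
    }


def from_crm(row):
    return {
        "customer_id": row["customer_id"].upper(),
        "gender": GENDER_MAP.get(row["sex"].strip().upper(), "U"),
        "signup_date": datetime.strptime(row["signup_date"], "%Y-%m-%d").date(),
        "amount": float(row["amount"]),
    }


def transform(billing, crm):
    # Remove inconsistencies: bring both sources onto one schema and set of codes.
    unified = [from_billing(r) for r in billing] + [from_crm(r) for r in crm]
    # Filter / cleanse: drop rows that fail a simple validity rule.
    cleaned = [r for r in unified if r["amount"] > 0]
    # Conditional transform producing derived data.
    for r in cleaned:
        r["value_tier"] = "high" if r["amount"] >= 10 else "standard"
    return cleaned


for row in transform(billing_rows, crm_rows):
    print(row)
```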
Loading
Loading approach
Issues in Loading
Data Marts
• A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse, or it may be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease of use.
OLAP
• ROLAP
• MOLAP
• HOLAP
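The difference between ROLAP and MOLAP can be sketched in plain Python (hypothetical data and names). `rolap_query` aggregates the detail rows at query time, as a relational engine would with `GROUP BY`; `build_cube` precomputes every combination of dimensions, which is the idea behind a MOLAP cube. A HOLAP design mixes the two, for example keeping detail rows relational while storing precomputed aggregates.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail (fact) rows with three dimensions and one measure.
facts = [
    {"region": "East", "product": "A", "year": 2023, "sales": 100.0},
    {"region": "East", "product": "B", "year": 2023, "sales": 40.0},
    {"region": "West", "product": "A", "year": 2024, "sales": 60.0},
]
DIMS = ("region", "product", "year")


def rolap_query(rows, group_by):
    """ROLAP-style: aggregate the detail rows at query time, like SQL GROUP BY."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in group_by)] += row["sales"]
    return dict(totals)


def build_cube(rows, dims=DIMS):
    """MOLAP-style: precompute the aggregate for every subset of dimensions."""
    return {
        group_by: rolap_query(rows, group_by)
        for k in range(len(dims) + 1)
        for group_by in combinations(dims, k)
    }


cube = build_cube(facts)
print(len(cube))          # 8 group-bys for 3 dimensions
print(cube[()])           # {(): 200.0}
print(cube[("region",)])  # {('East',): 140.0, ('West',): 60.0}
```

With three dimensions the cube holds 2³ = 8 group-bys; this doubling with each added dimension is one reason full precomputation becomes expensive.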
A Few Popular Tools
• ETL
– DataStage
– Data Junction
– Microsoft DTS (available with SQL Server 7.0 and later)
– Oracle Warehouse Builder
– Informatica PowerCenter
– IBM Data Warehouse Manager
– Ab Initio
• OLAP
– Cognos
– Business Objects
– Power Analyzer
– Microsoft Analysis Services
– MicroStrategy
– DB2 OLAP Server
– Hyperion OLAP Server
• Data Mining
– Intelligent Miner
– DARWIN
– SAS