DW Concepts

This document discusses key concepts in data warehousing, including:
• The differences between OLTP and OLAP systems and why OLTP is not suitable for analysis.
• Common techniques for modeling data in a warehouse, like star schemas and slowly changing dimensions.
• The major components of a data warehouse, including the extraction, transformation, and loading (ETL) process.
• Types of users that access the data warehouse and different approaches for querying, like ROLAP and MOLAP.

Uploaded by

Ravi Vakula
Copyright
© Attribution Non-Commercial (BY-NC)

Data warehousing concepts

1
Agenda
• OLTP Vs OLAP
• Modeling Techniques
• User Profile
• Top down approach
• Bottom up approach

2
Traditional OLTP systems
• OLTP systems are highly structured sets of information that
support the ongoing and day-to-day operation of an
organization

• These databases usually hold information about small
subsets of the organization split on the basis of
– Business functions e.g. sales, purchase, travel
– Geographical locations e.g. Northern region,
Eastern region
– Logical units e.g. REUD, BCMD, IHLD, EISA

3
OLTP (Contd…)

• Transactional databases require a highly
normalized database design to achieve
performance goals and to optimize
storage space
• These databases need to record, on a
real-time basis, every transaction that
the organization enters into

4
What is OLAP ?

• An organization’s success also depends
on its ability to analyze data (through
views and reports) and make intelligent
decisions that potentially affect its
future. Systems that facilitate such
analyses are called Online
Analytical Processing (OLAP) systems

5
Why not OLTP for OLAP?

• OLTP databases do not contain historical
data
• OLTP databases contain small subsets of
organizational data
• OLTP databases are heterogeneous in
nature and geographically distributed

6
In other words...

• OLTP systems are
– Fragmented
– Not integrated
– Difficult to access
– Drawn from disparate sources
– Run on disparate platforms
– Poor in data quality
– Redundant
– Difficult to understand

7
Data warehouse
• A Data Warehouse is a copy of the
enterprise operational data, suitably
modified to support the needs of
analytical processes and stored
outside the operational database.
• According to Bill Inmon, known as the
father of Data Warehousing, a data
warehouse is a subject oriented,
integrated, time-variant, nonvolatile
collection of data in support of
management decisions.

8
OLAP Vs OLTP
Data warehouse database
• Designed for analysis of business measures by
categories and attributes
• Optimized for bulk loads and large, complex,
unpredictable queries that access many rows
per table
• Loaded with consistent, valid data; requires no
real-time validation
• Supports few concurrent users relative to OLTP

OLTP database
• Designed for real-time business operations
• Optimized for a common set of transactions,
usually adding or retrieving a single row at
a time per table
• Optimized for validation of incoming data during
transactions; uses validation data tables
• Supports thousands of concurrent users
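
The difference in access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `sales` table, its columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "North", 100.0), (2, "East", 250.0), (3, "North", 50.0), (4, "East", 25.0)],
)

# OLTP-style access: retrieve a single row by its key.
oltp_row = conn.execute(
    "SELECT order_id, region, amount FROM sales WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style access: read many rows and aggregate a measure by a category.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first query touches one row, as a transaction would; the second reads every row to summarize a business measure by an attribute.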

9
Data warehouse architecture

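[Figure: three-tier data warehouse architecture]
• Sources: operational DBs and semistructured sources
• Data Warehouse Server (Tier 1): data is extracted, transformed,
loaded, refreshed etc. into the data warehouse and data marts
• OLAP Servers (Tier 2): e.g., MOLAP, ROLAP, served by the data warehouse
• Clients (Tier 3): analysis, query/reporting and data mining,
served by the OLAP servers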
10
D/W Architecture Goals

• Deliver a great user experience; user
acceptance is the measure of success
• Function without interfering with OLTP
systems
• Provide a central repository of
consistent data
• Answer complex queries quickly
• Provide a variety of powerful analytical
tools, such as OLAP and data mining

11
Characteristics of a D/W
• Are based on a dimensional model
• Contain historical data
• Include both detailed and summarized data
• Consolidate disparate data from multiple sources while
retaining consistency
• Focus on a single subject, such as sales, inventory, or
finance

12
User Profile
• Statisticians (2%)
• Knowledge workers (15%)
• Information Consumers (83%)

13
Steps in implementing D/W

• Identify and gather requirements
• Design the dimensional model
• Develop the architecture, including the
Operational Data Store (ODS)
• Design the relational database and
OLAP cubes
• Develop the data maintenance
applications
• Develop analysis applications
• Test and deploy the system

14
Identify and gather requirements
• Identify the Sponsor
• Meet the Business Users
• Meet Data experts
• Communicate with users often and thoroughly

15
Identify The Business Areas
• For Telecom D/W
– Customer Behavior
– Corporate Customer
– Customer Service
– Accounts
– Settlements
– Partner
– Supplier
– Competitor
– Marketing

16
Sources and Targets
• Sources
– Telephone call detail recording
– Customer Service such as ordering service
and disconnecting lines
– Customer payment processing
• Targets
– Studies of minutes of call use by customer
group
– Segmentation of customers by minutes of
call use
– Product bundling analysis
– Customer Payment analysis

17
Design the dimensional model
• Identify the dimensions
• Should match with Business needs
• Identify the grain of the detail
• Decide on
– Star Schema
– Snowflake Schema
– Starflake Schema
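
A star schema can be sketched as one fact table joined to its dimension tables through surrogate keys. The example below uses Python's built-in sqlite3 module. The table names (`fact_calls`, `dim_customer`, `dim_date`), the columns, and the sample data are assumptions made for illustration, loosely based on the telecom example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one surrogate key each.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_group TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        calendar_date TEXT,
        month TEXT
    );
    -- Fact table: the grain is one row per customer per day.
    CREATE TABLE fact_calls (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        call_minutes REAL
    );
    INSERT INTO dim_customer VALUES
        (1, 'Asha', 'Retail'), (2, 'Bharat Ltd', 'Corporate');
    INSERT INTO dim_date VALUES
        (20240101, '2024-01-01', '2024-01'), (20240102, '2024-01-02', '2024-01');
    INSERT INTO fact_calls VALUES
        (1, 20240101, 30.0), (1, 20240102, 12.5), (2, 20240101, 200.0);
    """
)

# Minutes of call use by customer group and month: join the fact to its dimensions.
minutes_by_group = conn.execute(
    """
    SELECT c.customer_group, d.month, SUM(f.call_minutes)
    FROM fact_calls AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY c.customer_group, d.month
    ORDER BY c.customer_group
    """
).fetchall()
print(minutes_by_group)
```

In a snowflake variant of this sketch, `customer_group` would move into its own table referenced from `dim_customer`, trading simpler joins for a more normalized dimension.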

18
Star Schema

19
Star Schema

20
Snowflake Schema

21
Snowflake Schema

22
23
Design consideration of
Dimension Table

• Level of hierarchies
• Surrogate Key
• Star or Snowflake
• Date and Time

24
Slowly changing Dimension
• Type 1: Overwrite the dimension record.
• Type 2: Add a new dimension record.
• Type 3: Create new fields in the dimension record.
– Tracking bands can reduce the number of updates to some extent
– Becomes a nightmare if the source and the reports are not in sync
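
The three types can be sketched in plain Python on a small in-memory dimension. The field names (`customer_key`, `customer_id`, `city`, `valid_from`, `valid_to`, `is_current`, `previous_city`) are hypothetical; real warehouses vary in how they track validity.

```python
from datetime import date


def new_dimension():
    """Return a one-row customer dimension (hypothetical sample data)."""
    return [{"customer_key": 1, "customer_id": "C100", "city": "Pune",
             "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True}]


def scd_type1(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    next_key = max(row["customer_key"] for row in rows) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({"customer_key": next_key, "customer_id": customer_id,
                 "city": new_city, "valid_from": change_date,
                 "valid_to": None, "is_current": True})


def scd_type3(rows, customer_id, new_city):
    """Type 3: keep the prior value in an extra field on the same row."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city


type1_rows = new_dimension()
scd_type1(type1_rows, "C100", "Mumbai")

type2_rows = new_dimension()
scd_type2(type2_rows, "C100", "Mumbai", date(2024, 6, 1))

type3_rows = new_dimension()
scd_type3(type3_rows, "C100", "Mumbai")

print(len(type1_rows), len(type2_rows), type3_rows[0]["previous_city"])
```

Only Type 2 grows the dimension, which is why it preserves full history at the cost of more rows and of fact rows pointing at different surrogate keys over time.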

25
Rapidly changing Dimensions

• Breaking out the offending dimension attributes
• Factless fact tables
• Conformed dimensions

26
Fact tables
• Multiple Fact tables
• Additive measures
• Non-additive/Semi additive measures
• Calculated Measures
• Granularity
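
The difference between additive, semi-additive, and calculated measures can be shown with a small worked example. The snapshot rows below are hypothetical: deposits are treated as an additive measure, the account balance as semi-additive (it can be summed across accounts but not across days), and the average deposit per account as a calculated measure.

```python
# Hypothetical daily snapshot facts: (day, account, deposits, balance).
snapshots = [
    ("2024-01-01", "A", 100.0, 100.0),
    ("2024-01-01", "B", 50.0, 50.0),
    ("2024-01-02", "A", 20.0, 120.0),
    ("2024-01-02", "B", 0.0, 50.0),
]

# Additive: deposits can be summed across every dimension (accounts and days).
total_deposits = sum(dep for _, _, dep, _ in snapshots)

# Semi-additive: balances add up across accounts for one day only,
# so the closing balance uses the latest day's rows.
last_day = max(day for day, _, _, _ in snapshots)
closing_balance = sum(bal for day, _, _, bal in snapshots if day == last_day)

# Summing balances across days double counts money and has no business meaning.
naive_balance_sum = sum(bal for _, _, _, bal in snapshots)

# Calculated measure: derived from other measures at query time.
accounts = {acct for _, acct, _, _ in snapshots}
average_deposit_per_account = total_deposits / len(accounts)

print(total_deposits, closing_balance, naive_balance_sum, average_deposit_per_account)
```

Here the closing balance matches total deposits, while the naive sum over all days does not, which is the practical reason semi-additive measures need special handling along the time dimension.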

27
ETL

• Extract, Transform and Load process
may be described as the process of
selecting, migrating, transforming,
cleansing and converting mapped data
from the legacy environment to data
warehouse environment.

28
Extraction

• Push strategy
• Pull strategy

29
Transformation
• Transformation involves applying
complex filters, removing
inconsistencies between data from
different sources, conditional
transforms, and complex calculations to
create derived data. Cleansing of
data can be an important part of the
transformation process.
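
A transformation step of this kind might look like the following Python sketch. Everything specific here is assumed for illustration: the two record layouts (one source reporting seconds, the other minutes), the `REGION_MAP` lookup, and the `RATE_PER_MINUTE` tariff.

```python
# Hypothetical lookup that reconciles region codes used by different sources.
REGION_MAP = {"n": "North", "north": "North", "e": "East", "east": "East"}
RATE_PER_MINUTE = 0.5  # assumed tariff, for illustration only


def transform(record):
    """Return a cleansed, conformed record, or None if the row is rejected."""
    name = record.get("name", "").strip()
    region = REGION_MAP.get(record.get("region", "").strip().lower())
    if not name or region is None:
        return None  # cleansing: drop rows with a missing name or unknown region
    # Conditional transform: one source reports seconds, the other minutes.
    if "seconds" in record:
        minutes = record["seconds"] / 60.0
    else:
        minutes = float(record["minutes"])
    # Derived data: a charge calculated from the conformed duration.
    return {"name": name, "region": region, "minutes": minutes,
            "charge": round(minutes * RATE_PER_MINUTE, 2)}


source_rows = [
    {"name": " Asha ", "region": "N", "seconds": 120},  # from source A
    {"name": "Ravi", "region": "east", "minutes": 3},   # from source B
    {"name": "", "region": "N", "seconds": 60},         # fails cleansing
]

transformed = [row for row in map(transform, source_rows) if row is not None]
print(transformed)
```

Rejected rows are silently dropped in this sketch; a production process would more likely route them to an error table for review.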

30
Loading

• Loading involves the insertion of data
into the target system, that is, the data
warehouse. Loading is the last step
before the users see the data. It
involves populating the fact and
dimension tables as well as aggregation
tables that are part of the physical data
model
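
As a rough sketch of a load step, the Python example below populates a dimension table, a fact table, and an aggregation table using the built-in sqlite3 module. The table names, the lookup-or-insert handling of surrogate keys, and the full rebuild of the aggregate are assumptions chosen for brevity, not a prescribed method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_name TEXT UNIQUE
    );
    CREATE TABLE fact_calls (customer_key INTEGER, call_minutes REAL);
    CREATE TABLE agg_minutes_by_customer (
        customer_key INTEGER PRIMARY KEY,
        total_minutes REAL
    );
    """
)


def load(connection, rows):
    """Load (customer_name, call_minutes) rows, then rebuild the aggregate."""
    with connection:  # commit on success, roll back on error
        for name, minutes in rows:
            # Dimension: insert the customer only if it is not there yet.
            connection.execute(
                "INSERT OR IGNORE INTO dim_customer (customer_name) VALUES (?)",
                (name,),
            )
            (key,) = connection.execute(
                "SELECT customer_key FROM dim_customer WHERE customer_name = ?",
                (name,),
            ).fetchone()
            # Fact: store the measure against the surrogate key.
            connection.execute("INSERT INTO fact_calls VALUES (?, ?)", (key, minutes))
        # Aggregation table: rebuilt in full from the fact table.
        connection.execute("DELETE FROM agg_minutes_by_customer")
        connection.execute(
            "INSERT INTO agg_minutes_by_customer "
            "SELECT customer_key, SUM(call_minutes) FROM fact_calls "
            "GROUP BY customer_key"
        )


load(conn, [("Asha", 2.0), ("Ravi", 3.0), ("Asha", 4.0)])
aggregate = conn.execute(
    "SELECT customer_key, total_minutes FROM agg_minutes_by_customer "
    "ORDER BY customer_key"
).fetchall()
print(aggregate)
```

Rebuilding the aggregate on every load is simple but scales poorly; with large volumes an incremental update of only the affected keys is the more likely choice.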

31
Loading approach

• Transform and Load
• Load and Transform
• Transform while Loading

32
Issues in Loading

• Volume and frequency of loading
• Disk space
• Scheduling

33
Data Marts
• A data mart is a repository of data
gathered from operational data and other
sources that is designed to serve a
particular community of knowledge
workers. In scope, the data may derive
from an enterprise-wide database or data
warehouse or be more specialized. The
emphasis of a data mart is on meeting
the specific demands of a particular group
of knowledge users in terms of analysis,
content, presentation, and ease-of-use

34
OLAP
• ROLAP
• MOLAP
• HOLAP
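
In broad terms, ROLAP answers queries by aggregating relational data at query time, MOLAP stores pre-aggregated values in a multidimensional structure, and HOLAP combines the two. The Python sketch below contrasts the first two approaches on a tiny hypothetical fact set; the function names and the use of `None` to mean "all members" are illustrative choices, not a product API.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail facts: (region, product line, revenue).
facts = [("North", "Voice", 100.0), ("North", "Data", 40.0), ("East", "Voice", 60.0)]


def rolap_query(region=None, line=None):
    """ROLAP style: scan and aggregate the detail rows at query time."""
    return sum(
        amount
        for r, l, amount in facts
        if region in (None, r) and line in (None, l)
    )


def build_cube(rows):
    """MOLAP style: precompute every combination; None stands for 'all'."""
    cube = defaultdict(float)
    for r, l, amount in rows:
        for key in product((r, None), (l, None)):
            cube[key] += amount
    return dict(cube)


cube = build_cube(facts)

print(rolap_query(region="North"))  # computed from detail rows on demand
print(cube[("North", None)])        # read from the precomputed cube
```

Both calls return the same total; the trade-off is that the cube answers with a lookup but must be rebuilt or updated whenever the detail data changes.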

35
A few popular tools

• ETL
– DataStage
– Data Junction
– Microsoft DTS (available with SQL Server
7.0 and above)
– Oracle Warehouse Builder
– Informatica PowerCenter
– IBM Data Warehouse Manager
– Ab Initio

36
A few popular tools
• OLAP
– Cognos
– Business Objects
– Power Analyzer
– Microsoft Analysis Services
– MicroStrategy
– DB2 OLAP Server
– Hyperion OLAP Server

37
A few popular tools
• Data Mining
– Intelligent Miner
– DARWIN
– SAS

38
Thank You

40
