
Data Warehouse Lec-4

Uploaded by

Naveen Bandaru

8/17/2024

SS ZG515 - Data Warehousing
Prof. Yashvardhan Sharma
CSIS Dept., BITS-Pilani
BITS Pilani, Pilani Campus

Outline

• Data Warehouse vs Data Mart
• Inmon vs Kimball approach to Data Warehousing
• A Practical Approach
• ER vs Dimensional Modeling
• Dimensional Modeling


Data warehouse versus data mart.


Building a Data Mart

• Questions to be asked:
– Top-down or bottom-up approach?
– Enterprise-wide or departmental?
– Which first—data warehouse or data mart?
– Build pilot or go with a full-fledged
implementation?
– Dependent or independent data marts?


Bottom-Up Versus Top-Down Approach

Dependent Data Marts


Independent Data Marts


Data Warehouse or Data Mart First?

• Top-Down vs. Bottom-Up Approach


• Advantages of Top-Down
– A truly corporate effort, an enterprise view of data
– Inherently architected, not a union of disparate DMs
– Single, central storage of data about the content
– Central rules and control
– May be developed fast using iterative approach


Data Warehouse or Data Mart First?

• Disadvantages of Top-Down
– Takes longer to build even with iterative method
– High exposure/risk to failure
– Needs a high level of cross-functional skills
– High outlay without proof of concept
– Difficult to sell this approach to senior management
and sponsors


Data Warehouse or Data Mart First?

• Advantages of Bottom-Up Approach


– Faster and easier implementation of manageable
pieces
– Favorable ROI and proof of concept
– Less risk of failure
– Inherently incremental; can schedule important DMs
first
– Allows project team to learn and grow


Data Warehouse or Data Mart First?

• Disadvantages of Bottom-Up Approach


– Each DM has its own narrow view of data
– Spreads redundant data across every DM
– Difficult to integrate if the overall requirements are not considered at the beginning
• Kimball’s approach is usually labeled Bottom-Up, a label he disagrees with


The Bottom-Up Misnomer

Kimball encourages you to broaden your perspective both “vertically” and “horizontally” when gathering business requirements for the data marts you develop


The Bottom-Up Misnomer

• Vertical
– Don’t just rely on the business data analyst to
determine requirements
– Inputs from senior managers about their vision,
objectives, and challenges are critical
– Ignoring this vertical span might cause failure in
understanding the organization’s direction and likely
future trends


The Bottom-Up Misnomer

• Horizontal
– Look horizontally across the departments before designing the
DW
– Critical in establishing the enterprise view
– Challenging to do if one particular department is funding the project
– Ignoring horizontal span will create isolated, department-centric
databases that are inconsistent and can’t be integrated
– Complete coverage in a large organization is difficult
– One rep. from each dept. interacting with the core development
team can be of immense help


The essence of the difference between Inmon and Kimball
The question being answered: what is the single version of the truth? What is corporate data?
[Figure: data marts for finance, marketing, sales, mgmt, and HR, alongside a data warehouse labeled integrated, historical, granular: the single version of the truth]

From an architectural perspective
• Relational-based data warehouse (Inmon)
• Star schema (Kimball)
Inmon’s Corporate Information Factory (CIF) and the Kimball Data Warehouse Bus (BUS)


In an article for the Business Intelligence Network, Mr. Inmon writes:
“… Independent data marts may work well when there are only a few data marts. But over time there are never only a few data marts ... Once there are … a lot of data marts, the independent data mart approach starts to fall apart. There are many reasons why … independent data marts built directly from a legacy/source environment fall apart:
• There is no single source of data for analytical processing …;
• There is no easy reconcilability of data values …;
• There is no foundation to build on for new data marts …
• An independent data mart is rarely reusable for other purposes;
• There are too many interface programs to be built and maintained;
• There is a massive redundancy of detailed data in each data mart ... because there is no common place where that detailed data is collected and integrated;
• There is no convenient place for historical data;
• There is no low level of granularity guaranteed for all data marts to use;
• Each data mart integrates data from the source systems in a unique way, which does not permit reconcilability or integrity of the data across the enterprise; and
• The window for extracting data from the legacy environment is stretched with each independent data mart requiring its own window of time for extraction …”
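
The point about interface programs can be made concrete with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that every independent data mart builds its own extract from every source system, whereas with a central warehouse each source loads the warehouse once and each mart is fed once from it. The function names and the figure of 8 source systems are hypothetical and do not come from the article.

```python
def independent_mart_interfaces(num_sources: int, num_marts: int) -> int:
    """Assumed worst case: each mart builds its own extract from each source."""
    return num_sources * num_marts


def dependent_mart_interfaces(num_sources: int, num_marts: int) -> int:
    """Each source loads the warehouse once; each mart is fed once from it."""
    return num_sources + num_marts


if __name__ == "__main__":
    # Hypothetical organization with 8 source systems and a growing mart count.
    for marts in (2, 5, 10):
        print(
            marts,
            independent_mart_interfaces(8, marts),
            dependent_mart_interfaces(8, marts),
        )
```

Under these assumptions the independent count grows multiplicatively with the number of marts while the warehouse-fed count grows additively, which is one way to read "there are never only a few data marts".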


Inmon and Kimball compared


Pros and cons of both approaches


Data Warehouse or Data Mart First?
New Practical approach by Kimball
1. Plan and define requirements at the overall corporate level
2. Create a surrounding architecture for a complete warehouse
3. Conform and standardize the data content
4. Implement the Data Warehouse as a series of Supermarts, one at a time


A Word about SUPERMARTS

• Totally monolithic approach vs. totally stovepipe approach
• A step-by-step approach for building an EDW from granular data
• A Supermart is a data mart that has been carefully built with a disciplined architectural framework
• A Supermart is naturally a complete subset of the DW
• A Supermart is based on the most granular data that can possibly be collected and stored
• Conformed dimensions and standardized fact definitions
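
To illustrate why conformed dimensions matter, here is a minimal sketch in which two marts (sales and shipments) store granular fact rows and share one product dimension. Because both use the same `product_key` and the same category attribute, their measures can be rolled up and combined ("drill across"). All table names, keys, and figures are hypothetical and are not taken from the lecture.

```python
from collections import defaultdict

# Conformed product dimension shared by both marts (hypothetical data).
PRODUCT_DIM = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}

# Granular fact rows, keyed by the shared product_key.
SALES_FACTS = [
    {"product_key": 1, "sales_amount": 100.0},
    {"product_key": 2, "sales_amount": 250.0},
    {"product_key": 3, "sales_amount": 40.0},
    {"product_key": 1, "sales_amount": 60.0},
]
SHIPMENT_FACTS = [
    {"product_key": 1, "units_shipped": 5},
    {"product_key": 3, "units_shipped": 2},
    {"product_key": 2, "units_shipped": 7},
]


def total_by_category(facts, measure):
    """Roll granular facts up to product category via the conformed dimension."""
    totals = defaultdict(float)
    for row in facts:
        category = PRODUCT_DIM[row["product_key"]]["category"]
        totals[category] += row[measure]
    return dict(totals)


def drill_across():
    """Combine measures from both marts on the shared category attribute."""
    sales = total_by_category(SALES_FACTS, "sales_amount")
    shipped = total_by_category(SHIPMENT_FACTS, "units_shipped")
    return {
        category: {
            "sales_amount": sales.get(category, 0.0),
            "units_shipped": shipped.get(category, 0.0),
        }
        for category in sorted(set(sales) | set(shipped))
    }


if __name__ == "__main__":
    for category, measures in drill_across().items():
        print(category, measures)
```

If each mart kept its own product table with different keys or category names, the final combination step would need a mapping between them, which is the integration problem the stovepipe approach runs into.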


A Word about SUPERMARTS


Why do Data Warehouse projects fail?


Why do Data Warehouse projects fail?

• Unreliable or unattainable user requirements


• Quality of the data that feeds the source system
• Changing source or target requirements
• Poor development productivity
• High TCO (Total Cost of Ownership)
• Poor documentation
• “…over 50% of data warehouse projects fail or go wildly over budget–they blame data quality…” The real problem is project approach.

Why do Data Warehouse projects fail?

– Fail due to lack of attention to Data Quality Issues
– More than half only have limited acceptance
– Consistency and Accuracy of Data
– Most businesses fail to use business intelligence (BI) strategically
– IT organizations build data warehouses with little to no business involvement


Business Ownership

• The data warehouse should be owned by the business, not IT
• A successful project depends upon creating a partnership with the business
• Prioritization of project phases and the content of the data dictionary should be agreed by the business
• Without one or more strong, high-level business sponsors the project is likely to hit problems
• If sponsorship is present then the data warehouse project can be broken down into a set of smaller projects

Divide and Conquer

• A “big bang” approach to data warehousing has almost always ended in disaster
• The project phases and the order in which they are developed should be decided by the data warehouse sponsors
• Momentum is paramount for keeping the required focus
• Rapid prototyping and tight development cycles are vital for a successful warehouse
• Keep the bigger picture in view
• Use smaller phases to fund the project adequately

“Big Bang” Approach: Advantages and Disadvantages
• Advantages:
– Warehouse built as part of a major project (e.g., BPR)
– Having a “big picture” of the data warehouse before starting the data warehousing project
• Disadvantages:
– Involves high risk and takes longer
– Runs the risk of needing to change requirements
– Costly, and harder to get user support for

Incremental Approach to Warehouse Development
Phases: Strategy → Definition → Analysis → Design → Build → Production
• Multiple iterations
• Shorter implementations
• Validation of each phase


Benefits of an Incremental Approach

• Delivers a strategic data warehouse solution through incremental development efforts
• Provides extensible, scalable architecture
• Quickly provides business benefits and ensures a much earlier return on investment
• Allows a data warehouse to be built one subject or application area at a time
• Allows the construction of an integrated data mart environment


Pilot Projects: Risk vs. Reward

• Start with a pilot implementation as the first rollout for DW
• Pilot projects have the advantage of being small and manageable
• Provide the organization with a “proof of concept”

Pilot Projects: Risk vs. Reward

Functional scope of a pilot project should be determined based on:
1. The degree of risk the enterprise is willing to take
2. The potential for leveraging the pilot project
– Avoid constructing a throwaway prototype
– Pilot warehouse must have actual value to the enterprise


Pilot Projects: Risk vs. Reward

Risk/reward matrix (vertical axis: RISK; horizontal axis: REWARD)
High Risk, Low Reward | High Risk, High Reward
Low Risk, Low Reward | Low Risk, High Reward

A Practical Approach

Most people employ a Hybrid approach with elements of Top-Down and Bottom-Up
– Practitioners don’t always dwell on these issues or use this terminology; they just focus on best practice
That would include:
– Build incrementally according to a business function
– Employ an enterprise perspective
– Dimensionally model data
– Utilise conformed dimensional models
– Employ a Staging Area or Data Warehouse
– Store atomic data


The Kimball Lifecycle Diagram


The Kimball Lifecycle

• Illustrates the general flow of a DW implementation
• Identifies task sequencing and highlights activities that should happen concurrently
• May need to be customized to address the unique needs of your organization
• Not every detail of every Lifecycle task will be performed on every project


The Kimball Lifecycle, SDLC, and DBLC
SDLC: Planning → Analysis → Detailed System Design → Implementation → Maintenance
DBLC: DB Initial Study → DB Design → Implementation → Testing → Operation → Maintenance


Program/Project Planning

• Kimball’s view of programs and projects
– Project refers to a single iteration of the Kimball Lifecycle
• from launch through deployment
– Program refers to the broader, ongoing coordination of resources, infrastructure, timelines, and communication across multiple projects
• a program contains multiple projects
– In the real world, programs do not necessarily start before projects, although ideally they should.


Program/Project Planning

• Project planning
– Scope definition: understanding business requirements
– Task identification
– Scheduling
– Resource planning
– Workload assignment
– The end document represents a blueprint of the project


Program/Project Management

• Enforces the project plan


• Activities:
– Status monitoring
– Issue tracking
– Development of a comprehensive communication
plan that addresses both the business and IT units
