Chapter 1: Data Warehousing Intro
By : Aruna Khubalkar
Outline
Need for Data Warehousing
Benefits and features of Data Warehouse
Data warehouse Characteristics
Data warehouse Architecture
Dimensional Modeling : Star & Snowflake
OLAP operations
OLTP v/s OLAP
HRM Applications
Etc.
business rules
Accessible
Easily accessible with intuitive access paths and
value
Timely
Information must be available within the stipulated
time frame
* Paulraj 2001.
Expectations of the new solution
DB designed for analytical tasks
Data from multiple applications
Easy to use
Ability of what-if analysis
Read-intensive data usage
Direct interaction with system, without IT assistance
Contents updated periodically and otherwise stable
Current & historical data
Ability for users to initiate reports
Kelly said
“Separate, available, integrated, time-stamped, subject-oriented,
non-volatile, accessible”
Four properties/characteristics of DW
subject-oriented, integrated,
non-volatile, time-variant
Subject-oriented
In operational sources data is organized by
applications, or business processes.
In a DW, data is organized by subject
Subjects vary with the enterprise
They are the critical factors that affect performance
Example of Manufacturing Company
Sales
Shipment
Inventory etc
Subject-oriented: Organized along the lines of the subjects of the
corporation. Typical subjects are customer, product, vendor and transaction.
Time-variant: Every record in the data warehouse has some form of time
variancy attached to it.
Non-volatile: Refers to the inability of data to be updated. Every record
in the data warehouse is time stamped in one form or another.
Generic two-level architecture
[Figure: source data passes through E, T and L steps into one company-wide warehouse]
Dimensions are perspectives or entities with
respect to which an organization wants to keep
records.
Time, item, branch, location
Each dimension has a table associated with it, called a
dimension table, which further describes the
dimension.
Dimension tables, such as
item (item_name, brand, type), or
time(day, week, month, quarter, year)
A multidimensional data model is typically
organized around a central theme, like sales,
represented by a fact table
The fact table contains measures (such as
dollars_sold, units_sold) and keys to each
of the related dimension tables
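The fact/dimension layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names item_dim, time_dim and sales_fact, the key columns item_key and time_key, and all rows are made up for the example; only the column names item_name, brand, type, day, week, month, quarter, year, dollars_sold and units_sold come from the slides.

```python
import sqlite3

# In-memory database. Table names and sample rows are assumptions
# made for illustration; they are not taken from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, week INTEGER,
    month INTEGER, quarter INTEGER, year INTEGER);
-- Fact table: one key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL,
    units_sold INTEGER);
""")

conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)", [
    (1, "TV-A", "BrandX", "TV"),
    (2, "PC-B", "BrandY", "PC"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1, 2001),
    (2, 5, 14, 4, 2, 2001),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 500.0, 2),
    (1, 2, 1200.0, 1),
    (2, 1, 250.0, 1),
])

# Join the fact table to its dimensions and summarize the measures
# by item type and quarter.
rows = conn.execute("""
    SELECT i.type, t.quarter, SUM(s.dollars_sold), SUM(s.units_sold)
    FROM sales_fact AS s
    JOIN item_dim AS i ON i.item_key = s.item_key
    JOIN time_dim AS t ON t.time_key = s.time_key
    GROUP BY i.type, t.quarter
    ORDER BY i.type, t.quarter
""").fetchall()

for row in rows:
    print(row)
```

The fact table stays narrow (keys and measures only), while descriptive attributes such as brand or quarter live in the dimension tables and are reached through joins.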
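[Schema figure, partial. The location dimension is normalized into a separate city table.]
Fact table: branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, state_or_province, country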
Example of Fact Constellation
[Schema figure, partial]
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
Sales Fact Table: time_key, item_key, branch_key
Shipping Fact Table: time_key, item_key, shipper_key, from_location
Sales data warehouse
Dimensions: Product, Location, Time
[Figure: data cube with axes Product, Region and Month]
In data warehousing literature, a data
cube (such as above) is referred to as a
cuboid.
Given a set of dimensions, we can
generate a cuboid for each of the
possible subsets of the given dimensions.
The result is a lattice of cuboids, each
showing the data at a different level of
summarization.
The lattice of cuboids is referred to as a
Data Cube
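The idea of one cuboid per subset of dimensions can be sketched in a few lines of Python. This is a minimal illustration, not a real OLAP engine: the dimension names, the facts list and the cuboid helper are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Made-up base facts; dimension names and values are illustrative only.
dimensions = ("time", "item", "location")
facts = [
    {"time": "Q1", "item": "TV", "location": "USA", "units_sold": 10},
    {"time": "Q1", "item": "PC", "location": "USA", "units_sold": 4},
    {"time": "Q2", "item": "TV", "location": "Canada", "units_sold": 7},
]


def cuboid(group_by):
    """Sum units_sold over all facts, grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in group_by)
        totals[key] += row["units_sold"]
    return dict(totals)


# One cuboid per subset of the dimensions: 2**3 = 8 cuboids in total,
# from the empty subset (grand total) up to all three dimensions.
lattice = {
    subset: cuboid(subset)
    for k in range(len(dimensions) + 1)
    for subset in combinations(dimensions, k)
}

for subset, cells in lattice.items():
    print(subset, cells)
```

The empty subset gives the most summarized cuboid (a single grand total), and the full set of dimensions gives the least summarized one; every other subset sits between them in the lattice.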
A Sample Data Cube
[Figure: sample data cube]
Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
Product: TV, PC, VCR, sum
Country: U.S.A, Canada, Mexico, sum
Highlighted cell: total annual sales of TVs in U.S.A.
3-D cuboids: (time, item, location), (time, item, supplier),
(time, location, supplier), (item, location, supplier)
Pivot (rotate):
◦ Reorient the cube for visualization, e.g. turn a 3-D cube into a series of
2-D planes
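A pivot from a 3-D cube to a series of 2-D planes can be sketched with nested dictionaries. The cube contents and the helper name planes_by_quarter are hypothetical, chosen only to show the reorientation; no values are changed, only the axis order.

```python
# Hypothetical 3-D cube: cube[product][country][quarter] = units sold.
cube = {
    "TV": {"USA": {"Q1": 10, "Q2": 12}, "Canada": {"Q1": 3, "Q2": 5}},
    "PC": {"USA": {"Q1": 4, "Q2": 6}, "Canada": {"Q1": 1, "Q2": 2}},
}


def planes_by_quarter(cube):
    """Rotate the cube so quarter becomes the outer axis, giving one
    2-D (product x country) plane per quarter."""
    planes = {}
    for product, by_country in cube.items():
        for country, by_quarter in by_country.items():
            for quarter, value in by_quarter.items():
                plane = planes.setdefault(quarter, {})
                plane.setdefault(product, {})[country] = value
    return planes


planes = planes_by_quarter(cube)
for quarter, plane in planes.items():
    print(quarter, plane)
```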
Fig. 3.10 Typical OLAP Operations
Figure 11-22: Slicing a data cube
Example of drill-down
Drill-down with color added