DW Data Warehousing
Scenario 1
ABC Pvt Ltd is a company with branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
[Figure: the branch systems (Delhi, Chennai, Bangalore) feed a Data Warehouse; Query & Analysis tools produce the Report for the Sales Manager.]
Scenario 2
One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and the data entry operators have to wait for some time.
Solution 2
Extract the data needed for analysis from the operational database. Store it in the warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will contain data with a historical perspective.
[Figure: the Data Entry Operator posts transactions to the Operational database; data is extracted from it into the Data Warehouse, which supplies the Report to the Manager.]
Scenario 3
Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.
Solution 3
Improve the quality of data before loading it into the warehouse: perform data cleaning and transformation before loading the data. Use query and analysis tools to support ad hoc queries.
[Figure: the President queries the Data Warehouse for information on expansion, sales and improvement.]
Basic reports and simple OLAP analyses can be made directly from operational data. Many organizations instead choose to extract operational data into data warehouses and data marts, facilities that prepare, store, and manage data specifically for data mining and other analyses. Programs read operational data and extract, clean, and prepare it for BI processing. The prepared data are stored in a data-warehouse database using a data-warehouse DBMS, which can be different from the organization's operational DBMS.
Most operational and purchased data have problems that inhibit their usefulness for business intelligence.
Data Warehouse
Inmon's definition
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.
Subject-oriented
A data warehouse is organized around subjects such as sales, product and customer. It focuses on the modeling and analysis of data for decision makers, and excludes data that is not useful in the decision support process.
Integration
A data warehouse is constructed by integrating multiple heterogeneous sources. Data preprocessing techniques are applied to ensure consistency.
[Figure: RDBMS, Legacy System and Flat File sources are integrated into the Data Warehouse.]
Integration
Data is made consistent in terms of:
encoding structures
measurement of attributes
physical attributes of data
naming conventions
data type formats
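The kinds of inconsistency listed above can be illustrated with a small sketch. It assumes two hypothetical source systems that encode gender differently ("m"/"f" versus 1/0), name their fields differently, and measure length in different units (centimetres versus inches). All field names, codes and values are illustrative assumptions, not taken from the slides.

```python
# Hypothetical records from two source systems (names and values are illustrative).
source_a = [{"cust_id": 1, "gender": "m", "length_cm": 25.4}]
source_b = [{"customer": 2, "sex": 1, "length_in": 10.0}]

# Assumed encoding map: every source encoding is mapped to one warehouse code.
GENDER_CODES = {"m": "M", "f": "F", 1: "M", 0: "F"}
CM_PER_INCH = 2.54


def integrate_a(rec):
    """Convert a source A record to the warehouse format."""
    return {
        "customer_id": rec["cust_id"],
        "gender": GENDER_CODES[rec["gender"]],
        "length_cm": float(rec["length_cm"]),
    }


def integrate_b(rec):
    """Convert a source B record: rename fields, recode gender, convert inches to cm."""
    return {
        "customer_id": rec["customer"],
        "gender": GENDER_CODES[rec["sex"]],
        "length_cm": round(rec["length_in"] * CM_PER_INCH, 2),
    }


# After integration every row uses the same names, codes and units.
warehouse_rows = [integrate_a(r) for r in source_a] + [integrate_b(r) for r in source_b]
```

Keeping one conversion function per source makes each source's quirks explicit; a real warehouse would typically drive this from mapping metadata instead.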
Time-variant
Provides information from a historical perspective, e.g. the past 5-10 years. Every key structure contains, either implicitly or explicitly, an element of time.
Nonvolatile
Data once recorded cannot be updated. A data warehouse requires only two data-access operations:
Initial loading of data (load)
Access of data (access)
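The two operations can be sketched as a store that exposes only load and access. This is a minimal illustration under assumptions: the class name `NonvolatileStore` and its methods are hypothetical, and the sample rows reuse two monthly unit totals from the case study later in these slides.

```python
class NonvolatileStore:
    """Append-only store: rows can be loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Initial or periodic load: append copies of the incoming rows."""
        self._rows.extend(dict(r) for r in rows)

    def access(self, predicate=lambda row: True):
        """Read access: return copies so callers cannot modify stored rows."""
        return [dict(r) for r in self._rows if predicate(r)]


store = NonvolatileStore()
store.load([{"month": "January", "units": 14}, {"month": "February", "units": 41}])

result = store.access(lambda r: r["units"] > 20)
result[0]["units"] = 0  # changes only the returned copy, not the stored row
```

Returning copies is one simple way to make the "no update" rule hold in memory; a database-backed warehouse would usually enforce it through permissions instead.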
Feature            Operational                          Historic
Characteristic     Operational processing               Informational processing
Orientation        Transaction                          Analysis
User               Clerk, DBA, database professional    Knowledge workers
Function           Day-to-day operation                 Decision support
Data               Current                              Historical
View               Detailed, flat relational            Summarized, multidimensional
DB design          Application oriented                 Subject oriented
Unit of work
Access             Read/write                           Mostly read
Focus              Data in                              Information out
Records accessed   tens                                 millions
Number of users    thousands                            hundreds
DB size            100 MB to GB                         100 GB to TB
Priority
Metric
[Figure: a two-dimensional data warehouse and a three-dimensional data warehouse.] Data warehouses can have four or more dimensions.
ETL Methodology
[Figure: Extract, Transform and Load steps feeding the Data warehouse and Data marts.]
Data extraction: The process of copying relevant data from a variety of transactional databases for inclusion in a DW. It may occur at regular intervals (e.g., weekly, monthly) to add new data. Data from incompatible databases, flat files, text documents, etc. must be filtered through appropriate APIs (application programming interfaces) as needed.
Data transformation: See the sample data that follows.
Data loading: Extracted, cleaned, and transformed data is loaded into the DW at a predetermined data refresh frequency.
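The three ETL steps can be sketched end to end as follows. This is only an illustration under assumptions: the source rows, the `sales` table and its columns are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational rows; in practice these would be read from source systems.
operational_rows = [
    {"date": "2003-05-19", "price": "32.99", "qty": "1", "city": " columbus "},
    {"date": "2003-05-20", "price": "49.50", "qty": "1", "city": "Raleigh"},
]


def extract(rows):
    """Extract: copy the relevant records from the source."""
    return [dict(r) for r in rows]


def transform(rows):
    """Transform: fix types and normalize text before loading."""
    return [
        (r["date"], float(r["price"]), int(r["qty"]), r["city"].strip().title())
        for r in rows
    ]


def load(conn, rows):
    """Load: insert the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (date TEXT, price REAL, qty INTEGER, city TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the warehouse database
load(conn, transform(extract(operational_rows)))
loaded = conn.execute("SELECT city, price * qty FROM sales ORDER BY date").fetchall()
```

Running `load` again with a later extract would model the periodic refresh: new rows are appended at the chosen refresh frequency.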
Row  Date           Price    Quantity  Street            City       State  Country
1    May 19, 2003   32.99    1         123 Oak St.                  TN     USA
2    May 19, 2003   19.95    21        345 Main Ave.     Columbus   OH     USA
3    May 19, 2003   24.99    3         50 Elm Rd.        San Diego  CA     USA
4    May 20, 2003   49.50    1         876 Leslie Ln.    Raleigh    NC     USA
5    May 34, 2003   3200.99  1         1200 Wallaby St.  Brisbane          Australia
6    June 03, 2003  32.99    2         345 Main Ave.     Columbus   GA     USA
7    June 04, 2003  30.00    1         742 Ave. Louise   Brussels          Belgium
8    June 04, 2003  24.99    1         50 Elm Rd.        San Diego  CA     USA
9    June 05, 2003  30.00    5         48 Maple Ave.     Toronto    ON     Canada
10   June 12, 2003  -32.99   2         390 Martin Dr.    Columbus   RP     USA
11   June 15, 2003  32.99    1         666 Ave. Bolivar  Santiago          Chile
Missing data: City is blank.
Questionable data: State for rows 2 & 6 could be the same.
Possible misspelling: Do rows 3 & 8 refer to the same person?
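Some of these problems can be detected automatically during transformation. The sketch below applies three illustrative rules (valid date, non-negative price, city present) to four rows taken from the sample data; the row numbers, the tuple layout and the rules themselves are assumptions made for the example, not a complete cleaning procedure.

```python
from datetime import datetime

# A few rows from the sample data: (row, date, price, quantity, city, state).
rows = [
    (1, "May 19, 2003", 32.99, 1, "", "TN"),
    (5, "May 34, 2003", 3200.99, 1, "Brisbane", ""),
    (9, "June 05, 2003", 30.00, 5, "Toronto", "ON"),
    (10, "June 12, 2003", -32.99, 2, "Columbus", "RP"),
]


def check_row(row):
    """Return a list of problems found in one row (rules are illustrative)."""
    _, date_text, price, _quantity, city, _state = row
    problems = []
    try:
        datetime.strptime(date_text, "%B %d, %Y")
    except ValueError:
        problems.append("invalid date")
    if price < 0:
        problems.append("negative price")
    if not city:
        problems.append("missing city")
    return problems


# Map each row number to the problems detected in that row.
report = {row[0]: check_row(row) for row in rows}
```

Questionable values such as conflicting states for the same street address need cross-row comparison or a reference list, which these per-row rules do not attempt.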
Case Study
Afco Foods & Beverages is a new company that produces dairy, bread and meat products, with its production unit located at Baroda. Their products are sold in the northern, north-western and western regions of India. They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda. The President of the company wants sales information.
Sales Information
Report: The number of units sold.
Total: 113
January: 14
February: 41
March: 33
April: 25
Sales Information
Report: The number of items sold for each product with time.
[Table: units by Product (Wheat Bread, Cheese, Swiss Rolls) and month (Jan, Feb, Mar, Apr); cell values in extraction order: 6 8 16 25 6 6 21 17 8]
Sales Information
Report: The number of items sold in each city for each product with time.
[Table: units by city (Mumbai), Product (Wheat Bread) and Time (Jan, Feb, Mar, Apr); cell values in extraction order: 3 10 3 4 16 16 6 6 3 7 8 3 4 9 15]
Sales Information
Report: The number of items sold and the income in each region for each product with time.
[Table: rupees (Rs) and units (U) per month (Jan, Feb, Mar, Apr) for Wheat Bread, Cheese and Swiss Rolls in Mumbai and in Pune; cell values in extraction order: 7.95 7.32 3 4 16.47 9 27.45 15 7.95 7.32 3 4 42.40 29.98 16 16 7.44 15.90 10.98 7.44 3 6 6 3 17.36 21.20 7 8 24.80 10]
Product: Cheese, Cheese, Swiss Rolls
Month: January, January, February
Units: 3, 4, 3, 4, 16
Rupees: 7.95, 7.32, 7.95, 7.32, 42.40
Month      Units  Rupees
1/1/1998   3      7.95
1/1/1998   4      7.32
1/1/1998   3      7.95
1/1/1998   4      7.32
2/1/1998   16     42.40
[Figure: the Sales Fact table linked to the Product, Product Category, Region and Time dimensions.]
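A fact table with dimension tables can be sketched in SQLite as below. The table and column names, the category labels and the sample rows are hypothetical, and for brevity the product category is stored as a column of the product dimension, which may differ from the layout in the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product(product_id),
        region_id  INTEGER REFERENCES region(region_id),
        time_id    INTEGER REFERENCES time_dim(time_id),
        units      INTEGER,
        rupees     REAL
    );
    INSERT INTO product VALUES (1, 'Cheese', 'Dairy'), (2, 'Swiss Rolls', 'Bread');
    INSERT INTO region VALUES (1, 'Mumbai'), (2, 'Pune');
    INSERT INTO time_dim VALUES (1, 'January'), (2, 'February');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 4, 7.32), (1, 2, 1, 4, 7.32), (2, 1, 2, 16, 42.40);
    """
)

# Join the fact table to its dimensions and total the units per product and month.
units_by_product_month = conn.execute(
    """
    SELECT p.name, t.month, SUM(f.units)
    FROM sales_fact f
    JOIN product p ON p.product_id = f.product_id
    JOIN time_dim t ON t.time_id = f.time_id
    GROUP BY p.name, t.month
    ORDER BY p.name, t.month
    """
).fetchall()
```

The fact table holds only keys and measures (units, rupees); descriptive attributes live in the dimensions, so a report is a join followed by a GROUP BY.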
OLAP Cube
City    Product      Time  Units  Dollars
All     All          All   113    251.26
Mumbai  All          All   64     146.07
Mumbai  White Bread  All   38     98.49
Mumbai                     13     32.24
Mumbai                     3      7.44
Mumbai                     3      7.44
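A cube of this kind stores a total for every combination of dimension values, with "All" standing for the summary over a whole dimension. The sketch below builds such totals from a handful of hypothetical fact rows; the rows are invented for illustration and do not reproduce the figures in the slides.

```python
from collections import defaultdict
from itertools import product as cartesian

# Hypothetical fact rows: (city, product, month, units).
facts = [
    ("Mumbai", "Wheat Bread", "Jan", 3),
    ("Mumbai", "Cheese", "Jan", 4),
    ("Pune", "Wheat Bread", "Jan", 3),
    ("Pune", "Cheese", "Feb", 16),
]

cube = defaultdict(int)
for city, prod, month, units in facts:
    # Each dimension contributes either its own value or the summary member "All",
    # so every fact row is added to 2 * 2 * 2 = 8 cells of the cube.
    for key in cartesian((city, "All"), (prod, "All"), (month, "All")):
        cube[key] += units
```

For example, `cube[("All", "All", "All")]` is the grand total and `cube[("Mumbai", "All", "All")]` is the total for one city. Precomputing every combination trades storage for fast lookups.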
OLAP Operations
Drill Down
[Figure: Product Category (e.g. Electrical Appliance) against Time.]
Drill Up
[Figure: Product Category (e.g. Electrical Appliance) against Time.]
Slice and Dice
[Figure: Product against Time, with the slice Product=Toaster.]
Pivot
[Figure: Product, Time and Region axes.]
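The four operations can be sketched on a small in-memory data set. The sketch uses their usual meanings: drill down moves to a finer level of detail, drill up (roll up) to a coarser one, slice fixes one dimension value, dice filters on several dimensions, and pivot swaps the axes of a view. The fact rows, the Kettle product and the helper `aggregate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (category, product, region, quarter, month, units).
facts = [
    ("Electrical Appliance", "Toaster", "North", "Q1", "Jan", 5),
    ("Electrical Appliance", "Toaster", "West", "Q1", "Feb", 7),
    ("Electrical Appliance", "Kettle", "North", "Q1", "Jan", 2),
    ("Electrical Appliance", "Kettle", "West", "Q2", "Apr", 4),
]
FIELDS = ("category", "product", "region", "quarter", "month", "units")
rows = [dict(zip(FIELDS, f)) for f in facts]


def aggregate(rows, *dims):
    """Total units grouped by the given dimension names."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["units"]
    return dict(totals)


# Drill up: view the data at the coarser category/quarter level.
rolled_up = aggregate(rows, "category", "quarter")
# Drill down: view the data at the finer product/month level.
drilled_down = aggregate(rows, "product", "month")
# Slice: fix one dimension value (Product=Toaster).
toaster_slice = [r for r in rows if r["product"] == "Toaster"]
# Dice: filter on two dimensions at once.
diced = [r for r in rows if r["product"] == "Toaster" and r["region"] == "West"]
# Pivot: swap the axes of a product-by-region view to region-by-product.
by_product_region = aggregate(rows, "product", "region")
pivoted = {(reg, prod): u for (prod, reg), u in by_product_region.items()}
```

All five results are derived from the same fact rows; only the grouping, the filter or the orientation changes.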
OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data structures. The available OLAP server types are:
MOLAP (multidimensional OLAP) server
ROLAP (relational OLAP) server
HOLAP (hybrid OLAP) server
[Figure: Flat File source, a Product/Time cube, a Reporting Tool producing a Report, and the Client presentation layer.]