Topic 1.1 Need For Data Warehousing: Overview & Concepts
Textbook: Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals, by Paulraj Ponniah. Publisher: John Wiley & Sons, 2nd Edition
Objectives
Understand the desperate need for strategic information
Recognize the information crisis at every enterprise
Distinguish between operational and informational systems
Learn why past attempts to provide strategic information failed
Clearly see why data warehousing is the viable solution
Definitions
Database: The database is a place where you put your data; data that you wish to convert to information at some future time.
Database Management System: A DBMS is the software that converts the data in your database to information. It is the DBMS that provides you the capability for cross-referencing, correlating, sorting, summarizing, etc.
Operational System
Informational Systems
Operational vs DSS
advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer.
Easily summarize and roll up the information across subject areas and business dimensions
Characteristics
The new concept is not to generate fresh data, but to make use of the large volumes of existing data and to transform it into forms suitable for providing strategic information.
It is a user-centric environment, not a product.
A computing environment where users can find strategic information.
A central database that is loaded from multiple operational databases for the purpose of end-user access and decision support.
A data warehouse differs from an operational system in that the data it contains is normally static and updated in a scheduled manner through massive loading procedures.
A data warehouse is developed to accommodate random, ad hoc queries and to allow users to drill down to minute levels of detail.
Blend of Technologies
Different technologies are needed to support data warehousing functions.
Scenario 1
ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
Report requested by the Sales Manager: sales per item type per branch for the first quarter.
[Figure: the branch systems (Delhi, Chennai, Bangalore) feed a data warehouse; query and analysis tools produce the report for the Sales Manager.]
Scenario 2
One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and the data entry operators have to wait for some time.
Solution 2
Extract the data needed for analysis from the operational database. Store it in a warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will contain data with a historical perspective.
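The extract-and-refresh cycle described above can be sketched as follows. This is a minimal illustration, not a production ETL job: the table names (sales, sales_history), the sample rows, and the refresh function are all assumptions made for the example, and both databases are in-memory SQLite databases.

```python
import sqlite3

# Hypothetical operational (OLTP) database with a single sales table.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, sold_on TEXT, item TEXT, units INTEGER)"
)
oltp.executemany(
    "INSERT INTO sales (sold_on, item, units) VALUES (?, ?, ?)",
    [("2024-01-05", "bread", 3), ("2024-01-06", "cheese", 4)],
)

# Separate warehouse database: reports run here, not on the OLTP system.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_history (id INTEGER PRIMARY KEY, sold_on TEXT, item TEXT, units INTEGER)"
)


def refresh(source, target):
    """Copy rows added to the source since the last refresh; return how many."""
    last_id = target.execute(
        "SELECT COALESCE(MAX(id), 0) FROM sales_history"
    ).fetchone()[0]
    new_rows = source.execute(
        "SELECT id, sold_on, item, units FROM sales WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    target.executemany("INSERT INTO sales_history VALUES (?, ?, ?, ?)", new_rows)
    target.commit()
    return len(new_rows)


first_load = refresh(oltp, warehouse)    # initial load copies both rows
oltp.execute("INSERT INTO sales (sold_on, item, units) VALUES ('2024-01-07', 'bread', 5)")
second_load = refresh(oltp, warehouse)   # scheduled refresh copies only the new row
total_units = warehouse.execute("SELECT SUM(units) FROM sales_history").fetchone()[0]
```

Because the manager's queries read sales_history in the warehouse, they no longer compete with the data entry operators for the operational database.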
[Figure: the Data Entry Operator posts transactions to the operational database; data is extracted into the Data Warehouse, which supplies the report to the Manager.]
Scenario 3
Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.
Solution 3
Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query and analysis tools to support ad hoc queries.
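A cleaning and transformation step of this kind might look like the sketch below. The field names (order_id, product, units), the sample records, and the rejection rules are assumptions chosen for illustration; a real warehouse would apply rules specific to its sources.

```python
def clean(rows):
    """Return (accepted, rejected) rows after basic cleaning and transformation."""
    accepted, rejected, seen = [], [], set()
    for row in rows:
        name = (row.get("product") or "").strip().title()  # normalise spacing and case
        units = row.get("units")
        if not name or units is None or units < 0:         # incomplete or invalid record
            rejected.append(row)
            continue
        key = (row.get("order_id"), name)
        if key in seen:                                     # duplicate entry
            rejected.append(row)
            continue
        seen.add(key)
        accepted.append(
            {"order_id": row.get("order_id"), "product": name, "units": int(units)}
        )
    return accepted, rejected


raw = [
    {"order_id": 1, "product": "  chocolate cake ", "units": 2},
    {"order_id": 1, "product": "Chocolate Cake", "units": 2},   # duplicate of the first row
    {"order_id": 2, "product": "", "units": 5},                 # missing product name
    {"order_id": 3, "product": "butter cookies", "units": -1},  # invalid quantity
    {"order_id": 4, "product": "Butter Cookies", "units": 6},
]
accepted, rejected = clean(raw)
```

Only the accepted rows would be loaded; the rejected rows are kept aside so that the source of each quality problem can be investigated.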
[Figure: Solution 3 diagram; surviving labels: Expansion, President.]
Inmon's definition
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.
Subject-oriented
A data warehouse is organized around subjects such as sales, product, and customer. It focuses on modeling and analysis of data for decision makers. It excludes data that is not useful in the decision support process.
Integration
A data warehouse is constructed by integrating multiple heterogeneous sources. Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, legacy system, and flat file sources are integrated into the Data Warehouse.]
Integration
Data is made consistent in terms of:
encoding structures
measurement of attributes
physical attributes of data
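As a sketch of what consistent encoding structures and consistent measurement of attributes mean in practice, the example below maps source-specific codes and units to a single warehouse convention. The code tables, field names, and sample records are assumptions for illustration only.

```python
# Hypothetical source-specific encodings mapped to one warehouse convention.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
CM_PER_UNIT = {"cm": 1.0, "m": 100.0, "in": 2.54}


def integrate(record):
    """Convert one source record to the warehouse's encoding and unit of measure."""
    gender = GENDER_CODES[str(record["gender"]).strip().lower()]
    length_cm = round(record["length"] * CM_PER_UNIT[record["unit"]], 2)
    return {"gender": gender, "length_cm": length_cm}


sources = [
    {"gender": "male", "length": 1.5, "unit": "m"},  # e.g. from an RDBMS
    {"gender": 0, "length": 10, "unit": "in"},       # e.g. from a legacy system
    {"gender": "F", "length": 120, "unit": "cm"},    # e.g. from a flat file
]
integrated = [integrate(r) for r in sources]
```

After integration every record uses the same gender code and the same unit, whichever source system it came from.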
Time-variant
Provides information from a historical perspective, e.g. the past 5-10 years.
Every key structure contains, either implicitly or explicitly, an element of time.
Nonvolatile
Data, once recorded, is not updated.
A data warehouse requires only two operations in data accessing:
Initial loading of data (load)
Access of data (access)
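The load and access operations, together with the time element in every key, can be illustrated with a small append-only structure. The class name WarehouseTable, its methods, and the sample data are hypothetical; the sketch only shows that a change arrives as a new dated row rather than as an update.

```python
class WarehouseTable:
    """Append-only store: rows can be loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, rows):
        # Each loaded row carries an explicit time element in its key.
        for row in rows:
            self._rows.append({"snapshot_date": snapshot_date, **row})

    def access(self, **filters):
        # Return copies so callers cannot modify the stored history.
        return [
            dict(r)
            for r in self._rows
            if all(r.get(k) == v for k, v in filters.items())
        ]


table = WarehouseTable()
table.load("2024-01-31", [{"customer": "C1", "city": "Pune"}])
table.load("2024-02-29", [{"customer": "C1", "city": "Mumbai"}])  # change arrives as a new row
history = table.access(customer="C1")
```

Both versions of the customer record remain available, which is what gives the warehouse its historical perspective.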
Feature            Operational                            Informational
Characteristic     Operational processing                 Informational processing
Orientation        Transaction                            Analysis
User               Clerk, DBA, database professional      Knowledge workers
Function           Day-to-day operations                  Decision support
Data               Current                                Historical
View               Detailed, flat relational              Summarized, multidimensional
DB design          Application oriented                   Subject oriented
Access             Read/write                             Mostly read
Focus              Data in                                Information out
Records accessed   Tens                                   Millions
Number of users    Thousands                              Hundreds
DB size            100 MB to GB                           100 GB to TB
Priority           High performance, high availability    High flexibility, end-user autonomy
Metric             Transaction throughput                 Query throughput
[Figure: data warehouse architecture. DATA SOURCES (operational DBs, external sources) provide reconciled data; DATA MARTS serve the TOOLS (analysis, query/reporting, data mining).]
Star Schema
A single, large, central fact table and one table for each dimension. Every fact points to one tuple in each of the dimensions and has additional attributes. Does not capture hierarchies directly.
Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins.
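A star schema of this shape can be sketched with SQLite. The table and column names (fact_sales, dim_product, dim_city, dim_time) and all sample rows are assumptions made for the example; they are not taken from the textbook.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_city    (city_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, month TEXT, quarter TEXT);
-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    city_id    INTEGER REFERENCES dim_city(city_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    units      INTEGER
);
INSERT INTO dim_product VALUES (1, 'Wheat Bread', 'Bread'), (2, 'Cheese', 'Dairy');
INSERT INTO dim_city    VALUES (1, 'Mumbai', 'West'), (2, 'Delhi', 'North');
INSERT INTO dim_time    VALUES (1, 'Jan', 'Q1'), (2, 'Feb', 'Q1');
INSERT INTO fact_sales  VALUES (1, 1, 1, 3), (2, 1, 1, 4), (1, 2, 2, 5), (2, 2, 2, 6);
""")

# One join per dimension is enough, because each dimension is a single table.
units_by_category = db.execute("""
    SELECT p.category, SUM(f.units)
    FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()
```

The category is stored as a plain column of dim_product, so the hierarchy is not modelled as a separate table; that is the sense in which a star schema does not capture hierarchies directly.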
SnowFlake Schema
A variant of the star schema model. A single, large, central fact table and one or more tables for each dimension. Dimension tables are normalized, i.e. dimension table data is split into additional tables.
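The normalization of a dimension can be sketched as follows, again with SQLite. The product and product_category tables and their rows are illustrative assumptions; only the product dimension is shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- The category attributes move out of the product dimension into their own table.
CREATE TABLE product_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES product_category(category_id)
);
CREATE TABLE sales_fact (product_id INTEGER REFERENCES product(product_id), units INTEGER);
INSERT INTO product_category VALUES (1, 'Bread'), (2, 'Dairy');
INSERT INTO product VALUES (1, 'Wheat Bread', 1), (2, 'Swiss Rolls', 1), (3, 'Cheese', 2);
INSERT INTO sales_fact VALUES (1, 3), (2, 4), (3, 9);
""")

# Reaching the category now takes two joins instead of one.
units_by_category = db.execute("""
    SELECT c.name, SUM(f.units)
    FROM sales_fact f
    JOIN product p          ON p.product_id = f.product_id
    JOIN product_category c ON c.category_id = p.category_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
```

The trade-off is visible in the query: the category name is stored once instead of being repeated in every product row, at the cost of an extra join.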
Fact Constellation
Multiple fact tables share dimension tables. This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation. Sophisticated applications require such a schema.
Case Study
Afco Foods & Beverages is a new company which produces dairy, bread and meat products, with its production unit located at Baroda. Their products are sold in the North, North West and Western regions of India. They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda. The President of the company wants sales information.
Sales Information
Report: The number of units sold. Total: 113
Month      Units
January    14
February   41
March      33
April      25
Sales Information
Report: The number of items sold for each product with time
[Table: units sold per product per month; the layout was lost in extraction. Surviving values: 17, 8.]
Sales Information
Report: The number of items sold in each City for each product with time
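[Table: rows are City (Mumbai, Pune) by Product (Wheat Bread, Cheese, Swiss Rolls); columns are Time (Jan, Feb, Mar, Apr); cells hold units sold. Cell positions were lost in extraction; surviving values in source order: 3 4 3 4, 3 16 16 6 6 3, 10, 7 8, 15.]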
Sales Information
Report: The number of items sold and income in each region for each product with time.
[Table: rows are City (Mumbai, Pune) by Product (Wheat Bread, Cheese, Swiss Rolls); columns are Time (Jan, Feb, Mar, Apr), each split into income (Rs) and units (U). Cell positions were lost in extraction; surviving values in source order: 7.95 7.32 3 4 16.47 9 27.45 15 7.95 7.32 3 4 42.40 29.98 7.44 16 15.90 16 10.98 7.44 3 6 6 3 17.36 21.20 7 8 24.80 10.]
[Figure: schema for the case study. The Sales Fact table (measure: Units) is linked to the Product, Region, and Time tables, and Product is linked to Product Category. Surviving key columns: Product_Category_ID, City_ID.]
OLAP Cube
City     Product       Time     Units   Dollars
All      All           All      113     251.26
Mumbai   All           All      64      146.07
Mumbai   White Bread   All      38      98.49
Mumbai   (lost)        (lost)   13      32.24
Mumbai   (lost)        (lost)   3       7.44
Mumbai   (lost)        (lost)   3       7.44
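The "All" rows of a cube are aggregates over every value of a dimension. A minimal sketch of how such cells could be computed is given below; the detail rows are hypothetical and are not the case study's figures, and build_cube is an illustrative helper, not part of any OLAP product.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail rows: (city, product, month, units).
facts = [
    ("Mumbai", "Wheat Bread", "Jan", 3),
    ("Mumbai", "Cheese", "Jan", 4),
    ("Pune", "Wheat Bread", "Jan", 3),
    ("Pune", "Cheese", "Feb", 6),
]


def build_cube(rows):
    """Sum units for every combination of concrete values and 'All'."""
    cube = defaultdict(int)
    for city, prod, month, units in rows:
        # Each detail row contributes to 2**3 = 8 cells of the cube.
        for key in product((city, "All"), (prod, "All"), (month, "All")):
            cube[key] += units
    return dict(cube)


cube = build_cube(facts)
```

With these rows, cube[("All", "All", "All")] is the grand total and cube[("Mumbai", "All", "All")] is the total for Mumbai across all products and months, matching the structure of the first two rows of the table above.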
OLAP Operations
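Drill Down: move from a summarized level to a more detailed level of a hierarchy, e.g. Product Category (e.g. Electrical Appliance) to Sub Category (e.g. Kitchen) to Product (e.g. Toaster), viewed against Time.
Drill Up: move in the opposite direction, from Product (e.g. Toaster) to Sub Category (e.g. Kitchen) to Product Category (e.g. Electrical Appliance), viewed against Time.
Slice and Dice: restrict the cube to selected values of one or more dimensions, e.g. Product = Toaster, viewed against Time.
Pivot: rotate the axes of the view, e.g. rearrange Product, Time, and Region.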
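The four OLAP operations can be sketched on a small in-memory data set. The rows below are hypothetical (only the Electrical Appliance, Kitchen, and Toaster labels come from the slides), and aggregate and slice_ are illustrative helpers rather than functions of any OLAP server.

```python
from collections import defaultdict

# Hypothetical detail rows; category > sub_category > product is the hierarchy.
rows = [
    {"category": "Electrical Appliance", "sub_category": "Kitchen",
     "product": "Toaster", "region": "West", "month": "Jan", "units": 5},
    {"category": "Electrical Appliance", "sub_category": "Kitchen",
     "product": "Mixer", "region": "West", "month": "Jan", "units": 2},
    {"category": "Electrical Appliance", "sub_category": "Kitchen",
     "product": "Toaster", "region": "North", "month": "Feb", "units": 4},
    {"category": "Electrical Appliance", "sub_category": "Laundry",
     "product": "Iron", "region": "North", "month": "Feb", "units": 1},
]


def aggregate(data, *dims):
    """Total units grouped by the given dimensions."""
    totals = defaultdict(int)
    for r in data:
        totals[tuple(r[d] for d in dims)] += r["units"]
    return dict(totals)


def slice_(data, **fixed):
    """Keep only rows matching the fixed dimension values."""
    return [r for r in data if all(r[k] == v for k, v in fixed.items())]


by_category = aggregate(rows, "category", "month")             # drilled-up (summary) view
by_product = aggregate(rows, "product", "month")               # drill down to product level
toaster = aggregate(slice_(rows, product="Toaster"), "month")  # slice: Product = Toaster
pivoted = aggregate(rows, "month", "product")                  # pivot: swap the two axes
```

Drill down and drill up differ only in which level of the hierarchy is passed to aggregate, while pivot changes the order of the dimensions without changing any totals.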
OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data structures. The available OLAP server types are:
MOLAP server
ROLAP server
HOLAP server
Presentation
[Figure: presentation layer; surviving labels: Product, Time, Reporting Tool, Report, Flat File, Client.]