Week 7-1
Week # 7
Agenda
What is Data Warehousing?
Data Warehousing Concepts:
OLTP vs OLAP
SQL reporting
Relational Warehouse
The Past and The Problem
Often resources were taxed with both needs on the same systems
Need for Data Warehousing
improved performance)
Operational system – a system that is used to run a business in real
time, based on current data; also called a system of record
Issues with Company-Wide View
Synonyms
Missing data
What is a Data Warehouse?
Data warehousing (DWH) is the act of organizing and storing data so that its retrieval is efficient and insightful.
It is also described as the process of transforming data into information.
Subject-Oriented, Integrated
Time-Variant, Non-Volatile
Data Warehouse: A Multi-Tiered Architecture
Components of a Data Warehouse
Hardware:
Disk Storage - Speed and enough storage for the loaded data set
Components of a Data Warehouse
DBMS:
Availability
Metadata Repositories
Getting the Data In
• Data will come from multiple databases and files within the organization
Getting the Data In
Three Steps:
1. Extraction Phase
Transformation Phase:
Takes data and turns it into a form that is suitable for insertion into the warehouse
Combines related data
Removes redundancies
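The transformation work named on this slide (combining related data and removing redundancies) can be sketched in Python. This is only an illustration under stated assumptions: the two source extracts, the field names (`cust_id`, `name`, `city`), and the `transform` helper are hypothetical and do not come from the lecture.

```python
# Hypothetical extracts from two operational sources (assumed field names).
crm_rows = [
    {"cust_id": 1, "name": "Ali Khan"},
    {"cust_id": 2, "name": "Sara Ahmed"},
    {"cust_id": 2, "name": "Sara Ahmed"},  # redundant duplicate row
]
billing_rows = [
    {"cust_id": 1, "city": "Lahore"},
    {"cust_id": 2, "city": "Karachi"},
]


def transform(crm, billing):
    """Combine related rows by cust_id and drop redundant duplicates."""
    city_by_id = {row["cust_id"]: row["city"] for row in billing}
    seen = set()
    warehouse_rows = []
    for row in crm:
        if row["cust_id"] in seen:  # removes redundancies
            continue
        seen.add(row["cust_id"])
        # Combines related data from both sources into one record.
        warehouse_rows.append({**row, "city": city_by_id.get(row["cust_id"])})
    return warehouse_rows


rows = transform(crm_rows, billing_rows)
print(rows)
```

The result is one row per customer, ready for insertion into the warehouse. A real pipeline would also reconcile conflicting values, which this sketch ignores.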
A data warehouse is based on a multidimensional data model which views data in the
form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
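As a rough illustration of viewing a sales cube along different dimensions, the sketch below stores each cell under a (time, item, location) key and aggregates over whichever dimensions are requested. The sample figures, the dimension names, and the `view_by` helper are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sales facts: (time, item, location) -> units sold.
sales = {
    ("Q1", "phone", "Lahore"): 10,
    ("Q1", "laptop", "Lahore"): 4,
    ("Q2", "phone", "Karachi"): 7,
    ("Q2", "laptop", "Lahore"): 5,
}
DIMENSIONS = ("time", "item", "location")


def view_by(cube, dims):
    """Aggregate the cube's measure over the chosen dimensions only."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[tuple(key[i] for i in idx)] += units
    return dict(totals)


by_time = view_by(sales, ["time"])
by_item_location = view_by(sales, ["item", "location"])
print(by_time)
print(by_item_location)
```

The same facts answer both "how much was sold per quarter" and "how much of each item was sold per city", which is the sense in which the cube is multidimensional.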
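Cube: A Lattice of Cuboids
[Figure: lattice of cuboids over the dimensions time, item, location, and supplier]
1-D cuboids: time, item, location, supplier
2-D cuboids: (time, item), (time, location), (time, supplier), (item, location), (item, supplier), (location, supplier)
3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier)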
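The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions. The sketch below assumes the four dimensions that appear on this slide (time, item, location, supplier) and uses the standard library's `itertools.combinations`; the variable names are illustrative.

```python
from itertools import combinations

dimensions = ["time", "item", "location", "supplier"]

# Group every subset of the dimensions by its size (the cuboid's dimensionality).
lattice = {
    k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)
}

for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)

# With n dimensions there are 2**n cuboids in total.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

With four dimensions this gives 16 cuboids, including the empty subset (the fully aggregated view) and the full four-dimensional one.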
Multidimensional Data
Pivot (rotate):
Reorients the cube for visualization, for example turning a 3-D view into a series of 2-D planes
CS-822 Data Mining, Spring 2023
Roll-up
Roll-up performs aggregation on a data cube in any
of the following ways:
By dimension reduction
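Roll-up by dimension reduction can be sketched as removing one dimension from the cube and summing the measure over its values. The cube contents and the `roll_up` helper below are hypothetical, chosen only to make the operation concrete.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (time, item, location) -> sales amount.
dims = ("time", "item", "location")
cube = {
    ("2023-Q1", "phone", "Lahore"): 100,
    ("2023-Q1", "phone", "Karachi"): 150,
    ("2023-Q1", "laptop", "Lahore"): 300,
    ("2023-Q2", "phone", "Lahore"): 120,
}


def roll_up(dims, cube, drop):
    """Remove one dimension and sum the measure over its values."""
    keep = [i for i, d in enumerate(dims) if d != drop]
    rolled = defaultdict(int)
    for key, amount in cube.items():
        rolled[tuple(key[i] for i in keep)] += amount
    return tuple(dims[i] for i in keep), dict(rolled)


new_dims, rolled = roll_up(dims, cube, drop="location")
print(new_dims)
print(rolled)
```

Dropping `location` turns the 3-D cube into a 2-D one keyed by (time, item), with the two Q1 phone cells merged into a single total.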
Drill-down
Drill-down is the reverse operation of roll-up. It is
performed in either of the following ways:
Slice
OLAP Operations
Dimensions
The tables that describe the dimensions involved are called dimension tables.
Dividing a DWH project into dimensions provides structured information for
analysis and reporting.
End users run queries against these dimension tables, which contain descriptive
information.
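A small sketch of querying dimension tables, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`dim_item`, `dim_location`, `fact_sales`) and the sample rows are assumptions for illustration; the central fact table is also an assumption, since the slides name only the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes (names are assumptions).
    CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
    -- The fact table holds the measure plus a key into each dimension.
    CREATE TABLE fact_sales (
        item_id INTEGER REFERENCES dim_item(item_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        units INTEGER
    );
    INSERT INTO dim_item VALUES (1, 'phone', 'electronics'), (2, 'desk', 'furniture');
    INSERT INTO dim_location VALUES (1, 'Lahore'), (2, 'Karachi');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 7), (2, 1, 3);
    """
)

# Report units sold, described by attributes taken from the dimension tables.
report = conn.execute(
    """
    SELECT i.category, l.city, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_id = f.item_id
    JOIN dim_location AS l ON l.location_id = f.location_id
    GROUP BY i.category, l.city
    ORDER BY i.category, l.city
    """
).fetchall()
print(report)
conn.close()
```

The descriptive columns (`category`, `city`) come from the dimension tables, while the numbers being summed come from the fact table.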
Star Schema
Components of a Data Warehouse – Data Mining