Week-2-Data Warehouse and Olap

This document provides an overview of data warehouses and OLAP. It defines a data warehouse as a central repository of integrated data from multiple sources that can be analyzed to generate insights. Key differences between databases and data warehouses are discussed, such as databases being transactional while data warehouses are designed for analysis. The evolution of data warehousing and the need for data warehouses to support decision making is also covered. Finally, the document discusses OLTP vs OLAP and the architecture and components of a typical data warehouse.

Uploaded by

Michael Zewdie
Data Warehouse and OLAP

Week-2
Nov-2023

Instructor:

Dr.T.GopiKrishna
Outline
Objective: The primary purpose of a data warehouse is to provide a central repository of information that can be quickly analyzed and queried to generate relevant insights.
• Differences between Database and DW
• Overview of Data Warehouse, definitions
• Evolution of DW and need for DW
• Types of DW
• OLAP and OLTP
• Applications of DW, advantages, disadvantages
• Architecture and Components of DW
Key Difference between Database and Data Warehouse
• A database is a collection of related data that represents some elements of the real world, whereas a data warehouse is an information system that stores historical and cumulative data from single or multiple sources.
• A database is designed to record data, whereas a data warehouse is designed to analyze data.
• A database is an application-oriented collection of data, whereas a data warehouse is a subject-oriented collection of data.
• A database uses Online Transactional Processing (OLTP), whereas a data warehouse uses Online Analytical Processing (OLAP).
• Database tables and joins are complicated because they are normalized, whereas data warehouse tables and joins are simpler because they are denormalized.
• ER modelling techniques are used for designing databases, whereas dimensional (data) modelling techniques are used for designing data warehouses.
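The normalized versus denormalized point can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only for illustration; the same question (total amount per city) needs a join on the normalized tables but not on the wide, denormalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized (database style): customers and orders live in separate tables.
# All names and values here are hypothetical.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(1, "Abel", "Adama"), (2, "Sara", "Hawassa")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0)])

# Denormalized (warehouse style): one wide table, customer attributes repeated per order.
cur.execute("CREATE TABLE sales_wide (order_id INTEGER, customer_name TEXT, city TEXT, amount REAL)")
cur.execute("""INSERT INTO sales_wide
               SELECT o.id, c.name, c.city, o.amount
               FROM orders o JOIN customer c ON c.id = o.customer_id""")

# The same question asked of both designs.
normalized = cur.execute("""SELECT c.city, SUM(o.amount)
                            FROM orders o JOIN customer c ON c.id = o.customer_id
                            GROUP BY c.city ORDER BY c.city""").fetchall()
denormalized = cur.execute("""SELECT city, SUM(amount)
                              FROM sales_wide
                              GROUP BY city ORDER BY city""").fetchall()
```

Both queries return the same totals; the denormalized query is shorter to write at the cost of storing the customer attributes redundantly.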
Bad decisions can lead to disaster!
• Data warehousing is at the base of decision support systems (DSS).
Why Data Warehousing & OLAP is important?
It helps to:
• Understand the information hidden within the organization’s data.
• See data from different angles: product, client, time, geographical area.
• Get adequate statistics to support your argument.
• Get a glimpse of the future…
What is a Data Warehouse?
A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant
Data Warehouse
A data warehouse system is also known by the following names:
• Decision Support System (DSS)
• Executive Information System
• Management Information System
• Business Intelligence Solution
• Analytic Application
What is a Data Warehouse?
An Alternative Viewpoint
“A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.”
-- W.H. Inmon, regarded as the father of data warehousing
In short, it is a collection of data that is used primarily in organizational decision making.



Subject-Oriented - Characteristics of a Data Warehouse
[Figure: operational systems versus the data warehouse. Labels shown: Leads, Prospects, Customers, Products, Quotes, Regions, Time, Orders]
Focus is on Subject Areas rather than Applications

Subject oriented contd..
• The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
• Typical subject areas in DWs are Customer, Product, Order, Claim, Account, …
Subject oriented contd..
• Case study: customer as subject in a DW
– The DW is organized in this case by customer.
– It may consist of 10, 100 or more physical tables, all related.
Integrated
• The data warehouse contains data from most or all of an organization's operational systems, and this data is made consistent.
– Use cases: gender encoding, units of measurement, conflicting keys
Integrated (use case) - Characteristics of a Data Warehouse
Attribute | Appl A | Appl B | Appl C | Data warehouse
Gender encoding | m, f | 1, 0 | male, female | m, f
Balance format | dec fixed (13,2) | pic 9(9)V99 | pic S9(7)V99 comp-3 | dec fixed (13,2)
Balance field name | bal-on-hand | current-balance | cash-on-hand | Current balance
Date format | julian | yymmdd | absolute | julian

Integrated View Is The Essence Of A Data Warehouse

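As a minimal sketch of the gender use case, the snippet below maps each application's encoding to the single warehouse encoding (m/f). The function name and the choice that application B's 1 means male and 0 means female are assumptions made for illustration; the slide does not say which is which.

```python
# Hypothetical per-application encodings, following the use case:
# A uses m/f, B uses 1/0, C uses male/female.
# ASSUMPTION: in application B, 1 = male and 0 = female.
GENDER_MAPS = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},
    "C": {"male": "m", "female": "f"},
}

def integrate_gender(app, raw):
    """Translate an application-specific gender code to the warehouse code (m/f)."""
    key = str(raw).strip().lower()
    try:
        return GENDER_MAPS[app][key]
    except KeyError:
        raise ValueError(f"unknown gender code {raw!r} for application {app!r}") from None
```

For example, `integrate_gender("B", 1)` and `integrate_gender("C", "Male")` both yield the warehouse code `"m"`, while an unknown code raises an error instead of silently loading inconsistent data.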


Non-volatile
– Data in the data warehouse is never overwritten or deleted. Once committed, the data is static, read-only, and retained for future reporting.
– Data is loaded, but not updated.
– When subsequent changes occur, a new snapshot record is written.
Non-volatile contd.. - Characteristics of a Data Warehouse
Operational | Data Warehouse
insert, change, delete, replace | load, read-only access

Data Warehouse Is Relatively Static In Nature

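The load-only, snapshot-per-change behaviour can be sketched as an append-only store. The class name, method names, and sample values are hypothetical; the point is only that a change produces a new snapshot record rather than an update, so earlier states remain available for reporting.

```python
from datetime import date

class SnapshotStore:
    """Append-only store: rows are loaded, never updated or deleted."""

    def __init__(self):
        self._rows = []  # list of (key, as_of_date, value); only ever appended to

    def load(self, key, as_of, value):
        # A change in the source is recorded as a NEW snapshot row.
        self._rows.append((key, as_of, value))

    def history(self, key):
        """All snapshots for a key, oldest first."""
        return sorted((d, v) for k, d, v in self._rows if k == key)

    def as_of(self, key, when):
        """The value that was current on the given date, or None if none existed yet."""
        past = [(d, v) for d, v in self.history(key) if d <= when]
        return past[-1][1] if past else None

store = SnapshotStore()
store.load("cust-1", date(2023, 1, 1), 100)
store.load("cust-1", date(2023, 6, 1), 250)  # balance changed: new snapshot, old one kept
```

Because nothing is overwritten, `store.as_of("cust-1", date(2023, 3, 1))` can still report the earlier balance.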


Time-variant
– The changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.
– Different environments have different time horizons.
• While a 60-to-90 day time horizon is normal for operational systems, a data warehouse has a 5-to-10 year horizon.
Time Variant contd.. - Characteristics of a Data Warehouse
Operational | Data Warehouse
Current value data | Snapshot data
Time horizon: 60-90 days | Time horizon: 5-10 years
 | Stores historical data

Data Warehouse Typically Spans Across Time


More generally, a DW is a
– Repository of an organization’s electronically stored data.
– Designed to facilitate reporting and analysis.
Typical Features of DW..
– Reside on computers dedicated to this function.
– Run on a DBMS such as Oracle, IBM DB2, Teradata or Microsoft SQL Server.
– Retain data for long periods of time.
– Consolidate data obtained from a variety of sources.
– Are built around their own carefully designed data model.
Evolution of Data Warehousing
1. 1960 - 1985 : MIS Era
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
2. 1985 - 1990 : Querying Era
• Ad hoc, unstructured access to corporate data
• SQL as interface not scalable
• Cannot handle complex analysis
3. 1990 - 20xx : Analysis Era
• Trend Analysis
• What If ?
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery
Need for Data Warehousing
• Better business intelligence for end-users

• Reduction in time to locate, access, and analyze information

• Consolidation of disparate information sources

• Strategic advantage over competitors

• Faster time-to-market for products and services

• Replacement of older, less-responsive decision support systems

• Reduction in demand on IS to generate reports


Types of Data Warehouse

Three main types of Data Warehouses (DWH) are:

1. Enterprise Data Warehouse (EDW):

An Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides decision support services across the enterprise and offers a unified approach for organizing and representing data. It also provides the ability to classify data according to subject and to give access according to those divisions.

2. Operational Data Store:

An Operational Data Store (ODS) is a data store required when neither the data warehouse nor the OLTP systems support an organization's reporting needs. An ODS is refreshed in real time. Hence, it is widely preferred for routine activities such as storing employee records.

3. Data Mart:

A data mart is a subset of the data warehouse. It is specially designed for a particular line of business, such as sales or finance. In an independent data mart, data can collect …
OLTP (Online Transaction Processing):
Also known as operational data, it represents day-to-day operational business activities:
• Purchasing, sales, production, distribution, …
– Typically used for data entry and retrieval transaction processing.
– Reflects only the current state of the data.


OLTP vs. DW
• In contrast to OLTP, the DW side represents front-end analytics based on a DW repository.
– It provides information for activities like resource planning, capital budgeting, marketing initiatives, …
– It is decision oriented.
OLTP Systems Vs Data Warehouse
Remember: between OLTP and data warehouse systems,
• users are different
• data content is different
• data structures are different
• hardware is different
Understanding The Differences Is The Key
OLTP Vs Warehouse: Properties
Operational System | Data Warehouse
Transaction processing | Query processing
Predictable CPU usage | Random CPU usage
Time sensitive | History oriented
Operator view | Managerial view
Normalized, efficient design for TP | Denormalized design for query processing
Designed for atomicity, consistency, isolation and durability | Designed for a quiet or static database
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Relatively smaller database | Large database size
Many concurrent users | Relatively few concurrent users
Volatile data | Non-volatile data
Stores all data | Stores relevant data
Performance sensitive | Less sensitive to performance
Not flexible | Flexible
Efficiency | Effectiveness
Different kinds of Information Needs
• Current: Is this medicine available in stock?
• Recent: What are the tests this patient has completed so far?
• Historical: Has the incidence of tuberculosis increased in the last 5 years in the southern region?
OLAP (Online Analytical Processing)
• Data warehouse systems are well suited for Online Analytical Processing.
• OLAP describes processing at the warehouse.
• Examples of OLAP operations include drill-down and roll-up, which allow the user to view the data at differing degrees of summarization.
OLAP Contd..
• OLAP consists of summarization, consolidation, aggregation, and different-angle views (multidimensional analysis) for decision making.
OLTP Vs OLAP
Standard DB (OLTP) | Warehouse (OLAP)
Mostly updates | Mostly reads
Many small transactions | Queries are long and complex
MB - GB of data | GB - TB of data
Current snapshot | History
Index/hash on primary key | Lots of scans
Raw data | Summarized, reconciled data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)

Comparison between OLTP and OLAP
systems Contd..
OLTP Vs ODS Vs DWH
Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data granularity | Detailed | Detailed and lightly summarized | Summarized and derived
Data organization | Functional | Subject-oriented | Subject-oriented
Data quality | All application-specific detailed data needed to support a business activity | All integrated data needed to support a business activity | Data relevant to management information needs
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data stability | Dynamic | Somewhat dynamic | Static
Data update | Field by field | Field by field | Controlled batch
Data usage | Highly structured, repetitive | Somewhat structured, some analytical | Highly unstructured, heuristic or analytical
Database size | Moderate | Moderate | Large to very large
Database structure stability | Stable | Somewhat stable | Dynamic
Development methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Operational priorities | Performance and availability | Availability | Access flexibility and end user autonomy
Philosophy | Support day-to-day operation | Support day-to-day decisions & operational activities | Support managing the enterprise
Predictability | Stable | Mostly stable, some unpredictability | Unpredictable
Response time | Sub-second | Seconds to minutes | Seconds to minutes
Return set | Small amount of data | Small to medium amount of data | Small to large amount of data
OLAP Operations

Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on multidimensional data.
Here is the list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in either of the following ways:
• By climbing up a concept hierarchy for a dimension
• By dimension reduction
On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country. The data is then grouped into countries rather than cities.
When roll-up is performed by dimension reduction, one or more dimensions are removed from the data cube.
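Both forms of roll-up can be sketched in plain Python over a small list of fact rows. The rows, the city-to-country mapping, and the function names are hypothetical illustrations, not data from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, quarter, item, units_sold)
FACTS = [
    ("Toronto", "Q1", "Mobile", 605),
    ("Toronto", "Q1", "Modem", 825),
    ("Vancouver", "Q1", "Mobile", 400),
    ("Chicago", "Q1", "Mobile", 440),
    ("New York", "Q1", "Modem", 1560),
]

# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {
    "Toronto": "Canada", "Vancouver": "Canada",
    "Chicago": "USA", "New York": "USA",
}

def roll_up_location(facts):
    """Roll-up by climbing the hierarchy: aggregate cities into countries."""
    totals = defaultdict(int)
    for city, quarter, item, units in facts:
        totals[(CITY_TO_COUNTRY[city], quarter, item)] += units
    return dict(totals)

def roll_up_drop_item(facts):
    """Roll-up by dimension reduction: remove the item dimension entirely."""
    totals = defaultdict(int)
    for city, quarter, _item, units in facts:
        totals[(city, quarter)] += units
    return dict(totals)
```

In the first function the cube keeps three dimensions but location becomes coarser; in the second the cube loses a dimension.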
OLAP Operations
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
On drilling down, the time dimension is descended from the level of quarter to the level of month.
When drill-down is performed by introducing a new dimension, one or more dimensions are added to the data cube.
It navigates from less detailed data to more detailed data.
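A minimal sketch of drilling down the time dimension from quarter to month follows. It assumes the month-level detail is stored and that the quarterly view is derived from it; the figures and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical month-level facts: (month, units_sold)
MONTHLY = [("Jan", 100), ("Feb", 120), ("Mar", 90),
           ("Apr", 130), ("May", 110), ("Jun", 95)]

# Hypothetical concept hierarchy for the time dimension: month -> quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def by_quarter(rows):
    """The summarized (rolled-up) view: one total per quarter."""
    totals = defaultdict(int)
    for month, units in rows:
        totals[MONTH_TO_QUARTER[month]] += units
    return dict(totals)

def drill_down(rows, quarter):
    """Step down the hierarchy: expand one quarter into its months."""
    return {month: units for month, units in rows
            if MONTH_TO_QUARTER[month] == quarter}
```

Drilling into a quarter returns the monthly figures whose sum equals that quarter's total in the summarized view.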
OLAP Operations
Slice
The slice operation selects a single value on one dimension of a given cube and provides a new sub-cube. Here, slice is performed for the dimension "time" using the criterion time = "Q1".

It forms a new sub-cube that contains only the data for the selected value.


OLAP Operations
Dice
Dice selects values on two or more dimensions of a given cube and provides a new sub-cube.
The dice operation on the cube based on the following selection criteria involves three dimensions:
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
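Slice and dice can be sketched as two filters over a cube held as a dictionary keyed by (location, time, item). The cell values and function names are hypothetical; the selection criteria mirror the ones in the text.

```python
# Hypothetical cube: (location, time, item) -> units sold
CUBE = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Modem"): 680,
    ("Vancouver", "Q1", "Modem"): 825,
    ("Vancouver", "Q3", "Mobile"): 300,
    ("Chicago", "Q1", "Mobile"): 440,
    ("Toronto", "Q1", "Phone"): 14,
}

def slice_time(cube, quarter):
    """Slice: fix the time dimension to one value; the result is keyed by (location, item)."""
    return {(loc, item): units
            for (loc, time, item), units in cube.items() if time == quarter}

def dice(cube, locations, quarters, items):
    """Dice: keep cells whose coordinates fall inside the chosen sets on all three dimensions."""
    return {key: units for key, units in cube.items()
            if key[0] in locations and key[1] in quarters and key[2] in items}

q1 = slice_time(CUBE, "Q1")
sub_cube = dice(CUBE, {"Toronto", "Vancouver"}, {"Q1", "Q2"}, {"Mobile", "Modem"})
```

The slice result no longer carries the time coordinate, while the dice result keeps all three dimensions but with fewer values on each.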
OLAP Operations
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in
order to provide an alternative presentation of data.
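As a minimal sketch, pivoting a two-dimensional view amounts to swapping its row and column axes. The view contents and the function name are hypothetical.

```python
# Hypothetical 2-D view: rows are locations, columns are items.
VIEW = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 400, "Modem": 310},
}

def pivot(view):
    """Rotate the axes: former columns become rows and former rows become columns."""
    rotated = {}
    for row_key, columns in view.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated
```

The values are unchanged; only the presentation differs, and pivoting twice returns the original view.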
Types of OLAP Servers
We have four types of OLAP servers:
• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
Applications of Data Warehouse
Data Warehouse Applications by Industry
Advantages of Data Warehouse (DWH):

• A data warehouse allows business users to quickly access critical data from several sources in one place.
• A data warehouse provides consistent information on various cross-functional activities. It also supports ad hoc reporting and queries.
• A data warehouse helps to integrate many sources of data, which reduces stress on the production system.
• A data warehouse helps to reduce total turnaround time for analysis and reporting.
• Restructuring and integration make the data easier to use for reporting and analysis.
• Because critical data from a number of sources is available in a single place, users save the time otherwise spent retrieving data from multiple sources.
• A data warehouse stores a large amount of historical data. This helps users to analyze different time periods and trends to make future predictions.
Disadvantages of Data Warehouse:

• Not an ideal option for unstructured data.
• Creating and implementing a data warehouse is a time-consuming affair.
• A data warehouse can become outdated relatively quickly.
• It is difficult to make changes in data types and ranges, data source schemas, indexes, and queries.
• The data warehouse may seem easy, but it is actually too complex for average users.
• Despite best efforts at project management, the scope of a data warehousing project tends to increase.
• Sometimes warehouse users will develop different business rules.
• Organisations need to spend a lot of their resources on training and implementation.
Typical Data Warehouse Architecture
Functions of Data Warehouse

The following are the functions of data warehouse tools and utilities:

Data Extraction: Involves gathering data from multiple heterogeneous sources.

Data Cleaning: Involves finding and correcting errors in the data.

Data Transformation: Involves converting the data from legacy format to warehouse format.

Data Loading: Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.

Refreshing: Involves updating from data sources to the warehouse.

Reporting: Visualizing the results.
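The extraction, cleaning, transformation, and loading functions can be sketched as a small pipeline using only the Python standard library. The CSV layout, column names, table name, and the cleaning rule (dropping rows with a missing balance) are all assumptions made for illustration; real tools offer far richer behaviour.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export; the layout and values are assumptions.
LEGACY_CSV = """cust_id,gender,balance
1, M ,100.50
2,female,
3,f,20
"""

def extract(text):
    """Data extraction: read rows from a (here, in-memory) source."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Data cleaning: trim whitespace and drop rows with a missing balance."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["balance"]:
            cleaned.append(row)
    return cleaned

def transform(rows):
    """Data transformation: convert legacy encodings to the warehouse format."""
    gender = {"m": "m", "male": "m", "f": "f", "female": "f"}
    return [(int(r["cust_id"]), gender[r["gender"].lower()], float(r["balance"]))
            for r in rows]

def load(conn, records):
    """Data loading: insert records (the primary key checks integrity) and build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer_balance "
                 "(cust_id INTEGER PRIMARY KEY, gender TEXT, balance REAL)")
    conn.executemany("INSERT INTO customer_balance VALUES (?, ?, ?)", records)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_gender ON customer_balance (gender)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract(LEGACY_CSV))))
rows = conn.execute(
    "SELECT cust_id, gender, balance FROM customer_balance ORDER BY cust_id"
).fetchall()
```

Keeping each function separate mirrors the list above: any stage can be replaced (for example, a different cleaning rule) without touching the others.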
Data Warehouse Tools

Many data warehousing tools are available on the market. Here are some of the most prominent ones:
1. MarkLogic:

MarkLogic is a data warehousing solution that makes data integration easier and faster using an array of enterprise features. It can perform very complex search operations and can query different types of data such as documents, relationships, and metadata.

2. Oracle:

Oracle is a widely used commercial database. It offers a wide range of data warehouse solutions, both on-premises and in the cloud. It helps to optimize customer experiences by increasing operational efficiency.
3. Amazon Redshift:

Amazon Redshift is a cloud data warehouse tool for analyzing data using standard SQL and existing BI tools. It also allows running complex queries against petabytes of structured data, using query optimization.

4. Informatica
Summary

A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries, and decision making. A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this generalized and consolidated view of data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools help in the interactive and effective analysis of data in a multidimensional space. This analysis results in data generalization and data mining. Data mining functions such as association, clustering, classification, and prediction can be integrated with OLAP operations to enhance the interactive mining of knowledge at multiple levels of abstraction. That is why the data warehouse has become an important platform for data analysis and online analytical processing.
