Week-2-Data Warehouse and OLAP
Week-2
Nov-2023
Instructor:
Dr.T.GopiKrishna
Outline
Objective:- The primary purpose of a data
warehouse is to provide a central repository of
information that can be quickly analyzed and
queried to generate relevant insights.
Differences between Database and DW
Overview of Data Warehouse, definitions
Evolution of DW and need of DW
Types of DW
OLAP AND OLTP
Applications of DW, Advantages, disadvantages
Architecture and Components of DW
Key Difference between Database and
Data Warehouse
A database is a collection of related data that represents some
elements of the real world, whereas a Data warehouse is an
information system that stores historical and cumulative data
from single or multiple sources.
A database is designed to record data, whereas a Data warehouse is
designed to analyze data.
A database is an application-oriented collection of data, whereas
Data Warehouse is a subject-oriented collection of data.
Database uses Online Transactional Processing (OLTP), whereas
Data warehouse uses Online Analytical Processing (OLAP).
Database tables and joins are complicated because they are
normalized, whereas Data Warehouse tables and joins are easy
because they are denormalized.
ER modelling techniques are used for designing Databases, whereas
dimensional modelling techniques are used for designing Data Warehouse.
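The normalized vs. denormalized point above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names (customer, orders, sales_wide) and the rows are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Normalized (database-style) design: region lives in its own table.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'South'), (2, 'North');
INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);

-- Denormalized (warehouse-style) design: region is repeated on every row.
CREATE TABLE sales_wide (order_id INTEGER, region TEXT, amount REAL);
INSERT INTO sales_wide
    SELECT o.order_id, c.region, o.amount
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id;
""")

# The analytical question "total amount per region" needs a join here...
normalized_query = """
SELECT c.region, SUM(o.amount)
FROM orders o JOIN customer c ON c.customer_id = o.customer_id
GROUP BY c.region ORDER BY c.region
"""
# ...but no join against the denormalized table.
denormalized_query = """
SELECT region, SUM(amount) FROM sales_wide GROUP BY region ORDER BY region
"""

normalized_totals = cur.execute(normalized_query).fetchall()
denormalized_totals = cur.execute(denormalized_query).fetchall()
```

Both queries return the same totals; the denormalized one simply trades repeated data for a simpler query.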
Bad decisions can lead to disaster!
• Data Warehousing is at the base of decision support
systems (DSS).
Why is Data Warehousing & OLAP important?
It helps to:
• Understand the information hidden within the
organization’s data.
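Figure: Operational applications vs. Data Warehouse. The same field has a different name in each operational application (Appl A - bal-on-hand, Appl B - current-balance, Appl C - cash-on-hand); in the warehouse it appears under one name, Current balance.
Figure: Operational Data vs. Data Warehouse. Operational data: insert, delete, change, replace. Data warehouse: load, read only access.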
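The renaming of the Appl A, Appl B and Appl C balance fields to one warehouse field can be sketched as follows. This is only an illustration: the warehouse column name current_balance, the function to_warehouse_record and the sample records are assumptions, not part of the slides.

```python
# Hypothetical mapping from each application's field name to one warehouse name.
FIELD_MAP = {
    "bal-on-hand": "current_balance",      # Appl A
    "current-balance": "current_balance",  # Appl B
    "cash-on-hand": "current_balance",     # Appl C
}

def to_warehouse_record(source_record):
    """Rename known source fields to the warehouse name; keep other fields unchanged."""
    return {FIELD_MAP.get(name, name): value for name, value in source_record.items()}

# Made-up records from two of the applications.
appl_a = {"account": "A-1", "bal-on-hand": 120.0}
appl_c = {"account": "C-7", "cash-on-hand": 35.5}

loaded = [to_warehouse_record(record) for record in (appl_a, appl_c)]
```

After loading, every record carries the balance under the same field name, whichever application it came from.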
A DW is:
– A repository of an organization’s electronically stored data.
– Designed to facilitate reporting and analysis.
Typical Features of a DW:
– Reside on computers dedicated to this function.
– Run on DBMS such as Oracle, IBM DB2,
Teradata or Microsoft SQL Server
– Retain data for long periods of time
– Consolidate data obtained from a variety of
sources
– Are built around their own carefully designed
data model
Evolution of Data Warehousing
1. 1960 - 1985 : MIS Era
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
2. 1985 - 1990 : Querying Era
An Operational Data Store (ODS) is a data store that is required when neither the data
warehouse nor the OLTP systems support an organization's reporting needs. The data in an ODS
is refreshed in real time. Hence, it is widely preferred for routine activities such as storing
employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line of
business, such as sales or finance. In an independent data mart, data can be collected directly from the sources.
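The idea of a data mart as a subset of the warehouse for one line of business can be sketched with a SQL view. This is a minimal sketch of a mart that depends on the warehouse (not an independent one); the table warehouse_facts, the view sales_mart and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
    ('sales', 'revenue', 500.0),
    ('finance', 'budget', 900.0),
    ('sales', 'units', 42.0);

-- The "data mart" here is just a view restricted to one line of business.
CREATE VIEW sales_mart AS
    SELECT metric, value FROM warehouse_facts WHERE department = 'sales';
""")

# Sales users query the mart and never see the finance rows.
sales_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
```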
OLTP (Online Transaction Processing):
hardware is different
Understanding The Differences Is The Key
Properties
OLTP Vs Warehouse contd..
Operational System | Data Warehouse
Transaction Processing | Query Processing
Efficiency | Effectiveness
Different kinds of Information Needs
• Current: Is this medicine available in stock?
• Recent: What are the tests this patient has completed so far?
• Historical: Has the incidence of Tuberculosis increased in the last 5 years in the Southern region?
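The difference between a current and a historical information need can be sketched as two small functions. All names and numbers below are made up for illustration; they are not real stock or incidence figures.

```python
# Made-up operational data: what is on the shelf right now.
stock = {"paracetamol": 12, "amoxicillin": 0}

# Made-up historical data: yearly case counts kept for one region.
tb_cases_south = {2019: 110, 2020: 95, 2021: 120, 2022: 130, 2023: 150}

def in_stock(medicine):
    """Current need: answered from a single up-to-date value."""
    return stock.get(medicine, 0) > 0

def increased_over_period(yearly_counts):
    """Historical need: compare the first and last year of the stored period."""
    years = sorted(yearly_counts)
    return yearly_counts[years[-1]] > yearly_counts[years[0]]
```

The current question needs only the latest value, while the historical question can only be answered if several years of data have been retained, which is what a warehouse is designed for.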
OLAP (Online Analytical Processing)
• Describes processing at warehouse
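A typical kind of processing at the warehouse is summarizing a measure along one dimension (often called a roll-up). The sketch below assumes hypothetical fact rows of the form (year, region, cases); the helper roll_up and the data are illustrative only.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, cases). The numbers are made up.
facts = [
    (2022, "South", 30), (2022, "North", 20),
    (2023, "South", 45), (2023, "North", 18),
]

def roll_up(rows, key_index):
    """Sum the measure (last column) grouped by one dimension column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[-1]
    return dict(totals)

by_year = roll_up(facts, 0)    # aggregate away the region dimension
by_region = roll_up(facts, 1)  # aggregate away the year dimension
```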
Comparison between OLTP and OLAP
systems Contd..
OLTP Vs ODS Vs DWH
Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data granularity | Detailed | Detailed and lightly summarized | Summarized and derived
Data organization | Functional | Subject-oriented | Subject-oriented
Data quality | All application-specific detailed data needed to support a business activity | All integrated data needed to support a business activity | Data relevant to management information needs
OLTP Vs ODS Vs DWH Contd…
Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data stability | Dynamic | Somewhat dynamic | Static
Data update | Field by field | Field by field | Controlled batch
Data usage | Highly structured, repetitive | Somewhat structured, some analytical | Highly unstructured, heuristic or analytical
Database size | Moderate | Moderate | Large to very large
Database structure stability | Stable | Somewhat stable | Dynamic
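The "Data update" row above (field by field vs. controlled batch) can be sketched with sqlite3. This is a rough illustration: the tables account and account_history and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")

# OLTP-style: change one field of one record inside a short transaction.
with conn:
    conn.execute("UPDATE account SET balance = balance - 25.0 WHERE id = 1")

# Warehouse-style: append a prepared batch of snapshot rows in one controlled load.
conn.execute("CREATE TABLE account_history (id INTEGER, as_of TEXT, balance REAL)")
batch = [(1, "2023-11-01", 100.0), (1, "2023-11-02", 75.0)]
with conn:
    conn.executemany("INSERT INTO account_history VALUES (?, ?, ?)", batch)

current_balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]
history_count = conn.execute("SELECT COUNT(*) FROM account_history").fetchone()[0]
```

The operational table keeps only the latest balance, while the history table keeps every loaded snapshot and is never updated in place.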
Advantages of Data Warehouse:
Data warehouse allows business users to quickly access critical data from several
sources all in one place.
Data warehouse provides consistent information on various cross-functional
activities. It also supports ad-hoc reporting and querying.
Data Warehouse helps to integrate many sources of data to reduce stress on the
production system.
Data warehouse helps to reduce total turnaround time for analysis and
reporting.
Restructuring and integration make the data easier for the user to use for reporting
and analysis.
Data warehouse allows users to access critical data from a number of sources
in a single place. Therefore, it saves the user's time of retrieving data from multiple
sources.
Data warehouse stores a large amount of historical data. This helps users to
analyze different time periods and trends to make future predictions.
Disadvantages of Data Warehouse:
The following are the functions of data warehouse tools and utilities −
Data Transformation − Involves converting the data from legacy format to warehouse
format.
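The data transformation step can be sketched as a function that converts one legacy record into a warehouse record. Everything specific here is an assumption made for illustration: the legacy field names (CUSTNO, ORDDT, AMT), the DD/MM/YYYY legacy date format, and the amount being stored in hundredths of the currency unit.

```python
from datetime import datetime

def transform(legacy_row):
    """Convert one hypothetical legacy record into a hypothetical warehouse layout."""
    return {
        # Legacy ID assumed to be a zero-padded string; warehouse stores an integer.
        "customer_id": int(legacy_row["CUSTNO"]),
        # Legacy date assumed to be DD/MM/YYYY; warehouse stores an ISO 8601 date.
        "order_date": datetime.strptime(legacy_row["ORDDT"], "%d/%m/%Y").date().isoformat(),
        # Legacy amount assumed to be a string in hundredths; warehouse stores whole units.
        "amount": int(legacy_row["AMT"]) / 100,
    }

legacy = {"CUSTNO": "00042", "ORDDT": "07/11/2023", "AMT": "125050"}
warehouse_row = transform(legacy)
```

In practice the same function would be applied to every extracted row before the batch is loaded into the warehouse.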
There are many Data Warehousing tools available in the market. Here are some of the most
prominent ones:
1. MarkLogic:
MarkLogic is a useful data warehousing solution that makes data integration easier and
faster using an array of enterprise features. This tool helps to perform very complex search
operations. It can query different types of data like documents, relationships, and
metadata.
2. Oracle:
Oracle is a widely used database. It offers a wide range of data warehouse
solutions, both on-premises and in the cloud. It helps to optimize customer experiences
by increasing operational efficiency.
3. Amazon Redshift:
Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool to analyze all
types of data using standard SQL and existing BI tools. It also allows running complex
queries against petabytes of structured data, using the technique of query optimization.
4. Informatica
Summary