Lecture 8 p2

OLAP (Online Analytical Processing) is a technology that allows analysts to extract and view business data from multiple perspectives, facilitating faster analysis through pre-calculated and aggregated data stored in cubes. It supports various analytical operations such as roll-up, drill-down, slice, and pivot, and can be implemented through different models including MOLAP, ROLAP, and HOLAP, each with its own advantages and challenges. The document discusses the structure, functionality, and application of OLAP in data warehousing, emphasizing its role in enhancing decision-making processes.

Uploaded by

cekagi7032
Copyright
© All Rights Reserved

Data Warehousing

Online Analytical Processing (OLAP)

Refs
• https://www.guru99.com/online-analytical-processing.html
• https://www.guru99.com/multidimensional-online-analytical-processing.html
• https://www.javatpoint.com/sparse-matrix
• https://www.mssqltips.com/sqlservertutorial/4206/sql-server-analysis-services-multidimensional-data-model/
What is OLAP?
OLAP = On-line analytical processing.
• OLAP is a characterization of applications, not a database design
technique.
• It is a technology that enables analysts to extract and view business
data from different points of view.
• Analysts frequently need to group, aggregate and join data. With
OLAP, data can be pre-calculated and pre-aggregated, making
analysis faster.
• OLAP databases are divided into one or more cubes. The cubes are
designed in such a way that creating and viewing reports becomes
easy.
Cont.
• People often confuse OLAP with specific physical design techniques.
• This is a mistake: OLAP is a characterization of the application domain
centered around slice-and-dice analytics.
• As we will see, there are many possible implementations capable of
delivering OLAP characteristics. Depending on data size, performance
requirements, cost constraints, etc., the specific implementation
technique will vary.
Supporting the human thought process

THOUGHT PROCESS (in order)
• An enterprise-wide fall in profit.
• Profit down by a large percentage consistently during last quarter only. Rest is OK.
• What is special about last quarter?
• Products alone doing OK, but North region is most problematic.
• OK. So the problem is the high cost of products purchased in north.

QUERY SEQUENCE (in order)
• What were the quarterly sales during last year?
• What were the quarterly sales at regional level during last year?
• What were the quarterly sales at product level during last year?
• What was the monthly sale for last quarter, grouped by products?
• What was the monthly sale for last quarter, grouped by region?
• What was the monthly sale of products in north at store level, grouped by products purchased?

How many such query sequences can be programmed in advance?


Analysis of last example
• Analysis is ad hoc: unplanned and unstructured, as opposed to planned, structured analysis

• Analysis is interactive (user driven)

• Analysis is iterative
  • Answer to one question leads to a dozen more

• Analysis is directional (more on these in subsequent slides)
  • Drill Down
  • Roll Up
  • Pivot
Challenges …
• Not feasible to write predefined queries.
• Fails to remain user-driven (becomes programmer driven).

• Fails to remain ad hoc and hence is not interactive.

• Hard to enable ad hoc query support.

• Business users cannot build their own queries (they do not know SQL,
and should not need to know it).
Challenges (Cont.)

• Contradiction
• Want to compute answers in advance, but don't know the
questions

• Solution
• Compute answers to “all” possible “queries”. But how?

• NOTE: Queries are multidimensional aggregates at some level

“All” possible queries (level aggregates)

Level      Example members
ALL        ALL
State      Frontier ... Punjab
Division   Mardan ... Peshawar; Lahore ... Multan
District   Peshawar; Lahore
City       Lahore ... Gujranwala
Zone       Defense ... Gulberg
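
The idea of computing answers to "all" possible queries can be made concrete by enumerating every combination of hierarchy levels across the dimensions. The sketch below is only an illustration: the geography levels come from the slide, while the time hierarchy is an assumed example. Each combination is one level aggregate that could be computed in advance.

```python
from itertools import product

# Geography levels are taken from the slide; the time levels are an assumption.
hierarchies = {
    "geography": ["ALL", "State", "Division", "District", "City", "Zone"],
    "time": ["ALL", "Year", "Quarter", "Month"],
}

# One candidate aggregate per choice of level in every dimension.
level_combinations = list(product(*hierarchies.values()))

print(len(level_combinations))  # 6 * 4 = 24 candidate aggregates
print(level_combinations[0])    # ('ALL', 'ALL'): the grand total
```

The count grows multiplicatively with every added dimension, which is why precomputing everything needs a systematic structure rather than hand-written queries.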


OLAP: Facts & Dimensions

• FACTS: Quantitative values (numbers) or “measures.”
  • e.g., units sold, sales $, Co, Kg etc.

• DIMENSIONS: Descriptive categories.
  • e.g., time, geography, product etc.

• Dimensions are often organized in hierarchies representing levels of detail
in the data (e.g., week, month, quarter, year, decade etc.).
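
As a small illustration of a dimension hierarchy, the sketch below maps a calendar date to the coarser time levels it belongs to. The function name `time_levels` and the sample fact row are hypothetical, chosen only for this example.

```python
from datetime import date

def time_levels(d: date) -> dict:
    """Return the coarser hierarchy levels that a calendar date belongs to."""
    return {
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
        "decade": d.year // 10 * 10,
    }

# A fact row: one measure (units_sold) described by two dimensions.
fact = {"time": date(2023, 11, 5), "product": "p1", "units_sold": 12}
print(time_levels(fact["time"]))
# {'month': 11, 'quarter': 4, 'year': 2023, 'decade': 2020}
```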

OLAP Cube
• Spreadsheets are ideal for two-dimensional data. However, OLAP
contains multidimensional data, usually obtained from different
and unrelated sources.
• The cube can store and analyze multidimensional data in a
logical and orderly manner.
How does it work?
• A data warehouse extracts information from multiple data
sources and formats, such as text files, Excel sheets, multimedia files, etc.

• The extracted data is cleaned and transformed. The data is then loaded into
an OLAP server (or OLAP cube), where information is pre-calculated
for further analysis.

ETL ==> OLAP server


Cube

Fact table view (sale):

prodId  storeId  amt
p1      c1       12
p2      c1       11
p1      c3       50
p2      c2       8

Multi-dimensional cube:

      c1   c2   c3
p1    12   -    50
p2    11   8    -

dimensions = 2
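
The step from the fact table view to the cube view can be sketched in a few lines of Python. This is a minimal illustration using the rows from the slide; representing the cube as nested dictionaries with `None` for empty cells is an assumption made for readability, not how an OLAP server stores data.

```python
# Fact rows from the slide: (prodId, storeId, amt).
sales = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]

products = sorted({p for p, _, _ in sales})
stores = sorted({s for _, s, _ in sales})

# cube[product][store] holds the amount; None marks an empty cell.
cube = {p: {s: None for s in stores} for p in products}
for prod, store, amt in sales:
    cube[prod][store] = amt

for p in products:
    print(p, [cube[p][s] for s in stores])
# p1 [12, None, 50]
# p2 [11, 8, None]
```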
3-D Cube

Fact table view (sale):

prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Multi-dimensional cube:

day 1   c1   c2   c3
p1      12   -    50
p2      11   8    -

day 2   c1   c2   c3
p1      44   4    -
p2      -    -    -

dimensions = 3

e.g., all products in qtr 1 in America
How is aggregation usually carried out?
Aggregates
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Result: 81
Aggregates
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

ans:
date  sum
1     81
2     48
Another Example
• Add up amounts by day, product
• In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

ans:
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48
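
The three aggregate queries above can be checked against the sale table with Python's built-in `sqlite3` module. This is a sketch for verification only: the in-memory database and column types are assumptions, `ORDER BY` is added so the output order is deterministic, and `prodId` is listed in the last SELECT so each result row is labeled with its product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Total for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Totals by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date").fetchall()

# Totals by day and product.
by_day_product = conn.execute(
    "SELECT prodId, date, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId").fetchall()

print(day1_total)      # 81
print(by_day)          # [(1, 81), (2, 48)]
print(by_day_product)  # [('p1', 1, 62), ('p2', 1, 19), ('p1', 2, 48)]
```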
How about re-arranging data so that all sorts
of aggregations can be easily achieved?
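
One way to picture such a re-arrangement is to precompute every group-by combination once and store the results under keys where a rolled-up dimension is marked `ALL`. The sketch below is an assumption-laden illustration (a plain dictionary stands in for the cube, and `ALL` is a placeholder chosen here), not a description of any particular OLAP product.

```python
from collections import defaultdict
from itertools import product

# Fact rows from the slides: (prodId, storeId, date, amt).
sales = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
         ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]

ALL = "ALL"  # placeholder meaning "aggregated over this dimension"
cube = defaultdict(int)

for prod, store, day, amt in sales:
    # Each fact contributes to 2**3 cells: every dimension is either
    # kept at its own value or rolled up to ALL.
    for key in product((prod, ALL), (store, ALL), (day, ALL)):
        cube[key] += amt

print(cube[(ALL, ALL, 1)])    # 81, total for day 1
print(cube[("p1", ALL, 1)])   # 62, product p1 on day 1
print(cube[(ALL, ALL, ALL)])  # 129, grand total
```

After this single pass, each of the earlier aggregates is a dictionary lookup instead of a scan over the fact table.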
Where Does OLAP Fit In?
• It is a classification of applications, NOT a database
design technique.

• Analytical processing uses multi-level aggregates,
instead of record level access.

• Objective is to support very
I. fast
II. iterative and
III. ad-hoc decision-making.
[Figure: Where does OLAP fit in?]
Basic analytical operations of OLAP
• The four types of analytical OLAP operations are:
1. Roll-up
2. Drill-down
3. Slice and dice
4. Pivot (rotate)
1) Roll-up:
• Roll-up is also known as “consolidation” or “aggregation.” The
roll-up operation can be performed in two ways:
1. Reducing dimensions
2. Climbing up the concept hierarchy. A concept hierarchy is a system of
grouping things based on their order or level.
2) Drill-down
• In drill-down, data is fragmented into smaller parts. It is the opposite
of the roll-up process. It can be done by:
• Moving down the concept hierarchy
• Adding a dimension
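
Roll-up and drill-down along a concept hierarchy can be sketched as follows. The monthly figures and the month-to-quarter mapping are hypothetical values invented for this illustration, and the function names are not taken from any OLAP tool.

```python
from collections import defaultdict

# Hypothetical monthly sales; months map to quarters in the concept hierarchy.
monthly_sales = {"Jan": 10, "Feb": 20, "Mar": 30, "Apr": 40, "May": 50, "Jun": 60}
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def roll_up(values, parent_of):
    """Climb one level of the hierarchy by summing children into their parent."""
    totals = defaultdict(int)
    for child, amount in values.items():
        totals[parent_of[child]] += amount
    return dict(totals)

def drill_down(values, parent_of, parent):
    """Go one level down: return the detailed rows behind a single parent."""
    return {c: v for c, v in values.items() if parent_of[c] == parent}

quarterly = roll_up(monthly_sales, month_to_quarter)
print(quarterly)                                          # {'Q1': 60, 'Q2': 150}
print(drill_down(monthly_sales, month_to_quarter, "Q2"))  # {'Apr': 40, 'May': 50, 'Jun': 60}
```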
3) Slice (filtering)

• Here, one dimension is selected, and
a new sub-cube is created.
• It acts as a filter on a specific item of that dimension.
Dice:
• This operation is similar to a slice.
The difference is that in dice you select two
or more dimensions, which results in the
creation of a sub-cube.
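
Slice and dice can be illustrated on the 3-D cube from the earlier slide. In this sketch the cube is a dictionary keyed by (prodId, storeId, date); the helper names `slice_cube` and `dice_cube` are hypothetical.

```python
# Cube cells keyed by (prodId, storeId, date), taken from the 3-D cube slide.
cube = {("p1", "c1", 1): 12, ("p2", "c1", 1): 11, ("p1", "c3", 1): 50,
        ("p2", "c2", 1): 8, ("p1", "c1", 2): 44, ("p1", "c2", 2): 4}

def slice_cube(cube, date):
    """Slice: fix one dimension (date) to a single value; the result is 2-D."""
    return {(p, s): amt for (p, s, d), amt in cube.items() if d == date}

def dice_cube(cube, products, stores):
    """Dice: restrict two dimensions to sets of values; the result is a sub-cube."""
    return {k: amt for k, amt in cube.items() if k[0] in products and k[1] in stores}

print(slice_cube(cube, 2))
# {('p1', 'c1'): 44, ('p1', 'c2'): 4}
print(dice_cube(cube, {"p1"}, {"c1", "c2"}))
# {('p1', 'c1', 1): 12, ('p1', 'c1', 2): 44, ('p1', 'c2', 2): 4}
```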
4) Pivot
• In Pivot, you rotate the
data axes to provide a
substitute presentation
of data.
• In the following example,
the pivot is based on item
types.
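
A pivot only changes which dimension is shown on which axis; the cell values stay the same. The sketch below rotates the product-by-store view from the 2-D cube slide into a store-by-product view. It uses that earlier data as an assumed stand-in, not the item-type example referred to above.

```python
# 2-D view from the cube slide: rows are products, columns are stores.
by_product = {"p1": {"c1": 12, "c3": 50}, "p2": {"c1": 11, "c2": 8}}

def pivot(table):
    """Swap the row and column axes; the cell values are unchanged."""
    rotated = {}
    for row, cells in table.items():
        for col, value in cells.items():
            rotated.setdefault(col, {})[row] = value
    return rotated

by_store = pivot(by_product)
print(by_store)
# {'c1': {'p1': 12, 'p2': 11}, 'c3': {'p1': 50}, 'c2': {'p2': 8}}
```

Pivoting twice returns the original view, which is a quick sanity check that no data is lost by the rotation.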
OLAP Models/Implementations (cube --> faster)
MOLAP stands for Multidimensional Online Analytical Processing.
• It is a type of OLAP that uses a multidimensional
data model.
• Data in MOLAP is pre-computed, pre-summarized, and
stored in the MOLAP cube.
• MOLAP can store the different permutations
and combinations of data, already computed, in a
multidimensional array.
• All cells of data present can be accessed directly from the
array.
• As a result, MOLAP responds faster to
analytical queries.
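
Direct cell access in a multidimensional array can be pictured with a nested Python list. This is a simplified sketch: the axis ordering, the use of 0 for empty cells, and the `cell` helper are assumptions for illustration, while the amounts come from the 3-D cube slide.

```python
# Axis members and their positions in the array (an assumed layout).
products = ["p1", "p2"]
stores = ["c1", "c2", "c3"]
days = [1, 2]

# array[p][s][d]: amounts from the 3-D cube slide, 0 where there was no sale.
array = [
    [[12, 44], [0, 4], [50, 0]],  # p1
    [[11, 0], [8, 0], [0, 0]],    # p2
]

def cell(prod, store, day):
    """Direct lookup: translate member names to offsets, then index the array."""
    return array[products.index(prod)][stores.index(store)][days.index(day)]

print(cell("p1", "c3", 1))  # 50
print(cell("p1", "c1", 2))  # 44
```

No join or scan is involved in the lookup, which is the source of the speed advantage described above.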
Implementation of MOLAP
❖Once cubes are created, it is difficult to scale their number
and size, even though they need to scale as the
dimensions change or increase.
❖MOLAP is queried with specific languages. However, front ends
provide extensive click-and-drag support.
❖The data is by default stored in a multidimensional array. This
gives the user different perspectives of the data, for example
aggregating sales by time, geography or product.
❖Data cubes cannot be created on the go by ad hoc queries.
Hence they work best with pre-defined
queries. Cube design is thus critical and requires
detailed up-front design work.
Advantages of MOLAP
• MOLAP allows the fastest indexing to the pre-computed summarized
data.

• Helps users connected to a network who need to analyze
larger, less-defined data.

• Easier to use; therefore, MOLAP is suitable for inexperienced
users.
Disadvantages of MOLAP
• MOLAP is not capable of containing detailed data. Since all
calculations are performed when the cube is built, a large
amount of data cannot be stored in the cube itself.

• It is difficult to change the dimensions without re-aggregating.

• Storage utilization may be low, for example when many cells of the cube are empty.
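
Low storage utilization typically arises when many cells of the array hold no data. A rough way to quantify this, using the 3-D cube from the earlier slide as an assumed example:

```python
# Cube dimensions from the 3-D example: 2 products x 3 stores x 2 days.
total_cells = 2 * 3 * 2
filled_cells = 6  # one cell per row of the slide's sale table

utilization = filled_cells / total_cells
print(f"{utilization:.0%} of the dense array holds data")  # 50% of the dense array holds data
```

With more dimensions and more members per dimension, the share of filled cells usually drops much further.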


MOLAP Tools examples

• Essbase – Tools from Oracle that has a multidimensional database.
• Express Server – Web-based environment that runs on Oracle
database.
• Yellowfin – Business analytics tools for creating reports and
dashboards.
• Clear Analytics – Clear Analytics is an Excel-based business solution.
• SAP Business Intelligence – Business analytics solutions from SAP.
• SQL Server Data Tools (SSDT) – the IDE used to develop
SSAS solutions.
Conclusion
• Multidimensional data analysis is also possible if a relational
database is used.
• But that would require querying data from multiple tables.
• On the contrary, MOLAP has all possible combinations of data
already stored in a multidimensional array.
• MOLAP can access this data directly. Hence, MOLAP is faster
compared to Relational Online Analytical Processing (ROLAP).
Relational OLAP (ROLAP)
• Relational On-Line Analytical Processing (ROLAP) is primarily used for
data stored in a relational database, where both the base data and
dimension tables are stored as relational tables.
• ROLAP servers are used to bridge the gap between the relational
back-end server and the client’s front-end tools.
• ROLAP servers store and manage warehouse data using RDBMS, and
OLAP middleware fills in the gaps.
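
In a ROLAP setting, a multidimensional question is answered by SQL over relational tables at query time. The sketch below uses `sqlite3` as a stand-in RDBMS; the `store` dimension table and its region values are hypothetical, while the sale amounts come from the earlier cube slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (storeId TEXT PRIMARY KEY, region TEXT);  -- dimension table
    CREATE TABLE sale (prodId TEXT, storeId TEXT, amt INTEGER);  -- base (fact) table
    INSERT INTO store VALUES ('c1', 'North'), ('c2', 'North'), ('c3', 'South');
    INSERT INTO sale VALUES ('p1', 'c1', 12), ('p2', 'c1', 11),
                            ('p1', 'c3', 50), ('p2', 'c2', 8);
""")

# The aggregate is computed when the query runs, by joining fact and dimension tables.
rows = conn.execute("""
    SELECT st.region, sum(s.amt)
    FROM sale AS s JOIN store AS st ON s.storeId = st.storeId
    GROUP BY st.region ORDER BY st.region
""").fetchall()
print(rows)  # [('North', 31), ('South', 50)]
```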
Relational OLAP Architecture

• Database server
• ROLAP server (saved on the main data repository (DWH))
• Front-end tool
Advantages of ROLAP model:
• High data efficiency. Query performance and the access language
are optimized specifically for
multidimensional data analysis.
• Scalability. This type of OLAP system can
manage large volumes of data, even when the data is steadily
increasing.

Although ROLAP doesn't store data in a multidimensional cube as MOLAP
does, it can still handle multidimensional data analysis efficiently
through its optimized query processing and access methods.
Drawbacks of ROLAP model:
• Demand for higher resources. ROLAP needs high utilization of
manpower, software, and hardware resources.
• Aggregate data limitations. ROLAP tools use SQL for all
calculations of aggregate data. However, there are no set limits for
handling computations.
• Slow query performance. Query performance in this model is slow
compared with MOLAP.
Hybrid OLAP
• Hybrid OLAP (HOLAP) is a mixture of both ROLAP and MOLAP. It offers the fast
computation of MOLAP and the higher scalability of ROLAP. HOLAP uses
two databases:
1. Aggregated or computed data is stored in a multidimensional OLAP
cube.
2. Detailed information is stored in a relational database.
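
The two-store idea can be sketched as follows: summary questions are answered from a precomputed structure, and detail questions go to the relational store. Everything here is an assumption for illustration (a dictionary stands in for the cube, `sqlite3` for the relational database, and the function names are hypothetical); the data is the sale table from the earlier slides.

```python
import sqlite3

# Detailed rows live in a relational store (the ROLAP side).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
db.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)",
               [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
                ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)])

# Precomputed daily totals are kept in memory (the MOLAP side).
daily_totals = dict(db.execute("SELECT date, sum(amt) FROM sale GROUP BY date"))

def total_for_day(day):
    """Summary question: answered from the cube without touching detail rows."""
    return daily_totals[day]

def detail_for_day(day):
    """Detail question: answered by querying the relational store."""
    return db.execute(
        "SELECT prodId, storeId, amt FROM sale WHERE date = ? "
        "ORDER BY prodId, storeId", (day,)).fetchall()

print(total_for_day(2))   # 48
print(detail_for_day(2))  # [('p1', 'c1', 44), ('p1', 'c2', 4)]
```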
Benefits of Hybrid OLAP:
• This kind of OLAP helps to economize disk space, and it also
remains compact, which helps to avoid issues related to access
speed and convenience.
• HOLAP uses cube technology, which allows faster
performance for all types of data.
• ROLAP data is updated instantly, and HOLAP users have access to this
real-time data. MOLAP brings cleaning and
conversion of data, thereby improving data relevance.
• This brings the best of both worlds.
Drawbacks of Hybrid OLAP:
• Greater complexity level: The major drawback of HOLAP systems is
that they must support both ROLAP and MOLAP tools and applications.
Thus, they are very complicated.
• Potential overlaps: There are higher chances of overlap,
especially in their functionalities.
