
DWDM Unit 2

Uploaded by

Sakshi Ujjlayan
Copyright
© All Rights Reserved

BCA VI SEM

UNIT 2
Syllabus
Data Cube: A Multidimensional Data Model
 “What is a data cube?” A data cube allows data to be modeled
and viewed in multiple dimensions.
 It is defined by dimensions and facts.
 In general terms, dimensions are the perspectives or entities
with respect to which an organization wants to keep records.
 For example, All Electronics may create a sales data warehouse in
order to keep records of the store’s sales with respect to the
dimensions time, item, branch, and location. These dimensions
allow the store to keep track of things like monthly sales of items
and the branches and locations at which the items were sold.
 Each dimension may have a table associated with it, called a
dimension table, which further describes the dimension.
 For example, a dimension table for item may contain the attributes
item name, brand, and type. Dimension tables can be specified by
users or experts, or automatically generated and adjusted based on
data distributions.
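As an illustration, the cube idea can be sketched in Python. This is a minimal sketch, assuming a plain dictionary whose keys hold one value from each dimension (time, item, branch, location) and whose values hold the measure units_sold; the item dimension table, the keys, and all figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical item dimension table: item_key -> descriptive attributes
item_dim = {
    1: {"item_name": "Laptop", "brand": "Acme", "type": "computer"},
    2: {"item_name": "Phone", "brand": "Zeta", "type": "phone"},
}

# Hypothetical cube cells: (month, item_key, branch, location) -> units_sold
sales_cube = {
    ("2024-01", 1, "B1", "Delhi"): 5,
    ("2024-01", 2, "B1", "Delhi"): 8,
    ("2024-01", 1, "B2", "Mumbai"): 3,
    ("2024-02", 1, "B1", "Delhi"): 4,
}

def monthly_sales_by_item(cube, items):
    """Sum units_sold over branch and location, keeping month and item name."""
    totals = defaultdict(int)
    for (month, item_key, _branch, _location), units in cube.items():
        # Look up the descriptive item name in the dimension table
        totals[(month, items[item_key]["item_name"])] += units
    return dict(totals)

print(monthly_sales_by_item(sales_cube, item_dim))
# {('2024-01', 'Laptop'): 8, ('2024-01', 'Phone'): 8, ('2024-02', 'Laptop'): 4}
```

Summing over branch and location while keeping month and item answers the "monthly sales of items" kind of question; a real warehouse would do this in SQL or an OLAP engine rather than in application code.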
Principles of Dimensional Modeling
 What is Dimensional Modeling?
 Dimensional data modeling is one of the data modeling techniques used in data warehouse design. The concept was developed by Ralph Kimball, and the model is composed of fact and dimension tables. Since the main goal of this modeling is to improve data retrieval, it is optimized for SELECT operations.
 In dimensional modeling, the transaction record is divided into either "facts," which are frequently numerical transaction data, or "dimensions," which are the reference information that gives context to the facts.
Objectives of Dimensional Modeling
 The purposes of dimensional modeling are:
 To produce a database architecture that is easy for end users to understand and to write queries against.
 To maximize the efficiency of queries. It achieves these
goals by minimizing the number of tables and
relationships between them.
Advantages of Dimensional Modeling
 The benefits of dimensional modeling are as follows:
 Dimensional modeling is simple: Dimensional modeling methods make it possible for warehouse designers to create database schemas that business users can easily grasp and comprehend. There is no need for extensive training on how to read diagrams, and there are no complicated relationships between different data elements.
 Dimensional modeling promotes data quality: The star schema enables warehouse administrators to enforce referential integrity checks on the data warehouse. Since the key of a fact record is a concatenation of the keys of its associated dimensions, a fact record is loaded successfully only if the corresponding dimension records are properly described and already exist in the database.
 By enforcing foreign key constraints as a form of referential integrity check, data warehouse DBAs add a line of defense against corrupted warehouse data.
 Performance optimization is possible through aggregates: As the size of the data warehouse increases, performance optimization becomes a pressing concern. Users who have to wait for hours to get a response to a query will quickly become discouraged with the warehouse. Aggregates are one of the easiest methods by which query performance can be optimized.
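The effect of aggregates can be sketched in a few lines. The example below is a minimal illustration, assuming detail rows of the form (day, product, units_sold); it precomputes a monthly summary table once so that monthly queries become lookups instead of scans over every detail row. The data and the function name build_monthly_aggregate are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail-level fact rows: (day, product, units_sold)
fact_rows = [
    ("2024-01-03", "pen", 10),
    ("2024-01-17", "pen", 5),
    ("2024-02-02", "pen", 7),
    ("2024-01-09", "ink", 2),
]

def build_monthly_aggregate(rows):
    """Precompute a summary table: (month, product) -> total units_sold."""
    agg = defaultdict(int)
    for day, product, units in rows:
        agg[(day[:7], product)] += units  # day[:7] is the "YYYY-MM" month
    return dict(agg)

# Built once (e.g., after each load); later queries read it directly
monthly_agg = build_monthly_aggregate(fact_rows)
print(monthly_agg[("2024-01", "pen")])  # 15
```

The trade-off is that the summary table must be rebuilt or incrementally updated whenever new detail rows are loaded.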
Disadvantages of Dimensional Modeling

 Loading the data warehouse with records from various operational systems is complicated, because the integrity of facts and dimensions must be maintained.
 It is difficult to modify the data warehouse if the organization that adopted the dimensional technique changes the way it does business.
Elements of Dimensional Modeling
 Fact
 It is a collection of associated data items, consisting of
measures and context data. It typically represents business
items or business transactions.
 Dimensions
 It is a collection of data that describes one business dimension. Dimensions determine the contextual background for the facts, and they are the framework over which OLAP is performed.
 Measure
 It is a numeric attribute of a fact, representing the
performance or behavior of the business relative to the
dimensions.
 Fact Table
 Fact tables are used to store the facts or measures of the business. Facts are the numeric data elements that are of interest to the company.
 Characteristics of the Fact table
 The fact table includes numerical values of what we measure. For example, a fact value of 20 might mean that 20 widgets have been sold.
 Each fact table includes the keys to associated
dimension tables. These are known as foreign keys in
the fact table.
 Fact tables typically include a small number of
columns.
 When it is compared to dimension tables, fact tables
have a large number of rows.
 Dimension Table
 Dimension tables establish the context of the facts.
Dimension tables store fields that describe the facts.
Characteristics of the Dimension table
 Dimension tables contain the details about the facts. This, for example, enables business analysts to understand the data and their reports better.
 The dimension tables include descriptive data about the
numerical values in the fact table. That is, they contain
the attributes of the facts. For example, the dimension
tables for a marketing analysis function might include
attributes such as time, marketing region, and product
type.
 Example: A store summary in a fact table can be viewed by city and state. An item summary can be viewed by brand, color, etc. Customer information can be viewed by name and address.
What is Multi-Dimensional Data Model?
 A multidimensional model views data in the form of a data-
cube. A data cube enables data to be modeled and viewed in
multiple dimensions. It is defined by dimensions and facts.
 The dimensions are the perspectives or entities concerning
which an organization keeps records.
 For example, a shop may create a sales data warehouse to keep records of the store's sales for the dimensions time, item, and location.
 These dimensions allow the store to keep track of things such as monthly sales of items and the locations at which the items were sold. Each dimension has a table related to it, called a dimension table, which describes the dimension further. For example, a dimension table for item may contain the attributes item_name, brand, and type.
 A multidimensional data model is organized around a central theme, for example, sales. This theme is represented by a fact table. Facts are numerical measures. The fact table contains the names of the facts, or measures, as well as keys to each of the related dimension tables.
 Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in the table. In this 2D representation, the sales for Delhi are shown for the time dimension (organized in quarters) and the item dimension (classified according to the types of items sold). The fact or measure displayed is rupee_sold (in thousands).
 Now suppose we want to view the sales data with a third dimension. For example, the data according to time and item, as well as location, is considered for the cities Chennai, Kolkata, Mumbai, and Delhi. These 3D data are shown in the table, represented as a series of 2D tables.
Conceptually, the same data may also be represented in the form of a 3D data cube, as shown in the figure.
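The idea of a 3D cube viewed as a series of 2D tables can be sketched as follows. This is a minimal sketch, assuming a dictionary keyed by (quarter, item, city); it uses only two quarters, two item types, and two of the four cities for brevity, and the sales figures are invented placeholders, not the values from the table.

```python
quarters = ["Q1", "Q2"]
items = ["phone", "computer"]
cities = ["Delhi", "Mumbai"]

# Placeholder rupee_sold values (in thousands) for each (quarter, item, city) cell
cube = {
    ("Q1", "phone", "Delhi"): 10, ("Q1", "computer", "Delhi"): 20,
    ("Q2", "phone", "Delhi"): 30, ("Q2", "computer", "Delhi"): 40,
    ("Q1", "phone", "Mumbai"): 15, ("Q1", "computer", "Mumbai"): 25,
    ("Q2", "phone", "Mumbai"): 35, ("Q2", "computer", "Mumbai"): 45,
}

def table_for_city(city):
    """Return the 2D (quarter x item) table for one value of the location dimension."""
    return [[cube[(q, it, city)] for it in items] for q in quarters]

# Print the 3D cube as a series of 2D tables, one per city
for city in cities:
    print(city)
    for q, row in zip(quarters, table_for_city(city)):
        print("  ", q, row)
```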
Steps to Create Dimensional Data Modeling
 Step-1: Identifying the business objective: The first step is to identify the business objective. Sales, HR, and Marketing are some examples of the needs of an organization. This is the most important step of data modeling, and the selection of business objectives also depends on the quality of data available for that process.

 Step-2: Identifying Granularity: Granularity is the lowest level of information stored in the table. The grain describes the level of detail for the business problem and its solution.
 Step-3: Identifying Dimensions and their
Attributes: Dimensions are objects or things.
Dimensions categorize and describe data warehouse
facts and measures in a way that supports meaningful
answers to business questions. A data warehouse
organizes descriptive attributes as columns in
dimension tables. For example, the date dimension
may contain data like year, month, and weekday.

 Step-4: Identifying the Fact: The measurable data is held by the fact table. Most of the fact table rows are numerical values like price or cost per unit, etc.

 Step-5: Building of Schema: We implement the dimensional model in this step. A schema is a database structure. There are two popular schemas: Star Schema and Snowflake Schema.
Advantages of Dimensional Data Modeling
 Simplified Data Access: Dimensional data modeling enables
users to easily access data through simple queries, reducing the
time and effort required to retrieve and analyze data.
 Enhanced Query Performance: The simple structure of dimensional data modeling allows for faster query performance, particularly when compared to highly normalized relational data models.
 Increased Flexibility: Dimensional data modeling allows for
more flexible data analysis, as users can quickly and easily
explore relationships between data.
 Improved Data Quality: Dimensional data modeling can
improve data quality by reducing redundancy and
inconsistencies in the data.
 Easy to Understand: Dimensional data modeling uses simple,
intuitive structures that are easy to understand, even for non-
technical users.
Disadvantages of Dimensional Data Modeling
 Limited Complexity: Dimensional data modeling may
not be suitable for very complex data relationships, as it
relies on simple structures to organize data.
 Limited Integration: Dimensional data modeling may
not integrate well with other data models, particularly
those that rely on normalization techniques.
 Limited Scalability: Dimensional data modeling may
not be as scalable as other data modeling techniques,
particularly for very large datasets.
 Limited History Tracking: Dimensional data
modeling may not be able to track changes to historical
data, as it typically focuses on current data.
Schemas for Multidimensional Data Model

 A schema is a logical description of the entire database. It includes the name and description of records of all record types, including all associated data items and aggregates.
 Much like a database, a data warehouse also needs to maintain a schema.
 A database uses the relational model, while a data warehouse uses the Star, Snowflake, and Fact Constellation schemas.
Star Schema
 A star schema is the elementary form of a dimensional
model, in which data are organized
into facts and dimensions.
 A fact is an event that is counted or measured, such as a
sale or log in. A dimension includes reference data about
the fact, such as date, item, or customer.
 A star schema is a relational schema whose design represents a multidimensional data model.
 The star schema is the simplest data warehouse schema. It is known as a star schema because its entity-relationship diagram resembles a star, with points diverging from a central table.
 The center of the schema consists of a large fact table, and
the points of the star are the dimension tables.
 Fact Tables
 A fact table is a table in a star schema that contains facts and is connected to dimensions. A fact table has two types of columns: those that contain facts and those that are foreign keys to the dimension tables. The primary key of the fact table is generally a composite key made up of all of its foreign keys.
 A fact table may contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead). A fact table generally contains facts with the same level of aggregation.
 Dimension Tables
 A dimension is a structure usually composed of one or more hierarchies that categorize data. If a dimension has no hierarchies and levels, it is called a flat dimension or list. The primary key of each dimension table is part of the composite primary key of the fact table. Dimensional attributes help to define the dimensional value. They are generally descriptive, textual values. Dimension tables are usually smaller than the fact table.
 Fact tables store data about sales, while dimension tables store data about the geographic region (markets, cities), clients, products, times, and channels.
Characteristics of Star Schema
 The star schema is well suited to data warehouse database design because of the following features:
 It creates a denormalized database that can quickly provide query responses.
 It provides a flexible design that can be changed easily or added to throughout the development cycle, and as the database grows.
 Its design parallels how end users typically think of and use the data.
 It reduces the complexity of metadata for both developers and end users.
Advantages of Star Schema
 Star schemas are easy for end users and applications to understand and navigate. With a well-designed schema, the user can instantly analyze large, multidimensional data sets.
 The main advantages of star schemas in a decision-support environment are:
 Query Performance
Because a star schema database has a limited number of tables and clear join paths, queries run faster than they do against OLTP systems. Small single-table queries, frequently of a dimension table, are almost instantaneous. Large join queries that involve multiple tables take only seconds or minutes to run.
 In a star schema database design, the dimensions are connected to each other only through the central fact table. When two dimension tables are used in a query, only one join path, intersecting the fact table, exists between those two tables. This design feature enforces accurate and consistent query results.
 Load performance and administration
 Structural simplicity also decreases the time required to load large batches of records into a star schema database. By describing facts and dimensions and separating them into different tables, the impact of a load operation is reduced. Dimension tables can be populated once and refreshed occasionally. We can add new facts regularly and selectively by appending records to a fact table.
 Easily Understood
 A star schema is simple to understand and navigate, with dimensions joined only through the fact table. These joins are meaningful to the end user because they represent the fundamental relationships between parts of the underlying business. Users can also browse dimension table attributes before constructing a query.
 Built-in referential integrity
 A star schema has referential integrity built in when information is loaded.
 Referential integrity is enforced because each record in the dimension tables has a unique primary key, and all keys in the fact table are legitimate foreign keys drawn from the dimension tables.
 A record in the fact table that is not correctly related to a dimension cannot be given a valid key value, so it cannot be retrieved.
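Referential integrity between a fact table and a dimension table can be demonstrated with SQLite from Python's standard library. This is a minimal sketch, assuming one dimension (item) and a fact table (sales) with a single foreign key; the table, column, and function names are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical dimension table and fact table
conn.execute("CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT)")
conn.execute("""
    CREATE TABLE sales (
        item_key   INTEGER NOT NULL REFERENCES item(item_key),
        units_sold INTEGER
    )""")
conn.execute("INSERT INTO item VALUES (1, 'Laptop')")
conn.execute("INSERT INTO sales VALUES (1, 5)")  # valid: item 1 exists

def try_insert_fact(item_key, units):
    """Insert a fact row; return False if it violates referential integrity."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?)", (item_key, units))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_fact(1, 3))   # True
print(try_insert_fact(99, 2))  # False: no item with key 99
```

The fact row with item_key 99 is rejected because no dimension record carries that key.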
Disadvantage of Star Schema
 Some conditions cannot be met by star schemas. For example, the relationship between users and bank accounts cannot be described as a star schema, because the relationship between them is many-to-many.
 Example: Suppose a star schema is composed of a fact
table, SALES, and several dimension tables connected
to it for time, branch, item, and geographic locations.
 The TIME table has a column for each day, month,
quarter, and year. The ITEM table has columns for
each item_Key, item_name, brand, type,
supplier_type. The BRANCH table has columns for
each branch_key, branch_name, branch_type. The
LOCATION table has columns of geographic data,
including street, city, state, and country.
 In this scenario, the SALES table contains only four
columns with IDs from the dimension tables, TIME,
ITEM, BRANCH, and LOCATION, instead of four
columns for time data, four columns for ITEM data,
three columns for BRANCH data, and four columns for
LOCATION data. Thus, the size of the fact table is
significantly reduced. When we need to change an
item, we need only make a single change in the
dimension table, instead of making many changes in
the fact table.
 We can create even more complex star schemas by
normalizing a dimension table into several tables. The
normalized dimension table is called a Snowflake.
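The SALES example can be sketched as SQLite tables created from Python. This is a minimal sketch, assuming the dimension tables are named time_dim, item_dim, branch_dim, and location_dim (TIME, ITEM, BRANCH, and LOCATION in the text) and that the fact table carries two hypothetical measures, units_sold and rupees_sold; all rows are invented. The fact table's primary key is the composite of its four foreign keys, and the query joins two dimensions only through the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state TEXT, country TEXT);
-- Fact table: four foreign keys form the composite primary key, plus measures
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time_dim,
    item_key     INTEGER REFERENCES item_dim,
    branch_key   INTEGER REFERENCES branch_dim,
    location_key INTEGER REFERENCES location_dim,
    units_sold   INTEGER,
    rupees_sold  REAL,
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
);
INSERT INTO time_dim VALUES (1, 5, 1, 'Q1', 2024), (2, 9, 4, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Laptop', 'Acme', 'computer', 'wholesale');
INSERT INTO branch_dim VALUES (1, 'Main', 'retail');
INSERT INTO location_dim VALUES (1, 'MG Road', 'Delhi', 'Delhi', 'India');
INSERT INTO sales VALUES (1, 1, 1, 1, 5, 250.0), (2, 1, 1, 1, 3, 150.0);
""")

# TIME and ITEM are related only through the central fact table
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(s.units_sold)
    FROM sales s
    JOIN time_dim t ON s.time_key = t.time_key
    JOIN item_dim i ON s.item_key = i.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter
""").fetchall()
print(rows)  # [('Q1', 'Laptop', 5), ('Q2', 'Laptop', 3)]
```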
What is Snowflake Schema?
 A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if one or more dimension tables do not connect directly to the fact table but must join through other dimension tables."
 The snowflake schema is an expansion of the star schema where each point of the star explodes into more points. It is called a snowflake schema because its diagram resembles a snowflake.
 Snowflaking is a method of normalizing the dimension tables in a star schema. When we normalize all the dimension tables entirely, the resultant structure resembles a snowflake with the fact table in the middle.
 Snowflaking is used to improve the performance of specific queries.
 The schema is diagrammed with each fact surrounded by its associated dimensions, and those dimensions are related to other dimensions, branching out into a snowflake pattern.
 The snowflake schema consists of one fact table that is linked to many dimension tables, which can be linked to other dimension tables through a many-to-one relationship.
 Tables in a snowflake schema are generally normalized to the third normal form. Each dimension table represents exactly one level in a hierarchy.
The following diagram shows a snowflake schema with two dimensions, each having three levels. A snowflake schema can have any number of dimensions, and each dimension can have any number of levels.
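A snowflaked dimension can be sketched the same way. In this minimal example, assuming a hypothetical item dimension whose supplier_type attribute has been normalized into a separate supplier table, reaching supplier_type from the fact table takes two joins (sales to item, item to supplier) instead of one; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked item dimension: supplier details moved to their own table
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   supplier_key INTEGER REFERENCES supplier);
CREATE TABLE sales (item_key INTEGER REFERENCES item, units_sold INTEGER);
INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item VALUES (10, 'Laptop', 'Acme', 1), (11, 'Phone', 'Zeta', 1),
                        (12, 'Cable', 'Acme', 2);
INSERT INTO sales VALUES (10, 5), (11, 8), (12, 20), (10, 2);
""")

# Two joins are needed to reach supplier_type: sales -> item -> supplier
rows = conn.execute("""
    SELECT sp.supplier_type, SUM(s.units_sold)
    FROM sales s
    JOIN item i ON s.item_key = i.item_key
    JOIN supplier sp ON i.supplier_key = sp.supplier_key
    GROUP BY sp.supplier_type
    ORDER BY sp.supplier_type
""").fetchall()
print(rows)  # [('retail', 20), ('wholesale', 15)]
```

The extra join is the cost the text describes; the benefit is that each supplier type is stored once instead of being repeated on every item row.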
Advantage of Snowflake Schema

 The primary advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables.
 It provides greater scalability in the interrelationship between dimension levels and components.
 There is less redundancy, so it is easier to maintain.
Disadvantage of Snowflake Schema
 The primary disadvantage of the snowflake schema is
the additional maintenance efforts required due to the
increasing number of lookup tables. It is also known as
a multi fact star schema.
 Queries are more complex and hence more difficult to understand.
 More tables mean more joins, and therefore longer query execution time.
The figure shows a simple STAR schema for sales in a manufacturing company. The sales fact table includes quantity, price, and other relevant metrics. SALESREP, CUSTOMER, PRODUCT, and TIME are the dimension tables.
 The STAR schema for sales, as shown above, contains only five tables, whereas the normalized (snowflake) version extends to eleven tables.
 We will notice that in the snowflake schema, the attributes with low cardinality in each original dimension table are removed to form separate tables.
 These new tables are connected back to the original dimension table through artificial keys.
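The step of moving a low-cardinality attribute into its own table and linking it back through an artificial key can be sketched in Python. This is a minimal sketch, assuming a denormalized product dimension held as a list of dictionaries; the attribute category, the helper snowflake(), and the data are hypothetical, and artificial keys are simply assigned in order of first appearance.

```python
# Hypothetical denormalized PRODUCT dimension
product_dim = [
    {"product_key": 1, "name": "Bolt", "category": "Hardware"},
    {"product_key": 2, "name": "Nut", "category": "Hardware"},
    {"product_key": 3, "name": "Drill", "category": "Tools"},
]

def snowflake(rows, attr):
    """Split attr out of rows into a lookup table; return (lookup_table, normalized_rows)."""
    lookup = {}  # attribute value -> artificial key
    normalized = []
    for row in rows:
        value = row[attr]
        if value not in lookup:
            lookup[value] = len(lookup) + 1  # next artificial key
        new_row = {k: v for k, v in row.items() if k != attr}
        new_row[attr + "_key"] = lookup[value]  # link back via the artificial key
        normalized.append(new_row)
    lookup_table = [{attr + "_key": key, attr: value} for value, key in lookup.items()]
    return lookup_table, normalized

categories, products = snowflake(product_dim, "category")
print(categories)   # [{'category_key': 1, 'category': 'Hardware'}, {'category_key': 2, 'category': 'Tools'}]
print(products[2])  # {'product_key': 3, 'name': 'Drill', 'category_key': 2}
```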
Difference between Star and Snowflake Schemas
 Star Schema
 In a star schema, the fact table
will be at the center and is
connected to the dimension
tables.
 The tables are completely in a
denormalized structure.
 SQL query performance is good, as fewer joins are involved.
 Data redundancy is high and
occupies more disk space.
 Snowflake Schema
 A snowflake schema is an
extension of star schema where
the dimension tables are
connected to one or more
dimensions.
 The tables are partially
denormalized in structure.
 The performance of SQL queries is somewhat lower than with a star schema, as more joins are involved.
 Data redundancy is low and
occupies less disk space when
compared to star schema.
S.No | Star Schema | Snowflake Schema
1. | Contains the fact tables and the dimension tables. | Contains the fact tables, the dimension tables, and sub-dimension tables.
2. | A top-down model. | A bottom-up model.
3. | Uses more space. | Uses less space.
4. | Queries take less time to execute. | Queries take more time to execute than in a star schema.
5. | Normalization is not used. | Both normalization and denormalization are used.
6. | Its design is very simple. | Its design is complex.
7. | Query complexity is low. | Query complexity is higher than in a star schema.
8. | It is very simple to understand. | It is difficult to understand.
9. | It has fewer foreign keys. | It has more foreign keys.
10. | It has high data redundancy. | It has low data redundancy.
What is Fact Constellation Schema?
 Fact constellation is a schema for representing a multidimensional model. It is a collection of multiple fact tables that share some common dimension tables. It can be viewed as a collection of several star schemas. It is one of the widely used schemas for data warehouse design, and it is much more complex than the star and snowflake schemas.
 A fact constellation means two or more fact tables sharing one or more dimensions. It is also called a galaxy schema.
Fact constellation is a sophisticated database design from which it is difficult to summarize information. A fact constellation can be implemented between aggregate fact tables, or by decomposing a complex fact table into independent simple fact tables.
Example: A fact constellation schema is shown in the figure below.
 This schema defines two fact tables, sales and shipping. Sales are analyzed along four dimensions, namely time, item, branch, and location. The schema contains a fact table for sales that includes keys to each of the four dimensions, along with two measures: Rupee_sold and units_sold. The shipping table has five dimensions, or keys: item_key, time_key, shipper_key, from_location, and to_location, and two measures: Rupee_cost and units_shipped.
 The primary disadvantage of the fact constellation
schema is that it is a more challenging design because
many variants for specific kinds of aggregation must
be considered and selected.
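A cut-down version of this galaxy schema can be sketched in SQLite. This is a minimal sketch, assuming only the shared time and item dimensions (branch, location, shipper, and the from/to locations are omitted for brevity); the table names and all rows are hypothetical. Each fact table is aggregated in its own subquery before joining, so that rows from one fact table do not multiply the rows of the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Fact table 1: sales (branch and location keys omitted for brevity)
CREATE TABLE sales (time_key INTEGER REFERENCES time_dim,
                    item_key INTEGER REFERENCES item_dim,
                    rupee_sold REAL, units_sold INTEGER);
-- Fact table 2: shipping (shipper and from/to location keys omitted for brevity)
CREATE TABLE shipping (time_key INTEGER REFERENCES time_dim,
                       item_key INTEGER REFERENCES item_dim,
                       rupee_cost REAL, units_shipped INTEGER);
INSERT INTO time_dim VALUES (1, 'Q1');
INSERT INTO item_dim VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales VALUES (1, 1, 500.0, 10), (1, 2, 300.0, 15);
INSERT INTO shipping VALUES (1, 1, 40.0, 12), (1, 2, 25.0, 15);
""")

# Both fact tables share item_dim, so their measures line up by item
rows = conn.execute("""
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item_dim i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping GROUP BY item_key) sh ON sh.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(rows)  # [('Laptop', 10, 12), ('Phone', 15, 15)]
```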
OLAP in the Data Warehouse
 OLAP stands for On-Line Analytical Processing. OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into information through fast, consistent, interactive access to a wide variety of possible views of data that has been transformed from raw information to reflect the real dimensionality of the enterprise as understood by its users.
 OLAP implements the multidimensional analysis of business information and supports complex calculations, trend analysis, and sophisticated data modeling.
Who uses OLAP and Why?

 OLAP applications are used by a variety of functions within an organization.
 Finance and accounting:
 Budgeting
 Activity-based costing
 Financial performance analysis
 Financial modeling
 Sales and Marketing
 Sales analysis and forecasting
 Market research analysis
 Promotion analysis
 Customer analysis
 Market and customer segmentation
 Production
 Production planning
 Defect analysis
 OLAP cubes have two main purposes. The first is to
provide business users with a data model more
intuitive to them than a tabular model. This model is
called a Dimensional Model.
 The second purpose is to enable fast query response
that is usually difficult to achieve using tabular
models.
OLAP Guidelines (Dr.E.F.Codd Rule)
 Dr E.F. Codd, the "father" of the relational model, has
formulated a list of 12 guidelines and requirements as the
basis for selecting OLAP systems:
 1) Multidimensional Conceptual View: This is the central feature of an OLAP system. By requiring a multidimensional view, it is possible to carry out operations like slice and dice.
 2) Transparency: Make the technology, underlying
information repository, computing operations, and the
dissimilar nature of source data totally transparent to
users. Such transparency helps to improve the efficiency
and productivity of the users.
 3) Accessibility: It provides access only to the data that is actually required to perform the particular analysis, presenting a single, coherent, and consistent view to the users. The OLAP system must map its own logical schema to the heterogeneous physical data stores and perform any necessary transformations. The OLAP operations should sit between data sources (e.g., data warehouses) and an OLAP front-end.
 4) Consistent Reporting Performance: Users should not experience any significant degradation in reporting performance as the number of dimensions or the size of the database increases. That is, the performance of OLAP should not suffer as the number of dimensions is increased. Users must observe consistent run time, response time, or machine utilization every time a given query is run.
 5) Client/Server Architecture: The server component of OLAP tools should be sufficiently intelligent that various clients can be attached with a minimum of effort and integration programming. The server should be capable of mapping and consolidating data between dissimilar databases.
 6) Generic Dimensionality: An OLAP system should treat each dimension as equivalent in both its structure and operational capabilities. Additional operational capabilities may be granted to selected dimensions, but such additional functions should be grantable to any dimension.
 7) Dynamic Sparse Matrix Handling: The system should adapt its physical schema to the specific analytical model being created and loaded, so that sparse matrix handling is optimized. When encountering a sparse matrix, the system must be able to dynamically deduce the distribution of the information and adjust its storage and access to obtain and maintain a consistent level of performance.
 8) Multiuser Support: OLAP tools must provide
concurrent data access, data integrity, and access
security.
 9) Unrestricted Cross-dimensional Operations: The system must be able to recognize dimensional hierarchies and automatically perform roll-up and drill-down operations within a dimension or across dimensions.
 10) Intuitive Data Manipulation: Manipulations along consolidation paths, such as reorientation (pivoting), drill-down and roll-up, and other manipulations, should be accomplished naturally and precisely via point-and-click and drag-and-drop actions on the cells of the analytical model. This avoids the use of menus or multiple trips to a user interface.
 11) Flexible Reporting: Business users should be able to organize columns, rows, and cells in a manner that facilitates simple manipulation, analysis, and synthesis of data.
 12) Unlimited Dimensions and Aggregation Levels: The number of data dimensions should be unlimited. Each of these common dimensions must allow a practically unlimited number of user-defined aggregation levels within any given consolidation path.
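Several of the operations named in these rules (slice, dice, roll-up, drill-down, pivoting) can be illustrated on a tiny in-memory cube. This is a minimal sketch, assuming a dictionary keyed by (month, item, city) and a simple month-to-quarter time hierarchy; the function names and data are hypothetical and are not the API of any OLAP product.

```python
from collections import defaultdict

# Hypothetical cube: (month, item, city) -> units sold
cube = {
    ("2024-01", "phone", "Delhi"): 10, ("2024-02", "phone", "Delhi"): 20,
    ("2024-04", "tv", "Delhi"): 5, ("2024-01", "tv", "Mumbai"): 8,
    ("2024-05", "phone", "Mumbai"): 12,
}

def quarter_of(month):
    """Map 'YYYY-MM' to 'Qn': one step up the time hierarchy."""
    return "Q" + str((int(month[5:7]) - 1) // 3 + 1)

def roll_up(cells):
    """Roll up the time dimension from month to quarter, summing units."""
    out = defaultdict(int)
    for (month, item, city), units in cells.items():
        out[(quarter_of(month), item, city)] += units
    return dict(out)

def slice_cube(cells, axis, value):
    """Fix one dimension to a single value and drop it (axis 0=time, 1=item, 2=city)."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cells.items() if k[axis] == value}

def dice_cube(cells, allowed):
    """Keep cells whose value on every dimension is in that dimension's allowed set."""
    return {k: v for k, v in cells.items()
            if all(k[i] in allowed[i] for i in range(len(k)))}

def pivot(cells):
    """Swap the axes of a 2D view: {(row, col): v} becomes {col: {row: v}}."""
    rotated = defaultdict(dict)
    for (row, col), v in cells.items():
        rotated[col][row] = v
    return dict(rotated)

quarterly = roll_up(cube)
delhi = slice_cube(quarterly, 2, "Delhi")  # (quarter, item) view for Delhi
q1_phones = dice_cube(quarterly, [{"Q1"}, {"phone"}, {"Delhi", "Mumbai"}])
print(quarterly)
print(delhi)        # {('Q1', 'phone'): 30, ('Q2', 'tv'): 5}
print(q1_phones)    # {('Q1', 'phone', 'Delhi'): 30}
print(pivot(delhi)) # {'phone': {'Q1': 30}, 'tv': {'Q2': 5}}
```

Drill-down is the reverse of roll_up: going back from the quarterly cells to the monthly cells in cube, which is possible only because the detail data is kept.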
Characteristics of OLAP
 Fast
 The system is targeted to deliver most responses to the user within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds.
 Analysis
 The system can cope with any business logic and statistical analysis that is relevant for the application and the user, while keeping it easy enough for the target user. Although some preprogramming may be needed, the user must be able to define new ad hoc calculations as part of the analysis and to report on the data in any desired way, without having to program. Products (like Oracle Discoverer) that do not allow adequate end-user-oriented calculation flexibility are therefore excluded.
 Share
 The system implements all the security requirements for confidentiality and, if multiple write access is needed, concurrent update locking at an appropriate level. Not all applications need users to write data back, but for the increasing number that do, the system should be able to manage multiple updates in a timely, secure manner.
 Multidimensional
 This is the basic requirement. OLAP system must provide a
multidimensional conceptual view of the data, including full
support for hierarchies, as this is certainly the most logical
method to analyze business and organizations.
 Information
 The system should be able to hold all the data needed by the
applications. Data sparsity should be handled in an efficient
manner.
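Why sparsity matters (Codd's rule 7 and the Information requirement above) can be shown with a quick calculation. This is a minimal sketch, assuming hypothetical dimension sizes of 1,000 customers, 500 products, and 365 days; a dictionary stores only the cells that actually hold values. It illustrates sparse storage in general, not the dynamic adaptation Codd describes.

```python
# Hypothetical dimension sizes: customers x products x days
dims = (1000, 500, 365)

# A dense array would reserve one cell for every combination
dense_cells = 1
for size in dims:
    dense_cells *= size

# Sparse storage: only combinations that actually occurred are stored
sparse = {
    (3, 42, 10): 7,   # customer 3 bought 7 units of product 42 on day 10
    (900, 1, 200): 2,
}

def get(cell):
    """Return the stored measure, or None if the cell is empty."""
    return sparse.get(cell)

print(dense_cells)     # 182500000
print(len(sparse))     # 2
print(get((0, 0, 0)))  # None
```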
The main characteristics of OLAP are as follows:

 Multidimensional conceptual view: OLAP systems let business users have a dimensional and logical view of the data in the data warehouse. It helps in carrying out slice and dice operations.
 Multi-User Support: Since OLAP systems are shared, they should provide normal database operations, including retrieval, update, concurrency control, integrity, and security.
 Accessibility: OLAP acts as a mediator between data warehouses and the front-end. The OLAP operations should sit between data sources (e.g., data warehouses) and an OLAP front-end.
 Storing OLAP results: OLAP results are kept separate from data sources.
 Uniform reporting performance: Increasing the number of dimensions or the database size should not significantly degrade the reporting performance of the OLAP system.

 OLAP distinguishes between zero values and missing values so that aggregates are computed correctly.
 The OLAP system should ignore all missing values and compute correct aggregate values.
 OLAP facilitates interactive queries and complex analysis for users.
 OLAP allows users to drill down for greater details or
roll up for aggregations of metrics along a single
business dimension or across multiple dimension.
 OLAP provides the ability to perform intricate
calculations and comparisons.
 OLAP presents results in a number of meaningful ways,
including charts and graphs.
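The difference between a zero value and a missing value can be sketched as follows. This is a minimal sketch, assuming missing cells are represented as None and genuine zeros as 0; the monthly figures and function names are hypothetical. Averaging over recorded values only gives a different result from treating missing cells as zero.

```python
# Hypothetical monthly sales for one product; Mar has no data reported
sales = {"Jan": 10, "Feb": 0, "Mar": None, "Apr": 20}

def average_ignoring_missing(values):
    """Average over recorded values only: zeros count, None does not."""
    recorded = [v for v in values if v is not None]
    return sum(recorded) / len(recorded) if recorded else None

def average_missing_as_zero(values):
    """The misleading alternative: treat missing cells as zero."""
    values = list(values)
    return sum(v or 0 for v in values) / len(values)

print(average_ignoring_missing(sales.values()))  # 10.0  (30 / 3)
print(average_missing_as_zero(sales.values()))   # 7.5   (30 / 4)
```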
Benefits of OLAP
 Fast query response: OLAP systems are designed to
provide fast query response times, even for complex queries
involving large amounts of data.
 Multidimensional analysis: OLAP systems allow users to
analyze data from multiple dimensions, such as time,
location, product, and customer, providing a deeper
understanding of the data.
 Flexible and customizable: OLAP systems are highly
customizable, allowing users to define their dimensions,
hierarchies, and calculations.
 Improved decision-making: OLAP systems provide users
with the ability to analyze data from different angles,
leading to better insights and more informed decision-
making.
 OLAP holds several benefits for businesses:
 OLAP helps managers make decisions by efficiently
providing multidimensional views of the data, which
increases their productivity.
 OLAP functions are self-sufficient owing to the
inherent flexibility of the underlying organized databases.
 It facilitates the simulation of business models and
problems through extensive analysis capabilities.
 In conjunction with a data warehouse, OLAP can help
reduce the application backlog, speed up data retrieval,
and reduce query drag.
Disadvantages of the OLAP System
 Complexity: OLAP systems can be complex to
implement and maintain, requiring specialized skills and
knowledge.
 Data storage requirements: OLAP systems require a
large amount of storage space to store multidimensional
data, which can be expensive and difficult to manage.
 Limited transactional processing: OLAP systems are
optimized for analytical processing and are not suitable
for transactional processing; using them for transactional
workloads can lead to performance issues.
 Performance degradation with large datasets: As the
size of the dataset increases, the performance of OLAP
systems may degrade, requiring additional hardware
resources to maintain performance.
Motivations for using OLAP
 1) Understanding and improving sales: For enterprises that
have many products and use a number of channels to sell
them, OLAP can help find the most suitable products and the
most popular channels. In some cases, it may also be possible
to identify the most profitable customers. For example, in the
telecommunications industry, even with only one product
(communication minutes), a company that wants to analyze
sales for every hour of the day (24 values), weekdays versus
weekends (2 values), and 50 destination regions must handle
24 × 2 × 50 = 2,400 combinations.
 2) Understanding and decreasing the costs of doing
business: Improving sales is one way of improving a
business; the other is to analyze costs and control them as
much as possible without affecting sales. OLAP can assist in
analyzing the costs related to sales. In some cases, it may
also be possible to identify expenditures that produce a high
return on investment (ROI). For example, recruiting a top
salesperson may involve high costs, but the revenue generated
by that salesperson may justify the investment.
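The size of the telecommunications example can be checked by enumerating every cell of its cube. This is a minimal sketch; the region labels R1 to R50 are hypothetical placeholders, since the example only states that there are 50 regions.

```python
from itertools import product

# Dimensions from the telecom example for a single product (communication minutes).
hours = range(24)                          # hour of the day: 24 values
day_types = ["weekday", "weekend"]         # 2 values
regions = [f"R{i}" for i in range(1, 51)]  # 50 hypothetical region labels

# Every combination of dimension values is one cell of the cube.
cells = list(product(hours, day_types, regions))
print(len(cells))  # 2400
```

Adding more products or more dimensions multiplies this count again, which is why such analysis calls for an OLAP system rather than manual reporting.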
OLAP Operations in the Multidimensional Data Model
 In the multidimensional model, data are
organized into various dimensions, and each
dimension includes multiple levels of abstraction
described by concept hierarchies.
 This organization provides users with the flexibility to
view data from various perspectives.
 A number of OLAP data cube operations exist to
materialize these different views, allowing interactive
querying and analysis of the data at hand.
 Hence, OLAP provides a user-friendly environment for
interactive data analysis.
Roll-Up
 The roll-up operation (also known as drill-up or
aggregation operation) performs aggregation on a data
cube, either by climbing up a concept hierarchy for a
dimension or by dimension reduction. Roll-up is like
zooming out on the data cube. Figure shows the result of a
roll-up operation performed on the dimension location. The
hierarchy for location is defined as the total order
street < city < province or state < country. The roll-up
operation aggregates the data by ascending the location
hierarchy from the level of city to the level of country.
 When a roll-up is performed by dimension reduction, one
or more dimensions are removed from the cube. For
example, consider a sales data cube having two dimensions,
location and time. Roll-up may be performed by removing
the time dimension, resulting in an aggregation of the
total sales by location, rather than by location and by
time.
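Both forms of roll-up described above can be sketched on a small fact list. This assumes a hypothetical city-to-country mapping and made-up sales figures; a real OLAP server would read the hierarchy from its dimension tables.

```python
from collections import defaultdict

# Hypothetical sales facts recorded at the city level.
facts = [
    {"city": "Toronto",   "quarter": "Q1", "sales": 605},
    {"city": "Vancouver", "quarter": "Q1", "sales": 825},
    {"city": "Chicago",   "quarter": "Q1", "sales": 440},
    {"city": "New York",  "quarter": "Q2", "sales": 1087},
]
# Hypothetical concept hierarchy for location: city -> country.
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada",
                   "Chicago": "USA", "New York": "USA"}

def roll_up_location(rows):
    """Climb the location hierarchy from city to country, summing sales."""
    totals = defaultdict(int)
    for r in rows:
        totals[(city_to_country[r["city"]], r["quarter"])] += r["sales"]
    return dict(totals)

def roll_up_drop_time(rows):
    """Roll up by dimension reduction: remove time, keep totals per city."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["city"]] += r["sales"]
    return dict(totals)

print(roll_up_location(facts))
# {('Canada', 'Q1'): 1430, ('USA', 'Q1'): 440, ('USA', 'Q2'): 1087}
```

The first function keeps both dimensions but coarsens location; the second removes the time dimension entirely.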
Drill-Down
 The drill-down operation (also called roll-down) is the
reverse of roll-up. Drill-down is like zooming in on the data
cube: it navigates from less detailed data to more detailed
data. Drill-down can be performed either by stepping down a
concept hierarchy for a dimension or by introducing
additional dimensions.
 Figure shows a drill-down operation performed on the
dimension time by stepping down a concept hierarchy
defined as day < month < quarter < year. The drill-down
descends the time hierarchy from the level of quarter to the
more detailed level of month.
 Because a drill-down adds more detail to the given data, it
can also be performed by adding a new dimension to a
cube. For example, a drill-down on the central cube of the
figure can introduce an additional dimension,
such as customer group.
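A drill-down from quarter to month only works if month-level detail is available underneath the quarterly totals. The sketch below assumes hypothetical month-level facts and a hard-coded month-to-quarter mapping standing in for the time hierarchy.

```python
from collections import defaultdict

# Hypothetical month-level facts; time hierarchy: month -> quarter -> year.
monthly_sales = {"Jan": 150, "Feb": 100, "Mar": 150,
                 "Apr": 200, "May": 180, "Jun": 220}
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def quarter_view(sales):
    """The coarser view the user starts from: totals per quarter."""
    totals = defaultdict(int)
    for month, amount in sales.items():
        totals[month_to_quarter[month]] += amount
    return dict(totals)

def drill_down(sales, quarter):
    """Step down the hierarchy: show the months that make up one quarter."""
    return {m: a for m, a in sales.items() if month_to_quarter[m] == quarter}

print(quarter_view(monthly_sales))      # {'Q1': 400, 'Q2': 600}
print(drill_down(monthly_sales, "Q1"))  # {'Jan': 150, 'Feb': 100, 'Mar': 150}
```

The month values returned by the drill-down always sum to the quarter total shown in the coarser view.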
Slice
 A slice is a subset of the cube corresponding to a
single value of one of its dimensions.
 For example, a slice operation is executed when the
user selects a single value on one dimension of a
three-dimensional cube, resulting in a two-dimensional
view.
 So, the slice operation performs a selection on one
dimension of the given cube, resulting in a
subcube.
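A slice can be sketched by storing the cube as a dictionary keyed by coordinate tuples. This is a minimal illustration, assuming hypothetical dimension names and sales values; fixing time to Q1 drops that coordinate and leaves a two-dimensional (item, location) view.

```python
# Hypothetical 3-D cube: (quarter, item, city) -> sales.
cube = {
    ("Q1", "TV", "Toronto"): 605, ("Q1", "Phone", "Toronto"): 825,
    ("Q1", "TV", "Chicago"): 440, ("Q2", "TV", "Toronto"): 680,
    ("Q2", "Phone", "Chicago"): 512,
}
DIMS = ("time", "item", "location")

def slice_cube(cube, dim, value):
    """Fix one dimension to a single value; the result has one fewer dimension."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}

q1 = slice_cube(cube, "time", "Q1")
print(q1)  # {('TV', 'Toronto'): 605, ('Phone', 'Toronto'): 825, ('TV', 'Chicago'): 440}
```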
Dice

 The dice operation defines a subcube by performing a
selection on two or more dimensions.
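Unlike a slice, a dice keeps every dimension and restricts several of them to sets of allowed values. The sketch below reuses the same hypothetical cube layout as a dictionary of coordinate tuples; the criteria format (dimension name mapped to a set of values) is an assumption made for illustration.

```python
# Hypothetical 3-D cube: (quarter, item, city) -> sales.
cube = {
    ("Q1", "TV", "Toronto"): 605, ("Q1", "Phone", "Toronto"): 825,
    ("Q1", "TV", "Chicago"): 440, ("Q2", "TV", "Toronto"): 680,
    ("Q2", "Phone", "Chicago"): 512,
}
DIMS = ("time", "item", "location")

def dice(cube, criteria):
    """Keep the cells whose coordinates satisfy every criterion.

    criteria maps a dimension name to the set of values allowed on it;
    all dimensions are kept, so the result is still a 3-D subcube.
    """
    positions = {DIMS.index(dim): allowed for dim, allowed in criteria.items()}
    return {key: value for key, value in cube.items()
            if all(key[i] in allowed for i, allowed in positions.items())}

sub = dice(cube, {"item": {"TV"}, "location": {"Toronto", "Chicago"}})
print(sub)
# {('Q1', 'TV', 'Toronto'): 605, ('Q1', 'TV', 'Chicago'): 440, ('Q2', 'TV', 'Toronto'): 680}
```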
Pivot
 The pivot operation is also called a rotation.
 Pivot is a visualization operation that rotates the
data axes in view to provide an alternative presentation
of the data.
 It may involve swapping the rows and columns or
moving one of the row dimensions into the column
dimensions.
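Because a pivot only changes the presentation, it can be sketched as swapping the row and column keys of a two-dimensional view while leaving every cell value intact. The item and city names and the values below are hypothetical.

```python
# Hypothetical 2-D view: rows are items, columns are cities.
view = {
    "TV":    {"Toronto": 605, "Chicago": 440},
    "Phone": {"Toronto": 825, "Chicago": 512},
}

def pivot(table):
    """Swap the row and column axes; the cell values themselves are unchanged."""
    rotated = {}
    for row, cols in table.items():
        for col, value in cols.items():
            rotated.setdefault(col, {})[row] = value
    return rotated

print(pivot(view))
# {'Toronto': {'TV': 605, 'Phone': 825}, 'Chicago': {'TV': 440, 'Phone': 512}}
```

Pivoting twice returns the original view, which reflects that no data is added or lost.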
Types of OLAP Servers
 An Online Analytical Processing (OLAP) server is based
on the multidimensional data model. It allows
managers and analysts to gain insight into the
information through fast, consistent, and interactive
access.
 There are four types of OLAP servers:
 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
 Specialized SQL Servers

Relational OLAP (ROLAP)
 ROLAP is based on the premise that data need not
be stored multidimensionally to be viewed
multidimensionally, and that it is possible to exploit
well-proven relational database technology to handle
the multidimensionality of data.
 In ROLAP, data is stored in a relational database. In
essence, each slicing and dicing action is equivalent
to adding a “WHERE” clause to the SQL statement.
 ROLAP can handle large amounts of data and can
leverage functionality inherent in the relational
database.
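The claim that slicing and dicing reduce to WHERE clauses can be tried directly with Python's built-in sqlite3 module. This is a minimal sketch: the single flat sales table, its columns, and its rows are hypothetical, whereas a real ROLAP server would usually query a star schema with joins to dimension tables.

```python
import sqlite3

# In-memory table standing in for a ROLAP fact table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, item TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Q1", "TV", "Toronto", 605), ("Q1", "Phone", "Toronto", 825),
     ("Q1", "TV", "Chicago", 440), ("Q2", "TV", "Toronto", 680)],
)

# Slice: fixing time = Q1 is a WHERE predicate on one column.
slice_q1 = conn.execute(
    "SELECT item, city, amount FROM sales WHERE quarter = ? ORDER BY item, city",
    ("Q1",),
).fetchall()

# Dice: predicates on two dimensions (item and city) combined with AND.
dice_rows = conn.execute(
    "SELECT quarter, item, city, amount FROM sales "
    "WHERE item = ? AND city IN (?, ?) ORDER BY quarter, city",
    ("TV", "Toronto", "Chicago"),
).fetchall()

print(slice_q1)   # [('Phone', 'Toronto', 825), ('TV', 'Chicago', 440), ('TV', 'Toronto', 605)]
print(dice_rows)  # [('Q1', 'TV', 'Chicago', 440), ('Q1', 'TV', 'Toronto', 605), ('Q2', 'TV', 'Toronto', 680)]
```

No pre-calculated cube is involved: every view is computed from the relational rows at query time, which is also why ROLAP queries can be slower.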
Advantages
 ROLAP servers can easily be used with an existing RDBMS.
 Data can be stored efficiently, since zero facts need not
be stored.
 ROLAP tools do not use pre-calculated data cubes.
 The DSS server of MicroStrategy adopts the ROLAP approach.
Disadvantages
 Poor query performance.
 Some limitations of scalability, depending on the
technology architecture that is utilized.
Multidimensional OLAP (MOLAP)
 MOLAP stores data on disk in a specialized
multidimensional array structure. OLAP is performed on it
by relying on the random-access capability of arrays. Array
elements are determined by dimension instances, and the
fact data or measured value associated with each cell is
usually stored in the corresponding array element. In
MOLAP, the multidimensional array is usually stored in a
linear allocation according to a nested traversal of the axes in
some predetermined order.
 MOLAP systems typically include provisions such as
advanced indexing and hashing to locate data while
performing queries on sparse arrays. MOLAP cubes offer
fast data retrieval, are well suited to slicing and dicing, and
can perform complex calculations. All calculations are
pre-generated when the cube is created.
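The "linear allocation according to a nested traversal of the axes" can be illustrated with a row-major layout, in which the last axis varies fastest. This is a sketch under assumptions: the axis sizes are hypothetical, and actual MOLAP products may choose a different traversal order or chunked storage.

```python
# Hypothetical dense cube: 4 quarters x 3 items x 2 cities, stored as one flat array.
SHAPE = (4, 3, 2)  # sizes of the time, item and location axes

def offset(coords, shape=SHAPE):
    """Row-major position of a cell in the flat array (last axis varies fastest)."""
    pos = 0
    for c, size in zip(coords, shape):
        if not 0 <= c < size:
            raise IndexError(f"coordinate {c} out of range for axis of size {size}")
        pos = pos * size + c
    return pos

cells = [0] * (SHAPE[0] * SHAPE[1] * SHAPE[2])  # 24 cells, one per combination
cells[offset((2, 1, 0))] = 605                   # quarter 2, item 1, city 0
print(offset((2, 1, 0)))  # 14
```

Direct offset arithmetic is what gives MOLAP its fast random access; the same layout also explains the storage cost, because every combination gets a cell even when most of them are empty in a sparse data set.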
Advantages
 MOLAP allows the fastest indexing to the pre-computed
summarized data.
 It helps users connected to a network who need to
analyze larger, less-defined data.
 It is easier to use, so MOLAP is suitable for
inexperienced users.
Disadvantages
 MOLAP is not capable of containing detailed data.
 Storage utilization may be low if the data set is sparse.
Hybrid OLAP (HOLAP)
 HOLAP is a combination of ROLAP and
MOLAP. HOLAP servers allow large volumes of
detailed data to be stored.
 On the one hand, HOLAP leverages the greater
scalability of ROLAP.
 On the other hand, HOLAP leverages cube technology
for faster performance and summary-type
information.
 Cubes are smaller than in MOLAP, since detailed data is
kept in the relational database.
 The database is used to store data in the most
functional way possible.
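The HOLAP split can be sketched as a small precomputed summary that answers aggregate questions, with detail questions falling back to SQL over the relational rows. Everything here is hypothetical: the table, the choice of "total per quarter" as the only precomputed aggregate, and the `query` helper are illustrative assumptions, not the routing logic of any particular HOLAP server.

```python
import sqlite3

# Detail rows stay in a relational table, as on the ROLAP side (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Q1", "Toronto", 605), ("Q1", "Chicago", 440), ("Q2", "Toronto", 680)],
)

# Only a small summary cube (total per quarter) is precomputed, as on the MOLAP side.
summary = dict(conn.execute("SELECT quarter, SUM(amount) FROM sales GROUP BY quarter"))

def query(quarter, city=None):
    """Serve summary questions from the cube; fall back to SQL for detail."""
    if city is None:
        return summary.get(quarter, 0)
    (total,) = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE quarter = ? AND city = ?",
        (quarter, city),
    ).fetchone()
    return total or 0

print(query("Q1"))             # 1045, answered from the precomputed summary
print(query("Q1", "Toronto"))  # 605, answered from the relational detail
```

The summary stays small because it holds one value per quarter, while the detailed rows are never copied into the cube.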
Transparent OLAP (TOLAP)

 TOLAP systems are designed to work transparently
with existing RDBMS systems, allowing users to access
OLAP features without needing to transfer data to a
separate OLAP system.
 This allows for more seamless integration between
OLAP and traditional RDBMS systems.
Web OLAP (WOLAP)
 WOLAP is a Web browser-based technology. A traditional
OLAP application is accessed through a client/server setup,
whereas a WOLAP application is accessed through a web browser.
 It is a three-tier architecture that consists of a client,
middleware, and database server.
 The most appealing features of this style of OLAP were
(past tense intended, since few products categorize
themselves this way) the considerably lower investment
involved on the client side (“all that’s needed is a browser”)
and enhanced accessibility to connect to the data.
 A Web-based application requires no deployment on the
client machine. All that is needed is a Web browser and a
network connection to the intranet or Internet.
Desktop OLAP (DOLAP)
 DOLAP stands for Desktop Online Analytical Processing.
Users download data from the source and work with
the dataset on their desktop.
 Its functionality is limited compared to other OLAP
applications, but it costs less.
Mobile OLAP (MOLAP)
 Mobile OLAP (not to be confused with multidimensional
OLAP) provides OLAP functionality on wireless mobile
devices. Users work with and access the data through
their mobile devices.
Spatial OLAP (SOLAP)
 SOLAP merges the capabilities of Geographic Information
Systems (GIS) and OLAP into a single user interface.
 SOLAP was created because spatial data come in
alphanumeric, image, and vector forms.
 It provides easy and quick exploration of data
that resides in a spatial database.
Cloud OLAP (COLAP)
 COLAP is a cloud-based OLAP solution that allows
users to access data from anywhere and anytime.
 It eliminates the need for on-premise hardware and
software installations, making it a cost-effective and
scalable solution for businesses of all sizes.
 COLAP also offers high availability and disaster
recovery capabilities, ensuring business continuity in
the event of a disaster.
Big Data OLAP (BOLAP)
 BOLAP is an OLAP solution that can handle large
amounts of data, such as data from Hadoop or other
big data sources.
 It provides high-performance analytics on large
datasets and supports complex queries that are
impractical with traditional OLAP tools.
 BOLAP also supports real-time analysis of big data,
allowing users to make informed decisions based on
up-to-date information.
Difference Between ROLAP And MOLAP
S.No. | ROLAP | MOLAP
1. | ROLAP stands for Relational Online Analytical Processing. | MOLAP stands for Multidimensional Online Analytical Processing.
2. | ROLAP is used for large data volumes. | MOLAP is used for limited data volumes.
3. | Access in ROLAP is slower. | Access in MOLAP is faster.
4. | In ROLAP, data is stored in relational tables. | In MOLAP, data is stored in a multidimensional array.
5. | In ROLAP, data is fetched from the data warehouse. | In MOLAP, data is fetched from multidimensional databases (MDDBs).
6. | In ROLAP, complicated SQL queries are used. | In MOLAP, sparse matrices are used.
OLAP IMPLEMENTATION CONSIDERATIONS
 Dimensional modeling
 Design and building of the MDDB
 Selection of the data to be moved into the OLAP system
 Data acquisition or extraction for the OLAP system
 Data loading into the OLAP server
 Computation of data aggregation and derived data
 Implementation of application on the desktop
 Provision of user training
Executive Information Systems (EIS)
 An EIS is a system that helps high-level executives
make policy decisions. It uses high-level data,
analytical models, and user-friendly software to support
decisions. It is a structured, automated tracking system
that operates continuously to keep everything managed,
and it provides exception and status reporting
capabilities.
Advantages
 Easy to use.
 Ability to analyze trends.
 Time management.
 Efficiency.
 Enhances business problem solving.
Disadvantages
 Functions are limited.
 Difficult to keep data current.
 The system can run slowly.
 Less reliable.
Data Warehouse and Business Strategy
