2.1 Principles of Dimensional Modeling
Example:
Let's consider a retail business that wants to analyze its sales data. From requirements gathering,
they identify dimensions like time, product, store, and customer, along with facts such as sales
revenue and quantity sold. They then design a dimensional model where time, product, store, and
customer are dimensions, and sales revenue and quantity sold are facts. This model is implemented
logically and then physically, optimized for querying performance. Finally, it undergoes testing,
deployment, and ongoing maintenance to ensure its effectiveness.
In essence, dimensional modeling bridges the gap between business requirements and data design
by providing a structured approach to organizing and analyzing data for decision-making purposes.
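The retail example above can be sketched as a small star schema. This is only an illustration under stated assumptions: the table and column names (fact_sales, dim_time, dim_product, dim_store, dim_customer) and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real warehouse platform.

```python
import sqlite3

# Hypothetical star schema for the retail example: one fact table
# surrounded by the time, product, store, and customer dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store    (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    time_id     INTEGER REFERENCES dim_time(time_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    store_id    INTEGER REFERENCES dim_store(store_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    revenue     REAL,     -- fact: sales revenue
    quantity    INTEGER   -- fact: quantity sold
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, "2024-01-15", "2024-01", 2024), (2, "2024-02-10", "2024-02", 2024)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "Delhi"), (2, "Kolkata")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1000.0, 1),
    (1, 2, 2, 2, 300.0, 2),
    (2, 1, 2, 1, 2000.0, 2),
])

# Analytical query: join the facts to a dimension and aggregate by category.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue), SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

The fact table holds only keys and numeric measures, while descriptive attributes live in the dimension tables, so the same facts can be grouped by any dimension with a single join.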
OLAP
Online Analytical Processing (OLAP) is a technology that is used to organize large business
databases and support business intelligence. OLAP databases are divided into one or more cubes,
and each cube is organized and designed by a cube administrator to fit the way users retrieve
and analyze data, so that it is easier to create and use the PivotTable reports and PivotChart
reports that they need.
Characteristics of OLAP
The FASMI Test: FASMI stands for Fast Analysis of Shared Multidimensional Information. The test
describes the characteristics of an OLAP application in a specific way, without dictating how
they should be implemented.
a. Fast − The system is targeted to deliver most responses to users within about five seconds,
with the simplest analyses taking no more than one second and very few taking more than
20 seconds.
Independent research in the Netherlands has shown that end users assume that a process has
failed if results are not received within 30 seconds, and they are apt to hit
‘ALT+Ctrl+Delete’ unless the system warns them that the report will take longer.
b. Analysis − The system can cope with any business logic and statistical analysis that is
relevant for the application and the user, while keeping it easy enough for the target user.
Although some pre-programming may be required, it is not acceptable if all application
definitions have to be completed using a professional 4GL.
The user must be able to define new ad hoc calculations as part of the analysis and to
report on the data in any desired way, without having to program. This excludes products
(like Oracle Discoverer) that do not allow adequate end-user-oriented calculation
flexibility.
c. Shared − The system implements all the security requirements for confidentiality
(possibly down to cell level) and, if multiple write access is required, concurrent update
locking at an appropriate level. Not all applications require users to write data back, but
for the increasing number that do, the system must be able to handle multiple updates in an
appropriate, secure manner. This is a major area of weakness in some OLAP products, which
tend to assume that all OLAP applications will be read-only, with simple security controls.
d. Multidimensional − The system should support a multidimensional conceptual view of the
data, including complete support for hierarchies and multiple hierarchies. No specific
minimum number of dimensions is set, because this is too application dependent and most
products seem to have enough for their target markets.
e. Information − Information is all of the data and derived data required, wherever it is and
however much is relevant for the application. The capacity of a product is measured in terms
of how much input data it can handle, not how many gigabytes it takes to store it.
Advantages of OLAP
Fast query performance due to optimized storage, multidimensional indexing and caching.
Smaller on-disk size of data compared to data stored in a relational database, due to
compression techniques.
Automated computation of higher-level aggregates of the data.
It is very compact for low-dimension data sets. Array models provide natural indexing.
Effective data extraction achieved through the pre-structuring of aggregated data.
Disadvantages of OLAP
In some OLAP solutions the processing step (data load) can be quite lengthy, especially on
large data volumes. This is usually remedied by doing only incremental processing, i.e.,
processing only the data that has changed (usually new data) rather than reprocessing the
entire data set.
Some OLAP techniques introduce data redundancy.
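The incremental processing mentioned above can be sketched as follows. This is a minimal illustration, not how any particular OLAP product works: the class name IncrementalAggregate is hypothetical, and it assumes append-only source rows that carry a monotonically increasing row ID.

```python
from collections import defaultdict


class IncrementalAggregate:
    """Keeps per-key totals and folds in only rows not yet processed."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.last_row_id = 0  # high-water mark of rows already processed

    def process(self, rows):
        """rows: iterable of (row_id, key, amount). Returns the number of new rows applied."""
        applied = 0
        for row_id, key, amount in sorted(rows):
            if row_id <= self.last_row_id:
                continue  # already reflected in the totals, skip it
            self.totals[key] += amount
            self.last_row_id = row_id
            applied += 1
        return applied


agg = IncrementalAggregate()
first_load = agg.process([(1, "A", 10.0), (2, "B", 5.0)])
# The second load contains the old rows plus one new row; only the new row is applied.
second_load = agg.process([(1, "A", 10.0), (2, "B", 5.0), (3, "A", 2.5)])
print(first_load, second_load, dict(agg.totals))
```

Rows that are updated or deleted in place would need a different change-detection scheme (for example, timestamps or change logs), which this sketch does not cover.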
OLAP Applications
a. Business Reporting for sales: Business reporting gives an overview of the sales
activities within an organization. It shows the trends in sales over a certain time
period. It also analyzes the different stages of the sales process and the performance of
sales executives. These reports can be used to analyze the sales data, assess the
situation, and decide on the best course of action.
b. Marketing: Industries such as digital marketing, health care, eCommerce, and finance use
OLAP in their marketing. Example: Market Basket Analysis is a technique that studies the
purchases made by customers in a supermarket. It identifies patterns of items that
customers frequently purchase together. This analysis can help companies promote deals,
offers, and sales, and data mining techniques help to carry out this analysis.
c. Management Reporting: It aims to inform managers about different aspects of the
organization, using data from the various departments of the company, in order to help
them make better decisions. The reports collect the data and present it in an
understandable way. They also provide insight into how the company is doing and what
steps should be taken to increase efficiency and remain competitive in the market.
d. Business Process Management: Business process management refers to improving a
business process from end to end by analyzing it. It helps organizations identify the
steps required to carry out a business task.
e. Financial Reporting: Financial reporting refers to the financial reports of an organization
that are released to stakeholders and the public. It includes the financial statements, such
as the balance sheet, income statement, and statement of cash flows. It shows the financial
information that the company chooses to disclose.
Some other applications of OLAP are as follows:
Marketing analysis
Customer and product profitability
Supply and Demand forecasting
Human resources analysis
Resource analysis and capacity planning
Variance analysis
Claims experience analysis
b. Transparency:
It makes the technology, underlying data repository, computing architecture, and the
diverse nature of source data totally transparent to users.
c. Accessibility:
Access should be provided only to the data that is actually needed to perform the specific
analysis, presenting a single, coherent and consistent view to the users.
f. Generic Dimensionality:
It should be ensured that every data dimension is equivalent in both structure and
operational capabilities. There should be one logical structure for all dimensions.
h. Multi-user Support:
Support should be provided for end users to work concurrently with either the same
analytical model or to create different models from the same data.
k. Flexible Reporting:
Business users are given the capability to arrange columns, rows, and cells in a manner that
allows easy manipulation, analysis and synthesis of information.
OLAP Functions
OLAP functions provide powerful capabilities for analyzing multidimensional data, enabling organizations
to derive insights, make informed decisions, and optimize business processes. These functions encompass a
wide range of analytical techniques, from basic aggregations to advanced forecasting and time series analysis,
tailored to meet diverse analytical needs across industries and domains.
1. Aggregation Functions:
Description: Aggregation functions calculate summary statistics or aggregate values across one or
more dimensions in the data cube.
Examples:
Sum: Calculates the total sum of a measure across selected dimensions. For instance, summing
up sales revenue across product categories.
Average: Computes the average value of a measure across selected dimensions. For example,
calculating the average monthly sales quantity.
Count: Counts the number of data points or records within a dimension. For instance, counting
the number of customers in each region.
Minimum/Maximum: Determines the minimum or maximum value of a measure within selected
dimensions. For example, finding the highest and lowest temperatures recorded per month.
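The aggregation functions listed above can be sketched in a few lines. This is an illustrative sketch only: the sample rows and the helper name aggregate are hypothetical, and a real OLAP engine would compute such summaries inside the cube rather than in application code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows: (category, region, revenue)
sales = [
    ("Electronics", "North", 1200.0),
    ("Electronics", "South", 800.0),
    ("Furniture", "North", 300.0),
    ("Furniture", "South", 500.0),
    ("Furniture", "South", 200.0),
]


def aggregate(rows, key_index, measure_index):
    """Group rows by one dimension and compute summary statistics of one measure."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[measure_index])
    return {
        key: {
            "sum": sum(values),
            "avg": mean(values),
            "count": len(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }


by_category = aggregate(sales, key_index=0, measure_index=2)
by_region = aggregate(sales, key_index=1, measure_index=2)
print(by_category["Electronics"])
```

Changing key_index switches the dimension being summarized, which is the same idea as choosing a different axis of the cube.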
2. Ranking Functions:
Description: Ranking functions assign a rank or position to data based on specified criteria, enabling
the identification of top or bottom performers.
Examples:
Rank: Assigns a rank to each data point based on a measure, such as ranking products by
sales revenue.
Top N/Bottom N: Selects the top or bottom N data points based on a measure, such as
identifying the top 5 best-selling products.
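As a rough sketch of the ranking functions above, the following assigns ranks and selects the top and bottom N items. The product names, revenue figures, and function names are made up for illustration, and the tie-handling rule (tied values share a rank, and the next rank is skipped) is one assumed convention among several.

```python
# Hypothetical measure: sales revenue per product.
revenue_by_product = {
    "Laptop": 5000,
    "Phone": 7000,
    "Tablet": 3000,
    "Monitor": 7000,
    "Mouse": 500,
}


def rank_desc(measures):
    """Rank items from highest to lowest value; tied values share a rank (1, 1, 3, ...)."""
    ordered = sorted(measures.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    for position, (name, value) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and previous[1] == value:
            ranks[name] = ranks[previous[0]]  # tie: reuse the previous item's rank
        else:
            ranks[name] = position
    return ranks


def top_n(measures, n):
    """Return the n items with the highest values."""
    return sorted(measures.items(), key=lambda kv: kv[1], reverse=True)[:n]


def bottom_n(measures, n):
    """Return the n items with the lowest values."""
    return sorted(measures.items(), key=lambda kv: kv[1])[:n]


ranks = rank_desc(revenue_by_product)
print(ranks)
print(top_n(revenue_by_product, 2))
print(bottom_n(revenue_by_product, 2))
```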
3. Forecasting Functions:
Description: Forecasting functions use historical data to predict future trends, allowing organizations
to anticipate demand or plan resources.
Examples:
Moving Average: Calculates the average of a measure over a specified period, smoothing out
fluctuations to identify trends.
Exponential Smoothing: Applies weighted averages to historical data, giving more
importance to recent observations in forecasting future values.
Time Series Analysis: Utilizes statistical models to analyze patterns and seasonality in
sequential data points, projecting future values based on historical trends.
Trend Analysis: Identifies long-term trends in data, such as increasing sales over several
years.
Seasonality Detection: Detects recurring patterns or cycles in data, such as increased sales
during holiday seasons.
Smoothing Techniques: Removes noise or irregular fluctuations in data to reveal underlying
trends more clearly.
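Two of the techniques above, the moving average and exponential smoothing, can be sketched as follows. The monthly figures are invented, the function names are hypothetical, and the smoothing shown is the simple single-parameter form (s_t = alpha * x_t + (1 - alpha) * s_{t-1}), which is assumed here and ignores trend and seasonality.

```python
def moving_average(series, window):
    """Simple moving average: one value for each full window of observations."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; a higher alpha weights recent observations more."""
    smoothed = [series[0]]  # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


monthly_sales = [100, 120, 110, 130, 150, 140]  # hypothetical data

ma3 = moving_average(monthly_sales, window=3)
ses = exponential_smoothing(monthly_sales, alpha=0.5)

# With simple exponential smoothing, the last smoothed value serves as
# the one-step-ahead forecast for the next period.
next_month_forecast = ses[-1]
print(ma3, ses, next_month_forecast)
```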
5. Calculation Functions:
Description: Calculation functions create custom calculations or derive new measures from existing
data to meet specific analytical requirements.
Examples:
Profit Margin Calculation: Calculates the profit margin by subtracting the cost from the
revenue and dividing by revenue.
Year-over-Year Growth: Computes the percentage change in a measure from one year to the
next, indicating growth or decline.
Market Basket Analysis: Identifies associations or relationships between items purchased
together to inform cross-selling strategies.
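The first two calculations above reduce to short formulas. A minimal sketch, with hypothetical function names and made-up input values:

```python
def profit_margin(revenue, cost):
    """Profit margin as a fraction of revenue: (revenue - cost) / revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return (revenue - cost) / revenue


def year_over_year_growth(current, previous):
    """Percentage change from the previous year's value to the current year's value."""
    if previous == 0:
        raise ValueError("previous-year value must be non-zero")
    return (current - previous) * 100.0 / previous


margin = profit_margin(revenue=200.0, cost=150.0)            # a quarter of revenue is profit
growth = year_over_year_growth(current=120.0, previous=100.0)
decline = year_over_year_growth(current=90.0, previous=100.0)
print(margin, growth, decline)
```

A negative growth value indicates a decline from the previous year.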
OLAP operations:
There are five basic analytical operations that can be performed on an OLAP cube:
1. Drill down: In the drill-down operation, less detailed data is converted into more detailed
data. It can be done by:
Moving down in the concept hierarchy
Adding a new dimension
In the cube given in the overview section, the drill-down operation is performed by moving down
in the concept hierarchy of the Time dimension (Quarter -> Month).
2. Roll up: It is the opposite of the drill-down operation. It performs aggregation on the OLAP
cube. It can be done by:
Climbing up in the concept hierarchy
Reducing the dimensions
In the cube given in the overview section, the roll-up operation is performed by climbing up
in the concept hierarchy of the Location dimension (City -> Country).
3. Dice: It selects a sub-cube from the OLAP cube by restricting two or more dimensions. In the
cube given in the overview section, a sub-cube is selected by applying the following criteria
to the dimensions:
Location = “Delhi” or “Kolkata”
Time = “Q1” or “Q2”
Item = “Car” or “Bus”
4. Slice: It fixes a single dimension of the OLAP cube to one value, which results in a new
sub-cube. In the cube given in the overview section, Slice is performed on the dimension Time
= “Q1”.
5. Pivot: It is also known as rotation operation as it rotates the current view to get a new view
of the representation. In the sub-cube obtained after the slice operation, performing pivot
operation gives a new view of it.
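The five operations above can be sketched on a tiny in-memory cube. This is only an illustration: the cell values, the Toronto and Bike entries, and the helper names group_sum and pivot are hypothetical, and a flat list of tuples stands in for a real multidimensional store.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, country, quarter, month, item, units_sold)
cells = [
    ("Delhi",   "India",  "Q1", "Jan", "Car",  10),
    ("Delhi",   "India",  "Q1", "Feb", "Bus",  4),
    ("Kolkata", "India",  "Q2", "Apr", "Car",  6),
    ("Toronto", "Canada", "Q1", "Mar", "Car",  8),
    ("Toronto", "Canada", "Q3", "Jul", "Bike", 5),
]
CITY, COUNTRY, QUARTER, MONTH, ITEM, UNITS = range(6)


def group_sum(rows, *dims):
    """Sum units_sold over the given dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[UNITS]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Arrange totals as a nested mapping: row value -> column value -> units."""
    table = defaultdict(dict)
    for (row_key, col_key), total in group_sum(rows, row_dim, col_dim).items():
        table[row_key][col_key] = total
    return dict(table)


# Roll up: climb the Location hierarchy from City to Country.
rolled_up = group_sum(cells, COUNTRY, QUARTER)

# Drill down: descend the Time hierarchy from Quarter to Month.
drilled_down = group_sum(cells, COUNTRY, MONTH)

# Slice: fix one dimension to a single value (Time = "Q1").
q1_slice = [r for r in cells if r[QUARTER] == "Q1"]

# Dice: restrict several dimensions to sets of values.
diced = [
    r for r in cells
    if r[CITY] in {"Delhi", "Kolkata"}
    and r[QUARTER] in {"Q1", "Q2"}
    and r[ITEM] in {"Car", "Bus"}
]

# Pivot: rotate the slice by swapping which dimension is on rows and columns.
by_item_city = pivot(q1_slice, ITEM, CITY)
by_city_item = pivot(q1_slice, CITY, ITEM)
print(rolled_up, by_item_city, by_city_item, sep="\n")
```

Roll-up and drill-down differ only in the level of the hierarchy used for grouping, while slice and dice are filters, and pivot changes the presentation without changing the underlying totals.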
Definition:
A data warehouse is a centralized repository that stores integrated, historical data
from various sources within an organization. It is optimized for analytical processing
and supports decision-making processes.
Benefits:
Improved Decision-Making: Data warehouses enable informed decision-making by
providing timely access to reliable, integrated data for analysis.
Enhanced Agility: Data warehouses support agile decision-making by enabling quick
access to relevant data and insights, allowing organizations to respond rapidly to
changing market conditions.
Cost Savings: Data warehouses help reduce costs associated with data duplication,
inconsistency, and inefficiency by providing a single source of truth for
organizational data.
Innovation: Data warehouses support innovation by providing a platform for
advanced analytics, data mining, and predictive modeling to uncover new business
opportunities and optimize processes.
Example:
A retail company uses a data warehouse to analyze sales data from various sources,
including point-of-sale systems, online transactions, and customer feedback. By
analyzing historical sales data, the company identifies trends in customer purchasing
behavior, seasonality effects, and product performance. Based on these insights, the
company develops targeted marketing campaigns, optimizes inventory management,
and introduces new product offerings to align with its strategic objectives of
increasing market share and customer satisfaction.