Unit 2
CCW331-BUSINESS ANALYTICS UNIT-II
UNIT II BUSINESS INTELLIGENCE 6
Data Warehouses and Data Mart - Knowledge Management –Types of
Decisions - Decision Making Process - Decision Support Systems –
Business Intelligence –OLAP – Analytic functions
Sl. No. | Topic(s) | T / R* Book | Periods Required | Mode of Teaching (Google Meet / BB / PPT / NPTEL / MOOC, etc.) | Blooms Level (L1-L6) | CO
Text Books
1. James R. Evans, Business Analytics, 2nd Edition, Pearson, 2017
2. R N Prasad, Seema Acharya, Fundamentals of Business Analytics, 2nd Edition, Wiley 2016
3. Philip Kotler and Kevin Keller, Marketing Management, 15th edition, PHI, 2016
4. VSP RAO, Human Resource Management, 3rd Edition, Excel Books, 2010.
5. Mahadevan B, "Operations Management - Theory and Practice", 3rd Edition, Pearson Education, 2018.
DATA WAREHOUSES AND DATA MART
INTRODUCTION
Business intelligence, as we know it today, would not be possible without the data warehouse.
At its core, business intelligence is the ability to answer complex questions about your data and use
those answers to make informed business decisions. In order to do this well, you need a data
warehouse, which not only provides a safe way to centralize and store all your data but also a method
to quickly find the answers you need, when you need them. And that's a pretty important role. By
2025, it's estimated humanity will have produced a total of 175 zettabytes of data. For context, that's
175,000,000,000 terabytes.
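The conversion behind that figure can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where one terabyte is 10^12 bytes and one zettabyte is 10^21 bytes; the function name is illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary units differ).
BYTES_PER_TERABYTE = 10**12
BYTES_PER_ZETTABYTE = 10**21


def zettabytes_to_terabytes(zettabytes: int) -> int:
    """Convert zettabytes to terabytes using power-of-ten units."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_TERABYTE


print(f"{zettabytes_to_terabytes(175):,} TB")  # prints "175,000,000,000 TB"
```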
Where does all of this information go? Well, much of it goes into data warehouses.
Companies use data warehouses to manage transactions, understand their data, and keep it all
organized. In short, data warehouses make large amounts of information more usable for organizations
of all sizes and types. This has made them a linchpin of data pipelines and business intelligence systems
the world over. And understanding how data warehouses work can help you fulfill the full potential of
business intelligence (it's not as complex as it may seem).
WHAT IS A DATABASE?
A database is a store of related data that captures a specific situation. One
example is a point-of-sale (POS) database, which captures and stores all
the relevant data surrounding a retail store's transactions.
Databases come in a variety of flavors: structured relational databases, managed by relational
database management systems (RDBMS), and non-relational databases (known as 'NoSQL') for
semi-structured or unstructured data. New data coming into the database is
processed, organized, managed, updated and then stored, typically in tables. Databases are single-purpose
repositories of raw transactional data. Because a database is closely tied to transactions, it
performs online transaction processing (OLTP).
Data recording capabilities, capturing transactions as they occur and housing those transactions
Breaking that down into human terms, this means data warehouses excel at storing data that's:
Integrated: They combine data from many databases and data sources.
Granular: The data they house is highly detailed and can be used in many different ways.
Historical: They can host a continuous record of data over years and years.
You can store this data in three different ways: on-premise data warehouses, cloud data
warehouses, and hybrid data warehouses.
On-premise data warehouses run on physical servers that your company owns and manages.
Cloud data warehouses are fully online: you pay for capacity on servers that another company
manages, as with Amazon Redshift. Hybrid data warehouses are a mix of both on-premise and
cloud, an option often used by companies that are moving to the cloud over a period of time.
With all the data stored in one place, data warehouses use a specific approach to process data
called online analytical processing (OLAP), which is specifically designed for complex queries.
One way to think about it is that when you go to your data warehouse to ask a question about the
relationship between one set of data and another, OLAP is a way of organizing and moving
among the rows and rows of shelves to quickly find that information.
This is great for business intelligence because the questions you ask about your data in order to
make decisions are rarely simple. Because data warehouses use OLAP, they make finding
answers to these complex questions very efficient. As a result, they've become a foundation for
many successful business intelligence systems.
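To make the idea of a "complex question" concrete, the sketch below asks one analytical question (revenue per region per year) of a small in-memory table. It uses Python's built-in sqlite3 module purely for illustration; the table name, columns, and figures are hypothetical, and a real warehouse would run a dedicated analytical engine rather than SQLite.

```python
import sqlite3

# Hypothetical sales facts; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, year INTEGER, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", 2022, "Shoes", 120.0),
        ("North", 2023, "Shoes", 150.0),
        ("South", 2022, "Shoes", 80.0),
        ("South", 2023, "Hats", 60.0),
        ("North", 2023, "Hats", 40.0),
    ],
)


def revenue_by_region_and_year(connection):
    """Analytical query: summarize many rows along two dimensions."""
    query = """
        SELECT region, year, SUM(amount) AS revenue
        FROM sales
        GROUP BY region, year
        ORDER BY region, year
    """
    return connection.execute(query).fetchall()


for row in revenue_by_region_and_year(conn):
    print(row)
```

Unlike a transactional insert or update, this query reads and summarizes many rows at once, which is the kind of workload OLAP is meant for.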
The glue holding this process together is the data warehouse, which serves as the facilitator of data
storage using OLAP. Data warehouses integrate, summarize, and transform data, making it easier to analyze.
Even though data warehouses serve as the backbone of data storage, they're not the only
technology involved in data storage. Many companies go through a data storage hierarchy before
reaching the point where they absolutely need a data warehouse.
SOURCE DATA
Source data is any individual set of data, such as a database, an Excel spreadsheet, or an
individual application report. It's structured (i.e., organized) yet siloed data that works fine alone but
does not provide a larger picture of your organization's data as a whole.
DATA LAKE
For teams who have graduated to a need to centralize their source data into one place, a
data lake is increasingly becoming the next step. A data lake serves as a central repository for all
raw, unstructured (i.e., not organized) data.
If a data warehouse is like backing up a truck and unloading the data in an orderly fashion into a
well-organized shelving system, data lakes are like backing the truck up and dumping all the data
into, well, a lake. James Dixon, who coined the term "data lake," describes it as the natural raw
state of data that, for people with the diving skills, serves as a frontier to explore.
The drawback of a data lake is that the data is not ready for analysis. It's not well-organized,
there may be duplicates, and in order to make sense of it, you'll need to tell your diver exactly
what you're looking for. Even then, the diver might not find exactly what you need after all that
effort.
DATA WAREHOUSE
Like a data lake, a data warehouse centralizes your data, but as we've established, it's
well-organized and set up for efficient analysis. It's a single source of truth for all data that's
easier to understand and navigate.
Data warehouses can hook right up to source data, but nowadays, we're seeing more and more
companies use their data warehouse as a layer on top of their data lake. Following Dixon's
comparison, if a data lake is the water/data in its natural, unorganized state, a data warehouse is
where you treat it and make it ready for consumption.
Business Intelligence has advanced quickly and dramatically in recent years, and many
people are taking advantage of it. To be the most successful and efficient with this newfound
Business Intelligence (BI) power, it's essential to be able to analyze and harness ALL of your
data. Enter the data warehouse.
Simply put, a data warehouse is a large store of data that's collected from multiple different
sources within a business. A data warehouse is used as storage for data analytic work (OLAP
systems), leaving the transactional database (OLTP systems) free to focus on transactions. With
a significant amount of data kept in one place, it's now easier for businesses to analyze and make
better-informed decisions.
When using a data warehouse to its full potential, analyzing data becomes convenient and
answering important questions about your business becomes simple. Your data is organized and
available so you can get your answers quickly and securely.
Regardless of the specific approach you take to building a data warehouse, there are three
components that should make up your basic structure: a storage mechanism, operational
software, and human resources.
Storage – This part of the structure is the main foundation: it's where your warehouse
will live. There are two main options when it comes to storage: an in-house server
(Oracle, Microsoft SQL Server) or the cloud (Amazon S3, Microsoft Azure). An in-house
server is internal hardware that's set up within your office, and the cloud is a
digital storage solution based on external servers. Either is a feasible option when it
comes to storage; the choice depends on your needs.
Software – This is the operational part of the data warehouse structure. It's often broken
down into two categories: centralization software and visualization software.
Centralization software is needed to collect and maintain the data that comes from all of
your separate databases. Visualization software is needed to take the data and present it in
a visual form to aid analysis. Some centralization software includes visualization
software as part of its package, but it is highly recommended that you have both types of
software regardless.
Labor – This is the management aspect of the data warehouse, something that's
absolutely essential in having a working solution. To keep your warehouse functional, it
might be necessary to hire for new positions within your business. Hiring well-skilled
professionals is crucial, as running a data warehouse requires a lot of knowledge.
However, if you choose to have a cloud-based warehouse, it might not be necessary to
have as many human resources. The cloud is managed by third-party vendors, so it's their
responsibility to do routine maintenance on hardware and servers.
The fundamental use of a data mart is in Business Intelligence (BI) applications. BI is used to
gather, store, access, and analyze records. A data mart can be used by smaller businesses to utilize the data
they have accumulated, since it is less expensive than implementing a data warehouse.
There are mainly two approaches to designing data marts: dependent data marts and independent data marts.
A dependent data mart is a logical or physical subset of a larger data warehouse.
In this technique, the data marts are treated as subsets of a data warehouse:
first a data warehouse is created, and then various data marts are
created from it. These data marts depend on the data warehouse and extract the essential records
from it. Because the data warehouse creates the data marts, there is no need
for data mart integration. This is also known as a top-down approach.
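As an illustration of the dependent (top-down) approach, the sketch below defines a data mart as a logical subset of a warehouse table. It uses a SQL view in an in-memory SQLite database; the table, view, and column names are hypothetical and stand in for whatever a real warehouse would contain.

```python
import sqlite3

# Hypothetical warehouse holding several subject areas in one place.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE warehouse_facts (subject_area TEXT, metric TEXT, value REAL)"
)
warehouse.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "units_sold", 500.0),
        ("sales", "revenue", 12500.0),
        ("finance", "operating_cost", 7000.0),
        ("hr", "headcount", 42.0),
    ],
)

# The dependent data mart is defined entirely in terms of the warehouse,
# so it needs no separate integration step.
warehouse.execute(
    """
    CREATE VIEW sales_mart AS
    SELECT metric, value
    FROM warehouse_facts
    WHERE subject_area = 'sales'
    """
)

sales_mart_rows = warehouse.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
print(sales_mart_rows)
```

A view gives a logical subset; a physical subset would instead copy the selected rows into a separate table or database.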
The second approach is independent data marts (IDM). Here, independent data marts are
created first, and then a data warehouse is designed using these multiple independent data marts.
Because all the data marts are designed independently, integration of the data
marts is required. This is also termed a bottom-up approach, as the data marts are integrated to
develop a data warehouse.
Other than these two categories, one more type exists, called "Hybrid Data Marts."
It allows us to combine input from sources other than a data warehouse. This can be helpful in
many situations, especially when ad hoc integrations are needed, such as after a new group or
product is added to the organization.
The significant steps in implementing a data mart are to design the schema, construct the
physical storage, populate the data mart with data from source systems, access it to make
informed decisions and manage it over time. So, the steps are:
Designing
The design step is the first in the data mart process. This phase covers all of the functions from
initiating the request for a data mart through gathering data about the requirements and
developing the logical and physical design of the data mart.
Constructing
This step involves creating the physical database and logical structures associated with the data
mart to provide fast and efficient access to the data.
1. Creating the physical database and logical structures, such as tablespaces, associated with
the data mart.
2. Creating the schema objects, such as tables and indexes, described in the design step.
3. Determining how best to set up the tables and access structures.
Populating
This step includes all of the tasks related to getting data from the source, cleaning it up,
modifying it to the right format and level of detail, and moving it into the data mart.
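The populating tasks above can be sketched as a small extract, clean, transform, and load routine. Everything in this example is an assumption made for illustration: the raw rows, the day/month/year date format, the rule for discarding unusable amounts, and the target table daily_store_sales.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw rows as they might arrive from a source system.
raw_rows = [
    {"sold_on": "03/01/2024", "store": " chennai ", "amount": "100.50"},
    {"sold_on": "03/01/2024", "store": "Chennai", "amount": "49.50"},
    {"sold_on": "04/01/2024", "store": "Madurai", "amount": "n/a"},  # unusable
    {"sold_on": "04/01/2024", "store": "Madurai", "amount": "75"},
]


def clean(row):
    """Return (iso_date, store, amount), or None if the row cannot be repaired."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    # Assumed source format: day/month/year. Convert to ISO dates.
    day = datetime.strptime(row["sold_on"], "%d/%m/%Y").date().isoformat()
    return day, row["store"].strip().title(), amount


def populate(connection, rows):
    """Clean rows, aggregate to one row per day and store, and load them."""
    totals = {}
    for row in rows:
        cleaned = clean(row)
        if cleaned is None:
            continue
        day, store, amount = cleaned
        totals[(day, store)] = totals.get((day, store), 0.0) + amount
    connection.execute(
        "CREATE TABLE daily_store_sales (day TEXT, store TEXT, total REAL)"
    )
    connection.executemany(
        "INSERT INTO daily_store_sales VALUES (?, ?, ?)",
        [(day, store, total) for (day, store), total in sorted(totals.items())],
    )
    return connection.execute(
        "SELECT day, store, total FROM daily_store_sales ORDER BY day, store"
    ).fetchall()


mart = sqlite3.connect(":memory:")
loaded = populate(mart, raw_rows)
print(loaded)
```

Aggregating to one row per day and store is only one possible "level of detail"; the right grain depends on the questions the data mart has to answer.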
Accessing
This step involves putting the data to use: querying the data, analyzing it, creating reports, charts
and graphs and publishing them.
1. Set up an intermediate layer (meta layer) for the front-end tool to use. This layer
translates database operations and object names into business terms so that the
end-clients can interact with the data mart using words which relate to the business
functions.
2. Set up and manage database structures, like summarized tables, which help queries
submitted through the front-end tools execute rapidly and efficiently.
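Both access structures can be sketched together: a summary table that pre-computes an aggregate, and a meta layer that maps business terms to physical table and column names. This is a minimal sketch; the cryptic physical names (t_sls, rgn_cd, sls_amt), the META_LAYER dictionary, and the helper function are all hypothetical.

```python
import sqlite3

mart = sqlite3.connect(":memory:")
# Physical table with technical names that end users should not need to know.
mart.execute("CREATE TABLE t_sls (rgn_cd TEXT, sls_amt REAL)")
mart.executemany(
    "INSERT INTO t_sls VALUES (?, ?)",
    [("N", 100.0), ("N", 50.0), ("S", 70.0)],
)

# Summary table: pre-aggregated so front-end queries avoid rescanning details.
mart.execute(
    "CREATE TABLE t_sls_by_rgn AS "
    "SELECT rgn_cd, SUM(sls_amt) AS sls_amt FROM t_sls GROUP BY rgn_cd"
)

# Meta layer: business name -> (physical table, {business label: column}).
META_LAYER = {
    "Revenue by Region": ("t_sls_by_rgn", {"Region": "rgn_cd", "Revenue": "sls_amt"}),
}


def run_business_query(connection, business_name):
    """Translate a business term into SQL and return labelled results."""
    table, columns = META_LAYER[business_name]
    select_list = ", ".join(
        f'{physical} AS "{label}"' for label, physical in columns.items()
    )
    cursor = connection.execute(f"SELECT {select_list} FROM {table} ORDER BY 1")
    headers = [description[0] for description in cursor.description]
    return headers, cursor.fetchall()


headers, rows = run_business_query(mart, "Revenue by Region")
print(headers, rows)
```

The SQL is assembled only from names stored in the meta layer, never from user-typed text, which keeps the string formatting safe in this sketch.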
Managing
This step involves managing the data mart over its lifetime. In this step, management functions
are performed as:
Data Warehouse | Data Mart
It may hold multiple subject areas. | It holds only one subject area, for example Finance or Sales.
It works to integrate all data sources. | It concentrates on integrating data from a given subject area or set of source systems.
Fact constellation schema is used. | Star schema and snowflake schema are used.
CONCLUSION
A data warehouse is a great solution for centralizing and easily analyzing your business's
data. It increases data availability, boosts efficiency in analytical activity, improves the quality of
information needed for reporting, and makes working with data secure. The structure of a data
warehouse is basic, consisting of a storage system, two types of software, and a few employees
to make it all work.
KNOWLEDGE MANAGEMENT
WHAT IS KNOWLEDGE MANAGEMENT?
Companies with a knowledge management strategy achieve business outcomes more quickly as
increased organizational learning and collaboration among team members facilitates faster
decision-making across the business. It also streamlines more organizational processes, such as
training and on-boarding, leading to reports of higher employee satisfaction and retention.
TYPES OF KNOWLEDGE
Tacit knowledge: This type of knowledge is typically acquired through experience, and
it is intuitively understood. As a result, it is challenging to articulate and codify, making it
difficult to transfer this information to other individuals. Examples of tacit knowledge
can include language, facial recognition, or leadership skills.
Implicit knowledge: While some literature equates implicit knowledge with tacit
knowledge, some academics break out this type separately, arguing that the definition
of tacit knowledge is more nuanced. While tacit knowledge is difficult to codify,
implicit knowledge does not necessarily have this problem. Instead, implicit information
has yet to be documented. It tends to exist within processes, and it can be referred to as
"know-how" knowledge.
Explicit knowledge: Explicit knowledge is captured within various document types such
as manuals, reports, and guides, allowing organizations to easily share knowledge across
teams. This type of knowledge is perhaps the most well-known and examples of it
include knowledge assets such as databases, white papers, and case studies. This form of
knowledge is important to retain intellectual capital within an organization as well as
facilitate successful knowledge transfer to new employees.
Step 1: Collecting
This is the most important step of the knowledge management process. If you collect the
incorrect or irrelevant data, the resulting knowledge may not be the most accurate. Therefore, the
decisions made based on such knowledge could be inaccurate as well.
There are many methods and tools used for data collection. First of all, data collection should be
a defined procedure in the knowledge management process. This procedure should be properly
documented and followed by the people involved in data collection.
The data collection procedure defines certain data collection points. Some points may be the
summary of certain routine reports. As an example, monthly sales reports and daily attendance
reports may be two good resources for data collection.
Along with the data collection points, the data extraction techniques and tools are also defined. As an
example, the sales report may be a paper-based report, where a data entry operator needs to feed
the data manually into a database, whereas the daily attendance report may be an online report
that is stored directly in the database.
In addition to data collection points and extraction mechanisms, data storage is also defined in this
step. Most organizations now use a software database application for this purpose.
Step 2: Organizing
The data collected need to be organized. This organization usually happens based on certain
rules. These rules are defined by the organization.
As an example, all sales-related data can be filed together and all staff-related data could be
stored in the same database table. This type of organization helps to maintain data accurately
within a database.
If there is a lot of data in the database, techniques such as 'normalization' can be used for
organizing the data and reducing duplication.
This way, data is logically arranged and related to one another for easy retrieval. When data
passes step 2, it becomes information.
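A small example shows how normalization reduces duplication. The sketch assumes hypothetical staff records in which each row repeats the department's location, and it splits them into two related structures so that each location is stored once.

```python
# Hypothetical denormalized rows: (staff_id, name, department, department_location).
# The location is repeated for every member of the same department.
flat_rows = [
    ("S001", "Asha", "Sales", "Chennai"),
    ("S002", "Ravi", "Sales", "Chennai"),
    ("S003", "Meena", "Finance", "Madurai"),
]


def normalize(rows):
    """Split flat rows into a staff table and a departments table."""
    departments = {}  # department name -> location, stored once per department
    staff = []        # (staff_id, name, department); department acts as the link
    for staff_id, name, department, location in rows:
        departments[department] = location
        staff.append((staff_id, name, department))
    return staff, departments


staff, departments = normalize(flat_rows)
print(staff)
print(departments)
```

After the split, changing a department's location means updating one entry instead of every staff row that mentions it.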
Step 3: Summarizing
In this step, the information is summarized in order to take the essence of it. The lengthy
information is presented in tabular or graphical format and stored appropriately.
For summarizing, there are many tools that can be used such as software packages, charts
(Pareto, cause-and-effect), and different techniques.
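As one example of summarizing, the table behind a Pareto chart can be computed by sorting categories by frequency and adding a cumulative percentage. The complaint categories and counts below are hypothetical.

```python
def pareto_table(counts):
    """Return (category, count, cumulative_percent) rows, largest count first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    table = []
    running = 0
    for category, count in ordered:
        running += count
        table.append((category, count, round(100 * running / total, 1)))
    return table


# Hypothetical complaint counts collected in the earlier steps.
complaints = {"Late delivery": 50, "Damaged item": 30, "Wrong item": 15, "Other": 5}

for row in pareto_table(complaints):
    print(row)
```

In this made-up data, the first two categories account for 80% of all complaints, which is the kind of essence the summarizing step is meant to surface.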
Step 4: Analyzing
At this stage, the information is analyzed in order to find the relationships, redundancies and
patterns.
An expert or an expert team should be assigned for this purpose as the experience of the
person/team plays a vital role. Usually, there are reports created after analysis of information.
Step 5: Synthesizing
At this point, information becomes knowledge. The results of analysis (usually the reports) are
combined together to derive various concepts and artefacts.
A pattern or behavior of one entity can be applied to explain another, and collectively, the
organization will have a set of knowledge elements that can be used across the organization.
This knowledge is then stored in the organizational knowledge base for further use.
Usually, the knowledge base is a software implementation that can be accessed from anywhere
through the Internet.
You can also buy such knowledge base software or download an open-source implementation of
the same for free.
Step 6: Decision Making
At this stage, the knowledge is used for decision making. As an example, when estimating a
specific type of project or task, the knowledge related to previous estimates can be used.
This accelerates the estimation process and improves accuracy. This is how organizational
knowledge management adds value and saves money in the long run.
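The estimation example can be sketched as a lookup against stored knowledge. This is only an illustration: the knowledge base contents are hypothetical, and averaging past effort is one simple rule among many that an organization might choose.

```python
from statistics import mean

# Hypothetical knowledge base: task type -> actual effort of past tasks (person-days).
knowledge_base = {
    "report_design": [4.0, 5.0, 6.0],
    "data_migration": [10.0, 14.0],
}


def estimate_effort(task_type, history=knowledge_base):
    """Estimate effort as the average of past tasks of the same type.

    Returns None when there is no prior knowledge for this task type.
    """
    past_efforts = history.get(task_type)
    if not past_efforts:
        return None
    return mean(past_efforts)


print(estimate_effort("report_design"))   # 5.0
print(estimate_effort("new_task_type"))   # None
```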
Knowledge management tools
There are a number of tools that organizations utilize to reap the benefits of knowledge
management. Examples of knowledge management systems can include:
Content management systems (CMS) are applications which manage web content
where end users can edit and publish content. These are commonly confused with
document management systems, but CMSs can support other media types, such as audio
and video.
Intranets are private networks that exist solely within an organization, which enable the
sharing of enablement, tools, and processes among internal stakeholders. While they can
be time-consuming and costly to maintain, they provide a number of groupware services,
such as internal directories and search, which facilitate collaboration.
Wikis can be a popular knowledge management tool given their ease of use. They make it
easy to upload and edit information, but this ease can lead to concerns about
misinformation as workers may update them with incorrect or outdated information.
Data warehouses aggregate data from different sources into a single, central, consistent
data store to support data analysis, data mining, artificial intelligence (AI), and machine
learning. Data is extracted from these repositories so that companies can derive insights,
empowering employees to make data-driven decisions.
Organizational Culture: Management practices will affect the type of organization that
executives lead. Managers can build learning organizations by rewarding and
encouraging knowledge sharing behaviors across their teams. This type of leadership sets
the groundwork for teams to trust each other and communicate more openly to achieve
business outcomes.
Communities of practice: Centers of excellence in specific disciplines provide
employees with a forum to ask questions, facilitating learning and knowledge transfer. In
this way, organizations increase the number of subject matter experts in a given area of
the company, reducing dependencies on specific individuals to execute certain tasks.
Armed with the right tools and strategies, knowledge management practices have seen success in
specific applications, such as:
Self-serve customer service: Customers repeatedly say they'd prefer to find an answer
themselves, rather than pick up the phone to call support. When done well, a knowledge
management system helps businesses decrease customer support costs and increase
customer satisfaction.
Identification of skill gaps: When teams create relevant documentation around implicit
or tacit knowledge or consolidate explicit knowledge, it can highlight gaps in core
competencies across teams. This provides valuable information to management to form
new organizational structures or hire additional resources.
Make better-informed decisions: Knowledge management systems arm individuals and
departments with knowledge. By improving accessibility to current and historical
enterprise knowledge, your teams can upskill and make more information-driven
decisions that support business goals.
Maintains enterprise knowledge: If your most knowledgeable employees left
tomorrow, what would your business do? Practicing internal knowledge management
enables businesses to create an organizational memory: capture the knowledge held by your
long-term employees and other experts, then make it accessible to your wider team.
Operational efficiencies: Knowledge management systems create a go-to place that
enables knowledge workers to find relevant information more quickly. This, in turn,
reduces the amount of time spent on research, leading to faster decision-making and cost
savings through operational efficiencies. Increased productivity not only saves time, but
also reduces costs.
Increased collaboration and communication: Knowledge management systems and
organizational cultures work together to build trust among team members. These
information systems provide more transparency among workers, creating more
understanding and alignment around common goals. Engaged leadership and open
communication create an environment for teams to embrace innovation and feedback.
Data Security: Knowledge management systems enable organizations to customize
permission control, viewership control and the level of document-security to ensure that
information is shared only in the correct channels or with selected individuals. Give your
employees the autonomy to access knowledge safely and with confidence.
CONCLUSION
Knowledge management is an essential practice for enterprise organizations.
Organizational knowledge adds long-term benefits to the organization in terms of finances,
culture and people. Therefore, all mature organizations should take necessary steps for
knowledge management in order to enhance the business operations and organization's overall
capability.
TYPES OF DECISIONS
INTRODUCTION
Modern cloud-native organizations have constantly growing streams of raw data flowing
from every corner of the enterprise. Determining the impact this data has on business
performance can be an overwhelming task requiring teams of analysts. That's where employing
business intelligence (BI) can help. By presenting current and historical data within a business
context, the data insights supplied by BI tools enable organizations to make smarter, more
confident decisions that provide strategic direction for years to come.
Instead of relying on intuition and "gut feel," companies can use BI to find new ways to
increase revenue, track performance, boost operational efficiency, identify market trends, expose
problems, and much, much more. Before we dig deeper into the primary types of decisions in
business intelligence, let's define what we mean by decision-making in a business context and
understand how business intelligence factors into the process.
DECISION-MAKING DEFINED
Simply put, decision-making is the process of deciding something, especially with a
group of people. From a business decision perspective, the aim is to achieve business objectives
to satisfy stakeholder requirements, needs, and expectations.
For the decision to be effective, however, decision makers must forecast the outcome of each
option and determine which is best for a particular situation. That makes decision support
systems (DSS) like decision intelligence and business intelligence absolute essentials.
BI is the means through which organizations make smarter business decisions. While data fuels
the engine, integrating BI-related infrastructure like a data warehouse, dashboards, reports, data
discovery tools, and cloud data services makes it possible to extract insights from your data.
Using BI and advanced analytics, organizations can extract crucial facts from the mountain of
data, transforming it into information companies can act on to make informed strategic decisions.
The result: improved business processes, operational efficiency, and business productivity.
On the other hand, business analytics is an umbrella term for predictive data analysis techniques
(which can tell you what's going to happen) and prescriptive techniques (which tell you what you
should be doing to create better outcomes).
Using business intelligence and analytics efficiently can be the difference between companies that
succeed and those that fail in the modern environment.
STRATEGIC DECISIONS
Strategic decisions comprise the highest level of organizational business decisions and are
usually less frequent and made by the organization's executives. Yet, their impact is enormous
and far-reaching.
Some types of strategic decisions include selecting a particular market to penetrate, a company to
acquire, or whether to hire additional staff.
Decisions made at this level usually involve significant expenditure. However, they are generally
non-repetitive in nature and are taken only after careful analysis and evaluation of many
alternatives.
TACTICAL DECISIONS
Tactical decisions (or semi-structured decisions) occur with greater frequency (e.g., weekly or
monthly) and fall into the mid-management level. Often, they relate to the implementation of
strategic decisions.
Examples of tactical decisions include product price changes, work schedules, departmental
reorganization, and similar activities. The impact of these types of decisions is medium regarding
risk to the organization and impact on profitability.
OPERATIONAL DECISIONS
Operational decisions (or structured decisions) usually happen frequently (e.g., daily or hourly),
relate to day-to-day operations of the enterprise, and have a lesser impact on the organization.
Operational decisions determine the day-to-day profitability of the business, how effectively it
retains customers, or how well it manages risk.
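You can summarize these types of decisions in business intelligence this way:
Strategic decisions: made by executives; infrequent; enormous, far-reaching impact.
Tactical decisions (semi-structured): made at the mid-management level; weekly or monthly; medium impact.
Operational decisions (structured): tied to day-to-day operations; daily or hourly; lesser impact.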
INTRODUCTION
Decision making is the process of making choices by identifying a decision, gathering
information, and assessing alternative resolutions.
Using a step-by-step decision-making process can help you make more deliberate, thoughtful
decisions by organizing relevant information and defining alternatives. This approach increases
the chances that you will choose the most satisfying alternative possible.
CONCLUSION
In any business situation there are multiple directions in which to take a strategy or an
initiative. The variety of alternatives to weigh, and the volume of decisions that must be made
on an ongoing basis, especially in large organizations, make the implementation of an
effective decision-making process a crucial element of managing successful business operations.
A decision support system (DSS) is an information system that aids a business in decision-
making activities that require judgment, determination, and a sequence of actions. The
information system assists the mid- and high-level management of an organization by analyzing
huge volumes of unstructured data and accumulating information that can help to solve problems
and help in decision-making. A DSS is either human-powered, automated, or a combination of
both.
A decision support system produces detailed information reports by gathering and analyzing
data. Hence, a DSS is different from a normal operations application, whose goal is to collect
data and not analyze it.
In a JIT inventory system, the organization requires real-time data on its inventory levels to
place orders "just in time" and so prevent production delays that would cause a negative domino effect.
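A decision rule of this kind can be sketched as a reorder-point check. The rule used here (expected usage during the supplier's lead time plus an optional safety stock) is a common textbook simplification and an assumption of this example, not something the notes specify; all parameter names are illustrative.

```python
def should_reorder(on_hand, daily_usage, lead_time_days, safety_stock=0):
    """Return True when stock has fallen to or below the reorder point.

    Assumed rule: reorder point = expected usage during the supplier's
    lead time plus an optional safety stock.
    """
    reorder_point = daily_usage * lead_time_days + safety_stock
    return on_hand <= reorder_point


# With 20 units used per day, a 4-day lead time, and 10 units of safety stock,
# the reorder point is 90 units.
print(should_reorder(on_hand=100, daily_usage=20, lead_time_days=4, safety_stock=10))  # False
print(should_reorder(on_hand=90, daily_usage=20, lead_time_days=4, safety_stock=10))   # True
```

The check is only as good as the on_hand figure fed into it, which is why the DSS needs real-time inventory data.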
Therefore, a DSS is more tailored to the individual or organization making the decision than a
traditional system.
1. Model Management System
The model management system stores models that managers can use in their decision-making.
The models are used in decision-making regarding the financial health of the organization and
forecasting demand for a good or service.
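As an illustration of the kind of model such a system might store, the sketch below forecasts next-period demand with a simple moving average. The choice of model, the window size, and the sample figures are assumptions made for this example; real DSS models are often more sophisticated.

```python
def moving_average_forecast(demand_history, window=3):
    """Forecast next-period demand as the mean of the last `window` periods."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(demand_history) < window:
        raise ValueError("not enough history for the chosen window")
    recent = demand_history[-window:]
    return sum(recent) / window


# Hypothetical monthly demand in units.
monthly_demand = [100, 120, 110, 130, 150]
print(moving_average_forecast(monthly_demand))  # 130.0
```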
2. User Interface
The user interface includes tools that help the end-user of a DSS to navigate through the system.
3. Knowledge Base
The knowledge base includes information from internal sources (information collected in a
transaction process system) and external sources (newspapers and online databases).
It automates monotonous managerial processes, which means more of the manager's time
can be spent on decision-making.
It improves interpersonal communication within the organization.
Disadvantages of a Decision Support System
The cost to develop and implement a DSS is a huge capital investment, which makes it
less accessible to smaller organizations.
A company can develop a dependence on a DSS, as it is integrated into daily decision-
making processes to improve efficiency and speed. However, managers tend to rely on
the system too much, which takes away the subjectivity aspect of decision-making.
A DSS may lead to information overload because an information system tends to
consider all aspects of a problem. It creates a dilemma for end-users, as they are left with
multiple choices.
Implementation of a DSS can cause fear and backlash from lower-level employees. Many
of them are not comfortable with new technology and are afraid of losing their jobs to
technology.
Business intelligence (BI) is an umbrella term for data analysis techniques, applications and
practices used to support the decision-making processes in business. The term was proposed by
Howard Dresner in 1989 and became widespread in the late 1990s.
Business intelligence assists business owners in making important decisions based on their
business data. Rather than directly telling business owners what to do, business intelligence
allows them to analyze the data they have to understand trends and get insights, thus scaffolding
the decision-making process.
BI includes a wide variety of techniques and tools for data analytics, including tools for ad-hoc
analytics and reporting, OLAP tools, real-time business intelligence, SaaS BI, etc. Another
important area of BI is data visualization software, dashboards, and scorecards.
We can say that OLAP occupies a place between a data warehouse and end-user tools in BI, thus
allowing users to get the data they need in a fast and efficient way.
What is OLAP?
OLAP (for online analytical processing) is software for performing multidimensional analysis at
high speeds on large volumes of data from a data warehouse, data mart, or some other unified,
centralized data store.
Most business data have multiple dimensions—multiple categories into which the data are
broken down for presentation, tracking, or analysis. For example, sales figures might have
several dimensions related to location (region, country, state/province, store), time (year, month,
week, day), product (clothing, men/women/children, brand, type), and more.
But in a data warehouse, data sets are stored in tables, each of which can organize data into just
two of these dimensions at a time. OLAP extracts data from multiple relational data sets and
reorganizes it into a multidimensional format that enables very fast processing and very
insightful analysis.
This is where the OLAP cube comes in. The OLAP cube extends the single table with additional
layers, each adding additional dimensions, usually the next level in the "concept hierarchy" of
the dimension. For example, the top layer of the cube might organize sales by region; additional
layers could be country, state/province, city and even specific store.
In theory, a cube can contain an infinite number of layers. (An OLAP cube representing more
than three dimensions is sometimes called a hypercube.) And smaller cubes can exist within
layers—for example, each store layer could contain cubes arranging sales by salesperson and
product. In practice, data analysts will create OLAP cubes containing just the layers they need,
for optimal analysis and performance.
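To make the idea of a cube concrete, here is a minimal sketch in Python. It assumes a tiny, made-up data set in which every cell is addressed by three dimensions (region, quarter, product); the names `cube`, `DIMENSIONS`, and `total_by` are illustrative and do not belong to any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, quarter, product) -> units sold.
# Each component of the key is one dimension of the cube.
cube = {
    ("North", "Q1", "Shirts"): 120,
    ("North", "Q1", "Shoes"): 80,
    ("North", "Q2", "Shirts"): 150,
    ("South", "Q1", "Shirts"): 90,
    ("South", "Q2", "Shoes"): 60,
}

DIMENSIONS = ("region", "quarter", "product")


def total_by(cube, dimension):
    """Sum the measure over every dimension except the one named."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[axis]] += units
    return dict(totals)


sales_by_region = total_by(cube, "region")
sales_by_product = total_by(cube, "product")
```

The same cells answer both questions without restructuring the data, which is the point of keeping every dimension in the key. A real OLAP engine stores and indexes the cells far more efficiently than a dictionary does.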
Drill-down
The drill-down operation converts less-detailed data into more-detailed data through one of two
methods: moving down in the concept hierarchy or adding a new dimension to the cube. For
example, if you view sales data for an organization's calendar or fiscal quarter, you can
drill down to see sales for each month, moving down in the concept hierarchy of the "time"
dimension.
Roll up
Roll up is the opposite of the drill-down function: it aggregates data on an OLAP cube by
moving up in the concept hierarchy or by reducing the number of dimensions. For example, you
could move up in the concept hierarchy of the "location" dimension by viewing each country's
data, rather than each city.
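Drill-down and roll up can be sketched as the same aggregation run at different levels of a hierarchy. The example below assumes a small set of invented monthly records with a time hierarchy (quarter, month) and a location hierarchy (country, city); `records`, `FIELDS`, and `aggregate` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Hypothetical monthly sales records: (country, city, quarter, month, amount).
records = [
    ("US", "Boston", "Q1", "Jan", 100),
    ("US", "Boston", "Q1", "Feb", 120),
    ("US", "Austin", "Q1", "Jan", 80),
    ("Canada", "Toronto", "Q1", "Mar", 70),
    ("Canada", "Toronto", "Q2", "Apr", 90),
]
FIELDS = ("country", "city", "quarter", "month")


def aggregate(records, levels):
    """Sum amounts grouped by the chosen hierarchy levels."""
    idx = [FIELDS.index(name) for name in levels]
    totals = defaultdict(int)
    for row in records:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


by_quarter = aggregate(records, ["quarter"])          # coarse time view
by_month = aggregate(records, ["quarter", "month"])   # drill down in time
by_city = aggregate(records, ["city"])                # detailed location view
by_country = aggregate(records, ["country"])          # roll up in location
```

Moving from `by_quarter` to `by_month` is a drill-down, and moving from `by_city` to `by_country` is a roll up. In both cases only the grouping level changes, never the underlying records.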
Slice and dice
The slice operation creates a sub-cube by selecting a single dimension from the main OLAP
cube. For example, you can perform a slice by highlighting all data for the organization's first
fiscal or calendar quarter (time dimension).
The dice operation isolates a sub-cube by selecting several dimensions within the main OLAP
cube. For example, you could perform a dice operation by highlighting all data by an
organization's calendar or fiscal quarters (time dimension) and within the U.S. and Canada
(location dimension).
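A rough sketch of slice and dice follows, assuming the cube is held as a plain Python dictionary keyed by (quarter, country, product). The helper `select` and the sample figures are made up for illustration; real OLAP tools expose these operations through their own query languages and interfaces.

```python
# Hypothetical cube cells: (quarter, country, product) -> sales amount.
cube = {
    ("Q1", "US", "Shirts"): 100,
    ("Q1", "Canada", "Shirts"): 40,
    ("Q1", "Mexico", "Shoes"): 30,
    ("Q2", "US", "Shoes"): 70,
    ("Q3", "Canada", "Shoes"): 50,
}
DIMENSIONS = ("quarter", "country", "product")


def select(cube, **allowed):
    """Keep only cells whose members fall in the allowed set per dimension."""
    def keep(key):
        return all(
            key[DIMENSIONS.index(dim)] in members
            for dim, members in allowed.items()
        )
    return {key: value for key, value in cube.items() if keep(key)}


# Slice: fix one dimension (time) to a single member.
q1_slice = select(cube, quarter={"Q1"})

# Dice: restrict several dimensions at once (time and location).
na_dice = select(cube, quarter={"Q1", "Q2"}, country={"US", "Canada"})
```

Both operations return a smaller cube with the same structure, so the result can be rolled up, drilled into, or sliced again.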
Pivot
The pivot function rotates the current cube view to display a new representation of the data—
enabling dynamic multidimensional views of data. The OLAP pivot function is comparable to
the pivot table feature in spreadsheet software, such as Microsoft Excel, but while pivot tables in
Excel can be challenging, OLAP pivots are relatively easier to use (less expertise is required) and
have a faster response time and query performance.
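The rotation that pivot performs can be sketched on a two-dimensional view. The example assumes a nested dictionary whose outer keys are regions and inner keys are quarters; `view` and `pivot` are illustrative names, not part of any spreadsheet or OLAP API.

```python
# Hypothetical two-dimensional view: rows are regions, columns are quarters.
view = {
    "North": {"Q1": 120, "Q2": 150},
    "South": {"Q1": 90, "Q2": 60},
}


def pivot(view):
    """Swap the row and column dimensions of a nested-dict view."""
    rotated = {}
    for row_label, columns in view.items():
        for col_label, value in columns.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


by_quarter = pivot(view)
```

No value changes during the rotation; only the dimension shown on rows and the one shown on columns trade places. Pivoting twice returns the original view.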
Despite the variety and complexity of data stored in the corporate environment, everything is
typically recorded in simple columns and rows. This is the classic spreadsheet look we're all
familiar with, and that's how most databases file data.
For the most part, businesses use databases to record transactions. This is an operational need,
as we have to save our sales results, customer information, etc. Later, this data can be
A typical OLAP system will include the following components that perform dedicated functions
to handle analytical queries.
Data source. This could be a transactional database or any other storage we take data from. The
data in its standard format isn't optimized for OLAP queries, so it requires transformation and
remodeling before it can be used.
OLAP database. This is where we store data for analysis. Usually, transformation takes place
before the data is uploaded to a database, but the approach may vary.
OLAP cube. This is basically a tool for representing multidimensional data for analysis. As we're
talking about online analytical processing, cubes are deployed on a dedicated server.
An OLAP cube allows analysts to group or slice items by different categories. Cubes are
primarily designed to run complex queries, which can't be handled by the usual OLTP
databases. Here, a user can perform cube-specific operations with data, so we'll cover them in a
dedicated section.
Analytical interface. The interactions with cubes and other analytical tools for data
visualization and reporting are done via a dedicated interface. The majority of interfaces are
represented by business intelligence dashboards. Cubes can be accessed via these dashboards,
providing more control to a user.
The easiest way to understand how OLAP works is to
compare OLAP and OLTP databases and explore how they structure and process data.
Now, we'll look in more detail at how both types can be used, what operations they run, and how
the data is structured for OLTP and OLAP purposes.
The data will be saved as a set of items and values that relate to this transaction. An
OLTP solution will allow a user to perform the following operations with this data:
insert,
copy,
paste,
edit/update, and
delete.
Such transactions have a short response time – measured in seconds – as they are natural to
OLTP. But when it comes to more complex queries that involve aggregating data from
multiple tables, a transactional database will run into trouble. The more data a query has to
read, the more problematic and resource-intensive it is for OLTP.
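The contrast between transactional operations and an analytical query can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are invented for illustration, and SQLite stands in for an OLTP database only as an assumption for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER, price REAL)"
)

# Typical OLTP work: small writes that each touch a single row.
conn.execute("INSERT INTO orders (item, qty, price) VALUES ('shirt', 2, 20.0)")
conn.execute("INSERT INTO orders (item, qty, price) VALUES ('shoes', 1, 50.0)")
conn.execute("INSERT INTO orders (item, qty, price) VALUES ('hat', 3, 10.0)")
conn.execute("UPDATE orders SET qty = 3 WHERE item = 'shirt'")
conn.execute("DELETE FROM orders WHERE item = 'hat'")

# An analytical question has to read every remaining row to aggregate it.
total_sales = conn.execute("SELECT SUM(qty * price) FROM orders").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```

With two rows the aggregate is instant, but its cost grows with the size of the table, while each insert, update, or delete stays roughly constant. That difference is what motivates a separate store for analysis.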
Analytical requests are often much more complex than "show me total sales amount."
More often than not, we need to compare things to each other and look at the data from
different dimensions. That's where OLAP technology kicks in.
But to run complex custom queries, we must structure data properly. That's why in most
cases, there is a need for a separate OLAP database or warehouse that will model data for
multidimensional analysis.
OLAP models a database in such a way that it becomes possible to quickly gather the data and
present it to analysts in a multidimensional mode rather than a flat table. That's why OLTP
and OLAP databases will differ in numerous ways.
Now, we have to answer two simple questions. How is OLAP data modeling different
from transactional databases? And why can't we run such complex queries in OLTP?
This is the most standard way we store data and make modifications to transactional information.
Such an approach works great for simple queries to modify transactional data. But if we need to
query something like "compare sales of a given item in the 3rd quarter for the last three years
in the US," the relational database will require enormous resources because it may have to scan
each table entirely to find all the related values.
Moreover, the query will return disparate data items with a lot of unnecessary information, as
a normalized transactional schema isn't organized for filtering by multiple dimensions at once
(product type, time period, location).
In a star schema we structure data around facts, providing the keys to every dimension for
measurement. A fact, in this case, is a category of related business items, e.g., product, sales
amount, revenue, customers, time, location, etc. Each of these items is a separate dimension that
includes subcategories. So we can divide, for example, time by year, quarter, month, week, and
day.
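A star schema can be sketched with Python's built-in `sqlite3` module. The sketch assumes the common convention in which a central fact table holds the numeric measure plus one key per dimension table. All table names, column names, and figures (`fact_sales`, `dim_time`, and so on) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
-- The fact table holds the measure plus one key per dimension.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_time VALUES (1, 2021, 3), (2, 2022, 3), (3, 2023, 3), (4, 2023, 4);
INSERT INTO dim_location VALUES (1, 'US'), (2, 'Canada');
INSERT INTO dim_product VALUES (1, 'shirt'), (2, 'shoes');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 100.0), (2, 1, 1, 130.0), (3, 1, 1, 90.0),
    (3, 2, 1, 40.0),  (4, 1, 1, 75.0),  (2, 1, 2, 60.0);
""")

# Third-quarter sales of one item in the US, compared across years.
rows = conn.execute("""
    SELECT t.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    WHERE t.quarter = 3 AND l.country = 'US' AND p.name = 'shirt'
    GROUP BY t.year
    ORDER BY t.year
""").fetchall()
conn.close()
```

Because every fact row carries a key to each dimension, the three filters (time, location, product) are applied in a single query. The time dimension here stops at quarter, but month, week, and day columns could be added to the same table.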
A multidimensional model of data is what makes it possible for OLAP systems to extract
the required information, perform complex filtering, and allow for analysis of this data.
Depending on how we plan to design the business intelligence system, OLAP may or may
not require a separate database to run queries. The architecture of a BI system with a
standalone OLAP repository looks something like this:
Data extraction. First, the data is extracted from its original sources and uploaded to a
unified data storage. In the case of BI, a data warehouse will be the place we upload data to.
Data preparation. Once we've got the data, it requires optimization and modeling for
multidimensional analysis. In some cases, the corporate data warehouse (DW) can be optimized to
run OLAP queries, but a more typical case is to use a separate OLAP database. Here are a few
reasons why.
Running analytical and transactional queries on separate databases eliminates the risk of
overloads and database downtimes, while guaranteeing decent performance of the two.
Applying data models is easier when we use a storage for a single purpose.
Data transformation and integration is usually done via ETL/ELT tools, which help developers
to automate data extraction, transformation, and uploading.
Building a cube. Once the data is prepared, a group of responsible data engineers will model
cubes and deploy them on the dedicated server. Creating a cube is a custom process each time,
because data can't be updated once it has been modeled in a cube. So, for each specific query, a
new cube will be created.
Accessing data. As an end point in the system, OLAP cubes will be accessed through
analytical interfaces. Here, analysts can type in commands and perform cube-specific
operations to analyze data.
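The extraction and preparation steps above can be sketched as a tiny extract-transform-load pipeline using only the Python standard library. The CSV export, the cleaning rules, and the in-memory `warehouse` dictionary are all assumptions made for this example; production pipelines rely on dedicated ETL/ELT tools.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export from a transactional source system.
RAW_EXPORT = """order_id,date,country,amount
1,2023-01-15,us,100.50
2,2023-02-03,US,20.00
3,2023-04-20,Canada,75.25
4,2023-05-01,canada,not available
"""


def extract(text):
    """Read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize country names, derive the quarter, and drop invalid amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        month = int(row["date"][5:7])
        quarter = f"Q{(month - 1) // 3 + 1}"
        cleaned.append((row["country"].upper(), quarter, amount))
    return cleaned


def load(rows):
    """Load cleaned rows into an in-memory store keyed by (country, quarter)."""
    store = defaultdict(float)
    for country, quarter, amount in rows:
        store[(country, quarter)] += amount
    return dict(store)


warehouse = load(transform(extract(RAW_EXPORT)))
```

Keeping the three stages as separate functions mirrors the separation in the architecture: the source system is only read from, and all cleaning happens before the data reaches the analytical store.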
Now, let's look at the cubes themselves and define the capabilities they give to the analyst.
Drill down allows a user to move from high-level data (e.g., annual sales) to a lower level (e.g.,
monthly sales). Here we use the concept of hierarchy that applies to every single dimension. So,
in the "time" dimension, we can move down from yearly figures to weekly or even daily
records. This depends on how you store your data and model the actual cube.
Roll up is the opposite of drill down, as it basically lifts the data in hierarchy levels. Both operations
either make the data more or less detailed, or add/remove dimensions for the analysis.
Slice operations help you divide a certain dimension into a separate table (one-dimension
view). "Slice" can detach, say, the city dimension from the rest of the cube, which
will create a separate spreadsheet. This way we can analyze low-level information in an isolated
environment.
Dice provides the same separation functionality, but allows you to choose more than one
dimension, producing a separate cube.
Pivot is similar to creating pivot tables in Excel. This function allows us to rotate
a cube to get a different representation of the data across its dimensions.
All in all, the functions can be used in conjunction, which gives huge flexibility to use a single
cube for multiple purposes. But, as we mentioned before, each time there is a modification to
data, the cube will require reuploading the information or remodeling the existing OLAP DB.
OLAP PROVIDERS
OLAP is a vital part of any BI system. Despite its resource-intensive nature, OLAP remains a
standard solution for complex analytics that can't be done in the usual databases. As the
technology appeared in the early '90s, the market of solutions is quite large, and the main
offerings come from data warehouse/business intelligence providers.
Nearly any provider these days supports all of the basic functions of OLAP and allows the
creation of multidimensional cube systems as a part of their BI platform. Now, let's look at some
popular products that can be used as a separate OLAP tool.
Apache Kylin is an open-source distributed data warehouse for big data and OLAP. Kylin was developed
for internal analytics at eBay. It has been open source since 2014 and is distributed under a free license.
While it focuses on analyzing big data, Kylin can also be used for corporate warehouses of medium size.
Plus, Kylin integrates with popular BI interfaces such as Tableau, Superset, Qlik, and Zeppelin.
Microsoft SQL Server Analysis Services (SSAS). As a part of its Azure Cloud Platform and Power BI
analytical solutions, Microsoft offers a separate product for OLAP. Currently, they call it Azure
Analysis Services. Basically, it's an OLAP modeling and processing tool integrated with Power BI.
The pricing is calculated like all the Azure products, based on computation resources. You can check
pricing on the corresponding page.
IBM Cognos TM1 is another platform that consists of multiple tools for data analysis, cube modeling,
and data visualization. As with Microsoft, TM1 includes a broad range of Watson Analytics products
and a dedicated analytical server. The pricing for TM1 is available from IBM on their price planning
page.
But while software vendors offer tools for data modeling and analysis, you'll still need data
engineers/data analysts to model and analyze the information. In other words, OLAP falls into a
category of specific analytical tools that require data engineering or BI development expertise to work
with. In this case, you may think of hiring a team of data specialists to handle a custom project for your
corporate needs.
As you might remember, a traditional relational database stores values in rows, while
columns denote categories of items. A column database instead organizes the tables in the
database by column. Simple as that sounds, this layout provides capabilities similar to
those of an OLAP database. Each table will represent a dimension that can be quickly scanned
and analyzed.
Column databases can potentially be used as a data warehouse capable of handling OLAP
queries by nature. While this approach was described way back in 2012 in different studies,
it gained popularity only a few years ago. This led to the emergence of column-oriented
cloud data warehouses.
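The difference between row and column layouts can be sketched with plain Python containers. The three records below are invented, and the lists only mimic how a column store keeps each field together; actual column-oriented warehouses add compression and on-disk formats that this sketch leaves out.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"item": "shirt", "country": "US", "amount": 100},
    {"item": "shoes", "country": "US", "amount": 60},
    {"item": "shirt", "country": "Canada", "amount": 40},
]

# Column-oriented layout: each field is stored as its own contiguous list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Summing a measure touches one column instead of every field of every row.
total_from_columns = sum(columns["amount"])
total_from_rows = sum(row["amount"] for row in rows)

# A filter combines two columns by position.
us_total = sum(
    amount
    for country, amount in zip(columns["country"], columns["amount"])
    if country == "US"
)
```

Both layouts hold the same values and give the same totals. The column layout simply lets an aggregate read only the fields it needs, which is why it suits analytical queries.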