BI Bro Notes Full
DSS
It analyzes large amounts of data and presents the best option available to the organization.
It helps decision makers to solve complex problems. DSS is a basic component in the development of business
intelligence architecture.
Three main elements of DSS are data, models and graphical user interface for handling the dialogue between
the system and the user.
The decision-making process includes three phases and two additional phases.
o Intelligence
o Design
o Choice
Additional phases
o Implementation
o Control
Intelligence: In this phase, the decision maker examines the reality and tries to understand the problem
occurring in the organization, why the problem exists and what effect it is having on the firm.
Design: This phase involves identifying and exploring various solutions to solve the problem. Here the
experience of the decision-maker plays a very important role.
Choice: The alternatives identified in the design phase are evaluated using the performance criteria, and the best one is selected.
Implementation: The selected alternative has to be transformed into action by means of an implementation
plan, which includes assigning roles to all involved in the action plan.
Control: Once the implementation is completed, the performance indicator values expected in the choice phase
need to be compared with those observed after implementation. The results are to be transformed into
experience and information, which is stored in the data warehouse for use in future decision-making.
BI
The large amount of data obtained through the Internet is often heterogeneous in origin and represented in
different formats, and its content also varies with its origin.
So, there is a need to convert such data into information and knowledge that decision-makers can use to make
decisions for the improvement of their business.
Business intelligence is a set of mathematical models and analysis methodologies that exploit the available
data to generate information and knowledge useful for complex decision-making processes.
Benefits
Make information actionable as the user can get data as per their requirement to get the knowledge.
Decision makers can make better decisions as exact and up-to-date information is provided.
Multiple data sources can be combined through BI, so decisions can be taken faster.
Business metrics reports are available and can be accessed from anywhere and whenever there is a need.
BI Architecture
Data sources
In addition to internal data, operational data also includes external data obtained from commercial
databases, and databases associated with suppliers and customers, which include unstructured documents
like emails.
It is necessary to perform all operations associated with extraction and integration of data from different
heterogeneous sources.
Data warehouse
An ETL tool (extract, transform, load) is a tool that reads data from one or more sources, transforms the data so
that it is compatible with destination, and loads the data to the destination databases.
ETL function transforms the relevant data collected from different source systems into useful information and
then stores it in data warehouse, which can be used for strategic decision making.
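The three ETL steps can be sketched in a few lines. This is only a minimal illustration, not a real ETL tool: the CSV text, the orders table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source: a CSV export from an operational system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,alice,5.25
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: make the rows compatible with the destination table."""
    return [
        (int(r["order_id"]), r["customer"].strip().lower(), float(r["amount"]))
        for r in rows
    ]

def load(conn, rows):
    """Load: write the transformed rows to the destination database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(warehouse, transform(extract(SOURCE_CSV)))

# Once loaded, the data can be queried for decision making.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

The transform step is where inconsistent spellings of the same customer are reconciled, so the totals group correctly after loading.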
The main purpose of data warehouse is to provide information to the business manager for strategic decision
making.
Extracted data from data warehouse is supplied to the mathematical model and analysis methodologies.
These users interact with the warehouse using user access tools, such as
This data is generated from internal transactions and from external sources.
This data collected and stored in a systematic manner is still not suitable for the decision-making process.
It needs to be extracted, using tools, and processed using analytical methods, capable of transforming them
into information and knowledge which can be used by decision-makers.
Data
Data encompasses plain facts, observation, statistics, character, symbols, images, numbers, and more that
are collected to be used for analysis.
Data alone is not very informative, but it gains purpose after it is interpreted to derive significance.
Qualitative data is not measurable. It is descriptive and includes conversations, observations, and feedback.
Information
Information is the processed, refined, structured, and presented form of data which can be used to create
relevance and usefulness.
A variety of knowledge management systems, software and tools are used by the organizations to turn data into
information, some of which include databases, spreadsheets, documents, guidelines and strategies.
Information assigns meaning to the data. It also improves the reliability of data.
Converting data to information improves understandability and reduces uncertainty; as a result, information
carries no useless details.
Knowledge
Active way involves applying a mathematical model in the form of inductive learning.
A business intelligence system uses mathematical models and algorithms to provide decision makers with
information and knowledge extracted from data.
This includes calculation of totals and percentages, graphical representation by histogram and more analysis
through optimization and learning models.
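The simplest of these analyses, totals, percentages, and a histogram, might look like the sketch below. The sales records and region names are made-up sample data, and the histogram is drawn as plain text.

```python
from collections import Counter

# Hypothetical sales records: (region, amount).
sales = [("north", 40), ("south", 25), ("north", 20), ("east", 15)]

# Totals per region.
totals = Counter()
for region, amount in sales:
    totals[region] += amount

# Each region's share of the grand total, as a percentage.
grand_total = sum(totals.values())
percentages = {region: 100 * value / grand_total for region, value in totals.items()}

# A crude text histogram: one '#' per 5 units sold.
for region, value in totals.items():
    print(f"{region:<6} {'#' * (value // 5)} {percentages[region]:.0f}%")
```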
o An abstract model developed helps the decision makers to focus on the main features of the domain
which leads to deeper understanding of the problem under investigation.
o Knowledge acquired while building a mathematical model about the domain can be transferred to other
individuals in the long run.
o Mathematical models developed for decision-making are useful for solving other problems of similar
types.
The usage of business intelligence, data mining methods, and decision support systems raises ethical problems
that cannot be ignored.
The distortions and risks generated towards information and knowledge should be avoided using adequate
control rules, and mechanisms.
Data related to individuals which do not respect the right to privacy should not be tolerated.
Business intelligence analysts and decision makers should abide by the ethical principle of respect for the
personal rights of individuals.
BI Unit – 2
Data warehouse
Data warehouse keeps the historical information of business-related data. This information helps the
businesspeople to take future decisions for business and also do analysis based on past data.
Three different kinds of systems are required for a data warehouse:
o Source system
o Staging area
o Presentation servers
The data travels from source system to presentation servers via the data staging area.
Source System
The data in the data warehouse comes from the operational system of the organization as well as from external
sources.
Staging Area
The data extracted from source system is stored in an area called data staging area, where the data is cleaned,
transformed, combined, and de-duplicated to prepare it for the data warehouse.
The data staging area is generally a collection of machines where simple activity like sorting and sequential
processing takes place.
The staging area does not provide any query or presentation service. As soon as a system provides query
or presentation services, it is categorized as a presentation server.
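The kind of work done in the staging area (cleaning, combining, removing duplicate rows, sorting) can be sketched as follows. The two source extracts, the field names, and the rule of keeping the first row seen per email are all assumptions made for illustration.

```python
# Hypothetical extracts from two source systems.
crm_rows = [{"email": " Ann@Example.com ", "city": "Pune"},
            {"email": "raj@example.com", "city": "Mumbai"}]
billing_rows = [{"email": "ann@example.com", "city": "Pune"},
                {"email": "li@example.com", "city": "Delhi"}]

def clean(row):
    """Clean and transform: trim whitespace and normalize case."""
    return {"email": row["email"].strip().lower(), "city": row["city"].strip()}

def stage(*sources):
    """Combine all sources, keep one row per email, and sort the result."""
    seen = {}
    for source in sources:
        for row in map(clean, source):
            seen.setdefault(row["email"], row)  # first occurrence wins
    return sorted(seen.values(), key=lambda r: r["email"])

staged = stage(crm_rows, billing_rows)
```

Nothing here answers queries; the staged rows are only prepared for loading into a presentation server.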
Presentation servers
Presentation server is the target machine on which the data is loaded from the data staging area, organized,
and stored for direct querying by end users, report writers and other applications.
BI is where companies gather today’s information from the market about their competitors.
DW helps the enterprise in storing the data while BI helps control the data in decision-making or forecasting.
A business intelligence tool collects, processes, and analyzes large amounts of data from internal and external
systems.
Data may be collected from sources like documents, images, emails, videos, journals, books, social media,
etc.
BI makes use of queries to find information and present the data in user friendly formats like reports,
dashboard, charts, and graphs.
Functions performed by the tool include data mining, data visualization, performance management analytics,
reporting, text mining, predictive analytics, and more.
Spreadsheets
Spreadsheets are also provided as front-end user interface by business intelligence software.
Digital dashboards
The software helps create visual analysis of data sets to gain meaningful insights of data within minutes.
OLAP tool
OLAP tool helps the user in interactively analyzing the data from multiple sources in a multidimensional view.
Mobile BI
This is about bringing intelligence to mobile devices to read historical data and analyze real-time information.
Data mining
This tool is used in discovering patterns in large data sets using AI, machine learning, statistics, and
database systems.
Process mining
This is the mining of stored event logs, which is used for providing information for process analysis and governance.
Business intelligence provides the organization with the ability to observe and analyze the customer buying
trends.
This information can be used to create products and product improvements to meet their expectations and
needs.
Improved visibility
Organizations using business intelligence have better control over their processes and standard operating
procedures since visibility is improved by BI system.
Actionable information
It also allows you to understand the implications of various processes and changes within an organization,
which helps in making informed decisions and taking action.
Efficiency improvements
BI helps in improving organizational efficiency, which results in increased productivity and increased revenue.
Information across departments is shared with ease which saves time on reporting, data extraction, and data
interpretation.
This helps in eliminating redundant roles and duties, which allows the employees to focus on work rather than
processing data.
Sales insights
A customer relationship management (CRM) application has a lot of important data and information that can
be used for strategic initiatives.
BI helps in identifying new customers, tracking and retaining existing ones, and providing post-sales services.
With the help of BI tools, the executives and decision makers can access data in real time through
spreadsheets, visual dashboards, and scheduled emails.
Online analytical processing (OLAP) is a category of software that allows users to analyze information from
multiple database systems at the same time.
It is a technology that enables analysts to extract and view business data from different points of view.
As OLAP is based on a multidimensional data model, its operations are performed over data cubes.
The concept of a data cube is used to represent data in three-dimensional space, so that an analyst can turn
the cube around along its dimensions, through all possible combinations, to examine every aspect of the
available data.
Each OLAP cube is represented through measures and dimensions. A measure is a numeric value
categorized by dimensions. The basic operations are roll up, drill down, slicing, and dicing.
Roll up:
It is also called the drill-up operation. It performs aggregation on a data cube, either by climbing up a concept
hierarchy or by dimension reduction.
Roll up is like zooming out on the data cube. When a rollup is performed by dimension reduction, one or more
dimensions are removed from the data cube. It involves summarizing the data along a dimension.
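Both forms of roll up can be sketched on a tiny cube held in a Python dictionary. The cells, the city-to-state hierarchy, and the function names are hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter, product) -> units sold.
cube = {
    ("Pune", "Q1", "pen"): 10, ("Pune", "Q2", "pen"): 15,
    ("Mumbai", "Q1", "pen"): 20, ("Mumbai", "Q1", "book"): 5,
}
# Hypothetical concept hierarchy: city -> state.
CITY_TO_STATE = {"Pune": "Maharashtra", "Mumbai": "Maharashtra"}

def roll_up_hierarchy(cube):
    """Climb the hierarchy city -> state, summing the measure."""
    out = defaultdict(int)
    for (city, quarter, product), units in cube.items():
        out[(CITY_TO_STATE[city], quarter, product)] += units
    return dict(out)

def roll_up_drop_dimension(cube, axis):
    """Dimension reduction: remove one dimension and sum over it."""
    out = defaultdict(int)
    for key, units in cube.items():
        out[key[:axis] + key[axis + 1:]] += units
    return dict(out)

by_state = roll_up_hierarchy(cube)
without_quarter = roll_up_drop_dimension(cube, axis=1)
```

In both cases the result has fewer, coarser cells than the input, which is the "zooming out" effect.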
Drill down:
Drill down is the reverse of a roll up. Drill down is a dimension expansion technique that can be applied on the
data cube.
Dimension expansion means adding new dimensions or expanding existing dimensions across any axis of the
data cube using the notion of concept hierarchy.
Navigation from less detailed data to more detailed data can be achieved through drill down.
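A drill down from quarters to months can be sketched as below. The monthly figures are made-up sample data; the point is that the detailed data must already be stored for the drill down to be possible.

```python
from collections import defaultdict

# Hypothetical detailed facts: (quarter, month) -> units sold.
monthly = {("Q1", "Jan"): 4, ("Q1", "Feb"): 6, ("Q1", "Mar"): 5, ("Q2", "Apr"): 7}

def summarize_by_quarter(facts):
    """The less detailed (rolled-up) view: one total per quarter."""
    out = defaultdict(int)
    for (quarter, _month), units in facts.items():
        out[quarter] += units
    return dict(out)

def drill_down(facts, quarter):
    """Expand one quarter into its months (quarter -> month in the hierarchy)."""
    return {month: units for (q, month), units in facts.items() if q == quarter}

quarterly = summarize_by_quarter(monthly)
q1_detail = drill_down(monthly, "Q1")
```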
The slice operation performs a selection on one dimension of the given cube, resulting in a sub cube.
The dice operation defines a cube by performing a selection on two or more dimensions.
The slice operation produces a sliced OLAP cube by allowing the analyst to pick a specific value for one of the
dimensions.
It helps the user to visualize and gather the information specific to a dimension.
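Slice and dice can be sketched on the same kind of dictionary-based cube. The cells, the AXES mapping, and the function names are hypothetical.

```python
# Hypothetical cube cells: (city, quarter, product) -> units sold.
cube = {
    ("Pune", "Q1", "pen"): 10, ("Pune", "Q2", "pen"): 15,
    ("Mumbai", "Q1", "pen"): 20, ("Mumbai", "Q1", "book"): 5,
}
AXES = {"city": 0, "quarter": 1, "product": 2}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value; that dimension drops out."""
    i = AXES[dim]
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep cells whose values fall in the allowed sets on two or more dimensions."""
    return {
        k: v for k, v in cube.items()
        if all(k[AXES[dim]] in values for dim, values in allowed.items())
    }

q1 = slice_cube(cube, "quarter", "Q1")
pens = dice_cube(cube, city={"Pune", "Mumbai"}, product={"pen"})
```

The slice returns a sub cube with one dimension fewer, while the dice keeps all three dimensions but restricts their values.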
Multidimensional OLAP (MOLAP) is the classic form of OLAP and is sometimes referred to as just OLAP.
MOLAP stores data in an optimized multidimensional array storage, rather than in a relational database.
Therefore, it requires the pre-computation and storage of information in the cube - this operation is known as
processing.
MOLAP tool generally utilizes a pre-calculated data set referred to as data cube. The data cube contains all the
possible answers to a given range of questions.
MOLAP tools have a very fast response time and ability to quickly write back the data into the data set.
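The idea of pre-computing the cube can be sketched as follows. This is a toy illustration, not how any particular MOLAP product stores data: the facts, the dimension names, and the process_cube function are hypothetical, and the store is an ordinary dictionary rather than an optimized array.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detailed facts: (city, quarter, product) -> units sold.
facts = {
    ("Pune", "Q1", "pen"): 10, ("Pune", "Q2", "pen"): 15,
    ("Mumbai", "Q1", "pen"): 20, ("Mumbai", "Q1", "book"): 5,
}
DIMS = ("city", "quarter", "product")

def process_cube(facts):
    """Pre-compute totals for every subset of dimensions (the 'processing' step)."""
    store = {}
    for size in range(len(DIMS) + 1):
        for kept in combinations(range(len(DIMS)), size):
            agg = defaultdict(int)
            for key, units in facts.items():
                agg[tuple(key[i] for i in kept)] += units
            store[tuple(DIMS[i] for i in kept)] = dict(agg)
    return store

store = process_cube(facts)

# Queries are now plain lookups, with no aggregation at query time.
total_units = store[()][()]
pune_units = store[("city",)][("Pune",)]
```

With three dimensions there are eight pre-computed groupings; the count doubles with every added dimension, which is the storage cost paid for the fast response time.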
Relational OLAP (ROLAP) is a kind of online analytical processing that performs multidimensional analysis
on data stored in a relational database.
ROLAP can handle large volumes of data. Although its use of a relational database means that it requires more
processing time, it can be accessed by any SQL tool, and the tool does not have to be designed specifically for OLAP.
Compared to MOLAP, ROLAP tools are much better at handling non-aggregatable facts such as textual
descriptions.
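Because ROLAP data sits in relational tables, OLAP-style questions become ordinary SQL queries computed at query time. The sketch below uses Python's built-in sqlite3 module with a hypothetical sales table; a real ROLAP deployment would run against a full relational database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, quarter TEXT, product TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Pune", "Q1", "pen", 10), ("Pune", "Q2", "pen", 15),
     ("Mumbai", "Q1", "pen", 20), ("Mumbai", "Q1", "book", 5)],
)

# A roll up over the quarter and product dimensions, computed at query time.
by_city = conn.execute(
    "SELECT city, SUM(units) FROM sales GROUP BY city ORDER BY city"
).fetchall()

# A slice on quarter = 'Q1', again as an ordinary SQL query.
q1_by_product = conn.execute(
    "SELECT product, SUM(units) FROM sales WHERE quarter = ? "
    "GROUP BY product ORDER BY product",
    ("Q1",),
).fetchall()
```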
There is no clear agreement across the industry as to what constitutes Hybrid OLAP, except that a database will
divide data between relational and specialized storage.
For example, HOLAP database will use relational tables to hold larger quantities of detailed data and use
specialized storage for at least some aspects of the smaller quantities of less detailed data.
HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the capabilities of both approaches.
HOLAP tools can utilize both pre-calculated cubes and relational data sources.
An OLAP database has a multidimensional schema. OLTP uses a traditional DBMS to accommodate a large volume of real-time transactions.
Tables in OLAP database are not normalized. Tables in OLTP database are normalized.
OLAP systems are designed for use by data scientists, business analysts, and knowledge workers. OLTP systems are designed for use by frontline workers such as cashiers, bank tellers, hotel desk clerks or for custom self-service applications.
Data Cube
A data cube in a data warehouse is a multi-dimensional structure used to store data. A data cube allows data
to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.
Cubes fall into two main kinds: multidimensional data cubes and relational data cubes.
Multidimensional modeling is a technique for structuring data around the business concept.
Multi-dimensional data model is a logical view of an enterprise that represents the important entities of a
business and the relationship between them.
The multi-dimensional data model holds data in the shape of a data cube. Two or three-dimensional cubes are
often served by data warehousing.
Dimensions are the perspectives or entities with respect to which an organization wants to keep records. Each
dimension may have a table associated with it called a dimension table.
Defining schemas
Star schema
Star schema is a simple and common modeling paradigm where the data warehouse contains facts and
dimensions. It is simple and easily designed.
A fact table is used in dimensional model in data warehouse design. A fact table is found at the center of a star
schema or snowflake schema surrounded by dimension tables.
Dimension tables are generally used to define dimensions; they include the values, dimensions as well as
attributes.
Typically, dimension tables are small in size, ranging from a few to several thousand rows.
Most data warehouses use star schema to represent multidimensional models. Each dimension is represented
by a dimension table that describes it.
Star schema does not use normalization, hence there is a high level of data redundancy. Cube processing is
faster in star schema, but it uses more space.
The links between the fact table in the center and the dimension tables at the extremities form a shape like a
star.
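A small star schema can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_store) are hypothetical. Note that the state name is repeated in every store row, which is the redundancy that comes from not normalizing the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER
);
INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'book', 'stationery');
INSERT INTO dim_store VALUES (1, 'Pune', 'Maharashtra'), (2, 'Mumbai', 'Maharashtra');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 20), (2, 2, 5);
""")

# One join per dimension is enough to answer a question about the facts.
units_by_city = conn.execute("""
    SELECT s.city, SUM(f.units)
    FROM fact_sales f JOIN dim_store s ON s.store_id = f.store_id
    GROUP BY s.city ORDER BY s.city
""").fetchall()
```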
Snowflake schema
Snowflake schema is a kind of star schema, which includes the hierarchical form of dimensional tables.
It involves deeper joins because the tables are split into many pieces. Queries have to use more complicated
joins, since it has more tables.
The major difference between the snowflake and star schema model is that the dimension tables of the
snowflake model may be kept in normalized form to reduce redundancies.
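The effect of normalizing a dimension can be sketched as follows, again with sqlite3 and hypothetical table names. The state is moved out of the store dimension into its own table, so it is stored once, but the query needs an extra join to reach it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_state (state_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT,
                        state_id INTEGER REFERENCES dim_state(state_id));
CREATE TABLE fact_sales (store_id INTEGER REFERENCES dim_store(store_id),
                         units INTEGER);
INSERT INTO dim_state VALUES (1, 'Maharashtra');
INSERT INTO dim_store VALUES (1, 'Pune', 1), (2, 'Mumbai', 1);
INSERT INTO fact_sales VALUES (1, 10), (2, 20), (2, 5);
""")

# Two joins are needed: fact -> store -> state.
units_by_state = conn.execute("""
    SELECT st.state, SUM(f.units)
    FROM fact_sales f
    JOIN dim_store s  ON s.store_id = f.store_id
    JOIN dim_state st ON st.state_id = s.state_id
    GROUP BY st.state
""").fetchall()
```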
Fact constellation
This schema is more complex than star or snowflake varieties, which is due to the fact that it contains multiple
fact tables.
This allows dimension tables to be shared among the fact tables.
Schemas of this type should only be used for applications that need a high level of sophistication.
For each star schema or snowflake schema it is possible to construct a fact constellation schema.
This is a very flexible solution; however, it may be hard to manage and support later.
The main disadvantage of the constellation schema is a more complicated design, because many variants of
aggregation must be considered.
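A fact constellation with two fact tables sharing one dimension table can be sketched as below. The table names (fact_sales, fact_shipments, dim_product) are hypothetical. Each fact table is aggregated separately and the results are combined through the shared dimension, which is one of the aggregation variants the design has to account for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id), units_sold INTEGER);
CREATE TABLE fact_shipments (
    product_id INTEGER REFERENCES dim_product(product_id), units_shipped INTEGER);
INSERT INTO dim_product VALUES (1, 'pen'), (2, 'book');
INSERT INTO fact_sales VALUES (1, 10), (1, 20), (2, 5);
INSERT INTO fact_shipments VALUES (1, 25), (2, 5);
""")

# Aggregate each fact table on its own, then line the totals up per product.
report = conn.execute("""
    SELECT p.name,
           (SELECT SUM(units_sold) FROM fact_sales s WHERE s.product_id = p.product_id),
           (SELECT SUM(units_shipped) FROM fact_shipments h WHERE h.product_id = p.product_id)
    FROM dim_product p ORDER BY p.name
""").fetchall()
```

Joining both fact tables to the dimension in a single query would multiply rows and inflate the sums, which is why they are aggregated separately here.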