BI Bro Notes Full

A decision support system analyzes large amounts of data and presents the best options to organizations to help decision makers solve complex problems. It gathers, analyzes, and synthesizes data to produce reports that make more informed decisions. A data warehouse stores historical business data to help organizations make future decisions and do analysis based on past data. It receives data through an extract, transform, load process from source systems to presentation servers.

Uploaded by

Vaibhav Sonawane

BI Unit – 1

DSS

A decision support system (DSS) is an interactive computer-based application.

It analyzes large amounts of data and presents the best option available to the organization.

It helps decision makers to solve complex problems. DSS is a basic component in the development of business
intelligence architecture.

Decision support system is defined as a computer system that,

o Gathers and analyzes data.
o Synthesizes data to produce comprehensive information reports.
o Differs from ordinary operational applications.
o Helps make more informed decisions.
o Supports timely problem solving.
o Deals efficiently with issues in operations, planning, and management.

Three main elements of DSS are data, models and graphical user interface for handling the dialogue between
the system and the user.

Decision making process

The decision-making process includes three phases and two additional phases.

o Intelligence
o Design
o Choice

Additional phases

o Implementation
o Control

Intelligence: In this phase, the decision maker examines the reality and tries to understand the problem
occurring in the organization, why the problem exists and what effect it is having on the firm.

Design: This phase involves identifying and exploring various solutions to solve the problem. Here the
experience of the decision-maker plays a very important role.

Choice: The selected alternative for solving a problem has to be evaluated using the performance criteria.

Implementation: The selected alternative has to be transformed into action by means of an implementation
plan, which includes assigning roles to all involved in the action plan.

Control: Once the implementation is completed, the performance indicator values obtained in the choice phase
and in the implementation phase need to be compared and any difference examined. The results are to be
transformed into experience and information, which is then used in the data warehouse during later
decision-making processes.

BI

The large amount of data obtained through the Internet is often heterogeneous in origin, represented in
different formats, and varied in content.

So, there is a need to convert such data into useful information and knowledge, which decision-makers can use
to make decisions that improve their business.

Business intelligence is a set of mathematical models and analysis methodologies that exploit the available
data to generate information and knowledge useful for complex decision-making processes.

Benefits

BI system reduces labor costs by generating reports automatically.

BI makes information actionable, as users can get the data they require and turn it into knowledge.

Decision makers can make better decisions as exact and up-to-date information is provided.

Multiple data sources can be combined through BI, so decisions can be taken faster.

Business metrics reports are available and can be accessed from anywhere and whenever there is a need.

Get insight into customer behavior.

BI Architecture

Data sources

Data sources can be relational DBMSs such as Oracle or Informix.

In addition to these internal data, operational data also includes external data obtained from commercial
databases, and databases associated with suppliers and customers, which include unstructured documents
like emails.

It is necessary to perform all operations associated with extraction and integration of data from different
heterogeneous sources.

Data warehouse

An ETL tool (extract, transform, load) is a tool that reads data from one or more sources, transforms the data so
that it is compatible with destination, and loads the data to the destination databases.

ETL function transforms the relevant data collected from different source systems into useful information and
then stores it in data warehouse, which can be used for strategic decision making.

Business intelligence methodologies

The main purpose of data warehouse is to provide information to the business manager for strategic decision
making.

Extracted data from data warehouse is supplied to the mathematical model and analysis methodologies.

Users interact with the warehouse using user access tools, such as

o Reporting and query tools
o Application development tools
o Executive information system tools
o Online analytical processing tools
o Data mining tools

Data information and knowledge

The vast amount of data gets collected in the information system.

This data is generated from internal transactions and from external sources.

This data collected and stored in a systematic manner is still not suitable for the decision-making process.

It needs to be extracted, using tools, and processed using analytical methods, capable of transforming them
into information and knowledge which can be used by decision-makers.

Difference between data information and knowledge

Data

Data encompasses plain facts, observations, statistics, characters, symbols, images, numbers, and more that
are collected to be used for analysis.

Data alone is not very informative, but it gains purpose after it is interpreted to derive significance.

Data is a standalone concept and is the foundation of information.


Data can be of two types, qualitative and quantitative.

Quantitative data is measurable as it is numeric.

Qualitative data is not measurable. It is descriptive and includes conversations, observations, and feedback.

Information

Information is the processed, refined, structured, and presented form of data which can be used to create
relevance and usefulness.

Information doesn’t exist without data.

A variety of knowledge management systems, software and tools are used by the organizations to turn data into
information, some of which include databases, spreadsheets, documents, guidelines and strategies.

Information assigns meaning to the data. It also improves the reliability of data.

Data converted to information is easier to understand and carries less uncertainty, and it leaves out useless
details.

Knowledge

Knowledge is a combination of information, experience, and insight.

It is used to make decisions and develop corresponding actions.

Knowledge can be extracted from data in passive and active ways.

Passive way involves analysis as suggested by the decision-maker.

Active way involves applying a mathematical model in the form of inductive learning.

Role of mathematical models

A business intelligence system uses mathematical models and algorithms to provide decision makers with
information and knowledge extracted from data.

This includes calculation of totals and percentages, graphical representation by histogram and more analysis
through optimization and learning models.
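The simplest of these calculations can be sketched in a few lines of Python. This is only an illustration: the
sales figures and region names below are made up, and the histogram is drawn as text rather than as a chart.

```python
from collections import Counter

# Hypothetical sales records (region, amount); invented for illustration.
sales = [("North", 120), ("South", 80), ("North", 60), ("East", 140)]

# Total per region.
totals = Counter()
for region, amount in sales:
    totals[region] += amount

grand_total = sum(totals.values())

# Percentage share of each region in the grand total.
shares = {region: 100 * value / grand_total for region, value in totals.items()}

# Text histogram: one '#' per 20 units of sales.
for region, value in sorted(totals.items()):
    print(f"{region:<6}{'#' * (value // 20)} {value} ({shares[region]:.1f}%)")
```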

Using mathematical models has the following advantages:

o An abstract model developed helps the decision makers to focus on the main features of the domain
which leads to deeper understanding of the problem under investigation.
o Knowledge acquired while building a mathematical model about the domain can be transferred to other
individuals in the long run.
o Mathematical models developed for decision-making are useful for solving other problems of similar
types.

Ethics in business intelligence

The usage of business intelligence, data mining methods, and decision support systems raises ethical problems
that cannot be ignored.

The distortions and risks generated towards information and knowledge should be avoided using adequate
control rules, and mechanisms.

The use of data about individuals in ways that do not respect their right to privacy should not be tolerated.

Business intelligence analysts and decision makers should abide by the ethical principle of respect for the
personal rights of individuals.
BI Unit – 2

Data warehouse

Data warehouse keeps the historical information of business-related data. This information helps the
businesspeople to take future decisions for business and also do analysis based on past data.

Three different kinds of systems that are required for a data warehouse are,

o Source system
o Staging area
o Presentation servers

The data travels from source system to presentation servers via the data staging area.

Source System

The data in the data warehouse comes from the operational system of the organization as well as from external
sources.

These are collectively referred to as a source system.

Staging Area

The data extracted from the source system is stored in an area called the data staging area, where the data is
cleaned, transformed, combined, and de-duplicated to prepare it for the data warehouse.

The data staging area is generally a collection of machines where simple activity like sorting and sequential
processing takes place.

The staging area does not provide any query or presentation service. As soon as a system provides query or
presentation services, it is categorized as a presentation server.

Presentation servers

Presentation server is the target machine on which the data is loaded from the data staging area, organized,
and stored for direct querying by end users, report writers and other applications.

The entire process is popularly known as ETL (extract, transform, load).

An ETL tool (extract, transform, load) is a tool that reads data from one or more sources, transforms the data so
that it is compatible with destination, and loads the data to the destination databases.

ETL function transforms the relevant data collected from different source systems into useful information and
then stores it in data warehouse, which can be used for strategic decision making.
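The three ETL steps can be sketched with the Python standard library. This is a minimal illustration, not a
real ETL tool: the CSV text stands in for a source system, an in-memory SQLite database stands in for the
presentation server, and the table name, column names, and cleaning rules are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an operational system.
SOURCE_CSV = """customer,amount
 alice ,100
BOB,250
alice,100
carol,
"""

def extract(text):
    """Read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Clean, standardize, and de-duplicate rows in the staging step."""
    seen, cleaned = set(), []
    for row in rows:
        name = row["customer"].strip().title()
        if not row["amount"]:          # drop rows with a missing amount
            continue
        record = (name, float(row["amount"]))
        if record not in seen:         # remove exact duplicates
            seen.add(record)
            cleaned.append(record)
    return cleaned

def load(records, conn):
    """Write the prepared records to the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
staged = transform(extract(SOURCE_CSV))
load(staged, conn)
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
print(loaded)
```

Treating identical rows as duplicates is a simplification chosen for the example; a real staging area would
usually de-duplicate on a business key.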

Distinguish between Business Intelligence and Data Warehouse

Business Intelligence (BI) versus Data Warehouse (DW)

o BI: A set of tools, processes, and technologies used to gather, analyze, and present data for
  decision-making.
  DW: A centralized repository that integrates data from various sources into a unified format for reporting
  and analysis.
o BI: Converts raw data into actionable insights to support decision-making, strategic planning, and business
  performance improvement.
  DW: Provides a consolidated view of historical and current data to support decision-making, reporting, and
  analytics.
o BI: Analyzes historical and current data to identify patterns, trends, and correlations for informed
  decision-making.
  DW: Stores and manages large volumes of structured data from multiple sources for querying, analysis, and
  reporting.
o BI: Used by business users, analysts, and decision-makers who require access to organized and analyzed
  data.
  DW: Designed for technical users involved in designing and maintaining the infrastructure, but the data is
  accessed by business users through BI tools and reporting interfaces.

BI is a term very commonly associated with data warehousing.

BI is where companies gather today’s information from the market about their competitors.

BI simplifies information discovery and analysis.

BI simply refers to information available with the company to take decisions.

BI also refers to gaining insight from data mining analysis.

DW is the backend used for achieving BI.

DW helps the enterprise in storing the data while BI helps control the data in decision-making or forecasting.

Tools in Business Intelligence

A business intelligence tool collects, processes, and analyzes large amounts of data from internal and external
systems.

Data may be collected from sources like documents, images, emails, videos, journals, books, social media,
etc.

BI makes use of queries to find information and present the data in user friendly formats like reports,
dashboard, charts, and graphs.

Functions performed by the tool include data mining, data visualization, performance management, analytics,
reporting, text mining, predictive analytics, etc.

Employees may use this information to make better decisions.

Some business intelligence tools are:

Spreadsheets

Spreadsheet options in use are MS Excel, open-source spreadsheets, and web-based spreadsheets.

Spreadsheets are also provided as front-end user interface by business intelligence software.

Digital dashboards

Dashboards are used for graphical presentation of the current status.

Data visualization software

The software helps create visual analysis of data sets to gain meaningful insights of data within minutes.

OLAP tool

OLAP tool helps the user in interactively analyzing the data from multiple sources in a multidimensional view.

Functions include roll up, drill down, slicing and dicing.

Mobile business intelligence

This is about bringing intelligence on mobile devices to read historical data and analyze real time information.

Data mining tools

This tool is used in discovering patterns in large data sets using AI, machine learning, statistics, and
database systems.

Data Warehousing tools

This is used as a central repository of data collected from multiple sources.

This data is used for future retrieval for analysis.


Process mining

This is the mining of stored event logs, which provides information for process analysis and governance.
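One basic process mining step is counting which activity directly follows which in each case. The sketch below
assumes a made-up event log of (case id, activity) pairs that is already sorted by time; the activity names
are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity), already ordered by time.
event_log = [
    (1, "receive order"), (1, "check stock"), (1, "ship"),
    (2, "receive order"), (2, "check stock"), (2, "reject"),
    (3, "receive order"), (3, "check stock"), (3, "ship"),
]

# Group activities by case to recover each case's trace.
traces = defaultdict(list)
for case_id, activity in event_log:
    traces[case_id].append(activity)

# Count how often one activity directly follows another.
follows = Counter()
for activities in traces.values():
    for current, nxt in zip(activities, activities[1:]):
        follows[(current, nxt)] += 1

for (current, nxt), count in follows.most_common():
    print(f"{current} -> {nxt}: {count}")
```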

Need for business intelligence

Gain new customer insights

Business intelligence provides the organization with the ability to observe and analyze the customer buying
trends.

This information can be used to create products and product improvements to meet their expectations and
needs.

Improved visibility

Organizations using business intelligence have better control over their processes and standard operating
procedures since visibility is improved by BI system.

BI helps in identifying areas of improvement and allows an organization to be prepared.

Actionable information

BI helps in identifying key organizational patterns and trends.

It also allows you to understand the implication of various processes and changes within an organization,
which helps in making informed decisions and taking action.

Efficiency improvements

BI helps in improving organizational efficiency, which results in increased productivity and increased revenue.

Information across departments is shared with ease which saves time on reporting, data extraction, and data
interpretation.

This helps in eliminating redundant roles and duties, which allows the employees to focus on work rather than
processing data.

Sales insights

A customer relationship management (CRM) application has a lot of important data and information that can
be used for strategic initiatives.

BI helps in identifying new customers, tracking and retaining existing ones, and providing post-sales services.

Real time data

With the help of BI tools, the executives and decision makers can access data in real time through
spreadsheets, visual dashboards, and scheduled emails.

Online analytical processing (OLAP)

Online analytical processing (OLAP) is a category of software that allows users to analyze information from
multiple database systems at the same time.

It is a technology that enables analysts to extract and view business data from different points of view.

Because OLAP is based on a multidimensional data model, its operations are performed on data cubes.

The data cube represents data in three-dimensional space, so that an analyst can turn the cube around along
its dimensions, across all possible combinations, to examine every aspect of the available data.

Each OLAP cube is described through measures and dimensions. A measure is a numeric value categorized by
dimensions. The basic operations are roll up, drill down, slicing, and dicing.

Roll up:

It is also called the drill-up operation. It performs aggregation on a data cube, either by climbing up a
concept hierarchy or by dimension reduction.

Roll up is like zooming out on the data cube. When a rollup is performed by dimension reduction, one or more
dimensions are removed from the data cube. It involves summarizing the data along a dimension.
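Both kinds of roll up can be sketched on a tiny cube held in a Python dictionary. The cities, countries, and
sales amounts are invented for illustration, and the location hierarchy (city to country) is an assumption.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, country, quarter) -> sales amount.
cube = {
    ("Mumbai", "India", "Q1"): 100,
    ("Pune", "India", "Q1"): 50,
    ("Pune", "India", "Q2"): 70,
    ("Paris", "France", "Q1"): 80,
}

def roll_up_hierarchy(cells):
    """Climb the location hierarchy from city to country."""
    result = defaultdict(int)
    for (city, country, quarter), amount in cells.items():
        result[(country, quarter)] += amount
    return dict(result)

def roll_up_reduce(cells):
    """Remove the time dimension, leaving totals per city."""
    result = defaultdict(int)
    for (city, country, quarter), amount in cells.items():
        result[city] += amount
    return dict(result)

by_country = roll_up_hierarchy(cube)
by_city = roll_up_reduce(cube)
print(by_country)
print(by_city)
```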

Drill down:

Drill down is the reverse of a roll up. Drill down is a dimension expansion technique that can be applied on the
data cube.

Dimension expansion means adding new dimensions or expanding existing dimensions across any axis of the
data cube using the notion of a concept hierarchy.

Navigation from less detailed data to more detailed data can be achieved through drill down.
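Drill down can be sketched as re-aggregating the detailed facts while keeping more of the key. The facts
below are hypothetical, and the helper name aggregate is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical detailed facts: (year, quarter, product) -> units sold.
facts = {
    (2023, "Q1", "pen"): 10, (2023, "Q2", "pen"): 15,
    (2023, "Q1", "ink"): 5,  (2024, "Q1", "pen"): 20,
}

def aggregate(cells, keep):
    """Sum the facts, keeping only the key positions listed in `keep`."""
    result = defaultdict(int)
    for key, value in cells.items():
        result[tuple(key[i] for i in keep)] += value
    return dict(result)

# Summary view: totals per year (position 0 of the key).
per_year = aggregate(facts, keep=[0])

# Drill down along the time hierarchy: year -> quarter.
per_quarter = aggregate(facts, keep=[0, 1])

# Drill down further by adding the product dimension.
per_quarter_product = aggregate(facts, keep=[0, 1, 2])

print(per_year)
print(per_quarter)
```

Note that drill down is only possible because the detailed facts are still available; a summary alone cannot
be expanded.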

Slice and dice:

The slice operation performs a selection on one dimension of the given cube, resulting in a sub cube.

The dice operation defines a cube by performing a selection on two or more dimensions.

The slice operation produces a sliced OLAP cube by allowing the analyst to pick a specific value for one of the
dimensions.

It helps the user to visualize and gather the information specific to a dimension.
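Slice and dice can be sketched as filters over the cells of a small cube. The dimension names, values, and
revenue figures are made up for illustration.

```python
# Hypothetical cube cells: (region, product, quarter) -> revenue.
cube = {
    ("East", "pen", "Q1"): 10, ("East", "ink", "Q1"): 4,
    ("West", "pen", "Q1"): 7,  ("West", "pen", "Q2"): 9,
    ("East", "pen", "Q2"): 12,
}
DIMS = ("region", "product", "quarter")

def slice_cube(cells, dim, value):
    """Fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def dice_cube(cells, **allowed):
    """Keep cells whose values fall in the allowed sets on the named dimensions."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }

q1 = slice_cube(cube, "quarter", "Q1")
east_pens = dice_cube(cube, region={"East"}, product={"pen"})
print(q1)
print(east_pens)
```

The slice returns a sub cube with one dimension fewer, while the dice keeps all three dimensions but restricts
two of them.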

Different OLAP architectures

Multidimensional OLAP (MOLAP)

Multidimensional OLAP (MOLAP) is the classic form of OLAP and is sometimes referred to as just OLAP.

MOLAP stores data in optimized multidimensional array storage, rather than in a relational database.

Therefore, it requires the pre-computation and storage of information in the cube - this operation is known as
processing.

MOLAP tool generally utilizes a pre-calculated data set referred to as data cube. The data cube contains all the
possible answers to a given range of questions.

MOLAP tools have a very fast response time and ability to quickly write back the data into the data set.
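The idea of pre-computing the cube can be sketched as follows. This is a toy illustration of the processing
step, not how any particular MOLAP product stores data: a plain dictionary stands in for the multidimensional
array, the fact rows are invented, and "*" is an assumed marker for a dimension that has been summed away.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("region", "product")
# Hypothetical fact rows: (region, product, amount).
rows = [("East", "pen", 10), ("East", "ink", 4), ("West", "pen", 7)]

def process_cube(facts):
    """Pre-compute sums for every subset of dimensions ('processing')."""
    cube = defaultdict(int)
    for *coords, amount in facts:
        for size in range(len(DIMS) + 1):
            for kept in combinations(range(len(DIMS)), size):
                # "*" marks a dimension that has been aggregated away.
                key = tuple(coords[i] if i in kept else "*" for i in range(len(DIMS)))
                cube[key] += amount
    return dict(cube)

cube = process_cube(rows)

# Queries are now dictionary lookups instead of scans over the fact rows.
print(cube[("East", "*")])
print(cube[("*", "pen")])
print(cube[("*", "*")])
```

The price of the fast lookups is the extra storage and the processing time spent before any question is asked.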

Relational OLAP (ROLAP)

Relational OLAP (ROLAP) is a kind of online analytical processing that performs multidimensional analysis on
data stored in a relational database.

ROLAP can handle large volumes of data. Although its use of a relational database means that it requires more
processing time, it can be accessed by any SQL tool, and the tool does not have to be one made specifically
for OLAP.

Compared to MOLAP, ROLAP tools are much better at handling non-aggregatable facts such as textual
descriptions.
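The ROLAP approach can be sketched with SQLite from the Python standard library. The table and column names
are hypothetical; the point is only that the aggregate is computed by an ordinary SQL query at request time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relational fact table; table and column names are illustrative.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, note TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "pen", "bulk order", 10.0),
        ("East", "ink", "repeat customer", 4.0),
        ("West", "pen", "trade fair", 7.0),
    ],
)

# The aggregate is computed at query time with plain SQL, not read from a stored cube.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Textual, non-aggregated detail stays directly queryable alongside the measures.
east_notes = conn.execute(
    "SELECT note FROM sales WHERE region = 'East' ORDER BY note"
).fetchall()

print(by_region)
print(east_notes)
```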

Hybrid OLAP (HOLAP)

There is no clear agreement across the industry as to what constitutes Hybrid OLAP, except that a database will
divide data between relational and specialized storage.

For example, a HOLAP database will use relational tables to hold larger quantities of detailed data and use
specialized storage for at least some aspects of the smaller quantities of less detailed data.

HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the capabilities of both approaches.
HOLAP tools can utilize both pre-calculated cubes and relational data sources.

Online Analytical Processing (OLAP) versus Online Transaction Processing (OLTP)

o OLAP: Supports long transactions, usually complex queries.
  OLTP: Supports short, simple transactions, both queries and updates.
o OLAP: An OLAP database has a multidimensional schema.
  OLTP: Uses a traditional DBMS to accommodate a large volume of real-time transactions.
o OLAP: Tables in an OLAP database are not normalized.
  OLTP: Tables in an OLTP database are normalized.
o OLAP: Designed for use by data scientists, business analysts, and knowledge workers.
  OLTP: Designed for use by frontline workers such as cashiers, bank tellers, and hotel desk clerks, or for
  customer self-service applications.
o OLAP: Contains historical data.
  OLTP: Contains current data.
o OLAP: Focuses on information out.
  OLTP: Is an application-oriented model.
o OLAP: Database size varies from 100 GB to TB.
  OLTP: Database size varies from 100 MB to GB.
o OLAP: Used for decision support systems.
  OLTP: Used for day-to-day operations.
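The difference in workload can be sketched with SQLite. For brevity a single in-memory database plays both
roles here, which is a simplification; in practice the analytical queries would run against the warehouse
rather than the operational system. The table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: short transactions that insert or update individual rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (customer, year, amount) VALUES ('alice', 2023, 40.0)")
with conn:
    conn.execute("INSERT INTO orders (customer, year, amount) VALUES ('bob', 2023, 60.0)")
with conn:
    conn.execute("INSERT INTO orders (customer, year, amount) VALUES ('alice', 2024, 90.0)")
with conn:
    conn.execute("UPDATE orders SET amount = 45.0 WHERE id = 1")

# OLAP-style work: a read-only query that summarizes history across many rows.
yearly = conn.execute(
    "SELECT year, COUNT(*), SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(yearly)
```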

Data Cube

A data cube in a data warehouse is a multi-dimensional structure used to store data. A data cube allows data
to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.

A cube is basically categorized into two main kinds that are multi-dimensional data cube and relational data
cube.

Multidimensional modeling is a technique for structuring data around the business concept.

Multi-dimensional data model is a logical view of an enterprise that represents the important entities of a
business and the relationship between them.

The multi-dimensional data model holds data in the shape of a data cube. Two or three-dimensional cubes are
often served by data warehousing.

Dimensions are the perspectives or entities with respect to which an organization wants to keep records. Each
dimension may have a table associated with it called a dimension table.
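A three-dimensional cube defined by dimensions and facts can be sketched with nested lists. The dimension
members and the measure (units sold) are invented for illustration, and the helper name cell is an assumption.

```python
# Hypothetical dimensions; each list plays the role of a small dimension table.
dimensions = {
    "time": ["Q1", "Q2"],
    "item": ["pen", "ink"],
    "location": ["Pune", "Delhi"],
}

# Facts (the measure "units sold") stored in a 3-D array indexed by
# position along time, item, and location.
units = [
    [[5, 3], [2, 1]],   # Q1: rows are pen, ink; columns are Pune, Delhi
    [[6, 4], [3, 2]],   # Q2
]

def cell(time, item, location):
    """Look up one fact by its coordinates in the three dimensions."""
    t = dimensions["time"].index(time)
    i = dimensions["item"].index(item)
    p = dimensions["location"].index(location)
    return units[t][i][p]

# View the same cube along a single dimension: total units per item.
per_item = {
    item: sum(units[t][i][p]
              for t in range(len(dimensions["time"]))
              for p in range(len(dimensions["location"])))
    for i, item in enumerate(dimensions["item"])
}

print(cell("Q2", "pen", "Delhi"))
print(per_item)
```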

Defining schemas

Star schema

Star schema is a simple and common modeling paradigm where the data warehouse contains facts and
dimensions. It is simple and easily designed.

A fact table is used in dimensional model in data warehouse design. A fact table is found at the center of a star
schema or snowflake schema surrounded by dimension tables.

Dimension tables are generally used to define dimensions; they contain the dimension values as well as their
attributes.

Typically, dimension tables are small in size, ranging from a few to several thousand rows.

Most data warehouses use star schema to represent multidimensional models. Each dimension is represented
by a dimension table that describes it.

Star schema does not use normalization, hence there is a high level of data redundancy. Cube processing is
faster in star schema, but it uses more space.

The links between the fact table in the center and the dimension tables at the extremities form a shape like a
star.
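A star schema can be sketched in SQLite as one fact table with a foreign key to each dimension table. All
table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (denormalized: category is stored inline with the product).
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
-- Fact table at the center, holding measures and foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'ink', 'stationery');
INSERT INTO dim_date VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 4.0), (1, 2, 12.0);
""")

# One join per dimension is enough to answer a typical question.
sales_by_quarter = conn.execute("""
    SELECT d.quarter, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY d.quarter, p.category
    ORDER BY d.quarter
""").fetchall()
print(sales_by_quarter)
```

The category name 'stationery' is repeated in every product row, which is the redundancy the star schema
accepts in exchange for simple joins.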

Snowflake schema

Snowflake schema is a kind of star schema, which includes the hierarchical form of dimension tables.

It contains fact tables, dimension tables, and sub-dimension tables.

It contains deep joins because the tables are split into many pieces. Complicated joins have to be used, since
it has more tables.

Cube processing might be slow because of the complex joins.

Snowflake schema is hard to understand and design.

The major difference between the snowflake and star schema model is that the dimension tables of the
snowflake model may be kept in normalized form to reduce redundancies.

Such a table is easy to maintain and saves storage space.
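The normalization step can be sketched in SQLite by moving the category out of the product dimension into its
own sub-dimension table. As before, all table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension: each category name is stored once.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
-- The product dimension refers to the category by key instead of repeating its name.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_category VALUES (1, 'stationery'), (2, 'furniture');
INSERT INTO dim_product VALUES (1, 'pen', 1), (2, 'ink', 1), (3, 'desk', 2);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 4.0), (3, 200.0);
""")

# Reaching the category now takes two joins: fact -> product -> category.
sales_by_category = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(sales_by_category)
```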

Fact constellation

As its name implies, it is shaped like a constellation of stars.

This schema is more complex than star or snowflake varieties, which is due to the fact that it contains multiple
fact tables.

This allows dimension tables to be shared amongst the fact tables.

A schema of this type should only be used for applications that need a high level of sophistication.

For each star schema or snowflake schema it is possible to construct a fact constellation schema.

This is a very flexible solution; however, it may be hard to manage and support later.

The main disadvantage of the constellation schema is a more complicated design, because many variants of
aggregation must be considered.
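A fact constellation can be sketched in SQLite as two fact tables that share one dimension table. The table
names, measures, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension used by both fact tables.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Two fact tables, each with its own measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id), units_sold INTEGER);
CREATE TABLE fact_shipping (
    product_id INTEGER REFERENCES dim_product(product_id), units_shipped INTEGER);
INSERT INTO dim_product VALUES (1, 'pen'), (2, 'ink');
INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 4);
INSERT INTO fact_shipping VALUES (1, 12), (2, 4);
""")

# Aggregate each fact table separately, then combine through the shared dimension.
# Joining the two fact tables directly would multiply rows and inflate the sums.
sold_vs_shipped = conn.execute("""
    SELECT p.name, s.sold, h.shipped
    FROM dim_product p
    JOIN (SELECT product_id, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_id) s ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_shipped) AS shipped
          FROM fact_shipping GROUP BY product_id) h ON h.product_id = p.product_id
    ORDER BY p.name
""").fetchall()
print(sold_vs_shipped)
```

Having to aggregate each fact table before combining them is one small instance of the extra design care
this schema demands.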
