Audit Course 2
Business Intelligence
AUDIT COURSE (COMPUTER ENGINEERING)
PRESENTED BY: -
UNDER GUIDANCE OF
Certificate
This is to certify that the audit course entitled
(Business Intelligence)
Submitted by: -
Place: Pune
Date:
ACKNOWLEDGEMENT
Samiksha Rahul Bankar
CONTENTS
Chapter 1
Concepts with Mathematical Treatment
Chapter 2
Decision Making Concepts
Chapter 3
Data Warehouse
Chapter 4
Data Pre-processing and Outliers
Chapter 5
Designing and Managing BI Systems
Chapter 1
Concepts with Mathematical Treatment
Introduction to BI:
The purpose of Business Intelligence (BI) is to provide decision makers with
the information necessary to make informed decisions. The information is
delivered by the Business Intelligence system via reports. This book focuses on
the architecture and infrastructure needed to deliver the information.
Architecture is a set of rules or structures providing a framework for the overall
design of a system or product (Poe et al. 1998). The BI system includes the
following parts:
Interested parties and their respective information needs
Input of data
Storage of data
Analysis of data
Automatic and selective dissemination of information
A BI system includes the rules (architecture) providing a framework for the
organization of the technologies, platforms, databases, gateways, people and
processes. To implement an architecture the Business Intelligence architect
must implement an infrastructure. Technical infrastructures are the
technologies, platforms, databases, gateways, people and processes necessary to
make the architecture functional within the corporation (Poe et al. 1998).
In sum, decision makers need reports that deliver the information that allows
them to understand their organization and the world in order to make better
decisions.
Given these functions, the most enduring definition of BI focuses on the
systems, tools, technologies, processes, and techniques that compose these
elements and help decision makers understand their world. BI augments the
ability of decision makers to turn data into information by aiding in extracting
data from data sources, organizing the data based on established business
knowledge, and then presenting the information in a way that is useful to the
decision maker. It merges technology with knowledge in order to provide useful
information to management as quickly as possible.
In sum, a Business Intelligence system includes the rules (architecture) that
outline how to organize the parts of the system (infrastructure) to deliver the
information needed to thrive in a competitive market (business) or to provide
the best service to the people (government) (Poe et al. 1998). Regardless of the
technology, which is simply a set of tools, the core of a business intelligence
system has not changed and will not change.
Determining BI Cycle:
Business intelligence is not just a set of tools for analyzing raw data to help
make strategic and operational decisions. It is a framework that offers guidance in
understanding what to look for in the volumes of disparate data. As a
framework, BI is a continuous cycle of analysis, insight, action and
measurement.
Analysing a business is based on what we know and feel to be important while
filtering out the aspects of the business not considered mission critical or
detrimental to the growth of the organization. Deciding what is important is
based on our understanding and assumptions of what is important to customers,
suppliers, competitors and employees. All of this knowledge is unique to a
business and is an incredible resource when formulating a BI strategy.
However, having such granular grassroots knowledge of the business can
subconsciously limit the ability to see patterns obvious to others. One of the
benefits of BI reporting is performing ad hoc reporting by drilling down
through layers of data and pivoting on rows and columns. Such flexibility opens
up our inquisitive nature to ask more questions that would not necessarily be
asked if such access to data were not available. Effective analysis helps us
understand the business better, so that we can challenge conventional wisdom
and assumptions as well as what is considered to be the right analysis.
Insight comes in many forms. There are operational insights, such as
determining the effect on production costs of installing new, more
energy-efficient machines that have slightly lower production yields per unit of
measure. There are strategic insights, such as analysing new market
opportunities by researching the barriers to entry. Insight is the
intangible product of analysis developed from asking questions that only
humans can ask. Computers can be used for the identification of patterns, but
only humans can recognize what patterns are useful.
The issue with having insight is convincing others to believe or support the new
perspective so that the insight can be useful. As in life, anything new or
different is slow to gain acceptance or credibility. Well-organized business
intelligence that supports the insight with clear data, patterns, logic,
presentation (e.g., graphs and reports) and calculations helps sell
the new insight.
Once the analysis is done and the insight has been sold, the next step in the
BI cycle is taking action, or decision making. Well-thought-out
decisions, backed up by good analysis and insight, give confidence and courage
to the proposed action. Otherwise, decisions not supported by quality analytics
are made with overbearing safety measures or with less dedication or commitment
from the stakeholders. In addition, quality business intelligence delivered
quickly improves the speed to action. Today’s organizations need to react more
quickly, develop new approaches faster, conduct more agile R&D and get
products and services to market faster than ever before. BI based decision
making with faster access to information and feedback provides more
opportunity for quicker prototyping and testing.
Creating a sustainable architecture depends on understanding the different
components that are involved with developing successful business intelligence
tools. The process is broadly divided into three areas: data collection,
data management, and business intelligence.
The first area refers to the different channels and methods of collecting data
from activities carried out within your organization. This includes
understanding which data different users need to meet their requirements, as
well as a clear idea of the quality, type, and currency of the data. This step is
vital for adding value as the right data produces the best BI insights. The second
major component is data management. This covers various aspects of
integrating data, scrubbing datasets, and fashioning the overall structures that
will house and administer data.
Finally, business intelligence is the part of an organization's architecture that
analyses properly organized data sets to produce insights. This area involves
using real-time analytics, data visualizations, and other BI tools.
Benefits of BI:
Faster reporting, analysis or planning
More accurate reporting, analysis or planning
Better business decisions
Improved data quality
Improved employee satisfaction
Improved operational efficiency
Improved customer satisfaction
Increased competitive advantage
Reduced costs
Increased revenues
Saved headcount
Chapter 2
Decision Making Concepts
Concept of Decision Making:
Decision-making is the act of making a choice among available alternatives.
There are innumerable decisions that are taken by human beings in day-to-day
life. In business undertakings, decisions are taken at every step. It is also
regarded as one of the important functions of management. Managerial
functions like planning, organizing, staffing, directing, coordinating and
controlling are carried through decisions.
Decision making is possible when there are two or more alternatives to solve a
single problem or difficulty. If there is only one alternative then there is no
question of decision making. It is believed that management without decision
making is like a man without a backbone. Therefore, decision making is a problem-
solving approach by choosing a specific course of action among various
alternatives. In conclusion, we can say that decision making is the process of
choosing a specific course of action from various alternatives to solve the
organizational problems or difficulties.
Employee’s motivation
Pervasive function
Steps of Decision-Making Process:
Identification of problem
Analysis of problem
Intelligence — Searching for conditions that call for a decision;
The actual application that will be used by the user. This is the part of the
application that allows the decision maker to make decisions in a particular
problem area. The user can act upon that particular problem.
Applications of DSS:
Medical diagnosis
Agricultural Production
Chapter 3
Data Warehouse
How it Works.
Data Warehousing combines information collected from multiple sources into
one comprehensive database. To cite an example from the business world, a data
warehouse might incorporate customer information from a company's point-of-sale
systems (the cash registers), its website, its mailing lists, and its comment
cards. Data Warehousing may also include confidential information such as
employee details and salary information.
Companies use this information to analyze their customers. Data warehousing is
also related to data mining, which means looking for meaningful patterns in
huge data volumes and devising new strategies for higher sales and profits.
3. Data Mart-
A Data Mart is a subset of the data warehouse. It is specially designed for
specific segments such as sales or finance. In an independent data mart,
data can be collected directly from sources.
Data Warehouses and data marts are mostly built on dimensional data
modeling, where fact tables relate to dimension tables. This makes it easier for
users to access data, since a database can be visualized as a cube of several
dimensions. A data warehouse allows a user to slice the cube along each of
its dimensions.
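To make the cube idea concrete, here is a minimal Python sketch (using pandas, with hypothetical region, product and quarter dimensions and a sales measure). It only illustrates viewing data as a cube and slicing it along one dimension, not how a real warehouse stores data.

```python
import pandas as pd

# Hypothetical fact table: each row is a sales fact keyed by three dimensions.
facts = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North", "South"],
    "product": ["Soap",  "Tea",   "Soap",  "Tea",   "Soap",  "Tea"],
    "quarter": ["Q1",    "Q1",    "Q1",    "Q2",    "Q2",    "Q2"],
    "sales":   [100,     80,      120,     90,      110,     95],
})

# View the data as a region x product cube, aggregated over all quarters.
cube = facts.pivot_table(index="region", columns="product",
                         values="sales", aggfunc="sum")
print(cube)

# Slice the cube along the quarter dimension by fixing quarter = "Q1".
q1 = facts[facts["quarter"] == "Q1"]
print(q1.pivot_table(index="region", columns="product",
                     values="sales", aggfunc="sum"))
```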
1. The data flow in the bottom-up approach starts with the extraction of data
from various source systems into the staging area, where it is processed and
loaded into data marts that handle specific business processes.
2. After the data marts are refreshed, the current data is once again extracted
into the staging area and transformations are applied to fit the data mart
structure. The data extracted from the data marts into the staging area is
aggregated, summarized and then loaded into the EDW, where it is made
available to end users for analysis and supports critical business decisions
(a minimal sketch of this flow follows below).
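The sketch below illustrates this bottom-up flow in Python/pandas; the file name, column names and the "EDW" DataFrame are hypothetical stand-ins for real source systems, staging areas and warehouse tables, not a prescribed schema.

```python
import pandas as pd

# 1. Extract from a source system into the staging area
#    (file and column names are illustrative).
staging = pd.read_csv("pos_sales.csv")

# 2. Transform in staging and load a subject-specific data mart.
staging["sale_date"] = pd.to_datetime(staging["sale_date"])
sales_mart = staging[["sale_date", "store_id", "product_id", "amount"]]

# 3. Aggregate and summarize the mart data, then load the result into the
#    enterprise data warehouse (a DataFrame stands in for the EDW table).
edw_monthly_sales = (
    sales_mart
    .groupby([sales_mart["sale_date"].dt.to_period("M"), "store_id"])["amount"]
    .sum()
    .reset_index(name="monthly_amount")
)
print(edw_monthly_sales.head())
```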
2. Top-down design:
The top-down approach is designed using a normalized enterprise data
model. "Atomic" data, that is, data at the lowest level of detail, are stored in
the data warehouse. Dimensional data marts containing data needed for
specific business processes or specific departments are created from the data
warehouse. Bill Inmon is sometimes referred to as the “father of data
warehousing”; his design methodology is based on a top-down approach. In
the top-down approach, the data warehouse is designed first and then data
marts are built on top of the data warehouse.
2. Data is extracted from the data warehouse on a regular basis into the staging
area. At this step, various aggregation and summarization techniques are
applied to the extracted data, which is then loaded back into the data warehouse.
3. Once the aggregation and summarization are completed, the various data
marts extract that data and apply some more transformations to match the
structure defined by the data marts.
3. Hybrid design:
Data warehouses (DW) often resemble the hub and spokes architecture.
Legacy systems feeding the warehouse often include customer relationship
management and enterprise resource planning, generating large amounts of
data. To consolidate these various data models, and facilitate the extract
transform load process, data warehouses often make use of an operational
data store, the information from which is parsed into the actual DW. To
reduce data redundancy, larger systems often store the data in a normalized
way. Data marts for specific reports can then be built on top of the DW.
An enterprise data warehouse contains historical detailed data about the
organization. Typically, data flows from one or more online transaction
processing (OLTP) databases into the data warehouse on a monthly, weekly, or
daily basis. The data is usually processed in a staging file before being added to
the data warehouse. Data warehouses typically range in size from tens of
gigabytes to a few terabytes, usually with the vast majority of the data stored in
a few very large fact tables.
A data mart contains a subset of corporate data that is of value to a specific
business unit, department, or set of users. Typically, a data mart is derived from
an enterprise data warehouse.
One of the techniques employed in data warehouses to improve performance is
the creation of summaries, or aggregates. They are a special kind of aggregate
view which improves query execution times by pre-calculating expensive joins
and aggregation operations prior to execution, and storing the results in a table
in the database. For example, a table may be created which would contain the
sum of sales by region and by product.
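As an illustration only (assuming a hypothetical detailed sales table with region, product and amount columns), the Python sketch below pre-computes such a summary so that later queries can read the small aggregated table instead of scanning every detail row.

```python
import pandas as pd

# Hypothetical detailed fact table.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A",    "B",    "A",    "A",    "B"],
    "amount":  [500,    300,    700,    200,    400],
})

# Pre-computed summary: total sales by region and by product.
sales_summary = (
    sales.groupby(["region", "product"], as_index=False)["amount"]
         .sum()
         .rename(columns={"amount": "total_amount"})
)
print(sales_summary)
```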
Today, organizations using summaries spend a significant amount of time
manually creating summaries, identifying which ones to create, indexing the
summaries, updating them, and advising their users on which ones to use. The
introduction of summary management in the Oracle server changes the
workload of the DBA dramatically and means the end-user no longer has to be
aware of which summaries have been defined.
The DBA creates one or more materialized views, which are the equivalent of a
summary. The end-user queries the tables and views in the database and the
query rewrite mechanism in the Oracle server automatically rewrites the SQL
query to use the summary tables. This results in a significant improvement in
response time for returning results from the query and eliminates the need for
the end-user or database application to be aware of the summaries that exist
within the data warehouse.
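For illustration, here is a minimal sketch of how such a summary might be defined as an Oracle materialized view. The connection details, the SALES table and its columns are hypothetical, and the statement is issued through the python-oracledb driver only so the example is runnable end to end; treat it as a sketch rather than a production script.

```python
import oracledb  # assumes the python-oracledb driver and an Oracle schema to connect to

# Hypothetical connection; replace with real credentials and DSN.
conn = oracledb.connect(user="dw_admin", password="secret", dsn="dwhost/dwsvc")
cur = conn.cursor()

# Define a summary as a materialized view; ENABLE QUERY REWRITE lets the
# optimizer answer matching queries from this pre-aggregated table.
cur.execute("""
    CREATE MATERIALIZED VIEW sales_by_region_product
        BUILD IMMEDIATE
        REFRESH ON DEMAND
        ENABLE QUERY REWRITE
    AS
    SELECT region, product, SUM(amount) AS total_amount
    FROM   sales
    GROUP  BY region, product
""")

# End users keep querying the detail table; eligible queries can be
# rewritten against the materialized view automatically by the server.
cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
print(cur.fetchall())
```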
Although summaries are usually accessed indirectly via the query rewrite
mechanism, an end-user or database application can construct queries which
directly access the summaries. However, serious consideration should be given
to whether users should be allowed to do this because, once the summaries are
directly referenced in queries, the DBA will not be free to drop and create
summaries without affecting applications.
The summaries or aggregates that are referred to in this book and in literature on
data warehousing are created in Oracle using a schema object called a
materialized view. Materialized views can be used to perform a number of roles,
such as improving query performance or providing replicated data, as described
below.
Chapter 4
Data Pre-Processing and Outliers
Phase 5—Communicate Results: In Phase 5, the team identifies key findings,
quantifies the business value, and develops a narrative to summarize and
convey findings to stakeholders.
Phase 6—Operationalize: In Phase 6, the team delivers final reports, briefings,
code, and technical documents. In addition, the team may run a pilot project to
implement the models in a production environment.
Data Preprocessing:
Data preprocessing is a data mining technique that involves transforming raw
data into an understandable format. Real-world data is often incomplete,
inconsistent, and/or lacking in certain behaviors or trends, and is likely to
contain many errors. Data preprocessing is a proven method of resolving such
issues and prepares raw data for further processing.
Data preprocessing is used in database-driven applications such as customer
relationship management and in rule-based applications (like neural networks).
Steps in Data preprocessing:
1. Data Cleaning: Data cleaning, also called data cleansing or scrubbing, fills
in missing values, smooths noisy data, identifies or removes outliers, and
resolves inconsistencies. Data cleaning is required because source systems
contain “dirty data” that must be cleaned.
Steps in data cleaning (a small cleaning sketch follows after these steps):
Parsing:
Parsing locates and identifies individual data elements in the source files
and then isolates these data elements in the target files.
Example: parsing a full name into its first, middle and last name.
Correcting:
Standardizing:
Matching:
Searching and matching records within and across the parsed, corrected
and standardized data based on predefined business rules to eliminate
duplications.
Consolidating:
Data cleansing:
It must deal with many types of possible errors; these include missing data
and incorrect data at one source.
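A minimal pandas sketch of these cleaning steps is given below; the customer records, column names and the 0-120 age range are invented purely for illustration.

```python
import pandas as pd

# Hypothetical customer records with a combined name field, a missing value,
# an impossible age and inconsistent city spellings.
customers = pd.DataFrame({
    "full_name": ["Asha R. Patil", "John Smith", "Meera Joshi", "Ravi Kumar"],
    "age":       [29, None, 35, 214],
    "city":      ["Pune", "pune", "PUNE", "Pune"],
})

# Parsing: isolate individual name elements from the combined field.
tokens = customers["full_name"].str.split()
customers["first_name"] = tokens.str[0]
customers["last_name"] = tokens.str[-1]

# Missing values: fill the absent age with the column median.
customers["age"] = customers["age"].fillna(customers["age"].median())

# Incorrect data: treat ages outside a plausible range as errors and drop them.
customers = customers[customers["age"].between(0, 120)]

# Standardizing: one consistent representation for the city value.
customers["city"] = customers["city"].str.title()
print(customers)
```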
Data Staging:
Data integration: Combines data from multiple sources into a coherent
data store, e.g., a data warehouse. Sources may include multiple databases,
data cubes or data files.
Schema integration:
Entity identification problem: identify real-world entities from multiple data
sources, e.g., A.cust-id = B.cust#.
For the same real-world entity, attribute values from different sources may
differ. Possible reasons: different representations, different scales.
Redundant data often occur when integrating multiple databases; the same
attribute may have different names in different databases (a small integration
sketch follows below).
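The short pandas sketch below illustrates this integration problem with two invented sources: the customer key is named differently in each (cust_id vs cust_no) and the same attribute is stored at different scales.

```python
import pandas as pd

# Two hypothetical sources describing the same customers.
source_a = pd.DataFrame({"cust_id": [1, 2], "height_m": [1.72, 1.65]})
source_b = pd.DataFrame({"cust_no": [1, 2], "height_cm": [172, 165]})

# Entity identification: A.cust_id corresponds to B.cust_no.
merged = source_a.merge(source_b, left_on="cust_id", right_on="cust_no")

# Resolve the redundancy: convert to a common scale and keep one
# consistently named column.
merged["height_cm"] = merged["height_m"] * 100
merged = merged.drop(columns=["cust_no", "height_m"])
print(merged)
```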
Data Transformation:
The transformation process deals with rectifying any inconsistencies. Thus,
one set of data names is picked and used consistently in the data warehouse.
Once all the data elements have the right names, they must be converted to
common formats, as in the sketch below.
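Below is a small, illustrative pandas sketch of such a transformation; the feed's element names (CUST_NM, ord_dt, amt) and formats are invented, and they are renamed and converted to one common set of names and formats.

```python
import pandas as pd

# Hypothetical feed whose element names and formats differ from the
# warehouse standard.
feed = pd.DataFrame({
    "CUST_NM": ["Asha Patil", "John Smith"],
    "ord_dt":  ["03/01/2024", "15/02/2024"],    # day/month/year strings
    "amt":     ["1,200.50", "980.00"],          # amounts stored as text
})

# One agreed set of data names, used consistently in the warehouse.
feed = feed.rename(columns={"CUST_NM": "customer_name",
                            "ord_dt": "order_date",
                            "amt": "amount"})

# Convert the elements to common formats: proper dates and numeric amounts.
feed["order_date"] = pd.to_datetime(feed["order_date"], format="%d/%m/%Y")
feed["amount"] = feed["amount"].str.replace(",", "", regex=False).astype(float)
print(feed.dtypes)
```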
3. Data Reduction:
Introduction to OLAP:
OLAP allows business users to slice and dice data at will. Normally, data in an
organization is distributed across multiple data sources that are incompatible
with each other. A retail example: point-of-sale data and sales made via the
call center or the Web are stored in different locations and formats. It would be
a time-consuming process for an executive to obtain OLAP reports such as: What
are the most popular products purchased by customers between the ages of 15 and 30?
Part of the OLAP implementation process involves extracting data from the
various data repositories and making them compatible. Making data compatible
involves ensuring that the meaning of the data in one repository matches all
other repositories. An example of incompatible data: customer ages can be
stored as a birth date for purchases made over the Web and stored as age
categories (e.g., between 15 and 30) for in-store sales; a small sketch of
reconciling the two follows below.
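The pandas sketch below shows one way of making these two representations compatible; the customer records, the fixed reference date and the age bands are invented for illustration.

```python
import pandas as pd

# Web purchases carry a birth date; in-store purchases carry an age category.
web = pd.DataFrame({"customer": ["A", "B"],
                    "birth_date": ["1998-04-02", "1979-11-20"]})
store = pd.DataFrame({"customer": ["C", "D"],
                      "age_band": ["15-30", "31-45"]})

# Derive an age from the birth date, then the same bands the stores use.
today = pd.Timestamp("2024-01-01")   # fixed reference date for the example
web["age"] = (today - pd.to_datetime(web["birth_date"])).dt.days // 365
web["age_band"] = pd.cut(web["age"], bins=[14, 30, 45, 120],
                         labels=["15-30", "31-45", "46+"])

# Both sources now expose the same age_band attribute and can be analysed together.
combined = pd.concat([web[["customer", "age_band"]], store], ignore_index=True)
print(combined)
```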
It is not always necessary to create a data warehouse for OLAP analysis. Data
stored by operational systems, such as point-of-sale systems, are held in
databases called OLTPs. OLTP (Online Transaction Processing) databases do not
differ from other databases from a structural perspective. The main, and only,
difference is the way in which data is stored.
OLTPs are designed for optimal transaction speed. When a consumer makes a
purchase online, they expect the transaction to occur instantaneously. With a
database design (called a data model) optimized for transactions, the record
'Consumer name, Address, Telephone, Order Number, Order Name, Price,
Payment Method' is created quickly in the database, and the results can be
recalled by managers equally quickly if needed.
Types of Outliers:
Type 1: Global Outliers:
A data point is considered a global outlier if its value is far outside the entirety
of the data set in which it is found (similar to how “global variables” in a
computer program can be accessed by any function in the program). Such a
point is also called a global anomaly.
Type 2: Contextual (Conditional) Outliers:
Some Python libraries like SciPy and scikit-learn have easy-to-use functions
and classes for an easy implementation, along with Pandas and NumPy. After
making the appropriate transformations to the selected feature space of the
dataset, the z-score of any data point can be calculated with the following
expression:
z = (x − μ) / σ
where x is the data point, μ is the mean and σ is the standard deviation of the
feature.
When computing the z-score for each sample in the data set, a threshold must
be specified. Some good rule-of-thumb thresholds are 2.5, 3 or 3.5 standard
deviations.
By tagging or removing the data points that lie beyond a given threshold, we
classify the data into outliers and non-outliers; a short sketch follows below.
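Here is a minimal sketch of this tagging step, using NumPy and SciPy on an invented one-dimensional feature; the values and the threshold of 3 are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical one-dimensional feature with one clearly extreme value.
values = np.array([9.9, 10.1, 10.0, 9.8, 10.2, 10.0, 9.95, 10.05,
                   10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 30.0])

# Z-score of each sample: (x - mean) / standard deviation.
z = stats.zscore(values)

# Rule-of-thumb threshold; samples beyond it are tagged as outliers.
threshold = 3.0
outlier_mask = np.abs(z) > threshold

print(values[outlier_mask])    # the outliers (here, 30.0)
print(values[~outlier_mask])   # the data with outliers removed
```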
The z-score is a simple yet powerful method to get rid of outliers if you are
dealing with parametric distributions in a low-dimensional feature space.
For non-parametric problems, DBSCAN and Isolation Forests can be good
solutions.
Chapter 5
Designing and Managing BI Systems
1. End-User Experience
These are the core capabilities for all the end users of your application. Business
intelligence requirements in this category may include dashboards and reports as
well as the interactive and analytical functions users can perform. Ideally, such
self-service capabilities let users answer their own questions without having to
involve IT. That frees up IT to work on more strategic projects rather than
answering ad hoc requests. It also empowers users to be more self-sufficient.
During your evaluation, make sure the capabilities important to your project are
demonstrated and understand how you will deliver and iterate upon these
capabilities inside your application.
Analysis and authoring: Empowering your users to query their own data,
create visualizations and reports, and share their findings with colleagues
can add value to your analytics application.
2. Data Environment
The BI solutions you evaluate should be compatible with your current data
environment, while at the same time having enough flexibility to meet future
demands as your data architecture evolves. These are the diverse data
requirements commonly evaluated by application providers:
Data sources: Make sure your primary data source is supported by your
BI solution. Also look for a vendor that supports generic connectors and
has flexibility through APIs or plug-ins.
to the finish line, and beyond. These factors may affect your business
intelligence requirements: