Unit 1 - Introduction To Business Intelligence and Big Data Analytics
Dr Franklin Lam
Business Intelligence and Big Data Analytics
• The term "intelligence" has been used by researchers in artificial intelligence since the 1950s.
• Business Intelligence (BI) became a popular term in the business and IT communities
in the 1990s. BI "encompasses a wide array of processes and software used to
collect, analyze, and disseminate data, all in the interests of better decision
making." (Davenport, 2006).
• Business Analytics was introduced to represent the key analytical component in BI
(Davenport 2006).
• Big Data Analytics is analytics on a large enough scale, with fast enough
processing, to handle big data. In the era of big data, new data sources such as
clickstream data, social media postings and sensor data are arriving in forms and
at speeds that cannot be easily stored in a single storage unit.
1.1 Davenport, T.H. (2017) "How Analytics Has Changed in the Last 10 Years (and
How It's Stayed the Same)" at https://hbr.org/2017/06/how-analytics-has-changed-in-the-last-10-years-and-how-its-stayed-the-same.
What is Analytics?
• Analytics is “the extensive use of data, statistical and quantitative analysis,
explanatory and predictive models, and fact-based management to drive decisions
and actions.” (Davenport and Harris, 2017)
The evolution of analytics
Analytics 1.0 – Traditional Analytics (1975+)
• Primarily descriptive analytics and reporting
• Internally sourced, relatively small, structured data
• "Back room" teams of analysts
• Internal decision support
• Spreadsheets, BI
Analytics 2.0 – Big Data (2001+)
• Complex, large, unstructured data sources
• New analytical and computational capabilities
• "Data scientists" emerge
• Online firms create data-based products and services
Analytics 3.0 – Fast Business Impact for the Data Economy (2013+)
• A seamless blend of traditional analytics and big data
• Analytics integral to running the business
• Data and analytics-based products in every business
• Industrialized decision-making at scale
Analytics 4.0 – Cognitive (2016+)
• Analytics used to make automated decisions
• Emergence of "cognitive technologies"
• Replacement of human tasks – digital/physical
• Augmentation is human focus
Davenport (2016)
Skills Across the Eras
1.0 Traditional Analytics
• Data integration and curation
• Storytelling with data
• Business acumen
• Statistics
• Visual analytics
2.0 Big Data
• Experimentation
• Data restructuring
• Open source coding
3.0 Fast Business Impact for the Data Economy
• Machine learning
• Agile methods
• Change management
• Product development
4.0 Cognitive
• Natural language processing
• Neural networks/deep learning
• Event stream processing
• Work design
Why Is Analytics Important?
• Companies in many industries offer similar products and use
comparable technology. Many of the previous bases for competition
are no longer available.
• Unique geographical advantage doesn’t matter in global competition.
• Protective regulation is largely gone.
• Proprietary technologies are rapidly copied.
• Breakthrough innovation in products or services seems increasingly difficult to
achieve.
• High-performance business processes are among the last remaining
points of differentiation. To compete, companies must run the business
with maximum efficiency and effectiveness and make the smartest
business decisions possible.
Why Is Analytics Important?
• Analytics supports almost any business process.
• Automotive – for driverless cars, automatic emergency response systems can
make maneuvers without driver input.
• Banking – big data sources provide the opportunity to market new products,
balance risks, and detect fraud.
• Government – pattern recognition in images and videos enhances security and
threat detection, while the examination of transactions can spot healthcare
fraud.
• Manufacturing – pattern recognition in sensor data or images can diagnose
otherwise undetectable manufacturing defects.
• Retail – micro-segmentation and continuous monitoring of consumer behavior
can lead to nearly instantaneous customized offers.
Applications of Analytics
• Applications of analytics in various industry sectors have given rise to many related
areas, e.g. marketing analytics, retail analytics, fraud analytics, transportation
analytics, health analytics, sports analytics, talent analytics, behavioral analytics, and
so forth.
• For example, sports analytics is the science of learning from data to improve sports
performance. Sports analytics was pioneered in the 1970s by Bill James and
popularized by Michael Lewis's 2003 bestseller, Moneyball, which focused on the
Oakland A's use of analytics in evaluating players.
• Nowadays, analytics is used in many facets of sports.
• Front-office
  • Analyzing fan behavior, ranging from predictive models for season ticket renewals and
  regular ticket sales to scoring tweets by fans about the team, athletes, coaches, and
  owners, similar to traditional CRM.
  • Financial analysis of salary caps for professional players.
• Back-office
  • For individual players – recruitment models and scouting analytics; analytics for strength
  and fitness as well as player development.
  • For team play – strategies and tactics, competitive assessments, and optimal roster
  choices under various on-field and on-court situations.
Figure: Texas Rangers boost attendance and optimize marketing spend with 360-degree view of ballpark operations
Sport Analytics – Front-office Applications
Figure: Season ticket renewals by survey score. Customers in green cells are most likely to
renew tickets and therefore require fewer marketing touches.
• A survey of fans by ticket seat location ("tier") asking about their likelihood of
renewing their season tickets provides useful insight for marketing action.
• What fans say and what they do can differ greatly – 69% of fans in Tier 1
seats who said they would "probably not renew" actually did!
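The gap between stated intent and actual behavior comes from cross-tabulating survey answers against renewal records. A minimal Python sketch of that calculation follows; the `records` list and the `renewal_rates` helper are hypothetical, invented for illustration, and are not the data behind the slide.

```python
from collections import defaultdict

# Hypothetical records: (seat tier, survey answer, actually renewed?).
records = [
    ("Tier 1", "probably not renew", True),
    ("Tier 1", "probably not renew", True),
    ("Tier 1", "probably not renew", False),
    ("Tier 1", "definitely renew", True),
    ("Tier 2", "probably not renew", False),
    ("Tier 2", "probably not renew", False),
    ("Tier 2", "definitely renew", True),
    ("Tier 2", "definitely renew", False),
]

def renewal_rates(rows):
    """Return {(tier, answer): fraction of those fans who actually renewed}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [renewed, total]
    for tier, answer, renewed in rows:
        counts[(tier, answer)][0] += int(renewed)
        counts[(tier, answer)][1] += 1
    return {key: renewed / total for key, (renewed, total) in counts.items()}

for (tier, answer), rate in sorted(renewal_rates(records).items()):
    print(f"{tier:7} {answer:20} renewed: {rate:.0%}")
```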
Sport Analytics – Front-office Applications
Dynamic pricing—moving the
business from simple static
pricing by seat location tier to
day-by-day up-and-down pricing of
individual seats.
Factors include:
• the team's record and the star
athletes playing for each team
• game dates and times
• each fan’s history of renewing
season tickets or buying
single tickets
• seat location, number of seats
• real-time information – historical
traffic congestion at game time
and the weather.
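One way to picture day-by-day pricing is a base seat price adjusted by one multiplier per factor listed above. The sketch below assumes that multiplicative form; the `GameContext` fields, the `seat_price` function and every weight in it are hypothetical placeholders, whereas a production system would estimate such effects from sales history.

```python
from dataclasses import dataclass

@dataclass
class GameContext:
    win_pct: float        # team's winning percentage so far, 0.0-1.0
    star_playing: bool    # a star athlete is expected to play
    weekend: bool         # game date falls on a weekend
    rain_forecast: bool   # real-time weather signal
    heavy_traffic: bool   # congestion historically seen at this game time

def seat_price(base_price: float, ctx: GameContext,
               seats_left_ratio: float, loyal_fan: bool = False) -> float:
    """Hypothetical dynamic price for one seat; all weights are illustrative."""
    price = base_price * (1.0 + 0.5 * (ctx.win_pct - 0.5))  # better record, higher price
    if ctx.star_playing:
        price *= 1.15
    if ctx.weekend:
        price *= 1.10
    if ctx.rain_forecast:
        price *= 0.90
    if ctx.heavy_traffic:
        price *= 0.95
    if loyal_fan:                       # fan with a history of renewing or buying
        price *= 0.95
    price *= 1.0 + 0.3 * (1.0 - seats_left_ratio)  # fewer seats left, higher price
    return round(max(price, 0.5 * base_price), 2)  # never below half the base price

print(seat_price(40.0, GameContext(0.6, True, True, False, False), seats_left_ratio=0.2))
```

Keeping each factor as a separate multiplier makes it easy to see which signal moved the price on a given day.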
Sport Analytics – Back-office Application
A cascaded decision tree model can be developed from the coach’s annotated
film for predicting whether the next play will be a running play or passing play.
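The slide's model is a cascaded decision tree; as a simpler stand-in, the sketch below grows a single decision tree with Gini-impurity splits (the CART criterion) in plain Python. The play-by-play rows and the features `down`, `yards_to_go` and `score_diff` are hypothetical, not the coach's annotated film.

```python
from collections import Counter

# Hypothetical plays: (down, yards_to_go, score_diff) and what the offense called.
X = [(1, 10, 0), (1, 10, 7), (2, 3, 0), (3, 1, 3), (4, 1, -3),
     (2, 12, -7), (3, 8, -3), (3, 15, 0), (1, 10, -14), (2, 9, -10)]
y = ["run", "run", "run", "run", "run",
     "pass", "pass", "pass", "pass", "pass"]

def gini(labels):
    """Gini impurity: 0 when every label in the node is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini impurity."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Leaves are labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or depth >= max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    go_left = [r[f] <= t for r in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [lab for lab, g in zip(labels, go_left) if g], depth + 1, max_depth)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [lab for lab, g in zip(labels, go_left) if not g], depth + 1, max_depth)
    return (f, t, left, right)

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

tree = build_tree(X, y)
print(predict(tree, (3, 1, 0)))   # short yardage on third down
```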
Analytical Competitors
• Analytics themselves don’t constitute a strategy, but using them to
optimize a distinctive business capability certainly constitutes a
strategy. For example, GE is differentiating its industrial services
processes by using sensor data to identify problems and
maintenance needs before they cause unscheduled downtime.
• Analytical competitors are “organizations that have selected one or a
few distinctive capabilities on which to base their strategies, and then
have applied extensive data, statistical and quantitative analysis, and
fact-based decision making to support the selected capabilities.”
(Davenport and Harris 2017)
Four Pillars of Analytical Competition
• Distinctive capability
• If analytics are to support competitive strategy, they must be in support of an important and distinctive
capability. Having a distinctive capability means that the organization views this aspect of its business as what
sets it apart from competitors and as what makes it successful in the marketplace.
• Enterprise-wide analytics
• Companies and organizations that compete analytically don’t entrust analytical activities just to one group
within the company or to a collection of disparate employees across the organization. They manage analytics as
an organization or enterprise and ensure that no process or business unit is optimized at the expense of another
unless it is strategically important to do so. They make the data and analyses available broadly throughout the
organization and take proper care to manage data and analyses efficiently and effectively.
• Senior management commitment
• The adoption of a broad analytical approach to business requires changes in culture, process, behavior, and
skills for multiple employees. Such changes don’t happen by accident; they must be led by senior executives
with a passion for analytics and fact-based decision making.
• Large-scale ambition
• Not all attempts to create analytical competition will be successful, of course. But the scale and scope of results
from such efforts should at least be large enough to affect organizational fortunes. Incremental, tactical uses of
analytics will yield minor results; strategic, competitive uses should yield major ones.
(Davenport and Harris 2017)
The Five Stages of Analytical Competition
• Stage 5 – Analytical competitors
• Stage 4 – Analytical companies
• Stage 3 – Analytical aspirations
• Stage 2 – Localized analytics
• Stage 1 – Analytically impaired
Objectives of BI:
• Enable interactive access to data
• Enable manipulation of data
• Give business managers and
analysts the ability to conduct
appropriate analyses
(Source: Business intelligence, analytics, and data science : a managerial perspective)
Architecture of BI
Components of BI:
• Data warehouse (DW)
• Business analytics
• Business performance management
• User interface
Managers need the right information at the right time and in the right place.
The Data Analytics Project Life Cycle
• Identifying the problem – Clearly define the scope, purpose, data sources and timelines of the
analytics project, and put them all together in a clear and concise problem statement that gets
signed off by all stakeholders.
• Designing data requirements – The data source can be decided based on the domain and problem
specification; the data attributes of these datasets can be defined.
• Preprocessing data – Data operations such as data cleansing, data aggregation, data augmentation,
data sorting, and data formatting, to provide the data in a supported format for data analytics.
• Performing analytics using data – To obtain meaningful information from data using descriptive,
predictive, prescriptive or autonomous analytics.
• Visualizing data – Use visualization to provide more informative insights.
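The preprocessing stage can be sketched as a small pipeline of cleansing, aggregation, sorting and formatting steps (augmentation is omitted). The raw ticket-sales records and the function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records: (game_date, section, amount) with typical quality problems.
raw = [
    ("2019-04-01", " Tier 1", "45.00"),
    ("2019-04-01", "tier 1", "45"),
    ("2019-04-02", "Tier 2", None),     # missing amount: dropped during cleansing
    ("2019-04-02", "TIER 2", "30.5"),
    ("2019-04-01", "Tier 2", "abc"),    # unparseable amount: dropped during cleansing
]

def cleanse(rows):
    """Drop rows with missing or invalid amounts and normalize section labels."""
    clean = []
    for date, section, amount in rows:
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue
        clean.append((date, section.strip().title(), value))
    return clean

def aggregate(rows):
    """Total revenue per (game date, section)."""
    totals = defaultdict(float)
    for date, section, value in rows:
        totals[(date, section)] += value
    return totals

def to_report(totals):
    """Sort by date, then section, and format each total as a CSV line."""
    return [f"{d},{s},{v:.2f}" for (d, s), v in sorted(totals.items())]

for line in to_report(aggregate(cleanse(raw))):
    print(line)
```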
The Four Types of Business Analytics
• Descriptive analytics (aka BI or performance reports)
provides access to historical and current data. It
provides the ability to alert, explore, and report using
both internal and external data from a variety of
sources.
• Predictive analytics uses quantitative techniques (e.g.
segmentation, network analysis and econometric
forecasting) and technologies (such as models and
rule-based systems) that use past data to predict the
future.
• Prescriptive analytics uses a variety of quantitative
techniques (such as optimization) and technologies
(e.g., models, machine learning and recommendation
engines) to specify optimal behaviors and actions.
• Autonomous analytics employs artificial intelligence
or cognitive technologies (such as machine learning)
to create and improve models and learn from data –
all without human hypotheses and with substantially
less involvement by human analysts. (Davenport and Harris 2017)
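To contrast the first three types on one toy problem, the sketch below summarizes hypothetical monthly sales (descriptive), extrapolates a least-squares trend (predictive), and picks the order quantity with the highest expected profit (prescriptive). The sales figures, price, cost and demand scenarios are all assumptions; autonomous analytics is not shown.

```python
from statistics import mean

# Hypothetical monthly unit sales for one product.
sales = [100, 110, 120, 130]

# Descriptive: what happened?
print("average monthly sales:", mean(sales))

# Predictive: what is likely to happen? Fit y = a + b*x by least squares, extrapolate one month.
def linear_forecast(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(ys)

forecast = linear_forecast(sales)
print("forecast for next month:", forecast)

# Prescriptive: what should we do? Pick the order quantity with the best expected profit
# over three equally likely demand scenarios around the forecast.
PRICE, COST = 5.0, 3.0   # assumed unit selling price and unit cost
scenarios = [forecast - 10, forecast, forecast + 10]

def expected_profit(quantity):
    return mean(PRICE * min(quantity, demand) - COST * quantity for demand in scenarios)

best_q = max(range(100, 181, 10), key=expected_profit)
print("recommended order quantity:", best_q)
```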
Enabling Technologies of Business Analytics
• Descriptive analytics – OLAP/DW; data visualization; dashboards and scorecards; descriptive statistics
• Predictive analytics – data mining; text mining; web/media mining; forecasting
• Prescriptive analytics – optimization; simulation; decision modeling; expert systems
• Autonomous analytics – AI; cognitive technologies; computer vision; natural language processing;
speech recognition; robotics
AI vs cognitive technologies
“Artificial intelligence is the theory and development of computer systems able to perform tasks that
normally require human intelligence.”
“Cognitive technologies are products of the field of artificial intelligence. They are able to perform tasks
that only humans used to be able to do.” (e.g. A.I. Assistant)
Analytics in Retail Value Chain
Analytics Examples in Retail Value Chain
New store analysis
• Business questions: (1) What location should I open? (2) What and how much opening inventory
should I keep?
• Business value: (1) Best practices of other locations and channels can be used to get a jump
start. (2) Comparison with competitor data can help to create a differentiating factor to attract
new customers.
Store layout
• Business questions: (1) How should I lay out the store? (2) How can I improve my in-store
customer experience?
• Business value: (1) Understanding the association of products helps decide the store layout and
align it better with customer needs. (2) Workforce deployment can be planned for better customer
interactivity and thus a more satisfying customer experience.
Video analytics
• Business questions: (1) Which demographic groups are entering the store during the peak sales
period? (2) How can I identify a customer with high lifetime value (LTV) at the store entrance so
that a better personalized experience can be provided to this customer?
• Business value: (1) In-store promotions and events can be planned based on the demographics of
incoming traffic. (2) Targeted customer engagement and instant discounts enhance the customer
experience, resulting in higher retention.
(Source: Business intelligence, analytics, and data science : a managerial perspective)
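"Understanding the association of products" is commonly done with market basket analysis. A minimal sketch of pairwise association rules (support, confidence and lift) follows; the baskets and the `pair_rules` helper are hypothetical, and full algorithms such as Apriori also handle itemsets larger than pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"milk", "cereal", "bread"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.3):
    """Rules a -> b for item pairs, as (a, b, support, confidence, lift), highest lift first."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(basket), 2))
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n                       # share of baskets containing both items
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            confidence = count / item_counts[a]   # P(b in basket | a in basket)
            lift = confidence / (item_counts[b] / n)  # > 1: bought together more than chance
            rules.append((a, b, support, confidence, lift))
    return sorted(rules, key=lambda rule: -rule[4])

for a, b, support, confidence, lift in pair_rules(baskets):
    print(f"{a} -> {b}: support {support:.2f}, confidence {confidence:.2f}, lift {lift:.2f}")
```

A lift above 1 means two products are bought together more often than chance would predict, which is one signal for placing them near each other.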
Machine Learning and Statistical Modeling
• Both machine learning and statistics share the same goal: learning from data. Both focus on
drawing knowledge or insights from data, but their methods are shaped by their inherent
cultural differences.
• Statistical modeling is a formalization of relationships between variables in the data in the
form of mathematical equations. Statistical methods can be classified as either descriptive or
inferential.
  • Descriptive statistics – describing the sample data
  • Inferential statistics – drawing inferences or conclusions about the characteristics of the
  population.
• Machine learning uses algorithms to build analytical models, helping computers "learn" from
data by drawing on concepts and results from many fields, including statistics, artificial
intelligence, philosophy, information theory, biology, cognitive science, computational
complexity, and control theory.
• Machine learning seeks to answer the question: "How can we build computer systems that
automatically improve with experience, and what are the fundamental laws that govern all
learning processes?"
Figure: Machine learning and statistics (KDD = Knowledge discovery in databases) (Source: SAS Institute.)
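The descriptive/inferential distinction can be shown on one sample: the mean and standard deviation describe the sample, while a confidence interval makes a statement about the population. The data are hypothetical, and the interval uses a normal approximation for simplicity (a t-based interval would be more exact for ten observations).

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample: monthly spend (in $) of 10 randomly chosen customers.
sample = [52, 48, 61, 55, 47, 58, 50, 53, 49, 57]

# Descriptive statistics: describe the sample itself.
m, s = mean(sample), stdev(sample)

# Inferential statistics: 95% confidence interval for the population mean
# (normal approximation; z is about 1.96).
z = NormalDist().inv_cdf(0.975)
half_width = z * s / len(sample) ** 0.5
ci = (m - half_width, m + half_width)
print(f"sample mean {m:.1f}, sd {s:.2f}, 95% CI ({ci[0]:.1f}, {ci[1]:.1f})")
```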
Machine Learning vs Statistical Analysis
• Understand the churn risk level of a telecom company's customers over a period of time, based
on two drivers – A and B.
• Classic regression analysis formulates a linear boundary for separating risky from non-risky
customers, while a machine learning algorithm can generate contours that capture patterns not
bound by linearity, or even by continuity of the boundary.
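The contrast between a linear boundary and a flexible one can be reproduced on toy data where risk is high when drivers A and B are both low or both high. In the sketch below, logistic regression stands in for classic regression and k-nearest neighbors for the machine learning algorithm; both choices and the data are assumptions. Because the test points form an XOR pattern, no straight line can classify all four correctly.

```python
import math
from collections import Counter

# Hypothetical customers: ((driver A, driver B), 1 = risky, 0 = not risky).
train = [((1, 1), 1), ((2, 1), 1), ((9, 9), 1), ((8, 9), 1),
         ((1, 9), 0), ((1, 8), 0), ((9, 1), 0), ((9, 2), 0)]
test = [((2, 2), 1), ((8, 8), 1), ((2, 8), 0), ((8, 2), 0)]

def train_logistic(data, lr=0.01, epochs=2000):
    """Logistic regression by gradient descent: its decision boundary is a straight line."""
    w_a = w_b = bias = 0.0
    for _ in range(epochs):
        for (a, b), label in data:
            z = max(-30.0, min(30.0, w_a * a + w_b * b + bias))  # clamp to avoid overflow
            error = 1.0 / (1.0 + math.exp(-z)) - label
            w_a -= lr * error * a
            w_b -= lr * error * b
            bias -= lr * error
    return lambda a, b: int(w_a * a + w_b * b + bias > 0)

def train_knn(data, k=3):
    """k-nearest neighbors: the boundary follows the data, so it can be non-linear."""
    def predict(a, b):
        nearest = sorted(data, key=lambda item: (item[0][0] - a) ** 2 + (item[0][1] - b) ** 2)[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def accuracy(model, data):
    return sum(model(a, b) == label for (a, b), label in data) / len(data)

linear_acc = accuracy(train_logistic(train), test)
knn_acc = accuracy(train_knn(train), test)
print(f"linear boundary: {linear_acc:.0%}, k-NN: {knn_acc:.0%}")
```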
Machine Learning Definition
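A computer program is said to learn from experience E with respect to some task T and performance
measure P if its performance at T, as measured by P, improves with experience E (Tom Mitchell).
Suppose your email program watches which emails you do or do not
mark as spam, and based on that learns how to better filter spam.
Which of the following is the task T in this setting?
Answer: Classifying emails as spam or not spam. (Watching you label emails is the experience E;
the fraction of emails correctly classified is the performance measure P.)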
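The T/E/P framing maps directly onto code. The sketch below is a hypothetical naive Bayes spam filter: the emails the user has marked are the experience E, `classify` performs the task T, and `accuracy` on held-out emails is the performance measure P. The emails are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical emails the user has marked: 1 = spam, 0 = not spam (experience E).
marked = [
    ("win a free prize now", 1),
    ("free money click now", 1),
    ("claim your free prize", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday", 0),
    ("project report attached", 0),
]
held_out = [("free prize inside", 1), ("monday project meeting", 0)]

def train_naive_bayes(emails):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in emails:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab_size = len(set(word_counts[0]) | set(word_counts[1]))
    n_emails = sum(class_counts.values())

    def classify(text):
        """Task T: label an email as spam (1) or not spam (0)."""
        scores = {}
        for label in (0, 1):
            total_words = sum(word_counts[label].values())
            score = math.log(class_counts[label] / n_emails)
            for word in text.lower().split():
                score += math.log((word_counts[label][word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)

    return classify

def accuracy(classify, emails):
    """Performance measure P: fraction of emails classified correctly."""
    return sum(classify(text) == label for text, label in emails) / len(emails)

classify = train_naive_bayes(marked)
print("accuracy on held-out emails:", accuracy(classify, held_out))
```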
How does Machine Learning Work?
• A machine learning algorithm is trained using a training data set to create a model.
• When new input data is introduced to the ML algorithm, it makes a prediction on the basis of
the model.
• The prediction is evaluated for accuracy, and if the accuracy is acceptable, the machine
learning algorithm is deployed. If the accuracy is not acceptable, the machine learning
algorithm is trained again and again with an augmented training data set.
(Source: https://www.edureka.co/blog/what-is-machine-learning/)
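The train, evaluate and retrain loop described above can be sketched as follows. The nearest-centroid model, the 0.9 accuracy target and the data batches are hypothetical choices made only to keep the example short.

```python
from statistics import mean

def train(data):
    """Hypothetical model: nearest class centroid on a single numeric feature."""
    centroids = {lab: mean(x for x, l2 in data if l2 == lab) for lab in {l2 for _, l2 in data}}
    return lambda x: min(centroids, key=lambda lab: abs(x - centroids[lab]))

def evaluate(model, data):
    """Accuracy of the model's predictions on labeled data."""
    return sum(model(x) == label for x, label in data) / len(data)

def train_until_acceptable(train_set, extra_batches, validation, target=0.9):
    """Train, evaluate, and retrain on an augmented training set until accuracy >= target."""
    train_set = list(train_set)
    batches = iter(extra_batches)
    while True:
        model = train(train_set)
        acc = evaluate(model, validation)
        if acc >= target:
            return model, acc, len(train_set)       # acceptable: ready to deploy
        batch = next(batches, None)
        if batch is None:
            raise RuntimeError(f"accuracy {acc:.2f} is below target and no data is left")
        train_set.extend(batch)                     # augment the training data, then retrain

# Hypothetical data: a single feature value with a "low"/"high" label.
initial = [(1, "low"), (2, "low"), (14, "high"), (16, "high")]
extra = [[(5, "low"), (7, "high"), (9, "high")]]
validation = [(2, "low"), (4, "low"), (6, "low"), (8, "high"), (9, "high"), (12, "high")]

model, acc, size = train_until_acceptable(initial, extra, validation)
print(f"deployed with accuracy {acc:.2f} after training on {size} examples")
```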
Types of Machine Learning
Supervised Learning vs Unsupervised Learning
Supervised learning is basically a synonym for classification. The supervision in the learning
comes from the labeled examples in the training data set.
Unsupervised learning is essentially a synonym for clustering. The learning process is
unsupervised since the input examples are not class labeled. Typically, we may use clustering to
discover classes within the data.
(Source: https://blog.westerndigital.com/machine-learning-pipeline-object-storage/supervised-learning-diagram/)
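Clustering without labels can be illustrated with k-means. The sketch below is a plain k-means on hypothetical customer points (visits per month, spend); seeding the centroids with the first k points is a simplifying assumption, and practical implementations usually use smarter seeding such as k-means++.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: groups emerge from distances alone, no labels are used."""
    centroids = [points[i] for i in range(k)]    # simplistic seeding: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assignment step
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        centroids = [                             # update step: mean of each cluster
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, monthly spend in $100s), no class labels.
points = [(1, 1), (10, 10), (1.5, 2), (2, 1), (9, 11), (11, 9)]
centroids, clusters = kmeans(points, k=2)
for center, members in zip(centroids, clusters):
    print(center, members)
```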
Supervised Learning vs Unsupervised Learning
Which type of learning approach would be more suitable for the following
applications?
(a) Spam mail identification (b) Cross-selling
(Source: https://www.quora.com/How-does-machine-learning-work)
Supervised Learning
• Supervised learning uses a full set of labeled data to train an algorithm.
• Fully labeled means that each example in the training dataset is tagged with the
answer the algorithm should come up with on its own.
• So, a labeled dataset of animal images would tell the model which photos were of
dogs and cats. When shown a new image, the model compares it to the training
examples to predict the correct label.
(Source: https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/)
Semi-supervised Learning
Semi-supervised learning is a class of machine learning techniques that make use of
both labeled and unlabeled examples when learning a model. In one approach,
labeled examples are used to learn class models and unlabeled examples are used to
refine the boundaries between classes.
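One simple semi-supervised technique is self-training: learn class models from the labeled examples, assign provisional labels to the unlabeled ones, and refit on both. The sketch below uses class centroids on a single hypothetical scan score as the class models; the data and names are invented for illustration.

```python
from statistics import mean

def fit_centroids(data):
    """Class model: the mean score of each class."""
    return {lab: mean(x for x, l2 in data if l2 == lab) for lab in {l2 for _, l2 in data}}

def nearest_class(centroids, x):
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))

def self_train(labeled, unlabeled, rounds=5):
    """Self-training: fit on labeled data, pseudo-label the rest, refit on everything."""
    centroids = fit_centroids(labeled)
    for _ in range(rounds):
        pseudo = [(x, nearest_class(centroids, x)) for x in unlabeled]
        centroids = fit_centroids(labeled + pseudo)
    return centroids

# Hypothetical scan scores: only two scans were labeled by a radiologist.
labeled = [(1.0, "normal"), (9.0, "tumor")]
unlabeled = [2.0, 2.5, 3.0, 6.0, 6.5, 7.0, 7.5]

centroids = self_train(labeled, unlabeled)
boundary = (centroids["normal"] + centroids["tumor"]) / 2
print(centroids, "boundary:", boundary)
```

With only the two labeled scans the class boundary sits at 5.0; the unlabeled scans pull it to about 4.66.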
Semi-supervised Learning
• Common situations for this kind of learning
are medical images like CT scans or MRIs.
• A trained radiologist can go through and
label a small subset of scans for tumors or
diseases.
• It would be too time-intensive and costly to
manually label all the scans, but the deep
learning network can still benefit from the
small proportion of labeled data and
improve its accuracy compared to a fully
unsupervised model.
(Source: https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/)
Reinforcement Learning (RL)
Reinforcement learning (RL) continuously learns from the environment in an iterative
fashion. In the process, the agent learns from its experiences of the environment
until it explores the full range of possible states. Some applications of RL
algorithms are computer-played board games (e.g. chess), robotic hands, and
self-driving cars.
(Source: https://www.quora.com/How-does-machine-learning-work)
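A minimal reinforcement learning example is tabular Q-learning in a hypothetical five-cell corridor where the agent earns a reward only on reaching the last cell. The environment, the hyperparameters and the step cap are assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = ("left", "right")

def step(state, action):
    """Hypothetical environment: move one cell; reward 1 only on reaching the goal."""
    nxt = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:                                     # exploit, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, steps = nxt, steps + 1
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should move right in every cell, learned purely from trial, error and reward.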
Machine Learning Algorithms
(Source: https://in.mathworks.com/help/stats/machine-learning-in-matlab.html?w.mathworks.com)
1.2 Domingos, P. (2012) "A Few Useful Things to Know About Machine Learning",
Communications of the ACM, Vol. 55, No. 10, 78-87.