Introduction to Business Analytics
Contents
▪ What is Business Analytics?
▪ Gartner Analytics Maturity Model
▪ Categorization of Analytical Methods and Models
▪ (Big) Data Types
▪ The Six V’s of Big Data
▪ Modeling
▪ Simulation
▪ Data Mining
▪ Optimization
▪ Business Intelligence (BI)
▪ Business Analytics in Practice
▪ Machine Learning
▪ The CRISP-DM Framework
▪ Introduction to R & R Studio
Introduction to Business Analytics
What is Business Analytics?
▪ Business analytics is the scientific process of transforming data into insight for making better
decisions.
o Business analytics makes extensive use of analytical modeling and numerical analysis, including
explanatory and predictive modeling, and fact-based management to drive decision making.
▪ Structured data: resides in fixed fields within a record or file; both inputs and outputs are clear.
▪ Unstructured data: does not reside in fixed locations; generally free-form text, which is ubiquitous; both inputs and outputs are unclear.
▪ Semi-structured data: between the two forms, where "tags" or "structure" are associated with or embedded within otherwise unstructured data; either inputs or outputs are unclear.
(Big) Data Types
▪ "(Big) Data Types" refers to the various kinds of data that are collected, processed, and analyzed in the
context of big data.
▪ Big data is characterized by the volume, velocity, variety, and often the complexity of the data involved.
▪ Here are some common data types in the realm of big data:
▪ Structured Data:
This type of data is organized into well-defined, tabular formats with rows and columns, making it
easy to store and analyze. Examples include data in relational databases, spreadsheets, and CSV
files. Structured data is highly organized and can be efficiently processed.
▪ Unstructured Data:
Unstructured data does not have a fixed format, and it can take many forms, including text, images,
audio, and video. Analyzing unstructured data often requires advanced techniques like natural
language processing (NLP) for text, image recognition for images, and speech-to-text for audio.
▪ Semi-Structured Data:
Semi-structured data falls between structured and unstructured data. It has some structure, such as
tags, but is not as rigid as structured data. Examples include JSON and XML data. NoSQL
databases are often used to store and process semi-structured data.
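As a quick illustration of the difference, the short R sketch below reads a structured CSV file and parses a semi-structured JSON record. It assumes the jsonlite package (a common JSON parser for R) is installed; the file name and JSON content are hypothetical.
# Structured data: tabular rows and columns (hypothetical file name)
# df <- read.csv("sales.csv")
# Semi-structured data: tags give partial structure (hypothetical record)
library(jsonlite)
json_txt <- '{"customer": "Acme", "orders": [{"id": 1, "total": 99.5}, {"id": 2, "total": 20.0}]}'
rec <- fromJSON(json_txt)
rec$customer      # a single tagged field
rec$orders        # the nested orders arrive as a small data frame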
(Big) Data Types
▪ Time-Series Data:
Time-series data is collected and recorded over time, with each data point associated with a specific
timestamp. Examples include stock prices, sensor data, and weather observations. Time-series data
is used for trend analysis and forecasting.
▪ Geospatial Data:
Geospatial data contains information related to geographic locations. This includes GPS coordinates,
maps, satellite imagery, and location-based data from mobile devices. Geospatial data is used in
applications like geographic information systems (GIS) and location-based services.
▪ Graph Data:
Graph data represents relationships between entities, often in a network structure. Social networks,
organizational structures, and the World Wide Web can be modeled using graph data. Graph
databases are used to manage and analyze such data.
▪ Financial Data:
Financial data includes transaction records, market data, and financial statements. It's critical in
areas like algorithmic trading, risk assessment, and fraud detection.
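To make the time-series idea above concrete, here is a minimal base-R sketch that builds a monthly series from simulated values, fits a simple autoregressive model with arima(), and forecasts three months ahead; the numbers are illustrative only.
set.seed(123)
# Simulated monthly observations, e.g., a sensor reading or sales index
sales <- ts(100 + cumsum(rnorm(24)), start = c(2023, 1), frequency = 12)
fit <- arima(sales, order = c(1, 0, 0))   # AR(1) model
predict(fit, n.ahead = 3)$pred            # three-step-ahead forecast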
(Big) Data Types
▪ Audio and Video Data:
Audio and video data are multimedia formats containing sound and visual information. Analyzing this
data can involve speech recognition, video analysis, and content recommendation.
▪ Machine Data:
Machine data is generated by machines and devices, including sensors, IoT devices, and industrial
equipment. It contains information about the operational status and performance of these machines
and is essential for predictive maintenance and monitoring.
▪ Log Data:
Log data consists of records of events or activities, often with timestamps. These logs are generated
by servers, applications, and devices. Analyzing log data is crucial for troubleshooting and monitoring
system performance.
▪ Text Data:
Text data includes written or textual information, such as documents, emails, social media posts, and
customer reviews. Text data analysis involves tasks like sentiment analysis, topic modeling, and text
mining.
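As a small taste of text mining, the base-R sketch below computes word frequencies for a couple of made-up customer reviews; sentiment analysis and topic modeling build on this kind of tokenization.
reviews <- c("Great product and fast shipping",
             "Poor quality, but great price")          # hypothetical reviews
words <- unlist(strsplit(tolower(reviews), "[^a-z]+")) # crude tokenizer
words <- words[nzchar(words)]                          # drop empty tokens
sort(table(words), decreasing = TRUE)                  # word frequencies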
The Six Vs of Big Data
Big data is a collection of data from various sources, often characterized by what has become known as the
3Vs: Volume, Variety, and Velocity. Over time, other Vs have been added to the description of big data.
The Six Vs of Big Data
▪ The "Six Vs of Big Data" is a framework that characterizes and highlights the key attributes of big data.
These attributes are essential for understanding the challenges and opportunities associated with
managing, analyzing, and deriving value from large and complex datasets.
▪ Volume:
This refers to the sheer size of the data. Big data involves datasets that are much larger than what
traditional databases and tools can handle. It's about the scale of data, often measured in petabytes,
exabytes, or even zettabytes. The enormous volume of data presents challenges in terms of storage,
processing, and analysis.
▪ Velocity:
Velocity relates to the speed at which data is generated, collected, and processed. Big data often comes
in real-time or near-real-time streams. This is especially relevant in applications like social media, Internet
of Things (IoT), financial trading, and cybersecurity, where rapid data ingestion and analysis are crucial.
The Six Vs of Big Data
▪ Variety:
Variety refers to the diversity of data types and sources. Big data encompasses structured data (like
databases), semi-structured data (like XML or JSON), and unstructured data (like text, images, and
videos). It also includes data from a wide range of sources, such as sensors, social media, and online
interactions.
▪ Veracity:
Veracity relates to the quality and reliability of the data. Big data often involves messy, incomplete, or
inconsistent data. Ensuring data quality and accuracy is a significant challenge. Veracity issues can arise
from errors in data collection, biases, or the sheer volume of data.
▪ Value:
While not one of the original Vs, value is a critical dimension. It refers to the ability to extract meaningful
insights, knowledge, or business value from big data. The goal of big data analytics is to turn this data
into actionable insights that lead to informed decision-making and positive outcomes.
▪ Viability:
Viability refers to the economic and technical feasibility of handling and deriving value from big data.
Organizations need to consider the costs, infrastructure, and resources required to work with large and
complex datasets.
The Six Vs of Big Data
▪ In addition to the Six Vs, some discussions also mention other Vs:
▪ Variability:
Variability relates to changes in data flow and patterns. In some applications, data flow may be erratic,
and the volume and characteristics of the data can change unpredictably. This variability can present
challenges in data processing.
▪ Visualization:
Visualization is essential for making sense of big data. It helps in presenting complex patterns and
insights in a visually digestible format. Effective data visualization is crucial for data exploration and
communication.
▪ Understanding these Vs is critical for businesses and organizations that deal with big data. They inform
decisions related to data storage, processing, analytics tools, and strategies for extracting value from the
wealth of information available.
▪ Big data technologies and techniques have evolved to address these challenges and harness the potential
insights and benefits offered by big data.
Modeling
▪ Modeling is a process of creating simplified representations of real-world systems, processes, or
concepts to understand, analyze, predict, or simulate their behavior.
▪ Models are used in a wide range of disciplines, including science, engineering, economics, and many
others, to gain insights, make informed decisions, and solve complex problems.
▪ Abstraction:
Modeling involves simplifying complex systems or phenomena by focusing on the most relevant and
essential aspects. This simplification is called abstraction and is essential for making models
manageable and useful.
▪ Representation:
Models are constructed to represent the essential characteristics of the real-world system. They can
take various forms, including mathematical equations, physical prototypes, computer simulations,
diagrams, and conceptual frameworks.
▪ Purpose:
Models are created for specific purposes, such as prediction, explanation, optimization, or
visualization. The choice of modeling approach and the level of detail depend on the intended
purpose.
Types of Models:
▪ There are various types of models, including:
1. Mathematical Models:
These are expressed as mathematical equations, formulas, or algorithms. They are common in
physics, engineering, and economics.
2. Physical Models:
These are tangible representations of systems or objects, such as architectural scale models or
mechanical prototypes.
3. Conceptual Models:
These are high-level representations used to convey concepts, relationships, and ideas, often through
diagrams or flowcharts.
4. Simulation Models:
These are computer-based models that imitate the behavior of a system or process over time. They
are used in fields like computer science, engineering, and social sciences.
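To ground item 1 above, here is a one-line mathematical model in R: the small-angle formula for a pendulum's period, T = 2π√(L/g).
# Mathematical model: period of a simple pendulum (small-angle approximation)
pendulum_period <- function(length_m, g = 9.81) 2 * pi * sqrt(length_m / g)
pendulum_period(1)   # a 1 m pendulum swings with a period of about 2 seconds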
Simulation
▪ Modeling Reality:
In simulation, a model is constructed to represent a real-world system or process. This model can be
a mathematical equation, a computer program, a physical replica, or a combination of these. The
model captures the essential features and interactions of the system being studied.
▪ Purpose:
Simulations are used for various purposes, including understanding complex systems, testing
hypotheses, predicting outcomes, optimizing processes, and training individuals. They are especially
valuable when conducting real-world experiments is costly, dangerous, or impractical.
▪ Simulation offers the advantage of experimenting with systems without real-world consequences.
▪ It allows for risk-free testing, in-depth analysis, and exploration of scenarios that may be challenging or
impossible to replicate physically.
▪ Simulation is a valuable tool for modeling and understanding real-world systems or processes.
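A minimal Monte Carlo sketch in R, assuming a hypothetical inventory setting: daily demand is simulated many times to estimate the probability of a stock-out, exactly the kind of risk-free experiment described above.
set.seed(42)
n_days <- 10000
demand <- rpois(n_days, lambda = 90)   # hypothetical Poisson daily demand
stock  <- 100                          # units on hand each day
mean(demand > stock)                   # estimated stock-out probability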
Difference between Modeling & Simulation
▪ Modeling provides a static or conceptual representation of a system, while simulation involves
running a model to observe the dynamic behavior of the system over time.
▪ Models are valuable for conceptual understanding and prediction, while simulations are used to study
and test real-world systems' behavior under varying conditions.
▪ Both modeling and simulation are essential tools for problem-solving, analysis, and decision-making in
various fields, including engineering, science, business, and more. They are often used in conjunction
to gain a comprehensive understanding of complex systems and processes.
▪ Examples of Modeling:
Mathematical equations describing the motion of a pendulum, a diagram of an electrical circuit, or a
flowchart representing a business process are all examples of models.
▪ Examples of Simulation:
Simulating the movement of a car in a driving simulator, running a computer model to predict weather
patterns, or using a business process simulation to optimize supply chain operations are all
examples of simulations.
Data Mining
▪ Data mining is a process of discovering patterns, trends, associations, and valuable information from
large datasets.
▪ It involves the use of various techniques and algorithms to extract meaningful insights and knowledge
from data.
▪ Data mining is a crucial component of the broader field of data analysis and is used across various
industries and applications.
▪ Algorithm:
An algorithm is a step-by-step set of instructions or a well-defined procedure for solving a specific
problem or accomplishing a particular task.
▪ Algorithms are fundamental to computer science, mathematics, and various fields where systematic
and repeatable processes are involved. They serve as a blueprint for how a particular task or
computation should be carried out, and they can be implemented in computer programs or executed
manually.
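As one concrete data-mining algorithm, the sketch below runs k-means clustering (base R's kmeans()) on the built-in iris measurements to discover groups in the data; the cross-tabulation shows how well the discovered clusters line up with the known species.
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3)   # discover 3 clusters in the data
table(cluster = km$cluster, species = iris$Species)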
Optimization
▪ Optimization is the process of finding the best possible solution or outcome from a set of available
choices or conditions.
▪ It involves making decisions and setting variables to maximize benefits, minimize costs, or achieve a
specific goal while satisfying a set of constraints or limitations.
▪ Optimization is a fundamental concept in various fields, including mathematics, engineering,
economics, operations research, and computer science.
▪ Objective Function:
Optimization begins with defining an objective function, which quantifies the goal you want to
achieve. The objective function is a mathematical expression that takes a set of decision variables as
input and produces a value that you aim to maximize or minimize.
▪ Decision Variables:
Decision variables are the parameters or choices that you can control or adjust to achieve your
objective. They are typically represented as symbols in the objective function.
Optimization
▪ Constraints:
Constraints are conditions or limitations that must be satisfied in the optimization problem. These
conditions can be equality constraints (requirements that must be met exactly) or inequality
constraints (limits that must not be violated).
▪ Optimization Problem:
An optimization problem involves defining an objective function, specifying decision variables, and
setting constraints. The goal is to find the values of decision variables that optimize the objective
while satisfying the constraints.
▪ Feasible Region:
The feasible region represents the space of possible solutions that satisfy all the constraints.
Solutions outside this region are infeasible.
▪ Optimal Solution:
An optimal solution is the best possible outcome that the optimization problem can achieve. It is the
set of values for decision variables that maximize or minimize the objective while adhering to the
constraints.
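These pieces fit together naturally in code. The sketch below, assuming the lpSolve package is installed and using made-up product data, states an objective function over two decision variables, adds two constraints, and asks the solver for the optimal solution.
library(lpSolve)
# Maximize profit 25x + 20y (objective function over decision variables x, y)
objective <- c(25, 20)
# Constraints: 20x + 12y <= 1800 (machine hours), 4x + 4y <= 500 (labor hours)
const_mat <- rbind(c(20, 12), c(4, 4))
const_dir <- c("<=", "<=")
const_rhs <- c(1800, 500)
res <- lp("max", objective, const_mat, const_dir, const_rhs)
res$solution   # optimal values of the decision variables
res$objval     # optimal objective value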
Types of Optimization
1. Linear Optimization (Linear Programming):
In linear optimization, both the objective function and constraints are linear equations. This approach
is widely used in fields like operations research, supply chain management, and economics.
2. Nonlinear Optimization:
Nonlinear optimization involves nonlinear objective functions or constraints. This type is common in
engineering, scientific research, and machine learning.
3. Integer Optimization:
Integer optimization deals with problems where some or all decision variables must take integer
values. This is applicable in situations like project scheduling and network design.
4. Convex Optimization:
Convex optimization focuses on problems with convex objective functions and constraints. It is used in
areas like machine learning, control systems, and signal processing.
5. Multi-objective Optimization:
In multi-objective optimization, there are multiple conflicting objectives that need to be optimized
simultaneously. The goal is to find a set of solutions that balance these objectives.
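For nonlinear problems (type 2 above), base R's optim() is a common starting point; this minimal sketch minimizes a made-up smooth cost function of two variables.
# Nonlinear objective: a smooth, bowl-shaped cost function (illustrative)
cost <- function(p) (p[1] - 3)^2 + (p[2] + 1)^2 + 0.5 * p[1] * p[2]
res <- optim(par = c(0, 0), fn = cost, method = "BFGS")
res$par     # decision variables at the minimum
res$value   # minimized cost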
Business Intelligence (BI)
▪ Business Intelligence (BI) refers to the technology, processes, and tools used to transform raw data into
meaningful insights and actionable information that can support decision-making in an organization.
▪ The primary goal of BI is to provide business users, including managers and executives, with easy access
to data-driven insights in a format that is understandable and actionable.
▪ Business Intelligence is not a single software application but a concept encompassing a wide array of
software tools designed to help organizations manage and derive insights from their data.
▪ Data Integration:
BI systems integrate data from various sources, including databases, spreadsheets, and external data
feeds, into a single, coherent data repository.
▪ Data Warehousing:
Data warehousing is a central component of BI, involving the storage of structured data from various
sources in a way that enables efficient retrieval and analysis.
Popular BI Tools
▪ Tableau
▪ Microsoft Power BI
▪ QlikView and Qlik Sense
▪ SAP BusinessObjects
▪ IBM Cognos
▪ MicroStrategy
▪ Looker
▪ Sisense
▪ These tools are typically used in various industries and domains to support data-driven decision-making,
monitor business performance, and derive valuable insights from data.
▪ Organizations select BI software based on their specific needs, data sources, and technical requirements.
▪ These BI tools are vital for turning data into actionable information that supports strategic, operational, and
tactical decisions.
Business Analytics in Practice: Financial Analytics
▪ Predictive models are used
o to forecast financial performance,
o to assess the risk of investment portfolios and projects,
o to construct financial instruments (i.e., a monetary contract between parties)
Source: Tiwari, S., Wee, H. M., & Daryanto, Y. (2018). Big data analytics in supply chain management between 2010 and 2016: Insights to industries. Computers & Industrial Engineering, 115, 319-330.
▪ Big data analytics is applied across industries such as supply chain, finance, and manufacturing, in applications including campaign & promotion analysis, warehouse planning, fraud detection, and many more.
Machine Learning (ML)
▪ Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on the development of
algorithms and statistical models that enable computer systems to improve their performance on a
specific task through experience, without being explicitly programmed.
▪ In other words, instead of instructing a machine how to perform a task, machine learning allows the
machine to learn from data and adapt its behavior accordingly.
▪ Algorithm:
Machine learning algorithms are the mathematical and computational techniques used to learn patterns
and make predictions from data. These algorithms can be classified into various categories, such as
supervised learning, unsupervised learning, and reinforcement learning.
▪ Training:
To train a machine learning model, you provide it with a labeled dataset, meaning that the correct
answers or outcomes are known. The model learns to make predictions based on this training data.
Machine Learning (ML)
▪ Testing and Evaluation:
After training, the model's performance is evaluated on a separate dataset, called the test dataset, to assess its
ability to make accurate predictions on new, unseen data.
▪ Feature Engineering:
Feature engineering involves selecting and transforming relevant attributes or features from the data to improve the
model's performance.
▪ Model Selection:
There are various types of machine learning models, such as linear regression, decision trees, neural networks, and
support vector machines. The choice of the model depends on the specific task and the data.
▪ Supervised Learning:
In supervised learning, the model is trained on labeled data, which means it learns to map input data to known
output values. It is commonly used for tasks like classification and regression.
▪ Unsupervised Learning:
Unsupervised learning deals with unlabeled data and focuses on finding patterns, structures, or clusters in the data.
Common techniques include clustering and dimensionality reduction.
▪ Reinforcement Learning:
In reinforcement learning, agents learn to make sequences of decisions by interacting with an environment. They
receive feedback in the form of rewards or punishments based on their actions.
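A minimal supervised-learning sketch in base R, using the built-in mtcars data: the model is trained on a labeled subset, then evaluated on held-out rows, mirroring the training and testing steps described above.
set.seed(7)
idx   <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))  # 70/30 split
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]
fit   <- lm(mpg ~ wt + hp, data = train)   # learn from labeled training data
pred  <- predict(fit, newdata = test)      # predict on unseen data
sqrt(mean((test$mpg - pred)^2))            # test RMSE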
The CRISP-DM Framework
▪ CRISP-DM (the Cross-Industry Standard Process for Data Mining) organizes analytics projects into six
iterative stages: business understanding, data understanding, data preparation, modeling, evaluation,
and deployment.
▪ After deployment, the results and feedback can lead to further iterations, particularly as new data
becomes available, or business objectives evolve.
▪ The advantages of using CRISP-DM include its flexibility, focus on business understanding, and
systematic approach to data mining projects.
▪ It emphasizes the importance of collaboration between business stakeholders and data scientists
throughout the project.
▪ Keep in mind that the specific tools, techniques, and steps within each stage can vary depending on
the project's unique requirements and the tools available.
▪ The framework serves as a guideline for organizing the process, but the details of its implementation
can be adapted to the specific project.
R & R-Studio
▪ R and RStudio are two closely related tools commonly used by data analysts, data scientists, and statisticians
for statistical analysis, data visualization, and data manipulation.
▪ R:
R is a programming language and open-source software environment specifically designed for statistical
computing and data analysis. It was developed by statisticians and data analysts to provide a comprehensive
platform for data manipulation, statistical modeling, data visualization, and more.
Features: Statistical Analysis, Data Manipulation, Data Visualization, Custom Functions.
▪ RStudio:
RStudio is an integrated development environment (IDE) designed to work with the R programming
language. It provides a user-friendly and efficient environment for working with R, making it easier to write,
test, and debug R code.
Features: Code Editor, Console, Data Viewer, Plots and Visualizations, Package Management, Reports and
Presentations.
▪ R and RStudio work in tandem, with R providing the statistical and data analysis capabilities and RStudio
offering a user-friendly interface for writing and running R code. RStudio streamlines the workflow and
enhances the productivity of R users by providing a centralized platform for coding, data exploration, and
report generation.
▪ Both R and RStudio are widely used in data analysis, statistics, data science, and research, and they are
available for multiple operating systems, making them accessible to a broad user base.
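A few lines of R illustrate the features listed above: statistical analysis, data manipulation, visualization, and custom functions (all base R; the data are simulated).
x <- rnorm(100, mean = 50, sd = 10)         # simulated data
summary(x)                                  # statistical analysis
high <- x[x > 60]                           # data manipulation (filtering)
hist(x, main = "Distribution of x")         # data visualization
zscore <- function(v) (v - mean(v)) / sd(v) # custom function
head(zscore(x))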
R, R-Studio & Excel: Install
▪ Install R
▪ Install R-Studio
▪ In Excel, enable the analysis add-ins:
o Analysis ToolPak
o Solver Add-in
THANK YOU