BA4206 Business Analytics Batch 17 Study Material
Prepared by Prof. S Balaji
Unit – 1
What is BA?
Business Analytics (BA) is the process of using data, statistical analysis, and models to derive insights and support business decisions. It is commonly divided into four types:
1. Descriptive: The summarization of historical data to describe what has happened
2. Diagnostic: The interpretation of historical data to determine why something has happened
3. Predictive: The use of historical data and statistical techniques to forecast what is likely to happen
4. Prescriptive: The application of testing and other techniques to determine which outcome will yield the best result in a given scenario
Evolution of BA:
• The Internet Boom (2000s) – Advanced Analytics & Big Data
Big Data grew with the internet and social media. Predictive analytics gained traction,
supported by cloud computing and data mining.
Scope of BA:
2. Reputation management
4. Financial management
• Competitive Advantage
• Risk Mitigation
• Performance Measurement
Components of BA:
• Data Mining
• Association
• Text Mining
• Forecasting
• Optimization
• Data Visualization

Descriptive techniques:
1. Data Query
2. Data dashboard

Predictive techniques:
1. Data mining
2. Simulation

Prescriptive techniques:
1. Simulation Optimisation
2. Decision analysis
Title                                  Descriptive   Predictive   Prescriptive
Descriptive Statistics                     ✓
Data Visualization                         ✓
Linear Regression                                        ✓
Time Series Analysis and Forecasting                     ✓
Data Mining                                              ✓
Spreadsheet Models                                       ✓             ✓
Linear Optimization Models                                             ✓
Integer Linear Optimization Models                                     ✓
Non-linear Optimization Models                                         ✓
Monte Carlo Simulation                                   ✓             ✓
Decision Analysis                                                      ✓
Importance of BA:
• Data-Driven Decision Making
• Improved Operational Efficiency
• Competitive Advantage
• Enhanced Customer Experience
• Cost Reduction & Revenue Growth
• Risk Management & Fraud Detection
• Supports AI & Automation
Tools of BA:
Data Visualization Tools
• Tableau
• Power BI
• Google Data Studio
Statistical & Data Analysis Tools
• R
• Python (Pandas, NumPy, SciPy)
• SAS
Business Intelligence (BI) Tools
• SAP BusinessObjects
• IBM Cognos Analytics
• Oracle BI
Big Data & Database Management Tools
• SQL
• Apache Hadoop
• Snowflake
Predictive Analytics & Machine Learning Tools
• IBM SPSS
• KNIME
ETL (Extract, Transform, Load) Tools
• Talend
• Apache Nifi
Customer Analytics & CRM Tools
• Google Analytics
• Salesforce Analytics
• HubSpot
Spreadsheet & Reporting Tools
• Microsoft Excel
• Google Sheets
Challenges of BA:
4) Integrity of Data

Applications of BA:
• Customer segmentation
• Fraud detection
• Demand forecasting
• Inventory optimization
Supply Chain & Logistics
• Route optimization
• Warehouse management
• Demand-supply balancing
Human Resources
• Attrition prediction

Manufacturing
• Predictive maintenance
• Process optimization
Ways Business Analytics can help achieve a Competitive Advantage:
Price Leadership – Business analytics helps optimize pricing strategies by analyzing market
trends, competitor pricing, and consumer behavior to offer competitive yet profitable prices.
Service Effectiveness – Analytics enables personalized customer experiences and faster issue
resolution by predicting customer needs and improving service delivery.
Innovation – By identifying emerging trends and customer preferences, analytics fosters data-
driven innovation in products, services, and business models.
Unit – 2
Business analyst:
A business analyst examines an organization’s processes, data, and systems to identify business needs and recommend solutions, acting as a bridge between business stakeholders and technical teams.
Types of Business Analysts:
• IT Business Analysts
Organizational structures that align well with the Business Analyst (BA) role are:
• Example:
• Example:
Primary data:
Primary data is information collected firsthand by the researcher for a specific purpose, through methods such as:
• Surveys
• Interviews
• Experiments
• Auditing
• Simulation
• Observation
Secondary data
Secondary data refers to information that has been collected by someone else or for another purpose. This data is not collected firsthand but is obtained from sources such as:
Internal Sources:
Sales Analysis
Invoice Analysis
Financial Data
Transportation data
External sources:
Libraries
Literature
Difference between Primary and Secondary data:
Aspect              Primary Data                        Secondary Data
Definition          Data collected firsthand for a      Data that has already been collected by
                    specific research purpose           someone else for a different purpose
Collection Method   Direct and customized               Indirect and pre-existing
Outsourcing
Data Quality
Measuring BA contribution
Managing change
Unit – 3
Descriptive Analytics
Measures of Frequency – Describe how often a particular value appears in the dataset.
Measures of Central Tendency – Identify the central or most representative value in a dataset.
Measures of Dispersion (Variability) – Show how much the data varies from the central
value.
Measures of Position – Indicate where a particular value stands in relation to others in the
dataset.
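The four groups of measures can be sketched on a toy dataset using only Python's standard library (all values below are invented for illustration):

```python
# Illustrative sketch of the four groups of descriptive measures.
import statistics
from collections import Counter

data = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

# Measures of frequency: how often each value appears
freq = Counter(data)                          # e.g. 22 appears 3 times

# Measures of central tendency
mean = statistics.mean(data)                  # 20.1
median = statistics.median(data)              # 21.0
mode = statistics.mode(data)                  # 22

# Measures of dispersion
rng = max(data) - min(data)                   # range = 18
stdev = statistics.stdev(data)                # sample standard deviation

# Measures of position: quartiles locate a value relative to the rest
q1, q2, q3 = statistics.quantiles(data, n=4)

print(mean, median, mode, rng)
```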
Steps involved in Descriptive Analysis:
Data Visualization:
Data Visualization is the graphical representation of data and information using visual elements
like charts, graphs, maps, and dashboards. It helps to identify trends, patterns, and insights in
data, making complex information easier to understand and interpret.
1. Simplifies Complex Data – Converts raw data into visual formats, making it easier to
interpret and analyze.
2. Identifies Trends & Patterns – Helps in recognizing trends, correlations, and outliers
in large datasets.
5. Facilitates Quick Analysis – Enables users to analyze and understand large amounts
of data at a glance.
Charts & Graphs in Data Visualization:
Charts and graphs are essential tools in data visualization that help represent numerical data in
a visual format. They make it easier to identify trends, patterns, and relationships between
variables. Choosing the right type of chart depends on the nature of the data and the insights
you want to extract.
To create a chart in Excel:
1. Select Your Data
o Highlight the cell range containing the data you want to chart.
2. Go to Insert Tab
o Select a chart type (e.g., Bar, Line, Pie) from the Charts group.
Types of Charts:
Bar chart
In a bar chart, values are indicated by the length of bars, each of which corresponds with a
measured group. Bar charts can be oriented vertically or horizontally; vertical bar charts are
sometimes called column charts. Horizontal bar charts are a good option when you have a lot
of bars to plot, or the labels on them require additional space to be legible.
Line chart
Line charts show changes in value across continuous measurements, such as those made over
time. Movement of the line up or down helps bring out positive and negative changes,
respectively. It can also expose overall trends, to help the reader make predictions or projections
for future outcomes. Multiple line charts can also give rise to other related charts like the
sparkline or ridgeline plot.
Scatter plot
A scatter plot displays values on two numeric variables using points positioned on two axes:
one for each variable. Scatter plots are a versatile demonstration of the relationship between
the plotted variables—whether that correlation is strong or weak, positive or negative, linear
or non-linear.
Heatmap
The heatmap presents a grid of values based on two variables of interest. The axis variables
can be numeric or categorical; the grid is created by dividing each variable into ranges or levels
like a histogram or bar chart. Grid cells are colored based on value, often with darker colors
corresponding with higher values. A heatmap can be an interesting alternative to a scatter plot
when there are a lot of data points to plot, but the point density makes it difficult to see the true
relationship between variables.
Pie chart
A pie chart, sometimes called a circle chart, is a way of summarizing a set of nominal data or
displaying the different values of a given variable (e.g. percentage distribution). This type of
chart is a circle divided into a series of segments. Each segment represents a particular category.
Probability Distribution
3. Supports Decision-Making
Sampling:
Sampling is the process of selecting a subset of individuals from a larger population to represent
the whole group.
Types of Sampling:
Probability Sampling:
• Every member of the population has a known, non-zero chance of being selected.
• It ensures objectivity and is suitable for generalizing results to the population.
• Examples: Simple Random, Systematic, Cluster, and Stratified Sampling
Non-Probability Sampling:
• Members are selected based on convenience or the researcher’s judgment, so each member’s chance of selection is unknown.
• It is quicker and cheaper but more prone to bias, so results may not generalize to the population.
• Examples: Convenience, Purposive, Panel, and Snowball Sampling
Probability Sampling Methods
Systematic Sampling
Selects every kth individual from a list after a random start.
Useful when the population is orderly and large.
Easy to implement but may introduce bias if patterns exist.
Cluster Sampling
Divides population into clusters, randomly selects entire clusters.
Used when population is large and spread out.
Cost-effective but may reduce diversity within samples.
Stratified Sampling
Divides population into subgroups (strata) based on a characteristic.
Samples are taken from each stratum proportionally or equally.
Ensures representation across key subgroups.
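The three probability methods above can be sketched in a few lines of Python (toy population of 100 IDs; the 60/40 strata split and sample size of 10 are invented):

```python
# Sketch of simple random, systematic, and stratified sampling.
import random

random.seed(42)
population = list(range(1, 101))          # IDs 1..100

# Simple random sampling: every member has an equal chance
srs = random.sample(population, k=10)

# Systematic sampling: every k-th member after a random start
k = len(population) // 10                 # k = 10
start = random.randrange(k)
systematic = population[start::k]         # 10 members, evenly spaced

# Stratified sampling: sample proportionally from each stratum
strata = {"A": population[:60], "B": population[60:]}   # 60/40 split
stratified = []
for members in strata.values():
    n = round(10 * len(members) / len(population))      # proportional share
    stratified.extend(random.sample(members, n))

print(len(srs), len(systematic), len(stratified))
```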
Convenience Sampling
Selects samples that are easiest to access.
Quick and inexpensive but prone to bias.
Often used in exploratory research or pilot studies.
Purposive Sampling
Samples chosen based on researcher’s judgment and purpose.
Targets specific characteristics relevant to the study.
Useful in qualitative research with defined criteria.
Panel Sampling
Involves studying the same group (panel) over time.
Allows for longitudinal analysis and tracking changes.
Panel members are selected non-randomly and retained.
Snowball Sampling
Existing subjects recruit future subjects from their networks.
Used for hard-to-reach or hidden populations.
Effective but risks selection bias due to homogenous networks.
Sampling error
• Sampling error is the difference between the results obtained from a sample and the
actual values of the population.
• It occurs because only a subset of the population is studied, not the entire group.
• This error is natural and expected, but it can be minimized through proper sampling
techniques.
• Occurs due to chance variations when a random sample does not perfectly represent the population.
• It can be reduced by increasing the sample size or using more precise sampling methods.
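A quick simulation illustrates the point about sample size: with a fixed random seed, the average gap between sample mean and population mean is clearly smaller for the larger sample (all numbers below are invented):

```python
# Toy simulation: sampling error shrinks as the sample size grows.
import random
import statistics

random.seed(0)
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

def avg_error(sample_size, trials=200):
    """Average |sample mean - population mean| over many random samples."""
    errors = []
    for _ in range(trials):
        sample = random.sample(population, sample_size)
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

small, large = avg_error(25), avg_error(400)
print(f"n=25: {small:.3f}  n=400: {large:.3f}")  # larger n gives smaller error
```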
Estimation
Estimation is the process of using sample data to approximate the value of an unknown population parameter. It helps researchers make predictions or decisions without studying the entire population.
Types of Estimation
Point Estimation:
Gives a single value (point) as an estimate of a population parameter (e.g., mean, proportion).
Example: Using the sample mean to estimate the population mean.
It's simple but doesn't show how accurate the estimate is.
Interval Estimation:
Provides a range (interval) of values within which the population parameter is likely to lie.
Usually includes a confidence level (like 95%) to indicate reliability.
Example: The population mean is estimated to be between 45 and 55 with 95% confidence.
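Both kinds of estimate can be computed from a small invented sample; the z value 1.96 corresponds to 95% confidence (a t value would be more precise for a sample this small):

```python
# Point estimate (sample mean) and a 95% confidence interval around it.
import math
import statistics

sample = [48, 52, 50, 47, 53, 49, 51, 50, 46, 54]   # invented data

point_estimate = statistics.mean(sample)                 # single best guess
se = statistics.stdev(sample) / math.sqrt(len(sample))   # standard error
margin = 1.96 * se                                       # z for 95% confidence
interval = (point_estimate - margin, point_estimate + margin)

print(f"point: {point_estimate}, 95% CI: ({interval[0]:.2f}, {interval[1]:.2f})")
```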
Probability
Probability Distribution
• A probability distribution shows how probabilities are distributed over the values of a
random variable.
• It describes all possible outcomes and their associated probabilities.
• Two main types are Discrete (e.g., Binomial, Poisson) and Continuous (e.g., Normal
distribution).
Binomial Distribution
Each trial has two outcomes (success or failure) and a constant probability of success.
Key Conditions:
• A fixed number of trials (n)
• Independent trials
• Only two possible outcomes per trial (success or failure)
• A constant probability of success (p) in every trial
P(x) = C(n, x) · p^x · q^(n−x)
Example Question: A coin is tossed 5 times. What is the probability of getting exactly 3
heads?(Here, getting a head is considered a success)
Given:
• n = 5
• x = 3
• p = 0.5, so q = 1 − p = 0.5
Solution: P(3) = C(5, 3) · (0.5)^3 · (0.5)^2 = 10 × 0.125 × 0.25 = 0.3125
So there is a 31.25% chance of getting exactly 3 heads.
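The same computation in Python, using the binomial formula from above:

```python
# Worked check of the coin-toss example: P(exactly 3 heads in 5 tosses).
from math import comb

n, x, p = 5, 3, 0.5
q = 1 - p
prob = comb(n, x) * p**x * q**(n - x)   # C(n, x) * p^x * q^(n-x)
print(prob)  # 0.3125, i.e. a 31.25% chance
```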
Poisson Distribution
It models the number of times an event occurs in a fixed interval of time or space.
It’s used when events happen independently and rarely over a large number of opportunities.
Key Conditions:
• Events occur independently of one another
• Events occur at a constant average rate (λ) over the interval
• Two events cannot occur at exactly the same instant
P(x) = (e^(−λ) · λ^x) / x!
Example Question: A call center receives 4 calls per hour on average. What is the
probability that it will receive exactly 2 calls in an hour?
Given:
• λ = 4
• x = 2
Solution: P(2) = e^(−4) · 4^2 / 2! = 8e^(−4) ≈ 0.1465
So, there is approximately a 14.65% chance that the call center will receive exactly 2 calls in one hour.
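The same computation in Python, using the Poisson formula from above:

```python
# Worked check of the call-center example: P(exactly 2 calls) at rate 4/hour.
from math import exp, factorial

lam, x = 4, 2
prob = exp(-lam) * lam**x / factorial(x)   # e^(-lambda) * lambda^x / x!
print(round(prob, 4))  # 0.1465, i.e. about a 14.65% chance
```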
Unit – 4
Predictive Analytics
Predictive analytics is the use of statistics and modeling techniques to forecast future outcomes.
Current and historical data patterns are examined and plotted to determine the likelihood that
those patterns will repeat.
Steps involved in Predictive Analysis
1. Define your project’s objectives. What is the desired outcome? What problem are you trying
to solve? The first step is to define your project’s objectives, deliverables, scope, and data
required.
2. Collect your data. Gather all the data you need in one place. Include different types of current
and historical data from a variety of sources – from transactional systems and sensors to call
center logs – for more in-depth results.
3. Clean and prepare your data. Clean, prepare, and integrate your data to get it ready for analysis. Remove outliers and identify missing information to improve the quality of your predictive data set.
4. Build and test your model. Build your predictive model, train it on your data set, and test it
to ensure its accuracy. It may take multiple iterations to generate an error-free model.
5. Deploy your model. Deploy your predictive model and put it to work on new data. Get results
and reports – and automate decision-making based on the output.
6. Monitor and refine your model. Regularly monitor your model to review its performance
and ensure it’s providing the expected results. Refine and optimize your model as needed.
Rule-Based Models

Definition:
These models use expert knowledge and predefined rules to make predictions. The logic is
manually encoded, often using "if-then" rules or decision trees.
How it works:
Based on domain expertise or business logic. Doesn’t rely heavily on historical data. Uses
deterministic rules.
Examples:
Pros:
Cons:
Data-Driven Models

Definition:
These models rely on historical data and use statistical algorithms or machine learning to learn
patterns and make predictions.
How it works:
Examples:
Pros:
High accuracy.
Cons:
Difference:

Aspect          Rule-Based Models               Data-Driven Models
Example Tools   Expert Systems, Rule Engines    Python ML libraries, R, AutoML platforms
Data Mining is the process of discovering patterns, trends, and knowledge from large sets of
data. When used for Predictive Analytics, it helps in forecasting future outcomes based on
historical data.
Problem Definition
Data Collection
Gather relevant data from different sources like databases, cloud storage, etc.
Data Preprocessing
Transform data (normalization, encoding, etc.) and Select relevant features (feature selection
or extraction).
Data Mining / Model Building
Evaluation
Deployment
1. Classification
Example:
A bank wants to predict whether a loan applicant will default or not (Yes/No).
2. Regression
Example:
A real estate company wants to predict the price of a house.
Input Data: Number of bedrooms, Size (sq. ft.), Location, Age of house
3. Forecasting (Time Series Analysis)
Example:
A retail store wants to forecast monthly sales for the next 6 months.
4. Association Rule Mining
Example:
A supermarket wants to find which items are frequently bought together.
Output Rule: "If a customer buys bread and butter, they are likely to buy jam."
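The rule's strength can be checked with two standard measures, support and confidence, on a tiny invented set of transactions:

```python
# Support and confidence for the rule {bread, butter} -> {jam}.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

antecedent, consequent = {"bread", "butter"}, {"jam"}

# Count baskets containing all items, and baskets containing the antecedent
both = sum(1 for t in transactions if antecedent | consequent <= t)
ante = sum(1 for t in transactions if antecedent <= t)

support = both / len(transactions)   # share of all baskets with the full set
confidence = both / ante             # how often the rule holds when it applies

print(support, round(confidence, 2))
```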
5. Clustering (supporting technique for segmentation)
Example:
An e-commerce site wants to group customers into segments based on purchase behavior.
Output:
Example:
A hospital uses patient data to predict the risk of heart disease.
Retail
Customer segmentation
Sales forecasting
Healthcare
Disease prediction
Finance
Credit scoring
Fraud detection
Challenges:
• Data Quality
• Dynamic Data
• Interpretability
• Scalability
Unit – 5
Prescriptive Analytics
• Prescriptive analytics uses data to recommend actions that can lead to the best possible
outcomes.
• It not only predicts what might happen but also suggests what should be done next.
• By using techniques like optimization and machine learning, it helps businesses make
smarter decisions.
• Prescriptive analytics is often used in areas like marketing, healthcare, and supply
chain management.
Steps involved in Prescriptive Analytics:
Define the Objective: Clearly understand and state what decision or goal you want to achieve.
Collect and Prepare Data: Gather relevant internal and external data and organize it for
analysis.
Analyze Data and Build Models: Use optimization models, machine learning, or simulation
techniques to study the data.
Evaluate and Validate Results: Test the recommendations to check if they meet the objectives
and make adjustments if needed.
Implement and Monitor: Put the chosen actions into practice and track outcomes to improve
future decisions.
Types of Prescriptive Modeling
1. Linear Programming
Definition: Linear programming is used to find the best outcome in a mathematical model with linear relationships. It is often used for optimization problems.
Example: A factory wants to maximize profits by deciding how many units of each product to
produce, given constraints like labor and materials. Linear programming helps find the optimal
production mix.
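A toy version of the product-mix problem; the per-unit profits (30 and 50) and the labor/material limits are invented. For a problem this small, brute-force enumeration over integer quantities finds the same optimum a linear-programming solver would:

```python
# Brute-force product mix: maximize 30a + 50b under two linear constraints.
def best_mix(max_labor=100, max_material=90):
    """Maximize 30*a + 50*b s.t. 2a + 4b <= labor, 3a + 2b <= material."""
    best = (0, 0, 0)  # (profit, units_a, units_b)
    for a in range(51):
        for b in range(51):
            if 2 * a + 4 * b <= max_labor and 3 * a + 2 * b <= max_material:
                profit = 30 * a + 50 * b
                best = max(best, (profit, a, b))
    return best

profit, a, b = best_mix()
print(profit, a, b)  # optimal mix: 20 units of A, 15 of B, profit 1350
```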
2. Integer Programming
Definition: Integer programming is linear programming with the added restriction that some or all decision variables must take whole-number values.
Example: A delivery company needs to assign trucks to different routes. Since the number of trucks is fixed, integer programming helps in assigning them to minimize travel distance.
3. Network Optimization
Definition: Network optimization involves finding the most efficient flow of resources through a network. It aims to minimize costs or maximize efficiency in transportation, logistics, or data flow.
Example: A shipping company needs to minimize transportation costs by determining the most
efficient routes for its trucks to take between distribution centers.
4. Simulation Optimization
Definition: Simulation optimization uses simulation models to evaluate different scenarios and find the best solution. It is particularly useful for complex, uncertain systems.
5. Decision Trees
Definition: Decision trees are used to make decisions by breaking down a problem into a tree structure of possible outcomes. They help in decision-making under uncertainty.
Example: A bank uses a decision tree to determine whether to approve a loan based on the
applicant's credit score and income.
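A decision tree like this can be written out directly as nested rules; the thresholds below (credit score 650, incomes of 25k and 40k) are purely illustrative, not from any real bank:

```python
# Minimal hand-coded decision tree for the loan-approval example.
def approve_loan(credit_score, annual_income):
    """Walk a two-level decision tree and return 'approve' or 'reject'."""
    if credit_score >= 650:
        # Good credit: approve unless income is very low
        return "approve" if annual_income >= 25_000 else "reject"
    else:
        # Weaker credit: require a higher income to compensate
        return "approve" if annual_income >= 40_000 else "reject"

print(approve_loan(700, 30_000))  # approve
print(approve_loan(600, 30_000))  # reject
```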
6. Reinforcement Learning
Definition: Reinforcement learning is a machine learning technique where an agent learns by interacting with an environment to maximize a cumulative reward.
Example: A self-driving car uses reinforcement learning to determine the best route based on
real-time traffic conditions, continuously improving its driving decisions.
7. Non-linear Optimization
Definition: Non-linear optimization deals with optimization problems where the objective function or constraints are non-linear, involving complex relationships between variables.
Example: A company in the energy sector uses non-linear optimization to minimize fuel
consumption while meeting fluctuating energy demands, considering non-linear relationships
in energy production costs.
8. Heuristics
Definition: Heuristics are practical rules of thumb that trade guaranteed optimality for speed and simplicity.
Example: A traveler wants to pack a suitcase quickly for a trip. Instead of trying all packing combinations, they pack by priority. Heuristics help choose what’s most important. It’s not perfect, but it’s fast and practical.
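The suitcase idea maps to a classic greedy heuristic: sort items by priority per kilogram and take whatever fits. Item names, weights, and priorities below are invented:

```python
# Greedy packing heuristic: highest priority-per-kg first, until full.
def pack(items, capacity):
    """Take items in descending priority/weight order while they fit."""
    chosen, weight = [], 0
    for name, w, priority in sorted(items, key=lambda i: i[2] / i[1], reverse=True):
        if weight + w <= capacity:
            chosen.append(name)
            weight += w
    return chosen

items = [  # (name, weight_kg, priority)
    ("documents", 1, 10),
    ("laptop", 3, 9),
    ("clothes", 5, 8),
    ("books", 4, 3),
    ("camera", 2, 5),
]
print(pack(items, capacity=10))
```

On this data the greedy pass skips the clothes (they no longer fit once the camera is packed) and takes the books instead: fast and practical, but not guaranteed optimal.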
Pros of Prescriptive Modeling
• Decision Support
• Scenario Analysis
• Competitive Advantage
Cons of Prescriptive Modeling
• Complexity
• Data Dependency
• Limited Flexibility
Applications of Prescriptive Analytics:
• Healthcare Management
All the Best !